✈️

Airline Data Analysis - Multi-Task ML

Completed 2024 • Multi-Task ML Workflow with Binary, Multi-Class, and Regression Pipelines

This project is a machine learning analysis of airline data covering three distinct tasks: binary classification, multi-class classification, and regression. It compares multiple algorithms (Gaussian Naive Bayes, Categorical Naive Bayes, Linear SVC, Logistic Regression, Random Forest) across these tasks, with hyperparameter tuning, model selection, side-by-side evaluation, and persistence of the trained models and their reports. Random Forest was the best-performing model for both the binary and multi-class classification tasks.

Data Science • Machine Learning • Python Development • Classification • Regression • Model Comparison • Hyperparameter Tuning

Overview

Key Features

✓ Binary classification with 5 algorithm types

✓ Multi-class classification with the same algorithm suite

✓ Regression models for continuous targets

✓ Hyperparameter tuning with multiple model variants

✓ Best model selection (Random Forest for classification)

✓ Comprehensive model comparison and evaluation

✓ Model persistence and results storage

✓ Visual comparison plots (accuracy, exactitude)

✓ Organized results directories (Binaire_Res, Multi_Res, Regression__Res)

✓ Large dataset processing (15 MB of airline data)


Technical Highlights

⚡ Implemented 5+ algorithms across 3 ML tasks (binary, multi-class, regression)

⚡ Identified Random Forest as the best model for the classification tasks

⚡ Created a comprehensive model comparison with visualizations

⚡ Performed hyperparameter tuning with multiple model variants

⚡ Organized model persistence with pickle files and reports

⚡ Processed a large airline dataset (15 MB) efficiently

Challenges and Solutions

Algorithm Selection

Compared 5+ algorithms across all tasks to identify best performers
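
A comparison of this kind could be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual code: the dataset, the cross-validation settings, the scaling steps, and the binning step in front of `CategoricalNB` (which expects non-negative integer categories) are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the airline features and a binary target (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)

candidates = {
    "GaussianNB": GaussianNB(),
    # CategoricalNB needs integer-coded categories, so continuous features
    # are binned into 5 ordinal bins first (an illustrative choice).
    "CategoricalNB": make_pipeline(
        KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform"),
        CategoricalNB(min_categories=5),
    ),
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(max_iter=5000)),
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy per candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```

Which model ranks first on synthetic data says nothing about the airline dataset; the point is only that every candidate is scored with the same splitter and the same metric.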

Hyperparameter Tuning

Created multiple model variants with hyperparameter tuning and selected best models
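
As a rough sketch of what tuning several variants and keeping the best one can look like with `GridSearchCV` (listed under Technologies), assuming synthetic data and an illustrative parameter grid that is not taken from the project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Illustrative grid: 2 x 2 = 4 model variants.
param_grid = {"n_estimators": [25, 50], "max_depth": [4, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
best_model = search.best_estimator_  # refit on all of X, y by default
print(best_params, round(search.best_score_, 3))
```

`RandomizedSearchCV` takes the same estimator and scoring arguments but samples a fixed number of candidates from the grid, which helps when the grid is too large to enumerate.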

Multi-Task Evaluation

Implemented separate evaluation pipelines with consistent metrics across tasks
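
One way to keep metrics consistent across tasks is to give each task type a single evaluation function and reuse it for every model. The sketch below assumes accuracy and macro F1 for the two classification tasks and MAE and R² for regression; the source does not say which metrics the project reported, and the function names are hypothetical.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    r2_score,
)


def evaluate_classifier(y_true, y_pred):
    """Shared metrics for both binary and multi-class predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }


def evaluate_regressor(y_true, y_pred):
    """Shared metrics for continuous targets."""
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
    }


# Tiny hand-written examples, only to show the call shape.
binary_report = evaluate_classifier([0, 1, 1, 0], [0, 1, 0, 0])
multi_report = evaluate_classifier([0, 1, 2, 2], [0, 1, 2, 1])
regression_report = evaluate_regressor([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
```

Because the binary and multi-class pipelines share `evaluate_classifier`, their reports have identical keys and can be placed in the same comparison table.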

Model Persistence

Organized pickle files and reports in separate directories for reproducibility
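
A minimal sketch of this layout, assuming `joblib` for serialization and a plain-text classification report. The directory name `Binaire_Res` comes from the project; the file names, the temporary root directory, and the synthetic data are hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Synthetic stand-in data and a small model (assumptions).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# A temporary root keeps the example from touching a real project tree.
root = Path(tempfile.mkdtemp())
results_dir = root / "Binaire_Res"
results_dir.mkdir(parents=True, exist_ok=True)

model_path = results_dir / "random_forest.pkl"           # hypothetical name
report_path = results_dir / "random_forest_report.txt"   # hypothetical name

# Store the fitted model next to its evaluation report.
joblib.dump(model, model_path)
report_path.write_text(classification_report(y, model.predict(X)))

# Reload and confirm the restored model behaves like the original.
restored = joblib.load(model_path)
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```

Pickled scikit-learn models are generally only safe to reload with the same library version that wrote them, so recording that version alongside the report helps reproducibility.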

Performance Comparison

Created comparison plots, accuracy plots, and exactitude plots for visualization
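
A comparison plot of this kind might be produced with a small Matplotlib helper such as the one below. The function name, the output file name, and the scores passed in are placeholders for illustration, not results from the project.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is required
import matplotlib.pyplot as plt


def plot_accuracy_comparison(scores, out_path):
    """Save a bar chart of accuracy per model and return the output path."""
    names = list(scores)
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(names, [scores[name] for name in names])
    ax.set_ylim(0, 1)
    ax.set_ylabel("Accuracy")
    ax.set_title("Model comparison")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path


# Dummy values purely to exercise the function; NOT project results.
demo_scores = {"model_a": 0.5, "model_b": 0.75}
plot_path = plot_accuracy_comparison(
    demo_scores, Path(tempfile.mkdtemp()) / "accuracy_comparison.png"
)
```

Fixing the y-axis to the range 0 to 1 keeps bars comparable between the binary and multi-class plots.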

Large Dataset Handling

Efficiently processed 15MB airline dataset with optimized data pipelines
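
The source does not describe the pipelines themselves. Since `Pipeline` and `ColumnTransformer` are listed under Technologies, a plausible shape is sketched below for the regression task. The column names, the synthetic data, and the choice of `LinearRegression` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the airline table (hypothetical columns).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(
    {
        "airline": rng.choice(["AA", "BB", "CC"], size=n),
        "distance_km": rng.uniform(200, 3000, size=n),
        "stops": rng.integers(0, 3, size=n),
    }
)
# Hypothetical continuous target with some noise.
target = 0.1 * df["distance_km"] + 20 * df["stops"] + rng.normal(0, 5, size=n)

# Scale numeric columns and one-hot encode the categorical one in one step.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["distance_km", "stops"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["airline"]),
    ]
)

pipeline = Pipeline([("preprocess", preprocess), ("model", LinearRegression())])
pipeline.fit(df, target)
predictions = pipeline.predict(df.head(5))
```

Bundling preprocessing and the model into one object means the same transformations are applied at training and prediction time, and the whole pipeline can be persisted as a single file.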

Technologies

ML Models: GaussianNB, CategoricalNB, LinearSVC, LogisticRegression, RandomForestClassifier
Tuning: GridSearchCV, RandomizedSearchCV
Pipeline: Pipeline, ColumnTransformer
Evaluation: Classification Reports, Regression Metrics, Model Comparison
Persistence: Joblib, Pickle
Data: Pandas, NumPy, Matplotlib, Seaborn
Environment: Python, Jupyter Notebook

Project Information

Status: Completed
Year: 2024
Architecture: Multi-Task ML Workflow with Binary, Multi-Class, and Regression Pipelines
Category: Data Science
