✈️

Airline Data Analysis - Multi-Task ML

Completed | 2024 | Multi-Task ML Workflow with Binary, Multi-Class, and Regression Pipelines

Data Science, Machine Learning, Python Development, Classification, Regression, Model Comparison, Hyperparameter Tuning

Overview

This project applies machine learning to airline data through three distinct tasks: binary classification, multi-class classification, and regression. It compares multiple algorithms (Gaussian Naive Bayes, Categorical Naive Bayes, Linear SVC, Logistic Regression, Random Forest) across all three tasks, and covers hyperparameter tuning, model selection, evaluation, and persistence of the trained models. Random Forest was the best-performing model for both the binary and the multi-class classification tasks.

Key Features

Binary classification with 5 algorithm types

Multi-class classification with the same algorithm suite

Regression models for continuous targets

Hyperparameter tuning with multiple model variants

Best model selection (Random Forest for classification)

Comprehensive model comparison and evaluation

Model persistence and results storage

Visual comparison plots (accuracy, exactitude)

Organized results directories (Binaire_Res, Multi_Res, Regression__Res)

Large dataset processing (15MB airline data)

Technical Highlights

Implemented 5+ algorithms across 3 ML tasks (binary, multi-class, regression)

Identified Random Forest as best model for classification tasks

Created comprehensive model comparison with visualizations

Performed hyperparameter tuning with multiple model variants

Organized model persistence with pickle files and reports

Processed large airline dataset (15MB) efficiently

Challenges and Solutions

Algorithm Selection

Compared 5+ algorithms across all tasks to identify best performers
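
A comparison of this kind could be sketched as follows. This is a minimal illustration and not the project's actual code: the synthetic data from `make_classification` stands in for the airline features, and the binning step in front of `CategoricalNB`, the scaler in front of the linear models, and all parameter values are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the airline features and a binary target (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=0
)

candidates = {
    "GaussianNB": GaussianNB(),
    # CategoricalNB expects non-negative integer codes, so bin the features first.
    "CategoricalNB": make_pipeline(
        KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile"),
        CategoricalNB(min_categories=5),
    ),
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(max_iter=5000)),
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return {name: mean cross-validated accuracy} for each candidate."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(candidates, X, y)
# Print the candidates from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```

Scoring every candidate with the same cross-validation splits and the same metric keeps the ranking comparable; the ranking on synthetic data says nothing about the ranking on the airline data.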

Hyperparameter Tuning

Created multiple model variants through hyperparameter tuning and selected the best-performing ones
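
One way to generate and rank such variants is a grid search, sketched below. The parameter grid, the synthetic data, and the choice of `GridSearchCV` over `RandomizedSearchCV` are assumptions for illustration; the project's actual grids are not documented here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the airline data (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid: each combination is one "model variant".
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# With the default refit=True, the best variant is retrained on all of X_train.
best_model = search.best_estimator_
print("best params:", search.best_params_)
print("cv accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(best_model.score(X_test, y_test), 3))
```

Keeping a held-out split outside the search gives a score that was not used to pick the winning variant.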

Multi-Task Evaluation

Implemented separate evaluation pipelines with consistent metrics across tasks
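
A small dispatcher is one way to keep the metrics consistent within each task. The function name `evaluate`, the task labels, and the particular metrics chosen are hypothetical; the source only states that classification reports and regression metrics were produced.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)


def evaluate(task, y_true, y_pred):
    """Return a dict of metrics appropriate to the task type."""
    if task in ("binary", "multiclass"):
        # Both classification tasks share the same metrics so they stay comparable.
        return {
            "accuracy": accuracy_score(y_true, y_pred),
            "macro_f1": f1_score(y_true, y_pred, average="macro"),
        }
    if task == "regression":
        mse = mean_squared_error(y_true, y_pred)
        return {
            "mae": mean_absolute_error(y_true, y_pred),
            "rmse": mse ** 0.5,
            "r2": r2_score(y_true, y_pred),
        }
    raise ValueError(f"unknown task: {task!r}")


# Toy labels and predictions, only to show the call shape.
clf_metrics = evaluate("binary", [0, 1, 1, 0], [0, 1, 0, 0])
reg_metrics = evaluate("regression", [1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(clf_metrics)
print(reg_metrics)
```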

Model Persistence

Organized pickle files and reports in separate directories for reproducibility
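
The layout could look roughly like the sketch below. The directory names come from the project; the helper names `save_result` and `load_model`, the file names, and the JSON report format are assumptions. A plain dictionary stands in for a fitted estimator so that the example needs only the standard library.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Directory names taken from the project; everything else here is hypothetical.
RESULT_DIRS = {
    "binary": "Binaire_Res",
    "multiclass": "Multi_Res",
    "regression": "Regression__Res",
}


def save_result(root, task, name, model, report):
    """Pickle `model` and write `report` as JSON under the task's directory."""
    out_dir = Path(root) / RESULT_DIRS[task]
    out_dir.mkdir(parents=True, exist_ok=True)
    model_path = out_dir / f"{name}.pkl"
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)
    report_path = out_dir / f"{name}_report.json"
    report_path.write_text(json.dumps(report, indent=2))
    return model_path


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Demo with a placeholder object; a fitted estimator is saved the same way.
with tempfile.TemporaryDirectory() as root:
    placeholder = {"estimator": "RandomForestClassifier", "n_estimators": 100}
    path = save_result(root, "binary", "random_forest", placeholder, {"note": "demo"})
    restored = load_model(path)
    saved_in = path.parent.name
    report_exists = (path.parent / "random_forest_report.json").exists()
```

Pickled estimators are generally only safe to reload with the library version that wrote them, so recording that version in the report is a reasonable addition.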

Performance Comparison

Created comparison plots, accuracy plots, and exactitude plots for visualization
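
A comparison plot of this kind might be produced with a helper like the one below. The function name `plot_scores` and the styling are assumptions, and the demo scores are placeholder numbers, not results from the project.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt


def plot_scores(scores, metric="accuracy", title=None):
    """Draw one bar per model for a single metric and return the figure."""
    names = list(scores)
    values = [scores[name] for name in names]
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.bar(names, values)
    ax.set_ylim(0, 1)  # scores such as accuracy live in [0, 1]
    ax.set_ylabel(metric)
    ax.set_title(title or f"Model comparison ({metric})")
    ax.tick_params(axis="x", rotation=30)
    fig.tight_layout()
    return fig


# Placeholder values for illustration only; they are not measured results.
demo_scores = {"Model A": 0.5, "Model B": 0.75}
fig = plot_scores(demo_scores)
# fig.savefig("comparison_accuracy.png", dpi=150)  # hypothetical output file
```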

Large Dataset Handling

Efficiently processed 15MB airline dataset with optimized data pipelines
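
The source does not say which optimizations were used. One common option with Pandas is to load only the needed columns and declare compact dtypes up front, as in this sketch; the column names and the tiny in-memory CSV are hypothetical stand-ins for the airline file.

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for the airline file (hypothetical columns).
csv_text = (
    "carrier,origin,distance,delay,comment\n"
    "AA,JFK,2475,12.5,on time-ish\n"
    "DL,ATL,760,-3.0,early\n"
    "AA,LAX,2475,40.0,late\n"
    "UA,ORD,1745,0.0,on time\n"
)

# Repeated strings become categories; numbers use narrower types than the defaults.
dtypes = {
    "carrier": "category",
    "origin": "category",
    "distance": "int32",
    "delay": "float32",
}

# usecols skips columns the models do not need (here, the free-text comment).
df = pd.read_csv(io.StringIO(csv_text), usecols=list(dtypes), dtype=dtypes)

print(df.dtypes)
print(df.memory_usage(deep=True).sum(), "bytes")
```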

Technologies

ML Models

GaussianNB, CategoricalNB, LinearSVC, LogisticRegression, RandomForestClassifier

Tuning

GridSearchCV, RandomizedSearchCV

Pipeline

Pipeline, ColumnTransformer

Evaluation

Classification Reports, Regression Metrics, Model Comparison

Persistence

Joblib, Pickle

Data

Pandas, NumPy, Matplotlib, Seaborn

Environment

Python, Jupyter Notebook

Project Information

Status: Completed
Year: 2024
Architecture: Multi-Task ML Workflow with Binary, Multi-Class, and Regression Pipelines
Category: Data Science