Airline Data Analysis - Multi-Task ML
Overview
This project applies machine learning to airline data through three distinct tasks: binary classification, multi-class classification, and regression. It compares five algorithms (Gaussian Naive Bayes, Categorical Naive Bayes, Linear SVC, Logistic Regression, and Random Forest) across all three tasks, with hyperparameter tuning, model selection, side-by-side evaluation, and persistence of the trained models and their reports. Random Forest was the best-performing model for both the binary and multi-class classification tasks.
Key Features
Binary classification with 5 algorithm types
Multi-class classification with the same algorithm suite
Regression models for continuous targets
Hyperparameter tuning with multiple model variants
Best model selection (Random Forest for classification)
Comprehensive model comparison and evaluation
Model persistence and results storage
Visual comparison plots (accuracy, exactitude)
Organized results directories (Binaire_Res, Multi_Res, Regression__Res)
Large dataset processing (15 MB airline data)
Technical Highlights
Implemented 5+ algorithms across 3 ML tasks (binary, multi-class, regression)
Identified Random Forest as the best model for the classification tasks
Created comprehensive model comparison with visualizations
Performed hyperparameter tuning with multiple model variants
Organized model persistence with pickle files and reports
Processed a large airline dataset (15 MB) efficiently
Challenges and Solutions
Algorithm Selection
Compared 5+ algorithms across all tasks to identify the best performers
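A comparison like the one described above can be sketched with scikit-learn. This is a minimal illustration, not the project's actual code: the synthetic data from make_classification stands in for the airline dataset, the helper names build_models and compare_classifiers are hypothetical, and the binning step in front of Categorical Naive Bayes is an assumption about how continuous features might be made categorical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler
from sklearn.svm import LinearSVC


def build_models():
    """Return one untuned instance of each of the five algorithm families."""
    return {
        "GaussianNB": GaussianNB(),
        # CategoricalNB expects non-negative integer codes, so continuous
        # features are binned first (an assumption for this sketch);
        # min_categories guards against a bin missing from a training fold.
        "CategoricalNB": make_pipeline(
            KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform"),
            CategoricalNB(min_categories=5),
        ),
        "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(max_iter=5000)),
        "LogisticRegression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    }


def compare_classifiers(X, y, cv=3):
    """Return the mean cross-validated accuracy for each model name."""
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in build_models().items()
    }


# Synthetic stand-in for the airline data (not the real dataset).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)
scores = compare_classifiers(X, y)
best_name = max(scores, key=scores.get)

# Print the models from highest to lowest mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```

The same loop would apply to the multi-class task by passing a target with more than two classes; which model ranks first depends entirely on the data.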
Hyperparameter Tuning
Created multiple model variants through hyperparameter tuning and selected the best-performing ones
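One common way to generate and rank such variants is a cross-validated grid search. The sketch below assumes scikit-learn's GridSearchCV and a Random Forest classifier; the grid values and the synthetic data are placeholders chosen for illustration, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the airline data (not the real dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical grid: each combination is one "model variant".
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

# The refitted best variant and the settings that produced it.
best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```

Here search.cv_results_ keeps the scores of every variant, which is useful when the variants need to be compared rather than only the winner kept.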
Multi-Task Evaluation
Implemented separate evaluation pipelines with consistent metrics across tasks
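The idea of separate pipelines that report in a consistent way can be illustrated with a small dispatcher. This is a plain-Python sketch under stated assumptions: the task keys, the metric choices (accuracy for classification, MSE and MAE for regression), and the evaluate function are all hypothetical and not taken from the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# One metric set per task; every task returns the same shape of result.
TASK_METRICS = {
    "binary": {"accuracy": accuracy},
    "multiclass": {"accuracy": accuracy},
    "regression": {"mse": mean_squared_error, "mae": mean_absolute_error},
}


def evaluate(task, y_true, y_pred):
    """Return a {metric_name: value} dict for the given task."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return {name: fn(y_true, y_pred) for name, fn in TASK_METRICS[task].items()}


print(evaluate("binary", [1, 0, 1, 1], [1, 0, 0, 1]))
print(evaluate("regression", [1.0, 2.0], [1.0, 4.0]))
```

Because every task returns a dictionary of named metrics, results from the three pipelines can be stored and tabulated with the same code.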
Model Persistence
Organized pickle files and reports in separate directories for reproducibility
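A per-task layout like this can be sketched with the standard library. The directory names come from the project; everything else is an assumption for illustration, including the task keys, the save_model and load_model helpers, the file naming, and the choice of JSON for the report.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Directory names as listed in the project; the dictionary keys are assumed.
RESULT_DIRS = {
    "binary": "Binaire_Res",
    "multiclass": "Multi_Res",
    "regression": "Regression__Res",
}


def save_model(model, report, task, name, root="."):
    """Pickle the model and write its report into the task's directory."""
    out_dir = Path(root) / RESULT_DIRS[task]
    out_dir.mkdir(parents=True, exist_ok=True)
    model_path = out_dir / f"{name}.pkl"
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)
    (out_dir / f"{name}_report.json").write_text(json.dumps(report, indent=2))
    return model_path


def load_model(model_path):
    """Load a previously pickled model from disk."""
    with Path(model_path).open("rb") as fh:
        return pickle.load(fh)


# Round trip in a temporary directory, with a plain dict as a stand-in model.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = {"kind": "placeholder-model"}
    path = save_model(
        stand_in, {"note": "metrics go here"}, "binary", "random_forest", root=tmp
    )
    restored = load_model(path)
    print(path.parent.name, restored == stand_in)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.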
Performance Comparison
Created comparison plots, accuracy plots, and exactitude plots for visualization
Large Dataset Handling
Efficiently processed the 15 MB airline dataset with optimized data pipelines
Technologies
ML Models
Tuning
Pipeline
Evaluation
Persistence
Data
Environment
Project Information
- Status: Completed
- Year: 2024
- Architecture: Multi-Task ML Workflow with Binary, Multi-Class, and Regression Pipelines
- Category: Data Science