Heart Disease Naive Bayes & Logistic Regression

Completed (2024): ML experimentation with multiple classifiers and preprocessing strategies

Data Science, Machine Learning, Python Development, Healthcare Analytics, Classification, Model Tuning, Statistical Learning

Overview

This project implements and compares several classification algorithms for heart disease prediction: three Naive Bayes variants (Gaussian, Categorical, and Mixed) and Logistic Regression. It demonstrates different approaches to handling mixed data types (numeric, categorical, ordinal, binary) through model-specific preprocessing, and tunes hyperparameters with GridSearchCV, including handling of solver-penalty compatibility for Logistic Regression. Each model is evaluated on the Heart Disease UCI dataset using preprocessing pipelines, cross-validation, and a full set of evaluation metrics.

Key Features

Logistic Regression with hyperparameter tuning

Model-specific preprocessing strategies

Comprehensive hyperparameter tuning with GridSearchCV

Cross-validation for robust evaluation

Solver-penalty compatibility handling

Full evaluation metrics suite

Mixed data type handling (numeric, categorical, ordinal, binary)

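The "full evaluation metrics suite" listed above could look roughly like the sketch below. It is not the project's actual code: the `evaluate` helper, the choice of metrics, and the synthetic data from `make_classification` are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate(model, X_test, y_test) -> dict:
    """Collect common binary-classification metrics for a fitted model (hypothetical helper)."""
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_prob),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }


# Synthetic stand-in data (assumption): this is not the Heart Disease UCI dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
report = evaluate(GaussianNB().fit(X_train, y_train), X_test, y_test)
```

Reporting recall and ROC AUC alongside accuracy is a common choice for medical screening tasks, where a missed positive case usually costs more than a false alarm.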
Technical Highlights

Compared multiple Naive Bayes variants and Logistic Regression on heart disease data

Implemented model-specific preprocessing for optimal performance

Handled solver-penalty compatibility issues in Logistic Regression

Achieved strong performance with tuned models

Comprehensive evaluation with cross-validation

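As an illustration of what model-specific preprocessing can mean for the Naive Bayes variants, the sketch below gives each model only the feature type it is designed for: CategoricalNB receives integer-coded categories from OrdinalEncoder, while GaussianNB receives continuous values. The toy rows, the column meanings, and the labels are invented for the example; MixedNB is left out because it comes from a separate package rather than scikit-learn.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Toy data (assumption): two categorical columns (chest pain type, ST slope)
# and two numeric columns (max heart rate, ST depression). Values are made up.
X_cat = np.array(
    [
        ["typical", "flat"],
        ["typical", "flat"],
        ["asymptomatic", "downsloping"],
        ["asymptomatic", "flat"],
        ["atypical", "upsloping"],
        ["atypical", "upsloping"],
        ["asymptomatic", "downsloping"],
        ["typical", "upsloping"],
    ],
    dtype=object,
)
X_num = np.array(
    [
        [150, 0.0], [160, 0.5], [110, 2.5], [120, 3.0],
        [170, 0.2], [165, 0.0], [105, 2.0], [155, 0.4],
    ]
)
y = np.array([0, 0, 1, 1, 0, 0, 1, 0])

# CategoricalNB expects each feature as non-negative integer category codes,
# so string categories are integer-encoded first.
categorical_nb = Pipeline([
    ("encode", OrdinalEncoder()),
    ("clf", CategoricalNB()),
])
categorical_nb.fit(X_cat, y)

# GaussianNB models each feature as a per-class normal distribution,
# so it is given the continuous columns only.
gaussian_nb = GaussianNB().fit(X_num, y)

cat_pred = categorical_nb.predict(X_cat)
num_pred = gaussian_nb.predict(X_num)
```

One caveat with this setup: a category that first appears at prediction time has no learned count in CategoricalNB, so in practice the encoder and the model need to agree on the full set of categories up front.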
Challenges and Solutions

Mixed Data Types

Handled numeric, categorical, ordinal, and binary variables with appropriate preprocessing
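One way to route each variable type through its own preprocessing is a ColumnTransformer built from the tools listed under Technologies. The sketch below is an assumption about how that might be wired together, not the project's code: the column groupings, the category order for `slope`, the `build_preprocessor` helper, and the toy rows are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, OrdinalEncoder

# Assumed column grouping; the real project may group columns differently.
NUMERIC = ["age", "chol"]
NOMINAL = ["cp"]
ORDINAL = ["slope"]
BINARY = ["sex"]


def build_preprocessor() -> ColumnTransformer:
    """Hypothetical helper: one sub-pipeline per variable type."""
    numeric = Pipeline([
        ("impute", KNNImputer(n_neighbors=3)),
        ("scale", MinMaxScaler()),
    ])
    nominal = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    ordinal = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Explicit order is an assumption made for this example.
        ("encode", OrdinalEncoder(categories=[["downsloping", "flat", "upsloping"]])),
    ])
    binary = SimpleImputer(strategy="most_frequent")  # already 0/1, only fill gaps
    return ColumnTransformer(
        [
            ("num", numeric, NUMERIC),
            ("nom", nominal, NOMINAL),
            ("ord", ordinal, ORDINAL),
            ("bin", binary, BINARY),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


# Tiny synthetic frame with missing values (not real patient data).
toy = pd.DataFrame({
    "age": [63, 41, np.nan, 57, 50, 66],
    "chol": [233, 204, 250, np.nan, 219, 302],
    "cp": ["typical", "atypical", "non-anginal", "asymptomatic", np.nan, "typical"],
    "slope": ["flat", "upsloping", "flat", np.nan, "downsloping", "flat"],
    "sex": [1, 0, 1, 1, np.nan, 0],
})
features = build_preprocessor().fit_transform(toy)
```

Keeping the transformer inside a Pipeline with the classifier means imputers, scalers, and encoders are refit on each training fold during cross-validation, which avoids leaking statistics from the validation fold.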

Solver-Penalty Compatibility

Managed solver and penalty combinations in Logistic Regression for optimal performance
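A common way to handle this in scikit-learn is to pass GridSearchCV a list of parameter dictionaries, so each solver is paired only with penalties it supports and invalid combinations are never fitted. The sketch below assumes that approach; the specific solvers, C values, scoring choice, and synthetic data are illustrative, and the supported combinations should be checked against the LogisticRegression documentation for the installed version.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed features (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

C_VALUES = [0.1, 1.0, 10.0]

# Each dictionary is its own sub-grid, so only valid pairs are tried.
param_grid = [
    {"solver": ["liblinear"], "penalty": ["l1", "l2"], "C": C_VALUES},
    {"solver": ["lbfgs"], "penalty": ["l2"], "C": C_VALUES},
    {"solver": ["saga"], "penalty": ["l1", "l2"], "C": C_VALUES},
]

search = GridSearchCV(
    LogisticRegression(max_iter=5000),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
n_candidates = len(search.cv_results_["params"])
```

A single flat grid such as `{"solver": [...], "penalty": [...]}` would instead produce the full cross product, including pairs like lbfgs with l1 that the estimator rejects at fit time.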

Model Comparison

Compared multiple algorithms with fair evaluation metrics and preprocessing
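A fair comparison usually means every model sees the same cross-validation folds and is scored with the same metrics. The sketch below shows one way to set that up; the fold count, metric list, model selection, and synthetic data are assumptions for illustration rather than the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): not the Heart Disease UCI dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# A fixed, seeded splitter gives every model identical train/validation folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "GaussianNB": Pipeline([
        ("scale", MinMaxScaler()),
        ("clf", GaussianNB()),
    ]),
    "LogisticRegression": Pipeline([
        ("scale", MinMaxScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]),
}

SCORING = ["accuracy", "f1", "roc_auc"]

summary = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
    # Average each metric over the folds.
    summary[name] = {metric: scores[f"test_{metric}"].mean() for metric in SCORING}
```

Because preprocessing lives inside each Pipeline, it is refit per fold for every model, so no model benefits from statistics computed on its validation data.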

Technologies

ML Models

GaussianNB, CategoricalNB, MixedNB, LogisticRegression

Tuning

GridSearchCV

Preprocessing

KNNImputer, SimpleImputer, OneHotEncoder, OrdinalEncoder, LabelEncoder, MinMaxScaler

Pipeline

Pipeline, ColumnTransformer

Data

Pandas, NumPy

Environment

Python, Jupyter Notebook

Project Information

Status: Completed
Year: 2024
Architecture: ML experimentation with multiple classifiers and preprocessing strategies
Category: Data Science