
Heart Disease Modeling (ML3)

Completed (2024) · ML experimentation with SVM/LDA and tuned preprocessing


Data Science · Machine Learning · Python Development · Healthcare Analytics · Classification · Model Tuning

Overview

ML3 extends the earlier heart disease work with KNN-based imputation, two SVM variants (LinearSVC grid-tuned over C; SVC grid-tuned over C, gamma, and kernel), and Linear Discriminant Analysis (LDA) grid-tuned over solver and shrinkage. It reuses the stratified train/test splits from the earlier preprocessing, evaluates models on accuracy, precision, recall, and F1, and works around a scikit-learn predict quirk by passing NumPy arrays instead of DataFrames. For the KNN-imputed variant, RandomizedSearchCV keeps compute manageable while still exploring the hyperparameter space.
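
A minimal sketch of the imputation step, assuming a numeric feature matrix and a binary target. Synthetic data from make_classification with randomly injected NaNs stands in for the heart disease table, the split is recreated with train_test_split(stratify=y) rather than loaded from the earlier preprocessing, and n_neighbors=5 is an illustrative value, not the project's setting. The key point is that the imputer is fitted on the training split only and then reused on the test split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease features (assumption: binary target,
# all-numeric columns). The real project data is not reproduced here.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_redundant=0, weights=[0.55, 0.45], random_state=0)

# Knock out roughly 10% of entries to simulate missing measurements.
rng = np.random.default_rng(0)
mask = rng.random(X.shape) < 0.10
X[mask] = np.nan

# Stratified split keeps the class ratio the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit the imputer on the training split only, then apply it to the test split,
# so test rows never act as neighbours during fitting.
imputer = KNNImputer(n_neighbors=5)  # illustrative choice of k
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

print(np.isnan(X_train_imp).sum(), np.isnan(X_test_imp).sum())  # 0 0
```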

Key Features

KNN-imputed preprocessing for heart disease data

SVM variants tuned over C (LinearSVC) and over C, gamma, and kernel (SVC)

LDA tuned over solver and shrinkage

GridSearchCV for exhaustive search and RandomizedSearchCV for cheaper, sampled search


Stratified splits preserved from preprocessing

Workaround for a scikit-learn predict quirk (DataFrames converted to NumPy arrays)
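
The SVM tuning can be sketched as two GridSearchCV runs. LinearSVC has no kernel or gamma parameters, so only C is searched for it; SVC is searched over C, kernel, and gamma, with gamma left out of the linear-kernel sub-grid because that kernel ignores it. The synthetic data, the scaling step, the grid values, and F1 as the selection metric are all assumptions for illustration, not the project's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the preprocessed heart disease features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# LinearSVC only exposes C here, so it gets a one-dimensional grid.
linear_search = GridSearchCV(
    Pipeline([("scale", StandardScaler()),
              ("clf", LinearSVC(dual=False, max_iter=5000))]),
    param_grid={"clf__C": [0.01, 0.1, 1, 10]},
    scoring="f1", cv=5)

# SVC is searched over C, kernel and gamma. Two sub-grids avoid wasting fits
# on gamma values for the linear kernel, which ignores gamma.
svc_search = GridSearchCV(
    Pipeline([("scale", StandardScaler()), ("clf", SVC())]),
    param_grid=[
        {"clf__kernel": ["linear"], "clf__C": [0.1, 1, 10]},
        {"clf__kernel": ["rbf"], "clf__C": [0.1, 1, 10],
         "clf__gamma": ["scale", 0.01, 0.1]},
    ],
    scoring="f1", cv=5)

for name, search in [("LinearSVC", linear_search), ("SVC", svc_search)]:
    search.fit(X_train, y_train)
    # best_score_ is the mean CV F1; score() is F1 on the held-out split.
    print(name, search.best_params_, round(search.best_score_, 3),
          round(search.score(X_test, y_test), 3))
```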

Technical Highlights

Compared SVM and LDA with tuned hyperparameters on heart disease data

Used KNN-imputed preprocessing and randomized search to manage compute

Reported strong LDA performance (accuracy ~0.87, F1 ~0.86) alongside results for the tuned SVC

Handled a scikit-learn predict quirk by converting DataFrames to NumPy arrays
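
A sketch of the LDA search and the four-metric evaluation, again on synthetic data, so it will not reproduce the figures above. In scikit-learn, shrinkage is only supported by the 'lsqr' and 'eigen' solvers, so the grid is split into a sub-grid for 'svd' and one for the shrinkage-capable solvers. The shrinkage values and F1 as the selection metric are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in; n_redundant=0 keeps the within-class covariance
# non-singular so the 'eigen' solver works without shrinkage.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 'svd' does not accept shrinkage, so it gets its own sub-grid.
param_grid = [
    {"solver": ["svd"]},
    {"solver": ["lsqr", "eigen"], "shrinkage": [None, "auto", 0.1, 0.5]},
]
lda_search = GridSearchCV(LinearDiscriminantAnalysis(), param_grid,
                          scoring="f1", cv=5)
lda_search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
y_pred = lda_search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(lda_search.best_params_)
print({k: round(v, 3) for k, v in metrics.items()})
```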

Challenges and Solutions

Search Space Size

Combined exhaustive grid search (SVM, LDA) with randomized search for the KNN-imputed variant to keep the number of fits manageable
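
A sketch of how a randomized search can keep the KNN-imputed variant affordable. It assumes an RBF SVC as the classifier and a pipeline of scaling, KNN imputation, and the model; the step order, the search ranges, and n_iter=15 are assumptions. Because the imputer sits inside the pipeline, it is refitted on each training fold, and n_iter fixes the number of sampled configurations, so the cost is n_iter times cv fits (plus one refit) however large the space is.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with ~10% of values removed.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# StandardScaler ignores NaNs when fitting and passes them through, so scaling
# first lets KNNImputer measure distances on comparable feature scales.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("impute", KNNImputer()),
    ("clf", SVC(kernel="rbf")),
])

# Log-uniform distributions for C and gamma, a small discrete set for k.
param_distributions = {
    "impute__n_neighbors": [3, 5, 7, 9],
    "clf__C": loguniform(1e-2, 1e2),
    "clf__gamma": loguniform(1e-3, 1e0),
}

# n_iter caps the sampled configurations: 15 candidates x 5 folds = 75 fits.
search = RandomizedSearchCV(pipe, param_distributions, n_iter=15, cv=5,
                            scoring="f1", random_state=0)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```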

API Quirks

Passed NumPy arrays to predict to avoid input-type issues seen with certain scikit-learn versions
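
The exact quirk is not described here, so this is one plausible pattern rather than the project's code: fit and predict on NumPy arrays obtained with DataFrame.to_numpy(). In recent scikit-learn versions, a model fitted on an array and then given a DataFrame warns that X has feature names the model was not fitted with; keeping both calls on arrays avoids that mismatch. The generic column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
# Hypothetical column names standing in for the real dataset's columns.
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
train_df, test_df = df.iloc[:160], df.iloc[160:]
y_train = y[:160]

# Fit on a plain NumPy array...
model = SVC().fit(train_df.to_numpy(), y_train)

# ...and predict on a NumPy array too, so the input type matches what the
# model was fitted on (no feature-name mismatch).
y_pred = model.predict(test_df.to_numpy())
print(np.bincount(y_pred, minlength=2))
```

Fitting and predicting on DataFrames throughout would be equally consistent; the important part is not mixing the two.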

Model Coverage

Benchmarked multiple classifiers (SVM variants, LDA) to find the best-performing configurations
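
One way to benchmark the candidates side by side is cross_validate with shared stratified folds and the same four metrics used elsewhere. The model settings below are placeholders standing in for the tuned configurations, and the data is synthetic, so the printed scores will not match the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_redundant=0, random_state=0)

# Placeholder settings; in practice these would be the best estimators
# returned by the grid and randomized searches.
candidates = {
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(dual=False)),
    "SVC (rbf)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "LDA": LinearDiscriminantAnalysis(),
}

# A fixed random_state gives every model the same stratified folds,
# so the scores are directly comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, model in candidates.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # cross_validate stores each metric under "test_<name>".
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    print(name, {m: round(v, 3) for m, v in results[name].items()})

best = max(results, key=lambda n: results[n]["f1"])
print("best by mean F1:", best)
```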

Technologies

ML

Scikit-learn, SVC/LinearSVC, LDA, GridSearchCV, RandomizedSearchCV

Data

Pandas, NumPy

Viz

Matplotlib, Seaborn

Environment

Python, Jupyter Notebook

Project Information

Status: Completed
Year: 2024
Architecture: ML experimentation with SVM/LDA and tuned preprocessing
Category: Data Science