🩺

Heart Disease KNN Modeling

Completed | 2024 | ML experimentation with k-NN, Grid/Random search, and evaluation pipeline

Data Science, Machine Learning, Python Development, Healthcare Analytics, Classification, Model Tuning

Overview

This project trains and tunes k-NN classifiers for heart disease prediction in two scenarios: baseline preprocessed features from the J1 pipeline, and KNN-imputed feature sets. Hyperparameters are tuned with GridSearchCV for the baseline set and with RandomizedSearchCV for the KNN-imputed set. Each model is evaluated on accuracy, precision, recall, and F1, and the results include confusion matrices, classification reports, and ROC and precision-recall (PR) curves. A known scikit-learn 1.3 predict quirk is worked around by passing NumPy arrays instead of DataFrames.

Key Features

Baseline k-NN on preprocessed heart disease data

KNN-imputed variant with separate tuning

Converts DataFrames to NumPy arrays before calling predict, to satisfy the scikit-learn array requirement

Stratified split preserved from preprocessing stage
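The KNN-imputed variant and the stratified split can be illustrated with a minimal sketch. It runs on synthetic data, not the project's dataset, and it assumes scikit-learn's KNNImputer and train_test_split. The source does not say which imputer settings were used or whether imputation ran before or after the split, so n_neighbors=5 and the fit-on-train ordering are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the heart disease features (NOT the project's data).
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

# Blank out roughly 10% of the values to simulate missing measurements.
mask = rng.random(X.shape) < 0.10
X_missing = X.copy()
X_missing[mask] = np.nan

# stratify=y keeps the class ratio nearly identical in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed ordering: fit the imputer on training rows only, then apply it
# to both partitions so no test information leaks into the imputation.
imputer = KNNImputer(n_neighbors=5)
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

print(X_train_imp.shape, X_test_imp.shape)
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```

Fitting the imputer on the training partition alone is one common way to keep the test set untouched until evaluation; a pipeline that imputes upstream of the split would need to be checked for leakage separately.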

Technical Highlights

Compared baseline vs KNN-imputed feature sets with tuned k-NN models

Used GridSearchCV for exhaustive search and RandomizedSearchCV to cut compute

Reported full metrics suite and visual diagnostics (confusion matrix, ROC, PR)

Worked around the scikit-learn 1.3 predict bug by passing NumPy arrays

Challenges and Solutions

Hyperparameter Search Scope

Used an exhaustive grid search on the baseline set and a randomized search on the KNN-imputed set to reduce compute
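The compute trade-off between the two search strategies can be sketched as follows. The parameter ranges, cv=5, scoring="f1", and n_iter=10 are illustrative assumptions, not the project's actual settings, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (NOT the project's dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Exhaustive search: every combination is evaluated (5 * 2 * 2 = 20 candidates).
param_grid = {
    "n_neighbors": [3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
    "p": [1, 2],  # 1 = Manhattan distance, 2 = Euclidean distance
}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="f1")
grid.fit(X, y)

# Randomized search: a wider space (30 * 2 * 2 = 120 combinations),
# but only n_iter sampled candidates are actually evaluated.
param_dist = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
    "p": [1, 2],
}
rand = RandomizedSearchCV(
    KNeighborsClassifier(),
    param_dist,
    n_iter=10,
    cv=5,
    scoring="f1",
    random_state=0,
)
rand.fit(X, y)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print(n_grid, grid.best_params_)
print(n_rand, rand.best_params_)
```

With these assumed settings the grid search fits 20 candidates per fold while the randomized search fits 10, even though its search space is six times larger.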

Predict API Quirk

Handled the scikit-learn issue with DataFrame input to predict by passing NumPy arrays instead
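A minimal sketch of the workaround, assuming the conversion is done with pandas' to_numpy(). The source does not describe the exact failure, so this only shows the array-passing pattern; the column names and data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical feature columns and synthetic values, for illustration only.
columns = ["age", "chol", "thalach"]
X_train = pd.DataFrame(rng.normal(size=(100, 3)), columns=columns)
y_train = pd.Series(rng.integers(0, 2, size=100))
X_test = pd.DataFrame(rng.normal(size=(20, 3)), columns=columns)

model = KNeighborsClassifier(n_neighbors=5)

# Fit on arrays as well, so the model never records feature names and
# predict does not warn about a mismatch between fit and predict inputs.
model.fit(X_train.to_numpy(), y_train.to_numpy())

# Pass a plain NumPy array to predict instead of the DataFrame.
y_pred = model.predict(X_test.to_numpy())
print(y_pred[:5])
```

Converting at both fit and predict time keeps the two calls consistent; because the column labels are dropped, the column order has to match between the training and test frames.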

Evaluation Coverage

Captured accuracy, precision, recall, F1 plus confusion matrix, ROC, and PR curves
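The metrics listed above can be computed with scikit-learn's metrics module, as in this sketch. It uses synthetic data and an arbitrary n_neighbors=7, both assumptions, and it stops at the curve arrays rather than plotting them.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    auc,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_recall_curve,
    precision_score,
    recall_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (NOT the project's dataset).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)
y_pred = model.predict(X_test)
# Probability of the positive class, needed for the ROC and PR curves.
y_score = model.predict_proba(X_test)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)  # rows = true class, columns = predicted
print(metrics)
print(cm)
print(classification_report(y_test, y_pred))

# Curve coordinates; these arrays can be handed to a plotting library.
fpr, tpr, _ = roc_curve(y_test, y_score)
precision, recall, _ = precision_recall_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)
pr_auc = auc(recall, precision)
print(roc_auc, pr_auc)
```

For a k-NN model, predict_proba returns the (optionally distance-weighted) fraction of neighbors voting for each class, so the ROC and PR curves have at most a handful of distinct thresholds when k is small.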

Technologies

ML

Scikit-learn, KNeighborsClassifier, GridSearchCV, RandomizedSearchCV

Data

Pandas, NumPy

Viz

Matplotlib, Seaborn

Environment

Python, Jupyter Notebook

Project Information

Status: Completed
Year: 2024
Architecture: ML experimentation with k-NN, Grid/Random search, and evaluation pipeline
Category: Data Science