Heart Disease KNN Modeling
Overview
This project trains and tunes k-NN classifiers for heart disease prediction in two scenarios: baseline preprocessed features from the J1 pipeline, and KNN-imputed feature sets. Hyperparameters are tuned with GridSearchCV for the baseline and RandomizedSearchCV for the KNN-imputed variant. Each model is evaluated on accuracy, precision, recall, and F1, with confusion matrices, classification reports, and ROC and PR curves. A known scikit-learn 1.3 predict quirk is handled by passing NumPy arrays instead of DataFrames.
Key Features
Baseline k-NN on preprocessed heart disease data
KNN-imputed variant with separate tuning
Converts DataFrames to NumPy arrays before calling predict, as the sklearn 1.3 quirk requires
Stratified split preserved from preprocessing stage
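The KNN-imputed variant and the stratified split can be sketched as follows. This is a minimal illustration on synthetic data, not the project's J1 pipeline: the choice of scikit-learn's KNNImputer, the n_neighbors=5 setting, the 80/20 split, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease features (assumption, not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.4).astype(int)

# Knock out roughly 10% of the values to simulate missing measurements.
X[rng.random(X.shape) < 0.1] = np.nan

# Stratify on the target so both splits keep a similar class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the imputer on the training split only, then reuse it on the test split
# so no information from the test rows leaks into the imputation.
imputer = KNNImputer(n_neighbors=5)
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)
```

Fitting the imputer after the split, rather than on the full dataset, keeps the test set untouched until evaluation.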
Technical Highlights
Compared baseline vs KNN-imputed feature sets with tuned k-NN models
Used GridSearchCV for exhaustive search and RandomizedSearchCV to cut compute
Reported full metrics suite and visual diagnostics (confusion matrix, ROC, PR)
Worked around the sklearn 1.3 predict bug by passing NumPy arrays instead of DataFrames
Challenges and Solutions
Hyperparameter Search Scope
Ran an exhaustive grid search on the baseline set and a randomized search on the KNN-imputed set to reduce compute
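The trade-off between the two search strategies can be sketched as below. The parameter ranges, the F1 scoring choice, the scaling step, and the synthetic dataset are all assumptions for illustration; the project's actual search spaces are not stated here. Both searches run on the same data in this sketch purely to make the cost difference visible.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption, not the heart disease set).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# k-NN is distance-based, so features are scaled inside the pipeline.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Hypothetical search space: 15 * 2 * 2 = 60 combinations.
param_space = {
    "knn__n_neighbors": list(range(1, 31, 2)),
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Exhaustive search: every combination is cross-validated.
grid = GridSearchCV(pipe, param_space, scoring="f1", cv=cv)
grid.fit(X, y)

# Randomized search: only n_iter sampled combinations are cross-validated.
rand = RandomizedSearchCV(
    pipe, param_space, n_iter=15, scoring="f1", cv=cv, random_state=0
)
rand.fit(X, y)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print(f"grid: {n_grid} candidates, best F1 {grid.best_score_:.3f}")
print(f"random: {n_rand} candidates, best F1 {rand.best_score_:.3f}")
```

With these settings the randomized search fits a quarter of the candidates. It can miss the best combination in the grid, which is the price paid for the lower compute.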
Predict API Quirk
Avoided the sklearn issue with predict on DataFrames by passing NumPy arrays instead
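A minimal sketch of that workaround, assuming pandas inputs and a plain KNeighborsClassifier. The column names and data are hypothetical. Converting with to_numpy() at both fit and predict time keeps the two calls consistent, so the model never sees feature names in one call and not the other.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical feature frames standing in for the preprocessed data.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(100, 4)), columns=["f0", "f1", "f2", "f3"])
y_train = pd.Series((X_train["f0"] + X_train["f1"] > 0).astype(int))
X_test = pd.DataFrame(rng.normal(size=(20, 4)), columns=X_train.columns)

knn = KNeighborsClassifier(n_neighbors=5)

# Pass plain NumPy arrays instead of DataFrames to fit and predict.
knn.fit(X_train.to_numpy(), y_train.to_numpy())
y_pred = knn.predict(X_test.to_numpy())
```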
Evaluation Coverage
Captured accuracy, precision, recall, F1 plus confusion matrix, ROC, and PR curves
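The metrics listed above can all be computed with sklearn.metrics. The sketch below uses a small hand-made set of labels and scores, not project results, and the 0.5 decision threshold is an assumption. Plotting is omitted; the arrays returned by roc_curve and precision_recall_curve are what a plotting call would consume.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_recall_curve,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)

# Hand-made labels and predicted probabilities for the positive class (illustrative only).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3])

# Hard predictions at an assumed threshold of 0.5.
y_pred = (y_score >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
report = classification_report(y_true, y_pred)

# Curve points are computed from the scores, not the thresholded predictions.
fpr, tpr, _ = roc_curve(y_true, y_score)
prec, rec, _ = precision_recall_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print(metrics)
print(cm)
print(report)
print(f"ROC AUC: {auc:.4f}")
```

The threshold-based metrics depend on the chosen cutoff, while the ROC and PR curves summarize behavior across all cutoffs, which is why both kinds are reported.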
Technologies
ML
Data
Viz
Environment
Project Information
- Status: Completed
- Year: 2024
- Architecture: ML experimentation with k-NN, Grid/Random search, and evaluation pipeline
- Category: Data Science