Detecting Parkinson’s Disease with XGBoost
Build a Parkinson’s disease detection model using XGBoost in this beginner-friendly Python machine learning project.
Detecting Parkinson’s Disease – Python ML Project
What is Parkinson’s Disease?
Parkinson’s disease is a progressive disorder of the central nervous system that affects movement and causes tremors and stiffness. It has five stages and affects over a million people every year in India. It is a chronic neurological condition with no known cure, caused by the degeneration of dopamine-producing neurons in the brain.
What is XGBoost?
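XGBoost stands for eXtreme Gradient Boosting. It is a high-performance library that implements gradient boosting over decision trees: trees are added one at a time, and each new tree is trained to correct the errors of the ensemble built so far. In this project, we will use the XGBClassifier from the xgboost library, which follows the scikit-learn estimator interface (fit and predict) and therefore works with scikit-learn's preprocessing and evaluation tools.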
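To make the idea of boosting concrete, here is a minimal pure-Python sketch of gradient boosting for squared error, using one-split decision stumps. It illustrates the general technique only and is not XGBoost's actual algorithm, which adds regularization, second-order gradient information, and many optimizations. The helper names fit_stump and boost and the toy data are made up for this example.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on xs that minimizes squared error."""
    best = None
    # Skip the largest value so the right-hand side of a split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Toy data (invented): a "bump" that a single split cannot capture.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1, 0, 0]

ensemble = boost(xs, ys)
baseline = sum(ys) / len(ys)
baseline_mse = sum((y - baseline) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - ensemble(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.4f}")
```

The learning rate shrinks each stump's contribution, so the ensemble approaches the targets gradually instead of letting any single tree dominate.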
Project Objective
To develop a model that can accurately detect whether a person has Parkinson’s disease based on a given set of features.
Project Overview
We will use Python libraries like scikit-learn, numpy, pandas, and xgboost to build our model. The key steps will include loading the dataset, extracting features and labels, scaling features, splitting the dataset, training the model with XGBClassifier, and evaluating the model accuracy.
Dataset
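We will use the Parkinson's dataset from the UCI Machine Learning Repository. The dataset contains 195 records and 24 columns: a name identifier, 22 biomedical voice measurements used as features, and the status label (1 for Parkinson's, 0 for healthy).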
Prerequisites
Make sure the following libraries are installed:
bash
pip install numpy pandas scikit-learn xgboost
Also, install JupyterLab to run the project interactively, then launch it:
bash
pip install jupyterlab
jupyter lab
Steps to Detect Parkinson’s Disease with XGBoost
1. Import Required Libraries
python
import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
2. Load the Dataset
python
# Debugshala - Read the data (adjust the path to where parkinsons.data is saved)
df = pd.read_csv('D:\\Debugshala\\parkinsons.data')
df.head()
3. Extract Features and Labels
python
# Debugshala - Get the features and labels
# Features: every column except 'status', skipping the first column ('name')
features = df.loc[:, df.columns != 'status'].values[:, 1:]
labels = df.loc[:, 'status'].values
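The slicing expression above is compact but easy to misread. The sketch below applies it to a tiny made-up table with the same layout (a name column first, a status column in the middle) to show that it drops both non-feature columns. The column values are invented, and drop(columns=...) is offered only as an equivalent, more explicit alternative.

```python
import pandas as pd

# Made-up rows that only mimic the column layout of parkinsons.data.
df_demo = pd.DataFrame({
    "name": ["subject_a", "subject_b", "subject_c"],
    "feature_1": [0.1, 0.2, 0.3],
    "status": [1, 0, 1],
    "feature_2": [10.0, 20.0, 30.0],
})

# Original approach: mask out 'status', then skip the first column ('name').
features_a = df_demo.loc[:, df_demo.columns != "status"].values[:, 1:]

# More explicit alternative: drop both non-feature columns by name.
features_b = df_demo.drop(columns=["name", "status"]).to_numpy(dtype=float)

labels_demo = df_demo["status"].to_numpy()
print(features_a.shape, features_b.shape, labels_demo)
```

One side effect worth knowing: because the string name column is still present when .values is called, the first approach yields an object-dtype array, while the explicit version produces a float array directly.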
4. Check Label Distribution
python
# Debugshala - Count of labels (0 and 1)
print(labels[labels == 1].shape[0], labels[labels == 0].shape[0])
There are 147 positive cases (status 1, Parkinson's) and 48 negative cases (status 0, healthy) in the dataset, so the classes are imbalanced at roughly three to one.
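Because the two classes are not equally represented, it helps to know what accuracy a trivial model would reach. The arithmetic below uses only the counts reported above; the "always predict the majority class" baseline is a hypothetical reference point, not part of the original project.

```python
positives = 147  # status == 1
negatives = 48   # status == 0
total = positives + negatives

positive_share = positives / total
# A model that always predicts the majority class is right on every majority case.
baseline_accuracy = max(positives, negatives) / total

print(f"{total} records, {positive_share:.1%} positive")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2%}")
```

Any trained model should be judged against this baseline rather than against 50%.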
5. Scale the Features
python
# Debugshala - Normalize features between -1 and 1
scaler = MinMaxScaler((-1, 1))
x = scaler.fit_transform(features)
y = labels
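For reference, min-max scaling to a range (lo, hi) maps each value of a feature column with lo + (x - min) * (hi - lo) / (max - min). The sketch below applies that formula to one made-up column in plain Python. The helper min_max_scale is hypothetical and written for illustration; it is not a scikit-learn function.

```python
def min_max_scale(column, lo=-1.0, hi=1.0):
    """Linearly map the values of one column onto the range [lo, hi]."""
    col_min, col_max = min(column), max(column)
    span = col_max - col_min
    if span == 0:
        # Constant column: nothing to rescale. Mapping to lo is an
        # arbitrary choice made for this sketch.
        return [lo for _ in column]
    return [lo + (value - col_min) * (hi - lo) / span for value in column]


# Invented values for a single feature column.
scaled = min_max_scale([100.0, 150.0, 200.0, 250.0, 300.0])
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

The smallest value lands on -1, the largest on 1, and everything else keeps its relative position in between.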
6. Split the Dataset
python
# Debugshala - Split into training and testing sets (80% train, 20% test)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7)
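One caveat with the order used above: fitting the scaler on all rows before splitting lets the test rows influence the scaling parameters. A common alternative, sketched below on synthetic stand-in data, is to split first (optionally with stratify to preserve the class ratio) and fit the scaler on the training rows only. The synthetic arrays and variable names here are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(7)
X_demo = rng.normal(size=(195, 22))      # synthetic stand-in for the feature matrix
y_demo = np.array([1] * 147 + [0] * 48)  # same class counts as reported above

# Split first; stratify keeps the class ratio similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7, stratify=y_demo
)

# Fit the scaler on training rows only, then reuse it on the test rows.
scaler_demo = MinMaxScaler(feature_range=(-1, 1))
X_tr_scaled = scaler_demo.fit_transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print(X_tr.shape, X_te.shape, y_te.mean())
```

With this order, test values can fall slightly outside the -1 to 1 range, which is expected: the test set is being treated as unseen data.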
7. Train the Model
python
# Debugshala - Train using XGBClassifier with default hyperparameters
model = XGBClassifier()
model.fit(x_train, y_train)
8. Make Predictions and Check Accuracy
python
# Debugshala - Predict and evaluate accuracy (as a percentage)
y_pred = model.predict(x_test)
print(accuracy_score(y_test, y_pred) * 100)
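This gives us an accuracy of 94.87% on the held-out test set, that is, 37 of the 39 test records classified correctly. That is a strong result for such a compact implementation, although the exact figure can vary with the library version and the random split.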
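Accuracy alone can hide how the errors are divided between the two classes, which matters when one class is much larger than the other. The sketch below derives confusion-matrix counts, sensitivity, and specificity in plain Python. The label lists are invented for illustration and are not the model's actual predictions; confusion_counts is a hypothetical helper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


# Invented labels, for illustration only.
y_true_demo = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true_demo, y_pred_demo)
accuracy = (tp + tn) / len(y_true_demo)
sensitivity = tp / (tp + fn)  # share of actual positives that were detected
specificity = tn / (tn + fp)  # share of actual negatives that were cleared

print(f"tp={tp} tn={tn} fp={fp} fn={fn}")
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

The same helper can be called with y_test and y_pred from the step above. scikit-learn also provides confusion_matrix and classification_report in sklearn.metrics for the same purpose.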
Summary
In this project, we built a machine learning model that detects Parkinson’s disease from biomedical voice measurements. Using XGBClassifier and scikit-learn tools, we achieved an accuracy of nearly 95% on the test set. This suggests how effective machine learning can be in health diagnostics, even with relatively small datasets and simple models.