Detecting Parkinson’s Disease with XGBoost

Build a Parkinson’s disease detection model using XGBoost in this beginner-friendly Python machine learning project.

Detecting Parkinson’s Disease – Python ML Project

What is Parkinson’s Disease?

Parkinson’s disease is a progressive disorder of the central nervous system that affects movement, causing tremors and stiffness. Its progression is commonly described in five stages, and it is reported to affect over a million people every year in India. It is a chronic neurological condition with no known cure, associated with the degeneration of dopamine-producing neurons in the brain.

What is XGBoost?

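XGBoost stands for eXtreme Gradient Boosting. It is an open-source library that implements gradient boosting over decision trees: trees are added one at a time, each fitted to correct the errors of the ensemble built so far. In this project, we will use the XGBClassifier from the xgboost library, which follows the scikit-learn estimator interface (fit, predict).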
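To make the boosting idea concrete, here is a minimal sketch of gradient boosting for squared error, using small scikit-learn regression trees on made-up data. It is a simplified illustration only, not how XGBoost is implemented internally (XGBoost adds regularization, second-order gradient information, and many optimizations). The data, the learning rate, and the number of rounds are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up 1-D regression data: a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.3  # illustrative value
prediction = np.full_like(y, y.mean())  # start from a constant prediction
initial_mse = np.mean((y - prediction) ** 2)

for _ in range(20):
    residuals = y - prediction  # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)  # each new tree learns to predict those errors
    prediction += learning_rate * tree.predict(X)

final_mse = np.mean((y - prediction) ** 2)
print(f"MSE before boosting: {initial_mse:.4f}, after: {final_mse:.4f}")
```

Each round shrinks the training error a little, which is the core mechanism that XGBClassifier builds on for classification.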

Project Objective

To develop a model that can accurately detect whether a person has Parkinson’s disease based on a given set of features.

Project Overview

We will use Python libraries like scikit-learn, numpy, pandas, and xgboost to build our model. The key steps will include loading the dataset, extracting features and labels, scaling features, splitting the dataset, training the model with XGBClassifier, and evaluating the model accuracy.

Dataset

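We will use the Parkinson's dataset from the UCI Machine Learning Repository. The file contains 195 records and 24 columns: a name identifier for each recording, 22 voice-measurement features, and the status label (1 for Parkinson's, 0 for healthy).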

Prerequisites

Make sure the following libraries are installed:

bash

pip install numpy pandas scikit-learn xgboost

Also, install JupyterLab and launch it to run the project interactively:

bash

pip install jupyterlab
jupyter lab

Steps to Detect Parkinson’s Disease with XGBoost

1. Import Required Libraries

python

import os
import sys
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

2. Load the Dataset

python

# Debugshala - Read the data
df = pd.read_csv('D:\\Debugshala\\parkinsons.data')
df.head()

3. Extract Features and Labels

python

# Debugshala - Get the features and labels
# Drop 'status' by name, then drop the first column ('name') by position
features = df.loc[:, df.columns != 'status'].values[:, 1:]
labels = df.loc[:, 'status'].values
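The extraction above relies on the name column being first. A variant that selects columns by name only might look like the following sketch. The tiny DataFrame is a hypothetical stand-in for the real file, and its values are made up; only the column names are taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real file; the values are made up
toy = pd.DataFrame({
    "name": ["rec_1", "rec_2", "rec_3"],
    "MDVP:Fo(Hz)": [120.0, 150.0, 200.0],
    "status": [1, 1, 0],
    "PPE": [0.30, 0.25, 0.10],
})

# Drop the identifier and the label by name; everything left is a feature
features = toy.drop(columns=["name", "status"]).to_numpy()
labels = toy["status"].to_numpy()

print(features.shape, labels)
```

Selecting by name keeps the code working even if the column order in the file changes.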

4. Check Label Distribution

python

# Debugshala - Count of labels (0 and 1)
print(labels[labels == 1].shape[0], labels[labels == 0].shape[0])

There are 147 positive cases and 48 negative cases in the dataset.
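With classes this unbalanced, it helps to know what accuracy a trivial model would reach. The short calculation below uses only the two counts stated above and shows the score of a model that always predicts the majority class.

```python
# Class counts reported for the dataset
positives, negatives = 147, 48
total = positives + negatives

# A model that always predicts the majority class (status = 1)
baseline_accuracy = max(positives, negatives) / total
print(f"majority-class baseline: {baseline_accuracy * 100:.2f}%")
```

Any trained model should be judged against this baseline of roughly 75%, not against 50%.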

5. Scale the Features

python

# Debugshala - Normalize features between -1 and 1
scaler = MinMaxScaler((-1, 1))
x = scaler.fit_transform(features)
y = labels
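For readers wondering what MinMaxScaler computes, the sketch below reproduces it by hand on a small made-up array: each column is mapped linearly so that its minimum becomes -1 and its maximum becomes 1. The array values are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature matrix: 3 records, 2 features on very different scales
features = np.array([[1.0, 200.0],
                     [2.0, 400.0],
                     [4.0, 800.0]])

low, high = -1, 1
col_min = features.min(axis=0)
col_max = features.max(axis=0)

# Per-column linear rescaling to the range [low, high]
manual = (features - col_min) / (col_max - col_min) * (high - low) + low

# The same result from scikit-learn
scaled = MinMaxScaler(feature_range=(low, high)).fit_transform(features)
print(manual)
```

Tree-based models such as XGBoost split on thresholds, so rescaling usually has little effect on their predictions; the step matters more if you later swap in a distance-based or gradient-based model.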

6. Split the Dataset

python

# Debugshala - Split into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7)
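Two optional refinements are worth knowing about. Passing stratify keeps the class ratio similar in both splits, and fitting the scaler on the training portion only keeps test-set information out of preprocessing. The sketch below shows both on random stand-in data with the same shape and class counts as the dataset; it is a variation on the steps above, not the tutorial's original method.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Random stand-in data: 195 records, 22 features, 147 positive / 48 negative
rng = np.random.default_rng(7)
features = rng.normal(size=(195, 22))
labels = np.array([1] * 147 + [0] * 48)

# stratify=labels preserves the positive/negative ratio in both splits
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=7, stratify=labels
)

# Fit the scaler on the training data only, then apply it to the test data
scaler = MinMaxScaler(feature_range=(-1, 1))
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

print(len(y_train), len(y_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```

Scaled test values can fall slightly outside the range -1 to 1, which is expected when the scaler has not seen the test data.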

7. Train the Model

python

# Debugshala - Train using XGBClassifier
model = XGBClassifier()
model.fit(x_train, y_train)

8. Make Predictions and Check Accuracy

python

# Debugshala - Predict and evaluate accuracy
y_pred = model.predict(x_test)
print(accuracy_score(y_test, y_pred) * 100)
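Accuracy alone does not show which kind of mistake the model makes. A confusion matrix separates missed Parkinson's cases from false alarms. The labels below are hypothetical and chosen only to illustrate the calculation; they are not the model's actual predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only (1 = Parkinson's, 0 = healthy)
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0]

# With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = accuracy_score(y_true, y_pred)
sensitivity = tp / (tp + fn)  # share of true Parkinson's cases detected
specificity = tn / (tn + fp)  # share of healthy cases correctly cleared

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

On the real project, passing y_test and y_pred from the step above gives the same breakdown for the trained model.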

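This gives us an accuracy of 94.87%, that is, 37 of the 39 held-out records classified correctly. That is a strong result for such a compact implementation, though with a test set this small and roughly three positive cases for every negative one, a single accuracy figure should be read with some caution. The exact value may also vary with the installed xgboost version.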
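Because 20% of 195 records is only 39 test records, the reported accuracy corresponds to 37 correct predictions out of 39. One common way to express the uncertainty in such a small sample is a Wilson score interval, sketched below using only the standard library. The helper name wilson_interval is our own, and the 95% level (z = 1.96) is an assumption.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

accuracy = 37 / 39
low, high = wilson_interval(37, 39)

print(f"accuracy: {accuracy * 100:.2f}%")
print(f"approximate 95% interval: {low * 100:.1f}% to {high * 100:.1f}%")
```

The interval spans roughly 83% to 99%, which is a reminder that the headline figure is an estimate from a small test set.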

Summary

In this project, we built a machine learning model that detects Parkinson’s disease from biomedical voice measurements. Using XGBClassifier and scikit-learn tools, we reached an accuracy of nearly 95% on the held-out test set. This suggests that gradient-boosted trees can work well for this kind of diagnostic classification task, even with a relatively small dataset and a simple pipeline.

