CARDIOVASCULAR DISEASE FORECASTING SCHEME

Around 17.3 million people worldwide died of some form of cardiovascular disease (CVD) last year. In the United States, one person dies from CVD every 36 seconds. The number of cases is rising year on year, making CVD the deadliest disease in the world: it accounts for around 32% of all deaths.

Many factors contribute to the disease; the most common are high blood pressure (hypertension) and physical inactivity. Other factors include diabetes, smoking, obesity, an unhealthy diet, and excessive alcohol use. High blood pressure is becoming more common in young people because of their exposure to processed food and a sedentary lifestyle. All of these problems can lead to CVD and can prove fatal.

The main aim of this project is to predict whether a person has CVD using 13 different parameters. The dataset for this project was taken from www.kaggle.com and has a total of 1025 entries. The parameters considered are as follows:

1) Age (in years)

2) Sex (Male/Female)

3) Type of the chest pain

a. Typical angina: chest pain related to decreased blood supply to the heart

b. Atypical angina: chest pain not related to the heart

c. Non-anginal pain: typically, esophageal spasms (not related to the heart)

d. Asymptomatic: chest pain not showing signs of disease

4) Resting blood pressure in mm Hg [above 130-140]

5) Serum cholesterol in mg/dl [above 200]

6) Fasting blood sugar in mg/dl [above 120]

7) Resting electrocardiographic results

8) Maximum heart rate achieved

9) Exercise-induced angina

10) ST depression

11) Slope of the peak exercise ST segment

12) Number of major vessels (0-3) colored by fluoroscopy

13) Thallium stress result

The dataset is explored with bar charts and histograms to gain some insight into the data, and the correlation matrix is also plotted. The figure below shows the correlation matrix of the 13 features.
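To make the correlation matrix concrete, here is a minimal sketch of how pairwise Pearson correlations between features can be computed. The column names and the toy values are made up for illustration and are not taken from the Kaggle dataset; the helper functions `pearson` and `correlation_matrix` are hypothetical names, not part of the project.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map every pair of column names to the correlation of their values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up toy values for illustration only (NOT rows from the real dataset).
toy = {
    "age": [45, 50, 55, 60, 65, 70],
    "max_heart_rate": [180, 172, 160, 158, 150, 140],
    "resting_bp": [120, 135, 128, 142, 138, 150],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in toy])
```

With the real data loaded into a pandas DataFrame, `DataFrame.corr()` would produce the same kind of matrix in a single call, and the result can then be drawn as a heatmap.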

After exploring the dataset, the data moves to the preprocessing stage, where it is standardized using the standard scaler method. After preprocessing, the dataset is split into training and testing data: 257 entries (25% of the total) are reserved for testing, and the remaining 768 entries are used for training.
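The two preprocessing steps can be sketched in plain Python. The sketch assumes that "standard scaler" means rescaling each feature to zero mean and unit variance (which is what scikit-learn's `StandardScaler` does), and that the test size is rounded up, which is how 25% of 1025 rows becomes 257. The function names, the seed, and the sample blood pressure values are illustrative assumptions.

```python
import math
import random
import statistics

def standardize(values):
    """Rescale values to zero mean and unit variance (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def train_test_indices(n_rows, test_fraction=0.25, seed=0):
    """Shuffle row indices and reserve a rounded-up fraction for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    n_test = math.ceil(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

# Illustrative values only, e.g. three resting blood pressure readings.
scaled = standardize([120, 130, 140])
print([round(v, 3) for v in scaled])

train_idx, test_idx = train_test_indices(1025)
print(len(train_idx), len(test_idx))  # 768 257
```

In a real pipeline it is common to compute the mean and standard deviation on the training rows only and reuse them for the test rows, so that no information from the test set leaks into training.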

Finally, the data is ready to be fed into the ML models. The following models are used to predict CVD:

1) K-Neighbors Classifier

2) Decision Tree Classifier

3) Gaussian Naive Bayes Classifier

4) Support Vector Machine Classifier

5) Random Forest Classifier
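As a rough sketch of how these five classifiers can be trained and compared with scikit-learn, the example below uses a synthetic stand-in dataset with the same shape as the one described (1025 rows, 13 features), because the real file is not reproduced here. The hyperparameters are library defaults and the random seeds are arbitrary, so the printed accuracies say nothing about the results of this project. One deliberate difference from the order described above: the scaler is fitted on the training split only, which is a common way to avoid leaking test data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 1025 rows, 13 features, binary target.
X, y = make_classification(n_samples=1025, n_features=13, random_state=0)

# Reserve 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training rows, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "K-Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Record (training accuracy, testing accuracy) for each model.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = (
        model.score(X_train, y_train),
        model.score(X_test, y_test),
    )
    print(f"{name}: train={accuracies[name][0]:.3f}, test={accuracies[name][1]:.3f}")
```

Comparing the training and testing accuracy of each model side by side also helps spot overfitting, for example a tree-based model that scores near 100% on the training rows but noticeably lower on the test rows.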

The training and testing accuracies of all the models are given in the table below.

[Table: training and testing accuracies of the models]

The graph below compares the models. The Random Forest classifier is clearly more accurate at predicting CVD than the other ML algorithms on this particular dataset.

[Figure: comparison of model accuracies]
