
THE DETECTION OF A MALIGNANT TUMOR IN THE BREAST


There are many factors that can contribute to breast cancer. Genetics, family history, and race are important but uncontrollable factors. Being overweight, lack of exercise, smoking, and an unhealthy diet can also contribute to the disease.


There are mainly two types of breast tumors:

  1). Benign

  2). Malignant


A tumor can be benign (not dangerous to health) or malignant (has the potential to be dangerous). Benign tumors are not considered cancerous: their cells are close to normal in appearance, they grow slowly, and they do not invade nearby tissues or spread to other parts of the body.

Malignant tumors are cancerous. Left unchecked, malignant cells eventually can spread beyond the original tumor to other parts of the body.

The term “breast cancer” refers to a malignant tumor that has developed from cells in the breast. 


Stages of breast cancer


Breast cancer stage is usually expressed as a number on a scale of 0 through IV — with stage 0 describing non-invasive cancers that remain within their original location and stage IV describing invasive cancers that have spread outside the breast to other parts of the body.


The main aim of this project is to detect a malignant tumor in its early stages so that the patient has enough time for treatment and a healthy recovery.

For this, the dataset used has 30 different features describing the texture, dimensions, and shape of the tumor. Various models are used to detect whether a tumor is benign or malignant.

The 30 features of the dataset are listed below:

radius_mean

texture_mean

perimeter_mean

area_mean

smoothness_mean

compactness_mean

concavity_mean

concave points_mean

symmetry_mean

fractal_dimension_mean

radius_se

texture_se

perimeter_se

area_se

smoothness_se

compactness_se

concavity_se

concave points_se

symmetry_se

fractal_dimension_se

radius_worst

texture_worst

perimeter_worst

area_worst

smoothness_worst

compactness_worst

concavity_worst

concave points_worst

symmetry_worst

fractal_dimension_worst
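These are exactly the 30 features of the Wisconsin Breast Cancer dataset bundled with scikit-learn, so the data can be loaded directly from sklearn.datasets. A minimal loading sketch; the stratified split and random_state are assumptions rather than the project's exact settings:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Wisconsin Breast Cancer dataset: 569 samples, 30 numeric features,
# target 0 = malignant, 1 = benign.
cancer = load_breast_cancer()
print(cancer.feature_names)   # the 30 features listed above
print(cancer.target_names)    # ['malignant' 'benign']

# Hold out a test set; stratifying keeps the benign/malignant ratio the same
# in both splits. The model sketches below reuse X_train/X_test/y_train/y_test.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)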


Prediction models: The following models are used to classify a breast tumor as benign or malignant.

K-nearest neighbors

Logistic Regression

Decision Tree

Random Forest

Gradient Boosting

SVM

MLP Classification

K-NEAREST NEIGHBORS

The K-Nearest Neighbors algorithm is used to classify the tumor as benign or malignant. The pre-processed data is split into training and testing datasets. The prediction accuracy achieved on the training dataset is 96%, whereas on the testing dataset the achieved accuracy is 92%.


The graph below shows the relation between training and testing accuracy.
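A sketch of how this comparison can be produced with scikit-learn's KNeighborsClassifier, reusing the split from the loading snippet above; sweeping n_neighbors from 1 to 10 is an assumption about how the graph was generated:

import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

train_acc, test_acc = [], []
neighbors = range(1, 11)

for k in neighbors:
    # Fit a KNN model with k neighbors and record accuracy on both splits.
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_acc.append(knn.score(X_train, y_train))
    test_acc.append(knn.score(X_test, y_test))

plt.plot(neighbors, train_acc, label="training accuracy")
plt.plot(neighbors, test_acc, label="testing accuracy")
plt.xlabel("n_neighbors")
plt.ylabel("accuracy")
plt.legend()
plt.show()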


[Figure: knn.png]

LOGISTIC REGRESSION

For Logistic Regression, the results achieved are described below.

Training set accuracy: 95.5%

Testing set accuracy: 93.7%


The graph of coefficient magnitudes for each feature is plotted below.
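A sketch of the corresponding scikit-learn steps; the coefficient plot is read from the fitted model's coef_ attribute (max_iter=10000 is an assumption added so the solver converges on this dataset):

import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(max_iter=10000)
logreg.fit(X_train, y_train)
print("Training set accuracy: {:.3f}".format(logreg.score(X_train, y_train)))
print("Testing set accuracy: {:.3f}".format(logreg.score(X_test, y_test)))

# One coefficient per feature; large magnitudes indicate influential features.
plt.plot(logreg.coef_.T, "o")
plt.xticks(range(30), cancer.feature_names, rotation=90)
plt.ylabel("coefficient magnitude")
plt.show()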


[Figure: lr_edited.png]

DECISION TREE

Accuracy on training set: 0.986

Accuracy on test set: 0.937


The feature importance graph for the Decision Tree is plotted below.
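A sketch with DecisionTreeClassifier; limiting max_depth is an assumption (an unpruned tree memorizes the training set, whereas the reported training accuracy is 98.6%):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(max_depth=4, random_state=0)  # max_depth=4 is an assumed setting
tree.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(tree.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(tree.score(X_test, y_test)))

# Bar chart of how much each of the 30 features contributes to the tree's splits.
plt.barh(np.arange(30), tree.feature_importances_)
plt.yticks(np.arange(30), cancer.feature_names)
plt.xlabel("feature importance")
plt.show()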


[Figure: DT.png]

RANDOM FOREST

The accuracy achieved with Random Forest and the feature importance graph are shown below.
Accuracy on training set: 1.000
Accuracy on test set: 0.958
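A sketch using RandomForestClassifier on the same split; 100 trees is an assumption (scikit-learn's default), and the feature importance graph can be drawn exactly as for the single decision tree:

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(forest.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(forest.score(X_test, y_test)))
# forest.feature_importances_ can be plotted with the same barh code as the decision tree.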


[Figure: RF.png]

GRADIENT BOOSTING

The prediction accuracy achieved on the training dataset is 100%, whereas on the testing dataset the achieved accuracy is 95.8%.
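A sketch using GradientBoostingClassifier; relying on the default settings (100 shallow trees of depth 3, learning rate 0.1) is an assumption:

from sklearn.ensemble import GradientBoostingClassifier

gbrt = GradientBoostingClassifier(random_state=0)
gbrt.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(gbrt.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(gbrt.score(X_test, y_test)))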


[Figure: Gradient boosting.png]

MLP CLASSIFICATION

Accuracy on training set: 0.958
Accuracy on test set: 0.979

The plot shows the weights that were learned connecting the input to the first hidden layer. The rows in this plot correspond to the 30 input features, while the columns correspond to the 100 hidden units. Light colors represent large positive values, while dark colors represent negative values.
One possible inference we can make is that features that have very small weights for all of the hidden units are “less important” to the model. We can see that “mean smoothness” and “mean compactness,” in addition to the features found between “smoothness error” and “fractal dimension error,” have relatively low weights compared to other features. This could mean that these are less important features or possibly that we didn’t represent them in a way that the neural network could use.
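A sketch of the MLP setup and the first-layer weight heatmap described above; scaling the inputs with StandardScaler and max_iter=1000 are assumptions, while the single hidden layer of 100 units matches the description:

import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# MLPs are sensitive to feature scale, so standardize first (assumed preprocessing).
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=0)
mlp.fit(X_train_scaled, y_train)
print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

# Heatmap of the first-layer weights: 30 rows (input features) x 100 columns (hidden units).
plt.figure(figsize=(20, 5))
plt.imshow(mlp.coefs_[0], interpolation="none", cmap="viridis")
plt.yticks(range(30), cancer.feature_names)
plt.xlabel("columns in weight matrix")
plt.ylabel("input feature")
plt.colorbar()
plt.show()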

[Figure: neural network.png]