Identifying the Most Influential Variables in Breast Cancer Using Logistic Regression


Affiliations
1 University of Fallujah, Iraq

Abstract

Breast cancer has recently become the most common cancer and a major cause of death among women worldwide, especially in developing countries such as Iraq. This study aims to identify the features that most strongly influence whether a breast cancer case is classified as benign or malignant.

A predictive model was developed using binary logistic regression, which is expected to help oncologists diagnose the type of breast cancer. The data set was downloaded from the UCI ML Repository and consists of 9 attributes and 683 valid instances.

First, the data were preprocessed and cleansed. Two models were then built using two different logistic regression (LR) methods to determine which one yields the most suitable model and the highest classification rate. The first was the full model with all predictive variables; the second, called the reduced model, used only 5 predictive variables. Each model was validated on a data set different from the one used to develop it. Both the trained and the validated models were evaluated using performance metrics such as ROC curves, AUC, sensitivity, and specificity. The analysis showed that the reduced model is the better classifier, since it gives the higher classification rate.
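
The full-versus-reduced comparison described above can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the study's actual procedure: the data are a synthetic stand-in with the same shape (683 instances, 9 attributes) rather than the UCI data, the 70/30 split is arbitrary, and recursive feature elimination (scikit-learn's `RFE`) is swapped in as one possible way to pick 5 variables, since the abstract does not name the selection method.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in data with the same shape as the study's
# data set (683 instances, 9 attributes). These are NOT the UCI records.
X, y = make_classification(n_samples=683, n_features=9, n_informative=5,
                           n_redundant=2, random_state=0)

# Assumption: a 70/30 split; the abstract only says validation used
# data different from the development data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def evaluate(model, X_eval, y_eval):
    """Return sensitivity, specificity and AUC for a fitted binary classifier."""
    tn, fp, fn, tp = confusion_matrix(y_eval, model.predict(X_eval)).ravel()
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1]),
    }


# Full model: all 9 predictive variables.
full_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Reduced model: 5 variables chosen by recursive feature elimination
# (an assumed stand-in for the study's unspecified selection method).
selector = RFE(LogisticRegression(max_iter=1000),
               n_features_to_select=5).fit(X_train, y_train)
reduced_model = LogisticRegression(max_iter=1000).fit(
    selector.transform(X_train), y_train)

full_scores = evaluate(full_model, X_val, y_val)
reduced_scores = evaluate(reduced_model, selector.transform(X_val), y_val)

print("full model:   ", full_scores)
print("reduced model:", reduced_scores)
```

On synthetic data the two models' scores carry no clinical meaning; the sketch only shows how both models can be scored on the same held-out set with the metrics named in the abstract.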


Keywords

UCI ML Repository, Logistic Regression, Classification, Validation, Breast Cancer.

Authors

Yousra Abdulaziz Mohammed
University of Fallujah, Iraq

DOI: https://doi.org/10.37506/v11%2Fi2%2F2020%2Fijphrd%2F195119