
Developing a Modified Logistic Regression Model for Diabetes Mellitus and Identifying the Important Factors of Type II DM


Affiliations
1 IT Department, Thiagarajar College of Engineering, Madurai - 625005, Tamil Nadu, India
2 Department of IT, KLN College of Information Technology, Sivaganga - 630612, Tamil Nadu, India
 

Background/Objectives: Several methods can be used to build predictive models for clinical data with a binary outcome variable. This research explores the construction of a modified Logistic Regression (LR) predictive model. Method/Statistical Analysis: To improve prediction accuracy, Distance-based Outlier Detection (DBOD) is used for pre-processing, and a Bipolar Sigmoid Function calculated using a Neuro-based Weight Activation Function is used in Logistic Regression in place of the Sigmoid Function. Datasets collected from the clinical laboratory of AR Hospital in Madurai for the three years 2012, 2013 and 2014 are used for the analysis. Data pre-processing is done to remove insignificant data from the dataset. Outliers detected by the DBOD method are treated using a closest-to-normal-range method. A comparative study of different distance measures, such as Euclidean and Manhattan, is carried out for the DBOD method. The pre-processed data is then fed as input to the Logistic Regression model. Maximum likelihood estimation is used to fit the model. The Logistic Model is built from the Sigmoid Function using the regression coefficients. The accuracy of the model is evaluated by 10-fold cross-validation. Findings: The Logistic Model built from the Sigmoid Function using the regression coefficients produces an accuracy of 79%. The Sigmoid Function calculated using the Random Weight Function gives a prediction accuracy of 84.2%, and the Bipolar Sigmoid Function calculated using the Neuro-based Weight Activation Function gives a prediction accuracy of 90.4%. In comparison, the Bipolar Sigmoid Function calculated using the Neuro-based Weight Activation Function outperforms the Sigmoid Function calculated using the regression coefficients. Improvements/Applications: The accuracy of Logistic Regression is improved from 79% to 90.4%.
The most important factors, Erythrocyte Sedimentation Rate (ESR) and Estimation of Mean Blood Glucose, are identified from subjects positive for Diabetes Mellitus. The analysis covers 31 diabetes disease attributes from the three-year dataset.
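
The abstract names its building blocks without defining them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common textbook form of the bipolar sigmoid, (1 - e^(-z)) / (1 + e^(-z)), and a simple neighbour-count rule for distance-based outlier detection. The function names, the `radius` and `min_neighbors` parameters, and the toy records are all hypothetical. How the Neuro-based Weight Activation Function derives its weights is not stated in the abstract and is not modelled here.

```python
import math

def sigmoid(z):
    """Standard logistic sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bipolar_sigmoid(z):
    """Textbook bipolar sigmoid (assumed form); output lies in (-1, 1)."""
    return (1.0 - math.exp(-z)) / (1.0 + math.exp(-z))

def euclidean(a, b):
    """Straight-line distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_outliers(points, radius, min_neighbors, dist=euclidean):
    """Return indices of points that have fewer than `min_neighbors`
    other points within `radius` (an assumed neighbour-count rule)."""
    flagged = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and dist(p, q) <= radius
        )
        if neighbors < min_neighbors:
            flagged.append(i)
    return flagged

# Hypothetical toy records with two made-up features (not the hospital data).
records = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 9.0)]

print(distance_outliers(records, radius=1.0, min_neighbors=2))
print(distance_outliers(records, radius=1.0, min_neighbors=2, dist=manhattan))
print(sigmoid(0.5), bipolar_sigmoid(0.5))
```

Under this assumed form the bipolar sigmoid is a rescaled logistic sigmoid, 2·sigmoid(z) − 1, so a change in accuracy would have to come from how the weights are obtained and how outputs are thresholded rather than from the curve's shape alone.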

Keywords

Bipolar Sigmoid Neuro-Weight Activation Function, Distance based Outlier Detection Method, Logistic Regression, Random Weight Function, Sigmoid Activation Function, Type 2 Diabetes Risk Factors

Authors

M. Nirmala Devi
IT Department, Thiagarajar College of Engineering, Madurai - 625005, Tamil Nadu, India
Appavu alias Balamurugan
Department of IT, KLN College of Information Technology, Sivaganga - 630612, Tamil Nadu, India
M. Reshma Kris
IT Department, Thiagarajar College of Engineering, Madurai - 625005, Tamil Nadu, India





DOI: https://doi.org/10.17485/ijst%2F2016%2Fv9i4%2F130380