
Detection of Projected Outliers from the Higher Dimensional data sets using Extended Kalman Filter and Fuzzy K-Means


Affiliations
1 IKGPTU, Jalandhar, Kapurthala - 144603, Punjab, India
2 RIMTIET (Affiliated to Punjab Technical University), Gobindgarh - 147301, Punjab, India
 

Objectives: The curse of dimensionality and attribute relevance are major concerns when dealing with higher dimensional data sets or Big Data, especially when detecting projected outliers. The objective of this paper is to construct a robust and scalable model that highlights higher dimensional outliers effectively and efficiently. Methods/Analysis: To detect the projected outliers, a hybrid algorithm, EKFFK-Means, is constructed from two methodologies: the Extended Kalman Filter (EKF) and Fuzzy K-Means. The EKF is used to linearize the higher dimensional data by estimating the current mean and covariance and enhancing the Kalman gain; Fuzzy K-Means then confirms the outlying property of each data instance and categorizes the instances using the membership label. Findings: The EKFFK-Means model creates 30 clusters from the complete data set to detect the projected outliers, and parameters such as accuracy, cluster validity, true positive rate, false positive rate, robustness and cluster quality are calculated. Improvements: The algorithm is compared with HPStream and CLUStream and is found to perform better on various parameters.
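The membership-based step mentioned under Methods/Analysis can be illustrated with a minimal sketch of standard fuzzy K-Means (fuzzy c-means) in plain Python. This is not the authors' EKFFK-Means implementation: the EKF stage is omitted, and the function names, the fuzzifier `m = 2.0`, the toy data and the rule "flag a point whose highest membership is below 0.6" are all assumptions made for illustration only.

```python
import math


def memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which points[i] belongs to centers[j]."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to that center.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            # Standard fuzzy c-means membership; each row sums to 1.
            row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
                   for dj in dists]
        u.append(row)
    return u


def update_centers(points, u, m=2.0):
    """Recompute each center as the mean of the points weighted by u**m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_k_means(points, centers, m=2.0, iterations=20):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


def flag_ambiguous(u, threshold=0.6):
    """Assumed rule: flag points that no cluster claims strongly."""
    return [i for i, row in enumerate(u) if max(row) < threshold]


# Toy data (hypothetical): two tight groups plus one point between them.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5.5, 5.5)]
centers, u = fuzzy_k_means(points, centers=[(0, 0), (11, 11)])
flagged = flag_ambiguous(u)
print(flagged)
```

Because fuzzy memberships sum to 1 for each point, an instance far from every cluster ends up with its membership spread evenly, which is why a low maximum membership is used here as a simple outlier indicator. The abstract does not state how the paper turns membership labels into outlier decisions, so the threshold rule should be read as one plausible choice rather than the published method.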

Keywords

Clustering, Projected Outliers, Robustness, Scalability, Unsupervised.

Authors

Kamal Malik
IKGPTU, Jalandhar, Kapurthala - 144603, Punjab, India
Harsh Sadawarti
RIMTIET (Affiliated to Punjab Technical University), Gobindgarh - 147301, Punjab, India


DOI: https://doi.org/10.17485/ijst%2F2016%2Fv9i26%2F135186