
Quick Matching of Big Binary Data: A Probabilistic Approach


Affiliations
1 Department of Mechanical Engineering, Kuwait University, P. O. Box 5969 - Safat - Kuwait 13060, Kuwait
 

Given two sets of binary data, how can we determine whether they are dissimilar? The simplest techniques are to subtract one set from the other or to calculate the correlation between them. Both of these methods, like most others, require some similarity operation to be applied to every point of the data, so processing time grows as the data gets bigger. In this paper, we present a novel approach to matching based on a probabilistic model that needs to compare only a few points, rather than all points, between two data sets to detect dissimilarity. Furthermore, the model is size invariant: big data can be matched just as quickly as small data. The similarity between the data can also be measured to a good degree by repeating the matching process several times.
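
The idea of sampling a few points can be illustrated with a minimal Python sketch. It is not the paper's model, whose details are not given in this abstract; it only assumes uniform random sampling with replacement over two equal-length binary sequences. The names `probably_dissimilar`, `estimate_similarity`, `n_points`, and `trials` are hypothetical. Under this assumption, if a fraction s of positions agree, the chance that k sampled positions all agree is s^k, so the work depends on k and not on the size of the data.

```python
import random


def probably_dissimilar(a, b, n_points=16, rng=None):
    """Compare a few randomly chosen positions of two binary sequences.

    Returns True as soon as one sampled position differs (the data are
    certainly dissimilar). Returns False if every sampled position agrees,
    which only suggests similarity; it does not prove it.
    Illustrative sketch only, not the paper's algorithm.
    """
    if len(a) != len(b):
        raise ValueError("data sets must have the same length")
    rng = rng or random.Random()
    for _ in range(n_points):
        i = rng.randrange(len(a))  # uniform sampling with replacement
        if a[i] != b[i]:
            return True
    return False


def estimate_similarity(a, b, trials=1000, rng=None):
    """Estimate the fraction of agreeing positions from repeated sampling."""
    if len(a) != len(b):
        raise ValueError("data sets must have the same length")
    rng = rng or random.Random()
    agree = 0
    for _ in range(trials):
        i = rng.randrange(len(a))
        agree += a[i] == b[i]  # bool counts as 0 or 1
    return agree / trials


if __name__ == "__main__":
    zeros = [0] * 1_000_000
    ones = [1] * 1_000_000
    print(probably_dissimilar(zeros, ones))   # every position differs
    print(probably_dissimilar(zeros, zeros))  # no position differs
    print(estimate_similarity(zeros, ones))
```

The number of comparisons is fixed by `n_points` or `trials`, which is why the cost of this sketch does not grow with the length of the inputs.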

Keywords

Big Data, Binary Data, Binary Matching, Pattern Matching, Probabilistic Model.
Authors

Adnan A. Y. Mustafa



DOI: https://doi.org/10.17485/ijst%2F2016%2Fv9i28%2F132485