
Big Data Privacy Preservation Using Two Phase Top-Down Specialization Algorithm with Multidimensional Map Reduce Framework on Hadoop


Affiliations
1 Department of Computer Science and Engineering, St. Joseph's College of Engineering and Technology, Palai, Kerala, India
     



Big data privacy preservation is one of the most pressing concerns in industry today. Privacy problems often go unidentified when input data is published in a cloud environment. Privacy preservation in Hadoop deals with hiding sensitive parts of the input data set and publishing it to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and execution time. At present, many cloud applications that anonymize big data face the same kinds of problems. To address them, this paper introduces a data anonymization algorithm, Two-Phase Top-Down Specialization (TPTDS), implemented in Hadoop. The input was the Adult data set, comprising 45,222 records with 15 attributes. The proposed TPTDS algorithm is implemented with multidimensional anonymization in the Map Reduce framework, which increases the efficiency of the big data processing system. Experiments were conducted with TPTDS on Hadoop in both one-dimensional and multidimensional Map Reduce frameworks; multidimensional anonymization gave the better result on the Adult data set, as indicated by the better information gain per privacy loss (IGPL) values generated by the algorithm. Data sets are generalized in a top-down manner, and anonymization is performed through specialization operations on a taxonomy tree. The experiments show that, compared with the existing algorithm, the solution improves the IGPL values and the anonymity parameter and decreases the execution time of big data privacy preservation. These results are expected to be widely applicable in distributed environments.

Keywords

Big Data, Cloud Computing, Data Anonymization, Map Reduce, Privacy Preservation, Top Down Specialization.
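
The IGPL score mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual top-down specialization definition, IGPL = IG / (PL + 1), where IG is the information gain of a specialization with respect to a class label and PL is the drop in anonymity (smallest group size) it causes. The taxonomy, the records, and all function names below are hypothetical and chosen only for illustration; this is not the authors' Hadoop implementation, and the counts that Map Reduce jobs would produce are computed in memory here.

```python
from collections import Counter
from math import log2

# Hypothetical taxonomy tree for one attribute: parent -> child values.
TAXONOMY = {
    "Any": ["Secondary", "University"],
    "Secondary": ["9th", "12th"],
    "University": ["Bachelors", "Masters"],
}


def leaves(value):
    """Return the raw (leaf) values covered by a taxonomy node."""
    children = TAXONOMY.get(value)
    if not children:
        return [value]
    return [leaf for child in children for leaf in leaves(child)]


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def anonymity(groups):
    """Smallest non-empty group size, i.e. the k of k-anonymity."""
    return min(len(g) for g in groups if g)


def igpl(records, parent):
    """Score the specialization parent -> children.

    records: list of (raw_value, class_label) pairs that are all
    currently generalized to `parent`.
    """
    parts = [
        [r for r in records if r[0] in leaves(child)]
        for child in TAXONOMY[parent]
    ]
    labels = [label for _, label in records]
    # Information gain: entropy before minus weighted entropy after.
    info_gain = entropy(labels) - sum(
        len(p) / len(records) * entropy([label for _, label in p])
        for p in parts
    )
    # Privacy loss: anonymity before minus anonymity after.
    privacy_loss = anonymity([records]) - anonymity(parts)
    return info_gain / (privacy_loss + 1)


# Hypothetical records: (education, income class).
records = [
    ("9th", "<=50K"),
    ("12th", "<=50K"),
    ("Bachelors", ">50K"),
    ("Masters", ">50K"),
]
score = igpl(records, "Any")
```

In a top-down pass, the candidate specialization with the highest score would be applied next, as long as the resulting anonymity stays at or above the required k.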




Authors

S. Shalin Eliabeth
Department of Computer Science and Engineering, St. Joseph's College of Engineering and Technology, Palai, Kerala, India
S. Sarju
Department of Computer Science and Engineering, St. Joseph's College of Engineering and Technology, Palai, Kerala, India

