
Experimental Setup of Logs Analysis on Distributed File Systems using MapReduce


Affiliations
1 Department of Computer Science, Vivekanand Institute of Education Society’s Arts, Science and Commerce College, Mumbai – 400071, Maharashtra, India
2 Department of Computer Science, HVPM, Amravati – 444605, Maharashtra, India
 

The computing world is undergoing a drastic change from traditional non-centralized distributed system architectures, with parallel and pseudo-distributed nodes scattered across different geographic areas, to a centralized cloud computing architecture in which data transformation and computation can take place on any node. Data centres may be owned and maintained by a third party, or a cloud can be formed and maintained from a number of physical machines. These machines can have different configurations, or they can be virtual machines that communicate with each other over a shared LAN. Performance has been observed to differ consistently when a MapReduce program is run on different inputs and on different Distributed File Systems (DFS).

The use case considered here is the analysis of security logs generated on a server machine. To run the program, a mini-cloud was configured on a LAN. A MapReduce program was used to analyse the data generated by the security software, and it was tested on several DFS: Hadoop, Ceph, GlusterFS and ZFS. These DFS were installed on three kinds of infrastructure: a single virtual machine, a cluster of virtual machines and the mini-cloud. In these experiments, MapReduce was found to be the best-suited technique for log analysis and computation.
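
As an illustration of the kind of MapReduce log analysis described above, the following is a minimal single-process sketch in Python. It is not the authors' program: the log line format, the sample data and the names map_phase, shuffle, reduce_phase and run_job are all assumptions made for this example, since the abstract does not specify them. The sketch counts security events by type, with the map step emitting (event type, 1) pairs and the reduce step summing them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical log lines: the abstract does not give the real log format.
# Assumed layout: <date> <time> <level> <event_type> <details...>
SAMPLE_LOGS = [
    "2017-01-05 10:00:01 WARN login_failed user=alice",
    "2017-01-05 10:00:07 INFO login_ok user=bob",
    "2017-01-05 10:01:12 WARN login_failed user=alice",
    "malformed line",
    "2017-01-05 10:02:30 ERROR port_scan src=10.0.0.7",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (event_type, 1) for each well-formed log line."""
    fields = line.split()
    if len(fields) >= 4:  # skip lines that do not match the assumed layout
        yield fields[3], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one event type."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence within one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(SAMPLE_LOGS))
```

On a real cluster the map and reduce functions would be distributed by the framework and the input would be read from the DFS under test. This sketch only shows the data flow between the phases.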


Keywords

Ceph, Gluster, Hadoop, Logs, MapReduce.

Authors

Madhavi Vaidya
Department of Computer Science, Vivekanand Institute of Education Society’s Arts, Science and Commerce College, Mumbai – 400071, Maharashtra, India
Shrinivas Deshpande
Department of Computer Science, HVPM, Amravati – 444605, Maharashtra, India




DOI: https://doi.org/10.17485/ijst%2F2017%2Fv10i29%2F158599