
Online Incremental Learning for High Bandwidth Network Traffic Classification


Affiliations
1 Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia
 

Data stream mining techniques can classify evolving data streams, such as network traffic, in the presence of concept drift. To classify high-bandwidth network traffic in real time, data stream mining classifiers need to be implemented on a reconfigurable high-throughput platform, such as a Field Programmable Gate Array (FPGA). This paper proposes an algorithm for online network traffic classification based on incremental k-means clustering, which learns continuously from both labeled and unlabeled flow instances. Two distance measures for incremental k-means, Euclidean and Manhattan distance, are analyzed to measure their impact on network traffic classification in the presence of concept drift. Experimental results on real datasets show that the proposed algorithm performs consistently, with up to 94% average accuracy for both distance measures, even in the presence of concept drift. The proposed incremental k-means classification using Manhattan distance can classify network traffic 3 times faster than with Euclidean distance, at 671 thousand flow instances per second.
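The idea of incremental k-means with a selectable distance measure can be illustrated with a minimal software sketch. This is not the authors' algorithm or their FPGA design: the class name `IncrementalKMeans`, the running-mean centroid update, and the rule that a labeled instance updates the nearest centroid of its own class are all assumptions made here for illustration only.

```python
import numpy as np


def distance(x, centroids, metric="manhattan"):
    """Distance from one flow instance x to every centroid (one row each)."""
    diff = centroids - x
    if metric == "manhattan":
        # Only subtractions, absolute values, and additions.
        return np.abs(diff).sum(axis=1)
    # Euclidean distance additionally needs multiplications and a square root.
    return np.sqrt((diff ** 2).sum(axis=1))


class IncrementalKMeans:
    """Hypothetical sketch: one class label per centroid, updated per instance."""

    def __init__(self, centroids, labels, metric="manhattan"):
        self.centroids = np.asarray(centroids, dtype=float).copy()
        self.labels = list(labels)
        self.counts = np.ones(len(self.labels))
        self.metric = metric

    def _nearest(self, x, candidates=None):
        # Index of the closest centroid, optionally restricted to candidates.
        if candidates is None:
            candidates = list(range(len(self.labels)))
        d = distance(x, self.centroids[candidates], self.metric)
        return candidates[int(np.argmin(d))]

    def predict(self, x):
        return self.labels[self._nearest(np.asarray(x, dtype=float))]

    def learn(self, x, label=None):
        x = np.asarray(x, dtype=float)
        if label is None:
            # Unlabeled flow: update whichever centroid is closest (assumption).
            j = self._nearest(x)
        else:
            same = [i for i, l in enumerate(self.labels) if l == label]
            if not same:
                # Unseen class: start a new cluster at this instance (assumption).
                self.centroids = np.vstack([self.centroids, x])
                self.labels.append(label)
                self.counts = np.append(self.counts, 1.0)
                return
            j = self._nearest(x, same)
        # Running-mean update, so the centroid can follow gradual drift.
        self.counts[j] += 1
        self.centroids[j] += (x - self.centroids[j]) / self.counts[j]


model = IncrementalKMeans([[0, 0], [10, 10]], ["web", "p2p"])
model.learn([2, 2])                # unlabeled instance
model.learn([8, 8], label="p2p")   # labeled instance
print(model.predict([1, 1]))
```

In this sketch the Manhattan branch avoids multiplications and the square root, which is one plausible reason such a measure is cheaper to compute in hardware; the throughput figures reported in the abstract come from the authors' own implementation, not from this code.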

Authors

H. R. Loo
Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia
S. B. Joseph
Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia
M. N. Marsono
Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia
