
Machine Learning Based Architecture for Rule Establishment of Web Proxy Server


Affiliations
1 Department of Computer Science & Engineering, Jaypee University of Engineering & Technology, Guna, Madhya Pradesh, India
2 Department of Information Technology & MCA, B.I.T. Mesra, Ranchi, Jharkhand, India
3 Department of Computer Science and Engineering, B.I.T. Extension Centre, Lalpur, Ranchi, Jharkhand, India
     



The Internet has become an integral part of everyone's life: services such as mail, news and chat are widely used, and huge amounts of information on almost any subject are available. In most cases, however, the bandwidth available to connect to the Internet is limited, so it must be used efficiently and, more importantly, productively. Bandwidth is generally distributed among groups of users according to some policy constraints. In practice, users do not use their entire allocated bandwidth at all times, and sometimes they need more than they have been allocated. Ideally, productive usage should be preferred over unproductive usage when bandwidth is scarce; when it is abundant, any kind of use can be permitted provided it is consistent with policy. Users' bandwidth usage patterns vary with the time of day, the time of year and their requirements. There is therefore a need for dynamic bandwidth allocation that satisfies users' requirements, manages variable usage and remains consistent with the administrative usage policy. Internet usage is varied, and in the context of an institution or organization an administrator would like to maximize productive usage. This calls for access control policies that prevent unproductive use while, to the extent possible, avoiding censorship. Squid is a full-featured web proxy server that increases the efficiency of an Internet link by providing caching and proxy services, and it offers many mechanisms for setting access control policies. However, deciding which policies to implement requires experimentation and usage statistics that must be processed to obtain useful data. The architecture proposed in this paper uses machine learning to determine policies based on the content of the URLs currently being visited.
The main component of this architecture is the Squid Traffic Analyzer, which classifies traffic and generates URL lists that are then used to formulate access policies. The paper also introduces the concept of delay priority, which gives system administrators more options when setting bandwidth management policies. Finally, because Squid allows HTTP tunneling, which creates a loophole for strict policy management, the paper examines proxy tunneling in Squid and suggests some possible solutions to this problem.
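The abstract does not name the learning algorithm or the features used to classify visited URLs. The following is a minimal Python sketch, assuming a multinomial Naive Bayes classifier with add-one smoothing, two hypothetical labels (`productive` and `unproductive`), and the tokens of the URL itself standing in for page content. It also shows how the URL can be pulled from a line of Squid's native `access.log` format, where the URL is the seventh whitespace-separated field. The training URLs and the log line are invented for illustration; this is not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text: str) -> list[str]:
    """Lower-case alphanumeric tokens of a URL or a page's text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def url_from_log_line(line: str) -> str:
    """URL field (7th column) of a line in Squid's native access.log format."""
    return line.split()[6]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts = Counter()              # label -> number of training documents
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()                       # every token seen in training

    def train(self, text: str, label: str) -> None:
        words = tokens(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text: str) -> str:
        """Return the label with the highest log posterior for `text`."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokens(text):
                score += math.log((counts[word] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled examples; a real deployment would need far more data.
nb = NaiveBayes()
nb.train("http://docs.python.org/3/library/re.html", "productive")
nb.train("http://scholar.example.edu/papers/networks", "productive")
nb.train("http://videos.example.com/watch/funny-cats", "unproductive")
nb.train("http://games.example.com/play/arcade", "unproductive")

# Invented log line: time elapsed client code/status bytes method URL ident hierarchy type
line = ("1066036250.603 310 192.168.0.10 TCP_MISS/200 1034 GET "
        "http://games.example.com/play/puzzle - DIRECT/10.0.0.1 text/html")
print(nb.classify(url_from_log_line(line)))  # unproductive
```

Working in log space avoids underflow when many token probabilities are multiplied, and add-one smoothing keeps a token unseen in one class from forcing that class's score to zero.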
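The abstract does not say how the generated URL lists, delay priority or the tunneling countermeasures map onto Squid directives. The Python sketch below shows one possible mapping under stated assumptions: classified URLs are reduced to host names for a file-based `dstdomain` ACL, "delay priority" is approximated by a class 1 Squid delay pool that throttles (rather than blocks) traffic to that list, and tunneling is partially contained by the common `http_access deny CONNECT !SSL_ports` rule. The ACL name `unproductive`, the file path and the rate values are hypothetical, and the mapping is an illustration rather than the authors' method.

```python
from pathlib import Path
from urllib.parse import urlsplit


def domains_for(urls_by_label: dict[str, list[str]], label: str) -> list[str]:
    """Sorted, de-duplicated host names of the URLs filed under `label`."""
    hosts = {urlsplit(url).hostname for url in urls_by_label.get(label, [])}
    return sorted(h for h in hosts if h)


def write_domain_list(path: str, domains: list[str]) -> None:
    """Write one domain per line, the format a file-based dstdomain ACL reads."""
    Path(path).write_text("\n".join(domains) + "\n")


def squid_fragment(list_path: str, restore_bps: int, max_bytes: int) -> str:
    """Build a squid.conf fragment (hypothetical ACL name, path and rates)."""
    lines = [
        # Domains classified as unproductive, loaded from an external file.
        f'acl unproductive dstdomain "{list_path}"',
        # One class 1 delay pool: a single aggregate bucket for all matching requests.
        "delay_pools 1",
        "delay_class 1 1",
        # Refill rate (bytes/s) and bucket size (bytes) of the aggregate bucket.
        f"delay_parameters 1 {restore_bps}/{max_bytes}",
        # Only unproductive traffic enters the pool; other requests are not delayed.
        "delay_access 1 allow unproductive",
        "delay_access 1 deny all",
        # Permit CONNECT tunnels only to the standard HTTPS port.
        "acl SSL_ports port 443",
        "acl CONNECT method CONNECT",
        "http_access deny CONNECT !SSL_ports",
    ]
    return "\n".join(lines) + "\n"


urls_by_label = {
    "unproductive": [
        "http://games.example.com/play/arcade",
        "http://videos.example.com/watch/funny-cats",
        "http://games.example.com/play/puzzle",
    ],
}
print(domains_for(urls_by_label, "unproductive"))
# ['games.example.com', 'videos.example.com']
print(squid_fragment("/etc/squid/unproductive.domains", 16000, 16000))
```

Restricting `CONNECT` to port 443 only narrows the loophole: a tunnel to an external server that listens on port 443 still passes, which is why tunneling needs separate treatment. The generated `http_access` line must also appear before any broader `http_access allow` rules in `squid.conf`, because Squid applies the first matching rule.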

Keywords

Web Proxy, Machine Learning, Network Traffic, Metadata



Authors

P. S. Banerjee
Department of Computer Science & Engineering, Jaypee University of Engineering & Technology, Guna, Madhya Pradesh, India
G. Sahoo
Department of Information Technology & MCA, B.I.T. Mesra, Ranchi, Jharkhand, India
Umesh Prasad
Department of Computer Science and Engineering, B.I.T. Extension Centre, Lalpur, Ranchi, Jharkhand, India
