
Link Spam Analysis Using Hybrid Techniques


Affiliations
1 Bharati Vidyapeeth College of Engineering Sector-7, C.B.D, Belpada, Navi Mumbai-400614, India
 

Link spam is a form of spamming that has most often been publicized when it targets weblogs, but it also affects wikis, guest books, and online discussion boards. Any application or blog that accepts or displays hyperlinks entered by users or visitors may contain malicious spam links that target other visitors. Such hyperlinks can raise the ranking of the respective page in the Google search engine. As a result, other relevant web pages may be suppressed, because the malicious pages (with inflated page ranks) appear at the top of the search results. Several approaches exist for detecting such spam links, and the techniques used in link spam analysis are either content-based or link-based. In this report, we implement both techniques to improve the efficiency of detecting spam links. A crawler is used to extract the links attached to a web page. These URLs are then studied using content-based algorithms, and the resulting content is used to process the web page. Based on these results, we determine whether the entered URL contains spam links.
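
The link-extraction step described above can be illustrated with a minimal sketch. It is not the authors' crawler: the names `LinkExtractor`, `extract_links`, and `external_links` are hypothetical, the example uses only the Python standard library, and it assumes the page's HTML has already been downloaded. It collects the hyperlinks on a page and separates those pointing to other hosts, which a later content-based or link-based check could then examine.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags, resolved against a base URL.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all hyperlinks found in the given HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def external_links(links, base_url):
    """Keep only links whose host differs from the host of base_url."""
    host = urlparse(base_url).netloc
    return [link for link in links if urlparse(link).netloc not in ("", host)]
```

For example, calling `extract_links(html, "http://blog.example/post/1")` on a page that links to `/about` and to `http://spam.example/win` yields both absolute URLs, and `external_links` keeps only the second. Whether an external link is actually spam is left to the detection techniques the report describes.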

Keywords

Spamdexing, Web Crawler.

  • Rajendra Kumar Roul, Shubham Rohan Asthana, Mit Shah, and Dhruvesh Parikh, “Detection of spam web page using content and link-based techniques: A combined Approach,” BITS, Pilani-K.K.Birla Goa Campus, Goa, India, Sadhana, Vol. 41, No. 2, pp. 193-202, 2 February 2016.
  • Kiran Hunagund, Santosh Kumar K L, “Spam Web Page Detection based on Content and Link Structure of the Site,” Department of CS&E, Nitte Meenakshi Institute of Technology, Bangalore, India, Vol. 4, Issue 8, August 2015.
  • Jing Wan, Mufan Liu, Xuechao Zhang, “Detecting Spam WebPages through Topic and Semantic Analysis,” Beijing, China, 2015.


Authors

Vinita Kule
Bharati Vidyapeeth College of Engineering Sector-7, C.B.D, Belpada, Navi Mumbai-400614, India
Darshana Ranbagle
Bharati Vidyapeeth College of Engineering Sector-7, C.B.D, Belpada, Navi Mumbai-400614, India
Reshma Dabade
Bharati Vidyapeeth College of Engineering Sector-7, C.B.D, Belpada, Navi Mumbai-400614, India


References