Link spam is a form of spamming that has drawn the most attention for targeting weblogs, but it also affects wikis, guest books, and online discussion boards. Any application or blog that accepts and displays hyperlinks entered by its users or visitors may end up containing malicious spam links, which turns its other visitors into targets. Such hyperlinks can raise the ranking of the linked pages in the Google search engine. As a result, these malicious pages, with their inflated ranks, can appear at the top of the search results and push down other, more relevant pages. Several approaches exist for detecting such spam links. The techniques used in link spam analysis are either content-based or link-based. In this report, we implement both techniques to improve the efficiency of spam link detection. A crawler is used to extract the links attached to a web page. These URLs are then studied using content-based algorithms, and the extracted content is used to process the web page. Based on the results, we determine whether the entered URL contains spam links.
Keywords
Spamdexing, Web Crawler.
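
The link-extraction step performed by the crawler can be illustrated with a minimal sketch. It assumes the page HTML has already been downloaded, and it uses only the Python standard library. The names LinkExtractor, extract_links, and external_links are hypothetical and chosen for illustration; they are not taken from the report's implementation, and the content-based analysis itself is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags from one HTML page (illustrative)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every hyperlink found in the given HTML, as absolute URLs."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def external_links(links, base_url):
    """Keep only links that point to a different host than the page.

    Assumption: outbound links are the candidates handed to later analysis.
    """
    host = urlparse(base_url).netloc
    return [link for link in links if urlparse(link).netloc not in ("", host)]
```

The list returned by external_links would then be passed to whichever content-based or link-based checks are applied; restricting the candidates to outbound links is a design choice of this sketch, not a requirement of the approach.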