Vitavonni

Thu, 26 Nov 2009

Identifying Link Spammers via nofollow links

I wonder if it's possible to identify link spammers (you know, those bots that mass-submit a link to as many blogs etc. as they can find in order to boost their PageRank) by the simple measure of how many of the links to their site are marked 'nofollow'.

Say, a regular page should have fewer than 5% (and fewer than 20 in absolute terms) of its inbound links marked 'nofollow'; a site that goes significantly above these values probably employs some spam bot.
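As a rough sketch, the check could look like the Python below. The 5% and 20 thresholds are just my guesses from above, the `InboundLinks` tally and the function name are made up for illustration, and I'm assuming a site is only suspicious when it exceeds both the share and the absolute count; none of this reflects how any real search engine works.

```python
from dataclasses import dataclass

# Thresholds are the rough guesses from the post, not tuned values.
MAX_NOFOLLOW_RATIO = 0.05
MAX_NOFOLLOW_COUNT = 20


@dataclass
class InboundLinks:
    """Hypothetical per-site tally of inbound links, split by rel="nofollow"."""
    nofollow: int
    followed: int

    @property
    def nofollow_ratio(self) -> float:
        total = self.nofollow + self.followed
        # A site without any inbound links has nothing to judge.
        return self.nofollow / total if total else 0.0


def looks_like_link_spam(links: InboundLinks) -> bool:
    """Flag a site only when both the share and the absolute count are exceeded.

    Assumption: requiring both keeps tiny sites (one nofollow link out of two)
    and huge sites (many nofollow links, but a small share) from being flagged.
    """
    return (links.nofollow_ratio >= MAX_NOFOLLOW_RATIO
            and links.nofollow >= MAX_NOFOLLOW_COUNT)
```

Swapping the `and` for an `or` gives the stricter reading, where exceeding either limit is enough.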

The only really hard part is how to avoid attacks on a site using this. Say I write a bot that spams links to Microsoft on as many sites as it can find that DO use 'nofollow', in order to push that site above the limit and have Google penalize it.

So in general I don't think Google would automatically penalize such things. Still, it could be used to, e.g., have a human check the destination site for useful content, and blacklist it only when it doesn't seem to be useful.

P.S. This, BTW, is a reason why some of the SEO "do nots" are bullshit: it would be too easy to deliberately use them to blacken a competitor. So a 'link farm' will at most do nothing to raise your ranking; but Google must not allow you to actually lower a competitor's ranking by setting up a link farm pointing to them!

P.P.S. On another side note: Who guarantees that Google actually ignores "nofollow" links? They could also just be assigned a lower weight or a penalty, so that a "nofollow" link from a strong site such as Wikipedia would still be worth a lot, while the link from an average blog comments page goes down to 0. Say a "nofollow" link from a PR 6 site is worth as much as a regular link from a PR 4 site, and PR 2 becomes PR 0. That would already do much of the trick in discouraging the use of blog spam bots. Because after all, ignoring the links on Wikipedia for PageRank would be quite stupid. In the German Wikipedia, the page contents are even "sighted" (i.e. peer reviewed); this is a rather trustworthy source, especially when you take time effects into account. A link that stays on a popular Wikipedia page for more than a month is very likely good.
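The penalty idea is simple enough to write down. This is only a sketch of my example numbers (two PR levels off, floored at 0); the constant and function name are made up, and I have no idea whether Google does anything like it.

```python
# Assumed penalty: a "nofollow" link counts as if its source ranked two levels lower.
NOFOLLOW_PENALTY = 2


def effective_link_rank(source_rank: int, nofollow: bool) -> int:
    """Return the rank a link is treated as coming from.

    Regular links keep the source's rank; "nofollow" links lose
    NOFOLLOW_PENALTY levels and never drop below 0.
    """
    if not nofollow:
        return source_rank
    return max(0, source_rank - NOFOLLOW_PENALTY)
```

With these numbers, `effective_link_rank(6, nofollow=True)` gives 4 and `effective_link_rank(2, nofollow=True)` gives 0, so a Wikipedia link keeps most of its value while a typical blog comment link is worth nothing.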

[category: /en/web | Permalink]