Over the past year, I've seen pretty much every type of spam the major SEs deal with. I've also seen major improvements in fighting spam in 2005. There was spam of every kind: domain hijacking, re-purposed content (ODP/Wikipedia), dynamic content generators, link spam, splogs, etc. (FYI if you want a good overview of web spam, check out Marc Najork's WebSpam presentation to the UC Sims 141 class.) However, one of the most difficult and insidious spam techniques from a search engine's perspective is cloaking. Cloaking means sending search engine spiders different content than users see when they visit the page. Search Engine World has a good overview of the different types of cloaking spam.
My two most memorable spammers of 2005 are two cloaking sites that we caught: http://www.cold-forming-company.com and http://www.metal-cold-forming.com. Both of these sites use referrer-based cloaking. It's really sneaky. Let me illustrate:
The URL: www.cold-forming-company.com/coldforgedsteels.htm
Now look at the cloaked version you get when you arrive from MSN search results:
Now, I won't pass judgment on Dogpile, but the webmaster doesn't seem to get any tangible benefits, unlike Google AdSense spammers. Dogpile, on the other hand, gets to leech off of other search engines' results and gain traffic and therefore money. I'll let you investigate and draw your own conclusions.
Here's how the spammers operate in this spam network. They use referrer-based cloaking. When you come in from a search engine, they detect the external referrer (and direct navigation with no referrer) and inject a few seemingly innocuous lines of HTML into the page. Here is the code they inject (with the opening < removed):
frame src="'http://www.dogpile.com/info.rawhd/redirs_all.htm?pgtarg=" qkw="site%3acold%20forming%20company%20cold%20forged&qcat=">
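For anyone who wants to check a page for this kind of behavior, here is a rough sketch of one way to do it: request the same URL twice with different Referer headers, diff the two responses, and look for frame tags that only show up in one of them. This is my own illustration, not how any search engine actually does it. The function names (fetch, added_lines, injected_frame_sources) and the browser-like User-Agent string are assumptions made up for the example, and the sample HTML uses placeholder example.com URLs.

```python
import difflib
import re
import urllib.request

# Matches the src of a <frame> or <iframe> tag (quoted or unquoted).
FRAME_SRC = re.compile(r"<i?frame\b[^>]*\bsrc\s*=\s*[\"']?([^\"'\s>]+)", re.IGNORECASE)


def fetch(url, referrer=None, timeout=10):
    """Fetch a page, optionally sending a Referer header (hypothetical helper)."""
    headers = {"User-Agent": "Mozilla/5.0"}  # assumed browser-like agent
    if referrer:
        headers["Referer"] = referrer
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def added_lines(baseline_html, referred_html):
    """Return the lines that appear only in the second response."""
    diff = difflib.ndiff(baseline_html.splitlines(), referred_html.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


def injected_frame_sources(lines):
    """Return the src of every frame/iframe tag found in the given lines."""
    sources = []
    for line in lines:
        sources.extend(FRAME_SRC.findall(line))
    return sources


if __name__ == "__main__":
    # Offline demonstration with made-up HTML instead of a live request.
    baseline = "<html>\n<body>\n<p>Cold forged steels</p>\n</body>\n</html>"
    referred = (
        "<html>\n<frameset>\n"
        '<frame src="http://example.com/redirect">\n'
        "</frameset>\n<body>\n<p>Cold forged steels</p>\n</body>\n</html>"
    )
    extra = added_lines(baseline, referred)
    print(injected_frame_sources(extra))
```

On a live site you would pass the output of two fetch calls into added_lines, one of them with a search engine results URL as the referrer. Which request actually gets the clean page is something you would have to find by experiment, and pages with rotating ads or timestamps will produce differences that have nothing to do with cloaking, so treat any hit as a lead rather than proof.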
What's interesting is to examine which SEs have caught the above sites and which ones haven't. How good are the major SEs at detecting referrer cloaking spam? Google has not indexed cold-forming-company.com. MSN has pages and Yahoo has only their homepage. On the other hand, Google has indexed metal-cold-forming.com, Yahoo again only has the homepage, but MSN does not have it indexed. Clearly, even the big three search engines have mixed success dealing with this type of spam.
FYI, the two sites mentioned above are a part of a much larger spam network. Here is a small sampling of the sites:
And the list is much longer than that. My, what a nice little spam network they've got there.
My prediction for 2006 is that this kind of spam will become even more of a problem for SEs. Search engines of every kind will need to devote more resources to shoring up their defenses and weeding out crap like the above to stay relevant. I hope to see blacklists of known referrer spammers, which would make it easier to filter these sites out of results.
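To make the blacklist idea a bit more concrete, here is a minimal sketch of how a list of known spam domains could be used to filter result URLs. No such shared blacklist exists as far as I know, so the blacklist contents, the function names (host_of, is_blacklisted, filter_results) and the matching rule (exact domain or any subdomain of it) are all assumptions for illustration.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, with or without a scheme."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat "www.example.com/x" as a host
    return (urlsplit(url).hostname or "").lower()


def is_blacklisted(url, blacklist):
    """True if the URL's host is a blacklisted domain or one of its subdomains."""
    host = host_of(url)
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)


def filter_results(urls, blacklist):
    """Drop every result whose host matches the blacklist, keeping order."""
    return [url for url in urls if not is_blacklisted(url, blacklist)]


if __name__ == "__main__":
    # Hypothetical blacklist seeded with the two sites discussed in this post.
    blacklist = {"cold-forming-company.com", "metal-cold-forming.com"}
    results = [
        "http://www.cold-forming-company.com/coldforgedsteels.htm",
        "http://www.metal-cold-forming.com",
        "http://example.com/forging",
    ]
    print(filter_results(results, blacklist))
```

Matching on the registered domain rather than the full URL means a spammer can't dodge the filter just by adding a new subdomain or page, though it obviously does nothing against a network that keeps registering fresh domains.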