Wednesday, January 11

Globalspec Most Memorable Spam of 2005: Dogpile Cloakers

I am going to steal a page from Matt Cutts and talk about spam. GlobalSpec's Engineering Web is a gated community covering only the engineering domain. Considering the limited scope, one might think that fighting spam would be easier than out there in the wilder "horizontal web" -- the world of GYM, right? Wrong. While GlobalSpec may not have to deal with Britney Spears and company (as much), it may surprise you to find that there are lots of people out there targeting the engineering domain with spam.

Over the past year, I've seen pretty much every type of spam the major SEs deal with. I've also seen major improvements in fighting spam in 2005. There was spam of every kind: domain hijacking, re-purposed content (ODP/Wikipedia), dynamic content generators, link spam, splogs, etc. (FYI, if you want a good overview of web spam, check out Marc Najork's WebSpam presentation to the UC Sims 141 class.) However, one of the most difficult and insidious spam techniques from a search engine's perspective is cloaking. Cloaking is sending the search engine spiders different content than users see when they visit the page. Search Engine World has a good overview of the different types of cloaking spam.

My two most memorable spammers of 2005 are two cloaking sites that we caught. The two sites are and Both of these sites use referrer based cloaking. It's really sneaky. Let me illustrate:

The URL:

Now look at the cloaked version you get when you arrive from MSN search results:

Now, I won't pass judgment on Dogpile, but the webmaster doesn't seem to get any tangible benefits, unlike Google Adsense spammers. Dogpile, on the other hand, gets to leech off of other search engines' results and gain traffic and therefore money. I'll let you investigate and draw your own conclusions.

Here's how the spammers operate in this spam network. They use referrer-based cloaking. When you come in from a search engine, they detect the external referrer (direct navigation with no referrer is treated the same way) and inject a few seemingly innocuous lines of HTML into the page. Here is the code they inject (with the opening < removed):

frame src="'" qkw="site%3acold%20forming%20company%20cold%20forged&qcat=">

script language="JavaScript">location.replace(
Whoa. The content is still there, but it is effectively hidden by the frames and JavaScript, neither of which was in the page before. If you look at the bottom of the results page, there is a line one row high, which is where all the old content is displayed.
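To make the idea concrete, here is a minimal sketch in Python of how one might check a page for this kind of referrer cloaking: fetch it twice with different Referer headers and see whether frame or location.replace markup shows up only in the search-referred copy. This is not how GlobalSpec or any search engine actually does it. The function names, the pattern list, and the User-Agent string are all assumptions made for illustration. Since these sites reportedly treat a missing referrer like an external one, the baseline copy should be fetched with a same-site Referer rather than with none.

```python
import re
import urllib.request

# Markup of the kind described above. These two patterns are illustrative
# assumptions, not a complete signature set for cloaking.
SUSPICIOUS = {
    "frame": re.compile(r"<i?frame\b", re.IGNORECASE),
    "js_redirect": re.compile(r"location\.replace\s*\(", re.IGNORECASE),
}


def fetch(url, referrer=None, timeout=10):
    """Fetch a page as text, optionally sending a Referer header."""
    headers = {"User-Agent": "Mozilla/5.0"}  # hypothetical browser-like UA
    if referrer:
        headers["Referer"] = referrer
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def injected_markers(baseline_html, referred_html):
    """Names of suspicious patterns found only in the search-referred copy."""
    return sorted(
        name
        for name, pattern in SUSPICIOUS.items()
        if pattern.search(referred_html) and not pattern.search(baseline_html)
    )


def looks_referrer_cloaked(baseline_html, referred_html):
    """True if the referred copy gained frame or redirect markup."""
    return bool(injected_markers(baseline_html, referred_html))
```

In use, you would call fetch(url, referrer=<a page on the same site>) for the baseline and fetch(url, referrer=<a search results URL>) for the referred copy, then pass both to looks_referrer_cloaked. A hit is only a hint worth a manual look, since plenty of legitimate pages use frames or JavaScript redirects for their own reasons.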

What's interesting is to examine which SEs have caught the above sites and which ones haven't. How good are the major SEs at detecting referrer cloaking spam? Google has not indexed MSN has pages and Yahoo has only their homepage. On the other hand, Google has indexed, Yahoo again only has the homepage, but MSN does not have it indexed. Clearly, even the big three search engines have mixed success dealing with this type of spam.

FYI, the two sites mentioned above are a part of a much larger spam network. Here is a small sampling of the sites:

And the list is much longer than that. My, what a nice little spam network they've got there.

My prediction for 2006 is that this kind of spam will become an even bigger problem for SEs. Search engines of every kind will need to devote more resources to shoring up their defenses and weeding out crap like the above to stay relevant. I hope to see blacklists of known referrer spammers, which would make it easier to filter these sites out of results.
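As a rough illustration of how such a blacklist could be used, here is a small Python sketch that drops result URLs whose host is on the list, including subdomains of a listed domain. No shared blacklist of this kind exists that I know of, so the function names and the data format (a set of lowercase domain names) are assumptions.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Lowercased hostname of a URL, or an empty string if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_blacklisted(url, blacklist):
    """True if the URL's host is a listed domain or a subdomain of one.

    `blacklist` is assumed to be a collection of lowercase domain names.
    """
    host = host_of(url)
    return any(host == bad or host.endswith("." + bad) for bad in blacklist)


def filter_results(urls, blacklist):
    """Keep only the result URLs whose host is not blacklisted."""
    return [url for url in urls if not is_blacklisted(url, blacklist)]
```

Matching on the whole domain rather than on individual URLs is a deliberate choice here: a spam network like the one above can generate pages faster than anyone can list them, but each site still has to live on a domain.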
