Daniel is introducing Matt Cutts, head of Google's webspam team.
- What is web spam
- Why people spam
-- He's going to teach you how to think like a spammer. Use this only for good, not for evil!
What is Web spam?
- Webspam is cheating (breaking search engine guidelines) in an attempt to rank higher in search engines; just trying to rank higher is SEO, not spam.
Why web spam?
- A Catholic priest used hidden text to spam! Catholicism, Catholicism, Catholicism!
The mind of a blackhat spammer
Anyone using Wi-Fi?
- How do you know it's not a fake access point set up to sniff your data?
- These guys are here to get you.
- A "Recycle your badge" box at Web 2.0: the badge is worth thousands of dollars. How do you know the box is real?
- It shows you the blackhat attitude.
Scenario: Suppose Google starts penalizing sites that have spammy inlinks.
- People will create spammy links to their competitors!
- SPAM: "Sites Positioned Above Me"
1) Content (on-page)
2) Reputation (off-page)
- You need both to be a good spammer.
You also need something else: a way to make money (monetization)!
- The most famous example is hidden text (the old white-text-on-a-white-background trick).
Old school: FFFFFF... new age...
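Catching the old-school version of this trick can be as simple as comparing the text color with the background color. A minimal Python sketch, assuming the colors have already been pulled out of the page's CSS as hex strings; `is_hidden_text` and its `tolerance` parameter (for near-identical shades) are hypothetical, not any search engine's actual check:

```python
def normalize_hex(color: str) -> str:
    """Normalize '#fff', 'FFF' or '#FFFFFF' to 'ffffff'."""
    c = color.strip().lstrip("#").lower()
    if len(c) == 3:
        c = "".join(ch * 2 for ch in c)  # expand CSS shorthand
    if len(c) != 6 or any(ch not in "0123456789abcdef" for ch in c):
        raise ValueError(f"not a hex color: {color!r}")
    return c


def is_hidden_text(fg: str, bg: str, tolerance: int = 0) -> bool:
    """True if every RGB channel of fg is within `tolerance` of bg (hypothetical heuristic)."""
    f, b = normalize_hex(fg), normalize_hex(bg)
    f_rgb = [int(f[i:i + 2], 16) for i in (0, 2, 4)]
    b_rgb = [int(b[i:i + 2], 16) for i in (0, 2, 4)]
    return all(abs(x - y) <= tolerance for x, y in zip(f_rgb, b_rgb))
```

For example, `is_hidden_text("#FFFFFF", "#fff")` is `True`, while `#fefefe` text on a `#ffffff` background is only flagged once `tolerance` is at least 1.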
Examples of spam techniques
SecretsMoney: Tax deferred.
Scraping (example from All About Jazz): a key giveaway is that scrapers don't escape special characters. Spammers are now getting tricky, stitching content together at the sentence and phrase level.
- Cloaking: showing Google different content than you show your users.
- Steve Bartel on MIT! got hacked... The hacker is relying on the fact that search engines don't execute JavaScript. (Some do parse JS, but many search engines don't.)
- Blog comment spam: pretty common.
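Two of these techniques leave traces that can be checked roughly. A Python sketch under stated assumptions: `count_unescaped_ampersands` looks for a bare `&` that isn't part of an HTML character reference (one kind of unescaped special character a scraper might leave behind), and `looks_cloaked` compares the HTML fetched as a crawler against the HTML fetched as a browser. Fetching the two versions is left out; both function names and the 0.5 cutoff are illustrative assumptions:

```python
import difflib
import re

# A bare '&' that does not begin a character reference like &amp;, &#39; or &#x27;
UNESCAPED_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def count_unescaped_ampersands(html_text: str) -> int:
    """Count '&' characters that are not part of an HTML character reference."""
    return len(UNESCAPED_AMP.findall(html_text))


def looks_cloaked(crawler_html: str, browser_html: str, threshold: float = 0.5) -> bool:
    """Flag a page whose crawler and browser versions are very dissimilar.

    `threshold` is an arbitrary illustrative cutoff, not a known search-engine value.
    """
    similarity = difflib.SequenceMatcher(None, crawler_html, browser_html).ratio()
    return similarity < threshold
```

A handful of unescaped ampersands proves nothing on its own; in practice a signal like this would be one input among many.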
In 2006, spammers had their own sites; today they try to use other people's!
Hacking is the biggest trend in spam: it's easier to hack somebody else's site than to build your own.
Make spammers waste time or effort. Frustrate them!
Ways to frustrate trolls
- Disemvoweling: "thnk tht ths tpc s stpd nd dmb" (popularized by Boing Boing)
- Show troll's comments only to the troll, not to anyone else.
- Slow down the website for the troll (make them wait 20 seconds for each HTTP reply: put them in dial-up mode!)
- Start the troll in a -1 "hole" that they have to dig out of (someone has to agree with you before you become visible).
- PageRank, TrustRank, BingRank?
- eBay: your seller rating (e.g., 100% positive since 2002, with hundreds of sales)... any time that seller writes a post on Amazon, it's probably OK.
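A few of these troll-frustrating ideas are easy to sketch. The Python below is a hypothetical comment system, not any site's actual code: `disemvowel` strips vowels, authors in an assumed `on_probation` set start in the -1 hole and need one upvote to show up, and shadow-banned authors still see their own comments while nobody else does:

```python
from dataclasses import dataclass

VOWELS = set("aeiouAEIOU")


def disemvowel(text: str) -> str:
    """Remove vowels; consonants, spaces and punctuation are kept."""
    return "".join(ch for ch in text if ch not in VOWELS)


@dataclass
class Comment:
    author: str
    text: str
    score: int = 0


def new_comment(author: str, text: str, on_probation: set[str]) -> Comment:
    """Authors on probation start in a -1 'hole' (hypothetical policy)."""
    return Comment(author, text, score=-1 if author in on_probation else 0)


def upvote(comment: Comment) -> None:
    """Someone agreeing with the comment lifts it by one point."""
    comment.score += 1


def visible_to(comment: Comment, viewer: str, shadow_banned: set[str]) -> bool:
    """Shadow-banned authors see their own comments; nobody else does."""
    if comment.author in shadow_banned:
        return viewer == comment.author
    # Everyone else's comments need a non-negative score to be shown.
    return comment.score >= 0
```

With these rules a probationary comment stays hidden until one other reader upvotes it back to 0.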
Off the beaten path
- Clever: add a hidden form field that only bots will fill out, to catch them!
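The hidden-field trick is usually called a honeypot. A minimal Python sketch, assuming a hypothetical field named `website_url` that is hidden from humans with CSS; any submission that fills it in is treated as likely automated:

```python
# Hypothetical honeypot field: hidden with CSS, so real visitors leave it empty.
HONEYPOT_FIELD = "website_url"

HONEYPOT_INPUT_HTML = (
    f'<input type="text" name="{HONEYPOT_FIELD}" '
    'style="display:none" tabindex="-1" autocomplete="off">'
)


def is_probably_bot(form: dict[str, str]) -> bool:
    """A non-blank value in the honeypot field suggests an automated submitter."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

Smarter bots may skip fields hidden with `display:none`, so this mostly catches the naive ones.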
Where is this spam from?
Good tools: nofollow
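nofollow refers to the `rel="nofollow"` link attribute, which tells search engines not to pass ranking credit through a link. A Python sketch of rendering a link from an untrusted comment that way; `render_user_link` is a hypothetical helper, and rejecting non-http(s) schemes is an extra assumption about what a comment system should allow:

```python
import html
from urllib.parse import urlsplit


def render_user_link(url: str, text: str) -> str:
    """Render an untrusted link with rel="nofollow" and escaped href and text."""
    if urlsplit(url).scheme not in ("http", "https"):
        # Refuse javascript:, data:, etc. (assumed policy for user comments).
        raise ValueError(f"unsupported URL scheme: {url!r}")
    href = html.escape(url, quote=True)
    return f'<a href="{href}" rel="nofollow">{html.escape(text)}</a>'
```

For example, `render_user_link("https://example.com/?a=1&b=2", "jazz")` returns `<a href="https://example.com/?a=1&amp;b=2" rel="nofollow">jazz</a>`.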
Trends in webspam
- Search engines are getting better at spotting spammy pages.
- Spammers make legit-looking pages for spammy links
- Spammers hack/deface legit sites for links/landing page
- Spammers are using malware!
Spam will soon be more dangerous!
The problem shifts to classifying whether or not a site was hacked, not whether it is keyword stuffing!
Porn producer: conversion dropped from 1 in 50 to 1 in 200. The answer: installing malware from a webpage!
The next wave of web spam will be hacking web servers (XSS).
- Detect when a website is hacked based on how links are added, etc.
Selling links from hacked sites.
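One way to read "detect when a website is hacked based on how links are added": compare snapshots of a page and flag a sudden burst of links to external hosts it never linked to before. A minimal Python sketch; the snapshot inputs, the function names, and the `max_new_hosts` threshold are assumptions for illustration, not a description of Google's classifier:

```python
from urllib.parse import urlsplit


def external_hosts(links: list[str], own_host: str) -> set[str]:
    """Hostnames that absolute links point to, excluding the site's own (lowercase) host."""
    hosts = set()
    for url in links:
        host = urlsplit(url).hostname  # None for relative links like "/about"
        if host and host != own_host:
            hosts.add(host)
    return hosts


def looks_hacked(old_links: list[str], new_links: list[str], own_host: str,
                 max_new_hosts: int = 5) -> tuple[bool, list[str]]:
    """Flag a page that suddenly links to more than `max_new_hosts` new external hosts."""
    added = external_hosts(new_links, own_host) - external_hosts(old_links, own_host)
    return len(added) > max_new_hosts, sorted(added)
```

Returning the list of new hosts alongside the flag makes it easier for a site owner to see which injected links to remove.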
Preventing Comment Spam
- Any way to tell humans from bots.
- Is there a question that everyone in the world can answer?
- What techniques prevent comment spam?
- Web service to classify content as spam/nonspam
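The notes don't say how such a classification service works. A common textbook technique is a naive Bayes text classifier; the Python sketch below swaps that in (multinomial, with add-one smoothing, in a hypothetical `NaiveBayesSpamFilter` class) as an illustration, not as what any particular service uses:

```python
import math
import re
from collections import Counter

LABELS = ("spam", "ham")


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one smoothing.

    Train at least one example of each label before calling classify().
    """

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, text: str) -> str:
        words = tokenize(text)
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) plus the sum of log P(word | label), add-one smoothed.
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)
```

Usage: call `train()` on a few labeled comments, then `classify("cheap pills")` returns whichever of `"spam"` or `"ham"` scores higher.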
Trust, Identity, Authentication
- Is there a PageRank for people? Something that spans social networks and the web.
- Bring authenticated authorship to the web
- When should a website vouch for a link to another website?
- Wikipedia nofollows links to most other sites!
(In short, when can you trust a member of your community)
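For a "PageRank for people", one could run the classic PageRank iteration over a graph where an edge means "this person vouches for (or follows) that person". A Python sketch of standard power-iteration PageRank with damping factor 0.85; the people graph is a hypothetical input, and nothing here reflects a system Google has described:

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank; graph maps each person to the people they vouch for."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by people who vouch for nobody is spread evenly over everyone.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        rank = new
    return rank
```

For example, `pagerank({"alice": ["bob"], "carol": ["bob"], "bob": ["alice"]})` ranks bob highest, since two people vouch for him; the scores always sum to 1.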
WoW (World of Warcraft) has better authentication than most sites on the web.
Twitter and Facebook
- Study adversarial IR: spam followers, or "realmattcutts", who followed the same people as Matt!
- It recreates a lot of the same problems you see in e-mail.
- Twitter trending topics: the acai berry spam.
- Using Twitter for linking/malware spam! It's happening.
- Google bombs: the first one was "talentless hack".
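The "realmattcutts" impostor suggests a simple signal: an account whose follow list nearly duplicates someone else's. A Python sketch using Jaccard similarity of followee sets; `possible_impersonators` and the 0.8 threshold are illustrative assumptions:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """len(a & b) / len(a | b); defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def possible_impersonators(target: str, following: dict[str, set[str]],
                           threshold: float = 0.8) -> list[str]:
    """Accounts (other than target) whose followee set closely copies target's."""
    base = following[target]
    return sorted(
        account
        for account, followees in following.items()
        if account != target and jaccard(base, followees) >= threshold
    )
```

A copied follow list alone could also catch innocent fans, so a real system would likely combine it with other signals such as a look-alike handle.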