Thursday, November 17

Sitemap statistics are not like a bikini...

"Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital." - Aaron Levenstein. At least in this case, I think the Google Sitemaps statistics reveal more vital information than they hide. Maybe statistics aren't so evil after all...

Google announced today on its official blog and on the Sitemaps Blog that it is going to provide more statistics to webmasters via the sitemap service.

What's even cooler is that the Google Sitemaps blog reports that you can get site indexing statistics even if you don't have a sitemap! Now, if only it were better integrated with Google Analytics (if you missed it, here is the official blog post on Analytics now being free).

What's really awesome is how much easier it makes fixing problems on your site. The statistics show the fetch details for every page in the Sitemap. In my opinion, the two most interesting details are the HTTP request details and the crawl date for individual pages. Did half your pages drop out of Google because one of your important pages 404ed? Was your site down when Google tried to crawl it? Now you are at least better equipped to do something about the problem. To my knowledge, no other search engine provides this level of transparency into its crawling: not GlobalSpec, not MSN, not Yahoo, nobody.

I think it would be cool if there were a way to ask Google to retry crawling pages that returned errors. When there was a 404 or some sort of logic error on your site, you could see it, fix it, and tell Google so that it could re-crawl the page. I suppose this may not be a big issue if Google crawls you very frequently. But if major portions of your site errored out repeatedly and dropped out of the index, it could be devastating to a business that gets a lot of traffic from search engines (most do), especially a small retailer in the holiday season!

Now here is an interesting experiment: add a new page to my site (and sitemap), then monitor when it appears in the Google index. Then compare the index date with the crawl date. What is the delay between crawling and appearing in the search index? Just how quickly can Google get crawled content into its live search index?
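The measurement itself is just date arithmetic. Here is a minimal sketch, assuming you record both timestamps by hand: the crawl date from the Sitemaps statistics page and the first time a search turns up the new page. The helper name `index_delay` and the sample dates are hypothetical, not anything Google provides.

```python
from datetime import datetime


def index_delay(crawled: str, first_seen_in_index: str) -> float:
    """Return the crawl-to-index delay in hours (hypothetical helper).

    Both arguments are ISO 8601 timestamps recorded by hand:
    `crawled` comes from the Sitemaps statistics page, and
    `first_seen_in_index` is when a search first returned the page.
    """
    crawled_at = datetime.fromisoformat(crawled)
    indexed_at = datetime.fromisoformat(first_seen_in_index)
    delta = indexed_at - crawled_at
    # A page can't be indexed before it was crawled, so flag bad records.
    if delta.total_seconds() < 0:
        raise ValueError("index date is earlier than crawl date")
    return delta.total_seconds() / 3600


# Made-up observations for one test page.
print(index_delay("2005-11-18T09:30", "2005-11-19T15:30"))  # 30.0
```

Repeating this for a handful of test pages would give a rough average rather than a single anecdote.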

Offering these statistics is a smart move: they give webmasters a compelling incentive to sign up for Sitemaps and help spread adoption of the Sitemap format (it's still just a Google thing, after all).
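For anyone who hasn't seen the format, a sitemap is just an XML list of URLs with optional metadata such as a last-modified date. Below is a rough sketch that generates one with Python's standard library. The namespace is an assumption: it uses the later sitemaps.org 0.9 schema, while Google's original format had its own namespace, so check what your tools expect. The example URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Assumed namespace (sitemaps.org 0.9); Google's original schema differed.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a <urlset> document from (url, last_modified_date) pairs."""
    # Serialize SITEMAP_NS as the default namespace (no "ns0:" prefixes).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # lastmod uses the W3C date format, e.g. 2005-11-17.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="unicode")


pages = [
    ("http://www.example.com/", date(2005, 11, 17)),
    ("http://www.example.com/new-test-page.html", date(2005, 11, 18)),
]
print(build_sitemap(pages))
```

Adding the new test page from the experiment above is then just one more tuple in `pages`.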

The problem, for me at least on Blogger, is that you need to "verify" site ownership by placing a file in your root directory so that Google can fetch it. What sucks: there's no support for Blogger sites. And I'm not the only one who thinks this SuXors; it has come up again on the Google Sitemaps group. It's ironic that the Google Sitemaps blog is on Blogspot, and yet I have no way of verifying with Google that I own this blog on Blogspot. Another step on the way to setting up my own web server and WordPress.

