a software upgrade to Google’s infrastructure that provides the framework for many improvements to core search quality in the coming months (smarter redirect handling, improved canonicalization, etc.). A team of dedicated people has worked very hard on this change; props to them for the code, sweat, and hours they’ve put into it. It started out at one data center and is now live at all of Google's data centers.

One of the biggest advantages of this upgrade is improved URL canonicalization -- www vs. non-www, redirects, duplicate URLs, 302 “hijacking,” and so on. The biggest win is most likely that Google is now better at picking between the www and non-www versions of a URL. Second, Google is catching up with Yahoo on handling redirects and "hijacking". Danny Sullivan over at Search Engine Watch wrote a great article last August on Yahoo's policy regarding redirects and hijacking, contrasting it with Google's (old) policy.
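To make the www vs. non-www problem concrete, here is a rough sketch of the kind of normalization involved. The `canonicalize` function is my own hypothetical illustration, not anything Google has published -- a real engine also follows redirects and weighs link popularity between variants:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url):
    # Hypothetical sketch of duplicate-URL collapsing; a real engine
    # also follows redirect chains and compares page content.
    parts = urlsplit(url)
    netloc = parts.netloc.lower()
    # Collapse www vs. non-www variants onto a single host
    if netloc.startswith("www."):
        netloc = netloc[4:]
    # "" and "/" are the same root path; drop fragments entirely
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))
```

With this, `http://WWW.Example.com`, `http://example.com/`, and `http://www.example.com` all map to the same canonical URL instead of being indexed as three duplicates.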
What caught my eye is what I interpret to be an entirely new crawler engine in BigDaddy. Here is the snippet from Matt's most recent post:
A: Yes, I believe so. You will probably see less crawling by the older Googlebot, which has a User-Agent of “Googlebot/2.1 (+http://www.google.com/bot.html)”. I believe crawling from the Bigdaddy infrastructure has a new User-Agent, which is “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”
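If you want to see which crawler is hitting your own site, you can check the User-Agent strings in your server logs. Here is a quick sketch using the two strings Matt lists (the `is_bigdaddy_crawl` helper is my own hypothetical name):

```python
OLD_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
NEW_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def is_bigdaddy_crawl(user_agent):
    # Both strings contain "Googlebot/2.1", so the Mozilla/5.0 prefix
    # is what distinguishes the new Bigdaddy crawler in log analysis.
    return "Googlebot" in user_agent and user_agent.startswith("Mozilla/5.0")
```

Run that over your access logs and you can watch the crawl volume shift from the old agent to the new one over time.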
For starters, quality will improve because Google can tell more accurately what users actually see on the page. No more hiding DIVs with CSS or JavaScript to stuff keywords!
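As a rough illustration, here is the kind of hidden-text check that only becomes possible once a crawler understands CSS. The `flag_hidden_text` helper is my own hypothetical sketch, and it only handles inline styles -- real pages also hide text via external stylesheets, JavaScript, and off-screen positioning:

```python
import re

def flag_hidden_text(html):
    # Flag DIVs whose inline style hides them -- the classic
    # keyword-stuffing trick a text-only crawler cannot see through.
    pattern = re.compile(
        r'<div[^>]*style="[^"]*(?:display\s*:\s*none|visibility\s*:\s*hidden)'
        r'[^"]*"[^>]*>(.*?)</div>',
        re.IGNORECASE | re.DOTALL,
    )
    return [text.strip() for text in pattern.findall(html)]
```

Anything this flags is text the visitor never sees but the old crawler dutifully indexed.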
Some people have claimed that the new crawler is "blazing fast" compared to the old Googlebot. While I believe it may seem that way to webmasters because Google is crawling more aggressively, I find it highly unlikely that the software itself is faster. If the new crawler uses a Mozilla-based engine, it must be slower than the old text-based crawler, simply because of all the new work it has to do -- JavaScript parsing, CSS rendering, and so on -- that it didn't do in the past.
I believe Google is crawling more aggressively because it is trying to re-crawl a large portion of the web very quickly. If you think about the impact of changing the way URL canonicalization works, combined with a new crawler engine, it follows that you will probably need to re-compute PageRank. Crawling gently is not something you can do at this scale if you want to propagate these changes quickly, and in the process Google is generating some webmaster complaints. Along these lines, Search Engine Journal has a recent article on the topic entitled: Mozilla Googlebot: Mozilla or Godzilla.
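For reference, here is a toy power-iteration PageRank -- the standard textbook formulation, not Google's actual implementation -- which gives a feel for why a canonicalization change forces a graph-wide recompute: once duplicate URLs are merged, every page's score depends on the whole rewired link graph, so the iteration has to run again from scratch.

```python
def pagerank(links, damping=0.85, iterations=50):
    # Power iteration over a {page: [outlinks]} dict. Merging
    # www/non-www duplicates changes the graph, so ranks must be
    # recomputed over the entire graph, not patched page by page.
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            if not outs:
                # Dangling page: spread its rank evenly across the graph
                for q in pages:
                    new[q] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outs)
                for q in outs:
                    new[q] += share
        rank = new
    return rank
```

Even on this toy version, a change to which URLs count as "the same page" changes the keys and edges of `links`, and with them every score in the result.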