Tuesday, November 1

Google Toolbar Autolink Scanners

So, my hunch was right. I've been digging through the GTB code.

It's a very cool piece of code.

If you unpack the google-toolbar.jar file in the toolbar extension folder you will find a nice directory structure. Inside the content directory you find gtb.js -- this seems to be where most of the cool code powering the toolbar exists.

The biggest chunks of code I identified are 1) Google Suggest, 2) AutoLink, and 3) spell checking (using XMLHttp and Google's servers).
Additionally, there is code to store your search history and to provide the PageRank visualizer. Nothing too shocking (yet...).

What's cool is that it doesn't seem to be obfuscated. Sure, there are no comments -- but I have seen open source software that is harder to understand.

The auto-link feature is powered by scanners. These scanners look over the words in the current document. Here is a selection of scanners and the words they look for:

Books ("ISBN", "Book", "Publication")
Package tracking for Fedex Ground / Express ("fedex" / "fed") tracking info / UPS ("ups") / USPS ("usps")
Vehicle Histories: "vin", "vehicle", "auto", "car", "bus", "pickup", "truck", "suv", "bike", "moto", "numbers", "number"
Address Mapping --

The address scanner was what I was looking for, so I'll cover that in a little more detail -- although all of the above are also interesting.
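
To make the scanner idea concrete, here is a minimal sketch of how keyword-triggered scanners could work. The trigger words are the ones listed above, but the `scanners` table, `tokenize`, and `matchingScanners` are names I made up for illustration; this is my guess at the general shape, not Google's actual code.

```javascript
// Hypothetical scanner table: trigger words are from the lists above,
// but the structure and all names here are assumptions, not toolbar code.
const scanners = {
  books: ["isbn", "book", "publication"],
  packages: ["fedex", "fed", "ups", "usps"],
  vehicles: ["vin", "vehicle", "auto", "car", "bus", "pickup", "truck",
             "suv", "bike", "moto", "numbers", "number"],
};

// Split text into lowercase alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Return the names of the scanners whose trigger words appear in the text.
function matchingScanners(text) {
  const words = new Set(tokenize(text));
  return Object.keys(scanners).filter((name) =>
    scanners[name].some((trigger) => words.has(trigger))
  );
}

console.log(matchingScanners("My UPS tracking number is 1Z999"));
```

A trigger word only tells you which scanner is worth running; the real work of pulling out an ISBN, tracking number, or VIN would happen after that.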

Address Scanner
Looking at the scanner code, I see some interesting things. First, the list of states is neat. Here are some highlights:
"American Samoa", "AS"
"Federated States of Micronesia", "FM"
"Marshall Islands", "MH"
"Northern Mariana Islands", "MP"
"Palau", "PW"

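I didn't even know those had US state abbreviations! (Strictly speaking they are territories and freely associated states, not states, but the USPS assigns them two-letter codes.) You also have all the usual suspects. Nothing international in here.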

It can't find certain addresses. Here is why:

var addressStreetScanner = ["street", "st", "ave", "road", "avenue", "rd", "san", "blvd", "dr", "drive", "new", "york", "west", "east", "north", "south", "ct", "park", "way", "los", "city", "parkway", "beach", "main", "boulevard", "santa", "se", "ne", "sw", "nw"];

Is it just me, or do some of those seem very West Coast oriented -- santa? beach? los? Hmm... very interesting indeed.

What is more telling is what is lacking. Take a look at Wikipedia's entry on street names and what it has to say about street name designations. How did Google come up with this list? Perhaps it comes directly from Google Maps? Is the list the output of some sort of address text classifier / extractor? Why aren't the Wikipedia street designations in it?
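
Here is a rough sketch of why a missing designation matters. I am assuming the scanner only treats text as an address candidate when it contains one of the keywords in the list above; that is my reading, not something I have confirmed in the code. The names `streetKeywords` and `looksLikeStreetAddress` are mine, and the sample addresses are just for illustration.

```javascript
// Keyword list copied from the post. Treating it as a simple lookup set
// is an assumption about how the toolbar uses it.
const streetKeywords = new Set([
  "street", "st", "ave", "road", "avenue", "rd", "san", "blvd", "dr",
  "drive", "new", "york", "west", "east", "north", "south", "ct", "park",
  "way", "los", "city", "parkway", "beach", "main", "boulevard", "santa",
  "se", "ne", "sw", "nw",
]);

// True if any word in the text is one of the street keywords.
function looksLikeStreetAddress(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  return words.some((word) => streetKeywords.has(word));
}

console.log(looksLikeStreetAddress("1600 Amphitheatre Parkway")); // true
console.log(looksLikeStreetAddress("Empire State Plaza"));        // false
console.log(looksLikeStreetAddress("12 Maple Lane"));             // false
```

Under that assumption, anything ending in "Plaza" or "Lane" never even gets considered, no matter how well formed the rest of the address is.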

The address parsing problem extends beyond the Google Toolbar. It is a symptom of a larger problem with Google Maps. I went back over the list of addresses that AutoLink couldn't extract and tried manually punching them into Google Maps. Did they work? Nope. Google Maps can't handle the address formats I mentioned in my previous post -- and unsurprisingly, the functionality isn't built into the toolbar.

Now, just for curiosity's sake -- let's try MapQuest. Most of my problem addresses work! It finds most of the addresses entered and shows nice maps. However, MapQuest still can't handle the Empire State Plaza or Executive Park -- two local landmarks. Competition is good for business -- MapQuest is making upgrades to its UI to compete with Google Maps. It has recently upgraded to nice big maps, even if they aren't yet draggable. Did I mention MapQuest's maps are prettier and easier to read?
Verdict: at least for now, MapQuest is this geek's top choice for mapping / directions.

So what else did I find puttering around in the toolbar JavaScript? Lots of cool and interesting things. There is enough material there to keep me busy for a while. But here is one little bit that I found enlightening:

The Google Toolbar pings back to the Googlesphere daily. In fact, the Google Toolbar sends out a GET request with the user's first search of the day. This isn't a 100% surprise; I think I saw something about usage statistics in the EULA. However, this is the first hard evidence I have found of Google collecting my information.
The ping URL:
var url = "http://toolbar.google.com/version3f?tbbrand=GGGL" + installId + "&" + "dll=" + "1.0.20051012" + "&" + "hl=" + GTB_GoogleToolbarOverlay.languageCode + "&" + "browser=" + encodeURIComponent(window.navigator.appVersion) + firstsearch;
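
To see what actually goes out on the wire, here is a small sketch that rebuilds the same string with placeholder inputs. The `buildPingUrl` wrapper and every value I pass in are made up for illustration; I don't know what a real `installId` looks like or how `firstsearch` is formatted, so I leave the latter empty.

```javascript
// Same concatenation as the toolbar line above, wrapped in a function.
// All argument values used below are placeholders, not real toolbar data.
function buildPingUrl(installId, languageCode, appVersion, firstsearch) {
  return "http://toolbar.google.com/version3f?tbbrand=GGGL" + installId +
    "&dll=" + "1.0.20051012" +
    "&hl=" + languageCode +
    "&browser=" + encodeURIComponent(appVersion) +
    firstsearch;
}

const pingUrl = buildPingUrl("X", "en", "5.0 (Windows; en-US)", "");
console.log(pingUrl);

// Parse it back to list the fields the server would receive.
const params = new URL(pingUrl).searchParams;
for (const [key, value] of params) {
  console.log(key + " = " + value);
}
```

So each ping carries an install-specific brand string, the toolbar version, my language, and my browser version, plus whatever `firstsearch` appends.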

One last thought:
As I mentioned earlier, I would like to see Google AutoLink integrated with Google Local and its entity extraction algorithms. It would be so cool! Imagine I am browsing a local restaurant review site and see Pizza Hut -- OK, let's order. I hit the AutoLink button. It extracts the entity Pizza Hut from the page using Google's metadata / Local, and it knows my zip code. Google looks up Pizza Hut in Google Local and gives me the option to auto-link to its Google Local listing, complete with the local phone number and reviews. Easier than putzing about Pizza Hut's website trying to find my nearest location!

AutoLink has a lot of potential, but I think its name could be improved. I had no idea what it did or when I should use it before I saw the advert. Even then, I thought it was just for addresses, but the code showed me lots of cool ways to use it! How about more info on this in the docs, Googlers? Did I miss this somewhere?
