Saturday, November 5

Feature request for GDS 2.0

I like to use the Google sidebar as my simple, stupid news and RSS reader. However, there is one thing I find annoying -- ok, two things.

1) No ability to import or export feed lists! I have feeds in my RSS readers and other websites in OPML and other formats. Is this possible with an extension? I'll have to look around.

2) The automatic adding of recent clips is annoying and leads to a lot of low-quality sites being added to my web clips. One of the first things I did after using GDS 2.0 for a while was turn this feature off.

More on vertical search and other things soon to come ;-)

Thursday, November 3

Vertical Search: An Introduction

I had the opportunity to hear Jan Pederson from Yahoo speak at an industry conference on search. In his presentation, Jan highlighted what he thinks part of the future of search might be: Vertical Search.

He identified several emerging search markets: local search, image search, desktop search, product search, and personalized search. Of course he also plugged Yahoo's new contextual search, Y!Q. First, I think Y!Q is a fantastic idea, but it's not really vertical search, so I'll leave that for another series on search and context.

So, what's in the news recently? Well...

Google just released Google Desktop 2.0 and moved Google Local out of beta. In light of these two new "vertical search" offerings and Jan's claim to their emerging importance, I think now is an auspicious time to take a closer look.

Update: It looks like Jan and company have been busy. Yesterday, they released a new version of Yahoo local / Maps beta. It has some very nifty features. Yes, I think this is fortuitous timing!

What I think I am going to do is start a series on "Vertical Search." I am going to start with Jan's emerging vertical search areas and compare the offerings from the major players in each. Then I'll give some of my own ideas on vertical search and where I think it is headed.

However, before we dive right in, let's try to define vertical search and put it into some context. What is Vertical Search? What are some of its different definitions, and who are some of its early pioneers? ... next time!

Tuesday, November 1

Google Toolbar Autolink Scanners

So, my hunch was right. I've been digging through the GTB code.

It's a very cool piece of code.

If you unpack the google-toolbar.jar file in the toolbar extension folder you will find a nice directory structure. Inside the content directory you find gtb.js -- this seems to be where most of the cool code powering the toolbar exists.

Here are some of the biggest chunks of code I identified: 1) Google Suggest, 2) AutoLink, and 3) spell checking (utilizing XMLHttp and Google's servers).
Additionally, there is code to store your search history and provide the PageRank visualizer. Nothing too shocking (yet...).

What's cool is that it doesn't seem to be obfuscated. Sure, there are no comments -- but I have seen open source software that is harder to understand.

The auto-link feature is powered by scanners. These scanners look over the words in the current document. Here is a selection of scanners and the words they look for:

Books ("ISBN", "Book", "Publication")
Package tracking: FedEx Ground / Express ("fedex" / "fed"), UPS ("ups"), USPS ("usps")
Vehicle histories ("vin", "vehicle", "auto", "car", "bus", "pickup", "truck", "suv", "bike", "moto", "numbers", "number")
Address Mapping --
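
Here is a rough sketch of how keyword-triggered scanners like these might be organized. The object layout and the function name findTriggeredScanners are made up for illustration and are not taken from gtb.js; only the trigger words come from the list above.

```javascript
// Hypothetical layout: each scanner lists the trigger words it looks for.
// Only the trigger words come from the toolbar; the structure is assumed.
const scanners = {
  books: ["isbn", "book", "publication"],
  packages: ["fedex", "fed", "ups", "usps"],
  vehicles: ["vin", "vehicle", "auto", "car", "bus", "pickup", "truck",
             "suv", "bike", "moto", "numbers", "number"]
};

// Return the names of the scanners whose trigger words appear in the text.
function findTriggeredScanners(text) {
  // Split on anything that is not a letter or digit and drop empty pieces.
  const words = new Set(
    text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)
  );
  return Object.keys(scanners).filter(function (name) {
    return scanners[name].some(function (trigger) {
      return words.has(trigger);
    });
  });
}

console.log(findTriggeredScanners("ISBN 0131103628, shipped via UPS"));
```

A page would presumably only get the more expensive per-scanner parsing once one of its trigger words shows up, but that part is a guess.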

The address scanner was what I was looking for, so I'll cover that in a little more detail -- although all of the above are also interesting.

Address Scanner
Looking at the scanner code, I see some interesting things. First, the list of states is neat. Here are some highlights:
"American Samoa", "AS"
"Federated States of Micronesia", "FM"
"Marshall Islands", "MH"
"Northern Mariana Islands", "MP"
"Palau", "PW"

I didn't even know those were valid US states! You also have all the usual suspects. Nothing international in here.

It can't find certain addresses. Here is why:

var addressStreetScanner = ["street", "st", "ave", "road", "avenue", "rd", "san", "blvd", "dr", "drive", "new", "york", "west", "east", "north", "south", "ct", "park", "way", "los", "city", "parkway", "beach", "main", "boulevard", "santa", "se", "ne", "sw", "nw"];
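
To make the gap concrete, here is a small sketch that checks whether any word in an address appears in the list above. The function name hasKnownStreetWord and the word-by-word matching are assumptions; I have not confirmed that the toolbar consults the list this way.

```javascript
// The street words from the toolbar's list, as quoted above.
const streetWords = new Set([
  "street", "st", "ave", "road", "avenue", "rd", "san", "blvd", "dr",
  "drive", "new", "york", "west", "east", "north", "south", "ct", "park",
  "way", "los", "city", "parkway", "beach", "main", "boulevard", "santa",
  "se", "ne", "sw", "nw"
]);

// Assumed check: does any word in the address match a known street word?
function hasKnownStreetWord(address) {
  return address
    .toLowerCase()
    .split(/[^a-z]+/)
    .some(function (word) {
      return streetWords.has(word);
    });
}

console.log(hasKnownStreetWord("10 Main Street"));   // true
console.log(hasKnownStreetWord("131 Colonie Ctr"));  // false: "ctr" is not listed
console.log(hasKnownStreetWord("Empire State Plz")); // false: "plz" is not listed
```

Abbreviations like "ctr" and "plz" never get a chance, because nothing in the list corresponds to them.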


Is it just me, or do some of those seem very West Coast oriented -- santa? beach? los? Hmm... very interesting indeed.

What is more telling is what is lacking. Take a look at Wikipedia's entry on street names and what it has to say about street name designations. How did Google come up with this list? Perhaps it is direct from Google Maps? Is the list the output of some sort of address text classifier / extractor? Why aren't the Wikipedia street designations in it?

The address parsing problem extends beyond the Google Toolbar. It is a symptom of a larger problem with Google Maps. I went back over the list of addresses that auto-link couldn't extract and tried manually punching them into Google Maps. Did they work? Nope. Google Maps can't handle the address formats I mentioned in my previous post -- so, unsurprisingly, the functionality isn't built into the toolbar either.

Now, just for curiosity's sake -- let's try MapQuest. Most of my problem addresses below work! It finds most of the addresses entered, including nice maps. However, MapQuest still can't handle the Empire State Plaza or Executive Park -- two local landmarks. Competition is good for business -- MapQuest is making upgrades to its UI to compete with Google Maps. It has recently upgraded to nice big maps, even if they aren't yet draggable. Did I mention MapQuest's maps are prettier and easier to read?
Verdict: at least for now, MapQuest is this geek's top choice for mapping / directions.

So what else did I find puttering around in the toolbar JavaScript? Lots of cool and interesting things. There is enough material there to keep me busy for a while. But here is one little bit that I found enlightening:

The Google Toolbar pings back to the Googlesphere daily. In fact, the Google Toolbar sends out a GET request with the user's first search of the day. This isn't a 100% surprise; I think I saw something about usage statistics in the EULA. However, this is the first hard evidence I found of Google collecting my information.
The function: GTB_SendDailyPing()
The ping URL:
var url = "http://toolbar.google.com/version3f?tbbrand=GGGL" + installId + "&" + "dll=" + "1.0.20051012" + "&" + "hl=" + GTB_GoogleToolbarOverlay.languageCode + "&" + "browser=" + encodeURIComponent(window.navigator.appVersion) + firstsearch;

One last ending thought:
As I mentioned earlier, I would like to see Google AutoLink integrated with Google Local and its entity extraction algorithms. It would be so cool! Imagine I am browsing a local restaurant review site and see Pizza Hut -- ok, let's order. I hit the auto-link button. It extracts the entity Pizza Hut from the page using Google's meta data / Local, and it knows my zip code. Google looks up Pizza Hut in Google Local and gives me the option to auto-link to its Google Local listing, complete with my local phone number and reviews. Easier than putzing about Pizza Hut's website trying to find my nearest location!

Auto-Link has a lot of potential, but I think its name could be improved. I had no idea what it did or when I should use it before I saw the advert. Even then, I thought it was just for addresses, but the code showed me lots of cool ways to use it! How about more info on this in the docs, Googlers? Did I miss this somewhere?

Google's AutoLink In Action

Google is heavily promoting its toolbar on its website. Google has even gone so far as polluting its once pristine home page with adverts. It has embarked on an aggressive campaign to boost its desktop market penetration. The advertising and marketing blitz is not surprising in light of the recent partnership with Sun. Today, I want to focus attention on a feature of the toolbar I never used or understood until recently -- AutoLink.



Google's Promotion of the AutoLink Feature

So the question is, what is this auto-link feature and why should we care? In short, it provides a tie into Google Maps from wherever you happen to be. No more find an address, go to MapQuest, go back to the other website, copy address, paste address, etc... You get the picture. Instant access to maps at your fingertips from your current context. Very useful.

As it stands, AutoLink is good, but far from perfect. To test it, I went to a local yellow page website and did a search for cafe. I've got to keep tabs on my competition after all ;-). What is really interesting is what AutoLink misses!


AutoLink In Action

Here are some addresses it misses:
131 Colonie Ctr..
Empire State Plz..
Central & Colvin Ave..
Corner Central & Col ...
Executive Park ...

What I observe is that it handles the pattern: [numeric] [words]+ [street OR st OR avenue OR ave OR boulevard OR blvd OR road OR rd]+ [words (like apartment/suite)]* ... city, state, etc.

It is pretty fragile in that it doesn't seem to handle center / ctr as a street, non-numeric addresses (like Empire State Plaza), or intersections. Parsing addresses correctly is not an easy job to do reliably. I hoped Google might outshine the others, but it looks like they are still human after all ;-). Good work guys, let's get that other 20%!
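
The observed pattern can be approximated with a regular expression. This is only a guess reconstructed from the behavior described above, not the toolbar's actual expression; the names observedPattern and looksLikeAddress are made up, and the optional apartment/suite and city/state parts are left out to keep it short.

```javascript
// A guess at the observed pattern, NOT the toolbar's actual expression:
// a house number, one or more words, then a street designator.
const observedPattern =
  /\b\d+\s+(?:[A-Za-z]+\s+)+(?:street|st|avenue|ave|boulevard|blvd|road|rd)\b/i;

// True when the text contains something shaped like the pattern above.
function looksLikeAddress(text) {
  return observedPattern.test(text);
}

console.log(looksLikeAddress("123 Main Street, Albany, NY")); // true
console.log(looksLikeAddress("131 Colonie Ctr"));             // false: "ctr" is not a designator
console.log(looksLikeAddress("Empire State Plz"));            // false: no house number
console.log(looksLikeAddress("Central & Colvin Ave"));        // false: intersection, no number
```

Under this reading, all three failure modes come from the shape of the pattern itself rather than from any lookup going wrong.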

So how'd they do that? Well, it is a Firefox plugin, so it probably isn't that hard to imagine. All you need to do is write an address parser in JavaScript. If I have time later tonight perhaps I will do a little hacking and see if I can't confirm this 100%. This is a simple, stupid implementation. It's cool because it works on ANY page: intranet, database-driven website, etc... Its scope is global, but it is very limited in features.
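
As a minimal sketch of that simple, stupid approach: scan the page text for address-shaped strings and build a map link for each one. Everything here is an assumption for illustration, including the simplified pattern, the autoLink function name, and the maps URL format.

```javascript
// Assumed, simplified address pattern (number, words, street designator).
const addressPattern =
  /\b\d+\s+(?:[A-Za-z]+\s+)+(?:street|st|avenue|ave|boulevard|blvd|road|rd)\b/gi;

// Build a map link for each address-like string found in the page text.
// The maps URL format below is an assumption, used only for illustration.
function autoLink(pageText) {
  return Array.from(pageText.matchAll(addressPattern), function (match) {
    return {
      address: match[0],
      url: "http://maps.google.com/maps?q=" + encodeURIComponent(match[0])
    };
  });
}

console.log(autoLink("Visit us at 12 Central Ave or call ahead."));
```

Because it only needs the text of the page, a parser like this runs anywhere the browser can render, which is exactly why its scope is global and its features are thin.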

An alternate and more complex implementation might be to look up the page in Google's meta-data index / Google Local to find the entities with addresses on that page. This would be very powerful because you could match names of companies whose addresses might not appear on the page itself, but that are in Google Local. You could then link these names as well. This would be very, very cool. The downside is that it wouldn't return things for Joe Blow's home because it isn't in Google Local. Hmm, perhaps I might consider hacking around with the Google Local API and see if I can't whip something up.
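
A toy version of the lookup idea, using an in-memory table as a stand-in for a local-listings index. The table, its entries, and the findEntities function are all hypothetical; no real Google Local interface is used or implied here.

```javascript
// Stand-in for a local-listings index, keyed by lowercase business name.
// The entries are made up for illustration.
const localIndex = {
  "pizza hut": { name: "Pizza Hut", listing: "example listing for Pizza Hut" },
  "corner cafe": { name: "Corner Cafe", listing: "example listing for Corner Cafe" }
};

// Return the index entries whose names appear anywhere in the page text,
// even when the page itself contains no street address.
function findEntities(pageText, index) {
  const lower = pageText.toLowerCase();
  return Object.keys(index)
    .filter(function (key) {
      return lower.includes(key);
    })
    .map(function (key) {
      return index[key];
    });
}

console.log(findEntities("Tonight we reviewed Pizza Hut downtown.", localIndex));
```

This also shows the downside mentioned above: anything missing from the index, like Joe Blow's home, simply never matches.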