Tuesday, February 7

Search for Dorks

No, it's not a geek dating service; that's already been done. I thought I would mention two new "vertical" search engines targeted at coders and CS geeks.

For a coder's needs, Google and the other major search engines have major shortcomings. For starters, they strip out important punctuation characters such as semicolons, slashes, asterisks, and parentheses. What stinks even worse is when you are looking for a variable name or some other specific piece of code but only know part of the name -- GYM only matches complete words. Taking it one step further, we CS dorks like to search using the power of regular expressions! And don't even get me started on their case (in)sensitivity. The bottom line is that there is major room for improvement, and that's what several new search engines are trying to deliver.
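To make the wish list concrete, here is a minimal sketch of the kind of matching a coder wants: punctuation preserved, partial names matched, regular expressions allowed, and case sensitivity under the user's control. It is only an illustration using Python's standard re module; the function name search_code and the sample lines are made up for this example and have nothing to do with how any of the engines below actually work.

```python
import re

def search_code(lines, pattern, case_sensitive=True):
    """Return (line_number, line) pairs whose text matches the regex pattern.

    Hypothetical helper for illustration only: it scans raw lines, so
    punctuation is kept and partial identifiers match.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    return [(n, line) for n, line in enumerate(lines, start=1) if regex.search(line)]

# Made-up sample "source file" to search.
SAMPLE = [
    "int maxRetryCount = 3;",
    "for (i = 0; i < maxRetryCount; i++) {",
    "    /* retry the request */",
    "    MAXRETRY_LIMIT = i * 2;",
    "}",
]

if __name__ == "__main__":
    # Partial variable name, case-sensitive.
    print(search_code(SAMPLE, r"maxRetry"))
    # Same query, ignoring case.
    print(search_code(SAMPLE, r"maxretry", case_sensitive=False))
    # Punctuation matters: escape it to search for it literally.
    print(search_code(SAMPLE, re.escape("i++")))
    # A regular expression that finds C-style comments.
    print(search_code(SAMPLE, r"/\*.*\*/"))
```

A real engine obviously cannot scan every line of every repository per query; the point is only what the query semantics should look like from the programmer's side.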

Krugle -- Announced yesterday at the DEMO conference. John Battelle has his coverage, here. "Krugle is a search engine for programmers," according to co-founder and CEO Steve Larsen. Ken Krugler, co-founder and CTO, writes, "While current search engines are OK at finding Web pages, they don't crawl source code repositories, archives or knowledge bases, and they don't leverage the inherent structure of code to support the types of searches programmers need."

Exploiting the structural properties of the code would allow programmers to find code more like they would in their IDE than on a web page. More on this later. It also opens up a whole new realm of visualization and UI possibilities that are very exciting! Instead of searching within a site, you could search within a project or within a logical grouping (like a package). Imagine being able to specify a language-dependent type of match -- is it the name of a class, a variable, a method, or does it appear only in the comments? Could the search engine perform "code translation", where you translate Java into C# and vice versa? That would be very awesome, especially if you were trying to learn a new language.
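Here is a rough sketch of what "specify the type of match" could mean in practice. It uses Python source and Python's standard ast and tokenize modules purely because they make a short example; the function classify_matches and the sample code are hypothetical, and this is not a description of how Krugle does it.

```python
import ast
import io
import tokenize

def classify_matches(source, name):
    """Return sorted (line, kind) pairs for places where `name` shows up.

    kind is "class" or "function" for definitions, "reference" for a use
    of the exact name in code, and "comment" when a comment mentions it.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == name:
            hits.append((node.lineno, "class"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            hits.append((node.lineno, "function"))
        elif isinstance(node, ast.Name) and node.id == name:
            hits.append((node.lineno, "reference"))
    # Comments are not part of the syntax tree, so read them from the token stream.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and name in tok.string:
            hits.append((tok.start[0], "comment"))
    return sorted(hits)

# Made-up sample source to search.
SAMPLE = '''\
class Index:
    pass

def search(query):
    # search the Index for query
    index = Index()
    return index
'''

if __name__ == "__main__":
    for line, kind in classify_matches(SAMPLE, "Index"):
        print(line, kind)
```

With results labeled this way, a UI could offer filters like "definitions only" or "comments only", which a plain text index cannot do.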

In addition to code, Krugle also searches technical articles, bug reports, documentation, standards, etc. After all, when you've got to navigate this huge API or, even worse, this one, there's got to be a better way to find the information you need.

The last major feature is the social aspect of search, which I think is more hype than substance at this point, but I could be wrong. It's an interesting idea that could open up some very interesting possibilities. In Krugle, users can comment on and tag results and share them, in their words: "save, annotate, and share your search results with others."

It is currently in a closed beta, but it is scheduled for release at the O'Reilly ET conference in March. You can sign up for the beta if you are really interested in getting a sneak peek.

Koders -- A code search engine that has been around for about a year. Recently, in December, it released plug-ins that integrate it with Eclipse and Visual Studio. The plug-ins use its SmartSearch™ technology to find and recommend code similar to the code you are currently writing or viewing in your editor. In their words:
Koders.com helps developers navigate the rich but fragmented open source landscape by indexing thousands of open source software projects and more than 190 million lines of code at leading universities, consortiums and organizations including Apache, Mozilla, Novell Forge, SourceForge, and others.
There is still lots of opportunity for improvement. I'm not too impressed by the results for my test query, "Lucene", a popular open source IR library. Lucene's primary language is Java, and yet the first page of results is exclusively C. There is a nice feature that lets you restrict results by language, but even restricting to Java does not bring up any of the code on the Lucene project site. Instead, the first result is from some person's thesis: br.ufpe.liber.theses.examples.lucene.

The Eclipse integration is a great step in the right direction. I'll have to give the plug-in a try and give it a fair shot. Search inside the application is where the future of search is heading. After all, it provides a better grasp of the ever elusive "context" whose absence SEs always seem to lament. This is a great feature to keep an eye on. There will be a lot more of this in the future.

Prospector Tool
I mentioned previously that it would be really cool if a search engine could exploit the structural properties of code to provide better SERPs. While not technically a search engine, one software engineering tool that I have been fascinated with is the Prospector tool, which allows developers to mine code for snippets and examples. Have a file and need to read its input? Easily find and display other uses of that class as you type within Eclipse. In the authors' words, "Prospector scans and analyzes APIs and bodies of existing application code in advance, and then synthesizes code snippets on the fly in response to programmer queries, solving problems in seconds that otherwise take hours of searching documentation."

I highly recommend checking out David Mandelin's homepage for papers and presentations on code mining. Very neat research. Might be good for the developers of Krugle to look at.

It's an exciting time to be a programmer. The search engines described above and the Prospector tool are new and still immature. They are still in beta (what isn't!). It will be exciting to see how they develop over the coming year or two. I'm looking forward to some real innovation.

That ends our search for dorks. It's getting late, and my supply of Penguins has run out. Time to go into Suspend Mode.