<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-18315968.post6265425253909351506..comments</id><updated>2011-10-21T05:55:33.266-04:00</updated><category term='lingpipe'/><category term='nlp'/><category term='information retrieval'/><category term='java'/><category term='information extraction'/><category term='stemming'/><category term='personalization'/><category term='software'/><category term='local community'/><category term='chandler'/><category term='open source'/><category term='local search'/><category term='google'/><title type='text'>Comments on Jeff's Search Engine Caffè: Open Source Search Engines, Retrieval Tools and Li...</title><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.searchenginecaffe.com/feeds/6265425253909351506/comments/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default'/><link rel='alternate' type='text/html' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html'/><author><name>jeff.dalton</name><uri>http://www.blogger.com/profile/12887721174386884522</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='31' src='http://1.bp.blogspot.com/-BQPIreWshSg/Tf-6pG_XoCI/AAAAAAAAACs/0kJUPQH9tQI/s220/tw-32-sm.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>7</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-18315968.post-3421430950950096230</id><published>2011-10-21T05:55:33.266-04:00</published><updated>2011-10-21T05:55:33.266-04:00</updated><title type='text'>On the MG4J subject you might mention Mímir -- htt...</title><content type='html'>On the MG4J subject you might mention Mímir -- http://gate.ac.uk/family/mimir.html -- which layers annotation structure and semantic search (via an RDF repository) on top of MG4J.&lt;br /&gt;&lt;br /&gt;A nice list -- thanks!</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/3421430950950096230'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/3421430950950096230'/><link rel='alternate' type='text/html' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html?showComment=1319190933266#c3421430950950096230' title=''/><author><name>Hamish Cunningham</name><uri>http://www.blogger.com/profile/14849126066243170567</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html' ref='tag:blogger.com,1999:blog-18315968.post-6265425253909351506' source='http://www.blogger.com/feeds/18315968/posts/default/6265425253909351506' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' 
value='pid-2069739547'/></entry><entry><id>tag:blogger.com,1999:blog-18315968.post-1585433318035595440</id><published>2008-05-23T11:38:00.000-04:00</published><updated>2008-05-23T11:38:00.000-04:00</updated><title type='text'>Jeff,&lt;br&gt;&lt;br&gt;Thanks for putting this site together...</title><content type='html'>Jeff,&lt;BR/&gt;&lt;BR/&gt;Thanks for putting this site together - very informative. &lt;BR/&gt;&lt;BR/&gt;I wanted to get your thoughts on one of my requirements:&lt;BR/&gt;&lt;BR/&gt;I am trying to build a search application that would require the following features of a search engine:&lt;BR/&gt;&lt;BR/&gt;1. Ability to handle federated searches (collating search results, ranking, etc.)&lt;BR/&gt;&lt;BR/&gt;2. The federated searches have to support web services to access data from multiple sources and in some cases be able to go directly against a database.&lt;BR/&gt;&lt;BR/&gt;I am looking for a pluggable Java-based open source solution. Would you have any thoughts on this?&lt;BR/&gt;&lt;BR/&gt;&lt;BR/&gt;Thanks...</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/1585433318035595440'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/1585433318035595440'/><link rel='alternate' type='text/html' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html?showComment=1211557080000#c1585433318035595440' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html' 
ref='tag:blogger.com,1999:blog-18315968.post-6265425253909351506' source='http://www.blogger.com/feeds/18315968/posts/default/6265425253909351506' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-2074533873'/></entry><entry><id>tag:blogger.com,1999:blog-18315968.post-5251194234423568980</id><published>2007-04-06T15:57:00.000-04:00</published><updated>2007-04-06T15:57:00.000-04:00</updated><title type='text'>TopX would be another engine from the research wor...</title><content type='html'>TopX would be another engine from the research world that you could add...&lt;BR/&gt;&lt;BR/&gt;It is now available at:&lt;BR/&gt;&lt;BR/&gt;http://topx.sourceforge.net/</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/5251194234423568980'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/5251194234423568980'/><link rel='alternate' type='text/html' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html?showComment=1175889420000#c5251194234423568980' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html' ref='tag:blogger.com,1999:blog-18315968.post-6265425253909351506' source='http://www.blogger.com/feeds/18315968/posts/default/6265425253909351506' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' 
value='pid-86954303'/></entry><entry><id>tag:blogger.com,1999:blog-18315968.post-6269041758910525967</id><published>2007-03-30T20:42:00.000-04:00</published><updated>2007-03-30T20:42:00.000-04:00</updated><title type='text'>Do you know of existing projects that are dedicate...</title><content type='html'>Do you know of existing projects that are dedicated to crawling only certain document types: PDF, DOC and PPT?&lt;BR/&gt;&lt;BR/&gt;I'm interested in building a web site that would allow the community of researchers, lawyers, etc. to directly search these document types.&lt;BR/&gt;&lt;BR/&gt;I know Google has a filetype: filter, but I'd like to do more around the documents, a la Digg: bookmarking them, discussing them, etc.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/6269041758910525967'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/6269041758910525967'/><link rel='alternate' type='text/html' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html?showComment=1175301720000#c6269041758910525967' title=''/><author><name>laurent</name><uri>http://www.blogger.com/profile/05461623662714150435</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html' ref='tag:blogger.com,1999:blog-18315968.post-6265425253909351506' source='http://www.blogger.com/feeds/18315968/posts/default/6265425253909351506' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' 
value='pid-617839221'/></entry><entry><id>tag:blogger.com,1999:blog-18315968.post-7892470858988801477</id><published>2007-03-30T12:32:00.000-04:00</published><updated>2007-03-30T12:32:00.000-04:00</updated><title type='text'>Hi Jeff,&lt;br&gt;&lt;br&gt;Thanks for posting this list. Two ...</title><content type='html'>Hi Jeff,&lt;BR/&gt;&lt;BR/&gt;Thanks for posting this list. Two quick comments...&lt;BR/&gt;&lt;BR/&gt;1. I'd also mention that Lucene is used by Solr, a cool enterprise search server (used by CNET, Krugle and others).&lt;BR/&gt;&lt;BR/&gt;2. The maximum usable size of a single Lucene index has many free variables. For a typical Nutch-generated index, I think the upper bound of 10-20M documents (assuming standard hardware) is about right. Document size, number of fields, complexity of the query, and amount of RAM are among the factors that can make this number go up or down.&lt;BR/&gt;&lt;BR/&gt;Lucene does support merging results from multiple indexes, and adjusting for IDF skew in the process. The main problem here (IMO) with effectively using this support is that the operational support (e.g. 
code/scripts for managing federated searchers) doesn't really exist.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/7892470858988801477'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/7892470858988801477'/><link rel='alternate' type='text/html' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html?showComment=1175272320000#c7892470858988801477' title=''/><author><name>Ken Krugler</name><uri>http://krugle.com</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html' ref='tag:blogger.com,1999:blog-18315968.post-6265425253909351506' source='http://www.blogger.com/feeds/18315968/posts/default/6265425253909351506' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-289508537'/></entry><entry><id>tag:blogger.com,1999:blog-18315968.post-7908445155612531221</id><published>2007-03-29T11:17:00.000-04:00</published><updated>2007-03-29T11:17:00.000-04:00</updated><title type='text'>Hi Otis,&lt;br&gt;&lt;br&gt;I talked to some of the Amazon dev...</title><content type='html'>Hi Otis,&lt;BR/&gt;&lt;BR/&gt;I talked to some of the Amazon developers at SIGIR 2006.  
They were giving out free t-shirts at their booth :-).</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/7908445155612531221'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/7908445155612531221'/><link rel='alternate' type='text/html' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html?showComment=1175181420000#c7908445155612531221' title=''/><author><name>jeff.dalton</name><uri>http://www.blogger.com/profile/12887721174386884522</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://photos1.blogger.com/img/267/8468/100/images.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html' ref='tag:blogger.com,1999:blog-18315968.post-6265425253909351506' source='http://www.blogger.com/feeds/18315968/posts/default/6265425253909351506' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-1997369634'/></entry><entry><id>tag:blogger.com,1999:blog-18315968.post-1013916861862933049</id><published>2007-03-29T08:57:00.000-04:00</published><updated>2007-03-29T08:57:00.000-04:00</updated><title type='text'>Oh, I didn't know Amazon's search inside the book ...</title><content type='html'>Oh, I didn't know Amazon's search inside the book uses Lucene.  
Where did you see they use Lucene for that?&lt;BR/&gt;&lt;A HREF="http://www.simpy.com/" REL="nofollow"&gt;Simpy&lt;/A&gt; uses Lucene, too, of course.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/1013916861862933049'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18315968/6265425253909351506/comments/default/1013916861862933049'/><link rel='alternate' type='text/html' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html?showComment=1175173020000#c1013916861862933049' title=''/><author><name>Otis Gospodnetic</name><uri>http://www.simpy.com/</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.searchenginecaffe.com/2007/03/open-source-search-engines-in-java-and.html' ref='tag:blogger.com,1999:blog-18315968.post-6265425253909351506' source='http://www.blogger.com/feeds/18315968/posts/default/6265425253909351506' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-463804634'/></entry></feed>
