Thursday, April 12

Amazon CloudSearch, Elastic Search as a Service

A9, the search division of Amazon, today announced the release of CloudSearch. Amazon CTO Werner Vogels announced it on his blog, All Things Distributed. AWS also has a new post on the announcement.

For the details and pricing, there is also the official CloudSearch details page.

CloudSearch is a fully managed search service based on Amazon's search infrastructure that provides near-realtime, faceted, scalable search.  The index is stored in memory for fast search and updates.

Dynamic Scaling
What makes A9's offering particularly interesting is its ability to scale dynamically. The architecture of A9's search system, with shards and replicas, is a common and well-understood model. What is unique is how easily the search cluster scales: A9 will automatically add (and remove) search instances and index partitions as the index size grows and shrinks. It will also dynamically add and remove replicas to respond to changes in search request traffic. The exact technical details have not yet been clearly described.

Right now, there is a limit of 50 search instances. An extra large search instance can handle approximately 8 million 1 KB documents. The assumption appears to be that the documents are quite small (e.g. product documents). To put it in perspective, a rough rule of thumb for web documents is approximately 10 KB each. Given this, it translates into roughly 800k web documents per server * 50 servers = 40 million web documents. This is not for building large-scale web search, yet. However, it should be more than enough for most enterprise e-commerce and site-search applications.
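
The back-of-the-envelope estimate above can be written out as a short Python sketch. The instance limit and the 8 million figure come from the numbers quoted in this post; the assumption that capacity scales inversely with average document size is mine, and the function name is only illustrative.

```python
# Figures quoted in the post for an extra large search instance.
DOCS_PER_INSTANCE_AT_1KB = 8_000_000
MAX_INSTANCES = 50


def max_documents(avg_doc_kb: float) -> int:
    """Estimate total index capacity for a given average document size.

    Assumption (not from Amazon): per-instance capacity shrinks
    linearly as the average document size grows.
    """
    docs_per_instance = DOCS_PER_INSTANCE_AT_1KB / avg_doc_kb
    return int(docs_per_instance * MAX_INSTANCES)


if __name__ == "__main__":
    print(max_documents(1))   # small product-style documents
    print(max_documents(10))  # typical web documents
```

With 10 KB documents this gives the 40 million figure above; with 1 KB documents the same limit allows roughly 400 million.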

The real value added by the search engine is in the ranking of results.

The control over search ranking is rudimentary, with a few basic knobs. You can add stopwords, perform stemming, and add synonyms. This is very basic stuff. How you might make more interesting (and important) IR ranking changes is vague. From the article,
Rank expressions are mathematical functions that you can use to change how search results are ranked. By default, documents are ranked by a text relevance score that takes into account the proximity of the search terms and the frequency of those terms within a document. You can use rank expressions to include other factors in the ranking. For example, if you have a numeric field in your domain called 'popularity,' you can define a rank expression that combines popularity with the default text relevance score to rank relevant popular documents higher in your search results.
This indicates that it is possible to boost documents. However, it is unclear how the underlying text search works, or how to boost individual important fields (e.g. name, description).
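
To make the idea of combining popularity with text relevance concrete, here is a minimal Python sketch. It is not CloudSearch's rank expression syntax, which I have not verified; the Doc class, the log damping, and the popularity_weight value are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text_relevance: float  # stand-in for the engine's default text score
    popularity: int        # hypothetical numeric field, as in the quote


def rank_score(doc: Doc, popularity_weight: float = 0.1) -> float:
    """Combine text relevance with a damped popularity boost.

    log1p keeps very popular documents from swamping relevance.
    """
    return doc.text_relevance + popularity_weight * math.log1p(doc.popularity)


def rank(docs: list[Doc]) -> list[Doc]:
    """Return documents ordered from highest to lowest combined score."""
    return sorted(docs, key=rank_score, reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("a", text_relevance=2.0, popularity=0),
        Doc("b", text_relevance=1.8, popularity=100),
        Doc("c", text_relevance=0.5, popularity=10),
    ]
    print([d.doc_id for d in rank(docs)])
```

In this example the slightly less relevant but much more popular document "b" moves ahead of "a", while the weakly relevant "c" stays last.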

For more details on the more advanced query processing needed to make search work in practice, read the post: Query Rewriting in Search Engines from Hugh Williams at eBay. In order to employ these methods, you need log data, which brings me to my next point.

Missing Pieces
A key missing component is a usage-driven framework that improves ranking using queries, clicks, and other indicators of user behavior: a feedback mechanism that changes ranking based on analysis of that data (ideally automatically).
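
As a rough illustration of the kind of signal such a framework would start from, the sketch below computes click-through rates per (query, document) pair from a search log. The log format and function name are hypothetical; nothing like this is part of the CloudSearch announcement.

```python
from collections import defaultdict


def click_through_rates(log):
    """Compute the click-through rate for each (query, doc_id) pair.

    `log` is an iterable of (query, doc_id, clicked) tuples, one per
    time a document was shown for a query. This format is an assumption.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, doc_id, clicked in log:
        key = (query, doc_id)
        impressions[key] += 1
        if clicked:
            clicks[key] += 1
    return {key: clicks[key] / impressions[key] for key in impressions}


if __name__ == "__main__":
    log = [
        ("camera", "d1", True),
        ("camera", "d1", False),
        ("camera", "d2", False),
    ]
    print(click_through_rates(log))
```

A rate like this could then be fed back into ranking as one more numeric field, though a real system would also need to correct for position bias and sparse data.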

Overall, the most compelling aspect of this is the dynamic scaling. It gives people a simple platform that scales transparently for many enterprise search and e-commerce applications.


  1. This announcement is great but disappointing.
    A search service was the last missing piece to build a scalable web product easily on AWS so it's great to get one.
    However, I was really expecting something built on Solr for 2 reasons.
    - It would have been an easy replacement for a lot of existing users.
    - Configuration and extension would have been much more powerful.

    Let's hope that this will come in the future, maybe when Solr 4 is finalized.

  2. I agree that API incompatibility with existing systems such as Solr will reduce the incentive to experiment with this service. As for scalability, it's not clear to me that most enterprise search is so bursty as to need significant back-end scalability. Maybe I am overlooking something.

    What enterprise search definitely needs is a way to handle lots of document formats seamlessly. I would love a service that told me what part of my PDF or Word document matched the query in a way that I could highlight that in the UI.
