Amazon has a new service, Public Data Sets, which provides free hosting for collections of public data across different domains. This makes it simple to download the data or to perform computation on it using Amazon's EC2 service.
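As a rough sketch of the workflow with the current AWS command-line tooling (the snapshot, volume, and instance IDs below are placeholders, not a real data set):

```shell
# Create an EBS volume from a Public Data Set snapshot (placeholder ID),
# in the same availability zone as the EC2 instance that will use it.
aws ec2 create-volume \
    --snapshot-id snap-0123456789abcdef0 \
    --availability-zone us-east-1a

# Attach the volume to a running instance (placeholder IDs),
# then mount it from inside the instance to read the collection.
aws ec2 attach-volume \
    --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 \
    --device /dev/sdf
```

The point is that the data never has to leave Amazon's network: a group can attach the collection as a volume and run its processing on EC2 instead of downloading terabytes locally.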
Should IR groups be using it, or a similar model, to distribute and process test collections?
For example, there will likely be a billion-document web corpus for TREC 2009. However, there is concern that only a small number of groups have the resources to handle a collection that large.