The New York Times just announced an API for accessing its best-seller list.
I also wanted to remind readers about the Times Annotated Corpus. From the introduction:
The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff at nytimes.com.
You can read the full description on the LDC website.
For ideas on what you could do with it, start by looking at the Stanford data mining course offered this winter.