Thursday, January 29

NY Times Open Blog and 20 years of articles available

The NY Times started an Open blog to share information about their efforts to free more of their data.

They just announced an API to access their best-seller list.

I also wanted to remind readers about the Times' Annotated Corpus. From the introduction:
The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff at
You can read the full description on the LDC (Linguistic Data Consortium) website.

For ideas on what to do with the data, a good starting point is the Stanford data mining course offered this winter.
