The training dataset that they provide is based on the TREC Million Query Track. The Speller Challenge team annotated a sample of the track's query logs and has made it available for download. The dataset contains 5892 queries: 311 misspelled queries and 1122 queries with some suggested spelling change. Submissions will be evaluated with the Expected F1 (EF1) score, the harmonic mean of expected precision and expected recall.
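To make the metric concrete, here is a simplified sketch of the F1 computation over suggestion sets. Note this is my own illustration, not the official evaluation script: the real EF1 weights gold suggestions by annotator-derived probabilities, which I flatten here into plain sets.

```python
def f1(suggested, gold):
    """Harmonic mean of precision and recall over two suggestion sets.

    A simplified stand-in for EF1: the official metric additionally
    weights each gold spelling by its probability.
    """
    suggested, gold = set(suggested), set(gold)
    hits = len(suggested & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(suggested)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

# A speller that returns one correct and one spurious suggestion
# gets precision 0.5, recall 1.0, F1 = 2/3.
score = f1(["accommodate", "acommodate"], ["accommodate"])
```

Under this flat-set view, returning many low-quality suggestions hurts precision, which is presumably why the challenge uses an expectation rather than letting systems dump every candidate.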
I am a bit skeptical about the training dataset. First, the proportion of misspellings is significantly lower than I would expect. The dataset is also quite small. The website does appear to offer a place for contestants to share data, the "Team datasets" section, so it's possible that some teams will annotate a larger dataset. Also, the non-misspelling "suggestions" are not clearly described; they need more explanation, and clearer differentiation from query reformulations. To put this into context, here is a very brief overview of query correction.
According to several studies, approximately 10-15% of search queries contain spelling errors. For example, an important paper in the area is "Spelling correction as an iterative process that exploits the collective knowledge of web users". Beyond spelling correction, there has been a recent trend towards handling other types of query reformulations: stemming, substitutions, and query expansion.
If you want an easy way to get started, consider the LingPipe toolkit, which has a tutorial and is offering a special license for the competition. You may also find inspiration in Peter Norvig's 21-line spelling corrector in Python.