The Lingpipe blog has a good rant on Lucene's tokenizer infrastructure.
I have a few Lucene rants of my own that I'll get around to eventually, but the downside is that whenever I start writing about the problems, I feel obligated to submit patches. That's the problem with ranting about open source software: it's (usually) open to user contributions.
I'm reminded of Grant Ingersoll's response to a JavaLobby article offering Six Ways of Improving Lucene. He wrote, "...thanks for the ideas. Hope to see your patches soon!"
Have a Lucene rant and want to fix it? Read the Lucene doc on How To Contribute.