[Noisebridge-discuss] How to mine for a lot of English phrases
Micah Lee
micahflee at gmail.com
Fri Mar 26 22:30:27 UTC 2010
Hi Noisebridge, I'm working on a cryptogram Android/iPhone game and I
need to create a large database of short English sentences that make
sense. Things like popular sayings and quotes are great, or pieces of
lyrics from songs, or famous lines from plays. They need to be between
40 and 84 characters (I'll have to test each phrase to make sure it
fits the actual max size, which will likely be shorter than 84
characters due to word-wrapping). I'm hoping to get a large database
to work with, somewhere around 20,000 phrases.
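One way to run the length test described above is sketched below. It checks the 40 to 84 character range and then word-wraps the phrase to see whether it fits the screen. The layout numbers (LINE_WIDTH = 21, MAX_LINES = 4, chosen only because 4 × 21 = 84) are hypothetical placeholders for whatever the game's real line width and line count turn out to be. The wrapping uses Python's standard textwrap module.

```python
import textwrap

# Hypothetical screen layout: 4 lines of 21 characters (4 * 21 = 84).
# Replace these with the game's real measurements.
LINE_WIDTH = 21
MAX_LINES = 4
MIN_CHARS = 40
MAX_CHARS = 84


def fits_puzzle(phrase: str) -> bool:
    """Return True if the phrase is MIN_CHARS..MAX_CHARS long and word-wraps
    into at most MAX_LINES lines of at most LINE_WIDTH characters each."""
    phrase = " ".join(phrase.split())  # collapse runs of whitespace
    if not (MIN_CHARS <= len(phrase) <= MAX_CHARS):
        return False
    # break_long_words=False keeps over-long words intact, so a word wider
    # than the screen shows up as an over-long line and is rejected below.
    lines = textwrap.wrap(phrase, width=LINE_WIDTH, break_long_words=False)
    return len(lines) <= MAX_LINES and all(len(line) <= LINE_WIDTH for line in lines)
```

For example, "The only thing we have to fear is fear itself." (46 characters) wraps into three lines and passes, while a 77-character phrase made of six 12-letter words needs six lines and is rejected even though it is under 84 characters.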
I've tried googling for phrase databases but it isn't leading anywhere
good. I'll probably write some software that scrapes websites like
wikiquote.org for phrases of that size. It's also important that the
phrases make sense out of context and are in sensible English, which
rules out Twitter feeds. Overall, I don't have much experience in data
mining. Anyone have any suggestions?
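A rough sketch of the scraping step, assuming the pages have already been downloaded and that each quote sits in an HTML list item (<li>), with attribution lines nested in an inner list, which is an assumption about the page layout that should be checked against real Wikiquote pages. It uses only Python's standard html.parser module. The class and function names are hypothetical, and the sketch only applies the 40 to 84 character filter and removes duplicates; judging whether a phrase makes sense out of context would still need a human pass.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the direct text of every <li>, keeping text from nested <li>
    elements separate (assumed layout: attributions live in nested lists)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one text buffer per currently open <li>
        self.items = []  # finished, whitespace-normalized item texts

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag == "li" and self.stack:
            text = " ".join("".join(self.stack.pop()).split())
            if text:
                self.items.append(text)

    def handle_data(self, data):
        if self.stack:
            self.stack[-1].append(data)


def candidate_phrases(html, min_len=40, max_len=84):
    """Return unique <li> texts whose length is within [min_len, max_len]."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    seen = set()
    result = []
    for item in parser.items:
        if min_len <= len(item) <= max_len and item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

Feeding it a page where a quote is followed by a nested attribution line keeps the quote and drops the shorter attribution, since each <li> is buffered separately.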
micah