[Noisebridge-discuss] How to mine for a lot of English phrases
noisebridge at saizai.com
Fri Mar 26 22:37:42 UTC 2010
I know someone who already has a giant database of these.
I've BCC'd him on this email; he'll contact you if he's interested.
On Fri, Mar 26, 2010 at 3:30 PM, Micah Lee <micahflee at gmail.com> wrote:
> Hi Noisebridge, I'm working on a cryptogram Android/iPhone game and I
> need to create a large database of short English sentences that make
> sense. Things like popular sayings and quotes are great, or pieces of
> lyrics from songs, or famous lines from plays. They need to be between
> 40 and 84 characters (I'll have to test each phrase to make sure it
> fits the actual max size, which will likely be shorter than 84
> characters due to word-wrapping). I'm hoping to get a large database
> to work with, somewhere around 20,000 phrases.
> I've tried googling for phrase databases but it isn't leading anywhere
> good. I'll probably write some software that scrapes websites like
> wikiquote.org for phrases of that size. It's also important that the
> phrases make sense out of context and are in sensible English, which
> rules out Twitter feeds. Overall I don't have much experience in data
> mining. Anyone have any suggestions?
> Noisebridge-discuss mailing list
> Noisebridge-discuss at lists.noisebridge.net
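
For what it's worth, the length and word-wrap test described in the quoted message could be sketched roughly like this in Python. The 40 and 84 character bounds come from the original post; everything else is an assumption for illustration: the 4-row by 21-column display, the naive sentence splitter, and the function names are all hypothetical, and the real game's wrapping rules may differ.

```python
import re
import textwrap

MIN_LEN = 40     # lower bound from the original post
MAX_LEN = 84     # upper bound from the original post
LINE_WIDTH = 21  # assumption: hypothetical display width in characters
MAX_LINES = 4    # assumption: hypothetical row count (4 * 21 = 84)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Collapse internal runs of whitespace and drop empty pieces.
    return [" ".join(p.split()) for p in parts if p.strip()]


def fits(phrase):
    """True if the phrase meets the length bounds and wraps onto the grid."""
    if not (MIN_LEN <= len(phrase) <= MAX_LEN):
        return False
    # Wrap on word boundaries only; an over-long word stays on its own line.
    lines = textwrap.wrap(phrase, width=LINE_WIDTH, break_long_words=False)
    return len(lines) <= MAX_LINES and all(len(l) <= LINE_WIDTH for l in lines)


def candidate_phrases(text):
    """Return unique sentences from text that pass fits(), in original order."""
    seen = set()
    result = []
    for sentence in split_sentences(text):
        key = sentence.lower()
        if key not in seen and fits(sentence):
            seen.add(key)
            result.append(sentence)
    return result
```

Feeding scraped text through candidate_phrases() would leave only the sentences worth reviewing by hand. Note that a phrase under 84 characters can still fail the wrap check when its words don't pack neatly into the rows, which is the effect the original post anticipates.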