[Noisebridge-discuss] uClassify library mashup? (with prize!)

Trochee trochee at gmail.com
Sun Dec 21 22:18:32 UTC 2008


for machine-learners: here's an interesting data challenge.

we need not use uClassify -- in fact, we may be able to beat them --
but it's a great collection of suggested questions.

-jeremy

Sent to you by Trochee via Google Reader:

uClassify library mashup? (with prize!)
via Thingology (LibraryThing's ideas blog) by Tim on 12/21/08

I keep up with the Museum of Modern Betas* and today it found
something wonderful: uClassify.

uClassify is a place where you can build, train and use automatic
classification systems. It's free, and can be handled either on the
website or via an API. Of course, this sort of thing was possible
before uClassify, but you needed specialized tools. Now anyone can do
it—on a whim.

Their examples are geared toward the simple:
- Text language. What language is some text in?
- Gender. Did a man or a woman write the blog? It was made for
genderanalyzer.com (It's right only 63% of the time.)
- Mood.
- Which classical author is your text most like? Used on oFaust.com
(this blog is Edgar Allan Poe).

Where did I lose the librarians? At "mood"?
But wait, come back! The language classifier works very well. It
managed to suss out Norwegian, Swedish and Dutch reviews of The
Hobbit.** So what if the others are trivial? The idea is solid. Create
a classification. Feed it data and the right answer. Watch it get
better and better.
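
The train-then-classify loop described above can be sketched in a few
lines of Python. This is not uClassify's API or its algorithm (the post
describes neither); it is a stand-in multinomial naive Bayes classifier,
and the class name, method names and toy book descriptions are all
invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        """Feed the classifier one example together with its right answer."""
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # prior for this label
            n_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, made up for this sketch.
clf = NaiveBayes()
clf.train("a wizard and a hobbit set out on a quest to slay the dragon",
          "fiction")
clf.train("the detective followed the murderer through the foggy streets",
          "fiction")
clf.train("a history of the french revolution and its causes", "non-fiction")
clf.train("an introduction to statistics with worked examples", "non-fiction")

print(clf.classify("the hobbit and the dragon"))          # fiction
print(clf.classify("an introduction to french history"))  # non-fiction
```

Four examples are far too few to trust; the point is only that each
call to train() adds evidence, so guesses on unseen text should improve
as more labelled examples go in.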

Now, I'm a sceptic of automatic classification in the library world.
There's a big difference between spam/not-spam and, say, giving a book
Library of Congress Subject Headings. But it's worth testing. And, even
if "real" classification is not amenable to automatic processes, there
must be other interesting book- and library-related projects.

The Prize! So, LibraryThing calls on the book and library worlds to
create something cool with uClassify by February 1, 2009 and post it
here. The winner gets Toby Segaran's Programming Collective
Intelligence and a $100 gift certificate to Amazon or IndieBound. You
can do it by hand or programmatically. If you use a lot of LibraryThing
data, and it's not one of the sets we release openly, shoot me an email
about what you're doing and I'll give you the green light.

Some ideas. My idea list...
- Fiction vs. Non-Fiction. Feed it Amazon data, Common Knowledge or LT
tags.***
- DDC. Train it with Amazon's DDC numbers and book descriptions. Do ten
thousand books and see how well it's guessing the rest.
- Do a crosswalk, e.g., DDC to LCC, BISAC to DDC, DDC to Cutter,
etc.

Merry data-driven Christmas!

*A website that tracks new "betas." Basically, it tracks new web 2.0
apps. It also keeps tabs on their popularity, according to Delicious
bookmarks. LibraryThing is now number 12, beating out Gmail. Life isn't
fair.
**Yes, we're going to get it going for reviews on the site itself. Give
us some time. Cool as it is, we're pretty busy right now. Note: You
can't give it the URL alone. You have to give it the text of the review.
***We may do this with tags. We already do it very crudely, using it
only for book recommendations.