[ml] this week: fun with R?

Mike Schachter mike at mindmech.com
Tue Apr 26 04:22:38 UTC 2011


Hey Ben!

There's a technique people use with random forests to
determine how important a feature is, but it's probably
applicable to any classifier -

Fit your model using all the features and compute your
performance measure (misclassification rate, area under the
ROC curve, whatever it is). Then take a feature of interest
and add random noise to it to mess it up. Re-run your training
algorithm on all the features, including the corrupted one, and
recompute your performance measure. This should give you an
idea of how important that feature is with respect to your
performance measure. Do that for each feature and rank them by
how much damage corrupting them with noise does to your
performance measure.
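
Here's a rough sketch of that in R. The names are hypothetical:
a data frame "df" with a factor column "label" and numeric
features, using the random forest's out-of-bag error as the
performance measure:

    library(randomForest)

    # Performance measure: out-of-bag misclassification rate.
    perf <- function(data) {
      fit <- randomForest(label ~ ., data = data)
      mean(predict(fit) != data$label)  # no newdata = OOB predictions
    }

    baseline <- perf(df)
    features <- setdiff(names(df), "label")

    # Corrupt each feature with noise, refit, and measure the damage.
    damage <- sapply(features, function(f) {
      corrupted <- df
      corrupted[[f]] <- corrupted[[f]] +
        rnorm(nrow(corrupted), sd = sd(corrupted[[f]]))
      perf(corrupted) - baseline
    })

    sort(damage, decreasing = TRUE)  # biggest damage = most important

For what it's worth, randomForest's built-in measure (fit with
importance = TRUE, then call importance(fit)) does something
similar, but it permutes each feature's values on the out-of-bag
samples instead of adding noise, and it skips the refit.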

I interpret your second question as "my model doesn't fit the
intuitive hypothesis I have about my data, why?!?"

Either the model you're using is inadequate, or your intuition
is wrong! To test the first hypothesis, try different types of
classifiers: compare a linear discriminant to an SVM to a
random forest. Do they all make the same weird predictions?
Then maybe your intuition is wrong!
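
A quick sketch of that comparison, again assuming the
hypothetical "df" and "label" from above (MASS for the linear
discriminant, e1071 for the SVM):

    library(MASS)          # lda()
    library(e1071)         # svm()
    library(randomForest)

    fits <- list(
      lda = lda(label ~ ., data = df),
      svm = svm(label ~ ., data = df),
      rf  = randomForest(label ~ ., data = df)
    )

    # Training-set predictions are optimistic; cross-validate for a
    # real comparison. But this is fine for spotting disagreement.
    preds <- list(
      lda = predict(fits$lda, df)$class,
      svm = predict(fits$svm, df),
      rf  = predict(fits$rf, df)
    )

    # Error rate per model; then check whether the models agree
    # on the specific examples that look wrong to you.
    sapply(preds, function(p) mean(p != df$label))

If all three make the same "wrong" predictions on the examples
you care about, the data is probably telling you something.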

  mike



On Mon, Apr 25, 2011 at 5:37 PM, Ben Weisburd <ben.weisburd at gmail.com> wrote:
> Hi Mike,
> I'm looking for some help with these topics, so if somebody would be willing
> to talk about them it would be much appreciated:
> - feature selection for binary classification (or any classification) - when
> you're just starting to work on a problem and have some ideas about possible
> features, how do you decide which features are worth including? Let's say you
> don't care about computational cost - should you just include all the
> features you can think of? Or can some features actually hurt classification
> performance (let's say you're using SVMs)?
> - iteratively improving performance - let's say you've picked a training set
> of positive and negative examples, optimized meta-parameters through cross
> validation, trained your classifier and run it to get some predictions. When
> you look at the predictions, you see some that you think should have been
> predicted the other way (based on your intuitive understanding of the data).
> What should I do?
> -Ben
>
>
>
> On Mon, Apr 25, 2011 at 2:49 PM, Mike Schachter <mike at mindmech.com> wrote:
>>
>> Does anyone want to present something this week or have
>> a specific thing they'd like to talk about? If not, how about
>> we just meet up and mess around with R? I'd like to get
>> random forests going with some example code:
>>
>> http://cran.r-project.org/web/packages/randomForest/index.html
>>
>>  mike
