[ml] this week: fun with R?

Brian Morris cymraegish at gmail.com
Tue Apr 26 01:11:07 UTC 2011


Could you give an example? Assuming you are indeed facing this situation:
what problem are you working on, and what features are you including or
trying?

In general,

I think that yes, you certainly could include too many. Depending on the
context, could you rank or weight them? Perhaps that would then give you a
hook to optimize the results with.
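
As a concrete hook, here is a minimal sketch in R using the randomForest
package Mike mentioned. The data frame d and its label column y are made
up for illustration:

    # Rank candidate features by random forest importance -- a sketch only.
    # Assumes a data frame d whose column y is the class label and whose
    # remaining columns are candidate features (names invented).
    library(randomForest)

    set.seed(42)
    fit <- randomForest(y ~ ., data = d, importance = TRUE)

    # Mean decrease in accuracy per feature, most useful first
    imp <- importance(fit, type = 1)
    imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]

Features that rank near zero (or negative) would be candidates to drop.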

There may also be clues in the application area.

For instance, with textual data classification, some preprocessing is
usually prudent. Here it may be better to work with word triples rather
than individual words, or even to pick related words in close proximity
rather than strictly adjacent ones. These are ways to reduce the
disambiguation problem (most words have several possible uses, semantic or
syntactic, that should not be treated the same) without resorting to
problematic NLP analysis.
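
As a rough illustration, word triples can be built in a few lines of base
R (txt here is just an assumed lowercase input string):

    # Turn a document into word triples (trigrams) -- a sketch only.
    words <- strsplit(gsub("[[:punct:]]", "", tolower(txt)), "\\s+")[[1]]
    n <- length(words)
    trigrams <- if (n >= 3) {
      paste(words[1:(n - 2)], words[2:(n - 1)], words[3:n])
    } else {
      character(0)
    }

So "the quick brown fox" yields "the quick brown" and "quick brown fox".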

Other forms of preprocessing reduce the data set itself, either via an
understanding of the application area (e.g. news articles vs. scientific
abstracts) or via a mathematical/statistical method that eliminates less
important data points, such as LSI (latent semantic indexing), which also
makes the computation faster and less susceptible to numerical error.
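
In R, a crude LSI-style reduction is just a truncated SVD of the
term-document matrix (tdm and k = 50 below are assumptions for
illustration, not recommendations):

    # Sketch: project documents into a k-dimensional latent space.
    # tdm is an assumed terms-by-documents matrix of counts/weights.
    k <- 50
    s <- svd(tdm, nu = k, nv = k)
    # Each column of doc_coords is one document in the reduced space
    doc_coords <- diag(s$d[1:k]) %*% t(s$v)

Document similarities can then be computed on doc_coords instead of the
full, noisy matrix.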

On Mon, Apr 25, 2011 at 5:37 PM, Ben Weisburd <ben.weisburd at gmail.com> wrote:

> Hi Mike,
> I'm looking for some help with these topics, so if somebody would be
> willing to talk about them it would be much appreciated:
> - feature selection for binary classification (or any classification) -
> when you're just starting to work on a problem and have some ideas about
> possible features, how do you decide which features are worth including?
> Let's say you don't care about computational cost - should you just include
> all the features you can think of? Or can some features actually hurt
> classification performance (let's say you're using SVMs)?
> - iteratively improving performance - let's say you've picked a training set
> of positive and negative examples, optimized meta-parameters through cross
> validation, trained your classifier and run it to get some predictions. When
> you look at the predictions, you see some that you think should have been
> predicted the other way (based on your intuitive understanding of the data).
> What should you do?
>
> -Ben
>
> On Mon, Apr 25, 2011 at 2:49 PM, Mike Schachter <mike at mindmech.com> wrote:
>
>> Does anyone want to present something this week or have
>> a specific thing they'd like to talk about? If not, how about
>> we just meet up and mess around with R? I'd like to get
>> random forests going with some example code:
>>
>> http://cran.r-project.org/web/packages/randomForest/index.html
>>
>>  mike