[Noisebridge-discuss] Build advice for a new system / heavy cluster GPU AI processing?
sai at saizai.com
Tue Jul 12 03:27:01 UTC 2011
On Mon, Jul 11, 2011 at 22:15, Mike Schachter <mike at mindmech.com> wrote:
> The grid search is your problem! It's unavoidable when you're
> doing cross validation though, because you definitely want the
> parameters that give you the lowest generalization error. You're
> doing cross validation, right?
Of course. That's kinda the main point - I want to know
a) what the best performance is for various parameters of binning,
vectorization method, etc.
b) whether there's some trend that may be interesting in the C/G
params over that, such as narrowness of optimum params, relationship
to bin size, or the like
Cross-validation results are the primary datum. ;-)
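(Aside for the archive: the loop being described — a grid search over SVM
hyperparameters, scored by cross-validation — looks roughly like this in
Python/scikit-learn. This is an illustrative sketch with made-up parameter
values and a toy dataset, not the poster's actual MATLAB setup.)

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Log-spaced grid over the RBF-SVM hyperparameters C and gamma
# (the "C/G params" discussed above); values are illustrative.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
}

# 5-fold cross-validation scores each combination; the winner is the
# one with the best mean held-out accuracy, i.e. lowest estimated
# generalization error.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
```

Inspecting `search.cv_results_` afterwards is one way to look at point (b): how
narrow the optimum is and how the best C/gamma move as other settings change.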
> Although a GPU will help individual instances of training the
> SVM classifier, in general you should parallelize the grid search
> across cores.
Sorry, I should've been clearer - I can easily use all 4 of my cores
using matlabpool (and for that matter multiple remote cores if it's
set up correctly); I just reported the single-core timings for comparison.
> Specifically, train an SVM classifier per hyperparameter
> combination (kernel, bin size, etc).
As in one training per hyperparam combo? If that were possible — i.e.
if I didn't have to retrain the damn thing from scratch for every step
in the grid search — that would drastically cut down my optimization time.
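(For readers following along: each (C, gamma) cell of the grid does still
train from scratch, but the cells are independent, so they parallelize
trivially across cores. A hedged Python sketch of that pattern, using
joblib and a toy dataset rather than anything from the actual experiment:)

```python
from itertools import product

from joblib import Parallel, delayed
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def evaluate(C, gamma):
    # One independent SVM training + CV score per hyperparameter combo;
    # no state is shared between combos, so each can run on its own core.
    score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()
    return (C, gamma), score

# Illustrative grid; n_jobs=-1 fans the combos out over all local cores.
grid = list(product([0.1, 1, 10], [0.01, 0.1, 1]))
results = Parallel(n_jobs=-1)(delayed(evaluate)(C, g) for C, g in grid)
best_combo, best_score = max(results, key=lambda r: r[1])
```

This is the same division of labor matlabpool gives you: parallelism across
grid cells, not within a single training run.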
> Also, SVM kind of sucks for multi-class classification. Have you
> considered random forests?
I'm not familiar with that. Could you give me a pointer?
Ideally I would like to be able to compare multiple different
classifier methods, as that's a large part of what interests me in the
question - e.g. maybe there's some interesting case where some
classifiers are better in one kind of binning and another set are
better for another kind.
Which of course means I still need to run even the slow ones. :-/
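(A sketch of that multi-classifier comparison in Python/scikit-learn, for
anyone reading the archive: score each classifier under the same CV splits
and compare. The digits dataset here is a stand-in for the real data; random
forests handle multi-class natively, with no one-vs-one or one-vs-rest
decomposition, which is part of why they were suggested.)

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Same data, same 3-fold CV splits for every classifier, so the mean
# scores are directly comparable.
classifiers = {
    "svm": SVC(kernel="rbf", gamma="scale"),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {
    name: cross_val_score(clf, X, y, cv=3).mean()
    for name, clf in classifiers.items()
}
```

Extending the dict with more classifiers (including the slow ones) keeps the
comparison fair, since every entry sees identical folds.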