[Noisebridge-discuss] Build advice for a new system / heavy cluster GPU AI processing?

Sai sai at saizai.com
Sat Jul 9 07:45:07 UTC 2011


Hi all.

I've been running a very heavy classification AI project, and well…
it's taking too fucking long to be realistic.[0]

I'm consequently thinking of
a) upgrading my home desktop (it's 3 years old [1]) to something
beefier, and/or
b) using something like EC2's cluster GPU spot instances [2] together
with the MATLAB Parallel Computing Toolbox

I've looked into GPU-enabled libsvm a bit - libsvm is the main thing
I'm currently using - and I've found a couple of variant libraries [3]
that use NVIDIA's CUDA API to get 3-50x speedups over CPU-only
performance. I haven't found any non-beta ones that use OpenCL [4],
and I'm not familiar with that level of programming, so I can't
currently write one myself. I'm willing to learn it if needed, but
that'd be a significant investment.
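For context on where the speedup comes from: as I understand it, these
GPU ports mostly offload the kernel matrix computation, which dominates
training time. A toy NumPy sketch of the RBF kernel matrix they compute
(my own illustration with made-up data, not code from any of those
libraries):

```python
# Sketch: the RBF kernel matrix K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2),
# i.e. the part GPU SVM ports move onto the card. Pure NumPy, toy data.
import numpy as np

def rbf_kernel_matrix(X, gamma):
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clamp tiny negative rounding errors
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # 5 toy samples, 3 features
K = rbf_kernel_matrix(X, gamma=0.5)
print(K.shape)  # (5, 5): one kernel value per sample pair
```

The matrix is symmetric with ones on the diagonal (K(x, x) = exp(0)),
and it's exactly this dense n-by-n computation that scales badly on one
CPU core but maps well onto a GPU.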

AFAICT from talking w/ Ryan, CUDA is NVIDIA-only but more mature and
better supported by classifier libraries, whereas OpenCL is supported
by both Radeon & NVIDIA cards - and, being an open standard, possibly
by more hardware in the future - but is currently less well supported
by the libraries I'm looking at.

Another GPU option, of course, is a mobo that supports multiple
full-size GPU cards, so I could run one of each type.

Unfortunately I've not been following the hardware market at all for
the last 3 years, so I have no idea what the current sweet spots are
for the various combinations of mobo, CPU, GPU, etc.


So, I'd appreciate your advice:

1. Are there any good libraries or methods I may have missed that
would be more efficient for my purposes, or compatible w/ both major
GPU brands? (I'm open to non-SVM / MATLAB stuff too.)

My knowledge of classifier AIs is what I would call basic (though more
advanced than most); I don't have the math chops to really grok the
harder aspects of the linear algebra involved, and I've only studied
it as far as needed to get projects bootstrapped. I would, however, be
interested in learning a great deal more, so pointers to good
textbooks or whitepapers would be appreciated.

2. Is it worth upgrading my system vs renting EC2 instances?

I'd rather have hardware I can keep, and local accessibility of it,
but I'm not sure how much of a premium that'll cost me.

3. If I do decide to upgrade my system, what's the current optimum
"sweet spot" build of reasonably priced hardware?

Probably the biggest constraint is that I primarily run OSX86 (sorry,
I like it a lot more than any other OS I've used); preferably it
should cost less than ~$3k (mind, I already have a perfectly fine
case, displays, sound system, and HDs - we're only talking about
internal hardware).

It has to work as my day-to-day desktop machine (so e.g. driver
compatibility w/ OSX86, Win 7, & Kubuntu), support at least 2 and
preferably more monitors, a sound system, SATA drives, etc. (the usual
stuff), as well as deliver a fair amount of power for the AI
processing. Y'all are fellow hackers, so you probably already have
good ideas of what you'd want out of your own systems, and that's
probably reasonably close to my wants.

Thanks,
Sai



[0] It's an 8-class pattern classification problem on a few thousand
samples of direct-lead acquired neural firing traces, to investigate
the "time code" vs. "pattern code" theories of neural firing, as well
as various aspects of the classification itself, like accuracy /
compute-time tradeoffs.

Testing just one C/gamma pairing of one of the implementations is
taking 17 hours (w/ one CPU core, CPU-bound) - and tuning those two
parameters with a simple hill-climbing grid search means testing a lot
of 'em.

And I want to test the Cartesian cross of several different
vectorizations, bin sizes, and SVM kernels. It's just not feasible at
this speed.
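To make concrete why cores (or cluster nodes) help: every (C, gamma)
cell of the grid is an independent train-and-score run, so the search
is embarrassingly parallel. A Python sketch using an exhaustive grid as
a simplified stand-in for my hill-climbing search, with a fake
objective in place of a real libsvm cross-validation call:

```python
# Sketch: parallel (C, gamma) grid search. evaluate() is a stand-in for
# "train an SVM with these hyperparameters and return CV accuracy"; here
# it's a fake surface that peaks at C=8, gamma=0.125.
import itertools
from concurrent.futures import ProcessPoolExecutor

def evaluate(params):
    C, gamma = params
    # Fake objective; a real one would call libsvm and cross-validate.
    return -((C - 8.0) ** 2 + (gamma - 0.125) ** 2), params

def grid_search(Cs, gammas, workers=4):
    grid = list(itertools.product(Cs, gammas))  # every cell is independent
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, grid))
    return max(results)  # (best score, (C, gamma))

if __name__ == "__main__":
    Cs = [2.0 ** e for e in range(-2, 6)]      # 0.25 .. 32
    gammas = [2.0 ** e for e in range(-5, 2)]  # 0.03125 .. 2
    score, best = grid_search(Cs, gammas)
    print(best)  # (8.0, 0.125)
```

With 56 cells at ~17 hours each, the wall-clock time divides almost
linearly by however many cores or spot instances I can throw at it,
which is the whole appeal of option (b).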

I'm reasonably sure that I'm not doing anything *too* stupid;
basically all the time is being spent in libsvm itself. No swapping or
other bottlenecks in my own code.

[1] Current build:

GIGABYTE GA-EP35-DS3L LGA 775 Intel P35 ATX Intel Motherboard
Intel Core 2 Quad Q6600 Kentsfield 2.4GHz LGA 775 Quad-Core Processor
Model BX80562Q6600
G.SKILL 4GB (2 x 2GB) 240-Pin DDR2 SDRAM DDR2 1066 (PC2 8500) Dual
Channel Kit Desktop Memory Model F2-8500CL5D-4GBPK
MSI NX8800GT 512M OC GeForce 8800GT 512MB 256-bit GDDR3 PCI Express
2.0 x16 HDCP Ready SLI Supported Video Card
Antec Nine Hundred Black Steel ATX Mid Tower Computer Case
PC Power & Cooling Silencer 750 Quad (Red) 750W EPS12V Power Supply

[2] http://aws.amazon.com/ec2/hpc-applications/
http://aws.amazon.com/ec2/spot-instances/

[3] http://mklab.iti.gr/project/GPU-LIBSVM
https://code.google.com/p/multisvm/
http://patternsonascreen.net/cuSVM.html

[4] https://code.ac.upc.edu/projects/nnvect/blog/author/ijurado


