[ml] Meeting Notes: Generative Music, Music identification, Restricted Boltzmann Machines, etc.

gershon bialer gershon.bialer at gmail.com
Sat Mar 24 06:17:46 UTC 2012

== Generative Music
We discussed making something for generative music. I'm going to try
to start something with this, and I should push this onto my github.

This should involve:
1) Get the raw audio data from a file into a waveform
PyMir (see https://github.com/jsawruk/pymir) does this by calling ffmpeg.
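As a minimal sketch of step 1, here is a decoder using only the standard library's wave module plus NumPy. It handles only 16-bit mono PCM WAV (PyMir/ffmpeg handle the general case), and `load_wav` is a hypothetical helper, not PyMir's actual API:

```python
import wave

import numpy as np

def load_wav(path):
    """Read a 16-bit mono PCM WAV file into a float waveform in [-1, 1].

    Simplified stand-in for ffmpeg-based decoding: it only handles
    uncompressed WAV, but it shows the decode-to-array step.
    """
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
    return samples / 32768.0, sample_rate
```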

2) Get the spectrogram
Basically, we apply a Hamming window function (see
http://en.wikipedia.org/wiki/Hamming_function) to each frame, and then
do a Fourier transform. This extracts the frequency content of the
sound. You can see this in action at
https://github.com/jsawruk/pymir/blob/master/pymir/audio/transform.py.
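Under the hood, step 2 is just a short-time Fourier transform. A minimal NumPy sketch (the frame size and hop here are illustrative choices, not PyMir's defaults):

```python
import numpy as np

def spectrogram(signal, frame_size=1024, hop=512):
    """Short-time Fourier magnitude: Hamming-window each frame, then FFT."""
    window = np.hamming(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_size] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames, axis=1))
```

For a pure 440 Hz tone sampled at 8000 Hz, the peak lands in bin round(440 * 1024 / 8000) = 56, which is a quick sanity check on the implementation.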

3) Additional pre-processing
Some choices are
  a) MFCC (see http://en.wikipedia.org/wiki/Mel-frequency_cepstrum)
  b) Linear Predictive Coding (LPC)
  c) NMF (see http://en.wikipedia.org/wiki/NMF)
  d) Something better?
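Of these, NMF is the easiest to sketch: the classic Lee-Seung multiplicative updates factor a non-negative matrix (such as a magnitude spectrogram) into non-negative factors W and H. This is a minimal illustration, not production code:

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Factor a non-negative matrix V as V ~= W @ H using Lee-Seung
    multiplicative updates for the Frobenius (least-squares) objective."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # multiplicative updates keep W and H non-negative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Applied to a spectrogram (frequency x time), the columns of W act like spectral templates and the rows of H like their activations over time.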

4) Fit the music to some sort of model for generating the music. The
idea is to predict s_k (the pre-processed sound at time k) from
s_{k-1}, s_{k-2}, ..., s_{k-l} with some lag l.

5) Apply the generative model from step 4 to generate music

6) Invert pre-processing steps to get a new waveform
This may or may not work very well.
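Steps 4 and 5 can be sketched with the simplest such model: a linear autoregression over frames, fit by least squares. Real music would likely need something richer, but this shows the predict-from-lags structure:

```python
import numpy as np

def fit_ar(frames, lag):
    """Fit A so that s_k ~= [s_{k-1}, ..., s_{k-lag}] @ A (step 4).

    frames: array of shape (T, d), one pre-processed frame per row.
    """
    T, d = frames.shape
    # each design row holds the `lag` previous frames, newest first
    X = np.hstack([frames[lag - j : T - j] for j in range(1, lag + 1)])
    Y = frames[lag:]
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A  # shape (lag * d, d)

def generate(A, seed_frames, n_steps):
    """Roll the fitted model forward to produce new frames (step 5)."""
    lag = len(seed_frames)
    out = list(seed_frames)
    for _ in range(n_steps):
        x = np.concatenate(out[-1:-lag - 1:-1])  # newest frame first
        out.append(x @ A)
    return np.stack(out[lag:])
```

On a sequence that actually follows a linear recurrence (say s_k = 0.5 s_{k-1} + 0.3 s_{k-2}), the fit recovers the coefficients exactly, which is a useful unit test for the lag bookkeeping.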

== Music Identification
Another interesting project is echoprint. The echoprint project (see
http://echoprint.me/) has code for fingerprinting music. This involves
some sort of preprocessing and then binning. It might be interesting
to improve this.

The relevant code seems to be at
https://github.com/echonest/echoprint-codegen/tree/master/src in the
SubBandAnalysis and FingerPrint classes. The sub-band class seems to
create a time series of the amplitude of various frequency bands. I
think the FingerPrint class quantizes this data, and applies
MurmurHash. If someone has a better understanding of this, let me know.
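As a toy illustration of the quantize-then-hash idea: quantize each band's energy time series into a few coarse levels, then hash each time step's code vector. Assumptions are flagged in the comments; in particular, `zlib.crc32` is a stand-in for MurmurHash, and quantile binning is a guess at the quantization, not what echoprint-codegen actually does:

```python
import zlib

import numpy as np

def fingerprint(band_energies, n_levels=4):
    """Toy fingerprint of a (time x band) energy matrix.

    Quantize each band into n_levels coarse levels via per-band
    quantiles, then hash the code vector at each time step.
    zlib.crc32 stands in for echoprint's MurmurHash.
    """
    # per-band quantile thresholds -> integer codes in 0..n_levels-1
    qs = np.quantile(band_energies,
                     np.linspace(0, 1, n_levels + 1)[1:-1], axis=0)
    codes = np.sum(band_energies[None, :, :] > qs[:, None, :], axis=0)
    return [zlib.crc32(row.astype(np.uint8).tobytes()) for row in codes]
```

The real system also hashes across time (pairs or runs of codes) so that a match can be localized within a track; this sketch only shows the per-frame binning-and-hashing shape of the computation.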

== Contributing
=== PyMir
It is at https://github.com/jsawruk/pymir and depends on NumPy and
other Python libraries.
Things to do:
* Add more audio pre-processing functions (MFCC, NMF, LPC, etc.)
* Improve documentation
* Add better unit testing
* Better visualization of audio (this should be fairly easy with matplotlib's pyplot)
* Direct bindings to the FFMPEG api
=== A new library for restricted Boltzmann machine deep learning
The idea would be to create a new C/C++ library for doing deep
learning. Theano has some capabilities, but it isn't as fast as it
could be, and it requires a CUDA-capable Nvidia GPU to be fast. Presumably,
this would follow the ideas of http://deeplearning.net/.

For linear algebra operations, we could use Armadillo (see
http://arma.sourceforge.net/) or Eigen.

Boost (see http://www.boost.org/) might be useful for its
pseudo-random number generator (see
http://www.boost.org/doc/libs/1_49_0/doc/html/boost_random.html) and
possibly other things.

You can look over how this is done with Theano in the tutorials at
http://deeplearning.net/.
== Upcoming conferences, contests, etc.
We are looking at entering PyMir in the ACM Multimedia conference in
Japan (see http://www.acmmm12.org/call-for-open-source-software-competition/).
I understand there are some other local conferences relating to this
stuff. If you have details, please send them to the list.

== Next meeting
When does everyone want to meet next?

Gershon Bialer
