[ml] PyMir mp3 playback and STFT processing

Thu Mar 22 16:35:06 UTC 2012

Gershon, preprocessing is very important. For linear
models of neurons, we compute the spectrogram and then
take the logarithm. In a deep net setting, you probably
want a time-lag representation for your input, where the
input is not just the frequencies at time t, but a long
vector composed of the frequencies at time t concatenated
with those at times t-1, t-2, and so forth back to some t-N.
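
A minimal sketch of the time-lag representation described above, using
only NumPy. The function names, the Hann window, the frame length, the
hop size, and the small eps added before the logarithm are all
illustrative assumptions, not anything taken from PyMir or from the
neuron-model code mentioned here.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Log-magnitude STFT with a Hann window; shape (n_frames, n_freqs)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # eps (an assumed constant) avoids log(0) on silent bins.
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)

def time_lag_stack(spec, n_lags):
    """Row k is frame t followed by frames t-1, ..., t-n_lags, where t = k + n_lags."""
    n_frames = spec.shape[0]
    rows = [np.concatenate([spec[t - lag] for lag in range(n_lags + 1)])
            for t in range(n_lags, n_frames)]
    return np.stack(rows)

# Example: one second of a 440 Hz tone at an assumed 8 kHz sample rate.
sample_rate = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)
spec = log_spectrogram(tone)
lagged = time_lag_stack(spec, n_lags=3)
```

Each row of lagged is (N + 1) times as long as a single spectrogram
frame, and the first N frames are dropped because they lack a full
history.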

We can talk more about this tonight, sounds like you're
doing awesome stuff!

  mike

On Wed, Mar 21, 2012 at 11:26 PM, gershon bialer
<gershon.bialer at gmail.com> wrote:
> I forked Jeremy's PyMir code on my github at http://github.com/gersh/pymir.
>
> I followed John's suggestions and I got the ffmpeg encodings to play
> with audiolab. I think there may be some constraints on this API that
> make it less than ideal, if someone wants to get it working with a
> better API.
>
> I added Steven's stft code from stackoverflow to
> pymir/audio/transforms.py. I think it would be good to do stft, istft,
> then play it as a test case even if we lose some quality. If you look
> at the readmp3.py file, I wrote some commented-out test code for
> this, but I don't think it currently works. I wasn't sure what
> parameters to use for encoding, and I haven't had time to really mess
> with it.
>
> It would be good to improve the organization of the test cases in the
> codebase. I haven't used Python all that much, so I don't really know
> what is best practice for Python on this.
>
> There is some cool stuff we could do with machine learning with a
> decent input interface. If the audio is properly pre-processed, I'd
> like to try using the deep learning stuff, which Mike suggested.
> How much pre-processing is required for deep learning? Is it possible
> to just work from the raw audio? Can the pre-processing be
> incorporated into the neural network to allow fuller backpropagation?
>
> On Tue, Mar 20, 2012 at 12:27 AM, John Hurliman <jhurliman at cull.tv> wrote:
>> I wrote a simple ffmpeg wrapper for extracting audio recently
>> (https://github.com/jhurliman/node-pcm). It's a node.js library but here is
>> the relevant ffmpeg command:
>>
>> var ffmpeg = spawn('ffmpeg', ['-i',filename,'-f','s16le','-ac',channels,
>> '-acodec','pcm_s16le','-ar',sampleRate,'-y','pipe:1']);
>>
>> Then read in stdout as a stream of 16-bit signed little endian integers and
>> divide each by 32767.0 to convert to floating point. Hope that helps, and
>> with any luck I'll make it this Thursday.
>>
>> Best,
>> John Hurliman
>>
>>
>> On Mon, Mar 19, 2012 at 10:57 PM, gershon bialer <gershon.bialer at gmail.com>
>> wrote:
>>>
>>> Hi Steve,
>>>
>>> That's cool that you wrote that Stack Overflow answer.
>>>
>>> PyMIR looks like a good start. I see that Jeremy has a nice hack for
>>> importing from ffmpeg. I suppose we could try using ffmpeg's API
>>> directly, although that can be a tricky API to work with. I'd like to
>>> be able to play this at least as a sanity check. I suppose you might
>>> be able to play it with audiolab, but I think that requires converting
>>> from int16 to float. I suppose float might be better for fft and such,
>>> anyway. I tried feeding it back to ffplay with:
>>>   ffmpeg = Popen([
>>>            "ffplay",
>>>            "-i -"],
>>>            stdin=PIPE, stderr=open(os.devnull,"w"))
>>>   ffmpeg.communicate(mp3Array.tostring())
>>> but that doesn't seem to work. What do you think is the best way to do
>>> this?
>>>
>>> MFCC would be cool to work with. Is it invertible? How does it sound
>>> inverted?
>>>
>>> Does NMF give a sparse representation? What is a good reference on NMF?
>>>
>>> Thanks,
>>> Gershon Bialer
>>>
>>> On Mon, Mar 19, 2012 at 4:33 PM, Steve Tjoa <stjoa at izotope.com> wrote:
>>> > Hello Gershon, others,
>>> >
>>> > Lurker here. That happens to be my code and Stack Overflow answer that
>>> > you
>>> > linked to!
>>> >
>>> > Regarding concerns in this email thread:
>>> >
>>> > 1. Despite that "Python in Music" page, the lack of basic, simple
>>> > audio/music processing libraries in Python has motivated my friend
>>> > Jeremy to
>>> > begin a Github repo for that very purpose named PyMIR:
>>> > (http://jsawruk.com/?p=141). Feel free to use or contribute.
>>> >
>>> > 2. In there, you will find an MP3 importer that Jeremy wrote.
>>> >
>>> > 3. I have custom-brewed stuff for audio feature extraction operations,
>>> > including MFCCs. I also have sparse coding and NMF stuff.  If there are
>>> > specific requests that I can fulfill, I will add them to the repo.
>>> >
>>> > Please feel free to ask if you have any questions.
>>> >
>>> > Steve
>>> > http://stevetjoa.com
>>> >
>>> >
>>> > On Sun, Mar 18, 2012 at 10:59 PM, gershon bialer
>>> > <gershon.bialer at gmail.com>
>>> > wrote:
>>> >>
>>> >> Yeah, Thursday would be cool.
>>> >>
>>> >> Friture looks interesting, I'll have to see. I found some code at
>>> >> http://stackoverflow.com/questions/2459295/stft-and-istft-in-python
>>> >> for doing the spectrogram. I couldn't find a good library for importing
>>> >> MP3s into Python. Although, I suppose we can work with wav files for
>>> >> now.
>>> >>
>>> >> On Sun, Mar 18, 2012 at 10:50 PM, Mike Schachter
>>> >> <mschachter at eigenminds.com> wrote:
>>> >> > Hey Gershon,
>>> >> >
>>> >> > Do you want to meet up this Thursday and talk about
>>> >> > time-frequency representations for sound? I'm looking
>>> >> > at various packages in python. One that struck my eye
>>> >> > was a real-time spectrogram package:
>>> >> >
>>> >> > http://tlecomte.github.com/friture/
>>> >> >
>>> >> > Anyone else interested in this kind of stuff too? I could
>>> >> > put something on the calendar and make an official-like
>>> >> > announcement.
>>> >> >
>>> >> >  mike
>>> >> >
>>> >> > On Thu, Mar 15, 2012 at 12:00 PM, Mike Schachter
>>> >> > <mschachter at eigenminds.com> wrote:
>>> >> >> That's awesome Gershon!
>>> >> >>
>>> >> >> I can't come out tonight, but how about we meet
>>> >> >> up next Thursday and have a discussion about using
>>> >> >> deep nets for sound feature extraction? Spectrograms
>>> >> >> are also an invertible feature representation, as long
>>> >> >> as you use overlapping windows for the FFT.
>>> >> >>
>>> >> >>  mike
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Mar 15, 2012 at 11:42 AM, gershon bialer
>>> >> >> <gershon.bialer at gmail.com> wrote:
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> Do you want to meet again tonight?
>>> >> >>>
>>> >> >>> I played a bit with trying to build a generative model for creating
>>> >> >>> music like we were talking about. I also read the papers and looked
>>> >> >>> at
>>> >> >>> the tutorial on deep learning.
>>> >> >>>
>>> >> >>> I think the first step is to find an invertible, sparse feature
>>> >> >>> representation. I think this would be MFCC or some sort of linear
>>> >> >>> predictive coding. I suppose you could then apply some of the deep
>>> >> >>> learning stuff to it for a generative model. Any thoughts?
>>> >> >>> --
>>> >> >>> ---------------------
>>> >> >>> Gershon Bialer
>>> >> >>> _______________________________________________
>>> >> >>> ml mailing list
>>> >> >>> ml at lists.noisebridge.net
>>> >> >>> https://www.noisebridge.net/mailman/listinfo/ml
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> ---------------------
>>> >> Gershon Bialer
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> ---------------------
>>> Gershon Bialer
>>
>>
>>
>
>
>
> --
> ---------------------
> Gershon Bialer