Wav -> Dsp -> Components
440 Forums > English > Mac Music > Music Software > Development

I am developing an MLP (multi-layer perceptron) artificial neural network application, but am now at the stage of using a DSP toolkit to remove certain traits (components) from a .wav file.

I am developing in C++ in XCode on Panther.

Can anyone suggest a decent toolkit to use that will easily allow me to give it a wav and extract numerical components?

Best Regards,
Chris Tingley
What do you define as "numerical components", and what kind of traits do you want to remove?

You can pretty easily use AudioUnits to chain DSP effects together. Once the chain is set up, you just need to define one callback that feeds the samples into the chain, and another callback that is called with the processed audio at the end of the chain.
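To show the shape of that flow without any Core Audio specifics, here is a framework-free sketch in C++ (all type and function names below are made up for illustration, not Apple's API): a source callback fills a buffer, each "unit" in the chain processes it in place, and a sink callback receives the result.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for an AudioUnit-style chain: a source callback
// supplies samples, each processor transforms them, a sink consumes them.
using Buffer    = std::vector<float>;
using Source    = std::function<void(Buffer&)>;        // fills the buffer
using Processor = std::function<void(Buffer&)>;        // in-place DSP stage
using Sink      = std::function<void(const Buffer&)>;  // receives the result

void runChain(Source source, const std::vector<Processor>& chain,
              Sink sink, std::size_t frames) {
    Buffer buf(frames, 0.0f);
    source(buf);                    // the "feed the samples" callback
    for (const auto& p : chain)     // each DSP effect in the chain
        p(buf);
    sink(buf);                      // the "processed audio" callback
}
```

In real Core Audio the source side is registered on the unit (e.g. via kAudioUnitProperty_SetRenderCallback); the sketch only mirrors the overall flow, not the API.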

I am after numerical values for things such as:
  • Loudness (RMS)
  • Brightness
  • Pitch
  • Bandwidth
  • Average frequency
  • Harmonicity
But I want to research several different characteristics to find out which ones work best for the classification problem. At the moment, I am hoping to differentiate between composers.
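For reference, the easier items on that list only take a few lines each. Here is a sketch (helper names are my own, not from any toolkit) of RMS loudness over a block of samples, and an "average frequency" computed as a spectral centroid given a set of magnitude bins:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMS loudness of a block of samples.
float rms(const std::vector<float>& x) {
    double acc = 0.0;
    for (float s : x) acc += static_cast<double>(s) * s;
    return x.empty() ? 0.0f : static_cast<float>(std::sqrt(acc / x.size()));
}

// "Average frequency" as a spectral centroid: the magnitude-weighted mean
// of the bin centre frequencies (mags[k] corresponds to k * binHz).
float spectralCentroid(const std::vector<float>& mags, float binHz) {
    double num = 0.0, den = 0.0;
    for (std::size_t k = 0; k < mags.size(); ++k) {
        num += k * binHz * mags[k];
        den += mags[k];
    }
    return den > 0.0 ? static_cast<float>(num / den) : 0.0f;
}
```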

Can you tell me more about this "AudioUnits"? I have downloaded the SDK off the Apple website - but am still new to programming on the Mac. Are there any examples of using this SDK to open wav files and get parameters from them?

Any help is appreciated!

Being not only a musician, but also a developer with lots of experience in neural networks AND dsp/audio (separately, but still), I can tell you that you won't find the set of "numerical parameters" that you seek, especially not a set that is discriminant enough to feed a neural network.

Oh yeah, the easy ones are, well, easy: loudness, bandwidth, and average frequency. But "brightness" is totally subjective, and for "pitch" you can get fairly good results on simple melodies (assuming a clean sound without too many effects), but as soon as you add polyphony it just collapses.
Harmonicity without an accurate "pitch" is totally out of the question.
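To see why: a naive autocorrelation pitch estimator (a sketch, assuming a clean monophonic input; the names are mine) just picks the lag with the strongest self-similarity. Add a second note and several lags score highly, so the winner stops meaning "the" pitch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive autocorrelation pitch estimate. Works tolerably on a clean,
// monophonic tone; collapses under polyphony because multiple lags then
// compete and the maximum no longer identifies a single pitch.
float estimatePitch(const std::vector<float>& x, float sampleRate,
                    std::size_t minLag, std::size_t maxLag) {
    std::size_t bestLag = minLag;
    double bestScore = -1e30;
    for (std::size_t lag = minLag; lag <= maxLag && lag < x.size(); ++lag) {
        double score = 0.0;
        for (std::size_t i = 0; i + lag < x.size(); ++i)
            score += static_cast<double>(x[i]) * x[i + lag];
        if (score > bestScore) { bestScore = score; bestLag = lag; }
    }
    return sampleRate / static_cast<float>(bestLag);
}
```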

So, not wanting to rain on your parade, but if you managed to make a program that can JUST discriminate -reliably- polyphony in a regular piece of music, you'd probably already have patent material.

So as for picking the composer... blink.gif
Well, I must say that doesn't sound too positive! Discriminating between composers/artists was always the goal - but for my project, I never thought I would be able to extract enough features to make this possible.

I did believe, however, that I would be able to extract enough features to distinguish between things that are completely musically different, e.g. Rock and Classical. Of course, at that level you could say that the network was distinguishing between composers - for that example: Black Sabbath and Mozart!! biggrin.gif

As I mentioned, I am not expecting amazing patent-worthy results - I am sure that people with a lot more time, and much bigger budgets, have researched this further than I have. Indeed - I have read many academic papers on this exact subject!

As my project stands, I have a very fast and reliable MLP toolkit that I have written. It trains very fast, and obviously running the network once trained is exceedingly quick (it could be done in real time while "listening to the music"?? blink.gif ).
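For context on why the trained net is so cheap to run: a forward pass is just a few multiply-accumulates per layer. A minimal sketch of one dense sigmoid layer (not my actual toolkit, just the idea):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One dense MLP layer: y = sigmoid(W x + b).
// weights is row-major: weights[j * in + i] connects input i to unit j.
std::vector<float> denseLayer(const std::vector<float>& x,
                              const std::vector<float>& weights,
                              const std::vector<float>& bias,
                              std::size_t in, std::size_t out) {
    std::vector<float> y(out);
    for (std::size_t j = 0; j < out; ++j) {
        double a = bias[j];
        for (std::size_t i = 0; i < in; ++i)
            a += weights[j * in + i] * x[i];              // multiply-accumulate
        y[j] = 1.0f / (1.0f + std::exp(-static_cast<float>(a)));  // sigmoid
    }
    return y;
}
```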

Again, I ask if there are any resources out there that can help me on my quest! Especially this AudioUnits, which looks like it could be very useful. All I wish to do is pick out a few slapped-together features which I can shove into my network.

Of course, I will be publishing an academic paper on my findings and anyone willing to help will be fully acknowledged! tongue.gif
You can get information on AudioUnits on the Apple website, and also in the /Developer/Examples folder if you have installed the Dev Tools. There are various samples on how to set up AudioUnit chains; it's not complicated.

My suggestion is that you work with MIDI files; there you have the polyphony already in a digestible form (notes, separated by instrument).
However, even with MIDI, I can't really see how you will be able to feed that into a neural net; a neural net is good at learning static patterns, like a retina, but music is primarily time-based, so you can't easily make a reliable and discriminative "pattern" to show to your network.
To do that you would have to make a massive work of 1) extracting "signature" bits of the music, and 2) represent these in a discriminant form.
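As an example of point 2), one crude (hypothetical) way to flatten the time dimension into a static pattern is to summarise each feature's trajectory across all analysis windows by its mean and variance - throwing away the ordering, which is exactly the information that is hard to keep:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Turn a time-varying feature track into a fixed-length vector a static
// net can digest: for each feature dimension, keep the mean and variance
// over all analysis windows. frames[t] is the feature vector in window t.
std::vector<float> summarise(const std::vector<std::vector<float>>& frames) {
    if (frames.empty()) return {};
    const std::size_t dims = frames[0].size();
    std::vector<float> out(2 * dims, 0.0f);
    for (std::size_t d = 0; d < dims; ++d) {
        double mean = 0.0;
        for (const auto& f : frames) mean += f[d];
        mean /= frames.size();
        double var = 0.0;
        for (const auto& f : frames) var += (f[d] - mean) * (f[d] - mean);
        var /= frames.size();
        out[d]        = static_cast<float>(mean); // average level of feature d
        out[dims + d] = static_cast<float>(var);  // how much it moved over time
    }
    return out;
}
```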

For info, I worked on a musical search engine a few years back, where you could sorta whistle a tune and the program would try to get you a match. The problem we had was not really finding matches, but digesting the "relevant" traits of the tunes to index, in order to then do pattern matching.
I was always expecting the 'time domain' problem to be the main difficulty.

Do you think that FFTing my data would give me enough information to go on?

Again, as this is an academic exercise more than anything, it is not a massive problem if it doesn't actually work! Obviously it would be nice for it to work, but the main focus is coming up with an interesting area to research, and researching it well enough to put something together. The bulk of the assessment is in writing the neural net library (and mine works really well tongue.gif)
FFTing won't help you much. The FFT is an amazing tool (as you have probably noticed in my own FFTea!) but it is very, very limited by:
*) its trade-off between time resolution and frequency resolution: the bigger the FFT, the more data you need to feed it, and therefore the less time resolution you get!
*) lack of resolution at low frequencies. Even with a large FFT, anything even remotely bass-like falls in between FFT bins and is pretty much not measurable.
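The low-frequency point is simple arithmetic - bin spacing is sampleRate / N no matter where in the spectrum you look, while musical note spacing shrinks toward the bass. A quick sketch (helper names are mine):

```cpp
#include <cassert>
#include <cmath>

// Frequency resolution of an N-point FFT: bins are spaced sampleRate / N
// apart, uniformly across the whole spectrum.
double fftBinWidthHz(double sampleRate, int fftSize) {
    return sampleRate / fftSize;
}

// Size of one semitone step upward from a given frequency, in equal
// temperament: the next note is a factor of 2^(1/12) higher.
double semitoneStepHz(double freq) {
    return freq * (std::pow(2.0, 1.0 / 12.0) - 1.0);
}
```

At 44100 Hz with a 1024-point FFT the bins are about 43 Hz wide, while the step from A1 (55 Hz) to the next semitone is only about 3.3 Hz - so an entire octave of bass notes lands in one or two bins.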

Have a spin playing with FFTea, you'll quickly learn the limit of the mathematical tool.

Unfortunately I only have a G3, so I won't be able to use your FFTea tool angry.gif I have, however, read quite a bit about the FFT, as it is covered as part of my university degree.

At this point (with 2 weeks left of the assignment) the emerging complexity (as I foresaw happening) is beginning to overwhelm me. I am not sure how much more I am going to be able to do.

I took a look into the audiounits/examples and they mainly appear to generate signals - I have yet to come across an example that loads in an audio file.

As much as I would like to carry on - it's looking like I'm going to have to give up and pick some stupid statistical problem to solve with my net, unless anyone can suggest another audio-based problem that's going to be easier to pre-process. I know genre classification has been done using the same net (and methods) as I am using, but again, I think that is going to require too much pre-processing of data, which I do not have the knowledge to implement or the time to learn!

A pretty disheartened student,