The Natural Ear for Digital Sound Processing

An alternative to the Fourier Transform

Andy Bosyi, Jun 17, 2017

This is a primitive prototype of the natural ear.

This article explains why I came to it and how it can be better than the Fast Fourier Transform (FFT) in Digital Sound Processing (DSP).

Some of the software development projects I have worked on used the Fourier Transform for waveform analysis.

The projects included sound tone recognition for gun targets and DTMF signals.

But before that, I was keen to get a “picture” of human speech and music harmony.

Recently I started work on an app that plays an accompanying instrument while I play lead guitar.

The problem was to teach the computer to listen to my tempo and keep the musical rhythm in order.

To accomplish this, I applied the Fourier Transform to the first seconds of the Pink Floyd composition “Marooned”.

Then I compared the “picture” to the same composition performed by me. The results were poor until I increased the FFT block size to 8192, so that notes could be recognized at least up to the 6th octave.

This showed the first problem with the Fourier Transform: for a really good analysis you need to increase the block size (and with it the number of frequency bins), and as a result performance goes down, especially in real-time processing.
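The trade-off can be put into numbers with a rough sketch. It assumes the 44100 Hz sample rate used in the R code at the end of the article, and it uses a deliberately simple criterion of my own: a note counts as resolvable once the gap to its neighbouring semitone is wider than one FFT bin. The variable names are illustrative.

```r
fs <- 44100                              # sample rate, as in the article's R code
block_sizes <- c(1024, 2048, 4096, 8192)

# Width of one FFT bin in Hz for each block size
bin_width <- fs / block_sizes

# In equal temperament the next semitone is 2^(1/12) times higher,
# so the gap above a note of frequency f is f * (2^(1/12) - 1).
# Lowest frequency whose semitone gap is still wider than one bin:
lowest_resolved <- bin_width / (2^(1 / 12) - 1)

print(data.frame(block_sizes, bin_width, lowest_resolved))
```

With a block of 8192 samples one bin is about 5.4 Hz wide, and every halving of the block size doubles both the bin width and the lowest note that can be separated from its neighbour.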

The second problem with Fourier Transform analysis of music is that the same instrument can generate a different set of overtones depending on its timbre.

These overtone frequencies analyzed by FFT created peaks that were irrelevant to what we actually hear.

To generalize the result, I summed the frequency bins into the twelve semitones.
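One way to sum bins into semitones, as a sketch: map each bin's centre frequency to the nearest equal-tempered note and add up the bin energies per pitch class. The 440 Hz test tone, the A4 = 440 Hz tuning, the 27.5 to 4186 Hz range, and all variable names are assumptions for illustration, not the original analysis.

```r
fs <- 44100
n <- 8192
time_s <- (0:(n - 1)) / fs
x <- sin(2 * pi * 440 * time_s)          # test tone: the note A at 440 Hz

# Energy of the positive-frequency bins (DC bin skipped)
k <- 1:(n / 2 - 1)
freq <- k * fs / n
power <- Mod(fft(x)[k + 1])^2

# Keep bins in a piano-like range and map each one to the nearest note number
keep <- freq >= 27.5 & freq <= 4186
midi <- round(69 + 12 * log2(freq[keep] / 440))
pitch_class <- midi %% 12                # 0 = C, 1 = C#, ..., 11 = B

note_names <- c("C", "C#", "D", "D#", "E", "F",
                "F#", "G", "G#", "A", "A#", "B")
sums <- tapply(power[keep], factor(pitch_class, levels = 0:11), sum)
chroma <- setNames(as.numeric(sums), note_names)

strongest <- note_names[which.max(chroma)]
print(strongest)
```

Folding octaves together like this makes the picture easier to read, but every overtone also lands in some pitch class, so a strong overtone can still outweigh the fundamental.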

The picture was better, but now the very first note was recognized as C, while in fact it was B.

This forced me to read more about the nature of sound, hearing, and the human ear.

I thought that the cause might be a third problem with the Fourier Transform: it is sensitive to the signal phase.

The human ear does not recognize the phase of individual harmonics, only frequencies.

I created a model in the R language (you can find the code at the end of the article) that generates input signals for a set of frequencies.

Then I used some formulas I had put together fifteen years ago (that experiment failed at the time due to poor PC performance) to create a model of a pendulum.

The object can receive an incoming signal and oscillates if the signal contains a frequency that matches its own. The model is defined by a few formulas, all implemented in the R code at the end of the article: the frequency of the pendulum; the fading coefficient, which does not depend on the auto-oscillation frequency of the pendulum; the position of the pendulum; and its velocity and energy.

Figure: reaction of the pendulum to a signal of its own frequency (green: input signal; blue: pendulum oscillation; red: pendulum energy).

For an input signal whose frequency differs slightly from that of the pendulum, the amplitude and energy are significantly smaller than in the previous result.

Figure: combined plot for nine different signals; the central one has been recognized.

After that, I built a set of pendulums for different frequencies to cover five octaves and twelve notes.
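Read back from the R code at the end of the article, the pendulum model can be written as the recurrences below. This is a reconstruction from that code rather than the original notation: $f_s = 44100$ is the sample rate hard-coded there, $w$ is the pendulum frequency in Hz, $s_n$ is the input sample at step $n$, and the subscripts are my own.

```latex
\begin{aligned}
T    &= \frac{f_s}{w}, \qquad f_s = 44100 \\
K    &= \left(\frac{2\pi}{T}\right)^{2} \\
\Phi &= \frac{2}{\pi}\arctan(T) \\
x_n  &= x_{n-1} + \left(v_{n-1} + s_n - s_{n-1} - K\,x_{n-1}\right)\Phi \\
v_n  &= x_n - x_{n-1} \\
E_n  &= \frac{v_n^{2}}{2} + \frac{K\,x_n^{2}}{2}
\end{aligned}
```

Here $T$ is the period in samples, $K$ the coefficient of elasticity, and $\Phi$ the fading coefficient. Because $\arctan(T)$ approaches $\pi/2$ for large $T$, $\Phi$ stays slightly below 1 (about 0.99 for the 700 Hz pendulum in the code) and damps the motion a little at every step.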

This is the resulting energy for 60 pendulums listening to the first chords of “Marooned”.

As a result, the main tone was detected correctly.
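A bank like this can be sketched in base R by keeping the state of all 60 pendulums in vectors and applying the same update as the `tick` method in the code at the end of the article. Several things here are assumptions for illustration: the range C2 to B6 with A4 = 440 Hz (the article does not say which five octaves were used), a pure 440 Hz tone in place of the recording, mean energy as the score, and all names.

```r
fs <- 44100

# Assumed range: 60 equal-tempered notes from C2 (MIDI 36) to B6 (MIDI 95)
midi <- 36:95
freqs <- 440 * 2^((midi - 69) / 12)
note_names <- c("C", "C#", "D", "D#", "E", "F",
                "F#", "G", "G#", "A", "A#", "B")
note_labels <- paste0(note_names[midi %% 12 + 1], midi %/% 12 - 1)

# Per-pendulum coefficients, computed as in the article's init method
period <- fs / freqs
K <- (2 * pi / period)^2
Phi <- 2 * atan(period) / pi

# Feed a signal to all pendulums at once; return each one's mean energy
listen <- function(signal) {
  x <- numeric(length(freqs))
  v <- numeric(length(freqs))
  energy <- numeric(length(freqs))
  last_s <- 0
  for (s in signal) {
    x_new <- x + (v + s - last_s - K * x) * Phi
    v <- x_new - x
    x <- x_new
    energy <- energy + (v * v) / 2 + (K * x * x) / 2
    last_s <- s
  }
  energy / length(signal)
}

n <- 0:4409                              # 0.1 s of signal
tone <- sin(2 * pi * 440 * n / fs)
mean_energy <- listen(tone)
detected <- note_labels[which.max(mean_energy)]
print(detected)
```

The neighbouring pendulums also respond, only more weakly, so the note is read off as the peak of the energy profile rather than from a single pendulum crossing a threshold.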

I think that the ability of the human ear to omit the phase information of the input signal is crucial for music recognition.
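The pendulum model shares this property, which a small experiment can illustrate. The sketch below drives one pendulum with the same tone at two starting phases and compares the mean energy once the start-up transient has faded. The 441 Hz frequency (chosen so that one period is exactly 100 samples), the signal length, and the function name `steady_energy` are assumptions for illustration.

```r
fs <- 44100
w <- 441                                 # pendulum and tone frequency in Hz
period <- fs / w                         # exactly 100 samples
K <- (2 * pi / period)^2
Phi <- 2 * atan(period) / pi

# Mean pendulum energy over the last tail_len samples of a tone
# that starts at the given phase (in radians)
steady_energy <- function(phase, n_samples = 8000, tail_len = 2000) {
  n <- 0:(n_samples - 1)
  signal <- sin(2 * pi * w * n / fs + phase)
  x <- 0
  v <- 0
  last_s <- 0
  energy <- numeric(n_samples)
  for (i in seq_along(signal)) {
    s <- signal[i]
    x_new <- x + (v + s - last_s - K * x) * Phi
    v <- x_new - x
    x <- x_new
    energy[i] <- (v * v) / 2 + (K * x * x) / 2
    last_s <- s
  }
  mean(tail(energy, tail_len))
}

e_0 <- steady_energy(0)
e_90 <- steady_energy(pi / 2)
relative_difference <- abs(e_0 - e_90) / e_0
print(relative_difference)
```

The two runs differ only in their first few hundred samples; after that the energy settles to practically the same level, so the starting phase of the tone leaves no trace in the result.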

I used this model to create a C++ library named Cochlea to listen to, detect, and synchronize music in real time.

That will be described in the next article.

R CODE

    library(plyr)
    library(tuneR)

    # Define a class that imitates a pendulum and has two methods: init and tick
    pendulum <- setRefClass(
      "pendulum",
      fields = list(
        v = "numeric", x = "numeric", K = "numeric", T = "numeric",
        Phi = "numeric", E = "numeric", lastS = "numeric"
      ),
      methods = list(
        # Define the initial state and calculate the coefficients
        # (w is the auto-oscillation frequency in Hz)
        init = function(w) {
          # period in samples at a 44100 Hz sample rate
          T <<- 44100 / w
          # coefficient of elasticity
          K <<- (2 * pi / T) * (2 * pi / T)
          # fading coefficient
          Phi <<- 2 * atan(T) / pi
          # initial state
          v <<- 0
          x <<- 0
          lastS <<- 0
        },
        # Pass the position of the stimulating lever
        tick = function(s) {
          lastX <- x
          # position
          x <<- x + (v + s - lastS - K * x) * Phi
          # velocity
          v <<- x - lastX
          # energy
          E <<- (v * v) / 2 + (K * x * x) / 2
          lastS <<- s
          return(c(x, E))
        }
      )
    )

    # Create one pendulum and init it with 700 as the frequency of auto-oscillation
    p <- pendulum()
    p$init(700)

    # Init a matrix of waveforms with frequencies from 500 to 900
    m <- aaply(seq(500, 900, 50), 1, function(x) sine(x, 1500)@left)

    # Clear the end of each waveform
    m[, 1001:1500] <- 0

    # Apply the pendulum tick to the matrix of waveforms
    m <- t(m)
    r <- aaply(m, c(1, 2), p$tick, .progress = "time")

    # Index of the waveform to plot
    i <- 5

    # Show results
    plot(m[, i] * 100, type = "l", col = "dark green")
    lines(r[, i, 1], type = "l", col = "blue")
    lines(r[, i, 2], type = "l", col = "red")

You can also find the R code by following this link: R Code

Author: Andy Bosyi, CEO/Lead Data Scientist.
