Constant quality factor transform (CQT)
The constant quality factor transform (CQT), introduced by J.C. Brown in 1988 (see references), is an interesting alternative to the windowed Fourier transform (STFT, Short-Time Fourier Transform) or to wavelets for time-frequency analysis, particularly in audio applications.
Indeed, for such applications, the reference metric is defined by the capabilities of the human ear, which perceives sounds through a physical system that can be modeled as a bank of filters with almost constant Q. The 'quality factor' (Q) is the ratio of the center frequency of a filter to its bandwidth: Q = f / df. In other words, at low frequencies we are capable of very fine frequency resolution (a few Hz at 1000 Hz), while at high frequencies we are much less accurate (a few tens of Hz at 10 kHz).
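To make the constant-Q idea concrete, here is a minimal Scilab sketch of a direct constant-Q analysis of a single frame. It takes Q as center frequency over bandwidth, so each analysis frequency gets a window of about Q periods. The names fk, Nk, w and X are illustrative only, the Hamming window is an arbitrary choice, and nothing here is meant to describe how the cqt function of the script is actually implemented.

```scilab
// Sketch: direct constant-Q analysis of a single frame (illustrative names).
fs = 8e3;                        // sampling frequency (Hz)
Q = 34;                          // quality factor, taken as Q = f / df
n = 0:fs-1;                      // one second of samples
x = sin(2*%pi*300*n/fs) / 2;     // test tone: 300 Hz, amplitude 0.5
fk = [200 300 450];              // a few analysis frequencies (Hz)
X = zeros(1, length(fk));
for k = 1:length(fk)
    df = fk(k) / Q;                           // filter bandwidth (Hz)
    Nk = round(Q * fs / fk(k));               // window of about Q periods of fk
    m = 0:Nk-1;
    w = 0.54 - 0.46*cos(2*%pi*m/(Nk-1));      // Hamming window (assumed choice)
    // Correlate the windowed frame with a complex exponential at fk
    X(k) = abs(sum(w .* x(1:Nk) .* exp(-2*%i*%pi*fk(k)*m/fs))) / sum(w);
    mprintf("fk = %d Hz, df = %.1f Hz, Nk = %d samples, |X| = %.3f\n", fk(k), df, Nk, X(k));
end
```

The window gets shorter as the frequency rises, which is what keeps Q constant. In this sketch only the 300 Hz bin should carry significant energy.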
The Scilab script you can find here (see link at right) offers two functions:
[t,f,Y] = cqt(x, fs, fmin, fmax, gamma[, Q = 34[, ofs = 20[, precision = 0.98]]]): CQT computation, with:
- t: vector of output time positions
- f: vector of output frequencies
- Y: matrix containing the time/frequency representation: time axis along the first dimension, frequency axis along the second dimension.
- x: input signal
- fs: sampling frequency (Hz)
- fmin: minimal analysis frequency (Hz)
- fmax: maximal analysis frequency (Hz)
- gamma: ratio between two successive analysis frequencies. For instance, for a resolution of one eighth of a semitone (one semitone = 1/12 octave),
gamma = 2 ^ (1 / (12 * 8))
- Q: quality factor (default value: 34)
- ofs: Output sampling frequency (default value: 20 Hz)
- precision: CQT computation accuracy (default value: 0.98). The closer the value is to 1, the more accurate the computation, at the cost of more time and memory.
cqt_plot(t, f, Y[, opt]): display of the CQT, with:
- t, f, Y: values returned by the cqt function
- opt: 'n' for a linear frequency axis or 'l' for a logarithmic frequency axis (default value).
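As an illustration of how fmin, fmax and gamma fit together, the sketch below builds a geometric grid of analysis frequencies in Scilab. This is only an assumption about the grid: the variable nfreqs and the rule "stop at the last frequency not exceeding fmax" are hypothetical, and the script may build its own grid differently.

```scilab
// Hypothetical geometric grid of analysis frequencies (not taken from the script).
fmin = 100;                                       // minimal analysis frequency (Hz)
fmax = 1200;                                      // maximal analysis frequency (Hz)
gamm = 2^(1/(12*8));                              // one eighth of a semitone
nfreqs = floor(log(fmax/fmin) / log(gamm)) + 1;   // number of frequencies up to fmax
f = fmin * gamm .^ (0:nfreqs-1);                  // f(k+1) = gamm * f(k)
mprintf("%d analysis frequencies from %.1f to %.1f Hz\n", nfreqs, f(1), f($));
```

With a ratio this close to 1 the grid is dense (96 frequencies per octave), so the size of Y grows quickly as the fmin to fmax range widens.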
In this example, we test the CQT on a linear chirp (from 400 to 1000 Hz) followed by a constant-frequency sine (1000 Hz), added to a permanent constant-frequency sine (300 Hz).
// Sampling frequency = 8 kHz
fs = 8e3;
// Minimum frequency of analysis = 100 Hz
fmin = 100;
// up to 1200 Hz
fmax = 1200;
// Resolution = one eighth of a tone (one quarter of a semitone) = 1 octave / 48
gamm = 2^(1/(12 * 4));
// Generate a 2-second signal, in 2 parts:
// 1 second of linear chirp from 400 to 1000 Hz
// 1 second of constant frequency 1000 Hz
// + constant frequency 300 Hz signal
duree = 2; // total duration (s)
t1 = linspace(0,duree,duree*fs);
f = [linspace(400,1000,duree*fs/2) 1000*ones(1,duree*fs/2)];
// Phase = running sum of the instantaneous frequency
phi = cumsum(f ./ fs);
x = sin(2 * %pi * phi);
x2 = sin(2*%pi*300 .* t1) / 2;
x = x + x2;
// Compute and plot the CQT (linear frequency axis)
[t,f,Y] = cqt(x, fs, fmin, fmax, gamm, Q = 34);
cqt_plot(t,f,Y,'n');
Here is what we get:
Here, we test an audio file containing (artificially generated) guitar notes, regularly spaced one semitone apart and arranged in groups of 4 successive notes spaced one second apart.
[x,y] = loadwave("GUITAR-SCALE.wav");
fs = y(3); // Sampling frequency
// Decimate by 2 to decrease processing time
d = 0.5;
x = intdec(x, d);
fs = fs * d;
// A1 (MIDI note 33)
fmin = 55;
// A8 (MIDI note 117)
fmax = 7040;
// Resolution = 1 semitone (one twelfth of an octave)
gamm = 2^(1/12);
[t,f,Y] = cqt(x, fs, fmin, fmax, gamm, Q = 34);
cqt_plot(t,f,Y);
Here is what we get: the different notes are clearly identifiable, at high frequencies as well as at low frequencies.