I am writing an application where I must process a digital signal - an array of doubles. I need to decimate the signal, filter it, and so on. I found the GNU Radio project, which provides functions for this, but I can't figure out how to use them correctly.
I need to decimate the signal (for example, from 250 Hz to 200 Hz). The function should behave like the resample function in Matlab. I found that the classes for this are:
rational_resampler_base_fff Class source
fir_filter_fff Class source
...
Unfortunately, I can't figure out how to use them.
I have GNU Radio and its shared libraries installed.
Thanks for any advice.
EDIT in reply to @jcoppens
Thank you very much for your help.
But I must process the signal in my own code. I found classes in GNU Radio that can solve my problem, but I need help setting them up.
The function I must configure is:
low_pass(double gain, double sampling_freq, double cutoff_freq, double transition_width, window, beta)
where:
use "window method" to design a low-pass FIR filter
gain: overall gain of filter (typically 1.0)
sampling_freq: sampling freq (Hz)
cutoff_freq: center of transition band (Hz)
transition_width: width of transition band (Hz).
The normalized width of the transition band is what sets the number of taps required. Narrower -> more taps
window_type: What kind of window to use. Determines maximum attenuation and passband ripple.
beta: parameter for Kaiser window
I know I must use window = KAISER and beta = 5, but I'm not sure about the rest.
The functions I use are low_pass and pfb_arb_resampler_fff::filter.
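For concreteness, here is a minimal C++ sketch of how the low_pass parameters might be filled in for the 250 Hz -> 200 Hz case (interpolate by 4, decimate by 5). The concrete numbers (gain 4, 1000 Hz filter rate, 90 Hz cutoff, 20 Hz transition) are my illustrative choices, not values from the question, and the exact namespaces and enum names vary between GNU Radio versions (this is written against 3.7-style headers), so treat it as a sketch rather than a drop-in implementation.

#include <gnuradio/filter/firdes.h>
#include <vector>

// Sketch: design anti-aliasing taps for a 250 Hz -> 200 Hz rational resampler
// (interpolate by 4, decimate by 5). The filter runs at the interpolated rate of
// 4 * 250 = 1000 Hz and must keep only what fits below the new Nyquist of 100 Hz.
std::vector<float> design_resampler_taps()
{
    return gr::filter::firdes::low_pass(
        4.0,        // gain: usually the interpolation factor for an interpolating filter
        1000.0,     // sampling_freq: rate the filter runs at (interp * input rate)
        90.0,       // cutoff_freq: just below the 100 Hz output Nyquist
        20.0,       // transition_width: narrower -> more taps
        gr::filter::firdes::WIN_KAISER,
        5.0);       // beta for the Kaiser window
}
// These taps could then be given to rational_resampler_base_fff (interpolation = 4,
// decimation = 5), or adapted for pfb_arb_resampler_fff with rate = 200.0 / 250.0
// (for the polyphase arbitrary resampler the taps are normally designed at
// filter_size * input_rate instead; check the class documentation of your version).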
UPDATE:
I solved the resampling using libsamplerate
"I need to decimate the signal (for example from 250 Hz to 200 Hz)"
WARNING: I expressed the original introductory paragraph incorrectly - my apologies.
As 250 Hz is not an integer multiple of 200 Hz, you have to do some tricks to convert 250 Hz into 200 Hz. Interpolating by a factor of 4 (inserting 3 samples between each pair of 250 Hz samples) raises the rate to 1000 Hz. Then you can lower it to 200 Hz by decimating by a factor of 5.
For this you need the "Rational Resampler" block, where you can define the interpolation and decimation factors.
This means you would have to do something similar if you use the library. Maybe it's even simpler to do it without the library: interpolate linearly between the 250 Hz samples (i.e. insert 3 extra samples between each pair, giving 1000 Hz), then decimate by keeping every 5th sample (giving 200 Hz).
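In C++, that "without the library" route could look roughly like the sketch below (the helper name is mine, and no anti-aliasing filter is applied, so it is only adequate if the signal has little energy near 100 Hz and above):

#include <vector>

// Sketch: 250 Hz -> 200 Hz by linear interpolation by 4 (250 -> 1000 Hz),
// then keeping every 5th sample (1000 -> 200 Hz).
std::vector<double> resample_250_to_200(const std::vector<double>& in)
{
    // Interpolate: 4 output points per input interval.
    std::vector<double> up;
    for (std::size_t i = 0; i + 1 < in.size(); ++i) {
        for (int k = 0; k < 4; ++k) {
            double frac = k / 4.0;
            up.push_back(in[i] * (1.0 - frac) + in[i + 1] * frac);
        }
    }
    up.push_back(in.back());

    // Decimate: keep every 5th sample of the 1000 Hz stream.
    std::vector<double> out;
    for (std::size_t i = 0; i < up.size(); i += 5)
        out.push_back(up[i]);
    return out;
}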
Note: there is a Signal Processing forum on Stack Exchange - this question might fall into that category...
More information: If you only have to resample your input data, and you do not need the actual gnuradio program, then have a look at this document:
https://ccrma.stanford.edu/~jos/resample/resample.pdf
There are several links to other documents, and a link to libresample, libresample4, and others, which may be of use to you. Another, very interesting, page is:
http://www.dspguru.com/dsp/faqs/multirate/resampling
Finally, from the same source as the pdf above, check their snd program. It may solve your problem without writing any software. It can load floating point samples, resample, and save again:
http://ccrma.stanford.edu/planetccrma/software/soundapps.html#SECTION00062100000000000000
EDIT: And yet another solution - maybe the simplest of all: Use Matlab (or the free Octave version):
pkg load signal
t = linspace(0, 10*pi, 50);          % Generate a timeline - 5 cycles, 50 samples at 250 Hz
s = sin(t);                          % and the sines
tr = resample(s, 4, 5);              % Convert 250 Hz -> 200 Hz (ratio 4/5)
t2 = linspace(0, 10*pi, numel(tr));  % Timeline for the resampled signal
plot(t, s, 'r')                      % Plot 250 Hz in red
hold on
plot(t2, tr)                         % and resampled in blue
This will give you a plot of the original and the resampled signal.
Related
I use PortAudio in a C++ project.
My signal model handles only 16000 Hz audio input.
When I first released my work, I didn't need to support a 44100 Hz sample rate; I only had a 48000 Hz microphone.
So I resampled my signal 48000 -> 16000 -> 48000 with a simple decimation algorithm and linear interpolation.
But now I want to use a 44100 Hz microphone. In real-time processing, my buffer is 256 points at 16000 Hz, so it is hard to pick the matching input buffer size at 44100 Hz and downsample from 44100 to 16000.
When I used plain decimation or an averaging filter (https://github.com/mattdiamond/Recorderjs/issues/186), the output speech sounded higher-pitched than the input, and windowed-sinc interpolation introduced distortion.
Is there any method to do 44100 -> 16000 downsampling for real-time processing? Please let me know.
Thank you.
I had to solve a similar problem in the past, not for audio, but to simulate a mismatch between a transmitter's sampling frequency and a receiver's sampling frequency.
This is how I would proceed:
Let us call T1 the sampling period of the incoming signal x: T1 = 1/44100, and
let us call T2 the sampling period of the signal y to be generated.
To calculate the value of the signal y[n*T2], select the two input values x[k*T1] and x[(k+1)*T1]
that surround the value to be calculated:
k*T1 <= n*T2 < (k+1)*T1
Then perform a linear interpolation between these two values. The interpolation factor must be recalculated for each output sample.
If t = n*T2, a = k*T1 and b = (k+1)*T1, then
p = (x[b] - x[a]) / T1
y[t] = p*(t - a) + x[a]
At a 44.1 kHz sampling rate, x[a] and x[a+T1] should be rather well correlated, and the linear interpolation should be good enough.
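As a concrete illustration of the procedure above, here is a minimal C++ sketch (the function name is mine) that resamples a whole buffer from 44100 Hz to 16000 Hz using exactly the k*T1 <= n*T2 < (k+1)*T1 rule:

#include <cmath>
#include <vector>

// Sketch of the linear-interpolation resampler described above.
// For each output index n, find k with k*T1 <= n*T2 < (k+1)*T1 and
// interpolate between x[k] and x[k+1].
std::vector<float> resample_linear(const std::vector<float>& x,
                                   double fs_in, double fs_out)
{
    const double T1 = 1.0 / fs_in;   // input sampling period
    const double T2 = 1.0 / fs_out;  // output sampling period
    std::vector<float> y;

    for (std::size_t n = 0; ; ++n) {
        double t = n * T2;
        std::size_t k = static_cast<std::size_t>(std::floor(t / T1));
        if (k + 1 >= x.size())
            break;                                  // ran out of input samples
        double a = k * T1;
        double p = (x[k + 1] - x[k]) / T1;          // slope between the two neighbours
        y.push_back(static_cast<float>(x[k] + p * (t - a)));
    }
    return y;
}

For real-time, block-based processing (e.g. 256-sample buffers) you would keep the fractional position t as state and carry it over from one buffer to the next instead of restarting at zero.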
If the obtained quality is not good enough, you can first interpolate the incoming signal by a fixed ratio, for example 2, with a classical, well-designed interpolation filter. Then you can apply the previous procedure to this new signal, whose sampling period is T1/2.
If the incoming signal contains high frequencies then, in order to avoid aliasing, you need to apply a low-pass filter to it prior to the downsampling. Note that this is necessary even in your previous 48 kHz -> 16 kHz case.
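A minimal windowed-sinc low-pass sketch (in C++) that could be run on the 44100 Hz stream before the resampling; the tap count, the Hamming window and the cutoff value are illustrative only, and a production implementation would use a proper DSP library:

#include <cmath>
#include <vector>

// Sketch: windowed-sinc FIR low-pass, e.g. lowpass_fir(input, 44100.0, 7000.0)
// to keep everything below the new Nyquist of 8000 Hz before downsampling.
std::vector<float> lowpass_fir(const std::vector<float>& x,
                               double fs, double cutoff, int ntaps = 101)
{
    const double pi = 3.14159265358979323846;
    std::vector<double> h(ntaps);
    int m = ntaps / 2;
    for (int i = 0; i < ntaps; ++i) {
        int k = i - m;
        double sinc = (k == 0) ? 2.0 * cutoff / fs
                               : std::sin(2.0 * pi * cutoff * k / fs) / (pi * k);
        double window = 0.54 - 0.46 * std::cos(2.0 * pi * i / (ntaps - 1)); // Hamming
        h[i] = sinc * window;
    }
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n)          // causal convolution
        for (int i = 0; i < ntaps; ++i)
            if (n >= static_cast<std::size_t>(i))
                y[n] += static_cast<float>(h[i] * x[n - i]);
    return y;
}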
Environment
Hardware: Raspberry Pi x
O.S.: Raspbian Jessie Lite
Language: Qt5 / C++
Goal
Play an audio file (wav or, better, mp3), changing its speed smoothly and continuously. The pitch should change according to the speed (playback rate).
My application updates several times per second a variable that contains the desired speed: i.e. 1.0 = normal speed. Required range is about 0.2 .. 3.0, with a resolution of 0.01.
The audio is likely music; expected format: mono, 16-bit, 11025 Hz.
No specific constraints about latency: below 500 ms is acceptable.
Some thoughts
QMediaPlayer in QtMultimedia has the playbackRate property, which should do exactly this. Unfortunately I have never been able to make QtMultimedia work on my systems.
It's OK to also use an external player and send it data using pipes or any other IPC.
How would you achieve this?
I don't know how much of this translates to C++. The work I did on this problem uses Java. Still, something of the algorithm should be of help.
Example data (made up):
sample value
0 0.0
1 0.3
2 0.5
3 0.6
4 0.2
5 -0.1
6 -0.4
With normal speed, we send the output line a series of values where the sample number increments by 1 per output frame.
If we were going slower, say half speed, we should output twice as many values before reaching the same point in the media data. In other words, we need to include, in our output, values that are at the non-existent, intermediate sample frame locations 0.5, 1.5, 2.5, ...
To do this, it turns out that linear interpolation works quite well for audio. It is possible to use a more sophisticated curve fitting algorithm but the increase in fidelity is not considered to be worth the trouble.
So, we end up with a stream as follows (for half speed):
sample value
0 0.0
0.5 0.15
1 0.3
1.5 0.4
2 0.5
2.5 0.55
3 0.6
etc.
If you want to play back 3/4 speed, then the positions and values used in the output would be this:
sample value
0 0.0
0.75 0.225
1.5 0.4
2.25 0.525
3 0.6
3.75 0.3
etc.
I code this via a "cursor" that is incremented each sample frame, where the increment amount determines the "speed" of the playback. The cursor points into an array, like an integer index would, but instead, is a float (or double). If there is a fractional part to the cursor's value, the fraction is used to interpolate between sample values pointed to by the integer part and the integer part plus one.
For example, if the cursor was 6.25, and the value of soundData[6] was A and the value of soundData[6+1] was B, the sound value would be:
audioValue = A * 0.75 + B * 0.25
The degree of precision with which you can define your speed increment is quite high. I think Java's floats are considered sufficient for this purpose.
As for keeping a dynamically changing speed increment smooth, I am spreading out the changes to new speeds over a series of 4096 steps (roughly 1/10th of a second, at 44100 fps). Change requests are often asynchronous, e.g., from a GUI, and are spread out over time in a somewhat unpredictable way. The smoothing algorithm should be able to recalculate and update itself with each new speed request.
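In C++ (my original is Java), the cursor-plus-smoothing idea could be sketched as below; the class name, the 4096-frame constant and the mono float buffer are assumptions for illustration, not a transcription of my actual code:

#include <cstddef>
#include <vector>

// Sketch: the cursor advances by "speed" frames per output frame, and requested
// speed changes are spread over SMOOTHING_FRAMES output frames.
class VariableSpeedPlayer {
public:
    explicit VariableSpeedPlayer(std::vector<float> data)
        : soundData(std::move(data)) {}

    // Called (possibly asynchronously, e.g. from a GUI) to request a new speed.
    void setSpeed(double newSpeed) {
        targetSpeed = newSpeed;
        speedStep = (targetSpeed - speed) / SMOOTHING_FRAMES;
        framesLeft = SMOOTHING_FRAMES;
    }

    // Produce one output sample; returns 0 past the end of the data.
    float nextSample() {
        if (framesLeft > 0) {              // glide toward the requested speed
            speed += speedStep;
            --framesLeft;
        }
        std::size_t i = static_cast<std::size_t>(cursor);
        if (i + 1 >= soundData.size())
            return 0.0f;
        double frac = cursor - i;          // fractional part -> linear interpolation
        float value = static_cast<float>(soundData[i] * (1.0 - frac)
                                         + soundData[i + 1] * frac);
        cursor += speed;                   // the increment *is* the playback speed
        return value;
    }

private:
    static constexpr int SMOOTHING_FRAMES = 4096;  // roughly 1/10 s at 44100 fps
    std::vector<float> soundData;
    double cursor = 0.0;
    double speed = 1.0, targetSpeed = 1.0, speedStep = 0.0;
    int framesLeft = 0;
};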
Following is a link that demonstrates both strategies, where a sound's playback speed is altered in real time via a slider control.
SlidersTest.jar
This is a runnable copy of the jar file that also contains the source code, and executes via Java 8. You can also rename the file to SlidersTest.zip and then drill in to view the source code in context.
Links to the source files can also be reached directly from the following two sections of a page I posted for this code, which I recently wrote and made open source:
see AudioCue.java
see SlidersTest.java
AudioCue.java is a long file. The relevant parts are in the inner class at the end of the file, AudioCuePlayer, and for the smoothing algorithm, check the setter method setSpeed, which is about three-quarters of the way down. Sorry I don't have line numbers.
I am learning about two-dimensional neural networks, so I am facing many obstacles, but I believe it is worth it and I am really enjoying the learning process.
Here's my plan: to make a 2-D NN recognize images of digits. Images are 5 by 3 grids, and I prepared 10 images, from zero to nine. For example, number 7 has indexes 0,1,2,5,8,11,14 as 1s (or, equivalently, 3,4,6,7,9,10,12,13 as 0s), and so on. Therefore, my input layer will be a 5 by 3 neuron layer and I will be feeding it only zeros or ones (nothing in between; the indexes depend on which image I am feeding the layer).
My output layer, however, will be a one-dimensional layer of 10 neurons. Depending on which digit was recognized, one neuron should fire a value of one and the rest should output zeros (not fire).
I am done with implementing everything, but I have a problem with the computation and I would really appreciate any help. I am getting an extremely high error rate and extremely low (negative) output values on all output neurons, and the values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my backpropagation methods, since I believe the problem is there. However, to break down my work, I would like to hear some comments first; I want to know if my design is reasonable.
Does my plan make sense?
All the posts I read talk about ranges (0 -> 1, -1 -> +1, 0.01 -> 0.5, etc.). Will it work with plain { 0 | 1 } on the output layer rather than a range? If yes, how can I control that?
I am using the hyperbolic tangent (TanH) as my transfer function. Does it make a difference compared to sigmoid or other functions?
Any ideas/comments/guidance are appreciated. Thanks in advance.
Well, from the description given above, I think the design and approach you have taken are correct! With respect to the choice of activation function, remember that these functions help pick out the neurons with the largest activation, and their algebraic properties, such as an easy derivative, help with the definition of backpropagation. Taking this into account, you should not worry about your choice of activation function.
The ranges you mention above correspond to scaling of the input; it is better to have your input images in the range 0 to 1. This scales the error surface and helps with the speed and convergence of the optimization. Because your input set is composed of images, and each image is composed of pixels, the minimum and maximum values a pixel can attain are 0 and 255, respectively. To scale your input in that case, divide each value by 255.
Now, with respect to the training problems: have you tried checking whether your gradient calculation routine is correct, i.e. by numerically evaluating the cost function J? If not, try generating a toy vector theta that contains all the weight matrices involved in your neural network, and evaluate the gradient at each point using the definition of the gradient. Sorry for the Matlab example, but it should be easy to port to C++:
numgrad = zeros(size(theta));   % numerical gradient, one entry per weight
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute numerical gradient by central differences
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient with the gradient calculated using backpropagation. If the difference between the two is less than about 3e-9, your implementation should be correct.
I recommend checking out the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory; there you can find a lot of information related to neural networks and their paradigms. It's worth taking a look!
http://ufldl.stanford.edu/wiki/index.php/Main_Page
http://ufldl.stanford.edu/tutorial/
I have had the absurd idea of writing an emulator for the Commodore VIC-20, my first computer.
Everything went quite well until it was time to emulate the sound! The VIC-20 has 3 voices (square waveforms) and a noise channel. Searching the net I found that it is a PN (pseudo-noise) generator (sometimes called "white" noise).
I know that white noise is not frequency driven, but you do put a specific frequency value into the noise register (POKE 36877,X). The formula is:
freq = cpu_speed/(127 - x)
(more details in the VIC-20 Programmer's Guide, especially the section on the MOS 6560/6561 VIC-I chip)
where x is the 7-bit value of the noise register (bit 8 is the noise on/off switch).
I have a pre-generated buffer of 1024 numbers (the pseudo-random sequence). The question is: how can I use the frequency (freq) to create a sample buffer to pass to the sound card (in this case to sf::SoundBuffer, which accepts sf::Int16, i.e. signed 16-bit, values)?
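For illustration, one naive approach I could imagine is a sample-and-hold walk through the pre-generated sequence; this is purely hypothetical and not necessarily what the VIC-I actually does internally (it assumes SFML 2-style sf::Int16 and a pseudo-random buffer normalized to [-1, 1]):

#include <SFML/Audio.hpp>
#include <cstddef>
#include <vector>

// Hypothetical sketch: consume "freq" pseudo-random values per second,
// holding each value until the next step, scaled to the sf::Int16 range.
std::vector<sf::Int16> make_noise_buffer(const std::vector<float>& prn,   // values in [-1, 1]
                                         double cpu_speed, int x,
                                         unsigned sample_rate, std::size_t n_samples)
{
    double freq = cpu_speed / (127 - x);   // formula from the Programmer's Guide
    double step = freq / sample_rate;      // PRN values consumed per output sample
    double pos = 0.0;
    std::vector<sf::Int16> out(n_samples);
    for (std::size_t i = 0; i < n_samples; ++i) {
        std::size_t idx = static_cast<std::size_t>(pos) % prn.size();
        out[i] = static_cast<sf::Int16>(prn[idx] * 32767.0f);
        pos += step;
    }
    return out;
}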
I guess most of you had a Commodore VIC-20 or C64 at home and played with the old POKE instruction... Can any of you help me understand this step?
EDIT:
Searching the internet I found the C64 Programmer's Guide, which shows the waveform graph of its noise generator. Can anyone recognize this kind of wave/perturbation? The waveform seems to be periodic (with period 1/freq), but how do I generate such a wave?
I have problems understanding all the parameters of BackgroundSubtractorMOG2.
I looked at the code (located in bgfg_gaussmix2.cpp), but I don't see the connection to the referenced paper. For example, Tb = varThreshold, but what is Tb called in the paper?
I am especially interested in the parameters marked in bold.
Let's start with the easy parameters [my remarks in brackets]:
int nmixtures
Maximum allowed number of mixture components. Actual number is determined dynamically per pixel.
[set 0 for GMG]
uchar nShadowDetection
The value for marking shadow pixels in the output foreground mask. Default value is 127.
float fTau
Shadow threshold. The shadow is detected if the pixel is a darker version of the background. Tau is a threshold defining how much darker the shadow can be. Tau= 0.5 means that if a pixel is more than twice darker then it is not shadow.
Now to the ones I don't understand:
float backgroundRatio
Threshold defining whether the component is significant enough to be included into the background model (corresponds to TB = 1 - cf from the paper [which paper?]). cf = 0.1 => TB = 0.9 is the default. For alpha = 0.001, it means that the mode should exist for approximately 105 frames before it is considered foreground.
float varThresholdGen
Threshold for the squared Mahalanobis distance that helps decide when a sample is close to the existing components (corresponds to Tg). If it is not close to any component, a new component is generated. 3 sigma => Tg = 3*3 = 9 is the default. A smaller Tg value generates more components; a higher Tg value may result in a small number of components, but they can grow too large. [I don't understand a word of this]
In the constructor, the variable varThreshold is used. Is it the same as varThresholdGen?
Threshold on the squared Mahalanobis distance to decide whether a pixel is well described by the background model (see Cthr??). This parameter does not affect the background update. A typical value could be 4 sigma, that is, varThreshold = 4*4 = 16 (see Tb??).
float fVarInit
Initial variance for the newly generated components. It affects the speed of adaptation. The parameter value is based on your estimate of the typical standard deviation from the images. OpenCV uses 15 as a reasonable value.
float fVarMin
Parameter used to further control the variance.
float fVarMax
Parameter used to further control the variance.
float fCT
Complexity reduction parameter. This parameter defines the number of samples needed to accept that a component actually exists. CT=0.05 is a default value for all the samples. By setting CT=0 you get an algorithm very similar to the standard Stauffer & Grimson algorithm.
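For orientation (not part of the documentation quotes above): in the OpenCV 3.x+ C++ API these fields correspond to setter methods on the subtractor. The sketch below simply plugs in the default values quoted above; the exact method names should be double-checked against your OpenCV version.

#include <opencv2/video/background_segm.hpp>

// Sketch: create a MOG2 subtractor and set the parameters discussed above
// (OpenCV 3.x+ C++ API; values are the defaults quoted in the question).
cv::Ptr<cv::BackgroundSubtractorMOG2> make_mog2()
{
    auto mog2 = cv::createBackgroundSubtractorMOG2(/*history=*/500,
                                                   /*varThreshold=*/16,  // Tb = 4*4
                                                   /*detectShadows=*/true);
    mog2->setNMixtures(5);                        // nmixtures
    mog2->setBackgroundRatio(0.9);                // backgroundRatio, TB = 1 - cf
    mog2->setVarThresholdGen(9.0);                // varThresholdGen, Tg = 3*3
    mog2->setVarInit(15.0);                       // fVarInit
    mog2->setComplexityReductionThreshold(0.05);  // fCT
    mog2->setShadowValue(127);                    // nShadowDetection
    mog2->setShadowThreshold(0.5);                // fTau
    return mog2;
}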
Someone asked pretty much the same question on the OpenCV website, but without an answer.
Well, I don't think anyone could tell you which parameter is what if you don't know the details of the algorithm you are using. Besides, you shouldn't need anyone to tell you which parameter is what once you know the details of the algorithm. I say this for the detailed parameters (fCT, fVarMax, etc.), not for the straightforward ones (nmixtures, nShadowDetection, etc.).
So, I think you should read the papers referenced in the documentation. Here are the links for the papers 1, 2, 3.
And also you should read this paper as well, which is the beginning of background estimation.
After reading these papers and checking the code against them, I'm sure you will understand what those parameters are.
Good luck!