Autocorrelation returns random results with mic input (using a high pass filter) - c++

Sorry to ask a question similar to the one I asked before (FFT Problem (Returns random results)), but I've looked up pitch detection and autocorrelation and have found some code for pitch detection using autocorrelation.
I'm trying to do pitch detection of a user's singing. The problem is that it keeps returning random results. I've got some code from http://code.google.com/p/yaalp/ which I've converted to C++ and modified (below). My sample rate is 2048, and data size is 1024. I'm detecting the pitch of both a sine wave and mic input. The frequency of the sine wave is 726.0, and it's detected as 722.950820 (which I'm OK with), but the pitch of the mic input is detected as a random number from around 100 to around 1050.
I'm now using a high-pass filter to remove the DC offset, but it's not working. Am I doing it right, and if so, what else can I do to fix it? Any help would be greatly appreciated!
(Fixed)
Thanks,
Niall.
Edit: Changed the code to implement a high-pass filter with a cutoff of 30 Hz (from What Are High-Pass and Low-Pass Filters?; can anyone tell me how to convert the low-pass filter using convolution to a high-pass one?), but it's still returning random results. Plugging it into a VST host and using VST plugins to compare spectrums unfortunately isn't an option for me.
Edit: Fixed. Thanks for everyone's help, but I never got it to work and am now using new code.

I am no sound expert, but if you are sampling at 44100 (I guess samples per second) and use 1024 data points, you are working with about 1/40th of a second worth of data. It doesn't surprise me that the current pitch varies a lot, depending on which piece you pick. If you want to find the average or main pitch of a voice, I'd expect to need about 1 second worth of data.

At 44.1 kHz sampling frequency, 1024 samples is only a little bit over 23 ms worth of data. Isn't it possible that this is simply insufficient data in order to compute the pitch of a human singer?
I mean, the sound I can make that lasts for 23 ms is probably not something I have a lot of pitch control over; I would expect this kind of measurement to be done over slightly longer periods of time.
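To put numbers on the window length discussed above, here is a small sketch of the arithmetic. The helper names are made up for illustration, and the "two full periods per window" requirement is only an assumed rule of thumb for autocorrelation, not something taken from the question's code.

```python
def window_duration_ms(num_samples, sample_rate_hz):
    """Length of an analysis window in milliseconds."""
    return 1000.0 * num_samples / sample_rate_hz


def lowest_pitch_hz(num_samples, sample_rate_hz, periods_needed=2):
    """Lowest pitch that still fits `periods_needed` full periods in the window.

    periods_needed=2 is an assumed rule of thumb, not a hard limit.
    """
    return periods_needed * sample_rate_hz / num_samples


duration = window_duration_ms(1024, 44100)   # about 23.2 ms
floor_hz = lowest_pitch_hz(1024, 44100)      # about 86 Hz
print(f"{duration:.1f} ms window, ~{floor_hz:.0f} Hz is the lowest pitch that fits twice")
```

Under that assumption a single 1024-sample window can in principle hold two periods of anything above roughly 86 Hz, so the per-window estimate is likely to be jumpy rather than impossible; averaging over several windows is one way to get the steadier "main pitch" mentioned above.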

The problem is in your findBestCandidates() function:
Inside this function you access the 'inputs' array from 0 up to 'length - 1'.
When you call this function inside the detectPitchCalculation() function, 'inputs' is 'results' and 'length' is 'nHiPeriodInSamples'.
But 'results' is only allocated and filled up to 'nHiPeriodInSamples - nLowPeriodInSamples - 1'.
So if 'nLowPeriodInSamples' is greater than 0, you access unallocated and random memory inside the findBestCandidates() function!
EDIT:
Another bug is that you fill only every 'nResolution'-th entry of the 'results' array in the detectPitchCalculation() function, but access every entry in the findBestCandidates() function (via the 'inputs' argument). Since you call detectPitchCalculation() with 'nResolution=1', this does not explain your specific problem, so I will look a little bit more. But it would definitely be a problem if you called it with higher resolutions.

I don't see the problem in your code, but I'm no good at C. I'd try the following to find the problem:
run it with data where the result is known, e.g. with sin(x) as input
run it with a small data size (e.g. 2)
Compare the results with known correct ones. You should be able to find those on the internet, or work them out by hand.
If random means "same input, different output", you most probably have a bug in the initialisation of variables. Use a debugger and known input to check that all variables, especially all elements of arrays, are properly initialized.
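As an illustration of the "known input" test suggested above, here is a bare-bones sketch in Python rather than C++. It is not the yaalp code from the question: `estimate_pitch_autocorr` is a hypothetical helper that simply picks the lag with the largest raw (unnormalised) autocorrelation sum, and the 441 Hz tone was chosen only because its period is exactly 100 samples at 44100 Hz.

```python
import math


def estimate_pitch_autocorr(samples, sample_rate_hz, min_hz, max_hz):
    """Return sample_rate / lag for the lag with the largest autocorrelation sum."""
    min_lag = max(1, int(sample_rate_hz / max_hz))
    max_lag = min(len(samples) - 1, int(sample_rate_hz / min_hz))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Raw sum over the overlapping part; longer lags overlap less,
        # which mildly favours the shortest matching period.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate_hz / best_lag


sample_rate = 44100
true_hz = 441.0  # period of exactly 100 samples at 44100 Hz
signal = [math.sin(2 * math.pi * true_hz * i / sample_rate) for i in range(2048)]
estimate = estimate_pitch_autocorr(signal, sample_rate, 100.0, 1000.0)
print(f"true {true_hz} Hz, estimated {estimate:.1f} Hz")
```

Because the input is generated deterministically, running this twice should give the same estimate; if your own detector gives different answers for identical input, that points towards uninitialised or out-of-bounds memory rather than the algorithm.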

Related

Extracting number of bits in a macroblock from VVC VTM reference software

Final: Result after calculating and displaying the difference
I am new to VVC and I am going through the reference software's code trying to understand it. I have encoded and decoded videos using the reference software. I want to extract the bitstream from it, and I want to know the number of bits there are in each macroblock. I am not sure which class I should be working with; for now I am looking at mv.cpp, QuantRDOQ.cpp, and TrQuant.cpp.
I am afraid of messing the code up completely, and I don't know where to add which lines of code.
Start: Result after calculating and displaying the difference
P.S. The linked pictures are after my problem has been solved, I attached these pictures because of my query in the comments.
As the error says, getNumBins() is not supported by the CABAC estimator. So you should make sure you call it only during the actual encoding, and not during the RDO.
This should do the job:
if (isEncoding())
{
  before = m_BinEncoder.getNumBins();  // bin count before this CU
}
coding_unit( cu, partitioner, cuCtx );
if (isEncoding())
{
  after = m_BinEncoder.getNumBins();
  diff = after - before;  // bins spent on this CU
}
The simplest solution that I'm aware of is at the encoder side.
The trick is to compute the difference in the number of written bits "before" and "after" encoding a Coding Unit (CU) (aka macroblock). This happens in the CABACWriter.cpp file.
You should go to the coding_tree() function, where the coding_unit() function is called, which is responsible for context-coding all syntax elements in the current CU.
There, you may call the function getNumBins() twice: once before and once after coding_unit(). The difference between the two values should do the job for you.

how to get txPower to calculate distance from RSSI

I got this code from Google Code:
void QBluetoothDeviceDiscoveryAgent::deviceDiscovered(const QBluetoothDeviceInfo &info)
QBluetoothDeviceInfo::rssi().
But how do I get the RSSI distance from QBluetoothServiceDiscoveryAgent?
I tried with
QBluetoothServiceDiscoveryAgent serviceInfo;
quint i =serviceInfo.device().rssi();
Here i = -43.
How do I convert it to a distance?
I found the link
Understanding ibeacon distancing
but how do I get the transmitter power, so that I can calculate the distance according to the formula?
Make sure you understand the implications of QBluetoothDeviceInfo::rssi(). Calling this function returns immediately with the value stored when the device was last scanned. If you only receive one advertisement packet, which happens to be at e.g. -90 dB, and then immediately connect, this function will keep returning -90 until you disconnect from the device and scan it again. Connected devices usually don't send advertisement packets, so the RSSI you can read via Qt won't be updated during the connection.
As for proximity, it's not so easy to get good values. To accurately convert from RSSI to geometric distance you must know the sender's original/intended signal strength (or TX power level == RSSI at 1 m distance). This value will differ between devices. To make things worse, in practice it can also vary by a huge margin depending on things like the sender's battery level, the physical orientation of sender and receiver relative to each other, the quality of individual parts, random interference from other RF devices...
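For reference, the conversion usually meant by "the formula" is the log-distance path-loss model. Below is a minimal sketch in Python with assumed calibration values: the -59 dBm reference RSSI at 1 m and the path-loss exponent of 2.0 (free space) are illustrative placeholders, not values read from Qt or from any particular beacon.

```python
def estimate_distance_m(rssi_dbm, rssi_at_1m_dbm, path_loss_exponent=2.0):
    """Log-distance path-loss model: d = 10 ** ((P_1m - RSSI) / (10 * n)).

    rssi_at_1m_dbm and path_loss_exponent must be calibrated per device and
    environment; the defaults used here are assumptions for illustration.
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


# -43 is the reading from the question; -59 dBm at 1 m is a made-up calibration.
print(f"{estimate_distance_m(-43, -59):.2f} m")
```

Given all the sources of variation listed above, the result is better treated as a rough range estimate than as a measurement.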
The BLE folks have a blog explaining how you should do it. You can read it here. The linked article doesn't read or assume the theoretical maximum RSSI of the sender; instead it proposes gathering multiple RSSI values over time (plus some mean/mode filtering), and comparing the current mean value with the previous one to determine whether you are approaching or moving away from the sender. Paired with some fine-tuning using real-world data you have to collect, plus documentation reading and common sense, you could probably develop a proximity calculation for many or even most sender devices which would be accurate to about one meter or even less at close proximity. In the end it's a tradeoff between how many devices you wish to 'calibrate' for and those for which you are okay with shifted values due to higher or lower TX power levels.
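The "compare the current mean with the previous one" idea can be sketched roughly like this. It is not the linked article's code: the `RssiTrend` class, the window of 5 readings and the 1 dB dead band are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean


class RssiTrend:
    """Tracks a moving mean of RSSI readings and reports the direction of change."""

    def __init__(self, window=5, dead_band_db=1.0):
        self.samples = deque(maxlen=window)  # most recent readings only
        self.dead_band_db = dead_band_db     # ignore changes smaller than this
        self.previous_mean = None

    def update(self, rssi_dbm):
        self.samples.append(rssi_dbm)
        current = mean(self.samples)
        if self.previous_mean is None:
            trend = "unknown"
        elif current - self.previous_mean > self.dead_band_db:
            trend = "approaching"    # signal getting stronger
        elif self.previous_mean - current > self.dead_band_db:
            trend = "moving away"    # signal getting weaker
        else:
            trend = "steady"
        self.previous_mean = current
        return trend


tracker = RssiTrend(window=3)
for reading in (-80, -70, -90):
    print(reading, tracker.update(reading))
```

The dead band is there so that ordinary jitter between scans is not reported as movement; its size is something you would tune against real data.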
The downside is that you can't test for every possible device on the market and, as I said earlier, different devices have different TX power levels. With this approach you can develop an algorithm to get pretty good measurements for devices which have approximately equal signal configurations, but others will seem far off. The article's author talks about creating different profiles for different vendors, but that's not really going to help (consider two identical beacons ("big/small"), one for large and one for small indoor locations: with RSSI alone you can't reliably determine if you're close to the small beacon or at medium range from the big one, unless they identify themselves via GAP or otherwise (forget MAC addresses if you plan to deploy on macOS or iOS)).
Also, prepare yourself for the joyride that is Android BLE development. Some vendors know that their BLE implementation is so terribly bad and broken that they even disabled the HCI logging feature on all their ROMs to hide it. Others can be BLE-nuked like Win98 by ethernet, back in the day.

How to normalize sequence of numbers?

I am working on a user behavior project. Based on user interaction I have collected some data. There is a nice sequence which smoothly increases and decreases over time. But there are small discrepancies, which are very bad. Please refer to the graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I want to smooth out this sequence. The technique should be able to eliminate numbers with the characteristics of X and Y, i.e. errors in a monotonically increasing or monotonically decreasing run.
If it cannot eliminate them, the technique should be able to shift them so that the series is not affected by the errors.
What I have tried, without success:
I tried to test the difference between values. In some special cases it works, but for a sequence like the one presented here the distance between numbers is not such that I can cut out the errors.
I tried applying a counter with some threshold X, so that a change is only accepted once it reaches X; otherwise the point is mapped to the previous point. Here I have great trouble deciding on the value of X because, since this is based on user interaction, I am not really in control of it. If the user interaction is such that its plot would be a zigzag pattern, I end up with a 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: The data made available in this example is a particular case. There is no typical pattern in which the numbers are going to occur, but we expect some range to be continuous in all the examples. The solution I am seeking is generic.
I do not know how much effort you want to put into this problem, but if you want theoretical guarantees, topological persistence seems well suited to your problem, IMHO.
Basically, with that method you can filter local maxima/minima by fixing a scale, and there are theoretical proofs that say that if your sampling is close to your function, then you extract the correct number of maxima with persistence.
You can see these slides (mainly pages 7-9) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a watershed, with the water level starting from the maximum height and decreasing, you have some peaks.
Every peak has a time where it is born, which is the time where it emerges, and a time where it dies, which is when it merges with a higher peak. A persistence diagram then plots a point for every peak, where its x/y coordinates are its time of birth/death (by assumption the first peak does not die and is not shown).
If a peak is a global maximum, it will be further from the diagonal in the persistence diagram than a local maximum peak. To remove local maxima you have to remove peaks close to the diagonal. There are four local maxima in your example, as you can see in the persistence diagram of your data (thanks for providing the data, btw), and two global ones (the first peak is not pictured in a persistence diagram):
If you add noise to your data like this:
You will still get a very decent persistence diagram that will allow you to filter local maxima as you want:
Please ask if you want more details or references.
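For anyone who wants to experiment, the birth/death bookkeeping described above can be sketched for a 1D sequence with a small union-find. This is my own minimal illustration, not code from the slides: `peak_persistence` is a hypothetical helper, it only handles maxima, and the toy input is made up.

```python
def peak_persistence(values):
    """Return (peak_index, persistence) pairs, most persistent first.

    Samples are added from highest to lowest (the falling water level).
    When two components meet, the one with the lower peak dies and its
    persistence is peak height minus the current level.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    parent = {}   # union-find forest over the samples added so far
    peak_of = {}  # component root -> index of its highest sample
    persistence = {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        peak_of[i] = i
        for j in (i - 1, i + 1):
            if j not in parent:
                continue
            a, b = find(i), find(j)
            if a == b:
                continue
            if values[peak_of[a]] <= values[peak_of[b]]:
                a, b = b, a  # make `a` the component with the higher peak
            dying = peak_of[b]
            if dying != i:   # a sample that joins immediately is not a peak
                persistence[dying] = values[dying] - values[i]
            parent[b] = a

    if values:
        persistence[peak_of[find(0)]] = float("inf")  # the global maximum never dies
    return sorted(persistence.items(), key=lambda item: item[1], reverse=True)


toy = [0, 3, 1, 2, 0, 5, 4, 4.5, 1]
peaks = peak_persistence(toy)
significant = [index for index, p in peaks if p >= 0.8]  # 0.8 is an arbitrary scale
print(peaks, significant)
```

Choosing the persistence threshold plays the role of "fixing a scale": peaks that die soon after they are born sit close to the diagonal and are the ones you would discard.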
Since you cannot decide on a cutoff frequency, and not even on the filter you want to use, I would implement several and let the user set the parameters.
The first thing that I thought of is a running average, and you can see that even there, there are many things to set in order to get different outputs.
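A running average along those lines could look like the sketch below. The centred window that shrinks at the edges and the default width of 5 are my own choices, exactly the kind of parameters you would expose to the user.

```python
def running_average(values, window=5):
    """Centred moving average; the window shrinks near both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# A short stretch of the question's data containing one of the small dips.
segment = [2.46564, 2.48385, 2.49747, 2.49031, 2.51458, 2.5149, 2.52632]
print([round(v, 4) for v in running_average(segment, window=3)])
```

A wider window removes more of the small dips but also flattens the genuine turning points, so it is worth plotting a few settings side by side.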

Neural Network gives same output for different inputs, doesn't learn

I have a neural network written in standard C++11 which I believe follows the back-propagation algorithm correctly (based on this). If I output the error in each step of the algorithm, however, it seems to oscillate without dampening over time. I've tried removing momentum entirely and choosing a very small learning rate (0.02), but it still oscillates at roughly the same amplitude per network (with each network having a different amplitude within a certain range).
Further, all inputs result in the same output (a problem I found posted here before, although for a different language. The author also mentions that he never got it working.)
The code can be found here.
To summarize how I have implemented the network:
Neurons hold the current weights to the neurons ahead of them, previous changes to those weights, and the sum of all inputs.
Neurons can have their value (sum of all inputs) accessed, or can output the result of passing said value through a given activation function.
NeuronLayers act as Neuron containers and set up the actual connections to the next layer.
NeuronLayers can send the actual outputs to the next layer (instead of pulling from the previous).
FFNeuralNetworks act as containers for NeuronLayers and manage forward-propagation, error calculation, and back-propagation. They can also simply process inputs.
The input layer of an FFNeuralNetwork sends its weighted values (value * weight) to the next layer. Each neuron in each layer afterwards outputs the weighted result of the activation function unless it is a bias, or the layer is the output layer (biases output the weighted value, the output layer simply passes the sum through the activation function).
Have I made a fundamental mistake in the implementation (a misunderstanding of the theory), or is there some simple bug I haven't found yet? If it would be a bug, where might it be?
Why might the error oscillate by the amount it does (around +-(0.2 +- learning rate)) even with a very low learning rate? Why might all the outputs be the same, no matter the input?
I've gone over most of it so much that I might be skipping over something, but I think I may have a plain misunderstanding of the theory.
It turns out I was just staring at the FFNeuralNetwork parts too much and accidentally used the wrong input set to confirm the correctness of the network. It actually does work correctly with the right learning rate, momentum, and number of iterations.
Specifically, in main, I was using the inputs array instead of the smaller array named in to test the outputs of the network.

Some confusion over Numpy + Scipy + matplotlib Spectrum Analyzer code

I've been attempting to understand the code at the bottom of http://www.frank-zalkow.de/en/code-snippets/create-audio-spectrograms-with-python.html, though sadly I haven't been getting anywhere with it. I don't think I'm expected to understand most of the code, as I have limited experience with FFTs, but unfortunately I'm also having trouble understanding how the graph is generated. I'm also getting very limited progress from a trial-and-error approach, due to the fact that my computer lags heavily and because of the relatively long time it takes for a graph to be generated.
With that being said, I need a way to scale the graph so that it only displays values up to 5000 Hz, though still on a logarithmic scale. I'd also like to understand how the wav file is sampled, and what values I can edit in order to take more samples per second. Can somebody explain how both of these points work, and how I can edit the code in order to fulfill these requirements?
Hm, this code is by me, so I'll gladly help you understand it. It's maybe not best practice and there may be several ways to improve it; suggestions are welcome. But at least it worked for me.
The function stft does a standard short-time Fourier transform of an audio signal with the help of NumPy strides. The function logscale_spec takes an STFT and scales it logarithmically. This is maybe a bit dirty and there must be a better way to do it, but it worked for me. plotstft is the function that finally reads a wave file via scipy.io.wavfile, combines the prior two functions and makes a plot with matplotlib's imshow. If you have a mono wave file you should be able to just call plotstft("/path/to/mono.wav").
That was an overview. If I should explain some things in more detail, just say so.
To your questions. To leave out some frequency values: you can get the frequency values of the FFT with np.fft.fftfreq(binsize, 1./sr). You just have to find the index of your cutoff value and leave the values above it out of the STFT.
I don't understand your second question... You can have a look at all the samples of your wave file with:
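As a rough illustration of finding that index, here is a NumPy-free sketch. It relies on the fact that bin k of an FFT of length binsize sits at k * sr / binsize, which matches the non-negative values np.fft.fftfreq(binsize, 1./sr) returns. The helper name `bins_up_to`, the binsize of 1024 and the sample rate of 44100 are assumptions for the example.

```python
def bins_up_to(cutoff_hz, binsize, sample_rate_hz):
    """Number of real-FFT bins whose frequency is at or below cutoff_hz."""
    bin_width = sample_rate_hz / binsize  # spacing between neighbouring bins
    total_bins = binsize // 2 + 1         # bins from 0 Hz up to Nyquist
    return min(total_bins, int(cutoff_hz / bin_width) + 1)


keep = bins_up_to(5000, binsize=1024, sample_rate_hz=44100)
highest_kept_hz = (keep - 1) * 44100 / 1024
print(keep, round(highest_kept_hz, 1))
```

If the STFT is stored as a 2D array with time along the first axis and frequency along the second (an assumption about the layout), slicing it as spec[:, :keep] before the logarithmic scaling would drop everything above the cutoff.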
>>> import scipy.io.wavfile as wav
>>> x = wav.read("/path/to/file.wav")
>>> x
(44100, array([4554752, 4848551, 3981874, ..., 2384923, 2040309, 294912], dtype=int32))
>>> x[1]
array([4554752, 4848551, 3981874, ..., 2384923, 2040309, 294912], dtype=int32)