Simple Curve Fitting Implementation in C++ (SVD Least Squares Fit or similar) - c++

I have been scouring the internet for quite some time now, trying to find a simple, intuitive, and fast way to approximate a 2nd degree polynomial using 5 data points.
I am using VC++ 2008.
I have come across many libraries, such as cminipack, cmpfit, lmfit, etc... but none of them seem very intuitive and I have had a hard time implementing the code.
Ultimately, I have a set of discrete values in a 1D array, and I am trying to find the 'virtual max point': I want to fit a curve to the data and then find the maximum of that curve at a non-integer position (just looking at the array, the best accuracy I can get is an integer index).
Anyway, if someone has done something similar to this, and can point me to the package they used, and maybe a simple implementation of the package, that would be great!
I am happy to provide some test data and graphs to show you what kind of stuff I'm working with, but I feel my request is pretty straightforward. Thank you so much.
EDIT: Here is the code I wrote which works!
http://pastebin.com/tUvKmGPn
Change size to change how many inputs are used. Sample run with these (x, y) inputs:
0 0
1 1
2 4
4 16
7 49
Output: a: 1 b: 0 c: 0
Thanks for the help!

Assuming that you want to fit a standard parabola of the form
y = ax^2 + bx + c
to your 5 data points, then all you need to do is solve a 3 x 3 matrix equation. Take a look at this example: http://www.personal.psu.edu/jhm/f90/lectures/lsq2.html. It works through the same problem you seem to be describing (only with more data points). If you have a basic grasp of calculus and are able to invert a 3x3 matrix (or use something numerically nicer, which I am guessing you can, given that you specifically mention SVD in your question title), then this example will show you what you need to do.
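A minimal C++ sketch of that approach, assuming plain std::vector inputs: it builds the 3 x 3 normal equations from power sums of x and solves them with Gaussian elimination (partial pivoting) instead of a full SVD. The names fitQuadratic, QuadFit and vertexX are hypothetical and not taken from the asker's pastebin code. Forming normal equations squares the condition number, so for large or widely spread x values, centering x first or using a QR/SVD solver is more robust.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Coefficients of y = a*x^2 + b*x + c.
struct QuadFit { double a, b, c; };

// Least-squares fit via the 3x3 normal equations, solved by Gaussian
// elimination with partial pivoting (a sketch, not an SVD solver).
QuadFit fitQuadratic(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 3)
        throw std::invalid_argument("need at least 3 (x, y) pairs of equal length");

    // Power sums: S[k] = sum x^k (k = 0..4), T[k] = sum x^k * y (k = 0..2).
    std::array<double, 5> S{};
    std::array<double, 3> T{};
    for (std::size_t i = 0; i < x.size(); ++i) {
        double p = 1.0;  // p = x[i]^k
        for (int k = 0; k <= 4; ++k) {
            S[k] += p;
            if (k <= 2) T[k] += p * y[i];
            p *= x[i];
        }
    }

    // Augmented matrix [A | rhs] for the unknowns (a, b, c).
    double M[3][4] = {
        {S[4], S[3], S[2], T[2]},
        {S[3], S[2], S[1], T[1]},
        {S[2], S[1], S[0], T[0]},
    };
    for (int col = 0; col < 3; ++col) {
        int pivot = col;  // pick the largest remaining entry in this column
        for (int r = col + 1; r < 3; ++r)
            if (std::fabs(M[r][col]) > std::fabs(M[pivot][col])) pivot = r;
        if (std::fabs(M[pivot][col]) < 1e-12)
            throw std::runtime_error("singular system: need at least 3 distinct x values");
        for (int k = 0; k < 4; ++k) std::swap(M[col][k], M[pivot][k]);
        for (int r = col + 1; r < 3; ++r) {  // eliminate below the pivot
            const double f = M[r][col] / M[col][col];
            for (int k = col; k < 4; ++k) M[r][k] -= f * M[col][k];
        }
    }
    double sol[3];  // back substitution
    for (int r = 2; r >= 0; --r) {
        double s = M[r][3];
        for (int k = r + 1; k < 3; ++k) s -= M[r][k] * sol[k];
        sol[r] = s / M[r][r];
    }
    return {sol[0], sol[1], sol[2]};
}

// x position of the vertex: a maximum when a < 0, a minimum when a > 0.
// Undefined (division by zero) when a == 0, i.e. the fit is a straight line.
double vertexX(const QuadFit& q) { return -q.b / (2.0 * q.a); }
```

For the asker's test points (0,0), (1,1), (2,4), (4,16), (7,49), which lie exactly on y = x^2, this should return a ≈ 1, b ≈ 0, c ≈ 0 up to rounding. For peak finding, vertexX gives the non-integer position of the maximum when the fitted a is negative.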

Look at this Wikipedia page on Polynomial Regression.
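For reference, a sketch of that 3 x 3 system: for n points (x_i, y_i), minimizing the sum of squared residuals \sum_i (y_i - a x_i^2 - b x_i - c)^2 by setting its partial derivatives with respect to a, b and c to zero gives the normal equations

$$
\begin{pmatrix}
\sum_i x_i^4 & \sum_i x_i^3 & \sum_i x_i^2 \\
\sum_i x_i^3 & \sum_i x_i^2 & \sum_i x_i \\
\sum_i x_i^2 & \sum_i x_i & n
\end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} \sum_i x_i^2 y_i \\ \sum_i x_i y_i \\ \sum_i y_i \end{pmatrix}
$$

In matrix form this is $\mathbf{X}^\top \mathbf{X}\,\boldsymbol\beta = \mathbf{X}^\top \mathbf{y}$, where row i of $\mathbf{X}$ is $(x_i^2, x_i, 1)$ and $\boldsymbol\beta = (a, b, c)^\top$. Once a and b are known, the 'virtual max point' is the vertex $x^* = -b/(2a)$, which is a maximum when $a < 0$.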


Correct values for SsaSpikeEstimator's pvalueHistoryLength

When an SsaSpikeEstimator instance is created by the DetectSpikeBySsa method, there is a parameter called pvalueHistoryLength. Could anybody please help me understand, for a given time series with X points, what the optimal value for this parameter is?
I had a similar issue. While reading the paper https://arxiv.org/pdf/1206.6910.pdf, I noticed this paragraph:
"Also, simulations and theory (Golyandina, 2010) show that it is better to choose window length L smaller than half of the time series length N. One of the recommended values is N/3."
Maybe that's why, in the ML.NET Power Anomaly example, the value is set to 30 for the 90-point dataset (30 = 90/3).
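The rule of thumb quoted from the paper is simple arithmetic; a tiny C++ sketch of it follows. Note the assumptions: the paper's advice concerns the SSA window length L, and applying it to pvalueHistoryLength is this answer's guess, not documented ML.NET behavior. recommendedWindowLength and isWindowLengthReasonable are hypothetical helpers, not ML.NET APIs.

```cpp
#include <cstddef>

// Rule of thumb from the quoted paragraph: one recommended window length is N/3.
// Hypothetical helper; integer division gives floor(N / 3).
std::size_t recommendedWindowLength(std::size_t seriesLength) {
    return seriesLength / 3;
}

// The same paragraph advises keeping L smaller than half the series length:
// 0 < L < N/2, checked without floating point as 2 * L < N.
bool isWindowLengthReasonable(std::size_t windowLength, std::size_t seriesLength) {
    return windowLength > 0 && 2 * windowLength < seriesLength;
}
```

For the 90-point dataset this gives 30, the value used in the example; a window of 45 would already violate the "smaller than half" advice.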

Some confusion over Numpy + Scipy + matplotlib Spectrum Analyzer code

I've been trying to understand the code at the bottom of http://www.frank-zalkow.de/en/code-snippets/create-audio-spectrograms-with-python.html, but sadly I haven't been getting anywhere with it. I don't think I'm expected to understand most of the code, as I have limited experience with FFTs, but unfortunately I'm also having trouble understanding how the graph is generated. Trial and error isn't getting me far either, because my computer lags heavily and a graph takes a relatively long time to generate.
With that being said, I need a way to scale the graph so that it only displays values up to 5000 Hz, though still on a logarithmic scale. I'd also like to understand how the wav file is sampled, and what values I can edit in order to take more samples per second. Can somebody explain how both of these points work, and how I can edit the code in order to fulfill these requirements?
Hm, I wrote this code, so I'm glad to help you understand it. It's probably not best practice and there are several ways to improve it; suggestions are welcome. But at least it worked for me.
The function stft performs a standard short-time Fourier transform (STFT) of an audio signal with the help of NumPy strides. The function logscale_spec takes an STFT and scales it logarithmically. This is a bit dirty and there must be a better way to do it, but it worked for me. plotstft is the function that finally reads a wave file via scipy.io.wavfile, combines the two previous functions and makes a plot with matplotlib's imshow. If you have a mono wave file, you should be able to just call plotstft("/path/to/mono.wav").
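To illustrate what an STFT computes (this is not a port of the author's NumPy code), here is a naive C++ sketch under these assumptions: a periodic Hann window, a plain O(N^2) DFT instead of an FFT, and only the magnitudes of the non-negative frequency bins kept. stftMagnitude, frameSize and hop are hypothetical names; the log-frequency scaling done by logscale_spec is not included.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive short-time Fourier transform: split the signal into overlapping frames
// of frameSize samples spaced hop samples apart, apply a Hann window, and take
// a DFT of each frame. Returns one vector of bin magnitudes per frame.
// O(frames * frameSize^2); a real implementation would use an FFT library.
std::vector<std::vector<double>> stftMagnitude(const std::vector<double>& signal,
                                               std::size_t frameSize, std::size_t hop) {
    const double pi = std::acos(-1.0);
    std::vector<std::vector<double>> frames;
    if (frameSize == 0 || hop == 0 || signal.size() < frameSize) return frames;

    std::vector<double> window(frameSize);
    for (std::size_t n = 0; n < frameSize; ++n)
        window[n] = 0.5 - 0.5 * std::cos(2.0 * pi * n / frameSize);  // periodic Hann

    for (std::size_t start = 0; start + frameSize <= signal.size(); start += hop) {
        // For real input, bins 0 .. frameSize/2 carry all the information.
        std::vector<double> mags(frameSize / 2 + 1);
        for (std::size_t k = 0; k < mags.size(); ++k) {
            std::complex<double> acc(0.0, 0.0);
            for (std::size_t n = 0; n < frameSize; ++n) {
                const double angle = -2.0 * pi * k * n / frameSize;
                acc += signal[start + n] * window[n] * std::polar(1.0, angle);
            }
            mags[k] = std::abs(acc);
        }
        frames.push_back(mags);
    }
    return frames;
}
```

A signal of 256 samples with frameSize 64 and hop 32 yields 1 + (256 - 64) / 32 = 7 frames of 33 bins each; a cosine completing exactly 8 cycles per 64 samples peaks in bin 8 of every frame.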
That was an overview; if you'd like me to explain anything in more detail, just say so.
Now to your questions. To leave out some frequency values: you can get the frequency of each FFT bin with np.fft.fftfreq(binsize, 1./sr). You just have to find the index of your cutoff value (e.g. 5000 Hz) and leave out the STFT values above it.
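The cutoff lookup is just arithmetic on the bin spacing sr / binsize. A C++ sketch with hypothetical helper names (binFrequency, cutoffBin): for 0 <= k < binsize/2, binFrequency(k) equals np.fft.fftfreq(binsize, 1./sr)[k]; you would then keep bins 0 through cutoffBin(...) and drop the rest.

```cpp
#include <cmath>
#include <cstddef>

// Frequency in Hz of FFT bin k for an FFT of length binsize at sample rate sr.
// For 0 <= k < binsize/2 this matches np.fft.fftfreq(binsize, 1./sr)[k].
double binFrequency(std::size_t k, std::size_t binsize, double sr) {
    return static_cast<double>(k) * sr / static_cast<double>(binsize);
}

// Index of the last bin whose frequency is <= cutoffHz, clamped to the Nyquist bin.
std::size_t cutoffBin(double cutoffHz, std::size_t binsize, double sr) {
    const std::size_t nyquistBin = binsize / 2;
    if (cutoffHz <= 0.0) return 0;
    const std::size_t k = static_cast<std::size_t>(std::floor(cutoffHz * binsize / sr));
    return k < nyquistBin ? k : nyquistBin;
}
```

With sr = 44100 and binsize = 1024, the bins are about 43.07 Hz apart, so a 5000 Hz cutoff keeps bins 0 to 116 (bin 116 is about 4995.7 Hz, bin 117 about 5038.8 Hz).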
I don't quite understand your second question... You can look at all the samples of your wave file with:
>>> import scipy.io.wavfile as wav
>>> x = wav.read("/path/to/file.wav")
>>> x
(44100, array([4554752, 4848551, 3981874, ..., 2384923, 2040309, 294912], dtype=int32))
>>> x[1]
array([4554752, 4848551, 3981874, ..., 2384923, 2040309, 294912], dtype=int32)

C++/CLI: How to store a floating-point number with more than 15 digits?

For a school project, I have a simple program that compares 20x20 photos. I input 20 photos, then a 21st photo, which is compared to the existing 20, and the program reports which photo I inserted (or which one is most similar). The problem is that my teacher wanted me to use the nearest neighbour algorithm, so I am computing the distance to every photo. I got everything working, but when photos are too similar, I have trouble telling which one is closer to mine. For example, I get these distances for 2 different photos (well, they are ALMOST the same):
0 distance: 1353.07982026191
1 distance: 1353.07982026191
That's already 15 digits, and I am using the double type. I read that long double is the same. Is there any "easy" way to store numbers with more than 15 digits and do math on them?
I compute the distance using the Euclidean distance.
Do I just need more precision, or is this a limit I probably won't get past here, and should I tell my teacher that I can't compare such similar photos?
I think you need this: gmplib.org
There's also a guide on how to install this library on that site.
And here's an article about floats: http://gmplib.org/manual/C_002b_002b-Interface-Floats.html#C_002b_002b-Interface-Floats
Maybe you could use an algebraic approach.
Let us assume that you are trying to determine whether vector x is closer to a or to b. What you need to calculate is the sign of
$$d^2(x, a) - d^2(x, b) = \sum_i (x_i - a_i)^2 - \sum_i (x_i - b_i)^2$$
which becomes (I'll omit some steps for brevity)
$$\sum_i \left[ (x_i - a_i)^2 - (x_i - b_i)^2 \right]$$
and then
$$\sum_i (b_i - a_i)(2x_i - a_i - b_i)$$
This only contains differences between values that should be very similar. Summing over such small values should yield better precision than working on the aggregate.

Achieving Mutability When Mixing Primitives and Cocoa Collections

Okay, I think I might be over-complicating this issue, but I am truly stuck. Basically, I am trying to model a weight set, specifically an Olympic weight set. I have the bar, which is 45 lbs, then 2 weights of 2.5 lbs, 4 of 5 lbs, and 2 each of 10, 25, 35, and 45 lbs. This makes a total of 300 lbs.
bar = 45 lbs
2 of 2.5
4 of 5
2 of 10
2 of 25
2 of 35
2 of 45
I want to model this weight set so that I have this information: the weight and the quantity of weights I have. I know I could hard-code this but I eventually want to let the user enter how many of each weight they may have.
Anyways, originally I thought I could simply have an NSDictionary with the key being the weight, such as 35, and the value being the quantity.
Naturally, I cannot store primitives in an NSDictionary or other Cocoa collection, so I have to encapsulate each integer in an NSNumber. However, the point of modeling this weight set is to simulate the use of certain weights. For example, using 35 lbs. weights takes 2 off (one for each side), so I have to edit the value for the 35 lbs. weight to reflect that I have deducted 2 from the quantity.
This involves the tedious task of unboxing the NSNumber to a primitive, doing the math, re-boxing the result into an NSNumber, and assigning it to the appropriate key in the NSDictionary. After searching around a bit, I confirmed my initial suspicion that this was not a good idea.
So I have a couple of questions. First, is there a better way of modeling a weight set than a dictionary-style solution? If not, what is the suggested way to do this? Do I have to leave the Cocoa realm and resort to some sort of C++ STL template such as a map?
I have seen some information on NSDecimalNumber; should I just use that?
Like I said, I wouldn't be surprised if I am over-complicating this. I would really appreciate any help, thanks.
EDIT: I am beginning to think that the weight set 'definition' as described should indeed be immutable, as it is a definition after all. Then, when I use a certain weight, I can add to some sort of tally. The thing is that the tally will also be some form of collection whose values I will be modifying (adding to), so that I can correlate it with the specific weight. So I guess I still have the same problem.
I think what I am trying to get at is creating a 'clone', so to speak, of the weight set definition that I can easily modify (to simulate the usage of individual weights).
Sorry, I'm burned out.
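Since the question mentions falling back to a C++ STL map, here is a minimal C++ sketch of that option (the answers suggest NSCountedSet for a Cocoa solution instead). It assumes a std::map from plate weight in lbs to count: counts are plain ints, so adjusting them needs no boxing, and copying the map gives the modifiable 'clone' of the immutable definition. WeightSet, totalWeight and usePair are hypothetical names.

```cpp
#include <map>
#include <stdexcept>

// Hypothetical model: plate weight in lbs -> number of plates available.
using WeightSet = std::map<double, int>;

// Total weight of the bar plus every plate in the set.
double totalWeight(const WeightSet& plates, double barWeight) {
    double total = barWeight;
    for (const auto& entry : plates) total += entry.first * entry.second;
    return total;
}

// Take a pair of plates (one for each side) out of the available set.
void usePair(WeightSet& plates, double plateWeight) {
    auto it = plates.find(plateWeight);
    if (it == plates.end() || it->second < 2)
        throw std::runtime_error("not enough plates of that weight");
    it->second -= 2;
}
```

With the plates listed in the question and a 45 lb bar, totalWeight gives 300; after `WeightSet available = definition; usePair(available, 35.0);` the copy has 230 lbs left while the original definition is untouched.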
Storing this in a dictionary isn't a natural fit. I think the best approach would be to make a Weight class that represents the weights, and stick them in an NSCountedSet. You can get the individual kinds of Weight and the counts for each kind, and you can get the weight of the whole set with [weightSet valueForKeyPath:@"@sum.weightInPounds"] (assuming the Weights have a weightInPounds property that represents how heavy they are).
You could use NSNumbers in the NSCountedSet and sum them with @sum.integerValue if you wanted, but it seems a bit awkward to me. At any rate, NSCountedSet is definitely a more natural collection than an NSDictionary for storing, well, a counted set.
There's nothing wrong with storing your numbers in an NSDictionary! The question you referenced was referring to complicated, frequent math. Converting from NSNumber and back is slow compared to simple int addition, but is still super-fast compared to human perception. I think your dictionary idea is EDIT: not as good as Chuck's NSCountedSet idea. :)

Autocorrelation returns random results with mic input (using a high pass filter)

Sorry to ask a question similar to the one I asked before (FFT Problem (Returns random results)), but I've looked up pitch detection and autocorrelation and have found some code for pitch detection using autocorrelation.
I'm trying to detect the pitch of a user's singing. The problem is that it keeps returning random results. I got some code from http://code.google.com/p/yaalp/, which I've converted to C++ and modified (below). My sample rate is 2048, and the data size is 1024. I'm detecting the pitch of both a sine wave and mic input. The frequency of the sine wave is 726.0, and it's detected as 722.950820 (which I'm OK with), but the pitch of the mic input is detected as a random number between around 100 and around 1050.
I'm now using a high-pass filter to remove the DC offset, but it's not working. Am I doing it right, and if so, what else can I do to fix it? Any help would be greatly appreciated!
(Fixed)
Thanks,
Niall.
Edit: I changed the code to implement a high-pass filter with a cutoff of 30 Hz (from What Are High-Pass and Low-Pass Filters?; can anyone tell me how to convert the low-pass filter using convolution into a high-pass one?), but it's still returning random results. Unfortunately, plugging it into a VST host and using VST plugins to compare spectra isn't an option for me.
Edit: Fixed. Thanks for everyone's help; I never got this version to work, so I'm now using new code.
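On the convolution question: one standard way to turn a linear-phase low-pass FIR kernel into a high-pass one is spectral inversion (negate every tap, then add 1 to the center tap), which gives a kernel that passes whatever the low-pass rejected, with a DC gain of 0. A C++ sketch under these assumptions: a Hamming-windowed sinc low-pass normalized to unity DC gain, an odd tap count such as 101 (an arbitrary choice), and hypothetical function names. It is not the asker's code, and with a 30 Hz cutoff at a 2048 Hz sample rate, a short kernel like this has a transition band several tens of Hz wide.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Windowed-sinc low-pass FIR (Hamming window), normalized so the taps sum to 1
// (unity gain at DC). numTaps must be odd and >= 3 so there is a center tap.
std::vector<double> lowPassKernel(double cutoffHz, double sampleRate, std::size_t numTaps) {
    if (numTaps < 3 || numTaps % 2 == 0) throw std::invalid_argument("numTaps must be odd and >= 3");
    const double pi = std::acos(-1.0);
    const double fc = cutoffHz / sampleRate;        // cutoff in cycles per sample
    const long m = static_cast<long>(numTaps / 2);  // center index
    std::vector<double> h(numTaps);
    double sum = 0.0;
    for (std::size_t n = 0; n < numTaps; ++n) {
        const long k = static_cast<long>(n) - m;
        const double sinc = (k == 0) ? 2.0 * fc : std::sin(2.0 * pi * fc * k) / (pi * k);
        const double w = 0.54 - 0.46 * std::cos(2.0 * pi * n / (numTaps - 1));
        h[n] = sinc * w;
        sum += h[n];
    }
    for (double& v : h) v /= sum;
    return h;
}

// Spectral inversion: negate the low-pass taps and add 1 at the center tap.
std::vector<double> highPassFromLowPass(std::vector<double> h) {
    for (double& v : h) v = -v;
    h[h.size() / 2] += 1.0;
    return h;
}

// Direct-form FIR convolution: out[i] = sum_j kernel[j] * input[i - j].
std::vector<double> applyFir(const std::vector<double>& kernel, const std::vector<double>& input) {
    std::vector<double> out(input.size(), 0.0);
    for (std::size_t i = 0; i < input.size(); ++i)
        for (std::size_t j = 0; j < kernel.size() && j <= i; ++j)
            out[i] += kernel[j] * input[i - j];
    return out;
}
```

Once the kernel is fully inside the signal (from sample numTaps - 1 onward), a constant DC offset comes out as approximately 0, while a tone well above the cutoff passes through, delayed by numTaps / 2 samples.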
I am no sound expert, but if you are sampling at 44100 (I guess samples per second) and use 1024 data points, you are working with about 1/40th of a second worth of data. It doesn't surprise me that the detected pitch varies a lot, depending on which piece you pick. If you want to find the average or main pitch of a voice, I'd expect to need about 1 second worth of data.
At 44.1 kHz sampling frequency, 1024 samples is only a little bit over 23 ms worth of data. Isn't it possible that this is simply insufficient data in order to compute the pitch of a human singer?
I mean, a sound I can make that lasts for 23 ms is probably not something I have much pitch control over; I would expect this kind of measurement to be done over slightly longer periods of time.
The problem is in your findBestCandidates() function:
Inside this function you access the 'inputs' array from 0 up to 'length - 1'.
When you call this function inside detectPitchCalculation() function 'inputs' is 'results' and 'length' is 'nHiPeriodInSamples'.
But 'results' is only allocated and filled up to 'nHiPeriodInSamples - nLowPeriodInSamples - 1'.
So if 'nLowPeriodInSamples' is greater than 0, you access unallocated, random memory inside the findBestCandidates() function!
EDIT:
Another bug is that you fill only every 'nResolution'-th entry of the 'results' array in the detectPitchCalculation() function but access every entry in the findBestCandidates() function (via the 'inputs' argument). But since you call detectPitchCalculation() with 'nResolution=1', this does not explain your specific problem, so I will look a little more. It would definitely be a problem if you called it with higher resolutions.
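The asker's code isn't shown here, so this is a hypothetical C++ sketch of the two fixes described above rather than a patch: the correlation buffer gets exactly nHighPeriod - nLowPeriod entries, every entry is filled (no nResolution stride), and the peak search iterates over that same size. detectPitchAutocorr and its parameters are illustrative names, and the method is plain time-domain autocorrelation, not the yaalp algorithm.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Autocorrelation pitch estimate over lags [nLowPeriod, nHighPeriod).
// results[i] holds the correlation for lag nLowPeriod + i, so the buffer has
// exactly nHighPeriod - nLowPeriod entries and is searched with its own size.
// Returns 0.0 if the requested range does not fit the input.
double detectPitchAutocorr(const std::vector<double>& samples, double sampleRate,
                           double minFreq, double maxFreq) {
    const std::size_t nLowPeriod = static_cast<std::size_t>(sampleRate / maxFreq);
    const std::size_t nHighPeriod = static_cast<std::size_t>(sampleRate / minFreq);
    if (nLowPeriod == 0 || nHighPeriod <= nLowPeriod || nHighPeriod >= samples.size())
        return 0.0;

    std::vector<double> results(nHighPeriod - nLowPeriod, 0.0);
    for (std::size_t i = 0; i < results.size(); ++i) {  // fill every entry
        const std::size_t lag = nLowPeriod + i;
        double sum = 0.0;
        for (std::size_t n = 0; n + lag < samples.size(); ++n)
            sum += samples[n] * samples[n + lag];
        results[i] = sum / static_cast<double>(samples.size() - lag);  // normalize by overlap
    }

    std::size_t best = 0;  // search only the entries that were allocated and filled
    for (std::size_t i = 1; i < results.size(); ++i)
        if (results[i] > results[best]) best = i;
    return sampleRate / static_cast<double>(nLowPeriod + best);
}
```

As the next answer suggests, testing with a known input is a good check: a 200 Hz sine sampled at 8000 Hz (2048 samples) with a search range of 120 to 1000 Hz should come out as 200 Hz.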
I don't see the problem in your code, but I'm not good at C. I'd try the following to find the problem:
run it with data where the result is known, e.g. with sin(x) as input
run it with small data size (e.g. 2)
Compare the results with known correct ones. You should be able to find those on the internet, or do them by hand.
If 'random' means same input, different output, you most probably have a bug in the initialization of variables. Use a debugger and known input to check that all variables, especially all elements of arrays, are properly initialized.