How to create and calculate a formula using an unknown number of variables in C++ - c++

Okay, so this is going to be very complicated to explain through text but I will do my best to try.
I am making a universal calculator where one of the function of the calculator is to process a formula when given an unknown number of variables. I have seen some ways to do this but for how i'm trying to use this calculator, it wont work.
Example for sum of function:
while (cin >> input)
count++;
Normally this would work but the problem is that I can't have the user input the values over and over again for one formula like for this formula: Formula Example
(Sorry its easier for me to explain through a picture) In it there are multiple times where I have to use the same numbers over and over again. Here is the entire process if you need it to understand what I'm saying:
Entire problem
The problem is that normally I would add another float for every point graph but I don't know ahead of time number of floats the user is going to enter in. The ideal way to do this is for the program to ask the user for all the points on the table and for the user to input those points in a format like: "(1,2) (2,4) (3,6)..."
Thinking ahead, would I make a function where the program creates an integer and assigns the integer to a value on the fly? But then how would the actual math formula interact with the new integers if they haven't been created yet?
Talking about this makes my head hurt....
I actually want to say more:
One idea that I tried to make in my head was something like
string VariableName = A [or something]
Then you would reassign VariableName = "A" to VariableName = "B" by something like VariableName = "A"+ 1 (which would equal B).
Then you would repeat that step until the user inputs a invalid input. But obviously you can't do math with letters so I wouldn't know how to do it.

I think that you are overthinking this. It's pretty simple and it doesn't need to store the input values.
The main thing to note is that you need to compute (step 2) the sum of the values of X and Y, the sum of their product and the sum of X squared. To compute the sum of a lot of values you don't need all the values together, but just one at the time. Exactly as when a user provides them. So declare four variables: sx, sy, sxy, sxx. Initialize them to 0. At every couple of X and Y you get, add it to sx and sy, add their product to sxy and the product of X with itself to sxx.
Now you've got all you need to compute the final result for a and b.
Anyway a good C++ book would be useful.

Related

If x is y then z, else x without repeating x in Google Sheets

In a cell, is it possible to do if(x=y, z, x) without having to repeat x in the value_if_false argument? Whether there is a way of using if() to make this work or another function doesn't matter, and there isn't a specific formula I'm struggling with as I come across this blocker quite often (hence posting).
To help illustrate the need, if we take x as a complex or more advanced formula, such as
ARRAYFORMULA(IF(E$6:Q$6 < EoMONTH($P$4,0), "Not Active", IF(E$6:Q$6<$Q$4 + ISBLANK($Q$4) > 0,
COUNTIF({'Data'!$B$3:$B&'Data'!$I$3:$I&'Data'!$K$3:$K},$B$4&$C9&E$6:Q$6), "Not Active")))
and I wanted to put an if statement in there that changed the result only if a condition was true, the formula would more than double in size due to having to reference x twice:
=ARRAYFORMULA(IF(IF(E$6:Q$6 < EoMONTH($P$4,0), "Not Active", IF(E$6:Q$6<$Q$4 + ISBLANK($Q$4) > 0,
COUNTIF({'Data'!$B$3:$B&'Data'!$I$3:$I&'Data'!$K$3:$K},$B$4&$C9&E$6:Q$6), "Not Active"))) = 0, "No data", IF(E$6:Q$6 < EoMONTH($P$4,0), "Not Active", IF(E$6:Q$6<$Q$4 + ISBLANK($Q$4) > 0,
COUNTIF({'Data'!$B$3:$B&'Data'!$I$3:$I&'Data'!$K$3:$K},$B$4&$C9&E$6:Q$6), "Not Active"))))
This is just an example (the code is irrelevant), I'm trying to keep my formulas neat, tidy and efficient so that handing off to others is easier. Then I'm also mindful that it is calculating the same complex formula twice, which would probably slow the spreadsheet down especially when iterated throughout a spreadsheet.
Interested to hear the community thoughts and suggestions on this, hopefully I was clear in explaining it. :)
The only simple way to achieve this would be with the use of helper columns. They don't need to be in the same sheet as your main equation, but they do need to be within that same spreadsheet as a whole (ie you could have a sheet named "calc" that's specifically used to calculate intermediate steps and set "variables" by referencing those cells).
The only other option (which gets a bit complicated) is to create a custom function within Google Apps Script. For example, if you wanted to calculate (B1*A4)/C5 in multiple places, you could create a custom function like this:
/**
* Returns a calculation using cells A4, B1, and C5.
* #return A calculation using cells A4, B1, and C5.
* #customfunction
*/
function x() {
var ss = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('MainSheet');
var val1 = ss.getRange('B1').getValue();
var val2 = ss.getRange('A4').getValue();
var val3 = ss.getRange('C5').getValue();
return (val1*val2)/val3;
}
Then in your sheet, you could use this within a formula like this:
=if(A1="yes", x(), "no")
This custom function could obviously be altered to fit one's needs (ex taking in arguments to define the cells that the calculations should be done on instead of hard coding them, etc).
Other than this, there is currently no way to define variables within a formula itself.
This is possible to a certain extent, using TEXT's Meta Instructions, if you're using numbers and simple math conditions.
x
y
z
output
10
10
5
10
=TEXT(A3,"[="&B3&"]0;"&C3&"")
x
y
z
output
11
10
5
5
As long as your complex formula returns a number for x(or the output can be coerced to a number), this should be possible and it avoids repetition.
I agree, I would love if there was like a DECODE or NVL type function you could use so that you didn't need to repeat the original statement multiple times.
However, in many cases, when I encounter this, I can often reference another cell. Not in the way that has been suggested already, where the formula exists in another cell, but rather that the decision to perform the formula is based on another cell.
For example, using your values, lets assume the formula ((if(x=y, z, x)) only gets calculated when column 'w' is populated. Maybe column 'w' is a key component of the formula. Then you can write the formula as: if(w="",z,x). It's not exactly the same as testing the answer to the equation first and doesn't work in all situations, but in many cases I can find another field that's of key relevance to the formula that lets me get around this.

Correct values for SsaSpikeEstimator's pvalueHistoryLength

In the creation of a SsaSpikeEstimator instance by the DetectSpikeBySsa method, there is a parameter called pvalueHistoryLength - could anybody please help me understand, for any given time series with X points, which is the optimal value for this parameter?
I got similar issue, when I try to read the paper, https://arxiv.org/pdf/1206.6910.pdf, I notice one paragraph
Also, simulations and theory (Golyandina, 2010) show that it is
better to choose window length L smaller than half of the time series length
N. One of the recommended values is N/3.
Maybe that's why in the ML.Net Power Anomaly example, the value is chosen to be 30 for the 90 points dataset.

Get an interval of values possible in regression using sci-kit learn machine learning

I am trying to use regression to predict a value. For a given set of independent variables, I get a fixed number as the expected value. However, is it possible to get a range of value, so as to say that the maximum possible value be say x and minimum possible value be say y.
Using
regr = linear_model.LinearRegression()
regr.fit(X_train, Y_train)
pred = regr.predict([[a, b]])
The value of pred comes out be say 10 , but i would rather want something like max = 12 and min = 8
Simply saying a range of values.
UPDATE
Tried looking into GMM, not sure if that work for this.
Tried Gausian processes but it again give a single value something like 11.137631, which really doesn't as i am looking for a range of value rather than a single value.
The linear regression always gives same result for a given input vector, however using a random forest regressor in iteration gives different result on each iteration and that can be used to get a Minimum and maximum possible value from a given input vector forecast.

C++, determine the part that have the highest zero crosses

I’m not specialist in signal processing. I’m doing simple processing on 1D signal using c++. I want really to know how I can determine the part that have the highest zero cross rate (highest frequency!). Is there a simple way or method to tell the beginning and the end of this part.
This image illustrate the form of my signal, and this image is what I need to do (two indexes of beginning and end)
Edited:
Actually I have no prior idea about the width of the beginning and the end, it's so variable.
I could calculate the number of zero crossing, but I have no idea how to define it's range
double calculateZC(vector<double> signals){
int ZC_counter=0;
int size=signals.size();
for (int i=0; i<size-1; i++){
if((signals[i]>=0 && signals[i+1]<0) || (signals[i]<0 && signals[i+1]>=0)){
ZC_counter++;
}
}
return ZC_counter;
}
Here is a fairly simple strategy which might give you some point to start. The outline of the algorithm is as follows
Input: Vector of your data points {y0,y1,...}
Parameters:
Window size sigma.
A threshold 0<p<1 defining when to start looking for a region.
Output: The start- and endpoint {t0,t1} of the region with the most zero-crossings
I won't give any C++ code, but the method should be easy to implement. As example let us use the following function
What we desire is the region between about 480 and 600 where the zero density higher than in the front. First step in the algorithm is to calculate the positions of zeros. You can do this by what you already have but instead of counting, you store the values for i where you met a zero.
This will give you a list of zero positions
From this list (you can do this directly in the above for-loop!) you create a list having the same size as your input data which looks like {0,0,0,...,1,0,..,1,0,..}. Every zero-crossing position in your input data is marked with a 1.
The next step is to smooth this list with a smoothing filter of size sigma. Here, you can use what you like; in the simplest case a moving average or a Gaussian filter. The higher you choose sigma the bigger becomes your look around window which measures how many zero-crossings are around a certain point. Let me give the output of this filter together with the original zero positions. Note that I used a Gaussian filter of size 10 here
In a next step, you go through the filtered data find the maximum value. In this case it is about 0.15. Now you choose your second parameter which is some percentage of this maximum. Lets say p=0.6.
The final step is to go through the filtered data and when the value is greater than p you start to remember a new region. As soon as the value drops below p, you end this region and remember start and endpoint. Once you are finished walking through the data, you are left with a list of regions, each defined by a start and an endpoint. Now you choose the region with the biggest extend and you are done.
(Optionally, you could add the filter size to each end of the final region)
For the above example, I get 11 regions as follows
{{164,173},{196,205},{220,230},{241,252},{259,271},{278,290},
{297,309},{318,327},{341,350},{458,468},{476,590}}
where the one with the biggest extend is the last one {476,590}. The final result looks (with 1/2 filter region padding)
Conclusion
Please don't be discouraged by the length of my answer. I tried to explain everything in detail. The implementation is really just some loops:
one loop to create the zero-crossings list {0,0,..,1,0,...}
one nested loop for the moving average filter (or you use some library Gaussian filter). Here you can at the same time extract the maximum value
one loop to extract all regions
one loop to extract the largest region if you haven't already extracted it in the above step

CLI/C++ How to store more than 15 digit float number?

For a school project, I have a simple program, which compares 20x20 photos. I put 20 photos, and then i put 21th photo, which is compared to existing 20, and pops up the answer, which photo i did insert (or which one is most similar). The problem is, my teacher wanted me to use nearest neighbour algorithm, so i am counting distance from every photo. I got everything working, but the thing is, if photos are too similar, i got the problem with saying which one is closer to my one. For example i get these distances with 2 different photos (well, they are ALMOST the same):
0 distance: 1353.07982026191
1 distance: 1353.07982026191
It is 15 digits already, and i am using double type. I was reading that long double is the same. Is there any "easy" way to store numbers with more than 15 digits and do math on them?
I count distance using Euclidean distance
I just need to be more precise, or thats limit i probably wont pass here, and i should talk to my teacher i cant compare such similar photos?
I think you need this: gmplib.org
There's a guide how to install this library on this site too.
And here's article about floats: http://gmplib.org/manual/C_002b_002b-Interface-Floats.html#C_002b_002b-Interface-Floats
Maybe you could use an algebraic approach.
Let us assume that you are trying to calcuate if vector x is closer to a or b. What you need to calculate is the sign of
d2(x, a) - d2(x, b)
Which becomes (I'll omit some passages for brevity)
and then
Which only contains differences between values which should be very similar. Summing over such small values should yield a better precision than working on the aggregate.