Most frequent value in dataset (with variation) [closed] - c++

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
This is part programming question, part statistics question.
I have a dataset from which I want to get the most frequent number (the mode). The problem is that my values vary slightly.
Normally, for {1,2,50,50,90}, the most frequent number would be 50.
But in my case the numbers look like this:
{1,2,49,50,51,90}, and the result should still be 50.
So my question is: how can I calculate this number efficiently, and is there a statistical term for it?
Some pseudo code:
Float items.val[] = {1,2,49,50,51,90};
Float threshold = 4;
For (item in items) {
For (subitem in items){
Float dist=Distance(item,subitem)
If (dist < threshold){
item.dist += dist
}
}
}
Output=Sort(item.dist)[0]

There are various ways to go about this.
(1) the most careful, exact way is to assume a probabilistic model for the observed values and look for the mode of the inferred values (taking the expected value, the most probable value, or some other criterion). I am guessing this is far too much work in this case, although given unlimited time I would certainly want to approach it that way.
(2) construct a histogram, and look for the bin which has the greatest density (with density = (#items in bin)/(width of bin)). This doesn't necessarily yield a single value.
(3) fit a parametric distribution to the observed values, and report the mode of the fitted distribution.
You might get more traction for this question at stats.stackexchange.com. Good luck and have fun.
EDIT: After looking at your example code, I see it is not too different from (2) above. It seems like a reasonable and workable approach.
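The pairwise-distance idea from the question can be sketched in C++ as follows. This is a minimal sketch, not a definitive method: the function name `approximate_mode` and the tie-breaking rule (prefer the candidate with the smallest total distance to its neighbours) are assumptions made for illustration. It returns the value that has the most neighbours closer than the threshold, at O(n^2) cost.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the input value with the most neighbours
// closer than `threshold`. Ties go to the candidate whose neighbours are
// closest in total. Assumes `values` is non-empty.
double approximate_mode(const std::vector<double>& values, double threshold) {
    std::size_t best_count = 0;
    double best_spread = 0.0;
    double best_value = values.front();
    for (double candidate : values) {
        std::size_t count = 0;  // neighbours within the threshold (including itself)
        double spread = 0.0;    // summed distance to those neighbours
        for (double other : values) {
            const double dist = std::fabs(candidate - other);
            if (dist < threshold) {
                ++count;
                spread += dist;
            }
        }
        if (count > best_count || (count == best_count && spread < best_spread)) {
            best_count = count;
            best_spread = spread;
            best_value = candidate;
        }
    }
    return best_value;
}
```

With the example data {1,2,49,50,51,90} and a threshold of 4, the values 49, 50 and 51 each have three neighbours, and 50 wins the tie because it is the most central of them.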

Related

Solving system of equation modulo 2 [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am writing a program in C++ that factorizes natural numbers. The only remaining problem is to write a function that does the following:
input: it receives a matrix vector< vector< int> > M.
output: it returns a vector v such that the product of v and M is a vector whose coordinates are all equal to 0.
Everything must be modulo 2, so the coefficients of M and v consist only of 0s and 1s.
Should I use Gaussian elimination? If so, how do I do this? The problem is that the implementations I saw on the net don't use vectors, and vectors are necessary in my main program.
I would be grateful if someone could help me.
Regards
This is an interesting problem. The steps to be taken can be found in this exercise. link.
The tricky part is to understand that there are always only finitely many solutions: either only the trivial solution exists, or non-trivial solutions exist as well.
Once you finish the row reduction steps, if non-trivial solutions exist, there is always at least one independent (free) variable, which can take either value 0 or 1, and the rest of the variables depend on the independent variables.
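A minimal sketch of such a row reduction modulo 2, working directly on vector< vector< int> >, might look like the following. The function name `left_null_vector_mod2` and the bookkeeping matrix `track` are assumptions made for illustration, and the sketch reads "multiplying v and M" as the row vector v times M, so v has one entry per row of M. Addition modulo 2 is done with XOR, and all entries of M are assumed to be 0 or 1.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical helper: returns a non-zero v with v * M == 0 (mod 2), or an
// empty vector if only the trivial solution exists.
std::vector<int> left_null_vector_mod2(Matrix m) {
    const std::size_t rows = m.size();
    const std::size_t cols = rows ? m[0].size() : 0;
    // track[i] records which original rows have been XORed into row i.
    Matrix track(rows, std::vector<int>(rows, 0));
    for (std::size_t i = 0; i < rows; ++i) track[i][i] = 1;

    std::size_t pivot_row = 0;
    for (std::size_t col = 0; col < cols && pivot_row < rows; ++col) {
        // Find a row at or below pivot_row with a 1 in this column.
        std::size_t sel = pivot_row;
        while (sel < rows && m[sel][col] == 0) ++sel;
        if (sel == rows) continue;  // no pivot in this column
        std::swap(m[sel], m[pivot_row]);
        std::swap(track[sel], track[pivot_row]);
        // Clear this column in every row below the pivot.
        for (std::size_t r = pivot_row + 1; r < rows; ++r) {
            if (m[r][col] == 1) {
                for (std::size_t c = 0; c < cols; ++c) m[r][c] ^= m[pivot_row][c];
                for (std::size_t k = 0; k < rows; ++k) track[r][k] ^= track[pivot_row][k];
            }
        }
        ++pivot_row;
    }
    // Rows from pivot_row onwards are now all zero; each one's track entry
    // is a combination of original rows that sums to zero modulo 2.
    if (pivot_row < rows) return track[pivot_row];
    return {};
}
```

Every remaining zero row yields one independent solution, so collecting `track[r]` for all r from `pivot_row` upwards would give a basis of the solution space rather than a single vector.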

Detect extreme value vector C++ [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I have a vector of values (delays) like this {2,4,6,3,4,5,6,4,..} in C++. My objective is to detect when a new value is an extreme value - for example, 96. I am trying to come up with a general check to detect instead of putting specific numerical checks.
By extreme value I mean that 96 would be X times larger than 2 or 3 or 4. However, if I have delays such as {15,23,10,26,..} and then a value of 550, which is Y times larger than normal, I want to detect that too.
I think I need to start with the standard deviation, but I am not sure about the best approach from there.
Thank you.
In the absence of any other statistical information, compute the mean and the standard deviation of your existing data, and if the new point lies more than 3 standard deviations from that mean, then don't add it.
After you have a certain number of points so you can be reasonably sure that the central limit theorem has started to work its magic (20 points as a rule of thumb, especially as "delays" implies "Poisson" on first glance), develop an algorithm to eliminate any outliers that might have been added to the initial set. Do that by considering each added point in turn: eliminate it, and see whether it matches the criteria for inclusion. This step is important: it's designed to reject an outlier that was introduced early, e.g. {2, 96, 4, 6, 3, 4, 5}. For really hostile data you might need to increase the dimensionality of that algorithm.
This is a tricky science - you'll have to calibrate this to suit your requirements but what I suggest will get you started.
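As a starting point, the 3-standard-deviation check could be sketched like this. It is only a sketch under stated assumptions: the function name `is_outlier`, the `num_sigmas` parameter, and the use of the sample standard deviation (dividing by n - 1) of the points accepted so far are all choices made for illustration, and the retrospective clean-up of early outliers is not included.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if `candidate` lies more than `num_sigmas`
// sample standard deviations away from the mean of `accepted`.
bool is_outlier(const std::vector<double>& accepted, double candidate,
                double num_sigmas = 3.0) {
    const std::size_t n = accepted.size();
    if (n < 2) return false;  // not enough data to estimate a spread

    double sum = 0.0;
    for (double x : accepted) sum += x;
    const double mean = sum / static_cast<double>(n);

    double squared_deviations = 0.0;
    for (double x : accepted) squared_deviations += (x - mean) * (x - mean);
    const double stddev = std::sqrt(squared_deviations / static_cast<double>(n - 1));

    return std::fabs(candidate - mean) > num_sigmas * stddev;
}
```

For the delays {2,4,6,3,4,5,6,4} the mean is 4.25 and the sample standard deviation is about 1.39, so 96 is flagged while 5 is not. The same call flags 550 against {15,23,10,26} without any data-specific constant, which is the "general check" the question asks for.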

A computing trick to calculate for e.g number of boxes required to place N objects given each box can hold M objects? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
As part of a recent topcoder SRM problem we had to compute the number of buses "B" required to carry "N" people, given that each bus has "S" seats.
What is the smartest way to compute this in C++?
The obvious way is to do:
if(N%S==0){B=N/S;}
else{ B=N/S + 1;}
^ ALL VARIABLES ARE INTEGERS, N AND S ASSIGNED APPROPRIATE VALUES
However, I can't understand the logic behind the following code, which is one particular topcoder user's solution that I was checking out:
B = (N + (S-1))/S;
How does this work?
The code
B = (N + (S-1))/S;
is a common rounding trick. Integer division discards the remainder, which for non-negative operands is what floor does. Adding S-1 first turns this into a ceil operation: the sum reaches the next multiple of S exactly when N is not already a multiple of S.
This is similar to the common way of rounding floating-point numbers:
n = floor(n + 0.5);
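To see why the trick works, write N = q*S + r with 0 <= r < S. Then N + (S-1) = q*S + (r + S - 1). If r is 0, the extra term S-1 is less than S and the quotient stays q. If r is at least 1, the extra term lies between S and 2S-1, so the quotient becomes q+1. A small sketch comparing both versions follows; the function names are hypothetical, and it assumes N is non-negative, S is positive, and N + S - 1 does not overflow.

```cpp
// Ceiling of n / s using only integer arithmetic.
// Assumes n >= 0, s > 0, and that n + s - 1 does not overflow.
long long ceil_div(long long n, long long s) {
    return (n + (s - 1)) / s;
}

// The branching version from the question, kept for comparison.
long long ceil_div_branching(long long n, long long s) {
    if (n % s == 0) {
        return n / s;
    }
    return n / s + 1;
}
```

For example, 11 people with 5 seats per bus need (11 + 4) / 5 = 3 buses, while 10 people need (10 + 4) / 5 = 2.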

C++ comparing two value to find which is closest to user input value [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have been looking on the internet for a while to find a solution to my problem. First, some background. I'm writing a program that calculates a catapult trajectory. The user must first type in a distance. Then I loop through combinations of angle (in degrees) and velocity to find which combination gives a distance closest to the user's input. I don't quite know how to do the variable comparison that finds which combination of angle and velocity produces a distance closest to the user's input. I'm trying to keep it as simple and easy as possible. Also, I'm not using any kind of array to store the values. I want it done on the fly inside my for loops if possible. Any suggestions?
Well, the answer to this depends on the complexity of your trajectory formula. I'm guessing that you're not taking fluid dynamics or gravity differentials into consideration. In fact, what I imagine is that you're using a basic parabolic equation...
That equation can be solved directly by rearranging. But the thing is, you're solving for two variables that are actually co-dependent. There are infinitely many solutions if you allow both angle and velocity to vary, so you need to restrict the 'best' answer by some criterion (for example, a desired angle or a desired velocity).
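If you do want to keep the nested loops from the question, the closest combination can be tracked on the fly with a running minimum, without storing anything in an array. The sketch below is only an illustration under loud assumptions: it assumes the ideal range formula R = v^2 * sin(2*angle) / g on flat ground with no drag and g = 9.81 m/s^2, and the names `Launch`, `ideal_range`, `closest_launch` and the loop bounds (1 to 89 degrees, 1 to 100 m/s) are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Hypothetical record of the best combination seen so far.
struct Launch {
    double angle_deg;
    double speed;
    double range;
};

// ASSUMPTION: ideal projectile range on flat ground, no drag.
double ideal_range(double angle_deg, double speed) {
    const double pi = std::acos(-1.0);
    const double g = 9.81;  // assumed gravitational acceleration in m/s^2
    return speed * speed * std::sin(2.0 * angle_deg * pi / 180.0) / g;
}

// Scans a grid of angles and speeds and keeps whichever combination
// lands closest to `target`; no array of results is stored.
Launch closest_launch(double target) {
    Launch best{0.0, 0.0, 0.0};
    double best_error = std::numeric_limits<double>::infinity();
    for (int angle = 1; angle <= 89; ++angle) {
        for (int speed = 1; speed <= 100; ++speed) {
            const double range = ideal_range(angle, speed);
            const double error = std::fabs(range - target);
            if (error < best_error) {  // strictly better than anything so far
                best_error = error;
                best = Launch{static_cast<double>(angle),
                              static_cast<double>(speed), range};
            }
        }
    }
    return best;
}
```

Because many combinations reach nearly the same distance, which one is reported depends on the loop order and the grid spacing; fixing either the angle or the velocity, as suggested above, removes that ambiguity.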
If you have more variables, like lift, drag, spin, incident shape, non-constant gravity, air pressure and humidity, then you will need to employ a minimization algorithm which is non-trivial. One of the most basic, but a little unstable, is the Nelder-Mead algorithm.
If this has not been helpful enough, you should provide more information about your problem, and show some code.

Normalizing histograms? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
What is normalizing histograms? When and why would I use it? What are its advantages?
I don't understand the concept at all. When I try to apply it to my histogram and then use back projection, I don't get any results.
Could someone give me a non-technical explanation of normalization?
I am using OpenCV.
PS: Don't send me to Wikipedia; I don't understand the Wikipedia page.
Thanks
It's very simple, actually. A normalized histogram is one in which the sum of the frequencies is exactly 1. Therefore, if you express each frequency as a fraction of the total, you get a normalized histogram.
What is the use of a normalized histogram? Well, if you studied probability and/or statistics, you might know that one property required for a function to be a probability distribution for a random variable is that the total area under the curve is 1. That's for continuous-variable functions. For discrete functions, the requirement is that the sum of all values of the function is 1. So a normalized histogram can be thought of as a probability distribution function which shows how probable each of the values of your random variable is.