Detect extreme value vector C++ [closed] - c++

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I have a vector of values (delays) like this {2,4,6,3,4,5,6,4,..} in C++. My objective is to detect when a new value is an extreme value, for example 96. I am trying to come up with a general check rather than hard-coding specific numerical thresholds.
By extreme value I mean that 96 would be X times larger than 2 or 3 or 4. However, if my delays are {15,23,10,26,..} and then a value of 550 arrives, which is Y times larger than normal, I want to detect that as well.
I think I need to start with the standard deviation, but I am not sure what the best approach is from there.
Thank you.

In the absence of any other statistical information, compute the mean and the standard deviation of your existing data, and if the new point lies more than 3 standard deviations away from that mean, then don't add it.
After you have a certain number of points so you can be reasonably sure that the central limit theorem has started to work its magic (20 points as a rule of thumb, especially as "delays" implies "Poisson" at first glance), develop an algorithm to eliminate any outliers that might have been added to the initial set. Do that by considering each added point in turn: eliminate it, and see whether it matches the criteria for inclusion against the remaining points. This step is important: it's designed to fail an outlier that's introduced early; e.g. {2, 96, 4, 6, 3, 4, 5}. For really hostile data you might need to increase the dimensionality of that algorithm.
This is a tricky science - you'll have to calibrate this to suit your requirements but what I suggest will get you started.
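The two steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `compute_stats`, `is_outlier` and `prune_outliers` are hypothetical, the sample standard deviation (n - 1 denominator) is used, and the factor 3 is the default suggested in the answer and should be calibrated for real data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type: mean and sample standard deviation of a data set.
struct Stats {
    double mean;
    double stddev;
};

Stats compute_stats(const std::vector<double>& v) {
    assert(v.size() >= 2);  // the sample standard deviation needs two points
    double sum = 0.0;
    for (double x : v) sum += x;
    const double mean = sum / static_cast<double>(v.size());
    double squares = 0.0;
    for (double x : v) squares += (x - mean) * (x - mean);
    return {mean, std::sqrt(squares / static_cast<double>(v.size() - 1))};
}

// True if `value` lies more than k standard deviations from the mean of
// `history`. With fewer than two points there is nothing to compare against.
bool is_outlier(const std::vector<double>& history, double value, double k = 3.0) {
    if (history.size() < 2) return false;
    const Stats s = compute_stats(history);
    return std::fabs(value - s.mean) > k * s.stddev;
}

// Leave-one-out pass: test each stored point against all the other points
// and keep only those that pass, so an early outlier can still be removed.
std::vector<double> prune_outliers(const std::vector<double>& data, double k = 3.0) {
    std::vector<double> kept;
    for (std::size_t i = 0; i < data.size(); ++i) {
        std::vector<double> rest;
        for (std::size_t j = 0; j < data.size(); ++j) {
            if (j != i) rest.push_back(data[j]);
        }
        if (!is_outlier(rest, data[i], k)) kept.push_back(data[i]);
    }
    return kept;
}
```

With the delays from the question, `is_outlier({2,4,6,3,4,5,6,4}, 96)` and `is_outlier({15,23,10,26}, 550)` both report an outlier, and `prune_outliers` drops the 96 from {2, 96, 4, 6, 3, 4, 5}. The leave-one-out pass is quadratic in the number of points, which should be acceptable for the small initial sets discussed here.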

Related

Interior Point Method(Path following) vs Simplex [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
What are the pros and cons of these two LP methods?
I can only think of fewer iterations for the interior point method (when the LP is sufficiently large).
I'm going to list some features of both algorithms to explain what differentiates them.
Simplex
provides a basic solution, useful for branch and bound solvers in integer programming
easy to warm (or hot) start from a suboptimal solution, also necessary for integer programming
very high iteration speed mainly due to preservation of sparse data structures, but sometimes requires many iterations to reach optimality
memory efficient
numerically very stable
Interior Point
iteration count nearly independent of problem size in practice
often faster to reach optimality
easier to parallelize (Cholesky factorization)
In summary, IPM is the way to go for pure LPs, while for reoptimization-heavy applications like (mixed) integer programming the Simplex is better suited. One may also combine both approaches and perform a Simplex-like cross-over after the IPM has found an optimal solution, in order to get a basic one.
Often, it is a good idea to try both methods and decide then what works best, because performance is very much problem dependent.

C++ sort() function algorithm [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
Some days ago I wanted to use C++ sort() function to sort an array of strings, but I had a problem!
What algorithm does it use to sort the array? Is it a deterministic one or may it use different algorithms based on the type of the array?
Also, is there a clear time complexity analysis about it?
Does this function use the same algorithm for sorting numbers array and strings array?
It might or it might not. That is not specified by the standard.
And if we use it to sort an array of strings whose total size is less than 100,000 characters, would it work in less than 1 second (in the worst case)?
It might or it might not. It depends on the machine you're running the program on. Even if it does run in less than 1 second in the worst case on a particular machine, that would be difficult to prove. But you can get a decent estimate by measuring. A measurement only applies to the machine it was performed on, of course.
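As a rough illustration of the measuring suggested above, here is a minimal timing sketch. The helper names `make_random_strings` and `time_sort_ms` are hypothetical, and random lowercase strings are an assumption, not a true worst case. On the complexity question, the standard (since C++11) does require `std::sort` to perform O(N log N) comparisons; for strings, each comparison can additionally cost time proportional to the length of the strings being compared.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: builds `count` random lowercase strings of length `len`.
std::vector<std::string> make_random_strings(std::size_t count, std::size_t len,
                                             unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> letter('a', 'z');
    std::vector<std::string> out(count);
    for (std::string& s : out) {
        s.resize(len);
        for (char& c : s) c = static_cast<char>(letter(gen));
    }
    return out;
}

// Sorts `data` in place and returns the elapsed wall-clock time in milliseconds.
double time_sort_ms(std::vector<std::string>& data) {
    const auto start = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For the size in the question, something like `make_random_strings(10000, 10)` gives 100,000 characters in total. The result depends on the machine, compiler and optimization flags, so it is only an estimate for that particular setup.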

Most frequent value in dataset (with variation) [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
This is a part programming, part statistical math question.
I have a dataset from which I want to get the most frequent number (the mode). The problem is that I am dealing with values that vary slightly.
Normally, for {1,2,50,50,90}, the most frequent number would be 50.
But in my case the numbers look like this:
{1,2,49,50,51,90}, and the result should still be 50.
So my question is: how can I efficiently calculate this number, and is there a statistical term for it?
Some pseudo code:
Float items.val[] = {1,2,49,50,51,90};
Float threshold = 4;
For (item in items) {
    For (subitem in items) {
        Float dist = Distance(item, subitem)
        If (dist < threshold) {
            item.dist += dist
        }
    }
}
Output = Sort(item.dist)[0]
There are various ways to go about this.
(1) the most careful, exact way is to assume a probabilistic model for the observed values, and look for the mode (as the expected value or most probable or some other criterion) of the inferred values. I am going to guess this is far too much work in this case, although given unlimited time I would certainly want to approach it that way.
(2) construct a histogram, and look for the bin which has the greatest density (with density = (#items in bin)/(width of bin)). This doesn't necessarily yield a single value.
(3) fit a parametric distribution to the observed values, and report the mode of the fitted distribution.
You might get more traction for this question at stats.stackexchange.com. Good luck and have fun.
EDIT: After looking at your example code, I see it is not too different from (2) above. It seems like a reasonable and workable approach.
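One way to make the neighbourhood idea concrete is sketched below. It is a variation on the question's pseudocode, not the asker's exact method: it counts how many values fall within the threshold of each candidate (essentially a density estimate with a fixed window) and uses the summed distance only to break ties, because ranking by summed distance alone would favour isolated values such as 90, whose only neighbour is itself. The function name `approximate_mode` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the value with the most neighbours closer than `threshold`.
// Ties are broken in favour of the candidate whose neighbours are closest
// in total, which picks the centre of a cluster.
double approximate_mode(const std::vector<double>& values, double threshold) {
    assert(!values.empty());
    double best = values.front();
    std::size_t best_count = 0;
    double best_spread = 0.0;
    for (double candidate : values) {
        std::size_t count = 0;  // neighbours within the window (incl. itself)
        double spread = 0.0;    // summed distance to those neighbours
        for (double other : values) {
            const double dist = std::fabs(candidate - other);
            if (dist < threshold) {
                ++count;
                spread += dist;
            }
        }
        if (count > best_count || (count == best_count && spread < best_spread)) {
            best = candidate;
            best_count = count;
            best_spread = spread;
        }
    }
    return best;
}
```

For {1,2,49,50,51,90} with a threshold of 4, the values 49, 50 and 51 each have three neighbours, and 50 wins the tie because its summed distance (2) is the smallest. The double loop is O(n²); sorting the data first and sliding a window over it would be one way to make this faster for large inputs.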

Efficiency vs Memory tradeoff [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am creating an interactive sudoku board in C++. Whenever the user changes a value, I would like to check if the board is completed. The board will be completed when all spaces on the board are filled. My two ideas for how to do this are:
Create a private data member that holds the number of filled spaces. To check whether the board is completed, I simply check whether this value equals boardLength^2.
Create a member function that iterates through the board and returns false when a blank space is found, and true if it gets through the board without finding any blank spaces.
Is this a matter of preference, or is there a more accepted/correct way to do this?
There is an accepted and correct way of optimizing, in general:
Optimize for speed or memory footprint when you actually need to, when you identify an actual problem. Your project's unique requirements will govern what constitutes a "problem".
Otherwise, optimize your code for readability and maintainability.
In your particular case:
Chances are that no matter which algorithm you choose, your check will happen so quickly that you will not be able to measure it, and the user will never notice the difference between the simple solution and the "fast" solution. Any attempts to optimize this (at the cost of complexity or readability or time spent writing code) are poor trade-offs.
Use the simplest possible solution. Once finished, if there is a noticeable delay on user input, and you can confirm that it's caused by an inefficient check for board completion, consider ways to improve your algorithm.
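For illustration, the simple scanning approach might look like the sketch below. The `Board` class, its `set` and `is_filled` members, the fixed 9x9 size, and the use of 0 for a blank cell are all assumptions made for the example, not details from the question.

```cpp
#include <algorithm>
#include <array>

// Hypothetical 9x9 board; a cell value of 0 marks a blank space (assumption).
class Board {
public:
    void set(int row, int col, int value) { cells_[row * kSize + col] = value; }

    // Scans all 81 cells; returns true only if none of them is blank.
    bool is_filled() const {
        return std::none_of(cells_.begin(), cells_.end(),
                            [](int v) { return v == 0; });
    }

private:
    static constexpr int kSize = 9;
    std::array<int, kSize * kSize> cells_{};  // value-initialized to all zeros
};
```

Scanning 81 integers after each user input is far below anything a user could notice, and it avoids the bookkeeping a separate counter needs (for example, keeping the count right when a filled cell is overwritten or cleared).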

C++ comparing two value to find which is closest to user input value [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have been looking on the internet for a while to find a solution to my problem. First, some background. I'm writing a program that calculates catapult trajectory. The user must first type in a distance. Then I loop through the combinations of angle (in degrees) and velocity to find which combination gives a distance that comes closest to the user's input. I don't quite know how to do the variable comparison to find which combination of angle and velocity produces a distance closest to the distance the user entered. I'm just trying to keep it as simple and easy as possible. Also, I'm not using any kind of array to store the values. I want it done on the fly inside my for loops if possible. Any suggestions?
Well, the answer to this depends on the complexity of your trajectory formula. I'm guessing that you're not taking fluid dynamics or gravity differentials into consideration. In fact, what I imagine is that you're using a basic parabolic equation...
That equation can be solved directly by rearranging. But the thing is, you're solving for two variables that are actually co-dependent. There are infinitely many solutions if you allow both angle and velocity to vary, so you need to restrict the 'best' answer by some criterion (for example, a desired angle or a desired velocity).
If you have more variables, like lift, drag, spin, incident shape, non-constant gravity, air pressure and humidity, then you will need to employ a minimization algorithm which is non-trivial. One of the most basic, but a little unstable, is the Nelder-Mead algorithm.
If this has not been helpful enough, you should provide more information about your problem, and show some code.
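For the on-the-fly comparison the question asks about, a common pattern is to remember the smallest error seen so far while looping, with no array needed. The sketch below assumes the drag-free range formula on level ground, R = v² sin(2θ) / g, and arbitrary search ranges of 1 to 89 degrees and 1 to 100 m/s; the names `Launch`, `range` and `closest_launch` are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Hypothetical result type: one angle/velocity pair and the range it gives.
struct Launch {
    double angle_deg;
    double velocity;
    double distance;
};

// Ideal projectile range on level ground without drag (an assumption):
// R = v^2 * sin(2 * theta) / g
double range(double angle_deg, double velocity) {
    const double kPi = 3.14159265358979323846;
    const double g = 9.81;  // m/s^2
    const double theta = angle_deg * kPi / 180.0;
    return velocity * velocity * std::sin(2.0 * theta) / g;
}

// Tries every whole-number angle and velocity in the assumed search ranges
// and keeps the combination whose range is closest to `target`.
Launch closest_launch(double target) {
    Launch best{0.0, 0.0, 0.0};
    double best_error = std::numeric_limits<double>::infinity();
    for (int angle = 1; angle <= 89; ++angle) {
        for (int v = 1; v <= 100; ++v) {
            const double d = range(angle, v);
            const double error = std::fabs(d - target);
            if (error < best_error) {  // strictly better than anything so far
                best_error = error;
                best = {static_cast<double>(angle), static_cast<double>(v), d};
            }
        }
    }
    return best;
}
```

Because many angle and velocity pairs give nearly the same range, this returns only the first best match within the search grid. Fixing one of the two variables, as the answer suggests, would make the result unique.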