4 randomly pulled cards at least one would be ace - c++

Please help me with my C++ program that I don't know how to write. The question is as follows.
There is a well-shuffled deck of 32 cards. Use the method of statistical tests to obtain the probability of the event that of 4 randomly pulled cards at least one would be an ace.
Compare the error of the calculated probability against the true value (the true probability is approximately 0.432). Vary the number of experiments n.

What are the odds of not drawing an ace in one draw?
In four successive draws?
What are the odds that this doesn't happen?

From what I understand of your question, you have already calculated the odds of drawing the ace, but now need a program to prove it.
Shuffle your cards.
Draw 4 cards.
Check your hand for the presence of an ace.
Repeat these steps n times, where n is the number of tests you need to make. Your final, "proven" probability is a/n, where a is the number of times an ace came up.
Of course, given the nature of randomness, there's no way to ensure that your results will be near the mathematical answer, unless you have the time available to make n equal to infinity.
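The steps above can be sketched in C++. This is a minimal version, and it assumes the 32-card deck contains 4 aces (the question doesn't say, so treat that as an assumption):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Monte Carlo estimate of P(at least one ace among 4 cards drawn from a
// 32-card deck). Assumes the deck contains 4 aces.
double estimateAceProbability(int n) {
    std::vector<int> deck(32);
    std::iota(deck.begin(), deck.end(), 0); // label cards 0..31; 0..3 are the aces
    std::mt19937 rng(std::random_device{}());
    int hits = 0;
    for (int trial = 0; trial < n; ++trial) {
        std::shuffle(deck.begin(), deck.end(), rng); // shuffle the deck
        bool ace = false;
        for (int i = 0; i < 4; ++i)                  // draw the top 4 cards
            if (deck[i] < 4) ace = true;             // check for an ace
        if (ace) ++hits;
    }
    return static_cast<double>(hits) / n;            // a / n
}
```

Running it for increasing n (say 100, 1 000, 10 000, 100 000) and printing the difference from 0.432 shows the error shrinking roughly like 1/sqrt(n), which covers the "vary the number of experiments" part of the assignment.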

Unfortunately I need to 'answer' rather than comment as I would wish because my rep is not high enough to allow me to do so.
There is information missing which will make it impossible to be sure of providing a correctly functioning program.
Most importantly, coming to your problem from a mathematical/probability background:
I need to know for sure how many of the reduced deck of 32 cards are aces!
Unfortunately, this sentence:
Method of statistical tests to obtain the probability of an event that of the 4 randomly pulled charts at least one would be ace.
is mathematical gobbledygook!
You need to correctly quote the sentences given to you in your assignment.
Those sentences hold vital information, on which depends what the C++ program is to simulate!


Reinventing The Wheel: Random Number Generator

So I'm new to C++ and am trying to learn some things. As such I am trying to make a Random Number Generator (RNG or PRNG if you will). I have basic knowledge of RNGs, like you have to start with a seed and then send the seed through the algorithm. What I'm stuck at is how people come up with said algorithms.
Here is the code I have to get the seed.
#include <ctime>

int getSeed()
{
    // time() returns the current time in seconds since the epoch,
    // which is fine as a once-per-run seed.
    time_t randSeed = time(NULL);
    return static_cast<int>(randSeed);
}
Now I know that there are prebuilt RNGs in C++, but I'm looking to learn, not just copy other people's work and try to figure it out.
So if anyone could point me to where I could read about this, or show me examples of how to come up with such algorithms, I would greatly appreciate it.
First, just to clarify, any algorithm you come up with will be a pseudo random number generator and not a true random number generator. Since you would be making an algorithm (i.e. writing a function, i.e. making a set of rules), the random number generator would have to eventually repeat itself or do something similar which would be non-random.
Examples of truly random number generators are ones that capture random noise from nature and digitize it. These include:
http://www.fourmilab.ch/hotbits/
http://www.random.org/
You can also buy physical equipment that generates white noise (or uses some other source of randomness) and digitally captures it:
http://www.lavarnd.org/
http://www.idquantique.com/true-random-number-generator/products-overview.html
http://www.araneus.fi/products-alea-eng.html
In terms of pseudo random number generators, the easiest ones to learn (and ones that an average lay person could probably make on their own) are the linear congruential generators. Unfortunately, these are also some of the worst PRNGs there are.
Some guidelines for determining what is a good PRNG include:
Periodicity (what is the range of available numbers?)
Consecutive numbers (what is the probability that the same number will be repeated twice in a row)
Uniformity (Is it just as likely to pick numbers from a certain sub range as another sub range)
Difficulty in reverse engineering it (If it is close to truly random then someone should not be able to figure out the next number it generates based on the last few numbers it generated)
Speed (how fast can I generate a new number? Does it take 5 or 500 arithmetic operations)
I'm sure there are others I am missing
One of the more popular ones right now that is considered good in most applications (i.e. not cryptography) is the Mersenne Twister. As you can see from the link, it is a simple algorithm, perhaps only 30 lines of code. However, trying to come up with those 30 lines of code from scratch takes a lot of brainpower and study of PRNGs. Usually the most famous algorithms are designed by a professor or industry professional who has studied PRNGs for decades.
I hope you do study PRNGs and try to roll your own (try Knuth's Art of Computer Programming or Numerical Recipes as a starting place), but I just wanted to lay this all out. At the end of the day (unless PRNGs will be your life's work) it's much better to just use something someone else has come up with. Also, along those lines, I'd like to point out that historically compilers, spreadsheets, etc. don't use what most mathematicians consider good PRNGs, so if you need a high-quality PRNG, don't use the standard library one in C++, Excel, .NET, Java, etc. until you have researched what it is implemented with.
A linear congruential generator is commonly used and the Wiki article explains it pretty well.
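As a concrete illustration, here is a minimal LCG in C++. The multiplier and increment are the pair published in Numerical Recipes; any other well-studied (a, c, m) triple from the Wikipedia article would do:

```cpp
#include <cstdint>

// Minimal linear congruential generator (LCG):
//   x_{n+1} = (a * x_n + c) mod m
// with a = 1664525, c = 1013904223, m = 2^32 (the Numerical Recipes
// constants). The mod 2^32 happens implicitly via uint32_t wraparound.
class Lcg {
public:
    explicit Lcg(std::uint32_t seed) : state_(seed) {}
    std::uint32_t next() {
        state_ = 1664525u * state_ + 1013904223u;
        return state_;
    }
private:
    std::uint32_t state_;
};
```

Two generators seeded identically produce identical sequences - the reproducibility property discussed elsewhere on this page - and that is also why an LCG fails the "hard to reverse-engineer" criterion: a single output reveals the entire internal state.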
To quote John von Neumann:
Anyone who considers arithmetical methods of producing random digits is of course in a state of sin.
This is taken from Chapter 3 Random Numbers of Knuth's book "The Art of Computer Programming", which must be the most exhaustive overview of the subject available. And once you have read it, you will be exhausted. You will also know why you don't want to write your own random number generator.
The correct solution best fulfills the requirements, and the requirements of every situation will be unique. This is probably the simplest way to go about it:
1. Create a large one-dimensional array populated with "real" random values.
2. "Seed" your pseudo-random generator by calculating the starting index from the system time.
3. Iterate through the array and return the value for each call to your function.
4. Wrap around when it reaches the end.
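Sketched in C++ (the table here is tiny and made up for illustration; a real one would hold many thousands of values captured from a hardware source):

```cpp
#include <cstddef>
#include <ctime>

// Table-lookup generator: pre-generated "real" random values, a start
// index seeded from the system time, and wrap-around at the end.
const unsigned kTable[] = {92, 17, 63, 4, 88, 41, 29, 76, 55, 10};
const std::size_t kTableSize = sizeof(kTable) / sizeof(kTable[0]);

class TableRng {
public:
    TableRng()
        : index_(static_cast<std::size_t>(std::time(nullptr)) % kTableSize) {}
    unsigned next() {
        unsigned v = kTable[index_];
        index_ = (index_ + 1) % kTableSize; // wrap around at the end
        return v;
    }
private:
    std::size_t index_;
};
```

Note the obvious weakness: the period is only the table length, so this is reasonable only when the table is much larger than the number of values you will ever draw.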

What's the best way to unit test code that generates random output?

Specifically, I've got a method that picks n items from a list in such a way that a% of them meet one criterion, b% meet a second, and so on. A simplified example would be to pick 5 items where 50% have a given property with the value 'true' and 50% 'false'; 50% of the time the method would return 2 true/3 false, and the other 50% of the time, 3 true/2 false.
Statistically speaking, this means that over 100 runs, I should get about 250 true/250 false, but because of the randomness, 240/260 is entirely possible.
What's the best way to unit test this? I'm assuming that even though technically 300/200 is possible, it should probably fail the test if this happens. Is there a generally accepted tolerance for cases like this, and if so, how do you determine what that is?
Edit: In the code I'm working on, I don't have the luxury of using a pseudo-random number generator, or a mechanism of forcing it to balance out over time, as the lists that are picked out are generated on different machines. I need to be able to demonstrate that over time, the average number of items matching each criterion will tend to the required percentage.
Randomness and statistics are not favored in unit tests. Unit tests should always return the same result. Always. Not mostly.
What you could do is remove the random generator from the logic you are testing. Then you can mock the random generator and return predefined values.
Additional thoughts:
You could consider changing the implementation to make it more testable. Try to use as few random values as possible. You could, for instance, get only one random value to determine the deviation from the average distribution. This would be easy to test: if the random value is zero, you should get the exact distribution you expect on average. If the value is, for instance, 1.0, you miss the average by some defined factor, for instance by 10%. You could also implement some Gaussian distribution, etc. I know this is not the topic here, but if you are free to implement it as you want, consider testability.
Based on the statistical information you have, determine a range instead of a particular single value as the expected result.
Many probabilistic algorithms in e.g. scientific computing use pseudo-random number generators, instead of a true random number generator. Even though they're not truly random, a carefully chosen pseudo-random number generator will do the job just fine.
One advantage of a pseudo-random number generator is that the random number sequence they produce is fully reproducible. Since the algorithm is deterministic, the same seed would always generate the same sequence. This is often the deciding factor why they're chosen in the first place, because experiments need to be repeatable, results reproducible.
This concept is also applicable for testing. Components can be designed such that you can plug in any source of random numbers. For testing, you can then use generators that are consistently seeded. The result would then be repeatable, which is suitable for testing.
Note that if in fact a true random number is needed, you can still test it this way, as long as the component features a pluggable source of random numbers. You can re-plug in the same sequence (which may be truly random if need be) to the same component for testing.
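A sketch of such a pluggable source in C++ (pickItems and the picking logic are invented for illustration): the picker takes the engine by reference, so a test can pass a deterministically seeded std::mt19937 and assert on the exact result.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Pick n items from a list without replacement, drawing indices from a
// caller-supplied engine. Because the random source is a parameter, a
// test can seed it deterministically and get reproducible picks.
std::vector<int> pickItems(const std::vector<int>& items, int n,
                           std::mt19937& rng) {
    std::vector<int> pool = items;
    std::vector<int> picked;
    for (int i = 0; i < n && !pool.empty(); ++i) {
        std::uniform_int_distribution<std::size_t> dist(0, pool.size() - 1);
        std::size_t j = dist(rng);
        picked.push_back(pool[j]);
        pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(j));
    }
    return picked;
}
```

In production code you plug in an engine seeded from a true-random or time-based source; in tests you plug in a fixed seed and the output is fully repeatable.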
It seems to me there are at least three distinct things you want to test here:
The correctness of the procedure that generates an output using the random source
That the distribution of the random source is what you expect
That the distribution of the output is what you expect
1 should be deterministic and you can unit test it by supplying a chosen set of known "random" values and inputs and checking that it produces the known correct outputs. This would be easiest if you structure the code so that the random source is passed as an argument rather than embedded in the code.
2 and 3 cannot be tested absolutely. You can test to some chosen confidence level, but you must be prepared for such tests to fail in some fraction of cases. Probably the thing you really want to look out for is test 3 failing much more often than test 2, since that would suggest that your algorithm is wrong.
The tests to apply will depend on the expected distribution. For 2 you most likely expect the random source to be uniformly distributed. There are various tests for this, depending on how involved you want to be, see for example Tests for pseudo-random number generators on this page.
The expected distribution for 3 will depend very much on exactly what you're producing. The simple 50-50 case in the question is exactly equivalent to testing for a fair coin, but obviously other cases will be more complicated. If you can work out what the distribution should be, a chi-square test against it may help.
That depends on the use you make of your test suite. If you run it every few seconds because you embrace test-driven development and aggressive refactoring, then it is very important that it doesn't fail spuriously, because this causes major disruption and lowers productivity, so you should choose a threshold that is practically impossible to reach for a well-behaved implementation. If you run your tests once a night and have some time to investigate failures you can be much stricter.
Under no circumstances should you deploy something that will lead to frequent uninvestigated failures - this defeats the entire purpose of having a test suite, and dramatically reduces its value to the team.
You should test the distribution of results in a "single" unit test, i.e. check that the result is as close to the desired distribution as possible in any individual run. For your example, 2 true / 3 false is OK as a result, but 4 true / 1 false is not.
Also, you could write tests which execute the method, say, 100 times and check that the average of the distributions is "close enough" to the desired rate. This is a borderline case - running bigger batches may take a significant amount of time, so you might want to run these tests separately from your "regular" unit tests. Also, as Stefan Steinegger points out, such a test is going to fail every now and then if you define "close enough" too strictly, or becomes meaningless if you define the threshold too loosely. So it is a tricky case...
I think if I had the same problem I would probably construct a confidence interval to detect anomalies, given some statistics about the average/stddev and such. So in your case, if the expected average is 250, create a 95% confidence interval around it using a normal distribution. If the results fall outside that interval, fail the test.
Why not refactor the random number generation code so that both the unit test framework and the source code use it? You are trying to test your algorithm, not the randomized sequence, right?
First you have to know what distribution should result from your random number generation process. In your case you are generating a result which is either 0 or 1, each with probability 0.5. This describes a binomial distribution with p=0.5.
Given the sample size of n, you can construct (as an earlier poster suggested) a confidence interval around the mean. You can also make various statements about the probability of getting, for instance, 240 or less of either outcome when n=500.
You can use a normal-distribution approximation for values of n greater than about 20, as long as p is not very large or very small. The Wikipedia article on the binomial distribution has more on this.
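A sketch of that interval check in C++, using the normal approximation (the function name and the 1.96 default for ~95% confidence are choices for illustration, not from the original answers):

```cpp
#include <cmath>

// Normal-approximation check: is an observed success count consistent
// with Binomial(n, p)? z = 1.96 corresponds to ~95% confidence. This is
// a sketch of the interval test, not a replacement for a proper
// statistics library.
bool withinConfidenceInterval(int successes, int n, double p,
                              double z = 1.96) {
    double mean = n * p;                             // expected successes
    double stddev = std::sqrt(n * p * (1.0 - p));    // binomial std. dev.
    return std::fabs(successes - mean) <= z * stddev;
}
```

For the question's example (n = 500, p = 0.5) the interval is roughly 250 ± 22, so observing 240 or 260 passes while 300 or 200 fails, matching the intuition in the question.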

What is the maximum theoretically possible compression rate?

This is a theoretical question, so expect that many details here are not computable in practice or even in theory.
Let's say I have a string s that I want to compress. The result should be a self-extracting binary (can be x86 assembler, but it can also be some other hypothetical Turing-complete low level language) which outputs s.
Now, we can easily iterate through all possible such binaries and programs, ordered by size. Let B_s be the sub-list of those binaries that output s (of course, B_s is uncomputable).
As every non-empty set of positive integers has a minimum, there must be a smallest program b_min_s in B_s.
For what languages (i.e. set of strings) do we know something about the size of b_min_s? Maybe only an estimation. (I can construct some trivial examples where I can always even calculate B_s and also b_min_s, but I am interested in more interesting languages.)
This is Kolmogorov complexity, and you are correct that it's not computable. If it were, you could create a paradoxical program of length n that printed a string with Kolmogorov complexity m > n.
Clearly, you can bound b_min_s for given inputs. However, as far as I know most of the efforts to do so have been existence proofs. For instance, there is an ongoing competition to compress English Wikipedia.
Claude Shannon estimated the information density of the English language to be somewhere between 0.6 and 1.3 bits per character in his 1951 paper Prediction and Entropy of Printed English (PDF, 1.6 MB. Bell Sys. Tech. J (3) p. 50-64).
The maximal (average) compression rate possible is 1:1.
The number of possible inputs must be equal to the number of possible outputs, so that every output can be mapped back to its input.
To be able to store the output you need a container of the same size as the minimal container for the input - giving a 1:1 compression rate.
Basically, you need enough information to rebuild your original information. I guess the other answers are more helpful for your theoretical discussion, but just keep this in mind.
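The counting argument can be made precise with the pigeonhole principle (a standard sketch, not part of the original answer):

```latex
% There are 2^n bit strings of length n, but only 2^n - 1 strings
% of length strictly less than n:
\[
\sum_{k=0}^{n-1} 2^k \;=\; 2^n - 1 \;<\; 2^n ,
\]
% so any injective (lossless) encoding must map at least one
% length-n string to an output of length at least n.
```

In other words, every lossless compressor that shrinks some inputs must expand others; only the average over all inputs is bounded by 1:1.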

Mahjong-solitaire solver algorithm, which needs a speed-up

I'm developing a Mahjong-solitaire solver and so far I'm doing pretty well. However, it is not as fast as I would like it to be, so I'm asking for any additional optimization techniques you guys might know of.
All the tiles are known from the layouts, but the solution isn't. At the moment, I have a few rules which guarantee safe removal of certain pairs of identical tiles (removals which cannot be an obstacle to a possible solution).
For clarity: a tile is free when it can be picked at any time, and a tile is loose when it doesn't block any other tiles at all.
If there are four free tiles available, remove them immediately.
If there are three tiles that can be picked up and at least one of them is a loose tile, remove the non-loose ones.
If there are three tiles that can be picked up and only one of them is free (the other two loose), remove the free tile and one random loose tile.
If there are three loose tiles available, remove two of them (it doesn't matter which ones).
Since each tile occurs exactly four times, if only two of them are left, remove them - they're the only ones left.
My algorithm searches for a solution recursively, in multiple threads. Once a branch is finished (reaching a position where there are no more moves) and it didn't lead to a solution, it puts the position into a vector of bad positions. Every time a new branch is launched, it iterates over the bad positions to check whether that particular position has already been examined.
This process continues until a solution is found or all possible positions have been checked.
This works nicely on a layout which contains, say, 36 or 72 tiles. But when there are more, the algorithm becomes pretty much useless due to the huge number of positions to search.
So I ask you once more: if any of you have good ideas for implementing more rules for safe tile removal, or any other speed-ups for the algorithm, I'd be glad to hear them.
Very best regards,
nhaa123
I don't completely understand how your solver works. When you have a choice of moves, how do you decide which possibility to explore first?
If you pick an arbitrary one, it's not good enough - that's basically just brute-force search. You might need to explore the "better" branches first. To determine which branches are "better", you need a heuristic function that evaluates a position. Then you can use one of the popular heuristic search algorithms. Check these:
A* search
beam search
(Google is your friend)
Some years ago, I wrote a program that solves Solitaire Mahjongg boards by peeking. I used it to examine one million turtles (it took about a day on half a computer: it had two cores), and it appeared that about 2.96 percent of them cannot be solved.
http://www.math.ru.nl/~debondt/mjsolver.html
Ok, that was not what you asked, but you might have a look at the code to find some pruning heuristics in it that did not cross your mind thus far. The program does not use more than a few megabytes of memory.
Instead of a vector containing the "bad" positions, use a set which has a constant lookup time instead of a linear one.
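A minimal sketch of that change, assuming each position can be reduced to a 64-bit key (for example by Zobrist-hashing the remaining tiles; the class and method names are illustrative):

```cpp
#include <cstdint>
#include <unordered_set>

// Transposition table for failed positions: average O(1) insert and
// lookup, versus the O(n) scan of a vector. The 64-bit key is assumed
// to encode a whole board position.
class BadPositionCache {
public:
    bool contains(std::uint64_t key) const {
        return seen_.find(key) != seen_.end();
    }
    void insert(std::uint64_t key) { seen_.insert(key); }
private:
    std::unordered_set<std::uint64_t> seen_;
};
```

Since the solver searches in multiple threads, access to a shared cache would additionally need a mutex (or a concurrent hash map) to stay safe.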
If 4 tiles are visible but cannot be picked up, the other tiles around them have to be removed first. The path should use your rules to remove a minimum of tiles on the way towards these tiles, to open them up.
If tiles are hidden by other tiles, the problem doesn't have full information for finding a path, and a probability based on the remaining tiles needs to be calculated.
Very nice problem!

Rigor in capturing test cases for unit testing

Let's say we have a simple function defined in a pseudo language.
List<Numbers> SortNumbers(List<Numbers> unsorted, bool ascending);
We pass in an unsorted list of numbers and a boolean specifying ascending or descending sort order. In return, we get a sorted list of numbers.
In my experience, some people are better at capturing boundary conditions than others. The question is, "How do you know when you are 'done' capturing test cases"?
We can start listing cases now and some clever person will undoubtedly think of 'one more' case that isn't covered by any of the previous.
Don't waste too much time trying to think of every boundary condition. Your tests won't catch every bug the first time around. The idea is to have tests that are pretty good, and then each time a bug does surface, write a new test specifically for that bug so that you never hear from it again.
Another note I want to make is about code coverage tools. In a language like C# or Java where you have many get/set and similar methods, you should not be shooting for 100% coverage. That means you are wasting too much time writing tests for trivial code. You only want 100% coverage on your complex business logic. If your full codebase is closer to 70-80% coverage, you are doing a good job. If your code coverage tool allows multiple coverage metrics, the best one is 'block coverage', which measures coverage of 'basic blocks'. Other types are class and method coverage (which don't give you as much information) and line coverage (which is too fine-grained).
How do you know when you are 'done' capturing test cases?
You don't. You can't get to 100% except for the most trivial cases. Also, 100% coverage (of lines, paths, conditions...) doesn't tell you you've hit all boundary conditions.
Most importantly, the test cases are not write-and-forget. Each time you find a bug, write an additional test. Check it fails with the original program, check it passes with the corrected program and add it to your test set.
An excerpt from The Art of Software Testing by Glenford J. Myers:
1. If an input condition specifies a range of values, write test cases for the ends of the range, and invalid-input test cases for situations just beyond the ends.
2. If an input condition specifies a number of values, write test cases for the minimum and maximum number of values and one beneath and beyond these values.
3. Use guideline 1 for each output condition.
4. Use guideline 2 for each output condition.
5. If the input or output of a program is an ordered set, focus attention on the first and last elements of the set.
6. In addition, use your ingenuity to search for other boundary conditions.
(I've only pasted the bare minimum for copyright reasons.)
Points 3. and 4. above are very important. People tend to forget boundary conditions for the outputs. 5. is OK. 6. really doesn't help :-)
Short exam
This is more difficult than it looks. Myers offers this test:
The program reads three integer values from an input dialog. The three values represent the lengths of the sides of a triangle. The program displays a message that states whether the triangle is scalene, isosceles, or equilateral.
Remember that a scalene triangle is one where no two sides are equal, whereas an isosceles triangle has two equal sides, and an equilateral triangle has three sides of equal length. Moreover, the angles opposite the equal sides in an isosceles triangle also are equal (it also follows that the sides opposite equal angles in a triangle are equal), and all angles in an equilateral triangle are equal.
Write your test cases. How many do you have? Myers asks 14 questions about your test set and reports that highly qualified professional programmers average 7.8 out of a possible 14.
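To make the exercise concrete, here is a sketch of the classifier itself together with the kind of boundary cases Myers' questions probe (the function name and return strings are invented):

```cpp
#include <string>

// Classify a triangle by its side lengths. "invalid" covers non-positive
// sides and violations of the triangle inequality - both classic boundary
// cases in Myers' exercise.
std::string classifyTriangle(long a, long b, long c) {
    if (a <= 0 || b <= 0 || c <= 0) return "invalid";
    if (a + b <= c || a + c <= b || b + c <= a) return "invalid";
    if (a == b && b == c) return "equilateral";
    if (a == b || b == c || a == c) return "isosceles";
    return "scalene";
}
```

Boundary cases worth covering: a degenerate triple like (1, 2, 3), a zero or negative side, all permutations of an isosceles triple like (2, 2, 3), and inputs large enough that a + b could overflow the side type.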
From a practical standpoint, I create a list of tests that I believe must pass prior to acceptance. I test these and automate where possible. Based on how much time I've estimated for the task or how much time I've been given, I extend my test coverage to include items that should pass prior to acceptance. Of course, the line between must and should is subjective. After that, I update automated tests as bugs are discovered.
#Keith
I think you nailed it. Code coverage is important to look at if you want to see how "done" you are, but I think 100% is a bit of an unrealistic goal. Striving for 75-90% will give you pretty good coverage without going overboard... don't test for the pure sake of hitting 100%, because at that point you are just wasting your time.
A good code coverage tool really helps.
100% coverage doesn't mean that it definitely is adequately tested, but it's a good indicator.
For .NET, NCover's quite good, but it is no longer open source.
#Mike Stone -
Yeah, perhaps that should have been "high coverage" - we aim for 80% minimum; past about 95% it's usually diminishing returns, especially if you have belt-and-braces code.