What is the consequence of a negative probability in discrete_distribution? (C++)

What is the consequence if we pass a weight vector containing a negative value into discrete_distribution, e.g.:
std::discrete_distribution<> d({1, -2, 3});
I tried searching the documentation, but no one seems to mention it! Is it undefined behaviour? I tested it by asking the distribution to generate random numbers, and it never seems to return 1 (the index of the second element); is that really the case?

From the standard:
Unless specified otherwise, the distribution parameters are calculated as p_k = w_k / S for k = 0, ..., n-1, in which the values w_k, commonly known as the weights, shall be non-negative, non-NaN, and non-infinity.
Passing a negative weight violates that precondition; therefore, undefined behaviour.
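If you would rather fail loudly than silently invoke UB, one option is to validate the weights up front. A minimal sketch; the helper make_checked_distribution is invented for this example:

#include <cmath>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper: rejects weights that violate the standard's
// precondition (negative, NaN or infinite) instead of invoking UB.
std::discrete_distribution<> make_checked_distribution(const std::vector<double>& w) {
    for (double x : w)
        if (!(x >= 0.0) || !std::isfinite(x))  // !(x >= 0.0) also catches NaN
            throw std::invalid_argument("weights must be non-negative and finite");
    return std::discrete_distribution<>(w.begin(), w.end());
}

int main() {
    std::mt19937 gen(42);
    auto d = make_checked_distribution({1, 2, 3});  // fine
    // make_checked_distribution({1, -2, 3});       // would throw instead of UB
    return d(gen) >= 0 ? 0 : 1;
}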

Should this Sympy 1.0 code print True instead of None?

The variable xx is made positive by assumption.
Therefore exp(xx) > 1 and exp(xx)-1 must be positive.
There seem to be similar examples on this page that return derived results:
http://docs.sympy.org/latest/modules/assumptions/
I understand relations don't work right yet in Sympy 1.0.
Have I run into that issue indirectly here?
# Should this Sympy 1.0 code print True (or why not?)
# This code prints `None`
from sympy import *
from sympy.assumptions.assume import global_assumptions

xx = symbols('xx')
xxPos = Q.positive(xx)
with assuming(xxPos):
    print(ask(Q.positive(exp(xx) - 1)))
## None
The problem is simple: assumptions on ranges are not implemented. The only allowed subranges of real numbers are positive, negative, nonpositive and nonnegative intervals.
Your expression exp(xx)-1 is an Add object containing an exp object and -1 as addends. To get an idea, look at the code handling the positive assumption for Add:
https://github.com/sympy/sympy/blob/sympy-1.0/sympy/assumptions/handlers/order.py#L267
The evaluation of positivity for Add is clear from the code:
it makes sure that the expression is a real number first (i.e. not complex);
if any term in the addition is negative, it returns None;
otherwise it counts the nonnegative terms in the addition: if they are fewer than the total number of terms (i.e. there are some strictly positive terms), it returns True, otherwise None.
In your expression there is a negative term, so the loop is interrupted and the default value None is returned.
NOTE
This description may appear to lack the case in which all terms are negative. The function linked above returns None in that case, which is later handled by the SAT solver and determined to be False.

Good way to detect identical expressions in C++

I am writing a program that solves this puzzle game: some numbers and a goal number are given, and you make the goal number using the n numbers and the operators +, -, *, / and (). For example, given 2, 3, 5, 7 and the goal number 10, the solutions are (2+3)*(7-5)=10, 3*5-(7-2)=10, and so on.
The catch is that if I implement it naively, I get a bunch of identical solutions, like (2+3)*(7-5)=10 and (3+2)*(7-5)=10, or 3*5-(7-2)=10, 5*3-(7-2)=10, 3*5-7+2=10 and 3*5+2-7=10, and so on. So I'd like to detect those identical solutions and prune them.
I'm currently using randomly generated double values to detect identical solutions. What I do is basically substitute those random numbers into each solution and check whether any pair of them evaluates to the same number. I have to perform the detection at every node of my search, so it has to be fast, and I use a hash set for it now.
Now the problem is the error that comes with the calculation. Because even identical solutions do not evaluate to exactly the same value, I currently round the calculated value to some precision when storing it in the hash set. However, this does not seem to work well enough, and it gives a different number of solutions each time for the same problem. Sometimes the random numbers are bad and prune some completely different solutions. Sometimes the calculated value lies on the edge of the rounding function and it outputs two (or more) identical solutions. Is there a better way to do this?
EDIT:
By "identical" I mean two or more solutions(f(w,x,y,z,...) and g(w,x,y,z,...)) that calculate to the same number whatever the original number(w,x,y,z...) is. For more examples, 4/3*1/2 and 1*4/3/2 and (1/2)/(3/4) are identical, but 4/3/1/2 and 4/(3*1)/2 are not because if you change 1 to some other number they will not produce the same result.
It will be easier if you "canonicalize" the expressions before comparing them. One way would be to sort when an operation is commutative, so 3+2 becomes 2+3 whereas 2+3 remains as it was. Of course you will need to establish an ordering for parenthesized groups as well, like 3+(2*1)...does that become (1*2)+3 or 3+(1*2)? What the ordering is doesn't necessarily matter, so long as it is a total ordering.
Generate all possibilities of your expressions. Then...
When you create expressions, put them in a collection of parse trees (this also eliminates the parentheses). Then "push down" any division and subtraction into the leaf nodes, so that all the non-leaf nodes have only * and +. Apply a sorting to the branches (e.g. a regular string sort) and then compare the trees to see if they are identical.
I like the idea of using doubles. The problem is in the rounding. Why not use a container sorted by the value obtained with one random set of double inputs? When you find the place you would insert into that container, you can look at the immediately preceding and following items. Use a different set of random doubles to recompute each for a more robust comparison. Then you can have a reasonable cutoff for "close enough to be equal" without arbitrary rounding.
If a pair of expressions are close enough to be equal on both the main set of random numbers and the second set, the expressions are safely "the same" and the newer one is discarded. If close enough on the main set but not on the new set, you have a rare problem that probably requires rekeying the entire container with a different random number set. If not close enough on either, then they are different.
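A rough sketch of that scheme (names invented here; a real program would map the key to the expression itself rather than to its second value):

#include <cmath>
#include <iostream>
#include <iterator>
#include <map>

// Sketch only: each candidate expression is represented by the pair of
// values it takes on two fixed random input sets (computed elsewhere).
struct Keys { double primary, secondary; };

bool close(double a, double b) {
    return std::fabs(a - b) <= 1e-9 * std::fmax(1.0, std::fmax(std::fabs(a), std::fabs(b)));
}

// `seen` is ordered by the primary value, so only the neighbours of the
// insertion point need to be inspected; no arbitrary rounding involved.
bool isDuplicate(std::map<double, double>& seen, Keys k) {
    auto it = seen.lower_bound(k.primary);
    auto candidates = {it, it == seen.begin() ? seen.end() : std::prev(it)};
    for (auto c : candidates)
        if (c != seen.end() && close(c->first, k.primary)
                            && close(c->second, k.secondary))
            return true;  // same under both random input sets: treat as identical
    seen.emplace(k.primary, k.secondary);
    return false;
}

int main() {
    std::map<double, double> seen;
    std::cout << isDuplicate(seen, {2.5, 7.1}) << "\n";          // 0: new
    std::cout << isDuplicate(seen, {2.5 + 1e-12, 7.1}) << "\n";  // 1: duplicate
}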
For the larger n suggested by one of your recent comments, I think you would need the better performance that should be possible from a canonical-by-construction method (or maybe an "almost" canonical-by-construction method) rather than a primarily comparison-based approach.
You don't want to construct an incredibly large number of expressions and only then canonicalize and compare.
Define a doubly recursive function can(...) that takes as input:
A reference to a canonical expression tree.
A reference to one subexpression of that tree.
A count N of inputs to be injected.
A set of flags for prohibiting some injections.
A leaf function to call.
If N is zero, can just calls the leaf function. If N is nonzero, can patches the subtree in every possible way that produces a canonical tree with N injected variables, calling the leaf function for each; it restores the tree as it goes, undoing each part of the patch once it is done with it, so we never need massive copying.
Let X be the subtree and K a leaf representing variable N-1. First, can temporarily replaces the subtree, one at a time, with subtrees representing some of (X)+K, (X)-K, (X)*K, (X)/K and K/(X); both the flags and some other rules cause some of those to be skipped. For each one not skipped, it recursively calls itself with the whole tree as both top and sub, with N-1, and with no flags.
Next, it drills into the two children of X and recursively calls itself with each child as the subtree, with N, and with appropriate flags.
The outer caller just calls can with a single-node tree representing variable N-1 of the original N, passing N-1 as the count.
In the discussion it is easier to name the inputs forward, so A is input N-1, B is input N-2, etc.
When we drill into X and see it is Y+Z or Y-Z, we don't want to add K to or subtract K from Y or Z, because those are redundant with X+K or X-K. So we pass a flag that suppresses a direct add or subtract.
Similarly, when we drill into X and see it is Y*Z or Y/Z, we don't want to multiply or divide either Y or Z by K, because that is redundant with multiplying or dividing X by K.
Some cases for further clarification:
(A/C)/B and A/(B*C) are easily non-canonical, because we prefer (A/B)/C, and so when distributing C into (A/B) we forbid direct multiplying or dividing.
I think it takes just a bit more effort to allow C/(A*B) while rejecting C/(A/B), which is already covered by (B/A)*C.
It is easier if negation is inherently non-canonical, so level 1 is just A and does not include -A. Then, if the whole expression yields the negative of the target value, we negate the whole expression. Otherwise we never visit the negative of a canonical expression:
Given X, we might visit (X)+K, (X)-K, (X)*K, (X)/K and K/(X), and we might drill down into the parts of X, passing flags which suppress some of the above cases for the parts:
If X is a + or -, suppress + and - in its direct parts. If X is a * or /, suppress * and / in its direct parts.
But if X is a /, we also suppress K/(X) before drilling into X.
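To make the shape of this concrete, here is a runnable skeleton (all names invented, and simplified to injections with + and * only, so the rules for -, / and the K/(X) case are left out). It demonstrates the patch-in-place recursion with suppression flags; for three inputs it produces each of the 8 distinct {+,*} expressions exactly once:

#include <functional>
#include <iostream>
#include <string>

struct Node {
    char op;      // '+', '*' for internal nodes, 'v' for a leaf
    int  var;     // variable index, used by leaves
    Node *l, *r;  // children, null for leaves
};

enum { NO_ADD = 1, NO_MUL = 2 };

std::string show(Node* t) {
    if (t->op == 'v') return std::string(1, char('A' + t->var));
    return "(" + show(t->l) + t->op + show(t->r) + ")";
}

// Patch `sub` in place to each allowed form, recurse, then restore.
void can(Node*& top, Node*& sub, int n, int flags,
         const std::function<void()>& leaf) {
    if (n == 0) { leaf(); return; }
    Node k{'v', n - 1, nullptr, nullptr};
    for (char op : std::string("+*")) {
        if ((op == '+' && (flags & NO_ADD)) || (op == '*' && (flags & NO_MUL)))
            continue;
        Node* old = sub;
        Node parent{op, 0, old, &k};
        sub = &parent;                  // patch in place...
        can(top, top, n - 1, 0, leaf);  // restart injection at the root
        sub = old;                      // ...and restore, no copying needed
    }
    if (sub->op != 'v') {               // drill into the children
        int f = sub->op == '+' ? NO_ADD : NO_MUL;
        can(top, sub->l, n, f, leaf);
        can(top, sub->r, n, f, leaf);
    }
}

int main() {
    Node root{'v', 2, nullptr, nullptr};  // single-node tree: variable C (N-1 for N=3)
    Node* top = &root;
    int count = 0;
    can(top, top, 2, 0, [&] { std::cout << show(top) << "\n"; ++count; });
    std::cout << count << " trees\n";     // 8
}

Extending it to - and / means adding the extra patched forms and the extra rules described above.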
Since you are dealing with integers, I'd focus on getting an exact result.
Claim: suppose there is some f(a_1, ..., a_n) = x, where the a_i and x are your integer input numbers and f(a_1, ..., a_n) represents any function of your desired form. Then clearly f(a_i) - x = 0. I claim we can construct a different function g with g(x, a_1, ..., a_n) = 0 for the exact same x, where g only uses ()s, +, - and * (no division).
I'll prove that below. Consequently you could construct g and check g(x, a_1, ..., a_n) = 0 using integer arithmetic only.
Example:
Suppose we have a_i = i for i = 1, ..., 4 and f(a_i) = a_4 / (a_2 - (a_3 / a_1)) (which contains divisions so far). This is how I would simplify it:
0 = a_4 / (a_2 - (a_3 / a_1) ) - x | * (a_2 - (a_3 / a_1) )
0 = a_4 - x * (a_2 - (a_3 / a_1) ) | * a_1
0 = a_4 * a_1 - x * (a_2 * a_1 - (a_3) )
In this form, you can verify your equality for some given integer x using integer operations only.
Proof:
There is some g(x, a_i) := f(a_i) - x which is equivalent to f. Consider an equivalent g with as few divisions as possible. Assume there is at least one (otherwise we are done). Assume that within g we divide by h(x, a_i) (any function of your form; it may contain divisions itself). Then (g*h)(x, a_i) := g(x, a_i) * h(x, a_i) has the same roots as g (multiplying by h preserves every root, i.e. every (x, a_i) where g(x, a_i) = 0). But on the other hand, g*h is composed of one division fewer. This contradicts g having the minimum number of divisions, which is why g doesn't need to contain any division.
I've updated the example to visualize the strategy.
Update: this works well on rational input numbers (each of those represents a single division p/q), which should help you; other inputs can't really be provided by humans anyway.
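If you do want to keep evaluating expressions with division directly, the same exactness is available through rational arithmetic instead of doubles. A minimal sketch with a hand-rolled Rat type (boost::rational would serve equally well; overflow of the 64-bit components is ignored for brevity):

#include <cstdint>
#include <iostream>
#include <numeric>  // std::gcd (C++17)

// Minimal exact rational type. Invariant: d > 0 and gcd(n, d) == 1.
struct Rat {
    std::int64_t n, d;
    Rat(std::int64_t num, std::int64_t den = 1) {
        if (den < 0) { num = -num; den = -den; }
        auto g = std::gcd(num < 0 ? -num : num, den);
        n = g ? num / g : num;
        d = g ? den / g : den;
    }
};
Rat operator+(Rat a, Rat b) { return {a.n * b.d + b.n * a.d, a.d * b.d}; }
Rat operator-(Rat a, Rat b) { return {a.n * b.d - b.n * a.d, a.d * b.d}; }
Rat operator*(Rat a, Rat b) { return {a.n * b.n, a.d * b.d}; }
Rat operator/(Rat a, Rat b) { return {a.n * b.d, a.d * b.n}; }  // caller checks b != 0
bool operator==(Rat a, Rat b) { return a.n == b.n && a.d == b.d; }

int main() {
    // 4/3*1/2, 1*4/3/2 and (1/2)/(3/4) from the question all compare equal:
    Rat a = Rat(4) / Rat(3) * Rat(1) / Rat(2);
    Rat b = Rat(1) * Rat(4) / Rat(3) / Rat(2);
    Rat c = (Rat(1) / Rat(2)) / (Rat(3) / Rat(4));
    std::cout << (a == b && b == c) << "\n";  // 1
}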
What are you doing to find / test f's? I'd guess some form of dynamic programming will be fast in practice.

Mathematically defined result of an expression

What does mathematically defined result mean?
There is a quote from 5/4:
If during the evaluation of an expression, the result is not
mathematically defined or not in the range of representable values for
its type, the behavior is undefined.
There's a note right after this statement, which provides some types of examples:
[ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]
For example, 0/0 is not mathematically defined.
The case of 1/0 is slightly different, but in practice, for the C++ standard you can be sure that it's not viewed as mathematically defined.
The mathematical way to state this would be that the behavior is undefined iff the inputs are not elements of the natural domain of the function.
(The second part, about results being representable, translates to a restriction on the codomain of the function)
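In code, this reading translates directly into guarding the natural domain (and the codomain restriction) before evaluating. A small illustrative sketch; the helper names are invented here:

#include <climits>
#include <optional>

// Quotient only when the result is mathematically defined (divisor
// nonzero) and representable (INT_MIN / -1 does not fit in an int).
std::optional<int> checked_div(int a, int b) {
    if (b == 0) return std::nullopt;                   // 0/0, 1/0: not defined
    if (a == INT_MIN && b == -1) return std::nullopt;  // defined, but not representable
    return a / b;
}

// Sum only when it stays inside the representable range [INT_MIN, INT_MAX].
std::optional<int> checked_add(int a, int b) {
    if (b > 0 ? a > INT_MAX - b : a < INT_MIN - b) return std::nullopt;
    return a + b;
}

int main() {
    return checked_div(1, 0).has_value();  // 0: rejected instead of evaluated
}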
It depends on context. Mathematically not defined simply means: it is not defined within mathematics.
Suppose you want to divide by 0, but it is not defined: division is the inverse of multiplication. If a / b = c, then b * c = a.
But if b = 0, then any multiple of b is also 0, and so if a != 0, no such c exists.
On the other hand, if a and b are both zero, then every real number c satisfies b * c = a. Either way, it is impossible to assign a particular real number to the quotient when the divisor is zero. (From Wikipedia.)
In algebra, a function is said to be "undefined" at points not in its domain.
In geometry: in ancient times, geometers attempted to define every term. For example, Euclid defined a point as "that which has no part". In modern times, mathematicians recognized that attempting to define every word inevitably led to circular definitions, and in geometry left some words, "point" for example, undefined.
So it depends on context.
In a programming context, you can assume that it means dividing by 0, or a result out of the defined range, causing overflow.
"Mathematically defined result" means the same thing it would mean in other similar contexts. Namely, there are constructs in most programming languages that represent, to that or another degree of accuracy, various mathematical concepts via some kind of natural mapping. This mapping is normally not defined in the documentation but is implied by reader's understanding of the natural language and common sense. In the C++ standard, it is referred in various places as "usual" or "ordinary" mathematical rules.
If you are unsure what this mapping is, I guess you can submit a defect report, but usually users of the standard know what 2+2 maps to, just as they know what words like if or the or appear or requirement mean.
It is true that 2+2 could conceivably be mapped to various mathematical constructs, not necessarily even connected to, say, Peano arithmetic, but that's not "ordinary" or "usual" mathematics by the standards of the C++ community.
So to answer the question, take an expression and map it to the corresponding mathematical concept. If it is not defined, then it is not defined.

what's the difference between mid=(beg+end)/2 and mid=beg+(end-beg)/2 in binary search?

It is problem 3.26 from C++ Primer, fifth edition; I don't know the difference between them.
Maybe the second one can avoid overflow.
Maybe the second one can avoid overflow.
Exactly. There's no guarantee that beg+end is representable; but in the second case the intermediate values, as well as the expected result, are no larger than end, so there is no danger of overflow.
The second form can also be used for affine types like pointers and other random-access iterators, which can be subtracted to give a distance, but not added together.
In the general case, both expressions are invalid. For example, the first expression is invalid because there is no + operation for pointers or iterators.
The second expression is invalid when non-random-access iterators are used, for example bidirectional iterators.
So the correct construction in C++ looks the following way:
mid = std::next(beg, std::distance(beg, end) / 2);
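For example, with a std::list, whose iterators are bidirectional only, neither of the two original forms compiles, while the std::next/std::distance form does (at the cost of a linear traversal):

#include <iterator>
#include <list>

int main() {
    std::list<int> v{1, 2, 3, 4, 5};
    auto beg = v.begin(), end = v.end();
    // auto mid = (beg + end) / 2;        // ill-formed: no operator+ for iterators
    // auto mid = beg + (end - beg) / 2;  // ill-formed: no operator- for bidirectional iterators
    auto mid = std::next(beg, std::distance(beg, end) / 2);
    return *mid == 3 ? 0 : 1;             // mid points at the middle element
}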
If we consider the two lines in a more generic setting, not related to binary search, the following observations can be made:
You are correct that the problem the second form tries to avoid is overflow: attempting to represent a number that is larger than the maximum representable number.
There is no restriction on how large the individual numbers beg and end are, so potentially they can both be larger than half of the maximum representable number. Adding them means that the intermediate result beg+end can overflow.
The second solution seems to eliminate the risk of overflow, but introduces another one. If the values are signed, their difference can again overflow (or underflow, depending on their signs). Unsigned values have no such problem.
There is another solution which you didn't post:
mid = beg/2 + end/2
This solves every problem with overflow and underflow, but introduces a new problem: precision loss. Working with integer values, division by 2 can give a result that is off by 0.5; adding the two results together means that mid can be off by 1:
mid = 3/2 + 5/2; // mid is 3, instead of the expected 4
Working with floating point values has other precision problems.
Getting back to the problem at hand, binary search: it's easy to see that beg and end are unsigned values, so the second solution will always give the correct result.
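The overflow is easy to demonstrate with 32-bit unsigned indices (the values below are invented for the demonstration; C++20 added std::midpoint for exactly this reason):

#include <cstdint>
#include <iostream>

int main() {
    std::uint32_t beg = 3'000'000'000u, end = 4'000'000'000u;
    std::uint32_t bad  = (beg + end) / 2;        // beg+end wraps modulo 2^32
    std::uint32_t good = beg + (end - beg) / 2;  // intermediates never exceed end
    std::cout << bad  << "\n";   // 1352516352: far outside [beg, end]
    std::cout << good << "\n";   // 3500000000: the expected midpoint
}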
The answer is in the book:
"Because the iterator returned from end does not denote an element, it may not be incremented or dereferenced."
Graphically it makes sense as an asymmetric range,
[begin, off-the-end)
or half-open range.
From Accelerated C++, page 28, by Koenig.

Is it defined to provide an empty range to C++ standard algorithms?

Following on from my previous question, can we prove that the standard allows us to pass an empty range to a standard algorithm?
Paragraph 24.1/7 defines an "empty range" as the range [i,i) (where i is valid), and i would appear to be "reachable" from itself, but I'm not sure that this qualifies as a proof.
In particular, we run into trouble when looking at the sorting functions. For example, std::sort:
Complexity: O(N log(N)) (where N == last - first) comparisons
Since log(0) is generally considered to be undefined, and I don't know what 0*undefined is, could there be a problem here?
(Yes, ok, I'm being a bit pedantic. Of course no self-respecting stdlib implementation would cause a practical problem with an empty range passed to std::sort. But I'm wondering whether there's a potential hole in the standard wording here.)
Big-O notation is defined in terms of the limiting behaviour of the function. An algorithm with actual running time g(N) is O(f(N)) if and only if g(N)/f(N) is less than some positive real number C for all values N greater than some constant k (the exact values of C and k are immaterial; you just have to be able to find some C and k that make this true). (Thanks for the correction, Jesse!)
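For reference, that corrected condition spelled out in symbols (a plain LaTeX rendering of the sentence above):

g(N) \in O(f(N)) \iff \exists\, C > 0 \;\exists\, k \;\forall N > k :\; \frac{g(N)}{f(N)} < C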
You'll note that the actual number of elements is not relevant in big-O analysis. Big-O analysis says nothing about the behavior of the algorithm for small numbers of elements; therefore, it does not matter if f(N) is defined at N=0. More importantly, the actual runtime behavior is controlled by a different function g(N), which may well be defined at N=0 even if f(0) is undefined.
I don't see much room for question. In §24.1/6 we're told:
An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j.
and in §24.1/7:
Range [i, j) is valid if and only if j is reachable from i.
Since 0 is finite, [i, i) is a valid range. §24.1/7 goes on to say:
The result of the application of functions in the library to invalid ranges is undefined.
That doesn't go quite so far as to say that a valid range guarantees defined results (reasonable, since there are other requirements, such as those on the comparison function), but it certainly seems to imply that a range being empty, in itself, should not lead to UB or anything like that. In particular, the standard makes an empty range just another valid range; there's no real differentiation between empty and non-empty valid ranges, so what applies to a non-empty valid range applies equally well to an empty one.
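Concretely, then, handing an algorithm the empty valid range [i, i) is fine; a trivial check:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v;               // empty: v.begin() == v.end()
    std::sort(v.begin(), v.end());    // the empty range [i, i) is valid
    std::vector<int> w{3, 1, 2};
    std::sort(w.begin(), w.begin());  // also an empty range: w is untouched
    return w.front() == 3 ? 0 : 1;
}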
Apart from the relevant answer given by @bdonlan, note also that f(n) = n * log(n) does have a well-defined limit as n goes to zero, namely 0. This is because the logarithm diverges more slowly than any power of n. So all is well :-)