Subtracting numbers, result should be 0, but getting some strange exponential numbers - sas

I'm pretty new here...
I have a (hopefully simple) problem. I am taking a column (a) and subtracting columns (x, y, z) from it, and the result SHOULD be 0.
Some results are 0 as expected, but for others I get tiny values in scientific notation where I expect 0... Any idea what could be causing this, or how to fix it? floor?...
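The related answers below point at the cause: binary floating point cannot represent most decimal fractions exactly, so a difference that is mathematically 0 comes out as a tiny residue printed in scientific notation. The effect is not specific to SAS; here is a minimal C++ illustration of the problem and of a tolerance-based cleanup (the 1e-9 threshold is just an example):

    #include <cmath>
    #include <cstdio>

    int main() {
        double x = 0.1, y = 0.2;
        double a = 0.3;             // a "should" equal x + y exactly
        double diff = x + y - a;    // mathematically 0
        std::printf("%g\n", diff);  // prints about 5.55112e-17, not 0

        // Treat anything below a tolerance as zero (or round to the
        // number of decimals the data actually carries).
        double cleaned = (std::fabs(diff) < 1e-9) ? 0.0 : diff;
        std::printf("%g\n", cleaned);   // prints 0
        return 0;
    }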

Related

An efficient way to calculate extremely large powers of 2

I am solving a problem which requires me to calculate the sum of squares of all possible subsets of a set. I am required to return this sum, modulo 10^9+7
I have understood the logic. I just need to sum the squares and multiply the result by 2^(N-1), where N is the size of the set.
But the issue is that N can be as big as 10^5.
And for this, I am getting an integer overflow.
I looked into fast modular exponentiation, but still, where would I store something as huge as 2^100000?
Can I use the modulo as I calculate the power of 2, to keep the number down? Wouldn't that change the final value?
If anyone can tell me how to get it or what to read into, it would be really helpful.
If you take some value modulo 2^something_big, it just means that you don't have to output bits beyond something_big. For instance x % power(2,10) == x % (1<<10) == x & ((1<<10) - 1) == x & 1023.
So in your case, the problem is computing the actual value before the modulo while keeping in mind that you only need 99999 bits. All higher bits are to be dropped (and should not influence the result if I understand your premise correctly).
Btw. storing 99999 bits is doable. It's just 13kB.
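To answer the "can I use the modulo as I calculate the power of 2?" part of the question directly: yes. Since (a*b) mod p == ((a mod p) * (b mod p)) mod p, reducing after every multiplication never changes the final value mod 10^9+7, so nothing larger than 64 bits ever needs to be stored. A minimal C++ sketch of fast modular exponentiation (the function name is mine):

    #include <cstdint>
    #include <cstdio>

    const std::uint64_t MOD = 1000000007;   // 10^9 + 7

    // (base^exp) % MOD in O(log exp) steps, reducing after every multiply
    // so intermediate products stay well below 2^64.
    std::uint64_t powMod(std::uint64_t base, std::uint64_t exp) {
        std::uint64_t result = 1;
        base %= MOD;
        while (exp > 0) {
            if (exp & 1)                      // this bit of the exponent is set
                result = (result * base) % MOD;
            base = (base * base) % MOD;       // square for the next bit
            exp >>= 1;
        }
        return result;
    }

    int main() {
        // e.g. the 2^(N-1) factor for N = 100000:
        std::printf("%llu\n", (unsigned long long) powMod(2, 99999));
        return 0;
    }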

Adding large numbers returns strange, large numbers

I am trying to do some calculations in Fortran that look like:
    large number (order 1E40) - large number (order 1E40)
I should get back zero. Most of the time it works, but in a couple of cases I'm getting weird numbers. One answer Fortran gave me was -1E20. Another weird answer I got was 32768, which is 2^15, oddly enough.
Does anyone have any clue as to why this is happening?
It's hard to tell without actual code, but...
This is only to be expected if the numbers are sufficiently similar. While 1e20 is pretty large compared to 1 or 2, it is pretty small compared to 1e40.
In fact, even with double precision, you only have 15-17 digits of precision. Considering that, the values you get are below the accuracy possible with numbers in the range of 1e40.
What you see is numerical noise.
[ Another possibility, of course, is that you are trying to do this in single precision. That's not possible for numbers of order 1E40 (the maximum exponent is ~38), and then anything might happen. ]
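To put numbers on that: a double has a 53-bit significand, so near 1E40 consecutive representable values are roughly 1.2E24 apart, and any difference smaller than that gap is pure rounding noise. The question is Fortran, but the binary64 format is the same; a small C++ illustration:

    #include <cmath>
    #include <cstdio>

    int main() {
        double a = 1.0e40;
        // One ulp near 1e40: the gap to the next representable double.
        double gap = std::nextafter(a, 2.0e40) - a;
        std::printf("spacing near 1e40: %g\n", gap);          // about 1.2e24

        // Anything much smaller than that gap is simply absorbed:
        double b = a + 1.0e20;                                 // 1e20 << one ulp, so b == a
        std::printf("(1e40 + 1e20) - 1e40 = %g\n", b - a);     // prints 0
        return 0;
    }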

Division by Multiplication and Shifting

Why is it that when you use the multiplication/shift method of division (for instance, multiply by 2^32/10, then shift right by 32) on negative numbers, you get the expected result minus one?
For instance, if you do 99/10 you get 9, as expected, but if you do -99 / 10 you get -10.
I verified that this is indeed the case (I did this manually with bits) but I can't understand the reason behind it.
If anyone can explain why this happens in simple terms I would be thankful.
Why is it that when you use the multiplication/shift method of division (for instance, multiply by 2^32/10, then shift right by 32) on negative numbers, you get the expected result minus one?
You get the expected result, rounded down.
-99/10 is -9.9, which is -10 rounded down. The shift rounds toward negative infinity (a floor), whereas ordinary C-style integer division truncates toward zero, which is why -99 / 10 gives -9 instead of -10.
Edit: after Googling a bit more, this article mentions that you're supposed to handle negatives as a special case:
Be aware that in the debug mode the optimized code can be slower, especially if you have both negative and positive numbers and you have to handle the sign yourself.
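A small C++ sketch of the behaviour being described. The constant here is just ceil(2^32/10), chosen for illustration; real compiler output uses a carefully derived magic number and, as the quoted article says, extra sign handling. The point is that the shift produces a floor, which is exactly the "expected result minus one" for negative inputs:

    #include <cstdio>

    // Divide by 10 via multiply-and-shift; 429496730 = ceil(2^32 / 10).
    // The right shift of a negative value is assumed to be arithmetic
    // (true on mainstream compilers), so it rounds toward -infinity.
    long long div10_shift(long long n) {
        return (n * 429496730LL) >> 32;
    }

    int main() {
        std::printf("%lld\n", div10_shift(99));    // 9   = floor(9.9)
        std::printf("%lld\n", div10_shift(-99));   // -10 = floor(-9.9)
        std::printf("%lld\n", -99 / 10);           // -9, since C/C++ division truncates toward zero
        return 0;
    }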

c++ Floating point subtraction error and absolute values

The way I understand it: when subtracting two double-precision numbers in C++, each is first expressed as a significand (starting with one) times 2 to the power of the exponent. One can then lose precision if the subtracted numbers have the same exponent and share many of the same digits in the significand. To test for this in my code I wrote the following safe addition function:
    #include <cmath>      // frexp, fabs
    #include <iostream>
    using namespace std;

    double Sadd(double d1, double d2, int& report, double prec) {
        int exp1, exp2;
        // Split each value into a significand (magnitude in [0.5, 1)) and a binary exponent.
        double man1 = frexp(d1, &exp1), man2 = frexp(d2, &exp2);
        if (d1 * d2 < 0) {                       // opposite signs: the addition is really a subtraction
            if (exp1 == exp2) {                  // same magnitude scale
                if (fabs(man1 + man2) < prec) {  // significands nearly cancel (fabs, not integer abs)
                    cout << "Floating point error" << endl;
                    report = 0;
                }
            }
        }
        return d1 + d2;
    }
However, testing this I noticed something strange: the actual error (not whether the function reports an error, but the error in the computed result) seems to depend on the absolute values of the subtracted numbers, and not just on the number of equal digits in the significand...
For example, using 1e-11 as the precision prec and subtracting the following numbers:
1) 9.8989898989898 - 9.8989898989897: the function reports an error and I get the highly incorrect value 9.9475983006414e-14
2) 98989898989898 - 98989898989897: the function reports an error but I get the correct value 1
Obviously I have misunderstood something. Any ideas?
If you subtract two floating-point values that are nearly equal, the result will mostly reflect noise in the low bits. Nearly equal here is more than just same exponent and almost the same digits. For example, 1.0001 and 1.0000 are nearly equal, and subtracting them could be caught by a test like this. But 1.0000 and 0.9999 differ by exactly the same amount, and would not be caught by a test like this.
Further, this is not a safe addition function. Rather, it's a post-hoc check for a design/coding error. If you're subtracting two values that are so close together that noise matters, you've made a mistake. Fix the mistake. I'm not objecting to using something like this as a debugging aid, but please call it something that implies that that's what it is, rather than suggesting that there's something inherently dangerous about floating-point addition. Further, putting the check inside the addition function seems excessive: an assert that the two values won't cause problems, followed by a plain old floating-point addition, would probably be better. After all, most of the additions in your code won't lead to problems, and you'd better know where the problem spots are; put asserts in the problem spots.
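A minimal sketch of that "assert, then plain addition" suggestion; the name checked_add and the relative tolerance are made up for illustration:

    #include <algorithm>
    #include <cassert>
    #include <cmath>

    // Debug-time guard: trips when the operands nearly cancel; otherwise
    // it is just an ordinary addition (and only an addition with NDEBUG).
    inline double checked_add(double a, double b, double rel_tol = 1e-9) {
        assert(std::fabs(a + b) >= rel_tol * std::max(std::fabs(a), std::fabs(b))
               && "operands nearly cancel; revisit this computation");
        return a + b;
    }

    // usage at a known problem spot: double z = checked_add(x, -y);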
+1 to Pete Becker's answer.
Note that the problem of a degenerate result can also occur when exp1 != exp2,
for example if you subtract:
    1.0 - 0.99999999999999
So:
    // exp1/man1 and exp2/man2 as obtained from frexp in Sadd above; the
    // factors of 2 line the significands up when the exponents differ by one.
    bool degenerated =
           (exp1 == exp2     && fabs(man1 + man2)     < prec)
        || (exp1 == exp2 - 1 && fabs(man1 + 2 * man2) < prec)
        || (exp1 == exp2 + 1 && fabs(2 * man1 + man2) < prec);
You can omit the check for d1*d2 < 0, or keep it so you can skip the whole test when the signs are the same...
If you also want to handle loss of precision with degenerate denormalized (subnormal) floats, that will be a bit more involved (it's as if the significand had fewer bits).
It's quite easy to prove that for IEEE 754 floating-point arithmetic, if x/2 <= y <= 2x then computing x - y is an exact operation: it gives the exact result with no rounding error at all (this is the Sterbenz lemma).
And if the result of an addition or subtraction is a denormalised number, then that result is always exact as well.
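That explains the two examples in the question. In case 2 both operands are integers below 2^53, so they are stored exactly, and they are within a factor of two of each other, so the subtraction itself is exact; in case 1 the decimals are not representable, so the difference is dominated by representation error. A short C++ check (the exact residue may differ in its last digits on other platforms):

    #include <cstdio>

    int main() {
        // Case 1: neither decimal is exactly representable in binary64,
        // so the difference reflects the representation error, not 1e-13.
        double a = 9.8989898989898, b = 9.8989898989897;
        std::printf("%.17g\n", a - b);   // roughly 9.9e-14

        // Case 2: both values are integers below 2^53 (hence exact) and
        // within a factor of two of each other, so the subtraction is exact.
        double c = 98989898989898.0, d = 98989898989897.0;
        std::printf("%.17g\n", c - d);   // exactly 1
        return 0;
    }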

Fastest way to find sum of digits on big numbers

I have some big numbers (again) and I need to find out whether the sum of the digits is an even number.
I tried this: finding the sum of the digits with a while loop and then checking whether that sum % 2 equals 0. It works, but it's too slow for big numbers: I am given intervals of numbers, and if the input is 1999999 19999999999 then my program cannot complete within the time limit, which is 0.1 sec.
What to do? Is there any other, faster way to do this?
EDIT: The input 1999999 19999999999 means it starts with 1999999 and checks all the numbers as described above up to 19999999999, and because we are talking about big numbers (> 2^30) my program is too slow.
You don't need to sum the digits. Think about it. The sum starts with zero, which is generally regarded as even (although you can special case this if you want).
Each even digit changes nothing. If the sum was odd, it stays odd, if it was even it stays even.
Each odd digit changes the sum from even to odd, or odd to even.
So, just count the number of odd digits. If that count is even, then the sum of all the digits is even. If that count is odd, then the sum of all the digits is odd.
Now, you only need to do this for the FIRST number in your range. What you need to do next is figure out how the evenness or oddness of the numbers change as you keep adding one.
I leave this as an exercise for the reader. Homework has to involve some work!
Hint: if you find that the sum of the digits of a given number n is odd, will the sum of the digits of the number n + 1 be odd or even?
Update: as @Mark pointed out, it is not so simple... but the anomalies appear only when n + 1 is a multiple of 10, i.e. (n + 1) % 10 == 0; then the parity does not change. However, among those cases every 10th is again an exception where the parity does change after all (e.g. 199 -> 200). And so on... basically, depending on how many trailing 9s n has, you can decide whether or not the parity changes between n and n + 1. I admit it is a bit tedious to calculate, but I am still sure it is faster than adding up all those digits...
Here is a hint that may work: you don't need to sum the digits, you just need to know whether the result will be odd or even. If you start with the assumption that your total is even, even digits have no effect and odd digits toggle it (i.e. an odd number of odd digits makes the sum odd).
Depending on the language there may be a faster way to perform the calculation without adding.
Also remember -- a number is odd or even based on its last binary digit.
Example:
In ASM you could XOR the low order bit to get the correct result
In FORTH this would not work so well...
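Putting the "count the odd digits" idea from the answers above into a minimal C++ sketch for a single number (the function name is mine; the incremental n -> n+1 optimisation is still left as the exercise the first answer intends):

    #include <cstdint>
    #include <cstdio>

    // True if the digit sum of n is even -- no summing needed, only the
    // parity of the count of odd digits matters.
    bool digitSumIsEven(std::uint64_t n) {
        int oddDigits = 0;
        do {
            oddDigits += (n % 10) & 1;   // 1 for an odd digit, 0 for an even one
            n /= 10;
        } while (n != 0);
        return (oddDigits & 1) == 0;     // even count of odd digits => even sum
    }

    int main() {
        std::printf("%d\n", digitSumIsEven(1999999ULL));      // digit sum 55 -> prints 0
        std::printf("%d\n", digitSumIsEven(19999999999ULL));  // digit sum 91 -> prints 0
        return 0;
    }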