I have a data set with two numeric values that are calculated by different systems with different precision settings, so they round differently.
data test;
a = 10;
b = 11;
run;
Basically, a and b started out as nearly the same floating-point value, but due to rounding differences they ended up with different values.
I need a PROC SQL query that treats values like these as the same (i.e., a tolerance of +/- 1).
So I need this query to return the row:
proc sql;
select * from test where a = b;
quit;
This is ugly, but assuming you are saying that any two values within 1 of each other should be treated as the same value, you could do something like:
where max(a,b) - min(a,b) le 1;
This assumes that there are no missing values. If you have missing values you can use something like:
where max(sum(0,a),sum(0,b)) - min(sum(0,a),sum(0,b)) le 1;
I noticed that Stata computes slightly different sums depending on the level of aggregation of the summands.
To use an example, I have 4 variables (Var1, Var2, Var3, Var4).
Var1 Var2 Var3 Var4
420966 10804428 21982560 1055822272
207381 20133238 69127000 580531008
217297.6 7946694.5 23631250 554597952
327553.2 7505444 10898800 261170592
119776.4 715082.75 607820.3125 414926752
3758613 2533234.5 225734784 88380432
First, I estimate the sum of all 4 variables:
gen sumVars1234 = Var1 + Var2 + Var3 + Var4
// this calculates the same sum as `egen rowtotal`
Then I estimate the sum of Vars 1 and 2, and Vars 3 and 4, separately:
gen sumVars12 = Var1 + Var2
gen sumVars34 = Var3 + Var4
When I add together sumVars12 and sumVars34, this generates sumVars12_34:
gen sumVars12_34 = sumVars12 + sumVars34
gen dif = sumVars12_34 - sumVars1234 // I calculate difference between both sums
However, sumVars12_34 does NOT equal sumVars1234 and I don't understand why.
sumVars12 sumVars34 sumVars12_34 sumVars1234 dif
11225394 1077804800 1089030144 1089030272 -128
20340618 649657984 669998592 669998656 -64
8163992 578229184 586393152 586393216 -64
7832997 272069376 279902368 279902400 -32
834859.125 415534560 416369408 416369440 -32
6291848 314115200 320407040 320407072 -32
I know these differences are very small, and I'm sure there's a simple explanation, but I'm not sure what it is! Any insight would be very much appreciated. Thanks!
It's floating-point rounding, and the specific culprit is the storage type. By default Stata stores variables created with gen as float (single precision, about 7 significant digits), so each intermediate sum you save is rounded to float before it is used in the next addition. Grouping the summands differently means the rounding happens at different points. At values around 10^9 the gap between adjacent floats is 128, which is why your differences come out as -128, -64, and -32 depending on the magnitude of the row. Generating the variables as double (gen double sumVars1234 = ...) makes the discrepancy disappear. If you want to replicate the calculation in Excel, first add .0 to your whole integers: select the data range, right-click, choose Format Cells-->Number, specify 1 for Decimal Places, and then do your summing.
Good day,
I had an issue where I was writing some numbers to a database; they should have had the value 0.1 in SAS, but for some bizarre reason they appeared as 0.09 in the SQL database. When I manually checked the dataset, it showed 0.10 in format 12.2.
So what I do is check if the values are actually 0.1 or somewhat below this:
data _checking;
set publish_data;
if value < 0.1;
dummy = value*10000000;
run;
It appeared that a number of observations fulfilled the first condition. OK... that explains why the values come out as 0.09. A rounding issue.
However, all the dummy values come out as integers. I tried multipliers of 10, 100, 1k, and 10k; all results appear to come out as integers (1, 10, 100, ...).
Next step I try:
data _checking2;
set _checking;
if dummy<10; /* depending on the factor used */
run;
This is consistent. Dummy retains the value 'a little below the value shown'.
I solved the issue by using round(value, .1).
Questions:
How can I observe the actual value stored in the dataset? (Especially in the 'a little below' case.)
If the first if condition is true, how can the check with dummy still show integer values? (In a computer, the epsilon has to have an actual value.)
2.b Or is this just a display issue? Or does SAS have a flag for 'value minus epsilon'?
Answer 1:
The most precise and least human way to see the actual value is to observe the underlying IEEE bytes using HEX format.
Answer 2:
The default format for those new dummy variables is BEST12., so you won't see any small offsets if they are smaller than what BEST12. will show (more precisely, epsilon < 1e-(12-log10(x))). In this case the SAS format could be considered a display issue.
If your use case requires that the 'shown' value be the actual value sent to a remote database, then you will want to use ROUND prior to populating the remote tables.
data x;
x = 1/3; output;
x = 0.1 - 1e-13; output;
format x 12.2;
run;
data y;
set x;
put x= x= HEX16.;
xhex = x;
format xhex hex16.;
array dummy dummy1-dummy13;
do _n_ = 1 to 13;
dummy(_n_) = x * 10**_n_;
end;
run;
proc print data=y;
run;
data z;
do p = 0 to 10;
do q = 1 to 15;
array z z1-z15;
z(q) = 10**p + 10**-q;
end; output;
end;
drop p q;
run;
==== LOG ====
x=0.33 x=3FD5555555555555
x=0.10 x=3FB9999999997D74
==== PRINT ====
Obs x xhex dummy1 dummy2 dummy3 dummy4 dummy5 dummy6 dummy7
1 0.33 3FD5555555555555 3.33333 33.3333 333.333 3333.33 33333.33 333333.33 3333333.33
2 0.10 3FB9999999997D74 1.00000 10.0000 100.000 1000.00 10000.00 100000.00 1000000.00
Obs dummy8 dummy9 dummy10 dummy11 dummy12 dummy13
1 33333333.33 333333333.33 3333333333.3 33333333333 333333333333 3.3333333E12
2 10000000.00 100000000.00 1000000000.0 10000000000 100000000000 999999999999
You can try a different format, e.g. 32.31 or BEST32.
Compute 0.1 - value and look at the result. Again, use a format with a lot of decimal places.
You are probably not seeing the value in the dummy variables because the epsilon is very small and the dummy is still getting rounded for display.
Try dummy=value*1e16 or higher.
Numbers in SAS are C doubles, fwiw.
Why is it displaying the values of len1 to len3 as 12 in the output?
data champ;
array len[4] len1-len4;
do i = 1 to 4;
len[i] = lengthn(len[i]);
end;
run;
Because you used a character function on a numeric variable. So SAS converted the number to a character string and reported the length of the generated string. You should have seen this note in the SAS log.
NOTE: Numeric values have been converted to character values at the
places given by:
By default SAS will use BEST12. format to convert the numbers to a string. Hence the returned value was always 12 characters long.
I want to put the number on the left-hand side of the decimal point into a new variable, taken from a float number. Example:
int var1;
float var2;
float x = 12.505; // I want 12 to go into var1 and 0.505 into var2
It's simple once you know that converting a floating-point value to an integer truncates the fractional part. So you could do:
var1 = x; // Assign 12 to var1
var2 = x - var1; // Assigns 12.505 - 12 to var2
Note that,
If the conversion is from a floating-point type to an integer type, the value is truncated (the decimal part is removed). If the result lies outside the range of representable values by the type, the conversion causes undefined behavior.
Read more about Type conversions.
You can cast the float to int, using one of the options below:
var1 = (int)x;
var1 = static_cast<int>(x);
var1 = int(x);
All these options are equivalent. You can also use implicit conversion, but it can cause compiler warnings or errors depending on your compiler settings:
var1 = x;
After this conversion you will need to calculate the fractional part:
var2 = x - var1;
Please note that direct conversion to int truncates the fractional part; that is:
(int)(12.505) == 12
(int)(12.499) == 12
(int)(-12.55) == -12
(int)(-12.49) == -12
If you need some other kind of rounding, use ceil(), floor(), or round(). These functions are declared in the <cmath> header, in the std namespace.
There are various ways.
For one way, look up the function modf() in the standard header <math.h>
Just use the automatic truncation:
float f = 12.93;
int b = f;       // 12
float p = f - b; // ≈ 0.93
The following two pieces of code produce two different outputs.
//this one gives incorrect output
cpp_dec_float_50 x = log(2);
std::cout << std::setprecision(std::numeric_limits<cpp_dec_float_50>::digits)<< x << std::endl;
The output it gives is
0.69314718055994528622676398299518041312694549560547
which is correct only up to the 15th decimal place. Had x been a double, even then we'd have got the first 15 digits correct. It seems that precision is being lost somewhere, and I don't see why it should be: cpp_dec_float_50 is supposed to have 50 digits of precision.
//this one gives correct output
cpp_dec_float_50 x = 2;
std::cout << std::setprecision(std::numeric_limits<cpp_dec_float_50>::digits)<< log(x) << std::endl;
The output it gives is
0.69314718055994530941723212145817656807550013436026
which is correct according to Wolfram Alpha.
When you do log(2), you're using the implementation of log in the standard library, which takes a double and returns a double, so the computation is carried out to double precision.
Only after that's computed (to, as you noted, a mere 15 digits of precision) is the result converted to your 50-digit extended precision number.
When you do:
cpp_dec_float_50 x=2;
/* ... */ log(x);
You're passing an extended precision number to start with, so (apparently) an extended precision overload of log is being selected, so it computes the result to the 50 digit precision you (apparently) want.
This is really just a complex version of:
float a = 1 / 2;
Here, 1 / 2 is integer division because the parameters are integers. It's only converted to a float to be stored in a after the result is computed.
C++ rules for how to compute a result do not depend on what you do with that result. So the actual calculation of log(2) is the same whether you store it in an int, a float, or a cpp_dec_float_50.
Your second bit of code is the equivalent of:
float b = 1;
float c = 2;
float a = b / c;
Now, you're calling / on a float, so you get floating-point division. C++'s rules do take into account the types of arguments and parameters. That's complex enough; trying to also take into account what you do with the result would make C++'s already overly complex rules incomprehensible to mere mortals.