Why does the BINARYw. format behave differently above width 58? (SAS)

Consider the following program and output:
data _null_;
input a;
length b $64;
do i = 1 to 64;
fmtname = cats('binary',i);
b = cats(putn(a,fmtname));
put i= b=;
end;
cards;
1
;
run;
Output (SAS 9.1.3, Windows 7 x64):
i=1 b=1
i=2 b=01
i=3 b=001
i=4 b=0001
i=5 b=00001
/*Skipped a few very similar lines*/
i=58 b=0000000000000000000000000000000000000000000000000000000001
i=59 b=11111110000000000000000000000000000000000000000000000000000
i=60 b=111111110000000000000000000000000000000000000000000000000000
i=61 b=1111111110000000000000000000000000000000000000000000000000000
i=62 b=11111111110000000000000000000000000000000000000000000000000000
i=63 b=011111111110000000000000000000000000000000000000000000000000000
i=64 b=0011111111110000000000000000000000000000000000000000000000000000
Last few lines of output from SAS 9.4 on Linux x64:
i=60 b=000000000000000000000000000000000000000000000000000000000001
i=61 b=1111111110000000000000000000000000000000000000000000000000000
i=62 b=11111111110000000000000000000000000000000000000000000000000000
i=63 b=011111111110000000000000000000000000000000000000000000000000000
i=64 b=0011111111110000000000000000000000000000000000000000000000000000
This behaviour is rather unexpected, to me at least, and doesn't seem to be documented on the help page. It agrees with the document I found here for width 64 - standard double precision - but I don't understand why it flips over at width 59.

I don't quite get the same result - mine switches at 61 - but I believe the answer is the same.
Up to some point - 58, 60, somewhere around there - SAS is showing you the fixed-point integer representation of the number. Test this with a decimal, like so:
data _null_;
a=3.14159265358979323846264338327950288419716939937510582;
length b $64;
put a= hex4.;
put a= hex8.;
put a= hex16.;
do i = 1 to 64;
fmtname = cats('binary',i);
b = cats(putn(a,fmtname));
put i= b=;
end;
run;
And you will get a somewhat surprising result: you see 000...0011 (the integer 3, i.e. pi with its fractional part dropped) for most of your rows, up through 60. The documentation doesn't explicitly mention this, but it does show it in the example (123.45 and 123 are identical in binary8.).
Then, starting at 61 (or 59 for you, I'm guessing), you see the actual representation of the number as SAS internally stores it, the 8-byte IEEE 754 double (or, arguably, how Intel internally stores it).
The BINARYw. documentation doesn't explain this well, but the HEXw. documentation does explain it pretty clearly in a tip:
If w < 16, the HEXw. format converts real binary numbers to fixed-point integers before writing them as hexadecimal characters. It also writes negative numbers in two's complement notation, and right aligns digits. If w is 16, HEXw. displays floating-point values in their hexadecimal form.
BINARYw. is doing the same, and on my machine it switches exactly where HEXw. does: HEX15. still writes the fixed-point integer, and 15 hex digits x 4 bits = 60 binary digits, so BINARY60. is the last fixed-point width and the switch comes at 61. HEXw. shows the same split: if you run the program above, hex4. and hex8. show a different result from hex16.
To be clear, the value shown at binary64. is correct, and not any sort of truncation (though 61-63, and in your example 59-60, are left-truncated).
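To make the two regimes concrete, here is a small Python sketch that mimics what the outputs above suggest BINARYw. does. It is an emulation under stated assumptions, not SAS's implementation: the cut-over width (`switch_width`, 59 or 61 in the tests above) is a parameter, non-integers are assumed to be truncated toward zero, and negatives are written in two's complement as the HEXw. tip describes. The floating-point branch keeps the rightmost w bits of the IEEE 754 double, which reproduces the left-truncation seen at widths 61 to 63.

```python
import math
import struct

def ieee754_bits(x: float) -> str:
    """Return the 64-bit IEEE 754 representation of x as a string of 0s and 1s."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(as_int, "064b")

def binary_w(x: float, w: int, switch_width: int = 61) -> str:
    """Hypothetical emulation of SAS BINARYw. (switch_width is an assumption)."""
    if w < switch_width:
        # Fixed-point branch: truncate toward zero (assumed), then keep the
        # low w bits; masking yields two's complement for negative values.
        n = math.trunc(x)
        return format(n & ((1 << w) - 1), f"0{w}b")
    # Floating-point branch: the rightmost w bits of the stored double,
    # so widths below 64 lose bits from the left.
    return ieee754_bits(x)[-w:]

for w in (5, 58, 62, 63, 64):
    print(f"i={w} b={binary_w(1.0, w)}")
print(binary_w(math.pi, 8), binary_w(123.45, 8) == binary_w(123, 8))
```

With `switch_width=59` the same function reproduces the Windows output in the question, where BINARY59. already shows seven 1s followed by the 52 zero mantissa bits.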
I did find a SAS usage note regarding this, though it's clearly out of date based on our tests:
Beginning with SAS® Version 7, the BINARYw. format was changed to be more consistent with the HEXw. format. When the HEXw. format uses a width of 16, (corresponding to 8 bytes of data), it produces a hexadecimal representation of the floating point value. The BINARYw. format changed so that widths of 57-64 produce a binary representation of the floating point value, since widths of 57-64 correspond to 8 bytes of data.
It also contains a suggestion for how to get consistent results for integers, which may be of use.
BIN_64=PUT(PUT(VALUE,S370FIB8.),$BINARY64.);
S370FIB8. is a format that writes numbers as fixed-point integer binary in IBM mainframe format. (That is, it writes the integer in big-endian byte order, which is not what you'd get natively on an Intel machine.)
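A Python sketch of why that usage-note trick gives consistent results for integers: write the integer as 8 big-endian bytes (roughly what S370FIB8. produces), then print every byte as 8 bits (the job $BINARY64. does). The helper names are hypothetical and the sketch only handles integer inputs; for comparison it also shows the little-endian byte order an Intel machine uses natively.

```python
import struct

def bytes_to_bits(raw: bytes) -> str:
    """Render each byte as 8 bits, most significant bit first (like $BINARYw.)."""
    return "".join(format(byte, "08b") for byte in raw)

def int_to_big_endian_bits(value: int) -> str:
    """8-byte signed big-endian integer (the IBM mainframe layout), as 64 bits."""
    return bytes_to_bits(struct.pack(">q", value))

print(int_to_big_endian_bits(1))               # 63 zeros, then 1
print(bytes_to_bits(struct.pack("<q", 1)))     # little-endian: the 1 lands in the first byte
print(int_to_big_endian_bits(-1) == "1" * 64)  # two's complement
```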

Related

Attempting to identify non-integer values using for loop

I am trying to identify values that are not integers in Stata. My dataset is the following:
var1 var2 var3
1 2 3
2 4 5
3 6 7
4 2 3
5 1 1
6 2 8
My code is the following:
foreach var in var1 var2 var3 {
    gen flag_`var' = 1 if format(`var') == %int
    replace flag_`var' = 0 if flag_`var' == .
}
I am getting an error message stating
unknown function format()
I also tried replacing the parentheses around format(`var') with format[`var'] but then I got an error stating format not found. Is there something wrong with the format I am using or is there a better way to identify non-integer values?
The first answer is what Stata told you: there is no format() function.
But a deeper answer is that thinking of (display) formats is the wrong way round for this question. A display format is in essence an instruction to show data in a certain way and has nothing to do with its stored value, or to be more precise the decimal equivalent of its stored value. Thus 42 displayed with format %4.3f is shown as 42.000 while 6.789 displayed with format %1.0f is shown as 7. Otherwise put, no value has an inherent format, but a display format is used to display a value, either by default or because a user specified a format. Stata is here just using the same broad ideas as say C and various C-like languages.
"Nothing to do with its stored value" is a slight exaggeration, as only numeric formats make sense for numbers and only string formats make sense for strings, but a display format has nothing to do with whether a stored value is an integer.
Further, %int is not a display format anyway. When formats are being checked for, they would be literal strings enclosed in double quotes ("").
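The same separation between a value and how it is displayed exists in Python's format specifications, which are similar in spirit to C's printf formats. A small sketch using the two examples above: the format changes the text shown, never the stored number, so a format cannot tell you whether a value is an integer.

```python
values = [42, 6.789]

# Display formats analogous to Stata's %4.3f and %1.0f.
print(format(values[0], "4.3f"))  # 42.000
print(format(values[1], "1.0f"))  # 7

# Formatting produced new strings; the stored values are untouched.
print(values)  # [42, 6.789]
```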
To show non-integers, various methods could be used, say using rounding functions such as round(), int(), floor() or ceil(). So an indicator for whether x is an integer could be
gen is_int_x = x == floor(x)
All the values in your data example are integers anyway, but I take it that you are looking for non-integers elsewhere.
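For readers outside Stata, a Python sketch of the same x == floor(x) test, applied to each variable in turn the way the question's foreach loop intended. The column-wise dictionary layout and the `is_int_flags` helper are assumptions for illustration; the extra column with non-integers is hypothetical, since the question's data are all integers.

```python
import math

# The three variables from the question, stored column-wise.
data = {
    "var1": [1, 2, 3, 4, 5, 6],
    "var2": [2, 4, 6, 2, 1, 2],
    "var3": [3, 5, 7, 3, 1, 8],
}

def is_int_flags(column):
    """1 where the value equals its floor (an integer), else 0, like x == floor(x)."""
    return [int(x == math.floor(x)) for x in column]

# One flag variable per input variable, mirroring the foreach loop.
flags = {f"flag_{name}": is_int_flags(col) for name, col in data.items()}
print(flags)
print(is_int_flags([2.0, 2.5, -1.5]))  # hypothetical non-integers: [1, 0, 0]
```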

Output is '*' when writing a real to a string

I have a real CURRENTTIME I want to convert to a string named TIMEDIR. As TIMEDIR has to change size it is allocatable. As far as I could find out, the allocation works fine. Also, I checked that CURRENTTIME has a value.
ALLOCATE(CHARACTER(LEN=1)::TIMEDIR)
WRITE(TIMEDIR, '(F1.0)') CURRENTTIME
But
WRITE(*,*) TIMEDIR
outputs *, where it should be 0 (CURRENTTIME is 0.0000000). I have no clue what the problem is.
You're writing the output as a floating-point number. Floating-point output always has a decimal point or an exponent to distinguish it from integer output. Thus the narrowest possible output of a real value is 0., i.e. 2 characters, and a format of F1.0 will always result in a "*" being printed, as the field width is insufficient for what is being written.
Ian Bush's answer says what you need to know: output for a real value using the F edit descriptor requires a field width of at least 2. I'll elaborate a bit on some other aspects.
You mention
As TIMEDIR has to change size it is allocatable
but in the code fragment we see
WRITE(TIMEDIR, '(F1.0)') CURRENTTIME
This suggests a little misunderstanding. [It may be that there's no confusion, but I'll labour the point for the benefit of any other reader coming to the question.]
When an output format looks like Fw.d with w greater than zero, the width of the output field is always w. This "always" means: whatever the value of the corresponding variable, the write statement above puts exactly one non-blank character into TIMEDIR.
Now, as in that other answer, 2 is the minimum field width for output of a real value[1]. As with all other numeric output formatting, if the field isn't wide enough for the corresponding value the field consists entirely of *s. F1.0 will always result in output *. If you want output 0. (or 0,)[2] you'll need F2.0.
Coming back to the "varying size of TIMEDIR", output format F2.0 is (possibly) sufficient for non-negative values of CURRENTTIME less than 10, but for negative values or values not less than 10 it isn't. It may well be that this is where F0.d comes in. It's only with this form of the F edit descriptor that the width of the field depends on the output value. That's probably an answer to another question, though.
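The field-width rules above can be sketched in Python. This is a simplified emulation of Fw.d output, not a Fortran implementation: `f_edit` is a hypothetical helper, it assumes the default sign mode and point decimal mode, uses Python's rounding, and always writes the leading zero (which Fortran treats as optional).

```python
def f_edit(value: float, w: int, d: int) -> str:
    """Simplified emulation of Fortran's Fw.d output edit descriptor."""
    # '#' keeps the decimal point even when d == 0, so 0.0 -> "0."
    text = format(value, f"#.{d}f")
    if w == 0:
        return text          # F0.d: the field is just as wide as the value needs
    if len(text) > w:
        return "*" * w       # field too narrow: fill it entirely with asterisks
    return text.rjust(w)     # otherwise right-justify in exactly w characters

print(repr(f_edit(0.0, 1, 0)))   # '*'
print(repr(f_edit(0.0, 2, 0)))   # '0.'
print(repr(f_edit(12.0, 2, 0)))  # '**'
print(repr(f_edit(12.0, 0, 0)))  # '12.'
```

Only the `w == 0` branch lets the length of the result depend on the value, which is why F0.d is the natural fit for a deferred-length TIMEDIR.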
Finally, as you mention
I have to find out how to make "0" out of "0."
I'll point out that you're looking at having to do some additional logic, such as mentioned elsewhere.
[1] And 2 may not be sufficient, even for a zero value: print '(SP,F2.0)', 0.
[2] The choice of 0. and 0, depends on the decimal mode: print '(DC,F2.0,DP,F2.0)', 0., 0.

Why do some formats create precision errors in SAS numeric-to-character conversion?

In a data step, if I do
strip(put(number,best32.))
I get a precision error for some numbers and not for others:
0.2804 --> "0.2804"
0.0804 --> "0.08039999999999"
I don't understand why the first number is perfectly transformed into a string and the second has a precision error. Is it because the real number is in fact 0.08039999999999 (8 bytes length) and that SAS doesn't show as much precision (in Enterprise Guide)?
I tried this instead
strip(put(number,20.10))
but then I get many unneeded zeros.
0.2804 --> "0.2804000000"
0.0804 --> "0.0804000000"
Finally, I found this does what I need:
strip(cats(number))
Is this the best option?
It is because the value actually stored in your variable is closer to 0.08039999999999 than to 0.0804. So when you ask SAS to present it using the best representation it can fit in 32 characters, it picks the former.
Try running this code.
data _null_ ;
input x ;
s = put(x,best32.);
put (x s) (=);
cards;
0.2804
0.0804
0.08039999999
;
run;
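For comparison, a Python sketch of the same effect (Python floats are IEEE 754 doubles, as SAS numerics are on Windows and Unix). It assumes Python 3.9+ for `math.nextafter`, and uses the next double below 0.0804 as a stand-in for a value produced by a calculation. Reading the text "0.0804" gives the double nearest 0.0804, which prints back as 0.0804; one step below it prints as a long run of 9s, although both agree to 15 significant digits.

```python
import math

read_in = float("0.0804")                # the double nearest 0.0804
computed = math.nextafter(read_in, 0.0)  # one double lower, standing in for a calculated value

print(repr(read_in))    # 0.0804
print(repr(computed))   # shortest string that round-trips; not "0.0804"
print(f"{read_in:.15g}", f"{computed:.15g}")  # both show 0.0804 at 15 significant digits
```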

Why does comma9.2 not work?

Can anyone tell me why comma9.2 is not working in my SAS code?
data have;
input x $16.;
y = input(x, comma9.2);
z = input(x, comma9.);
put x= y= z= ;
cards;
1,740.32
5200
520
52
7,425
9,000.00
36,000.00
;
run;
To expand on Reeza's answer:
Informat decimal places do not quite work the way format decimal places do. In almost all cases, you will neither want nor need to specify the d in an informat. COMMA9. is almost always correct, no matter how many decimal places you expect, even if you always expect two.
The only purpose informat decimal places serve is for a number like 12345600, which has no decimal point in it but ought to (the last two digits belong after the decimal point).
data _null_;
input numval 8.2;
put numval=;
datalines;
12345600
12345605
99999989
1857.145
;;;;
run;
This was something that was common once upon a time in the age of punch cards, particularly for accounting; since everything was in dollars and cents, you could save a column by leaving out the decimal point and just read everything in with two decimals. It is no longer common in most fields (at least in my experience), but SAS goes to great lengths to stay backward compatible.
SAS ignores the .d specification if it encounters a decimal point in the data (and then uses the location of that decimal point to read the value correctly), but if a value has no decimal point, specifying .d makes SAS divide it by 10 to the power d. That is what happens in your data: with comma9.2, 5200 is read as 52, 520 as 5.2, 52 as 0.52 and 7,425 as 74.25, while 1,740.32, 9,000.00 and 36,000.00 are read correctly. Notice in my example that the final row has a decimal point followed by three decimal places, and is read in correctly.
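The rule can be sketched in Python to check the values in the question. `read_implied_decimal` is a hypothetical helper, not a SAS API, and it is simplified: it only drops commas, whereas COMMAw.d also strips other characters such as dollar signs and handles parentheses for negatives.

```python
def read_implied_decimal(text: str, d: int) -> float:
    """Hypothetical emulation of how a w.d / COMMAw.d informat treats d.

    Commas are dropped. If the text has an explicit decimal point, d is
    ignored; otherwise the value is divided by 10**d.
    """
    cleaned = text.replace(",", "")
    if "." in cleaned:
        return float(cleaned)
    return int(cleaned) / 10**d

for raw in ["1,740.32", "5200", "520", "52", "7,425", "9,000.00", "36,000.00"]:
    # Columns: raw text, as read with d=2 (like comma9.2), as read with d=0 (like comma9.)
    print(raw, read_implied_decimal(raw, 2), read_implied_decimal(raw, 0))
```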
See the SAS documentation on informats for more information.
COMMA9.2 assumes that any value without a decimal point has 2 implied decimal places.

SAS Format in calculation

I am creating a new variable, AGE. The CUTOFF value is 100, and it is divided by 12, so the value is 8.3333 (repeating). But a few FRESHNESS values are 8.3333333. I have to pick the value of SEGMENT if FRESHNESS >= 100/12, but it's picking AMU where FRESHNESS is 8.3333... The format of FRESHNESS is F12.9 and CUTOFF is BEST12.
data new;
set SEGMENT_AGE;
IF Freshness< CUTOFF/12 THEN AGE=AMU;
ELSE AGE=SEGMENT;
RUN;
I tried giving CUTOFF a different format (F12.9), but it's still not working.
You're running into an issue of floating-point precision. If a number is a repeating fraction in binary, you may get two different stored values (the higher or the lower, e.g. 0.333333333333333333 or 0.3333333333333333333334) depending on how it was arrived at. For example:
1-(1/3) - (1/3) = 0.33333333333333333334
0+(1/3) = 0.33333333333333333333
So do not assume two values are precisely equal just because they look like they should be. Further, some numbers that are not repeating decimals are repeating in binary: 7/10, for example, is 0.7 in decimal but cannot be stored exactly in binary.
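A quick Python illustration of both points, since Python floats use the same IEEE 754 doubles as SAS on Windows and Unix. It uses the well-known 0.1 + 0.2 case rather than the 1/3 example above, and `decimal.Decimal` to reveal the binary value actually stored for 0.7.

```python
from decimal import Decimal

# 1/10 and 2/10 are repeating fractions in binary, so their sum is not
# the same double as the literal 0.3.
print(0.1 + 0.2 == 0.3)   # False
print(repr(0.1 + 0.2))    # 0.30000000000000004

# 0.7 looks exact in decimal, but Decimal shows the binary value actually stored.
print(Decimal(0.7))
print(Decimal(0.7) == Decimal("0.7"))  # False
```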
You should compare rounded numbers if you need to compare precisely; for example,
if round(freshness,0.001) < round(cutoff/12,0.001) ...
should result in your calculations matching your expectations.
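A Python sketch of the rounded comparison, using the FRESHNESS value and cutoff from the question. Python's `round(x, 3)` is used as an analogue of SAS's ROUND(x, 0.001), not an exact reproduction of it; `pick_age` is a hypothetical helper, and the strings "AMU" and "SEGMENT" stand in for the values of those variables.

```python
def pick_age(freshness: float, cutoff: float, amu: str, segment: str, places: int = 3) -> str:
    """Mirror of the DATA step logic, comparing values rounded to `places` decimals."""
    if round(freshness, places) < round(cutoff / 12, places):
        return amu
    return segment

cutoff = 100
freshness = 8.3333333  # the value from the question, slightly below 100/12

print(freshness < cutoff / 12)                        # True: the raw comparison picks AMU
print(pick_age(freshness, cutoff, "AMU", "SEGMENT"))  # SEGMENT
```

The number of places is a judgment call: it should be coarser than the noise in your data but finer than any real difference you care about.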