I have a few million rows, where a particular columns’ values are showing as I.e. ###.## and I’d like them to show them as #####.
How can I modify this in the INFILE statement?
Thanks.
It you made the mistake of including a decimal width on in INFORMAT then that might be the cause of what you are seeing. The decimal width on an informat is for letting SAS know where the implied decimal place should be placed. You only want to do that when you know that your source strings were purposely generated without a period to mark the decimal place to save one character.
Example:
data have;
input #1 right 10. #1 wrong 10.3 ;
cards;
1.2
1234
;
Result:
Obs right wrong
1 1.2 1.200
2 1234.0 1.234
Related
I'm working in SAS EG and I'm trying to convert a column that's in character format to numeric format, EXACTLY as they appear in their character format. The numbers vary in length and some have one or two leading zeros.
If I do it one way, it gets rid of all leading zeros. Another way I tried, it adds leading zeros to the point that it's as long as the longest number in the column, e.g., a 9-digit number with one leading zero now has four leading zeros because the longest number in the column is 12 digits. (I hope this description makes sense).
I'm working in SAS EG. When I run proc contents, it tells me my existing variable is a character variable of length 26. It is blank for both 'format' and 'informat.'
I need to convert it so that a new column is a numeric variable, with length 8, and 'F12.' for 'format' and 'BEST12.' for 'informat,' as I plan to use it to match two data sets.
I created the following test data set in 'regular' SAS, but I'm not sure if fully recreates the issue I'm working on in SAS EG:
data have;
input mrn $1-12;
cards;
118283586928
003875807
038087875
0385709873
0038576830
;
run;
As you can see, I have one number that's 12 digits long (no leading zeros); two that are 9 digits (with one or two leading zeros); and two that are 10 digits (with one or two leading zeros).
Any help would be greatly appreciated.
Thanks
You cannot store 26 digit strings exactly as a number in SAS. SAS stores numbers as floating point values. You can use the CONSTANT() function to see the end of the contiguous integers that can be stored exactly.
73 data _null_;
74 x=constant('exactint');
75 put x= comma30.;
76 run;
x=9,007,199,254,740,992
So if you actually have values longer than 15 digits in the character variable you will not be able to convert them to numbers.
But if they are only 12 digits long then just convert the strings into numbers and compare the numbers.
proc sql;
create table want as
select *
from a, b
where a.mrn = input(b.mrn_string,32.)
;
quit;
It's not possible to have different formats in the same column in SAS. The only way to keep them looking exactly as they do while in the same column is to keep them as text. If you need to do calculations on them I'd suggest just creating a 2nd column with their numeric values.
Leading zeros can be added to numbers using the z. format.
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000205244.htm
I have 2 datasets which I am comparing. I have taken difference between each column in the two datasets. However SAS is returning these differences upto 15-16 decimal places. How can I limit the output to 8 decimal places.
For example I have column A in dataset 1 and Column A in dataset 2. I have created a new column newA which is data 1 A- data 2 A. The result is coming as 0.0009876543210987654. I want to see the out till 0.00098765 i.e till 8 decimal places.
Use the ROUND function, ROUND(DIFFVAR,10e-8), or format the difference variable 10.8.
Or use Proc COMPARE and the FUZZ option.
I have an input file with a lot of dollar amounts given like this:
$433.5B $41.1B $331.1B $407.4B
$110.8B $19B $2,265.8B $170.1B
where the 'B' character stands for "billions". I do not have other suffixes like M or k. I need to read in this file using an INPUT statement in SAS inside a DATA step, and these figures should be numerics. There are several challenges to overcome, as well as a couple of features of the data to note:
There are dollar signs everywhere.
Some of the numbers have decimal points, and some don't, so we're dealing with variable-length data.
There are commas inside the numbers, such as $2,265.8B.
The most pesky aspect of this data are the B's after each amount.
The B's are always in the same columns.
What informat should I use to read in this numerical data?
I thought of using something along the lines of :DOLLAR4.1, like this:
Data bigcompanies;
Infile 'path\bigcompanies.dat' MISSOVER;
Input (sales profits assets market_value) (:DOLLAR4.1);
Run;
but it gives me nothing (as in, I get periods for those numbers). I don't know how to handle the B, which is, I think, the crux of the problem. The SAS documentation on the DOLLAR informat is rather sparse, unfortunately.
Many thanks for your help!
If the data is in fixed columns then just skip the columns where the B appears.
data test;
input sales dollar6. +1 profits dollar8. +1 assets dollar9. +1 market_value dollar9. +1 ;
*---+---10----+---20----+---30----+---40 ;
cards;
$433.5B $41.1B $331.1B $407.4B
$110.8B $19B $2,265.8B $170.1B
;
proc print;
run;
Results
market_
Obs sales profits assets value
1 433.5 41.1 331.1 407.4
2 110.8 19.0 2265.8 170.1
Note that you normally never want to add a decimal part to an informat. That is telling SAS where to place the decimal point when it does not appear in the source text. So "integers" will be divided by that power of 10.
I have a column which range from 0 to 1. Most of the values are decimals but some values are exactly 1. When I format the column to 10.4 for example, the 1s get converted to 0.0001 rather than 1.0000! Why is this happening???
INFORMAT
probability_of_default 7.4
....
;
FORMAT
probability_of_default 7.4
....
;
INPUT
probability_of_default
;
Sounds like you what you did was read the data using an INFORMAT of 10.4, instead of displaying the data using a FORMAT of 10.4. If you specify the decimal part of an informat then you are telling SAS that when there is no decimal point in the text that it is reading to assume there is one at d characters before the end. So you told SAS to divide the integers by 10**4. Instead just use an informat without any decimal part, like 10., instead of 10.4.
Can anyone tell me why comma9.2 is not working in my sas codes?
data have;
input x $16.;
y = input(x, comma9.2);
z = input(x, comma9.);
put x= y= z= ;
cards;
1,740.32
5200
520
52
7,425
9,000.00
36,000.00
;
run;
To expand on Reeza's answer:
Informat decimal places do not quite work the way Format decimal places do. In almost all cases, you will not want to or need to specify the d in the informat. Comma9. is almost always correct, no matter how many decimal places you expect - even if you expect always two.
The only use informat decimal places serve is when you have a number like 12345600, which has no decimal in it, but it ought to (the last two zeros are after the decimal).
data _null_;
input numval 8.2;
put numval=;
datalines;
12345600
12345605
99999989
1857.145
;;;;
run;
This was something that was common once upon a time in the age of punch cards, particularly for accounting; since everything was in dollars and cents, you could save a column by leaving out the decimal, and just read everything in with two decimals. It is no longer common in most fields (at least in my experience), but SAS is always backwards compatible.
SAS will ignore the .d specification if it encounters a decimal point in the data (and will then use the location of that decimal to read in the value correctly), but if there are no decimal points in the data it may read it in incorrectly if you specify the .d. Notice in my example the final row has a decimal point followed by three decimal places, and is read in correctly.
You can read SAS Documentation for more information.
Comma9.2 assumes that values will always have 2 decimal places.