My dataset has a column with a wide range of values in it, such as the one below:
Value
3223145.306
1.044303129
345.556033
17693.00837
8.03E-06
NaN
1.97E-04
2.29E-04
8.01E-04
7.46E-04
18345.82237
47.78282804
4.14E-06
When I read this column in SAS, the observations are read as character. Once I convert this to numeric, the observations with E-04, E-05, E-06, etc. are being converted to 1.9736273 instead of 0.00019736273.
How do I account for E-04, E-05, E-06, etc.?
Code for character to numeric:
Value=input(Value, best12.);
You have to make a NEW variable if you want it to have a different type.
The INPUT function does not care if the width used on the informat is larger than the length of the string being read, so just use the maximum width that the informat supports (32). With a width of 12, only the first 12 characters are read, so a string such as 1.9736273E-04 is cut off partway through its exponent, which would explain the result you are seeing. Also, BEST is the name of a FORMAT, not an INFORMAT. If you use it as the name of an informat, SAS will just default to using the normal numeric informat. So just go ahead and use that from the start instead of confusing format names with informat names.
The normal numeric informat can read those strings as numbers. So this code will work to create a new numeric variable named NUMBER from the existing character variable named VALUE.
number = input(VALUE,32.);
The only string in your list that will cause any issues is the string 'NaN'. SAS will not know how to translate that, so you will just get a missing value as the result, which is basically what systems that use that "not a number" symbol mean by it anyway. To prevent the notes in the log you can either test for it explicitly:
if upcase(value) not in ('NA','N/A','NAN') then number=input(value,32.);
Or just suppress the error messages by adding the ?? modifier:
number=input(value,??32.);
But then you will not get any message if there is other gibberish in the value variable.
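As a rough end-to-end sketch of the conversion, the step below reads a few of the values from the question as inline data and converts them. The dataset names HAVE and WANT are illustrative assumptions, not names from the question.

```sas
/* Sketch only: HAVE and WANT are illustrative dataset names. */
data have;
  input value $32.;          /* read each line as a character string */
datalines;
3223145.306
1.044303129
8.03E-06
NaN
1.97E-04
;

data want;
  set have;
  /* 32. is the widest plain numeric informat; it also reads E notation. */
  /* ?? suppresses the invalid-data note that 'NaN' would otherwise      */
  /* write to the log, and NUMBER is simply left missing for that row.   */
  number = input(value, ?? 32.);
  format number best16.;
run;

proc print data=want;
run;
```

Dropping the ?? while developing makes it easier to spot unexpected strings in the log before silencing them.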
I am trying to reformat my variables in SAS using the put statement and a user defined format. However, I can't seem to get it to work. I want to make the value "S0001-001" convert to "S0001-002". However, when I use this code:
put("S0001-001",$format.)
it returns "S0001-001". I double-checked my format and it is mapped correctly. I import it from Excel, convert it to a SAS table, and convert the SAS table to a SAS format.
Am I misunderstanding what the put statement is supposed to be doing?
Thanks for the help.
Assuming that you tried something like this, it should work as you intended:
proc format ;
value $format 'S0001-001' = 'S0001-002' ;
run;
data want ;
old= 'S0001-001';
new=put(old,$format.);
put (old new) (=:$quote.);
run;
Make sure that you do not have leading spaces or other invisible characters in either the variable value or the START value of your format. Similarly make sure that your hyphens are actual hyphens and not em-dash characters.
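One way to check for such characters is to print the value quoted and in hexadecimal. This is only a sketch: HAVE and OLD are assumed names for your dataset and the variable being recoded, and the hex codes mentioned in the comments assume a single-byte ASCII-based session encoding.

```sas
/* Sketch only: HAVE and OLD are assumed names. */
data _null_;
  set have;
  len = lengthn(old);        /* length without trailing blanks */
  /* $QUOTE. makes leading blanks visible inside the quotes.   */
  /* $HEX40. shows the first 20 bytes as hex codes, so an      */
  /* ASCII hyphen appears as 2D and a tab as 09; any other     */
  /* code in the hyphen position points to a look-alike dash.  */
  put old= $quote. len= old= $hex40.;
run;
```

The same check can be run against the START column of the table used to build the format.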
Most likely a silly question, but I must be overlooking something.
I have a date field in which sometimes the date is missing (.). I have to create a file against this data set, but the requirements to have this loaded into a DB2 environment are requesting that instead of native SAS null numeric value (.), they require it to be a blank string.
This should be a simple task, by first converting the variable to character, and using the appropriate format:
LAST_ATTEMPT = PUT(ATTMPT1,YYMMDDS10.);
When a proc contents is run on the data set, it confirms that this has been converted to a character variable.
The issue is that when I look at the data set, it still has the (.) for the missing values. In an attempt to convert the missing date(.) to a blank string, it then blanks out every value for the variable...
What am I missing here?
Options MISSING=' ';
This will PUT blank for missing value when you execute your assignment.
One way is to use Options MISSING=' ';, but this might have unwanted impact on other parts of your program.
Another safer way is just adding a test to the original program:
IF ATTMPT1~=. THEN LAST_ATTEMPT = PUT(ATTMPT1,YYMMDDS10.);
ELSE LAST_ATTEMPT = "";
When I imported a CSV file in Weka, it read some numeric variables as Nominal type. I would like to convert them to Numeric, but I'm not seeing any option for that in Weka.
I tried opening the .arff file in Notepad and Notepad++. I removed the list of values and changed the type to numeric.
Example:
@attribute thours {' ',18,4,48,42,56,35,40,30,14,54,24,36,20,77,25,70,0,16,34,60,64,21,32,6,84,23,31,52,28,50,66,45,12,10,33,11,22,98,8,3,65,72,9,26,15,63,5,27,51,39,105,7,2,58,43,90,68,46,44,47,112,49,91,37,1,41,104,78,96,75,74,62,71,76,89,13,38,19,29,59,92,81,55,57,53,67,80,102,100,17}
to
@attribute thours numeric
and saved the file. When I imported the file again, I got an error:
"...not recognized as an 'Arff data files' file. reason: number expected, read Token ], line 78"
Any help is greatly appreciated. Thanks.
Dixi
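I believe the reason for your error is that one or more entries of the variable "thours" are missing. In the original attribute description this shows up as the ' ' (quoted blank) value, and the data rows still contain it, which cannot be parsed as a number once the attribute is declared numeric. If those values are indeed supposed to be missing, you should change them in the data section to what Weka expects for a missing value in an ".arff" file, which is a question mark "?".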
This link provides a very detailed description of ".arff" files, and what is expected in them.
I have a long ID number (say, 12184447992012111111). When I use PROC IMPORT to read it from a CSV file, the number is shortened, with an 'E' added between the digits (1.2184448E19, with format best12. and informat best32.). Browsing here, I learned that the CSV format itself shortens the number beforehand, so it has nothing to do with SAS. So I tried copying about 5 numbers into a DATALINES statement, but the result is the same. It would be helpful if anyone could suggest which format I need to use. Using the best32. format I do not get the original number, most probably because it is formatting the already altered number; it gives me 12184447992012111872, which is not the number I want.
Because your ID variable is really an identifier rather than a "real" number, you need to read it in as a character string. The value you show as an example is too large to be stored exactly as a number: SAS stores all numerics as 8-byte floating point, which represents integers exactly only up to about 9.007E15 (2**53) on most platforms, so you are losing "precision".
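The effect can be seen directly with a small sketch like the one below. The variable names are illustrative; CONSTANT('EXACTINT') is the built-in way to ask SAS for the largest integer it stores exactly on the current platform.

```sas
/* Sketch only: variable names are illustrative. */
data _null_;
  exact   = constant('exactint');      /* largest exactly stored integer   */
  id_num  = 12184447992012111111;      /* stored as the nearest 8-byte float */
  id_char = '12184447992012111111';    /* character value keeps every digit */
  put exact=   comma24.;
  put id_num=  32.;
  put id_char=;
run;
```

The question reports 12184447992012111872 when the numeric value is displayed with a 32-wide format, which is the rounding this step illustrates; the character variable is unaffected.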
Since you mention using PROC IMPORT, copy the SAS program it generates and change the FORMAT and INFORMAT specifications from "21." and "best32." to "$32." (or whatever width matches your data).
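If you prefer to write the step by hand rather than edit the generated program, a minimal sketch might look like this. The file name ids.csv, the dataset name WANT, the single-column layout, and the header row are all assumptions for illustration.

```sas
/* Sketch only: file name, dataset name, and layout are hypothetical. */
data want;
  /* DSD treats commas as delimiters; FIRSTOBS=2 skips an assumed header row. */
  infile 'ids.csv' dsd firstobs=2 truncover;
  length id $32;             /* declaring ID as character keeps all digits */
  input id;
run;
```

Because ID is defined as character before the INPUT statement, list input reads the digits as text and no rounding takes place.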
Best of course would be if you had SAS Access to PC File formats, in which case you could format the column as "text" in Excel and let SAS read it directly.
I'm not sure about the CSV changing the value (CSV files are just plain text files), unless you are saving an Excel spreadsheet as a CSV file. If you are using Excel, just set the column to number format with no decimal places.
It might be easier to treat the column as text when importing it to SAS, unless you need to perform mathematical operations on it! If you really need to keep it as a number, the format 32. will display it with all of its digits instead of the scientific notation that best chooses, but a 20-digit value cannot be stored exactly as a SAS numeric, so the last few digits will not match the original.
There is a SAS informat for reading exponential notation - Ew.d where w is the width and d the number of decimal places. In your case, it probably won't help because you will "lose" the complete number - and the value stored in case you read with this informat will be 1.2184448 * (10^19). The only way in your case is to ensure that the program which produces the CSV file outputs it in the right way. If you are creating the data from an Excel worksheet, then format the number in the Excel worksheet to display all the digits correctly.