I have a problem with numeric and character values.
I did proc contents, so I have variable Poids as characters.
I want to use the following, but it does not change to numeric . best32. is used as it is demanded in problem. Do I do any mistake?
data X;
set Y;
Poids=input(Poids,best32.);
run;
Okay, I found the problem. I cannot have same variable defined as both character and numeric. To fix this problem, I have to rename initial variable in dataset options as shown below and then drop the renamed variable.
data X( drop =Poids_char;
set Y(rename=(Poids=Poids_char));
Poids=input(Poids_char,best32.);
run;
Related
I am working with data in SAS environment. I'm trying to replace missing values with blanks for an entire table. Missing values can be found in columns that are both character and numeric.
I tried using the following:
RSUBMIT; proc stdize data=Final_note out=final_note_blanks reponly missing=''; run; quit; ENDRSUBMIT;
but since some of the columns are operating on numeric data types, this won't work because '' is a character. Below is the error message I received:
Syntax error, expecting one of the following: a numeric constant, a datetime constant, ABW, AGK, AHUBER, AWAVE, EUCLEN, IQR, LEAST, MAD, MAXABS, MEAN, MEDIAN, MIDRANGE, RANGE, SPACING, STD, SUM, USTD. The symbol is not recognized and will be ignored
You cannot replace a missing numeric value with a blank. This can only be done with character values. However, you can represent missing values as blank for reporting purposes.
options missing = '';
This only changes the way it is presented. SAS still sees it as a missing numeric value.
I obtain an error in SAS while creating a new field in a dataset recalling a numeric macro variable. Here there's an example.
data input;
input cutoff_1 cutoff_2;
datalines;
30 50
;
data db;
input outstanding;
datalines;
1000.34
2000.45
3000.90
5000.98
8000.02
;
data _null_;
set input;
call symput("perc1",cutoff_1);
call symput("perc2",cutoff_2);
run;
proc univariate data=db noprint;
var outstanding;
output out=test
pctlpts = &perc1. &perc2.
pctlpre = P_;
run;
data test2;
set test;
p_&perc1._round=round(P_&perc1.,1);
p_&perc2._round=round(P_&perc2.,1);
run;
From the log it seems that the macros &perc. are solved, but that it's not possible to use those results (30, 50) to name a new variable in the dataset test2. What am I missing?
When you ask normal SAS code to use a numeric value in a place where a character value is required (both arguments to CALL SYMPUT() require character values) then SAS will convert the number into a string using the BEST12. If the value does not require all 12 characters it will be right aligned. So you created macro variables with leading spaces. The leading spaces make no difference for generating pctlpts= option values as extra spaces will not matter there. But having the leading spaces mean you are generating code like:
p_ 30_round=round(P_ 30,1);
You should use the modern (probably over 20 years old) CALL SYMPUTX() function instead. That function will remove leading/trailing spaces from the second argument when creating the macro variable. It also accepts a numeric value as the second argument converting the number into a character string using a format more like BEST32 instead of BEST12.
call symputx("perc1",cutoff_1);
call symputx("perc2",cutoff_2);
The only cases where you should ever use the ancient CALL SYMPUT() function is when you actually need to create macro variables that contain leading and/or trailing spaces.
Other solutions are to remove the spaces in the function call
call symput("perc1",strip(put(cutoff_1,best32.)));
or after generating the macro variables.
%let perc1=&perc1;
One possible solution is to use call symputx and not call symput. With call symputx the variables created are globally defined.
I'm trying to determine how SAS is reading the length statement and then the informat statement. I could be misunderstanding, but I'm under the impression that the informat statement for numeric variables worked like this:
informat number 5.;
This would give the variable number the informat 5, allowing 5 numbers to fill it. E.G. 12345
However, when I run the below program, I have a number that has 9 digits, 987654321, with the appropriate length to fit the digits, 6, which will represent all numbers up to 137,438,953,472
Q: is length statement 'overriding' the informat statement and allowing all 9 digits to fill the variable number? How are all 9 digits able to fit in the variable number with an informat of 5.?
data tst;
input number;
length number 6;
informat number 5.;
datalines;
987654321
;
run;
proc print data=tst;
run;
Based on this SAS documentation:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000199348.htm
w specifies the width of the input field. Range: 1-32
It would seem that the informat w.d would work as I first described and not allow all 9 digits to fill number
Because you are using list mode input. In that situation SAS reads the next word, however long it is. Essentially in list mode input (including when using the : modifier before an informat specified in the input statement) the width on a informat is ignored.
Other than for creating metadata in the SAS dataset there is not much value in attaching informats like 5. or $10. to variables.
SAS does not need them to understand how to convert text into values, unlike informats like date..
In list mode it ignores the width part.
And in formatted input, where the width matters, you have to specify the informat in the INPUT statement itself.
First off: length is not overriding, or having any impact on, the informat or the read-in. length solely describes how many bytes are used to store the number, nothing more.
For numeric variables, informats don't work quite the intuitive way. I'm not sure why - but they don't.
See this quotation from the list input documentation:
For a character variable, this format modifier reads the value from the next non-blank column until the pointer reaches the next blank column, the defined length of the variable, or the end of the data line, whichever comes first. For a numeric variable, this format modifier reads the value from the next non-blank column until the pointer reaches the next blank column or the end of the data line, whichever comes first.
They do listen to the informat to some extent - add a .2 there and you'll get a forced decimal - but they don't listen to it as to how long of a value to read in. I'm not sure why; it seems intuitive that they should, but they don't.
Here's it with character variables - they respect the length but also ignore the informat:
data tst;
length number $9;
informat number $5.;
input number;
datalines;
987654321
;
run;
proc print data=tst;
run;
Though you do need to put the informat before the input statement (and the length for numeric variables).
More detail is available on the documentation page for INFORMAT:
How SAS Treats Variables When You Assign Informats with the INFORMAT Statement
Informats that are associated with variables by using the INFORMAT statement behave like informats that are used with modified list input. SAS reads the variables by using the scanning feature of list input, but applies the informat.
In modified list input, SAS
does not use the value of w in an informat to specify column positions or input field widths in an external file
uses the value of w in an informat to specify the length of previously undefined character variables
ignores the value of w in numeric informats
uses the value of d in an informat in the same way it usually does for numeric informats
treats blanks that are embedded as input data as delimiters unless you change their status with a DLM= or DLMSTR= option specification in an INFILE statement.
That is much more explicit about the fact that SAS ignores the value of w.
The length of a variable defines the amount of space the value occupies when stored to disk. NOTE: During a running DATA step all numerics are double precision, the truncation to a length < 8 only occurs during output media.
The informat is a separate concept from the length. Informat defines how incoming value representations are to be interpreted for storage as a SAS numeric value. Incoming value representations would be what ever text has to be processed; be it a INPUT statement reading a file, a VIEWTABLE field edit processing a typed in value, an EG grid cell edit, etc...
The format is similarly separate concept that defines how SAS renders a numeric value for output; be it a PUT statement, a VIEWTABLE row render, a placement in a PROCs output, an EG grid cell, etc...
Explanation
Now that that is out of the way, The informat is honored when explicitly stated in an INPUT statement:
data _null_;
attrib number length=6 informat=5.;
input number 5.;
put 'NOTE: ' number=;
datalines;
987654321
run;
===== LOG =====
NOTE: number=98765
And, as you question, the variables associated informat is not applied an explicit numeric informat is not stated
data _null_;
attrib number length=6 informat=5.;
input number;
put 'NOTE: ' number=;
datalines;
987654321
run;
===== LOG =====
NOTE: number=987654321
So the first is LIST input with format specified and the second is a simple LIST input (because no format is specified).
Simple list input will accept some absurdly large data, and the resultant value, while not tail-end precise, will be at the correct exponential level.
data _null_;
attrib number length=6 informat=5.;
input number;
put 'NOTE: ' number= ;
datalines;
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
run;
===== LOG =====
NOTE: number=1.2345679E89
What do the docs for INPUT Statement, List say ? Certainly nothing about using the variables declared informat when none indicated
Simple List Input
Simple list input places several restrictions on the type of data that
the INPUT statement can read:
• By default, at least one blank must separate the input values. Use
the DLM= or DLMSTR= option or the DSD option in the INFILE statement
to specify a delimiter other than a blank.
• Represent each missing value with a period, not a blank, or two
adjacent delimiters.
• Character input values cannot be longer than 8 bytes unless the
variable is given a longer length in an earlier LENGTH, ATTRIB, or
INFORMAT statement.
• Character values cannot contain embedded blanks unless you change
the delimiter.
• Data must be in standard numeric or character format. (footnote 1)
FOOTNOTE 1: See SAS Language Reference: Concepts for the information about standard and nonstandard data values. (my LOL)
The concepts for "SAS Variable Attributes" states
informat
refers to the instructions that SAS uses when reading data values. If
no informat is specified, the default informat is w.d for a numeric
variable, and $w. for a character variable. You can assign SAS
informats to a variable in the INFORMAT or ATTRIB statement. You can
use the FORMAT procedure to create your own informat for a variable.
(my bold)
Apparently there is no explicit default such as 32. or best32. because values with more than 32 digits will be inputted without error.
So does the documentation explain things ? Yea, well, sorta. What are the take aways:
The human intuition of a numeric variable inheriting its informat during simple list input does not align with the actual implemented behavior.
Tectonic amounts of existing SAS code means a change to implement this intuition is highly unlikely
Simple statements can involve a lot of concepts with wide ranging documentation
Possible change is that the documentation will be updated to be more explicit about the simple list input caveats
I'm creating a text file with SAS and I'm using a macro variable with a date in my text file's name to make it distinct from other similar files.
The problem I'm experiencing:
SAS is adding two unwanted spaces in the middle of the file name. The unwanted spaces are placed directly before the text generated by my macro variable
I'm certain this has everything to do with my macro variable being used, but on its own, the variable doesn't contain any spaces. Below is my code:
proc format;
picture dateFormat
other = '%Y%0m%0d%0H%0M' (datatype=datetime);
run;
data _null_;
dateTime=datetime();
call symput('dateTime', put(dateTime,dateFormat.));
run;
%LET FILE = text_text_abc_&dateTime..txt;
filename out "/location/here/&FILE" termstr=crlf;
data _null_; set flatfile;
/*file content is created in here*/
run;
The exported file name will look like this:
NOTE: The file OUT is:
Filename=/location/here/text_text_abc_ 201702010855.txt
If it helps, I'm using SAS E-Guide 7.1.
Any help is appreciated! Thanks, all!
You need to assign an appropriate default length to your picture format. SAS is applying a default default length of 14 but you need 12, e.g.
proc format;
picture dateFormat (default=12)
other = '%Y%0m%0d%0H%0M' (datatype=datetime);
run;
Use call symputx() instead of call symput(), then SAS will automatically strip the leading and trailing blanks from the value written to the macro variable. You should really only use call symput() in the rare cases where you want the macro variable value to have leading or trailing blanks.
Run this little program to see the difference.
data _null_;
str=' XX ';
call symput('var1',str);
call symputX('var2',str);
run;
%put |&var1|;
%put |&var2|;
Do we have any alternative for like operator(sql) in SAS datastep?
I am using below code for my requirement. but it is not working.
IF var1 ne : 'ABC' then new_var=XYZ;
Please anyone suggest what is wrong in this or suggest to me what the correct usage is for this situation.
Thanks,
In datastep, 'if' could be used with 'index/find/findw', but if you want to use 'like', you must use 'where' and 'like' together.
data want;
set sashelp.class;
where name like 'A%';
run;
You can use the find function,e.g.:
data want;
set sashelp.class;
if find(name,'e') then new_var='Y';
run;
The colon operator as you've used it only compares values that begin with the quoted string 'ABC'. Essentially SAS compares the 2 values, truncated to the smallest length of the 2. So if all the values in var1 are more than 3 characters, then it will truncate the values to 3 characters before comparing with 'ABC'.
It therefore differs from the like function in sql, which is used in conjunction with the % wildcard operator to determine whether to look at the beginning, end, or anywhere in the string.
To replicate like, you need to use a function such as find as recommended by #Amir, or index which is also commonly used in this situation.