My code was running fine until I added the last line for age 5+. Does anyone know what's wrong with that line? Thank you.
data Work.File ;
set Work.File;
Female =(Sex ='F');
Male = (Sex ='M');
Age1=(age=1);
Age2=(age=2);
Age3=(age=3);
Age4=(age=4);
Age5+=(age='5+');
run;
SAS variable names have restrictions: by default they may contain only letters, digits, and underscores, so you can't have a + sign. Also, Age should be a numeric variable. You can write the last line as:
Age5Plus=(age>=5);
"Age5+"n=(age>=5);
would also work after setting
options validvarname=any;
but then you have to write the name as a name literal ("Age5+"n) every time you use that variable.
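For what it's worth, here is a minimal sketch of the name-literal route, just to show what the escaping looks like in practice. The dataset work.demo and its three sample ages are made up for illustration.

```sas
options validvarname=any;

/* Hypothetical sample data: three numeric ages */
data work.demo;
  input age;
  'Age5+'n = (age >= 5);   /* 1 when age is 5 or more, otherwise 0 */
  datalines;
3
5
7
;
run;

/* Every later reference needs the same name literal */
proc print data=work.demo;
  var age 'Age5+'n;
run;
```

A plain name such as Age5Plus avoids the option and the quoting altogether, which is usually less trouble.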
I have a variable DRG in my dataset and I would like to create a new variable with the second and third characters in the DRG string. For example, if DRG value is A23B I would like to extract 23 as a new variable.
Can someone please help me with the SAS code. Thanks a lot in advance.
Sample code
data example;
input DRG $4.;
cards;
A23B
A13A
A45C
B82B
B82C
B34A
C01A
C25B
C46B
;
run;
Thanks for the help.
I was able to work out the answer by following this webpage https://www.listendata.com/2017/03/extract-last-4-characters-digits-in-sas.html
Here is my code:
data example2;
set example;
want = substr(DRG,length(DRG)-2,2);
run;
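Note that length(DRG)-2 only lands on the second character when DRG is exactly four characters long. Since the wanted characters are always in positions 2 and 3, a simpler sketch is to give SUBSTR the start position directly. The dataset names demo and demo2, and the extra numeric variable want_num, are only there for illustration.

```sas
/* Small made-up sample so the step runs on its own */
data demo;
  input DRG $4.;
  datalines;
A23B
C01A
;
run;

data demo2;
  set demo;
  want     = substr(DRG, 2, 2);   /* characters 2 and 3, e.g. '23' */
  want_num = input(want, 2.);     /* optional: the same value as a number */
run;
```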
I have a data step where I have a few columns that need to be tied to one other column.
I have tried using multiple "from" statements and "to" statements and a couple of other permutations of that, but nothing seems to do the trick. The code looks something like this:
data analyze;
set css_email_analysis;
from = bill_account_number;
to = customer_number;
output;
from = bill_account_number;
to = email_addr;
output;
from = bill_account_number;
to = e_customer_nm;
output;
run;
I would like to see two columns showing bill accounts in the "from" column, and the other values in the "to", but instead I get a bill account and its customer number, with some "..."'s for the other values.
Issue
This is most likely because SAS has only two data types, numeric and character, and a variable's type and length are fixed the first time it is set up. Here the to variable is first given the value of customer_number. At your second to statement you attempt to give to the value of email_addr. Assuming email_addr is a character variable, two things can happen here:
Customer_number is a number - to has already been set up as a number, so SAS cannot force to to become a character, and a message like this may appear:
NOTE: Invalid numeric data, 'me@mywebsite.com' , at line 15 column 8.
to=. _ERROR_=1 _N_=1
Customer_number is a character - to has been set up as a character, but without explicitly defining its length, if it happens to be shorter than the value of email_addr then the email address will be truncated. SAS will not show an error if this happens:
Code:
data _NULL_;
to = 'hiya';
to = 'me@mydomain.com';
put to=;
run;
to=me@m
to is set with a length of 4, and SAS does not expand it to fit the new data.
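As a quick sketch of the fix for the truncation case, declaring the length before the first assignment leaves room for the longer value. The length of 20 is an arbitrary choice for this example.

```sas
data _null_;
  length to $ 20;            /* declared before the first assignment */
  to = 'hiya';
  to = 'me@mydomain.com';    /* 15 characters, fits within 20 */
  put to=;                   /* the full address is written to the log */
run;
```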
Detail
The thing to bear in mind here is how SAS works behind the scenes.
The data statement sets up an output location
The set statement adds the variables of the specified dataset to a space in memory called the PDV (program data vector), inheriting lengths and data types, and reads in the first observation.
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm
============================================================
010101             |            758|me@my.com |John Smith
The to statement adds another variable inheriting the characteristics of customer_number
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm|to
===============================================================
010101             |            758|me@my.com |John Smith   |758
(to is either char length 3 or a numeric)
Subsequent to statements will not alter the characteristics of the variable and SAS will continue processing
PDV (if customer_number is character = TRUNCATION):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===============================================================
010101             |            758|me@my.com |John Smith   |me@
PDV (if customer_number is numeric = DATA ERROR, to set to missing):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===============================================================
010101             |            758|me@my.com |John Smith   |.
Resolution
To resolve this issue it's probably easiest to set the length and type of to before your first to statement:
data analyze;
set css_email_analysis;
from = bill_account_number;
length to $200;
to = customer_number;
output;
...
You may get messages like this, where SAS has converted data on your behalf:
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
27:8
N.B. it's not necessary to explicitly define the length and type of from, because as far as I can see, you only ever get the values for this variable from one variable in the source dataset. You could also achieve this with a rename if you don't need to keep the bill_account_number variable:
rename bill_account_number = from;
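Putting the pieces together, a self-contained sketch might look like the following. The single input row, the variable lengths, and the explicit put() conversion are my own assumptions, not something taken from your data. The put() call is just one way to avoid the automatic conversion note in the log.

```sas
/* Made-up one-row input with the four variables from the question */
data css_email_analysis;
  length bill_account_number $ 6 email_addr $ 40 e_customer_nm $ 40;
  bill_account_number = '010101';
  customer_number     = 758;          /* assumed numeric here */
  email_addr          = 'me@my.com';
  e_customer_nm       = 'John Smith';
run;

data analyze;
  set css_email_analysis;
  length to $ 200;                    /* fix type and length up front */
  from = bill_account_number;

  to = strip(put(customer_number, best12.));   /* explicit numeric-to-character */
  output;
  to = email_addr;
  output;
  to = e_customer_nm;
  output;

  keep from to;
run;
```

Each input row then produces three output rows, all with the same from value.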
I am trying to set missing values to NULL in a SAS dataset for a numeric variable.
How can I do this?
Is missing the same as null in SAS?
If you're asking how to have the period not display for a missing value, you can use:
options missing=' ';
That, however, doesn't actually change them to null, but rather to a space. SAS must have some character to display for missing; it won't allow no character. You could also pick another character, like:
options missing=%sysfunc(byte(255));
or even
options missing="%sysfunc(byte(0))";
I don't recommend the latter, because it causes some problems when SAS tries to display it.
You can then trim out the space (using trimn() which allows zero length strings) if you are concatenating it somewhere.
Taking the question very literally, and assuming that you want to display the string NULL for any missing values - one approach is to define a custom format and use that:
proc format;
value nnull
.a-.z = 'NULL'
. = 'NULL'
._ = 'NULL'
;
run;
data _null_;
do i = .a,., ._, 1,1.11;
put i nnull.;
end;
run;
You can set a numeric variable to missing within a data step:
age=.;
To check for missing numeric values, use:
if numvar=. then do;
or use the MISSING function:
if missing(var) then do;
IS NULL and IS MISSING are used in the WHERE clause.
See: http://www.sascommunity.org/wiki/Tips:Use_IS_MISSING_and_IS_NULL_with_Numeric_or_Character_Variables
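To make the difference between these forms concrete, here is a small sketch. The dataset have and its three rows are invented for the example.

```sas
/* Made-up sample with one missing age */
data have;
  input id age;
  datalines;
1 34
2 .
3 27
;
run;

/* WHERE statement: IS MISSING (IS NULL behaves the same way) */
proc print data=have;
  where age is missing;
run;

/* PROC SQL: the usual SQL spelling */
proc sql;
  select id from have where age is null;
quit;

/* Data step: MISSING() returns 1 for missing, 0 otherwise */
data flagged;
  set have;
  age_missing = missing(age);
run;
```

MISSING() accepts both numeric and character arguments, so it saves you from remembering whether to compare against . or ' '.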
I have a messy file, where some of the columns are tab-delimited and some are comma-delimited.
My problem with the data set is reading the fields with variable lengths.
12 Stephen Cole, 33, Columbia, MO
5 Dave Anderson, 25*, Concord, OH
The first column is an ID (tab), then the name (comma), age (comma), active status (shown by an asterisk after the age), and home (tab).
The * after the age indicates that they are inactive.
All the names start at column #19, but everything after that has variable lengths and starting columns.
I want to read it into a format where I finally get:
ID  Name           Age  Active    Home
12  Stephen Cole   33   Active    Columbia, MO
5   Dave Anderson  25   Inactive  Concord, OH
Thus far I have:
data marathon;
infile 'c:/file.txt' dlm=',' pad firstobs=12;
input @3 ID 3. @19 Name $CHAR13.;
Then I get stuck on how to read the rest. I am mostly thrown with how to read the asterisk next to the age as its own column. If I had that understood, I think I can handle the rest.
You have a couple of issues. First, you need to use delimited input, specifically you need to combine comma and tab into one set of delimiters - one way is shown below. Second, you have two fields that are nontrivial; the one with the asterisk needs to be parsed afterwards (I use compress to keep specifically digits in the first line, and to keep specifically asterisks in the second line). You also need to read city/state in separate fields and combine them together (I use catx).
data want;
infile "c:\temp\test.dat" dlm='092C'x;
input
id
name :$50.
age_active $
home_city :$25.
home_st $
;
age=input(compress(age_active,,'kd'),best.);
active = ifc(compress(age_active,'*','k')='*','Inactive','Active');
home = catx(', ',home_city,home_st);
run;
Watch your lengths: I suggest reasonable ones based on my past experience, but you could easily see longer names or cities.
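If the parsing step is the confusing part, it can be tried on its own without the file. This is only a sketch: the two test strings are taken from the sample rows in the question, and I use index() here instead of a second compress() call to test for the asterisk.

```sas
data _null_;
  length age_active $ 8 active $ 8;
  do age_active = '33', '25*';
    /* 'kd' keeps only the digits, then INPUT turns them into a number */
    age = input(compress(age_active, , 'kd'), 8.);
    /* an asterisk marks the person as inactive */
    if index(age_active, '*') > 0 then active = 'Inactive';
    else active = 'Active';
    put age_active= age= active=;
  end;
run;
```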
I have two questions about the following SAS code:
%let crsnum=3;
data revenue;
set sasuser.all end=final;
where course_number=&crsnum;
total+1;
if paid='Y' then paidup+1;
if final then do;
call symput('numpaid',paidup);
call symput('numstu',total);
call symput('crsname',course_title);
end;
run;
proc print data=revenue noobs;
var student_name student_company paid;
title "Fee Status for &crsname (#&crsnum)";
footnote "Note: &numpaid Paid out of &numstu Students";
run;
First question: in line 5, it has
if paid='Y' then paidup+1;
"paidup" should be a variable here.
It seems to me that SAS sets the default initial value of "paidup" to 0. Is that true?
Second question, in the code segment of
title "Fee Status for &crsname (#&crsnum)";
How does #&crsnum work? Or what's the functionality of # here?
First question: yes, that's what SAS has done. The sum statement (paidup+1) initialises the variable to 0 and retains its value across iterations of the data step (unless the variable paidup already exists in the source data, in your case sasuser.all).
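You can see the sum statement at work with a small sketch against sashelp.class, which ships with SAS. The macro variable names numgirls and numstu are made up for this example. I use call symputx rather than call symput because symput converts a numeric argument with the BEST12. format, which leaves leading blanks in the macro variable and writes a conversion note to the log, while symputx strips the blanks.

```sas
data _null_;
  set sashelp.class end=final;
  total + 1;                      /* starts at 0 and is retained automatically */
  if sex = 'F' then girls + 1;    /* same behaviour, no RETAIN statement needed */
  if final then do;
    call symputx('numgirls', girls);
    call symputx('numstu', total);
  end;
run;

%put Note: &numgirls girls out of &numstu students;
```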
Second question: in the code you've posted, there is nothing special about the #: it will appear as a literal before the resolved value of &crsnum in the title. So if &crsname is Blah and &crsnum is 3, the title will read
Fee Status for Blah (#3)
The # can, however, affect titles when a by group is in play and it is included in the title in a particular way: see the SAS documentation for the TITLE statement, under the heading 'Inserting BY-Group Information into a Title'.
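For comparison, this is roughly what the special use of # looks like. It is a sketch using sashelp.class rather than your data, and work.class_sorted is just a name I picked for the sorted copy.

```sas
/* BY-group processing needs the data sorted by the BY variable */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

options nobyline;                 /* suppress the default BY line */
title "Students where Sex is #byval(sex)";

proc print data=work.class_sorted noobs;
  by sex;
  var name age;
run;

title;                            /* clear the title */
options byline;                   /* restore the default */
```

Here #byval(sex) is replaced by the current value of sex for each BY group, whereas (#&crsnum) in your title is just a literal # followed by a macro variable.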