a question on the default initialized value for a SAS variable - sas

I have two questions about the following SAS code:
%let crsnum=3;
data revenue;
set sasuser.all end=final;
where course_number=&crsnum;
total+1;
if paid='Y' then paidup+1;
if final then do;
call symput('numpaid',paidup);
call symput('numstu',total);
call symput('crsname',course_title);
end;
run;
proc print data=revenue noobs;
var student_name student_company paid;
title "Fee Status for &crsname (#&crsnum)";
footnote "Note: &numpaid Paid out of &numstu Students";
run;
First question: the code contains
if paid='Y' then paidup+1;
"paidup" should be a variable here.
It seems to me that SAS sets the default initial value of "paidup" to 0. Is that true?
Second question: in the statement
title "Fee Status for &crsname (#&crsnum)";
How does #&crsnum work? What does the # do here?

First question: yes. paidup+1 is a sum statement, so SAS initialises paidup to 0 and automatically retains its value across iterations of the data step. (The exception is when the variable paidup already exists in the source data, in your case sasuser.all; its value is then reloaded from each observation that is read.)
Second question: in the code you've posted, there is nothing special about the #: it will appear as a literal before the resolved value of &crsnum in the title. So if &crsname is Blah and &crsnum is 3, the title will read
Fee Status for Blah (#3)
The # can, however, affect titles when BY-group processing is in play and it is used in a particular way (for example #BYVAL, #BYVAR or #BYLINE); see the SAS documentation under the heading 'Inserting BY-Group Information into a Title'.
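To see the sum statement's implicit initialisation and retain in isolation, here is a minimal sketch. The data set name counts and the inline data are made up for illustration; only total and paidup mirror the question.

```sas
/* Hypothetical sample data: three students, the first of whom has not paid. */
data counts;
  input paid $;
  total + 1;                       /* sum statement: total starts at 0 and is retained */
  if paid = 'Y' then paidup + 1;   /* paidup also starts at 0; incremented only for 'Y' */
  datalines;
N
Y
Y
;
run;

proc print data=counts;
run;
```

Because the first observation is unpaid, paidup is 0 rather than missing on that row. With a plain assignment such as paidup = paidup + 1; the variable would start as missing and stay missing.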

Related

Where statement is not capturing my condition correctly

I want SAS to select only the observations where the variable "rashloc_Spcfy" holds one of a few specific strings ("B", "P/G", "Peri", "Gen"), with similar conditions on two other variables. However, when I look at the results, SAS is also returning observations that my statement does not describe. Is there anything I can do to modify my code?
proc print data=k.dataset;
var rashloc_GNT rashloc_PER rashloc_Spcfy;
where ((rashloc_GNT = "GNT") OR (rashloc_PER = "PER")) OR rashloc_Spcfy in ("B", "P/G", "Peri", "Gen");
run;
I should be getting only the quoted keyterms in the variable of interest (rashloc_spcfy)
So you want to exclude cases where the third variable is some other non missing value if they meet the first two criteria?
So perhaps?
where ((rashloc_GNT = "GNT") OR (rashloc_PER = "PER"))
and not (rashloc_Spcfy not in (" ","B","P/G","Peri","Gen"))
;
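A small sketch of how that condition behaves, using a made-up data set named sample (the four rows are assumptions, not the asker's data). The double negative not (x not in (...)) is written here in its equivalent positive form, x in (...).

```sas
/* Hypothetical rows covering the interesting combinations. */
data sample;
  length rashloc_GNT rashloc_PER $3 rashloc_Spcfy $8;
  rashloc_GNT = 'GNT'; rashloc_PER = ' ';   rashloc_Spcfy = 'B';   output; /* kept */
  rashloc_GNT = 'GNT'; rashloc_PER = ' ';   rashloc_Spcfy = 'Arm'; output; /* dropped: other non-missing value */
  rashloc_GNT = ' ';   rashloc_PER = 'PER'; rashloc_Spcfy = ' ';   output; /* kept: blank is allowed */
  rashloc_GNT = ' ';   rashloc_PER = ' ';   rashloc_Spcfy = 'Gen'; output; /* dropped: neither GNT nor PER */
run;

proc print data=sample;
  var rashloc_GNT rashloc_PER rashloc_Spcfy;
  where (rashloc_GNT = "GNT" or rashloc_PER = "PER")
    and rashloc_Spcfy in (" ", "B", "P/G", "Peri", "Gen");
run;
```

If rows like the last one should also be kept, the first two conditions would need to be joined to the third with OR instead, so which operator to use depends on what the asker actually wants.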

Dummy variable numeric difference in SAS

My code was running fine until I added the last line for age 5+. Does anyone know what's wrong with that line? Thank you.
data Work.File ;
set Work.File;
Female =(Sex ='F');
Male = (Sex ='M');
Age1=(age=1);
Age2=(age=2);
Age3=(age=3);
Age4=(age=4);
Age5+=(age='5+');
run;
SAS variable names have restrictions: by default a name cannot contain a + sign. Also, Age should be a numeric variable (your other lines already compare it with numbers), so compare it with a number rather than the string '5+'. You can write the last line as:
Age5Plus=(age>=5);
"Age5+"n=(age>=5);
would also work after setting
options validvarname=any;
but then you have to write the name as a name literal every time you use that variable.
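As a quick check of the first form, here is a minimal sketch on made-up data (the data set name demo and its two rows are assumptions). A comparison in parentheses evaluates to 1 when true and 0 when false, which is what makes it usable as a dummy variable.

```sas
/* Hypothetical input: one female aged 3, one male aged 7. */
data demo;
  input Sex $ age;
  Female   = (Sex = 'F');   /* 1 for F, 0 otherwise */
  Male     = (Sex = 'M');   /* 1 for M, 0 otherwise */
  Age5Plus = (age >= 5);    /* 1 when age is 5 or more, 0 otherwise */
  datalines;
F 3
M 7
;
run;

proc print data=demo;
run;
```

Note that a missing age compares as lower than any number, so Age5Plus would be 0 for it rather than missing.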

Multiple To clauses in Data step

I have a data step where a few columns need to be tied to one other column.
I have tried using multiple "from" statements and " to" statements and a couple other permutations of that, but nothing seems to do the trick. The code looks something like this:
data analyze;
set css_email_analysis;
from = bill_account_number;
to = customer_number;
output;
from = bill_account_number;
to = email_addr;
output;
from = bill_account_number;
to = e_customer_nm;
output;
run;
I would like to see two columns showing bill accounts in the "from" column, and the other values in the "to", but instead I get a bill account and its customer number, with some "..."'s for the other values.
Issue
This is most likely because SAS has two data types, and the first time the to variable is assigned it takes its type and length from customer_number. At your second assignment you attempt to give to the value of email_addr. Assuming email_addr is a character variable, one of two things can happen here:
Customer_number is numeric: to has already been set up as numeric and SAS cannot change it to character, so to is set to missing and a note like this may appear in the log:
NOTE: Invalid numeric data, 'me#mywebsite.com' , at line 15 column 8. to=.
_ERROR_=1 _N_=1
Customer_number is character: to has been set up as character, but its length was not defined explicitly, so it takes the length of customer_number. If that is shorter than the value of email_addr, the email address is truncated. SAS does not report an error when this happens:
Code:
data _NULL_;
to = 'hiya';
to = 'me#mydomain.com';
put to=;
run;
to=me#m
to is set with a length of 4, and SAS does not expand it to fit the new data.
Detail
The thing to bear in mind here is how SAS works behind the scenes.
The data statement sets up an output location.
The set statement adds the variables of the specified dataset to a space in memory called the PDV (program data vector), inheriting their lengths and data types, and then loads the first observation into it.
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm
===================================================================
010101 | 758|me#my.com |John Smith
The first assignment to to adds another variable, which inherits the characteristics of customer_number
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |758
(to is either char length 3 or a numeric)
Subsequent assignments to to will not alter the characteristics of the variable, and SAS will continue processing
PDV (if customer_number is character = TRUNCATION):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |me#
PDV (if customer_number is numeric = DATA ERROR, to set to missing):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |.
Resolution
To resolve this issue it's probably easiest to set the length and type of to before the first assignment to it:
data analyze;
set css_email_analysis;
from = bill_account_number;
length to $200;
to = customer_number;
output;
...
You may get messages like this, where SAS has converted data on your behalf:
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
27:8
N.B. it's not necessary to explicitly define the length and type of from, because as far as I can see, you only ever get the values for this variable from one variable in the source dataset. You could also achieve this with a rename if you don't need to keep the bill_account_number variable:
rename bill_account_number = from;
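For a self-contained illustration of the resolution, here is a sketch that builds a one-row stand-in for css_email_analysis. The sample values come from the PDV tables above; treating customer_number as numeric and converting it with put() are assumptions made for the example.

```sas
/* Hypothetical one-row version of the source data. */
data css_email_analysis;
  length bill_account_number $6 email_addr $20 e_customer_nm $20;
  bill_account_number = '010101';
  customer_number     = 758;            /* assumed numeric */
  email_addr          = 'me#my.com';
  e_customer_nm       = 'John Smith';
run;

data analyze;
  set css_email_analysis;
  length to $200;                       /* fix the type and length before the first assignment */
  from = bill_account_number;
  to = strip(put(customer_number, best12.)); output;  /* explicit numeric-to-character conversion */
  to = email_addr;                           output;
  to = e_customer_nm;                        output;
  keep from to;
run;

proc print data=analyze;
run;
```

Converting explicitly with put() avoids the "Numeric values have been converted to character" note; if customer_number is already character, a plain assignment is enough.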

How to recode a qualitative variable with over 100 values into several levels as quantitative in SAS

I am working with SAS and want to recode a qualitative variable that has more than 50 distinct values, for example the states of the U.S.
In this case, I just want to reduce them to a quantitative variable with 4 or 5 levels.
I have several ideas, for example using if/else statements. The problem is that I have to write out each area name in SAS, and the code becomes very heavy.
Is there any other way to do that in SAS without redundant code, or without writing out each specific value?
Any ideas are appreciated!
Method 1:
Use IN, but you still have to list the values. You can also do it via a format, but you have to define the format first anyway.
if state in ('AL', 'AK', 'AZ' ... etc) then state_group = 1;
else if state in ( .... ) then state_group = 2;
Method 2:
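For a format, you create the format using PROC FORMAT and then apply it. A character format name starts with $, and the labels are quoted strings.
proc format;
value $state_grp_fmt
'AL', 'AK', 'AZ' = '1'
'DC', 'NC' = '2' ;
run;
And then you can use it with the PUT function. PUT returns a character value, so wrap it in INPUT if you need a numeric variable.
State_Group = input(put(state, $state_grp_fmt.), 8.);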
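Put together, the format approach might look like the sketch below. The data set states, its three rows, and the other = '9' catch-all group are assumptions added for illustration; without an OTHER range, unlisted states would pass through the format unchanged.

```sas
proc format;
  value $state_grp_fmt
    'AL', 'AK', 'AZ' = '1'
    'DC', 'NC'       = '2'
    other            = '9';   /* hypothetical catch-all group */
run;

/* Hypothetical input data. */
data states;
  input state $2.;
  datalines;
AL
NC
TX
;
run;

data grouped;
  set states;
  /* put() maps the state to a character label; input() turns that label into a number */
  state_group = input(put(state, $state_grp_fmt.), 8.);
run;

proc print data=grouped;
run;
```

The grouping still has to be listed once, but it lives in one place and can be reused in any step or procedure.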

Is it possible to filter a data step on a newly computed variable?

In a basic data step I'm creating a new variable and I need to filter the dataset based on this new variable.
data want;
set have;
newVariable = 'aaa';
*lots of computations that change newVariable ;
*if xxx then newVariable = 'bbb';
*if yyy AND not zzz then newVariable = 'ccc';
*etc.;
where newVariable ne 'aaa';
run;
ERROR: Variable newVariable is not on file WORK.have.
I usually do this in 2 steps, but I'm wondering if there is a better way.
(Of course you could always write a complex where statement based on variables present in WORK.have. But in this case the computation of newVariable is too complex, and it is more efficient to do the filter in a 2nd data step.)
I couldn't find any info on this, I apologize for the dumb question if the answer is in the documentation and I didn't find it. I'll remove the question if needed.
Thanks!
Use a subsetting if statement:
if newVariable ne 'aaa';
In general, if <condition>; is equivalent to if not(<condition>) then delete;. The delete statement tells SAS to abandon this iteration of the data step and go back to the start for the next iteration. Unless you have used an explicit output statement before your subsetting if statement, this will prevent a row from being output.
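A minimal sketch of the subsetting if in a single step follows. The input data set have, its variable x, and the condition x > 3 are stand-ins for the asker's real data and computations. The where statement fails in the original because it is applied to the input data set before observations are loaded, so it can only see variables that exist in WORK.have.

```sas
/* Hypothetical input data. */
data have;
  input x;
  datalines;
1
5
10
;
run;

data want;
  set have;
  length newVariable $3;
  newVariable = 'aaa';
  if x > 3 then newVariable = 'bbb';   /* stand-in for the real computations */
  if newVariable ne 'aaa';             /* subsetting if: only matching rows reach the implicit output */
run;

proc print data=want;
run;
```

Here the row with x = 1 keeps newVariable = 'aaa' and is dropped, so want contains only the rows with x = 5 and x = 10.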