SAS length statement - sas

I've just realised how useful it is to reduce the length of numeric variables (dummies and integers), since it saves me both time and disk space. However, I find it convenient to put the length statement at the end of my code rather than before the "set" statement (the latter is what SAS bloggers and other experts mostly recommend).
So, is there a difference between these two ways (see the examples below)? I can't find any differences in the output, but I'm a bit worried that I might be doing something wrong. Can you please explain what the difference is (if there is one) and why you would prefer one way over the other?
Thanks in advance!
This is an example of how I use the length statement:
data b;
set a;
dummy = income > 10000;
label "dummy = Income > 10 000";
length dummy 3;
run;
But here is how the experts recommend you to do it.
data b;
length dummy 3;
set a;
dummy = income > 10000;
label "dummy = Income > 10 000";
run;

I would swear that in previous versions of SAS, you'd not be able to override the length of the variable once defined by a length statement or "inherited" from source data.
I remember some notes or warnings about "length of the variable ... was already set".
In SAS 9.3 the code:
data a;
length income dummy 8.;
income = 1234567890;
dummy = 1234567890;
output;
stop;
run;
data b;
set a;
attrib dummy length = 3 label = "dummy = Income > 10 000";
dummy = income > 10000;
length dummy 8;
length dummy 5;
run;
creates a variable dummy with length 5, without any notes.
So it seems to me the behaviour has changed. Previously, I believe the variable would keep the length from whichever came first: an explicit definition or its appearance in the source data.
However, assigning values to variables first and defining their basic properties at the very end surely does not help the readability and maintainability of the code.
Btw the correct definition of label would be: label dummy = "dummy = Income > 10 000";
Alternatively, you might prefer the ATTRIB statement, which specifies several properties of a single variable in one statement.
data b;
set a (drop = dummy);
attrib dummy length = 3 label = "dummy = Income > 10 000";
dummy = income > 10000;
run;

Numeric variables may have their length changed at any time, while a character variable's length can only be set before the variable is created. That's because a numeric variable's length only affects the output dataset; inside the PDV (program data vector), numeric variables always have 8 bytes of precision regardless of any length statements. A character variable's length cannot be redefined, because its PDV length is fixed once it is first defined (by a set statement or by the first length/attrib/assignment that mentions the variable). See the documentation on LENGTH for more details (although not as many as I'd like to see).
That said, personally I prefer formatting and lengths up front rather than at the end. Part of this is so that anyone reading the program knows going in what the ultimate formats will be; but most of it is that some lengths/attribs must be done up front: character lengths, in particular, and any variable where you need to specify the type (numeric/character) ahead of time in order to ensure you end up with the right type. If you usually put lengths at the end, you'll end up with a mixture of some at front/some at end, and as such I'd rather do all at front to be more organized.
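The numeric/character difference described above can be sketched as follows. This is a minimal illustration, not taken from the original question: the dataset names a and b and the variables income and name are made up for the demo. The late numeric LENGTH should be honoured, while the late character LENGTH should leave name at its original length and produce a warning in the log.

```sas
/* Hypothetical input: one numeric and one character variable */
data a;
  income = 25000;
  name = 'abc';        /* character length is fixed at 3 here */
run;

data b;
  set a;
  dummy = income > 10000;
  length dummy 3;      /* numeric: only the stored length in B changes */
  length name $ 20;    /* character: too late, NAME already came from SET */
run;

/* Inspect the resulting lengths */
proc contents data=b;
run;
```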

Related

Is there any function in SAS where we can read the exact value from the variable

Suppose I have a column called ABC that holds values like:
123_112233_66778_1122 or
123_112233_1122_11232 or
1122_112233_66778_123
I want to generate the desired value, 1122, in the next column. I have a long list of values that I need to check against column ABC, and when there is an exact match I need to output it. However, I don't want a partial match such as 112233, because that is not the value I am looking for.
For example, in the three lines above, the only matching piece in each is "1122".
I have no idea how to solve this. I tried wildcards without much success. Any help would be much appreciated.
It is hard to tell from your description, but from the values you show it looks like you want the INDEXW() function. It lets you search a string for matching words, with an option to specify which characters are to be treated as separators between the words. The result is the position where the word starts within the longer string. When the word is not found, the result is zero.
Let's create a simple example to demonstrate.
data have;
input abc $30. ;
cards;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
;
data want;
set have ;
location = indexw(trim(abc),'1122','_');
run;
Note that SAS will consider any value other than zero (or missing) as TRUE so you can just use the INDEXW() function call in a WHERE statement.
data want;
set have;
where indexw(trim(abc),'1122','_');
run;
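Since the question mentions a long list of values and a new column holding the match, here is a rough sketch of how INDEXW() could be applied to several search values. The temporary array targets, its two values, and the variable match are assumptions made for illustration; the real list would come from your own data.

```sas
data want;
  set have;
  length match $10;
  /* hypothetical search list; replace with your own values */
  array targets[2] $10 _temporary_ ('1122', '66778');
  do i = 1 to dim(targets);
    /* TRIM removes trailing blanks so only whole words between '_' match */
    if indexw(trim(abc), trim(targets[i]), '_') then do;
      match = targets[i];   /* keep the first value that matches */
      leave;
    end;
  end;
  drop i;
run;
```

Rows where none of the listed values appears as a whole word are left with a blank match.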

Using variables in SAS User defined function

I am a beginner SAS user with much experience in VBA and am having a hard time figuring out User defined functions in SAS.
I am having several problems using variables in User defined functions, but I think the two listed below are probably a related issue and would likely solve the rest of them.
I) How do you use a variable in a macro function from within a user defined function?
proc fcmp outlib = sasuser.funcs.trial;
function testNumbers(testvar $) $;
length testing $ 100;
lencheck = %length(testvar);
return (lencheck);
endsub;
run;
options cmplib = sasuser.funcs;
%put %sysfunc(testnumbers(short));
No matter what the input to the function is, the result is always 7, which matches the length of the input variable name "testvar". If I change the variable name, it changes the result. I've tried putting an ampersand in front of the variable name, but this doesn't work (it just makes the result 8...). I can get the function to return the input by putting in "return (testvar)" but can't figure out how to get the length function to work.
II) How do you define a variable as numeric in the context of the user defined function?
proc fcmp outlib=sasuser.funcs.trial;
function testNumbers(testvar $) $;
length testvar $ 100;
myNumber = 5
testNum = put(myNumber, 2.);
tempPath = %substr(1234567890, 3, 2)
tempPath1 = %substr(1234567890, 3, myNumber)
tempPath2 = %substr(1234567890, 3, testNum)
tempPath3 = %substr(1234567890, 3, put(myNumber, 2.))
return (tempPath);
endsub;
run;
The first tempPath works and returns "34" as expected. But tempPath1, tempPath2 and tempPath3 all return errors. The error is that Argument 3 to macro function %substr is not a number. For tempPath3 there is an additional error that a required operator was not found in the expression.
Note: I am aware that these functions do not do anything worthwhile. These are simplified as I am trying to learn the language and the possibilities. There may be other problems even with the simple code provided, and any advice on that would be appreciated.
What I was actually trying to code was a function that will allow for dynamically changing the library being used (so if a temp flag is set, everything will go into the Work directory, but if not, it will go to the final production location). If there is a better solution than a UDF for this, I'd love to hear this too.
The macro processor does its work before the resulting code is passed on to base SAS for processing.
Your program uses this macro logic:
lencheck = %length(testvar);
The macro processor will calculate %length(testvar), which is 7 since that is how many characters are in the string testvar. It is the same as if you wrote this statement:
lencheck = 7 ;
If you want the function to find the length of the variable TESTVAR then you need to use the LENGTH() function and not the macro function %LENGTH().
You have a similar issue with your use of the %SUBSTR() macro function instead of the function SUBSTR() in your second example.
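A minimal sketch of both functions rewritten with the regular LENGTH() and SUBSTR() functions is shown below. The function names testLength and testSubstr and the argument numChars are made up for this illustration, and the functions are called from a DATA step rather than through %sysfunc to keep the demo simple.

```sas
proc fcmp outlib=sasuser.funcs.trial;
  /* numeric result: length of the value passed in, not of the name */
  function testLength(testvar $);
    return (length(testvar));
  endsub;

  /* character result: numChars characters starting at position 3 */
  function testSubstr(testvar $, numChars) $;
    length piece $ 100;
    piece = substr(testvar, 3, numChars);
    return (piece);
  endsub;
run;

options cmplib=sasuser.funcs;

data _null_;
  n = testLength('short');
  s = testSubstr('1234567890', 2);
  put n= s=;
run;
```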

How does the reverse function in SAS work?

I have a time data field, say, 10/1/2014.
I want to extract the month and the year information dynamically in SAS, given any date.
I wrote the following code in SAS to extract the month info:
month = substr(time_field, 1, index(time_field, '/')-1);
This worked fine.
I wrote the following snippet to extract the year info:
year = substr(reverse(time_field), 1, 4);
This doesn't work; it throws a blank. Have I missed something? Please help.
SAS will return the year for you. No need to write any custom function for this purpose. Look:
data _null_;
length year 4.;
year=year(today());
put "we are on the year of " year;
run;
Your variable has trailing spaces most likely. So when you reverse it, the trailing spaces become leading spaces and then you take the first four characters which are blanks.
You can verify this by running the reverse function alone on the variable and see the results.
Try adding the compress function.
year = reverse(substr(reverse(compress(time_field)), 1, 4));
Though this may solve your problem, you should really convert your date to a SAS date and then use the Month/Day/Year functions.
data have;
length time_field $20.;
time_field="10/1/2014";
year_bad = substr(reverse(time_field),1, 4);
year_good = reverse(substr(reverse(compress(time_field)),1, 4));
year_better = year(input(time_field, mmddyy10.));
put "year_bad:" year_bad;
put "year_good:" year_good;
put "year_better:" year_better;
run;
Your data is either a date stored in a character field, or a numeric value formatted as a date. While you can use text expressions on numerics, you shouldn't; you should explicitly convert them.
When you don't, you end up with things like this, i.e. improper field lengths, because the automatic conversion is very loose. It tends to allow a huge amount of extra space where none is required.
If your data is numeric, use MONTH() or YEAR() and be done with it; there's no reason to play in text here. Look at the field in the data explorer; it will tell you if it's numeric or not. (Numeric with a format can still look like text, so actually look at it!)
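For the numeric case, a small sketch might look like this. The date literal is a stand-in for your actual time_field, assumed here to be a numeric SAS date with a date format applied.

```sas
data _null_;
  time_field = '01OCT2014'd;      /* hypothetical numeric SAS date */
  format time_field mmddyy10.;    /* looks like text when displayed */
  month = month(time_field);      /* numeric month */
  year  = year(time_field);       /* numeric 4-digit year */
  put time_field= month= year=;
run;
```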
If your data is text, then you have some better options than REVERSE.
First is SCAN. SCAN splits a string into words by delimiter, similar to functions in many other languages, such as strsplit in R.
month=scan(mdy_var,1,'/');
day =scan(mdy_var,2,'/');
year =scan(mdy_var,3,'/');
Second, you could still use SUBSTR, along with LENGTH.
year = substr(mdy_var,length(mdy_var)-3,4);
LENGTH tells you how long the string really is (not counting trailing spaces), so '10/1/2014' is 9 long; the 6th character (9-3) is the 2, and then you take 4 characters from there [the 4 should be unnecessary]. This method wouldn't work for the day, of course, only for the year (and only a 4-digit year). SCAN is really the better choice, but this is a good example of how SUBSTR and LENGTH work together.
Going along the same lines, you can use FIND and look backwards, also, using a negative start position.
year = substr(mdy_var,find(mdy_var,'/',-99)+1,4);
That starts at the 99th character (which is realistically your maximum, right?), searches to the left, and returns the position of the first '/' it finds.

Use array variables in subsetting IF without specifying number of array variables

In SAS, I have a few columns of dollar values and several other columns. I don't care about a row of data if it has 0 values for all dollar values and I don't want these rows in the final data. The columns that are dollar values are already in an array Fix_Missing. I am in the data step.
So, for example, I could do:
IF Fix_Missing{1} NE 0 OR Fix_Missing{2} NE 0 OR Fix_Missing{3} NE 0;
However, the number of variables in the array may change and I want code that works no matter how many variables are in it. Is there an easy way to do this?
In most functions that accept uncounted lists of variables, you can use of array[*].
if sum(of fix_missing[*]) > 0;
for example would work assuming fix_missing cannot have negative values. You can also use this with CATS, so for example:
if cats(of fix_missing[*]) ne '000';
would work, and you could do something even like this:
if cats(of fix_missing[*]) ne repeat('0',dim(fix_missing)-1);
if you might have unknown numbers of array elements (repeat takes a character and repeats it n+1 times).
Two other useful functions, probably not needed here but in the same vein, are whichn and whichc (numeric/character versions of the same thing). If you wanted to know whether ANY of them had a 0 in them:
if whichn(0,of fix_missing[*]) > 0;
which returns the position of the first zero it finds.
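Putting the of array[*] idea into a complete step, a sketch could look like the following. The dataset have and the variables id and amt1-amt3 are invented for the demo. This variant uses MAX and MIN instead of SUM so that negative values cannot cancel out; it assumes the dollar columns contain no missing values.

```sas
/* Hypothetical test data: row 1 has all zero amounts */
data have;
  input id amt1 amt2 amt3;
  datalines;
1 0 0 0
2 5 0 0
3 0 0 12.5
;
run;

data want;
  set have;
  array fix_missing[*] amt1-amt3;
  /* keep the row if any element is nonzero, for any array size */
  if max(of fix_missing[*]) ne 0 or min(of fix_missing[*]) ne 0;
run;
```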

SAS - selecting character observations from position 1 to position 2

I am stuck on one particular point. I have a character variable with observations extracted from an RTF document. I need to keep only the observations from obs A to obs B. The firstobs and obs options are not helpful here because we do not know the observation numbers beforehand. All we know is the two unique strings. For example in the dataset, I need to create a dataset with observations from obs 11 to 16. This is only part of the dataset; the original dataset has over 1500 observations, which is why we use unique text to capture the range instead of observation numbers.
Thank you all in advance.
You don't explain enough, but odds are you can do something sort of like this if I understand you right (you have a "start" and a "stop" string in the document).
data want;
set have;
retain keep 0;
if strvar = "keepme" then keep=1;
if keep=1;
if strvar = "lastone" then keep=0;
run;
That is, have some condition set the keep variable to 1, then test for it, then put the off condition after that (assuming you want to keep the row with the off condition). Use string functions like index, find or scan to search for your particular string if it doesn't make up the entire value. You could also use regular expressions if necessary.
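When the marker text is only part of the value, the same pattern could be sketched with FIND. The marker strings 'Start of table' and 'End of table' and the flag name in_range are placeholders, not taken from the question; the 'i' modifier makes the search case-insensitive.

```sas
data want;
  set have;
  retain in_range 0;
  /* hypothetical start marker: switch the flag on */
  if find(strvar, 'Start of table', 'i') > 0 then in_range = 1;
  /* subsetting IF: only rows inside the range continue */
  if in_range = 1;
  /* hypothetical end marker: this row is kept, later rows are not */
  if find(strvar, 'End of table', 'i') > 0 then in_range = 0;
  drop in_range;
run;
```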