Position of a word/var in a list in SAS

This might be a rather simple question, but I am new to SAS and am still clueless after researching this on Google.
I have a macro variable defined as:
%let list = 12AUG2013 13AUG2013 15AUG2013 16AUG2014 09SEPT2014;
I need to get the following things:
a) Total words in list: In R this would be length(list). But in SAS, length counts each character. COUNTW does not work. Is there any way I can do this?
b) Finding the ith word: If I need the 3rd element in this list, I would say list[3] in R. How can I do that in SAS?
c) Finding the position of an element: Suppose I need to know which position 16AUG2014 is at in the list variable, how can I get it?
Thanks for all the help!

As you're asking about macro variables, things work a little differently from SAS data-step functions. Your questions provide a useful example of how they differ. Some data-step functions have macro function equivalents (%SCAN, %SUBSTR, etc.). Others require %SYSFUNC, which allows most SAS data-step functions to be called from macro code.
So, referring to your example:
%let list = 12AUG2013 13AUG2013 15AUG2013 16AUG2014 09SEPT2014;
%let list_numwords = %sysfunc(countw(&list)); /* This example shows the use of %SYSFUNC */
%let list_word3 = %scan(&list,3); /* These examples show the use of %SCAN and %INDEX, built-in macro functions */
%let list_pos16AUG2014 = %index(&list,16AUG2014); /* Character position (31), not word number */
The code creates new macro variables that store the answers to your questions a, b and c* respectively.
* Note that %INDEX returns the character position of 16AUG2014. If you require its word number (i.e. 4), this is a little more difficult, as I don't think there's a string function for this in SAS. It would involve using a combination of COUNTW and %SUBSTR.
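One candidate for getting the word number directly is the data-step FINDW function, whose E modifier makes it return the number of the matching word instead of a character position. A minimal sketch, assuming the list is blank-delimited; the macro variable name list_wordpos is made up for illustration:

```sas
%let list = 12AUG2013 13AUG2013 15AUG2013 16AUG2014 09SEPT2014;

data _null_;
  /* Read the macro variable at run time. ' ' is the word delimiter */
  /* and the 'E' modifier asks FINDW for the word number.           */
  wordpos = findw(symget('list'), '16AUG2014', ' ', 'E');
  /* Store the result in a (hypothetical) macro variable */
  call symputx('list_wordpos', wordpos);
run;

%put list_wordpos=&list_wordpos;
```

For the list above this should give 4.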

The code for the word position in the list is:
%let list_pos=%sysfunc(countw(%substr(&list,1,%index(&list,16AUG2014)+1)));
Cheers

Related

How to choose indexed assignment variable dynamically in SAS?

I am trying to build a custom transformation in SAS DI. This transformation will "act" on columns in an input data set, producing the desired output. For simplicity let's assume the transformation will use input_col1 to compute output_col1, input_col2 to compute output_col2, and so on up to some specified number of columns to act on (let's say 2).
In the Code Options section of the custom transformation users are able to specify (via prompts) the names of the columns to be acted on; for example, a user could specify that input_col1 should refer to the column named "order_datetime" in the input dataset, and either make a similar specification for input_col2 or else leave that prompt blank.
Here is the code I am using to generate the output for the custom transformation:
data cust_trans;
set &_INPUT0;
i=1;
do while(i<3);
call symputx('index',i);
result = myfunc("&&input_col&index");
output_col&index = result; /*what is proper syntax here?*/
i = i+1;
end;
run;
Here myfunc refers to a custom function I made using proc fcmp which works fine.
The custom transformation works fine if I do not try to take into account the variable number of input columns to act on (i.e. if I use "&&input_col&i" instead of "&&input_col&index" and just use the column result on the output table).
However, I'm having two issues with trying to make the approach more dynamic:
I get the following warning on the line containing
result = myfunc("&&input_col&index"):
WARNING: Apparent symbolic reference INDEX not resolved.
I do not know how to have the assignment to the desired output column happen dynamically; i.e., depending on the iteration of the do loop I'd like to assign the output value to the corresponding output column.
I feel confident that the solution to this must be well known amongst experts, but I cannot find anything explaining how to do this.
Any help is greatly appreciated!
You can't use macro variables that depend on data step variables in this manner. Macro variables are resolved when the data step is compiled, not while it runs, so &index does not exist yet when "&&input_col&index" is resolved.
So you either have to use
%do i = 1 %to .. ;
which is fine if you're in a macro (it won't work outside of an actual macro), or you need to use an array.
data cust_trans;
set &_INPUT0;
array in[2] &input_col1 &input_col2; *or however you determine the input columns;
array output_col[2]; *automatically names the results;
do i = 1 to dim(in);
result = myfunc(in[i]); *You quote the input - I cannot see what your function is doing, but it is probably wrong to do so;
output_col[i] = result; *the array index selects the output column dynamically;
end;
run;
That's the way you'd normally do it. I don't know what myfunc does, and I also don't know why you quote "&&input_col&index." when you pass it in; that would be a strange way to operate unless you want the name of the input column as text (rather than the data in that variable). If you do, pass vname(in[i]), which returns the name of the variable as a character string.
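The %do alternative could be sketched as follows. The macro name cust_trans_loop and its n parameter are hypothetical; input_col1, input_col2 and myfunc come from the question, and the sketch assumes that every prompt is filled in and that myfunc should receive the column's value rather than its name:

```sas
%macro cust_trans_loop(n=2);
  %local i;
  data cust_trans;
    set &_INPUT0;
    /* The macro loop runs before the data step compiles and writes */
    /* one assignment statement per column.                         */
    %do i = 1 %to &n;
      output_col&i = myfunc(&&input_col&i);
    %end;
  run;
%mend cust_trans_loop;

%cust_trans_loop(n=2)
```

With input_col1 set to order_datetime, the first generated statement is output_col1 = myfunc(order_datetime);.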

How do I retrieve numerical value of macro argument set in data step

I've gone in circles on this one for 1.5 hours, so I'm giving in and asking for help here. What I'm trying to do is dead simple but I cannot for the life of me find a link describing the process.
I have the following data step:
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
%useful_macro(&macro_input_date);
run;
where a date value is passed to a macro function (I'm new to these). I'd like to use the numeric value of the date value - let's be wild and say I want to get the value of the year, multiply it by the day value, and subtract the remainder after dividing the month value by 3. I can't seem to get just the year value out of the input. I've tried various things such as
symget, both "naked" and prepended with "%", with arguments that represent all possible permutations of the following variants:
have a naked reference to the variable, e.g. macro_input_date
enclose in single quotes, e.g. 'macro_input_date'
enclose in double quotes, e.g. "macro_input_date"
prepend with the ampersand, e.g. &macro_input_date
direct call to %sysfunc(year(<argument as variously specified above>)
Can anyone tell me what I am missing?
Thanks!
Given that you asked about macro functions, I'll guess that your example date processing is just an example. Talking about macro functions in general, it's important to understand that a macro function will (generally) not be doing any processing of its own; it will just be generating some data step code to do some task. So, for something like your contrived example, the data step code would be something like:
data out;
set in; * Assume this contains a numeric called 'some_date';
result = year(some_date) * day(some_date) - mod(month(some_date), 3);
run;
To macroise this, you don't need to transfer the data values to the macro, you just need to transfer the variable name:
%macro date_func(var=);
year(&var) * day(&var) - mod(month(&var), 3)
%mend;
data out;
set in; * Assume this contains a numeric called 'some_date';
result = %date_func(var=some_date);
run;
Note that the value of the var parameter here is the literal text some_date, not the value of the some_date data step variable. There are other ways to do it of course - you could actually pass this macro a date literal and it would still work:
data out;
set in; * Assume this contains a numeric called 'some_date';
result = %date_func(var="21apr2017"d);
run;
so it all depends on exactly what you're trying to do... maybe you want to assign the result to another macro variable, so it doesn't need to be part of a data step at all, in which case you could do a similar thing with %sysfunc functions etc.
If you're just trying to get the year, you would do something like:
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
yearval = substr(symget('macro_input_date'),6,4);
put yearval=;
run;
Your macro value (&macro_input_date) is not the actual date value (14610) but is the text 01JAN2000. So you cannot use the year function (unless you INPUT it back), you would use substr to grab the year part.
Of course, this is all sort of pointless as going to/from macro variable doesn't really accomplish much here.
Are you just having trouble with date literals? Your data step code
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
run;
is just going to do the same thing as
%let macro_input_date=01JAN2000 ;
Now if you want to treat that string of characters as if it represents a date then you need to either wrap it up as a date literal
"&macro_input_date"d
Or convert it.
%sysfunc(inputn(&macro_input_date,date9.))
Why not just store the actual date value into the macro variable?
call symputx('macro_input_date',some_date);
Then it wouldn't look like a date to you but it would look like a date to the YEAR() function.
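As a rough sketch of that last approach applied to the calculation in the question (year times day, minus the remainder of month divided by 3), done entirely in macro code with %SYSFUNC; the macro variable names yr, dy, mo and result are made up for illustration:

```sas
data _null_;
  some_date = "01JAN2000"d;
  /* Store the numeric date value (14610), not the formatted text */
  call symputx('macro_input_date', some_date);
run;

/* Data-step date functions called from macro code */
%let yr = %sysfunc(year(&macro_input_date));
%let dy = %sysfunc(day(&macro_input_date));
%let mo = %sysfunc(month(&macro_input_date));

/* 2000 * 1 - mod(1, 3) */
%let result = %eval(&yr * &dy - %sysfunc(mod(&mo, 3)));
%put result=&result;
```

For 01JAN2000 this should write result=1999 to the log.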

SAS - Keeping only observations with all variables

I have a dataset of membership information, and I want to keep only the people who have been continuously enrolled for the entire year. There are 12 variables for each person, one for each month of the year with how many days during that month they were enrolled. Is there a way to make a subset of the data for just those with a value >1 for each of the month variables?
Thanks!
SAS has various summary functions that might well be what you're looking for. See min() (minimum) in particular, as it will allow you to find the minimum of several variables. You may also want to consider nmiss() (number of missing values) and n() (number of non-missing values) if you have to deal with missing values in your data.
Summary functions can be passed lists of variables like this (in a data step):
minimum = min(var1, var2, var3);
However, that can become long-winded if you need to use a lot of variables. Fortunately, SAS provides several ways to reference lists of variables to make things neater; these are described in the SAS documentation on variable lists. To use them in a summary function, add the of qualifier:
minimum = min(of var1-var12);
maximum = max(of var:);
blanks = nmiss(of _NUMERIC_);
Finally, you will want to use your newfound data to decide which observations to include. To do this in a data step, look at the output statement:
if min(of var:) > 1 then output;
Or, if you feel like learning a bit more about SAS's syntax, you could try using an implicit output, which the documentation for the output statement also describes.
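A minimal sketch of the implicit-output variant, using a subsetting IF and also excluding rows with missing months. The dataset names members and continuous and the variable names days1-days12 are assumptions, since the question does not give them:

```sas
data continuous;
  set members;
  /* Subsetting IF: rows failing the condition are dropped, */
  /* rows passing it are written by the implicit output.    */
  if nmiss(of days1-days12) = 0 and min(of days1-days12) > 1;
run;
```

The nmiss check matters because min ignores missing values, so a row with some missing months could otherwise pass the test.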
In general, it's preferred to ask specific questions and show your current work on SO, and I'd advise using Google to answer your basic questions while you're learning the fundamentals. There is plenty of great documentation available to help you out there.

code re-use in proc IML: accessing contents of vectors by specifying their name as a string

Within proc IML of the SAS system from within a user-defined module I want to be able to access the data in a vector by taking as an argument the name of the vector.
For example in the code below the module called "test" builds the two strings "x_vec1" and "x_vec2" but does not print the contents of these two vectors, but rather prints their names (I think submit blocks produce the same results).
This is very easily accomplished using macros by calling IML within macros, but I want to do it purely in IML to keep the code "clean". Whilst IML is brilliant for statistical work, I need more than this: I need to keep my code short by having a concept analogous to the macro variable, which can be resolved within a function at run time.
proc iml;
start test(ep);
str = concat("x_",ep);
print str;
finish;
x_vec1 = J(10,1,34);
x_vec2 = J(10,1,67);
run test("vec1");
run test("vec2");
quit;
The short answer to your question is that you need the VALUE function.
proc iml;
x_vec1 = J(10,1,34);
str = cats('x_','vec1');
_temp = value(str);
print(_temp);
quit;
However, it doesn't work in a subroutine quite like you programmed it. That's because the subroutine needs to know that it's allowed to access the variable x_vec1 as a global variable - which you can't actually do here since you're specifying it on the fly.
You can get around this by PUSHing the command back to the calling environment, however.
start test(ep);
str = cats('print(x_',ep,');');
print(str);
call push(str);
finish;
Not sure if this gives you the flexibility you need; if you are trying to do something more complex you may want to clarify what the final result needs to be.
Using Joe's answer above, this new example accesses 2 vectors called "x_vec1" and "x_vec2" and adds a user defined amount to each element. Thus I think this approach can be used to create re-usable code that takes vector/matrix names as arguments.
proc iml;
start test(ep,n);
str1 = cats('print(x_',ep,');');
print str1;
str2 = cats('y_',ep,'= x_',ep,'+',char(n),';');
print str2;
str3 = cats('print(y_',ep,');');
print str3;
call queue(str1,str2,str3);
finish;
x_vec1 = J(10,1,34);
x_vec2 = J(10,1,67);
run test("vec1",10);
run test("vec2",100);
quit;

Wildcard in variable list

totalSUPPLY= sum(of supply1-supply485);
I've got this simple calculation to make (in SAS) from a table that I've transposed (hence the variable names). I have to do this several times, and the number of supply variables is not the same for each calculation. I.e. in the above example it's 485, but I do it later in my analysis and it's 350.
My question: Is there a way to 'wildcard' the number of 'supply' columns? Basically, I want something like this (but this doesn't work): totalSUPPLY= sum(of supply1-supply%);
Also: If there is an easier way to do the same, I'm open to (and would actually prefer) that.
Thanks everyone!
data yoursummary;
set yourdata; /*dataset containing supply1-supply485*/
array supplies{*} supply:;
totalSUPPLY = sum(of supplies{*});
run;
N.B. using a : wildcard like this will only pick up matching variables that are present in the PDV at the point when you create the array, so the array definition has to come after the set statement. Also, it only works for variables with a common prefix, not those with a common suffix.
As Joe has pointed out, the following more concise code also works:
data yoursummary;
set yourdata; /*dataset containing supply1-supply485*/
totalSUPPLY = sum(of supply:);
run;
Of course, if you declare an array it's then easier to do related things like checking how many variables are being added together, or looping through the variables in the array and applying the same logic to each one in turn.
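A short sketch of those array-based extras, assuming the supply variables are numeric. The names n_supply and n_missing are made up for illustration, and replacing missing values with 0 is just an example of per-variable logic:

```sas
data yoursummary;
  set yourdata; /*dataset containing the supply variables*/
  array supplies{*} supply:;
  n_supply  = dim(supplies);          /* how many variables matched  */
  n_missing = nmiss(of supplies{*});  /* how many of them are missing */
  /* Apply the same logic to each variable in turn */
  do i = 1 to dim(supplies);
    if supplies{i} = . then supplies{i} = 0;
  end;
  totalSUPPLY = sum(of supplies{*});
  drop i;
run;
```

Because the new variables do not start with supply and the array is declared before they are created, they are not picked up by the supply: wildcard.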