SAS: How to create a variable like 1,2,3,4,...,N

I need to introduce a time trend to a regression model for a course but have no idea how to create a variable that's just (1,2,3,4,...,108). In R or Python I would just create an empty vector of 0's and then loop through to fill them with the loop index but I have no clue how to do it in SAS.
Thank you in advance

data want;
set have;
time_trend+1;
run;
SAS is an inherently looping language. The code above does four things:
Read a row
Add 1 to a variable called time_trend
Output the row to a dataset named want
Read the next row and execute the statements again
Because time_trend appears in a sum statement, SAS creates it as a numeric variable at compile time, initializes it to 0, and retains its value from one row to the next, so we do not need to declare a length or type.
The statement time_trend+1 is shorthand for the logic below:
data want;
set have;
retain time_trend 0;
time_trend = time_trend + 1;
run;
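For this particular case the automatic variable _n_ may be a simpler alternative. This is a minimal sketch, assuming have is read by a single SET statement and no rows are filtered or deleted before the assignment, so that _n_ matches the row number:

```sas
data want;
  set have;
  /* _n_ counts data step iterations, starting at 1 */
  time_trend = _n_;
run;
```

Unlike the sum statement, this does not keep a running counter of its own, so it drifts from the row number if the step reads more or fewer than one row per iteration.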

Related

How do I sum similarly named variables in SAS?

I want to remove every observation where the variable starting with R1 has a missing value. In order to do this, I first try to sum every variable with that prefix:
data test
input R1_1 R1_2 R1_3;
datalines;
. . .
;
run;
data test2;
set test;
diagnosis=sum (of R1:);
run;
This syntax should work according to this article. However something seems to be wrong. In the above example, I get an error complaining about the function call not having enough arguments. In other cases, the code seems to run smoothly but my diagnosis variable isn't created.
Can I fix this and in that case how?
Your code does not work because the DATA statement is missing its terminating semicolon, so the TEST dataset you created does not have any variables. You also created datasets named INPUT, R1_1, R1_2 and R1_3 that do not have any variables either.
As for your actual question, you can use NMISS() to count the number of missing numeric values.
nmiss = nmiss(of R1_:) ;
So you can eliminate observations with ANY missing values by using something like:
data want;
set have;
where not nmiss(of R1_1-R1_3);
run;
If the goal is to remove observations where ALL of the values are missing you need to know how many variables you are testing. If you don't know that number in advance then you could use an ARRAY to count them. But then you would need to use a subsetting IF instead of WHERE.
data want;
set have;
array x r1_: ;
if nmiss(of r1_:) < dim(x);
run;
If you have a mix of numeric and character variables you can use CMISS() instead.
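As a rough sketch of the CMISS() variant, assuming the same have dataset and R1_ prefix as above, a subsetting IF that drops every row with at least one missing value could look like this:

```sas
data want;
  set have;
  /* cmiss() counts missing values among numeric and character arguments alike */
  if cmiss(of r1_:) = 0;
run;
```

To drop only the rows where everything is missing, the same ARRAY and dim() comparison shown earlier should apply, provided the variables share one type, since an array cannot mix numeric and character variables.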

How to use call symput on a specific observation in SAS

I'm trying to convert a SAS dataset column to a list of macro variables but am unsure of how indexing works in this language.
DATA _Null_;
do I = 1 to &num_or;
set CondensedOverrides4 nobs = num_or;
call symputx("Item" !! left(put(I,8.))
,"Rule", "G");
end;
run;
Right now this code creates a list of macro variables Item1,Item2,..ItemN etc. and assigns the entire column called "Rule" to each new variable. My goal is to put the first observation of "Rule" in Item1, the second observation in that column in Item2, etc.
I'm pretty new to SAS and understand you can't brute force logic in the same way as other languages but if there's a way to do this I would appreciate the guidance.
Much easier to create a series of macro variables using PROC SQL's INTO clause. You can save the number of items into a macro variable.
proc sql noprint;
select rule into :Item1-
from CondensedOverrides4
;
%let num_or=&sqlobs;
quit;
If you want to use a data step there is no need for a DO loop. The data step iterates over the inputs automatically. Put the code to save the number of observations into a macro variable BEFORE the set statement in case the input dataset is empty.
data _null_;
if eof then call symputx('num_or',_n_-1);
set CondensedOverrides4 end=eof ;
call symputx(cats('Item',_n_),rule,'g');
run;
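To check what ended up in the macro variables, something like the following could be run afterwards. It is only a sketch: the macro name show_items is made up for illustration, and it assumes num_or and Item1 through Item&num_or were created by one of the steps above.

```sas
%macro show_items;
  %local i;
  %do i = 1 %to &num_or;
    /* &&Item&i resolves in two passes: first to &Item1, then to its value */
    %put NOTE: Item&i = &&Item&i;
  %end;
%mend show_items;

%show_items
```

The %DO loop has to live inside a macro definition because it is not allowed in open code.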
SAS does not need a loop to access each row; the data step does that automatically, so your code is really close. Instead of I, use the automatic variable _n_, which can function as a row counter even though it actually counts data step iterations.
DATA _Null_;
set CondensedOverrides4;
call symputx("Item" || put(_n_,8. -l) , Rule, "G");
run;
To be honest though, if you're new to SAS, starting with macro variables isn't recommended. There are usually several ways to avoid them, and I only use them if there's no other choice. The macro language is incredibly powerful, but easy to get wrong and harder to debug.
EDIT: I modified the code to remove the LEFT() function since you can use the -l option in the PUT function to left align the results directly.
EDIT2: Removing the quotes around RULE since I suspect it's a variable you want to store the value of, not the text string 'RULE'. If you want the macro variables to resolve to a string you would add back the quotes but that seems incorrect based on your question.

Create a basic data set by specifying a range of values

I am trying to create a single variable for the purpose of my macro function. What I want to do is simple. I want to create a dataframe with a single variable with a range of character values. For example:
forecast
fore1
fore2
fore3
fore4
I am aware of some of the ways this can be done with the input and datalines statements. However, the issue I am having is that I want to use fore1-fore4 to generate the data in this dataframe so that it will generalize to my macro function.
Assuming you literally want that data set, it could be as simple as this.
data want;
do i=1 to 4;
forecast = catt('fore', i);
output;
end;
keep forecast;
run;
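Since the question mentions generalizing this inside a macro, here is one way it might be wrapped. The macro name make_fore, its parameters, and the length of 32 are all assumptions made for illustration, not part of the original code.

```sas
%macro make_fore(n=4, prefix=fore);
  data want;
    /* 32 characters is an arbitrary choice; without it the CAT functions default to 200 */
    length forecast $ 32;
    do i = 1 to &n;
      forecast = cats("&prefix", i);
      output;
    end;
    keep forecast;
  run;
%mend make_fore;

%make_fore(n=4)
```

Calling it with a different n or prefix should produce the corresponding range of names.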

Probt in sas for column of values

I'm looking to run probt on a column of values in SAS, not just one value, and to get two-tailed p-values.
I have the following code I'd like to amend
data all_ssr;
x=.551447;
df=25;
p=(1-probt(abs(x),df))*2;
put p=;
run;
However, I would like x to be a column of values within another file. I have tried work.ttest, which is just a file of t-test values.
Many thanks
You need to use a set statement to access data from another SAS dataset.
data all_ssr;
set work.ttest; /*Dataset containing column of values*/
df=25;
p=(1-probt(abs(x),df))*2;
run;
Removing the put statement avoids clogging up the log.
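For a self-contained run, a small input dataset can be created first. In this sketch the second and third x values are invented sample data, and df=25 is carried over from the question:

```sas
/* Sample input: one t statistic per row (last two values are made up) */
data work.ttest;
  input x;
  datalines;
0.551447
-2.1
1.96
;
run;

data all_ssr;
  set work.ttest;
  df = 25;
  /* two-tailed p-value: twice the upper tail beyond |x| */
  p = (1 - probt(abs(x), df)) * 2;
run;

proc print data=all_ssr;
run;
```

If the degrees of freedom differ per row, df could be read from the input dataset instead of being assigned a constant.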

Naming variable using _n_, a column for each iteration of a datastep

I need to declare a variable for each iteration of a datastep (for each n), but when I run the code, SAS outputs only the last variable declared, the one with the greatest n.
It seems stupid to declare a variable for each row, but I need to achieve this result. I'm working on a dataset created by a proc freq, and I need a column for each group (each row of the dataset).
The result will be in a macro, so it has to be completely flexible.
proc freq data=&data noprint ;
table &group / out=frgroup;
run;
data group1;
set group (keep=&group count ) end=eof;
call symput('gr', _n_);
*REQUESTED code will go here;
run;
I tried these:
var&gr.=.;
call missing(var&gr.);
and a lot of other statements, but none worked.
The result is always the same: the dataset includes only var&gr, where &gr is the maximum n.
It seems that the PDV is overwriting the new variable on each iteration, even though the name is different.
Please keep the solution in a single datastep or, at least, make the code take as little time as possible.
Any idea on how can I achieve the requested result?
Thanks.
Macro variables don't work like you think they do. Any macro variable reference is resolved at compile time, so your call symput is changing the value of the macro variable after all the references have been resolved. The reason you are getting results where the &gr is the maximum n is because that is what &gr was as a result of the last time you ran the code.
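The timing can be seen with a small experiment. This is only an illustrative sketch; the values before and after are placeholders:

```sas
%let gr = before;

data _null_;
  call symputx('gr', 'after');
  /* &gr is resolved while the step is compiled, so this writes "before" */
  put "Inside the step: &gr";
run;

/* The step has finished executing by now, so this writes "after" */
%put After the step: &gr;
```

The same thing happens with var&gr. in the question: the name is fixed when the step is compiled, before any call symput has executed.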
If you can determine the maximum _n_ beforehand, you can put that value into a macro variable and declare an array like so:
Find max _n_ and assign value to maxn:
data _null_;
set have end=eof;
if eof then call symputx('maxn',_n_);
run;
Create variables:
data want;
set have;
array var (&maxn);
run;
If you don't like proc transpose (if you need 3 columns, you can always run it once per column and then combine the outputs), what you ask can be done with arrays.
First, determine the number of groups (i.e. rows) in the input dataset, then define an array whose dimension equals that number.
The i-th element of the array can then be referenced using _n_ as the index.
In the following code, &gr. contains the number of groups:
data group1;
set group;
array arr_counts(&gr.) var1-var&gr.;
arr_counts(_n_)= count;
run;
In SAS there are several methods to determine the number of observations in a dataset. My favorite is the following (it doesn't work with views):
data _null_;
if 0 then set group nobs=n;
call symputx('gr',n);
stop;
run;