SAS: How to create datasets in loop based on macro variable - sas

I have a macro variable like this:
%let months = 202002 202001 201912 201911 201910;
As one can see, we have 5 months, separated by space ' '.
I would like to create 5 datasets like a_202002, a_202001, a_201912, a_2019_11, a_201910. How can I run this in loop and create 5 datasets, instead of writing the datastep 5 times?
Pseudo code:
for m in &months.
data a_m;
....
....
run;
How can I do that in SAS? I tried %do_over but that did not help me.

Use the knowledge you gained from #Tom answer in an earlier question to create the macro
%macro datasets_for_months ...
...
%mend;
Specify the output data sets in the DATA statement:
DATA %datasets_for_months(...);
...
RUN;
Direct rows to specific output data sets by naming the data set, such as
OUTPUT a_202002;
Note:
A step with no OUTPUT statements will implicitly output to each data set
A step with an OUTPUT statement will cause records to be written to either ALL data sets, or only the ones named in the statement:
OUTPUT writes records to all output data sets
OUTPUT data-set-name-1 writes records to only data sets specified
The DATA Step documentation covers what you need to know in greater detail
DATA Statement
Begins a DATA step and provides names for any output such as SAS data sets, views, or programs.
...
Syntax
Form 1:
DATA statement for creating output data sets
DATA <data-set-name-1 <(data-set-options-1)>>
... <data-set-name-n <(data-set-options-n)>>
... ;
THE ROAD AHEAD
You will likely discover that month will be better served in a conceptual role as a categorical variable in a single large data set, instead of breaking the data into multiple month-named data sets.
A categorical variable will let you leverage the power of SAS' partitioning and segregating statements such as WHERE, BY and CLASS when pursuing processing, reporting and visualization of your data at different combinations of class level values.

How about this approach? Create the data set names in another macro variable and use a single data step.
%let months = 202002 202001 201912 201911 201910;
data _null_;
ds = prxchange('s/(\d+)/a_$1/', -1, "&months.");
call symputx('ds', ds);
run;
options symbolgen;
data &ds.;
run;

You can use a %DO loop and the %SCAN() function. Use the COUNTW() function to find the upper bound.
%do i=1 %to %sysfunc(countw(&months,%str( )));
%let month=%scan(&months,&i,%str( ));
....
%end;

Related

How to move specific rows from an original table to a new table?

In SAS, I have a table that have 1000 rows. I am trying to separate that table into two tables. Row1-500 to Table A and row501-100 to table B. What is the code that can do this function. Thank you all for helping!
I am searching the code online and cannot get anything on google, help is appreciated.
The DATA statement lists the output tables of the step. An OUTPUT statement explicitly sends a row to each of the output tables. An explicit OUTPUT <target> ... <target-k> statement sends records to the specified tables only. The automatic implicit loop index variable _n_ can act as a row counter when a single data set is being read with SET.
Try
data want1 want2;
set have;
if _n_ <= 500 then output want1; else output want2;
run;
However, you may be better served by creating a categorical variable that can be used later in WHERE or BY statements.
Maybe the set options will help.Try firstobs= and obs= to choose the rows you want.
Here is how to use them:
data want1;
set have(obs=500);
run;
data want2;
set have(firstobs=501 obs=1000);
run;

Export/Import Attributes of a SAS dataset

I am working with multiple waves of survey data. I have finished defining formats and labels for the first wave of data.
The second wave of data will be different, but the codes, labels, formats, and variable names will all be the same. I do not want to define all these attributes again...it seems like there should be a way to export the PROC CONTENTS information for one dataset and import it into another dataset. Is there a way to do this?
The closest thing I've found is PROC CPORT but I am totally confused by it and cannot get it to run.
(Just to be clear I'll ask the question another way as well...)
When you run PROC CONTENTS, SAS tells you what format, labels, etc. it is using for each variable in the dataset.
I have a second dataset with the exact same variable names. I would like to use the variable attributes from the first dataset on the variables in the second dataset. Is there any way to do this?
Thanks!
So you have a MODEL dataset and a HAVE dataset, both with data in them. You want to create WANT dataset which has data from HAVE, with attributes of MODEL (formats, labels, and variable lengths). You can do this like:
data WANT ;
if 0 then set MODEL ;
set HAVE ;
run ;
This works because when the DATA step compiles, SAS builds the Program Data Vector (PDV) which defines variable attributes. Even though the SET MODEL never executes (because 0 is not true), all of the variables in MODEL are created in the PDV when the step compiles.
Importantly, note that if there are corresponding variables with different lengths, the length from MODEL will determine the length of the variable in WANT. So if HAVE has a variable that is longer than the same-named variable in MODEL, it may be truncated. Options VARLENCHK determines whether or not SAS throws a warning/error if this happens.
That assumes there are no formats/labels on the HAVE dataset. If there is a variable in HAVE that has a format/label, and the corresponding variable in MODEL does not have a format/label, the format/label from HAVE will be applied to WANT.
Sample code below.
data model;
set sashelp.class;
length FavoriteColor $3;
FavoriteColor="Red";
dob=today();
label
dob='BirthDate'
;
format
dob mmddyy10.
;
run;
data have;
set sashelp.class;
length FavoriteColor $10;
dob=today()-1;
FavoriteColor="Orange";
label
Name="HaveLabel"
dob="HaveLabel"
;
format
Name $1.
dob comma.
;
run;
options varlenchk=warn;
data want;
if 0 then set model;
set have;
run;
I'd create an empty dataset based on the existing one, and then use proc append to append the contents to it.
Create some sample data for the second round of data:
data new_data;
age = 10;
run;
Create an empty dataset based on the original data:
proc sql noprint;
create table want like sashelp.class;
quit;
Append the data into the empty dataset, retaining the details from the original:
proc append base=want data=new_data force nowarn;
run;
Note that I've used the force and nowarn options on proc append. This will ensure the data is appended even if differences are found between the two datasets being used. This is expected if you have, for example, format differences. It will also hide things like if columns exist in the new table that aren't in the old table etc. So be careful that this is doing what you want it to. If the behaviour is undesirable, consider using a datastep to append instead (and list the want dataset first).
Welcome to the stack.
If you want to copy the properties of the table without the data within it, you could use PROC SQL or data step with zero rows read in.
This examples copies all information about the SASHELP.CLASS dataset into a brand new dataset. All formats, attributes, labels, the whole thing is copies over. If you want to only copy some of the columns, specify them in select clause instead of asterix.
PROC SQL outobs=0;
CREATE TABLE WANT as SELECT * FROM SASHELP.CLASS;
QUIT;
Regards,
Vasilij

CREATING SUBSETS IN SAS "automatically" [duplicate]

This question already has an answer here:
Dynamic first observation: Need to put a variable in firstobs=
(1 answer)
Closed 8 years ago.
I have generated a dataset through sas consisting of 144 rows and now I want to create 18 subsets of the dataset (each subset containing 8 rows). I know how to create subsets "manually using the firstobs= and obs= commands. However, I want the subsets to be created "automatically". The reason for this is that I am extracting data from a particular website and each time I run the code the dataset is of different size, all I know is that each time I generate a dataset I want to create X subsets containing 8 rows (e.g the first subset will consist of rows 1-8, the second will consist of rows 9-16 and so on...).
So my question, how do I go about attacking this problem?
I don't know what the purpose of your subsets is, but it's generally a bad idea to split datasets up into multiple versions. The better approach is to create a variable in the original dataset that holds the subset number, that way it is easy to extract particular subsets or group by them later on.
Here's an easy way to do that, clearly the final subset will not have 8 rows if the number of records is not divisible by 8. I assumed from your question that 8 rows is a fixed amount regardless of record size.
data want;
set sashelp.citimon;
subset = ceil(_n_/8);
run;
Macro approach:
I just provided a rough code. Optimize it according to your requirements.
proc sql;
select count(*) into :total from source_data;
quit;
%macro create_subsets(count,ds);
%let cnt=%sysfunc(ceil(%sysevalf(&count/8)));
%let num=1;
%do i=1 %to &cnt;
%if(&i = &cnt) %then %do;
%let toread=&count;
%end;
%else %do;
%let toread=&num+7;
%end;
data &ds._&i;
set &ds(firstobs=&num obs=%eval(&toread));
run;
%let num=%eval(&num + 8);
%end;
%mend create_subsets;
%create_subsets(&total,source_data)
Edited based on the valuable comment from Robert.
Note: A non macro approach will always be easier and more efficient.

How do I work with a SAS file that was created in a different format (Linux/Windows) if I don't have access to machine that created it?

I have numerous SAS datasets on my Windows 7 / SAS 9.4 machine:
data_19921.sas7bdat
data_19922.sas7bdat
data_19923.sas7bdat
....
data_200212.sas7bdat
etc. The format is data_YYYYM.sas7bdat or data_YYYYMM.sas7bdat (the latter for two digit months) and every dataset has identical variables and formatting. I'm trying to iterate over all of these files and append them into one big SAS dataset. The datasets were created on some Unix machine elsewhere in my company that I don't have access to. When I try to append the files:
%let root = C:\data;
libname in "&raw\in";
libname out "&raw\out";
/*****************************************************************************/
* initialize final data set but don't add any observations to it;
data out.alldata;
set in.data_19921;
stop;
run;
%macro append_files;
%do year=1992 %to 2002;
%do month=1 %to 12;
proc append data=out.alldata base=in.data_&year&month;
run;
%end;
%end;
%mend;
%append_files;
I get errors that say
File in.data_19921 cannot be updated because its encoding does not match the session encoding or the file is in a format native to another host, such as LINUX_32, INTEL_ABI
I don't have access to the Unix machine that created these datasets (or any Unix machine right now), so I don't think I can use proc cport and proc cimport.
How can I read/append these data sets on my Windows machine?
You can use the colon operator or dash to shortcut your process:
data out.alldata;
set in.data_:;
run;
OR
data out.alldata;
set in.data19921-in.data200212;
run;
If you have variables that are different lengths you may get truncation. SAS will warn you though:
WARNING: Multiple lengths were specified for the variable name by input data set(s). This may
cause truncation of data.
One way to avoid it is to create a false table first, using the original table. The following SQL code will generate the CREATE Table statements and you can modify the lengths manually as needed.
proc sql;
describe table in.data19921;
quit;
*run create table statement here with modified lengths;
data out.alldata;
set fake_data (obs=0)
in.data19921-in.data200212;
run;

Sas macro with proc sql

I want to perform some regression and i would like to count the number of nonmissing observation for each variable. But i don't know yet which variable i will use. I've come up with the following solution which does not work. Any help?
Here basically I put each one of my explanatory variable in variable. For example
var1 var 2 -> w1 = var1, w2= var2. Notice that i don't know how many variable i have in advance so i leave room for ten variables.
Then store the potential variable using symput.
data _null_;
cntw=countw(&parameters);
i = 1;
array w{10} $15.;
do while(i <= cntw);
w[i]= scan((&parameters"),i, ' ');
i = i +1;
end;
/* store a variable globally*/
do j=1 to 10;
call symput("explanVar"||left(put(j,3.)), w(j));
end;
run;
My next step is to perform a proc sql using the variable i've stored. It does not work as
if I have less than 10 variables.
proc sql;
select count(&explanVar1), count(&explanVar2),
count(&explanVar3), count(&explanVar4),
count(&explanVar5), count(&explanVar6),
count(&explanVar7), count(&explanVar8),
count(&explanVar9), count(&explanVar10)
from estimation
;quit;
Can this code work with less than 10 variables?
You haven't provided the full context for this project, so it's unclear if this will work for you - but I think this is what I'd do.
First off, you're in SAS, use SAS where it's best - counting things. Instead of the PROC SQL and the data step, use PROC MEANS:
proc means data=estimation n;
var &parameters.;
run;
That, without any extra work, gets you the number of nonmissing values for all of your variables in one nice table.
Secondly, if there is a reason to do the PROC SQL, it's probably a bit more logical to structure it this way.
proc sql;
select
%do i = 1 %to %sysfunc(countw(&parameters.));
count(%scan(&parameters.,&i.) ) as Parameter_&i., /* or could reuse the %scan result to name this better*/
%end; count(1) as Total_Obs
from estimation;
quit;
The final Total Obs column is useful to simplify the code (dealing with the extra comma is mildly annoying). You could also put it at the start and prepend the commas.
You finally could also drive this from a dataset rather than a macro variable. I like that better, in general, as it's easier to deal with in a lot of ways. If your parameter list is in a data set somewhere (one parameter per row, in the dataset "Parameters", with "var" as the name of the column containing the parameter), you could do
proc sql;
select cats('%countme(var=',var,')') into :countlist separated by ','
from parameters;
quit;
%macro countme(var=);
count(&var.) as &var._count
%mend countme;
proc sql;
select &countlist from estimation;
quit;
This I like the best, as it is the simplest code and is very easy to modify. You could even drive it from a contents of estimation, if it's easy to determine what your potential parameters might be from that (or from dictionary.columns).
I'm not sure about your SAS macro, but the SQL query will work with these two notes:
1) If you don't follow your COUNT() functions with an identifier such as "COUNT() AS VAR1", your results will not have field headings. If that's ok with you, then you may not need to worry about it. But if you export the data, it will be helpful for you if you name them by adding "...AS "MY_NAME".
2) For observations with fewer than 10 variables, the query will return NULL values. So don't worry about not getting all of the results with what you have, because as long as the table you're querying has space for 10 variables (10 separate fields), you will get data back.