SAS: Change dataset with loop count into append statement - sas

TOPIC: Change dataset with loop count into append statement
I have a macro that will loop and create a new dataset with a counter behind.
Code like this:
PROC IMPORT OUT=WORK.out&i DATAFILE= "&dir/&name"
/excelout/
DBMS=csv REPLACE; delimiter='09'x; getnames=no; RUN;
data test&i (drop= %do k=1 %to &cnt; &&col&k.. %end;
);
length station $10 voltage $10 year 8 month $20 transformer $10 Day $20 Date Time MW_Imp MW_Exp MVAR_Imp MVAR_Exp MVA
Power_Factor 8; format Time hhmm.; set out&i. end=last;
Currently the script will generate about 4 data sets if i have 4 external files by PROC IMPORT.
What i want is to eliminate the creation of multiple datasets but just append them into the master file. Is there a way to do so?

An append statement inside the loop should be sufficient to achieve this. SAS will copy first dataset as base since it was not existing.
proc append base=test data=test&i force; run;

Appending is probably just as easy, but if you don't want to create many datasets to begin with, you could use a data step to read in several files at once, using wildcards. That would eliminate the need to loop through the files, but does require that the files have the same structure and aren't stored in a folder with other similarly named files. The firstobs-option caused som issues in my tests, but as you have specified getnames=no in your import, I guess you have no need for it.
The snippet below inputs all csv files in c:\test.
data test;
infile "c:\test\*.csv" dsd delimiter='09'x;
input varA $ varB $;
run;

Related

Insert (internally existing) column headers as first row to a table

Assume that we have a table INPUT_TABLE which has four columns name, lat, lon, and z, filled with many data sets. In the SAS Explorer it would e.g. look like this:
name lat lon z
1 Germany 49.420469 8.7269178 17
2 England 51.5540693 -0.8249039 16
...
I handover a PREPROCESSED_TABLE based on this INPUT_TABLE to a macro %tabl:
data V42.PREPROCESSED_TABLE;
set V21.INPUT_TABLE;
drop NAME;
run;
%tabl(libin=V42, file=PREPROCESSED_TABLE);
The macro itself I am not allowed to modify.
Among other things, %tabl also writes a plain text file PREPROCESSED_TABLE.txt:
49.420469|8.7269178|17
51.5540693|-0.8249039|16
I would like to have the header names written out as well, e.g.:
lat|lon|z
49.420469|8.7269178|17
51.5540693|-0.8249039|16
My idea is to expand the PREPROCESSED_TABLE somewhere in the data step - could somebody help me with that, please? How can I read out the header names which are internally stored?
If the goal is to make a file with one line with the variable names then just write the file yourself. First get the names into a dataset (in order) and then write them. For example you could use PROC TRANSPOSE with OBS=0 dataset option to generate a file with one observation per variable.
proc transpose data=V42.PREPROCESSED_TABLE(obs=0) out=NAMES ;
var _all_ ;
run;
Which you can then use to write to a file.
data _null_;
set names ;
file 'preprocessed.txt' dsd dlm='|';
put _name_ # ;
run;
If you also want to add the data to that same file just use a second data step. Make sure to use the MOD option on the FILE statement so that data lines are appended to the existing file.
data _null_;
set V42.PREPROCESSED_TABLE;
file 'preprocessed.txt' dsd dlm='|' mod;
put (_all_) (+0);
run;
If you need to call the existing macro for other reasons you could either ignore the file it creates. Or if for some reason the content is different than just the simple dump of the file then you could just concatenate the file with the the headers with the file the macro generates. Say the macro generated 'PREPROCESSED_TABLE.txt' and your code generated the one line file 'headers.txt'. Then this step will read both and write 'PREPROCESSED_TABLE_w_headers.txt';
data _null_;
file 'PREPROCESSED_TABLE_w_headers.txt';
if _n_=1 then do;
infile 'headers.txt';
input;
put _infile_;
end;
infile 'PREPROCESSED_TABLE.txt';
input;
put _infile_;
run;
Given Reeza's and Tom's hints, I figured out a workaround myself: We simple call out macro %tabl twice, once with a 1-row-table with column-names and once with the data. This approach essentially corresponds to attaching to the file first the headers and then then data to the file (except that I have to worry about additional things added by %tabl further down in the process chain).
The technical difficulty I had was how to extract this 1-row-table with column names from the meta-info of the table input table V21.INPUT_TABLE.
My team mate showed me how that is done. To make it testable for everybody, I will show this step for the test data table sashelp.class:
proc contents data=sashelp.class out=meta (keep=NAME VARNUM) noprint;
run;
proc sort data=meta out=meta2;
by VARNUM;
run;
proc transpose data=meta2 out=colheaders (drop=_NAME_ _LABEL_);
var name;
run;
As a result, we will have a table colheaders with exactly one line containing the table headers, sorted by VARNUM which is the order in which they appear in the original table:
COL1 COL2 COL3 COL4 COL5
1 NAME SEX AGE HEIGHT WEIGHT
Problem solved, at least theoretically.

Need to set unknown amount of datasets together

I have a macro that creates multiple datasets. The number of datasets depends on other factors and could be 0-X. I need to set all of these datasets together into one dataset to export.
%macro runpromo(setid=, title=, start=, end=, no=);
%get_offer_data(label=&title , start=date &start, end=date &end, rc=%quote(&setid), source=1);
data promo&no.;
length item_desc $50.0;
set edw_final;
run;
%mend runpromo;
data _null_;
set macros;
call execute('%runpromo(setid='||code||',title='||promo_title||',start ='||start_date||',end='||end_date||',no='||count||');');
run;
data all_promos;
length item_desc $50.0;
set promo1-promoX;
run;
I want to automate this code to run daily so I don't want to have to go in and update the dataset names each time.
Use a naming convention where each data set has the same prefix and goes to a single library. Then you can stack them easily at the end:
data want;
set prom:;
run;
The colon (:) acts as a wild card and all datasets with the prefix prom in the work library will be combined.
Why not just build the larger dataset as you go?
Macro Definition:
%macro runpromo(setid=, title=, start=, end=, no=);
%get_offer_data(label=&title,start=date &start,end=date &end,rc=%quote(&setid),source=1);
data promo_fix;
length item_desc $50;
set edw_final;
run;
proc append base=all_promos data=promo_fix force;
run;
%mend runpromo;
Program:
proc delete data=all_promos;
run;
data _null_;
set macros;
call execute(cats('%nrstr(%runpromo)'
,'(setid=',code
,',title=',promo_title
,',start =',start_date
,',end=',end_date
,',no=',count
,');'
));
run;

Check if a value exists in a dataset or column

I have a macro program with a loop (for i in 1 to n). With each i i have a table with many columns - variables. In these columns, we have one named var (who has 3 possible values: a b and c).
So for each table i, I want to check his column var if it exists the value "c". If yes, I want to export this table into a sheet of excel. Otherwise, I will concatenate this table with others.
Can you please tell me how can I do it?
Ok, in your macro at step i you have to do something like this
proc sql;
select min(sum(case when var = 'c' then 1 else 0 end),1) into :trigger from table_i;
quit;
then, you will get macro variable trigger equal 1 if you have to do export, and 0 if you have to do concatenetion. Next, you have to code something like this
%if &trigger = 1 %then %do;
proc export data = table_i blah-blah-blah;
run;
%end;
%else %do;
data concate_data;
set concate_data table_i;
run;
%end;
Without knowing the whole nine yard of your problem, I am at risk to say that you may not need Macro at all, if you don't mind exporting to .CSV instead of native xls or xlsx. IMHO, if you do 'Proc Export', meaning you can't embed fancy formats anyway, you 'd better off just use .CSV in most of the settings. If you need to include column headings, you need to tap into metadata (dictionary tables) and add a few lines.
filename outcsv '/share/test/'; /*define the destination for CSV, change it to fit your real settings*/
/*This is to Cat all of the tables first, use VIEW to save space if you must*/
data want1;
set table: indsname=_dsn;
dsn=_dsn;
run;
/*Belowing is a classic 2XDOW implementation*/
data want;
file outcsv(test.csv) dsd; /*This is your output CSV file, comma delimed with quotes*/
do until (last.dsn);
set want1;
by dsn notsorted; /*Use this as long as your group is clustered*/
if var='c' then _flag=1; /*_flag value will be carried on to the next DOW, only reset when back to top*/
end;
do until (last.dsn);
set want1;
by dsn notsorted;
if _flag=1 then put (_all_) (~); /*if condition meets, output to CSV file*/
else output; /*Otherwise remaining in the Cat*/
end;
drop dsn _flag;
run;

Set the labels of a SAS Dataset equal to their variable name

I'm working with a rather large several dataset that are provided to me as a CSV files. When I attempt to import one of the files the data will come in fine but, the number of variables in the file is too large for SAS, so it stops reading the variable names and starts assigning them sequential numbers. In order to maintain the variable names off of the data set I read in the file with the data row starting on 1 so it did not read the first row as variable names -
proc import file="X:\xxx\xxx\xxx\Extract\Live\Live.xlsx" out=raw_names dbms=xlsx replace;
SHEET="live";
GETNAMES=no;
DATAROW=1;
run;
I then run a macro to start breaking down the dataset and rename the variables based on the first observations in each variable -
%macro raw_sas_datasets(lib,output,start,end);
data raw_names2;
raw_names;
if _n_ ne 1 then delete;
keep A -- E &start. -- &end.;
run;
proc transpose data=raw_names2 out=raw_names2;
var A -- &end.;
run;
data raw_names2;
set raw_names2;
col1=compress(col1);
run;
data raw_values;
set raw;
keep A -- E &start. -- &end.;
run;
%macro rename(old,new);
data raw_values;
set raw_values;
rename &old.=&new.;
run;
%mend rename;
data _null_;
set raw_names2;
call execute('%rename('||_name_||","||col1||")");
run;
%macro freq(var);
proc freq data=raw_values noprint;
tables &var. / out=&var.;
run;
%mend freq;
data raw_names3;
set raw_names2;
if _n_ < 6 then delete;
run;
data _null_;
set raw_names3;
call execute('%freq('||col1||")");
run;
proc sort data=raw_values;
by StudySubjectID;
run;
data &lib..&output.;
set raw_values;
run;
%mend raw_sas_datasets;
The problem I'm running into is that the variable names are now all set properly and the data is lined up correctly, but the labels are still the original SAS assigned sequential numbers. Is there any way to set all of the labels equal to the variable names?
If you just want to remove the variable labels (at which point they default to the variable name), that's easy. From the SAS Documentation:
proc datasets lib=&lib.;
modify &output.;
attrib _all_ label=' ';
run;
I suspect you have a simpler solution than the above, though.
The actual renaming step needs to be done differently. Right now it's rewriting the entire dataset over and over again - for a lot of variables that is a terrible idea. Get your rename statements all into one datastep, or into a PROC DATASETS, or something else. Look up 'list processing SAS' for details on how to do that; on this site or on google you will find lots of solutions.
You likely can get SAS to read in the whole first line. The number of variables isn't the problem; it is probably the length of the line. There's another question that I'll find if I can on this site from a few months ago that deals with this exact problem.
My preferred option is not to use PROC IMPORT for CSVs anyway; I would suggest writing a metadata table that stores the variable names and the length/types for the variables, then using that to write import code. A little more work at first, but only has to be done once per study and you guarantee PROC IMPORT isn't making silly decisions for you.
In the library sashelp is a table vcolumn. vcolumn contains all the names of your variables for each library by table. You could write a macro that puts all your variable names into macro variables and then from there set the label.
Here's some code that I put together (not very pretty) but it does what you're looking for:
data test.label_var;
x=1;
y=1;
label x = 'xx';
label y = 'yy';
run;
proc sql noprint;
select count(*) into: cnt
from sashelp.vcolumn
where memname = 'LABEL_VAR';quit;
%let cnt = &cnt;
proc sql noprint;
select name into: name1 - :name&cnt
from sashelp.vcolumn
where memname = 'LABEL_VAR';quit;
%macro test;
%do i = 1 %to &cnt;
proc datasets library=test nolist;
modify label_var;
label &&name&i=&&name&i;
quit;
%end;
%mend test;
%test;

Need to repeate data reading from multiple files using sas and run freqs on separate dataset created from separate files

I am new to SAS and facing few difficulties while creating following program.
My requirement is to pass the filename generated dynamically and read it so that don't have to write code five times to read data from 5 different files and then run freqs on the datasets.
I have provided the code below and have to write this code for more than 50 files:
Code
filename inp1 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1102_t1102_c10216_vEL5535.raw';
filename inp2 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1103_t1103_c10317_vEL8312.raw';
filename inp3 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1104_t1104_c10420_vEL11614.raw';
filename inp4 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1105_t1105_c10510_vEL13913.raw';
filename inp5 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1106_t1106_c10628_vEL17663.raw';
data test;
Do i = 1 to 5;
infile_name = 'inp' || i;
infile infile_name recfm = v lrecl=1800 end=eof truncover;
INPUT
#1 E_CUSTDEF1_CLIENT_ID $CHAR5.
#1235 E_MED_PLAN_CODE $CHAR20.
#1090 MED_INS_ELIG_COVERAGE_IND $CHAR20.
#1064 MED_COVERAGE_BEGIN_DATE $CHAR8.
#1072 MED_COVERAGE_TERM_DATE $CHAR8.
;
if E_CUSTDEF1_CLIENT_ID ='00002' then
output test;
end;
run;
proc freq data = test;
tables E_CUSTDEF1_CLIENT_ID*E_MED_PLAN_CODE / list missing;
run;
Please help!!
Here's an example you can adapt. There are different ways to do this, but this is one- depending no how you want the frequencies.
Step 1: Create a dataset, 'my_filenames', that stores the filename you want to read in, one per line, in a variable FILE_NAME.
Step 2: Read in the files.
data my_data;
set my_filenames;
infile a filevar=file_name <the rest of your options>;
<your input statement>;
run;
proc freq data=mydata;
by file_name;
<your table statements>;
run;
This is simple, data driven code that doesn't require macros or storing large amounts of data in things that shouldn't have data in them (macro variables, filenames, etc.)
To directly answer your question, here is a SAS macro to read each file and run PROC FREQ:
%macro freqme(dsn);
data test;
infile "&dsn" recfm = v lrecl=1800 end=eof truncover;
INPUT #1 E_CUSTDEF1_CLIENT_ID $CHAR5.
#1235 E_MED_PLAN_CODE $CHAR20.
#1090 MED_INS_ELIG_COVERAGE_IND $CHAR20.
#1064 MED_COVERAGE_BEGIN_DATE $CHAR8.
#1072 MED_COVERAGE_TERM_DATE $CHAR8.
;
if E_CUSTDEF1_CLIENT_ID = '00002';
run;
proc freq data=test;
tables E_CUSTDEF1_CLIENT_ID*E_MED_PLAN_CODE / list missing;
run;
proc delete data=test;
run;
%mend;
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1102_t1102_c10216_vEL5535.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1103_t1103_c10317_vEL8312.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1104_t1104_c10420_vEL11614.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1105_t1105_c10510_vEL13913.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1106_t1106_c10628_vEL17663.raw);
Note that I added a PROC DELETE step to delete the SAS data set after creating the report. I did that more for illustration, since you don't say you need the file as a SAS data set for subsequent processing.
You can use this as a template for other macro programming.