I am looking to export data from SAS to .csv with PROC EXPORT. I would like the name of the file to change based on the value of a variable. Is that possible?
PROC EXPORT DATA= WORK.A
OUTFILE= "c:\folders\filenameVAR1.csv"
DBMS=CSV LABEL REPLACE;
PUTNAMES=YES;
RUN;
I would like to append the value of a variable (e.g. VAR1) to the end of my file name as shown above, so that I get a file named filenameVAR1.csv; when the variable's value changes, the file would be called filenameVAR2.csv.
Thanks
In order to do this, you'll need to pull that value out into a macro variable. You can't directly use a data step variable in this manner, because PROC EXPORT doesn't interact with the data step (even if it makes use of it in some cases).
One question would be: when you say 'the value of a variable', what do you mean? If you mean "the value of one variable on one row", even if that value is actually present on all rows, you can do it pretty easily:
proc sql;
select max(var1) into :var1
from work.a
;
quit;
Then you have a &VAR1. macro variable you can insert into your export:
PROC EXPORT DATA= WORK.A
OUTFILE= "c:\folders\filename&VAR1..csv"
DBMS=CSV LABEL REPLACE;
PUTNAMES=YES;
RUN;
Note the extra . that terminates the macro variable.
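For illustration, a quick %put shows what that terminating dot does (the value here is assigned by hand just for the example):
%let var1=VAR1;                                   /* stand-in for the value the PROC SQL step would load */
%put Writing to: c:\folders\filename&VAR1..csv;   /* logs: Writing to: c:\folders\filenameVAR1.csv */
%put Without it: c:\folders\filename&VAR1.csv;    /* the single dot is consumed, leaving filenameVAR1csv */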
Now, if this value changes, and you want a new file for each set of rows with a common value for var1, then it's different. You can't do that directly in proc export, but since you're writing out a CSV you could do it yourself!
data _null_;
set work.a;
length file_w_Var1 $255;
file_w_var1 = cats('c:\folders\filename',VAR1,'.csv');
file a filevar=file_w_var1 dlm=',' dsd;
put
var1
var2 $
var3
var4
var5 $
var6
;
run;
(obviously with real variable names and appropriate $ and whatnot).
You could do the export once, copy the export code from the log into a program file, and just change the file statement as I do above (and add the code that creates that filename variable) in order to get it to work. For this to work the way you want, the data must be sorted by var1 (or at least grouped by it, if not fully sorted).
The FILEVAR= option tells SAS to take the output file location from a variable rather than from the fileref on the FILE statement itself - the a there is just a dummy fileref (unrelated to the name of your dataset). SAS opens a new file each time the variable's value changes.
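If you also want a header row at the top of each CSV, a variation like this sketch writes one whenever the output file changes; the variable names are the placeholder names from above, and it assumes work.a is sorted (or at least grouped) by var1:
data _null_;
set work.a;
by var1;
length file_w_var1 $255;
file_w_var1 = cats('c:\folders\filename',var1,'.csv');
file a filevar=file_w_var1 dlm=',' dsd;
if first.var1 then put 'var1,var2,var3,var4,var5,var6'; /* header row for each new file */
put var1 var2 var3 var4 var5 var6;
run;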
Sometimes if I import multiple SAV files into the SAS work library, one variable imported later on overwrites the display text (i.e., the format) of an earlier imported variable with a similar name.
I've determined that this is because the later dataset's variable produces a format name for the custom format (from SPSS Values Labels) that is identical to format name from the earlier variable, even though the variables have different definitions in the Value Labels attributes in the SAV files.
Is there a way to force SAS to not re-use the same format names by automatically checking at PROC IMPORT whether a format name already exists in the work library format library before auto-naming a new custom format? Or is there any other way of preventing this from happening?
Here is my code as well as an example of the variable names, format names, etc.
proc import out=Dataset1 datafile="S:\folder\Dataset1.SAV"
dbms=SAV replace;
run;
proc import out=DatasetA datafile="S:\folder\DatasetA.SAV"
dbms=SAV replace;
run;
Dataset1 contains variable Question_1. The original SPSS Value Labels are 1=Yes 2=No. When this dataset is imported, SAS automatically generates the format name QUESTION. for Question_1. When only Dataset1 is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_1 in Dataset1.SAV.
DatasetA contains variable Question_A with SPSS Value Labels 1=Agree 2=Unsure 3=Disagree. When this dataset is imported after Dataset1, SAS automatically generates the format name QUESTION. for Question_A, even though the work format catalog already contains a format named QUESTION., so this overwrites the definition of format QUESTION. that was generated when Dataset1 was imported. Once DatasetA is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_A in DatasetA.SAV.
Therefore, when Dataset1 and DatasetA are both imported, the variables Question_1 and Question_A both have the format QUESTION. assigned to them, and the definition of format QUESTION. in the WORK format catalog corresponds to the SPSS Value Labels in DatasetA.SAV, not Dataset1.SAV. As a result, Question_1 will display as 1=Agree 2=Unsure, even though its values actually mean 1=Yes 2=No.
I would ideally like these two variables to produce distinct custom format names automatically at the import step. Is there any way to make this happen? Alternatively, is there any other way to prevent this type of overwriting from occurring?
Thank you.
The way to prevent the formats from literally overwriting each other is to point each import at a different format catalog using the optional FMTLIB= statement.
proc import out=dataset1 replace
datafile="S:\folder\Dataset1.SAV" dbms=SAV
;
fmtlib=work.fmtcat1;
run;
proc import out=dataset2 replace
datafile="S:\folder\Dataset2.SAV" dbms=SAV
;
fmtlib=work.fmtcat2;
run;
You can then work later to rename the conflicting formats (and change the attached format in the dataset to use the new name).
If the member name and format name are short enough, you should be able to generate a unique new name by concatenating the two (with a separator in between to avoid conflicts). Something like this will rename the formats, change the format names attached to the variables, and rebuild the formats into the WORK.FORMATS catalog.
%macro sav_import(file,memname);
%if 0=%length(&memname) %then %let memname=%scan(&file,-2,\./);
proc import datafile=%sysfunc(quote(&file)) dbms=sav
out=&memname replace
;
fmtlib=work.&memname ;
run;
proc format lib=work.&memname cntlout=formats;
run;
data formats ;
set formats end=eof;
by fmtname type notsorted;
oldname=fmtname;
fmtname=catx('_',"&memname",oldname);
run;
proc contents data=&memname noprint out=contents;
run;
proc sql noprint;
select distinct catx(' ',c.name,cats(f.fmtname,'.'))
into :fmtlist separated by ' '
from contents c inner join formats f
on c.format = f.oldname
;
quit;
proc datasets nolist lib=work;
modify &memname;
format &fmtlist ;
run;
quit;
proc format lib=work.formats cntlin=formats;
run;
%mend sav_import;
%sav_import(S:\folder\Dataset1.SAV);
%sav_import(S:\folder\Dataset2.SAV);
I'm reading in a number of tables and would like to know the name of the table being processed so I can save it to my output table. Is there an automatic variable or some sort of flag that will help? I'm sure this can be done when reading in a list of CSV files etc. But these are data sets. Something like:
%let table_list=one two three;
Data whatever;
set &table_list;
table_name = ?????;
You need to use the INDSNAME= option on the SET statement. Look up the details.
INDSNAME=variable
creates and names a variable that stores the name of the SAS data set from which the current observation is read. The stored name can be a data set name or a physical name. The physical name is the name by which the operating environment recognizes the file.
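A minimal sketch applying it to your example (assuming data sets ONE, TWO, and THREE exist in WORK):
%let table_list=one two three;
data whatever;
set &table_list indsname=source; /* SOURCE holds e.g. WORK.ONE for the current observation */
length table_name $41;           /* long enough for libref.memname */
table_name = source;             /* copy it, since the INDSNAME variable itself is not kept in the output */
run;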
If you have just created a dataset in a previous proc or datastep, you can use the &SYSLAST automatic macro variable to retrieve its name.
If you want to save this as part of the metadata for a downstream data set, rather than storing it in a variable, one option is to assign a label to that dataset, e.g.
data input_ds;
a=1;
output;
run;
%put &SYSLAST;
data output_ds(label="created from &SYSLAST");
set input_ds;
b=1;
run;
%put &SYSLAST;
You can also use proc datasets to assign data set labels:
/*Modify an existing label*/
proc datasets lib = work;
modify output_ds(label="New label");
run;
quit;
You can retrieve a data set label using the attrc function.
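For example, a small sketch that reads the label back from output_ds (created above) with OPEN/ATTRC:
data _null_;
dsid = open('work.output_ds');   /* open the data set */
if dsid then do;
dslabel = attrc(dsid,'LABEL');   /* retrieve the data set label */
put 'Data set label: ' dslabel;
rc = close(dsid);
end;
run;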
How does one input all variables/columns within a data step using INPUT but without naming every variable? This can be done by naming each variable, for example:
DATA dataset;
INFILE '/folders/myfolders/file.txt';
INPUT variable1 variable2 variable3 variable4 $ variable5;
RUN;
However, this is very tedious for large datasets containing 200+ variables.
The original question implied that you already had a SAS data set. In that case all variables are automatically included when you SET the dataset.
data copy ;
set '/folders/myfolders/file.sas7bdat';
run;
Or just reference it in the analysis you want to do.
proc means data='/folders/myfolders/file.sas7bdat';
run;
If you actually have a TEXT file and you want to read it into a SAS dataset you could use PROC IMPORT to guess what is in the file. If it has a header row then proc import will try to convert those into valid variable names. It will also try to guess how to define the variables based on what values it sees in the text file.
proc import out=want datafile='/folders/myfolders/file.txt' dbms=dlm ;
delimiter=',';
run;
Or if the issue is that it is too hard to create 200 unique variable names, you could just use a variable list with numeric suffixes to save a lot of typing.
DATA dataset;
INFILE '/folders/myfolders/file.txt' dsd ;
length var1-var200 $20 ;
input var1-var200 ;
RUN;
I currently have a SAS process that generates multiple data sets (whether they have observations or not). I want to determine a way to control the export procedure based on the total number of observations (if nobs > 0, then export). My first attempt was something primitive using if/then logic comparing against a SELECT INTO macro variable that counts the observations in a data set:
DATA _NULL_;
SET A_EXISTS_ON_B;
IF &A_E > 0 THEN DO;
FILE "C:\Users\ME\Desktop\WORKLIST_T &PDAY..xls";
PUT TASK;
END;
RUN;
The issue here is that I don't have a way to write multiple data sets to the same workbook as separate sheets (or do I?).
In addition, whenever I try to add another DO block with similar logic, the execution fails. If this cannot be done with a DATA _NULL_, would ODS be the answer?
The core of what you want to do, conditionally execute code, can be done one of a number of ways.
Let's imagine we have a short macro that exports a dataset to excel. Simple as pie.
%macro export_to_excel(data=,file=,sheet=);
proc export data=&data. outfile=&file. dbms=excel replace;
sheet=&sheet.;
run;
%mend export_to_excel;
Now let's say we want to do this conditionally. How we do it depends, to some degree, on how we call this macro in our code now.
Let's say you have:
%let wherecondition=1; *always true!;
data class;
set sashelp.class;
if &wherecondition. then output;
run;
%export_to_excel(data=class,file="c:\temp\class.xlsx", sheet=class1);
Now you want to make this so it only exports if class has some rows in it, right. So you get the # of obs in class:
proc sql;
select count(1) into :classobs from class;
quit;
And now you need to incorporate that somehow. In this case, the easiest way is to add a condition to the export macro. Open code doesn't allow conditional execution of code, so it needs to be in a macro.
So we do:
%macro export_to_excel(data=,file=,sheet=,condition=1);
%if &condition. %then %do;
proc export data=&data. outfile=&file. dbms=excel replace;
sheet=&sheet.;
run;
%end;
%mend export_to_excel;
And you add the count to the call:
%export_to_excel(data=class,file="c:\temp\class.xlsx", sheet=class1,condition=&classobs.)
Tada, now it won't try to export when it's 0. Great.
If this code is already in a macro, you don't have to alter the export macro itself. You can simply put that %if %then part around the macro call. But that's only if the whole thing is already a macro - %if isn't allowed outside of macros (sorry).
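For example, if the call above already lives inside some larger macro, a sketch of that approach looks like this (the wrapper name is made up):
%macro run_exports;
%if &classobs. > 0 %then %do;
%export_to_excel(data=class,file="c:\temp\class.xlsx",sheet=class1)
%end;
%mend run_exports;
%run_exports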
Now, if you're exporting a whole bunch of datasets, and you're generating your export calls from something, you can add the condition there, more easily and more smoothly than this.
Basically, either build it by hand (if that makes sense), or use proc sql, proc contents, or another method of your choice to make a dataset that contains one row per dataset-to-export, with four variables: the dataset name, the file to export to, the sheet to export to (unless that's the same as the dataset name), and the count of observations for that dataset. Often the first three would be made by hand, and then merged/updated via SQL or something else with the count of obs per dataset.
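For example, one way to build that driver dataset is from DICTIONARY.TABLES; this is just a sketch, and the output path and the list of datasets to export are assumptions:
proc sql;
create table datasetwithnames as
select memname as dataname,
cats('"c:\temp\',memname,'.xlsx"') as filename, /* quotes included so the generated call stays quoted */
memname as sheetname,
nlobs as obsnum                                 /* observation count straight from the dictionary */
from dictionary.tables
where libname='WORK' and memname in ('CLASS','OTHERDATA');
quit;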
Then you can generate calls to export, like so:
proc sql;
select cats('%export_to_excel(data=',dataname,',file=',filename,',sheet=',sheetname,')')
into :explist separated by ' '
from datasetwithnames
where obsnum>0;
quit;
&explist.; *this actually executes them;
Assuming obsnum is the new variable you created with the # of obs, and the other variables are obviously named. That won't pull a line with anything with 0 observations - so it never tries to execute the export. That works with the initial export macro just as well as with the modified one.
Suggest you google around for different approaches to writing XLS files.
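One option in more recent SAS versions is ODS EXCEL, which can write several data sets into one workbook as separate sheets. A rough sketch (the path and the second data set name are made up):
ods excel file="C:\Users\ME\Desktop\WORKLIST.xlsx";
ods excel options(sheet_name="Task A");
proc print data=a_exists_on_b noobs;
run;
ods excel options(sheet_name="Task B");
proc print data=b_exists_on_a noobs; /* hypothetical second data set */
run;
ods excel close;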
Regarding using a DATA step or PROC step: the DATA step is tolerant of datasets that have 0 obs. If the SET statement reads a dataset that has 0 obs, it will simply end the step, so you don't need special logic. Most PROCs also accommodate 0-obs datasets without throwing a warning or error.
For example:
1218 *Make a 0 obs dataset;
1219 data empty;
1220 x=1;
1221 stop;
1222 run;
NOTE: The data set WORK.EMPTY has 0 observations and 1 variables.
1223
1224 data want;
1225 put "I run before SET statement.";
1226 set empty;
1227 put "I do not run after SET statement.";
1228 run;
I run before SET statement.
NOTE: There were 0 observations read from the data set WORK.EMPTY.
NOTE: The data set WORK.WANT has 0 observations and 1 variables.
1229
1230 proc print data=empty;
1231 run;
NOTE: No observations in data set WORK.EMPTY.
But note, as Joe points out, PROC EXPORT will happily export a dataset with 0 obs and write a file with 0 records, overwriting the file if it was already there. e.g.:
1582 proc export data=sashelp.class outfile="d:\junk\class.xls";
1583 run;
NOTE: File "d:\junk\class.xls" will be created if the export process succeeds.
NOTE: "CLASS" range/sheet was successfully created.
1584
1585 data class;
1586 stop;
1587 set sashelp.class;
1588 run;
NOTE: The data set WORK.CLASS has 0 observations and 5 variables.
1589
1590 *This will replace class.xls;
1591 proc export data=class outfile="d:\junk\class.xls" replace;
1592 run;
NOTE: "CLASS" range/sheet was successfully created.
ODS statements would likely do the same.
I use a macro to check if a dataset is empty, based on SO answers like:
How to detect how many observations in a dataset (or if it is empty), in SAS?
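A minimal sketch of such a check, along the lines of those answers (the macro name is my own):
%macro nobs(ds);
%local dsid n rc;
%let n=0;
%let dsid=%sysfunc(open(&ds));
%if &dsid %then %do;
%let n=%sysfunc(attrn(&dsid,nlobs)); /* number of logical observations */
%let rc=%sysfunc(close(&dsid));
%end;
&n
%mend nobs;
%put NOTE: sashelp.class has %nobs(sashelp.class) observations;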
I found a macro and have been using it to import datasets that are given to me in CSV format. Now I need to edit it because the CSV file names contain an ID number and I want SAS datasets with the same names.
The CSVs are named things like IDSTUDY233_first.csv, so I want the SAS dataset to be IDSTUDY233_first. It should appear in my work library.
I thought it would just create a SAS dataset for each CSV, named IDSTUDY233_first or something like that (and so on for each additional study). However, it's naming them this way:
IDSTUDY_FIRST
and overwrites itself for every ID. I am newer to macros and have been trying to figure out WHY it does this and how to fix it. Suggestions?
%let subdir=Y:\filepath\; *MACRO VARIABLE FOR FILEPATH;
filename dir "&subdir.*.csv "; *give the file the name from the path that your at whatever the csv is named;
data new; *create the dataset new it has all those filepath names csv names;
length filename fname $ 200;
infile dir eof=last filename=fname;
input ;
last: filename=fname;
run;
proc sort data=new nodupkey; *sort but don't keep duplicate files;
by filename;
run;
data null; *create the dataset null;
set new;
call symputx(cats('filename',_n_),filename); *store the file path for row n in macro variable FILENAMEn;
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka')); *derive the output dataset name from the file name and store it in macro variable DSNn;
call symputx('nobs',_n_); *store the running count, ending up as the total number of files;
run;
%put &nobs.; *write the number of files to the log;
%macro import; *start the macro import;
%do i=1 %to &nobs; *loop over each file;
proc import datafile="&&filename&i" out=&&dsn&i dbms=csv replace;
getnames=yes;
run;
%end;
%mend import;
%import
*call import macro;
As you can see, I added comments with my understanding. Like I said, macros are new to me, so I may be wrong. I am guessing the problem is either in
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));
or in the import statement, probably out=&&dsn&i, since it rapidly overwrites the previous SAS dataset until it has done every one. I need all the SAS datasets, not just the last one.
My guess is that you are right; it has to do with this line:
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));
The gotcha is in the arguments passed to compress. Compress can be used to remove or keep certain characters in a string. In the above example, they are using it to just keep alphabetic characters by passing in the 'ka' modifiers. This is effectively causing files with different names (because they have different numbers) to be treated as the same file.
You can modify this behaviour to keep alphabetic characters, digits, and the underscore character by changing the modifiers from 'ka' to 'kn'.
This change does mean that you also need to make sure that none of your file names begin with a number (as SAS dataset names can't begin with a number).
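A quick illustration of the difference, using a made-up path in the same shape as yours:
data _null_;
filename='Y:\filepath\IDSTUDY233_first.csv';
ka=compress(scan(filename,-2,'\.'), ,'ka'); /* keeps letters only           -> IDSTUDYfirst     */
kn=compress(scan(filename,-2,'\.'), ,'kn'); /* keeps letters, digits, and _ -> IDSTUDY233_first */
put ka= kn=;
run;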
The documentation for the compress function is here:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
An easy way to debug this would be to take the dataset with all of the call symput statements, and in addition to storing these values in macro variables, write them to variables in the dataset. Also change it from a data _null_ to a data tmp statement. You can then see for each file what the destination table name will be.
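A sketch of that debugging step applied to the code in the question (the data set name tmp is arbitrary, and the 'kn' fix is already applied; use 'ka' to see the original behaviour):
data tmp;
set new;
dsn=compress(scan(filename,-2,'\.'), ,'kn'); /* the value that becomes the OUT= dataset name */
call symputx(cats('filename',_n_),filename);
call symputx(cats('dsn',_n_),dsn);
call symputx('nobs',_n_);
run;
proc print data=tmp;
var filename dsn;
run;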