SAS - Input all variables in a data step without naming every variable

How does one input all variables/columns within a data step using INPUT but without naming every variable? This can be done by naming each variable, for example:
DATA dataset;
INFILE '/folders/myfolders/file.txt';
INPUT variable1 variable2 variable3 variable4 $ variable5;
RUN;
However, this is very tedious for large datasets containing 200+ variables.

The original question implied that you already had a SAS data set. In that case all variables are automatically included when you SET the dataset.
data copy ;
set '/folders/myfolders/file.sas7bdat';
run;
Or just reference it in the analysis you want to do.
proc means data='/folders/myfolders/file.sas7bdat';
run;
If you actually have a TEXT file and you want to read it into a SAS dataset you could use PROC IMPORT to guess what is in the file. If it has a header row then proc import will try to convert those into valid variable names. It will also try to guess how to define the variables based on what values it sees in the text file.
proc import out=want datafile='/folders/myfolders/file.txt' dbms=dlm ;
delimiter=',';
run;
Or, if the issue is that it is too hard to create 200 unique variable names, you could just use a variable list with numeric suffixes to save a lot of typing.
DATA dataset;
INFILE '/folders/myfolders/file.txt' dsd ;
length var1-var200 $20 ;
input var1-var200 ;
RUN;

Related

SAS PROC IMPORT Multiple SAV Files- Force SPSS Value Labels to Create UNIQUE SAS Format Names

Sometimes if I import multiple SAV files into the SAS work library, one variable imported later on overwrites the display text (i.e., the format) of an earlier imported variable with a similar name.
I've determined that this is because the later dataset's variable produces a format name for the custom format (from SPSS Values Labels) that is identical to format name from the earlier variable, even though the variables have different definitions in the Value Labels attributes in the SAV files.
Is there a way to force SAS to not re-use the same format names by automatically checking at PROC IMPORT whether a format name already exists in the work library format library before auto-naming a new custom format? Or is there any other way of preventing this from happening?
Here is my code as well as an example of the variable names, format names, etc.
proc import out=Dataset1 datafile="S:\folder\Dataset1.SAV"
dbms=SAV replace;
run;
proc import out=DatasetA datafile="S:\folder\DatasetA.SAV"
dbms=SAV replace;
run;
Dataset1 contains variable Question_1. The original SPSS Value Labels are 1=Yes 2=No. When this dataset is imported, SAS automatically generates the Format Name QUESTION. for Question_1. When only Dataset1 is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_1 in Dataset1.SAV.
DatasetA contains variable Question_A with SPSS Value Labels 1=Agree 2=Unsure 3=Disagree. When this dataset is imported after Dataset1, SAS automatically generates the Format Name QUESTION. for Question_A, even though the work library already contains a format named QUESTION.. Therefore, this overwrites the definition of format QUESTION. that was generated when Dataset1 was imported. Once DatasetA is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_A in DatasetA.SAV
Therefore, when Dataset1 and DatasetA are both imported, Variable Question_1 and Question_A both have the format name QUESTION assigned to them - And the definition of the format QUESTION. in the SAS work folder corresponds to the SPSS Value Labels in DatasetA.SAV, not Dataset1.SAV. Therefore, Question_1 will display as 1=Agree 2=Unsure, even though the variable values actually mean 1=Yes 2=No.
Ideally, I would like these two variables to produce distinct custom format names automatically at their import step. Is there any way to make this happen? Alternatively, is there any other way to prevent this type of overwriting from occurring?
Thank you.
The way to prevent literal overwriting is to point to a different format catalog for each SPSS file that is being read using the FMTLIB= optional statement.
proc import out=dataset1 replace
datafile="S:\folder\Dataset1.SAV" dbms=SAV
;
fmtlib=work.fmtcat1;
run;
proc import out=dataset2 replace
datafile="S:\folder\Dataset2.SAV" dbms=SAV
;
fmtlib=work.fmtcat2;
run;
You can then work later to rename the conflicting formats (and change the attached format in the dataset to use the new name).
So if the member name and format name are short enough you should be able to generate a unique new name by appending the two (add something in between to avoid conflict). So something like this will rename the formats, change the format name attached to the variables and rebuild the formats into the WORK.FORMATS catalog.
%macro sav_import(file,memname);
%if 0=%length(&memname) %then %let memname=%scan(&file,-2,\./);
proc import datafile=%sysfunc(quote(&file)) dbms=sav
out=&memname replace
;
fmtlib=work.&memname ;
run;
proc format lib=work.&memname cntlout=formats;
run;
data formats ;
set formats end=eof;
by fmtname type notsorted;
oldname=fmtname;
fmtname=catx('_',"&memname",oldname);
run;
proc contents data=&memname noprint out=contents;
run;
proc sql noprint;
select distinct catx(' ',c.name,cats(f.fmtname,'.'))
into :fmtlist separated by ' '
from contents c inner join formats f
on c.format = f.oldname
;
quit;
proc datasets nolist lib=work;
modify &memname;
format &fmtlist ;
run;
quit;
proc format lib=work.formats cntlin=formats;
run;
%mend sav_import;
%sav_import(S:\folder\Dataset1.SAV);
%sav_import(S:\folder\Dataset2.SAV);

Naming exported files with variable in SAS

I am looking to export data from SAS to .csv with PROC EXPORT. I would like the name of the file to change based on the value of a variable. Is that possible?
PROC EXPORT DATA= WORK.A
OUTFILE= "c:\folders\filenameVAR1.csv"
DBMS=CSV LABEL REPLACE;
PUTNAMES=YES;
RUN;
I would like to add the value of a variable (eg. VAR1) at the end of my file name as shown above. I would like a file named filenameVAR1.csv and then when I change the variable it would be called filenameVAR2.csv.
Thanks
In order to do this, you'll need to pull that value out into a macro variable. You can't directly use a data step variable in this manner, because PROC EXPORT doesn't interact with the data step (even if it makes use of it in some cases).
One question would be, when you say ' the value of a variable ' what do you mean? If you mean "the value of one variable on one row", even if that's actually present on all rows, you can do it pretty easily:
proc sql noprint;
select max(var1) into :var1 trimmed
from work.a
;
quit;
Then you have a &VAR1. macro variable you can insert into your export:
PROC EXPORT DATA= WORK.A
OUTFILE= "c:\folders\filename&VAR1..csv"
DBMS=CSV LABEL REPLACE;
PUTNAMES=YES;
RUN;
Note the extra period: the first . terminates the macro variable reference, and the second is the literal . before csv.
Now, if this value changes, and you want a new file for each set of rows with a common value for var1, then it's different. You can't do that directly in proc export, but since you're writing out a CSV you could do it yourself!
data _null_;
set work.a;
length file_w_Var1 $255;
file_w_var1 = cats('c:\folders\filename',VAR1,'.csv');
file a filevar=file_w_var1 dlm=',' dsd;
put
var1
var2 $
var3
var4
var5 $
var6
;
run;
(obviously with real variable names and appropriate $ and whatnot).
You could do the export, copy the export code from the log into a program file, and just change the file statement as I do above (and add the creation of that variable code) in order to get it to work. For this to work how you want it must be sorted by var1 (at least grouped by it, if not sorted).
The filevar option tells SAS to take the output file location from a variable instead of from whatever is named on the file statement, so the a there is only a placeholder (unrelated to the name of your dataset). SAS opens a new file each time the value of that variable changes.

SAS - keep name of table being processed

I'm reading in a number of tables and would like to know the name of the table being processed so I can save it to my output table. Is there an automatic variable or some sort of flag that will help? I'm sure this can be done when reading in a list of CSV files etc. But these are data sets. Something like:
%let table_list=one two three;
Data whatever;
set &table_list;
table_name = ?????;
You need to use the INDSNAME= option on the SET statement. Look up the details.
INDSNAME=variable
creates and names a variable that stores the name of the SAS data set from which the current observation is read. The stored name can be a data set name or a physical name. The physical name is the name by which the operating environment recognizes the file.
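As a minimal sketch of how this fits the question's code: the three small input tables and the variable name source_ds below are made up for illustration. The variable named on INDSNAME= is temporary and is not written to the output, so its value is copied into a regular variable.

```sas
/* Hypothetical input tables, only so the sketch runs on its own */
data one;   x = 1; run;
data two;   x = 2; run;
data three; x = 3; run;

%let table_list = one two three;

data whatever;
  /* libref (up to 8) + period + member name (up to 32) = 41 characters */
  length table_name $41;
  set &table_list indsname=source_ds;
  /* source_ds is dropped automatically, so keep a copy */
  table_name = source_ds;
run;

proc print data=whatever;
run;
```

The stored value includes the libref (for example WORK.ONE). If only the member name is wanted, scan(source_ds, 2, '.') should return it.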
If you have just created a dataset in a previous proc or datastep, you can use the &SYSLAST automatic macro variable to retrieve its name.
If you want to save this as part of the metadata for a downstream data set, rather than storing it in a variable, one option is to assign a label to that dataset, e.g.
data input_ds;
a=1;
output;
run;
%put &SYSLAST;
data output_ds(label="created from &SYSLAST");
set input_ds;
b=1;
run;
%put &SYSLAST;
You can also use proc datasets to assign data set labels:
/*Modify an existing label*/
proc datasets lib = work;
modify output_ds(label="New label");
run;
quit;
You can retrieve a data set label using the attrc function.
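A short sketch of that lookup in macro code, assuming work.output_ds from the example above exists. The macro variable names dsid, ds_label and rc are arbitrary.

```sas
/* Open the data set, read its LABEL attribute, then close it again */
%let dsid     = %sysfunc(open(work.output_ds));
%let ds_label = %sysfunc(attrc(&dsid, label));
%let rc       = %sysfunc(close(&dsid));

%put Data set label: &ds_label;
```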

Stop sas macro from overwriting different imported csv files as the same sas dataset

I found a macro and have been using it to import datasets that are given to me in csv format. Now I need to edit it because I have datasets that have an id number in them and I want sas datasets with the same name.
The CSVs are named things like IDSTUDY233_first.csv, so I want the SAS dataset to be IDSTUDY233_first. It should appear in my work folder.
I thought it would just create a SAS dataset for each CSV, named IDSTUDY233_first or something like that (and so on for each additional study). However, it is naming them this way:
IDSTUDY_FIRST
and it overwrites itself for every ID. I am newer to macros and have been trying to figure out WHY it does this and how to fix it. Suggestions?
%let subdir=Y:\filepath\; *MACRO VARIABLE FOR FILEPATH;
filename dir "&subdir.*.csv "; *give the file the name from the path that your at whatever the csv is named;
data new; *create the dataset new it has all those filepath names csv names;
length filename fname $ 200;
infile dir eof=last filename=fname;
input ;
last: filename=fname;
run;
proc sort data=new nodupkey; *sort but don't keep duplicate files;
by filename;
run;
data null; *create the dataset null;
set new;
call symputx(cats('filename',_n_),filename); *call the file name for this observation n;
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka')); *call the dataset for this file compress then read the file;
call symputx('nobs',_n_); *call for the number of observations;
run;
%put &nobs.; *but each observation in;
%macro import; *start the macro import;
%do i=1 %to &nobs; *Do for each fie to number of observations;
proc import datafile="&&filename&i" out=&&dsn&i dbms=csv replace;
getnames=yes;
run;
%end;
%mend import;
%import
*call import macro;
As you can see I added my comments of my understanding. Like I said macros are new to me. I may be incorrect in my understanding. I am guessing the problem is either in
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));
or it is in the import statement, probably out=&&dsn&i, since it rapidly overwrites the previous SAS files until it has processed every one. I need all of the SAS files, not just the last one.
My guess is that you are right, it is to do with this line:
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));
The gotcha is in the arguments passed to compress. Compress can be used to remove or keep certain characters in a string. In the above example, they are using it to just keep alphabetic characters by passing in the 'ka' modifiers. This is effectively causing files with different names (because they have different numbers) to be treated as the same file.
You can modify this behaviour to keep alphabetic characters, digits, and the underscore character by changing the parameters from ka to kn.
This change does mean that you also need to make sure that none of your file names begin with a number (as SAS dataset names can't begin with a number).
The documentation for the compress function is here:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
An easy way to debug this would be to take the dataset with all of the call symput statements, and in addition to storing these values in macro variables, write them to variables in the dataset. Also change it from a data _null_ to a data tmp statement. You can then see for each file what the destination table name will be.
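A sketch of that debugging step, assuming the dataset new from the question's code already exists. The dataset name tmp and the variable dsn are illustrative, and the 'kn' modifiers are the change suggested above.

```sas
data tmp;
  set new;
  length dsn $32;
  /* 'kn' keeps letters, digits and underscores, so the ID number survives */
  dsn = compress(scan(filename, -2, '\.'), , 'kn');
  call symputx(cats('filename', _n_), filename);
  call symputx(cats('dsn', _n_), dsn);
  call symputx('nobs', _n_);
run;

/* Inspect which dataset name each CSV file will be imported into */
proc print data=tmp;
  var filename dsn;
run;
```

For a file such as IDSTUDY233_first.csv, dsn should come out as IDSTUDY233_first rather than the digit-free name that 'ka' produces.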

Running All Variables Through a Function in SAS

I am new to SAS and need to sgplot 112 variables. The variable names are all very different and may change over time. How can I call each variable in the statement without having to list all of them?
Here is what I have done so far:
%macro graph(var);
proc sgplot data=monthly;
series x=date y=&var;
title "&var";
run;
%mend;
%graph(gdp);
%graph(lbr);
The above code can be a pain since I have to list 112 %graph() lines and then change the names in the future as the variable names change.
Thanks for the help in advance.
List processing is the concept you need for a task like this. Depending on the graph, BY-group processing or paneling can also solve the problem.
Create a dataset from a source convenient to you that contains the list of variables. This could be an excel or text file, or it could be created from your data if there's a way to programmatically tell which variables you need.
Then you can use any of a number of methods to produce this:
proc sql;
select cats('%graph(',var,')')
into: graphlist separated by ' '
from yourdata;
quit;
&graphlist
For example.
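If the list can be derived from the data itself, the variable names can come from the dictionary tables instead of a hand-built dataset. The sketch below assumes that work.monthly is the dataset from the question, that the %graph macro is already defined, and that every numeric variable other than date should be plotted. That last rule is an assumption, so adjust the where clause to whatever actually identifies your variables.

```sas
proc sql noprint;
  /* Build one %graph(...) call per numeric variable in WORK.MONTHLY */
  select cats('%graph(', name, ')')
    into :graphlist separated by ' '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'MONTHLY'
      and type = 'num'
      and upcase(name) ne 'DATE';
quit;

/* Run the generated macro calls */
&graphlist
```

Because the list is rebuilt on every run, renamed or newly added variables are picked up without editing the program.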
In your case, you could also generate a vertical dataset with one row per variable, which might be easier to determine which variables are correct:
data citiwk;
set sashelp.citiwk;
length var $4; /* without this, var gets length 3 from 'COM' and 'INDU' is truncated */
var='COM';
val=WSPCA;
output;
var='UTI';
val=WSPUA;
output;
var='INDU';
val=WSPIA;
output;
var='GOV';
val=WSPGLT;
output;
keep val var date;
run;
proc sort data=citiwk;
by var date;
run;
proc sgplot data=citiwk;
by var;
series x=date y=val;
run;
While I hardcoded those four, you could easily create an array and use VNAME() to get the variable name or VLABEL() to get the variable label of each array element.
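A sketch of that array approach on the same sashelp.citiwk data. The dataset name citiwk_long and the array name series_vars are illustrative, and here var holds the original variable name rather than a short code such as 'COM'.

```sas
data citiwk_long;
  set sashelp.citiwk;
  /* List the variables to plot once; the loop handles the rest */
  array series_vars {*} WSPCA WSPUA WSPIA WSPGLT;
  length var $32;
  do i = 1 to dim(series_vars);
    var = vname(series_vars{i});  /* name of the current array element */
    val = series_vars{i};
    output;
  end;
  keep date var val;
run;

proc sort data=citiwk_long;
  by var date;
run;

proc sgplot data=citiwk_long;
  by var;
  series x=date y=val;
run;
```

Swapping vname() for vlabel() should put the variable labels in the BY line instead of the names.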