SAS - keep name of table being processed

I'm reading in a number of tables and would like to know the name of the table being processed so I can save it to my output table. Is there an automatic variable or some sort of flag that will help? I'm sure this can be done when reading in a list of CSV files etc. But these are data sets. Something like:
%let table_list=one two three;
Data whatever;
set &table_list;
table_name = ?????;

You need to use the INDSNAME= option on the SET statement. Look up the details.
INDSNAME=variable
creates and names a variable that stores the name of the SAS data set from which the current observation is read. The stored name can be a data set name or a physical name. The physical name is the name by which the operating environment recognizes the file.
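A minimal sketch of how this might look with the macro variable from the question. The variable name source_ds and the length $41 are assumptions chosen for illustration. The INDSNAME= variable is not written to the output data set, so its value is copied into table_name.

```sas
%let table_list = one two three;

data whatever;
    /* 41 = 8-character libref + period + 32-character member name (assumed to be enough) */
    length table_name $41;
    /* source_ds is a hypothetical name; it is temporary and is not kept in WHATEVER */
    set &table_list indsname=source_ds;
    /* the value has the form LIBREF.MEMBER, e.g. WORK.ONE; copy it so it is saved */
    table_name = source_ds;
run;
```

If only the member name is wanted, scan(source_ds, 2, '.') should return the part after the libref.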

If you have just created a dataset in a previous proc or datastep, you can use the &SYSLAST automatic macro variable to retrieve its name.
If you want to save this as part of the metadata for a downstream data set, rather than storing it in a variable, one option is to assign a label to that dataset, e.g.
data input_ds;
a=1;
output;
run;
%put &SYSLAST;
data output_ds(label="created from &SYSLAST");
set input_ds;
b=1;
run;
%put &SYSLAST;
You can also use proc datasets to assign data set labels:
/*Modify an existing label*/
proc datasets lib = work;
modify output_ds(label="New label");
run;
quit;
You can retrieve a data set label using the attrc function.
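A rough sketch of reading the label back, assuming the work.output_ds data set created above still exists. The variable names dsid, ds_label and rc are placeholders.

```sas
data _null_;
    /* open the data set and get an identifier; 0 means the open failed */
    dsid = open('work.output_ds');
    if dsid > 0 then do;
        /* LABEL is one of the character attributes ATTRC can return */
        ds_label = attrc(dsid, 'LABEL');
        put ds_label=;
        /* always close what was opened */
        rc = close(dsid);
    end;
run;
```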

Related

Proc contents looping through table names from a different data set

I am a newbie to SAS and I am trying to execute the code below to obtain all the information for a particular library. However, it fails partway through because of the data in one particular dataset. Is there any way to read dataset names from a different dataset and loop through them, creating a separate output dataset for each dataset name in the list?
Proc contents data=testlib._ALL_ out=x;
Run;
Instead I want something like this
Proc contents data in (work. Tbnames) out = x;
Run;
And read data from below data set.
Data tbnames(keep=tablename) ;
Set WORK.tablenames;
Run;
Please help
Proc contents data = work.Tbnames out = x;
Run;
Use Proc COPY to copy data sets from one library to another.
libname testlib '<os-path-to-folder>';
proc copy in=testlib out=work memtype=DATA;
run;
Read the data from dictionary.tables instead.
This assumes that you have the list of tables in a data set called tableNames, with a variable called tName that holds each table name. Note that the comparison is case sensitive and member names are stored in upper case in the dictionary tables, so UPCASE() is used to make the names all upper case.
proc sql;
create table summary as
select *
from dictionary.tables
where memname in (select upcase(tName) from tableNames);
quit;
quit;
Or look at PROC DATASETS which operates on a library, not a single data set.
proc datasets lib=myLib;
run;quit;
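If a separate PROC CONTENTS output per table is really what is wanted, one possible sketch is to generate the steps with CALL EXECUTE. This assumes work.tablenames has a character variable tablename (as in the question) and that the tables live in testlib. The output names work.c_<tablename> are made up for illustration and must still fit within the 32-character member name limit.

```sas
data _null_;
    set work.tablenames;
    /* build one PROC CONTENTS step per row; CATX puts single blanks between the pieces */
    call execute(catx(' ',
        'proc contents',
        cats('data=testlib.', tablename),
        cats('out=work.c_', tablename),   /* hypothetical output naming scheme */
        'noprint; run;'));
run;
```

The generated steps run after this data step finishes, one per table name in the list.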

SAS - Input all variables in a data step without naming every variable

How does one input all variables/columns within a data step using INPUT but without naming every variable? This can be done by naming each variable, for example:
DATA dataset;
INFILE '/folders/myfolders/file.txt';
INPUT variable1 variable2 variable3 variable4 $ variable5;
RUN;
However, this is very tedious for large datasets containing 200+ variables.
The original question implied that you already had a SAS data set. In that case all variables are automatically included when you SET the dataset.
data copy ;
set '/folders/myfolders/file.sas7bdat';
run;
Or just reference it in the analysis you want to do.
proc means data='/folders/myfolders/file.sas7bdat';
run;
If you actually have a TEXT file and you want to read it into a SAS dataset you could use PROC IMPORT to guess what is in the file. If it has a header row then proc import will try to convert those into valid variable names. It will also try to guess how to define the variables based on what values it sees in the text file.
proc import out=want datafile='/folders/myfolders/file.txt' dbms=dlm ;
delimiter=',';
run;
Or, if the issue is that it is too hard to create 200 unique variable names, you could just use a variable list with numeric suffixes to save a lot of typing.
DATA dataset;
INFILE '/folders/myfolders/file.txt' dsd ;
length var1-var200 $20 ;
input var1-var200 ;
RUN;

Adding a column in SAS using MODIFY (no sql)

I'm new to SAS and have some problems with adding a column to existing data set in SAS using MODIFY statement (without proc sql).
Let's say I have data like this
id name salary perks
1 John 2000 50
2 Mary 3000 120
What I need to get is a new column with the sum of salary and perks.
I tried to do it this way
data data1;
modify data1;
money=salary+perks;
run;
but apparently it doesn't work.
I would be grateful for any help!
As @Tom mentioned, you use SET to access the dataset.
I generally don't recommend programming this way, with the same name in the SET and DATA statements, especially while you're learning SAS. It makes errors harder to detect: once the step has run with a mistake in its logic, your original dataset has been overwritten and you have to recreate it before you can start again.
If you want to work step by step, consider using intermediary datasets and cleaning up once you're done by using proc datasets to delete the ones you no longer need. Use a naming convention so you can drop them all at once, e.g. data1, data2, data3 can be referenced as data1-data3 or data:.
data data2;
set data1;
money = salary + perks;
run;
You do now have two datasets but it's easy to drop datasets later on and you can now run your code in sections rather than running all at once.
Here's how you would drop intermediary datasets
proc datasets library=work nodetails nolist;
delete data1-data3;
run;quit;
You can't add a column to an existing dataset. You can make a new dataset with the same name.
data data1;
set data1;
money=salary+perks;
run;
SAS will build it as a new physical file (with a temporary name) and when the step finishes without error it deletes the original and renames the new one.
If you want to use a data step you do it like this:
data dataset;
set dataset;
format new_column $12.;
new_column = 'xxx';
run;
Or use Proc SQL and ALTER TABLE.
proc sql;
alter table dataset
add new_column char(8) format = $12.
;
quit;

Export/Import Attributes of a SAS dataset

I am working with multiple waves of survey data. I have finished defining formats and labels for the first wave of data.
The second wave of data will be different, but the codes, labels, formats, and variable names will all be the same. I do not want to define all these attributes again...it seems like there should be a way to export the PROC CONTENTS information for one dataset and import it into another dataset. Is there a way to do this?
The closest thing I've found is PROC CPORT but I am totally confused by it and cannot get it to run.
(Just to be clear I'll ask the question another way as well...)
When you run PROC CONTENTS, SAS tells you what format, labels, etc. it is using for each variable in the dataset.
I have a second dataset with the exact same variable names. I would like to use the variable attributes from the first dataset on the variables in the second dataset. Is there any way to do this?
Thanks!
So you have a MODEL dataset and a HAVE dataset, both with data in them. You want to create WANT dataset which has data from HAVE, with attributes of MODEL (formats, labels, and variable lengths). You can do this like:
data WANT ;
if 0 then set MODEL ;
set HAVE ;
run ;
This works because when the DATA step compiles, SAS builds the Program Data Vector (PDV) which defines variable attributes. Even though the SET MODEL never executes (because 0 is not true), all of the variables in MODEL are created in the PDV when the step compiles.
Importantly, note that if there are corresponding variables with different lengths, the length from MODEL will determine the length of the variable in WANT. So if HAVE has a variable that is longer than the same-named variable in MODEL, it may be truncated. Options VARLENCHK determines whether or not SAS throws a warning/error if this happens.
That assumes there are no formats/labels on the HAVE dataset. If there is a variable in HAVE that has a format/label, and the corresponding variable in MODEL does not have a format/label, the format/label from HAVE will be applied to WANT.
Sample code below.
data model;
set sashelp.class;
length FavoriteColor $3;
FavoriteColor="Red";
dob=today();
label
dob='BirthDate'
;
format
dob mmddyy10.
;
run;
data have;
set sashelp.class;
length FavoriteColor $10;
dob=today()-1;
FavoriteColor="Orange";
label
Name="HaveLabel"
dob="HaveLabel"
;
format
Name $1.
dob comma.
;
run;
options varlenchk=warn;
data want;
if 0 then set model;
set have;
run;
I'd create an empty dataset based on the existing one, and then use proc append to append the contents to it.
Create some sample data for the second round of data:
data new_data;
age = 10;
run;
Create an empty dataset based on the original data:
proc sql noprint;
create table want like sashelp.class;
quit;
Append the data into the empty dataset, retaining the details from the original:
proc append base=want data=new_data force nowarn;
run;
Note that I've used the force and nowarn options on proc append. This will ensure the data is appended even if differences are found between the two datasets being used. This is expected if you have, for example, format differences. It will also hide things like if columns exist in the new table that aren't in the old table etc. So be careful that this is doing what you want it to. If the behaviour is undesirable, consider using a datastep to append instead (and list the want dataset first).
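A minimal sketch of that data step alternative, reusing the WANT and NEW_DATA names from above. Because WANT is listed first, its variable attributes are the ones set at compile time. Longer values coming from NEW_DATA could still be truncated, so treat this as a starting point rather than a drop-in replacement.

```sas
data want;
    /* WANT comes first, so its lengths, formats and labels define the output */
    set want new_data;
run;
```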
Welcome to the stack.
If you want to copy the properties of the table without the data within it, you could use PROC SQL or data step with zero rows read in.
This example copies all information about the SASHELP.CLASS dataset into a brand new dataset. All formats, attributes and labels are copied over. If you only want to copy some of the columns, specify them in the select clause instead of the asterisk.
PROC SQL outobs=0;
CREATE TABLE WANT as SELECT * FROM SASHELP.CLASS;
QUIT;
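The data step variant mentioned above could be sketched like this, using the OBS=0 data set option so that no rows are read while the variable attributes are still picked up at compile time:

```sas
data want;
    /* obs=0 reads no observations; variables keep their lengths, formats and labels */
    set sashelp.class(obs=0);
run;
```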
Regards,
Vasilij

Rename Variable Regardless of its Name in SAS

Let's suppose we have the following dataset:
ID Stress_Level Heart_Rate
1 5 10
2 7 12
3 9 16
And the code one would use to rename a variable would be:
data test1;
set test0;
rename Stress_Level=A Heart_Rate=B;
run;
However, what I would like to do is to rename the 2 columns without using their names. Is there an "internal" SAS command that addresses a variable by its column position? So for instance Stress_Level, which is the 2nd column, could be addressed as "COL2" or something similar. Thus the code would be:
data test1;
set test0;
rename COL2=A COL3=B;
run;
Where "COL2" would always refer to the second column in the dataset regardless of its name. Is there a direct or maybe an indirect way to achieve that?
I think the easiest way is to build up a rename statement string from the metadata table DICTIONARY.COLUMNS (the view of this is SASHELP.VCOLUMN). This holds the column names and position for all tables in active libraries.
I've taken advantage of the ASCII sequence (the byte function) to rename the columns A, B etc, obviously you'd run into problems if there are more than 26 columns to be renamed in the table!
You'll also need to tweak the varnum+63 calculation if you wanted to start from a different column than 2.
proc sql noprint;
select cats(name,"=",byte(varnum+63)) into :newvars separated by ' '
from dictionary.columns
where libname = 'WORK' and memname='HAVE' and varnum>=2;
quit;
data want;
set have;
rename &newvars.;
run;
/* or */
/*
proc datasets lib=work nolist nodetails;
modify have;
rename &newvars.;
quit;
*/
There are a couple of ways you can do this.
The shortest approach is probably to use an array. The only drawbacks are that you need to know the types of the variables in advance and the name of the first variable.
If they are all numeric as in your example the following could be used:
data test1;
set test0;
array vars[*] _numeric_;
A = vars[2];
B = vars[3];
keep ID A B;
run;
You can only have one type of variable in an array, so it's slightly more complicated if they are not all numeric or all character. Additionally you will need to know the name of the first variable and any other variables that you wish to keep if you don't want to have the duplicates of the second and third variables.
A more robust approach is to use information from a dictionary table and a macro variable to write your rename statement:
proc sql;
/* Write the individual rename assignments */
select strip(name) || " = " || substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", varnum - 1, 1)
/* Store them in a macro variable and separate them by spaces */
into :vars separated by " "
/* Use a sas dictionary table to find metadata about the dataset */
from sashelp.vcolumn
where
libname = "WORK" and
memname = "TEST0" and
2 <= varnum <= 3;
quit;
data test1;
set test0;
rename &vars.;
run;
SAS stores information about datasets in dictionary tables, which have views available in the sashelp library. Take a look at some of the sashelp.v* views to see what kind of information is available. The INTO : clause in proc sql stores values in a macro variable, which can then be used in the rename statement.
I'd recommend the second approach as it is considerably more flexible and less dependent on the exact structure of your data. It also expands better when you have more than a couple of variables to rename.
Finally, if you want to make the changes to a dataset in place you may want to take a look at using proc datasets (in combination with the dictionary table approach) to do the renaming, as this can change the variable names without having to read and write every line of data.