Proc contents looping through table names from a different data set - sas

I am a newbie to SAS and I am trying to execute below code to obtain all the information for a particular library. However it fails in between due to data in a particular dataset. Is there any way to read dataset names from a different dataset and loop through them creating a different dataset specific to each datasetname from the list?
Proc contents data= testlib. _ALL_ out=x;
Run;
Instead I want something like this
Proc contents data in (work. Tbnames) out = x;
Run;
And read data from below data set.
Data tbnames(keep tablename) ;
Set WORK. tablenames;
Run;
Please help
St

Proc contents data = work.Tbnames out = x;
Run;

Use Proc COPY to copy data sets from one library to another.
libname testlib '<os-path-to-folder>';
proc copy in=testlib out=work memtype=DATA;
run;

Read the data from dictionary.table instead.
This assumes that you have the list of tables in a data set called tableNames and it has a variable called tName, which is the variable name. Note that it is a case sensitive comparison so UPCASE() is used make it all upper case.
proc sql;
create table summary as
select *
from dictionary.table
where memname in (select upcase(tName) from tableNames);
quit;
Or look at PROC DATASETS which operates on a library, not a single data set.
proc datasets lib=myLib;
run;quit;

Related

Can I use wildcards in dataset names for PROC CONTENTS?

On the SAS server we have a library that contains thousands of datasets. I want to catalog the contents of a subset of these, all of which have names that begin with "prov". Can I use a wildcard to specify this?
I tried:
PROC CONTENTS DATA=library.prov*;
RUN;
But that just produces a log with this error message:
ERROR: File LIBRARY.PROV.DATA does not exist.
I also tried library.prov%, and that gave the same error.
There are over 100 datasets that start with "prov" so I really don't want to have to do them one at a time. Any ideas?
Depending on what information you want that the CONTENTS procedure produces you could just use the DICTIONARY metadata views.
proc sql ;
create table want as
select *
from dictionary.columns
where libname = 'LIBREF'
and memname like 'PROV%'
;
quit;
Use a WHERE data set option.
proc contents data=sashelp._all_ noprint out=class(where=(memname like 'CLASS%'));
run;
When you specify the keyword _ALL_ in the PROC CONTENTS statement, the step displays a list of all the SAS files that are in the specified SAS library.
Example :
PROC CONTENTS DATA=libref._ALL_ NODS;
RUN;
But to open only the datasets that begin with prov you can use the SQL and add CONTAINS to WHERE e.g:
proc sql ;
create table mytables as
select *
from dictionary.tables
where libname = 'WORK'
order by memname ;
quit ;
Now just run:
PROC CONTENTS DATA mytables;
RUN;
I may be using a different version of SAS check if you have the library SASHELP if so try this based on my note in your comment on the previous response you may see that this works out for you:
proc sql outobs=100;
create table see as
select distinct libname,memname,crdate,modate from sashelp.vtable
where libname='LIBRARY' and memname like 'PROV%'
order by memname;
quit;

SAS - keep name of table being processed

I'm reading in a number of tables and would like to know the name of the table being processed so I can save it to my output table. Is there an automatic variable or some sort of flag that will help? I'm sure this can be done when reading in a list of CSV files etc. But these are data sets. Something like:
%let table_list=one two three;
Data whatever;
set &table_list;
table_name = ?????;
You need to use the INDSNAME= option on the SET statement. Look up the details.
INDSNAME=variable
creates and names a variable that stores the name of the SAS data set from which the current observation is read. The stored name can be a data set name or a physical name. The physical name is the name by which the operating environment recognizes the file.
If you have just created a dataset in a previous proc or datastep, you can use the &SYSLAST automatic macro variable to retrieve its name.
If you want to save this as part of the metadata for a downstream data set, rather than storing it in a variable, one option is to assign a label to that dataset, e.g.
data input_ds;
a=1;
output;
run;
%put &SYSLAST;
data output_ds(label="created from &SYSLAST");
set input_ds;
b=1;
run;
%put &SYSLAST;
You can also use proc datasets to assign data set labels:
/*Modify an existing label*/
proc datasets lib = work;
modify output_ds(label="New label");
run;
quit;
You can retrieve a data set label using the attrc function.

create a copy of table with PROC COPY with flexible names and libs

I read that a PROC COPY is faster than a DATA step .
I understand how the following works :
proc copy in=lib1 out=lib2;
select have;
run;
However I'd like some flexibility on the name of the output, mainly because I want to copy a table in the same library as the source.
Basically I want (if possible at all) the more efficient version of :
DATA lib1.have;
set lib2.want;
run;
if PROC COPY is faster than a data step it is probably because it knows that it does not need to manipulate the data before writing it back out.
Why not use PROC APPEND for what you want? Use the BASE= option to set the target table and the DATA= option to set the source table.
proc append data=lib1.have base=lib2.want ;
run;
If you want to make sure there is not already a table lib2.want then add a proc delete step before it.
proc delete data=lib2.want;
run;

Adding a column is SAS using MODIFY (no sql)

I'm new to SAS and have some problems with adding a column to existing data set in SAS using MODIFY statement (without proc sql).
Let's say I have data like this
id name salary perks
1 John 2000 50
2 Mary 3000 120
What I need to get is a new column with the sum of salary and perks.
I tried to do it this way
data data1;
modify data1;
money=salary+perks;
run;
but apparently it doesn't work.
I would be grateful for any help!
As #Tom mentioned you use SET to access the dataset.
I generally don't recommend programming this way with the same name in set and data statements, especially as you're learning SAS. This is because it's harder to detect errors, since once run and encounter an error, you destroy your original dataset and have to recreate it before you start again.
If you want to work step by step, consider intermediary datasets and then clean up after you're done by using proc datasets to delete any unnecessary intermediary datasets. Use a naming conventions to be able to drop them all at once, i.e. data1, data2, data3 can be referenced as data1-data3 or data:.
data data2;
set data1;
money = salary + perks;
run;
You do now have two datasets but it's easy to drop datasets later on and you can now run your code in sections rather than running all at once.
Here's how you would drop intermediary datasets
proc datasets library=work nodetails holist;
delete data1-data3;
run;quit;
You can't add a column to an existing dataset. You can make a new dataset with the same name.
data data1;
set data1;
money=salary+perks;
run;
SAS will build it as a new physical file (with a temporary name) and when the step finishes without error it deletes the original and renames the new one.
If you want to use a data set you do it like this:
data dataset;
set dataset;
format new_column $12;
new_column = 'xxx';
run;
Or use Proc SQL and ALTER TABLE.
proc sql;
alter table dataset
add new_column char(8) format = $12.
;
quit;

Running All Variables Through a Function in SAS

I am new to SAS and need to sgplot 112 variables. The variable names are all very different and may change over time. How can I call each variable in the statement without having to list all of them?
Here is what I have done so far:
%macro graph(var);
proc sgplot data=monthly;
series x=date y=var;
title 'var';
run;
%mend;
%graph(gdp);
%graph(lbr);
The above code can be a pain since I have to list 112 %graph() lines and then change the names in the future as the variable names change.
Thanks for the help in advance.
List processing is the concept you need to deal with something like this. You can also use BY group processing or in the case of graphing Paneling in some cases to approach this issue.
Create a dataset from a source convenient to you that contains the list of variables. This could be an excel or text file, or it could be created from your data if there's a way to programmatically tell which variables you need.
Then you can use any of a number of methods to produce this:
proc sql;
select cats('%graph(',var,')')
into: graphlist separated by ' '
from yourdata;
quit;
&graphlist
For example.
In your case, you could also generate a vertical dataset with one row per variable, which might be easier to determine which variables are correct:
data citiwk;
set sashelp.citiwk;
var='COM';
val=WSPCA;
output;
var='UTI';
val=WSPUA;
output;
var='INDU';
val=WSPIA;
output;
val=WSPGLT;
var='GOV';
output;
keep val var date;
run;
proc sort data=citiwk;
by var date;
run;
proc sgplot data=citiwk;
by var;
series x=date y=val;
run;
While I hardcoded those four, you could easily create an array and use VNAME() to get the variable name or VLABEL() to get the variable label of each array element.