SAS - Choose last dataset in library that satisfies specific name convention

Say I have a library named mylib.
Within the mylib library, the following datasets are held:
mylib.data_yearly_2015
mylib.data_yearly_2016
mylib.data_yearly_2017
mylib.data_yearly_2018
mylib.data_yearly_2015
mylib.data_mtly_01JUN2015
mylib.data_mtly_01DEC2015
mylib.data_mtly_01JUN2016
mylib.data_mtly_01DEC2016
mylib.data_mtly_01JUN2017
mylib.data_mtly_01DEC2017
Now I need to write a macro that will specifically choose the latest data_mtly_xxxxxx table from the mylib library.
For example, in the current stage, it should choose mylib.data_mtly_01DEC2017
If, however, a new dataset gets added, for example mylib.data_mtly_01JUN2018, it would have to choose that table.
How can I go about doing this in SAS?

Get a list of all data sets
Get the date portion using SCAN() and INPUT()
Get max date.
proc sql noprint;
/* The last "_"-delimited token of the member name is the date, e.g. 01DEC2017 */
select max(input(scan(memname, -1, '_'), date9.)) into :latest_date
from sashelp.vtable
where upcase(libname) = 'MYLIB' and upcase(memname) like 'DATA_MTLY_%';
quit;
Now you should have the latest date value in a macro variable and can use that in your code.
%put &latest_date.;
If it looks like a number and not a date, you’ll need a format applied but you should be able to convert it using PUT().
Note: code is untested.
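Once the latest date is in a macro variable, it can be turned back into a dataset name. The following is a minimal, untested sketch of that step. The names latest_ds and work.latest_mtly are invented here for illustration, the TRIMMED keyword assumes SAS 9.3 or later, and the code assumes every data_mtly_ member ends in a DATE9-style suffix.

```sas
/* Assumes the libref mylib is already assigned. */
proc sql noprint;
  select max(input(scan(memname, -1, '_'), date9.))
    into :latest_date trimmed
  from dictionary.tables
  where libname = 'MYLIB'
    and memtype = 'DATA'
    and memname like 'DATA_MTLY_%';
quit;

/* Convert the numeric SAS date back to DATE9 text (e.g. 01DEC2017)
   and rebuild the two-level dataset name. latest_ds is a hypothetical name. */
%let latest_ds = mylib.data_mtly_%sysfunc(putn(&latest_date, date9.));
%put NOTE: Latest monthly table is &latest_ds;

/* Read the selected table; work.latest_mtly is a hypothetical output name. */
data work.latest_mtly;
  set &latest_ds;
run;
```

Taking the maximum of the parsed dates rather than the maximum of the names matters here, because the names do not sort chronologically as text (01DEC2015 sorts before 01JUN2015).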

Related

Can I get SAS to concatenate an entire folder of data sets?

I'm working with data that seem to be split into nearly arbitrary sets from year to year. What I would like to do is to be able to start by concatenating all of the .sas7bdat files in a single library. How would I go about this?
Alternatively, if I know all of the possible names that files in the library might be assigned (but many are potentially missing from any given library), how can I get SAS to ignore missing files? For instance, say that I know all of the .sas7bdat files in my library have one of the names "set01", "set02", "set03" or "set04". If a particular library ("L") is missing one of these, then the data step:
DATA temp;
SET L.set01 L.set02 L.set03 L.set04;
RUN;
will produce an error. Assuming that I know that at least one of these exists, is there an option that will tell SAS to ignore the missing ones?
(I understand that these are two totally different questions, but either would solve my immediate problem.)
In SAS there is an easy way to select every dataset whose name starts with a common prefix: put a colon after the prefix in the SET statement.
data temp;
set L.set0: ; /*It will search for all datasets that start with set0 and will set only those which are available*/
run;
Does it answer your query?
Second approach
libname L "Y:\Test Data";
proc sql;
select strip("L."||memname) into :DSNAME separated by ' '
from dictionary.tables
where libname='L';
quit;
/* Main final DS*/
data want;
set &DSNAME;
run;
This extracts the names of all datasets in library L and stores them in the macro variable DSNAME as a space-separated list, for example L.set01 L.oth02 and so on. The datasets do not need to share a common name prefix.
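For the second variant of the question (a known list of candidate names, some of which may be missing), one option is to test each name with the EXIST function before building the SET list. This is a rough, untested sketch; the macro name set_existing and its parameters are hypothetical and not part of either answer above.

```sas
/* Hypothetical helper: concatenate only those candidate datasets that exist. */
%macro set_existing(lib=L, names=set01 set02 set03 set04, out=temp);
  %local i name dslist;
  %let i = 1;
  %let name = %scan(&names, &i, %str( ));

  /* Walk the space-separated list and keep names that EXIST() confirms. */
  %do %while (%length(&name) > 0);
    %if %sysfunc(exist(&lib..&name)) %then %let dslist = &dslist &lib..&name;
    %let i = %eval(&i + 1);
    %let name = %scan(&names, &i, %str( ));
  %end;

  %if %length(&dslist) > 0 %then %do;
    data &out;
      set &dslist;
    run;
  %end;
  %else %put WARNING: None of the requested data sets exist in library &lib;
%mend set_existing;

/* Example call using the names from the question. */
%set_existing(lib=L, names=set01 set02 set03 set04, out=temp)
```

Compared with the colon shortcut, this keeps the list of candidates explicit, so unrelated datasets that happen to share the prefix are not picked up.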

create lookup table from all formats

I would like to export all available user defined formats into a single lookup table that would contain columns format, value, and label (so that I can use it to manipulate data in R and SQL).
Is there a way to do this ?
You can export the contents of a SAS format catalog (commonly, formats.sas7bcat) with the CNTLOUT= option on PROC FORMAT:
proc format lib=mylib cntlout=myformatds;
run;
This would take the default format catalog stored in the library mylib as formats.sas7bcat and export it to a dataset, myformatds in the work library.
As @user667489 indicated in the comment, the answer was in using PROC FORMAT with the CNTLOUT option.
The general format of the instruction is:
PROC FORMAT
LIBRARY=lib_name CNTLOUT=created_table;
RUN;
One must specify the library, however. To find out which library is the relevant one, take a look at SASHELP.VFORMAT.
If the formats are spread over several libraries, one can create a table for each and concatenate them.
In my case all user defined formats were in a library called LIBRARY and I don't need all columns so my solution is:
PROC FORMAT
LIBRARY=LIBRARY CNTLOUT=FORMAT_LOOKUP (KEEP=FMTNAME START END LABEL);
RUN;
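To get the lookup table out of SAS for use in R or SQL, the CNTLOUT dataset can be written to a CSV file. This is a minimal sketch, not a tested recipe: the output path format_lookup.csv is a placeholder, and keeping the TYPE column (which distinguishes numeric from character formats) is an addition to the solution above.

```sas
/* Dump the user-defined formats stored in the LIBRARY libref to a dataset. */
proc format library=library cntlout=work.format_lookup
    (keep=fmtname type start end label);
run;

/* Write the lookup table to CSV; the file name is a hypothetical placeholder. */
proc export data=work.format_lookup
    outfile="format_lookup.csv"
    dbms=csv
    replace;
run;
```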

How to extract latest dataset in a particular library by using macro

I've created a table from a library called 'common' using PROC SQL, sorted by crdate in descending order. Now I need to write a macro that picks the top row, i.e. the most recently created dataset in that library.
Assuming your library contains SAS datasets (.sas7bdat), then the following will create a macro variable latest_dataset with the name of the latest dataset in the COMMON library, without the use of an actual macro:
proc sql noprint;
/* Keep only the row whose creation date equals the latest one in the library */
select memname into :latest_dataset
from dictionary.tables
where libname='COMMON'
having crdate=max(crdate);
quit;
%put &=latest_dataset;
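An alternative that mirrors the "sort by crdate descending and take the top one" idea from the question is to order the rows and rely on INTO storing only the first row in a single macro variable. This is an untested sketch; work.latest_copy is a name made up for illustration, and TRIMMED assumes SAS 9.3 or later.

```sas
proc sql noprint;
  /* With a single target variable, INTO keeps the value from the first row,
     which is the most recently created dataset after the descending sort. */
  select memname into :latest_dataset trimmed
  from dictionary.tables
  where libname = 'COMMON' and memtype = 'DATA'
  order by crdate desc;
quit;

%put &=latest_dataset;

/* Use the result; work.latest_copy is a hypothetical output name. */
data work.latest_copy;
  set common.&latest_dataset;
run;
```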

IGNORE DATA IN SAS IMPORT FROM EXCEL

I have no working knowledge of SAS, but I have an Excel file that I need to import and work with. The Excel file has about 100 rows (observations) and 7 columns (quantities). In some cases, a particular observation may not have any data in one column. I need to completely ignore that observation when reading my data into SAS. I'm wondering what the commands for this would be.
An obvious cheap solution would be to delete the rows in the excel file with missing data, but I want to do this with SAS commands, because I want to learn some SAS.
Thanks!
Import the data however you want, for example with the IMPORT procedure, as Stig Eide mentioned.
proc import
datafile = 'C:\...\file.xlsx'
dbms = xlsx
out = xldata
replace;
/* mixed = YES; */ /* MIXED= belongs to DBMS=EXCEL; enable it only with that engine */
getnames = YES;
run;
Explanation:
The DBMS= option specifies how SAS will try to read the data. If your file is an Excel 2007+ file, i.e. xlsx, then you can use DBMS=XLSX as shown here. If your file is older, e.g. xls rather than xlsx, try DBMS=EXCEL.
The OUT= option names the output dataset.
If a single level name is specified, the dataset is written to the WORK library. That's the temporary library that's unique to each SAS session. It gets deleted when the session ends.
To create a permanent dataset, specify a two level name, like mylib.xldata, where mylib refers to a SAS library reference (libref) created with a LIBNAME statement.
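REPLACE overwrites the output dataset if it already exists, for example from an earlier run of this step.
MIXED=YES tells SAS that a column may contain mixed data types, so that such columns are read as character. This statement applies to DBMS=EXCEL rather than DBMS=XLSX.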
GETNAMES=YES will name your SAS dataset variables based on the column names in Excel.
If I understand you correctly, you want to remove every observation in the dataset that has a missing value in any of the seven columns. There are fancier ways to do this, but I recommend a simple approach like this:
data xldata;
set xldata;
where cmiss(col1, col2, ..., col7) = 0;
run;
The CMISS function counts the number of missing values in the variables you specify at each observation, regardless of the data type. Since we're using WHERE CMISS()=0, the resulting dataset will contain only the records with no missing data for any of the seven columns.
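To see the filter in action without an Excel file, here is a small self-contained sketch with made-up data; the dataset names demo and complete and the three variables are invented for illustration. It uses a subsetting IF with the variable list of _all_ instead of WHERE, on the assumption that OF variable lists are not accepted in WHERE expressions.

```sas
/* Made-up sample data: three of the four rows have one missing value. */
data demo;
  infile datalines dsd truncover;
  input name :$10. height weight;
  datalines;
Ann,160,55
Bob,,70
,175,80
Cy,180,
;
run;

/* Keep only observations with no missing value in any variable. */
data complete;
  set demo;
  if cmiss(of _all_) = 0;
run;

proc print data=complete;
run;
```

With this input, only the Ann row has all three values, so it should be the only observation left in complete.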
When in doubt, try browsing the SAS online documentation. It's very thorough.
If you have "SAS/ACCESS Interface to PC Files" licensed (hint: proc setinit) you can import the Excel file with this code. The where option lets you select which rows you want to keep, in this example you will keep the rows where the column "name" is not blank:
proc import
DATAFILE="your file.xlsx"
DBMS=XLSX
OUT=resulttabel(where=(name ne ""))
REPLACE;
/* MIXED=YES; */ /* MIXED= belongs to DBMS=EXCEL; enable it only with that engine */
RUN;

SAS Proc SQL to add a constant to a variable

I have a SAS dataset with numeric variables to, from, and weight. Some of the observations have value 0 for weight. I need all the weight values to be positive, so I wish to simply add 1 to all weight values.
How can I do that using Proc SQL?
I have tried the following, but it doesn't work:
proc sql;
update mylib.mydata
set weight=weight+1;
quit;
The error is:
ERROR: A CURRENT-OF-CURSOR operation cannot be initiated because
the column "weight" cannot be used to uniquely identify a row
because of its data type.
Also, mylib refers to a Greenplum appliance. This might be the problem...
If you have the database permissions to update that table, you might want to use the SAS/Access pass-through facility. You will need to know the correct syntax for this to work. Here is a non-working example:
proc sql;
connect to greenplm as dbcon
(server=greenplum04 db=sample port=5432 user=gpusr1 password=gppwd1);
execute (
/* Native code goes here */
update sample.mydata
set weight=weight+1
) by dbcon;
quit;
The connection string would be the same as the one used on the LIBNAME statement that defined your "mylib" libref.
However, if you are really trying to create a SAS dataset (not update the real table), you can do that with a simple data step:
data mydata;
set mylib.mydata;
weight = weight + 1;
run;
That will create a copy of the table that can be used with other SAS procedures.
Check out this note at progress.com. You probably need to add UPDATE_MULT_ROWS=YES to your library definition.