Can I get SAS to concatenate an entire folder of data sets? - sas

I'm working with data that seem to be split into nearly arbitrary sets from year to year. What I would like to do is to be able to start by concatenating all of the .sas7bdat files in a single library. How would I go about this?
Alternatively, if I know all of the possible names that files in the library might be assigned (but many are potentially missing from any given library), how can I get SAS to ignore missing files? For instance, say that I know all of the .sas7bdat files in my library have one of the names "set01", "set02", "set03" or "set04". If a particular library ("L") is missing one of these, then the data step:
DATA temp;
SET L.set01 L.set02 L.set03 L.set04;
RUN;
will produce an error. Assuming that I know that at least one of these exists, is there an option that will tell SAS to ignore the missing ones?
(I understand that these are two totally different questions, but either would solve my immediate problem.)

SAS can automatically select every data set whose name starts with a common prefix. Put a colon after the prefix in the SET statement:
data temp;
set L.set0: ; /*It will search for all datasets that start with set0 and will set only those which are available*/
run;
Does it answer your query?
Second approach
libname L "Y:\Test Data";
proc sql;
select strip("L."||memname) into :DSNAME separated by ' '
from dictionary.tables
where libname='L';
quit;
/* Main final DS*/
data want;
set &DSNAME;
run;
This extracts the names of all data sets in library L and builds a macro variable DSNAME such as: L.set01 L.oth02 and so on, so the data sets do not need to share a common name.
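For the second part of the question (keeping a fixed SET list but skipping members that are missing), one option worth trying is the NODSNFERR system option, which makes SAS treat a reference to a nonexistent data set as if _NULL_ had been specified instead of stopping with an error. A minimal sketch, reusing the library and data set names from the question:

```sas
/* Treat missing data sets as _NULL_ instead of raising an error */
options nodsnferr;

data temp;
  set L.set01 L.set02 L.set03 L.set04;
run;

/* Restore the default so later typos in data set names are still caught */
options dsnferr;
```

Because this option also hides genuine misspellings, it is safer to switch it back on right after the step that needs it.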

Related

How to copy data after "cards"/"datalines" in SAS

I have to perform statistical analysis in SAS on a file with hundreds of observations and 7 variables (columns). I know that it is necessary to insert all the observations after "cards" or "datalines", but obviously I can't type them all. How can I do this? Moreover, the given data file is already a .sas7bdat file.
Then, since (in my case) the multiple correspondence analysis requires only six of the seven variables, does this affect what I have to write in INPUT or/and in CARDS?
You only use CARDS when you're manually typing in a data set. If you already have a SAS data set (sas7bdat), you can usually use it directly (there are some exceptions, but they likely don't apply here).
First create a libname to the folder where the file is:
libname myFiles 'path to folder with sas file';
Then load it into your work library - this is a temporary space that is cleaned up when you're done so no files here are saved permanently.
This copies it over to that library - which is often faster.
data myFileName;
set myFiles.myFileName;
run;
Alternatively, you can work with the file directly from the original library, without copying it, by referencing it as myFiles.myFileName in your code.
proc means data=myFiles.myFileName;
run;
This should get you started, but you should take the free SAS e-course to understand the basics. It will save you time overall.
Just tell SAS to use the dataset. INPUT statement (and CARDS/DATALINES or INFILE statement) are for reading from text files.
proc corresp data='/my directory/mydataset.sas7bdat' .... ;
...
run;
You could also make a libref that points to the directory and use two level name to reference the dataset.
libname myfiles '/my directory/';
proc corresp data=myfiles.mydataset .... ;
...
run;

Assigning Headers from one file to multiple data files

I have a list of ~100 files. The first file contains header information for the other 98 data files. The information should be in table format, however each table is a different size (with regards to column and row number).
My goal is to import these files such that the column headers from the first file are correctly assigned.
Additional information:
I am told this list of files was generated using SAS (however, I am not familiar with the file format). Furthermore, the "CIMPORT" command does not work on these files.
The files are "|" delimited.
Thank you very much for any help.
This was a fun issue. I came up with following way:
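First let's load up some data. Note GETNAMES=NO on the file without headers, so that SAS assigns the default names VAR1, VAR2, ... that the rename step relies on.
proc import datafile = "\\Datadrive\mydata.csv"
out=w_headers dbms=dlm;
delimiter=";";
guessingrows=32767;
run;
proc import datafile = "\\Datadrive\no_headers.csv"
out=no_headers dbms=dlm;
delimiter=";";
getnames=no;
guessingrows=32767;
run;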
Then I extract the names of the columns and variable number to a dataset.
proc contents data=w_headers out=meta(keep=NAME VARNUM) noprint ; run ;
Then I create rename commands that map the default column names (VAR1, VAR2, ...) to the proper names taken from the file with headers.
data meta;
set meta;
cmd = cats('VAR',VARNUM,'=', name);
run;
Here comes the kicker: I put the commands into a macro variable. Next, the macro variable is fed to proc datasets to rename the columns.
proc sql noprint;
select cmd into :cmd_list separated by ' ' from meta;
quit;
proc datasets library = work nolist;
modify no_headers;
rename &cmd_list;
quit;
At this point my two datasets have identical column names. The method is a bit tricky, but it works. I'm sure there is another way, but this was a fun one. :)

How to get SAS tables sizes and last usage time in library

Good day!
I need a list of libraries-tables on a SAS server with a size of each table and last time, when it was open/used.
I'm not very familiar with SAS, so I don't even know where would I start searching :(
I assume, that there is some simple solution, maybe a proc of some sort, that may help...
You can use proc contents to access metadata about a library in SAS, for example using the sashelp library:
proc contents data = sashelp._ALL_ NODS;
run;
sashelp is the library you are referencing. By specifying _ALL_ you ask SAS for data about all the files in this library (by choosing a single file such as sashelp.ztc you can get information on just one file).
This will give you a lot of information, so you can use the NODS option to suppress the per-file detail. The above code will give you the number of files, their type, the level, the file size, and the date they were last modified.
If you want to output this information to a dataset, you have to use the ODS output system with the correct ODS table name, which in this case is Members. Furthermore, if you're looking for datasets in particular, you can filter the output with a where= data set option:
ods output Members = test (where = (memtype = "DATA"));
/* no NOPRINT here: it would suppress the ODS table that ODS OUTPUT captures */
proc contents data = work._ALL_ NODS;
run;
ods listing; /* change back to listing output*/
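As an alternative to PROC CONTENTS, the same kind of metadata can be queried for every assigned library at once from the DICTIONARY.TABLES view. This is a minimal sketch, assuming a SAS 9 release where the view exposes the filesize column; the output table name table_sizes is made up for illustration. The view records creation and modification timestamps, but not when a table was last opened, so that part of the question is not covered here.

```sas
proc sql;
  /* table_sizes is a hypothetical output name */
  create table table_sizes as
  select libname,
         memname,
         filesize,   /* size in bytes */
         crdate,     /* date created */
         modate      /* date last modified */
  from dictionary.tables
  where memtype = 'DATA'
  order by filesize desc;
quit;
```

Querying the view without a libname filter opens every assigned library, which can be slow on a server with many libraries; adding a condition such as libname = 'MYLIB' (uppercase) narrows it down.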

IGNORE DATA IN SAS IMPORT FROM EXCEL

I have no working knowledge of SAS, but I have an excel file that I need to import and work with. In the excel file there are about 100 rows (observations) and 7 columns (quantities). In some cases, a particular observation may not have any data in one column. I need to completely ignore that observation when reading my data into SAS. I'm wondering what the commands for this would be.
An obvious cheap solution would be to delete the rows in the excel file with missing data, but I want to do this with SAS commands, because I want to learn some SAS.
Thanks!
Import the data however you want, for example with the IMPORT procedure, as Stig Eide mentioned.
proc import
datafile = 'C:\...\file.xlsx'
dbms = xlsx
out = xldata
replace;
getnames = YES;
run;
Explanation:
The DBMS= option specifies how SAS will try to read the data. If your file is an Excel 2007+ file, i.e. xlsx, then you can use DBMS=XLSX as shown here. If your file is older, e.g. xls rather than xlsx, try DBMS=EXCEL.
The OUT= option names the output dataset.
If a single level name is specified, the dataset is written to the WORK library. That's the temporary library that's unique to each SAS session. It gets deleted when the session ends.
To create a permanent dataset, specify a two level name, like mylib.xldata, where mylib refers to a SAS library reference (libref) created with a LIBNAME statement.
REPLACE overwrites the output dataset if it already exists, for example when you run this step a second time.
MIXED=YES tells SAS that a column may hold both numeric and character values. It is accepted with DBMS=EXCEL; with DBMS=XLSX it is not a valid statement, so it is left out here.
GETNAMES=YES will name your SAS dataset variables based on the column names in Excel.
If I understand you correctly, you want to remove every observation in the dataset that has a missing value in any of the seven columns. There are fancier ways to do this, but I recommend a simple approach like this:
data xldata;
set xldata;
where cmiss(col1, col2, ..., col7) = 0;
run;
The CMISS function counts the number of missing values in the variables you specify at each observation, regardless of the data type. Since we're using WHERE CMISS()=0, the resulting dataset will contain only the records with no missing data for any of the seven columns.
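If typing all seven column names is tedious, a variation is to pass every variable to CMISS with an OF list. This is a sketch under the assumption that all columns in xldata should be checked; the output name xldata_complete is made up for illustration. It uses a subsetting IF rather than WHERE because WHERE expressions do not accept OF variable lists.

```sas
data xldata_complete;   /* hypothetical output name */
  set xldata;
  /* keep only rows where no variable, numeric or character, is missing */
  if cmiss(of _all_) = 0;
run;
```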
When in doubt, try browsing the SAS online documentation. It's very thorough.
If you have "SAS/ACCESS Interface to PC Files" licensed (hint: proc setinit) you can import the Excel file with this code. The where option lets you select which rows you want to keep, in this example you will keep the rows where the column "name" is not blank:
proc import
DATAFILE="your file.xlsx"
DBMS=XLSX
OUT=resulttabel(where=(name ne ""))
REPLACE;
RUN;

Rename Variable Name Starting with Number in SAS

I have some results that came from a relational database in a SAS data set. All of the variable names start with numbers, so I can't rename them or access them in a data step. Is there any way to rename them or access them without getting the data out of the RDBMS again?
options validvarname=any; will allow you to access them, and perhaps even use the dataset - you can enclose an "illegal" variable name in "variable name"n (quotes then an n afterwards) to make a name literal which is equivalent to a variable name (like in Oracle using "variable name").
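As a small illustration of the name literal syntax, the sketch below renames one such column while reading the data. The data set names have and want and the new name q4_1994 are placeholders, and 1994Q4 stands in for whatever the actual column is called.

```sas
options validvarname=any;

data want;                                   /* hypothetical output data set */
  /* '1994Q4'n is a name literal referring to the column named 1994Q4 */
  set have(rename=('1994Q4'n = q4_1994));    /* have is a placeholder input */
run;
```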
If you want to make them easier to use, you can do something like
proc sql;
select catx(' ','rename',nliteral(name),'=',cats('_',name,';')) into :renamelist separated by ' '
from dictionary.columns
where libname='WORK' and memname='DATASETNAME'; *perhaps AND ANYDIGIT(substr(name,1,1)) as well;
quit;
proc datasets lib=work;
modify datasetname;
&renamelist;
quit;
You could also try setting options validvarname=v7; before you connect to the RDBMS as it's possible SAS will do this for you (depending on the situation) if you have it set that way (and don't currently).
The answer given by Joe has some helpful information, but I actually discovered that SAS has a (somewhat automatic) method for handling this. When you query data from an RDBMS, SAS will replace the first character of any column name that starts with a digit with an underscore. So 1994Q4 becomes _994Q4, and you can simply access the data under that name.
SAS will, however, preserve the original name from the RDBMS as the variable label, so it will display as 1994Q4 (or whatever) in table view mode.