SAS is my primary software for this project. I am using SAS to call 7zip. Both SAS and 7zip are 64 bit versions. The objective is to read compressed US NOAA weather station data for 4 years -- about 4,000 stations, so approx. 16,000 files.
Each file contains multiple, variable length, undelimited records with each record containing date, time, and weather information for a given station (e.g., temperature, visibility, precipitation, and so on). This problem is not about reading the records. It's about reading the files.
These files are not stored using an extension of any type, e.g., there is no *.txt, *.dat, *.gz, *.tar. Nothing defining a file type is used in their naming. I have checked this by turning the ‘File Name Extension’ option on and off in Windows File Explorer. File extensions appear for other information but not for the NOAA files. Each file name has 3 fields. The first two fields define a unique NOAA weather station (USAF and WBAN respectively) and the last field is the year. Here are some representative file names:
702120-26646-2011
702120-26646-2012
702120-26646-2013
etc.
Since the files are compressed, I am using 7zip to uncompress them with a macro call from SAS. Here is the SAS macro syntax for these calls:
%do year=2011 %to 2014;
filename in pipe "c:/7-zip/7z.exe x
""C:\data\stuff\weather\data\extracted.zip\extracted\&&usaf-&&wban-&&year\""
-so" lrecl=3000;
run;
%end;
And here is resolved code for 1 file:
MLOGIC(LOOPS): %DO loop beginning; index variable YEAR; start value
is 2011; stop value is 2014; by value is 1.
SYMBOLGEN: Macro variable USAF resolves to 702120
SYMBOLGEN: Macro variable WBAN resolves to 26646
SYMBOLGEN: Macro variable YEAR resolves to 2011
MPRINT(LOOPS): filename in pipe "c:/7-zip/7z.exe x
""C:\data\stuff\weather\data\extracted.zip\extracted\702120-26646-2011\""
-so" lrecl=3000;
MPRINT(LOOPS): run;
NOTE: The infile IN is:
Unnamed Pipe Access Device,
PROCESS=c:/7-zip/7z.exe x
"C:\data\stuff\weather\data\extracted.zip\extracted\702120-26646-2011\" -so,
RECFM=V,LRECL=3000
Stderr output:
ERROR: The system cannot find the path specified.
C:\data\stuff\weather\data\extracted.zip\extracted\702120-26646-2011
System ERROR:
The system cannot find the path specified.
Clearly there are no errors in syntax but SAS cannot find the file.
The thing that is confusing me is that the file exists or resides in the folder exactly as specified by the resolved path.
Is there a mistake in the 7zip call such that it can’t find the file? For instance, should a 7zip option other than “-so” be used?
What else could be going wrong here? Any suggestions are most welcome!
SAS now supports reading ZIP files directly. Try something like this:
filename in zip "C:\data\stuff\weather\data\extracted.zip"
member="&usaf.-&wban.-&year."
lrecl=3000
;
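Once the fileref is assigned, a DATA step can read the member like any other text file. This is a minimal sketch, assuming the member is stored in the archive under exactly that name. If it sits inside a folder in the archive (such as extracted/), the MEMBER= value may need that prefix. The dataset name weather_raw and the variable record are hypothetical:

```sas
/* Values taken from the resolved log in the question */
%let usaf = 702120;
%let wban = 26646;
%let year = 2011;

filename in zip "C:\data\stuff\weather\data\extracted.zip"
    member="&usaf.-&wban.-&year."
    lrecl=3000;

/* Read each variable-length record whole; parsing can follow later */
data weather_raw;
    infile in truncover;
    input record $char3000.;
run;

filename in clear;
```

Inside the %DO loop from the question, the %LET statements would be dropped and the loop variable would supply &year.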
I have to perform a statistical analysis in SAS on a file with hundreds of observations and 7 variables (columns). I know that it is necessary to insert all the observations after "cards" or "datalines", but I obviously can't type them all. How can I do this? Moreover, the given data file is already a .sas7bdat file.
Then, since (in my case) the multiple correspondence analysis requires only six of the seven variables, does this affect what I have to write in INPUT or/and in CARDS?
You only use CARDS when you're trying to manually write a data set. If you already have a SAS data set (sas7bdat) you can usually use that directly (there are some exceptions, but they likely don't apply here).
First create a libname to the folder where the file is:
libname myFiles 'path to folder with sas file';
Then load it into your work library - this is a temporary space that is cleaned up when you're done so no files here are saved permanently.
This copies it over to that library - which is often faster.
data myFileName;
set myFiles.myFileName;
run;
You can just work with the file from that library by referencing it as myFiles.myFileName in your code.
proc means data=myFiles.myFileName;
run;
This should get you started, but you should take the free SAS e-course to understand the basics; it will save you time overall.
Just tell SAS to use the dataset. INPUT statement (and CARDS/DATALINES or INFILE statement) are for reading from text files.
proc corresp data='/my directory/mydataset.sas7bdat' .... ;
...
run;
You could also make a libref that points to the directory and use two level name to reference the dataset.
libname myfiles '/my directory/';
proc corresp data=myfiles.mydataset .... ;
...
run;
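If the analysis needs only six of the seven variables, nothing has to be re-read: name the variables you want in the procedure, or restrict the dataset with the KEEP= dataset option. A sketch, assuming hypothetical variable names var1 through var6 (PROC CONTENTS will show the real ones):

```sas
libname myfiles '/my directory/';

/* Check the actual variable names and types first */
proc contents data=myfiles.mydataset;
run;

/* KEEP= limits the variables read; MCA requests multiple correspondence analysis */
proc corresp data=myfiles.mydataset(keep=var1 var2 var3 var4 var5 var6) mca;
    tables var1 var2 var3 var4 var5 var6;
run;
```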
I've scoured the internet but cannot seem to figure this out. My question is, if I have a sas7bdat file, how can I read a sas7bdat file in SAS studio so that I can work with it.
I've tried:
libname test 'C:\Users\name\Downloads\test.sas7bdat';
which gives me the error that library test does not exist and if I try the following, I know that I need an INPUT which I don't know of unless I can see into the file.
DATA test;
INFILE 'C:\Users\lees162\Downloads\test.sas7bdat';
RUN;
Is there something I'm missing?
Librefs that you create via the LIBNAME statement point to directories, not individual files.
libname test 'C:\Users\name\Downloads\';
INFILE is for reading raw data files. To reference an existing SAS dataset you use a SET statement (or MERGE,MODIFY,UPDATE statement).
set test.test ;
Note that you can skip defining a libref and just use the quoted physical name in the SET statement.
DATA test;
set 'C:\Users\lees162\Downloads\test.sas7bdat';
RUN;
Of course, using C:\ in the paths assumes that you are using SAS Studio to point to a full SAS installation running on your PC. If you are using SAS University Edition then it is running in a virtual machine, and you will need to put the SAS dataset into a folder that is mapped to the virtual machine and then reference it in the SAS code with the name that the virtual machine uses for the directory.
So something like:
DATA test;
set '/folders/myfolders/test.sas7bdat';
RUN;
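To see what is inside the dataset before working with it, PROC CONTENTS lists the variables and PROC PRINT shows the rows. A short sketch, assuming the University Edition shared folder path used above:

```sas
libname test '/folders/myfolders';

/* Variable names, types, lengths and formats */
proc contents data=test.test;
run;

/* First 10 observations */
proc print data=test.test(obs=10);
run;
```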
A libname just points to the location. Once you have defined it, you can use that libname followed by a period and the dataset name in your SET statement:
libname test "C:\Users\name\Downloads";
DATA test;
set test.test;
RUN;
One possible reason could be that you are using SAS University Edition, which does not let you assign libraries to arbitrary paths.
From one of the SAS community Q/A:
"When you are using the SAS University Edition, any libraries that you create must be assigned to a shared folder. You access your shared folder with this pathname: /folders/myfolders/. Always use '/' in the directory path, even in Windows operating environments"
After setting the directory address, proceed as instructed by Tom above in one of the answers.
Suppose you have the SAS dataset at this location: C:\Users\name\Downloads\test.sas7bdat
libname download 'C:\Users\name\Downloads';
proc sql;
select * from download.test;
quit;
You can read your dataset like a table using PROC SQL if you want to query it, but if you want to modify the existing dataset then you can use the data step as mentioned by #krian.
I'm trying to move a SAS dataset over to our Linux server from a client. They created it on SAS 9.4, 64-bit on Windows 7. I'm using SAS 9.4, 64-bit on Linux.
If I do
proc datasets library=din;
run;
I get the following in my log
Libref DIN
Engine V9
Physical Name /sasUsr/DM/DATA/SAS_DATA/201510_SSI
Filename /sasUsr/DM/DATA/SAS_DATA/201510_SSI
Inode Number 46358529
Access Permission rwxrwxr-x
Owner Name cvandenb
File Size (bytes) 4096
Member File
# Name Type Size Last Modified
1 SAMPLE_FROM_SSI DATA 131072 09/14/2015 17:07:01
2 TEST DATA 131072 09/15/2015 09:35:59
15 run;
but when I do
data test;
set din.sample_from_SSI;
run;
I get
18 data test;
19 set din.sample_from_SSI;
ERROR: File DIN.SAMPLE_FROM_SSI.DATA does not exist.
20 run;
I also created a dummy dataset din.test and was able to proc print it. This seems to either be a version compatibility issue or transmission issue. I thought this would be straightforward. Any suggestions? I'm moving the file from windows to Linux with WinSCP. I'd rather not have to request a .csv and create the input statement, but will if I have to. Your help is appreciated.
Thanks,
Cory
If you are talking about an actual SAS dataset then make sure that the name of the file is in all lowercase letters and has the extension of .sas7bdat. If the source file from Windows did not have an extension of .sas7bdat then perhaps you are not dealing with a SAS dataset, but some other type of file.
In SAS code it does not matter whether you reference a dataset using upper or lower case letters. So you can reference a dataset as sample_from_SSI or Sample_From_Ssi to refer to the same file. The same is true of general filenames on a Windows machine. But on a Unix system, file names that differ in their use of upper and lower case letters are distinct files. SAS requires that the filename of a SAS dataset be in all lowercase letters.
So if you write:
libname DIN '/sasUsr/DM/DATA/SAS_DATA/201510_SSI';
proc print data=DIN.SAMPLE_FROM_SSI;
run;
Then you are looking to make a listing of the data in a file named:
/sasUsr/DM/DATA/SAS_DATA/201510_SSI/sample_from_ssi.sas7bdat
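If the transferred file arrived with upper-case letters in its name, it can be renamed from within SAS using the RENAME function. This is only a sketch: the on-disk name SAMPLE_FROM_SSI.sas7bdat is an assumption, so check the directory listing for the real name first.

```sas
data _null_;
    /* Hypothetical current name on disk; adjust to match the directory listing */
    oldname = '/sasUsr/DM/DATA/SAS_DATA/201510_SSI/SAMPLE_FROM_SSI.sas7bdat';
    newname = '/sasUsr/DM/DATA/SAS_DATA/201510_SSI/sample_from_ssi.sas7bdat';
    rc = rename(oldname, newname, 'file');   /* rc = 0 means the rename succeeded */
    put rc=;
run;
```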
I usually get a note about CEDA in this case, not an error saying the data does not exist.
Create either a CPORT or an XPORT transport file (using PROC CPORT, or PROC COPY with the XPORT engine) and then move that file.
Try referring to the data with all caps as well, which I don't think should be the issue, but is possible.
I would try using PROC COPY directly on the libname, as you can select memtype=data that way without explicitly specifying the file.
If SAS still can't do that, then you might have a permissions issue or something else that is outside of the SAS realm I suspect.
Try using PROC CPORT and PROC CIMPORT.
Use the CPORT Procedure to convert the file into a transport file.
Use the CIMPORT Procedure to convert the transport file to a SAS format.
There is an example that sounds similar to what you are doing here.
According to SAS, the general procedure is:
A transport file is created at the source computer using PROC CPORT.
The transport file is transferred from the source computer to the target computer via communications software or a magnetic medium
The transport file is read at the target computer using PROC CIMPORT.
Note: Transport files that are created using PROC CPORT are not
interchangeable with transport files that are created using the XPORT
engine.
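Those steps can be sketched as follows. The Windows folder C:\client_data and the transport file name sample.cpt are hypothetical, and the transport file should be transferred in binary mode.

```sas
/* On the source (Windows) machine: write the dataset to a transport file */
libname src 'C:\client_data';                       /* hypothetical folder */
proc cport data=src.sample_from_ssi
           file='C:\client_data\sample.cpt';        /* hypothetical file name */
run;

/* On the target (Linux) machine, after a binary transfer: restore the dataset */
libname din '/sasUsr/DM/DATA/SAS_DATA/201510_SSI';
proc cimport infile='/sasUsr/DM/DATA/SAS_DATA/201510_SSI/sample.cpt'
             library=din;
run;
```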
If that doesn't work, or it is taking a very long time to figure out, it would be faster to ask them for a CSV and import it directly using PROC IMPORT. It should read in quite easily, especially if it comes from PROC EXPORT.
I am using SAS DI Studio and I am trying to get a date value that is written within the name of the input file, in order to put the same date in the output file name. So first I need to get the file name, then extract the date, and then put the date back. What is the easiest approach?
The best way to automate your SAS code is to take the date information from the system, like this:
data _null_;
call symput("PERDATE", Compress(put(intnx('month',date(),-1,'end'), YYMMDDn8.)));
run;
The step above gives you the macro variable PERDATE in YYYYMMDD format, holding the last day of the previous month (-1). You can then extract whatever parts of this variable you need.
%let ayear = %Substr(&PERDATE,1,4); /* ayear is the year in YYYY format */
%let amonth = %Substr(&PERDATE,5,2); /* amonth is the month in MM format */
%let aday = %Substr(&PERDATE,7,2); /* aday is the day in DD format */
%let adate=&ayear&amonth; /* adate is if you want in format YYYYMM */
%put &ayear&amonth;
%put &amonth;
%put &perdate;
Then you can read your input file on a daily or monthly basis automatically, without changing the INPUT and OUTPUT file names. For example, if you get a monthly Sales report with a name like Sales_YYYYMM, then you can just write in your code:
infile sales_&ayear.&amonth.; /* Put a dot after each macro variable reference */
Sales_&ayear.&amonth. will give you Sales_201506. You can name your output file in the same manner, like Out=result_&ayear.&amonth.;
data result_&ayear.&amonth.;
set Sales_&ayear.&amonth.;
run;
If you do not receive your file regularly and need to enter the date manually, then you can just create a macro variable for the date at the beginning of your code and always use that macro variable in your code:
%let mydate=201506;
%put &mydate;
So in each run, you just change the variable mydate in your code. You don't need to change anything else...
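If the date should come from the input file name itself, as the question asks, it can be pulled out with %SCAN. A sketch, assuming a hypothetical file name of the form Sales_YYYYMMDD.txt held in a macro variable named infile:

```sas
%let infile = Sales_20150701.txt;       /* hypothetical input file name */

/* Drop the extension, then take the last underscore-delimited piece */
%let base   = %scan(&infile,1,.);       /* Sales_20150701 */
%let mydate = %scan(&base,-1,_);        /* 20150701 */

/* Reuse the date in the output file name (two dots: one ends the macro reference) */
%let outfile = Result_&mydate..txt;
%put mydate=&mydate outfile=&outfile;
```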
The trick is to set up the EXTERNAL FILE metadata for the input and output files correctly. Here's how to do that. Create the EXTERNAL FILE metadata from an already existing file so that the column metadata is created correctly, with formats and informats. After the EXTERNAL FILE metadata is created, edit it by right-clicking the file metadata and going to Properties -> File Location tab. In the file name section, replace the date part of the existing file name with the macro variable. For example, if the file name is c:\Sushil_20150701.txt, enter c:\Sushil_&mydatevar..txt. Also, in the FILE NAME QUOTING section, select "Double quotes around file name".
The same goes for the output file. Since the &mydatevar macro variable from the input file name is reused in the output file name, you can put it into the output file name in the same way.
After all this is done, we are good to use the file metadata with File Reader and File Writer transformation to read and write the files respectively.
Hope this helps!
I have 4 txt files that need to be loaded into SAS and saved as 4 SAS files. The text files are named cle20130805.txt, cle20130812.txt, cle20130819.txt and cle20130826.txt. I used a %DO loop inside a %MACRO in order to import the 4 files with only one invocation of the macro. Here is my code:
%macro cle;
%do i=20130805 %to 20130826 %by 7;
Data cleaug.cle&i;
infile "home/abc/cle&i..txt" dlm= '|' dsd firstobs=1 obs=100;
input a_no b_no c_no;
run;
%end;
%mend cle;
%cle
I expect to have 4 SAS files saved by invoking the macro only once. However, it just doesn't run successfully. Any ideas where I am going wrong in the code?
Thanks,
I don't recommend you try to write one macro to import all four files. Either it will be a specific macro you only ever use once - in which case you could just write this by hand and save the time you've already spent - or it will be something you have to modify every single month or whatever that you use it.
Instead, make the macro something that does precisely one file, but includes the information needed to call it easily. In this case, it sounds like you need one parameter: the date, so 20130805 or whatnot. Then give it a reasonable name that really says what it does.
%macro import_files(date=);
Data cleaug.cle&date.;
infile "home/abc/cle&date..txt" dlm= '|' dsd firstobs=1 obs=100;
input a_no b_no c_no;
run;
%mend import_files;
Now you call it:
%import_files(date=20130805)
%import_files(date=20130812)
%import_files(date=20130819)
%import_files(date=20130826)
Just as easy as the macro you wrote above, even hardcoding the four dates. If the dates are predictable in some fashion, you can generate the macro calls very easily as well (if there are more than 4, for example). You could do a directory listing of the location where the files are, or call the macro from a data step using CALL EXECUTE if you really like looping.
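As a sketch of the CALL EXECUTE approach, the weekly dates can be generated in a data step and each one passed to the macro. This assumes the %import_files macro above has already been compiled and that the dates run weekly from 5 August to 26 August 2013:

```sas
data _null_;
    /* Step through the four dates, 7 days apart */
    do date = '05AUG2013'd to '26AUG2013'd by 7;
        /* %nrstr delays macro execution until the data step has finished */
        call execute(cats('%nrstr(%import_files)(date=',
                          put(date, yymmddn8.), ')'));
    end;
run;
```

Each iteration queues one call such as %import_files(date=20130805), which runs after the generating data step ends.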