I am using SAS DI Studio and I am trying to take a date value embedded in the name of the input file and put the same date in the output file name. So first I need to get the file name, then extract the date, and then put the date into the output file name. What is the easiest approach?
The best way to automate your SAS code is to take the date information from the system, like this:
data _null_;
call symput("PERDATE", Compress(put(intnx('month',date(),-1,'end'), YYMMDDn8.)));
run;
The step above creates the macro variable PERDATE in YYYYMMDD format, holding the last day of the previous month (-1). You can then take substrings of this variable as needed:
%let ayear = %Substr(&PERDATE,1,4); /* ayear is the year in YYYY format */
%let amonth = %Substr(&PERDATE,5,2); /* amonth is the month in MM format */
%let aday = %Substr(&PERDATE,7,2); /* aday is the day in DD format */
%let adate=&ayear&amonth; /* adate is if you want in format YYYYMM */
%put &ayear&amonth;
%put &amonth;
%put &perdate;
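The same value can also be built without a DATA step. This is a minimal sketch using %SYSFUNC, assuming the same macro variable name PERDATE as above; inside %SYSFUNC the character arguments are written without quotes.

```sas
/* Last day of the previous month as YYYYMMDD, computed in macro code only */
%let PERDATE = %sysfunc(intnx(month, %sysfunc(date()), -1, end), yymmddn8.);

/* Write the result to the log to check it */
%put NOTE: PERDATE=&PERDATE;
```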
Then you can read your input file on a daily or monthly basis automatically, without changing the input and output file names by hand. For example, if you receive a monthly sales report named like Sales_YYYYMM, you can write in your code:
infile sales_&ayear.&amonth.; /* Put a dot after each macro variable reference */
Sales_&ayear.&amonth. resolves to Sales_201506 when the previous month is June 2015. You can name your output in the same manner, for example Out=result_&ayear.&amonth.;
data result_&ayear.&amonth.;
set Sales_&ayear.&amonth.;
run;
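The role of the trailing dot is easiest to see with fixed values. The sketch below hard-codes ayear and amonth for illustration only: the first dot after a macro variable reference ends the reference and is consumed, so a second dot is needed when a literal dot (such as a file extension) must follow.

```sas
/* Fixed values for illustration; normally these come from &PERDATE */
%let ayear  = 2015;
%let amonth = 06;

/* One dot per reference: the dots are consumed as delimiters */
%put Data set name: Sales_&ayear.&amonth.;

/* Two dots before the extension: one delimiter, one literal dot */
%put File name: Sales_&ayear.&amonth..csv;
```

The log should show Sales_201506 for the first %PUT and Sales_201506.csv for the second.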
If you do not receive your file regularly and need to enter the date manually, create a macro variable for the date at the beginning of your code and use that macro variable everywhere else:
%let mydate=201506;
%put &mydate;
So for each run you change only the value of mydate in your code. You don't need to change anything else.
The trick is to set up the EXTERNAL FILE metadata for the input and output files correctly. Here's how to do that. Create the EXTERNAL FILE metadata from an already existing file so that the column metadata is created correctly, with formats and informats. After the EXTERNAL FILE metadata is created, edit it by right-clicking the file metadata and going to Properties -> File Location tab. In the file name section, replace the date part of the existing file name with the macro variable. For example, if the file name is c:\Sushil_20150701.txt, enter c:\Sushil_&mydatevar..txt (the first dot ends the macro variable reference, the second is the literal dot before the extension). Also, in the FILE NAME QUOTING section, select "Double quotes around file name", because macro variables are resolved only inside double quotes.
The same goes for the output file. Since the &mydatevar macro variable used in the input file name is also going to be used in the output file name, put &mydatevar into the output file name in the same way.
After all this is done, the file metadata can be used with the File Reader and File Writer transformations to read and write the files respectively.
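The metadata setup above assumes that &mydatevar already holds the date. One way to fill it from an existing file name is sketched below; it assumes the name follows the Sushil_YYYYMMDD.txt pattern from the example, and the macro variable infile_name is a hypothetical name introduced only for this illustration.

```sas
/* Hypothetical macro variable holding the input file name */
%let infile_name = Sushil_20150701.txt;

data _null_;
    length fname $200 datepart $8;
    fname = "&infile_name";
    /* Take the piece after the last underscore, then drop the extension */
    datepart = scan(scan(fname, -1, '_'), 1, '.');
    /* CALL SYMPUTX strips blanks and creates the macro variable */
    call symputx('mydatevar', datepart);
run;

%put NOTE: mydatevar=&mydatevar;
```

For the file name shown, the log should report mydatevar=20150701. This step has to run before the File Reader and File Writer transformations use the metadata.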
Hope this helps!
Related
I have a sas7bdat format file, but it's zipped.
I could unzip the file and work on it, but this makes me lose hard disk space and time.
So I tried this code on SAS :
filename myfile ZIP 'C:\...\data.zip' member="data.sas7bdat" ;
data yoyo;
infile myfile (data.sas7bdat);
input;
put _infile_;
run;
But I get an empty yoyo table in the WORK library.
How can I successfully import the data.sas7bdat ?
Thank you,
You need to uncompress the dataset before SAS can use it. So you need to find a place that has enough space for the fully expanded file.
Note that your code specifies the member name of the file within the ZIP file twice. You should only do that once: either point the fileref to the aggregate location and give the member name in the reference, or point the fileref to the individual member and use just the fileref.
Here is a method to expand the file into your current WORK folder.
%let member=data.sas7bdat;
filename in zip 'C:\...\data.zip' member="&member" recfm=n;
filename out "%sysfunc(pathname(work))/&member" recfm=n;
data _null_;
rc=fcopy('in','out');
run;
You can now work with the file using the name WORK.DATA.
proc print data=work.data(obs=1); run;
If you want to read data from a ZIP file directly then it either needs to be raw (text) data or in a streaming format, like a SAS V5 XPORT file.
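For the text case, reading directly from the archive can look like the sketch below. It assumes the ZIP file contains a plain-text member; the member name data.csv is hypothetical, and the elided path is kept exactly as in the question.

```sas
/* Point the fileref at one text member of the archive (member name is hypothetical) */
filename inzip zip 'C:\...\data.zip' member='data.csv';

data _null_;
    /* OBS=5 limits the read to the first five lines */
    infile inzip obs=5;
    input;
    /* Echo each raw line to the log */
    put _infile_;
run;

filename inzip clear;
```

Note that the member name is given once, on the FILENAME statement, and the INFILE statement uses only the fileref.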
I am trying to open a file in SAS where the name of the file changes for each row based on the value of a variable.
All the files I want to open are in the same directory, and each is named with a day of the year as a number (127.csv, 128.csv, 129.csv, etc). My SAS data has a column called "day" and, for every row, I would like to open the file named for that day, extract a value from that file and add it to my original file.
What is the best way to open a file when the name changes each row with the value of a variable?
See my attached data if this is unclear.
Thanks.
In general, if you want to drive the reading of a set of files from a list, use the FILEVAR= option on the INFILE statement.
data want;
set have ;
length fname $200 ;
fname = catx('/',"&path",cats(day,'.csv'));
infile csv filevar=fname dsd firstobs=2 end=eof truncover ;
do while (not eof);
input ..... ;
output;
end;
run;
Put whatever code you need to read the CSV file(s) inside the DO loop. Make sure to use the OUTPUT statement to explicitly write out the observations since each iteration of the data step will process one whole file.
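As an illustration of the pattern with the INPUT statement filled in, here is a sketch under stated assumptions: the folder in &path, the tiny HAVE data set, and the two CSV columns (reading_time, reading_value) are all hypothetical, since the question does not describe the file layout.

```sas
/* Hypothetical folder that holds 127.csv, 128.csv, ... */
%let path = C:\mydata;

/* Hypothetical driver data set with one row per day */
data have;
    input day;
    datalines;
127
128
;
run;

data want;
    set have;
    length fname $200;
    /* Build the full file name for this row */
    fname = catx('/', "&path", cats(day, '.csv'));
    /* FILEVAR= switches the input file whenever fname changes */
    infile csv filevar=fname dsd firstobs=2 end=eof truncover;
    do while (not eof);
        /* Assumed layout: a time value and a numeric reading per line */
        input reading_time :time8. reading_value;
        output;
    end;
    format reading_time time8.;
run;
```

Each output observation carries the day from HAVE together with one line read from the matching file.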
SAS is my primary software for this project. I am using SAS to call 7zip. Both SAS and 7zip are 64 bit versions. The objective is to read compressed US NOAA weather station data for 4 years -- about 4,000 stations, so approx. 16,000 files.
Each file contains multiple, variable length, undelimited records with each record containing date, time, and weather information for a given station (e.g., temperature, visibility, precipitation, and so on). This problem is not about reading the records. It's about reading the files.
These files are not stored using an extension of any type, e.g., there is no *.txt, *.dat, *.gz, *.tar. Nothing defining a file type is used in their naming. I have checked this by turning the ‘File Name Extension’ option on and off in Windows File Explorer. File extensions appear for other information but not for the NOAA files. Each file name has 3 fields. The first two fields define a unique NOAA weather station (USAF and WBAN respectively) and the last field is the year. Here are some representative file names:
702120-26646-2011
702120-26646-2012
702120-26646-2013
etc.
Since the files are compressed, I am using 7zip to uncompress them with a macro call from SAS. Here is the SAS macro syntax for these calls:
%do year=2011 %to 2014;
filename in pipe "c:/7-zip/7z.exe x
""C:\data\stuff\weather\data\extracted.zip\extracted\&&usaf-&&wban-&&year\""
-so" lrecl=3000;
run;
%end;
And here is resolved code for 1 file:
MLOGIC(LOOPS): %DO loop beginning; index variable YEAR; start value
is 2011; stop value is 2014; by value is 1.
SYMBOLGEN: Macro variable USAF resolves to 702120
SYMBOLGEN: Macro variable WBAN resolves to 26646
SYMBOLGEN: Macro variable YEAR resolves to 2011
MPRINT(LOOPS): filename in pipe "c:/7-zip/7z.exe x
""C:\data\stuff\weather\data\extracted.zip\extracted\702120-26646-2011\""
-so" lrecl=3000;
MPRINT(LOOPS): run;
NOTE: The infile IN is:
Unnamed Pipe Access Device,
PROCESS=c:/7-zip/7z.exe x
"C:\data\stuff\weather\data\extracted.zip\extracted\702120-26646-2011\" -so,
RECFM=V,LRECL=3000
Stderr output:
ERROR: The system cannot find the path specified.
C:\data\stuff\weather\data\extracted.zip\extracted\702120-26646-2011
System ERROR:
The system cannot find the path specified.
Clearly there are no errors in syntax but SAS cannot find the file.
The thing that is confusing me is that the file exists or resides in the folder exactly as specified by the resolved path.
Is there a mistake in the 7zip call such that it can’t find the file? For instance should a 7zip option other than “-so” be used?
What else could be going wrong here? Any suggestions are most welcome!
SAS now supports reading ZIP files directly. Try something like this:
filename in zip "C:\data\stuff\weather\data\extracted.zip"
member="&usaf.-&wban.-&year."
lrecl=3000
;
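Put back into the year loop from the question, this could look like the sketch below. The macro name read_station is hypothetical, and the DATA step only echoes the first few lines to confirm the member can be opened. The member name must match the entry exactly as it is stored inside the archive; if the files sit in a folder within the ZIP, that folder would need to be part of the member name, which is an assumption to check against the actual archive.

```sas
/* Hypothetical wrapper macro: loops over the years for one station */
%macro read_station(usaf, wban);
    %local year;
    %do year = 2011 %to 2014;
        filename in zip "C:\data\stuff\weather\data\extracted.zip"
            member="&usaf.-&wban.-&year."
            lrecl=3000;

        data _null_;
            /* Read only the first three lines as a quick check */
            infile in obs=3;
            input;
            put _infile_;
        run;

        filename in clear;
    %end;
%mend read_station;

%read_station(702120, 26646)
```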
I'm trying to use a Macro that retrieves a single value from a CSV file. I've written a MACRO that works perfectly fine if there is only 1 CSV file, but does not deliver the expected results when I have to run it against more than one file. If there is more than one file it returns the value of the last file in each iteration.
%macro reporting_import( full_file_route );
%PUT The Source file route is: &full_file_route;
%PUT ##############################################################;
PROC IMPORT datafile = "&full_file_route"
out = file_indicator_tmp
dbms = csv
replace;
datarow = 3;
RUN;
data file_indicator_tmp (KEEP= lbl);
set file_indicator_tmp;
if _N_ = 1;
lbl = "_410 - ACCOUNTS"n;
run;
proc sql noprint ;
select lbl
into :file_indicator
from file_indicator_tmp;
quit;
%PUT The Source Reporting period states: &file_indicator;
%PUT ##############################################################;
%mend;
This is where I execute the macro. Each Excel file's full route exists as a separate record in a dataset called "HELPERS.RAW_WAITLIST".
data _NULL_;
set HELPERS.RAW_WAITLIST;
call execute('%reporting_import('||filename||')');
run;
In the example I just ran, one file contains 01-JUN-2015 and the other 02-JUN-2015. But what the code writes to the log is:
The Source file route is: <route...>\FOO1.csv
##############################################################
The Source Reporting period states: Reporting Date:02-JUN-2015
##############################################################
The Source file route is: <route...>\FOO2.csv
##############################################################
The Source Reporting period states: Reporting Date:02-JUN-2015
##############################################################
Does anybody understand why this is happening? Or is there perhaps a better way to solve this?
UPDATE:
If I take the code out of the macro and run it manually for each input file, it works perfectly. So it must have something to do with the macro overwriting values.
CALL EXECUTE has tricky timing issues. When it invokes a macro, if that macro generates macro variables from data set variables, it's a good idea to wrap the macro call in %NRSTR(). That way CALL EXECUTE generates the macro call, but doesn't actually execute the macro. Without it, the macro's own statements (such as %PUT) run immediately, while the DATA and PROC steps it generates are queued until the calling DATA step ends, so &file_indicator is resolved before PROC SQL has set it. So try changing your CALL EXECUTE statement to:
call execute('%nrstr(%%)reporting_import('||filename||')');
I posted a much longer explanation here.
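A commonly seen alternative spelling masks the whole macro call instead of only the percent sign. The sketch below reuses the data set and macro names from the question; CATS is used here only to strip trailing blanks from filename, which is an addition not present in the original code.

```sas
data _null_;
    set HELPERS.RAW_WAITLIST;
    /* %NRSTR delays macro execution until after this DATA step has finished */
    call execute(cats('%nrstr(%reporting_import(', filename, '))'));
run;
```

With either form, each generated %reporting_import call runs in sequence after the DATA step, so the %PUT inside the macro sees the value that PROC SQL has just stored.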
I'm not too clear on the connections between your files. But instead of importing the CSV files and then searching for your string, couldn't you use a pipe command to save the results of a grep search on your CSV files to a dataset and then read in just the results?
Update:
I tried replicating your issue locally and it works for me if I set file_indicator with a call symput as below instead of your into :file_indicator:
data file_indicator_tmp (KEEP= lbl);
set file_indicator_tmp;
if _N_ = 1;
lbl = "_410 - ACCOUNTS"n;
data _null_ ;
set file_indicator_tmp ;
if _n_=1 then call symput('file_indicator',lbl) ;
run;
I have an Excel file where one of the columns is temperature (F). When I import it into SAS, the variable name is saved as temperature_F_, or, when I use the validvarname=any option, exactly as temperature (F). However, I now need to convert the data to Celsius. Whenever I use either variable name (i.e. temperature_F_ or temperature (F)), it does not work. For the second one, SAS treats temperature as a function. So what's the way around this?
The exact nature of your problem isn't clear, as temperature_F_ should be fine if you've imported under validvarname=v7.
data want;
set have;
temperature_c_ = (5/9)*((temperature_f_)-32);
run;
If you have to work with the validvarname=any; version, then you use named literals:
data want;
set have;
'temperature (C)'n = (5/9)*(('temperature (F)'n)-32);
run;
Name literals are similar to date literals (e.g., '01JAN2010'd), but for member, variable, and other names.
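A self-contained way to try this out is sketched below. The HAVE data set is made up for illustration, with two test values; the variable name, including the space before the parenthesis, has to match what the import actually created.

```sas
/* Allow variable names with spaces and special characters */
options validvarname=any;

/* Made-up test data: boiling and freezing points in Fahrenheit */
data have;
    'temperature (F)'n = 212; output;
    'temperature (F)'n = 32;  output;
run;

data want;
    set have;
    /* Name literals on both sides of the assignment */
    'temperature (C)'n = (5/9) * ('temperature (F)'n - 32);
run;

proc print data=want;
run;
```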