I'm attempting to build a loop in SAS to upload several files, and am running into a few issues to work through. Current code:
%Macro Weatherupload(File=, output=);
proc import datafile = &File;
out = &output;
dbms=dlm replace;
delimiter= ",";
getnames=yes;
guessingrows = 1000;
run;
%Mend Weatherupload;
%Macro WeatherPrepare(input=, output=);
data &output (keep=Wban_Number _YearMonthDay DewPoint Temp _Avg_Dew_Pt _Avg_Temp year month day);
set &input;
DewPoint = Input(compress(_Avg_Dew_Pt,"*"), 3.);
Temp = Input(compress(_Avg_Temp,"*"), 3.);
year = (_yearmonthday - mod(_yearmonthday, 10000))/10000;
month = ((_yearmonthday - mod(_yearmonthday, 100)) - (_yearmonthday - mod(_yearmonthday,10000)))/100;
day = mod(_yearmonthday, 100);
drop _Avg_Dew_Pt _Avg_Temp _YearMonthDay;
run;
%Mend WeatherPrepare;
data temperatures;
do i = 1999 to 2015;
do j = 1 to 12;
name = 'C:\Users\DILLON.SAXE\Documents\'||i||j||'.tar'||' \'||i||j||'daily.txt';
output = i||j||'weather';
final = i||j||'final';
%Weatherupload(File=name, output=output)
%WeatherPrepare(input=output, output=final)
end;
end;
run;
The goal is to run through several files, in several folders, listed in month + day + rest of title, and (at the moment) upload two variables of data from them. Later I will want to add in merging the files, and doing some more data work, but for the moment it's the macro issues and uploading that are holding it up.
Is there a way to either use proc upload in a loop, or use another data step in the loop?
I get the error "more positional variables than (something)" (I forget exact error, but it lists positional variables). I've tried adding and removing commas in the macros, but have not been able to get rid of this error. Any ideas?
I don't think you can call macros from inside a data step the way you have. I think you intended to use CALL EXECUTE.
data temperatures;
do i = 1999 to 2015;
do j = 1 to 12;
name = 'C:\Users\DILLON.SAXE\Documents\'||i||j||'.tar'||' \'||i||j||'daily.txt';
output = i||j||'weather';
final = i||j||'final';
call execute('%Weatherupload(File='||name||', output='||output||')');
call execute('%WeatherPrepare(input='||output||', output='||final||')');
end;
end;
run;
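One caveat with this pattern: CALL EXECUTE resolves macro calls immediately, while the data step is still running, and only queues the generated SAS statements for later. Wrapping the macro name in %nrstr defers the macro itself until after the step ends. The macros in the question contain no run-time macro logic, so this may not matter here. The sketch below uses a hypothetical macro named show purely to illustrate the quoting.

```sas
/* Hypothetical macro used only to demonstrate the call pattern */
%macro show(msg=);
  %put NOTE: show was called with &msg;
%mend show;

data _null_;
  do i = 1 to 3;
    /* %nrstr hides the macro call until the generated code is executed */
    /* after this data step finishes                                     */
    call execute(cats('%nrstr(%show)(msg=', i, ')'));
  end;
run;
```

The three NOTE lines should appear in the log after the data step completes rather than during it.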
Alternatively, assuming you're trying to read all files in a folder, I think you should create a list of file names in a data set, then use a data step with the filename option to input all the files at once instead. Here's a brief method on how to do it if they were all in a single folder: https://communities.sas.com/docs/DOC-10426
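As a rough sketch of the single-folder idea, a wildcard in the INFILE path together with the FILENAME= option can read every matching file in one data step. The folder C:\weather, the column order, and the column lengths below are assumptions for illustration, not the real layout of the weather files.

```sas
data all_weather;
  length current_file prev_file source_file $260;
  retain prev_file;
  /* Hypothetical folder; the wildcard matches every *daily.txt file in it */
  infile "C:\weather\*daily.txt" dlm=',' dsd truncover filename=current_file;
  input @;                               /* load the record and hold it */
  if current_file ne prev_file then do;  /* first record of each file is its header */
    prev_file = current_file;
    delete;
  end;
  /* Assumed column layout; adjust to the real files */
  input wban_number :$5. yearmonthday avg_temp :$8. avg_dew_pt :$8.;
  source_file = current_file;            /* FILENAME= variables are not written out */
  drop prev_file;
run;
```

The temperature columns are read as character here because the original macros strip "*" characters from them before converting to numbers.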
Here is a page that has code to get a list of files into a data set
http://www.sascommunity.org/wiki/Making_Lists
Since your macros have neither conditionals (%if) nor loops (%do),
I suggest you use them as parameterized %includes.
Here is a tool to read the list-of-files data set and call a program
http://www.sascommunity.org/wiki/Call_Execute_Parameterized_Include
Note: in proc import, always set guessingrows to the maximum value;
in v9.3 that is 2147483647.
Got it sorted out, based on the first answer. Eventual code:
%Macro Weatherupload(File=, output=);
proc import datafile = "&File"
out = &output
dbms=dlm replace;
delimiter= ",";
getnames=yes;
guessingrows = 1000;
run;
%Mend Weatherupload;
%Macro WeatherPrepare(input=, output=);
data &output;
set &input;
DewPoint = Input(compress(_Avg_Dew_Pt,"*"), 3.);
Temp = Input(compress(_Avg_Temp,"*"), 3.);
year = (_yearmonthday - mod(_yearmonthday, 10000))/10000;
month = ((_yearmonthday - mod(_yearmonthday, 100)) - (_yearmonthday - mod(_yearmonthday,10000)))/100;
day = mod(_yearmonthday, 100);
keep Wban_Number DewPoint Temp year month day;
run;
%Mend WeatherPrepare;
%Macro WeatherPrepare2(input=, output=);
data &output;
set &input;
DewPoint = Input(DewPoint, 3.);
Temp = Input(compress(_Avg_Temp,"*"), 3.);
year = (_yearmonthday - mod(_yearmonthday, 10000))/10000;
month = ((_yearmonthday - mod(_yearmonthday, 100)) - (_yearmonthday - mod(_yearmonthday,10000)))/100;
day = mod(_yearmonthday, 100);
Wban_Number = Wban;
keep Wban_Number DewPoint Temp year month day;
run;
%Mend WeatherPrepare2;
%Macro Append(merge=);
data temperatures;
set temperatures &merge;
run;
%Mend Append;
data temperatures;
do i = 1999 to 2015;
do j = 1 to 12;
jzero = put(j, z2.);
name = compress('C:\Users\DILLON.SAXE\Documents\'||i||jzero||'.tar'||'\'||i||jzero||'daily.txt');
name2 = compress('C:\Users\DILLON.SAXE\Documents\'||'QCLCD'||i||jzero||'\'||i||jzero||'daily.txt');
output = compress('weather'||i||j);
final = compress('final'||i||j);
if 100*i+j < 200708 then
do;
call execute('%Weatherupload(File='||name||', output='||output||')');
call execute('%WeatherPrepare(input='||output||', output='||final||')');
end;
else
do;
call execute('%Weatherupload(File='||name2||', output='||output||')');
call execute('%WeatherPrepare2(input='||output||', output='||final||')');
end;
call execute('%Append(merge='||final||')');
end;
end;
drop i j jzero name name2 output final;
run;
Related
I have many datasets, one per month, that share the same name apart from a month suffix. For instance, the datasets that I am referencing with this code:
TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
are called "TEMPCAAD.LIFT_MODEL_V1_202021", "TEMPCAAD.LIFT_MODEL_V1_202022" and so on...
I am trying to append all the datasets, but some of them don't exist, so when I run the following code I get the error
Dataset "TEMPCAAD.LIFT_MODEL_V1_202022" does not exist.
%let currentmonth = &anomes_scores;
%let previousyearmonth = &anomes_x12;
data _null_;
length string $1000;
cur_month = input("&previousyearmonth.01",yymmdd8.);
do until (cur_month > input("&currentmonth.01",yymmdd8.));
string = catx(' ',trim(string),'TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
cur_month = intnx('month',cur_month,1,'b');
end;
call symput('mydatasets',trim(string));
%put &mydatasets;
run;
data WORK.LIFTS_U6M;
set &mydatasets.;
run;
How can I append only existing datasets?
Instead of looping over every file to see whether it exists or not, why don't you just extract all the dataset names from dictionary.tables?
libname TEMPCAAD "/home/kermit/TEMPCAAD";
data tempcaad.lift_model_v1_202110 tempcaad.lift_model_v1_202111 tempcaad.lift_model_v1_202112;
id = 1;
output tempcaad.lift_model_v1_202110;
id = 2;
output tempcaad.lift_model_v1_202111;
id = 3;
output tempcaad.lift_model_v1_202112;
run;
%let nome_modelo = MODEL;
%let versao_modelo = V1;
proc sql;
select strip("TEMPCAAD."||memname) into :dataset separated by " "
from dictionary.tables
where libname="TEMPCAAD" and memname like "LIFT_&NOME_MODELO._&VERSAO_MODELO.%";
quit;
data want;
set &dataset.;
run;
You can easily tweak the where clause to extract only the datasets that you wish to append. Just remember to use double quotes if you reference a macro variable in it.
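If you would rather keep the month loop from the question, a possible variation is to test each candidate name with the EXIST function before adding it to the list. This is only a sketch: the two %let values for previousyearmonth and currentmonth are hypothetical stand-ins for &anomes_x12 and &anomes_scores.

```sas
%let nome_modelo = MODEL;
%let versao_modelo = V1;
%let previousyearmonth = 202101;  /* hypothetical start month */
%let currentmonth = 202112;       /* hypothetical end month   */

data _null_;
  length string $1000 dsname $41;
  cur_month = input("&previousyearmonth.01", yymmdd8.);
  do until (cur_month > input("&currentmonth.01", yymmdd8.));
    dsname = cats("TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._", put(cur_month, yymmn6.));
    /* Keep the name only when the dataset is actually there */
    if exist(dsname) then string = catx(' ', string, dsname);
    cur_month = intnx('month', cur_month, 1, 'b');
  end;
  call symputx('mydatasets', string);
run;

%put &=mydatasets;
```

If none of the datasets exist, mydatasets ends up empty, so the later SET statement would need a guard for that case.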
I have a dataset in SAS in which the months would be dynamically updated each month. I need to calculate the sum vertically each month and paste the sum below, as shown in the image.
Proc means/ proc summary and proc print are not doing the trick for me.
I was given the following code before:
%let month = month name;
%put &month.;
data new_totals;
set Final_&month. end=end;
&month._sum + &month._final;
/*feb_sum + &month._final;*/
output;
if end then do;
measure = 'Total';
&month._final = &month._sum;
/*Feb_final = feb_sum;*/
output;
end;
drop &month._sum;
run;
The problem is that this has all the months hardcoded, which I don't want. I am not too familiar with loops or arrays, so I need a solution for this, please.
It may be better to use a reporting procedure such as PRINT or REPORT to produce the desired output.
data have;
length group $20;
do group = 'A', 'B', 'C';
array month_totals jan2020 jan2019 feb2020 feb2019 mar2019 apr2019 may2019 jun2019 jul2019 aug2019 sep2019 oct2019 oct2019 nov2019 dec2019;
do over month_totals;
month_totals = 10 + floor(rand('uniform', 60));
end;
output;
end;
run;
ods excel file='data_with_total_row.xlsx';
proc print noobs data=have;
var group ;
sum jan2020--dec2019;
run;
proc report data=have;
columns group jan2020--dec2019;
define group / width=20;
rbreak after / summarize;
compute after;
group = 'Total';
endcomp;
run;
ods excel close;
Data structure
The data sets you are working with are 'difficult' because the date aspect of the data is actually in the metadata, i.e. the column name. An even better approach, in SAS, is to have categorical data with columns
group (categorical role)
month (categorical role)
total (continuous role)
Such data can be easily filtered with a where clause, and reporting procedures such as REPORT and TABULATE can use the month variable in a class statement.
Example:
data have;
length group $20;
do group = 'A', 'B', 'C';
do _n_ = 0 by 1 until (month >= '01feb2020'd);
month = intnx('month', '01jan2018'd, _n_);
total = 10 + floor(rand('uniform', 60));
output;
end;
end;
format month monyy5.;
run;
proc tabulate data=have;
class group month;
var total;
table
group all='Total'
,
month='' * total='' * sum=''*f=comma9.
;
where intck('month', month, '01feb2020'd) between 0 and 13;
run;
proc report data=have;
column group (month,total);
define group / group;
define month / '' across order=data ;
define total / '' ;
where intck('month', month, '01feb2020'd) between 0 and 13;
run;
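To move from a wide layout (one column per month) to the categorical layout described above, PROC TRANSPOSE is one option. The sketch below is illustrative only: have_wide and its three month columns are made-up sample data, and it assumes column names of the form monYYYY.

```sas
/* Made-up wide data: one numeric column per month */
data have_wide;
  length group $20;
  input group $ jan2019 feb2019 mar2019;
  datalines;
A 10 20 30
B 40 50 60
;
run;

/* One row per group and month; the old column name lands in month_name */
proc transpose data=have_wide out=have_long(rename=(col1=total)) name=month_name;
  by group;
  var jan2019 feb2019 mar2019;
run;

/* Turn the old column name into a real SAS date */
data have_long;
  set have_long;
  month = input(cats('01', month_name), date9.);
  format month monyy7.;
  drop month_name;
run;
```

After this step, have_long has the group, month and total columns that the TABULATE and REPORT examples expect.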
Here is a basic way. Borrowed sample data from Richard.
data have;
length group $20;
do group = 'A', 'B';
array months jan2020 jan2019 feb2020 feb2019 mar2019 apr2019 may2019 jun2019 jul2019 aug2019 sep2019 oct2019 oct2019 nov2019 dec2019;
do over months;
months = 10 + floor(rand('uniform', 60, 1));
end;
output;
end;
run;
proc summary data=have;
var _numeric_;
output out=temp(drop=_:) sum=;
run;
data want;
set have temp (in=t);
if t then group='Total';
run;
I have some SAS code (SQL) that has to be repeated 25 times, once for each month/year combination (see code below). How can I use a macro in this code?
proc sql;
create table hh_oud_AUG_17 as
select hh_key
,sum(RG_count) as RG_count_aug_17
,case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht_aug_17
from basis_RG_oud
where valid_from_dt <= "01AUG2017"d <= valid_to_dt
group by hh_key
order by hh_key
;
quit;
proc sql;
create table hh_oud_SEP_17 as
select hh_key
,sum(RG_count) as RG_count_sep_17
,case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht_sep_17
from basis_RG_oud
where valid_from_dt <= "01SEP2017"d <= valid_to_dt
group by hh_key
order by hh_key
;
quit;
If you use a data step to do this, you can put all the desired columns in the same output dataset rather than using a macro to create 25 separate datasets:
/*Generate lists of variable names*/
data _null_;
stem1 = "RG_count_";
stem2 = "loyabo_recht_";
month = '01aug2017'd;
length suffix $4 vlist1 vlist2 $1000;
do i = 0 to 24;
suffix = put(intnx('month', month, i, 's'), yymmn4.);
vlist1 = catx(' ', vlist1, cats(stem1,suffix));
vlist2 = catx(' ', vlist2, cats(stem2,suffix));
end;
call symput("vlist1",vlist1);
call symput("vlist2",vlist2);
run;
%put vlist1 = &vlist1;
%put vlist2 = &vlist2;
/*Produce output table*/
data want;
if 0 then set have;
start_month = '01aug2017'd;
array rg_count[2, 0:24] &vlist1 &vlist2;
do _n_ = 1 by 1 until(last.hh_key);
set basis_RG_oud;
by hh_key;
do i = 0 to hbound2(rg_count);
if valid_from_dt <= intnx('month', start_month, i, 's') <= valid_to_dt
then rg_count[1,i] = sum(rg_count[1,i],1);
end;
end;
do _n_ = 1 to _n_;
set basis_RG_oud;
do i = 0 to hbound2(rg_count);
rg_count[2,i] = rg_count[1,i] >= 2;
end;
end;
run;
Create a second data set that enumerates (is a list of) the months to be examined. Cross Join the original data to that second data set. Create a single output table (or view) that contains the month as a categorical variable and aggregates based on that. You will be able to by-group process, classify or subset based on the month variable.
data months;
do month = '01jan2017'd to '31dec2018'd;
output;
month = intnx ('month', month, 0, 'E');
end;
format month monyy7.;
run;
proc sql;
create table want as
select
month, hh_key,
sum(RG_count) as RG_count,
case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht
from
basis_RG_oud
cross join
months
where
valid_from_dt <= month <= valid_to_dt
group
by month, hh_key
order
by month, hh_key
;
…
/* Some analysis */
BY MONTH;
…
/* Some tabulation */
CLASS MONTH;
TABLE … MONTH …
WHERE year(month) = 2018;
I am trying to delete files that are more than 3 years old from a folder. But when I run the code, it also deletes other files whose names are not in the format I am targeting. The file names look like SFRE_BIL_SIT_20160812_134317_PAM_FILES1.zip. My code is below.
options mlogic;
%macro delete_year_files_in_folder(folder);
filename filelist "&folder";
data _null_;
dir_id = dopen('filelist');
total_members = dnum(dir_id);
do i = 1 to total_members;
member_name = dread(dir_id,i);
datestring = scan(member_name,4,'_');
month = input(substr(datestring,5,2),best.);
day = input(substr(datestring,5,2),best.);
year = input(substr(datestring,1,4),best.);
date = mdy(month, day, year);
if intnx('year', today(),-3,'S') > date %put _all_;
then do;
file_id = mopen(dir_id,member_name,'i',0);
if file_id > 0 then do;
freadrc = fread(file_id);
rc = fclose(file_id);
rc = filename('delete',member_name,,,'filelist');
rc = fdelete('delete');
end; %put _all_;
rc = fclose(file_id);
end;
end;
rc = dclose(dir_id);
run;
%mend;
I can see at least one bug in your code that might be causing unexpected behaviour:
month = input(substr(datestring,5,2),best.);
day = input(substr(datestring,5,2),best.);
I think you meant to type:
day = input(substr(datestring,7,2),best.);
I wouldn't do this, though - it's quicker to use date informats to do this:
date = input(datestring,yymmdd8.);
However, I think the bigger problem is with this line:
if intnx('year', today(),-3,'S') > date then do; /*Deletion logic follows*/
If you have a file that you don't want to delete that isn't in the same format, it's likely that date will have a missing value, as it won't have a date in the place where you're looking and the input functions earlier on will return missing values. In SAS, numeric missing values are less than any non-missing numeric value, so this condition will evaluate to true except for files with names in the format that you want to delete that are less than 3 years old.
You can avoid missing values fairly easily by tweaking your code like so:
if intnx('year', today(),-3,'S') > date and not(missing(date)) then do;
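The combined check can be tried out without touching any files. The sketch below runs it against two sample names, one taken from the question and one made up (readme.txt), and only prints the result. The ?? modifier suppresses invalid-data notes when the fourth token is not a date.

```sas
data _null_;
  length member_name $256 datestring $32;
  /* One name in the expected format and one hypothetical name that is not */
  do member_name = 'SFRE_BIL_SIT_20160812_134317_PAM_FILES1.zip', 'readme.txt';
    datestring = scan(member_name, 4, '_');
    date = input(datestring, ?? yymmdd8.);  /* missing when no valid date is found */
    old_enough = not missing(date) and date < intnx('year', today(), -3, 'S');
    put member_name= date= date9. old_enough=;
  end;
run;
```

Only names for which old_enough is 1 would then be passed on to the deletion logic.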
So I have this code that works well for one year, but I need to convert it into a loop so it works for the years from 1970 to 2015.
Here is the code for 1 year that I specify in a %let statement.
%let year=1970;
rsubmit;
data home.historical_returns_&year;
set home.crspdata;
where (year <= &year - 1) and (year >= &year - 5);
returns_count + 1;
by id year;
if first.id or missing(tot_ret) then returns_count = 1;
run;
endrsubmit;
So far, that code works great for me. Now, I am trying to use a loop so I do it for year 1970 to 2015.
I came up with this, which looks like it should work, but the year stays at 1970.
%macro GMV;
rsubmit;
%do year=1970 %to 2015;
data home.historical_returns_&year;
set home.crspdata;
where (year <= &year - 1) and (year >= &year - 5);
returns_count + 1;
by id year;
if first.id or missing(tot_ret) then returns_count = 1;
run;
%end;
endrsubmit;
%mend GMV;
%GMV
In the log, I see that the &year in the name never actually changes from 1970 to 1971 to 1972 and so on. So I do not end up with the 46 different datasets that I need.
Anybody ever had this problem?
Thank you!
You're mixing up remote processing with local processing in a way that's going to cause problems like this. Your macro variable won't be updated (and I'm a bit surprised it's not throwing an error about the %do loop, personally).
rsubmit;
%macro GMV;
%do year=1970 %to 2015;
data home.historical_returns_&year;
set home.crspdata;
where (year <= &year - 1) and (year >= &year - 5);
returns_count + 1;
by id year;
if first.id or missing(tot_ret) then returns_count = 1;
run;
%end;
%mend GMV;
%GMV
endrsubmit;
Put the whole macro in the rsubmit to get the result you're looking for - or put the whole rsubmit in the macro (not as good of an idea in my opinion, though Tom in comments notes that it might be the safer option in some cases).
If you want to reference a macro variable in the code that you RSUBMIT then the macro variable needs to exist in the remote session.
%macro GMV(start,end);
%local year;
%do year=&start %to &end;
%syslput year=&year;
rsubmit;
data home.historical_returns_&year;
set home.crspdata;
by id year;
where (year <= &year - 1) and (year >= &year - 5);
returns_count + 1;
if first.id or missing(tot_ret) then returns_count = 1;
run;
endrsubmit;
%end;
%mend GMV;
%GMV(1970,2015);