SAS - referring to a folder based on date

In my SAS code I would like to refer to an existing folder based on the date value (LADE_DATUM) that I declare with a prompt. From this date I define LADE_JAHR and LADE_MONAT:
%let LADE_JAHR = %sysfunc(year("&LADE_DATUM"D));
%let LADE_MONAT = %sysfunc(month("&LADE_DATUM"D));
Based on these two variables I would like to refer to some existing folders (importpfad) that look like:
2020-09, 2020-10, 2020-11, 2020-12, etc.
This is the code:
data _null_;
if &lade_monat < 10 then a = '0'; else a = '';
call symput('a',a);
%let importpfad = /folderx/Input_Files/&lade_jahr/&lade_jahr.-&a.&lade_monat/;
The problem is, if a = '' then the folder it refers to looks like "2020- 10" instead of "2020-10".
So there is a space between that I don't want to have.
If a is between 1 and 9, everything is OK.

Just tell %SYSFUNC() you want the month number generated with the Z format instead of the default best format.
%let LADE_MONAT = %sysfunc(month("&LADE_DATUM"D),Z2.);
%let importpfad = /folderx/Input_Files/&lade_jahr/&lade_jahr.-&lade_monat/;
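As a quick check of the Z2. approach, here is a minimal sketch. The date 05SEP2020 is a made-up stand-in for the prompt value, and LADE_PERIODE is a hypothetical name. The YYMMD7. line is an alternative that is not part of the answer above; it assumes the folder name is always year, dash, two-digit month.

```sas
/* Hypothetical prompt value; in the real job this comes from the prompt. */
%let LADE_DATUM = 05SEP2020;

%let LADE_JAHR  = %sysfunc(year("&LADE_DATUM"d));
/* Z2. pads the month with a leading zero, so no helper variable is needed. */
%let LADE_MONAT = %sysfunc(month("&LADE_DATUM"d), z2.);
%let importpfad = /folderx/Input_Files/&lade_jahr/&lade_jahr.-&lade_monat/;
%put &=importpfad;

/* Alternative sketch: let the YYMMD7. format build the yyyy-mm text in one step. */
%let LADE_PERIODE = %sysfunc(putn("&LADE_DATUM"d, yymmd7.));
%put &=LADE_PERIODE;
```

With this sample date the first %put should show the path ending in 2020/2020-09/. The original blank came from CALL SYMPUT storing the one-character blank value of a, which the Z2. format avoids entirely.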

Related

Remove single quotes in list of values in macro variable

I have a project with multiple programs. Each program has a proc SQL statement which will use the same list of values for a condition in the WHERE clause; however, the column type of one database table needed is a character type while the column type of the other is numeric.
So I have a list of "Client ID" values I'd like to put into a macro variable as these IDs can change, and I would like to change them once in the variable instead of in multiple programs.
For example, I have this macro variable set up like so and it works in the proc SQL which queries the character column:
%let CLNT_ID_STR = ('179966', '200829', '201104', '211828', '264138');
Proc SQL part:
...IN &CLNT_ID_STR.
I would like to create another macro variable, say CLNT_ID_NUM, which takes the first variable (CLNT_ID_STR) but removes the quotes.
Desired output: (179966, 200829, 201104, 211828, 264138)
Proc SQL part: ...IN &CLNT_ID_NUM.
I've tried using the sysfunc, dequote and translate functions but have not figured it out.
TRANSLATE doesn't seem to want to allow a null string as the replacement.
Below uses TRANSTRN, which has no problem translating single quote into null:
%let CLNT_ID_STR = ('179966', '200829', '201104', '211828', '264138');
%let want=%sysfunc(transtrn(&clnt_id_str,%str(%'),%str())) ;
%put &want ;
(179966, 200829, 201104, 211828, 264138)
It uses the macro quoting function %str() to mask the meaning of a single quote.
Three other ways to remove single quotes are COMPRESS, TRANSLATE and PRXCHANGE:
%let CLNT_ID_STR = ('179966', '200829', '201104', '211828', '264138');
%let id_list_1 = %sysfunc(compress (&CLNT_ID_STR, %str(%')));
%let id_list_2 = %sysfunc(translate(&CLNT_ID_STR, %str( ), %str(%')));
%let id_list_3 = %sysfunc(prxchange(%str(s/%'//), -1, &CLNT_ID_STR));
%put &=id_list_1;
%put &=id_list_2;
%put &=id_list_3;
----- LOG -----
ID_LIST_1=(179966, 200829, 201104, 211828, 264138)
ID_LIST_2=( 179966 , 200829 , 201104 , 211828 , 264138 )
ID_LIST_3=(179966, 200829, 201104, 211828, 264138)
It does not matter that TRANSLATE replaces each ' with a single blank, because the values are read as numeric literals in the IN list and the extra blanks are ignored.
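To show the derived list in use, here is a small sketch that defines the quoted list once, derives the numeric list from it with COMPRESS, and filters a numeric column. The table work.clients_num and its rows are invented for illustration.

```sas
%let CLNT_ID_STR = ('179966', '200829', '201104', '211828', '264138');
/* Derive the numeric list once from the quoted list. */
%let CLNT_ID_NUM = %sysfunc(compress(&CLNT_ID_STR, %str(%')));

/* Hypothetical table with a numeric ID column, for illustration only. */
data work.clients_num;
  input clnt_id;
  datalines;
179966
123456
264138
;
run;

proc sql;
  select clnt_id
    from work.clients_num
    where clnt_id in &CLNT_ID_NUM.;
quit;
```

Only the two IDs that appear in the list should be selected; the IDs then need to be maintained in a single %let statement.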

SAS call symput in data step

I am facing this issue with sas data step. My requirement is to get a list of variables such as
total_jun2018 = sum(jun2018, dep_jun2018);
total_jul2018 = sum(jul2018, dep_jul2018);
Data final4;
set final3;
by hh_no;
do i=0 to &tot_bal_mnth.;
bal_mnth = put(intnx('month',"&min_Completed_dt."d, i-1), monyy7.);
call symputx('bal_mnth', bal_mnth);
&bal_mnth._total=sum(&bal_mnth., Dep_&bal_mnth.);
output;
end;
But I get an error that the macro variable bal_mnth is not resolved. It did run successfully once, but I want the statements to be generated for every month in sequence; instead only the last iteration (i=6) takes effect, giving only Total_DEC2018=sum(DEC2018, DEP_DEC2018);
Any help will be appreciated!
Thanks,
Ajay
This is a common issue when learning SAS Macro. The problem is that the macro processor needs to resolve &bal_mnth to a value when the data step is first submitted for execution, but the CALL SYMPUT doesn't execute until the data step is actually executed, so at the time you submit the code, there is no value available for &bal_mnth.
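The timing can be seen in a minimal sketch like the following, where demo_mnth is a hypothetical macro variable name. The value set by CALL SYMPUTX can only be referenced once the step that sets it has finished running.

```sas
data _null_;
  /* Runs at execution time: the macro variable gets its value only when this step runs. */
  call symputx('demo_mnth', 'JUN2018');
run;

/* Resolved after the step above has finished, so the value is available here. */
%put &=demo_mnth;
```

A reference to &demo_mnth placed inside that same data step would be resolved while the step is still being compiled, before CALL SYMPUTX has executed.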
In this case you don't need bal_mnth to be created as a variable in the data set, so you could replace the line that starts bal_mnth = put(intnx(...)) with a %let bal_mnth = ... statement. The %let executes while the data step is being submitted, so that way its value will be available when you need it.
My proposed %let statement will need to wrap the functions in at least one SYSFUNC call, which is left as an exercise for the reader :-)
It looks like you want to generate a series of assignment statements like:
total_jun2018 = sum(jun2018, dep_jun2018);
total_jul2018 = sum(jul2018, dep_jul2018);
...
total_jan2019 = sum(jan2019, dep_jan2019);
This is known as wallpaper code.
If your variable names were easier, such as dep1 to dep18, then it would be easy to use arrays to process the data. With your current naming convention the problem with generating the array statements is not much different than the problem of generating a series of assignment statements.
You can create a macro so that you could use a %DO loop to generate your wallpaper code.
%local i bal_mnth;
%do i=0 %to &tot_bal_mnth.;
%let bal_mnth = %sysfunc(intnx(month,"&min_Completed_dt."d, &i-1), monyy7.);
total_&bal_mnth = sum(&bal_mnth , Dep_&bal_mnth );
%end;
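Since %LOCAL and %DO are only valid inside a macro definition, the loop above has to be wrapped in one. A minimal sketch follows; the macro name gen_totals and the one-row final3 data set are made up for illustration, and %eval is used here only to make the arithmetic on &i explicit.

```sas
%let tot_bal_mnth = 1;
%let min_Completed_dt = 01JUN2018;

/* Hypothetical macro name; the body is the %DO loop shown above. */
%macro gen_totals;
  %local i bal_mnth;
  %do i = 0 %to &tot_bal_mnth.;
    %let bal_mnth = %sysfunc(intnx(month, "&min_Completed_dt."d, %eval(&i - 1)), monyy7.);
    total_&bal_mnth = sum(&bal_mnth, Dep_&bal_mnth);
  %end;
%mend gen_totals;

/* Made-up input with the columns the generated statements expect. */
data final3;
  hh_no = 1;
  MAY2018 = 10; Dep_MAY2018 = 1;
  JUN2018 = 20; Dep_JUN2018 = 2;
run;

data final4;
  set final3;
  by hh_no;
  /* Expands to one assignment statement per month. */
  %gen_totals
run;
```

The macro call is placed inside the data step, so the generated assignment statements become part of that step when it is compiled.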
Or you could just generate the code to a file with a data step.
%let tot_bal_mnth = 7;
%let min_Completed_dt=01JUN2018;
filename code temp;
data _null_;
file code;
length bal_mnth $7 ;
do i=0 to &tot_bal_mnth.;
bal_mnth = put(intnx('month',"&min_Completed_dt."d, i-1), monyy7.);
put 'total_' bal_mnth $7. ' = sum(' bal_mnth $7. ', Dep_' bal_mnth $7. ');';
end;
run;
So the generated file of code looks like this:
total_MAY2018 = sum(MAY2018, Dep_MAY2018);
total_JUN2018 = sum(JUN2018, Dep_JUN2018);
total_JUL2018 = sum(JUL2018, Dep_JUL2018);
total_AUG2018 = sum(AUG2018, Dep_AUG2018);
total_SEP2018 = sum(SEP2018, Dep_SEP2018);
total_OCT2018 = sum(OCT2018, Dep_OCT2018);
total_NOV2018 = sum(NOV2018, Dep_NOV2018);
total_DEC2018 = sum(DEC2018, Dep_DEC2018);
You can then use %include to run it in your data step.
data final4;
set final3;
by hh_no;
%include code / source2 ;
run;
I would like to offer another point of view: the difficulty you are having here results from the use of a wide data shape, with lots of columns.
Rather than working with your data in this shape, you could first transpose from wide to long, so that instead of having lots of total_xxx columns you just have 3: total, total_dep and date, with one row per month. Once it's in this format, it will be much easier to work with, potentially allowing you to avoid resorting to macros and wallpaper code.
Suggested reading:
Transpose wide to long with dynamic variables

Stop SAS from trying to resolve & reference within a macro string

I'm running a process that lists jobs I want to check the modification date on. I list the jobs in a dataset and then pass these to macro variables with a number.
e.g.
Data List_Prep;
Format Folder
Code $100.;
Folder = 'C:\FilePath\Job ABC'; Code = '01 Job Name.sas'; Output;
Folder = 'C:\FilePath\Job X&Y'; Code = '01 Another Job.sas'; Output;
Run;
%Macro List_Check();
Data List;
Set List_Prep;
Job + 1;
Call Symput (Cats("Folder", Job), Strip(Folder));
Call Symput (Cats("Code", Job), Strip(Code));
Run;
%Put Folder1 = &Folder1;
%Put Folder2 = &Folder2;
%MEnd;
%List_Check;
It prints the %Put statement just fine for folder 1, but folder 2 doesn't work right.
Folder1 = C:\FilePath\Job ABC
WARNING: Apparent symbolic reference Y not resolved.
Folder2 = C:\FilePath\Job X&Y
When I then go into a loop to check the datasets, it works again (it looks for Folder1, Code1, etc.), but I still get the warnings.
How can I stop these warnings? I've tried %Str("&") instead, but still get the issue.
The %superq() macro function is a great way to mask macro triggers that are already in a macro variable. You could either remember to quote the values when using them,
%put Folder1 = %superq(Folder1) ;
or you could adjust your process to quote them right after creating them.
data List_Prep;
length Folder Code $100;
Folder = 'C:\FilePath\Job ABC'; Code = '01 Job Name.sas'; Output;
Folder = 'C:\FilePath\Job X&Y'; Code = '01 Another Job.sas'; Output;
run;
data List;
set List_Prep;
Job + 1;
length dummy $200 ;
call symputx(cats("Folder", Job), Folder);
dummy = resolve(catx(' ','%let',cats("Folder", Job),'=%superq(',cats("Folder", Job),');'));
call symputx(cats("Code", Job), Code);
dummy = resolve(catx(' ','%let',cats("Code", Job),'=%superq(',cats("Code", Job),');'));
drop dummy;
run;
P.S. Don't use FORMAT to define variables. Use statements like LENGTH or ATTRIB that are designed for defining variables. FORMAT is for attaching formats to variables, not for defining them. The only reason that using FORMAT worked is that it had the side effect of SAS defining the variable's type and length to match the format that you attached to it, because it was the first place you referenced the variable in the data step.
You can prevent SAS from trying to resolve the ampersand in the value by using the %superq function
%put Folder2 = %superq(Folder2);

Incrementing and evaluating dates in SAS

I have the following variables:
%let curr_score_date = '31DEC2013'D;
%let target_date = %sysfunc(intnx(month,&curr_score_date,12,e));
%let prod_start_date = %sysfunc(intnx(month,&curr_score_date,-11,b));
%let prod_end_date = %sysfunc(intnx(month,&curr_score_date,0,e));
If evaluating based on this documentation, I evaluate each statement on its own to:
target_date = '31Dec2014'
prod_start_date = '01Jan2013'
prod_end_date = '31Dec2013'
However, I am wondering if each step just returns a value, or actually updates &curr_score_date. If it was updated at each calculation, this would certainly affect the results.
In SAS, functions return values (and cannot change their arguments), while call routines can change their arguments.
As such, in the above, &curr_score_date cannot be changed by use of %sysfunc.
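This can be checked with a short sketch. The date9. format argument is added here only so that the log is readable (without it, %sysfunc returns the date as a plain number), so these formatted values are meant for display rather than for further date arithmetic.

```sas
%let curr_score_date = '31DEC2013'D;

/* The optional second argument of %SYSFUNC formats the returned number. */
%let target_date     = %sysfunc(intnx(month, &curr_score_date, 12, e), date9.);
%let prod_start_date = %sysfunc(intnx(month, &curr_score_date, -11, b), date9.);
%let prod_end_date   = %sysfunc(intnx(month, &curr_score_date, 0, e), date9.);

/* curr_score_date still holds the original literal after all three calls. */
%put &=curr_score_date;
%put &=target_date &=prod_start_date &=prod_end_date;
```

The log should show 31DEC2014, 01JAN2013 and 31DEC2013, with curr_score_date unchanged, because each call starts from the same original date.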

Extract "dynamic" part from SAS data-set

I am unsure if this is possible (or stupid question), as I just started looking at SAS last week. I've managed to import my .CSV file to a SAS data set using the:
proc import
Specifying the guessingrows= to limit my out=.
My problem is now that my CSV files to import are not of same structure, which I noticed after writing some code using the obsnum= to specify start and x-lines to read.
So my question is whether SAS is capable of looking for a specific string or an empty variable and using it as the end observation.
My Data looks like (but number of Var_x varies for each file):
First I tried looking at the slice= but is only useful if I know the exact Places of interest, as the empty Space between the Groups can vary.
Is it possible to use the set function to specify to start at line 1 and read until encountering a blank field? Or can you redirect me to some function (that I couldn't find myself)?
I would like to look at each "block" separately and process.
Thank you in advance
I think you can do this in a relatively straightforward way if you are comfortable doing some processing after all the data has been inputted.
So do proc import on the whole dataset with no restriction.
Then use a data step and a counter to process through the data and output as necessary. Something like:
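data output1 output2 output3;
set imported_data;
/* Keep the counter across rows; without RETAIN it is reset to missing on every row. */
retain counter 1;
var1lag = lag(var1);
/* A blank row right after a non-blank row marks the start of the next block. */
if var1 = '' and var1lag ne '' then counter = counter + 1;
if counter = 1 then output output1;
else if counter = 2 then output output2;
else output output3;
run;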
data output1;
set output1;
if var1 = '' and var2 = . and var3 = . then delete;
run;
data output2;
set output2;
if var1 = '' and var2 = . and var3 = . then delete;
run;
data output3;
set output3;
if var1 = '' and var2 = . and var3 = . then delete;
run;
The above code outputs to three datasets based on the value of counter. The lag function lets us look at the previous row, so the counter is incremented only on the first blank row after a row with data.
Then we go back and remove any fully blank data for our datasets.
You could easily use some arrays to make this more scalable if you have many outputs, instead of the if/else statements that output the data.