An efficient way to update multiple columns using join - sas

I am really struggling with this, guys.
The table that needs to be updated has ~15M rows and ~200 columns.
I need to update a few columns using a work table.
This is (partly) what I need to do:
%macro condition;
%if &row_count>0 %then %do;
data _null_;
set W4TWGKJ6 end=final;
if _n_ = 1 then call execute("proc sql ;");
call execute
("update dds.insurance_policy set X_STORNO_BY_VERSION="||TOSNUM||" where policy_no='"||cats(polid)||"' and X_INSURANCE_PRODUCT_CD='"||cats(prodid)||"'
and X_INSURER_SERIAL_NO = "||X_INSURER_SERIAL_NO||" and x_source_system_cd ="||'"5"'||" and x_source_system_category_cd ="||'"5"'||" and x_current_ind = "||'"Y"'||";,
update dds.insurance_policy set STATUS_CHANGE_DT="||ISSUE_DT||" where policy_no='"||cats(polid)||"' and X_INSURANCE_PRODUCT_CD='"||cats(prodid)||"'
and X_INSURER_SERIAL_NO = "||X_INSURER_SERIAL_NO||" and x_source_system_cd ="||'"5"'||" and x_source_system_category_cd ="||'"5"'||" and x_current_ind = "||'"Y"'||";");
if final then call execute('quit;'); run;
%end;
%mend;
%condition;
I first check if there are rows in table (&row_count)
if there are,
I update 2 columns (I need to update 5, I just cut them from the example)
using a work table called W4TWGKJ6.
This update takes forever.
In fact, I stopped the process every single time, as it worked for hours without returning anything....
Does anyone know a better solution for this problem?
Thanks in advance,
Gal.

I'd suggest using the MODIFY statement in a data step.
The BY variables must have the same names in both tables, and both tables should be sorted by those variables.
data dds.insurance_policy;
modify
dds.insurance_policy
W4TWGKJ6 (keep= POLICY_NO X_INSURER_SERIAL_NO /* key variables */
X_STORNO_BY_VERSION STATUS_CHANGE_DT /* ... other variables from source to update target */
)
updatemode=nomissingcheck;
by POLICY_NO X_INSURER_SERIAL_NO;
if _iorc_ = %sysrc(_SOK) then do;
* Update row ;
replace;
end;
else _error_ = 0;
run;
See SAS: How not to overwrite a dataset when the "where" condition in a "Modify" statement does not hold? for a complete reference of _IORC_ return values.
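The MODIFY step above needs the work table to use the target's column names and to be sorted by the BY variables. A minimal sketch of that preparation follows. It assumes the work table holds the keys as polid and prodid and the new values as TOSNUM and ISSUE_DT, as in the question, and that their types already match the target columns (if polid is numeric and POLICY_NO is character, a conversion step would be needed first). The output name work.updates is hypothetical.

```sas
/* Assumption: column names are taken from the question; work.updates is a made-up name. */
proc sort data=W4TWGKJ6
          out=work.updates (rename=(polid=POLICY_NO
                                    prodid=X_INSURANCE_PRODUCT_CD
                                    TOSNUM=X_STORNO_BY_VERSION
                                    ISSUE_DT=STATUS_CHANGE_DT));
  /* The BY statement uses the input names; the rename applies only to the output. */
  by polid prodid X_INSURER_SERIAL_NO;
run;
```

If the product code is part of the key, as it is in the question's WHERE clauses, it would also have to appear in the KEEP= list and the BY statement of the MODIFY step, with work.updates used in place of W4TWGKJ6.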

Related

How to create several tables from one table using Loops in SAS?

I have a table with observations from the date 01.08.2016 to 30.08.2016.
How to create 12 tables in the following way:
the first one contains observations from the date 01.08.2016 to 20.08.2016;
the second one contains observations from the date 01.08.2016 to 21.08.2016;
...
the 12th one contains observations from the date 01.08.2016 to 30.08.2016.
I think it is better to do this with loops, but I don't know how.
This assumes that the date is in SAS date format. You can use character comparison if your date is in character format.
The data vector still contains the observation after the output statement is executed. So as long as the condition is true, the data step will write the same observation to multiple datasets. Also, I think you will need the date comparisons till 31st August if you want 12 datasets.
data want1 want2 want3 ... want12;
set have;
if date <= '20AUG2016'd then output want1;
if date <= '21AUG2016'd then output want2;
if date <= '22AUG2016'd then output want3;
.
.
.
if date <= '31AUG2016'd then output want12;
run;
It is probably better to use WHERE statements than to make separate tables. But to do either without hardcoding you need to use code generation. That is normally done using macro logic.
%macro split(start,stop);
%local i n;
%let n=%sysfunc(intck(day,&start,&stop));
%let n=%eval(&n+1);
DATA
%do i=1 %to &n;
WANT&i
%end;
;
set have ;
%do i=1 %to &n ;
if date <= %sysfunc(intnx(day,&start,&i-1)) then output WANT&i ;
%end;
run;
%mend split;
%split('20AUG2016'd,'31AUG2016'd);
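If the 12 tables are only needed as inputs to later steps, the WHERE approach mentioned above avoids creating them at all. A minimal sketch, assuming HAVE has a numeric SAS date variable named DATE; PROC MEANS is only a stand-in for whatever analysis is actually run on each subset:

```sas
/* Assumption: DATE is a numeric SAS date; PROC MEANS is a placeholder analysis. */
proc means data=have;
  where date <= '21AUG2016'd;   /* the same rows WANT2 would contain */
run;
```

The same WHERE statement could be generated inside a %DO loop in place of the OUTPUT statements.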

Insert text into all cells of first column in a sas dataset

I've output 'Moments' from Proc Univariate to datasets. Many.
Example: Moments_001.sas7bdat through to Moments_237.sas7bdat
For the first column of each dataset (new added first column, and probably new dataset, as opposed to the original) I would like to have a particular text in every cell going down to bottom row.
The exact text would be the name of the respective dataset file: say, "Moments_001".
I do not have to 'grab' the filename, per se, if that's not possible. As I know what the names are already, I can put that text into the procedure. However, grabbing the filenames, if possible, would be easier from my standpoint.
I'd greatly appreciate any help anyone could provide to accomplish this.
Thanks,
Nicholas Kormanik
Are you looking for the INDSNAME option of the SET statement? You need to define two variables because the one generated by the option is automatically dropped.
data want;
length moment dsn $41 ;
set Moments_001 - Moments_237 indsname=dsn ;
moment=dsn;
run;
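The INDSNAME variable holds the two-level name in upper case, for example WORK.MOMENTS_001. If only the member name is wanted, a variant along these lines should work; it assumes the data sets live in a single library, so the part after the dot is enough to identify them:

```sas
data want;
  /* Declaring MOMENT before SET makes it the first column. */
  length moment $32 dsn $41;
  set Moments_001 - Moments_237 indsname=dsn;
  moment = scan(dsn, 2, '.');   /* WORK.MOMENTS_001 -> MOMENTS_001 */
run;
```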
I think something along these lines should be what you're after. Assuming you have a list of moments, you can loop through it and add a new variable as the first column of each dataset.
%let list_of_moments = moments_001 moments_002 ... moments_237;
%macro your_macro;
%do i = 1 %to %sysfunc(countw(&list_of_moments.));
%let this_moment = %scan(&list_of_moments., &i.);
data &this_moment._v2;
retain new_variable;
set &this_moment.;
new_variable = "&this_moment.";
run;
%end;
%mend your_macro;
%your_macro;
The brute force entering of text into column 1 looks like this:
data moments_001;
length text $ 16;
set moments_001;
text="Moments_001";
run;
You could also write a macro that would loop through all 237 data sets and insert the text.
UNTESTED CODE
%macro do_all;
%do i=1 %to 237;
%let num = %sysfunc(putn(&i,z3.));
data moments_&num;
length text $ 16;
set moments_&num;
text="Moments_&num";
run;
%end;
%mend;
%do_all
It seems to me (not knowing your problem) that if you use PROC UNIVARIATE with a BY statement, then you wouldn't need 237 different data sets. All of your output would be in one data set, and the BY variable would also be in that data set. Does that solve your problem?
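A minimal sketch of that BY-group approach, assuming the 237 analyses come from one input table. The names have, group and x are hypothetical; Moments is the ODS table that PROC UNIVARIATE produces.

```sas
/* Assumption: HAVE, GROUP and X are placeholder names. */
proc sort data=have;
  by group;
run;

/* Capture the Moments table for every BY group in one data set. */
ods output Moments=all_moments;
proc univariate data=have;
  by group;
  var x;
run;
```

The resulting all_moments data set carries the BY variable on every row, which plays the role of the data set name column asked for in the question.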

SAS: Drop column in a if statement

I have a dataset called have with a single entry and multiple variables that looks like this:
message reference time qty price
x 101 35000 100 .
The dataset above changes on every iteration of a loop, and message can also be "A". If message="X", this means 100 qty should be removed from the MASTER set in the row whose reference number matches. The price is missing (.) because it is already stored in the MASTER database under reference=101. The MASTER database aggregates all available orders, each with a price and an available quantity. If in the next iteration message="A", the have dataset would look like this:
message reference time qty price
A 102 35010 150 500
This means a new reference number should be added to the MASTER database. In other words, the line is appended to MASTER.
I have the following code in my loop to update the quantity in my MASTER database when there is a message X:
data b.master;
modify b.master have(where=(message="X")) updatemode=nomissingcheck;
by order_reference_number;
if _iorc_ = %sysrc(_SOK) then do;
replace;
end;
else if _iorc_ = %sysrc(_DSENMR) then do;
output;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSEMTR) then do;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSENOM) then do;
_error_ = 0;
end;
run;
I use replace to update the quantity. But since price=. in my entry when message is X, the above code also sets price to missing where reference=101 in MASTER via the replace statement, which I don't want. Hence, I would prefer to drop the price column if message=X in the have dataset. But I don't want to drop the price column when message=A, since I use this code:
proc append base=MASTER data=have(where=(msg_type="A")) force;
run;
Hence, I have this code prior to my MODIFY statement:
data have(drop=price_alt);
set have; if message="X" then do;
output;end;
else do; /*I WANT TO MAKE NO CHANGE*/
end;run;
but it doesn't do what I want. If message is not equal to X, I don't want to drop the column. If it is equal to X, I want to drop the column. How can I adapt the code above to make it work?
It's a bit of a strange request to be honest, such that it raises questions about whether what you're doing is the best way of doing it. However, in the spirit of answering the question...
The answer by DomPazz gives the option of splitting the data into two possible sets, but if you want code down the line to always refer to a specific data set, this creates its own complications.
You also can't, in a single data step, tell SAS to output to the "same" data set where one instance has a column and one instance doesn't. What you'd like, therefore, is for the code itself to be dynamic, so that the data step that runs is either one that drops the column or one that does not, depending on whether message=x. The answer to this, dynamic code, like many things in SAS, comes down to the creative use of macros. It looks something like this:
/* Just making your input data set */
data have;
message='x';
time=35000;
qty=1000;
price=10.05;
price_alt=10.6;
run;
/* Writing the macro */
%macro solution;
%local id rc1 rc2 message; /* message must exist so that CALL SET can link it to the data set variable */
%let id=%sysfunc(open(work.have));
%syscall set(id);
%let rc1=%sysfunc(fetchobs(&id, 1));
%let rc2=%sysfunc(close(&id));
%IF &message=x %THEN %DO;
data have(drop=price_alt);
set have;
run;
%END;
%ELSE %DO;
data have;
set have;
run;
%END;
%mend solution;
/* Running the macro */
%solution;
Try this:
data outX(drop=price_alt) outNoX;
set have;
if message = "X" then
output outX;
else
output outNoX;
run;
As @sasfrog says in the comments, a table either has a column or it does not. If you want to subset things where MESSAGE="X" then you can use something like this to create 2 data sets.
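Since a stored table either has a column or does not, another option worth trying is to leave the have data set alone and drop price only at the point where it is read as the transaction data set. This is a sketch rather than a tested solution. It reuses the names from the question's MODIFY step and assumes the BY variable order_reference_number exists in both tables.

```sas
data b.master;
  /* drop= removes PRICE only from this read of HAVE; the stored table keeps it, */
  /* so the PROC APPEND for message="A" still sees the column.                   */
  modify b.master
         have (where=(message="X") drop=price)
         updatemode=nomissingcheck;
  by order_reference_number;
  if _iorc_ = %sysrc(_SOK) then replace;   /* match found: update the master row */
  else _error_ = 0;                        /* no match: skip and clear the error */
run;
```

Because price never enters the step from the transaction side, the replace statement leaves the price already stored in the master row untouched.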

SAS Create Multiple Tables Based on Given Character Elements

UPDATE I've been told this isn't possible using arrays because of the way they are stored. This changes my question a bit, but the gist is still the same. How can I most efficiently generate the tables I need from a given vector of values (ex: day, week, month, year) without just repeating the code multiple times? Is there any way to simply substitute the given date value into INTX in a loop?
Ok, this is my last question on this subject, I promise. After some good advice, I'm using the INTX function. However, I'd like to just loop through the different categories I select and create tables. I tried this, but to no avail.
data;
array period [*] $ day week month year;
run;
%MACRO sqlloop;
proc sql;
%DO k = 1 %TO dim(&period); /* in case i decide to drop/add from array later */
%LET bucket = &period[&k];
CREATE TABLE output.t_&bucket AS (
SELECT INTX( "&bucket.", date_field, O, 'E') AS test FROM table);
%END
quit;
%MEND
%sqlloop
Sadly this doesn't work because I'm fouling up the array reference somehow. If I can get this step I'll be in good shape.
You could replace your array with a macro variable string:
%let period=day week month year;
In your macro then, you loop over the words in the macro variable:
%MACRO sqlloop;
proc sql;
%DO k = 1 %TO %sysfunc(countw(&period.)); /*fixed extra s*/
%LET bucket = %scan(&period.,&k.);
CREATE TABLE output.t_&bucket AS (
SELECT INTNX( "&bucket.", date_field, 0, 'E') AS test FROM table);
%END;
quit;
%MEND;
%sqlloop
Edit: you forgot some semicolons, apparently. :p

SAS: do loop attribute selection dynamically

I have a data set with multiple attributes, and each attribute has 10-15 rows in the master table. I wish to use a do loop on the data set that would allow me to extract outputs for each attribute separately. My concern is how to automate the selection of the next attribute in the do loop once the previous attribute's output has been extracted.
Thanks in advance.
I'm not completely sure what you're asking to do, but I can hopefully show the basic ideas of a do loop.
%macro YOUR_MACRO();
%let YOUR_VARIABLE = 1 2 3 ...; /*This could be whatever you want to split up from your master table*/
%let NUM_VAR = 3; /*Change this to the number of YOUR_VARIABLEs listed*/
%do i = 1 %to &NUM_VAR. %by 1;
%let LOOP_VAR = %scan(&YOUR_VARIABLE., &i.);
/*This do i = 1 starts your loop at 1 and goes up by 1 until your NUM_VAR is reached*/
proc sql;
create table TABLE_&LOOP_VAR. as /*Creates a specific table for each variable*/
select *
from MASTER_TABLE
where COLUMN_NAME = &LOOP_VAR. /*Splits up your table by a certain attribute equaling the loop variable*/
;
quit;
%end;
%mend;
%YOUR_MACRO(); /*Runs your loop*/
This is the basic structure and should give a little help. You can also just scan your master table for each variable name then separate it by that without having to type each one out.
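To avoid typing the values out, the list and its length can be read from the master table itself. A minimal sketch, reusing the placeholder names MASTER_TABLE and COLUMN_NAME from the macro above:

```sas
/* Build the space-separated list of distinct values and count them. */
proc sql noprint;
  select distinct COLUMN_NAME
    into :YOUR_VARIABLE separated by ' '
    from MASTER_TABLE;
quit;
%let NUM_VAR = &sqlobs;   /* number of rows the SELECT returned */
```

These two macro variables would need to be created before the %do loop runs and would replace the hard-coded %let statements inside the macro. Values that contain spaces would need a different delimiter in both SEPARATED BY and %scan.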