I have the following code in SAS:
proc sort data=MYDATA1;
by VarNum Size Flavour Brand Retailer Market date;
run;
DATA MYDATA;
SET MYDATA1;
by VarNum Brand Size Flavour Retailer Market date;
/* Loop while for transformations. */
SUM = 0;
VAR1 = 1;
V1= Transformation;
VAR = Variable_for_SAS;
DO WHILE(FIND(V1,";")<>0);
V=V1;
V1=substr(V1,1,FIND(V1,";")-1);
IF SUBSTR(V1,1,1)="/" THEN
VT=STRIP(SUBSTR(V1,2,Find(V1,";")-2))||STRIP(date);
if _n_=1 then do;
declare hash h(dataset: 'MYDATA1');
h.definekey('Variable_date');
h.definedata('Variable_for_SAS');
h.definedone();
end;
if not h.find(key: VT) then new=Variable_for_SAS;
h.find();
SUM1=1*VAR;
/*Overwrite variable*/
VAR=SUM1;
V1=substr(TRIM(V),FIND(V,";")+1);
run;
But I get this error:
run;
_
117
ERROR 117-185: There was 1 unclosed DO block.
Do you know what I should do to solve this problem?
Is the problem that I use DO WHILE and a hash together?
The code above is now complete.
Just add the missing END statement where you want your DO WHILE () loop to end.
Because the loop can run multiple times even on the first iteration of the data step, your IF _N_=1 condition is not sufficient to make sure the hash creation statements run only once. So either move the block that creates the hash object to before the DO WHILE loop, or add conditions to the IF statement to keep it from re-running on every iteration of the loop.
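A minimal sketch of that layout, on made-up data rather than the asker's tables: the hash is declared once before the loop, and the DO WHILE has its own END. The data sets lookup and have, and the variables key, value, list, rest and total, are hypothetical names chosen for illustration.

```sas
/* Hypothetical lookup table: one value per key */
data lookup;
  length key $8 value 8;
  input key $ value;
  datalines;
a 1
b 2
c 3
;
run;

/* Hypothetical input: a semicolon-delimited list of keys on each row */
data have;
  length list $40;
  list = "a;b;";   output;
  list = "c;a;b;"; output;
run;

data want;
  if _n_ = 1 then do;
    /* Runs once, before any DO WHILE loop starts */
    if 0 then set lookup;             /* defines key and value in the PDV */
    declare hash h(dataset: 'lookup');
    h.defineKey('key');
    h.defineData('value');
    h.defineDone();
  end;
  set have;
  total = 0;
  rest = list;
  do while (find(rest, ';') > 0);
    key = substr(rest, 1, find(rest, ';') - 1);  /* next token */
    if h.find() = 0 then total = total + value;  /* 0 means the key was found */
    rest = substrn(rest, find(rest, ';') + 1);   /* drop the token just handled */
  end;                                /* closes the DO WHILE loop */
  drop rest key value;
run;
```

Keeping the declaration outside the loop means the IF _N_=1 test alone is enough to guarantee the hash is built only once.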
I have the following code. I am trying to test a paragraph (descr) for a list of keywords (key_words). When I execute this code, the log reads in all the variables for the array, but will only test 2 of the 20,000 rows in the do loop (do i=1 to 100 and on). Any suggestions on how to fix this issue?
data JE.KeywordMatchTemp1;
set JE.JEMasterTemp end=eof;
if _n_ = 1 then do i = 1 by 1 until (eof);
set JE.KeyWords;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
match = 0;
do i = 1 to 100;
if index(descr, keywords[i]) then match = 1;
end;
drop i;
run;
Your problem is that your end=eof is in the wrong place.
The trivial example further down calculates the 'rank' of the age value for each SASHELP.CLASS respondent.
See where I put the end=eof in it. That's because you need to use it to control the array-filling operation. Otherwise, your loop, do i = 1 by 1 until (eof);, doesn't really do what you're saying it should: it's not actually terminating at eof, since that is never true (as it is defined in the first set statement). Instead it terminates because you read beyond the end of the dataset, which is specifically what you don't want.
That's what the end=eof is doing: it prevents you from trying to pull a row when the array-filling dataset is finished, which would terminate the whole data step. Any time you see a data step terminate after exactly 2 iterations, this is likely to be the problem; it is a very common issue.
data class_ranks;
set sashelp.class; *This dataset you are okay iterating over until the end of the dataset and then quitting the data step, like a normal data step.;
array ages[19] _temporary_;
if _n_=1 then do;
do _i = 1 by 1 until (eof); *iterate until the end of the *second* set statement;
set sashelp.class end=eof; *see here? This eof is telling this loop when to stop. It is okay that it is not created until after the loop is.;
ages[_i] = age;
end;
call sortn(of ages[*]); *ordering the ages loaded by number so they are in proper order for doing the trivial rank task;
end;
age_rank = whichn(age,of ages[*]); *determine where in the list the age falls. For a real version of this task you would have to check whether this ever happens, and if not you would have to have logic to find the nearest point or whatnot.;
run;
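Applied to the keyword question, the same pattern might look like the sketch below. The data sets keywords and master and their contents are invented for illustration, and the array size of 100 assumes there are at most 100 keywords. The STRIP call and the NOT MISSING guard are additions of mine: a $30 array element is padded with trailing blanks, so without them INDEX would search for the keyword plus its padding.

```sas
/* Hypothetical keyword list */
data keywords;
  length key_words $30;
  input key_words $;
  datalines;
refund
invoice
;
run;

/* Hypothetical text to search */
data master;
  length descr $100;
  infile datalines truncover;
  input descr $100.;
  datalines;
Customer requested a refund
Nothing of interest here
Second invoice sent
;
run;

data keyword_match;
  array keywords[100] $30 _temporary_;
  /* Fill the array once; end=eof belongs to THIS set statement */
  if _n_ = 1 then do i = 1 by 1 until (eof);
    set keywords end=eof;
    keywords[i] = key_words;
  end;
  set master;                      /* normal row-by-row read */
  match = 0;
  do i = 1 to dim(keywords);
    /* skip unused slots and ignore trailing blanks in the keyword */
    if not missing(keywords[i]) and index(descr, strip(keywords[i])) then match = 1;
  end;
  drop i key_words;
run;
```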
I'm coming at SAS from a Python/R/Stata background, and learning that things are rather different in SAS. I'm approaching the following problem from the standpoint of one of these languages, perhaps SAS isn't up to what I want to do.
I have a panel dataset with an age column in it. I want to add new columns to the dataset using this age column. I'm going to simplify the functions of age to keep it simple in my example.
The goal is to loop over a sequence, and use the value of that sequence at each loop step to 1. assign the name of the new column and 2. assign the values of that column. I'm hoping to get my starting dataset, with new columns added to it taking values spline1 spline2... spline7
data somePath.FinalDataset;
do i = 1 to 7;
if i = 1 then
spline&i. = age;
if i ^= 1 then spline&i. = age + i;
end;
set somePath.StartingDataset;
run;
This code won't even run, though in an earlier version I was able to get it to run, but the new columns had their values shifted down one row from what they should have been. I include this code block as pseudocode of what I'm trying to do. Any help is much appreciated
One way to do this in SAS is with arrays. A SAS array can be used to reference a group of variables, and it can also create variables.
data have;
input age;
cards;
5
10
;
run;
data want;
set have;
array spline{7}; *create spline1 spline2 ... spline7;
do i=1 to 7;
if i = 1 then spline{i} = age;
else spline{i} = age + i;
end;
drop i;
run;
Spline{i} refers to the ith variable of the array named spline.
i is a regular variable, the DROP statement prevents it from being written to the output dataset.
When you say new columns were "shifted by one," note that spline1=age and spline2=age+2. You can change your code accordingly, e.g. if you want spline2=age+1, you could change your else statement to else spline{i} = age + i - 1 ; It is also possible to change the array statement to define it with 0 as the lower bound, rather than 1.
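A short sketch of the zero-lower-bound variant, assuming you still want the variables named spline1 to spline7. The variable list is written out explicitly so the names do not depend on how SAS would name them by default.

```sas
data have;
  input age;
  datalines;
5
10
;
run;

data want;
  set have;
  /* index runs 0..6, mapped onto spline1..spline7 */
  array spline{0:6} spline1-spline7;
  do i = lbound(spline) to hbound(spline);
    spline{i} = age + i;   /* spline1 = age, spline2 = age + 1, ... */
  end;
  drop i;
run;
```

With a lower bound of 0, the first element needs no special case, so the IF/ELSE disappears.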
Arrays are likely the best way to solve this, but I will demonstrate a macro approach, which is necessary in some cases.
SAS separates its doing-things-with-data language from its writing-code language into the 'data step language' and the 'macro language'. They don't really talk to each other during a data step, because the macro language runs during the compilation stage (before any data is processed) while the data step language runs during the execution stage (while rows of data are being processed).
In any event, for something like this it's quite possible to write a macro to do what you want. Borrowing Quentin's general structure and initial dataset:
data have;
input age;
cards;
5
10
;
run;
%macro make_spline(var=, count=);
%local i;
%do i = 1 %to &count;
%if &i=1 %then &var.&i. = &var.;
%else &var.&i. = &var. + &i.;
; *this semicolon ends the assignment statement;
%end;
/* You end up with the IF statement generating:
age1 = age
and the extra semicolon after the if/else generates the ; for that line, making it
age1 = age;
etc. for the other lines.
*/
%mend make_spline;
data want;
set have;
%make_spline(var=age,count=7);
run;
This would then perform what you're looking to perform. The looping is in the macro language, not in the data step. You can assign parameters however you see fit; I prefer to have parameters like above, or even more (start loop could also be a parameter, and in fact the assignment code could be a parameter!).
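As a sketch of the "start of the loop as a parameter" idea, here is one possible variation. The macro name make_spline2 and the start= parameter are hypothetical, and the rule "first variable equals the source, the rest add the loop index" is simply carried over from the macro above.

```sas
data have;
  input age;
  datalines;
5
10
;
run;

%macro make_spline2(var=, start=1, count=);
  %local i;
  %do i = &start %to &count;
    %if &i = &start %then %do;
      &var.&i. = &var.;          /* first generated variable copies the source */
    %end;
    %else %do;
      &var.&i. = &var. + &i.;    /* later ones add the loop index */
    %end;
  %end;
%mend make_spline2;

data want2;
  set have;
  %make_spline2(var=age, start=3, count=7)
run;
```

Here the call generates assignments for age3 through age7 only. Wrapping each branch in %DO/%END puts the semicolon inside the generated text, so no separate trailing semicolon is needed.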
I am really struggling with this, guys.
The table that needs to be updated has ~15M rows and ~200 columns.
I need to update a few columns using a work table.
This is (partly) what I need to do:
%macro condition;
%if &row_count>0 %then %do;
data _null_;
set W4TWGKJ6 end=final;
if _n_ = 1 then call execute("proc sql ;");
call execute
("update dds.insurance_policy set X_STORNO_BY_VERSION="||TOSNUM||" where policy_no='"||cats(polid)||"' and X_INSURANCE_PRODUCT_CD='"||cats(prodid)||"'
and X_INSURER_SERIAL_NO = "||X_INSURER_SERIAL_NO||" and x_source_system_cd ="||'"5"'||" and x_source_system_category_cd ="||'"5"'||" and x_current_ind = "||'"Y"'||";,
update dds.insurance_policy set STATUS_CHANGE_DT="||ISSUE_DT||" where policy_no='"||cats(polid)||"' and X_INSURANCE_PRODUCT_CD='"||cats(prodid)||"'
and X_INSURER_SERIAL_NO = "||X_INSURER_SERIAL_NO||" and x_source_system_cd ="||'"5"'||" and x_source_system_category_cd ="||'"5"'||" and x_current_ind = "||'"Y"'||";");
if final then call execute('quit;'); run;
%end;
%mend;
%condition;
I first check whether there are rows in the table (&row_count).
If there are,
I update 2 columns (I need to update 5; I just cut them from the example)
using a work table called W4TWGKJ6.
This update takes forever.
In fact, I stopped the process every single time, as it ran for hours without returning anything.
Does anyone know a better solution for this problem?
Thanks in advance,
Gal.
I'd suggest using the MODIFY statement in a data step.
Both tables should have the same column names for the BY variables and be sorted by those variables.
data dds.insurance_policy;
modify
dds.insurance_policy
W4TWGKJ6 (keep= POLICY_NO X_INSURER_SERIAL_NO /* key variables */
X_STORNO_BY_VERSION STATUS_CHANGE_DT /* ... other variables from source to update target */ )
updatemode=nomissingcheck;
by POLICY_NO X_INSURER_SERIAL_NO;
if _iorc_ = %sysrc(_SOK) then do;
* Update row ;
replace;
end;
else _error_ = 0;
run;
See the question "SAS: How not to overwrite a dataset when the "where" condition in a "Modify" statement does not hold?" for a complete reference of _IORC_ return values.
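To see the pattern end to end, here is a small self-contained sketch on invented data. The data sets master and trans, the variable status, and the index name keyidx are all hypothetical. The index on the BY variables is my assumption about what may help the lookup on a large table; it was not part of the suggestion above.

```sas
/* Hypothetical target table */
data master;
  input policy_no $ serial_no status;
  datalines;
P1 1 0
P2 1 0
P3 2 0
;
run;

/* Hypothetical transactions: P2 exists in master, P4 does not */
data trans;
  input policy_no $ serial_no status;
  datalines;
P2 1 9
P4 1 9
;
run;

/* Assumption: a composite index on the BY variables */
proc datasets lib=work nolist;
  modify master;
  index create keyidx=(policy_no serial_no);
quit;

data master;
  modify master trans updatemode=nomissingcheck;
  by policy_no serial_no;
  select (_iorc_);
    when (%sysrc(_sok)) replace;          /* match found: write the updated row */
    when (%sysrc(_dsenmr)) _error_ = 0;   /* no match in master: skip it */
    otherwise do;
      put 'Unexpected return code: ' _iorc_=;
      _error_ = 0;
    end;
  end;
run;
```

Because the step contains an explicit REPLACE, unmatched transactions are not written anywhere, and the table is updated in place instead of being rewritten.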
I have a dataset called have with one entry with multiple variables that look like this:
message reference time qty price
x 101 35000 100 .
The above dataset changes on every pass of a loop, and message can also be "A". If message="X", this means 100 qty should be removed from the MASTER set where the reference number equals the reference number in the MASTER database. The price is missing (.) because it is already in the MASTER database under reference=101. The MASTER database aggregates all the available orders at some price with the quantity available. If in the next loop message="A", then the have dataset would look like this:
message reference time qty price
A 102 35010 150 500
This means a new reference number should be added to the MASTER database. In other words, the line is appended to MASTER.
I have the following code in my loop to update the quantity in my MASTER database when there is a message X:
data b.master;
modify b.master have(where=(message="X")) updatemode=nomissingcheck;
by order_reference_number;
if _iorc_ = %sysrc(_SOK) then do;
replace;
end;
else if _iorc_ = %sysrc(_DSENMR) then do;
output;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSEMTR) then do;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSENOM) then do;
_error_ = 0;
end;
run;
I use REPLACE to update the quantity. But since price=. in my entry when message is X, the above code sets price to missing where reference=101 in MASTER via the REPLACE statement, which I don't want. Hence, I would prefer to delete the price column if message=X in the have dataset. But I don't want to delete the price column when message=A, since I use this code:
proc append base=MASTER data=have(where=(msg_type="A")) force;
run;
Hence, I have this code prior to my MODIFY statement:
data have(drop=price_alt);
set have; if message="X" then do;
output;end;
else do; /*I WANT TO MAKE NO CHANGE*/
end;run;
but it doesn't do what I want. If the message is not equal X then I don't want to drop the column. If it is equal X, I want to drop the column. How can I adapt the code above to make it work?
It's a bit of a strange request to be honest, such that it raises questions about whether what you're doing is the best way of doing it. However, in the spirit of answering the question...
The answer by DomPazz gives the option of splitting the data into two possible sets, but if you want code down the line to always refer to a specific data set, this creates its own complications.
You also can't, in one data step, tell SAS to output to the "same" data set where one instance has a column and one instance doesn't. So what you'd like, therefore, is for the code itself to be dynamic, so that the data step that runs either drops the column or does not, depending on whether message=x. The answer to this, dynamic code, like many things in SAS, comes down to the creative use of macros. It looks something like this:
/* Just making your input data set */
data have;
message='x';
time=35000;
qty=1000;
price=10.05;
price_alt=10.6;
run;
/* Writing the macro */
%macro solution;
%local id rc1 rc2;
%let id=%sysfunc(open(work.have));
%syscall set(id);
%let rc1=%sysfunc(fetchobs(&id, 1));
%let rc2=%sysfunc(close(&id));
%IF &message=x %THEN %DO;
data have(drop=price_alt);
set have;
run;
%END;
%ELSE %DO;
data have;
set have;
run;
%END;
%mend solution;
/* Running the macro */
%solution;
Try this:
data outX(drop=price_alt) outNoX;
set have;
if message = "X" then
output outX;
else
output outNoX;
run;
As @sasfrog says in the comments, a table either has a column or it does not. If you want to subset the rows where MESSAGE="X", you can use something like the step above to create 2 data sets.
I have a data set with multiple attributes, and each attribute has 10-15 rows in the master table. I wish to use a do loop on the data set that would allow me to extract outputs for each attribute separately. My concern is how to automate the selection of the next attribute in the do loop once the previous attribute's output is extracted.
Thanks in advance.
I'm not completely sure what you're asking to do, but I can hopefully show the basic ideas of a do loop.
%macro YOUR_MACRO();
%let YOUR_VARIABLE = 1 2 3 ...; /*This could be whatever you want to split up from your master table*/
%let NUM_VAR = 3; /*Change this to the number of YOUR_VARIABLEs listed*/
%do i = 1 %to &NUM_VAR. %by 1;
%let LOOP_VAR = %scan(&YOUR_VARIABLE., &i.);
/*This do i = 1 starts your loop at 1 and goes up by 1 until your NUM_VAR is reached*/
proc sql;
create table TABLE_&LOOP_VAR. as /*Creates a specific table for each variable*/
select *
from MASTER_TABLE
where COLUMN_NAME = &LOOP_VAR. /*Splits up your table by a certain attribute equaling the loop variable*/
;
quit;
%end;
%mend;
%YOUR_MACRO(); /*Runs your loop*/
This is the basic structure and should help a little. You can also scan your master table for the distinct values of the variable and split on those, without having to type each one out.
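One way to build the list from the data instead of typing it is sketched below. The table master_table, the column attribute, and the macro name split_by_attribute are hypothetical. The sketch assumes the attribute values are character, contain no spaces, and are valid as part of a data set name.

```sas
/* Hypothetical master table */
data master_table;
  input attribute $ amount;
  datalines;
red 1
red 2
blue 3
green 4
;
run;

%macro split_by_attribute;
  %local i n value value_list;

  /* Collect the distinct attribute values into one macro variable */
  proc sql noprint;
    select distinct attribute
      into :value_list separated by ' '
      from master_table;
  quit;

  %let n = %sysfunc(countw(&value_list, %str( )));

  %do i = 1 %to &n;
    %let value = %scan(&value_list, &i, %str( ));
    /* One output table per attribute value */
    data table_&value;
      set master_table;
      where attribute = "&value";
    run;
  %end;
%mend split_by_attribute;

%split_by_attribute
```

With the sample data this produces table_blue, table_green and table_red, and new attribute values are picked up without editing the macro.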