Update variable based on match in two tables - sas

I have 2 tables, lets name them table1 and table2. Both of them have credit_id, loan_id and Date field. For some reason credit_id field needs to be updated with corresponding values from table2, linking data by Date and loan_id fields. To do so, I made a query like:
proc sql;
UPDATE a
SET a.credit_id = b.credit_id
FROM table1 a, table2 b
WHERE (a.Date = b.Date) AND (a.loan_id = b.loan_id);
quit;
According to googling, this query should work in many sql environments, but it seems that SAS is an exception here, because it seems that from part is ignored.
How to update needed field then?

I can't comment on the SQL, but you can do the same thing using a data step:
data table1;
update table1 table2(keep = date loan_id credit_id);
by date loan_id;
run;
This requires that:
No two rows in the same table have the same date and loan_id, and
Both tables are sorted/indexed by date and loan id
You need the keep on the transaction dataset in order to prevent it from updating/creating any other variables on the master dataset. There are also several other ways you could do this, e.g. using the modify or merge statements.

Related

Iteratively adding to merged SAS dataset

I have 18 separate datasets that contain similar information: patient ID, number of 30-day equivalents, and total day supply of those 30-day equivalents. I've output these from a dataset that contains those 3 variables plus the medication class (VA_CLASS) and the quarter it was captured in (a total of 6 quarters).
Here's how I've created the 18 separate datasets from the snip of the dataset shown above:
%macro rx(class,num);
proc sql;
create table dm_sum&clas._qtr&num as select PatID,
sum(equiv_30) as equiv_30_&class._&num
from dm_qtrs
where va_class = "HS&class" and dm_qtr = &qtr
group by 1;
quit;
%mend;
%rx(500,1);
%rx(500,2);
%rx(500,3);
%rx(500,4);
%rx(500,5);
%rx(500,6);
%rx(501,1);
and so on...
I then need to merge all 18 datasets back together by PatID and what I'd like to do is iteratively add the next dataset created to the previous, as in, add dataset dm_sum_500_qtr3 to a file that already contains the results of dm_sum_500_qtr1 & dm_sum_500_qtr1.
Thanks for looking, Brian
In the macro append the created data set to it an accumulator data set. Be sure to delete it before starting so there is a fresh accumulation. If the process is run at different times (like weekly or monthly) you may want to incorporate a unique index to prevent repeated appendings. If you are stacking all these sums, the create table should also select va_class and dm_qtr
%macro (class, num, stack=perm.allClassNumSums);
proc sql; create table dm_sum&clas._qtr&num as … ;
proc append force base=perm.allClassNumSums data=dm_sum&clas._qtr#
run;
%mend;
proc sql;
drop table perm.allClassNumSums;
%rx(500,1)
%rx(500,2)
%rx(500,3)
%rx(500,4)
%rx(500,5)
…
A better approach might be a single query with an larger where, and leave the class and qtr as categorical variables. Your current approach is moving data (class and qtr) into metadata (column names). Such a transformation makes additional downstream processing more difficult.
Proc TABULATE or REPORT can be use a CLASS statement to assist the creation of output having category based columns. These procedures might even be able to work directly with the original data set and not require a preparatory SQL query.
proc sql;
create table want as
select
PatID, va_class, dm_qtr,
sum(equiv_30) as equiv_30_sum
from dm_qtrs
where catx(':', va_class, dm_sqt) in
(
'HS500:1'
'HS500:2'
'HS500:3'
…
'HS501:1'
)
group by PatID, va_class, dm_qtr;
quit;

Date missing in dataset

I created below SAS code to pull the data for particular a date.
%let date =2016-12-31;
proc sql;
connect to teradata as tera ( user=testuser password=testpass );
create table new as select * from connection tera (select acct,org
from dw.act
where date= &date.);
disconnect from tera;
quit;
There are situation where that particular date may be missing in the dataset due to holiday.
I thinking how to query the previous date(non-holiday) if the mention date in the %let statement is holiday
Before running your query you have to do a lookup or data check on the date you are using. You have two options:
Use a Date Dimension table in order identify/lookup holidays.
Count how many records you have for that date, if you get 0 obs for this date, use date+1 in your query.
I recommend using the date dimension table option.
Teradata has Sys_Calendar.Calendar view. You can use that in query, it has all the information regarding weekdays and others.
if you want to SAS way use weekday function and use call symput as shown below. Teradata needs single quote around the date, so it is better to have single quotes around when creating macro variable
data _null_;
/* this is for intial date*/
date_int = input('2016-12-31', yymmdd10.);
/* create a new date variable depending on weekday*/
if weekday(date_int) = 7 then date =date_int-2; /*sunday -2 days to get
friday*/
else if weekday(date_int) = 6 then date =date_int-1;/*saturday -1 day to get
friday*/
else date =date_int;
format date date_int yymmdd10.;
call symputx('date', ''''||put(date,yymmdd10.)||'''');
run;
%put modfied date is &date;
modified date is '2016-12-29'
Now you can use this macro variable in your pass through.

Conditional Merge in SAS data step

I have two tables, which for the sake of simplicity are:
Table 1:
Date ID
20170401 X
20170501 Y
20170601 Z
Table 2:
Date ID
20170201 Z
20170301 Y
20170501 X
I want to create a new table which has everything from table 1, unless the ID occurred at a previous date in table 2.
The desired output for table 1 and 2 would be:
Date ID
20170401 X
This is what I currently have. I'm not sure where to put the conditional merge:
data new;
merge table1 table2(in=b);
by date ID;
if not b [where table2.date is before table1.date];
run;
Thanks
Unfortunately SAS overwrites variables with the same names, so rename date in table2. Merge on ID and only keep if it's in table1 and passes your date criterion. Then drop the date2 column as it's not needed.
data new;
merge table1 (in=a) table2(in=b rename=(date=date2));
by ID;
if a and not(date2<date);
drop date2;
run;

select only a few columns from a large table in SAS

I have to join 2 tables on a key (say XYZ). I have to update one single column in table A using a coalesce function. Coalesce(a.status_cd, b.status_cd).
TABLE A:
contains some 100 columns. KEY Columns ABC.
TABLE B:
Contains just 2 columns. KEY Column ABC and status_cd
TABLE A, which I use in this left join query is having more than 100 columns. Is there a way to use a.* followed by this coalesce function in my PROC SQL without creating a new column from the PROC SQL; CREATE TABLE AS ... step?
Thanks in advance.
You can take advantage of dataset options to make it so you can use wildcards in the select statement. Note that the order of the columns could change doing this.
proc sql ;
create table want as
select a.*
, coalesce(a.old_status,b.status_cd) as status_cd
from tableA(rename=(status_cd=old_status)) a
left join tableB b
on a.abc = b.abc
;
quit;
I eventually found a fairly simple way of doing this in proc sql after working through several more complex approaches:
proc sql noprint;
update master a
set status_cd= coalesce(status_cd,
(select status_cd
from transaction b
where a.key= b.key))
where exists (select 1
from transaction b
where a.ABC = b.ABC);
quit;
This will update just the one column you're interested in and will only update it for rows with key values that match in the transaction dataset.
Earlier attempts:
The most obvious bit of more general SQL syntax would seem to be the update...set...from...where pattern as used in the top few answers to this question. However, this syntax is not currently supported - the documentation for the SQL update statement only allows for a where clause, not a from clause.
If you are running a pass-through query to another database that does support this syntax, it might still be a viable option.
Alternatively, there is a way to do this within SAS via a data step, provided that the master dataset is indexed on your key variable:
/*Create indexed master dataset with some missing values*/
data master(index = (name));
set sashelp.class;
if _n_ <= 5 then call missing(weight);
run;
/*Create transaction dataset with some missing values*/
data transaction;
set sashelp.class(obs = 10 keep = name weight);
if _n_ > 5 then call missing(weight);
run;
data master;
set transaction;
t_weight = weight;
modify master key = name;
if _IORC_ = 0 then do;
weight = coalesce(weight, t_weight);
replace;
end;
/*Suppress log messages if there are key values in transaction but not master*/
else _ERROR_ = 0;
run;
A standard warning relating to the the modify statement: if this data step is interrupted then the master dataset may be irreparably damaged, so make sure you have a backup first.
In this case I've assumed that the key variable is unique - a slightly more complex data step is needed if it isn't.
Another way to work around the lack of a from clause in the proc sql update statement would be to set up a format merge, e.g.
data v_format_def /view = v_format_def;
set transaction(rename = (name = start weight = label));
retain fmtname 'key' type 'i';
end = start;
run;
proc format cntlin = v_format_def; run;
proc sql noprint;
update master
set weight = coalesce(weight,input(name,key.))
where master.name in (select name from transaction);
run;
In this scenario I've used type = 'i' in the format definition to create a numeric informat, which proc sql uses convert the character variable name to the numeric variable weight. Depending on whether your key and status_cd columns are character or numeric you may need to do this slightly differently.
This approach effectively loads the entire transaction dataset into memory when using the format, which might be a problem if you have a very large transaction dataset. The data step approach should hardly use any memory as it only has to load 1 row at a time.

Adding multiple record count to a table

My students table has the following tables
student id | student year | test result | semester
I would like to group the records together to see how many re-tests did the student do in a particular semester.
I am trying to alter the table and add the total_tests_taken column to the table and use an
update statement like:
ALTER table students
(add total_tests_taken number );
UPDATE students
SET total_tests_taken = (select count(*) OVER ( PARTITION BY student_id, semester) FROM students)
but my sql fails saying: "ORA-01427: single-row subquery returns more than one row"
what am I doing wrong?
Do I need to create a temp table and than do it?
Thanks
the reason you are getting the error is because you are trying to set a column's value = a table. SET statement would update each row that matches the constraint with the given value. What you are trying to do can be accomplished by UPDATE with JOIN statement if your DBMS supports it. you can check out the answer to this question for the syntax
How can I do an UPDATE statement with JOIN in SQL?