How to get previous row values in Teradata

I have the data in the following format
Id Code Date Amount Type
101 B25 5/4/2020 $500 C
101 A15 5/5/2020 $100 D
101 D15 5/5/2020 $200 D
102 B35 6/2/2020 $400 C
102 A15 6/2/2020 $50 D
I need the following
Id Code Date Amount Type C_Date C_Amount
101 A15 5/5/2020 $100 D 5/4/2020 $500
102 A15 6/2/2020 $50 D 6/2/2020 $400
For all Code='A15' I need Date and amount from previous row where Type='C'
I did this
Select id, Amount, Date,
sum(Amount) over (partition by ID ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as C_Amount,
Max(Date) over (partition by ID ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as C_Date
from Table
where code='A15' or Type='C'
Output is not the desired one
Id Code Date Amount Type C_Date C_Amount
101 A15 5/5/2020 $100 D ***5/5/2020 $100***
102 A15 6/2/2020 $50 D ***6/2/2020 $50***
Any help is appreciated

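The answer set doesn't match your query; the query's result will include the 'C' rows, too.
ROWS between UNBOUNDED PRECEDING and CURRENT ROW is a cumulative (running) aggregate over the rows up to the current one, not a lookup of the previous row.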
You need ROWS between 1 PRECEDING and 1 PRECEDING and an ORDER BY to get the previous row's value.
Better switch to LAG or LAST_VALUE, which is simpler and allows dealing with additional rows between the 'A15' and the previous 'C' row:
Select id, Amount, Date,
LAG(case when Type='C' then Amount end IGNORE NULLS)
over (partition by ID
order by date) as C_Amount,
LAG(case when Type='C' then date end IGNORE NULLS)
over (partition by ID
order by date) as C_Date
from Table
where code='A15' or Type='C'
qualify code='A15';
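To see what the LAG ... IGNORE NULLS logic computes, here is a minimal Python sketch of the same idea on the sample rows from the question: walk each Id in date order and remember the most recent Type='C' row. The function name previous_c_values and the tuple layout are illustrative assumptions, not Teradata features.

```python
from datetime import date

# Sample rows from the question: (id, code, date, amount, type)
rows = [
    (101, "B25", date(2020, 5, 4), 500, "C"),
    (101, "A15", date(2020, 5, 5), 100, "D"),
    (101, "D15", date(2020, 5, 5), 200, "D"),
    (102, "B35", date(2020, 6, 2), 400, "C"),
    (102, "A15", date(2020, 6, 2), 50, "D"),
]

def previous_c_values(rows, wanted_code="A15"):
    """Attach the date and amount of the latest earlier Type='C' row per id."""
    result = []
    last_c = {}  # id -> (date, amount) of the most recent 'C' row seen so far
    # sorted() is stable, so rows sharing an id and date keep their input order.
    for rid, code, day, amount, typ in sorted(rows, key=lambda r: (r[0], r[2])):
        if code == wanted_code:
            # Look back before recording the current row, as LAG does.
            c_date, c_amount = last_c.get(rid, (None, None))
            result.append((rid, code, day, amount, typ, c_date, c_amount))
        if typ == "C":
            last_c[rid] = (day, amount)
    return result

for row in previous_c_values(rows):
    print(row)
```

For Id 102 the 'C' row and the 'A15' row share the same date, so the result depends on how ties are ordered; in SQL, adding a tie-breaking column to the ORDER BY would make this deterministic.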

Related

Transform the set of data into 5 values

Suppose that I was given the following data
ID Birthday Monthly Salary
P222 2 March 1976 9,600
P013 13 June 1955 31,450
S015 12 September 1966 27,500
The ID number starts with a character followed by three digits.
The first character is the abbreviation of the occupation ("P" for Professor, "S" for Staff, etc.).
Consider the following data, denoted by (*) and (**):
(*):
P222 2Mar1976 9,60000
P013 13Jun1955 31,45000
S015 12Sep1966 27,50000
(**):
P222 2Mar1976 $9,6,00
***************
P013 13Jun1955 $31,450
**************
S015 12Sep1966 $27,500
***********
Suppose I have to write SAS programs to read the aforementioned data (*) and (**) respectively to create a temporary SAS data file, called PERSONEL, which contains five variables, namely ID, OCCUPATION, BIRTHDAY, YEAR and SALARY.
By YEAR I mean the year of birth. The variables BIRTHDAY, YEAR and SALARY are numeric, while ID and OCCUPATION are character variables.
For example, the first record should have
ID="P222", OCCUPATION="P", BIRTHDAY=27821, YEAR=1976, SALARY=9600
Is it possible for me to do this WITHOUT using an assignment statement?
If you have a fixed-column text file, like your first example:
RULE: ----+----1----+----2----+----3
1311 P222 2Mar1976 9,60000
1312 P013 13Jun1955 31,45000
1313 S015 12Sep1966 27,50000
Then you could read the variables directly from the proper columns.
data want;
infile 'myfile' truncover;
input id $ 1-4 occupation $ 1 @7 birthday date9. year 12-15 @16 salary comma12.2 ;
format birthday date9. salary dollar12.2;
run;
Result:
Obs id occupation birthday year salary
1 P222 P 02MAR1976 1976 $9,600.00
2 P013 P 13JUN1955 1955 $31,450.00
3 S015 S 12SEP1966 1966 $27,500.00
The second version has the values in slightly different positions and an extra line that would need to be skipped.
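As a rough illustration of the column-based reading above, here is a Python sketch that slices the same fields out of fixed positions. The exact layout (id in columns 1-4, birthday right-aligned in columns 7-15, salary from column 16) is an assumption, since the original spacing is not visible here, and make_line and parse_line are hypothetical helper names.

```python
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

def make_line(emp_id, birthday, salary):
    # Assumed layout: id in columns 1-4, birthday right-aligned in
    # columns 7-15, salary right-aligned in columns 16-27.
    return f"{emp_id:<6}{birthday:>9}{salary:>12}"

def parse_line(line):
    emp_id = line[0:4]                  # columns 1-4
    occupation = line[0:1]              # column 1, read a second time
    bday_text = line[6:15].strip()      # columns 7-15, e.g. "2Mar1976"
    day = int(bday_text[:-7])
    month = MONTHS[bday_text[-7:-4]]
    birthday = date(int(bday_text[-4:]), month, day)
    year = int(line[11:15])             # columns 12-15, read a second time
    text = line[15:27].replace(",", "").replace("$", "").strip()
    # Mimics a w.2 informat: without a decimal point in the field,
    # the last two digits are taken as decimals.
    salary = float(text) if "." in text else int(text) / 100
    return emp_id, occupation, birthday, year, salary

sample = [("P222", "2Mar1976", "9,60000"),
          ("P013", "13Jun1955", "31,45000"),
          ("S015", "12Sep1966", "27,50000")]
records = [parse_line(make_line(*fields)) for fields in sample]
for rec in records:
    print(rec)
```

The point of the sketch is that the same columns can be read more than once (id and occupation, birthday and year), which is what lets the SAS step avoid assignment statements.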

How do I compare each row in one table with a column in another table?

I have tables as below.
Table A(total of 3000 rows, end_date may have duplicates, ex, 123 and 223 may have the same end_date)
enroll_dt,end_date, acct_nbr
12/31/2016, 01/03/2017, 123
12/31/2016, 01/04/2017, 234
01/05/2017, 02/02/2017, 334
Table B(total of 30 unique values)
enroll_dt
12/31/2016
01/01/2017
01/02/2017
01/03/2017
01/04/2017
01/05/2017
...
Desired table:
Date number_of_records
12/31/2016 2
01/01/2017 2
01/02/2017 2
01/03/2017 2
01/04/2017 1
02/01/2017 1
For each date in Table B (call it DateA), I want to look at all rows of Table A and count the accounts that were enrolled on or before DateA and whose end_date is later than DateA.
Ex. for 01/01/2017 from Table B, number_of_records = 2, since only 2 accounts were enrolled by 01/01/2017 (acct_nbr = 123 and 234),
and their end_dates '01/03/2017' and '01/04/2017' are both greater than '01/01/2017'.
Thanks a lot for your help
Assuming your dates are stored as actual dates and Table B's date column is named datea:
select
b.datea,
count(distinct a.acct_nbr) as number_of_records
from
b
inner join a
on a.enroll_dt <= b.datea
and a.end_date >= b.datea
group by
1
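As a quick check of the counting rule, the following Python sketch applies it to the sample rows from the question. It counts an account for a date when it was enrolled on or before that date and its end_date is on or after it, which is the rule implied by the desired table (01/03/2017 still counts account 123). The function name active_accounts is an illustrative assumption.

```python
from datetime import date

# Table A sample rows: (enroll_dt, end_date, acct_nbr)
table_a = [
    (date(2016, 12, 31), date(2017, 1, 3), 123),
    (date(2016, 12, 31), date(2017, 1, 4), 234),
    (date(2017, 1, 5), date(2017, 2, 2), 334),
]
# Table B sample dates: 12/31/2016 through 01/05/2017
table_b = [date(2016, 12, 31)] + [date(2017, 1, d) for d in range(1, 6)]

def active_accounts(table_a, table_b):
    """Count distinct accounts enrolled by each date and not yet ended."""
    counts = {}
    for d in table_b:
        accounts = {acct for enroll, end, acct in table_a if enroll <= d <= end}
        counts[d] = len(accounts)
    return counts

counts = active_accounts(table_a, table_b)
for d, n in counts.items():
    print(d.strftime("%m/%d/%Y"), n)
```

Without the enrollment condition, account 334 would also be counted for every date before 01/05/2017, because its end_date is later than all of them.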

SAS proc sql inner join without duplicates

I am struggling to join two tables without creating duplicate rows using proc sql (not sure if any other method is more efficient).
The inner join is on: datepart(table1.date)=datepart(table2.date) AND tag=tag AND ID=ID
I think the problem is the date and the different names in table 1. Just by looking at the tables it's clear that table 1's row 1 should be joined with table 2's row 1, because the transaction started at 00:04 in table 1 and finished at 00:06 in table 2. The issue I am having is that I can't join on the dates with the timestamps, so I am removing the timestamps, and because of that the join creates duplicates.
Table1:
id tag date amount name_x
1 23 01JUL2018:00:04 12 smith ltd
1 23 01JUL2018:00:09 12 anna smith
table 2:
id tag ref amount date
1 23 19 12 01JUL2018:00:06:00
1 23 20 12 01JUL2018:00:10:00
Desired output:
id tag date amount name_x ref
1 23 01JUL2018 12 smith ltd 19
1 23 01JUL2018 12 anna smith 20
Appreciate your help.
Thanks!
You need to set a boundary for that datetime join. You are correct in why you are getting duplicates. I would guess the lower bound is the previous datetime, if it exists and the upper bound is this record's datetime.
As an aside, this is poor database design on someone's part...
Let's first sort table2 by id, tag, and date
proc sort data=table2 out=temp;
by id tag date;
run;
Now write a data step to add the previous date for unique id/tag combinations.
data temp;
set temp;
format low_date datetime20.;
by id tag;
retain p_date;
if first.tag then
p_date = 0;
low_date = p_date;
p_date = date;
run;
Now update your join to use the date range.
proc sql noprint;
create table want as
select a.id, a.tag, a.date, a.amount, a.name_x, b.ref
from table1 as a
inner join
temp as b
on a.id = b.id
and a.tag = b.tag
and b.low_date < a.date <= b.date;
quit;
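The range-join idea can be sketched in Python on the sample rows: give every table2 row a lower bound equal to the previous datetime for the same id/tag, then match table1 rows that fall in (low, high]. The function name range_join and the use of datetime.min as the "no previous row" bound are assumptions made for this illustration.

```python
from datetime import datetime

# table1 rows: (id, tag, date, amount, name_x)
table1 = [
    (1, 23, datetime(2018, 7, 1, 0, 4), 12, "smith ltd"),
    (1, 23, datetime(2018, 7, 1, 0, 9), 12, "anna smith"),
]
# table2 rows: (id, tag, ref, amount, date)
table2 = [
    (1, 23, 19, 12, datetime(2018, 7, 1, 0, 6)),
    (1, 23, 20, 12, datetime(2018, 7, 1, 0, 10)),
]

def range_join(table1, table2):
    # Step 1: per (id, tag), pair each table2 datetime with the previous one.
    bounds = []
    previous = {}
    ordered = sorted(table2, key=lambda r: (r[0], r[1], r[4]))
    for rid, tag, ref, _amount, dt in ordered:
        low = previous.get((rid, tag), datetime.min)
        bounds.append((rid, tag, ref, low, dt))
        previous[(rid, tag)] = dt
    # Step 2: join on id, tag and low < date <= high.
    joined = []
    for rid, tag, dt, amount, name in table1:
        for b_id, b_tag, ref, low, high in bounds:
            if (rid, tag) == (b_id, b_tag) and low < dt <= high:
                joined.append((rid, tag, dt, amount, name, ref))
    return joined

for row in range_join(table1, table2):
    print(row)
```

Because the intervals for one id/tag do not overlap, each table1 row can match at most one table2 row, which is what removes the duplicates.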
If my understanding is correct, you want to merge by ID, tag and the closest date: 01JUL2018:00:04 in table1 is closest to 01JUL2018:00:06:00 in table2, and 01JUL2018:00:09 is closest to 01JUL2018:00:10:00. You could try this:
data table1;
input id tag date:datetime21. amount name_x $15.;
format date datetime21.;
cards;
1 23 01JUL2018:00:04 12 smith ltd
1 23 01JUL2018:00:09 12 anna smith
;
data table2;
input id tag ref amount date: datetime21.;
format date datetime21.;
cards;
1 23 19 12 01JUL2018:00:06:00
1 23 20 12 01JUL2018:00:10:00
;
proc sql;
select a.*,b.ref from table1 a inner join table2 b
on a.id=b.id and a.tag=b.tag
group by a.id,a.tag,a.date
having abs(a.date-b.date)=min(abs(a.date-b.date));
quit;

Splitting a Column into two based on conditions in Proc Sql, SAS

I want to split the Airlines column into two groups and then add up each group's amount for all clients:
Group 1 = Air India & Jet Airways
Group 2 = Others
Loc Client_Name Airlines Amount
BBI A_1ABC2 Air India 41302
BBI A 1ABC2 Air India 41302
MAA Th 1ABC2 Spice Jet Airlines 288713
HYD Ma 1ABC2 Jet Airways 365667
BOM Vi 1ABC2 Air India 552506
Something like this:
Rank Client_name Group1 Group2 Total
1 Ca 1ABC2 5266269 7040320 1230658
2 Ve 1ABC2 2815593 2675886 5491479
3 Ma 1ABC2 1286686 437843 1724529
4 Th 1ABC2 723268 701712 1424980
5 Ec 1ABC2 113517 627734 741251
6 A 1ABC2 152804 439381 592185
I grouped it first, but I am confused about how to split:
Data assign6.Airlines_grouping1;
Set assign6.Airlines_grouping;
if Scan(Airlines,1) IN ('Air','Jet') then Group = "Group1";
else
if Scan(Airlines,1) Not in('Air','Jet') then Group = "Group2";
Run;
You are categorizing a row based on the first word of the airline.
Proc TRANSPOSE with an ID statement is one common way to reshape data so that a categorical value becomes a column. A second way is to bypass the categorization and use a data step to produce the new shape of data directly.
Here is an example of the second way -- create new columns group1 and group2 and set value based on airline criteria.
data airlines_group_amounts;
set airlines;
if scan (airlines,1) in ('Air', 'Jet') then
group1 = amount;
else
group2 = amount;
run;
Then summarize over client:
proc sql;
create table want as
select
client_name
, sum(group1) as group1
, sum(group2) as group2
, sum(amount) as total
from airlines_group_amounts
group by client_name
;
quit;
You can avoid the two steps and do all of the processing in a single query, or you can do the summarization with Proc MEANS.
Here is a single query way.
proc sql;
create table want as
select
client_name
, sum(case when scan (airlines,1) in ('Air', 'Jet') then amount else 0 end) as group1
, sum(case when scan (airlines,1) in ('Air', 'Jet') then 0 else amount end) as group2
, sum(amount) as total
from airlines
group by client_name
;
quit;
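The conditional sum can be illustrated with a short Python sketch over the sample rows from the question: each amount goes to group1 or group2 depending on the first word of the airline name, and is always added to the total. The function name group_amounts is hypothetical, and str.split() only approximates the default delimiters of SAS scan().

```python
from collections import defaultdict

# Sample rows from the question: (loc, client_name, airline, amount)
rows = [
    ("BBI", "A_1ABC2", "Air India", 41302),
    ("BBI", "A 1ABC2", "Air India", 41302),
    ("MAA", "Th 1ABC2", "Spice Jet Airlines", 288713),
    ("HYD", "Ma 1ABC2", "Jet Airways", 365667),
    ("BOM", "Vi 1ABC2", "Air India", 552506),
]

def group_amounts(rows, group1_words=("Air", "Jet")):
    """Sum amounts per client into group1, group2 and total."""
    totals = defaultdict(lambda: {"group1": 0, "group2": 0, "total": 0})
    for _loc, client, airline, amount in rows:
        first_word = airline.split()[0]
        key = "group1" if first_word in group1_words else "group2"
        totals[client][key] += amount
        totals[client]["total"] += amount
    return dict(totals)

totals = group_amounts(rows)
for client, sums in totals.items():
    print(client, sums["group1"], sums["group2"], sums["total"])
```

Note that "Spice Jet Airlines" lands in group2 because only the first word is tested, the same as in the scan(airlines,1) condition.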

SAS: Exclude patients based on diagnoses on multiple lines and calculate incidence rates

I have large dataset of a few million patient encounters that include a diagnosis, timestamp, patientID, and demographic information.
For each patient, their diagnoses are listed on multiple lines. I need to exclude patients who have a certain diagnosis (282.1) and calculate incidence rates of other diseases in the year 2014.
IF diagnosis NE 282.1;
This in the data step does not work, because it does not take into account the other diagnoses on the other lines.
If possible, I would also like to calculate the incidence rates by disease.
This is an example of what the data looks like. There are multiple lines with multiple diagnoses.
PatientID Diagnosis Date Gender Age
1 282.1 1/2/10 F 25
1 232.1 1/2/10 F 87
1 250.02 1/2/10 F 41
1 125.1 1/2/10 F 46
1 90.1 1/2/10 F 58
2 140 12/15/13 M 57
2 132.3 12/15/13 M 41
2 149.1 12/15/13 M 66
3 601.1 11/19/13 F 58
3 231.1 11/19/13 F 76
3 123.1 11/19/13 F 29
4 282.1 12/30/14 F 81
4 130.1 12/30/14 F 86
5 230.1 1/22/14 M 60
5 282.1 1/22/14 M 46
5 250.02 1/22/14 M 53
Dual reading solution
Straightforward version
You said you sorted the data first, probably like this
proc sort data=MYLIB.DIAGNOSES;
by PatientID;
run;
Assuming your data is ordered by PatientID, you can process each patient with the diagnosis to exclude first.
data WORK.NOT_HAVING_282_1;
set MYLIB.DIAGNOSES (where=(diagnosis EQ 282.1))
MYLIB.DIAGNOSES (where=(diagnosis NE 282.1));
by PatientID;
As we need to report by year, not by date:
year = year(Date);
The next step is to exclude those you don't need, so you need to remember if the unwanted diagnosis occurred:
retain has_282_1;
if first.PatientID then has_282_1 = 0;
if diagnosis EQ 282.1 then has_282_1 = 1;
and then keep the other diagnoses in 2014 for patients that do not have 282.1
else if not has_282_1 then output;
run;
Next you could use SQL to count what you need
proc sql;
create table MYLIB.STATISTICS as
select year, Diagnosis, count(distinct PatientID) as incidence
from WORK.NOT_HAVING_282_1
group by year, Diagnosis;
quit;
Improvements
The above solution would take more processing power than needed:
you read DIAGNOSES from disk, then write NOT_HAVING_282_1 to disk, just to read it back in again
you can keep multiple observations of the same diagnosis at different dates in the same year for the same patient, so you need count(distinct PatientID), which is a costly operation.
About diagnosis 282.1, we only need to know who was ever diagnosed:
proc sort noduplicates
data=MYLIB.DIAGNOSES (where=(diagnosis EQ 282.1))
out=WORK.HAVING_282_1 (keep=PatientID);
by PatientID;
run;
About other diagnoses, we also need the year, which we calculate here:
data WORK.VIEW_OTHER / view=WORK.VIEW_OTHER;
set MYLIB.DIAGNOSES (where=(diagnosis NE 282.1));
year = year(Date);
keep PatientID year Diagnosis;
run;
But as we use a view, we do not really read and calculate anything before the view is used in this sort:
proc sort noduplicates
data=WORK.VIEW_OTHER
out=WORK.OTHER_DIAGNOSES;
by PatientID year Diagnosis;
run;
Now things become simpler. We use temporary variables exclude and other to indicate where the data came from:
data WORK.NOT_HAVING_282_1;
set WORK.HAVING_282_1 (in=exclude)
WORK.OTHER_DIAGNOSES (in=other);
by PatientID;
retain has_282_1;
if first.PatientID then has_282_1 = exclude;
if other and not has_282_1 then output;
run;
proc sql;
create table MYLIB.STATISTICS as
select year, Diagnosis, count(*) as incidence
from WORK.NOT_HAVING_282_1
group by year, Diagnosis;
quit;
Remark: this code is not tested
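The two-pass logic (first find everyone who ever had 282.1, then count the remaining diagnoses per year) can be checked with a small Python sketch on the sample data from the question. The function name incidence_without is hypothetical, diagnoses are kept as strings to avoid floating-point comparisons, and two-digit years are assumed to be in the 2000s.

```python
from collections import defaultdict

# Sample rows from the question: (patient_id, diagnosis, date as m/d/yy)
records = [
    (1, "282.1", "1/2/10"), (1, "232.1", "1/2/10"), (1, "250.02", "1/2/10"),
    (1, "125.1", "1/2/10"), (1, "90.1", "1/2/10"),
    (2, "140", "12/15/13"), (2, "132.3", "12/15/13"), (2, "149.1", "12/15/13"),
    (3, "601.1", "11/19/13"), (3, "231.1", "11/19/13"), (3, "123.1", "11/19/13"),
    (4, "282.1", "12/30/14"), (4, "130.1", "12/30/14"),
    (5, "230.1", "1/22/14"), (5, "282.1", "1/22/14"), (5, "250.02", "1/22/14"),
]

def incidence_without(records, excluded_dx="282.1"):
    # Pass 1: patients who ever had the excluded diagnosis.
    excluded = {pid for pid, dx, _day in records if dx == excluded_dx}
    # Pass 2: distinct remaining patients per (year, diagnosis).
    patients = defaultdict(set)
    for pid, dx, day in records:
        if pid not in excluded:
            year = 2000 + int(day.rsplit("/", 1)[1])  # assumes years 2000-2099
            patients[(year, dx)].add(pid)
    counts = {key: len(pids) for key, pids in patients.items()}
    return excluded, counts

excluded, incidence = incidence_without(records)
print(sorted(excluded))
for (year, dx), n in sorted(incidence.items()):
    print(year, dx, n)
```

Collecting patient IDs in a set per (year, diagnosis) plays the role of count(distinct PatientID): repeated encounters for the same patient, year and diagnosis are counted once.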