I have a sales dataset where, in addition to Sale# and several foreign keys (ID1, ID2, ID3, ID4), I have up to three invoices associated with each Sale#, e.g. Invoice#1, SaleAmount1, Date1, Invoice#2, SaleAmount2, Date2, Invoice#3, SaleAmount3, Date3, all as columns.
I need this invoice information as rows (as shown below) instead of columns. Any idea how it can be done in Power BI?
Sales# ID1 ID2 ID3 Invoice# SaleAmount Date
----------------------------------------------------------
Sales#1 123 XYZ A234Y Invoice#1 SaleAmount1 Date1
Sales#1 123 XYZ A234Y Invoice#2 SaleAmount2 Date2
Sales#1 123 XYZ A234Y Invoice#3 SaleAmount3 Date3
Yep, you just need to unpivot sets of columns.
For example, if you select the 3 date columns in the following table
Sales# ID1 Date1 Date2 Date3
-----------------------------------------
Sale#1 123 1/1/2018 1/2/2018 1/3/2018
Sale#2 456 2/2/2018 3/3/2018 4/4/2018
and go to Transform > Unpivot Columns in the query editor, then you'll get this:
Sales# ID1 Attribute Value
--------------------------------
Sale#1 123 Date1 1/1/2018
Sale#1 123 Date2 1/2/2018
Sale#1 123 Date3 1/3/2018
Sale#2 456 Date1 2/2/2018
Sale#2 456 Date2 3/3/2018
Sale#2 456 Date3 4/4/2018
Then pick which column(s) you want to keep and rename appropriately.
You can do the other sets of 3 columns in the same way.
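The same transformation can also be written directly as a Power Query M step. Here is a minimal sketch, where the previous step name Source and the column names are assumptions based on the example above:
// Unpivot the three date columns into Attribute/Value pairs (assumed names)
#"Unpivoted Dates" = Table.Unpivot(Source, {"Date1", "Date2", "Date3"}, "Attribute", "Value")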
I need to write a DATA step in SAS where I need to give sequence numbers to a column, starting from a particular number.
For example right now my table looks like this:
Column 1 Column 2
abc book1
xyz book2
zex book3
I want my table to look like this:
Column 1 Column 2 Column3
abc book1 151
xyz book2 152
zex book3 153
How to add Column 3 with a sequence number starting from a particular number?
How about this
data have;
input Column1 $ Column2 $;
datalines;
abc book1
xyz book2
zex book3
;
data want;
  /* The loop index serves as the sequence number; start it at the desired first value (151). */
  do Column3 = 151 by 1 until (lr);
    set have end=lr;
    output;
  end;
run;
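If the sequence is just an offset plus the row number, the automatic variable _N_ gives an equivalent, arguably simpler form. A minimal sketch, assuming the same HAVE dataset and a starting value of 151:
data want2;
  set have;
  Column3 = 150 + _n_; /* _N_ is 1, 2, 3, ... so Column3 becomes 151, 152, 153 */
run;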
I have tables as below.
Table A (3,000 rows in total; end_date may have duplicates, e.g. accounts 123 and 234 may have the same end_date)
enroll_dt,end_date, acct_nbr
12/31/2016, 01/03/2017, 123
12/31/2016, 01/04/2017, 234
01/05/2017, 02/02/2017, 334
Table B (30 unique values in total)
enroll_dt
12/31/2016
01/01/2017
01/02/2017
01/03/2017
01/04/2017
01/05/2017
...
Desired table:
Date number_of_records
12/31/2016 2
01/01/2017 2
01/02/2017 2
01/03/2017 2
01/04/2017 1
02/01/2017 1
What I want to do is, for each date (DateA) from Table B, look at all rows from Table A and count how many accounts were enrolled by DateA and still have end_date greater than DateA.
For example, for 01/01/2017 from Table B, number_of_records = 2, since only two accounts (acct_nbr 123 and 234) were enrolled by 01/01/2017, and their end_dates (01/03/2017 and 01/04/2017) are both greater than 01/01/2017.
Thanks a lot for your help
Assuming your dates are stored as actual dates:
select
    b.datea,
    count(distinct a.acct_nbr) as number_of_records
from
    b
    inner join a
        on  a.enroll_dt <= b.datea
        and a.end_date  >= b.datea
group by
    1
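If this is running in SAS, here is a minimal, self-contained PROC SQL sketch of the same query. The column names are assumptions (Table B's single date column is called datea here, matching the query above), and the dates are assumed to be true date values; with the sample rows it should reproduce the desired counts:
data a;
  input enroll_dt :mmddyy10. end_date :mmddyy10. acct_nbr;
  format enroll_dt end_date mmddyy10.;
  datalines;
12/31/2016 01/03/2017 123
12/31/2016 01/04/2017 234
01/05/2017 02/02/2017 334
;

data b;
  input datea :mmddyy10.;
  format datea mmddyy10.;
  datalines;
12/31/2016
01/01/2017
01/02/2017
01/03/2017
01/04/2017
02/01/2017
;

proc sql;
  create table want as
  select b.datea,
         count(distinct a.acct_nbr) as number_of_records
  from b
       inner join a
         on  a.enroll_dt <= b.datea  /* account already enrolled by this date */
         and a.end_date  >= b.datea  /* and its end_date has not passed yet   */
  group by b.datea
  order by b.datea;
quit;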
I am struggling to join two tables without creating duplicate rows using PROC SQL (not sure if any other method is more efficient).
Inner join is on: datepart(table1.date)=datepart(table2.date) AND tag=tag AND ID=ID
I think the problem is the datetime values and the different names in table 1. Just by looking at the tables it is clear that table1's row 1 should be joined with table2's row 1, because the transaction started at 00:04 in table 1 and finished at 00:06 in table 2. The issue I am having is that I can't join on the dates while they carry timestamps, so I am removing the timestamps, and because of that the join creates duplicates.
Table1:
id tag date amount name_x
1 23 01JUL2018:00:04 12 smith ltd
1 23 01JUL2018:00:09 12 anna smith
table 2:
id tag ref amount date
1 23 19 12 01JUL2018:00:06:00
1 23 20 12 01JUL2018:00:10:00
Desired output:
id tag date amount name_x ref
1 23 01JUL2018 12 smith ltd 19
1 23 01JUL2018 12 anna smith 20
Appreciate your help.
Thanks!
You need to set a boundary for that datetime join; you are correct about why you are getting duplicates. I would guess the lower bound is the previous datetime, if it exists, and the upper bound is this record's datetime.
As an aside, this is poor database design on someone's part...
Let's first sort table2 by id, tag, and date
proc sort data=table2 out=temp;
by id tag date;
run;
Now write a data step to add the previous date for unique id/tag combinations.
data temp;
  set temp;
  format low_date datetime20.;
  by id tag;
  retain p_date;
  if first.tag then
    p_date = 0;
  low_date = p_date; /* lower bound = previous record's datetime (0 at the start of each id/tag group) */
  p_date = date;     /* carry this record's datetime forward as the next record's lower bound */
run;
Now update your join to use the date range.
proc sql noprint;
create table want as
select a.id, a.tag, a.date, a.amount, a.name_x, b.ref
from table1 as a
inner join
temp as b
on a.id = b.id
and a.tag = b.tag
and b.low_date < a.date <= b.date;
quit;
If my understanding is correct, you want to merge by ID, tag, and the closest pair of dates; that is, 01JUL2018:00:04 in table1 is closest to 01JUL2018:00:06:00 in table2, and 01JUL2018:00:09 is closest to 01JUL2018:00:10:00. You could try this:
data table1;
input id tag date:datetime21. amount name_x $15.;
format date datetime21.;
cards;
1 23 01JUL2018:00:04 12 smith ltd
1 23 01JUL2018:00:09 12 anna smith
;
data table2;
input id tag ref amount date: datetime21.;
format date datetime21.;
cards;
1 23 19 12 01JUL2018:00:06:00
1 23 20 12 01JUL2018:00:10:00
;
proc sql;
select a.*,b.ref from table1 a inner join table2 b
on a.id=b.id and a.tag=b.tag
group by a.id,a.tag,a.date
having abs(a.date-b.date)=min(abs(a.date-b.date));
quit;
I have 4 columns in my SAS dataset (ID, Date1, Date2, and Bill). I need to compare the dates of consecutive rows by ID. For each ID, if Date2 occurs before the next row's Date1 for the same ID, then keep the Bill amount. If Date2 occurs after the next row's Date1, delete the Bill amount. So for each ID, only keep the Bill where Date2 is less than the next row's Date1.
You'll want to create a new variable that moves the next row's DATE1 up one row to make the comparison. Assuming your date variables are in a date format, use PROC EXPAND and make the comparison, making sure that you're not comparing the last value, which will have a missing LEAD value:
DATA TEST;
INPUT ID: $3. DATE1: MMDDYY10. DATE2: MMDDYY10. BILL: 8.;
FORMAT DATE1 DATE2 MMDDYY10.;
DATALINES;
AA 07/23/2015 07/31/2015 34
AA 07/30/2015 08/10/2015 50
AA 08/12/2015 08/15/2015 18
BB 07/23/2015 07/24/2015 20
BB 07/30/2015 08/08/2015 20
BB 08/06/2015 08/08/2015 20
;
RUN;
PROC EXPAND DATA = TEST OUT=TEST1 METHOD=NONE;
BY ID;
CONVERT DATE1 = DATE1_LEAD / TRANSFORMOUT=(LEAD 1);
RUN;
DATA TEST2; SET TEST1;
IF DATE1_LEAD NE . AND DATE2 GT DATE1_LEAD THEN BILL=.;
RUN;
If you sort your data so that you are looking at the previous observation to compare your dates, you can use the LAG function in a DATA step instead.
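A minimal sketch of that LAG approach, using the TEST dataset from the answer above: sorting descending by DATE1 within ID makes the next row (in date order) the previous observation, so LAG can fetch its DATE1.
proc sort data=test out=test_desc;
  by id descending date1;
run;

data test_lag;
  set test_desc;
  by id;
  next_date1 = lag(date1);          /* previous obs in this order = next row in ascending date order */
  if first.id then next_date1 = .;  /* do not carry a value across IDs */
  if next_date1 ne . and date2 gt next_date1 then bill = .;
  drop next_date1;
run;

proc sort data=test_lag;
  by id date1;
run;
The final sort just restores the original ascending order.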
I'm attempting to put together a list of people and the number of times a claim is submitted in a unique combination.
Table A structure is setup like this:
PERSON_ID CLAIM_ID
123456 A123C
123456 Z321C
123456 B123C
111111 A123C
111111 Z321C
Table B structure is setup like this:
PERSON_ID CLAIM_1 CLAIM_2 CLAIM_3
123456 A123C Z321C B123C
123456 A123C B123C
123456 B123C
111111 A123C Z321C
111111 A321C
The results I need to produce is like this:
PERSON_ID CLAIM_ID NUM_TIMES_CLAIMED
123456 A123C 2
123456 Z321C 1
123456 B123C 3
111111 A123C 1
111111 Z321C 2
I can do this in MS Access using loops over open recordsets, and I've tried researching how to open the equivalent of a recordset in SAS and loop through it (with macros), but I can't seem to sort out how to implement it correctly.
Any ideas?
EDIT
The steps that I think I have to take are:
Step 1 - Isolate a single person's distinct list of CLAIM_IDs
Step 2 - For each CLAIM_ID, scan across 25 variables to find a match
Step 3 - Count each time a match is found
Step 4 - Save observation (PERSON_ID, CLAIM_ID, NUM_TIMES_CLAIMED)
Coming from VBA to SAS, I can't seem to isolate a single person's distinct list of claims and loop through them while also looping through each of the 25 variables in TABLE B.
Here's what I use to evaluate whether one claim is billed with another, which is what I think I need to automate somehow:
data LOCALPC.SEL_ASMT_DEL;
  SET LOCALPC.FY2014_CC_FINAL;
  ARRAY FSC{25} $ FSC1-FSC25;
  DO I = 1 TO 25;
    IF FIND (FSC{I},'A123A') THEN
      DO J = I+1 TO 25;  /* scan the remaining columns for the paired claim */
        IF FIND (FSC{J},'Z321A') THEN
          OUTPUT;
      END;
  END;
RUN;
I think you can get the result just from 'Table A', assuming all the claims are inserted into Table A in the form of rows and there are duplicated claims for a person_id.
SELECT PERSON_ID, CLAIM_ID, COUNT(1)
FROM [TABLE A] A
GROUP BY PERSON_ID, CLAIM_ID
If not, then please describe your table structures and relations between them so that we could help you.
Not sure why you would ever use loops to answer a straightforward join. Now, it would be easier if you first converted table B to a more normalized form.
First get your sample data into datasets:
data A ;
length PERSON_ID CLAIM_ID $10 ;
input PERSON_ID CLAIM_ID ;
cards;
123456 A123C
123456 Z321C
123456 B123C
111111 A123C
111111 Z321C
;;;;
data B ;
length PERSON_ID CLAIM_1 - CLAIM_3 $10 ;
input PERSON_ID CLAIM_1-CLAIM_3 ;
cards;
123456 A123C Z321C B123C
123456 A123C B123C .
123456 B123C . .
111111 A123C Z321C .
111111 A321C . .
;;;;
Then just join the tables and count the number of matching rows.
proc sql ;
create table want as
select a.*,count(*) as num_times_claimed
from a
left join b
on a.person_id = b.person_id
and (a.claim_id = b.claim_1
or a.claim_id = b.claim_2
or a.claim_id = b.claim_3
)
group by 1,2
order by 1,2
;
quit;
proc print; run;
Results:
PERSON_ID  CLAIM_ID  num_times_claimed
111111 A123C 1
111111 Z321C 1
123456 A123C 2
123456 B123C 3
123456 Z321C 1