Select median value per time range - SAS

I need to select a median value for each id within each age range. So in the following table, for id = 1 in the 6-month age_range, I need to select the value in row 2 (wt = 23). Basically, I need a column where, for each id and age_range, the median value appears on every row.
id wt age_range
1 22 6
1 23 6
1 24 6
2 25 12
2 24 12
2 44 18

If I understand correctly, you're looking to make a new column where for each id and age_range you have the median value for comparison. You could do this in base SAS by using PROC MEANS to output the medians and then merging them back onto the original dataset (a sketch of that approach follows the output below). However, PROC SQL will do this all in one step and lets you easily name the new column.
proc sql;
  create table want as
  select id, wt, age_range,
         median(wt) as median_wt  /* the group median is remerged onto every row */
  from have
  group by id, age_range;
quit;
id wt age_range median_wt
1 24 6 23
1 22 6 23
1 23 6 23
2 24 12 24.5
2 25 12 24.5
2 44 18 44
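For reference, a minimal sketch of the PROC MEANS route mentioned above (the dataset names medians and want2 are illustrative):
proc means data=have noprint nway;
  class id age_range;  /* one summary row per id * age_range combination */
  var wt;
  output out=medians(drop=_type_ _freq_) median=median_wt;
run;

proc sort data=have;
  by id age_range;
run;

data want2;
  merge have medians;  /* attach each group's median to every original row */
  by id age_range;
run;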

Related

How to create a monthly summary table in Power BI

In Power BI, I have a table with daily data:
Table 1
Day        Order
1/1/2022   3
1/31/2022  5
2/2/2022   7
2/11/2022  12
3/1/2022   31
4/31/2022  5
4/2/2022   7
6/11/2022  21
And I want to have a summary table for months like
Table 2
Month    Order
1 2022   8
2 2022   19
3 2022   31
4 2022   12
6 2022   21
How can I do that using DAX?
Sure. Create a Calculated Table using the SUMMARIZECOLUMNS function.
Something like:
OrdersByMonth =
SUMMARIZECOLUMNS ( MyTable[Month], "Orders", SUM ( MyTable[Order] ) )
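Note that the sample table only has Day and Order columns, so a Month column has to exist before you can group on it. A minimal sketch, assuming the table is named MyTable (the Month and OrdersByMonth names are illustrative):
// Calculated column on MyTable: a month-year key such as "1 2022"
Month = FORMAT ( MyTable[Day], "M YYYY" )

// Calculated table: one row per month with the summed orders
OrdersByMonth =
SUMMARIZECOLUMNS (
    MyTable[Month],
    "Orders", SUM ( MyTable[Order] )
)
A text key like "1 2022" won't sort chronologically on its own; a numeric key such as YEAR ( MyTable[Day] ) * 100 + MONTH ( MyTable[Day] ) would.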

Calculating rows in a table for every Quarter in DAX, Power BI

I have the following tbl_Episodes table (50K records):
ID Month
22 01/01/2019
22 02/01/2019
22 03/01/2019
22 04/01/2019
22 05/01/2019
23 03/01/2020
23 06/01/2020
I need to create a calculated column in DAX that places a "1" on each row whose date is the beginning or the end of a quarter, and a "0" otherwise:
ID Month NewColumn
22 01/01/2019 1
22 02/01/2019 0
22 03/01/2019 1
22 04/01/2019 0
22 05/01/2019 0
23 03/01/2020 1
23 06/01/2020 1
There are only 4 quarters, so the simpler way is to SWITCH on the dates. Add this column to your calendar table (this assumes the calendar table has a Year column):
NewColumn =
SWITCH ( [Month],
    DATE ( [Year], 1, 1 ), 1,  DATE ( [Year], 3, 31 ), 1,
    DATE ( [Year], 4, 1 ), 1,  DATE ( [Year], 6, 30 ), 1,
    DATE ( [Year], 7, 1 ), 1,  DATE ( [Year], 9, 30 ), 1,
    DATE ( [Year], 10, 1 ), 1, DATE ( [Year], 12, 31 ), 1,
    0 )
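Equivalently, since the flagged dates are exactly the first and last days of each calendar quarter, the literal list can be replaced with date arithmetic. A sketch, under the same assumption that [Month] holds a date:
NewColumn =
IF (
    [Month] = DATE ( YEAR ( [Month] ), 3 * QUARTER ( [Month] ) - 2, 1 )
        || [Month] = EOMONTH ( [Month], 2 - MOD ( MONTH ( [Month] ) - 1, 3 ) ),
    1,
    0
)
Here DATE ( YEAR ( [Month] ), 3 * QUARTER ( [Month] ) - 2, 1 ) is the first day of the row's quarter, and the EOMONTH expression lands on the last day of that quarter.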

Changing ID from nth to last row if something happens at nth row

My data has a problem. The survey is conducted on housing units, so two rows with the same person ID might not actually refer to the same person.
I want to assign a different ID to each actually-different person.
Let's say I have this data.
id yearmonth age
1 200001 12
1 200002 12
1 200003 14
1 200004 14
1 200005 14
The 3rd row is definitely a different person: its age increases by 2 from one month to the next.
So I want to change ID like
id yearmonth age
1 200001 12
1 200002 12
10 200003 14
10 200004 14
10 200005 14
How can I do this? I think I can change the ID of the 3rd row by writing
bysort id (yearmonth): replace id=id*10 if age[_n-1]>age+1 | age[_n-1]+1<age
(where I multiply by 10 because all IDs have the same number of digits, so multiplying by 10 won't create any duplicates).
But how can I change all the subsequent rows as well?
Building on what you have, something like this might do what you want.
* Flag a change; guard _n==1, where age[_n-1] is missing (missing compares high)
bysort id (yearmonth): generate idchange = _n>1 & (age[_n-1]>age+1 | age[_n-1]+1<age)
* Running count of detected changes within each original id
bysort id (yearmonth): generate numchange = sum(idchange)
replace id = 10*id + (numchange-1) if numchange>0
Note that this will also handle the case where one original id has two or more changes detected (up to 10 changes, given the multiply-by-10 scheme). For example, an id like the following has two detected changes, so it would be split into three distinct IDs:
id yearmonth age
2 200001 12
2 200002 14
2 200003 15
2 200004 18
2 200005 18
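To check the logic against the original example, a minimal reproducible run (the data and variable names are the asker's):
clear
input id yearmonth age
1 200001 12
1 200002 12
1 200003 14
1 200004 14
1 200005 14
end
bysort id (yearmonth): generate idchange = _n>1 & (age[_n-1]>age+1 | age[_n-1]+1<age)
bysort id (yearmonth): generate numchange = sum(idchange)
replace id = 10*id + (numchange-1) if numchange>0
* The last three rows now carry id 10, as requested
list id yearmonth age, clean noobs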

All the records within an hour of each other

I have a data set that has an ID, a datetime, and a bunch of value fields.
The idea is that records within one hour of each other form one session. There can only be one session every 24 hours. (Time is measured from the start of the first record.)
The day() approach does not work, as one record can be at 11:55 PM and the next at 12:01 AM the next day, and they would still be the same session.
I've added a rowid and run the following:
data testing;
set testing;
by subscriber_no;
prev_dt = lag(record_ts);
prev_row = lag(rowid);
time_from_last = intck("Second",record_ts,prev_dt);
if intck("Second",record_ts,prev_dt) > -60*60 and intck("Second",record_ts,prev_dt) < 0 then
same_session = 'yes';
else same_session = 'no';
if intck("Second",record_ts,prev_dt) > -60*60 and intck("Second",record_ts,prev_dt) < 0 then
rowid = prev_row;
else rowid = rowid;
format prev_dt datetime19.;
output;
run;
Input
ID record_TS rowid
52 17MAY2017:06:24:28 4
52 17MAY2017:07:16:12 5
91 05APR2017:07:04:55 6
91 05APR2017:07:23:37 7
91 05APR2017:08:04:52 8
91 05MAY2017:08:56:23 9
The input file is sorted by ID and record_TS.
The output was
ID record_TS rowid prev_dt prev_row time_from_last same_session
52 17MAY2017:06:24:28 4 28APR2017:08:51:25 3 -1632783 no
52 17MAY2017:07:16:12 4 17MAY2017:06:24:28 4 -3104 yes
91 05APR2017:07:04:55 6 17MAY2017:07:16:12 5 3629477 no
91 05APR2017:07:23:37 6 05APR2017:07:04:55 6 -1122 yes
91 05APR2017:08:04:52 7 05APR2017:07:23:37 7 -2475 yes   <- this needs to be 6
91 05MAY2017:08:56:23 9 05APR2017:08:04:52 8 -2595091 no
In the second row from the bottom, rowid comes out as 7, while I need it to be 6.
Basically, I need the change to the current rowid to be saved before the script moves on to assess the next row.
Thank you
Ben
I've achieved what I needed with
proc sql;
create table testing2 as
select distinct t1.*, min(t2.record_TS) format datetime19. as from_time, max(t2.record_TS) format datetime19. as to_time
from testing t1
join testing t2 on t1.id_val= t2.id_val
and intck("Second",t1.record_ts,t2.record_ts) between -3600 and 3600
group by t1.id_val, t1.record_ts
order by t1.id_val, t1.record_ts
;
quit;
But I'm still wondering if there is a way to commit changes to current row before moving to assess the next row.
I think your logic is just:
Grab record_TS datetime of the first record for each ID
For subsequent records, if their record_TS is within an hour of the first record's, recode it to be the same rowID as first record.
If that's the case, you can use RETAIN to keep track of the first record_TS and rowID for each ID. This should be easier than lag(), and allows there to be multiple records in a single session. Below seems to work:
data have;
  input ID record_TS :datetime. rowid;
  format record_TS datetime.;
  cards;
52 17MAY2017:06:24:28 4
52 17MAY2017:07:16:12 5
91 05APR2017:07:04:55 6
91 05APR2017:07:23:37 7
91 05APR2017:08:04:52 8
91 05MAY2017:08:56:23 9
;
run;
data want;
  set have;
  by ID Record_TS;
  retain SessionStart SessionRowID;
  if first.ID then do;
    SessionStart=Record_TS;
    SessionRowID=RowID;
  end;
  else if (record_TS-SessionStart) < (60*60) then RowID=SessionRowID;
  else do;  /* more than an hour from the session start: begin a new session */
    SessionStart=Record_TS;
    SessionRowID=RowID;
  end;
  drop SessionStart SessionRowID;
run;
Outputs:
ID record_TS rowid
52 17MAY17:06:24:28 4
52 17MAY17:07:16:12 4
91 05APR17:07:04:55 6
91 05APR17:07:23:37 6
91 05APR17:08:04:52 6
91 05MAY17:08:56:23 9

Adding a column based on ID in another dataset

data1 is data from 1990 and it looks like
Panelkey Region income
1 9 30
2 1 20
4 2 40
data2 is data from 2000 and it looks like
Panelkey Region income
3 2 40
2 1 30
1 1 20
I want to add a column indicating where each person lived in 1990.
Panelkey Region income Region1990
3 2 40 .
2 1 30 1
1 1 20 9
How can I do this in Stata?
The following code will deal with panels that lived in multiple regions in the same year by choosing the region with the larger income. This would make sense if income were proportional to the fraction of the year spent in a region. Ties on income are broken arbitrarily in favor of the higher Region value. Other types of aggregation might make sense (take a look at the -collapse- command).
Note that I tweaked your data by inserting second rows for the last observation in each year:
clear
input Panelkey Region income
1 9 30
2 1 20
4 2 40
4 10 80
end

* Suffix the 1990 variables so they survive the merge
rename (Region income) =1990

* Keep one row per panel: the highest income (income ties go to the higher region)
bysort Panelkey (income1990 Region1990): keep if _n==_N
isid Panelkey
save "data1990.dta", replace

clear
input Panelkey Region income
3 2 40
2 1 30
1 1 20
1 9 20
end

bysort Panelkey (income Region): keep if _n==_N
isid Panelkey

* Attach the 1990 variables; keep 2000 panels even if unmatched
merge 1:1 Panelkey using "data1990.dta", keep(match master) nogen
list, clean noobs
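If each Panelkey appears at most once per year (as in the original listing, before the tweak), the tie-breaking can be dropped entirely. A minimal sketch, assuming the two raw datasets are saved as data1990.dta and data2000.dta:
use data1990, clear
keep Panelkey Region
rename Region Region1990
save region1990, replace

use data2000, clear
merge 1:1 Panelkey using region1990, keep(match master) nogen
list, clean noobs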