Expanding per-person cumulative variable to time-interval variable - stata

I have a dataset that shows how people spent 30 minutes, recorded in 10-minute intervals.
Person cumulative_time Activity
A 10 Game
A 30 Eat
B 10 Sleep
B 20 Game
B 30 Sleep
This means person A played games for the first 10 minutes
and ate for the next 20 minutes,
while person B slept for the first 10 minutes,
played games for the next 10 minutes, and slept for the last 10 minutes.
I want to restructure the dataset. Each row will be each unique person.
Then, each column will be each time interval like this.
Person time10 time20 time30
A Game Eat Eat
B Sleep Game Sleep
I know I can use "collapse" to make person unique but I don't know how this can be used for my purpose. The "reshape" command does something similar but again I cannot figure out how to use it to do what I want to do.

Reshape is the way to solve this problem. Something like this may accomplish what you need.
clear
input str1 Person int cumulative_time str8 Activity
A 10 Game
A 30 Eat
B 10 Sleep
B 20 Game
B 30 Sleep
end
rename Activity time
reshape wide time, i(Person) j(cumulative_time)
* each cumulative_time marks the end of an activity, so a gap takes the value of the next interval
replace time20 = time30 if missing(time20)
replace time10 = time20 if missing(time10)
list, clean
If your problem had many cumulative_time values, not just three, I would solve the problem of missing values in a different way.
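For many cumulative_time values, the general idea is a backward fill: each recorded cumulative_time marks the end of an activity, so an unrecorded interval takes the activity of the next recorded one. The following is only a rough sketch of that logic in Python rather than Stata, to make the fill direction explicit; the function name expand_intervals and the step parameter are made up for illustration.

```python
def expand_intervals(rows, step=10):
    """rows: (person, cumulative_time, activity) tuples.

    Returns {person: [activity for each interval, earliest first]}.
    """
    recorded_by_person = {}
    for person, t, activity in rows:
        recorded_by_person.setdefault(person, {})[t] = activity

    last = max(t for _, t, _ in rows)
    wide = {}
    for person, recorded in recorded_by_person.items():
        filled = []
        current = None
        # Walk from the last interval to the first so that every gap
        # inherits the activity of the next recorded interval.
        for t in range(last, 0, -step):
            current = recorded.get(t, current)
            filled.append(current)
        wide[person] = filled[::-1]
    return wide


rows = [
    ("A", 10, "Game"), ("A", 30, "Eat"),
    ("B", 10, "Sleep"), ("B", 20, "Game"), ("B", 30, "Sleep"),
]
wide = expand_intervals(rows)
```

With the example data, wide["A"] is ["Game", "Eat", "Eat"] and wide["B"] is ["Sleep", "Game", "Sleep"], which matches the layout asked for in the question.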

In addition to William Lisowski's answer, here is an approach using the tsset and tsfill commands:
clear
input str1 Person int cumulative_time str8 Activity
A 10 Game
A 30 Eat
B 10 Sleep
B 20 Game
B 30 Sleep
end
rename Activity time
egen id = group(Person)
tsset id cumulative_time, delta(10)
tsfill, full
bysort id (cumulative_time) : replace Person = Person[_n-1] if Person==""
bysort id (cumulative_time) : replace time = time[_n+1] if time==""
drop id
reshape wide time, i(Person) j(cumulative_time)
list, clean
Which outputs:
Person time10 time20 time30
1. A Game Eat Eat
2. B Sleep Game Sleep


QuickSight - How to calculate sum of rows with if statement

Given the following table, I want to create another table with the total video_duration and the total duration of videos that have events.
A video can have numerous events, but the duration is of course the same per video file, so the same video can appear in several rows with different events while the video duration stays the same.
input
filename  event  video_duration
A         RUN    20
A         WALK   20
B         FIGHT  10
B         RUN    10
C                30
D         WALK   25
D         FALL   25
E                15
desired output
total videos duration  videos with events duration
100                    55
What I've tried:
I created a calculated field
C_total_videos_duration = sum(max({video_duration}, [filename]))
which gave me the desired total (100). But, for god's sake, I can't figure out how to get the "videos with events duration".
I have tried:
sumIf(max({video_duration}, [{filename}]), isNotNull({event}))
ERROR: the calculation operated on LAC agg expressions is not valid
sum(maxIf({video_duration}, isNotNull({event})), [{filename}])
ERROR: Nesting of aggregate functions like NESTED_sum and NESTED_SUM(MAX(CASE WHEN "id" IS NOT NULL THEN video_duration ELSE NULL END), filename) is not allowed
ifelse(isNotNull({event}), sum(max({video_duration}, [{filename}])), 0)
ERROR: Mismatched aggregation. Custom aggregations can't contain both aggregate SUM and non-aggregated fields SUM(NESTEDMAX(video_duration, filename)) in any combination
The only thing that partially works is
sumOver(maxIf({video_duration},isNotNull(id)), [filename],POST_AGG_FILTER)
but here I get:
filename  total_videos_duration  videos_with_events_duration
A         20                     20
B         10                     10
C         30
D         25                     25
E         15
Total     100                    55
I don't want this output because I have A LOT of videos; I just want the total durations.
Thank you!
I figured it out just now!
I did sum(max(ifelse(isNotNull(id),{video_duration}, 0), [filename])) and it worked.
Thank you stack!
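To see why that expression gives 55, here is a rough Python sketch of the same logic (not QuickSight syntax): within each filename, take the maximum of "duration if the row has an event, else 0", then sum those maxima. The function name durations is made up, and None stands in for a blank event.

```python
rows = [
    ("A", "RUN", 20), ("A", "WALK", 20),
    ("B", "FIGHT", 10), ("B", "RUN", 10),
    ("C", None, 30),
    ("D", "WALK", 25), ("D", "FALL", 25),
    ("E", None, 15),
]


def durations(rows):
    """Return (total duration, duration of videos with events)."""
    total_per_file = {}
    with_event_per_file = {}
    for filename, event, duration in rows:
        # max per filename, so repeated rows of one video count once
        total_per_file[filename] = max(total_per_file.get(filename, 0), duration)
        # rows without an event contribute 0 instead of the duration
        counted = duration if event is not None else 0
        with_event_per_file[filename] = max(
            with_event_per_file.get(filename, 0), counted
        )
    return sum(total_per_file.values()), sum(with_event_per_file.values())


total, with_events = durations(rows)
```

For the sample table this yields total = 100 and with_events = 55, the two numbers in the desired output.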

DAX : Average of Cumulative Durations

I need a little help with this one. It seems very simple, but I can't write the right DAX for it.
Context
I have a table of insurance claims with the dates each claim was assigned to and unassigned from an adjuster, and the duration of each assignment in days.
ClaimID  AssignedDate  UnAssignedDate  Duration
1        10/31/2022    11/30/2022      30
1        1/1/2023      1/4/2023        3
2        10/29/2022    12/28/2022      60
2        12/28/2022    1/6/2023        9
I need a measure (CycleTime) that calculates the monthly cumulative duration for each claim and then takes the average across claims, all based on the UnAssignedDate.
Desired output.
The measure will be plotted by month-year and this is how it needs to calculate CycleTime:
November 2022 : We only have one unassigned claim (1), so the cycletime equals to that single duration (30).
December 2022 : Again, we only have one unassigned claim (2), so the cycletime equals to that single duration (60).
January 2023 : For this month, both claims were unassigned, so we need to calculate the cumulative duration for each one and then take the average:
Claim 1 : 30 + 3 = 33
Claim 2 : 60+9 = 69
CycleTime = (33 + 69)/2 = 51
The measure should work for multiple claims and multiple unassignments per claim.
Any help would be greatly appreciated. Thank you for reading!
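To pin down the intended calculation independently of DAX, here is a small Python sketch of the rule described above: for each month, take the claims with an unassignment in that month, sum each claim's durations up to and including that month, and average those sums. It is only a reference for checking a measure against; the function name cycle_time_by_month and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# (claim id, assigned date, unassigned date, duration in days)
assignments = [
    (1, date(2022, 10, 31), date(2022, 11, 30), 30),
    (1, date(2023, 1, 1), date(2023, 1, 4), 3),
    (2, date(2022, 10, 29), date(2022, 12, 28), 60),
    (2, date(2022, 12, 28), date(2023, 1, 6), 9),
]


def cycle_time_by_month(assignments):
    """Return {(year, month): average cumulative duration}."""
    months = sorted({(u.year, u.month) for _, _, u, _ in assignments})
    result = {}
    for month in months:
        cumulative = defaultdict(int)
        unassigned_this_month = set()
        for claim, _, unassigned, days in assignments:
            key = (unassigned.year, unassigned.month)
            if key <= month:
                # running total per claim up to and including this month
                cumulative[claim] += days
            if key == month:
                unassigned_this_month.add(claim)
        total = sum(cumulative[c] for c in unassigned_this_month)
        result[month] = total / len(unassigned_this_month)
    return result


cycle_time = cycle_time_by_month(assignments)
```

On the sample rows this gives 30 for November 2022, 60 for December 2022 and 51 for January 2023, matching the worked example.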

Determine Maximum Profit Algorithm C++

Consider the following problem:
The Searcy Wood Shop has a backlog of orders for its world famous rocking chair (1 chair per order). The total time required to make a chair is 1 week. However, since the chairs are sold in different regions and various markets, the amount of profit for each order may differ. In addition, there is a deadline associated with each order. The company will only earn a profit if they meet the deadline; otherwise, the profit is 0.
Write a program that will determine an optimal schedule for the orders that will maximize profit.
The first line in a test case will contain an integer, n (0 ≤ n ≤ 1000), that represents the number of orders that are pending. A value of 0 for n indicates the end of the input file.
The next n lines contain 3 positive integers each. The first integer, i, is an order number. All order numbers for a given test case are unique. The second integer represents the number of weeks from now until the deadline for order number i. The third integer represents the amount of profit that the company will earn if the deadline is met for order number i.
Example input:
7
1 3 40
2 1 35
3 1 30
4 3 25
5 1 20
6 3 15
7 2 10
4
3054 2 30
4099 1 35
3059 2 25
2098 1 40
0
Output:
100
70
The output for each test case is the maximum total profit that can be earned.
The problem that I am having is that I am struggling to come up with an algorithm that consistently finds this optimal sum.
My first idea was that I could simply go through each input week by week and choose the chair with the highest profit for said week. This didn't work though in the case that a week has two chairs that both have a higher profit than the week prior.
My next idea was that I could order the list in order from highest to lowest profit. Then I would go through the list from the highest profit and compare the current entry to the next entry and choose the entry with the lower week.
Neither of these works consistently. Can anyone help me?
I would first sort the list by the second column (number of weeks before the deadline) in increasing order and then, within each deadline, by the third column (profit) in decreasing order.
For example, in your file:
2098 1 40
2 1 35
4099 1 35
3 1 30
5 1 20
3054 2 30
3059 2 25
7 2 10
1 3 40
4 3 25
6 3 15
Among orders with the same deadline, I would pick the highest profit to execute. If the deadline is 1 week, take the top order; for 2 weeks, the top 2 orders; for 3 weeks, the top 3 orders; and so on.
First, think about which orders are eligible to be completed in week i: all the orders with a deadline greater than or equal to i. So iterate over the orders in decreasing order of deadline.
Say the last deadline week is x. Push all the profit values with deadline x into a priority queue. The maximum of the pushed values is your optimal profit for week x, so remove it from the priority queue and add it to your answer. The remaining values are still eligible for earlier weeks. Next, add the profit values with deadline x-1 to the priority queue, take the maximum again, and repeat until the week reaches 0.
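The priority-queue idea can be sketched roughly as follows. This is Python rather than the C++ asked for, just to show the logic; the function name max_profit is made up, and input parsing is left out. Deadlines are clamped to the number of orders, on the assumption that n orders never need more than n weeks.

```python
import heapq


def max_profit(orders):
    """orders: list of (order_id, deadline_in_weeks, profit)."""
    n = len(orders)
    profits_by_deadline = {}
    for _, deadline, profit in orders:
        # a deadline later than week n behaves like a deadline of week n
        week = min(deadline, n)
        profits_by_deadline.setdefault(week, []).append(profit)

    heap = []  # max-heap emulated by pushing negated profits
    total = 0
    for week in range(n, 0, -1):
        # orders due in this week become eligible from here backwards
        for profit in profits_by_deadline.get(week, []):
            heapq.heappush(heap, -profit)
        if heap:
            # build the most profitable eligible chair in this week
            total += -heapq.heappop(heap)
    return total


case1 = [(1, 3, 40), (2, 1, 35), (3, 1, 30), (4, 3, 25),
         (5, 1, 20), (6, 3, 15), (7, 2, 10)]
case2 = [(3054, 2, 30), (4099, 1, 35), (3059, 2, 25), (2098, 1, 40)]
results = [max_profit(case1), max_profit(case2)]
```

For the two test cases in the question, results is [100, 70]. A C++ version would follow the same structure with std::priority_queue.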

PowerBI and filtered sum calculation

I need to build a report on the relationship between sick leave (days) and man-years. The data is at monthly level, covers four years, and looks like this (there are also separate columns for year and business unit):
Month Sick leaves (days) Man-years
January 35 1,5
February 0 1,63
March 87 1,63
April 60 2,4
May 44 2,6
June 0 1,8
July 0 1,4
August 51 1,7
September 22 1,6
October 64 1,9
November 70 2,2
December 55 2
It has to be possible for the user to filter year, month, as well as business unit and get information about sick leave days during the filtered time period (and in selected business unit) compared to the total sum of man-years in the same period (and unit). Calculated from the test data above, the desired result should be 488/22.36 = 21.82
However, I have not managed to do what I want. The main problem is that the calculation takes into account only the months with nonzero sick leave days and ignores the man-years of the months with zero sick leave days (in the example data: February, June, July). I have tried several alternative functions (all, allselected, filter…), but the results are still wrong. Any information about a better solution would be highly appreciated.
It sounds like this has to do with the way DAX handles blanks (https://www.sqlbi.com/articles/blank-handling-in-dax/). Your context is probably filtering out the rows with blank values for "Sick-days". How to resolve this depends on how your data are structured, but you could try using variables to change your filter context or use "IF ( ISBLANK ( ... ) )" to make sure you're counting the blank rows.
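To make the symptom concrete, here is a small Python sketch (not DAX) using the sample data from the question. It compares the intended ratio, where every month's man-years count, with the ratio obtained when months with zero sick leave drop out of the denominator. The helper name ratio is made up for the example.

```python
# (month, sick leave days, man-years) from the question's sample
data = [
    ("January", 35, 1.5), ("February", 0, 1.63), ("March", 87, 1.63),
    ("April", 60, 2.4), ("May", 44, 2.6), ("June", 0, 1.8),
    ("July", 0, 1.4), ("August", 51, 1.7), ("September", 22, 1.6),
    ("October", 64, 1.9), ("November", 70, 2.2), ("December", 55, 2.0),
]


def ratio(rows):
    """Total sick leave days divided by total man-years."""
    days = sum(d for _, d, _ in rows)
    man_years = sum(m for _, _, m in rows)
    return days / man_years


# intended: all months contribute man-years
all_months = ratio(data)
# the reported problem: months with zero sick leave are dropped
nonzero_only = ratio([row for row in data if row[1] > 0])
```

Here all_months is about 21.82, the desired result, while nonzero_only is about 27.84 because the man-years for February, June and July are missing from the denominator. If the measure returns the larger figure, the zero or blank months are being filtered out before the man-years are summed.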

How to write the best code for data aggregation?

I have the following dataset (individual level data):
pid year state income
1 2000 il 100
2 2000 ms 200
3 2000 al 30
4 2000 dc 400
5 2000 ri 205
1 2001 il 120
2 2001 ms 230
3 2001 al 50
4 2001 dc 400
5 2001 ri 235
.........etc.......
I need to estimate average income for each state in each year and create a new dataset that would look like this:
state year average_income
ar 2000 150
ar 2001 200
ar 2002 250
il 2000 150
il 2001 160
il 2002 160
...........etc...............
I already have code that runs perfectly fine (it uses two loops). However, I would like to know whether there is a better way in Stata, something like an SQL-style query.
This is shorter code than any suggested so far:
collapse average_income=income, by(state year)
This shouldn't need 2 loops, or any for that matter. There are in fact more efficient ways to do this. When you are repeating an operation on many groups, the bysort command is useful:
bysort year state: egen average_income = mean(income)
You also don't have to create a new dataset, you can just prune this one and save it. Start by only keeping the variables you want (state, year and average_income) and get rid of duplicates:
keep state year average_income
duplicates drop
save "mynewdataset.dta"
You have the SQL tag on the question. This is a basic aggregation query in SQL:
select state, year, avg(income) as average_income
from t
group by state, year;
How to put this in a table depends on your database. One of the following typically works:
create table NewTable as
select state, year, avg(income) as average_income
from t
group by state, year;
Or:
select state, year, avg(income) as average_income
into NewTable
from t
group by state, year;
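All of the answers above do the same thing: group by (state, year) and take the mean of income. For comparison, a minimal Python sketch of that grouping step is shown here. The function name average_income is made up, and the rows for pid 6 are hypothetical, added only so that one group contains more than one person.

```python
from collections import defaultdict

# (pid, year, state, income); pid 6 is a hypothetical extra person
rows = [
    (1, 2000, "il", 100), (2, 2000, "ms", 200), (3, 2000, "al", 30),
    (4, 2000, "dc", 400), (5, 2000, "ri", 205),
    (1, 2001, "il", 120), (2, 2001, "ms", 230), (3, 2001, "al", 50),
    (4, 2001, "dc", 400), (5, 2001, "ri", 235),
    (6, 2000, "il", 200), (6, 2001, "il", 200),
]


def average_income(rows):
    """Return {(state, year): mean income}, ordered by state then year."""
    totals = defaultdict(lambda: [0, 0])  # (state, year) -> [sum, count]
    for _, year, state, income in rows:
        entry = totals[(state, year)]
        entry[0] += income
        entry[1] += 1
    return {key: s / c for key, (s, c) in sorted(totals.items())}


averages = average_income(rows)
```

With the hypothetical extra rows, averages[("il", 2000)] is 150 and averages[("il", 2001)] is 160, and the keys come out sorted by state and then year, like the target dataset.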