Win percentage rolling average in Stata

I'm currently producing some work on a win percentage rolling average for international football managers, which I aim to use in some duration modelling in the future. I'm unsure how to produce this rolling average in Stata so that it takes account of the dummies for win, draw or loss and resets when a manager leaves their job. I've also produced a 'result' variable, a categorical variable capturing these three outcomes.
E.g. for the first three observations in my dataset, the first manager wins his first two games and loses his third, after which he leaves his position. Numerically, he would have a 100% win percentage for the first and second observations, followed by 66.7% for the third. The win percentage would then have to reset for the new manager. I've coded manager ids accordingly, if that helps. How can I code this rolling average properly, rather than working it out on a calculator each time?

Suppose you have data like this with a win or lose dummy:
win manager_id game_num
1 1 1
1 1 2
0 1 3
1 2 1
1 2 2
1 2 3
0 3 1
0 3 2
1 3 3
You can use something like this:
bysort manager_id (game_num): gen pct = sum(win)  // running count of wins, resetting per manager
replace pct = 100*pct/game_num                    // divide by games played so far
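
If you start from the three-way result variable rather than a ready-made dummy, a minimal self-contained sketch (assuming result == 1 codes a win, which the question doesn't state) would be:
clear
input result manager_id game_num
1 1 1
1 1 2
0 1 3
end
gen byte win = (result == 1)                      // collapse the categorical to a win dummy
bysort manager_id (game_num): gen pct = sum(win)  // running wins, resetting per manager
replace pct = 100*pct/game_num
list, sepby(manager_id)
For the first manager this lists pct as 100, 100 and roughly 66.7, matching the figures worked out by hand above.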

Related

I'm trying to filter a matrix visual based on the number of occurrences

I have a table full of inter-warehouse transfers that I'm trying to analyze. It looks something like this:
Product  Quantity  From Warehouse  To Warehouse  Date
Item 1   1         South           North         1/1/22
Item 1   1         South           North         4/1/22
Item 2   5         North           South         3/1/22
Item 3   6         South           North         2/1/22
Item 3   2         South           North         6/1/22
Item 3   2         North           South         5/1/22
I'm trying to see which items we regularly transfer between warehouses, so I have a simple
Number of Transfers = COUNTROWS ( Transfers )  -- table name assumed; COUNTROWS needs a table argument
measure that is effectively returning the number of transfers for each product as another column in my visual. Item 1 is showing "2", Item 2 is showing "1", etc.
I don't want to see intermittent transfers, so I tried filtering the visual to show all products with a COUNTROWS result greater than or equal to a certain number, but when I do, nothing comes up. I've also tried using a calculated column and ran into the same issue.
I can do the Top N based on the measure, but that's pretty tedious and feels sloppy. What I'd like is to filter out anything below a given number of transfers. I'm still rather new to all of this and would like to avoid building bad habits if possible.
Does anyone have a better idea how I can go about this? Thanks in advance!
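One pattern that may help (a sketch, not a verified fix; the table name Transfers and the threshold of 3 are assumptions) is to have the measure return blank below the cutoff, since a visual drops rows whose measures are all blank:
Number of Transfers (3+) =
VAR n = COUNTROWS ( Transfers )
RETURN IF ( n >= 3, n )  -- IF with no else-branch returns BLANK(), removing the row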

Moving average of cumulative points

Let's say I have a dataset with teams' scores over many matches, aggregated at a daily level. How could I compute a moving average over the last 10 matches?
Important: some teams may have played more than once per day. I have something like this:
Date       Team    Score  Match id
01Jan2021  Team A  3      1
01Jan2021  Team A  2      2
02Jan2021  Team A  4      3
02Jan2021  Team B  4      3
etc. Note that match_id may be repeated, since Team A and Team B may have competed against each other.
Basically, I need the moving average of each team's scores over the last 10 matches it featured in (or, if it has played fewer than 10, the average over all of its matches so far).
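
The question doesn't name a tool, but in keeping with the Stata thread above, one sketch (the variable names seq, runsum and ma10 are made up) uses a running sum to build the trailing 10-match mean:
sort team date match_id                     // order each team's matches; same-day games broken by match id
by team: gen seq = _n                       // per-team match sequence
by team: gen double runsum = sum(score)     // running total of scores
by team: gen double ma10 = (runsum - cond(seq > 10, runsum[_n-10], 0)) / min(seq, 10)
Subtracting the running sum from 10 matches back leaves the sum of the last 10 scores, and min(seq, 10) handles teams with fewer than 10 matches by averaging over all of them.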

Calculate the frequency of duplicates using table calculations in Looker

I have an explore like the following -
Timestamp Rate Count
July 1 $2.00 15
July 2 $2.00 12
July 3 $3.00 20
July 4 $3.00 25
July 5 $2.00 10
I want to get the below results -
Rate Number of days Count
$2.00 3 37
$3.00 2 45
How can I calculate the Number of days column in the table calculation? I don't want the timestamp to be included in the final table.
First of all, is Rate a dimension? If so, and you have LookML access, you could create a "Count Days" measure that's just a simple count, and then return Rate, Count Days, and Count. That would be really simple.
If you can't do that, this is hard to do with just a table calculation, since what you're asking for is a change to the grouping of the data. Generally, that's something that's only possible in SQL or LookML, where you can actually alter the grouping and aggregation of the data.
With table calculations, you can operate on the data that's been returned by the query, but you can't change its grouping or aggregation. So the issue becomes that it's quite difficult to take 3 rows and then use a table calculation to represent them as 1 row.
I'd recommend taking this to LookML or SQL if you have developer access or can ask someone who does. If you can't, then I'd suggest you look at this thread: https://discourse.looker.com/t/creating-a-window-function-inside-a-table-calculation-custom-measure/16973 which explains how to do these kinds of functions in table calculations. It's a bit complex, though.
Once you've done the calculation, you'd want to use the Hide No's from Visualization feature to remove the rows you aren't interested in.
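For reference, the measure route might look something like this in LookML (a sketch; the date field name is an assumption):
measure: count_days {
  type: count_distinct
  sql: ${timestamp_date} ;;  # one row per day here, so distinct dates = number of days
}
You'd then select Rate, Count Days, and Count in the explore and get one row per rate.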

Trying to group a tree exported from command prompt

In an Excel macro, I am trying to get numbers to group as follows.
1
1
2
2
5
5
5
1
1
7
7
7
7
1
1
My data is in the format above. I need the numbers that are next to each other to be grouped.
For example:
1 Grouped
2 Grouped
5 Grouped
1 Grouped
7 Grouped
1 Grouped
Can anyone help?
Grouping data in Excel has many straightforward options. Grouping in Excel is available:
based on the "outline" feature
based on the "grouping" feature
based on the "pivot table" feature
See also
MSFT Excel Support -- Pivot tables
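If it needs to happen from a macro, as the question suggests, a minimal VBA sketch of the outline approach (it assumes the numbers sit in column A starting at row 1) could be:
Sub GroupAdjacentDuplicates()
    ' Outline-group each run of consecutive equal values in column A
    Dim lastRow As Long, startRow As Long, r As Long
    lastRow = Cells(Rows.Count, "A").End(xlUp).Row
    startRow = 1
    For r = 2 To lastRow + 1
        ' close the current run when the value changes or the data ends
        If r > lastRow Or Cells(r, 1).Value <> Cells(startRow, 1).Value Then
            If r - 1 > startRow Then Rows(startRow & ":" & r - 1).Group
            startRow = r
        End If
    Next r
End Sub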

Randomly Assign Observations into either Test or Control for a SAS dataset

I have a large SAS dataset and I want to randomly assign the observations to different test and control groups.
20% of observations would be Control
5% would be Test1
75% would be Test2
Basically,
obs
1
2
3
4
5
would become
obs cell
1 control
2 test2
3 test2
4 test1
5 test2
How would I do that?
Thanks
PROC SURVEYSELECT is the standard way of doing this. However, SURVEYSELECT doesn't allow picking three groups at once.
You can either do this in the data step, or use SURVEYSELECT twice: once to pick the first group (20%), then pick the second group from the unselected records (75% of the total out of the remaining 80%, i.e. a 93.75% sample), leaving the still-unselected records as the third group.
In the data step this isn't terribly difficult: you can assign a random value to each observation, sort the data by it, then take the first 20% of records as control, the next 5% as test1, and the last 75% as test2; or you can use k/n sampling with some modifications for a third group.
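A minimal sketch of that data-step approach (the dataset names have and want are assumptions):
/* assign a uniform random number to each observation */
data temp;
    set have;
    call streaminit(12345);   /* fixed seed so the assignment is reproducible */
    u = rand("uniform");
run;

/* sort by the random value, then cut at 20% and 25% of N */
proc sort data=temp;
    by u;
run;

data want;
    set temp nobs=n;
    length cell $7;
    if _n_ <= 0.20*n then cell = "control";
    else if _n_ <= 0.25*n then cell = "test1";  /* next 5% */
    else cell = "test2";                        /* remaining 75% */
    drop u;
run;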