Let's say I have a dataset with teams' scores over many matches, aggregated at a daily level. How could I compute a moving average of the last 10 matches?
Important: some teams may have played more than once per day. I have something like this:
Date       Team    Score  Match id
01Jan2021  Team A  3      1
01Jan2021  Team A  2      2
02Jan2021  Team A  4      3
02Jan2021  Team B  4      3
etc. Note that Match id may be repeated, since Teams A and B may have competed against each other in the same match.
Basically, I need to get the moving average of scores of each given team, considering the last 10 matches it was featured in (if it played less than 10, then the moving average of the total number of matches).
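The question doesn't name a tool, so here's one way to sketch this in pandas, assuming the sample above lives in a DataFrame with those column names. `min_periods=1` handles teams with fewer than 10 matches by averaging whatever they have so far:

```python
import pandas as pd

# Sample data in the shape described above (Date, Team, Score, Match id).
df = pd.DataFrame({
    "Date": ["01Jan2021", "01Jan2021", "02Jan2021", "02Jan2021"],
    "Team": ["Team A", "Team A", "Team A", "Team B"],
    "Score": [3, 2, 4, 4],
    "Match id": [1, 2, 3, 3],
})

# Sort so each team's matches are in chronological order, then take a
# rolling mean over the last 10 matches per team; min_periods=1 falls
# back to the average of however many matches the team has played.
df = df.sort_values(["Team", "Date", "Match id"])
df["moving_avg"] = (
    df.groupby("Team")["Score"]
      .transform(lambda s: s.rolling(window=10, min_periods=1).mean())
)
print(df)
```

With the four sample rows, Team A's moving average runs 3.0, 2.5, 3.0 and Team B's single match gives 4.0.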
I'd like to show the average unit cost in each year for a series of items I'm purchasing. I'll be purchasing 13 items over 10 years. In some years, I purchase multiple items, in others, I purchase only one.
My Measure is Sum(Cost)/DistinctCount(Items). In all cases, this results in a count of 13 items, rather than the number of items in a given year. For reference, I'm using this in a line chart, where years are along the x-axis, so I (wrongly) assumed that the Year context would apply. Any suggestions?
For example, in one year, I'm purchasing 5 items for $350, which should result in an average unit cost of $70, but instead returns $26.92, which is 350/13.
If the year context doesn't apply, then it sounds like an issue with your data model. Have a look at whether there's a relationship from your Date table to your Fact table (the one with the item counts).
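As a quick sanity check of the arithmetic you're expecting once the relationship works, here's a hedged pandas sketch outside Power BI (the data is invented to match the $350 / 5-item example):

```python
import pandas as pd

# Hypothetical purchases: 5 items in 2020 costing $350 in total,
# so the per-year average unit cost should be $70, not total/13.
purchases = pd.DataFrame({
    "Year": [2020] * 5 + [2021],
    "Item": ["A", "B", "C", "D", "E", "F"],
    "Cost": [50, 60, 70, 80, 90, 100],
})

# Sum(Cost) / DistinctCount(Item), evaluated within each year:
g = purchases.groupby("Year")
avg_unit_cost = g["Cost"].sum() / g["Item"].nunique()
print(avg_unit_cost)
```

With a working Date-to-Fact relationship, your measure should produce exactly this per-year split: 350/5 = 70 for 2020, rather than 350/13 for every year.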
I have an explore like the following -
Timestamp  Rate   Count
July 1     $2.00  15
July 2     $2.00  12
July 3     $3.00  20
July 4     $3.00  25
July 5     $2.00  10
I want to get the below results -
Rate   Number of days  Count
$2.00  3               37
$3.00  2               45
How can I calculate the Number of days column in the table calculation? I don't want the timestamp to be included in the final table.
First of all: is Rate a dimension? If so, and you have LookML access, you could create a "Count Days" measure that's just a simple count, and then return Rate, Count Days, and Count. That would be really simple.
If you can't do that, this is hard to do with just a table calculation, since what you're asking for is to change the grouping of the data. Generally, that's something that's only possible in SQL or LookML, where you can actually alter the grouping and aggregation of the data.
With table calculations, you can perform operations on the data that's been returned by the query, but you can't change its grouping or aggregation. So the issue becomes that it's quite difficult to take 3 rows and use a table calculation to represent them as 1 row.
I'd recommend taking this to the LookML or SQL if you have developer access or can ask someone who does. If you can't do that, then I'd suggest you look at this thread: https://discourse.looker.com/t/creating-a-window-function-inside-a-table-calculation-custom-measure/16973 which explains how to do these kinds of functions in table calculations. It's a bit complex, though.
Once you've done the calculation, you'd want to use the Hide No's from Visualization feature to remove the rows you aren't interested in.
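For reference, the regrouping that SQL or LookML would perform can be sketched outside Looker. Here's what it looks like in pandas, using a hypothetical frame mirroring the explore's output:

```python
import pandas as pd

# The explore's result set, as described in the question.
rows = pd.DataFrame({
    "Timestamp": ["July 1", "July 2", "July 3", "July 4", "July 5"],
    "Rate": [2.00, 2.00, 3.00, 3.00, 2.00],
    "Count": [15, 12, 20, 25, 10],
})

# Regroup by Rate: count the distinct days and sum the counts.
summary = rows.groupby("Rate").agg(
    num_days=("Timestamp", "nunique"),
    total_count=("Count", "sum"),
).reset_index()
print(summary)
```

This collapses the five daily rows to two rate rows ($2.00 over 3 days with 37, $3.00 over 2 days with 45), which is exactly the grouping change a table calculation can't do on its own.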
How can I rank based on a field where the values are tied? There is some more ranking applied here, and this scenario needs to be addressed as well, since I cannot simply rank again by the Sales field. Instead, I need to say:
If Unit is the same across the list of territories, rank based on Sales.
Example:
Terr    Sales  Unit      Should look like:  Terr    Sales  Unit
------  -----  ----                         ------  -----  ----
Boston  1      5                            Maine   10     5
Maine   10     5                            Boston  1      5
Often a mathematical approach works well for this. First, without wanting to patronise, it's possible to use a discrete (blue) measure to sort data. Place the sorting pill to the far left on the Rows and the table will sort according to this pill.
Ok, so the formula. Without knowing how large the Sales figure can go, you want to create a calculation that gives the highest value to the row you want to appear at the top.
For example, multiply Unit by 1,000,000 and add Sales. Just make sure the Units are multiplied by a number large enough to make Sales inconsequential.
This field may work, depending how large the Sales figure can go:
[SortField] = (SUM([Unit])*1000000 + SUM([Sales])) * -1
Put the field to Rows, convert to Discrete, then place to the far left. If the sorting is correct hide the field header.
It multiplies by -1 to sort descending.
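The same composite-key trick can be sketched in plain Python to see why Maine ends up above Boston (toy data from the example):

```python
# Multiply Unit by a number large enough that Sales only breaks ties,
# then negate so an ascending sort puts the biggest key first.
rows = [
    {"Terr": "Boston", "Sales": 1, "Unit": 5},
    {"Terr": "Maine", "Sales": 10, "Unit": 5},
]

def sort_key(r):
    # Assumes Sales never reaches 1,000,000, or it would overtake Unit.
    return (r["Unit"] * 1_000_000 + r["Sales"]) * -1

ordered = sorted(rows, key=sort_key)
print([r["Terr"] for r in ordered])
```

Both territories have Unit = 5, so the keys are -5,000,010 (Maine) and -5,000,001 (Boston); Maine's key is smaller, so it sorts first, i.e. Sales breaks the tie descending.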
I'm trying to apply a TOPN() visual filter to a Power BI sheet based on an Average Loan Amount measure. I want to see the top 5 employees with the highest average loan amount, ignoring employees who have disbursed 4 or fewer loans.
The problem I'm running into is that I don't get 5 rows returned, even though I've selected the top 5. I have to adjust the "TOPN" parameter (in the visuals) to include more than 5, just to get 5 rows.
This seems to be because when I have both the TOP5 average AND the loan count > 4 filters working, neither updates the other; that is, I can find the top 5 rows based on the average parameter, but once I include the "loan count > 4" condition, a few of the top 5 disappear, and they're not replaced by the runners-up to the original 5.
In the past, when I placed a top 5 filter for average and nothing came up, it was because all the top 5 entries all had a loan count of under 5. Once I relaxed the "TOPN" condition to be "TOP 52," I got 5 entries visible.
Does anyone know why this happens & how to fix it so I always get 5 rows returned?
EDITED TO ADD: For an example of the data, please click here. Please note that any employee with a loan count of 4 or less should be filtered out. I created the filter in PowerBI because the data sets are dynamic, and so are the filter results.
The fundamental problem is that you're applying 2 filters to the same visualization:
You only want to include employees with a loan count of 5 or more
Of those, you want the 5 employees with the highest average loan amount
Power BI is applying both filters independently. So, it is taking the 5 employees with the highest average loan amount, and then removing 3 of them because their loan count is less than 5. I can imagine this is a common problem for people working with a Top N filter plus another filter.
One way to work around this (and I don't claim this is the only or even the best way), is to take into account the loan count before calculating the average.
For example, assuming you have the following two measures and the following data:
Loan Count = DISTINCTCOUNT(Employee[Loan Number])
Avg Loan Amt = AVERAGE(Employee[Loan Amount])
It's clear from the picture that Liz, Montgomery and Oscar are in the top 5 by average, but have only 3 loans each to their name.
Next, we can create a new measure that checks the Loan Count before calculating the average loan amount. If the loan count doesn't meet the threshold, you don't care about their average.
Filtered Avg Loan Amt = IF([Loan Count] < 5, BLANK(), [Avg Loan Amt])
This creates the following result. Notice that Liz, Montgomery & Oscar now all have no average calculated because they don't have enough loans.
Now, you don't necessarily have to display the Filtered Avg Loan Amt measure on your table, but you can now use that measure in your Top N visual filter and that, by itself, will filter your table to the top 5 employees with a high enough loan count.
Notice that in my filters, I only have 1 filter (on Filtered Avg Loan Amt). I don't also need to filter to a loan count of 5 or greater. This results in the following top 5 employees:
I hope this solves the problem you're having!
Unrelated sidenote: if you're using this threshold of 5 in a few places, I would recommend sourcing the number from an external source (including possibly a disconnected table) rather than hard-coding 5 in the measure itself. That way, if someone decides that 5 isn't the right threshold, you only have to update it one place, rather than hunting through all your measures looking for the number 5. There's an article here on using a disconnected table so that end-users can pick the threshold themselves (though it could definitely be overkill for your situation): https://powerpivotpro.com/2013/08/moving-averages-controlled-by-slicer/
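To make the single-filter logic above concrete outside Power BI, here's a small pandas sketch of the same idea (employee names and numbers are invented; only Liz, Montgomery, and Oscar fall under the threshold):

```python
import pandas as pd

# Hypothetical per-employee summary: loan count and average loan amount.
emp = pd.DataFrame({
    "Employee": ["Liz", "Montgomery", "Oscar", "Ann", "Bob",
                 "Cara", "Dev", "Eve", "Fay"],
    "Loan Count": [3, 3, 3, 7, 6, 9, 5, 8, 5],
    "Avg Loan Amt": [900, 880, 860, 500, 480, 460, 440, 420, 400],
})

# Mirror the Filtered Avg Loan Amt measure: blank out (NaN) anyone under
# the threshold, THEN take the top 5 by the filtered average.
emp["Filtered Avg"] = emp["Avg Loan Amt"].where(emp["Loan Count"] >= 5)
top5 = emp.dropna(subset=["Filtered Avg"]).nlargest(5, "Filtered Avg")
print(top5["Employee"].tolist())
```

Because the threshold is applied inside the measure before the Top N step, the result is always 5 qualifying employees (Ann through Eve here), rather than 5 minus however many high-average, low-count employees got knocked out afterwards.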
I'm currently producing some work on a win percentage rolling average for international football managers, as I aim to use this in some duration modelling in the future. I'm just a little unsure on how to produce this rolling average in Stata to take account of the dummies for win, draw or loss and when the manager leaves their job. I've also produced a 'result variable' merely as a category variable capturing these three outcomes.
E.g. for the first 3 observations in my dataset, I have the first manager who wins his first two games and loses his third; after this he then leaves his position. So numerically he would have a 100% win percentage for the first and second observations, followed by 66.6% for the third. Then the win percentage would have to reset for the new manager. I've coded managers' ids respectively, if this helps. I'm just wondering how to code this rolling average properly, as opposed to using a calculator each time?
Suppose you have data like this with a win or lose dummy:
win manager_id game_num
1 1 1
1 1 2
0 1 3
1 2 1
1 2 2
1 2 3
0 3 1
0 3 2
1 3 3
You can use something like this:
bysort manager_id (game_num): gen pct = sum(win)
replace pct = 100*pct/game_num
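For cross-checking the Stata result, the same running win percentage can be computed in pandas with the same toy data; the cumulative sum within `manager_id` resets for each new manager, just like `bysort` does:

```python
import pandas as pd

# Same toy data as the Stata example: win dummy, manager id, game number.
df = pd.DataFrame({
    "win":        [1, 1, 0, 1, 1, 1, 0, 0, 1],
    "manager_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "game_num":   [1, 2, 3, 1, 2, 3, 1, 2, 3],
})

# Cumulative wins within each manager, divided by games played so far.
df = df.sort_values(["manager_id", "game_num"])
df["pct"] = 100 * df.groupby("manager_id")["win"].cumsum() / df["game_num"]
print(df["pct"].tolist())
```

Manager 1 comes out as 100, 100, 66.67, matching the worked example in the question, and the percentage restarts at each new manager.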