Calculate the frequency of duplicates using table calculations in Looker

I have an explore like the following:
Timestamp  Rate   Count
July 1     $2.00  15
July 2     $2.00  12
July 3     $3.00  20
July 4     $3.00  25
July 5     $2.00  10
I want to get the results below:
Rate   Number of days  Count
$2.00  3               37
$3.00  2               45
How can I calculate the Number of days column in a table calculation? I don't want the timestamp to be included in the final table.

First of all, is Rate a dimension? If so, and you have LookML access, you could create a "Count Days" measure that's just a simple count, and then return Rate, Count Days, and Count. That would be really simple.
If you can't do that, this is hard to do with just a table calculation, since what you're asking for is to change the grouping of the data. Generally, that's something that's only possible in SQL or LookML, where you can actually alter the grouping and aggregation of the data.
With table calculations, you can perform operations on the data that's been returned by the query, but you can't change its grouping or aggregation. So the issue becomes that it's quite difficult to take 3 rows and then use a table calculation to represent those as 1 row.
I'd recommend taking this to the LookML or SQL if you have developer access or can ask someone who does. If you can't do that, then I'd suggest you look at this thread: https://discourse.looker.com/t/creating-a-window-function-inside-a-table-calculation-custom-measure/16973 which explains how to do these kinds of functions in table calculations. It's a bit complex, though.
Once you've done the calculation, you'd want to use the Hide No's from Visualization feature to remove the rows you aren't interested in.
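To make the regrouping concrete, here is a minimal sketch in plain Python (not LookML or a Looker table calculation) of what a count measure grouped by Rate would return for the sample rows in the question. The function name summarize_by_rate is made up for this illustration.

```python
from collections import defaultdict

# Rows as returned by the explore: (timestamp, rate, count).
rows = [
    ("July 1", 2.00, 15),
    ("July 2", 2.00, 12),
    ("July 3", 3.00, 20),
    ("July 4", 3.00, 25),
    ("July 5", 2.00, 10),
]

def summarize_by_rate(rows):
    """Group by rate and return {rate: (number_of_days, total_count)}."""
    days = defaultdict(set)    # distinct timestamps seen per rate
    totals = defaultdict(int)  # summed Count per rate
    for timestamp, rate, count in rows:
        days[rate].add(timestamp)
        totals[rate] += count
    return {rate: (len(days[rate]), totals[rate]) for rate in sorted(days)}

summary = summarize_by_rate(rows)
for rate, (n_days, total) in summary.items():
    print(f"${rate:.2f}  {n_days}  {total}")
```

The five input rows collapse to two output rows, which is exactly the change of grouping that a table calculation cannot perform on its own.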

Related

Power BI - How do I use calculated values from various tables and store them in a separate table

I'm trying to figure out a solution to my problem. Basically we get a monthly report with about 3000 records and there's a bunch of reporting that is done on that, and there are calculations based on various columns. e.g.
Date        Total usage  Recommended reduction  Product
01.01.2022  1000         500                    A
01.01.2022  1300         70                     B
01.01.2022  2000         900                    C
...
At the end of it Power BI kindly sums up the columns, which is great. What I am trying to do now is take the sums of these columns and store them in a summary table, something like the one below, so that I could use it for a time series visual:
Month     Sum Total Usage  Sum Recommended Reduction
January   59720            12040
February  81020            20580
...       ...              ...
I have no idea how to go about doing this. Is this the right way to go? Or is there a way to create a visual without having to create a summary table? I'm at a bit of a loss, so any suggestions would be really appreciated.
You don't need any DAX calculations for that. Simply pull your data onto the fields of a line chart visual, with the date on the axis and the two value columns as values. Note that you have to drill down from Year to Month to actually see the lines.

Filling in missing dates Redshift

I have a table that looks like this:
Account  Value  Last_Day_in_Month
ABC      7      2018-06-30
ABC      12     2018-06-30
ABC      3      2018-08-31
FGH      57     2019-01-31
FGH      13     2019-03-31
FGH      127    2019-03-31
For each account, I need to fill in the missing dates corresponding to the last day in each month such that the resulting table just fills in the value from the last month (you'll notice two additional rows)
Account  Value  Last_Day_in_Month
ABC      7      2018-06-30
ABC      12     2018-06-30
ABC      12     2018-07-31
ABC      3      2018-08-31
FGH      57     2019-01-31
FGH      57     2019-02-28
FGH      13     2019-03-31
FGH      127    2019-03-31
I have many accounts each of them with different start and stop times (Last_Day_in_Month) so I only need to fill in the missing months between the min and max months for each account. Because I may have multiple values corresponding to one single month end date per account, my current solution is to use a lead with a case statement that adds a single day and a date table that contains only the last day of each month and perform a cross join. But, I think it's messy and I'm sure there's a better way that I'm not aware of. Here is my current solution...
select
    *,
    lead(Last_Day_in_Month, 1) over (
        partition by Account
        order by Last_Day_in_Month
    ) as intermed2,
    case
        when intermed2 = Last_Day_in_Month
            then dateadd('day', 1, intermed2)
        else intermed2
    end as next_last_day
from table
cross join dates
where dates.date_actual >= table.Last_Day_in_Month
    and dates.date_actual < table.next_last_day
Any suggestions are appreciated.
What you are doing is fine for reasonable numbers of rows. One thing I'd recommend for clarity is changing from a cross join to a right join with an ON clause. The query planner should see right through what you have and plan an efficient query, so this is just a nit.
There are a number of other ways to do this, and you can find examples by searching for "gaps and islands" on Stack Overflow. The biggest feedback I have is about creating additional rows. What you are doing is making new rows for the missing months, which is fine for reasonably small tables because they don't get super large when you add rows. For example, if you have a table with 100 billion rows and an average gap size of 2, you will be creating a result with 300 billion rows. Making this much data will never be fast or efficient. So you say you have "many accounts": how many is many?
If the amount of data can fit in memory, or you are doing this operation only once in a while, then creating rows will work OK. If this is being done as part of ongoing queries and the data created will be large, then I'd rethink why you need to create data to get your queries to execute. In general Redshift stores very large sets of data, and multiplying out (cross joining) these rows by some other factor (dates) will result in very slow queries. If the intent is to pare this data down to some smaller result, you will want to find a way to create this result without making such a large intermediate dataset.
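As a rough illustration of the expected result and of how filling gaps grows the row count, here is a small sketch in plain Python rather than Redshift SQL. It is not the query from the question: it walks the rows in order and carries the last value forward. The helper names next_month_end and fill_missing_months are made up for this example, and the input is assumed to be sorted by account and date.

```python
import calendar
from datetime import date

def next_month_end(d):
    """Return the last day of the month that follows d."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, calendar.monthrange(year, month)[1])

def fill_missing_months(rows):
    """rows: (account, value, month_end) tuples sorted by account, then date.

    Returns the original rows plus one carried-forward row for every
    month-end missing between consecutive rows of the same account.
    """
    filled = []
    for i, (account, value, month_end) in enumerate(rows):
        filled.append((account, value, month_end))
        # Only fill towards the next row when it belongs to the same account.
        if i + 1 < len(rows) and rows[i + 1][0] == account:
            gap = next_month_end(month_end)
            while gap < rows[i + 1][2]:
                filled.append((account, value, gap))
                gap = next_month_end(gap)
    return filled

rows = [
    ("ABC", 7, date(2018, 6, 30)),
    ("ABC", 12, date(2018, 6, 30)),
    ("ABC", 3, date(2018, 8, 31)),
    ("FGH", 57, date(2019, 1, 31)),
    ("FGH", 13, date(2019, 3, 31)),
    ("FGH", 127, date(2019, 3, 31)),
]
filled = fill_missing_months(rows)
print(len(rows), "rows in,", len(filled), "rows out")
```

On the sample data this adds the two rows shown in the question (ABC for 2018-07-31 and FGH for 2019-02-28). The output always contains at least as many rows as the input, which is the growth the answer warns about for very large tables.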

Use a slicer on a calculated column

I need to implement the following formula in Power BI, one way or another:
Total Hours of Use of a Machine = Hours Function * Range of Functioning
where Hours Function is the hours of use of a certain machine (take it as given that this is a constant for each machine) and Range of Functioning is the difference between the final date of the evaluation and the initial date, measured in hours.
For example, I want to measure the Total Hours of Use of a Machine between 14/10/2019 and 15/10/2019. So the math is the following:
Assume: 2 machines
Hours Function machine A: 6
Hours Function machine B: 9
Range of Functioning = 15/10/2019 - 14/10/2019 = 24 hours
The output:
Total Hours of Use of a Machine A: 144
Total Hours of Use of a Machine B: 216
I need to do that in Power BI in such a way that when any user moves a date slicer, the Total Hours of Use of a Machine is recalculated.
I can't find any way to get the difference between the final date of the evaluation and the initial date and use it in DAX or in a new column.
You have to use measures if you want the value to be recalculated when you change the date with a slicer.
The first step is to make sure you can calculate the number of days selected by your slicer.
This seems easy, but if you use the FIRSTDATE function on the calendar table built into Power BI, you will never get what you expect.
The trick to get this number of days is to count the rows in your calendar table with the COUNTROWS function.
Once you have this number of days, you just have to multiply it by 24 (hours) and by the sum of your "Hours Function machine" (6 for A and 9 for B in your example).
(It's important to use SUM or another aggregate function such as AVERAGE, because if there are multiple values the measure returns an error: it needs a single value to multiply.)
The DAX formula looks like:
= COUNTROWS('Calendar') * 24 * SUM(Machine[Hours function])
You can then display this measure filtered by the Machine Name and a date slicer(based on your calendar table).
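The arithmetic behind the measure can be checked outside Power BI. The sketch below is plain Python, not DAX, and assumes the slicer selects a single calendar day (14/10/2019), so the calendar table contributes one row; the function name total_hours_of_use is made up for this illustration.

```python
from datetime import date

HOURS_PER_DAY = 24

def total_hours_of_use(selected_dates, hours_function_values):
    """Selected days x 24 hours x summed Hours Function for one machine."""
    return len(selected_dates) * HOURS_PER_DAY * sum(hours_function_values)

# Assumption: the date slicer leaves exactly one row in the calendar table.
selected = [date(2019, 10, 14)]
# Hours Function per machine, from the example in the question.
machines = {"A": [6], "B": [9]}

totals = {name: total_hours_of_use(selected, hours)
          for name, hours in machines.items()}
print(totals)  # {'A': 144, 'B': 216}
```

Selecting more days in the slicer increases the row count and scales both totals proportionally, which is the recalculation the question asks for.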

How to display data from different data source tables in a single table in Power BI

I have a couple of different tables in my report. For demonstration purposes, let's say that I have one data source that is Actual Invoice amounts and another table that is Forecasted amounts. Each table has several dimensions that are the same between them, let's say Country, Region, Product Classification and Product.
What I want is to be able to display a table/matrix that pulls information from both of these data sources like this
Description                 Invoice  Forecast  vs Forecast
USA                         300      325       92%
  East                      150      175       86%
    Product Grouping 1      125      125       100%
      Product 1             50       75        67%
      Product 2             75       50        150%
    Product Grouping 3      25       50        50%
      Product 3             25       50        50%
  West                      150      150       100%
    Product Grouping 1      75       100       75%
      Product 1             25       50        50%
      Product 2             50       50        100%
    Product Grouping 3      75       50        150%
      Product 3             75       50        150%
I have not been able to figure out a way to combine the information from the multiple data sources into a single matrix table, so any help would be appreciated. The one thing that I did find was that somebody hard-coded the structure of the rows into a separate data source and then used DAX expressions to pull the pieces of information into the columns, but I don't like this solution because the structure of the rows is not constant.
What you're asking about is a common part of the star schema: combining facts from different fact tables together into a single visual or report.
What Not To Do (That You Might Be Tempted To)
What you don't want to do is combine the 2 fact tables into a single table in your Power BI data model. That's a lot of work and there's absolutely no need, especially since there are likely dimensions that the 2 fact tables do not have in common (e.g. actual amounts might be associated with a customer dimension, but forecast amounts wouldn't be).
What you also don't want to do is relate the 2 fact tables to each other in any way. Again, that's a lot of work. (Especially since there's no natural way to relate them at the row level.)
What To Do
Generally, how you handle 2 fact tables is the same as you handle a single fact table. First, you have your dimensions (country, region, classification, product, date, customer). Then you load your fact tables, and join them to the dimensions. You do not join your fact tables to each other. You then create measures (i.e. DAX expressions).
When you want to combine measures from the two facts together in a single matrix, you only use rows/columns that are meaningful to both fact tables. For example, actual amounts might be associated with a customer, but forecast amounts aren't. So you can't include customer information in the matrix. Another possibility is that actual amounts are recorded each day, whereas forecasts were done for the whole month. In this situation, you could put month in your matrix (since that's meaningful to both), but you wouldn't want to use date because Power BI wouldn't know how to divide up forecasts to individual dates.
As long as you're only using dimensions & attributes that are meaningful to both fact tables, you can easily create a matrix as you envision above. Simply drag on the attributes you want, then add the measures (i.e. DAX expressions).
The Invoice & Forecast columns would both be measures. The two measures from different fact tables can be combined into a 3rd measure for the vs. Forecast measure. Everything will work as long as you're just using dimensions/attributes that mean something to both fact tables.
I don't see anything in your proposed pivot table that strikes me as problematic.
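The idea of keeping the two fact tables separate and combining them only through shared dimensions can be sketched outside Power BI. The following is plain Python, not DAX. The product-level rows are taken from the matrix in the question, and the helper names total_by and vs_forecast are made up for this illustration.

```python
from collections import defaultdict

# Two independent fact tables that share the dimensions (region, grouping, product).
invoices = [
    ("East", "Product Grouping 1", "Product 1", 50),
    ("East", "Product Grouping 1", "Product 2", 75),
    ("East", "Product Grouping 3", "Product 3", 25),
    ("West", "Product Grouping 1", "Product 1", 25),
    ("West", "Product Grouping 1", "Product 2", 50),
    ("West", "Product Grouping 3", "Product 3", 75),
]
forecasts = [
    ("East", "Product Grouping 1", "Product 1", 75),
    ("East", "Product Grouping 1", "Product 2", 50),
    ("East", "Product Grouping 3", "Product 3", 50),
    ("West", "Product Grouping 1", "Product 1", 50),
    ("West", "Product Grouping 1", "Product 2", 50),
    ("West", "Product Grouping 3", "Product 3", 50),
]

def total_by(fact_rows, key_index):
    """Sum the amount (last field) of one fact table per dimension value."""
    totals = defaultdict(int)
    for row in fact_rows:
        totals[row[key_index]] += row[-1]
    return dict(totals)

def vs_forecast(invoice_total, forecast_total):
    """Third measure built from the two others: invoice as a % of forecast."""
    return round(100 * invoice_total / forecast_total)

# Each "measure" is evaluated on its own fact table, then lined up by region.
invoice_by_region = total_by(invoices, 0)
forecast_by_region = total_by(forecasts, 0)
for region in sorted(invoice_by_region):
    inv, fc = invoice_by_region[region], forecast_by_region[region]
    print(region, inv, fc, f"{vs_forecast(inv, fc)}%")
```

The two tables are never joined row to row. They only meet at the level of a dimension that both understand, which is why the same approach works for country, product grouping, or product.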
Other Situations
If you have a situation where forecasts are at a month level and actual is at a date level, then you may be wondering how you'd relate them both to the same date dimension. This situation is called having different granularities, and there's a good article here I'd recommend reading that has advice: https://www.daxpatterns.com/handling-different-granularities/. Indeed, there's a whole section on comparing budget with revenue that you might find useful.
Finally, you mention that someone hard-coded the structure of the rows and used DAX expressions to build everything. This does, admittedly, sound like overkill. The goal with Power BI is flexibility. Once you have your facts, measures & dimensions, you can combine them in any way that makes sense. Hard-coding the rows eliminates that flexibility, and is a good clue that something isn't right. (Another good clue that something isn't right is when DAX expressions seem really complicated for something that should be easy)
I hope my answer helps. It's a general answer since your question is general. If you have specific questions about your specific situation, definitely post additional questions. (Sample data, a description of the model, the problem you're seeing, and what you want to see is helpful to get a good answer.)
If you're brand new to Power BI, data models, and the star schema, Alberto Ferrari and Marco Russo have an excellent book that I'd recommend reading to get a crash course: https://www.sqlbi.com/books/analyzing-data-with-microsoft-power-bi-and-power-pivot-for-excel/

Average Power BI Aggregates

I have entries that are uniquely identified by a variety of fields and that I pull in from Excel. Entries relate to the daily amount of work done, and people working on a specific area of a plant. Each entry has a work done field (measurement of the work done on that area), and a manpower count. The productivity per area is calculated by work done divided by manpower.
Date        Area     Work Done  Manpower  Productivity
2017/02/01  Pipe     50         25        2
2017/02/01  Valve    22         2         11
2017/02/01  Machine  54         2         22
I want to display the work done and manpower as bars in Power BI, and the average productivity per day as a line. The problem is that the real productivity for the day (total work done divided by total manpower) is not the average of the individual productivities per area. Thus, I want to be able to create a line that totals the work done and the manpower per day, divides one by the other to get the productivity, and then displays only the productivity.
How can I do this in Power BI?
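The difference between the two calculations can be seen in a few lines of plain Python (an illustration of the arithmetic only, not a Power BI solution). Per-area productivity is recomputed here as work done divided by manpower, and the function names are made up for this example.

```python
# (area, work_done, manpower) for one day, taken from the sample rows.
entries = [("Pipe", 50, 25), ("Valve", 22, 2), ("Machine", 54, 2)]

def daily_productivity(entries):
    """Ratio of sums: total work done divided by total manpower."""
    total_work = sum(work for _, work, _ in entries)
    total_manpower = sum(men for _, _, men in entries)
    return total_work / total_manpower

def mean_area_productivity(entries):
    """Average of ratios: mean of the per-area productivities."""
    ratios = [work / men for _, work, men in entries]
    return sum(ratios) / len(ratios)

print(round(daily_productivity(entries), 2))      # 126 / 29 -> 4.34
print(round(mean_area_productivity(entries), 2))  # (2 + 11 + 27) / 3 -> 13.33
```

The large Pipe crew dominates the ratio of sums but counts as only one of three values in the average of ratios, so the two numbers diverge sharply.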