Say I have the table
[3 0 2 5 8 1
2 9 8 3 1 2
9 8 3 2 3 1]
All I want is a function to yield the column
[19
25
26]
without having to type out literally every single column name. My actual data matrix is 30 columns long.
Is there a way to do this that I'm missing? Or is DAX just completely incapable of doing this? I've Googled every single possible combination of the words "sum", "row", "-column", "PowerBI", and "DAX" and gotten absolutely nothing useful. Why in the world would SUM and SUMX automatically assume that the user would always in every single instance ONLY want to sum along dimension 1?
This is not easily done in DAX, as far as I know. Your only options are to spell out all the columns, or drop that requirement and follow the instructions in my earlier answer on how to do this with the Query Editor (i.e. PowerQuery).
Your "why" questions are possibly just an expression of (quite understandable) frustration, but I'll try to answer them nonetheless: I think the basic idea of tabular reporting is that you aggregate over rows, not columns, and nearly all operations are centered around that. In PowerBI (and many other reporting tools) you need to (un)pivot your data first to do the type of query you're after, which comes full circle to the above-linked solution.
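As a rough illustration of the unpivot-then-aggregate idea (not PowerQuery or DAX code), here is a minimal Python sketch using the table from the question. The column names c1..c6 and the row_id key are hypothetical; the original table has no names.

```python
# Sketch only: plain Python standing in for "unpivot, then aggregate by row".
# row_id and c1..c6 are made-up names for the example table in the question.
table = [
    {"row_id": 1, "c1": 3, "c2": 0, "c3": 2, "c4": 5, "c5": 8, "c6": 1},
    {"row_id": 2, "c1": 2, "c2": 9, "c3": 8, "c4": 3, "c5": 1, "c6": 2},
    {"row_id": 3, "c1": 9, "c2": 8, "c3": 3, "c4": 2, "c5": 3, "c6": 1},
]


def unpivot(rows, key):
    """Turn each wide row into (key, attribute, value) triples."""
    return [
        (row[key], name, value)
        for row in rows
        for name, value in row.items()
        if name != key
    ]


def sum_by_key(triples):
    """Aggregate the long table down its rows, grouped by the key."""
    totals = {}
    for key, _name, value in triples:
        totals[key] = totals.get(key, 0) + value
    return totals


long_table = unpivot(table, "row_id")
row_sums = sum_by_key(long_table)
print(row_sums)  # {1: 19, 2: 25, 3: 26}
```

Once the data is in the long shape, no column names need to be typed out: the sum is an ordinary aggregation over rows, grouped by the row key.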
Is there a way to dynamically display/hide columns based on JINJA conditionals or anything similar?
I have a SQL Editor query that I converted to a dataset. It is supposed to compute some SUMs and output, for example, revenue per month. However, I also have a date filter on top of it. If I want to track 10 months of results by having one column per month, and the filter covers only the last 2 months, the other columns come out empty.
If necessary, or if this is unclear, I can attach images to illustrate what I mean.
I have the following measure:
test = SWITCH(TRUE(),
MAX(test[month])>=9&&MAX(test[month])<=12,"fall",
MAX(test[month])>=1&&MAX(test[month])<=3,"winter",
MAX(test[month])>=4&&MAX(test[month])<=6,"spring",
MAX(test[month])>=7&&MAX(test[month])<=8,"summer")
Currently it looks at the month number (e.g. "3" for March) and outputs "winter". What I'd like it to output instead is a count per season, to show the distribution of the seasons in the dataset.
For example my desired output would be
Month Number    Count of occurrences of each season
fall            5
winter          7
spring          11
summer          2
I can't use a calculated column here either, as I will want to make this measure dynamic later on with a slicer. Can someone tell me if this is possible?
The issue here is that you want to define your categories within the measure. Measures are not dynamic without some filter-context.
Take this for example:
Notice that the output of the calculation is identical between seasons.
There is no filter context to help the measure discern between the different seasons because these seasons are not defined in the model. (At least, I don't know how to make this work)
SWITCH returns the result for the first condition that is true. So, with values like those in your sample, test the smallest threshold first, then the next larger one, and leave the largest for the end.
test =
SWITCH(
    TRUE(),
    MAX(test[month]) < 4, "winter",  -- month < 4
    MAX(test[month]) < 7, "spring",  -- 3 < month < 7
    MAX(test[month]) < 9, "summer",  -- 6 < month < 9
    "fall"                           -- 8 < month
)
-- Is it OK that you have 2 months in summer and 4 in fall?
Because SWITCH stops at the first true condition, each branch needs only an upper bound: writing MAX(test[month])<4,"winter" instead of MAX(test[month])>=1&&MAX(test[month])<=3,"winter" avoids one comparison per branch, which may make the measure slightly faster.
Next, use the result to find the month numbers and get the dates that belong to the selected months, then calculate over your table filtered by those dates. If this answer is not enough to solve the case, please give more information about your table, its columns, and what exactly you mean by 'Count of occurrences of each season': is an 'occurrence' a number of certain rows, or a count of unique values?
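To make the first-match logic and the per-season count concrete, here is a small Python sketch (not DAX). It assumes that an "occurrence" is simply one row per month number, which the question leaves open, and the sample month list is made up.

```python
from collections import Counter

# Upper bounds in ascending order, mirroring the SWITCH(TRUE(), ...) branches.
SEASON_BOUNDS = [(4, "winter"), (7, "spring"), (9, "summer")]


def season(month):
    """Return the first season whose upper bound exceeds the month number."""
    for upper, name in SEASON_BOUNDS:
        if month < upper:
            return name
    return "fall"  # months 9-12 reach the default branch


def season_counts(months):
    """Count rows per season (assumption: one occurrence per row)."""
    return Counter(season(m) for m in months)


sample_months = [1, 3, 4, 5, 6, 7, 9, 12, 12]  # hypothetical data
counts = season_counts(sample_months)
print(dict(counts))  # {'winter': 2, 'spring': 3, 'summer': 1, 'fall': 3}
```

Because the bounds are checked in ascending order, no branch needs a lower bound: any month that reaches the "spring" test has already failed the "winter" one.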
I have an explore like the following -
Timestamp Rate Count
July 1 $2.00 15
July 2 $2.00 12
July 3 $3.00 20
July 4 $3.00 25
July 5 $2.00 10
I want to get the below results -
Rate Number of days Count
$2.00 3 37
$3.00 2 45
How can I calculate the Number of days column in a table calculation? I don't want the timestamp to be included in the final table.
First of all, is rate a dimension? If so, and you have LookML access, you could create a "Count Days" measure that's just a simple count, and then return Rate, Count Days, and Count. That would be really simple.
If you can't do that, this is hard to do with just a table calculation, since what you're asking for is to change the grouping of the data. Generally, that's something that's only possible in SQL or LookML, where you can actually alter the grouping and aggregation of the data.
With Table Calculations, you can perform operations on the data that's been returned by the query, but you can't change its grouping or aggregation. So the issue becomes that it's quite difficult to take 3 rows and then use a table calculation to represent those as 1 row.
I'd recommend taking this to LookML or SQL if you have developer access or can ask someone who does. If you can't do that, then I'd suggest you look at this thread: https://discourse.looker.com/t/creating-a-window-function-inside-a-table-calculation-custom-measure/16973 which explains how to do these kinds of functions in table calculations. It's a bit complex, though.
Once you've done the calculation, you'd want to use the Hide No's from Visualization feature to remove the rows you aren't interested in.
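For reference, the regrouping being asked for (one row per rate, with a day count and a summed count) looks like this in a minimal Python sketch. This is neither LookML nor a table calculation; it only shows the aggregation that SQL or LookML would perform, using the rows from the question.

```python
# Rows from the question: (timestamp, rate, count).
rows = [
    ("July 1", "$2.00", 15),
    ("July 2", "$2.00", 12),
    ("July 3", "$3.00", 20),
    ("July 4", "$3.00", 25),
    ("July 5", "$2.00", 10),
]


def summarise_by_rate(rows):
    """Group by rate, counting distinct days and summing the counts."""
    groups = {}
    for timestamp, rate, count in rows:
        entry = groups.setdefault(rate, {"days": set(), "count": 0})
        entry["days"].add(timestamp)
        entry["count"] += count
    return {rate: (len(e["days"]), e["count"]) for rate, e in groups.items()}


summary = summarise_by_rate(rows)
print(summary)  # {'$2.00': (3, 37), '$3.00': (2, 45)}
```

The grouping key changes from timestamp to rate, which is exactly the step a table calculation cannot do on its own.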
I am trying to implement Sort Merge Bucket Join (a feature of Hive) in C++.
For starters, suppose I have 100 small files containing, say, 10 million rows of integers each, collectively representing a column, say, column 1 of a 1-billion-row table, and similarly another 100 files representing column 2 of another table.
I essentially want to sort both columns and write to separate file(s) only those values where the value in column 1 = the value in column 2.
The catch is that I do not want to hold more than 10 million integers of each column in RAM at a time.
I am comfortable merging the columns as long as they are sorted, but I do not know how I can sort a whole column without having the whole column in RAM at once.
I know this technique is implemented in Hive, but I am not well-versed in it and I cannot find any helpful article on the internet.
And this goes without saying, I want to perform this operation as efficiently as possible.
How can I go about this problem? Or how does Hive do it?
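One general way to meet the memory constraint is an external merge sort followed by a merge join. The Python sketch below illustrates that generic technique only; it does not describe Hive's internals, and it is not the C++ the question asks for. All function names are hypothetical, and for simplicity a matching value is emitted once even if it is duplicated in either column.

```python
import heapq
import os
import tempfile


def write_sorted_run(values, directory):
    """Sort one chunk in memory and write it out as a sorted run file."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as handle:
        for value in sorted(values):
            handle.write(f"{value}\n")
    return path


def read_run(path):
    """Stream integers from a run file one line at a time."""
    with open(path) as handle:
        for line in handle:
            yield int(line)


def sorted_stream(chunks, directory):
    """Sort each chunk separately, then lazily k-way merge the runs."""
    runs = [write_sorted_run(chunk, directory) for chunk in chunks]
    return heapq.merge(*(read_run(path) for path in runs))


def merge_join(left, right):
    """Yield values present in both sorted streams (duplicates collapsed)."""
    left, right = iter(left), iter(right)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a < b:
            a = next(left, None)
        elif a > b:
            b = next(right, None)
        else:
            match = a
            yield match
            while a == match:  # skip repeats of the matched value
                a = next(left, None)
            while b == match:
                b = next(right, None)


with tempfile.TemporaryDirectory() as workdir:
    # Each inner list stands in for one input file small enough to sort in RAM.
    column_1 = sorted_stream([[5, 1, 9], [3, 7, 1]], workdir)
    column_2 = sorted_stream([[7, 2], [9, 9, 4]], workdir)
    matches = list(merge_join(column_1, column_2))

print(matches)  # [7, 9]
```

Only one chunk is sorted in memory at a time, and the merge phase keeps one pending value per run, so peak memory is bounded by the chunk size rather than the column size.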
I have a couple of different tables in my report. For demonstration purposes, let's say that I have one data source that is Actual Invoice amounts and another table that is Forecasted amounts. Each table has several dimensions that are the same between them, let's say Country, Region, Product Classification and Product.
What I want is to be able to display a table/matrix that pulls information from both of these data sources like this
Description                 Invoice  Forecast  vs Forecast
USA                             300       325          92%
  East                          150       175          86%
    Product Grouping 1          125       125         100%
      Product 1                  50        75          67%
      Product 2                  75        50         150%
    Product Grouping 3           25        50          50%
      Product 3                  25        50          50%
  West                          150       150         100%
    Product Grouping 1           75       100          75%
      Product 1                  25        50          50%
      Product 2                  50        50         100%
    Product Grouping 3           75        50         150%
      Product 3                  75        50         150%
I have not been able to figure out a way to combine the information from the multiple data sources into a single matrix table, so any help would be appreciated. The one thing that I did find was that somebody hard-coded the structure of the rows into a separate data source and then used DAX expressions to pull the pieces of information into the columns, but I don't like this solution because the structure of the rows is not constant.
What you're asking about is a common part of the star schema: combining facts from different fact tables together into a single visual or report.
What Not To Do (That You Might Be Tempted To)
What you don't want to do is combine the 2 fact tables into a single table in your Power BI data model. That's a lot of work and there's absolutely no need, especially since there are likely dimensions that the 2 fact tables do not have in common (e.g. actual amounts might be associated with a customer dimension, but forecast amounts wouldn't be).
What you also don't want to do is relate the 2 fact tables to each other in any way. Again, that's a lot of work. (Especially since there's no natural way to relate them at the row level.)
What To Do
Generally, you handle 2 fact tables the same way you handle a single fact table. First, you have your dimensions (country, region, classification, product, date, customer). Then you load your fact tables, and join them to the dimensions. You do not join your fact tables to each other. You then create measures (i.e. DAX expressions).
When you want to combine measures from the two facts together in a single matrix, you only use rows/columns that are meaningful to both fact tables. For example, actual amounts might be associated with a customer, but forecast amounts aren't. So you can't include customer information in the matrix. Another possibility is that actual amounts are recorded each day, whereas forecasts were done for the whole month. In this situation, you could put month in your matrix (since that's meaningful to both), but you wouldn't want to use date because Power BI wouldn't know how to divide up forecasts to individual dates.
As long as you're only using dimensions & attributes that are meaningful to both fact tables, you can easily create a matrix as you envision above. Simply drag on the attributes you want, then add the measures (i.e. DAX expressions).
The Invoice & Forecast columns would both be measures. The two measures from different fact tables can be combined into a 3rd measure for the vs. Forecast measure. Everything will work as long as you're just using dimensions/attributes that mean something to both fact tables.
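The idea of two measures over separate fact tables, combined only through shared dimensions, can be sketched in Python as follows. This is an analogy rather than Power BI or DAX: the rows, field names and customer values are hypothetical, chosen to reproduce the East / Product Grouping 1 figures from the question.

```python
from collections import defaultdict

# Hypothetical fact rows. Invoices carry a customer attribute that forecasts
# lack, so customer is never used as a grouping dimension below.
invoices = [
    {"region": "East", "product": "Product 1", "customer": "A", "amount": 30},
    {"region": "East", "product": "Product 1", "customer": "B", "amount": 20},
    {"region": "East", "product": "Product 2", "customer": "A", "amount": 75},
]
forecasts = [
    {"region": "East", "product": "Product 1", "amount": 75},
    {"region": "East", "product": "Product 2", "amount": 50},
]


def total_by(rows, dims):
    """One 'measure': sum the amount of a fact table per shared dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return totals


def compare(dims):
    """Combine both measures into (invoice, forecast, invoice / forecast)."""
    invoice = total_by(invoices, dims)
    forecast = total_by(forecasts, dims)
    report = {}
    for key in sorted(set(invoice) | set(forecast)):
        inv, fc = invoice.get(key, 0), forecast.get(key, 0)
        report[key] = (inv, fc, inv / fc if fc else None)
    return report


by_product = compare(("region", "product"))
by_region = compare(("region",))
print(by_region)  # {('East',): (125, 125, 1.0)}
```

The two fact tables are never joined to each other; each is aggregated on its own and the results meet only at the shared dimension keys, which is why a dimension that exists in just one table cannot appear in the combined view.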
I don't see anything in your proposed pivot table that strikes me as problematic.
Other Situations
If you have a situation where forecasts are at a month level and actual is at a date level, then you may be wondering how you'd relate them both to the same date dimension. This situation is called having different granularities, and there's a good article here I'd recommend reading that has advice: https://www.daxpatterns.com/handling-different-granularities/. Indeed, there's a whole section on comparing budget with revenue that you might find useful.
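As a simple illustration of the granularity mismatch (and not necessarily the pattern the linked article recommends), the Python sketch below rolls daily actuals up to the month level before comparing them with monthly forecasts. The dates and amounts are made up.

```python
from collections import defaultdict
from datetime import date

# Hypothetical data: actuals are recorded per day, forecasts per month.
daily_actuals = [
    (date(2024, 1, 5), 40),
    (date(2024, 1, 20), 60),
    (date(2024, 2, 3), 30),
]
monthly_forecast = {(2024, 1): 120, (2024, 2): 30}


def actuals_by_month(rows):
    """Roll daily amounts up to (year, month), the coarser shared grain."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


def actual_vs_forecast(rows, forecast):
    """Ratio of actual to forecast per month, skipping zero forecasts."""
    actual = actuals_by_month(rows)
    return {
        month: actual.get(month, 0) / target
        for month, target in forecast.items()
        if target
    }


monthly = actuals_by_month(daily_actuals)
ratios = actual_vs_forecast(daily_actuals, monthly_forecast)
print(monthly)  # {(2024, 1): 100, (2024, 2): 30}
```

The comparison is made at the month level because that is the finest grain both tables share; going the other way would require deciding how to split a monthly forecast across days.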
Finally, you mention that someone hard-coded the structure of the rows and used DAX expressions to build everything. This does, admittedly, sound like overkill. The goal with Power BI is flexibility. Once you have your facts, measures & dimensions, you can combine them in any way that makes sense. Hard-coding the rows eliminates that flexibility, and is a good clue that something isn't right. (Another good clue that something isn't right is when DAX expressions seem really complicated for something that should be easy.)
I hope my answer helps. It's a general answer since your question is general. If you have specific questions about your specific situation, definitely post additional questions. (Sample data, a description of the model, the problem you're seeing, and what you want to see is helpful to get a good answer.)
If you're brand new to Power BI, data models, and the star schema, Alberto Ferrari and Marco Russo have an excellent book that I'd recommend reading to get a crash course: https://www.sqlbi.com/books/analyzing-data-with-microsoft-power-bi-and-power-pivot-for-excel/