I have a table of sales from multiple stores, with the sales value in dollars, the date, and the corresponding store.
In another table I have the store name and the expected sales amount for each store.
I want to create a column in the main (first) table that evaluates sales efficiency based on the other table.
In other words, if store B made 500 in sales today, I want to look up its target in the lookup table, divide the actual sales by that target to obtain the efficiency, and then graph the efficiency of each store.
Thanks.
I tried creating some measures and columns but got stuck on circular dependencies.
I expect to add one column to my main table containing an integer from 0 to 100 that shows the efficiency.
You can merge the two tables. In the Query Editor, go to Merge Queries > Merge Queries as New. Choose the matching column (StoreName) in both tables and merge them. You will get something like this (using just a few rows of your sample data):
StoreName ActualSaleAmount ExpectedAmount
a 500 3000
a 450 3000
b 370 3500
c 400 5000
Now you can add a calculated column with your efficiency:
StoreName ActualSaleAmount ExpectedAmount Efficiency
a 500 3000 500/3000
a 450 3000 450/3000
b 370 3500 370/3500
c 400 5000 400/5000
This would be:
Efficiency = [ActualSaleAmount] / [ExpectedAmount]
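As a rough illustration of the same lookup-then-divide logic outside Power BI, here is a minimal Python sketch. It reuses the sample rows above; the function name `efficiency_percent` and the rounding to a whole-number percentage are assumptions made for illustration, not part of the Power BI solution.

```python
# Expected sales per store (the lookup table).
expected = {"a": 3000, "b": 3500, "c": 5000}

# Actual sales rows (the main table): (store, amount).
sales = [("a", 500), ("a", 450), ("b", 370), ("c", 400)]


def efficiency_percent(store, amount, targets):
    """Return actual / expected as a whole-number percentage (hypothetical helper)."""
    target = targets.get(store)
    if not target:  # unknown store or zero target: no meaningful ratio
        return None
    return round(100 * amount / target)


# One output row per sale: store, actual, expected, efficiency in percent.
rows = [(s, amt, expected.get(s), efficiency_percent(s, amt, expected)) for s, amt in sales]
for row in rows:
    print(row)
```

Returning None for a missing or zero target is a design choice in this sketch; it avoids a division error when a store has no entry in the lookup table.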
I am struggling to get the top 25 unique customers when filtering a table by Alert_Id. The table has the columns shown below. The goal is to show the top 25 unique customers based on the highest value. An item can be repeated, but the customer name has to be unique. I have tried many different things, but nothing works as expected: because multiple customers have used multiple items, I get duplicate rows. The table has to be dynamic: whenever the user filters by Alert_Id (a single-selection filter), it should return the top unique customers associated with that Alert_Id. I have tried the approach below.
First, I created a calculated column to break ties on price, because many items share the same price:
max price = Table[PRICE] + RAND()
Then I created another column to get the max price for each customer name:
MAX column for table = CALCULATE(MAX('Table'[max price]), ALLEXCEPT(Table, Table[CUSTOMER_NAME]))
Then I created a calculated table using these columns:
SELECTCOLUMNS(
FILTER(Table, Table[max price]=Table[MAX column for table]), "Name" ,Table[CUSTOMER_NAME],"Item",Table[ITEM], "PRICE",Table[MAX column for table], "Alert_ID", Table[ID], "DATE", Table[REQ_DATE], "ITEM_COUNT", Table[PK])
However, this gives me all unique customers with their MAX value, and I get a blank table when I filter by Alert_ID, even though that Alert_ID has data, because its customers' rows are not the ones with the MAX value. Basically, it does not dynamically capture the max value for each customer name when the filter is applied. If multiple rows with the same customer name have exactly the same value, I would pick any one of them at random, regardless of which ITEM it is. I just want the top 25 unique customers for one Alert_ID.
Here is the sample data,
Here is expected output if I select Alert_ID = 123 from filter and it can be different when I select different Alert_ID.
FYI: I have tried TOPN with max price and even RANKX, but no luck. I always ended up with duplicate customers.
Any help or lead will be highly appreciated!
I was able to figure out how to get unique values. Here is the solution that worked for me.
First, I created a calculated column from my price column and the RAND function to break ties:
sum value = Table[PRICE] + RAND()
Then I created a measure that calculates the rank:
rank with table = RANKX(CALCULATETABLE(VALUES('Table'[ITEM]), ALLSELECTED('Table'[ITEM])),CALCULATE(SUM(Table[sum value])), ,DESC, Dense )
Then I applied a filter on the NAME column to get the top 25 based on the sum value calculated column. I also dragged the measure onto the Filters pane and applied the filter rank with table = 1.
That's how I got unique names, each with its highest-valued ITEM.
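To make the target result concrete, here is a minimal Python sketch of the selection logic itself (not of the DAX measure): filter to one alert, keep each customer's highest-priced row, then take the top N. The function name `top_unique_customers`, the field names, and the sample rows are hypothetical, since the original sample data is not shown.

```python
def top_unique_customers(rows, alert_id, n=25):
    """Keep each customer's highest-priced row for one alert, then take the top n."""
    best = {}
    for row in rows:
        if row["alert_id"] != alert_id:
            continue
        name = row["customer"]
        # Strictly greater keeps the first row seen when prices tie.
        if name not in best or row["price"] > best[name]["price"]:
            best[name] = row
    return sorted(best.values(), key=lambda r: r["price"], reverse=True)[:n]


# Hypothetical sample rows (assumption for illustration only).
rows = [
    {"alert_id": 123, "customer": "Ann", "item": "X", "price": 50},
    {"alert_id": 123, "customer": "Ann", "item": "Y", "price": 80},
    {"alert_id": 123, "customer": "Bob", "item": "X", "price": 70},
    {"alert_id": 456, "customer": "Ann", "item": "Z", "price": 99},
]
top = top_unique_customers(rows, 123, n=25)
print([(r["customer"], r["item"], r["price"]) for r in top])
```

Because ties are resolved by keeping the first row seen, this sketch does not need the RAND() tie-breaker; that is a difference from the calculated-column approach, not a claim about how DAX behaves.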
I have a measure that calculates the total for a certain account number among numerous accounts: Total EUR Account1 = CALCULATE('GL '[Total EUR], Account[Account number]="1"). I put it in a table to show the cost per financial quarter:
Table 1
Also, an example of the source data:
Source data
Now, this sum per quarter I want to split into four different categories based on another table:
Table 2
So I want to get the below table where the % share per Category and FQ has been multiplied by the total for the applicable FQ.
I do not have the categories in the source data for the main table (Table 1)
Table 3
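The arithmetic being asked for (share per category and FQ multiplied by the total for that FQ) can be sketched in Python as follows. All numbers, quarter labels, and category names here are hypothetical stand-ins for Table 1 and Table 2, which are not shown.

```python
# Hypothetical quarterly totals for the account (stand-in for Table 1).
totals = {"FQ1": 1000.0, "FQ2": 2000.0}

# Hypothetical category shares per quarter (stand-in for Table 2);
# the shares within each quarter sum to 1.
shares = {
    ("FQ1", "A"): 0.5, ("FQ1", "B"): 0.25, ("FQ1", "C"): 0.125, ("FQ1", "D"): 0.125,
    ("FQ2", "A"): 0.25, ("FQ2", "B"): 0.25, ("FQ2", "C"): 0.25, ("FQ2", "D"): 0.25,
}

# Allocated amount = share for (quarter, category) * total for that quarter.
allocated = {key: share * totals[key[0]] for key, share in shares.items()}

for (quarter, category), amount in sorted(allocated.items()):
    print(quarter, category, amount)
```

The key point is that the lookup happens on the quarter only for the total and on the quarter plus category for the share, so the categories never need to exist in the source data of the main table.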
I'm struggling to transform my data into something usable.
At the moment, a data source gives me a table in which one column contains dates, followed by 24 columns, one for each hour. In these 24 columns, each row (each date) holds the total number of phone calls for that hour.
I want to show the hourly distribution. My original data source is not really usable as it is, so I need to transform it into a table with an "hour" column (a simple index from 0 to 23 or 1 to 24) and a column with the total calls taken from the corresponding original column. But I'm quite confused about how to do this, because I don't have a way to create a relationship. Like this:
Does anyone have an idea that could help? Thanks in advance.
I would unpivot the data source and then create two dimensions (Calendar) and (Time).
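For readers unfamiliar with unpivoting, here is a minimal Python sketch of what the transformation does to the data: one row per date with 24 hourly totals becomes one row per (date, hour). The function name `unpivot_hours`, the 0 to 23 hour index, and the sample values are assumptions for illustration; in Power BI the equivalent step is done in the Query Editor.

```python
def unpivot_hours(rows):
    """Turn one row per date with 24 hourly totals into one row per (date, hour)."""
    long_rows = []
    for date, hourly_calls in rows:
        for hour, calls in enumerate(hourly_calls):  # hour runs from 0 to 23
            long_rows.append({"date": date, "hour": hour, "calls": calls})
    return long_rows


# Hypothetical input: two dates, 24 call totals each.
wide = [
    ("2020-01-01", list(range(24))),
    ("2020-01-02", [5] * 24),
]
long_rows = unpivot_hours(wide)

# Hourly distribution: total calls per hour across all dates.
calls_by_hour = {}
for r in long_rows:
    calls_by_hour[r["hour"]] = calls_by_hour.get(r["hour"], 0) + r["calls"]

print(len(long_rows), calls_by_hour[23])
```

Once the data is in this long shape, the date and hour columns can each relate to their own dimension table, which is what makes the Calendar and Time dimensions possible.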
Problem
I'm trying to calculate and display the maximum value of all selected rows alongside their actual values in a table in Power BI. When I try to do this with the measure MaxSelectedSales = MAXX(ALLSELECTED(FactSales), FactSales[Value]), the maximum value ends up being repeated, like this:
If I add additional dimensions to the output, even more rows appear.
What I want to see is just the selected rows in the fact table, without the blank values. (i.e., only four rows would be displayed for SaleId 1 through 4).
Does anyone know how I can achieve my goal with the data model shown below?
Details
I've configured the following model.
The DimMarket and DimSubMarket tables have two rows each, you can see their names above. The FactSales table looks like this:
SaleId MarketId SubMarketId Value IsCurrent
1 1 1 100 true
2 2 1 50 true
3 1 2 60 true
4 2 2 140 true
5 1 1 30 false
6 2 2 20 false
7 1 1 90 false
8 2 2 200 false
In the table output, I've filtered FactSales to only include rows where IsCurrent = true by setting a visual level filter.
Your max value (the measure) is a scalar value (a single value only). If you put a scalar value in a table alongside other records, the value just gets repeated. In general, mixing scalar values and records (tables) does not bring much benefit.
Measures like yours are better displayed in a KPI or Multi KPI visual (typically together with the year, so that you get the max value per year).
If you just want to display the max value of the selected rows (for example, after a filter on your table), use this measure:
Max Value = MAX(FactSales[Value])
This way, all applied filters are taken into account in the measure's calculation.
Here is a sample:
I've found a solution to my problem, but I'm slightly concerned with query performance. Although, on my current dataset, things seem to perform fairly well.
MaxSelectedSales =
MAXX(
FILTER(
SELECTCOLUMNS(
ALLSELECTED(FactSales),
"id", FactSales[SaleId],
"max", MAXX(ALLSELECTED(FactSales), FactSales[Value])
),
[id] = MAX(FactSales[SaleId])
),
[max]
)
If I understand this correctly, for every row in the output, this measure will calculate the maximum value across all selected FactSales rows, set it to a column named max and then filter the table so that only the current FactSales[SaleId] is selected. The performance hit comes from the fact that MAX needs to be executed for every row in the output and a full table scan would be done when that occurs.
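The intended output can be sketched in Python using the FactSales rows from the question: apply the IsCurrent filter, compute the maximum over the selected rows, and show it next to each row's own value. Computing the maximum once and reusing it is a choice made in this sketch; it is not a statement about how the DAX engine evaluates the measure.

```python
# FactSales rows from the question: (SaleId, MarketId, SubMarketId, Value, IsCurrent).
fact_sales = [
    (1, 1, 1, 100, True), (2, 2, 1, 50, True), (3, 1, 2, 60, True), (4, 2, 2, 140, True),
    (5, 1, 1, 30, False), (6, 2, 2, 20, False), (7, 1, 1, 90, False), (8, 2, 2, 200, False),
]

# The visual-level filter keeps only the current rows.
selected = [row for row in fact_sales if row[4]]

# Compute the maximum once, then attach it to every selected row.
max_selected = max(row[3] for row in selected)
output = [(sale_id, value, max_selected) for sale_id, _, _, value, _ in selected]

for row in output:
    print(row)
```

Only the four selected rows appear, each with the same maximum of 140, which is the shape of the table the question asks for.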
Posted on behalf of the question asker
I have a small test table with two fields, id and name, and 19 records in total. When I try to get 10 percent of the records from this table using the following query, I get ALL the records. I tried this on a large table too, but the result is the same: all records are returned. The query:
select * from test tablesample (10 percent) s;
If I use ROWS instead of PERCENT (i.e. select * from test tablesample (10 rows) s;), it works fine and only 10 records are returned. How can I get just the required percentage of records?
You can refer to the link below:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
You must be using CombineHiveInputFormat, which does not work well with the ORC format. Hence you will never be able to save the output of a PERCENT query to a table.
To my knowledge, the best way to do this is with the rand() function. But you should not combine it with an ORDER BY clause, as that will hurt performance. Here is my sample query, which is time-efficient:
SELECT * FROM table_name
WHERE rand() <= 0.0001
DISTRIBUTE BY rand()
SORT BY rand()
LIMIT 5000;
I tested this on a 900M-row table and the query ran in 2 minutes.
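The idea behind the `WHERE rand() <= p` filter is that each row is kept independently with probability p, so the result is approximately, not exactly, a fraction p of the table. A minimal Python sketch of that behaviour follows; the function name `bernoulli_sample` and its parameters are hypothetical, and the sketch leaves out the DISTRIBUTE BY / SORT BY shuffling that the Hive query performs.

```python
import random


def bernoulli_sample(rows, fraction, limit=None, seed=None):
    """Keep each row independently with probability `fraction`, then apply an optional cap."""
    rng = random.Random(seed)
    picked = [row for row in rows if rng.random() <= fraction]
    return picked if limit is None else picked[:limit]


rows = list(range(100_000))

# Roughly 1% of the rows; the exact count varies with the random draws.
sample = bernoulli_sample(rows, 0.01, seed=42)
print(len(sample))

# With a cap, at most `limit` rows come back (like LIMIT in the query).
capped = bernoulli_sample(rows, 0.5, limit=10, seed=1)
print(len(capped))
```

This is why the query pairs a small probability with LIMIT: the filter does a single cheap pass, and the cap guarantees an upper bound on the result size.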
Hope this helps.
You can use PERCENT with TABLESAMPLE. For example:
SELECT * FROM TABLE_NAME
TABLESAMPLE(1 PERCENT) T;
This samples 1% of the input data size, not necessarily 1% of the number of rows. More details can be found here.
But if you are really looking for a way to select a percentage of the number of rows, then you may have to use the LIMIT clause with the number of records you need to retrieve.
For example, if your table has 1000 records, you can select a random 10% of the records with:
select * from table_name order by rand() limit 100;
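The `order by rand() limit k` approach amounts to shuffling all rows and taking the first k, which yields an exact row count at the cost of a full sort. Here is a minimal Python sketch of the same idea; the function name `sample_fraction` is hypothetical, and deriving k by rounding the row count times the fraction is an assumption of this sketch (in the query above, k is written out by hand).

```python
import random


def sample_fraction(rows, fraction, seed=None):
    """Return exactly round(len(rows) * fraction) distinct rows chosen at random."""
    rng = random.Random(seed)
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)


rows = list(range(1000))
picked = sample_fraction(rows, 0.10, seed=0)
print(len(picked))
```

Unlike block sampling by data size, this always returns the requested share of rows, which is the behaviour the question was after for the 19-record table.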