PowerQuery PowerBI merge 2 tables based on condition between StartDate and EndDate - powerbi

I am trying to join two tables in Power Query/Power BI, Absence and dimDate, to produce the Result table below:
Absence table
+------------+--------------+--------------+-----------+-----------+
| EmployeeId | EmployeeName | AbsenceType | StartDate | EndDate |
+------------+--------------+--------------+-----------+-----------+
| 1 | A | Annual Leave | 2/01/2017 | 5/01/2017 |
| 2 | B | Sick Leave | 4/01/2017 | 6/01/2017 |
+------------+--------------+--------------+-----------+-----------+
dimDate table
+------------+
| FullDate |
+------------+
| 1/01/2017 |
| 2/01/2017 |
| 3/01/2017 |
| 4/01/2017 |
| 5/01/2017 |
| 6/01/2017 |
| 7/01/2017 |
| 8/01/2017 |
| 9/01/2017 |
| 10/01/2017 |
+------------+
Result
+------------+--------------+--------------+-----------+
| EmployeeId | EmployeeName | AbsenceType | Date |
+------------+--------------+--------------+-----------+
| 1 | A | Annual Leave | 2/01/2017 |
| 1 | A | Annual Leave | 3/01/2017 |
| 1 | A | Annual Leave | 4/01/2017 |
| 1 | A | Annual Leave | 5/01/2017 |
| 2 | B | Sick Leave | 4/01/2017 |
| 2 | B | Sick Leave | 5/01/2017 |
| 2 | B | Sick Leave | 6/01/2017 |
+------------+--------------+--------------+-----------+
I usually use SQL to create this result, but I don't know how to do it efficiently in Power Query.
SELECT A.EmployeeId
,A.EmployeeName
,A.AbsenceType
,D.FullDate
FROM Absence AS A
INNER JOIN dimDate AS D ON (
D.FullDate >= A.StartDate
AND D.FullDate <= A.EndDate
)
Note: I have tried a Full Join between the Absence and dimDate tables, then keeping only the rows where dimDate.FullDate >= StartDate and dimDate.FullDate <= EndDate. However, this approach is inefficient with large tables: it creates many redundant records before filtering, so it is quite slow.
Please give me some advice.

No need to merge. You can add a column that holds, for each row, a list of all dates between StartDate and EndDate, and then expand that column.
let
Source = Table1,
#"Added Custom" = Table.AddColumn(Source, "Date", each List.Dates([StartDate],1+Duration.Days([EndDate]-[StartDate]),#duration(1,0,0,0))),
#"Expanded Date" = Table.ExpandListColumn(#"Added Custom", "Date"),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Date",{{"Date", type date}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"StartDate", "EndDate"})
in
#"Removed Columns"
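For readers who want to check the logic outside Power Query, here is a minimal Python sketch of the same expand-a-range idea. The row layout and the helper name `expand_absences` are assumptions made for illustration; the sample values come from the Absence table in the question, read as day/month/year.

```python
from datetime import date, timedelta

def expand_absences(rows):
    """Yield one output row per calendar day from StartDate to EndDate, inclusive."""
    for row in rows:
        # +1 so that EndDate itself is part of the range
        days = (row["EndDate"] - row["StartDate"]).days + 1
        for offset in range(days):
            yield {
                "EmployeeId": row["EmployeeId"],
                "EmployeeName": row["EmployeeName"],
                "AbsenceType": row["AbsenceType"],
                "Date": row["StartDate"] + timedelta(days=offset),
            }

absences = [
    {"EmployeeId": 1, "EmployeeName": "A", "AbsenceType": "Annual Leave",
     "StartDate": date(2017, 1, 2), "EndDate": date(2017, 1, 5)},
    {"EmployeeId": 2, "EmployeeName": "B", "AbsenceType": "Sick Leave",
     "StartDate": date(2017, 1, 4), "EndDate": date(2017, 1, 6)},
]

result = list(expand_absences(absences))
```

This produces seven rows, matching the Result table in the question. Because each row generates only its own dates, no cross product with dimDate is ever built.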

Related

Continuous subtracting of values from previous row in a different column

I am working on a Power BI dashboard that is forecasting values up until 2030, depending on projects that are under implementation between 2022 to 2030.
I have one table [Actual] that contains actual values from 2021 and 2022 which I have at a monthly frequency.
I also have another table [Project Impacts] that contains the projects, with the impacts that they have on the actual values, and this is available from 2022 to 2030, at a quarterly frequency.
There are a few important measures relevant to this problem.
From the CO2 table, we have ‘Actual Values’.
From the Projects table, we have a measure named ‘Estimated saving from projects’, which calculates how much savings there are based on the projects each quarter. The table below shows the measures next to each other.
What I want to do is subtract the ‘Estimated savings from projects’ from the ‘Actual Values’, so that I am left with the ‘Forecast Values’. These are the results I get after doing this (image below).
As you can see from the image above, the ‘Forecast Values’ seems to be working properly, up until 2023/Q1. This is because there is no current data for 'Actual Values' after 2022, so the ‘Estimated saving from Projects’ is just subtracting from 0, hence the negative values in the far right column.
What I want to do is keep the calculation the same up until the end of 2022, but after 2022, the ‘Estimated saving from Projects’ should be subtracted from the previous quarter's value.
So as an example, the 2023/Q1 ‘Estimated saving from Projects’ value of 7.87 should be subtracted from the 2022/Q4 ‘Forecast Value’ of 53.51, giving 45.64 instead of -7.57. The same process should then repeat: the 7.87 from 2023/Q2 should be subtracted from the previous quarter's 45.64, and so on.
Does anyone know of a DAX formula, calculated column, or any solution to the problem above?
Many thanks
OK, so this isn't DAX, but I think the result works in your situation. I resolved it in Power Query. I recreated your table and added on to it:
+------+---------+--------------+------------------+
| Year | Quarter | ActualValues | EstimatedSavings |
+------+---------+--------------+------------------+
| 2021 | 2021Q1 | 83.88 | |
| 2021 | 2021Q2 | 72.84 | |
| 2021 | 2021Q3 | 72.25 | |
| 2021 | 2021Q4 | 68.13 | |
| 2022 | 2022Q1 | 77.53 | 0.17 |
| 2022 | 2022Q2 | 67.75 | 0.17 |
| 2022 | 2022Q3 | 63.08 | 0.17 |
| 2022 | 2022Q4 | 58.14 | 4.63 |
| 2023 | 2023Q1 | | 7.87 |
| 2023 | 2023Q2 | | 7.87 |
| 2023 | 2023Q3 | 104.33 | 8.94 |
| 2023 | 2023Q4 | | 5.11 |
| 2024 | 2024Q1 | | 10.08 |
| 2024 | 2024Q2 | | 10.08 |
| 2024 | 2024Q3 | | 9.04 |
| 2024 | 2024Q4 | | 8.42 |
+------+---------+--------------+------------------+
For illustration purposes, I then created a column I called "ForeCastValuesOLD". You will see below that instead of negative values, you get a "null" value, because the empty cells in "ActualValues" translate to a "null" value in Power Query (expected behavior for a column formatted as decimal number), so subtracting "EstimatedSavings" from this results in another "null" value. It's still wrong in terms of what you want, but it's a "different wrong".
I then selected the "ActualValues" column, and on the "Transform" tab clicked on the arrow next to "Fill" and then on "Down".
Note: "Fill | Down" only fills null values; it will not work if the blanks in that column are stored as zeros.
As a last step, I repeated the ForeCastValues calculated column, only with the new "ActualValues" column. The results are what you are looking for, I believe, and it doesn't matter how many blanks you have or where they are in that column.
This is the final table:
+------+---------+--------------+------------------+-------------------+----------------+
| Year | Quarter | ActualValues | EstimatedSavings | ForeCastValuesOLD | ForeCastValues |
+------+---------+--------------+------------------+-------------------+----------------+
| 2021 | 2021Q1 | 83.88 | | | |
| 2021 | 2021Q2 | 72.84 | | | |
| 2021 | 2021Q3 | 72.25 | | | |
| 2021 | 2021Q4 | 68.13 | | | |
| 2022 | 2022Q1 | 77.53 | 0.17 | 77.36 | 77.36 |
| 2022 | 2022Q2 | 67.75 | 0.17 | 67.58 | 67.58 |
| 2022 | 2022Q3 | 63.08 | 0.17 | 62.91 | 62.91 |
| 2022 | 2022Q4 | 58.14 | 4.63 | 53.51 | 53.51 |
| 2023 | 2023Q1 | 58.14 | 7.87 | | 50.27 |
| 2023 | 2023Q2 | 58.14 | 7.87 | | 50.27 |
| 2023 | 2023Q3 | 104.33 | 8.94 | 95.39 | 95.39 |
| 2023 | 2023Q4 | 104.33 | 5.11 | | 99.22 |
| 2024 | 2024Q1 | 104.33 | 10.08 | | 94.25 |
| 2024 | 2024Q2 | 104.33 | 10.08 | | 94.25 |
| 2024 | 2024Q3 | 104.33 | 9.04 | | 95.29 |
| 2024 | 2024Q4 | 104.33 | 8.42 | | 95.91 |
+------+---------+--------------+------------------+-------------------+----------------+
And here is the M code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc5LDsMgDATQu7CORvgDdo7RdcT9r1EMjdSWJCuP9MayjyNxZkrbGK8ILnDvM7XtF7kHY7heoUzkcoWxUR0kX8gTedw0QwnMIFsKEarByl0hNqsg+10h7pd+P6aiylmQWZDxQ7wBt8X4wSJQVkgEx65LQ+d2AdFpOk3Pq5Tj9X/kJ5SJO7Iu9rnoUE6tvQE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Year = _t, Quarter = _t, ActualValues = _t, EstimatedSavings = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Quarter", type text}, {"ActualValues", type number}, {"EstimatedSavings", type number}}),
#"Added Custom1" = Table.AddColumn(#"Changed Type", "ForeCastValuesOLD", each [ActualValues] - [EstimatedSavings]),
#"Changed Type2" = Table.TransformColumnTypes(#"Added Custom1",{{"ForeCastValuesOLD", type number}}),
#"Filled Down" = Table.FillDown(#"Changed Type2",{"ActualValues"}),
#"Added Custom" = Table.AddColumn(#"Filled Down", "ForeCastValues", each [ActualValues] - [EstimatedSavings]),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"ForeCastValues", type number}})
in
#"Changed Type1"
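The fill-down-then-subtract steps above can be sketched in plain Python to make the arithmetic easy to verify. This is only an illustration under stated assumptions: `fill_down` is a hypothetical helper standing in for Table.FillDown, `None` stands in for Power Query's null, and the lists hold the 2022Q1 to 2023Q4 rows of the sample table.

```python
def fill_down(values):
    """Replace each None with the most recent non-None value above it."""
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# ActualValues and EstimatedSavings for 2022Q1 .. 2023Q4
actual = [77.53, 67.75, 63.08, 58.14, None, None, 104.33, None]
savings = [0.17, 0.17, 0.17, 4.63, 7.87, 7.87, 8.94, 5.11]

# Subtract the savings from the filled-down actual value, row by row
forecast = [round(a - s, 2) for a, s in zip(fill_down(actual), savings)]
```

With these inputs the last four values come out as 50.27, 50.27, 95.39 and 99.22, matching the ForeCastValues column in the final table.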

PowerBI : Count Distinct values in one column based on Distinct Values in another column

I have data for group numbers and their set values. When the set value is the same for all the batches, I don't want those batches to be counted, but if there is more than one set value in a batch, the DAX query should count it as 1.
My current data is like this
| group_no | values |
| ---------- | ---------------------- |
| H110201208 | 600 |
| H110201208 | 600 |
| H110201208 | 680 |
| H101201215 | 665 |
| H109201210 | 640 |
| H123201205 | 600 |
| H125201208 | 610 |
| H111201212 | 610 |
| H111201212 | 630 |
I want my output like this
| Group no | Grand Total |
| ---------- | ----------- |
| H101201215 | 1 |
| H109201210 | 1 |
| H110201208 | 3 |
| H111201212 | 2 |
| H123201205 | 1 |
| H125201208 | 1 |
I want to create another table like the one above using DAX so that I can plot graphs and percentages based on its output.
I want to do this in Power BI using DAX.
TABLE =
GROUPBY (
Groups, //SourceTable
Groups[ group_no ],
"GrandTotal", COUNTX ( CURRENTGROUP (), DISTINCTCOUNTNOBLANK ( Groups[ values ] ) )
)
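Before relying on any per-group count, it may help to check which count the expected output actually describes. The expected table lists 3 for H110201208, a group with three rows but only two distinct values (600 and 680). The following Python sketch, using the sample data from the question and variable names chosen purely for illustration, computes both candidates side by side:

```python
from collections import defaultdict

rows = [
    ("H110201208", 600), ("H110201208", 600), ("H110201208", 680),
    ("H101201215", 665), ("H109201210", 640), ("H123201205", 600),
    ("H125201208", 610), ("H111201212", 610), ("H111201212", 630),
]

# Collect every value seen for each group number
values_by_group = defaultdict(list)
for group_no, value in rows:
    values_by_group[group_no].append(value)

# Candidate 1: number of rows per group
row_counts = {g: len(v) for g, v in values_by_group.items()}
# Candidate 2: number of distinct values per group
distinct_counts = {g: len(set(v)) for g, v in values_by_group.items()}
```

Comparing `row_counts` and `distinct_counts` against the desired "Grand Total" column shows which definition the report needs.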

Query array column in BigQuery by condition

I have a table in Bigquery with this format:
+------------+-----------------+------------+-----------------+---------------------------------+
| event_date | event_timestamp | event_name | event_params.key| event_params.value.string_value |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110 | 2929929292 | my_event | previous_page | /some-page |
+------------+-----------------+------------+-----------------+---------------------------------+
| | layer | /some-page/layer |
| +-----------------+---------------------------------+
| | session_id | 99292 |
| +-----------------+---------------------------------+
| | user_id | 2929292 |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110 | 2882829292 | my_event | previous_page | /some-page |
+------------+-----------------+------------+-----------------+---------------------------------+
| | layer | /some-page/layer |
| +-----------------+---------------------------------+
| | session_id | 29292 |
| +-----------------+---------------------------------+
| | user_id | 229292 |
+-------------------------------------------+-----------------+---------------------------------+
I want to perform a query to get all rows where event_params.value.string_value contains the regex /layer.
I have tried this:
SELECT
"event_params.value.string_value",
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100
But I'm getting this output:
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
Any ideas what I'm doing wrong?
You are selecting a string literal (the double-quoted text), not a column, which is why every output row shows the same text.
The other problem is that you're cross joining the table with its arrays, effectively bloating up the table.
One solution is to use a subquery in the WHERE clause:
SELECT
* -- Not sure what you actually need from the table ...
FROM `my_project.my_dataset.my_events_20210110`
WHERE
-- COUNT(*)>0 means "if you find more than zero" then return TRUE
(SELECT COUNT(*)>0 FROM UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
)
LIMIT 100
If you actually want the values from the array, the quick fix is to remove the quotes and select from the unnested alias (event_param, not event_params):
SELECT
event_param.value.string_value
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100
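The two query shapes return different things: the subquery version keeps whole events that have at least one matching parameter, while the UNNEST join returns one row per matching parameter. A small Python sketch of that difference follows. The event dictionaries are an assumed simplification (the value struct is flattened to a plain string, and `other_event` is a made-up non-matching row), and `re.search` stands in for the partial-match behavior of REGEXP_CONTAINS.

```python
import re

events = [
    {"event_name": "my_event",
     "event_params": [
         {"key": "previous_page", "value": "/some-page"},
         {"key": "layer", "value": "/some-page/layer"},
         {"key": "session_id", "value": "99292"},
     ]},
    {"event_name": "other_event",  # hypothetical row with no 'layer' parameter
     "event_params": [
         {"key": "previous_page", "value": "/home"},
     ]},
]

def is_layer_param(param):
    """True when the key is 'layer' and the value contains '/layer'."""
    return param["key"] == "layer" and re.search(r"/layer", param["value"]) is not None

# Shape 1 (subquery in WHERE): whole events with at least one matching parameter
matching_events = [e for e in events
                   if any(is_layer_param(p) for p in e["event_params"])]

# Shape 2 (UNNEST join): one output value per matching parameter
matching_values = [p["value"] for e in events
                   for p in e["event_params"] if is_layer_param(p)]
```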

How do i add additional rows in M QUERY

I want to add more rows using the Query editor (Power query/ M Query) in only the Start Date and End Date column:
+----------+------------------+--------------+-----------+-------------+------------+
| Employee | Booking Type | Jobs | WorkLoad% | Start Date | End date |
+----------+------------------+--------------+-----------+-------------+------------+
| John | Chargeable | CNS | 20 | 04/02/2020 | 31/03/2020 |
| John | Chargeable | CNS | 20 | 04/03/2020 | 27/04/2020 |
| Bernard | Vacation/Holiday | SN | 100 | 30/04/2020 | 11/05/2020 |
| Bernard | Vacation/Holiday | Annual leave | 100 | 23/01/2020 | 24/02/2020 |
| Bernard | Chargeable | Tech PLC | 50 | 29/02/2020 | 30/03/2020 |
+----------+------------------+--------------+-----------+-------------+------------+
I want to find the MIN(Start Date) and MAX(End Date) and then append the range of dates from start to end to this table, filling only the Start Date and End Date columns, in the Query Editor (Power Query/M Query). Preferably, I would create another table (table2) that duplicates the original table and appends these rows.
For example:
+----------+------------------+--------------+-----------+-------------+------------+
| Employee | Booking Type | Jobs | WorkLoad% | Start Date | End date |
+----------+------------------+--------------+-----------+-------------+------------+
| John | Chargeable | CNS | 20 | 04/02/2020 | 31/03/2020 |
| John | Chargeable | CNS | 20 | 04/03/2020 | 27/04/2020 |
| Bernard | Vacation/Holiday | SN | 100 | 30/04/2020 | 11/05/2020 |
| Bernard | Vacation/Holiday | Annual leave | 100 | 23/01/2020 | 24/02/2020 |
| Bernard | Chargeable | Tech PLC | 50 | 29/02/2020 | 30/03/2020 |
| | | | | 23/01/2020 | 23/01/2020 |
| | | | | 24/01/2020 | 24/01/2020 |
| | | | | 25/01/2020 | 25/01/2020 |
| | | | | 26/01/2020 | 26/01/2020 |
| | | | | 27/01/2020 | 27/01/2020 |
| | | | | 28/01/2020 | 28/01/2020 |
| | | | | 29/01/2020 | 29/01/2020 |
| | | | | 30/01/2020 | 30/01/2020 |
| | | | | 31/01/2020 | 31/01/2020 |
| | | | | ... | ... |
| | | | | 11/05/2020 | 11/05/2020 |
+----------+------------------+--------------+-----------+-------------+------------+
The List.Dates function is pretty useful here.
Generate the dates in your range, duplicate that to two columns and then append.
let
StartDate = List.Min(StartTable[Start Date]),
EndDate = List.Max(StartTable[End Date]),
DateList = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1,0,0,0)),
DateCols = Table.FromColumns({DateList, DateList}, {"Start Date", "End Date"}),
AppendDates = Table.Combine({StartTable, DateCols})
in
AppendDates
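One detail worth checking in any version of this is whether the final date is included: a span whose endpoints are N days apart contains N + 1 dates. The Python sketch below uses the Start Date and End Date values from the question (read as day/month/year) to show the count; the variable names are illustrative only.

```python
from datetime import date, timedelta

start_dates = [date(2020, 2, 4), date(2020, 3, 4), date(2020, 4, 30),
               date(2020, 1, 23), date(2020, 2, 29)]
end_dates = [date(2020, 3, 31), date(2020, 4, 27), date(2020, 5, 11),
             date(2020, 2, 24), date(2020, 3, 30)]

first = min(start_dates)
last = max(end_dates)

# Inclusive count: the difference in days plus one
count = (last - first).days + 1
date_list = [first + timedelta(days=i) for i in range(count)]

# Each generated date fills both columns of an appended row
appended_rows = [{"Start Date": d, "End Date": d} for d in date_list]
```

Here the list runs from 23/01/2020 through 11/05/2020, the same first and last rows shown in the desired output.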

Find a string in the entire table (all fields, all columns, all rows) in Django

I have a model (table) in my Django app with 24 fields (columns), and I want to search for a string in it. I want to get a list showing which rows have this string in any of their fields.
Please have a look at this example:
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| id | name | year | country | attribute1 | attribute2 | attribute3 | ... | attribute20 |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 1 | Tie | 1993 | USA | Bond | Busy | Busy | ... | Free |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 2 | Ness | 1980 | Germany | Free | Busy | Both | ... | Busy |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 3 | Both | 1992 | Sweden | Free | Free | Free | ... | Busy |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 24 | Lex | 2001 | Russia | Busy | Free | Free | ... | Both |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
What I am looking to get (by using filters, etc.) is something like the table below, where I filter the records based on the word "Both" across the entire table. Each row that contains "Both" is in the result:
+----+------+------+---------+------------+------------+------------+-----+-------------+
| id | name | year | country | attribute1 | attribute2 | attribute3 | ... | attribute20 |
+----+------+------+---------+------------+------------+------------+-----+-------------+
| 1 | Ness | 1980 | Germany | Free | Busy | Both | ... | Busy |
+----+------+------+---------+------------+------------+------------+-----+-------------+
| 2 | Both | 1992 | Sweden | Free | Free | Free | ... | Busy |
+----+------+------+---------+------------+------------+------------+-----+-------------+
| 3 | Lex | 2001 | Russia | Busy | Free | Free | ... | Both |
+----+------+------+---------+------------+------------+------------+-----+-------------+
You can see that the string "Both" appears in different rows and in different columns (one "Both" is under the column "attribute3", another is under the column "name", and the last is under the column "attribute20").
How do I get this result with a Django queryset?
Thanks
Assuming you have modeled the above table as a Django model named Person:
from django.db.models import Q
query_text = "your search string"
Person.objects.filter(
Q(name__contains=query_text) |
Q(year__contains=query_text) |
Q(attribute1__contains=query_text)
# ... and so on for all your attributes
)
The above code does a case-sensitive search. If you want a case-insensitive search instead, use name__icontains in place of name__contains (and likewise for the other fields) in the above code.
As suggested by @rchurch4 in a comment, and based on this SO answer, here's how one could search the entire table with fewer lines of code:
from functools import reduce
from operator import or_

from django.db.models import Q

# Collect the name of every field on the model
all_fields = Person._meta.get_fields()
search_fields = [f.name for f in all_fields]
# OR together one "contains" lookup per field
q = reduce(or_, [Q(**{'{}__contains'.format(f): query_text}) for f in search_fields], Q())
Person.objects.filter(q)
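To make clear what the OR of all those lookups computes, here is a plain-Python illustration of the same "any field contains the text" rule. It does not use Django; the rows are a shortened, assumed version of the example table (only a few of the attribute columns), and `row_contains` is a hypothetical helper.

```python
rows = [
    {"id": 1, "name": "Tie", "year": 1993, "country": "USA",
     "attribute1": "Bond", "attribute3": "Busy"},
    {"id": 2, "name": "Ness", "year": 1980, "country": "Germany",
     "attribute1": "Free", "attribute3": "Both"},
    {"id": 3, "name": "Both", "year": 1992, "country": "Sweden",
     "attribute1": "Free", "attribute3": "Free"},
]

def row_contains(row, text):
    """True when any field, converted to a string, contains the text (case-sensitive)."""
    return any(text in str(value) for value in row.values())

# Ids of the rows where at least one field contains "Both"
matches = [row["id"] for row in rows if row_contains(row, "Both")]
```

A row qualifies as soon as one field matches, whichever column that is, which is the behavior the chained Q objects express in the database query.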