Can we create Dynamic Date table in mapping Data Flow? - powerbi

I have a query in Power BI that takes two parameters: Start Date and End Date.
Whenever I pass these dates, it returns a date table with several columns derived from that range, such as Date, QuarterofYear, Year, MonthName, etc.
Can we create a mapping data flow in ADF that takes two parameters as input and returns a calculated table for the provided dates?
Is there any function that returns a range of dates?

Regarding your request: "I want to pass two dates, Start Date and End Date, to an ADF Mapping Data Flow, and have the data flow create a column such as "Date" that contains one row per date in that range. Is there any function for this? Example: Start Date = 20-01-2019, End Date = 20-01-2020. Then the Date column values should be: 20-01-2019, 21-01-2019, ..., 20-01-2020". According to the Data Factory documentation and my experience, the answer is no, we can't achieve this in Data Flow.

There is a solution to this, but it is a bit tricky.
TL;DR
The general data flow looks like this:
We need a dummy source with exactly one row; its content does not matter.
Then we derive a column where we use the mapLoop() expression to create an array of all the dates we want to get rows for.
Finally, we need to flatten the array column which will result in one row per array entry and thus one row per date.
Walkthrough
Source dummy
Each dataflow needs a source and we need exactly one row to make our dataflow work. To achieve this I've created a dataset called empty of type CSV in my data lake which has this content:
empty
""
This is our source definition:
And its result looks like this:
Derived column days
This is where the magic happens!
We create a new column dates which is an array of all the dates we want to have in our date table:
In this scenario we want a date table starting on 2019-01-01 and reaching one year into the future. The full expression looks like this:
mapLoop(
    addDays(currentDate(), 365) - toDate('2019-01-01'),
    addDays(toDate('2019-01-01'), #index)
)
This is what happens here:
The mapLoop() function builds an array of elements. You specify the number of elements you want and a lambda expression that calculates each element from its position, #index. The related mapIndex() function applies the same kind of lambda to an existing array; for example, mapIndex([1, 2, 3, 4], #item + 2 + #index) results in [4, 6, 8, 10].
addDays(currentDate(), 365) - toDate('2019-01-01') is the number of days between our start (2019-01-01) and end date (1 year in the future from now) and thus the number of dates we want to have in our resulting array.
addDays(toDate('2019-01-01'), #index) calculates each array item by adding #index days to our start date. This is executed once for each of the days we've counted before, and #index is the array position, starting at 1. Thus, the first element of the array will be 2019-01-01 + 1, the second 2019-01-01 + 2, and so on.
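To make the indexing concrete, here is a minimal Python sketch of what the expression computes. It is not ADF code: map_loop is a hypothetical helper that imitates mapLoop() with a 1-based index, and a fixed end date stands in for currentDate() + 365 so the result is reproducible.

```python
from datetime import date, timedelta


def map_loop(length, fn):
    """Imitate mapLoop(): call fn with index = 1..length and collect the results."""
    return [fn(index) for index in range(1, length + 1)]


start = date(2019, 1, 1)
end = date(2019, 1, 8)  # assumption: fixed end date instead of currentDate() + 365

# Number of elements = number of days between start and end.
count = (end - start).days

# Each element is the start date plus index days (index starts at 1).
dates = map_loop(count, lambda index: start + timedelta(days=index))

print(dates[0], dates[-1], len(dates))
```

Note that with a 1-based index the array begins one day after the start date; subtracting 1 from the index (and adding 1 to the count) would include the start date itself.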
Our stream now has these columns:
Flatten
Finally, you need a flatten transformation, which expands each item in your array into its own row. We can also drop the now-unneeded empty column in this step:
And this finally results in what we wanted to achieve:
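As a rough illustration of the flatten step, the Python sketch below expands an array column into one row per element and then derives the calendar columns mentioned in the question. The flatten helper, the MONTH_NAMES tuple and the sample dates are assumptions made for this example; in ADF the equivalent work is done by the flatten transformation and a derived column.

```python
from datetime import date, timedelta

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def flatten(rows, array_column, output_column, drop=()):
    """Emit one row per array element, dropping the array and any unwanted columns."""
    flattened = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k != array_column and k not in drop}
        for item in row[array_column]:
            flattened.append({**kept, output_column: item})
    return flattened


# One dummy row carrying the array of dates (sample values).
stream = [{"empty": "", "dates": [date(2019, 12, 30) + timedelta(days=i) for i in range(4)]}]

date_table = [
    {
        "Date": row["Date"],
        "Year": row["Date"].year,
        "QuarterofYear": (row["Date"].month - 1) // 3 + 1,
        "MonthName": MONTH_NAMES[row["Date"].month - 1],
    }
    for row in flatten(stream, "dates", "Date", drop=("empty",))
]

for row in date_table:
    print(row)
```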
References
Data transformation expressions in mapping data flow

Related

MariaDB: multiple table update does not update a single row multiple times? Why?

Today I was just bitten in the rear end by something I didn't expect. Here's a little script to reproduce the issue:
create temporary table aaa_state(id int, amount int);
create temporary table aaa_changes(id int, delta int);
insert into aaa_state(id, amount) values (1, 0);
insert into aaa_changes(id, delta) values (1, 5), (1, 7);
update aaa_changes c join aaa_state s on (c.id=s.id) set s.amount=s.amount+c.delta;
select * from aaa_state;
The final result in the aaa_state table is:
ID Amount
1 5
Whereas I would expect it to be:
ID Amount
1 12
What gives? I checked the docs but cannot find anything that would hint at this behavior. Is this a bug that I should report, or is this by design?
The behavior you are seeing is consistent with two updates happening on the aaa_state table. One update sets the amount to 7, and this amount is then clobbered by the second update, which sets it to 5. This could be explained by MySQL using a snapshot of the aaa_state table to fetch the amount for each step of the update. If true, the actual steps would look something like this:
1. Join the two tables.
2. Update the amount using the "first" row from the changes table. The cached result for the amount is now 7, but this value will not actually be written out to the underlying table until AFTER the entire update.
3. Update the amount using the "second" row from the changes table. The cached amount is now 5.
4. The update is over; write 5 out as the actual amount.
Your syntax is not really correct for what you want to do. You should be using something like the following:
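-- Pre-aggregate the deltas so each state row joins to exactly one row.
-- (The alias is s because AS is a reserved word and cannot be an alias.)
UPDATE aaa_state s
INNER JOIN
(
    SELECT id, SUM(delta) AS delta_sum
    FROM aaa_changes
    GROUP BY id
) ac
    ON ac.id = s.id
SET
    s.amount = s.amount + ac.delta_sum;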
Here we are doing a proper aggregation of the delta values for each id in a separate bona-fide subquery. This means that the delta sums will be properly computed and materialized in the subquery before MySQL does the join, to update the first table.
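The difference between the two statements can be mimicked with a small Python toy model. This is only a sketch of the observed outcome, not of the database engine's internals: the assumption that the first joined change row wins is made purely for illustration, and both function names are hypothetical.

```python
state = {1: 0}              # id -> amount, as in aaa_state
changes = [(1, 5), (1, 7)]  # (id, delta) rows, as in aaa_changes


def update_each_row_once(state, changes):
    """Model the observed result: a state row is updated at most once,
    always starting from its original amount, even if several changes match."""
    result = dict(state)
    updated = set()
    for row_id, delta in changes:
        if row_id in state and row_id not in updated:
            result[row_id] = state[row_id] + delta
            updated.add(row_id)
    return result


def update_with_aggregated_deltas(state, changes):
    """Model the rewritten statement: sum the deltas per id first, then apply once."""
    sums = {}
    for row_id, delta in changes:
        sums[row_id] = sums.get(row_id, 0) + delta
    return {row_id: amount + sums.get(row_id, 0) for row_id, amount in state.items()}


print(update_each_row_once(state, changes))
print(update_with_aggregated_deltas(state, changes))
```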

POWER BI Creating new query out of existing one using range of columns

I am trying to create a new query from the existing "Master" query using the formula below:
let
Source = Table.SelectColumns(#"Original Source Name", {"Column Name", "Column Name2"})
in
Source
This works fine; however, I am looking for another formula that would do the same with a range of columns, for example columns 30-67, so that when a column is inserted within this range in the original Excel file, it is picked up automatically in Power BI on refresh.
Here's one possible way. If you start with this table, named Table1:
You can reference it in a new query like this:
let
Source = Table.SelectColumns(Table1, List.Range(Table.ColumnNames(Table1), 2, 3))
in
Source
...to get this:
The formula selects a range of columns from the table, starting at the column at index position 2 and spanning 3 columns. (The index starts at 0.) For columns 30-67, counting from 1, you would change the 2 to 29 and the 3 to 38, because the 30th column sits at index 29 and the range 30-67 inclusive covers 38 columns. You would change Table1 to your Original Source Name as well.
See these links for more info on List.Range and Table.ColumnNames.
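The offset-and-count semantics are easy to check outside Power Query. The Python sketch below imitates the two calls on a one-row table; list_range and select_columns are hypothetical stand-ins for List.Range and Table.SelectColumns, and the column names are made up.

```python
def list_range(items, offset, count):
    """Imitate List.Range: count items starting at a zero-based offset."""
    return items[offset:offset + count]


def select_columns(table, names):
    """Imitate Table.SelectColumns on a list of row dictionaries."""
    return [{name: row[name] for name in names} for row in table]


column_names = ["A", "B", "C", "D", "E", "F"]
table = [dict(zip(column_names, range(6)))]

# Offset 2, count 3 selects the 3rd, 4th and 5th columns.
selected = select_columns(table, list_range(column_names, 2, 3))
print(selected)
```

Because the columns are picked by position rather than by name, a column inserted inside the range is included on the next refresh, while the last column of the old range falls out unless the count is increased.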

Power BI remove duplicates based on max value

I have 2 columns: ID CODE and value.
The Remove Duplicates function removes the rows with the higher value and leaves the lower one. Is there any way to remove the lower ones instead? The result I expected was like this.
I've tried the Buffer Table function before, but it doesn't work. It seems that Buffer Table only works with date-related data (newest-latest).
You could use SUMMARIZE which can be used similar to a SQL query that takes a MIN value for a column, grouped by some other column.
In the example below, MIN([value]) is taken, given a new column name "MinValue", which is grouped by IDCode. This should return the min value for each IDCode.
NewCalculatedTable =
SUMMARIZE(yourTablename, yourTablename[IDCode], "MinValue", MIN(yourTablename[value]) )
Alternatively, if you want the higher values just replace the MIN function with MAX.
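The grouping logic can be sketched in Python as follows. The sample rows are invented for illustration, and summarize is a hypothetical helper that only mirrors the group-then-aggregate idea, not the DAX function itself.

```python
rows = [
    {"IDCode": "A", "value": 10},
    {"IDCode": "A", "value": 25},
    {"IDCode": "B", "value": 7},
    {"IDCode": "B", "value": 3},
]


def summarize(rows, group_column, value_column, aggregate):
    """Group rows by one column and aggregate another column per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_column], []).append(row[value_column])
    return {key: aggregate(values) for key, values in groups.items()}


max_values = summarize(rows, "IDCode", "value", max)  # keeps the higher value per IDCode
min_values = summarize(rows, "IDCode", "value", min)  # keeps the lower value per IDCode

print(max_values)
print(min_values)
```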

Sort column with repeated values by another column

In Power BI Desktop, I'm trying to order the following column with repeated values by an ID column (contains primary key).
This returns the error: "There can't be more than one value in "Nível2"...."
In this other post it seems the suggestion is to concatenate the values of the column so they don't get duplicate.
But I want them to be repeated so they can aggregate values in visuals.
So, what's the workaround for this situation?
Thanks in advance for helping!
The issue is that your sort column (i.e. your ID column) contains multiple values for each value in the column you are trying to sort (i.e. your Nivel2 column).
You need to ensure that your sort column contains only one distinct value for each value in the column you are trying to sort.
One way to achieve this would be to create a new (calculated) sort column based on your ID column. It could be defined like this:
SortColumn:=CALCULATE(MAX('YourTable'[ID]),ALLEXCEPT('YourTable','YourTable'[Nivel2]))
Here is an example of how the SortColumn would behave:
Id Nivel2 SortColumn
1 Caixa 4
2 Caixa 4
3 Caixa 4
4 Caixa 4
5 Depósitos à ordem 7
6 Depósitos à ordem 7
7 Depósitos à ordem 7
You can now sort Nivel2 by SortColumn.
EDIT - The implementation of the SortColumn should be done in the data source
There seems to be a limitation in PowerBI where it checks the implementation of the sort column rather than the data in the sort column. Therefore the above solution does not work, even though the data in the sort column is perfectly valid. The above solution will throw this error when you attempt to sort [Nivel2] by SortColumn:
This column can't be sorted by a column that is already sorted, directly or indirectly, by this column.
The implementation of the SortColumn should be moved to the data source instead. I.e. if your data source is an Excel sheet, then the SortColumn should be created inside the Excel sheet.
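If the source is prepared by a script rather than by hand, the sort column could be computed there. The Python sketch below is one possible preprocessing step under that assumption; add_sort_column is a hypothetical helper, and the rows repeat the example table above.

```python
rows = [
    (1, "Caixa"),
    (2, "Caixa"),
    (3, "Caixa"),
    (4, "Caixa"),
    (5, "Depósitos à ordem"),
    (6, "Depósitos à ordem"),
    (7, "Depósitos à ordem"),
]


def add_sort_column(rows):
    """Append the highest Id of each Nivel2 group to every row of that group."""
    max_id = {}
    for row_id, nivel2 in rows:
        max_id[nivel2] = max(row_id, max_id.get(nivel2, row_id))
    return [(row_id, nivel2, max_id[nivel2]) for row_id, nivel2 in rows]


for row in add_sort_column(rows):
    print(row)
```

Every Nivel2 value ends up with exactly one sort value, which is the condition the sort-by-column feature requires.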
The above answer explains the issue and the resolution correctly. The only change is that the SortColumn must be implemented outside of the tabular model (PowerBI) to ensure that PowerBI does not know about the dependency between the SortColumn and the [Nivel2] column.
In my case, I calculate the levels from a parent-child hierarchy
Path = Path([id],[father])
For each level:
Level1 = LOOKUPVALUE([Name],[id], PathItem([Path],1))
Level2 = LOOKUPVALUE([Name],[id], PathItem([Path],2))
.....
Then I created a new column for each level to sort the column Level:
SortL1 = LOOKUPVALUE([nID],[id], PathItem([Path],1))
SortL2 = LOOKUPVALUE([nID],[id], PathItem([Path],2))
.....
id and nID hold the same numeric value, but id is stored in string format because Path does not support numeric values.

Work with matrix (I can't edit visualisation)

I have this table in Power BI, but I can't produce the other layout.
How can I do this?
Now the values are grouped by date (different fields have information under one date, next the same fields are grouped by another date)
I want the values in the columns to be grouped by field (one field has date information next to it).
Edit1:
I can't put Date in 2nd place in the grouping, because Date is a column, while traffic, orders, rev, and costs are values.
You need to put Date in 2nd place in the grouping, after a field containing traffic, orders, etc.
EDIT:
You need to unpivot these columns first, for example, in PowerQuery. Use Edit Query. This results in transforming your 4 columns to 2: Attribute and Value. Attribute will be your first grouping parameter. 2nd will be Date. Value column goes to values.
If you need your source query somewhere else, you can create a new query for this report only. To do so, right-click the original query, select Reference Query, and then make your edits in the new query. This keeps the original query intact.
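For readers unfamiliar with unpivoting, the Python sketch below shows the reshaping on two made-up rows. The unpivot helper and the sample numbers are assumptions for illustration; in Power Query the same result comes from the Unpivot Columns command.

```python
def unpivot(rows, id_columns, value_columns):
    """Turn each value column into its own row with Attribute and Value fields."""
    unpivoted = []
    for row in rows:
        for column in value_columns:
            record = {key: row[key] for key in id_columns}
            record["Attribute"] = column
            record["Value"] = row[column]
            unpivoted.append(record)
    return unpivoted


rows = [
    {"Date": "2020-01-01", "traffic": 100, "orders": 5, "rev": 250, "costs": 80},
    {"Date": "2020-01-02", "traffic": 120, "orders": 7, "rev": 300, "costs": 90},
]

long_rows = unpivot(rows, ["Date"], ["traffic", "orders", "rev", "costs"])

for record in long_rows:
    print(record)
```

With the data in this shape, Attribute can be the first grouping field and Date the second, while Value goes to the values area.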