How to split records based on a value field in a column - kettle

I have a scenario like the one below, where, based on the value of num_seats, I have to split each row in the target into that many rows, along with another field (seat_num) holding a counter that increments by 1. Please suggest an approach.

You can clone the rows based on a field value (in this case, num_seats), remove the original (non-cloned) row, then calculate the seat number and replace the original fields (num_seats, seat_num, last_seat, etc.) with the new values:
Here's a Gist of the above transformation: https://gist.github.com/mattyb149/e4cf796ff45983ebf87e
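For readers who want to see the row logic outside of Kettle, here is a minimal Python sketch of the same idea: each input row is cloned num_seats times and every clone gets an incrementing seat_num. The booking_id field and the sample rows are made up for illustration, and starting the counter at 1 is an assumption; the actual transformation in the Gist may handle the fields differently.

```python
def split_by_seats(rows):
    """Expand each row into num_seats rows, numbered 1..num_seats."""
    result = []
    for row in rows:
        for seat_num in range(1, row["num_seats"] + 1):
            clone = dict(row)            # copy so the input row is left untouched
            clone["seat_num"] = seat_num  # counter restarts for every input row
            result.append(clone)
    return result


# Hypothetical sample data; booking_id is not part of the original question.
bookings = [
    {"booking_id": "A", "num_seats": 3},
    {"booking_id": "B", "num_seats": 1},
]

expanded = split_by_seats(bookings)
for row in expanded:
    print(row)
```

A row with num_seats of 0 produces no output rows in this sketch, which matches the "remove the original row" step described above.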

Related

Can we create Dynamic Date table in mapping Data Flow?

I have a query in Power BI that takes two parameters: Start Date and End Date.
Whenever I pass these dates, it returns a date table with a few columns derived from the range, such as Date, QuarterofYear, Year, MonthName, etc.
Can we create a mapping data flow in ADF that takes two parameters as input and returns a calculated table for the provided dates?
Is there any function that returns a range of dates?
For your request: "I want to pass two dates, Start Date and End Date, to an ADF Mapping Data Flow, and have the data flow create a column such as "Date" that contains one row per date in that range. Is there any function for this? Example: Start Date=20-01-2019, End Date=20-01-2020, then the Date column values should be: 20-01-2019, 21-01-2019, ..., 20-01-2020", according to the Data Factory documentation and my experience, the answer is no, we can't achieve this in Data Flow.
There is a solution to this, but it is a bit tricky.
TL;DR
The general data flow looks like this:
We need a dummy source with exactly one row; its content does not matter.
Then we derive a column where we use the mapLoop() expression to create an array of all the dates we want to get rows for.
Finally, we need to flatten the array column which will result in one row per array entry and thus one row per date.
Walkthrough
Source dummy
Each dataflow needs a source and we need exactly one row to make our dataflow work. To achieve this I've created a dataset called empty of type CSV in my data lake which has this content:
empty
""
This is our source definition:
And its result looks like this:
Derived column days
This is where the magic happens!
We create a new column dates which is an array of all the dates we want to have in our date table:
In this scenario we want a date table starting on 2019-01-01 and reaching one year into the future. The full expression looks like this:
mapLoop(
    addDays(currentDate(), 365) - toDate('2019-01-01'),
    addDays(toDate('2019-01-01'), #index)
)
This is what happens here:
The mapLoop() function builds an array of elements. You specify the number of elements you want and a lambda expression that calculates each element. For example, mapLoop(3, #index * 10) results in [10, 20, 30].
addDays(currentDate(), 365) - toDate('2019-01-01') is the number of days between our start (2019-01-01) and end date (1 year in the future from now) and thus the number of dates we want to have in our resulting array.
addDays(toDate('2019-01-01'), #index) calculates each array item by adding #index days to our start date. This is executed for the number of days we've calculated before, and #index is the array position. Because #index starts at 1, the first element of the array will be 2019-01-01 + 1 day, the second 2019-01-01 + 2 days, and so on. Use #index - 1 instead if the start date itself should be part of the table.
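To make the indexing concrete, here is a small Python sketch that mimics the behaviour described above. The map_loop helper is a hypothetical stand-in written for this illustration, not part of any ADF SDK, and it assumes a 1-based index as described in the explanation. A fixed end date is used in place of currentDate() so the result is reproducible.

```python
from datetime import date, timedelta


def map_loop(count, fn):
    """Hypothetical stand-in for mapLoop(): call fn with a 1-based index, count times."""
    return [fn(index) for index in range(1, count + 1)]


start = date(2019, 1, 1)
end = date(2019, 1, 5)            # fixed end date instead of "one year from now"
num_days = (end - start).days     # number of array elements, like the date subtraction

# Equivalent of addDays(toDate('2019-01-01'), #index)
dates = map_loop(num_days, lambda index: start + timedelta(days=index))
print(dates)
```

With a 1-based index the array runs from the day after the start date up to and including the end date, which is the off-by-one to keep in mind when choosing between #index and #index - 1.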
Our stream now has these columns:
Flatten
Finally, you need a flatten transformation, which expands each item in your array into its own row. We can also drop the useless empty column in this step:
And this finally results in what we wanted to achieve:
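Conceptually, the flatten step turns one row holding an array into one row per array element. The Python sketch below illustrates that behaviour only; the flatten function here is a hypothetical helper, not the ADF transformation itself, and the three sample dates are made up.

```python
from datetime import date


def flatten(rows, array_column, drop=()):
    """Emit one output row per element of each row's array_column."""
    out = []
    for row in rows:
        # Keep the other columns, except those explicitly dropped.
        kept = {k: v for k, v in row.items() if k != array_column and k not in drop}
        for item in row[array_column]:
            out.append({**kept, array_column: item})
    return out


# One dummy row, as produced by the source plus the derived column.
stream = [{"empty": "", "dates": [date(2019, 1, 2), date(2019, 1, 3), date(2019, 1, 4)]}]

date_table = flatten(stream, "dates", drop=("empty",))
for row in date_table:
    print(row)
```

This also shows why the dummy source needs exactly one row: with two source rows, every date would appear twice after flattening.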
References
Data transformation expressions in mapping data flow

Get total count of each distinct value

Suppose I have a column of countries with repeated values, such as: Spain, Spain, Italy, Spain
I want to take the number of times a country appears in the column and divide it by the total number of rows. I have tried:
CountRows = DIVIDE(DISTINCTCOUNT('Report (7)'[Country]); COUNT('Report (7)'[Country]) )
Any suggestions? Do I need a new column for that?
The easiest way to achieve this type of calculation is to add a calculated column holding the number of occurrences of the current row's value divided by the number of rows in the table.
You need to use the EARLIER function to refer to the current row's value inside the filter.
If you have a table named Table1 with a column Country, the column would be something like:
DIVIDE(COUNTROWS(FILTER(Table1, Table1[Country] = EARLIER(Table1[Country]))), COUNTROWS(Table1))
Don't forget to format the new column as a percentage, or add some decimal places, to see the correct values.
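As a sanity check on what the calculated column should contain, here is the same arithmetic in plain Python for the sample list from the question. This is only an illustration of the expected numbers, not DAX.

```python
from collections import Counter

countries = ["Spain", "Spain", "Italy", "Spain"]

counts = Counter(countries)   # occurrences per country
total = len(countries)        # number of rows in the table

# One value per row, like a calculated column: occurrences of this row's country / total rows
share_per_row = [counts[country] / total for country in countries]
print(share_per_row)  # [0.75, 0.75, 0.25, 0.75]
```

Every Spain row gets 0.75 and the Italy row gets 0.25, which is why the column only looks right once it is shown with decimals or as a percentage.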

Power BI remove duplicates based on max value

I have 2 columns: ID CODE and value.
The Remove Duplicates function removes the rows with the higher value and keeps the lower one. Is there any way to remove the lower ones instead? The result I expected was like this.
I've tried the Buffer Table function before, but it doesn't work. It seems Buffer Table only works with date-related data (newest-latest).
You could use SUMMARIZE, which works much like a SQL GROUP BY query that takes the MIN value of one column, grouped by another column.
In the example below, MIN([value]) is taken, given a new column name "MinValue", which is grouped by IDCode. This should return the min value for each IDCode.
NewCalculatedTable =
SUMMARIZE(yourTablename, yourTablename[IDCode], "MinValue", MIN(yourTablename[value]) )
If you want the higher values instead, as in your question, just replace the MIN function with MAX.
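The grouping that SUMMARIZE performs can be sketched in plain Python as follows. The summarize helper and the sample rows are hypothetical and only illustrate the expected result for MIN versus MAX.

```python
def summarize(rows, key, value, agg):
    """Group rows by key and reduce each group's values with agg (e.g. min or max)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {group: agg(values) for group, values in groups.items()}


# Made-up sample data with duplicate IDCodes.
rows = [
    {"IDCode": "A", "value": 10},
    {"IDCode": "A", "value": 25},
    {"IDCode": "B", "value": 7},
    {"IDCode": "B", "value": 3},
]

max_per_id = summarize(rows, "IDCode", "value", max)  # keeps the higher value per IDCode
min_per_id = summarize(rows, "IDCode", "value", min)  # keeps the lower value per IDCode
print(max_per_id, min_per_id)
```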

Work with matrix (I can't edit visualisation)

I have this table in Power BI, but I can't get it into another layout.
How can I do this?
Now the values are grouped by date (different fields have information under one date, next the same fields are grouped by another date)
I want the values in the columns to be grouped by field (one field has date information next to it).
Edit1:
I can't put Date in 2nd place in the grouping,
because Date is a column, while traffic, orders, rev and costs are values.
You need to put Date in 2nd place in the grouping, after a field containing traffic, orders, etc.
EDIT:
You need to unpivot these columns first, for example in Power Query (use Edit Query). This transforms your 4 columns into 2: Attribute and Value. Attribute will be your first grouping parameter and Date the 2nd. The Value column goes to values.
If you need your source query somewhere else, you can create a new query for this report only. Do this by right-clicking the original query, selecting Reference Query, and then making your edits in the new query. This keeps the original query intact.
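To show what the unpivot step does to the data, here is a plain Python sketch. The unpivot helper and the sample numbers are made up for illustration; in Power BI the same reshaping is done by the unpivot command in the query editor.

```python
def unpivot(rows, id_column, value_columns):
    """Turn each value column into its own (Attribute, Value) row."""
    out = []
    for row in rows:
        for column in value_columns:
            out.append({id_column: row[id_column], "Attribute": column, "Value": row[column]})
    return out


# Made-up sample data: one row per date, four value columns.
table = [
    {"Date": "2020-01-01", "traffic": 100, "orders": 5, "rev": 50, "costs": 20},
    {"Date": "2020-01-02", "traffic": 120, "orders": 7, "rev": 70, "costs": 25},
]

long_rows = unpivot(table, "Date", ["traffic", "orders", "rev", "costs"])

# Group by Attribute first, then Date, as in the matrix layout described above.
matrix = {}
for row in long_rows:
    matrix.setdefault(row["Attribute"], {})[row["Date"]] = row["Value"]
print(matrix["traffic"])
```

After unpivoting, each field (traffic, orders, and so on) has its dates next to it, which is the layout the question asks for.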

How do I change the individual value to be summed during annotation in Django?

I am a front-end developer new to Django. There is a column (server_reach) in our Postgres DB which takes the values 1 and 2 (1 = not reachable, 2 = reachable). I need to write a query which tells me whether at least one of the filtered rows is reachable.
I was initially told that the values of the column would be 0 and 1, based on which I wrote this:
ServerAgent.objects.values('server').filter(
    app_uuid_url=app.uuid_url,
    trash=False
).annotate(serverreach=Sum('server_reach'))
The logic is simple: I fetch all the filtered rows and annotate them with the sum of server_reach. If the sum is more than zero, then at least one entry is non-zero.
But the issue is that the actual DB has the values 1 and 2, so this logic no longer works. I want to subtract 1 from the server_reach of each row before summing. I have tried F expressions as below:
ServerAgent.objects.values('server').filter(
    app_uuid_url=app.uuid_url,
    trash=False
).annotate(serverreach=Sum(F('server_reach')-1))
But it throws the following error. Please help me get this to work.
AttributeError: 'ExpressionNode' object has no attribute 'split'
Use Avg instead of Sum. If the average value is greater than 1, then at least one row contains the value 2.
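The reasoning behind the Avg trick can be checked in plain Python, without a database. The sketch below groups made-up (server, server_reach) pairs by server, the way values('server') groups the annotation, and tests whether each group's average exceeds 1. The server names and rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (server, server_reach), where 1 = not reachable, 2 = reachable.
rows = [
    ("srv-1", 1),
    ("srv-1", 2),
    ("srv-2", 1),
    ("srv-2", 1),
]

by_server = defaultdict(list)
for server, reach in rows:
    by_server[server].append(reach)

# All values are 1 or 2, so the average is exactly 1 only when every row is 1.
reachable = {server: mean(values) > 1 for server, values in by_server.items()}
print(reachable)
```

Because every value is at least 1, any single 2 pushes the average above 1, so the check works regardless of how many rows a server has.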