PowerQuery - Fill missing data according to specific pattern

PowerQuery - Fill missing data according to specific pattern - powerbi

I am trying to clean data received from an Excel file and transform it using PowerQuery (in PowerBI) into a useable format.
Below a sample table, and what I am trying to do:
| Country | Type of location |
|--------- |------------------ |
| A | 1 |
| | 2 |
| | 3 |
| B | 1 |
| | 2 |
| | 3 |
| C | 1 |
| | 2 |
| | 3 |
As you can see, I have a list of location types for each country (always constant, always the same number per country, ie each country has 3 rows for 3 location types)
What I am trying to do is to see if there is a way to fill the empty cells in the "Country" column, with the appropriate Country name, which would give something like this:
| Country | Type of location |
|--------- |------------------ |
| A | 1 |
| A | 2 |
| A | 3 |
| B | 1 |
| B | 2 |
| B | 3 |
| C | 1 |
| C | 2 |
| C | 3 |
For now I thought about using a series of if/else if conditions, but as there are 100+ countries this doesn't seem like the right solution.
Is there any way to do this more efficiently?

As Murray mentions, the Table.FillDown function works great and is built into the GUI under the Transform tab in the query editor:
Note that it only fills down to replace nulls, so if you have empty strings instead of nulls in those rows, you'll need to do a replacement first. The button for that is just above the Fill button in the GUI and you'd use the dialog box like this
or else just use the M code that this generates instead of the GUI:
= Table.ReplaceValue(#"Previous Step","",null,Replacer.ReplaceValue,{"Country"})

Yes, like you can do in Excel, you can fill down.
From the docs - Table.FillDown
I believe you will need to sort the data correctly first.
Table.FillDown(
Table.FromRecords({
[Place = 1, Name = "Bob"],
[Place = null, Name = "John"],
[Place = 2, Name = "Brad"],
[Place = 3, Name = "Mark"],
[Place = null, Name = "Tom"],
[Place = null, Name = "Adam"]
}),
{"Place"}
)

Related

How to find count of Direct Reporting's by DAX formula in Power BI?

Good day! I have a sample employee table like the one below. I need a DAX formula in Power BI to create a measure to count the number of direct reports of each employee. For Example, the Direct Report count of GL0001 will be 2 (Because GL0001 is the line manager of GL0002 and GL0019 and they report to GL0001), the Direct Report count of EMP-02023 will be 3, Direct Report count of GL0002 will be 3. Please help me also to create measures regarding the count of only one direct reporting and less than three direct reporting
| Employee ID | Line Manager ID | Layer (of Employee) | Layer (of Line Manager) |
|--------------|-------------------|----------------------|--------------------------|
| EMP-01980 | GL0003 | 4 | 3 |
| EMP-02023 | EMP-02015 | 6 | 5 |
| EMP-01636 | EMP-02015 | 6 | 5 |
| EMP-02138 | EMP-02162 | 6 | 5 |
| EMP-02145 | EMP-01980 | 5 | 4 |
| GL0023 | GL0022 | 5 | 4 |
| GL0001 | | 1 | 0 |
| GL0002 | GL0001 | 2 | 1 |
| GL0003 | GL0002 | 3 | 2 |
| GL0019 | GL0001 | 2 | 1 |
| GL0020 | GL0002 | 3 | 2 |
| GL0024 | GL0002 | 3 | 2 |
| EMP-01918 | EMP-00791 | 9 | 8 |
| EMP-01941 | EMP-00791 | 9 | 8 |
| EMP-02019 | EMP-02156 | 8 | 7 |
| EMP-02024 | EMP-02023 | 7 | 6 |
| EMP-02025 | EMP-02023 | 7 | 6 |
| EMP-03001 | EMP-02023 | 7 | 6 |

Your data doesn't have all the Employee ID for each Line Manager ID. That means the PATH calculation would not work.
I've assumed your data looks like this
Employee ID
Line Manager ID
1000001
1000002
1000001
1000003
1000002
1000004
1000003
1000005
1000004
1000006
1000005
1000007
1000006
1000008
1000007
1000009
1000006
1000010
1000003
Creating Calculated columns you can calculate the PATH and the PATH SIZE
Path
Path = path('Table'[Employee ID],'Table'[Line Manager ID])
Path Size
Path Length = PATHLENGTH([Path])
Output
Edit
In that case, you can use the Line Manager ID column to count direct reports, measure below.
DAX: Calculated Column
CountDirectReport =
VAR EmpId = [Employee ID]
RETURN
CALCULATE (
COUNTROWS ( 'Table' ),
FILTER ( 'Table', [Line Manager ID] = EmpId )
)
Output

Counting the number of items in two mutually exclusive rows of categories

Here is the sheet for testing: https://docs.google.com/spreadsheets/d/11CoQ_PAtVNQBkbtnHH0xR4bhCQVU-pcz645h1akTQuA/edit?usp=sharing
I have a table like this:
| id | category | irrelevant |
|----|----------|------------|
| 1 | cat1 | FALSE |
| 2 | cat2 | FALSE |
| 3 | | TRUE |
| 4 | cat1 | FALSE |
Each item has an ID and a category or, if it is considered irrelevant, it has no category and the column "irrelevant" is marked as TRUE.
What I would like to do is to write a formula that will return the number of items in each category plus a row with the number of irrelevant items. So in the case above the result would be:
| category | number |
|------------|--------|
| cat1 | 2 |
| cat2 | 1 |
| irrelevant | 1 |
If I try something like:
=QUERY(A1:C5,"select B,count(A) group by B")
I get the correct numbers, but since "irrelevant" is not a category its cell is empty, so the result is:
| category | count id |
|----------|----------|
| | 1 |
| cat1 | 2 |
| cat2 | 1 |
Notice the empty "B2" cell. Is there a way to rename it to "irrelevant" without altering the first table? One thing I tried was just to count the irrelevant items.
=transpose(query(A1:C5, "select count(A) where C = TRUE label count(A) 'irrelevant'"))
which returns me simply
| irrelevant | 1 |
And then altering slightly the first formula so it doesn't count the "empty" categories and finally joining both of them in an array:
={
QUERY(A1:C5,"select B,count(A) where B <> '' group by B");
TRANSPOSE(QUERY(A1:C5, "select count(A) where C = TRUE label count(A) 'irrelevant'"))
}
This returns me what I want for the example above
| category | count id |
|------------|----------|
| cat1 | 2 |
| cat2 | 1 |
| irrelevant | 1 |
But this won't work if my original table doesn't have irrelevant items. Which can occur depending on the range I chose to query, so if I want to query a table like this:
| id | category | irrelevant |
|----|----------|------------|
| 5 | cat1 | FALSE |
| 6 | cat2 | FALSE |
| 7 | cat2 | FALSE |
| 8 | cat3 | FALSE |
The solution I found will not work. Any suggestions on how can I do that?

try:
=ARRAYFORMULA(QUERY(IF((B2:B="")*(C2:C<>""), "irrelevant", ),
"select Col1,count(Col21)
where Col1 is not null
group by Col1
label count(Col2)''"))

Find a string in the entire table (all fields, all columns, all rows) in Django

I have a module (table) in my Django app with 24 fields (columns), and I want to search a string in it. I want to see a list that show me which one of the rows has this string in its fields.
Please have a look at this example:
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| id | name | year | country | attribute1 | attribute2 | attribute3 | ... | attribute20 |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 1 | Tie | 1993 | USA | Bond | Busy | Busy | ... | Free |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 2 | Ness | 1980 | Germany | Free | Busy | Both | ... | Busy |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 3 | Both | 1992 | Sweden | Free | Free | Free | ... | Busy |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 24 | Lex | 2001 | Russia | Busy | Free | Free | ... | Both |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
What I am looking to get (by using filters, etc.) is something like this: (When I filter the records base on the word "Both" in the entire table and all of the records. Each row that contains "Both" is in the result below)
+----+------+------+---------+------------+------------+------------+-----+-------------+
| id | name | year | country | attribute1 | attribute2 | attribute3 | ... | attribute20 |
+----+------+------+---------+------------+------------+------------+-----+-------------+
| 1 | Ness | 1980 | Germany | Free | Busy | Both | ... | Busy |
+----+------+------+---------+------------+------------+------------+-----+-------------+
| 2 | Both | 1992 | Sweden | Free | Free | Free | ... | Busy |
+----+------+------+---------+------------+------------+------------+-----+-------------+
| 3 | Lex | 2001 | Russia | Busy | Free | Free | ... | Both |
+----+------+------+---------+------------+------------+------------+-----+-------------+
You can see that the string ("Both") appears in different rows in different columns. (one "Both" is under the column "attribute3", the other "Both" is under column "Name", and the last "Both" is under column "attribute20".
How you get this result in Django by queryset?
Thanks

Assuming you have modeled the above table as a Django model named Person
from django.db.models import Q
query_text = "your search string"
Person.objects.filter(
Q(name__contains=query_text) |
Q(year__contains=query_text) |
Q(attribute1__contains=query_text)
and so on for all your attributes
)
The above code will do a case sensitie search. if instead you want it to be case insenssitive search, use name__icontains instead of say name__contains in the above code.
As suggested by #rchurch4 in comment and based on this so answer, here's how one could search the entire table with fewer lines of code:
from functools import reduce
from operators import or_
all_fields = Person._meta.get_fields()
search_fields = [i.name for i in all_fields]
q = reduce(or_, [Q(**{'{}__contains'.format(f): search_text}) for f in search_fields], Q())
Person.objects.filter(q)

Create column to classify rows based on realted tables DAX PowerBI

I have simplified my problem to solve. Lets suppose I have three tables. One containing data and specific codes that identify objects lets say Apples.
+-------------+------------+-----------+
| Data picked | Color code | Size code |
+-------------+------------+-----------+
| 1-8-2018 | 1 | 1 |
| 1-8-2018 | 1 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 2 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 3 | 3 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 5 | 3 |
| 1-8-2018 | 6 | 1 |
| 1-8-2018 | 6 | 2 |
| 1-8-2018 | 6 | 2 |
+-------------+------------+-----------+
And i have two related helping tables to help understand the codes (their relationships are inactive in the model due to ambiguity with other tables in the real case).
+-----------+--------+
| Size code | Size |
+-----------+--------+
| 1 | Small |
| 2 | Medium |
| 3 | Large |
+-----------+--------+
and
+------------+----------------+-------+
| Color code | Color specific | Color |
+------------+----------------+-------+
| 1 | Light green | Green |
| 2 | Green | Green |
| 3 | Semi green | Green |
| 4 | Red | Red |
| 5 | Dark | Red |
| 6 | Pink | Red |
+------------+----------------+-------+
Lets say that I want to create an extra column in the original table to determine which apples are class A and class B given that medium green Apples are class A and large Red apples are class B, the other remain blank as the example below.
+-------------+------------+-----------+-------+
| Data picked | Color code | Size code | Class |
+-------------+------------+-----------+-------+
| 1-8-2018 | 1 | 1 | |
| 1-8-2018 | 1 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 2 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 3 | 3 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 5 | 3 | B |
| 1-8-2018 | 6 | 1 | |
| 1-8-2018 | 6 | 2 | |
| 1-8-2018 | 6 | 2 | |
+-------------+------------+-----------+-------+
What's the proper DAX to use given the relationships are initially inactive. Preferably solvable without creating any further additional columns in any table. I already tried codes like:
CALCULATE (
"A" ;
FILTER ( 'Size Table' ; 'Size Table'[Size] = "Medium");
FILTER ( 'Color Table' ; 'Color Table'[Color] = "Green")
)
And many variations on the same principle

Given that the relationships are inactive, I'd suggest using LOOKUPVALUE to match ID values on the other tables. You should be able to create a calculated column as follows:
Class =
VAR Size = LOOKUPVALUE('Size Table'[Size],
'Size Table'[Size code], 'Data Table'[Size code])
VAR Color = LOOKUPVALUE('Color Table'[Color],
'Color Table'[Color code], 'Data Table'[Color code])
RETURN SWITCH(TRUE(),
(Size = "Medium") && (Color = "Green"), "A",
(Size = "Large") && (Color = "Red"), "B", BLANK())
If your relationships are active, then you don't need the lookups:
Class = SWITCH(TRUE(),
(RELATED('Size Table'[Size]) = "Medium") &&
(RELATED('Color Table'[Color]) = "Green"),
"A",
(RELATED('Size Table'[Size]) = "Large") &&
(RELATED('Color Table'[Color]) = "Red"),
"B",
BLANK())
Or a bit more elegantly written (especially for more classes):
Class =
VAR SizeColor = RELATED('Size Table'[Size]) & " " & RELATED('Color Table'[Color])
RETURN SWITCH(TRUE(),
SizeColor = "Medium Green", "A",
SizeColor = "Large Red", "B",
BLANK())

sqlite - python - full outer join with attaching a database

I have 2 tables in 2 databases:
table call in history.db:
ROWID | ADDRESS | DATE
1 | +98765 | 1396771532
2 | +98765 | 1396771533
3 | +98765 | 1396771534
4 | +98765 | 1396771535
5 | +98765 | 1396771536
6 | +98765 | 1396771537
7 | +98765 | 1396771538
8 | +98765 | 1396771539
9 | +98765 | 1396771510
table info in voices.db:
ID | CALLID | PATH | CODE
1 | 2 | voice1.m4a | 12234
2 | 5 | voice2.m4a | 12234
3 | 1 | voice4.m4a | 89765
First, I did an attach:
conn = sqlite3.connect("history.db")
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
cursor.execute("attach ? as voice", ("voices.db",))
conn.commit()
Then, I joined 2 tables:
cursor.execute("SELECT * FROM call c JOIN (SELECT PATH, CALLID, CODE FROM voice.info WHERE CODE = ?) f ON c.ROWID = f.CALLID ORDER BY c.DATE", ("12234",))
So, I got the following result:
ROWID | ADDRESS | DATE | PATH | CALLID | CODE
2 | +98765 | 1396771533 | voice1.m4a | 2 | 12234
5 | +98765 | 1396771536 | voice2.m4a | 5 | 12234
But, I need a full outer join to get something like:
ROWID | ADDRESS | DATE | PATH | CALLID | CODE
1 | +98765 | 1396771532 | NULL | NULL | NULL
2 | +98765 | 1396771533 | voice1.m4a | 2 | 12234
3 | +98765 | 1396771534 | NULL | NULL | NULL
4 | +98765 | 1396771535 | NULL | NULL | NULL
5 | +98765 | 1396771536 | voice2.m4a | 5 | 12234
6 | +98765 | 1396771537 | NULL | NULL | NULL
7 | +98765 | 1396771538 | NULL | NULL | NULL
8 | +98765 | 1396771539 | NULL | NULL | NULL
9 | +98765 | 1396771510 | NULL | NULL | NULL
I tried this, but I got an error of ... UNION do not have same number result columns ....
How could I have a full outer Join?
I am using Python 2.7, so
Right and FULL OUTER JOINs are not currently supported

Your original query is badly written. Write it like this:
SELECT c.*, i.* FROM CALL c
JOIN voice.info i ON i.CODE = ? AND c.ROWID = i.CALLID
ORDER BY c.DATE;
Now transforming it into the full outer join as in the answer you linked is trivial:
SELECT c.*, i.* FROM CALL c
LEFT JOIN voice.info i ON c.ROWID = i.CALLID AND i.CODE = ?
UNION
SELECT c.*, i.* FROM voice.info i
LEFT JOIN CALL c ON c.ROWID = i.CALLID
WHERE i.CODE = ?;
In the answer you linked they use UNION ALL, which keeps duplicates in the result set. I don't think you want that, so therefore I prefer to use UNION, which removes duplicates (rows where all the columns are equal).
Also: it's actually even better to write out all columns instead of using *, but I didn't do that here for brevity.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

PowerQuery - Fill missing data according to specific pattern - powerbi

Related

How to find count of Direct Reporting's by DAX formula in Power BI?

Counting the number of items in two mutually exclusive rows of categories

Find a string in the entire table (all fields, all columns, all rows) in Django

Create column to classify rows based on realted tables DAX PowerBI

sqlite - python - full outer join with attaching a database

Categories

Resources