How to filter distinct counts of text with a greater than indicator in Power BI? - powerbi

I am working on a report that counts stores with different types of beverages. I am trying to get a distinct count of stores that are selling four or more Powerade flavors and two or more Coca-Cola flavors, while maintaining a count of stores that are purchasing other products (Sprite, Dr. Pepper, etc.).
My data table is BEVSALES and the data looks like:
CustomerNo  Brand       Flavor
43          PWD         Fruit Punch
37          Coca-Cola   Vanilla
43          PWD         Mixed Bry
37          Coca-Cola   Cherry
44          Sprite      Tropical Mix
43          PWD         Strawberry
43          PWD         Grape
44          Coca-Cola   Cherry
17          Dr. Pepper  Cherry
I am trying to get a distinct count of customers filtered so that PWD >= 4 and Coca-Cola >= 2, while keeping the customer count of Dr. Pepper and Sprite at 1 each (1 customer purchasing PWD, 1 customer purchasing Coca-Cola, etc.).
The best measure that I have been able to find is
= SUMX(BEVSALES, 1*(FIND("PWD",BEVSALES[Brand],,0)))
but I don't know how to put it together so the formula counts the stores that have at least 4 PWD flavors and at least 2 Coca-Cola flavors. Any ideas?

The easiest way would be to do this in a separate query. Go to the query design and click on edit. Then choose your table, group by the column Brand, and take a distinct count of the column Flavor. The result should look like this (maybe as a new table):
GroupedBrand  DistinctCountFlavor
PWD           4
Coca-Cola     2
Sprite        1
Dr. Pepper    1
Now you can access the distinct count of flavors by brand. With an IF() expression you can check for >= 4 for PWD, >= 2 for Coca-Cola, and so on.
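Grouping by Brand alone counts flavors across all stores. If the thresholds have to hold per store, the same idea can be applied to (CustomerNo, Brand) pairs first. Here is a minimal Python sketch of that logic using the sample rows from the question; the names rows, thresholds and stores_per_brand are illustrative only and are not part of Power BI.

```python
from collections import defaultdict

# Sample rows from the question: (CustomerNo, Brand, Flavor)
rows = [
    (43, "PWD", "Fruit Punch"),
    (37, "Coca-Cola", "Vanilla"),
    (43, "PWD", "Mixed Bry"),
    (37, "Coca-Cola", "Cherry"),
    (44, "Sprite", "Tropical Mix"),
    (43, "PWD", "Strawberry"),
    (43, "PWD", "Grape"),
    (44, "Coca-Cola", "Cherry"),
    (17, "Dr. Pepper", "Cherry"),
]

# Minimum number of distinct flavors a store must carry per brand.
# Brands not listed here default to 1 (any purchase counts).
thresholds = {"PWD": 4, "Coca-Cola": 2}

# Step 1: collect the distinct flavors per (customer, brand) pair.
flavors = defaultdict(set)
for customer, brand, flavor in rows:
    flavors[(customer, brand)].add(flavor)

# Step 2: count the customers per brand that reach the brand's threshold.
stores_per_brand = defaultdict(int)
for (customer, brand), flavor_set in flavors.items():
    if len(flavor_set) >= thresholds.get(brand, 1):
        stores_per_brand[brand] += 1

print(dict(stores_per_brand))
```

With the sample data, customer 44 buys only one Coca-Cola flavor and is therefore not counted for Coca-Cola, which leaves one store per brand.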

Related

Power BI Matrix Visual Showing Row of Blank Values Even Though Source Data Does Not Have Blanks

I have two tables one with data about franchise locations (Franchise Profile Info) and one with Award data. Each franchise location is given a certain number of awards they are allowed to give out per year. Each franchise location rolls up to a larger group depending on where in the country they are located. These tables are in a 1 to 1 relationship using Franchise ID. I am trying to create a matrix with the number of awards, total utilized, and percentage utilized rolled up to group with the ability to expand the groups and see individual locations. For some reason when I add the value fields a blank row is created. There are not any blank rows in either of the original tables so I'm not sure where this is coming from.
Franchise Profile Info table
ID   Franchise Name  Group  Street Address   City           State
164  Park's          West   12 Park Dr.      Los Angeles    CA
365  A & J           East   243 Whiteoak Rd  Stafford       VA
271  Otto's          South  89 Main St.      St. Augustine  FL
Award table
ID   Year  TotalAwards  Utilized
164  2022  16           12
365  2022  5            5
271  2022  22           17
These tables are in a relationship with a 1 to 1 match on ID.
What I want the matrix to look like
Group  Total Awards  Utilized  %Awards Utilized
East   5             5         100%
West   16            12        75%
South  22            17        77%
Instead what I'm getting is this
Group    Total Awards  Utilized  %Awards Utilized
East     5             5         100%
West     16            12        75%
South    22            17        77%
(blank)  0             0         0%
I can't for the life of me figure out where this row is coming from. I can add in the Group and Franchise name as rows but as soon as I add any of the value columns this blank row shows up.
You have a value on the many side that does not exist on the one side. You can read a full explanation here. https://www.sqlbi.com/articles/blank-row-in-dax/
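One way to check for this outside Power BI is to compare the key columns directly. Below is a minimal Python sketch, assuming the ID columns of both tables have been exported to lists. The ID 999 is a made-up example of an orphaned key and does not appear in the question's data.

```python
# IDs from the Franchise Profile Info table.
franchise_ids = [164, 365, 271]

# IDs from the Award table; 999 is a hypothetical orphan added for illustration.
award_ids = [164, 365, 271, 999]

# Keys present in the Award table but missing from the profile table.
# Rows with these keys have no matching Group, so they are candidates
# for the blank row in the matrix.
orphans = sorted(set(award_ids) - set(franchise_ids))

print(orphans)  # [999]
```

An empty result would suggest looking elsewhere, for example at blank or differently typed ID values.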

Appending the looking value to a dataframe within a Loop

I have two different datasets. I have done the fuzzy matching between the two data sets by running the function 'get_matches'. That gave me the perfect result in 3 columns by matching the data from full_name_to_be_check with the item_master data as below:
Matching Data from item master Score Index
126_SURGICAL SCRUB BRUSHSTER ILE(EO) DRY_MEDLINE INDUSTRIES UK LIMITED 93 0
127_SURGICAL SCRUB BRUSHCHLO RHEXIDINE_MEDLINE INDUSTRIES UK LIMITED 100 1
127_SURGICAL SCRUB BRUSHCHLO RHEXIDINE_MEDLINE INDUSTRIES UK LIMITED 88 1
128_SURGICAL SCRUB BRUSHPOVI DONE-IODINE_MEDLINE INDUSTRIES UK LIMITED 88 2
128_SURGICAL SCRUB BRUSHPOVI DONE-IODINE_MEDLINE INDUSTRIES UK LIMITED 100 2
129_SURGICAL SKIN MARKER/REGULAR BROAD TIP_FANNIN (UK) LTD 100 3
And my code is as below;
import pandas as pd

# Collect the matches for every value to be checked
data = pd.DataFrame([])
for i in po['full_name_to_be_check']:
    t = get_matches(i, item_master['item_master'])
    data = data.append(t)
data.to_excel('item_2.xlsx', index=False)
However, I am struggling to add the value of 'i' (the value being looked up) as a column in the data table within the loop. Can anyone help me with that, please?
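One possible approach is to tag each batch of matches with the value that produced it before adding the batch to the result. The sketch below shows that pattern in plain Python. Here get_matches is a hypothetical stand-in that returns a list of row dictionaries (the real function and its return type are not shown in the question), and the column name looking_value is made up. If get_matches returns a DataFrame, the equivalent step would be to assign a new column, for example t['looking_value'] = i, before appending.

```python
def get_matches(query, choices):
    """Hypothetical stand-in for the asker's fuzzy matcher.

    Returns one row per choice that shares at least one word with the query.
    """
    query_words = set(query.lower().split())
    return [
        {"match": choice, "index": idx}
        for idx, choice in enumerate(choices)
        if query_words & set(choice.lower().split())
    ]


# Made-up inputs standing in for po['full_name_to_be_check'] and item_master.
lookups = ["scrub brush", "skin marker"]
item_master = ["SURGICAL SCRUB BRUSH", "SURGICAL SKIN MARKER"]

data = []
for i in lookups:
    t = get_matches(i, item_master)
    # Attach the lookup value to every row produced for it.
    for row in t:
        row["looking_value"] = i
    data.extend(t)

for row in data:
    print(row)
```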

How do I create a pivot table with weighted averages from a table in PowerBI?

I have data in the following format:
Building        Tenant  Type     Floor  Sq Ft  Rent  Term Length
1 Example Way   Jeff    Renewal  5      100    100   6
47 Fake Street  Tom     New      3      500    200   12
I need to create a visualisation in PowerBI that displays a pivot table of attribute by tenant, with a weighted averages (by square foot) column, like this:
Attribute             Jeff           Tom             Weighted Average (by Sq Ft)
Building              1 Example Way  47 Fake Street  -
Type                  Renewal        New             -
Floor                 5              3               -
Sq Ft                 100            500             433.3333333
Rent                  100            200             183.3333333
Term Length (months)  6              12              11
I have unpivoted the original data, like this:
Tenant  Attribute             Value
Jeff    Building              1 Example Way
Jeff    Type                  Renewal
Jeff    Floor                 5
Jeff    Sq Ft                 100
Jeff    Rent                  100
Jeff    Term Length (months)  6
Tom     Building              47 Fake Street
Tom     Type                  New
Tom     Floor                 3
Tom     Sq Ft                 500
Tom     Rent                  200
Tom     Term Length (months)  12
I can almost create what I need from the unpivoted data using a matrix (as below), but I can't calculate the weighted averages column from that matrix.
Attribute             Jeff           Tom
Building              1 Example Way  47 Fake Street
Type                  Renewal        New
Floor                 5              3
Sq Ft                 100            500
Rent                  100            200
Term Length (months)  6              12
I can also create a table with my attributes as headers (instead of in a column). This displays the right values and lets me calculate weighted averages (as below).
Tenant                       Building        Type     Floor  Sq Ft        Rent         Term Length (months)
Jeff                         1 Example Way   Renewal  5      100          100          6
Tom                          47 Fake Street  New      3      500          200          12
Weighted Average (by Sq Ft)  -               -        -      433.3333333  183.3333333  11
However, it's important that these values are displayed vertically instead of horizontally. This is pretty straightforward in Excel, but I can't figure out how to do it in PowerBI. I hope this is clear. Can anyone help?
Thanks!
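For reference, the weighted-average column in the tables above is sum(value * Sq Ft) / sum(Sq Ft). Here is a small Python sketch that reproduces the three numbers from the question; the function name weighted_average is illustrative. In a DAX measure the same ratio could probably be built from SUMX and DIVIDE, though laying it out vertically in a matrix is a separate problem.

```python
# Rows from the question: tenant, square feet, rent, term length (months).
tenants = [
    {"Tenant": "Jeff", "Sq Ft": 100, "Rent": 100, "Term Length": 6},
    {"Tenant": "Tom", "Sq Ft": 500, "Rent": 200, "Term Length": 12},
]


def weighted_average(rows, value_key, weight_key="Sq Ft"):
    """Return sum(value * weight) / sum(weight) over the rows."""
    total_weight = sum(row[weight_key] for row in rows)
    weighted_sum = sum(row[value_key] * row[weight_key] for row in rows)
    return weighted_sum / total_weight


for attribute in ("Sq Ft", "Rent", "Term Length"):
    print(attribute, round(weighted_average(tenants, attribute), 7))
```

For example, Rent works out as (100 * 100 + 200 * 500) / (100 + 500) = 183.33.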

Combine Toad sql queries with decreasing output results into one list

I've been trying to produce a result where multiple queries return more restrictive returns. How can I see the full list as well as those records that meet the more restrictive conditions? Query 1 returns 538 records of sites in the given counties.
SELECT E_SITES.ID "SITE ID",
       E_SITES.NAME "SITE NAME",
       E_SITES.ADDR_1 "SITE ADDRESS",
       E_SITES.CITY_NAME || ', ' || E_SITES.STATE_CODE || ' ' || E_SITES.POSTAL_CODE,
       E_SITES.COUNTY_NAME
FROM E_SITES
WHERE E_SITES.COUNTY_NAME IN ('ALLAMAKEE', 'BENTON', 'BLACK HAWK', 'BREMER', 'BUCHANAN', 'CHICKASAW', 'CLAYTON', 'DELAWARE', 'DUBUQUE')
ORDER BY E_SITES.ID
Query 2 returns the number of sites that have a contact person identified. This is 503 records.
SELECT E_SITES.ID "SITE ID",
       E_SITES.NAME "SITE NAME",
       E_SITES.ADDR_1 "SITE ADDRESS",
       E_SITES.CITY_NAME || ', ' || E_SITES.STATE_CODE || ' ' || E_SITES.POSTAL_CODE,
       E_SITES.COUNTY_NAME,
       E_INDIVIDUALS.FIRST_NAME || ' ' || E_INDIVIDUALS.LAST_NAME
FROM E_SITES, E_AFFILIATIONS, E_INDIVIDUALS
WHERE E_SITES.SITE_ID = E_AFFILIATIONS.SITE_ID
  AND E_AFFILIATIONS.INDIVIDUAL_RID = E_INDIVIDUALS.RID
  AND E_AFFILIATIONS.AFFILIATION_TYPE = ('SITE_CONTACT')
  AND E_SITES.COUNTY_NAME IN ('ALLAMAKEE', 'BENTON', 'BLACK HAWK', 'BREMER', 'BUCHANAN', 'CHICKASAW', 'CLAYTON', 'DELAWARE', 'DUBUQUE')
ORDER BY E_SITES.ID
A further query would return those sites with a mailing address, which reduces the results down to 486 records. I need to get all 538 records, whether or not they have a contact or mailing address, and for those that do, have one row for each site.
Additional Information
My current results can look like this for Query 1 (including column headers for clarity, quotes to distinguish data elements):
"SITE ID" "SITE NAME" "SITE ADDRESS" "CITY, STATE ZIP" "COUNTY_NAME"
"09698" "BODINE ELECTRIC" "18114 KAPP DR" "PEOSTA, IA 52067" "BREMER"
"16895" "BRUGGEMAN LUMBER" "3003 WILLOW RD" "HOPKINTON, IA 52237" "DELAWARE"
"40047" "GENEVIEVE, LLC" "707 LINCOLN ST" "GARNAVILLOR, IA 52052" "CLAYTON"
Query 2 which requires a contact person currently only returns records that meet the requirement, even though I use the (+) operator.
"SITE ID" "SITE NAME" "SITE ADDRESS" "CITY, STATE ZIP" "COUNTY_NAME" "FIRST NAME LAST NAME"
"40047" "GENEVIEVE, LLC" "707 LINCOLN ST" "GARNAVILLOR, IA 52052" "CLAYTON" "DALE KARTMAN"
I get 1 record rather than 3 records (2 with no contact person and 1 with a contact person). This is my dilemma. I have to run each of these queries separately, get the results and copy them to a spreadsheet. Then I have to align the records with contact names to the first query of all facilities. Very labor intensive. I hope this helps clarify my needs.
If I understood you correctly, it is the OUTER JOIN you're looking for.
Here's a simple example (based on Scott's EMP and DEPT tables) which shows what it is.
There are 4 departments in the DEPT table:
SQL> select deptno from dept order by deptno;
DEPTNO
----------
10
20
30
40
However, no employee works in department 40:
SQL> select deptno, ename from emp order by deptno;
DEPTNO ENAME
---------- ----------
10 KING
10 CLARK
10 MILLER
20 FORD
20 SMITH
20 JONES
30 JAMES
30 TURNER
30 MARTIN
30 WARD
30 ALLEN
30 BLAKE
12 rows selected.
SQL>
If you want to display information collected from both of those tables (department name from the DEPT table and employee name from the EMP table), you'd join those tables - just like you did (I'll use ANSI syntax which actually JOINS tables, instead of enumerating them and putting join conditions into the WHERE clause):
SQL> select d.deptno, d.dname, e.ename
2 from dept d join emp e on e.deptno = d.deptno
3 order by d.deptno;
DEPTNO DNAME ENAME
---------- -------------- ----------
10 ACCOUNTING KING
10 ACCOUNTING CLARK
10 ACCOUNTING MILLER
20 RESEARCH FORD
20 RESEARCH SMITH
20 RESEARCH JONES
30 SALES JAMES
30 SALES TURNER
30 SALES MARTIN
30 SALES WARD
30 SALES ALLEN
30 SALES BLAKE
12 rows selected.
SQL>
Looks OK, but - I'd like to get information about DEPTNO = 40, although nobody works in it. So, use outer join:
SQL> select d.deptno, d.dname, e.ename
2 from dept d left join emp e on e.deptno = d.deptno
3 order by d.deptno;
DEPTNO DNAME ENAME
---------- -------------- ----------
10 ACCOUNTING KING
10 ACCOUNTING CLARK
10 ACCOUNTING MILLER
20 RESEARCH FORD
20 RESEARCH SMITH
20 RESEARCH JONES
30 SALES JAMES
30 SALES TURNER
30 SALES MARTIN
30 SALES WARD
30 SALES ALLEN
30 SALES BLAKE
40 OPERATIONS
13 rows selected.
SQL>
Right! Here it is! (Note that LEFT JOIN produces the same result as LEFT OUTER JOIN; there is no need to specify "outer", although it makes things somewhat more obvious.)
Also, there's the "old" Oracle outer join operator, (+) (literally, a + sign enclosed in parentheses). The above query would work as well if we put it like this:
select d.deptno, d.dname, e.ename
from dept d, emp e
where d.deptno = e.deptno (+);
I'd suggest you do the same (an outer join) in your query. Once again:
join tables in the JOIN clause
put filters into the WHERE clause (a filter on the outer-joined table belongs in the ON clause instead, otherwise the rows without a match are dropped again)
The query will be easier to read and maintain, and you'll know what is what. Also, the "old" (+) operator has a limitation: with it, you can't outer join one table to more than one other table. As you go deeper and deeper, you might need to outer join some table to several others (it might even be the case for you), and that's where the ANSI join comes in.
Good luck!
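Applied to the question's tables, the outer join could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for Oracle, with simplified, hypothetical table definitions and only the three sample sites from the question, so treat it as an illustration of the join shape rather than a drop-in query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified, hypothetical versions of the tables in the question.
cur.executescript("""
CREATE TABLE e_sites (site_id TEXT PRIMARY KEY, name TEXT, county_name TEXT);
CREATE TABLE e_individuals (rid INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE e_affiliations (site_id TEXT, individual_rid INTEGER, affiliation_type TEXT);

INSERT INTO e_sites VALUES
    ('09698', 'BODINE ELECTRIC', 'BREMER'),
    ('16895', 'BRUGGEMAN LUMBER', 'DELAWARE'),
    ('40047', 'GENEVIEVE, LLC', 'CLAYTON');
INSERT INTO e_individuals VALUES (1, 'DALE', 'KARTMAN');
INSERT INTO e_affiliations VALUES ('40047', 1, 'SITE_CONTACT');
""")

# The affiliation-type condition sits in the ON clause: moving it to WHERE
# would discard the sites that have no contact and undo the outer join.
rows = cur.execute("""
SELECT s.site_id,
       s.name,
       i.first_name || ' ' || i.last_name AS contact
FROM e_sites s
LEFT JOIN e_affiliations a
       ON a.site_id = s.site_id
      AND a.affiliation_type = 'SITE_CONTACT'
LEFT JOIN e_individuals i
       ON i.rid = a.individual_rid
WHERE s.county_name IN ('BREMER', 'DELAWARE', 'CLAYTON')
ORDER BY s.site_id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

All three sites come back, and only the one with a contact has a name in the last column. A mailing-address table could be added with one more LEFT JOIN in the same way.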

tabstat: How to sort/order the output by a certain variable?

I gathered some data on NBA players' triple-double games, and would like to find out who has the most explosive stats on average.
The source is "Basketball Reference - Player Game Finder - Triple Doubles". (Sorry that I can't post the direct URL because of the lack of reputation.)
So I generated a table summarizing descriptive statistics (e.g. count, mean) for several variables (pts trb ast stl blk) using:
tabstat pts trb ast stl blk, statistics(count mean) format(%9.1f) by(player)
What I get is the following table:
tabstat result:
How can I tell Stata to keep only the players with count >= 10 (those with 10 or more triple-doubles ever), then sort the table by pts, to get:
Ideal result:
Like above, I would say Michael Jordan and James Harden are the Top 2 most explosive triple-double players and Darrell Walker is the most economic one.
Do study https://stackoverflow.com/help/mcve on how to present an example other people can work with straight away. Also, avoiding sports-specific jargon that won't be universally comprehensible and focusing more on the general programming problem would help. Fortunately, what you want seems clear nevertheless.
To do this you need to create a variable defining the order desired in advance of your tabstat call. To get it (value) labelled as you wish, use labmask (search labmask then download from the Stata Journal location given).
Here is some technique.
sysuse auto, clear
egen mean = mean(weight), by(rep78)
egen count = count(weight), by(rep78)
egen group = group(mean rep78) if count >= 5
replace group = -group
labmask group, values(rep78)
label var group "`: var label rep78'"
tabstat mpg weight , by(group) s(count mean) format(%1.0f)
Summary statistics: N, mean
  by categories of: group (Repair Record 1978)

 group |       mpg    weight
-------+--------------------
     2 |         8         8
       |        19      3354
-------+--------------------
     3 |        30        30
       |        19      3299
-------+--------------------
     4 |        18        18
       |        22      2870
-------+--------------------
     5 |        11        11
       |        27      2323
-------+--------------------
 Total |        67        67
       |        21      3030
----------------------------
Key details:
The grouping variable is based not only on the means you want to sort on but also on the original grouping variable, just in case there are ties on the means.
To get ordering from highest mean downwards, the grouping variable must be negated.
tabstat doesn't show variable labels in the body of the table. (Usually there wouldn't be enough space for them.)
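The same filter-then-order logic, written out in Python for readers who want to check it outside Stata. This is only a sketch of the idea (keep groups with enough observations, sort by mean descending, break ties by the group itself); the observations and the threshold min_count are made up.

```python
from collections import defaultdict
from statistics import mean

# Made-up (group, value) observations standing in for (player, pts).
observations = [
    ("A", 30), ("A", 34), ("A", 32),
    ("B", 20), ("B", 22), ("B", 24),
    ("C", 40),                      # only one observation: filtered out below
    ("D", 30), ("D", 32), ("D", 34),
]
min_count = 3  # hypothetical threshold (the question uses 10)

# Collect the values per group.
values = defaultdict(list)
for group, value in observations:
    values[group].append(value)

# Keep groups with enough observations, then sort by mean descending;
# the group name is the second sort key so ties on the mean are resolved.
summary = sorted(
    ((group, len(v), mean(v)) for group, v in values.items() if len(v) >= min_count),
    key=lambda item: (-item[2], item[0]),
)

for group, count, avg in summary:
    print(group, count, avg)
```

Groups A and D tie on the mean here, which is the case the extra sort key is meant to handle.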