How to analyze data with a column containing multiple values

How to analyze data with a column containing multiple values - powerbi

I'm attempting to analyze idea submissions in Power BI where there is a column with multiple values separated by commas. Here is an example of the table layout (each row being a submission):
Key
...
Tags
1
...
Chat, Service, Dallas
2
...
Banking, IVR, Miami, Zelle
3
...
New York, Collections
...
...
...
The tags column has the data I'm trying to analyze and it's sorted in whichever order they are first entered in by the submitter so they don't follow a certain structure necessarily. Some submissions may have as little as 2 tags and some as much as 15. I'm trying to figure out a way to structure the data in a way that Power BI can analyze each tag (if that makes sense, I'm sorry I'm having a difficult time explaining).
For instance, I want to be able to see the number of submissions by department (like chat or collections). I know I can split the tags column and have a separate column created for each tag but the problem I run into is that the new columns created have different values in each row depending on the order. For example, the new table after splitting the tags column would look like this:
Key
...
Tag1
Tag2
Tag3
Tag4
1
...
Chat
Service
Dallas
2
...
Banking
IVR
Miami
Zelle
3
...
New York
Collections
...
...
...
...
...
...
As you can see, the Tag1 column has mixed values in the sense that row 1 and 2 contains a department (chat and banking) but row 3 contains a location (New York). I suppose the question I'm trying to ask is if anyone has any recommendations on how to better analyze the tags so I can answer questions like:
What departments are sending in more submissions than others?
Which site locations are sending in more submissions?
I appreciate any help and advice. I hope this makes sense!

I'd suggest expanding to new rows rather than new columns.
Then your data looks like this
If you have a table of tags where each one is categorized as "Department" or "Location" or whatever, you can then merge that table onto the one above to have a nice Category column to help filter in your reporting.

Related

How do I show my closest 3 competitors on a map?

I work for a retailer and would like to create a map in Power BI for all of our stores and their 3 closest competitors, like the example below (Ideal Output). The blue dot is our store and the red dots are our competitors. Ideally, the map would change automatically when a different store is selected in a drop-down slicer. Please see the example data below (My Data).
Thanks,
Mark
Ideal Output:
My Data
Table 1: All of our stores. Very basic. Every store has its own unique number, along with basic store details including geocodes.
Table 2: All competitors. Every competitor has a unique number along with basic information including geocodes.
Table 3: Our stores and their 3 closest competitors.

Here's the solution that comes to mind for me. Create a new table that combines information from all 3 of the tables provided. It will be formatted like your Table 1 or Table 2, but with two new columns, Type and Slicer Store.
Type will specify whether the row is your own store or a competitor. Slicer Store will specify which rows should be displayed when that store's name is selected in the dropdown slicer.
Each row from your Table 3 will have 4 rows in the new table, one for each of the columns. The Slicer Store column in your new table will contain the Store Number from each row in Table 3. The Type column in your new table is self-explanatory.
You will end up with something like the following.
Number
Name
Address
Latitude
Longitude
Type
Slicer Store
JL123
...
...
...
...
Store
JL123
C1
...
...
...
...
Competitor
JL123
C2
...
...
...
...
Competitor
JL123
C3
...
...
...
...
Competitor
JL123
JL456
...
...
...
...
Store
JL456
C2
...
...
...
...
Competitor
JL456
C3
...
...
...
...
Competitor
JL456
C4
...
...
...
...
Competitor
JL456
Now for how to use it in Power BI. Create a single-select dropdown slicer using your new Slicer Store column. Then create your map visual using the rows in your new table. Use your new Type column as the Legend/Category. This allows you to color stores vs competitors differently in the visual.
Bing bang boom, you're done. Note that you may have "duplicate" rows for competitors in your new table. For example, if JL123 and JL456 were physically located right next to each other, then rows for C1, C2, and C3 would each appear twice in your new table. The Slicer Store would be different (JL123 or JL456) for these rows, though.
How you create the new table, either manually or with some sort of script, is the hard part.

Trying to replace data based on attributes from two other columns

I need to change the inventory category for a couple of account numbers and only for a couple of companies. The inventory category for these accounts are mapped based on the account number but need to be changed specifically just for two companies. I've tried to filter by the company number and then find/replace, which worked fine, but then I can't unfilter to bring back the rest of the companies. I can't change the category for just those account numbers because it is only different for just those two companies.

Lisa, Here's perhaps a simpler approach than where your current way is taking you.
If I begin with this table:
Then I add a column (Add Column -> Custom Column) with the following:
The formula uses an if statement to determine whether each row has a specific Account (Acct. 4) AND Company (Co. 8). If so, then 99 is returned as a new category value for that row of the new column. If not, then the original Inventory Category is returned as a value for that row of the new column. (Obviously, you would edit this formula accordingly, to support your account, company, and new inventory category values.)
Here's the result:
Then I could delete the original Inventory Category column and rename the remaining New Inventory Category column to Inventory Category.

Power BI - How can I create a measure aggregation from two tables?

I've got two tables Fauna_Afetada (animals affected) e Municipios (all cities of the country). A fauna afetada tem a sigla_uf (state initials which the city belongs to) e nome_municipio (name of the city), as well as the Municipios table.
I'd like to create a measure counting the number of deaths (Situação_Int = 0 in table Fauna_Afetada, specify that the animal died) by city (table Municipios). The way to join these tables is by using two columns (the sigla_uf and nome_municipio), because each Municipio in Fauna_Afetada might appear several times. How do I do that?
I'd like to create a measure with the name of the city with the highest number of deaths.
Anyone could help?
The data model is below.

It seems it is a data modeling issue in the first galce,
The model is full of bidirectional relationships that lead to not expect calculations behaviors.
Can you share the file or part of it

Power BI Dashboard where the core filter condition is a disjunction on numeric fields

We are trying to implement a dashboard that displays various tables, metrics and a map where the dataset is a list of customers. The primary filter condition is the disjunction of two numeric fields. We want to the user to be able to select a threshold for [field 1] and a separate threshold for [field 2] and then impose the condition [field 1] >= <threshold> OR [field 2] >= <threshold>.
After that, we want to also allow various other interactive slicers so the user can restrict the data further, e.g. by country or account manager.
Power BI naturally imposes AND between all filters and doesn't have a neat way to specify OR. Can you suggest a way to define a calculation using the two numeric fields that is then applied as a filter within the same interactive dashboard screen? Alternatively, is there a way to first prompt the user for the two threshold values before the dashboard is displayed -- so when they click Submit on that parameter-setting screen they are then taken to the main dashboard screen with the disjunction already applied?
Added in response to a comment:
The data can be quite simple: no complexity there. The complexity is in getting the user interface to enable a disjunction.
Suppose the data was a list of customers with customer id, country, gender, total value of transactions in the last 12 months, and number of purchases in last 12 months. I want the end-user (with no technical skills) to specify a minimum threshold for total value (e.g. $1,000) and number of purchases (e.g. 10) and then restrict the data set to those where total value of transactions in the last 12 months > $1,000 OR number of purchases in last 12 months > 10.
After doing that, I want to allow the user to see the data set on a dashboard (e.g. with a table and a graph) and from there select other filters (e.g. gender=male, country=Australia).

The key here is to create separate parameter tables and combine conditions using a measure.
Suppose we have the following Sales table:
Customer Value Number
-----------------------
A 568 2
B 2451 12
C 1352 9
D 876 6
E 993 11
F 2208 20
G 1612 4
Then we'll create two new tables to use as parameters. You could do a calculated table like
Number = VALUES(Sales[Number])
Or something more complex like
Value = GENERATESERIES(0, ROUNDUP(MAX(Sales[Value]),-2), ROUNDUP(MAX(Sales[Value]),-2)/10)
Or define the table manually using Enter Data or some other way.
In any case, once you have these tables, name their columns what you want (I used MinNumber and MinValue) and write your filtering measure
Filter = IF(MAX(Sales[Number]) > MIN(Number[MinCount]) ||
MAX(Sales[Value]) > MIN('Value'[MinValue]),
1, 0)
Then put your Filter measure as a visual level filter where Filter is not 0 and use MinCount and MinValues column as slicers.
If you select 10 for MinCount and 1000 for MinValue then your table should look like this:
Notice that E and G only exceed one of the thresholds and tha A and D are excluded.

To my knowledge, there is no such built-in slicer feature in Power BI at the time being. There is however a suggestion in the Power BI forum that requests a functionality like this. If you'd be willing to use the Power Query Editor, it's easy to obtain the values you're looking for, but only for hard-coded values for your limits or thresh-holds.
Let me show you how for a synthetic dataset that should fit the structure of your description:
Dataset:
CustomerID,Country,Gender,TransactionValue12,NPurchases12
51,USA,M,3516,1
58,USA,M,3308,12
57,USA,M,7360,19
54,USA,M,2052,6
51,USA,M,4889,5
57,USA,M,4746,6
50,USA,M,3803,3
58,USA,M,4113,24
57,USA,M,7421,17
58,USA,M,1774,24
50,USA,F,8984,5
52,USA,F,1436,22
52,USA,F,2137,9
58,USA,F,9933,25
50,Canada,F,7050,16
56,Canada,F,7202,5
54,Canada,F,2096,19
59,Canada,F,4639,9
58,Canada,F,5724,25
56,Canada,F,4885,5
57,Canada,F,6212,4
54,Canada,F,5016,16
55,Canada,F,7340,21
60,Canada,F,7883,6
55,Canada,M,5884,12
60,UK,M,2328,12
52,UK,M,7826,1
58,UK,M,2542,11
56,UK,M,9304,3
54,UK,M,3685,16
58,UK,M,6440,16
50,UK,M,2469,13
57,UK,M,7827,6
Desktop table:
Here you see an Input table and a subset table using two Slicers. If the forum suggestion gets implemented, it should hopefully be easy to change a subset like below to an "OR" scenario:
Transaction Value > 1000 OR Number or purchases > 10 using Power Query:
If you use Edit Queries > Advanced filter you can set it up like this:
The last step under Applied Steps will then contain this formula:
= Table.SelectRows(#"Changed Type2", each [NPurchases12] > 10 or [TransactionValue12] > 1000
Now your original Input table will look like this:
Now, if only we were able to replace the hardcoded 10 and 1000 with a dynamic value, for example from a slicer, we would be fine! But no...
I know this is not what you were looking for, but it was the best 'negative answer' I could find. I guess I'm hoping for a better solution just as much as you are!

How to parse through a column in Pig to create additional columns

New Apache Pig user here. I basically have data in a format and need to split this into 6 columns to create my desired schema and then load into Pig for my existing script to run.
Sorry if the format below is untidy, i cant upload a picture due to reputation score.
Existing format has 3 columns
User-Equipment values::key:bytearray values:value:bytearray
user1-mobile 20130306-AC 9
user1-mobile 20130306-AT 21
user2-laptop 20130306-BC 0
Required format:
User Equipment Date Type "Count or Time" Value
user1 mobile 20130306 A C 9
user1 mobile 20130306 A T 21
Any suggestions on how to ge this done? IS there a regex I need to write?
The tricky thing here is all the columns have a delimiter (-) between them except "Type" and column "C or T"

If you don't have a common delimiter I can think of two possibilities:
You could implement your own LoadFunc as explained here: http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html
You could use REGEX_EXTRACT_ALL as explained here: Apache Pig: Extra query parameters from web log
Here you go for 2.:
A = LOAD 'abc.txt' AS (line:CHARARRAY);
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS (User:CHARARRAY,Equipment:CHARARRAY,Date:CHARARRAY,Type:CHARARRAY,CountorTime:CHARARRAY,Value:CHARARRAY);

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js