How to anonymize/mask part of string in PowerBI?

Say I am creating a pie chart for a customer called 'Air Holland'. For this customer I would like to show the overlap with other customers in a pie chart, including customers called 'Air Hungary', 'Air Ireland' and 'Air Iceland'. Due to my customers' privacy regulations I can only show partial names, e.g. the first three or four letters of their name. 'Air Holland' thus changes to 'Air xxxxxxx'.
To implement this in my pie chart, I have created a new column CustomerNameMasked that takes the customer name and replaces all characters but the first four with an 'x'. Ideally I would like to use CustomerName as the Legend in my pie chart and CustomerNameMasked as the label, so that the pie chart is built from CustomerName but shows the masked names.
However, as far as I know such a label is not possible, so for now I have used CustomerNameMasked as my Legend column. But since these names are not unique (e.g. 'Air Hungary' and 'Air Holland' are both 'Air xxxxxxx' in the CustomerNameMasked column), different customers are grouped together.
Any ideas how to create unique masked customer names? Or another work-around to ensure that my pie chart correctly shows the data per customer, but the legend shows masked names?

One way of preventing anonymised names from being merged in visualisations is to make sure they are not the same.
Add a calculated column:
Anonymised = "Airline " & RANKX('MyTable','MyTable'[CustomerName],,ASC,Dense)
Result:
Airline 1
Airline 2
Airline 3
...
If you prefer x's:
Add an Anonymised_Name table:
Name Anonymised Name
"Air Holland" "Air xxxxxxx"
"Air Hungary" "Air xxxxxxx "
"Air Iceland" "Air xxxxxxx  "
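Pad each masked name with a different number of "fake spaces" (non-breaking spaces, Alt+0160 on the numpad) so that the values differ; unlike ordinary spaces, PowerBI does not swallow them up. Add a relationship between this table and your data table, and use the Anonymised Name column in visualisations.
I prefer the first option (numbered names), as it makes it easier to distinguish and keep track of individual customers.
If you don't care whether the number of "x"s matches the real name: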
Anonymised_Name_2 = "Air XXXXXXX" & REPT(" ",
RANKX('MyTable','MyTable'[CustomerName],,ASC,Dense))
(again, the repeated character is a fake space, Alt+0160)
Depending on what you do with your report, there is a significant risk of real customer names "leaking", so ideally you would want to anonymize your data before importing it.
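Anonymizing before import could look roughly like the following Python sketch. It is only an illustration under assumptions: the function names `build_aliases` and `mask_with_padding` and the sample list are made up, and the numbering by alphabetical position only approximates the dense ascending rank used in the calculated columns above.

```python
def build_aliases(names, prefix="Airline "):
    """Map each distinct customer name to a unique alias such as 'Airline 1'.

    Aliases are numbered by the alphabetical position of the distinct names.
    """
    distinct = sorted(set(names))
    return {name: f"{prefix}{rank}" for rank, name in enumerate(distinct, start=1)}


def mask_with_padding(names, keep=4, pad="\u00a0"):
    """Replace all but the first `keep` characters with 'x', then append a
    different number of non-breaking spaces per distinct name so that masks
    that look identical are still distinct values."""
    distinct = sorted(set(names))
    masked = {}
    for rank, name in enumerate(distinct, start=1):
        visible = name[:keep]
        hidden = "x" * max(len(name) - keep, 0)
        masked[name] = visible + hidden + pad * rank
    return masked


# Hypothetical sample data, including a repeated customer.
customers = ["Air Holland", "Air Hungary", "Air Iceland", "Air Holland"]
aliases = build_aliases(customers)
masked = mask_with_padding(customers)
```

Only the alias or masked column would then be loaded into the report, with the real names kept in a lookup that stays outside it.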

Related

PowerBI Matrix Visual - Replacing Blank with Zero (harder than I thought)

I am trying to replace blanks with zero in a matrix visual, but the traditional method of adding +0 is causing another problem. I have the case described below in detail. Thank you so much for any help anyone may be able to offer!
I have a (fictitious) company with 60 employees located in 5 regions (Midwest, Northeast, Pacific, and Southwest). Each employee holds an occupational type (such as chemist, auditor, geologist, truck driver, etc.). Across the entire company, there are 18 different occupational types.
Additionally, each region considers some of the occupations as critical and others as non-critical and the critical vs. non-critical occupation types vary by region. If the occupation is critical for a particular region, the occupational title (e.g. chemist) should appear in the visual and if the occupation is non-critical, the generic title ‘Non-Critical’ should appear instead of the occupational title.
To accomplish this, my PowerBI model has two related tables: the employee list (fact table, the "many" side) and the occupation list (dimension table, the "one" side). Each employee on the employee list has a match code that is related to the match code on the occupation list to determine whether the occupation is critical or non-critical for that employee’s region. If the occupation is critical, the related field (which will be used on the row field of the visual) will be the occupational title. If non-critical, the related field will be the generic title ‘Non-Critical’.
Here’s an example of three records from the employee list fact table:
Image A
And here’s an example of some records from the occupational list dimension table:
Image B
The purpose of the visual is to show the count of employees onboard at two points in time (called FY20 and FY21) by occupational type with a slicer to filter by region.
The employee count is produced using the measure =COUNTROWS('Employee List')
Everything works great at this point. Here is an example of the visual filtered to Midwest, which correctly shows the Midwest Region’s 10 critical occupations broken out by occupational title and the employee counts. (non-critical count also correctly shown)
Image C
And as a second example, here is the view filtered to the Pacific Region showing the Pacific’s 3 critical occupations (non-critical also correctly shown):
Image D
My only goal with this visual is to display zero instead of a blank for those cases where there are no employees. When I modify the measure to:
=COUNTROWS('Employee List') + 0
I get the following result (filtering to Midwest for example):
Image E
So the formula did replace the blanks with zeros, but now all 18 of the company’s occupations are displayed and not just the 10 for the Midwest. The counts are still correct for the Midwest, but I only want to show the Midwest occupations as they appeared before I added +0 to the measure. If I simply filter the extra occupations out at the visual level, they stay filtered out when I switch to a region where they should appear.
It seems the behavior is that a blank being replaced by a value (0) means that when there is a combination for which there is no data (such as Midwest/Chemist), the visual will still show 0 as a result for that combination.
I’m looking for anything I can do to replace blanks with zero and not displace the occupation types that don’t apply for the region. I would appreciate any assistance as I’ve been thinking about this for hours and have hit a wall.
Thank you!

I suggest a measure of the following form, written verbosely:
# Employees w/ zeroes =
VAR _employees = [# Employees]
VAR _totalEmployees = CALCULATE ( [# Employees] , REMOVEFILTERS ( 'Employee List'[Year] ) )
RETURN
_employees + IF ( ISNUMBER ( _totalEmployees ) , 0 )
This first checks whether the occupation type has any employees in the current filter context once the filter on the year column is removed, and only tacks on a zero if so. The column specified in REMOVEFILTERS() ('Employee List'[Year] in this example) must correspond to whatever you are using in your visualization, since it is used to modify the filter context.
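The idea behind the measure can also be illustrated outside DAX. The following Python sketch is only an analogy with made-up rows and a hypothetical helper name: it returns a count (possibly 0) for an occupation that has employees in the selected region in at least one year, and returns None, standing in for BLANK, otherwise.

```python
# Hypothetical rows: (region, occupation, year). Not taken from the original data.
rows = [
    ("Midwest", "Chemist", "FY20"),
    ("Midwest", "Chemist", "FY21"),
    ("Midwest", "Auditor", "FY20"),
    ("Pacific", "Geologist", "FY21"),
]


def employees_with_zeroes(rows, region, occupation, year):
    """Count employees, showing 0 only if the occupation exists in the region
    in some year; return None (the analogue of BLANK) otherwise."""
    in_region = [r for r in rows if r[0] == region and r[1] == occupation]
    if not in_region:
        # No employees in any year: stay blank so the row is not displayed.
        return None
    # Employees exist in some year: a year without employees becomes an explicit 0.
    return sum(1 for r in in_region if r[2] == year)
```

With these rows, a Midwest auditor in FY21 yields 0, while a Midwest geologist yields None and would therefore not appear in the visual.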

It looks like a fairly simple (if possibly temporary) solution is available for this problem by using conditional/advanced filtering on the visual. I set the advanced filter to show when the value is not 0 and this seemed to take care of it. Thank you for the DAX code and I will explore those options as well.
Thanks again!

Trying to replace data based on attributes from two other columns

I need to change the inventory category for a couple of account numbers, but only for a couple of companies. The inventory category for these accounts is mapped based on the account number, but it needs to be changed specifically for just two companies. I've tried filtering by the company number and then using find/replace, which worked fine, but then I can't unfilter to bring back the rest of the companies. I can't change the category for those account numbers across the board, because it only differs for those two companies.

Lisa, Here's perhaps a simpler approach than where your current way is taking you.
If I begin with this table:
Then I add a column (Add Column -> Custom Column) with the following:
The formula uses an if statement to determine whether each row has a specific Account (Acct. 4) AND Company (Co. 8). If so, then 99 is returned as a new category value for that row of the new column. If not, then the original Inventory Category is returned as a value for that row of the new column. (Obviously, you would edit this formula accordingly, to support your account, company, and new inventory category values.)
Here's the result:
Then I could delete the original Inventory Category column and rename the remaining New Inventory Category column to Inventory Category.
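The conditional logic described above (Acct. 4 AND Co. 8 gives 99, anything else keeps its original category) can be sketched in Python as follows. This is an illustration only: the column names "Account" and "Company", the helper `recategorize` and the sample rows are assumptions, not the original table or formula.

```python
def recategorize(rows, account, company, new_category):
    """Return copies of the rows in which the category is replaced only where
    both the account and the company match; all other rows are unchanged."""
    result = []
    for row in rows:
        updated = dict(row)
        if row["Account"] == account and row["Company"] == company:
            updated["Inventory Category"] = new_category
        result.append(updated)
    return result


# Hypothetical sample rows standing in for the real table.
table = [
    {"Account": 4, "Company": 8, "Inventory Category": 10},
    {"Account": 4, "Company": 2, "Inventory Category": 10},
    {"Account": 5, "Company": 8, "Inventory Category": 20},
]
updated = recategorize(table, account=4, company=8, new_category=99)
```

Because every row is evaluated rather than filtered, no company disappears from the result, which is the problem the filter-then-replace approach ran into.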

How to edit the Query and remove Order By clause

I picked an entire table as a Data Source and picked my fields. The SQL of it returns as:
SELECT customer_id AS customer_id,
       country AS country,
       count(invoice_num) AS total_invoices
FROM sales
GROUP BY customer_id,
         country
ORDER BY total_invoices DESC
LIMIT 10000;
I do not want this ORDER BY total_invoices DESC as it is ruining the entire result. What should I do?

I think it always orders by the first metric. If you have multiple metrics, you can re-order them to change which one is used in the Order By clause.
Under Customization, the bar chart also has a "Sort Bars" option to sort the x-axis by label, which might work for you depending on what kind of result you're looking for.

Split a column of lists into multiple columns in PowerBI

I have imported a JSON file into PowerBI and it contains a column in which the values are of type "List". I am looking to expand that column into multiple columns.
Specifically, the data contains a Sprint Name, the start date and the end date of the sprint, along with some other values associated with each sprint.
Using "Expand to new rows" repeats each sprint once for every associated value, creating a table that looks like this:
Sprint Name    Value
JAN(S1Dev)     2019-01-01
JAN(S1Dev)     2019-01-13
JAN(S1Dev)     {attribute}
JAN(S1Dev)     {attribute}
JAN(S2Dev)     2019-01-14
JAN(S2Dev)     2019-01-31
JAN(S2Dev)     {attribute}
JAN(S2Dev)     {attribute}
FEB(S1Test)    2019-02-01
FEB(S1Test)    2019-02-15
...            ...
I would like to do something similar to the "expand" feature that creates a new column for each attribute rather than a new row. The current approach vastly increases the size of my table for no reason and makes the data practically unusable. Any help would be appreciated, cheers!

I have found a very simple solution to this, but as it took me some time to figure it out I will answer my own question instead of deleting it to help others in the future...
Upon importing the JSON data into PowerBI first select "Convert to Table" to view the data as a table with editable properties.
Next, click the arrows pointing away from each other at the top of the column of Lists, and select "Extract Values".
Select a delimiter to use for concatenating the values. I am choosing a comma since I know that the data contained within the list does not have any commas in it. If your data does contain commas, choose a delimiter that never appears in your values.
It should now display a comma-separated list where it previously displayed "List" in orange text.
Now, right-click on the column and select "Split Column" then choose "By Delimiter"
Select the delimiter that you previously chose, and under "split at" select "Each occurrence of the delimiter" then click OK.
Your column should now be split into multiple columns based on the list!
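For comparison, the end result of these steps (one row per sprint, one column per list item) can be sketched in Python. This is a rough illustration, not what PowerBI runs internally: the helper `lists_to_columns`, the key "Values" and the sample records are assumptions.

```python
def lists_to_columns(records, list_key, prefix="Value"):
    """Spread each record's list into numbered columns (Value.1, Value.2, ...),
    keeping one row per record. Shorter lists are padded with None."""
    width = max((len(r[list_key]) for r in records), default=0)
    table = []
    for record in records:
        # Copy every field except the list itself.
        row = {k: v for k, v in record.items() if k != list_key}
        values = record[list_key]
        for i in range(width):
            row[f"{prefix}.{i + 1}"] = values[i] if i < len(values) else None
        table.append(row)
    return table


# Hypothetical records shaped like the sprint data in the question.
sprints = [
    {"Sprint Name": "JAN(S1Dev)", "Values": ["2019-01-01", "2019-01-13"]},
    {"Sprint Name": "JAN(S2Dev)", "Values": ["2019-01-14", "2019-01-31", "extra"]},
]
wide = lists_to_columns(sprints, "Values")
```

Since the list items are placed into columns directly, this sketch avoids the join-then-split round trip, so values that happen to contain a comma would not be broken apart.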

Power BI display one value

I am using Power BI to bring together data from several systems and display a dash board with data from all of the systems.
The dashboard has a couple of filters which are then used to display the data relating to one object across all systems.
When the dashboard is first loaded and none of the filters have been selected, the data cards display information from all rows in the table.
Is there a way to make a data card display only one row of data, or be blank if there is more than one row of data?

There's no direct way to look at the number of rows in the visual, count them, and do something different if there's more than 1.
That said, there are a few things you can do.
HASONEFILTER
If you have a specific column in your table that, when selected, filters your results to a single row, then you can check if there's a filter on that column using HASONEFILTER. (If you have multiple alternative columns, any of which filters to a single row, that's ok too.)
You could then create a measure for each column that tests HASONEFILTER. If true, return the MAX of the column. (The reason for MAX is because measures always have to aggregate, but the MAX of a 1-row column will be the same as the value in that column.) If false, return either BLANK() or an empty string, depending on your preference.
E.g.
ColumnAMeasure = IF(HASONEFILTER(Sheet1[Slicer Column]),MAX(Sheet1[COLUMN A]), "")
ColumnBMeasure = IF(HASONEFILTER(Sheet1[Slicer Column]),MAX(Sheet1[COLUMN B]), "")
where Sheet1 is the name of the table and "Slicer Column" is the name of the column being used as a slicer
HASONEVALUE
If you have multiple columns that could be used as filters in combination (meaning that having a filter applied on "Slicer Column" doesn't guarantee only 1 row in the table), then rather than testing HASONEFILTER, you can test HASONEVALUE.
ColumnAMeasure = IF(HASONEVALUE(Sheet1[COLUMN A]),MAX(Sheet1[COLUMN A]), "")
ColumnBMeasure = IF(HASONEVALUE(Sheet1[COLUMN B]),MAX(Sheet1[COLUMN B]), "")
Notice that HASONEVALUE tests the current column you're trying to display, rather than a slicer column like HASONEFILTER.
One side-effect of HASONEVALUE is that, if you're filtered to 3 rows, but all 3 rows have the same value for column A, then column A will display that value. (Whereas with HASONEFILTER, column A would stay blank until you're filtered to one thing.)
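The HASONEVALUE behaviour, including that side-effect, can be mimicked in a few lines of Python. This is only an analogy: `single_value_or_blank` is a hypothetical helper, and the list passed in stands for the column values that remain visible after filtering.

```python
def single_value_or_blank(values, blank=""):
    """Return the value if the visible rows share exactly one distinct value,
    otherwise return the blank placeholder (a HASONEVALUE-style test)."""
    distinct = set(values)
    return next(iter(distinct)) if len(distinct) == 1 else blank


# Three visible rows that all hold the same value still display that value.
same = single_value_or_blank(["A", "A", "A"])
# Two different values among the visible rows give a blank.
mixed = single_value_or_blank(["A", "B"])
```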
Low Tech
Both answers above depend on a measure existing for every column you want to display, so that you can test whether to display a blank row or not. That could become a pain if you have dozens of columns.
A lower-tech alternative is to add in an additional row with blanks for each column and then sort your table so that that row always appears first. (And shorten your visual so only the top row is visible.) Technically the other rows would be underneath and there'd be a scrollbar, but at least the initial display would be blank rather than showing a random row.
Hopefully something here has helped. Other people might have better solutions too. More information:
HASONEFILTER documentation: https://msdn.microsoft.com/en-us/library/gg492135.aspx
HASONEVALUE documentation: https://msdn.microsoft.com/en-us/library/gg492190.aspx
