Postal codes for multiple countries - geocoding

I have a column that mixes German (5-digit) and Austrian (4-digit) postal codes. How can I make Tableau interpret both correctly?
The column reads something like
postal-code
53173
99848
1080
1030
I assigned a geo-role to a second column that distinguishes the countries:
country-code
DE
DE
AT
AT

I used the data that you provided and I had no issue with Tableau recognizing them correctly.
You may need to specifically set the Geographic role for each field, but that should be it.

What Nicarus said.
With this data:
postal-code country-code
53173 DE
99848 DE
1080 AT
1030 AT
You can do this; just make sure to change the geographic role for the postal code field.

PowerBI Matrix Visual - Replacing Blank with Zero (harder than I thought)

I am trying to replace blanks with zero in a matrix visual, but the traditional method of adding +0 is causing another problem. I have the case described below in detail. Thank you so much for any help anyone may be able to offer!
I have a (fictitious) company with 60 employees located in 5 regions (Midwest, Northeast, Pacific, and Southwest). Each employee holds an occupational type (such as chemist, auditor, geologist, truck driver, etc.). Across the entire company, there are 18 different occupational types.
Additionally, each region considers some of the occupations as critical and others as non-critical and the critical vs. non-critical occupation types vary by region. If the occupation is critical for a particular region, the occupational title (e.g. chemist) should appear in the visual and if the occupation is non-critical, the generic title ‘Non-Critical’ should appear instead of the occupational title.
To accomplish this, my PowerBI model has two related tables: the employee list (fact table, the "many" side) and the occupation list (dimension table, the "one" side). Each employee on the employee list has a match code that is related to the match code on the occupation list to determine whether the occupation is critical or non-critical for that employee’s region. If the occupation is critical, the related field (which will be used on the row field of the visual) will be the occupational title. If non-critical, the related field will be the generic title ‘Non-Critical’.
Here’s an example of three records from the employee list fact table:
Image A
And here’s an example of some records from the occupational list dimension table:
Image B
The purpose of the visual is to show the count of employees onboard at two points in time (called FY20 and FY21) by occupational type with a slicer to filter by region.
The employee count is produced using the measure =COUNTROWS('Employee List')
Everything works great at this point. Here is an example of the visual filtered to Midwest, which correctly shows the Midwest Region’s 10 critical occupations broken out by occupational title and the employee counts. (non-critical count also correctly shown)
Image C
And as a second example, here is the view filtered to the Pacific Region showing the Pacific’s 3 critical occupations (non-critical count also correctly shown):
Image D
My only goal with this visual is to display zero instead of a blank for those cases where there are no employees. When I modify the measure to:
=COUNTROWS('Employee List') + 0
I get the following result (filtering to Midwest for example):
Image E
So the formula did replace the blanks with zeros, but now all 18 of the company’s occupations are displayed and not just the 10 for the Midwest. The counts are still correct for the Midwest, but I only want to show the Midwest occupations as they appeared before I added +0 to the measure. If I simply filter them out at the visual level, they stay filtered when I switch to a region where they should be shown.
It seems the behavior is that a blank being replaced by a value (0) means that when there is a combination for which there is no data (such as Midwest/Chemist), the visual will still show 0 as a result for that combination.
I’m looking for anything I can do to replace blanks with zero without displaying the occupation types that don’t apply to the region. I would appreciate any assistance, as I’ve been thinking about this for hours and have hit a wall.
Thank you!
I suggest a measure of the following form, written verbosely:
# Employees w/ zeroes =
VAR _employees = [# Employees]
VAR _totalEmployees = CALCULATE ( [# Employees] , REMOVEFILTERS ( 'Employee List'[Year] ) )
RETURN
_employees + IF ( ISNUMBER ( _totalEmployees ) , 0 )
This first checks whether the occupation type has any employees in the current filter context once the filter on Year is removed, and only tacks on a zero if so. Otherwise the result stays blank and the row is hidden. The column specified in REMOVEFILTERS() must correspond to whatever you are using in your visualization, since it is what modifies the filter context.
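The same rule can be sketched outside of DAX to make the logic easier to follow. This is only an illustration in Python, not part of the Power BI model: the sample rows, the function name `counts_with_zeroes` and the column order are assumptions made up for the example.

```python
from collections import Counter

# Hypothetical sample rows: (region, occupation, fiscal year).
ROWS = [
    ("Midwest", "Chemist", "FY20"),
    ("Midwest", "Chemist", "FY20"),
    ("Midwest", "Auditor", "FY21"),
    ("Pacific", "Geologist", "FY20"),
]


def counts_with_zeroes(rows, region, occupations, years):
    """Return {(occupation, year): count}, using None for a blank cell."""
    in_region = [(occ, yr) for reg, occ, yr in rows if reg == region]
    per_cell = Counter(in_region)                           # like [# Employees]
    per_occupation = Counter(occ for occ, _ in in_region)   # year filter removed
    table = {}
    for occ in occupations:
        for yr in years:
            if per_occupation[occ] > 0:
                # The occupation exists in this region: show the count or 0.
                table[(occ, yr)] = per_cell[(occ, yr)]
            else:
                # No employees in any year: keep the cell blank.
                table[(occ, yr)] = None
    return table


midwest = counts_with_zeroes(
    ROWS, "Midwest", ["Chemist", "Auditor", "Geologist"], ["FY20", "FY21"]
)
```

Here Chemist/FY21 becomes 0 because chemists exist in the Midwest in another year, while Geologist stays blank because the Midwest has no geologists at all.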
It looks like a fairly simple (if possibly temporary) solution is available for this problem by using conditional/advanced filtering on the visual. I set the advanced filter to show when the value is not 0 and this seemed to take care of it. Thank you for the DAX code and I will explore those options as well.
Thanks again!

How to anonymize/mask part of string in PowerBI?

Say I am creating a pie chart for a customer called 'Air Holland'. For this customer I would like to show the overlap with other customers, including customers called 'Air Hungary', 'Air Ireland' and 'Air Iceland'. Due to my customers' privacy regulations I can only show partial names, e.g. the first three or four letters of each name. 'Air Holland' thus becomes 'Air xxxxxxx'.
To implement this now in my pie chart, I have created a new Column CustomerNameMasked that takes the customer name, and replaces all characters but the first four with an 'x'. Ideally I would like to use CustomerName as the Legend in my pie chart, and then the CustomerNameMasked as the label, such that the pie chart is created using CustomerName, but will show the masked names.
However, as far as I know such a label is not possible, so for now I have used CustomerNameMasked as my Legend column. But since these names are not unique (e.g. 'Air Hungary' and 'Air Holland' are both 'Air xxxxxxx' in the CustomerNameMasked column), different customers are merged into one slice.
Any ideas how to create unique masked customer names? Or another work-around to ensure that my pie chart correctly shows the data per customer, but the legend shows masked names?
One way of preventing anonymised names from being merged in visualisations is to make sure they are not the same.
Add a calculated column:
Anonymised = "Airline " & RANKX('MyTable','MyTable'[CustomerName],,ASC,Dense)
Result:
Airline 1
Airline 2
Airline 3
...
If you prefer x's:
Add an Anonymised_Name table:
Name Anonymised Name
"Air Holland" "Air xxxxxxx"
"Air Hungary" "Air xxxxxxx "
"Air Iceland" "Air xxxxxxx  "
Pad the names with a different number of "fake spaces" (non-breaking space, alt+0160 on the numpad) so that each one is unique and PowerBI does not trim the padding. Add a relationship and use this column in visualisations.
I prefer the previous option, as it makes it easier to distinguish and keep track of individual customers.
If you don't care whether the number of "x"s matches the real name:
Anonymised_Name_2 = "Air XXXXXXX" & REPT(" ",
RANKX('MyTable','MyTable'[CustomerName],,ASC,Dense))
(again, the character inside REPT must be the fake space, alt+0160)
Depending on what you do with your report, there is a significant risk of real customer names "leaking", so ideally you would want to anonymize your data before importing it.
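Anonymising before import could look roughly like the sketch below. It is a minimal Python illustration, assuming the customer names are available as a plain list before loading. The function name `anonymise` and the "Airline N" label are hypothetical, chosen to mirror the calculated column above.

```python
def anonymise(names, prefix="Airline"):
    """Map each distinct name to a unique label such as 'Airline 1'.

    Names are numbered in sorted order, similar in spirit to a dense
    ascending rank, so the same customer always gets the same label
    for a given set of names.
    """
    ordered = sorted(set(names))
    return {name: f"{prefix} {i}" for i, name in enumerate(ordered, start=1)}


customers = ["Air Holland", "Air Hungary", "Air Ireland", "Air Iceland", "Air Holland"]
mapping = anonymise(customers)
masked = [mapping[name] for name in customers]
```

Only the `masked` values would then be imported. The mapping itself would have to be stored somewhere outside the report if the real names are needed later.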

Tableau Custom Geocoding - Ambiguous cities

I have a lot of small German towns (<15,000 population) in my data set, and Tableau (Desktop 10.3 Pro Version) has no geographical coordinates for them.
For this reason I made a .csv file (CustomGeo.csv is attached) with the following rows: Country (Name), State/Province, City, Latitude, Longitude. Also I created a schema.ini with the following content:
[CustomGeo.csv]
ColNameHeader=True
DecimalSymbol=,
Format=Delimited(;)
Col1="Country (Name)" Text
Col2="State/Province" Text
Col3="City" Text
Col4="Latitude" Double
Col5="Longitude" Double
Now, after importing the csv file into Tableau (Map -> Geocoding -> Import Custom Geocoding), I have more than 600 ambiguous cities and I don't understand why. Even big cities like Stuttgart and München are tagged as ambiguous. Bigger cities in Germany can have more than one postcode (PLZ).
Selecting the State/Province field under Edit Locations does not change anything.
Here you can download the csv file and my data set with the columns relevant for geocoding [hosted on Google Drive]:
Custom Geocoding csv
Data set
Be careful if you open the csv file with Excel. Excel may change the column format, which would mess up the latitude and longitude data :)
I hope someone can help me with this problem. I do not know how to continue.
I have the solution:
If you use the column names (Country (Name), State/Province, City, Latitude, Longitude), you extend an existing geographic role. Tableau already has geo data for bigger cities (>15,000 population). So if you extend the existing role with all cities in Germany, your custom geocoding file contains those bigger cities as well. That is what causes the ambiguous-city error, and bigger cities like München (Munich) or Stuttgart cannot be displayed on the map.
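One possible follow-up, which the answer does not spell out, is to leave the cities Tableau already knows out of the custom file before importing it. The sketch below is only an assumption about how that could be done in Python. The set `ALREADY_KNOWN` is hypothetical (Tableau's built-in city list is not given here), and the sample rows, including the place name "Beispieldorf" and its coordinates, are made up for illustration.

```python
import csv
import io

# Made-up sample in the same layout as CustomGeo.csv
# (semicolon-delimited, decimal comma).
SAMPLE = """Country (Name);State/Province;City;Latitude;Longitude
Germany;Baden-Württemberg;Stuttgart;48,7758;9,1829
Germany;Bayern;München;48,1351;11,5820
Germany;Bayern;Beispieldorf;48,0000;11,0000
"""

# Hypothetical: cities assumed to be covered by the built-in geocoding.
ALREADY_KNOWN = {"Stuttgart", "München"}


def drop_known_cities(rows, known):
    """Keep only rows whose City is not in the set of known cities."""
    return [row for row in rows if row["City"] not in known]


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
kept = drop_known_cities(rows, ALREADY_KNOWN)
```

Reading the file with `csv` rather than Excel also leaves the decimal-comma coordinates untouched, since every value stays a string.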

Need to get country and state information based on city name

I have a list of more than 10,000 cities, and I need to find the corresponding state and country for each city. If there is a built-in service or web service for this, please let me know.
Your help will be appreciated.
Thank you in Advance.
Maybe this free database can help you:
http://www.maxmind.com/en/worldcities
Includes the following fields:
- Country Code
- ASCII City Name
- City Name
- Region
- Population
- Latitude
- Longitude
If you need the states, you can use the database from here: http://download.geonames.org/export/dump/ (the file cities1000.zip has a LOT of cities)
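A lookup over the GeoNames dump could be sketched as follows. This is a minimal Python example under a few assumptions: the unzipped file uses the tab-separated layout of the GeoNames "geoname" table (name in column 1, ASCII name in column 2, country code in column 8, admin1 code in column 10, counting from 0), and the sample rows built by the hypothetical helper `make_row` are invented purely so the example runs without the download.

```python
from collections import defaultdict

# Column positions (0-based) in the GeoNames "geoname" table layout.
NAME, ASCII_NAME, COUNTRY_CODE, ADMIN1_CODE = 1, 2, 8, 10


def build_lookup(lines):
    """Map lower-cased city name -> list of (country code, admin1 code)."""
    lookup = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        entry = (fields[COUNTRY_CODE], fields[ADMIN1_CODE])
        # Index by both spellings; the set avoids a double entry
        # when name and ASCII name are identical.
        for key in {fields[NAME].lower(), fields[ASCII_NAME].lower()}:
            lookup[key].append(entry)
    return lookup


def make_row(name, country, admin1):
    """Hypothetical helper: build a 19-column row with only the used fields."""
    fields = [""] * 19
    fields[NAME] = fields[ASCII_NAME] = name
    fields[COUNTRY_CODE] = country
    fields[ADMIN1_CODE] = admin1
    return "\t".join(fields)


sample = [make_row("Springfield", "US", "IL"), make_row("Springfield", "US", "MA")]
lookup = build_lookup(sample)
matches = lookup["springfield"]
```

With the real file you would pass `open("cities1000.txt", encoding="utf-8")` to `build_lookup`. A city name can match several places, so each name maps to a list. The admin1 value is a code rather than a state name; the same download directory has an `admin1CodesASCII.txt` file for translating it.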

How to parse through a column in Pig to create additional columns

New Apache Pig user here. My data is in one format, and I need to split it into 6 columns to match my desired schema and then load it into Pig so that my existing script can run.
Sorry if the format below is untidy; I can't upload a picture due to my reputation score.
Existing format has 3 columns
User-Equipment values::key:bytearray values:value:bytearray
user1-mobile 20130306-AC 9
user1-mobile 20130306-AT 21
user2-laptop 20130306-BC 0
Required format:
User Equipment Date Type "Count or Time" Value
user1 mobile 20130306 A C 9
user1 mobile 20130306 A T 21
Any suggestions on how to get this done? Is there a regex I need to write?
The tricky thing here is that all the columns have a delimiter (-) between them except "Type" and "Count or Time".
If you don't have a common delimiter I can think of two possibilities:
1. You could implement your own LoadFunc as explained here: http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html
2. You could use REGEX_EXTRACT_ALL as explained here: Apache Pig: Extra query parameters from web log
Here is an example of option 2:
A = LOAD 'abc.txt' AS (line:CHARARRAY);
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS (User:CHARARRAY,Equipment:CHARARRAY,Date:CHARARRAY,Type:CHARARRAY,CountorTime:CHARARRAY,Value:CHARARRAY);
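To check the pattern against a few sample lines before running the Pig job, the same regular expression can be tried in Python. This is just a quick sanity check, not part of the Pig script, and the helper name `split_line` is made up for the example.

```python
import re

# Same pattern as in the Pig statement; the hyphens need no escaping here.
PATTERN = re.compile(r"^(.+?)-(.+?)\s(.+?)-(.)(.)\s(.+)$")


def split_line(line):
    """Return the six captured fields as a tuple, or None if no match."""
    match = PATTERN.match(line)
    return match.groups() if match else None


fields = split_line("user1-mobile 20130306-AC 9")
```

The two single-character groups `(.)(.)` are what separate "Type" from "Count or Time", since those two columns have no delimiter between them.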