I have a dashboard in Power BI where I want to group the countries by their continent name in a bar chart.
Currently, when I do it, I get the result below.
Expected output:
Any idea how I can achieve this?
This is my data:
Continent | Country | TotalSales
Africa | Ghana | 7612491.751
Africa | Nigeria | 14124361.42
Africa | South Africa | 5112305.914
Asia | China | 17817372.96
Asia | India | 7641389.641
Australia/Oceania | Australia | 12740363.52
Europe | France | 15415410.76
Europe | Germany | 12750071.97
Europe | Turkey | 6382936.304
Europe | United Kingdom | 23096905.81
North America | Canada | 8812713.914
North America | United States | 11517603.12
South America | Brazil | 10218528.38
You can put both Continent and Country in the Axis box and drill down, but for some reason Power BI only lets you turn off Concatenate labels on a horizontal bar chart.
Right now, I have a view with a mess of common, conditional string replacements and substitutions for an open text field (in this example, regional classification).
(Please ignore the accuracy of geography, I'm just working with historical standard assignments. Also, I know I could speed things up with REPLACE or even just cleaning the RegEx statements for lookback - I'm just asking about the variable/nesting here.)
CREATE OR REPLACE FUNCTION public.region_cleanup(record_region text)
RETURNS text
LANGUAGE sql
STRICT
AS $function$
SELECT REGEXP_REPLACE(
         REGEXP_REPLACE(
           REGEXP_REPLACE(
             REGEXP_REPLACE(
               REGEXP_REPLACE(
                 REGEXP_REPLACE(record_region, '(NORTH AMERICA\s\-\sUSA\s\-\sUSA)', 'USA')
               , 'Rest\sof\sthe\sWorld\s\-\s', '')
             , 'NORTH\sAMERICA\s\-\sCANADA', 'NORTH AMERICA - Canada')
           , '\&\;', '&')
         , 'Georgia\s\-\sGeorgia', 'MIDDLE EAST - Georgia')
       , 'EUROPE - Turkey', 'MIDDLE EAST - Turkey')
$function$;
A sample output of this function on my dataset, pulling out only the affected records (some are already in the correct format):
record_region_input | record_region_output
NORTH AMERICA - USA - USA - NORTHEAST - Massachusetts - Boston Metro | USA - NORTHEAST - Massachusetts - Boston Metro
NORTH AMERICA - USA - USA - MIDATLANTIC - Virginia | USA - MIDATLANTIC - Virginia
Rest of the World - ASIA - Thailand | ASIA - Thailand
Rest of the World - EUROPE - Portugal | EUROPE - Portugal
Rest of the World - ASIA - China - Shanghai Metro | ASIA - China - Shanghai Metro
Georgia - Georgia | MIDDLE EAST - Georgia
This is... fine. Regex is needed since there's tons of variability on what may come before or after these strings, and I have a proper validation list elsewhere. This is just a bulk scrub of common historical naming issues.
The problem comes when I have hundreds of these kinds of "known substitutions" (100+) for things like company naming or cross-department standards. Having dozens and dozens of nested REGEXP_REPLACE( calls makes editing, adding, or dropping anything a maddening game of counting.
I'm trying to clean data within Postgres exclusively, since my current pipeline doesn't always allow for standardization prior to upload. I know how I'd tackle this cleanly outside of pure SQL, but in a 'vanilla' PostgreSQL instance (v12+) is there a better method for transforming strings for a view?
Updated with a sample input/output table using the example function.
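One way to avoid the nesting is to keep the substitutions as data (ordered pattern/replacement pairs) and apply them one after another, so adding, dropping, or reordering a rule means editing one row instead of counting parentheses. In Postgres itself, the analogous approach would be a rules table with a position column that a PL/pgSQL function walks through in order with a FOR loop. The sketch below shows the data-driven idea in Python purely for illustration; the `RULES` list is hypothetical, copied from `region_cleanup` above, and Python's `re` syntax is close to, but not identical to, Postgres regular expressions.

```python
import re
from functools import reduce

# Hypothetical rules list: the six substitutions from region_cleanup, kept as
# data in the order they are applied (innermost REGEXP_REPLACE first).
RULES = [
    (r"(NORTH AMERICA\s\-\sUSA\s\-\sUSA)", "USA"),
    (r"Rest\sof\sthe\sWorld\s\-\s", ""),
    (r"NORTH\sAMERICA\s\-\sCANADA", "NORTH AMERICA - Canada"),
    (r"\&\;", "&"),
    (r"Georgia\s\-\sGeorgia", "MIDDLE EAST - Georgia"),
    (r"EUROPE - Turkey", "MIDDLE EAST - Turkey"),
]

COMPILED_RULES = [(re.compile(pattern), replacement) for pattern, replacement in RULES]


def region_cleanup(record_region):
    # Mirror the STRICT function: NULL (None) in, NULL out
    if record_region is None:
        return None
    # Feed the output of each rule into the next one. count=1 mirrors
    # regexp_replace without the 'g' flag, which replaces only the first match.
    return reduce(
        lambda text, rule: rule[0].sub(rule[1], text, count=1),
        COMPILED_RULES,
        record_region,
    )


print(region_cleanup("NORTH AMERICA - USA - USA - MIDATLANTIC - Virginia"))
```

Because the rules run in list order, a rule can rely on an earlier one: "Rest of the World - EUROPE - Turkey" first loses its prefix and then matches the Turkey rule.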
If you split the string into its individual region parts, replacing regions may become easier for you. For example:
with tb as (
select 1 as id, 'NORTH AMERICA - USA - USA - NORTHEAST - Massachusetts - Boston Metro' as record_region_input
union all
select 2 as id, 'NORTH AMERICA - USA - USA - MIDATLANTIC - Virginia'
union all
select 3 as id, 'Rest of the World - ASIA - China - Shanghai Metro'
)
select * from (
select distinct tb.id, unnest(string_to_array(record_region_input, ' - ')) as region from tb
order by tb.id
) a1 where a1.region not in ('NORTH AMERICA', 'Rest of the World');
-- Result:
1 Boston Metro
1 Massachusetts
1 NORTHEAST
1 USA
2 MIDATLANTIC
2 USA
2 Virginia
3 ASIA
3 China
3 Shanghai Metro
After that, you can use DISTINCT to drop duplicate regions, NOT IN to drop unnecessary regions, LIKE '%ASIA%' to find all regions that contain ASIA, and so on. Once all of that processing is done, you can merge the corrected parts back into a single string. Example:
with tb as (
select 1 as id, 'NORTH AMERICA - USA - USA - NORTHEAST - Massachusetts - Boston Metro' as record_region_input
union all
select 2 as id, 'NORTH AMERICA - USA - USA - MIDATLANTIC - Virginia'
union all
select 3 as id, 'Rest of the World - ASIA - China - Shanghai Metro'
)
select a1.id, string_agg(a1.region, ' - ') from (
select distinct tb.id, unnest(string_to_array(record_region_input, ' - ')) as region from tb
order by tb.id
) a1 where a1.region not in ('NORTH AMERICA', 'Rest of the World')
group by a1.id
-- Result:
1 Boston Metro - Massachusetts - NORTHEAST - USA
2 MIDATLANTIC - USA - Virginia
3 ASIA - China - Shanghai Metro
This is a simple idea, but it may help you replace regions.
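One caveat with the SQL above: string_agg without an ORDER BY does not guarantee the original order of the parts (in the result shown they come back sorted, e.g. "Boston Metro - Massachusetts - NORTHEAST - USA"). In Postgres, unnest ... WITH ORDINALITY together with string_agg(... ORDER BY ...) could be used to keep the original positions. The same split, filter, deduplicate, and rejoin idea is sketched below in Python for illustration; `clean_region` and its `drop` parameter are hypothetical names, and deduplication keeps the first occurrence of each part in its original position.

```python
def clean_region(record_region, drop=("NORTH AMERICA", "Rest of the World")):
    # Split "A - B - C" into its parts
    parts = record_region.split(" - ")
    # dict.fromkeys removes duplicates while keeping first-occurrence order;
    # then drop the parts we never want in the output
    kept = [part for part in dict.fromkeys(parts) if part not in drop]
    # Merge the corrected parts back into one string
    return " - ".join(kept)


print(clean_region("NORTH AMERICA - USA - USA - NORTHEAST - Massachusetts - Boston Metro"))
```

Note that blanket deduplication is not a complete replacement for the substitution rules: "Georgia - Georgia" becomes "Georgia", not the "MIDDLE EAST - Georgia" the question expects.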
I have a table in text form that I want to read into pandas.
I can use \n to separate the rows, but how can I separate the columns? They are in the format: 2 text fields, then 6 numeric fields.
Is there a method using regex or similar?
table_text = '''Name AIC sector Price (last close) Price (bid) Price (offer) NAV Total assets (£m) Market cap (£m)
3i Infrastructure Plc Infrastructure GBX 296.00 2.96 2.96 254.50 2,268.700 2,638.645
Aberdeen Asian Income Fund Limited Asia Pacific Income GBX 227.50 2.26 2.29 252.51 479.110 399.796
Aberdeen Diversified Income & Growth Ord Flexible Investment GBX 95.20 0.95 0.96 115.34 379.030 294.985
Aberdeen Emerging Markets Investment Company Limited Global Emerging Markets GBX 704.00 6.98 7.10 829.47 391.268 323.595
Aberdeen Japan Investment Trust Plc Japan GBX 712.50 7.00 7.25 784.79 114.957 94.198
Aberdeen Latin American Income Latin America GBX 57.00 0.54 0.57 62.13 40.985 32.555
Aberdeen New Dawn Asia Pacific GBX 322.00 3.22 3.26 365.56 431.544 350.752
Aberdeen New India Investment Trust Plc India GBX 516.00 5.16 5.18 601.47 375.170 301.268
Aberdeen New Thai Investment Trust Plc Country Specialist GBX 445.00 4.40 4.50 516.30 92.585 71.180
Aberdeen Smaller Companies Income Trust UK Smaller Companies GBX 358.00 3.56 3.60 397.45 95.028 79.153
Aberdeen Standard Asia Focus 2025 CULS Asia Pacific Smaller Companies GBX 100.95 1.01 1.01 97.25 391.484 37.026
Aberdeen Standard Asia Focus PLC Asia Pacific Smaller Companies GBX 1,280.00 12.75 13.00 1,440.65 483.841 402.730
Aberdeen Standard Equity Inc Trust plc UK Equity Income GBX 353.00 3.50 3.56 379.60 203.368 170.598
Aberdeen Standard European Logistics Income PLC Property - Europe GBX 116.00 1.15 1.16 117.82 309.808 305.022
Aberforth Smaller Companies Trust Plc UK Smaller Companies GBX 1,496.00 14.94 15.00 1,613.41 1,513.467 1,327.297
Aberforth Split Level Income Trust Plc UK Smaller Companies GBX 80.10 0.80 0.81 91.46 228.143 152.390
Aberforth Split Level Income ZDP 2024 UK Smaller Companies GBX 111.50 1.10 1.13 113.83 227.713 53.032
Acorn Income Fund Ltd UK Equity & Bond Income GBX 351.00 3.46 3.56 415.97 100.206 55.517
Acorn Income Fund ZDP 2022 UK Equity & Bond Income GBX 161.00 1.61 1.61 162.09 34.413 34.182
AEW UK REIT Ord Property - UK Commercial GBX 92.40 0.92 0.92 97.85 194.107 146.384'''
import pandas as pd
df = pd.DataFrame([x.split(';') for x in table_text.split('\n')])
print(df)
Outputs:
0
0 Name AIC sector Price (last close) Price (bid)...
1 3i Infrastructure Plc Infrastructure GBX 296...
2 Aberdeen Asian Income Fund Limited Asia Paci...
3 Aberdeen Diversified Income & Growth Ord Fle...
4 Aberdeen Emerging Markets Investment Company...
5 Aberdeen Japan Investment Trust Plc Japan GB...
6 Aberdeen Latin American Income Latin America...
7 Aberdeen New Dawn Asia Pacific GBX 322.00 3....
8 Aberdeen New India Investment Trust Plc Indi...
9 Aberdeen New Thai Investment Trust Plc Count...
10 Aberdeen Smaller Companies Income Trust UK S...
11 Aberdeen Standard Asia Focus 2025 CULS Asia ...
12 Aberdeen Standard Asia Focus PLC Asia Pacifi...
13 Aberdeen Standard Equity Inc Trust plc UK Eq...
14 Aberdeen Standard European Logistics Income ...
15 Aberforth Smaller Companies Trust Plc UK Sma...
16 Aberforth Split Level Income Trust Plc UK Sm...
17 Aberforth Split Level Income ZDP 2024 UK Sma...
18 Acorn Income Fund Ltd UK Equity & Bond Incom...
19 Acorn Income Fund ZDP 2022 UK Equity & Bond ...
20 AEW UK REIT Ord Property - UK Commercial GBX...
EDIT:
This is my hacky way of doing it, though it relies on there being a currency column populated with "GBX".
I would welcome any ideas on better ways of doing this.
Is there a regex way of finding three capital letters preceded by a space and followed by a space and then a number? That would find the currency without hardcoding "GBX".
def convert_rows(df):
    sector_name = "GBX"
    for index, row in df.iterrows():
        if sector_name in row[0]:
            # Text before the currency code, numbers after it
            name = row[0].split(sector_name)[0]
            numbers = row[0].split(sector_name)[1]
            # .loc (rather than .at) because some assignments target several columns
            df.loc[index, 'Name'] = name
            df.loc[index, 'AIC sector'] = sector_name
            df.loc[index, ['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
    return df

df = convert_rows(df)
You could try this:
import re

def convert_rows(df):
    for index, row in df.iterrows():
        # Look for a 3-letter code followed by a space and a number, e.g. "GBX 296.00"
        sector_name = re.match(r".+\s([A-Z]{3})\s\d+.+", row[0])
        if sector_name:
            sector_name = sector_name.group(1)  # GBX for instance
            name = row[0].split(sector_name)[0]
            numbers = row[0].split(sector_name)[1]
            df.loc[index, 'Name'] = name
            df.loc[index, 'AIC sector'] = sector_name
            df.loc[index, ['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
    return df
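An alternative is to match each whole line with one anchored regular expression: free text, then a 3-letter currency code, then exactly six numbers up to the end of the line. Anchoring on the six trailing numbers means tokens such as "ZDP 2024" inside a fund name cannot be mistaken for the currency, and it avoids hardcoding "GBX". This is a sketch, not the answer's method; the column labels "Name and AIC sector" and "Currency" are hypothetical, and the name and sector stay in one column because nothing in the text delimits them (splitting them would need, for example, a list of known sectors).

```python
import re

# Hypothetical labels for the first two columns; the six numeric headers
# are taken from the question.
COLUMNS = ["Name and AIC sector", "Currency", "Price (last close)", "Price (bid)",
           "Price (offer)", "NAV", "Total assets (£m)", "Market cap (£m)"]

# One number such as 296.00 or 2,268.700 (thousands separators allowed)
NUM = r"([\d,]+(?:\.\d+)?)"
# Free text, a 3-letter currency code, then exactly six numbers to end of line
ROW_RE = re.compile(r"^(.+?)\s+([A-Z]{3})\s+" + r"\s+".join([NUM] * 6) + r"\s*$")


def parse_rows(table_text):
    rows = []
    for line in table_text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue  # ignore blank lines
        match = ROW_RE.match(line)
        if match is None:
            raise ValueError(f"Unrecognised row: {line!r}")
        text, currency, *numbers = match.groups()
        # Strip thousands separators so the values become real floats
        rows.append([text, currency] + [float(n.replace(",", "")) for n in numbers])
    return rows
```

The result can then be loaded with pd.DataFrame(parse_rows(table_text), columns=COLUMNS), which gives numeric dtypes for the six price and size columns instead of strings.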
I need some help with reshaping some data into groups. The variables are country1, country2, and samegroup, which indicates whether the two countries are in the same group (continent). The original data I have is something like this:
country1 | country2 | samegroup
China | Vietnam | 1
France | Italy | 1
Brazil | Argentina | 1
Argentina | Brazil | 1
Australia | US | 0
US | Australia | 0
Vietnam | China | 1
Vietnam | Thailand | 1
Thailand | Vietnam | 1
Italy | France | 1
And I would like the output to be this:
country | group
China | 1
Vietnam | 1
Thailand | 1
Italy | 2
France | 2
Brazil | 3
Argentina | 3
Australia | 4
US | 5
My first instinct would be to sort the initial data by "samegroup", then reshape (long to wide). But that doesn't quite solve the issue and I'm not sure how to continue from there. Any help would be greatly appreciated!
Unless you have a non-standard definition of continent, it is much easier to use kountry (which you will probably have to install) than reshape or repeated merges:
clear
input str12 country1 str12 country2 byte samegroup
China Vietnam 1
France Italy 1
Brazil Argentina 1
Argentina Brazil 1
Australia US 0
US Australia 0
Vietnam China 1
Vietnam Thailand 1
Thailand Vietnam 1
Italy France 1
end
capture net install dm0038_1
kountry country1, from(other) geo(marc) marker
rename (country1 GEO) (country group)
sort group country
capture ssc install sencode
sencode group, replace // or use recode here
keep country group
duplicates drop
list, clean noobs
label list group
This will produce
. list, clean noobs
country group
China Asia
Thailand Asia
Vietnam Asia
Australia Australasia
France Europe
Italy Europe
US North America
Argentina South America
Brazil South America
. label list group
group:
1 Asia
2 Australasia
3 Europe
4 North America
5 South America
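The kountry approach relies on a continent lookup. If the groups have to come only from the samegroup pairs themselves, the pairs can be treated as edges of a graph and the groups as its connected components. The sketch below uses a union-find (disjoint-set) structure, a swapped-in technique shown in Python rather than Stata for illustration; `group_countries` is a hypothetical name. Numbering groups by each country's first appearance in the data reproduces the expected numbering in the question, and rows with samegroup == 0 only register the countries without linking them.

```python
def group_countries(pairs):
    """Map each country to a group number from (country1, country2, samegroup) rows."""
    parent = {}

    def find(country):
        # Follow parent links to the group's representative, halving the path as we go
        parent.setdefault(country, country)
        while parent[country] != country:
            parent[country] = parent[parent[country]]
            country = parent[country]
        return country

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    order = []  # countries in order of first appearance
    for country1, country2, samegroup in pairs:
        for country in (country1, country2):
            if country not in parent:
                order.append(country)
                find(country)  # registers the country as its own group
        if samegroup == 1:
            union(country1, country2)

    group_number = {}  # representative -> 1, 2, ...
    result = {}
    for country in order:
        root = find(country)
        group_number.setdefault(root, len(group_number) + 1)
        result[country] = group_number[root]
    return result


rows = [("China", "Vietnam", 1), ("France", "Italy", 1), ("Brazil", "Argentina", 1),
        ("Argentina", "Brazil", 1), ("Australia", "US", 0), ("US", "Australia", 0),
        ("Vietnam", "China", 1), ("Vietnam", "Thailand", 1), ("Thailand", "Vietnam", 1),
        ("Italy", "France", 1)]
groups = group_countries(rows)
```

Thailand ends up in group 1 even though it is never paired with China directly, because both are linked through Vietnam.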
I am trying to filter with a DAX measure in Power BI. I have a list of countries, but in my DAX formula I want to return only United Kingdom and France:
Country
United Kingdom
France
Germany
Turkey
South Africa
Ghana
Nigeria
Australia
New Zealand
Fiji
Solomon Islands
Canada
United States
India
Mexico
Brazil
China
My DAX is:
ListCountry = CALCULATE(MAX(Orders[Country]),FILTER(Orders,Orders[Country]="France" || Orders[Country] ="United Kingdom"))
When I test it, it returns only United Kingdom,
but what I want is to display:
United Kingdom
France
It returns only United Kingdom because you are calculating the MAX value (MAX(Orders[Country])). In this case, the filter returns both France and United Kingdom, and MAX then reduces them to a single value, the one that sorts last alphabetically: United Kingdom. The filter on its own returns what you expect, for example as a calculated table:
Table = FILTER(Orders, Orders[Country] = "France" || Orders[Country] = "United Kingdom")
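The difference between filtering and then aggregating can be illustrated outside DAX. The Python sketch below is only an analogy (`filter_then_max` is a hypothetical helper): the filter step keeps both countries, and wrapping the result in a maximum collapses them to the one that sorts last. Python's `max` compares strings by code point, whereas DAX orders text with its own collation rules, so the analogy only holds for simple values like these.

```python
def filter_then_max(countries, wanted):
    # Like FILTER(Orders, ...): keep every row whose country is in `wanted`
    kept = [c for c in countries if c in wanted]
    # Like wrapping the result in MAX(...): a single value, the one that sorts last
    return kept, max(kept)


countries = ["United Kingdom", "France", "Germany", "Turkey", "South Africa"]
kept, biggest = filter_then_max(countries, {"France", "United Kingdom"})
print(kept)     # ['United Kingdom', 'France']
print(biggest)  # United Kingdom
```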
I have data like this along with other columns in a pandas df.
Apologies, I haven't figured out how to present the question with code for the DataFrame yet (this is my first post).
Location:
- Tokyo, Japan
- Sacramento, USA
- Mexico City, Mexico
- Mexico City, Mexico
- Colorado Springs, USA
- New York, USA
- Chicago, USA
Does anyone know how I could isolate the country name from the location and create a new column with just the Country Name?
Try this:
In [29]: pd.DataFrame(df.Location.str.split(', ', n=1).tolist(), columns=['City', 'Country'])
Out[29]:
City Country
0 Tokyo Japan
1 Sacramento USA
2 Mexico City Mexico
3 Mexico City Mexico
4 Colorado Springs USA
5 Seoul South Korea
You can do this without any regular expressions: use String.indexOf(", ") to find the position of the separator in the string, and then use String.substring to cut the string down to just the part you need.
However, a regular expression can also do this easily, although it would likely be slower.
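For the original question (a new column holding just the country), a minimal Python sketch of the no-regex approach is to split once at the last comma and strip the spaces. It assumes the country is always the last comma-separated part, as in every example in the question; `country_from_location` is a hypothetical helper, and returning None for missing values or strings without a comma is a choice made for this sketch.

```python
def country_from_location(location):
    # Assumption: the country is the last comma-separated part ("Mexico City, Mexico").
    # Missing values and strings without a comma return None (a sketch choice).
    if not isinstance(location, str) or "," not in location:
        return None
    # rsplit with maxsplit=1 splits only at the last comma, so any earlier
    # commas stay with the city part; strip() removes the leading space
    return location.rsplit(",", 1)[1].strip()


locations = ["Tokyo, Japan", "Sacramento, USA", "Mexico City, Mexico", "Colorado Springs, USA"]
print([country_from_location(loc) for loc in locations])
# ['Japan', 'USA', 'Mexico', 'USA']
```

With pandas this could be applied as df['Country'] = df['Location'].map(country_from_location); a vectorised alternative is df['Location'].str.rsplit(',', n=1).str[-1].str.strip(), which returns the whole string when there is no comma.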