I would like to create a matrix visual like below and add data bars as conditional formating to the "Sales Percentage" Column with different user defined max and min values based on the countries.
I have the following dummy data
Salesperson
Country
Product
Sales Percentage
Total Sales
Gina
Canada
City Bike
0.02
232
Gina
Canada
Mountain Bike
0.56
2800
Gina
Italy
City Bike
0.32
213
Gina
Italy
Mountain Bike
0.21
1050
Gina
USA
City Bike
0.11
122
Gina
USA
Mountain Bike
0.43
2150
John
Canada
City Bike
0.32
333
John
Canada
Mountain Bike
0.34
442
John
Italy
City Bike
0.12
2132
John
Italy
Mountain Bike
0.67
1233
John
USA
City Bike
0.22
3300
John
USA
Mountain Bike
0.45
7300
Mary
Canada
City Bike
0.21
121
Mary
Canada
Mountain Bike
0.53
2650
Mary
Italy
City Bike
0.32
213
Mary
Italy
Mountain Bike
0.12
600
Mary
USA
City Bike
0.11
123
Mary
USA
Mountain Bike
0.12
600
The matrix looks like this after showing columns as rows and putting "Sales Percentage" and "Total Sales" as values, Country as columns and Product + Salesperson as rows:
I can add databars when I right click the Sales Percentage under values but I can only enter one user defined min and max value for the whole "Sales Percentage" column. Is it possible to have different maximum value for data bars based on the Country? For example to create a target value of 35% for Canada, 40% for USA and 50% for Italy. So in other words the data bar would be full when the Sales Percentage for Canada reaches 35% and full when Sales Percentage for USA reaches 40% and so on.
This isn't possible with you current setup. The best you could do to approximate this is as follows.
Create a measure as follows:
% Canada = CALCULATE(SUM('Table'[Total Sales]), 'Table'[Country ] = "Canada")
Do the same for USA and Italy and then add them as values to your matrix.
You can now select individual targets for each country.
Related
I need to create a matrix in the following format The total sales and percentage sales below each other:
This is why I have created a table with data like this:
Salesperson
Country
Sales
Product
Format
John
USA
0.45
Mountain Bike
Percentage
John
Canada
0.34
Mountain Bike
Percentage
John
Italy
0.67
Mountain Bike
Percentage
Gina
USA
0.43
Mountain Bike
Percentage
Gina
Canada
0.56
Mountain Bike
Percentage
Gina
Italy
0.21
Mountain Bike
Percentage
Mary
USA
0.12
Mountain Bike
Percentage
Mary
Canada
0.53
Mountain Bike
Percentage
Mary
Italy
0.12
Mountain Bike
Percentage
John
USA
0.22
City Bike
Percentage
John
Canada
0.32
City Bike
Percentage
John
Italy
0.12
City Bike
Percentage
Gina
USA
0.11
City Bike
Percentage
Gina
Canada
0.02
City Bike
Percentage
Gina
Italy
0.32
City Bike
Percentage
Mary
USA
0.11
City Bike
Percentage
Mary
Canada
0.21
City Bike
Percentage
Mary
Italy
0.32
City Bike
Percentage
John
USA
2250
Mountain Bike
Total
John
USA
1700
Mountain Bike
Total
John
USA
3350
Mountain Bike
Total
Gina
USA
2150
Mountain Bike
Total
Gina
Canada
2800
Mountain Bike
Total
Gina
Italy
1050
Mountain Bike
Total
Mary
USA
600
Mountain Bike
Total
Mary
Canada
2650
Mountain Bike
Total
Mary
Italy
600
Mountain Bike
Total
John
USA
1100
City Bike
Total
John
USA
1600
City Bike
Total
John
USA
600
City Bike
Total
...
...
...
...
...
Under Sales column is the total amount and percentage amount of sale and the matrix will filter after the Format column. But since I need to change the format of the percentage to percent, because it's in decimal format, I have created a measure for sales like this:
Sales_all =
VAR variable = SUM ( 'Table'[Sales])
RETURN
SWITCH (
SELECTEDVALUE ( 'Table'[Format]),
"Total", FORMAT ( variable, "General Number" ),
"Percentage", FORMAT ( variable, "Percent" ))
I have two questions. I would like to create a data bar conditional formatting for Percentage:
Is it possible to use different values for max and min of the data bar for each country. Currently when I choose data bars, I can only enter values for the whole column of Sales, disregarding the Countries (Canada, Italy, USA). For example I would like to enter a max value for Canada as 60% and max value for Italy as 25%. If I use the Sales column directly, not as measure, I can only choose one max value for the whole Sales column. The bar for the percentage should be full at 60% for Canada and full at 25% for Italy.
Since I have used a measure to change the format of the values in Sales column based on the Format column, I can't choose data bar under conditional formatting anymore? Why is this the case and how can I change it?
Please keep each post to a single question. Please don't paste data as images and keep the sample data as copiable text.
I don't understand question 1 so you will need to elaborate (ideally in a brand new question with copiable sample data). The reason for question 2 is that FORMAT() returns text and so is no longer a number and can't produce a data bar. Either keep the measure as a number or change the display formatting using calculation groups.
EDIT
You need to reshape your data. In PQ, pivot Format column with value of Sales as follows:
You end up with this (missing data because your sample wasn't complete)
Create a matrix as follows:
Highlight the column or measure for percentage and in the ribbon select percent for the format. This keeps the underlying value as a number but changes the display only.
On the matrix, ensure you have the following option.
You should now have the following:
You can now add data bars to percentage column.
My dataset looks like this:
Country
year
poverty rate
sales
Austria
1950
0.54
142
Austria
1951
0.32
12441
Austria
1952
0.32
12441
Bangladesh
1950
0.11
142123123
Bangladesh
1951
0.52
1234
Bangladesh
1952
0.32
12441
Sri Lanka
1950
0.95
4215
Sri Lanka
1951
0.21
142421
Sri Lanka
1952
0.32
12441
I want to do tsset so that I can (for example) create a new variable for change in sales per year for each country. When I try to do tsset country year, I see "repeated time values within panel". How can I create a new variable that is change in sales per year for each country and year? I have more variables so I would want to be able to specify the variable.
country looks like a string variable from here, but if it were then
tsset country year
would fail for that reason. So, suppose country is a numeric variable with value labels. Then it is essential to follow up the report of repeated observations with say
duplicates list country year
duplicates tag country year, gen(tag)
edit if tag
Then the next step depends on what you see, for example:
The duplicates are just junk with missing values on one of those variables. drop the junk.
Accidental duplicate observations. drop the duplicates.
Something more serious.
See also FAQ https://www.stata.com/support/faqs/data-management/repeated-time-values/
I have table with thousands of record. i want to create a table visual that will show top 5 records for each category. i created a measure to achieve this and i am getting the result exactly the same i am looking for but facing one issue there.
See below image where i am showing top 5 records for each category, but after each category i have total.
I don't want that total for top 5 records i am showing in the table instead i want the total of all the records which is there under each category.
How can i achieve that?
Measure I created is - Top 5 = RankX(AllSelected(table(Category), Table(account), table(name)),amount_measure,,,Dense)
for Top 5 measure i am putting the filter for top 5.
Category
Account
Name
P%
amount
country
owner
Food
A101
AA11
10%
105
India
A
Food
A102
AA12
20%
120
India
A
Food
A103
AA13
80%
100
India
A
Food
A104
AA14
30%
150
India
A
Food
A105
AA15
60%
90
India
A
Stat
B101
AA11
10%
205
India
A
Stat
B102
AA12
20%
220
India
A
Stat
B103
AA13
80%
200
India
A
Stat
B104
AA14
30%
250
India
A
Stat
B105
AA15
60%
190
India
A
Admn
D101
AD11
10%
305
India
A
Admn
D102
AD12
20%
320
India
A
Admn
D103
AD13
80%
300
India
A
Admn
D104
AD14
30%
350
India
A
Admn
D105
AD15
60%
290
India
A
Thanks,
SK
You can try this
Let's suppose you have the following measures
_sumAMT:= SUM('Table 1'[amount])
and this is your ranking measure
_sumAMTRank:= RANKX(ALLEXCEPT('Table 1','Table 1'[Category]),[_sumAMT],,DESC,Dense)
You can revise the subtotal by doing this
_sumAMT by CAT:= CALCULATE(SUM('Table 1'[amount]),ALLEXCEPT('Table 1','Table 1'[Category]))
_revisedTotal:= IF(HASONEVALUE('Table 1'[Name])=true(),[_sumAMT],[_sumAMT by CAT])
I have a table in text form that I want to read into pandas
I can use \n to separate the rows, but how can I separate the columns they are in the format ( 2 x text fields, then 6 x numeric).
Is there a method using regex or similar?
table_text = '''Name AIC sector Price (last close) Price (bid) Price (offer) NAV Total assets (£m) Market cap (£m)
3i Infrastructure Plc Infrastructure GBX 296.00 2.96 2.96 254.50 2,268.700 2,638.645
Aberdeen Asian Income Fund Limited Asia Pacific Income GBX 227.50 2.26 2.29 252.51 479.110 399.796
Aberdeen Diversified Income & Growth Ord Flexible Investment GBX 95.20 0.95 0.96 115.34 379.030 294.985
Aberdeen Emerging Markets Investment Company Limited Global Emerging Markets GBX 704.00 6.98 7.10 829.47 391.268 323.595
Aberdeen Japan Investment Trust Plc Japan GBX 712.50 7.00 7.25 784.79 114.957 94.198
Aberdeen Latin American Income Latin America GBX 57.00 0.54 0.57 62.13 40.985 32.555
Aberdeen New Dawn Asia Pacific GBX 322.00 3.22 3.26 365.56 431.544 350.752
Aberdeen New India Investment Trust Plc India GBX 516.00 5.16 5.18 601.47 375.170 301.268
Aberdeen New Thai Investment Trust Plc Country Specialist GBX 445.00 4.40 4.50 516.30 92.585 71.180
Aberdeen Smaller Companies Income Trust UK Smaller Companies GBX 358.00 3.56 3.60 397.45 95.028 79.153
Aberdeen Standard Asia Focus 2025 CULS Asia Pacific Smaller Companies GBX 100.95 1.01 1.01 97.25 391.484 37.026
Aberdeen Standard Asia Focus PLC Asia Pacific Smaller Companies GBX 1,280.00 12.75 13.00 1,440.65 483.841 402.730
Aberdeen Standard Equity Inc Trust plc UK Equity Income GBX 353.00 3.50 3.56 379.60 203.368 170.598
Aberdeen Standard European Logistics Income PLC Property - Europe GBX 116.00 1.15 1.16 117.82 309.808 305.022
Aberforth Smaller Companies Trust Plc UK Smaller Companies GBX 1,496.00 14.94 15.00 1,613.41 1,513.467 1,327.297
Aberforth Split Level Income Trust Plc UK Smaller Companies GBX 80.10 0.80 0.81 91.46 228.143 152.390
Aberforth Split Level Income ZDP 2024 UK Smaller Companies GBX 111.50 1.10 1.13 113.83 227.713 53.032
Acorn Income Fund Ltd UK Equity & Bond Income GBX 351.00 3.46 3.56 415.97 100.206 55.517
Acorn Income Fund ZDP 2022 UK Equity & Bond Income GBX 161.00 1.61 1.61 162.09 34.413 34.182
AEW UK REIT Ord Property - UK Commercial GBX 92.40 0.92 0.92 97.85 194.107 146.384'''
df = pd.DataFrame([x.split(';') for x in table_text.split('\n')])
print(df)
Outputs:
0
0 Name AIC sector Price (last close) Price (bid)...
1 3i Infrastructure Plc Infrastructure GBX 296...
2 Aberdeen Asian Income Fund Limited Asia Paci...
3 Aberdeen Diversified Income & Growth Ord Fle...
4 Aberdeen Emerging Markets Investment Company...
5 Aberdeen Japan Investment Trust Plc Japan GB...
6 Aberdeen Latin American Income Latin America...
7 Aberdeen New Dawn Asia Pacific GBX 322.00 3....
8 Aberdeen New India Investment Trust Plc Indi...
9 Aberdeen New Thai Investment Trust Plc Count...
10 Aberdeen Smaller Companies Income Trust UK S...
11 Aberdeen Standard Asia Focus 2025 CULS Asia ...
12 Aberdeen Standard Asia Focus PLC Asia Pacifi...
13 Aberdeen Standard Equity Inc Trust plc UK Eq...
14 Aberdeen Standard European Logistics Income ...
15 Aberforth Smaller Companies Trust Plc UK Sma...
16 Aberforth Split Level Income Trust Plc UK Sm...
17 Aberforth Split Level Income ZDP 2024 UK Sma...
18 Acorn Income Fund Ltd UK Equity & Bond Incom...
19 Acorn Income Fund ZDP 2022 UK Equity & Bond ...
20 AEW UK REIT Ord Property - UK Commercial GBX...
EDIT:
This is my hacky way of doing it. Relies on there being a currency column populated with "GBX" though.
Would welcome any ideas on better ways of doing this?
Is there a regex way of finding three capital letters preceded by a space and with a space then number afterwards? That would find the currency without hardcoding "GBX".
def convert_rows(df):
sector_name = "GBX"
for index, row in df.iterrows():
if sector_name in row[0]:
name = row[0].split(sector_name)[0]
numbers = row[0].split(sector_name)[1]
df.at[index, ['Name']] = name
df.at[index, ['AIC sector']] = sector_name
df.at[index,['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
return df
df = convert_rows(df)
You could try this:
import re
def convert_rows(df):
for index, row in df.iterrows():
# Search for the pattern
sector_name = re.match(r".+\s([A-Z]{3})\s\d+.+", row[0])
if sector_name:
sector_name = sector_name.group(1) # GBX for instance
name = row[0].split(sector_name)[0]
numbers = row[0].split(sector_name)[1]
df.at[index, ['Name']] = name
df.at[index, ['AIC sector']] = sector_name
df.at[index,['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
return df
I have one dataframe in the following form:
df = pd.read_csv('data/original.csv', sep = ',', names=["Date", "Gran", "Country", "Region", "Commodity", "Type", "Price"], header=0)
I'm trying to do a self join on the index Date, Gran, Country, Region producing rows in the form of
Date, Gran, Country, Region, CommodityX, TypeX, Price X, Commodity Y, Type Y, Prixe Y, Commodity Z, Type Z, Price Z
Every row should have all the different commodities and prices of a specific region.
Is there a simple way of doing this?
Any help is much appreciated!
Note: I simplified the example by ignoring a few attributes
Input Example:
Date Country Region Commodity Price
1 03/01/2014 India Vishakhapatnam Rice 25
2 03/01/2014 India Vishakhapatnam Tomato 30
3 03/01/2014 India Vishakhapatnam Oil 50
4 03/01/2014 India Delhi Wheat 10
5 03/01/2014 India Delhi Jowar 60
6 03/01/2014 India Delhi Bajra 10
Output Example:
Date Country Region Commodit1 Price1 Commodity2 Price2 Commodity3 Price3
1 03/01/2014 India Vishakhapatnam Rice 25 Tomato 30 Oil 50
2 03/01/2014 India Delhi Wheat 10 Jowar 60 Bajra 10
What you want to do is called a reshape (specifically, from long to wide). See this answer for more information.
Unfortunately as far as I can tell pandas doesn't have a simple way to do that. I adapted the answer in the other thread to your problem:
df['idx'] = df.groupby(['Date','Country','Region']).cumcount()
df.pivot(index= ['Date','Country','Region'], columns='idx')[['Commodity','Price']]
Does that solve your problem?