My dataset looks like this:
Country     year  poverty rate  sales
Austria     1950  0.54          142
Austria     1951  0.32          12441
Austria     1952  0.32          12441
Bangladesh  1950  0.11          142123123
Bangladesh  1951  0.52          1234
Bangladesh  1952  0.32          12441
Sri Lanka   1950  0.95          4215
Sri Lanka   1951  0.21          142421
Sri Lanka   1952  0.32          12441
I want to do tsset so that I can (for example) create a new variable for change in sales per year for each country. When I try to do tsset country year, I see "repeated time values within panel". How can I create a new variable that is change in sales per year for each country and year? I have more variables so I would want to be able to specify the variable.
country looks like a string variable from here, but if it were then
tsset country year
would fail for that reason. So, suppose country is a numeric variable with value labels. Then it is essential to follow up the report of repeated observations with say
duplicates list country year
duplicates tag country year, gen(tag)
edit if tag
Then the next step depends on what you see, for example:
The duplicates are just junk with missing values on one of those variables. drop the junk.
Accidental duplicate observations. drop the duplicates.
Something more serious.
See also FAQ https://www.stata.com/support/faqs/data-management/repeated-time-values/
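For readers who want to see the logic spelled out, here is a minimal sketch in Python (not Stata) of the two steps discussed above: checking for repeated country-year pairs, and then computing the year-on-year change for a chosen variable. The function names and the column index are hypothetical choices made for this illustration. In Stata itself, once tsset succeeds, the time-series operator D. (for example D.sales) gives the same first difference.

```python
from collections import Counter

# Sample rows from the question: (country, year, poverty_rate, sales)
rows = [
    ("Austria", 1950, 0.54, 142),
    ("Austria", 1951, 0.32, 12441),
    ("Austria", 1952, 0.32, 12441),
    ("Bangladesh", 1950, 0.11, 142123123),
    ("Bangladesh", 1951, 0.52, 1234),
    ("Bangladesh", 1952, 0.32, 12441),
    ("Sri Lanka", 1950, 0.95, 4215),
    ("Sri Lanka", 1951, 0.21, 142421),
    ("Sri Lanka", 1952, 0.32, 12441),
]


def find_duplicates(rows):
    """Return the (country, year) keys that occur more than once."""
    counts = Counter((row[0], row[1]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def yearly_change(rows, column):
    """Map (country, year) to the change in rows[column] since the previous year.

    The value is None when the previous year is absent for that country.
    Assumes the (country, year) pairs are unique.
    """
    values = {(row[0], row[1]): row[column] for row in rows}
    changes = {}
    for (country, year), value in values.items():
        previous = values.get((country, year - 1))
        changes[(country, year)] = None if previous is None else value - previous
    return changes


SALES = 3  # hypothetical column index; pass another index for another variable
duplicates = find_duplicates(rows)
changes = yearly_change(rows, SALES)
print(duplicates)                    # no repeats in the rows shown in the question
print(changes[("Austria", 1951)])    # 12441 - 142
```

The rows shown in the question contain no repeated pairs, so the duplicates behind the error message must be elsewhere in the full dataset.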
Related
I would like to create a matrix visual like below and add data bars as conditional formatting to the "Sales Percentage" column, with different user-defined max and min values depending on the country.
I have the following dummy data
Salesperson  Country  Product        Sales Percentage  Total Sales
Gina         Canada   City Bike      0.02              232
Gina         Canada   Mountain Bike  0.56              2800
Gina         Italy    City Bike      0.32              213
Gina         Italy    Mountain Bike  0.21              1050
Gina         USA      City Bike      0.11              122
Gina         USA      Mountain Bike  0.43              2150
John         Canada   City Bike      0.32              333
John         Canada   Mountain Bike  0.34              442
John         Italy    City Bike      0.12              2132
John         Italy    Mountain Bike  0.67              1233
John         USA      City Bike      0.22              3300
John         USA      Mountain Bike  0.45              7300
Mary         Canada   City Bike      0.21              121
Mary         Canada   Mountain Bike  0.53              2650
Mary         Italy    City Bike      0.32              213
Mary         Italy    Mountain Bike  0.12              600
Mary         USA      City Bike      0.11              123
Mary         USA      Mountain Bike  0.12              600
The matrix looks like this after showing columns as rows and putting "Sales Percentage" and "Total Sales" as values, Country as columns and Product + Salesperson as rows:
I can add data bars when I right-click Sales Percentage under Values, but I can only enter one user-defined min and max value for the whole "Sales Percentage" column. Is it possible to have a different maximum value for the data bars depending on the Country? For example, to create a target value of 35% for Canada, 40% for USA and 50% for Italy. In other words, the data bar would be full when the Sales Percentage for Canada reaches 35%, full when the Sales Percentage for USA reaches 40%, and so on.
This isn't possible with your current setup. The best you could do to approximate this is as follows.
Create a measure as follows:
% Canada = CALCULATE(SUM('Table'[Sales Percentage]), 'Table'[Country] = "Canada")
Do the same for USA and Italy and then add them as values to your matrix.
You can now select individual targets for each country.
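To make the requirement concrete, here is a small Python sketch of the arithmetic behind a per-country target: the bar is full once the value reaches that country's target. This only illustrates the calculation and is not a Power BI feature; the function name bar_fill and the targets dictionary are hypothetical, with target values taken from the question.

```python
# Targets from the question: the bar is full at this Sales Percentage.
targets = {"Canada": 0.35, "USA": 0.40, "Italy": 0.50}


def bar_fill(country, sales_percentage, targets):
    """Fraction of the data bar to fill, capped at 1.0 (a full bar)."""
    return min(sales_percentage / targets[country], 1.0)


# A few rows from the dummy data: (salesperson, country, product, sales %)
sample = [
    ("Gina", "Canada", "Mountain Bike", 0.56),
    ("Gina", "Italy", "City Bike", 0.32),
    ("John", "USA", "City Bike", 0.22),
]

fills = {
    (salesperson, country, product): bar_fill(country, pct, targets)
    for salesperson, country, product, pct in sample
}
for key, fill in fills.items():
    print(key, round(fill, 2))
```

Gina's 56% in Canada exceeds the 35% target, so her bar is full, while 32% in Italy fills only part of the bar against the 50% target.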
I need to create a matrix in the following format, with the total sales and percentage sales below each other:
This is why I have created a table with data like this:
Salesperson  Country  Sales  Product        Format
John         USA      0.45   Mountain Bike  Percentage
John         Canada   0.34   Mountain Bike  Percentage
John         Italy    0.67   Mountain Bike  Percentage
Gina         USA      0.43   Mountain Bike  Percentage
Gina         Canada   0.56   Mountain Bike  Percentage
Gina         Italy    0.21   Mountain Bike  Percentage
Mary         USA      0.12   Mountain Bike  Percentage
Mary         Canada   0.53   Mountain Bike  Percentage
Mary         Italy    0.12   Mountain Bike  Percentage
John         USA      0.22   City Bike      Percentage
John         Canada   0.32   City Bike      Percentage
John         Italy    0.12   City Bike      Percentage
Gina         USA      0.11   City Bike      Percentage
Gina         Canada   0.02   City Bike      Percentage
Gina         Italy    0.32   City Bike      Percentage
Mary         USA      0.11   City Bike      Percentage
Mary         Canada   0.21   City Bike      Percentage
Mary         Italy    0.32   City Bike      Percentage
John         USA      2250   Mountain Bike  Total
John         USA      1700   Mountain Bike  Total
John         USA      3350   Mountain Bike  Total
Gina         USA      2150   Mountain Bike  Total
Gina         Canada   2800   Mountain Bike  Total
Gina         Italy    1050   Mountain Bike  Total
Mary         USA      600    Mountain Bike  Total
Mary         Canada   2650   Mountain Bike  Total
Mary         Italy    600    Mountain Bike  Total
John         USA      1100   City Bike      Total
John         USA      1600   City Bike      Total
John         USA      600    City Bike      Total
...          ...      ...    ...            ...
The Sales column holds both the total amount and the percentage of sales, and the matrix is filtered on the Format column. Because the percentages are stored as decimals and I need them displayed as percent, I have created a measure for Sales like this:
Sales_all =
VAR variable = SUM ( 'Table'[Sales] )
RETURN
    SWITCH (
        SELECTEDVALUE ( 'Table'[Format] ),
        "Total", FORMAT ( variable, "General Number" ),
        "Percentage", FORMAT ( variable, "Percent" )
    )
I have two questions. I would like to create data bar conditional formatting for Percentage:
1. Is it possible to use different max and min values for the data bar for each country? Currently when I choose data bars, I can only enter values for the whole Sales column, disregarding the countries (Canada, Italy, USA). For example, I would like to enter a max value of 60% for Canada and a max value of 25% for Italy. If I use the Sales column directly, not as a measure, I can only choose one max value for the whole Sales column. The bar for the percentage should be full at 60% for Canada and full at 25% for Italy.
2. Since I have used a measure to change the format of the values in the Sales column based on the Format column, I can no longer choose data bars under conditional formatting. Why is this the case and how can I change it?
Please keep each post to a single question. Please don't paste data as images and keep the sample data as copiable text.
I don't understand question 1 so you will need to elaborate (ideally in a brand new question with copiable sample data). The reason for question 2 is that FORMAT() returns text and so is no longer a number and can't produce a data bar. Either keep the measure as a number or change the display formatting using calculation groups.
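The point about FORMAT() can be illustrated with a short Python analogy (this is not DAX): once a number has been turned into display text, it is a string, and anything that needs a magnitude, such as a bar length, has to work from the underlying number instead. The bar width of 20 characters is an arbitrary choice for the sketch.

```python
value = 0.45                   # underlying numeric value
as_text = f"{value:.0%}"       # formatted for display only

print(type(as_text).__name__)  # the formatted result is text, not a number
print(as_text)

# A bar needs the number itself; the text is kept purely as a label.
bar_width = round(value * 20)  # hypothetical bar of up to 20 characters
print("#" * bar_width, as_text)
```

Keeping the value numeric and applying the percent format only at display time is the same separation the answer recommends.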
EDIT
You need to reshape your data. In Power Query (PQ), pivot the Format column, using Sales as the values column, as follows:
You end up with this (missing data because your sample wasn't complete)
Create a matrix as follows:
Highlight the column or measure for percentage and in the ribbon select percent for the format. This keeps the underlying value as a number but changes the display only.
On the matrix, ensure you have the following option.
You should now have the following:
You can now add data bars to percentage column.
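The reshaping step above, pivoting the Format column so that Percentage and Total become separate columns, can be sketched in Python as follows. This is only an illustration of the long-to-wide idea, not Power Query code; the function name pivot_format is hypothetical and the rows are a subset of the sample data from the question.

```python
from collections import defaultdict

# Subset of the question's long table:
# (salesperson, country, sales, product, format)
long_rows = [
    ("John", "USA", 0.45, "Mountain Bike", "Percentage"),
    ("Gina", "Canada", 0.56, "Mountain Bike", "Percentage"),
    ("Gina", "Canada", 2800, "Mountain Bike", "Total"),
    ("Mary", "Italy", 0.12, "Mountain Bike", "Percentage"),
    ("Mary", "Italy", 600, "Mountain Bike", "Total"),
]


def pivot_format(rows):
    """Turn each Format value into its own column, keyed by the other fields.

    Assumes at most one row per key and Format value; a later duplicate
    would overwrite an earlier one in this sketch.
    """
    wide = defaultdict(dict)
    for salesperson, country, sales, product, fmt in rows:
        wide[(salesperson, country, product)][fmt] = sales
    return dict(wide)


wide = pivot_format(long_rows)
for key, columns in wide.items():
    # .get returns None where the sample has no matching row
    print(key, columns.get("Percentage"), columns.get("Total"))
```

After the pivot, Percentage and Total are separate numeric columns, so each can carry its own display format and its own data bars.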
I need to calculate the total value of a column per employee per month. Then I need to impose a limit of 177 per employee per month. This will go into a matrix with employees as rows and months as columns. Lastly, I want to add up all the amounts per month to show the total in a line chart.
I made a measure to calculate 1% with a maximum amount of 177: if(0.01*sum[amount]>177, 177, 0.01*sum[amount]). Then I used this measure in my matrix as explained above. This worked fine, but when I want to make the line chart, the limit of 177 is still imposed because I use the same measure.
I tested it with some dummy data! Please do it like this:
Employee Month Amount
Jack January 1500
Joe February 20000
Joe March 1600
Jack April 1800
Brad June 10000
Jack July 9500
Joe February 9500
Brad April 6500
Jack December 12000
Joe June 8000
Brad April 9500
Jack January 1000
Jack April 1100
Jack April 8000
Joe February 12000
Joe February 12500
Joe February 13000
Brad June 15000
Brad June 16000
Here is the measure (DAX code) you need to use:
your_measure =
if(0.01 * sum(your_table[Amount]) > 177, 177,0.01* sum(your_table[Amount]))
Then let's put it on a matrix and a line chart:
If you don't want the 177 restriction applied in the line chart, why not create another simple total measure:
= 0.01 * SUM(your_table[Amount])
Update requested from Peter
Now you need to look at the whole picture. Employee is not part of the filter context in the line chart; the model is filtered only by month. I added both measures to the line chart, so both appear in the legend.
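The effect of the filter context can be checked by hand. The Python sketch below (not DAX) uses the dummy data from this answer and compares three possible monthly totals: the cap applied to the month total (what the single measure gives when only month is in context), the sum of the per-employee capped values (what the question appears to ask for), and the uncapped 1%. All variable names are hypothetical. In DAX, summing per-employee results is usually written with an iterator such as SUMX over the distinct employees, which is mentioned here as a pointer rather than a tested solution.

```python
from collections import defaultdict

CAP = 177  # monthly limit per employee, from the question

# Dummy data from the answer: (employee, month, amount)
rows = [
    ("Jack", "January", 1500), ("Joe", "February", 20000),
    ("Joe", "March", 1600), ("Jack", "April", 1800),
    ("Brad", "June", 10000), ("Jack", "July", 9500),
    ("Joe", "February", 9500), ("Brad", "April", 6500),
    ("Jack", "December", 12000), ("Joe", "June", 8000),
    ("Brad", "April", 9500), ("Jack", "January", 1000),
    ("Jack", "April", 1100), ("Jack", "April", 8000),
    ("Joe", "February", 12000), ("Joe", "February", 12500),
    ("Joe", "February", 13000), ("Brad", "June", 15000),
    ("Brad", "June", 16000),
]


def capped(amount_sum):
    """1% of the summed amount, limited to CAP (same logic as the IF measure)."""
    return min(amount_sum / 100, CAP)


# Sum amounts per (employee, month): one matrix cell each.
cell_sums = defaultdict(int)
for employee, month, amount in rows:
    cell_sums[(employee, month)] += amount

month_sums = defaultdict(int)             # raw amount per month
per_employee_capped = defaultdict(float)  # sum of capped cells per month
for (employee, month), total in cell_sums.items():
    month_sums[month] += total
    per_employee_capped[month] += capped(total)

cap_on_month_total = {m: capped(t) for m, t in month_sums.items()}
uncapped = {m: t / 100 for m, t in month_sums.items()}

for month in ("April", "June"):
    print(month, cap_on_month_total[month], per_employee_capped[month], uncapped[month])
```

For June, Brad's 410 is capped to 177 and Joe's 80 is below the limit, so the per-employee total is 257, whereas capping the month total gives 177 and the uncapped figure is 490.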
I have a typical scenario as below.
I have a student table and it contains four columns as below :-
1.StudentID
2.StudentName
3.LastAttendanceDate
4.StudentType
Now there are some null values in the date column LastAttendanceDate. Is it possible to use a date slicer to show the students whose LastAttendanceDate value is null? In simple words: say you are a student who went to school on Monday, Tuesday and Friday and were absent on Wednesday and Thursday. Wednesday and Thursday are then the days you were absent that week, and we need to display these records in the table visualization.
My excel Input data:-
StudentID  StudentName  LastAttendanceDate  StudentType
100        Mary         02-05-2011 10:45    Fulltime
100        Mary                             Fulltime
100        Mary         04-05-2011 12:45    Fulltime
100        Mary         06-05-2011 15:45    Fulltime
100        Mary                             Fulltime
100        Mary         08-05-2011 19:45    Fulltime
100        Mary         09-05-2011 12:45    Fulltime
101        John         02-05-2011 10:45    Part Time
101        John         03-05-2011 11:23    Part Time
101        John         04-05-2011 10:45    Part Time
101        John         06-05-2011 15:49    Part Time
101        John                             Part Time
101        John         08-05-2011 19:45    Part Time
101        John         09-05-2011 12:45    Part Time
So here I need to dynamically find, for a week, a month, or any other date range (say 02-05-2011 to 08-05-2011, 02-05-2011 to 09-05-2011, or even 06-05-2011 to 09-05-2011), the students who were absent and show them in my table visualization.
Can anyone provide an approach or any helpful DAX? Appreciate all the help
My present visualization looks like this :
I want to show the students who were absent in the time range selected in the date slicer.
So if I slide the date slicer between its minimum and maximum, it should show all the rows of students who were absent, i.e. with null values in the LastAttendanceDate column, in that time range.
Kind regards
Sameer
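One difficulty with the sample is that the rows with a null LastAttendanceDate carry no date of their own, so a date range cannot select them directly. A possible reading of the requirement, sketched here in Python rather than DAX, is to infer absences as the calendar days in the selected range on which a student has no attendance record. This rests on several assumptions that are not in the question: dates are dd-mm-yyyy, every calendar day counts as a school day, and the function name absent_days is hypothetical. In a Power BI model this approach would typically rely on a separate date table driving the slicer.

```python
from datetime import date, timedelta

# Attended dates per student, taken from the non-null rows of the sample
# (assumed to be dd-mm-yyyy, i.e. days in May 2011).
attended = {
    "Mary": {date(2011, 5, d) for d in (2, 4, 6, 8, 9)},
    "John": {date(2011, 5, d) for d in (2, 3, 4, 6, 8, 9)},
}


def absent_days(attended, start, end):
    """For each student, list the days in [start, end] with no attendance."""
    n_days = (end - start).days + 1
    calendar = [start + timedelta(days=i) for i in range(n_days)]
    return {
        student: [day for day in calendar if day not in days]
        for student, days in attended.items()
    }


# One of the ranges mentioned in the question: 02-05-2011 to 08-05-2011.
result = absent_days(attended, date(2011, 5, 2), date(2011, 5, 8))
for student, days in result.items():
    print(student, [day.strftime("%d-%m-%Y") for day in days])
```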
I need some help with reshaping some data into groups. The variables are country1 and country2, and samegroup, which indicates if the countries are in the same group (continent). The original data I have is something like this:
country1   country2   samegroup
China      Vietnam    1
France     Italy      1
Brazil     Argentina  1
Argentina  Brazil     1
Australia  US         0
US         Australia  0
Vietnam    China      1
Vietnam    Thailand   1
Thailand   Vietnam    1
Italy      France     1
And I would like the output to be this:
country    group
China      1
Vietnam    1
Thailand   1
Italy      2
France     2
Brazil     3
Argentina  3
Australia  4
US         5
My first instinct would be to sort the initial data by "samegroup", then reshape (long to wide). But that doesn't quite solve the issue and I'm not sure how to continue from there. Any help would be greatly appreciated!
Unless you have a non-standard definition of continent, it is much easier to use kountry (which you will probably have to install) than reshape or repeated merges:
clear
input str12 country1 str12 country2 byte samegroup
China Vietnam 1
France Italy 1
Brazil Argentina 1
Argentina Brazil 1
Australia US 0
US Australia 0
Vietnam China 1
Vietnam Thailand 1
Thailand Vietnam 1
Italy France 1
end
capture net install dm0038_1
kountry country1, from(other) geo(marc) marker
rename (country1 GEO) (country group)
sort group country
capture ssc install sencode
sencode group, replace // or use recode here
keep country group
duplicates drop
list, clean noobs
label list group
This will produce
. list, clean noobs
country group
China Asia
Thailand Asia
Vietnam Asia
Australia Australasia
France Europe
Italy Europe
US North America
Argentina South America
Brazil South America
. label list group
group:
1 Asia
2 Australasia
3 Europe
4 North America
5 South America
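If the grouping has to come from the samegroup pairs themselves rather than from standard continents, the task amounts to finding connected components: countries linked by a samegroup value of 1 end up in one group, and everything else stays on its own. The Python sketch below (not Stata) uses a small union-find structure for this; the function name assign_groups is hypothetical, and numbering groups by order of first appearance is an assumption chosen to match the desired output in the question.

```python
# Pairs from the question: (country1, country2, samegroup)
pairs = [
    ("China", "Vietnam", 1), ("France", "Italy", 1),
    ("Brazil", "Argentina", 1), ("Argentina", "Brazil", 1),
    ("Australia", "US", 0), ("US", "Australia", 0),
    ("Vietnam", "China", 1), ("Vietnam", "Thailand", 1),
    ("Thailand", "Vietnam", 1), ("Italy", "France", 1),
]


def assign_groups(pairs):
    """Return {country: group number}, merging countries linked by samegroup == 1."""
    parent = {}

    def find(x):
        # Register unseen countries as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, same in pairs:
        root_a, root_b = find(a), find(b)
        if same and root_a != root_b:
            parent[root_b] = root_a

    # Number the groups in order of each group's first-appearing country.
    group_ids = {}
    groups = {}
    for country in list(parent):
        root = find(country)
        group_ids.setdefault(root, len(group_ids) + 1)
        groups[country] = group_ids[root]
    return groups


groups = assign_groups(pairs)
for country, group in sorted(groups.items(), key=lambda item: item[1]):
    print(country, group)
```

Because Thailand is linked to Vietnam and Vietnam to China, all three share a group even though China and Thailand never appear together as a pair.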