Stata table: how to compute a difference column without adding a new variable?

In a panel data set, I'm using
table Region TIME if TIME==2014 | TIME==2020 | TIME==2030 | TIME==2040, contents(sum BF ) row
to create the following table:
------------------------------------------
| TIME
Region | 2014 2020 2030 2040
----------+-------------------------------
701 | 26751 27941 29944 31477
702 | 10456 11354 12723 13788
704 | 41550 44481 49340 53273
706 | 44976 47535 51940 55573
709 | 43258 44398 46612 48191
711 | 6580 7011 7539 7856
713 | 9036 10139 11776 13194
714 | 3091 3284 3563 3750
716 | 9144 9730 10724 11543
719 | 5719 6292 7258 8036
720 | 11509 12161 13188 13919
722 | 21403 22344 23839 25006
723 | 4927 5094 5345 5447
728 | 2460 2576 2761 2906
|
Total | 240860 254340 276552 293959
------------------------------------------
I'd like to add a fifth column that displays the percentage difference between 2014 and 2040.
Question: is this possible WITHOUT adding a new variable to the dataset? For instance, by deriving the fifth column from a formula?
If not, how can I easily compute such a variable, given that the panel data set is in long format?

This isn't possible within table.
Your variable could be computed with something like:
egen total2014 = total(BF / (TIME == 2014)), by(Region)
egen total2040 = total(BF / (TIME == 2040)), by(Region)
gen pcdiff = 100 * (total2040 - total2014)/total2014
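after which you can tabulate its (mean) value for each region. See Section 10 in http://www.stata-journal.com/sjpdf.html?articlenum=dm0055 for the first trick here: dividing by a true-or-false expression yields missing (division by zero) whenever the condition is false, and total() ignores missing values, so only that year's values of BF are summed.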
You may need to go outside table for the tabulation, but if all else fails, collapse to a new dataset of totals and means.
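For readers working outside Stata, here is a minimal Python sketch (standard library only) of the same computation: sum BF per region for each of the two years from long-format rows, then take 100 * (2040 total - 2014 total) / 2014 total. The function name `percent_change_by_region` and the `(region, year, value)` tuple layout are illustrative assumptions; the sample rows reuse two regions' totals from the table above.

```python
from collections import defaultdict


def percent_change_by_region(rows, start_year, end_year):
    """Return {region: 100 * (total_end - total_start) / total_start}.

    rows: iterable of (region, year, value) tuples in long format, i.e. one
    row per region-year observation; repeated region-year rows are summed.
    """
    totals = defaultdict(float)  # (region, year) -> summed value
    for region, year, value in rows:
        if year in (start_year, end_year):
            totals[(region, year)] += value

    result = {}
    for region in sorted({region for region, _ in totals}):
        start = totals.get((region, start_year))
        end = totals.get((region, end_year))
        # Skip regions where either year is absent or the base total is zero
        # (the percentage change is undefined there).
        if start and end is not None:
            result[region] = 100 * (end - start) / start
    return result


# Hypothetical long-format rows built from the 2014 and 2040 totals above.
rows = [
    (701, 2014, 26751), (701, 2040, 31477),
    (702, 2014, 10456), (702, 2040, 13788),
]
for region, pc in percent_change_by_region(rows, 2014, 2040).items():
    print(f"{region}: {pc:.1f}%")
```

Doing the year selection inside the summing loop plays the same role as the `BF / (TIME == 2014)` trick: observations from other years simply never reach the total.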


How to create a column with the name of the column holding the highest value for each ID in SAS Enterprise Guide / PROC SQL?

I have a table in SAS Enterprise Guide like the one below:
ID | COL_A | COL_B | COL_C
-----|-------|-------|------
111 | 10 | 20 | 30
222 | 15 | 80 | 10
333 | 11 | 10 | 20
444 | 20 | 5 | 20
Requirements:
I need to create a new column "TOP" containing the name of the column with the highest value for each ID.
If two or more columns share the highest value, take the first one in alphabetical order.
Desired output:
ID | COL_A | COL_B | COL_C | TOP
-----|-------|-------|--------|-------
111 | 10 | 20 | 30 | COL_C
222 | 15 | 80 | 10 | COL_B
333 | 11 | 10 | 20 | COL_C
444 | 20 | 5 | 20 | COL_A
Because:
for ID = 111 the highest value is in COL_C, so column "TOP" contains "COL_C"
for ID = 444 two columns share the highest value, so by the alphabetical rule column "TOP" contains "COL_A"
How can I do that in SAS Enterprise Guide or in PROC SQL?
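You can do this with DATA step functions. Use MAX() to find the largest value, WHICHN() to find the index of the first variable holding that value, and VNAME() to get the name of the variable at that index. Because WHICHN() returns the first match, listing the variables in alphabetical order in the ARRAY statement takes care of ties.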
data want;
set have;
length TOP $32;
array list col_a col_b col_c;
top = vname(list[whichn(max(of list[*]),of list[*])]);
run;
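The same rule as a small Python sketch, for comparison only. It assumes there are no missing values (SAS MAX() skips missing values, whereas Python's max() raises an error on None). `top_column` is a hypothetical helper; as with WHICHN(), ties go to the first matching column.

```python
def top_column(row, columns):
    """Return the name of the first column (in the given order) holding the row's maximum."""
    values = [row[c] for c in columns]
    best = max(values)
    # list.index returns the first occurrence, like WHICHN().
    return columns[values.index(best)]


# Sample data from the question.
have = [
    {"ID": 111, "COL_A": 10, "COL_B": 20, "COL_C": 30},
    {"ID": 222, "COL_A": 15, "COL_B": 80, "COL_C": 10},
    {"ID": 333, "COL_A": 11, "COL_B": 10, "COL_C": 20},
    {"ID": 444, "COL_A": 20, "COL_B": 5, "COL_C": 20},
]
cols = sorted(["COL_C", "COL_A", "COL_B"])  # alphabetical order settles ties
want = [{**row, "TOP": top_column(row, cols)} for row in have]
for row in want:
    print(row["ID"], row["TOP"])
```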

Adding a measure which finds the next row value for every row (similar to the SQL LEAD window function)

I would be very grateful if you could share your experience and advice on the following problem in Power BI.
The data model contains three tables:
calendar dimension table
fact table on sessions
fact table on spending
| CW | Total cost | Sessions | Expected Column 1 | Expected Column 2 |
+----+-------------+-----------+-------------------+-------------------+
| 1 | 1200 | 50 | | |
| 2 | 1500 | 60 | 1200 | 50 |
| 3 | 1700 | 48 | 1500 | 60 |
| 4 | 1150 | 36 | 1700 | 48 |
| 5 | 900 | 29 | 1150 | 36 |
+----+-------------+-----------+-------------------+-------------------+
The CW column indicates the calendar week and comes from the calendar table. Sessions and Total cost come from the sessions and spending tables, respectively. Data is aggregated and visualized at the calendar-week level.
Problem: I need to create measures that derive Expected Column 1 and Expected Column 2 from the Total cost and Sessions columns. Basically, I need to get the next values for each row, similar to the LEAD window function.
I have checked the Power BI community forum and there are several ideas (for example, here: https://community.powerbi.com/t5/Desktop/DAX-Query-to-Find-Next-Value/td-p/833896).
But these solutions assume all columns come from the same table, whereas in the case described above all three columns come from different tables.
Is it possible to get Expected Columns 1 and 2, and if so, how? Many thanks in advance!
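No answer is given for this question here. As an illustration of the underlying idea only (Python, not DAX), the sketch below aggregates each fact table to calendar week separately and then aligns both through the shared CW key from the calendar table before shifting, so it never needs the two fact tables to share a table. Note that in the sample table the expected columns actually hold the previous week's figures (LAG-like), so the sketch follows the table; for a true lead, look up the following week instead. The function and variable names are hypothetical, and the weekly totals are copied from the question's table.

```python
def previous_week_values(calendar_weeks, weekly_totals):
    """For each calendar week, return the value recorded for the preceding week.

    calendar_weeks: ordered list of week numbers (the calendar dimension).
    weekly_totals: {week: value}, already aggregated from one fact table.
    Returns {week: previous week's value, or None if there is none}.
    """
    result = {}
    for i, week in enumerate(calendar_weeks):
        if i == 0:
            result[week] = None
        else:
            result[week] = weekly_totals.get(calendar_weeks[i - 1])
    return result


calendar = [1, 2, 3, 4, 5]                                  # CW from the calendar table
cost_by_week = {1: 1200, 2: 1500, 3: 1700, 4: 1150, 5: 900}  # spending fact, summed per CW
sessions_by_week = {1: 50, 2: 60, 3: 48, 4: 36, 5: 29}       # sessions fact, summed per CW

expected_1 = previous_week_values(calendar, cost_by_week)
expected_2 = previous_week_values(calendar, sessions_by_week)
for week in calendar:
    print(week, cost_by_week.get(week), sessions_by_week.get(week),
          expected_1[week], expected_2[week])
```

Because each fact table is only ever looked up by week number, a week missing from one fact table yields None for that table alone instead of breaking the alignment.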

Filter out outliers dynamically using PERCENTILE

I'm building a sales dashboard in Power BI.
I have a Sales table.
My data is entered manually (declarative), so it contains a few extreme values caused by human errors, typos, etc.
Let's say I want to build a histogram with:
On the X axis, the stock aging of each sale, i.e. how long the product had been in stock at the time of sale. This is given by the [Product_Age] column.
As values, the number of sales.
What I want to do is exclude the top 1% of extreme values from my calculations (averages, etc.) and visualizations.
I've created a measure:
SalesByAge_Adjusted =
VAR TEMP =
FILTER(
SALES;
VAR StockAgingMAX =
PERCENTILE.INC(
SALES[Sales_Age];
0,99
)
RETURN
SALES[Sales_Age] < StockAgingMAX
)
RETURN
COUNTROWS(TEMP)
It uses PERCENTILE.INC to get the 99th percentile of Sales_Age values in the current context and I try to use it as a filter.
However, it just won't work.
I can display the measure on its own (it shows how many sales I have), but as soon as I drag and drop "Sales_Age" onto the chart to summarize the values, it shows nothing.
I have created the following table as an example.
+-------+--------+
| Axis | Values |
+-------+--------+
| 1 | 1067 |
| 2 | 1725 |
| 4 | 298 |
| 8 | 402 |
| 16 | 1848 |
| 32 | 1395 |
| 64 | 1116 |
| 128 | 1027 |
| 256 | 1948 |
| 512 | 790 |
| 1024 | 2173 |
| 2048 | 2025 |
| 4096 | 104 |
| 8192 | 1243 |
| 16384 | 1676 |
| 32768 | 1285 |
| 65536 | 806 |
+-------+--------+
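To filter out the values above the 99th percentile, I've created the following measure. It computes the percentile over the whole table, with ALL('Table') removing the filter context, and compares it to each Axis value. (Your measure returns nothing once Sales_Age is on the axis because each bar's filter context then holds a single age, so the 99th percentile equals that age and the strict < comparison rejects every row.)
Filter = IF(CALCULATE(PERCENTILE.INC('Table'[Axis],0.99),ALL('Table'))>=MAX('Table'[Axis]),1,0)
Then add the Filter measure to the chart's visual-level filters and keep only the items where it equals 1, which excludes the outliers.
In this case, it filters out the last value of the table, 65536.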
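A minimal Python sketch of the same logic, assuming PERCENTILE.INC's inclusive linear-interpolation rule (rank = p * (n - 1) over the sorted values). The threshold is computed once over the whole Axis column, as ALL('Table') does, and each Axis value is then flagged against it. The data are the example table above; `percentile_inc` is a hypothetical helper, not a Power BI API.

```python
def percentile_inc(values, p):
    """Inclusive percentile with linear interpolation between sorted values."""
    s = sorted(values)
    rank = p * (len(s) - 1)          # 0-based fractional position
    lower = int(rank)
    upper = min(lower + 1, len(s) - 1)
    return s[lower] + (s[upper] - s[lower]) * (rank - lower)


axis = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096,
        8192, 16384, 32768, 65536]
values = [1067, 1725, 298, 402, 1848, 1395, 1116, 1027, 1948, 790, 2173,
          2025, 104, 1243, 1676, 1285, 806]

# Threshold over ALL rows, not per bar: this is what makes the filter work.
threshold = percentile_inc(axis, 0.99)
flags = {a: 1 if threshold >= a else 0 for a in axis}
kept = [(a, v) for a, v in zip(axis, values) if flags[a] == 1]
print(threshold, [a for a, _ in kept])
```

Calling `percentile_inc` on a single-value list returns that value, which mirrors why the original measure comes back empty when evaluated per bar.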

Filter column by row value in Power BI

I'm trying to write a DAX function to find the maximum value in one column based on a condition in another, but have this condition change dynamically based on the row value.
With this code:
CALCULATE(MAX(RankOfArea[count]),filter(RankOfArea,RankOfArea[Line]="Pic"))
I get this table:
count | Line | Max
7220 | Pic | 7220
283 | Dis | 7220
3557 | Pic | 7220
317 | Met | 7220
500 | Met | 7220
And I'd like this result:
count | Line | Max
7220 | Pic | 7220
283 | Dis | 283
3557 | Pic | 7220
317 | Met | 500
500 | Met | 500
Of course I have to remove the ="Pic" condition, but I'm not sure what to replace it with. Many thanks!
There are a couple of ways to do this for a calculated column.
One way is to remove every filter on the table with ALL (in a calculated column, CALCULATE turns the current row into filters) and explicitly define your condition:
Max = CALCULATE(MAX(RankOfArea[Count]),
ALL(RankOfArea),
RankOfArea[Line] = EARLIER(RankOfArea[Line]))
(EARLIER refers to the outer row context, i.e. the row for which the calculated column is currently being computed.)
Another way is to remove just the filter on [Count]:
Max = CALCULATE(MAX(RankOfArea[Count]), ALL(RankOfArea[Count]))
In this case, since there are only two columns, this is equivalent to removing every filter except the one on [Line]:
Max = CALCULATE(MAX(RankOfArea[Count]), ALLEXCEPT(RankOfArea, RankOfArea[Line]))
I recommend this latter approach in case your table acquires more columns.
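Outside DAX, the calculated column amounts to a per-group maximum broadcast back to every row. A minimal Python sketch using the question's sample rows (variable names are illustrative):

```python
from collections import defaultdict

rows = [
    {"count": 7220, "Line": "Pic"},
    {"count": 283, "Line": "Dis"},
    {"count": 3557, "Line": "Pic"},
    {"count": 317, "Line": "Met"},
    {"count": 500, "Line": "Met"},
]

# Pass 1: maximum count per Line, ignoring every other column
# (the effect ALLEXCEPT(RankOfArea, RankOfArea[Line]) has on the filters).
max_by_line = defaultdict(lambda: float("-inf"))
for r in rows:
    max_by_line[r["Line"]] = max(max_by_line[r["Line"]], r["count"])

# Pass 2: each row looks up the maximum for its own Line value
# (the role EARLIER(RankOfArea[Line]) plays in the first formula).
for r in rows:
    r["Max"] = max_by_line[r["Line"]]
    print(r["count"], r["Line"], r["Max"])
```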

How can I export a two-way table?

I have created a two-way summary table in Stata, but I am struggling to export my results.
Using the auto.dta sample dataset as an example, I am trying to build a table that displays the means and standard deviations of mpg, by two other variables (expensive and foreign).
My code currently looks as follows:
sysuse auto.dta, clear
gen expensive = (price > 5000)
The table that I would like to display can be created by either of the two commands below:
tabulate expensive foreign, sum(mpg)
Means, Standard Deviations and Frequencies of Mileage (mpg)
| Car type
expensive | Domestic Foreign | Total
-----------+----------------------+----------
0 | 22.137931 28.875 | 23.594595
| 4.3648281 4.8825491 | 5.2305696
| 29 8 | 37
-----------+----------------------+----------
1 | 16.913043 22.428571 | 19
| 3.4629604 6.4416229 | 5.4467115
| 23 14 | 37
-----------+----------------------+----------
Total | 19.826923 24.772727 | 21.297297
| 4.7432972 6.6111869 | 5.7855032
| 52 22 | 74
table expensive foreign, c(mean mpg sd mpg count mpg) row col
----------------------------------------
| Car type
expensive | Domestic Foreign Total
----------+-----------------------------
0 | 22.1379 28.875 23.5946
| 4.364828 4.882549 5.23057
| 29 8 37
|
1 | 16.913 22.4286 19
| 3.46296 6.441623 5.446712
| 23 14 37
|
Total | 19.8269 24.7727 21.2973
| 4.743297 6.611187 5.785503
| 52 22 74
----------------------------------------
I can also closely approximate the same results using collapse, but this does not calculate row and column totals.
My issue is that neither the tabulate command (with the sum option) nor the table command makes it easy to export the results. I have tried converting to matrices, but tabulate with the sum option does not allow the matcell option, and table seems similarly uncooperative.
I'm familiar with tabstat, esttab etc., but was not able to create the two-way table that I need with any of those packages. Any help would be really appreciated.
The community-contributed command asdoc does exactly that:
. asdoc table expensive foreign, c(mean mpg sd mpg count mpg) row col
----------------------------------------
| Car type
expensive | Domestic Foreign Total
----------+-----------------------------
0 | 22.1379 28.875 23.5946
| 4.364828 4.882549 5.23057
| 29 8 37
|
1 | 16.913 22.4286 19
| 3.46296 6.441623 5.446712
| 23 14 37
|
Total | 19.8269 24.7727 21.2973
| 4.743297 6.611187 5.785503
| 52 22 74
----------------------------------------
Click to Open File: Myfile.doc
Alternatively, one could use the community-contributed command tabout:
. tabout expensive foreign using table1.txt, c(mean mpg) sum replace
Table output written to: table1.txt
Car type
Domestic Foreign Total
Mean mpg Mean mpg Mean mpg
expensive
0 22.1 28.9 23.6
1 16.9 22.4 19.0
Total 19.8 24.8 21.3
. tabout expensive foreign using table2.txt, c(sd mpg) sum replace
Table output written to: table2.txt
Car type
Domestic Foreign Total
Sd mpg Sd mpg Sd mpg
expensive
0 4.4 4.9 5.2
1 3.5 6.4 5.4
Total 4.7 6.6 5.8
. tabout expensive foreign using table3.txt, c(count mpg) sum replace
Table output written to: table3.txt
Car type
Domestic Foreign Total
Count mpg Count mpg Count mpg
expensive
0 29.0 8.0 37.0
1 23.0 14.0 37.0
Total 52.0 22.0 74.0
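An easy solution is to use collapse to get a dataset that reproduces the cells of your desired table (without the row and column totals), and then export the dataset as a CSV file.
For example:
collapse (mean) mean_mpg=mpg (sd) sd_mpg=mpg (count) n_mpg=mpg, by(expensive foreign)
and then
export delimited using mydata.csv, replace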
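If neither community-contributed command is available, the same summary, including the totals that collapse leaves out, can be produced outside Stata. A minimal Python sketch (standard library only), assuming the data have already been exported row by row: it computes the mean, sample SD (n - 1 denominator, as Stata's sd uses) and count for every cell, adds row, column and grand totals, and writes the result as CSV. The records below are hypothetical stand-ins, not auto.dta, and `two_way_summary` is an illustrative name.

```python
import csv
import io
import statistics


def two_way_summary(records, row_var, col_var, value_var):
    """Mean, SD and count of value_var per row_var x col_var cell,
    plus row totals, column totals and a grand total (labelled "Total")."""
    cells = {}
    for rec in records:
        r, c, v = rec[row_var], rec[col_var], rec[value_var]
        # Each observation feeds its cell, its row total, its column total
        # and the grand total.
        for key in ((r, c), (r, "Total"), ("Total", c), ("Total", "Total")):
            cells.setdefault(key, []).append(v)
    out = []
    for (r, c), vals in sorted(cells.items(), key=lambda kv: (str(kv[0][0]), str(kv[0][1]))):
        sd = statistics.stdev(vals) if len(vals) > 1 else float("nan")
        out.append({row_var: r, col_var: c, "mean": statistics.mean(vals),
                    "sd": sd, "n": len(vals)})
    return out


# Hypothetical records standing in for the exported data.
records = [
    {"expensive": 0, "foreign": "Domestic", "mpg": 22},
    {"expensive": 0, "foreign": "Domestic", "mpg": 24},
    {"expensive": 0, "foreign": "Foreign", "mpg": 30},
    {"expensive": 1, "foreign": "Domestic", "mpg": 16},
    {"expensive": 1, "foreign": "Foreign", "mpg": 20},
    {"expensive": 1, "foreign": "Foreign", "mpg": 24},
]
table = two_way_summary(records, "expensive", "foreign", "mpg")

buf = io.StringIO()  # swap for open("mydata.csv", "w", newline="") to write a file
writer = csv.DictWriter(buf, fieldnames=["expensive", "foreign", "mean", "sd", "n"])
writer.writeheader()
writer.writerows(table)
print(buf.getvalue())
```

Appending each observation to its row, column and grand-total buckets in the same pass keeps the totals consistent with the cells by construction.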