Frequency table with group variable - stata

I have a dataset with firm-level data.
I have a variable employees (an integer) and a variable nace2 (an integer indicating the industry or service sector the company belongs to).
I have created a third variable for grouping employees:
gen employees_cat = .
replace employees_cat = 1 if employees >=0 & employees<10
replace employees_cat = 2 if employees >=10 & employees<20
replace employees_cat = 3 if employees >=20 & employees<49
replace employees_cat = 4 if employees >=49 & employees<249
replace employees_cat = 5 if employees >=249
I would like to create a frequency table showing how many employees work in every nace2 sector per employees_cat.
As a reproducible example take
sysuse auto.dta
Let's try to get a frequency table of mileage (mpg) for domestic and foreign cars by trunk space (11, 12, 16, and so on).

The starting point for frequency tabulations in Stata is tabulate, which can show one- and two-way breakdowns. Used with the by: prefix, multi-way breakdowns can be produced as a series of two-way tables. See also table.
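As a quick orientation, here is a minimal sketch of those commands on the auto data (the variable choices follow the example in the question):
sysuse auto, clear
* one-way and two-way breakdowns
tabulate trunk
tabulate trunk mpg
* a multi-way breakdown as a series of two-way tables via the by: prefix
bysort foreign: tabulate trunk mpg
* table is a flexible alternative (cell frequencies by default)
table trunk mpg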
With the variables you mention in the auto data, there are 21 distinct values for mpg and 18 for trunk, so a two-way table would be 21 x 18 (or 18 x 21) with many empty cells, as the number of observations (74) is much smaller than the product (378). (Distinct values are counted here with the user-written command distinct; type search distinct in Stata for literature references and the latest version to download.)
. sysuse auto, clear
(1978 Automobile Data)
. distinct mpg trunk
------------------------------
| total distinct
-------+----------------------
mpg | 74 21
trunk | 74 18
------------------------------
One way around this problem is to collapse the tabulation into a list with typical entry {row variable, column variable, frequency information}. This is offered by the user-written program groups, which must be installed first, as here:
. ssc inst groups
. groups trunk mpg
+-------------------------------+
| trunk mpg Freq. Percent |
|-------------------------------|
| 5 28 1 1.35 |
| 6 23 1 1.35 |
| 7 18 1 1.35 |
| 7 24 2 2.70 |
| 8 21 1 1.35 |
|-------------------------------|
| 8 24 1 1.35 |
| 8 26 1 1.35 |
| 8 30 1 1.35 |
| 8 35 1 1.35 |
| 9 22 1 1.35 |
|-------------------------------|
| 9 28 1 1.35 |
| 9 29 1 1.35 |
| 9 31 1 1.35 |
| 10 21 1 1.35 |
| 10 24 1 1.35 |
|-------------------------------|
| 10 25 1 1.35 |
| 10 26 2 2.70 |
| 11 17 1 1.35 |
| 11 18 1 1.35 |
| 11 22 1 1.35 |
|-------------------------------|
| 11 23 1 1.35 |
| 11 28 1 1.35 |
| 11 30 1 1.35 |
| 11 34 1 1.35 |
| 11 35 1 1.35 |
|-------------------------------|
| 12 22 1 1.35 |
| 12 23 1 1.35 |
| 12 25 1 1.35 |
| 13 19 3 4.05 |
| 13 21 1 1.35 |
|-------------------------------|
| 14 14 1 1.35 |
| 14 17 1 1.35 |
| 14 18 1 1.35 |
| 14 19 1 1.35 |
| 15 14 1 1.35 |
|-------------------------------|
| 15 17 1 1.35 |
| 15 18 1 1.35 |
| 15 25 1 1.35 |
| 15 41 1 1.35 |
| 16 14 3 4.05 |
|-------------------------------|
| 16 18 1 1.35 |
| 16 19 3 4.05 |
| 16 20 2 2.70 |
| 16 21 1 1.35 |
| 16 22 1 1.35 |
|-------------------------------|
| 16 25 1 1.35 |
| 17 16 3 4.05 |
| 17 18 1 1.35 |
| 17 19 1 1.35 |
| 17 20 1 1.35 |
|-------------------------------|
| 17 22 1 1.35 |
| 17 25 1 1.35 |
| 18 12 1 1.35 |
| 20 14 1 1.35 |
| 20 15 1 1.35 |
|-------------------------------|
| 20 16 1 1.35 |
| 20 18 2 2.70 |
| 20 21 1 1.35 |
| 21 17 1 1.35 |
| 21 18 1 1.35 |
|-------------------------------|
| 22 12 1 1.35 |
| 23 15 1 1.35 |
+-------------------------------+
groups has many more options, which are documented in its help. It also extends easily to multi-way tabulations collapsed to lists, as here with a third grouping variable:
. groups foreign trunk mpg, sepby(foreign trunk)
+------------------------------------------+
| foreign trunk mpg Freq. Percent |
|------------------------------------------|
| Domestic 7 18 1 1.35 |
| Domestic 7 24 2 2.70 |
|------------------------------------------|
| Domestic 8 26 1 1.35 |
| Domestic 8 30 1 1.35 |
|------------------------------------------|
| Domestic 9 22 1 1.35 |
| Domestic 9 28 1 1.35 |
| Domestic 9 29 1 1.35 |
|------------------------------------------|
| Domestic 10 21 1 1.35 |
| Domestic 10 24 1 1.35 |
| Domestic 10 26 1 1.35 |
|------------------------------------------|
| Domestic 11 17 1 1.35 |
| Domestic 11 22 1 1.35 |
| Domestic 11 28 1 1.35 |
| Domestic 11 34 1 1.35 |
|------------------------------------------|
| Domestic 12 22 1 1.35 |
|------------------------------------------|
| Domestic 13 19 3 4.05 |
| Domestic 13 21 1 1.35 |
|------------------------------------------|
| Domestic 14 19 1 1.35 |
|------------------------------------------|
| Domestic 15 14 1 1.35 |
| Domestic 15 18 1 1.35 |
|------------------------------------------|
| Domestic 16 14 3 4.05 |
| Domestic 16 18 1 1.35 |
| Domestic 16 19 3 4.05 |
| Domestic 16 20 2 2.70 |
| Domestic 16 22 1 1.35 |
|------------------------------------------|
| Domestic 17 16 3 4.05 |
| Domestic 17 18 1 1.35 |
| Domestic 17 19 1 1.35 |
| Domestic 17 20 1 1.35 |
| Domestic 17 22 1 1.35 |
| Domestic 17 25 1 1.35 |
|------------------------------------------|
| Domestic 18 12 1 1.35 |
|------------------------------------------|
| Domestic 20 14 1 1.35 |
| Domestic 20 15 1 1.35 |
| Domestic 20 16 1 1.35 |
| Domestic 20 18 2 2.70 |
| Domestic 20 21 1 1.35 |
|------------------------------------------|
| Domestic 21 17 1 1.35 |
| Domestic 21 18 1 1.35 |
|------------------------------------------|
| Domestic 22 12 1 1.35 |
|------------------------------------------|
| Domestic 23 15 1 1.35 |
|------------------------------------------|
| Foreign 5 28 1 1.35 |
|------------------------------------------|
| Foreign 6 23 1 1.35 |
|------------------------------------------|
| Foreign 8 21 1 1.35 |
| Foreign 8 24 1 1.35 |
| Foreign 8 35 1 1.35 |
|------------------------------------------|
| Foreign 9 31 1 1.35 |
|------------------------------------------|
| Foreign 10 25 1 1.35 |
| Foreign 10 26 1 1.35 |
|------------------------------------------|
| Foreign 11 18 1 1.35 |
| Foreign 11 23 1 1.35 |
| Foreign 11 30 1 1.35 |
| Foreign 11 35 1 1.35 |
|------------------------------------------|
| Foreign 12 23 1 1.35 |
| Foreign 12 25 1 1.35 |
|------------------------------------------|
| Foreign 14 14 1 1.35 |
| Foreign 14 17 1 1.35 |
| Foreign 14 18 1 1.35 |
|------------------------------------------|
| Foreign 15 17 1 1.35 |
| Foreign 15 25 1 1.35 |
| Foreign 15 41 1 1.35 |
|------------------------------------------|
| Foreign 16 21 1 1.35 |
| Foreign 16 25 1 1.35 |
+------------------------------------------+
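Translating back to the firm data in the question, the analogous calls would be along these lines (a sketch that assumes employees_cat and nace2 are defined as in the question):
* list-style tabulation: one row per nace2 sector and size class, with counts
groups nace2 employees_cat, sepby(nace2)
* or a conventional two-way frequency table using built-in tabulate
tabulate nace2 employees_cat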

Related

How can I combine categories?

I have a variable fruit with the following categories:
1
2
3
4
5
6
7
8
9
10
20
25
I want to collapse these as below:
1
2
3
4
5+
How can I do this?
Consider your example:
clear
input fruit
1
2
3
4
5
6
7
8
9
10
20
25
end
tabulate fruit
fruit | Freq. Percent Cum.
------------+-----------------------------------
1 | 1 8.33 8.33
2 | 1 8.33 16.67
3 | 1 8.33 25.00
4 | 1 8.33 33.33
5 | 1 8.33 41.67
6 | 1 8.33 50.00
7 | 1 8.33 58.33
8 | 1 8.33 66.67
9 | 1 8.33 75.00
10 | 1 8.33 83.33
20 | 1 8.33 91.67
25 | 1 8.33 100.00
------------+-----------------------------------
Total | 12 100.00
The following works for me:
replace fruit = 5 if fruit >= 5
tabulate fruit
fruit | Freq. Percent Cum.
------------+-----------------------------------
1 | 1 8.33 8.33
2 | 1 8.33 16.67
3 | 1 8.33 25.00
4 | 1 8.33 33.33
5 | 8 66.67 100.00
------------+-----------------------------------
Total | 12 100.00
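One caveat: in Stata a missing value is larger than any number, so fruit >= 5 is also true where fruit is missing; with real data you may want replace fruit = 5 if fruit >= 5 & !missing(fruit). If you would rather keep the original variable intact, here is a sketch using recode into a new variable (the new names are purely illustrative):
* collapse 5 and above into a single category, leaving fruit unchanged
recode fruit (5/max = 5), generate(fruit_cat)
* label the top category so tables read "5+"
label define fruitcat 5 "5+"
label values fruit_cat fruitcat
tabulate fruit_cat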

After appending, I get null values in primary table headers

I have a table that I want to use as headers for another table that just has data. I used Append as New in Power BI, with the headers table as primary and the data table as secondary. All the columns from the primary table have null values, and the data table is appended next to the header columns.
Eg:
Table 1 ( Headers)
+-----+-----+-----+-----+
| ABC | DEF | IGH | KLM |
+-----+-----+-----+-----+
Table 2 ( Data )
+----+----+----+----+
| 1 | 2 | 3 | 4 |
| 6 | 7 | 8 | 9 |
| 11 | 12 | 13 | 14 |
| 16 | 17 | 18 | 19 |
| 21 | 22 | 23 | 24 |
| 26 | 27 | 28 | 29 |
| 31 | 32 | 33 | 34 |
+----+----+----+----+
Table I am getting after append:
+------+------+------+------+------+------+------+------+
| ABC | DEF | IGH | KLM | null | null | null | null |
+------+------+------+------+------+------+------+------+
| null | null | null | null | 1 | 2 | 3 | 4 |
| null | null | null | null | 6 | 7 | 8 | 9 |
| null | null | null | null | 11 | 12 | 13 | 14 |
| null | null | null | null | 16 | 17 | 18 | 19 |
| null | null | null | null | 21 | 22 | 23 | 24 |
| null | null | null | null | 26 | 27 | 28 | 29 |
| null | null | null | null | 31 | 32 | 33 | 34 |
+------+------+------+------+------+------+------+------+
Table I need:
+-----+-----+-----+-----+
| ABC | DEF | IGH | KLM |
+-----+-----+-----+-----+
| 1 | 2 | 3 | 4 |
| 6 | 7 | 8 | 9 |
| 11 | 12 | 13 | 14 |
| 16 | 17 | 18 | 19 |
| 21 | 22 | 23 | 24 |
| 26 | 27 | 28 | 29 |
| 31 | 32 | 33 | 34 |
+-----+-----+-----+-----+
I used Append as New in Power BI, with the headers table (Table 1) as primary, and appended Table 2 to it.
The formula bar shows:
= Table.Combine({Table 1, Table 2})
And this is in the Advanced Editor:
let
Source = Table.Combine({Sheet1, InterviewQn})
in
Source
Expected result:
+-----+-----+-----+-----+
| ABC | DEF | IGH | KLM |
+-----+-----+-----+-----+
| 1 | 2 | 3 | 4 |
| 6 | 7 | 8 | 9 |
| 11 | 12 | 13 | 14 |
| 16 | 17 | 18 | 19 |
| 21 | 22 | 23 | 24 |
| 26 | 27 | 28 | 29 |
| 31 | 32 | 33 | 34 |
+-----+-----+-----+-----+
OR
+-----+-----+-----+-----+
| ABC | DEF | IGH | KLM |
| 1 | 2 | 3 | 4 |
| 6 | 7 | 8 | 9 |
| 11 | 12 | 13 | 14 |
| 16 | 17 | 18 | 19 |
| 21 | 22 | 23 | 24 |
| 26 | 27 | 28 | 29 |
| 31 | 32 | 33 | 34 |
+-----+-----+-----+-----+
If you're only trying to rename the columns of Table 2, using the column names of Table 1, then it's simply:
= Table.RenameColumns(#"Table 2", List.Zip({Table.ColumnNames(#"Table 2"), Table.ColumnNames(#"Table 1")}))
See https://pwrbi.com/so_55529969/ for a worked-example PBIX file.

Power BI DAX to filter common items A & B share

Sample data:
| Vendor | Size Group | Model | Quantity | Cost | TAT | Posting Date |
|--------|------------|-------|----------|-------|-----|-------------------|
| A | S | A150 | 150 | 450 | 67 | July 7, 2018 |
| A | M | A200 | 250 | 1500 | 75 | June 22, 2018 |
| A | M | A150 | 25 | 8500 | 85 | July 9, 2018 |
| C | L | A200 | 350 | 1250 | 125 | March 5, 2018 |
| C | XL | A500 | 150 | 6500 | 45 | February 20, 2018 |
| A | M | A900 | 385 | 475 | 40 | January 29, 2018 |
| A | M | A150 | 650 | 45 | 45 | August 31, 2018 |
| D | M | A150 | 65 | 7500 | 15 | April 10, 2018 |
| D | M | A300 | 140 | 3420 | 10 | April 3, 2018 |
| E | S | A150 | 20 | 10525 | 85 | January 3, 2018 |
| B | S | A150 | 30 | 10500 | 40 | June 3, 2018 |
| B | S | A150 | 450 | 450 | 64 | April 3, 2018 |
| E | XS | A900 | 45 | 75 | 60 | January 3, 2018 |
| F | M | A900 | 95 | 655 | 175 | January 3, 2018 |
| D | XL | A300 | 15 | 21500 | 25 | January 3, 2018 |
| D | S | A500 | 450 | 65 | 25 | May 3, 2018 |
| A | M | A350 | 250 | 450 | 22 | January 3, 2018 |
| B | S | A150 | 45 | 8500 | 28 | January 3, 2018 |
| A | S | A300 | 550 | 650 | 128 | January 3, 2018 |
| C | M | A150 | 1500 | 855 | 190 | January 3, 2018 |
| B | M | A150 | 65 | 1750 | 41 | January 3, 2018 |
| A | L | A500 | 75 | 1700 | 24 | January 3, 2018 |
| B | S | A900 | 55 | 9800 | 37 | May 29, 2018 |
| B | M | A500 | 150 | 850 | 83 | April 18, 2018 |
In the provided sample, the Size Groups that vendors A and B share are S and M. I was hoping to display those Size Groups as the legend and average Cost as the value in a clustered column chart.
Can anyone please advise how I can go about this?
Thank you!!!

Sort by two or more variables

I'm trying to sort by ID and then by Date.
What I have:
| ID | Date |
| ----------------------|
| 112 | 2013-01-01 |
| 112 | 2013-01-15 |
| 113 | 2012-01-01 |
| 112 | 2014-02-13 |
| 112 | 2013-01-02 |
| 113 | 2011-01-11 |
What I need:
| ID | Date |
| ----------------------|
| 112 | 2013-01-01 |
| 112 | 2013-01-02 |
| 112 | 2013-01-15 |
| 112 | 2014-02-13 |
| 113 | 2011-01-11 |
| 113 | 2012-01-01 |
My problem is that I only know how to sort by ID or Date.
More generally:
clear
input id foo
1 56
1 34
2 13
1 67
1 22
2 89
2 61
2 76
end
sort id (foo)
list, sepby(id)
+----------+
| id foo |
|----------|
1. | 1 22 |
2. | 1 34 |
3. | 1 56 |
4. | 1 67 |
|----------|
5. | 2 13 |
6. | 2 61 |
7. | 2 76 |
8. | 2 89 |
+----------+
In a more advanced programming context you can use the same syntax with bysort.
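For instance, here is a minimal sketch of that syntax with bysort on the same toy data, where the parenthesized variable controls the sort order within groups without defining the groups themselves (the new variable name is purely illustrative):
* within each id, sort by foo and record the smallest foo
bysort id (foo): generate min_foo = foo[1]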

Conditionally create new observations

I have data in the following format (there are a lot more variables):
year ID Dummy
1495 65 1
1496 65 1
1501 65 1
1502 65 1
1520 65 0
1522 65 0
What I am trying to achieve is to conditionally create new observations that fill in the data between two points in time, conditional on a dummy. If the dummy is equal to 1, the years in between are supposed to be filled in; if it is equal to 0, they should not be.
For example:
year ID Dummy
1495 65 1
1496 65 1
1497 65 1
1498 65 1
.
.
1501 65 1
1502 65 1
1503 65 1
1504 65 1
.
.
.
1520 65 0
1522 65 0
Here's one way to do this:
clear
input year id dummy
1495 65 1
1496 65 1
1501 65 1
1502 65 1
1520 65 0
1522 65 0
end
* tag observations with dummy == 1 whose year differs from the next observation's
generate tag = year[_n] != year[_n+1] & dummy == 1
* (negative) number of years to the next observation
generate delta = year[_n] - year[_n+1] if tag
* consecutive years need no filling
replace delta = . if abs(delta) == 1
* make one copy of each tagged observation per year to be covered
expand abs(delta) if tag & delta != .
sort year
* number the copies within each duplicated year: 1, 2, ...
bysort year: egen seq = seq() if delta != .
replace seq = seq - 1
replace seq = 0 if seq == .
* shift each copy forward by its (zero-based) sequence number
replace year = year + seq if year != .
drop tag delta seq
The above code snippet will produce:
list
+-------------------+
| year id dummy |
|-------------------|
1. | 1495 65 1 |
2. | 1496 65 1 |
3. | 1497 65 1 |
4. | 1498 65 1 |
5. | 1499 65 1 |
|-------------------|
6. | 1500 65 1 |
7. | 1501 65 1 |
8. | 1502 65 1 |
9. | 1503 65 1 |
10. | 1504 65 1 |
|-------------------|
11. | 1505 65 1 |
12. | 1506 65 1 |
13. | 1507 65 1 |
14. | 1508 65 1 |
15. | 1509 65 1 |
|-------------------|
16. | 1510 65 1 |
17. | 1511 65 1 |
18. | 1512 65 1 |
19. | 1513 65 1 |
20. | 1514 65 1 |
|-------------------|
21. | 1515 65 1 |
22. | 1516 65 1 |
23. | 1517 65 1 |
24. | 1518 65 1 |
25. | 1519 65 1 |
|-------------------|
26. | 1520 65 0 |
27. | 1522 65 0 |
+-------------------+
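For comparison, here is a shorter sketch of the same idea, computing the gap to the next year directly and expanding on it. It assumes, as in the example, that each year appears at most once per id and that only observations with dummy == 1 should be filled forward:
clear
input year id dummy
1495 65 1
1496 65 1
1501 65 1
1502 65 1
1520 65 0
1522 65 0
end
* gap (in years) to the next observation within id, only where dummy == 1
bysort id (year): generate gap = year[_n+1] - year if dummy == 1
* duplicate each observation once per year in the gap
expand gap if gap > 1 & !missing(gap)
* walk forward through the copies, bumping each year one past the previous
bysort id (year): replace year = year[_n-1] + 1 if _n > 1 & year <= year[_n-1]
drop gap
list, sepby(id)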