Formatting a Stata table like a table in SAS

I have a 3-way table in Stata that looks like this:
I would like to format this 3-way crosstab like a table in SAS that looks like this:
The actual output in the table isn't important; I just want to know how I can change the formatting of the Stata table. Any help is appreciated!

The groups command from the Stata Journal will get you most of the way. This reproducible example doesn't exhaust the possibilities.
. webuse nlswork, clear
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. groups union race , show(f F p P) sepby(union)
  +--------------------------------------------------+
  | union    race   Freq.     #<=   Percent      %<= |
  |--------------------------------------------------|
  |     0   white   10777   10777     56.02    56.02 |
  |     0   black    3784   14561     19.67    75.69 |
  |     0   other     167   14728      0.87    76.56 |
  |--------------------------------------------------|
  |     1   white    2817   17545     14.64    91.20 |
  |     1   black    1649   19194      8.57    99.77 |
  |     1   other      44   19238      0.23   100.00 |
  +--------------------------------------------------+
The command must be installed before you can use it. groups is a lousy search term, but this search will find the 2017 write-up and later updates of the software (at the time of writing, just one in 2018).
. search st0496, entry
Search of official help files, FAQs, Examples, and Stata Journals
SJ-18-1 st0496_1 . . . . . . . . . . . . . . . . . Software update for groups
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q1/18 SJ 18(1):291
groups exited with an error message if weights were specified;
this has been corrected
SJ-17-3 st0496 . . . . . Speaking Stata: Tables as lists: The groups command
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q3/17 SJ 17(3):760--773
presents command for listing group frequencies and percents and
cumulations thereof; for various subsetting and ordering by
frequencies, percents, and so on; for reordering of columns;
and for saving tabulated data to new datasets
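To make the four columns requested by show(f F p P) concrete, here is a minimal sketch in Python (not Stata) that recomputes them from the frequencies printed above. The function name frequency_table and the dictionary layout are illustrative assumptions and have nothing to do with how groups is implemented.

```python
from itertools import accumulate

# Frequencies copied from the groups output above, in display order.
counts = {
    (0, "white"): 10777,
    (0, "black"): 3784,
    (0, "other"): 167,
    (1, "white"): 2817,
    (1, "black"): 1649,
    (1, "other"): 44,
}


def frequency_table(counts):
    """Return rows of (key, freq, cum_freq, percent, cum_percent)."""
    total = sum(counts.values())
    cum_freqs = accumulate(counts.values())
    rows = []
    for (key, freq), cum in zip(counts.items(), cum_freqs):
        rows.append((key, freq, cum,
                     round(100 * freq / total, 2),
                     round(100 * cum / total, 2)))
    return rows


table = frequency_table(counts)
for row in table:
    print(row)
```

The percentages use the total of the listed cells (19238) as the denominator, which is why the last cumulative percent is 100.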

Related

Constraining for each observation with same ID

I have a dataset that I have spent a while reshaping from wide to long format.
We're following about 1,000 patients, each with 1-5 aneurysms, some or all of which are treated with one of the available treatments. A patient can have two aneurysms where one is treated with treatment A and the other with treatment B.
Here's an example of the data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str32 record_id float treatmentChoice_ byte(treatment_1 treatment_3) float aneurysm_id
"007128de18ce5cb1635b8f27c5435ff3" . . . 1
"007128de18ce5cb1635b8f27c5435ff3" . . . 2
"00abd7bdb6283dd0ac6b97271608a122" . 2 . 1
"00abd7bdb6283dd0ac6b97271608a122" . . . 2
"0142103f84693c6eda416dfc55f65de1" . 1 . 1
"0142103f84693c6eda416dfc55f65de1" . . . 2
"0153826d93a58d7e1837bb98a3c21ba8" . . . 1
"0153826d93a58d7e1837bb98a3c21ba8" . . . 2
"01c729ac4601e36f245fd817d8977917" . 1 . 1
"01c729ac4601e36f245fd817d8977917" . . . 2
"01dd90093fbf201a1f357e22eaff6b6a" . . . 1
"01dd90093fbf201a1f357e22eaff6b6a" . 1 . 2
"0208e14dcabc43dd2b57e2e8b117de4d" . . . 1
"0208e14dcabc43dd2b57e2e8b117de4d" . 1 . 2
"0210f575075e5def7ffa77530ce17ef0" . . . 1
"0210f575075e5def7ffa77530ce17ef0" . . . 2
"022cc7a9397e81cf58cd9111f9d1db0d" . . . 1
"022cc7a9397e81cf58cd9111f9d1db0d" . . . 2
"02afd543116a22fc7430620727b20bb5" . 2 . 1
"02afd543116a22fc7430620727b20bb5" . . . 2
"0303ef0bd5d256cca1c836e2b70415ac" . . . 1
"0303ef0bd5d256cca1c836e2b70415ac" . 1 . 2
"041b2b0cac589d6e3b65bb924803cf1a" . . . 1
"041b2b0cac589d6e3b65bb924803cf1a" . . . 2
"0536317a2bbb936e85c3eb8294b076da" . . . 1
"0536317a2bbb936e85c3eb8294b076da" . 1 . 2
"06161d4668f217937cac0ac033d8d199" . . . 1
"06161d4668f217937cac0ac033d8d199" . . . 2
"065e151f8bcebb27fabf8b052fd70566" . . 1 1
"065e151f8bcebb27fabf8b052fd70566" . . . 2
"065e151f8bcebb27fabf8b052fd70566" . . . 3
"065e151f8bcebb27fabf8b052fd70566" . . . 4
"07196414cd6bf89d94a33e149983d102" . . . 1
"07196414cd6bf89d94a33e149983d102" . . . 2
"0721c38f8275dab504fc53aebcc005ce" . . . 1
"0721c38f8275dab504fc53aebcc005ce" . . . 2
"0721c38f8275dab504fc53aebcc005ce" . . . 3
"0721c38f8275dab504fc53aebcc005ce" 1 . . 4
"07bef516d53279a3f5e477d56d552a2b" . . . 1
"07bef516d53279a3f5e477d56d552a2b" . 2 . 2
"08678829b7e0ee6a01b17974b4d19cfa" . . . 1
"08678829b7e0ee6a01b17974b4d19cfa" . . . 2
"08bb6c65e63c499ea19ac24d5113dd94" . . . 1
"08bb6c65e63c499ea19ac24d5113dd94" . . . 2
"08f036417500c332efd555c76c4654a0" . . . 1
"08f036417500c332efd555c76c4654a0" . . . 2
"090c54d021b4b21c7243cec01efbeb91" . . . 1
"090c54d021b4b21c7243cec01efbeb91" . . . 2
"09166bb44e4c5cdb8f40d402f706816e" . . . 1
"09166bb44e4c5cdb8f40d402f706816e" . 1 . 2
"0930159addcdc35e7dc18812522d4377" . . . 1
"0930159addcdc35e7dc18812522d4377" . . . 2
"096844af91d2e266767775b0bee9105e" . . . 1
"096844af91d2e266767775b0bee9105e" . 2 . 2
"09884af1bb9d59803de0c74d6df57c23" . . . 1
"09884af1bb9d59803de0c74d6df57c23" . 2 . 2
"09e03748da35e9d799dc5d8ddf1909b5" . . . 1
"09e03748da35e9d799dc5d8ddf1909b5" . . . 2
"0a4ce4a7941ff6d1f5c217bf5a9a3bf9" . . . 1
"0a4ce4a7941ff6d1f5c217bf5a9a3bf9" . . . 2
"0a5db40dc58e97927b407c9210aab7ba" 4 . . 1
"0a5db40dc58e97927b407c9210aab7ba" . . . 2
"0a73c992955231650965ed87e3bd52f6" . . . 1
"0a73c992955231650965ed87e3bd52f6" . 2 . 2
"0a84ab77fff74c247a525dfde8ce988c" 1 . 2 1
"0a84ab77fff74c247a525dfde8ce988c" . . . 2
"0a84ab77fff74c247a525dfde8ce988c" . . . 3
"0af333ae400f75930125bb0585f0dcf5" . . . 1
"0af333ae400f75930125bb0585f0dcf5" . . . 2
"0af73334d9d2166191f3385de48f15d2" . 1 . 1
"0af73334d9d2166191f3385de48f15d2" . . . 2
"0b341ac8f396a8cdb88b7c658f66f653" . . . 1
"0b341ac8f396a8cdb88b7c658f66f653" . . . 2
"0b35cf4beb830b361d7c164371f25149" . 1 . 1
"0b35cf4beb830b361d7c164371f25149" . . . 2
"0b3e110c9765e14a5c41fadcc3cfc300" . . . 1
"0b6681f0f441e69c26106ab344ac0733" . . . 1
"0b6681f0f441e69c26106ab344ac0733" . . . 2
"0b8d8253a8415275dbc2619e039985bb" 4 . . 1
"0b8d8253a8415275dbc2619e039985bb" . . . 2
"0b8d8253a8415275dbc2619e039985bb" . . . 3
"0b92c26375117bf42945c04d8d6573d4" . 2 . 1
"0b92c26375117bf42945c04d8d6573d4" . . . 2
"0ba961f437f43105c357403c920bdef1" . . . 1
"0ba961f437f43105c357403c920bdef1" . . . 2
"0bb601fabe1fdfa794a5272408997a2f" . . . 1
"0bb601fabe1fdfa794a5272408997a2f" . . . 2
"0c75b36e91363d596dc46bd563c3f5ef" . 1 . 1
"0c75b36e91363d596dc46bd563c3f5ef" . . . 2
"0d461328a3bae7164ce7d3a10f366812" . . . 1
"0d461328a3bae7164ce7d3a10f366812" . 2 . 2
"0d4cc4eb459301a804cbef22914f44a3" . 1 . 1
"0d4cc4eb459301a804cbef22914f44a3" . . . 2
"0d4e29e11bb94e922112089f3fec61ef" . . . 1
"0d4e29e11bb94e922112089f3fec61ef" . 1 . 2
"0d513c74d667f55c8f4a9836c304149c" . 1 . 1
"0d513c74d667f55c8f4a9836c304149c" . . . 2
"0da25de126bb3b3ee565eff8888004c2" . . . 1
"0da25de126bb3b3ee565eff8888004c2" . 1 . 2
"0db9ae1f2201577f431b7603d0819fa6" . . . 1
end
label values treatment_1 treatment_1_
label def treatment_1_ 1 "Observation", modify
label def treatment_1_ 2 "Afsluttet", modify
label values treatment_3 treatment_3_
label def treatment_3_ 1 "Observation", modify
label def treatment_3_ 2 "Afsluttet", modify
As you can see, in this example there are three different treatments, and I have sorted the observations by record_id (patient). Notice that each patient (record_id) can appear more than once. In fact, I have expanded the dataset so that a patient with 4 aneurysms has 4 observations, because the statistics are based on aneurysms, not patients.
My problem is that it's seemingly random which of these observations records the treatment each aneurysm got, and I would like to add a variable treatment that lists the treatment for the corresponding aneurysm ID. Also note that treatmentChoice_ means "which treatment did aneurysm 1 get?" and treatmentChoice_1 means "which treatment did aneurysm 2 get?"
Is there a way to perhaps say:
"For each record_id that is identical, look through treatmentChoice_ and set treatment to that value if aneurysm ID is 1. Then do the same for treatmentChoice_1 and treatmentChoice_3, and set treatment to their value if aneurysm ID is 2 or 3 respectively."
If I follow this correctly, you want to select the one non-missing value from several variables in each observation. For that you can use the functions max() or min(), which ignore missing arguments unless all of them are missing, or the egen functions rowmax() or rowmin().
With your example data (thanks), I got this. Note the two unlabelled values of 4.
. generate treatment = max(treatmentChoice_, treatment_1, treatment_3)
(73 missing values generated)
. label val treatment treatment_1_
. tab treatment
  treatment |      Freq.     Percent        Cum.
------------+-----------------------------------
Observation |         16       59.26       59.26
  Afsluttet |          9       33.33       92.59
          4 |          2        7.41      100.00
------------+-----------------------------------
      Total |         27      100.00
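The behaviour of max() relied on here can be sketched outside Stata. The following Python illustration uses a hypothetical helper, max_nonmissing, with None standing in for Stata's missing value; it only mimics the rule "ignore missing values, return missing if everything is missing" and is not how Stata implements the function.

```python
def max_nonmissing(*values):
    """Return the largest non-missing value, or None if all are missing."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# Three observations in the shape of the example data:
# (treatmentChoice_, treatment_1, treatment_3)
rows = [
    (None, 2, None),     # only treatment_1 is recorded
    (1, None, 2),        # two values recorded: the larger one is kept
    (None, None, None),  # nothing recorded
]

treatment = [max_nonmissing(*row) for row in rows]
print(treatment)
```

The second row shows a caveat: where an observation holds more than one non-missing value, the largest one is kept, so it is worth checking how often that happens before relying on the result.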

Calculate compounded annual growth rates

I have a panel dataset with values of companies between 2006-2015.
This looks something like the example below:
I want to calculate three-year compounded annual growth rates:
2006-2009
2007-2010
...
2012-2015
I have already tried to use the following command:
bys tina: generate SalesGrowth=(Sales/L3.Sales)^(1/3) - 1 if mod(ano, 5) == 0
However, although Stata generates the new variable, all values are missing.
As an alternative to the compounded annual growth rate, I could simply use a growth rate based on the 2009 and 2006 data. But the same problem arises: every value of the new variable is missing.
Consider this toy example:
clear
input tina ano Sales
500000069 2006 15000
500000069 2007 17000
500000069 2008 19000
500000069 2009 24000
500000069 2010 22000
500000069 2011 28000
500000069 2012 26000
500000069 2013 29000
500000069 2014 31000
500000069 2015 33000
500000087 2006 40000
500000087 2007 42000
500000087 2008 44000
500000087 2009 46000
500000087 2010 48000
500000087 2011 50000
500000087 2012 52000
500000087 2013 54000
500000087 2014 56000
500000087 2015 58000
end
format tina %9.0f
The following solution:
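* r(N) is left behind by the last company summarized (10 years here),
* so this assumes every company has the same number of years
bysort tina: summarize ano
forvalues i = 1 / `= `r(N)' - 3' {
    * growth from the i-th to the (i+3)-th year within each company
    bysort tina (ano): generate SalesGrowth`i' = (Sales[`i'+3]/Sales[`i'])^(1/3) - 1
    * keep the result only in the observation for the end year
    bysort tina (ano): replace SalesGrowth`i' = . if ano != ano[`i'+3]
}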
gives the three-year compounded annual growth rates you need:
. list
     +-------------------------------------------------------------------------------------------------------+
     |      tina    ano   Sales   SalesG~1   SalesG~2   SalesG~3   SalesG~4   SalesG~5   SalesG~6   SalesG~7 |
     |-------------------------------------------------------------------------------------------------------|
  1. | 500000064   2006   15000          .          .          .          .          .          .          . |
  2. | 500000064   2007   17000          .          .          .          .          .          .          . |
  3. | 500000064   2008   19000          .          .          .          .          .          .          . |
  4. | 500000064   2009   24000   .1696071          .          .          .          .          .          . |
  5. | 500000064   2010   22000          .   .0897442          .          .          .          .          . |
     |-------------------------------------------------------------------------------------------------------|
  6. | 500000064   2011   28000          .          .   .1379805          .          .          .          . |
  7. | 500000064   2012   26000          .          .          .     .02704          .          .          . |
  8. | 500000064   2013   29000          .          .          .          .   .0964574          .          . |
  9. | 500000064   2014   31000          .          .          .          .          .   .0345097          . |
 10. | 500000064   2015   33000          .          .          .          .          .          .   .0827134 |
     |-------------------------------------------------------------------------------------------------------|
 11. | 500000096   2006   40000          .          .          .          .          .          .          . |
 12. | 500000096   2007   42000          .          .          .          .          .          .          . |
 13. | 500000096   2008   44000          .          .          .          .          .          .          . |
 14. | 500000096   2009   46000   .0476896          .          .          .          .          .          . |
 15. | 500000096   2010   48000          .   .0455159          .          .          .          .          . |
     |-------------------------------------------------------------------------------------------------------|
 16. | 500000096   2011   50000          .          .    .043532          .          .          .          . |
 17. | 500000096   2012   52000          .          .          .    .041714          .          .          . |
 18. | 500000096   2013   54000          .          .          .          .   .0400419          .          . |
 19. | 500000096   2014   56000          .          .          .          .          .   .0384988          . |
 20. | 500000096   2015   58000          .          .          .          .          .          .   .0370703 |
     +-------------------------------------------------------------------------------------------------------+
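As a cross-check on the arithmetic, the three-year compounded annual growth rate for an end year t is (Sales_t / Sales_{t-3})^(1/3) - 1. The following is a minimal Python sketch (not Stata) applied to the first company's sales from the toy example; the names cagr and rolling_cagr are hypothetical.

```python
def cagr(start, end, years):
    """Compounded annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


def rolling_cagr(series, span=3):
    """Map each end year to the growth rate over the preceding `span` years."""
    return {
        year: cagr(series[year - span], series[year], span)
        for year in sorted(series)
        if year - span in series
    }


# Sales of the first company in the toy example, keyed by year.
sales = {2006: 15000, 2007: 17000, 2008: 19000, 2009: 24000, 2010: 22000,
         2011: 28000, 2012: 26000, 2013: 29000, 2014: 31000, 2015: 33000}

growth = rolling_cagr(sales)
for year, rate in growth.items():
    # 2009 is (24000 / 15000) ** (1/3) - 1, about 0.1696
    print(year, round(rate, 7))
```

Ten years of data give seven three-year windows, ending in 2009 through 2015, which matches the seven SalesGrowth variables in the listing.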

Remove middle character from variable names

I have variable names ending with an underscore (_), followed by a year code:
clear
set obs 1
foreach var in age_58 age_64 age_75 age_184 age_93 age99 {
generate `var' = rnormal()
}
list
+----------------------------------------------------------------------+
| age_58 age_64 age_75 age_184 age_93 age99 |
|----------------------------------------------------------------------|
1. | .1162236 -.8781271 1.199268 -1.475732 .9077238 -.0858719 |
+----------------------------------------------------------------------+
I would like to rename them into:
age58 age64 age75 age184 age93 age99
I know I can do this by renaming one variable at a time as follows:
rename age_58 age58
rename age_64 age64
rename age_75 age75
rename age_184 age184
rename age_93 age93
How can I remove the underscore from all the variable names at once?
In Stata 13 and later versions, this can be done in one line using the built-in command rename.
One merely has to specify the relevant rules, which can include wildcard characters:
rename *_# *#
list
+----------------------------------------------------------------------+
| age58 age64 age75 age184 age93 age99 |
|----------------------------------------------------------------------|
1. | .1162236 -.8781271 1.199268 -1.475732 .9077238 -.0858719 |
+----------------------------------------------------------------------+
Type help rename group for details on the various available specifiers.
For Stata 8 up, the community-contributed command renvars offers a solution:
renvars age_*, subst(_)
For documentation and download, see
. search renvars, historical
Search of official help files, FAQs, Examples, SJs, and STBs
SJ-5-4 dm88_1 . . . . . . . . . . . . . . . . . Software update for renvars
(help renvars if installed) . . . . . . . . . N. J. Cox and J. Weesie
Q4/05 SJ 5(4):607
trimend() option added and help file updated
STB-60 dm88 . . . . . . . . Renaming variables, multiply and systematically
(help renvars if installed) . . . . . . . . . N. J. Cox and J. Weesie
3/01 pp.4--6; STB Reprints Vol 10, pp.41--44
renames variables by changing prefixes, postfixes, substrings,
or as specified by a user supplied rule
For the 2001 paper, see this .pdf file.
You can loop over the variables using the macro extended function subinstr:
foreach var of varlist * {
    local newname : subinstr local var "_" "", all
    if "`newname'" != "`var'" {
        rename `var' `newname'
    }
}
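The approaches above are not quite the same rule. As I read them, the rename pattern removes only an underscore that sits directly before a trailing number, whereas the subinstr loop with the all option removes every underscore. A minimal Python sketch (not Stata) of the two rules, using hypothetical helper names and regular expressions as a stand-in for Stata's specifiers:

```python
import re

names = ["age_58", "age_64", "age_75", "age_184", "age_93", "age99"]


def drop_underscore_before_digits(name):
    """Analogue of the pattern rule: remove '_' only before a trailing number."""
    return re.sub(r"_(\d+)$", r"\1", name)


def drop_all_underscores(name):
    """Analogue of the loop: remove every underscore in the name."""
    return name.replace("_", "")


renamed = [drop_underscore_before_digits(n) for n in names]
print(renamed)

# The two rules differ for names with more than one underscore.
# birth_year_58 is a made-up name used only for this comparison.
print(drop_underscore_before_digits("birth_year_58"))
print(drop_all_underscores("birth_year_58"))
```

For the variable names in the question both rules give the same result; the difference only matters when other underscores should be preserved.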

Copying Specific Lines To a .txt File

OK, I am using 'ipconfig /displaydns' to display all websites visited (since the last 'ipconfig /flushdns') and I would like to copy just the website's URL to Websites.txt.
A typical layout of the output is:
ocsp.digicert.com
----------------------------------------
Record Name . . . . . : ocsp.digicert.com
Record Type . . . . . : 5
Time To Live . . . . : 17913
Data Length . . . . . : 4
Section . . . . . . . : Answer
CNAME Record . . . . : cs9.wac.edgecastcdn.net
badge.stumbleupon.com
----------------------------------------
Record Name . . . . . : badge.stumbleupon.com
Record Type . . . . . : 1
Time To Live . . . . : 39560
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 199.30.80.32
0.gravatar.com
----------------------------------------
Record Name . . . . . : 0.gravatar.com
Record Type . . . . . : 5
Time To Live . . . . : 2047
Data Length . . . . . : 4
Section . . . . . . . : Answer
CNAME Record . . . . : cs91.wac.edgecastcdn.net
But I would like to have just
ocsp.digicert.com
badge.stumbleupon.com
0.gravatar.com
as the output. Any ideas on how to do that? I am using a Windows RT device, so external applications are not an option. The real output is usually 10 times longer than this, and not all records look the same.
Use PowerShell:
ipconfig /displaydns | Select-String 'Record Name' | ForEach-Object { ($_.Line -replace '^.*Record Name[ .]*:', '').Trim() } | Set-Content Websites.txt
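The filtering logic itself is simple: keep only the lines that contain "Record Name" and take whatever follows the colon. Purely as an illustration of that logic (Python is not available on a Windows RT device, so this is not a replacement for the PowerShell one-liner), here is a minimal sketch with a hypothetical function record_names and a shortened sample of the output:

```python
import re

# Shortened sample in the layout shown in the question.
SAMPLE = """\
    ocsp.digicert.com
    ----------------------------------------
    Record Name . . . . . : ocsp.digicert.com
    Record Type . . . . . : 5
    CNAME Record  . . . . : cs9.wac.edgecastcdn.net

    badge.stumbleupon.com
    ----------------------------------------
    Record Name . . . . . : badge.stumbleupon.com
    Record Type . . . . . : 1
    A (Host) Record . . . : 199.30.80.32
"""


def record_names(text):
    """Return the value of every 'Record Name' line, in order of appearance."""
    names = []
    for line in text.splitlines():
        match = re.match(r"\s*Record Name[ .]*:\s*(\S+)", line)
        if match:
            names.append(match.group(1))
    return names


print("\n".join(record_names(SAMPLE)))
```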

RegEx for DNS Servers via IPCONFIG

Stack Overflow RegEx Wizards, I've scoured Google and haven't quite found a good solution for this. I need to pull out 1:N DNS servers from IPCONFIG results. In the example below, I would need the first three. However, there may be instances where there are more or fewer.
Update: Optimally we want to place cursor at first colon(:) in the DNS string then capture IPs until we hit an alpha character. So if we can just scrape a string from that colon to that alpha character we can run another RegEx to match IPs.
DNS.*: gets us to the first colon (:)
Need to read-ahead until alpha character.
Important Note: Because of the third-party tool we're using we can only use RegEx :)
Here's the RegEx I've been using for IPs. This will capture all IPs instead of just the DNS ones...
(([0-9]){1,3}.){1,3}[0-9]{1,3}
IPCONFIG Example
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 152.225.244.1
DHCP Server . . . . . . . . . . . : 10.204.40.57
DNS Servers . . . . . . . . . . . : 10.204.127.11
10.207.2.50
10.200.10.6
Primary WINS Server . . . . . . . : 10.207.40.145
Secondary WINS Server . . . . . . : 10.232.40.38
Lease Obtained. . . . . . . . . . : Tuesday, August 28, 2012 6:45:12 AM
Lease Expires . . . . . . . . . . : Sunday, September 02, 2012 6:45:12 A
#!/usr/bin/env perl
use strict;
use warnings;
my $data = <<END;
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 152.225.244.1
DHCP Server . . . . . . . . . . . : 10.204.40.57
DNS Servers . . . . . . . . . . . : 10.204.127.11
10.207.2.50
10.200.10.6
Primary WINS Server . . . . . . . : 10.207.40.145
Secondary WINS Server . . . . . . : 10.232.40.38
Lease Obtained. . . . . . . . . . : Tuesday, August 28, 2012 6:45:12 AM
Lease Expires . . . . . . . . . . : Sunday, September 02, 2012 6:45:12 A
END
my @ips = ();
if ($data =~ /^DNS Servers[\s\.:]+((\d{2}\.\d{3}\.\d{1,3}\.\d{1,3}\s*)+)/m) {
@ips = split(/\s+/, $1);
print "$_\n" foreach(@ips);
}
I would use unpack instead of regular expressions for parsing column-based data:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
my ($ip) = unpack 'x36 A*';
print "$ip\n";
}
__DATA__
DNS Servers . . . . . . . . . . . : 10.204.127.11
10.207.2.50
10.200.10.6
Primary WINS Server . . . . . . . : 10.207.40.145
Secondary WINS Server . . . . . . : 10.232.40.38
You may have to adjust the number 36 to the actual number of characters that should be skipped.
Personally, I'd go in a different direction. Instead of manually parsing the output of ipconfig, I'd use the Win32::IPConfig module.
Win32::IPConfig - IP Configuration Settings for Windows NT/2000/XP/2003
use Win32::IPConfig;
use Data::Dumper;
my $host = shift || "127.0.0.1";
my $ipconfig = Win32::IPConfig->new($host);
my @searchlist = $ipconfig->get_searchlist;
print Dumper \@searchlist;
Match
DNS.+?:(\s*([\d.]+).)+
and pull out the groups. This assumes you have the entire multi-line string in one blob, and that the extracted text may contain newlines and other whitespace.
The last dot is there to match the newline, so you need the option that lets . match a newline (/s in Perl and PCRE; Ruby calls it /m).
Match against this regex:
DNS Servers.*:\s*(.*(?:[\n\r]+\s+.*(?:[\n\r]+\s+.*)?)?)
The first capture group will hold your IPs (at most three), as you requested. You will still need to trim the whitespace.
Edit: Regex fixed to match at most three IPs. If there are fewer, it matches only those.
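The two-step idea from the question's update (capture everything from the colon on the DNS line up to the next letter, then match IPs inside that capture) can be sketched as follows. This is a minimal illustration in Python, assuming IPv4 addresses only and a regex engine in which a negated character class matches newlines; the names DNS_BLOCK, IPV4 and dns_servers are hypothetical, and the third-party tool mentioned in the question may behave differently.

```python
import re

IPCONFIG = """\
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 152.225.244.1
   DHCP Server . . . . . . . . . . . : 10.204.40.57
   DNS Servers . . . . . . . . . . . : 10.204.127.11
                                       10.207.2.50
                                       10.200.10.6
   Primary WINS Server . . . . . . . : 10.207.40.145
   Secondary WINS Server . . . . . . : 10.232.40.38
"""

# Step 1: from the colon on the "DNS Servers" line, capture everything
# up to (not including) the first alphabetic character.
DNS_BLOCK = re.compile(r"DNS Servers[ .]*:([^A-Za-z]*)")

# Step 2: pull dotted quads out of the captured block.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def dns_servers(text):
    """Return the DNS server addresses listed in ipconfig-style text."""
    block = DNS_BLOCK.search(text)
    return IPV4.findall(block.group(1)) if block else []


print(dns_servers(IPCONFIG))
```

Because the first step stops at any letter, this works for any number of DNS servers but would cut off at an IPv6 address, which contains the letters a-f.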