Remove middle character from variable names - stata

I have variable names that end in an underscore (_) followed by a year code:
clear
set obs 1
foreach var in age_58 age_64 age_75 age_184 age_93 age99 {
generate `var' = rnormal()
}
list
+----------------------------------------------------------------------+
| age_58 age_64 age_75 age_184 age_93 age99 |
|----------------------------------------------------------------------|
1. | .1162236 -.8781271 1.199268 -1.475732 .9077238 -.0858719 |
+----------------------------------------------------------------------+
I would like to rename them into:
age58 age64 age75 age184 age93 age99
I know I can do this by renaming one variable at a time as follows:
rename age_58 age58
rename age_64 age64
rename age_75 age75
rename age_184 age184
rename age_93 age93
How can I remove the underscore from all the variable names at once?

In Stata 13 and later versions, this can be done in one line using the built-in command rename.
One merely has to specify the relevant rules, which can include wildcard characters:
rename *_# *#
list
+----------------------------------------------------------------------+
| age58 age64 age75 age184 age93 age99 |
|----------------------------------------------------------------------|
1. | .1162236 -.8781271 1.199268 -1.475732 .9077238 -.0858719 |
+----------------------------------------------------------------------+
Type help rename group for details on the various available specifiers.

For Stata 8 and later, the community-contributed command renvars offers a solution:
renvars age_*, subst(_)
For documentation and download, see
. search renvars, historical
Search of official help files, FAQs, Examples, SJs, and STBs
SJ-5-4 dm88_1 . . . . . . . . . . . . . . . . . Software update for renvars
(help renvars if installed) . . . . . . . . . N. J. Cox and J. Weesie
Q4/05 SJ 5(4):607
trimend() option added and help file updated
STB-60 dm88 . . . . . . . . Renaming variables, multiply and systematically
(help renvars if installed) . . . . . . . . . N. J. Cox and J. Weesie
3/01 pp.4--6; STB Reprints Vol 10, pp.41--44
renames variables by changing prefixes, postfixes, substrings,
or as specified by a user supplied rule
For the 2001 paper, see this .pdf file.

You can loop over the variables using the macro extended function subinstr:
foreach var of varlist * {
local newname : subinstr local var "_" "", all
if "`newname'" != "`var'" {
rename `var' `newname'
}
}

Related

Formatting a Stata table like a table in SAS

I have a 3-way table in Stata that looks like this:
I would like to format this 3-way crosstab like a table in SAS that looks like this:
The actual output in the table isn't important, I just want to know how I can change the formatting of the Stata table. Any help is appreciated!
The groups command from the Stata Journal will get you most of the way. This reproducible example doesn't exhaust the possibilities.
. webuse nlswork, clear
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. groups union race , show(f F p P) sepby(union)
+--------------------------------------------------+
| union race Freq. #<= Percent %<= |
|--------------------------------------------------|
| 0 white 10777 10777 56.02 56.02 |
| 0 black 3784 14561 19.67 75.69 |
| 0 other 167 14728 0.87 76.56 |
|--------------------------------------------------|
| 1 white 2817 17545 14.64 91.20 |
| 1 black 1649 19194 8.57 99.77 |
| 1 other 44 19238 0.23 100.00 |
+--------------------------------------------------+
The command must be installed before you can use it. groups is a lousy search term, but this search will find the 2017 write-up and later updates of the software (at the time of writing, just one in 2018).
. search st0496, entry
Search of official help files, FAQs, Examples, and Stata Journals
SJ-18-1 st0496_1 . . . . . . . . . . . . . . . . . Software update for groups
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q1/18 SJ 18(1):291
groups exited with an error message if weights were specified;
this has been corrected
SJ-17-3 st0496 . . . . . Speaking Stata: Tables as lists: The groups command
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q3/17 SJ 17(3):760--773
presents command for listing group frequencies and percents and
cumulations thereof; for various subsetting and ordering by
frequencies, percents, and so on; for reordering of columns;
and for saving tabulated data to new datasets

Replace text between 2 particular lines in a text file using sed

Similar questions have been asked, but they are for PowerShell.
I have a Markdown file like:
.
.
.
## See also
- [a](./A.md)
- [A Child](./AChild.md)
.
.
.
- [b](./B.md)
.
.
.
## Introduction
.
.
.
I wish to replace all occurrences of .md) with .html) between ## See also and ## Introduction :
.
.
.
## See also
- [a](./A.html)
- [A Child](./AChild.html)
.
.
.
- [b](./B.html)
.
.
.
## Introduction
.
.
.
I tried this in Bash:
orig="\.md)"; new="\.html)"; sed "s~$orig~$new~" t.md -i
But this replaces .md) everywhere in the file. I want the replacement to happen only between ## See also and ## Introduction.
Could you please suggest changes? I am using awk and sed because I am a little familiar with them. I also know a little Python; is it recommended to do this kind of scripting in Python (if it is too complicated for sed or awk)?
$ sed '/## See also/,/## Introduction/s/\.md)/.html)/g' file
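The asker also wondered whether Python would be a reasonable choice. As a rough sketch of the same range-restricted replacement, the function below mimics sed's /start/,/end/ addressing on a list of lines. The function name replace_between and the sample lines are illustrative assumptions, not part of the original answer, and plain substring matching is used instead of regular expressions.

```python
def replace_between(lines, start, end, old, new):
    """Replace `old` with `new` only from a line containing `start`
    through the next line containing `end` (both inclusive)."""
    out = []
    inside = False
    for line in lines:
        if inside:
            out.append(line.replace(old, new))
            if end in line:
                inside = False  # range closes on the end marker
        elif start in line:
            inside = True  # range opens on the start marker
            out.append(line.replace(old, new))
        else:
            out.append(line)  # outside the range: copy unchanged
    return out


# Hypothetical sample modelled on the Markdown file in the question.
sample = [
    "- [x](./X.md)",
    "## See also",
    "- [a](./A.md)",
    "- [b](./B.md)",
    "## Introduction",
    "- [c](./C.md)",
]
result = replace_between(sample, "## See also", "## Introduction", ".md)", ".html)")
print("\n".join(result))
```

Only the two links between the headings change; the links before ## See also and after ## Introduction keep their .md suffix. For a one-off edit the sed one-liner is shorter; a script like this mainly pays off when more logic has to be added later.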

Copying Specific Lines To a .txt File

OK, I am using 'ipconfig /displaydns' to display all websites visited (since the last 'ipconfig /flushdns') and I would like to copy just the website's URL to Websites.txt.
A typical layout of the output is:
ocsp.digicert.com
----------------------------------------
Record Name . . . . . : ocsp.digicert.com
Record Type . . . . . : 5
Time To Live . . . . : 17913
Data Length . . . . . : 4
Section . . . . . . . : Answer
CNAME Record . . . . : cs9.wac.edgecastcdn.net
badge.stumbleupon.com
----------------------------------------
Record Name . . . . . : badge.stumbleupon.com
Record Type . . . . . : 1
Time To Live . . . . : 39560
Data Length . . . . . : 4
Section . . . . . . . : Answer
A (Host) Record . . . : 199.30.80.32
0.gravatar.com
----------------------------------------
Record Name . . . . . : 0.gravatar.com
Record Type . . . . . : 5
Time To Live . . . . : 2047
Data Length . . . . . : 4
Section . . . . . . . : Answer
CNAME Record . . . . : cs91.wac.edgecastcdn.net
But I would like to have just
ocsp.digicert.com
badge.stumbleupon.com
0.gravatar.com
as the output. Any ideas on how to do that? I am using a Windows RT device, so external applications are not an option. The real output is usually 10 times longer than this, and not all records look the same.
Use PowerShell:
ipconfig /displaydns | Select-String 'Record Name' | ForEach-Object { $_.Line -replace '^\s*Record Name[ .]*:\s*', '' }
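For readers who have saved the output to a file and can process it on a machine with Python (which the asker's Windows RT device may not allow), the same filter can be sketched as follows. The function name record_names, the trimmed sample text, and the removal of duplicate names are assumptions added for illustration.

```python
import re

# A "Record Name" line: label, dot leaders, colon, then the host name.
RECORD_NAME = re.compile(r"^\s*Record Name[ .]*:\s*(\S+)\s*$")


def record_names(text):
    """Return the host names from 'Record Name' lines, in order, without repeats."""
    names = []
    for line in text.splitlines():
        match = RECORD_NAME.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Shortened sample based on the layout shown in the question.
sample = """\
ocsp.digicert.com
----------------------------------------
Record Name . . . . . : ocsp.digicert.com
Record Type . . . . . : 5
CNAME Record . . . . : cs9.wac.edgecastcdn.net
badge.stumbleupon.com
----------------------------------------
Record Name . . . . . : badge.stumbleupon.com
Record Type . . . . . : 1
A (Host) Record . . . : 199.30.80.32
"""
names = record_names(sample)
print("\n".join(names))
```

Writing the result to Websites.txt would then be a matter of opening that file for writing and printing the joined names into it.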

RegEx for DNS Servers via IPCONFIG

Stack Overflow RegEx Wizards, I've scoured Google and haven't quite found a good solution for this. I need to pull out 1:N DNS servers from IPCONFIG results. In the example below, I would need the first three. However, there may be instances where there are more or fewer.
Update: Ideally, we want to place the cursor at the first colon (:) in the DNS line, then capture IPs until we hit an alphabetic character. So if we can just scrape the string from that colon to that alphabetic character, we can run another RegEx to match the IPs.
DNS.*: gets us to the first colon (:)
Need to read ahead until an alphabetic character.
Important Note: Because of the third-party tool we're using, we can only use RegEx :)
Here's the RegEx I've been using for IPs. It captures all IPs instead of just the DNS ones...
(([0-9]){1,3}.){1,3}[0-9]{1,3}
IPCONFIG Example
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 152.225.244.1
DHCP Server . . . . . . . . . . . : 10.204.40.57
DNS Servers . . . . . . . . . . . : 10.204.127.11
                                    10.207.2.50
                                    10.200.10.6
Primary WINS Server . . . . . . . : 10.207.40.145
Secondary WINS Server . . . . . . : 10.232.40.38
Lease Obtained. . . . . . . . . . : Tuesday, August 28, 2012 6:45:12 AM
Lease Expires . . . . . . . . . . : Sunday, September 02, 2012 6:45:12 A
#!/usr/bin/env perl
use strict;
use warnings;
my $data = <<END;
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 152.225.244.1
DHCP Server . . . . . . . . . . . : 10.204.40.57
DNS Servers . . . . . . . . . . . : 10.204.127.11
                                    10.207.2.50
                                    10.200.10.6
Primary WINS Server . . . . . . . : 10.207.40.145
Secondary WINS Server . . . . . . : 10.232.40.38
Lease Obtained. . . . . . . . . . : Tuesday, August 28, 2012 6:45:12 AM
Lease Expires . . . . . . . . . . : Sunday, September 02, 2012 6:45:12 A
END
my @ips = ();
# Each octet is 1-3 digits, so addresses such as 8.8.8.8 also match.
if ($data =~ /^DNS Servers[\s.:]+((\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s*)+)/m) {
    @ips = split(/\s+/, $1);
    print "$_\n" foreach (@ips);
}
I would use unpack instead of regular expressions for parsing column-based data:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
my ($ip) = unpack 'x36 A*';
print "$ip\n";
}
__DATA__
DNS Servers . . . . . . . . . . . : 10.204.127.11
                                    10.207.2.50
                                    10.200.10.6
Primary WINS Server . . . . . . . : 10.207.40.145
Secondary WINS Server . . . . . . : 10.232.40.38
You may have to adjust the number 36 to the actual number of characters that should be skipped.
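The same fixed-column idea can be sketched in Python with plain string slicing. The constant VALUE_COLUMN, the function name value_field, and the sample lines are assumptions for illustration; the width 36 is taken from the unpack template above and is only right if the labels start in the first column.

```python
VALUE_COLUMN = 36  # assumed width of the label, dot leaders and " : "


def value_field(line, column=VALUE_COLUMN):
    """Return whatever follows the fixed-width label part of a line."""
    return line[column:].strip()


# Build the label so that it is exactly 36 characters wide.
label = "DNS Servers" + " ." * 11 + " : "
sample_lines = [
    label + "10.204.127.11",
    " " * VALUE_COLUMN + "10.207.2.50",  # continuation lines are padded
    " " * VALUE_COLUMN + "10.200.10.6",
]
values = [value_field(line) for line in sample_lines]
print(values)
```

Slicing past the end of a short line simply yields an empty string, so lines shorter than the label width produce no value rather than an error.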
Personally, I'd go in a different direction. Instead of manually parsing the output of ipconfig, I'd use the Win32::IPConfig module.
Win32::IPConfig - IP Configuration Settings for Windows NT/2000/XP/2003
use Win32::IPConfig;
use Data::Dumper;
my $host = shift || "127.0.0.1";
my $ipconfig = Win32::IPConfig->new($host);
my @searchlist = $ipconfig->get_searchlist;
print Dumper \@searchlist;
Match
DNS.+?:(\s*([\d.]+).)+
and pull out the groups. This assumes you have the entire multi-line string in one blob, and that the extracted text may contain newlines and other whitespace.
The last dot is there to match the newline, so the dot must be allowed to match newlines (the /s option in Perl; some engines, such as Ruby, call it /m).
Match against this regex (see in action):
DNS Servers.*:\s*(.*(?:[\n\r]+\s+.*(?:[\n\r]+\s+.*)?)?)
The first capture group will contain your IPs (at most three), as you requested. You will need to trim the whitespace.
Edit: Regex fixed to match at most three IPs. If there are fewer, it matches only those.
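A pattern in the spirit of the asker's update (start at the colon after the DNS label and stop at the first line that does not begin with an address) can be tried out as below. Python's re module is used here only as a test bench; the pattern itself, the name dns_servers, and the shortened sample are assumptions, and whether the third-party tool supports non-capturing groups and bounded quantifiers is not known.

```python
import re

# Label, dot leaders and colon, then one or more dotted quads
# separated by whitespace (including line breaks).
DNS_BLOCK = re.compile(
    r"DNS Servers[ .]*:\s*((?:\d{1,3}(?:\.\d{1,3}){3}\s*)+)"
)


def dns_servers(text):
    """Return the addresses listed under 'DNS Servers', or an empty list."""
    match = DNS_BLOCK.search(text)
    return match.group(1).split() if match else []


# Shortened sample based on the IPCONFIG example in the question.
sample = """\
DHCP Server . . . . . . . . . . . : 10.204.40.57
DNS Servers . . . . . . . . . . . : 10.204.127.11
                                    10.207.2.50
                                    10.200.10.6
Primary WINS Server . . . . . . . : 10.207.40.145
"""
servers = dns_servers(sample)
print(servers)
```

Because the repetition only continues while the next non-blank text is another address, the capture ends before Primary WINS Server, however many DNS servers are listed.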

Pull value for HostName for IPconfig command

I have a text file containing the output of the IPCONFIG command, and I want to extract the value of Host Name (here S4333AAB45) using a regex.
Windows IP Configuration
Host Name . . . . . . . . . . . . : S4333AAB45
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
I tried the following, and it didn't work:
/\bHost Name\s+(\d+)/
Here is what I would use:
/\s+Host Name.*: (\w+)$/
Use Field Splitting with AWK
You don't say what regular expression engine you're using, or why you need to use a regular expression to match the host name portion. If you have access to AWK, you can treat this as a field-splitting issue instead. For example:
awk '/\<Host Name\>/ { print $NF }' /tmp/foo
Use Known Line Positions
Assuming you've got Cygwin or similar installed, you can use the position of the interesting record to get the data you want without a regular expression at all. For example:
cat /tmp/foo | head -n3 | tail -n1 | cut -d: -f2 | tr -d ' '
Just replace the cat command with your call to ipconfig instead, and you should get the results you want.
Use sed Instead
You can also use sed to find the line you're interested in, and print out just the trailing word on the line. For example:
sed -n '/\<Host Name\>/ s/.*[[:space:]]\([[:alnum:]]\+\)$/\1/p' /tmp/foo
Your host had a letter "S" as the first character of the host name, so "(\d+)" wouldn't be correct for matching your host name. You also failed to account for the dots and colon on the host name line. So the answer from weexpectedTHIS should do the trick. But for your information, here's how you could get the host name without first creating an intermediate file.
$ipconfig = `ipconfig /all`;
($host) = $ipconfig =~ /^\s*Host Name.*:\s*(\w+)/m;
You would need the "/m" in there so that the "^" will match the start of any line in the multi-line contents of $ipconfig. I tend to use "\s*" instead of "\s+" as a sort of insurance against future changes in the output format (where white space is often removed or expanded in newer versions of a command).
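The multi-line matching described above can also be sketched in Python, where re.MULTILINE plays the role of Perl's /m. The function name host_name and the sample text are assumptions for illustration; \S+ is used instead of \w+ so that host names containing hyphens are captured whole, and [ \t]* after the colon keeps an empty value from spilling over into the next line.

```python
import re

# ^ matches at the start of every line because of re.MULTILINE.
HOST_NAME = re.compile(r"^\s*Host Name[ .]*:[ \t]*(\S+)", re.MULTILINE)


def host_name(text):
    """Return the value of the 'Host Name' line, or None if it is absent or empty."""
    match = HOST_NAME.search(text)
    return match.group(1) if match else None


# Sample modelled on the output shown in the question.
sample = """\
Windows IP Configuration
   Host Name . . . . . . . . . . . . : S4333AAB45
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
"""
print(host_name(sample))
```

On Windows, the text could come straight from the command instead of a file, for example by capturing the output of ipconfig /all with the subprocess module.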