Stata looping that I can't comprehend

I learned that to change lower-case variable names to upper case I need to do the following:
foreach var of varlist * {
rename `var' `=upper("`var'")'
}
But I can't comprehend how this really works.
First, rename does not require = to change variable names.
Second, I understand that I need to enclose var in ` and '.
But what do the ` and ' surrounding
=upper("`var'")
mean?

You don't need to do that. You don't need a loop and you don't need that syntax. Consider
. sysuse auto, clear
(1978 Automobile Data)
. ds
make mpg headroom weight turn gear_ratio
price rep78 trunk length displacement foreign
. rename *, upper
. ds
MAKE MPG HEADROOM WEIGHT TURN GEAR_RATIO
PRICE REP78 TRUNK LENGTH DISPLACEMENT FOREIGN
That aside, you are right to be puzzled at the
`= '
because it indeed has nothing to do with rename. That syntax obliges Stata to evaluate an expression on the fly, so that rename sees only the result of that expression. In your case the string expression
upper("`var'")
yields an upper-case version of the variable name contained in local macro var.
This syntax is documented at help macro and in [P] macro as one kind of expansion operator.
All that said, making all variable names upper case is horrible style.
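A minimal sketch of the on-the-fly evaluation described above, assuming the auto data shipped with Stata. The macro name n is only illustrative, and the loop displays the commands rather than running them, so you can see what rename would receive.

```stata
sysuse auto, clear

* `=exp' is replaced by the result of exp before the command runs
display "`=upper("mpg")'"

* the same operator works with numeric expressions and nested macros
local n = 3
display `=`n' + 1'

* so inside the loop, rename only ever sees two plain variable names
foreach var of varlist * {
    display "rename `var' `=upper("`var'")'"
}
```

Each displayed line has the form rename oldname NEWNAME, which is why rename itself never needs to know about the = sign.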

Related

How to replace a variable based on a local and subtraction

In one line, I would like to perform an operation on a specific row, where the row number is computed by subtracting a number from a local macro.
Here is a MWE:
sysuse auto2, clear
*save the number of observations in a local (is there a quicker way to do this?)
count
local N = r(N)
*make sure it works
di `N'
*make sure that the subtraction works
di `N'-1
*make the replacement, which works when referencing row with `N'
replace make = "def" in `N'
*here is the problem - subtracting from `N' doesn't work
replace make = "abc" in `N'-1
Error:
'74-1' invalid observation number
How can I solve this problem?

There are at least two ways to do this.
Create another local macro:
local Nm1 = `N' - 1
replace make = "abc" in `Nm1'
Force evaluation of an expression on the fly:
replace make = "abc" in `=`N'-1'
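On the side question of a quicker way to get the number of observations: a sketch, assuming the same auto2 data, that uses the built-in _N instead of count and r(N). It also relies on Stata's rule that negative numbers in an in range count back from the end of the data.

```stata
sysuse auto2, clear

* _N holds the current number of observations, so count and r(N) are not needed
replace make = "def" in `=_N'
replace make = "abc" in `=_N - 1'

* negative observation numbers in an in range count back from the end
list make in -2/l
```

The final list shows the last two observations, which is a quick check that both replacements landed where intended.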

How to create a character variable out of 2 others where the first word is upcase and the other in brackets in SAS?

I have a table in SAS and I want to create a new column C computed from A and B: A should be in upper-case letters and B in brackets.
If A is dog and B is cat, then C in that row should be DOG (cat).
I'm very new to SAS; how can I do that?
I know that I can get upper case with upcase(A), but I don't know how to put two character variables one after another to create a new variable, or how to put a variable in brackets.

SAS has a series of CAT functions (CAT, CATS, CATT, CATX) that make this simple. CATS() strips the leading and trailing spaces from the values. CATX() lets you specify a value to paste between the values.
data want;
  set have;
  length new $100;
  /* upper-case A, then B wrapped in parentheses, joined by a single space */
  new = catx(' ', upcase(a), cats('(', b, ')'));
run;

Personally, I use cat/cats/catx only in very specific cases. For a problem like this, you can simply use the concatenation operator ||, which makes the code much easier to understand:
data want;
set have;
attrib new format=$100.;
new = strip(upcase(a)) || " (" || strip(b) || ")";
run;
OK, that's maybe a little more verbose, but I think it's also easier to understand for a new SAS programmer :)

I want to remove a list of character strings from the original string in SAS

I want to remove the strings "LIMITED", "LTD", "CORPORATION", "GMBH", "AG", "SDN", "BHD" and "INC" from my Customer_Name variable.
I tried the compress function in SAS like this:
Customer_Name1=compress(Customer_Name, 'LIMITED', 'LTD', 'GMBH');
But I am getting the error:
The COMPRESS function call has too many arguments.
Please suggest a way to solve it.

I would use a regular expression to perform this. Store the words to be removed in a macro variable, then use call prxchange to search within name and remove them. The words are separated by |, which signifies or in regular expression language.
%let vals = LIMITED|LTD|CORPORATION|GMBH|AG|SDN|BHD|INC;
data have;
input name $20.;
datalines;
a ltd
b limited
c corporation
d corp
e gmbh
f test
g ag
i sdn
j bhd
aggregate ag
income inc
;
run;
data want;
set have;
regex = prxparse("s/\b(&vals.)\b//i"); /* \b signifies a word boundary, so only whole words are removed */
call prxchange(regex,-1,name);
drop regex;
run;

Recode missing values coded with a string in Stata

I have a dataset with missing values coded "missing". How do I recode these so Stata recognizes them as missing values? When I have numeric missing values, I have been using e.g.:
mvdecode _all, mv(99=. )
However, when I run this with a character in it, e.g.:
mvdecode _all, mv("missing"=. )
I get the error missing is not a valid numlist.

mvdecode is for numeric variables only: the banner in the help is "Change numeric values to missing values" (emphasis added). So the error message should make sense: the string "missing" is certainly not a numeric value, so Stata stops you there. It makes no sense to say to Stata that numeric values "missing" should be changed to system missing, as you requested.
As for what you should do, that depends on what you mean in Stata terms by coded "missing".
If you are referring to string variables with literal values "missing" which should just be replaced by the empty string "", then that would be a loop over all string variables:
ds, has(type string)
quietly foreach v in `r(varlist)' {
replace `v' = "" if `v' == "missing"
}
If you are referring to numeric variables for which there is a value label "missing", then you need to find out the corresponding numeric value and use that in your call to mvdecode. Use label list to look up the association between values and value labels.
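A sketch of the value-label case, under stated assumptions: the variable x, the label name xlbl and the code 99 are all hypothetical and stand in for whatever label list reports in your own data.

```stata
clear
input x
1
99
3
end

* hypothetical setup: 99 is the code that carries the value label "missing"
label define xlbl 99 "missing"
label values x xlbl

* look up which numeric value goes with the label
label list xlbl

* then hand that numeric value to mvdecode
mvdecode x, mv(99)
count if missing(x)
```

The final count confirms how many observations Stata now treats as missing.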

mvdecode works with numlists, not strings (clearly stated in help mvdecode). The missing value for strings in Stata is denoted by "".
clear
set more off
*----- example dataset -----
sysuse auto
keep make mpg
keep in 1/5
replace make = "missing" in 2
list
*----- what you want -----
ds, has(type string)
foreach var in `r(varlist)' {
replace `var' = "" if `var' == "missing"
}
list
list if missing(make)
You can verify that Stata now recognizes one missing value for the string variable using the missing() function.

Selective case sensitivity/insensitivity with PRXPARSE

I have the following regex which I am using to scan fields within a dataset for a variety of markers that may indicate that the record belongs to a US resident:
prx_1 = (prxparse("/(?i)^USA$(?-i)|
(?i)^United[\s+]States[\s+]of[\s+]America$(?-i)|
(?i)^US$(?-i)|
(?i)^U[\s+]S[\s+]A$(?-i)|
(?i)^United[\s+]States$(?-i)|
(?i)^America$(?-i)|
(?i)^U[\.+]S[\.+]A$(?-i)|
(?i)^U[\.+]S[\.+]A[\.+]$(?-i)|
(?-i)^AL$|(?-i)^AK$|(?-i)^AZ$|(?-i)^AR$|
(?-i)^CA$|(?-i)^CO$|(?-i)^CT$|(?-i)^DE$|
(?-i)^DC$|(?-i)^FL$|(?-i)^GA$|(?-i)^HI$|
(?-i)^ID$|(?-i)^IL$|(?-i)^IN$|(?-i)^IA$|
(?-i)^KS$|(?-i)^KY$|(?-i)^LA$|(?-i)^ME$|
(?-i)^MD$|(?-i)^MA$|(?-i)^MI$|(?-i)^MN$|
(?-i)^MS$|(?-i)^MO$|(?-i)^MT$|(?-i)^NE$|
(?-i)^NV$|(?-i)^NH$|(?-i)^NJ$|(?-i)^NM$|
(?-i)^NY$|(?-i)^NC$|(?-i)^ND$|(?-i)^OH$|
(?-i)^OK$|(?-i)^OR$|(?-i)^PA$|(?-i)^RI$|
(?-i)^SC$|(?-i)^SD$|(?-i)^TN$|(?-i)^TX$|
(?-i)^UT$|(?-i)^VT$|(?-i)^VA$|(?-i)^WA$|
(?-i)^WV$|(?-i)^WI$|(?-i)^WY$|(?-i)^AS$|
(?-i)^GU$|(?-i)^MP$|(?-i)^PR$|(?-i)^VI$|
(?-i)^UM$|(?-i)^FM$|(?-i)^MH$|(?-i)^PW$|
(?-i)^AA$|(?-i)^AE$|(?-i)^AP$|(?-i)^CM$|
(?-i)^CZ$|(?-i)^NB$|(?-i)^PI$|(?-i)^TT$|
(?i)^Alabama$(?-i)|(?i)^Alaska$(?-i)|(?i)^Arizona$(?-i)|(?i)^Arkansas$(?-i)|
(?i)^California$(?-i)|(?i)^Colorado$(?-i)|(?i)^Connecticut$(?-i)|(?i)^Delaware$(?-i)|
(?i)^District[\s+]of[\s+]Columbia$(?-i)|(?i)^Florida$(?-i)|(?i)^Georgia$(?-i)|(?i)^Hawaii$(?-i)|
(?i)^Idaho$(?-i)|(?i)^Illinois$(?-i)|(?i)^Indiana$(?-i)|(?i)^Iowa$(?-i)|(?i)^Kansas$(?-i)|
(?i)^Kentucky$(?-i)|(?i)^Louisiana$(?-i)|(?i)^Maine$(?-i)|(?i)^Maryland$(?-i)|
(?i)^Massachusetts$(?-i)|(?i)^Michigan$(?-i)|(?i)^Minnesota$(?-i)|(?i)^Mississippi$(?-i)|
(?i)^Missouri$(?-i)|(?i)^Montana$(?-i)|(?i)^Nebraska$(?-i)|(?i)^Nevada$(?-i)|
(?i)^New[\s+]Hampshire$(?-i)|(?i)^New[\s+]Jersey$(?-i)|(?i)^New[\s+]Mexico$(?-i)|
(?i)^New[\s+]York$(?-i)|(?i)^North[\s+]Carolina$(?-i)|(?i)^North[\s+]Dakota$(?-i)|
(?i)^Ohio$(?-i)|(?i)^Oklahoma$(?-i)|(?i)^Oregon$(?-i)|(?i)^Pennsylvania$(?-i)|
(?i)^Rhode[\s+]Island$(?-i)|(?i)^South[\s+]Carolina$(?-i)|(?i)^South[\s+]Dakota$(?-i)|
(?i)^Tennessee$(?-i)|(?i)^Texas$(?-i)|(?i)^Utah$(?-i)|(?i)^Vermont$(?-i)|(?i)^Virginia$(?-i)|
(?i)^Washington$(?-i)|(?i)^West[\s+]Virginia$(?-i)|(?i)^Wisconsin$(?-i)|(?i)^Wyoming$(?-i)|
(?i)^American[\s+]Samoa$(?-i)|(?i)^Guam$(?-i)|(?i)^Northern[\s+]Mariana[\s+]Islands$(?-i)|
(?i)^Puerto[\s+]Rico$(?-i)|(?i)^Virgin[\s+]Islands$(?-i)|
(?i)^U[\.*]S[\.*][\s+]Minor[\s+]Outlying[\s+]Islands$(?-i)|
(?i)^Federated[\s+]States[\s+]of[\s+]Micronesia$(?-i)|(?i)^Marshall[\s+]Islands$(?-i)|
(?i)^Palau$(?-i)/"
));
This is a series of small regexes concatenated with the | marker. My understanding of regexes was that if I wanted to switch case sensitivity on and off I should use (?i) to turn it on and (?-i) to turn it off. However this code is not returning matches where the state name for example is written in upper case.
Have I misinterpreted something here?
Thanks

If the regex flavour supports (?i), it should also support (?i:pattern). You should rewrite your regex and place the patterns that should be case-insensitive inside the non-capturing group (?i:pattern).
An example for the part of the pattern which you need to make case-insensitive:
^(?i:USA|United\s+States\s+of\s+America|United\s+States)$
An example for the part of the pattern which you need to make case-sensitive:
^(?:AL|AK|AZ|AR)$

This works here. See this page lower down, title "Comments and Inline Modifiers", for detail.
data have;
input state $;
datalines;
AZ
az
Az
ARIZONA
Arizona
ArIzOnA
;;;;
run;
data want;
set have;
_rx = prxparse('~(?i)AZ|(?-i)Arizona~o');
_rc = prxmatch(_rx,state);
put _rc=;
run;
Your regex is too complex right now to really help you troubleshoot. If you want troubleshooting, I would limit it to just one state (or something like that) and figure it out from there.
