Custom program - debugging syntax - stata

I have created the following program, which does not handle string expressions. I haven't been able to work out what to change in my syntax definition to make it work as intended.
I think the fix is something small, but I haven't found it yet. References to anything that would help would also be appreciated.
Below are the program and some dummy code that reproduces the error.
Thanks!
cap program drop repl_conf
program define repl_conf
syntax varlist =exp [if]
qui count `if'
if r(N) ==0 {
di as err "NO MATCHES -- NO REPLACE"
exit 9
}
else {
noi dis "SUCCESSFUL REPLACE of >=1 OBS -- " r(N) " OBS replaced"
qui replace `varlist' `exp' `if'
}
end
sysuse auto, clear
repl_conf length=999 if length==233
repl_conf make="ZZZ" if make=="AMC Concord"
type mismatch
r(109);

This gets further: gettoken peels off the variable name and the equals sign, and syntax parses what remains. I also moved the second message so that it is issued only if the replace was successful.
program define repl_conf
gettoken varname 0 : 0, parse(=)
confirm var `varname'
gettoken eq 0 : 0, parse(=)
syntax anything [if]
qui count `if'
if r(N) == 0 {
di as err "NO MATCHES -- NO REPLACE"
exit 9
}
else {
qui replace `varname' = `anything' `if'
noi di "SUCCESSFUL REPLACE of >=1 OBS -- " r(N) " OBS replaced"
}
end
sysuse auto, clear
repl_conf length=999 if length==233
repl_conf make="ZZZ" if make=="AMC Concord"
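
To see what each parsing step leaves behind, the two gettoken calls can be run on their own. This is a minimal sketch, assuming a throwaway program named show_parse (a hypothetical name, not part of the answer above) that only displays and returns the pieces:

```stata
capture program drop show_parse
program define show_parse, rclass
    // show_parse is a hypothetical helper used only for illustration
    // first token: everything up to the first "="
    gettoken varname 0 : 0, parse("=")
    // second token: the "=" itself
    gettoken eq 0 : 0, parse("=")
    display as text "variable:  " as result `"`varname'"'
    display as text "operator:  " as result `"`eq'"'
    display as text "remainder: " as result `"`0'"'
    return local varname `"`varname'"'
    return local eq `"`eq'"'
end

show_parse make="ZZZ" if make=="AMC Concord"
```

The remainder shown on the last line is what syntax anything [if] goes on to parse in repl_conf.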

Related

Generating dummy variable based on two string variables

I want to generate a dummy variable that is 1 if there is any match between two variables. Both variables were generated by egen concat, and each contains a group of languages used in a country.
For example, var1 is apc apc apc apc and var2 is apc, or var1 is apc fra nya and var2 is apc. In either case, neither fndmtch2 nor egen anymatch gives me 1. Is there any way I can get 1 in each case?
Your data example can be simplified to
sysuse auto
egen var1 = concat(mpg foreign), punct(" ")
egen var2 = concat(trunk foreign), punct(" ")
as mapping to string in this instance is not needed for mpg trunk any more than it was needed for foreign. concat() maps to string on the fly, and the only issues with numeric variables (neither applying here) are if fractional parts are present or you want to see value labels.
Now that it is confirmed that multiple words can be present, we can work with a slightly more interesting example.
Here are two methods. One is to loop over the words in one variable and also over the words in the other, checking for any matches. The other is to loop over the words in one variable only and search for each word within the other variable.
Stata's definition of a word here is that words are delimited by spaces. That being so, the second method checks for the occurrence of " word " within " variable ", where the leading and trailing spaces are needed because in, say, "frog toad newt" neither "frog" nor "newt" occurs with both leading and trailing spaces. In the OP's example the check may not be needed, but it often is: a search for "1" or "2" or "3" finds any of those within "11 12 13", which is wrong if you seek a word and not a single character.
More is said on search for words within strings in a paper in press at the Stata Journal and likely to appear in 22(4) 2022.
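
The difference between a character search and a word search can be checked directly with strpos(). The lines below are a small sketch using the literal strings from the explanation above, with the expected results noted in the comments:

```stata
* character search: "1" is found at position 1 of "11 12 13"
display strpos("11 12 13", "1")
* word search: padding both sides with spaces gives 0, as "1" is not a word here
display strpos(" " + "11 12 13" + " ", " " + "1" + " ")
* without padding, the last word has no trailing space, so the result is 0
display strpos("frog toad newt", " newt ")
* with padding, " newt " is found at position 11
display strpos(" " + "frog toad newt" + " ", " newt ")
```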
* Example generated by -dataex-. For more info, type help dataex
clear
input str8 var1 str5 var2
"FR DE" "FR"
"FR DE GB" "GB"
"GB" "FR"
"IT FR" "GB DE"
end
gen wc = wordcount(var1)
su wc, meanonly
local max1 = r(max)
replace wc = wordcount(var2)
su wc, meanonly
local max2 = r(max)
drop wc
gen match = 0
quietly forval i = 1/`max1' {
forval j = 1/`max2' {
replace match = 1 if word(var1, `i') == word(var2, `j') & word(var1, `i') != ""
}
}
gen MATCH = 0
forval i = 1/`max1' {
replace MATCH = 1 if strpos(" " + var2 + " ", " " + word(var1, `i') + " ")
}
list
     +----------------------------------+
     |     var1    var2   match   MATCH |
     |----------------------------------|
  1. |    FR DE      FR       1       1 |
  2. | FR DE GB      GB       1       1 |
  3. |       GB      FR       0       0 |
  4. |    IT FR   GB DE       0       0 |
     +----------------------------------+
EDIT
replace MATCH = 1 if strpos(" " + var2 + " ", " " + word(var1, `i') + " ") & !missing(var1, var2)
is better code because it avoids the uninteresting match of " " with " " that can arise from empty strings.
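
The spurious match is easy to reproduce with literal empty strings. A minimal sketch, where "" stands in for an empty var1 and an empty var2:

```stata
* word() returns an empty string when the requested word does not exist,
* so this comparison displays 1
display word("", 1) == ""
* both padded strings are then just two spaces, and strpos() reports a match at position 1
display strpos(" " + "" + " ", " " + word("", 1) + " ")
```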

SAS: No matching %MACRO statement

I am following a published method to identify matched cases, and I am getting the following error:
ERROR: No matching %MACRO statement for this %MEND statement.
WARNING: Apparent invocation of macro MATCH not resolved.
137 %MEND MATCH;
138
139 %MATCH (g.ps_match,Match4,scase4,scontrol4, abuser, 0.0001);
_
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
How do I correctly call the macro?
I am using SAS University Edition.
The method is from
http://www2.sas.com/proceedings/sugi25/25/po/25p225.pdf
Part 2: Perform the Match
The next part of the macro program performs the match and outputs the matched pairs. First, the cases data set is selected. Curob is used to keep track of the current case. Matchto is used to identify matched pairs of cases and controls. Start and oldi are initialized to control processing of the controls data set DO loop.
data &lib..&matched.
(drop=Cmatch randnum aprob cprob start
oldi curctrl matched);
set &lib..&SCase. ;
curob + 1;
matchto = curob;
if curob = 1 then do;
start = 1;
oldi = 1;
end;
Next, the controls data set is selected. Processing starts at the first unmatched observation. The data set is searched until a match is found, or it is determined no match can be made. Error checking is performed to avoid an infinite loop. Curctrl is used to keep track of current control.
DO i = start to n;
set &lib..&Scontrol. point = i nobs = n;
if i gt n then goto startovr;
if _Error_ = 1 then abort;
curctrl = i;
If the propensity score of the current case (aprob) matches the propensity score of the current control (cprob), then a match was found. Update Cmatch to 1=Yes. Output the control. Update matched to keep track of last matched control. Exit the DO loop. If the propensity score of the current control is greater than the propensity score of the current case, then no match will be found for the current case. Stop the DO loop processing.
if aprob = cprob then
do;
Cmatch = 1;
output &lib..&matched.;
matched = curctrl;
goto found;
end;
else if cprob gt aprob then
goto nextcase;
startovr: if i gt n then
goto nextcase;
END;
/* end of DO LOOP */
nextcase:
if Cmatch=0 then start = oldi;
found:
if Cmatch = 1 then do;
oldi = matched + 1;
start = matched + 1;
set &lib..&SCase. point = curob;
output &lib..&matched.;
end;
retain oldi start;
if _Error_=1 then _Error_=0;
run;
%MEND MATCH;
MACRO MATCH CALL STATEMENT
The following are call statements to the macro program MATCH. The first performs a 4-digit match; the second performs a 3-digit match.
%MATCH(STUDY,Propen,Match4,SCase4,
SContrl4,Interven,.0001);
%MATCH(STUDY,Propen,Match3,SCase3,
SContrl3,Interven,.001);
Presumably, you didn't include the beginning of the macro (i.e., the %MACRO MATCH(... portion, earlier in the paper). This is a macro; it is not intended to be run in pieces the way it is written. You need to include all of the code from %MACRO MATCH to %MEND, and then the calls.

Stata: label variables using forvalue loop

I am trying to label a batch of variables using a loop as follows, but it fails with the Stata error "invalid syntax". I can't work out what went wrong.
local myvars "basicenumerator" "basicfr_gpslatitude" "basicfr_gpslongitude"
local mylabels "Name of enumerator" "the latitude of the farmers house" "the longtitude of the farmers house"
local n : word count `mylabels'
forvalues i = 1/`n'{
local a: word `i' of `mylabels'
local b: word `i' of `myvars'
label var `b' "`a'"
}
To debug this, the main trick is to get Stata to show you what it thinks the local macros are. This script makes your code reproducible and also fixes it.
clear
set obs 1
gen basicenumerator = 42
gen basicfr_gpslatitude = 42
gen basicfr_gpslongitude = 42
local myvars `" "basicenumerator" "basicfr_gpslatitude" "basicfr_gpslongitude" "'
local mylabels `" "Name of enumerator" "the latitude of the farmers house" "the longitude of the farmers house" "'
local n : word count `mylabels'
mac li
forvalues i = 1/`n'{
local a: word `i' of `mylabels'
local b: word `i' of `myvars'
label var `b' "`a'"
}
The problem is that the outer " " get stripped in defining your locals, so to keep the " " as desired, you need to wrap each string within compound double quotes.
For explanation, see http://www.stata.com/manuals14/u12.pdf 12.4.6.
Picky correction: spelling is longitude.
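
The quote stripping can be seen in isolation with two throwaway locals. This is a minimal sketch; the names bad and good and the strings in them are made up for illustration:

```stata
* plain double quotes: the outermost pair is stripped when the local is defined
local bad "one two" "three four"
* compound double quotes: every inner pair of quotes survives
local good `" "one two" "three four" "'
* show what Stata actually stored in each local
macro list _bad _good
* with the quotes intact there are 2 words, each of which may contain spaces
local n : word count `good'
display `n'
local second : word 2 of `good'
display "`second'"
```

With the inner quotes preserved, the word count is 2 and the second word is three four, which is what the loop over labels relies on.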

Variable length string argument in stata syntax

To clean up data, I am writing a program that takes a list of variables and replaces a list of strings with empty strings. While I have code to solve the problem, I want to learn how to use a variable-length list of strings as an argument.
To give a sense of the simple version, the following would replace any "X" in myval1 and myval2 with an empty string. It is called like this:
replace_string_with_empty myval1 myval2, code("X")
The code is,
capture program drop replace_string_with_empty
program replace_string_with_empty
syntax varlist(min=1), Code(string)
foreach var in `varlist' {
replace `var' = "" if `var' == "`code'"
}
end
But what if I have several codes? Setting aside that there may be cleaner ways to do this, I would like to call it like this:
replace_string_with_empty myval1 myval2, codes("X" "NONE")
But I can't figure out which type to use in the syntax command. For example, the following does not work:
capture program drop replace_string_with_empty
program replace_string_with_empty
syntax varlist(min=1), Codes(namelist)
foreach var in `varlist' {
foreach code in `codes' {
replace `var' = "" if `var' == "`code'"
}
}
end
Any ideas? (Again, I am sure there are better ways to solve this exact problem, but I want to figure out how to use syntax in this way for other tasks as well.)
Here's a simple example of one approach to this. The asis option will leave the quotes alone, but we will then need to use compound quotes when referring to the strings that are to be recoded to null:
capture program drop replace_string_with_empty
program replace_string_with_empty
syntax varlist(min=1 string), Codes(string asis)
tokenize `"`codes'"'
while "`1'" != "" {
foreach var of varlist `varlist' {
replace `var' = "" if `var'==`"`1'"'
}
macro shift
}
end
sysuse auto, clear
clonevar make2=make
replace_string_with_empty make*, codes("AMC Concord" "AMC Spirit" "Audi 5000")
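
As a variation on the same idea, the codes can also be walked with foreach ... of local, which splits on quoted strings itself, instead of tokenize and macro shift. This is a sketch only; blank_codes is a hypothetical program name, and the asis option is kept from the answer above:

```stata
capture program drop blank_codes
program blank_codes
    // blank_codes is a hypothetical name used only for this sketch
    syntax varlist(min=1 string), Codes(string asis)
    // foreach ... of local treats each quoted string as one element
    foreach code of local codes {
        foreach var of local varlist {
            quietly replace `var' = "" if `var' == `"`code'"'
        }
    }
end

sysuse auto, clear
clonevar make2 = make
blank_codes make make2, codes("AMC Concord" "AMC Spirit")
* both named cars should now be blank in make
count if make == ""
```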

Stata - run code if variable name contained in local

I would like to have an if condition in Stata which runs the code in braces for a certain variable only if that variable's name is contained in a local. E.g.
if (`variable` element of `variablenames_local`) {
gen variable2 = variable + 2
}
How can this be done in Stata?
You can use extended macro functions for that, which are documented in help extended_fcn. In this case help macrolist is very useful. (I never remember the names of those help-files, instead I usually type help macro or help local and follow the links in that help-file.)
sysuse auto, clear
local vars "price mpg foreign"
foreach var of varlist _all {
if `: list var in vars' {
di "do something smart with `var'"
}
}
// alternatively:
foreach var of varlist `vars' {
di "do something smart with `var'"
}
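
For a single name rather than a loop over all variables, the same macro list operators can be used directly. A minimal sketch, where the local name candidate is made up for illustration:

```stata
local vars "price mpg foreign"
local candidate "mpg"
* found is 1 if every element of candidate appears in vars, and 0 otherwise
local found : list candidate in vars
display `found'
* posof gives the position of the name within vars, or 0 if it is absent
local pos : list posof "mpg" in vars
display `pos'
```

Here found is 1 and pos is 2, so either can serve as the condition in an if block.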