Stata: label variables using forvalues loop

I am trying to label a batch of variables using a loop as follows, but it fails with the Stata error "invalid syntax". I can't work out what went wrong.
local myvars "basicenumerator" "basicfr_gpslatitude" "basicfr_gpslongitude"
local mylabels "Name of enumerator" "the latitude of the farmers house" "the longtitude of the farmers house"
local n : word count `mylabels'
forvalues i = 1/`n'{
local a: word `i' of `mylabels'
local b: word `i' of `myvars'
label var `b' "`a'"
}

To debug this, the main trick is to get Stata to show you what it thinks the local macros are. This script makes your code reproducible and also fixes it.
clear
set obs 1
gen basicenumerator = 42
gen basicfr_gpslatitude = 42
gen basicfr_gpslongitude = 42
local myvars `" "basicenumerator" "basicfr_gpslatitude" "basicfr_gpslongitude" "'
local mylabels `" "Name of enumerator" "the latitude of the farmers house" "the longtitude of the farmers house" "'
local n : word count `mylabels'
mac li
forvalues i = 1/`n'{
local a: word `i' of `mylabels'
local b: word `i' of `myvars'
label var `b' "`a'"
}
The problem is that the outermost " " get stripped when your locals are defined, so the first and last strings in each list lose a quote. To keep all the " " as desired, you need to wrap the whole list within compound double quotes, `" and "'.
For an explanation, see section 12.4.6 of http://www.stata.com/manuals14/u12.pdf.
Picky correction: spelling is longitude.
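The stripping can be seen directly in a minimal sketch. The macro names bad and good and their contents are illustrative only and are not from the original post.

```stata
* Plain double quotes: Stata strips the outermost pair, leaving a b" "c d
local bad "a b" "c d"
display `"bad holds: `bad'"'

* Compound double quotes: both quoted strings survive as separate words
local good `" "a b" "c d" "'
local n : word count `good'
local second : word 2 of `good'
display "good holds `n' words; word 2 is: `second'"
```

Run this from a do-file (or in one go), since local macros disappear once the run ends.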

Related

Checking data type of variable derived from user input

I have the following block of code that I thought should be able to classify a user's input's data type in a Stata .do file:
capture program drop smth
program define smth
di "Enter smth: " _request(smth1)
local type = substr("`: type $smth1 '", 1, 3)
if "`type'" == "str" {
di "It is a string!"
}
else if "`type'" == "flo" {
di "It is a float!"
}
else if "`type'" == "int" {
di "It is an integer!"
}
else {
di "it is not a string, float nor integer!"
}
end
However, when I executed the .do file (trialscript is the name of the .do file) in a Stata command prompt with the user input, "hello", I encountered the following error:
. do trialscript
. capture program drop smth
. program define smth
1. di "Enter smth: " _request(smth1)
2. local type = substr("`: type $smth1 '", 1, 3)
3. if "`type'" == "str" {
4. di "It is a string!"
5. }
6. else if "`type'" == "flo" {
7. di "It is a float!"
8. }
9. else if "`type'" == "int" {
10. di "It is an integer!"
11. }
12. else {
13. di "it is not a string, float nor integer!"
14. }
15. end
.
.
end of do-file
. smth
Enter smth: . hello
no variables defined
it is not a string, float nor integer!
What the user enters given your code is put into a global macro, which is not a variable in Stata's sense, as a variable is (only) a column of data in a dataset. The type syntax you used works only with variables.
All global macros that are defined are strings. The programmer and user can think they contain numbers if and only if their content can be used numerically.
A test of whether the input is numeric is to try something numeric, e.g.
capture di 1 + $smth1
if _rc di "it is a string"
else di "it is a number"
This is not quite fail-safe, as a string might contain the name of a numeric variable or scalar, in which case the operation should work.
A test of whether a global macro contains a string that can be interpreted as an integer is to check whether floor($smth1) == $smth1 or equivalently that ceil() and round() return the input value.
There is no sense in which a global macro or its contents can be a float or int, except by trying whether such a variable would accept the contents as a value.
Stata's terminology here is that of many statistical programs in which a variable is a column in a dataset. It comes as a surprise to many of those who started with a mainstream programming language, as I did myself. More at https://www.stata.com/statalist/archive/2008-08/msg01258.html
The kind of input you are programming is now unusual in Stata.

Generating dummy variable based on two string variables

I want to generate a dummy variable which is 1 if there is any match in two variables. These two variables are generated by egen concat and each contains a group of languages used in a country.
For example, var1 is apc apc apc apc and var2 is apc, or var1 is apc fra nya and var2 is apc. In either case, fndmtch2 or egen anymatch would not give me 1. Is there any way I can get 1 in each case?
Your data example can be simplified to
sysuse auto
egen var1 = concat(mpg foreign), punct(" ")
egen var2 = concat(trunk foreign), punct(" ")
as mapping to string in this instance is not needed for mpg trunk any more than it was needed for foreign. concat() maps to string on the fly, and the only issues with numeric variables (neither applying here) are if fractional parts are present or you want to see value labels.
Now that it is confirmed that multiple words can be present, we can work with a slightly more interesting example.
Here are two methods. One is to loop over the words in one variable and also the words in the other variable to check if there are any matches.
Stata's definition of a word here is that words are delimited by spaces. That being so, we can check for the occurrence of " word " within " variable ", where the leading and trailing spaces are needed because in say "frog toad newt" neither "frog" nor "newt" occurs with both leading and trailing spaces. In the OP's example the check may not be needed, but it often is, just as a search for "1" or "2" or "3" finds any of those within "11 12 13", which is wrong if you seek any as a word and not as a single character.
More is said on search for words within strings in a paper in press at the Stata Journal and likely to appear in 22(4) 2022.
* Example generated by -dataex-. For more info, type help dataex
clear
input str8 var1 str5 var2
"FR DE" "FR"
"FR DE GB" "GB"
"GB" "FR"
"IT FR" "GB DE"
end
gen wc = wordcount(var1)
su wc, meanonly
local max1 = r(max)
replace wc = wordcount(var2)
su wc, meanonly
local max2 = r(max)
drop wc
gen match = 0
quietly forval i = 1/`max1' {
forval j = 1/`max2' {
replace match = 1 if word(var1, `i') == word(var2, `j') & word(var1, `i') != ""
}
}
gen MATCH = 0
forval i = 1/`max1' {
replace MATCH = 1 if strpos(" " + var2 + " ", " " + word(var1, `i') + " ")
}
list
     +----------------------------------+
     |     var1    var2   match   MATCH |
     |----------------------------------|
  1. |    FR DE      FR       1       1 |
  2. | FR DE GB      GB       1       1 |
  3. |       GB      FR       0       0 |
  4. |    IT FR   GB DE       0       0 |
     +----------------------------------+
EDIT
replace MATCH = 1 if strpos(" " + var2 + " ", " " + word(var1, `i') + " ") & !missing(var1, var2)
is better code to avoid the uninteresting match of " " with " ".
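The point about leading and trailing spaces can be checked with a few calls to strpos(), which returns the position of the first match and 0 if there is none. This is only a sketch using the literal strings mentioned in the explanation.

```stata
* Substring search: "1" is found inside "11 12 13" (nonzero position)
display strpos("11 12 13", "1")

* Word search: pad both sides with spaces; " 1 " does not occur, so 0
display strpos(" " + "11 12 13" + " ", " " + "1" + " ")

* Padding still finds words at the start or end of the list
display strpos(" " + "frog toad newt" + " ", " " + "newt" + " ")
```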

How to add incrementing values to string for every time string occurs in Stata?

I have a string variable named talk. Let's say I want to find all instances of the word "please" in talk and, within each row, add a suffix to each "please" that contains an incrementing count of the word.
For example, if talk looks like this:
"will you please come here please do it as soon as you can if you please"
I want it to look like this instead:
"will you please1 come here please2 do it as soon as you can if you please3"
In other words, "please1" indicates that it's the first "please" to occur, "please2" is the second, etc.
I have written some code (below) using regex and several loops, but it doesn't work perfectly and, even if I could work out the kinks, it seems overly complicated. Is there a simpler way to do this?
* I first extract the portion of 'talk' beginning from the 1st please to the last
gen talk_pl = strtrim(stritrim(regexs(0))) if regexm(talk, "please.+please")
* I count the number of times "please" occurs in 'talk_pl'
egen count = noccur(talk_pl), string("please")
* in the loop below, x = 2nd to last word; i = 3rd to last word
qui levelsof count
foreach n in `r(levels)' {
local i = `n' -1
local x = `i' -1
replace talk_pl = regexrf(talk_pl, "please$", "please`n'") if count == `n'
replace talk_pl = regexrf(talk_pl, "please (?=.+?please`n')", "please`i' ") if count == `n'
replace talk_pl = regexrf(talk_pl, "please (?=.+?please`i')", "please`x' ") if count == `n'
}
* Example generated by -dataex-. To install: ssc install dataex
clear
input str71 talk
"will you please come here please do it as soon as you can if you please"
end
// Install egenmore if not installed already
* ssc install egenmore
clonevar wanted = talk
// count occurrences of "please"
egen countplease = noccur(talk), string(please)
// Loop over 1 to max number of occurrences
sum countplease, meanonly
forval i = 1/`r(max)' {
replace wanted = ustrregexrf(wanted, "\bplease\b", "please`i'")
}
list
     +---------------------------------------------------------------------------------------+
  1. | talk                                                                                  |
     | will you please come here please do it as soon as you can if you please               |
     |---------------------------------------------------------------------------------------|
     | wanted                                                                     | countp~e |
     | will you please1 come here please2 do it as soon as you can if you please3 |        3 |
     +---------------------------------------------------------------------------------------+
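If installing egenmore is not an option, the count can also be approximated with built-in string functions. This is a sketch under a stated assumption: it counts every occurrence of the substring "please", so a word such as "pleased" would be counted too. The variable name n_please is hypothetical.

```stata
clear
input str71 talk
"will you please come here please do it as soon as you can if you please"
end

* Characters removed by deleting every "please", divided by its length
gen n_please = (strlen(talk) - strlen(subinstr(talk, "please", "", .))) / strlen("please")
list n_please
```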

How do I replace the contents of one variable with that of another?

In essence I want to tell Stata that when variable F11 = "Ya" replace it with the value in variable Score (SCREENSHOTS attached).
So I want to replace the contents of the variable F11 as well as all the other indicator numbers (A01, B02, C03 etc.) with the score that applies to that indicator.
So for example, for the first observation in the screenshot, that person received a score of 19.88 (variable is Score) for the indicator F11 (variable is Kat_Indikator_KG) and the label "Ya" under the variable F11 tells us that that individual was scored for this category.
What I would like to do is replace the "Ya" with the score obtained in variable Score and I would like to do that for all the indicator variables e.g. A01, B02, C03.
So far I've tried the following, but none seem to work:
replace F11 = Score if Kat_Indikator_KG == "F-11"
replace F11 = Score if Kat_Indikator_KG == "F-11" & F11 == "Ya"
replace F11 = Score if F11 == "Ya"
Screenshots are attached and help is appreciated!
You cannot replace a string variable with a numeric value. You could replace "Ya" with "1" and destring it, so something like this:
ds Kat Score, not
foreach var of varlist `r(varlist)' {
replace `var' = "1" if `var' == "Ya"
destring `var', replace
replace `var' = Score if `var' == 1
}
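An alternative sketch leaves the string indicators untouched and stores the scores in new numeric variables. The data below are hypothetical (only the score 19.88 and the variable names come from the question), and the _score suffix is an assumption made for illustration.

```stata
clear
* Hypothetical two-row example; the real data were shown only as screenshots
input str4 Kat_Indikator_KG double Score str2 F11 str2 A01
"F-11" 19.88 "Ya" ""
"A-01" 7.5   ""   "Ya"
end

* For each indicator, copy Score where the indicator is "Ya"; missing otherwise
foreach var of varlist F11 A01 {
    gen double `var'_score = Score if `var' == "Ya"
}
list
```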

Stata - run code if variable name contained in local

I would like to have an if condition in Stata which runs the code in braces for a certain variable only if that variable's name is contained in a local. E.g.
if (`variable` element of `variablenames_local`) {
gen variable2 = variable + 2
}
How can this be done in Stata?
You can use extended macro functions for that, which are documented in help extended_fcn. In this case help macrolists is very useful. (I never remember the names of those help files; instead I usually type help macro or help local and follow the links there.)
sysuse auto, clear
local vars "price mpg foreign"
foreach var of varlist _all {
if `: list var in vars' {
di "do something smart with `var'"
}
}
// alternatively:
foreach var of varlist `vars' {
di "do something smart with `var'"
}
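For the case in the question, where a single named variable is tested, the same list in extended function can be used without a loop. This is a sketch; the macro name candidate and the new variable mpg2 are hypothetical.

```stata
sysuse auto, clear
local vars "price mpg foreign"
local candidate "mpg"

* `: list candidate in vars' expands to 1 if mpg is listed in vars, else 0
if `: list candidate in vars' {
    gen mpg2 = mpg + 2
}
```

Note that list takes macro names, not their contents, so candidate and vars appear without the surrounding `' quotes.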