I do not know Stata and I need to know what the following code means. I understand that prtvtacy gets recoded to 1 if it is 1 and 0 else. But what does recode prtvtacy (66/max=.) mean?
*cyprus
tab prtvtacy
tab prtvtacy, nolab
recode prtvtacy (66/max=.)
recode winlose 0=1 if prtvtacy==1
Here are inline comments for each line. Note that a variable in Stata is what most other programming languages call a column.
*cyprus // <- This is a comment
tab prtvtacy // <- This shows the frequency of the variable prtvtacy
tab prtvtacy, nolab // <- Same as above, but shows numeric code instead of label
recode prtvtacy (66/max=.) // <- Any value in variable prtvtacy between 66 and the highest value is changed to missing value (.)
recode winlose 0=1 if prtvtacy==1 // <- Set variable winlose to 1 for observations where winlose is 0 and prtvtacy is 1
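To see those two recode lines in action, here is a minimal sketch on made-up data. The four observations are invented for illustration and are not taken from the original file.

```stata
clear
input prtvtacy winlose      // toy values, assumed for illustration only
1  0
2  0
66 0
77 1
end

recode prtvtacy (66/max=.)            // 66 and 77 become missing (.)
recode winlose (0=1) if prtvtacy==1   // only the first row qualifies
list   // prtvtacy is now 1, 2, ., . and winlose is 1, 0, 0, 1
```

Here max stands for the largest nonmissing value of the variable, so the rule covers everything from 66 upward.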
I have a large data set in Stata.
There are several item batteries in this data set.
One item battery consists of 8 items (v1 - v8), each scaled from 1 to 7.
I want to set the items to missing in every row where all eight items take the value 1.
That is, if v1 to v8 all have the value 1 in a row, the values in that row should be replaced with missing values.
I know how to code missing values with the if qualifier, but the selection with the complex condition causes me difficulties.
The code for R would probably solve this via rowSums, but I need the solution for Stata.
I assume in R it would work like this:
df[rowSums(df[,c("v1", ... "v8")]!=1)==0, c("v1", .... "v8")] <- NA
If I understood this correctly, you want
egen rowall = concat(v1-v8)
mvdecode v1-v8 if rowall == 8 * "1", mv(1)
That is, every 1 in v1-v8 is recoded to missing if and only if all of those variables equal 1 in that observation.
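If you would rather not build a string, the same condition can probably be expressed with row-wise egen functions instead. This is a different technique from concat(), sketched here on three invented items (v1-v3) rather than the real eight; the helper variables lo, hi and nmiss are hypothetical names.

```stata
clear
input v1 v2 v3              // toy data, assumed for illustration only
1 1 1
1 2 1
7 7 7
end

egen byte lo    = rowmin(v1-v3)    // smallest value in the row
egen byte hi    = rowmax(v1-v3)    // largest value in the row
egen byte nmiss = rowmiss(v1-v3)   // number of missing items in the row

// all items equal 1 exactly when min and max are both 1 and nothing is missing
mvdecode v1-v3 if lo == 1 & hi == 1 & nmiss == 0, mv(1)
drop lo hi nmiss
list   // only the first row is set to missing
```

Checking nmiss keeps rows with some missing items and 1s elsewhere from being treated as "all 1", which matches what the concat() comparison does.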
I want to compute the proportions of a variable, but the warning "factor variables may not contain negative values" always comes up. When I check the label list, it contains the following:
label list w38_E1a
w38_E1a:
-99 Refused
-98 Don't know
1 Yes
2 No
How do I remove these -99 and -98 values?
Thank you.
Assuming that the data are stored as a numeric type, I would simply recode them to be positive: if the variable is categorical, the sign of the codes shouldn't matter.
recode w38_E1a (-99 = 99) (-98 = 98)
I think you should drop those outliers. You can use drop if w38_E1a<0
It seems that -99 and -98 are intended to code missing values, so these are not outliers. If this is the case, you should recode the values -99 and -98 to missing in every variable that uses the value label w38_E1a. To find the variables whose values are labeled with a specific value label you can use -findname- from SSC.
cap which findname
if _rc ssc install findname // install -findname if necessary
findname, vallabelname(w38_E1a)
foreach v of varlist `r(varlist)' {
recode `v' (-99 = .a ) (-98 = .b)
}
label def w38_E1a .a "Refused" .b "Don't know" -99 "" -98 "", modify
I could not find a way to respond to https://stackoverflow.com/users/15742435/jesse-kaczmarski or https://stackoverflow.com/users/15819003/bing, and because I have not "earned" enough reputation I can't comment on their answers directly. However, one should note that their advice can go wrong:
puput0808 only showed us the contents of a value label, whereas you are recoding a variable with the same name, or dropping cases if a variable with the same name has the values -99 or -98. But what if the variable name is not identical to the name of the value label? It could be (a) that no variable is connected to this value label (in that case an error message would occur) or (b) that several variables are connected to this value label and only one of them shares its name (in that case the problem would persist).
puput0808 showed us the labels of -99 and -98 indicating that the values are intended to be treated as missing. In that case recoding the values to positive numbers would certainly be a mistake.
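If it turns out that the variable really does share the label's name, a shorter route is probably mvdecode. The sketch below assumes exactly that (one variable called w38_E1a carrying the label w38_E1a) and uses five invented observations.

```stata
clear
input w38_E1a               // toy data, assumed for illustration only
1
2
-99
-98
1
end
label define w38_E1a -99 "Refused" -98 "Don't know" 1 "Yes" 2 "No"
label values w38_E1a w38_E1a

// map the negative codes to extended missing values
mvdecode w38_E1a, mv(-99=.a \ -98=.b)
label define w38_E1a .a "Refused" .b "Don't know", modify

proportion w38_E1a   // no negative values remain, so the warning should not appear
```

The extended missing values .a and .b keep the distinction between "Refused" and "Don't know" while still being excluded from the calculation.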
I found this curious behavior in the input command for Stata.
When you pass a local macro as an argument either for one variable or multiple, the input command gives this error:
'`' cannot be read as a number
Here are two examples that give the same error:
clear
local nums 1 1 1
input a b c
`nums'
end
clear
local num 1
input a b c
1 1 `num'
end
Is there a way to pass macros into the input command?
This is in substance largely a comment on the answer by Aaron Wolf, but the code makes it too awkward to fit in a comment.
Given stuff in a local, another way to do it is
clear
local num "1 1 1"
set obs 1
foreach v in a b c {
gettoken this num : num
gen `v' = `this'
}
Naturally, there are many ways to get 1 1 1 into three variables.
This does not pass a macro to the input command per se, but it does achieve your desired result, so perhaps this can help with what you are trying to do?
The general idea is to store the contents of the local in a variable, then split that variable (similar to the text-to-columns button in Excel).
clear
local nums "1 1 1"
foreach n of local nums {
if "`nums_2'" == "" local nums_2 "`n'"
else local nums_2 = "`nums_2'/`n'"
}
set obs 1
gen a = "`nums_2'"
split a, parse("/") gen(b) destring
drop a
I have a set of variables whose names I have saved in a global macro so that I can use them in a function:
global inlist_cond "amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss"
The reason why they are saved in a macro is because the list will be in a loop and its content will change depending on the year.
What I need to do is to generate a dummy variable so that water_dummy == 1 if any of the variables in the macro list has the WATER classification. In Stata, I need to write
gen water_dummy = inlist("WATER", "$inlist_cond")
, which--ideally--should translate to
gen water_dummy = inlist("WATER", amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss)
But this did not work---the code executed without any errors but the dummy variable only contained 0s. I know that it is possible to invoke macros inside functions in Stata, but I have never tried it when the macro contains a whole list of conditions. Any thoughts?
With a literal string specified, which is what the double quotes in the generate statement insist on, you are comparing text with text, and the comparison does not involve the data at all.
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen a = "water"
. gen b = "wine"
. gen c = "beer"
. global myvars "a,b,c"
. gen found1 = inlist("water", "$myvars")
. gen found2 = inlist("water", $myvars)
. list
     +---------------------------------------+
     |     a      b      c   found1   found2 |
     |---------------------------------------|
  1. | water   wine   beer        0        1 |
     +---------------------------------------+
The first comparison is equivalent to
. di inlist("water", "a,b,c")
0
which finds no match, as "water" is not matched by the (single!) other argument.
Macro references are certainly allowed within function or command calls: as each macro name is replaced by its contents before the syntax is checked, the function or command never even knows that a macro reference was ever used.
As @Aspen Chen concisely points out, omitting the double quotes gives what you want so long as the inlist() syntax remains legal.
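One case where the syntax may stop being legal is a long list: the help for inlist() documents a cap on the number of string arguments (10, as far as I know, though that is worth confirming for your version), and "WATER" plus ten variables would exceed it. A loop over a space-separated list avoids the cap altogether. This is a minimal sketch on invented data with only two of the year variables; the local name classvars is hypothetical.

```stata
clear
input str10(amz2002ras_clss amz2003ras_clss)   // toy data, assumed
"WATER"  "FOREST"
"FOREST" "FOREST"
"SOIL"   "WATER"
end

// space-separated list (no commas), so it can be looped over
local classvars amz2002ras_clss amz2003ras_clss

gen byte water_dummy = 0
foreach v of local classvars {
    replace water_dummy = 1 if `v' == "WATER"   // exact match on the data
}
list   // water_dummy is 1, 0, 1
```

Because the list lives in a macro, its contents can still change from one pass of an outer loop to the next.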
If your data structure is something like in the following example, you can try the egen function incss, from egenmore (ssc install egenmore):
clear
set more off
input ///
str15(amz2009 amz2010)
"water" "juice"
"milk" "water"
"lemonade" "wine"
"water & beer" "tea"
end
list
egen watindic = incss(amz*), sub(water)
list
Be aware it searches for substrings (see the result for the last example observation).
A solution with a loop achieving different results is:
gen watindic2 = 0
forvalues i = 2009/2010 {
replace watindic2 = 1 if amz`i' == "water"
}
list
Another solution involves reshape, but I'll leave it at that.
I have a set of data like below:
A B C D
1 2 3 4
2 3 4 5
They are aggregated data in which A, B, C and D constitute a 2x2 table. I need to do a Fisher exact test on each row and add a new column holding the p-value of the test for that row.
I can use fisher.test and a loop to do it in R, but I can't find a command in Stata for the Fisher exact test.
You are thinking in R terms, and that is often fruitless in Stata (just as it is impossible for a Stata guy to figure out how to do by ... : regress in R; every package has its own paradigm and its own strengths).
There are no objects to add columns to. Maybe you could say a little bit more about what you eventually need to do with your p-values, so as to find an appropriate solution that your Stata collaborators would sympathize with.
If you really want to add a new column (generate a new variable, speaking Stata), then you might want to look at tabulate and its returned values:
clear
input x y f1 f2
0 0 5 10
0 1 7 12
1 0 3 8
1 1 9 5
end
I assume that your A B C D stand for two binary variables, and the numbers are frequencies in the data. You have to clear the memory, as Stata thinks about one data set at a time.
Then you could tabulate the results and generate new variables containing p-values, although that would be a major waste of memory to create variables that contain a constant value:
tabulate x y [fw=f1], exact
return list
generate p1 = r(p_exact)
tabulate x y [fw=f2], exact
generate p2 = r(p_exact)
Here, [fw=variable] is a way to specify frequency weights; I typed return list to find out what kind of information Stata stores as the result of the procedure. THAT'S the object-like thing Stata works with. R would return the test results in the fisher.test()$p.value component, and Stata creates returned values, r(component) for simple commands and e(component) for estimation commands.
If you want a loop solution (if you have many sets), you can do this:
forvalues k=1/2 {
tabulate x y [fw=f`k'], exact
generate p`k' = r(p_exact)
}
That's the scripting capacity in which Stata, IMHO, is way stronger than R (although it can be argued that this is an extremely dirty programming trick). The local macro k takes values from 1 to 2, and this macro is substituted as `k' everywhere in the curly bracketed piece of code.
Alternatively, you can keep the results in Stata short term memory as scalars:
tabulate x y [fw=f1], exact
scalar p1 = r(p_exact)
tabulate x y [fw=f2], exact
scalar p2 = r(p_exact)
However, the scalars are not associated with the data set, so you cannot save them with the data.
The immediate commands like cci suggested here would also have returned values that you can similarly retrieve.
HTH, Stas
Have a look at the cci command with the exact option:
cci 10 15 30 10, exact
It is one of the so-called "immediate" commands. They allow you to do computations directly from the arguments rather than from data stored in memory. Have a look at help immediate.
Each observation in the poster's original question apparently consisted of the four counts in one traditional 2 x 2 table. Stas's code applied to data of individual observations. Nick pointed out that -cci- can analyze a b c d data. Here's code that applies -cci- to each table and, like Stas's code, adds the p-values to the data set. The forvalues i = 1/`=_N' statement tells Stata to run the loop from the first to the last observation. a[`i'] refers to the value of the variable a in the i-th observation.
clear
input a b c d
10 2 8 4
5 8 2 1
end
gen exactp1 = .
gen exactp2 =.
label var exactp1 "1-sided exact p"
label var exactp2 "2-sided exact p"
forvalues i = 1/`=_N'{
local a = a[`i']
local b = b[`i']
local c = c[`i']
local d = d[`i']
qui cci `a' `b' `c' `d', exact
replace exactp1 = r(p1_exact) in `i'
replace exactp2 = r(p_exact) in `i'
}
list
Note that there is no problem in giving a local macro the same name as a variable.
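A tiny sketch of why the shared name is harmless, using two invented observations: the local macro is only ever referred to with the `a' quoting, while a bare a always means the variable.

```stata
clear
input a                     // toy data, assumed for illustration only
10
20
end

local a = a[2]     // local macro a now holds 20; the variable a is untouched
display `a'        // macro reference: shows 20
display a[1]       // variable reference: shows 10
```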