Using varlist multiple times in Stata - list

I have three variables varA, varB and varC.
I am trying first to replace "missing" with "NA" in all three variables and then to add the same label to all three variables.
First I replaced "missing" with NA:
local mylist1 varA-varC
foreach v1 of varlist `mylist1' {
replace `v1'="NA" if `v1' =="missing"
}
Now if I want to call the list again to add the same label to all three variables:
foreach v1 of varlist `mylist1' {
label var `v1' "testvaraible"
}
but I get the error message:
varlist required
Could anyone explain why I can't reuse the list?

For your first example, this would be legal syntax if the variables concerned were all string:
local mylist1 varA-varC
foreach v of varlist `mylist1' {
replace `v' = "NA" if `v' == "missing"
}
Notice the different punctuation for referring to a local macro (different left and right quotation marks) and the difference in placing braces.
It is difficult even to work out what you want in your second example, but the loop is over differing values of a local macro v which you never refer to inside the loop. Also, depending on the definition of the unspecified local macro testvaraible [sic], it is still puzzling why you would label three variables identically.
You may need to be much more explicit about your data and exactly what you want if this does not answer the question. In particular, we can't see definitions for local macros v1 and testvaraible.

I came to this discussion after the edits made in response to the discussion around the previous answer. At this point, copying the code as it now stands in the original post, the problem has apparently been corrected. Hate to post this as an answer, but it's apparently too long for a comment.
. set obs 1
obs was 0, now 1
. generate str8 varA = "a"
. generate str8 varB = "missing"
. generate str8 varC = "c"
. local mylist1 varA-varC
. foreach v1 of varlist `mylist1' {
2. replace `v1'="NA" if `v1' =="missing"
3. }
(0 real changes made)
(1 real change made)
(0 real changes made)
. foreach v1 of varlist `mylist1' {
2. label var `v1' "testvariable"
3. }
. list, clean noobs
varA varB varC
a NA c
. describe
Contains data
obs: 1
vars: 3
size: 24
------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------------
varA str8 %9s testvariable
varB str8 %9s testvariable
varC str8 %9s testvariable
------------------------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
.

Related

SAS - if and then condition statements

My data set has more than 70,000 observations and more than 50 variables (Var1 to Var50). Each variable contains about 30 groups (I'll use a to z). I am trying to select data using IF statements: I'd like to select every observation in which the variables share the same group, e.g. observations where Var1 to Var30 are all a, or all b.
I seem to be writing something like
If (Var1="a" and Var2="a" and Var3="a" and Var4="a" and so on up to Var50="a")
or (Var1="b" and Var2="b" and Var3="b" and Var4="b" and so on up to Var50="b") ...
How do I consolidate this? I tried using an array, but it didn't work, and I was not sure whether arrays work in an IF-THEN statement.
IF (VAR2="A" or VAR2="B" or VAR2="C" or VAR2="D"
or VAR3="A" or VAR3="B" or VAR3="C" or VAR3="D"
or VAR4="A" or VAR4="B" or VAR4="C" or VAR4="D"
or VAR5="A" or VAR5="B" or VAR5="C" or VAR5="D"
or VAR6="A" or VAR6="B" or VAR6="C" or VAR6="D"
or VAR7="A" or VAR7="B" or VAR7="C" or VAR7="D"
or VAR8="A" or VAR8="B" or VAR8="C" or VAR8="D"
or VAR9="A" or VAR9="B" or VAR9="C" or VAR9="D"
or VAR10="A" or VAR10="B" or VAR10="C" or VAR10="D"
or VAR11="A" or VAR11="B" or VAR11="C" or VAR11="D"
or VAR12="A" or VAR12="B" or VAR12="C" or VAR12="D"
or VAR13="A" or VAR13="B" or VAR13="C" or VAR13="D"
or VAR14="A" or VAR14="B" or VAR14="C" or VAR14="D"
or VAR15="A" or VAR15="B" or VAR15="C" or VAR15="D"
or VAR16="A" or VAR16="B" or VAR16="C" or VAR16="D"
or VAR17="A" or VAR17="B" or VAR17="C" or VAR17="D"
or VAR18="A" or VAR18="B" or VAR18="C" or VAR18="D"
or VAR19="A" or VAR19="B" or VAR19="C" or VAR19="D"
or VAR20="A" or VAR20="B" or VAR20="C" or VAR20="D"
or VAR21="A" or VAR21="B" or VAR21="C" or VAR21="D"
or VAR22="A" or VAR22="B" or VAR22="C" or VAR22="D"
or VAR23="A" or VAR23="B" or VAR23="C" or VAR23="D"
or VAR24="A" or VAR24="B" or VAR24="C" or VAR24="D"
or VAR25="A" or VAR25="B" or VAR25="C" or VAR25="D"
or VAR26="A" or VAR26="B" or VAR26="C" or VAR26="D"
or VAR27="A" or VAR27="B" or VAR27="C" or VAR27="D"
or VAR28="A" or VAR28="B" or VAR28="C" or VAR28="D"
or VAR29="A" or VAR29="B" or VAR29="C" or VAR29="D"
or VAR30="A" or VAR30="B" or VAR30="C" or VAR30="D")
then Group=1; else Group=0;
You probably don't need a macro; however, a macro might be faster.
%let value=a;
data want;
set have;
array var[50];
keepit=1;
do i=1 to 50;
keepit = keepit and (var[i]="&value");
if ^keepit then
leave;
end;
if keepit;
drop i keepit;
run;
I create a flag variable and update its value; it becomes false if any value in the var[] array is not &value. I leave the loop as soon as one non-matching value is found, to make it more efficient.
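For readers who want to see the control flow without SAS, here is a minimal Python sketch of the same early-exit idea. It is an illustration only, not SAS code: each observation is assumed to be a plain list of strings, and the helper name keep_row is hypothetical.

```python
def keep_row(values, target):
    """Return True only if every value equals target (hypothetical helper).

    Like the SAS loop, the flag starts out true and the loop stops at the
    first value that does not match, so later values are never examined.
    """
    for v in values:
        if v != target:
            return False  # early exit, like LEAVE in the DATA step
    return True


# Hypothetical observations holding the values of var1-var5
rows = [
    ["a", "a", "a", "a", "a"],
    ["a", "b", "a", "a", "a"],
    ["b", "b", "b", "b", "b"],
]
kept = [r for r in rows if keep_row(r, "a")]  # like the subsetting IF
print(kept)  # [['a', 'a', 'a', 'a', 'a']]
```

Python's built-in all() short-circuits in the same way, so all(v == "a" for v in row) gives the same answer.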
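It's not exactly clear what you want. If you want to avoid writing a separate check for every variable, you can use WHICHC to find whether any variable in a list equals 'a'. WHICHC returns the position of the first argument that matches, or 0 if none does: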
X = whichc('a', of var1-var30);
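To make the return value concrete, here is a small Python sketch that imitates what WHICHC reports. The function is a hypothetical stand-in named after the SAS function, not SAS itself. The comparison is an exact string match, so it is case-sensitive, just as 'a' and 'A' differ in the SAS call.

```python
def whichc(target, *values):
    """Hypothetical Python stand-in for SAS WHICHC: return the 1-based
    position of the first argument equal to target, or 0 if none is."""
    for position, v in enumerate(values, start=1):
        if v == target:
            return position
    return 0


row = ["c", "A", "a", "a"]  # hypothetical values of var1-var4
print(whichc("a", *row))  # 3 ("A" in position 2 does not match "a")
print(whichc("z", *row))  # 0
```

A nonzero result therefore means "at least one variable matched", which is the usual way to turn the position into a yes/no flag.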
If you want to see what different groups you have across all the variables, I think a big proc freq is what you want:
proc freq data=have noprint;
table var1*var2*var3*var4....*var30*gender*age / list out=table_counts;
run;
And then check the table_counts data set to see if that has what you want.
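Here is a rough Python analogue of that PROC FREQ step. It counts each distinct combination of values across the variables, which is roughly what the COUNT column of the OUT= data set holds with the LIST option. The rows are hypothetical, and the sketch ignores details such as the PERCENT column and missing-value handling.

```python
from collections import Counter

# Hypothetical observations: values of var1-var3 for each row
rows = [
    ("a", "a", "a"),
    ("a", "b", "a"),
    ("a", "a", "a"),
    ("b", "b", "b"),
]

# One entry per distinct combination, with its frequency
table_counts = Counter(rows)
for combo, count in sorted(table_counts.items()):
    print(combo, count)
```

Combinations in which every value is the same (such as ("a", "a", "a")) are the groups the question is after.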
If neither of these is what you want, you need to add more detail to your question. A sample of data and the expected output would be perfect.
When I need to search several variables for a particular value, I combine all the variables into one string and then search that string, like this:
*** CREATE TEST DATA ***;
data have;
infile cards;
input VAR1 $ VAR2 $ VAR3 $ VAR4 $ VAR5 $;
cards;
J J K A M
S U I O P
D D D D D
l m n o a
Q U J C S
;
run;
data want;
set have;
*** USE CATS FUNCTION TO CONCATENATE ALL VAR# INTO ONE VARIABLE ***;
allvar = cats(var1, var2, var3, var4, var5);
*** IF NEEDED, APPLY UPCASE TO CONCATENATED VARIABLE ***;
*allvar = upcase(allvar);
*** USE INDEXC FUNCTION TO SEARCH NEW VARIABLE ***;
if indexc(allvar, "ABCD") > 0 then group = 1;
else group = 0;
run;
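Note that INDEXC looks for any single character of "ABCD" anywhere in the combined string, so this works as a value search only when every value is a single character. I'm not sure this is exactly what you need, but hopefully it is something you can modify for your particular task.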
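The concatenate-then-search approach can be sketched in Python using the five test rows above. This is an illustration under assumptions, not SAS: joining stripped strings stands in for CATS, and checking whether any character of "ABCD" occurs stands in for INDEXC(...) > 0. The helper name group_flag is hypothetical.

```python
def group_flag(values, chars="ABCD", upcase=False):
    """Concatenate the values (like CATS) and return 1 if any single
    character from chars occurs in the result (like INDEXC > 0)."""
    allvar = "".join(v.strip() for v in values)
    if upcase:
        allvar = allvar.upper()  # like the commented-out UPCASE line
    return 1 if any(c in allvar for c in chars) else 0


# The five rows from the test data above
have = [
    ["J", "J", "K", "A", "M"],
    ["S", "U", "I", "O", "P"],
    ["D", "D", "D", "D", "D"],
    ["l", "m", "n", "o", "a"],
    ["Q", "U", "J", "C", "S"],
]
print([group_flag(r) for r in have])               # [1, 0, 1, 0, 1]
print([group_flag(r, upcase=True) for r in have])  # [1, 0, 1, 1, 1]
```

The fourth row shows why the UPCASE line can matter: its lowercase "a" is found only after the combined string is upper-cased.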
The code as posted tests whether ANY of a list of variables matches ANY of a list of values.
Let's make a simple test dataset.
data have ;
input id (var1-var5) ($);
cards;
1 E F G H I
2 A B C D E
;;;;
Make one array of the values you want to find and one array of the variables you want to check. Loop over the list of variables until you either find one that contains one of the values or you run out of variables to test.
data want ;
set have;
array values (4) $8 _temporary_ ('A' 'B' 'C' 'D');
array vars var1-var5 ;
group=0;
do i=1 to dim(vars) until (group=1);
if vars(i) in values then group=1;
end;
drop i;
run;
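The same "stop at the first hit" loop can be sketched in Python. This is an illustration, not SAS: each row is assumed to be a list of strings, and any_match is a hypothetical name. The break plays the role of the UNTIL (group=1) condition.

```python
VALUES = ("A", "B", "C", "D")  # the values to look for


def any_match(row, values=VALUES):
    """Return 1 as soon as one variable holds one of the values, else 0.

    Mirrors DO i=1 TO DIM(vars) UNTIL (group=1): the loop runs over the
    variables and stops at the first match.
    """
    group = 0
    for v in row:
        if v in values:
            group = 1
            break
    return group


# The two test rows from the HAVE data set above, keyed by id
have = {1: ["E", "F", "G", "H", "I"], 2: ["A", "B", "C", "D", "E"]}
result = {id_: any_match(row) for id_, row in have.items()}
print(result)  # {1: 0, 2: 1}
```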
You could avoid the array for the list of values if you want.
if vars(i) in ('A' 'B' 'C' 'D') then group=1;
But using the array will allow you to make the loop run over the list of values instead of the list of variables.
do i=1 to dim(values) until (group=1);
if values(i) in vars then group=1;
end;
That could be important if you wanted to keep the variable i to indicate which value (or variable) was matched first.
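One way to record which value matched first is sketched below in Python. It illustrates the idea only and does not reproduce how SAS sets i after the loop. The loop runs over the values, and the function returns the 1-based position of the first value found in the row. The name first_matched_value is hypothetical.

```python
def first_matched_value(row, values=("A", "B", "C", "D")):
    """Loop over the values instead of the variables and return the 1-based
    position of the first value found in the row, or 0 if none is found."""
    for i, value in enumerate(values, start=1):
        if value in row:
            return i
    return 0


print(first_matched_value(["E", "F", "G", "H", "I"]))  # 0
print(first_matched_value(["E", "C", "B", "H", "I"]))  # 2 (value "B")
```

Because the outer loop is over the values, the result follows the order of the value list: "B" is reported even though "C" appears earlier in the row.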

SAS, combine strings

I have a dataset test1, and I want to generate a key that combines any specified set of variables, for example the key in ideal_1 or the key in ideal_2. I need to write a macro for this, but the challenge for me is that the number of variables is not fixed: in ideal_1 the key combines 2 variables, and in ideal_2 it combines 3.
data test1;
input var1$ var2$ var3$ var4$ var5$;
datalines;
1 a a b e
2 a f b e
3 a a a a
1 b a a a
2 a f b e
;
run;
data ideal_1;
set test1;
key=strip(var1)||strip(var2);
run;
data ideal_2;
set test1;
key=strip(var1)||strip(var2)||strip(var5);
run;
Just use a variable list. You could store the list in a macro variable to make it easier to edit.
%let keylist=var1 var2 var5 ;
Then you can use the macro variable wherever you need it.
data ideal_2;
set test1;
key=cats(of &keylist);
run;
If the variables have a naming convention as in your example you can use something like the following, which uses the colon operator to concatenate all of the variables that start with the prefix VAR.
key = catt(of var:);
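Here is a Python sketch of the same idea. The key variables are held in one list, playing the role of &keylist, so the number of variables does not matter. Records are assumed to be dicts, and the helper names are hypothetical. SAS variable names are case-insensitive, whereas the prefix match here is case-sensitive.

```python
def make_key(record, keylist):
    """Strip and concatenate the listed variables (like CATS(OF &keylist))."""
    return "".join(str(record[name]).strip() for name in keylist)


def prefix_vars(record, prefix):
    """Variable names starting with prefix, in record order (like VAR:)."""
    return [name for name in record if name.startswith(prefix)]


# First two rows of TEST1 above
test1 = [
    {"var1": "1", "var2": "a", "var3": "a", "var4": "b", "var5": "e"},
    {"var1": "2", "var2": "a", "var3": "f", "var4": "b", "var5": "e"},
]
keylist = "var1 var2 var5".split()  # like %let keylist=var1 var2 var5;
keys = [make_key(r, keylist) for r in test1]
print(keys)  # ['1ae', '2ae']
print(make_key(test1[0], prefix_vars(test1[0], "var")))  # 1aabe
```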

Stata: convert a matrix to dataset without losing names

This question has been asked before but the answers do not seem to apply here. I would like to make a dataset from my regression output, without losing information. Consider:
clear *
input str3 iso3 var1 var2 var3
GBR 10 13 15
USA 9 7 4
FRA 8 8 7
BEL 3 4 5
end
local vars var2 var3
reg var1 var2 var3
matrix A=r(table)
matrix list A
clear
xsvmat A, names(col) norestore
Here Stata complains about the _cons column. I'm not interested in this column (although I also don't understand why including it is such a problem), but I can't find an option to deal with this in the help for xsvmat, svmat or svmat2.
Although Stata variable names can usually start with an underscore _, [U] 11.3 Naming conventions explains that _cons is a reserved name, and reserved names can't be used as variable names.
I think you want this:
clear
set more off
input ///
str3 iso3 var1 var2 var3
GBR 10 13 15
USA 9 7 4
FRA 8 8 7
BEL 3 4 5
end
local vars var2 var3
reg var1 var2 var3
matrix A = r(table)
// get original row names of matrix (and row count)
local rownames : rowfullnames A
local c : word count `rownames'
// get original column names of matrix and substitute out _cons
local names : colfullnames A
local newnames : subinstr local names "_cons" "cons", word
// rename columns of matrix
matrix colnames A = `newnames'
// convert to dataset
clear
svmat A, names(col)
// add matrix row names to dataset
gen rownames = ""
forvalues i = 1/`c' {
replace rownames = "`:word `i' of `rownames''" in `i'
}
// check
order rownames
list, sep(0)
Extended macro functions are used. See help extended_fcn if you're not familiar with them.
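The two steps in that code are renaming _cons as a whole word and then storing the matrix row names as a variable. Both can be sketched in Python to show the logic. The row names and numbers below are placeholders, not real r(table) output, and the helper name is hypothetical.

```python
def substitute_word(names, old, new):
    """Replace old only where it is a complete word (a sketch of the idea
    behind subinstr with the word option): 'x_cons' would be left alone."""
    return [new if name == old else name for name in names.split()]


colnames = "var2 var3 _cons"  # column names as in the question
newnames = substitute_word(colnames, "_cons", "cons")
print(newnames)  # ['var2', 'var3', 'cons']

# Hypothetical 2 x 3 matrix standing in for r(table): the row names and
# numbers are placeholders, not real regression output.
rownames = ["b", "se"]
matrix = [[1.0, 2.0, 3.0], [0.1, 0.2, 0.3]]

# One record per matrix row, with the row name stored as a variable,
# like the forvalues loop that fills rownames row by row.
dataset = [
    {"rownames": rownames[i], **dict(zip(newnames, row))}
    for i, row in enumerate(matrix)
]
print(dataset[0])  # {'rownames': 'b', 'var2': 1.0, 'var3': 2.0, 'cons': 3.0}
```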
See also this answer, which is very similar and suggests postfile and statsby.
Finally, check ssc describe estout, if your goal is to output regression tables.

Refer to iteration number inside for-loop in Stata

Like many others, I often loop through variables in Stata, running some estimation command and then extracting the results to a variable created to hold them. This is simple when the variables are numbered sequentially or in some pattern (e.g. even numbers in a set). As an example:
sysuse auto
gen var1 = uniform()
gen var2 = uniform()
gen var3 = uniform()
*Create variables to hold results
gen str4 varname=""
gen results=.
*Loop through three variables
foreach i of numlist 1/3{
replace varname="var`i'" in `i'
sum var`i'
replace results=r(mean) in `i'
}
However, I often want to do something similar when the variables are not numbered and/or are not in an easy-to-handle order. Say I wanted to do the same thing for price, mpg, weight and length in the auto dataset. If we set up the for-loop as:
sysuse auto
gen str4 varname=""
gen results=.
foreach var of varlist price mpg weight length{
sum `var'
*Place values, in order, in rows?
}
then we need some way to understand that price is the first variable in the list, so its results should go in row 1 (or its name in row 1, or whatever we want to do).
This must be possible, but I would appreciate some suggestions. A clean/non-hackish way would be ideal, as I will be doing this a lot.
You can use a local counter that you start at 1 and increment at the end of each iteration:
sysuse auto, clear
gen varname=""
gen mean=.
local i=1
foreach var of varlist price mpg weight {
quietly sum `var'
replace mean = r(mean) in `i'
replace varname = "`var'" in `i'
local ++i
}
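The counter pattern does not depend on Stata. Here is a Python sketch that mirrors it step by step. The columns and values are made up for illustration and are not the real auto data, and indices are shifted by one because Python lists are 0-based while Stata's in `i' is 1-based.

```python
# Hypothetical columns standing in for the auto data; values are made up
data = {
    "price": [4000, 5000, 6000],
    "mpg": [20, 22, 24],
    "weight": [3000, 3100, 3200],
}
variables = ["price", "mpg", "weight"]

varname = [None] * len(variables)
mean = [None] * len(variables)

i = 1  # like: local i = 1
for var in variables:
    values = data[var]
    mean[i - 1] = sum(values) / len(values)  # row i, 0-based in Python
    varname[i - 1] = var
    i += 1  # like: local ++i

print(varname)  # ['price', 'mpg', 'weight']
print(mean)     # [5000.0, 22.0, 3100.0]
```

In Python itself, enumerate(variables, start=1) would supply the counter directly. The explicit counter is kept here only to mirror local ++i.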
You could also do this. It's unlikely to seem as direct or simple as the standard technique explained by @Dimitriy V. Masterov, but it has its uses.
sysuse auto, clear
gen varname = ""
gen mean = .
local nvars : word count price mpg weight
tokenize "price mpg weight"
quietly forval j = 1/`nvars' {
sum ``j'', meanonly
replace mean = r(mean) in `j'
replace varname = "``j''" in `j'
}
The general points are:
Words are separated by spaces, except that double quotation marks and compound double quotation marks bind more tightly. Thus a, b and c are, unsurprisingly, the words in a b c, but there are just two words in Stata "is great".
You can count how many objects you are looping over: it is the number of words in a string.
Applying tokenize to an argument string maps the separate words of that argument to local macros named 1, 2 and so forth. The nested macro references that this is likely to imply are interpreted just as you would guess from elementary algebra: the innermost reference is evaluated first.
For more complicated problems, including unpacking a varlist, check out unab as well.
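As a rough analogy only, Python's shlex module splits on spaces while keeping double-quoted text together. That makes it easy to see how word counting and numbered words work. shlex follows POSIX shell rules, so it also treats single quotes and backslashes specially and knows nothing about Stata's compound double quotes. The tokenize helper below is hypothetical and is not Stata's command.

```python
import shlex


def tokenize(text):
    """Split text into words and number them from 1, roughly like Stata's
    tokenize filling locals 1, 2, ... (hypothetical helper)."""
    return {str(k): w for k, w in enumerate(shlex.split(text), start=1)}


print(len(shlex.split("a b c")))        # 3
print(shlex.split('Stata "is great"'))  # ['Stata', 'is great']

macros = tokenize("price mpg weight")
nvars = len(macros)  # like: local nvars : word count price mpg weight
for j in range(1, nvars + 1):
    # macros[str(j)] plays the role of the nested reference: j first,
    # then the word stored under that number
    print(j, macros[str(j)])
```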

How to keep a list of variables given some of them may not exist?

I have 100 .dta files and a list of variables that I need to keep, saving temporary copies on the fly. Some of the variables may or may not exist in a given .dta file.
I need Stata to keep all variables that exist in a dta and ignore those that do not exist.
The following code has wrong syntax, but it could serve as a good pseudo code to give one a general idea of what should be done:
forval j = 1/100 {
use data`j'
local myVarList =""
foreach i of varlist var1 var2 var3 var4 var5 var6 var7 var8 {
capture sum `i'
if _rc = 0 {
`myVarList' = `myVarList'" "`i'
}
}
keep `myVarList'
save temporaryData`j'
}
Is there any way to do this?
There are many issues with your code. Here's one way to do the inner loop.
/* one fake dataset */
set obs 5
gen var1 = 1
gen var2 = 2
gen var3 = "c"
gen z = 35
ds
/* keep part */
local masterlist "var1 var2"
local keeplist = ""
foreach i of local masterlist {
capture confirm variable `i'
if !_rc {
local keeplist "`keeplist' `i'"
}
}
keep `keeplist'
The key point is that you can't write foreach i of varlist phantomvar, because Stata checks that every variable exists and exits with an error. Similarly, putting the local macro name in macro quotes evaluates it, whereas you are trying to redefine it. You may find set trace on useful for debugging.
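The logic of that loop can be sketched in Python. Walk through the wanted names and keep only those that exist, so a missing name is skipped instead of causing an error. Here, membership in a set of existing names stands in for capture confirm variable succeeding. The helper name is hypothetical.

```python
def build_keeplist(masterlist, existing):
    """Collect the names in masterlist that exist in the data (a sketch of
    the capture confirm variable / !_rc loop above)."""
    existing = set(existing)
    keeplist = []
    for name in masterlist:
        if name in existing:  # stands in for confirm variable succeeding
            keeplist.append(name)
    return keeplist


# Variable names of the fake dataset above, plus a name that does not exist
existing = ["var1", "var2", "var3", "z"]
print(build_keeplist(["var1", "var2", "phantomvar"], existing))  # ['var1', 'var2']
```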
This is somewhat better code:
unab allvars: _all
local masterlist "var1 var2 phantomvar"
local keeplist: list allvars & masterlist
keep `keeplist'
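The shorter version replaces the loop with a single list intersection. Here is a Python sketch of an order-preserving intersection. That Stata's list allvars & masterlist keeps the order of the first list is an assumption of this sketch; check help macrolists for the exact rules.

```python
def list_intersect(a, b):
    """Elements of a that also appear in b, keeping the order of a.

    Sketch of the idea behind `local keeplist: list allvars & masterlist`;
    keeping the order of the first list is an assumption here.
    """
    in_b = set(b)
    return [x for x in a if x in in_b]


allvars = ["var1", "var2", "var3", "z"]  # like: unab allvars: _all
masterlist = ["var2", "var1", "phantomvar"]
print(list_intersect(allvars, masterlist))  # ['var1', 'var2']
```

Because phantomvar never appears in allvars, it drops out without any existence check.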