I am trying to draw a marginsplot using Stata 12. I am running the following code:
margins, at(FuncVariant =(0(0.2) 1)) over(Platform)
I get the following error:
FuncVariant ambiguous abbreviation
r(111);
I have the following variables whose names start with FuncVariant:
FuncVariant
FuncVariant_mean
FuncVariant_W
Is that creating a problem?
Post the exact result of the following command to get a diagnosis of the issue in your data:
d FuncVariant*
To get rid of the issue, turn the Stata variable abbreviation setting permanently off:
set varabbrev off, perm
tl;dr: you probably don't have a FuncVariant variable in your data.
d FuncVariant*
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
FuncVariant     byte    %8.0g
FuncVariant_m~n float   %9.0g
FuncVariant_W   float   %9.0g
I understand that FuncVariant is a dummy variable, so I used FuncVariant_W instead, but that throws an error:
margins, at( FuncVariant_W =-1(0.2)1) over(Platform)
'FuncVariant_W' not found in list of covariates
For many other variables in the dataset it shows the same error, though the variables are present in the dataset.
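One likely reading of this last error, offered as an assumption rather than a confirmed diagnosis: margins only accepts in at() variables that are covariates of the most recently fitted model, so a variable can be present in the dataset and still be rejected. The sketch below uses the auto dataset shipped with Stata and a hypothetical regression (neither comes from the question) to show both cases.

```stata
* Example data and model; both are stand-ins, not the poster's data
sysuse auto, clear
regress price mpg i.foreign

* mpg is a covariate of the fitted model, so at() accepts it
margins, at(mpg = (15(5)35)) over(foreign)

* weight exists in the dataset but is not in the model,
* so margins refuses it; capture noisily shows the error without stopping
capture noisily margins, at(weight = (2000(500)4000))
```

If this is the cause, adding the variable to the estimation command before calling margins should make the at() specification valid.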
My code was running fine until I added the last line for age 5+. Does anyone know what's wrong with that line? Thank you.
data Work.File ;
set Work.File;
Female =(Sex ='F');
Male = (Sex ='M');
Age1=(age=1);
Age2=(age=2);
Age3=(age=3);
Age4=(age=4);
Age5+=(age='5+');
run;
SAS variable names are restricted: they cannot contain a + sign. Also, age should be a numeric variable. You can write the last line as:
Age5Plus=(age>=5);
"Age5+"n=(age>=5);
would also work after setting
options validvarname=any;
but then you have to escape that name every time you use the variable.
I have a data step where a few columns need to be tied to one other column.
I have tried using multiple "from" and "to" statements and a couple of other permutations, but nothing seems to do the trick. The code looks something like this:
data analyze;
set css_email_analysis;
from = bill_account_number;
to = customer_number;
output;
from = bill_account_number;
to = email_addr;
output;
from = bill_account_number;
to = e_customer_nm;
output;
run;
I would like to see two columns showing bill accounts in the "from" column, and the other values in the "to", but instead I get a bill account and its customer number, with some "..."'s for the other values.
Issue
This is most likely because SAS has two datatypes and the first time the to variable is set up, it has the value of customer_number. At your second to statement you attempt to set to to have the value of email_addr. Assuming email_addr is a character variable, two things can happen here:
Customer_number is a number - to has already been set up as numeric, so SAS cannot turn it into a character variable. The email address cannot be read as a number, so to is set to missing and a note like this appears in the log:
NOTE: Invalid numeric data, 'me#mywebsite.com' , at line 15 column 8.
to=. _ERROR_=1 _N_=1
Customer_number is a character - to has been set up as a character, but without explicitly defining its length, if it happens to be shorter than the value of email_addr then the email address will be truncated. SAS will not show an error if this happens:
Code:
data _NULL_;
to = 'hiya';
to = 'me#mydomain.com';
put to=;
run;
to=me#m
to is set with a length of 4, and SAS does not expand it to fit the new data.
Detail
The thing to bear in mind here is how SAS works behind the scenes.
The data statement sets up an output location
The set statement adds the variables of the specified dataset to a space in memory called the PDV, inheriting lengths and data types, and reads in the first observation.
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm
===================================================================
010101 | 758|me#my.com |John Smith
The first to assignment adds another variable, which inherits the characteristics of customer_number
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |758
(to is either char length 3 or a numeric)
Subsequent to statements will not alter the characteristics of the variable and SAS will continue processing
PDV (if customer_number is character = TRUNCATION):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |me#
PDV (if customer_number is numeric = DATA ERROR, to set to missing):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |.
Resolution
To resolve this issue it's probably easiest to set the length and type of to before your first to statement:
data analyze;
set css_email_analysis;
from = bill_account_number;
length to $200;
to = customer_number;
output;
...
You may get messages like this, where SAS has converted data on your behalf:
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
27:8
N.B. it's not necessary to explicitly define the length and type of from, because as far as I can see, you only ever get the values for this variable from one variable in the source dataset. You could also achieve this with a rename if you don't need to keep the bill_account_number variable:
rename bill_account_number = from;
I am working with SAS and want to recode a variable that has 50+ distinct qualitative levels, for example U.S. states.
I just want to collapse them into a grouping variable with 4 or 5 levels.
I have several ideas, for example if/else statements, but then I have to write out every state name in the code, which gets very heavy.
Is there another way to do this in SAS without redundant code, or without writing out each specific value?
Any ideas are appreciated!
Method 1:
Use IN, but you still have to list the values. You can also do it via a format, but you have to define the format first anyway.
if state in ('AL', 'AK', 'AZ' ... etc) then state_group = 1;
else if state in ( .... ) then state_group = 2;
Method 2:
For a format, you create the format using PROC FORMAT and then apply it.
proc format;
value $state_grp_fmt
'AL', 'AK', 'AZ' = '1'
'DC', 'NC' = '2' ;
run;
And then you can use it with the PUT function, which returns a character value.
State_Group = put(state, $state_grp_fmt.);
My data has some missing values for the variable issue. I'm trying to impute the most recent past issue value (for that subject, identified by id1 and id2), if any. If all past issue values are missing, I want the code to leave the current value as missing.
I tried the below code, but Stata says foreach can't be combined with by.
bys id1 id2 (date): foreach v in 1(1)_n {
replace issue[n] = issue[n-v] if !missing(issue[n-v]) and missing(issue[n])==1
}
Is there a way to do this without using foreach with by?
The attempted loop over observations is quite unnecessary, as Stata does that anyway.
If you want to use only the most recent non-missing value it is likely that you want this:
clonevar clone = issue
bys id1 id2 (date): replace issue = clone[_n-1] if missing(issue)
Note the following bugs in your code, apart from the one you flag:
foreach v in 1(1)_n: foreach won't expand a numlist with in; nor will it evaluate _n for you.
replace issue[n]: subscripts are not allowed in that position; replace issue means the same thing anyway.
issue[n-v]: you'd need a local reference there.
and is not a keyword: you need & if you want a logical "and"
n presumably is a typo for _n
See also this FAQ on replacing missing values
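As a minimal sketch of the difference between copying only from the immediately preceding observation and carrying the last non-missing value forward through a run of missings, the toy example below fills two copies of issue. The data and the names issue_prev and issue_ff are made up for illustration.

```stata
* Toy data: one subject (id1, id2) on five dates; issue is missing on dates 2, 3 and 5
clear
input id1 id2 date issue
1 1 1 5
1 1 2 .
1 1 3 .
1 1 4 7
1 1 5 .
end

* Variant 1: copy only from the immediately preceding observation.
* The clone keeps the original values, so a filled-in value is never passed on.
clonevar clone = issue
generate issue_prev = issue
bysort id1 id2 (date): replace issue_prev = clone[_n-1] if missing(issue_prev)

* Variant 2: carry the last non-missing value forward through a run of missings.
* replace works through the observations in order, so issue_ff[_n-1]
* may itself have been filled in one step earlier.
generate issue_ff = issue
bysort id1 id2 (date): replace issue_ff = issue_ff[_n-1] if missing(issue_ff)

list, sepby(id1 id2)
```

On date 3, issue_prev stays missing because the original value on date 2 was missing, while issue_ff picks up the 5 carried over from date 1. In both variants the first observation of a group is left alone, since [_n-1] then refers to observation 0, which evaluates to missing.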
I am able to extract the mean into a matrix as follows:
svy: mean age, over(villageid)
matrix villagemean = e(b)'
clear
svmat village
However, I also want to merge this mean back to the villageid. My current thinking is to extract the rownames of the matrix villagemean like so:
local names : rownames villagemean
Then I try to turn the macro names into a variable:
foreach v in names {
gen `v' = "``v''"
}
However, the variable names is empty. What did I do wrong? Since a lot of this is copied from the Stata mailing list, I particularly don't understand the meaning of local names : rownames villagemean.
It's not completely clear to me what you want, but I think this might be it:
clear
set more off
*----- example data -----
webuse nhanes2f
svyset [pweight=finalwgt]
svy: mean zinc, over(sex)
matrix eb = e(b)
*----- what you want -----
levelsof sex, local(levsex)
local wc: word count `levsex'
gen avgsex = .
forvalues i = 1/`wc' {
replace avgsex = eb[1,`i'] if sex == `:word `i' of `levsex''
}
list sex zinc avgsex in 1/10
I make use of two extended macro functions:
local wc: word count `levsex'
and
`:word `i' of `levsex''
The first one returns the number of words in a string; the second returns the nth token of a string. The help entry for extended macro functions is help extended_fcn. Better yet, read the manuals, starting with: [U] 18.3 Macros. You will see there (18.3.8) that I use an abbreviated form.
Some notes on your original post
Your loop doesn't do what you intend (although again, that is not crystal clear to me) because you are supplying a list with one element: the literal text names. You can see this by running and comparing:
local names 1 2 3
foreach v in names {
display "`v'"
}
foreach v in `names' {
display "`v'"
}
foreach v of local names {
display "`v'"
}
You need to read the corresponding help files to set that right.
As for the question in your original post, : rownames is another extended macro function but for matrices. See help matrix, #11.
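To make the rownames extended macro function concrete, here is a small sketch with a made-up matrix M and made-up row names v1, v2, v3 standing in for villagemean. It stores the row names in a local macro and then loops over them with the of local form.

```stata
* Hypothetical 3 x 1 matrix standing in for villagemean
matrix M = (10 \ 20 \ 30)
matrix rownames M = v1 v2 v3

* Extended macro function: put the row names of M into the local macro names
local names : rownames M
display "`names'"

* of local iterates over the words stored in the macro, one per pass
foreach v of local names {
    display "`v'"
}
```

The macro holds the row names as a space-separated list of words, which is why word count and word # of can be applied to it in the same way as to the levelsof result.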
My impression is that for the kind of things you are trying to achieve, you need to dig deeper into the manuals. Furthermore, if you have not read the initial chapters of the Stata User's Guide, you should do so.