I have data of 18 states for 6 years(2009-2014).How can i create dummies which consider state and time effect simultaneously?
Without your data I have to assume a lot to answer this, but if I assume your state variable is a string and your year variable is numeric, then to create dummy variables for this I would put the two variables together and then encode them, like below:
tostring year, replace
gen state_year = state+year
encode state_year, gen(state_year_num)
and state_year_num is your indicator variable.
If you want a bunch of dummy variables you can add this line:
tabulate state_year_num, gen(dummy)
which will generate as many dummy variables as state-year pairs.
Related
I have two variables containing state identifier and year. If I want to create dummy variables indicating each state, I usually write the following code:
tab state_id, gen(state_id_)
This will give me a group of variables, state_id_1,state_id_2,... etc. But what operations are available if I want to get a list of dummy variables for the interaction of state and year, for instance a dummy variable indicating state 1 in year 2005.
Have you tried looking at xi (https://www.stata.com/manuals13/rxi.pdf)? It will create dummies for each of the categorical variables and for the interaction of those two. So if you do:
xi i.state*i.year
This should give you what you are looking for, but note that it will naturally code this and omit the first category of each of your categorical variables.
I've gone in circles on this one for 1.5 hours, so I'm giving in and asking for help here. What I'm trying to do is dead simple but I cannot for the life of me find a link describing the process.
I have the following data step:
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date),date9.)));
%useful_macro(¯o_input_date);
run;
where a date value is passed to a macro function (I'm new to these). I'd like to use the numeric value of the date value - let's be wild and say I want to get the value of the year, multiply it by the day value, and subtract the remainder after dividing the month value by 3. I can't seem to get just the year value out of the input. I've tried various things such as
symget, both "naked" and prepended with "%", with arguments that represent all possible permutations of the following variants:
have a naked reference to the variable, e.g. macro_input_date
enclose in single quotes, e.g. 'macro_input_date'
enclose in double quotes, e.g. "macro_input_date"
prepend with the ampersand, e.g. ¯o_input_date
direct call to %sysfunc(year(<argument as variously specified above>)
Can anyone tell me what I am missing?
Thanks!
Given that you asked about macro functions, I'll guess that your example date processing is just an example. Talking about macro functions in general, it's important to understand that a macro function will (generally) not be doing any processing of its own, it will just be generating some data step code to do some task. So, for something like your contrived example, the data step code would be something like:
data out;
set in; * Assume this contains a numeric called 'some_date';
result = year(some_date) * day(some_date) - mod(month(some_date), 3);
run;
To macroise this, you don't need to transfer the data values to the macro, you just need to transfer the variable name:
%macro date_func(var=);
year(&var) * day(&var) - mod(month(&var), 3)
%mend;
data out;
set in; * Assume this contains a numeric called 'some_date';
result = %date_func(var=some_date);
run;
Note that the value of the var parameter here is the literal text some_date, not the value of the some_date data step variable. There are other ways to do it of course - you could actually pass this macro a date literal and it would still work:
data out;
set in; * Assume this contains a numeric called 'some_date';
result = %date_func(var="21apr2017"d);
run;
so it all depends on exactly what you're trying to do... maybe you want to assign the result to another macro variable, so it doesn't need to be part of a data step at all, in which case you could do a similar thing with %sysfunc functions etc.
If you're just trying to get the year, you would do something like:
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
yearval = substr(symget('macro_input_date'),6,4);
put yearval=;
run;
Your macro value (¯o_input_date) is not the actual date value (14610) but is the text 01JAN2000. So you cannot use the year function (unless you INPUT it back), you would use substr to grab the year part.
Of course, this is all sort of pointless as going to/from macro variable doesn't really accomplish much here.
Are you just have trouble with date literals? Your data step code
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date),date9.)));
run;
is just going to do the same thing as
%let macro_input_date=01JAN2000 ;
Now if you want to treat that string of characters as if it represents a date then you need to either wrap it up as a date literal
"¯o_input_date"d
Or convert it.
%sysfunc(inputn(¯o_input_date,date9))
Why not just store the actual date value into the macro variable?
call symputx('macro_input_date',some_date);
Then it wouldn't look like a date to you but it would look like a date to the YEAR() function.
totalSUPPLY= sum(of supply1-supply485);
Ive got this simple calculation to make (in SAS) from a table that Ive transposed (hence the variable names). I have to do this several times, and the the number of supply variables is not the same for each calculation. I.e. in the above example its 485, but I do it later in my analysis and its 350.
My question: Is there a way to 'wildcard' the number of 'supply' columns. Basically, I want something like this (but this doesnt work): totalSUPPLY= sum(of supply1-supply%);
Also: If there is an easier way do the same Im open (and would actually prefer) that.
Thanks everyone!
data yoursummary;
set yourdata; /*dataset containing supply1-supply485*/
array supplies{*} supply:;
totalSUPPLY = sum(of supplies{*});
run;
N.B. using a : wildcard like this will only pick up matching variables that are present in the PDV at the point when you create the array, so the array definition has to come after the set statement. Also, it only works for variables with a common prefix, not those with a common suffix.
As Joe has pointed out, the following more concise code also works:
data yoursummary;
set yourdata; /*dataset containing supply1-supply485*/
totalSUPPLY = sum(of supplies:);
run;
Of course, if you declare an array it's then easier to do related things like checking how many variables are being added together, or looping through the variables in the array and applying the same logic to each one in turn.
I have done this many times with Excel and Java... This time I need to do it using Stata because it is more convenient to preserve variables' labels. How can I restructure dataset_1 into dataset_2 below?
I need to transform the following dataset_1:
into dataset_2:
I know one way, which is a little awkward... I mean, I could expand all the observations, then create variable obsNo, and then, rename variables...is there any better way?
Stata is wonderful at this sort of thing, it's a simple reshape. Your data is a little awkward, as the reshape command was designed to work with variables where the common part of the variable name (in your case, Wage) comes first. In the documentation for reshape, "Wage" would be the stub. The part following Wage is required to be numeric. If you first sort your variable names by
rename (raceWhiteWage raceBlackWage raceAsianWage) (Wage1 Wage2 Wage3)
Then you can do:
reshape long Wage, i(state year) j(race)
That should give you the output your are looking for. You will have a column labeled "race", with values of 1 for White, 2 for Black, and 3 for Asian.
I have a Stata file file1.dta and one of the variables is income. I need to calculate average_income, assign it to a local macro, and store in a different Stata file, New.dta.
I have tried the following in a do file:
#delimit;
clear;
set mem 700m;
use file1.dta;
local average_income = mean income;
use New.dta;
gen avincome = average_income;
However, it does not work.
One way to do this would be the following:
#delimit;
clear;
set mem 700m;
use file1.dta;
quietly: summarize income;
local average_income = r(mean);
use New.dta;
gen avincome = `average_income';
This overlaps with your other post, namely How to retrieve data from multiple Stata files?. You don't say why you think
use file1.dta;
local average_income = mean income;
will work, but the second line is just fantasy syntax. There are various ways to calculate the mean of a variable, the most common being to use summarize and pick up the mean from r(mean).
You should probably delete this question: it serves no long-term purpose.