Create link between observations clustered within village - stata

Hi, I am trying to create dyads from household IDs clustered within villages in Stata. My problem is that I do not know how to do a vlookup-style operation in order to get a list of the household IDs linked to every household.

Without a bit more information this question is tough to answer, but there are some places you can look. First, use tabulate to see your data broken down by variables. Next, check the bysort prefix and the generate (gen) command; together these will probably be the answer you're looking for, although it is tough to tell from the question. Finally, you may want to look into encode if your village variable is a string: it gives you a numeric variable with a distinct integer code for each village.
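If the goal is one row per pair of households in the same village, one possible approach is to join the data to a copy of itself with joinby. This is only a sketch under assumptions: the variable names village and hhid and the toy data are hypothetical, not taken from the question, and it should be run as a single do-file so that the temporary file stays in scope.

```stata
* Toy data (hypothetical names: village, hhid)
clear
input village hhid
1 101
1 102
1 103
2 201
2 202
end

* Save a copy in which the household id is renamed, to pair against
preserve
rename hhid hhid_alter
tempfile alters
save `alters'
restore

* joinby forms all pairwise combinations within each village
joinby village using `alters'

* Drop the pairing of each household with itself
drop if hhid == hhid_alter

sort village hhid hhid_alter
list, sepby(village hhid)
```

This keeps directed pairs, so both (101, 102) and (102, 101) appear. To keep each dyad only once, replace the drop line with keep if hhid < hhid_alter.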

Related

Ranking sum of data by two categories

I have data that I would like to rank by two separate categories, State and ServiceType. Essentially, there are multiple years of data for each ServiceType across various states, and I was hoping to get the sum of all years for each ServiceType by State, meaning each State is treated independently and the sums of the various categories are ranked only within that state, not nationally.
I've tried
bys State ServiceCategory (quant_variable): ///
egen rank_quant_variable= rank(sum(quant_variable)), field
as well as a version of the above using a pre-calculated sum variable. Neither really works.
This lacks a reproducible example: you do not give your data, nor do you phrase your problem in terms of a dataset we could download, for example one that can be loaded directly in Stata. There is no need to give the full dataset, just a minimal example with the same structure.
The call to sum() here would be to Stata's sum() function, which yields the cumulative or running sum, which evidently isn't what you want. So that case is easy to dismiss.
The problem remaining is quite what you did in the code you don't show with a pre-calculated sum.
At a guess you worked out
bys State ServiceCategory: egen sum = total(quant_variable)
and then pushed that sum through rank(). But that would use each value of sum as many times as it occurred.
Perhaps you want something more like this:
egen tag = tag(State ServiceCategory)
bysort State: egen rank_quant_variable = rank(sum) if tag, field
bysort State ServiceCategory (rank_quant_variable): replace rank_quant_variable = rank_quant_variable[1]
But it's really hard (for me) to visualize this without details on what you did or an example to work on.
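For what it's worth, here is a minimal sketch of that idea on made-up data. The values, and the names year and total_q, are assumptions for illustration only. Missing values sort last, so within each State and ServiceCategory group the tagged observation's rank comes first and can be copied to the rest of the group.

```stata
* Made-up data: two states, two service categories, two years each
clear
input str2 State str1 ServiceCategory year quant_variable
"CA" "A" 2001 10
"CA" "A" 2002 20
"CA" "B" 2001 50
"CA" "B" 2002 5
"NY" "A" 2001 1
"NY" "A" 2002 2
"NY" "B" 2001 1
"NY" "B" 2002 1
end

* Total over all years for each State and ServiceCategory
bysort State ServiceCategory: egen total_q = total(quant_variable)

* Rank each group total once, within State (field: highest total = rank 1)
egen tag = tag(State ServiceCategory)
bysort State: egen rank_q = rank(total_q) if tag, field

* Spread the rank from the tagged observation to the rest of its group
bysort State ServiceCategory (rank_q): replace rank_q = rank_q[1]

list, sepby(State)
```

In this toy data category B should rank first in CA (total 55 against 30) and category A first in NY (total 3 against 2).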

How to obtain individual estimates (slopes/intercepts) in proc mixed (SAS)

I'm interested in seeing how sedentary behaviors change over time (Time 1, 2, 3) and then, in a second step, how that change relates to mental health.
Thus, I would like to obtain an estimate (slope/intercept) for each subject to allow me to do the 2nd step. I can't find online how to do it (not sure what to search for).
Here's my code so far, which gives me 2 estimates (boys and girls); I would rather have an estimate for every participant.
ods output LSMeans=Means1;
proc mixed data=sb.LFcomplete method=ml covtest;
class SexeF time;
model CompDay = Time SexeF Time*SexeF;
repeated time;
lsmeans time*sexeF;
run;
Thank you in advance!
Please check this website for a similar example:
https://www.stat.ncsu.edu/people/davidian/courses/st732/examples/ex10_1.sas
The professor uses HLM for longitudinal data analysis, with gender playing the role of your SexeF, age the role of your Time, and child as the ID. The tricky part is organizing the random-effects file: it is sorted by ID, and a gender variable (the group, i.e. your SexeF) is created for subsequent merging with the fixed-effects file. If your current ID variable is not aligned with your SexeF, you may sort by SexeF and create a new ID variable in SPSS before you import your data into SAS.

Can I impute a variable conditional on another?

I am trying to impute the data about whether someone is born in the UK from wave 1 to wave 2. I suspect the egen function would work but I am not sure what the code would look like?
As you can see, I need to assign the same born in the uk response for person id 1 in wave 1 to wave 2.
I know I could do it by reshaping the dataset to a wide format but do you know whether there is any other way?
This is a Stata FAQ as accessible here.
You can copy downwards in the dataset without creating any new variables.
bysort id (wave) : replace born_in_uk = born_in_uk[_n-1] if missing(born_in_uk)
mipolate (SSC) has a groupwise option that checks for there being more than one non-missing value. Search within www.statalist.org for mentions.
Note that egen is a command, not a function.
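A small made-up example of the copy-down, using the variable names from the line above (the data values are invented). The second step is an optional extension, not something the question asked for: it copies upwards as well, by sorting on negated wave, in case the value is known only in a later wave.

```stata
* Invented data: born_in_uk is known in only one wave per person
clear
input id wave born_in_uk
1 1 1
1 2 .
2 1 0
2 2 .
3 1 .
3 2 1
end

* Copy downwards: fill a missing value from the previous wave
bysort id (wave): replace born_in_uk = born_in_uk[_n-1] if missing(born_in_uk)

* Optional: copy upwards too, by reversing the order within id
generate negwave = -wave
bysort id (negwave): replace born_in_uk = born_in_uk[_n-1] if missing(born_in_uk)
drop negwave

* Check that every person now has a single value across waves
sort id wave
by id: assert born_in_uk == born_in_uk[1]

list, sepby(id)
```

The assert is there only as a safety check; it would stop with an error if some person had conflicting answers across waves.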
I am not sure whether born in the UK here is numeric with labels or a string. But what if you did something like this:
encode born_in_UK, gen(born_num)
bysort person_id: egen born_num2=mean(born_num)
drop born_num
rename born_num2 born_num
The idea is to think of the repeating personal ids as groups and use the mean function to fill the missing values in the group. I think this should work.

Sum calculations in Stata

I have problems managing data in Stata. I tried to search, but maybe my search terms were wrong, so I am very sorry if this question has already been asked.
If you look at the picture, you will see that I need to count countries: the number of times the same country occurs for every id. I have a huge dataset, so I need to do it quickly and not lose time.
Ask me questions.
Not entirely clear what you have in mind, but it sounds like you want:
bysort id country: generate count = _N
If not, a clearer example with fewer countries would be helpful.
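In case it helps, here is a sketch on invented data (the ids and country codes are made up). It shows what that line produces and one way to view a single row per id and country afterwards.

```stata
* Invented data: repeated countries within each id
clear
input id str2 country
1 "DE"
1 "DE"
1 "FR"
2 "FR"
2 "FR"
2 "FR"
end

* _N is the number of observations in each id-country group
bysort id country: generate count = _N

* Show each id-country combination once
egen tag = tag(id country)
list id country count if tag, sepby(id)
```

If only the counts are needed, contract id country would reduce the dataset to one observation per combination, with the count stored in a new variable _freq.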

How do I get a group minimum over level combinations of factors?

I would like to find minimum values within groups. In Stata, I think it is simply "by group, sort : egen minvalue=min(value)"...
I tried to mess around with ave and rowsum, but to no avail.
ave(value, group, FUN=min) did not work.
Sorry this answer is a little late, but in case you are still looking for the answer, or for future searchers, here goes.
You are on the right track with the by prefix. Here is what I'd do to find the lowest price of cars in the auto.dta dataset by domestic/foreign grouping.
sysuse auto, clear
bysort foreign : egen minprice = min(price)
What this does is create a new variable 'minprice' that holds the minimum price of domestic cars if a given car (observation) is domestic, and the minimum price of foreign cars if it is foreign. So this new variable will have just two values in this example, and you can check that by doing:
tabulate minprice
Depending on why you wanted to find the minimum values by group, this may not be what you had in mind, but hopefully someone finds it helpful.
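Since the title asks about level combinations of factors, the same pattern should extend to more than one grouping variable. A sketch, again with auto.dta; the choice of rep78 as the second factor and the name minprice2 are mine, for illustration only.

```stata
sysuse auto, clear

* Minimum price within each combination of foreign and rep78
bysort foreign rep78: egen minprice2 = min(price)

* Show one row per combination; egen's tag() leaves observations
* with missing rep78 untagged unless the missing option is given
egen tag = tag(foreign rep78)
list foreign rep78 minprice2 if tag, sepby(foreign)
```

If a dataset with one row per combination is wanted instead of a new variable, collapse (min) price, by(foreign rep78) is another option, at the cost of replacing the data in memory.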