I am trying to create a comparison group of observations using propensity score matching. There are some characteristics that I care more about matching on than others. My questions are:
Is it possible to adjust the relative weights of variables I'm matching on when constructing the propensity score?
If so, how would one do this in Stata (with the psmatch2 command, for example)?
Thanks!
Take a look at coarsened exact matching. There's a user-written command called cem that implements this in Stata and several other packages.
However, this is not equivalent to PSM, where the scores are estimated rather than imposed by the analyst.
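To make the idea concrete, here is a minimal Python sketch of coarsened exact matching. It is not the cem command itself: the function names, the data, and the cutpoints are all hypothetical. It shows how the analyst can impose priorities by binning an important variable finely and a less important one coarsely, then matching exactly on the bins.

```python
from collections import defaultdict

def coarsen(value, cutpoints):
    """Return the index of the bin that value falls into."""
    return sum(value >= c for c in cutpoints)

def cem_match(units, cutpoints):
    """Group units by coarsened covariate signature and keep only
    strata that contain at least one treated and one control unit."""
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for unit in units:
        signature = tuple(coarsen(unit[var], cuts) for var, cuts in cutpoints.items())
        arm = "treated" if unit["treated"] else "control"
        strata[signature][arm].append(unit["id"])
    return {sig: arms for sig, arms in strata.items()
            if arms["treated"] and arms["control"]}

# Hypothetical data, for illustration only.
units = [
    {"id": 1, "treated": 1, "age": 34, "income": 52},
    {"id": 2, "treated": 0, "age": 36, "income": 75},
    {"id": 3, "treated": 0, "age": 58, "income": 51},
    {"id": 4, "treated": 1, "age": 61, "income": 20},
]

# Fine bins on age (assumed to matter more), one cut on income (matters less).
cutpoints = {"age": [30, 40, 50, 60], "income": [40]}
matched = cem_match(units, cutpoints)
print(matched)
```

With these cutpoints, units 1 and 2 land in the same stratum even though their incomes differ a lot, because income is binned coarsely. Units 3 and 4 are dropped because their strata contain only one arm.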
How do I train a model to find occurrences of a US state when the set of labels is limited to the 50 states, given that we need a large amount of data (say 1000 rows) to train a given label?
I think it depends on the task you're trying to solve here. Do you need to decide whether a given two-letter combination is a US state abbreviation or not? Then a simple set of names would work. Or are you trying to build a simple NER (https://en.wikipedia.org/wiki/Named-entity_recognition) system for state names? In that case you can also start with simple regex matching, and if you want to train a model later, you will have far more than 50 examples. Your dataset won't be just "do these two letters represent a state or not", but many sentences that have state names somewhere in them, or not at all.
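As a rough sketch of the regex starting point, assuming full state names rather than two-letter codes, something like the following could tag sentences. The list is an illustrative subset and `find_states` is a hypothetical helper name.

```python
import re

# Illustrative subset only; a real list would cover all 50 states.
STATE_NAMES = ["California", "New York", "Texas", "West Virginia", "Virginia"]

# Try longer names first so "West Virginia" wins over "Virginia".
_alternatives = "|".join(
    re.escape(name) for name in sorted(STATE_NAMES, key=len, reverse=True)
)
STATE_PATTERN = re.compile(r"\b(" + _alternatives + r")\b", flags=re.IGNORECASE)

def find_states(sentence):
    """Return every state name found in the sentence, in order of appearance."""
    return [m.group(1) for m in STATE_PATTERN.finditer(sentence)]

print(find_states("She moved from West Virginia to texas last year."))
print(find_states("No place is mentioned here."))
```

Sentences tagged this way (with and without matches) could later serve as training examples for a model, which is how the dataset grows well beyond 50 rows.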
I am using the propensity score stratification method. I got some output but can't interpret it, and I am looking for a source that explains how to interpret these results.
I have divided the propensity scores into 5 groups and got this output at the end after running some code:
obs =1
type =0
freq =10 sum_wt = 1010988.4 sum_diff= 0.0015572 mean-diff= 0.0015572 SE-diff= 0.0000994551
I know that the frequency column stands for 2*5 (number of groups), that mean diff is equal to sum diff, and that SE diff is the sq rt of 1-sum of weights.
Does this say that ranking the propensity scores into 5 groups is an appropriate approach? Which of the above criteria should I use for the final decision?
I believe your output is just stating the distribution within the groups. You evaluate whether propensity score matching (in your case, stratified matching) works by looking at the absolute standardized differences of the variables before vs. after matching.
Here is a peer-reviewed paper my colleagues and I published that incorporates propensity score matching. There are some details in the methodology section, which I wrote, that should answer your question about how to evaluate whether your approach is working.
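The standardized-difference check mentioned above can be sketched in a few lines of Python. This is one common definition (difference in means divided by the pooled standard deviation), not necessarily the exact formula used in the paper. The data are made up, and the 0.1 cutoff is only a commonly cited rule of thumb.

```python
from statistics import mean, variance

def abs_std_diff(treated, control):
    """Absolute standardized mean difference, using the pooled sample SD."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return abs(mean(treated) - mean(control)) / pooled_sd

# Hypothetical covariate values (e.g. age), for illustration only.
age_treated = [50, 52, 54, 56]
age_control_before = [30, 35, 40, 45]   # all controls, before matching
age_control_after = [49, 52, 55, 56]    # controls kept after matching

before = abs_std_diff(age_treated, age_control_before)
after = abs_std_diff(age_treated, age_control_after)
print(f"before: {before:.2f}, after: {after:.2f}")
```

With stratification, the same quantity would be computed within each of the 5 strata (or as a weighted average across them). Balance is judged by whether the differences shrink toward zero.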
I am using the H2O machine learning package to do natural language predictions, including the functions h2o.word2vec and h2o.transform. I need sentence-level aggregation, which is provided by the AVERAGE parameter value:
h2o.transform(word2vec, words, aggregate_method = c("NONE", "AVERAGE"))
However, in my case I strongly wish to avoid equal weighting of "the" and "platypus" for example.
Here's a scheme I concocted to achieve custom word-weightings. If H2O's word2vec "AVERAGE" option uses all the words including duplicates that might appear, then I could effect a custom word weighting when calling h2o.transform by adding additional duplicates of certain words to my sentences, when I want to weight them more heavily than other words.
Can any H2O experts confirm that the word2vec AVERAGE parameter uses all the words, rather than just the unique words, when computing the average of the words in a sentence?
Alternatively, is there a better way? I tried but I find myself unable to imagine any correct math to multiply the sentence average by some factor, after it was already computed.
Yes, h2o.transform will consider each occurrence of a word for the averaging, not just the unique words. Your trick will therefore work.
There is currently no direct way to provide user-defined weights. You could probably do an ugly hack and weight the word embeddings directly, but that is not a straightforward solution I could recommend.
We can add this feature to H2O. I would love to hear what API would work for you (how would you like to provide the weights).
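To see why the duplication trick is equivalent to a weighted average, here is a small plain-Python check. It does not call H2O. It assumes, as stated above, that AVERAGE is an arithmetic mean over every token occurrence, and the two-dimensional embeddings are made up for illustration.

```python
def average(vectors):
    """Plain arithmetic mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def weighted_average(vectors, weights):
    """Weighted mean of vectors, with one weight per vector."""
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, column)) / total
        for column in zip(*vectors)
    ]

# Hypothetical 2-d embeddings, for illustration only.
embedding = {"the": [1.0, 0.0], "platypus": [0.0, 4.0]}

# Repeating "platypus" three times ...
sentence = ["the", "platypus", "platypus", "platypus"]
by_duplication = average([embedding[word] for word in sentence])

# ... gives the same vector as weighting it 3:1 against "the".
by_weights = weighted_average([embedding["the"], embedding["platypus"]], [1, 3])

print(by_duplication, by_weights)
```

One limitation of the trick follows directly from this: duplication can only express integer weight ratios, so a weight such as 1.5 needs every count scaled up (for example 2 and 3 copies).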
I have a group of treated firms in a country, and for each firm I would like to find the closest match in terms of industry, size and profitability in the rest of the country. I am working in Stata. All I need is to form a control group. Could anybody guide me with the code? That'd be greatly appreciated! I currently have the following, which doesn't get me what I need:
psmatch2 (logpension) (treated sector logassets logebitda), logit ate
Here's how you might match on x1 and x2 using Mahalanobis distance as a metric, to get the effect on y from treatment t:
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
psmatch2 t, mahalanobis(x1 x2) outcome(y) ate
The variable _n1 stores the observation number of the matched control observation for every treatment observation.
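For intuition about what Mahalanobis matching does, here is a minimal Python sketch for two covariates. It is not psmatch2's implementation. It assumes the covariance is estimated from the pooled sample and that matching is done with replacement, and all names and data are hypothetical.

```python
from statistics import mean

def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) for a list of 2-d points."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    n = len(points) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return sxx, sxy, syy

def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance between 2-d points a and b."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d' S^{-1} d, with the 2x2 inverse written out explicitly
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def nearest_controls(treated, controls):
    """For each treated point, return the index of the closest control."""
    cov = covariance_2d(treated + controls)  # assumption: pooled covariance
    return [
        min(range(len(controls)), key=lambda j: mahalanobis_sq(t, controls[j], cov))
        for t in treated
    ]

# Hypothetical (x1, x2) values, for illustration only.
treated = [(1.0, 10.0), (5.0, 50.0)]
controls = [(4.8, 47.0), (1.2, 12.0), (3.0, 90.0)]
matches = nearest_controls(treated, controls)
print(matches)
```

The returned list plays the same role as _n1: for each treated observation it records which control observation was matched.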
The following is a full set of code you can run to estimate the average treatment effect on the treated (your most important result) and to check whether the data is balanced (whether your result is valid). Before you run it, make sure your treated variable is coded as follows: 0 for the control group and 1 for the experimental/treatment group. "neighbor(1)" selects nearest-neighbor matching, which pairs each treated observation with the control observation whose propensity score is closest in absolute difference.
psmatch2 treated sector logassets logebitda, outcome(logpension) neighbor(1) common
After running psmatch2, you need to make sure your data is balanced. So you need to run this:
pstest sector logassets logebitda, treated(treated)
If any t-test in the output has a p-value below 0.05, your data is not balanced on that variable. To check the balance of your data visually, you can also run
psgraph
right after your psmatch2 command.
Good luck!
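For intuition about the neighbor(1) pairing rule described above, here is a minimal Python sketch. It is not what psmatch2 runs internally. It assumes matching with replacement, and the unit labels and propensity scores are hypothetical.

```python
def nearest_neighbor_match(treated_scores, control_scores):
    """Pair each treated unit with the control unit whose propensity
    score is closest in absolute difference (with replacement)."""
    pairs = {}
    for t_id, t_score in treated_scores.items():
        pairs[t_id] = min(
            control_scores,
            key=lambda c_id: abs(control_scores[c_id] - t_score),
        )
    return pairs

# Hypothetical estimated propensity scores, for illustration only.
treated_scores = {"T1": 0.62, "T2": 0.35}
control_scores = {"C1": 0.30, "C2": 0.58, "C3": 0.90}

pairs = nearest_neighbor_match(treated_scores, control_scores)
print(pairs)
```

Here C3 is never used: its score is far from every treated unit, which is how unmatched controls drop out of the comparison group.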
I've estimated a model via maximum likelihood in Stata and was surprised to find that estimated standard errors for one particular parameter are drastically smaller when clustering observations. I take it from the Stata manual on robust standard error estimation in ML that this can happen if the contributions of individual observations to the score (the derivative of the log-likelihood) tend to cancel each other within clusters.
I would now like to dig a little deeper into what exactly is happening and would therefore like to have a look at these score contributions. As far as I can see, however, Stata only gives me the total sum as e(gradient). Is there any way to pry the individual summands out of Stata?
If you have written your own command, you can create a new variable containing these scores using the ml score command. Official Stata commands and most finished user-written commands will often have score as an option for predict, which does the same thing but with an easier syntax.
These will give you the score of the log likelihood ($\ell$) with respect to the linear predictor, $x\beta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$. To get the derivative of the log likelihood with respect to an individual parameter, say $\beta_1$, you just use the chain rule:
$\frac{\partial \ell}{\partial \beta_1} = \frac{\partial \ell }{\partial x\beta} \frac{\partial x\beta}{\partial \beta_1}$
The scores returned by Stata are $ \frac{\partial \ell }{\partial x\beta}$, and $\frac{\partial x\beta}{\partial \beta_1} = x_1$.
So, to get the score for $\beta_1$ you just multiply the score returned by Stata and $x_1$.
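As a worked illustration of the chain rule above, here is a small Python sketch for a logit model, where the score with respect to the linear predictor is $y - p$. The logit choice, the data, and the cluster labels are assumptions for illustration only. The scores are evaluated at $\beta_0 = \beta_1 = 0$ to keep the arithmetic simple.

```python
import math

def logit_scores(y, x1, beta0, beta1):
    """Per-observation scores of a logit log-likelihood.

    Returns a list of (score wrt x*beta, score wrt beta1) pairs.
    """
    out = []
    for yi, xi in zip(y, x1):
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi)))
        score_xb = yi - p                       # d ell_i / d (x beta)
        out.append((score_xb, score_xb * xi))   # chain rule: times x1
    return out

# Hypothetical data: two clusters of two observations each.
y = [1, 0, 1, 0]
x1 = [2.0, 2.0, -1.0, -1.0]
cluster = ["a", "a", "b", "b"]

scores = logit_scores(y, x1, beta0=0.0, beta1=0.0)

# Sum the beta1 score contributions within each cluster.
cluster_sums = {}
for c, (_, score_b1) in zip(cluster, scores):
    cluster_sums[c] = cluster_sums.get(c, 0.0) + score_b1

print([s for _, s in scores])
print(cluster_sums)
```

In this made-up example the individual contributions are nonzero but cancel exactly within each cluster, which is the kind of pattern that makes clustered standard errors smaller than unclustered ones.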