I want to create a variable in SAS that satisfies certain conditions.
The variable defines a group, 'definitive RT'.
A case is included if it received at least 5 weeks of radiation therapy and had no surgery, chemotherapy, or any other treatment.
How do I tell SAS to create a group that received radiation for at least 5 weeks and no other treatment?
What kind of statement would this be?
I have a simple Django project and I am trying to keep track of ranks for certain objects to see how they change over time. For example, what was the rank of US GDP (compared to other countries) over the last 3 years? Below is the PostgreSQL database structure I am working with:
Below is what I am trying to achieve:
What I am finding challenging is that the previous period's value may or may not exist, and the entity itself may not even be present in the previous period. A period can be a year, a quarter, or a month, but any given record uses exactly one of these, and it stays the same across all years for that record.
Can someone guide me in the right direction to write a query that produces those tables? I am trying to avoid heavy for-loop queries because there may be hundreds of entities and many years of data.
So far I have only been able to achieve the below output:
I am trying to figure out how to use annotate to fetch previous-period values and ranks, but I am stuck.
How do I train a model to find occurrences of a US state when the set is limited to 50 states? We need a large amount of data (say 1000 rows) to train a given label.
I think it depends on the task you're trying to solve here. Do you need to tell whether a given two-letter combination is a US state abbreviation or not? Would a simple set of names work? Or are you trying to build some kind of simple NER (https://en.wikipedia.org/wiki/Named-entity_recognition) for state names? In that case, you can also start with simple regex matching, and if you want to train a model later, you have much more than 50 examples. Your dataset won't be just "do these two letters represent a state or not", but many sentences that either have state names somewhere in them or have none at all.
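To make the regex starting point concrete, here is a minimal Python sketch. The short STATE_NAMES list and the find_states helper are hypothetical names chosen for illustration, and the list is deliberately incomplete; it only shows how whole sentences, with or without state names, can become labelled examples.

```python
import re

# Illustrative subset only; a real list would hold all 50 state names.
STATE_NAMES = ["Virginia", "West Virginia", "New York", "Texas", "Ohio"]

# Put longer names first so "West Virginia" is not reported as "Virginia".
_alternatives = "|".join(
    re.escape(name) for name in sorted(STATE_NAMES, key=len, reverse=True)
)
_pattern = re.compile(r"\b(?:" + _alternatives + r")\b", flags=re.IGNORECASE)


def find_states(text):
    """Return (matched text, start, end) for each state name found in text."""
    return [(m.group(0), m.start(), m.end()) for m in _pattern.finditer(text)]


# Each sentence becomes one example; the spans are its labels (possibly none).
sentences = [
    "She moved from West Virginia to Texas in 2019.",
    "The meeting was rescheduled.",
]
labelled = [(s, find_states(s)) for s in sentences]
```

Spans produced this way could later serve as rough training labels for an NER model, though ambiguous cases (for example two-letter abbreviations that are also ordinary words) would likely need manual review.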
Calculate one child's allowance, based on 75 cents per year of age if s/he is under 10, and $1 per year if 10 or over.
I could have stated them at the beginning as well, but is the use of the rate variables after the decision box correct?
Yes, it's functionally correct.
You could get rid of some of the sections if you wish by just using one variable to store the rate (rather than rate1 / rate2), and then just using the decision block to set the variable differently. You then wouldn't need to duplicate the last two statements.
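The single-rate-variable idea can be sketched in Python as follows. The function name allowance is an assumption made for illustration; the original is a flowchart, so this is only one way to express the same logic.

```python
def allowance(age):
    """Allowance in dollars: $0.75 per year of age under 10, $1.00 per year at 10 or over."""
    # The decision block only chooses the rate ...
    if age < 10:
        rate = 0.75
    else:
        rate = 1.00
    # ... so the calculation itself is written once, after the decision.
    return rate * age
```

For an 8-year-old this gives 6.0 dollars, and for a 10-year-old 10.0 dollars.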
I have a group of treated firms in a country, and for each firm I would like to find the closest match in terms of industry, size and profitability in the rest of the country. I am working in Stata. All I need is to form a control group. Could anybody guide me with the code? That'd be greatly appreciated! I currently have the following, which doesn't get me what I need:
psmatch2 (logpension) (treated sector logassets logebitda), logit ate
Here's how you might match on x1 and x2 using Mahalanobis distance as a metric, to get the effect on y from treatment t:
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
psmatch2 t, mahalanobis(x1 x2) outcome(y) ate
The variable _n1 stores the observation number of the matched control observation for every treatment observation.
The following is a full set of code you can run to estimate the average treatment effect on the treated (your most important result) and to check whether the data is balanced (that is, whether your result is valid). Before you run it, make sure your treated variable is coded as follows: 0 for the control group and 1 for the experimental/treatment group. "neighbor(1)" selects nearest-neighbor matching with one neighbor. It pairs each treated observation with the control observation whose propensity score is closest in absolute difference.
psmatch2 treated sector logassets logebitda, outcome(logpension) neighbor(1) common
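To illustrate what nearest-neighbor pairing on the propensity score means, here is a small Python sketch. It is not psmatch2's implementation: the function name, firm ids and scores are hypothetical, the scores are assumed to be already estimated, and the sketch assumes matching with replacement (one control may be reused).

```python
def nearest_neighbor_match(treated_scores, control_scores):
    """Map each treated id to the control id with the closest propensity score.

    Matching is with replacement: one control may serve several treated units.
    Ties go to whichever control comes first in control_scores.
    """
    matches = {}
    for t_id, t_score in treated_scores.items():
        matches[t_id] = min(
            control_scores,
            key=lambda c_id: abs(control_scores[c_id] - t_score),
        )
    return matches


# Hypothetical firm ids and propensity scores, for illustration only.
treated = {"T1": 0.62, "T2": 0.35}
controls = {"C1": 0.30, "C2": 0.58, "C3": 0.90}
pairs = nearest_neighbor_match(treated, controls)
```

Here T1 is paired with C2 and T2 with C1, since those controls have the smallest absolute score differences.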
After running psmatch2, you need to make sure your data is balanced. So you need to run this:
pstest sector logassets logebitda, treated(treated)
If any t-test in the output has a p-value below 0.05, that covariate still differs significantly between the groups, which means your data is not balanced. To check the balance of your data visually, you can also run
psgraph
right after your psmatch2 command.
Good luck!
I have a dataset with observations at specific timepoints, but those timepoints (and the length of time between them) vary by group. I'm trying to "fill down" the data so that existing observations are carried down into missing cells. But I only want to do this for a certain number of rows after the original observation. So for example, I could have a dataset that looks like this:
For group A, I'd want to fill in the value for 2002 with 2001's value, 2004 with 2003, etc. I wouldn't want to fill in 2000 at all, since I don't have the preceding value. And I ALSO wouldn't want to fill in the 2011 value, because the "cyclelength" variable tells me that group A's observations are supposed to take place every two years, so I don't want to carry data forward past that. 2011 is just a genuinely missing value.
Similarly, in group B, I'd want to carry 2000's value forward into years 2001, 2002, and 2003 (because the "cyclelength" here is 4 years). I'd want to carry 2004's value into 2005, 2006, and 2007, but not beyond that--the later years should stay missing.
I've tried setting this up with the "carryforward" command, but haven't figured out how to have it stop filling down after a specified number of years that varies by group. Is there a way to do this, either with carryforward or otherwise?
This is a variation on a problem documented since 2000 as an FAQ: see here
The variation lies in limiting how far non-missing values are copied. But it falls easily to the same idea.
The year in which a value was last recorded can be copied down the dataset:
gen when_last_known = year if !missing(value)
bysort group (year) : replace when_last_known = when_last_known[_n-1] if missing(when_last_known)
Now the replacement wanted is
by group : replace value = value[_n-1] if missing(value) & (year - when_last_known) < cyclelength
That statement presupposes the sort order of the previous statement.
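For readers who want to check the logic outside Stata, the same rule can be sketched in plain Python. The row layout (dicts with group, year, value and cyclelength keys, with None for missing) and the sample numbers are assumptions for illustration, not the asker's data.

```python
def fill_down(rows):
    """Carry values forward within each group, but only while
    (year - year of last recorded value) < cyclelength.

    rows: list of dicts with keys group, year, value, cyclelength;
    None marks a missing value. Returns new rows; the input is not modified.
    """
    out = []
    last_known = {}  # group -> (year last recorded, value recorded then)
    for row in sorted(rows, key=lambda r: (r["group"], r["year"])):
        new = dict(row)
        if row["value"] is not None:
            last_known[row["group"]] = (row["year"], row["value"])
        elif row["group"] in last_known:
            when, value = last_known[row["group"]]
            if row["year"] - when < row["cyclelength"]:
                new["value"] = value
        out.append(new)
    return out


# Hypothetical data: group A is observed every 2 years.
observed = {2000: None, 2001: 5, 2002: None, 2003: 7, 2004: None, 2005: None}
rows = [
    {"group": "A", "year": y, "value": v, "cyclelength": 2}
    for y, v in observed.items()
]
filled = fill_down(rows)
```

In this sample, 2002 and 2004 are filled from the preceding year, while 2000 (no earlier value) and 2005 (two years after the last record) stay missing.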
On Statalist (see here) you'd be expected to document that carryforward is a user-written command to be installed from SSC. That's a good convention here too.
In practice, it's good data management to keep the original data exactly as they arrive and do this on a clone of the variable. Sooner or later someone will ask to see the original values, and then you could be seriously embarrassed.