I'm interested in seeing how sedentary behaviors change over time (Time 1, 2, 3) and, in a second step, how that change relates to mental health.
Thus, I would like to obtain an estimate (slope/intercept) for each subject so that I can do the second step. I can't find how to do this online (I'm not sure what to search for).
Here's my code so far, which gives me 2 estimates (boys and girls); I would rather have an estimate for every participant.
ods output LSMeans=Means1;
proc mixed data=sb.LFcomplete method=ml covtest;
class SexeF time;
model CompDay = Time SexeF Time*SexeF;
repeated time;
lsmeans time*sexeF;
run;
Thank you in advance!
Please check this website for a similar example:
https://www.stat.ncsu.edu/people/davidian/courses/st732/examples/ex10_1.sas
The professor used a hierarchical linear model (HLM) for longitudinal data analysis, with gender playing the role of your SexeF, age the role of your Time, and child as the ID. The tricky part is organizing the random-effects file: it is sorted by ID, and Gender (the group variable, your SexeF) is created so that the file can later be merged with the fixed-effects file. If your current ID variable is not aligned with your SexeF, you may sort by SexeF and create a new ID variable in SPSS before you import your data into SAS.
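As a rough sketch of the random-coefficient approach the linked example uses, the code below asks PROC MIXED for a subject-specific intercept and slope. It assumes a subject identifier named ID (hypothetical, not shown in the question) and that Time is numeric (1, 2, 3) and therefore left out of the CLASS statement so that it acts as a slope.

```sas
/* Assumptions: ID is a hypothetical subject identifier;          */
/* Time is numeric and treated as continuous (not in CLASS).      */
ods output SolutionF=FixedEst SolutionR=RandomEst;
proc mixed data=sb.LFcomplete method=ml covtest;
    class ID SexeF;
    model CompDay = Time SexeF Time*SexeF / solution;
    /* One random intercept and one random slope per subject */
    random intercept Time / subject=ID type=un solution;
run;
```

RandomEst then holds one row per subject and effect. These estimates are deviations from the fixed effects, so each subject's own intercept and slope is the relevant fixed-effect estimate in FixedEst plus that subject's deviation.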
I am examining the effect of passing vs running plays on injuries across a few football seasons. The way the data was collected, all injuries were recorded, as well as information about the play in which the injury occurred (i.e. position, quarter, play type), game info (i.e. weather conditions, playing surface, etc.), and team info (i.e. number of pass vs run plays in the game).
I would like to use one play as the primary exposure with the outcome as injury vs no injury with analysis using logistic regression, but to do so I would need to create all the records with no injury. There is a range from 0 to around 6-7 injuries in a game for a team, and the total passing and running plays are recorded so I would need to find a way to add X (total passing plays minus injuries on passing plays) and Y (total running plays - injuries on running plays) records that share all the details for that particular game but have no injury as the outcome. I imagine there is a way in proc sql to do this, but I could not find it online. How would I go about coding this?
I have attached an example of the relevant data. An example of what I would need to do is for game 1 add 30 records for passing plays and 38 records for running plays with outcome of no injury and otherwise the same data (team A, dry weather, game plays).
You can use the FREQ statement to avoid having to de-aggregate the data.
The FREQ statement identifies a variable that contains the frequency of occurrence of each observation. PROC LOGISTIC treats each observation as if it appears n times, where n is the value of the FREQ variable for the observation. If it is not an integer, the frequency value is truncated to an integer.
SAS Documentation
De-aggregating the data would require a DATA step with a DO loop, which is not recommended.
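A minimal sketch of the FREQ approach, assuming the data are first arranged as one row per game, play type, and outcome with a count column. The data set name, variable names, and injury counts are hypothetical; only the 30 and 38 non-injury plays for game 1 come from the question.

```sas
/* Hypothetical aggregated layout: one row per game x play type x outcome */
data plays_agg;
    input game playtype $ injury count;
    datalines;
1 pass 1 2
1 pass 0 30
1 run 1 1
1 run 0 38
;
run;

proc logistic data=plays_agg;
    class playtype (ref='run') / param=ref;
    model injury(event='1') = playtype;
    /* Each row is treated as if it appeared COUNT times */
    freq count;
run;
```

Game-level covariates such as weather or playing surface can be carried on each row and added to the MODEL statement in the same way.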
How do you take an average of the coefficients across all months?
Please refer to this earlier question:
How do I perform regression by month on the same SAS data set?
The comments in the linked question provide the code to get the estimates into a data set. You would then run PROC MEANS on the saved data set to get the averages. You could also run the model without month as a BY variable, which gives a single pooled set of estimates rather than monthly ones. In general, it isn't common to average parameter estimates this way, except in a bootstrapping process.
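The two steps described above might look roughly like this. The data set HAVE and the variables MONTH, Y, X1, and X2 are hypothetical placeholders, since the original model is not shown.

```sas
/* BY-group processing needs the data sorted by month */
proc sort data=have;
    by month;
run;

/* One regression per month; OUTEST= saves the coefficients */
proc reg data=have outest=est noprint;
    by month;
    model y = x1 x2;
run;
quit;

/* Average each coefficient across the monthly fits */
proc means data=est mean;
    var Intercept x1 x2;
run;
```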
I just started learning SAS and would like some help understanding the following chunk of code. The following program computes the annual payroll by department.
proc sort data = company.usa out=work.temp;
by dept;
run;
data company.budget(keep=dept payroll);
set work.temp;
by dept;
if wagecat ='S' then yearly = wagerate *12;
else if wagecat = 'H' then yearly = wagerate *2000;
if first.dept then payroll=0;
payroll+yearly;
if last.dept;
run;
Questions:
What does out = work.temp do in the first line of this code?
I understand the data step creates 2 temporary variables for each BY variable (first.variable/last.variable) and that their values are either 1 or 0, but what exactly do first.dept and last.dept do here in the code?
Why do we need payroll=0 after first.dept in the second to the last line?
This code takes the data for salaries and calculates the payroll amount for each department for a year, assuming salary is the same for all 12 months and that an hourly worker works 2000 hours.
It creates a copy of the data set which is sorted and stored in the work library. RTM.
From the docs
OUT= SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC SORT creates it.
CAUTION:
Use care when you use PROC SORT without OUT=.
Without the OUT= option, PROC SORT replaces the original data set with the sorted observations when the procedure executes without errors.
Default: Without OUT=, PROC SORT overwrites the original data set.
Tips: With in-database sorts, the output data set cannot refer to the input table on the DBMS.
You can use data set options with OUT=.
See: SAS Data Set Options: Reference
Example: Sorting by the Values of Multiple Variables
First.DEPT is an indicator variable that flags the first observation of a specific BY group, so the first record for each department is identified when it is encountered. Last.DEPT flags the last record for that specific department, which means the next record would be the first record for a different department.
It sets PAYROLL to 0 at the first record of each department, so the running total starts over for every department. The sum statement payroll+yearly; then accumulates the total, because a variable in a sum statement keeps its value from one record to the next. Since you have if last.dept; only the last record for each department is output. This code is not intuitive: it's a manual way to sum the wages for people in each department. The common way would be to use a summary procedure, such as MEANS/SUMMARY, but I assume they were trying to avoid having two passes of the data. Though if you're not sorting it may be just as fast anyway.
Again, RTM here. The SAS documentation is quite thorough on these beginner topics.
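To see the flags for yourself, a small sketch like the one below copies first.dept and last.dept into ordinary variables, since the automatic ones are not written to the output data set. The DEMO data and the names IS_FIRST and IS_LAST are made up for illustration.

```sas
/* Tiny made-up input, already sorted by dept */
data demo;
    input dept $ yearly;
    datalines;
A 100
A 200
B 50
;
run;

data trace;
    set demo;
    by dept;
    is_first = first.dept;  /* 1 on the first record of each dept */
    is_last  = last.dept;   /* 1 on the last record of each dept  */
    if first.dept then payroll = 0;
    payroll + yearly;       /* running total within the dept      */
run;

proc print data=trace;
run;
```

Because there is no subsetting if last.dept; here, every record is kept and you can watch PAYROLL build up within a department and reset when the next one starts.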
Here's an alternative method that should generate the exact same results but is more intuitive IMO.
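data temp;
    set company.usa;
    /* factor converts wagerate to a yearly amount */
    if wagecat='S' then factor=12;        /* salaried: monthly rate x 12 months */
    else if wagecat='H' then factor=2000; /* hourly: hourly rate x 2000 hours */
run;
/* With WEIGHT, the SUM statistic is the weighted sum of factor*wagerate */
proc means data=temp noprint nway;
    class dept;
    var wagerate;
    weight factor;
    output out=company.budget(keep=dept payroll) sum(wagerate)=payroll;
run;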
I'm new to SAS EG, I usually use BASE SAS when I actually need the program, but my company is moving heavily toward EG. I'm helping some areas with some code to get data they need on an ad-hoc basis (the code won't change though).
However, during processing, we create many temporary files that are just iterations across months. For example, if the user wants data from 2002 - 2016, we have to pull all those libraries and then concatenate them with our results. This is due to high transactional volume; the final dataset is limited to a small number of observations. Whenever I run this program, though, SAS shows the output of all 183 data steps created in the macro, which makes it very ugly, and sometimes the "Output Data" that appears isn't even output from the last data step but from an intermediary step, making it annoying to search through for the 'final output dataset'.
Is there a way to limit the datasets written to "Output Data" so that it only shows the final dataset - so that our end user doesn't need to worry about being confused?
Above is an example - There's a ton of output data sets that I don't care to see. I just want the final, which is located (somewhere) in that list...
Version is SAS E.G. 7.1
EG will always automatically show every dataset that was created after the program ends. If you don't want it to show any intermediate tables, delete them at the very last step in your process.
In your case, it looks as if your temporary tables all share the name TRN. You can clean it up as such:
/* Start of process flow */
<program statements>;
/* End of process flow*/
proc datasets lib=work nolist nowarn nodetails;
delete TRN:;
quit;
Be careful if you do this. Make sure that all of your temporary tables follow the same prefix naming scheme, otherwise you may accidentally delete tables that you need.
Another solution is to limit the number of datasets generated, and have a user-created link to the final dataset. There's an article about it here.
The alternate solution here is to add the output dataset explicitly as an entry on your process flow, and disregard the OUTPUT window unless you need to investigate something from the intermediary datasets.
This has the advantage that it lets you look at the intermediary datasets if something goes wrong, but also lets you not have to look through all of them to see the final dataset.
You should be able to add the final output dataset to the process flow easily once it has been created; after that one time, it will be there for you to select and look at.
I have four variables Name, Date, MarketCap and Return. Name is the company name. Date is the time stamp. MarketCap shows the size of the company. Return is its return at day Date.
I want to create an additional variable MarketReturn, which is the value-weighted return of the market at each point in time. For each day t, MarketCap weighted return = sum[ return(i) * (MarketCap(i) / Total(MarketCap)) ] (return(i) is company i's return at day t).
The way I do this is very inefficient. I guess there must be some function that can easily achieve this target in SAS, so I want to ask if anyone can improve my code please.
step1: sort data by date
step2: calculate total market value at each day TotalMV = sum(MarketCap).
step3: calculate the weight for each company (weight = MarketCap/TotalMV)
step4: create a new variable 'Contribution' = Return * weight for each company
step5: sum up Contribution at each day. Sum(Contribution)
Weighted averages are supported in a number of SAS PROCs. One of the more common, all-around useful ones is PROC SUMMARY:
PROC SUMMARY NWAY DATA = my_data_set ;
CLASS Date ;
VAR Return / WEIGHT = MarketCap ;
OUTPUT
OUT = my_result_set
MEAN (Return) = MarketReturn
;
RUN;
The NWAY piece tells the PROC that the observations should be grouped only by what is stated in the CLASS statement - it shouldn't also provide an ungrouped grand total, etc.
The CLASS Date piece tells the PROC to group the observations by date. You do not need to pre-sort the data when you use CLASS. You do have to pre-sort if you say BY Date instead. The only rationale for using BY is if your dataset is very large and naturally ordered, you can gain some performance. Stick to CLASS in most cases.
VAR Return / WEIGHT = MarketCap tells the proc that any weighted calculations on Return should use MarketCap as the weight.
Lastly, the OUTPUT statement specifies the data set to write the results to (using the OUT option), and specifies the calculation of a mean on Return that will be written as MarketReturn.
There are many, many more things you can do with PROC SUMMARY. The documentation for PROC SUMMARY is sparse, but only because it is the nearly identical sibling of PROC MEANS, and SAS did not want to produce reams of mostly identical documentation for both. Here is the link to the SAS 9.4 PROC MEANS documentation. The main difference between the two PROCs is that SUMMARY by default only outputs to a dataset, while MEANS by default outputs to the screen. Try PROC MEANS if you want to see the result pop up on the screen right away.
The MEAN keyword in the OUTPUT statement comes from SAS's list of statistical keywords, a helpful reference for which is here.
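For comparison, the five manual steps from the question can also be collapsed into a single PROC SQL query. This is only a sketch: it reuses the placeholder data set name my_data_set from above, writes to a hypothetical my_result_sql, and assumes Return and MarketCap have no missing values and that MarketCap is positive.

```sas
proc sql;
    create table my_result_sql as
    select Date,
           /* sum(Return * weight), with weight = MarketCap / total MarketCap */
           sum(Return * MarketCap) / sum(MarketCap) as MarketReturn
    from my_data_set
    group by Date;
quit;
```

If Return can be missing, the two approaches may differ, because this query still counts that company's MarketCap in the denominator.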