I have the following dataset:
Date Primary_Occupation Jobs
1/1/2005 Math 23
1/1/2005 Science 7
1/1/2005 Food 10
1/1/2006 Math 10
1/1/2006 Sales 64
1/1/2006 Transportation 21
and so on, through 11/1/2015.
I am trying to tabulate the percentage of jobs by Primary_Occupation over time.
I saw that proc univariate has a number of percentile options, but none of them seems to do what I am looking for.
Here's a template for you to get started. It creates a table with frequencies and percentages. In this example, the output table "summary" contains summary stats for this class of students by sex and age.
proc freq data=sashelp.class;
table sex*age / out=summary;
run;
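Applied to the data in the question, the same idea might look like the sketch below. It assumes the rows sit in a data set called jobs (a made-up name) with variables date, primary_occupation and jobs. The WEIGHT statement makes PROC FREQ sum the jobs column instead of counting rows, and the BY statement makes PERCENT relative to each date.

```sas
/* Sample rows from the question; the data set name JOBS is an assumption */
data jobs;
  input date :mmddyy10. primary_occupation :$20. jobs;
  format date mmddyy10.;
  datalines;
1/1/2005 Math 23
1/1/2005 Science 7
1/1/2005 Food 10
1/1/2006 Math 10
1/1/2006 Sales 64
1/1/2006 Transportation 21
;
run;

/* BY-group processing needs the data sorted by date */
proc sort data=jobs;
  by date;
run;

/* COUNT in the output is the weighted sum of jobs,
   PERCENT is each occupation's share within its date */
proc freq data=jobs noprint;
  by date;
  weight jobs;
  tables primary_occupation / out=pct_by_date;
run;
```

For 1/1/2005 the three occupations total 40 jobs, so Math should come out at 23/40 = 57.5 percent.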
When I use proc logistic in SAS, the output returns the confidence interval and p-value of the odds ratio. How can I output the standard error of the odds ratio?
proc logistic data=edu;
model school = age sex income/ clodds=wald orpvalue;
oddsratio age;
run;
The output looks like this:
Odds Ratio Estimates and Wald Confidence Intervals
Effect   Point Estimate   95% Wald Confidence Limits   p-value
age      1.21             0.74   2.001                 < 0.01
Tip: The documentation page Proc Logistic Details -> ODS Table Names lists all the tables the procedure will produce for ODS. The ODDSRATIO ... /CL=WALD ...; statement creates an output table named OddsRatiosWald.
The ODS TRACE ON statement will also log the table names that a Proc Step produces for ODS output.
Save the table as an output data set using the ODS OUTPUT statement.
Example:
Code from SAS samples tweaked to save ODS OUTPUT.
* Example 76.4 Nominal Response Data: Generalized Logits Model;
data school;
length Program $ 9;
input School Program $ Style $ Count @@;
datalines;
1 regular self 10 1 regular team 17 1 regular class 26
1 afternoon self 5 1 afternoon team 12 1 afternoon class 50
2 regular self 21 2 regular team 17 2 regular class 26
2 afternoon self 16 2 afternoon team 12 2 afternoon class 36
3 regular self 15 3 regular team 15 3 regular class 16
3 afternoon self 12 3 afternoon team 12 3 afternoon class 20
;
ods trace on;
ods graphics on;
ods html file='logistic.html';
proc logistic data=school;
freq Count;
class School Program(ref=first);
model Style(order=data)=School Program School*Program / link=glogit;
oddsratio program / cl=wald;
ods output OddsRatiosWald=or_program;
run;
proc print data=or_program;
title "Logistic Odds Ratios CL=Wald output data";
run;
ods html close;
ods trace off;
title;
Output data as examined by viewtable in Base SAS
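The saved table holds the estimate and its Wald limits, not a standard error. Because a Wald interval is symmetric on the log scale, the standard error of the log odds ratio can be backed out from the limits. The sketch below assumes the default 95% limits and that or_program contains columns named OddsRatioEst, LowerCL and UpperCL; check the actual names with proc contents before relying on it. The result is the standard error of log(OR), not of the odds ratio itself.

```sas
/* Assumes OR_PROGRAM came from the ODS OUTPUT statement above and has
   the columns OddsRatioEst, LowerCL and UpperCL (verify with PROC CONTENTS) */
data or_program_se;
  set or_program;
  z         = quantile('NORMAL', 0.975);   /* about 1.96 for 95% limits */
  log_or    = log(OddsRatioEst);
  /* Wald limits are exp(log_or -/+ z*se), so their log-width is 2*z*se */
  se_log_or = (log(UpperCL) - log(LowerCL)) / (2 * z);
run;

proc print data=or_program_se;
run;
```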
I have the DAILY returns of industry portfolios in SAS.
I would like to calculate the WEEKLY returns.
The daily returns are in percentages, so I think the weekly return should just be the sum of the daily returns during each week.
An obvious problem I am facing is that weeks can contain different numbers of days.
The table I have in SAS is in the following format:
INDUSTRY_NUMBER DATE DAILY_RETURN
Any help would be greatly appreciated.
I have tried this:
proc expand data=Day_result
out=Week_result from=day to=week;
Industry_Number Trading_Date;
convert Value_weighted_return / method=aggregate observed=total;
run;
The daily data is in Day_Result. When I remove the fourth line, i.e.
proc expand data=Day_result
out=Week_result from=day to=week;
convert Value_weighted_return / method=aggregate observed=total;
run;
This works, in that it does what I want, but it does so for the whole table rather than for each category.
So if I have 40 categories I want the weekly returns for each category.
The second set of code provides the weekly return for every category.
EXAMPLE DATA:
data have;
format trading_date date9.;
infile datalines dlm=',';
input trading_date:ddmmyy10. industry_number value_weighted_return;
datalines;
19/01/2000,1, -0.008
20/01/2000,1, 0.008
23/01/2000,1, 0.008
24/01/2000,1, -0.007
25/01/2000,1, -0.009
26/01/2000,1, 0.008
27/01/2000,1, -0.008
30/01/2000,1, 0.003
31/01/2000,1, -0.001
01/02/2000,1, 0.004
02/02/2000,2, -0.008
03/02/2000,2, -0.005
06/02/2000,2, -0.004
07/02/2000,2, -0.009
08/02/2000,2, 0.002
09/02/2000,2, 0.006
10/02/2000,2, 0.008
13/02/2000,2, 0.008
14/02/2000,2, 0.002
15/02/2000,2, 0.01
16/02/2000,2, -0.008
;
run;
Sort your data by INDUSTRY_NUMBER Trading_Date, use INDUSTRY_NUMBER as a by-group, identify your time variable.
proc sort data=have;
by industry_number trading_date;
run;
Next, convert your data into a time-series to remove any time gaps. Set any missing days as the previous value since it does not change on those trading days (e.g. weekends, bank holidays, etc.).
proc timeseries data=have
out=have_ts;
by industry_number;
id trading_date interval=day
setmissing=previous
accumulate=average
;
var value_weighted_return;
run;
Finally, take the time-series output and convert it from day to week. Since you are using weights, you may want to use average rather than total.
proc expand data=have_ts
out=have_ts_week
from=day
to=week
;
by industry_number;
id trading_date;
convert Value_weighted_return / method=aggregate observed=average;
run;
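If SAS/ETS is not available, or a plain sum per calendar week is all that is needed, a PROC SQL query is one alternative. This is a different technique from the PROC TIMESERIES and PROC EXPAND steps above: it does not fill gaps, it simply groups the days that exist. It assumes the have data set from the example, and the output names week_sum, week_start, weekly_return and n_days are made up for illustration. INTNX with a 0 increment returns the Sunday that starts each week.

```sas
proc sql;
  create table week_sum as
  select industry_number,
         /* first day (Sunday) of the week containing trading_date */
         intnx('week', trading_date, 0) as week_start format=date9.,
         sum(value_weighted_return)     as weekly_return,
         /* number of trading days that went into the sum */
         count(value_weighted_return)   as n_days
  from have
  group by industry_number, calculated week_start
  order by industry_number, week_start;
quit;
```

Keeping n_days in the output makes short weeks easy to spot, which addresses the concern about weeks having different numbers of days.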
I am trying to seek some validation, this may be trivial for most but I am by no means an expert at statistics. I am trying to select patients in the top 1% based on a score within each drug and location. The data would look something like this (on a much larger scale):
Patient drug place score
John a TX 12
Steven a TX 10
Jim B TX 9
Sara B TX 4
Tony B TX 2
Megan a OK 20
Tom a OK 10
Phil B OK 9
Karen B OK 2
The code snippet I have written to calculate those top 1% patients is as follows:
proc sql;
create table example as
select *,
score/avg(score) as test_measure
from prior_table
group by drug, place
having test_measure>.99;
quit;
Does this achieve what I am trying to do, or am I going about it all wrong? Sorry if this is really trivial to most.
Thanks
There are multiple ways to calculate and estimate a percentile. A simple way is to use PROC SUMMARY:
proc summary data=have;
var score;
output out=pct p99=p99;
run;
This will create a data set named pct with a variable p99 containing the 99th percentile.
Then filter your table for values >=p99
proc sql noprint;
create table want as
select a.*
from have as a
where a.score >= (select p99 from pct);
quit;
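The steps above compute a single 99th percentile over the whole table. Since the question asks for the top 1% within each drug and location, a sketch of the grouped version follows. It assumes the input is named have and has the variables drug, place and score from the question; pct_by_group and want_by_group are made-up names. Note that score/avg(score) in the original query is a ratio to the group mean, not a percentile, so it does not select the top 1%.

```sas
/* One 99th percentile per drug/place combination */
proc summary data=have nway;
  class drug place;
  var score;
  output out=pct_by_group (drop=_type_ _freq_) p99=p99;
run;

/* Keep patients at or above the cutoff for their own group */
proc sql;
  create table want_by_group as
  select a.*
  from have as a
       inner join pct_by_group as b
         on a.drug = b.drug and a.place = b.place
  where a.score >= b.p99;
quit;
```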
I have the following dataset:
Date Occupation Total_Employed
1/1/2005 Teacher 45
1/1/2005 Economist 76
1/1/2005 Artist 14
2/1/2005 Doctor 26
2/1/2005 Economist 14
2/1/2005 Mathematician 10
and so on until November 2014
What I am trying to do is to calculate a column of percentage of employed by occupation such that my data will look like this:
Date Occupation Total_Employed Percent_Emp_by_Occupation
1/1/2005 Teacher 45 33.33
1/1/2005 Economist 76 56.29
1/1/2005 Artist 14 10.37
2/1/2005 Doctor 26 52.00
2/1/2005 Economist 14 28.00
2/1/2005 Mathematician 10 20.00
where percent_emp_by_occupation is calculated by dividing each occupation's total_employed by the sum of total_employed over all occupations for that date (month and year), then multiplying by 100:
Example for Teacher: (45/135)*100, where 135 is the sum of 45+76+14
I know I can get a table via proc tabulate, but I was wondering if there is any way of getting it through another procedure, especially since I want this as a separate dataset.
What is the best way to go about doing this? Thanks in advance.
Extract month and year from the date and create a key:
data ds;
set ds;
month=month(date);
year=year(date);
key=catx("_",month,year);
run;
Roll up the total at month level:
Proc sql;
create table month_total as
select key,sum(total_employed) as monthly_total
from ds
group by key;
quit;
Update the original data with the monthly total:
Proc sql;
create table ds as
select a.*,b.monthly_total
from ds as a left join month_total as b
on a.key=b.key;
quit;
This would lead to the following data set:
Date Occupation Total_Employed monthly_total
1/1/2005 Teacher 45 135
1/1/2005 Economist 76 135
1/1/2005 Artist 14 135
Finally calculate the percentage as:
data ds;
set ds;
percentage=100*total_employed/monthly_total; /* e.g. 100*45/135 = 33.33 */
run;
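The separate steps above can also be collapsed into one query, because PROC SQL remerges a summary statistic back onto the detail rows when non-grouped columns are selected (the log prints a note about remerging). This sketch assumes ds holds date, occupation and total_employed, and that every row of a month carries the same date value, as in the sample (first of the month). The output name ds_pct is made up.

```sas
proc sql;
  create table ds_pct as
  select date,
         occupation,
         total_employed,
         /* sum() is taken within each date and remerged onto every row */
         100 * total_employed / sum(total_employed)
           as percent_emp_by_occupation format=8.2
  from ds
  group by date;
quit;
```

If dates within a month can differ, group on a month-level value such as intnx('month', date, 0) instead of the raw date.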
Here you go:
proc sql;
create table occ2 as
select
a.*,
total_employed/employed_by_date as percentage_employed_by_date format=percent7.1
from
occ a
join
(select
date,
sum(total_employed) as employed_by_date
from occ
group by date) b
on
a.date = b.date
;
quit;
Produces a table like so:
One last thought: you can create all of the totals you desire for this calculation in one pass of the data. I looked at a prior question you asked about this data and assumed that you used proc means to summarize your initial data by date and occupation. You can calculate the totals by date as well in the same procedure. I don't have your data, so I'll illustrate the concept with sashelp.class data set that comes with every SAS installation.
In this example, I want to get the total number of students by sex and age, but I also want to get the total students by sex because I will calculate the percentage of students by sex later. Here's how to summarize the data and get counts for 2 different levels of summary.
proc summary data=sashelp.class;
class sex age;
types sex sex*age;
var height;
output out=summary (drop=_freq_) n=count;
run;
The types statement identifies the levels of summary of my class variables. In this case, I want counts of just sex, as well as the counts of sex by age. Here's what the output looks like.
The _TYPE_ variable identifies the level of summary. The total count of sex is _TYPE_=2 while the count of sex by age is _TYPE_=3.
Then a simple SQL query to calculate the percentages within sex.
proc sql;
create table summary2 as
select
a.sex,
a.age,
a.count,
a.count/b.count as percent_of_sex format=percent7.1
from
summary (where=(_type_=3)) a /* sex * age */
join
summary (where=(_type_=2)) b /* sex */
on
a.sex = b.sex
;
quit;
The answer is to look back at the questions you have asked in the last few days about this same data and study those answers. Your answer is there.
While you are reviewing those answers, take time to thank them and give someone a check for helping you out.
I have a dataset of weekly transactional data (quantity, price, week, etc.).
However, in the dataset I have two prices for the same week,
e.g. two observations for week 28 (one at price 5.03 and one at price 5.20).
What I want to do is calculate the quantity-weighted average price and sum the quantity over the two observations, so that I have only one observation for week 28.
This happens frequently, so I would like to be able to do this quickly without manually editing all the prices and quantities.
Oh, and this is in SAS, by the way!
Thanks!
PROC SUMMARY with the WEIGHT statement applied against price will calculate this for you.
proc summary data=have nway;
class week;
var quantity;
var price / weight=quantity;
output out=want (drop=_:) sum(quantity)= mean(price)=;
run;
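To see what the weighted mean is doing, the same result can be written out by hand in PROC SQL as sum(price*quantity)/sum(quantity). The sketch below uses made-up quantities (the question gives only the two prices for week 28), so treat the numbers as illustrative. The names have_demo and want_sql are also made up.

```sas
/* Hypothetical quantities; only the week 28 prices come from the question */
data have_demo;
  input week quantity price;
  datalines;
28 100 5.03
28 300 5.20
29  50 4.90
;
run;

proc sql;
  create table want_sql as
  select week,
         sum(quantity)                         as quantity,
         /* quantity-weighted average price */
         sum(price * quantity) / sum(quantity) as price
  from have_demo
  group by week;
quit;
```

With these quantities, week 28 collapses to one row with quantity 400 and price (100*5.03 + 300*5.20)/400 = 5.1575.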