edit similar values in sas - sas

I have a dataset of weekly transactional data (quantity, price, week, etc.).
However, the dataset sometimes has two prices for the same week,
e.g. two observations for week 28 (one at price 5.03 and one at price 5.20).
What I want to do is calculate the quantity-weighted average price and sum the quantity across the two observations, so that I have only one observation for week 28.
This happens frequently, so I would like to do it quickly without manually editing all the prices and quantities.
Oh, and this is in SAS, btw!
Thanks!

PROC SUMMARY will calculate this for you if you apply the WEIGHT= option to price on the VAR statement.
proc summary data=have nway;
class week;
var quantity;
var price / weight=quantity;
output out=want (drop=_:) sum(quantity)= mean(price)=;
run;
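For comparison, the same result can be sketched in PROC SQL by writing the weighted mean out explicitly as sum(price*quantity)/sum(quantity). This is only a sketch under the same assumptions as above (input data set have with variables week, quantity and price); the output name want_sql is made up for illustration.

```sas
/* Sketch: quantity-weighted average price per week, written out by hand. */
/* want_sql is a hypothetical output name; have, week, quantity and price */
/* are taken from the PROC SUMMARY step above.                            */
proc sql;
  create table want_sql as
  select week,
         sum(quantity)                       as quantity,
         sum(price * quantity) / sum(quantity) as price
  from have
  group by week;
quit;
```

With complete data both approaches should give one row per week; they can differ if price is missing on some rows, because the SQL total still counts the quantity of those rows.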

Related

SAS How to sum a variable in duplicate records

Noob SAS user here.
I have a hospital data set with patientID and a variable that counts the days between admission and discharge.
Those patients who had more than one hospital admission show up with the same patientID and with a record of how many days they were in hospital each time.
I want to sum the total days in hospital per patient, and then only have one patientID record with the sum of all hospital days across all stays. Does anyone know how I would go about this?
You want the sum of days_in_hospital grouped by patientID. GROUP BY already returns one row per patientID, so DISTINCT is not needed. This will get what you want:
proc sql;
create table want as
select
patientID,
sum(days_in_hospital) as sum_of_days
from have
group by patientID;
quit;
Alternatively you can use proc summary.
proc summary data= hospital_data nway;
class patientID;
var days;
output out=summarized_data (drop = _type_ _freq_) sum=;
run;
This creates a new dataset called summarized_data which has the summed days for each patientID. (The nway option removes the overall summary row, and the drop= data set option removes the default _type_ and _freq_ columns you don't need.)

SAS: Change time series frequency (Proc Expand)

I have a stock price dataset with observations in milliseconds (variables: STOCK, DATE, TIME (in ms), PRICE). It is sorted by stock, date, and time.
I now need a dataset at 1-second frequency. The price variable should be the prevailing price at each second.
I tried proc expand:
proc expand data=have out=want to=second;
id stock date time; run;
But it does not work that way.
Any help is appreciated!
M
got it: proc timeseries with id time and interval=second works!
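A minimal sketch of what that PROC TIMESERIES call might look like. The post only mentions id time and interval=second; everything else here is an assumption: that TIME holds milliseconds since midnight (so it is converted to a SAS time value in seconds first), that the last trade within each second should be kept (accumulate=last), and that seconds without a trade should carry the previous price forward (setmissing=previous). The names have_sec and time_s are made up for illustration.

```sas
/* Assumption: TIME is milliseconds since midnight. Convert it to a SAS */
/* time value (seconds since midnight) so that INTERVAL=SECOND applies. */
data have_sec;
  set have;
  time_s = time / 1000;
  format time_s time12.3;
run;

/* One observation per stock, date and second.                       */
/* ACCUMULATE=LAST keeps the last price observed within each second; */
/* SETMISSING=PREVIOUS fills seconds without a trade with the prior  */
/* price. Both options are assumptions, not from the original post.  */
proc timeseries data=have_sec out=want;
  by stock date;
  id time_s interval=second accumulate=last setmissing=previous;
  var price;
run;
```

If TIME is already a SAS time or datetime value, the conversion step can be skipped and TIME used directly on the ID statement.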

Calculating Percentile by Date

I have the following dataset:
Date Primary_Occupation Jobs
1/1/2005 Math 23
1/1/2005 Science 7
1/1/2005 Food 10
1/1/2006 Math 10
1/1/2006 Sales 64
1/1/2006 Transportation 21
All the way until 11/1/2015
I am trying to tabulate the percentage of jobs by Primary_Occupation and over time.
I saw that proc univariate has a bunch of percentile options, but none of them seems to be the solution for what I am looking to do.
Here's a template for you to get started. It creates a table with frequencies and percentages. In this example, the output table "summary" contains summary stats for this class of students by sex and age.
proc freq data=sashelp.class;
table sex*age / out=summary;
run;
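Applied to the data in the question, the same idea might look like the sketch below. It assumes the data set is called have and that each row's Jobs value is a count, so it is used as a WEIGHT; the BY statement makes the percentages add up to 100 within each date. The names have_sorted and pct_by_date are made up for illustration.

```sas
/* BY-group processing needs the data grouped by date. */
proc sort data=have out=have_sorted;
  by date;
run;

/* Jobs is used as the cell count via WEIGHT. The OUT= data set   */
/* contains COUNT and PERCENT, where PERCENT is the share of jobs */
/* for each occupation within that date.                          */
proc freq data=have_sorted noprint;
  by date;
  tables primary_occupation / out=pct_by_date;
  weight jobs;
run;
```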

Setting up a table in SAS via proc tabulate

I have some data about students and their dropout percentage. I have information about which education they started, in which city (some educations are offered in several cities), and the year they started. I also have information about whether there was a quotient the students had to meet to be able to start their education.
The quotient variable can consist of numeric values and character values (see the table).
I want to make a table in SAS where I have the quotient and the dropout % like in the picture below:
So for each education and for each city I have the years out as rows and in the cells I have the quota for that year and the dropout % for the year.
I can not do it in SAS. I have tried:
proc tabulate data= sammensat missing;
var dropout;
class education year city quota ;
Table education* city,year *dropout all/ rts=180;
run;
This gives me part of the output I want. But I want another row showing the quota for each combination of education and city for each year.
Two problems: including the quota, and dealing with the char values.
Including quota is easy, if it's numeric.
proc tabulate data= sammensat missing;
class education year city;
var dropout quota;
Table education*city*(quota dropout),year all/ rts=180;
run;
You might need to add in statistics to those if they're not both the same (and both N); probably *mean for both, not sure exactly what your data looks like.
To deal with the character problem, you need to create a format that has either special values for the quota, if they're just assigned values (this city-year-education combination has no quota by definition), or uses values that show there is no quota (missing, 0, etc.).
proc format;
value quotaf
-1='NO QUOTA'
-9='PASSED AUDITION'
0-high=[3.1]
;
run;
Then use that to format quota, either in the dataset or with a f= option on quota in the proc tabulate.
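A sketch of the second variant, with the format attached inside PROC TABULATE. It assumes quota has already been recoded to a numeric variable using the special values from the format above (-1 and -9), that there is one row per education, city and year so that mean simply returns that row's value, and that 5.1 is a suitable display format for the dropout percentage.

```sas
/* Sketch: quota shown through the quotaf. format, dropout as a plain number. */
/* The mean statistic and the f=5.1 format for dropout are assumptions.       */
proc tabulate data=sammensat missing;
  class education year city;
  var dropout quota;
  table education*city*(quota*mean*f=quotaf. dropout*mean*f=5.1),
        year / rts=180;
run;
```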

Contingency table in SAS

I have data on exam results for 2 years for a number of students. I have a column with the year, the student's name and the mark. Some students don't appear in year 2 because they don't sit any exams in the second year. I want to show whether the performance of students persists or whether there's any pattern in their subsequent performance. I can split the data into two halves of equal size to account for the 'first-half' and 'second-half' marks. I can also split the first half into quintiles according to the exam results using 'proc rank'.
I know the output I want is a table that has the original 5 quintiles on one axis and the 5 subsequent quintiles plus a 'dropped out' category on the other, so a 5 x 6 matrix. There will obviously be around 20% of the total number of students in each quintile in the first exam, and if there's no relationship there should be 16.67% in each of the 6 subsequent categories. But I don't know how to proceed to show whether this is the case or not with this data.
How can I go about doing this in SAS, please? Could someone point me towards a good tutorial that would show how to set this up? I've been searching for terms like 'performance persistence' etc, but to no avail. . .
I've been proceeding like this to set up my dataset. I've added a column with 0 or 1 for the first or second half of the data using the first procedure below. I've also added a column with the quintile rank in terms of marks for all the students. But I think I've gone about this the wrong way. Shouldn't I be dividing the data into quintiles in each half, rather than across the whole two periods?
Proc rank groups=2;
var yearquarter;
ranks ExamRank;
run;
Proc rank groups=5;
var percentageResult;
ranks PerformanceRank;
run;
Thanks in advance.
Why are you dividing the data into quintiles?
I would leave the scores as they are, then make a scatterplot with a loess fit:
proc sgplot data=dataset;
scatter x=year1 y=year2;
loess x=year1 y=year2 / nomarkers;
run;
Here's a fairly basic example of the simple tabulation. I transpose your quintile data and then make a table. Here there is basically no relationship, except that I only allow a 5% DNF so you have more like 19% 19% 19% 19% 19% 5%.
data have;
do i = 1 to 10000;
do year = 1 to 2;
if year=2 and ranuni(7) < 0.05 then call missing(quintile);
else quintile = ceil(5*ranuni(7));
output;
end;
end;
run;
proc transpose data=have prefix=year out=have_t;
by i;
var quintile;
id year;
run;
proc tabulate data=have_t missing;
class year1 year2;
tables year1,year2*rowpctn;
run;
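To put a number on whether the year-2 distribution depends on the year-1 quintile, one option is a chi-square test of independence on the same table. This is a sketch added for illustration, not part of the answer above; it reuses have_t from the transpose step and counts the missing year-2 values (the dropouts) as their own category.

```sas
/* Chi-square test of independence between year-1 and year-2 categories. */
/* MISSING treats the missing year2 values (dropouts) as a sixth level   */
/* and includes them in the statistics; NOCOL and NOPERCENT leave only   */
/* frequencies and row percentages in the printed table.                 */
proc freq data=have_t;
  tables year1*year2 / chisq missing nocol nopercent;
run;
```

With the simulated data above, where year 2 is drawn independently of year 1, the test would not be expected to show a relationship.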
PROC CORRESP might be helpful for the analysis, though it doesn't look like it exactly does what you want.
proc corresp data=have_t outc=want outf=want2 missing;
tables year1,year2;
run;
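On the question of ranking within each half rather than across both periods: PROC RANK accepts a BY statement, so the quintiles can be computed separately per year. A minimal sketch, assuming a long data set (called exams here, a made-up name) with one row per student and year, and the mark stored in percentageResult as in the question:

```sas
/* BY-group processing needs the data grouped by year. */
proc sort data=exams out=exams_sorted;
  by year;
run;

/* Quintiles are computed within each year separately. */
proc rank data=exams_sorted groups=5 out=ranked;
  by year;
  var percentageResult;
  ranks quintile;   /* 0 = lowest fifth ... 4 = highest fifth */
run;
```

The ranked data can then be transposed to one row per student, as in the example above, before tabulating year 1 against year 2.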