I would like to calculate percentages for multiple variables: that is, calculate the sum of each variable (column) and divide each sum by the frequency. I used PROC SUMMARY to get those two statistics and then an array to compute the percentages, but the resulting values do not seem right, even though there are no errors in the log. I saw that PROC SQL can do percentage calculations, but I do not know how to do that for multiple variables. It must not be difficult, but I am just not sure how to list them.
If you can either 1) point out what I did wrong in the PROC SUMMARY approach, or 2) show me how to write the PROC SQL variable list, that would be great. Thank you.
Here is the code I wrote:
1)
proc summary data=m3resp;
var Lma1-Lma69;
output out=sumofLma sum=sumLma1-sumLma69;
run;
data sumLmaout;
set sumoflma;
array pctma[69];
do i=1 to 69;
pctma[i]=input(cat(sumLma,i),5.)/_freq_;
end;
drop i;
run;
2)
PROC SQL;
SELECT Lma1
(Lma1/SUM(Lma1)) AS PCTLma1
FROM m3resp;
QUIT;
PROC MEANS is a great tool for summarizing numeric variables. (To suppress the printed report, add NOPRINT to the PROC MEANS statement.)
data numbers;
input num1 num2;
datalines;
10 5
10 5
10 5
10 5
10 5
10 5
10 5
10 5
10 5
10 5
;
proc means n sum mean median data=work.numbers maxdec=2;
var num1 num2;
output out=work.totals(drop=_freq_ _type_)
sum(num1 num2)=NUM1_TOTAL NUM2_TOTAL
n(num1 num2)=NUM1_COUNT NUM2_COUNT;
run;
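As for the array approach in the question, the likely culprit is cat(sumLma,i): it concatenates the value of a variable named sumLma (which does not exist, so it is missing) with i, rather than building the variable name sumLma1. A minimal sketch of one way around that, assuming the m3resp data set and the Lma1-Lma69 variables from the question, is to declare a second array over the sum variables:

```sas
proc summary data=m3resp;
  var Lma1-Lma69;
  output out=sumofLma sum=sumLma1-sumLma69;
run;

data sumLmaout;
  set sumofLma;
  /* sums[] points at the existing sum variables */
  array sums[69] sumLma1-sumLma69;
  /* pctma[] creates new variables pctma1-pctma69 */
  array pctma[69];
  do i = 1 to 69;
    /* _freq_ is the number of observations read by PROC SUMMARY */
    pctma[i] = sums[i] / _freq_;
  end;
  drop i;
run;
```

If none of the Lma values are missing, requesting MEAN= in the OUTPUT statement should give the same ratios without the extra data step.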
Related
I just performed Fisher's exact test in R and in Excel on a 2x2 table with the contents 1 6 and 7 2. I can't manage to do this in SAS.
data my_table;
input var1 var2 @@;
datalines;
1 6 7 2
;
proc freq data=my_table;
tables var1*var2 / fisher;
run;
The test somehow ignores that the table consists of four cell counts, although when I print the table it looks fine. In the test, the contents of the table are 0, 1, 1, 0. I guess I need to change something when creating the data, but what?
You do NOT have two variables that each have two categories.
Read the data in this way instead.
data have ;
do var1=1,2 ; do var2=1,2;
input count @@;
output;
end; end;
datalines;
1 6 7 2
;
Now VAR1 and VAR2 both have two possible values and COUNT has the number of cases for the particular combination. Use the WEIGHT statement to tell PROC FREQ to use COUNT as the number of cases.
proc freq data=have ;
weight count ;
tables var1*var2 / fisher ;
run;
I have a SAS data set that contains a column of numbers ranging from -2000 to 4000.
I want to select 37 random samples based on the following conditions.
If num between -2000 to -1000, randomly select 10 samples from this range,
if num between -1000 to 0, randomly select 15 samples from this range,
if num between 0 to 1000, randomly select 12 samples from this range,
I've tried the following:
proc surveyselect data=save.table
method=srs n=37 out=save.table_sample seed=1953;
run;
But this would give me 37 random samples from the whole population. I want to select randomly according to the data range.
Please help with SAS code, thanks so much in advance!
Create a grouping variable in your data set that identifies the range each row falls in, so you can use it as a stratum.
data output;
set save.table;
if number < -1000 then group=1;
else if number < 0 then group=2;
else if number < 1000 then group=3;
run;
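Then use PROC SURVEYSELECT with a STRATA statement. The per-stratum sample sizes can either come from a data set that contains the same GROUP variable along with the sample size, or be listed directly in the SAMPSIZE= option. The input must be sorted by GROUP, and rows outside the three ranges (missing GROUP) are excluded.
proc sort data=output;
by group;
run;
proc surveyselect data=output(where=(not missing(group)))
method=srs out=save.table_sample seed=1953 sampsize=(10 15 12);
strata group;
run;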
Couldn't test because no sample data was provided, so here's an example using SASHELP.HEART
proc sort data=sashelp.heart out=heart; by chol_status; run;
proc surveyselect data=heart (where=(not missing(chol_status))) method=srs sampsize=(5 10 15) out=want;
strata chol_status;
run;
If you want to continue to use proc surveyselect, then a simple way to do this is:
data set1 set2 set3;
set save.table;
if number < -1000 then output set1;
else if number < 0 then output set2;
else if number < 1000 then output set3;
run;
Then call proc surveyselect thrice with different n values on these 3 datasets.
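A rough sketch of those three calls, assuming the set1, set2 and set3 data sets created above and the sample sizes from the question (the output names sample1-sample3 and sample_all are placeholders):

```sas
proc surveyselect data=set1 method=srs n=10 seed=1953 out=sample1;
run;

proc surveyselect data=set2 method=srs n=15 seed=1953 out=sample2;
run;

proc surveyselect data=set3 method=srs n=12 seed=1953 out=sample3;
run;

/* Stack the three samples into one data set of 37 rows */
data sample_all;
  set sample1 sample2 sample3;
run;
```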
I have a set of pre and post scores, with values that can be 1 or 2, e.g.:
Pre Post
1 2
1 1
2 2
2 1
1 2
2 1
etc.
I need to create a 2x2 table that lists the frequencies, with percentages ONLY in the total row/column:
          1           2           Total
1         14          60          74 / 30%
2         38          12          50 / 20%
Total     52 / 21%    72 / 29%    248
It doesn't need to be formatted specifically with the / between the n and percent, they can be on different lines. I just need to make sure the total percentages (no cumulative percentages) are in the table.
I think that I should use proc tabulate to get this, but I'm new to SAS and haven't been able to figure it out. Any help would be greatly appreciated.
Code I've tried:
proc tabulate data=.bilirubin order=data;
class pre ;
var post ;
table pre , post*( n colpctsum);
run;
You could make your own report. For example you could use PROC SUMMARY to get frequencies. Add a data step to calculate the percent and generate a character variable with the text you want to display. Then use PROC REPORT to display it.
proc summary data=have ;
class pre post ;
output out=summary ;
run;
proc format ;
value total .='Total (%)';
run;
data results ;
set summary ;
length display $20 ;
if _type_=0 then n=_freq_;
retain n;
if _type_ in (0,3) then display = put(_freq_,comma9.);
else display = catx(' ',put(_freq_,comma9.),cats('(',put(_freq_/n,percent8.2),')'));
run;
proc report missing data=results ;
column pre display,post n ;
define pre / group ;
define post / across ;
define n / noprint ;
define display / display ' ';
format pre post total.;
run;
I created this fakedata as an example:
data fakedata;
length name $5;
infile datalines;
input name count percent;
return;
datalines;
Ania 1 17
Basia 1 3
Ola 1 10
Basia 1 52
Basia 1 2
Basia 1 16
;
run;
The result I want to have is:
---> summed counts and percents for Basia
I would like the counts and percents for Basia to be summed, as if she appeared only once in the table, with count 4 and percent 83. I tried converting name to a number so that I could use GROUP BY in PROC SQL, but it was changed into ORDER BY (I got such an error). I suppose it isn't so difficult, but I can't find the solution. I also tried some arrays without any success. Any help appreciated!
It sounds like proc sql does what you want:
proc sql;
select name, count(*) as cnt, sum(percent) as sum_percent
from fakedata
group by name;
quit;
You can add a where clause to get the results just for one name.
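For instance, restricting the same query to Basia could be sketched as follows, assuming the fakedata set from the question:

```sas
proc sql;
  select name, count(*) as cnt, sum(percent) as sum_percent
  from fakedata
  where name = 'Basia'   /* keep only the rows for one name */
  group by name;
quit;
```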
Hm, actually I got an answer.
proc summary data=fakedata nway;
/* CLASS does not need sorted input (BY would); NWAY keeps only the per-name rows */
class name;
var count percent;
output out=wynik (drop = _FREQ_ _TYPE_) sum(count)=count sum(percent)=percent;
run;
You can most likely go back a step and use PROC FREQ to generate this output in a single step. Based on the counts, the percents are not correct, but I'm not sure they're intended to be; right now they add up to over 100%. If you already have some summaries, then use the WEIGHT statement to account for the counts.
proc freq data=fakedata;
table name;
weight count;
run;
I have data on exam results for 2 years for a number of students. I have columns for the year, the student's name and the mark. Some students don't appear in year 2 because they don't sit any exams in the second year. I want to show whether the performance of students persists or whether there's any pattern in their subsequent performance. I can split the data into two halves of equal size to account for the 'first-half' and 'second-half' marks. I can also split the first half into quintiles according to the exam results using 'proc rank'.
I know the output I want is a table with the original 5 quintiles on one axis and the 5 subsequent quintiles plus a 'dropped out' category on the other, so a 5 x 6 matrix. There will obviously be around 20% of the total number of students in each quintile in the first exam, and if there's no relationship there should be 16.67% in each of the 6 subsequent categories. But I don't know how to proceed to show whether this is the case or not with this data.
How can I go about doing this in SAS, please? Could someone point me towards a good tutorial that would show how to set this up? I've been searching for terms like 'performance persistence' etc, but to no avail. . .
I've been proceeding like this to set up my dataset. I've added a column with 0 or 1 for the first or second half of the data using the first procedure below. I've also added a column with the quintile rank in terms of marks for all the students. But I think I've gone about this the wrong way. Shouldn't I be dividing the data into quintiles in each half, rather than across the whole two periods?
Proc rank groups=2;
var yearquarter;
ranks ExamRank;
run;
Proc rank groups=5;
var percentageResult;
ranks PerformanceRank;
run;
Thanks in advance.
Why are you dividing the data into quintiles?
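I would leave the scores as they are, then make a scatterplot with a loess smoother:
PROC SGPLOT data = dataset;
/* one point per student: year 1 mark against year 2 mark */
scatter x = year1 y = year2;
/* overlay the loess curve without drawing the points a second time */
loess x = year1 y = year2 / nomarkers;
run;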
Here's a fairly basic example of the simple tabulation. I transpose your quintile data and then make a table. Here there is basically no relationship, except that I only allow a 5% DNF so you have more like 19% 19% 19% 19% 19% 5%.
data have;
do i = 1 to 10000;
do year = 1 to 2;
if year=2 and ranuni(7) < 0.05 then call missing(quintile);
else quintile = ceil(5*ranuni(7));
output;
end;
end;
run;
proc transpose data=have prefix=year out=have_t;
by i;
var quintile;
id year;
run;
proc tabulate data=have_t missing;
class year1 year2;
tables year1,year2*rowpctn;
run;
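The example above simulates the quintiles. To feed real marks into the same transpose-and-tabulate steps, one possibility is to rank within each year using a BY statement in PROC RANK. This is only a sketch: it assumes a hypothetical data set named marks with one row per student per year, year coded as 1 and 2, and the variables name and percentageResult from the question.

```sas
/* PROC RANK with BY needs the data sorted by year */
proc sort data=marks out=marks_sorted;
  by year;
run;

/* GROUPS=5 assigns quintiles 0-4 separately within each year */
proc rank data=marks_sorted groups=5 out=ranked;
  by year;
  var percentageResult;
  ranks quintile;
run;

proc sort data=ranked;
  by name year;
run;

/* One row per student with columns year1 and year2;
   students with no year 2 row end up with a missing year2 */
proc transpose data=ranked prefix=year out=ranked_t;
  by name;
  var quintile;
  id year;
run;
```

The resulting ranked_t could then replace have_t in the PROC TABULATE step, with the MISSING option keeping the dropped-out students as their own column.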
PROC CORRESP might be helpful for the analysis, though it doesn't look like it exactly does what you want.
proc corresp data=have_t outc=want outf=want2 missing;
tables year1,year2;
run;