I need help making a table that meets the conditions of a homework problem. I don't need the answer, just help getting to what I need.
The question goes as follows:
Report the means and standard deviations of the newborn weights for these three groups in a single SAS table.
Here is the code I have:
data newset;
set mysubset;
bbywt = 99;
if dbwt < 9999 then bbywt = .0022046*(dbwt);
format meansdata $50.;
meansdata = 'na';
if RF_GDIAB = 'Y' AND RF_PHYPE = 'N' then meansdata = 'Case 1';
if RF_GDIAB = 'Y' AND RF_PHYPE = 'Y' then meansdata ='Case 2';
if RF_GDIAB = 'N' AND RF_PHYPE = 'N'
AND RF_PDIAB = 'N' AND RF_GHYPE = 'N' then meansdata ='Case 3';
run;
PROC univariate DATA = newset;
title 'Assignment 4 Q1 means table';
ods select basicmeasures;
var bbywt;
class meansdata;
where meansdata ~= 'na' and bbywt <99;
run;
Whenever I run it I get three tables, but I want one table with three rows. Should I use PROC CORR, or do I just need to set it up differently? The analysis variable needs to be the converted baby weight, bbywt. Was it right to use meansdata in the CLASS statement?
Any help is appreciated.
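One possible direction, as a minimal sketch that assumes the newset dataset and the variables created in the code above: PROC MEANS with a CLASS statement prints a single table with one row per class level, and the MEAN and STD keywords limit the output to those two statistics. The MAXDEC=2 option is only a display choice.

```sas
proc means data=newset mean std maxdec=2;
  title 'Assignment 4 Q1 means table';
  class meansdata;            /* one row per group: Case 1, Case 2, Case 3 */
  var bbywt;                  /* converted birth weight */
  where meansdata ~= 'na' and bbywt < 99;
run;
```

Using meansdata as the CLASS variable is kept exactly as in the original PROC UNIVARIATE step; only the procedure changes.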
I am comparing the evolution of plasma concentrations over time for different treatments of patients.
We applied each treatment to different subjects, and for each treatment we want a graph with the evolution for each subject in black, as well as the mean in red.
It should look like this
but it does look like this
My data has the following variables:
trtan and trta for treatment number and name
subjid for the patient receiving that treatment
ATPT for timepoint
AVAL for Individual Concentrations
MEAN for average Concentrations
I am using SGPLOT to produce this line plot. The y axis has concentrations and the x axis has time points. I sort the data by treatment, subject and timepoint before passing it to PROC SGPLOT.
The lines for individual subjects are fine. The issue is with the mean line: since the dataset is sorted by subject, I am getting multiple mean lines, one per subject.
My requirement is to have multiple individual lines and one overlaid mean line. Can anyone advise how I can solve this?
I am using the code below. How can I repair it?
proc sort data = pc2;
by trtan trta subjid atptn atpt;
run;
proc sgplot data = pc2 dattrmap = anno pad = (bottom = 20%) NOAUTOLEGEND ;
by trtan trta;
series x = atptn y = aval/ group = trta
lineattrs = (color = black thickness = 1 pattern = solid );
series x = atptn y = mean/ group = trta attrid = trtcolor
lineattrs = (thickness = 2 pattern = solid );
xaxis label= "Actual Time (h)"
labelattrs = (size = 10)
values = (0 12 24 36 48 72 96 120 168)
valueattrs = (size = 10)
grid;
yaxis label= "Plasma Concentration (ng/mL)"
labelattrs = (size = 10)
valueattrs = (size = 10)
grid;
run;
This is not a problem with the mean only.
Leave out the mean, add min=-20 to your yaxis specification, and you will see the same problem.
Alternatively run this code
data pc2;
do subj = 1 to 3;
do time = 1 to 25;
value = 2*sin(time/3) + rand('Normal');
output;
end;
end;
run;
proc sgplot data=pc2;
series x=time y=value;
run;
and you will get
The solution is to have one plot for each subject, so first sort the data by time and transpose it to have one variable subj_1 etc. for each subject.
proc sort data=pc2 out=SORTED;
by time subj;
run;
proc transpose data=SORTED out=TRANS prefix=subj_;
by time;
id subj;
run;
I leave it as an exercise for you to add the mean to this dataset.
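One way to do that exercise, as a sketch: with one column per subject, the mean at each time point is the row-wise mean of the subj_ columns. The variable name mean is an assumption, chosen to match the name the SQL query in this answer looks for.

```sas
data TRANS;
  set TRANS;
  mean = mean(of subj_:);   /* row-wise mean over subj_1, subj_2, ... */
run;
```

The subj_: name prefix list picks up every variable whose name starts with subj_, so the step does not need to know how many subjects there are.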
Then run sgplot with one series statement per subject. To build these statements, we query the metadata for the dataset WORK.TRANS:
proc sql;
select distinct 'series x=time y='|| name ||'/lineattrs = (color=black)'
into :series_statements separated by ';'
from sasHelp.vColumn
where libname eq 'WORK' and memName eq 'TRANS'
and (name like 'subj%' or name = 'mean');
quit;
proc sgplot data=TRANS;
&series_statements;
run;
The result, without the mean, looks like this for my example:
Of course, you will have to do some graphical fine tuning.
We can achieve this simply by taking the mean by ATPT and then, instead of merging the mean records onto the PK data by ATPT, appending them as extra records. You can then run your code and it should give you the result you are expecting. Please let me know if it does not work; it seems to have worked for me.
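A sketch of this append approach, assuming pc2 has the variables described in the question (trtan, trta, subjid, atptn, aval). The dataset names pcmean and pc3 and the variable meanval are made up for illustration. Note that the individual lines are grouped by subjid here rather than by trta, so that each subject gets its own line.

```sas
/* Mean concentration per treatment and timepoint */
proc means data=pc2 noprint nway;
  class trtan trta atptn;
  var aval;
  output out=pcmean(drop=_type_ _freq_) mean=meanval;
run;

/* Append, do not merge: mean rows have AVAL and SUBJID missing, */
/* subject rows have MEANVAL missing                             */
data pc3;
  set pc2 pcmean;
run;

proc sort data=pc3;
  by trtan trta subjid atptn;
run;

proc sgplot data=pc3 noautolegend;
  by trtan trta;
  /* one black line per subject */
  series x=atptn y=aval / group=subjid
         lineattrs=(color=black thickness=1 pattern=solid);
  /* a single red mean line per treatment */
  series x=atptn y=meanval /
         lineattrs=(color=red thickness=2 pattern=solid);
run;
```

The axis statements from the original code can be added back unchanged.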
I have a questionnaire with items coded 1-5, and missing values coded as (.). How do I code the data to reflect the following:
If a patient has >=80% of values not missing, then missing values will be coded as the mean value of the questions answered. If a patient is missing more than 80% of values, then set the summary measure to missing for that patient and drop the record.
data condomuse;
set int108;
run;
proc means data=condomuse n nmiss missing;
var cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
by Intround sid;
run;
Using the following assumptions:
each line/record is a unique person
all variables are numeric
NMISS(), N(), CMISS() and DIM() are functions that can work with arrays.
This will identify all records with 80% or more missing.
data temp; *temp is output data set name;
set have; *have is input data set name;
*create an array to avoid listing all variables later;
array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
*calculate percent missing;
Percent_Missing = NMISS(of vars_check(*)) / Dim(vars_check);
if percent_missing >= 0.8 then exclude = 'Y';
else exclude = 'N';
run;
To replace with mean or a different method, PROC STDIZE can do that.
*temp is input data set name from previous step;
proc stdize data=temp out=temp_mean reponly method=mean;
*keep only records with less than 80% of values missing;
where exclude = 'N';
*list of vars to fill with mean;
VAR cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
run;
The different METHOD= options are listed in the PROC STDIZE documentation, but these are standardization methods, not imputation methods.
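PROC STDIZE with METHOD=MEAN fills each gap with the mean of that variable across patients. If the intent is instead the mean of the questions each patient answered, a DATA step can do it row by row. The following is a sketch under the same assumptions as above (one record per person, all items numeric); the output name imputed and the helper variables n_answered and row_mean are hypothetical, and the 80% answered threshold follows the wording of the question.

```sas
data imputed;
  set have;   /* have is the input data set name, as above */
  array q(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest
             CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure
             CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
  n_answered = n(of q(*));                 /* number of non-missing items */
  if n_answered / dim(q) >= 0.8 then do;
    row_mean = mean(of q(*));              /* mean of this patient's answers */
    do i = 1 to dim(q);
      if missing(q(i)) then q(i) = row_mean;
    end;
    output;                                /* records below 80% are dropped */
  end;
  drop i;
run;
```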
Referring to the code below: after I transpose a dataset (output qc2), I try to create a percentage column (most_recent_wk_percent_change), but every row of that column is 12.5% and two new columns, &week3. and &week2., are created. The expected result is a calculation based on the values in the week2 and week3 columns. I know the problem is probably how the two columns are referenced in the percentage calculation, ( &week3. - &week2.)/&week2.;, but I can't work out the correction. Please advise :)
%let week1 = 7;
%let week2 = 8;
%let week3 = 9;
proc sql;
create table qc as
select t_week, prod_cat, sum(sales) as sales
from master_table
where t_week in (&week1.,&week2.,&week3.)
group by 1,2
order by 2;
quit;
proc transpose data= qc out=qc2;
by prod_cat ;
id t_week;
run;
data qc2;
set qc2;
format most_recent_wk_percent_change PERCENT7.1;
most_recent_wk_percent_change = ( &week3. - &week2.)/&week2.;
run;
qc:
t_week|prod_cat|sales
7|cat|100
8|cat|200
9|cat|300
7|dog|150
8|dog|400
9|dog|300
7|rat|200
8|rat|600
9|rat|300
qc2: (TRANSPOSED TABLE --> note the column names 7, 8, 9, which are expected)
prod_cat|7|8|9
cat|100|200|300
dog|150|400|300
rat|200|600|300
qc2: (I wanted to get the change in %)
prod_cat|7|8|9|most_recent_wk_percent_change|&week2.|&week3.
cat|100|200|300|12.5%|.|.| ==> 12.5% is wrong. should be 50% (300-200)/(200)
dog|150|400|300|12.5%|.|.| ==> 12.5% is wrong. should be -25%
rat|200|600|300|12.5%|.|.| ==> 12.5% is wrong. should be -50%
I have no idea what you are doing or why, but if you have set VALIDVARNAME=any and the actual name of your variable is 7 and you try to use it in SAS code like this:
ratio = 7/8 ;
Then SAS will assume you mean the numeric value 7.
You need to use a name literal instead.
ratio = '7'n / '8'n ;
So you want
most_recent_wk_percent_change = ("&week3"n-"&week2"n)/"&week2"n;
If instead the actual name of the variable is _7 then you need to code this way.
most_recent_wk_percent_change = (_&week3.-_&week2.)/_&week2.;
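An alternative sketch that avoids name literals altogether: give the transposed columns a valid name with the PREFIX= option of PROC TRANSPOSE. The prefix wk_ and the output name qc3 are arbitrary choices for illustration; sales is the summed column from the question's qc table, and the week2 and week3 macro variables are assumed to be set as in the question.

```sas
proc transpose data=qc out=qc2(drop=_name_) prefix=wk_;
  by prod_cat;
  id t_week;
  var sales;
run;

data qc3;
  set qc2;
  format most_recent_wk_percent_change percent7.1;
  /* the macro references resolve to the column names wk_9 and wk_8 */
  most_recent_wk_percent_change = (wk_&week3. - wk_&week2.) / wk_&week2.;
run;
```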
Try adding a KEEP= option to your last data step; this keeps only the columns you want in the output.
data qc2 (keep= most_recent_wk_percent_change prod_cat);
set qc2;
format most_recent_wk_percent_change PERCENT7.1;
most_recent_wk_percent_change = ("&week3"n - "&week2"n)/"&week2"n;
run;
I am supposed to create a summary data set containing the mean, median, and standard deviation broken down by gender and group (using the CLASS statement). Using this summary data set, create four other data sets (in one DATA step) as follows:
(1) grand mean
(2) stats broken down by gender
(3) stats broken down by group
(4) stats broken down by gender and group
We were given the hint to use the CHARTYPE option.
I have provided my attempted solution below, but I don't think I did it in the way asked.
DATA CLINICAL;
*Use LENGTH statement to control the order of
variables in the data set;
LENGTH PATIENT VISIT DATE_VISIT 8;
RETAIN DATE_VISIT WEIGHT;
DO PATIENT = 1 TO 25;
IF RANUNI(135) LT .5 THEN GENDER = 'Female';
ELSE GENDER = 'Male';
X = RANUNI(135);
IF X LT .33 THEN GROUP = 'A';
ELSE IF X LT .66 THEN GROUP = 'B';
ELSE GROUP = 'C';
DO VISIT = 1 TO INT(RANUNI(135)*5);
IF VISIT = 1 THEN DO;
DATE_VISIT = INT(RANUNI(135)*100) + 15800;
WEIGHT = INT(RANNOR(135)*10 + 150);
END;
ELSE DO;
DATE_VISIT = DATE_VISIT + VISIT*(10 + INT(RANUNI(135)*50));
WEIGHT = WEIGHT + INT(RANNOR(135)*10);
END;
OUTPUT;
IF RANUNI(135) LT .2 THEN LEAVE;
END;
END;
DROP X;
FORMAT DATE_VISIT DATE9.;
RUN;
PROC MEANS DATA=CLINICAL;
CLASS GENDER GROUP;
OUTPUT OUT=SUMMARY
MEAN=
MEDIAN=
STDDEV= / AUTONAME;
RUN;
No, what they're asking you to do is:
Use the OUTPUT statement in PROC MEANS to create a summary dataset. Choose the appropriate TYPES and CLASS values in PROC MEANS such that all four sets of data are represented on the output.
Using a single data step that has four dataset names on the data statement, selectively output those rows to the correct dataset. You would use the _TYPE_ variable to determine which dataset a row would be output to.
CHARTYPE just means your _TYPE_ variable will look like 1001 instead of 9 (the binary representation, basically). 1001 indicates which class variables are used (the first and the fourth) to create that breakout. With only two class variables, the possible values are 00, 01, 10, and 11. This is sometimes easier for non-programmers who aren't used to thinking in binary: without CHARTYPE these values would be 0, 1, 2, and 3 in decimal, which might make it harder to tell which value corresponds to which variable.
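The two steps described above can be sketched as follows. This assumes WEIGHT is the analysis variable (the assignment text does not say which variable to summarize), and the four output dataset names are hypothetical. With CLASS GENDER GROUP, the left character of _TYPE_ refers to GENDER and the right character to GROUP.

```sas
proc means data=clinical noprint chartype;
  class gender group;
  var weight;                       /* assumption: summarize WEIGHT only */
  output out=summary mean= median= stddev= / autoname;
run;

data grand by_gender by_group by_gender_group;
  set summary;
  select (_type_);
    when ('00') output grand;            /* no class variable: grand mean */
    when ('10') output by_gender;        /* GENDER only */
    when ('01') output by_group;         /* GROUP only */
    when ('11') output by_gender_group;  /* GENDER and GROUP */
    otherwise;
  end;
run;
```

Because no TYPES or NWAY restriction is given, PROC MEANS writes all four breakdowns to SUMMARY, and the DATA step only routes the rows.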
I have table1, which contains one column (city), and a second table (table2) that has two columns (city, distance).
I am trying to create a third table, table3, with two columns (city, distance). The city in table3 will come from the city column in table1, and the distance will be the corresponding distance in table2.
I tried doing this using Proc IML based on Joe's suggestion and this is what I have.
proc iml;
use Table1;
read all var _CHAR_ into Var2 ;
use Table2;
read all var _NUM_ into Var4;
read all var _CHAR_ into Var5;
do i=1 to nrow(Var2);
do j=1 to nrow(Var5);
if Var2[i,1] = Var5[j,1] then
x[i] = Var4[i];
end;
create Table3 from x;
append from x;
close Table3 ;
quit;
I am getting an error: matrix x has not been set to a value. Can somebody please help me here? Thanks in advance.
The technique you want to use is called the "unique-loc technique". It enables you to loop over unique values of a categorical variable (in this case, unique cities) and do something for each value (in this case, copy the distance into another array).
So that others can reproduce the idea, I've embedded the data directly into the program:
T1_City = {"Gould","Boise City","Felt","Gould","Gould"};
T2_City = {"Gould","Boise City","Felt"};
T2_Dist = {10, 15, 12};
T1_Dist = j(nrow(T1_City),1,.); /* allocate vector for results */
do i = 1 to nrow(T2_City);
idx = loc(T1_City = T2_City[i]);
if ncol(idx)>0 then
T1_Dist[idx] = T2_Dist[i];
end;
print T1_City T1_Dist;
The IF-THEN statement guards against cities in Table2 that are not in Table1: for such a city, LOC returns an empty matrix, which cannot be used as a subscript. The IF-THEN statement is not needed if Table2 contains exactly the unique cities of Table1.
This technique is discussed and used extensively in my book Statistical Programming with SAS/IML Software.
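To connect this back to the original tables, here is a sketch of the same loop reading from Table1 and Table2 and writing Table3. It assumes the column names city and distance from the question; the matrix names are the ones used in the example above. Allocating T1_Dist with J() before the loop is also what avoids the "matrix has not been set to a value" error from the question.

```sas
proc iml;
use Table1;
read all var {city} into T1_City;
close Table1;

use Table2;
read all var {city} into T2_City;
read all var {distance} into T2_Dist;
close Table2;

T1_Dist = j(nrow(T1_City), 1, .);        /* allocate vector for results */
do i = 1 to nrow(T2_City);
   idx = loc(T1_City = T2_City[i]);      /* rows of Table1 for this city */
   if ncol(idx) > 0 then
      T1_Dist[idx] = T2_Dist[i];
end;

create Table3 var {T1_City T1_Dist};     /* columns are named T1_City, T1_Dist */
append;
close Table3;
quit;
```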
You need a nested loop, or a function that finds a value in another matrix.
For example:
do i = 1 to nrow(table1);
do j = 1 to nrow(table2);
...
end;
end;