SAS: put in a new table means of samples - sas

I'm a beginner in SAS and I can't get the following to work:
I have a table (let's call it table1) that contains 100 samples associated with two variables X and Y:
Number of sample X Y
1 8 7
1 3 4
1 11 11
2 14 2
2 14 2
2 17 -2
... ... ..
I'd like to create a new table (table2) that contains for each sample the mean of X (I must use proc means).
So the result must be something like this:
table2
Can you help me, please?
Thank you in advance,
Larapa
PS: every sample has the same size (3).

The documentation covers the operation of Proc MEANS in great detail.
For starters, try this example:
data have;
input id x y;
datalines;
1 8 7
1 3 4
1 11 11
2 14 2
2 14 2
2 17 -2
;
proc means nway noprint data=have;
by id;
var x;
output out=want(keep=id mean_x) mean=mean_x;
run;
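If the input is not already sorted by id, a CLASS statement could be used instead of BY, since CLASS does not require sorted input. This is a minimal sketch, assuming the have dataset above; the output name want_class is made up for illustration. For the six rows shown, it should give a mean of about 7.33 for id 1 and 15 for id 2.

```sas
proc means data=have nway noprint;
  class id;   /* grouping variable; no prior sort needed */
  var x;
  /* keep only the group key and the mean; want_class is a hypothetical name */
  output out=want_class(keep=id mean_x) mean=mean_x;
run;
```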

Subset data by group by proportion in SAS

From this data, I need to draw a subset in which each group makes up a certain percentage.
For example,
Obs Group Score
1 A 1
2 A 2
3 B 1
4 B 1
5 C 3
6 C 1
7 C 1
8 A 1
9 A 3
10 A 1
11 A 2
12 B 3
13 C 2
I would need to subset 10 obs.
The sample must include all groups, and a score of 1 takes higher priority.
Each group is given a certain percentage.
Let's say 50% for A, 20% for B and 30% for C.
I tried using proc surveyselect but it failed. The number of alloc values is not the same as the number of strata.
proc surveyselect data=example out=test sampsize=10;
strata group score/alloc=(0.5 0.2 0.3);
run;
I don't know proc surveyselect well, so here is a data step version.
data have;
input Obs Group$ Score;
cards;
1 A 1
2 A 2
3 B 1
4 B 1
5 C 3
6 C 1
7 C 1
8 A 1
9 A 3
10 A 1
11 A 2
12 B 3
13 C 2
;
run;
proc sort;
by Group Score;
run;
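data want;
/* Groups, and the maximum number of rows to keep per group (50%, 20% and 30% of 10). */
array _Dist_[3]$ _temporary_('A','B','C');
array _Upper_[3] _temporary_(5,2,3);
array _Count_[3] _temporary_;
/* have is sorted by Group and Score, so rows with Score=1 are reached first. */
do i = 1 to rec;
set have nobs=rec point=i;
do j = 1 to dim(_Dist_);
/* Count the rows seen so far for this group; output until the quota is reached. */
_Count_[j] + (Group=_Dist_[j]);
if _Count_[j] <= _Upper_[j] and Group = _Dist_[j] then output;
end;
end;
stop;
drop i j;
run;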

Setting cutoff period SAS

I am having a problem with a dataset that looks like the one below. It is an inventory count of different location/weeks:
data have;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 5
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 6
4 x 1 30
4 x 2 0
4 x 3 40
4 x 4 10
;
run;
Here is the issue...once the inventory hits 0 for a specific location/item combination, I want all remaining weeks for that combination to be imputed with 0. My desired output looks like this:
data want;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 0
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 0
4 x 1 30
4 x 2 0
4 x 3 0
4 x 4 0
;
run;
I'm fairly new to SAS and don't know how to do this. Help?!
Thank you!
You can do that in the following steps:
- Use a by statement to indicate the order (the input dataset must be sorted accordingly).
- Use a retain statement to pass the value of a control variable (reset) to the following rows.
- Deactivate the imputation (reset=0) at the first row of every location/item combination.
- Activate the imputation (reset=1) for zero values of inv.
- Set inv to 0 if the imputation is active.
Code:
data want (drop=reset);
set have;
by itm location week;
retain reset;
if first.location then reset=0;
if (inv = 0) then reset=1;
else if (reset = 1) then inv=0;
run;
The value of reset remains constant from row to row until explicitly modified.
The presence of the variable week in the by statement is only to check that the input data is chronologically sorted.
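Because the by statement requires sorted input, a sort step along these lines could be run before the data step. This is a minimal sketch, assuming the dataset is named have as in the question and may be sorted in place:

```sas
/* Sort in place so that "by itm location week" is valid in the data step */
proc sort data=have;
  by itm location week;
run;
```

The sample data in the question is already in this order, so the sort only matters for real data that may arrive unsorted.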
The following will use proc sql to give the wanted result. I have commented inline why I do different steps.
proc sql;
/* First of all, fetch all observations where the inventory is depleted. */
create table work.zero_inv as
select *, min(week) as min_zero_inv_week
from work.have where inv = 0
/* If your example data set had included several zero-inventory weeks, then without the following commented-out code you would get duplicates. I'll leave explaining why as an exercise for you. Just alter your have data set and see the difference. */
/*group by itm, location
having (calculated min_zero_inv_week) = week*/;
create table work.want_calculated as
/* Since we have fetched all weeks with zero inventory, we can use a left join and only update the weeks that follow those with zeros, leaving the inventory untouched for the rest. */
select t1.itm, t1.location, t1.week,
/* Since we use a left join, we can check if the second data sets includes any rows at all for the specific item at the given location. */
case when t2.itm is missing or t1.week <= t2.week then t1.inv else t2.inv end as inv
from work.have as t1
left join work.zero_inv as t2
on t1.itm = t2.itm and t1.location = t2.location
/* proc sql does not promise to keep the order in your dataset, therefore it is good to sort it.*/
order by t1.itm, t1.location, t1.week;
quit;
proc compare base=work.want compare=work.want_calculated;
title 'Hopefully no differences';
run;
Remember that Stack Overflow isn't a "give me the code" forum; it is customary to try some solutions yourself first. I'll cut you some slack since this is your first question. Welcome to SO :D.

SAS PROC GMAP Annotate Regions

I am having difficulty annotating a map I have created using the Gmap procedure (SAS 9.4).
I have a custom shape data set I have created for two regions (XX and YY). XX is actually a disjoint region made up of two shapes.
I am having two issues:
The Proc is trying to draw the Area XX as one contiguous region, even though I've defined it as two separate subpolygons.
The labels are not populating in the centroid of the shapes, even though I've tried using the %centroid macro to build the annotation set. The coordinates look to be correct, but the text is not showing up in the right place.
Here is the code I've put together.
data map;
input Area $ Y X POINTORDER SUB_POLYGON_NUMBER POLYGON_NUMBER;
cards;
XX 1 1 1 1 1
XX 2 1 2 1 1
XX 3 1 3 1 1
XX 3 2 4 1 1
XX 3 3 5 1 1
XX 2 3 6 1 1
XX 1 3 7 1 1
XX 1 2 8 1 1
XX -1 0 1 2 1
XX -2 0 2 2 1
XX -1 -2 3 2 1
YY 7 7 1 1 2
YY 7 8 2 1 2
YY 8 9 3 1 2
;
run;
data sales;
input Area $ Sales;
datalines;
XX 500
YY 200
;
run;
%annomac;
%CENTROID(map,anno,Area,segonly=1);
data anno;
set anno;
text=Area;
function='label';
style="'Albany AMT/bold'";
run;
proc gmap data = sales map=map;
id Area;
choro Sales / nolegend annotate=anno;
run;
quit;
As Joe said, this would definitely be better as two questions. I'll respond to the first part, since Joe has answered the second one.
By opening MAPS.Sweden, I found out that the region identifiers, your POLYGON_NUMBER and SUB_POLYGON_NUMBER, are called ID and SEGMENT. So if you change your column names according to that in the map definition, you'll get the wanted outcome.
data map;
input Area $ Y X POINTORDER SEGMENT ID;
cards;
XX 1 1 1 1 1
XX 2 1 2 1 1
XX 3 1 3 1 1
XX 3 2 4 1 1
XX 3 3 5 1 1
XX 2 3 6 1 1
XX 1 3 7 1 1
XX 1 2 8 1 1
XX -1 0 1 2 1
XX -2 0 2 2 1
XX -1 -2 3 2 1
YY 7 7 1 1 2
YY 7 8 2 1 2
YY 8 9 3 1 2
;
run;
I hadn't worked with gmap before, so it was quite interesting. I tried to read the documentation to find out how the columns should be named to get this to work. I did not find anything, but it should be there somewhere. Please drop a comment if you know where I can read about it.
I'm not sure about the first part of your question, but you probably should split them into two questions - these are two separate issues.
As far as the issue in the question title, the position of the annotate text, you have two problems.
One: your annotate text isn't using the same coordinate system as the map. In SAS/GRAPH, this is controlled with the XSYS, YSYS, etc. variables. 4 is the default, which positions values across the entire image; that's not what you want here. What you want here is 2, which is in the data space only (i.e., actually on the drawn axis).
Two: you also need to make it visible, because by default it won't be drawn "over" a graph element.
data anno;
set anno;
text=Area;
function='label';
style="'Albany AMT/bold'";
color='Red';
when='After';
xsys='2';
ysys='2';
run;
I made it red to make it more visible, but you of course can use black.
Note that I tested this using the single polygon (I deleted the subpolygon=2); I'm not sure what would happen if you had both, but the centering would probably be a bit odd.

Aggregating Using Proc SQL

Suppose I've a dataset in the form:
A B C
1 3 5
1 4 8
1 3 3
2 2 2
2 7 6
2 3 3
3 4 4
3 4 7
3 2 8
Now, I want to take the weighted average of C (weighted by B) within each segment of A. For example, for A = 1 the weighted average is (3*5+4*8+3*3)/(3+4+3), which gives 5.6. The same goes for the other two segments of A. So, finally the table looks like the following:
A B C D
1 3 7 5.6
2 6 6 7
3 5 9 8.2
Thank you.
Just to provide an alternative approach, you can use the WEIGHT option in PROC SUMMARY to achieve the same result. The only thing I'm not clear on from your example final table is where the values of columns B & C come from (I've left these out of my solution below).
proc summary data=test nway;
class a;
var c / weight=b;
output out=agg2 (drop=_:) mean=d;
run;
You can find the solution below. I am curious about your result. For A=2, the weighted average should be (2*2+7*6+3*3)/(2+7+3), about 4.58. Why do you have 7 here?
data test;
input a b c ;
datalines;
1 3 5
1 4 8
1 3 3
2 2 2
2 7 6
2 3 3
3 4 4
3 4 7
3 2 8
;
run;
proc sql;
create table agg as
select a, b, c, sum(b*c)/sum(b) as d from test
group by a;
quit;
proc sort data=agg nodupkey;
by a d;
run;
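If columns B and C are not needed in the result, the query can select only the grouping column and the aggregate, which gives one row per value of A and makes the de-duplicating sort unnecessary. This is a minimal sketch, assuming the test dataset above; the table name agg_simple is made up for illustration. For the sample data it should give 5.6 for A=1, about 4.58 for A=2 and 6 for A=3.

```sas
proc sql;
  /* agg_simple is a hypothetical name; one row per value of a */
  create table agg_simple as
  select a,
         sum(b*c)/sum(b) as d   /* average of c weighted by b */
  from test
  group by a;
quit;
```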

How can we do conditional iteration in a sas dataset

How can we iterate over a SAS dataset?
For example, I have selected the first. observation of a variable,
and I want to find the occurrence of a particular condition and set a value when it is satisfied.
The SAS data step has a built-in loop over observations. You don't have to do anything unless you want to for some reason. For instance, the following generates a random number for each observation:
data one;
set sashelp.class;
rannum = ranuni(0);
run;
If you want to loop over variables, then there are arrays. For example, the following initializes variables, var1 to var10, with random numbers:
data one;
array vars[1:10] var1-var10;
do i = 1 to 10;
vars[i] = ranuni(0);
end;
run;
The first. and last. flags are automatically generated when you set a (sorted) data set with a by statement. An example:
proc sort data=sashelp.class out=class;
by age;
run;
data one;
set class;
by age;
first = first.age;
last = last.age;
run;
/* check */
proc print data=one;
run;
/* on lst
Obs Name Age first last
1 Joyce 11 1 0
2 Thomas 11 0 1
3 James 12 1 0
4 Jane 12 0 0
5 John 12 0 0
6 Louise 12 0 0
7 Robert 12 0 1
8 Alice 13 1 0
...
18 William 15 0 1
19 Philip 16 1 1
*/
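To set a value once a condition has occurred within a by-group, first. can be combined with a retain statement. This is a minimal sketch, assuming the sorted class data set above; the variable seen_tall and the threshold of 60 are made up for illustration:

```sas
data flagged;
  set class;
  by age;
  retain seen_tall;                      /* keep the value from row to row */
  if first.age then seen_tall = 0;       /* reset at the start of each age group */
  if height > 60 then seen_tall = 1;     /* hypothetical condition */
run;
```

Within each age group, seen_tall stays 0 until the first row with height above 60 and is 1 from that row onward.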