I am having difficulty annotating a map I have created using the Gmap procedure (SAS 9.4).
I have a custom shape data set I have created for two regions (XX and YY). XX is actually a disjoint region made up of two shapes.
I am having two issues:
The Proc is trying to draw the Area XX as one contiguous region, even though I've defined it as two separate subpolygons.
The labels are not populating in the centroid of the shapes, even though I've tried using the %centroid macro to build the annotation set. The coordinates look to be correct, but the text is not showing up in the right place.
Here is the code I've put together.
data map;
input Area $ Y X POINTORDER SUB_POLYGON_NUMBER POLYGON_NUMBER;
cards;
XX 1 1 1 1 1
XX 2 1 2 1 1
XX 3 1 3 1 1
XX 3 2 4 1 1
XX 3 3 5 1 1
XX 2 3 6 1 1
XX 1 3 7 1 1
XX 1 2 8 1 1
XX -1 0 1 2 1
XX -2 0 2 2 1
XX -1 -2 3 2 1
YY 7 7 1 1 2
YY 7 8 2 1 2
YY 8 9 3 1 2
;
run;
data sales;
input Area $ Sales;
datalines;
XX 500
YY 200
;
run;
%annomac;
%CENTROID(map,anno,Area,segonly=1);
data anno;
set anno;
text=Area;
function='label';
style="'Albany AMT/bold'";
run;
proc gmap data = sales map=map;
id Area;
choro Sales / nolegend annotate=anno;
run;
quit;
As Joe said, this would defintely be good to have as two questions. I'll respond to the first part, since Joe has answered the second one.
By opening MAPS.Sweden, I found out that the region identifiers, your POLYGON_NUMBER and SUB_POLYGON_NUMBER, are called ID and SEGMENT. So if you change your column names according to that in the map definition, you'll get the wanted outcome.
data map;
input Area $ Y X POINTORDER SEGMENT ID;
cards;
XX 1 1 1 1 1
XX 2 1 2 1 1
XX 3 1 3 1 1
XX 3 2 4 1 1
XX 3 3 5 1 1
XX 2 3 6 1 1
XX 1 3 7 1 1
XX 1 2 8 1 1
XX -1 0 1 2 1
XX -2 0 2 2 1
XX -1 -2 3 2 1
YY 7 7 1 1 2
YY 7 8 2 1 2
YY 8 9 3 1 2
;
run;
I hadn't worked with gmap before, so it was quite interesting. I tried to read the documentation to find out how the columns should be named to get this to work. I did not find anything, but it should be there somewhere. Please drop a comment if you know where I can read about it.
I'm not sure about the first part of your question, but you probably should split them into two questions - these are two separate issues.
As far as the issue in the question title, the position of the annotate text, you have two problems.
One: your annotate text isn't using the same coordinate system. In SAS/GRAPH, this is controlled with the XSYS, YSYS, etc. variables. 4 is default, which is the value across the entire image; that's not what you want here. What you want here is 2, which is in the data space only (ie, actually on the drawn axis).
You also need to make it visible: by default it won't be drawn "over" a graph element.
data anno;
set anno;
text=Area;
function='label';
style="'Albany AMT/bold'";
color='Red';
when='After';
xsys='2';
ysys='2';
run;
I made it red to make it more visible, but you of course can use black.
Note that I tested this using the single polygon (I deleted the subpolygon=2); I'm not sure what would happen if you had both, but the centering would probably be a bit odd.
Related
I'm quite new to SAS,
I have learned about SGplot, Datalines, IML and randgen.
I'd like to simply generate a random data for a simple scatter plot.
/* declaring manually a numeric list */
data my_data;
input x y ##;
datalines;
1 1 0 8 1 6 0 1 0 1 2 5
0 3 1 0 1 0 1 4 2 4 1 0
0 0 0 1 1 2 1 1 0 4 1 0
1 4 1 0 1 3 0 0 0 1 0 1
1 0 1 1 2 3 0 2 1 4 2 6
2 6 1 0 1 1 0 1 2 8 1 3
1 3 0 5 1 0 5 5 0 2 3 3
0 1 1 0 1 0 0 0 0 3
;
run;
proc sgplot data=my_data;
scatter x=x y=y;
run;
Now I would like in a similar manner to generate a vector of random numbers, such as:
proc iml;
N = 94;
rands = j(N,1);
call randgen(rands, 'Uniform'); /* SAS/IML 12.1 */
run;
and afterwards to transfer the vector as datalines and afterwards pass it into the SGplot.
Can somebody please demonstrate how to do it?
Since you want to pass it directly to datalines, use the submit and text substitution options in IML. This passes rands as an Nx1 vector into the datalines statement, allowing you to read it as one big line.
proc iml;
N = 94;
rands = j(N,1);
call randgen(rands, 'Uniform'); /* SAS/IML 12.1 */
submit rands;
data my_data;
input x y ##;
datalines;
&rands
;
run;
endsubmit;
quit;
proc sgplot data=my_data;
scatter x=x y=y;
run;
Note you'll need to double your size of N to get exactly 94, otherwise you will have 47. This is because it is reading each pair on the same line before moving to the next line. e.g.:
1 2 3 4
x = 1 y = 2
x = 3 y = 4
Source: Passing Values into Procedures (Rick Wicklin)
I'm a beginner in SAS and I don't succeed with the following:
I have a table (let's called it table1) that contain 100 samples associated with two variables X and Y:
Number of sample
X
Y
1
8
7
1
3
4
1
11
11
2
14
2
2
14
2
2
17
-2
...
...
..
I'd like to create a new table (table2) that contains for each sample the mean of X (I must use proc means).
So the result must be something like this:
table2
Can you help me, please?
Thank you in advance,
Larapa
ps: every sample have the same size (3).
The documentation covers the operation of Proc MEANS in great detail.
For starters, try this example:
data have;
input id x y;
datalines;
1 8 7
1 3 4
1 11 11
2 14 2
2 14 2
2 17 -2
;
proc means nway noprint data=have;
by id;
var x;
output out=want(keep=id mean_x) mean=mean_x;
run;
I have data which is as follows:
data have;
length
group 8
replicate $ 1
day 8
observation 8
;
input (_all_) (:);
datalines;
1 A 1 0
1 A 1 5
1 A 1 3
1 A 1 3
1 A 2 7
1 A 2 2
1 A 2 4
1 A 2 2
1 B 1 1
1 B 1 3
1 B 1 8
1 B 1 0
1 B 2 3
1 B 2 8
1 B 2 1
1 B 2 3
1 C 1 1
1 C 1 5
1 C 1 2
1 C 1 7
1 C 2 2
1 C 2 1
1 C 2 4
1 C 2 1
2 A 1 7
2 A 1 5
2 A 1 3
2 A 1 1
2 A 2 0
2 A 2 5
2 A 2 3
2 A 2 0
2 B 1 0
2 B 1 3
2 B 1 4
2 B 1 8
2 B 2 1
2 B 2 3
2 B 2 4
2 B 2 0
2 C 1 0
2 C 1 4
2 C 1 3
2 C 1 1
2 C 2 2
2 C 2 3
2 C 2 0
2 C 2 1
3 A 1 4
3 A 1 5
3 A 1 6
3 A 1 7
3 A 2 3
3 A 2 1
3 A 2 5
3 A 2 2
3 B 1 2
3 B 1 0
3 B 1 2
3 B 1 3
3 B 2 0
3 B 2 6
3 B 2 3
3 B 2 7
3 C 1 7
3 C 1 5
3 C 1 3
3 C 1 1
3 C 2 0
3 C 2 3
3 C 2 2
3 C 2 1
;
run;
I want to split observation into two columns based on day.
observation_ observation_
Obs group replicate day_1 day_2
1 1 A 0 7
2 1 A 5 2
3 1 A 3 4
4 1 A 3 2
5 1 B 1 3
6 1 B 3 8
7 1 B 8 1
8 1 B 0 3
9 1 C 1 2
10 1 C 5 1
11 1 C 2 4
12 1 C 7 1
13 2 A 7 0
14 2 A 5 5
15 2 A 3 3
16 2 A 1 0
17 2 B 0 1
18 2 B 3 3
19 2 B 4 4
20 2 B 8 0
21 2 C 0 2
22 2 C 4 3
23 2 C 3 0
24 2 C 1 1
25 3 A 4 3
26 3 A 5 1
27 3 A 6 5
28 3 A 7 2
29 3 B 2 0
30 3 B 0 6
31 3 B 2 3
32 3 B 3 7
33 3 C 7 0
34 3 C 5 3
35 3 C 3 2
36 3 C 1 1
The observant SO reader will notice that I have asked essentially the same question previously. However, because of SAS's obsession with "levels" and "by groups", since the variable being used to split the variable of interest isn't binary, that solution doesn't generalize.
Trying it directly, the following occurs:
proc sort data = have out = sorted;
by
group
replicate
;
run;
proc transpose data = sorted out = test;
by
group
replicate
;
var observation;
id day;
run;
ERROR: The ID value "_1" occurs twice in the same BY group.
I can use a LET statement to repress the errors, but in addition to cluttering up the log, SAS retains only the last observation of each BY group.
proc sort data = have out = sorted;
by
group
replicate
;
run;
proc transpose data = sorted out = test let;
by
group
replicate
;
var observation;
id day;
run;
Obs group replicate _NAME_ _1 _2
1 1 A observation 3 2
2 1 B observation 0 3
3 1 C observation 7 1
4 2 A observation 1 0
5 2 B observation 8 0
6 2 C observation 1 1
7 3 A observation 7 2
8 3 B observation 3 7
9 3 C observation 1 1
I don't doubt there's some kludgy way it could be done, such as splitting each group into a separate data set and then re-merging them. It seems like it should be doable with PROC TRANSPOSE, although how escapes me. Any ideas?
Not sure what you're talking about with "SAS's obsession...", but the issue here is fairly straightforward; you need to tell SAS about the four rows (or whatever) being separate, distinct rows. by tells SAS what the row-level ID is, but you're lying to it when you say by group replicate, since there are still multiple rows under that. So you need to have a unique key. (This would be true in any database-like language, nothing unique to SAS here. )
I would do this - make a day_row field, then sort by that.
data have_id;
set have;
by group replicate day;
if first.day then day_row = 0;
day_row+1;
run;
proc sort data=have_id;
by group replicate day_row;
run;
proc transpose data=have_id out=want(drop=_name_) prefix=observation_day_;
by group replicate day_row;
var observation;
id day;
run;
Your output looks like you don't want to transpose the data but instead just want split it into DAY1 and DAY2 sets and merge them back together. This will just pair the multiple readings per BY group in the same order that they appear, which is what it looks like you did in your example.
data want ;
merge
have(where=(day=1) rename=(observation=day_1))
have(where=(day=2) rename=(observation=day_2))
;
by group replicate;
drop day ;
run;
You can read the source data as many times as you need for the number of values of DAY.
If you think that you might not have the same number of observations per BY group for each DAY then you should add these statements at the end of the data step.
output;
call missing(of day_:);
data have;
input patient level timepoint;
datalines;
1 0 1
1 0 2
1 0 3
1 3 4
1 0 5
1 0 6
2 0 1
2 4 2
2 0 3
2 3 4
2 0 5
2 0 6
2 0 7
2 2 8
2 0 9
2 0 10
3 3 1
3 0 2
3 0 3
4 0 1
4 0 2
4 0 3
4 0 4
4 1 5
4 0 6
4 0 7
4 0 8
4 0 9
4 0 10
;;
proc print; run;
/*
Condition 1: If there is one non-zero numeric value, in level, sorted by timepoint for a patient, set level to 2.5 for the record that is immediately prior to this time point; and set level = 1.5 for the next prior time point; set level to 2.5 for the record that is immediate post this time point; and set level to 1.5 for the next post record. The levels by timepoint should look like, ... 1.5, 2.5, non-zero numeric value, 2.5, 1.5 ... (Note: ... are kept as 0s).
Condition 2: If there are two or more non-zero numeric values, in level, sorted by timepoint for a patient, find the FIRST non-zero numeric value, and set level to 2.5 for the record that is immediate prior this time point; and set level to 1.5 for the next prior time point; then find the LAST non-zero numeric value record, set level to 2.5 for the record that is immediate post this last non-zero numeric value, and set level to 1.5 for the next post record; Set all zero values (i.e. level=0) to level = 2.5 for records between the first and last non-zero numeric values; The levels by timepoint should look like: ... 1.5, 2.5, FIRST Non-zero Numeric value, 2.5, Non-zero Numeric value, 2.5, LAST Non-zero Numeric value, 2.5, 1.5 ....
*/
I've tried data steps using N-1, N-2, N+1, N+2, arrays/do loops (my first thought was to use multiple arrays for this so that I could use the i=index to go to previous i-1/i+1 or i-2/1+2 records, but it was hard to grasp the concept of how to even code it.). All of this has to be done BY Patient, so there may be instances where there is only one record before the first non-zero and not two. The same could be true for post record as well. I searched all different types of examples and help, but none that could help with my needs. Thanks in advance for any help.
This is how I want the data to look like:
data want;
input patient level timepoint;
datalines;
1 0 1
1 1.5 2
1 2.5 3
1 3 4
1 2.5 5
1 1.5 6
2 2.5 1
2 4 2
2 2.5 3
2 3 4
2 2.5 5
2 2.5 6
2 2.5 7
2 2 8
2 2.5 9
2 1.5 10
3 3 1
3 2.5 2
3 1.5 3
4 0 1
4 0 2
4 1.5 3
4 2.5 4
4 1 5
4 2.5 6
4 1.5 7
4 0 8
4 0 9
4 0 10
;;
proc print; run;
I approached this by first finding the timepoints of the first and last non-zero levels. Then I merged those into the original set, and changed levels based on the rules you mentioned.
proc sort data = have;
by patient timepoint;
run;
data have2;
retain first 0 last 0;
set have;
by patient timepoint;
if level ne 0 and first = 0 then first = timepoint;
if level ne 0 then last = timepoint;
if last.patient then do;
output;
first = 0;
last = 0;
end;
keep patient first last;
run;
proc sort data=have2;
by patient;
run;
data merged;
merge have have2;
by patient;
if level = 0 then do;
if first-timepoint = 1 then level = 2.5;
if first-timepoint = 2 then level = 1.5;
if last-timepoint = -1 then level = 2.5;
if last-timepoint = -2 then level = 1.5;
if first < timepoint < last then level = 2.5;
end;
drop first last;
run;
In Stata I want to have a variable calculated by a formula, which includes multiplying by the previous value, within blocks defined by a variable ID. I tried using a lag but that did not work for me.
In the formula below the Y-1 is intended to signify the value above (the lag).
gen Y = 0
replace Y = 1 if count == 1
sort ID
by ID: replace Y = (1+X)*Y-1 if count != 1
X Y count ID
. 1 1 1
2 3 2 1
1 6 3 1
3 24 4 1
2 72 5 1
. 1 1 2
1 2 2 2
7 16 3 2
Your code can be made a little more concise. Here's how:
input X count ID
. 1 1
2 2 1
1 3 1
3 4 1
2 5 1
. 1 2
1 2 2
7 3 2
end
gen Y = count == 1
bysort ID (count) : replace Y = (1 + X) * Y[_n-1] if count > 1
The creation of a dummy (indicator) variable can exploit the fact that true or false expressions are evaluated as 1 or 0.
Sorting before by and the subsequent by command can be condensed into one. Note that I spelled out that within blocks of ID, count should remain sorted.
This is really a comment, not another answer, but it would be less clear if presented as such.
Y-1, the lag in the formula would be translated as seen in the below.
gen Y = 0
replace Y = 1 if count == 1
sort ID
by ID: replace Y = (1+X)*Y[_n-1] if count != 1