I am trying to create a variable z that will take the same value within a group based on values of two variables X and Y of the first observation within a group. There are 4 possible values of Z a group can take based on the X and Y values of the first observation in the group.
Z=1 (if X=1 & Y=1),
Z=2 (if X=2 & Y=1),
Z=3 (if X=1 & Y=2), and
Z=4 (if X=2 & Y=2).
This is what I have and what I want.
X has two values, 1 or 2, within a group, while Y can take 1, 2, or 3.
Y is sorted in ascending order.
If the first observation (or all observations in the group) has Y=3, the resulting Z value should be set to missing.
This is what I have:
Obs Group X Y
1 10600 1 1
2 10600 1 2
3 10600 1 3
4 10800 2 1
5 10800 2 3
6 10900 1 2
7 10900 1 3
8 11100 2 2
9 11100 2 2
10 11100 2 3
11 11100 2 2
12 11200 2 3
13 11300 2 1
14 11300 2 2
15 11300 1 3
16 11300 1 3
17 11300 1 3
18 11300 1 3
And here is what I want:
Obs Group X Y Z
1 10600 1 1 1
2 10600 1 2 1
3 10600 1 3 1
4 10800 2 1 2
5 10800 2 3 2
6 10900 1 2 3
7 10900 1 3 3
8 11100 2 2 4
9 11100 2 2 4
10 11100 2 3 4
11 11100 2 2 4
12 11200 2 3 .
13 11300 2 1 2
14 11300 2 2 2
15 11300 1 3 .
16 11300 1 3 .
17 11300 1 3 .
18 11300 1 3 .
Thank you!
You are correct that a retained variable will carry a value into subsequent iterations of the data step. Nominally, in a simple data step with a single set statement, each iteration corresponds to a row in the data set.
Your retained variable is to be assigned at the start of a group, so you will need a by statement, which in turn makes an automatic flag variable first.<by-group-var> available.
data have;
input Group X Y;
datalines;
10600 1 1
10600 1 2
10600 1 3
10800 2 1
10800 2 3
10900 1 2
10900 1 3
11100 2 2
11100 2 2
11100 2 3
11100 2 2
11200 2 3
11300 2 1
11300 2 2
11300 1 3
11300 1 3
11300 1 3
11300 1 3
run;
The last set of rows with group=11300 have x=2 followed by x=1. Your narrative "within a group" conveys an idea but is not precise. The actual grouping (based on the shown want) appears to be a combination of group and x. Thus, you will need a
by group x notsorted;
statement. The notsorted option causes the data step to set up the first. and last. flags based on contiguity of the values instead of the explicit ordering of values.
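To see what notsorted does to the automatic flags, here is a minimal sketch on a made-up five-row data set (the names demo, flags, g, first_g and last_g are hypothetical and only for illustration). The values of g are not in sorted order, yet each contiguous run is treated as its own BY group.

```sas
/* Hypothetical toy data: the values of g form the runs b b / a a / b */
data demo;
  input g $ @@;
  datalines;
b b a a b
;
run;

data flags;
  set demo;
  by g notsorted;      /* groups are contiguous runs, not sorted values */
  first_g = first.g;   /* copy the automatic flags so they are kept */
  last_g  = last.g;
run;

/* first_g is 1 0 1 0 1 and last_g is 0 1 0 1 1 for the five rows */
proc print data=flags;
run;
```

Without notsorted, the same step would stop with an error because b follows a in the data.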
data want;
set have;
by group x notsorted;
retain z;
if first.x then do; * detect first row in combinations "group/x";
select;
when (X=1 & Y=1) Z=1; * apply logic for retained value;
when (X=2 & Y=1) Z=2;
when (X=1 & Y=2) Z=3;
when (X=2 & Y=2) Z=4;
otherwise Z=.;
end;
end;
logic_tracker_first_x = first.x;
run;
ods listing; options nocenter;
proc print data=want;
run;
The output window shows
logic_tracker_
Obs Group X Y z first_x
1 10600 1 1 1 1
2 10600 1 2 1 0
3 10600 1 3 1 0
4 10800 2 1 2 1
5 10800 2 3 2 0
6 10900 1 2 3 1
7 10900 1 3 3 0
8 11100 2 2 4 1
9 11100 2 2 4 0
10 11100 2 3 4 0
11 11100 2 2 4 0
12 11200 2 3 . 1
13 11300 2 1 2 1
14 11300 2 2 2 0
15 11300 1 3 . 1
16 11300 1 3 . 0
17 11300 1 3 . 0
18 11300 1 3 . 0
Please try the following solution. I have used a simpler approach: keep only the first Z value per group, then left join back to the same dataset so that the first Z value is carried across the remaining observations of the same group.
data test;
input group 5. x 1. y 1.;
if x=1 and y=1 then z=1;
else if x=2 and y=1 then z=2;
else if x=1 and y=2 then z=3;
else if x=2 and y=2 then z=4;
datalines;
1060011
1060012
1060013
1080021
1080023
1090012
1090013
1110022
1110022
1110023
1110022
1120023
1130021
1130022
1130013
1130013
1130013
1130013
;
run;
data test1;
set test;
keep group x z;
run;
proc sort data=test1; by group x; run;
data keep_first;
set test1;
by group x;
if first.group or first.x;
run;
proc sql;
create table final
as
select a.group, a.x, a.y, b.z
from test a
left join keep_first b
on a.group=b.group
and a.x=b.x
order by a.group, a.y, a.x;
quit;
I'm quite new to SAS,
I have learned about SGplot, Datalines, IML and randgen.
I'd like to simply generate a random data for a simple scatter plot.
/* declaring manually a numeric list */
data my_data;
input x y @@;
datalines;
1 1 0 8 1 6 0 1 0 1 2 5
0 3 1 0 1 0 1 4 2 4 1 0
0 0 0 1 1 2 1 1 0 4 1 0
1 4 1 0 1 3 0 0 0 1 0 1
1 0 1 1 2 3 0 2 1 4 2 6
2 6 1 0 1 1 0 1 2 8 1 3
1 3 0 5 1 0 5 5 0 2 3 3
0 1 1 0 1 0 0 0 0 3
;
run;
proc sgplot data=my_data;
scatter x=x y=y;
run;
Now I would like in a similar manner to generate a vector of random numbers, such as:
proc iml;
N = 94;
rands = j(N,1);
call randgen(rands, 'Uniform'); /* SAS/IML 12.1 */
run;
and afterwards to transfer the vector as datalines and afterwards pass it into the SGplot.
Can somebody please demonstrate how to do it?
Since you want to pass it directly to datalines, use the submit and text substitution options in IML. This passes rands as an Nx1 vector into the datalines statement, allowing you to read it as one big line.
proc iml;
N = 94;
rands = j(N,1);
call randgen(rands, 'Uniform'); /* SAS/IML 12.1 */
submit rands;
data my_data;
input x y @@;
datalines;
&rands
;
run;
endsubmit;
quit;
proc sgplot data=my_data;
scatter x=x y=y;
run;
Note that N must be doubled (N = 188) to get exactly 94 points; with N = 94 you will get only 47. This is because the input statement reads the values in pairs, each x y pair on the same line, before moving to the next line, e.g.:
1 2 3 4
x = 1 y = 2
x = 3 y = 4
Source: Passing Values into Procedures (Rick Wicklin)
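If going through datalines is not a hard requirement, a possible alternative is to write the IML vectors straight to a data set with the create, append and close statements. This is only a sketch of that alternative, not the approach described above; the seed value and the use of two separate vectors x and y are assumptions made for illustration.

```sas
proc iml;
  call randseed(1);            /* assumed seed, only for reproducibility */
  N = 94;
  x = j(N, 1);                 /* allocate N x 1 vectors */
  y = j(N, 1);
  call randgen(x, 'Uniform');  /* fill each vector with U(0,1) draws */
  call randgen(y, 'Uniform');
  create my_data var {x y};    /* open a data set with columns x and y */
  append;                      /* write the current values of x and y */
  close my_data;
quit;

proc sgplot data=my_data;
  scatter x=x y=y;
run;
```

Because x and y are filled separately, N here is the number of plotted points and does not need to be doubled.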
I'm looking to create a variable based on this data sample:
Video Subject Pre_post Pre_Post_ID
1 1 0 1
1 2 0 1
1 2 0 1
1 3 0 1
1 3 0 1
2 1 1 1
2 1 1 1
2 2 1 1
2 2 1 1
2 3 1 1
4 1 0 2
4 2 0 2
4 2 0 2
4 3 0 2
4 3 0 2
5 1 1 2
5 1 1 2
5 2 1 2
5 2 1 2
5 3 1 2
The goal of the variable will be to create an ID that links the pre_post variable to the subject on the condition that the pre_post_id is the same:
Video Subject Pre_post Pre_Post_ID Subject_P_P_ID
1 1 0 1 1
1 2 0 1 2
1 2 0 1 2
1 3 0 1 3
1 3 0 1 3
2 1 1 1 1
2 1 1 1 1
2 2 1 1 2
2 2 1 1 2
2 3 1 1 3
4 1 0 2 4
4 2 0 2 5
4 2 0 2 5
4 3 0 2 6
4 3 0 2 6
5 1 1 2 4
5 1 1 2 4
5 2 1 2 5
5 2 1 2 5
5 3 1 2 6
Thank you in advance for the help!
You will want to track the pairs (<pre_post_id>,<subject>) as a composite key and increment the Subject_P_P_ID every time a new pair (or key) is encountered.
To simplify the discussion, call the two items in the pair item1 and item2
Here are two ways:
Sort by item1 item2, step through BY item1 item2 and track pair count using logic based on an automatic first. variable -- pair_id + (first.item2), or
Track pairs as keys of a hash and assign new id as <hash>.num_items + 1 when key lookup fails.
Sort + Data Step + Revert Sort
proc sort data=have out=have_sorted;
by item1 item2;
run;
data have_sequenced;
set have_sorted;
by item1 item2;
item1_item2_pair_id + (first.item2);
run;
proc sort data=have_sequenced out=want;
by video subject pre_post pre_post_id item1_item2_pair_id;
run;
Hash
data want;
set have;
if _n_=1 then do;
declare hash lookup();
lookup.defineKey('item1', 'item2');
lookup.defineData('item1_item2_pair_id');
lookup.defineDone();
end;
if lookup.find() ne 0 then do;
item1_item2_pair_id = lookup.num_items+1;
lookup.add();
end;
run;
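For the data in the question, the hash approach might look like the sketch below. It assumes item1 corresponds to Pre_Post_ID and item2 to Subject, and that the input data set is named have; those mappings are my reading of the question rather than something stated in it.

```sas
data want;
  set have;
  if _n_ = 1 then do;
    declare hash lookup();
    lookup.defineKey('Pre_Post_ID', 'Subject');  /* composite key */
    lookup.defineData('Subject_P_P_ID');         /* id stored per key */
    lookup.defineDone();
  end;
  /* find() returns 0 and loads the stored id when the pair was seen before */
  if lookup.find() ne 0 then do;
    Subject_P_P_ID = lookup.num_items + 1;       /* next unused id */
    lookup.add();
  end;
run;
```

Since the rows are read in their original order, no sorting is needed and the ids are numbered in order of first appearance of each pair.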
I have a dataset that has to be grouped by number as follows.
ID dept count
1 10 2
2 10 2
3 20 4
4 20 4
5 20 4
6 20 4
7 30 4
8 30 4
9 30 4
10 30 4
So for every 3rd row I need a new level; the output should be as follows.
ID dept count Level
1 10 2 1
2 10 2 1
3 20 4 1
4 20 4 1
5 20 4 2
6 20 4 2
7 30 4 1
8 30 4 1
9 30 4 2
10 30 4 2
I have tried counting the number of rows based on the dept and count.
data want;
set have;
by dept count;
if first.count then level=1;
else level+1;
run;
this generates a count but not what exactly I am looking for
ID dept count Level
1 10 2 1
2 10 2 1
3 20 4 1
4 20 4 1
5 20 4 2
6 20 4 2
7 30 4 1
8 30 4 1
9 30 4 2
10 30 4 2
It isn't quite clear what output you want. I've extended your input data a bit - please
could you clarify what output you'd expect for this input and what the logic is for generating it?
I've made a best guess at roughly what you might be aiming for - incrementing every 3 rows with the same dept and count - perhaps this will be enough for you to get to the answer you want?
data have;
input ID dept count;
cards;
1 10 2
2 10 2
3 20 4
4 20 4
5 20 4
6 20 4
7 30 4
8 30 4
9 30 4
10 30 4
11 30 4
12 30 4
13 30 4
14 30 4
;
run;
data want;
set have;
by dept count;
if first.count then do;
level = 0;
dummy = 0;
end;
if mod(dummy,3) = 0 then level + 1;
dummy + 1;
drop dummy;
run;
Output:
ID dept count level
1 10 2 1
2 10 2 1
3 20 4 1
4 20 4 1
5 20 4 1
6 20 4 2
7 30 4 1
8 30 4 1
9 30 4 1
10 30 4 2
11 30 4 2
12 30 4 2
13 30 4 3
14 30 4 3
One way to do this is to nest the SET statement inside a DO loop. Or in this case two DO loops. One to generate the LEVEL (within DEPT) and the second to count by twos. Use the LAST.DEPT flag to handle odd number of observations.
Suppose I modify the input to include an odd number of observations in some groups:
data have;
input ID dept count;
cards;
1 10 2
2 10 2
3 20 4
4 20 4
5 20 4
6 20 4
7 20 4
8 30 4
9 30 4
10 30 4
;
Then you can use this step to assign the LEVEL variable.
data want ;
do level=1 by 1 until(last.dept);
do sublevel=1 to 2 until(last.dept);
set have;
by dept;
output;
end;
end;
run;
Results:
Obs level sublevel ID dept count
1 1 1 1 10 2
2 1 2 2 10 2
3 1 1 3 20 4
4 1 2 4 20 4
5 2 1 5 20 4
6 2 2 6 20 4
7 3 1 7 20 4
8 1 1 8 30 4
9 1 2 9 30 4
10 2 1 10 30 4
I have data which is as follows:
data have;
length
group 8
replicate $ 1
day 8
observation 8
;
input (_all_) (:);
datalines;
1 A 1 0
1 A 1 5
1 A 1 3
1 A 1 3
1 A 2 7
1 A 2 2
1 A 2 4
1 A 2 2
1 B 1 1
1 B 1 3
1 B 1 8
1 B 1 0
1 B 2 3
1 B 2 8
1 B 2 1
1 B 2 3
1 C 1 1
1 C 1 5
1 C 1 2
1 C 1 7
1 C 2 2
1 C 2 1
1 C 2 4
1 C 2 1
2 A 1 7
2 A 1 5
2 A 1 3
2 A 1 1
2 A 2 0
2 A 2 5
2 A 2 3
2 A 2 0
2 B 1 0
2 B 1 3
2 B 1 4
2 B 1 8
2 B 2 1
2 B 2 3
2 B 2 4
2 B 2 0
2 C 1 0
2 C 1 4
2 C 1 3
2 C 1 1
2 C 2 2
2 C 2 3
2 C 2 0
2 C 2 1
3 A 1 4
3 A 1 5
3 A 1 6
3 A 1 7
3 A 2 3
3 A 2 1
3 A 2 5
3 A 2 2
3 B 1 2
3 B 1 0
3 B 1 2
3 B 1 3
3 B 2 0
3 B 2 6
3 B 2 3
3 B 2 7
3 C 1 7
3 C 1 5
3 C 1 3
3 C 1 1
3 C 2 0
3 C 2 3
3 C 2 2
3 C 2 1
;
run;
I want to split observation into two columns based on day.
observation_ observation_
Obs group replicate day_1 day_2
1 1 A 0 7
2 1 A 5 2
3 1 A 3 4
4 1 A 3 2
5 1 B 1 3
6 1 B 3 8
7 1 B 8 1
8 1 B 0 3
9 1 C 1 2
10 1 C 5 1
11 1 C 2 4
12 1 C 7 1
13 2 A 7 0
14 2 A 5 5
15 2 A 3 3
16 2 A 1 0
17 2 B 0 1
18 2 B 3 3
19 2 B 4 4
20 2 B 8 0
21 2 C 0 2
22 2 C 4 3
23 2 C 3 0
24 2 C 1 1
25 3 A 4 3
26 3 A 5 1
27 3 A 6 5
28 3 A 7 2
29 3 B 2 0
30 3 B 0 6
31 3 B 2 3
32 3 B 3 7
33 3 C 7 0
34 3 C 5 3
35 3 C 3 2
36 3 C 1 1
The observant SO reader will notice that I have asked essentially the same question previously. However, because of SAS's obsession with "levels" and "by groups", since the variable being used to split the variable of interest isn't binary, that solution doesn't generalize.
Trying it directly, the following occurs:
proc sort data = have out = sorted;
by
group
replicate
;
run;
proc transpose data = sorted out = test;
by
group
replicate
;
var observation;
id day;
run;
ERROR: The ID value "_1" occurs twice in the same BY group.
I can use the LET option to suppress the errors, but in addition to cluttering up the log, SAS retains only the last observation of each BY group.
proc sort data = have out = sorted;
by
group
replicate
;
run;
proc transpose data = sorted out = test let;
by
group
replicate
;
var observation;
id day;
run;
Obs group replicate _NAME_ _1 _2
1 1 A observation 3 2
2 1 B observation 0 3
3 1 C observation 7 1
4 2 A observation 1 0
5 2 B observation 8 0
6 2 C observation 1 1
7 3 A observation 7 2
8 3 B observation 3 7
9 3 C observation 1 1
I don't doubt there's some kludgy way it could be done, such as splitting each group into a separate data set and then re-merging them. It seems like it should be doable with PROC TRANSPOSE, although how escapes me. Any ideas?
Not sure what you're talking about with "SAS's obsession...", but the issue here is fairly straightforward; you need to tell SAS about the four rows (or whatever) being separate, distinct rows. by tells SAS what the row-level ID is, but you're lying to it when you say by group replicate, since there are still multiple rows under that. So you need to have a unique key. (This would be true in any database-like language; nothing unique to SAS here.)
I would do this - make a day_row field, then sort by that.
data have_id;
set have;
by group replicate day;
if first.day then day_row = 0;
day_row+1;
run;
proc sort data=have_id;
by group replicate day_row;
run;
proc transpose data=have_id out=want(drop=_name_) prefix=observation_day_;
by group replicate day_row;
var observation;
id day;
run;
Your output looks like you don't want to transpose the data but instead just want to split it into DAY1 and DAY2 sets and merge them back together. This will just pair the multiple readings per BY group in the same order that they appear, which is what it looks like you did in your example.
data want ;
merge
have(where=(day=1) rename=(observation=day_1))
have(where=(day=2) rename=(observation=day_2))
;
by group replicate;
drop day ;
run;
You can read the source data as many times as you need for the number of values of DAY.
If you think that you might not have the same number of observations per BY group for each DAY then you should add these statements at the end of the data step.
output;
call missing(of day_:);
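Put together, the step with those two statements added would look roughly like this. It is only the merge above with the extra lines placed at the end; the data set and variable names are the ones already used in this answer.

```sas
data want;
  merge
    have(where=(day=1) rename=(observation=day_1))
    have(where=(day=2) rename=(observation=day_2))
  ;
  by group replicate;
  drop day;
  output;                   /* write the paired row explicitly */
  call missing(of day_:);   /* clear day_1 and day_2 so that a value from  */
                            /* the longer side is not carried forward when */
                            /* the other side runs out of observations     */
run;
```

The explicit output is needed because call missing would otherwise blank the values before the implicit output at the end of the step.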
When I run return list, all after running a ranksum test, the count and z-score are available, but not the p-value. Is there any way of picking it up?
clear
input eventtime prefflag winner stakechange
1 1 1 10
1 2 1 5
2 1 0 50
2 2 0 31
2 1 1 51
2 2 1 20
1 1 0 10
2 2 1 10
2 1 0 5
3 2 0 8
4 2 0 8
5 2 0 8
5 2 1 8
3 1 1 8
4 1 1 8
5 1 1 8
5 1 1 8
end
bysort eventtime winner: tabstat stakechange, stat(mean median n) columns(statistics)
ranksum stakechange if inlist(eventtime, 1, 2) & inlist(winner, 0, .), by (eventtime)
return list, all
Try computing it after ranksum:
scalar pval = 2 * normal(-abs(r(z)))
display pval
The answer is by @NickCox:
http://www.stata.com/statalist/archive/2004-12/msg00622.html
The Statalist archive is a valuable resource.