Copying the previous column value in SAS

I am trying to copy the value from the previous column to the present column if there is a missing value, but there is something wrong in the code I wrote.
data X;
input A B C D E;
cards;
1 . . . 2
2 2 3 . .
3 3 4 5 6
4 4 4 2 .
. . 6 . .
;
run;
Program
data Y;
set x;
array arr(5) a--e;
array brr(4) b--e;
do j=1 to dim(arr);
do i =2 to dim(brr);
if brr(i)=. then brr(i)=arr(j);
end;
end;
drop i j;
run;
However the output that I get is
1 . 1 1 2
2 2 3 2 2
3 3 4 5 6
4 4 4 2 4
. . 6 6 6
Which is wrong!
The output I want is like this:
1 1 1 1 2
2 2 3 3 3
3 3 4 5 6
4 4 4 2 4
. . 6 6 6
What is wrong with the code?

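Do you want 4 4 4 2 2 instead of 4 4 4 2 4?
You need only one loop. In your nested loops, each pass of j fills every remaining missing value in C to E with arr(j), so they all end up with the first non-missing value in the row (usually A) rather than the value to their left. B is never checked, because i starts at 2 in brr, which is C.
Try this code: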
data Y;
set x;
array arr(5) a--e;
do i=2 to dim(arr);
if arr(i)=. then arr(i)=arr(i-1);
end;
drop i;
run;
Also, don't forget to think about what is happening in this code!
You could check, for every row and every i:
what is the value of arr(i)?
what is the value of arr(i-1)?
is the outcome what you expected? (Convince yourself that the problem is solved :) )
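One way to run those checks is to write each comparison to the log. This is a minimal sketch that assumes the data set X from the question; cur and prev are helper variables introduced here only for tracing and are not part of the original answer.

```sas
data Y;
  set x;
  array arr(5) a--e;
  do i = 2 to dim(arr);
    /* cur and prev are hypothetical helpers, used only for the log */
    cur  = arr(i);
    prev = arr(i-1);
    /* log the row number, the index, and the two values being compared */
    put _n_= i= cur= prev=;
    if arr(i) = . then arr(i) = arr(i-1);
  end;
  drop i cur prev;
run;
```

For the first row (1 . . . 2), B is filled from A before C is examined, so prev is already 1 when the loop reaches C and D. That is why a single left-to-right pass is enough.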

Related

Creating variables based on other variables in SAS

I'm looking to create a variable based on this data sample:
Video Subject Pre_post Pre_Post_ID
1 1 0 1
1 2 0 1
1 2 0 1
1 3 0 1
1 3 0 1
2 1 1 1
2 1 1 1
2 2 1 1
2 2 1 1
2 3 1 1
4 1 0 2
4 2 0 2
4 2 0 2
4 3 0 2
4 3 0 2
5 1 1 2
5 1 1 2
5 2 1 2
5 2 1 2
5 3 1 2
The goal of the variable will be to create an ID that links the pre_post variable to the subject on the condition that the pre_post_id is the same:
Video Subject Pre_post Pre_Post_ID Subject_P_P_ID
1 1 0 1 1
1 2 0 1 2
1 2 0 1 2
1 3 0 1 3
1 3 0 1 3
2 1 1 1 1
2 1 1 1 1
2 2 1 1 2
2 2 1 1 2
2 3 1 1 3
4 1 0 2 4
4 2 0 2 5
4 2 0 2 5
4 3 0 2 6
4 3 0 2 6
5 1 1 2 4
5 1 1 2 4
5 2 1 2 5
5 2 1 2 5
5 3 1 2 6
Thank you in advance for the help!
You will want to track the pairs (<pre_post_id>,<subject>) as a composite key and increment the Subject_P_P_ID every time a new pair (or key) is encountered.
To simplify the discussion, call the two items in the pair item1 and item2
Here are two ways:
Sort by item1 item2, step through BY item1 item2 and track pair count using logic based on an automatic first. variable -- pair_id + (first.item2), or
Track pairs as keys of a hash and assign new id as <hash>.num_items + 1 when key lookup fails.
Sort + Data Step + Revert Sort
proc sort data=have out=have_sorted;
by item1 item2;
run;
data have_sequenced;
set have_sorted;
by item1 item2;
item1_item2_pair_id + (first.item2);
run;
proc sort data=have_sequenced out=want;
by video subject pre_post pre_post_id item1_item2_pair_id;
run;
Hash
data want;
  set have;
  if _n_=1 then do;
    declare hash lookup();
    lookup.defineKeys('item1', 'item2');
    lookup.defineData('item1_item2_pair_id');
    lookup.defineDone();
  end;
  /* find() returns nonzero when this (item1, item2) pair has not been seen yet */
  if lookup.find() ne 0 then do;
    item1_item2_pair_id = lookup.num_items+1;
    lookup.add();
  end;
run;
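Applied to the column names in the question, where item1 is pre_post_id and item2 is subject, the sort-based approach might look like the sketch below. It assumes the sample data is in a data set named have and that the original order is by video and then subject, as in the listing; the data set names have_sorted, have_sequenced, and want follow the answer above.

```sas
/* Group the rows by the composite key (pre_post_id, subject) */
proc sort data=have out=have_sorted;
  by pre_post_id subject;
run;

data have_sequenced;
  set have_sorted;
  by pre_post_id subject;
  /* first.subject is 1 on the first row of each new pair, 0 otherwise, */
  /* so the sum statement increments the ID once per pair               */
  Subject_P_P_ID + first.subject;
run;

/* Restore the order shown in the question (an assumption about the source order) */
proc sort data=have_sequenced out=want;
  by video subject;
run;
```

With the sample data, the pairs (1,1), (1,2), (1,3) receive IDs 1 to 3 and the pairs (2,1), (2,2), (2,3) receive IDs 4 to 6, which matches the Subject_P_P_ID column requested.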

SAS - Split single column into two based value of non-binary ID column

I have data which is as follows:
data have;
length
group 8
replicate $ 1
day 8
observation 8
;
input (_all_) (:);
datalines;
1 A 1 0
1 A 1 5
1 A 1 3
1 A 1 3
1 A 2 7
1 A 2 2
1 A 2 4
1 A 2 2
1 B 1 1
1 B 1 3
1 B 1 8
1 B 1 0
1 B 2 3
1 B 2 8
1 B 2 1
1 B 2 3
1 C 1 1
1 C 1 5
1 C 1 2
1 C 1 7
1 C 2 2
1 C 2 1
1 C 2 4
1 C 2 1
2 A 1 7
2 A 1 5
2 A 1 3
2 A 1 1
2 A 2 0
2 A 2 5
2 A 2 3
2 A 2 0
2 B 1 0
2 B 1 3
2 B 1 4
2 B 1 8
2 B 2 1
2 B 2 3
2 B 2 4
2 B 2 0
2 C 1 0
2 C 1 4
2 C 1 3
2 C 1 1
2 C 2 2
2 C 2 3
2 C 2 0
2 C 2 1
3 A 1 4
3 A 1 5
3 A 1 6
3 A 1 7
3 A 2 3
3 A 2 1
3 A 2 5
3 A 2 2
3 B 1 2
3 B 1 0
3 B 1 2
3 B 1 3
3 B 2 0
3 B 2 6
3 B 2 3
3 B 2 7
3 C 1 7
3 C 1 5
3 C 1 3
3 C 1 1
3 C 2 0
3 C 2 3
3 C 2 2
3 C 2 1
;
run;
I want to split observation into two columns based on day.
Obs group replicate observation_day_1 observation_day_2
1 1 A 0 7
2 1 A 5 2
3 1 A 3 4
4 1 A 3 2
5 1 B 1 3
6 1 B 3 8
7 1 B 8 1
8 1 B 0 3
9 1 C 1 2
10 1 C 5 1
11 1 C 2 4
12 1 C 7 1
13 2 A 7 0
14 2 A 5 5
15 2 A 3 3
16 2 A 1 0
17 2 B 0 1
18 2 B 3 3
19 2 B 4 4
20 2 B 8 0
21 2 C 0 2
22 2 C 4 3
23 2 C 3 0
24 2 C 1 1
25 3 A 4 3
26 3 A 5 1
27 3 A 6 5
28 3 A 7 2
29 3 B 2 0
30 3 B 0 6
31 3 B 2 3
32 3 B 3 7
33 3 C 7 0
34 3 C 5 3
35 3 C 3 2
36 3 C 1 1
The observant SO reader will notice that I have asked essentially the same question previously. However, because of SAS's obsession with "levels" and "by groups", since the variable being used to split the variable of interest isn't binary, that solution doesn't generalize.
Trying it directly, the following occurs:
proc sort data = have out = sorted;
by
group
replicate
;
run;
proc transpose data = sorted out = test;
by
group
replicate
;
var observation;
id day;
run;
ERROR: The ID value "_1" occurs twice in the same BY group.
I can use the LET option to suppress the errors, but in addition to cluttering up the log, SAS then retains only the last observation of each BY group.
proc sort data = have out = sorted;
by
group
replicate
;
run;
proc transpose data = sorted out = test let;
by
group
replicate
;
var observation;
id day;
run;
Obs group replicate _NAME_ _1 _2
1 1 A observation 3 2
2 1 B observation 0 3
3 1 C observation 7 1
4 2 A observation 1 0
5 2 B observation 8 0
6 2 C observation 1 1
7 3 A observation 7 2
8 3 B observation 3 7
9 3 C observation 1 1
I don't doubt there's some kludgy way it could be done, such as splitting each group into a separate data set and then re-merging them. It seems like it should be doable with PROC TRANSPOSE, although how escapes me. Any ideas?
Not sure what you're talking about with "SAS's obsession...", but the issue here is fairly straightforward: you need to tell SAS that the four rows (or however many) in each group are separate, distinct rows. The BY statement tells SAS what the row-level ID is, but you're lying to it when you say by group replicate, since there are still multiple rows for each day under that. So you need a unique key. (This would be true in any database-like language; it is nothing unique to SAS.)
I would do this - make a day_row field, then sort by that.
data have_id;
set have;
by group replicate day;
if first.day then day_row = 0;
day_row+1;
run;
proc sort data=have_id;
by group replicate day_row;
run;
proc transpose data=have_id out=want(drop=_name_) prefix=observation_day_;
by group replicate day_row;
var observation;
id day;
run;
Your output looks like you don't want to transpose the data but instead just want to split it into DAY1 and DAY2 sets and merge them back together. This will just pair the multiple readings per BY group in the same order that they appear, which is what it looks like you did in your example.
data want ;
merge
have(where=(day=1) rename=(observation=day_1))
have(where=(day=2) rename=(observation=day_2))
;
by group replicate;
drop day ;
run;
You can read the source data as many times as you need for the number of values of DAY.
If you think that you might not have the same number of observations per BY group for each DAY then you should add these statements at the end of the data step.
output;
call missing(of day_:);
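Put together, the merge with those two extra statements might look like the sketch below. It reuses the data set and variable names from the answer above and has not been run against data with unequal group sizes, so treat it as an illustration rather than a tested solution.

```sas
data want;
  merge
    have(where=(day=1) rename=(observation=day_1))
    have(where=(day=2) rename=(observation=day_2))
  ;
  by group replicate;
  drop day;
  /* write the paired row explicitly ... */
  output;
  /* ... then clear day_1 and day_2 so that a value from the shorter   */
  /* side is not carried forward onto the remaining rows of the group  */
  call missing(of day_:);
run;
```

The explicit OUTPUT replaces the implicit one at the end of the step, and the name prefix list day_: matches only day_1 and day_2, not day itself.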

SAS reverse count within ID group

I am used to creating count variables within a group where the count goes upwards +1 at each time using :
data objective ;
set eg ;
count + 1 ;
by id age ;
if first.age then count = 1 ;
run ;
However, I would like to do the reverse, i.e. the first value of age in each id group should have a value of 10, and each subsequent line should have a value 1 less than the preceding line:
data eg ;
input id age desire ;
cards;
1 5 10
1 4 9
1 3 8
1 2 7
1 1 6
2 10 10
2 9 9
2 8 8
2 7 7
2 6 6
2 5 5
2 4 4
2 3 3
2 2 2
2 1 1
3 7 10
3 6 9
3 5 8
3 4 7
3 3 6
3 2 5
3 1 4
;
run;
data objective ;
set eg ;
count - 1 ;
by id age ;
if first.age_ar then count = 10 ;
run ;
Is there a way to do this, given that count - 1 is not recognised?
The sum statement accepts a negative increment and retains count automatically, so you can add -1 without a RETAIN statement:
data objective;
set eg;
count + -1;
by id descending age;
if first.id then count = 10;
run;
Try this (see comments in code for explanation):
data objective ;
  retain count 10; /* retain the last value of count across observations; the initial value 10 is optional */
  set eg ;
  count = count - 1 ; /* "count - 1" alone is not a valid statement, but count = count - 1 works once count is retained */
  by id age notsorted; /* notsorted because age is in descending order */
  if first.id then count = 10 ; /* not sure why you had age_ar here; it should be id to get your desired output */
run ;

Inputting a row which number is in a variable

I have an array t that specifies numbers of rows that I want to read from file.txt. So my code should look like this:
data a;
do i = 1 to dim(t);
infile "C:\sas\file.txt" firstobs = t(i) obs = t(i);
input x1-x10;
output;
end;
run;
Of course this approach (firstobs=) works only if the row number is a constant. How can I do this using an array that is itself read from the first row of the same file?
For example if the file.txt looked like this:
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
Then I want the output to be:
2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4
6 6 6 6 6 6 6 6 6 6
Here's an answer similar to Tom's, but which does not attempt to read in off-path data. This may be superior for cases where your skipped rows have data which are not formatted in the same manner as your on-path data. It uses Tom's parmcards and structure so you can more easily see the differences.
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
%let ncol=9 ;
%let maxrows=1000;
data want ;
infile tempdata truncover end=eof;
array rows (&maxrows) _temporary_;
do i=1 by 1 until (rows(i)=.); *read in first line, just like Toms answer;
input rows(i) @;
drop i;
end;
input ; * stop inputting on the first line;
* Here you may need to use CALL SORTN to sort row array if it is not already sorted;
_currow = 2; * current row indicator;
do _i = 1 to dim(rows); * iterate through row array;
if rows[_i]=. then leave; * leave if row array is empty;
do while (_currow lt rows[_i] and not eof); * skip rows not in row array;
input;
_currow = _currow + 1;
end;
input x1-x&ncol; * now you know you are on a desired row, so input it;
output; * and output it;
_currow = _currow + 1;
end;
run;
As noted above, you may have to use CALL SORTN if the array is not already sorted (i.e., if the missing values are not at the end or the numbers are out of order).
Sounds like the first row contains the list of rows to keep. It would probably be easier to read that from a separate file, but you could make it work with a single file. You did not mention how to know the number of columns of data or the maximum number of row numbers that could be in the first row. For now let's assume that you can set these numbers in macro variables.
Let's get your example data into a file:
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
Now let's read it into a dataset.
%let ncol=9 ;
%let maxrows=1000;
data want ;
infile tempdata truncover ;
array rows (&maxrows) _temporary_;
if _n_=1 then do i=1 by 1 until (rows(i)=.);
input rows(i) @;
drop i;
end;
else do;
input x1-x&ncol;
if whichn(_n_,of rows(*)) then output;
end;
run;
If the other rows of the file have invalid data such that the INPUT statement would cause errors you can skip trying to read the data from those rows with a minor modification in the ELSE block.
else do;
input @;
if whichn(_n_,of rows(*)) then do;
input x1-x&ncol;
output;
end;
end;
If you find that you frequently want to not read lots of records at the end of the file you could add this line to the end of the data step to stop when you have read past the last line you want.
if _n_ > max(of rows(*)) then stop;
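For reference, here is a sketch of the whole step with both modifications folded in. It assumes the tempdata fileref and the ncol and maxrows macro variables defined above, and it has only been reasoned through against the small example file, not a large one.

```sas
%let ncol=9;
%let maxrows=1000;
data want;
  infile tempdata truncover;
  array rows (&maxrows) _temporary_;
  if _n_ = 1 then do i = 1 by 1 until (rows(i) = .);
    /* trailing @ holds line 1 so the next value comes from the same line */
    input rows(i) @;
  end;
  else do;
    /* load the line into the buffer without parsing any fields */
    input @;
    /* parse and keep the line only if its number is in the list */
    if whichn(_n_, of rows(*)) then do;
      input x1-x&ncol;
      output;
    end;
  end;
  /* nothing wanted beyond the largest listed row, so stop reading */
  if _n_ > max(of rows(*)) then stop;
  drop i;
run;
```

Because the rows array is _temporary_, its values are kept across iterations of the step, which is what lets the list read on line 1 drive the test on every later line.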
If your file is structured (i.e. same delimiter and one continuous 'row' of input data), then the approach below should work. I'm sure that you can tweak it to make it a bit more efficient, but I put some comments in to explain what each section is doing. I also suggest reading through the infile documentation for an explanation of the _infile_ automatic variable and other ways to manipulate the input data buffer. Also, if your input data file needs to be split up into individual rows itself, then you will need to adjust for that.
filename in_data 'C:\sas\file.txt';
data out_data (keep=x1-x10);
infile in_data;
input fn;
/*get the number of vars based on delimiter*/
count = count(strip(_infile_), ' ') + 1;
/*iterate through vars*/
do i =1 to count;
/*set new value to current var*/
rec = scan(strip(_infile_), i, ' ');
/*set array values to new value*/
array obs(10) x1-x10;
do j=1 to dim(obs);
obs(j) = rec;
end;
/*output to dataset*/
output out_data;
end;
run;
Input
2 4 6 7 8 9 10 11 2 3
Output
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4
6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9
10 10 10 10 10 10 10 10 10 10
11 11 11 11 11 11 11 11 11 11
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
Hope that this helps.
OK, I figured it out. Assuming I know the number of columns (10) and number of rows (10) I can get what I wanted using the following code:
data a;
w=1;
infile "C:\sas\file.txt" n=10;
input #w x1-x10;
array x(*) x1-x10;
array t(10) _temporary_;
do i=1 to 10;
if(x(i)^=.) then t(i)=x(i);
else leave;
end;
do j=1 to i-1;
w=t(j);
input #w x1-x10;
output;
end;
stop;
run;
What is left is to do the same without knowing numbers of rows and columns. This way I only read the rows I'm interested in as opposed to reading all rows and only outputting the ones I need.
The program would probably be much easier to maintain if you just read the whole matrix into a dataset and then used the row numbers to pick the data you want. Your file would probably need to have hundreds of thousands of observations for the time saved to be worth the programming effort of avoiding a read of the full file.
Here is one way using the POINT= option of the SET statement to select the rows.
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
data rows;
infile tempdata obs=1 ;
input row @@;
row=row-1;
run;
proc import datafile="%sysfunc(pathname(tempdata))" dbms=dlm out=full replace;
getnames=no;
delimiter=' ';
datarow=2;
run;
data want ;
set rows ;
pointer=row ;
set full point=pointer ;
run;
proc print; run;

How to reshape long to wide data in Stata?

I have the following data:
id tests testvalue
1 A 4
1 B 5
1 C 3
1 D 3
2 A 3
2 B 3
3 C 3
3 D 4
4 A 3
4 B 5
4 A 1
4 B 3
I would like to change the above long data format into following wide data.
id testA testB testC testD index
1 4 5 3 3 1
2 3 3 . . 2
3 . . 3 4 3
4 3 5 . . 4
4 1 3 . . 5
I am trying
reshape wide testvalue, i(id) j(tests)
It gives an error because the values of tests are not unique within id.
What would be the solution to this problem?
You need to create an extra identifier to make replicates distinguishable.
clear
input id str1 tests testvalue
1 A 4
1 B 5
1 C 3
1 D 3
2 A 3
2 B 3
3 C 3
3 D 4
4 A 3
4 B 5
4 A 1
4 B 3
end
bysort id tests: gen replicate = _n
reshape wide testvalue, i(id replicate) j(tests) string
See also the Stata documentation for reshape (help reshape).