Inputting a row whose number is stored in a variable - SAS

I have an array t that holds the numbers of the rows I want to read from file.txt. So my code should look something like this:
data a;
do i = 1 to dim(t);
infile "C:\sas\file.txt" firstobs = t(i) obs = t(i);
input x1-x10;
output;
end;
run;
Of course, this approach (FIRSTOBS=) works only if the row number is a constant. How can I do this using an array (which is also read from the same file, from its first row)?
For example if the file.txt looked like this:
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
Then I want the output to be:
2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4
6 6 6 6 6 6 6 6 6 6

Here's an answer similar to Tom's, but which does not attempt to read in off-path data. This may be superior for cases where your skipped rows have data which are not formatted in the same manner as your on-path data. It uses Tom's parmcards and structure so you can more easily see the differences.
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
%let ncol=9 ;
%let maxrows=1000;
data want ;
infile tempdata truncover end=eof;
array rows (&maxrows) _temporary_;
do i=1 by 1 until (rows(i)=.); *read in first line, just like Toms answer;
input rows(i) @;
drop i;
end;
input ; * stop inputting on the first line;
* Here you may need to use CALL SORTN to sort row array if it is not already sorted;
_currow = 2; * current row indicator;
do _i = 1 to dim(rows); * iterate through row array;
if rows[_i]=. then continue; * skip empty slots (CALL SORTN moves missing values to the front);
do while (_currow lt rows[_i] and not eof); * skip rows not in row array;
input;
_currow = _currow + 1;
end;
input x1-x&ncol; * now you know you are on a desired row, so input it;
output; * and output it;
_currow = _currow + 1;
end;
stop; * done with the requested rows, so do not reread later lines as a new row list;
run;
As I noted above, you may have to use CALL SORTN if the array is not already sorted (i.e., if the missing values are not at the end or the numbers are out of order).
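The skip-ahead pass can be sketched outside SAS, in Python, to make the bookkeeping explicit. This is an illustration under assumptions, not the answer's code: the file is taken to be already split into text lines, `.` stands for a missing value, and `select_rows_sequential` and `parse_row` are hypothetical helper names. Unlike CALL SORTN, the sketch also drops duplicate row numbers, since a single forward pass cannot go back to a row it has already read.

```python
def parse_row(line, ncol):
    """Split a space-delimited line; '.' becomes None (a SAS missing value).
    Short rows are padded with None, similar to INFILE ... TRUNCOVER."""
    values = [None if tok == "." else float(tok) for tok in line.split()]
    return (values + [None] * ncol)[:ncol]


def select_rows_sequential(lines, ncol):
    """Return the rows listed on line 1, reading the lines in one forward pass.

    Row numbers are physical line numbers (line 1 is the list itself).
    The list is de-duplicated and sorted first, playing the role of CALL SORTN.
    """
    it = iter(lines)
    header = next(it).split()
    wanted = sorted({int(tok) for tok in header if tok != "."})
    result = []
    current = 2  # physical line number of the next unread line
    for target in wanted:
        # Skip lines that were not requested (the DO WHILE loop with a bare INPUT).
        while current < target:
            if next(it, None) is None:
                return result  # end of file before reaching the target row
            current += 1
        line = next(it, None)
        if line is None:
            return result
        result.append(parse_row(line, ncol))
        current += 1
    return result
```

Sorting is what lets one pass work: with the list in ascending order, every requested row is still ahead of the read position when its turn comes.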

Sounds like the first row contains the list of rows to keep. It would probably be easier to read that from a separate file, but you could make it work with a single file. You did not mention how to know the number of columns of data or the maximum number of row numbers that could be in the first row. For now let's assume that you can set these numbers in macro variables.
Let's get your example data into a file:
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
Now let's read it into a dataset.
%let ncol=9 ;
%let maxrows=1000;
data want ;
infile tempdata truncover ;
array rows (&maxrows) _temporary_;
if _n_=1 then do i=1 by 1 until (rows(i)=.);
input rows(i) @;
drop i;
end;
else do;
input x1-x&ncol;
if whichn(_n_,of rows(*)) then output;
end;
run;
If the other rows of the file contain data that would make the INPUT statement fail, you can skip reading those rows with a minor modification to the ELSE block:
else do;
input @;
if whichn(_n_,of rows(*)) then do;
input x1-x&ncol;
output;
end;
end;
If you often have many unwanted records at the end of the file, you could add this line to the end of the data step so that it stops once it has read past the last line you want:
if _n_ > max(of rows(*)) then stop;
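This approach, which tests each line number for membership in the list and stops after the largest requested row, can be sketched in Python the same way. Again this is only an illustration: the lines are assumed to be pre-split text, `.` is treated as missing, and `select_rows_membership` is a hypothetical name. Rows come back in file order, and lines that were not requested are never parsed, as with the modified ELSE block.

```python
def parse_row(line, ncol):
    """Split a space-delimited line; '.' becomes None, short rows are padded."""
    values = [None if tok == "." else float(tok) for tok in line.split()]
    return (values + [None] * ncol)[:ncol]


def select_rows_membership(lines, ncol):
    """Keep a line when its physical line number appears in the list on line 1."""
    it = iter(lines)
    wanted = {int(tok) for tok in next(it).split() if tok != "."}
    last_wanted = max(wanted, default=1)
    result = []
    for line_no, line in enumerate(it, start=2):
        if line_no > last_wanted:
            break  # plays the role of: if _n_ > max(of rows(*)) then stop;
        if line_no in wanted:  # plays the role of: whichn(_n_, of rows(*))
            # Only requested lines are parsed, as in the modified ELSE block.
            result.append(parse_row(line, ncol))
    return result
```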

If your file is structured (i.e., it uses the same delimiter throughout and holds one continuous 'row' of input data), then the approach below should work. I'm sure you can tweak it to be a bit more efficient, but I put some comments in to explain what each section does. I also suggest reading through the INFILE documentation for an explanation of the _INFILE_ automatic variable and other ways to manipulate the input data buffer. Also, if your input data file itself needs to be split up into individual rows, you will need to adjust for that.
filename in_data 'C:\sas\file.txt';
data out_data (keep=x1-x10);
infile in_data;
input fn;
/*declare the output array once, outside the loop*/
array obs(10) x1-x10;
/*get the number of values on the line (space-delimited)*/
nvals = countw(_infile_, ' ');
/*iterate through the values*/
do i = 1 to nvals;
/*convert the i-th value from character to numeric*/
rec = input(scan(_infile_, i, ' '), best32.);
/*set every array element to that value*/
do j = 1 to dim(obs);
obs(j) = rec;
end;
/*output one row per value*/
output out_data;
end;
run;
Input
2 4 6 7 8 9 10 11 2 3
Output
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4
6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9
10 10 10 10 10 10 10 10 10 10
11 11 11 11 11 11 11 11 11 11
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
Hope that this helps.

OK, I figured it out. Assuming I know the number of columns (10) and number of rows (10) I can get what I wanted using the following code:
data a;
w=1;
infile "C:\sas\file.txt" n=10;
input #w x1-x10;
array x(*) x1-x10;
array t(10) _temporary_;
do i=1 to 10;
if(x(i)^=.) then t(i)=x(i);
else leave;
end;
do j=1 to i-1;
w=t(j);
input #w x1-x10;
output;
end;
stop;
run;
What remains is to do the same without knowing the numbers of rows and columns in advance. This way I only read the rows I'm interested in, as opposed to reading all rows and only outputting the ones I need.

It would probably be a much easier program to maintain if you just read the whole matrix into a dataset and then used the row numbers to pick the data you want. Your file would probably need to have hundreds of thousands of observations for the time saved to be worth the programming effort of avoiding a full read.
Here is one way using the POINT= option of the SET statement to select the rows.
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
data rows;
infile tempdata obs=1 ;
input row @@;
row=row-1;
run;
proc import datafile="%sysfunc(pathname(tempdata))" dbms=dlm out=full replace;
getnames=no;
delimiter=' ';
datarow=2;
run;
data want ;
set rows ;
pointer=row ;
set full point=pointer ;
run;
proc print; run;
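The POINT= approach reads every data line once and then fetches rows by position, so it returns them in the order listed on the first line and can return the same row more than once. A rough Python analogue, assuming the lines are already in memory and every requested row exists; `select_rows_random_access` is a hypothetical name.

```python
def select_rows_random_access(lines):
    """Mimic PROC IMPORT (DATAROW=2) followed by SET ... POINT=.

    Every data line is parsed once into `full`; the requested rows are then
    fetched by position, in the order listed on line 1 (repeats allowed).
    Assumes every requested row number exists in the file.
    """
    header, *data_lines = lines
    full = [[None if tok == "." else float(tok) for tok in line.split()]
            for line in data_lines]
    result = []
    for tok in header.split():
        if tok == ".":
            continue
        obs = int(tok) - 1            # row=row-1: physical line 2 is observation 1
        result.append(full[obs - 1])  # Python lists are 0-based
    return result
```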

Related

SAS reverse count within ID group

I am used to creating count variables within a group, where the count goes up by 1 each time, using:
data objective ;
set eg ;
count + 1 ;
by id age ;
if first.age then count = 1 ;
run ;
However, I would like to do the reverse, i.e. the first age in each id group should have a value of 10 and each subsequent line a value 1 less than the preceding line:
data eg ;
input id age desire ;
cards;
1 5 10
1 4 9
1 3 8
1 2 7
1 1 6
2 10 10
2 9 9
2 8 8
2 7 7
2 6 6
2 5 5
2 4 4
2 3 3
2 2 2
2 1 1
3 7 10
3 6 9
3 5 8
3 4 7
3 3 6
3 2 5
3 1 4
;
run;
data objective ;
set eg ;
count - 1 ;
by id age ;
if first.age_ar then count = 10 ;
run ;
Is there a way to do this, since count - 1; is not recognised?
You can add -1 without using retain as follows:
data objective;
set eg;
count + -1;
by id descending age;
if first.id then count = 10;
run;
Try this (see comments in code for explanation):
data objective ;
retain count 10; /*retain the count value across observations; the initial value 10 is optional*/
set eg ;
count=count - 1 ; /*a bare count - 1 statement is not valid, but count=count-1 works because count is retained*/
by id age notsorted;/*notsorted because age is ordered descending*/
if first.id then count = 10 ;/*not sure why you had age_ar here; it should be id to get your desired output*/
run ;
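Both answers rely on the same rule: reset the counter at the first row of each id and otherwise subtract 1. A small Python sketch of that rule, assuming the rows are already grouped by id (as BY processing requires); `reverse_count` is a hypothetical name.

```python
def reverse_count(ids, start=10):
    """Count down from `start` within each run of equal ids.

    Assumes the rows are already grouped by id, as BY-group processing requires.
    """
    counts = []
    previous = object()  # sentinel that never equals a real id
    for current in ids:
        if current != previous:  # first.id
            count = start
        else:
            count -= 1           # same effect as the sum statement: count + -1;
        counts.append(count)
        previous = current
    return counts
```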

Copying the previous column value in SAS

I am trying to copy the value from the previous column to the present column if there is a missing value, but there is something wrong in the code I wrote.
data X;
input A B C D E;
cards;
1 . . . 2
2 2 3 . .
3 3 4 5 6
4 4 4 2 .
. . 6 . .
;
run;
Program
data Y;
set x;
array arr(5) a--e;
array brr(4) b--e;
do j=1 to dim(arr);
do i =2 to dim(brr);
if brr(i)=. then brr(i)=arr(j);
end;
end;
drop i j;
run;
However the output that I get is
1 . 1 1 2
2 2 3 2 2
3 3 4 5 6
4 4 4 2 4
. . 6 6 6
Which is wrong!
The output I want is like this:
1 1 1 1 2
2 2 3 3 3
3 3 4 5 6
4 4 4 2 4
. . 6 6 6
What is wrong with the code?
Do you want 4 4 4 2 2 instead of 4 4 4 2 4?
You need only one loop:
Try this code:
data Y;
set x;
array arr(5) a--e;
do i=2 to dim(arr);
if arr(i)=. then arr(i)=arr(i-1);
end;
drop i;
run;
Also, don't forget to think about what is happening in this code!
For every row and every i, you could check:
what is the arr(i) value?
what is the arr(i-1) value?
is the outcome what is expected? (Convince yourself that the problem is solved :) )
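The single-loop rule (copy arr(i-1) into arr(i) when arr(i) is missing) can be checked against the example rows with a short Python sketch; `None` stands in for a SAS missing value and `fill_from_left` is a hypothetical name. With this rule the fourth row comes out as 4 4 4 2 2.

```python
def fill_from_left(row):
    """Replace each missing value (None) with the value to its left.

    Mirrors the single DO loop: arr(i) = arr(i-1) when arr(i) is missing.
    Because the loop runs left to right, a filled value can itself be
    copied further right, so runs of missing values are filled too.
    """
    filled = list(row)
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    return filled
```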

How to select the 5 minimum values with SAS Proc IML?

I would like to know whether it's possible to select the 5 smallest or largest values of each row with IML.
This is my code :
Proc iml ;
use table;
read all var {&varlist} into matrix ;
n=nrow(matrix) ; /* n=369 here*/
p=ncol(matrix); /* p=38 here*/
test=J(n,5,.) ;
Do i=1 to n ;
test[i,1]=MIN(matrix[i,]);
End;
Quit ;
So I would like to obtain a matrix test whose 1st column contains the minimum value of the row, whose 2nd column contains the minimum value of the row EXCLUDING that first value, and so on.
Any ideas? :)
Even if it's not with IML (e.g., Base SAS or PROC SQL).
For example:
Data test; input x1-x8 ; cards;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;run;
And I would like to obtain the results sorted by row:
1 2 3 4 6 7 8 9
-2 1 2 3 4 7 9 12
in order to put my 5 smallest values in another table:
y1 y2 y3 y4 y5
1 2 3 4 6
-2 1 2 3 4
Read the article "Compute the kth smallest data value in SAS"
Define the modules as in the article. Then use the following:
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
x = have`; /* transpose */
ord = j(5,ncol(x));
do j = 1 to ncol(x);
ord[,j] = ordinal(1:5, x[,j]);
end;
print ord;
If you have missing values in your data and want to exclude them, use the SMALLEST module instead of the ORDINAL module.
You can use call sort() in PROC IML to sort a column. Because you want to sort each column on its own rather than sorting the whole matrix, extract the column, sort it, and then put it back into the original.
You want to sort rows, so transpose your matrix, do the sorting, and then transpose back.
proc iml;
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
print have;
n = nrow(have);
have = have`; /*Transpose because sort works on columns*/
do i=1 to n;
tmp = have[,i];
call sort(tmp,1);
have[,i]=tmp;
end;
have = have`;
want = have[,1:5];
print want;
quit;
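For comparison outside IML, the same result (the 5 smallest values of each row, in ascending order) can be sketched in Python with `heapq.nsmallest`, a swapped-in technique rather than the article's modules or CALL SORT. `None` is assumed to mark a missing value, which the sketch drops; `k_smallest_per_row` is a hypothetical name.

```python
import heapq


def k_smallest_per_row(matrix, k=5):
    """Return the k smallest non-missing values of each row, in ascending order.

    Missing values (None) are dropped first; rows with fewer than k
    values are padded with None.
    """
    result = []
    for row in matrix:
        present = [v for v in row if v is not None]
        smallest = heapq.nsmallest(k, present)  # already sorted ascending
        result.append(smallest + [None] * (k - len(smallest)))
    return result
```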

SAS Translate cell count to specific values

I have a data set that has a person's name and how many times they scored each value from 1 to 10. For example, Bob scored 7 1s, 8 2s, and 7 3s, but did not receive any other scores.
Name 1 2 3 4 5 6 7 8 9 10
Bob 7 8 7 0 0 0 0 0 0 0
Hal 9 3 1 0 0 0 0 0 0 0
I want a data set with one row per person that looks like this:
Bob 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
Hal 1 1 1 1 1 1 1 1 1 2 2 2 3
I'm doing this in SAS by the way.
I know I can write a macro to create variables named score1, score2, ..., scoreN.
I am having trouble populating the cells. Any help would be appreciated. Thanks.
Restructuring a dataset like this is sometimes easier with PROC TRANSPOSE:
data have;
input Name $ v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
datalines;
Bob 7 8 7 0 0 0 0 0 0 0
;
run;
/*convert original wide dataset into long one*/
proc transpose data=have out=have_long;
var v:;
by Name;
run;
data want;
set have_long;
substr(_NAME_,1,1)=""; *remove the leading v from the variable names;
do i=1 to COL1;
new_var=_NAME_;
output;
end;
drop _NAME_ COL1 i;
run;
/*convert back to wide dataset*/
proc transpose data=want out=want(drop=_NAME_);
var new_var;
by Name;
run;
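The expansion step, where each count becomes that many copies of its score, is easy to see in a Python sketch. `expand_counts` and `to_wide` are hypothetical names, and padding shorter rows with `None` is an assumption meant to resemble the missing values PROC TRANSPOSE leaves in unused columns.

```python
def expand_counts(counts):
    """Turn per-score counts into a flat list of scores.

    counts[0] is how many 1s, counts[1] how many 2s, and so on.
    """
    scores = []
    for score, n in enumerate(counts, start=1):
        scores.extend([score] * n)
    return scores


def to_wide(people):
    """Expand every person and pad with None to a common width."""
    expanded = {name: expand_counts(c) for name, c in people.items()}
    width = max(len(v) for v in expanded.values())
    return {name: v + [None] * (width - len(v)) for name, v in expanded.items()}
```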

Aggregating Using Proc SQL

Suppose I have a dataset of the form:
A B C
1 3 5
1 4 8
1 3 3
2 2 2
2 7 6
2 3 3
3 4 4
3 4 7
3 2 8
Now I want to take the weighted average of C, with B as the weight, within each value of A. For example, for A=1 I want the weighted average (3*5+4*8+3*3)/(3+4+3) = 56/10 = 5.6, and the same for the other two values of A. So, finally, the table looks like the following:
A B C D
1 3 7 5.6
2 6 6 7
3 5 9 8.2
Thank you.
Just to provide an alternative approach, you can use the WEIGHT= option of the VAR statement in PROC SUMMARY to achieve the same result. The only thing I'm not clear on from your example final table is where the values of columns B and C come from (I've left these out of my solution below).
proc summary data=test nway;
class a;
var c / weight=b;
output out=agg2 (drop=_:) mean=d;
run;
You can find the solution below. I am curious about your result: for A=2, the weighted average should be (2*2+7*6+3*3)/(2+7+3) = 55/12, about 4.58. Why do you have 7 here?
data test;
input a b c ;
datalines;
1 3 5
1 4 8
1 3 3
2 2 2
2 7 6
2 3 3
3 4 4
3 4 7
3 2 8
;
run;
proc sql;
create table agg as
select a, b, c, sum(b*c)/sum(b) as d from test
group by a;
quit;
proc sort data=agg nodupkey;
by a d;
run;
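Both answers compute D as sum(b*c)/sum(b) within each value of A. A Python sketch of that formula on the example data, assuming the rows are (a, b, c) tuples; `weighted_mean_by_group` is a hypothetical name.

```python
from collections import defaultdict


def weighted_mean_by_group(rows):
    """sum(b*c) / sum(b) for each value of a, i.e. the mean of c weighted by b."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for a, b, c in rows:
        numerator[a] += b * c
        denominator[a] += b
    return {a: numerator[a] / denominator[a] for a in numerator}
```

On the example data this gives 5.6 for A=1, 55/12 (about 4.58) for A=2, and 6 for A=3.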