Ranking within rows of a data set [sas]

Suppose I've got a data set with n rows and p columns such that each entry in the data set contains a real number. I am looking for a way to rank the p columns within each row. The output of this ranking should be a length-p vector of ranks that accounts for ties.
So, let's say my data set has 5 columns. The first row could be something like row 1 = {10, 13, 3, 3, -4}. I'd like to perform some operations on this row and in the end get back the result row 1 ranks = {3, 4, 2, 2, 1}. The second row could be something like row 2 = {8, 3, -6, 5, 2} and the result on this row should be row 2 ranks = {5, 3, 1, 4, 2}.
Is this functionality implemented in SAS? I've generated code that doesn't account for ties, but they occur often enough that it would take an unreasonable amount of time to correct the row rankings that were done incorrectly.

Interesting question; here is one possible solution:
data have;
p1=10; p2=13; p3=3; p4=3; p5=-4; output;
p1=8; p2=3; p3=-6; p4=5; p5=2; output;
run;
data want;
set have;
array p(*) p1-p5;
array c(*) c1-c5;
array r(*) r1-r5;
/* Copy vector to temp array and sort */
do i=1 to dim(p);
c(i) = p(i);
end;
call sortn(of c(*));
/* Search new sorted array for the original position */
do i=1 to dim(c);
if i = 1 then rank=1;
else if c(i) ne c(i-1) then rank + 1;
do j=1 to dim(p);
if p(j) = c(i) then do;
r(j) = rank;
end;
end;
end;
/* PUT statement to see result in log */
put +3 p(*)
/ +3 c(*)
/ +3 r(*);
drop i j rank c1-c5;
run;

Sounds to me like you'll need several arrays to do this.
Array 1: Array to store the ranks
Array 2: Array to sort the values
Array 3: The original un-altered data
I don't have time right now to write the code, but using something like this would do a lot of the heavy lifting:
http://support.sas.com/kb/24/754.html

Might as well add this, even though OP said he doesn't use IML, in case others searching for this find it useful. IML is really the easiest way to solve this problem, since it's fundamentally a vector/matrix problem...
proc iml;
p={10 13 3 3 -4, 5 6 5 2 3};
r=j(2,5,.);
print p r;
do i = 1 to nrow(p);
r[i,]=ranktie(p[i,]);
end;
print p r;
quit;
It does treat ties slightly differently from the OP, and thus would need some work to make it exactly like the solution requested. In general, though, 1,2.5,2.5,4,5 [or 1,2,2,4,5] is probably what you really want, not 1,2,2,3,4: 4 and 5 should stay 4 and 5, not move up to 3 and 4, when 2 and 3 tie.
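If the consecutive ("dense") ranking from the question is what you need, RANKTIE accepts an optional second argument that selects the tie method. The sketch below assumes a SAS/IML release recent enough to support the "Dense" method, and uses the two rows from the question as data:

```sas
proc iml;
/* the two example rows from the question */
p = {10 13 3 3 -4,
      8  3 -6 5  2};
r = j(nrow(p), ncol(p), .);
do i = 1 to nrow(p);
   /* "Dense": tied values share a rank and the ranks stay consecutive */
   r[i,] = ranktie(p[i,], "Dense");
end;
print p r;
quit;
```

With dense ties the two rows should come back as 3 4 2 2 1 and 5 3 1 4 2, which is the output the question asks for.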

Just for fun, given the OP's answer to wanting a new dataset with ranks, here's the PROC RANK method. Probably not faster than a data step, but perhaps simpler and easier to use in multiple situations, and with the added advantage that you can't really make a mistake in the coding (without it actually crashing).
data have;
input id x1-x5;
datalines;
1 10 13 3 3 -4
2 5 6 5 2 3
;;;;
run;
proc transpose data=have out=temp;
by id;
var x1-x5;
run;
proc rank data=temp out=temprank;
var col1;
by id;
run;
proc transpose data=temprank out=want(drop=_name_ _label_);
by id;
var col1;
id _name_;
run;
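PROC RANK also has a TIES= option, so the same three steps can produce the consecutive ranks from the question instead of the default mean ranks. A sketch using the question's two rows (the data set and variable names follow the example above):

```sas
data have;
input id x1-x5;
datalines;
1 10 13 3 3 -4
2 8 3 -6 5 2
;
run;

/* one observation per (id, column) */
proc transpose data=have out=temp;
by id;
var x1-x5;
run;

/* ties=dense: tied values share a rank, and no rank is skipped */
proc rank data=temp out=temprank ties=dense;
by id;
var col1;
run;

/* back to one row per id; x1-x5 now hold the ranks */
proc transpose data=temprank out=want(drop=_name_);
by id;
var col1;
id _name_;
run;
```

Other values of TIES= (LOW, HIGH, MEAN) give the alternative conventions discussed in the IML answer.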

Related

using some condition to search target data then replace them with a sequence

I'd like to search a column in a dataset based on some condition, then replace the values found with a sequence in the range [0,1] with 1/n as the increment (n is the number of values that meet the condition). For example, search for odd numbers in column j in the Test dataset below, then replace '3, 5, 7, 9, 11' with '0.2, 0.4, 0.6, 0.8, 1.0'.
data test;
do i=1 to 10 by 1;
j=i+1;
output;
end;
run;
Many thanks in advance
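A sum statement with a conditional increment would work, but the step size 1/n needs n first, so count the matching rows in a first pass (note the condition is on j and the data set is test):
data want ;
/* first pass: count the rows that meet the condition */
if _n_ = 1 then do until (eof) ;
set test end=eof ;
if mod(j,2) = 1 then n + 1 ; /* odd numbers */
end ;
/* second pass: read the rows again and apply the sequence */
set test ;
if mod(j,2) = 1 then do ;
condition_met + 1 ;
j = condition_met / n ; /* 1/n, 2/n, ..., 1 */
end ;
drop n condition_met ;
run ;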

Finding the largest values in SAS (Top 3)

I am new to SAS and have a small problem. I am trying to pick the 3 largest values from a substring of the file names in a directory. Why can't I do that?
parent=directory
data files_and_folders;
keep num;
did=dopen("parent");
if dnum(did)>3 then do;
do i=1 to dnum(did);
names=int(substr(dread(did,i),9,8));
num=largest(3,names);
output;
end;
end;
output;
run;
returns
names:
20160322
20160323
20160324
20160325
20160325
but returns null value for num
thanks for help
Your variable names is a single number.
largest gives you the largest value in a list of values.
e.g.
k=1
n=largest(k, 1, 2, 3, 4);
result: n = 4
k=2
n=largest(k, 1, 2, 3, 4);
result: n = 3
You are trying to get the third largest value out of a list of one. That results in a missing value.
You need to output the whole list of files, sort it by descending names, and then limit it to the first three observations, e.g. with obs=3 in a set statement.
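A minimal sketch of that approach. The directory path is a placeholder, and the code assumes, as in the question, that characters 9 to 16 of each file name hold the number:

```sas
/* hypothetical path: point the fileref at your own directory */
filename parent "/path/to/directory";

data files;
keep names;
did = dopen("parent");
if did > 0 then do;
   do i = 1 to dnum(did);
      /* ?? suppresses notes for names that do not contain a number there */
      names = input(substr(dread(did, i), 9, 8), ?? 8.);
      output;
   end;
   rc = dclose(did);
end;
run;

proc sort data=files;
by descending names;
run;

/* the first three rows after sorting are the three largest values */
data top3;
set files(obs=3);
run;
```

If duplicates such as the repeated 20160325 should count only once, adding NODUPKEY to the PROC SORT statement is one way to drop them.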

Get a subset of a data set based on SRS of the unique values of a variable in SAS

I have a data set with two variables x and y. x has four distinct values 1, 2, 3, and 4. I want to first take a simple random sample of size 2 from these 4 unique values and keep the corresponding rows.
Say, I get the SRS of 1, 2, then I will keep the first 7 rows as a new data set. If I get a SRS of 2 and 3, then I will keep the fifth to the eighth rows. Here is a simple example to start with. Thank you.
data dataone;
input x y;
datalines;
1 3
1 4
1 5
1 8
2 3
2 7
2 9
3 2
4 8
4 5
;
run;
You can do this one of two ways.
PROC SURVEYSELECT will do the work for you, if you do it in two steps: first give it a dataset of just X unique values, then merge to another dataset that has all values.
Alternately, you can do this in a datastep, where you first determine if you're going to take that X value, then take/not take the rows.
data want;
set dataone; *the data set from the question, already sorted by x;
by x;
call streaminit(7);
retain need 2; *need 2 total;
retain have 4; *have 4 total - this could be determined programmatically.;
retain keep; *and these three could all be one retain statement, this is just more readable;
if first.x then do;
if need/have ge rand('Uniform') then keep=1;
else keep=0;
need + -keep;
have + -1;
put need= have=;
end;
if keep;
run;
This is sequential selection sampling, a close relative of Reservoir Sampling. It gives the same results as an SRS, even though the odds of taking any one X appear to be different: the ratio need/have adjusts as values are taken or skipped.
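The two-step PROC SURVEYSELECT route mentioned first could look roughly like this. It assumes dataone from the question exists and is sorted by x; the names xvals and picked and the seed are arbitrary choices for the sketch:

```sas
/* step 1: one row per distinct value of x */
proc sort data=dataone(keep=x) out=xvals nodupkey;
by x;
run;

/* step 2: simple random sample of 2 of those values */
proc surveyselect data=xvals out=picked method=srs n=2 seed=7;
run;

/* step 3: keep only the rows whose x was sampled */
data want;
merge dataone(in=a) picked(in=b);
by x;
if a and b;
run;
```

Because the sampling is done on the distinct values rather than on the rows, every x has the same chance of selection no matter how many rows it has.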

Populate the matrix proc iml SAS

I have a matrix x with two columns (c1, c2). I want to keep the first column (c1) fixed and add 10 columns to x, each holding the values C2+m, where m is a random integer. In the end the matrix should be:
C1, C2+m, C2+m, C2+m...C2+m;
CODE:
proc iml;
use nonpar;
read all var{treat response} into x;
do i=1 to 10;
call randseed(123);
call randgen(u, "Uniform");
Max = 300; Min = 68;
m = min + floor( (1+Max-Min)*u );
x = x[,1]||x[,2]+m;
end;
quit;
Can someone help me fix that..
Thanks
Couple of things that should lead you in the right direction.
First, pre-create your full destination matrix; don't concatenate constantly. So, once you read the dataset into x, make another matrix x_new that has the same number of rows as x but 11 columns. j will do this for you.
Second, you can make all of your random numbers at once, but you have to initially define the size of the matrix to fill, again using j. This is assuming you want a new random integer for each of the 10 columns AND each of the rows; if you want just each of the rows or just one 'm' in total you need to do this differently, but you need to clarify that. If you just want one row of 10 m's, then you can do that first (generate a u that has 10 columns 1 row) then expand that to the full number of rows of x using matrix multiplication.
Here's a simplified example using SASHELP.CLASS showing these two concepts at work.
proc iml;
use sashelp.class;
read all var {age weight} into x;
x_new = j(nrow(x),11); *making the new matrix with lots of columns;
x_new[,1] = x[,1]; *copy in column 1;
call randseed(123);
u = j(nrow(x),10); *make the to be filled random matrix;
call randgen(u,'Uniform',68,300); *the min/max parameters can go here;
u = floor(u+0.5); *as Rick noted in comments, needed to get the max value properly;
x_new[,2:11] = u[,1:10] + x[,2]; *populate x_new here;
print x_new;
quit;
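For the other case described above, a single m per new column shared by every row, a sketch might look like this. It reuses SASHELP.CLASS and the Min/Max of 68 and 300 from the question; m_full is a name made up for the example:

```sas
proc iml;
use sashelp.class;
read all var {age weight} into x;
close sashelp.class;

call randseed(123);
u = j(1, 10);                         /* one row, ten columns */
call randgen(u, "Uniform");           /* U(0,1) draws */
m = 68 + floor((1 + 300 - 68) * u);   /* integer formula from the question */

/* expand the 1 x 10 row to nrow(x) x 10 by matrix multiplication */
m_full = j(nrow(x), 1, 1) * m;

x_new = x[,1] || (x[,2] + m_full);    /* column 1, then ten shifted copies of column 2 */
print x_new;
quit;
```

Multiplying a column of ones by the row vector m repeats that row once per observation, so every row gets the same ten offsets.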

How to output a random set of observations from a SAS data set

I have a data set that selects random numbers from a uniform distribution. How do you only output those row indices? I basically want to select a random set of rows from a SAS data set.
data Unif(keep=u x k n m);
call streaminit(123);
a = -1; b = 1;
Min = 1; Max = 28000000;
do i = 1 to &NObs;
u = rand("Uniform"); /* U[0,1] */
x = a + (b-a)*u; /* U[a,b] */
k = ceil( Max*u ); /* uniform integer in 1..Max */
n = floor( (1+Max)*u ); /* uniform integer in 0..Max */
m = min + floor((1+Max-Min)*u); /* uniform integer in Min..Max */
output;
end;
keep k
run;
*not sure about this part;
data final;
set final;
where obs in (k);
run;
The best way to do this is to use PROC SURVEYSELECT.
proc surveyselect data=final out=selected seed=123 n=10;
run;
Or something along those lines depending on how you want to run it - the documentation has a lot of detail on the various options for how to perform the sampling.
If you want to do it in the datastep, you need to be running the code from Unif inside the second datastep, in some fashion. I don't entirely follow what it's trying to do; if that's a form of k/n sampling, search 'SAS k/n sampling' and you'll find lots out there as it's a common question, but the general approach is
data final_selected;
set final;
... code to determine if it should be selected...
if (condition); *subsetting if;
run;
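As one concrete version of that skeleton, here is a sketch of k/n sampling in a single data step. It assumes the data set final exists and borrows the sample size of 10 and the seed from the PROC SURVEYSELECT example; the variables need, left and pick are invented for the illustration:

```sas
data final_selected;
set final nobs=total;          /* total = number of rows in final */
retain need 10 left;           /* need = rows still to select */
if _n_ = 1 then do;
   call streaminit(123);
   left = total;               /* rows not yet examined */
end;
/* take this row with probability need/left */
pick = (rand("Uniform") < need / left);
need = need - pick;
left = left - 1;
if pick;                       /* subsetting if */
drop need left pick;
run;
```

Each row is accepted with probability (rows still needed)/(rows still left), which yields exactly 10 rows when final has at least 10.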