I have a dataset with 10,000 observations. I want to program a variable that iterates through the dataset and counts row numbers as 1, 2, 3, then resets again at 1. So, if the variable was "count" then row 1, count=1, row 2, count=2, row 3, count=3, but row 4, count=1, row 5 count=2, etc. This program is in SAS.
In the data step, you can create a counter variable using the automatic variable _N_ and the MOD function:
counter = mod(_N_-1,3) + 1;
Should give you:
Index Counter
1 1
2 2
3 3
4 1
5 2
6 3
. .
. .
. .
That's pretty easy.
data want;
set have;
count=mod(_N_-1,3)+1;
run;
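If you would rather not depend on _N_, a sum statement can do the same job. The following is only a sketch: it assumes the same input dataset have, that have does not already contain a variable named count, and the output name want_retain is made up for illustration.

```sas
data want_retain;
  set have;
  /* A sum statement initializes count to 0 and retains it across rows */
  count + 1;
  /* Start over once the counter passes 3 */
  if count > 3 then count = 1;
run;
```

This form is easier to adapt if the counter later has to restart within BY groups, since the reset condition is an ordinary IF statement.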
I have a row matrix (vector) A and another square matrix B. How can I multiply each row of matrix B with the row matrix A in SAS using proc iml or otherwise?
Let's say
a = {1 2 3}
b =
{2 3 4
1 5 3
5 9 10}
My output c would be:
{2 6 12
1 10 9
5 18 30}
Thanks!
Use the element-wise multiplication operator, # in IML:
proc iml;
a = {1 2 3};
b = {2 3 4,
1 5 3,
5 9 10};
c = a#b;
print c;
quit;
There's of course a non-IML solution, or twenty, though IML, as Dom notes, is probably easiest. Here are two.
First, get them onto one dataset, where the a dataset is on every row (with some other variable names) - see below. Then, either just do the math (use arrays) or use PROC MEANS or similar to use the a dataset as weights.
data a;
input w_x w_y w_z;
datalines;
1 2 3
;;;;
run;
data b;
input x y z;
id=_n_;
datalines;
2 3 4
1 5 3
5 9 10
;;;;
run;
data b_a;
if _n_=1 then set a;
set b;
*you could just multiply things here if you wanted;
run;
proc means data=b_a;
class id;
types id;
var x/weight=w_x;
var y/weight=w_y;
var z/weight=w_z;
output out=want sum=;
run;
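The "just do the math (use arrays)" route could look roughly like the sketch below. It assumes the datasets a and b exactly as created above; the output dataset want_arrays and the result variables c_x, c_y, c_z are names invented for this example.

```sas
data want_arrays;
  /* Read the single row of weights once; variables read by SET are retained */
  if _n_ = 1 then set a;
  set b;
  array v[3] x y z;          /* values from b */
  array w[3] w_x w_y w_z;    /* weights from a */
  array c[3] c_x c_y c_z;    /* hypothetical output variables */
  do i = 1 to 3;
    c[i] = v[i] * w[i];      /* element-wise product for this row */
  end;
  drop i w_x w_y w_z;
run;
```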
I currently have a health injury data set of scores 0-6, where 0 is no injury and 6 is fatal injury. This is across 6 categorical body region variables. I'm attempting to construct an Abbreviated Injury Scale, where the three highest scores in an observation would be considered for the calculations. How do I filter the three highest in each row in SAS? Below is an example:
ID A B C D E F
1 0 0 0 3 4 0
2 1 2 1 4 0 0
3 0 0 5 0 0 0
4 1 2 1 5 4 0
So in OBS 1, scores 3, 4, and 0 would be used; OBS 2 - 4, 2, and 1; OBS 3 - 5, 0, and 0; OBS 4 - 5, 4, 2.
I've provided code below to do what you asked, and detailed out the steps enough that you should be able to modify it for many options/uses.
Basically, it takes your data, transposes it as Quentin suggested and then uses proc means to output the top 3 observations for each ID.
DATA NEW;
INPUT ID A B C D E F;
CARDS;
1 0 0 0 3 4 0
2 1 2 1 4 0 0
3 0 0 5 0 0 0
4 1 2 1 5 4 0
;
RUN;
PROC TRANSPOSE DATA=NEW OUT=T_OUT(RENAME=(_NAME_ = VARIABLE COL1=VALUES));
BY ID;
VAR A B C D E F;
PROC PRINT DATA=T_OUT;
RUN;
PROC MEANS DATA=T_OUT NOPRINT;
CLASS ID;
TYPES ID;
VAR VALUES;
OUTPUT OUT=TOP3LIST(RENAME=(_FREQ_=RANK VALUES_MEAN=INDEX_CRITERIA))SUM= MEAN=
IDGROUP(MAX(VALUES) OUT[3] (VALUES VARIABLE)=)/AUTOLABEL AUTONAME;
PROC PRINT DATA=TOP3LIST;
RUN;
***THEN YOU CAN MERGE THIS DATA SET TO YOUR ORIGINAL ONE BY ID TO GET YOUR INDEX CRITERIA ADDED TO IT***;
***THE INDEX_CRITERIA IS A MEAN FROM PROC MEANS BEFORE THE KEEPING OF JUST THE TOP3 VALUES***;
DATA FINAL (DROP=_TYPE_ RANK VALUES_Sum VALUES_1 VALUES_2 VALUES_3 VARIABLE_1 VARIABLE_2 VARIABLE_3);
MERGE NEW TOP3LIST;
BY ID;
INDEX_CRITERIA2=SUM(VALUES_1, VALUES_2, VALUES_3)/3; *THIS CRITERIA IS AVERAGE OF THE KEPT 3 VALUES;
PROC PRINT DATA=FINAL;
RUN;
Best regards,
john
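If the transpose is not needed for anything else, a shorter route may be the LARGEST function in a single data step. This is a sketch under the assumption that the dataset NEW from above is the input; the output dataset TOP3_ROWWISE and the variables TOP1-TOP3 are hypothetical names.

```sas
data top3_rowwise;
  set new;
  array s[6] a b c d e f;
  /* LARGEST(k, ...) returns the k-th largest nonmissing argument */
  top1 = largest(1, of s[*]);
  top2 = largest(2, of s[*]);
  top3 = largest(3, of s[*]);
run;
```

For ID 1 this gives 4, 3 and 0, and for ID 4 it gives 5, 4 and 2, which matches the scores listed in the question.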
I am having a problem with a dataset that looks like the one below. It is an inventory count of different location/weeks:
data have;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 5
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 6
4 x 1 30
4 x 2 0
4 x 3 40
4 x 4 10
;
run;
Here is the issue...once the inventory hits 0 for a specific location/item combination, I want all remaining weeks for that combination to be imputed with 0. My desired output looks like this:
data want;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 0
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 0
4 x 1 30
4 x 2 0
4 x 3 0
4 x 4 0
;
run;
I'm fairly new to SAS and don't know how to do this. Help?!
Thank you!
You can do that in the following steps:
by statement to indicate the order (the input dataset must be sorted accordingly)
retain statement to pass the value of a control variable (reset) to the following rows
deactivate the imputation (reset=0) for every first location/item combination
activate the imputation (reset=1) for zero values of inv
set to 0 if the imputation is active
Code:
data want (drop=reset);
set have;
by itm location week;
retain reset;
if first.location then reset=0;
if (inv = 0) then reset=1;
else if (reset = 1) then inv=0;
run;
The value of reset remains constant from row to row until explicitly modified.
The presence of the variable week in the by statement is only to check that the input data is chronologically sorted.
The following will use proc sql to give the wanted result. I have commented inline why I do different steps.
proc sql;
/* First of all, fetch all observations where the inventory is depleted. */
create table work.zero_inv as
select *, min(week) as min_zero_inv_week
from work.have where inv = 0
/* If your example data set had included several zero inventory weeks, without the following "commented" code you would have got duplicates. I'll leave explaining this as an exercise for you. Just alter your have data set and see the difference. */
/*group by itm, location
having (calculated min_zero_inv_week) = week*/;
create table work.want_calculated as
/* Since we have fetched all weeks with zero inventories, we can use a left join and only update weeks that follow those with zeros and leave the inventory untouched for the rest. */
select t1.itm, t1.location, t1.week,
/* Since we use a left join, we can check if the second data set includes any rows at all for the specific item at the given location. */
case when t2.itm is missing or t1.week <= t2.week then t1.inv else t2.inv end as inv
from work.have as t1
left join work.zero_inv as t2
on t1.itm = t2.itm and t1.location = t2.location
/* proc sql does not promise to keep the order in your dataset, therefore it is good to sort it.*/
order by t1.itm, t1.location, t1.week;
quit;
proc compare base=work.want compare=work.want_calculated;
title 'Hopefully no differences';
run;
Remember that Stack Overflow isn't a "give me the code" forum; it is customary to try some solutions by yourself first. I'll cut you some slack since this is your first question. Welcome to SO :D.
I would like to know whether it is possible to select the 5 minimum or maximum values in each row with IML.
This is my code:
Proc iml ;
use table;
read all var {&varlist} into matrix ;
n=nrow(matrix) ; /* n=369 here*/
p=ncol(matrix); /* p=38 here*/
test=J(n,5,.) ;
Do i=1 to n ;
test[i,1]=MIN(matrix[i,]);
End;
Quit ;
So I would like to obtain a matrix test whose first column contains the minimum value of each row, whose second column contains the minimum of the row excluding that first value, and so on.
Any ideas are welcome! :)
Even if it's not with IML (but with SAS: Base, SQL, ...).
So, for example:
Data test; input x1-x8 ; cards;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;run;
And I would like to obtain the results sorted by row:
1 2 3 4 6 7 8 9
-2 1 2 3 4 7 9 12
in order to select my 5 minimum values in another table :
y1 y2 y3 y4 y5
1 2 3 4 6
-2 1 2 3 4
Read the article "Compute the kth smallest data value in SAS"
Define the modules as in the article. Then use the following:
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
x = have`; /* transpose */
ord = j(5,ncol(x));
do j = 1 to ncol(x);
ord[,j] = ordinal(1:5, x[,j]);
end;
print ord;
If you have missing values in your data and want to exclude them, use the SMALLEST module instead of the ORDINAL module.
You can use call sort() in PROC IML to sort a column. Because you want to separate the columns and not sort the whole matrix, extract the column, sort it, and then update the original.
You want to sort rows, so transpose your matrix, do the sorting, and then transpose back.
proc iml;
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
print have;
n = nrow(have);
have = have`; /*Transpose because sort works on columns*/
do i=1 to n;
tmp = have[,i];
call sort(tmp,1);
have[,i]=tmp;
end;
have = have`;
want = have[,1:5];
print want;
quit;
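Since the question also allows Base SAS, here is a sketch outside IML using the SMALLEST function. It assumes a dataset test with exactly eight numeric variables x1-x8, as in the example data; the output dataset want_base is a name made up for this illustration.

```sas
data want_base;
  set test;
  array x[8] x1-x8;
  array y[5] y1-y5;
  do k = 1 to 5;
    /* SMALLEST(k, ...) returns the k-th smallest nonmissing value in the row */
    y[k] = smallest(k, of x[*]);
  end;
  keep y1-y5;
run;
```

For the two example rows this yields 1 2 3 4 6 and -2 1 2 3 4. Swapping in LARGEST would give the five maximum values instead.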
In SAS/IML, I'm doing the following:
matrix = {1 2 3 4 5, 2 3 1 2 3, 8 4 8 1 1};
empty = j(5,5);
do i=1 to 5;
empty[i,] = matrix[1,];
end;
So I want to replace the ith row of "empty" with the first row of matrix, but this code doesn't work. How can I replace entire rows of a matrix like this?
If you are trying to replace every row of 'empty' with matrix[1,], I don't see anything wrong with the code.
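If that is indeed the goal, the loop can also be avoided altogether. A minimal sketch, assuming the same matrix as in the question, uses the IML REPEAT function to stack the first row five times:

```sas
proc iml;
matrix = {1 2 3 4 5, 2 3 1 2 3, 8 4 8 1 1};
/* REPEAT(x, nrow, ncol): tile the 1x5 row 5 times down and once across */
empty = repeat(matrix[1,], 5, 1);
print empty;
quit;
```

The row assignment in the loop, empty[i,] = matrix[1,], works the same way as long as both sides have the same number of columns.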