I am having a problem with a dataset that looks like the one below. It is an inventory count by item, location, and week:
data have;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 5
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 6
4 x 1 30
4 x 2 0
4 x 3 40
4 x 4 10
;
run;
Here is the issue: once the inventory hits 0 for a specific item/location combination, I want all remaining weeks for that combination to be imputed with 0. My desired output looks like this:
data want;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 0
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 0
4 x 1 30
4 x 2 0
4 x 3 0
4 x 4 0
;
run;
I'm fairly new to SAS and don't know how to do this. Help?!
Thank you!
You can do that in the following steps:
a BY statement to indicate the order (the input dataset must be sorted accordingly)
a RETAIN statement to pass the value of a control variable (reset) on to the following rows
deactivate the imputation (reset=0) on the first row of every item/location combination
activate the imputation (reset=1) when inv is zero
set inv to 0 if the imputation is active
Code:
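data want (drop=reset);
set have;
by itm location week;
retain reset;                     /* keep reset from one row to the next */
if first.location then reset=0;   /* new item/location: imputation off */
if (inv = 0) then reset=1;        /* inventory hit 0: imputation on */
else if (reset = 1) then inv=0;   /* after the first 0: force 0 */
run;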
The value of reset remains constant from row to row until explicitly modified.
The presence of the variable week in the by statement is only to check that the input data is chronologically sorted.
The following uses PROC SQL to produce the wanted result. I have commented inline on why I do the different steps.
proc sql;
/* First of all, fetch all observations where the inventory is depleted. */
create table work.zero_inv as
select *, min(week) as min_zero_inv_week
from work.have where inv = 0
/* If a combination has several zero-inventory weeks, the join below would produce duplicates without the following GROUP BY/HAVING, which keeps only the first zero week per item/location. Add a second zero week to your HAVE data set and remove these two lines to see the difference. */
group by itm, location
having (calculated min_zero_inv_week) = week;
create table work.want_calculated as
/* Since we have fetched the first zero week of each combination, we can use a left join and only update the weeks that follow it, leaving the inventory untouched for the rest. */
select t1.itm, t1.location, t1.week,
/* Because of the left join, t2.itm is missing when the item never reaches zero inventory at the given location. */
case when t2.itm is missing or t1.week <= t2.week then t1.inv else t2.inv end as inv
from work.have as t1
left join work.zero_inv as t2
on t1.itm = t2.itm and t1.location = t2.location
/* PROC SQL does not promise to keep the order of your dataset, therefore it is good to sort it. */
order by t1.itm, t1.location, t1.week;
quit;
proc compare base=work.want compare=work.want_calculated;
title 'Hopefully no differences';
run;
Remember that Stack Overflow isn't a "give me the code" forum; it is customary to try some solutions yourself first. I'll cut you some slack since this is your first question. Welcome to SO :D.
I'm a beginner in SAS and I'm stuck on the following:
I have a table (let's call it table1) that contains 100 samples associated with two variables X and Y:
Sample X Y
1 8 7
1 3 4
1 11 11
2 14 2
2 14 2
2 17 -2
... ... ...
I'd like to create a new table (table2) that contains for each sample the mean of X (I must use proc means).
So the result must be something like this:
table2
Can you help me, please?
Thank you in advance,
Larapa
PS: every sample has the same size (3).
The documentation covers the operation of Proc MEANS in great detail.
For starters, try this example:
data have;
input id x y;
datalines;
1 8 7
1 3 4
1 11 11
2 14 2
2 14 2
2 17 -2
;
proc means nway noprint data=have;
by id;
var x;
output out=want(keep=id mean_x) mean=mean_x;
run;
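PROC MEANS can also group with a CLASS statement, which does not require the data to be sorted by ID first. A minimal sketch, reusing the example data; WANT_CLASS is a hypothetical output name. NWAY keeps only the per-ID rows (without it the output also contains an overall row with _TYPE_=0), and DROP= removes the automatic _TYPE_ and _FREQ_ columns.

```sas
data have;
  input id x y;
  datalines;
1 8 7
1 3 4
1 11 11
2 14 2
2 14 2
2 17 -2
;
run;

/* CLASS groups by ID without a prior PROC SORT;
   NWAY keeps only the ID-level summaries */
proc means data=have nway noprint;
  class id;
  var x;
  output out=want_class(drop=_type_ _freq_) mean=mean_x;
run;
```

For sample 1 the mean of X is 22/3 and for sample 2 it is 15.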
I answered a SAS question a few minutes ago and realized there is a generalization that might be more useful than that one (here). I didn't see this question already on Stack Overflow.
The general question is: How can you process and keep an entire BY-group based on some characteristic of the BY-group that you might not know until you have looked at all the observations in the group?
Using input data similar to that from the earlier question:
* For some reason, we are tasked with keeping only observations that
* are in groups of ID and ID_2 that contain at least one obs with
* a VALUE of 0.;
* In the following data, the following ID and ID_2 groups should be
* kept:
* A 2 (2 obs)
* B 1 (3 obs)
* B 3 (2 obs)
* B 4 (1 obs)
* The resulting dataset will have 8 observations.;
data x;
input id $ id_2 value;
datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 1 0
B 1 1
B 1 3
B 2 1
B 3 0
B 3 0
B 4 0
C 2 4
;
run;
Double DoW loop solution:
data have;
input id $ id_2 value;
datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 1 0
B 1 1
B 1 3
B 2 1
B 3 0
B 3 0
B 4 0
C 2 4
;
run;
data want;
do _n_ = 1 by 1 until(last.id_2);
set have;
by id id_2;
flag = sum(flag,value=0);
end;
do _n_ = 1 to _n_;
set have;
if flag then output;
end;
drop flag;
run;
I've tested this against the POINT= approach using ~55 million rows and found no appreciable difference in performance. Dataset used:
data have;
do ID = 1 to 10000000;
do id_2 = 1 to ceil(ranuni(1)*10);
do value = floor(ranuni(2) * 5);
output;
end;
end;
end;
run;
My answer might not be the most efficient, especially for large datasets, and I'm interested in seeing other possible answers. Here it is:
* For some reason, we are tasked with keeping only observations that
* are in groups of ID and ID_2 that contain at least one obs with
* a VALUE of 0.;
* In the following data, the following ID and ID_2 groups should be
* kept:
* A 2 (2 obs)
* B 1 (3 obs)
* B 3 (2 obs)
* B 4 (1 obs)
* The resulting dataset will have 8 observations.;
data x;
input id $ id_2 value;
datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 1 0
B 1 1
B 1 3
B 2 1
B 3 0
B 3 0
B 4 0
C 2 4
;
run;
* I realize the data are already sorted, but I think it is better
* not to assume they are.;
proc sort data=x;
by id id_2;
run;
data obstokeep;
keep id id_2 value;
retain startptr haszero;
* This SET statement reads through the dataset in sequence and
* uses the CUROBS option to obtain the observation number. In
* most situations, this will be the same as the _N_ automatic
* variable, but CUROBS is probably safer.;
set x curobs=myptr;
by id id_2;
* When this is the first observation in a BY-group, save the
* current observation number (pointer).
* Also initialize a flag variable that will become 1 if any
* obs contains a VALUE of 0;
* The variables are in a RETAIN statement, so they keep their
* values as the SET statement above is executed for each obs
* in the BY-group.;
if first.id_2
then do;
startptr=myptr;
haszero=0;
end;
* This statement is executed for each observation. We check
* whether VALUE is 0 and, if so, record that fact.;
if value = 0
then haszero=1;
* At the end of the BY-group, we check to see if there were
* any observations with VALUE = 0. If so, we go back using
* another SET statement, re-read them via direct access, and
* write them to the output dataset.
* (Note that if VALUE order is not relevant, you can gain a bit
* more efficiency by writing the current obs first, then going
* back to get the rest.);
if last.id_2 and haszero
then do;
* When LAST and FIRST at the same time, there is only one
* obs, so no need to backtrack, just output and go on.;
if first.id_2
then output obstokeep;
else do;
* Here we assume that the observations are sequential
* (which they will be for a sequential SET statement),
* so we re-read these observations using another SET
* statement with the POINT option for direct access
* starting with the first obs of the by-group (the
* saved pointer) and ending with the current one (the
* current pointer).;
do i=startptr to myptr;
set x point=i;
output obstokeep;
end;
end;
end;
run;
proc sql;
select a.*, b.value from (select distinct id, id_2 from have where value=0) a left join have b
on a.id=b.id and a.id_2=b.id_2;
quit;
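Another option keeps whole BY-groups with a single query: group by the BY variables and filter with HAVING on a summary function, which PROC SQL evaluates per group and remerges onto the detail rows (it writes a NOTE about remerging to the log). A minimal sketch, assuming the HAVE data above; in SAS the comparison (value = 0) evaluates to 1 or 0, so MAX(value = 0) is 1 exactly when the group contains a zero. WANT_SQL is a hypothetical name, and unlike the DATA step approaches this does not guarantee the original row order within a group.

```sas
data have;
  input id $ id_2 value;
  datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 1 0
B 1 1
B 1 3
B 2 1
B 3 0
B 3 0
B 4 0
C 2 4
;
run;

proc sql;
  create table want_sql as
  select *
  from have
  group by id, id_2
  /* keep every row of a group that has at least one VALUE of 0 */
  having max(value = 0) = 1
  order by id, id_2;
quit;
```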
I would like to know if it's possible to select the 5 minimum or maximum values of each row with IML?
This is my code:
Proc iml ;
use table;
read all var {&varlist} into matrix ;
n=nrow(matrix) ; /* n=369 here*/
p=ncol(matrix); /* p=38 here*/
test=J(n,5,.) ;
Do i=1 to n ;
test[i,1]=MIN(matrix[i,]);
End;
Quit ;
So I would like to obtain a matrix test that contains, in the 1st column, the minimum value of the row; in the 2nd column, the minimum value of the row EXCLUDING the 1st value; etc.
If you have any idea! :)
Even if it's not with IML (but with Base SAS, SQL...).
So, for example:
Data test; input x1-x8; cards;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;run;
And I would like to obtain the results sorted by row:
1 2 3 4 6 7 8 9
-2 1 2 3 4 7 9 12
in order to select my 5 minimum values in another table:
y1 y2 y3 y4 y5
1 2 3 4 6
-2 1 2 3 4
Read the article "Compute the kth smallest data value in SAS"
Define the modules as in the article. Then use the following:
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
x = have`; /* transpose */
ord = j(5,ncol(x));
do j = 1 to ncol(x);
ord[,j] = ordinal(1:5, x[,j]);
end;
print ord;
If you have missing values in your data and want to exclude them, use the SMALLEST module instead of the ORDINAL module.
You can use call sort() in PROC IML to sort a column. Because you want to sort each column separately rather than sorting the whole matrix by one column, extract the column, sort it, and then update the original.
You want to sort rows, so transpose your matrix, do the sorting, and then transpose back.
proc iml;
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
print have;
n = nrow(have);
have = have`; /*Transpose because sort works on columns*/
do i=1 to n;
tmp = have[,i];
call sort(tmp,1);
have[,i]=tmp;
end;
have = have`;
want = have[,1:5];
print want;
quit;
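Since the question also allows Base SAS, the SMALLEST and LARGEST functions can pick the k-th smallest or largest value across the variables of each row directly in a DATA step, without sorting. A minimal sketch, assuming the 8-variable TEST dataset from the question; Y1-Y5 hold the five smallest values and Z1-Z5 (hypothetical names) the five largest. Both functions ignore missing values.

```sas
data test;
  input x1-x8;
  datalines;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;
run;

data want(keep=y1-y5 z1-z5);
  set test;
  array x{8} x1-x8;
  array y{5} y1-y5;   /* 5 smallest values, ascending */
  array z{5} z1-z5;   /* 5 largest values, descending */
  do k = 1 to 5;
    y{k} = smallest(k, of x{*});
    z{k} = largest(k, of x{*});
  end;
run;
```

For the two sample rows this gives Y = 1 2 3 4 6 and -2 1 2 3 4, and Z = 9 8 7 6 4 and 12 9 7 4 3.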
Say that I have the following database:
Min Rank Qty
2 1 100
2 2 90
2 3 80
2 4 70
5 1 110
5 2 100
5 3 90
5 4 80
5 5 70
7 1 120
7 2 110
7 3 100
7 4 90
I need to have the database with the continuous values for minutes like this:
Min Rank Qty
2 1 100
2 2 90
2 3 80
2 4 70
3 1 100
3 2 90
3 3 80
3 4 70
4 1 100
4 2 90
4 3 80
4 4 70
5 1 110
5 2 100
5 3 90
5 4 80
5 5 70
6 1 110
6 2 100
6 3 90
6 4 80
6 5 70
7 1 120
7 2 110
7 3 100
7 4 90
How can I do this in SAS? I just need to replicate the previous minute. The number of observations per minute varies: it can be 4, 5, or more.
It is not that hard to imagine code that would do this, the problem is that it quickly starts to look messy.
If your dataset is not too large, you could consider the following approach:
/* We find all gaps. The output dataset is a mapping: for each minute (current_minute), whose data (reference_minute) do we copy to create it */
data MINUTE_MAPPING (keep=current_minute reference_minute);
set YOUR_DATA;
by min;
retain last_minute; *minute of the previous BY-group;
if first.min then do;
/* Find gaps, map them to the last minute of data we have */
if _N_ NE 1 and last_minute+1 < min then do;
do current_minute=last_minute+1 to min-1;
reference_minute=last_minute;
output;
end;
end;
/* For the available data (including the first minute), we map the minute to itself */
reference_minute=min;
current_minute=min;
output;
*update;
last_minute=min;
end;
run;
/* Now we apply our mapping to the data */
*you must use proc sql because it is a many-to-many join, data step merge would give a different outcome;
proc sql;
create table RESULT as
select MM.current_minute as min, YD.rank, YD.qty
from MINUTE_MAPPING as MM
join YOUR_DATA as YD
on (MM.reference_minute=YD.min)
order by MM.current_minute, YD.rank
;
quit;
A more performant approach would involve some trickery with arrays.
But I find this approach a bit more appealing (disclaimer: at first thought); it is quicker for someone else to grasp afterwards (disclaimer again: IMHO).
For good measure, the array approach:
data RESULT (keep=min rank qty);
set YOUR_DATA;
by min;
retain last_minute; *minute of the previous BY-group;
array last_data{5} _TEMPORARY_; *qty by rank for the previous minute, assumes rank runs from 1 to at most 5;
if first.min and _N_ NE 1 and last_minute+1 < min then do; *gap found;
*store data of current record;
curr_min=min;
curr_rank=rank;
curr_qty=qty;
do current_min=last_minute+1 to curr_min-1;
*produce records from array with last available data;
do iter=1 to 5;
min = current_min;
rank = iter;
qty = last_data{iter};
if qty NE . then output; *skip ranks that the previous minute did not have;
end;
end;
*put back values of actual current record before proceeding;
min=curr_min;
rank=curr_rank;
qty=curr_qty;
end;
if first.min then do;
*forget the previous minute, then remember this one;
do iter=1 to 5;
last_data{iter}=.;
end;
last_minute=min;
end;
*insert data for use on later missing minutes;
last_data{rank}=qty;
output; *output actual current data point;
run;
Hope it helps.
Note: I currently have no access to a SAS client, so this code is untested and might contain a couple of typos.
Unless you have an absurd number of observations, I think transposing would make this easy.
I don't have access to sas at the moment so bear with me (I can test it out tomorrow if you can't get it working).
*assumes the input dataset DATA has variables MIN, RANK and QTY and is sorted by MIN;
proc transpose data=data out=data_wide(drop=_NAME_) prefix=obs_;
by min;
id rank;
var qty;
run;
*sort backwards so you can use lag() to fill in the next minute;
proc sort data=data_wide;
by descending min;
run;
data data_wide; set data_wide;
nextminute = lag(min);
run;
proc sort data=data_wide;
by min;
run;
*output until you get to the next minute;
data data_wide; set data_wide;
*the last minute has no next minute, so output it only once;
if nextminute = . then output;
else do until (min ge nextminute);
output;
min = min + 1;
end;
run;
*then you probably want to reverse the transpose;
proc transpose data=data_wide(drop=nextminute)
out=data_narrow(rename=(col1=qty));
by min;
var obs_:;
run;
*clean up the observation number and drop ranks that a minute did not have;
data data_narrow(drop=_NAME_); set data_narrow;
rank = input(substr(_NAME_,5), best.);
if qty ne .;
run;
Again, I can't test this now, but it should work.
Someone else may have a clever solution that avoids the reverse-sort/lag/forward-sort. I feel like I have dealt with this before, but the obvious solution for me right now is to sort backwards in whatever prior sort you do (you can do the transpose with a descending sort, no problem) to save an extra sort.
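One common way to avoid the reverse sort is a look-ahead self-merge: a one-to-one MERGE (no BY statement) of the wide data with a copy of itself starting at the second row puts the next row's MIN on the current row. When the shorter copy runs out, its variables are set to missing, so NEXTMINUTE is missing on the last row. A minimal sketch, assuming the transposed DATA_WIDE is sorted by MIN; the sample rows are typed in by hand from the question's data and FILLED is a hypothetical name.

```sas
/* hypothetical wide dataset, as PROC TRANSPOSE would produce it */
data data_wide;
  input min obs_1-obs_5;
  datalines;
2 100 90 80 70 .
5 110 100 90 80 70
7 120 110 100 90 .
;
run;

data filled(drop=nextminute);
  /* second copy starts at row 2, so NEXTMINUTE is the following MIN */
  merge data_wide
        data_wide(firstobs=2 keep=min rename=(min=nextminute));
  if missing(nextminute) then output;   /* last minute: output once */
  else do until (min ge nextminute);    /* repeat until the next minute */
    output;
    min = min + 1;
  end;
run;
```

FILLED then has one row per minute from 2 to 7, which can be transposed back as before.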
How can we do iteration in a SAS dataset?
For example, I have chosen the first. of a variable
and want to find the occurrence of a particular condition and set a value when it is satisfied.
The SAS DATA step has a built-in loop over observations. You don't have to do anything unless you want to for some reason. For instance, the following generates a random number for each observation:
data one;
set sashelp.class;
rannum = ranuni(0);
run;
If you want to loop over variables, then there are arrays. For example, the following initializes variables, var1 to var10, with random numbers:
data one;
array vars[1:10] var1-var10;
do i = 1 to 10;
vars[i] = ranuni(0);
end;
run;
The first. and last. flags are automatically generated when you read a (sorted) dataset with a SET statement and a BY statement. An example:
proc sort data=sashelp.class out=class;
by age;
run;
data one;
set class;
by age;
first = first.age;
last = last.age;
run;
/* check */
proc print data=one;
run;
/* on lst
Obs Name Age first last
1 Joyce 11 1 0
2 Thomas 11 0 1
3 James 12 1 0
4 Jane 12 0 0
5 John 12 0 0
6 Louise 12 0 0
7 Robert 12 0 1
8 Alice 13 1 0
...
18 William 15 0 1
19 Philip 16 1 1
*/
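To tie FIRST. back to the original question (setting a value once a condition occurs within a group), a retained flag can be reset at FIRST.AGE and switched on when the condition is met; it then stays on for the rest of the group. A minimal sketch; the condition HEIGHT > 60 and the variable FOUND_TALL are hypothetical, chosen only for illustration.

```sas
proc sort data=sashelp.class out=class;
  by age;
run;

data flagged;
  set class;
  by age;
  retain found_tall;                   /* keep the value from row to row */
  if first.age then found_tall = 0;    /* reset at the start of each AGE group */
  if height > 60 then found_tall = 1;  /* condition met: stays 1 for the rest of the group */
run;
```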