I have two data sets whose variable names and layout are exactly the same:
data data1;
input var$ val1 val2 val;
datalines;
A 0 8 8
B 9 8 7
C 7 2 3
;
data data2;
input var$ val1 val2 val;
datalines;
A 0 7 8
B 9 8 7
C 5 2 3
;
I want the arithmetic difference in each numeric cell, and I am looking for an elegant, smart way to get it. The real data set has more variables (columns).
data want;
input var$ val1 val2 val;
datalines;
A 0 1 0
B 0 0 0
C 2 0 0
;
proc compare base=data1 compare=data2 out=diff outdif noprint;
id var;
run;
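One thing to watch with the PROC COMPARE step above: for numeric variables, OUTDIF stores the comparison value minus the base value, so base=data1 compare=data2 yields data2 minus data1, the opposite sign of the wanted table. A minimal sketch that swaps the two roles to get data1 minus data2 (the output name diff12 is only an illustration, and both data sets are assumed to be sorted by var):

```sas
/* OUTDIF writes compare minus base, so data1 goes in COMPARE= */
proc compare base=data2 compare=data1
             out=diff12(drop=_type_ _obs_) outdif noprint;
  id var;   /* match rows on var; it is carried into the output */
run;
```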
Assuming that the data structures are exactly the same and that both datasets have the exact same number of observations in the same relative order, you could do this.
Basically copy the data from the first dataset into a temporary array and then read the data from the second dataset and perform the subtraction.
data want;
array _temp [1000] _temporary_ ;
set data1 ;
array _x _numeric_;
do _n_=1 to dim(_x);
_temp[_n_]=_x[_n_];
end;
set data2 ;
do _n_=1 to dim(_x);
_x[_n_] =_temp[_n_]-_x[_n_];
end;
run;
Make sure the size of the temporary array is large enough. Making it too large will not hurt anything.
You can change the _numeric_ variable list to a more specific list of variables if you don't want to calculate the difference for all of the numeric fields. Any variable not included in the array will have the values read from the second dataset.
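As a sketch of that last point, the variant below restricts the array to val1 and val2 from the example data. The output name want_subset is only an illustration. Because val is not in the array, it simply keeps the value read from data2.

```sas
data want_subset;
  array _temp [1000] _temporary_;
  set data1;
  array _x val1 val2;            /* only these two variables are differenced */
  do _n_ = 1 to dim(_x);
    _temp[_n_] = _x[_n_];        /* remember the data1 values */
  end;
  set data2;                     /* overwrites every variable with data2 values */
  do _n_ = 1 to dim(_x);
    _x[_n_] = _temp[_n_] - _x[_n_];   /* data1 minus data2 */
  end;
run;
```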
Given two simple datasets A and B as follows
DATA A; INPUT X @@;
CARDS;
1 2 3 4
RUN;
DATA B; INPUT Y @@;
CARDS;
1 2
RUN;
I am trying to create two datasets named C and D, one using repeated SET and OUTPUT statements and another using DO loop.
DATA C;
SET B;
K=1; DO; SET A; OUTPUT; END;
K=K+1; DO; SET A; OUTPUT; END;
K=K+1;
RUN;
DATA D;
SET B;
DO K = 1 TO 2;
SET A; OUTPUT;
END;
RUN;
I thought that C and D should be the same as the DO loop is supposed to be repeating those statements as shown in the DATA step for C, but it turns out that they are different.
Dataset C:
Obs Y K X
1 1 1 1
2 1 2 1
3 2 1 2
4 2 2 2
Dataset D:
Obs Y K X
1 1 1 1
2 1 2 2
3 2 1 3
4 2 2 4
Could someone please explain this?
The two SET A statements in the first data step are independent. So on each iteration of the data step they will both read the same observation. So it is as if you ran this step instead.
data c;
set b;
set a;
do k=1 to 2; output; end;
run;
The SET A statement in the second data step will execute twice on each iteration of the data step. So it will read two observations from A for each iteration of the data step.
If you really wanted to do a cross-join you would need to use the point= option so that you could re-read one of the data sets.
data want ;
set b ;
do p=1 to nobs ;
set a point=p nobs=nobs ;
output;
end;
run;
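If a DATA step is not a requirement, the same cross join can also be written in PROC SQL. This is a swapped-in alternative, not the point= technique shown above, and the output name want_sql is only an illustration.

```sas
proc sql;
  create table want_sql as
  select b.y, a.x
  from b, a                /* no join condition: every row of B pairs with every row of A */
  order by b.y, a.x;
quit;
```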
Your table B has two observations, so your data step will only run two iterations:
On every iteration K is set back to 1 by the K=1 assignment. A RETAIN statement would only help if that assignment were removed.
Within one iteration both SET A statements read the same row of A, and each is followed by an OUTPUT, so the row is written twice. That is why the first and second rows of table A each appear twice in the output.
Debugging:
Iteration 1 current view:
Obs Table X
1 A 1
Obs Table Y k
1 B 1 1
Output:
K=1; DO; SET A; OUTPUT; END;
Obs Y K X
1 1 1 1
K=K+1; DO; SET A; OUTPUT; END;
Obs Y K X
2 1 2 1
Iteration 2 current view:
Obs Table X
2 A 2
Obs Table Y k
2 B 2 1
Output:
K=1; DO; SET A; OUTPUT; END;
Obs Y K X
3 2 1 2
K=K+1; DO; SET A; OUTPUT; END;
Obs Y K X
4 2 2 2
I have a row matrix (vector) A and another square matrix B. How can I multiply each row of matrix B with the row matrix A in SAS using proc iml or otherwise?
Let's say
a = {1 2 3}
b =
{2 3 4
1 5 3
5 9 10}
My output c would be:
{2 6 12
1 10 9
5 18 30}
Thanks!
Use the element-wise multiplication operator, # in IML:
proc iml;
a = {1 2 3};
b = {2 3 4,
1 5 3,
5 9 10};
c = a#b;
print c;
quit;
There's of course a non-IML solution, or twenty, though IML as Dom notes is probably easiest. Here are two.
First, get them onto one dataset, where the a dataset is on every row (with some other variable names) - see below. Then, either just do the math (use arrays) or use PROC MEANS or similar to use the a dataset as weights.
data a;
input w_x w_y w_z;
datalines;
1 2 3
;;;;
run;
data b;
input x y z;
id=_n_;
datalines;
2 3 4
1 5 3
5 9 10
;;;;
run;
data b_a;
if _n_=1 then set a;
set b;
*you could just multiply things here if you wanted;
run;
proc means data=b_a;
class id;
types id;
var x/weight=w_x;
var y/weight=w_y;
var z/weight=w_z;
output out=want sum=;
run;
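The "just do the math" route mentioned above can be sketched with two parallel arrays. The output name want_arr and the loop index i are only illustrations. The weights from a are read once and stay in place because variables read with SET are retained.

```sas
data want_arr (drop=i w_x w_y w_z);
  if _n_ = 1 then set a;      /* read the single weights row once */
  set b;
  array v[*] x y z;           /* values from b */
  array w[*] w_x w_y w_z;     /* matching weights from a */
  do i = 1 to dim(v);
    v[i] = v[i] * w[i];       /* element-wise product, column by column */
  end;
run;
```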
I am having a problem with a dataset that looks like the one below. It is an inventory count of different location/weeks:
data have;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 5
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 6
4 x 1 30
4 x 2 0
4 x 3 40
4 x 4 10
;
run;
Here is the issue...once the inventory hits 0 for a specific location/item combination, I want all remaining weeks for that combination to be imputed with 0. My desired output looks like this:
data want;
input itm location $ week inv;
cards;
3 x 1 30
3 x 2 20
3 x 3 0
3 x 4 0
3 y 1 100
3 y 2 90
3 y 3 0
3 y 4 0
4 x 1 30
4 x 2 0
4 x 3 0
4 x 4 0
;
run;
I'm fairly new to SAS and don't know how to do this. Help?!
Thank you!
You can do that in the following steps:
a by statement to declare the sort order (the input dataset must be sorted accordingly)
a retain statement to carry the value of a control variable (reset) over to the following rows
deactivate the imputation (reset=0) on the first row of every item/location combination
activate the imputation (reset=1) when inv is zero
set inv to 0 while the imputation is active
Code:
data want (drop=reset);
set have;
by itm location week;
retain reset;
if first.location then reset=0;
if (inv = 0) then reset=1;
else if (reset = 1) then inv=0;
run;
The value of reset remains constant from row to row until explicitly modified.
The presence of the variable week in the by statement is only to check that the input data is chronologically sorted.
The following will use proc sql to give the wanted result. I have commented inline why I do different steps.
proc sql;
/* First of all fetch all observations where the inventory is depleted.*/
create table work.zero_inv as
select *, min(week) as min_zero_inv_week
from work.have where inv = 0
/* If your example data set had included several zero-inventory weeks, then without the following "commented" code you would get duplicates. I'll leave explaining this as an exercise for you. Just alter your have data set and see the difference.*/
/*group by itm, location
having (calculated min_zero_inv_week) = week*/;
create table work.want_calculated as
/* Since we have fetched all weeks with zero inventories, we can use a left join and only update weeks that follow those with zeros and leave the inventory untouched for the rest.*/
select t1.itm, t1.location, t1.week,
/* Since we use a left join, we can check if the second data set includes any rows at all for the specific item at the given location. */
case when t2.itm is missing or t1.week <= t2.week then t1.inv else t2.inv end as inv
from work.have as t1
left join work.zero_inv as t2
on t1.itm = t2.itm and t1.location = t2.location
/* proc sql does not promise to keep the order in your dataset, therefore it is good to sort it.*/
order by t1.itm, t1.location, t1.week;
quit;
proc compare base=work.want compare=work.want_calculated;
title 'Hopefully no differences';
run;
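For comparison, the grouping idea from the commented-out code can be folded into a single query. This is only a sketch: the names want_sql and first_zero_week are made up for illustration. The inline view keeps one row per item/location, so repeated zero weeks cannot multiply rows in the join.

```sas
proc sql;
  create table work.want_sql as
  select t1.itm, t1.location, t1.week,
         /* zero out every week after the first zero week, if there is one */
         case when t2.first_zero_week is not missing
                   and t1.week > t2.first_zero_week then 0
              else t1.inv
         end as inv
  from work.have as t1
  left join (select itm, location, min(week) as first_zero_week
             from work.have
             where inv = 0
             group by itm, location) as t2
    on t1.itm = t2.itm and t1.location = t2.location
  order by t1.itm, t1.location, t1.week;
quit;
```

The explicit "is not missing" test matters because SAS sorts missing values below every number, so a bare t1.week > t2.first_zero_week would also be true for combinations that never reach zero.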
Remember that Stack Overflow isn't a "give me the code" forum; it is customary to try some solutions yourself first. I'll cut you some slack since this is your first question. Welcome to SO :D.
Say that I have the following database:
Min Rank Qty
2 1 100
2 2 90
2 3 80
2 4 70
5 1 110
5 2 100
5 3 90
5 4 80
5 5 70
7 1 120
7 2 110
7 3 100
7 4 90
I need to have the database with the continuous values for minutes like this:
Min Rank Qty
2 1 100
2 2 90
2 3 80
2 4 70
3 1 100
3 2 90
3 3 80
3 4 70
4 1 100
4 2 90
4 3 80
4 4 70
5 1 110
5 2 100
5 3 90
5 4 80
5 5 70
6 1 110
6 2 100
6 3 90
6 4 80
6 5 70
7 1 120
7 2 110
7 3 100
7 4 90
How can I do this in SAS? I just need to replicate the previous minute. The number of observations per minute varies...it can be 4 or 5 or more.
It is not that hard to imagine code that would do this; the problem is that it quickly starts to look messy.
If your dataset is not too large, you could consider the following approach:
/* We find all gaps. the output dataset is a mapping: the data of which minute (reference_minute) do we need to create each minute of data*/
data MINUTE_MAPPING (keep=current_minute reference_minute);
set YOUR_DATA;
by min;
retain last_minute 2; *set to the first minute you have;
if first.min then do; *no _N_ test here, so the first minute is mapped to itself as well;
/* Find gaps, map them to the last minute of data we have*/
if last_minute+1 < min then do;
do current_minute=last_minute+1 to min-1;
reference_minute=last_minute;
output;
end;
end;
/* For the available data, we map the minute to itself*/
reference_minute=min;
current_minute=min;
output;
*update;
last_minute=min;
end;
run;
/* Now we apply our mapping to the data */
*you must use proc sql because it is a many-to-many join, data step merge would give a different outcome;
proc sql;
create table RESULT as
select MM.current_minute as min, YD.rank, YD.qty
from MINUTE_MAPPING as MM
inner join YOUR_DATA as YD
on (MM.reference_minute=YD.min)
order by MM.current_minute, YD.rank
;
quit;
A faster approach would involve some trickery with arrays.
But I find the approach above a bit more appealing (disclaimer: at first thought), and it is quicker to grasp (disclaimer again: imho) for someone reading it later.
For good measure, the array approach:
data RESULT (keep=min rank qty);
  set YOUR_DATA;
  by min;
  retain last_minute; *minute of the previous BY group;
  array last_data{5} _TEMPORARY_; *assumes at most 5 ranks per minute;
  if first.min then do;
    if _N_ NE 1 and last_minute+1 < min then do; *gap found;
      *store data of current record;
      curr_min=min;
      curr_rank=rank;
      curr_qty=qty;
      do current_minute=last_minute+1 to curr_min-1;
        *produce records from array with last available data;
        do iter=1 to dim(last_data);
          min = current_minute;
          rank = iter;
          qty = last_data{iter};
          if qty NE . then output; *skip ranks the previous minute did not have;
        end;
      end;
      *put back values of actual current record before proceeding;
      min=curr_min;
      rank=curr_rank;
      qty=curr_qty;
    end;
    *a new minute starts: remember it and clear the stored data;
    last_minute=min;
    call missing(of last_data{*});
  end;
  *insert data for use on later missing minutes;
  last_data{rank}=qty;
  output; *output actual current data point;
run;
Hope it helps.
Note: I currently have no access to a SAS client, so this code is untested and might contain a couple of typos.
Unless you have an absurd number of observations, I think transposing would make this easy.
I don't have access to SAS at the moment, so bear with me (I can test it out tomorrow if you can't get it working).
proc transpose data=data out=data_wide prefix=obs_;
by minute;
id rank;
var qty;
run;
*sort backwards so you can use lag() to fill in the next minute;
proc sort data=data_wide;
by descending minute;
run;
data data_wide; set data_wide;
nextminute = lag(minute);
run;
proc sort data=data_wide;
by minute;
run;
*output until you get to the next minute;
data data_wide; set data_wide;
*the last minute has no successor, so output it exactly once;
if nextminute = . then output;
else do until (minute ge nextminute);
output;
minute+1;
end;
run;
*then you probably want to reverse the transpose;
proc transpose data=data_wide(drop=nextminute)
out=data_narrow(rename=(col1=qty));
by minute;
var obs_:; *only the transposed qty columns, not minute itself;
run;
*clean up the observation number;
data data_narrow(drop=_NAME_); set data_narrow;
if qty = . then delete; *minutes with fewer ranks leave empty columns behind;
rank = input(substr(_NAME_,5), 8.);
run;
Again, I can't test this now, but it should work.
Someone else may have a clever solution that makes it so you don't have to reverse-sort/lag/forward-sort. I feel like I have dealt with this before but the obvious solution for me right now is to have it sorted backwards at whatever prior sort you do (you can do the transpose with a descending sort no problem) to save you an extra sort.
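One way to avoid the reverse-sort/lag/forward-sort round trip is a look-ahead merge: read the same data set twice without a BY statement, with the second copy starting at its second observation. The sketch below assumes data_wide is the direct output of the first PROC TRANSPOSE (sorted by minute, with no nextminute variable yet). The name data_filled is made up for illustration, and like the code above it is untested.

```sas
data data_filled (drop=nextminute);
  /* second copy is shifted up by one row, so nextminute is the following minute */
  merge data_wide
        data_wide(firstobs=2 keep=minute rename=(minute=nextminute));
  if nextminute = . then output;         /* last minute: nothing to fill after it */
  else do until (minute ge nextminute);  /* repeat the row until the next minute */
    output;
    minute = minute + 1;
  end;
run;
```

data_filled can then be passed to the second PROC TRANSPOSE in place of data_wide.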
How can we iterate over the observations in a SAS dataset?
For example, I have selected the first. observation of a BY variable.
I want to find the occurrence of a particular condition and set a value when it is satisfied.
The SAS data step has a built-in loop over observations. You don't have to do anything, unless you want to for some reason. For instance, the following generates a random number for each observation:
data one;
set sashelp.class;
rannum = ranuni(0);
run;
If you want to loop over variables, then there are arrays. For example, the following initializes variables, var1 to var10, with random numbers:
data one;
array vars[1:10] var1-var10;
do i = 1 to 10;
vars[i] = ranuni(0);
end;
run;
The first. and last. flags are automatically generated when you set a (sorted) data set with a by statement. An example:
proc sort data=sashelp.class out=class;
by age;
run;
data one;
set class;
by age;
first = first.age;
last = last.age;
run;
/* check */
proc print data=one;
run;
/* on lst
Obs Name Age first last
1 Joyce 11 1 0
2 Thomas 11 0 1
3 James 12 1 0
4 Jane 12 0 0
5 John 12 0 0
6 Louise 12 0 0
7 Robert 12 0 1
8 Alice 13 1 0
...
18 William 15 0 1
19 Philip 16 1 1
*/
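To connect this to the question (finding a condition inside a group and setting a value once it is met), here is a minimal sketch that combines by, first. and retain. The variable seen_tall and the threshold of 60 are made up for illustration. height is a real column of sashelp.class.

```sas
proc sort data=sashelp.class out=class;
  by age;
run;

data flagged;
  set class;
  by age;
  retain seen_tall;                    /* keep the flag from row to row */
  if first.age then seen_tall = 0;     /* reset at the start of each age group */
  if height > 60 then seen_tall = 1;   /* condition met: flag this and later rows */
run;
```

Within each age group, seen_tall stays 0 until the first row with height above 60 and is 1 from that row onward.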