I need help splitting a row into multiple rows when its value is a range such as 1-5. I need 1-5 to count as 5, not as 1, which is what I get when it is counted as a single row.
I have an ID, the value, and the page it belongs to.
For example:
ID Value Page
1 1-5 2
The output I want is something like this:
ID Value Page
1 1 2
1 2 2
1 3 2
1 4 2
1 5 2
I've tried using an IF statement:
IF bioVerdi='1-5' THEN
DO;
..
END;
But I don't know what to put between DO; and END;. Any clues to help me out here?
You need to loop over the values inside your range and OUTPUT the values. The OUTPUT statement causes the Data Step to write a record to the output data set.
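data want;
set have;
if bioVerdi = '1-5' then do;
do value=1 to 5;
output; /* write one row per value in the range */
end;
end;
else output; /* an explicit OUTPUT disables the implicit one, so write the other rows too */
run;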
Here is another solution that is less restricted to the actual value '1-5' given in your example, but would work for any value in the format '1-6', '1-7', '1-100', etc.
*this is the data you gave ;
data have ;
ID = 1 ;
value = '1-5';
page = 2;
run;
data want ;
set have ;
min = input( scan( value, 1, '-' ), best32. ) ; * get the 1st word, delimited by a dash, as a number ;
max = input( scan( value, 2, '-' ), best32. ) ; * get the 2nd word, delimited by a dash, as a number ;
/*loop through the values from min to max, and assign each value as the loop iterates to a new column 'NEWVALUE.' Each time the loop iterates through the next value, output a new line */
do newvalue = min to max ;
output ;
end;
/*drop the old variable 'value' so we can rename the newvalue to it in the next step*/
drop value min max;
/*newvalue was a temporary name, so renaming here to keep the original naming structure*/
rename newvalue = value ;
run;
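The version above assumes every VALUE is a range. A minimal sketch that also accepts a single number (such as '7') or a range that does not start at 1 (such as '10-12'), assuming every value is either one integer or two integers joined by a dash. The extra test rows in HAVE are hypothetical:

```sas
/* Hypothetical test data: a range, a single value, and a range not starting at 1 */
data have;
  length value $20;
  input id value $ page;
  datalines;
1 1-5 2
2 7 3
3 10-12 4
;

data want;
  set have;
  lo = input(scan(value, 1, '-'), best32.);
  /* a value without a dash has no 2nd word, so reuse the 1st word as the upper bound */
  hi = input(coalescec(scan(value, 2, '-'), scan(value, 1, '-')), best32.);
  do newvalue = lo to hi;
    output;                 /* one row per number in the range */
  end;
  drop value lo hi;         /* DROP uses the old names */
  rename newvalue = value;  /* keep the original column name */
run;
```

A blank VALUE would make LO missing and stop the DO loop with an error, so such rows would need to be filtered out first.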
Related
I have a dataset that contains an ID and some additional data. I want to perform transformations based on the ID with a by statement. The transformation works. Unfortunately SAS automatically reduces the dataset to one row per group. Does anybody know how to keep the original (number of) rows and still perform the group actions?
Here is some sample code to illustrate my problem:
data dat;
input ID X $;
datalines;
1 a
1 b
1 c
1 d
2 a
2 b
3 a
4 k
5 z
5 a
5 c
;
data dat_new;
length x_new $2100.;
do until(last.ID);
set dat;
by ID notsorted;
x_new = ',' ||catx(',',x,x_new);
end;
drop x;
run;
Just add an OUTPUT statement inside the DO loop.
data dat_new;
length x_new $2100.;
do until(last.ID);
set dat;
by ID notsorted;
x_new = ',' ||catx(',',x,x_new);
output;
end;
drop x;
run;
When you do not have an explicit OUTPUT statement in a data step then an implied OUTPUT statement executes at the end of the data step. Your DO loop around the SET statement means that the end of the data step is only reached for the last observation per group.
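A tiny illustration of that rule (the dataset name DEMO is hypothetical):

```sas
data demo;
  do x = 1 to 3;
    output;   /* explicit OUTPUT: one row per loop iteration */
  end;
run;
```

DEMO gets three rows (X = 1, 2, 3). Remove the OUTPUT statement and it gets a single row with X=4, because the implicit OUTPUT only runs once, after the loop has finished.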
If you want the final calculated value to be replicated on each observation, add a second loop that re-reads the observations and put the OUTPUT statement in that loop.
data dat_new;
length x_new $2100.;
do until(last.ID);
set dat;
by ID notsorted;
x_new = ',' ||catx(',',x,x_new);
end;
do until(last.ID);
set dat;
by ID notsorted;
output;
end;
drop x;
run;
When you want to associate a group level computation result to EACH row in the group you will need to first iterate over the group to compute the result, and then have a second loop that reads the same rows of the group and outputs each. Use additional variables if you need to know the sequence number within the group and the total number of rows in the group.
data want(keep=id x_csv_list by_group_size seq);
length x_csv_list $2100.;
do by_group_size = 1 by 1 until(last.ID);
set dat;
by ID notsorted;
x_csv_list = catx(',',x_csv_list,x);
end;
do seq = 1 to by_group_size;
set dat;
output;
end;
run;
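If you only need the group size (not the comma-separated list) on every row, PROC SQL can remerge a summary statistic back onto the detail rows. A sketch, assuming the DAT table from the question (WANT_SQL is a hypothetical name). Note that GROUP BY pools all rows with the same ID even when they are not contiguous, unlike BY ... NOTSORTED, and without ORDER BY the row order is not guaranteed:

```sas
proc sql;
  create table want_sql as
  select id,
         x,
         count(*) as by_group_size  /* SAS remerges this onto every row of the group */
  from dat
  group by id;
quit;
```

The log will contain a NOTE that the query requires remerging summary statistics; that is expected here.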
Also, if you have never quite understood NOTSORTED, remember that with NOTSORTED a BY group is just a run of contiguous rows with the same BY variable values. For example, with
by s notsorted;
the automatic variables are set like this:
s  group  first.s  last.s
-  -----  -------  ------
A  1st    1        0
A  1st    0        0       /* both 0: the row is in the interior of the group */
A  1st    0        1
B  2nd    1        1       /* both 1: the group has only one row */
A  3rd    1        0
A  3rd    0        1
B  4th    1        0
B  4th    0        0
B  4th    0        1
C  5th    1        0
C  5th    0        1
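A small DATA step that should reproduce the table above, so you can experiment with the flags yourself (the dataset names DEMO and FLAGS are hypothetical). FIRST.S and LAST.S are not written to the output dataset, so they are copied into ordinary variables:

```sas
/* The S values from the table, read from a single line */
data demo;
  input s $ @@;
  datalines;
A A A B A A B B B C C
;

data flags;
  set demo;
  by s notsorted;
  group + first.s;     /* sum statement: add 1 each time a new group starts */
  first_s = first.s;   /* copy the automatic flags so they are kept */
  last_s  = last.s;
run;

proc print data=flags noobs;
run;
```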
I have a dataset similar to this. I don't know in advance how many columns or rows I will get, because it is produced by other code, but the first row will always be bucket 0.
DATA MY_data;
INPUT bucket D_201503 D_201504 ;
DATALINES;
0 1000 20500
1 200 6700
2 101 456
3 45 567
;
For example, in this dataset I want every value below 10% of the first row's value in the same column to be set to missing. The first value for D_201503 is 1000 (bucket 0), so 45 should become missing, and the same applies relative to 20500 for D_201504. The dataset is generally not huge, but the code needs to determine the columns and rows itself.
So I should get:
0 1000 20500
1 200 6700
2 101 .
3 . .
I am not sure how I should loop through the dataset and make this condition
DATA MY_data;
INPUT bucket D_201503 D_201504 ;
DATALINES;
0 1000 20500
1 200 6700
2 101 456
3 45 567
;
data want;
set MY_data;
array row(*) _all_;
array _first_row(999); /*any number >= the number of columns of MY_data*/
/*we read the first line and store the values in _first_row array*/
retain _first_row:;
if _n_ = 1 then do i=1 to dim(row);
_first_row(i) = row(i);
end;
/*replacing values <10% of the first row*/
else do i=1 to dim(row);
if upcase(vname(row(i))) ne "BUCKET" and row(i) < 0.1*_first_row(i) then row(i) = .;
end;
drop i _first_row:;
run;
/*Find out how many variables there are (assume we just want all vars prefixed D_)*/
data _null_;
set my_data(obs = 1);
array vars_of_interest(*) D_:;
call symputx("nvars", dim(vars_of_interest));
run;
/*Save bucket 0 values to a temp array, compare each row and set values below 10% to missing*/
data want;
set my_data;
array bucket_0(&nvars) _temporary_;
array vars_of_interest(*) D_:;
do i = 1 to &nvars;
if bucket = 0 then bucket_0[i] = vars_of_interest[i];
else if vars_of_interest[i] < bucket_0[i] / 10 then call missing(vars_of_interest[i]);
end;
drop i;
run;
You need a way to remember the values from the first row (or perhaps from the row where BUCKET=0?) so that you can then compare the value from the first row to the current value. A temporary ARRAY is an easy way to do that.
Assuming BUCKET is always the first numeric variable in your data, you can do something like this.
data want ;
set my_data;
array x _numeric_;
array y(1000) _temporary_;
do i=2 to dim(x);
if bucket=0 then y(i)=x(i);
else if x(i) < 0.1*y(i) then x(i)=.; /* below 10% of the bucket 0 value */
end;
drop i;
run;
If BUCKET is not the first variable, you could add retain bucket; before the SET statement to force it to be first. Alternatively, change the first ARRAY statement to list only the variables you want to process; just remember to change the lower bound of the DO loop to 1.
If you have more than a thousand variables then increase the dimension of the temporary array.
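The property these answers rely on is that elements of a _TEMPORARY_ array are retained across DATA step iterations, so values captured on the first row are still available on later rows. A minimal demo, assuming MY_DATA has exactly the two D_ columns from the example (the dataset name DEMO and the RATIO_ variables are hypothetical):

```sas
data demo;
  set my_data;
  array first_vals(2) _temporary_;  /* assumption: exactly two D_ columns */
  if _n_ = 1 then do;
    /* captured once, on the first (bucket 0) row */
    first_vals(1) = D_201503;
    first_vals(2) = D_201504;
  end;
  /* still available on every later row, with no RETAIN statement */
  ratio_201503 = D_201503 / first_vals(1);
  ratio_201504 = D_201504 / first_vals(2);
run;
```

Any row whose ratio is below 0.1 is one the answers above set to missing.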
I am new to SAS and want to check whether all values of a variable in a dataset satisfy a condition (namely = 1), and return a single dummy value: 1 if all values are 1, or 0 if at least one is not 1.
Any idea how to do it?
IF colvar = 1 THEN dummy_variable = 1
creates a new variable, dummy_variable, with one value per observation, the same size as the original variable.
Thank you
* Generate test data;
data have;
colvar=0;
colvar2=0;
do i=1 to 20;
colvar=round(ranuni(0));
output;
end;
drop i;
run;
* Read the input dataset twice. The first loop counts the
* observations and sets each dummy variable to 1 if the
* corresponding variable has the value 1 in any observation,
* and the second loop writes out the rows. The dummy variables
* remain unchanged during the second loop.;
data want;
_n_=0;
d_colvar=0;
d_colvar2=0;
do until (eof);
set have end=eof;
if colvar = 1
then d_colvar=1;
if colvar2 = 1
then d_colvar2=1;
* etc.... *;
_n_=_n_+1;
end;
do _n_=1 to _n_;
set have;
output;
end;
run;
PROC SQL is a good tool for quickly generating a summary of an arbitrarily defined condition. Your exact condition is not clear, but I think you want the ALL_ONE value in the table that the code below generates. It will be 1 when every observation has COLVAR=1. Any value that is not 1 makes the condition false (0) for that row, so ALL_ONE will then be 0 instead of 1.
You could store the result into a small table.
proc sql ;
create table check_if_one as
select min( colvar=1 ) as all_one
, max( colvar=1 ) as any_one
, max( colvar ne 1 ) as any_not_one
, min( colvar ne 1 ) as all_not_one
from my_table
;
quit;
But you could also just store the value into a macro variable that you could easily use later for some purpose.
proc sql noprint ;
select min( colvar=1 ) into :all_one trimmed from my_table ;
quit;
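The same check can also be done in a DATA step. A sketch, assuming MY_TABLE and COLVAR as above: ALL_ONE starts at 1 and drops to 0 at the first value that is not 1 (a missing value counts as not 1, as in the SQL version), and the result goes into the macro variable ALL_ONE:

```sas
data _null_;
  set my_table end=last;
  retain all_one 1;                 /* assume "all are 1" until a counterexample appears */
  if colvar ne 1 then all_one = 0;  /* any other value, including missing, fails the check */
  if last then call symputx('all_one', all_one);
run;

%put NOTE: all_one=&all_one;
```

If MY_TABLE has no rows, the last row is never reached and the macro variable is not created, so you may want to handle that case separately.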
Maybe a stupid question...
I have the following dataset:
id count
x 1
y 2
z 3
a 1
b 2
c 3
etc.
And I want this:
id count group
x 1 1
y 2 1
z 3 1
a 1 2
b 2 2
c 3 2
etc.
Here is what I tried:
data macro_1; set vix.macro_spy; where macro=1;
count+1;
if count>3 then do;
count=1;
end;
group=0;
if count=1 then group+1;
run;
But it is not working. How can I increment 'group' by one each time I get 'count=1'?
Thanks.
Even simpler:
data want;
set vix.macro_spy;
group+(count=1);
run;
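To see how the one-liner works: GROUP + (COUNT = 1) is a sum statement, so GROUP is automatically retained and starts at 0, and the comparison (COUNT = 1) evaluates to 1 when true and 0 when false. A self-contained run using the rows from the question (HAVE is a stand-in for VIX.MACRO_SPY):

```sas
/* The rows shown in the question */
data have;
  input id $ count;
  datalines;
x 1
y 2
z 3
a 1
b 2
c 3
;

data want;
  set have;
  group + (count = 1);  /* +1 whenever a new 1, 2, 3 cycle starts */
run;
```

GROUP should come out as 1, 1, 1, 2, 2, 2.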
I'm not sure I understand what you need. So you have this dataset ordered so that values of variable count always go 1, 2, 3, 1, 2, 3, 1, 2, 3...
And you want to generate a variable GROUP whose value increments every time COUNT starts over at 1?
If so, you could do something like this:
data group;
set vix.macro_spy;
retain group;
if _N_ = 1 then group = 0;
if count = 1 then group + 1;
run;
This is the general pattern that I'm using.
The if _N_ = 1 part executes only once, so this is where you initialize your variables.
The RETAIN statement ensures that the variable keeps its value from one iteration of the DATA step to the next.
I have a dataset like this (sp is an indicator):
datetime sp
ddmmyy:10:30:00 N
ddmmyy:10:31:00 N
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
ddmmyy:10:34:00 N
And I would like to extract the observations with "Y" plus the one before and the one after each:
datetime sp
ddmmyy:10:31:00 N
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
I tried using LAG and successfully extracted the observations with "Y" and the next one, but I still have no idea how to extract the previous one.
Here is my try:
data surprise_6_step3; set surprise_6_step2;
length lag_sp $1;
lag_sp=lag(sp);
if sp='N' and lag(sp)='N' then delete;
run;
and the result is:
datetime sp
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
Any methods to extract the previous observation also?
Thx for any help.
Try using the POINT= option of the SET statement in a DATA step, like this:
data extract;
set surprise_6_step2 nobs=nobs;
if sp = 'Y' then do;
current = _N_;
prev = current - 1;
next = current + 1;
if prev > 0 then do;
set surprise_6_step2 point = prev;
output;
end;
set surprise_6_step2 point = current;
output;
if next <= nobs then do;
set surprise_6_step2 point = next;
output;
end;
end;
run;
A DATA step that reads a dataset with a SET statement loops through it implicitly.
_N_ is an automatic variable that counts the iterations of that implicit loop (starting from 1), so here it equals the number of the current observation. When you find your value, you store _N_ in the variable current so you know which row it is on. nobs is the total number of observations in the dataset.
Checking that prev is greater than 0 and next is at most nobs avoids an error when the matching row is the first in the dataset (there is no previous row) or the last (there is no next row).
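Another option, sketched here as an alternative rather than a fix to the code above, is the look-ahead MERGE trick: merge the dataset with itself starting at its second row to get the next SP, and use LAG for the previous one. It reads the data sequentially and writes each row at most once, so unlike the POINT= version it does not produce duplicates when two "Y" rows are close together. It is a MERGE without a BY statement, so it assumes the default MERGENOBY=NOWARN system option:

```sas
data extract(drop=prev_sp next_sp);
  merge surprise_6_step2
        surprise_6_step2(firstobs=2 keep=sp rename=(sp=next_sp)); /* next row's SP */
  prev_sp = lag(sp);  /* previous row's SP; LAG must run on every row */
  /* subsetting IF: keep Y rows and their neighbours.
     On the last row NEXT_SP is missing, on the first row PREV_SP is. */
  if sp = 'Y' or prev_sp = 'Y' or next_sp = 'Y';
run;
```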
/* generate test data */
data test;
do dt = 1 to 100;
sp = ifc( rand("uniform") > 0.75, "Y", "N" );
output;
end;
run;
proc sql;
create table test2 as
select *,
monotonic() as _n
from test
;
create table test3 ( drop= _n ) as
select a.*
from test2 as a
left join test2 as b
on a._n = b._n + 1
left join test2 as c
on a._n = c._n - 1
where a.sp = "Y"
or b.sp = "Y"
or c.sp = "Y"
order by a._n
quit;