I have a binary string like '100111111100001111111111000'
It shows as a char variable in SAS.
How can I capture every single change either from 1 to 0 or 0 to 1?
My ideal output would look like this:
type position
1-0 2
0-1 4
1-0 11
0-1 15
1-0 22
I'm stuck on how to write a recursive statement. (I need to process about 20,000 strings at once, and every string could be really long.) I'm thinking I could have
zero=index(string,'0'); one=index(string,'1'); if zero>one then
string=substr(string, zero); else if zero
Is this a right direction? How should I put in a DO LOOP statement?
Thank you very much
Aaron
Seems reasonable to me. Slightly simplified.
do position = 1 to length(String)-1;
if subpad(string,position,2)='10' then do;
... output a row for the 1-0 change ...
end;
else if subpad(string,position,2)='01' then do;
... output a row for the 0-1 change ...
end;
end;
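Fill in whatever you want to output in each branch (I assume something like setting a variable to '1-0' and then calling output;). Note that position here is the index of the first character of the pair, so the change itself lands at position + 1.
I use SUBPAD there mostly out of habit; SUBSTR should work just as well as long as you check the string length properly. The only difference is that SUBPAD won't raise an error if it goes past the end of the string.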
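For reference, the loop above could be filled in roughly as follows. This is only a sketch under a few assumptions: the data set name changes and the helper variables i and pair are made up for illustration, the string is hard-coded instead of read from a data set, and position is reported as the index of the second character of each pair, which is what the desired output in the question appears to use.

```sas
data changes (keep=type position);
  length type $3 pair $2;
  /* hard-coded sample; replace with SET <your data set> for real data */
  string = '100111111100001111111111000';
  do i = 1 to length(string) - 1;
    pair = substr(string, i, 2);          /* two adjacent characters */
    if pair in ('10', '01') then do;
      /* build '1-0' or '0-1' from the two characters */
      type = catx('-', substr(pair, 1, 1), substr(pair, 2, 1));
      position = i + 1;                   /* index where the new value starts */
      output;
    end;
  end;
run;
```

Because the rows are written in scan order, they come out already sorted by position, so no PROC SORT is needed afterwards.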
Please try this code to see if it is what you are looking for.
data have (keep=type pos);
retain type pos;
x = '100111111100001111111111000';
ct01 = count(x,'01');
ct10 = count(x,'10');
pos = 1;
do i =1 to ct01;
pos = find(x,'01',pos)+1;
type='0-1';
output;
end;
pos = 1;
do i =1 to ct10;
pos = find(x,'10',pos)+1;
type='1-0';
output;
end;
run;
proc sort data=have;
by pos;
run;
I have 19 numeric variables, pm25_total2000 to pm25_total2018.
Each person has a starting year between 2013 and 2018; we can call that variable "reqyear".
Now I want to calculate, for each person, the mean over the 10 years ending with the starting year.
For example, if a person has starting year 2015, I want mean(of pm25_total2006-pm25_total2015).
Or if a person has starting year 2013, I want mean(of pm25_total2004-pm25_total2013).
How can I do this?
data _null_;
set scapkon;
reqyear=substr(iCDate,1,4)*1;
call symput('reqy',reqyear);
run;
data scatm;
set scapkon;
/* Mean of the 10 years before the recruitment year */
pm25means=mean(of pm25_total%eval(&reqy.-9)-pm25_total%eval(&reqy.));
run;
%eval(&reqy.-9) resolves to a single constant (the same value for everyone rather than one per person), in my case 2007.
So that doesn't work.
You can compute the mean with a traditional loop.
data want;
set have;
array x x2000-x2018;
call missing(sum, mean, n);
do _n_ = 1 to 10;
v = x ( start - 1999 - _n_ ); /* covers years start-1 down to start-10; use start - 1998 - _n_ to include the start year */
if not missing(v) then do;
sum + v;
n + 1;
end;
end;
if n then mean = sum / n;
run;
If you want to flex your SAS skill, you can use POKE and PEEK concepts to copy a fixed length slice (i.e. a fixed number of array elements) of an array to another array and compute the mean of the slice.
Example:
You will need to add sentinel elements and range checks on start to prevent errors when start-10 < 2000.
data have;
length id start x2000-x2018 8;
do id = 1 to 15;
start = 2013 + mod(id,6);
array x x2000-x2018;
do over x;
x = _n_;
_n_+1;
end;
output;
end;
format x: 5.;
run;
data want;
length id start mean10yrPriorStart 8;
set have;
array x x2000-x2018;
array slice(10) _temporary_;
call pokelong (
peekclong ( addrlong ( x(start-1999-10) ) , 10*8 ) ,
addrlong ( slice (1))
);
mean10yrPriorStart = mean(of slice(*));
run;
Use an array and a loop: index the array by year, accumulate the sum of the values, accumulate the count of non-missing values (to account for any missing years), and divide to obtain the mean.
data want;
set have;
array _pm(2000:2018) pm25_total2000 - pm25_total2018;
do year=reqyear to (reqyear-9) by -1;
*add totals;
total = sum(total, _pm(year));
*add counts;
nyears = sum(nyears,not missing(_pm(year)));
end;
*accounts for possible missing years;
mean = total/nyears;
run;
Note this loop goes in reverse (start year to 9 years previous) because it's slightly easier to understand this way IMO.
If you have no missing values you can remove the nyears step, but not a bad thing to include anyways.
NOTE: My first answer did not address the OP's question, so this is a redux.
For this solution, I used Richard's code for generating test data. However, I added a line to randomly add missing values.
x = _n_;
if ranuni(1) < .1 then x = .;
_n_+1;
This alternative does not perform any explicit checks for missing values. The sum() and n() functions inherently handle missing values appropriately. The loop over the dynamic slice of the data array only transfers the values to a temporary array. The final sum and count are computed on the temp array outside of the loop.
data want;
set have;
array x(2000:2018) x:;
array t(10) _temporary_;
j = 1;
do i = start-9 to start;
t(j) = x(i);
j + 1;
end;
sum = sum(of t(*));
cnt = n(of t(*));
mean = sum / cnt;
drop x: i j;
run;
Result:
id  start  sum  cnt          mean
 1   2014   72    7  10.285714286
 2   2015  305   10  30.5
 3   2016  458    9  50.888888889
 4   2017  631    9  70.111111111
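The claim that sum() and n() skip missing values can be checked in isolation with a tiny step like the one below. It is only an illustration: the four array values are made up, and avg is a hypothetical variable added to show that mean() gives the same result as sum / cnt.

```sas
data _null_;
  /* made-up values: two real numbers and two missing values */
  array t(4) _temporary_ (1 . 3 .);
  sum = sum(of t(*));    /* adds only the non-missing values */
  cnt = n(of t(*));      /* counts only the non-missing values */
  avg = mean(of t(*));   /* equivalent to sum / cnt */
  put sum= cnt= avg=;    /* writes sum=4 cnt=2 avg=2 to the log */
run;
```

One caveat worth keeping in mind: if every element of the slice is missing, cnt is 0 and sum / cnt triggers a division-by-zero note, whereas mean() simply returns a missing value.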
I have a dataset that consists of variables named month0-month120 and for each record I am trying to check if these variables equal a particular value. I am having a bit of trouble trying to do this dynamically rather than writing 120 lines of code. How would be the proper way to accomplish this? I am also having trouble formulating how to word the question which is also hindering me when searching online.
Edit: So basically I have this time series of values from the last 5 years represented in month0-120. I am trying to see how many '.' values are present within this array for each record. An example of input is as such
data testing;
set blah;
len = 0;
do i = 0 to 120;
if month[i] = . then len+1;
end;
run;
To count the number of missing values use NMISS().
data testing;
set blah;
len = nmiss(of month0-month120);
run;
Note CMISS() will also work since CMISS() works with both numeric and character variables.
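As a small illustration of the difference between the two functions, the step below uses made-up values (month0-month2 and a character variable name are hypothetical stand-ins, not the real data). NMISS() is given only the numeric variables, while CMISS() is also given the blank character variable.

```sas
data _null_;
  /* made-up values: one real number, two numeric missings, one blank string */
  month0 = 5;
  month1 = .;
  month2 = .;
  name = ' ';
  n_num = nmiss(of month0-month2);        /* numeric arguments only */
  n_all = cmiss(of month0-month2, name);  /* numeric and character arguments */
  put n_num= n_all=;                      /* writes n_num=2 n_all=3 to the log */
run;
```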
For a more general solution for referencing a set of variables, use an ARRAY.
data testing;
set blah;
array months month0-month120;
do index=1 to dim(months);
* do something with MONTHS[index] ;
end;
run;
For the code you posted, you would need to explicitly declare your array and it's easier if you specify the index from 0 to 120. Otherwise, SAS would index it from 1 to 121 essentially.
data testing;
set blah;
array months(0:120) month0-month120;
len = 0;
do i = 0 to 120;
if months[i] = . then len+1;
end;
run;
I am trying to generate random numbers without repetition. My idea is to write a do while loop that runs 5 times. Inside it I draw a random number, store it in a table, and at every iteration check whether the drawn number is already in the table to decide whether it is a repetition.
Here is my code where I try to implement this idea, but something is wrong and I do not know where I made a mistake.
data WithoutRepetition;
counter = 0;
array temp (5) _temporary_;
do while(1);
rand=round(4*ranuni(0) +1,1);
if counter = 0 then
do;
temp(1) = rand;
counter=counter+1;
output;
continue;
end;
do a=1 to counter by 1;
if temp(a) = rand then continue ;
end;
temp(counter) = rand;
output;
counter=counter+1;
if counter = 5 then do;
leave;
end;
end;
run;
Sounds like you want a random permutation.
data _null_;
seed=12345;
array r[5] (1:5);
put r[*];
call ranperm(seed,of r[*]);
put r[*];
run;
1 2 3 4 5
5 1 4 3 2
This is a simplified version of what you are trying to do.
data WithoutRepetition;
i=0;
array temp[5];
do r=1 by 1 until(i eq dim(temp));
rand=round(4*ranuni(0)+1,1);
if rand not in temp then do; i+1; temp[i]=rand; end;
end;
drop i rand;
run;
You were reasonably close to a working, if convoluted solution. For educational purposes, although data _null_'s answer is much cleaner, here's why your code wasn't working:
Your leave statement is inside a do-end block which is inside another do-loop. Leave statements only break out of the innermost do-loop, so yours has no effect.
The same is true of your continue statements, the first of which is completely unnecessary.
Because you are updating your array with newly-found unique values before you increment counter, previously populated values are being overwritten. This often results in duplicates of the overwritten values appearing in your output.
I would place continue and leave in the same category as goto - avoid using them if at all possible, as they tend to make code difficult to debug. It's clearer to set the exit conditions for all your loops at the point of entry.
Just for fun, though, here's a fixed version of your original code:
data WithoutRepetition;
counter = 0;
array temp (5) _temporary_;
do while(1);
rand=round(4*ranuni(0) +1,1);
if counter = 0 then do;
temp(1) = rand;
counter +1;
output;
end;
dupe = 0;
do a=1 to counter;
if temp(a) = rand then dupe=1;
end;
if dupe then continue;
counter +1;
temp(counter) = rand;
output;
if counter = 5 then leave;
end;
run;
And here is an equivalent version with all the leave and continue statements replaced with more readable alternatives:
data WithoutRepetition;
counter = 0;
array temp (5) _temporary_;
do while(counter < 5);
rand=round(4*ranuni(0) +1,1);
if counter = 0 then do;
temp(1) = rand;
counter +1;
output;
end;
else do;
dupe = 0;
do a=1 to counter while(dupe = 0);
if temp(a) = rand then dupe=1;
end;
if dupe = 0 then do;
counter +1;
temp(counter) = rand;
output;
end;
end;
end;
run;
I have a SAS dataset with the following variables: ID, Var1_0, Var1_3, Var1_6, Var2_0, Var2_3, Var2_6, which can be read like this: Var1_0 is parameter 1 at time 0. For every subject I have 2 parameters and 3 time points. I want to transpose this into a long format using an array. I did this:
data long;
set wide;
array a Var1_0 Var1_3 Var1_6 Var2_0 Var_3 Var_6;
Do i=1 to dim(a);
outcome = a[i];
*Var = ???;
if (mod(i,3)=1) then Time = 0;
else if (mod(i,3)=2) then Time = 3;
else Time = 6;
Output;
end;
keep ID Outcome Time;
run;
The problem is that I don't know how to calculate the parameter variable, i.e., I want to add a variable that is either 1 or 2, depending on which parameter the value relates to. Is there a better way of doing this? Thank you!
Reeza gave you the answer in her comment. Here it is typed out.
data long;
set wide;
array a[*] Var1_0 Var1_3 Var1_6 Var2_0 Var2_3 Var2_6;
do i=1 to dim(a);
outcome = a[i];
var = vname(a[i]);
time = input(scan(var,2,'_'),best.);
/*Other stuff you want to do*/
output;
end;
run;
VNAME(array[sub]) gives you the variable name of the variable referenced by array[sub].
scan(str,i,delim) gives you the ith word in str using the specified delimiter.
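If a numeric parameter column (1 or 2) is wanted rather than the full variable name, the same two functions can be combined with COMPRESS to pull the digits out of the first part of the name. The following is a sketch only: the contents of wide are invented test data, and the names name and Parameter are hypothetical.

```sas
/* invented test data, for illustration only */
data wide;
  input ID Var1_0 Var1_3 Var1_6 Var2_0 Var2_3 Var2_6;
  datalines;
1 10 11 12 20 21 22
2 30 31 32 40 41 42
;
run;

data long (keep=ID Parameter Time Outcome);
  set wide;
  array a[*] Var1_0 Var1_3 Var1_6 Var2_0 Var2_3 Var2_6;
  length name $32;
  do i = 1 to dim(a);
    name = vname(a[i]);                   /* e.g. 'Var1_0' */
    /* keep only the digits of the first word: 'Var1' -> '1' */
    Parameter = input(compress(scan(name, 1, '_'), , 'kd'), 8.);
    Time = input(scan(name, 2, '_'), 8.); /* second word: '0', '3' or '6' */
    Outcome = a[i];
    output;
  end;
run;
```

Deriving both values from the variable name means the code does not depend on the order in which the variables are listed in the array.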
I am trying to create a loop that will save me from writing repetitive code.
I want every variable from x1 to x30 to equal x raised to the power i, where i is the index of the variable (i.e. 1 for x1).
For example, x7 would be x7=x**7;
I wrote some code, but it doesn't work and I don't know how to fix it. I would be glad for your help.
DATA maarah (drop = i e);
e = constant("e");
do i = -10 to 10 by 0.01;
x=i;
y=e**x;
output;
end;
length x1-x30 $2001;
do i =1 to 30 by 1;
x i=x**i;
output;
end;
run;
You're close. You need to declare an array. You don't explain what the first half is (the e**i part), so it's not clear what you want here - do you want a few thousand rows with powers of e, and then some rows with x1-x30? And why do you output each time in the second loop? To answer the core question, here:
DATA maarah (drop = i e);
e = constant("e");
do i = -10 to 10 by 0.01;
x=i;
y=e**x;
output;
end;
*length x1-x30 $2001; *what is this? Why do you want it 2001 characters, instead of numeric?;
array xs x1-x30; *you would need a $ after this if you truly wanted character;
do i =1 to 30 by 1;
xs[i]=x**i;
*output; *You probably do not want this. Output is probably outside of the loop.;
end;
run;
I would guess what you really want is this:
DATA maarah (drop = i e);
e = constant("e");
do i = -10 to 10 by 0.01;
x=i;
y=e**x;
*length x1-x30 $2001; *what is this? Why do you want it 2001 characters, instead of numeric?;
array xs x1-x30; *you would need a $ after this if you truly wanted character;
do j =1 to 30;
xs[j]=x**j;
end; *the x1-x30 loop;
output;
end; *the outer loop;
run;