Using Arrays for loops in SAS

I have this dataset
DATA Problem3;
INPUT name $ Smoke_Tobacco $ Drink_Alcohol $ Take_Illegal_Drugs $ Drink_Soda $;
DATALINES;
Tom yes no no yes
Harry Yes Yes Yes No
Jim No No No Yes
Bob Yes No No Yes
Andy No Yes No Yes
Cody yes no no no
Ed Yes no no yes
Greg no Yes no No
Dave Yes No Yes no
;
RUN;
And I want to use loops to change all "yes" responses to "Yes" and all "no" responses to "No".
My idea is to use arrays for this, as shown in The Little SAS Book:
DATA songs;
INFILE 'c:\MyRawData\KBRK.dat';
INPUT City $ 1-15 Age wj kt tr filp ttr;
ARRAY song (5) wj kt tr filp ttr;
DO i = 1 TO 5;
IF song(i) = 9 THEN song(i) = .;
END;
Run;
This replaces 9 with a missing value (.). So I edited my code to:
DATA Problem3;
INPUT name $ Smoke_Tobacco $ Drink_Alcohol $ Take_Illegal_Drugs $ Drink_Soda $;
DATALINES;
Tom yes no no yes
Harry Yes Yes Yes No
Jim No No No Yes
Bob Yes No No Yes
Andy No Yes No Yes
Cody yes no no no
Ed Yes no no yes
Greg no Yes no No
Dave Yes No Yes no
;
ARRAY Answer (4) Smoke_Tobacco Drink_Alcohol Take_Illegal_Drugs Drink_Soda;
DO i=1 TO 9;
IF Answer(i) = 'yes' THEN Answer(i)= 'Yes';
ELSE IF Answer(i) = 'no' THEN Answer(i)= 'No';
END;
RUN;
But I get errors saying that the lines in my addition are either not valid or out of order. How do I go about fixing this?

Declare your array as character: add a $ to the array declaration.
Just apply PROPCASE to the variables which will standardize all to have the first letter as a capital.
Why are you looping to 9, when you only have 4 items?
You have to put the array statements BEFORE the datalines. Nothing after the data is processed.
DATA Problem3;
INPUT name $ Smoke_Tobacco $ Drink_Alcohol $ Take_Illegal_Drugs $ Drink_Soda $;
ARRAY Answer (4) $ Smoke_Tobacco Drink_Alcohol Take_Illegal_Drugs Drink_Soda;
DO i=1 TO 4;
answer(i)=propcase(answer(i));
END;
DATALINES;
Tom yes no no yes
Harry Yes Yes Yes No
Jim No No No Yes
Bob Yes No No Yes
Andy No Yes No Yes
Cody yes no no no
Ed Yes no no yes
Greg no Yes no No
Dave Yes No Yes no
;
RUN;

You cannot add statements after the end of the data step!
Move your new statements to before the DATALINES; statement, which marks the end of your data step and the beginning of your in-line data. Also make sure the upper bound of your DO loop matches the size of your array; better still, let SAS figure that out for you: do i=1 TO dim(answer);
This is the second confusion along these lines that I have seen recently. I wonder if it is caused by people reflexively adding an extra run; statement after the end of data steps that use in-line data. That extra run; is not needed, and it becomes a new, empty step rather than part of the original data step.
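For comparison, here is a minimal sketch of the question's own IF/ELSE loop with the fixes from these answers applied: the ARRAY and DO statements sit before DATALINES, and the loop bound comes from DIM() instead of a hard-coded number. The DROP i; statement is an optional addition so the loop counter does not end up in the output.

```sas
/* Everything that processes the data must come before DATALINES, */
/* because DATALINES ends the data step.                           */
data problem3;
  input name $ Smoke_Tobacco $ Drink_Alcohol $ Take_Illegal_Drugs $ Drink_Soda $;
  array answer(*) $ Smoke_Tobacco Drink_Alcohol Take_Illegal_Drugs Drink_Soda;
  do i = 1 to dim(answer);   /* DIM() returns the number of elements (4) */
    if answer(i) = 'yes' then answer(i) = 'Yes';
    else if answer(i) = 'no' then answer(i) = 'No';
  end;
  drop i;                    /* optional: keep the loop counter out of the output */
  datalines;
Tom yes no no yes
Harry Yes Yes Yes No
Jim No No No Yes
Bob Yes No No Yes
Andy No Yes No Yes
Cody yes no no no
Ed Yes no no yes
Greg no Yes no No
Dave Yes No Yes no
;
```

Using DIM() means the loop keeps working if you later add or remove variables from the ARRAY statement.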

Related

I keep on getting errors in SAS

I am just testing my SAS code. It was working, but now it is not: SAS says it is invalid.
Sample code:
data have;
$ default $ student $;
cards;
(1) yes yes
(2) Yes No
(3) NO Yes
(4) No No
;
Thanks in advance!
I suspect you meant to have an INPUT statement in your code. You also need a variable name for that first string.
data have;
input col1 $ default $ student $;
cards;
(1) yes yes
(2) Yes No
(3) NO Yes
(4) No No
;

Is there a way in a SAS data step to add a record count column that increases when a variable changes?

I have a table with dates in it, and the member changes over time. I want to know when each member started and ended. If a member starts, ends, and then restarts, the restart needs a different indicator.
Sample of what I have (sorry I don't know how to make a table here):
member yyyymm
Jim 201603
Jim 201606
Jim 201609
Bob 201709
Bob 201712
Jim 201806
Jef 201806
Jef 201809
I tried a PROC SQL query that finds the min and max dates, but the max date is wrong if the member restarts (code A below). I also tried a data step, but SAS said the data wasn't properly sorted (code B below).
code A
proc sql;
create table tst as
select
member,
max(yyyymm) as effective_until,
min(yyyymm) as effective_from
from tbl
group by 1,2;
quit;
code B
data tst;
count + 1;
by member;
if first.member then count = 1;
run;
What I'm hoping for:
member yyyymm id
Jim 201603 1
Jim 201606 1
Jim 201609 1
Bob 201709 2
Bob 201712 2
Jim 201806 3
Jef 201806 4
Jef 201809 4
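proc sort data=have;
by yyyymm;
run;
data want;
set have;
/* NOTSORTED: a new BY group starts every time member changes, */
/* so a member who restarts later gets a new id.               */
by member notsorted;
if first.member then id+1;
run;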
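Try the LAG function, which returns the value it was given on its previous call. Here it returns the member from the previous observation (but handle it with care: LAG returns the value from the last time it was executed, so do not call it only conditionally). When the member differs from the previous observation, change your id, for example by adding 1.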
data have;
length member $3 yyyymm $6;
input member yyyymm;
cards;
Jim 201603
Jim 201606
Jim 201609
Bob 201709
Bob 201712
Jim 201806
Jef 201806
Jef 201809
;
data want;
set have;
if lag(member)^=member then id+1;
run;
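The asker's underlying goal was the start and end date of each membership stint. Once every stint has its own id, the MIN/MAX idea from code A works if it groups by that id rather than by position. A minimal sketch, assuming the HAVE data and the LAG-based id from the answer above; STINTS is a hypothetical table name.

```sas
/* Sample data as in the LAG answer; yyyymm is kept as $6 text. */
data have;
  length member $3 yyyymm $6;
  input member yyyymm;
  cards;
Jim 201603
Jim 201606
Jim 201609
Bob 201709
Bob 201712
Jim 201806
Jef 201806
Jef 201809
;

/* New id whenever member differs from the previous row. */
data want;
  set have;
  if lag(member) ^= member then id + 1;
run;

/* One row per stint. MIN/MAX work on the $6 text because */
/* YYYYMM strings sort in date order.                     */
proc sql;
  create table stints as
  select id,
         member,
         min(yyyymm) as effective_from,
         max(yyyymm) as effective_until
  from want
  group by id, member
  order by id;
quit;
```

Because Jim's restart in 201806 has its own id, his second stint no longer stretches the first one's end date.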

Changing a SAS character variable into a SAS numerical variable?

I have created the following SAS table:
DATA test;
INPUT name$ Group_Number;
CARDS;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2;
run;
I would like to change group number from a character type into a numeric type.
Here is my attempt:
data test2;
set test;
Group_Number1 = input(Group_Number, best5.);
run;
The problem is that when I execute:
proc contents data = test2;
run;
The output table shows that group number is still of a character type. I think the problem may be that I have best5. in my INPUT function call, but I am not 100% sure what is wrong.
How can I fix this?
If you have a character variable, your code will work. But you don't: you have a numeric variable in your sample data. So either your sample data is incorrect, or you don't have the problem you think you do.
Here's an example that you can run to see this.
*read group_number as numeric;
DATA test_num;
INPUT name$ Group_Number;
CARDS;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2
;
run;
Title 'Group_Number is Numeric!';
proc contents data=test_num;
run;
*read group_number as character;
DATA test_char;
INPUT name$ Group_Number $;
CARDS;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2
;
run;
data test_converted;
set test_char;
group_number_num = input(group_number, 8.);
run;
Title 'Group_Number is Character, Group_Number_Num is Numeric';
proc contents data=test_converted;
run;
Try this:
data test2;
set test;
Group_Number1 = input(put(Group_Number,best5.),best5.);
run;
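Note that a variable's type cannot be changed in place: INPUT can only put the numeric value into a new variable. If you want the numeric version to keep the name Group_Number, a common idiom is to rename the character variable as it is read in and drop it afterwards. A minimal sketch, assuming a character Group_Number as in TEST_CHAR above; Group_Number_c and TEST_NUMERIC are illustrative names.

```sas
/* Character Group_Number, as in the TEST_CHAR example. */
data test_char;
  input name $ Group_Number $;
  datalines;
Joseph 1
Stephanie 2
Linda 3
James 1
Jane 2
;

/* Rename the character version on the way in, rebuild the original */
/* name as numeric, then drop the character copy.                    */
data test_numeric;
  set test_char(rename=(Group_Number=Group_Number_c));
  Group_Number = input(Group_Number_c, 8.);
  drop Group_Number_c;
run;

proc contents data=test_numeric;
run;
```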

How to work across two datasets in SAS

I have two datasets, described below:
data1:
$restaurant $reviewers
A Tom
B Jack.Mary.Joan
C Tom.Joan
D Rose
data2 (sorted by number of friends):
$user $friends
Tom Joan.Mary.Jack
Jack Tom.Rose
Mary Tom
Joan Tom
The question is to calculate the overlap in the reviews of these users with the reviews of their friends.
Take Tom as an example: the restaurants Tom's friends reviewed are B and C, of which C was also reviewed by Tom. So the percentage is C/(B+C) = 1/2, i.e. an overlap of 50%.
I think I need a loop to work across the two datasets, but with only very basic knowledge of SAS, I don't know how. Does anybody have an idea?
Thank you very much.
You should try something like this.
data reviews;
infile datalines dsd dlm=",";
input restaurant $ reviewer $;
datalines;
A,Tom
B,Jack
B,Mary
B,Joan
C,Tom
C,Joan
D,Rose
;
run;
data users;
infile datalines dsd dlm=",";
input user $ friend $;
datalines;
Tom,Joan
Tom,Mary
Tom,Jack
Jack,Tom
Jack,Rose
Mary,Tom
Joan,Tom
;
run;
proc sql;
create table want as
select t1.user
,sum(case when t3.restaurant=t2.restaurant then 1 else 0 end)/count(*) as percentage
from users t1
inner join reviews t2
on t1.user=t2.reviewer
inner join reviews t3
on t1.friend=t3.reviewer
group by t1.user
;
quit;
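I didn't get your 0.5 value for Tom, but maybe you have a mistake. This query divides the number of matching pairs (a review by Tom and a review by one of his friends of the same restaurant) by the number of all such pairs, which gives 1/8 for Tom.
So you can adapt the code as needed.
I followed the logic from here: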
How to check percentage overlap in SAS
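If you want the question's definition instead (of the distinct restaurants a user's friends reviewed, the share the user also reviewed), one option is to deduplicate the friends' restaurants first and then look up the user's own reviews with a left join. A minimal sketch, assuming the same REVIEWS and USERS tables as in the answer above; FRIEND_REST and OVERLAP are hypothetical table names.

```sas
/* Same long-format input tables as in the answer above. */
data reviews;
  infile datalines dsd dlm=",";
  input restaurant $ reviewer $;
  datalines;
A,Tom
B,Jack
B,Mary
B,Joan
C,Tom
C,Joan
D,Rose
;

data users;
  infile datalines dsd dlm=",";
  input user $ friend $;
  datalines;
Tom,Joan
Tom,Mary
Tom,Jack
Jack,Tom
Jack,Rose
Mary,Tom
Joan,Tom
;

proc sql;
  /* Distinct restaurants reviewed by at least one of each user's friends. */
  create table friend_rest as
  select distinct u.user, r.restaurant
  from users as u
       inner join reviews as r
       on u.friend = r.reviewer;

  /* COUNT(o.restaurant) counts only rows where the left join found the */
  /* user's own review; COUNT(*) counts all of the friends' restaurants. */
  create table overlap as
  select f.user,
         count(o.restaurant) / count(*) as percentage
  from friend_rest as f
       left join reviews as o
       on o.reviewer = f.user and o.restaurant = f.restaurant
  group by f.user;
quit;
```

For Tom, FRIEND_REST holds B and C, and only C has a matching review by Tom, so the percentage is 1/2, as in the question's worked example.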

Keep reading input from next row in the same variable

I have data from a chat that I want to read in one entry at a time. Each time a person hit "send" should be one observation. The problem is when there are line breaks (Enter) in the text: I can't get SAS to keep reading these as the same observation. Here is some dummy data:
08:23 - Greg: Hi!
08:24 - Sue: Hello
08:24 - Greg: How are you?
08:25 - Sue: Just fine :)
How are you then?
08:26 - Greg: All good.
I want this to be 5 observations, but I can only get SAS to read it as 7. The desired dataset should look like this:
Obs VAR1
1 08:23 - Greg: Hi!
2 08:24 - Sue: Hello
3 08:24 - Greg: How are you?
4 08:25 - Sue: Just fine :) How are you then?
5 08:26 - Greg: All good.
I have been playing around with this code:
data testing;
infile datalines ;
input var1 $60. ;
datalines;
08:23 - Greg: Hi!
08:24 - Sue: Hello
08:24 - Greg: How are you?
08:25 - Sue: Just fine :)
How are you then?
08:26 - Greg: All good.
;
But the actual file is a .txt file and has more irregularities than the dummy example above. I have tried to use the trailing @ but can't get it to work the way I want. Maybe the trailing @ is not what I am after. Any suggestions on how to proceed?
Try this.
Keep a running variable that holds the message built so far. If the current value has a time stamp (detected by a colon) in its first 4 characters, output the running value and reset it to "". Then append the current value to the running variable. Finally, output the last line, no matter what.
data testing(keep=line);
set testing end=last;
length line $2000;
retain line;
/* A colon in the first 4 characters marks a new time-stamped message, */
/* so write out the message collected so far and start a new one.      */
if _n_ > 1 then do;
if index(substr(var1,1,4),":") then do;
output;
line = "";
end;
end;
/* Append the current line to the running message. */
line = catx(" ", line, var1);
/* The last message has no following time stamp, so output it here. */
if last then output;
run;
I unsuccessfully tried to find a solution at the raw-data input stage; anyway, I hope this approach, which post-processes the strings, will be useful for you:
data testing;
infile datalines ;
input var1 $60.;
datalines;
08:23 - Greg: Hi!
08:24 - Sue: Hello
08:24 - Greg: How are you?
08:25 - Sue: Just fine :)
How are you then?
08:26 - Greg: All good.
;
data testing01;
set testing;
retain row 0;
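/* A new message starts when the line begins with HH:MM.             */
/* ?? suppresses invalid-data notes for lines without a time stamp. */
if input(substr(var1,1,2),?? 8.) le 24 and input(substr(var1,1,2),?? 8.) ne .
and substr(var1,3,1)=':'
and input(substr(var1,4,2),?? 8.) le 59 and input(substr(var1,4,2),?? 8.) ne . then row = row+1;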
run;
proc transpose data=testing01 out=testing02;
var var1;
by row;
run;
data testing03;
length final $2000;
set testing02;
array str[*] col:;
do i=1 to dim(str);
if str[i] ne '' then final=cats(strip(final)||' '||strip(str[i]));
end;
drop col: row i _name_;
run;
filename FT15F001 temp;
data testing ;
infile FT15F001 end=eof ;
length string $6323;
retain string;
input @;
if _n_=1 then string=_infile_;
else if not missing(_infile_) and anydigit(_infile_)^=1 then string=catx(' ',string,_infile_);
else if not missing(_infile_) and anydigit(_infile_)=1 then do;
output;
call missing(string);
string=_infile_;
end;
if eof then output;
PARMCARDS;
08:23 - Greg: Hi!
08:24 - Sue: Hello
08:24 - Greg: How are you?
08:25 - Sue: Just fine :)
How are you then?
08:26 - Greg: All good.
;
There are a lot of ways to do this, depending on your particular use case.
Here's a regular-expression approach. It won't work if the file has more than 32767 characters in total unless you can split it into chunks, but it works well for smaller files, and the general approach can be used even if you read in a line at a time.
data test;
infile "c:\temp\chat.txt" recfm=f lrecl=32767;
input @;
rx_find = prxparse('~(\d\d:\d\d -.*?)(?=(?:\b\d\d:\d\d)|$)~ios');
rc_find = prxmatch(rx_find,_infile_);
pos=1;
pos2=0;
start=1;
call prxposn(rx_find,1,pos,len);
do until (pos2=0);
call prxposn(rx_find,1,pos,len);
found=substr(_infile_,pos,len);
output;
start=pos+len;
call prxnext(rx_find,start,-1,_infile_,pos2,len2);
end;
stop;
run;