How to do multiple if statements sas EG - sas

I am new to sas EG so unsure how to do this:
I am trying to write a small bit of code within my proc sql that takes a column of phone numbers and creates a new flag column. I want the new column to produce a Y when the phone number is: not empty, 11 digits long, only digits and not equal to '11111111111' and a 'N' otherwise.
Thanks in advance.

This should work:
data have;
input phone_number :$20.;
datalines;
123456789
11111111111
122355648978
12354879874
555566648+8
246k5654322
56485432135
asdfghjklop
;
run;
Syntax for if statements in proc sql is the same as in sql: you use case statement. If you have multiple conditions, you can add more when condition then result lines.
proc sql;
create table want as
select phone_number,
case
when phone_number ne '' and phone_number ne '11111111111' and length(phone_number)=11 and length(compress(phone_number, ,"kd"))=11 then 'P'
else 'N'
end as flag
from have
;
quit;

Related

How to remove first 4 and last 3 letters, digits or punctuation using PROC SQL in SAS Enterprise Guide?

I am trying to remove 009, and ,N from 009,A,N just to obtain the letter A in my dataset. Please guide how do I obtain such using PROC SQL in SAS or even in the data step. I want to keep the variable name the same and just remove the above-mentioned digits, letters and punctuation from the data.
Is your original value saved in one variable? If so, you can utilize the scan function in a data step or in Proc SQL to extract the second element from a string that contains comma delimiter:
data want (drop=str rename=(new_str=str));
set orig;
length new_str $ 1;
new_str = strip(scan(str,2,','));
run;
proc sql;
create table work.want
as select *, strip(scan(str,2,',')) as new_str length=1
from orig;
quit;
In Proc SQL, if you want to replace the original column with the updated column, you can replace the * in the SELECT clause with the names of all columns other than original variable you are modifying.
What you code really depends on what other values there are in the other rows of the data set.
From the one sample value you provide the following code could be what you want.
data have;
input myvar $char80.;
datalines;
009,A,N
009-A-N
Okay
run;
proc sql;
update work.have
set myvar = scan(myvar,2,',')
where count(myvar,',') > 1
;

Insert into function with SAS/SQL

I want to insert values into a new table, but I keep getting the same error: VALUES clause 1 attempts to insert more columns than specified after the INSERT table name. This is if I don't put apostrophes around my date. If I do put apostrophes then I get told that the data types do not correspond for the second value.
proc sql;
create table date_table
(cvmo char(6), next_beg_dt DATE);
quit;
proc sql;
insert into date_table
values ('201501', 2015-02-01)
values ('201502', 2015-03-01)
values ('201503', 2015-04-01)
values ('201504', 2015-05-01);
quit;
The second value has to remain as a date because it used with > and < symbols later on. I think the problem may be that 2015-02-01 just isn't a valid date format since I couldn't find it on the SAS website, but I would rather not change my whole table.
Date literals (constants) are quoted strings with the letter d immediately after the close quote. The string needs to be in a format that is valid for the DATE informat.
'01FEB2015'd
"01-feb-2015"d
'1feb15'd
If you really want to insert a series of dates then just use a data step with a DO loop. Also make sure to attach one of the many date formats to your date values so that they will print as human understandable text.
data data_table ;
length cvmo $6 next_beg_dt 8;
format next_beg_dt yymmdd10.;
do _n_=1 to 4;
cvmo=put(intnx('month','01JAN2015'd,_n_-1,'b'),yymmn6.);
next_beg_dt=intnx('month','01JAN2015'd,_n_,'b');
output;
end;
run;
#tom suggest you in comments how to use date and gives very good answer how to it efficently, which is less error prone than typing values. I am just putting the same into the insert statement.
proc sql;
create table date_table
(cvmo char(6), next_beg_dt DATE);
quit;
proc sql;
insert into date_table
values ('201501', "01FEB2015"D)
;

SAS Proc SQL Trim not working?

I have a problem that seems pretty simple (probably is...) but I can't get it to work.
The variable 'name' in the dataset 'list' has a length of 20. I wish to conditionally select values into a macro variable, but often the desired value is less than the assigned length. This leaves trailing blanks at the end, which I cannot have as they disrupt future calls of the macro variable.
I've tried trim, compress, btrim, left(trim, and other solutions but nothing seems to give me what I want (which is 'Joe' with no blanks). This seems like it should be easier than it is..... Help.
data list;
length id 8 name $20;
input id name $;
cards;
1 reallylongname
2 Joe
;
run;
proc sql;
select trim(name) into :nameselected
from list
where id=2;
run;
%put ....&nameselected....;
Actually, there is an option, TRIMMED, to do what you want.
proc sql noprint;
select name into :nameselected TRIMMED
from list
where id=2;
quit;
Also, end PROC SQL with QUIT;, not RUN;.
It works if you specify a separator:
proc sql;
select trim(name) into :nameselected separated by ''
from list
where id=2;
run;

IN clause in SAS

I have the below dataset
data a;
input id stat$;
datalines;
10 a
20 a
10 t
40 t
;
run;
I am writing the below macro
%macro ath(inp);
data _null_;
set a (where=(id=&inp);
----if any one of the status of ID is 'a' then put 'valid';
else put 'invalid';----
%mend;
I am a beginner in SAS. Please help me in writing the syntax for - if any one of the status of ID is 'a' then put 'valid'; else put 'invalid'
It looks like you want to scan over all (selected) rows in the input data, and test whether any of them has ID = 'a'. You don't do this with an IN clause; that's for testing against a set of values for each row, as opposed to testing against one value for all rows.
data _null_;
set a (where=(id = &inp);
if id = 'a' then do;
put 'valid';
stop;
end;
run;
As you have a where statement here, surely all the id's you select are going to be valid? I.e. you are selecting all the rows where id = &inp, so if your variable &inp = 'a' then all rows of the new dataset will be valid. Sorry if I have missed something here!

How to create a new variable in SAS by extracting part of the value of an existing numeric variable?

I have two datasets in SAS that I would like to merge, but they have no common variables. One dataset has a "subject_id" variable, while the other has a "mom_subject_id" variable. Both of these variables are 9-digit codes that have just 3 digits in the middle of the code with common meaning, and that's what I need to match the two datasets on when I merge them.
What I'd like to do is create a new common variable in each dataset that is just the 3 digits from within the subject ID. Those 3 digits will always be in the same location within the 9-digit subject ID, so I'm wondering if there's a way to extract those 3 digits from the variable to make a new variable.
Thanks!
SQL(using sample data from Data Step code):
proc sql;
create table want2 as
select a.subject_id, a.other, b.mom_subject_id, b.misc
from have1 a JOIN have2 b
on(substr(a.subject_id,4,3)=substr(b.mom_subject_id,4,3));
quit;
Data Step:
data have1;
length subject_id $9;
input subject_id $ other $;
datalines;
abc001def other1
abc002def other2
abc003def other3
abc004def other4
abc005def other5
;
data have2;
length mom_subject_id $9;
input mom_subject_id $ misc $;
datalines;
ghi001jkl misc1
ghi003jkl misc3
ghi005jkl misc5
;
data have1;
length id $3;
set have1;
id=substr(subject_id,4,3);
run;
data have2;
length id $3;
set have2;
id=substr(mom_subject_id,4,3);
run;
Proc sort data=have1;
by id;
run;
Proc sort data=have2;
by id;
run;
data work.want;
merge have1(in=a) have2(in=b);
by id;
run;
an alternative would be to use
proc sql
and then use a join and the substr() just as explained above, if you are comfortable with sql
Assuming that your "subject_id" variable is a number then the substr function wont work as sas will try convert the number to a string. But by default it pads some paces on the left of the number.
You can use the modulus function mod(input, base) which returns the remainder when input is divided by base.
/*First get rid of the last 3 digits*/
temp_var = floor( subject_id / 1000);
/* then get the next three digits that we want*/
id = mod(temp_var ,1000);
Or in one line:
id = mod(floor(subject_id / 1000), 1000);
Then you can continue with sorting the new data sets by id and then merging.