How to use If CONTAINS to remove variables containing certain words

I am trying to use SAS to remove certain project names from a dataset. We want to remove any project names that contain the words "seminar" or "workshop". The code below does not run; it gives a warning for each letter of "workshop", saying it is invalid. If there is a better way to find and delete things in SAS, please let me know which function to use.
I tried this code:
data before;
if find(projectname, "seminar", "workshop" then delete;
run;

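FIND() can search for only one substring at a time. Its third argument is a string of modifiers, not a second search term, which is why each letter of "workshop" was reported as invalid. For just two words you can chain two calls with OR (the "it" modifiers tell FIND to ignore case and trim trailing blanks):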
data before;
set have;
if find(projectname, "seminar", "it") or find(projectName, "workshop", "it") then delete;
run;
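If the list of words may grow, a minimal sketch is to keep the search terms in a temporary array and loop over them. The data sets HAVE and WANT and the sample project names below are made up for illustration; only PROJECTNAME comes from the question.

```sas
/* Hypothetical sample data: HAVE and its values are made up for illustration */
data have;
  infile datalines truncover;
  input projectname $40.;
  datalines;
Annual Seminar on Safety
Budget review
Writing WORKSHOP
Field survey
;
run;

data want;
  set have;
  /* Search words kept in a temporary array; extend the list as needed */
  array words[2] $20 _temporary_ ('seminar' 'workshop');
  do i = 1 to dim(words);
    /* 'i' ignores case; 't' trims trailing blanks, which matters because
       the array elements are padded to 20 characters */
    if find(projectname, words[i], 'it') then delete;
  end;
  drop i;
run;
```

WANT should keep only "Budget review" and "Field survey". A regular expression such as prxmatch('/seminar|workshop/i', projectname) is another way to test several words in one call.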

Related

Can I get SAS to concatenate an entire folder of data sets?

I'm working with data that seem to be split into nearly arbitrary sets from year to year. What I would like to do is to be able to start by concatenating all of the .sas7bdat files in a single library. How would I go about this?
Alternatively, if I know all of the possible names that files in the library might be assigned (but many are potentially missing from any given library), how can I get SAS to ignore missing files? For instance, say that I know all of the .sas7bdat files in my library have one of the names "set01", "set02", "set03" or "set04". If a particular library ("L") is missing one of these, then the data step:
DATA temp;
SET L.set01 L.set02 L.set03 L.set04;
RUN;
will produce an error. Assuming that I know that at least one of these exists, is there an option that will tell SAS to ignore the missing ones?
(I understand that these are two totally different questions, but either would solve my immediate problem.)

In SAS there is an easy way to automatically select the datasets whose names start with a common prefix. You can use the following statement:
data temp;
set L.set0: ; /*It will search for all datasets that start with set0 and will set only those which are available*/
run;
Does it answer your query?
Second approach
libname L "Y:\Test Data";
proc sql;
select strip("L."||memname) into :DSNAME separated by ' '
from dictionary.tables
where libname='L';
quit;
/* Main final DS*/
data want;
set &DSNAME;
run;
It extracts the names of all datasets in library L and stores them in the macro variable DSNAME (for example: L.set01 L.oth02 ...), so the names do not need to share a common prefix.
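For the second part of the question (you know the possible names but some may be missing), a sketch is to check each name with the EXIST function through %SYSFUNC and pass only the existing ones to SET. The macro %existing_sets and its parameters are hypothetical, and the demo uses WORK with made-up data sets so it runs on its own.

```sas
/* Hypothetical sample data: only set01 and set03 exist */
data work.set01; x = 1; run;
data work.set03; x = 3; run;

/* Hypothetical macro: returns "lib.name" for each name in LIST that exists in LIB */
%macro existing_sets(lib, list);
  %local i ds found;
  %do i = 1 %to %sysfunc(countw(&list));
    %let ds = %scan(&list, &i);
    %if %sysfunc(exist(&lib..&ds)) %then %let found = &found &lib..&ds;
  %end;
  &found
%mend existing_sets;

data temp;
  set %existing_sets(work, set01 set02 set03 set04);
run;
```

With the library from the question you would write set %existing_sets(L, set01 set02 set03 set04);. As the question assumes, at least one of the data sets must exist; otherwise the SET statement is left empty and the step fails.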

How do I delete empty rows in SAS?

I have empty rows in SAS after I imported an .xls file.
I'm using the following code but it is not working.
I'm trying to delete all the empty rows.
DATA PROJECT.CLEAN_DATA1;
set PROJECT.merged_data;
if missing(coalesceC(of _character_)) and missing(coalesce(of_numeric_)) then delete;
run;
Please help!

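Note: the original post used "character" without the underscores.
_CHARACTER_ is the special SAS variable list that refers to all character variables (and _NUMERIC_ to all numeric variables). Use it in the condition instead. Also make sure there is a space between OF and the list name: of_numeric_ is read as a single variable name, not as OF _NUMERIC_.
It is also good programming practice to write these special names in SAS as is, in uppercase. So the condition is:
missing(coalesceC(of _CHARACTER_)) and missing(coalesce(of _NUMERIC_))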
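A small self-contained check of this condition, assuming a hypothetical stand-in for PROJECT.merged_data with one character and one numeric column, where two rows are completely empty as they might be after an Excel import:

```sas
/* Hypothetical sample: rows 2 and 4 are completely empty */
data merged_data;
  infile datalines dsd truncover;
  length name $10;
  input name $ score;
  datalines;
Ann,10
,
Bob,
,
;
run;

data clean_data1;
  set merged_data;
  /* COALESCEC returns the first non-blank character value and COALESCE the
     first non-missing numeric value; both missing means the whole row is empty */
  if missing(coalescec(of _CHARACTER_)) and missing(coalesce(of _NUMERIC_)) then delete;
run;
```

CLEAN_DATA1 should keep the Ann and Bob rows; Bob is kept because NAME is filled even though SCORE is missing. The condition assumes the data set has at least one character and one numeric variable, since each function needs at least one argument.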

How to use call symput on a specific observation in SAS

I'm trying to convert a SAS dataset column to a list of macro variables but am unsure of how indexing works in this language.
DATA _Null_;
do I = 1 to &num_or;
set CondensedOverrides4 nobs = num_or;
call symputx("Item" !! left(put(I,8.))
,"Rule", "G");
end;
run;
Right now this code creates a list of macro variables Item1,Item2,..ItemN etc. and assigns the entire column called "Rule" to each new variable. My goal is to put the first observation of "Rule" in Item1, the second observation in that column in Item2, etc.
I'm pretty new to SAS and understand you can't brute force logic in the same way as other languages but if there's a way to do this I would appreciate the guidance.

It is much easier to create a series of macro variables with the INTO clause of PROC SQL. You can also save the number of items into a macro variable from the automatic macro variable SQLOBS.
proc sql noprint;
select rule into :Item1-
from CondensedOverrides4
;
%let num_or=&sqlobs;
quit;
If you want to use a data step there is no need for a DO loop. The data step iterates over the inputs automatically. Put the code to save the number of observations into a macro variable BEFORE the set statement in case the input dataset is empty.
data _null_;
if eof then call symputx('num_or',_n_-1);
set CondensedOverrides4 end=eof ;
call symputx(cats('Item',_n_),rule,'g');
run;

SAS does not need a loop to access each row; the data step does it automatically. So your code is really close. Instead of I, use the automatic variable _N_, which works as a row counter here, although it actually counts iterations of the data step.
DATA _Null_;
set CondensedOverrides4;
call symputx("Item" || put(_n_,8. -l) , Rule, "G");
run;
To be honest, though, if you're new to SAS, starting with macro variables isn't recommended. There are usually several ways to avoid them anyway, and I only use them if there's no other choice. They are incredibly powerful, but easy to get wrong and harder to debug.
EDIT: I modified the code to remove the LEFT() function, since you can use the -l option in the PUT function to left-align the result directly.
EDIT2: Removed the quotes around RULE, since I suspect it's a variable whose value you want to store, not the text string 'RULE'. If you want the macro variables to contain that literal string, add the quotes back, but that seems incorrect based on your question.
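To see the result, a minimal sketch with a hypothetical three-row stand-in for CondensedOverrides4: it creates Item1 to Item3 plus the count in NUM_OR (using the "check EOF before SET" pattern from the first answer), then reads the values back with the double-ampersand reference &&Item&i. The macro %show_items is hypothetical.

```sas
/* Hypothetical sample data standing in for CondensedOverrides4 */
data CondensedOverrides4;
  length rule $20;
  input rule $;
  datalines;
alpha
beta
gamma
;
run;

data _null_;
  /* Runs on the extra iteration after the last row, so NUM_OR is set even for empty input */
  if eof then call symputx('num_or', _n_ - 1, 'G');
  set CondensedOverrides4 end=eof;
  /* One macro variable per row: Item1, Item2, ... */
  call symputx(cats('Item', _n_), rule, 'G');
run;

/* Hypothetical helper: &&Item&i resolves first to &Item1, then to its value */
%macro show_items;
  %local i;
  %do i = 1 %to &num_or;
    %put Item&i = &&Item&i;
  %end;
%mend show_items;
%show_items
```

The log should show Item1 = alpha, Item2 = beta and Item3 = gamma.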

SAS NOTSORTED Equivalent

I was using the following code to analyze data:
set taq.cq_&yyyymmdd:;
by symbol date time NOTSORTED ex;
There are thousands of datasets I am running the code on, one per day. When &yyyymmdd specifies only one dataset (one day, for example 20130102), it works. However, when I try to run it for multiple datasets (for example, 201301:), SAS returns the following error:
BY NOTSORTED/NOBYSORTED cannot be used with SET statement when
more than one data set is specified.
If I cannot use NOTSORTED here, what is an equivalent statement that I could use?
My understanding of the keyword NOTSORTED is that you use it when the data is not sorted yet. Therefore, do I need to sort it first? How to do it?
I am also confused by the number of variables that NOTSORTED is referencing. Does it only have an effect on "time", or does it affect "symbol, date, time"?
Many thanks!
UPDATE#2:
The rest of the process immediately following the SET statement is as follows (pseudocode, as I don't have permission to post the original code):
Data _quotes;
SET STATEMENT HERE
Change the name of a variable in the dataset (Variable name is EXN).
last.EXN in a if statement. If the condition is satisfied, label EXN.
Drop some variables.
Run;
DATA NEWDATASET (sortedby= SYMBOL DATE TIME index=(SYMBOL)
label="WRDS-TAQ NBBO Data");
SET _quotes;
by symbol date time;
....
Run;

NOTSORTED tells SAS that observations with the same BY values are grouped together (adjacent) but not necessarily in sorted order, so SAS does not check the sort order. The data may never have gone through a PROC SORT, but it is grouped logically by the variables listed in the BY statement.
NOTSORTED applies to all variables in the BY statement, no matter where the keyword appears. Given that question, I suspect you don't fully understand BY group processing.
It's usually a bit dangerous to use, especially if you don't understand BY group processing. If observations that belong to the same group are not adjacent, SAS treats them as separate BY groups and does not produce an error. To be honest, the correct workaround depends on the rest of your process.
I would suggest reviewing the documentation on BY group processing. It's quite in-depth and has lots of samples that illustrate the different types of calculations.
http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n138da4gme3zb7n1nifpfhqv7clq.htm
NOTSORTED is often used in example posts either to avoid a sort or to handle a custom order that is difficult to produce in other ways. Sorting explicitly removes this issue, but you may also be misunderstanding how SAS processes data when a SET statement that lists several data sets is combined with a BY statement. This is called interleaving.
http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n1tgk0uanvisvon1r26lc036k0w7.htm
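A tiny sketch of the "adjacent, not sorted" behaviour, using a hypothetical data set TRADES where symbol A appears in two separate runs:

```sas
/* Hypothetical sample: symbol A appears in two non-adjacent runs */
data trades;
  input symbol $ price;
  datalines;
A 1
A 2
B 3
A 4
;
run;

data groups;
  set trades;
  by symbol notsorted;
  /* FIRST.SYMBOL is 1 whenever SYMBOL differs from the previous row */
  first_flag = first.symbol;
run;
```

FIRST_FLAG comes out as 1, 0, 1, 1, so the last row starts a third BY group instead of joining the first A group, and no error is written. Without NOTSORTED the step would stop at that row because the BY variable is not sorted.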

I suspect that the NOTSORTED keyword is being used to find groups of observations with the same value of EX within the same SYMBOL, DATE, TIME. If you only need to find the first observation of each such group, you can use the LAG() function to calculate the equivalent of the FIRST.EX flag.
data want;
set taq.cq_&yyyymmdd:;
by symbol date time;
/* call LAG on every observation, in its own statement, so its queue stays in step */
prev_ex = lag(ex);
first_ex = first.time or ex ne prev_ex;
drop prev_ex;
run;
Otherwise, you could convert the process to DATA step views and then SET the views together.
data work.view_cq_20130102 / view=work.view_cq_20130102;
set taq.cq_20130102;
by symbol date time ex NOTSORTED;
...
run;
...
data want ;
set work.view_cq_201301: ;
by symbol date time;
...
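To check that a LAG-based flag reproduces FIRST.EX, a sketch on a hypothetical one-day sample (already sorted by SYMBOL DATE TIME) computes both and counts disagreements. The data set names QUOTES and COMPARE and all values are made up.

```sas
/* Hypothetical one-day sample, sorted by symbol date time but not by ex */
data quotes;
  input symbol $ date time ex $;
  datalines;
A 1 1 N
A 1 1 N
A 1 1 P
A 1 1 N
A 1 2 N
B 1 2 N
;
run;

data compare;
  set quotes;
  by symbol date time ex notsorted;
  /* LAG is called on every row so PREV_EX is always the previous row's EX */
  prev_ex = lag(ex);
  first_lag = first.time or ex ne prev_ex;
  /* 1 if the LAG-based flag disagrees with FIRST.EX */
  mismatch = (first_lag ne first.ex);
run;
```

MISMATCH should be 0 on every row. The LAG version needs only BY SYMBOL DATE TIME, which is allowed when several daily data sets are interleaved.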

Subsetting data set in SAS by referencing an external text file

I am working with a data set from the FDA that contains data on reactions to pharmaceutical drugs. I am trying to subset the data by the names of drugs. I have an external text file with the drug names that I am interested in. I want to create a subset of the data comprised of my drugs of interest. My external text file is titled SSRIFULL.txt and the variable name is DRUGNAME. I tried many things that were blatantly wrong, e.g.:
DATA SSRIFULL2;
    SET SSRIFULL;
    If Drugname ~= "P:\APPRENTICESHIP\SSRI_LIST.txt" then delete;
Run;
and I cannot find any literature on the matter directly. Should I look more into the topics on truncover or maybe proc sql? The text file contains a list of ~20 drugs. I am open to some type of inline code as well but for some reason SAS does not like this...
DATA SSRIFULL2;
    SET SSRIFULL;
    IF (AGE >19) OR (AGE = .) Then Delete;
    If (DRUGNAME ~= 'clomipramine' OR 'fluvoxamine' or 'Paxil' or 'paroxetine' or 'Prozac'
        or 'fluoxetine' or 'Seroquel' or 'Wellbutrin' or 'bupropion' or 'Zoloft' or 'sertraline'
        OR 'Zyban') Then Delete;
RUN;
As is probably evident, I do not have a lot of experience with SAS; I am just trying to get this data set usable for analysis at this point.
Thank you for any help in advance

You should consult the SAS documentation to learn the necessary syntax. Your second attempt was pretty close, but this is correct:
DATA SSRIFULL2;
SET SSRIFULL;
IF (AGE >19) OR (AGE = .) Then Delete;
If DRUGNAME not in ('clomipramine' 'fluvoxamine' 'Paxil' 'paroxetine' 'Prozac' 'fluoxetine' 'Seroquel' 'Wellbutrin' 'bupropion' 'Zoloft' 'sertraline' 'Zyban') then delete;
RUN;
Note that matching on the values stored in DRUGNAME is case sensitive, so if, say, the value is 'paxil' and you try to match on 'Paxil', that won't work. You could use the LOWCASE function to deal with this, for example lowcase(drugname) not in ('paxil' 'zoloft').
To implement something like your first attempt, you'll have to read the file in to a SAS dataset and then use that to do the matching in a second step:
data ssri_list;
length drugname $50;
infile 'P:\APPRENTICESHIP\SSRI_LIST.txt';
input drugname $;
run;
proc sql;
create table ssrifull2 as
select * from ssrifull where age between 0 and 19 and drugname in
(select drugname from ssri_list);
quit;
or something like that.
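A self-contained sketch of the two-step approach that keeps only the listed drugs and ignores case on both sides. Both data sets are hypothetical stand-ins: the drug list is read from in-line data instead of the text file, and the rows in SSRIFULL are invented.

```sas
/* Hypothetical stand-in for SSRIFULL */
data ssrifull;
  infile datalines dsd truncover;
  length drugname $50;
  input drugname $ age;
  datalines;
Paxil,15
aspirin,12
ZOLOFT,30
zoloft,18
;
run;

/* Hypothetical drug list; in practice read it with INFILE from the text file */
data ssri_list;
  length drugname $50;
  input drugname $;
  datalines;
paxil
zoloft
;
run;

proc sql;
  create table ssrifull2 as
  select a.*
  from ssrifull as a
  /* BETWEEN excludes missing ages, which sort below 0 */
  where a.age between 0 and 19
    and lowcase(a.drugname) in (select lowcase(drugname) from ssri_list);
quit;
```

SSRIFULL2 should contain the Paxil (15) and zoloft (18) rows: aspirin is not in the list and ZOLOFT is dropped because of the age.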
