I am working with a data set from the FDA that contains data on reactions to pharmaceutical drugs. I am trying to subset the data by the names of drugs. I have an external text file with the drug names that I am interested in. I want to create a subset of the data comprised of my drugs of interest. My external text file is titled SSRIFULL.txt and the variable name is DRUGNAME. I tried many things that were blatantly wrong
i.e.
DATA SSRIFULL2;
---- SET SSRIFULL;
---- If Drugname ~= "P:\APPRENTICESHIP\SSRI_LIST.txt" then delete;
Run;
and I cannot find any literature on the matter directly. Should I look more into the topics on truncover or maybe proc sql? The text file contains a list of ~20 drugs. I am open to some type of inline code as well but for some reason SAS does not like this...
DATA SSRIFULL2;
---SET SSRIFULL;
------IF (AGE >19) OR (AGE = .) Then Delete;
------If (DRUGNAME ~= 'clomipramine' OR 'fluvoxamine' or 'Paxil' or 'paroxetine' or
'Prozac'
------or 'fluoxetine' or 'Seroquel' or 'Wellbutrin' or 'bupropion' or 'Zoloft' or 'sertraline'
------OR 'Zyban') Then Delete;
RUN;
As is probably evident, I do not have a lot of experience with SAS I am just trying to get this data set useable for analysis at this point.
Thank you for any help in advance
You should consult the SAS documentation to learn the necessary syntax. Your second attempt was pretty close, but this is correct:
DATA SSRIFULL2;
SET SSRIFULL;
IF (AGE >19) OR (AGE = .) Then Delete;
If DRUGNAME in ('clomipramine' 'fluvoxamine' 'Paxil' 'paroxetine' 'Prozac' 'fluoxetine' 'Seroquel' 'Wellbutrin' 'bupropion' 'Zoloft' 'sertraline' 'Zyban') then delete;
RUN;
Note that names stored in the variable drugname will be case sensitive, so if, say, the variable is 'paxil' and you try to match on 'Paxil' that won't work. You could use the lowcase function to deal with this.
To implement something like your first attempt, you'll have to read the file in to a SAS dataset and then use that to do the matching in a second step:
data ssri_list;
length drugname $50.;
infile 'P:\APPRENTICESHIP\SSRI_LIST.txt';
input drugname$;
run;
proc sql;
create table ssrifull2 as
select * from ssrifull where 0<=age<19 and drugname not in
(select drugname from ssri_list);
quit;
or something like that.
Related
I have to perform statistical analysis on a file with hundreds of observations and 7 variables(columns)on SAS. I know that it is necessary to insert all the observations after "cards" or "datalines". But I can't write them all obviously. How can I do? Moreover, the given data file already is .sas7bdat.
Then, since (in my case) the multiple correspondence analysis requires only six of the seven variables, does this affect what I have to write in INPUT or/and in CARDS?
You only use CARDS when you're trying to manually write a data set. If you already have a SAS data set (sas7bdat) you can usually use that directly (there are some exceptions but likely don't apply here).
First create a libname to the folder where the file is:
libname myFiles 'path to fodler with sas file';
Then load it into your work library - this is a temporary space that is cleaned up when you're done so no files here are saved permanently.
This copies it over to that library - which is often faster.
data myFileName;
set myFiles.myFileName;
run;
You can just work with the file from that library by referencing it as myFiles.myFileName in your code.
proc means data=myFiles.myFileName;
run;
This should get you started, but you should take the SAS free e-course to understand the basics, it will save you time overall.
Just tell SAS to use the dataset. INPUT statement (and CARDS/DATALINES or INFILE statement) are for reading from text files.
proc corresp data='/my directory/mydataset.sas7bdat' .... ;
...
run;
You could also make a libref that points to the directory and use two level name to reference the dataset.
libname myfiles '/my directory/';
proc corresp data=myfiles.mydataset .... ;
...
run;
I'm working with data that seem to be split into nearly arbitrary sets from year to year. What I would like to do is to be able to start by concatenating all of the .sas7bdat files in a single library. How would I go about this?
Alternatively, if I know all of the possible names that files in the library might be assigned (but many are potentially missing from any given library), how can I get SAS to ignore missing files? For instance, say that I know all of the .sas7bdat files in my library have one of the names "set01", "set02", "set03" or "set04". If a particular library ("L") is missing one of these, then the data step:
DATA temp;
SET L.set01 L.set02 L.set03 L.set04;
RUN;
will produce an error. Assuming that I know that at least one of these exists, is there an option that will tell SAS to ignore the missing ones?
(I understand that these are two totally different questions, but either would solve my immediate problem.)
in SAS there is an easy way for SAS to automatically choose the datasets that start with some common name, you can use following statement:
data temp;
set L.set0: ; /*It will search for all datasets that start with set0 and will set only those which are available*/
run;
Does it answer your query?
Second approach
libname L "Y:\Test Data";
proc sql;
select strip("L."||memname) into :DSNAME separated by ' '
from dictionary.tables
where libname='L';
quit;
/* Main final DS*/
data want;
set &DSNAME;
run;
It will extract all Dataset names in L directory and will create macro variable DSNAME such as : L.set01 L.oth02 etc. , common names won't matter here..
Good day!
I need a list of libraries-tables on a SAS server with a size of each table and last time, when it was open/used.
I'm not very familiar with SAS, so I don't even know where would I start searching :(
I assume, that there is some simple solution, maybe a proc of some sort, that may help...
You can use proc contents to access metadata about a library in SAS, for example using the sashelp library:
proc contents data = sashelp._ALL_ NODS;
run;
sashelp is the library you are refencing. By specifying _ALL_ you ask SAS for data about all the files in this library (by choosing a singular file such as sashelp.ztc you can get information on jut one file).
This will give you a lot of information, so by using the NODS statement you can suppress the output to give you less detail. The above code will give you the number of files, their type, the level, the file size, and the data they were last modified.
If you want to output this information to a dataset, you have to use the ODS output system with the correct ods table name, in this case it is Members. Furthermore, if you're looking for datasets in particular then you can filter the output with a where= statement:
ods output Members = test (where = (memtype = "DATA"));
proc contents data = work._ALL_ NODS noprint;
run;
ods listing; /* change back to listing output*/
I am working with multiple waves of survey data. I have finished defining formats and labels for the first wave of data.
The second wave of data will be different, but the codes, labels, formats, and variable names will all be the same. I do not want to define all these attributes again...it seems like there should be a way to export the PROC CONTENTS information for one dataset and import it into another dataset. Is there a way to do this?
The closest thing I've found is PROC CPORT but I am totally confused by it and cannot get it to run.
(Just to be clear I'll ask the question another way as well...)
When you run PROC CONTENTS, SAS tells you what format, labels, etc. it is using for each variable in the dataset.
I have a second dataset with the exact same variable names. I would like to use the variable attributes from the first dataset on the variables in the second dataset. Is there any way to do this?
Thanks!
So you have a MODEL dataset and a HAVE dataset, both with data in them. You want to create WANT dataset which has data from HAVE, with attributes of MODEL (formats, labels, and variable lengths). You can do this like:
data WANT ;
if 0 then set MODEL ;
set HAVE ;
run ;
This works because when the DATA step compiles, SAS builds the Program Data Vector (PDV) which defines variable attributes. Even though the SET MODEL never executes (because 0 is not true), all of the variables in MODEL are created in the PDV when the step compiles.
Importantly, note that if there are corresponding variables with different lengths, the length from MODEL will determine the length of the variable in WANT. So if HAVE has a variable that is longer than the same-named variable in MODEL, it may be truncated. Options VARLENCHK determines whether or not SAS throws a warning/error if this happens.
That assumes there are no formats/labels on the HAVE dataset. If there is a variable in HAVE that has a format/label, and the corresponding variable in MODEL does not have a format/label, the format/label from HAVE will be applied to WANT.
Sample code below.
data model;
set sashelp.class;
length FavoriteColor $3;
FavoriteColor="Red";
dob=today();
label
dob='BirthDate'
;
format
dob mmddyy10.
;
run;
data have;
set sashelp.class;
length FavoriteColor $10;
dob=today()-1;
FavoriteColor="Orange";
label
Name="HaveLabel"
dob="HaveLabel"
;
format
Name $1.
dob comma.
;
run;
options varlenchk=warn;
data want;
if 0 then set model;
set have;
run;
I'd create an empty dataset based on the existing one, and then use proc append to append the contents to it.
Create some sample data for the second round of data:
data new_data;
age = 10;
run;
Create an empty dataset based on the original data:
proc sql noprint;
create table want like sashelp.class;
quit;
Append the data into the empty dataset, retaining the details from the original:
proc append base=want data=new_data force nowarn;
run;
Note that I've used the force and nowarn options on proc append. This will ensure the data is appended even if differences are found between the two datasets being used. This is expected if you have, for example, format differences. It will also hide things like if columns exist in the new table that aren't in the old table etc. So be careful that this is doing what you want it to. If the behaviour is undesirable, consider using a datastep to append instead (and list the want dataset first).
Welcome to the stack.
If you want to copy the properties of the table without the data within it, you could use PROC SQL or data step with zero rows read in.
This examples copies all information about the SASHELP.CLASS dataset into a brand new dataset. All formats, attributes, labels, the whole thing is copies over. If you want to only copy some of the columns, specify them in select clause instead of asterix.
PROC SQL outobs=0;
CREATE TABLE WANT as SELECT * FROM SASHELP.CLASS;
QUIT;
Regards,
Vasilij
I need to import data from European Social Survey databank to SAS.
I'm not very good at using SAS so I just naively tried importing the text file one gets but it stores it all in one variable.
Can someone maybe help me with what to do? Since there doesn't seem to be a guide on their webpage I reckon it has to be pretty easy.
It's free to register (and takes 5 secs) and I need all possible data for Denmark.
Edit: When downloading what they call a SAS file, what i get is a huge proc format and the same text file as one gets by choosing text.
The data in the text file isn't comma separated and the first row does not contain variable names.
Download it in SAS format. Save the text file in a location you can remember, and open the SAS file. It's not just one big proc format; it's a big proc format followed by a datastep with input code. It was probably created by SPSS (it fits the pattern of an SPSS saved .sas file anyhow). Look for:
DATA OUT.ESS1_4e01_0_F1;
Or something like that (that's what it is when I downloaded it). It's probably about 3/4 of the way down the page. You just need to change the code:
INFILE 'ESS1_4e01_0_F1.txt';
or similar, to be the directory you placed the text file in. Create a LIBNAME for OUT that goes to wherever you want to permanently save this, and do that at the start of the .sas file, replacing the top 3 lines like so.
Originally:
LIBNAME LIBRARY '';
LIBNAME OUT '';
PROC FORMAT LIBRARY=LIBRARY ;
Change these to:
libname out "c:\mystuff\"; *but probably not c:\mystuff :);
options fmtsearch=(out);
proc format lib=out;
Then run the entire thing.
This is the best solution if you want the formatted values (value labels) and variable labels. If you don't care about that, then it might be easier to deal with the CSV like Bob shows.
But the website says yu can download SAS format, why don't you?
You need a delimiter if all goes into one column.
data temp;
length ...;
infile 'file.csv' dlm=',';
input ...;
run;
As Dirk says, the web site says you can download a SAS dataset directly. However, if there's some reason you don't want to do that, choose a comma separated file (CSV) and use PROC IMPORT. Here is an example:
proc import out=some_data
datafile='c:\path\somedata.csv'
dbms=csv replace;
getnames=yes;
run;
Of course, this assumes the first row contains column names.