Problems with SAS and WRDS

I am new to SAS and I have a problem when I connect my computer with SAS to WRDS (Wharton Research Data Services). I want to compute some portfolios, and I am running this code:
*****************************************************************************
Program Description : MOMENTUM PORTFOLIOS OF JEGADEESH AND TITMAN (JF, 1993)
USING MONTHLY RETURNS FROM CRSP
Created by : G. Cici, WRDS
Modified by : R. Moussawi, WRDS
Date Created : November 2004
Date Modified : May 2007
*****************************************************************************;
%let wrds = wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
rsubmit;
*****************************************************************************
1. Specifying Options
*****************************************************************************;
*** NUMBER OF PRIOR MONTHS USED TO CREATE MOMENTUM PORTFOLIOS;
%let J=6; * J can be between 3 to 12 months;
*** HOLDING PERIOD IN MONTHS AFTER PORTFOLIO CREATION;
%let K=6; * K can be between 3 to 12 months;
*** Footnote 4 page 69: 1965-1989 are the dates of portfolio holding periods;
*** BEGINNING SAMPLE PERIOD;
%let begyear=1965;
*** ENDING SAMPLE PERIOD;
%let endyear=1989;
*****************************************************************************
2. Get Historical Exchange Codes and Share Codes for Common Stocks
***************************************************************************** ;
* Merge historical codes with CRSP Monthly Stock File;
proc sql;
create table msex1
as select a.permno, a.date, a.ret, b.exchcd, b.shrcd
from crsp.msf(keep=date permno ret) as a
left join crsp.mseall(keep=date permno exchcd shrcd) as b
on a.permno=b.permno and a.date= b.date;
quit;
I first provide my username and password to connect to WRDS, and then I get the following error message:
Libname CRSP is not assigned
Any idea why this may be happening? Thanks!

The code you want to run remotely must be placed between rsubmit; and endrsubmit;. You are missing endrsubmit;. It seems SAS is trying to run the code locally, where the libref crsp has not been assigned.

Related

How to create counter variable with if condition

I am still a newbie in SAS, so any help is greatly appreciated.
I am trying to create 2 counter variables:
1st one - COUNTTREATMENTVISITS, which increases by 1 whenever the FOLDERNAME variable in my dataset has any value except 'Visit 1 Screening 1', 'Visit 2 Screening 2', 'Visit 17 Safety FU'
2nd one - COUNTETPATIENT, which increases by 1 whenever the DSTERM2 variable in my dataset has any value except 'Complete'
After getting these 2 counter variables right, I want to calculate EEOT_RATE and display it in the output, using the formula: EEOT_RATE=COUNTETPATIENT/(COUNTTREATMENTVISITS/1000)
I have already written some SAS code (using the CluePoints platform for clinical trials), but I can't get past the errors (see the code below):
data RAND1 (keep= CP_PATIENT CP_REGION CP_CENTER RANDYN RANDYN_STD RANDOMIZED_AT
RANDOMIZED_AT_INT);
set data_in.rand;
where RANDYN='Yes';
by FOLDERNAME;
retain COUNTTREATMENTVISITS=0;
if FOLDERNAME NOT in('Visit 1 Screening 1','Visit 2 Screening 2','Visit 17 Safety FU') then
COUNTTREATMENTVISITS+1;
run;
proc sort data = RAND1;
by CP_PATIENT;
run;
data DS1 (keep = CP_PATIENT CP_REGION CP_CENTER ET DSTERM2 DSCONT FOLDERNAME);
set data_in.DS;
retain COUNTETPATIENT=0;
if strip(DSTERM2) NE 'Completed' then COUNTETPATIENT+1;
run;
proc sort data = DS1;
by CP_PATIENT;
run;
data data_out.output;
merge RAND1 (in=a) DS1 (in=b);
by CP_PATIENT;
if a;
if CP_PATIENT='' then delete;
EEOT_RATE=COUNTETPATIENT/(COUNTTREATMENTVISITS/1000);
run;
The SAS syntax for the RETAIN statement has no equals sign:
retain COUNTTREATMENTVISITS 0;
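Strictly speaking, the RETAIN statement is not needed here at all: a sum statement such as COUNTTREATMENTVISITS+1; already retains its variable and initializes it to 0. Also note that the KEEP= lists on RAND1 and DS1 do not include COUNTTREATMENTVISITS or COUNTETPATIENT, so add them there, or the final merge step will have no counters to compute EEOT_RATE from.
See the documentation here: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/p0t2ac0tfzcgbjn112mu96hkgg9o.htm#p115mm1hsepln9n11cxtkk7wejim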
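To check what the two counters and EEOT_RATE should come out to once the syntax is fixed, here is a minimal Python sketch of the same logic (an illustration, not SAS). It assumes that the rows have already been read into plain lists and that each function returns the final value of the running counter. It uses the 'Completed' value from the question's code. The visit names other than the three excluded ones are made up.

```python
# Python illustration of the SAS counter logic; sample rows are made up.
EXCLUDED_FOLDERS = {"Visit 1 Screening 1", "Visit 2 Screening 2", "Visit 17 Safety FU"}


def count_treatment_visits(folder_names):
    """Final value of COUNTTREATMENTVISITS: rows not in the excluded visits."""
    count = 0  # a SAS sum statement also starts its counter at 0
    for name in folder_names:
        if name not in EXCLUDED_FOLDERS:
            count += 1
    return count


def count_et_patients(dsterms, completed="Completed"):
    """Final value of COUNTETPATIENT: rows whose stripped DSTERM2 is not 'Completed'."""
    return sum(1 for term in dsterms if term.strip() != completed)


def eeot_rate(count_et, count_visits):
    """EEOT_RATE = COUNTETPATIENT / (COUNTTREATMENTVISITS / 1000)."""
    return count_et / (count_visits / 1000)


folders = ["Visit 1 Screening 1", "Visit 3 Week 1", "Visit 4 Week 2", "Visit 17 Safety FU"]
terms = ["Completed", "Adverse Event ", "Completed"]
visits = count_treatment_visits(folders)
et = count_et_patients(terms)
print(visits, et, eeot_rate(et, visits))  # 2 1 500.0
```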

Convert SAS DATE and use in a PROC SQL

I'm having problems with dates in SAS Enterprise Guide 7.1 M4.
It's very simple in SQL Server or VBA, but in SAS it is driving me crazy.
Problem:
For some strange reason I'm unable to make a simple SELECT work. I have tried many different forms of formatting and conversion, but none of them seems to work.
My simple SELECT returns no observations.
Description of T1.DT_DATE in PROC CONTENTS:
Type: Num
Len: 8
Format: DDMMYY10.
Informat: DATETIME20.
%let DATE_EXAMPLE='01JAN2019'd;
data _null_;
call symput ('CONVERTED_DATE',put(&DATE_EXAMPLE, ddmmyy10.));
run;
%put &CONVERTED_DATE;
PROC SQL;
CREATE TABLE TEST_SELECT AS
SELECT *
FROM MY_SAMPLE_DATA as T1
WHERE T1.DT_DATE = &CONVERTED_DATE
;QUIT;
Initially you set up the date properly, but you then convert it to a different value that the WHERE clause does not understand. Look at how the two macro variables you created resolve:
%put value of my earlier date value is &DATE_EXAMPLE;
value of my earlier date value is '01JAN2019'd
%put value of my current date value is &CONVERTED_DATE;
value of my current date value is 01/01/2019
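Change your code to use the date literal '01JAN2019'd and it will work. The unquoted value 01/01/2019 does not make sense in the WHERE clause: SAS reads it as the arithmetic expression 01 divided by 01 divided by 2019, which no date value will ever equal.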
PROC SQL;
CREATE TABLE TEST_SELECT AS
SELECT *
FROM MY_SAMPLE_DATA as T1
WHERE T1.DT_DATE = &DATE_EXAMPLE
;QUIT;
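The mismatch is easier to see with numbers. The following Python illustration (not SAS) relies on the fact that SAS stores a date as the number of days since 1 January 1960. It compares what the date literal resolves to with what the unquoted 01/01/2019 evaluates to.

```python
from datetime import date

# SAS stores a date as the number of days since 1 January 1960.
SAS_EPOCH = date(1960, 1, 1)


def sas_date_value(d):
    """Number that a SAS date literal such as '01JAN2019'd resolves to."""
    return (d - SAS_EPOCH).days


literal_value = sas_date_value(date(2019, 1, 1))  # WHERE DT_DATE = '01JAN2019'd
unquoted_value = 1 / 1 / 2019  # WHERE DT_DATE = 01/01/2019 is plain division

print(literal_value)  # 21550
print(round(unquoted_value, 6))  # 0.000495
```

Because DT_DATE holds whole day counts like 21550, comparing it against roughly 0.000495 can never match, which is why the original query returned no observations.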

SAS Macro help to loop monthly sas datasets

I have monthly customer datasets in a SAS library from Jan 2013 onwards, named CUST_JAN2013, CUST_FEB2013, ..., CUST_OCT2017. Each of these monthly datasets is huge, with about 2 million members per month, and has two columns (customer number and customer monthly expenses).
I have one input dataset, Cust_Expense, with customer number and month as columns. This Cust_Expense table has only 250,000 members, and I want to pull the expense data for each member from the SPECIFIC monthly SAS dataset by joining on customer number.
Cust_Expense
------------
Customer_Number Month
111 FEB2014
987 APR2017
784 FEB2014
768 APR2017
.....
145 AUG2017
345 AUG2014
I have tried using CALL EXECUTE, but it loops through each of the 250,000 records of the input dataset (Cust_Expense) and joins each one with the corresponding monthly customer table, which takes far too much time.
Is there a way to read the input table (Cust_Expense) by month, so that I read all customers for a specific month and then read that month's table only ONCE to pull all of its matching records, instead of looping 250,000 times?
Depending on what you want the result to be, you can create one output per month by filtering cust_expense by month and joining it with the corresponding monthly dataset:
%macro want;
proc sql noprint;
select distinct month
into :months separated by ' '
from cust_expense
;
quit;
proc sql;
%do i=1 %to %sysfunc(countw(&months));
%let month=%scan(&months,&i,%str( ));
create table want_&month. as
select *
from cust_expense(where=(month="&month.")) t1
inner join cust_&month. t2
on t1.customer_number=t2.customer_number
;
%end;
quit;
%mend;
%want;
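The idea behind the macro is to group the requests by month first, so that each monthly table is read once rather than once per customer. Here is a minimal Python sketch of that grouping, assuming hypothetical in-memory stand-ins: the requests are (customer, month) pairs, and each monthly table is a dict of customer number to expense.

```python
from collections import defaultdict


def group_by_month(requests):
    """Map each month to the set of customer numbers requested for it."""
    by_month = defaultdict(set)
    for customer, month in requests:
        by_month[month].add(customer)
    return by_month


def pull_expenses(requests, monthly_tables):
    """Read each monthly table once and keep only the requested customers.

    monthly_tables is a hypothetical stand-in for the CUST_<MONTH>
    datasets: {month: {customer_number: expense}}.
    """
    result = []
    for month, customers in sorted(group_by_month(requests).items()):
        table = monthly_tables.get(month, {})
        for customer, expense in table.items():  # one pass per monthly table
            if customer in customers:
                result.append((customer, month, expense))
    return result


requests = [(111, "FEB2014"), (987, "APR2017"), (784, "FEB2014")]
monthly_tables = {
    "FEB2014": {111: 50.0, 784: 20.0, 5: 9.0},
    "APR2017": {987: 70.0, 1: 3.0},
}
print(pull_expenses(requests, monthly_tables))
# [(987, 'APR2017', 70.0), (111, 'FEB2014', 50.0), (784, 'FEB2014', 20.0)]
```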
Or you could produce a single output with one join, by 'unioning' all of the monthly datasets into one and dynamically adding a month column:
%macro want;
proc sql noprint;
select distinct month
into :months separated by ' '
from cust_expense
;
quit;
proc sql;
create table want as
select *
from cust_expense t1
inner join (
%do i=1 %to %sysfunc(countw(&months));
%let month=%scan(&months,&i,%str( ));
%if &i>1 %then union;
select *, "&month." as month
from cust_&month
%end;
) t2
on t1.customer_number=t2.customer_number
and t1.month=t2.month
;
quit;
%mend;
%want;
In either case, I don't really see the point in joining those monthly datasets with the cust_expense dataset. The latter does not seem to hold any information that isn't already present in the monthly datasets.
Your first and best option is to get rid of these separate monthly tables and combine them into one large table with ID and month as the key. Then you can simply join on it and go on your way. Having many separate tables where a data element determines which table a row lives in is never a good idea. Index the combined table on month to make it faster.
If you can't do that, try creating a view that unions all of those tables. That may be faster. SAS might decide to materialize the view, or it might not; if it is extremely slow, check your temporary table space to see whether that is what's happening.
A third option is to use SAS formats. Turn the smaller table into a format with the CNTLIN= option of PROC FORMAT. A single large DATA step can then perform the join:
data want;
set jan feb mar apr ... ;
where put(id,CUSTEXPF1.) = '1';
run;
This makes only one pass through the 250k table and one pass through the monthly tables, plus a very fast format lookup whose cost is negligible in this DATA step, since disk I/O will dominate.
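The format approach amounts to building an in-memory lookup from the small table once, then streaming every monthly row through it in a single pass. Here is a minimal Python sketch of that pattern. Unlike the SAS snippet, which checks only the ID, it assumes the lookup key combines customer number and month. The row tuples are hypothetical stand-ins for SET CUST_JAN2013 CUST_FEB2013 ... ;.

```python
from itertools import chain


def build_lookup(cust_expense):
    """Counterpart of PROC FORMAT CNTLIN=: turn the small table into a hash lookup."""
    return {(customer, month) for customer, month in cust_expense}


def single_pass_join(cust_expense, monthly_rows):
    """Stream all monthly rows once, keeping those found in the lookup.

    monthly_rows is a hypothetical iterable of (customer, month, expense).
    """
    lookup = build_lookup(cust_expense)
    return [row for row in monthly_rows if (row[0], row[1]) in lookup]


jan = [(111, "JAN2014", 10.0), (784, "JAN2014", 4.0)]
feb = [(111, "FEB2014", 50.0), (5, "FEB2014", 9.0)]
wanted = [(111, "FEB2014"), (784, "FEB2014")]
print(single_pass_join(wanted, chain(jan, feb)))  # [(111, 'FEB2014', 50.0)]
```

Set membership is an average constant-time check, so the cost is dominated by reading the monthly rows, which matches the reasoning above.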
I guess you could output your data to month-specific datasets, as in this example:
data test;
infile datalines dsd;
input ID : $2. MONTH $3. ;
datalines;
1,JAN
2,JAN
3,JAN
4,FEB
5,FEB
6,MAR
7,MAR
8,MAR
9,MAR
;
run;
data JAN FEB MAR;
set test;
if MONTH = "JAN" then output JAN;
if MONTH = "FEB" then output FEB;
if MONTH = "MAR" then output MAR;
run;
This avoids looping through all of your IDs (250,000),
and it uses a single SAS DATA step.
At the end you will get 12 datasets containing the related IDs.
In your case, with values such as FEB2014, you would use the SUBSTR function, and the condition in your DATA step becomes:
...
set test;
...
if SUBSTR(MONTH,1,3)="FEB" then output FEB;
...
Regards

How to iteratively run a SAS procedure with different subsets of data?

I would like to repeatedly run PROC REG with different subsets of an existing SAS dataset. Here's a simple example dataset:
DATA data_main;
input trt depth year response;
cards;
1 1 2014 1.1
1 2 2014 1.2
2 1 2014 1.3
2 2 2014 1.4
1 1 2013 2.2
1 2 2013 2.4
2 1 2013 2.6
2 2 2013 2.8
;
run;
For each combination of trt and depth I want to run this procedure, where current_data contains only the observations for the current combination of trt and depth:
PROC REG data = current_data;
model response = year;
run;
And I want to capture the regression coefficients and p-values for all iterations in one dataset or text file.
The number of levels of depth and trt is much greater in my actual dataset, so I'm trying to avoid manually coding each combination. Can someone explain how to do this?
Consider running a macro that iterates through the combinations of trt and depth. The nested loop below re-creates the current_data dataset on each iteration and uses it in the regression procedure, writing a separate results table for each combination. Adjust the loop limits as needed to cover all combinations:
%macro loopregression;
%do j = 1 %to 2; /* TRT values */
%do i = 1 %to 2; /* DEPTH values */
DATA current_data;
SET data_main;
if trt = &j;
if depth = &i;
run;
/* TABLEOUT adds standard errors, t statistics and p-values to the OUTEST= dataset */
PROC REG data = current_data noprint outest=results&i&j tableout;
model response = year;
run;
%end;
%end;
%mend loopregression;
%loopregression;
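As a cross-check of what each loop iteration estimates, here is a minimal Python sketch (not SAS) that fits response = b0 + b1 * year by ordinary least squares separately for each (trt, depth) group of the example data. It reports point estimates only, not the p-values that TABLEOUT adds.

```python
from collections import defaultdict

data_main = [
    (1, 1, 2014, 1.1), (1, 2, 2014, 1.2), (2, 1, 2014, 1.3), (2, 2, 2014, 1.4),
    (1, 1, 2013, 2.2), (1, 2, 2013, 2.4), (2, 1, 2013, 2.6), (2, 2, 2013, 2.8),
]


def ols(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def regress_by_group(rows):
    """Fit response ~ year separately for every (trt, depth) combination."""
    groups = defaultdict(lambda: ([], []))
    for trt, depth, year, response in rows:
        xs, ys = groups[(trt, depth)]
        xs.append(year)
        ys.append(response)
    return {key: ols(xs, ys) for key, (xs, ys) in sorted(groups.items())}


for key, (intercept, slope) in regress_by_group(data_main).items():
    print(key, round(slope, 4))  # (1, 1) -1.1, (1, 2) -1.2, (2, 1) -1.3, (2, 2) -1.4
```

In SAS itself, the same per-group fit can also be done without a macro. Sort data_main by trt and depth, then add BY trt depth; to a single PROC REG step.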

Download Data From TAQ Using SAS

I am trying to download the entire TAQ database on WRDS using SAS. Following is the SAS code given to me by someone from WRDS technical support:
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
%macro taq_daily_dataset_list(type=ctm,begyyyymmdd=20100101,endyyyymmdd=20111231) / des="Autogenerated list of needed Daily TAQ datasets";
%let type=%lowcase(&type);
/* Get SAS date values for date range endpoints */
%let begdate = %sysfunc(inputn(&begyyyymmdd,yymmdd8.));
%let enddate = %sysfunc(inputn(&endyyyymmdd,yymmdd8.));
%do d=&begdate %to &enddate /** For each date in the DATE range */;
%let yyyymmdd=%sysfunc(putn(&d,yymmddn8.));
/*If the corresponding dataset exists, add it to the list */
%if %sysfunc(exist(taqmsec.&type._&yyyymmdd)) %then taqmsec.&type._&yyyymmdd;
%end;
%mend;
* using this macro;
data my_output;
set %taq_daily_dataset_list(type=ctm,begyyyymmdd=20100101,endyyyymmdd=20121231) open=defer;
run;
I tried to run this in SAS, but it gave me an error: "THERE IS NOT A DEFAULT INPUT DATA SET (_LAST_IS_NULL)". I don't know how to use SAS, not even a little. All I want is to download the database.
I would really appreciate it if someone could help me out here.
The code you are running opens a SAS/CONNECT session from your computer to a remote server. I'm assuming the libref TAQMSEC is defined on the server once you connect, but not on your local machine. Because nothing is remote-submitted, the macro runs locally and EXIST() finds none of the TAQMSEC datasets. The macro therefore returns an empty list, and the SET statement has no data set to read, which is what the _LAST_ error is telling you. So you need to "remote submit" the code, which will create the SAS dataset my_output in the server's WORK library. Then you can use PROC DOWNLOAD to copy it to your local machine:
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
RSUBMIT; /* Execute following on server after logging in */
%macro taq_daily_dataset_list(type=ctm,begyyyymmdd=20100101,endyyyymmdd=20111231) / des="Autogenerated list of needed Daily TAQ datasets";
%let type=%lowcase(&type);
/* Get SAS date values for date range endpoints */
%let begdate = %sysfunc(inputn(&begyyyymmdd,yymmdd8.));
%let enddate = %sysfunc(inputn(&endyyyymmdd,yymmdd8.));
%do d=&begdate %to &enddate /** For each date in the DATE range */;
%let yyyymmdd=%sysfunc(putn(&d,yymmddn8.));
/*If the corresponding dataset exists, add it to the list */
%if %sysfunc(exist(taqmsec.&type._&yyyymmdd)) %then taqmsec.&type._&yyyymmdd;
%end;
%mend;
* using this macro;
data my_output;
set %taq_daily_dataset_list(type=ctm,begyyyymmdd=20100101,endyyyymmdd=20121231) open=defer;
run;
/* Download result to your computer */
proc download data=my_output;
run;
ENDRSUBMIT; /* Signals end of processing on remote server */
Any programming statements that appear between the RSUBMIT and ENDRSUBMIT commands are executed on the remote server. Notice that the macro is created and executed by the remote SAS session.
Remember to use the signoff command to disconnect from the server after you retrieve the data you need.
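To see why the SET statement ends up empty when the macro runs in the wrong place, here is a minimal Python mirror of the list-building macro (an illustration, not SAS). The exists predicate stands in for the SAS EXIST() function and is an assumption of this sketch. The "server catalogue" set is hypothetical.

```python
from datetime import datetime, timedelta


def taq_daily_dataset_list(exists, type_="ctm", beg="20100101", end="20111231"):
    """Python mirror of the SAS macro: names of daily datasets that exist.

    `exists` stands in for the SAS EXIST() function (an assumption of
    this sketch); it receives a name such as 'taqmsec.ctm_20100104'.
    """
    type_ = type_.lower()
    day = datetime.strptime(beg, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    names = []
    while day <= last:  # one candidate dataset per calendar day
        name = f"taqmsec.{type_}_{day:%Y%m%d}"
        if exists(name):
            names.append(name)
        day += timedelta(days=1)
    return names


# Hypothetical server catalogue with two daily datasets.
server = {"taqmsec.ctm_20100104", "taqmsec.ctm_20100105"}
print(taq_daily_dataset_list(server.__contains__, beg="20100101", end="20100131"))
# ['taqmsec.ctm_20100104', 'taqmsec.ctm_20100105']

# Run locally, where TAQMSEC is not assigned: nothing exists, so the list is
# empty and the SET statement would have no data set to read.
print(taq_daily_dataset_list(lambda name: False))  # []
```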
I don't speak SAS so I can't comment on your code, but I don't recognize "taqmsec" as one of the main files. The Consolidated Trades data is held in files of the form taq.CT_YYYYMMDD and the Consolidated Quotes files are taq.CQ_YYYYMMDD. The first available date for these is 19930104.
Back when I had an account, I wrote some Python scripts to automate the process of downloading data in bulk from WRDS: https://github.com/jbrockmendel/pywrds
The scripts that try to auto-setup SSH keys are untested (please send me a note if you want to help me test/fix them), but the core is well-tested. Assuming you have an account and key-based authentication set up, you can run:
import pywrds
# Download the TAQ Consolidated Trades (TAQ_CT) file for 1993-06-12.
# y = [num_files, num_rows, paramiko_ssh, paramiko_sftp, time_elapsed]
y = pywrds.get_wrds('taq.ct', 1993, 6, 12)
# Loop over all available dates to download in bulk.
# The script is moderately smart about picking up
# unfinished loops where they left off.
# y = [num_files, time_elapsed]
y = pywrds.wrds_loop('taq.ct')
# Find out what the darn names of the available TAQ files are.
# y = [file_list, paramiko_ssh, paramiko_sftp]
y = pywrds.find_wrds('taq')
The files start at a few tens of MB apiece in 1993 and grow to ~1 GB apiece for taq.ct and >5GB for taq.cq. Standard WRDS accounts limit your storage space to 1 GB, so trying to query all of, say, taq.cq_20050401 will put a truncated file in your directory. pywrds.get_wrds breaks up these big queries and loops over smaller files, then recombines them after they have all downloaded.
Caution: wrds_loop also deletes these files from your directory on the server after downloading them. It also runs rm wrds_export*, since all of the SAS files it uploads begin with "wrds_export". Make sure you don't have anything else following the same pattern.
The same commands also work with Compustat (comp.fundq, comp.g_fundq, ...), CRSP (crsp.msf, crsp.dsf, ...), OptionMetrics (optionm.opprcd1996, optionm.opprcd1997, ...), IBES, TFN, ...
# Also works with other WRDS datasets.
# The day, month, and year arguments are optional.
# Get the OptionMetrics pricing file for March 1993
y = pywrds.get_wrds('optionm.opprcd', 1993, 3)
# Get the Compustat Fundamentals Quarterly file for 1997
y = pywrds.get_wrds('comp.fundq', 1997)
# Get the CRSP Monthly Stock File for all available years
y = pywrds.get_wrds('crsp.msf')