Download Data From TAQ Using SAS

I am trying to download the entire TAQ database from WRDS using SAS. Following is the SAS code given to me by WRDS technical support:
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
%macro taq_daily_dataset_list(type=ctm,begyyyymmdd=20100101,endyyyymmdd=20111231) / des="Autogenerated list of needed Daily TAQ datasets";
%let type=%lowcase(&type);
/* Get SAS date values for date range endpoints */
%let begdate = %sysfunc(inputn(&begyyyymmdd,yymmdd8.));
%let enddate = %sysfunc(inputn(&endyyyymmdd,yymmdd8.));
%do d=&begdate %to &enddate /** For each date in the DATE range */;
%let yyyymmdd=%sysfunc(putn(&d,yymmddn8.));
/*If the corresponding dataset exists, add it to the list */
%if %sysfunc(exist(taqmsec.&type._&yyyymmdd)) %then taqmsec.&type._&yyyymmdd;
%end;
%mend;
* using this macro;
data my_output;
set %taq_daily_dataset_list(type=ctm,begyyyymmdd=20100101,endyyyymmdd=20121231) open=defer;
run;
I tried to run this in SAS, but it gave me the error "THERE IS NOT A DEFAULT INPUT DATA SET (_LAST_IS_NULL)". I don't know how to use SAS, not even a little. All I want is to download the database.
I would really appreciate it if someone could help me out here.

The code you are running sets up a SAS/CONNECT session from your computer to a remote server. Once you connect, I'm assuming the libname TAQMSEC is defined on the server. So, my guess is you need to "remote submit" the code (which will create the SAS dataset my_output in the server's WORK library). Then you can use PROC DOWNLOAD to copy it to your local machine:
%let wrds=wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
RSUBMIT; /* Execute following on server after logging in */
%macro taq_daily_dataset_list(type=ctm,begyyyymmdd=20100101,endyyyymmdd=20111231) / des="Autogenerated list of needed Daily TAQ datasets";
%let type=%lowcase(&type);
/* Get SAS date values for date range endpoints */
%let begdate = %sysfunc(inputn(&begyyyymmdd,yymmdd8.));
%let enddate = %sysfunc(inputn(&endyyyymmdd,yymmdd8.));
%do d=&begdate %to &enddate /** For each date in the DATE range */;
%let yyyymmdd=%sysfunc(putn(&d,yymmddn8.));
/*If the corresponding dataset exists, add it to the list */
%if %sysfunc(exist(taqmsec.&type._&yyyymmdd)) %then taqmsec.&type._&yyyymmdd;
%end;
%mend;
* using this macro;
data my_output;
set %taq_daily_dataset_list(type=ctm,begyyyymmdd=20100101,endyyyymmdd=20121231) open=defer;
run;
/* Download result to your computer */
proc download data=my_output;
run;
ENDRSUBMIT; /* Signals end of processing on remote server */
Any programming statements that appear between the RSUBMIT and ENDRSUBMIT commands are executed on the remote server. Notice that the macro is created and executed by the remote SAS session.
Remember to use the signoff command to disconnect from the server after you retrieve the data you need.
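If you would rather land the result in a permanent local library than in your local WORK library, a minimal sketch (the C:\taq_data path is only an assumption) would be:
* local destination library;
libname mytaq 'C:\taq_data';
rsubmit;
/* PROC DOWNLOAD runs on the server, but OUT= names a dataset in your local session */
proc download data=my_output out=mytaq.my_output;
run;
endrsubmit;
signoff; /* disconnect from the WRDS server */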

I don't speak SAS so I can't comment on your code, but I don't recognize "taqmsec" as one of the main files. The Consolidated Trades data is held in files of the form taq.CT_YYYYMMDD and the Consolidated Quotes files are taq.CQ_YYYYMMDD. The first available date for these is 19930104.
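For completeness, a rough, untested sketch that points the same autogenerated-list idea at that naming (run it inside the RSUBMIT block; the 1993 date range is only an example):
%macro taq_ct_list(begyyyymmdd=19930104, endyyyymmdd=19930430);
%local d yyyymmdd;
%do d=%sysfunc(inputn(&begyyyymmdd,yymmdd8.)) %to %sysfunc(inputn(&endyyyymmdd,yymmdd8.));
%let yyyymmdd=%sysfunc(putn(&d,yymmddn8.));
/* add each existing daily Consolidated Trades dataset to the list */
%if %sysfunc(exist(taq.ct_&yyyymmdd)) %then taq.ct_&yyyymmdd;
%end;
%mend;
data my_ct;
set %taq_ct_list() open=defer;
run;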
Back when I had an account, I wrote some Python scripts to automate the process of downloading data in bulk from WRDS: https://github.com/jbrockmendel/pywrds
The scripts that try to auto-setup SSH keys are untested (please send me a note if you want to help me test/fix them), but the core is well-tested. Assuming you have an account and key-based authentication set up, you can run:
import pywrds
# Download the TAQ Consolidated Trades (TAQ_CT) file for 1993-06-12.
# y = [num_files, num_rows, paramiko_ssh, paramiko_sftp, time_elapsed]
y = pywrds.get_wrds('taq.ct', 1993, 6, 12)
# Loop over all available dates to download in bulk.
# The script is moderately smart about picking up
# unfinished loops where they left off.
# y = [num_files, time_elapsed]
y = pywrds.wrds_loop('taq.ct')
# Find out what the darn names of the available TAQ files are.
# y = [file_list, paramiko_ssh, paramiko_sftp]
y = pywrds.find_wrds('taq')
The files start at a few tens of MB apiece in 1993 and grow to ~1 GB apiece for taq.ct and >5GB for taq.cq. Standard WRDS accounts limit your storage space to 1 GB, so trying to query all of, say, taq.cq_20050401 will put a truncated file in your directory. pywrds.get_wrds breaks up these big queries and loops over smaller files, then recombines them after they have all downloaded.
Caution: wrds_loop also deletes these files from your directory on the server after downloading them. It also runs rm wrds_export*, since all of the SAS files it uploads begin with "wrds_export". Make sure you don't have anything else following the same pattern.
The same commands also work with Compustat (comp.fundq, comp.g_fundq, ...), CRSP (crsp.msf, crsp.dsf, ...), OptionMetrics (optionm.optionm_opprcd1996, optionm.opprcd1997,...), IBES, TFN, ...
# Also works with other WRDS datasets.
# The day, month, and year arguments are optional.
# Get the OptionMetrics pricing file for March 1993
y = pywrds.get_wrds('optionm.opprcd', 1993, 3)
# Get the Compustat Fundamentals Quarterly file for 1997
y = pywrds.get_wrds('comp.fundq', 1997)
# Get the CRSP Monthly Stock File for all available years
y = pywrds.get_wrds('crsp.msf')

Related

How to download a file from web and assign to the certain folder by using SAS

Good morning.
I have tried to download a zip file from a website and save it to a specific location.
The location I want to put it in is
S:\Projects\
Method 1:
My first attempt is below.
DATA _null_ ;
x 'start https://yehonal.github.io/DownGit/#/home?url=https:%2F%2Fgithub.com%2FCSSEGISandData%2FCOVID-19%2Ftree%2Fmaster%2Fcsse_covid_19_data%2Fcsse_covid_19_daily_reports';
RUN ;
With Method 1, I can download the file, but it automatically goes to my Downloads folder.
Method 2:
Then I found this way.
filename out "S:\Projects\csse_covid_19_daily_reports.zip";
proc http
url='https://yehonal.github.io/DownGit/#/home?url=https:%2F%2Fgithub.com%2FCSSEGISandData%2FCOVID-19%2Ftree%2Fmaster%2Fcsse_covid_19_data%2Fcsse_covid_19_daily_reports'
method="get" out=out;
run;
But the code is not working; it doesn't download anything.
How can I download the file from the web and save it to a specific location?
I would recommend a macro in this case, driven by CALL EXECUTE: define a macro that imports one file, then generate a call to it for each date via CALL EXECUTE. This took about a minute running on SAS OnDemand for Academics (the free cloud service).
*set start date for files;
%let start_date = 01-22-2020;
*macro to import data;
%macro importFullData(date);
*file name reference;
filename out "/home/fkhurshed/WANT/&date..csv";
*file to download;
%let download_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/&date..csv";
proc http url=&download_url
method="get" out=out;
run;
*You can add in data import/append steps here as well as necessary;
%mend;
%importFullData(&start_date.);
data importAll;
start_date=input("&start_date", mmddyy10.);
*runs up to previous day;
end_date=today() - 1;
do date=start_date to end_date;
formatted_date=put(date, mmddyyd10.);
str=catt('%importFullData(', formatted_date, ');');
call execute(str);
end;
run;
When viewed in a browser, that URL uses JavaScript to construct a zip file that is then automatically downloaded. PROC HTTP does not run JavaScript, so it cannot retrieve that ultimate response (the constructed zip file), which is why you get the 404 message.
The list of files in the repository can be obtained as JSON from the URL
https://api.github.com/repos/CSSEGISandData/COVID-19/contents/csse_covid_19_data/csse_covid_19_daily_reports
The listing data contains the download_url for each csv file.
A download_url will look like
https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-22-2020.csv
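A rough sketch of reading that listing with PROC HTTP and the JSON libname engine (the dataset and column names the engine produces are an assumption and may need adjusting):
filename resp temp;
proc http
url='https://api.github.com/repos/CSSEGISandData/COVID-19/contents/csse_covid_19_data/csse_covid_19_daily_reports'
method="get" out=resp;
run;
* flatten the JSON listing into datasets;
libname gh json fileref=resp;
data file_list;
set gh.root; /* one row per item in the folder */
where lowcase(scan(name,-1,'.')) = 'csv';
keep name download_url; /* download_url can feed a later PROC HTTP call */
run;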
You can download individual files with SAS per @Reeza, or
use git commands or the SAS GIT* functions to download the whole repository
(AFAIK git archive, which could fetch only a specific subfolder of a repository, is not surfaced by the GitHub server), or
use svn commands to download just that folder from the git repository;
this requires svn to be installed (https://subversion.apache.org/); I used SlikSvn.
Example:
Make some series plots of a response variable by date from the downloaded data after importing and stacking it.
options noxwait xsync xmin source;
* use svn to download all files in a subfolder of a git repository;
* local folder for storing data from
* COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University;
%let covid_data_root = c:\temp\csse;
%let rc = %sysfunc(dcreate(covid,&covid_data_root));
%let download_path = &covid_data_root\covid;
%let repo_subdir_url = https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports;
%let svn_url = %sysfunc(tranwrd(&repo_subdir_url, tree/master, trunk));
%let os_command = svn checkout &svn_url "&download_path";
/*
* uncomment this block to download the (data) files from the repository subfolder;
%sysexec %superq(os_command);
*/
* codegen and execute the PROC IMPORT steps needed to read each csv file downloaded;
libname covid "&covid_data_root.\covid";
filename csvlist pipe "dir /b ""&download_path""";
data _null_;
infile csvlist length=l;
length filename $200;
input filename $varying. l;
if lowcase(scan(filename,-1,'.')) = 'csv';
out = 'covid.day_'||translate(scan(filename,1,'.'),'_','-');
/*
* NOTE: Starting 08/11/2020 FIPS data first starts appearing after a few hundred rows.
* Thus the high GuessingRows
*/
template =
'PROC IMPORT file="#path#\#filename#" replace out=#out# dbms=csv; ' ||
'GuessingRows = 5000;' ||
'run;';
source_code = tranwrd (template, "#path#", "&download_path");
source_code = tranwrd (source_code, "#filename#", trim(filename));
source_code = tranwrd (source_code, "#out#", trim(out));
/* uncomment this line to import each data file */
*call execute ('%nrstr(' || trim (source_code) || ')');
run;
* memname is always uppercase;
proc contents noprint data=covid._all_ out=meta(where=(memname like 'DAY_%'));
run;
* compute variable lengths for LENGTH statement;
proc sql noprint;
select
catx(' ', name, case when type=2 then '$' else '' end, maxlen)
into
:lengths separated by ' '
from
( select name, min(type) as type, max(length) as maxlen, min(varnum) as minvarnum, max(varnum) as maxvarnum
from meta
group by name
)
order by minvarnum, maxvarnum
;
quit;
* stack all the individual daily data;
data covid.all_days;
attrib date length=8 format=mmddyy10.;
length &lengths;
set covid.day_: indsname=dsname;
date = input(substr(dsname,11),mmddyy10.);
format _character_; * reset length based formats;
informat _character_; * reset length based informats;
run ;
proc sort data=covid.all_days out=us_days;
where country_region = 'US';
by province_state admin2 date;
run;
ods html gpath='.' path='.' file='covid.html';
options nobyline;
proc sgplot data=us_days;
where province_state =: 'Cali';
*where also admin2=: 'O';
by province_state admin2;
title "#byval2 County, #byval1";
series x=date y=confirmed;
xaxis valuesformat=monname3.;
label province_state='State' admin2='County';
label confirmed='Confirmed (cumulative?)';
run;
ods html close;
options byline;
Plots

Running SAS code on multiple files with DO LOOP and a MACRO

I am very new to SAS and have just received my first work assignment. Basically, I need to pull all of the patient IDs (patid) and procedure codes (proc_cd) from multiple SAS files and put them into an Excel file.
From my research, I believe I need a macro with a DO loop that will run this search for all of the files.
Below is the code I've put together. Again I'm very new to SAS so any help will be appreciated!
libname sas 'P:\H3.2018.DH_StressQuery\dat';
libname optum 'C:\OPTUM Data\Zip5';
data libname.filename;
set libname.filename;
%MACRO LOOP * I don't know what to put here.
%DO i = 1 %TO
("zip5_r2018q1.sas7bdat","16.2GB","Sas7bdat","C:\OPTUM
Data\Zip5\zip5_r2018q1.sas7bdat","11Jul2018:20:07:01"
)
(data sas.query file;
set optum.zip5_m2007q1
(keep = patid, Proc_Cd);
if Proc_Cd = '94621');
proc print data= data.query file
%END;
%MEND LOOP;
%LOOP;
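For what it's worth, a minimal sketch of the loop the question describes; the quarterly dataset naming, year range, and output path are assumptions based on the snippet above, and PROC EXPORT to xlsx assumes SAS/ACCESS to PC Files is licensed:
libname optum 'C:\OPTUM Data\Zip5';
%macro pull_procs(begyear=2007, endyear=2018);
%local yr qtr;
%do yr=&begyear %to &endyear;
%do qtr=1 %to 4;
%if %sysfunc(exist(optum.zip5_m&yr.q&qtr)) %then %do;
data one_qtr;
set optum.zip5_m&yr.q&qtr (keep=patid Proc_Cd); /* KEEP= uses spaces, not commas */
where Proc_Cd = '94621';
run;
proc append base=all_procs data=one_qtr force;
run;
%end;
%end;
%end;
%mend pull_procs;
%pull_procs()
proc export data=all_procs
outfile='P:\H3.2018.DH_StressQuery\dat\stress_query.xlsx'
dbms=xlsx replace;
run;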

Problems with SAS and WRDS

I am new to SAS and I have a problem when connecting from my computer to WRDS (Wharton Research Data Services). I want to compute some portfolios and I am running this code.
*****************************************************************************
Program Description : MOMENTUM PORTFOLIOS OF JEGADEESH AND TITMAN (JF, 1993)
USING MONTHLY RETURNS FROM CRSP
Created by : G. Cici, WRDS
Modified by : R. Moussawi, WRDS
Date Created : November 2004
Date Modified : May 2007
*****************************************************************************;
%let wrds = wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
rsubmit;
*****************************************************************************
1. Specifying Options
*****************************************************************************;
*** NUMBER OF PRIOR MONTHS USED TO CREATE MOMENTUM PORTFOLIOS;
%let J=6; * J can be between 3 to 12 months;
*** HOLDING PERIOD IN MONTHS AFTER PORTFOLIO CREATION;
%let K=6; * K can be between 3 to 12 months;
*** Footnote 4 page 69: 1965-1989 are the dates of portfolio holding periods;
*** BEGINNING SAMPLE PERIOD;
%let begyear=1965;
*** ENDING SAMPLE PERIOD;
%let endyear=1989;
*****************************************************************************
2. Get Historical Exchange Codes and Share Codes for Common Stocks
***************************************************************************** ;
* Merge historical codes with CRSP Monthly Stock File;
proc sql;
create table msex1
as select a.permno, a.date, a.ret, b.exchcd, b.shrcd
from crsp.msf(keep=date permno ret) as a
left join crsp.mseall(keep=date permno exchcd shrcd) as b
on a.permno=b.permno and a.date= b.date;
quit;
First I provide my username and password to connect to WRDS, and then it gives an error message that reads as follows:
Libname CRSP is not assigned
Any idea why this may be happening? Thanks!
The code you submit to be run remotely needs to be sandwiched between rsubmit; and endrsubmit;. You are missing endrsubmit;, so SAS is trying to run the code locally, where the libname crsp has not been assigned.
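A sketch of the corrected structure (the PROC DOWNLOAD step at the end is an addition, assuming you want the result back on your machine):
%let wrds = wrds.wharton.upenn.edu 4016;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
rsubmit; /* everything from here to endrsubmit runs on the WRDS server */
proc sql;
create table msex1
as select a.permno, a.date, a.ret, b.exchcd, b.shrcd
from crsp.msf(keep=date permno ret) as a
left join crsp.mseall(keep=date permno exchcd shrcd) as b
on a.permno=b.permno and a.date=b.date;
quit;
proc download data=msex1 out=work.msex1; /* copy the result to your local session */
run;
endrsubmit;
signoff;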

Repeatedly running a data step if URL fails to load

I am creating a dataset using filename URL web submissions. However, in some instances I keep getting '502' responses from the server. To get around this I would like to use some conditional logic inside a macro. I'm most of the way there, but I can't quite get the end bit to work. The idea is that the macro, which is nested within other nested macros, will keep retrying this one submission until it gets a dataset that doesn't have 0 observations, then move on:
%macro test_exst;
filename loader url "http://finance.yahoo.com/d/quotes.csv?s=&svar1.+&svar2.+&svar3.+&svar4.+&svar5.+&svar6.+&svar7.+&svar8.+&svar9.+&svar10.+
&svar11.+&svar12.+&svar13.+&svar14.+&svar15.+&svar16.+&svar17.+&svar18.+&svar19.+&svar20.+
&svar21.+&svar22.+&svar23.+&svar24.+&svar25.+&svar26.+&svar27.+&svar28.+&svar29.+&svar30.+
&svar31.+&svar32.+&svar33.+&svar34.+&svar35.+&svar36.+&svar37.+&svar38.+&svar39.+&svar40.+
&svar41.+&svar42.+&svar43.+&svar44.+&svar45.+&svar46.+&svar47.+&svar48.+&svar49.+&svar50.+
&svar51.+&svar52.+&svar53.+&svar54.+&svar55.+&svar56.+&svar57.+&svar58.+&svar59.+&svar60.+
&svar61.+&svar62.+&svar63.+&svar64.+&svar65.+&svar66.+&svar67.+&svar68.+&svar69.+&svar70.+
&svar71.+&svar72.+&svar73.+&svar74.+&svar75.+&svar76.+&svar77.+&svar78.+&svar79.+&svar80.+
&svar81.+&svar82.+&svar83.+&svar84.+&svar85.+&svar86.+&svar87.+&svar88.+&svar89.+&svar90.+
&svar91.+&svar92.+&svar93.+&svar94.+&svar95.+&svar96.+&svar97.+&svar98.+&svar99.+&svar100.+
&svar101.+&svar102.+&svar103.+&svar104.+&svar105.+&svar106.+&svar107.+&svar108.+&svar109.+&svar110.+
&svar111.+&svar112.+&svar113.+&svar114.+&svar115.+&svar116.+&svar117.+&svar118.+&svar119.+&svar120.+
&svar121.+&svar122.+&svar123.+&svar124.+&svar125.+&svar126.+&svar127.+&svar128.+&svar129.+&svar130.+
&svar131.+&svar132.+&svar133.+&svar134.+&svar135.+&svar136.+&svar137.+&svar138.+&svar139.+&svar140.+
&svar141.+&svar142.+&svar143.+&svar144.+&svar145.+&svar146.+&svar147.+&svar148.+&svar149.+&svar150.+
&svar151.+&svar152.+&svar153.+&svar154.+&svar155.+&svar156.+&svar157.+&svar158.+&svar159.+&svar160.+
&svar161.+&svar162.+&svar163.+&svar164.+&svar165.+&svar166.+&svar167.+&svar168.+&svar169.+&svar170.+
&svar171.+&svar172.+&svar173.+&svar174.+&svar175.+&svar176.+&svar177.+&svar178.+&svar179.+&svar180.+
&svar181.+&svar182.+&svar183.+&svar184.+&svar185.+&svar186.+&svar187.+&svar188.+&svar189.+&svar190.+
&svar191.+&svar192.+&svar193.+&svar194.+&svar195.+&svar196.+&svar197.+&svar198.+&svar199.+&svar200.
&f=&&fvar&a." DEBUG ;
/* data step based on filename url above goes here, each pass will give 500 metrics x 1 symbol dataset*/
%put create dataset from csv submission;
data temp_&I._&&fvar&a.;
infile loader length=len MISSOVER /*delimiter = ','*/;
/* input record $varying8192. len; */
input record $varying30. len;
format record $30.;
informat record $30.;
run;
data _null_;
dsid=open("temp_&I._&&fvar&a.");
obs=attrn(dsid,"nobs");
put "number of observations = " obs;
if obs = 0 then stop;
else;
filename loader url "http://finance.yahoo.com/d/quotes.csv?s=&svar1.+&svar2.+&svar3.+&svar4.+&svar5.+&svar6.+&svar7.+&svar8.+&svar9.+&svar10.+
&svar11.+&svar12.+&svar13.+&svar14.+&svar15.+&svar16.+&svar17.+&svar18.+&svar19.+&svar20.+
&svar21.+&svar22.+&svar23.+&svar24.+&svar25.+&svar26.+&svar27.+&svar28.+&svar29.+&svar30.+
&svar31.+&svar32.+&svar33.+&svar34.+&svar35.+&svar36.+&svar37.+&svar38.+&svar39.+&svar40.+
&svar41.+&svar42.+&svar43.+&svar44.+&svar45.+&svar46.+&svar47.+&svar48.+&svar49.+&svar50.+
&svar51.+&svar52.+&svar53.+&svar54.+&svar55.+&svar56.+&svar57.+&svar58.+&svar59.+&svar60.+
&svar61.+&svar62.+&svar63.+&svar64.+&svar65.+&svar66.+&svar67.+&svar68.+&svar69.+&svar70.+
&svar71.+&svar72.+&svar73.+&svar74.+&svar75.+&svar76.+&svar77.+&svar78.+&svar79.+&svar80.+
&svar81.+&svar82.+&svar83.+&svar84.+&svar85.+&svar86.+&svar87.+&svar88.+&svar89.+&svar90.+
&svar91.+&svar92.+&svar93.+&svar94.+&svar95.+&svar96.+&svar97.+&svar98.+&svar99.+&svar100.+
&svar101.+&svar102.+&svar103.+&svar104.+&svar105.+&svar106.+&svar107.+&svar108.+&svar109.+&svar110.+
&svar111.+&svar112.+&svar113.+&svar114.+&svar115.+&svar116.+&svar117.+&svar118.+&svar119.+&svar120.+
&svar121.+&svar122.+&svar123.+&svar124.+&svar125.+&svar126.+&svar127.+&svar128.+&svar129.+&svar130.+
&svar131.+&svar132.+&svar133.+&svar134.+&svar135.+&svar136.+&svar137.+&svar138.+&svar139.+&svar140.+
&svar141.+&svar142.+&svar143.+&svar144.+&svar145.+&svar146.+&svar147.+&svar148.+&svar149.+&svar150.+
&svar151.+&svar152.+&svar153.+&svar154.+&svar155.+&svar156.+&svar157.+&svar158.+&svar159.+&svar160.+
&svar161.+&svar162.+&svar163.+&svar164.+&svar165.+&svar166.+&svar167.+&svar168.+&svar169.+&svar170.+
&svar171.+&svar172.+&svar173.+&svar174.+&svar175.+&svar176.+&svar177.+&svar178.+&svar179.+&svar180.+
&svar181.+&svar182.+&svar183.+&svar184.+&svar185.+&svar186.+&svar187.+&svar188.+&svar189.+&svar190.+
&svar191.+&svar192.+&svar193.+&svar194.+&svar195.+&svar196.+&svar197.+&svar198.+&svar199.+&svar200.
&f=&&fvar&a." DEBUG ;
data temp_&I._&&fvar&a.;
infile loader length=len MISSOVER /*delimiter = ','*/;
/* input record $varying8192. len; */
input record $varying30. len;
format record $30.;
informat record $30.;
run;
run;
%mend;
%test_exst;
The idea here is: try the URL submission, create a dataset from it, and check that the number of observations is not zero. If it isn't zero, end the macro. If it is zero, resubmit the same filename URL statement and create the dataset from it again. Keep doing this until the server responds, then exit the macro and move on to the rest of the code.
I haven't got as far as running this code in anger yet. I'm guessing the filename URL will work fine, but I suspect the attempt to create a dataset inside the DATA _NULL_ step right at the end is what is making it fall over. Any ideas?
Thanks
Without getting into the specifics of your project, a good way to approach this generally is with recursion.
%macro test_me(iter);
%let iter=%eval(&iter.+1);
data my_data;
infile myfilename;
input stuff;
call symputx("obscount",_n_); /* record how many records were read (not executed if the file is empty) */
run;
%if &obscount=1 and &iter. < 10 %then %do;
%put Iteration &iter. failed, trying again;
%test_me(&iter.);
%end;
%mend test_me;
%test_me(0);
It checks to see if it worked, and if it did not, it calls itself again, with a maximum iteration count to make sure you don't end up in infinite-loop land if the server is down or some such. You also might put a delay in there if the server has a maximum call frequency or any other rules the API requires you to follow.
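For the delay, the SLEEP function works; here is the same macro with a pause added (the five-second wait is arbitrary, and the explicit second argument of 1 keeps the unit in seconds across platforms):
%macro test_me(iter);
%let iter=%eval(&iter.+1);
data my_data;
infile myfilename;
input stuff;
call symputx("obscount",_n_);
run;
%if &obscount=1 and &iter. < 10 %then %do;
%put Iteration &iter. failed, waiting and trying again;
%let rc = %sysfunc(sleep(5,1)); /* pause 5 seconds before the next attempt */
%test_me(&iter.);
%end;
%mend test_me;
%test_me(0);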

SAS -> Shell DB2 Passthrough and macro resolving

I'm trying to automate a job that involves a lot of data being passed across the network between the actual DB2 server and our SAS server. What I'd like to do is take a traditional pass-through...
proc sql;
connect to db2(...);
create table temp as
select * from connection to db2(
select
date
.......
where
date between &start. and &stop.
); disconnect from db2;
quit;
into something like this:
x "db2 'insert into temp select date ...... where date between &start. and &stop.'";
I'm running into a few issues, the first of which is the date literal format 'ddMONyyyy'd, whose quotes cause the shell command to terminate early. If I can get around that, I think it should work.
I can pass a macro variable through to the AIX (SAS) server without the extra set of ' ' needed to execute the db2 command.
Any thoughts?
You might get around the single-quote issue with the dates by setting off the WHERE clause with parentheses. I'm not sure that this will work, but it might be worth trying.
As for the X command, try something like the following:
%let start = '01jan2011'd;
%let stop = '31dec2011'd;
%let command_text = db2 %nrbquote(')insert into temp select date ... where (date between &start. and &stop.)%nrbquote(');
%put command_text = &command_text;
x "&command_text";