CSV file import using a filename statement - SAS

I want to read a CSV file using a filename statement in SAS, but Excel already included the variable names as the first line of the file. When I list the variable names in an input statement, there is going to be an error. How can I deal with this situation?
filename outdata "C:\Users\Xiang\Desktop\crime2005.csv";
data crime;
infile outdata dlm="," dsd ;
run;
proc means mean std maxdec=1 ;
run;
proc print;
run;

First off - you're confusing things a bit by saying 'via the filename statement'. This is done via a data step. The filename statement happens to be a relatively small component of this.
Second, let's get this into proper SAS indenting so we can see what's going on:
filename outdata "C:\Users\Xiang\Desktop\crime2005.csv";

data crime;
    infile outdata dlm="," dsd;
    input [your-variable-list];
run;

proc means data=crime mean std maxdec=1;
run;

proc print data=crime;
run;
Data steps and procs end with RUN (except for procs that end with QUIT). Each of these is a separate step, so always include the RUN. Always include DATA=, unless you're using some fancy programming trick. DATA always goes in the first column, not indented - the data step is the master statement, not the filename statement.
These habits make your code readable and protect you from mistakes. Readable code is important even if you work alone; it means that five years from now you'll still understand what you wrote today.
Your original question - how do I avoid the errors from the header row?
filename outdata "C:\Users\Xiang\Desktop\crime2005.csv";

data crime;
    infile outdata dlm="," dsd firstobs=2;
    input [your-variable-list];
run;
There you go. FIRSTOBS=2 tells SAS to skip the first line [i.e., the header row].
One thing you might try is a PROC IMPORT. PROC IMPORT with DBMS=CSV will do something really handy for you - it will put in the log a complete data step with all of the code to read the file in yourself. So while I don't actually recommend PROC IMPORT for production code [as it often makes poor decisions as to character/numeric formatting and lengths, among other things], it is very helpful to see how to get started with an input statement.
proc import file=outdata out=crime dbms=csv replace;
run;
Then look at your log, and copy that code out (removing line numbers); now you can modify it to your heart's content.
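For illustration, the code PROC IMPORT writes to the log looks roughly like this (the variable names below are hypothetical - they come from whatever is in your header row):

data WORK.CRIME;
    infile OUTDATA delimiter=',' missover dsd lrecl=32767 firstobs=2;
    informat State $20. ;
    informat ViolentCrime best32. ;
    format State $20. ;
    format ViolentCrime best32. ;
    input
        State  $
        ViolentCrime
    ;
run;

Fix up the informats and lengths where the guesses are wrong, and you have a clean, explicit data step to keep.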


Correct Syntax for Filename Statement using a dataset variable

I'm a SAS newbie and I don't seem to be able to work out the correct syntax for this. I have a dataset (fileList) with one field (projFile) which holds a filename. I wish to open the file and read the contents into a second field that will be created. The file is zipped (it's a SAS-EG project file) and so I'm told that I should use Filename statement with the zip option to read the file. However, no matter how I reference projFile it doesn't like it.
data fileList;
    set fileList;
    filename inzip zip "&projFile" member="project.xml";
    infile inzip;
    input fileContent $char2000.;
    output;
run;
I may also have the input statement wrong, but until I can get past this issue, I don't know. Thanks.
If you are always reading the same file (member) from the ZIP file you can use the FILEVAR= option on the infile statement to switch which ZIP file you are reading that member from.
So say I have three ZIP files, each containing a file named example.txt, and a dataset like this with the list of ZIP filenames:
data fnames;
    input filename $80.;
cards;
c:\downloads\file1.zip
c:\downloads\file2.zip
c:\downloads\file3.zip
;
Then I can use that dataset to drive the creation of a new dataset that has the information from those files.
data test;
    set fnames;
    fname=filename;
    infile in zip filevar=fname member='example.txt' end=eof truncover;
    do while (not eof);
        input line $100.;
        output;
    end;
run;
If the driving dataset has the list of members in the ZIP file to read then you can use the MEMVAR= option on the INFILE statement also.
data members;
    infile cards dsd dlm='|' truncover;
    input filename :$80. memname :$80.;
cards;
c:\downloads\file1.zip|example.txt
c:\downloads\file2.zip|example.txt
c:\downloads\file3.zip|example.txt
;

data test;
    set members;
    filevar=filename;
    memvar=memname;
    infile in zip filevar=filevar memvar=memvar end=eof truncover;
    do while (not eof);
        input line $100.;
        output;
    end;
run;
There are a few issues here.
First - you probably shouldn't use data filelist; set filelist; if you're doing something like this. Make a new dataset.
Second - filename is not executable; it is declarative (a global statement). You can place it inside the data step, but you shouldn't, and for precisely this reason: it makes you think it's doing something inside the data step. It isn't. It takes effect once, when the step is compiled, and then the data step runs afterward, even when the statement is placed inside it (there's a small demo of this after this list).
Third - you aren't using infile properly, but that's really a consequence of the second point. You need the FILEVAR= option on infile to let it read from a different file on each observation.
Fourth - you probably don't really want to just read arbitrarily from the project.xml. Really this whole thing is probably not what you want to do... I've done what you're doing, and it's doable, but not this way. But that's probably a bigger question.
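To see the second point in action, here's a small demo (the fileref name demo is arbitrary). Even though the FILENAME statement sits inside a DO block that can never execute, it still takes effect, because it's processed when the step is compiled:

data _null_;
    if 0 then do;
        filename demo temp;   /* global statement: handled at compile time, not at run time */
    end;
run;

%put %sysfunc(pathname(demo));   /* the fileref was assigned anyway - this prints its temporary path */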
If this were to work, what you'd do is this:
filename a zip "c:\doesntmatter.egp" member="project.xml";

data files;
    length fname $255;
    infile datalines truncover;
    input #1 fname $255.;
datalines;
c:\myfile.egp
c:\myfile2.egp
c:\myfile3.egp
;;;;
run;
data egp;
    set files;
    infile a filevar=fname pad truncover;
    input #1 first_line $512. @;
    put first_line;
run;
The filename statement doesn't really do anything, but I show you where it would go. You see the filevar on the infile statement - that points to the fname variable on files. Then it reads in from there.
My general suggestion is that you should probably use the xml libname engine here; figure out what you want to do on a per-xml basis, write that out as a macro, then call the macro for each line in the file name dataset (using call execute probably, or if you must, dosubl). You don't have to use the xml libname engine, but it'll simplify things most likely.
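A minimal sketch of that driver pattern, assuming a hypothetical macro %process_egp that holds whatever per-project logic you settle on (the XML work itself is only indicated by a comment):

%macro process_egp(zipfile);
    /* hypothetical per-project work goes here, e.g. filename ZIP + the XML libname engine */
    %put NOTE: would process &zipfile. here;
%mend process_egp;

/* one macro call per row of the filename dataset */
data _null_;
    set files;
    call execute(cats('%nrstr(%process_egp)(', fname, ')'));
run;

The %nrstr wrapper keeps the macro from executing in the middle of the CALL EXECUTE step; the generated calls run after the driving data step finishes.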
If you're only using one file, then you can specify it directly in the filename statement I showed above, and just use infile with that filename (infile a; here, but please call it something more sensible than a). But again, it's silly to read it in this way - use the libname engine, as it'll parse out the xml for you.
Edited to remove incorrect information, as confirmed by Tom's answer. Even though this approach does work, I don't recommend using infile here - read it with the libname engine; it'll save you loads of time.

SAS - Input all variables in a data step without naming every variable

How does one input all variables/columns within a data step using INPUT but without naming every variable? This can be done by naming each variable, for example:
DATA dataset;
    INFILE '/folders/myfolders/file.txt';
    INPUT variable1 variable2 variable3 variable4 $ variable5;
RUN;
However, this is very tedious for large datasets containing 200+ variables.
The original question implied that you already had a SAS data set. In that case all variables are automatically included when you SET the dataset.
data copy;
    set '/folders/myfolders/file.sas7bdat';
run;
Or just reference it in the analysis you want to do.
proc means data='/folders/myfolders/file.sas7bdat';
run;
If you actually have a TEXT file and you want to read it into a SAS dataset you could use PROC IMPORT to guess what is in the file. If it has a header row then proc import will try to convert those into valid variable names. It will also try to guess how to define the variables based on what values it sees in the text file.
proc import out=want datafile='/folders/myfolders/file.txt' dbms=dlm;
    delimiter=',';
run;
Or if the issue is that it is too hard to create 200 unique variable names, you could just use a variable list with numeric suffixes to save a lot of typing.
DATA dataset;
    INFILE '/folders/myfolders/file.txt' dsd;
    length var1-var200 $20;
    input var1-var200;
RUN;
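If the columns are a mix of numeric and character, a variable list plus an informat list still saves most of the typing. A sketch, assuming (hypothetically) 150 numeric columns followed by 50 character columns:

data dataset;
    infile '/folders/myfolders/file.txt' dsd;
    input (num1-num150) (:best32.) (chr1-chr50) (:$20.);
run;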

SAS DS2 put statement output to a file

I have a SAS DS2 program which prints output to the screen from a data step, as below; I would like to channel the output to a file. AFAIK, FILE is a missing feature in DS2, since DS2 currently reads and writes to tables. Can someone please let me know how to stream the output from a DS2 program to a file? Thanks.
put '********* Body (html contents from a stream) **********/';
put _body;
Regards,
AKS
DS2 does not have file I/O capabilities beyond SAS datasets, SQL tables, and similar targets. DS2 mostly exists to enable easy connectivity to Hadoop, Teradata, and some other Big Data targets; hopefully it will be expanded in the future, but it's still quite young right now and not very widely used.
That said, there are some ... creative workarounds. One possible answer involves using the log. This isn't a great solution, but it does work.
Basically: turn off NOTES and SOURCE, use PROC PRINTTO to redirect the log to the file you want, PUT to the log, then redirect back and turn the options back on. Warnings and errors still go to the log, so make sure you don't produce any of those.
This is definitely not a great solution for production code. For production code, I would highly recommend writing to a SQL table or SAS dataset and then generating your output with an old fashioned data step. DS2 isn't currently intended for this sort of thing. It's possible packages will be written to do this more helpfully in the future even if the language is not expanded to have this functionality; the JSON package is a good place to look to begin with, though I think it does not have the functionality right now.
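A rough sketch of that recommended route, with made-up table and file names: have the DS2 program OUTPUT its lines to a work table, then let an ordinary data step handle the file I/O.

proc ds2;
    data work.body_out (overwrite=yes);
        dcl varchar(32767) line;
        method init();
            line = '********* Body (html contents from a stream) **********';
            output;
            /* output further rows here for the _body content */
        end;
    enddata;
run;
quit;

/* a plain data step writes the table to a file */
data _null_;
    set work.body_out;
    file 'c:\temp\body.html';
    put line;
run;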
Here's an example of using the log (a highly contrived one):
proc sql;
    select name into :namestr separated by ' '
        from sashelp.class;
    select age into :agestr separated by ' '
        from sashelp.class;
quit;

%let namestr = %str(%')&namestr%str(%');
%let agestr = %str(%')&agestr%str(%');

options nonotes nosource;
proc printto log="c:\temp\testds2.txt" new;
run;

proc ds2;
    data _null_;
        method init();
            dcl int rc;
            dcl nvarchar(15) name;
            dcl int age;
            dcl double iter;
            dcl nvarchar(1000) namestr;
            dcl nvarchar(500) agestr;

            name = '';
            age = .;
            namestr = &namestr.;
            agestr = &agestr.;

            do iter = 1 to countw(namestr);
                name = scan(namestr,iter);
                age = scan(agestr,iter);
                put name age;
            end;
        end;
    enddata;
    run;
quit;

proc printto; run;
options notes source;

Stop SAS macro from overwriting different imported CSV files as the same SAS dataset

I found a macro and have been using it to import datasets that are given to me in CSV format. Now I need to edit it, because my files have an ID number in their names and I want the SAS datasets to keep the same names.
The CSVs are named things like IDSTUDY233_first.csv, so I want the SAS dataset to be IDSTUDY233_first. It should appear in my WORK library.
I thought it would just create a SAS dataset for each CSV, named IDSTUDY233_first or something like that (and so on for each additional study). However, it names them this way:
IDSTUDY_FIRST
and overwrites itself for every ID. I am newer to macros and have been trying to figure out WHY it does this and how to fix it. Suggestions?
%let subdir=Y:\filepath\; *MACRO VARIABLE FOR FILEPATH;
filename dir "&subdir.*.csv "; *give the file the name from the path that your at whatever the csv is named;

data new; *create the dataset new it has all those filepath names csv names;
    length filename fname $ 200;
    infile dir eof=last filename=fname;
    input;
    last: filename=fname;
run;

proc sort data=new nodupkey; *sort but don't keep duplicate files;
    by filename;
run;

data null; *create the dataset null;
    set new;
    call symputx(cats('filename',_n_),filename); *call the file name for this observation n;
    call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka')); *call the dataset for this file compress then read the file;
    call symputx('nobs',_n_); *call for the number of observations;
run;

%put &nobs.; *but each observation in;

%macro import; *start the macro import;
    %do i=1 %to &nobs; *Do for each fie to number of observations;
        proc import datafile="&&filename&i" out=&&dsn&i dbms=csv replace;
            getnames=yes;
        run;
    %end;
%mend import;

%import
*call import macro;
As you can see, I added comments with my understanding. Like I said, macros are new to me, so I may be incorrect in my understanding. I am guessing the problem is either in
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));
or in the import statement, probably out=&&dsn&i, since it rapidly overwrites the previous SAS files until it has done every one. It's just that I need all the SAS files, not just the last one.
My guess is that you are right, it is to do with this line:
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));
The gotcha is in the arguments passed to compress. Compress can be used to remove or keep certain characters in a string. In the above example, they are using it to just keep alphabetic characters by passing in the 'ka' modifiers. This is effectively causing files with different names (because they have different numbers) to be treated as the same file.
You can modify this behaviour to keep alphabetic characters, digits, and the underscore character by changing the parameters from ka to kn.
This change does mean that you also need to make sure that none of your file names begin with a number (as SAS dataset names can't begin with a number).
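A quick way to see the difference (the path below is just an example):

data _null_;
    fname = 'Y:\filepath\IDSTUDY233_first.csv';
    dsn_ka = compress(scan(fname,-2,'\.'), ,'ka');   /* IDSTUDYfirst - digits and underscore stripped */
    dsn_kn = compress(scan(fname,-2,'\.'), ,'kn');   /* IDSTUDY233_first - digits and underscore kept */
    put dsn_ka= dsn_kn=;
run;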
The documentation for the compress function is here:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
An easy way to debug this would be to take the data step with all of the call symput statements and, in addition to storing these values in macro variables, write them to variables in the dataset. Also change it from a data _null_ to a data tmp statement. You can then see for each file what the destination table name will be.
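For example, a sketch of that debugging version of the step from the question:

data tmp; /* a real dataset instead of data null, so you can inspect it */
    set new;
    dsn = compress(scan(filename,-2,'\.'), ,'kn'); /* keep the derived dataset name as a variable */
    call symputx(cats('filename',_n_), filename);
    call symputx(cats('dsn',_n_), dsn);
    call symputx('nobs', _n_);
run;

A quick PROC PRINT of tmp then shows, for each CSV, exactly what the destination table name will be.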

Exporting SAS Data into FTP using SAS

I want to export a SAS dataset from SAS to an FTP server. I can export a CSV file (or TXT file) using the following code:
%macro export_to_ftp(dsn= ,outfile_name= );
    Filename MyFTP ftp "&outfile_name."
        HOST='ftp.site.com'
        cd= "&DATA_STRM/QC"
        USER=&ftp_user.
        PASS=&ftp_pass.;

    PROC EXPORT DATA= &dsn. OUTFILE= MyFTP DBMS=%SCAN(&outfile_name.,2,.) REPLACE;
    RUN;

    filename MyFTP clear;
%mend;
%export_to_ftp(dsn=lib1.dataset ,outfile_name=dataset.csv);
But I couldn't use it to export a SAS dataset. Can anybody please help me?
Thank you!
PROC EXPORT is not used to export SAS datasets, it's used to convert SAS datasets to other formats. You normally wouldn't use the FTP filename method to transfer SAS datasets; you would either use SAS/CONNECT if you are intending to transfer from one SAS machine to another (if you license SAS/CONNECT and want help with this, please say so), or use normal (OS) FTP processes to transfer the file. It is technically possible to use the FTP filename method to transfer a SAS file (as a binary file, reading then writing byte-by-byte) but that's error-prone and overly complicated.
The best method if you're using SAS to drive the process is to write an FTP script in your OS and call that using X or %SYSEXEC, passing the filename as an argument. If you include information about your operating system, something could easily be drawn up to help you out.
Note: if you're on a server, you need to verify that you have 'x' permission; that's often locked down. If you do not, you may not be able to run this entirely from SAS.
As Joe says, you do not use PROC EXPORT to create a file to be transferred using FTP. The safest way to exchange SAS datasets is to use PROC CPORT to create a transport file. Here is a modified version of your original macro:
%macro export_to_ftp(dsn= ,outfile_name= );
    %let DBMS=%UPCASE(%SCAN(&outfile_name.,2,.));
    %if &DBMS ne CSV and &DBMS ne TXT and &DBMS ne CPT %then %do;
        %put &DBMS is not supported.;
        %goto getout;
    %end;

    %if &DBMS=CPT %then %do;
        filename MyFTP ftp "&outfile_name."
            HOST='ftp.site.com'
            cd= "&DATA_STRM/QC"
            USER=&ftp_user.
            PASS=&ftp_pass.
            rcmd='binary';
        PROC CPORT DATA= &dsn.
            FILE = MyFTP;
        RUN;
    %end;
    %else %do;
        filename MyFTP ftp "&outfile_name."
            HOST='ftp.site.com'
            cd= "&DATA_STRM/QC"
            USER=&ftp_user.
            PASS=&ftp_pass.
            rcmd='ascii';
        PROC EXPORT DATA= &dsn.
            OUTFILE= MyFTP
            DBMS= &dbms REPLACE;
        RUN;
    %end;

    filename MyFTP clear;
%getout:
%mend;
%export_to_ftp(dsn=lib1.dataset ,outfile_name=dataset.csv);
%export_to_ftp(dsn=lib1.dataset ,outfile_name=dataset.cpt);
By convention, this will use a file extension of cpt to identify that you want a SAS transport file created. Whoever receives the file would use PROC CIMPORT to convert the file back to a SAS dataset:
filename xpt 'path-to-transport-file';
proc cimport data=dataset infile=xpt;
run;
filename xpt clear;
Note that SAS transport files should be transferred as binary files; the other two formats are text files; hence the different filename statements.
One of many advantages of using PROC CPORT is that the entire data set is copied, including any indexes that may exist. Also, you are protected against problems related to using the data set on operating systems different from the one that created it.
What the OP needs is a binary transfer to the FTP server. That can be done via a data step.
/*ftp dir stream connection*/
filename ftpput ftp "<full name of your file with ext>" cd='<DIR>' user='<username>'
    pass='<password>' host='<ftp host>' recfm=s debug;
/*local file*/
filename myfile '/path/to/your/library/dsfile.sas7bdat' recfm=n;

/*Binary Transfer -- recfm=n*/
data _null_;
    n=1;
    infile myfile nbyte=n;
    input;
    file ftpput;
    put _infile_ @@;
run;
You can tweak the input/put statements to your specifications. But otherwise, this works for me.
I guess you can also try the FCOPY function with the above filerefs. That should work too, with binary transfers.
options nonotes; /*do not want the data step or ftp server notes*/

data _null_;
    fcop=fcopy("myfile","ftpput");
    if fcop = 0 /*Success code*/ then do;
        put '|Successfully copied src file to FTP!|';
    end;
    else do;
        msg=sysmsg();
        put fcop= msg=;
    end;
run;

options notes;
Looking at your code, you seem to have forgotten the quotes (") around &ftp_user. and &ftp_pass.
Otherwise your code looks okay to me.
If that does not do the trick, some error message would come in handy.
Also note that your use of SCAN to determine the DBMS is tricky: what if a future filename has (multiple) dots in it? You are better off passing -1 (the part after the last dot) instead of 2 (the second part) as the parameter to your SCAN function.
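For example (the filename here is hypothetical):

%let outfile_name = my.dataset.v2.csv;
%put ext via 2:  %scan(&outfile_name.,2,.);    /* dataset - not the extension */
%put ext via -1: %scan(&outfile_name.,-1,.);   /* csv - the actual extension */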
I would actually do it the other way around: export your file locally and then upload it with the FTP program. Something like this (note I used CSV, but use whatever file format you need). You may need to edit it a little more, but the base logic is there:
%macro export_to_ftp(dsn= ,outfile_name= );
    /* export to a local file first */
    PROC EXPORT DATA= &dsn. OUTFILE= "c:\&outfile_name." DBMS=%SCAN(&outfile_name.,2,.) REPLACE;
    RUN;

    /* write a small FTP command script, then run it */
    filename MyFTP "c:\FTP_command.bat";
    data _null_;
        file MyFTP;
        put "ftp &ftp_user.:&ftp_pass.@ftp.site.com";
        put "cd &DATA_STRM./QC";
        put "put c:\&outfile_name.";
    run;
    x "c:\FTP_command.bat";
%mend;
%export_to_ftp(dsn=lib1.dataset ,outfile_name=dataset.csv);