Is there a method to make the first delimiter in an observation different to the rest? In Microsoft SQL Server Integration Services (SSIS), there is an option to set the delimiter per column. I wonder if there is a similar way to achieve this in SAS with an amendment to the below code, whereby the first delimiter would be tab instead and the rest pipe:
proc export
dbms=csv
data=mydata.dataset1
outfile="E:\OutPutFile_%sysfunc(putn("&sysdate9"d,yymmdd10.)).txt"
replace
label;
delimiter='|';
run;
For example
From:
var1|var2|var3|var4
to
var1 var2|var3|var4
...Where the large space between var1 and var2 is a tab.
Many thanks in advance.
Sounds like you just want to make a new variable that has the first two variables combined and then write that out using tab delimiter.
data fix ;
length new1 $50 ;
set have ;
new1=catx('09'x,var1,var2);
drop var1 var2 ;
run;
proc export data=fix ... delimiter='|' ...
Note that you can reference a variable in the DLM= option on the FILE statement in a data step.
data _null_;
dlm='09'x ;
file 'outfile.txt' dsd dlm=dlm ;
set have ;
put var1 @ ; /* trailing @ holds the line so the next PUT continues it */
dlm='|' ;
put var2-var4 ;
run;
Or you could use the catx() trick in a data _null_ step. You might also want to use the vvalue() function to ensure that formats are applied.
data _null_;
length newvar $200;
file 'outfile.txt' dsd dlm='|' ;
set have ;
newvar = catx('09'x,vvalue(var1),vvalue(var2));
put newvar var3-var4 ;
run;
Update: fixed the order of the delimiters to match the question.
Final code based on the marked answer by Tom:
data _null_;
dlm='09'x ;
file "E:\outputfile_%sysfunc(putn("&sysdate9"d,yymmdd10.)).txt" dsd dlm=dlm ;
set work.have;
put
var1 @ ;
dlm='|';
put var2 var3 var4;
run;
Related
I noticed in the SAS log that when I call a proc export data=mydata outfile="csv.csv" dbms=csv replace; run;, I get a generated internal set which declares a comma data format: comma20.3.
138 format YEAR best12. ;
145 format RATE_SPREAD comma20.3 ;
How can I get proc export not to do this, and to export without comma separators? Eg 9000 instead of 9,000?
Unfortunately PROC EXPORT does not support the FORMAT statement.
You could make a view to the original data with the format removed and export that.
data for_export / view=for_export;
set mydata;
format rate_spread ;
run;
proc export data=for_export outfile="csv.csv" dbms=csv replace;
run;
But you really don't need to use PROC EXPORT to write a CSV file. A data step works just as well. You might have to do a little work to add the header row.
proc transpose data=mydata(obs=0) out=names ;
var _all_;
run;
data _null_;
file "csv.csv" dsd ;
set names;
put _name_ @;
run;
data _null_;
file "csv.csv" dsd mod ;
set mydata;
put (_all_) (+0);
format rate_spread ;
run;
This is a follow-up of my previous question:
How to import a txt file with single quote mark in a variable and another in another variable.
The solution there works perfectly unless one of the variables can contain null values.
In this latter case, I get:
filename sample 'c:\temp\sample.txt';
data _null_;
file sample;
input;
put _infile_;
datalines;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data prova;
infile sample dlm='|' lrecl=50 truncover;
format
codice $3.
could_be_null $20.
nome $20.
luogo $20.
importo 4.
;
input
codice
could_be_null
nome
luogo
importo
;
putlog _infile_;
run;
proc print;
run;
Is it possible to correctly load a file like the one in the example directly in SAS, without manually modifying the original .txt?
You will need to pre-process the file to fix the issue.
If you add quotes around the values then you will not have the problem.
002||"'80S WERE GREAT"|"FORLI'"|1100
If you know that none of the values contain the delimiter, then adding a space before every delimiter
002 | |'80S WERE GREAT |FORLI' |1100
will let you read it without the DSD option.
If lines are shorter than 32K bytes then it can be done in the same step that reads the data.
data test2 ;
infile sample dlm='|' truncover ;
input @; /* load the line into _INFILE_ and hold it */
_infile_ = tranwrd(_infile_,'|',' |');
input (var1-var5) (:$40.);
run;
proc print;
run;
Results:
Obs  var1  var2                         var3             var4    var5
  1  001   This variable could be null  PROVA            MILANO  1000
  2  002                                '80S WERE GREAT  FORLI'  1100
  3  003                                '80S WERE GREAT  ROMA    1110
One way to test if you have the issue is to make sure each line has the right number of fields.
filename sample temp;
options parmcards=sample;
parmcards;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data _null_;
infile sample dsd end=eof;
if eof then do;
call symputx('nfound',nfound);
putlog / 'Found ' nfound :comma11.
'problem lines out of ' _n_ :comma11. 'lines.'
;
end;
input;
retain expect nfound;
words=countw(_infile_,'|','qm');
if _n_=1 then expect=words;
else if expect ne words then do;
nfound+1;
if nfound <= 10 then do;
putlog (_n_ expect words) (=) ;
list;
end;
end;
run;
Example Results:
_N_=2 expect=5 words=4
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
2 002||'80S WERE GREAT|FORLI'|1100 32
_N_=3 expect=5 words=3
3 003||'80S WERE GREAT|ROMA|1110 30
Found 2 problem lines out of 4 lines.
PS Go tell SAS to enhance their delimited file processing: https://communities.sas.com/t5/SASware-Ballot-Ideas/Enhancements-to-INFILE-FILE-to-handle-delimited-file-variations/idi-p/435977
You need to add the DSD option to your INFILE statement.
https://support.sas.com/techsup/technote/ts673.pdf
DSD (delimiter-sensitive data) option—Specifies that SAS should treat
delimiters within a data value as character data when the delimiters
and the data value are enclosed in quotation marks. As a result, SAS
does not split the string into multiple variables and the quotation
marks are removed before the variable is stored. When the DSD option
is specified and SAS encounters consecutive delimiters, the software
treats those delimiters as missing values. You can change the default
delimiter for the DSD option with the DELIMITER= option.
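To see the two behaviours the technote describes, here is a minimal sketch using in-line data. The data set name dsd_demo, the variables a, b and c, and the sample rows are made up for illustration and do not come from the question.

```sas
/* Hypothetical sample data: names and values are illustrative only. */
data dsd_demo;
  infile datalines dsd dlm='|' truncover;
  input a :$10. b :$10. c :$10.;
datalines;
x||z
"p|q"|r|s
;
run;

proc print data=dsd_demo;
run;
```

On the first row b should be missing, because DSD reads the two adjacent pipes as an empty field. On the second row a should hold p|q with the quotation marks removed. DSD also treats single quotes as quoting characters, which is why a value such as '80S WERE GREAT needs extra care.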
I have an existing process that imports data from a flat file with no headers. There are hundreds of columns. The provider of the file has added several hundred more columns at different points within the existing columns. I have a list of the old and new column names and SAS code that properly sets the data types for the old columns but not the new ones. I'd rather not have to go through my existing import code and manually write column headers and data formats but I'm not sure how to use these parts to get new import code for the new headers.
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
informat oldcol1 best32.;
informat oldcol2 mmddyy10.;
informat oldcolN $60.;
format oldcol1 best32.;
format oldcol2 mmddyy10.;
format oldcolN $60.;
input
oldcol1
oldcol2
oldcolN $;
run;
I have the header information in an Excel file right now.
old K010H K010I K010J K020A
new K010H K010I K010J K010L K010M K010N K020A
Based on your description, I presume you either know or will find out the informats for the new columns also. If that is the case, why don't you auto generate the code to read the file?
Since you have the header information, assuming you can modify it to the following format and save as a CSV:
var infmt
K010H best32.
K010I mmddyy10.
K010J $60.
K010L best32.
K010M mmddyy10.
K010N $60.
K020A best32.
Then something like this would automatically generate the code and read the data for you:
proc import datafile="cols.csv" out=cols replace;
run;
proc sql;
select var into :cols separated by ' ' from cols ;
select infmt into :infmts separated by ' ' from cols ;
quit;
%macro gen_code;
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) ~= %str());
%let col = %scan(&cols, &ii, %str( ));
%let infmt = %scan(&infmts, &ii, %str( ));
informat &col &infmt ;
%let ii = %eval(&ii + 1);
%end;
input
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) NE %str());
%let col = %scan(&cols, &ii, %str( ));
&col
%let ii = %eval(&ii + 1);
%end;
;
run;
%mend;
%gen_code;
In the future, you can just modify your header CSV file and the code will take care of the rest.
If you have a machine readable data dictionary then you can generate the code from that. Otherwise you will need to just edit your data step. While you are at it you can clean it up so that it is easier to maintain.
First, use LENGTH or ATTRIB to define the variables, instead of forcing SAS to guess. Second, only attach informats or formats to variables that need them. For example, there is no need to attach informats to normal strings or numbers, and no need to attach a $xx format to character variables. Do you really need to attach the BEST32. format to numbers, instead of letting SAS display numeric variables that have no format attached using the default BEST12. format?
Third, if you define the variables in the order they appear, then you can use a positional variable list in the INPUT statement. Then you only have to change the INPUT statement if the first or last variable changes.
So for your example you might create a data step like this instead.
data raw_file;
infile "flatfile.csv" dlm="|" truncover dsd firstobs=1;
length
oldcol1 8
oldcol2 8
oldcolN $60
;
informat oldcol2 mmddyy10.;
format oldcol2 mmddyy10.;
input oldcol1 -- oldcolN ;
run;
Then adding new variables is as simple as inserting them into the right place in the LENGTH statement and, when needed, adding them to the INFORMAT and/or FORMAT statements. If you don't know what the variables contain, then define them as character strings, look at the resulting values, and decide later whether you need to define them differently.
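As a rough sketch of that last suggestion, the step below inserts one new column of unknown content as a character variable. It uses in-line data so that it runs on its own. The column types, the $200 length, the data set name raw_demo, and the sample row are all assumptions for illustration, since the real file layout is not shown in the question.

```sas
/* Hypothetical layout: K010L is the newly added column. */
data raw_demo;
  infile datalines dlm='|' dsd truncover;
  length
    K010H 8
    K010I 8
    K010J $60
    K010L $200   /* new column, contents unknown: read as text for now */
    K020A 8
  ;
  informat K010I mmddyy10.;
  format K010I mmddyy10.;
  input K010H -- K020A;   /* positional list: unchanged by the insertion */
datalines;
1|01/31/2020|abc|something new|42
;
run;

proc print data=raw_demo;
run;
```

Because the INPUT statement names only the first and last variables, inserting K010L required no change to it.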
I'm new to SAS and am studying different ways to do the task in the subject line.
Here are the two ways I know at the moment:
Method1: file statement in data step
*DATA _NULL_ / FILE / PUT ;
data _null_;
set engappeal;
file 'C:\Users\1502911\Desktop\exportdata.txt' dlm=',';
put id $ name $ semester scoreEng;
run;
Method2: Proc Export
proc export
data = engappeal
outfile = 'C:\Users\1502911\Desktop\exportdata2.txt'
dbms = dlm;
delimiter = ',';
run;
Question:
1. Is there any alternative way to export raw data files?
2. Is it possible to export the header as well using the data step in method 1?
You can also make use of ODS
ods listing file="C:\Users\1502911\Desktop\exportdata3.txt";
proc print data=engappeal noobs;
run;
ods listing close;
You need to use the DSD option on the FILE statement to make sure that delimiters are properly quoted and missing values are not represented by spaces. Make sure you set your record length long enough, including delimiters and inserted quotes. Don't worry about setting it too long as the lines are variable length.
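As a small sketch of what DSD changes on output, the steps below write one record to a temporary file and echo it to the log. The fileref out, the variable names, and the values are hypothetical.

```sas
filename out temp;

data _null_;
  file out dsd dlm=',' lrecl=32767;
  length name city town $20;
  name = 'Smith, John';   /* contains the delimiter */
  city = ' ';             /* missing character value */
  town = 'Milan';
  put name city town;
run;

/* Echo the file to the log to inspect the result. */
data _null_;
  infile out;
  input;
  putlog _infile_;
run;
```

With DSD the value that contains a comma should be written in double quotes, and the missing value should appear as two adjacent commas rather than as a space.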
You can use CALL VNEXT to find and output the names. The LINK statement lets the loop sit later in the data step, which keeps __NAME__ out of the (_ALL_) variable list.
data _null_;
set sashelp.class ;
file 'class.csv' dsd dlm=',' lrecl=1000000 ;
if _n_ eq 1 then link names;
put (_all_) (:);
return;
names:
length __name__ $32;
do while(1);
call vnext(__name__);
if upcase(__name__) eq '__NAME__' then leave;
put __name__ @;
end;
put;
return;
run;
I have a data set that I am uploading to SAS. There are always 4 variables in the exact same order. The problem is that sometimes the variables have slightly different names.
For example, the first variable is user. The next day, when I get the same data set, it might be userid. So I cannot use rename(user=my_user).
Is there any way I could refer to the variables by their order? Something like this:
rename(var_order_1=my_user) ;
rename(var_order_3=my_inc) ;
rename _ALL_=x1-x4 ;
There are a few ways to do this. One is to determine the variable names from PROC CONTENTS or dictionary.columns and generate rename statements.
data have;
input x1-x4;
datalines;
1 2 3 4
5 6 7 8
;;;;
run;
%macro rename(var=,newvar=);
rename &var.=&newvar.;
%mend rename;
data my_vars; *the list of your new variable names, and their variable number;
length varname $10;
input varnum varname $;
datalines;
1 FirstVar
2 SecondVar
3 ThirdVar
4 FourthVar
;;;;
run;
proc sql; *Create a list of macro calls to the rename macro from joining dictionary.columns with your data. ;
* Dictionary.columns is like proc contents.;
select cats('%rename(var=',name,',newvar=',varname,')')
into :renamelist separated by ' '
from dictionary.columns C, my_vars M
where C.memname='HAVE' and C.libname='WORK'
and C.varnum=M.varnum;
quit;
proc datasets;
modify have;
&renamelist; *use the calls;
quit;
Another is to put/input the data using the input stream and the _INFILE_ automatic variable (that references the current line in the input stream). Here's an example. You would of course keep only the new variables if you wanted.
data have;
input x1-x4;
datalines;
1 2 3 4
5 6 7 8
;;;;
run;
data want;
set have;
infile datalines truncover; *or it will go to next line and EOF prematurely;
input @1 @@; *Reinitialize to the start of the line or it will eventually EOF early;
_infile_=catx(' ',of _all_); *put to input stream as space delimited - if your data has spaces you need something else;
input y1-y4 @@; *input as space delimited;
put _all_; *just checking our work, for debugging;
datalines; *dummy datalines (could use a dummy filename as well);
;;;;
run;
Here is another approach using the dictionary tables.
data have;
format var1-var4 $1.;
call missing (of _all_);
run;
proc sql noprint;
select name into: namelist separated by ' ' /* create macro var */
from dictionary.columns
where libname='WORK' and memname='HAVE' /* uppercase */
order by varnum; /* should be ordered by this anyway */
quit;
%macro create_rename(invar=);
%do x=1 %to %sysfunc(countw(&namelist,%str( )));
/* OLDVAR = NEWVARx */
%scan(&namelist,&x) = NEWVAR&x
%end;
%mend;
data want ;
set have (rename=(%create_rename(invar=&namelist)));
put _all_;
run;
gives:
NEWVAR1= NEWVAR2= NEWVAR3= NEWVAR4=