I have the below raw data
1,,35,000
2,100,45,000
and need the below in a dataset
1 . 35000
2 100 45000
This would seem to require both the DSD option and the COMMA. informat.
How do I carry this out?
DSD has nothing to do with this - DSD involves input like
1,,"35,000"
2,100,"45,000"
If that is what you have, then you can use the : operator to read it in with the comma informat.
data test;
infile datalines dlm=',' dsd;
input id
num
dollar :comma8.;
datalines;
1,,"35,000"
2,100,"45,000"
;;;;
run;
If you do not have the quotes around the field, then you will need to parse this somehow. One solution is below, which will work as long as the field with commas is the final field.
data test;
infile datalines dlm=',' dsd;
input @;
if countc(_infile_,',') =3 then do;
_commapos = findc(_infile_,',',-1*length(_infile_));
_infile_ = substr(_infile_,1,_commapos-1)||substr(_infile_,_commapos+1);
end;
input id
num
dollar ;
put _all_;
datalines;
1,,35,000
2,100,45,000
;;;;
run;
If the potential comma is always in the same field, but that field is NOT the last one, you can modify the above solution to handle it. If the comma can appear in more than one field, you have a much more difficult problem to solve.
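As a rough sketch of that modification, assume the comma-bearing field is always the second of three fields. The field order, variable names, and sample rows here are made up for illustration and are not from the original question. A line with three commas instead of two is taken to contain a thousands separator, and the second comma is removed before the fields are read.

```sas
data test_mid;
  infile datalines dlm=',' dsd;
  input @;                                   /* load the line into _infile_ and hold it */
  /* Assumption: 3 fields, so 2 delimiters; a 3rd comma is a thousands separator in field 2 */
  if countc(_infile_, ',') = 3 then do;
    _first  = findc(_infile_, ',');              /* comma that ends field 1 */
    _second = findc(_infile_, ',', _first + 1);  /* comma inside field 2 */
    _infile_ = substr(_infile_, 1, _second - 1) || substr(_infile_, _second + 1);
  end;
  input id dollar num;
  drop _first _second;
  datalines;
1,35,000,7
2,900,100
;;;;
run;
```

This only works when the comma count is unambiguous, that is, when no other field can be dropped from or added to the line.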
This is a follow-up of my previous question:
How to import a txt file with single quote mark in a variable and another in another variable.
The solution there works perfectly as long as there is no variable whose values could be null.
In this latter case, I get:
filename sample 'c:\temp\sample.txt';
data _null_;
file sample;
input;
put _infile_;
datalines;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data prova;
infile sample dlm='|' lrecl=50 truncover;
format
codice $3.
could_be_null $20.
nome $20.
luogo $20.
importo 4.
;
input
codice
could_be_null
nome
luogo
importo
;
putlog _infile_;
run;
proc print;
run;
Is it possible to correctly load a file like the one in the example directly in SAS, without manually modifying the original .txt?
You will need to pre-process the file to fix the issue.
If you add quotes around the values then you will not have the problem.
002||"'80S WERE GREAT"|"FORLI'"|1100
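A minimal sketch of reading such a pre-processed line, assuming the quoting has already been added to the file. The data set name and the var1-var5 names are placeholders. With DSD, the double quotes are stripped, the single quotes inside them are kept as data, and the empty second field is read as missing.

```sas
data quoted;
  infile datalines dlm='|' dsd truncover;
  input (var1-var5) (:$40.);
  datalines;
001|This variable could be null|PROVA|MILANO|1000
002||"'80S WERE GREAT"|"FORLI'"|1100
;
run;

proc print data=quoted;
run;
```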
If you know that none of the values contain the delimiter, then adding a space before every delimiter
002 | |'80S WERE GREAT |FORLI' |1100
will let you read it without the DSD option.
If lines are shorter than 32K bytes then it can be done in the same step that reads the data.
data test2 ;
infile sample dlm='|' truncover ;
input @;
_infile_ = tranwrd(_infile_,'|',' |');
input (var1-var5) (:$40.);
run;
proc print;
run;
Results:
Obs    var1    var2                           var3               var4      var5
  1    001     This variable could be null    PROVA              MILANO    1000
  2    002                                    '80S WERE GREAT    FORLI'    1100
  3    003                                    '80S WERE GREAT    ROMA      1110
One way to test if you have the issue is to make sure each line has the right number of fields.
filename sample temp;
options parmcards=sample;
parmcards;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data _null_;
infile sample dsd end=eof;
if eof then do;
call symputx('nfound',nfound);
putlog / 'Found ' nfound :comma11.
'problem lines out of ' _n_ :comma11. 'lines.'
;
end;
input;
retain expect nfound;
words=countw(_infile_,'|','qm');
if _n_=1 then expect=words;
else if expect ne words then do;
nfound+1;
if nfound <= 10 then do;
putlog (_n_ expect words) (=) ;
list;
end;
end;
run;
Example Results:
_N_=2 expect=5 words=4
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
2 002||'80S WERE GREAT|FORLI'|1100 32
_N_=3 expect=5 words=3
3 003||'80S WERE GREAT|ROMA|1110 30
Found 2 problem lines out of 4 lines.
PS Go tell SAS to enhance their delimited file processing: https://communities.sas.com/t5/SASware-Ballot-Ideas/Enhancements-to-INFILE-FILE-to-handle-delimited-file-variations/idi-p/435977
You need to add the DSD option to your INFILE statement.
https://support.sas.com/techsup/technote/ts673.pdf
DSD (delimiter-sensitive data) option—Specifies that SAS should treat
delimiters within a data value as character data when the delimiters
and the data value are enclosed in quotation marks. As a result, SAS
does not split the string into multiple variables and the quotation
marks are removed before the variable is stored. When the DSD option
is specified and SAS encounters consecutive delimiters, the software
treats those delimiters as missing values. You can change the default
delimiter for the DSD option with the DELIMITER= option.
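The two behaviours described in that passage can be seen in a small sketch. The data set name, variable names, and rows are invented for illustration. The first row has a comma inside a quoted value, and the second row has two consecutive delimiters.

```sas
data dsd_demo;
  /* DSD alone implies a comma delimiter */
  infile datalines dsd truncover;
  input id name :$20. amount;
  datalines;
1,"Smith, John",100
2,,200
;
run;

proc print data=dsd_demo;
run;
```

In the first observation the name should keep its embedded comma with the quotes removed; in the second, name should be blank and amount should still be read from the third field.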
I need to use the INFILE statement to read a file called np_traffic.csv, name the table traffic2, and only import a column called ReportingDate as a character.
Current Code is giving me the error
"The data set WORK.TRAFFIC2 may be incomplete. When this step was
stopped there were 0 observations and 1 variables."
DATA traffic2;
INFILE “E:/Documents/Week 2/np_traffic.csv”
dsd firstobs=2;
INPUT ReportingDate $;
RUN;
Let's assume that you really have a delimited text file, which is what a CSV file is, instead of the spreadsheet you pictured in the photograph in your post. To read the 6th field in a line you need to first read the first 5 fields. That does not mean you need to use the values read from those fields.
data traffic2;
infile "E:/Documents/Week 2/np_traffic.csv"
dsd firstobs=2
;
length dummy $1 ReportingDate $12;
input 5*dummy ReportingDate ;
drop dummy;
run;
I would suggest trying it this way:
data traffic2;
drop a b c d e g;
infile 'E:\Documents\Week 2\np_traffic.csv' dsd dlm='<Insert your delimiter>' firstobs=2;
input a $ b $ c $ d $ e $ ReportingDate :$12. g $;
run;
https://documentation.sas.com/?docsetId=lestmtsref&docsetTarget=n1rill4udj0tfun1fvce3j401plo.htm&docsetVersion=9.4&locale=en
I have an Excel file where I want to split words into different columns in SAS.
In the file, everything appears in a single column like this. I want to split it and get rid of the quotation marks:
ID;"City";"Year"
1;"New york";NULL
2;"stockton";"18"
This is what I tried to do:
data work.project ;
infile "&path\users.csv" delimiter=';' missover dsd;
input ID: $30.
City: $200.
Year: $5. ;
run;
proc print data=work.project;
run;
My output:
Obs ID City Year
1 ,,,"ID ""City"" ""Year
2 ,,,"1 ""new york"" NULL"
3 ,,,"2 ""stockton"" ""18"
4 ,,,"3 ""moscow "" NULL"
Rather than the colon modifier and informats in the INPUT statement, use an INFORMAT statement.
data work.project;
infile datalines4 delimiter=';' truncover dsd;
informat id $30. city $200. year $4.;
input ID City Year;
datalines4;
1;"New York";NULL
2;"Stockton";"18"
;;;;
run;
proc print data=project;
run;
My data lines have two variables, date1 and date2, with corresponding informats ddmmyy10. and mmddyy10.
data date;
input date1 ddmmyy10. date2 mmddyy10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
11/01/2015 01/11/2015
12/01/2015 01/12/2015
13/01/2015 01/13/2015
;
run;
I tried adding the following line, but it still does not work:
infile datalines delimiter=' ';
The : (colon) format modifier enables you to use list input but also to specify an informat after a variable name, whether character or numeric. SAS reads until it encounters a blank column, the defined length of the variable (character only), or the end of the data line, whichever comes first.
data date;
input date1: ddmmyy10. date2: mmddyy10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
11/01/2015 01/11/2015
12/01/2015 01/12/2015
13/01/2015 01/13/2015
;
run;
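For comparison, the original formatted-input statement can also be made to work by skipping the blank explicitly. This is only a sketch, and it assumes every date is exactly 10 columns wide with exactly one blank between the two dates; the colon modifier above does not depend on that.

```sas
data date;
  /* Formatted input reads exactly 10 columns per date; +1 skips the single blank between them */
  input date1 ddmmyy10. +1 date2 mmddyy10.;
  format date1 date2 yymmdd10.;
  datalines;
09/01/2015 01/09/2015
13/01/2015 01/13/2015
;
run;
```

If the two informats are applied correctly, date1 and date2 should show the same calendar date on each row.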
The code is not working because the INPUT statement uses formatted input: each informat reads exactly 10 columns and the blank between the two dates is not skipped, so date2 starts one column too early. To let SAS scan to the next delimiter you need list input, and list input reads non-standard data only if you supply an informat through the colon modifier or an INFORMAT statement.
For non-standard data (commas, dollar signs, dates, etc.) or standard data longer than 8 bytes, list input needs either an INFORMAT statement or the colon modifier in the INPUT statement.
1) Assign informat for the input variables using INFORMAT statement or ATTRIB statement
data date;
informat date1 ddmmyy10. date2 mmddyy10.;
input date1 date2;
format date1-date2 yymmdd10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
;
run;
2) Use Colon (:) input modifier
data date;
input date1: ddmmyy10. date2: mmddyy10.;
format date1-date2 yymmdd10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
;
run;
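The ATTRIB variant mentioned under option 1 is not shown above. A minimal sketch follows, reusing the same variable names and sample rows; it sets the informat and the display format for each variable in one statement.

```sas
data date;
  /* ATTRIB assigns the informat used for reading and the format used for display */
  attrib date1 informat=ddmmyy10. format=yymmdd10.
         date2 informat=mmddyy10. format=yymmdd10.;
  input date1 date2;
  datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
;
run;
```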
I want to read the following dat file into SAS. Since the names and values are separated by 2 spaces I use the ampersand in the input statement. But it seems that the DLM='/' in the infile statement conflicts with it. Can someone tell me what the mistake in my code is?
File:
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
Code:
data mylib.D_report;
infile Dinning dlm='/' dsd missover;
input ID 1-4 Name & $17. M1-M6;
run;
You're mixing input styles, which, while understandable given your fairly mixed input data, isn't permitted the way you're doing it.
Your best option is to read M1-M6 into one variable, then split it up using SCAN.
data work.D_report;
infile datalines missover dlm=' ';
input ID :4.
Name & $17.
Ms :$40.;
array M[6];
do _t = 1 to countc(Ms,'/')+1;
if _t > dim(M) then leave;
M[_t]=scan(Ms,_t,'/','m');
end;
datalines;
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
;;;;
run;
You just need to change the delimiter.
data D_report;
dlm = ' ';
infile cards dlm=dlm missover dsd;
input ID 1-4 Name & $17. @;
dlm = '/';
input M1-M6;
cards;
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
run;
proc print;
run;