I want to read the following .dat file into SAS. Since the names and values are separated by two spaces, I use the ampersand (&) modifier in the INPUT statement, but the DLM='/' option in the INFILE statement seems to conflict with it. Can someone tell me what the mistake in my code is?
File:
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
Code:
data mylib.D_report;
infile Dinning dlm='/' dsd missover;
input ID 1-4 Name & $17. M1-M6;
run;
You're mixing input styles. That's understandable given how mixed your input data is, but it isn't permitted the way you're doing it.
Your best option is to read M1-M6 into one variable, then split it up using SCAN.
data work.D_report;
infile datalines missover dlm=' ';
input ID :4.
Name & $17.
Ms :$40.;
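array M[6];
/* split Ms on '/'; the 'm' modifier keeps empty fields such as the one in '//' */
do _t = 1 to countc(Ms,'/')+1;
if _t > dim(M) then leave;
/* convert explicitly so the log has no character-to-numeric conversion note */
M[_t]=input(scan(Ms,_t,'/','m'),?? 32.);
end;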
datalines;
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
;;;;
run;
You just need to change the delimiter.
data D_report;
dlm = ' ';
infile cards dlm=dlm missover dsd;
input ID 1-4 Name & $17. @;
dlm = '/';
input M1-M6;
cards;
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
;
run;
proc print;
run;
In the code below, why is the last observation (carlo) lost when using column pointer controls?
data work.toExercise ;
infile "/home/u61425323/BASE_DATA/exercise.txt" ; /* my direction */
input Name $7. +3 Nation $7. +2 Code $5. ;
title "Why is the last observation(=carlo) lost?" ;
run;
proc print ; run ;
Below is exercise.txt.
natasha   korea    a1111
kelly     america  b2222
carlo     mexico   c333
Below are the output results (the screenshot is not preserved here).
Please forgive my poor English.
To stop SAS from going to a new line when the current line is too short to satisfy the INPUT statement, use the TRUNCOVER option on the INFILE statement.
Let's create a text file with your variable length records.
filename text temp;
options parmcards=text;
parmcards;
natasha   korea    a1111
kelly     america  b2222
carlo     mexico   c333
;
If we read it with your data step, we get these messages in the log:
NOTE: LOST CARD.
Name=carlo Nation=mexico Code= _ERROR_=1 _N_=3
NOTE: 3 records were read from the infile TEXT.
The minimum record length was 23.
The maximum record length was 24.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.ORGINAL has 2 observations and 3 variables.
But when we add the TRUNCOVER option it reads all three observations.
data want ;
infile text truncover ;
input Name $7. +3 Nation $7. +2 Code $5. ;
run;
Do not use the ancient MISSOVER option. It discards text at the end of lines that are too short for the informat that is reading them. It can work if you only use list-mode input, where SAS adjusts the width of the informat to match the length of the next word on the line, but then you are just getting the TRUNCOVER behavior anyway, so why not be specific?
data wrong ;
infile text missover ;
input Name $7. +3 Nation $7. +2 Code $5. ;
run;
Use the TRUNCOVER option with the INFILE statement.
From the INFILE documentation:
TRUNCOVER
overrides the default behavior of the INPUT statement when an input data record is shorter than the INPUT statement expects. By default, the INPUT statement automatically reads the next input data record. TRUNCOVER enables you to read variable-length records when some records are shorter than the INPUT statement expects. Variables without any values assigned are set to missing.
I think that happens because the last record is shorter than the code expects.
You can try one of the INFILE options to control the processing in this case, for example:
infile "/home/u61425323/BASE_DATA/exercise.txt" MISSOVER;
I do not know your task requirements, but this list-input version of the code would probably be more robust:
data work.toExercise ;
length Name $7 Nation $7 Code $5;
infile "/home/u61425323/BASE_DATA/exercise.txt" dlm=' ';
input Name Nation Code;
title "Why is the last observation(=carlo) lost?" ;
run;
This is a follow-up of my previous question:
How to import a txt file with single quote mark in a variable and another in another variable.
The solution there works perfectly as long as no variable can have null values.
In this latter case, I get:
filename sample 'c:\temp\sample.txt';
data _null_;
file sample;
input;
put _infile_;
datalines;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data prova;
infile sample dlm='|' lrecl=50 truncover;
format
codice $3.
could_be_null $20.
nome $20.
luogo $20.
importo 4.
;
input
codice
could_be_null
nome
luogo
importo
;
putlog _infile_;
run;
proc print;
run;
Is it possible to correctly load a file like the one in the example directly in SAS, without manually modifying the original .txt?
You will need to pre-process the file to fix the issue.
If you add quotes around the values then you will not have the problem.
002||"'80S WERE GREAT"|"FORLI'"|1100
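As a rough sketch of the quoting idea, the quotes can be added on the fly rather than by editing the file. This assumes that no field contains the | delimiter or a double quote; the data set name test_quoted is made up for illustration, and var1-var5 with $40. simply follow the naming used elsewhere in this answer.

```sas
/* Sketch only: assumes no field contains | or a double quote. */
data test_quoted;                        /* hypothetical data set name */
  infile sample dsd dlm='|' truncover;
  input @;                               /* load the record and hold the line */
  /* wrap every field in double quotes: a|b becomes "a"|"b" */
  _infile_ = cats('"', tranwrd(trim(_infile_), '|', '"|"'), '"');
  input (var1-var5) (:$40.);             /* DSD strips the outer quotes again */
run;
```

With this rewrite the second sample record becomes "002"|""|"'80S WERE GREAT"|"FORLI'"|"1100", so the single quotes sit inside quoted values and the empty field is read as missing.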
If you know that none of the values contain the delimiter, then adding a space before every delimiter
002 | |'80S WERE GREAT |FORLI' |1100
will let you read it without the DSD option.
If lines are shorter than 32K bytes, this can be done in the same step that reads the data.
data test2 ;
infile sample dlm='|' truncover ;
input @;
_infile_ = tranwrd(_infile_,'|',' |');
input (var1-var5) (:$40.);
run;
proc print;
run;
Results:
Obs    var1    var2                           var3              var4      var5
  1    001     This variable could be null    PROVA             MILANO    1000
  2    002                                    '80S WERE GREAT   FORLI'    1100
  3    003                                    '80S WERE GREAT   ROMA      1110
One way to test if you have the issue is to make sure each line has the right number of fields.
filename sample temp;
options parmcards=sample;
parmcards;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data _null_;
infile sample dsd end=eof;
if eof then do;
call symputx('nfound',nfound);
putlog / 'Found ' nfound :comma11.
'problem lines out of ' _n_ :comma11. 'lines.'
;
end;
input;
retain expect nfound;
words=countw(_infile_,'|','qm');
if _n_=1 then expect=words;
else if expect ne words then do;
nfound+1;
if nfound <= 10 then do;
putlog (_n_ expect words) (=) ;
list;
end;
end;
run;
Example Results:
_N_=2 expect=5 words=4
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
2 002||'80S WERE GREAT|FORLI'|1100 32
_N_=3 expect=5 words=3
3 003||'80S WERE GREAT|ROMA|1110 30
Found 2 problem lines out of 4 lines.
PS Go tell SAS to enhance their delimited file processing: https://communities.sas.com/t5/SASware-Ballot-Ideas/Enhancements-to-INFILE-FILE-to-handle-delimited-file-variations/idi-p/435977
You need to add the DSD option to your INFILE statement.
https://support.sas.com/techsup/technote/ts673.pdf
DSD (delimiter-sensitive data) option—Specifies that SAS should treat
delimiters within a data value as character data when the delimiters
and the data value are enclosed in quotation marks. As a result, SAS
does not split the string into multiple variables and the quotation
marks are removed before the variable is stored. When the DSD option
is specified and SAS encounters consecutive delimiters, the software
treats those delimiters as missing values. You can change the default
delimiter for the DSD option with the DELIMITER= option.
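A small illustration of the two DSD behaviours described in that note, using made-up inline data (the data set name dsd_demo and the variables a, b and c are only for this sketch):

```sas
/* Sketch: DSD with consecutive delimiters and a quoted value. */
data dsd_demo;                           /* hypothetical data set name */
  infile datalines dsd dlm=',' truncover;
  input a :$10. b :$10. c :$10.;
  datalines;
x,,"y,z"
;
run;

proc print data=dsd_demo;
run;
```

Here b should come out blank because of the two consecutive commas, and c should hold y,z with the surrounding double quotes removed. Note that the quoting rule also applies to single quotes, which is what the unbalanced apostrophes in the question run into.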
I have an Excel file where I want to split words into different columns in SAS.
In the file it looks like this, all in the same column. I want to split it and get rid of the quotation marks:
ID;"City";"Year"
1;"New york";NULL
2;"stockton";"18"
This is what I tried to do:
data work.project ;
infile "&path\users.csv" delimiter=';' missover dsd;
input ID: $30.
City: $200.
Year: $5. ;
run;
proc print data=work.project;
run;
My output:
Obs ID City Year
1 ,,,"ID ""City"" ""Year
2 ,,,"1 ""new york"" NULL"
3 ,,,"2 ""stockton"" ""18"
4 ,,,"3 ""moscow "" NULL"
Rather than the colon modifier and informats in the INPUT statement, use an INFORMAT statement.
data work.project;
infile datalines4 delimiter=';' truncover dsd;
informat id $30. city $200. year $4.;
input ID City Year;
datalines4;
1;"New York";NULL
2;"Stockton";"18"
;;;;
run;
proc print data=project;
run;
filename Source 'C:\Source.txt';
Data Example;
Infile Source;
Input Var1 Var2;
Run;
Is there a way I can import all the variables from Source.txt without the "Input Var1 Var2" line? If there are many variables, I think it's too time consuming to list out all the variables, so I was wondering if there's any way to bypass that.
Thanks
Maybe you can use PROC IMPORT?
For a CSV I use this, and I don't have to define every variable:
proc import datafile="&CSVFILE"
out=myCsvData
dbms=dlm
replace;
delimiter=';';
getnames=yes;
run;
It depends on what you have in your txt file. Try different delimiters.
If you are looking for a solution based on the INFILE statement, the following reference code should help.
data _null_;
set sashelp.class;
file '/tester/sashelp_class.txt' dsd dlm='09'x;
put name age sex weight height;
run;
/* Version #1 : When data has mixed data(numeric and character) */
data reading_data_w_format;
infile '/tester/sashelp_class.txt' dsd dlm='09'x;
format name $10. age 8. gender $1. weight height 8.2;
input (name--height) (:);
run;
proc print data=reading_data_w_format;run;
proc contents data=reading_data_w_format;run;
/* Version #2 : When all data can be read as character.
I know this version doesn't make sense, but it's still an option */
data reading_data_wo_format;
infile '/tester/sashelp_class.txt' dsd dlm='09'x;
input (var1-var5) (:$8.); /* Length would be max length of value in all the columns */
run;
proc print data=reading_data_wo_format;run;
proc contents data=reading_data_wo_format;run;
I'd suggest writing out the informats for the variables to be read, so that you are sure the file matches your specification. PROC IMPORT first scans the data from the first row up to the GUESSINGROWS value (do not set it too high if each column is of consistent length) and, based on the lengths and types it finds, picks an informat and length that it considers suitable for each variable in the file.
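For comparison, a minimal PROC IMPORT sketch for the same tab-delimited file. The output name class_import and the GUESSINGROWS value of 100 are arbitrary choices for illustration, and GETNAMES=NO is used because the file written above has no header row.

```sas
/* Sketch: let PROC IMPORT guess types and lengths from the first rows. */
proc import datafile='/tester/sashelp_class.txt'
    out=work.class_import                /* hypothetical output data set */
    dbms=tab
    replace;
  getnames=no;                           /* no header row, so VAR1, VAR2, ... */
  guessingrows=100;                      /* arbitrary; rows scanned for guessing */
run;

proc contents data=work.class_import;
run;
```

Comparing the PROC CONTENTS output with the explicit FORMAT version above shows which types and lengths were guessed.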
I have the below raw data
1,,35,000
2,100,45,000
and need the below in a dataset
1 . 35000
2 100 45000
This would require both the DSD option and the COMMA. informat.
How can I do this?
DSD has nothing to do with this - DSD involves input like
1,,"35,000"
2,100,"45,000"
If that is what you have, then you can use the : operator to read it in with the comma informat.
data test;
infile datalines dlm=',' dsd;
input id
num
dollar :comma8.;
datalines;
1,,"35,000"
2,100,"45,000"
;;;;
run;
If you do not have the quotes around the field, then you will need to parse this somehow. One solution is below, which will work as long as the field with commas is the final field.
data test;
infile datalines dlm=',' dsd;
input @;
if countc(_infile_,',') =3 then do;
_commapos = findc(_infile_,',',-1*length(_infile_));
_infile_ = substr(_infile_,1,_commapos-1)||substr(_infile_,_commapos+1);
end;
input id
num
dollar ;
put _all_;
datalines;
1,,35,000
2,100,45,000
;;;;
run;
If the potential extra comma is always in the same field, but that field is NOT the last one, you can modify the above solution to handle it. If it can appear in more than one field, you have a much more difficult problem to solve.
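As a hedged sketch of that modification, suppose the comma-formatted value is the second of three fields. The layout, the sample rows and the names test_mid, _pos and _k are all invented for illustration. The idea is the same as above, except that the comma to drop is the second one from the left rather than the last one.

```sas
/* Sketch: the value with the embedded comma is field 2 of 3 (assumed layout). */
data test_mid;                           /* hypothetical data set name */
  infile datalines dlm=',' dsd truncover;
  input @;
  if countc(_infile_, ',') = 3 then do;  /* one comma more than expected */
    _pos = 0;
    do _k = 1 to 2;                      /* locate the 2nd comma from the left */
      _pos = findc(_infile_, ',', _pos + 1);
    end;
    /* drop that comma so 35,000 becomes 35000 */
    _infile_ = substr(_infile_, 1, _pos - 1) || substr(_infile_, _pos + 1);
  end;
  input id dollar num;
  drop _pos _k;
  datalines;
1,35,000,7
2,100,8
;;;;
run;
```

This only works when the comma count alone tells you whether the extra comma is present, so a missing value in another field is fine, but a dropped field is not.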