I'm trying to use column pointer controls in my SAS program, which uses many INPUT statements depending on the values of certain variables. Many of my fields contain packed decimals, and I think these are causing problems with the pointer controls. It seems the program doesn't know the length of the packed decimal before it unpacks it, so the column pointer ends up in the wrong place.
I have no way of knowing how many of my 'segments' to read; I only know the maximum number possible. So I will need to check values throughout the program before reading the data, and use relative pointer controls based on that 'segment'. I'll be looping and using relative column pointer controls (+n in the INPUT statement) to accomplish this.
My old program worked correctly when I knew the exact columns I needed to read, so I simply used an INPUT statement with absolute column pointer controls (@n).
Here's a sample of the old program. I'm only including the top portion because it illustrates what I had:
....some rsubmitting and options statements....
%DO i=0 %TO 2;
filename MyFN "MyFile(-&i.)" disp=shr;
DATA ReadInTemp;
INFILE MyFN MISSOVER PAD;
INPUT
@1 Pro_Ind $1. @;
IF Pro_Ind ="H" or Pro_Ind ="T" THEN DELETE;
IF Pro_Ind ="1" THEN DO;
INPUT
@2 Time_Stamp ? PD8.
@10 MyVar2 ? $1.
@11 MyVar3 ? $20.
@31 MyVar4 ? $2.
Here's the program I'm trying with Line Controls:
....some rsubmitting and options statements....
%DO i=0 %TO 2;
filename MyFN "MyFile(-&i.)" disp=shr;
DATA ReadInTemp;
INFILE MyFN missover pad;
INPUT
@1 Pro_Ind $1. @;
IF Pro_Ind ="H" or Pro_Ind ="T" THEN DELETE;
IF Pro_Ind ="1" THEN DO;
INPUT
@2 Time_Stamp ? PD8. +7
MyVar1 ? $1. +1
MyVar2 ? $20. +19
MyVar3 ? $2. +1
Please keep in mind this is only a sample showing A) where the program used to work and B) where it is not working now. I understand that there are no END or %END statements, etc., but I believe my issue comes after I read this Time_Stamp variable, which contains the packed decimal.
One issue in your program with pointer controls is that each informat already advances the column pointer by its width, and then you are explicitly advancing it further.
To be certain the informat reads the data as-is (retaining leading spaces, per @Tom), use $CHARw. instead of $w. and remove the extraneous +n:
INPUT
@2 Time_Stamp ? PD8.
MyVar1 ? $CHAR1.
MyVar2 ? $CHAR20.
MyVar3 ? $CHAR2.
...
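A minimal sketch of the looping case from the question, under stated assumptions: the record layout is hypothetical (column 1 Pro_Ind, columns 2-9 a PD8. time stamp, column 10 a one-digit segment count called Seg_Count, then up to three 23-byte segments), and a temporary file holding one test record stands in for the real MyFN file. The pointer is moved only by each informat's own width, so no +n appears.

```sas
/* Hypothetical layout, for illustration only:                       */
/*   col 1  Pro_Ind   cols 2-9  Time_Stamp (PD8.)   col 10 Seg_Count */
/*   then Seg_Count segments of 23 bytes: $CHAR1. $CHAR20. $CHAR2.   */
filename demo temp;

data _null_;
  /* RECFM=F: packed bytes could otherwise look like line-end bytes */
  file demo recfm=f lrecl=79;
  ts = 20230115;
  put @1 '1' @2 ts pd8. @10 '2'
      @11 'A' @12 'FIRST SEGMENT'  @32 'XY'
      @34 'B' @35 'SECOND SEGMENT' @55 'ZZ';
run;

data segments;
  infile demo recfm=f lrecl=79;
  input @1 Pro_Ind $char1. @;
  if Pro_Ind = '1' then do;
    /* PD8. consumes exactly 8 bytes, so the pointer lands on column 10 */
    input @2 Time_Stamp ? pd8.
             Seg_Count ? 1. @;
    do seg = 1 to coalesce(Seg_Count, 0);
      /* each informat moves the pointer by its own width; no +n needed */
      input Seg_Code ? $char1. Seg_Name ? $char20. Seg_Flag ? $char2. @;
      output;
    end;
  end;
run;
```

The RECFM=F and LRECL=79 options are only for this demonstration file; on the mainframe the real data set's own record format would normally apply.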
This is a follow-up to my previous question:
How to import a txt file with single quote mark in a variable and another in another variable.
The solution there works perfectly unless there is a variable whose values can be null.
In this latter case, I get:
filename sample 'c:\temp\sample.txt';
data _null_;
file sample;
input;
put _infile_;
datalines;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data prova;
infile sample dlm='|' lrecl=50 truncover;
format
codice $3.
could_be_null $20.
nome $20.
luogo $20.
importo 4.
;
input
codice
could_be_null
nome
luogo
importo
;
putlog _infile_;
run;
proc print;
run;
Is it possible to correctly load a file like the one in the example directly in SAS, without manually modifying the original .txt?
You will need to pre-process the file to fix the issue.
If you add quotes around the values then you will not have the problem.
002||"'80S WERE GREAT"|"FORLI'"|1100
If you know that none of the values contain the delimiter, then adding a space before every delimiter
002 | |'80S WERE GREAT |FORLI' |1100
will let you read it without the DSD option.
If lines are shorter than 32K bytes then it can be done in the same step that reads the data.
data test2 ;
infile sample dlm='|' truncover ;
input @;
_infile_ = tranwrd(_infile_,'|',' |');
input (var1-var5) (:$40.);
run;
proc print;
run;
Results:
Obs    var1    var2                           var3               var4      var5
  1    001     This variable could be null    PROVA              MILANO    1000
  2    002                                    '80S WERE GREAT    FORLI'    1100
  3    003                                    '80S WERE GREAT    ROMA      1110
One way to test if you have the issue is to make sure each line has the right number of fields.
filename sample temp;
options parmcards=sample;
parmcards;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data _null_;
infile sample dsd end=eof;
input;
retain expect nfound 0;
words=countw(_infile_,'|','qm');
if _n_=1 then expect=words;
else if expect ne words then do;
nfound+1;
if nfound <= 10 then do;
putlog (_n_ expect words) (=) ;
list;
end;
end;
if eof then do;
call symputx('nfound',nfound);
putlog / 'Found ' nfound :comma11.
'problem lines out of ' _n_ :comma11. 'lines.'
;
end;
run;
Example Results:
_N_=2 expect=5 words=4
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
2 002||'80S WERE GREAT|FORLI'|1100 32
_N_=3 expect=5 words=3
3 003||'80S WERE GREAT|ROMA|1110 30
Found 2 problem lines out of 3 lines.
PS Go tell SAS to enhance their delimited file processing: https://communities.sas.com/t5/SASware-Ballot-Ideas/Enhancements-to-INFILE-FILE-to-handle-delimited-file-variations/idi-p/435977
You need to add the DSD option to your INFILE statement.
https://support.sas.com/techsup/technote/ts673.pdf
DSD (delimiter-sensitive data) option—Specifies that SAS should treat
delimiters within a data value as character data when the delimiters
and the data value are enclosed in quotation marks. As a result, SAS
does not split the string into multiple variables and the quotation
marks are removed before the variable is stored. When the DSD option
is specified and SAS encounters consecutive delimiters, the software
treats those delimiters as missing values. You can change the default
delimiter for the DSD option with the DELIMITER= option.
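A small sketch of what the quoted description means, using a hypothetical variant of the question's data in which the value containing the pipe is wrapped in balanced double quotes. DSD reads the empty field as missing and keeps the quoted value whole, removing the quotes. In the question's own records 002 and 003 the field begins with an unmatched single quote, which DSD also treats as the start of a quoted value, so DSD alone does not fix that file.

```sas
/* Hypothetical data: record 002 uses balanced double quotes */
data dsd_demo;
  infile datalines dsd dlm='|' truncover;
  length codice $3 could_be_null $30 nome $20 luogo $20 importo 8;
  input codice could_be_null nome luogo importo;
datalines;
001|This variable could be null|PROVA|MILANO|1000
002||"80S|WERE GREAT"|FORLI|1100
;
run;
```

For record 002, COULD_BE_NULL is missing and NOME holds `80S|WERE GREAT` without the quotes.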
I am trying to export SAS data to CSV. The SAS dataset is named abc and looks like this:
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
I am using the following code:
filename exprt "C:/abc.csv" encoding="utf-8";
proc export data=abc
outfile=exprt
dbms=tab;
run;
The output is:
LINE_NUMBER DESCRIPTION
524JG "24PC AMEFA VINTAGE CUTLERY SET ""DUBARRY"""
So there are double quotes added before and after the description, and the existing double quotes around the word DUBARRY have been doubled. I have no clue what's happening. Can someone help me resolve this and explain what exactly is happening here?
Expected result:
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
There is no need to use PROC EXPORT to create a delimited file; you can write it with a simple DATA step. If you want to create your example file, then just do not use the DSD option on the FILE statement. But note that, depending on the data you are writing, you could create a file that cannot be parsed properly because of unprotected delimiters. You will also have trouble representing missing values.
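A minimal sketch of that approach for the question's data. The lengths of LINE_NUMBER and DESCRIPTION are assumptions (the question does not give them), and a temporary file stands in for C:/abc.csv. Because the FILE statement has DLM='09'x but no DSD, the description is written exactly as stored, with no added quotes.

```sas
/* Sample row from the question; the variable lengths are assumptions */
data abc;
  length LINE_NUMBER $8 DESCRIPTION $60;
  LINE_NUMBER = '524JG';
  DESCRIPTION = '24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"';
run;

/* Temporary file standing in for C:/abc.csv */
filename exprt temp encoding="utf-8";

data _null_;
  set abc;
  file exprt dlm='09'x;          /* tab delimiter, no DSD: nothing is quoted */
  if _n_ = 1 then do;
    length header $200;
    header = catx('09'x, 'LINE_NUMBER', 'DESCRIPTION');
    put header;                  /* header row */
  end;
  put LINE_NUMBER DESCRIPTION;   /* list output separated by the tab */
run;
```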
Let's make a sample dataset we can use to test.
data have ;
input id value cvalue $ name $20. ;
cards;
1 123 A Normal
2 345 B Embedded|delimiter
3 678 C Embedded "quotes"
4 . D Missing value
5 901 . Missing cvalue
;
Essentially PROC EXPORT is writing the data using the DSD option. Like this:
data _null_;
set have ;
file 'myfile.txt' dsd dlm='09'x ;
put (_all_) (+0);
run;
Which will yield a file like this (with pipes replacing the tabs so you can see them).
1|123|A|Normal
2|345|B|"Embedded|delimiter"
3|678|C|"Embedded ""quotes"""
4||D|Missing value
5|901||Missing cvalue
If you just remove DSD option then you get a file like this instead.
1|123|A|Normal
2|345|B|Embedded|delimiter
3|678|C|Embedded "quotes"
4|.|D|Missing value
5|901| |Missing cvalue
Notice how the second line looks like it has 5 values instead of 4, making it impossible to know how to split it into 4 values. Also notice that missing values are written as at least one character (a period for a numeric value, a blank for a character value).
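To see why the DSD-style quoting matters, here is a sketch that reads several of the DSD lines above back in with DSD, using pipes as the delimiter just as in the display. The embedded pipe stays inside NAME, and the empty fields come back as missing values.

```sas
/* Lines copied from the DSD-style output above (pipes as delimiter) */
data roundtrip;
  infile datalines dsd dlm='|' truncover;
  length id value 8 cvalue $1 name $20;
  input id value cvalue name;
datalines;
1|123|A|Normal
2|345|B|"Embedded|delimiter"
4||D|Missing value
5|901||Missing cvalue
;
run;
```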
Another way would be to run a data step to convert the normal file that PROC EXPORT generates into the variant format that you want. This might also give you a place to add escape characters to protect special characters if your target format requires them.
data _null_;
infile normal dsd dlm='|' truncover ;
file abnormal dlm='|';
do i=1 to 4 ;
if i>1 then put '|' @;
input field :$32767. @;
field = tranwrd(field,'\','\\');
field = tranwrd(field,'|','\|');
len = lengthn(field);
put field $varying32767. len @;
end;
put;
run;
You could even make this datastep smart enough to count the number of fields on the first row and use that to control the loop so that you wouldn't have to hard code it.
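A sketch of that idea, assuming the same pipe-delimited input: it counts the fields on the first line with COUNTW (the Q modifier ignores delimiters inside quotes, M counts empty fields) and retains that count to drive the loop. Here the NORMAL and ABNORMAL filerefs point to temporary files, and the two sample lines written to NORMAL are hypothetical.

```sas
filename normal temp;
filename abnormal temp;

/* Hypothetical sample input in DSD style */
data _null_;
  file normal;
  put '1|123|A|Normal';
  put '2|345|B|"Embedded|delimiter"';
run;

data _null_;
  infile normal dsd dlm='|' truncover;
  file abnormal;
  retain nfields;
  input @;                                   /* load and hold the record */
  if _n_ = 1 then nfields = countw(_infile_, '|', 'qm');
  do i = 1 to nfields;
    if i > 1 then put '|' @;
    input field :$32767. @;
    field = tranwrd(field, '\', '\\');       /* escape existing backslashes */
    field = tranwrd(field, '|', '\|');       /* escape embedded delimiters  */
    len = lengthn(field);
    put field $varying32767. len @;
  end;
  put;                                       /* end the output line */
run;
```

The second output line becomes `2|345|B|Embedded\|delimiter`.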
I am new to SAS and I need some help here
The question below:
So far, I have done this:
data Purchase;
infile ‘c:\temp\PurchaseRecords.dat’ dlm=’,’ DSD;
input id $8 visit_no @ unitpurchased @;
keep id unitpurchased;
run;
What do I need to add in my statement to make those orders look like this?
just an example.
Thank you.
You can use the INFILE COLUMN= option together with the trailing @ (held input) modifier on the INPUT statement to detect when the held input has moved past the end of the line. A missing value (an empty field, including one created by a trailing comma) is interpreted as zero units purchased. The automatic variable _INFILE_ is used to check whether the next read would start beyond the length of the data line.
data want;
infile datalines dsd dlm=',' column=p;
attrib id length=$8 units_purchased length=8 ;
input id @; * held input record;
* loop over held input record;
do while (p <= length(_infile_)+1); * +1 for dealing with trailing comma;
input units_purchased @; * continue to hold the record;
if missing(units_purchased) then units_purchased = 0;
output;
end;
datalines;
C005,3,15,,39
D2356,4,11,,5
A323,3,10,15,20
F123,1,
;
run;
The @@ modifier, which is sometimes easier to use, isn't appropriate here: a missing value is valid input, so it can't be used to signal that there is no more data.
Since the data includes the number of values, use that to control a DO loop that reads the values. I am not sure why you would want to lose the information on the order of the values, so I have commented out the KEEP statement. To convert the missing values to zeros I used a sum statement. You could use an IF/THEN statement, a COALESCE() function call, or other methods to convert the missing values to zeros.
data Purchase;
infile 'c:\temp\PurchaseRecords.dat' dsd truncover ;
length id $8 ;
input id visit_no @ ;
do visit=1 to visit_no ;
input unitpurchased @;
unitpurchased+0;
output;
end;
* keep id unitpurchased;
run;
Your original program had a few errors:
Wrong quote characters. Use normal ASCII single or double quote characters.
It is reading the value of ID from column 8 only. I find it better to use a LENGTH statement to define the variables instead of forcing SAS to guess at how to define them.
The INPUT statement is improperly trying to use the column pointer control @nnn. Plus the variable holding the location to move the pointer to, unitpurchased, has not yet been given a value.
No attempt was made to read more than one value from the line.
You did not include truncover (or even the older missover) option on your infile statement.
I work with SAS EG at work and am pretty familiar with it, but am just now trying to pick up the basics of programming in Base SAS using SAS University Edition.
Can someone please take a look at the code below and tell me what the @1 and @7 mean when I'm declaring these columns? I think it has something to do with the length of numbers allowed?
Thanks in advance!
DATA MYDATA1;
INPUT @1 COL1 4.2 @7 COL2 3.1;
ADD_RESULT = COL1 + COL2;
DATALINES;
11.21 5.3
3.11 11
;
PROC PRINT DATA= MYDATA1;
RUN;
The @ in an INPUT statement is used to move the column pointer. So @1 moves to the first column.
Note that your example datalines are all indented by three spaces, so your program will not read the values from the columns it expects. If you place the DATALINES (or CARDS) statement starting in column one, then the editor will automatically move to column one when you insert lines to begin typing your data. The program will also be clearer to the reader if the DATALINES statement is in column one.
Note that your first value is too long for the informat that you are using in the INPUT statement. You used a width of 4 characters, but the value has 5 characters, counting the decimal point.
Also, you will normally only include a decimal part in an informat specification when you know that the raw data has purposely NOT supplied an actual period character to indicate the boundary between the ones and tenths places. So if your raw data value was 1121, then reading it with 4.2 would result in the number 11.21.
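A small worked example of the difference, with made-up values. The same 4.2 informat gives 11.21 for the raw data 1121 (no period, so two implied decimals), but 11.2 for 11.21, because it only reads the first four columns (11.2) and an explicit period overrides the implied decimals.

```sas
/* Made-up raw values: the same 4.2 informat applied to each line */
data implied_decimal;
  input @1 raw $char5. @1 value 4.2;
datalines;
1121
11.21
;
run;
```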
DATA MYDATA1;
INPUT @1 COL1 5. @7 COL2 3.;
ADD_RESULT = COL1 + COL2;
DATALINES;
11.21 5.3
3.11  11
;
PROC PRINT DATA= MYDATA1;
RUN;
The @1 and @7 as used in your code indicate the column position at which SAS should expect to find the input data. So COL1 data should be found in column position 1 onwards, and COL2 data in column position 7 onwards.
You might need to realign some of your data to be consistent with the expected @column input positions.
Problem: suppose I do not know the variable names or the number of variables, or there are too many variables to write them all in the PUT statement.
The following works when I know there are 3 variables:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put region mtg sendmail;
run;
I tried using put _all_;
And the output is:
region=N mtg=24NOV1999 sendmail=10OCT1999 _ERROR_=0 _N_=1
region=S mtg=28DEC1999 sendmail=13NOV1999 _ERROR_=0 _N_=2
region=E mtg=03DEC1999 sendmail=19OCT1999 _ERROR_=0 _N_=3
region=W mtg=04OCT1999 sendmail=20AUG1999 _ERROR_=0 _N_=4
It does not give comma-delimited output; it gives named output instead.
My desired output would be
N,24NOV1999,10OCT1999
S,28DEC1999,13NOV1999
E,03DEC1999,19OCT1999
W,04OCT1999,20AUG1999
This is the right one:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (~);
run;
This one should help you.
The reason you have so many answers that seem to work, but have different characters, is that the important thing is changing _all_ to (_all_). The arguments after that are not important.
Explained in some detail here, you actually have two entirely different things going on when you write
put _all_;
and
put (_all_) (:);
Programmers familiar with the concept of an overloaded function will find that as the simplest way to think of this. If put sees _all_, it calls one version of put. If it sees (_all_) (or any list of variables with ( ) around it), it calls another (expanding _all_ to its variable list). Notice that if you try
put (_all_);
It fails, and it fails with errors suggesting it is trying to call formatted output (i.e., it asks why you don't have another ( there, which would be the normal thing in formatted output after a list in ( ).)
By itself, _all_ is an argument to put that specifically tells it to use named output to output all variables in the dataset. Hence the variable=value format of the output. So in the first example, _all_ is a constant - an argument - nothing more.
In the second example, though, (_all_) is a variable list, which contains all variables as if they were typed in, space delimited. So
put (_all_) (:);
is equivalent to
put (name sex age height weight) (:);
if used with SASHELP.CLASS. Adding anything - a colon, a tilde, an ampersand, etc. - that is legal in the context of formatted output will cause that to be used.
Note that
put _all_ @;
Does not cause that to happen; apparently @ (or @@ or / or //) are all legal arguments to put _all_.
Interestingly, _numeric_ and _character_ do not have an analogous shortcut - clearly this is an explicit, special case just for _all_. They cannot be used without parens. put _numeric_; gives an error that _numeric_ is not a legal variable name. But, put (_numeric_) (:); is perfectly legal.
Try the colon modifier option.
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (:);
run;
Another option is to read the names from the SASHELP.VCOLUMN table, create a macro variable that lists the columns and include that in your put statement.
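A sketch of that approach. The MEETING data here is a small sample modeled on the question, and the temporary fileref OUT stands in for meeting2.txt. SASHELP.VCOLUMN stores library and member names in uppercase, hence 'WORK' and 'MEETING'.

```sas
/* Sample data modeled on the question */
data meeting;
  input region $ mtg :date9. sendmail :date9.;
  format mtg sendmail date9.;
datalines;
N 24NOV1999 10OCT1999
S 28DEC1999 13NOV1999
;
run;

/* Build a space-separated list of variable names in dataset order */
proc sql noprint;
  select name into :varlist separated by ' '
    from sashelp.vcolumn
    where libname = 'WORK' and memname = 'MEETING'
    order by varnum;
quit;

filename out temp;   /* temporary file in place of meeting2.txt */

data _null_;
  set meeting;
  file out dsd dlm=',';
  put &varlist;      /* resolves to: put region mtg sendmail; */
run;
```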
The documentation is a bit scarce:
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000176623.htm
:
enables you to specify a format that the PUT statement uses to write the variable value. All leading and trailing blanks are deleted, and each value is followed by a single blank.
~
enables you to specify a format that the PUT statement uses to write the variable value. SAS displays the formatted value in quotation marks even if the formatted value does not contain the delimiter. SAS deletes all leading and trailing blanks, and each value is followed by a single blank. Missing values for character variables are written as a blank (" ") and, by default, missing values for numeric variables are written as a period (".").
It is easiest to just use a variable list followed by a format list. Syntax is:
(<variable list>) (<format list>)
The values in the format list are repeated until the variables in the variable list are exhausted. The format list can include format modifiers like :, &, ~ or = and cursor movement commands like /, +n, or @n.
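A quick illustration of the format list being reused, with made-up variables X1-X3 and a temporary file OUT2: one Z3. format and one +1 pointer move are applied to each variable in turn.

```sas
filename out2 temp;

data _null_;
  file out2;
  x1 = 1; x2 = 2; x3 = 3;     /* hypothetical variables */
  /* Z3. and +1 are applied to X1, then X2, then X3 */
  put (x1-x3) (z3. +1);
run;
```

The resulting line reads `001 002 003`.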
Also you should add the DSD option to your FILE statement so that missing values are properly represented in the CSV file as having nothing between the delimiters.
So your program reduces to:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' DSD dlm=',';
put (_all_) (:) ;
run;
The problem you had with PUT _ALL_; is that when _ALL_ is used by itself it is treated differently than when it is part of a variable list inside of (). As a variable list it does not include system generated variables such as _N_ or FIRST. or LAST. variables generated by BY statements.
Note that if you want to use _ALL_ in a variable list and still get named output you can use the = format modifier in the format list.
put (_all_) (=) ;
No, I'm Spartacus!
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (&);
run;
data meeting;
input region $ mtg :date9. sendmail :date9.;
format mtg sendmail date9.;
cards;
N 24NOV1999 10OCT1999
S 28DEC1999 13NOV1999
E 03DEC1999 19OCT1999
W 04OCT1999 20AUG1999
;
run;
proc export data=meeting
outfile='c:\input\meeting.txt'
dbms=dlm replace;
delimiter=',';
run;
Hope this is helpful for any number of variables.