I have a quick question.
I am learning SAS and have come across the dsd= option.
Does anyone know what this stands for? It might assist in remembering / contextualizing.
Thanks.
Rather than just copy and pasting text from the internet. I'll try to explain it a bit clearer. Like the delimiter DLM=, DSD is an option that you can use in the infile statement.
Suppose a delimiter has been specified with DLM= and we used DSD. If SAS sees two delimiters that are side by side or with only blank space(s) between them, then it would recognize this as a missing value.
For example, if text file dog.txt contains the row:
171,255,,dog
Then,
data test;
infile 'C:\sasdata\dog.txt' DLM=',' DSD;
input A B C D $;
run;
will output:
A B C D
171 255 . dog
Therefore, variable C will be missing denoted by the .. If we had not used DSD, it would return as invalid data.
DSD stands for Delimiter-Sensitive Data.
The DSD (Delimiter-Sensitive Data) in infile statement does three things for you. 1: it ignores delimiters in data values enclosed in quotation marks; 2: it ignores quotation marks as part of your data; 3: it treats two consecutive delimiters in a row as missing value.
Source: easy sas
DSD (delimiter-sensitive data)
specifies that when data values are enclosed in quotation marks,
delimiters within the value are treated as character data. The DSD
option changes how SAS treats delimiters when you use LIST input and
sets the default delimiter to a comma. When you specify DSD, SAS
treats two consecutive delimiters as a missing value and removes
quotation marks from character values.
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146932.htm
DSD refers to delimited data files that have delimiters back to back when there is missing data. In the past, programs that created delimited files always put a blank for missing data. Today, however, pc software does not put in blanks, which means that the delimiters are not separated. The DSD option of the INFILE statement tells SAS to watch out for this. Below are examples (using comma delimited values) to illustrated:
Old Way: 5,4, ,2, ,1 ===> INFILE 'file' DLM=',' ... etc
New Way: 5,4,,2,,1 ===> INFILE 'file' DLM=',' DSD ... etc.
Refer
reference
Related
I want to replace certain values in my json file (in this example null values with empty quotation marks.) My solution is working correctly but, for some mysterious reason, the last character of the json file is deleted. Regardless of the last character, the code always deletes it - I have also tried with a different json file that ends in curly braces.
What is causing this and more importantly how can I prevent this?
data testdata_;
input var1 var2 var3;
format _all_ commax10.1;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
proc json out = 'G:\test.json' pretty fmtnumeric nosastags keys;
export testdata_;
run;
data _null_;
infile 'G:\test.json';
file 'G:\test.json';
input;
_infile_ = tranwrd(_infile_,'null','""');
put _infile_ ;
run;
To see how the contents change, first run the code until "data null" statement and check the file content, then run the last statement.
Data _null_ has it correct; don't write to the same file. SAS offers this option, but in the modern day it's almost always the wrong answer, due to how SAS supports this and the fact that storage is sufficiently cheap and fast.
In this case, it looks like it's a relatively easy fix, but you probably should do as suggested and write to a new file anyway - there will be other issues.
data testdata_;
input var1 var2 var3;
format _all_ commax10.1;
datalines;
3.1582 0.3 1.8
21 . .
1.2 4.5 6.4
;
proc json out = 'H:\temp\test.json' pretty fmtnumeric nosastags keys;
export testdata_;
run;
data _null_;
infile 'H:\temp\test.json' end=eof;
file 'H:\temp\test.json';
input #;
putlog _infile_;
_infile_ = tranwrd(_infile_,'null','"" ');
len = length(_infile_);
put _infile_ ;
if eof then put _infile_;
run;
There's two changes. One, I use '"" ' instead of '""' in the tranwrd; that's because otherwise you end up with slightly odd results with new lines being added. If your JSON parser doesn't like "" ,, then you may want to instead have two tranwrd, one for null, and one for null, or something similar (or use a regular expression). But what's important is the number of characters needs to match in the input and the output. If you can't handle that (like the extra spaces are problematic) then you're left with "write a new file".
Two, I look for the end of the file, then intentionally write out a second line there. That avoids the issue you're having with the bracket, as it avoids having the EOF being written out before the bracket. I'm not 100% sure I know why you need that - but you do.
Another option, which might make more sense, is to only write the lines that have the bracket.
data _null_;
infile 'H:\temp\test.json' sharebuffers;
file 'H:\temp\test.json';
input #;
putlog _infile_;
if find(_infile_,'null') then do;
_infile_ = tranwrd(_infile_,'null','"" ');
put _infile_;
end;
run;
I added sharebuffers because that should make it run a bit faster. Note that I also remove one space - something weird about how SAS does this seems to otherwise remove a space from the following line otherwise. No idea why, probably something weird with EOL characters.
But again - don't do any of this unless there's no other option. Write a new file.
One strange thing is that the PROC JSON always writes a text file that uses LF as the end of line characters.
So you might be able to get your overwriting of the file to work if add these caveats:
Use TERMSTR=LF on the INFILE statement.
Use SHAREDBUFFERS on the INFILE statement.
Replace the string with the same number of bytes with the TRANWRD() function and not put a space as the last character on the line.
I would also search for ': null' instead of just 'null' to reduce risk of replacing those characters in some other string in the file.
data _null_;
infile json SHAREBUFFERS termstr=lf ;
file json ;
input ;
_infile_ = tranwrd(_infile_,': null',': ""');
put _infile_;
run;
I am trying to change the delimiter from comma to pipe in a text file using SAS. The data in the input file looks like-
Site,Variable,20151120010000,5.82,1,1,Project|Code|comment
Site,Variable,20151120020000,5.82,1,1,Project|Code|comment
Site,Variable,20151120030000,5.81,1,1,Project|Code|comment, out of service
I want to change the commas (delimiter) to pipe but if there is a comma (for example in the last line), I don't want to change it to pipe. Basically Project|Code|comment, out of service is one column. I am using the code below (as suggested by a stack overflow member)-
%let flname1=D:\temp\comma_file_%sysfunc(today(),yymmddn8.).txt;
%put &=flname1;
%let flname2=D:\temp\pipe_file_%sysfunc(today(),yymmddn8.).txt;
%put &=flname2;
data _null_;
length x1-x9 $200;
infile "&flname1" dsd dlm=',' truncover;
file "&flname2" dsd dlm='|';
input x1-x9;
put x1-x9;
run;
The output I get using this code looks like-
Site|Variable|20151120010000|5.82|1|1|"Project|Code|comment"||
Site|Variable|20151120020000|5.82|1|1|"Project|Code|comment"||
Site|Variable|20151120030000|5.81|1|1|"Project|Code|comment"|out of service|
I want the output to look like-
Site|Variable|20151120010000|5.82|1|1|Project|Code|comment
Site|Variable|20151120020000|5.82|1|1|Project|Code|comment
Site|Variable|20151120030000|5.81|1|1|Project|Code|comment,out of service
This might be pretty easy but I am just starting to learn SAS. Any help is greatly appreciated.
Just read the file as a series of text fields and re-write it using a different delimiter. Your problem is that the first few fields are comma delimited and the last two are pipe delimited. It looks like you have three fields but that the first field is 7 comma delimited values. So read the last two columns using ('|') as the delimiter and the first 7 using both pipe and comma ('|,') as the delimiter. Then re-write it using another the delimiter. You will need to make two filerefs to allow it be processed using different delimiters.
filename original "sample1.dat";
filename copy "sample1.dat";
filename out "sample1.csv";
data _null_;
length field1-field9 $200;
infile original dsd dlm='|' truncover;
input field7-field9 ;
infile copy dsd dlm=',|' truncover;
input field1-field7 ;
file out dsd dlm='|';
put field1-field9;
run;
That will generate what you requested.
Note that if you write the new file using comma (',') as the delimiter it will look like this instead since SAS will protect the embedded delimiter with quotes.
Site,Variable,20151120010000,5.82,1,1,Project,Code,comment
Site,Variable,20151120020000,5.82,1,1,Project,Code,comment
Site,Variable,20151120030000,5.81,1,1,Project,Code,"comment, out of service"
Or you could use the SCAN() function to break the first field out into 7. That eliminates the need to read the line with multiple delimiters.
data _null_;
length field1-field9 $200;
infile original dsd dlm='|' truncover;
input field7-field9 ;
array field (9);
do i=1 to 7;
field(i) = scan(field7,i,',','m');
end;
file out dsd dlm='|';
put field1-field9;
run;
I would simply import the original column that contains the pipe separation, then create three new columns using scan to split the data:
data test;
set test;
project=scan(piped,1,'|');
code=scan(piped,2,'|');
comment=scan(piped,3,'|');
run;
I am reading the Language Reference: Concepts; and it mentioned i can use ~(tilde) to deal with character values that has delimiters within character values.
And following is the example code
data scores;
infile datalines dsd;
input name : $9. Score1-score3 Team ~ $25. div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
run;
I am wonder why there is problem if i comment out the infile statement.
You also remove the DSD option which specified that the delimiter is a comma. Otherwise the expected delimiter is a space.
You don't need the tilde (~) to read the team variable correctly, but you do need it to include the quotation marks.
DSD (delimiter-sensitive data)
specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data. The DSD option changes how SAS treats delimiters when you use LIST input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values.
Interaction: Use the DELIMITER= or DLMSTR= option to change the delimiter.
Tip:Use the DSD option and LIST input to read a character value that contains a delimiter within a string that is enclosed in quotation marks. The INPUT statement treats the delimiter as a valid character and removes the quotation marks from the character string before the value is stored. Use the tilde (~) format modifier to retain the quotation marks.
~ Tilde is required to to treat single quotation marks, double quotation marks, and delimiters in character values in a special way. This format modifier reads delimiters within quoted character values as characters instead of as delimiters and retains the quotation marks when the value is written to a variable.
Why this is needed, because SAS has reserved certain delimiters for it's own functioning i.e. single quotation marks, double quotation marks are used to represent strings, when you want SAS to treat these quotation marks differently you have to tell explicitly it to SAS using - Tilde (~)
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000144370.htm
SAS can only automatically recognize single blank as delimiter and it cannot automatically recognize , as delimiter.You would have to explicitly tell it to SAS. In your case you have used the option dsd which does three things for you.
(i) It automatically by default take , as your delimiter. If you want to provide any other delimiter, you would have to specifically tell it to SAS then using dlm= option.
(ii) SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values
(iii)specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146932.htm
There are 3 concepts i would like to clarify. :(colon format modifier), ~(tilde) and dlm=
data scores;
infile datalines dsd;
input name : $10. score1-score3 team ~ $25. div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
FriedmanLi,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
run;
Firstly, usage of : in input statement can totally replace length statement? And why i do not need : for team variable sth like team : ~ $25. ?
Secondly, why sas can automatically recoginize , is the delimiter but not " or blank ?
Colon Operator is required
to tell SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Do not forget the colons because without them SAS may read past a delimiter to satisfy the width specified in the informat.
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000144370.htm
~ Tilde is required to
to treat single quotation marks, double quotation marks, and delimiters in character values in a special way. This format modifier reads delimiters within quoted character values as characters instead of as delimiters and retains the quotation marks when the value is written to a variable.
Why this is needed, because SAS has reserved certain delimiters for it's own functioning i.e. single quotation marks, double quotation marks are used to represent strings, when you want SAS to treat these quotation marks differently you have to tell explicitly it to SAS using - Tilde (~)
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000144370.htm
SAS can only automatically recognize single blank as delimiter and it cannot automatically recognize , as delimiter.You would have to explicitly tell it to SAS. In your case you have used the option dsd which does three things for you.
(i)
It automatically by default take , as your delimiter. If you want to provide any other delimiter, you would have to specifically tell it to SAS then using dlm= option.
(ii)
SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values
(iii)specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146932.htm
I have a string which looks like this:
"ABAR_VAL", "ACQ_EXPTAX_Y", "ACQ_EXP_TAX", "ADJ_MATHRES2"
And I'd like it to look like this:
ABAR_VAL ACQ_EXPTAX_Y ACQ_EXP_TAX ADJ_MATHRES2
I.e. no apostrophes or commas and single space separated.
What is the cleanest / shortest way to do so in SAS 9.1.3?
Preferably something along the lines of:
call symput ('MyMacroVariable',compress(????,????,????))
Just to be clear, the result needs to be single space separated, devoid of punctuation, and contained in a macro variable.
Here you go..
data test;
var1='"ABAR_VAL", "ACQ_EXPTAX_Y", "ACQ_EXP_TAX", "ADJ_MATHRES2"';
run;
data test2;
set test;
call symput('macrovar',COMPBL( COMPRESS( var1,'",',) ) );
run;
%put ¯ovar;
Is this part of an infile statement or are you indeed wanting to create macro variables that contain these values? If this is part of an infile statement you shouldn't need to do anything if you have the delimiter set properly.
infile foo DLM=',' ;
And yes, you can indeed use the compress function to remove specific characters from a character string, either in a data step or as part of a macro call.
COMPRESS(source<,characters-to-remove>)
Sample Data:
data temp;
input a $;
datalines;
"boo"
"123"
"abc"
;
run;
Resolve issue in a data step (rather than create a macro variable):
data temp2; set temp;
a=compress(a,'"');
run;
Resolve issue whilst generating a macro variable:
data _null_; set temp;
call symput('MyMacroVariable',compress(a,'"'));
run;
%put &MyMacroVariable.;
You'll have to loop through the observations in order to see the compressed values the variable for each record if you use the latter code. :)
To compress multiple blanks into one, use compbl : http://www.technion.ac.il/docs/sas/lgref/z0214211.htm