tilde, dlm and colon format modifier in list input - sas

There are three concepts I would like to clarify: the : (colon) format modifier, the ~ (tilde) format modifier, and the dlm= option.
data scores;
infile datalines dsd;
input name : $10. score1-score3 team ~ $25. div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
FriedmanLi,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
run;
First, can using : in the INPUT statement completely replace a LENGTH statement? And why don't I need a : for the team variable, something like team : ~ $25. ?
Second, why does SAS automatically recognize , as the delimiter, but not " or a blank?

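The colon (:) format modifier is required to tell SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Do not forget the colons, because without them SAS may read past a delimiter to satisfy the width specified in the informat. If the variable has not been defined earlier, its length is taken from the informat width, so name : $10. creates a character variable of length 10 and no separate LENGTH statement is needed for it.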
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000144370.htm
The tilde (~) format modifier is required to treat single quotation marks, double quotation marks, and delimiters in character values in a special way. This format modifier reads delimiters within quoted character values as characters instead of as delimiters and retains the quotation marks when the value is written to a variable.
Why is this needed? With the DSD option, SAS already treats delimiters inside a quoted value as data, but it strips the enclosing quotation marks before storing the value. If you want the quotation marks kept as part of the value, you have to tell SAS explicitly with the tilde (~). The tilde is itself a list-input format modifier, which is why team ~ $25. works without an additional colon.
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000144370.htm
By default, list input treats blanks as the delimiter; SAS does not automatically recognize , as a delimiter, so you have to tell it explicitly. In your case you used the DSD option, which does three things for you:
(i) It makes , the default delimiter. If you want a different delimiter, you have to specify it with the dlm= option.
(ii) It treats two consecutive delimiters as a missing value and removes quotation marks from character values.
(iii) It treats delimiters inside a value enclosed in quotation marks as character data.
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146932.htm
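To see what the tilde changes, here is a minimal sketch that reads the same record under DSD, once with : and once with ~ on team. The data set names team_colon and team_tilde are made up for illustration, and the record is shortened to two fields.

```sas
/* Colon only: DSD keeps the embedded comma and strips the quotation marks */
data team_colon;
   infile datalines dsd;
   input name : $10. team : $25.;
   datalines;
Smith,"Green Hornets, Atlanta"
;
run;

/* Tilde: the embedded comma is kept and so are the quotation marks */
data team_tilde;
   infile datalines dsd;
   input name : $10. team ~ $25.;
   datalines;
Smith,"Green Hornets, Atlanta"
;
run;
```

Printing the two data sets should show team as Green Hornets, Atlanta in the first and as "Green Hornets, Atlanta" (quotation marks included) in the second.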

Related

SAS Scan function separator not working as it should

I ran into a problem with the scan function in sas.
The dataset I have contains one variable that needs to be split into multiple variables.
The variable is structured like this:
4__J04__1__SCH175__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
I use this code to split this into multiple variables:
data /*ULB.*/work.smart_BCSS_withNISS_&JJ.&K.;
set work.smart_BCSS_withNISS_&JJ.&K.;
/* Maand splitsen in variablen */
mois=scan(smart,1,"__");
jours=scan(smart,2,"__");
nbjours=scan(smart,3,"__");
refClient=scan(smart,4,"__");
paysPrestation=scan(smart,5,"__");
wordingFR=scan(smart,6,"__");
wordingNL=scan(smart,7,"__");
fonction=scan(smart,8,"__");
ARTISTIQUE2=scan(smart,9,"__");
Art_At_LEAST=scan(smart,10,"__");
totalBrut=scan(smart,11,"__");
totalImposable=scan(smart,12,"__");
run;
Most of the time this works perfectly. However sometimes the 4th variable 'refClient' contains one single underscore like this:
4__J04__1__LE_46__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
Somehow the scan function also detects this single underscore as a separator even though the separator is a double underscore.
Any idea on how to avoid this behavior?
Aurieli's code works, but their answer doesn't explain why. Your understanding of how scan works is incorrect.
If there is more than 1 character in the delimiter specified for scan, each character is treated as a delimiter. You've specified _ twice. If you had specified ab then a and b would both have been treated as delimiters, rather than ab being the delimiter.
scan by default treats multiple consecutive delimiters as a single delimiter, which was why your code treated both __ and _ as delimiters. So if you specified ab as the delimiter string then ba, abba etc. would also be counted as a single delimiter by default.
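The two points above can be checked with a short sketch. The variable names s and w1-w3 are only for illustration, and the test string is a shortened piece of the problem value.

```sas
data _null_;
   length s $20 w1-w3 $8;
   s = "LE_46__BE";
   /* "__" is a list of delimiter characters, so it means the same as "_" */
   w1 = scan(s, 1, "__");
   w2 = scan(s, 2, "__");
   /* the run of two underscores counts as a single delimiter */
   w3 = scan(s, 3, "__");
   put w1= w2= w3=;
run;
```

The log should show w1=LE w2=46 w3=BE, which is exactly the unwanted split of LE_46 described in the question.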
You can use a regular expression to replace each single '_' with another character (for example, '-') and then scan as before:
data /*ULB.*/work.test;
smart="4__J04__1__LE_18__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7";
smartcr=prxchange("s/(?<=[^_])(_{1})(?=[^_])/-/",-1,smart);
/* Maand splitsen in variablen */
mois=scan(smartcr,1,"__");
jours=scan(smartcr,2,"__");
nbjours=scan(smartcr,3,"__");
refClient=tranwrd(scan(smartcr,4,"__"),'-','_');
paysPrestation=scan(smartcr,5,"__");
wordingFR=scan(smartcr,6,"__");
wordingNL=scan(smartcr,7,"__");
fonction=scan(smartcr,8,"__");
ARTISTIQUE2=scan(smartcr,9,"__");
Art_At_LEAST=scan(smartcr,10,"__");
totalBrut=scan(smartcr,11,"__");
totalImposable=scan(smartcr,12,"__");
run;
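A variation on the same idea, offered only as a sketch: instead of protecting the single underscores, replace the two-character separator with one character that is assumed never to occur in the data (a pipe here, which is an assumption about the data and not something the question states) and scan on that character.

```sas
data _null_;
   length smart tmp $200 refClient $32;
   smart = "4__J04__1__LE_46__BE__compositeur / arrangeur";
   /* Assumption: the pipe character never occurs in the data */
   tmp = tranwrd(smart, "__", "|");
   /* The single underscore in LE_46 is no longer a delimiter */
   refClient = scan(tmp, 4, "|");
   put refClient=;
run;
```

The log should show refClient=LE_46. Note that scan still collapses consecutive delimiters by default, so an empty field (two separators in a row) would need extra handling.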
Mildly interesting: the INFILE statement supports a delimiter string through the DLMSTR= option.
data test;
infile cards dlmstr='__';
input (mois
jours
nbjours
refClient
paysPrestation
wordingFR
wordingNL
fonction
ARTISTIQUE2
Art_At_LEAST
totalBrut
totalImposable) (:$32.);
cards;
4__J04__1__SCH175__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
4__J04__1__LE_46__BE__compositeur / arrangeur__compositeur / bewerker__(blank)__1__17__108.03__93.7
;;;;
run;
proc print;
run;

SAS search and replace the last word using regex [duplicate]

Match strings ending in certain character
I am trying to create a new variable which indicates if a string ends with a certain character.
Below is what I have tried, but when this code is run, the variable ending_in_e is all zeros. I would expect that names like "Alice" and "Jane" would be matched by the code below, but they are not:
proc sql;
select *,
case
when prxmatch("/e$/",name) then 1
else 0
end as ending_in_e
from sashelp.class
;quit;
You should account for the fact that SAS character variables have a fixed length: if the actual value is shorter than the variable's length, it is padded with trailing spaces.
Either trim the string:
prxmatch("/e$/",trim(name))
Or add a whitespace pattern:
prxmatch("/e\s*$/",name)
Here \s* matches zero or more whitespace characters before the end of the string.
SAS character variables are fixed length. So you either need to trim the trailing spaces or include them in your regular expression.
Regular expressions are powerful, but they might be confusing to some. For such a simple pattern it might be clearer to use simpler functions.
proc print data=sashelp.class ;
where char(name,length(name))='e';
run;
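The effect of the trailing blanks can be reproduced outside PROC SQL with a small sketch. The length of 8 matches name in sashelp.class; the variable names raw, trimmed and padded are illustrative.

```sas
data _null_;
   length name $8;
   name = "Alice";
   /* The stored value is "Alice" plus three blanks, so e is not at the end */
   raw     = prxmatch("/e$/", name);
   /* TRIM removes the trailing blanks before the match */
   trimmed = prxmatch("/e$/", trim(name));
   /* \s* lets the pattern skip over the trailing blanks */
   padded  = prxmatch("/e\s*$/", name);
   put raw= trimmed= padded=;
run;
```

The log should show raw=0 trimmed=5 padded=5, since PRXMATCH returns the position where the match starts.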

Modified list input using dsd

I am reading the Language Reference: Concepts, and it mentions that I can use ~ (tilde) to deal with character values that contain delimiters.
The following is the example code:
data scores;
infile datalines dsd;
input name : $9. Score1-score3 Team ~ $25. div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
run;
I wonder why there is a problem if I comment out the INFILE statement.
Commenting out the INFILE statement also removes the DSD option, which is what makes the comma the delimiter. Without it, the expected delimiter is a blank.
You don't need the tilde (~) to read the team variable correctly, but you do need it to include the quotation marks.
DSD (delimiter-sensitive data)
specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data. The DSD option changes how SAS treats delimiters when you use LIST input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values.
Interaction: Use the DELIMITER= or DLMSTR= option to change the delimiter.
Tip: Use the DSD option and LIST input to read a character value that contains a delimiter within a string that is enclosed in quotation marks. The INPUT statement treats the delimiter as a valid character and removes the quotation marks from the character string before the value is stored. Use the tilde (~) format modifier to retain the quotation marks.

How to check if a string contains apostrophe

I don't remember how SAS deals with these special characters. Are there any built-in functions?
E.g
a = New Year's Day, should I use something like index(a, 'New Year's Day') > 0?
The key to this question is how to quote the apostrophe. If you wish to look for an occurrence of a single apostrophe, you can enclose it in double quotation marks:
Looking for single apostrophes
data _NULL_;
a="New Year's Day";
b=index(a,"'");
put b=;
run;
The single apostrophe is passed as a second argument to the index function, using double quotes.
Looking for double quotes
data _NULL_;
a='They said, "Happy New Year!"';
b=index(a,'"');
put b=;
run;
This time around, the double quote is set inside single quotes when passed to the index function.
mjsqu and NeoMental covered the basic case well, but in the special case where you do not have the option of using " (for example, you need to prevent macro variable resolution), you can double the apostrophe:
data _null_;
a='MerryXmas&HappyNewYear''s'; *here need single quotes or a macro quoting function;
b=find(a,"'"); *here do not need to mask ampersand resolution;
run;
Of course you could also use %nrstr to avoid resolution, but there are real life cases where this is occasionally needed. This works with "" similarly (two "" become one character ").
Use the FIND function as shown below to check whether the character you are looking for occurs in the string. If the returned value is greater than 0, the apostrophe (or whatever you are looking for) is present; otherwise it is not.
TestString is the string you want to search.
The second argument, "'", is what you are looking for, in quotes; in your case, an apostrophe.
data _null_;
TestString="New year's day";
IsItThere=find(TestString,"'");
put IsItThere=;
run;
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002267763.htm

In SAS, what does the option "dsd" stand for?

I have a quick question.
I am learning SAS and have come across the DSD option.
Does anyone know what this stands for? It might assist in remembering / contextualizing.
Thanks.
Rather than just copying and pasting text from the internet, I'll try to explain it a bit more clearly. Like the delimiter option DLM=, DSD is an option that you can use in the INFILE statement.
Suppose a delimiter has been specified with DLM= and we used DSD. If SAS sees two delimiters that are side by side or with only blank space(s) between them, then it would recognize this as a missing value.
For example, if text file dog.txt contains the row:
171,255,,dog
Then,
data test;
infile 'C:\sasdata\dog.txt' DLM=',' DSD;
input A B C D $;
run;
will output:
A B C D
171 255 . dog
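Therefore, variable C will be missing, denoted by the period (.). If we had not used DSD, the two consecutive commas would have been treated as a single delimiter, so SAS would have tried to read dog into the numeric variable C and reported invalid data.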
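For anyone without the text file at hand, the same comparison can be sketched with in-stream data. The data set names with_dsd and without_dsd are made up, and TRUNCOVER is added to the second step (it is not in the original code) so that SAS does not go looking for D on a following line.

```sas
/* With DSD: the empty field between the two commas becomes a missing value */
data with_dsd;
   infile datalines dlm=',' dsd;
   input A B C D $;
   datalines;
171,255,,dog
;
run;

/* Without DSD: the two commas act as one delimiter, so C is handed "dog" */
data without_dsd;
   infile datalines dlm=',' truncover;
   input A B C D $;
   datalines;
171,255,,dog
;
run;
```

In the first data set C should be missing and D should be dog. In the second, the log should report invalid data for C, and D should be left blank.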
DSD stands for Delimiter-Sensitive Data.
The DSD (Delimiter-Sensitive Data) option in the INFILE statement does three things for you. 1: it ignores delimiters in data values enclosed in quotation marks; 2: it removes the enclosing quotation marks so they do not become part of your data; 3: it treats two consecutive delimiters as a missing value.
Source: easy sas
DSD (delimiter-sensitive data)
specifies that when data values are enclosed in quotation marks,
delimiters within the value are treated as character data. The DSD
option changes how SAS treats delimiters when you use LIST input and
sets the default delimiter to a comma. When you specify DSD, SAS
treats two consecutive delimiters as a missing value and removes
quotation marks from character values.
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146932.htm
DSD refers to delimited data files that have delimiters back to back when there is missing data. In the past, programs that created delimited files always put a blank for missing data. Today, however, PC software does not put in blanks, which means that the delimiters are not separated. The DSD option of the INFILE statement tells SAS to watch out for this. Below are examples (using comma-delimited values) to illustrate:
Old Way: 5,4, ,2, ,1 ===> INFILE 'file' DLM=',' ... etc
New Way: 5,4,,2,,1 ===> INFILE 'file' DLM=',' DSD ... etc.