Reading a period as character value in SAS

I am trying to read a period '.' as a character variable's value, but SAS stores it as a blank.
data output1;
input @1 a $1. @2 b $1. @3 c $1.;
datalines;
!..
1.3
;
run;
Output     Required
------     --------
A B C      A B C
!          ! . .
1   3      1 . 3
Please help me in reading a period as such.

The output is determined by the informat used. In your code, $1. requests the $w. informat: it is first of all an informat specification, and the variable's length is only a side effect of it.
Use the $CHARw. informat for the desired result.
data output1;
input @1 a $char1. @2 b $char1. @3 c $char1.;
datalines;
!..
1.3
;
run;
From documentation:
$w Informat
The $w. informat trims leading blanks and left aligns the values before storing the text. In addition, if a field contains only blanks and a single period, $w. converts the period to a blank because it interprets the period as a missing value. The $w. informat treats two or more periods in a field as character data.
$CHARw. informat
The $CHARw. informat does not trim leading and trailing blanks or convert a single period in the input data field to a blank before storing values.
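To compare the two informats side by side, here is a minimal sketch that reads the same three columns twice. The data set name compare and the variable names (a_w, a_ch, and so on) are made up for illustration. Based on the documentation quoted above, the $1. variables should come out blank wherever the column holds a single period, while the $char1. variables keep it.

```sas
data compare;
  /* Same columns read twice: first with $w., then with $CHARw. */
  input @1 a_w  $1.     @2 b_w  $1.     @3 c_w  $1.
        @1 a_ch $char1. @2 b_ch $char1. @3 c_ch $char1.;
datalines;
!..
1.3
;
run;

proc print data=compare;
run;
```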

I don't immediately see why it does not work.
But if you are not interested in figuring out why it does not work and just want something that does: read it in as one variable of length $3, then split it in a second step using substr.
E.g.,
data output1;
length tmp $3;
input tmp;
datalines;
!..
1.3
;
run;
data output2 (drop=tmp);
length a $1;
length b $1;
length c $1;
set output1;
a=substr(tmp,1,1);
b=substr(tmp,2,1);
c=substr(tmp,3,1);
run;

Related

Why does my regex only change my first entry in SAS?

I have a number of text entries (municipalities) from which I need to remove the s at the end.
Data test;
input city $;
datalines;
arjepogs
askers
Londons
;
run;
data cities;
set test;
if prxmatch("/^(.*?)s$/",city)
then city=prxchange("s/^(.*?)s$/$1/",-1,city);
run;
Strangely enough, my s's are only removed from my first entry.
What am I doing wrong?
You defined CITY as length $8. The s in Londons is in the 7th position of the string, not the last position, because the value is padded with a trailing space. Use the TRIM() function to remove the trailing spaces from the value of the variable.
data have;
input city $20.;
datalines;
arjepogs
Kent
askers
Londons
;
data want;
set have;
length new_city $20 ;
new_city=prxchange("s/^(.*?)s$/$1/",-1,trim(city));
run;
Result
Obs    city        new_city
1      arjepogs    arjepog
2      Kent        Kent
3      askers      asker
4      Londons     London
You could also just change the REGEX to account for the trailing spaces.
new_city=prxchange("s/^(.*?)s\ *$/$1/",-1,city);
Here is another solution using only SAS string functions and no regex. Note that in this case there is no need to trim the variable, because LENGTH() already ignores trailing blanks:
data cities;
set test;
if substr(city,length(city)) eq "s" then
city=substr(city,1,length(city)-1);
run;
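The trailing-blank effect described in the answers above can be checked in isolation. This is a small sketch, not part of the original answers; the variable names len_trimmed, ends_padded and ends_trimmed are invented for the illustration.

```sas
data _null_;
  length city $8;
  city = 'Londons';                             /* stored as 'Londons ' (padded to 8) */
  len_trimmed  = length(city);                  /* 7: LENGTH ignores trailing blanks */
  ends_padded  = prxmatch('/s$/', city);        /* 0: the last stored character is a blank */
  ends_trimmed = prxmatch('/s$/', trim(city));  /* 7: position where the match starts */
  put len_trimmed= ends_padded= ends_trimmed=;
run;
```

This is also why arjepogs worked in the question: it fills all 8 characters, so there is no padding between the final s and the end of the string.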

Removing Characters from SAS String Starting on Left

I have a SAS string that always starts with a date. I want to remove the date from the string.
Example of data is below (data does not have bullets, included bullets to increase readability)
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
I want the data to look like this (data does not have bullets, included bullets to increase readability)
test_num15
recom_1_test1
test_0_8_i0|vacc_previous0
Use INDEX to find the position of the first '|' in the string, then SUBSTR to keep everything after it; or use a regular expression.
data have;
input x $50.;
x1=substr(x,index(x,'|')+1);
x2=prxchange('s/([^_]+\|)(?=\w+)//',1,x);
cards;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;
run;
This is a great use case for call scan. If the date always has the same length (10 characters), you don't actually need it: start would always be 12, so you can skip straight to the substr (as user667489 noted in comments). If the length can vary, call scan is helpful.
data have;
length textstr $100;
input textstr $;
datalines;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;;;;
run;
data want;
set have;
call scan(textstr,2,start,length,'|');
new_textstr = substr(textstr,start);
run;
It would also let you grab only the second word if that's useful (by passing length as the third argument of substr).
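As a sketch of that second-word variant: the data set name second_only and the variables second_word and len are hypothetical, and the input is a shortened copy of the sample data. The start > 0 check is there because CALL SCAN returns a position of 0 when the requested word does not exist.

```sas
data have;
  length textstr $100;
  input textstr $;
datalines;
10/01/2016|test_num15
03/04/2017|test_0_8_i0|vacc_previous0
;
run;

data second_only;
  set have;
  length second_word $100;
  /* position and length of the 2nd '|'-delimited word */
  call scan(textstr, 2, start, len, '|');
  /* copy only that word; skip rows that have no 2nd word */
  if start > 0 then second_word = substr(textstr, start, len);
run;
```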

SAS PRXCHANGE a number between words into similar number of spaces

Is it possible to use the number in this string:
'xx8xx'
by replacing the number with 8 spaces to get this string:
'xx        xx'
I can identify the number between the xx but the replacement syntax does not work as intended:
PRXCHANGE(s/xx([\d]*)xx/' ' x $1/io, -1, 'xx8xx')
Is there a way to use the number being held in $1 to repeat the space character by that number i.e. something like ' ' x $1?
Any help much appreciated!
Tiaan
Suppose you need to replace the number with three blanks.
data _null_;
x=prxchange('s/(xx)\d+(xx)/$1   $2/', -1, 'xx8xx');
_x=prxchange('s/(?=\w+)(\d+)/   /',1,'xx8xx');
put _all_;
run;
Edit:
I missed an important detail. TRANWRD and REPEAT can be used to get it. Note that REPEAT(' ', n) returns n+1 blanks, hence the -1.
data _null_;
x=tranwrd('xx8xx', strip(prxchange('s/.*?(\d+).*/$1/',1,'xx8xx')), repeat(' ',input(prxchange('s/.*?(\d+).*/$1/',1,'xx8xx'),8.)-1));
put _all_;
run;
You'll need to extract first, then compile a new regex. This will be expensive since you have to compile once per line.
data have;
input xstr $;
datalines;
xx8xx
xx3xx
xx4xx
;;;;
run;
data want;
set have;
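rx1 = prxparse('/xx(\d+)xx/io'); /* capture the whole run of digits */
rc1 = prxmatch(rx1,xstr);
num_x = input(prxposn(rx1,1,xstr), 8.); /* captured digits as a number */
/* repeat(" ",n-1) yields n blanks; this pattern is rebuilt and recompiled for every row */
rx2 = prxparse(cat('s/(xx)\d+(xx)/$1',repeat(" ",num_x-1),'$2/i'));
newstr = prxchange(rx2,-1,xstr);
call prxfree(rx2); /* release the per-row pattern */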
run;
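If the per-row compile is a concern, one possible alternative is to compile a single pattern once and rebuild the string with SUBSTR and REPEAT. This is only a sketch under a few assumptions: it reuses the have data set from above, handles just the first match in each value, and the names want2, rx, pos, len and n are invented here.

```sas
data want2;
  set have;
  length newstr $200;
  retain rx;
  /* compile the pattern once, on the first row only */
  if _n_ = 1 then rx = prxparse('/xx(\d+)xx/i');
  newstr = xstr;
  if prxmatch(rx, xstr) then do;
    /* position and length of the captured digits */
    call prxposn(rx, 1, pos, len);
    n = input(substr(xstr, pos, len), 8.);
    /* repeat(' ', n-1) yields n blanks; cat keeps them untrimmed */
    if n > 0 then
      newstr = cat(substr(xstr, 1, pos - 1), repeat(' ', n - 1), substr(xstr, pos + len));
  end;
  drop rx pos len;
run;
```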

What is the difference between dsd and dlm="," in SAS?

Let me sum up what I have got from this website. https://communities.sas.com/t5/General-SAS-Programming/Please-explain-DSD-and-DLM-differences/td-p/146773
(1): Without dsd, the cursor passes over all consecutive delimiters before reading the next field, whereas with dsd the cursor passes over only one delimiter.
(2): If you use dsd, the informat should use a colon somehow?
Do you know any differences between the two? Many thanks for your time and attention.
The most obvious difference is how DSD treats consecutive delimiters. From the docs:
When you specify DSD, SAS treats two consecutive delimiters as a
missing value and removes quotation marks from character values.
Whereas the default functionality of DLM=',' is to treat consecutive commas as a single comma, DSD will assign missing values between consecutive commas. Here's an example:
data work.dlm_test;
infile datalines dlm=','; /* using dlm */
input var1 var2 var3;
datalines; /* note how the consecutive commas are treated! */
1,2,3
1,,3
,2,3
;
data work.dsd_test;
infile datalines dsd; /* using dsd */
input var1 var2 var3;
datalines;
1,2,3
1,,3
,2,3
;
proc print data=dlm_test;
/* this will print something like:
OBS | var1 | var2 | var3
-----+------+------+------
1 | 1 | 2 | 3
2 | 1 | 3 | 2 <--- var3 is read from the next input line
Note: only 2 observations because of the default FLOWOVER behavior.
Also, the final '3' is ignored because there is no variable to store it.
*/
run;
proc print data=dsd_test;
/* this will print something like:
OBS | var1 | var2 | var3
-----+------+------+------
1 | 1 | 2 | 3
2 | 1 | . | 3 <-- note the missing value in var2
3 | . | 2 | 3 <-- note the third observation, with missing val
*/
run;
Also, DSD will be able to tell that a comma found inside quotation marks is actually not a delimiter, but part of a character string. In contrast, if you use only DLM=',', then it will ignore quotation marks and treat every comma-cluster as a delimiter.
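A small sketch of the quoted-comma case, using made-up data and the hypothetical data set names quoted_dsd and quoted_dlm. With DSD the first field should arrive as a single value with the quotes removed; with DLM=',' alone the comma inside the quotes splits it into two fields.

```sas
data quoted_dsd;
  infile datalines dsd;        /* quotes protect the embedded comma */
  input name :$20. city :$20.;
datalines;
"Smith, John",Boston
"Lee, Ann",Austin
;
run;

data quoted_dlm;
  infile datalines dlm=',';    /* quotes are treated as ordinary characters */
  input name :$20. city :$20.;
datalines;
"Smith, John",Boston
"Lee, Ann",Austin
;
run;

proc print data=quoted_dsd; run;
proc print data=quoted_dlm; run;
```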
TIP: By default, DSD drops the quotes around character strings, but you can keep the quotes by using the ~ (tilde) format modifier in the INPUT statement.
It is useful to note that DSD and DLM can also be used together to get the behavior of DSD while changing the default delimiter from a comma to something else, like a semicolon (;). Example:
infile (filename) dsd dlm=';';
I found this documentation page to be the most instructive.
Remember: DSD stands for "delimiter-sensitive data" because it is more deliberate about processing delimiters!
The real issue is how the input statement behaves when it sees a delimiter as it starts to read a variable. With the DSD option it will set the value to missing and move the pointer past the delimiter. Without the DSD option it will skip over the delimiter (or multiple adjacent delimiters) before reading the value. You can confirm this by reading a line that starts with a delimiter.
The colon modifier helps when the actual value is shorter than the informat's width, but it also helps by moving the pointer PAST the delimiter so that the NEXT variable is read correctly. This is what makes it important when using formatted input statements with the DSD infile option.
You can avoid the need to worry about the : modifiers by using an INFORMAT statement instead of listing the informats in the input statement.
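As an illustration of the INFORMAT-statement approach with DSD, here is a hedged sketch; the data set name with_informat, the variables and the sample values are all invented. Because the INPUT statement uses plain list input, each field ends at the delimiter and the informats come from the INFORMAT statement.

```sas
data with_informat;
  infile datalines dsd;
  /* informats declared once, so INPUT needs no : modifiers */
  informat name $10. joined mmddyy10. amount comma8.;
  format joined date9.;
  input name joined amount;
  /* equivalent with modifiers: input name :$10. joined :mmddyy10. amount :comma8.; */
datalines;
Smith,01/15/2020,"1,250"
Jones,,300
;
run;

proc print data=with_informat;
run;
```

The empty second field on the Jones line should give a missing joined value, which is the consecutive-delimiter behavior discussed earlier.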

Character value with embedded blanks with list input

I would like to read following instream datalines
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
I used the code below, but it reads the AAA items into the team variable rather than into div. Also, where should I place the & (ampersand) modifier to read character values with embedded blanks?
data scores2;
infile datalines dlm=",";
input name : $10. score1-score3 team $20. div $;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
Notice that I have used : before team as well (you already used the colon modifier for name, so I am not sure why you left it out here). As I mentioned in your other question (tilde, dlm and colon format modifier in list input), the : modifier tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Because you had not used it, SAS tried to read 20 characters for team even though there was a delimiter in between.
Tested
data scores2;
infile datalines dlm=",";
input name : $10.
score1-score3
team : $20.
div : $3.;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
Another way to do this that's often a bit easier to read is to use the informat statement.
data scores2;
infile datalines dlm=",";
informat name $10.
team $20.
div $4.;
input name $ score1-score3 team $ div $;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
That accomplishes the same thing as using the colon (input name :$10.) but organizes it a bit more cleanly.
And just to be clear, embedded blanks are irrelevant in comma-delimited input; '20'x (i.e., space) is just another character when it's not the delimiter. What the ampersand does is addressed in this article; more specifically, if space is the delimiter, it lets you require two consecutive delimiters to end a field. Example:
data scores2;
infile datalines dlm=" ";
informat name $10.
team $20.
div $4.;
input name $ score1-score3 team & $ div $;
datalines;
Smith 12 22 46 Green Hornets  AAA
FriedmanLi 23 19 25 High Volts  AAA
Jones 09 17 54 Las Vegas  AA
;
run;
Note the double space after all of the team names - that's required by the &. But this is only because delimiter is space (which is default, so if you removed the dlm=' ' it would also be needed.)