Concatenation option? '!' - sas

I have been studying basic level SAS and here is a problem that I don't understand.
data test;
A='Ipswich, England';
B=substr(A,1,7);
C=B!!';'!!'England';
run;
According to the problem, the value of C must be Ipswich , England.
I tried the code and there are three things that I would like to ask in particular.
1) Why is it okay to use !! instead of ||? Is !! a different concatenation operator?
2) The result I got was Ipswich ;England. So I don't know what the comma is doing there instead of the semicolon.
3) Why is there an extra space after Ipswich? Shouldn't B be only the first 7 letters of A, i.e. I s p w i c h?
The text I am working on has some weird expressions so there is a chance that it is a typo, but I do not want to go there yet.
Thank you.

You can use !! as an alias for ||. Old keyboards didn't have the | character. Also old ASCII/EBCDIC transcoders didn't always translate that character properly.
Your code is definitely using a semi-colon and not a comma. So either a typo or a transcription error is why the suggested answer has a comma.
Since you didn't tell SAS what length to use for variable B, it had to guess. It guessed that B should use the same length as the input to the SUBSTR() function call, so both A and B are defined as 16 bytes long. The || operator does not trim trailing spaces, so the semicolon is the 17th byte of C.
data test;
  A='Ipswich, England';
  B=substr(A,1,7);
  C=B!!';'!!'England';
  put (a b c) (=$quote.);
run;
A="Ipswich, England" B="Ipswich" C="Ipswich         ;England"
NOTE: The data set WORK.TEST has 1 observations and 3 variables.
Contents:
Alphabetic List of Variables and Attributes
#  Variable  Type  Len
1  A         Char  16
2  B         Char  16
3  C         Char  24
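To make the padding visible outside SAS, here is a rough Python model of the behaviour described above. It is only an illustration: `sas_char` is a hypothetical helper that assumes a SAS character variable truncates or blank-pads every value to its declared length, and that `||`/`!!` joins the stored values without trimming them.

```python
def sas_char(value: str, length: int) -> str:
    """Rough model of storing a value in a SAS character variable:
    truncate to `length`, or pad with trailing blanks up to `length`."""
    return value[:length].ljust(length)

A = sas_char("Ipswich, England", 16)  # length 16, taken from the literal
B = sas_char(A[0:7], len(A))          # SUBSTR result stored with A's length (16)
C = sas_char(B + ";" + "England", len(B) + 1 + 7)  # || keeps B's blanks; C is 24 long

print(C.replace(" ", "."))   # Ipswich.........;England
print(C.index(";") + 1)      # 17: the semicolon is the 17th byte

# Removing B's trailing blanks first, as TRIM(B)||';'||'England' would, closes the gap
C_trimmed = B.rstrip() + ";" + "England"
print(C_trimmed)             # Ipswich;England
```

The nine dots are the blanks that B carries because it is 16 bytes long while holding only 7 letters.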

1) Back in the day, not all keyboards had the pipe character.
2) More than one extra space.
data _null_;
  A='Ipswich, England';
  B=substr(A,1,7);
  C=B!!';'!!'England';
  l1=vlength(a);
  l2=vlength(b);
  l3=vlength(c);
  put _all_;
  put 'NOTE: **' c $varying. l3 '**';
run;
3) The length of B defaults to the length of the first argument of SUBSTR.
A=Ipswich, England B=Ipswich C=Ipswich         ;England l1=16 l2=16 l3=24 _ERROR_=0 _N_=1
NOTE: **Ipswich         ;England**

Related

How to create a string with values of previous row SAS

I have a table with a list of account numbers, buckets and month periods. I need to build a bucket string like the one below. Please help (Base SAS).
ACC Bucket Month bucketstring
123 0 jan18 0
123 1 feb18 10
123 2 mar18 210
345 0 feb18 0
345 1 mar18 10
The RETAIN statement keeps the value of a variable that is not read by SET from one iteration of the DATA step's implicit loop to the next.
This example works for ACC groups with up to 15 months (buckets 0..14, which fill exactly 20 characters). ACCs with more months will get a message written to the log.
data want;
  set have;
  by ACC;
  length bucketstring $20; * bucketstring might have to be made longer;
  retain bucketstring;
  if not first.ACC and length(bucketstring) + length(cats(Bucket)) > 20 then
    put 'ERROR: bucketstring has to be longer for the case of ' ACC= Month=;
  if first.ACC
    then bucketstring = cats(Bucket);               /* start a new string for each ACC */
    else bucketstring = cats(Bucket, bucketstring); /* prepend the current bucket */
run;
The CATS function concatenates its arguments. Each argument is stripped of leading and trailing blanks, and numeric arguments are automatically converted to character values where necessary.
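For comparison, here is the same running-string logic as a Python sketch (not SAS). It assumes the rows are already sorted by ACC, as the BY statement requires, and that each new bucket is prepended, which is what the expected 0, 10, 210 output in the question shows. The function name `add_bucket_strings` and the 20-character limit mirroring `$20` are illustrative choices.

```python
rows = [  # (ACC, Bucket, Month) from the question, already sorted by ACC
    (123, 0, "jan18"),
    (123, 1, "feb18"),
    (123, 2, "mar18"),
    (345, 0, "feb18"),
    (345, 1, "mar18"),
]

def add_bucket_strings(rows, max_len=20):
    out = []
    prev_acc = None
    bucketstring = ""  # carried from row to row, like a RETAINed variable
    for acc, bucket, month in rows:
        if acc != prev_acc:                      # like FIRST.ACC
            bucketstring = str(bucket)
        else:
            new = str(bucket) + bucketstring     # prepend the current bucket
            if len(new) > max_len:
                print(f"ERROR: bucketstring has to be longer for ACC={acc} Month={month}")
            bucketstring = new[:max_len]         # a $20 variable keeps only 20 chars
        out.append((acc, bucket, month, bucketstring))
        prev_acc = acc
    return out

for row in add_bucket_strings(rows):
    print(*row)
```

The `prev_acc` comparison stands in for FIRST.ACC, and the local `bucketstring` carries its value between rows the way RETAIN does.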

How to add a column of repeated numbers in SAS?

How to generate a repeating series of numbers in a column in SAS, from 1 to x?
Suppose x is 3.
The data look like this:
name age
A 15
D 16
C 21
B 35
E 79
F 85
G 64
and I want to add a column named list, like this:
name age list
A 15 1
D 16 2
C 21 3
B 35 1
E 79 2
F 85 3
G 64 1
data class;
set sashelp.class;
if list>=3 then list=0;
list+1;
run;
The easiest way I can think of is to use MOD and the iteration counter _N_.
data want;
set have;
list = 1 + mod(_N_ - 1,3);
run;
mod is the modulo function (gives the remainder after dividing).
So if you want that to vary based on some parameter, well, change the 3 to a parameter.
%let num_atwork = 2;
data want;
set have;
list = 1 + mod(_N_ - 1, &num_atwork.);
run;
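The MOD arithmetic can be checked quickly in Python. This sketch uses the sample data from the question, with `enumerate(..., start=1)` standing in for `_N_` and a hypothetical `x` parameter playing the role of `&num_atwork.`.

```python
people = [("A", 15), ("D", 16), ("C", 21), ("B", 35), ("E", 79), ("F", 85), ("G", 64)]

def add_list(rows, x=3):
    """Append list = 1 + mod(_N_ - 1, x), where _N_ is the 1-based row counter."""
    return [(*row, 1 + (n - 1) % x) for n, row in enumerate(rows, start=1)]

for name, age, lst in add_list(people):
    print(name, age, lst)
```

Subtracting 1 before the modulo and adding 1 afterwards is what turns the 0-based remainders 0, 1, 2 into the 1-based labels 1, 2, 3.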

Reshaping data from long to wide

Below is an example that I found to reshape data from long to wide. But I am not able to understand the code, especially the way they are replacing blanks and why. Can someone help me understand the code?
Example 1: Reshaping one variable
We will begin with a small data set with only one variable to be reshaped. We will use the variables year and faminc (for family income) to create three new variables: faminc96, faminc97 and faminc98. First, let's look at the data set and use proc print to display it.
DATA long ;
INPUT famid year faminc ;
CARDS ;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;
RUN ;
PROC PRINT DATA=long ;
RUN ;
Obs famid year faminc
1 1 96 40000
2 1 97 40500
3 1 98 41000
4 2 96 45000
5 2 97 45400
6 2 98 45800
7 3 96 75000
8 3 97 76000
9 3 98 77000
Now let's look at the program. The first step in the reshaping process is sorting the data (using proc sort) on an identification variable (famid) and saving the sorted data set (longsort). Next we write a data step to do the actual reshaping. We will explain each of the statements in the data step in order.
PROC SORT DATA=long OUT=longsort ;
BY famid ;
RUN ;
DATA wide1 ;
SET longsort ;
BY famid ;
KEEP famid faminc96 -faminc98 ;
RETAIN faminc96 - faminc98 ;
ARRAY afaminc(96:98) faminc96 - faminc98 ;
IF first.famid THEN
DO;
DO i = 96 to 98 ;
afaminc( i ) = . ;
END;
END;
afaminc( year ) = faminc ;
IF last.famid THEN OUTPUT ;
RUN;
This is a good example to compare and contrast with a DO UNTIL(LAST.FAMID) loop. That version does away with the RETAIN, the initialization to missing on FIRST.FAMID, and the LAST.FAMID test for when to OUTPUT. Those operations are still done, just by the built-in features of the DATA step loop.
DATA long;
INPUT famid year faminc;
CARDS;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;;;;
RUN;
proc print;
run;
data wide;
do until(last.famid);
set long;
by famid;
ARRAY afaminc[96:98] faminc96-faminc98;
afaminc[year]=faminc;
end;
drop year faminc;
run;
proc print;
run;
The main element here is the SAS retain statement.
The DATA step is executed once for every observation in the dataset. At the start of each iteration, variables that are not read with SET and not RETAINed are set to missing, and then the next observation is loaded from the dataset.
If a variable is RETAINed it will not be reset, but will keep the information from the last iteration.
BY famid ;
Your dataset is sorted and the DATA step uses a BY statement. This creates the automatic variables first.famid and last.famid. These are 0/1 flags that are 1 for the first/last observation of each ID group.
RETAIN faminc96 - faminc98 ;
As already explained, faminc96 - faminc98 will keep their values from one DATA step iteration to the next.
ARRAY afaminc(96:98) faminc96 - faminc98 ;
Just an array, so you can refer to the variables by index (96 to 98) instead of by name.
IF first.famid THEN
DO;
DO i = 96 to 98 ;
afaminc( i ) = . ;
END;
END;
For the first observation in each ID group, the retained variables are reset. Otherwise you would carry values from one ID group to the next. The same could be done with IF first.famid THEN CALL MISSING(of afaminc(*));
afaminc( year ) = faminc ;
Writes the income into the transposed variable for that year.
IF last.famid THEN OUTPUT ;
After you have written all the values to your new variables, you OUTPUT only one observation (the last) of each ID group to the new dataset. Because the variables were retained, they are all filled in at this point.
This DATA step is fast and purpose-built. In general, though, you could just use PROC TRANSPOSE.
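The FIRST./LAST. logic explained above can be sketched in Python to show what each piece does. This is an illustration of the idea, not a translation of PROC TRANSPOSE: `long_to_wide` is a hypothetical name, `None` plays the role of a SAS missing value, and the data are the nine rows from the example.

```python
long_data = [
    (1, 96, 40000), (1, 97, 40500), (1, 98, 41000),
    (2, 96, 45000), (2, 97, 45400), (2, 98, 45800),
    (3, 96, 75000), (3, 97, 76000), (3, 98, 77000),
]

def long_to_wide(rows, years=(96, 97, 98)):
    rows = sorted(rows, key=lambda r: r[0])  # PROC SORT ... BY famid (stable sort)
    wide = []
    afaminc = {}  # kept across rows, like the RETAINed faminc96-faminc98
    for i, (famid, year, faminc) in enumerate(rows):
        first = i == 0 or rows[i - 1][0] != famid              # FIRST.famid
        last = i == len(rows) - 1 or rows[i + 1][0] != famid   # LAST.famid
        if first:
            afaminc = {y: None for y in years}   # reset to missing
        afaminc[year] = faminc                   # afaminc(year) = faminc
        if last:
            wide.append((famid, *(afaminc[y] for y in years)))  # OUTPUT
    return wide

for row in long_to_wide(long_data):
    print(*row)
```

Calling it with `[(1, 96, 10), (2, 97, 20)]` returns `[(1, 10, None, None), (2, None, 20, None)]`; without the reset on FIRST.famid, family 2 would inherit family 1's value for year 96.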
I highly recommend proc transpose. It'll make your life easier.
http://support.sas.com/resources/papers/proceedings09/060-2009.pdf

Trailing zeros in string format in Stata

I am trying to convert a string variable A in Stata to a string variable B such that each observation has a fixed length. For example the string variable A is
85
01
3
and I want to convert it to another string variable B with trailing zeros in order to get length 5 for each observation
85000
01000
30000
I know that this code works for adding leading zeros:
gen B = string(real(A),"%05.0f")
How should it be modified to get trailing zeros?
The issue is that your new variable is not the old one with a new format, but different values altogether. One way is:
clear
set more off
*----- example data -----
input ///
str2 A
85
01
3
end
list
*------ what you want -----
// desired length of string
local len = 5
// factor
gen xdif = 10 ^ (`len' - length(A))
// new values with new format
gen B0 = real(A) * xdif
gen B = string(B0,"%0`len'.0f")
list
You can make that a one-liner, if you like.
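Here is the same arithmetic in Python, as a sketch for checking the logic rather than Stata code. It assumes every value of A is numeric and no longer than the target width; `pad_trailing_zeros_numeric` is a hypothetical name.

```python
def pad_trailing_zeros_numeric(a, width=5):
    """Mirror the Stata steps: scale by a power of ten, then zero-pad to `width` digits."""
    xdif = 10 ** (width - len(a))   # gen xdif = 10 ^ (`len' - length(A))
    b0 = int(a) * xdif              # gen B0 = real(A) * xdif
    return f"{b0:0{width}d}"        # gen B = string(B0, "%0`len'.0f")

for a in ["85", "01", "3"]:
    print(a, pad_trailing_zeros_numeric(a))
```

Note that "01" keeps its leading zero only because the zero-padded format turns the number 1000 back into five digits.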
An alternative approach, which works equally well for padding strings that contain non-numeric values:
clear
set more off
*----- example data -----
input ///
str3 A
85
01
3
XYZ
"D F"
end
list
*------ what you want -----
// desired length of string
local len 5
// desired trailing character
local pad 0
// new values
gen B0 = "`pad'"*`len'
gen B1 = A+B0
generate str`len' B3 = B1
// new values in a single command
generate str`len' B = A+"`pad'"*`len'
list
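The string-repetition approach translates directly into Python. This sketch, with a hypothetical `pad_trailing` helper, appends `width` copies of the pad character and keeps the first `width` characters, which is what declaring the new variable as str`len' accomplishes in the Stata code.

```python
def pad_trailing(a, width=5, pad="0"):
    """Append `width` pad characters, then keep the first `width` characters."""
    return (a + pad * width)[:width]   # like generate str5 B = A + "0"*5

for a in ["85", "01", "3", "XYZ", "D F"]:
    print(repr(a), "->", repr(pad_trailing(a)))
```

Because nothing is converted to a number, non-numeric values such as "XYZ" and "D F" are padded just like the digits.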

How do I assign numeric values to the alphabet in SAS

I'm trying to convert a character string to a numeric variable and then sum the values of each character to use as a unique identifier for that field.
So for example, I would like A=1, B=2, C=3.....X=24 Y=25 Z=26.
Say my string is "CAB". After running the code I would like an intermediate column of numbers, where the value for CAB is 3 1 2, and a result column derived by summing those numbers: 3+1+2 = 6. So the final value would be 6, and I would also like to see the intermediate column.
Here is the sas code I used to convert the characters to numbers, but I need help with the result column.
DATA CHAR_VALUE;
SET WORK.XYZ;
CHAR_2_NUM=TRANSLATE(MY_VAR_CHAR, '1 2 3 ...24 25 26', 'A B C ...X Y Z');
NUM_CHAR=INPUT(CHAR_2_NUM,32.);
RUN;
Thanks in advance...I appreciate any help or suggestions.
-rachel
RANK will give the ASCII numeric value underlying a character; so A=65, B=66, Z=90, a=97, z=122.
So this should work (if you want only the uppercase values - not a different value for a than A):
data test;
charval='CAB';
do _t=1 to length(Charval);
numval=sum(numval,rank(char(upcase(charval),_t))-64);
end;
put _all_;
run;
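The RANK arithmetic is easy to check in Python, where `ord()` returns a character's code point (65 for 'A', the same value RANK gives in an ASCII session). A minimal sketch, assuming the input contains only letters; `letter_sum` is a hypothetical name.

```python
def letter_sum(s: str) -> int:
    """Sum of letter values with A=1 ... Z=26, case-insensitive (letters only)."""
    # ord('A') is 65, so subtracting 64 maps A to 1, like RANK(...) - 64
    return sum(ord(ch) - 64 for ch in s.upper())

print(letter_sum("CAB"))  # 6
```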
Another option (based on the comments below) is to build an informat with the relationships between letter and value. My loop iterates over each character A to Z; you can then assign whatever value you want to each letter as the label (I just used 1, 2, 3, 4, ..., but changing the label= assignment will change that).
data fmts;
retain fmtname 'CHARNUM' type 'i';
do _t=65 to 90;
start=byte(_t); *the character, so byte(65)='A';
label=_t-64; *the resulting number;
output;
end;
run;
proc format cntlin=fmts;
quit;
data test;
charval='CAB';
do _t=1 to length(Charval);
numval=sum(numval,input(char(upcase(charval),_t),CHARNUM.));
end;
put _all_;
run;
Finally, if you want to be able to construct this in the same datastep, you could construct the relationships in a hash table and look up the result. I can explain that if desired, though I'd like to see a more detailed example of what you want to do in terms of defining the relationship between a letter and its code.
If you need to see the intermediate values, you can do that by inserting a CAT function in the loop; I recommend CATX:
data test;
charval='CAB';
format intermed $100.;
do _t=1 to length(Charval);
numval=sum(numval,input(char(upcase(charval),_t),CHARNUM.));
intermed=catx('|',intermed,input(char(upcase(charval),_t),CHARNUM.)); *or the RANK portion from earlier;
end;
put _all_;
run;
That would give you 3|1|2, which you could then do math on via SCAN:
do _t = 1 to countc(intermed,'|')+1;
numval2 = sum(numval2,scan(intermed,_t,'|'));
end;
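A Python sketch of the lookup-table idea combined with the intermediate string: the dictionary plays the role of the CHARNUM informat, `'|'.join` corresponds to CATX, and splitting on '|' corresponds to the SCAN loop. The names are hypothetical, and the input is assumed to contain only the letters A to Z.

```python
import string

# Built once, like the CHARNUM informat: 'A' -> 1, 'B' -> 2, ..., 'Z' -> 26
CHARNUM = {letter: i for i, letter in enumerate(string.ascii_uppercase, start=1)}

def intermediate_and_sum(s):
    """Return the '|'-separated codes (like CATX) and their total (like the SCAN loop)."""
    codes = [CHARNUM[ch] for ch in s.upper()]
    intermed = "|".join(str(code) for code in codes)
    total = sum(int(part) for part in intermed.split("|"))
    return intermed, total

print(intermediate_and_sum("CAB"))  # ('3|1|2', 6)
```

A lookup table makes it easy to change the value assigned to any letter without touching the loop.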
Your method to try and translate is a good attempt, but it will not really work. Here is a simple solution:
DATA CHAR_VALUE;
retain all_chars 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
set XYZ;
length CHAR_2_NUM $200;
CHAR_2_NUM = ' ';
NUM_CHAR = 0;
do i=1 to length(MY_VAR_CHAR);
if i=1 then CHAR_2_NUM = substr(MY_VAR_CHAR,i,1);
else CHAR_2_NUM = trim(CHAR_2_NUM) || ' ' || substr(MY_VAR_CHAR,i,1);
NUM_CHAR + index(all_chars,substr(MY_VAR_CHAR,i,1));
end;
drop i all_chars;
RUN;
This takes advantage of the fact that the indexed position of each character of your source variable in the all_chars variable corresponds to the mapping you desired.
UPDATED to also create your CHAR_2_NUM variable, which I overlooked in the original question.
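The INDEX-based approach maps onto Python's `str.find`, which returns -1 when a character is missing, so adding 1 reproduces INDEX's 1-based positions and its 0 for "not found". A sketch with a hypothetical `spell_and_sum` helper:

```python
ALL_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def spell_and_sum(s):
    """Return the characters separated by blanks and the sum of their alphabet positions."""
    spelled = " ".join(s)                              # 'CAB' -> 'C A B'
    total = sum(ALL_CHARS.find(ch) + 1 for ch in s)    # 1-based, 0 if not found
    return spelled, total

print(spell_and_sum("CAB"))  # ('C A B', 6)
```

The alphabet string has to list all 26 letters in order; any letter missing from it would silently contribute 0 to the sum.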
Another simple solution is based on the collate function:
To convert a variable called MyNumbers (in the range of 1 to 26) to English upper-case characters, one can use:
collate(64 + MyNumbers, 64 + MyNumbers)
To obtain lower-case characters, one can use:
collate(96 + MyNumbers, 96 + MyNumbers)
Here's a quick example:
data _null_;
do MyNumbers = 1 to 26;
MyLettersUpper = collate(64 + MyNumbers, 64 + MyNumbers);
MyLettersLower = collate(96 + MyNumbers, 96 + MyNumbers);
put MyNumbers MyLettersUpper MyLettersLower;
end;
run;
1 A a
2 B b
3 C c
4 D d
5 E e
6 F f
7 G g
8 H h
9 I i
10 J j
11 K k
12 L l
13 M m
14 N n
15 O o
16 P p
17 Q q
18 R r
19 S s
20 T t
21 U u
22 V v
23 W w
24 X x
25 Y y
26 Z z
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
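In Python, `chr()` does the job of COLLATE(n, n) for ASCII codes. This sketch, assuming plain ASCII, reproduces the table above; `number_to_letters` is a hypothetical name.

```python
def number_to_letters(n):
    """Map 1..26 to ('A', 'a') .. ('Z', 'z') using ASCII code points."""
    return chr(64 + n), chr(96 + n)

for n in range(1, 27):
    upper, lower = number_to_letters(n)
    print(n, upper, lower)
```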