adding space to character variables - sas

When I run the following code, the number in my character variable gets shifted into the next variable:
data test;
input names$ score1 score2;
cards;
A1 80 95
A 2 80 95
;
run;
proc print data=test;
run;
leading to an output like the following:
The SAS System
Obs names score1 score2
1 A1 80 95
2 A 2 80
How do I create a variable like "A 2", with a space, so that the 2 doesn't get shifted?

Your problem is that you're using space-delimited (list) input. Is the data truly space delimited, though, or is it columnar (fixed position)? If it is columnar, use column input:
data test;
input names $ 1-4 score1 5-12 score2 13-20;
cards;
A1  80      95
A 2 80      95
;
run;
If it's truly delimited and you're just not replicating the data exactly here, you have a few choices. You can use the & modifier to tell SAS to treat two or more consecutive spaces as the delimiter, so that a single space can be part of the value. Your data as posted doesn't have that double space, but it would look like this:
data test;
input names &$ score1 score2;
cards;
A1  80 95
A 2  80 95
;
run;
Or, if some single spaces are delimiters and others are part of the value, you'll have to work out some logic to tell them apart. The exact logic depends on your rules, but here's an example: I look for a space in position 2 and assume that, if it is there, exactly one more character of the name follows it. I then shift everything after position 3 down by one so that the name is guaranteed to be followed by a double space. This is probably not a good rule for you, but it shows the kind of thing you might do.
data test;
input @;
if substr(_infile_,2,1)=' ' then do; *if there is a space at spot two specifically;
_infile_ = substr(_infile_,1,3)||' '||substr(_infile_,4); *shift everything after 3 down;
end;
input names &$ score1 score2;
cards;
A1  80 95
A 2 80 95
;
run;

If your input is fixed block, as suggested, and the NAMES field is 12 bytes wide, as the data suggests, then you can use formatted input for NAMES.
data test;
length names $ 12 score1 score2 8;
input names $12. score1 score2;
names=trim(left(names));
cards;
A1          80 95
A 2         80 95
;
run;


Complete columns based on values that precede data as table format in SAS

How should the code be completed to make this work?
Code:
data ms;
infile 'C';
input cr ls ms color $;
if input @; *statement that reads the line with one word and completes the color column;
run;
Input:
Blars
10 83287 10.00
20 1748956 30.00
30 2222222 73.00
40 833709 90.00
Klirs
10 922222 90.50
20 1222222 10.00
30 1111111 93.33
40 8998877 300.90
Expected output:
cr ls ms color
10 83287 10.00 Blars
20 1748956 30.00 Blars
30 2222222 50.00 Blars
40 833709 73.00 Blars
10 922222 90.50 Klirs
20 1222222 10.00 Klirs
30 1111111 93.33 Klirs
40 8998877 300.90 Klirs
Attempted to read it
Just RETAIN the extra variable. You need some way to detect which type of line you currently are reading. When it has the COLOR just update the COLOR variable and do not write out an observation. When it has the actual data then read all of the fields and write an observation.
data ms;
infile 'C' truncover ;
length color $10 cr ls ms 8;
retain color;
input cr ?? @ ;
if missing(cr) then do;
color = _infile_;
delete;
end;
input ls ms ;
run;
Make sure to define the COLOR column long enough to store the longest value. This assumes there are no blank lines, as you mentioned in your comment on the original question.
A slightly different method from the other solution:
Use INPUT @@ to read the full line and hold it in the automatic variable _infile_.
Check the _infile_ variable to see if it contains any digits; if so, process the line as data.
Otherwise, process it as a colour.
data have;
infile cards truncover;
*set length and retain color across rows;
length color $10 cr ls ms 8;
retain color;
*read in string;
input @@;
*check for any digits in string, if any are found, process as data;
if anydigit(_infile_) then do;
input cr ls ms;
output;
end;
*otherwise read in as color;
else input color $;
cards;
Blars
10 83287 10.00
20 1748956 30.00
30 2222222 73.00
40 833709 90.00
Klirs
10 922222 90.50
20 1222222 10.00
30 1111111 93.33
40 8998877 300.90
;;;;
run;
Richard, your code could even be more succinct.
* attempt to read first 2 chars as number;
* ?? suppresses errors;
input num ?? 2. @;
if missing(num) then
input @1 color $;
else do;
input @1 cr ls ms;
output;
end;
You can scan a held generic input line and then choose which input statement you want based on the scan.
data want;
length color $20 cr ls ms 8;
retain color;
infile 'c' missover;
input @;
if missing(input(scan(_infile_,1),??best12.)) then
input @1 color ;
else
input @1 cr ls ms ;
if not missing(cr);
run;

Prevent SAS to automatically remove trailing blank in string

I have a sample data set like below.
data d01;
infile datalines dlm='#';
input Name & $15. IdNumber & $4. Salary & $5. Site & $3.;
datalines;
アイ# 2355# 21163# BR1
アイウエオ# 5889# 20976# BR1
カキクケ# 3878# 19571# BR2
;
data _null_ ;
set d01 ;
file "/folders/myfolders/test.csv" lrecl=1000 ;
length filler $3;
filler = ' ';
w_out = ksubstr(Name, 1, 5) || IdNumber || Salary || Site || filler;
put w_out;
run ;
I want to export this data set to a CSV file (in fixed-width format) where every line has a length of 20 bytes (20 one-byte characters).
But SAS automatically removes my trailing spaces, so each line ends up 17 bytes long (the filler is truncated).
I know I can insert the filler like this:
put w_out filler $3.;
But this won't work when the 'site' column is empty: SAS truncates that column too, and the lines still aren't 20 bytes long.
I didn't quite understand what you are trying to do with ksubstr, but if you want to add padding to get the total length to 20 characters, you may have to write some extra logic:
data _null_ ;
set d01 ;
file "/folders/myfolders/test.csv" lrecl=1000 ;
length filler $20;
w_out = ksubstr(Name,1,5) || IdNumber || Salary || Site;
len = 20 - klength(w_out) - 1;
put w_out @;
if len > 0 then do;
filler = repeat(" ", len);
put filler $varying20. len;
end;
else put;
run ;
You probably do not want to write a fixed column file using a multi-byte character set. Instead, see if you can adjust your process to use a delimited file, like you did in your example input data.
If you want the PUT statement to write a specific number of bytes, just use a formatted PUT statement. To have the number of bytes written vary based on the string's value, you can use the $VARYING format. The syntax when using $VARYING is slightly different from that of normal formats: you add a second variable reference after the format specification that contains the actual number of bytes to write.
You can use the LENGTH() function to calculate how many bytes your name values take. Since it normally ignores trailing spaces, just append another character to the end and subtract one from the overall length.
To pad the end with three blanks you could just add three to the width used in the format for the last variable.
data d01;
infile datalines dlm='#';
length Name $15 IdNumber $4 Salary $5 Site $3 ;
input Name -- Site;
datalines;
アイ# 2355# 21163# BR1
アイウエオ# 5889# 20976# BR1
カキクケ# 3878# 19571# BR2
Sam#1#2#3
;
filename out temp;
data _null_;
set d01;
file out;
nbytes=length(ksubstr(name,1,5)||'#')-1;
put name $varying15. nbytes IdNumber $4. Salary $5. Site $6. ;
run;
Results:
67 data _null_ ;
68 infile out;
69 input ;
70 list;
71 run;
NOTE: The infile OUT is:
Filename=...\#LN00059,
RECFM=V,LRECL=32767,File Size (bytes)=110,
Last Modified=15Aug2019:09:01:44,
Create Time=15Aug2019:09:01:44
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 アイ 235521163BR1 24
2 アイウエオ588920976BR1 30
3 カキクケ 387819571BR2 28
4 Sam 1 2 3 20
NOTE: 4 records were read from the infile OUT.
The minimum record length was 20.
The maximum record length was 30.
By default SAS sets the NOPAD option on a FILE statement, and it writes each line in variable-length format, which means line lengths vary according to the data written. To ask SAS explicitly to pad your records with spaces, don't use a filler variable. Instead:
Set the LRECL to the width of file you need (20).
Set the PAD option, or set RECFM=F.
Sample code:
data _null_ ;
set d01 ;
file "/folders/myfolders/test.csv" lrecl=20 PAD;
w_out = Name || IdNumber || Salary || Site;
put w_out;
run ;
More info here: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000171874.htm#a000220987

Transform numbers with 0 values at the beginning

I have the following dataset:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
;
PROC PRINT; RUN;
I want to link this data to another table but the thing is that the numbers in the other table are stored in the following format: 0012, 0023, 0023.
So I am looking for a way to do the following:
Check how long the number is
If length = 1, add 3 0 values to the beginning
If length = 2, add 2 0 values to the beginning
Any thoughts on how I can get this working?
Numbers are numbers so if the other table has the field as a number then you don't need to do anything. 13 = 0013 = 13.00 = ....
If the other table actually has a character variable then you need to convert one or the other.
char_number = put(number, Z4.);
number = input(char_number, 4.);
You can use z#. formats to accomplish this:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
9999 999
8888 8
;
data survey2;
set survey;
number_long = put(number, z4.);
run;
If number is stored as a character variable and you need it to be four characters long, convert it to numeric first and then apply the format:
want = put(input(number,best32.),z4.);
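To show how the converted value could be used for the link itself, here is a minimal sketch. It assumes the other table is called other and holds a zero-padded character key named code plus a column named descr; all three names are made up for illustration and are not from the question. It also assumes the survey data set from the examples above already exists.

```sas
/* Hypothetical lookup table: CODE is a zero-padded character key. */
data other;
  input code $ descr $;
  datalines;
0012 low
0023 high
;
run;

/* Join on the numeric value rendered as a 4-character, zero-padded string. */
proc sql;
  create table linked as
  select s.zip_code, s.number, o.descr
  from survey as s
  left join other as o
    on put(s.number, z4.) = o.code;
quit;
```

Converting inside the ON clause avoids adding a permanent extra column to survey; if the join is run often, storing the character key once (as in survey2 above) may be the simpler choice.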

Reshaping data from long to wide

Below is an example that I found to reshape data from long to wide. But I am not able to understand the code, especially the way they are replacing blanks and why. Can someone help me understand the code?
Example 1: Reshaping one variable
We will begin with a small data set with only one variable to be reshaped. We will use the variables year and faminc (for family income) to create three new variables: faminc96, faminc97 and faminc98. First, let's look at the data set and use proc print to display it.
DATA long ;
INPUT famid year faminc ;
CARDS ;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;
RUN ;
PROC PRINT DATA=long ;
RUN ;
Obs famid year faminc
1 1 96 40000
2 1 97 40500
3 1 98 41000
4 2 96 45000
5 2 97 45400
6 2 98 45800
7 3 96 75000
8 3 97 76000
9 3 98 77000
Now let's look at the program. The first step in the reshaping process is sorting the data (using proc sort) on an identification variable (famid) and saving the sorted data set (longsort). Next we write a data step to do the actual reshaping. We will explain each of the statements in the data step in order.
PROC SORT DATA=long OUT=longsort ;
BY famid ;
RUN ;
DATA wide1 ;
SET longsort ;
BY famid ;
KEEP famid faminc96 -faminc98 ;
RETAIN faminc96 - faminc98 ;
ARRAY afaminc(96:98) faminc96 - faminc98 ;
IF first.famid THEN
DO;
DO i = 96 to 98 ;
afaminc( i ) = . ;
END;
END;
afaminc( year ) = faminc ;
IF last.famid THEN OUTPUT ;
RUN;
This is a good example to compare and contrast with a DO UNTIL(LAST.FAMID) loop. It does away with the RETAIN, the initialization to missing on FIRST.FAMID, and the LAST.FAMID test for when to OUTPUT. Those operations are still done, just using the built-in features of the data step loop.
DATA long;
INPUT famid year faminc;
CARDS;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;;;;
RUN;
proc print;
run;
data wide;
do until(last.famid);
set long;
by famid;
ARRAY afaminc[96:98] faminc96-faminc98;
afaminc[year]=faminc;
end;
drop year faminc;
run;
proc print;
run;
The main element here is the SAS RETAIN statement.
The data step is executed once for every observation in the dataset. At the start of each iteration, variables created in the data step (such as faminc96 - faminc98) are reset to missing, and then the next observation is loaded from the dataset.
If a variable is RETAINed it is not reset, but keeps its value from the previous iteration.
BY famid ;
Your dataset is sorted and the data step uses a BY statement. This creates first.famid and last.famid, which are binary flags set to 1 for the first/last observation of each id-group.
RETAIN faminc96 - faminc98 ;
As already explained faminc96 - faminc98 will keep their value from one datastep iteration to the next.
ARRAY afaminc(96:98) faminc96 - faminc98 ;
Just an array, so you can call the variables by number instead of name.
IF first.famid THEN
DO;
DO i = 96 to 98 ;
afaminc( i ) = . ;
END;
END;
For the first observation in every id-group the retained variables are reset. Otherwise you would carry values over from one id-group to the next. The same could be done with IF first.famid then call missing(of afaminc(*));
afaminc( year ) = faminc ;
Writing the information to your transposed variables, according to the year.
IF last.famid THEN OUTPUT ;
After you have written all the values to your new variables, you only OUTPUT one observation (the last) in every id-group to the new dataset. As the variables were retained, they are all filled at this point.
This data step is fast and purpose-built, but in general you could just use proc transpose.
I highly recommend proc transpose. It'll make your life easier.
http://support.sas.com/resources/papers/proceedings09/060-2009.pdf
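For comparison, the PROC TRANSPOSE approach recommended above might look like the sketch below. It assumes the long data set from the question, already sorted by famid (as the sample data is); the output name wide_t is just a placeholder.

```sas
/* One output row per famid; YEAR supplies the suffix of each new column. */
proc transpose data=long out=wide_t(drop=_name_) prefix=faminc;
  by famid;    /* requires the input to be sorted by famid */
  id year;     /* values 96, 97, 98 become faminc96, faminc97, faminc98 */
  var faminc;  /* the variable whose values fill the new columns */
run;

proc print data=wide_t;
run;
```

Unlike the array version, this does not need the range of years to be known in advance, since the column names are built from whatever values of year appear in the data.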

Beginner. Reading data in SAS (Reading date and 100 score issue)

The problem says: the first line is a header line and should not be read (use the INFILE option FIRSTOBS=2). The remaining lines contain an ID number (character), gender (character), date of birth (DOB), and two scores, 1 and 2. Note that there are some missing values for the scores, and you want to be sure that SAS does not go to a new line to read these values. Write a SAS DATA step to read DOB with DATE9. Here are the lines of data (I put them in my code to save space).
DATA READ;
INFILE DATALINES FIRSTOBS=2;
INPUT ID 1-3
GENDER $ 5
@7 DOB mmddyy10.
@ SCORE1 3
@ SCORE2 3
;
DATALINES;
***Header line: ID GENDER DOB SCORE1 SCORE2
001 M 10/10/1976 1OO 99
002 F 01/01/1960 89
003 M 05/07/2001 90 98
;
DATA PROB12_8;
SET READ;
FORMAT DOB MMDDYY9.;
RUN;
PROC PRINT DATA=PROB12_8;
RUN;
My output is:
OBS ID GENDER DOB SCORE1 SCORE2
1 1 M . . 99
2 2 F . 89 .
3 3 M . 90 98
I don't understand why the program reads the data that way when I specify the column widths and use the pointer in my program.
Thanks for your help.
Your problems start at SCORE1 and SCORE2: you have the pointer control specified incorrectly. Also notice that 1OO (typed with the letter O) is not 100. This file can be read easily with list input and the MISSOVER option on the INFILE statement.
DATA READ;
INFILE DATALINES FIRSTOBS=2 missover;
informat id $3. gender $1. dob mmddyy10.;
input ID GENDER DOB SCORE1 SCORE2;
format dob mmddyy10.;
datalines;
***Header line: ID GENDER DOB SCORE1 SCORE2
001 M 10/10/1976 1OO 99
002 F 01/01/1960 89
003 M 05/07/2001 90 98
;;;;
run;