SAS cutting off string during data recode

On a dataset that is created by:
data voa;
input Address $50.;
input City $1-15 State $16-17 Zip;
input Latitude Longitude;
datalines;
1675 C Street, Suite 201
Anchorage AK 99501
61.21 -149.89
600 Azalea Road
Mobile AL 36609
30.65 -88.15
;
I'm attempting to add a new variable that is essentially a recoding of Longitude and Latitude, like so:
data voa1;
set voa;
if Longitude < -110 then Region = "West";
if Latitude > 40 and Longitude < -90 and Longitude > -110 then Region = "Mid-West";
if Latitude > 40 and Longitude > -90 then Region = "North-East";
if Latitude < 40 and Longitude < -110 then Region = "South";
run;
Unfortunately, it seems that SAS is cutting the strings short and leaving them at 4 characters (e.g. "Mid-West" just becomes "Mid-"). If I had to guess I would assume that this is because SAS assigns a certain number of bytes for each value in a column based on the first value in that column, and doesn't dynamically modify the number of bytes based on new values. How do I fix this?
Note: I think a potential fix might be putting the longest potential output (in this case "North-East") first, but this seems like an inelegant solution.

One of the nice features of SAS is that you are not forced to define your variables before using them. But if you don't define the variable then SAS must make a guess at what you meant by the code that you write. In your case since the first reference to the new variable Region is in the assignment statement:
Region = "West"
SAS makes the logical decision to define it as a character variable of length 4.
To fix that just add a LENGTH statement before the first IF statement.
length region $10;
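The effect can be reproduced in a few lines. This is a minimal sketch, not the asker's full step: the variable names Truncated and Fixed are made up for illustration, and the only point is that the first assignment fixes a character variable's length unless a LENGTH statement comes first.

```sas
data _null_;
  length Fixed $10;          /* declared before the first assignment */
  Truncated = "West";        /* first reference: SAS fixes the length at 4 */
  Truncated = "Mid-West";    /* silently cut to 4 characters */
  Fixed = "Mid-West";        /* fits, because the length is 10 */
  put Truncated= Fixed=;     /* Truncated=Mid- Fixed=Mid-West */
run;
```

The length only needs to be at least as long as the longest value you will assign ("North-East" is 10 characters).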

Related

Moving variable to next column in SAS

I have the following table in SAS:
SAS Table: Price
ID  Description  Price  Discount
20  Hot blue     warm   12.0
21  Durable A    15.0   0
22  Flexible     13.5   0
23  Bendable     and A  12.3
I want to move 'warm' and 'and A' from the Price column into the Description column, and move '12.0' and '12.3' into Price. How should I do this?
You cannot change the type of an existing variable, but you can change the name.
Use the INPUT() function to convert strings to numbers. You can use the ?? modifier to suppress errors generated by strings that do not represent numbers.
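data want;
set price(rename=(price=char_price));
* ?? suppresses the errors for strings that are not numbers;
price = input(char_price, ?? 32.);
if missing(price) then do;
description = catx(' ', description, char_price);
* assumption: Discount is numeric and holds the displaced price;
price = discount;
end;
run;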
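To see what the ?? modifier does in isolation, here is a small sketch that feeds INPUT() one numeric-looking string and one that is not. The two test values are taken from the table in the question; the loop itself is only for illustration.

```sas
data _null_;
  length char_price $8;
  do char_price = '12.0', 'warm';
    /* ?? returns missing for bad strings without writing an error to the log */
    price = input(char_price, ?? 32.);
    put char_price= price=;
  end;
run;
```

The first pass should give a numeric 12 and the second a missing value (.), which is what the MISSING() test in the answer relies on.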

How to read instream data

I know that this is a very basic question, but I really am having trouble.
I wrote the following code and all I want to do is to read the data correctly, but I cannot find a way to instruct SAS to read the BMI.
There are two things I want to do.
1), Have SAS store the entire number including all the decimals.
2), when printed, I would like to approximate the number to two decimal points.
data HW02EX01;
input Patient 1-2 Weight 5-7 Height 9-10 Age 13-14 BMI 17-26 Smoking $ 29-40 Asthma $ 45-48;
cards;
14 167 70 65 23.9593878 never no
run;
Note: I left only the first observation since the display becomes really ugly and wearisome to edit by hand.
Maybe the following code will be useful:
data HW02EX01_;
input Patient Weight Height Age BMI Smoking : $20. Asthma : $10.;
format BMI 32.2;
cards;
14 167 70 65 23.9593878 never no
;
A few remarks:
If your input data is laid out in fixed columns, use column input, as in the INPUT statement you proposed. If not, use list input (assuming there is a delimiter between the values).
With list input, SAS reads the whole numeric value, including every decimal digit present in the data, so you do not have to worry about losing decimals.
To display a numeric value the way you like, use a FORMAT statement to assign a representation to it. For two decimals use a w.d format, where w is the total width of the displayed value and d is the number of decimals to show. A format does not change the value of the variable, only its presentation.
When using the CARDS statement, a RUN statement is not necessary.
I hope this is useful.
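The difference between formatting and changing a value can be checked directly. This is a minimal sketch using the BMI value from the question; the variable name rounded is hypothetical, and ROUND() is shown only as a contrast to a format.

```sas
data _null_;
  bmi = 23.9593878;
  rounded = round(bmi, 0.01);   /* this really changes the stored value */
  put bmi= 8.2;                 /* displayed with two decimals only */
  put bmi= best12.;             /* the stored value still has all its decimals */
  put rounded= best12.;         /* the rounded copy has lost them */
run;
```

For printing, a FORMAT statement is usually what you want, since the full precision stays available for later calculations.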
See the comments in the code, specifically the usage of LENGTH, FORMAT and INFORMAT statements to control the input and output appearance of data.
data HW02EX01;
*specify the length of the variables;
length patient $8 weight height age bmi 8 smoking asthma $8;
*specify the informats of the variables;
*an informat describes what the value looks like when it is read in;
informat patient $8. weight height age bmi best32.;
*formats control how information is displayed in output/tables;
format bmi 32.2 weight height age 12.;
input Patient $ Weight Height Age BMI Smoking Asthma ;
cards;
14 167 70 65 23.9593878 never no
;
run;

Base SAS 9.2-- SUBSTR function to classify Zip Codes into regions

I have a variable full of ZIP code observations and I want to sort those ZIP codes into four regions based on the first three digits of the code.
For example, all ZIP codes that start with 350, 351, or 352 should be grouped into a region called "central." Those that start with 362, 368, 360 or 361 should be in a region called "east." Etc.
How do I get base SAS to look at only the first three digits of the ZIP code variable?
What is the best way to associate those digits with a new variable called "region?"
Here's the code I have so far:
data work.temp;
set library.dataset;
a= substr (Zip_Code,1,3);
put a;
keep Zip_Code a;
run;
proc print data=work.temp;
run;
The column a is blank in my proc print results, however.
Thanks for your help
As Joe explains, this happens because Zip_Code is defined as a numeric variable. I have seen a ZIP code defined as numeric at a client site, and it led to various data issues. You should define Zip_Code as a character variable; you can then assign regions with IF statements, with a reference table, or with PROC FORMAT. Below are examples of the IF statement and reference table approaches. I find the reference table method very robust.
data have;
input zip_code $;
datalines;
35099
35167
35245
36278
36899
36167
;
By if statement
data work.temp;
set have;
if substr(Zip_Code,1,3) in ('350', '351', '352') then Region = 'EAST';
else if substr(Zip_Code,1,3) in ('362', '368', '361') then Region = 'WEST';
run;
By use of reference table
data reference;
input code $ Region $;
datalines;
350 EAST
351 EAST
352 EAST
362 WEST
368 WEST
361 WEST
;
proc sql;
select a.*, b.region from have a
left join
reference b
on substr(a.Zip_Code,1,3) = b.code;
quit;
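The PROC FORMAT route mentioned in this answer is not shown, so here is a rough sketch of how it might look. The format name $zipreg, the 'UNKNOWN' fallback and the 99999 test row are assumptions made for illustration; the prefix-to-region mapping is copied from the examples above.

```sas
data have;
  input zip_code $;
  datalines;
35099
36278
99999
;

/* map the three-digit prefix to a region label */
proc format;
  value $zipreg
    '350', '351', '352' = 'EAST'
    '362', '368', '361' = 'WEST'
    other               = 'UNKNOWN';
run;

data want;
  set have;
  length Region $7;   /* long enough for 'UNKNOWN' */
  Region = put(substr(zip_code, 1, 3), $zipreg.);
run;
```

Like the reference table, this keeps the mapping in one place, and it avoids a join when the lookup is small.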
If a is blank, then your zip_code variable is almost certainly numeric. You probably have a note about numeric to character conversion.
SAS will happily let you mix numeric and character values in most instances, but it won't always give correct behavior. In this case, it's probably converting the number with the BEST12. format, which right-aligns it in 12 characters, so 60601 becomes "       60601" (seven leading blanks). So substr(that,1,3) gives "   " (three blanks), of course.
Zip code ideally would be stored in a character variable as it's an identifier, but if it's not for whatever reason, you can do this:
a = substr(put(zip_code,z5.),1,3);
The Zw.d format is correct since you want Massachusetts to be "02101" and not "2101 ".
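A short sketch comparing the two conversions side by side, assuming a numeric zip_code holding the Massachusetts example above. The variable names implicit and explicit are made up for illustration.

```sas
data _null_;
  zip_code = 2101;                               /* numeric ZIP; the leading zero is not stored */
  implicit = substr(zip_code, 1, 3);             /* automatic BEST12. conversion: three blanks */
  explicit = substr(put(zip_code, z5.), 1, 3);   /* zero-padded to 5 digits first: 021 */
  put implicit= explicit=;
run;
```

The implicit version also writes the numeric-to-character conversion note to the log, which is a useful warning sign to look for.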

Adding a new Column to existing SAS dataset

I have a SAS dataset that I have created by reading in a .txt file. It has about 20-25 rows and I'd like to add a new column that assigns an alphabet in serial order to each row.
Row 1 A
Row 2 B
Row 3 C
.......
It sounds like a really basic question and one that should have an easy solution, but unfortunately, I'm unable to find this anywhere. I get solutions for adding new calculated columns and so on, but in my case, I just want to add a new column to my existing datatable - there is no other relation between the variables.
This is kind of ugly, and if you have more than 26 rows it will run past 'Z' into other ASCII characters (such as '['). But it does solve the problem as defined by the question.
Test data:
data have;
do row = 1 to 26;
output;
end;
run;
Explanation:
On my computer, the letter 'A' is at position 65 in the ASCII table (YMMV). We can determine this by using this code:
data _null_;
pos = rank('A');
put pos=;
run;
The ASCII table positions the letters of the alphabet sequentially, so if A is at position 65, B will be at position 66, and so on.
The byte() function returns a character from the ASCII table at a certain position. We can take advantage of this by using the position of ASCII character A as an offset, subtracting 1, then adding the row number (_n_) to it.
Final Solution:
data want;
set have;
alphabet = byte(rank('A')-1 + _n_);
run;
Not better than Tom's answer, but essentially a brute-force alternative: create a string containing the alphabet, then use CHAR() to pick out the character at position _n_.
data want;
set sashelp.class;
retain string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
letter = char(string, _n_);
run;

SAS date or numeric data?

%let months_back = %sysget(months_back);
data;
m = intnx('month', "&sysdate9"d, -&months_back - 2, 'begin');
m = intnx('day', put(m, date9.), 26, 'same');
m2back = put(m, yymmddd10.);
put m2back;
run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      5:19
NOTE: Invalid numeric data, '01OCT2012' , at line 5 column 19.
I really don't know why this goes wrong. Is the date string supposed to be numeric data?
PUT(m, date9.) is the culprit here. The 2nd argument of INTNX needs to be numeric (i.e. a date), but the PUT function always returns a character value, in this instance '01OCT2012'. Just take out the PUT function completely and the code should work.
m = intnx('day', m, 26, 'same');
SAS stores dates as numbers - and in fact does not have a truly separate type for them. A SAS date is the number of days since 1/1/1960, so a bit over 19000 for today. The date format is entirely irrelevant to any date calculations - it is solely for human readability.
The bit where you say:
"&sysdate9"d
actually converts the string "01JAN2012" to a numeric value (18993).
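A quick way to convince yourself that a date is just a number is to print the same value with and without a format. This is a minimal sketch; the epoch date is chosen only because its numeric value is easy to check, and next_day is a hypothetical variable name.

```sas
data _null_;
  d = '01JAN1960'd;        /* date literal for the SAS epoch */
  put d=;                  /* d=0 */
  put d= date9.;           /* d=01JAN1960 */
  next_day = d + 1;        /* plain arithmetic on the day count */
  put next_day= date9.;    /* next_day=02JAN1960 */
run;
```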
There's actually a quicker way to accomplish what you're trying to do. Because days correspond to whole numbers in SAS, to increment by one day you can simply add one to the value.
For example:
%let months_back=5;
data _null_;
m = intnx('month', today(), -&months_back - 2, 'begin');
m2 = intnx('day', m, 26, 'same');
m3 = intnx('month',"&sysdate9"d, -&months_back - 2)+26;
m2back = put(m2, yymmdd10.);
put m= date9. m2= yymmdd10. m3= yymmdd10.;
run;
M3 does your entire calculation in one step, by using the MONTH interval, then adding 26. INTNX('day'...) is basically pointless, unless there's some other value to using the function (using a shift index for example).
You can also see formats used directly in the PUT statement that writes to the log: you don't have to convert to a character value with the PUT function and then write that to the log to get the formatted value. Just write put var format.; and string together as many variable/format pairs as you want that way.
Also, "&sysdate9."d is not the best way to get the current date. &sysdate. is only defined on startup of SAS, so if your session ran for 3 days you would not be on the current day (though perhaps that's desired?). Instead, the TODAY() function gets the current date, up to date no matter how long your SAS session has been running.
Finally - I recommend data _null_; if you don't want a dataset (and naming the result dataset if you do want it). data _null_ does not create a dataset. data; simply creates increasing numbers of datasets (data1, data2, ...) which quickly fill up your workspace and make it hard to tell what you're doing.