I want to convert some variables to character with a defined length and some to numeric with a defined length in a DATA step.
How can I achieve this?
Thanks in advance.
For character variables it is working fine.
data Clean_data;
format
country $30.
Region $30.
Days_on_site (Numeric value)
AOD (Numeric value)
;
set raw_data;
run;
All SAS numeric values in a running DATA step are 8-byte IEEE floating-point. See Numerical Accuracy in SAS Software. The number of bytes used to store a numeric variable in a data set can be specified to be anywhere from 3 (or 2) to 8, depending on operating system and hardware. When the length is < 8, a value can lose precision when it is stored, so the value read back into a DATA step may differ from the original.
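As a rough illustration of that loss of precision, the sketch below stores the same value at two lengths and reads both back. The data set and variable names are made up for the example. The SAS documentation lists 8,192 as the largest integer stored exactly in 3 bytes on Windows and UNIX, so 8193 is not expected to survive the round trip in the 3-byte variable.

```sas
/* Hypothetical demo: store the same value with two numeric lengths */
data work.len_demo;
  length full 8 short 3;   /* 3 is the minimum length on Windows and UNIX */
  full  = 8193;
  short = 8193;
run;

/* Read the stored values back and print them to the log */
data _null_;
  set work.len_demo;
  put full= short=;
run;
```

Comparing the two values in the log shows whether the shorter length was enough for the data.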
SAS numeric variables can also have an associated format that is used when the value has to be rendered (or presented) for output, generally for human readability. Output can be almost anything: Excel, PDF, RTF, ViewTable viewer, grid viewer in Enterprise Guide, grid viewer in SAS Studio, etc. This concept may align with your question of "convert some variable into char".
There are many numeric formats. Based on your variable name Days_on_site I would presume you want to format a numeric variable that contains only integer values. A numeric value can be formatted to show as the nearest integer using
format Days_on_site 4.;
If the actual value is > 9999, it will not fit in 4 characters, and it will be shown in an abbreviated form (for example, scientific notation) or as **** when viewed.
One of the commonly used numeric formats is w.d
From the documentation:
w specifies the width of the output field
d specifies the number of digits to the right of the
decimal point in the numeric value. This argument is optional.
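Putting this together for the data set in the question, a minimal sketch might look like the following. It assumes raw_data already contains all four variables with the types shown. The 8.2 format for AOD is only a placeholder, since the question does not say how AOD should be displayed.

```sas
data clean_data;
  /* LENGTH before SET fixes the storage length of the character variables */
  length country region $30;
  set raw_data;
  /* Formats only change how the numbers are displayed;
     the stored values remain floating-point numbers */
  format Days_on_site 4. AOD 8.2;   /* 8.2 is an assumed display format */
run;
```

If country or region is longer than 30 in raw_data, SAS may warn about multiple lengths and truncate the values, so check the log.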
Suppose I have in SAS someTable with a column someColumn of type Character.
I can adjust length, format, informat and label in the following way:
ALTER TABLE WORK.someTable
MODIFY someColumn char(8) format=$CHAR6. informat=$CHAR6. label='abcdef'
But I doubt if this is the correct way for the following reasons:
It seems pointless that the syntax requires the type char because column type can't be changed with a MODIFY statement.
This code does not work if someColumn is of type Numeric or Date.
The syntax for changing length is inconsistent with the syntax for changing format/informat/label.
Actually, I expected the following code to work:
ALTER TABLE WORK.someTable
MODIFY someColumn length=8 format=$CHAR6. informat=$CHAR6. label='someLabel'
This code runs without errors but does not change the length.
Question:
What is the correct syntax to modify the length of a column using ALTER TABLE / MODIFY?
(For arbitrary column type like character/numeric/date.)
The syntax for defining the altered variable ("column") is the same as the syntax PROC SQL uses for defining a variable, which the documentation calls the "column-definition component":
column data-type <column-modifier(s)>
That is why you use the SQL syntax, char(n) or num, for specifying the type. Note that SAS datasets only have two data types: fixed length character strings and floating point numbers. SAS will automatically convert any other SQL data-type into the proper one of those.
The limitations on altering the type are spelled out in the documentation:
Changing Column Attributes
If a column is already in the table, then
you can change the following column attributes by using the MODIFY
clause: length, informat, format, and label. The values in a table are
either truncated or padded with blanks (if character data) as
necessary to meet the specified length attribute.
You cannot change a character column to numeric and vice versa. To
change a column’s data type, drop the column and then add it (and its
data) again, or use the DATA step.
Note: You cannot change the length of a numeric column with the ALTER
TABLE statement. Use the DATA step instead.
Note that to make such changes to a dataset SAS will have to create a whole new dataset. So you might as well just write a data step to create the new dataset and then you will have full control.
Also be careful if you change the length of character variable to make sure that the attached FORMAT is still correct.
In your example you are changing the variable to be 8 bytes long, but are attaching a format that will only display the first 6 bytes.
In general it is best to not attach formats to character variables to avoid the confusion that type of mismatch can cause. Unfortunately there is no way to remove the attached format using PROC SQL. The best you could do is to set the format to $., that is without an explicit width. If you want to completely remove the format you will need to use a FORMAT statement in PROC DATASETS or a data step.
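A minimal sketch of the DATA step and PROC DATASETS alternatives, using the table and column names from the question. The new length of 8 is taken from the question; everything else is an assumption about what you want to end up with.

```sas
/* Rewrite the table: change the length and drop the attached format/informat */
data work.someTable;
  length someColumn $8;     /* must come before SET to take effect */
  set work.someTable;
  format someColumn;        /* no format name: removes the format */
  informat someColumn;      /* no informat name: removes the informat */
  label someColumn = 'someLabel';
run;

/* Or, to change only format/informat/label without rewriting the data */
proc datasets library=work nolist;
  modify someTable;
    format someColumn;
    informat someColumn;
    label someColumn = 'someLabel';
quit;
```

PROC DATASETS cannot change the length of a variable, so the DATA step is the one to use when the length has to change. If the new length is shorter than the old one, expect a warning about possible truncation in the log.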
I am trying to load a SAS data file together with its variable and value labels, but I can't seem to make it work.
I have 3 SAS files
sas data ("data_final.sas7bdat")
sas format dictionary that contains the format name, variable name/labels, etc ("formats.sas7bdat")
sas format library that contains the format name, value name/labels,etc ("format_library.sas7bdat")
I am trying to load this to SPSS using the following code but it doesn't work. It loads the data and the variable labels but not the value labels.
GET SAS DATA='\data_final.sas7bdat'
/FORMATS='\formats.sas7bdat'
/FORMATS='\format_library.sas7bdat'.
Any help is greatly appreciated.
Thank you!
The FORMATS= option wants the name of the SAS format catalog, not another SAS dataset. Catalogs use sas7bcat as the extension.
GET SAS DATA='\data_final.sas7bdat'
/FORMATS='\formats.sas7bcat'.
If you really cannot get it to work, then read in format_library.sas7bdat and look at the FMTNAME, TYPE, START, END and LABEL variables and use those to generate the SPSS code you need to attach data labels to your SPSS data.
FMTNAME is the name of the format. The TYPE determines whether it applies to character values or numeric values (or whether it is in fact an INFORMAT instead of a FORMAT). The START and END mark the range of values (frequently they will be the same) and LABEL is the decoded value (aka the data label). Unlike in SPSS, in SAS you only have to define the code/decode mapping once and can then apply it to as many variables as you want.
The dataset you show as being named formats.sas7bdat looks like it is the variable level metadata. That should list each variable (NAME) and what format, if any, has been attached to it (FORMAT). So if that shows there is a variable named FRED that has the format YESNO attached to it then look for records in format_library where FMTNAME='YESNO' and see what values it maps. So if FRED is numeric with values 1 and 2 then format YESNO might have one record with START='1' and LABEL='YES' and another with START='2' and LABEL='NO'.
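If you have access to SAS, another option is to rebuild a format catalog from that dataset. The sketch below assumes format_library.sas7bdat has the layout that PROC FORMAT writes with CNTLOUT= (FMTNAME, TYPE, START, END, LABEL). The libref, the folder path and the catalog name fmtcat are placeholders.

```sas
/* Hypothetical folder that holds the three SAS files */
libname mylib "/path/to/sas/files";

/* Build a format catalog (fmtcat.sas7bcat) from the control data set */
proc format cntlin=mylib.format_library library=mylib.fmtcat;
run;

/* Optional check: list the formats that were created */
proc format library=mylib.fmtcat fmtlib;
run;
```

The resulting fmtcat.sas7bcat file is the kind of file the /FORMATS subcommand expects.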
I have a numeric variable in a SAS dataset which is length 8. Despite its length being 8 bytes, each value is only a single digit. See the example below.
my_variable
1
2
5
9
0
3
The problem is that I need this variable to be only 1 byte in length and SAS doesn't accept it. I am running the following code:
data my_data_2;
set my_data;
length my_variable 1;
run;
And SAS reports this error message:
ERROR 352-185: The length of numeric variables is 3-8.
1 - So, why can't I have a numeric variable with a length less than 3 (or greater than 8) bytes?
2 - Is there a way to manage this? I really need this variable to be length 1.
Edit - adding more context:
I need this specific variable to be length one because I need to submit this dataset to a regulatory authority in my country. They require this variable to be numeric and length one, otherwise their validation program will not be able to read it. It also has to be submitted as a .DBF file (which is easily done with SAS PROC EXPORT).
I tried to use Microsoft Access 2013 to change the length to 1 and it works. The problem is that Access 2013 does not read or save .DBF, as it is an old file format. So I wanted to change the length in SAS and simply export it to .DBF.
According to the documentation:
The minimum length for a SAS variable on Windows and UNIX operating
systems is 3 bytes, and the maximum length is 8 bytes. On IBM
mainframes, the minimum length for a SAS variable is 2 bytes, and the
maximum length is 8 bytes.
The SAS numeric variable length is counted in bytes, not digits.
If you need a flag, use a character variable instead.
The length of a numeric variable in SAS is the number of bytes that it can be stored in. As SAS only uses floating point numerics, they cannot be smaller than 3 bytes in Windows or Unix (2 in z/OS); there is no integer or binary/bit data type in base SAS.
You're welcome to use a format which controls the field width displayed on the screen.
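As a sketch of that combination, using the data set and variable names from the question: the storage length is set to the minimum SAS allows on Windows and UNIX, and a one-character format controls the displayed width.

```sas
data my_data_2;
  length my_variable 3;    /* smallest numeric length on Windows/UNIX, in bytes */
  set my_data;
  format my_variable 1.;   /* display width of one digit */
run;
```

The single-digit values in the example are well within what 3 bytes can store exactly.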
I think that PROC EXPORT will write a numeric variable with a length of 1. You just need to attach a format to it so that the proc knows that is what you want.
Try this test program.
%let fname=%sysfunc(pathname(work))/test.dbf ;
data test;
length male 8 sex $1 female 8;
set sashelp.class(obs=3 keep=sex );
male=(sex='M');
female=(sex='F');
format male female F1.;
run;
proc export data=test outfile="&fname" replace
dbms=dbf
;
run;
Then dump the contents of your DBF file as binary to the log
data _null_;
infile "&fname" recfm=f lrecl=32;
input;
list;
run;
and compare it to the description of the file format https://en.wikipedia.org/wiki/.dbf#File_architecture_overview
I started out formatting my variables using PROC FORMAT. Later on I found that I had to change some of my variables in my dataset. I want to maintain the formatting I originally created, but I don't think I can do this if I recode. Am I correct in assuming this? I think I will have to just change some of my formats to accommodate my new variables, but is there a way
I'm not quite sure I understand your question, but I think I can still answer your question by giving you an understanding of the difference between recoding variables in SAS and using formatted values.
If you have originally created a format, that format is applied to the values in the SAS dataset at the time that your analysis is run. So, if you have a value of "Block A" in a character variable in your dataset and you have a format that maps "Block A" to the formatted value of 1, then if you later change the value of "Block A" to something else and rerun your analysis, the format no longer matches that value, so the formatted value 1 will no longer be printed in your output or used in your analysis. Formats work independently of the underlying values in your datasets. When you run an analysis, SAS essentially looks through your datasets at run time, maps each of the values to the formatted values as you've specified in your PROC FORMAT step, and then performs the analysis using the formatted values.
If you want to keep the original formatting, you can use two separate formats: one for the old format and one for the new formatting and call the appropriate format into your procedures depending on when you want to use which format.
You can also use the PUT function in a DATA step to convert the previously formatted value and "hard code" the formatted value as an actual value in your dataset. For example, if you have a format called "blockno" that you used with a variable called "block" then, using your old format, you could create a variable called block_old and set it to the old formatted value with:
block_old=put(block, $blockno.);
You could then modify block with your new values. You would then have two variables in your dataset: block_old, which would contain the original formatted values of your variable, and block, which, after your changes, would contain the new values.
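A self-contained sketch of that approach is shown here. The format contents and the sample values are invented for the illustration; only the names block, block_old and $blockno come from the text above.

```sas
/* Hypothetical "old" format: maps block names to codes */
proc format;
  value $blockno
    'Block A' = '1'
    'Block B' = '2';
run;

/* Hypothetical sample data */
data blocks;
  input block $ 1-7;
  datalines;
Block A
Block B
;
run;

/* Hard-code the old formatted value before block is recoded */
data blocks2;
  set blocks;
  length block_old $8;
  block_old = put(block, $blockno.);
run;
```

After this step, block can be recoded freely and block_old keeps the value produced by the old format.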
Proc Format is not a format statement
With proc format, you create formats, you do not assign them to variables. That you can do for instance with a format statement.
The format of a variable is not its internal length
A SAS variable can only have two types: numeric (which non-SAS programmers call double) or character (which non-SAS programmers call fixed-length character). It can, however, have hundreds of different formats. The format just determines the way the variable is represented in a report.
You can perfectly well change the format of a variable without changing its length.
Try this:
proc format;
value myFormat
0-10 = 'small'
10-20 ='medium'
20-100='large' ;
run;
data test1;
infile datalines;
length myVar 8.;
input myVar;
format myVar 6.2;
datalines;
1
2.1
9.12
10.123
15.1234
22.12345
50.123456
;
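data test2;
set test1;
/* swap the display format, the stored values are unchanged */
format myVar myFormat.;
run;
data test3;
set test2;
/* swap it again, still without touching the values */
format myVar 12.6;
run;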
title 'In test1, myVar has format 6.2';
proc print data=test1;
run;
title 'In test2, myVar has format myFormat';
proc print data=test2;
run;
title 'In test3, myVar has format 12.6';
proc print data=test3;
run;
You can create a format in a format catalog and store it for future reference. Datasets often gain new variables and updated values over time, so keeping both the old and the new formats in a catalog helps maintain a history of the original and current values.
I have a need to combine two sas datasets having the same column names but one of the datasets will have a numeric value where the same name in the other dataset are character. I was thinking to evaluate each field with the %isnum function and based on this convert the number to character:
char_id = put(id, 7.) ;
drop id ;
rename char_id=id ;
What I need to know is how do I determine the length of the variable to use in the PUT and what would I do for date fields?
Sounds like you need to analyze your data and see how long things are. Use an obviously too long format (best32.) and then see how long the actual results are, or use max.
For date fields, you need to decide how you want your date fields to look.
date_c = put(date_n,date9.);
That would be the default, but there are literally hundreds of date formats you can choose from.
You can also use proc contents data=myDataSet out=VarDatasets; run; to get the list of variables with their type, length, format, informat and so on.
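A rough sketch of the whole conversion, assuming two hypothetical datasets: one, where id is character, and two, where id is numeric. The target length of 12 and the best12. format are guesses that should be replaced once you have checked how long the real values are.

```sas
/* List variable attributes; in the OUT= data set TYPE is 1 for numeric, 2 for character */
proc contents data=two out=two_vars(keep=name type length format) noprint;
run;

/* Convert the numeric id to character under the same name */
data two_char;
  set two(rename=(id=id_num));
  length id $12;                      /* assumed target length */
  id = left(put(id_num, best12.));    /* left-align the converted digits */
  drop id_num;
run;

/* Combine; the LENGTH statement keeps id from being truncated */
data combined;
  length id $12;
  set one two_char;
run;
```

Renaming on the way in with the RENAME= data set option avoids the separate RENAME statement, because the character version can take the original name directly.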