formatting variables and then recoding - sas

I started out formatting my variables using PROC FORMAT. Later on I found that I had to change some of my variables in my dataset. I want to maintain the formatting I originally created, but I don't think I can do this if I recode. Am I correct in assuming this? I think I will have to just change some of my formats to accommodate my new variables, but is there a way

I'm not quite sure I understand your question, but I think I can still answer your question by giving you an understanding of the difference between recoding variables in SAS and using formatted values.
If you have originally created a format, that format is applied to the values in the SAS dataset at the time that your analysis is run. So, if you have a value of "Block A" in a character variable in your dataset and you have formatted value that maps "Block A" to the formatted value of 1, then if you go in and later change the value of "Block A" to something else and rerun your analysis, "Block A" will not longer be printed in your output or used in your analysis as the formatted value. Formats work independently of the underlying values in your datasets. When you run an analysis SAS essentially looks through your datasets at run-time and maps each of the values to the formatted values as you've specified in your proc format statement and then performs the analysis using the formatted values.
If you want to keep the original formatting, you can use two separate formats: one for the old format and one for the new formatting and call the appropriate format into your procedures depending on when you want to use which format.
You can also use a put statement in a datastep to convert the previously formatted value and "hard code" the formatted value as an actual value in your dataset. For example, if you have a format called "blockno" that you used with a variable called "block" then, using your old format, you could create a variable called blockno_old and set it to the old formatted value with:
block_old=put(block, $blockno.).
You could then modify block with your new values. You would then have to variables in your dataset: block_old which would contain the original values of your variable and block which, after your changes, would contain the new values.

Proc Format is not a format statement
With proc format, you create formats, you do not assign them to variables. That you can do for instance with a format statement.
The format of a variable is not its internal length
A SAS variable can only have two types: numerical (which non SAS programmers call double) or chracter (which non SAS programmers call fixed length character) It can however have hundreds of different formats. The format just determines the way the variable is represented in a report.
You can perfectly change the format of a variable without changing it's length.
Try this:
proc format;
value myFormat
0-10 = 'small'
10-20 ='medium'
20-100='large' ;
run;
data test1;
infile datalines;
length myVar 8.;
input myVar;
format myVar 6.2;
datalines;
1
2.1
9.12
10.123
15.1234
22.12345
50.123456
;
data test2;
set test1;
format myVar myFormat.;
data test3;
set test2;
format myVar 12.6;
run;
title 'In test1, myVar has format 6.2';
proc print data=test1;
run;
title 'In test2, myVar has format myFormat';
proc print data=test2;
run;
title 'In test3, myVar has format 12.6';
proc print data=test3;
run;

You can create a format in a format catalog and store it for any future reference. It always happens that the dataset has new variables and updated variables with new data. So having a format catalog to accommodate the new and old changes will actually help to maintain history of the original and current values.

Related

How to read SAS format dictionary into SPSS?

I am trying to load SAS data file together with its variable and value labels, but I cant seem to make it work.
I have 3 SAS files
sas data ("data_final.sas7bdat")
sas format dictionary that contains the format name, variable name/labels, etc ("formats.sas7bdat")
sas format library that contains the format name, value name/labels,etc ("format_library.sas7bdat")
I am trying to load this to SPSS using the following code but it doesn't work. It loads the data and the variable labels but not the value labels.
GET SAS DATA='\data_final.sas7bdat'
/FORMATS='\formats.sas7bdat'
/FORMATS='\format_library.sas7bdat'.
Any help is greatly appreciated.
Thank you!
The FORMATS= option wants the name of the SAS format catalog, not another SAS dataset. Catalogs use sas7bcat as the extension.
GET SAS DATA='\data_final.sas7bdat'
/FORMATS='\formats.sas7bcat'.
If you really cannot get it to work then read in the formats_library.sas7bdat and look at the FMTNAME, TYPE, START, END and LABEL variables and use those to generate the SPSS code you need to attach data labels to your SPSS data.
FMTNAME is the name of the format. The TYPE determines if it is applies to character values or numeric values (or if in fact is an INFORMAT instead of FORMAT). The START and END mark the range of values (frequently they will be the same) and LABEL is the decoded value (aka the data label). Unlike in SPSS in SAS you only have to define the code/decode mapping once and then apply to as many variables as you want.
The dataset you show as being named formats.sas7bdat looks like it is the variable level metadata. That should list each variable (NAME) and what format, if any, has been attached to it (FORMAT). So if that shows there is a variable named FRED that has the format YESNO attached to it then look for records in format_library where FMTNAME='YESNO' and see what values it maps. So if FRED is numeric with values 1 and 2 then format YESNO might have one record with START='1' and LABEL='YES' and another with START='2' and LABEL='NO'.

Generating many tables from a single table in SAS

I have a table in SAS which contains the format information I want. I want to bin this data into the categories given.
What I don't know how to do is create either an xform or a format file from the data.
An example table looks like this:
TxtLabel Type FmtName label Hlo count
. I FAC1f 0 O 1
1996 I FAC1f 1 2
1997 I FAC1f 2 3
I want to date all years in a different data set as after 1997 OR before 1996.
The problem is that I know how to do this by hard coding it, but these files changes the numbers each time so I'm hoping to use the information in the table to generate the bins rather than hard code them.
How do I go about binning by data using a column from another dataset for my categorization?
Edit
I have two data sets, one which looks like the one I have included and one which has a column titled "YEAR". I want to bin the second data set using the categories from the first. In this case there are two available years in TxtLabel. There are multiple tables like this, I'm looking at how to generate PROC Format code from the table, rather than hard coding the values.
This should run to create the desired format
Proc FORMAT CNTLIN=MyCustomFormatControlData;
run;
You can then use it in a DATA Step, or apply it to a column in a data set.
Binning the data might be construed as 'data set splitting' but your question does not make it clear if that is so. Generic arbitrary splitting is often done with one of these techniques:
wall paper source code resolved from macro variables populated from information garnered in a Proc SQL or Proc FREQ step
dynamic data splitting using hash object for grouping records in memory, and saved to a data set with an .output() call.
Sample code for explicit binning
data want0 want1 want2 want3 want4 want5 wantOther;
set have;
* explicit wall paper;
select (put(year,FAC1f.));
when ('0') output want0;
when ('1') output want1;
when ('2') output want2;
when ('3') output want3;
when ('4') output want4;
when ('5') output want5;
otherwise output wantOther;
run;
This is the construct that source code generated by macro can produce, and requires
one pass to determine the when/output lines that are to be generated
a second pass to apply the lines of code that were generated.
If this is the data processing that you are attempting:
do some research (plenty of info out there)
write some code
make a new question if you get errors you can't resolve
Proc FORMAT
Proc FORMAT has a CNTLIN option for specifying a data set containing the format information. The structure and values expected of the Input Control Data Set (that CNTLIN) is described in the Output Control Data Set documentation. Some of the important control data columns are:
FMTNAME
specifies a character variable whose value is the format or informat name.
LABEL
specifies a character variable whose value is associated with a format or an informat.
START
specifies a character variable that gives the range's starting value.
END
specifies a character variable that gives the range's ending value.
As the requirements of the custom format to be created get more sophisticated you will need to have more information variables in the input control data set.

Export/Import Attributes of a SAS dataset

I am working with multiple waves of survey data. I have finished defining formats and labels for the first wave of data.
The second wave of data will be different, but the codes, labels, formats, and variable names will all be the same. I do not want to define all these attributes again...it seems like there should be a way to export the PROC CONTENTS information for one dataset and import it into another dataset. Is there a way to do this?
The closest thing I've found is PROC CPORT but I am totally confused by it and cannot get it to run.
(Just to be clear I'll ask the question another way as well...)
When you run PROC CONTENTS, SAS tells you what format, labels, etc. it is using for each variable in the dataset.
I have a second dataset with the exact same variable names. I would like to use the variable attributes from the first dataset on the variables in the second dataset. Is there any way to do this?
Thanks!
So you have a MODEL dataset and a HAVE dataset, both with data in them. You want to create WANT dataset which has data from HAVE, with attributes of MODEL (formats, labels, and variable lengths). You can do this like:
data WANT ;
if 0 then set MODEL ;
set HAVE ;
run ;
This works because when the DATA step compiles, SAS builds the Program Data Vector (PDV) which defines variable attributes. Even though the SET MODEL never executes (because 0 is not true), all of the variables in MODEL are created in the PDV when the step compiles.
Importantly, note that if there are corresponding variables with different lengths, the length from MODEL will determine the length of the variable in WANT. So if HAVE has a variable that is longer than the same-named variable in MODEL, it may be truncated. Options VARLENCHK determines whether or not SAS throws a warning/error if this happens.
That assumes there are no formats/labels on the HAVE dataset. If there is a variable in HAVE that has a format/label, and the corresponding variable in MODEL does not have a format/label, the format/label from HAVE will be applied to WANT.
Sample code below.
data model;
set sashelp.class;
length FavoriteColor $3;
FavoriteColor="Red";
dob=today();
label
dob='BirthDate'
;
format
dob mmddyy10.
;
run;
data have;
set sashelp.class;
length FavoriteColor $10;
dob=today()-1;
FavoriteColor="Orange";
label
Name="HaveLabel"
dob="HaveLabel"
;
format
Name $1.
dob comma.
;
run;
options varlenchk=warn;
data want;
if 0 then set model;
set have;
run;
I'd create an empty dataset based on the existing one, and then use proc append to append the contents to it.
Create some sample data for the second round of data:
data new_data;
age = 10;
run;
Create an empty dataset based on the original data:
proc sql noprint;
create table want like sashelp.class;
quit;
Append the data into the empty dataset, retaining the details from the original:
proc append base=want data=new_data force nowarn;
run;
Note that I've used the force and nowarn options on proc append. This will ensure the data is appended even if differences are found between the two datasets being used. This is expected if you have, for example, format differences. It will also hide things like if columns exist in the new table that aren't in the old table etc. So be careful that this is doing what you want it to. If the behaviour is undesirable, consider using a datastep to append instead (and list the want dataset first).
Welcome to the stack.
If you want to copy the properties of the table without the data within it, you could use PROC SQL or data step with zero rows read in.
This examples copies all information about the SASHELP.CLASS dataset into a brand new dataset. All formats, attributes, labels, the whole thing is copies over. If you want to only copy some of the columns, specify them in select clause instead of asterix.
PROC SQL outobs=0;
CREATE TABLE WANT as SELECT * FROM SASHELP.CLASS;
QUIT;
Regards,
Vasilij

How to read excel files containing brackets in header in sas?

I have an excel file where open of the columns is temperature (F) and then when I import it in sas it saves variable name as temperature_F_ or when I use validvarany option it saves exactly as temperature (F). However, I need to now convert the data in C. So whenever I use either of the variable name (i.e temperature_F_ or temperature (F)) it does not work. For the second one, it thinks temperature as functions. So wats the way around this one?
The exact nature of your problem isn't clear, as temperature_F_ should be fine if you've imported under validvarname=v7.
data want;
set have;
temperature_c_ = (5/9)*((temperature_f_)-32);
run;
If you have to work with the validvarname=any; version, then you use named literals:
data want;
set have;
'temperature(c)'n = (5/9)*(('temperature(f)'n)-32);
run;
Similar to a date literal (ie, '01JAN2010'd) but for member/variable/etc. names.

Change variable length in SAS dataset

I need to change the variable length in a existing dataset. I can change the format and informat but not the length. I get an error. The documentation says this is possible but there are no examples.
Here is my issue. My data source could change so I don't want to pre define columns on import. I want to do a generic import and then look for certain columns and adjust the length.
I have tried PROC SQL and DATA steps. It looks like the only way to do this is to recreate the dataset or the column. Which I don't want to do.
Thanks,
Donnie
If you put your LENGTH statement before the SET statement, in a Data step, you can change the length of a variable. Obviously, you will get truncation if you have data longer than your new length.
However, using a DATA step to change the length is also re-creating the data set, so I'm confused by that part of your question.
The only way to change the length of a variable in a datastep is to define it before a source (SET) dataset is read in.
Conversely you can use an alter statement in a proc sql. SAS support alter statement
Length of a variable remains same once you set the dataset. Add length statements before you set the dataset if you need to change length of a columns
data a;
length a, b, c $200 ;
set b ;
run ;