How do I format new variables that I Create? - sas

I am trying to create a new variable in SAS. I use if then logic to create a new character variable. However, the variable is being truncated. How do I format the new variable so that all the characters appear?
DATA Clinic;
set stat201.clinic;
rename age_at_consent=Age ldiastolic=dbp lsystolic=sbp ldobp=datebp;
if smoking="" then smoking="Missing";
if smokecat=0 then smokecat_1="Never Smoker";
if smokecat=1 then smokecat_1="Current Everyday Smoker";
if smokecat=2 then smokecat_1="Former Smoker";
if smokecat=9 then smokecat_1="Never Assessed";
attrib smokecat_1 format =$25.;
drop smokecat;
rename smokecat_1= smokecat;
run;

SAS defines a variable at the point where it first appears. Here, the first appearance is in this assignment statement:
if smokecat=0 then smokecat_1="Never Smoker";
So it is defined as a character variable of length 12, the length of "Never Smoker". Longer values assigned later are truncated to 12 characters.
Just define the variable BEFORE using it. You can use a LENGTH statement
length smokecat_1 $25;
or an ATTRIB statement to define a variable.
attrib smokecat_1 length=$25;
Attaching the $25. format does not change the length of the variable.
It just means that you want SAS to use 25 characters to display the values. But the variable will still only be 12 characters long. There is no need to attach any format to the variable. Formats are instructions on how to display values and SAS already knows how to display character values.
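As a rough sketch of the fix described above, the step below declares the length before the first assignment. The input data set clinic_raw is a hypothetical stand-in for stat201.clinic, reduced to the one column that matters here; everything else follows the question's code.

```sas
/* Hypothetical stand-in for stat201.clinic (assumption: only smokecat is needed) */
data clinic_raw;
  input smokecat;
  datalines;
0
1
2
9
;
run;

data clinic;
  set clinic_raw;
  /* Declare the variable BEFORE the first assignment so it gets length 25 */
  length smokecat_1 $25;
  if smokecat=0 then smokecat_1="Never Smoker";
  else if smokecat=1 then smokecat_1="Current Everyday Smoker";
  else if smokecat=2 then smokecat_1="Former Smoker";
  else if smokecat=9 then smokecat_1="Never Assessed";
  drop smokecat;
  rename smokecat_1=smokecat;
run;

/* Check the stored length of the renamed variable */
proc contents data=clinic;
run;
```

No FORMAT or ATTRIB FORMAT= is needed here; the LENGTH statement alone is enough to stop the truncation.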

To complement Tom's answer above:
SAS will define the variable type and length based on when it first appears.
I agree it is a good practice to define a variable before using it.

Related

Difficulty understanding the "_n_" variable in SAS, and how it applies to a concatenate function

I am very new to SAS, and for whatever reason am having a lot of difficulty deciphering what this code block (below) does. I've googled and searched Stack Overflow to no avail. I'd appreciate any input, thanks!
set dataset;
id=cat("L",_n_);
run;
There is probably a DATA statement missing from the snippet as well:
data newdataset;
set dataset;
id = cat("L", _n_);
run;
The code above creates a new dataset named newdataset from the existing dataset named dataset.
It also creates a new column called id, built by concatenating the constant character value "L" with the automatic variable _n_ using the CAT function. The automatic variable _n_ holds the number of times the DATA step has iterated, which in a simple step like this one matches the number of the observation being read.
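To see this in action, here is a minimal sketch. The data set names have and want and the name column are made up for illustration; only the id assignment comes from the question.

```sas
/* Hypothetical three-row input */
data have;
  input name $;
  datalines;
Ann
Bob
Cy
;
run;

data want;
  set have;
  /* _n_ is 1 on the first iteration, 2 on the second, and so on */
  id = cat("L", _n_);
run;

proc print data=want;
run;
```

The three rows get the ids L1, L2 and L3. Note that _n_ itself is not written to the output data set; only id is.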

Globally set length for all string variable in SAS

Is there a global option to override the default length options for string in SAS? The only way that I have found is by setting the length statement within a data or proc step. This is not ideal for me as I would need to rework a large amount of code.

explain what is happening in Proc Sql

select Name into :Dataset1-:Dataset%trim(%left(&DatasetNum)) from MEM;
I am not able to interpret what is happening in this statement. Can anyone give me an explanation?
I understand this statement:
select count(Name) into :DatasetNum from MEM
But not the above one.
It is attempting to use the value of the macro variable DATASETNUM as the upper bound on the macro variable names that are being created by the SELECT statement. Because the previous variable was created with leading spaces the %LEFT() macro is called to remove them. The call to the macro %trim() is not needed as trailing spaces would not cause any trouble.
It is much easier to just build the macro variable array first and then set the counter variable from the value of the automatic macro variable SQLOBS. Plus then it will not have the leading blanks.
select name into :Dataset1- from mem ;
%let DatasetNum=&sqlobs;
If you have an older version of SAS that doesn't support the new :varname- syntax then just use a large value for the upper bound. SAS will only create the number of macro variables it needs.
select name into :Dataset1-:Dataset99999 from mem;
This creates an array of SAS macro variables (DATASET1, DATASET2, DATASET3, etc.), populated from the Name column of the MEM dataset.
It is analogous to:
data _null_;
set MEM;
call symputx(cats('Dataset',_n_),Name);
run;
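Putting the pieces together, a self-contained sketch of the SQLOBS approach might look like this. The contents of mem are invented for the example, and the open-ended :Dataset1- range assumes a SAS release that supports that syntax, as noted above.

```sas
/* Hypothetical stand-in for the MEM table */
data mem;
  input name :$32.;
  datalines;
sales
costs
staff
;
run;

proc sql noprint;
  /* Creates Dataset1, Dataset2, ... one per row returned */
  select name into :Dataset1- from mem;
quit;

/* SQLOBS holds the row count from the SELECT, with no leading blanks */
%let DatasetNum=&sqlobs;

/* &&Dataset&DatasetNum resolves to the last macro variable in the series */
%put NOTE: DatasetNum=&DatasetNum first=&Dataset1 last=&&Dataset&DatasetNum;
```

Because the counter is taken from SQLOBS after the fact, no %LEFT or %TRIM call is needed.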

formatting variables and then recoding

I started out formatting my variables using PROC FORMAT. Later on I found that I had to change some of my variables in my dataset. I want to maintain the formatting I originally created, but I don't think I can do this if I recode. Am I correct in assuming this? I think I will have to just change some of my formats to accommodate my new variables, but is there a way
I'm not quite sure I understand your question, but I think I can still answer your question by giving you an understanding of the difference between recoding variables in SAS and using formatted values.
If you originally created a format, that format is applied to the values in the SAS dataset at the time your analysis is run. So, if you have a value of "Block A" in a character variable in your dataset and a format that maps "Block A" to the formatted value 1, and you later change the value "Block A" to something else and rerun your analysis, the formatted value 1 will no longer be printed in your output or used in your analysis for the changed rows. Formats work independently of the underlying values in your datasets. When you run an analysis, SAS essentially looks through your datasets at run time, maps each value to the formatted value specified in your PROC FORMAT step, and then performs the analysis using the formatted values.
If you want to keep the original formatting, you can use two separate formats: one for the old format and one for the new formatting and call the appropriate format into your procedures depending on when you want to use which format.
You can also use the PUT function in a DATA step to convert the previously formatted value and "hard code" it as an actual value in your dataset. For example, if you have a format called "blockno" that you used with a variable called "block" then, using your old format, you could create a variable called block_old and set it to the old formatted value with:
block_old=put(block, $blockno.);
You could then modify block with your new values. You would then have two variables in your dataset: block_old, which would contain the original formatted values of your variable, and block, which, after your changes, would contain the new values.
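A small sketch of that PUT approach follows. The format's mappings, the data set names blocks and blocks_recoded, and the recode to "Block C" are all assumptions made up for the example; only the block_old assignment is taken from the answer.

```sas
/* Hypothetical version of the original character format */
proc format;
  value $blockno "Block A" = "1"
                 "Block B" = "2";
run;

/* Hypothetical data using the original values */
data blocks;
  input block $char7.;
  datalines;
Block A
Block B
;
run;

data blocks_recoded;
  set blocks;
  /* Freeze the old formatted value before the underlying value changes */
  block_old = put(block, $blockno.);
  /* Hypothetical recode of the underlying value */
  if block = "Block A" then block = "Block C";
run;

proc print data=blocks_recoded;
run;
```

After this step block_old still carries the labels from the old format, even though "Block C" has no entry in $blockno.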
Proc Format is not a format statement
With PROC FORMAT, you create formats; you do not assign them to variables. Assigning is done with, for instance, a FORMAT statement.
The format of a variable is not its internal length
A SAS variable can only have one of two types: numeric (which non-SAS programmers call double) or character (which non-SAS programmers call fixed-length character). It can, however, have hundreds of different formats. The format just determines the way the variable is represented in a report.
You can change the format of a variable perfectly well without changing its length.
Try this:
proc format;
value myFormat
0-10 = 'small'
10<-20 ='medium'
20<-100='large' ;
run;
data test1;
infile datalines;
length myVar 8;
input myVar;
format myVar 6.2;
datalines;
1
2.1
9.12
10.123
15.1234
22.12345
50.123456
;
data test2;
set test1;
format myVar myFormat.;
run;
data test3;
set test2;
format myVar 12.6;
run;
title 'In test1, myVar has format 6.2';
proc print data=test1;
run;
title 'In test2, myVar has format myFormat';
proc print data=test2;
run;
title 'In test3, myVar has format 12.6';
proc print data=test3;
run;
You can create a format in a format catalog and store it for future reference. Datasets often gain new variables or have existing variables updated with new data, so a format catalog that holds both the old and the new formats helps to maintain a history of the original and current values.
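One way this could look is sketched below. The catalog name work.myfmts and the format names size_old and size_new are hypothetical; WORK is used only to keep the example self-contained, and a permanent libref would be needed to keep the formats across sessions.

```sas
/* Store an old and a new version of a format in one catalog */
proc format library=work.myfmts;
  value size_old 0-10   = 'small'
                 10<-100 = 'large';
  value size_new 0-10   = 'small'
                 10<-20  = 'medium'
                 20<-100 = 'large';
run;

/* Tell SAS to search this catalog when resolving format names */
options fmtsearch=(work.myfmts);

data sizes;
  input myVar;
  datalines;
5
15
50
;
run;

/* Same data, reported with either version of the format */
proc print data=sizes;
  format myVar size_old.;
run;
proc print data=sizes;
  format myVar size_new.;
run;
```

Keeping both versions under different names lets each report pick the mapping it needs without touching the stored values.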

Change variable length in SAS dataset

I need to change the variable length in an existing dataset. I can change the format and informat but not the length. I get an error. The documentation says this is possible but there are no examples.
Here is my issue. My data source could change so I don't want to pre define columns on import. I want to do a generic import and then look for certain columns and adjust the length.
I have tried PROC SQL and DATA steps. It looks like the only way to do this is to recreate the dataset or the column. Which I don't want to do.
Thanks,
Donnie
If you put your LENGTH statement before the SET statement, in a Data step, you can change the length of a variable. Obviously, you will get truncation if you have data longer than your new length.
However, using a DATA step to change the length is also re-creating the data set, so I'm confused by that part of your question.
The only way to change the length of a variable in a datastep is to define it before a source (SET) dataset is read in.
Alternatively, you can use an ALTER TABLE statement with a MODIFY clause in PROC SQL, which SAS supports.
The length of a variable stays the same once the dataset has been read by SET. Add a LENGTH statement before the SET statement if you need to change the length of a column:
data a;
length a b c $200 ;
set b ;
run ;
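Both routes mentioned in these answers can be tried on a toy table. In the sketch below, have, want, name and city are hypothetical names; the first step follows the LENGTH-before-SET advice and the second uses ALTER TABLE with MODIFY in PROC SQL.

```sas
/* Hypothetical input with short character columns (default length 8) */
data have;
  input name $ city $;
  datalines;
Donnie Austin
;
run;

/* Route 1: LENGTH before SET; this writes a new data set */
data want;
  length name city $200;
  set have;
run;

/* Route 2: change the column width of the existing table with PROC SQL */
proc sql;
  alter table have
    modify name char(200);
quit;

proc contents data=want;
run;
proc contents data=have;
run;
```

Lengthening is safe for the data; shortening a column either way truncates any values longer than the new length.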