I created a new variable (Racebin from Race) with an if statement in SAS. The length of the variable is 5 by default, so the categories of the new variable are truncated. How can I set its length to 20, for example?
I tried this:
data birth;
set WORK.birth;
if Race="White" then Racebin $20 = "White";
else Racebin $20 ="Not white";
run;
then this:
data birth;
set WORK.birth;
length Racebin $ 20;
if Race="White" then Racebin = "White";
else Racebin ="Not white";
run;
Neither of them works.
Your second code is the right way to define a NEW variable.
But you seem to be overwriting your input dataset by using the same dataset name on both the DATA and SET statements, so the variable probably already exists from an earlier run. In that case its length is taken from the input dataset, and a LENGTH statement placed after SET has no effect. To change the length, define it BEFORE the SET statement.
data birth;
length Racebin $20;
set birth;
if Race="White" then Racebin = "White";
else Racebin ="Not white";
run;
Or you could eliminate it from the input data by using the DROP= dataset option.
data birth;
set birth(drop=racebin);
length Racebin $20;
if Race="White" then Racebin = "White";
else Racebin ="Not white";
run;
You can avoid that issue of new variables already existing by not overwriting your inputs. Use a different name for the result of the DATA step than the input data it is reading.
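As a minimal sketch of that advice, assuming WORK.birth does not already contain Racebin (birth_recoded is a hypothetical output name, not one from the question):

```sas
/* Write to a new dataset so WORK.birth is left unchanged.          */
/* birth_recoded is a hypothetical name chosen for illustration.    */
data birth_recoded;
  set birth;
  length Racebin $20;   /* takes effect only if Racebin is new here */
  if Race = "White" then Racebin = "White";
  else Racebin = "Not white";
run;

/* Check the stored length of Racebin in the result */
proc contents data=birth_recoded;
run;
```

If PROC CONTENTS still reports a length of 5, Racebin was already present in the input and one of the two approaches above is needed.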
I am rearranging some data to run in a linear regression and used the code below to reshape it. When I look at the new dataset, however, the new names are incomplete. For example, Day11 shows as Day1, Day14 as Day1, Day20 as Day2, etc. Where is my code causing this problem?
Data EMU116Linear;
Set EMU116;
TumorVolume = Day0; Day = "Day0"; output;
TumorVolume = Day3; Day = "Day3"; output;
TumorVolume = Day7; Day = "Day7"; output;
TumorVolume = Day9; Day = "Day9"; output;
TumorVolume = Day11; Day = "Day11"; output;
TumorVolume = Day14; Day = "Day14"; output;
TumorVolume = Day16; Day = "Day16"; output;
TumorVolume = Day18; Day = "Day18"; output;
TumorVolume = Day21; Day = "Day21"; output;
drop Day0 Day3 Day7 Day9 Day11 Day14 Day16 Day18 Day21 TumorWeightDay21;
Run;
Original Data Set
New Data Set
SAS truncates character values to the variable's defined length. Because you do not specify a length explicitly, the very first assignment, Day = "Day0", fixes the length of the new Day variable at 4 characters. After that, any longer value assigned to Day is truncated to 4 characters (e.g., Day11 becomes Day1). To make room for 5 characters, declare the variable with a LENGTH statement before its first assignment:
Data EMU116Linear;
Set EMU116;
length Day $5; /* declare Day only; TumorVolume stays numeric */
...
run;
However, judging from your code and desired result, a better and less repetitive solution may be PROC TRANSPOSE, which reshapes your wide data to long format:
proc sort data=EMU116;
by Treatment Treatment2 TreatCode Mouse;
run;
proc transpose
data=EMU116
out=EMU116Linear;
by Treatment Treatment2 TreatCode Mouse;
var Day0 Day3 Day7 Day9 Day11 Day14 Day16 Day18 Day21; /* leaves out TumorWeightDay21 */
run;
data EMU116Linear;
set EMU116Linear;
rename col1=TumorVolume
_name_=Day;
run;
Very new to SAS programming 0.0
I am trying to change the title "Listing of Data Set Health" to all uppercase and what I am doing isn't working. PLS HELP.
proc format;
value $Gender
'M'='Male'
'F'='Female'
other= 'Unknown'; * Handle Missing Values;
run;
data health;
infile '/folders/myfolders/health.txt' pad;
input #1 Subj $3.
#4 Gender $1.
#5 (Age HR) (2.)
#9 (SBP DBP Chol) (3.);
if Chol gt 200 then do;
Stoke_Risk = 'High';
LDL_Group = 'Bad';
end;
if Age le 21 then Age_Group = 1;
else if Age le 59 then Age_Group = 2;
else if Age ge 60 then Age_Group = 3;
format Gender $Gender.; *this line could be under data or proc
print;
Current_Year = year(today()); *current year based on today and year function;
Short_Gender = lowcase(Gender); *lower case function for string;
ABP = mean(SBP, DBP); *mean of blood pressure;
run;
title "Listing of Data Set Health";
proc print data=health;
ID Subj;
run;
The title statement is a global statement that is used in open code. If you would like it to always be upper-case, you will want to type your title directly in upper-case:
title "LISTING OF DATA SET HEALTH";
If you want to be able to have it always be in upper-case no matter what you type, you will need to delve into the SAS Macro Facility and macro functions. This is a more advanced aspect of SAS that you will get into later.
The %upcase() macro function can be used in open code to convert any text to upper-case.
title "%upcase(listing of data set health)";
Note that this function differs from upcase(), which you will use in the data step. Functions starting with % are special macro functions.
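For contrast, here is a minimal sketch of the DATA step route, assuming the health dataset from the question. The names title_text and health_title are hypothetical, chosen only for illustration:

```sas
/* Build the upper-case text with the DATA step function upcase(), */
/* then hand it to the TITLE statement through a macro variable.    */
data _null_;
  length title_text $40;
  title_text = upcase("Listing of Data Set Health");
  call symputx("health_title", title_text); /* strips trailing blanks */
run;

title "&health_title";
proc print data=health;
  id Subj;
run;
```

This is more work than %upcase() for a fixed title, but it is the pattern to reach for when the title text comes from data rather than from typed text.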
You can explicitly change it to uppercase in the title statement:
title "LISTING OF DATA SET HEALTH";
If you want to change the title dynamically, you could store the text in a macro variable (without quotes, so they do not end up inside the title):
%let title = Listing of Data Set Health;
title "%upcase(&title.)";
I have a questionnaire coded 1-5 and then labeled as (.) for missing variables. How do I code the data to reflect the following:
If a patient has >=80% of values not missing, then the missing values will be coded as the mean value of the questions answered. If a patient is missing more than 80% of values, then set the measure summary to missing for that patient and drop the record.
data condomuse;
set int108;
run;
proc means data=condomuse n nmiss missing;
var cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
by Intround sid;
run;
Using the following assumptions:
each line/record is a unique person
all variables are numeric
NMISS(), N(), CMISS() and DIM() are functions that can work with arrays.
This will identify all records with 80% or more missing.
data temp; *temp is output data set name;
set have; *have is input data set name;
*create an array to avoid listing all variables later;
array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
*calculate percent missing;
Percent_Missing = NMISS(of vars_check(*)) / Dim(vars_check);
if percent_missing >= 0.8 then exclude = 'Y';
else exclude = 'N';
run;
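To replace missing values with a mean (or use a different method), PROC STDIZE can do that. Note that with REPONLY and METHOD=MEAN it fills each missing value with that variable's mean across records, not the mean of the individual respondent's answered questions.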
*temp is input data set name from previous step;
proc stdize data=temp out=temp_mean reponly method=mean;
*keep only records not flagged for exclusion;
where exclude = 'N';
*list of vars to fill with mean;
VAR cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
run;
The PROC STDIZE documentation lists the available METHOD= options, but note that these are standardization methods, not imputation methods.
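If the mean of each respondent's own answered questions is wanted instead, as the question describes, a DATA step over the same array is one option. This is a minimal sketch, assuming the temp dataset and the exclude flag created earlier; temp_rowmean, row_mean and i are hypothetical names:

```sas
data temp_rowmean;              /* hypothetical output name */
  set temp;
  where exclude = 'N';          /* drop records flagged for exclusion */
  array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss
        CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject
        CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace
        CUSESSucceed;
  /* MEAN() ignores missing values, so this is the mean of answered items */
  row_mean = mean(of vars_check(*));
  do i = 1 to dim(vars_check);
    if missing(vars_check(i)) then vars_check(i) = row_mean;
  end;
  drop i;
run;
```

The 0.8 cut-off used to set exclude may need adjusting, since the question's wording of the threshold is ambiguous.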
I am reading a .txt file into SAS, that uses "|" as the delimiter. The issue is there is one column that is using "|" as a word separator as well instead of acting like delimiter, this needs to be in one column.
For example the txt file looks like:
apple|fruit|Healthy|choices|of|food|12|2012|chart
needs to look like this in the SAS dataset:
apple | fruit | Healthy choices of Food | 12 | 2012 | chart
How do I eliminate "|" between "Healthy choices of Food"?
I think this will do what you want:
data tmp1;
length tmp $100;
input tmp $;
cards;
apple|fruit|Healthy|choices|of|food|12|2012|chart
apple|fruit|Healthy|choices|of|food|and|lots|of|other|stuff|12|2012|chart
;
run;
data tmp2;
set tmp1;
num_delims=length(tmp)-length(compress(tmp,"|"));
expected_delims=5;
extra_delims=num_delims-expected_delims;
length new_var $100;
i=1;
do while(scan(tmp,i,"|") ne "");
if i<=2 or (extra_delims+2)<i<=num_delims then new_var=trim(new_var)||scan(tmp,i,"|")||"|";
else new_var=trim(new_var)||scan(tmp,i,"|")||"#";
i+1;
end;
new_var=left(tranwrd(new_var,"#"," "));
run;
This isn't particularly elegant, but it will work:
data tmp;
input tmp $50.;
cards;
apple|fruit|Healthy|choices|of|food|12|2012|chart
;
run;
data tmp;
set tmp;
var1 = scan(tmp,1,'|');
var2 = scan(tmp,2,'|');
var4 = scan(tmp,-3,'|');
var5 = scan(tmp,-2,'|');
var6 = scan(tmp,-1,'|');
var3 = tranwrd(tmp,trim(var1)||"|"||trim(var2),"");
var3 = tranwrd(var3,trim(var4)||"|"||trim(var5)||"|"||trim(var6),"");
var3 = tranwrd(var3,"|"," ");
run;
Expanding a little on Itzy's answer, here is another possible solution:
data want;
/* Define variables */
attrib item length=$10 label='Item';
attrib class length=$10 label='Family';
attrib desc length=$80 label='Item Description';
attrib count length=8 label='Some number';
attrib year length=$4 label='Year';
attrib somevar length=$10 label='Some variable';
length countc $8; /* A temp variable */
infile 'c:\temp\delimited_temp.txt' lrecl=1000 truncover;
input;
item = scan(_infile_,1,'|','mo');
class = scan(_infile_,2,'|','mo');
countc = scan(_infile_,-3,'|','mo'); /* Temp var for numeric field */
count = inputn(countc,'8.'); /* Re-read the numeric field */
year = scan(_infile_,-2,'|','mo');
somevar = scan(_infile_,-1,'|','mo');
desc = tranwrd(
substr(_infile_
,length(item)+length(class)+3
,length(_infile_)
- ( length(item)+length(class)+length(countc)
+length(year)+length(somevar)+5))
,'|',' ');
drop countc;
run;
The key in this case is to read your file directly and handle the delimiters yourself. This can be tricky and requires that your data file is exactly as described. A much better solution would be to go back to whoever gave you this data and ask them to deliver it in a more appropriate form. Good luck!
Another possible workaround.
data tmp;
infile '/path/to/textfile';
input tmp :$100.;
array varlst (*) $30 v1-v6;
a=countw(tmp,'|');
do i=1 to dim(varlst);
if i<=2 then
varlst(i) = scan(tmp,i,'|');
else if i>=4 then
varlst(i) = scan(tmp,a-(dim(varlst)-i),'|');
else do j=3 to a-(dim(varlst)-i); /* words 3 through a-3 make up the description */
varlst(i)=catx(' ', varlst(i),scan(tmp,j,'|'));
end;
end;
drop tmp a i j;
run;
I have a variable, textvar, that looks like this:
type=1&name=bob
type=2&name=sue
I want to create a new table that looks like this:
type name
1 bob
2 sue
My approach is to use scan to split the variables on & so for the first observation I have
var1 var2
type=1 name=bob
So now I can use scan again to split on =:
vname = scan(var1, 1, '=');
value = scan(var1, 2, '=');
But how can I now assign value to the variable named vname?
PROC TRANSPOSE is the quickest way. You need an ID variable (dummy or real).
data test;
informat testvar $50.;
input testvar $;
datalines;
type=1&name=bob
type=2&name=sue
;;;;
run;
data test_vert;
set test;
id+1;
length scanner $20 vname vvalue $20;
scanner=scan(testvar,1,"&");
do _t=2 by 1 until (scanner=' ');
vname=scan(scanner,1,"=");
vvalue=scan(scanner,2,"=");
output;
scanner=scan(testvar,_t,"&");
end;
run;
proc transpose data=test_vert out=test_T;
by id;
id vname;
var vvalue;
run;
Does this help? Dynamic variable names in SAS
I think I have some code to address this, but left it at my workplace.
Obviously you haven't included your real data, but can't you just hard code some of the values if the format of the raw data is the same in each row? My code converts the "=" and "&" to "," to make the scan function easier to use.
data want (keep=type name);
set test;
_newvar=translate(testvar,",,","&=");
type=input(scan(_newvar,2),best12.);
length name $20;
name=scan(_newvar,4);
run;