I have a column in my SAS data set named age and another column named agefinal. I want to replace the values in the age column with the values in the agefinal column for just one ID (ID 5).
The code that I used was:
Data temp;
set temp;
if ID = 5;
then age = agefinal;
run;
I could not substitute the values. The values in the age column did not change. I then tried to run this code to check the type and length of the values, since I believed both columns were numeric.
Code:
Proc contents data = temp;
tables age agefinal;
run;
The output that I got was:
age : character length 3.
agefinal: character length $3
I would appreciate your suggestions.
Try removing the semicolon at the end of the if statement. Right now what you're doing is deleting all records where the id isn't equal to five.
Try setting the formats to be the same
data temp;
modify temp;
format age agefinal $3.;
run;
and then see if it will let you do the substitution.
The code you provided runs with an ERROR. Remove the extra semicolon after the IF condition and that may fix your issue:
/* ORIGINAL */
Data temp;
set temp;
if ID = 5;
then age = agefinal;
run;
/* CORRECTED */
Data temp;
set temp;
if ID = 5 /* REMOVED SEMICOLON */
then age = agefinal;
run;
Cheers
Rob
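For the type check attempted in the question, note that PROC CONTENTS has no TABLES statement; a KEEP= data set option can limit the listing to the two columns instead. The sketch below also shows the assignment with an explicit conversion. It assumes that age is numeric and agefinal is character, which the question does not confirm; if both columns have the same type, the plain assignment is enough. The output name temp_fixed is only illustrative.

```sas
/* List type and length for just the two columns of interest */
proc contents data=temp(keep=age agefinal);
run;

data temp_fixed;   /* hypothetical output name */
    set temp;
    /* Assumption: age is numeric, agefinal is character.     */
    /* INPUT() converts the text; ?? suppresses log messages  */
    /* for values that cannot be read as numbers.             */
    if ID = 5 then age = input(agefinal, ?? best12.);
run;
```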
I have a questionnaire coded 1-5, with missing values recorded as (.). How do I code the data to reflect the following:
If a patient has >=80% of values non-missing, then the missing values will be coded as the mean value of the questions answered. If a patient is missing more than 80% of values, then set the measure summary to missing for that patient and drop the record.
data condomuse;
set int108;
run;
proc means data=condomuse n nmiss missing;
var cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
by Intround sid;
run;
Using the following assumptions:
each line/record is a unique person
all variables are numeric
NMISS(), N(), CMISS() and DIM() are functions that can work with arrays.
This will identify all records with 80% or more missing.
data temp; *temp is output data set name;
set have; *have is input data set name;
*create an array to avoid listing all variables later;
array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
*calculate percent missing;
Percent_Missing = NMISS(of vars_check(*)) / Dim(vars_check);
if percent_missing >= 0.8 then exclude = 'Y';
else exclude = 'N';
run;
To replace with mean or a different method, PROC STDIZE can do that.
*temp is input data set name from previous step;
proc stdize data=temp out=temp_mean reponly method=mean;
*keep only records with less than 80% missing;
where exclude = 'N';
*list of vars to fill with mean;
VAR cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
run;
PROC STDIZE supports several other METHOD= options, but note that these are standardization methods, not imputation methods.
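One caveat: with REPONLY and METHOD=MEAN, PROC STDIZE fills each missing value with that variable's mean across records. If the goal is instead the mean of the questions each patient actually answered, a row-wise fill in a DATA step is one option. This is a minimal sketch that reuses the temp data set and exclude flag from the step above; the output name temp_rowmean and the helper variables row_mean and i are only illustrative.

```sas
data temp_rowmean;   /* hypothetical output name */
    set temp;
    where exclude = 'N';   /* drop records flagged in the previous step */

    array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss
        CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject
        CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace
        CUSESSucceed;

    /* MEAN() ignores missing values, so this is the mean of answered items */
    row_mean = mean(of vars_check(*));

    /* Fill each missing item with this record's own mean */
    do i = 1 to dim(vars_check);
        if missing(vars_check(i)) then vars_check(i) = row_mean;
    end;

    drop i row_mean;
run;
```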
In SAS, I want to use an 'if ~ else if' statement to perform some instructions if the first field is character and other instructions if it is a number.
I tried to use an 'if ~ else if' statement, but I do not know how to specify the condition.
In this code,
data pr1;
input ~~~~
put profname$ course;
cards;
LEE 22
15 PARK
;
run;
I want to show this.
profname course
LEE 22
PARK 15
What can I put in '~~~' ??
There may be some clever INPUT statement magic you could do here, but I think the easiest, clearest solution would be to read in both columns as character data and then test them to see which column has which data:
data pr1 (keep = profname course);
* Declare PROFNAME as character and COURSE as numeric;
length profname $ 20 course 8;
* Read in two columns of data;
input col1 $ col2 $;
if input(col1, ??best.) = . then do;
* If COL1 cannot be converted to a number, assume it is a name;
profname = col1;
course = input(col2, ??best.);
end; else do;
* Otherwise assume COL2 is the name;
profname = col2;
course = input(col1, ??best.);
end;
cards;
LEE 22
15 PARK
;
run;
The ?? modifiers in the INPUT() function suppress the usual log messages when a value can't be converted, and keep the automatic _ERROR_ variable from being set.
I have a variable called var1 that has two kinds of values (both as character strings). One is "ND"; the other is a number from 0 to 100, stored as a string. I want to convert "ND" to 0 and the numeric strings to numeric values, for example 1 (character) to 1 (numeric).
Here's my code attempt:
data cleaned_up(drop = exam_1);
set dataset.df(rename=(exam1=exam_1));
select (exam1);
when ('ND') do;
exam1 = 0;
end;
when ;
exam1 = input(exam_1,2.);
end;
otherwise;
end;
Clearly not working. What am I doing wrong?
A couple of problems with your code. Putting the rename statement as a dataset option against the input dataset will perform the rename before the data is read in. Therefore exam1 won't exist as it is now called exam_1. This will still be defined as a character column, so the input function won't work.
You need to keep the existing column, create a new numeric column to do the conversion, then drop the old column and rename the new one. This can be done as a dataset option against the output dataset.
The tranwrd function will replace all occurrences of 'ND' with '0', then using input with the best12. informat will read in all the data as numbers. You don't have to specify the exact width when reading numbers (i.e. 2. for 2 digits, 3. for 3 digits etc).
data cleaned_up (drop=exam1 rename=(exam_1=exam1));
set df;
exam_1 = input(tranwrd(exam1,'ND','0'),best12.);
run;
You are using select(exam1) while it should be select(exam_1). You can use select for this purpose, but I think simple if condition can solve this much easier:
data test;
length source $32;
do source='99', '34.5', '105', 'ND';
output;
end;
run;
data result(drop = convertedValue);
set test;
if (source eq 'ND') then do;
result = 0;
end;
else do;
convertedValue = input(source,??best.);
if not missing(convertedValue) then do;
if (0 <= round(convertedValue, 1E-12) <= 100) then do;
result = convertedValue;
end;
end;
end;
run;
input(source,??best.) tries to convert source to number and if it fails (e.g. values contains some word), it does not print an error and simply continues execution.
round(convertedValue,1E-12) is used to avoid precision error during the comparison. If you want to do it absolutely safely you have to use something like
if (0 < round(convertedValue,1E-12) < 100
or abs(round(convertedValue,1E-12)) < 1E-10
or abs(round(convertedValue-100,1E-12)) < 1E-10
)
Try to use ifc function then convert to numeric variable.
data have;
input x $3.;
_x=input(ifc(x='ND','0',x),best12.);
cards;
3
10
ND
;
I have two datasets, both with the same variable names. In one of the datasets two variables are character, whereas in the other dataset all variables are numeric. I use the following code to convert the numeric variables to character, but the numbers change, for example 490.6 -> 491.
How can I do the conversion so that the numbers wouldn't change?
data tst ;
set data (rename=(Day14=Day14_Character Day2=Day2_Character)) ;
Day14 = put(Day14_Character, 8.) ;
Day2 = put(Day2_Character, 8.) ;
drop Day14_Character Day2_Character ;
run;
Your posted code is confused. Half of it looks like code to convert from character to numeric and half looks like it is for the other direction.
To convert to character use the PUT() function. Normally you will want to left align the resulting string. You can use the -L modifier on the end of the format specification to left align the value.
So to convert numeric variables DAY14 and DAY2 to character variables of length $8 you could use code like this:
data want ;
set have (rename=(Day14=Day14_Numeric Day2=Day2_Numeric)) ;
Day14 = put(Day14_Numeric, best8.-L) ;
Day2 = put(Day2_Numeric, best8.-L) ;
drop Day14_Numeric Day2_Numeric ;
run;
Remember you use PUT statement or PUT() function with formats to convert values to text. And you use the INPUT statement or INPUT() function with informats to convert text to values.
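A small round trip can make the PUT/INPUT distinction concrete. This is only an illustrative sketch: the variable names num, txt and back are made up, and the value 490.6 is taken from the question.

```sas
data _null_;
    num = 490.6;

    /* Numeric -> character: PUT() with a format, left-aligned with -L */
    txt = put(num, best8.-L);

    /* Character -> numeric: INPUT() with an informat */
    back = input(txt, best8.);

    /* Write both values to the log for inspection */
    put txt= back=;
run;
```

Using a format with no decimal places, such as 8., in the PUT() call is what rounds 490.6 to 491 in the original code.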
Change the format to something like Best8.2:
data tst ;
set data (rename=(Day14=Day14_Character Day2=Day2_Character)) ;
Day14 = put(Day14_Character, best8.2) ;
Day2 = put(Day2_Character, best8.2) ;
drop Day14_Character Day2_Character ;
run;
Here is an example:
data test;
input r ;
datalines;
500.04
490.6
;
run;
data test1;
set test;
num1 = put(r, 8.2);
run;
If you do not want to specify the width and number of decimal places, you can just use the BEST. format and SAS will choose a suitable representation based on the data. However, the length of the resulting variable may be large unless you specify it explicitly. This will still retain your numbers as in the original variable.
Hi there pretty simple question I think, I have a record like this for example :
name value
Mack 12
Mack 10
Mack 50
Now I want to put all the value in a variable or a single row. The result should be
value_concat
12,10,50
I tried to use the first. and last. variables in SAS, but it is not working for me. Here is what I wrote:
data List_Trt1;
set List_Trt;
by name;
if first.name then value_concat = value;
value_concat = cats(value_concat,",",value);
if last.name then value_concat = cats(value_concat,",",value);
run;
Thank you for the help!
You're on the right track.
data List_Trt1;
set List_Trt;
by name;
length value_concat $30; *or whatever is appropriate;
retain value_concat;
if first.name then value_concat=' ';
value_concat=catx(',',value_concat,value);
if last.name then output;
run;
First, you need retain so it keeps its value throughout. Second, you need to initialize it to blank on first.name. Third, you need to output only on last.name. I use CATX because it is more appropriate to what you are doing, but your CATS should be okay also.
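For completeness, here is a self-contained sketch of the same approach, using the sample rows from the question. BY-group processing expects the input to be sorted (or at least grouped) by name, so a PROC SORT is included as a precaution; the KEEP= option on the output is an addition for tidiness and is not part of the answer above.

```sas
/* Sample data from the question */
data List_Trt;
    input name $ value;
    datalines;
Mack 12
Mack 10
Mack 50
;
run;

/* FIRST./LAST. processing requires data ordered by the BY variable */
proc sort data=List_Trt;
    by name;
run;

data List_Trt1 (keep=name value_concat);
    set List_Trt;
    by name;
    length value_concat $30;          /* adjust to the longest expected list */
    retain value_concat;              /* carry the value across rows */
    if first.name then value_concat = ' ';          /* reset for each name */
    value_concat = catx(',', value_concat, value);  /* append with a comma */
    if last.name then output;         /* one row per name */
run;
```

CATX skips blank arguments, so the first value in each group is added without a leading comma.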