I have a series of string variables with missing observations. I would like to use flat substitution. For instance, suppose variable x has 3 available values. Under this substitution method, a missing value of x should have a 33.333% chance of being assigned each of the available values. How would I do this?
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
Run;
You could use temporary arrays to store the possible values. Then generate a random index into the array.
DATA have;
INPUT id a $ b $ c $ x;
CARDS;
1 Y Male . 5
2 Y Female . 4
3 . Female Tall 4
4 Y . Short 2
5 N Male Tall 1
;
data want ;
set have ;
array possible_b (2) $8 _temporary_ ('Male','Female') ; /* _temporary_ keeps possible_b1/possible_b2 out of the output */
if missing(b) then b=possible_b(1+int(rand('uniform')*dim(possible_b)));
run;
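The same pattern extends to the other character variables. Here is a minimal sketch, assuming the have dataset from above; the value lists for a and c are my guess from the sample rows, and the seed is arbitrary. It uses ceil() on the uniform draw, which is equivalent to the 1+int() form.

```sas
data want;
  set have;
  /* Value lists are assumptions taken from the sample data; adjust as needed */
  array possible_a (2) $8 _temporary_ ('Y', 'N');
  array possible_b (2) $8 _temporary_ ('Male', 'Female');
  array possible_c (2) $8 _temporary_ ('Tall', 'Short');
  call streaminit(12345); /* arbitrary seed so the draws are reproducible */
  /* rand('uniform') lies strictly between 0 and 1, so ceil() gives 1..dim */
  if missing(a) then a = possible_a(ceil(rand('uniform') * dim(possible_a)));
  if missing(b) then b = possible_b(ceil(rand('uniform') * dim(possible_b)));
  if missing(c) then c = possible_c(ceil(rand('uniform') * dim(possible_c)));
run;
```

Each available value gets the same chance, 1/dim of the array, which is the flat substitution asked for.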
I did this by generating random numbers and hard-coding the limits. There should be an easier way to do this, but for the purposes of the question this should work.
option missing='';
data begin;
input a $;
cards;
a
.
b
c
.
e
.
f
g
h
.
.
j
.
;
run;
data intermediate;
set begin;
if a EQ '' then help= rand("uniform");
else help=.;
run;
data wanted;
set intermediate;
/* limits at 1/3 and 2/3 so each value gets an equal share */
if a EQ '' then do;
if 0<=help<1/3 then a='V1';
else if 1/3<=help<2/3 then a='V2';
else if 2/3<=help then a='V3';
end;
drop help;
run;
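One possibly easier way is to let RAND('TABLE', ...) pick the category directly. This is only a sketch, assuming the begin dataset from above and the same three placeholder values V1 to V3; the min() guard is my own precaution in case rounding in the probabilities ever yields an extra category.

```sas
data wanted;
  set begin;
  array fill (3) $8 _temporary_ ('V1', 'V2', 'V3');
  if missing(a) then do;
    /* rand('table', p1, p2, p3) returns a category number with the given probabilities */
    idx = min(rand('table', 1/3, 1/3, 1/3), dim(fill));
    a = fill(idx);
  end;
  drop idx;
run;
```

This does the draw and the assignment in one step, so the intermediate dataset and the help variable are not needed.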
In the dataset I have 3 variables: id, type and value. I would like to calculate a score status named heart according to the type and value.
The initial value for heart is 8.
There is an error on the else if statement:
data score;
input id type $ value;
retain heart 8; /*8 is initial value of heart*/
if type = "add" then heart+value;
else if type = "minus" then heart-value;
else heart=heart;
datalines;
1001 add 10
1002 minus 5
1003 add 2
1004 add 5
1005 minus 6
;
run;
The sum statement has the syntax variable+expression; the plus sign is required. The hyphen - in your else if statement does not form a sum statement; instead, you must use: else if type = "minus" then heart+-value;.
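For reference, here is the full step with that change applied, as a sketch. I wrote the negative term as + (-value) purely for readability, and dropped the no-op else heart=heart; the put statement is only there to show the running value.

```sas
data score;
  input id type $ value;
  retain heart 8; /* 8 is the initial value of heart */
  if type = "add" then heart + value;
  else if type = "minus" then heart + (-value); /* sum statement with a negated expression */
  put id= heart=;
  datalines;
1001 add 10
1002 minus 5
1003 add 2
1004 add 5
1005 minus 6
;
run;
```

With the sample data, heart should step through 18, 13, 15, 20 and 14.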
I am trying to do a recursive lag in SAS. The problem, as I just learned, is that x = lag(x) does not work in SAS.
The data I have is similar in format to this:
id date count x
a 1/1/1999 1 10
a 1/1/2000 2 .
a 1/1/2001 3 .
b 1/1/1997 1 51
b 1/1/1998 2 .
What I want is that given x for the first count, I want each successive x by id to be the lag(x) + some constant.
For example, let's say: if count > 1 then x = lag(x) + 3.
The output that I would want is:
id date count x
a 1/1/1999 1 10
a 1/1/2000 2 13
a 1/1/2001 3 16
b 1/1/1997 1 51
b 1/1/1998 2 54
Yes, the lag function in SAS requires some understanding. You should read through the documentation on it (http://support.sas.com/documentation/cdl/en/lefunctionsref/67398/HTML/default/viewer.htm#n0l66p5oqex1f2n1quuopdvtcjqb.htm)
When you have conditional statements with a lag inside the "then", I tend to use a retained variable.
data test;
input id $ date count x;
informat date anydtdte.;
format date date9.;
datalines;
a 1/1/1999 1 10
a 1/1/2000 2 .
a 1/1/2001 3 .
b 1/1/1997 1 51
b 1/1/1998 2 .
;
run;
data test(drop=last);
set test;
by id;
retain last;
if ^first.id then do;
if count > 1 then
x = last + 3;
end;
last = x;
run;
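To see why the lag version fails, it helps to run it. LAG keeps its own queue, which is only updated when the function actually executes, and what gets queued is the value x had at the moment of the call. A small sketch, using a hypothetical copy of the input named lag_in (date left out for brevity):

```sas
data lag_in;
  input id $ count x;
  datalines;
a 1 10
a 2 .
a 3 .
b 1 51
b 2 .
;
run;

data lag_demo;
  set lag_in;
  /* LAG runs only on rows with count > 1, so its queue never sees 10 or 51 */
  if count > 1 then x = lag(x) + 3;
  put id= count= x=;
run;
```

Here x should stay missing on every row with count > 1, because the queue only ever holds the missing values read from those rows. The retained variable avoids this by carrying the computed x forward.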
I have the following data file
H 321 s main st
P mary e 21 f
P aby e 23 m
P stary e 31 f
P dory e 23 m
H 321 s second st
P lary e 31 m
P laby e 43 m
P ltary e 31 m
P lory e 23 m
P lwey e 43 f
P lwty e 35 f
P lowtetr e 25 m
H 4351 s 35343nd st
I am trying to calculate the number of people living at each address, so the resulting data set should have 3 observations.
Here is the code
data ch21.test2 ;
infile testFile end = last ;
retain address ;
input type $1. @ ;
if type = 'H' then
do ;
if _n_ > 1 then
output ;
total=0 ;
input @3 address $3-21 ;
end ;
else if type='P' then
input @3 name $12. +1 age 2. +1 gender $1. ;
total+1 ;
if last then
output ;
run ;
However, I get only one row.
I don't know why you only got one row out of your code; I get two when I run it (it would probably be three, but I'm using datalines, which does not support the end variable). However, you don't have a do block around the else condition, which leads to the wrong answer (probably one higher than it should be). Your input is a bit confusing as it combines input styles, but it's not wrong as such; I changed it below to what I'm more comfortable with, but yours works as well (it's just harder to read). I did, however, add an @1 to type, which is probably a good idea; in the event that you have unexpected input issues, @1 makes sure you're reading the first character (as that's what you want).
If you're still only getting one row, you may have an issue with the format of your data file; perhaps it is a UNIX file and you're reading on a Windows machine, so it doesn't respect the EOL character, for example.
data test2 ;
infile datalines end = last ;
retain address ;
input @1 type $1. @ ; *add @1;
if type = 'H' then
do ;
if _n_ > 1 then
output ;
total=0 ;
input @3 address $19. ; *converted to formatted style;
end ;
else if type='P' then do; *added do - you had indented here but did not have a do;
input @3 name $12. @15 age 2. @18 gender $1.; *converted to all formatted style;
total+1 ;
end; *added end - assuming if last then output should be outside?;
if last then
output ;
datalines;
H 321 s main st
P mary e 21 f
P aby e 23 m
P stary e 31 f
P dory e 23 m
H 321 s second st
P lary e 31 m
P laby e 43 m
P ltary e 31 m
P lory e 23 m
P lwey e 43 f
P lwty e 35 f
P lowtetr e 25 m
H 4351 s 35343nd st
;;;;
run;
You are missing the & modifier. Your data contains blanks. Try this:
data ch21.test2 ;
infile testFile end = last ;
retain address ;
input type $1. @ ;
if type = 'H' then
do ;
if _n_ > 1 then
output ;
total=0 ;
input @3 address & $18. ; /* ADDED & and corrected to $18. */
end ;
else if type='P' then do ; /* do/end so that only P records are counted */
input @3 name & $12. +1 age 2. +1 gender $1. ; /* ADDED & */
total+1 ;
end ;
if last then
output ;
run ;
I'm trying to convert a character string to a numeric variable and then sum the values of each character to use as a unique identifier for that field.
So for example, I would like A=1, B=2, C=3.....X=24 Y=25 Z=26.
Say my string is "CAB". After running the code I would like an intermediate column of numbers, where the value for CAB is 3 1 2, and a result column derived by summing those numbers (3+1+2 = 6), so the final value would be 6.
Here is the sas code I used to convert the characters to numbers, but I need help with the result column.
DATA CHAR_VALUE;
SET WORK.XYZ;
CHAR_2_NUM=TRANSLATE(MY_VAR_CHAR, '1 2 3 ...24 25 26', 'A B C ...X Y Z');
NUM_CHAR=INPUT(CHAR_2_NUM,32.);
RUN;
Thanks in advance...I appreciate any help or suggestions.
-rachel
RANK will give the ASCII numeric value underlying a character; so A=65, B=66, Z=90, a=97, z=122.
So this should work (if you want only the uppercase values - not a different value for a than A):
data test;
charval='CAB';
do _t=1 to length(Charval);
numval=sum(numval,rank(char(upcase(charval),_t))-64);
end;
put _all_;
run;
Another option (based on the comments below) is to build an informat with the relationships between letter and value. My loop iterates over each character from A to Z; you can then put whatever value you want for each letter as the label (I just put 1,2,3,4... but label= will change that).
data fmts;
retain fmtname 'CHARNUM' type 'i';
do _t=65 to 90;
start=byte(_t); *the character, so byte(65)='A';
label=_t-64; *the resulting number;
output;
end;
run;
proc format cntlin=fmts;
quit;
data test;
charval='CAB';
do _t=1 to length(Charval);
numval=sum(numval,input(char(upcase(charval),_t),CHARNUM.));
end;
put _all_;
run;
Finally, if you want to be able to construct this in the same datastep, you could construct the relationships in a hash table and look up the result. I can explain that if desired, though I'd like to see a more detailed example of what you want to do in terms of defining the relationship between a letter and its code.
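A rough sketch of that hash approach, assuming the plain A=1 ... Z=26 mapping from the question and an ASCII system; the names letter, code and rc are just placeholders I picked:

```sas
data test;
  length letter $1 code 8;
  if _n_ = 1 then do;
    /* build the lookup table once: key = letter, data = its numeric code */
    declare hash h();
    h.defineKey('letter');
    h.defineData('code');
    h.defineDone();
    do code = 1 to 26;
      letter = byte(64 + code); /* byte(65)='A' on ASCII */
      rc = h.add();
    end;
  end;
  charval = 'CAB';
  numval = 0;
  do _t = 1 to length(charval);
    letter = char(upcase(charval), _t);
    /* find() returns 0 on a match and copies the stored code into code */
    if h.find() = 0 then numval = numval + code;
  end;
  put charval= numval=;
  drop letter code rc _t;
run;
```

For 'CAB' this should print numval=6. Characters that are not in the table are simply skipped, and any other letter-to-code mapping only needs a different loop that loads the hash.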
If you need to see the intermediate values, you can do that by inserting a CAT function in the loop; I recommend CATX:
data test;
charval='CAB';
format intermed $100.;
do _t=1 to length(Charval);
numval=sum(numval,input(char(upcase(charval),_t),CHARNUM.));
intermed=catx('|',intermed,input(char(upcase(charval),_t),CHARNUM.)); *or the RANK portion from earlier;
end;
put _all_;
run;
That would give you 3|1|2, which you could then do math on via SCAN:
do _t = 1 to countc(intermed,'|')+1;
numval2 = sum(numval2,scan(intermed,_t,'|'));
end;
Your method to try and translate is a good attempt, but it will not really work. Here is a simple solution:
DATA CHAR_VALUE;
retain all_chars 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
set XYZ;
length CHAR_2_NUM $200;
CHAR_2_NUM = ' ';
NUM_CHAR = 0;
do i=1 to length(MY_VAR_CHAR);
if i=1 then CHAR_2_NUM = substr(MY_VAR_CHAR,i,1);
else CHAR_2_NUM = trim(CHAR_2_NUM) || ' ' || substr(MY_VAR_CHAR,i,1);
NUM_CHAR + index(all_chars,substr(MY_VAR_CHAR,i,1));
end;
drop i all_chars;
RUN;
This takes advantage of the fact that the indexed position of each character of your source variable in the all_chars variable corresponds to the mapping you desired.
UPDATED to also create your CHAR_2_NUM variable, which I overlooked in the original question.
Another simple solution is based on the collate function:
To convert a variable called MyNumbers (in the range of 1 to 26) to English upper-case characters, one can use:
collate(64 + MyNumbers, 64 + MyNumbers)
To obtain lower-case characters, one can use:
collate(96 + MyNumbers, 96 + MyNumbers)
Here's a quick example:
data _null_;
do MyNumbers = 1 to 26;
MyLettersUpper = collate(64 + MyNumbers, 64 + MyNumbers);
MyLettersLower = collate(96 + MyNumbers, 96 + MyNumbers);
put MyNumbers MyLettersUpper MyLettersLower;
end;
run;
1 A a
2 B b
3 C c
4 D d
5 E e
6 F f
7 G g
8 H h
9 I i
10 J j
11 K k
12 L l
13 M m
14 N n
15 O o
16 P p
17 Q q
18 R r
19 S s
20 T t
21 U u
22 V v
23 W w
24 X x
25 Y y
26 Z z
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds