Struggling to get this If/Then/Else statement working. I have two columns: Variable and Value. Variable has the name of the variable and Value has all the potential codes that could be associated with that Variable.
Example:
Variable Value
Gender F
Gender M
I want to create a field called "Flag". If the value isn't among the list of valid values, it should flag that row; otherwise, the field should be left blank:
data Want;
length
Variable $40.
Value $40.
Flag $8.;
set Have (keep = Variable Value);
if (Variable = 'Gender' and Value ^= 'M') then Flag = 'UnkCode'; else Flag="";
if (Variable = 'Gender' and Value ^= 'F') then Flag = 'UnkCode'; else Flag="";
if (Variable = 'Gender' and Value ^= 'O') then Flag = 'UnkCode'; else Flag="";
if (Variable = 'Gender' and Value ^= 'U') then Flag = 'UnkCode'; else Flag="";
run;
quit;
The dataset I'm using has only two values for Gender: F and M. For whatever reason, the Flag field in both rows has "UnkCode".
Any idea what I'm doing wrong?
Just to be, possibly, a little more clear: your if statements are evaluated sequentially.
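So for an observation where Value is 'M', the first statement sets Flag to "" because ('M' ^= 'M') is false. However, Flag is then overwritten by your subsequent if statements: as ('M' ^= 'F') is true, Flag takes the value 'UnkCode'. In the end only the last if statement determines Flag, so both 'F' and 'M' end up flagged because neither equals 'U'.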
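The overwriting can be seen directly in the log with a small trace. This is only a sketch: the dataset name trace and the inline datalines are made up for illustration, and only the first two tests from the question are reproduced.

```sas
/* Hypothetical trace of the first two IF/ELSE statements from the question. */
data trace;
   length Variable $40 Value $40 Flag $8;
   input Variable $ Value $;

   if (Variable = 'Gender' and Value ^= 'M') then Flag = 'UnkCode'; else Flag = '';
   put 'After the M test: ' Value= Flag=;   /* Flag as set by the first statement */

   if (Variable = 'Gender' and Value ^= 'F') then Flag = 'UnkCode'; else Flag = '';
   put 'After the F test: ' Value= Flag=;   /* the first result has been overwritten */
   datalines;
Gender F
Gender M
;
run;
```

For the M row the log should show a blank Flag after the first test and UnkCode after the second; for the F row it is the other way round. Each statement discards whatever the previous one decided.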
In addition to Keni's use of the in operator (which is better than the code I am about to suggest), you could also do the following (which may help you understand if statements better).
if (Variable = 'Gender' and Value = 'M') then Flag = "";
else if (Variable = 'Gender' and Value = 'F') then Flag = "";
else if (variable = 'Gender' and Value = 'O') then Flag = "";
else if (Variable = 'Gender' and Value = 'U') then Flag = "";
else if (Variable = 'Gender') then Flag = 'UnkCode';
I might also suggest that instead of having a variable named 'Variable', with a value of 'Gender', you simply have a variable named 'Gender' with a value of 'F' or 'M'. While there are certainly specific circumstances in which you would not want to create your dataset this way, they are relatively few and far between.
I put all the IF conditions in one statement and it worked.
data Want;
length Variable $40. Value $40. Flag $8.;
set Have (keep = Variable Value);
If (Variable = 'Gender' and Value in ('F','M','O','U')) then Flag =" ";
else Flag = 'UnkCode';
run;
quit;
For multiple variables to flag for... this should work.
data Want;
length Variable $40. Value $40. Flag $8.;
set Have (keep = Variable Value);
Flag = 'UnkCode';
if (Variable = 'Gender' and Value in ('F','M','O','U')) then Flag =" ";
else if (Variable = 'Race' and Value in ('B','A')) then Flag =" ";
run;
quit;
I created a SAS function using fcmp to calculate the Jaccard distance between two strings. I do not want to use macros, as I'm going to use it across a large dataset for multiple variables. The problem is that the lists of substrings I get are incomplete.
proc fcmp outlib=work.functions.func;
function distance_jaccard(string1 $, string2 $);
n = length(string1);
m = length(string2);
ngrams1 = "";
do i = 1 to (n-1);
ngrams1 = cats(ngrams1, substr(string1, i, 2) || '*');
end;
/*ngrams1= ngrams1||'*';*/
put ngrams1=;
ngrams2 = "";
do j = 1 to (m-1);
ngrams2 = cats(ngrams2, substr(string2, j, 2) || '*');
end;
endsub;
options cmplib=(work.functions);
data test;
string1 = "joubrel";
string2 = "farjoubrel";
jaccard_distance = distance_jaccard(string1, string2);
run;
I expected ngrams1 and ngrams2 to contain all the substrings of length 2 instead I got this
ngrams1=jo*ou*ub
ngrams2=fa*ar*rj
If you want real help with your algorithm you need to explain in words what you want to do.
I suspect your problem is that you never defined how long your new character variables ngrams1 and ngrams2 should be. From the output you show it appears that FCMP defaulted them to length $8.
To define a variable you need to use a LENGTH statement (or an ATTRIB statement with the LENGTH= option) before you start referencing the variable.
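As a minimal sketch of that advice, the loop from the question can be wrapped in a helper that declares its work variable first. The function name bigrams and the length of 200 are assumptions chosen for illustration, not part of the original post.

```sas
proc fcmp outlib=work.functions.func;
   /* Hypothetical helper: returns the 2-character substrings of s, each followed by '*' */
   function bigrams(s $) $ 200;
      length grams $ 200;      /* declare the length before the first reference */
      grams = '';
      n = length(s);
      do i = 1 to n - 1;
         grams = cats(grams, substr(s, i, 2), '*');
      end;
      return (grams);
   endsub;
run;

options cmplib=work.functions;

data _null_;
   length g $ 200;             /* the receiving variable needs room as well */
   g = bigrams('joubrel');
   put g=;
run;
```

With the length declared up front, the accumulated string should no longer be cut off after eight characters. Note that cats strips leading and trailing blanks, so substrings containing spaces would need different handling.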
I want to display my data as either yes or no in the output for initial testing, site visit, and follow up. How would I do that? There are numeric values for these in the data set, but I want character responses of "y" or "n".
PROC FORMAT;
VALUE SiteVisitfmt 1 = 'yes'
0 = 'no';
VALUE InitialTestingfmt 1 = 'yes'
2 = 'no';
VALUE TestEventfmt 1 = 'One Event '
2 = 'Two Events'
3 = 'Three Events'
4 = 'Four Events'
5 = 'Five Events';
VALUE FollowUpfmt 1 = 'yes'
0 = 'no';
FORMAT SiteVisit SiteVisitfmt. InitialTesting InitialTestingfmt. TestEvent TestEventfmt.
FollowUp FollowUpfmt.;
RUN;
data PMdataedits;
set PMdata (rename = (Number_of_Days_from_Onset_to_Sit =SiteVisit
Number_of_Days_between_Onset_and = InitialTesting
Number_of_Test_Events_in_IRIS = TestEvent
Number_of_Days_between_Test_1_an = FollowUp));
drop SPA;
attrib date1 format=date9.;
date1=input(date,mmddyy10.);
NewSiteVisit = put(SiteVisit, 8.);
NewInitialTesting = put(InitialTesting, 8.);
NewFollowUp = put(FollowUp, 8.);
NewSiteVisit=;
if (NewSiteVisit=<1) THEN NewSiteVisit= '1';
if (NewSiteVisit>1) THEN NewSiteVist= '0';
NewInitialTesting=;
if (NewInitialTesting<=2) THEN NewInitialTesting= '1';
if (NewInitialTesting>2) THEN NewInitialTesting='0';
This statement:
FORMAT SiteVisit SiteVisitfmt. InitialTesting InitialTestingfmt. TestEvent TestEventfmt.
FollowUp FollowUpfmt.;
Needs to be on the data step (sometime after data PMdataedits; but before the run; that you don't show), not in the proc format. That's the statement that assigns the format to a variable; each dataset (which is defined by a data step) has its own, unique set of variables that can be the same name as other datasets but have different contents and formats.
Also note that you don't have to name the formats after the variables, and don't need three different yes/no formats. You could have done:
proc format;
value ynf
1='yes'
0='no'
;
run;
And then used
format sitevisit initialtesting followup ynf.;
And that would have covered all three of them with one format. But what you did is legal, it's just more typing than you need!
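To make the placement concrete, here is a small self-contained sketch with one shared numeric format defined in PROC FORMAT and the FORMAT statement placed in the data step. The dataset demo and its values are invented for illustration.

```sas
/* Define the format once; VALUE creates it, it does not attach it to any variable. */
proc format;
   value ynf 1 = 'yes'
             0 = 'no';
run;

/* Hypothetical data: the FORMAT statement inside the data step attaches the format. */
data demo;
   input SiteVisit FollowUp;
   format SiteVisit FollowUp ynf.;
   datalines;
1 0
0 1
;
run;

proc print data=demo;
run;
```

The stored values stay numeric (1 and 0); only the displayed text changes to yes and no.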
In SAS EG, I have a user defined format
value $MDC
'001' = '77'
'002' = '77'
...
'762' = '14'
etc.
My data set has DRG_code string variables with values like '001' and '140'.
I was trying to create a new variable, with the below code.
MDC = put(DRG_code, $MDC.)
However, there are more values of DRG_code in my data set than are specified in the user-defined format $MDC.
For example, when DRG_code equals '140', a value that does not exist in the user-defined format, the put function for some reason returns MDC = '14' (which should only be its value when the DRG code is '762').
Is there a way to make sure my put statement only returns a value from the user defined format when a corresponding value is present?
Grateful for feedback.
Lori
I've tried using formatting like "length" to have my put statement return 3, which I thought would result in "140" instead of "14" and that didn't work.
Formats have a DEFAULT width. If you do not specify a width when using the format then SAS will use the default width. When making a user defined format PROC FORMAT will set the default width to the maximum width of the formatted values. In your example the default width is being set to 2.
You can override that when you use the format.
MDC = put(DRG_code, $MDC3.);
Or you could define the default when you define the format.
value $MDC (default=3)
'001' = '77'
'002' = '77'
...
'762' = '14'
;
You can also set a default value for the unmatched values using the other keyword.
value $MDC (default=3)
'001' = '77'
'002' = '77'
...
'762' = '14'
other = 'UNK'
;
You can even nest a call to another format for the unmatched values (or for any target value). In that case you do not need to specify the default width, since the width of the nested format is taken into account when the default width is determined.
value $MDC
'001' = '77'
'002' = '77'
...
'762' = '14'
other = [$3.]
;
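A quick way to check how unmatched values behave is to run a few codes through the format and look at the log. The sketch below uses the other = 'UNK' variant with a shortened, illustrative value list; the variable names are assumptions.

```sas
proc format;
   value $mdc (default=3)
      '001' = '77'
      '762' = '14'
      other = 'UNK'      /* label returned for any code not listed above */
   ;
run;

data _null_;
   do drg_code = '001', '140', '762';
      mdc = put(drg_code, $mdc.);
      put drg_code= mdc=;
   end;
run;
```

Here '140' should come back as UNK rather than a truncated copy of the input, which makes unmapped codes easy to spot.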
I presume the default width came out as $2 because every mapped value is two characters long, and that same width is what gets applied to an 'unfound' source value. To ensure unfound values are not truncated, make sure one of the formatted values has trailing spaces filling it out to the length of the longest unfound value.
value $MDC
'001' = '77     ' /* 7 characters, presuming no DRG_code exceeds 7 characters */
'002' = '77'
'762' = '14'
;
You can also fix this by specifying a length to use when applying the format, e.g.
proc format;
value $MDC
'001' = '77'
'762' = '14'
;
run;
data _null_;
do var = '001','140','762';
var_formatted = quote(put(var,$MDC3.));
put var= var_formatted=;
end;
run;
Output:
var=001 var_formatted="77 "
var=140 var_formatted="140"
var=762 var_formatted="14 "
N.B. both this solution and Richard's will result in trailing whitespace being added to formatted values, as you can see from the quotes.
Here I propose a slight modification to user667489's solution so that:
- you don't need to specify the length of the format every time you use it (using the default option of the value statement when defining the format)
- the resulting formatted value doesn't have trailing blanks (using the trim() function on the output resulting from applying the format)
i.e.
proc format;
value $MDC(default=3)
'001' = '77'
'002' = '77'
'762' = '14'
;
run;
data _null_;
do var = '001', '140', '762';
var_formatted = quote(trim(put(var, $MDC.)));
put var= var_formatted=;
end;
run;
which gives the following output:
var=001 var_formatted="77"
var=140 var_formatted="140"
var=762 var_formatted="14"
I need a trigger to update the table DIRECTORY_NUMBER when a value of its DN_NUM column matches a value of the MSISDN column of a different table (RNPH_REQUESTS_DETAILS) under a different schema (NKADM). The trigger will run every time there's a new entry in the DIRECTORY_NUMBER table. Based upon several conditions, the values of the DN_STATUS column and a few other columns need to be updated. The updated value of the DN_STATUS column will be 'r' if the conditions are met, and 'w' if the conditions are not met. The active portion of my code is given below:
UPDATE d
SET d.DN_STATUS = CASE WHEN EXISTS (SELECT 1 from NKADM.RNPH_REQUESTS_DETAILS n where n.MSISDN = d.DN_NUM AND n.PROCESS_STATE_ID = 4 AND n.ACTION='IN' AND n.FAILED_STATUS IS NULL AND TRUNC(n.MODIFICATION_DATE) = TRUNC(SYSDATE))
THEN 'r'
ELSE 'w'
END,
d.DN_MODDATE = SYSDATE,
d.BUSINESS_UNIT_ID = 2,
d.HLCODE = 5
WHERE d.DN_ID =: NEW.DN_ID
AND d.PLCODE = 1004
AND d.DN_STATUS = 'f'
FROM DIRECTORY_NUMBER d;
I am getting the following error:
Error(48,1): PL/SQL: SQL Statement ignored
Error(60,3): PL/SQL: ORA-00933: SQL command not properly ended
The errors get resolved only if I remove the references. But that gives a different result than intended. When the code is as follows:
UPDATE DIRECTORY_NUMBER
SET DN_STATUS = CASE WHEN EXISTS (SELECT 1 from NKADM.RNPH_REQUESTS_DETAILS where MSISDN = DN_NUM AND PROCESS_STATE_ID = 4
AND ACTION='IN' AND FAILED_STATUS IS NULL AND TRUNC(MODIFICATION_DATE) = TRUNC(SYSDATE))
THEN 'r'
ELSE 'w'
END,
DN_MODDATE =SYSDATE,
BUSINESS_UNIT_ID=2,
HLCODE =5
WHERE DN_ID =:NEW.DN_ID
AND PLCODE =1004
AND DN_STATUS ='f';
COMMIT;
Even when the CASE WHEN EXISTS condition is true (returns result when run independently), the value of DN_STATUS gets updated to 'w'.
Update: I tried with the following code:
UPDATE DIRECTORY_NUMBER
SET DN_STATUS = 'r',
DN_MODDATE =SYSDATE,
BUSINESS_UNIT_ID=2,
HLCODE =5
WHERE DN_ID =:NEW.DN_ID
AND PLCODE =1004
AND DN_STATUS ='f';
AND DN_NUM in (select MSISDN from NKADM.RNPH_PROCESS_DETAILS where PROCESS_STATE_ID = 4);
This isn't working either. If I remove the last condition, the resultant row has DN_STATUS value of 'f', and the MSISDN is in NKADM.RNPH_PROCESS_DETAILS table with PROCESS_STATE_ID = 4. I don't understand why it's not working.
What am I doing wrong?
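In a BEFORE INSERT or BEFORE UPDATE trigger defined FOR EACH ROW, you can modify the record currently being processed directly through :NEW. You don't need to issue an extra UPDATE to change the data.
In other words, you can do something like this (lv_count is assumed to be a NUMBER variable declared in the trigger's DECLARE section):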
IF :NEW.PLCODE = 1004 AND :NEW.DN_STATUS = 'f' THEN
:NEW.DN_MODDATE := SYSDATE;
:NEW.BUSINESS_UNIT_ID := 2;
:NEW.HLCODE := 5;
-- this query you can wrap in a function and call this function
SELECT COUNT(1)
INTO lv_count
FROM NKADM.RNPH_REQUESTS_DETAILS n
WHERE n.MSISDN = :NEW.DN_NUM
AND n.PROCESS_STATE_ID = 4
AND n.ACTION = 'IN'
AND n.FAILED_STATUS IS NULL
AND TRUNC(n.MODIFICATION_DATE) = TRUNC(SYSDATE);
IF lv_count > 0 THEN
:NEW.DN_STATUS := 'r';
ELSE
:NEW.DN_STATUS := 'w';
END IF;
END IF;
For some reason when SAS does proportional hazards regression it is including those observations that are specified as . as a group in the results. I suspect it has something to do with how I created my variable (and that SAS thinks my numeric variables are characters) but I can't figure out what I did wrong. I am using SAS 9.4
data final; set final;
if edu_d = 'hs less' then edu_regress = 1;
else if edu_d = 'hs' then edu_regress = 1;
else if edu_d = 'some college' then edu_regress = 2;
else if edu_d = 'college plus' then edu_regress = 3;
else if edu_d = 'missing' then edu_regress=.;
run;
Then I run my regression:
proc phreg data=final;
class edu_regress;
model fuptime*dc(0)=edu_regress/rl;
run;
And the output is as follows:
edu_regress . 1 0.10963 0.12941 0.7177 0.3969 1.116 0.866 1.438
edu_regress 1 1 0.22514 0.10949 4.2278 0.0398 1.252 1.011 1.552
edu_regress 2 1 0.21706 0.11410 3.6190 0.0571 1.242 0.993 1.554
Where . is a category instead of treated as missing.
I'm sure I'm making a rookie mistake but I just can't figure it out.
I would clear your output, and re-run the code, and check the log and output.
As I read the docs, to get missing values treated as a category you would need to have /missing on your CLASS statement, which you do not have in the code shown. Without that, I think missing values should be automatically excluded.
When I run PHREG with a CLASS variable that has missing values, I get a note in the log about observations being deleted due to missing values, and the output shows that the number of observations used is less than the number of observations read.
SAS could be treating edu_regress as character if it already existed on the dataset as a character variable. This is one reason not to do data x; set x; and instead make a new dataset. If this is indeed the problem, you should see notes about numeric to character conversion in the log when you run the data step the way you have it now.
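One way to check both points before rerunning the regression is sketched below. It assumes the dataset final and the variables edu_d and edu_regress from the question already exist.

```sas
/* Shows whether edu_regress is stored as Num or Char. */
proc contents data=final (keep=edu_regress);
run;

/* Cross-tabulates the source and recoded values, including missing ones, */
/* so you can see exactly what each edu_d category was mapped to.        */
proc freq data=final;
   tables edu_d * edu_regress / missing list;
run;
```

If PROC CONTENTS reports the type as Char, the "." shown in the PHREG output is a literal character value rather than a numeric missing, which would explain why it is treated as its own level.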
Anyway, one way to adjust this is to use CALL MISSING. It sets a variable to missing correctly regardless of the type.
data final;
set final;
if edu_d = 'hs less' then edu_regress = 1;
else if edu_d = 'hs' then edu_regress = 1;
else if edu_d = 'some college' then edu_regress = 2;
else if edu_d = 'college plus' then edu_regress = 3;
else if edu_d = 'missing' then call missing(edu_Regress);
run;