I'm trying to compare if a number of different variables happen in the order I expect using a macro. My code is:
%macro Order (second,first,var);
data order;
set data;
if &second. > &first. then &var._Correct = 1; else &var._Correct = 0;
if &second. < &first. then &var._Error = 1; else &var._Error = 0;
run;
%mend order;
%order(B,A,AB);
%order(C,B,BC);
I have a lot of other variables to compare. The problem is, when I run the macro the output dataset only has the last pair. In this example, that would be BC. I know I can make multiple output datasets and each one would have the pairs, but then I'd have to rejoin them all together. How can I get one dataset that has all of my &var._Correct and &var._Error pairs?
Your problem is that you're rewriting the data step twice. That's unneeded. Most of the time, macros like this can be lines in a data step not whole data steps.
%macro Order (second,first,var);
if &second. > &first. then &var._Correct = 1; else &var._Correct = 0;
if &second. < &first. then &var._Error = 1; else &var._Error = 0;
%mend order;
data order;
set data;
%order(B,A,AB);
%order(C,B,BC);
run;
Something more like that. I'd note a few minor issues here. What if &second=&first? You want no correct and no error, or is that correct or error?
And an easier way to do this:
%macro Order (second,first,var);
&var._correct = (&second. > &first.); *or GE?;
&var._error = (&second. < &first.); *or LE?; *only one of these two;
%mend order;
That puts the same values into the variable in a lot less code.
Related
I created a SAS function using fcmp to calculate the jaccard distance between two strings. I do not want to use macros, as I'm going to use it through a large dataset for multiples variables. the substrings I have are missing others.
proc fcmp outlib=work.functions.func;
function distance_jaccard(string1 $, string2 $);
n = length(string1);
m = length(string2);
ngrams1 = "";
do i = 1 to (n-1);
ngrams1 = cats(ngrams1, substr(string1, i, 2) || '*');
end;
/*ngrams1= ngrams1||'*';*/
put ngrams1=;
ngrams2 = "";
do j = 1 to (m-1);
ngrams2 = cats(ngrams2, substr(string2, j, 2) || '*');
end;
endsub;
options cmplib=(work.functions);
data test;
string1 = "joubrel";
string2 = "farjoubrel";
jaccard_distance = distance_jaccard(string1, string2);
run;
I expected ngrams1 and ngrams2 to contain all the substrings of length 2 instead I got this
ngrams1=jo*ou*ub
ngrams2=fa*ar*rj
If you want real help with your algorithm you need to explain in words what you want to do.
I suspect your problem is that you never defined how long you new character variables NGRAM1 and NGRAM2 should be. From the output you show it appears that FCMP defaulted them to length $8.
To define a variable you need use a LENGTH statement (or an ATTRIB statement with the LENGTH= option) before you start referencing the variable.
Variable name is PRC. This is what I have so far. First block to delete negative values. Second block is to delete missing values.
data work.crspselected;
set work.crspraw;
where crspyear=2016;
if (PRC < 0)
then delete;
where ticker = 'SKYW';
run;
data work.crspselected;
set work.crspraw;
where ticker = 'SKYW';
where crspyear=2016;
where=(PRC ne .) ;
run;
Instead of using a function to remove negative and missing values, it can be done more simply when inputting or outputting the data. It can also be done with only one data step:
data work.crspselected;
set work.crspraw(where = (PRC >= 0 & PRC ^= .)); * delete values that are negative and missing;
where crspyear = 2016;
where ticker = 'SKYW';
run;
The section that does it is:
(where = (PRC >= 0 & PRC ^= .))
Which can be done for either the input dataset (work.crspraw) or the output dataset (work.crspselected).
If you must use a function, then the function missing() includes only missing values as per this answer. Hence ^missing() would do the opposite and include only non-missing values. There is not a function for non-negative values. But I think it's easier and quicker to do both together simultaneously without a function.
You don't need more than your first test to remove negative and missing values. SAS treats all 28 missing values (., ._, .A ... .Z) as less than any actual number.
I have some SAS code along the lines of:
DATA MY_SAMPLE;
SET SAMPLE;
BY A;
IF A = 1 THEN B = 1;
ELSE IF A ^= 1 THEN B = 0;
ELSE IF MISSING(A) THEN B = .;
IF FIRST.A;
RUN;
which is returning a set with 0 observations (it shouldn't do this). I have sorted the data by A and tried reading the data into an intermediate dataset before applying the IF FIRST.A but get the same results.
Am I missing something completely obvious? I use the FIRST and LAST all of the time!
Agree with #Robert, the sample code should output records, assuming there are records in your input data and it is sorted.
I would double-check the log from your real program/data, and make sure there are no errors, and that the input dataset has records.
If that doesn't help, I would add some debugging PUT statements, something like below (untested):
DATA MY_SAMPLE;
SET SAMPLE;
BY A;
IF A = 1 THEN B = 1;
ELSE IF A ^= 1 THEN B = 0;
ELSE IF MISSING(A) THEN B = .; *This will never be true ;
put "Before subsetting if " (_n_ A first.A)(=) ;
IF FIRST.A;
put "After subsetting if " (_n_ A first.A)(=) ;
RUN;
As Robert noted, as written your Else if Missing(A) would never be true, because if A is missing the prior Else if A ^= 1 will evaluate to true because SAS uses binary logic (true/false), not trinary logic(true/false/null).
Also I would check for any stray OUTPUT statements in your code.
Checked the log; checked the input; closed MSSQL down; opened it up again and lo and behold, code worked first time. Thanks for the downgrade, but I didn't realize that MSSQL is prone to twitches!
For some reason when SAS does proportional hazards regression it is including those observations that are specified as . as a group in the results. I suspect it has something to do with how I created my variable (and that SAS thinks my numeric variables are characters) but I can't figure out what I did wrong. I am using SAS 9.4
data final; set final;
if edu_d = 'hs less' then edu_regress = 1;
else if edu_d = 'hs' then edu_regress = 1;
else if edu_d = 'some college' then edu_regress = 2;
else if edu_d = 'college plus' then edu_regress = 3;
else if edu_d = 'missing' then edu_regress=.;
run;
Then I run my regression:
proc phreg data=final;
class edu_regress;
model fuptime*dc(0)=edu_regress/rl;
run;
And the output is as follows:
edu_regress . 1 0.10963 0.12941 0.7177 0.3969 1.116 0.866 1.438
edu_regress 1 1 0.22514 0.10949 4.2278 0.0398 1.252 1.011 1.552
edu_regress 2 1 0.21706 0.11410 3.6190 0.0571 1.242 0.993 1.554
Where . is a category instead of treated as missing.
I'm sure I'm making a rookie mistake but I just can't figure it out.
I would clear your output, and re-run the code, and check the log and output.
As I read the docs, to get missing values treated as a category you would need to have /missing on your CLASS statement, which you do not have in the code shown. Without that, I think missing values should be automatically excluded.
When I run PHREG with a CLASS variable that has missing values, I get a note in the log about observations being deleted due to missing values, and the output shows that the number of observations used is less than the number of observations read.
If SAS thinks edu_regress is character, that's possible if it already was on the dataset as character. This is one reason not to do data x; set x; and instead make a new dataset. You should see notes in the datastep when you run it the way you have now regarding numeric to character conversion, if this is indeed the problem.
Anyway, one way to adjust this is to use CALL MISSING. It sets a variable to missing correctly regardless of the type.
data final;
set final;
if edu_d = 'hs less' then edu_regress = 1;
else if edu_d = 'hs' then edu_regress = 1;
else if edu_d = 'some college' then edu_regress = 2;
else if edu_d = 'college plus' then edu_regress = 3;
else if edu_d = 'missing' then call missing(edu_Regress);
run;
I may be missing something obvious, but how do you calculate 'powers' in SAS?
Eg X squared, or Y cubed?
what I need is to have variable1 ^ variable2, but cannot find the syntax... (I am using SAS 9.1.3)
got it! there is no function.
you need to do:
variable1 ** variable2;
data t;
num = 5;
pow = 2;
res = num**pow;
run;
proc print data = t;
run;
Use the POWER function and, if necessary, the CONSTANT function.
nbr_squared = power(nbr, 2);
nbr_cubed = power(nbr, 3);
E_to_the_power_2 = power(constant('E'),2);