I want to create a new variable with a condition: if x1 is positive, the new variable takes 1, else 0. My directory is 'dir' and my SAS dataset is 'exemple'. SAS does not create the x2 variable for me.
data dir.exemple;
set exemple;
if x1<0 then x2=1;
else x2=0;
end;
run;
The log is
NOTE: Variable x1 is uninitialized.
NOTE: The data set DIR.EXEMPLE has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.16 seconds
cpu time 0.04 seconds
As Jess has said, you should look at your whole log first to check for error messages. Right now, even if the path in your libname statement is correct, you'll still get errors.
If you want the condition to be "x1 is positive", it should be x1>0, not x1<0: a value is positive only if it is greater than zero. You also don't need the end; statement, since you're not using a do or select statement.
libname dir 'C:\sasdata';
data dir.exemple;
set exemple;
if x1>0 then x2=1;
else x2=0;
run;
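One thing the log hints at ("Variable x1 is uninitialized") is that the SET statement may not be reading the data set that actually contains x1. A one-level name such as exemple refers to WORK.EXEMPLE, not DIR.EXEMPLE. The following is only a sketch, assuming the source data really lives in the dir library; the output name exemple_flag is a hypothetical name chosen so the input is not overwritten.

```sas
libname dir 'C:\sasdata';

/* exemple_flag is a hypothetical output name, not from the original post */
data dir.exemple_flag;
  set dir.exemple;                 /* two-level name: read from the DIR library */
  if missing(x1) then x2 = .;      /* a missing x1 would otherwise fall into the ELSE branch */
  else if x1 > 0 then x2 = 1;
  else x2 = 0;
run;
```

The explicit missing() check is optional. Without it, a missing x1 compares as less than any number in SAS, so x2 would be set to 0.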
I'm working on homework and I need to take random observations from a data set. I am trying to create the random data set based on my teacher's coding instructions but I keep getting an error saying my random_number variable does not exist. I have posted an image of my code. Do I need to create the variable or make a proc sort for it?
Here are the error logs:
434 proc print data= random_sample (obs=100);
435 var id country year random_number pr cl;
ERROR: Variable RANDOM_NUMBER not found.
436 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
If your goal is to select a random sample, one way to do this is to create a new column holding a random number between 0 and 1, sort by that column, and then select the first N rows.
data want;
set have;
random_number = rand("uniform");
run;
proc sort data=want;
by random_number;
run;
proc print data=want(obs=100);
run;
In case you need something different, please clarify your question.
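As an alternative to the sort-based approach above, PROC SURVEYSELECT can draw a simple random sample directly. This is a minimal sketch, assuming the input data set is called have (the placeholder name used above) and contains at least 100 observations; the seed value is arbitrary and only makes the draw repeatable.

```sas
/* HAVE is the placeholder input data set; RANDOM_SAMPLE matches the name in the question's log */
proc surveyselect data=have out=random_sample
                  method=srs       /* simple random sampling without replacement */
                  sampsize=100     /* number of observations to draw */
                  seed=12345;      /* arbitrary seed, for a repeatable sample */
run;

proc print data=random_sample;
run;
```

This version does not create a random_number column, so a VAR statement that lists random_number would still fail against its output.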
In your photograph of the code, the data step that looks like it is trying to create a list of random numbers will fail because it does not have an END for the DO loop (unless it is at the end of some line that is out of frame).
I am trying to create a variable that will flag (be set to 1) when x hits a certain number, that is, when there is improvement in a process. I am then trying to reset the baseline, so that a new baseline (threshold) has to be hit before it is flagged again. The data set starts off with just one variable (x). I create another one from the first observation called "baseline", so I can compare all other values of x to baseline. Once I hit a threshold, I want to change the baseline to the threshold it just hit.
Here is the relevant part of the code (note that code earlier in the program has already determined baseline).
data combo;
set combo;
if (baseline-x)/8 >1 then do;
flag=1;
baseline=x;
end;
else
flag=0;
run;
Here is the relevant part of the output.
I am expecting flag to be 1 (which it is) for the third observation, because baseline started out at 259 and then moved to 251, as I want it to. But why is flag=1 after that? The threshold is not met. Can anyone help? Thanks, John
I think you need another pair of parentheses in your condition, as shown below.
I ran it here, and afterwards all the flags became zero.
if ((baseline-x)/8) >1 then
do;
flag=1;
baseline=x;
end;
else
flag=0;
run;
The data step is overwriting the original value of BASELINE after it sets the FLAG variable to 1. So we cannot see what value it had when it was read from the COMBO dataset, but we can assume it was more than 8 greater than X, since that is what sends the step down that branch of the IF statement.
You need a separate variable to keep track of the current baseline. You can use RETAIN to do this.
data out;
set combo;
** Keep the value of this for each observation in the data set **;
retain current_baseline;
** Initialize baseline to starting value for data set **;
if _n_ = 1 then current_baseline = baseline;
if (current_baseline - x) / 8 > 1 then do;
flag = 1;
** Update current_baseline to new value since flag has been tripped **;
current_baseline = x;
end;
else flag = 0;
** If you want to store the value of baseline for later viewing, you can **;
baseline = current_baseline;
run;
Note that you really only need the values of x and the initial baseline value to run this. Let's say your initial baseline is x - 8. Then you can simply modify the initialization line to
** Initialize baseline to starting value for data set **;
if _n_ = 1 then current_baseline = x - 8;
Then you can run this with your raw data set with only the values for x.
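To see the RETAIN approach work end to end, here is a small self-contained sketch. The x values are made up for illustration, the first x is assumed to serve as the starting baseline, and the test uses the condition from the question, (baseline - x) / 8 > 1.

```sas
/* Hypothetical input values, chosen only to exercise both branches */
data combo;
  input x;
  datalines;
259
255
250
249
241
240
;
run;

data out;
  set combo;
  retain current_baseline;                   /* value carries over from one observation to the next */
  if _n_ = 1 then current_baseline = x;      /* assumption: first x is the starting baseline */
  if (current_baseline - x) / 8 > 1 then do;
    flag = 1;
    current_baseline = x;                    /* reset the baseline once the threshold is hit */
  end;
  else flag = 0;
run;

/* With the values above, flag is 0, 0, 1, 0, 1, 0 */
proc print data=out;
run;
```

Because current_baseline is never read from the input data set, it is not reset to missing at the top of each iteration, which is what lets the new threshold persist.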
Below is a simple representation of my problem. I do not control the data, nor the format applied (this is a backend service for a Stored Process Web App). My goal is to return the error message generated - which in this case is actually a NOTE.
data _null_;
input x 8.;
cards;
4 4
;
run;
The above generates:
NOTE: Invalid data for x in line 61 1-8.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
61         4 4
x=. _ERROR_=1 _N_=1
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
It's easy enough to capture the error status (if _error_ ne 0 then do) but what I'd like to do is return the value of the NOTE - which handily tells us which column was invalid, along with line and column numbers.
Is this possible without log scanning? I've tried sysmsg() and syswarningtext to no avail.
AFAIK, there is no feature for capturing the NOTEs a data step causes while the data step is running.
Since you are in an STP environment, you might use either:
-altlog at session startup, or
a proc printto log=… wrap of the step,
and then do that scan.
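The PROC PRINTTO route could look roughly like the sketch below. It still scans a log, but only the log of the one wrapped step. The fileref steplog, the data set notes, and the search string are assumptions made for illustration, and a temporary file stands in for whatever log destination suits the stored process.

```sas
filename steplog temp;            /* hypothetical fileref: temporary file to hold the step's log */

proc printto log=steplog new;     /* route the log to the file, replacing any previous contents */
run;

data _null_;
  input x 8.;
  cards;
4 4
;
run;

proc printto;                     /* restore the default log destination */
run;

/* Read the captured log back and keep only the invalid-data notes */
data notes;
  infile steplog truncover;
  input line $char256.;
  if index(line, 'Invalid data for') > 0 then output;
run;
```

The notes data set can then be returned to the caller in whatever form the web app expects.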
I'm trying to write macro code in which I've used several keyword parameters, and I want one of those parameters to be able to take multiple arguments/values.
I want to achieve something like this:
%MACRO TEST(CONDITION=, VVAR=, OUT_VAR=)/PARMBUFF;
%LET CNT = %sysfunc(countw(&syspbuff));
&OUT_VAR = .;
%DO I =1 %TO &CNT;
%IF &CONDITION=Y %THEN %DO;
&OUT_VAR=(SALARY+BUNUS)/COUNT(VALUES PASSED TO VVAR PARAMETER);
%END;
%END;
%MEND;
data person;
input SALARY BONUS COND $;
datalines;
100 50 Y
200 75 Y
300 0 N
;
%TEST(CONDITION=COND,VVAR=SALARY BONUS,OUT_VAR=AVG_SAL);
RUN;
Can anyone suggest how I can achieve that? I tried using the syspbuff option to read in the values for the VVAR parameter, but it contains all the values passed to all the parameters.
Thanks!
You appear to be confused about the timing of when macro code executes and when the code that the macro has generated is executed by SAS. The macro processor does its work first and then passes the code onto SAS to interpret. When SAS sees a complete data or proc step then it runs that step.
Here is my attempt to translate your post into something that could execute. Not sure if it is what you are looking for.
First you have a macro that takes in three parameters. The first is some SAS expression that evaluates to a string. The middle is a variable list. And the last is a single variable name. It uses these parameter values to generate SAS code that could be used inside of a data step to conditionally generate the mean of the variables in the list.
%MACRO TEST(CONDITION=, VVAR=, OUT_VAR=);
IF &CONDITION='Y' THEN DO;
&OUT_VAR=mean(of &vvar);
END;
else &out_var=.;
%MEND;
Now you will need to call that macro as part of a data step so that the code is generated in a place where SAS will understand it. Note that when your data step is using inline data (CARDS/DATALINES), the inline data must be the last thing in the data step.
data person;
input SALARY BONUS COND $;
%TEST(CONDITION=COND,VVAR=SALARY BONUS,OUT_VAR=AVG_SAL);
datalines;
100 50 Y
200 75 Y
300 0 N
;
If you run it with the MPRINT option on you can see the SAS code that the macro has generated. It is just the same as if you had typed that code directly into the data step instead of asking the macro to generate it for you.
615 data person;
616 input SALARY BONUS COND $;
617 %TEST(CONDITION=COND,VVAR=SALARY BONUS,OUT_VAR=AVG_SAL);
MPRINT(TEST): IF COND='Y' THEN DO;
MPRINT(TEST): AVG_SAL=mean(of SALARY BONUS);
MPRINT(TEST): END;
MPRINT(TEST): else AVG_SAL=.;
618 datalines;
NOTE: The data set WORK.PERSON has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.03 seconds
622 ;
If you wanted to generate multiple new variables then just call it multiple times. You could probably make the macro much more complicated by treating each input parameter as a delimited list of values and process the first set and then the second set etc. But why?
If you want to know how many words appear in the value of a macro variable, like the VVAR parameter in the macro above, then you can use the %sysfunc() macro function to call the SAS function countw().
%let cnt=%sysfunc(countw(&vvar));
You could then use the value of &cnt where you need it. Say as the upper bound in a macro %do loop or as an integer constant in an expression in the generated SAS code.
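As a rough illustration of using the count as the upper bound of a macro %do loop, the sketch below walks through the words in VVAR one at a time with %scan(). The macro name list_vars is hypothetical, and the default word delimiters of countw() and %scan() are assumed to be acceptable for a space-separated variable list.

```sas
/* list_vars is a hypothetical macro name used only for this illustration */
%macro list_vars(vvar=);
  %local cnt i;
  %let cnt=%sysfunc(countw(&vvar));     /* number of words in the list */
  %do i=1 %to &cnt;
    /* %scan() extracts the i-th word from the list */
    %put NOTE: variable &i of &cnt is %scan(&vvar, &i);
  %end;
%mend list_vars;

%list_vars(vvar=SALARY BONUS);
```

Since this macro generates only %put statements and no data step code, it can be called in open code rather than inside a data step.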
Is there a way to use by group processing in SAS when the data is grouped together but is out of order?
data sample;
input x;
datalines;
3
3
1
1
2
2
;
run;
Trying to print out the first of each group:
data _null_;
set sample;
by x;
if first.x then do;
put _all_;
end;
run;
Results in the below error:
x=3 FIRST.x=1 LAST.x=0 _ERROR_=0 _N_=1
ERROR: BY variables are not properly sorted on data set WORK.SAMPLE.
x=3 FIRST.x=1 LAST.x=0 _ERROR_=1 _N_=2
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 3 observations read from the data set WORK.SAMPLE.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
And just to reiterate - I do not want to sort the grouped data first - I need to process it in this order. I know I could create a proxy variable to sort on using an intermediary datastep and either a retain statement or the lag() function but I'm really looking for a solution that avoids this step. Also, I'd like to use the first and last keywords in my by-group processing.
Use the NOTSORTED option on your BY statement:
data _null_;
set sample;
by x NOTSORTED;
if first.x then do;
put _all_;
end;
run;
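Since the question also mentions wanting both first. and last., here is a small sketch that uses the two together under NOTSORTED to count the observations in each run of equal values. The output data set name runs and the counter n are hypothetical names.

```sas
data sample;
  input x;
  datalines;
3
3
1
1
2
2
;
run;

data runs;
  set sample;
  by x notsorted;
  if first.x then n = 0;     /* start a new count at the top of each run */
  n + 1;                     /* sum statement: n is retained automatically */
  if last.x then output;     /* one row per run: x and its count */
run;
```

With NOTSORTED, a group is simply a run of consecutive observations with the same BY value, so if the same value of x appears again later in the data it is treated as a separate group.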