What does "if first.col3" mean? (SAS Enterprise Guide)

Data table2;
set table1;
by col1 col2;
if first.col3;
run;
There is no statement after "if first.col3".

first.<variable-name> is a special variable that is set to 1 when the variable name is listed in a BY statement AND the row is the first row of a new group formed by that BY variable together with the BY variables listed before it.
If you use first.<variable-name> and variable-name is NOT listed in a BY statement, you will get a log line stating
NOTE: first.<variable-name> is uninitialized.
Uninitialized variables are assigned a missing value at the start of the DATA step.
An if <expression> statement WITHOUT a then is called a subsetting-if and program control continues beyond it only when the expression is true.
In your case, if first.col3 has an expression that is never true because col3 is not listed in the BY statement, so control never passes beyond the subsetting if.
A DATA Step without an explicitly coded OUTPUT statement will by default output a row when control reaches the end of the step.
In your case control never reaches the end of the step, so no OUTPUT ever occurs and the resultant data set table2 will have zero rows.
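For contrast, here is a minimal sketch of a subsetting if on a variable that is listed in the BY statement. The data set names table1_demo and table2_demo and the sample values are made up for illustration; the input must already be sorted by col1 col2.

```sas
/* Hypothetical sample data, already sorted by col1 col2 */
data table1_demo;
  input col1 col2 col3;
  datalines;
1 1 10
1 1 20
1 2 30
2 1 40
;
run;

data table2_demo;
  set table1_demo;
  by col1 col2;
  /* first.col2 is 1 on the first row of each col1/col2 group */
  if first.col2;
run;
```

With this sample data, table2_demo should keep the rows where col3 is 10, 30 and 40, because each of those rows starts a new col1/col2 group.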

Is there any function in SAS where we can read the exact value from the variable

Suppose I have a column called ABC, and that variable has data like:
123_112233_66778_1122 or
123_112233_1122_11232 or
1122_112233_66778_123
I want to generate the desired variable in the next column as 1122. I have a long list of values that I need to check against the column called ABC; if an exact match is found, the value should be generated. However, I don't want to match something like 112233, because it is not the value I am looking for.
For example, see the three lines I have given for reference. I am taking only the matching value, which is "1122", from all three lines.
I really have no clue how to overcome this problem. I have tried my hand with wildcards but did not have much success. Any help would be much appreciated.
It is hard to tell from your description, but from the values you show it looks like you want the INDEXW() function. That will let you search a string for matching words, with an option to specify which characters are to be considered as the separators between the words. The result is the location where the word starts within the longer string. When the word is not found the result is zero.
Let's create a simple example to demonstrate.
data have;
input abc $30. ;
cards;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
;
data want;
set have ;
location = indexw(trim(abc),'1122','_');
run;
Note that SAS will consider any value other than zero (or missing) as TRUE so you can just use the INDEXW() function call in a WHERE statement.
data want;
set have;
where indexw(trim(abc),'1122','_');
run;
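If the goal is to create a new column rather than to filter rows, the same test can drive an assignment. This is only a sketch: the data set names have_demo and want_demo, the variable name match, and the extra fourth row (which has no 1122 word) are assumptions added for illustration.

```sas
/* Hypothetical sample data; the last row contains 112233 but no 1122 word */
data have_demo;
  input abc $30.;
  datalines;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
123_112233_66778
;
run;

data want_demo;
  set have_demo;
  length match $4;
  /* INDEXW returns 0 when '1122' is not found as a whole word */
  if indexw(trim(abc), '1122', '_') > 0 then match = '1122';
run;
```

The first three rows should get match = '1122', and the fourth should be left blank, since 112233 is a different word.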

(Julia) Not able to suppress output using semicolon when assigning variable inside an IF statement

Why is adding a semicolon in an IF ELSE statement unable to suppress the output display?
Indeed as the Julia Manual explains:
If an expression is entered into an interactive session with a trailing semicolon, its value is not shown.
However, this statement refers to the whole entered expression. In your case the whole expression includes the if part so you should write:
if condition
...
else
...
end;
(note the semicolon after end)
In particular note, as explained in the Julia Manual, that:
if blocks also return a value, which may seem unintuitive to users coming from many other languages. This value is simply the return value of the last executed statement in the branch that was chosen
Putting ; after end suppresses printing of the value returned by the if block.

when to release the record in the input buffer with the @ in sas?

The following is the simple SAS program:
data mydata;
do group = 'placebo', 'active';
do subj = 1 to 5;
input score @;
output;
end;
end;
datalines;
250 222 230 210 199
166 183 123 129 234
;
I am learning SAS by myself. So I was thinking to make sure what happens here. For my understanding, the first line of the 5 entries belongs to the group placebo and the second line belongs to the group active. At first, the input buffer contains the first line of the 5 numbers, and the do subj=1 to 5 prints them out one by one, until the end of the current data step iteration. Then, the data step continues with the second iteration. Is this understanding correct? Many thanks for your time and attention.
PS. I just want to make sure when to release the current input buffer. After checking online, I found that the purpose of the @ is as follows:
holds an input record for the execution of the next INPUT statement within the same iteration of the DATA step. This line-hold specifier is called trailing @.
So, it means the input buffer is released if one of the following two conditions is met:
(1): A new input statement is met without any @ or @@.
(2): The end of the current data step iteration.
Any comments are greatly appreciated.
I like Tom's answer, but want to expand a bit on the meaning of data step iteration. You wrote:
At first, the input buffer contains the first line of the 5 numbers, and the do subj=1 to 5 prints them out one by one, until the end of the current data step iteration. Then, the data step continues with the second iteration. Is this understanding correct?
The DATA step is an implied iterative loop, from the top (DATA statement) to the bottom (RUN statement typically, in this case I think DATALINES statement). If you want to see what happens on each iteration of the loop, you can write values to the log with the PUT statement. You can also write _N_ to the log, which is a counter for the DATA step iteration number. So you might change your code to:
do group = 'placebo', 'active';
do subj = 1 to 5;
input score @;
put _n_= score= ;
output;
end;
end;
If you do that you should see that all of the data (all 10 values from both rows) are processed on the first iteration of the DATA step. You should only see _n_=1 in the log. As @Tom explained, this is because in the explicit looping you wrote, SAS moves forward to the second line of data when it can't find a sixth value to read on the first line. I think most people would consider the NOTE SAS throws about moving to the next line as a warning or even error.
If you want to have two iterations of the DATA step loop, you could change to something like:
if _n_=1 then group = 'placebo';
else if _n_=2 then group= 'active';
do subj = 1 to 5;
input score @;
put _n_= score= ;
output;
end;
(Not suggesting that two iterations is better, or that the above code is better, point is just to show what data step iteration means).
Your code should work fine, but you should see a note that SAS went to a new line in your LOG.
When GROUP='placebo' the inner loop (DO SUBJ ...) will read 5 numbers and leave the pointer at the end of the first line. Then the outer loop will execute again with GROUP='active'. When it tries to read the SCORE for SUBJ=1 there is nothing left on the first line. So SAS will skip to the next line and read the first SCORE from there. Then the other four values are read from that line.
Finally at the end of the data step it will "release" the line so the pointer will be at the beginning of line three (if there was a line three).
Then the whole data step will loop one more time and set GROUP='placebo' and SUBJ=1, but when it tries to read the SCORE it reads past the end of the file and stops the data step.
Note that your program would work fine as long as you have 10 values spaced over as many lines as you want.
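For comparison with the single trailing @ discussed above, here is a small sketch using the double trailing @@, which holds the line across DATA step iterations instead of only within one. The data set name scores_demo is made up for illustration.

```sas
data scores_demo;
  /* @@ keeps the current line available for the next iteration */
  input score @@;
  /* write the iteration counter and the value just read to the log */
  put _n_= score=;
  datalines;
250 222 230 210 199
166 183 123 129 234
;
run;
```

Here each iteration of the DATA step reads one value, so the log should show _n_ counting from 1 to 10 rather than staying at 1.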

SAS ODS escape character macro variable error

The SAS v9.4 documentation lists an automatic macro variable &sysodsescapechar which contains the current ODS escape character, assigned using ods escapechar=.
Whenever I try to view the macro variable using a %put statement, I get the following error:
ERROR: Open code statement recursion detected.
This happens when open code erroneously causes a macro statement to call another macro statement.
I've tried all of the following:
%put &=sysodsescapechar.;
%put %nrbquote(&sysodsescapechar.);
%put %superq(sysodsescapechar);
They all result in the same error.
When I try to view the macro variable using a data step, it appears to be empty.
data test;
esc = "&sysodsescapechar.";
put esc=;
run;
If the macro variable actually is empty, why do I get open code statement recursion errors? The %put statement on its own is valid, so putting an empty variable shouldn't be an issue.
Any guidance here would be much appreciated.
What's happening is the escape char seems to need a closing parenthesis. For example:
%put %superq(SYSODSESCAPECHAR););
;
It escapes the ) , which means now you have
%put superq(;);
In your first example, it's a little trickier because a semicolon by itself doesn't seem to be escaped, so you have to provide a closing parenthesis:
%put &SYSODSESCAPECHAR.)x;
x
That works, for example. I'm not sure if it's only close paren or other things that would also allow it to stop trying to escape, but that's the only thing I can tell works.
You can look at the actual value of the macro variable in SASHELP.VMACRO; it's not ' ' (which is indeed what gets passed to the data step even with SYMGET, but it's clearly parsed). In that table it is '03'x, which looks like an uppercase L in the upper half of the character. That is the "End of Text" control character. I suspect the behavior in the editor when using this in text (in a macro variable) is simply a behavior of the editor - '03'x is not representable in many editors (if I try to paste it here, for example, it isn't displayed, but does exist as something I can backspace over with zero width). SAS is clearly capable of dealing with a 'normal' ods escapechar but isn't capable of dealing with '03'x in the same fashion.
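One way to inspect the stored value without resolving the macro variable in open code is to read it from SASHELP.VMACRO and print it in hex. This is a sketch, assuming the variable appears in that view under the name SYSODSESCAPECHAR as described above.

```sas
data _null_;
  set sashelp.vmacro;
  where name = 'SYSODSESCAPECHAR';
  /* $hex4. shows the first two bytes of the value as hex digits */
  put name= value= $hex4.;
run;
```

Printing in hex avoids the display problem with non-printable characters such as '03'x.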

SAS - selecting character observations from position 1 to position 2

I am stuck on one particular point. I have a character variable with observations extracted from an rtf document. I need to keep only the observations from obs A to obs B. The FIRSTOBS and OBS options are not helpful here because we do not know the observation numbers beforehand. All we know is the two unique strings. For example, in the dataset I need to create a dataset with observations from obs 11 to 16. This is only part of the dataset; the original has over 1500 observations, which is why we use unique text to capture the range instead of observation numbers.
Thank you all in advance.
You don't explain enough, but odds are you can do something sort of like this if I understand you right (you have a "start" and a "stop" string in the document).
data want;
set have;
retain keep 0;
if strvar = "keepme" then keep=1;
if keep=1;
if strvar = "lastone" then keep=0;
run;
I.e., have some condition set the keep variable to 1, then test for it, then have the off condition after that (assuming you want to keep the off condition row). Use string functions like index or find or scan to search for your particular string if it's not an entire string. You could also use regular expressions if necessary.
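A runnable sketch of the same idea when the markers are only part of the string, using FIND. The sample rows, the marker texts 'Table 1 start' and 'Table 1 end', and the names have_demo, want_demo and in_range are all hypothetical.

```sas
/* Hypothetical sample data standing in for the extracted rtf text */
data have_demo;
  input strvar $40.;
  datalines;
header text
Table 1 start
row a
row b
Table 1 end
footer text
;
run;

data want_demo;
  set have_demo;
  retain in_range 0;
  /* switch on at the start marker */
  if find(strvar, 'Table 1 start') > 0 then in_range = 1;
  /* subsetting if: only rows inside the range continue */
  if in_range = 1;
  /* switch off after the end marker row has passed the test above */
  if find(strvar, 'Table 1 end') > 0 then in_range = 0;
  drop in_range;
run;
```

With this sample data, want_demo should contain the four rows from 'Table 1 start' through 'Table 1 end', including both marker rows.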