Where statement is not capturing my condition correctly - sas

I want SAS to select specific observations of the variable "rashloc_Spcfy" (and others) that match a set of string values ("B", "P/G", "Peri", "Gen"). However, when I look at the results, SAS is also returning observations that my statement does not describe. Is there anything I can do to modify my code?
proc print data=k.dataset;
var rashloc_GNT rashloc_PER rashloc_Spcfy;
where ((rashloc_GNT = "GNT") OR (rashloc_PER = "PER")) OR rashloc_Spcfy in ("B", "P/G", "Peri", "Gen");
run;
I should be getting only the quoted key terms in the variable of interest (rashloc_Spcfy).

So, among the cases that meet the first two criteria, you want to exclude those where the third variable has some other non-missing value?
So perhaps:
where ((rashloc_GNT = "GNT") OR (rashloc_PER = "PER"))
and not (rashloc_Spcfy not in (" ","B","P/G","Peri","Gen"))
;
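To see the difference between the two WHERE expressions, here is a small sketch with invented rows. The data set name rash and its values are hypothetical; only the variable names come from the question. The question's trailing OR lets a row through if either half is true, while the answer's version requires the GNT/PER condition and a blank or listed rashloc_Spcfy:

```sas
/* Hypothetical sample rows: variable names follow the question, values are invented */
data rash;
  infile datalines dsd truncover;
  length rashloc_GNT rashloc_PER $3 rashloc_Spcfy $10;
  input rashloc_GNT $ rashloc_PER $ rashloc_Spcfy $;
  datalines;
GNT,,B
,PER,Arm
GNT,,
,,Peri
,PER,P/G
;

/* Question's logic: the trailing OR admits any row with a listed rashloc_Spcfy,
   and any GNT/PER row whatever rashloc_Spcfy holds */
data keep_or;
  set rash;
  where ((rashloc_GNT = "GNT") or (rashloc_PER = "PER"))
        or rashloc_Spcfy in ("B", "P/G", "Peri", "Gen");
run;

/* Answer's logic: GNT/PER rows only, and rashloc_Spcfy must be blank or listed */
data keep_and;
  set rash;
  where ((rashloc_GNT = "GNT") or (rashloc_PER = "PER"))
        and not (rashloc_Spcfy not in (" ", "B", "P/G", "Peri", "Gen"));
run;

proc print data=keep_and;
run;
```

With these rows, keep_or keeps all five observations (the ,,Peri row qualifies through the OR alone, and the ,PER,Arm row through the PER test), while keep_and keeps only the first, third and fifth.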

Related

Truncation when using CASE in SQL statement in SAS (Enterprise Guide)

I am trying to manipulate some text files in SAS Enterprise Guide by loading them line by line into a character variable "text", which has a length of 1677 characters.
I can use the Tranwrd() function on this variable to create a new variable text21 and get the desired result, as shown below.
But if I put a condition on exactly the same Tranwrd() call to create the variable text2 (as shown below), it goes wrong: the text in the variable is truncated to around 200 characters, even though text2 has a length of 1800 characters:
PROC SQL;
CREATE TABLE WORK.Area_Z_Added AS
SELECT t1.Area,
t1.pedArea,
t1.Text,
/* text21 */
( tranwrd(t1.Text,'zOffset="0"',compress('zOffset="'||put(t2.Z,8.2)||'"'))) LENGTH=1800 AS text21,
/* text2 */
(case when t1.type='Area' then
tranwrd(t1.Text,'zOffset="0"',compress('zOffset="'||put(t2.Z,8.2)||'"'))
else
t1.Text
end) LENGTH=1800 AS text2,
t1.Type,
t1.id,
t1.x,
t1.y,
t2.Z
FROM WORK.VISSIM_IND t1
LEFT JOIN WORK.AREA_Z t2 ON (t1.Type = t2.Type) AND (t1.Area = t2.Area)
ORDER BY t1.id;
QUIT;
Anybody got a clue?
This is a known problem with using character functions inside a CASE expression in PROC SQL. See this thread on SAS Communities https://communities.sas.com/t5/SAS-Programming/Truncation-when-using-CASE-in-SQL-statement/m-p/852137#M336855
Instead, reuse the result that has already been computed for the other column by referencing it with the CALCULATED keyword.
PROC SQL;
CREATE TABLE WORK.Area_Z_Added AS
SELECT
t1.Area
,t1.pedArea
,t1.Text
,(tranwrd(t1.Text,'zOffset="0"',cats('zOffset="',put(t2.Z,8.2),'"')))
AS text21 length=1800
,(case when t1.type='Area'
then calculated text21
else t1.Text
end) AS text2 LENGTH=1800
,t1.Type
,t1.id
,t1.x
,t1.y
,t2.Z
FROM WORK.VISSIM_IND t1
LEFT JOIN WORK.AREA_Z t2
ON (t1.Type = t2.Type)
AND (t1.Area = t2.Area)
ORDER BY t1.id
;
QUIT;
If you don't need the extra TEXT21 variable then use the DROP= dataset option to remove it.
CREATE TABLE WORK.Area_Z_Added(drop=text21) AS ....
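Here is a self-contained sketch of the CALCULATED approach. It assumes tiny hypothetical versions of WORK.VISSIM_IND and WORK.AREA_Z that keep only the columns the join and replacement need. The Text values are invented and deliberately longer than 200 characters. It does not reproduce the original truncation. It only runs the pattern end to end so that the result can be checked:

```sas
/* Hypothetical stand-in for WORK.VISSIM_IND: one 'Area' row, one other row */
data vissim_ind;
  length id 8 Type $10 Area 8 Text $1677;
  do id = 1 to 2;
    Type = ifc(id = 1, 'Area', 'Link');
    Area = 1;
    Text = repeat('x', 299);            /* REPEAT returns 300 x's */
    Text = cats(Text, 'zOffset="0"');   /* put the target string at the end */
    output;
  end;
run;

/* Hypothetical stand-in for WORK.AREA_Z: a Z value for the 'Area' row only */
data area_z;
  length Type $10;
  Type = 'Area';
  Area = 1;
  Z = 1.5;
run;

proc sql;
  create table area_z_added(drop=text21) as
  select t1.id
        ,t1.Type
        ,t1.Text
        /* compute the replacement once, outside any CASE expression */
        ,tranwrd(t1.Text, 'zOffset="0"', cats('zOffset="', put(t2.Z, 8.2), '"'))
           as text21 length=1800
        /* the CASE only chooses between two existing values */
        ,case when t1.Type = 'Area' then calculated text21
              else t1.Text
         end as text2 length=1800
  from vissim_ind t1
  left join area_z t2
    on t1.Type = t2.Type and t1.Area = t2.Area
  order by t1.id;
quit;
```

For the 'Area' row, text2 should end with zOffset="1.50" and keep its full length of about 314 characters. The other row keeps its original Text.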

Delete all observations starting with a list of values from database (SAS)

I am trying to find an efficient way to do this:
I want to delete all observations whose character variable STARTS with any of several strings, such as:
"Subtotal" "Including:"
So if the value starts with any of these strings (or many others that I didn't list here), I want to delete the observation from the dataset.
The best solution would be a macro variable containing all the values, but I don't know how to handle it (%let list = Subtotal Including: makes SAS treat them as variable names, when they are values).
I did this :
data a ; set b ;
if findw(product,"Subtotal") then delete ;
if findw(product,"Including:") then delete;
...
...
Would appreciate any suggestions! Thanks
First figure out what SAS code you want. Then you can begin to worry about how to use macro logic or macro variables.
Do you just want to exclude the strings that start with the values?
data want ;
set have ;
where product not in: ("Subtotal" "Including");
run;
Or do you want to subset based on the first "word" in the string variable?
where scan(product,1) not in ("Subtotal" "Including");
Or perhaps case insensitive?
where lowcase(scan(product,1)) not in ("subtotal" "including");
Now if the list of values is small enough (less than 64K bytes) then you could put the list into a macro variable.
%let list="Subtotal" "Including";
And then later use the macro variable to generate the WHERE statement.
where product not in: (&list);
You could even generate the macro variable from a dataset of prefix values.
proc sql noprint;
select quote(trim(prefix)) into :list separated by ' '
from prefixes
;
quit;
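Putting these pieces together, here is a runnable sketch. The have and prefixes data sets and their values are hypothetical, invented for illustration. The prefix table feeds the macro variable, and the WHERE statement uses the IN: prefix comparison from the first suggestion:

```sas
/* Hypothetical data: PRODUCT values are invented for illustration */
data have;
  infile datalines truncover;
  input product $40.;
  datalines;
Subtotal East
Including: tax
Widget A
Gadget B
;

/* Hypothetical table of prefixes to exclude */
data prefixes;
  length prefix $20;
  input prefix $;
  datalines;
Subtotal
Including
;

/* Build the list "Subtotal" "Including" in the macro variable LIST */
proc sql noprint;
  select quote(trim(prefix)) into :list separated by ' '
  from prefixes;
quit;

%put NOTE: list=&list;

/* IN: compares prefixes, so any PRODUCT starting with a listed value is dropped */
data want;
  set have;
  where product not in: (&list);
run;
```

Only the Widget A and Gadget B rows should remain in want. Adding a prefix then only means adding a row to prefixes.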

Multiple To clauses in Data step

I have a data step in which a few columns need to be tied to one other column.
I have tried using multiple "from" and "to" statements and a couple of other permutations of that, but nothing seems to do the trick. The code looks something like this:
data analyze;
set css_email_analysis;
from = bill_account_number;
to = customer_number;
output;
from = bill_account_number;
to = email_addr;
output;
from = bill_account_number;
to = e_customer_nm;
output;
run;
I would like to see two columns: bill accounts in the "from" column and the other values in the "to" column. Instead I get a bill account and its customer number, with some "."s (missing values) for the other values.
Issue
This is most likely because SAS has two data types (numeric and character), and the type and length of to are fixed the first time it is set up, when it receives the value of customer_number. In your second to assignment you try to give to the value of email_addr. Assuming email_addr is a character variable, two things can happen:
customer_number is numeric: to has already been set up as numeric, and SAS cannot turn it into a character variable. It tries to convert the email address to a number, fails, and sets to to missing. A note like this appears in the log:
NOTE: Invalid numeric data, 'me#mywebsite.com' , at line 15 column 8.
to=. _ERROR_=1 _N_=1
customer_number is character: to has been set up as character with the length of customer_number. Without an explicit length, if that length is shorter than the value of email_addr, the email address is truncated. SAS does not show an error when this happens:
Code:
data _NULL_;
to = 'hiya';
to = 'me#mydomain.com';
put to=;
run;
Log:
to=me#m
to is set with a length of 4, and SAS does not expand it to fit the new data.
Detail
The thing to bear in mind here is how SAS works behind the scenes.
The DATA statement sets up an output data set.
The SET statement adds the variables of the specified data set to an area of memory called the program data vector (PDV), inheriting their lengths and data types. It then reads each observation's values into the PDV, starting with the first.
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm
===================================================================
010101 | 758|me#my.com |John Smith
The first to = ... assignment adds another variable, to, inheriting the characteristics (type and, if character, length) of customer_number
PDV:
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |758
(to is either character with length 3, or numeric)
Later to = ... assignments do not change the type or length of to, and SAS continues processing
PDV (if customer_number is character = TRUNCATION):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |me#
PDV (if customer_number is numeric = DATA ERROR, to set to missing):
bill_account_number|customer_number|email_addr|e_customer_nm|to
===================================================================
010101 | 758|me#my.com |John Smith |.
Resolution
To resolve this issue it's probably easiest to set the length and type of to before your first to statement:
data analyze;
set css_email_analysis;
from = bill_account_number;
length to $200;
to = customer_number;
output;
...
You may get messages like this, where SAS has converted data on your behalf:
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
27:8
N.B. It's not necessary to explicitly define the length and type of from because, as far as I can see, you only ever get the values for this variable from one variable in the source dataset. You could also achieve this with a rename if you don't need to keep the bill_account_number variable:
rename bill_account_number = from;
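Putting the resolution together, here is a runnable sketch. It uses a small hypothetical css_email_analysis: the values are invented, and customer_number is assumed to be numeric. One change from the answer above is a swap-in: CATS turns customer_number into text explicitly, which avoids the automatic-conversion note. The plain to = customer_number also works, but it produces the note.

```sas
/* Hypothetical input: column names from the question, values invented */
data css_email_analysis;
  infile datalines dsd;
  length bill_account_number $6 email_addr $40 e_customer_nm $30;
  input bill_account_number $ customer_number email_addr $ e_customer_nm $;
  datalines;
010101,758,me#my.com,John Smith
020202,912,you#your.com,Jane Doe
;

data analyze;
  set css_email_analysis;
  length to $200;                 /* fix type and length of TO before its first use */
  from = bill_account_number;     /* FROM simply inherits $6 */
  to = cats(customer_number);     /* numeric -> text without a conversion note */
  output;
  to = email_addr;
  output;
  to = e_customer_nm;
  output;
  keep from to;
run;

proc print data=analyze noobs;
run;
```

Each input row becomes three output rows. For example, 010101 is paired with 758, me#my.com and John Smith, and none of these values is truncated or set to missing.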

Stata: combine foreach with by

My data has some missing values for the variable issue. I'm trying to impute the most recent past issue value (for that subject, identified by id1 and id2), if any. If all past issue values are missing, I want the code to leave the current value as missing.
I tried the below code, but Stata says foreach can't be combined with by.
bys id1 id2 (date): foreach v in 1(1)_n {
replace issue[n] = issue[n-v] if !missing(issue[n-v]) and missing(issue[n])==1
}
Is there a way to do this without using foreach with by?
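The attempted loop over observations is quite unnecessary, as Stata does that anyway.
If you want to use only the most recent non-missing value, it is likely that you want this:
clonevar clone = issue
bys id1 id2 (date): replace issue = issue[_n-1] if missing(issue)
The clonevar line just keeps a copy of the original values. Because replace works through the observations in order, a filled-in value is carried forward into any run of missing values that follows.
Note the following bugs in your code, apart from the one you flag: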
foreach v in 1(1)_n: foreach won't expand a numlist with in; nor will it evaluate _n for you.
replace issue[n]: subscripts are not allowed in that position; replace issue means the same thing any way.
issue[n-v]: you'd need a local macro reference there, as in issue[_n-`v'].
and is not a keyword: you need & if you want a logical "and".
n presumably is a typo for _n
See also this FAQ on replacing missing values

a question on the default initialized value for a SAS variable

I have two questions about the following SAS code:
%let crsnum=3;
data revenue;
set sasuser.all end=final;
where course_number=&crsnum;
total+1;
if paid='Y' then paidup+1;
if final then do;
call symput('numpaid',paidup);
call symput('numstu',total);
call symput('crsname',course_title);
end;
run;
proc print data=revenue noobs;
var student_name student_company paid;
title "Fee Status for &crsname (#&crsnum)";
footnote "Note: &numpaid Paid out of &numstu Students";
run;
First question: the code has
if paid='Y' then paidup+1;
"paidup" must be a variable here.
It seems to me that SAS sets the default initial value of "paidup" to 0. Is that true?
Second question, in the code segment of
title "Fee Status for &crsname (#&crsnum)";
How does #&crsnum work? In other words, what does the # do here?
First question: yes, that's what SAS has done. Because paidup+1 is a sum statement, SAS initializes the variable to 0 and automatically retains its value across iterations of the DATA step (unless the variable paidup already exists in the input data, in your case sasuser.all).
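Here is a minimal sketch of that behaviour. It uses a hypothetical three-row stand-in for sasuser.all that keeps only the paid column. The manual variable is added for contrast: it is a plain assignment, so it is reset to missing on every iteration and never counts anything. CALL SYMPUTX is swapped in for CALL SYMPUT so that the numbers do not carry leading blanks into the macro variables.

```sas
/* Hypothetical stand-in for sasuser.all: only the PAID column */
data students;
  input paid $1.;
  datalines;
Y
N
Y
;

data counts;
  set students end=final;
  total + 1;                       /* sum statement: starts at 0, retained */
  if paid = 'Y' then paidup + 1;   /* also starts at 0 and is retained */
  manual = manual + 1;             /* plain assignment: missing each iteration, stays missing */
  if final then do;
    call symputx('numpaid', paidup);   /* SYMPUTX strips leading blanks */
    call symputx('numstu', total);
  end;
run;

%put NOTE: &numpaid paid out of &numstu students;
```

The %PUT should write "NOTE: 2 paid out of 3 students", and manual is missing in every row. The log also reports that missing values were generated.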
Second question: in the code you've posted, there is nothing special about the #: it will appear as a literal before the resolved value of &crsnum in the title. So if &crsname is Blah and &crsnum is 3, the title will read
Fee Status for Blah (#3)
The # can, however, affect titles when a BY group is in play and it is included in the title in a particular way (for example #BYVAL or #BYVAR). See the SAS documentation for the TITLE statement, under the heading 'Inserting BY-Group Information into a Title'.
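For contrast, here is a minimal sketch of the BY-group case. It uses the SASHELP.CLASS sample table shipped with SAS rather than the course data. With OPTIONS NOBYLINE, #BYVAL(sex) in the title is replaced by each BY group's value, so each group gets its own title, such as "Students with Sex = F":

```sas
/* Uses the SASHELP.CLASS sample table shipped with SAS */
proc sort data=sashelp.class out=class;
  by sex;
run;

options nobyline;                            /* suppress the default BY line */
proc print data=class noobs;
  by sex;
  title "Students with Sex = #byval(sex)";   /* #BYVAL(var) is replaced per BY group */
run;
options byline;                              /* restore the default */
title;                                       /* clear the title */
```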