Identify variable name based on value - sas

I have a value and need to find which variables in a dataset contain it. All of the variables are character, and the search value is also character.
Example:
Value to be searched: 123456
Output: Var1 and Var3
This is a sample dataset; in reality there are hundreds of variables to be searched.
Thanks in advance

You'd want to use WHICHC in conjunction with an array, and then VNAME to find out the name of the variable.
data have;
input x $ y $ z $;
datalines;
123456 234567 345678
234567 345678 123456
345678 234567 456789
;;;;
run;
data want;
set have;
array vars x y z;
pos = whichc('123456',of vars[*]);
if pos>0 then varname = vname(vars[pos]);
run;
WHICHC returns 0 if the value is not found. You could then run PROC FREQ on varname in this dataset to list every variable that contains the value.
This particular approach only works if the value can appear at most once per row; if it can appear in two variables on the same row, you would have to loop over the array and check each element.
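For the case where the value can appear in more than one variable on the same row, the loop could look roughly like the sketch below. It reuses the have dataset from above; the names want_all and i are illustrative and not part of the original answer.

```sas
data want_all;
  set have;
  /* List the variables to search; with hundreds of character
     variables, _character_ could be used here instead of x y z. */
  array vars[*] x y z;
  length varname $32;
  do i = 1 to dim(vars);
    if vars[i] = '123456' then do;
      varname = vname(vars[i]);
      output;               /* one output row per matching variable */
    end;
  end;
  keep varname;
run;

/* Distinct variable names that contain the value, with counts */
proc freq data=want_all;
  tables varname;
run;
```

Because the array is declared before the LENGTH statement, varname itself is not part of the array even if _character_ is used.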


Why does my regex only change my first entry in SAS?

I have a number of text entries (municipalities) from which I need to remove the s at the end.
Data test;
input city $;
datalines;
arjepogs
askers
Londons
;
run;
data cities;
set test;
if prxmatch("/^(.*?)s$/",city)
then city=prxchange("s/^(.*?)s$/$1/",-1,city);
run;
Strangely enough, my s's are only removed from my first entry.
What am I doing wrong?
You defined CITY with length $8, so shorter values are padded with trailing blanks. The s in Londons is in the 7th position of the string, not the last position, so s$ does not match. Use the TRIM() function to remove the trailing spaces from the value of the variable.
data have;
input city $20.;
datalines;
arjepogs
Kent
askers
Londons
;
data want;
set have;
length new_city $20 ;
new_city=prxchange("s/^(.*?)s$/$1/",-1,trim(city));
run;
Result
Obs city new_city
1 arjepogs arjepog
2 Kent Kent
3 askers asker
4 Londons London
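To see the padding issue in isolation, here is a minimal sketch. The variable names padded and trimmed are illustrative only.

```sas
data _null_;
  length city $8;
  city = 'Londons';                          /* stored as 'Londons ' (one trailing blank) */
  padded  = prxmatch('/s$/', city);          /* no match: the value ends in a blank */
  trimmed = prxmatch('/s$/', trim(city));    /* match: 's' is now the last character */
  put padded= trimmed=;
run;
```

PRXMATCH returns 0 when there is no match and the starting position of the match otherwise, so the two values written to the log should differ.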
You could also just change the REGEX to account for the trailing spaces.
new_city=prxchange("s/^(.*?)s\ *$/$1/",-1,city);
Here is another solution using only SAS string functions and no regex. Note that in this case there is no need to trim the variable:
data cities;
set test;
if substr(city,length(city)) eq "s" then
city=substr(city,1,length(city)-1);
run;

How to create a class variable based on several indicator variables

I have a dataset of clinical test results. I want to create a "results" variable based on the variable names of the tests that the patients were positive for. This may be taken care of by many if-else statements, but I want to be able to build a character variable that accounts for multiple test results without having to know the various response patterns a priori.
This is an example of the dataset:
ID RSV FLU
1 Positive Negative
2 Negative Positive
3 Negative Negative
4 Positive Positive
5 Negative Negative
This is what I am looking for:
ID RSV FLU Result
1 Positive Negative RSV
2 Negative Positive FLU
3 Negative Negative
4 Positive Positive RSV, FLU
5 Negative Negative
Any help would be appreciated!
I used PROC TRANSPOSE to turn the dataset into one row per test per patient. With this approach you can have as many test-result columns as needed.
/*Input Dataset*/
data have;
input ID RSV$ FLU$;
datalines;
1 Positive Negative
2 Negative Positive
3 Negative Negative
4 Positive Positive
5 Negative Negative
;
run;
proc sort data=have; by id; run;
/*Initial Transpose*/
proc transpose data=have out=want0;
by id;
var rsv flu;
run;
/*Manipulate transposed dataset*/
data want1;
length Result $50.;
set want0;
by id;
retain Result '';
if first.id then Result='';
if first.id and col1='Positive' then Result=_NAME_;
else if not first.id and col1='Positive' then Result=catx(', ',Result,_NAME_);
if last.id;
run;
/*Final outcome*/
proc sql;
create table want
as
select a.*, b.result
from have a
left join want1 b
on a.id=b.id;
quit;
An array and VNAME() are likely good options here. Untested.
data want;
set have;
array diags(*) RSV FLU;*list variables here;
length diags_combined $256.;
do i=1 to dim(diags);
if diags(i) = 'Positive' then diags_combined = catx(', ', diags_combined, vname(diags(i)));
end;
run;

Removing Characters from SAS String Starting on Left

I have a SAS string that always starts with a date. I want to remove the date from the string.
Example of data is below (data does not have bullets, included bullets to increase readability)
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
I want the data to look like this (data does not have bullets, included bullets to increase readability)
test_num15
recom_1_test1
test_0_8_i0|vacc_previous0
Use INDEX to find the position of the first '|' in the string, then SUBSTR to take everything after it; or use a regular expression.
data have;
input x $50.;
x1=substr(x,index(x,'|')+1);
x2=prxchange('s/^[^|]*\|//',1,x); /* drop everything up to and including the first '|' */
cards;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;
run;
This is a great use case for CALL SCAN. If the date always has the same length (10 characters), you don't actually need it: start would always be 12, so you could skip straight to the substr, as user667489 noted in the comments. If the length can vary, though, CALL SCAN is helpful.
data have;
length textstr $100;
input textstr $;
datalines;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;;;;
run;
data want;
set have;
call scan(textstr,2,start,length,'|');
new_textstr = substr(textstr,start);
run;
It would also let you grab only the second word, if that's useful, by passing length as the third argument to substr.
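A minimal sketch of that variation, reusing the have dataset above. The names want_second, pos, len and second_word are illustrative and not from the original answer.

```sas
data want_second;
  set have;
  length second_word $100;
  /* pos and len receive the start position and length of the 2nd '|'-delimited word */
  call scan(textstr, 2, pos, len, '|');
  /* CALL SCAN returns a length of 0 when the word does not exist, so guard the SUBSTR */
  if len > 0 then second_word = substr(textstr, pos, len);
  drop pos len;
run;
```

For the third sample row this would keep only test_0_8_i0 rather than everything after the date.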

Usage of single trailing (@) in SAS for delimited data

Can you please tell me if we can use the single trailing (@) for delimited data rather than fixed width?
Thanks,
Nikhila
From the comments it looks like the question is really how to skip columns in delimited data. A simple way is to read the value into a variable that you later drop. Or even read it into a variable that you want and then overwrite it with the value from the column you do want to keep.
data want ;
infile cards dsd truncover ;
length var1 var2 $20;
input 3*var1 var2 ;
cards;
nikhila,26,hyd,btech
akhila,24,blr,btech
nitesh,20,blr,bmm
;
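The first option mentioned above, reading the unwanted columns into a variable that is later dropped, could be sketched as follows. The names want_drop, name, dummy and degree are hypothetical; the data lines are the ones from the example.

```sas
data want_drop;
  infile cards dsd truncover;
  length name dummy degree $20;
  /* columns 2 and 3 are both read into dummy, which is then discarded */
  input name dummy dummy degree;
  drop dummy;
cards;
nikhila,26,hyd,btech
akhila,24,blr,btech
nitesh,20,blr,bmm
;
```

This keeps the first and fourth columns and leaves no trace of the skipped ones in the output dataset.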

SAS DATA step / INPUT statement: reading column-based raw data AND multiple observations from single line?

I’m working with some raw data that has fixed column widths, but has all its records written into a single line (blame the data vendor, not me :-) ). I know how to use fixed column widths in the INPUT statement, and how to use @@ to read more than one observation per line, but I am having trouble when I try to do both.
As an example, here’s some code where the data has fixed column widths, but there is one line per record. This code works fine:
DATA test_1;
INPUT alpha $ 1-5 beta $ 6-10 gamma 11-15 ;
DATALINES;
a    f    1
ab   fg   12
abc  fgh  123
abcd fghi 1234
abcdefghij12345
;
RUN;
Now here’s the code for what I’m really trying to do: all the data is in one line, and I try to use the @@ notation:
DATA test_2;
INPUT alpha $ 1-5 beta $ 6-10 gamma 11-15 @@;
DATALINES;
a    f    1    ab   fg   12   abc  fgh  123  abcd fghi 1234 abcdefghij12345
;
RUN;
This fails because it just keeps reading the first 15 characters, holding that record, and re-reading from the start. Based on my understanding of the semantics of the @@ notation, I can definitely understand why this would be happening.
Is there any way I can accomplish reading fixed column data from a single line; that is, make test_2 have the same content as test_1? Perhaps through some combination of symbols in the INPUT statement, or maybe resorting to another method (with file I/O functions, PROC IMPORT, etc.)?
Have you tried specifying variable lengths using informats?
For example:
DATA test_2;
INPUT alpha $5. beta $5. gamma 5.0 @@;
DATALINES;
a    f    1    ab   fg   12   abc  fgh  123  abcd fghi 1234 abcdefghij12345
;
RUN;
From the SAS documentation:
Formatted input causes the pointer to move like that of column input
to read a variable value. The pointer moves the length that is
specified in the informat and stops at the next column.