I have a dataset of clinical test results. I want to create a "results" variable based on the variable names of the tests that the patients were positive for. This may be taken care of by many if-else statements, but I want to be able to build a character variable that accounts for multiple test results without having to know the various response patterns a priori.
This is an example of the dataset:
ID RSV FLU
1 Positive Negative
2 Negative Positive
3 Negative Negative
4 Positive Positive
5 Negative Negative
This is what I am looking for:
ID RSV FLU Result
1 Positive Negative RSV
2 Negative Positive FLU
3 Negative Negative
4 Positive Positive RSV, FLU
5 Negative Negative
Any help would be appreciated!
I used PROC TRANSPOSE to turn the dataset from wide to long. With this approach you can have as many clinical test columns as needed; just list them in the VAR statement.
/*Input Dataset*/
data have;
input ID RSV$ FLU$;
datalines;
1 Positive Negative
2 Negative Positive
3 Negative Negative
4 Positive Positive
5 Negative Negative
;
run;
proc sort data=have; by id; run;
/*Initial Transpose*/
proc transpose data=have out=want0;
by id;
var rsv flu;
run;
/*Manipulate transposed dataset*/
data want1;
length Result $50.;
set want0;
by id;
retain Result '';
if first.id then Result='';
if first.id and col1='Positive' then Result=_NAME_;
else if not first.id and col1='Positive' then Result=catx(', ',Result,_NAME_);
if last.id;
run;
/*Final outcome*/
proc sql;
create table want
as
select a.*, b.result
from have a
left join want1 b
on a.id=b.id;
quit;
An array and VNAME() are likely good options here. Untested.
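data want;
set have;
array diags(*) RSV FLU; /* list the test variables here */
length diags_combined $256;
do i=1 to dim(diags);
/* CATX skips blank arguments, so no leading comma is produced */
if diags(i) = 'Positive' then diags_combined = catx(', ', diags_combined, vname(diags(i)));
end;
drop i;
run;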
I have a number of text entries (municipalities) from which I need to remove the s at the end.
Data test;
input city $;
datalines;
arjepogs
askers
Londons
;
run;
data cities;
set test;
if prxmatch("/^(.*?)s$/",city)
then city=prxchange("s/^(.*?)s$/$1/",-1,city);
run;
Strangely enough, my s's are only removed from my first entry.
What am I doing wrong?
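Because CITY is read with list input and no explicit length, it is defined as length $8, and shorter values are padded with trailing blanks. The s in Londons is in the 7th position of the string, not the LAST position, so the pattern s$ does not match. Use the TRIM() function to remove the trailing spaces from the value of the variable.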
data have;
input city $20.;
datalines;
arjepogs
Kent
askers
Londons
;
data want;
set have;
length new_city $20 ;
new_city=prxchange("s/^(.*?)s$/$1/",-1,trim(city));
run;
Result
Obs city new_city
1 arjepogs arjepog
2 Kent Kent
3 askers asker
4 Londons London
You could also just change the REGEX to account for the trailing spaces.
new_city=prxchange("s/^(.*?)s\ *$/$1/",-1,city);
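The padding behind this behavior can be seen directly with a small check. This is only a sketch; the variable names stored_len and trimmed_len are made up for illustration.

```sas
data _null_;
  length city $8;
  city = 'Londons';
  stored_len  = lengthc(city); /* 8: storage length, trailing blank included */
  trimmed_len = length(city);  /* 7: position of the last non-blank character */
  put stored_len= trimmed_len=;
run;
```

The regex sees all 8 characters unless the value is trimmed first, which is why the s is not at the end of the string.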
Here is another solution using only SAS string functions and no regex. Note that in this case there is no need to trim the variable:
data cities;
set test;
if substr(city,length(city)) eq "s" then
city=substr(city,1,length(city)-1);
run;
I have a SAS string that always starts with a date. I want to remove the date from the string.
Example of data is below (data does not have bullets, included bullets to increase readability)
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
I want the data to look like this (data does not have bullets, included bullets to increase readability)
test_num15
recom_1_test1
test_0_8_i0|vacc_previous0
Use INDEX to find the position of the first '|' in the string, then SUBSTR to take everything after it; or use a regular expression.
data have;
input x $50.;
x1=substr(x,index(x,'|')+1);
x2=prxchange('s/([^_]+\|)(?=\w+)//',1,x);
cards;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;
run;
This is a great use case for CALL SCAN. If the length of the date is constant (always 10), you don't actually need it: start would then always be 12 and you could skip straight to the SUBSTR, as user667489 noted in the comments. If the length is not constant, CALL SCAN is helpful.
data have;
length textstr $100;
input textstr $;
datalines;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;;;;
run;
data want;
set have;
call scan(textstr,2,start,length,'|');
new_textstr = substr(textstr,start);
run;
It would also let you grab only the second word, if that's useful, by passing length as the third argument to SUBSTR.
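A minimal sketch of that variation, assuming the same '|'-delimited layout. The dataset names have_demo and want_second and the variables len and second_word are hypothetical.

```sas
data have_demo;
  length textstr $100;
  input textstr $;
  datalines;
10/01/2016|test_num15
03/04/2017|test_0_8_i0|vacc_previous0
;
run;

data want_second;
  set have_demo;
  /* position and length of the 2nd '|'-delimited word; both are 0 if there is none */
  call scan(textstr, 2, start, len, '|');
  if start > 0 then second_word = substr(textstr, start, len);
run;
```

For the second row this should keep test_0_8_i0 and leave out the trailing |vacc_previous0.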
Is it possible to use the number in this string:
'xx8xx'
by replacing the number with 8 spaces to get this string:
'xx        xx'
I can identify the number between the xx but the replacement syntax does not work as intended:
PRXCHANGE(s/xx([\d]*)xx/' ' x $1/io, -1, 'xx8xx')
Is there a way to use the number being held in $1 to repeat the space character by that number i.e. something like ' ' x $1?
Any help much appreciated!
Tiaan
Suppose you need to replace the number with three blanks.
data _null_;
x=prxchange('s/(xx)\d+(xx)/$1   $2/', -1, 'xx8xx');
_x=prxchange('s/(?=\w+)(\d+)/   /',1,'xx8xx');
put _all_;
run;
Edit:
I missed important information. Tranwrd and repeat could be used to get it.
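data _null_;
str='xx8xx';
/* extract the digits; \D* (rather than .*) keeps multi-digit numbers intact */
num=prxchange('s/\D*(\d+)\D*/$1/',1,str);
/* REPEAT(' ',n) returns n+1 blanks, so subtract 1 */
x=tranwrd(str, strip(num), repeat(' ',input(num,8.)-1));
put _all_;
run;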
You'll need to extract first, then compile a new regex. This will be expensive since you have to compile once per line.
data have;
input xstr $;
datalines;
xx8xx
xx3xx
xx4xx
;;;;
run;
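data want;
set have;
rx1 = prxparse('/xx(\d+)xx/io'); /* capture the whole number; compiled once */
if prxmatch(rx1,xstr) then do;
num_x = input(prxposn(rx1,1,xstr),8.);
/* REPEAT(" ",n-1) yields n blanks; this regex is compiled once per row */
rx2 = prxparse(cat('s/(xx)\d+(xx)/$1',repeat(" ",num_x-1),'$2/i'));
newstr = prxchange(rx2,-1,xstr);
call prxfree(rx2); /* release the per-row regex */
end;
run;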
I have to merge two datasets in SAS. In one, the key variable is a number stored with a fixed length of 10 (for example), so shorter numbers are padded with a variable number of leading zeros. For example 00000056471.
In the other dataset the number is simply 56471. I want to create another variable in the second dataset that adds the leading zeros, and use that as the key variable for the merge.
How can I do this?
Thanks in advance.
There are three approaches you could use:
data have;
input ID $10. ;
cards;
143134
12
14356677
12f
oh dear
;run;
data want;
set have;
/* if numeric then convert to number and back using the z10. format;
   ?? suppresses the invalid-data notes, and non-numeric IDs become missing */
newID=put(input(ID,?? 10.),z10.);
/* if alphanumeric, right align then replace spaces (will replace ALL spaces) */
newID2=translate(right(ID),'0',' ');
/* probably the best method; assumes length(ID) is less than 10 */
newID3=repeat('0',10-length(ID)-1)||ID;
run;
See the SAS documentation for the Zw.d format.
I currently have a dataset that includes a UPC code. The UPC values range from 3 to 5 digits, so I want to standardize them all to 5 digits.
For example, if the UPC code is 111, I want to make it 00111. How can I do this in SAS?
You're looking for the zw.d format.
data have;
upc=111;
run;
data want;
set have;
upc_char = put(upc,z5.);
run;
If upc is a character variable to start with, you need input along with put.
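A minimal sketch of the character case, assuming the codes contain digits only and are at most 5 characters long. The dataset names have_char and want_char are hypothetical.

```sas
data have_char;
  length upc $5;
  input upc $;
  datalines;
111
4567
12345
;
run;

data want_char;
  set have_char;
  length upc_char $5;
  /* INPUT converts the text to a number, PUT with z5. writes it back with leading zeros */
  upc_char = put(input(upc, 5.), z5.);
run;
```

With this data, 111 should come out as 00111 and 4567 as 04567. Values containing non-digit characters would convert to missing, so they would need separate handling.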