In SAS, how do I use a CONTAINS (or alternative) operator when there is more than one set of letters to match? For example, where have_variable = abd, afg, afd, acc and want_variable = abd, afg, afd (values containing ab or af only).
I've split your have and want lists into two tables with one record per value, then left-joined on the have list to find the matching ones.
The final MATCH table flags each have value as match or no-match:
/* Create your input String */
data Have;
have="abd , afg , afd , acc";
run;
data Want ;
want="abd , afg , afd";
run;
/* Split the input strings into multiple rows, one value per row */
data Have_List;
set Have;
do i=1 by 0;
source=lowcase(strip(scan(have,i,',')));
if missing(source) then leave;
output;
i+1;
end;
keep source ;
run;
data Want_List;
set Want;
do i=1 by 0;
lookup=lowcase(strip(scan(want,i,',')));
if missing(lookup) then leave;
match='match';
output;
i+1;
end;
keep lookup match;
run;
/* Create a SQL left join to lookup the matching values */
proc sql;
create table match as
select h.source as have , COALESCE(w.match,"no-match") as match
from have_list h left join want_list w on h.source=w.lookup;
quit;
You can use a list of values with the IN operator in your WHERE clause, like this:
proc sql;
select * from my_table where have_variable in ('abd','afg','afd','acc') and want_variable in ('abd','afg','afd');
quit;
You can also use the IN operator in a DATA step, like this:
data want;
set my_table;
if have_variable in ('abd','afg','afd','acc') and
want_variable in ('abd','afg','afd');
run;
If you want only the values that contain ab or af, you can use LIKE:
proc sql;
select * from my_table where have_variable like '%ab%' or have_variable like '%af%';
quit;
In a DATA step:
data want;
set my_table;
where have_variable like '%ab%' or
have_variable like '%af%';
run;
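When the list of substrings grows, chaining LIKE conditions with OR gets long. One alternative is a single regular expression with PRXMATCH, where | separates the alternatives. This is a minimal sketch that assumes a variable named have_var (as in the other answers). The table name contains_ab_af is only illustrative.

```sas
/* Sample data: the same values as in the question */
data have;
  input have_var $;
  datalines;
abd
afg
afd
acc
;
run;

proc sql;
  /* PRXMATCH returns the position of the first match, or 0 if none.
     The pattern /ab|af/ matches ab or af anywhere in the value.
     Write /ab|af/i instead to ignore case. */
  create table contains_ab_af as
  select have_var
  from have
  where prxmatch('/ab|af/', have_var) > 0;
quit;
```

The same prxmatch() condition also works in a DATA step WHERE or IF statement.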
If you only want records that begin with ab or af (rather than contain them anywhere in the string), you can use IN followed by a colon (IN:). With this usage, the colon tells SAS to compare only the first n characters of the string, where n is the length of the comparison value (2 in your example).
Note that the colon modifier works in a DATA step but not in PROC SQL.
data have;
input have_var $;
datalines;
abd
afg
afd
acc
;
run;
data _null_;
set have;
where have_var in: ('ab','af');
put _all_;
run;
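PROC SQL has no colon modifier, but it does have truncated comparison operators such as EQT. EQT truncates the longer of the two strings to the length of the shorter one before comparing them. The sketch below uses it for the same prefix match. The input is rebuilt so that the block runs on its own, and starts_ab_af is only an illustrative table name.

```sas
data have;
  input have_var $;
  datalines;
abd
afg
afd
acc
;
run;

proc sql;
  /* have_var eqt 'ab' compares only the first 2 characters of have_var.
     substr(have_var, 1, 2) in ('ab', 'af') would be an equivalent test. */
  create table starts_ab_af as
  select have_var
  from have
  where have_var eqt 'ab'
     or have_var eqt 'af';
quit;
```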
Related
I am working with a huge dataset in SAS and trying to use PROC SQL, and I need help setting up a LIKE condition. I'm trying to extract all the columns that have 'eco' in their name.
I'm getting an error in the where statement as it is not registering the second *.
Any help?
proc sql
select *
from cfy19e8
where * LIKE %eco%;
If you mean values rather than column names, you could concatenate all of your columns with catx() and keep any row where the result contains eco.
data have;
input col1$ col2$ col3$;
datalines;
sadfeco kdoa wrfs
asdf asdf sadf
mfecosa mawoeco mfzeco
;
run;
data want;
set have;
where catx('|', col1, col2, col3) LIKE '%eco%';
run;
If you have a lot of character columns, you could use the shortcut _CHARACTER_ to concatenate all variables, then use find() within an if statement in a data step.
data want;
set have;
if(find(catx('|', of _CHARACTER_), 'eco') );
run;
If you want the column names themselves, perhaps:
proc contents noprint data=cfy19e8 out=eco_columns(where=(upcase(name) like '%ECO%'));
run;
title 'Columns with ECO in their name';
proc print data=eco_columns;
var name;
run;
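You may want to select the matching columns rather than just list them. One way is to collect their names from DICTIONARY.COLUMNS into a macro variable and use it as the SELECT list. This sketch assumes CFY19E8 is in the WORK library, and the sample columns are hypothetical. If no column matches, &eco_cols is never created, so the second query would fail.

```sas
/* Hypothetical stand-in for CFY19E8 */
data cfy19e8;
  input id econ_growth socio_eco misc;
  datalines;
1 2.5 3 4
2 1.1 5 6
;
run;

/* Build a comma-separated list of column names containing ECO.
   The single quotes keep %ECO% from being read as a macro call. */
proc sql noprint;
  select name into :eco_cols separated by ', '
  from dictionary.columns
  where libname = 'WORK'
    and memname = 'CFY19E8'
    and upcase(name) like '%ECO%';
quit;

%put NOTE: &=eco_cols;

proc sql;
  create table eco_only as
  select &eco_cols
  from cfy19e8;
quit;
```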
I am starting out with SAS and PROC SQL, and I want to do a very simple task as described below but am having trouble.
data test;
input x_ray y_chromosome z_sword;
cards;
1 2 3
4 5 6
4 7 8
7 8 9
7 9 10
7 10 11
;
In the table test we have three variables, x_ray, y_chromosome, and z_sword.
My goal is to make another table, let's say, result, that looks like this.
data result;
input name $;
cards;
x_ray
y_chromosome
z_sword
;
I looked up ways to do this online, but all the methods are very well beyond my understanding and I simply do not believe that such a simple process has to be so complicated.
May I have some help, please?
Use PROC CONTENTS.
proc contents data=test noprint out=result;
run;
If you only want the variable names and not the other information you can use dataset option to limit the variables kept in the RESULT dataset.
proc contents data=test noprint out=result(keep=name) ;
run;
I cannot find a shorter or simpler answer than @Tom's, so here are just two other ways for reference.
Use a dictionary table:
proc sql;
create table result as select name from sashelp.vcolumn where libname='WORK' and memname='TEST';
quit;
Or query them with the DATA step I/O functions (OPEN, ATTRN, VARNAME):
data result;
length name $32;
keep name;
dsid = open('work.test');
if dsid then do;
do i = 1 to attrn(dsid,'nvars');
name = varname(dsid,i);
output;
end;
dsid = close(dsid);
end;
run;
I want an answer for this.
The input I have is:
ABC123
The output I want is:
123ABC
How to print the output in this format (i.e. backwards) using Proc SQL?
Based on the information given, and assuming that all your data is in the same format, you can use the substr function in PROC SQL:
data have;
value='ABC123';
run;
proc sql;
create table want
as
select value,
substr(value,4)||substr(value,1,3) as new_value
from have;
quit;
proc print data=want; run;
The same functions can be applied in a DATA step as well.
You will probably want to use trim() to deal with the trailing spaces that SAS stores in character variables:
trim(substr(value,4))||substr(value,1,3)
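Here is a DATA step sketch that does not depend on the letters always being three characters long. ANYDIGIT finds where the digits start. It assumes each value is letters followed by digits, as in the question. The extra sample values are made up.

```sas
data have;
  length value $20;
  input value $;
  datalines;
ABC123
XY7
HELLO2024
;
run;

data want;
  set have;
  length new_value $20;
  p = anydigit(value);  /* position of the first digit, 0 if there is none */
  if p > 1 then new_value = cats(substr(value, p), substr(value, 1, p - 1));
  else new_value = value;  /* no digits, or starts with a digit: keep as-is */
  drop p;
run;
```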
If you want an algorithm that would work with similar strings of any length (any number of letters followed by any number of digits), I suggest using regular expressions to modify the input string.
outStr = prxChange("s/([A-Za-z]+)(\d+)/$2$1/", 1, inStr);
You can easily use it within proc sql.
data test1;
inStr = "ABCdef12345";
run;
proc sql;
create table test2 as
select prxChange("s/([A-Za-z]+)(\d+)/$2$1/", 1, inStr) as outStr
from test1;
quit;
Base SAS also has a REVERSE function dedicated to reversing a string, and it can be used both in PROC SQL and in a DATA step. Note that it reverses every character, so ABC123 would become 321CBA rather than 123ABC. See the example in the SAS documentation or here:
proc sql;
select Name,
reverse(Name) as Name_reversed
from sashelp.class
;
quit;
output:
Name | Name_reversed
--------|--------------
Alfred | derflA
Alice | ecilA
Barbara | arabraB
etc.
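REVERSE keeps trailing blanks, and after reversal they become leading blanks. Name in SASHELP.CLASS is stored with length 8, so short names are padded. A small sketch that trims before reversing:

```sas
proc sql;
  /* trim() removes Name's trailing blanks first, so reverse()
     does not turn them into leading blanks */
  create table class_rev as
  select Name,
         reverse(trim(Name)) as Name_reversed length=8
  from sashelp.class;
quit;
```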
First I created a dataset 'have' and sorted it.
Then I created a dataset 'havenot'. Now I need to subtract the two datasets ('have' minus 'havenot').
data have;
input party_ID Preference_ID:$11.;
datalines;
101 Preference1
101 Preference2
102 Preference4
102 Preference1
102 Preference5
;
proc sort data = have;
by party_ID Preference_ID;
run;
data havenot;
set have;
by party_ID Preference_ID;
if first.party_id;
run;
(output of havenot)
party_ID Preference_ID
101 Preference1
102 Preference1
Desired output:
party_ID Preference_ID
101 Preference2
102 Preference4
102 Preference5
Are you asking how to remove the first record per PARTY_ID?
You could just reverse the logic in your subsetting IF statement.
data want;
set have;
by party_id;
if not first.party_id;
run;
Another way is to explicitly delete the first observation of each PARTY_ID:
if first.party_id then delete;
If you are asking how to remove exact row matches then PROC SQL can do that.
proc sql ;
create table want as
select * from have
except
select * from havenot
;
quit;
If you want to remove rows based only on key matches, a DATA step merge might be better:
data want ;
merge have havenot(in=in2 keep=party_id preference_id);
by party_id preference_id;
if not in2;
run;
Basically, using if not first.variable will give you the dataset you want:
data other;
set have;
by party_ID Preference_ID;
if not first.party_id;
run;
The easiest option is to use a data step:
data output;
merge have(in=i1) havenot(in=i2);
by party_ID Preference_ID;
if not i2;
run;
If you want to use proc sql, you could do the following:
proc sql noprint;
create table output as
select a.*
from have as a
left join havenot as b
on a.party_ID eq b.party_ID and a.Preference_ID eq b.Preference_ID
where b.party_ID is missing;
quit;
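Another PROC SQL option is an anti-join with NOT EXISTS. It keeps each HAVE row that has no matching key in HAVENOT. This sketch assumes the match is on both PARTY_ID and PREFERENCE_ID, and it rebuilds both tables so the block runs on its own.

```sas
data have;
  input party_ID Preference_ID :$11.;
  datalines;
101 Preference1
101 Preference2
102 Preference4
102 Preference1
102 Preference5
;
run;

data havenot;
  input party_ID Preference_ID :$11.;
  datalines;
101 Preference1
102 Preference1
;
run;

proc sql;
  /* Keep rows of HAVE for which no row of HAVENOT has the same key */
  create table want as
  select h.*
  from have as h
  where not exists
        (select 1
         from havenot as n
         where n.party_ID = h.party_ID
           and n.Preference_ID = h.Preference_ID)
  order by h.party_ID, h.Preference_ID;
quit;
```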
What I want to do: I need to create a new variable for each value label of a variable and do some recoding. I have all the value labels output from an SPSS file (see sample).
Sample:
proc format library = library;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
value ... (many more with different amount of levels)
The new variable name would be the actual one without F and with underscore+level (example: FUMERT1F level 0 would become FUMERT1_0).
After that I need to recode the variables following this pattern:
data ds; set ds;
FUMERT1_0=0;
if FUMERT1=0 then FUMERT1_0=1;
FUMERT1_1=0;
if FUMERT1=1 then FUMERT1_1=1;
FUMERT1_2=0;
if FUMERT1=2 then FUMERT1_2=1;
FUMERT1_3=0;
if FUMERT1=3 then FUMERT1_3=1;
run;
Any help will be appreciated :)
EDIT: Both of Joe's answers and the one from data_null_ work, but Stack Overflow won't let me accept more than one answer.
Update: this version adds an underscore (_) to the end of each name. PROC TRANSREG does not seem to have an option to put an underscore between the variable name and the value of the class variable, so we can just do a temporary rename. The code builds name=newname pairs that rename each class variable to end in an underscore, plus pairs that rename them back. It uses the CAT functions and PROC SQL INTO to store the pairs in macro variables.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
%let class=sex fumert1;
proc transpose data=have(obs=0) out=vnames;
var &class;
run;
proc print;
run;
proc sql noprint;
select catx('=',_name_,cats(_name_,'_')), catx('=',cats(_name_,'_'),_name_), cats(_name_,'_')
into :rename1 separated by ' ', :rename2 separated by ' ', :class2 separated by ' '
from vnames;
quit;
%put NOTE: &=rename1;
%put NOTE: &=rename2;
%put NOTE: &=class2;
proc transreg data=have(rename=(&rename1));
model class(&class2 / zero=none);
id caseid;
output out=design(drop=_: inter: rename=(&rename2)) design;
run;
%put NOTE: _TRGIND(&_trgindn)=&_trgind;
First try:
Looking at the code you supplied and the output from Joe's answer, I don't really understand the need for the formats. It looks to me like you just want to create dummies for a list of class variables. That can be done with TRANSREG.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
proc transreg data=have;
model class(sex fumert1 / zero=none);
id caseid;
output out=design(drop=_: inter:) design;
run;
proc contents;
run;
proc print data=design(obs=40);
run;
One good alternative to your code is to use proc transpose. It won't get you 0's in the non-1 cells, but those are easy enough to get. It does have the disadvantage that it makes it harder to get your variables in a particular order.
Basically, transpose once to vertical, then transpose back using the old variable name concatenated to the variable value as the new variable name. Hat tip to data_null_ for showing this feature in a recent SAS-L post. If your version of SAS doesn't support concatenation in PROC TRANSPOSE, do it in the data step beforehand.
I show using PROC EXPAND to then set the missings to 0, but you can do this in a data step as well if you don't have ETS or if PROC EXPAND is too slow. There are other ways to do this - including setting up the dataset with 0s pre-proc-transpose - and if you have a complicated scenario where that would be needed, this might make a good separate question.
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
proc transpose data=have out=want_pre;
by caseID;
var fumert1 sex;
copy fumert1 sex;
run;
data want_pre_t;
set want_pre;
x=1; *dummy variable;
run;
proc transpose data=want_pre_t out=want delim=_;
by caseID;
var x;
id _name_ col1;
copy fumert1 sex;
run;
proc expand data=want out=want_e method=none;
convert _numeric_ /transformin=(setmiss 0);
run;
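Without SAS/ETS, you can replace the PROC EXPAND step with a DATA step that loops over an array of every numeric variable. In this minimal sketch a tiny hypothetical WANT table stands in for the transposed one, so the block runs alone. Note that it sets every missing numeric value to 0, including any you might prefer to leave missing.

```sas
/* Hypothetical stand-in for the transposed WANT table */
data want;
  input caseID fumert1_0 fumert1_1 sex_1 sex_2;
  datalines;
1 1 . 1 .
2 . 1 . 1
;
run;

data want_e;
  set want;
  array nums {*} _numeric_;  /* all numeric variables read from WANT */
  do _i = 1 to dim(nums);    /* _i is defined after the array, so it is not in it */
    if missing(nums{_i}) then nums{_i} = 0;
  end;
  drop _i;
run;
```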
For this method, you need to use two concepts: the cntlout dataset from proc format, and code generation. This method will likely be faster than the other option I presented (as it passes through the data only once), but it does rely on the variable name <-> format relationship being straightforward. If it's not, a slightly more complex variation will be required; you should post to that effect, and this can be modified.
First, the cntlout option in proc format makes a dataset of the contents of the format catalog. This is not the only way to do this, but it's a very easy one. Specify the appropriate libname as you would when you create a format, but instead of making one, it will dump the dataset out, and you can use it for other purposes.
Second, we create a macro that performs your action one time (creating a variable with the name_value name and then assigning it to the appropriate value) and then use proc sql to make a bunch of calls to that macro, once for each row in your cntlout dataset. Note - you may need a where clause here, or some other modifications, if your format library includes formats for variables that aren't in your dataset - or if it doesn't have the nice neat relationship your example does. Then we just make those calls in a data step.
*Set up formats and dataset;
proc format;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
quit;
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
*Dump formats into table;
proc format cntlout=formats;
quit;
*Macro that does the above assignment once;
%macro spread_var(var=, val=);
&var._&val.= (&var.=&val.); *result of boolean expression is 1 or 0 (T=1 F=0);
%mend spread_var;
*make the list. May want NOPRINT option here as it will make a lot of calls in your output window otherwise, but I like to see them as output.;
proc sql;
select cats('%spread_var(var=',substr(fmtname,1,length(Fmtname)-1),',val=',start,')')
into :spreadlist separated by ' '
from formats;
quit;
*Actually use the macro call list generated above;
data want;
set have;
&spreadlist.;
run;
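The where clause mentioned above could look like the following sketch. It keeps only numeric formats (TYPE = 'N' in the CNTLOUT table) whose name minus the trailing F matches a variable in WORK.HAVE, so formats without a matching variable generate no calls. EXTRAF is a hypothetical format added to show the filtering, and the sample data is made up.

```sas
proc format;
  value SEXF 1 = 'Homme' 2 = 'Femme';
  value FUMERT1F 0 = 'Non' 1 = 'Oui, occasionnellement'
                 2 = 'Oui, regulierement' 3 = 'Non mais j''ai deja fume';
  value EXTRAF 1 = 'Yes' 2 = 'No';  /* hypothetical: no EXTRA variable exists */
run;

data have;
  input caseID fumert1 sex;
  datalines;
1 0 1
2 3 2
3 1 1
;
run;

proc format cntlout=formats;
run;

%macro spread_var(var=, val=);
  &var._&val. = (&var. = &val.);  /* 1 if true, 0 if false */
%mend spread_var;

proc sql noprint;
  select cats('%spread_var(var=', substr(fmtname, 1, length(fmtname) - 1),
              ',val=', start, ')')
    into :spreadlist separated by ' '
  from formats
  where type = 'N'
    and upcase(substr(fmtname, 1, length(fmtname) - 1)) in
        (select upcase(name)
         from dictionary.columns
         where libname = 'WORK' and memname = 'HAVE');
quit;

data want;
  set have;
  &spreadlist.;
run;
```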