I am working with a huge dataset in sas trying to use proc sql and I need help setting up a like statement. I'm trying to extract all the columns that have 'eco' in the name
I'm getting an error in the where statement as it is not registering the second *.
Any help?
proc sql
select *
from cfy19e8
where * LIKE %eco%;
You could concatenate all of your columns with catx() and find any one that has the word eco.
data have;
input col1$ col2$ col3$;
datalines;
sadfeco kdoa wrfs
asdf asdf sadf
mfecosa mawoeco mfzeco
;
run;
data want;
set have;
where catx('|', col1, col2, col3) LIKE '%eco%';
run;
If you have a lot of character columns, you could use the shortcut _CHARACTER_ to concatenate all variables, then use find() within an if statement in a data step.
data want;
set have;
if(find(catx('|', of _CHARACTER_), 'eco') );
run;
Perhaps
proc contents noprint data=cfy19e8 out=eco_columns(where=(upcase(name) like '%ECO%'));
run;
title 'Columns with ECO in their name';
proc print data=eco_columns;
var name;
run;
Related
Suppose that I have the following list
proc sql;
select name into: list separated by ' ' from dataset;
quit;
and I want to keep the elements of this list in my dataset2. The datastep
data dataset2;
set dataset2;
if name in &list.;
run;
does not work. How should I modify the first proc sql so that the datastep makes sense? Thanks in advance!
This assumes you're looking to filter values within a column, not the number of columns.
If the value is numeric this is what you need:
data dataset3;
set dataset2;
if name not in (&list.);
run;
If the value is character then this is what you need:
proc sql;
select quote(name) into: list separated by ' ' from dataset;
quit;
data dataset3;
set dataset2;
if name not in (&list.);
run;
I've created a user defined format by using proc format statements.Would like to create a macro over it in a way that if the input data changes, the code should able to do change accordingly.
Here is the code:
proc format ;
value $a 1='1-sepstrata'
0='0-Non-sepstrata'
A='A-sepstrata';
run;
In the dateset I've,a columns named stratum which has unique values such as 1,0,A.
Select the distinct values of STRATA and use it to generate the format definition in a file. Then use PROC FORMAT to create the format.
proc sql;
create table fmtdef as
select '$A' as fmtname
, strata as start
, catx('-',strata
,case when (strata='0') then 'Non-sepstrata' else 'sepstrata' end
) as label
from have
group by strata
order by fmtname,start
;
quit;
proc format lib=work.formats cntlin=fmtdef;
run;
I want an answer for this.
The input I have is:
ABC123
The output I want is:
123ABC
How to print the output in this format (i.e. backwards) using Proc SQL?
Based on the information given and assuming that all your data is in same format you can tweak substr function in proc sql
data have;
value='ABC123';
run;
proc sql;
create table want
as
select value,
substr(value,4,4)||substr(value,1,3) as new_value
from have;
quit;
proc print data=want; run;
The same function can be applied in data step as well.
You will probably want to use trim() to deal with the trailing spaces that SAS stores in character variables.
trim(substr(have,4))||substr(have,1,3)
If you want an algorithm that would work with similar strings of any length (any number of letters followed by any number of digits), I suggest using regular expressions to modify the input string.
outStr = prxChange("s/([A-z]+)([\d]+)/$2$1/", 1, inStr);
You can easily use it within proc sql.
data test1;
inStr = "ABCdef12345";
run;
proc sql;
create table test2 as
select prxChange("s/([A-z]+)([\d]+)/$2$1/", 1, inStr) as outStr
from test1;
quit;
Base SAS contains a function REVERSE that is dedicated to reversing a string, and that can be used both in proc sql and in a datastep. See example in SAS documentation or here:
proc sql;
select Name,
reverse(Name) as Name_reversed
from sashelp.class
;
quit;
output:
Name | Name_reversed
--------|--------------
Alfred | derflA
Alice | ecilA
Barbara | arabraB
etc.
It’s the first time that I’ve opened sas today and I’m looking at some code a colleague wrote.
So let’s say I have some data (import) where duplicates occur but I want only those which have a unique number named VTNR.
First she looks for unique numbers:
data M.import;
set M.import;
by VTNR;
if first.VTNR=1 then unique=1;
run;
Then she creates a table with the duplicated numbers:
data M.import_dup1;
set M.import;
where unique^=1;
run;
And finally a table with all duplicates.
But here she is really hardcoding the numbers, so for example:
data M.import_dup2;
set M.import;
where VTNR in (130001292951,130100975613,130107546425,130108026864,130131307133,130134696722,130136267001,130137413257,130137839451,130138291041);
run;
I’m sure there must be a better way.
Since I’m only familiar with R I would write something like:
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
I guess there must be something like the $ also for sas?
To me it looks like the most direct translation of the R code
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
Would be to use SQL code
proc sql;
create table import_dup2 as
select * from import
where VTNR in (select VTNR from import_dup1)
;
quit;
But if your intent is to find the observations in IMPORT that have more than one observation per VTNR value there is no need to first create some other table.
data import_dup2 ;
set import;
by VTNR ;
if not (first.VTNR and last.VTNR);
run;
I would use the options in PROC SORT.
Make sure to specify an OUT= dataset otherwise you'll overwrite your original data.
/*Generate fake data with dups*/
data class;
set sashelp.class sashelp.class(obs=5);
run;
/*Create unique and dup dataset*/
proc sort data=class nouniquekey uniqueout=uniquerecs out=dups;
by name;
run;
/*Display results - for demo*/
proc print data=uniquerecs;
title 'Unique Records';
run;
proc print data=dups;
title 'Duplicate Records';
run;
Above solution can give you duplicates but not unique values. There are many possible ways to do both in SAS. Very easy to understand would be a SQL solution.
proc sql;
create table no_duplicates as
select *
from import
group by VTNR
having count(*) = 1
;
create table all_duplicates as
select *
from import
group by VTNR
having count(*) > 1
;
quit;
I would use Reeza's or Tom's solution, but for completeness, the solution most similar to R (and your preexisting code) would be three steps. Again, I wouldn't use this here, it's excess work for something you can do more easily, but the concept is helpful in other situations.
First, get the dataset of duplicates - either her method, or proc sort.
proc sort nodupkey data=have out=nodups dupout=dups;
by byvar;
run;
Then pull those into a macro list:
proc sql;
select byvar
into :duplist separated by ','
from dups;
quit;
Then you have them in &duplist. and can use them like so:
data want;
set have;
if not (byvar in &duplist.);
run;
data want;
set import;
where VTNR in import_dup1;
run;
I currently have a dataset with 200 variables. From those variables, I created 100 new variables. Now I would like to drop the original 200 variables. How can I do that?
Slightly better would be, how I can drop variables 3-200 in the new dataset.
sorry if I was vague in my question but basically I figured out I need to use --.
If my first variable is called first and my last variable is called last, I can drop all the variables inbetween with (drop= first--last);
Thanks for all the responses.
As with most SAS tasks, there are several alternatives. The easiest and safest way to drop variables from a SAS data set is with PROC SQL. Just list the variables by name, separated by a comma:
proc sql;
alter table MYSASDATA
drop name, age, address;
quit;
Altering the table with PROC SQL removes the variables from the data set in place.
Another technique is to recreate the data set using a DROP option:
data have;
set have(drop=name age address);
run;
And yet another way is using a DROP statement:
data have;
set have;
drop name age address;
run;
Lots of options - some 'safer', some less safe but easier to code. Let's imagine you have a dataset with variables ID, PLNT, and x1-x200 to start with.
data have;
id=0;
plnt=0;
array x[200];
do _t = 1 to dim(x);
x[_t]=0;
end;
run;
data want;
set have;
*... create new 100 variables ... ;
*option 1:
drop x1-x200; *this works when x1-x200 are numerically consecutive;
*option 2:
drop x1--x200; *this works when they are physically in order on the dataset -
only the first and last matter;
run;
*Or, do it this way. This would also work with SQL ALTER TABLE. This is
the safest way to do it.;
proc sql;
select name into :droplist separated by ' ' from dictionary.columns
where libname='WORK' and memname='HAVE' and name not in ('ID','PRNT');
quit;
proc datasets lib=work;
modify want;
drop &droplist.;
quit;
If all of the variables you want to drop are named so they all start the same (like old_var_1, old_var_2, ..., old_var_n), you could do this (note the colon in drop option):
data have;
set have(drop= old_var:);
run;
data want;
set have;
drop VAR1--VARx;
run;
Would love to know if you can do this by position.
Definitely works with variable names separated by double dash (--).
I have some macros that would allow this here
You could run that whole set of macros, or just run list_vars(), is_blank(), num_words, find_word, remove_word, remove_words , nth_word().
Using these it would be:
%let keep_vars = keep_this and_this also_this;
%let drop_vars = %list_vars(old_dataset);
%let drop_vars = %remove_words(&drop_vars , &keep_vars);
data new_dataset (drop = &drop_vars );
set old_dataset;
/*stuff happens*/
run;
This will keep the three variables keep_this and_this also_this but drop everything else in the old dataset.