How to find email domains using proc sql - sas

In mysql this would look like:
substring_index(email, '#', -1)
But I can't seem to find if a function like this is available in SAS proc sql.

this seems to work
data _null_;
x='rxx#gmail.com';
dom=substr(x,findc(x,'#'));
put dom;
run;

Use FINDW to find the position of # and SUBSTR to return the remainder?

Related

Proc SQL SAS Basic

I want an answer for this.
The input I have is:
ABC123
The output I want is:
123ABC
How to print the output in this format (i.e. backwards) using Proc SQL?
Based on the information given and assuming that all your data is in same format you can tweak substr function in proc sql
data have;
value='ABC123';
run;
proc sql;
create table want
as
select value,
substr(value,4,4)||substr(value,1,3) as new_value
from have;
quit;
proc print data=want; run;
The same function can be applied in data step as well.
You will probably want to use trim() to deal with the trailing spaces that SAS stores in character variables.
trim(substr(have,4))||substr(have,1,3)
If you want an algorithm that would work with similar strings of any length (any number of letters followed by any number of digits), I suggest using regular expressions to modify the input string.
outStr = prxChange("s/([A-z]+)([\d]+)/$2$1/", 1, inStr);
You can easily use it within proc sql.
data test1;
inStr = "ABCdef12345";
run;
proc sql;
create table test2 as
select prxChange("s/([A-z]+)([\d]+)/$2$1/", 1, inStr) as outStr
from test1;
quit;
Base SAS contains a function REVERSE that is dedicated to reversing a string, and that can be used both in proc sql and in a datastep. See example in SAS documentation or here:
proc sql;
select Name,
reverse(Name) as Name_reversed
from sashelp.class
;
quit;
output:
Name | Name_reversed
--------|--------------
Alfred | derflA
Alice | ecilA
Barbara | arabraB
etc.

Using a set statement to select many files with a pattern

I am trying to load in a large number of data sets that have a similar pattern to their naming in a set statement ie
data output;
set &filenames. ;
I have seen in other questions a method of creating a macro with all of the names using a proc sql statement but I can't seem to figure out how to do this. I was wondering if someone could help me to construct this proc sql code. The naming pattern is something like abcXX03 where abc is common and the XX is changing (ie it can be XX XY or XZ say) but I only want to select the ones with the patern abcXX03. I've tried something like
proc sql;
select memnames into :names
separated by " "
from dictionary.tables
where libname eq "LIBNAME" and
memname like "%XX03"
quit;
based on previous answers.
Use the colon short cut. This will APPEND all the data sets into the same file
data want;
set abc: ;
run;

SAS: How to calculate frequency for all character variables except some

I know I can have something like the following to calculate frequency for all Chars:
proc freq data=sashelp.class;
tables _char_;
run;
However, is there a way to exclude some variables? I want to do something like:
proc freq data=sashelp.class;
tables _char_ EXCEPT VAR1 VAR2;
run;
Thank you so much!
you can use drop = , as shown below.
proc freq data=sashelp.cars(drop=origin make);
tables _char_;
run;
The drop example is the simplest, certainly, and probably best if that's exactly the request.
However, if it's slightly different - such as, you want to include (or exclude) all character variables matching a particular pattern, you can use macro variables constructed from dictionary.columns (or proc contents output dataset).
proc sql;
select name
into :freqlist separated by ' '
from dictionary.columns
where memname='YOURTABLE' and libname='YOURLIB'
and type='char' and name like 'PATTERN%'
;
quit;
Obviously filling in the various uppercase things as appropriate. Usually MEMNAME, LIBNAME, and NAME are stored upper case, though not always, so consider adding UPCASE() to them.
Then you can put &freqlist.; on the tables statement to get the list of columns that match your query.

SAS equivalent to R’s is.element()

It’s the first time that I’ve opened sas today and I’m looking at some code a colleague wrote.
So let’s say I have some data (import) where duplicates occur but I want only those which have a unique number named VTNR.
First she looks for unique numbers:
data M.import;
set M.import;
by VTNR;
if first.VTNR=1 then unique=1;
run;
Then she creates a table with the duplicated numbers:
data M.import_dup1;
set M.import;
where unique^=1;
run;
And finally a table with all duplicates.
But here she is really hardcoding the numbers, so for example:
data M.import_dup2;
set M.import;
where VTNR in (130001292951,130100975613,130107546425,130108026864,130131307133,130134696722,130136267001,130137413257,130137839451,130138291041);
run;
I’m sure there must be a better way.
Since I’m only familiar with R I would write something like:
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
I guess there must be something like the $ also for sas?
To me it looks like the most direct translation of the R code
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
Would be to use SQL code
proc sql;
create table import_dup2 as
select * from import
where VTNR in (select VTNR from import_dup1)
;
quit;
But if your intent is to find the observations in IMPORT that have more than one observation per VTNR value there is no need to first create some other table.
data import_dup2 ;
set import;
by VTNR ;
if not (first.VTNR and last.VTNR);
run;
I would use the options in PROC SORT.
Make sure to specify an OUT= dataset otherwise you'll overwrite your original data.
/*Generate fake data with dups*/
data class;
set sashelp.class sashelp.class(obs=5);
run;
/*Create unique and dup dataset*/
proc sort data=class nouniquekey uniqueout=uniquerecs out=dups;
by name;
run;
/*Display results - for demo*/
proc print data=uniquerecs;
title 'Unique Records';
run;
proc print data=dups;
title 'Duplicate Records';
run;
Above solution can give you duplicates but not unique values. There are many possible ways to do both in SAS. Very easy to understand would be a SQL solution.
proc sql;
create table no_duplicates as
select *
from import
group by VTNR
having count(*) = 1
;
create table all_duplicates as
select *
from import
group by VTNR
having count(*) > 1
;
quit;
I would use Reeza's or Tom's solution, but for completeness, the solution most similar to R (and your preexisting code) would be three steps. Again, I wouldn't use this here, it's excess work for something you can do more easily, but the concept is helpful in other situations.
First, get the dataset of duplicates - either her method, or proc sort.
proc sort nodupkey data=have out=nodups dupout=dups;
by byvar;
run;
Then pull those into a macro list:
proc sql;
select byvar
into :duplist separated by ','
from dups;
quit;
Then you have them in &duplist. and can use them like so:
data want;
set have;
if not (byvar in &duplist.);
run;
data want;
set import;
where VTNR in import_dup1;
run;

SAS Drop Condition Output - Not enough variables observed in Output

I am learning drop_conditions in SAS. I am requesting a sample on three variables, however only two observations are retrieved. Please advise! Thank you!
data temp;
set mydata.ames_housing_data;
format drop condition $30.;
if (LotArea in (7000:9000)) then drop_condition = '01: Between 7000-9000 sqft';
else if (LotShape EQ 'IR3') then drop_condition = '02: Irregular Shape of Property';
else if (condition1 in 'Artery' OR 'Feedr') then drop_condition = '03: Proximity to arterial/feeder ST';
run;
proc freq data=temp;
tables drop_condition;
title 'Sample Waterfall';
run; quit;
Your conditions/comparisons aren't specified correctly, I think you're looking for the IN operator to do multiple comparisons in a single line. I'm surprised there aren't errors in your log.
Rather than the following:
if (LotArea = 7000:9000)
Try:
if (lotArea in (7000:9000))
and
if (condition1 EQ 'Atrery' OR 'Feedr')
should be
if (condition1 in ('Atrery', 'Feedr'))
EDIT:
You also need to specify a length for the drop_condition variable, instead of a format to ensure the variable is long enough to hold the text specified. It's also useful to verify your answer afterwards with a proc freq against the specified conditions, for example:
proc freq data=temp;
where drop_condition=:'01';
tables drop_condition*lot_area;
run;