Best practice for generating a RUN_ID for my SAS ETL

I have a SAS ETL process that runs daily, and I keep track of each run in a control table. I've been using sequential numbers that I generate for each run. Is there a better way, or an established best practice?
My Process of generating run_id is:
data have;
input run_id date :date9.;
format date date9.;
datalines;
0 12dec2017
1 21jan2018
2 1feb2018
;
run;
proc sql; select max(run_id) into :id from have ; quit;
I get the max+1 and use it as the next run_id. In the example above my next run_id will be 3 (2+1).

I would recommend using a datetime stamp as the run_id instead of a sequence, so that the number itself is meaningful. It can be character or numeric, but keep it in a fixed-width format such as YYYYMMDDHHMM (or YYYYMMDDHHMMSS if you need seconds) so that it sorts chronologically.
This code will generate the id for you:
data new;
run_id=&id+1;
id_char="%sysfunc(today(),yymmddn8.)_%sysfunc(compress(%sysfunc(time(),time6.) ,:))";
id_num=%sysfunc(today(),yymmddn8.)%sysfunc(compress(%sysfunc(time(),time6.) ,:));
run;
Output:
run_id=3 id_char=20180517_1234 id_num=201805171234
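One caveat with the snippet above: time6. does not zero-pad the hour, so a run before 10:00 yields a shorter ID that no longer sorts correctly as text. A minimal sketch of a fixed-width alternative, assuming the ISO 8601 basic format B8601DT15. is available in your SAS release (the macro variable names run_id_char and run_id_num are illustrative, not from the original post):

```sas
/* Build a fixed-width ID of the form yyyymmddThhmmss from a single datetime() call */
%let run_id_char = %sysfunc(datetime(), B8601DT15.);

/* Numeric variant: remove the 'T' separator to get yyyymmddhhmmss */
%let run_id_num = %sysfunc(compress(&run_id_char, T));

%put NOTE: run_id_char=&run_id_char run_id_num=&run_id_num;
```

Taking both date and time from one datetime() call also avoids a mismatch if the job happens to start right at midnight. The 14-digit numeric form still fits exactly in a standard 8-byte SAS numeric.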

Do this:
proc sql; select max(run_id)+1 into :id from have ; quit;
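If you stay with a sequence, it may be worth guarding against an empty control table, where max(run_id) is missing. A sketch under the assumption that the first run should get run_id 0, as in the sample data, and that your SAS release supports the TRIMMED option of INTO:

```sas
proc sql noprint;
  /* coalesce() supplies -1 when the control table is empty, so the first run_id is 0 */
  select coalesce(max(run_id), -1) + 1
    into :id trimmed
    from have;
quit;

%put NOTE: next run_id=&id;
```

With the three sample rows above this resolves to 3, and TRIMMED removes the leading blanks that INTO would otherwise store in the macro variable.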

Related

Proc SQL SAS Basic

I want an answer for this.
The input I have is:
ABC123
The output I want is:
123ABC
How to print the output in this format (i.e. backwards) using Proc SQL?
Based on the information given, and assuming that all your data is in the same format, you can use the substr function in proc sql:
data have;
value='ABC123';
run;
proc sql;
create table want
as
select value,
substr(value,4,3)||substr(value,1,3) as new_value
from have;
quit;
proc print data=want; run;
The same function can be applied in data step as well.
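A sketch of the data step equivalent, assuming the have dataset created above with its six-character value (want_ds is an illustrative name):

```sas
data want_ds;
  set have;
  length new_value $6;
  /* substr(value, 4) returns everything from position 4 onward; cats() strips blanks and concatenates */
  new_value = cats(substr(value, 4), substr(value, 1, 3));
run;

proc print data=want_ds; run;
```

For the input ABC123 this gives new_value=123ABC.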
You will probably want to use trim() to deal with the trailing spaces that SAS stores in character variables:
trim(substr(value,4))||substr(value,1,3)
If you want an algorithm that would work with similar strings of any length (any number of letters followed by any number of digits), I suggest using regular expressions to modify the input string.
outStr = prxChange("s/([A-Za-z]+)(\d+)/$2$1/", 1, inStr);
You can easily use it within proc sql.
data test1;
inStr = "ABCdef12345";
run;
proc sql;
create table test2 as
select prxChange("s/([A-Za-z]+)(\d+)/$2$1/", 1, inStr) as outStr
from test1;
quit;
Base SAS contains a function, REVERSE, dedicated to reversing a string; it can be used both in proc sql and in a data step. Note that it reverses every character, so ABC123 becomes 321CBA rather than 123ABC. See the example in the SAS documentation, or here:
proc sql;
select Name,
reverse(Name) as Name_reversed
from sashelp.class
;
quit;
output:
Name | Name_reversed
--------|--------------
Alfred | derflA
Alice | ecilA
Barbara | arabraB
etc.

SAS: How to Automate the Creation of Many Datasets using Another Data set

I am looking to create multiple datasets from the city_variables dataset. There are a total of 58 observations; I stored that count in a macro variable (&count) to stop the do loop.
The city_variables dataset looks like this (vertically, of course):
CITY_NAME
City1
City2
City3
City4
City5
City6
City7
City8
City9
City10
..........
City58
I created the macro variable &name in a data _null_ step in order to put the city name into the dataset name.
Any help would be great on how to automate the creation of the 58 files by name (not number). Thanks again.
/* Create macro variable with number of observations in concordance file */
proc sql;
select count(area_name)
into :count
from main.state_all;
quit;
%macro repeat;
data _null_;
set city_variables;
%do i= 1 %UNTIL (i = &count);
call symput('name',CITY_NAME);
run;
data &name;
set dataset;
where city_name = &name;
run;
%end;
%mend repeat;
%repeat
Well, if you're going to do
proc sql;
select count(area_name)
into :count
from main.state_all;
quit;
Then why not go all the way? Make a macro that does one dataset output, given the criteria as parameters, then make one call for each separate whatever-name. This might be close to what you're looking at.
%macro make_data(data_name=, set_name=, where=);
data &data_name.;
set &set_name.;
where &where.;
run;
%mend make_data;
proc sql;
select
cats('%make_data(data_name=',city_name,
', set_name=dataset, where=city_name="',
city_name,
'" )')
into :make_datalist
separated by ' '
from main.state_all;
quit;
&make_datalist.;
Some other options that I'll just link to:
Chris Hemedinger's SAS Dummy blog post "How to Split One Data Set Into Many" shows a similar concept, except that he doesn't put the macro wrapper where I do.
Paul Dorfman's "Data Step Hash Objects as Programming Tools" is the seminal paper on using a hash table to do this. This is likely the fastest way to do it, if you understand hash tables and have the memory available.
You don't need a macro to automate splitting up your data in this way. Since your example is really simple, I would consider using call execute in a data _null_ step:
data test;
infile datalines ;
input city_name $20.;
datalines;
City1
City2
City2
City3
City3
City3
;
run;
data _null_;
set test;
call execute("data "||strip(city_name)||";"||"
set test;
where city_name = '"||strip(city_name)||"';"||"
run;");
run;
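Because the sample data contains repeated city names, the step above generates the same data step more than once for City2 and City3. A sketch that de-duplicates the names first, assuming the test dataset above and city names that are valid SAS dataset names (cities is an illustrative name):

```sas
/* Keep one row per distinct city name */
proc sort data=test(keep=city_name) out=cities nodupkey;
  by city_name;
run;

data _null_;
  set cities;
  /* Queue one data step per distinct city; the generated code runs after this step ends */
  call execute('data ' || strip(city_name) || '; set test; where city_name = "'
               || strip(city_name) || '"; run;');
run;
```

The pieces are joined with || and strip() rather than cats() because cats() would also remove the trailing blank in 'data ' and glue the keyword to the dataset name.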

SAS equivalent to R’s is.element()

It’s the first time that I’ve opened sas today and I’m looking at some code a colleague wrote.
So let’s say I have some data (import) where duplicates occur but I want only those which have a unique number named VTNR.
First she looks for unique numbers:
data M.import;
set M.import;
by VTNR;
if first.VTNR=1 then unique=1;
run;
Then she creates a table with the duplicated numbers:
data M.import_dup1;
set M.import;
where unique^=1;
run;
And finally a table with all duplicates.
But here she is really hardcoding the numbers, so for example:
data M.import_dup2;
set M.import;
where VTNR in (130001292951,130100975613,130107546425,130108026864,130131307133,130134696722,130136267001,130137413257,130137839451,130138291041);
run;
I’m sure there must be a better way.
Since I’m only familiar with R I would write something like:
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
I guess there must be something like the $ also for sas?
To me it looks like the most direct translation of the R code
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
would be to use SQL code:
proc sql;
create table import_dup2 as
select * from import
where VTNR in (select VTNR from import_dup1)
;
quit;
But if your intent is to find the observations in IMPORT that have more than one observation per VTNR value there is no need to first create some other table.
data import_dup2 ;
set import;
by VTNR ;
if not (first.VTNR and last.VTNR);
run;
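The by-group logic above only works if the data are sorted by VTNR. A small self-contained sketch with made-up values (import_demo and dup_demo are illustrative names, and the numbers are not from the original data):

```sas
data import_demo;
  input VTNR;
  datalines;
103
102
101
102
;
run;

/* BY-group processing requires the input to be sorted by the BY variable */
proc sort data=import_demo;
  by VTNR;
run;

data dup_demo;
  set import_demo;
  by VTNR;
  /* A row that is both first and last in its group is unique; keep everything else */
  if not (first.VTNR and last.VTNR);
run;
```

Here dup_demo ends up with the two rows for VTNR 102, since that is the only value occurring more than once.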
I would use the options in PROC SORT.
Make sure to specify an OUT= dataset otherwise you'll overwrite your original data.
/*Generate fake data with dups*/
data class;
set sashelp.class sashelp.class(obs=5);
run;
/*Create unique and dup dataset*/
proc sort data=class nouniquekey uniqueout=uniquerecs out=dups;
by name;
run;
/*Display results - for demo*/
proc print data=uniquerecs;
title 'Unique Records';
run;
proc print data=dups;
title 'Duplicate Records';
run;
The solution above gives you the duplicates but not the unique values. There are many ways to get both in SAS; a SQL solution is probably the easiest to understand.
proc sql;
create table no_duplicates as
select *
from import
group by VTNR
having count(*) = 1
;
create table all_duplicates as
select *
from import
group by VTNR
having count(*) > 1
;
quit;
I would use Reeza's or Tom's solution, but for completeness, the solution most similar to R (and your preexisting code) would be three steps. Again, I wouldn't use this here, it's excess work for something you can do more easily, but the concept is helpful in other situations.
First, get the dataset of duplicates - either her method, or proc sort.
proc sort nodupkey data=have out=nodups dupout=dups;
by byvar;
run;
Then pull those into a macro list:
proc sql;
select byvar
into :duplist separated by ','
from dups;
quit;
Then you have them in &duplist. and can use them like so:
data want;
set have;
if not (byvar in (&duplist.));
run;
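/* A DATA step WHERE clause cannot look up values in another dataset, so use a PROC SQL subquery */
proc sql;
create table want as
select * from import
where VTNR in (select VTNR from import_dup1);
quit;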

SAS Keep maximum value by ID

Each ID has several instances, and each instance has a different value. I would like the final output to be the maximum value per ID. So the initial dataset is:
ID Value
1 100
1 7
1 65
2 12
2 97
3 82
3 54
And the output will be:
ID Value
1 100
2 97
3 82
I tried running proc sort twice thinking that the first sort would get things in the proper order so that nodupkey on the second sort would get rid of the right values. This did not work.
proc sort work.data; by id value descending; run;
proc sort work.data nodupkey; by id; run;
Thanks!
Your approach should have worked fine but it looks like you have a syntax error - did you forget to check your log? The descending keyword needs to go before the variable you want to sort in descending order.
proc sort data=sashelp.class out=tmp;
by sex descending height;
run;
proc sort data=tmp out=final nodupkey;
by sex;
run;
Also - in case you're not familiar with SQL, I strongly suggest that you should learn it as it will simplify many data manipulation tasks. This can also be solved in a single SQL step:
proc sql noprint;
create table want as
select sex,
max(height) as height
from sashelp.class
group by sex
;
quit;
My preferred solution:
proc means data=have noprint nway; /* nway keeps only the per-id rows, not the overall total */
class id;
var value;
output out=want max(value)=;
run;
Should be a lot faster than two sorts.

create unique id variable based on existing id variable

I'm trying to make a simpler unique identifier from an already existing identifier. Starting with just an ID column, I want to make a new, simpler ID column so that the final data looks like what follows. There are over 1 million IDs, so writing if/then statements isn't an option; maybe a do statement?
ID NEWid
1234 1
3456 2
1234 1
6789 3
1234 1
A trivial data step solution not using monotonic().
proc sort data=have;
by id;
run;
data want;
set have;
by id;
if first.id then newid+1;
run;
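If sorting a million-plus rows is undesirable, a hash object can assign the new IDs in a single pass, numbering each ID in order of first appearance and leaving the rows in their original order. A minimal sketch, assuming the input dataset is named have as above and that the distinct IDs fit in memory; want_hash and next_id are names introduced here for illustration:

```sas
data want_hash;
  if _n_ = 1 then do;
    /* Lookup table: id -> newid, built up as the data is read */
    declare hash h();
    h.defineKey('id');
    h.defineData('newid');
    h.defineDone();
  end;
  set have;
  /* find() returns 0 and loads newid when the id has been seen before */
  if h.find() ne 0 then do;
    next_id + 1;        /* sum statement: next_id is retained across rows */
    newid = next_id;
    h.add();            /* remember the id/newid pair for later rows */
  end;
  drop next_id;
run;
```

For the sample IDs 1234, 3456, 1234, 6789, 1234 this assigns 1, 2, 1, 3, 1, matching the table in the question.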
Using proc sql:
(You can probably do this without the intermediate datasets by using subqueries, but monotonic() sometimes doesn't behave the way you'd expect in a subquery.)
proc sql noprint;
create table uniq_id as
select distinct id
from original
order by id
;
create table uniq_id2 as
select id, monotonic() as newid
from uniq_id
;
create table final as
select a.id, b.newid
from original a, uniq_id2 b
where a.id = b.id
;
quit;