I know that SAS starts with the observation at the top of a dataset when processing and proceeds to the next until it reaches the bottom observation, but is there an easy way to make SAS process the bottom observation first and then work its way to the top?
You can use the NOBS= and POINT= options of the SET statement to read the dataset backwards without any intermediate steps. Here's an example:
data backwards;
do k= nobs to 1 by -1;
set sashelp.class nobs = nobs point=k;
output;
end;
stop;
run;
proc print data=sashelp.class;run;
proc print data=backwards;run;
See page 2 of this pdf for all the juicy details.
You can certainly change your data to be in reverse order, then process it top down. Add a variable to the data set that acts as an index, then sort the data set descending by that variable.
data work.myData ;
set work.myData ;
indx = _n_ ;
run ;
proc sort data=work.myData ;
by descending indx ;
run ;
You can flip your observations order in a single step using PROC SQL:
proc sql;
create table work.cars as
select *
from sashelp.cars
order by monotonic() desc;
quit;
The key here is order by monotonic() desc which translates as "sort by descending observation numbers".
Alternatively, you can create a view (instead of creating a table) which will refer to the original table, but in a reverse observation number order:
proc sql;
create view work.cars_rev as
select *
from sashelp.cars
order by monotonic() desc;
quit;
This code works for top value but I need top 5 values
proc sql;
create table cash.gO5 as
select * , max(Transaction_Due_Date) as max1 format = date9.
from cash.Orders_Dim65
group by Customer_Name;
quit;
PROC SQL does not support ordered analytic functions such as rank() as found in other flavors of SQL; however, there are numerous ways to get a rank by group. Here are a few options you can use.
Option 1: PROC RANK
proc rank does exactly what it sounds like: ranks stuff. Note that your data must be sorted if being used in SAS 9 or SPRE.
proc rank data=sashelp.cars
out=want(where=(msrp_rank LE 5))
descending;
by make;
var msrp; /* Variable to rank */
ranks msrp_rank; /* Name of variable holding ranks */
run;
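One thing to watch with this option: PROC RANK defaults to TIES=MEAN, so tied values can receive fractional ranks and the `LE 5` filter may keep more or fewer than five rows per group. A minimal sketch, assuming you would rather give tied values the lowest rank of the tie (the output name `want_low` is made up for illustration):

```sas
/* Same ranking as above, but tied MSRP values share the lowest rank of the tie */
proc rank data=sashelp.cars
          out=want_low(where=(msrp_rank le 5))
          descending
          ties=low;
  by make;
  var msrp;          /* Variable to rank */
  ranks msrp_rank;   /* Name of variable holding ranks */
run;
```

If you need exactly five rows per group regardless of ties, the data step counter in Option 2 is the simpler choice, since it counts rows rather than ranking values.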
Option 2: Data Step
You can rank using a data step. Note that your data must be sorted if using SAS 9 or SPRE.
proc sort data=sashelp.cars
out=cars;
by make descending msrp;
run;
data want;
set cars;
by make descending msrp;
if(first.make) then Rank = 0;
Rank+1;
if(Rank LE 5);
run;
Option 3: simple.topK CAS Action
If you have Viya, you can use CAS actions to quickly rank large datasets. This can be used in both SAS and Python with the SWAT package.
/* Load sashelp.cars into CAS */
data casuser.cars;
set sashelp.cars;
run;
proc cas;
simple.topk result=r /
table = {caslib='casuser' name='cars' groupby='make'}
casout = {caslib='casuser' name='cars_top_5' replace=true}
aggregator ='max'
bottomK = 0
topK = 5
inputs = {{name='msrp'}}
;
quit;
I have two pieces of code: one PROC SQL step, and one PROC SORT plus data step. The second works on the dataset created by the first.
Below is the proc sql lines.
create table new as select a.id,a.alid,b.pdate from tb a inner join
tb1 act on a.aid =act.aid left join tb2 as b on (r.alid=a.alid) where
a.did in (15,45); quit;
Below are the PROC SORT and data steps that use the dataset new created above.
proc sort data = new uodupkey;
by alid;
data new1;
set new;
format ddate date9.
dat1=datepart(today);
datno=input(number,20.);
key=_n_;
rename alid blid;
run;
proc sort data=new1 nodupkey;
by datno dat1;
run;
I need to put everything into single proc sql step.
You mention two data steps but I only see one.
Anyway, your data step and proc sort can indeed be written in one sql query (which you can then insert in your proc sql):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from new
group by datno, dat1
having key=min(key)
;
quit;
One remark though. Your data step expects variables called ddate, today and number in your input dataset new. If that dataset is supposed to be the result of your first SQL query, then those variables don't exist, and their values, along with those of dat1 and datno in new1, will always be missing.
Also I assume you misspelled nodupkey on your proc sort.
EDIT: or, to have it all in the same query (if that's what you meant with "the same proc sql"):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from (
select a.id,a.alid,b.pdate
from tb a
inner join tb1 act
on a.aid =act.aid
left join tb2 as b
on (r.alid=a.alid)
where a.did in (15,45)
)
group by datno, dat1
having key=min(key)
;
quit;
It’s the first time that I’ve opened sas today and I’m looking at some code a colleague wrote.
So let’s say I have some data (import) where duplicates occur but I want only those which have a unique number named VTNR.
First she looks for unique numbers:
data M.import;
set M.import;
by VTNR;
if first.VTNR=1 then unique=1;
run;
Then she creates a table with the duplicated numbers:
data M.import_dup1;
set M.import;
where unique^=1;
run;
And finally a table with all duplicates.
But here she is really hardcoding the numbers, so for example:
data M.import_dup2;
set M.import;
where VTNR in (130001292951,130100975613,130107546425,130108026864,130131307133,130134696722,130136267001,130137413257,130137839451,130138291041);
run;
I’m sure there must be a better way.
Since I’m only familiar with R I would write something like:
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
I guess there must be something like the $ also for sas?
To me it looks like the most direct translation of the R code
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
would be to use SQL code:
proc sql;
create table import_dup2 as
select * from import
where VTNR in (select VTNR from import_dup1)
;
quit;
But if your intent is to find the observations in IMPORT that have more than one observation per VTNR value there is no need to first create some other table.
data import_dup2 ;
set import;
by VTNR ;
if not (first.VTNR and last.VTNR);
run;
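To see why that subsetting IF works, here is a minimal sketch that copies the temporary FIRST. and LAST. flags into real variables so they can be printed. The dataset `demo`, its values, and the variable names `first_flag`, `last_flag` and `is_dup` are made up for illustration.

```sas
/* Hypothetical demo data: VTNR 2 occurs twice, 1 and 3 occur once */
data demo;
  input VTNR;
  datalines;
1
2
2
3
;
run;

data flags;
  set demo;
  by VTNR;                  /* demo must already be sorted by VTNR */
  first_flag = first.VTNR;  /* FIRST./LAST. are not written to the output, so copy them */
  last_flag  = last.VTNR;
  /* a VTNR that occurs once is both first and last in its group */
  is_dup = not (first.VTNR and last.VTNR);
run;

proc print data=flags;
run;
```

Only the two rows with VTNR=2 get is_dup=1, which is the condition the data step above keeps.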
I would use the options in PROC SORT.
Make sure to specify an OUT= dataset otherwise you'll overwrite your original data.
/*Generate fake data with dups*/
data class;
set sashelp.class sashelp.class(obs=5);
run;
/*Create unique and dup dataset*/
proc sort data=class nouniquekey uniqueout=uniquerecs out=dups;
by name;
run;
/*Display results - for demo*/
proc print data=uniquerecs;
title 'Unique Records';
run;
proc print data=dups;
title 'Duplicate Records';
run;
Above solution can give you duplicates but not unique values. There are many possible ways to do both in SAS. Very easy to understand would be a SQL solution.
proc sql;
create table no_duplicates as
select *
from import
group by VTNR
having count(*) = 1
;
create table all_duplicates as
select *
from import
group by VTNR
having count(*) > 1
;
quit;
I would use Reeza's or Tom's solution, but for completeness, the solution most similar to R (and your preexisting code) would be three steps. Again, I wouldn't use this here, it's excess work for something you can do more easily, but the concept is helpful in other situations.
First, get the dataset of duplicates - either her method, or proc sort.
proc sort nodupkey data=have out=nodups dupout=dups;
by byvar;
run;
Then pull those into a macro list:
proc sql;
select byvar
into :duplist separated by ','
from dups;
quit;
Then you have them in &duplist. and can use them like so:
data want;
set have;
if not (byvar in (&duplist.)); /* the IN operator needs its list in parentheses */
run;
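data want; /* a data step WHERE cannot reference another dataset, so VTNR is looked up in a hash of import_dup1 */
if _n_ = 1 then do;
declare hash dup(dataset: 'import_dup1');
dup.defineKey('VTNR'); dup.defineDone();
end;
set import;
if dup.check() = 0; /* keep rows whose VTNR exists in import_dup1 */
run;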
I have a file with 10 obs. and different parameters. I need to add to my data a new variable of 'ID' for each observation- i.e a column of numbers 1-10.
How can I add a variable that is simply equal to the obs column?
I thought about doing it with a loop: define an empty var, run over the var, and each time add '1' to the previous observation. However, it seems kind of complicated. Is there a better way to do it?
You can use the Data Step automatic variable _n_. This is the iteration count of the Data Step loop.
Data want;
set have;
ID = _n_;
run;
If you opt for a Proc SQL solution, there are two ways:
1. Undocumented:
proc sql;
create table want as
select monotonic() as row, *
from sashelp.class
;
quit;
2. Documented:
ods listing close;
ods output sql_results=want;
proc sql number;
select * from sashelp.class;
quit;
ods listing;
@DomPazz's answer would definitely work! Just in case you would like to number the observations within each group of some attribute(s), try this:
proc sort data= dataset out= sort_data;
by /* your attribute(s) */ ;
run;
data sort_data;
set sort_data;
by /* the attribute(s) listed in the PROC SORT above */ ;
if first.attribute then i=0; /* first observation of the BY group: reset the counter */
i + 1; /* sum statement (i is retained), so the first observation gets 1 */
if last.attribute and .... then ....; /* whatever you want to do; not necessary */
run;
FIRST. and LAST. are very helpful for row-by-row operations within BY groups.
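As a runnable illustration of that counter pattern, here is a small sketch that uses sashelp.class with sex as the grouping variable. The names `class_sorted`, `class_numbered` and `seq` are assumptions made up for this example.

```sas
/* Sort a copy so the BY statement in the data step is valid */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data class_numbered;
  set class_sorted;
  by sex;
  if first.sex then seq = 0;  /* reset at the start of each group */
  seq + 1;                    /* sum statement: seq is retained across rows */
run;

proc print data=class_numbered;
  var sex name seq;
run;
```

Within each value of sex, seq counts 1, 2, 3, and so on, then restarts at 1 for the next group.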
I've been trying to make my code more efficient and this is the original code, but I think it can be written in one step.
data TABLE;set ORIGINAL_DATA;
Multi=percent*total_units;
keep Multi Type;
proc sort; by Type;
proc means noprint data=TABLE1; by Type; var Multi;output out=Table2(drop= _type_ _freq_)sum=Multi;run;
proc means noprint data=Table1; var Multi;output out=Table3(drop= _type_ _freq_) sum=total ;run;
proc sql;
create table TABLE4 as
select a.Type, a.Multi label="Multi", b.total label="total"
from TABLE2 a, TABLE3 b
order by Type;
quit;
data TABLE5;set TABLE4;
pct=(MULTI/total)*100;
run;
I am able to split up part of it, but I can't figure out how to get the PCT part in my code. This is what I have.
proc sql;
create table TABLE1 as
select distinct type, sum(percent*total_units) as MULTI label "MULTI",
MULTI/(percent*total_units)) as PCT
from ORIGINAL_DATA
group by type;
quit;
I had to edit some of the code but I think the general idea should make sense.
The main problem is I cannot call upon the MULTI column because it is just being created but I want to create a percentage of the total for each type.
The "SAS" way to do something like this is to use a CLASS statement with PROC MEANS. That will calculate statistics on all the interaction levels in the data (identified by the _TYPE_ variable). The row where _TYPE_=0 will be the "total" value, representing the value of that statistic for the entire data set.
In your case, we can take advantage of the fact that PROC MEANS will create the output data set sorted by _TYPE_ and by the variables listed in the CLASS statement. That means we can just read the first observation and save its value for calculating percentages.
It's probably easier to just show some code:
data TABLE;
set ORIGINAL_DATA;
Multi = percent * total_units;
keep Multi Type;
run;
proc means noprint data=TABLE;
class Type;
var multi;
output out=next sum=;
run;
data want;
retain total;
set next;
if _n_ = 1 then do;
/* The first obs will be the _TYPE_=0 record */
total = multi;
delete;
end;
pct = (multi / total) * 100;
drop total _freq_ _type_;
run;
Notice that you do not need to sort the data before using PROC MEANS. That's because we are using a CLASS statement rather than a BY statement. The data step uses the first observation in the data set created by MEANS (the _TYPE_=0 record) to retain the total sum of your variable. The delete statement keeps it out of the result.
CLASS statements with PROC MEANS are very useful. Take a few minutes to read up on how the _TYPE_ variable is calculated, especially if you try using more than one class variable.
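As a quick way to see the _TYPE_ values with two class variables, here is a small sketch on sashelp.class. The output names `stats` and `height_sum` are made up for illustration.

```sas
proc means data=sashelp.class noprint;
  class sex age;
  var height;
  output out=stats sum=height_sum;
run;

/* With CLASS sex age, the output holds every combination level:   */
/*   _TYPE_=0  overall total                                       */
/*   _TYPE_=1  by age only (the rightmost class variable)          */
/*   _TYPE_=2  by sex only                                         */
/*   _TYPE_=3  by sex and age together                             */
proc freq data=stats;
  tables _type_;
run;
```

With more than one class variable, the "first observation is the total" trick still holds, but you would also need to filter on _TYPE_ to pick the level you want percentages for.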
You can skip the initial data step by using the WEIGHT= option in the VAR statement of PROC MEANS (this will effectively do the multiplication for you). You can also use PROC TABULATE instead of PROC MEANS, as TABULATE can calculate the percentage. I believe the following code will produce your required output in one go.
ods noresults;
proc tabulate data=have out=want (drop=_: rename=(total_units_sum=total total_units_pctsum_0=pct));
class type;
var total_units / weight=percent;
table type, total_units*(sum pctsum);
run;
ods results;
If you need one step, maybe this will work, but it's not actually efficient, since it processes data twice, once for detail by TYPE, once for total.
proc sql;
create table TABLE1 as
select
d.type
, sum(d.percent*d.total_units) as MULTI label "MULTI"
, calculated MULTI/s.total as PCT
from ORIGINAL_DATA d,
( select sum(percent*total_units) as total
from ORIGINAL_DATA) s
group by d.type, s.total /* s.total is included so PROC SQL does not remerge back to one row per input row */
;
quit;
For more efficiency, at the cost of more than one step, you could simply replace the tables with views in your original code:
data TABLE; => data TABLE / view=TABLE;
create table TABLE4 => create view TABLE4
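A minimal sketch of the data step view idea, assuming a small made-up stand-in for ORIGINAL_DATA (the dataset values and the view name `table_v` are hypothetical):

```sas
/* Hypothetical stand-in for ORIGINAL_DATA */
data original_data;
  input Type $ percent total_units;
  datalines;
A 0.5 10
A 0.2 20
B 0.1 30
;
run;

/* Nothing is computed here; the step is stored and runs when the view is read */
data table_v / view=table_v;
  set original_data;
  Multi = percent * total_units;
  keep Multi Type;
run;

/* PROC MEANS reads the view, so the multiplication happens on the fly */
proc means noprint data=table_v;
  class Type;
  var Multi;
  output out=next sum=;
run;
```

The view avoids writing the intermediate table to disk, though each procedure that reads it re-executes the stored data step.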