How to remove duplicates in SAS data - sas

I am trying to delete the observations in my data set that are the same across multiple variables.
For example
PIN Start Date End Date
1 Jan 1 2014 Jan 3 2014
1 Jan 1 2014 Jan 3 2015
3 March 2 2014 March 5 2014
4 July 1 2014 July 8 2014
5 July 1 2014 July 8 2014
6 August 9 2014 August 24 2014
I would want to remove those with the same PIN and Start Date.

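Convert the character dates into SAS date values first. The informat width must cover the longest string ("March 2 2014" is 12 characters), so use a width larger than 10.
data have2;
set have(rename=(start_date = _start_date
end_date = _end_date) );
start_date = input(strip(_start_date), anydtdte32.);
end_date = input(strip(_end_date), anydtdte32.);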
format start_date end_date date9.;
drop _start_date _end_date;
run;
Then use proc sort nodupkey.
proc sort data=have2 nodupkey;
by pin start_date;
run;
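If it helps to check which rows were discarded, PROC SORT can also write them to a second data set. A minimal sketch, assuming the have2 data set from the step above; the output names want and dropped are placeholders chosen for illustration:

```sas
/* Keep the first row for each pin/start_date pair in WANT   */
/* and send every discarded duplicate to DROPPED for review. */
proc sort data=have2 out=want nodupkey dupout=dropped;
  by pin start_date;
run;
```

Which of two rows sharing a key is kept depends on their order in the input, so sort by an extra variable beforehand if a particular row should survive.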

Related

Calculate the lag of years by groups of records and between consecutive dates

I would like to ask your help to do the following:
I have a data set that looks like this:
ID Date1 Date2
a 2015 2015
a 2016 2016
a 2017 2018
a 2018 2020
b 2015 2016
b 2018 2019
b 2020 2020
.... ..... .....
Desired output:
ID Lag
a Start
a 1
a 1
a 0
b Start
b 3
b 1
.... ..... .....
I need to count how many years pass from Date2 to Date1 next row for each ID. For example: from Date2 = 2015 (first row) to Date1 2016 (second row) there is 1 year of difference. Can anyone help me please? The first row in the desired output should be set to "Start" to indicate that the lag cannot be calculated because it is the starting point.
Thank you in advance
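Use lag() together with first.ID (the data must already be sorted by ID):
data want ;
set have ;
by ID ;
/* call lag() on every row so its queue stays in step with the data */
l = lag(Date2) ;
/* diff is left missing on the first row of each ID */
if not first.ID then diff = sum(Date1,-l) ;
run ;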
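The step above leaves diff missing on the first row of each ID. To show the literal "Start" label the question asks for, the result has to be a character variable. A sketch, assuming Date1 and Date2 are numeric years and have is sorted by ID; prev_date2 is a helper name introduced here:

```sas
data want;
  set have;
  by ID;
  length Lag $5;                 /* wide enough to hold 'Start'            */
  prev_date2 = lag(Date2);       /* unconditional call keeps the queue aligned */
  if first.ID then Lag = 'Start';
  else Lag = strip(put(Date1 - prev_date2, best5.));
  keep ID Lag;
run;
```

Storing the gap as text makes it awkward to do arithmetic on later, so keeping the numeric diff and applying the label only when reporting may be the better choice.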

How to change year end in power bi?

I have a matrix with YEARMONTH (202001, 202002, ...) as column 1 and SALES as column 2.
Currently the matrix sums the values for Jan to Dec 2020 under year 2020.
I want each year to cover April to March instead. For example, 2019 is Apr 2019 to Mar 2020, and 2020 is Apr 2020 to Mar 2021.
How can I implement this in Power BI?
Try the calculated columns below. They assume the table has a [Date] column; for an April-to-March year the cutoff month is 4.
Fiscal year
1) Create a FiscalYearNumber column as
FiscalYearNumber = IF ( MONTH ( [Date] ) >= 4, YEAR ( [Date] ), YEAR ( [Date] ) - 1 )
FiscalYearDisplay = "FY" & RIGHT ( FORMAT ( [FiscalYearNumber], "0#" ), 2 ) & "-" & RIGHT ( FORMAT ( [FiscalYearNumber] + 1, "0#" ), 2 )
2) Create a column called FiscalMonth
FiscalMonth = IF ( MONTH ( [Date] ) >= 4, MONTH ( [Date] ) - 3, MONTH ( [Date] ) + 9 )
3) Create the fiscal quarter
FiscalQuarterNumber = ROUNDUP ( [FiscalMonth] / 3, 0 )
FiscalQuarterDisplay = "FQ" & FORMAT ( [FiscalQuarterNumber], "0" )

About keeping observation with specified criteria in SAS

Hello, and many thanks in advance for your answers and your efforts to help new users on this forum.
I have a SAS table with the variables ID, Year, Month, and Creation date.
For each combination of ID, year, month, and creation date, I want to keep only one row.
My HAVE data is :
ID Year Month Date of creation
1 2019 1 a
1 2019 1 a
1 2019 1 b
1 2019 2 c
1 2019 3 d
1 2020 5 e
2 2019 1 a
2 2019 1 b
2 2019 3 c
3 2021 8 m
3 2021 9 k
My WANT data is
ID Year Month Date of creation
1 2019 1 a
1 2019 1 b
1 2019 2 c
1 2019 3 d
1 2020 5 e
2 2019 1 a
2 2019 1 b
2 2019 3 c
3 2021 8 m
3 2021 9 k
I tried NODUPKEY, but it removes IDs.
Your example seems to work fine with NODUPKEY option of PROC SORT. Perhaps you used the wrong BY variables?
data have;
input ID Year Month Creation $ ;
cards;
1 2019 1 a
1 2019 1 a
1 2019 1 b
1 2019 2 c
1 2019 3 d
1 2020 5 e
2 2019 1 a
2 2019 1 b
2 2019 3 c
3 2021 8 m
3 2021 9 k
;
proc sort data=have out=want nodupkey;
by id year month creation ;
run;
You can also use the DISTINCT clause in PROC SQL, which removes duplicates based on all columns:
proc sql;
create table want
as
select distinct * from have;
quit;
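Before deleting anything, it can be useful to list which key combinations actually occur more than once. A small sketch, assuming the have data set created above; n_rows is a column name chosen here for illustration:

```sas
/* Report each ID/Year/Month/Creation combination that appears */
/* on more than one row, together with its row count.          */
proc sql;
  select id, year, month, creation, count(*) as n_rows
  from have
  group by id, year, month, creation
  having count(*) > 1;
quit;
```

If this query lists IDs that should not be treated as duplicates, the BY list passed to PROC SORT is probably missing a variable.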

SAS - deleting duplicates on multiple variables

I am trying to delete the observations in my data set that are the same across multiple variables.
For example
RID Admission Date End Date
1 Jan 1 2014 Jan 3 2014
1 January 1 Jan 3 2014
1 March 2 2014 March 5 2014
2 July 1 2014 July 8 2014
2 July 1 2014 July 8 2014
2 August 9 2014 August 24 2014
I would want to keep all unique admissions for each RID, but delete any observations with the same RID and the same admission/end dates.
Thanks!
PROC SORT DATA=work.yourdatasetin OUT=work.datasetout NODUPKEY ;
/* BY _ALL_ compares every variable, so only fully identical rows are dropped */
BY _ALL_ ;
RUN ;
Something like this might also work.
proc sql;
create table work.yourdatasetout as
select distinct
*
from
work.yourdatasetin;
quit;

How to delete the row with missing character values using SAS

I have a data set like this:
id type time
70657 23E Nov 4 2002 12:00AM
61651 12R
11603 DQ2
45819 Jul 23 2013 12:00AM
732 Mar 4 2011 12:00AM
22810 231
I want to do two things with missing values.
First, remove the rows where the value of the variable time is blank (" ").
desired output1
id type time
70657 23E Nov 4 2002 12:00AM
45819 Jul 23 2013 12:00AM
732 Mar 4 2011 12:00AM
Second, remove the rows that have any missing value.
desired output2
id type time
70657 23E Nov 4 2002 12:00AM
SAS code:
data character;
length id type time $ 24;
input id $ 1-5 type $ 8-10 time $ 13-31;
cards;
70657 23E Nov 4 2002 12:00AM
61651 12R
11603 DQ2
45819 Jul 23 2013 12:00AM
732 Mar 4 2011 12:00AM
22810 231
;
run;
I would be inclined to use proc sql. Something like:
proc sql;
create table newchar as
select *
from character
where id is not null and type is not null and time is not null;
quit;
A DATA step alternative, using a WHERE= data set option:
DATA WANT;
SET CHARACTER (WHERE = (TIME ~= "" AND TYPE ~= "" AND ID ~= ""));
RUN;
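Both answers above address the second request only. A sketch covering each request separately, assuming the character data set from the question; want1 and want2 are output names made up for this example:

```sas
/* Output 1: drop rows only when TIME is blank */
data want1;
  set character;
  where not missing(time);
run;

/* Output 2: drop rows when any of the three variables is blank. */
/* CMISS counts missing arguments, character or numeric.         */
data want2;
  set character;
  if cmiss(id, type, time) = 0;
run;
```

CMISS keeps the condition short when more variables are involved, since each one is just another argument rather than another AND clause.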