I need to retrieve first-run information from an Oracle database for a particular date range. "First run" means keeping only the earliest row for each serial number and ignoring any later reruns of the same serial number.
Note: 1 = Pass, 0 = Fail
Example of my data is:
SERIALNUM TIMESTAMP_ PASSED …{more data}
001 2015-01-07T11:22:50 0
002 2015-01-07T11:24:00 0
003 2015-01-07T11:25:50 1
001 2015-01-07T11:26:50 1
004 2015-01-07T11:28:50 1
005 2015-01-07T11:29:50 1
006 2015-01-07T11:31:50 1
002 2015-01-07T11:30:50 0
002 2015-01-07T11:33:50 1
007 2015-01-07T11:35:50 1
008 2015-01-07T11:36:50 1
0010 2015-01-07T11:39:50 1
009 2015-01-07T11:37:50 1
Desired results: 10 units tested, 2 failed, 8 passed.
Using Excel, to get my proper first-run data I:
[step1] Delete rows outside of my date range.
[step2] Sort by SERIALNUM (1st level) TIMESTAMP_ (2nd level).
[step3] Remove Duplicate SERIALNUM.
[step4] Then count the number of passed units (1 = pass).
This gives me my desired results.
Changing the order gives me undesired results.
I can get the data from the database from my selected range by using:
SELECT SERIALNUM, TIMESTAMP_, PASSED
FROM dbTble
WHERE TO_DATE('01/07/2015 09:00:00', 'MM/DD/YYYY HH24:MI:SS') <= TIMESTAMP_
AND TIMESTAMP_ < TO_DATE('01/07/2015 14:59:59', 'MM/DD/YYYY HH24:MI:SS')
ORDER BY SERIALNUM, TIMESTAMP_
It seems like I should be using a subquery, but I saw a note that a subquery cannot be sorted.
How can I accomplish this with a SQL command?
This will get the first run (only) of the runs within the date range. Note that if a particular serialnum has a run earlier than the minimum date of the range, that serialnum won't be excluded; the query simply treats its earliest run inside the range as the first run:
SELECT serialnum, timestamp_, passed FROM (
SELECT serialnum, timestamp_, passed
, ROW_NUMBER() OVER ( PARTITION BY serialnum ORDER BY timestamp_ ) AS rn
FROM dbtable
WHERE TO_DATE('01/07/2015 09:00:00', 'MM/DD/YYYY HH24:MI:SS') <= timestamp_
AND timestamp_ < TO_DATE('01/07/2015 14:59:59', 'MM/DD/YYYY HH24:MI:SS')
) WHERE rn = 1
ORDER BY serialnum, timestamp_
The window function ROW_NUMBER() ranks rows from earliest to latest (add DESC after timestamp_ to rank latest first).
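If you need the true first run per serial number (i.e., to also exclude serial numbers whose first run happened before the range), one hedged variation on the same idea is to rank over all history first and apply the date filter afterwards:

SELECT serialnum, timestamp_, passed FROM (
SELECT serialnum, timestamp_, passed
, ROW_NUMBER() OVER ( PARTITION BY serialnum ORDER BY timestamp_ ) AS rn
FROM dbtable -- no date filter here, so the ranking sees all history
) WHERE rn = 1 -- each serial number's true first run
AND TO_DATE('01/07/2015 09:00:00', 'MM/DD/YYYY HH24:MI:SS') <= timestamp_
AND timestamp_ < TO_DATE('01/07/2015 14:59:59', 'MM/DD/YYYY HH24:MI:SS')
ORDER BY serialnum, timestamp_

From there, COUNT(*) and SUM(passed) over the result give the tested/passed counts (10 and 8 for the sample data).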
Hope this helps.
First, as I am a French guy, I want to apologise in advance for my poor English!
Despite my searches over the past few days, I cannot find the correct measure to solve my problem.
I think I am close to the solution, but I really need help to finish this job!
Here is my need:
I have a dataset with a date table and a "Position" (i.e. "stock") table, which is my fact table, with a date column.
Classic relationship between these 2 tables: many dates in the "Position" table / 1 date in the "Dates" table.
My "Dates" table has one date per day (column "AsOf").
My "Deals" table looks like this:
Id     DealId  AsOfDate   Notional
10000  1       9/1/2022   2000000
10001  1       9/1/2022   3000000
10002  1       9/1/2022   1818147
10010  4       5/31/2022  2000000
10011  4       5/31/2022  997500
10012  4       5/31/2022  1500000
10013  4       5/31/2022  1127820
10014  5       7/27/2022  140000
10015  5       7/27/2022  210000
10016  5       7/27/2022  500000
10017  5       7/27/2022  750000
10018  5       7/27/2022  625000
10019  1       8/31/2022  2000000
10020  1       8/31/2022  3000000
10021  1       8/31/2022  1801257
10022  1       8/31/2022  96976
10023  1       8/31/2022  1193365
10024  1       8/31/2022  67883
Based on a selected date (a slicer with all dates from the "Dates" table), I would like to calculate the sum of the last Notional for each deal (column "DealId").
So, for each deal, I must identify the last AsOfDate before or equal to the selected date and sum all matching rows.
Examples:
If the selected date is 9/1/2022, I will see all rows except the 8/31/2022 rows for deal 1 (as the last date for this deal is 9/1/2022).
So, I expect to see:
DealId Sum of Notional
1 6 818 147
4 5 625 320
5 2 225 000
Grand Total 14 668 467
If I select 8/31/2022, the total for deal 1 changes (as we now take the rows of 8/31 instead of 9/1):
DealId Sum of Notional
1 8 159 481
4 5 625 320
5 2 225 000
Grand Total 16 009 800
If I select 7/29, only deals 4 and 5 are active on this date, so the results should be:
DealId Sum of Notional
4 5 625 320
5 2 225 000
Grand Total 7 850 320
I think I found a solution for the rows, but my total is wrong (only notionals of the selected date are totaled).
I also think my measure is incorrect if I try to display the notional amounts aggregated by Rating (another column in my table) instead of by deal.
Here is my measure:
Last Notional =
VAR SelectedAsOf =
SELECTEDVALUE ( Dates[AsOf] )
VAR LastAsofPerDeal =
CALCULATE (
MAX ( Deals[AsOf Date] ),
FILTER ( ALLEXCEPT ( Deals, Deals[DealId] ), Deals[AsOf Date] <= SelectedAsOf )
)
RETURN
CALCULATE (
SUM ( Deals[Notional] ),
FILTER (
ALLEXCEPT ( Deals, Deals[DealId]),
LastAsofPerDeal = Deals[AsOf Date]
)
)
I hope it is clear for you and that you will be able to find a solution.
Thanks in advance.
Antoine
Make sure you have no relationship between your calendar table and your deals table.
Create a slicer with your dates table and create a table visual with deal id. Then add a measure to the table as follows:
Sum of Notional =
VAR slicer = SELECTEDVALUE(Dates[Date])                    // date chosen in the slicer
VAR tbl = FILTER(Deals, Deals[AsOfDate] <= slicer)         // deal rows on or before that date
VAR maxBalanceDate = CALCULATE(MAX(Deals[AsOfDate]), tbl)  // latest remaining AsOfDate
RETURN
CALCULATE(
    SUM(Deals[Notional]),
    Deals[AsOfDate] = maxBalanceDate                       // sum only the rows on that latest date
)
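One caveat, and this is an assumption about how your visual's totals evaluate: at the grand-total level, maxBalanceDate is the latest AsOfDate across all deals together, so a deal whose last date is older can drop out of the total (the symptom described above). A hedged per-deal variant, iterating the deals explicitly, is sketched below using the same table and column names:

Sum of Notional (per-deal total) =
VAR slicer = SELECTEDVALUE(Dates[Date])
RETURN
SUMX(
    VALUES(Deals[DealId]),                                 // iterate each visible deal
    VAR lastDate =
        CALCULATE(MAX(Deals[AsOfDate]), Deals[AsOfDate] <= slicer)
    RETURN
        CALCULATE(SUM(Deals[Notional]), Deals[AsOfDate] = lastDate)
)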
I am trying to create a report in Power BI where I have to create one query with 30 calculated columns, then merge it with another query using a left outer join to get my results. I am using measures to do the calculations for the 30 columns, and when I bring them together in report view, I lose the results from the second query.
I tried to create calculated columns in a new table to store the results, but since all the calculations do a distinct count of account numbers, I am unable to put the results in the same table, so I used measures to do my calculations.
Cannot post the code online :(
Expected result:
School name Code Col1 Col2 Col3
a ABC 1000 0 0
b BBB 2000 2000 2000
c AAB 0 0 0
d NNN 4000 4000 0
e ACE 0 0 0
Getting this result instead:
School name Code Col1 Col2 Col3
a ABC 1000 0 0
b BBB 2000 2000 2000
d NNN 4000 4000 0
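Without seeing the code this is only a guess, but the missing rows (c and e) are exactly the ones where every column would be 0, and table visuals drop rows whose measures all return BLANK. A hedged sketch of the usual workaround, using hypothetical table/column names, coerces BLANK to zero so the row survives:

// Hypothetical names: Accounts[AccountNumber] stands in for the real model
Col1 =
COALESCE(
    DISTINCTCOUNT(Accounts[AccountNumber]),  // the original distinct count
    0                                        // return 0 instead of BLANK so the row is kept
)

Enabling "Show items with no data" on the visual's field is an alternative that avoids touching the measures.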
I am trying to create a "Percent Retention" for policies during a given time period ( By month, YTD and year over year) . So all of the policies at a given time period compared to those active at the end of the period.
Policies can be:
N=New
RN=ReNew
C=Cancel
RI=ReInstate
NR=NonRenew
Transaction data looks kinda like this; StatusNum is something I can derive to show in-force status.
PolicyID PolicyNum StatusDate Status StatusNum Net
1 123 1/1/2018 N 1 1
2 123 3/31/2018 C 0 -1
3 123 4/1/2018 RI 1 +1
4 123 6/1/2018 RN 1 0
5 222 2/1/2018 N 1 1
6 222 7/1/2018 RN 1 0
7 333 1/1/2018 N 1 1
8 333 6/1/2018 NR 0 -1
9 444 1/1/2018 N 1 1
10 444 5/30/2018 C 0 -1
My best guess on how to do this is to take the sum of the last StatusNum values at a point in time (PIT), partitioned by policy number, divided by the sum of the first StatusNum values at the beginning PIT. So if I filter by dates 1/1/2018 to 8/1/2018:
123 will be in force (+1,+1)
222 will not be in force yet(so not counted for anything) (+0,+0)
333 was in force at the beginning, but it non renewed (+1,-1)
444 was in force at the beginning, but it cancelled (+1,-1)
So 3 of the policies were active at 1/1/2018, 2 of them lapsed (one cancelled, one non-renewed), and 1 doesn't matter, so the retention would be 1/3 = 33.3%.
Can anyone offer feedback if this is the best way to do this and how to accomplish this?
Thank you in advance for your assistance.
Update
This is kinda what I am looking for, but it is too slow:
AsOfPolicies =
VAR A =
    SELECTCOLUMNS (
        SUMMARIZECOLUMNS (
            Transactions[PolicyNumber],
            FILTER ( Transactions, Transactions[DateKey] = MIN ( Transactions[DateKey] ) && Transactions[IsInForce] = -1 )
        ),
        "aPolicyNumber", [PolicyNumber]
    )
VAR B =
    SELECTCOLUMNS (
        SUMMARIZECOLUMNS (
            Transactions[PolicyNumber],
            FILTER ( Transactions, Transactions[DateKey] <= MAX ( Transactions[DateKey] ) ),
            "MaxDate", MAX ( Transactions[DateKey] )
        ),
        "bPolicyNumber", [PolicyNumber],
        "MaxDate", [MaxDate]
    )
VAR C =
    SELECTCOLUMNS (
        FILTER ( CROSSJOIN ( A, B ), [aPolicyNumber] = [bPolicyNumber] ),
        "cPolicyNumber", [aPolicyNumber],
        "MaxDateKey", [MaxDate]
    )
VAR D =
    SELECTCOLUMNS (
        FILTER ( CROSSJOIN ( C, Transactions ), [cPolicyNumber] = [PolicyNumber] && [MaxDateKey] = [DateKey] ),
        "PolicyNumber", [PolicyNumber],
        "PD_ID", [PD_ID],
        "IsInForce", [IsInForce]
    )
RETURN
    D
Update
Also, the filter does not appear to be working.
I think you can do something like this:
Retention =
VAR StartDates =
SUMMARIZE (
ALLSELECTED ( PolicyLog ),
PolicyLog[PolicyNum],
"Start", MIN ( PolicyLog[StatusDate] )
)
VAR Included =
SELECTCOLUMNS (
FILTER ( StartDates, [Start] <= MIN ( Dates[Date] ) ),
"Policies", PolicyLog[PolicyNum]
)
VAR Filtered = FILTER ( PolicyLog, PolicyLog[PolicyNum] IN Included )
RETURN
DIVIDE (
SUMX ( Filtered, PolicyLog[Net] ),
COUNTROWS ( SUMMARIZE ( Filtered, PolicyLog[PolicyNum] ) )
)
First, you create a table, StartDates, that gives the earliest dates for each policy limited to the time frame you have selected. It would look something like this:
StartDates =
PolicyNum Start
123 1/1/2018
222 2/1/2018
333 1/1/2018
444 1/1/2018
From there, we just want a list of which policies to include in the calculation. So we pick the ones that have a Start on or before the minimum selected date in the date slicer. We just want a list of the resulting policy numbers, so we select only that column.
Included =
Policies
123
333
444
From there we filter the whole PolicyLog table to just include these ones (Filtered).
Finally, we can add up the Net column for each of these selected policies and divide by the distinct count of them to get our retention percentage.
Edit: In response to your comment, I think you want to be a bit more selective with the StartDates variable. Instead of MIN ( PolicyLog[StatusDate] ), try something more like this:
CALCULATE( MIN(PolicyLog[StatusDate]), PolicyLog[Status] IN {"N", "RN", "RI"} )
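Plugged back in, the StartDates variable would look something like this (a sketch; the rest of the measure is unchanged):

VAR StartDates =
    SUMMARIZE (
        ALLSELECTED ( PolicyLog ),
        PolicyLog[PolicyNum],
        "Start",
            CALCULATE (
                MIN ( PolicyLog[StatusDate] ),           // earliest date per policy...
                PolicyLog[Status] IN { "N", "RN", "RI" } // ...counting only in-force statuses
            )
    )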
I am trying to match two columns from 2 different tables. I need to pick up records where 80% of the characters match.
For example, tableA has a column colA and tableB has a column colB. colA has the value 'ABCDSEFG' and colB has the value 'XAB*CDEFG'.
select colA
from tableA, tableB
where colA matches 80% chars from colB
=> should return ABCDSEFG (since 80% of its chars match 'XAB*CDEFG').
Any help is appreciated.
Have a look at the UTL_MATCH package. One of its functions is JARO_WINKLER_SIMILARITY, which returns a value between 0 (no match) and 100 (perfect match).
SQL> with taba (cola) as (select 'ABCDSEFG' from dual union
2 select 'AB68S33G' from dual union
3 select 'asdfghjk' from dual
4 ),
5 tabb (colb) as (select 'XAB*CDEFG' from dual union
6 select 'ABR8S33GA' from dual union
7 select 'asdfghjk' from dual
8 )
9 select a.cola, b.colb,
10 utl_match.jaro_winkler_similarity (a.cola, b.colb) jw
11 from taba a, tabb b;
COLA COLB JW
-------- --------- ----------
AB68S33G ABR8S33GA 90
AB68S33G XAB*CDEFG 56
AB68S33G asdfghjk 0
ABCDSEFG ABR8S33GA 71
ABCDSEFG XAB*CDEFG 88
ABCDSEFG asdfghjk 0
asdfghjk ABR8S33GA 0
asdfghjk XAB*CDEFG 0
asdfghjk asdfghjk 100
9 rows selected.
SQL>
You'd add a WHERE clause to that query, i.e.
where utl_match.jaro_winkler_similarity (a.cola, b.colb) >= 80
in order to return values that satisfy your condition.
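Putting it together against the same sample tables, only the pairs at or above the cutoff survive (AB68S33G/ABR8S33GA at 90, ABCDSEFG/XAB*CDEFG at 88, and the identical asdfghjk pair at 100):

select a.cola, b.colb,
       utl_match.jaro_winkler_similarity (a.cola, b.colb) as jw
from taba a
     cross join tabb b
where utl_match.jaro_winkler_similarity (a.cola, b.colb) >= 80;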
I have a dataset with ~4 million transactional records, grouped by Customer_No (consisting of 1 or more transactions per Customer_No, denoted by a sequential counter). Each transaction has a Type code, and I am only interested in customers where a particular combination of transaction types was used. Neither joining the table to itself nor using EXISTS in PROC SQL allows me to efficiently evaluate the transaction-type criteria. I suspect a data step using RETAIN and DO loops would process the dataset faster.
The dataset:
Customer_No Tran_Seq Tran_Type
0001 1 05
0001 2 12
0002 1 07
0002 2 86
0002 3 04
0003 1 07
0003 2 84
0003 3 84
0003 4 84
The criteria I am trying to apply:
All of a Customer_No's Tran_Types must be in ('04','05','07','84','86');
drop all transactions for that Customer_No if any other Tran_Type was used.
A Customer_No's Tran_Types must include ('84' or '86') AND '04'; drop all transactions for the Customer_No if this condition is not met.
The output I want:
Customer_No Tran_Seq Tran_Type
0002 1 07
0002 2 86
0002 3 04
The DoW-loop solution should be the most efficient if the data is sorted. If it's not sorted, it will either be the most efficient or similar in scale but slightly less efficient, depending on the circumstances of the dataset.
I compared it to Dom's solution on a 3e7-ID dataset and got, for the DoW, a similar (slightly shorter) total run time with less CPU on the unsorted dataset, and about 50% faster on the sorted one. It is guaranteed to run in about the time the dataset takes to write out (maybe a bit more, but it shouldn't be much), plus sorting time if needed.
data want;
    /* Pass 1: read the whole customer_no group and set qualification flags. */
    do _n_ = 1 by 1 until (last.customer_no);
        set have;
        by customer_no;
        if tran_type in ('84','86') then has_8486 = 1;   /* saw an 84 or 86 */
        else if tran_type in ('04') then has_04 = 1;     /* saw an 04 */
        else if not (tran_type in ('04','05','07','84','86'))
            then has_other = 1;                          /* saw a disallowed type */
    end;
    /* Pass 2: re-read the same group and output it only if it qualifies. */
    do _n_ = 1 by 1 until (last.customer_no);
        set have;
        by customer_no;
        if has_8486 and has_04 and not has_other then output;
    end;
run;
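As noted above, the BY statement requires have to be sorted (or indexed) by customer_no; if it is not already, sort it first:

proc sort data=have;
    by customer_no;  /* order the groups the DoW loop reads */
run;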
I don't think it's that complicated. Join to a subquery, group by Customer_No, and put your conditions in a HAVING clause. A condition inside a MIN function must be true for all rows, whereas a condition inside a MAX function must be true for at least one row:
proc sql;
create table want as
select
h.*
from
have h
inner join (
select
Customer_No
from
have
group by
Customer_No
having
min(Tran_Type in('04','05','07','84','86')) and
max(Tran_Type in('84','86')) and
max(Tran_Type eq '04')) h2
on h.Customer_No = h2.Customer_No
;
quit;
I must have made a join error. On rewriting, PROC SQL completed in less than 30 seconds (on the original 4.9-million-record dataset). It's not particularly elegant code, though, so I'd still appreciate any improvements or alternative methods.
data Have;
input Customer_No $ Tran_Seq $ Tran_Type:$2.;
cards;
0001 1 05
0001 2 12
0002 1 07
0002 2 86
0002 3 04
0003 1 07
0003 2 84
0003 3 84
0003 4 84
;
run;
proc sql;
    create table Want as
    select t1.*
    from Have t1
    left join (select distinct Customer_No from Have
               where Tran_Type not in ('04','05','07','84','86')) t2
        on t1.Customer_No = t2.Customer_No
    inner join (select distinct Customer_No from Have
                where Tran_Type in ('84','86')) t3
        on t1.Customer_No = t3.Customer_No
    inner join (select distinct Customer_No from Have
                where Tran_Type in ('04')) t4
        on t1.Customer_No = t4.Customer_No
    where t2.Customer_No is null;
quit;
I would offer a slightly less complex SQL solution than #naed555's, using the INTERSECT operator.
proc sql noprint;
create table to_keep as
(
select distinct customer_no
from have
where tran_type in ('84','86')
INTERSECT
select distinct customer_no
from have
where tran_type in ('04')
)
EXCEPT
select distinct customer_no
from have
where tran_type not in ('04','05','07','84','86')
;
create table want as
select a.*
from have as a
inner join
to_keep as b
on a.customer_no = b.customer_no;
quit;