I'm trying to write a regular expression in Oracle that matches the following number format:
ab0ab0, where b must be exactly a-1 or a+1,
but I'm having difficulties with increasing or decreasing a digit's value in a regex.
I don't want to add all possible values to the regular expression with alternation.
What I've done so far is: ^(\d)(\d)0\1\2(0)$
but it doesn't check whether b = a-1 or b = a+1.
Apart from writing out all the options, you could subtract one digit from the other:
SELECT *
FROM your_table
WHERE TO_NUMBER( REGEXP_SUBSTR( your_column, '^(\d)(\d)0\1\20$', 1, 1, NULL, 1 ) )
- TO_NUMBER( REGEXP_SUBSTR( your_column, '^(\d)(\d)0\1\20$', 1, 1, NULL, 2 ) )
IN ( -9, -1, 1, 9 )
(-9 and 9 will include the values 090090 and 900900, if you expect the subtraction/addition to wrap around; if you do not want those values then just use -1 and 1.)
or, check whether the 2-digit value is 1 or 10 modulo 11 (if b = a+1 then the value is 10a + b = 11a + 1 ≡ 1 (mod 11), and if b = a-1 it is 11a - 1 ≡ 10 (mod 11)):
SELECT *
FROM your_table
WHERE MOD( TO_NUMBER( REGEXP_SUBSTR( your_column, '^(\d\d)0\10$', 1, 1, NULL, 1 ) ), 11 )
IN ( 1, 10 )
OR your_column IN ( '090090', '900900' )
Just as Wiktor said, you can't do it without alternation.
I went ahead and typed it out for you, though. The backreference \1 makes the same two-digit pair repeat, as the ab0ab0 format requires:
^(01|12|23|34|45|56|67|78|89|90|09|98|87|76|65|54|43|32|21|10)0\10$
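A quick sanity check of that pattern (a hypothetical inline sample, with DUAL as the row source):
-- '120120' repeats the pair 12 with b = a+1, so it should match;
-- '140140' has b = a+3, and '120230' changes the pair, so both should fail.
SELECT val,
       CASE
         WHEN REGEXP_LIKE( val, '^(01|12|23|34|45|56|67|78|89|90|09|98|87|76|65|54|43|32|21|10)0\10$' )
         THEN 'match'
         ELSE 'no match'
       END AS result
FROM (
  SELECT '120120' AS val FROM DUAL UNION ALL
  SELECT '140140' FROM DUAL UNION ALL
  SELECT '120230' FROM DUAL
);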
I am using Amazon Athena engine version 1, which is based on Presto 0.172.
Consider the example data set:
id | date_column | col1
-: | :---------- | ---:
1 | 01/03/2021 | NULL
1 | 02/03/2021 | 1
1 | 15/03/2021 | 2
1 | 16/03/2021 | NULL
1 | 17/03/2021 | NULL
1 | 30/03/2021 | NULL
1 | 30/03/2021 | 1
1 | 31/03/2021 | NULL
I would like to replace all NULLs in the table with the last non-NULL value i.e. I want to get:
id | date_column | col1
-: | :---------- | ---:
1 | 01/03/2021 | NULL
1 | 02/03/2021 | 1
1 | 15/03/2021 | 2
1 | 16/03/2021 | 2
1 | 17/03/2021 | 2
1 | 30/03/2021 | 1
1 | 30/03/2021 | 1
1 | 31/03/2021 | 1
I was thinking of using the lag function with the IGNORE NULLS option, but unfortunately IGNORE NULLS is not supported by Athena engine version 1 (nor by Athena engine version 2, which is based on Presto 0.217).
How can I achieve the desired result without using the IGNORE NULLS option?
Here is a template for generating the example table:
WITH source1 AS (
SELECT
*
FROM (
VALUES
(1, date('2021-03-01'), NULL),
(1, date('2021-03-02'), 1),
(1, date('2021-03-15'), 2),
(1, date('2021-03-16'), NULL),
(1, date('2021-03-17'), NULL),
(1, date('2021-03-30'), NULL),
(1, date('2021-03-30'), 1),
(1, date('2021-03-31'), NULL)
) AS t (id, date_col, col1)
)
SELECT
id
, date_col
, col1
-- This doesn't work, as IGNORE NULLS is not supported.
-- CASE
-- WHEN col1 IS NOT NULL THEN col1
-- ELSE lag(col1) IGNORE NULLS OVER (PARTITION BY id ORDER BY date_col)
-- END AS col1_lag_nulls_ignored
FROM
source1
ORDER BY
date_col
After reviewing similar questions on SO (here and here), I put together the solution below, which works for all column types (including strings and dates):
WITH source1 AS (
SELECT
*
FROM (
VALUES
(1, date('2021-03-01'), NULL),
(1, date('2021-03-02'), 1),
(1, date('2021-03-15'), 2),
(1, date('2021-03-16'), NULL),
(1, date('2021-03-17'), NULL),
(1, date('2021-03-30'), 1),
(1, date('2021-03-31'), NULL)
) AS t (id, date_col, col1)
)
, grouped AS (
SELECT
id
, date_col
, col1
-- If the row has a value in a column, then this row and all subsequent rows
-- with a NULL (before the next non-NULL value) will be in the same group.
, sum(CASE WHEN col1 IS NULL THEN 0 ELSE 1 END) OVER (
PARTITION BY id ORDER BY date_col) AS grp
FROM
source1
)
SELECT
id
, date_col
, col1
-- max is used instead of first_value, since in cases where there will
-- be multiple records with NULL on the same date, the first_value may
-- still return a NULL.
, max(col1) OVER (PARTITION BY id, grp ORDER BY date_col) AS col1_filled
, grp
FROM
grouped
ORDER BY
date_col
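To see why this works, here are the grp values the grouped CTE assigns to the sample rows (grp is a running count of non-NULL col1 values per id, so each NULL row falls into the same group as the last non-NULL value before it):

date_col | col1 | grp
:--------- | ---: | --:
2021-03-01 | NULL | 0
2021-03-02 | 1 | 1
2021-03-15 | 2 | 2
2021-03-16 | NULL | 2
2021-03-17 | NULL | 2
2021-03-30 | 1 | 3
2021-03-31 | NULL | 3

max(col1) within each (id, grp) then fills the NULLs. Rows in group 0 precede any non-NULL value, so they stay NULL, matching the first row of the desired output.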
I have the following measure which calculates a node size for a graph chart table:
MEASURE tMeasures[m_ONA_nodeSize_flex] =
IF(
ISBLANK([m_ONA_rankFilter_nodes]),
BLANK(),
VAR empCnt = ROW(
"data",
CALCULATE(
SUMX( DISTINCT(tONAAppend[id]), 1),
FILTER(
ALL(tONAAppend),
NOT( ISBLANK([m_ONA_rankFilter_nodes]))
)
)
)
RETURN
IF(
empCnt > 25, ROUND( 1500 / empCnt, 0),
60
)
)
[m_ONA_rankFilter_nodes] is used to filter the nodes that exist in edges in the same table. The edges themselves are filtered by multiple conditions in another measure, m_ONA_edgeValue_afterRank, which returns BLANK() if a row doesn't match the filters and the edge value if it does.
The following query, which uses the measures [m_ONA_rankFilter_nodes] and [m_ONA_edgeValue_afterRank] but not tMeasures[m_ONA_nodeSize_flex], runs relatively fast:
EVALUATE
TOPN(
501,
SUMMARIZECOLUMNS(
'tONAAppend'[tableName],
'tONAAppend'[name],
'tONAAppend'[nameFrom],
'tONAAppend'[nameTo],
'tONAAppend'[id],
'tONAAppend'[idFrom],
'tONAAppend'[idTo],
'tONAAppend'[photoSrc],
__DS0FilterTable,
__DS0FilterTable2,
__DS0FilterTable3,
__DS0FilterTable4,
"m_ONA_edgeValue_afterRank", 'tMeasures'[m_ONA_edgeValue_afterRank],
"m_ONA_rankFilter_nodes", 'tMeasures'[m_ONA_rankFilter_nodes]
),
'tONAAppend'[tableName],
1,
'tONAAppend'[name],
1,
'tONAAppend'[nameFrom],
1,
'tONAAppend'[nameTo],
1,
'tONAAppend'[id],
1,
'tONAAppend'[idFrom],
1,
'tONAAppend'[idTo],
1,
'tONAAppend'[photoSrc],
1
)
However, when I replace 'tMeasures'[m_ONA_rankFilter_nodes] with 'tMeasures'[m_ONA_nodeSize_flex], the query runs dramatically slower:
EVALUATE
TOPN(
501,
SUMMARIZECOLUMNS(
'tONAAppend'[tableName],
'tONAAppend'[name],
'tONAAppend'[nameFrom],
'tONAAppend'[nameTo],
'tONAAppend'[id],
'tONAAppend'[idFrom],
'tONAAppend'[idTo],
'tONAAppend'[photoSrc],
__DS0FilterTable,
__DS0FilterTable2,
__DS0FilterTable3,
__DS0FilterTable4,
"m_ONA_edgeValue_afterRank", 'tMeasures'[m_ONA_edgeValue_afterRank],
"m_ONA_nodeSize_flex", 'tMeasures'[m_ONA_nodeSize_flex]
),
'tONAAppend'[tableName],
1,
'tONAAppend'[name],
1,
'tONAAppend'[nameFrom],
1,
'tONAAppend'[nameTo],
1,
'tONAAppend'[id],
1,
'tONAAppend'[idFrom],
1,
'tONAAppend'[idTo],
1,
'tONAAppend'[photoSrc],
1
)
As I understand it, the problem is in how the DAX engine works: it tries to calculate the value of this measure for each row. I think the fastest approach would be to calculate the value once, materialize it in storage, and then populate all rows with it.
How can I optimize this and force DAX to work more efficiently?
I found good articles on this topic and implemented the variable-based approach described there:
https://www.sqlbi.com/articles/optimizing-conditions-involving-blank-values-in-dax/
https://www.sqlbi.com/articles/understanding-eager-vs-strict-evaluation-in-dax/
It worked as expected: all measures assigned to dedicated variables were materialized in the Storage Engine, and performance increased dramatically.
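As a minimal sketch of that pattern applied to the measure above (names as in the question; DISTINCTCOUNT stands in for the original ROW/SUMX construct, which counts the same thing, and the exact rewrite depends on the model):

MEASURE tMeasures[m_ONA_nodeSize_flex] =
    -- Evaluate the measure once per cell and reuse the result, so the
    -- Storage Engine can materialize it instead of recomputing it in
    -- both the ISBLANK test and the branch below.
    VAR rankFilter = [m_ONA_rankFilter_nodes]
    VAR empCnt =
        CALCULATE(
            DISTINCTCOUNT( tONAAppend[id] ), -- same count as SUMX( DISTINCT( id ), 1 )
            FILTER( ALL( tONAAppend ), NOT ISBLANK( [m_ONA_rankFilter_nodes] ) )
        )
    RETURN
        IF(
            ISBLANK( rankFilter ),
            BLANK(),
            IF( empCnt > 25, ROUND( 1500 / empCnt, 0 ), 60 )
        )

Since DAX variables are evaluated at most once, the repeated evaluation of [m_ONA_rankFilter_nodes] in the outer condition disappears; the measure reference inside FILTER still runs per row and may need separate treatment.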
I want to replace everything in a string with '' except for a given pattern, using Oracle's regexp_replace.
In my case the pattern refers to German licence plates. The pattern is contained in the usage column (verwendungszweck_bez) of a revenue table (of a bank). It can be matched by ([a-z]{1,3})[- ]([a-z]{1,2}) ?([0-9]{1,4}). Now I'd like to invert the matching pattern so that it matches everything except the pattern.
The usage column looks like this:
ALLIANZ VERSICHERUNGS-AG VERTRAG AS-9028000568 KFZ-VERSICHERUNG KFZ-VERS. XX-Y 427 01.01.19 - 31.12.19
XX-Y 427 would be the match I'm interested in. The string can contain more than one licence plate:
AXA VERSICHERUNG AG 40301089910 KFZ HAFTPFLICHT ABC-RM10 37,35 + 40330601383 KFZ HAFTPFLIVHT ABC-LX 283 21,19
In this case I need ABC-RM10 and ABC-LX 283.
So far I just strip everything up to 'kfz' from the string with regexp_replace:
regexp_replace(lower(a.verwendungszweck_bez),'^(.*?)kfz','')
because there's always 'kfz' in the string and the licence plate information follows (though not necessarily directly) after it. Then I extract the first plate and normalise its format:
upper(regexp_replace(regexp_substr(regexp_replace(lower(a.verwendungszweck_bez),'^(.*?)kfz',''),'([a-z]{1,3})[- ]([a-z]{1,2}) ?([0-9]{1,4})',1,1),'([a-z]{1,3})[- ]([a-z]{1,2}) ?([0-9]{1,4})','\1-\2 \3'))
This works, but I'm sure there's a better solution.
The result should be a list of customers, licence plates and a count of cars, like this:

Customer | licence plates | count
:------- | :--------------------- | ----:
1234567 | XX-Y 427 | 1
1255599 | ABC-RM 10 + ABC-LX 283 | 2
You can use a recursive sub-query to find the items. You can also use UPPER and TRANSLATE to normalise the data, removing the optional separators from the number plates and converting them to a single case:
Test Data:
CREATE TABLE test_data ( value ) AS
SELECT 'ALLIANZ VERSICHERUNGS-AG VERTRAG AS-9028000568 KFZ-VERSICHERUNG KFZ-VERS. XX-Y 427 01.01.19 - 31.12.19' FROM DUAL UNION ALL
-- UNG AG 4030 should not match
SELECT 'AXA VERSICHERUNG AG 40301089910 KFZ HAFTPFLICHT ABC-RM10 37,35 + 40330601383 KFZ HAFTPFLIVHT ABC-LX 283 21,19' FROM DUAL UNION ALL
-- Multiple matches adjacent to each other
SELECT 'AA-A1BB-BB222CC C3333' FROM DUAL UNION ALL
-- Duplicate values with different separators and cases
SELECT 'AA-A1 AA-A 1 aa a1' FROM DUAL
Query:
WITH items ( value, item, next_pos ) AS (
SELECT value,
TRANSLATE( UPPER( REGEXP_SUBSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', 1, 1, 'i', 2 ) ), '_ -', '_' ),
REGEXP_INSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', 1, 1, 1, 'i', 2 ) - 1
FROM test_data
UNION ALL
SELECT value,
TRANSLATE( UPPER( REGEXP_SUBSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', next_pos, 1, 'i', 2 ) ), '_ -', '_' ),
REGEXP_INSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', next_pos, 1, 1, 'i', 2 ) - 1
FROM items
WHERE next_pos > 0
)
SELECT item,
COUNT(*)
FROM items
WHERE item IS NOT NULL AND next_pos > 0
GROUP BY item
Output:
ITEM | COUNT(*)
:------- | -------:
CCC3333 | 1
AAA1 | 4
XXY427 | 1
ABCRM10 | 1
ABCLX283 | 1
BBBB222 | 1
db<>fiddle here
The result should be a list of customers ...
You haven't given any information on how customers relate to this; that part is left as an exercise for the reader (who hopefully has the client values somewhere and can correlate them to the input).
Update:
If you want the count of unique number plates per row then:
WITH items ( rid, value, item, next_pos ) AS (
SELECT ROWID,
value,
TRANSLATE( UPPER( REGEXP_SUBSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', 1, 1, 'i', 2 ) ), '_ -', '_' ),
REGEXP_INSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', 1, 1, 1, 'i', 2 ) - 1
FROM test_data
UNION ALL
SELECT rid,
value,
TRANSLATE( UPPER( REGEXP_SUBSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', next_pos, 1, 'i', 2 ) ), '_ -', '_' ),
REGEXP_INSTR( value, '([^a-z]|^)([a-z]{1,3}[- ][a-z]{1,2} ?\d{1,4})(\D|$)', next_pos, 1, 1, 'i', 2 ) - 1
FROM items
WHERE next_pos > 0
)
SELECT LISTAGG( item, ' + ' ) WITHIN GROUP ( ORDER BY item ) AS items,
COUNT(*)
FROM (
SELECT DISTINCT
rid,
item
FROM items
WHERE item IS NOT NULL AND next_pos > 0
)
GROUP BY rid;
Which outputs:
ITEMS | COUNT(*)
:----------------------- | -------:
XXY427 | 1
ABCLX283 + ABCRM10 | 2
AAA1 + BBBB222 + CCC3333 | 3
AAA1 | 1
db<>fiddle here
PowerBI conditional calculation is not working.
I have created a custom column and am writing a formula to get two different calculations: one for past months and one for the current month.
I have tried the IF/ELSE and SWITCH functions, but they are not giving the desired result.
This is the Difference I am calculating, based on 3 different columns from 3 different data sources:
Difference =
( SUM ( Opportunity[Revenue] ) + SUM ( 'August 2019'[Revenue] ) )
- SUM ( '2018 Invoice'[Revenue] )
I would like to get a result where, when (Opportunity[Month]) is 1, 2, 3, 4, 5, 6, or 7, the Difference should be
Difference = SUM ( 'August 2019'[Revenue] ) - SUM ( '2018 Invoice'[Revenue] )
and otherwise
Difference =
( SUM ( Opportunity[Revenue] ) + SUM ( 'August 2019'[Revenue] ) )
- SUM ( '2018 Invoice'[Revenue] )
How about this?
Difference =
IF (
Opportunity[Month] IN { 1, 2, 3, 4, 5, 6, 7 },
SUM ( Opportunity[Revenue] )
)
+ SUM ( 'August 2019'[Revenue] )
- SUM ( '2018 Invoice'[Revenue] )
If the Month > 7, then the IF returns BLANK() and you just get the last two terms.
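Equivalently, with both branches spelled out (this assumes, as in the question, that Difference is a calculated column, so Opportunity[Month] is available through row context):

Difference =
IF (
    Opportunity[Month] IN { 1, 2, 3, 4, 5, 6, 7 },
    SUM ( 'August 2019'[Revenue] ) - SUM ( '2018 Invoice'[Revenue] ),
    SUM ( Opportunity[Revenue] ) + SUM ( 'August 2019'[Revenue] ) - SUM ( '2018 Invoice'[Revenue] )
)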
Hi, I have a problem I can't solve; maybe some of you can help me.
I need to show the result of a division:
select 50/200
As we all know, it's supposed to be 0.25; however, I got 0.
So then I tried this:
SELECT ROUND(CAST(50 AS NUMERIC(18,2) )/ CAST(200 AS NUMERIC(18,2)),2)
which gives me 0.25000000000000000000.
I then tried to use ROUND:
select cast(round(50/200,2) as numeric(36,2))
but it returns 0.00.
How would I fix this to just show 0.25?
You could just do this (50 / 200 performs integer division because both operands are integers; writing the divisor as 200.0 makes it a decimal, so the quotient keeps its fractional part):
SELECT CAST ( ROUND ( 50 / 200.0 , 2 ) AS numeric ( 18 , 2 )) ;
EDIT:
Per your comment, you could modify it to this:
SELECT CAST ( ROUND ( @int1 / CAST ( @int2 AS numeric ( 18 , 2 )) , 2 ) AS numeric ( 18 , 2 )) ;
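A usage sketch, assuming T-SQL and that @int1 and @int2 are integer variables holding the two operands:

-- Hypothetical variables standing in for the values from the comment.
DECLARE @int1 int = 50 , @int2 int = 200 ;
-- Casting one operand to numeric forces decimal division; returns 0.25.
SELECT CAST ( ROUND ( @int1 / CAST ( @int2 AS numeric ( 18 , 2 )) , 2 ) AS numeric ( 18 , 2 )) ;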