using Oracle REGEXP_INSTR to find exact word - regex

I want to return the following position from the strings using REGEXP_INSTR.
I am looking for the word car with exact match in the following strings.
car,care,oscar - 1
care,car,oscar - 6
oscar,care,car - 12
something like
SELECT REGEXP_INSTR('car,care,oscar', 'car', 1, 1) "REGEXP_INSTR" FROM DUAL;
I am not sure what kind of escape operators to use.

A simpler solution is to surround the source string and search string with commas and find the position using INSTR.
SELECT INSTR(',' || 'car,care,oscar' || ',', ',car,') "INSTR" FROM DUAL;
Example:
SQL Fiddle
with x(y) as (
SELECT 'car,care,oscar' from dual union all
SELECT 'care,car,oscar' from dual union all
SELECT 'oscar,care,car' from dual union all
SELECT 'car' from dual union all
SELECT 'cart,care,oscar' from dual
)
select y, ',' || y || ',' , instr(',' || y || ',',',car,')
from x
| Y | ','||Y||',' | INSTR(','||Y||',',',CAR,') |
|-----------------|-------------------|----------------------------|
| car,care,oscar | ,car,care,oscar, | 1 |
| care,car,oscar | ,care,car,oscar, | 6 |
| oscar,care,car | ,oscar,care,car, | 12 |
| car | ,car, | 1 |
| cart,care,oscar | ,cart,care,oscar, | 0 |

The following query handles all scenarios. It returns the starting position if the string begins with car, or the whole string is just car. It returns the starting position + 1 if ,car, is found or if the string ends with ,car to account for the comma.
SELECT
CASE
WHEN REGEXP_LIKE('car,care,oscar', '^car,|^car$') THEN REGEXP_INSTR('car,care,oscar', '^car,|^car$', 1, 1)
WHEN REGEXP_LIKE('car,care,oscar', ',car,|,car$') THEN REGEXP_INSTR('car,care,oscar', ',car,|,car$', 1, 1)+1
ELSE 0
END "REGEXP_INSTR"
FROM DUAL;
SQL Fiddle demo with the various possibilities

I like Noel his answer as it gives a very good performance! Another way around is by creating separate rows from a character separated string:
pm.nodes = 'a;b;c;d;e;f;g'
(select regexp_substr(pm.nodes,'[^;]+', 1, level)
from dual
connect by regexp_substr(pm.nodes, '[^;]+', 1, level) is not null)

Related

Extract Numbers from String - Custom

I'd like to extract "Most" numbers from a string and Add "JW" at the end.
My values look like:
RFID_DP_IDS339020JW3_IDMsg - Result = 339020JW
RFID_DP_IDSA72130JW_IDMsg --> 72130JW
RFID_DP_IDS337310JW1_IDMsg --> 337310JW
Basically I would remove all first letters, keep all numbers and JW
For now I had this
regexp_replace(Business_CONTEXT, '[^0-9]', '')||'JW' RegistrationPoint
But that would include the numbers AFTER 'JW'
Any idea?
How about this?
result would return exactly two letters after bunch of digits
result2 would return digits + JW
Pick the one you find the most appropriate.
SQL> with test (col) as
2 (select 'RFID_DP_IDS339020JW3_IDMsg' from dual union all
3 select 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
4 select 'RFID_DP_IDS337310JW1_IDMsg' from dual
5 )
6 select col,
7 regexp_substr(col, '\d+[[:alpha:]]{2}') result,
8 regexp_substr(col, '\d+JW') result2
9 from test;
COL RESULT RESULT2
-------------------------- -------------------------- --------------------------
RFID_DP_IDS339020JW3_IDMsg 339020JW 339020JW
RFID_DP_IDSA72130JW_IDMsg 72130JW 72130JW
RFID_DP_IDS337310JW1_IDMsg 337310JW 337310JW
SQL>
If you really want to extract the longest digit string out of your given strings you can use the following:
WITH test (Business_CONTEXT) AS
(SELECT 'RFID_DP_IDS339020JW3_I9DMsg' from dual union all
SELECT 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
SELECT 'RFID_DP_IDS337310JW1_IDMsg' from dual
)
SELECT Business_CONTEXT
, (SELECT MAX(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL))
KEEP (dense_rank last ORDER BY LENGTH(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL)))
FROM dual
CONNECT BY regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL) IS NOT NULL) num
FROM test
Result:
Business_CONTEXT | NUM
----------------------------+-----
RFID_DP_IDS339020JW3_I9DMsg | 339020
RFID_DP_IDSA72130JW_IDMsg | 72130
RFID_DP_IDS337310JW1_IDMsg | 337310

How to calculate number of non blank rows based on the value using dax

I have a table with numeric values and blank records. I'm trying to calculate a number of rows that are not blank and bigger than 20.
+--------+
| VALUES |
+--------+
| 2 |
| 0 |
| 13 |
| 40 |
| |
| 1 |
| 200 |
| 4 |
| 135 |
| |
| 35 |
+--------+
I've tried different options but constantly get the next error: "Cannot convert value '' of type Text to type Number". I understand that blank cells are treated as text and thus my filter (>20) doesn't work. Converting blanks to "0" is not an option as I need to use the same values later to calculate AVG and Median.
CALCULATE(
COUNTROWS(Table3),
VALUE(Table3[VALUES]) > 20
)
OR getting "10" as a result:
=CALCULATE(
COUNTROWS(ALLNOBLANKROW(Table3[VALUES])),
VALUE(Table3[VALUES]) > 20
)
The final result in the example table should be: 4
Would be grateful for any help!
First, the VALUE function expects a string. It converts strings like "123"into the integer 123, so let's not use that.
The easiest approach is with an iterator function like COUNTX.
CountNonBlank = COUNTX(Table3, IF(Table3[Values] > 20, 1, BLANK()))
Note that we don't need a separate case for BLANK() (null) here since BLANK() > 20 evaluates as False.
There are tons of other ways to do this. Another iterator solution would be:
CountNonBlank = COUNTROWS(FILTER(Table3, Table3[Values] > 20))
You can use the same FILTER inside of a CALCULATE, but that's a bit less elegant.
CountNonBlank = CALCULATE(COUNT(Table3[Values]), FILTER(Table3, Table3[Values] > 20))
Edit
I don't recommend the CALCULATE version. If you have more columns with more conditions, just add them to your FILTER. E.g.
CountNonBlank =
COUNTROWS(
FILTER(Table3,
Table3[Values] > 20
&& Table3[Text] = "xyz"
&& Table3[Number] <> 0
&& Table3[Date] <= DATE(2018, 12, 31)
)
)
You can also do OR logic with || instead of the && for AND.

extract all numbers in a string

How can I extract all numbers in a string?
Sample inputs:
7nr-6p
12c-18L
12nr-24L
11nr-12p
Expected Outputs:
{7,6}
{12,18}
{12,24}
etc...
The following is tested with the first one, 7nr-6p:
select regexp_split_to_array('7nr-6p', '[^0-9]') AS new_volume from mytable;
Gives: {7,"","",6,""} // Why is a numeric-only match returning spaces?
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
select regexp_matches('7nr-6p', '\d'::text) from mytable;
Gives: {7}
select NULLIF(regexp_replace('7nr-6p', '\D',',','g'), '')::text from mytable;
Gives: 7,,,6,
The following query:
select regexp_split_to_array(regexp_replace('7nr-6p', '^[^0-9]*|[^0-9]*$', 'g'), '[^0-9]+')
AS new_volume from mytable;
"Trims" the prefix and suffix non-numbers and splits by the remaining non-numbers.
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
Because without the 'g' flag, the regex stops at the first match.
Add the 'g' flag:
select regexp_matches('7nr-6p', '[0-9]*'::text, 'g') from mytable;
You can replace all text and then split:
SELECT regexp_split_to_array(
regexp_replace('7nr-6p', '[a-zA-Z]', '','g'),
'[^0-9]'
)
This returns {7,6}
SELECT id, (regexp_matches(string, '\d+', 'g'))[1]::int AS nr
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nr
----+----
1 | 7
1 | 6
2 | 12
2 | 18
3 | 12
3 | 24
4 | 11
4 | 12
I wanted them in a single cell so I could extract them as needed
SELECT id, trim(regexp_replace(string, '\D+', ',', 'g'), ',') AS nrs
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nrs
----+-------
1 | 7,6
2 | 12,18
3 | 12,24
4 | 11,12
dbfiddle here
Here is a more robust solution
CREATE OR REPLACE FUNCTION get_ints_from_text(TEXT) RETURNS int[] AS $$
select array_remove(regexp_split_to_array($1,'[^0-9]+','i'),'')::int[];
$$ LANGUAGE SQL IMMUTABLE;
Example
select get_ints_from_text('7nr-6p'); -- 7,6
-- also resilient in situations like
select get_ints_from_text('-7nr--6p'); -- 7,6
Here is a link to try
http://sqlfiddle.com/#!17/c6ac7/2
I feel that wrapping this functionality into an immutable function is prudent. This is a pure function, one that will not mutate data and one that returns the same result given the same input. Immutable functions marked as "immutable" have performance benefits.
By using a function we also benefit from abstraction. There is one source to update should this functionality need to improve in the future.
For more information about immutable functions see
https://www.postgresql.org/docs/10/static/sql-createfunction.html

hive parse string from log

i'm having a problem when parsing string from log file, this is the case:
"skey":"110","scp_id":"OC05","capedge":"3G"
"skey":"140","scp_id":"OC02","capedge":"3G"
"skey":"0","scp_id":"OC01","capedge":"3G"
this is our expected output for our table
| skey | scp_id | capedge |
| 110 | OC05 | 3G |
| 140 | OC02 | 3G |
| 0 | OC01 | 3G |
i've tried using parse_url method from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF but unfortunately our string is not in url format, is there any better approach for this? or do i have to use regexp_extract for this?
thank you,
Galih
You may use a combination of SPLIT function and REGEXP_EXTRACT
select REGEXP_EXTRACT( skey , ':"(\\w+)"', 1) as skey,
REGEXP_EXTRACT( scp_id , ':"(\\w+)"', 1) as scp_id,
REGEXP_EXTRACT( capedge , ':"(\\w+)"', 1) as capedge
from (
select SPLIT(log_record, ',' )[0] as skey,
SPLIT(log_record , ',')[1] as scp_id,
SPLIT( log_record , ',')[2] as capedge
FROM yourtable
) a;
HUE DEMO : user id,pwd : demo,demo

posix regexp to split a table

I'm currently working on data migration in PostgreSQL. Since I'm new to posix regular expressions, I'm having some trouble with a simple pattern and would appreciate your help.
I want to have a regular expression split my table on each alphanumeric char in a column, eg. when a column contains a string 'abc' I'd like to split it into 3 rows: ['a', 'b', 'c']. I need a regexp for that
The second case is a little more complicated, I'd like to split an expression '105AB' into ['105A', '105B'], I'd like to copy the numbers at the beginning of the string and split the table on uppercase letters, in the end joining the number with exactly 1 uppercase letter.
the function I'll be using is probably regexp_split_to_table(string, regexp)
I'm intentionally providing very little data not to confuse anyone, since what I posted is the essence of the problem. If you need more information please comment.
The first was already solved by you:
select regexp_split_to_table(s, ''), i
from (values
('abc', 1),
('def', 2)
) s(s, i);
regexp_split_to_table | i
-----------------------+---
a | 1
b | 1
c | 1
d | 2
e | 2
f | 2
In the second case you don't say if the numerics are always the first tree characters:
select
left(s, 3) || regexp_split_to_table(substring(s from 4), ''), i
from (values
('105AB', 1),
('106CD', 2)
) s(s, i);
?column? | i
----------+---
105A | 1
105B | 1
106C | 2
106D | 2
For a variable number of numerics:
select n || a, i
from (
select
substring(s, '^\d{1,3}') n,
regexp_split_to_table(substring(s, '[A-Z]+'), '') a,
i
from (values
('105AB', 1),
('106CD', 2)
) s(s, i)
) s;
?column? | i
----------+---
105A | 1
105B | 1
106C | 2
106D | 2