extract all numbers in a string - regex

How can I extract all numbers in a string?
Sample inputs:
7nr-6p
12c-18L
12nr-24L
11nr-12p
Expected Outputs:
{7,6}
{12,18}
{12,24}
etc...
The following is tested with the first one, 7nr-6p:
select regexp_split_to_array('7nr-6p', '[^0-9]') AS new_volume from mytable;
Gives: {7,"","",6,""} // Why is a numeric-only match returning spaces?
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
select regexp_matches('7nr-6p', '\d'::text) from mytable;
Gives: {7}
select NULLIF(regexp_replace('7nr-6p', '\D',',','g'), '')::text from mytable;
Gives: 7,,,6,

The following query:
select regexp_split_to_array(regexp_replace('7nr-6p', '^[^0-9]*|[^0-9]*$', 'g'), '[^0-9]+')
AS new_volume from mytable;
"Trims" the prefix and suffix non-numbers and splits by the remaining non-numbers.
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
Because without the 'g' flag, the regex stops at the first match.
Add the 'g' flag:
select regexp_matches('7nr-6p', '[0-9]*'::text, 'g') from mytable;

You can replace all text and then split:
SELECT regexp_split_to_array(
regexp_replace('7nr-6p', '[a-zA-Z]', '','g'),
'[^0-9]'
)
This returns {7,6}

SELECT id, (regexp_matches(string, '\d+', 'g'))[1]::int AS nr
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nr
----+----
1 | 7
1 | 6
2 | 12
2 | 18
3 | 12
3 | 24
4 | 11
4 | 12
I wanted them in a single cell so I could extract them as needed
SELECT id, trim(regexp_replace(string, '\D+', ',', 'g'), ',') AS nrs
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nrs
----+-------
1 | 7,6
2 | 12,18
3 | 12,24
4 | 11,12
dbfiddle here

Here is a more robust solution
CREATE OR REPLACE FUNCTION get_ints_from_text(TEXT) RETURNS int[] AS $$
select array_remove(regexp_split_to_array($1,'[^0-9]+','i'),'')::int[];
$$ LANGUAGE SQL IMMUTABLE;
Example
select get_ints_from_text('7nr-6p'); -- 7,6
-- also resilient in situations like
select get_ints_from_text('-7nr--6p'); -- 7,6
Here is a link to try
http://sqlfiddle.com/#!17/c6ac7/2
I feel that wrapping this functionality into an immutable function is prudent. This is a pure function, one that will not mutate data and one that returns the same result given the same input. Immutable functions marked as "immutable" have performance benefits.
By using a function we also benefit from abstraction. There is one source to update should this functionality need to improve in the future.
For more information about immutable functions see
https://www.postgresql.org/docs/10/static/sql-createfunction.html

Related

mariaDB extract only the digits from {"user":"128"}

From the string {"user":"128"}, expected only 128.
Could also be {"user":"8"} or {"user":"12"} or {"user":"128798"}
In this fiddle example the matched value should be 3
Schema:
CREATE TABLE test1 (
id INT AUTO_INCREMENT,
userid INT(11),
PRIMARY KEY (id)
);
INSERT INTO test1(userid)
VALUES
('126'),
('2457'),
('3'),
('40');
CREATE TABLE test2 (
id INT AUTO_INCREMENT,
code VARCHAR(1024),
PRIMARY KEY (id)
);
INSERT INTO test2(code)
VALUES
('{"user":"128"}'),
('{"user":"2459"}'),
('{"user":"3"}'),
('{"user":"46"}');
Query:
(SELECT sub.`userid`, rev.`code` REGEXP '[0-9]'
FROM `test1` AS sub, `test2` AS rev
WHERE sub.`userid`= rev.`code`
)
Have tried REGEXP '[0-9]' and also '[[:digit:]]' but no luck.
I have also tried concat ('',value * 1) = value or concat(value * 1) = value
SOLUTION: fiddle
REGEXP will return 0 or 1 if the regexp matches the string.
You should have a look to 'regexp_substr'
https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr
select REGEXP_SUBSTR(code, '"[0-9]+"') from test2
This works for me, after selecting MySQL 8 version in your fiddler.
it returns :
"128"
"2459"
null
"46"
or
select REGEXP_SUBSTR(code, '[0-9]+') from test2
if you want 3 instead of null for "k3"
akina suggestion of using JSON_EXTRACT is way better, but does require that you redefine the column as a JSON datatype from VARCHAR.
Use the function REGEXP_SUBSTR:
MariaDB [(none)]> SELECT REGEXP_SUBSTR('{"user":"128"}', '\\d+');
+-----------------------------------------+
| REGEXP_SUBSTR('{"user":"128"}', '\\d+') |
+-----------------------------------------+
| 128 |
+-----------------------------------------+
1 row in set (0.000 sec)

Extract Numbers from String - Custom

I'd like to extract "Most" numbers from a string and Add "JW" at the end.
My values look like:
RFID_DP_IDS339020JW3_IDMsg - Result = 339020JW
RFID_DP_IDSA72130JW_IDMsg --> 72130JW
RFID_DP_IDS337310JW1_IDMsg --> 337310JW
Basically I would remove all first letters, keep all numbers and JW
For now I had this
regexp_replace(Business_CONTEXT, '[^0-9]', '')||'JW' RegistrationPoint
But that would include the numbers AFTER 'JW'
Any idea?
How about this?
result would return exactly two letters after bunch of digits
result2 would return digits + JW
Pick the one you find the most appropriate.
SQL> with test (col) as
2 (select 'RFID_DP_IDS339020JW3_IDMsg' from dual union all
3 select 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
4 select 'RFID_DP_IDS337310JW1_IDMsg' from dual
5 )
6 select col,
7 regexp_substr(col, '\d+[[:alpha:]]{2}') result,
8 regexp_substr(col, '\d+JW') result2
9 from test;
COL RESULT RESULT2
-------------------------- -------------------------- --------------------------
RFID_DP_IDS339020JW3_IDMsg 339020JW 339020JW
RFID_DP_IDSA72130JW_IDMsg 72130JW 72130JW
RFID_DP_IDS337310JW1_IDMsg 337310JW 337310JW
SQL>
If you really want to extract the longest digit string out of your given strings you can use the following:
WITH test (Business_CONTEXT) AS
(SELECT 'RFID_DP_IDS339020JW3_I9DMsg' from dual union all
SELECT 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
SELECT 'RFID_DP_IDS337310JW1_IDMsg' from dual
)
SELECT Business_CONTEXT
, (SELECT MAX(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL))
KEEP (dense_rank last ORDER BY LENGTH(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL)))
FROM dual
CONNECT BY regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL) IS NOT NULL) num
FROM test
Result:
Business_CONTEXT | NUM
----------------------------+-----
RFID_DP_IDS339020JW3_I9DMsg | 339020
RFID_DP_IDSA72130JW_IDMsg | 72130
RFID_DP_IDS337310JW1_IDMsg | 337310

Google BigQuery - Execute dynamically generated queries from a select statement

Have a huge table in Google BigQuery with following structure (> 100 million rows):
name | departments
abc | 1,2,5,6
xyz | 4,5
pqr | 3,4,6
Want to convert the data into following format:
name | 1 | 2 | 3 | 4 | 5 | 6
abc | 1 | 1 | | | 1 | 1
xyz | | | | 1 | 1 |
pqr | | | 1 | 1 | | 1
As of now, able to generate the queries required to prepare the dataset in this format by using CONCAT and REGEX_REPLACE functions:
SELECT ' insert into dataset.output ( name, ' +
CONCAT(
'_' , replace(departments,',',',_') )
+ ' ) values( \'' + name +'\','+ REGEXP_REPLACE(departments, "([^,\n]+)", "1") +')'
FROM (
select name, departments from dataset.input )
This generates the output with the 100 M insert queries which can be used to create the data in the required structure.
However, now below are my questions:
Can we execute the output of this query (100 M insert queries) directly by using Big Query SQL or we would need to fire each insert one by one?
I believe there is no way to pivoting or transposing the data in a column with multiple comma separated values. Is that right?
Is there a more optimal way of achieving this using BigQuery SQL and not writing custom Java code?
Thanks.
Below example for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'abc' name, '1,2,5,6' departments UNION ALL
SELECT 'xyz', '4,5' UNION ALL
SELECT 'pqr', '3,4,6'
)
SELECT
name,
IF(departments LIKE '%1%', 1, 0) AS d1,
IF(departments LIKE '%2%', 1, 0) AS d2,
IF(departments LIKE '%3%', 1, 0) AS d3,
IF(departments LIKE '%4%', 1, 0) AS d4,
IF(departments LIKE '%5%', 1, 0) AS d5,
IF(departments LIKE '%6%', 1, 0) AS d6
FROM `project.dataset.table`
with result as
Row name d1 d2 d3 d4 d5 d6
1 abc 1 1 0 0 1 1
2 xyz 0 0 0 1 1 0
3 pqr 0 0 1 1 0 1
So you need to run above with destination to whatever new table you prepared
Note, above assumes you have just 6 departments and most important there is no ambiguity in numbers like 1 does not conflict with 10 for example
If you do have such case - you need transform below lines
IF(departments LIKE '%2%', 1, 0) AS d2,
into
IF(CONCAT(',', departments, ',') LIKE '%,2,%', 1, 0) AS d2 ...
And of course, you can use just one simple INSERT statement
INSERT `project.dataset.new_table` (name, d1, d2, d3, d4, d5, d6)
SELECT
name,
IF(departments LIKE '%1%', 1, 0) AS d1,
IF(departments LIKE '%2%', 1, 0) AS d2,
IF(departments LIKE '%3%', 1, 0) AS d3,
IF(departments LIKE '%4%', 1, 0) AS d4,
IF(departments LIKE '%5%', 1, 0) AS d5,
IF(departments LIKE '%6%', 1, 0) AS d6
FROM `project.dataset.table`
So, the final point of all this is:
instead of generating INSERT STATEMENT for each and every row in original table - you should generate simple SELECT statement that does "pivoting"
Update for "extreme" minimizing generated code
See an example:
#standardSQL
CREATE TEMP FUNCTION c(departments STRING, department INT64) AS (
IF(departments LIKE CONCAT('%',CAST(department AS STRING),'%'), 1, 0)
);
WITH `project.dataset.table` AS (
SELECT 'abc' name, '1,2,5,6' departments UNION ALL
SELECT 'xyz', '4,5' UNION ALL
SELECT 'pqr', '3,4,6'
), temp AS (
SELECT name, departments AS d
FROM `project.dataset.table`
)
SELECT
name,
c(d,1)d1,
c(d,2)d2,
c(d,3)d3,
c(d,4)d4,
c(d,5)d5,
c(d,6)d6
FROM temp
as you can see - now each of your 10000 lines will be like c(d,N)dN, with max in length as c(d,10000)d10000, so you have chance to fit into query size limit

using Oracle REGEXP_INSTR to find exact word

I want to return the following position from the strings using REGEXP_INSTR.
I am looking for the word car with exact match in the following strings.
car,care,oscar - 1
care,car,oscar - 6
oscar,care,car - 12
something like
SELECT REGEXP_INSTR('car,care,oscar', 'car', 1, 1) "REGEXP_INSTR" FROM DUAL;
I am not sure what kind of escape operators to use.
A simpler solution is to surround the source string and search string with commas and find the position using INSTR.
SELECT INSTR(',' || 'car,care,oscar' || ',', ',car,') "INSTR" FROM DUAL;
Example:
SQL Fiddle
with x(y) as (
SELECT 'car,care,oscar' from dual union all
SELECT 'care,car,oscar' from dual union all
SELECT 'oscar,care,car' from dual union all
SELECT 'car' from dual union all
SELECT 'cart,care,oscar' from dual
)
select y, ',' || y || ',' , instr(',' || y || ',',',car,')
from x
| Y | ','||Y||',' | INSTR(','||Y||',',',CAR,') |
|-----------------|-------------------|----------------------------|
| car,care,oscar | ,car,care,oscar, | 1 |
| care,car,oscar | ,care,car,oscar, | 6 |
| oscar,care,car | ,oscar,care,car, | 12 |
| car | ,car, | 1 |
| cart,care,oscar | ,cart,care,oscar, | 0 |
The following query handles all scenarios. It returns the starting position if the string begins with car, or the whole string is just car. It returns the starting position + 1 if ,car, is found or if the string ends with ,car to account for the comma.
SELECT
CASE
WHEN REGEXP_LIKE('car,care,oscar', '^car,|^car$') THEN REGEXP_INSTR('car,care,oscar', '^car,|^car$', 1, 1)
WHEN REGEXP_LIKE('car,care,oscar', ',car,|,car$') THEN REGEXP_INSTR('car,care,oscar', ',car,|,car$', 1, 1)+1
ELSE 0
END "REGEXP_INSTR"
FROM DUAL;
SQL Fiddle demo with the various possibilities
I like Noel his answer as it gives a very good performance! Another way around is by creating separate rows from a character separated string:
pm.nodes = 'a;b;c;d;e;f;g'
(select regexp_substr(pm.nodes,'[^;]+', 1, level)
from dual
connect by regexp_substr(pm.nodes, '[^;]+', 1, level) is not null)

posix regexp to split a table

I'm currently working on data migration in PostgreSQL. Since I'm new to posix regular expressions, I'm having some trouble with a simple pattern and would appreciate your help.
I want to have a regular expression split my table on each alphanumeric char in a column, eg. when a column contains a string 'abc' I'd like to split it into 3 rows: ['a', 'b', 'c']. I need a regexp for that
The second case is a little more complicated, I'd like to split an expression '105AB' into ['105A', '105B'], I'd like to copy the numbers at the beginning of the string and split the table on uppercase letters, in the end joining the number with exactly 1 uppercase letter.
the function I'll be using is probably regexp_split_to_table(string, regexp)
I'm intentionally providing very little data not to confuse anyone, since what I posted is the essence of the problem. If you need more information please comment.
The first was already solved by you:
select regexp_split_to_table(s, ''), i
from (values
('abc', 1),
('def', 2)
) s(s, i);
regexp_split_to_table | i
-----------------------+---
a | 1
b | 1
c | 1
d | 2
e | 2
f | 2
In the second case you don't say if the numerics are always the first tree characters:
select
left(s, 3) || regexp_split_to_table(substring(s from 4), ''), i
from (values
('105AB', 1),
('106CD', 2)
) s(s, i);
?column? | i
----------+---
105A | 1
105B | 1
106C | 2
106D | 2
For a variable number of numerics:
select n || a, i
from (
select
substring(s, '^\d{1,3}') n,
regexp_split_to_table(substring(s, '[A-Z]+'), '') a,
i
from (values
('105AB', 1),
('106CD', 2)
) s(s, i)
) s;
?column? | i
----------+---
105A | 1
105B | 1
106C | 2
106D | 2