How to make regular expression correctly?


I need to extract the text between the third and fourth occurrence of "*". This is what I do:
with t as (select 'T*76031*12558*test*received percents' as txt from dual)
select regexp_replace(txt, '.*(.{4})[*][^*].*$', '\1')
from t
This returns "test", which is correct, but how do I capture any number of characters, not just 4?

This should work given the example you have used:
REGEXP_REPLACE( txt, '(^.*\*.*\*.*\*)([[:alnum:]]*)(\*.*$)', '\2')
So the SELECT would be:
WITH t
AS (SELECT 'T*76031*12558*test*received percents' AS txt FROM DUAL)
SELECT REGEXP_REPLACE( txt, '(^.*\*.*\*.*\*)([[:alnum:]]*)(\*.*$)', '\2')
FROM t;
The regex looks for:
Group 1:
Start of string. Any number of characters up to a '*'. Any further characters up to another '*'. Any further characters up to the third '*'.
Group 2:
Any alphanumeric characters
Group 3:
A '*' followed by any other characters up to the end of the string.
Replace all of the above with whatever was found in Group 2.
Hope this helps.
EDIT:
Following on from a great answer from another thread by Rob van Wijk here:
Exracting substring from given string
WITH t
AS (SELECT 'T*76031*12558*test*received percents' AS txt FROM DUAL)
SELECT REGEXP_SUBSTR( txt,'[^\*]+',1,4)
FROM t;
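The occurrence-based approach can be sanity-checked outside the database. Below is a minimal sketch in Python, using its re module as a stand-in for Oracle's engine (the helper name nth_field is made up for illustration). It also shows a caveat: because [^*]+ needs at least one character, empty fields are skipped, so the fourth match is not always the text between the third and fourth '*'.

```python
import re

def nth_field(txt, n):
    """Return the n-th (1-based) run of non-* characters, or None.

    Mirrors the idea of REGEXP_SUBSTR(txt, '[^*]+', 1, n); this is a
    Python approximation, not Oracle's regex engine.
    """
    runs = re.findall(r'[^*]+', txt)
    return runs[n - 1] if 0 < n <= len(runs) else None

print(nth_field('T*76031*12558*test*received percents', 4))  # test
# Empty second field: the runs are T, 12558, test, received percents,
# so the 4th run is no longer the text after the third '*'.
print(nth_field('T**12558*test*received percents', 4))  # received percents
```

If empty fields can occur in your data, a pattern that counts the delimiters themselves is the safer choice.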

How about the following?
^([^*]*[*]){3}([^*]*)
The first part matches three repetitions of "any run of non-* characters followed by a *"; the second part captures everything up to the next * or the end of the line.
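To see what the second group captures, here is a small sketch in Python (its re module stands in for the database engine, and the function name fourth_field is hypothetical). Because the pattern counts the stars themselves, empty fields do not shift the result.

```python
import re

# Three "field plus star" repetitions, then capture the fourth field.
PATTERN = re.compile(r'^([^*]*[*]){3}([^*]*)')

def fourth_field(txt):
    """Return the text after the 3rd '*' up to the next '*' or end of string.

    Returns None when the string contains fewer than three stars.
    """
    m = PATTERN.match(txt)
    return m.group(2) if m else None

print(fourth_field('T*76031*12558*test*received percents'))  # test
print(fourth_field('T**12558*longer value*x'))                # longer value
print(fourth_field('a*b'))                                    # None
```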

You are assuming that the last * in your text is also the fourth. If that assumption holds, then this:
\b\w*\b(?=\*[^*]*$)
will get you what you want. Note that it only matches the run of word characters directly before the last star: test in this case, or whatever word characters sit between the last two stars.
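A quick way to see how much this pattern depends on that assumption is to try it in Python, whose re module supports \b and lookahead. As far as I know Oracle's regexp functions do not support lookahead, so treat this as an illustration of the pattern's logic only. The function name is made up.

```python
import re

# A run of word characters that is followed by the LAST '*' in the string.
PATTERN = re.compile(r'\b\w*\b(?=\*[^*]*$)')

def word_before_last_star(txt):
    """Return the word characters directly before the final '*', or None."""
    m = PATTERN.search(txt)
    return m.group(0) if m else None

# Four stars: the last star is also the fourth, so we get the wanted field.
print(word_before_last_star('T*76031*12558*test*received percents'))  # test
# Five stars: the result is the field before the fifth star, not the fourth.
print(word_before_last_star('a*b*c*d*e*f'))  # e
```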

Note: 10g REGEXP_SUBSTR doesn't support returning subexpressions, see comments below.
If you are really only selecting a part of the string I recommend using REGEXP_SUBSTR instead. I don't know if it's more efficient, but it will better document your intent:
SQL> select regexp_substr('T*76031*12558*test*received percents',
'^([^*]*[*]){3}([^*]*)', 1, 1, '', 2) from dual;
REGEXP_SUBST
------------
test
Above I have used regexp provided by Pieter-Bas.
See also http://www.regular-expressions.info/oracle.html

Related

Regex match everything after first and until 2nd occurrence of a slash

Need to match everything after the first / and until the 2nd / or end of string. Given the following examples:
/US
/CA
/DE/Special1
/FR/Special 1/special2
Need the following returned:
US
CA
DE
FR
Was using this in DataStudio which worked:
^(.+?)/
However the same in BigQuery is just returning null. After trying dozens of other examples here, decided to ask myself. Thanks for your help.

For such simple extraction - consider alternative of using cheaper string functions instead of more expensive regexp functions. See an example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '/US' line UNION ALL
SELECT '/CA' UNION ALL
SELECT '/DE/Special1' UNION ALL
SELECT '/FR/Special 1/special2'
)
SELECT line, SPLIT(line, '/')[SAFE_OFFSET(1)] value
FROM `project.dataset.table`
with result
Row  line                    value
1    /US                     US
2    /CA                     CA
3    /DE/Special1            DE
4    /FR/Special 1/special2  FR
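The same split-and-index idea, sketched in Python for anyone who wants to check the logic locally. This is only an analogue of SPLIT(line, '/')[SAFE_OFFSET(1)], not BigQuery itself, and the function name second_segment is hypothetical.

```python
def second_segment(line):
    """Return the part between the first and second '/', or None.

    Python stand-in for SPLIT(line, '/')[SAFE_OFFSET(1)]: an out-of-range
    index yields None here, much like SAFE_OFFSET yields NULL.
    """
    parts = line.split('/')
    return parts[1] if len(parts) > 1 else None

for line in ['/US', '/CA', '/DE/Special1', '/FR/Special 1/special2', 'no-slash']:
    print(repr(line), '->', second_segment(line))
```

The leading slash produces an empty first element, which is why index 1 (not 0) holds the value.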

Your regex matches any 1 or more chars as few as possible at the start of a string (up to the first slash) and puts this value in Group 1. Then it consumes a / char. It does not actually match what you need.
You can use a regex in BigQuery that matches a string partially and capture the part you need to get as a result:
/([^/]+)
It will match the first occurrence of a slash followed by one or more characters other than a slash, and return the captured substring as the result.
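Both patterns can be compared side by side with a short Python sketch. Python's re is not the engine BigQuery uses, so this only illustrates the matching logic, and the function names are made up.

```python
import re

def first_segment(line):
    """Capture one or more non-slash characters after the first '/'."""
    m = re.search(r'/([^/]+)', line)
    return m.group(1) if m else None

def original_attempt(line):
    """The question's pattern: needs a '/' AFTER at least one character."""
    m = re.search(r'^(.+?)/', line)
    return m.group(1) if m else None

print(first_segment('/US'))                      # US
print(first_segment('/FR/Special 1/special2'))   # FR
print(original_attempt('/US'))                   # None: no second slash
print(original_attempt('/DE/Special1'))          # /DE (leading slash kept)
```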

Oracle Database, extract string being between two other strings

I need a regexp that, combined with regexp_substr(), would give me the word between two other specified words.
Example:
source_string => 'First Middle Last'
substring varchar2(100);
substring := regexp_substr(source_string, 'First (.*) Last'); <===
this doesn't work :(.
dbms_output.put_line(substring) ===> output should be: 'Middle'
I know it looks simple, and to be honest, at the beginning I thought the same.
But now, after spending about 3 hours searching for a solution, I give up...

It's not working because the literal strings 'First' and 'Last' are being looked for. Assuming that the strings don't all literally begin 'First' you need to find another way to represent them. You've already done this by representing 'Middle' as (.*)
The next point is that you need to extract a sub-expression (the part in parentheses); which one is returned is controlled by the 6th parameter of REGEXP_SUBSTR().
If you put these together then the following gives you what you want:
regexp_substr(source_string, '.*\s(.*)\s.*', 1, 1, 'i', 1)
An example of it working:
SQL> select regexp_substr('first middle last', '.*\s(.*)\s.*', 1, 1, 'i', 1)
2 from dual;
REGEXP
------
middle
You can also use an online regex tester to validate that 'middle' is the only captured group.
Depending on what your actual source strings look like you may not want to search for exactly spaces, but use \W (a non-word character) instead.
If you're expecting exactly three words I'd also anchor your expression to the start and end of the string: ^.*\s(.*)\s.*$
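If you want to try the anchored expression without a database, here is a minimal Python sketch (Python's re standing in for Oracle's engine; the function name middle_word is hypothetical). It also shows what happens with more than three words: the greedy leading .* swallows as much as it can, so the captured group is the second-to-last word.

```python
import re

def middle_word(s):
    """Return group 1 of ^.*\\s(.*)\\s.*$ (case-insensitive), or None."""
    m = re.search(r'^.*\s(.*)\s.*$', s, re.IGNORECASE)
    return m.group(1) if m else None

print(middle_word('first middle last'))   # middle
# With four words the greedy first .* takes 'one two', leaving 'three'.
print(middle_word('one two three four'))  # three
print(middle_word('single'))              # None
```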

If the source string always looks the same, i.e. consists of 3 elements (words), then a simple regular expression like this does the job:
SQL> with t (str) as
2 (select 'First Middle Last' from dual)
3 select regexp_substr(str, '\w+', 1, 2) result from t;
RESULT
------
Middle
SQL>
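The "take the second run of word characters" idea looks like this in Python (a sketch only; Python's re stands in for Oracle, and nth_word is a made-up name). The second call shows the limit of \w+: a hyphen is not a word character, so a hyphenated word is split.

```python
import re

def nth_word(s, n):
    """Return the n-th (1-based) run of word characters, or None."""
    words = re.findall(r'\w+', s)
    return words[n - 1] if 0 < n <= len(words) else None

print(nth_word('First Middle Last', 2))   # Middle
print(nth_word('First Mid-dle Last', 2))  # Mid
```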

The (\S*) pattern can be used with regexp_replace and regexp_substr as follows to get the middle word:
with t(str) as
(
select 'First Middle Last' from dual
)
select regexp_substr(trim(regexp_replace(str, '^(\S*)', '')),'(\S*)')
as "Result String"
from t;
Result String
-------------
Middle
The first step strips the word First; the second step drops Last by returning only the next run of non-space characters.
Or, more directly, you can use regexp_replace:
with t(str) as
(
select 'First Middle Last' from dual
)
select regexp_replace(str,'(.*) (.*) (.*)','\2')
as "Result String"
from t;
Result String
-------------
Middle
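One thing to keep in mind with the replace-based variant: when the pattern does not match, a replace returns the input unchanged instead of nothing. A small Python sketch of the same pattern shows this (Python's re is a stand-in, and the function name is hypothetical).

```python
import re

def middle_by_replace(s):
    """Replace the whole three-part match with its second group."""
    return re.sub(r'(.*) (.*) (.*)', r'\2', s)

print(middle_by_replace('First Middle Last'))  # Middle
# No two spaces, so nothing matches and the input comes back as-is.
print(middle_by_replace('OnlyOneWord'))        # OnlyOneWord
```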

How to split strings using two delimiter in Oracle 11g regexp_substr functions

I have a question about splitting a string using two delimiters.
First split on the , delimiter; then each of the resulting strings should be split on the - delimiter.
My original string: UMC12I-1234,CSM3-123,VQ,
Expected output:
UMC12I
CSM3
VQ
Each value should be returned as a separate row.
I tried this:
WITH fab_sites AS (
SELECT trim(regexp_substr('UMC12I-1234,CSM3-123,VQ,', '[^,]+', 1, LEVEL)) fab_site
FROM dual
CONNECT BY LEVEL <= regexp_count('UMC12I-1234,CSM3-123,VQ,', '[^,]+')+1
)
SELECT fab_site FROM fab_sites WHERE fab_site IS NOT NULL
-- split on the , delimiter
Output is:
UMC12I-1234
CSM3-123
VQ
How can I get my expected output? (I need to split again on the - delimiter.)

You may extract the "words" before the - with the regexp_substr using
([^,-]+)(-[^,-]+)?
The pattern will match and capture into Group 1 one or more chars other than , and -, then will match an optional sequence of - and 1+ chars other than ,and -.
See the regex demo.
Use this regexp_substr line instead of yours, with the above regex:
SELECT trim(regexp_substr('UMC12I-1234,CSM3-123,VQ,', '([^,-]+)(-[^,-]+)?', 1, LEVEL, NULL, 1)) fab_site
See the online demo
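To check what Group 1 picks up on each occurrence, the pattern can be run through Python's re module (a stand-in for Oracle here; the function name fab_sites is made up). The second call shows an assumption baked into the pattern: each item has at most one - suffix.

```python
import re

def fab_sites(csv):
    """Collect Group 1 of every match: the part before an optional -suffix."""
    return [m.group(1) for m in re.finditer(r'([^,-]+)(-[^,-]+)?', csv)]

print(fab_sites('UMC12I-1234,CSM3-123,VQ,'))  # ['UMC12I', 'CSM3', 'VQ']
# With two dashes in one item, the part after the second dash starts
# a new match and shows up as its own value.
print(fab_sites('A-1-2,B'))                   # ['A', '2', 'B']
```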

You might try this query:
WITH fab_sites AS (
SELECT TRIM(',' FROM REGEXP_SUBSTR('UMC12I-1234,CSM3-123,VQ,', '(^|,)[^,-]+', 1, LEVEL)) fab_site
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT('UMC12I-1234,CSM3-123,VQ,', '(^|,)[^,-]+')
)
SELECT fab_site
FROM fab_sites;
We start by matching any substring that starts either with the start of the whole string ^ or with a comma ,, the delimiter. We then get all the characters that match neither a comma nor a dash -. Once we have that substring we trim any leftover commas from it.
P.S. I think the +1 in the CONNECT BY clause is extraneous, as is the WHERE NOT NULL in the "outer" query.
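Here is the match-then-trim logic as a minimal Python sketch, for checking the pattern without a database (Python's re stands in for Oracle, and fab_sites_anchored is a hypothetical name). Because every match must begin at the start of the string or at a comma, extra dashes inside an item do not create additional values.

```python
import re

def fab_sites_anchored(csv):
    """Match (^|,) followed by non-comma, non-dash chars, then trim commas."""
    matches = re.finditer(r'(^|,)[^,-]+', csv)
    return [m.group(0).strip(',') for m in matches]

print(fab_sites_anchored('UMC12I-1234,CSM3-123,VQ,'))  # ['UMC12I', 'CSM3', 'VQ']
print(fab_sites_anchored('A-1-2,B'))                   # ['A', 'B']
```

The trailing comma in the input produces no match, which is consistent with dropping the +1 and the NOT NULL filter.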

Extract data outside of parentheses in oracle

I have this value: (203)1669
My requirement is to extract data which is outside of the parentheses.
I want to use Regular expression for this Oracle query.
Much appreciated!

You can use the Oracle REGEXP_REPLACE() function, and match the group which is outside the parentheses.
SELECT REGEXP_REPLACE(phone_number, '\([[:digit:]]+\)(.*)', '\1') AS newValue
FROM your_table

You can use the combination of SUBSTR and INSTR function.
select substr('(203)1669', instr('(203)1669',')')+1) from dual

This example uses REGEXP_SUBSTR() and the REGEX explicitly follows your spec of getting the 4 digits between the closing paren and the end of the line. If there could be a different number of digits, replace the {4} with a + for one or more digits:
SQL> with tbl(str) as (
select '(203)1669' from dual
)
select regexp_substr(str, '\)(\d{4})$', 1, 1, NULL, 1) nbr
from tbl;
NBR
----
1669
SQL>
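The difference between the SUBSTR/INSTR approach and the stricter regex shows up once the input is not purely digits after the parenthesis. A Python sketch of both ideas follows (an analogue only, not Oracle; both function names are made up, and the regex uses + rather than {4}).

```python
import re

def after_paren_substr(s):
    """Like SUBSTR(s, INSTR(s, ')') + 1): everything after the first ')'.

    When there is no ')', the whole string is returned, matching what
    SUBSTR does when INSTR returns 0.
    """
    pos = s.find(')')
    return s[pos + 1:] if pos != -1 else s

def after_paren_regex(s):
    """Digits between a ')' and the end of the string, or None."""
    m = re.search(r'\)(\d+)$', s)
    return m.group(1) if m else None

print(after_paren_substr('(203)1669'))   # 1669
print(after_paren_regex('(203)1669'))    # 1669
print(after_paren_substr('(203)16-69'))  # 16-69
print(after_paren_regex('(203)16-69'))   # None
```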

For the pattern you mentioned, this should work.
select
rtrim(ltrim(substr(phone_number,instr(phone_number,')')+1,length(phone_number))))
as derived_phone_no
from
(select '(123)456' as phone_number from dual union all
select '(567)99084' as phone_number from dual)
Here I first find the position of ), then take the substring from that position + 1 to the end of the string. The trim functions guard against stray whitespace around the result.

How do I remove all characters that aren't alphabetic from a string in PL/SQL?

I have a PL/SQL procedure and I need to take a string and remove all characters that aren't alphabetic. I've seen some examples and read documentation about the REGEXP_REPLACE function but can't understand how it functions.
This is not a duplicate because I need to remove punctuation, not numbers.

Either:
select regexp_replace('1A23B$%C_z1123d', '[^A-Za-z]') from dual;
or:
select regexp_replace('1A23B$%C_z1123d', '[^[:alpha:]]') from dual;
The second one takes into account possible other letters like:
select regexp_replace('123żźć', '[^[:alpha:]]') from dual;
Result:
żźć
To answer your question about how the function works: the first parameter is the source string and the second is a regular expression. Everything that matches it is replaced by the third argument (optional, NULL by default, meaning all matched characters are simply removed).
Read more about regular expressions:
http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm
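The difference between the ASCII-only class and the locale-aware one can be illustrated in Python. This is a sketch of the idea, not of Oracle's behaviour: str.isalpha() is used as a rough analogue of [[:alpha:]], and both function names are made up.

```python
import re

def strip_non_ascii_letters(s):
    """Same idea as REGEXP_REPLACE(s, '[^A-Za-z]'): keep only A-Z and a-z."""
    return re.sub(r'[^A-Za-z]', '', s)

def strip_non_letters(s):
    """Rough analogue of [^[:alpha:]]: keep anything Unicode calls a letter."""
    return ''.join(ch for ch in s if ch.isalpha())

print(strip_non_ascii_letters('1A23B$%C_z1123d'))  # ABCzd
print(strip_non_letters('123żźć'))                 # żźć
# The ASCII-only class removes the accented letters as well.
print(repr(strip_non_ascii_letters('123żźć')))     # ''
```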

You can use a regexp like this:
SELECT REGEXP_REPLACE(UPPER('xYztu-123-hello'), '[^A-Z]+', '') FROM DUAL;
also answered here for non-numeric chars

Try this:
SELECT REGEXP_REPLACE('AB$%c','[^a-zA-Z]', '') FROM DUAL;
Or
SELECT REGEXP_REPLACE( your_column, '[^a-zA-Z]', '' ) FROM your_table;
Read here for more information

We Keep Coding
