Regex (All after first match (without the first match)) - regex

I am struggling with what should be a simple regex. Basically, I want everything after the first "_", without the "_" itself.
My current expression is like this: _(.*)
When I give input: AAA_BBB_CCC
The output is: _BBB_CCC
My ideal output would be: BBB_CCC
I am using a Snowflake database with its built-in regex functions.
Unfortunately, I cannot use (?<=_).* because Snowflake does not support the lookbehind syntax "?<=". Is there some other way to modify _(.*) to get the right output?
Thank you.

You can do this with a regular expression. In JavaScript, for example, the following does the job:
"AAA_BBB_CCC".replace(/^[^_]*_/, '')
In Snowflake, use REGEXP_REPLACE:
regexp_replace('AAA_BBB_CCC','^[^_]+_','')
https://docs.snowflake.net/manuals/sql-reference/functions/regexp_replace.html
You can also find the index of the first _ and take the substring after it, which works in most languages:
let text = "AAA_BBB_CCC"
let index = text.indexOf('_')
// keep the whole string when there is no underscore
let result = index !== -1 ? text.substring(index + 1) : text
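For comparison, here is a minimal Python sketch of the same two approaches: an anchored regex replacement and a plain split on the first underscore. Python is used only for illustration, and the helper names are hypothetical.

```python
import re

def after_first_underscore_regex(text: str) -> str:
    # Remove everything up to and including the first "_".
    # Without any "_", the pattern does not match and text is returned unchanged.
    return re.sub(r'^[^_]*_', '', text, count=1)

def after_first_underscore_split(text: str) -> str:
    # partition() splits on the first "_" only.
    head, sep, tail = text.partition('_')
    return tail if sep else text

print(after_first_underscore_regex("AAA_BBB_CCC"))
print(after_first_underscore_split("AAA_BBB_CCC"))
```

Both helpers return the input unchanged when it contains no underscore, which is usually the safer default for this kind of cleanup.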

In Snowflake SQL, you may use REGEXP_SUBSTR, its syntax is
REGEXP_SUBSTR( <string> , <pattern> [ , <position> [ , <occurrence> [ , <regex_parameters> [ , <group_num> ] ] ] ] ).
The function allows you to return captured substrings:
By default, REGEXP_SUBSTR returns the entire matching part of the subject. However, if the e (for “extract”) parameter is specified, REGEXP_SUBSTR returns the part of the subject that matches the first group in the pattern. If e is specified but a group_num is not also specified, then the group_num defaults to 1 (the first group). If there is no sub-expression in the pattern, REGEXP_SUBSTR behaves as if e was not set.
So you need to set regex_parameters to 'e' and, optionally, the group_num argument to 1:
Select REGEXP_SUBSTR('AAA_BBB_CCC', '_(.*)', 1, 1, 'e', 1)
Select REGEXP_SUBSTR('AAA_BBB_CCC', '_(.*)', 1, 1, 'e')
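The difference between the whole match and the extracted group can be illustrated outside Snowflake. The Python sketch below is only an analogy using Python's re module, not Snowflake's engine:

```python
import re

subject = "AAA_BBB_CCC"
match = re.search(r'_(.*)', subject)

# group(0) is the entire match, comparable to REGEXP_SUBSTR without 'e'.
whole_match = match.group(0)
# group(1) is the first capture group, comparable to 'e' with group_num 1.
first_group = match.group(1)

print(whole_match)
print(first_group)
```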

Use a capture group:
\_(?<data>.*)
Which returns the capture group data containing BBB_CCC
Example:
https://regex101.com/r/xZaXKR/1

To get this actually working you need to use:
SELECT REGEXP_SUBSTR('AAA_BBB_CCC', '_(.*)', 1, 1, 'e', 1);
which gives:
REGEXP_SUBSTR('AAA_BBB_CCC', '_(.*)', 1, 1, 'E', 1)
BBB_CCC
You need to pass e as the <regex_parameters> argument of REGEXP_SUBSTR, since that is what extracts sub-matches. So Wiktor's answer is 95% correct.

Related

Oracle 11g - REGEXP_REPLACE - Subexpressions/different matches

SQLFiddle: http://sqlfiddle.com/#!4/db1bd/49/0
I'm working on a query that returns an object's DN:(cn=name,ou=folder,dc=hostname,dc=com)
My goal is to return this information in a "prettier" output akin to AD:(name\folder\hostname.com)
I've accomplished this in a clunky way:
REGEXP_REPLACE(REGEXP_REPLACE(TEST, '.*CN=(.+?),DC=.*', '\1', 1, 1, 'i'), ',OU=', '\', 1, 0, 'i') -- grab everything between CN= and DC=, replace with \'s --
|| '\' ||
REGEXP_REPLACE(SUBSTR(TEST, REGEXP_INSTR(TEST, ',DC=', 1, 1, 0, 'i')+4),',DC=','.', 1, 0, 'i') -- grab everything after DC=, replace with .'s --
While that works I'm not thrilled with how overly complicated it is (and that it involves having to stitch two regex'd strings together).
I started clean and realized I was doing too much to get what I wanted and my starting point is now here:
REGEXP_REPLACE(test, '(,?(cn=|ou=)(.+?),)', '\3\')
I think I have a good understanding of how this one works but if I add an additional (...) it breaks what I already have working and returns the entire string. I've read that Oracle's regex engine is not as advanced as some others, but I'm struggling to grasp the order of how things are evaluated.
Example Input (can have multiple OUs/DCs):
cn=name,ou=subgroup,ou=group,dc=accounts,dc=hostname,dc=com
cn=name,ou=group,dc=hostname,dc=com
Expected Output
name\subgroup\group\accounts.hostname.com
name\group\hostname.com
The data coming in is dynamic and never a set number of OUs or DCs.
You may use
SELECT REPLACE(
REGEXP_REPLACE(
test,
'(^|,)(cn|ou)=([^,]*)(,dc=)?',
'\3\\'),
',dc=',
'.')
FROM regexTest
See the SQLFiddle.
The first (^|,)(cn|ou)=([^,]*)(,dc=)? regex matches , or start of string, then cn or ou, then =, then captures into Group 3 zero or more chars other than a comma, and then matches an optional ,dc= substring (thus, removing the first instance of ,dc=). The replacement is Group 3 contents and a backslash.
The second operation is then easy: just replace all remaining ,dc= with a dot. You do not even need a regex for this.
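To check the two-step logic outside Oracle, here is a rough Python sketch that applies the same pattern and the same plain replacement. The function name dn_to_path is hypothetical, and Python's regex engine is assumed to behave like Oracle's for this particular pattern.

```python
import re

def dn_to_path(dn: str) -> str:
    # Step 1: turn each cn=/ou= component into its value followed by a
    # backslash, and swallow the first ",dc=" right after the last of them.
    step1 = re.sub(r'(^|,)(cn|ou)=([^,]*)(,dc=)?', r'\3\\', dn)
    # Step 2: the remaining ",dc=" separators become dots (no regex needed).
    return step1.replace(',dc=', '.')

print(dn_to_path('cn=name,ou=subgroup,ou=group,dc=accounts,dc=hostname,dc=com'))
print(dn_to_path('cn=name,ou=group,dc=hostname,dc=com'))
```

Like the SQL above, this sketch is case-sensitive, so it expects lowercase cn, ou and dc.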
Maybe something like this:
SELECT nvl(regexp_replace(
regexp_replace(
nullif(
regexp_replace(test, '^cn=(.+?),DC=(.+?)$', '\1 \2',1,1,'i')
, test
) , ' |,(CN|OU)=', '\\', 1, 0,'i'
), ',DC=', '.', 1, 0,'i'
),test) result
FROM regexTest
This query does not change the input if there is no DC=.

Replace pair of % in oracle

I have these texts in an Oracle table (as 2 records):
"Sample text with replace parameter %1%"
"You reached 90% of your limit"
I need to replace %1% with specific text from an input parameter in an Oracle function. In fact, there can be more than one replacement parameter; I also have a record with "Replace this %12% with real value".
This is the functionality I have programmed:
IF poc > 0 THEN
FOR i in 1 .. poc LOOP
p := get_param(mString => mbody);
mbody := replace(mbody,
'%' || p || '%', parameters(to_number(p, '99')));
END LOOP;
END IF;
But in this case I have a problem with text number 2. The code also tries to replace "90%", and then I get this error:
ORA-06502: PL/SQL: numeric or value error: NULL index table key value
Is it possible to avoid trying to replace "90%"? Many thanks for any advice.
Best regards
PS: Oracle version: 10g (OCI Version: 10.2)
Regular expressions can work here. Try the following and build them into your script.
SELECT REGEXP_REPLACE( 'Sample text with replace parameter %1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
and
SELECT REGEXP_REPLACE( 'Sample text with replace parameter 1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
The pattern is pretty simple; look for patterns where a '%' is followed by 1 or more numbers followed by a '%'.
The only issue here will be if you have more than one replacement to make in each string and each replacement is different. In that case you will need to loop over the string, replacing the next parameter each time. To do this, add the position and occurrence parameters to REGEXP_REPLACE after the replacement string (position must be at least 1), e.g.
REGEXP_REPLACE( 'Sample text with replace parameter %88888888888%','\%[0-9]+\%','db_size',1,1 )
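As an illustration of the matching rule only, here is a small Python sketch that replaces each %N% placeholder in a single pass using a callback instead of a loop. This is a different technique from the PL/SQL loop, and fill_placeholders and the parameters dictionary are hypothetical names.

```python
import re

def fill_placeholders(text: str, parameters: dict) -> str:
    def substitute(match):
        key = int(match.group(1))
        # Leave the placeholder untouched when no value is known for it.
        return parameters.get(key, match.group(0))
    # Only digits enclosed by two percent signs count as a placeholder,
    # so a lone "90%" is never touched.
    return re.sub(r'%(\d+)%', substitute, text)

params = {1: 'db_size', 12: 'real value'}
print(fill_placeholders('Sample text with replace parameter %1%', params))
print(fill_placeholders('You reached 90% of your limit', params))
```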
The error is raised at parameters(to_number(p, '99')). Can you please check the value of p?
Also, if p = 90, then REPLACE will not try to replace "90%"; it will replace "%90%". How did you confirm that it is trying to replace "90%"?

How do I convert this pcre regex to be used with Oracle's REGEXP_SUBSTR?

I have this pcre regular expression that I want to port to an Oracle-supported regex:
^.*pdf_(\w+-\w+).*$
It is designed to match only the part that was bolded in the original post, i.e. the text after pdf_:
roundBox indent pdf_placement
pdf_grade
indent pdf_placement1 roundBox
What is the equivalent expression in Oracle's regex syntax?
Edit:
I tried what was suggested by sln in the comments:
SELECT REGEXP_SUBSTR(class, '^.*pdf_(\w+(?:-\w+)*).*$') FROM ...
And all I'm getting is the entire value returned, not just the match:
roundBox indent pdf_placement
instead of
placement
The expression I ended up going with was:
pdf_(\w+(-\w*)*)
In full, the SELECT clause looked like this:
SELECT REGEXP_SUBSTR(class, 'pdf_(\w+(-\w*)*)', 1, 1, 'i', 1) FROM ...
You could take the approach of replacing what's unwanted:
with t (txt) as (
select 'roundBox indent pdf_placement' from dual union all
select 'PDF_grade' from dual union all
select 'indent pdf_placement1 roundBox' from dual
) -- end of sample data
select regexp_replace(txt, '^.*pdf_(\w+).*$', '\1', 1, 0, 'i')
from t;
REGEXP_REPLACE(TXT,'^.*PDF_(\W
--------------------------------------------------------------------------------
placement
grade
placement1
I used the parameter 'i' to make it case insensitive and work with capital letters PDF as well. Feel free to play with it as needed.
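The "replace what's unwanted" idea can be tried outside Oracle too. Here is a minimal Python sketch under the assumption that Python's re behaves like Oracle's engine for this pattern; the helper name pdf_suffix is hypothetical.

```python
import re

def pdf_suffix(txt: str) -> str:
    # Match the whole string and keep only the word characters after "pdf_".
    # re.IGNORECASE plays the role of Oracle's 'i' match parameter.
    return re.sub(r'^.*pdf_(\w+).*$', r'\1', txt, flags=re.IGNORECASE)

for sample in ('roundBox indent pdf_placement',
               'PDF_grade',
               'indent pdf_placement1 roundBox'):
    print(pdf_suffix(sample))
```

One caveat of this approach: a string that does not contain pdf_ at all is returned unchanged rather than as an empty result.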

Oracle: Extract number from String

I've reviewed this question and I'm wondering why my output seems to be a little skewed.
From my understanding, REGEXP_REPLACE takes the string whose content you want to replace, followed by a pattern to match, and then anything that does not match that pattern is replaced with the substitution parameter.
I've written the following function to extract distance from a text field, in which a spatial query will be performed on the result.
CREATE OR REPLACE FUNCTION extract_distance
(
p_search_string VARCHAR2
)
RETURN VARCHAR2
IS
l_distance VARCHAR2(25);
BEGIN
SELECT REGEXP_REPLACE(UPPER(p_search_string), '(([0-9]{0,4}) ?MILES)', '')
INTO l_distance FROM SYS.DUAL;
RETURN l_distance;
END extract_distance;
When I run this in a block to test:
DECLARE
l_output VARCHAR2(25);
BEGIN
l_output := extract_distance('Stores selling COD4 in 400 Miles');
DBMS_OUTPUT.PUT_LINE(l_output);
END;
I'd expect the output 400 miles but in fact I get Stores selling COD4 in. Where have I gone wrong?
"REGEXP_REPLACE extends the functionality of the REPLACE function by letting you search a string for a regular expression pattern. By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string." (from the Oracle documentation)
You could use, e.g.,
SELECT REGEXP_REPLACE('Stores selling COD4 in 400 Miles', '^.*?(\d+ ?MILES).*$', '\1', 1, 0, 'i') FROM DUAL;
or alternatively
SELECT REGEXP_SUBSTR('Stores selling COD4 in 400 Miles', '(\d+ ?MILES)', 1, 1, 'i') FROM DUAL;
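The difference between replacing the match and extracting it can be seen in a short Python sketch. It is an illustration with Python's re module, not Oracle's engine, and the variable names are arbitrary.

```python
import re

text = 'Stores selling COD4 in 400 Miles'

# Replacing the match removes the distance and keeps everything else,
# which is what the original function did.
removed = re.sub(r'[0-9]{0,4} ?MILES', '', text.upper())

# Searching for the match returns the distance itself.
found = re.search(r'\d+ ?MILES', text, flags=re.IGNORECASE)
distance = found.group(0) if found else None

print(repr(removed))
print(distance)
```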
You'll want to use REGEXP_SUBSTR, which returns the substring that matches the regular expression.

Replacing the first vowel-consonant occurrence with consonant-vowel using sub in R

I know that it should be something like this but definitely I am missing something in the syntax:
yy=sub(r'\b[aeiou][^aeiou]*',r'\b[^aeiou][aeiou]*',"abmmmm")
I expect to have "bammmm" as output
Error: unexpected string constant in "yy=sub(r'\b[aeiou][^aeiou]*'"
I am not sure how is the exact syntax.
Please run your code in RStudio or any R compiler. I am new to regex and you giving me Python code wouldn't help me to understand the situation. Thanks!
This is what you want
yy=sub("\\b([aeiou])([^aeiou])","\\2\\1","abmmmm")
I'll explain how it works:
Substituting "any vowel-consonant" with "any consonant-vowel" doesn't make much sense on its own: should ab become ba, ce, or da? It could be any of them, because you never specified a relationship between the letters in the matched pair and the letters in the replacement. That is why the second argument of sub is not treated as a regular expression.
To achieve what you asked for, add parentheses to the regular expression in the first argument. The first ( marks group 1, the second ( marks group 2, and so on (group 0 is the whole matched string). You can then use \1, \2, ... in the second argument (written "\\1", "\\2" inside an R string literal) to insert the matched groups there.
As an alternative to using a regular expression for this, there's a nice string reversal function in example(strsplit)
> strReverse <- function(x)
sapply(lapply(strsplit(x, NULL), rev), paste, collapse="")
> dd <- "abmmmm"
> paste(strReverse(substr(dd, 1, 2)), substr(dd, 3, nchar(dd)), sep = "")
[1] "bammmm"