Get the first occurrence of a string in a variable REGEX - regex

I have the following variable in a database: PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT and I want to split it into two variables, the first will be PSC-CAMPO-GRANDE-I08 and the second V00-C09-H09-IPRMKT.
I tried the regex .*(\-I).*(\-V), but it doesn't work. Then I tried .*(\-I), but it matches up to the last -I, the one in -IPRMKT.
So my question is: is there a way to split the string PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT at the first occurrence of -I?

This should do the trick:
regex = r"(.*?-I[\d]{2})-(.*)"
Here is a test script in Python:
import re
# Raw string so that \d reaches the regex engine unchanged
regex = r"(.*?-I[\d]{2})-(.*)"
match = re.search(regex, "PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT")
if match:
    print("yep")
    print(match.group(1))  # PSC-CAMPO-GRANDE-I08
    print(match.group(2))  # V00-C09-H09-IPRMKT
else:
    print("nope")
In the regex, I'm capturing everything up to the first -I followed by 2 digits. Then I match, but don't capture, a -. Then I capture the rest. I can help tweak it if you have more logic that you are trying to implement.

You may use
^(.*?-I[^-]*)-(.*)
See the regex demo
Details:
^ - start of a string
(.*?-I[^-]*) - Group 1:
.*? - any 0+ chars other than line break chars, as few as possible (because *? is a lazy quantifier, the match stops at the first occurrence of the subsequent subpattern)
-I - a literal substring -I
[^-]* - any 0+ chars other than a hyphen (your pattern was missing it)
- - a hyphen
(.*) - Group 2: any 0+ chars other than line break chars up to the end of a line.
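As a quick check of the breakdown above, here is a minimal Python sketch. The function name split_at_first_i is made up for illustration, and the greedy pattern at the end is the attempt from the question, shown only for contrast.

```python
import re

# Pattern from the answer: lazy prefix up to the first "-I", the rest of
# that segment ([^-]*), a separating hyphen, then everything else.
PATTERN = re.compile(r"^(.*?-I[^-]*)-(.*)")


def split_at_first_i(value):
    """Return (head, tail) split after the first "-I.." segment, or None."""
    match = PATTERN.search(value)
    return match.groups() if match else None


source = "PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT"
head, tail = split_at_first_i(source)
print(head)  # PSC-CAMPO-GRANDE-I08
print(tail)  # V00-C09-H09-IPRMKT

# For contrast: the greedy .* from the question backtracks only as far as
# the LAST "-I", so the match runs into -IPRMKT.
greedy = re.search(r".*(-I)", source)
print(greedy.group(0))  # PSC-CAMPO-GRANDE-I08-V00-C09-H09-I
```

The only difference that matters here is .*? versus .*: the lazy form stops expanding as soon as the rest of the pattern can match.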

Related

Regex: match patterns starting from the end of string

I wish to match a filename with column and line info, eg.
\path1\path2\a_file.ts:17:9
//what i want to achieve:
match[1]: a_file.ts
match[2]: 17
match[3]: 9
This string can have garbage before and after the pattern, like
(at somewhere: \path1\path2\a_file.ts:17:9 something)
What I have now is this regex, which manages to match the line and column, but I got stuck on the filename capturing part. I guess a negative lookahead is the way to go, but it seems to match all previous groups and the garbage text at the end of the string.
(?!.*[\/\\]):(\d+):(\d+)\D*$
Here's a link to current implementation regex101
You can replace the lookahead with a negated character class:
([^\/\\]+):(\d+):(\d+)\D*$
See the regex demo. Details:
([^\/\\]+) - Group 1: one or more chars other than / and \
: - a colon
(\d+) - Group 2: one or more digits
: - a colon
(\d+) - Group 3: one or more digits
\D*$ - zero or more non-digit chars till end of string.
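To see the three groups in practice, here is a small Python sketch using the negated character class pattern. The helper name parse_location is an assumption for illustration, and converting the numbers to int is an extra step not in the original answer.

```python
import re

# Group 1: file name (no / or \), Group 2: line, Group 3: column.
LOCATION = re.compile(r"([^/\\]+):(\d+):(\d+)\D*$")


def parse_location(text):
    """Return (filename, line, column) or None if no location is found."""
    match = LOCATION.search(text)
    if match is None:
        return None
    filename, line, column = match.groups()
    return filename, int(line), int(column)


# Raw strings keep the backslashes of the Windows-style path intact.
print(parse_location(r"\path1\path2\a_file.ts:17:9"))
# ('a_file.ts', 17, 9)
print(parse_location(r"(at somewhere: \path1\path2\a_file.ts:17:9 something)"))
# ('a_file.ts', 17, 9)
```

Because [^/\\]+ cannot cross a path separator, the match for Group 1 can only begin after the last \ that precedes the :line:column part.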

Why my optional captured group in my regex does not work?

Here is a text example that I will usually get:
CERTIFICATION/repos_1/test_examples_1_01_C.py::test_case[6]
CERTIFICATION/repos_1/test_examples_2_01_C.py::test_case[7]
INTEGRATION/test_example_scan_1.py::test_case
INTEGRATION/test_example_scan_2.py::test_case
Here is the regex I'm using to capture 3 different groups:
^.*\/(.*)\.py.*:{2}(.*(\[.*\])?)
If we take an example with the first line of my examples I should get:
test_examples_1_01_C - test_case[6] - [6]
And for the last line:
test_example_scan_2 - test_case - None
But if you try this regex, you will find that the first example does not work: I can't get the [6]. If you remove the "?", lines that do not have "[.*]" at the end no longer match.
So, how can I get all of this information? And what am I doing wrong?
Regards
You can use
^.*\/(.*)\.py.*::(.*?(\[.*?\])?)$
See the regex demo
Details:
^ - start of string
.* - any zero or more chars other than line break chars, as many as possible
\/ - a / char
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible
\.py - .py substring
.* - any zero or more chars other than line break chars, as many as possible
:: - a :: string
(.*?(\[.*?\])?) - Group 2: any zero or more chars other than line break chars, as few as possible, and then an optional Group 3 matching [, any zero or more chars other than line break chars, as few as possible, and a ]
$ - end of string.
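The following Python sketch compares the pattern from the question with the one explained above, on two of the sample lines. It suggests why the optional group stayed empty: the greedy .* in Group 2 consumes [6] first, and since the trailing group is optional, the engine has no reason to backtrack. The names ORIGINAL and FIXED are labels chosen for this example.

```python
import re

# Pattern from the question: greedy .* inside Group 2, no end anchor.
ORIGINAL = re.compile(r"^.*\/(.*)\.py.*:{2}(.*(\[.*\])?)")
# Pattern from the answer: lazy .*? plus $ forces Group 3 to be tried.
FIXED = re.compile(r"^.*\/(.*)\.py.*::(.*?(\[.*?\])?)$")

lines = [
    "CERTIFICATION/repos_1/test_examples_1_01_C.py::test_case[6]",
    "INTEGRATION/test_example_scan_2.py::test_case",
]

for line in lines:
    print("original:", ORIGINAL.match(line).groups())
    print("fixed:   ", FIXED.match(line).groups())
# original: ('test_examples_1_01_C', 'test_case[6]', None)
# fixed:    ('test_examples_1_01_C', 'test_case[6]', '[6]')
# original: ('test_example_scan_2', 'test_case', None)
# fixed:    ('test_example_scan_2', 'test_case', None)
```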
With the help of a negated character class you can get all matches and make this regex a lot more efficient:
^.*/([^.]+)\.py::([^[]+(\[[^]]*]|))$
RegEx Demo

Regexp Substring From URL

I need to retrieve some words from a URL:
WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit
The result I want is:
search/iphone_5s
but I'm stuck and don't really understand how to use regexp_substr to get that data.
I tried this query:
regexp_substr(web_url, '\google.com/([^}]+)\/', 1,1,null,1)
which only returns the word 'search', and when I try
regexp_substr(web_url, '\google.com/([^}]+)\&', 1,1,null,1)
I get everything up to the last '&'.
You may use a REGEXP_REPLACE to match the whole string but capture two substrings and replace with two backreferences to the capture group values:
REGEXP_REPLACE(
'WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit',
'.*//google\.com/([^/]+/).*[?&]term=([^&]+).*',
'\1\2')
See the regex demo and the online Oracle demo.
Pattern details
.* - any zero or more chars other than line break chars as many as possible
//google\.com/ - a //google.com/ substring
([^/]+/) - Capturing group 1: one or more chars other than / and then a /
.* - any zero or more chars other than line break chars as many as possible
[?&]term= - ? or & and a term= substring
([^&]+) - Capturing group 2: one or more chars other than &
.* - any zero or more chars other than line break chars as many as possible
NOTE: To use this approach and get an empty result if the match is not found, append |.+ at the end of the regex pattern.
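The same capture-and-replace logic can be tried outside the database. Below is a rough Python sketch, not Oracle code: it uses re.sub in place of REGEXP_REPLACE, the function name extract_path_and_term is invented for illustration, and it assumes Python 3.5 or later, where unmatched groups in a replacement become empty strings.

```python
import re

# Same pattern as in the REGEXP_REPLACE call, with the |.+ fallback from
# the NOTE appended so that non-matching input collapses to "".
PATTERN = re.compile(
    r".*//google\.com/([^/]+/).*[?&]term=([^&]+).*"
    r"|.+"
)


def extract_path_and_term(web_url):
    """Return '<first path segment>/<term value>', or '' if not found."""
    return PATTERN.sub(r"\1\2", web_url)


url = (
    "WebViewActivity - https://google.com/search/?term=iphone_5s"
    "&utm_source=google&utm_campaign=search_bar&utm_content=search_submit"
)
print(extract_path_and_term(url))  # search/iphone_5s
print(repr(extract_path_and_term("WebViewActivity - no url")))  # ''
```

Regex flavors differ in details, so the pattern should still be verified against Oracle itself before relying on it.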

How to use Ruby gsub with regex to do partial string substitution

I have a pipe delimited file which has a line
H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||
I want to substitute the date (28092017) with a regex "[0-9]{8}" if the first character is "H"
I tried the following example to test my understanding, where I'm trying to substitute "a" with "i".
str = "|123||a|"
str.gsub /\|(.*?)\|(.*?)\|(.*?)\|/, "\|\\1\|\|\\1\|i\|"
But this gives the output
"|123||123|i|"
Any clue how this can be achieved?
You may replace the first occurrence of 8 digits inside pipes if a string starts with H using
s = "H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||"
p s.gsub(/\A(H.*?\|)[0-9]{8}(?=\|)/, '\100000000')
# or
p s.gsub(/\AH.*?\|\K[0-9]{8}(?=\|)/, '00000000')
See the Ruby demo. Here, the value is replaced with 8 zeros.
Pattern details
\A - start of string (^ is the start of a line in Ruby)
(H.*?\|) - Capturing group 1 (you do not need it when using the variation with \K): H, then any 0+ chars as few as possible, and then a |
\K - match reset operator that discards the text matched so far
[0-9]{8} - eight digits
(?=\|) - the next char must be |, but it is not added to the match value since it is a positive lookahead that does not consume text.
The \1 in the first gsub is a replacement backreference to the value in Group 1.
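For comparison only, here is a Python sketch of the first variant. It is not part of the Ruby answer, and the name mask_header_date is hypothetical. Python's re module has no \K, so only the capturing-group form is shown, and the replacement uses \g<1> because Python would not read \100000000 as Group 1 followed by zeros.

```python
import re

# \A anchors at the start of the string; the lookahead (?=\|) checks for
# the closing pipe without consuming it.
HEADER_DATE = re.compile(r"\A(H.*?\|)[0-9]{8}(?=\|)")


def mask_header_date(line):
    """Replace the first pipe-delimited 8-digit field of an H line."""
    # \g<1> is the unambiguous way to write "Group 1, then literal digits".
    return HEADER_DATE.sub(r"\g<1>00000000", line, count=1)


s = "H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||"
print(mask_header_date(s))
# H||CUSTCHQH2H||PHPCCIPHP|1010032000|00000000|25001853||||
print(mask_header_date("D|28092017|"))
# D|28092017|
```

The 10-digit field 1010032000 is skipped because the eight digits must sit directly between two pipes.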

Looking for regex to match before and after a number

Given the string
170905-CBM-238.pdf
I'm trying to match 170905-CBM and .pdf so that I can replace/remove them and be left with 238.
I've searched and found pieces that work but can't put it all together.
This-> (.*-) will match the first section and
This-> (.[^/.]+$) will match the last section
But I can't figure out how to tie them together so that it matches everything before, including the second dash and everything after, including the period (or the extension) but does not match the numbers between.
help :) and thank you for your kind consideration.
There are several options to achieve what you need in Nintex.
If you use Extract operation, use (?<=^.*-)\d+(?=\.[^.]*$) as Pattern.
See the regex demo.
Details
(?<=^.*-) - a positive lookbehind requiring, immediately to the left of the current location, the start of string (^), then any 0+ chars other than LF as many as possible up to the last occurrence of - and the subsequent subpatterns
\d+ - 1 or more digits
(?=\.[^.]*$) - a positive lookahead requiring, immediately to the right of the current location, the presence of a . and 0+ chars other than . up to the end of the string.
If you use Replace text operation, use
Pattern: ^.*-([0-9]+)\.[^.]+$
Replacement text: $1
See another regex demo (the Context tab shows the result of the replacement).
Details
^ - a start of string anchor
.* - any 0+ chars other than LF up to the last occurrence of the subsequent subpatterns...
- - a hyphen
([0-9]+) - Group 1: one or more ASCII digits
\. - a literal .
[^.]+ - 1 or more chars other than .
$ - end of string.
The replacement $1 references the value stored in Group 1.
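The Replace text approach can be sanity-checked outside Nintex. This is a small Python sketch under that assumption, with a made-up helper name extract_number. The Extract pattern is not reproduced because Python's re module only accepts fixed-width lookbehinds, so (?<=^.*-) would be rejected there.

```python
import re

# Everything up to the last hyphen, the digits to keep (Group 1),
# then a dot and the extension up to the end of the string.
NUMBER_BEFORE_EXT = re.compile(r"^.*-([0-9]+)\.[^.]+$")


def extract_number(filename):
    """Return the digits between the last hyphen and the extension.

    If the name does not match, it is returned unchanged.
    """
    return NUMBER_BEFORE_EXT.sub(r"\1", filename)


print(extract_number("170905-CBM-238.pdf"))  # 238
print(extract_number("readme.txt"))          # readme.txt
```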
I don't know Nintex regex, but here is a sed-style regex:
$ echo "170905-CBM-238.pdf" | sed -E 's/^.*-([0-9]*)\.[^.]*$/\1/'
238
Same works in Perl:
$ echo "170905-CBM-238.pdf" | perl -pe 's/^.*-([0-9]*)\.[^.]*$/$1/'
238