Regex choose based on string format

I have the following data formats:
CumulativeReport_cumulativeReportBins_CumulativeBinNetworksViews_totalSuccessfulHeartbeats_1
CumulativeReport_cumulativeReportBins_CumulativeBinNetworksViews_totalSuccessfulHeartbeats__1
I am using the following regex:
^(.*)_(.*?_.*?)(_\d$|__\d$)
My requirement is always to get CumulativeBinNetworksViews_totalSuccessfulHeartbeats. For the first case it's working fine, but for the second case it's printing "totalSuccessfulHeartbeats_1". How can I solve this?

You can use
^(.*)_([^_]+_[^_]+)__?\d$
See the regex demo. Details:
^ - start of string
(.*) - Group 1: any zero or more chars other than line break chars as many as possible
_ - an underscore
([^_]+_[^_]+) - Group 2: one or more chars other than _, then an underscore, then one or more chars other than _
__? - one or two underscores
\d - a digit
$ - end of string.
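A quick way to check the pattern is to run it against both sample strings. This is a minimal sketch using Python's re module (an assumption; the question does not name a language), and the helper name extract is hypothetical:

```python
import re

# The answer's pattern: Group 2 is two underscore-free runs joined by a single "_",
# and it must be followed by one or two underscores and a final digit.
PATTERN = re.compile(r"^(.*)_([^_]+_[^_]+)__?\d$")

samples = [
    "CumulativeReport_cumulativeReportBins_CumulativeBinNetworksViews_totalSuccessfulHeartbeats_1",
    "CumulativeReport_cumulativeReportBins_CumulativeBinNetworksViews_totalSuccessfulHeartbeats__1",
]


def extract(s):
    """Return Group 2, or None when the string does not match."""
    m = PATTERN.match(s)
    return m.group(2) if m else None


for s in samples:
    print(extract(s))
```

Because [^_]+ cannot consume an underscore, the extra _ in the second string has to be matched by __? instead of ending up inside Group 2.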

Related

golang regex get the string including the search character

I am extracting a piece of a string from a string (a link):
https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8
The desired output should be 100000/100000/100095-000-A_
I am using the regex ^.*?(/[i,na,fm,d]([,/]?)(/am/ptweb/|.+=.+,))([^_]*).*?$ with the Golang flavor, and I only get group 4 with the following output: 100000/100000/100095-000-A
However, I want the underscore after the A as well.
I'm a bit stuck on this; any help is appreciated.

You can use
(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)
See the regex demo.
Details:
(/(i|na|fm|d)(/am/ptweb/|.+=.+,)) - Group 1:
/ - a / char
(i|na|fm|d) - Group 2: i, na, fm or d
(/am/ptweb/|.+=.+,) - Group 3: either /am/ptweb/, or one or more chars other than line break chars (as many as possible), then =, then one or more chars other than line break chars (as many as possible), and then a , char
([^_]*_?) - Group 4: zero or more chars other than _ and then an optional _.
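The pattern can be tried out as below. This sketch uses Python's re module rather than Go's regexp package (the question's target); the pattern only uses groups, alternation, character classes and greedy quantifiers, which both engines support, so the result should be the same, but it is worth confirming in Go before relying on it:

```python
import re

URL = (
    "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/"
    "100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"
)

# The answer's pattern; Group 4 is ([^_]*_?): everything up to the first "_",
# plus that "_" when it is present.
PATTERN = re.compile(r"(/(i|na|fm|d)(/am/ptweb/|.+=.+,))([^_]*_?)")

m = PATTERN.search(URL)
print(m.group(4))
```

Because the _ is optional, a path with no underscore still matches; Group 4 then simply stops at the end of the string.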

You can match the underscore after the A like this:
^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$
See a regex demo
A few notes about the pattern that you tried:
The notation [i,na,fm,d] is a character class that matches a single char (i, a comma, n, a, f, m or d); it should be a grouping such as (?:[id]|na|fm)
In the group ([,/]?) you optionally capture either , or /, so in theory it could match a string that contains /i//am/ptweb/
The last part .*?$ does not have to be non-greedy, as it is the last part of the pattern
This part [^_]* can also match spaces and newlines
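The character class point and the full pattern can be checked with a short Python sketch (Python's re module is an assumption; the question targets Go, and the variable names are hypothetical):

```python
import re

URL = (
    "https://arteptweb-vh.akamaihd.net/i/am/ptweb/100000/100000/"
    "100095-000-A_0_VO-STE%5BANG%5D_AMM-PTWEB_XQ.1V7rLEYkPH.smil/master.m3u8"
)

# [i,na,fm,d] is a character class: it matches exactly ONE char out of
# "i", ",", "n", "a", "f", "m", "d", so it can never match "na" or "fm".
CLASS = re.compile(r"[i,na,fm,d]")
print(bool(CLASS.fullmatch(",")))   # the comma is a member of the class
print(bool(CLASS.fullmatch("na")))  # two chars, so no match

# The grouping suggested in the answer matches the intended alternatives.
GROUP = re.compile(r"(?:[id]|na|fm)")
print(bool(GROUP.fullmatch("na")))

# The answer's full pattern; Group 4 ([^_]*_) requires the trailing "_".
PATTERN = re.compile(r"^.*?(/(?:[id]|na|fm)([,/]?)(/am/ptweb/|.+=.+,))([^_]*_).*$")
print(PATTERN.match(URL).group(4))

# ([,/]?) also lets a doubled slash through, as noted in the answer.
print(PATTERN.match("/i//am/ptweb/abc_").group(4))
```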

Need a regex to find a number and text in a filename

Have filenames in the format:
021-05-05_10-10-12-111_Nancy_Test_123456-1234_194456454390816_OD_2021042911270.pdf
I need to find “123456-1234” and OD.
In the 123456-1234 number, the ‘-’ is a wildcard, so the number can be e.g. 1234561234, 123456**1234 or 123456_1234, but there will always be 10 digits (0-9), and the wildcard (if any) will be between the 6th and 7th digit.
The “OD” can be “OD” or “OS”, ignore case.
The number and OD/OS must be moved to the beginning of the filename with a server name between them, followed by today's date after OD/OS, and OD/OS must be in uppercase.
E.g.: 123456-1234_servername1_OD_yyyy_mm_dd_ss_021-05-05_10-10-12-111_Nancy_Test_194456454390816_2021042911270.pdf
I'm using a file renaming program that will take the regex.
(I don't know if advertising is allowed on StackOverflow; if it is, I will of course provide a link to the program.)
This is what I have so far:
(?:_od_|_sd_) gives me the OD or SD
(?<!\d)\d{10}(?!\d) gives me the 1234561234, but only if there are no wildcards between the 6th and 7th digit.
Furthermore, I can't figure out how to put them together and move them in front with the server name in between.

You can use
(?i)^(.*?)_(\d{6}\D*\d{4})_(\d+)_(od|sd)_
Replace with:
$2_servername1_$4_yyyy_mm_dd_ss_$1_$3_
See the regex demo. Details:
(?i) - case insensitive mode on
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
_ - an underscore
(\d{6}\D*\d{4}) - Group 2: six digits, zero or more non-digits, four digits
_ - an underscore
(\d+) - Group 3: one or more digits
_ - an underscore
(od|sd) - Group 4: od or sd
_ - an underscore
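To check the search/replace pair outside the renaming program, here is a sketch with Python's re.sub (an assumption; the program from the question is unnamed). Python writes replacement backreferences as \g<2> instead of $2; servername1 and yyyy_mm_dd_ss are the literal placeholders from the answer, not a real server name or date, and the helper name rename is hypothetical:

```python
import re

# The answer's pattern; (?i) makes "od"/"sd" case-insensitive.
PATTERN = re.compile(r"(?i)^(.*?)_(\d{6}\D*\d{4})_(\d+)_(od|sd)_")

# $2_servername1_$4_yyyy_mm_dd_ss_$1_$3_ written with Python group references.
REPLACEMENT = r"\g<2>_servername1_\g<4>_yyyy_mm_dd_ss_\g<1>_\g<3>_"


def rename(filename):
    """Return the rearranged filename (unchanged if the pattern does not match)."""
    return PATTERN.sub(REPLACEMENT, filename, count=1)


print(rename("021-05-05_10-10-12-111_Nancy_Test_123456-1234_194456454390816_OD_2021042911270.pdf"))
```

Group 4 is copied as-is, so a lowercase od in the input stays lowercase in the output.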

Ignore Until "Spacebar+I or V or X" - Regex Expression

So... I had a regex which worked just fine (it wasn't pretty, but it worked) until the Roman numerals went above X.
Currently my Regex looks like this:
(.*?)(^(X{1,3})(I[XV]|V?I{0,3})$|^(I[XV]|V?I{1,3})$|^V$)*(.)( EP\. )(\d*)(.*)
The problem I have right now is that if the Roman numeral has a value of 10 or more, it ends up in the 1st group, which drives me nuts.
I need it to work so that everything before the Roman numeral is ignored.
Test Text:
PEPA THE PIG XVI EP. 169 - BAD ENDING
Could you please help me fix the regex so it actually does what it is supposed to do?

You should reconsider using anchors in the middle of a regex: ^ requires the start of the string and $ requires the end of the string.
Besides, the (.) before ( EP\. ) consumes the space, so the EP pattern cannot match it.
Consider using
^(.*?)\b(X{1,3}(?:I[XV]|V?I{0,3})|I[XV]|V?I{1,3}|V)\b(.)\b(EP\.)\s*(\d+)(.*)
See the regex demo. You might still need to check what exactly you want to match with (.).
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\b - a word boundary
(X{1,3}(?:I[XV]|V?I{0,3})|I[XV]|V?I{1,3}|V) - Group 2: one to three Xs followed by either IX/IV or an optional V and then zero to three Is; or IX or IV; or an optional V followed by one to three Is; or a single V
\b - a word boundary
(.) - Group 3: any one char (other than a newline)
\b - a word boundary
(EP\.) - Group 4: EP.
\s* - zero or more whitespaces
(\d+) - Group 5: one or more digits
(.*) - Group 6: any zero or more chars other than line break chars, as many as possible
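A minimal check of the pattern against the test text, using Python's re module (an assumption; the question does not name the regex engine):

```python
import re

# The answer's pattern; Group 2 holds the Roman numeral.
PATTERN = re.compile(
    r"^(.*?)\b(X{1,3}(?:I[XV]|V?I{0,3})|I[XV]|V?I{1,3}|V)\b(.)\b(EP\.)\s*(\d+)(.*)"
)

m = PATTERN.match("PEPA THE PIG XVI EP. 169 - BAD ENDING")
print(m.groups())
```

The lazy (.*?) in Group 1 stops at the first word boundary where a Roman numeral followed by the EP. part can match, so XVI no longer leaks into Group 1.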

Why my optional captured group in my regex does not work?

Here is an example of the text I usually get:
CERTIFICATION/repos_1/test_examples_1_01_C.py::test_case[6]
CERTIFICATION/repos_1/test_examples_2_01_C.py::test_case[7]
INTEGRATION/test_example_scan_1.py::test_case
INTEGRATION/test_example_scan_2.py::test_case
Here is the regex I'm using to capture 3 different groups:
^.*\/(.*)\.py.*:{2}(.*(\[.*\])?)
For the first line of my examples, I should get:
test_examples_1_01_C - test_case[6] - [6]
And for the last line:
test_example_scan_2 - test_case - None
But if you try this regex, you will find that the first example does not work: I can't get the [6]. If I remove the "?", lines that do not have "[.*]" at the end no longer match.
So, how can I get all of this information? And what am I doing wrong?
Regards

You can use
^.*\/(.*)\.py.*::(.*?(\[.*?\])?)$
See the regex demo
Details:
^ - start of string
.* - any zero or more chars other than line break chars, as many as possible
\/ - a / char
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible
\.py - .py substring
.* - any zero or more chars other than line break chars, as many as possible
:: - a :: string
(.*?(\[.*?\])?) - Group 2: any zero or more chars other than line break chars, as few as possible, and then an optional Group 3 matching [, any zero or more chars other than line break chars, as few as possible, and a ]
$ - end of string.
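The original pattern misses [6] because the greedy .* inside Group 2 consumes the brackets first; the optional Group 3 then succeeds by matching nothing, so the engine never needs to backtrack. The lazy .*? plus the $ anchor make Group 2 stop exactly where [...] can start. A Python sketch (the language is an assumption) comparing both patterns on the sample lines:

```python
import re

LINES = [
    "CERTIFICATION/repos_1/test_examples_1_01_C.py::test_case[6]",
    "CERTIFICATION/repos_1/test_examples_2_01_C.py::test_case[7]",
    "INTEGRATION/test_example_scan_1.py::test_case",
    "INTEGRATION/test_example_scan_2.py::test_case",
]

# Pattern from the question: the greedy .* in Group 2 swallows "[6]".
ORIGINAL = re.compile(r"^.*\/(.*)\.py.*:{2}(.*(\[.*\])?)")
# Pattern from the answer: lazy .*? and the $ anchor leave "[6]" for Group 3.
FIXED = re.compile(r"^.*\/(.*)\.py.*::(.*?(\[.*?\])?)$")

for line in LINES:
    print(ORIGINAL.match(line).groups())
    print(FIXED.match(line).groups())
```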

With the help of negated character classes, you can get all matches and make this regex a lot more efficient:
^.*/([^.]+)\.py::([^[]+(\[[^]]*]|))$
RegEx Demo
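The negated-class version can be checked the same way. In this Python sketch (an assumed language) the brackets inside the classes are escaped as [^\[] and [^\]], which means the same as [^[] and [^]] in the pattern:

```python
import re

# Negated-class version; the empty alternative in Group 3 lets lines without [...] match.
PATTERN = re.compile(r"^.*/([^.]+)\.py::([^\[]+(\[[^\]]*\]|))$")

for line in [
    "CERTIFICATION/repos_1/test_examples_1_01_C.py::test_case[6]",
    "INTEGRATION/test_example_scan_2.py::test_case",
]:
    print(PATTERN.match(line).groups())
```

Because of the empty alternative, Group 3 is an empty string rather than None when there is no [...] suffix. Also note that [^.]+ assumes the file name contains no other dots before .py.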

Regexp Substring From URL

I need to retrieve some words from a URL:
WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit
The result I want:
search/iphone_5s
but I'm stuck and don't really understand how to use regexp_substr to get that data.
I'm trying to use this query:
regexp_substr(web_url, '\google.com/([^}]+)\/', 1,1,null,1)
which only returns the word 'search', and when I try
regexp_substr(web_url, '\google.com/([^}]+)\&', 1,1,null,1)
I get everything up to the last '&'.

You may use REGEXP_REPLACE to match the whole string while capturing two substrings, and replace the match with two backreferences to the captured values:
SELECT REGEXP_REPLACE(
  'WebViewActivity - https://google.com/search/?term=iphone_5s&utm_source=google&utm_campaign=search_bar&utm_content=search_submit',
  '.*//google\.com/([^/]+/).*[?&]term=([^&]+).*',
  '\1\2') AS result
FROM dual;
See the regex demo and the online Oracle demo.
Pattern details
.* - any zero or more chars other than line break chars as many as possible
//google\.com/ - a //google.com/ substring
([^/]+/) - Capturing group 1: one or more chars other than / and then a /
.* - any zero or more chars other than line break chars as many as possible
[?&]term= - ? or & and a term= substring
([^&]+) - Capturing group 2: one or more chars other than &
.* - any zero or more chars other than line break chars as many as possible
NOTE: To use this approach and get an empty result if the match is not found, append |.+ at the end of the regex pattern.
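The same match-everything-and-replace idea can be tried outside Oracle. This is a Python sketch in which re.sub stands in for REGEXP_REPLACE; the helper name extract and its fallback flag are hypothetical. Like REGEXP_REPLACE, re.sub returns the input unchanged when nothing matches, which is why the |.+ trick from the NOTE helps; since Python 3.5, unmatched groups in the replacement become empty strings:

```python
import re

URL = (
    "WebViewActivity - https://google.com/search/?term=iphone_5s"
    "&utm_source=google&utm_campaign=search_bar&utm_content=search_submit"
)

# Match the whole string, capture the path segment and the term value.
PATTERN = r".*//google\.com/([^/]+/).*[?&]term=([^&]+).*"


def extract(text, fallback=True):
    """Return Group 1 + Group 2; with fallback, "|.+" turns a non-match into ""."""
    pattern = PATTERN + "|.+" if fallback else PATTERN
    return re.sub(pattern, r"\1\2", text, count=1)


print(extract(URL))
print(repr(extract("https://example.com/")))
print(extract("https://example.com/", fallback=False))
```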
