Regex hard to find - regex

I'd like to build a regex which will be run on the following elements:
test.stuff;visibility:=reexport,
test.stuff;bundle-version="0.0.0";visibility:=reexport,
test.stuff;bundle-version="0.0.0"
test.stuff,
test.stuff
The aim of my regex is to either replace bundle-version="0.0.0" by bundle-version="1.2.3" or to add bundle-version="1.2.3".
After replacement it should produce the following elements:
test.stuff;bundle-version="1.2.3";visibility:=reexport,
test.stuff;bundle-version="1.2.3";visibility:=reexport,
test.stuff;bundle-version="1.2.3"
test.stuff;bundle-version="1.2.3",
test.stuff;bundle-version="1.2.3"
Currently I have the following regex:
(test.*?)([;]+bundle.*?)?([;,]+.*)
With this replacement pattern:
$1;bundle-version="1.2.3"$3
But it doesn't work for these two:
test.stuff;bundle-version="0.0.0" --> becomes test.stuff;bundle-version="1.2.3";bundle-version="0.0.0"
test.stuff --> not matched
Any help would be greatly appreciated, thanks!
EDIT: the regex should only match lines starting with "test.stuff"

This worked for me in C#/LinqPad:
string s = #"test.stuff;visibility:=reexport,
test.stuff;bundle-version=""0.0.0"";visibility:=reexport,
test.stuff;bundle-version=""0.0.0""
test.stuff,
test.stuff";
string pat = "(test[^;,\n]*)([;,]+bundle[^;,\n]*)?([;,]*.*)?";
string rep ="$1;bundle-version=\"1.2.3\"$3";
string result = Regex.Replace(s,pat,rep)
Edit: added \n to first group to avoid capturing a line after last "test.stuff" occurrence.

I would do it in two times:
1. replace bundle-version="0.0.0" par bundle-version="1.2.3"
2. replace stuff(?!;bun) par stuff;bundle-version="1.2.3"

Related

Look for any character that surrounds one of any character including itself

I am trying to write a regex code to find all examples of any character that surrounds one of any character including itself in the string below:
b9fgh9f1;2w;111b2b35hw3w3ww55
So ‘b2b’ and ‘111’ would be valid, but ‘3ww5’ would not be.
Could someone please help me out here?
Thanks,
Nikhil
You can use this regex which will match three characters where first and third are same using back reference, where as middle can be any,
(.).\1
Demo
Edit:
Above regex will only give you non-overlapping matches but as you want to get all matches that are even overlapping, you can use this positive look ahead based regex which doesn't consume the next two characters instead groups them in group2 so for your desired output, you can append characters from group1 and group2.
(.)(?=(.\1))
Demo with overlapping matches
Here is a Java code (I've never programmed in Ruby) demonstrating the code and the same logic you can write in your fav programming language.
String s = "b9fgh9f1;2w;111b2b35hw3w3ww55";
Pattern p = Pattern.compile("(.)(?=(.\\1))");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1) + m.group(2));
}
Prints all your intended matches,
111
b2b
w3w
3w3
w3w
Also, here is a Python code that may help if you know Python,
import re
s = 'b9fgh9f1;2w;111b2b35hw3w3ww55'
matches = re.findall(r'(.)(?=(.\1))',s)
for m in re.findall(r'(.)(?=(.\1))',s):
print(m[0]+m[1])
Prints all your expected matches,
111
b2b
w3w
3w3
w3w

Python Regex - How to extract the third portion?

My input is of this format: (xxx)yyyy(zz)(eee)fff where {x,y,z,e,f} are all numbers. But fff is optional though.
Input: x = (123)4567(89)(660)
Expected output: Only the eeepart i.e. the number inside 3rd "()" i.e. 660 in my example.
I am able to achieve this so far:
re.search("\((\d*)\)", x).group()
Output: (123)
Expected: (660)
I am surely missing something fundamental. Please advise.
Edit 1: Just added fff to the input data format.
You could find all those matches that have round braces (), and print the third match with findall
import re
n = "(123)4567(89)(660)999"
r = re.findall("\(\d*\)", n)
print(r[2])
Output:
(660)
The (eee) part is identical to the (xxx) part in your regex. If you don't provide an anchor, or some sequencing requirement, then an unanchored search will match the first thing it finds, which is (xxx) in your case.
If you know the (eee) always appears at the end of the string, you could append an "at-end" anchor ($) to force the match at the end. Or perhaps you could append a following character, like a space or comma or something.
Otherwise, you might do well to match the other parts of the pattern and not capture them:
pattern = r'[0-9()]{13}\((\d{3})\)'
If you want to get the third group of numbers in brackets, you need to skip the first two groups which you can do with a repeating non-capturing group which looks for a set of digits enclosed in () followed by some number of non ( characters:
x = '(123)4567(89)(660)'
print(re.search("(?:\(\d+\)[^(]*){2}(\(\d+\))", x).group(1))
Output:
(660)
Demo on rextester

Regex: Separate a string of characters with a non-consistent pattern (Oracle) (POSIX ERE)

EDIT: This question pertains to Oracle implementation of regex (POSIX ERE) which does not support 'lookaheads'
I need to separate a string of characters with a comma, however, the pattern is not consistent and I am not sure if this can be accomplished with Regex.
Corpus: 1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25
The pattern is basically 4 digits, followed by 4 characters, followed by a dot, followed by 1,2, or 3 digits! To make the string above clear, this is how it looks like separated by a space 1710ABCD.13 1711ABCD.43 1711ABCD.4 1711ABCD.404 1711ABCD.25
So the output of a replace operation should look like this:
1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
I was able to match the pattern using this regex:
(\d{4}\w{4}\.\d{1,3})
It does insert a comma but after the third digit beyond the dot (wrong, should have been after the second digit), but I cannot get it to do it in the right position and globally.
Here is a link to a fiddle
https://regex101.com/r/qQ2dE4/329
All you need is a lookahead at the end of the regular expression, so that the greedy \d{1,3} backtracks until it's followed by 4 digits (indicating the start of the next substring):
(\d{4}\w{4}\.\d{1,3})(?=\d{4})
^^^^^^^^^
https://regex101.com/r/qQ2dE4/330
To expand on #CertainPerformance's answer, if you want to be able to match the last token, you can use an alternative match of $:
(\d{4}\w{4}\.\d{1,3})(?=\d{4}|$)
Demo: https://regex101.com/r/qQ2dE4/331
EDIT: Since you now mentioned in the comment that you're using Oracle's implementation, you can simply do:
regexp_replace(corpus, '(\d{1,3})(\d{4})', '\1,\2')
to get your desired output:
1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
Demo: https://regex101.com/r/qQ2dE4/333
In order to continue finding matches after the first one you must use the global flag /g. The pattern is very tricky but it's feasible if you reverse the string.
Demo
var str = `1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25`;
// Reverse String
var rts = str.split("").reverse().join("");
// Do a reverse version of RegEx
/*In order to continue searching after the first match,
use the `g`lobal flag*/
var rgx = /(\d{1,3}\.\w{4}\d{4})/g;
// Replace on reversed String with a reversed substitution
var res = rts.replace(rgx, ` ,$1`);
// Revert the result back to normal direction
var ser = res.split("").reverse().join("");
console.log(ser);

Regex to match everything from nth occurence of character onwards [duplicate]

i am trying to build one regex expression for the below sample text in which i need to replace the bold text. So far i could achieve this much
((\|)).*(\|) which is selecting the whole string between the first and last pip char. i am bound to use apache or java regex.
Sample String: where text length between pipes may vary
1.1|ProvCM|111111111111|**10.15.194.25**|10.100.10.3|10.100.10.1|docsis3.0
To match part after nth occurrence of pipe you can use this regex:
/^(?:[^|]*\|){3}([^|]*)/
Here n=3
It will match 10.15.194.25 in matched group #1
RegEx Demo
^((?:[^|]*\\|){3})[^|]+
You can use this.Replace by $1<anything>.See demo.
https://regex101.com/r/tP7qE7/4
This here captures from start of string to | and then captures 3 such groups and stores it in $1.The next part of string till | is what you want.Now you can replace it with anything by $1<textyouwant>.
Here's how you can do the replacement:
String input = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0";
int n = 3;
String newValue = "new value";
String output = input.replaceFirst("^((?:[^|]+\\|){"+n+"})[^|]+", "$1"+newValue);
This builds:
"1.1|ProvCM|111111111111|new value|10.100.10.3|10.100.10.1|docsis3.0"

Regex to match all lines starting with a specific string

I have this very long cfg file, where I need to find the latest occurrence of a line starting with a specific string. An example of the cfg file:
...
# format: - search.index.[number] = [search field]:element.qualifier
...
search.index.1 = author:dc.contributor.*
...
search.index.12 = language:dc.language.iso
...
jspui.search.index.display.1 = ANY
...
I need to be able to get the last occurrence of the line starting with search.index.[number] , more specific: I need that number. For the above snippet, that number would be 12.
As you can see, there are other lines too containing that pattern, but I do not want to match those.
I'm using Groovy as a programming/scripting language.
Any help is appreciated!
Have you tried:
def m = lines =~ /(?m)^search\.index\.(\d+)/
m[ -1 ][ 1 ]
Try this as your expression :
^search\.index\.(\d+)/
And then with Groovy you can get your result with:
matcher[0][0]
Here is an explanation page.
I don't think you should go for it but...
If you can do a multi-line search (anyway you have to here), the only way would be to read the file backward. So first, eat everything with a .* (om nom nom)(if you can make the dot match all, (?:.|\s)* if you can't). Now match your pattern search\.index\.(\d+). And you want to match this pattern at the beginning of a line: (?:^|\n) (hoping you're not using some crazy format that doesn't use \n as new line character).
So...
(?:.|\s)*(?:^|\n)search\.index\.(\d+)
The number should be in the 1st matching group. (Test in JavaScript)
PS: I don't know groovy, so sorry if it's totally not appropriate.
Edit:
This should also work:
search\.index\.(\d+)(?!(?:.|\s)*?(?:^|\n)search\.index\.\d+)