Regex to match everything from nth occurrence of character onwards [duplicate] - regex

I am trying to build a single regex for the sample text below, in which I need to replace the bold text. So far I have this:
((\|)).*(\|)
It selects the whole string between the first and last pipe characters. I am restricted to Apache or Java regex.
Sample string (the text length between pipes may vary):
1.1|ProvCM|111111111111|**10.15.194.25**|10.100.10.3|10.100.10.1|docsis3.0

To match the part after the nth occurrence of the pipe, you can use this regex:
/^(?:[^|]*\|){3}([^|]*)/
Here n = 3.
It will match 10.15.194.25 in capture group 1.
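As a rough illustration of how that pattern could be used from Java, here is a minimal sketch. The class name NthFieldExtract and the method fieldAfterNthPipe are made up for this example, and n is simply spliced into the quantifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any library.
class NthFieldExtract {

    // Returns the field that follows the n-th pipe, or null when the
    // input contains fewer than n pipes.
    static String fieldAfterNthPipe(String input, int n) {
        // Skip n repetitions of "non-pipe characters followed by a pipe",
        // then capture the next run of non-pipe characters in group 1.
        Pattern p = Pattern.compile("^(?:[^|]*\\|){" + n + "}([^|]*)");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sample = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0";
        System.out.println(fieldAfterNthPipe(sample, 3)); // prints 10.15.194.25
    }
}
```

Because the skipped fields use [^|]*, empty fields between pipes are still counted.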

^((?:[^|]*\\|){3})[^|]+
You can use this (the doubled backslash is Java string-literal escaping). Replace the match with $1<anything>. See the demo:
https://regex101.com/r/tP7qE7/4
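The first group matches "any non-pipe characters followed by |" three times, starting at the beginning of the string, and stores that whole prefix in $1. The [^|]+ that follows matches the field you want, up to the next |. Replacing the match with $1<textyouwant> therefore keeps the prefix and swaps in the new value.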

Here's how you can do the replacement:
String input = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0";
int n = 3;
String newValue = "new value";
String output = input.replaceFirst("^((?:[^|]+\\|){"+n+"})[^|]+", "$1"+newValue);
This builds:
"1.1|ProvCM|111111111111|new value|10.100.10.3|10.100.10.1|docsis3.0"
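One caveat with concatenating "$1" + newValue: in a Java replacement string, $ and \ are special, so a new value such as $5.00 would be misread as a group reference. A hedged variant of the snippet above is sketched here; the names NthFieldReplace and replaceFieldAfterNthPipe are invented for this example, and it uses [^|]* rather than [^|]+ on the assumption that empty fields should be handled too.

```java
import java.util.regex.Matcher;

// Hypothetical helper for illustration only.
class NthFieldReplace {

    // Replaces the field that follows the n-th pipe with newValue.
    // The input is returned unchanged when it has fewer than n pipes.
    static String replaceFieldAfterNthPipe(String input, int n, String newValue) {
        // Group 1 keeps everything up to and including the n-th pipe;
        // the trailing [^|]* is the field being replaced (possibly empty).
        String regex = "^((?:[^|]*\\|){" + n + "})[^|]*";
        // quoteReplacement makes '$' and '\' in newValue literal.
        return input.replaceFirst(regex, "$1" + Matcher.quoteReplacement(newValue));
    }

    public static void main(String[] args) {
        String input = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0";
        System.out.println(replaceFieldAfterNthPipe(input, 3, "new value"));
    }
}
```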


Regex: Separate a string of characters with a non-consistent pattern (Oracle) (POSIX ERE)

EDIT: This question pertains to Oracle's implementation of regex (POSIX ERE), which does not support lookaheads.
I need to separate a string of characters with commas. However, the pattern is not consistent, and I am not sure whether this can be accomplished with regex.
Corpus: 1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25
The pattern is basically 4 digits, followed by 4 characters, followed by a dot, followed by 1, 2, or 3 digits. To make the string above clearer, this is how it looks when separated by spaces: 1710ABCD.13 1711ABCD.43 1711ABCD.4 1711ABCD.404 1711ABCD.25
So the output of a replace operation should look like this:
1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
I was able to match the pattern using this regex:
(\d{4}\w{4}\.\d{1,3})
It does insert a comma, but after the third digit beyond the dot (wrong; it should have been after the second digit). I cannot get it to insert the comma in the right position and globally.
Here is a link to a fiddle
https://regex101.com/r/qQ2dE4/329
All you need is a lookahead at the end of the regular expression, so that the greedy \d{1,3} backtracks until it's followed by 4 digits (indicating the start of the next substring):
(\d{4}\w{4}\.\d{1,3})(?=\d{4})
                     ^^^^^^^^^
https://regex101.com/r/qQ2dE4/330
To expand on @CertainPerformance's answer, if you want to be able to match the last token, you can use an alternative match of $:
(\d{4}\w{4}\.\d{1,3})(?=\d{4}|$)
Demo: https://regex101.com/r/qQ2dE4/331
EDIT: Since you now mentioned in the comment that you're using Oracle's implementation, you can simply do:
regexp_replace(corpus, '(\d{1,3})(\d{4})', '\1,\2')
to get your desired output:
1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
Demo: https://regex101.com/r/qQ2dE4/333
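For comparison, both ideas can be tried side by side. The sketch below uses Java purely as a stand-in engine (the question itself targets Oracle), with $1 and $2 in place of Oracle's \1 and \2; the class and method names are made up for this example.

```java
// Hypothetical helper for illustration only; Java stands in for the Oracle engine.
class CommaSplitSketch {

    // Lookahead variant: a token is complete once it is followed by 4 digits.
    static String withLookahead(String corpus) {
        return corpus.replaceAll("(\\d{4}\\w{4}\\.\\d{1,3})(?=\\d{4})", "$1,");
    }

    // Lookahead-free variant: 1 to 3 digits followed by 4 digits only occurs
    // where one token's fraction runs into the next token's 4-digit prefix.
    static String withoutLookahead(String corpus) {
        return corpus.replaceAll("(\\d{1,3})(\\d{4})", "$1,$2");
    }

    public static void main(String[] args) {
        String corpus = "1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25";
        System.out.println(withLookahead(corpus));
        System.out.println(withoutLookahead(corpus));
        // Both print 1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
    }
}
```

Both variants rely on greedy \d{1,3} backtracking until exactly four digits remain for the next token's prefix.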
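To keep finding matches after the first one, you must use the global flag /g. The pattern is tricky to match left to right, but it becomes feasible if you reverse the string: in the reversed string each token starts with its 1 to 3 fraction digits followed by the dot, and ends with the fixed-length 4-digit prefix.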
Demo
var str = `1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25`;
// Reverse String
var rts = str.split("").reverse().join("");
// Do a reverse version of RegEx
/*In order to continue searching after the first match,
use the `g`lobal flag*/
var rgx = /(\d{1,3}\.\w{4}\d{4})/g;
// Replace on reversed String with a reversed substitution
var res = rts.replace(rgx, ` ,$1`);
// Revert the result back to normal direction
var ser = res.split("").reverse().join("");
console.log(ser);

Regex hard to find

I'd like to build a regex which will be run on the following elements:
test.stuff;visibility:=reexport,
test.stuff;bundle-version="0.0.0";visibility:=reexport,
test.stuff;bundle-version="0.0.0"
test.stuff,
test.stuff
The aim of my regex is to either replace bundle-version="0.0.0" by bundle-version="1.2.3" or to add bundle-version="1.2.3".
After replacement it should produce the following elements:
test.stuff;bundle-version="1.2.3";visibility:=reexport,
test.stuff;bundle-version="1.2.3";visibility:=reexport,
test.stuff;bundle-version="1.2.3"
test.stuff;bundle-version="1.2.3",
test.stuff;bundle-version="1.2.3"
Currently I have the following regex:
(test.*?)([;]+bundle.*?)?([;,]+.*)
With this replacement pattern:
$1;bundle-version="1.2.3"$3
But it doesn't work for these two:
test.stuff;bundle-version="0.0.0" --> becomes test.stuff;bundle-version="1.2.3";bundle-version="0.0.0"
test.stuff --> not matched
Any help would be greatly appreciated, thanks!
EDIT: the regex should only match lines starting with "test.stuff"
This worked for me in C#/LinqPad:
string s = @"test.stuff;visibility:=reexport,
test.stuff;bundle-version=""0.0.0"";visibility:=reexport,
test.stuff;bundle-version=""0.0.0""
test.stuff,
test.stuff";
string pat = "(test[^;,\n]*)([;,]+bundle[^;,\n]*)?([;,]*.*)?";
string rep ="$1;bundle-version=\"1.2.3\"$3";
string result = Regex.Replace(s, pat, rep);
Edit: added \n to first group to avoid capturing a line after last "test.stuff" occurrence.
I would do it in two passes:
1. replace bundle-version="0.0.0" with bundle-version="1.2.3"
2. replace stuff(?!;bun) with stuff;bundle-version="1.2.3"
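The two passes above could look roughly like this in Java. This is a sketch under stated assumptions: the names BundleVersionSketch and setBundleVersion are invented here, both patterns are anchored to lines starting with test.stuff (as the question's edit asks), and the negative lookahead is spelled out as ;bundle-version= rather than the shorter ;bun.

```java
// Hypothetical helper for illustration only.
class BundleVersionSketch {

    private static final String NEW_ATTR = "bundle-version=\"1.2.3\"";

    static String setBundleVersion(String lines) {
        // Pass 1: overwrite an existing bundle-version="0.0.0" on test.stuff lines.
        String pass1 = lines.replaceAll(
                "(?m)^(test\\.stuff;)bundle-version=\"0\\.0\\.0\"",
                "$1" + NEW_ATTR);
        // Pass 2: add the attribute where test.stuff is not already followed by it.
        return pass1.replaceAll(
                "(?m)^test\\.stuff(?!;bundle-version=)",
                "test.stuff;" + NEW_ATTR);
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "test.stuff;visibility:=reexport,",
                "test.stuff;bundle-version=\"0.0.0\";visibility:=reexport,",
                "test.stuff;bundle-version=\"0.0.0\"",
                "test.stuff,",
                "test.stuff");
        System.out.println(setBundleVersion(input));
    }
}
```

Running pass 1 first matters: after it, every line that already carries the attribute is skipped by the lookahead in pass 2.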

Extracting Number from Log File

I'm trying to extract a number from a log file that outputs lines of text like this:
1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A
The value I want from this line is 45.6 A.
However, my Regex code is returning the 12 A from 3:26:12 AM. I need it to completely ignore the time number and just output the 45.6 A.
Here's my Regex code:
$regex = '\d+(?:\.\d+)?(?=\s+A)'
You just forgot to anchor the lookahead at the end of the string:
\d+(?:\.\d+)?(?=\s+A$)
                    ^
The \d+(?:\.\d+)? will match one or more digits optionally followed with a . followed with one or more digits (a float value), and the (?=\s+A$) lookahead will require one or more whitespace characters with A right at the end of the string to appear after the float value.
$s = '1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A'
$rx = '\d+(?:\.\d+)?(?=\s+A$)'
$result = [regex]::Match($s, $rx, 'RightToLeft')
if ($result.Success) { $result.Value }
You can use word boundary (\b) to match only A, not AM:
\d+(?:\.\d+)?(?=\s+A\b)
DEMO: https://regex101.com/r/pA7jK2/1
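To see the difference between the original pattern and the two fixes, here is a small comparison sketch. Java is used only as a convenient stand-in for the PowerShell/.NET engine, and the names LogValueSketch and firstMatch are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class LogValueSketch {

    // Returns the first match of regex in line, or null when nothing matches.
    static String firstMatch(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A";
        // Original: the lookahead is satisfied by the "A" of "AM".
        System.out.println(firstMatch("\\d+(?:\\.\\d+)?(?=\\s+A)", line));    // 12
        // Anchored: "A" must be the last character of the string.
        System.out.println(firstMatch("\\d+(?:\\.\\d+)?(?=\\s+A$)", line));   // 45.6
        // Word boundary: "A" must not be followed by another word character.
        System.out.println(firstMatch("\\d+(?:\\.\\d+)?(?=\\s+A\\b)", line)); // 45.6
    }
}
```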
If you only need the decimal number that is followed by A, try this (it requires exactly one digit after the decimal point):
(\d+\.\d\sA)

Regular expression extract filename from line content

I'm very new to regular expressions. I want to extract the following string
"109_Admin_RegistrationResponse_20130103.txt"
from this file content, which is processed one line at a time:
01-10-13 10:44AM 47 107_Admin_RegistrationDetail_20130111.txt
01-10-13 10:40AM 11 107_Admin_RegistrationResponse_20130111.txt
The regular expression should not pick the second line, only the first line should return a true.
Your regex has several mistakes:
The line does not start with the required filename, yet you anchored the pattern with ^.
The character class [a-zA-Z] has no + after it, so it matches only a single character.
The character class does not include _, so it cannot match Admin_RegistrationResponse.
The backslash before d{2} is missing, so d{2} matches only the literal text dd.
As M42's answer points out (I had left this out), you also need to escape the dot, otherwise the pattern would match 123_abc_12345678atxt too (note the a before txt).
Your regex should be
\d+_[a-zA-Z_]+_\d{4}\d{2}\d{2}\.txt$
which can be simplified as
\d+_[a-zA-Z_]+_\d{8}\.txt$
since writing \d{4}\d{2}\d{2} separately is redundant, unless you want capturing groups for the date parts, in which case you would use:
\d+_[a-zA-Z_]+_(\d{4})(\d{2})(\d{2})\.txt$
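A minimal Java sketch of the capturing-group version follows, assuming the goal is to pull the filename and its date parts out of one listing line. The names FilenameSketch and parse are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class FilenameSketch {

    private static final Pattern FILE =
            Pattern.compile("\\d+_[a-zA-Z_]+_(\\d{4})(\\d{2})(\\d{2})\\.txt$");

    // Returns {filename, year, month, day}, or null when the line has no match.
    static String[] parse(String line) {
        Matcher m = FILE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "01-09-13 10:17AM 11 109_Admin_RegistrationResponse_20130103.txt";
        String[] parts = parse(line);
        // Prints 109_Admin_RegistrationResponse_20130103.txt 2013 01 03
        System.out.println(String.join(" ", parts));
    }
}
```

find() is used instead of matches() because the filename sits at the end of the line rather than spanning all of it.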
Remove the anchors and escape the dot:
\d+[a-zA-Z_]+\d{8}\.txt
I'm a newbie in PHP, but I think you can use the explode() function in PHP, or an equivalent in your language.
$string = "01-09-13 10:17AM 11 109_Admin_RegistrationResponse_20130103.txt";
$pieces = explode("_", $string);
$stringout = "";
for ($i = 0; $i < count($pieces); $i++) {
$stringout = $stringout.$pieces[$i];
}

Using Regex is there a way to match outside characters in a string and exclude the inside characters?

I know I can exclude outside characters in a string using look-ahead and look-behind, but I'm not sure about characters in the center.
What I want is to get a match of ABCDEF from the string ABC 123 DEF.
Is this possible with a Regex string? If not, can it be accomplished another way?
EDIT
For more clarification, in the example above I can use the regex string /ABC.*?DEF/ to sort of get what I want, but this includes everything matched by .*?. What I want is to match with something like ABC(match whatever, but then throw it out)DEF resulting in one single match of ABCDEF.
As another example, I can do the following (in pseudo-code and regex):
string myStr = "ABC 123 DEF";
string tempMatch = RegexMatch(myStr, "(?<=ABC).*?(?=DEF)"); //Returns " 123 "
string FinalString = myStr.Replace(tempMatch, ""); //Returns "ABCDEF". This is what I want
Again, is there a way to do this with a single regex string?
Since the regex replace feature in most languages does not change the string it operates on (but produces a new one), you can do it as a one-liner in most languages. Firstly, you match everything, capturing the desired parts:
^.*(ABC).*(DEF).*$
(Make sure to use the single-line/"dotall" option if your input contains line breaks!)
And then you replace this with:
$1$2
That will give you ABCDEF in one assignment.
Still, as outlined in the comments and in Mark's answer, the engine does match the stuff in between ABC and DEF. It's only the replacement convenience function that throws it out. But that is supported in pretty much every language, I would say.
Important: this approach will of course only work if your input string contains the desired pattern only once (assuming ABC and DEF are actually variable).
Example implementation in PHP:
$output = preg_replace('/^.*(ABC).*(DEF).*$/s', '$1$2', $input);
Or JavaScript (which had no single-line mode before the ES2018 s flag, hence [\s\S] in place of the dot):
var output = input.replace(/^[\s\S]*(ABC)[\s\S]*(DEF)[\s\S]*$/, '$1$2');
Or C#:
string output = Regex.Replace(input, @"^.*(ABC).*(DEF).*$", "$1$2", RegexOptions.Singleline);
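For completeness, the same one-liner can be sketched in Java, where the inline (?s) flag plays the role of single-line/dotall mode. The names OuterPartsSketch and keepOuter are made up for this example, and ABC and DEF are hard-coded as in the question.

```java
// Hypothetical helper for illustration only.
class OuterPartsSketch {

    // Keeps only the ABC and DEF markers and discards everything else.
    // Input without both markers (in that order) is returned unchanged.
    static String keepOuter(String input) {
        // (?s) lets '.' match line breaks, so multi-line input works too.
        return input.replaceFirst("(?s)^.*(ABC).*(DEF).*$", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(keepOuter("ABC 123 DEF")); // prints ABCDEF
    }
}
```

As with the other implementations, this only behaves as intended when the input contains the ABC ... DEF pattern once.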
A regular expression can contain multiple capturing groups. Each group must consist of consecutive characters so it's not possible to have a single group that captures what you want, but the groups themselves do not have to be contiguous so you can combine multiple groups to get your desired result.
Regular expression
(ABC).*(DEF)
Captures
ABC
DEF
Example C# code
string myStr = "ABC 123 DEF";
Match m = Regex.Match(myStr, "(ABC).*(DEF)");
if (m.Success)
{
string result = m.Groups[1].Value + m.Groups[2].Value; // Gives "ABCDEF"
// ...
}