regex split string by specific values - regex

I'm trying to split a string on specific characters with a regular expression.
I have the following string for example:
abc.def.ghi:wxyz_1234
I would like to get both 'wxyz' and '1234'.
i.e. the string between ':' and '_' and the string after '_'
Cheers!

Method 1
Maybe,
([^\s:_]+)_(\S+)
might work OK.
Method 2
With lookbehind, to create a left boundary for pre-underscore string:
(?<=:)([^_]+)_(.+)
Test
import re
string = '''
abc.def.ghi:wxyz_1234
abc.def.ghi:abcd_78910
abc.def.ghi: foo_baz123
'''
expression = r'([^\s:_]+)_(\S+)'
for i in re.findall(expression, string):
    print(i[0])
    print(i[1])
Output
wxyz
1234
abcd
78910
foo
baz123
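Method 2 can be checked in the same way. The following is only a minimal sketch: the helper name `split_after_colon` is made up for illustration, and it assumes each input line contains a single ':' before the part of interest.

```python
import re

# Method 2: the lookbehind (?<=:) pins the first group to the position right after ':'.
LOOKBEHIND = re.compile(r'(?<=:)([^_]+)_(.+)')

def split_after_colon(line):
    """Return (before_underscore, after_underscore), or None when nothing matches."""
    m = LOOKBEHIND.search(line)
    return m.groups() if m else None

print(split_after_colon('abc.def.ghi:wxyz_1234'))    # ('wxyz', '1234')
print(split_after_colon('abc.def.ghi: foo_baz123'))  # (' foo', 'baz123')
```

Unlike Method 1, this version keeps the leading space in the second sample, because `[^_]+` does not exclude whitespace.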
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against sample inputs.

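using System.Text.RegularExpressions;

string str = "abc.def.ghi:wxyz_1234";
// Group 1: text between ':' and the last '_'; group 2: text after that '_'.
Regex rx = new Regex(":(.*)_(.*)");
Match match = rx.Match(str);
string first = match.Groups[1].Value;   // "wxyz"
string second = match.Groups[2].Value;  // "1234"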

I managed to create the following
Case A - (?<=:)(.+)(?=_)
Case B - (?<=_).*
Guess the options are endless...
Thanks for your assistance!
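For what it's worth, here is a rough Python check of Case A and Case B, plus a plain split on both delimiters as one more option. It assumes the string has exactly one ':' and one '_'; with several underscores the greedy `.+` in Case A would run up to the last one.

```python
import re

s = 'abc.def.ghi:wxyz_1234'

# Case A: text between ':' and '_'. Case B: text after '_'.
case_a = re.search(r'(?<=:)(.+)(?=_)', s).group(0)
case_b = re.search(r'(?<=_).*', s).group(0)
print(case_a, case_b)  # wxyz 1234

# Alternative: split on either delimiter and keep the last two fields.
first, second = re.split(r'[:_]', s)[-2:]
print(first, second)   # wxyz 1234
```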

Related

Regular Expression to find string between two characters

I'm trying to create a REGEX to find the string between \ and > in the following input :
\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.
Desired Output:TOM
I've been able to create ([^>]+) to isolate the first section of the string before the first > . I just can't seem to figure out how to expand on this and isolate TOM.
Try
\\([^\\>]+?) >>
In javascript:
let regex = /\\([^\\>]+?) >>/
// Note \\ is required for literal \ in js
let str = "\\\\RANDOM\\APPLE\\BOB\\GEORGE\\MIKE\\TOM >>\\\\TEST\\TEST2\\TEST3\\TEST\\TEST\\JOHN.";
let match = str.match(regex);
console.log(match[1]); //TOM
This should work:
[^\\\s>]+(?=\s*>)
It works even if the desired match is followed by one or more > characters, with zero or more whitespace characters before the first >.
For example, this regex matches TOM in all of these strings:
\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >\\TEST\TEST2\TEST3\TEST\TEST\JOHN.
\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.
\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM>>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.
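The claim above can be checked with a short Python sketch. The names `PATTERN` and `samples` are just for illustration, and raw strings are used so that the backslashes in the paths stay literal.

```python
import re

# A run of characters that are not backslash, whitespace, or '>',
# followed (after optional whitespace) by a '>'.
PATTERN = re.compile(r'[^\\\s>]+(?=\s*>)')

samples = [
    r'\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >\\TEST\TEST2\TEST3\TEST\TEST\JOHN.',
    r'\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.',
    r'\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM>>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.',
]

results = [PATTERN.search(s).group(0) for s in samples]
print(results)  # ['TOM', 'TOM', 'TOM']
```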

Match the characters that are in quotes followed by the word xyz_id

I need to match the quoted string that follows the word "xyz_id". For example, given the text "xyz_id":"55555", I need to get only 55555 with a regular expression.
Following regex will help you extract "55555":
\"xyz_id\":\"(.*)\"
https://regex101.com/r/5VnGKN/1
Below is sample code in Java:
String x = "\"xyz_id\":\"55555\""; //String on which processing needs to be done
Pattern pat1 = Pattern.compile("\"xyz_id\":\"(.*)\""); //Pattern to compare
Matcher mat1 = pat1.matcher(x);
while(mat1.find()){
    System.out.println(mat1.group(1));
}
Output:
55555
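One caveat, sketched here in Python: the greedy `(.*)` runs to the last double quote on the line, so it over-captures as soon as another quoted field follows. A negated character class is a common way around that. The second field in `text` is a made-up input, not something from the question.

```python
import re

GREEDY = re.compile(r'"xyz_id":"(.*)"')      # pattern from the answer above
STRICT = re.compile(r'"xyz_id":"([^"]*)"')   # stops at the first closing quote

# Hypothetical input with a second field after xyz_id.
text = '"xyz_id":"55555","name":"bob"'

greedy_value = GREEDY.search(text).group(1)
strict_value = STRICT.search(text).group(1)
print(greedy_value)  # 55555","name":"bob
print(strict_value)  # 55555
```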

Regex to match everything from nth occurence of character onwards [duplicate]

I am trying to build a regex for the sample text below, in which I need to replace the bold text. So far I have come up with
((\|)).*(\|), which selects the whole string between the first and last pipe characters. I am bound to use Apache or Java regex.
Sample string (the text length between pipes may vary):
1.1|ProvCM|111111111111|**10.15.194.25**|10.100.10.3|10.100.10.1|docsis3.0
To match part after nth occurrence of pipe you can use this regex:
/^(?:[^|]*\|){3}([^|]*)/
Here n=3
It will match 10.15.194.25 in matched group #1
^((?:[^|]*\|){3})[^|]+
You can use this and replace with $1<anything> (in a Java string literal, write \| as \\|). See the demo.
https://regex101.com/r/tP7qE7/4
This matches from the start of the string up to and including a |, repeats that three times, and stores the result in $1. The next part of the string, up to the following |, is what you want. You can now replace it with anything by using $1<textyouwant>.
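The same two ideas (read the field after the n-th pipe, or replace it) can be sketched in Python. The helper names `field_after` and `replace_field` are hypothetical, and the sketch assumes `n` is a positive integer.

```python
import re

def field_after(line, n):
    """Return the field that follows the n-th '|', or None if there are fewer pipes."""
    m = re.match(r'(?:[^|]*\|){%d}([^|]*)' % n, line)
    return m.group(1) if m else None

def replace_field(line, n, new_value):
    """Replace the field that follows the n-th '|'; return line unchanged if no match."""
    pattern = r'^((?:[^|]*\|){%d})[^|]*' % n
    # A function replacement keeps backslashes in new_value from being interpreted.
    return re.sub(pattern, lambda m: m.group(1) + new_value, line, count=1)

line = '1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0'
print(field_after(line, 3))                 # 10.15.194.25
print(replace_field(line, 3, 'new value'))
```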
Here's how you can do the replacement:
String input = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0";
int n = 3;
String newValue = "new value";
String output = input.replaceFirst("^((?:[^|]+\\|){"+n+"})[^|]+", "$1"+newValue);
This builds:
"1.1|ProvCM|111111111111|new value|10.100.10.3|10.100.10.1|docsis3.0"

Using Regex is there a way to match outside characters in a string and exclude the inside characters?

I know I can exclude outside characters in a string using look-ahead and look-behind, but I'm not sure about characters in the center.
What I want is to get a match of ABCDEF from the string ABC 123 DEF.
Is this possible with a Regex string? If not, can it be accomplished another way?
EDIT
For more clarification, in the example above I can use the regex string /ABC.*?DEF/ to sort of get what I want, but this includes everything matched by .*?. What I want is to match with something like ABC(match whatever, but then throw it out)DEF resulting in one single match of ABCDEF.
As another example, I can do the following (in pseudocode and regex):
string myStr = "ABC 123 DEF";
string tempMatch = RegexMatch(myStr, "(?<=ABC).*?(?=DEF)"); //Returns " 123 "
string FinalString = myStr.Replace(tempMatch, ""); //Returns "ABCDEF". This is what I want
Again, is there a way to do this with a single regex string?
Since the regex replace feature in most languages does not change the string it operates on (but produces a new one), you can do it as a one-liner in most languages. Firstly, you match everything, capturing the desired parts:
^.*(ABC).*(DEF).*$
(Make sure to use the single-line/"dotall" option if your input contains line breaks!)
And then you replace this with:
$1$2
That will give you ABCDEF in one assignment.
Still, as outlined in the comments and in Mark's answer, the engine does match the stuff in between ABC and DEF. It's only the replacement convenience function that throws it out. But that is supported in pretty much every language, I would say.
Important: this approach will of course only work if your input string contains the desired pattern only once (assuming ABC and DEF are actually variable).
Example implementation in PHP:
$output = preg_replace('/^.*(ABC).*(DEF).*$/s', '$1$2', $input);
Or JavaScript (which had no single-line/dotAll mode before the ES2018 s flag):
var output = input.replace(/^[\s\S]*(ABC)[\s\S]*(DEF)[\s\S]*$/, '$1$2');
Or C#:
string output = Regex.Replace(input, @"^.*(ABC).*(DEF).*$", "$1$2", RegexOptions.Singleline);
A regular expression can contain multiple capturing groups. Each group must consist of consecutive characters so it's not possible to have a single group that captures what you want, but the groups themselves do not have to be contiguous so you can combine multiple groups to get your desired result.
Regular expression
(ABC).*(DEF)
Captures
ABC
DEF
See it online: rubular
Example C# code
string myStr = "ABC 123 DEF";
Match m = Regex.Match(myStr, "(ABC).*(DEF)");
if (m.Success)
{
    string result = m.Groups[1].Value + m.Groups[2].Value; // Gives "ABCDEF"
    // ...
}
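Both approaches above (replace the whole match with the captured groups, or concatenate the groups yourself) look roughly like this in Python. It is only a sketch and, like the answers above, assumes ABC and DEF each occur once in the input.

```python
import re

s = 'ABC 123 DEF'

# Approach 1: match the whole string, then keep only the two captured parts.
one_liner = re.sub(r'^.*(ABC).*(DEF).*$', r'\1\2', s, flags=re.DOTALL)

# Approach 2: capture both parts and join the groups manually.
m = re.search(r'(ABC).*(DEF)', s, flags=re.DOTALL)
joined = m.group(1) + m.group(2) if m else None

print(one_liner, joined)  # ABCDEF ABCDEF
```

In either case the engine still matches the text in the middle; it is simply left out of the result.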

Search for text with Regex

I need a regular expression to get the text out from between [ and ] within a sentence.
Example Text:
Hello World - Test[**This is my string**]. Good bye World.
Desired Result:
**This is my String**
The regex that I have come up with is Test\\[[a-zA-Z].+\\], but this returns the entire **Test[This is my string]**
You could use a capturing group to access to the text of interest:
\[([^\]]+)\]
A quick proof of concept using JavaScript:
var text = 'Hello World - Test[This is my string]. Good bye World.'
var match = /\[([^\]]+)\]/.exec(text)
if (match) {
    console.log(match[1]) // "This is my string"
}
If the regular expression engine you are using supports both lookahead and lookbehind, Tim's solution is more appropriate.
Match m = Regex.Match(@"Hello World - Test[This is my string]. Good bye World.",
    @"Test\[([a-zA-Z].+)\]");
Console.WriteLine(m.Groups[1].Value);
(?<=Test\[)[^\[\]]*(?=\])
should do what you want.
(?<=Test\[) # Assert that "Test[" can be matched before the current position
[^\[\]]* # Match any number of characters except brackets
(?=\]) # Assert that "]" can be matched after the current position
Read up on lookaround assertions.
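For comparison, here is a small Python sketch of both the capturing-group and the lookaround versions. The names `CAPTURE` and `LOOKAROUND` are made up for the example. Python's `re` only accepts fixed-width lookbehind, which `Test\[` satisfies.

```python
import re

text = 'Hello World - Test[This is my string]. Good bye World.'

# Capturing group: the brackets are matched, but only the inside is captured.
CAPTURE = re.compile(r'\[([^\]]+)\]')

# Lookaround: the brackets are asserted, not consumed, so group(0) is the inside.
LOOKAROUND = re.compile(r"""
    (?<=Test\[)   # the text Test[ must sit immediately before the match
    [^\[\]]*      # any run of characters that are not brackets
    (?=\])        # a closing bracket must follow the match
""", re.VERBOSE)

captured = CAPTURE.search(text).group(1)
looked = LOOKAROUND.search(text).group(0)
print(captured)  # This is my string
print(looked)    # This is my string
```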