Regular Expression to find string between two characters - regex

I'm trying to create a REGEX to find the string between \ and > in the following input :
\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.
Desired Output:TOM
I've been able to create ([^>]+) to isolate the first section of the string before the first > . I just can't seem to figure out how to expand on this and isolate TOM.

Try
\\([^\\>]+?) >>
Regex Demo
In javascript:
let regex = /\\([^\\>]+?) >>/
// Note \\ is required for literal \ in js
let str = "\\\\RANDOM\\APPLE\\BOB\\GEORGE\\MIKE\\TOM >>\\\\TEST\\TEST2\\TEST3\\TEST\\TEST\\JOHN.";
let match = str.match(regex);
console.log(match[1]); //TOM
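For comparison, here is a minimal Python sketch of the same pattern. It assumes the input is available as a raw string, which avoids doubling the backslashes; the variable names are only illustrative.

```python
import re

# Raw string: the backslashes are taken literally, no doubling needed.
text = r"\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >>\\TEST\TEST2\TEST3\TEST\TEST\JOHN."

# A literal backslash, then the shortest run of characters that are
# neither a backslash nor '>', followed by " >>".
pattern = re.compile(r"\\([^\\>]+?) >>")

match = pattern.search(text)
segment = match.group(1) if match else None
print(segment)  # TOM
```

Note that this pattern expects exactly one space and two `>` characters after the segment.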

This should work:
[^\\\s>]+(?=\s*>)
Demo:
It will work even if the desired match has one or more > after it, and even if there are zero or more whitespace characters before the >.
In other words, this regex will match TOM in all of these strings:
\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >\\TEST\TEST2\TEST3\TEST\TEST\JOHN.
\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.
\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM>>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.
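The claim above can be checked with a short Python sketch. It runs the lookahead pattern against the three sample strings; the list and variable names are assumptions made for illustration.

```python
import re

# A run of characters that are not a backslash, whitespace, or '>',
# which must be followed by optional whitespace and then a '>'.
pattern = re.compile(r"[^\\\s>]+(?=\s*>)")

samples = [
    r"\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >\\TEST\TEST2\TEST3\TEST\TEST\JOHN.",
    r"\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM >>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.",
    r"\\RANDOM\APPLE\BOB\GEORGE\MIKE\TOM>>\\TEST\TEST2\TEST3\TEST\TEST\JOHN.",
]

# search() returns the first match; the lookahead keeps '>' out of it.
results = [pattern.search(s).group(0) for s in samples]
print(results)  # ['TOM', 'TOM', 'TOM']
```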

Related

regex split string by specific values

Trying to split a string by specific characters and values with a regex expression.
I have the following string for example:
abc.def.ghi:wxyz_1234
I would like to get both 'wxyz' and '1234'.
i.e. the string between ':' and '_' and the string after '_'
Cheers!
Method 1
Maybe,
([^\s:_]+)_(\S+)
might work OK.
RegEx Demo 1
Method 2
With lookbehind, to create a left boundary for pre-underscore string:
(?<=:)([^_]+)_(.+)
RegEx Demo 2
Test
import re
string = '''
abc.def.ghi:wxyz_1234
abc.def.ghi:abcd_78910
abc.def.ghi: foo_baz123
'''
expression = r'([^\s:_]+)_(\S+)'
for i in re.findall(expression, string):
    print(i[0])
    print(i[1])
Output
wxyz
1234
abcd
78910
foo
baz123
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
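Method 2 can be tried in the same way. The sketch below applies the lookbehind expression to each sample line separately; the list of lines and the variable names are assumptions for illustration.

```python
import re

lines = [
    "abc.def.ghi:wxyz_1234",
    "abc.def.ghi:abcd_78910",
    "abc.def.ghi: foo_baz123",
]

# Method 2: the lookbehind makes group 1 start immediately after ':'.
expression = re.compile(r"(?<=:)([^_]+)_(.+)")

pairs = [expression.search(line).groups() for line in lines]
for before, after in pairs:
    print(repr(before), repr(after))
# 'wxyz' '1234'
# 'abcd' '78910'
# ' foo' 'baz123'
```

Unlike Method 1, group 1 here keeps the leading space in the third line, because `[^_]+` also accepts whitespace.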
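using System.Text.RegularExpressions;
string str = "abc.def.ghi:wxyz_1234";
// Group 1: text between ':' and the last '_'; group 2: text after that '_'
Regex rx = new Regex(":(.*)_(.*)");
Match match = rx.Match(str);
string first = match.Groups[1].Value;   // "wxyz"
string second = match.Groups[2].Value;  // "1234"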
I managed to create the following
Case A - (?<=:)(.+)(?=_)
Case B - (?<=_).*
Guess the options are endless...
Thanks for your assistance!

Regex to match everything from nth occurrence of character onwards

I am trying to build a regex for the sample text below, in which I need to replace the bold text. So far I have
((\|)).*(\|) which selects the whole string between the first and last pipe characters. I am bound to use Apache or Java regex.
Sample String: where text length between pipes may vary
1.1|ProvCM|111111111111|**10.15.194.25**|10.100.10.3|10.100.10.1|docsis3.0
To match the part after the nth occurrence of a pipe, you can use this regex:
/^(?:[^|]*\|){3}([^|]*)/
Here n=3.
It will match 10.15.194.25 in captured group #1.
RegEx Demo
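As a rough illustration of the extraction, here is the same pattern in Python with n substituted into the quantifier. The variable names are assumptions; the sample line is the one from the question, without the bold markers.

```python
import re

line = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0"
n = 3  # number of pipes to skip

# Skip n "field + pipe" units from the start, then capture the next field.
pattern = re.compile(r"^(?:[^|]*\|){%d}([^|]*)" % n)

field = pattern.match(line).group(1)
print(field)  # 10.15.194.25
```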
^((?:[^|]*\|){3})[^|]+
You can use this. Replace with $1<anything> (in a Java string literal, write \| as \\|). See demo.
https://regex101.com/r/tP7qE7/4
This captures everything from the start of the string through the third | and stores it in $1. The next part of the string, up to the following |, is what you want. You can now replace it with anything using $1<textyouwant>.
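The replacement idea can be sketched in Python as follows. The helper name `replace_field` is hypothetical, and a function is used as the replacement so that the new value is inserted literally, whatever characters it contains.

```python
import re

def replace_field(line, n, new_value):
    """Replace the field that follows the n-th pipe (hypothetical helper)."""
    # Group 1 keeps the first n "field + pipe" units; the trailing
    # [^|]* is the field that gets replaced.
    pattern = re.compile(r"^((?:[^|]*\|){%d})[^|]*" % n)
    # A function replacement avoids any special treatment of
    # backslashes or group references inside new_value.
    return pattern.sub(lambda m: m.group(1) + new_value, line, count=1)

line = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0"
output = replace_field(line, 3, "new value")
print(output)
# 1.1|ProvCM|111111111111|new value|10.100.10.3|10.100.10.1|docsis3.0
```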
Here's how you can do the replacement:
String input = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0";
int n = 3;
String newValue = "new value";
String output = input.replaceFirst("^((?:[^|]+\\|){"+n+"})[^|]+", "$1"+newValue);
This builds:
"1.1|ProvCM|111111111111|new value|10.100.10.3|10.100.10.1|docsis3.0"

Regex match string between two characters

Let's say I have a string containing a filename that includes a width and a height,
e.g.
"en/text/org-affiliate-250x450.en.gif"
How can I get only the "250" contained by '-' and 'x', and then the "450" contained by 'x' and '.', using regex?
I tried following this answer, but with no luck:
Regular Expression to find a string included between two characters while EXCLUDING the delimiters
If you are using R, you can try the following solution:
txt <- "en/text/org-affiliate-250x450.en.gif"
# Locate every run of digits, then extract the runs and convert them to numbers
x <- gregexpr("[0-9]+", txt)
x2 <- as.numeric(unlist(regmatches(txt, x)))  # 250 450
Use a lookbehind and a lookahead:
(?<=-|x)\d+(?=x|\.)
(?<=-|x) Lookbehind for either a - or an x.
\d+ Match one or more digits.
(?=x|\.) Lookahead for either an x or a literal dot (.).
Try the regex here.
Use the regex -(\d+)x(\d+)\.:
var str = 'en/text/org-affiliate-250x450.en.gif';
var numbers = /-(\d+)x(\d+)\./.exec(str);
numbers = [parseInt(numbers[1], 10), parseInt(numbers[2], 10)];
console.log(numbers);
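Both approaches can be compared in a short Python sketch. It assumes Python's `re` module, which accepts the lookbehind here because both alternatives have the same width; the variable names are illustrative.

```python
import re

path = "en/text/org-affiliate-250x450.en.gif"

# Capture-group form: '-' digits 'x' digits '.', with the digits in groups.
width, height = (int(g) for g in re.search(r"-(\d+)x(\d+)\.", path).groups())

# Lookaround form: the delimiters are checked but not included in the match.
digits = re.findall(r"(?<=-|x)\d+(?=x|\.)", path)

print(width, height)  # 250 450
print(digits)         # ['250', '450']
```

The capture-group form ties the two numbers to one `-WIDTHxHEIGHT.` occurrence, while the lookaround form returns every digit run that happens to sit between the allowed delimiters.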

Replace pattern with pattern in vb.net string

I want to replace the patterns "0A ", "0B ", ..., "1A ", "1B ", ... with "0A|", "0B|", ..., "1A|", "1B|", ... in a VB.NET string.
I can write individual Replace lines like
string = string.Replace("0A ", "0A|")
string = string.Replace("0B ", "0B|")
.
.
.
string = string.Replace("0Z ", "0Z|")
But I would have to write too many lines (26*10*2, times two because this scenario occurs twice), and that doesn't seem like a good solution. Can someone suggest a good regex solution for this?
Use Regex.Replace:
result = Regex.Replace(string, "(\d+[A-Z]+) ", "$1|")
I used the pattern \d+[A-Z]+ to represent the text under the assumption that your series of data might see more than one digit/letter. This seems to be working in the demo below.
Demo
Regex: \s Substitution: |
Details:
\s Matches any whitespace character
Regex demo
VB.NET code:
Regex.Replace("0A ", "\s", "|") ' Output: 0A|
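To see how the two answers differ, here is a small Python sketch that applies both substitutions to the same input. The sample text is made up for illustration and is not from the question.

```python
import re

text = "0A first item 1B second item"  # made-up sample input

# Targeted: only a space that follows digits + uppercase letters becomes '|'.
targeted = re.sub(r"(\d+[A-Z]+) ", r"\1|", text)

# Blanket: every whitespace character becomes '|'.
blanket = re.sub(r"\s", "|", text)

print(targeted)  # 0A|first item 1B|second item
print(blanket)   # 0A|first|item|1B|second|item
```

The `\s` substitution is only equivalent when the string contains no other whitespace.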

Regular Expression Split

I have the string shown below. I have been trying to split it using a regular expression and, going through the forums, I found ([^|]+), which matches everything except the pipe. However, I want to break the string in two using regular expressions and have not been able to do so. One expression (xyz) would extract everything from GA up to the pipe character; the second (abc) would extract everything after the first pipe.
GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298
The first is ^[^|]+ and the second is [^|]+$.
The idea is to use your negated character class with anchors. ^ will match the start of the string and $ will match the end of the string.
These two patterns have no lookarounds and will work with almost any regex flavor.
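A minimal Python sketch of the two anchored patterns, with illustrative variable names:

```python
import re

value = "GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298"

# ^[^|]+ : run of non-pipe characters anchored at the start.
first = re.search(r"^[^|]+", value).group(0)

# [^|]+$ : run of non-pipe characters anchored at the end.
second = re.search(r"[^|]+$", value).group(0)

print(first)   # GA1.2.1127630839.1468526914
print(second)  # 3847EFF358ABEC90-01A39B0290BAC298
```

If the input could contain more than one pipe, the second pattern would return only the text after the last pipe, not everything after the first one.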
Guessing at popular languages. :-)
Python:
'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'.split('|')
JavaScript:
'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'.split('|')
PHP:
explode('|', 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298')
Go:
strings.Split("GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298", "|")
Ruby:
'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'.split('|')
EDIT
After clarification, I get what you're asking. Fiddling with regex101.com, I found that those two expressions should give you what you want:
^.*(?=\|) gets the first part, and
(?<=\|).* gets the second.
When you click on the link, you can see it in action.
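The lookaround versions can be sketched in Python as well. The second input, `multi`, is a made-up string added to show how the greedy `.*` behaves when there is more than one pipe.

```python
import re

value = "GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298"

before = re.search(r"^.*(?=\|)", value).group(0)
after = re.search(r"(?<=\|).*", value).group(0)
print(before)  # GA1.2.1127630839.1468526914
print(after)   # 3847EFF358ABEC90-01A39B0290BAC298

# With several pipes, the greedy .* in the first pattern runs up to the
# last pipe, while the second pattern starts right after the first pipe.
multi = "a|b|c"  # made-up input
multi_before = re.search(r"^.*(?=\|)", multi).group(0)
multi_after = re.search(r"(?<=\|).*", multi).group(0)
print(multi_before)  # a|b
print(multi_after)   # b|c
```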
PREVIOUS ANSWER
There are many alternatives to regular expressions, as @smarx's answer shows.
But something along those lines should do it:
R
myString <- 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'
part1 <- sub(pattern = "(.*)\\|(.*)", x = myString, replacement = "\\1")
part2 <- sub(pattern = "(.*)\\|(.*)", x = myString, replacement = "\\2")
R string literals require doubling every backslash; some other languages don't.
Python
import re
myString = 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'
# Raw strings keep the backslashes intact for the regex engine
part1 = re.sub(pattern=r"(.*)\|(.*)", repl=r"\1", string=myString)
part2 = re.sub(pattern=r"(.*)\|(.*)", repl=r"\2", string=myString)