Regex select everything up until next match including new lines

I'm trying to capture the conversation below, but my regex only captures a single line. I want it to capture the entire phrase said by each person, up until the next person says something. If I use the /s flag, the '.+' captures everything until the end of the file instead of stopping at the next match.
I'm new to regular expressions, so sorry for any bad explanation.
This is what I've got so far.
The regex:
/([0-9]{2}\/[0-9]{2}\/[0-9]{2} [0-9]{2}\:[0-9]{2}\:[0-9]{2}: (.+):) (.+)/
I'm going to use both \2 and \3 to capture who spoke and what they said, inside a for loop, so I can text-mine it.

Use a pattern to extract the messages, then some LINQ to process them (note that the [^/]+ part assumes the message text itself contains no '/'):
var pattern = "^[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}: (.+?): ((?:[^/]+(?:\n|$))+)";
var data = Regex.Matches(src, pattern, RegexOptions.Multiline).Cast<Match>().Select(m => new { who = m.Groups[1].Value, text = m.Groups[2].Value});
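For comparison, here is a minimal Python sketch of the same idea, using a lazy match that stops at a lookahead for the next timestamp instead of excluding '/'. The sample log, the names STAMP, who and text, and the exact timestamp layout are assumptions for illustration, since the original conversation is not shown.

```python
import re

# Hypothetical sample log; the asker's real data is not included in the thread.
log = (
    "10/01/17 09:15:02: Alice: Hello there,\n"
    "how are you today?\n"
    "10/01/17 09:15:40: Bob: Fine, thanks.\n"
)

# Timestamp prefix such as "10/01/17 09:15:02: " (assumed format).
STAMP = r"\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}: "

pattern = re.compile(
    # Speaker name, then the message text, matched lazily until the next
    # line that starts with a timestamp (or the end of the input).
    rf"^{STAMP}(?P<who>[^:\n]+): (?P<text>.*?)(?=^{STAMP}|\Z)",
    re.MULTILINE | re.DOTALL,
)

messages = [(m["who"], m["text"].strip()) for m in pattern.finditer(log)]

for who, text in messages:
    print(f"{who!r} said {text!r}")
```

The lookahead does not consume the next timestamp, so each match starts cleanly at the following message.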

Related

How can I match multiple hits between 2 delimiters?

Hi, my fellow RegEx'ers ;)
I'm trying to match every piece of text that sits between a pair of quotes.
Here's my text:
...random code
someArray[] = ["Come and",
"get me,",
"or fail",
"trying!",
"Yours truly"]
random code...
So far, I managed to get the correct matches with two patterns, executed after each other:
(?s)someArray\[\].*?=.*?\[(.*?)\]
this extracts the text between the two brackets and on the result, I use this one:
"(.*?)"
This works just fine, but I'd love to get the texts with a single regex.
Any help is highly appreciated!
Consider using \G. With it, you can match "(.*?)" when it is preceded either by someArray[] = [ or by the previous match of "(.*?)" (strictly speaking, the previous match of the entire regex). Then just grab the first capture group from each match:
(?:(?s).*someArray\[\].*?=.*?\[|\G[^"\]]+)"(.*?)"
Demo: https://regex101.com/r/eBQWdU/3
How you grab the first capture groups depends on the language you're using the regex in. For example, in PHP you could do something like this:
preg_match_all('/(?:(?s).*someArray\[\].*?=.*?\[|\G[^"\]]+)"(.*?)"/', $input, $matches);
$array_items = $matches[1];
Demo: https://ideone.com/mZgU1x
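Python's standard re module has no \G, so the following is only a sketch of how the same "continue where the previous match ended" behaviour could be emulated there: Pattern.match is called with an explicit start position, which anchors each attempt at the end of the previous one. The names opener, item and items are made up for the example.

```python
import re

text = '''...random code
someArray[] = ["Come and",
"get me,",
"or fail",
"trying!",
"Yours truly"]
random code...'''

# Finds the opening "someArray[] = [" once.
opener = re.compile(r'someArray\[\]\s*=\s*\[')
# One array item: skip separators (anything but a quote or "]"), then a quoted string.
item = re.compile(r'[^"\]]*"(.*?)"', re.DOTALL)

items = []
start = opener.search(text)
pos = start.end() if start else len(text)
while True:
    # match() only succeeds exactly at pos, which plays the role of \G here.
    m = item.match(text, pos)
    if m is None:
        break  # hit the closing "]" (or the end of the text)
    items.append(m.group(1))
    pos = m.end()

print(items)
```

Because the separator class excludes "]", the loop stops at the closing bracket and never picks up quoted strings from the code that follows the array.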

Regex wrapping word

Regex example
How can I exclude the first space in every match?
The same regex: (?:^|\W)#(\w+)(?!\w)
Is this what you're looking for?
http://regexr.com/3ca98
From the information you gave us until now, this regex should also be sufficient: #(\w+)(?!\w).
But maybe there's more to it than we know. What did you want to achieve with the (?:^|\W)?
Edit: Thinking about what you probably want to achieve, it occurred to me that you might only want to match your pattern if it's not in the middle of another word (e.g. test#case). You probably don't want to match this.
To exclude such cases, you have to ensure that there is either a whitespace character in front of it or nothing at all (the start of the input).
I assume you use JavaScript because regexr.com does, and sadly JavaScript's regex implementation had no lookbehind when this was written (lookbehind assertions were only added later, in ES2018). So there was no direct way to make sure that only whitespace or nothing is in front of your pattern.
One solution would be to work with capture groups. Take this regex:
(?:^|\s+)(#\w+)
It looks for either the start of the input or one or more whitespace characters in front of your pattern, but doesn't capture that part (add the m flag if the start of each line should count too). Your pattern then follows as the first capture group in the expression.
To use this in JavaScript, create a RegExp object with the g flag, call its exec function until there are no more matches, and save the first capture group of each match to a result array.
JS code:
var txt = text.innerHTML;
var re = /(?:^|\s+)(#\w+)/g;
var res = [];
var tmpresult = [];
while ((tmpresult = re.exec(txt)) !== null) {
    res.push(tmpresult[1]); // push first capture group to result stack
}
result.innerHTML = JSON.stringify(res, null, 2);
JSFiddle: https://jsfiddle.net/j41tw4hm/1/
Updated regexr.com: http://regexr.com/3ca9n
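As a rough illustration of the two options discussed above, here is a small Python sketch. The sample string is invented; the first pattern uses a capture group to leave the leading whitespace out of the result, and the second uses a negative lookbehind, which is available in engines that support it.

```python
import re

# Invented sample text: three real tags plus "text#case", which should be skipped.
sample = "#first some text#case and #second\n#third"

# Option 1: consume the leading whitespace (or start of input) outside the group;
# findall returns only the capture group.
with_group = re.findall(r"(?:^|\s)(#\w+)", sample)

# Option 2: negative lookbehind, "not preceded by a non-whitespace character".
with_lookbehind = re.findall(r"(?<!\S)#\w+", sample)

print(with_group)
print(with_lookbehind)
```

Both variants return the tags without the leading space, and neither matches the #case inside text#case.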

Regex Find End of Search Field then remove rest of line

Hi, I have been looking for a regex. I can find most of what I'm after, but it's not quite right.
I'm trying to do a find and replace using regex, which I can get to work, but not quite the way I want it to.
An example of what I am searching is
10/01/14PUT/a/users/84335httpetcetcetcete
10/01/14GET/a/users/663/badges?thisisatest
10/01/14GET/a/users/8836:thisisatestetc
What I'm trying to do is remove the rest of the line after the end of the user digits, at the point marked below by a % that I have put in temporarily.
10/01/14PUT/a/users/84335%httpetcetcetcete
10/01/14GET/a/users/663%/badges?thisisatest
10/01/14GET/a/users/8836%:thisisatestetc
I have been using s = s.regex.replace(s, "a/users/\d*", " ")
but this is obviously not working. So close, yet so far.
Any assistance is gratefully received.
Many thanks, VBVirg
You were actually on the right track, the regex you came up with is almost what you need:
a/users/\d*
But what your call did was actually replace what you wanted to preserve with a space.
The regex you're looking for would be more like this:
(a\/users\/\d*).*$
And you would use it in the Replace() method as follows:
s = Regex.Replace(s, "(a\/users\/\d*).*$", "$1")
The $1 is a backreference to the capture group (the part of the regex in parentheses). So what this would do is take whatever part of the string matches that regex, and replace it with only what is in the capture group.
How about: s = Regex.Replace(s, "(a/users/\d*).*", "$1")
This saves the "a/users/(digits)" string in a capture group (referenced as $1 in a .NET replacement string), so it doesn't get deleted by the replace function.
I think the following will do what you want:
s = Regex.Replace(s, "^(.*\/users\/\d*).*$", "$1")
It works by capturing the part of the string you are interested in and replacing the whole string with just the part that was captured.
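The capture-and-replace idea from these answers can be sketched in Python as follows, applied to the sample lines from the question one line at a time. Note that Python writes the backreference in the replacement as \1 where .NET uses $1; the variable names are made up for the example.

```python
import re

lines = [
    "10/01/14PUT/a/users/84335httpetcetcetcete",
    "10/01/14GET/a/users/663/badges?thisisatest",
    "10/01/14GET/a/users/8836:thisisatestetc",
]

# Keep everything up to and including the user digits (group 1),
# and drop whatever follows on the same line.
trimmed = [re.sub(r"(a/users/\d+).*$", r"\1", line) for line in lines]

for line in trimmed:
    print(line)
```

Only the matched part of each line is replaced, so the date and HTTP verb in front of a/users/ are left untouched.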

Regular Expression for phrases starting with TO

I am pretty new to regular expressions. I want to write a regular expression that gets TO followed by the rest of the entry, including what comes after each new line. I tried to use this, but it doesn't work properly.
^TO\n?\s?[A-Za-z0-9]\n?[A-Za-z0-9]
It only highlights TO W11 properly, which is all on one line. For the first entry it highlights only TO, and for the third entry it highlights only the first line. Basically, it doesn't read across the new lines.
Some of my data looks like this:
TO
EXTERNAL
TRAVERSE
TO W11
TO CONTROL
TRAVERSE
I would appreciate if anybody can help me.
Make sure you use a multiline regex:
var options = RegexOptions.Multiline;
foreach (Match match in Regex.Matches(input, pattern, options))
...
More at: http://msdn.microsoft.com/en-us/library/yd1hzczs(v=vs.110).aspx
It looks like your pattern isn't matching because the start of the string is really a space and not the T character. Also, [A-Za-z0-9] matches only one character, and you want the whole word. I used the + to denote that I want one or more matches of those characters.
(TO\n?\s?[A-Za-z0-9]+)
This regex matches "TO EXTERNAL", "TO W11" and "TO CONTROL". Be sure to use the global modifier so that you get all matches, not just the first one.
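If the goal is to get each TO entry together with all of its continuation lines (an assumption about what the asker wants, since the expected output is not spelled out), one possible Python sketch is to keep consuming words as long as the next word is not another TO. The names data, entry and entries are invented for the example.

```python
import re

data = "TO\nEXTERNAL\nTRAVERSE\nTO W11\nTO CONTROL\nTRAVERSE\n"

entry = re.compile(
    # "TO" at the start of a line (leading spaces/tabs allowed), followed by any
    # number of words that are not themselves a new "TO", across line breaks.
    r"^[ \t]*(TO\b(?:\s+(?!TO\b)[A-Za-z0-9]+)*)",
    re.MULTILINE,
)

# Collapse the internal line breaks so each entry reads as one phrase.
entries = [" ".join(m.group(1).split()) for m in entry.finditer(data)]

print(entries)
```

The negative lookahead (?!TO\b) is what ends one entry when the next TO line begins.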

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need regex, code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some code that does what you want:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var input = new List<string>
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5"
};
// Named group that captures each literal "bit".
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
    // Only search lines that start with string1 and end with string2.
    if (str.StartsWith("string1") && str.EndsWith("string2"))
    {
        var matchCollection = regex.Matches(str);
        allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
    }
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large/many strings to check. The outer parentheses allow multiple occurrences of "bit".
If you provide us with the language you want to use, we could give more specific solutions.
edit: As you added that you're trying a replacement in Notepad++ I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as regex and $1xyz$2 as replacement pattern (replace xyz with your string). Then perform a "Replace All" operation repeatedly until N++ doesn't find any more matches. The problem here is that this regex only matches one bit per line per pass, and therefore needs to be applied repeatedly.
Btw. even if a regexp matches the whole line, you can still only replace parts of it using capturing groups.
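Outside Notepad++, where code is an option, the same "match the whole span, replace only part of it" idea can be sketched in Python with a replacement function. This is not what the asker can run in Notepad++; it only illustrates the technique, and replace_bits and xyz are placeholder names.

```python
import re


def replace_bits(line: str, replacement: str = "xyz") -> str:
    """Replace every 'bit' that lies between 'string1_' and '_string2'."""
    return re.sub(
        # The span between the two delimiters; the delimiters themselves are
        # only checked by lookarounds and are not part of the match.
        r"(?<=string1_).*?(?=_string2)",
        # Replace all occurrences of "bit" inside the matched span in one pass.
        lambda m: m.group(0).replace("bit", replacement),
        line,
    )


print(replace_bits("string1_dog_bit_johny_bit_string2"))
print(replace_bits("string3_crocodile_bit_johny_bit_string4"))
```

Because the inner replacement handles every bit in the span at once, no repeated passes are needed here.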
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
regex101 demo. I tried it in Notepad++ as well and it works.
There is a description on the demo site, but if you want more explanation, let me know and I'll elaborate!