Assistance with a repeating regex expression - regex

I know there are tons of posts and tutorials on Regex but I am stuck on something that I thought would be fairly simple, sadly not for me ...
Consider the string below:
DEMO1 in ( "test1", "test2", "test3"), CODE_ID in ( "test4", "test5", "test6")
I would like to split the line into two groups, one for 'DEMO1 in' and another for 'CODE_ID in'. The groups would ideally look like:
Match 1 = DEMO1 in ( "test1", "test2", "test3")
Match 2 = CODE_ID in ( "test4", "test5", "test6")
This regex pattern gives me the 'DEMO1 in' and 'CODE_ID in' sections; the problem is how to capture the rest of the string up to the closing parenthesis.
(\w+\s+in\s)
How do I split and capture on the comma after the closing parenthesis? There could be more than just the two groups, but for now the split and capture on two sets would be very helpful.
Not sure about the lifespan of these, but here is a regex101 link to my current work, not much there:
https://regex101.com/r/ukwLZM/1

You can get the full string up to the closing parenthesis with this regex: (\w+\s+in\s+\(.*?\))

It is somewhat of a dirty hack, but I tested that (\w+\s+in \([^)]+\)) works.
To explain:
It captures every character that is not a closing parenthesis, and then the closing parenthesis itself.

You don't need the capture group if you only want the match, and you can use a negated character class that matches any character except ( and ):
\w+\s+in\s+\([^()]*\)
See a regex101 demo.
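For anyone who wants to try this outside regex101, here is a minimal Python sketch of the negated-character-class pattern above. Python is only my choice for illustration (the question does not name a language), and the variable names are made up.

```python
import re

# Sample line from the question.
line = 'DEMO1 in ( "test1", "test2", "test3"), CODE_ID in ( "test4", "test5", "test6")'

# A name, the word "in", then a parenthesised list with no nested parentheses.
pattern = re.compile(r'\w+\s+in\s+\([^()]*\)')

matches = pattern.findall(line)
for number, match in enumerate(matches, start=1):
    print(f"Match {number} = {match}")
```

Because the pattern has no capture group, findall returns the full text of each match, so it should also cope with more than two "NAME in (...)" sets on the same line.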

Related

REGEX - Capture everything except the sentence that starts with a "["

I have been trying for two days to write a regex that captures some information from my postmaster digest.
Example:
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.frAction: failedStatus: 5.2.2Diagnostic-Code: smtp;554 5.2.2 mailbox full;
I want to capture field names like these:
Final-Recipient:
Action:
Status:
Diagnostic-Code:
Remote-MTA:
BUT I don't want to capture
[Stage]:
I wrote a regex that works perfectly fine for capturing:
([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
But sadly I don't know how to tell my regex NOT to capture field names that start with a "[".
I tried this:
[^\[]([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
This avoids capturing "[Stage:" but also captures one character before each of the other matches.
Does anyone know how to capture my postmaster errors?
Thanks in advance.
(NB: Edited; I removed "failedStatus:" and replaced it with "Status: ")
Add (?<!(\[)) before your first regex. The final result would be what you want.
Complete answer:
(?<!(\[))([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
Explanation:
You want to make sure there is no [ immediately before your match. In regex a literal [ is written \[ (wrapped in a group here as (\[)), and "must not be preceded by" is expressed with a negative lookbehind. A lookbehind is written (?<=...); the negative form replaces = with !, giving (?<!...).
So what you need is (?<!(\[))
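A quick way to check the lookbehind is a short Python sketch like the one below. This is only an illustration under my own assumptions: the question does not say which engine is used, I dropped the redundant {1} and the unneeded escapes, and I made the optional prefix non-capturing so that findall returns whole matches.

```python
import re

# Sample digest line from the question.
digest = ('0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]'
          'Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr'
          'Action: failedStatus: 5.2.2'
          'Diagnostic-Code: smtp;554 5.2.2 mailbox full;')

# (?<!\[) rejects any match that starts right after an opening bracket.
field = re.compile(r'(?<!\[)(?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: ')

fields = field.findall(digest)
print(fields)
```

Note that the lookbehind only inspects the single character before the match, so it excludes "[Stage: " but would not exclude a bracketed name with a hyphenated prefix.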
Using sed, you can use one capture group for the character before the match (any character except [) and another group for the whole field name, including the optional group inside it.
Use those in the replacement with a newline between group 1 and group 2: \1\n\2
Note that your pattern would not match failedStatus: as a whole, as it does not start with a capital letter.
You can also omit the quantifier {1}, as 1 is the default, and you don't have to escape the hyphen, the colon or the space (\- and \: and \ ).
sed -E 's/([^\[])(([A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )/\1\n\2/g' File.eml
Output
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]
Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr
Action: failed
Status: 5.2.2
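If sed is not available, the same two-group substitution can be sketched in Python with re.sub. This is my own port for illustration, not part of the sed answer; the variable names are hypothetical and the input is the sample line from the question.

```python
import re

digest = ('0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]'
          'Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr'
          'Action: failedStatus: 5.2.2'
          'Diagnostic-Code: smtp;554 5.2.2 mailbox full;')

# Group 1: the character before the field name (anything but "[").
# Group 2: the field name itself, with its optional "Xxx-" prefix.
split_fields = re.compile(r'([^\[])((?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )')

# Put a newline between the two groups, as the sed replacement does.
result = split_fields.sub(r'\1\n\2', digest)
print(result)
```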
My bad! I made a mistake in my original question!
I want to capture these fields:
Final-Recipient:
-Action:
-Status:
-Diagnostic-Code:
Remote-MTA:
But not this one:
-[Stage: ...
So the regex from ghazal khaki is correct and works fine!
Again thanks for your support guys!

Regex to find after particular word inside a string

I am using a regex to find the values that follow a few keywords and a colon (:). Here is what I have so far.
Sample test case:
test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}
Now I have to find the value for a keyword inside the () brackets. It works for 'aaa', but when I use 'test' it fails: it matches everything up to the last quote in the string.
My regex so far:
\btest(.*\w") (failed case) expected "aff", returned "aff", aaa: "aa1"
\baaa(.*\w") (pass case) returned "aa1"
Please let me know if more information is needed.
You may try
:\s*"(.*?)"
And the data you need is in the first capturing group.
Explanation
:\s*"(.*?)"
: colon
\s* followed by optionally any number of spaces
" followed by quote
( ) capturing group, containing...
.*? any number of character, matching as few as possible
" followed by quote
Demo:
https://regex101.com/r/WnvzdG/1
Update:
If you want to match ONLY after specific keywords, followed by colon, you can do something like:
(KEYWORD1|KEYWORD2|KEYWORD3)\s*:\s*"(.*?)"
First capture group will be the keyword matched, second capture group will be the value.
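Here is a small Python sketch of the keyword variant above, run against the sample from the question. The keyword list and variable names are assumptions of mine, and I added \b so that a keyword is not matched in the middle of a longer word.

```python
import re

sample = '''test {
  test1 {
    sadffd(test: "aff", aaa: "aa1") {}
  }
}'''

keywords = ["test", "aaa"]  # hypothetical list of keywords to look for

# (KEYWORD1|KEYWORD2)\s*:\s*"(.*?)" built from the list above.
pattern = re.compile(
    r'\b(' + '|'.join(map(re.escape, keywords)) + r')\s*:\s*"(.*?)"'
)

# findall yields (keyword, value) pairs, which map straight into a dict.
values = dict(pattern.findall(sample))
print(values)
```

The lazy .*? stops at the first closing quote, which is what keeps "test" from swallowing the rest of the line.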
One more approach (executed in Python):
import re
items = ['test{test1 {sadffd(test: "aff", aaa: "aa1") {}}}']
for item in items:
    # every double-quoted word
    print(re.findall(r'"(\w+)"', item))
    # only quoted words preceded by a colon and a space
    print(re.findall(r'(?<=: )"(\w+)"', item))
Output
['aff', 'aa1']
['aff', 'aa1']
I believe a simple regex would work to get everything inside the double quotes in your case:
("\w+")
Note that your question above says you want to capture "aff" and not just aff so I've included the surrounding quotes within the capturing group.
It's pretty crude but this should be OK for the input you've presented. (It wouldn't handle things like an escaped double quote in the string, for example).
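To illustrate the escaped-quote limitation, here is a hedged Python sketch. The input string is invented for the example, and the second pattern is a common idiom for quoted strings with backslash escapes rather than anything proposed in this thread.

```python
import re

# Hypothetical input containing escaped double quotes.
text = r'name: "a \"quoted\" word"'

# The simple pattern needs only word characters between the quotes.
simple = re.findall(r'"(\w+)"', text)

# Either a character that is not a quote or backslash, or a backslash
# followed by any character, repeated up to the closing quote.
robust = re.findall(r'"((?:[^"\\]|\\.)*)"', text)

print(simple)
print(robust)
```

The second pattern returns the content with the backslashes still in place, so unescaping would be a separate step.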

Regex select everything up until next match including new lines

I'm trying to capture the conversation below, but my regex only captures a single line. I want it to capture the entire phrase said by each person, up until the next person says something. If I use the /s flag, the '.+' captures everything until the end of the file rather than until the next match.
I'm new to regular expressions, sorry for any bad explanation.
This is what I've got so far.
The regex:
/([0-9]{2}\/[0-9]{2}\/[0-9]{2} [0-9]{2}\:[0-9]{2}\:[0-9]{2}: (.+):) (.+)/
I'm going to use both \2 and \3 (who spoke and what was said) inside a for loop so I can text-mine it.
Using a pattern to extract, then some LINQ to process (C#):
var pattern = "^[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}: (.+?): ((?:[^/]+(?:\n|$))+)";
var data = Regex.Matches(src, pattern, RegexOptions.Multiline).Cast<Match>().Select(m => new { who = m.Groups[1].Value, text = m.Groups[2].Value});
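A different way to get "everything up to the next speaker" is a lazy match that stops at a lookahead for the next timestamp. Below is a minimal Python sketch of that idea, not a translation of the C# above. The chat log is hypothetical, since the sample from the question is not available, and I assumed a speaker name never contains a colon.

```python
import re

# Hypothetical chat log in the timestamp format used by the question's regex.
log = (
    "01/02/20 10:00:00: Alice: Hello there,\n"
    "how are you?\n"
    "01/02/20 10:00:05: Bob: Fine, thanks.\n"
)

STAMP = r"\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}: "

# Group 1 is the speaker, group 2 the text. DOTALL lets . cross newlines,
# and the lazy .*? stops just before the next line that starts with a
# timestamp, or at the end of the input.
pattern = re.compile(
    rf"^{STAMP}([^:\n]+): (.*?)(?=^{STAMP}|\Z)",
    re.MULTILINE | re.DOTALL,
)

messages = [(who, text.strip()) for who, text in pattern.findall(log)]
for who, text in messages:
    print(who, "->", repr(text))
```

A message line that itself begins with something shaped like a timestamp would be split early, so this is a sketch for well-formed logs only.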

How can I match multiple hits between 2 delimiters?

Hi, my fellow RegEx'ers ;)
I'm trying to match each piece of text between a pair of quotes.
Here's my text:
...random code
someArray[] = ["Come and",
"get me,",
"or fail",
"trying!",
"Yours truly"]
random code...
So far, I managed to get the correct matches with two patterns, executed after each other:
(?s)someArray\[\].*?=.*?\[(.*?)\]
This extracts the text between the two brackets, and on the result I use this one:
"(.*?)"
This is working just fine, but I'd love to get the texts with one regex.
Any help is highly appreciated!
Consider using \G. With its help, you may match "(.*?)" preceded by either someArray[] = [ or previous match of "(.*?)" (well, strictly speaking previous match of entire regex). Then just grab first capture groups from all matches:
(?:(?s).*someArray\[\].*?=.*?\[|\G[^"\]]+)"(.*?)"
Demo: https://regex101.com/r/eBQWdU/3
How you grab the first capture groups depends on the language you're using the regex in. For example, in PHP you may do something like this:
preg_match_all('/(?:(?s).*someArray\[\].*?=.*?\[|\G[^"\]]+)"(.*?)"/', $input, $matches);
$array_items = $matches[1];
Demo: https://ideone.com/mZgU1x
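Not every engine has \G. Python's standard re module, for one, does not support it, so there the two-pass approach from the question is still the straightforward option. A minimal sketch, with variable names of my own choosing:

```python
import re

source = '''...random code
someArray[] = ["Come and",
"get me,",
"or fail",
"trying!",
"Yours truly"]
random code...'''

# Pass 1: isolate the text between the brackets of the array literal.
outer = re.search(r'someArray\[\].*?=.*?\[(.*?)\]', source, re.DOTALL)

# Pass 2: pull every quoted item out of that text.
items = re.findall(r'"(.*?)"', outer.group(1)) if outer else []
print(items)
```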

Regex: How do I match something that may OR may not be between [ ]

I am parsing a log using Perl and I am stumped as to how I can parse something like this:
from=[ihatethisregex#hotmail.com]
from=ihatethisregex#hotmail.com
What I need is ihatethisregex#hotmail.com and I need to capture this in a named capture group called "email".
I tried the following:
(?<email>(?:\[[^\]]+\])|(?:\S+))
But this captures the square brackets when it parses the first line. I don't want the square brackets. Was wondering if I could do something like this:
(?:\[(?<email>[^\]]+)\])|(?<email>\S+)
and when I evaluate $+{email}, it will just take whichever one that was matched. I also tried the following:
(?:\[?(?<email>(?:[^\]]+\])|(?:\S+)))
But this gave strange results when the email was wrapped in a pair of square brackets.
Any help is appreciated.
/(\[)?your-regexp-here(?(1)\]|)/
( ) capture group #1
\[ opening bracket
? optionally
your-regexp-here your regexp
(?( ) ) conditional match:
1 if capture group #1 evaluated,
\] closing bracket
| else nothing
Note that this does not work in all languages, since conditional match is not a part of a standard regular expression, but rather an extension. Works in Perl, though.
EDIT: misplaced question mark.
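For comparison, Python's re module also accepts this conditional syntax, so the idea can be sketched there as well. This is an illustration only: the thread is about Perl, Python spells the named group (?P<email>...), and the email sub-pattern below is my stand-in for "your-regexp-here".

```python
import re

lines = [
    "from=[ihatethisregex#hotmail.com]",
    "from=ihatethisregex#hotmail.com",
]

# (\[)?      optional opening bracket, captured as group 1
# (?(1)\])   require a closing bracket only if group 1 took part in the match
pattern = re.compile(r'from=(\[)?(?P<email>[^\]\s]+)(?(1)\])')

emails = [pattern.search(line).group("email") for line in lines]
print(emails)
```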
I tend to do these kinds of things in two steps, just because it's clearer:
my ($val)= /\w+=(.*)/ ;
$val =~ s/\[(.*)\]/$1/ ;
This trims off the [] separately.
Perhaps the following will be helpful:
use strict;
use warnings;
while (<DATA>) {
    # exclude whitespace as well as ] so the trailing newline is not captured
    /from\s*=\s*\[?(?<email>[^\]\s]+)\]?/;
    print $+{email}, "\n";
}
__DATA__
from=[ihatethisregex#hotmail.com]
from=ihatethisregex#hotmail.com
Output:
ihatethisregex#hotmail.com
ihatethisregex#hotmail.com