Regex - Capture everything except the sentence that starts with a "["

I have been trying for 2 days to write a regex that captures some information from my postmaster digest.
Example:
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.frAction: failedStatus: 5.2.2Diagnostic-Code: smtp;554 5.2.2 mailbox full;
I want to capture field names like these:
Final-Recipient:
Action:
Status:
Diagnostic-Code:
Remote-MTA:
BUT I don't want to capture
[Stage]:
I wrote a regex that works perfectly fine for capturing them:
([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
But sadly I don't know how to tell my regex NOT to capture field names that start with a "[".
I tried this:
[^\[]([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
This avoids capturing "[Stage:" but also captures one extra character before each of the other captured field names.
Does anyone know how to capture my postmaster errors?
Thanks in advance.
(NB: Edited. I removed "failedStatus:" and replaced it with "Status: ")

Add (?<!(\[)) before your first regex. The final result will be what you want.
Complete answer:
(?<!(\[))([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
Explanation:
You want to prevent a [ from appearing immediately before your phrase. In regex a literal [ is written \[, and "must not be preceded by" is expressed with a negative lookbehind: ?< means lookbehind and ! means not.
So what you need is (?<!(\[))
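As a quick check of the lookbehind, here is a minimal Python sketch run against the sample line from the question. The names SAMPLE and FIELD are illustrative, the pattern is written without the redundant escapes and with a non-capturing group, and it assumes Python's re module behaves like the engine the asker is using.

```python
import re

# Sample line from the question (addresses are anonymised in the original).
SAMPLE = (
    "0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]"
    "Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr"
    "Action: failedStatus: 5.2.2"
    "Diagnostic-Code: smtp;554 5.2.2 mailbox full;"
)

# (?<!\[) is a negative lookbehind: a match may not start right after "[".
FIELD = re.compile(r"(?<!\[)(?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: ")

# Collect every field name, dropping the trailing space.
fields = [m.group(0).strip() for m in FIELD.finditer(SAMPLE)]
print(fields)
```

Unlike the [^\[] attempt, the lookbehind consumes no character, so nothing extra ends up in front of each match.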

Using sed, you can use one capture group for the first part, which matches any character except [, and another group for the whole last part including the optional capture group inside.
Use those in the replacement with a newline between group 1 and group 2 \1\n\2
Note that your pattern would not match failedStatus: as it does not start with a capital letter.
Also you can omit this quantifier {1} as 1 is the default, and you don't have to escape \- and \: and \
sed -E 's/([^\[])(([A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )/\1\n\2/g' File.eml
Output
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]
Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr
Action: failed
Status: 5.2.2
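For readers without sed, the same substitution can be sketched in Python with re.sub. This is an assumed equivalent, not part of the original answer: SAMPLE stands in for the contents of File.eml, and the inner optional group is made non-capturing so that only groups 1 and 2 exist.

```python
import re

# Stand-in for the contents of File.eml (the sample line from the question).
SAMPLE = (
    "0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]"
    "Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr"
    "Action: failedStatus: 5.2.2"
    "Diagnostic-Code: smtp;554 5.2.2 mailbox full;"
)

# Group 1: one character that is not "["; group 2: the field name.
PATTERN = re.compile(r"([^\[])((?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )")

# Put a newline between the preceding character and the field name.
split_text = PATTERN.sub(r"\1\n\2", SAMPLE)
print(split_text)
```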

My bad! I made a mistake in my original question!
I want to capture these fields:
Final-Recipient:
Action:
Status:
Diagnostic-Code:
Remote-MTA:
But not this one:
[Stage: ...
So the regex from ghazal khaki is correct and works fine!
Again thanks for your support guys!

Related

Regex to find after particular word inside a string

I am using a regex to find the values that follow a few keywords and a colon (:). Here is my sample test case:
test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}
Now I have to find the value for a keyword inside the () brackets. It works for 'aaa', but when I search for 'test' it fails: it matches the rest of the string instead of a single value.
My regex so far:
\btest(.*\w") (failed case) expected "aff", returned "aff", aaa: "aa1"
\baaa(.*\w") (pass case) returned "aa1"
Please let me know if more information is needed.
You may try
:\s*"(.*?)"
And the data you need is in the first capturing group.
Explanation
:\s*"(.*?)"
: colon
\s* followed by any number of whitespace characters (possibly none)
" followed by a quote
( ) capturing group, containing...
.*? any number of characters, matching as few as possible
" followed by a quote
Demo:
https://regex101.com/r/WnvzdG/1
Update:
If you want to match ONLY after specific keywords, followed by colon, you can do something like:
(KEYWORD1|KEYWORD2|KEYWORD3)\s*:\s*"(.*?)"
First capture group will be the keyword matched, second capture group will be the value.
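A minimal Python sketch of the keyword variant, run on the sample from the question. The KEYWORDS tuple and the leading \b are assumptions added for illustration; \b keeps a keyword from matching at the tail of a longer word.

```python
import re

# Sample test case from the question.
SAMPLE = '''test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}'''

# Keywords to look for (illustrative; replace with your own).
KEYWORDS = ("test", "aaa")

# keyword, optional whitespace, colon, optional whitespace, quoted value.
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r')\s*:\s*"(.*?)"'
)

# findall returns one (keyword, value) tuple per match.
pairs = pattern.findall(SAMPLE)
print(pairs)
```

The lazy .*? stops at the first closing quote, which is why "test" yields only aff rather than everything up to the last quote.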
One more approach (executed in Python)
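import re

# The sample from the question, collapsed onto one line.
items = ['test{test1 {sadffd(test: "aff", aaa: "aa1") {}}}']
for item in items:
    # Every quoted word, wherever it appears.
    print(re.findall(r'"(\w+)"', item))
    # Only quoted words that directly follow ": " (lookbehind).
    print(re.findall(r'(?<=: )"(\w+)"', item))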
Output
['aff', 'aa1']
['aff', 'aa1']
I believe a simple regex would work to get everything inside the double quotes in your case:
("\w+")
Note that your question above says you want to capture "aff" and not just aff so I've included the surrounding quotes within the capturing group.
It's pretty crude but this should be OK for the input you've presented. (It wouldn't handle things like an escaped double quote in the string, for example).

How to use regex in Notepad++ to search for emails with specific domains

I am trying to use Notepad++ to delete emails that end in #domain2.serverdata.net
Here is an example string:
smtp:name#domain1.com;SMTP:name#domain2.com;smtp:name#domain2.serverdata.net;smtp:name#domain3.com;smtp:name_e4d1fe3d-e985-40d0-bc65-32c57c9b14d1#domain2.serverdata.net
I was hoping to use:
;smtp:.*#domain2.serverdata.net
but it captures SMTP:name#domain2.com as well
Ctrl+H
Find what: (?:\A|;)smtp:[^#]*#domain2\.serverdata\.net
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
(?:\A|;) # non-capturing group: beginning of file or a semicolon;
         # this also allows deleting the first email in the file,
         # which has no semicolon before it
smtp: # literally
[^#]* # 0 or more characters that are not #
#domain2\.serverdata\.net # literally
Try Regex: ;?smtp:[\w.-]+?#domain2\.serverdata\.net
Regex quantifiers such as .* are greedy by default: they match as much as possible. For instance, START.*STOP applied to the following text:
STARTsghlegdSTOPfsgikbSTARTsvdinusSTOPwegtgw
will capture this part:
STARTsghlegdSTOPfsgikbSTARTsvdinusSTOPwegtgw
^------------------------------------^
In your case, the .* captures everything up to the last instance of #domain2.serverdata.net. You don't want to use . (any character), you want to use "any character except '#'" which is written like this: [^#].
So your full regex would be smtp:[^#]*#domain2\.serverdata\.net. I also dropped the initial ; since it would prevent you from capturing the first mail address.
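Both points can be checked with a short Python sketch. It assumes Python's re module with case-sensitive matching, and for the deletion it borrows the (?:\A|;) prefix from the Notepad++ answer above so that no stray semicolons are left behind; the variable names are illustrative.

```python
import re

# Greedy .* runs to the LAST "STOP", swallowing the text in between.
TEXT = "STARTsghlegdSTOPfsgikbSTARTsvdinusSTOPwegtgw"
greedy = re.search(r"START.*STOP", TEXT).group(0)
print(greedy)

# Example string from the question ("#" stands in for "@" in the original).
ADDRESSES = (
    "smtp:name#domain1.com;SMTP:name#domain2.com;"
    "smtp:name#domain2.serverdata.net;smtp:name#domain3.com;"
    "smtp:name_e4d1fe3d-e985-40d0-bc65-32c57c9b14d1#domain2.serverdata.net"
)

# [^#]* cannot cross a "#", so each match stays inside one address.
cleaned = re.sub(r"(?:\A|;)smtp:[^#]*#domain2\.serverdata\.net", "", ADDRESSES)
print(cleaned)
```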
Try this one:
smtp:[^#]+#domain2\.serverdata\.net(;)?

Regex - Matching characters then capturing

This is my first post on StackOverflow, and regex is new to me, so please bear with me.
I am attempting to capture fields within a powershell command event log.
I have text in the following format:
(Get-AdUser): name="Identity"; value="Username"
I want to capture the string inside the parentheses (Get-AdUser) and also capture the value field ("Username").
If possible a final output of
Get-AdUser Username
would be perfect.
The gotcha is that I want to capture any value inside the parenthesis except for the word "Out-Default". Out-Default is the output of a command, rather than the command itself.
So far I have:
\((?!Out-Default)([^)]+)\)
which matches anything inside the parentheses except "Out-Default".
I'm not sure how to approach it from here. Any advice is appreciated.
Update - is it possible to use only 1 capture group to capture:
(Get-AdUser): name="Identity"; value="Username"
and have the result look like
Get-AdUser name=Identity value=Username
?
I hope this works:
\((?!Out-Default)([^)]+)\).*?value="([^"]+)"
Explanation:
\: Escapes a special character
( … ): Capturing group
(?!…): Negative lookahead
[^x]: One character that is not x
+: One or more
.: Any character except line break
*: Zero or more times
?: Once or none
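A minimal Python sketch of how the two capture groups could be joined into the desired "Get-AdUser Username" output. The summarize helper and the Out-Default sample line are hypothetical, made up for illustration; only the first input comes from the question.

```python
import re

# Group 1: command name (not Out-Default); group 2: the value field.
PATTERN = re.compile(r'\((?!Out-Default)([^)]+)\).*?value="([^"]+)"')

def summarize(line):
    """Return 'Command Value' for a log line, or None if it does not match."""
    m = PATTERN.search(line)
    return f"{m.group(1)} {m.group(2)}" if m else None

wanted = summarize('(Get-AdUser): name="Identity"; value="Username"')
# Hypothetical Out-Default line, rejected by the negative lookahead.
skipped = summarize('(Out-Default): name="InputObject"; value="x"')
print(wanted, skipped)
```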

Get all the characters until a new date/hour is found

I have to parse a lot of content with a regular expression.
The content might, for example, be:
14-08-2015 14:18 : Example : Hello =) How are you?
What are you doing?
14-08-2015 14:19: Example2 : I'm fine thanks!
I have this regular expression, which of course returns 2 matches with the groups that I need (date, hour, name, multi-line message):
(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):([^\d]+)
The problem is that if a number is written inside the message this will not be OK, because the regex will stop getting more characters.
For example in this case this will not work:
14-08-2015 14:18 : Example : Hello =) How are you?
What are you 2 doing?
14-08-2015 14:19: Example2 : I'm fine thanks!
How do I get all the characters until a new date/hour is found?
The problem is with your final capturing group ([^\d]+).
Instead you can use ((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)
The outer parentheses ( ... ) form a capturing group.
The next set of parentheses, (?: ... )+, is a non-capturing group that we match one or more times.
Inside it is a negative lookahead, (?!\d{2}-\d{2}-\d{4}), which says that the character we are about to match cannot be the start of a date.
What we actually consume is [\s\S], which matches any character, including a newline.
The entire regex that works looks like this:
(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)
https://regex101.com/r/wH5xR2/2
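Here is a minimal Python sketch of that regex applied to the failing sample from the question. The names LOG and ENTRY are illustrative, and the fields are stripped of surrounding whitespace for readability, which the original regex does not do.

```python
import re

# The sample from the question, including the digit inside the message.
LOG = (
    "14-08-2015 14:18 : Example : Hello =) How are you?\n"
    "What are you 2 doing?\n"
    "14-08-2015 14:19: Example2 : I'm fine thanks!\n"
)

# The last group consumes one character at a time, as long as a date
# does not start at the current position.
ENTRY = re.compile(
    r"(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):"
    r"((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)"
)

# One (date, hour, name, message) tuple per entry.
entries = [
    tuple(part.strip() for part in m.groups()) for m in ENTRY.finditer(LOG)
]
for entry in entries:
    print(entry)
```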
Use a lookahead for dates and get everything up to that.
/^(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):\s?((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)/sm
I've edited your regex in two ways:
Added ^ to the front, ensuring you only start from timestamps on their own line, which should filter out most issues with people posting timestamps
Replaced the last capturing group with ((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)
(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}) is a negative lookahead, with date
(?:(lookahead).)* Looks for any amount of characters that aren't followed by a date anchored to the start of a line.
((?:(lookahead).)*) Just captures the group for you.
It's not that efficient, but it works. Note the s flag for dotall (dot matches newlines) and m flag that lets ^ match at the start of line. ^ is necessary in the lookahead so that you don't stop the match in case someone posts a timestamp, and in the start to make sure you only match dates from the start of a line.
DEMO: https://regex101.com/r/rX8eH0/3
DEMO with flags in regex: https://regex101.com/r/rX8eH0/4
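The effect of anchoring the lookahead with ^ can be sketched in Python, where the s and m flags correspond to re.S and re.M. The LOG text below is a hypothetical variation of the question's sample, with a timestamp typed in the middle of a message.

```python
import re

# Hypothetical log: the first message mentions a date and time mid-line.
LOG = (
    "14-08-2015 14:18 : Example : Meet on 15-08-2015 15:00?\n"
    "14-08-2015 14:19: Example2 : Fine!\n"
)

# re.S lets "." match newlines; re.M lets "^" match at every line start.
ENTRY = re.compile(
    r"^(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):\s?"
    r"((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)",
    re.S | re.M,
)

# The message only ends where a timestamp begins a new line.
messages = [m.group(4).strip() for m in ENTRY.finditer(LOG)]
print(messages)
```

Without the ^ inside the lookahead, the first message would be cut off at "Meet on".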

How to skip phrase only if it exists

I've been cutting my teeth on regex over the past couple of days, and have encountered an issue I can't seem to get past.
Let's assume the following 3 string values:
AKA NAME:FOO
FOO
AKA NAME:
My goal is to capture the value of the string after AKA NAME: in a named match group, and if AKA NAME: is not present, capture the entire string in the match group. If "AKA NAME:" IS present with no subsequent value, the regex expression should fail. I have developed the following expression
^(?:AKA NAME:)?\s*(?<VALUE>(.|\n|\r){1,225})$
This correctly captures the word "FOO" in the first 2 strings above; however, in the third it captures "AKA NAME:" in the match group. I expected that putting ? after the non-capturing group containing "AKA NAME:" would cause the engine to skip this value, but it does not.
Can someone give me some guidance?
You can try with:
(?:AKA NAME:)*(.+)*
and check if $1 exists.
Use a lookbehind assertion for the prefixed case, and a negative lookahead to exclude a bare "AKA NAME:":
(?<=AKA NAME:)\s?(\w+)|(?!AKA NAME:)^(\w+)
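A minimal Python sketch of this pattern against the three strings from the question. The extract helper is hypothetical; it returns whichever of the two groups matched. Note that \w+ stops at the first non-word character, so a value containing spaces would be cut short.

```python
import re

# Alternative 1: a word right after "AKA NAME:" (lookbehind).
# Alternative 2: a word at the start of a string that lacks the prefix.
PATTERN = re.compile(r"(?<=AKA NAME:)\s?(\w+)|(?!AKA NAME:)^(\w+)")

def extract(text):
    """Return the captured value, or None when the pattern fails."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group(1) or m.group(2)

results = {s: extract(s) for s in ("AKA NAME:FOO", "FOO", "AKA NAME:")}
print(results)
```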
I think you can use this regex:
"^(AKA NAME\s*:)?\s*(.*)$"gm
and get \2 for your result.
^(AKA NAME:)?(.*)$
\2 should contain what you're looking for.