I have been trying for two days to write a regex that captures some information from my postmaster digest.
Example:
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.frAction: failedStatus: 5.2.2Diagnostic-Code: smtp;554 5.2.2 mailbox full;
I want to capture field names like these:
Final-Recipient:
Action:
Status:
Diagnostic-Code:
Remote-MTA:
But I don't want to capture
[Stage]:
I wrote a regex that works perfectly fine for capturing them:
([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
But sadly I don't know how to tell my regex NOT to capture fields that start with a "[".
I tried this:
[^\[]([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
This avoids capturing "[Stage:" but also captures one character before each of the other captured fields.
Does anyone know how to capture my postmaster errors?
Thanks in advance.
(NB: Edited; I removed "failedStatus:" and replaced it with "Status: ")
Add (?<!(\[)) before your first regex. The final result should be what you want.
Complete answer:
(?<!(\[))([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
Explanation:
You want to make sure there is no [ immediately before your field name. In regex a literal [ is written \[, and "not preceded by" is expressed with a negative lookbehind: (?<= is a positive lookbehind and (?<! is its negated form.
So what you need is (?<!(\[)). The inner capture group is optional; (?<!\[) behaves the same way.
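To see the lookbehind in action, here is a minimal Python sketch run against the sample line from the question. It assumes Python's re module, drops the unneeded escapes, and uses a non-capturing group so that each match is the whole field name; the variable names are only illustrative.

```python
import re

# Sample line from the question (addresses are already masked there).
line = ("0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]"
        "Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr"
        "Action: failedStatus: 5.2.2"
        "Diagnostic-Code: smtp;554 5.2.2 mailbox full;")

# (?<!\[) rejects any match that starts right after a "[".
with_lookbehind = re.compile(r"(?<!\[)(?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: ")
without_lookbehind = re.compile(r"(?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: ")

fields = [m.group(0) for m in with_lookbehind.finditer(line)]
all_fields = [m.group(0) for m in without_lookbehind.finditer(line)]

print(fields)      # field names, without "Stage: "
print(all_fields)  # same list, but with "Stage: " included
```

Because the lookbehind is zero-width, it does not consume the character before the field name, which is what went wrong with the [^\[] attempt.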
Using sed, you can use one capture group for the first part, which matches any single character except [, and another group for the whole last part, including the optional capture group inside.
Use those in the replacement with a newline between group 1 and group 2: \1\n\2
Note that your pattern would not match failedStatus: as it does not start with a capital letter.
Also, you can omit the quantifier {1}, as 1 is the default, and you don't have to escape the -, the : or the space.
sed -E 's/([^\[])(([A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )/\1\n\2/g' File.eml
Output
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]
Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr
Action: failed
Status: 5.2.2
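For readers without sed, the same two-group substitution can be sketched in Python. This is only an illustration of the idea above, assuming the question's sample line as input instead of File.eml; the inner group is made non-capturing so the numbering stays \1 and \2.

```python
import re

line = ("0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]"
        "Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr"
        "Action: failedStatus: 5.2.2"
        "Diagnostic-Code: smtp;554 5.2.2 mailbox full;")

# Group 1: one character that is not "[". Group 2: the field name.
pattern = re.compile(r"([^\[])((?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )")

# Put a newline between the preceding character and the field name.
split_text = pattern.sub(r"\1\n\2", line)
print(split_text)
```

Here group 1 is consumed on purpose and written back in the replacement, so the extra character the question complained about is not lost.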
My bad! I made a mistake in my original question!
I want to capture these fields:
Final-Recipient:
Action:
Status:
Diagnostic-Code:
Remote-MTA:
But not this one:
[Stage: ...
So the regex from ghazal khaki is correct and works fine!
Again thanks for your support guys!
I am using regex to find the values that follow a few keywords and a colon (:). The best I have reached so far is shown below.
Sample test case:
test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}
Now I have to find the value of a keyword inside the () brackets. It works for 'aaa', but when I use 'test' it fails: it matches everything up to the last quoted string.
My regex so far:
\btest(.*\w") (failed case) expected "aff", returned "aff", aaa: "aa1"
\baaa(.*\w") (pass case) returned "aa1"
Please let me know if more information is needed.
You may try
:\s*"(.*?)"
And the data you need is in the first capturing group.
Explanation
:\s*"(.*?)"
: colon
\s* followed by optionally any number of spaces
" followed by quote
( ) capturing group, containing...
.*? any number of characters, matching as few as possible
" followed by quote
Demo:
https://regex101.com/r/WnvzdG/1
Update:
If you want to match ONLY after specific keywords, followed by colon, you can do something like:
(KEYWORD1|KEYWORD2|KEYWORD3)\s*:\s*"(.*?)"
First capture group will be the keyword matched, second capture group will be the value.
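As a quick check of the keyword version, here is a small Python sketch. It assumes the keywords are test and aaa from the question, and it adds a leading \b (my own addition, not part of the answer) so that a longer word ending in a keyword is not picked up.

```python
import re

text = 'test { test1 { sadffd(test: "aff", aaa: "aa1") {} } }'

# Group 1 is the keyword, group 2 is the quoted value.
pattern = re.compile(r'\b(test|aaa)\s*:\s*"(.*?)"')

pairs = pattern.findall(text)
values = dict(pairs)

print(pairs)
print(values["test"])
```

The lazy .*? stops at the first closing quote, which is what keeps "aff" from swallowing the rest of the line.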
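One more approach (executed in Python):
import re

items = ['test{test1 {sadffd(test: "aff", aaa: "aa1") {}}}']
for item in items:
    # every run of word characters between double quotes
    print(re.findall(r'"(\w+)"', item))
    # the same, but only when the quote directly follows ": "
    print(re.findall(r'(?<=: )"(\w+)"', item))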
Output
['aff', 'aa1']
['aff', 'aa1']
I believe a simple regex would work to get everything inside the double quotes in your case:
("\w+")
Note that your question above says you want to capture "aff" and not just aff so I've included the surrounding quotes within the capturing group.
It's pretty crude but this should be OK for the input you've presented. (It wouldn't handle things like an escaped double quote in the string, for example).
I am trying to use Notepad++ to delete email addresses that end in #domain2.serverdata.net
Here is an example string:
smtp:name#domain1.com;SMTP:name#domain2.com;smtp:name#domain2.serverdata.net;smtp:name#domain3.com;smtp:name_e4d1fe3d-e985-40d0-bc65-32c57c9b14d1#domain2.serverdata.net
I was hoping to use:
;smtp:.*#domain2.serverdata.net
But it captures SMTP:name#domain2.com as well.
Ctrl+H
Find what: (?:\A|;)smtp:[^#]*#domain2\.serverdata\.net
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
(?:\A|;)                    # non-capturing group: beginning of file or a semicolon;
                            # this also allows deleting the first email of the file,
                            # which has no semicolon before it
smtp:                       # literally
[^#]*                       # 0 or more characters that are not #
#domain2\.serverdata\.net   # literally
Try Regex: ;?smtp:[\w.-]+?#domain2\.serverdata\.net
Quantifiers such as * are greedy by default: they capture as much as possible. For instance: START.*STOP applied to the following text:
STARTsghlegdSTOPfsgikbSTARTsvdinusSTOPwegtgw
will capture this part:
STARTsghlegdSTOPfsgikbSTARTsvdinusSTOPwegtgw
^------------------------------------^
In your case, the .* captures everything up to the last instance of #domain2.serverdata.net. You don't want to use . (any character), you want to use "any character except '#'" which is written like this: [^#].
So your full regex would be smtp:[^#]*#domain2\.serverdata\.net. I also dropped the initial ; since it would prevent you from capturing the first mail address.
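The difference is easy to reproduce outside Notepad++. Below is a minimal Python sketch, assuming the example string from the question; it compares the greedy .* with the [^#]* version and then performs the deletion with the (?:\A|;) prefix from the earlier answer.

```python
import re

s = ("smtp:name#domain1.com;SMTP:name#domain2.com;"
     "smtp:name#domain2.serverdata.net;smtp:name#domain3.com;"
     "smtp:name_e4d1fe3d-e985-40d0-bc65-32c57c9b14d1#domain2.serverdata.net")

# Greedy: .* runs to the last "#domain2.serverdata.net" in the string.
greedy = re.findall(r"smtp:.*#domain2\.serverdata\.net", s)

# Bounded: [^#]* cannot cross a "#", so each match stays inside one address.
bounded = re.findall(r"smtp:[^#]*#domain2\.serverdata\.net", s)

# Deletion, including the leading semicolon when there is one.
cleaned = re.sub(r"(?:\A|;)smtp:[^#]*#domain2\.serverdata\.net", "", s)

print(greedy)
print(bounded)
print(cleaned)
```

Note that Python's re is case-sensitive by default, so SMTP: is never a starting point here; in Notepad++ that depends on the "Match case" option.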
Try this one:
smtp:[^#]+#domain2\.serverdata\.net(;)?
This is my first post on StackOverflow, and regex is new to me, so please bear with me.
I am attempting to capture fields within a PowerShell command event log.
I have text in the following format:
(Get-AdUser): name="Identity"; value="Username"
I want to capture the string inside the parentheses (Get-AdUser) and also capture the content of the value field (Username).
If possible a final output of
Get-AdUser Username
would be perfect.
The gotcha is that I want to capture any value inside the parentheses except for the word "Out-Default". Out-Default is the output of a command, rather than the command itself.
So far I have:
\((?!Out-Default)([^)]+)\)
which matches anything inside the parentheses except "Out-Default".
I'm not sure how to approach it from here. Any advice is appreciated.
Update - is it possible to use only 1 capture group to capture:
(Get-AdUser): name="Identity"; value="Username"
and have the result look like
Get-AdUser name=Identity value=Username
?
Hope this works:
\((?!Out-Default)([^)]+)\).*?value="([^"]+)"
Explanation:
\: Escapes a special character
( … ): Capturing group
(?!…): Negative lookahead
[^x]: One character that is not x
+: One or more
.: Any character except line break
*: Zero or more times
?: After *, makes the quantifier lazy (match as few characters as possible)
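Here is a short Python sketch of how the two groups could be joined into the requested output. The first input line is from the question; the Out-Default line is a hypothetical example I made up to show the exclusion.

```python
import re

pattern = re.compile(r'\((?!Out-Default)([^)]+)\).*?value="([^"]+)"')

lines = [
    '(Get-AdUser): name="Identity"; value="Username"',
    # Hypothetical line, only here to show that Out-Default is skipped.
    '(Out-Default): name="InputObject"; value="Something"',
]

results = []
for line in lines:
    m = pattern.search(line)
    if m:
        # group(1) is the command, group(2) is the value
        results.append(" ".join(m.groups()))

print(results)
```

Keep in mind that (?!Out-Default) also rejects any command name that merely starts with Out-Default.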
I have to parse a lot of content with a regular expression.
The content might, for example, be:
14-08-2015 14:18 : Example : Hello =) How are you?
What are you doing?
14-08-2015 14:19: Example2 : I'm fine thanks!
I have this regular expression that will of course return 2 matches, and the groups that I need (date, hour, name, multi-line message):
(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):([^\d]+)
The problem is that if a number is written inside the message this will not be OK, because the regex will stop getting more characters.
For example in this case this will not work:
14-08-2015 14:18 : Example : Hello =) How are you?
What are you 2 doing?
14-08-2015 14:19: Example2 : I'm fine thanks!
How do I get all the characters until a new date/hour is found?
The problem is with your final capturing group ([^\d]+).
Instead you can use ((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)
The outer parentheses ( ... ) form a capturing group.
The next set of parentheses, (?: ... )+, is a non-capturing group that we want to match one or more times.
Inside we have a negative lookahead, (?!\d{2}-\d{2}-\d{4}). This says that the character we are about to match cannot be the start of a date.
What we actually match each time is [\s\S], which means any character, including a newline.
The entire regex that works looks like this:
(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)
https://regex101.com/r/wH5xR2/2
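A minimal Python sketch of the full regex, assuming the failing input from the question (the message that contains the digit 2). The strip() calls are my own addition to tidy the surrounding spaces.

```python
import re

log = ("14-08-2015 14:18 : Example : Hello =) How are you?\n"
       "What are you 2 doing?\n"
       "14-08-2015 14:19: Example2 : I'm fine thanks!")

pattern = re.compile(
    r"(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):"
    # Consume one character at a time, as long as no date starts here.
    r"((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)"
)

messages = [tuple(part.strip() for part in m.groups())
            for m in pattern.finditer(log)]

for date, hour, name, text in messages:
    print(date, hour, name, repr(text))
```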
Use a lookahead for dates and get everything up to that.
/^(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):\s?((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)/sm
I've edited your regex in two ways:
Added ^ to the front, ensuring you only start from timestamps at the start of a line, which should filter out most issues with people posting timestamps inside messages
Replaced the last capturing group with ((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)
(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}) is a negative lookahead for a date and time at the start of a line
(?:(lookahead).)* matches any number of characters, checking before each one that a line-initial timestamp does not begin there
((?:(lookahead).)*) just captures the group for you.
It's not that efficient, but it works. Note the s flag for dotall (dot matches newlines) and m flag that lets ^ match at the start of line. ^ is necessary in the lookahead so that you don't stop the match in case someone posts a timestamp, and in the start to make sure you only match dates from the start of a line.
DEMO: https://regex101.com/r/rX8eH0/3
DEMO with flags in regex: https://regex101.com/r/rX8eH0/4
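To illustrate why the ^ inside the lookahead matters, here is a Python sketch where a message itself contains a timestamp. The input is a made-up variation of the question's data, and re.S and re.M stand in for the s and m flags.

```python
import re

# Hypothetical input: the first message mentions a date and time mid-line.
log = ("14-08-2015 14:18 : Example : See you on 15-08-2015 14:00 maybe\n"
       "14-08-2015 14:19: Example2 : I'm fine thanks!")

pattern = re.compile(
    r"^(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):\s?"
    # Stop only when a timestamp begins at the start of a line.
    r"((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)",
    re.S | re.M,  # dot matches newlines, ^ matches at line starts
)

entries = [tuple(part.strip() for part in m.groups())
           for m in pattern.finditer(log)]

for entry in entries:
    print(entry)
```

The timestamp in the middle of the first message does not end the match, because the lookahead only fires at the beginning of a line.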
I've been cutting my teeth on regex over the past couple of days, and have encountered an issue I can't seem to get past.
Let's assume the following 3 string values:
AKA NAME:FOO
FOO
AKA NAME:
My goal is to capture the value of the string after AKA NAME: in a named match group, and if AKA NAME: is not present, capture the entire string in the match group. If "AKA NAME:" IS present with no subsequent value, the regex expression should fail. I have developed the following expression
^(?:AKA NAME:)?\s*(?<VALUE>(.|\n|\r){1,225})$
This will correctly capture the word "FOO" in the first 2 strings above, however, in the third it captures "AKA NAME:" in the match group. I figured putting ? after the non-capture group containing "AKA NAME:" would have caused the engine to skip this value, but it is not.
Can someone give me some guidance?
You can try with:
(?:AKA NAME:)*(.+)*
and check if $1 exists.
Use a lookbehind assertion for "AKA NAME:", plus an alternative with a negative lookahead for strings that do not start with it:
Edited:
(?<=AKA NAME:)\s?(\w+)|(?!AKA NAME:)^(\w+)
I think you can use this regex:
"^(AKA NAME\s*:)?\s*(.*)$"gm
and get \2 for your result.
^(AKA NAME:)?(.*)$
\2 should contain what you're looking for.
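Here is a minimal Python sketch of the last pattern, combined with the "check whether the group has content" idea from the first answer. The helper name extract_value is hypothetical, and treating an empty group 2 as a failure is my assumption about how to meet the "should fail" requirement.

```python
import re

pattern = re.compile(r"^(AKA NAME:)?(.*)$")

def extract_value(text):
    """Return the value, or None when nothing follows the optional prefix."""
    m = pattern.match(text)
    value = m.group(2).strip() if m else ""
    return value or None

for sample in ["AKA NAME:FOO", "FOO", "AKA NAME:"]:
    print(repr(sample), "->", extract_value(sample))
```

The optional group is greedy, so for "AKA NAME:" it takes the prefix and leaves group 2 empty; the emptiness check then rejects that input instead of the regex itself.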