Regex - Matching characters then capturing


This is my first post on StackOverflow and regex is new to me, so please bear with me.
I am attempting to capture fields from a PowerShell command event log.
I have text in the following format:
(Get-AdUser): name="Identity"; value="Username"
I want to capture the string inside the parentheses (Get-AdUser) and also the value field ("Username").
If possible a final output of
Get-AdUser Username
would be perfect.
The gotcha is that I want to capture any value inside the parentheses except the word "Out-Default". Out-Default is the output of a command, rather than the command itself.
So far I have:
\((?!Out-Default)([^)]+)\)
which matches anything inside the parentheses except "Out-Default".
I'm not sure how to approach it from here. Any advice is appreciated.
Update - is it possible to use only 1 capture group to capture:
(Get-AdUser): name="Identity"; value="Username"
and have the result look like
Get-AdUser name=Identity value=Username
?

Hope this works:
\((?!Out-Default)([^)]+)\).*?value="([^"]+)"
Explanation:
\: Escapes a special character
( … ): Capturing group
(?!…): Negative lookahead
[^x]: One character that is not x
+: One or more
.: Any character except a line break
*: Zero or more times
?: Once or none; placed after another quantifier (as in .*?) it makes that quantifier lazy
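
As a rough illustration, the pattern above could be applied in Python like this to build the "Get-AdUser Username" output. The summarize helper and the Out-Default sample line are made up for this sketch; only the first sample line comes from the question.

```python
import re

# Pattern from the answer: group 1 is the command, group 2 is the value.
pattern = re.compile(r'\((?!Out-Default)([^)]+)\).*?value="([^"]+)"')

lines = [
    '(Get-AdUser): name="Identity"; value="Username"',
    # Hypothetical Out-Default entry, added only to show the exclusion.
    '(Out-Default): name="InputObject"; value="Whatever"',
]

def summarize(line):
    """Return 'Command Value', or None when the line is skipped."""
    m = pattern.search(line)
    if m is None:
        return None
    return f"{m.group(1)} {m.group(2)}"

results = [summarize(line) for line in lines]
print(results)
```

The lookahead sits right after the opening parenthesis, so it rejects any command whose name starts with Out-Default, not only the exact word.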


REGEX - Capture everything except the sentences that start with a "["

I have been trying for 2 days to write a regex that captures some information from my postmaster digest.
Example:
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.frAction: failedStatus: 5.2.2Diagnostic-Code: smtp;554 5.2.2 mailbox full;
I want to capture sentences like these:
Final-Recipient:
Action:
Status:
Diagnostic-Code:
Remote-MTA:
BUT I don't want to capture
[Stage:
I wrote a regex that works perfectly fine for capturing them (it ends with an escaped space):
([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
But sadly I don't know how to tell my regex NOT to capture sentences that start with a "[".
I tried this:
[^\[]([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
This avoids capturing "[Stage:" but it also captures one character before each of the other captured sentences.
Does anyone know how to capture my postmaster errors?
Thanks in advance.
(NB: Edited. I removed "failedStatus:" and replaced it with "Status: ")

Add (?<!(\[)) before your first regex. The final result will be what you want.
Complete answer:
(?<!(\[))([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
Explanation:
You want to make sure there is no [ immediately before your phrase. A literal [ is written \[ in regex, and "must not be preceded by" is a negative lookbehind, (?<! … ), where ?< marks a lookbehind and ! negates it. The inner parentheses around \[ are optional.
So what you need is (?<!(\[))
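
To see the lookbehind at work, here is a small Python sketch run against the digest line from the question. It uses a simplified spelling of the same pattern (no redundant escapes, a non-capturing group so that findall returns whole matches); that simplification is an assumption of this example, not part of the answer above.

```python
import re

digest = (
    "0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]"
    "Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr"
    "Action: failedStatus: 5.2.2"
    "Diagnostic-Code: smtp;554 5.2.2 mailbox full;"
)

# (?<!\[) rejects a field name that is immediately preceded by "[".
field = re.compile(r"(?<!\[)(?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: ")

fields = field.findall(digest)
print(fields)
```

Unlike the [^\[] attempt, the lookbehind consumes no character, so nothing extra ends up in front of each match.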

Using sed, you can use one capture group for the first part, which matches any single character except [, and another group for the whole remaining part, including the optional group inside it.
Use those in the replacement with a newline between group 1 and group 2: \1\n\2
Note that your pattern would not match failedStatus: as it does not start with a capital letter.
You can also omit the quantifier {1}, as 1 is the default, and you don't have to escape \- and \: and the space.
sed -E 's/([^\[])(([A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )/\1\n\2/g' File.eml
Output
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]
Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr
Action: failed
Status: 5.2.2
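
For readers without sed, roughly the same substitution can be sketched in Python with re.sub. This is a stand-in for the sed command, not a transcription of it, and the digest string is copied from the question.

```python
import re

digest = (
    "0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]"
    "Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr"
    "Action: failedStatus: 5.2.2"
    "Diagnostic-Code: smtp;554 5.2.2 mailbox full;"
)

# Group 1: one character that is not "[".
# Group 2: the field name, its colon and the following space.
pattern = re.compile(r"([^\[])((?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )")

# Put a newline between the two groups, as the sed replacement does.
split_digest = pattern.sub(r"\1\n\2", digest)
lines = split_digest.splitlines()
print(split_digest)
```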

My bad! I made a mistake in my original question!
I want to capture these fields:
Final-Recipient:
-Action:
-Status:
-Diagnostic-Code:
Remote-MTA:
But not this one:
-[Stage: ...
So the regex from ghazal khaki is correct and works fine!
Again, thanks for your support, guys!

Regex to find the value after a particular word inside a string

I am using regex to find the values that follow a few keywords and a colon (:). This is the best I have reached so far.
Sample test case:
test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}
Now I have to find a keyword inside the () brackets. It works for 'aaa', but when I search for 'test' it fails: it matches the rest of the string as well.
My regex so far:
\btest(.*\w") (failed case) expected "aff", returned "aff", aaa: "aa1"
\baaa(.*\w") (pass case) returned "aa1"
Please let me know if more information is needed.

You may try
:\s*"(.*?)"
And the data you need is in the first capturing group.
Explanation
:\s*"(.*?)"
: colon
\s* followed by optionally any number of spaces
" followed by quote
( ) capturing group, containing...
.*? any number of character, matching as few as possible
" followed by quote
Demo:
https://regex101.com/r/WnvzdG/1
Update:
If you want to match ONLY after specific keywords, followed by colon, you can do something like:
(KEYWORD1|KEYWORD2|KEYWORD3)\s*:\s*"(.*?)"
First capture group will be the keyword matched, second capture group will be the value.
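
A minimal Python sketch of the keyword variant, using the two keywords from the question in place of the KEYWORD1|KEYWORD2 placeholders. Collecting the pairs into a dict is just one convenient choice for this example.

```python
import re

text = '''test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}'''

# Group 1: the keyword. Group 2: the quoted value after the colon.
pattern = re.compile(r'(test|aaa)\s*:\s*"(.*?)"')

pairs = pattern.findall(text)
values = dict(pairs)
print(values)
```

The bare "test {" and "test1 {" lines are skipped because no colon follows the keyword there.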

One more approach (executed in Python):
import re
items = ['test{test1 {sadffd(test: "aff", aaa: "aa1") {}}}']
for item in items:
    # every double-quoted word
    print(re.findall(r'"(\w+)"', item))
    # only quoted words preceded by a colon and a space
    print(re.findall(r'(?<=: )"(\w+)"', item))
Output
['aff', 'aa1']
['aff', 'aa1']

I believe a simple regex would work to get everything inside the double quotes in your case:
("\w+")
Note that your question above says you want to capture "aff" and not just aff so I've included the surrounding quotes within the capturing group.
It's pretty crude but this should be OK for the input you've presented. (It wouldn't handle things like an escaped double quote in the string, for example).

How to use regex in Notepad++ to search for emails with specific domains

I am trying to use Notepad++ to delete emails that end in #domain2.serverdata.net
Here is an example string:
smtp:name#domain1.com;SMTP:name#domain2.com;smtp:name#domain2.serverdata.net;smtp:name#domain3.com;smtp:name_e4d1fe3d-e985-40d0-bc65-32c57c9b14d1#domain2.serverdata.net
I was hoping to use:
;smtp:.*#domain2.serverdata.net
but it captures SMTP:name#domain2.com as well

Ctrl+H
Find what: (?:\A|;)smtp:[^#]*#domain2\.serverdata\.net
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
(?:\A|;)                    # non-capturing group: beginning of file or a semicolon;
                            # this also deletes the first email of the file,
                            # which has no semicolon before it
smtp:                       # literally
[^#]*                       # 0 or more characters that are not #
#domain2\.serverdata\.net   # literally

Try Regex: ;?smtp:[\w.-]+?#domain2\.serverdata\.net

Regex quantifiers are greedy by default: they match as much as possible. For instance, START.*STOP applied to the following text:
STARTsghlegdSTOPfsgikbSTARTsvdinusSTOPwegtgw
will match this part:
STARTsghlegdSTOPfsgikbSTARTsvdinusSTOPwegtgw
^------------------------------------^
In your case, the .* captures everything up to the last instance of #domain2.serverdata.net. You don't want to use . (any character), you want to use "any character except '#'" which is written like this: [^#].
So your full regex would be smtp:[^#]*#domain2\.serverdata\.net. I also dropped the initial ; since it would prevent you from capturing the first mail address.
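
The difference can be checked quickly in Python. This is only a sketch: Python's re module is case-sensitive by default, so it stands in for, rather than reproduces, Notepad++'s search behaviour. The lazy .*? variant is shown for comparison and is not part of the answer above.

```python
import re

text = "STARTsghlegdSTOPfsgikbSTARTsvdinusSTOPwegtgw"

# Greedy: .* runs to the LAST STOP.
greedy = re.findall(r"START.*STOP", text)
# Lazy: .*? stops at the first STOP after each START.
lazy = re.findall(r"START.*?STOP", text)

emails = (
    "smtp:name#domain1.com;SMTP:name#domain2.com;"
    "smtp:name#domain2.serverdata.net;smtp:name#domain3.com;"
    "smtp:name_e4d1fe3d-e985-40d0-bc65-32c57c9b14d1#domain2.serverdata.net"
)

# [^#]* cannot cross a "#", so each match stays inside one address.
targets = re.findall(r"smtp:[^#]*#domain2\.serverdata\.net", emails)

print(greedy, lazy, targets, sep="\n")
```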

Try this one:
smtp:[^#]+#domain2\.serverdata\.net(;)?

Get all the characters until a new date/hour is found

I have to parse a lot of content with a regular expression.
The content might, for example, be:
14-08-2015 14:18 : Example : Hello =) How are you?
What are you doing?
14-08-2015 14:19: Example2 : I'm fine thanks!
I have this regular expression that of course returns 2 matches, with the groups that I need (date, hour, name, multi-line message):
(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):([^\d]+)
The problem is that if a number is written inside the message this will not be OK, because the regex will stop getting more characters.
For example in this case this will not work:
14-08-2015 14:18 : Example : Hello =) How are you?
What are you 2 doing?
14-08-2015 14:19: Example2 : I'm fine thanks!
How do I get all the characters until a new date/hour is found?

The problem is with your final capturing group ([^\d]+), which stops at the first digit.
Instead you can use ((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)
The outer parentheses, ( … ), form a capturing group.
The next set of parentheses, (?: … )+, is a non-capturing group that we want to match 1 to infinitely many times.
Inside it we have a negative lookahead, (?!\d{2}-\d{2}-\d{4}). This says that the character we are about to match cannot be the start of a date.
What we actually match, [\s\S], is any character, including a newline.
The entire regex that works looks like this:
(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)
https://regex101.com/r/wH5xR2/2
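
A short Python check of the full regex against the failing example from the question (the message that contains a digit). Stripping the captured name and message is a choice made for this sketch only.

```python
import re

log = (
    "14-08-2015 14:18 : Example : Hello =) How are you?\n"
    "What are you 2 doing?\n"
    "14-08-2015 14:19: Example2 : I'm fine thanks!"
)

pattern = re.compile(
    r"(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):"
    # Consume one character at a time, as long as no date starts here.
    r"((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)"
)

entries = [
    (date, hour, name.strip(), message.strip())
    for date, hour, name, message in pattern.findall(log)
]
for entry in entries:
    print(entry)
```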

Use a lookahead for dates and get everything up to that.
/^(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):\s?((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)/sm
I've edited your regex in two ways:
Added ^ to the front, ensuring a match only starts at a timestamp at the beginning of a line, which should filter out most issues with people posting timestamps inside messages
Replaced the last capturing group with ((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)
(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}) is a negative lookahead for a date and time at the start of a line
(?:(lookahead).)* matches any number of characters, none of which begins a timestamp anchored to the start of a line.
((?:(lookahead).)*) just captures that for you.
It's not that efficient, but it works. Note the s flag for dotall (dot matches newlines) and the m flag that lets ^ match at the start of a line. The ^ is necessary in the lookahead so that the match doesn't stop when someone posts a timestamp inside a message, and at the start to make sure you only match dates at the start of a line.
DEMO: https://regex101.com/r/rX8eH0/3
DEMO with flags in regex: https://regex101.com/r/rX8eH0/4
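
In Python the same regex could be tried with re.S and re.M standing in for the s and m flags. The sample log below is invented for this sketch: its first message mentions a timestamp in the middle of a line, which is the case the anchored lookahead is meant to survive.

```python
import re

# Hypothetical log; the first message contains a mid-line timestamp.
log = (
    "14-08-2015 14:18 : Example : See you on 15-08-2015 10:00 ok?\n"
    "Bye\n"
    "14-08-2015 14:19: Example2 : Fine!"
)

pattern = re.compile(
    r"^(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):\s?"
    # Stop only where a timestamp begins at the start of a line.
    r"((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)",
    re.S | re.M,  # dot matches newlines; ^ matches at each line start
)

messages = [m.group(4).strip() for m in pattern.finditer(log)]
print(messages)
```

Without the ^ inside the lookahead, the first message would be cut off just before 15-08-2015.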

Using RegEx with something of the format "xxx:abc" to match just "abc"?

I've not done much RegEx, so I'm having issues coming up with a good RegEx for this script.
Here are some example inputs:
document:ASoi4jgt0w9efcZXNDOFzsdpfoasdf-zGRnae4iwn2, file:90jfa9_189204hsfiansdIASDNF, pdf:a09srjbZXMgf9oe90rfmasasgjm4-ab, spreadsheet:ASr0gk0jsdfPAsdfn
And here's what I'd want to match on each of those examples:
ASoi4jgt0w9efcZXNDOFzsdpfoasdf-zGRnae4iwn2, 90jfa9_189204hsfiansdIASDNF, a09srjbZXMgf9oe90rfmasasgjm4-ab, ASr0gk0jsdfPAsdfn
What would be the best and perhaps simplest RegEx to use for this? Thanks!

.*:(.*) should get you everything after the last colon in the string as the value of the first group (or second group if you count the 'match everything' group).
An alternative would be [^:]*$ which gets you all the characters at the end of the string that come after the last colon.
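
Both suggestions can be compared in Python on a single input. This sketch assumes each example is passed to the regex on its own rather than as one comma-separated string; the "a:b:c" input is made up to show the "last colon" behaviour.

```python
import re

value = "document:ASoi4jgt0w9efcZXNDOFzsdpfoasdf-zGRnae4iwn2"

# Greedy .* runs to the last colon; group 1 is what follows it.
after_last_colon = re.match(r".*:(.*)", value).group(1)

# [^:]*$ matches the trailing run of non-colon characters.
tail = re.search(r"[^:]*$", value).group(0)

# Hypothetical input with two colons.
nested_tail = re.search(r"[^:]*$", "a:b:c").group(0)

print(after_last_colon, tail, nested_tail)
```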

Use something like below:
([^:]*)(,|$)
and get the first group. You can use a non-capturing group (?:ABC) if needed for the last. Also this makes the assumption that the value itself can have , as one of the characters.
I don't think answers like (.*)\:(.*) would work. It will match entire string.

(.*)\:(.*)
And take the second capture group...

The simplest seems to be [^:]*:([^,]*)(?:,|$).
That is: find anything (possibly nothing) up to a colon, then the colon, then anything not including a comma (this is the part that is captured), up to a comma or the end of the line.
Note the use of a non-capturing group at the end to encapsulate the alternation. The only capturing group appearing is the one which you wish to use.
So in Python:
import re
exampStr = "document:ASoi4jgt0w9efcZXNDOFzsdpfoasdf-zGRnae4iwn2, file:90jfa9_189204hsfiansdIASDNF, pdf:a09srjbZXMgf9oe90rfmasasgjm4-ab, spreadsheet:ASr0gk0jsdfPAsdfn"
regex = re.compile(r"[^:]*:([^,]*)(?:,|$)")
result = regex.findall(exampStr)
print(result)
#
# Result:
#
# ['ASoi4jgt0w9efcZXNDOFzsdpfoasdf-zGRnae4iwn2', '90jfa9_189204hsfiansdIASDNF', 'a09srjbZXMgf9oe90rfmasasgjm4-ab', 'ASr0gk0jsdfPAsdfn']
A good introduction is at: http://www.regular-expressions.info/tutorial.html .
