How to skip phrase only if it exists - regex

I've been cutting my teeth on regex over the past couple of days, and have encountered an issue I can't seem to get past.
Let's assume the following 3 string values:
AKA NAME:FOO
FOO
AKA NAME:
My goal is to capture the value after AKA NAME: in a named match group and, if AKA NAME: is not present, to capture the entire string in that group. If "AKA NAME:" is present with no subsequent value, the expression should fail to match. I have developed the following expression:
^(?:AKA NAME:)?\s*(?<VALUE>(.|\n|\r){1,225})$
This correctly captures the word "FOO" in the first 2 strings above; in the third, however, it captures "AKA NAME:" in the match group. I figured that putting ? after the non-capturing group containing "AKA NAME:" would cause the engine to skip this value, but it does not.
Can someone give me some guidance?

You can try with:
(?:AKA NAME:)*(.+)*
and check whether $1 exists.

Use a lookbehind assertion for "AKA NAME:", plus a negative lookahead for strings that do not start with it:
Edited:
(?<=AKA NAME:)\s?(\w+)|(?!AKA NAME:)^(\w+)

I think you can use this regex:
"^(AKA NAME\s*:)?\s*(.*)$"gm
and get \2 for your result.

^(AKA NAME:)?(.*)$
\2 should contain what you're looking for.
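The optional group in the question's pattern is allowed to match nothing, which is why the third string ends up in the capture. One possible way to make that case fail, shown here as a minimal Python sketch rather than a definitive fix, is a negative lookahead that rejects a string consisting only of the prefix. The function name extract_value is made up for illustration, and Python writes the named group as (?P<VALUE>...) instead of (?<VALUE>...).

```python
import re

# The negative lookahead rejects a string that is only the prefix
# (optionally followed by whitespace).
# [\s\S] stands in for (.|\n|\r) from the question; the 225 limit is kept.
PATTERN = re.compile(
    r"^(?!AKA NAME:\s*$)(?:AKA NAME:)?\s*(?P<VALUE>[\s\S]{1,225})$"
)

def extract_value(text):
    """Return the captured value, or None when the pattern does not match."""
    match = PATTERN.match(text)
    return match.group("VALUE") if match else None

for sample in ["AKA NAME:FOO", "FOO", "AKA NAME:"]:
    print(repr(sample), "->", extract_value(sample))
```

With the three strings from the question, the first two yield FOO and the third yields None.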

REGEX - Capture everything except the phrase that starts with a "["

I have been trying for 2 days to write a regex that captures some information from my postmaster digest.
Example:
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.frAction: failedStatus: 5.2.2Diagnostic-Code: smtp;554 5.2.2 mailbox full;
I want to capture field names like these:
Final-Recipient:
Action:
Status:
Diagnostic-Code:
Remote-MTA:
BUT I don't want to capture
[Stage]:
I wrote a regex that works perfectly fine for capturing them:
([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
But sadly I don't know how to tell my regex NOT to capture phrases that start with a "[".
I tried this:
[^\[]([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
This avoids capturing "[Stage:" but captures one extra character before each of the other matches.
Does anyone know how to capture my postmaster errors?
Thanks in advance.
(NB: Edited; I removed "failedStatus:" and replaced it with "Status: ")
Add (?<!(\[)) before your first regex. The final result will be what you want.
Complete answer:
(?<!(\[))([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
Explanation:
You want to make sure there is no [ before your phrase. In regex a literal [ is written \[, and "not preceded by" is expressed with a negative lookbehind: (?<=...) is a lookbehind, and replacing = with ! negates it.
So what you need is (?<!(\[)) (the inner parentheses are optional).
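To see what the negative lookbehind changes, here is a small Python sketch run against the sample line from the question. It is only an illustration under a few assumptions: the optional prefix is written as a non-capturing group so that re.findall returns whole matches, the unneeded escapes and {1} are dropped, and the variable names are invented for this example.

```python
import re

# Sample line from the question.
digest = (
    "0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]"
    "Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr"
    "Action: failedStatus: 5.2.2"
    "Diagnostic-Code: smtp;554 5.2.2 mailbox full;"
)

# Field-name pattern from the question (non-capturing prefix, trailing space).
field = r"(?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: "

# Without the lookbehind, "Stage: " is matched as well.
without_lookbehind = re.findall(field, digest)

# With (?<!\[), a match may not start right after a "[".
with_lookbehind = re.findall(r"(?<!\[)" + field, digest)

print(without_lookbehind)
print(with_lookbehind)
```

Because a lookbehind consumes no characters, no extra character is pulled into each match, unlike the [^\[] attempt in the question.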
Using sed, you can use one capture group for the preceding character, which matches any character except [, and another group for the whole field name, including the optional capture group inside it.
Use those in the replacement with a newline between group 1 and group 2: \1\n\2
Note that your pattern would not match failedStatus: as it does not start with a capital letter.
Also, you can omit the quantifier {1}, as 1 is the default, and you don't have to escape the hyphen, the colon or the space.
sed -E 's/([^\[])(([A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )/\1\n\2/g' File.eml
Output
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]
Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr
Action: failed
Status: 5.2.2
My bad! I made a mistake in my original question!
I want to capture these fields:
Final-Recipient:
Action:
Status:
Diagnostic-Code:
Remote-MTA:
But not this one:
[Stage: ...
So the regex from ghazal khaki is correct and works fine!
Again thanks for your support guys!

Regex to find after particular word inside a string

I am using regex to find a few keywords after a colon (:), and the best I have reached so far is:
sample test case
test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}
Now I have to find the value of a keyword inside the () brackets. It works for 'aaa', but when I use test it fails: it matches everything up to the last quoted word in the string.
My regex so far:
\btest(.*\w") (failed case) expected "aff" returned "aff", aaa: "aa1"
\baaa(.*\w") (pass case) returned "aa1"
Please let me know if more information is needed.
You may try
:\s*"(.*?)"
And the data you need is in the first capturing group.
Explanation
:\s*"(.*?)"
: colon
\s* followed by optionally any number of spaces
" followed by quote
( ) capturing group, containing...
.*? any number of characters, matching as few as possible
" followed by quote
Demo:
https://regex101.com/r/WnvzdG/1
Update:
If you want to match ONLY after specific keywords, followed by colon, you can do something like:
(KEYWORD1|KEYWORD2|KEYWORD3)\s*:\s*"(.*?)"
First capture group will be the keyword matched, second capture group will be the value.
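As a rough illustration of the keyword form, the Python sketch below fills in the two keywords from the question (test and aaa) and collects the results in a dictionary. The variable names are invented for this example, and the leading \b is an addition, not part of the pattern above, so that a keyword is not matched inside a longer word.

```python
import re

# Sample from the question, flattened onto one line.
text = 'test { test1 { sadffd(test: "aff", aaa: "aa1") {} } }'

# Keywords of interest; re.escape guards against regex metacharacters.
keywords = ["test", "aaa"]
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, keywords)) + r')\s*:\s*"(.*?)"'
)

# findall returns (keyword, value) pairs, which dict() turns into a mapping.
values = dict(pattern.findall(text))
print(values)
```

The lazy .*? stops at the first closing quote, which is what keeps the value for test from running on into aaa: "aa1".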
One more approach (executed in Python)
import re

items = ['test{test1 {sadffd(test: "aff", aaa: "aa1") {}}}']
for item in items:
    # every quoted word
    print(re.findall(r'"(\w+)"', item))
    # only quoted words that follow a colon and a space
    print(re.findall(r'(?<=: )"(\w+)"', item))
Output
['aff', 'aa1']
['aff', 'aa1']
I believe a simple regex would work to get everything inside the double quotes in your case:
("\w+")
Note that your question above says you want to capture "aff" and not just aff so I've included the surrounding quotes within the capturing group.
It's pretty crude but this should be OK for the input you've presented. (It wouldn't handle things like an escaped double quote in the string, for example).

Regular Expression issue with specific string

I have to parse an inconsistent string, and these are the formats it can take:
1SURNAME/NAMEMR (The last two or three chars are MR/MRS/MS/DR)
1SURNAME/NAME MR
or
1SURNAME/NAME
I need to catch this sequence using Regular Expression and I have built this one:
1[A-Z]*\/[A-Z]*[\s]?[[MRS|MR|MS|DR]+
but for this name it works only for:
1SMITH/GEORGEMR
1SMITH/GEORGE MR
but not for 1SMITH/GEORGE
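Does anyone know what is going wrong here?
Put the last part into a non-capturing group and make it optional by adding a ? quantifier after that group. In your pattern, [[MRS|MR|MS|DR]+ is a character class, and + requires at least one character from it, so a name with no title cannot match in full.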
\b1[A-Z]*\/[A-Z]*\s?(?:MRS|MR|MS|DR)?\b
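A quick way to check the pattern against all three formats from the question is a short Python sketch like the one below. The variable names are illustrative, and the / is left unescaped because Python does not need the backslash there.

```python
import re

# Optional title as a non-capturing group followed by ?.
pattern = re.compile(r"\b1[A-Z]*/[A-Z]*\s?(?:MRS|MR|MS|DR)?\b")

samples = ["1SMITH/GEORGEMR", "1SMITH/GEORGE MR", "1SMITH/GEORGE"]

# fullmatch requires the whole string to match, not just a part of it.
results = {name: bool(pattern.fullmatch(name)) for name in samples}
for name, ok in results.items():
    print(name, "->", ok)
```

Note that in 1SMITH/GEORGEMR the greedy [A-Z]* absorbs the title along with the name, so this pattern validates the string but does not separate the two.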

Find Regex for states in US having this pattern

Regex for US state
I want to retrieve the state from a string. My strings come in two types:
US-VA-Arlington
VA-Arlington
From both of these I want to get the state (VA) every time.
Please send suggestions.
Thanks,
Girish
Try the following regex:
([^-]*)-[^-]*$
Required state will be captured in \1
Try the following regex:
([A-Z]+)-\w+$
Use this regexp:
^(?:[A-Z]{2}-)?([A-Z]{2})-
The first optional group will match the country code if it exists; but it's a non-capturing group. The second group matches the state code. The state will be in capture group 1.
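As a minimal sketch of how this pattern behaves on both input types, here is a Python version. The helper name state_of is hypothetical, and returning None for a non-matching string is a choice made for this example.

```python
import re

# Optional two-letter country code (non-capturing), then the state code.
state_pattern = re.compile(r"^(?:[A-Z]{2}-)?([A-Z]{2})-")

def state_of(location):
    """Return the two-letter state code, or None if the format differs."""
    match = state_pattern.match(location)
    return match.group(1) if match else None

for location in ["US-VA-Arlington", "VA-Arlington"]:
    print(location, "->", state_of(location))
```

For VA-Arlington the optional group first tries to take VA- as the country code, fails to find a second two-letter code after it, and backtracks to match nothing, which leaves VA for the capture group.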
(US\-)?(\w\w)\-(\w+)
The first group collects 0 or 1 instances of US-
The second group collects the state abbreviation
The third group collects the city name - you may have to modify this regex to accept spaces (as others pointed out)

Find first point with regex

I want a regex that returns only the characters before the first point (.).
Example:
T420_02.DOMAIN.LOCAL
I want only T420_02
Please help me.
You can use the following regex: ^(.*?)(?=\.)
The captured group contains what you need (T420_02 in your example).
This simple expression should do what you need, assuming you want to match it at the beginning of the string:
^(.+?)\.
The capture group contains the string before (but not including) the first dot (.).
Here's a fiddle: http://www.rexfiddle.net/s8l0bn3
Use regex pattern ^[^.]+(?=[.])
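For reference, this is roughly how the last pattern could be applied in Python, next to a plain string method that gives the same result for this input. The variable names are invented for the example, and re.match already anchors at the start of the string, so the leading ^ is omitted.

```python
import re

hostname = "T420_02.DOMAIN.LOCAL"

# One or more non-dot characters, followed by a dot that is not consumed.
match = re.match(r"[^.]+(?=[.])", hostname)
first_label = match.group(0) if match else None

# Without a regex: everything before the first dot.
without_regex = hostname.partition(".")[0]

print(first_label)
print(without_regex)
```

The two differ when the string has no dot at all: the regex finds no match, while partition returns the whole string.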