I've been cutting my teeth on regex over the past couple of days, and have encountered an issue I can't seem to get past.
Let's assume the following 3 string values:
AKA NAME:FOO
FOO
AKA NAME:
My goal is to capture the value of the string after AKA NAME: in a named match group, and if AKA NAME: is not present, capture the entire string in the match group. If "AKA NAME:" IS present with no subsequent value, the regex should fail. I have developed the following expression:
^(?:AKA NAME:)?\s*(?<VALUE>(.|\n|\r){1,225})$
This correctly captures the word "FOO" in the first 2 strings above; however, in the third it captures "AKA NAME:" in the match group. I figured putting ? after the non-capturing group containing "AKA NAME:" would cause the engine to skip this value, but it does not.
Can someone give me some guidance?
You can try with:
(?:AKA NAME:)*(.+)*
and check if $1 exists.
Use a lookbehind assertion for the prefixed case, and a negative lookahead that excludes "AKA NAME:" for the unprefixed case:
EDITED:
(?<=AKA NAME:)\s?(\w+)|(?!AKA NAME:)^(\w+)
I think you can use this regex:
"^(AKA NAME\s*:)?\s*(.*)$"gm
and get \2 for your result.
^(AKA NAME:)?(.*)$
\2 should contain what you're looking for.
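The behaviour described in the question comes from backtracking: the value group needs at least one character, so for the third string the engine gives up the optional (?:AKA NAME:)? group and lets the value group consume "AKA NAME:" itself. A minimal sketch of one way to prevent that, assuming Python's re module (the question does not name an engine, and Python spells named groups (?P<VALUE>...)), is to add a negative lookahead so the value can never start with the prefix. The helper name extract_value is hypothetical.

```python
import re

# Assumption: Python's re module; named groups are written (?P<name>...).
PATTERN = re.compile(
    r"^(?:AKA NAME:)?\s*"        # optional prefix, then optional whitespace
    r"(?P<VALUE>(?!AKA NAME:)"   # the value must not itself start with the prefix
    r"[\s\S]{1,225})$"           # 1 to 225 characters, newlines included
)

def extract_value(text):
    """Return the captured value, or None when the pattern does not match."""
    match = PATTERN.match(text)
    return match.group("VALUE") if match else None

for sample in ["AKA NAME:FOO", "FOO", "AKA NAME:"]:
    print(repr(sample), "->", extract_value(sample))
```

With the lookahead in place, skipping the optional prefix no longer helps the engine, so the bare "AKA NAME:" string produces no match at all.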
I have been trying for two days to write a regex that captures some information from my postmaster digest.
Example:
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.frAction: failedStatus: 5.2.2Diagnostic-Code: smtp;554 5.2.2 mailbox full;
I want to capture field names like these:
Final-Recipient:
Action:
Status:
Diagnostic-Code:
Remote-MTA:
BUT I don't want to capture
[Stage]:
I wrote a regex that works perfectly fine for capturing them:
([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
But sadly I don't know how to tell my regex NOT to capture field names that start with a "[".
I tried this:
[^\[]([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
This avoids capturing "[Stage:" but also captures one extra character before each of the other matches.
Does anyone know how to capture my postmaster errors?
Thanks in advance.
(NB: Edited. I removed "failedStatus:" and replaced it with "Status: ")
Add (?<!(\[)) before your first regex. The final result will be what you want.
Complete answer:
(?<!(\[))([A-Z]{1}[a-z]+\-)?[A-Z]{1,3}[a-z]*\:\
Explanation:
You want to make sure there is no [ immediately before your phrase. In a regex a literal [ is written \[, and "not preceded by" is a negative lookbehind: (?<=...) is a lookbehind, and replacing the = with ! negates it.
So what you need is (?<!(\[)).
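The lookbehind version can be checked against the sample line from the question. This is a minimal sketch in Python (an assumption, since the question does not name an engine), with the pattern simplified by dropping {1} and the unnecessary backslashes and by making the optional group non-capturing.

```python
import re

# Sample line from the question, split only for readability.
digest = (
    "0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]"
    "Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr"
    "Action: failedStatus: 5.2.2"
    "Diagnostic-Code: smtp;554 5.2.2 mailbox full;"
)

# (?<!\[) rejects any match that starts right after a "[".
# Unlike [^\[], it checks the previous character without consuming it.
FIELD = re.compile(r"(?<!\[)(?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: ")

fields = [m.group(0) for m in FIELD.finditer(digest)]
print(fields)
```

Because the lookbehind is zero-width, the matched text starts at the field name itself, which avoids the extra leading character that the [^\[] attempt picked up.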
Using sed, you can use one capture group for the first part, which matches any single character except [, and another group for the whole last part, including the optional group inside it.
Use those in the replacement with a newline between group 1 and group 2: \1\n\2
Note that your pattern would not match failedStatus: as a whole, because it does not start with a capital letter (only the Status: part matches).
Also, you can omit the {1} quantifier, as 1 is the default, and you don't need the backslashes before -, : and the space.
sed -E 's/([^\[])(([A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )/\1\n\2/g' File.eml
Output
0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]
Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr
Action: failed
Status: 5.2.2
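For readers without sed, the same substitution can be sketched with Python's re.sub. This is an illustrative port under the assumption that the digest is available as a single string; the variable names are hypothetical.

```python
import re

# Sample line from the question, split only for readability.
line = (
    "0.32768:0A006832, 4.33024:DD040000 [Stage: CreateMessage]"
    "Final-Recipient: rfc822;tXXXXXXXions.croXXXXXy#cXXXXXXXtique.fr"
    "Action: failedStatus: 5.2.2"
    "Diagnostic-Code: smtp;554 5.2.2 mailbox full;"
)

# Group 1: one character that is not "[".
# Group 2: the field name, its colon and the following space.
broken_up = re.sub(
    r"([^\[])((?:[A-Z][a-z]+-)?[A-Z]{1,3}[a-z]*: )",
    r"\1\n\2",   # put a newline between the two groups
    line,
)

rows = broken_up.split("\n")
print(broken_up)
```

Each field then starts its own row, while "[Stage: " stays on the first row because the character before "Stage" is a "[".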
My bad! I made a mistake in my original question!
I want to capture these fields:
Final-Recipient:
-Action:
-Status:
-Diagnostic-Code:
Remote-MTA:
But not this ONE :
-[Stage: ...
So the regex from ghazal khaki is correct and works fine!
Again thanks for your support guys!
I am using regex to find the values of a few keywords that are followed by a colon (:). Here is what I have so far.
Sample test case:
test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}
Now I have to find the value of a keyword inside the () brackets. It works for 'aaa', but when I search for 'test' it fails: the match runs on through the rest of the argument list.
My regex so far:
\btest(.*\w") (failed case): expected "aff", returned "aff", aaa: "aa1"
\baaa(.*\w") (pass case): returned "aa1"
Please let me know if more information is needed.
You may try
:\s*"(.*?)"
And the data you need is in the first capturing group.
Explanation
:\s*"(.*?)"
: colon
\s* followed by optionally any number of spaces
" followed by quote
( ) capturing group, containing...
.*? any number of character, matching as few as possible
" followed by quote
Demo:
https://regex101.com/r/WnvzdG/1
Update:
If you want to match ONLY after specific keywords, followed by colon, you can do something like:
(KEYWORD1|KEYWORD2|KEYWORD3)\s*:\s*"(.*?)"
First capture group will be the keyword matched, second capture group will be the value.
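The original attempt fails because .* is greedy and runs to the last quote it can find, whereas the lazy .*? stops at the first closing quote. A minimal Python sketch of the keyword-specific pattern follows; the KEYWORDS list is a hypothetical stand-in for KEYWORD1|KEYWORD2|KEYWORD3, and the leading \b is an addition so that a keyword is not matched inside a longer word.

```python
import re

sample = '''test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}'''

# Hypothetical keyword list; replace with the keywords you care about.
KEYWORDS = ["test", "aaa"]

pattern = re.compile(
    r'\b(' + "|".join(map(re.escape, KEYWORDS)) + r')\s*:\s*"(.*?)"'
)

# findall returns (keyword, value) tuples, which dict() turns into a mapping.
values = dict(pattern.findall(sample))
print(values)
```

The bare "test {" and "test1 {" lines are skipped because no colon follows the keyword there.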
One more approach (executed in Python):
import re

items = ['test{test1 {sadffd(test: "aff", aaa: "aa1") {}}}']
for item in items:
    # every double-quoted word
    print(re.findall(r'"(\w+)"', item))
    # only quoted words preceded by a colon and a space
    print(re.findall(r'(?<=: )"(\w+)"', item))
Output
['aff', 'aa1']
['aff', 'aa1']
I believe a simple regex would work to get everything inside the double quotes in your case:
("\w+")
Note that your question above says you want to capture "aff" and not just aff so I've included the surrounding quotes within the capturing group.
It's pretty crude but this should be OK for the input you've presented. (It wouldn't handle things like an escaped double quote in the string, for example).
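If escaped quotes do matter, one common way around that limitation is to allow "backslash plus any character" as a unit between the quotes. The sketch below is in Python and uses a made-up input, since the question's sample contains no escaped quotes.

```python
import re

# Hypothetical input: the first value contains an escaped double quote.
text = r'sadffd(test: "a\"ff", aaa: "aa1") {}'

# Between the quotes, accept either a character that is neither a quote nor a
# backslash, or a backslash followed by any single character.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

values = QUOTED.findall(text)
print(values)
```

The captured values still contain the backslashes; unescaping them is a separate step that depends on the source format.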
I have to parse inconsistently formatted strings. These are the possible formats:
1SURNAME/NAMEMR (The last two or three chars are MR/MRS/MS/DR)
1SURNAME/NAME MR
or
1SURNAME/NAME
I need to match these using a regular expression, and I have built this one:
1[A-Z]*\/[A-Z]*[\s]?[[MRS|MR|MS|DR]+
but it only works for:
1SMITH/GEORGEMR
1SMITH/GEORGE MR
but not for 1SMITH/GEORGE
Does anyone know what is going wrong here?
Put the last part into a non-capturing group and make it optional by adding a ? quantifier after the group.
\b1[A-Z]*\/[A-Z]*\s?(?:MRS|MR|MS|DR)?\b
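As for what went wrong: in most engines [[MRS|MR|MS|DR]+ is a character class (any of [, M, R, S, |, D) repeated one or more times, so at least one of those characters is required after the name, and "GEORGE" ends in E. A minimal Python sketch of a variant that also separates the parts is shown below. The capturing groups, the lazy name group and the helper name parse are assumptions added for illustration, not part of the answer above.

```python
import re

# Surname, given name (lazy), optional space, optional title.
NAME = re.compile(r"1([A-Z]+)/([A-Z]+?)\s?(MRS|MR|MS|DR)?")

def parse(record):
    """Return (surname, name, title); title is None when absent."""
    match = NAME.fullmatch(record)
    return match.groups() if match else None

for record in ["1SMITH/GEORGEMR", "1SMITH/GEORGE MR", "1SMITH/GEORGE"]:
    print(record, "->", parse(record))
```

One caveat: when the title is glued to the name, a given name that itself ends in MR, MS, MRS or DR is ambiguous, and this sketch will treat that ending as a title.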
Regex for US state
I want to retrieve the state from a string. There are two formats.
My strings look like this:
US-VA-Arlington
VA-Arlington
From both of these I want to get the state (VA) every time.
Please send suggestions.
Thanks,
Girish
Try the following regex:
([^-]*)-[^-]*$
The required state will be captured in \1.
Try the following regex:
([A-Z]+)-\w+$
Use this regexp:
^(?:[A-Z]{2}-)?([A-Z]{2})-
The first optional group will match the country code if it exists; but it's a non-capturing group. The second group matches the state code. The state will be in capture group 1.
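A quick way to check this pattern against both formats is a short Python sketch; the helper name state_of is hypothetical.

```python
import re

# Optional two-letter country code, then the two-letter state code in group 1.
STATE = re.compile(r"^(?:[A-Z]{2}-)?([A-Z]{2})-")

def state_of(location):
    """Return the state code, or None if the string has neither format."""
    match = STATE.match(location)
    return match.group(1) if match else None

for location in ["US-VA-Arlington", "VA-Arlington"]:
    print(location, "->", state_of(location))
```

For "VA-Arlington" the engine first tries "VA-" as the country code, fails to find two capitals after it, and backtracks so that "VA" lands in the capture group.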
(US\-)?(\w\w)\-(\w+)
The first group collects 0 or 1 instances of US-
The second group collects the state abbreviation
The third group collects the city name - you may have to modify this regex to accept spaces (as others pointed out)
I want a regex that returns only the characters before the first dot.
Ex :
T420_02.DOMAIN.LOCAL
I want only T420_02
Please help me.
You can use the following regex: ^(.*?)(?=\.)
The captured group contains what you need (T420_02 in your example).
This simple expression should do what you need, assuming you want to match it at the beginning of the string:
^(.+?)\.
The capture group contains the string before (but not including) the first dot.
Here's a fiddle: http://www.rexfiddle.net/s8l0bn3
Use regex pattern ^[^.]+(?=[.])
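As a minimal sketch in Python (an assumption, since the question does not name a language), the last pattern can be applied as follows; a plain string method is shown next to it for comparison.

```python
import re

hostname = "T420_02.DOMAIN.LOCAL"

# One or more non-dot characters from the start, followed by a dot
# that is checked by the lookahead but not included in the match.
match = re.match(r"^[^.]+(?=[.])", hostname)
short_name = match.group(0) if match else None

# Without a regex: everything before the first dot.
alt = hostname.partition(".")[0]

print(short_name, alt)
```

The two differ when the input has no dot at all: the regex finds no match, while partition returns the whole string.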