regex optional word - regex

I am trying to find a regex that will match each of the following cases from a set of ldap objectclass definitions - they're just strings really.
The variations in the syntax are tripping my regex up and I don't seem to be able to find a balance between the greedy nature of the match and the optional word "MAY".
( class1-OID NAME 'class1' SUP top STRUCTURAL MUST description MAY ( brand $ details $ role ) )
DESIRED OUTPUT: description
ACTUAL GROUP1: description
ACTUAL GROUP1 with ? on the MAY group: description MAY
( class2-OID NAME 'class2' SUP top STRUCTURAL MUST groupname MAY description )
DESIRED OUTPUT: groupname
ACTUAL GROUP1: groupname
ACTUAL GROUP1 with ? on the MAY group: groupname MAY description
( class3-OID NAME 'class3' SUP top STRUCTURAL MUST ( code $ name ) )
DESIRED OUTPUT: code $ name
ACTUAL GROUP1: no match
ACTUAL GROUP1 with ? on the MAY group: code $ name
( class4-OID NAME 'class4' SUP top STRUCTURAL MUST ( code $ name ) MAY ( group $ description ) )
DESIRED OUTPUT: code $ name
ACTUAL GROUP1: code $ name
ACTUAL GROUP1 with ? on the MAY group: code $ name
Using this:
MUST \(?([\w\$\-\s]+)\)?\s*(?:MAY) (Regex101)
matches lines 1, 2 and 4, but doesn't match the 3rd one with no MAY statement.
Adding an optional "?" to the MAY group results in a good match for 3 and 4, but then the 1st and 2nd lines act greedily and run on into MAY (line 1) or the remainder of the string (line 2).
It feels like I need the regex to consider MAY as optional but also that if MAY is found it should stop - I don't seem to be able to find that balance.

If you can use a regex with two capturing groups you may use
MUST\s+(?:\(([^()]+)\)|(\S+))\s*(?:MAY)?
See the regex demo
Details
MUST - a word MUST
\s+ - 1+ whitespaces
(?:\(([^()]+)\)|(\S+)) - two alternatives:
\( - (
([^()]+) - Group 1: 1+ chars other than ( and )
\) - a ) char
| - or
(\S+) - Group 2: one or more non-whitespace chars
\s* - 0+ whitespaces
(?:MAY)? - an optional word MAY
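A minimal Python sketch of how the two capturing groups might be consumed, using the four sample definitions from the question. The helper name must_attributes and the strip() call (which trims the padding spaces inside the parentheses) are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern from the answer above: group 1 holds a parenthesized list,
# group 2 holds a single bare attribute name.
MUST_RE = re.compile(r"MUST\s+(?:\(([^()]+)\)|(\S+))\s*(?:MAY)?")

def must_attributes(definition):
    """Return the MUST attribute text, or None if there is no MUST clause."""
    m = MUST_RE.search(definition)
    if m is None:
        return None
    # Only one of the two groups participates in any given match.
    value = m.group(1) if m.group(1) is not None else m.group(2)
    return value.strip()

definitions = [
    "( class1-OID NAME 'class1' SUP top STRUCTURAL MUST description MAY ( brand $ details $ role ) )",
    "( class2-OID NAME 'class2' SUP top STRUCTURAL MUST groupname MAY description )",
    "( class3-OID NAME 'class3' SUP top STRUCTURAL MUST ( code $ name ) )",
    "( class4-OID NAME 'class4' SUP top STRUCTURAL MUST ( code $ name ) MAY ( group $ description ) )",
]

for d in definitions:
    print(must_attributes(d))
```

Because the alternation decides between a parenthesized list and a single word, the match never needs to look for MAY to know where to stop, which is what removes the greedy/optional conflict described in the question.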

Match specific letter from group N Regex

I have the following log message:
Aug 25 03:07:19 localhost.localdomainASM:unit_hostname="bigip1",management_ip_address="192.168.41.200",management_ip_address_2="N/A",http_class_name="/Common/log_to_elk_policy",web_application_name="/Common/log_to_elk_policy",policy_name="/Common/log_to_elk_policy",policy_apply_date="2020-08-10 06:50:39",violations="HTTP protocol compliance failed",support_id="5666478231990524056",request_status="blocked",response_code="0",ip_client="10.43.0.86",route_domain="0",method="GET",protocol="HTTP",query_string="name='",x_forwarded_for_header_value="N/A",sig_ids="N/A",sig_names="N/A",date_time="2020-08-25 03:07:19",severity="Eror",attack_type="Non-browser Client,HTTP Parser Attack",geo_location="N/A",ip_address_intelligence="N/A",username="N/A",session_id="0",src_port="39348",dest_port="80",dest_ip="10.43.0.201",sub_violations="HTTP protocol compliance failed:Bad HTTP version",virus_name="N/A",violation_rating="5",websocket_direction="N/A",websocket_message_type="N/A",device_id="N/A",staged_sig_ids="",staged_sig_names="",threat_campaign_names="N/A",staged_threat_campaign_names="N/A",blocking_exception_reason="N/A",captcha_result="not_received",microservice="N/A",tap_event_id="N/A",tap_vid="N/A",vs_name="/Common/adv_waf_vs",sig_cves="N/A",staged_sig_cves="N/A",uri="/random",fragment="",request="GET /random?name=' or 1 = 1' HTTP/1.1\r\n",response="Response logging disabled"
And I have the following RegEx:
request="(?<Flag1>.*?)"
I am now trying to match some text inside the previous group named "Flag1". The new match I want to capture, as Flag2, is /random?name=' or 1 = 1'.
How can I match the needed text from another matched group (by number or by name) without nesting the new group inside the target group, like this:
request="(?<Flag1>\w+\s+(?<Flag2>.*?)\s+HTTP.*?)"
https://regex101.com/r/EcBv7p/1
Thanks.
You can use
request="(?<Flag1>[A-Z]+\s+(?<Flag2>\/\S+='[^']*')[^"]*)"
See the regex demo.
Details:
(?<Flag1> - Flag1 group:
[A-Z]+ - one or more uppercase ASCII letters
\s+ - one or more whitespaces
(?<Flag2>\/\S+='[^']*') - Group Flag2: /, one or more non-whitespace chars, =', zero or more chars other than ', and then a ' char
[^"]* - zero or more chars other than "
) - end of Flag1 group.
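As a rough illustration, the nested named groups can be tried in Python. Two things differ from the regex101 version and are assumptions of this sketch: Python spells named groups (?P<name>...) rather than (?<name>...), and the log line is shortened to its last few fields. The helper name extract_flags is hypothetical.

```python
import re

# Same structure as the pattern above, in Python's named-group syntax.
# The slash needs no escaping in Python.
REQUEST_RE = re.compile(
    r'''request="(?P<Flag1>[A-Z]+\s+(?P<Flag2>/\S+='[^']*')[^"]*)"'''
)

# Shortened sample: only the tail of the log message from the question.
line = r'''uri="/random",fragment="",request="GET /random?name=' or 1 = 1' HTTP/1.1\r\n",response="Response logging disabled"'''

def extract_flags(log_line):
    """Return a dict with Flag1 and Flag2, or None if the field is absent."""
    m = REQUEST_RE.search(log_line)
    return m.groupdict() if m else None

flags = extract_flags(line)
print(flags["Flag1"])
print(flags["Flag2"])
```

Flag2 is still nested inside Flag1 here. A single regex match cannot search inside the text of an already captured group, so the alternative would be a second search run on the Flag1 string.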
If I understand you correctly, you want to match whatever string a previous group has matched, right?
In that case you can use a backreference of the form \N, in this case \1, to match the same text that your first capture group matched.

Match string between delimiters, but ignore matches with specific substring

I have to parse all the text in parentheses, but not the text that contains "GST".
e.g:
(AUSTRALIAN RED CROSS – ATHERTON)
(Total GST for this Invoice $1,104.96)
today for a quote (07) 55394226 − admin.nerang#waste.com.au − this applies to your Nerang services.
expected parsed value:
AUSTRALIAN RED CROSS – ATHERTON
I am trying:
^\(((?!GST).)*$
But it's only matching the value and not grouping correctly.
https://regex101.com/r/HndrUv/1
What would be the correct regex for the same?
This regex should work to get the expected string:
^\((?!.*GST)(.*)\)$
It first checks that the text does not match the regular expression .*GST, i.e. that GST does not appear anywhere after the opening parenthesis. If that holds, it then captures the entire text.
(?!.*GST)(.*)
All that is then surrounded by \( and \) to leave it out of the capturing group.
\((?!.*GST)(.*)\)
Finally you add the BOL and EOL symbols and you get the result.
^\((?!.*GST)(.*)\)$
The expected value is saved in the first capture group (.*).
You can use
^\((?![^()]*\bGST\b)([^()]*)\)$
See the regex demo. Details:
^ - start of string
\( - a ( char
(?![^()]*\bGST\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there are zero or more chars other than ) and ( and then GST as a whole word (remove \bs if you do not need whole word matching)
([^()]*) - Group 1: any zero or more chars other than ) and (
\) - a ) char
$ - end of string
Bonus:
If substrings in longer texts need to be matched, too, you need to remove ^ and $ anchors in the above regex.
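A small Python sketch of the anchored pattern applied to the three sample lines from the question. The use of re.MULTILINE (so that ^ and $ apply per line) and the variable names are assumptions of this example.

```python
import re

# Anchored pattern from the answer above; re.MULTILINE makes ^ and $
# match at the start and end of every line instead of the whole string.
NO_GST_RE = re.compile(r"^\((?![^()]*\bGST\b)([^()]*)\)$", re.MULTILINE)

text = "\n".join([
    "(AUSTRALIAN RED CROSS – ATHERTON)",
    "(Total GST for this Invoice $1,104.96)",
    "today for a quote (07) 55394226 − admin.nerang#waste.com.au − this applies to your Nerang services.",
])

# With a single capturing group, findall returns the group contents.
matches = NO_GST_RE.findall(text)
print(matches)
```

Only the first line should be captured: the second is rejected by the lookahead, and the third is not fully wrapped in parentheses, so the anchors rule it out.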

How to include a substring EXCEPT an exact one in middle of REGEX expression?

Issue
I'm trying to match 3 groups, where one is conditional
String: 12345-12345-1230
Group 1: 12345-12345
Group 2: -123
Group 3: 0
However I only want to match Group 2 if the string is NOT "-000". Meaning group 2 will either be blank if that section is '-000' or it will be whatever else those 4 characters are; '-123' '-001', etc.).
Here is the REGEX with it just accepting anything as group 2:
^(.{5}-.{5})(.{4})([0-9])$ regex101
What I've tried
Negative Lookahead:
^(.{5}-.{5})(?!-000)([0-9])$
^(.{5}-.{5})(.{4}(?!.{4}))([0-9])$
OR Operator:
^(.{5}-.{5})(-000)|(.{4})([0-9])$
This is the closest I've come, however I can't get it to work WITH the final condition ([0-9])$. It's also not ideal to have the remove case (-000) as a separate group as the accept case (not -000).
You may try:
^(\d{5}-\d{5})(?:-000|(-\d{3}))(\d)$
See the online demo.
^ - Start of line anchor.
( - Open 1st capture group.
\d{5}-\d{5} - Match 5 digits, a hyphen, and again 5 digits.
) - Close 1st capture group.
(?: - Open non-capturing group.
-000 - Match "-000" literally.
| - Pipe symbol used as an or-operator.
( - Open 2nd capture group.
-\d{3} - Match a hyphen and 3 digits.
) - Close 2nd capture group.
) - Close non-capturing group.
( - Open 3rd capture group.
\d - Match a single digit.
) - Close 3rd capture group.
$ - End of line anchor.
If you want to capture the 2nd group without the hyphen, then try: ^(\d{5}-\d{5})-(?:000|(\d{3}))(\d)$
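To see how the optional second group behaves, here is a short Python sketch using the first pattern from this answer. The function name parse_code and the inputs other than 12345-12345-1230 are made up for illustration.

```python
import re

# Group 2 only participates when the middle section is not "-000".
CODE_RE = re.compile(r"^(\d{5}-\d{5})(?:-000|(-\d{3}))(\d)$")

def parse_code(value):
    """Return (group1, group2, group3), or None if the string does not match."""
    m = CODE_RE.match(value)
    return m.groups() if m else None

print(parse_code("12345-12345-1230"))  # group 2 is "-123"
print(parse_code("12345-12345-0007"))  # group 2 did not participate
print(parse_code("12345-12345-0015"))  # "-001" is not "-000", so it is captured
```

In Python a group that did not participate in the match is reported as None, so the "-000" case can be tested with a plain `is None` check instead of comparing against an empty string.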
Try this:
(\d{5}-\d{5})(?!-000)(-\d{3})(0)
See Demo

Regex: How to capture one set of parenthesis, but not the next

I have the following data.
Nike.com (Nike) (Apparel)
Adidas.com (Adidas) (Footwear)
Under Armour (Accessories)
Lululemon (Apparel)
I am trying to capture the company name, but not the type of product. Specifically, I want to capture
Nike.com (Nike)
Adidas.com (Adidas)
Under Armour
Lululemon
Using this RegEx:
(.+? \(.+?\))
I get the following:
Nike.com (Nike)
Adidas.com (Adidas)
Under Armour (Accessories)
Lululemon (Apparel)
This works for Nike and Adidas, but it doesn't work for Under Armour or Lululemon. The type of product will always be at the end of the line. I've tried the following with no success:
(.+? \(.+?\)(?!Accessories|Apparel|Footwear))
(.+? \(.+?\)(?!.*Accessories|.*Apparel|.*Footwear).*)
You seem to want to get all up to the parenthesized substring at the end of string.
You may use
^(.+?) *\([^()]+\)$
See the regex demo
Details
^ - start of string
(.+?) - Group 1: any one or more chars other than line break chars, as few as possible
" *" - zero or more spaces (a literal space followed by *)
\( - a ( char
[^()]+ - 1+ chars other than ( and )
\) - a ) char
$ - end of string.
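The pattern can be checked against the four sample lines with a few lines of Python. This is only a sketch; the function name company_name is an assumption, and each line is matched on its own so that ^ and $ refer to that line.

```python
import re

# Group 1 is everything before the final parenthesized chunk.
NAME_RE = re.compile(r"^(.+?) *\([^()]+\)$")

def company_name(line):
    """Return the text before the trailing parenthesized product type."""
    m = NAME_RE.match(line)
    return m.group(1) if m else None

rows = [
    "Nike.com (Nike) (Apparel)",
    "Adidas.com (Adidas) (Footwear)",
    "Under Armour (Accessories)",
    "Lululemon (Apparel)",
]

for row in rows:
    print(company_name(row))
```

The lazy (.+?) keeps growing until the rest of the pattern can consume exactly one parenthesized group followed by the end of the string, which is why "(Nike)" stays in group 1 while "(Apparel)" does not.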

Regex for text file

I have a text file with the following text:
andal-4.1.0.jar
besc_2.1.0-beta
prov-3.0.jar
add4lib-1.0.jar
com_lab_2.0.jar
astrix
lis-2_0_1.jar
Is there any way I can split the name and the version using regex? I want to use the results to make two columns, 'Name' and 'Version', in Excel.
So I want the results from the regex to look like
andal 4.1.0.jar
besc 2.1.0-beta
prov 3.0.jar
add4lib 1.0.jar
com_lab 2.0.jar
astrix
lis 2_0_1.jar
So far I have used ^(?:.*-(?=\d)|\D+) to get the Version and -\d.*$ to get the Name separately. The problem with this is that when I do it for a large text file, the results from the two regexes are not in the same order. So is there any way to get the results in the way I have mentioned above?
Ctrl+H
Find what: ^(.+?)[-_](\d.*)$
Replace with: $1\t$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(.+?) # group 1, 1 or more any character but newline, not greedy
[-_] # a dash or underscore
(\d.*) # group 2, a digit then 0 or more any character but newline
$ # end of line
Replacement:
$1 # content of group 1
\t # a tabulation, you may replace with what you want
$2 # content of group 2
Result for given example:
andal 4.1.0.jar
besc 2.1.0-beta
prov 3.0.jar
add4lib 1.0.jar
com_lab 2.0.jar
astrix
lis 2_0_1.jar
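The same find-and-replace can be approximated outside the editor. The sketch below uses Python's re.sub, which is an assumption of this example rather than part of the answer; note that Python writes the group references as \1 and \2 instead of $1 and $2.

```python
import re

# Same "Find what" pattern; re.MULTILINE lets ^ and $ work line by line.
SPLIT_RE = re.compile(r"^(.+?)[-_](\d.*)$", re.MULTILINE)

def to_tab_separated(text):
    """Replace the separator before the version with a tab on each line."""
    return SPLIT_RE.sub(r"\1\t\2", text)

names = "\n".join([
    "andal-4.1.0.jar",
    "besc_2.1.0-beta",
    "prov-3.0.jar",
    "add4lib-1.0.jar",
    "com_lab_2.0.jar",
    "astrix",
    "lis-2_0_1.jar",
])

print(to_tab_separated(names))
```

Lines without a version, such as astrix, do not match and are left untouched, so the rows stay in their original order and tab-separated output of this kind can be pasted into a spreadsheet as two columns.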
I'm not quite sure what you meant by the problem with large files, and I believe the two regexes you showed do the opposite of what you said: the first one should get you the name and the second one should give you the version.
Anyway, here are the assumptions I had to make about what might make sense for you:
"Name" may be followed by - or _, followed by the version string.
"Version" is a string preceded by - or _ that starts with some digits, followed by a dot or underscore, followed by some digits, and then any string.
If these assumptions make sense, you may use
^(.+?)(?:[-_](\d+[._]\d+.*))?$
as your regex. Group 1 will be the name, Group 2 will be the version.
Demo in regex101: https://regex101.com/r/RnwMaw/3
Explanation of regex
^          start of line
(.+?)      "Name" part, using a reluctant match of at least 1 character
(?: )?     optional group for the "Version String", which consists of:
[-_]       - or _
( )        followed by the "Version", which is
\d+        at least 1 digit,
[._]       then 1 dot or underscore,
\d+        then at least 1 digit,
.*         then any string
$          end of line
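A brief Python sketch of reading both groups in a single pass, so name and version always come from the same line. The function name split_name_version and the choice to return an empty string for a missing version are assumptions made for illustration.

```python
import re

# Pattern from this answer: the version part is an optional group.
NAME_VERSION_RE = re.compile(r"^(.+?)(?:[-_](\d+[._]\d+.*))?$")

def split_name_version(line):
    """Return (name, version); version is '' when the line has none."""
    m = NAME_VERSION_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2) or ""

entries = [
    "andal-4.1.0.jar",
    "besc_2.1.0-beta",
    "prov-3.0.jar",
    "add4lib-1.0.jar",
    "com_lab_2.0.jar",
    "astrix",
    "lis-2_0_1.jar",
]

for entry in entries:
    print(split_name_version(entry))
```

Processing one line at a time keeps each name paired with its own version, which avoids the ordering problem that comes from running two separate regexes over the whole file.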