Regex multiple conditions - regex

I recently started using regex, but I can't seem to figure out this problem:
https://xxxx.yyyy.com/en
For this URL I want to create a regex that is only valid when all conditions below are true:
does not contain 'xxxx'
does contain /en$ or /en/
I managed to validate the two separate conditions, but I can't seem to put them together:
\/en\/|\/en$|^(?!.*(xxxx)).*$
Can you please help?
Thanks!

You may use
/^(?!.*xxxx).*\/en(?:$|\/)/
Details
^ - start of string
(?!.*xxxx) - there can't be xxxx after any 0+ chars other than line break chars
.* - any 0 or more chars other than line break chars, as many as possible
\/en - /en substring
(?:$|\/) - end of string or /
So, if you want to replace xxxx with more than one term, use
/^(?!.*(?:stage|acc)).*\/en(?:$|\/)/
Note that you may force the engine to match them as whole words if you add word boundaries:
/^(?!.*\b(?:stage|acc)\b).*\/en(?:$|\/)/
If you need a full string match, add .* at the end of the pattern.
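A minimal Python sketch of the difference between the plain and the word-bounded exclusion lists. The example.com hostnames and the variable names are hypothetical, chosen only for illustration; the two patterns are the ones above, without the / delimiters and the \/ escapes, which Python does not need.

```python
import re

# Patterns from the answer; "/" needs no escaping in Python.
plain = re.compile(r"^(?!.*(?:stage|acc)).*/en(?:$|/)")
bounded = re.compile(r"^(?!.*\b(?:stage|acc)\b).*/en(?:$|/)")

# Hypothetical URLs, for illustration only.
urls = [
    "https://www.example.com/en",
    "https://account.example.com/en",
    "https://acc.example.com/en",
    "https://stage.example.com/en/",
]

# Keep the URLs each pattern accepts.
plain_hits = [u for u in urls if plain.search(u)]
bounded_hits = [u for u in urls if bounded.search(u)]

print(plain_hits)
print(bounded_hits)
```

In this sketch, "account" contains acc only as part of a longer word, so the plain pattern rejects that URL while the word-bounded one accepts it.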

Using only lookarounds:
^(?!.*xxxx)(?=.*\/en(?:$|\/)).*
^ // start of line
(?!.*xxxx) // negative look ahead: fail if 'xxxx' appears anywhere ahead
(?= // look ahead and match
.*\/en // anything then '/en'
(?:$|\/) // end of line OR a slash
) // end of look ahead
.* // match all (can be omitted if testing lines)
Flags: global, multiline
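As a rough Python sketch of the lookaround-only pattern applied to several lines at once: re.M stands in for the multiline flag and findall for the global flag. The zzzz hostnames are made up for the example.

```python
import re

# One URL per line; only the first one contains 'xxxx'.
text = "\n".join([
    "https://xxxx.yyyy.com/en",
    "https://zzzz.yyyy.com/en",
    "https://zzzz.yyyy.com/en/docs",
    "https://zzzz.yyyy.com/fr",
])

# With re.M, ^ and $ also match at line boundaries.
pattern = re.compile(r"^(?!.*xxxx)(?=.*/en(?:$|/)).*", re.M)

# The pattern has no capturing groups, so findall returns whole matches.
matches = pattern.findall(text)
print(matches)
```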

Regex matching multiple groups

I am very new to regex and am trying to create a filter rule to get some matches. For instance, I have a query result like this:
application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
Now I want to filter ONLY the lines that contain "outbound" AND "service_plus" AND "failure".
I tried to play with groups, but I am misunderstanding something, because my regex returns the wrong results.
Regex which I used:
/(?:outbound)|(?:service_plus)|(?:failure)/
You should use multiple lookahead assertions:
^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?
The above should use the MULTILINE flag so that ^ is interpreted as start of string or start of line.
^ - matches start of string or start of line.
(?=.*outbound) - asserts that at the current position we can match 0 or more non-newline characters followed by 'outbound' without consuming any characters (i.e. the scan position is not advanced)
(?=.*service_plus) - asserts that at the current position we can match 0 or more non-newline characters followed by 'service_plus' without consuming any characters (i.e. the scan position is not advanced)
(?=.*failure) - asserts that at the current position we can match 0 or more non-newline characters followed by 'failure' without consuming any characters (i.e. the scan position is not advanced)
.*\n? - matches 0 or more non-newline characters optionally followed by a newline (in case the final line does not terminate in a newline character)
In Python, for example:
import re
lines = """application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
failureoutboundservice_plus"""
rex = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?', re.M)
filtered_lines = ''.join(rex.findall(lines))
print(filtered_lines)
Prints:
application_outbound_api_external_metrics_service_plus_failure_total
failureoutboundservice_plus
You need to make use of lookaheads to assert that multiple things need to exist regardless of the order they exist:
^(?=.*(?:^|_)outbound(?:_|$))(?=.*(?:^|_)service_plus(?:_|$))(?=.*(?:^|_)failure(?:_|$)).+$
^ - start line anchor
(?= - open the positive lookahead aka "ahead of me is..."
.* - optionally anything
(?:^|_) - start line anchor or underscore
outbound - the word "outbound"
(?:_|$) - underscore or end line anchor
The underscores and line anchors ensure we don't have false positives like "outbounds" or "goutbound"
) - close the positive lookahead
Rinse and repeat for "service_plus" and "failure"
Since lookaheads don't consume any chars, the second and third lookaheads also start from the beginning of the line, which allows the terms to appear in any order
.+$ - match everything till the end of the line
https://regex101.com/r/Zhl4Mf/1
If the order does matter then build a regex in the correct order:
^.*_outbound_.*_service_plus_failure_.*$
https://regex101.com/r/b7O5YK/1
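A small Python sketch comparing the two patterns above on sample lines taken from the question, plus the undelimited failureoutboundservice_plus line from the earlier Python example. The variable names are illustrative only.

```python
import re

lines = """application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_failure_total
failureoutboundservice_plus"""

# Any order, but each term must be delimited by underscores or line edges.
delimited = re.compile(
    r"^(?=.*(?:^|_)outbound(?:_|$))"
    r"(?=.*(?:^|_)service_plus(?:_|$))"
    r"(?=.*(?:^|_)failure(?:_|$)).+$",
    re.M,
)

# Fixed order: outbound first, then service_plus_failure.
ordered = re.compile(r"^.*_outbound_.*_service_plus_failure_.*$", re.M)

delimited_hits = delimited.findall(lines)
ordered_hits = ordered.findall(lines)

print(delimited_hits)
print(ordered_hits)
```

Unlike the plain .*term lookaheads, the delimited version does not accept the last line, because none of its terms is set off by an underscore on both sides.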

How to clear lines after the last regex match

I have a huge log of records that I need to turn into a table.
Each line has a record, preceded by date and time, something like this:
27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this
I want to completely clear the file from all the lines that I don't need.
I'm able to clear most of the lines that I don't want if I use the following regex and substitution patterns (in notepad++, using the flag in which dot matches newline):
.+?(?<datetime>[\d\/]+\s[\d:]+)\s-\s(?<mystuff>stuff[^\n]+)
'${datetime};${mystuff}
However, I can't clear the lines after the last match. How could I do so?
You may use
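Find What: ^(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?)
Replace With: (?{1}$1;$2)
Details
^ - start of a line
(?:.*?([\d/]+\h[\d:]+)\h-\h(stuff.*)|.*\R?) - match either
.*? - any 0+ chars other than line break chars, as few as possible (0 is allowed so that a date at the very start of the line is captured whole)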
([\d/]+\h[\d:]+) - Group 1: one or more digits or /, a horizontal whitespace, one or more digits or :
\h-\h - a horizontal whitespace, - and a hor. whitespace
(stuff.*) - Group 2: stuff and the rest of the line
| - or
.* - any 0+ chars other than linebreak chars
\R? - an optional line break sequence.
The (?{1}$1;$2) replacement pattern only replaces with $1;$2 if Group 1 matches.
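For readers outside Notepad++, here is a hedged Python approximation of the same idea. Python's re module has no \h, no \R and no conditional replacement syntax, so this sketch substitutes [ \t], \n and a replacement function; those substitutions are assumptions of the sketch, not part of the answer. It also uses .*? before the date group so that a date at the very start of a line is captured whole.

```python
import re

log = """27/11/2019 16:35 - i don't need this
28/11/2019 17:25 - don't need this either
30/11/2019 11:33 - stuff i'm looking for
01/12/2019 08:11 - stuff that i'm also looking for
03/11/2019 09:39 - don't need this
"""

# First alternative: a wanted line (groups 1 and 2 are set).
# Second alternative: any other line, including its line break.
pattern = re.compile(
    r"^(?:.*?([\d/]+[ \t][\d:]+)[ \t]-[ \t](stuff.*)|.*\n?)",
    re.M,
)

def keep_or_drop(match):
    # Emulates (?{1}$1;$2): emit "date;stuff" only when group 1 matched.
    if match.group(1) is not None:
        return f"{match.group(1)};{match.group(2)}"
    return ""

cleaned = pattern.sub(keep_or_drop, log)
print(cleaned)
```

The unwanted lines after the last match are removed by the second alternative, which is the part the original attempt was missing.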

Regex to find a starting pattern including either of 2 strings but not contain a specific text

I want to use Regex to find a line containing a particular pattern.
The pattern should be a string starting with 2 characters (a-zA-Z0-9) followed by a dash then either "FAL" or "SAL" and does not include the term "OJT" at all.
I just want to check whether I have this right or am missing something, as it doesn't appear to work as expected:
^[a-zA-z0-9]{1,2}(?=.*?\-SAL|-FAL\b)((?!OJT).)*$
You may use
^[a-zA-Z0-9]{1,2}(?!.*OJT).*?(?:-SAL|-FAL)\b.*
Details
^ - start of string
[a-zA-Z0-9]{1,2} - one or two alphanumeric chars
(?!.*OJT) - a negative lookahead that fails the match if the OJT char sequence appears anywhere to the right of the current location (after any 0+ chars other than line break chars)
.*? - any 0+ chars other than line break chars as few as possible
(?:-SAL|-FAL)\b - -SAL or -FAL not followed with a word char
.* - the rest of string.
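A quick Python check of the pattern above. The sample strings are invented for illustration and are not taken from the question.

```python
import re

pattern = re.compile(r"^[a-zA-Z0-9]{1,2}(?!.*OJT).*?(?:-SAL|-FAL)\b.*")

# Hypothetical inputs, for illustration only.
samples = [
    "AB-SAL",         # plain match
    "X1-FAL report",  # -FAL followed by a non-word char
    "AB-SAL OJT",     # OJT later in the string
    "AB-SALT",        # no word boundary after -SAL
    "AB-XYZ",         # neither -SAL nor -FAL
    "OJT-SAL",        # OJT overlaps the leading chars
]

results = {s: bool(pattern.search(s)) for s in samples}
for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

One caveat this sketch exposes: because the lookahead sits after the leading one or two characters, a string that itself begins with OJT is not rejected. If that case matters, moving (?!.*OJT) directly after ^ should cover it.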

Regex Need to Match word but exclude other words in any order

I have the following combination of lines:-
WAN-bridge
bridge-WAN
WAN-VLAN
ether1-WAN <-----
ether2-hello
ether2-wan2 <-----
WAN-BRIDGE
wan-bridge
bridge-wan
vlan918-WAN
VLAN-wan
wan-ether1 <-----
wan-Bridge
I need a PCRE regex to match any line that contains 'wan' but excludes the words 'vlan' and 'bridge' in any order and irrespective of case.
I have marked the lines I wish to match.
I have tried so many variations, but none have worked.
Any help would be appreciated.
You can use this (enable the case-insensitive flag, since the match must ignore case):
^(?=.*wan)(?!.*(vlan|bridge)).*$
^ - start of string.
(?=.*wan) - positive lookahead: wan must appear somewhere in the line.
(?!.*(vlan|bridge)) - negative lookahead: neither vlan nor bridge may appear anywhere in the line.
.* - match anything except new line.
$ - end of string.
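The question asks for PCRE, but the lookaheads and the case-insensitive flag used here behave the same way in Python's re module, so a short Python sketch can illustrate the result. The names are the lines from the question, without the arrow markers.

```python
import re

names = [
    "WAN-bridge", "bridge-WAN", "WAN-VLAN", "ether1-WAN", "ether2-hello",
    "ether2-wan2", "WAN-BRIDGE", "wan-bridge", "bridge-wan", "vlan918-WAN",
    "VLAN-wan", "wan-ether1", "wan-Bridge",
]

# re.IGNORECASE makes wan, vlan and bridge match in any letter case.
pattern = re.compile(r"^(?=.*wan)(?!.*(vlan|bridge)).*$", re.IGNORECASE)

matched = [n for n in names if pattern.search(n)]
print(matched)
```

Without the case-insensitive flag, lines such as WAN-BRIDGE would pass the negative lookahead and ether1-WAN would fail the positive one, so the flag is essential here.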

How to get the first match in regexp?

I have the three strings listed below:
Levofloxacin 500mg/100mL
Levofloxacin 500mg
Procaterol Hydrochloride …………… 25μg
For the first line, I want to get just 'mg', without 'mL', in my result.
For the second line, I want to get 'mg'.
For the third line, I want to get 'μg'.
I have tried a regexp pattern like:
(?!(.*[ ]{1}[0-9]+))[a-zA-Zμ]+
However, the first line always returns 'mg' along with 'mL'...
How can I get just 'mg' with a regexp?
Any suggestions will be appreciated.
As mentioned in the comment section, try this regex:
^\D*[\d.]+\K[a-zμ]+
Explanation:
^ - asserts the start of the string
\D* - matches 0+ occurrences of any character that is not a digit
[\d.]+ - matches 1+ occurrences of a digit or a dot
\K - resets the start of the reported match, so everything matched so far is left out of the result
[a-zμ]+ - this is what you want: the unit (mg, μg, ...) that follows the first number. If other special characters like μ can occur, add them to this character class too
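\K is a PCRE feature and is not available in Python's built-in re module. In that setting, a capture group can give the same result; the following is a sketch of that substitution, not the pattern from the answer itself.

```python
import re

# Group 1 plays the role of the text that follows \K in the PCRE pattern.
unit_after_first_number = re.compile(r"^\D*[\d.]+([a-zμ]+)")

samples = [
    "Levofloxacin 500mg/100mL",
    "Levofloxacin 500mg",
    "Procaterol Hydrochloride …………… 25μg",
]

# Each sample is known to match, so .group(1) is safe here.
units = [unit_after_first_number.search(s).group(1) for s in samples]
print(units)
```

Because the pattern is anchored at the start and \D* cannot skip over digits, only the unit after the first number is returned, so the mL in the first string is never reached.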