I have two scenarios where I need two regexes:
/vendors?(\-[a-z]*)*/
/vendor-staff?(\-[a-z]*)*/
My problem is that the first one interferes with the second one:
The first one needs to capture cases like vendor, vendor-add, vendor-edit, vendor-list.
The second one needs to capture only cases like vendor-staff-add and vendor-staff-edit.
How can I do that? I tried several options without success.
I tried to validate those here: https://regexr.com/3uddc
Thank you
You may add a negative lookahead (?!-staf) after vendor in the first regex:
vendor(?!-staf)s?(-[a-z]*)*
To prevent consecutive hyphens, replace [a-z]* with [a-z]+:
vendor(?!-staf)s?(-[a-z]+)*
See the regex demo
Details
vendor - a literal substring
(?!-staf) - no -staf substring allowed right after vendor
s? - an optional s
(-[a-z]+)* - 0 or more occurrences of - and then 1+ lowercase ASCII letters.
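As a rough check of how the two patterns divide the inputs, here is a minimal Python sketch. The helper name classify and the use of re.fullmatch are assumptions made for illustration; the question does not say which language or matching function is used.

```python
import re

# First pattern, as suggested in the answer: "vendor" must not be
# followed by "-staf".
VENDOR = re.compile(r"vendor(?!-staf)s?(-[a-z]+)*")
# Second pattern, taken unchanged from the question.
VENDOR_STAFF = re.compile(r"vendor-staff?(\-[a-z]*)*")


def classify(path):
    """Hypothetical helper: name the pattern that matches the whole string."""
    if VENDOR.fullmatch(path):
        return "vendor"
    if VENDOR_STAFF.fullmatch(path):
        return "vendor-staff"
    return None


for candidate in ["vendor", "vendor-add", "vendor-staff-add", "customer"]:
    print(candidate, "->", classify(candidate))
```

With the lookahead in place, the vendor-staff-* strings are rejected by the first pattern and fall through to the second.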
I have a conditional lookahead regex that tests to see if there is a number substring at the end of a string, and if so match for the numbers, and if not, match for another substring
The string in question: "H2K 101"
If just the lookahead is used, i.e. (?=\d{1,8}$)(\d{1,8}$), the lookahead succeeds, and "101" is found in capture group 1
When the lookahead is placed into a conditional, i.e. (?(?=\d{1,8}\z)(\d{1,8}\z)|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+)), the lookahead now fails, and the second pattern is used, matching "H2K", and a "2" is found in capture group 2.
If the test string has the "2" swapped for a letter, i.e. "HKK 101"
then the lookahead conditional works as expected, and the number "101" is once again found in capture group 1.
I've tested this in Regex101 and other PCRE engines, and all work the same, so clearly I'm missing something obvious about conditionals or the condition regex I'm using. Any insight greatly appreciated.
Thanks.
The look ahead starts at the current position, so initially it fails, and the alternative is used -- where it finds a match at the current position.
If you want the look ahead to succeed when still at the initial position, you need to allow for the intermediate characters to occur. Also, when the alternative kicks in, realise that there can follow a second match that still uses the look ahead, but now at a position where the look ahead is successful.
From what I understand, you are interested in one match only, not two consecutive matches (or more). So that means you should attempt to match the whole string, and capture the part of interest in a capture group. Also, the look ahead should be made to succeed when still at the initial position. This all means you need to inject several .*. There is no need for a conditional.
(?=.*\d{1,8}\z).*?(\d{1,8}\z)|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+).*
Note also that (?=.*\d{1,8}\z) succeeds if and only if (?=.*\d\z) succeeds, so you can simplify that:
(?=.*\d\z).*?(\d{1,8}\z)|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+).*
There are two capture groups. If there is a match, exactly one of the capture groups will have non-empty content, which is the content you need.
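The simplified pattern can be tried in Python roughly as follows. This is a sketch under two assumptions: Python's re module is used instead of PCRE, so \z is written as \Z (which in Python matches only at the very end of the string), and extract is a hypothetical helper name.

```python
import re

# Same pattern as above, with PCRE's \z spelled \Z for Python's re module.
PATTERN = re.compile(
    r"(?=.*\d\Z).*?(\d{1,8}\Z)|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+).*"
)


def extract(text):
    """Hypothetical helper: return whichever capture group took part."""
    m = PATTERN.match(text)  # match() anchors the attempt at position 0
    if m is None:
        return None
    return m.group(1) if m.group(1) is not None else m.group(2)


for sample in ["H2K 101", "HKK 101", "H2K abc"]:
    print(repr(sample), "->", extract(sample))
```

Because the lookahead now allows for the intermediate characters, it succeeds at position 0 for "H2K 101" and the trailing number is captured instead of "H2K".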
You want to match a number of specific length at the end of the string, and if there is none, match something else.
There is no need for a conditional here. Conditional patterns are necessary to examine what to match next at the given position inside the string based either on a specific group match or a lookaround test. They are not useful when you want to give priority to a specific pattern.
Here, you can use a PCRE pattern based on the \K operator like
.*?\K\d{1,8}\z|[a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+
Or, using capturing groups
(?|.*?(\d{1,8})\z|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+))
See the regex demo #1 and regex demo #2.
Details:
.*?\K\d{1,8}\z - any zero or more chars other than line break chars, as few as possible, then the match reset operator that discards the text matched so far, then one to eight digits at the very end of the string
| - or
[a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+ - one or more letters, 1-8 digits, underscores or hyphens, and then one or more letters.
And
(?| - start of the branch reset group:
.*? - any zero or more chars other than line break chars, as few as possible
(\d{1,8}) - Group 1: one to eight digits
\z - end of string
| - or
( - Group 1 start:
[a-zA-Z]+ - one or more ASCII letters
[\d_-]{1,8} - one to eight digits, underscores, hyphens
[a-zA-Z]+ - one or more ASCII letters
) - Group 1 end
) - end of the group.
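Python's built-in re module supports neither \K nor the branch reset group (?|...), so the patterns above cannot be used there as written. A minimal sketch of the same priority logic, assuming plain re and a hypothetical helper named extract, uses two numbered groups and picks the one that participated:

```python
import re

# Two ordinary capture groups stand in for the branch reset group;
# \Z is Python's spelling of "very end of string".
PATTERN = re.compile(r".*?(\d{1,8})\Z|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+)")


def extract(text):
    """Hypothetical helper: return the text of whichever group matched."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Exactly one group participates in any match, so lastindex is 1 or 2.
    return m.group(m.lastindex)


for sample in ["H2K 101", "HKK 101", "H2K"]:
    print(repr(sample), "->", extract(sample))
```

The first alternative is tried first at each position, which is what gives the trailing number priority over the letter-digit-letter pattern.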
I am trying to create a regular expression that will identify possible abbreviations within a given string in Python. I am fairly new to regex and am having difficulty creating the expression, though I believe it should be somewhat simple. The expression should pick up words that have two or more capitalised letters. It should also pick up words containing a dash and report the whole word (both before and after the dash). If numbers are present, they should be reported with the word as well.
As such, it should pick up:
ABC, AbC, ABc, A-ABC, a-ABC, ABC-a, ABC123, ABC-123, 123-ABC.
I have already made the following expression: r'\b(?:[a-z]*[A-Z\-][a-z\d[^\]*]*){2,}'.
However this does also pick up these wrong words:
A-bc, a-b-c
I believe the problem is that it looks for either multiple capitalised letters or dashes. I want it to give me only words that have at least two capitalised letters. I understand that it will also "mistakenly" take words such as "Abc-Abc", but I don't believe there is a way to avoid these.
If lookaheads are supported and you don't want to match consecutive hyphens (--), you might use:
\b(?=(?:[a-z\d-]*[A-Z]){2})[A-Za-z\d]+(?:-[A-Za-z\d]+)*\b
Explanation
\b A word boundary
(?= Positive lookahead, assert that from the current location to the right is
(?:[a-z\d-]*[A-Z]){2} Match 2 times: optionally the allowed non-uppercase characters, then an uppercase char A-Z
) Close the lookahead
[A-Za-z\d]+ match 1+ times the allowed characters without the hyphen
(?:-[A-Za-z\d]+)* Optionally repeat - and 1+ times the allowed characters
\b A word boundary
See a regex101 demo.
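Since the question is about Python, here is a small sketch that runs the pattern over the sample words from the question. The function name find_abbreviations and the sample string are made up for illustration.

```python
import re

ABBREVIATION = re.compile(
    r"\b(?=(?:[a-z\d-]*[A-Z]){2})[A-Za-z\d]+(?:-[A-Za-z\d]+)*\b"
)


def find_abbreviations(text):
    """Hypothetical helper: return all candidate abbreviations in text."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return ABBREVIATION.findall(text)


# Wanted words from the question, followed by the two unwanted ones.
sample = "ABC AbC ABc A-ABC a-ABC ABC-a ABC123 ABC-123 123-ABC A-bc a-b-c"
print(find_abbreviations(sample))
```

The lookahead cannot cross a space, so the two-uppercase requirement is checked per word; A-bc and a-b-c are skipped because neither contains two uppercase letters.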
To also avoid matching when hyphens surround the characters, you can use negative lookarounds asserting that there is no hyphen to the left or right.
\b(?<!-)(?=(?:[a-z\d-]*[A-Z]){2})[A-Za-z\d]+(?:-[A-Za-z\d]+)*\b(?!-)
See another regex demo.
I am not an expert in regex, but I tried the pattern below and would like to improve it. Can someone please help me?
The pattern can contain ASCII letters, spaces, commas, periods, apostrophes and hyphens, and there can be one digit at the end of the string.
This works well:
/^[a-z ,.'-]+(\d{1})?$/i
But I want to add the condition that at least 2 letters must be present. Could you please tell me how to achieve this, and explain it a bit as well?
Note that {1} is always redundant in any regex; remove it to make the pattern more readable. (\d{1})? is equal to \d? and matches an optional digit.
Taking into account the string must start with a letter, you can use
/^(?:[a-z][ ,.'-]*){2,}\d?$/i
Details:
^ - start of string
(?: - start of a non-capturing group (it is used here as a container for a pattern sequence to quantify):
[a-z] - an ASCII letter
[ ,.'-]* - zero or more spaces, commas, dots, single quotation marks or hyphens
){2,} - end of group, repeat two or more ({2,}) times
\d? - an optional digit
$ - end of string
i - case insensitive matching is ON.
See the regex demo.
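A quick way to try this pattern is the following Python sketch. The question uses a /.../i literal, so the port to Python's re module with re.IGNORECASE and the helper name is_valid are assumptions made for illustration.

```python
import re

# The /i flag from the original literal becomes re.IGNORECASE.
NAME = re.compile(r"^(?:[a-z][ ,.'-]*){2,}\d?$", re.IGNORECASE)


def is_valid(value):
    """Hypothetical helper: True if value has 2+ letters in the allowed shape."""
    return NAME.search(value) is not None


for candidate in ["ab", "O'Neil, J. 5", "a", "a-1", "--"]:
    print(repr(candidate), "->", is_valid(candidate))
```

Each repetition of the group consumes exactly one letter, so {2,} counts letters rather than characters; a string such as "--" or "a-1" is therefore rejected.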
The thing to change in your regex is the + after the list of allowed characters.
+ means one or more occurrences of the listed characters. If you want 2 or more, you can use {2,}. Note that this requires at least two characters from the class, not necessarily two letters.
So your regex should look something like
/^[a-z ,.'-]{2,}\d?$/i
I have this regex to detect an email address:
(?=.*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)
The requirement: The part before # needs to contain at least one letter and be at least 8 characters long.
I'm using a positive lookahead to check whether it contains a letter, but the lookahead actually applies to the entire line (the part after # will usually contain letters), so this will pass:
123456789#gmail.com
So the question is: how can I validate only the result of the first capturing group (in this case 123456789) to see whether it has a letter or not?
The [a-zA-Z0-9_.+-]{8,} consuming pattern part before # does not match #, so the lookahead check should only check for a letter after 0 or more chars other than #.
Using
(?=[^#]*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)
will fix the issue. See the regex demo.
You may further optimize the lookahead by making [^#] more specific. E.g., since you only allow 0-9_.+- apart from letters, you may write the regex as
(?=[0-9_.+-]*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)
   ^^^^^^^^^
See this regex demo.
Or, you may follow the principle of contrast (suggested in comments), and use [^#a-zA-Z]* instead of [^#]*.
Depending on where you are using the regex, you might want to wrap it with ^ and $ anchors to ensure a full string match.
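The difference between the two lookaheads can be seen with a short Python sketch. The question does not name a language, so Python's re module, re.fullmatch (standing in for the ^ and $ anchors) and the helper name local_part are all assumptions.

```python
import re

# Pattern from the question: the lookahead may find its letter after "#".
ORIGINAL = re.compile(r"(?=.*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)")
# Pattern from the answer: the lookahead cannot scan past "#".
FIXED = re.compile(r"(?=[^#]*[a-zA-Z])([a-zA-Z0-9_.+-]{8,})#(\S+\.\S+)")


def local_part(address):
    """Hypothetical helper: return the part before '#', or None if invalid."""
    m = FIXED.fullmatch(address)
    return m.group(1) if m else None


for address in ["123456789#gmail.com", "abc12345#gmail.com"]:
    print(address, "->", local_part(address))
```

The all-digit address is accepted by ORIGINAL because the letters in gmail.com satisfy its lookahead, while FIXED rejects it.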
I need to find a regex that matches both:
;hostname:MytestHello;
;message:#Hellowtestworld;
In this value:
;hostname:MytestHello;severity:major;message:#Hellowtestworld;
Here is my attempt:
(hostname:|message:).*?(test).*?\;
But I only get the first occurrence:
hostname:nimsofttest22;
What can I do in order to get BOTH results?
While the multiple matching part is easy to solve with a global modifier or the correct language function/method that returns multiple matches, your pattern contains a flaw: it may return unwanted results if message or hostname with no test after them appear before another occurrence with test. See this regex demo to understand what I mean.
So, the correct way is to restrict . here, to match any char but ; (that acts as a delimiter in your string):
/(?:hostname|message):[^;]*?test[^;]*;/g
See this regex demo.
Note: you should adapt the pattern to whatever language method/function you choose later in the code.
Details
(?:hostname|message) - either of the 2 substrings
: - a colon
[^;]*? - any 0+ chars other than ;, as few as possible
test - test
[^;]* - any 0+ chars other than ; as many as possible
; - a semi-colon.
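In Python, for instance, re.findall plays the role of the g flag. The sketch below assumes Python's re module; the second test string is made up to show the flaw described above, and the names PATTERN and NAIVE are illustrative.

```python
import re

# Pattern from the answer: [^;] keeps each match inside one field.
PATTERN = re.compile(r"(?:hostname|message):[^;]*?test[^;]*;")
# Pattern from the question, kept for comparison.
NAIVE = re.compile(r"(hostname:|message:).*?(test).*?\;")

value = ";hostname:MytestHello;severity:major;message:#Hellowtestworld;"
matches = PATTERN.findall(value)  # findall returns every match, like /g
print(matches)

# Made-up input where hostname has no "test" but a later field does.
tricky = ";hostname:abc;message:test1;"
print(NAIVE.search(tricky).group(0))  # runs across the ";" delimiter
print(PATTERN.findall(tricky))        # stays within one field
```

On the made-up input, the unrestricted .*? lets the original pattern start at hostname and run into the message field, while the [^;] version reports only the field that actually contains test.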