How can I set maximum length to this regular expression? - regex

This validation works well for allowing alphanumeric chars, spaces, and a dash, but I have not been able to set the max length to 23.
Regex:
(^\w+\s*(-?)(\s*\w+\s*)(\w+)$){0,23}
The cases I need to pass:
Winston1-Salem6
Winston-Salem
Winston Salem
1-two3
word2 with space
Cases I need to fail:
-Newberty-Los-
12345678901234567890123444

It may be more convenient to check the length separately, but you can use a lookahead to confirm that the entire expression is between 0 and 23 characters.
(?=^.{0,23}$)(^\w+\s*(-?)(\s*\w+\s*)(\w+)$)
http://rubular.com/r/GVIbG8hDKz

Just use a look ahead to assert a max length:
(?=^.{1,23}$)^\w+\s*(-?)(\s*\w+\s*)(\w+)$
Or a negative lookahead works too:
(?!^.{24,})^\w+\s*(-?)(\s*\w+\s*)(\w+)$
Variable width lookaheads are supported in most modern regex flavors
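As a minimal sketch of the positive-lookahead version in Python, checked against the cases from the question (the is_valid helper and the samples list are illustrative names, not part of the answer):

```python
import re

# Length check (1 to 23 characters) as a lookahead, followed by the
# original shape pattern from the question.
PATTERN = re.compile(r'(?=^.{1,23}$)^\w+\s*(-?)(\s*\w+\s*)(\w+)$')


def is_valid(text: str) -> bool:
    """Return True if text has the expected shape and is at most 23 characters."""
    return PATTERN.search(text) is not None


samples = [
    "Winston1-Salem6",
    "Winston-Salem",
    "Winston Salem",
    "1-two3",
    "word2 with space",
    "-Newberty-Los-",
    "12345678901234567890123444",
]

for s in samples:
    print(f"{s!r}: {'PASS' if is_valid(s) else 'FAIL'}")
```

The first five samples should report PASS and the last two FAIL: one starts with a dash, the other is 26 characters long.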

^(?!(^-|-$|.{24,})).*
Winston1-Salem6 - PASS
Winston-Salem - PASS
Winston Salem - PASS
1-two3 - PASS
word2 with space - PASS
-Newberty-Los- - FAIL
12345678901234567890123444 - FAIL
Demo
https://regex101.com/r/eM3qR9/2
Regex Explanation:
^(?!(^-|-$|.{24,})).*
Assert position at the beginning of the string «^»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!(^-|-$|.{24,}))»
Match the regex below and capture its match into backreference number 1 «(^-|-$|.{24,})»
Match this alternative «^-»
Assert position at the beginning of the string «^»
Match the character “-” literally «-»
Or match this alternative «-$»
Match the character “-” literally «-»
Assert position at the end of the string, or before the line break at the end of the string, if any «$»
Or match this alternative «.{24,}»
Match any single character that is NOT a line break character «.»
Between 24 and unlimited times, as many times as possible, giving back as needed (greedy) «{24,}»
Match any single character that is NOT a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
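A quick way to check this pattern is to run it in Python; the sketch below uses a hypothetical passes helper. One thing to watch: the lookahead is evaluated only at the start of the string, so the «-$» alternative can fire only when the whole string is a single dash. A trailing dash on a longer string is therefore not rejected, and the pattern places no restriction on which characters are used.

```python
import re

PATTERN = re.compile(r'^(?!(^-|-$|.{24,})).*')


def passes(text: str) -> bool:
    """Return True if the negative lookahead does not reject text."""
    return PATTERN.match(text) is not None


# Cases from the question behave as listed in the answer.
print(passes("Winston1-Salem6"))              # True
print(passes("-Newberty-Los-"))               # False (leading dash)
print(passes("12345678901234567890123444"))   # False (26 characters)

# A trailing dash alone is not caught, because "-$" is tested at position 0.
print(passes("Newberty-Los-"))                # True
```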

A quantifier such as {0,23} repeats the whole group; it can't cap the total length of the match the way you want.
Use
/^(\w+([ -]\w+)*)$/
and check the length of the group #1 manually after the match.
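A minimal sketch of that two-step check in Python, assuming a limit of 23 characters as in the question (is_valid and MAX_LEN are illustrative names):

```python
import re

# Words made of \w characters, separated by a single space or dash.
SHAPE = re.compile(r'^(\w+([ -]\w+)*)$')
MAX_LEN = 23  # limit taken from the question


def is_valid(text: str) -> bool:
    """Match the shape first, then check the length of group 1 separately."""
    m = SHAPE.match(text)
    return m is not None and len(m.group(1)) <= MAX_LEN


for s in ["Winston1-Salem6", "word2 with space",
          "-Newberty-Los-", "12345678901234567890123444"]:
    print(f"{s!r}: {'PASS' if is_valid(s) else 'FAIL'}")
```

Keeping the length check outside the regex leaves the pattern short and makes the limit easy to change.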

Related

Regex matching multiple groups

I am very new to regex and am trying to create a filter rule to get some matches. For instance, I have a query result like this:
application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
Now I want to filter ONLY the lines which contain "outbound" AND "service_plus" AND "failure".
I tried to play with groups, but I am misunderstanding something, because my regex gives the wrong results.
The regex I used:
/(?:outbound)|(?:service_plus)|(?:failure)/
You should use multiple lookahead assertions:
^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?
The above should use the MULTILINE flag so that ^ is interpreted as start of string or start of line.
^ - matches start of string or start of line.
(?=.*outbound) - asserts that at the current position we can match 0 or more non-newline characters followed by 'outbound' without consuming any characters (i.e. the scan position is not advanced)
(?=.*service_plus) - asserts that at the current position we can match 0 or more non-newline characters followed by 'service_plus' without consuming any characters (i.e. the scan position is not advanced)
(?=.*failure) - asserts that at the current position we can match 0 or more non-newline characters followed by 'failure' without consuming any characters (i.e. the scan position is not advanced)
.*\n? - matches 0 or more non-newline characters optionally followed by a newline (in case the final line does not terminate in a newline character)
In Python, for example:
import re
lines = """application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
failureoutboundservice_plus"""
rex = re.compile(r'^(?=.*outbound)(?=.*service_plus)(?=.*failure).*\n?', re.M)
filtered_lines = ''.join(rex.findall(lines))
print(filtered_lines)
Prints:
application_outbound_api_external_metrics_service_plus_failure_total
failureoutboundservice_plus
You need to make use of lookaheads to assert that multiple things need to exist regardless of the order they exist:
^(?=.*(?:^|_)outbound(?:_|$))(?=.*(?:^|_)service_plus(?:_|$))(?=.*(?:^|_)failure(?:_|$)).+$
^ - start line anchor
(?= - open the positive lookahead aka "ahead of me is..."
.* - optionally anything
(?:^|_) - start line anchor or underscore
outbound - the word "outbound"
(?:_|$) - underscore or end line anchor
The underscores and line anchors ensure we don't have false positives like "outbounds" or "goutbound"
) - close the positive lookahead
Rinse and repeat for "service_plus" and "failure"
Since the lookaheads don't consume any characters, each one starts from the beginning of the line, which allows the terms to appear in any order
.+$ - match everything till the end of the line
https://regex101.com/r/Zhl4Mf/1
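A sketch of the delimiter-aware version in Python, building one lookahead per term and testing each line separately (TERMS, PATTERN and filter_lines are illustrative names):

```python
import re

TERMS = ("outbound", "service_plus", "failure")

# One lookahead per term; each term must be bounded by "_" or a line edge.
PATTERN = re.compile(
    "^"
    + "".join(rf"(?=.*(?:^|_){re.escape(t)}(?:_|$))" for t in TERMS)
    + ".+$"
)


def filter_lines(text: str) -> list[str]:
    """Keep only the lines that contain every term as a delimited word."""
    return [line for line in text.splitlines() if PATTERN.match(line)]


lines = """application_outbound_api_external_metrics_service_plus_success_total
application_outbound_api_external_metrics_service_plus_failure_total
application_inbound_api_metrics_service_success_total
application_inbound_api_metrics_service_failure_total
failureoutboundservice_plus"""

print(filter_lines(lines))
```

Unlike the plain (?=.*outbound) form, this rejects the run-together line failureoutboundservice_plus, because "outbound" there is not bounded by underscores.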
If the order does matter then build a regex in the correct order:
^.*_outbound_.*_service_plus_failure_.*$
https://regex101.com/r/b7O5YK/1

Regex to get length of group of character classes

I have a workaround for this, but was hoping to find a purely regex solution.
The requirements are:
has one required character
only pulls from a pool of approved characters
minimum length of 4
single word, no whitespace
e.g.
required character: m
pool of characters: [a,b,e,l]
Possible matches:
mabel
abemal
labeam
won't match:
a mael
ama
label
So far I have this expression, but putting a {4,} after it thinks I'm talking about multiplying word matches by 4.
^\b(?:[abel]*[m]+[abel]*)\b
You can use
^(?=.*[m])[abelm]{4,}$
^ start of a line or string
Positive Lookahead (?=.*[m])
Asserts that the string contains at least 1 m character
[abelm]{4,} matches characters in the list abelm
between 4 and unlimited times, as many times as possible.
(greedy) (case sensitive)
$ end of a line or string
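A short Python sketch of this pattern run against the examples from the question (matches is an illustrative helper name):

```python
import re

# The lookahead requires at least one "m"; the character class restricts
# the pool and {4,} enforces the minimum length.
PATTERN = re.compile(r'^(?=.*m)[abelm]{4,}$')


def matches(word: str) -> bool:
    return PATTERN.match(word) is not None


for word in ["mabel", "abemal", "labeam", "a mael", "ama", "label"]:
    print(f"{word!r}: {matches(word)}")
```

The first three words should report True. "a mael" fails on the space, "ama" on the length, and "label" on the missing m.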

Is there a way to use Regex to capture numbers out of a string based on a specific leading letters?

I need to extract any number of 4-10 digits that directly follows 'PO#' OR 'PO# ' (with a whitespace). I do not want to include the PO# with the actual value that is extracted, however I do need it as criteria to target the value within a string. If the number has fewer than 4 or more than 10 digits, I do not wish to capture the value and would like to otherwise ignore it.
A sample string would look like this:
PO#12445 for Vendor Enterprise
or
Invoice# 21412556 for Vendor Enterprise for PO# 12445
My current RegEX expression captures PO# with '#' and I use additional logic after the fact to remove the '#', however my expression is also capturing Invoice# and Inv# which I don't want it to do. I'd like it to only target PO#.
Current Expression: [P][O][#]\s*[0-9]{3,9}\d+\w
Any help would be greatly appreciated!
If you need only the digits, you can use \b(?<=PO#)\s?(\d{4,10})\b, with:
(?<=PO#): positive lookbehind, which makes sure this pattern (PO followed by #) is present just before the needed pattern
\s?: 0 or 1 whitespace character
(\d{4,10}): between 4 and 10 digits
\b: word boundaries, to avoid matching e.g. only the first 10 digits of an 11-digit number
Edit: Alexander Mashin is right about the lookbehind having to be fixed width, so \b(?<=PO#)\s?(\d{4,10})\b is better https://regex101.com/r/1KBQd1/5
Edit: added word boundaries
You can use a capturing group and repeat matching the digits 4-10 times using [0-9]{4,10}.
Note that [P][O][#] is the same as PO#
\bPO#\s*([0-9]{4,10})\b
\bPO#\s* Match PO# preceded by a word boundary and match 0+ whitespace chars
( Capture group 1
[0-9]{4,10} Match 4 - 10 digits
)\b Close group followed by a word boundary to prevent the match being part of a larger word
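A sketch of the capture-group approach in Python, using the sample strings from the question (extract_po_numbers is an illustrative name):

```python
import re

# "PO#" at a word boundary, optional whitespace, then 4 to 10 digits
# captured in group 1 and not followed by another word character.
PO_NUMBER = re.compile(r'\bPO#\s*([0-9]{4,10})\b')


def extract_po_numbers(text: str) -> list[str]:
    """Return only the digits of every PO number found in text."""
    return PO_NUMBER.findall(text)


print(extract_po_numbers("PO#12445 for Vendor Enterprise"))
# ['12445']
print(extract_po_numbers("Invoice# 21412556 for Vendor Enterprise for PO# 12445"))
# ['12445']
print(extract_po_numbers("PO#123 and PO# 12345678901"))
# [] (3 digits is too short, 11 digits is too long)
```

Because the pattern has a single capture group, findall returns just the digits, so no post-processing is needed to strip the PO# prefix.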
If PCRE is available, how about:
PO#\s*\K\d{4,10}(?=\D|$)
PO#\s* matches the leading substring "PO#" followed by 0 or more whitespaces.
\K resets the starting position of the match and works as a positive (zero length) lookbehind.
\d{4,10} matches a sequence of digits of 4 <= length <= 10.
(?=\D|$) is the positive lookahead to match a non-digit character or the end of the string.

Match numbers after first character

I'd like to use Regex to determine whether the characters after the first are all numbers.
For example:
A123 would be valid as after A there are only numbers
A12B would be invalid as, after the first character, there is another letter
I essentially want to ignore the first character
I have so far this:
(?<=A)\w*(?=)
but this makes A12B or A1B2C valid, I only want numbers after A.
You could match a non-digit \D, followed by one or more digits. If that is the whole string, you could use anchors asserting the start ^ and the end $ of the string.
^\D\d+$
That will match:
^ Start of the string
\D Match not a digit
\d+ Match 1+ digits making sure there are digits
$ End of the string
The best solution I can think of is:
^.\d*$
^ - Start of the line
. - Any character (except line terminators)
\d*
\d - a digit
* - repeated any number of times (including 0 times. If you want it to be at least 1, change it to +).
$ - End of the line
let regex = /^.\d*$/;
let testStrings = ['A123', 'A12B'];
testStrings.forEach(str => {
console.log(`${str} is ${regex.test(str) ? 'valid' : 'invalid'}`);
});
Your attempt is very complicated, especially given how simple your goal is.
Succeeding at regexes is all about simplicity.
The first character can be anything, so just go with . (a dot).
The next ones are all digits, so you want \d.
Add a star (*) to allow any number of repetitions, or use + if you want at least one.
Finally, you need to anchor your regex at the beginning and at the end, else it would match stuff like A123XXXXX or XXXXA123.
Note that some implementations of match (Python's re.match, for example) already anchor the pattern at the beginning, so there you can omit the caret.
Final regex:
^.\d*$
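How much anchoring you get for free depends on the library. As one example, in Python re.search does not anchor at all, re.match anchors only at the start, and re.fullmatch anchors at both ends. A minimal sketch (first_then_digits is an illustrative name):

```python
import re


def first_then_digits(text: str) -> bool:
    """True if text is any one character followed only by digits."""
    # fullmatch anchors at both ends, so no ^ or $ is needed here.
    return re.fullmatch(r'.\d*', text) is not None


for s in ["A123", "A12B", "A123XXXXX", "XXXXA123"]:
    print(f"{s} is {'valid' if first_then_digits(s) else 'invalid'}")

# Without end anchoring, match accepts a valid prefix of an invalid string.
print(re.match(r'.\d*', "A123XXXXX") is not None)   # True
# Without start anchoring, search accepts a valid suffix.
print(re.search(r'.\d*$', "XXXXA123") is not None)  # True
```

Only A123 is reported as valid by first_then_digits; the last two lines show why both anchors matter.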
Maybe
(?<=.{1,1})([0-9]+)(?=\s)
(?<=.{1,1}) - has exactly one character before
([0-9]+) - at least one digit
(?=\s) - has a whitespace after
Add ^ at the beginning - to specify beginning of line
Replace (?=\s) with $ for end of line
^[a-zA-Z][0-9]{3}$
^ - "starting with" (here, starting with any letter). Read it together with the next part, as ^[a-zA-Z]
[a-zA-Z] - any lowercase letter (a-z) or uppercase letter (A-Z); change the ranges if required
[0-9] - any digit
{3} - how many digits to require. Read it as [0-9]{3}
$ - end of the string (so in this case the string must end right after the 3 digits)
Here you can play around - https://regex101.com/r/mqUHvP/5

Exclude double characters in a string

It's probably simple to do, but I'm stuck on it.
I have a list of random strings, each 20 characters long and containing only capital letters and digits. For example:
NC6DGL2L41ADTXEP20UP
F3KB7UXUBD5089BKANOY
A5P3UI57KW18UNF89AKL
6O36RJHDLNXW8Y1O1GBC
6CVAT6LTAHEKDRCB9KNH
K20L4MQRA5C677P2NNV8
726WYBOO0X7UTFMSN6VT
AYBECMW9AVJX9AX5F1ZZ
HWKWU0BEIWLHZZJYKDC1
TXLF9FYNIVZ7SHR92ZIH
My goal is to select only those that don't contain the same character twice in a row, like this one:
F3KB7UXUBD5089BKANOY
I don't want strings like this, because the character N appears several times in a row.
NC6NNNN41ADTXEP20UP
(?!^.*([A-Z0-9])\1.*$)^[A-Z0-9]+$
See the demo
Negative Lookahead to make sure that 2 of the same characters do not sit together
(Edited to increase performance, see the other version through the demo link, v1 of the regex).
Breakdown of the regex:
(?! - start of the negative lookahead
^ - from the start of the string
.* - any character, any amount of times
([A-Z0-9]) - capture a character in the ranges given
\1 - the same character again, as matched by the first capture group
.*$ any character, any amount of times until the end of the string
) close negative lookahead
This section therefore means, outside of this, do not match anything that from start to finish contains 2 of the same character (in the ranges A-Z and 0-9) sitting together.
^ - from the start of the string
[A-Z0-9]+ - a character in the ranges given, one or more times
$ - until the end
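A sketch of this pattern applied in Python to a few strings from the question (has_no_repeats is an illustrative name):

```python
import re

# Reject any string in which some character is immediately followed by
# itself, then require capital letters and digits only.
NO_REPEATS = re.compile(r'(?!^.*([A-Z0-9])\1.*$)^[A-Z0-9]+$')


def has_no_repeats(text: str) -> bool:
    return NO_REPEATS.match(text) is not None


candidates = [
    "F3KB7UXUBD5089BKANOY",  # no doubled character
    "NC6NNNN41ADTXEP20UP",   # run of N
    "K20L4MQRA5C677P2NNV8",  # 77 and NN
    "AYBECMW9AVJX9AX5F1ZZ",  # ZZ
]

print([s for s in candidates if has_no_repeats(s)])
# ['F3KB7UXUBD5089BKANOY']
```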