optionals in a look around regular expressions


I'm trying to figure out how to put optional values in a lookaround with regular expressions.
These values should match
3
1000
15-20
2048-4096/100
This value should not
3/4
I want to say in regex: "only match if there is a dash or a colon and a 4-digit number preceding the / division symbol".
For example:
-9999 preceding the / division symbol should match
9999/ should not match because there is no -
-/ should not match because there is no number
^[^0][0-9]*(-|:)?([0-9]*)?(?<=[0-9])(\/)?([0-9]*)$
I have the look around just looking for a preceding number but if I put a ? or * in it it no longer works. Thanks for the help!!!

^\d+(?:[-:](?:\d{4}\/\d+|\d+))?$
If I'm understanding what you want correctly:
\d+ Starts with some number
(?: ...)? Followed by an optional group that begins with a dash or colon
\d{4}\/\d+ Inside the optional group: either a 4-digit number, a slash, and another number
\d+ Or any number, as long as no slash follows it
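A quick way to check the pattern against the values from the question is a small Python sketch. The names `PATTERN`, `is_valid`, `should_match` and `should_fail` are only illustrative, and the extra failing inputs `9999/100` and `-/100` are assumed stand-ins for the "9999/" and "-/" cases described above.

```python
import re

# Pattern from the answer above; the slash needs no escaping in Python.
PATTERN = re.compile(r"^\d+(?:[-:](?:\d{4}/\d+|\d+))?$")


def is_valid(value: str) -> bool:
    """Return True when the whole string fits the pattern."""
    return PATTERN.match(value) is not None


# Values taken from the question.
should_match = ["3", "1000", "15-20", "2048-4096/100"]
# "3/4" is from the question; the other two are assumed test inputs.
should_fail = ["3/4", "9999/100", "-/100"]

for value in should_match + should_fail:
    print(f"{value!r}: {is_valid(value)}")
```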

Related

Is there a way to use Regex to capture numbers out of a string based on a specific leading letters?

I need to extract any number of 4-10 digits that follows directly after 'PO#' OR 'PO# ' (with a whitespace). I do not want to include the PO# in the value that is extracted, but I do need it as a criterion to target the value within a string. If the number has fewer than 4 or more than 10 digits, I do not wish to capture the value and would like to ignore it.
A sample string would look like this:
PO#12445 for Vendor Enterprise
or
Invoice# 21412556 for Vendor Enterprise for PO# 12445
My current RegEX expression captures PO# with '#' and I use additional logic after the fact to remove the '#', however my expression is also capturing Invoice# and Inv# which I don't want it to do. I'd like it to only target PO#.
Current Expression: [P][O][#]\s*[0-9]{3,9}\d+\w
Any help would be greatly appreciated!

If you need only the digits, you can use (?<=\bPO#)\s?(\d{4,10})\b, with:
(?<=\bPO#): positive lookbehind, makes sure that PO# comes right before the needed pattern; the \b inside it prevents 'SPO#' from matching
\s?: 0 or 1 whitespace character
(\d{4,10}): between 4 and 10 digits, captured in group 1
\b: word boundary, so that e.g. the first 10 digits of an 11-digit number do not match
Edit: Alexander Mashin is right about the lookbehind having to be fixed width, so (?<=\bPO#)\s?(\d{4,10})\b is better https://regex101.com/r/1KBQd1/5
Edit: added word boundaries

You can use a capturing group and repeat matching the digits 4-10 times using [0-9]{4,10}.
Note that [P][O][#] is the same as PO#
\bPO#\s*([0-9]{4,10})\b
\bPO#\s* Match PO# preceded by a word boundary and match 0+ whitespace chars
( Capture group 1
[0-9]{4,10} Match 4 - 10 digits
)\b Close group followed by a word boundary to prevent the match being part of a larger word
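As a rough check of the capturing-group approach, here is a minimal Python sketch. `PO_NUMBER` and `extract_po_numbers` are hypothetical names; the first two sample strings come from the question and the remaining ones are assumed edge cases.

```python
import re

# Capturing-group pattern from the answer above.
PO_NUMBER = re.compile(r"\bPO#\s*([0-9]{4,10})\b")


def extract_po_numbers(text: str) -> list[str]:
    """Return only the digits (group 1) of every PO# reference in text."""
    return PO_NUMBER.findall(text)


samples = [
    "PO#12445 for Vendor Enterprise",
    "Invoice# 21412556 for Vendor Enterprise for PO# 12445",
    "PO#123 is too short",           # assumed edge case: fewer than 4 digits
    "PO#12345678901 is too long",    # assumed edge case: 11 digits
    "SPO#12445 has a different prefix",
]

for sample in samples:
    print(extract_po_numbers(sample))
```

Because the pattern has exactly one capturing group, `findall` returns the group contents rather than the whole match, so no post-processing is needed to strip `PO#`.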

If PCRE is available, how about:
PO#\s*\K\d{4,10}(?=\D|$)
PO#\s* matches the leading substring "PO#" followed by 0 or more whitespaces.
\K resets the starting position of the match and works as a positive (zero length) lookbehind.
\d{4,10} matches a sequence of digits of 4 <= length <= 10.
(?=\D|$) is the positive lookahead to match a non-digit character or the end of the string.

RegEx for matching digits and one dot with quantifier

I have a specific pattern I'm trying to get. The pattern I'm looking for is the following: up to 13 digits with an optional dot, for a total length of min 3 and max 13 characters (including the dot if present), ending with "/" and a number from 1 to 6.
for now I have this pattern
^(\d*|\d*\.?\d*)\/[1-6]$
but this matches 1234/1 or 123456.890123456778/2
but it's not what I need
I tried a few things but I think I'm missing something
^(\d*|\d*\.?\d*){3-13}\/[1-6]$
Possible match:
1.3/1
123456./2
123456.890123/3
1234567890123/4
123/5
How do I solve this problem?

Your wording is a little confusing, but if I understood you correctly, you can use this regex:
^(?=.{5,15}$)\d+\.?\d*\/[1-6]$
Explanation:
^ - Start of string
(?=.{5,15}$) - This positive lookahead ensures that the total length is at least 5 and at most 15 (the 3 to 13 characters plus two for the final slash and digit)
\d+\.?\d* - Matches one or more digits, followed by an optional dot . and then zero or more digits
\/[1-6] - Matches a slash and a single digit from 1 to 6
$ - End of string
Let me know if this works fine for you else list the case for which it doesn't work.
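The lookahead-as-length-check idea can be tried out with a short Python sketch. `PATTERN`, `candidates` and `accepted` are illustrative names; the first five inputs are the "possible match" values from the question, and the last three are assumed counter-examples.

```python
import re

# Length is enforced by the lookahead, shape by the rest of the pattern.
PATTERN = re.compile(r"^(?=.{5,15}$)\d+\.?\d*/[1-6]$")

candidates = [
    "1.3/1",
    "123456./2",
    "123456.890123/3",
    "1234567890123/4",
    "123/5",
    "12/1",                    # assumed: too short (2 characters before the slash)
    "123456.890123456778/2",   # too long, mentioned in the question
    "123/7",                   # assumed: trailing digit outside 1-6
]

accepted = [c for c in candidates if PATTERN.match(c)]
print(accepted)
```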

Regex for 9-digit phone number dot-separated

I would like to check if a phone number contains exactly 3 digits - dot - 3 digits - dot - 3 digits. (e.g. 123.456.789)
So far I have this, but it doesn't work:
^(\d{3}\){2}\d{4}$

Note that an escaped bracket \) loses its special meaning in regex and the pattern becomes invalid since the capturing group is not closed.
If you want to match a dot with a regex, you need to include it in your pattern, and if you say 3 digits must be at the end, there is no point in declaring 4 digits with \d{4}.
^(\d{3}\.){2}\d{3}$
        ^       ^
or if we expand the first group:
^\d{3}\.\d{3}\.\d{3}$
So the whole fix consists of adding a dot after the second backslash and adjusting the final limiting quantifier.
Note that for mostly "stylistics" concerns (since efficiency gain is insignificant) I'd use a non-capturing group with the first regex variant:
^(?:\d{3}\.){2}\d{3}$
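A small Python sketch of the non-capturing variant, with a few assumed inputs besides the `123.456.789` example from the question. `PHONE` and `is_phone` are hypothetical names.

```python
import re

# Two "3 digits + dot" groups followed by a final group of 3 digits.
PHONE = re.compile(r"^(?:\d{3}\.){2}\d{3}$")


def is_phone(value: str) -> bool:
    """Return True for strings shaped like 123.456.789."""
    return PHONE.match(value) is not None


for value in ["123.456.789", "123.456.7890", "123456789", "123-456-789"]:
    print(value, is_phone(value))
```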

Why does this not match my example?

As I go through the regex101 quiz/lessons, I am supposed to match an IP address (without leading zeros).
Now the following
^[^0]+[0-9]+\\.[^0]+[0-9]+\\.[^0]+[0-9]+\\.[^0]+[0-9]+$
matches 23.34.7433.33
but fails to match single digit numbers like 1.2.3.4
Why is this so, when my + is supposed to match "1 to infinite" times...?

You are in fact requiring at least 2 characters for each number in the IP address because you have:
[^0]+[0-9]+
[^0]+ matches at least one character, and [0-9]+ matches at least 1 character. Together they need at least 2 characters (characters being in scope of the character classes).
Also, 23.34.7433.3 doesn't match your regex for the reason I stated above.
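The two-character minimum is easy to see in a short Python sketch. It assumes the doubled backslashes in the question come from a string literal, so the raw-string version below uses a single `\.`; `ORIGINAL` and `results` are illustrative names, and the last input is an assumed example.

```python
import re

# The question's pattern, written as a Python raw string (assumption: "\\."
# in the question stands for an escaped literal dot).
ORIGINAL = re.compile(r"^[^0]+[0-9]+\.[^0]+[0-9]+\.[^0]+[0-9]+\.[^0]+[0-9]+$")

inputs = [
    "23.34.7433.33",        # every part has at least 2 characters
    "1.2.3.4",              # single-character parts
    "23.34.7433.3",         # last part has only 1 character
    "ab12.cd34.ef56.gh78",  # [^0] accepts any character except "0", letters included
]

results = {text: bool(ORIGINAL.match(text)) for text in inputs}
for text, matched in results.items():
    print(text, matched)
```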
And you might try this regex for the purpose you stated:
^(?:[1-9][0-9]{0,2}\.){3}[1-9][0-9]{0,2}$
[1-9][0-9]{0,2} will match up to 3 digits, with a non leading 0.
EDIT: You mentioned in a comment that 0.0.0.0 (single digit zeroes) are to be accepted as well. The modified regex from above would be:
^(?:(?:[1-9][0-9]{0,2}|0)\.){3}(?:[1-9][0-9]{0,2}|0)$
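Here is a minimal Python check of the modified regex. `NO_LEADING_ZERO` and `checks` are illustrative names and the inputs are assumed examples. Note that this pattern only constrains the shape of each part, not its numeric range.

```python
import re

# Each part is either a lone 0 or 1-3 digits that do not start with 0.
NO_LEADING_ZERO = re.compile(r"^(?:(?:[1-9][0-9]{0,2}|0)\.){3}(?:[1-9][0-9]{0,2}|0)$")

checks = {
    "1.2.3.4": True,
    "0.0.0.0": True,
    "23.34.7433.33": False,    # 4-digit part
    "01.2.3.4": False,         # leading zero
    "999.999.999.999": True,   # shape is fine, range is not checked
}

for text, expected in checks.items():
    print(text, bool(NO_LEADING_ZERO.match(text)), expected)
```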

Assuming you want to check an IPv4 address, I suggest this pattern:
^(?<nb>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])(?>\.\g<nb>){3}$
I have defined a named subpattern nb to make the pattern shorter, but if you prefer, you can rewrite all and replace \g<nb>:
^(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])(?>\.(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])){3}$
Numbers greater than 255 are not allowed.
pattern details:
The goal is to describe what is allowed:
numbers with 3 digits that begin with "2" can be followed by a digit in [0-4] and a digit in [0-9] OR by 5 and a digit in [0-5], because the number can't exceed 255.
numbers with 3 digits that begin with "1" can be followed by any two digits.
any number with 2 digits that doesn't begin with "0"
any number with 1 digit (zero included)
Adding these rules one by one, I obtain:
2(?>[0-4][0-9]|5[0-5])
2(?>[0-4][0-9]|5[0-5]) | 1[0-9]{2}
2(?>[0-4][0-9]|5[0-5]) | 1[0-9]{2} | [1-9][0-9]
2(?>[0-4][0-9]|5[0-5]) | 1[0-9]{2} | [1-9][0-9] | [0-9]
Now I have a definition for allowed numbers. I can reduce the size of the pattern a little by replacing [1-9][0-9] | [0-9] with [1-9]?[0-9]
Then you only have to add the dots and repeat the subpattern four times: x.x.x.x
But since there are only three dots, I write the first number and then repeat 3 times a group that contains a dot and a number:
2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9] # the first number
(?>\.(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])){3} # the group repeated 3 times
To be sure that the string doesn't contain anything other than the IP I described, I add anchors for the start of string ^ and for the end of string $, so that the string begins and ends with the IP.
To reduce the size of a pattern you can define a named group, which lets you reuse the subpattern it contains.
Then you can rewrite the pattern like this:
^
(?<nb> 2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9] ) # named group definition
(?> \. \g<nb> ){3} # \g<nb> is the reference to the subpattern named nb
$
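The same construction can be sketched in Python. Python's built-in `re` module has no `\g<nb>` subroutine call, so this sketch reuses the subpattern through string formatting instead, and it swaps the atomic groups for plain non-capturing groups so that it also runs on older Python versions. `OCTET`, `IPV4` and `is_ipv4` are hypothetical names, not part of the answer above.

```python
import re

# One number from 0 to 255 without leading zeros, following the rules above.
OCTET = r"(?:2(?:[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])"

# First number, then three repetitions of "dot + number".
# {{3}} in the f-string becomes the quantifier {3}.
IPV4 = re.compile(rf"^{OCTET}(?:\.{OCTET}){{3}}$")


def is_ipv4(text: str) -> bool:
    """Return True when text is a dotted quad with every part in 0-255."""
    return IPV4.match(text) is not None


for text in ["192.168.0.1", "255.255.255.255", "0.0.0.0", "256.1.1.1", "01.1.1.1", "1.2.3"]:
    print(text, is_ipv4(text))
```

Building the pattern from a named Python string plays the role of the named group: the octet rule is written once and substituted where needed.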

[0-9]+ should be [0-9]*
* matches 0 or more.
+ matches 1 or more.
You already have [^0], which is actually wrong because it will also match letters.
Besides that, it matches the first character that's NOT zero and then requires at least one digit after it.
It should be written as
[1-9][0-9]*
This checks that the first character is a digit between 1 and 9, and that whatever follows it (zero or more characters) consists of digits 0-9.
That comes out to:
^[1-9][0-9]*\.[1-9][0-9]*\.[1-9][0-9]*\.[1-9][0-9]*$
Cleaning it up:
^(?:[1-9][0-9]*\.){3}[1-9][0-9]*$
To also accept single zeros, this should work:
^(?:[1-9][0-9]*|0)\.(?:[1-9][0-9]*|0)\.(?:[1-9][0-9]*|0)\.(?:[1-9][0-9]*|0)$
Cleaned up:
^(?:(?:[1-9][0-9]*|0)\.){3}(?:[1-9][0-9]*|0)$

Your regex would match ABCDEFG999.FOOBSR888 etc., because [^0] is any character other than a zero, and both character classes are required by the +.
I think you want this:
^[1-9]\d*(\.[1-9]\d*){3}$
Having replaced various verbose expressions with their equivalents, this is 4 groups of digits, each starting with a non-zero digit.
Actually the problem is far more complicated, because your approach (once corrected) allows 999.999.999.999, which is not a valid IP.

It might be because your pattern needs at least two characters between two dots '.'
Try using this pattern: ^[^0]+[0-9]*\.[^0]+[0-9]*\.[^0]+[0-9]*\.[^0]+[0-9]*$

To match an IP address you should use this pattern:
\b(?:\d{1,3}\.){3}\d{1,3}\b
Taken from here:
http://www.regular-expressions.info/examples.html

Regex expressions for matching comparisons

Is it possible to create a regular expression that matches a comparison such as less than or greater than? For example, match all dollar values less than $500.
One way I would use this would be on online stores that list many products on a single page but do not provide a way to sort by price. I found a search page by regex extension for Chrome and am trying to figure out if there is a way I can use a regex to match any strings on the page beginning with a dollar sign followed by any number less than a number that I specify.

This should work for you: \$[1-4]?\d?\d\b
Explanation:
r"""
\$ # Match the character “$” literally
[1-4] # Match a single character in the range between “1” and “4”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
\b # Assert position at a word boundary
"""

This could do what you need: ^(\$[1-4]?\d?\d)$. This will match any whole-dollar value from $0 to $499.
As mentioned above, if you would like to match decimal values as well, you could use something like this: ^(\$[1-4]?\d?\d(\.\d{2})?)$. That being said, numeric validation should ideally be done using actual mathematical operations, and not regular expressions.
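One way to follow that advice is to let the regex only find the prices and do the comparison numerically. The sketch below assumes prices are written like $12 or $12.50 with no thousands separators; `PRICE`, `prices_below` and the sample text are hypothetical.

```python
import re
from decimal import Decimal

# Find a dollar sign followed by digits and optional cents; capture the number.
PRICE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def prices_below(text: str, limit: Decimal) -> list[Decimal]:
    """Return every price found in text that is strictly below limit."""
    found = (Decimal(number) for number in PRICE.findall(text))
    return [price for price in found if price < limit]


text = "Mug $12.50, lamp $499.99, desk $500, sofa $1200"
print(prices_below(text, Decimal("500")))
```

With this split, changing the threshold is a matter of passing a different `limit` rather than rewriting the pattern.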

Edit: this is overly complicated, but it will also match any value strictly less than 500
\$[1-4]\d{2}(\.\d{2})?$|\$\d{1,2}(\.\d{2})?$
if you need to match $500 as well, add another |\$500(\.00)?$
This matches:
\$ the dollar symbol
[1-4] followed by one digit between 1 and 4
\d{2} followed by exactly 2 digits
(\.\d{2})? optionally --> ()? followed by a dot --> \. and exactly 2 digits
$ followed by end of line (may be replaced with \b for word boundaries)
| or
\$\d{1,2} the dollar symbol followed by one or two digits
(\.\d{2})?$ again optionally followed by cents, followed by end of line
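A brief Python sketch of this two-branch pattern, applied with `search` to one price per string. `BELOW_500`, `is_below_500` and the sample values are illustrative assumptions.

```python
import re

# Branch 1: $100-$499 with optional cents. Branch 2: $0-$99 with optional cents.
BELOW_500 = re.compile(r"\$[1-4]\d{2}(\.\d{2})?$|\$\d{1,2}(\.\d{2})?$")


def is_below_500(price: str) -> bool:
    """Return True when the string ends with a dollar amount under 500."""
    return BELOW_500.search(price) is not None


for price in ["$5", "$99", "$499.99", "$500", "$500.00", "$1200"]:
    print(price, is_below_500(price))
```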
