Regex lookahead part of group accepted


I'm using regex in PowerShell 5.1.
I need it to detect groups of numbers but ignore groups followed or preceded by /, so from this string it should detect only 9876.
[regex]::matches('9876 1234/56','(?<!/)([0-9]{1,}(?!(\/[0-9])))').value
As it is now, the result is:
9876
123
6
Another example: "13 17 10/20" should only match 13 and 17.
I tried using something like (?!(\/([0-9]{1,}))), but it did not help.

You may use
\b(?<!/)[0-9]+\b(?!/[0-9])
Alternatively, if the numbers can be glued to text:
(?<![/0-9])[0-9]+(?!/?[0-9])
The first pattern relies on word boundaries (\b), which make sure there are no letters, digits or underscores right before and after the match. The second one just makes sure there is no digit or / on either end of the match.
Details
(?<![/0-9]) - a negative lookbehind making sure there is no digit or / immediately to the left of the current location
[0-9]+ - one or more digits
(?!/?[0-9]) - a negative lookahead making sure there is no digit, optionally preceded by /, immediately to the right of the current location.
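Here is a quick sketch for checking both patterns against the examples from the question. It uses Python's re module only because it is easy to run; the lookbehinds are fixed-width, so .NET (and therefore PowerShell) accepts the same syntax. The third sample, abc123def 45/67, is an invented example of a number glued to text.

```python
import re

# Answer's first pattern: word boundaries, no "/" before, no "/digit" after.
WORD_BOUNDARY = re.compile(r"\b(?<!/)[0-9]+\b(?!/[0-9])")
# Answer's second pattern: no digit or "/" on either side of the number.
GLUED_TO_TEXT = re.compile(r"(?<![/0-9])[0-9]+(?!/?[0-9])")

for sample in ["9876 1234/56", "13 17 10/20", "abc123def 45/67"]:
    print(sample, WORD_BOUNDARY.findall(sample), GLUED_TO_TEXT.findall(sample))
```

Both patterns return ['9876'] and ['13', '17'] for the question's strings; for abc123def 45/67 only the second pattern finds 123, because there is no word boundary between c and 1. In PowerShell 5.1 the equivalent call is [regex]::Matches('9876 1234/56', '\b(?<!/)[0-9]+\b(?!/[0-9])').Value.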

Related

Regex should find substring that does not start with /

I want to find version numbers in a long string.
Example: "Hellothisisastring 12.3 blabla"
I need to find a substring that is a version number but does not start with "/".
Example: "Hellothisisastring /12.3 blabla"
shouldn't match.
I have already built the following regex:
[0-9]+.[0-9]
How can I make sure the version number is not preceded by "/"? The problem is that it is not at the beginning of the string. I already tried a negative lookahead:
(?!/)[0-9]+.[0-9] still matches when there is a slash before it.
Thanks for any help :)

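You need a lookbehind rather than a lookahead: (?!/) checks the character to the right of the current position, which is the first digit, so it never sees a slash on the left. The lookbehind must also exclude digits so that the match cannot start in the middle of a number, such as at the 2 in /12.3. Also escape the dot, since an unescaped . matches any character: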
(?<![\d\/])[0-9]+\.[0-9]+
To match versions with any number of dot-separated parts (e.g. 1.2.3), use
(?<![\d\/])[0-9]+(?:\.[0-9]+)+
Details:
(?<![\d\/]) - a negative lookbehind that fails the match if there is a digit or / immediately to the left of the current location
[0-9]+ - one or more digits
(?:\.[0-9]+)+ - one or more sequences of a . and one or more digits.
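A quick Python check of the answer's patterns against the question's two examples follows (the string v 1.2.3 and /4.5 is a made-up extra example). It also reproduces why the lookahead attempt fails: the match can simply start at the 1, where the character to the right is not a slash.

```python
import re

SINGLE = re.compile(r"(?<![\d\/])[0-9]+\.[0-9]+")        # e.g. 12.3
MULTI = re.compile(r"(?<![\d\/])[0-9]+(?:\.[0-9]+)+")    # e.g. 1.2.3
NAIVE = re.compile(r"(?!/)[0-9]+.[0-9]")                 # the question's attempt

ok = "Hellothisisastring 12.3 blabla"
bad = "Hellothisisastring /12.3 blabla"

print(SINGLE.findall(ok), SINGLE.findall(bad))  # ['12.3'] []
print(NAIVE.findall(bad))  # ['12.3']: at the "1", the next char is not "/"
print(MULTI.findall("v 1.2.3 and /4.5"))        # ['1.2.3']
```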

How to match a 10-digit phone number that may or may not have a 2- or 3-digit country code (the country code is not to be matched)

Example string
fgcfghhfghfgch1234567890fghfghfgh fhghghfgh+916546546165fghfghfghfgh fhfghfghfghfgh+915869327425ghfghfghfgh
I want to match
1234567890
6546546165
5869327425
In essence, I would like to do something like (?<=\+\d{2})?\d{10},
that is, match 10 digits (\d{10}) that may optionally (?) follow a country code in the format \+\d{2}.
What would be a correct regular expression to do this?
Also, what should I do if the country code can be 3 digits long?
For example, from
+917458963214
+0047854123698
I want to match 7458963214 and 7854123698.

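Your expected matches all appear to be immediately followed by a character other than a digit.
I suggest making the pattern inside the positive lookbehind optional and adding a (?!\d) lookahead at the end. The lookahead fails the match if the ten digits are immediately followed by a digit, so the engine keeps moving the start of the match to the right until the ten digits are the last ones in the run; the optional lookbehind itself can always match an empty string. Note that this lookbehind is variable-width, which not every regex flavor supports: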
(?<=(?:\+\d{2,3})?)\d{10}(?!\d)
Details:
(?<=(?:\+\d{2,3})?) - a positive lookbehind that requires + and two or three digits or an empty string immediately to the left of the current location
\d{10} - ten digits
(?!\d) - no digit allowed immediately on the right.
However, since in nearly all environments you can access captured substrings, it is simpler to use a capturing group:
(?:\+\d{2,3})?(\d{10})
Your values are in Group 1.
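A minimal Python sketch of the capturing-group approach, run on the question's sample strings. Python's built-in re module is an assumption here; it is also a handy example of a flavor that rejects the variable-width lookbehind version.

```python
import re

# Optional "+" plus a 2-3 digit country code (not captured), then the
# 10-digit number in group 1.
PHONE = re.compile(r"(?:\+\d{2,3})?(\d{10})")

text = ("fgcfghhfghfgch1234567890fghfghfgh fhghghfgh+916546546165fghfghfghfgh "
        "fhfghfghfghfgh+915869327425ghfghfghfgh")

# With exactly one group, findall returns the group's text, not the whole match.
print(PHONE.findall(text))  # ['1234567890', '6546546165', '5869327425']
print(PHONE.findall("+917458963214 +0047854123698"))  # ['7458963214', '7854123698']

# Python's built-in re only allows fixed-width lookbehinds, so the
# lookbehind-based pattern cannot be compiled here.
try:
    re.compile(r"(?<=(?:\+\d{2,3})?)\d{10}(?!\d)")
except re.error as exc:
    print("re:", exc)
```

For +917458963214 the engine first tries a 3-digit code (917), finds only 9 digits left, and backtracks to the 2-digit code 91, which leaves exactly 10 digits for the group.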

Regex Exclude Number Within Two Characters of Number

I have some manually entered data (it's an email subject), and I am trying to extract the correct ID to perform a series of actions with RPA on.
RE:'HC=312-822-281' abc2-1234567 7354612
I have a regex query:
(?<!\d)\d{7}(?!\d)
I want to extract 7354612 but not 1234567.
I want to avoid matching any 7-digit number that is preceded with a hyphen, or a hyphen and a space.
My initial query works 80% of the time, but this hyphen issue is interfering with the other 20%.

You can modify the existing (?<!\d) lookbehind to also exclude the position after a hyphen, i.e. (?<![\d-]), and add another lookbehind to exclude the hyphen + space context ((?<!- ) or (?<!-\s)):
(?<![\d-])(?<!- )\d{7}(?!\d)
(?<![\d-])(?<!-\s)\d{7}(?!\d)
Note that \s matches any whitespace character, not just a space.
Details
(?<![\d-]) - a negative lookbehind that fails the match if there is a digit or a hyphen immediately to the left of the current location
(?<!-\s) - a negative lookbehind that fails the match if there is a - followed by a whitespace character immediately to the left of the current location
\d{7} - any seven digits
(?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location.
Variations
With PCRE regex, you may also use
-\s*\d{7}(?!\d)(*SKIP)(*F)|(?<!\d)\d{7}(?!\d)
Here, -\s*\d{7}(?!\d)(*SKIP)(*F)| matches a hyphen, zero or more whitespace characters and seven digits not followed by another digit; (*SKIP)(*F) then discards that match and makes the engine resume after it, so only matches of the (?<!\d)\d{7}(?!\d) alternative are returned.
In .NET, modern JavaScript and the PyPI regex module for Python, you may use
(?<!\d|-\s*)\d{7}(?!\d)
Here, the (?<!\d|-\s*) negative lookbehind fails the match if there is a digit, or a hyphen followed by zero or more whitespace characters, immediately to the left of the current position.
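A small Python sketch of the fixed-width variant, which the built-in re module accepts because each lookbehind has a fixed length. The second test string, abc2- 1234567 7354612, is an invented example with a hyphen and a space. The sketch also confirms that the variable-width (?<!\d|-\s*) form is refused by the built-in module.

```python
import re

# Fixed-width lookbehinds: one char ([\d-]) and two chars (-\s).
ID_PATTERN = re.compile(r"(?<![\d-])(?<!-\s)\d{7}(?!\d)")

subject = "RE:'HC=312-822-281' abc2-1234567 7354612"
print(ID_PATTERN.findall(subject))                  # ['7354612']
print(ID_PATTERN.findall("abc2- 1234567 7354612"))  # ['7354612']

# The variable-width version needs .NET, modern JavaScript or the PyPI
# "regex" package; Python's built-in re refuses to compile it.
try:
    re.compile(r"(?<!\d|-\s*)\d{7}(?!\d)")
except re.error as exc:
    print("re:", exc)
```

With the fixed-width pattern, a hyphen followed by two or more spaces is not excluded; that case is what the \s* in the variable-width lookbehind covers.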

Regex: extract the longest number with x to y digits

I have a URL formatted as follows: https://www.mywebsite.com/subdomain/123456789.htm. I know that the page number always has exactly 9 or 10 digits. I would like to extract this number using a regex.
The regex I use for this is:
^https://www.mywebsite.com/[A-Za-z0-9_.-~/]+([0-9]{9,10}).htm$
The problem is that when the number is 10 digits long, I get a match, which is good, but only the last 9 digits are captured. For example, https://www.mywebsite.com/subdomain/1234567890.htm captures only 234567890.
I could easily create two regexes (one with 9 digits and one with 10) and take the longer number if both match, but is there a more elegant way to solve this with a single regex?
EDIT
Following the remarks below, there is actually a mistake in my original regex: the character class before the group matches the first of the 10 digits, leaving only the other 9 for the capturing group. Adding a forward slash before the capturing group solved the issue, thanks!

As @TheFourthBird pointed out, you are missing a match on the forward slash. A slightly different approach is to match whole path segments with a non-capturing group:
^https://www\.mywebsite\.com/(?:[^/]+/)+(\d{9,10})\.htm$

The character class [A-Za-z0-9_.-~/]+ matches all the characters that follow, up to the end of the line.
The part ([0-9]{9,10}). then makes the engine backtrack, giving characters back one at a time; it succeeds as soon as 9 digits are available, so the capturing group gets only the last 9 digits.
Note: either escape the hyphen (\-) or place it at the start or end of the character class; otherwise it can denote a range. Here .-~ is the range from . to ~, which already includes all digits, all letters and /.
One option is to use a word boundary \b before matching the digits:
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+\b([0-9]{9,10})\.htm$
Another way is to match the / right before the digits:
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+/([0-9]{9,10})\.htm$
If there can also be letters or an underscore right before the digits, and your flavor supports lookbehinds, you could instead assert that there is no digit before them with (?<!\d):
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+(?<!\d)([0-9]{9,10})\.htm$
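To see the difference, here is a quick Python sketch comparing the question's original pattern with the three variants above on the example URL. The example URLs come from the question; Python's re is just a convenient flavor that supports all three constructs.

```python
import re

URL = "https://www.mywebsite.com/subdomain/1234567890.htm"

# Question's pattern: the greedy class (with the accidental .-~ range)
# keeps the first digit, so only 9 digits end up in group 1.
original = re.compile(r"^https://www.mywebsite.com/[A-Za-z0-9_.-~/]+([0-9]{9,10}).htm$")

# The variants above: word boundary, explicit "/", or a digit lookbehind.
fixed = [
    re.compile(r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+\b([0-9]{9,10})\.htm$"),
    re.compile(r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+/([0-9]{9,10})\.htm$"),
    re.compile(r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+(?<!\d)([0-9]{9,10})\.htm$"),
]

print(original.match(URL).group(1))            # 234567890
print([p.match(URL).group(1) for p in fixed])  # '1234567890' three times
```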

One more approach: this captures the run of digits immediately before .htm, without checking the rest of the URL or the 9 to 10 digit length:
(\d+)(?=\.htm)

Regex for 5 digit number with optional characters

I am trying to create a regex to validate a field where the user can enter a 5-digit number, optionally followed by a / and 3 letters. I have tried quite a few variations of the following code:
^(\d{5})+?([/]+[A-Z]{1,3})?
But I just can't seem to get what I want.
For instance, I would like the user to enter a 5-digit number such as 12345, optionally followed by a forward slash and any 3 letters, such as 12345/WFE.

You probably want:
^\d{5}(?:/[A-Z]{3})?$
You might have to escape that forward slash depending on your regex flavor.
Explanation:
^ - start of string anchor
\d{5} - 5 digits
(?:/[A-Z]{3}) - non-capturing group consisting of a literal / followed by 3 uppercase letters (depending on your needs you could consider making this a capturing group by removing the ?:).
? - 0 or 1 of what precedes (in this case that's the non-capturing group directly above).
$ - end of string anchor

You can use this regex
/^\d{5}(?:\/[a-zA-Z]{3})?$/

^\d{5}(?:/[A-Z]{3})?$
Here it is in practice (this is a great site to test your regexes):
http://regexr.com?36h9m
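To test the same pattern locally instead of on a website, here is a minimal Python sketch. The list of test values is invented for illustration. It uses re.fullmatch in place of ^...$, since in Python $ also matches before a trailing newline, and re.ASCII so that \d means only 0-9.

```python
import re

# fullmatch anchors at both ends; re.ASCII restricts \d to 0-9.
FIELD = re.compile(r"\d{5}(?:/[A-Z]{3})?", re.ASCII)

def is_valid(value: str) -> bool:
    """Return True for a 5-digit number, optionally followed by / and 3 capitals."""
    return FIELD.fullmatch(value) is not None

for value in ["12345", "12345/WFE", "1234", "12345/WF", "12345/wfe", "123456"]:
    print(value, is_valid(value))
```

Only the first two values are accepted; add re.IGNORECASE if lowercase letters such as 12345/wfe should also pass.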

^(\d{5})(\/[A-Z]{3})?$
Tested in Rubular.
