I want to find version numbers in a long string.
Example: "Hellothisisastring 12.3 blabla"
I need to find a substring that is a version number but does not start with "/".
Example: "Hellothisisastring /12.3 blabla"
shouldn't match.
I have already built the following regex:
[0-9]+.[0-9]
How can I check that the version number is not preceded by "/"? The problem is that it is not at the beginning of the string. I already tried a negative lookahead, but
(?!/)[0-9]+.[0-9] still matches when there is a slash before the number.
Thanks for any help :)
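A lookahead such as (?!/) checks the text to the right of the current position, so it never sees a slash on the left. You need a lookbehind, and it should also include a digit so that the match cannot start in the middle of a number (for example at the 2 in /12.3). The dot also has to be escaped to match a literal .: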
(?<![\d\/])[0-9]+\.[0-9]+
See the regex demo.
To also match versions with more than one dot-and-digits part (such as 1.2.3), use
(?<![\d\/])[0-9]+(?:\.[0-9]+)+
See this regex demo. Details:
(?<![\d\/]) - a negative lookbehind that fails the match if there is a digit or / immediately to the left of the current location
[0-9]+ - one or more digits
(?:\.[0-9]+)+ - one or more sequences of a . and one or more digits.
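A minimal sketch in Python's re module (the language is an assumption; the question does not name one) that contrasts the lookahead attempt from the question with the lookbehind pattern above. The helper name find_versions is hypothetical:

```python
import re

# The attempt from the question: (?!/) looks to the RIGHT, so it never sees a '/' on the left.
naive = re.compile(r"(?!/)[0-9]+.[0-9]")

# Lookbehind version: the match may not start right after a digit or a '/'.
version = re.compile(r"(?<![\d/])[0-9]+(?:\.[0-9]+)+")

def find_versions(text):
    """Return all version-like substrings that are not preceded by '/' or a digit."""
    return version.findall(text)

print(naive.findall("Hellothisisastring /12.3 blabla"))   # ['12.3'] (unwanted)
print(find_versions("Hellothisisastring 12.3 blabla"))    # ['12.3']
print(find_versions("Hellothisisastring /12.3 blabla"))   # []
print(find_versions("release 1.2.3, not /4.5"))           # ['1.2.3']
```

Without the digit in the lookbehind, the engine could simply start the match one character later (at the 2 of /12.3) and still report 2.3.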
Related
I have the following regex:
(?<=[\/]).*(?=[\/])
I'm trying to run it on FM7-4/E27/U20 and get only the characters between the two slashes, without the numbers. I tried adding [^0-9] but wasn't able to get a match. Any help would be appreciated.
You can use
(?<=\/)[^\/\d]*(?=\d*\/)
See the regex demo.
Details:
(?<=\/) - the / char must appear immediately on the left
[^\/\d]* - zero or more chars other than / and digits
(?=\d*\/) - a positive lookahead that requires zero or more digits and then / immediately on the right.
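A quick check of both patterns, sketched with Python's re module (an assumption; the question does not name a language). The second input string is made up to show that more than one letter is kept:

```python
import re

text = "FM7-4/E27/U20"

# The question's pattern: the greedy .* between the slashes keeps the digits.
original = re.findall(r"(?<=/).*(?=/)", text)

# The answer's pattern: only non-slash, non-digit chars, then optional digits before the next '/'.
letters_only = re.findall(r"(?<=/)[^/\d]*(?=\d*/)", text)

print(original)      # ['E27']
print(letters_only)  # ['E']

# Hypothetical input with several letters between the slashes.
print(re.findall(r"(?<=/)[^/\d]*(?=\d*/)", "AB-1/XYZ42/Q"))  # ['XYZ']
```

Putting \d* inside the lookahead lets the digits sit between the letters and the closing slash without becoming part of the match.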
I have a URL formatted as follows: https://www.mywebsite.com/subdomain/123456789.htm. I know that the webpage number always has exactly 9 or 10 digits. I would like to extract this number using a regex.
The regex I use to perform this operation is:
^https://www.mywebsite.com/[A-Za-z0-9_.-~/]+([0-9]{9,10}).htm$
The problem is that when the number is 10 digits long, I get a match (which is good), but only the last 9 digits are captured. For example, https://www.mywebsite.com/subdomain/1234567890.htm captures only 234567890.
I could easily create two regexes (one for 9 digits and one for 10) and take the longer number if both match, but is there an elegant way to solve this with a single regex?
EDIT
Following the remarks below: there is actually a mistake in my original regex. The character class matches the first of the 10 digits and leaves only the other 9 for the capturing group. Adding a forward slash to the regex before the capturing group solved the issue, thanks!
As @TheFourthBird pointed out, you are missing a match on the forward slash. A slightly different approach from yours would be a non-capturing group that matches each path segment together with its trailing /:
^https://www\.mywebsite\.com/(?:[^/]+/)+(\d{9,10})\.htm$
The character class [A-Za-z0-9_.-~/]+ matches all the characters that follow, up to the end of the line.
The engine then backtracks, giving characters back one at a time, until ([0-9]{9,10}).htm can match. That first succeeds when exactly 9 digits are left, so only those 9 end up in the capturing group.
Note that you should either escape the hyphen \- or place it at the start or end of the character class; otherwise it can denote a range. Here .-~ is the range from . to ~, which includes all digits, letters and /.
One option is to use a word boundary \b before matching the digits:
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+\b([0-9]{9,10})\.htm$
Regex demo
Another way could be matching the / right before the digits.
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+/([0-9]{9,10})\.htm$
Regex demo
If there can also be letters or an underscore right before the digits, and your engine supports lookbehind, you could instead assert that no digit precedes them with (?<!\d):
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+(?<!\d)([0-9]{9,10})\.htm$
Regex demo
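A sketch comparing the question's pattern with the three variants above, using Python's re module (an assumption; the question does not name a language). The helper page_number and the dictionary keys are hypothetical names:

```python
import re

URLS = [
    "https://www.mywebsite.com/subdomain/123456789.htm",
    "https://www.mywebsite.com/subdomain/1234567890.htm",
]

PATTERNS = {
    # From the question: '.-~' is a range, and the greedy + also swallows the first digit.
    "original": r"^https://www.mywebsite.com/[A-Za-z0-9_.-~/]+([0-9]{9,10}).htm$",
    # Word boundary: the digits must start right after a non-word char such as '/'.
    "boundary": r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+\b([0-9]{9,10})\.htm$",
    # Literal '/' immediately before the digits.
    "slash": r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+/([0-9]{9,10})\.htm$",
    # Lookbehind: no digit immediately before the captured number.
    "lookbehind": r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+(?<!\d)([0-9]{9,10})\.htm$",
}

def page_number(pattern, url):
    """Return capturing group 1, or None if the URL does not match."""
    m = re.match(pattern, url)
    return m.group(1) if m else None

for name, pattern in PATTERNS.items():
    print(name, [page_number(pattern, u) for u in URLS])
```

With the original pattern the 10-digit URL yields 234567890; the three variants all return the full 1234567890, because each one forces the capture to begin right after the slash.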
One more approach: this captures the run of digits directly before .htm (note that it does not check the 9 or 10 digit length):
(\d+)(?=\.htm)
RegexDemo
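A small Python sketch of this lookahead approach (Python's re is an assumption). The second function is a hypothetical variant, not part of the answer, that adds the 9-to-10-digit rule from the question:

```python
import re

def digits_before_htm(url):
    # Any run of digits directly followed by ".htm"; the length is not checked.
    m = re.search(r"(\d+)(?=\.htm)", url)
    return m.group(1) if m else None

def page_number_9_or_10(url):
    # Hypothetical variant: exactly 9 or 10 digits, not preceded by another digit.
    m = re.search(r"(?<!\d)(\d{9,10})(?=\.htm)", url)
    return m.group(1) if m else None

print(digits_before_htm("https://www.mywebsite.com/subdomain/1234567890.htm"))    # 1234567890
print(digits_before_htm("https://www.mywebsite.com/subdomain/12.htm"))            # 12
print(page_number_9_or_10("https://www.mywebsite.com/subdomain/12.htm"))          # None
```

Because the lookahead only constrains what follows the digits, the plain version also accepts short numbers; the lookbehind in the variant stops a match from starting in the middle of a longer digit run.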
I'm using regex in PowerShell 5.1.
I need it to detect groups of digits, but ignore groups that are followed or preceded by /, so from the string below it should detect only 9876.
[regex]::matches('9876 1234/56','(?<!/)([0-9]{1,}(?!(\/[0-9])))').value
As it is now, the result is:
9876
123
6
More examples: "13 17 10/20" should only match 13 and 17.
Tried using something like (?!(\/([0-9]{1,}))), but it did not help.
You may use
\b(?<!/)[0-9]+\b(?!/[0-9])
See the regex demo
Alternatively, if the numbers can be glued to text:
(?<![/0-9])[0-9]+(?!/?[0-9])
See this regex demo.
The first pattern is based on word boundaries \b that make sure there are no letters, digits or _ right before and after an expected match. The second one only makes sure there are no digits or / on either end of the match.
Details
(?<![/0-9]) - a negative lookbehind making sure there is no digit or / immediately to the left of the current location
[0-9]+ - one or more digits
(?!/?[0-9]) - a negative lookahead making sure there is no optional / followed with a digit immediately to the right of the current location.
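To see how the two patterns differ, here is a sketch using Python's re module purely for illustration. PowerShell uses the .NET regex engine; for plain ASCII inputs like these the two should behave the same, but that is an assumption worth checking in your own shell. The third input string is made up:

```python
import re

# Word-boundary version: numbers must stand alone (not glued to letters, digits or _).
standalone = re.compile(r"\b(?<!/)[0-9]+\b(?!/[0-9])")
# Version for numbers that may be glued to text: only digits and '/' are checked.
glued = re.compile(r"(?<![/0-9])[0-9]+(?!/?[0-9])")

for text in ["9876 1234/56", "13 17 10/20", "abc42def 7/8"]:
    print(repr(text), standalone.findall(text), glued.findall(text))
```

Both patterns return ['9876'] and ['13', '17'] for the question's examples; only the second one finds 42 in abc42def, because \b does not hold between c and 4. In PowerShell the pattern would be passed the same way as in the question, e.g. [regex]::Matches('13 17 10/20', '(?<![/0-9])[0-9]+(?!/?[0-9])').Value.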
I need to check a block of text from an email for a number that's exactly 8 digits long, and only return the first match.
Here are my test cases:
Test123456789 -- should fail because 9 digits
Test23456789Test -- pass
Test23456789 Test -- pass
13456780Test -- pass
Test0123456 -- fail because 7 digits
Extra text in the email: I’ve attached the information you requested. If you have any questions, please let us know. -- extra text in the email shouldn't matter.
I've tried:
.*(\d{8}).* -- matches multiples
.*?(\d{8}).* -- only one match but it also matches on a 9 digit number
.*(?<!\d)\d{8}(?!\d).* -- I found in another answer but it returns all of the text in the email and I only want the 8 digit number.
Thank you for any guidance!
You can use the following regex:
(?!.*\d{9})\d{8}
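It starts with a negative lookahead for 9 digits, then matches 8 digits.
This fails for 7 digits. Note, however, that the lookahead is re-evaluated at every starting position: on Test123456789 the match can start at the 2, where only 8 digits remain, so 23456789 is still returned. The lookbehind/lookahead versions in the other answers avoid this.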
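The behaviour of the lookahead-only pattern can be checked with a short Python sketch (Python's re is an assumption; the question does not name its tool, and first_match is a hypothetical helper):

```python
import re

# The lookahead-only pattern from this answer.
lookahead_only = re.compile(r"(?!.*\d{9})\d{8}")

def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group() if m else None

print(first_match(lookahead_only, "Test23456789Test"))  # 23456789
print(first_match(lookahead_only, "Test0123456"))       # None
print(first_match(lookahead_only, "Test123456789"))     # 23456789, although 9 digits should fail
```

The lookahead only rejects starting positions from which a 9-digit run is still visible; once the start moves past the first digit, nothing stops the 8-digit match.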
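A small tweak to the last version you posted: add a capturing group, and make the leading .* lazy so that the first 8-digit number is captured rather than the last one.
Try: .*?(?<!\d)(\d{8})(?!\d).*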
Demo
As the others have said, you can use a negative lookahead and a negative lookbehind. Remember not to use the g flag, or you'll match every occurrence of the pattern:
(?<!\d)\d{8}(?!\d)
Demo (global match)
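In Python (an assumption; the question does not name a language), the "no g flag" advice corresponds to using re.search, which stops at the first match, instead of re.findall. A minimal sketch run against the test cases from the question; the helper name and the sample email text are made up:

```python
import re

# Exactly 8 digits: no digit directly before or after the run.
EIGHT_DIGITS = re.compile(r"(?<!\d)\d{8}(?!\d)")

def first_eight_digit_number(text):
    """Return the first standalone 8-digit run, or None (like a match without the g flag)."""
    m = EIGHT_DIGITS.search(text)
    return m.group() if m else None

cases = {
    "Test123456789": None,
    "Test23456789Test": "23456789",
    "Test23456789 Test": "23456789",
    "13456780Test": "13456780",
    "Test0123456": None,
}
for text, expected in cases.items():
    print(text, first_eight_digit_number(text), expected)

email = "Ref 11112222 and 33334444.\nI've attached the information you requested."
print(first_eight_digit_number(email))   # 11112222
print(EIGHT_DIGITS.findall(email))       # ['11112222', '33334444'], like a global match
```

No leading .* is needed: search() already scans the whole text, and the surrounding email text is simply ignored.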
You could find the first occurrence using:
^[\s\S]*?(?<!\d)(\d{8})(?!\d)
That will match:
^ Assert start of the string
[\s\S]*? Match any character (including newlines) zero or more times, non-greedy; *? rather than +? so that a number at the very start, as in 13456780Test, is also found
(?<!\d) Negative lookbehind to check that what is on the left is not a digit
(\d{8}) Capture 8 digits in a group
(?!\d) Negative lookahead to check that what is on the right is not a digit
Or enable the option that makes the dot match newlines in your tool or language (or prefix the regex with (?s)) and replace [\s\S]*? with .*?
Your value is in the first capturing group.
Regex demo
I'm working with some builds and have to write a regex to include some branches, but exclude others (in order to create the builds).
The best I could come up with is this regex which uses positive and negative lookahead to match the branch names:
(?=.*12\.3)^((?!(version-12\.3)).)*$
Here are the branch names and how they should be matched:
bugfix-ISSUE-123-some-details-version-12.3
ISSUE-1234-some-other-details-version-12.3
bugfix-12.3
bugfix2-12.3
12.3stuff
stu12.3ff
// match everything above, but don't match anything from below
master
version-12.3
version-3.21
some-other-branch
bugfix-3.21
test
Please use this online tool (it's the only one I found that supports negative and positive lookahead regexes).
Right now the regex I came up with works fine, EXCEPT for the following 2 branches:
bugfix-ISSUE-123-some-details-version-12.3
ISSUE-1234-some-other-details-version-12.3
They are not included because I used this negative lookahead, which excludes version-12.3 (what I want) but also excludes anything else that contains this string (like ISSUE-123-version-12.3, which I want included):
((?!(version-12\.3)).)*$
Can you help a bit, please?
If you need to fail any string that contains 12.3 but starts with version- followed by digits and a dot, you may use
^(?!version-\d+\.\d).*12\.3.*$
See the regex demo.
Details:
^ - start of string
(?!version-\d+\.\d) - a negative lookahead that fails the match if there is version-, 1+ digits, a dot and a digit right at the start of the string
.* - any 0+ chars (other than line break chars)
12\.3 - a 12.3 substring
.* - any 0+ chars (other than line break chars)
$ - end of string.
If only a name consisting entirely of version- plus digits/dots should be disallowed, use
^(?!version-[\d.]+$).*12\.3.*$
See another regex demo.
Here, ^ matches the start of the string and then (?!version-[\d.]+$) triggers the check: if there is version- followed by 1+ digits/dots up to the end of the string ($), the match fails.
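A minimal sketch that runs both patterns against the branch names from the question, using Python's re module (an assumption; the question relies on an unnamed online tool). The list and variable names are hypothetical:

```python
import re

should_match = [
    "bugfix-ISSUE-123-some-details-version-12.3",
    "ISSUE-1234-some-other-details-version-12.3",
    "bugfix-12.3",
    "bugfix2-12.3",
    "12.3stuff",
    "stu12.3ff",
]
should_not_match = [
    "master", "version-12.3", "version-3.21",
    "some-other-branch", "bugfix-3.21", "test",
]

# Rejects anything that STARTS with version-<digits>.<digit>.
starts_with_version = re.compile(r"^(?!version-\d+\.\d).*12\.3.*$")
# Rejects only a WHOLE name of the form version-<digits/dots>.
whole_version = re.compile(r"^(?!version-[\d.]+$).*12\.3.*$")

for pattern in (starts_with_version, whole_version):
    print(pattern.pattern)
    print("  matched:", [b for b in should_match + should_not_match if pattern.search(b)])
```

Both patterns select exactly the six names from the first list. They differ on a made-up name such as version-12.3-hotfix: the first rejects it because it starts with version-12.3, while the second accepts it because the whole name is not just version- plus digits/dots.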