I'm looking for a regex pattern that would match a version number.
The solutions I found here don't really match what I need.
I need the pattern to accept single numbers and also numbers separated by dots.
The valid numbers are
1
1.23
1.2.53.4
Invalid numbers are
01
1.02.3
.1.2
1.2.
-1
Consider:
^[1-9]\d*(\.[1-9]\d*)*$
Breaking that down:
^ - Start at the beginning of the string.
[1-9] - Exactly one of the characters 1 thru 9.
\d* - More digits.
( - Beginning of some optional extra stuff
\. - A literal dot.
[1-9] - Exactly one of the characters 1 thru 9.
\d* - More digits.
) - End of the optional extra stuff.
* - There can be any number of those optional extra stuffs.
$ - And end at the end of the string.
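To check the pattern against the examples from the question, a small sketch like the following may help. It assumes Python's re module (the question does not say which engine is in use), and the helper name is_valid is made up for illustration:

```python
import re

# Pattern from the answer: dot-separated components, none with a leading zero.
VERSION_RE = re.compile(r"^[1-9]\d*(\.[1-9]\d*)*$")

def is_valid(version):
    """Return True if the whole string is an accepted version number."""
    return VERSION_RE.match(version) is not None

# Examples taken from the question.
valid = ["1", "1.23", "1.2.53.4"]
invalid = ["01", "1.02.3", ".1.2", "1.2.", "-1"]

for v in valid + invalid:
    print(v, is_valid(v))
```

Every entry in valid should print True and every entry in invalid should print False.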
Beware
Some of this syntax differs depending what regex engine you are using. For example, are you using the one from Perl, PHP, Javascript, C#, MySQL...?
In my experience, version numbers do not fit the neat format you described.
Specifically, you get values like 0.3RC5, 12.0-beta6, 2019.04.15-alpha4.5, 3.1stable, V6.8pl7 and more.
If you are validating existing data, make sure that the data actually fits the criteria you've described. In particular, if you are following "Semantic Versioning", be aware that components which are zero are legal (so 1.0.1 is valid), that "Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.", and that "1" is not a legal version number.
Be warned that the above will also match stupidly long version numbers like 1.2.3.4.5.6.7.8.9.10.11.12.13.14. To prevent this, you can restrict it, like so:
^[1-9]\d*(\.[1-9]\d*){0,3}$
This changes the * for "any number of optional extra dots and numbers" to a range from zero to three. So it'd accept 1, 1.2, 1.2.3, and 1.2.3.4, but not 1.2.3.4.5.
Also, if you want a zero to be legal, but only as a whole component and not as a leading digit (so 0.3 and 1.0.1 are valid, but 01 is not), then it gets a little more complex:
^(0|[1-9]\d*)(\.(0|[1-9]\d*)){0,3}$
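As a rough check of this last variant, the sketch below runs it over a few strings. Python's re module and the sample list are assumptions chosen for illustration:

```python
import re

# Zero is allowed as a whole component; at most four components in total.
BOUNDED_RE = re.compile(r"^(0|[1-9]\d*)(\.(0|[1-9]\d*)){0,3}$")

samples = ["0.3", "1.0.1", "1.2.3.4", "1.2.3.4.5", "01", "1.02"]

# Map each sample to True (accepted) or False (rejected).
results = {s: bool(BOUNDED_RE.match(s)) for s in samples}

for sample, accepted in results.items():
    print(sample, accepted)
```

The first three samples are accepted; 1.2.3.4.5 has too many components, and 01 and 1.02 contain a leading zero.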
This question may also be a duplicate: A regex for version number parsing
Major.Minor.Patch - npm version like 0.1.2:
^([1-9]\d*|0)(\.(([1-9]\d*)|0)){2}$
More or optional minor groups like 1.1.5.0 or just 1.2:
^([1-9]\d*|0)(\.(([1-9]\d*)|0)){0,3}$
Avoid leading zero - no |0 in first group:
^([1-9]\d*)(\.(([1-9]\d*)|0)){0,3}$
Semantic Version String like 1.0.0-beta
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
Break down:
^: match the line start
$: match the line end
( and ): make a group
([1-9]\d*|0): match version number
[1-9]\d*: starting with 1~9, following any number of digit
|: logical or
0: literal zero
\.: literal (escaped) dot
{2}: exact 2 matches
{0,3}: 0~3 matches
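The Semantic Version pattern above has five capturing groups (major, minor, patch, pre-release, build metadata). A minimal sketch of reading them out, assuming Python's re module; the function name parse_semver is hypothetical:

```python
import re

# The Semantic Version pattern from above, split over several lines.
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)

def parse_semver(text):
    """Return (major, minor, patch, prerelease, build) or None if invalid."""
    m = SEMVER_RE.match(text)
    if m is None:
        return None
    major, minor, patch, prerelease, build = m.groups()
    # Optional parts that are absent come back as None.
    return int(major), int(minor), int(patch), prerelease, build

print(parse_semver("1.0.0-beta"))
print(parse_semver("1.0.0-beta+build.5"))
print(parse_semver("1.0"))
```

Strings that are not full MAJOR.MINOR.PATCH versions, such as 1.0, yield None.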
Test cases (regex101 JavaScript):
Match:
0.0.0
0.0.1
0.1.0
1.0.0
1.0.1
1.1.0
1.1.1
0.0.10
0.10.0
10.0.0
0.1.10
1.0.10
1.0.100
0.100.1
100.0.0
1.20.0
Not match:
0.0.00
0.00.0
00.00.0
0.0.01
0.01.0
01.0.0
0.01.0
01.0.0
00.0.01
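The test cases above were run on regex101 with the JavaScript flavour. The same lists can be replayed locally; this sketch assumes Python's re module, which treats this particular Major.Minor.Patch pattern the same way:

```python
import re

# Major.Minor.Patch pattern from above (exactly three components).
MMP_RE = re.compile(r"^([1-9]\d*|0)(\.(([1-9]\d*)|0)){2}$")

should_match = [
    "0.0.0", "0.0.1", "0.1.0", "1.0.0", "1.0.1", "1.1.0", "1.1.1",
    "0.0.10", "0.10.0", "10.0.0", "0.1.10", "1.0.10", "1.0.100",
    "0.100.1", "100.0.0", "1.20.0",
]
should_not_match = [
    "0.0.00", "0.00.0", "00.00.0", "0.0.01", "0.01.0", "01.0.0", "00.0.01",
]

# Collect any case that behaves differently from the lists above.
unexpected = [s for s in should_match if not MMP_RE.match(s)]
unexpected += [s for s in should_not_match if MMP_RE.match(s)]
print("unexpected results:", unexpected)
```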
This regex should help:
^((([1-9]+\d*\.)+[1-9]+\d*)|[1-9]+\d*)$
Below is the explanation.
[1-9]+\d* means a sequence which begins with a non-zero digit, followed by zero or more digits
The first part (([1-9]+\d*\.)+[1-9]+\d*) catches all of your correct examples, except for 1. So we add a | (or), followed by a [1-9]+\d* sequence.
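One thing worth checking with a pattern of this shape is how | interacts with ^ and $. Alternation has the lowest precedence, so without a group around both alternatives the ^ belongs only to the left branch and the $ only to the right one. The sketch below, assuming Python's re module, compares the two forms on the examples from the question:

```python
import re

# Without an outer group: ^ applies to the left branch, $ to the right branch.
ungrouped = re.compile(r"^(([1-9]+\d*\.)+[1-9]+\d*)|[1-9]+\d*$")
# With an outer group: both anchors apply to both branches.
grouped = re.compile(r"^((([1-9]+\d*\.)+[1-9]+\d*)|[1-9]+\d*)$")

valid = ["1", "1.23", "1.2.53.4"]
invalid = ["01", "1.02.3", ".1.2", "1.2.", "-1"]

for s in valid + invalid:
    print(s, bool(ungrouped.search(s)), bool(grouped.search(s)))
```

The ungrouped form finds a match inside every invalid example, while the grouped form rejects them all and still accepts the valid ones.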
([\*,\^])([\-,\*,\w]+[\.])+(\w)*
for an npm package, for example
"cross-env": "^5.2.0",
Related
I'm not sure if using regex is the correct way to go about this here, but I wanted to try solving this with regex first (if it's possible)
I have an EDIFACT file, where the data (in bold) in certain fields of some segments needs to be substituted (with different dates, same format)
UNA:+,? '
UNB+UNOC:3+000000000+000000000+20190801:1115+00001+DDMP190001'
UNH+00001+BRKE:01+00+0'
INV+ED Format 1+Brustkrebs+19880117+E000000001+**20080702**+++1+0'
FAL+087897044+0000000++name+000000000+0+**20080702**++1+++J+N+N+N+N+N+++0'
INL+181095200+385762115+++0'
BEE+20080702++++0'
BAA+++J+J++++++J+++++++J++0'
BBA++++++++J++++++J+J++++++J+++++J+++J+J++++++++J+0'
BHP+J+++++J+++++J+++++0'
BLA+++J+++++++++0'
BFA++++++++++++J++0'
BSA++J+++J+J+++0'
BAT+20190801+0'
DAT+**20080702**++++0'
UNT+000014+00001'
UNZ+00001+00001'
At first I was able to match those fields using a positive lookahead and a lookbehind (I had a different expression for each date).
Here, for example, is the expression I initially used to match the date in the "FAL" segment: (?<=\+[\d]{1}\+)\d{8}(?=\+\+). But then I saw that this date is sometimes preceded by 9 digits and sometimes by 1 (depending on the version), and followed by either ++ or a + and a date, so I added a logical OR like this: (?<=\+[\d]{9}\+|\+[\d]{1}\+)\d{8}(?=\+[\d]{8}\+|\+\+). I quickly realized this is not sustainable, because these EDIFACT files vary far beyond only 9 or 1 digits.
(I have 6 versions for each type, and I have 6 types in total.)
Because I have a scheme/map indicating how each version should be built, and I know at which position (based on the + separator) the date is written in each version, I thought about matching the date based on the +: after the 7th occurrence of a plus in a certain line (say, in the FAL segment), match the next 8 digits.
Is this possible to achieve with regex? And if yes, could someone please tell me how?
I suggest using a pattern like
^((?:[^+\n]*\+){7})\d{8}(?=\+(?:\d{8})?\+)
where {7} can be adjusted to the value you need for each segment type, and replace with the backreference to Group 1 followed by the new date. In Python, it is \g<1>20200101 (where 20200101 is your new date), in PHP/.NET, it is ${1}20200101. In JS, it will be just $1 followed by the new date.
To run on a multiline text, use the m flag. In Python regex, you may embed it like (?m)^((?:[^+\n]*\+){7})\d{8}(?=\+(?:\d{8})?\+).
See the Python regex demo
Details
^ - start of string/line
((?:[^+\n]*\+){7}) - Group 1: 7 repetitions of any chars other than + and newline, and then a +
\d{8} - 8 digits
(?=\+(?:\d{8})?\+) - that are followed by a +, an optional chunk of 8 digits, and a +.
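Putting this together for several segment types could look like the sketch below. It assumes Python's re module; the PLUS_COUNT map and the function name replace_dates are hypothetical, and the counts are simply read off the sample file in the question (5 separators before the date in INV, 7 in FAL, 1 in DAT), so they would have to come from your own scheme for each version:

```python
import re

# Hypothetical map: segment tag -> number of '+' separators before the date.
PLUS_COUNT = {"INV": 5, "FAL": 7, "DAT": 1}

def replace_dates(text, new_date):
    """Replace the 8-digit date in each mapped segment with new_date."""
    out = []
    for line in text.splitlines():
        count = PLUS_COUNT.get(line[:3])
        if count is not None:
            # Same pattern as above, with the repetition count filled in.
            pattern = r"^((?:[^+\n]*\+){%d})\d{8}(?=\+(?:\d{8})?\+)" % count
            # \g<1> keeps the prefix; a plain \1 would run into the date digits.
            line = re.sub(pattern, r"\g<1>" + new_date, line)
        out.append(line)
    return "\n".join(out)

sample = "\n".join([
    "INV+ED Format 1+Brustkrebs+19880117+E000000001+20080702+++1+0'",
    "FAL+087897044+0000000++name+000000000+0+20080702++1+++J+N+N+N+N+N+++0'",
    "BEE+20080702++++0'",
    "DAT+20080702++++0'",
])
print(replace_dates(sample, "20200101"))
```

The BEE segment is not in the map, so its date is left untouched.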
I'm trying to make
09-546-943
fail in the below regex pattern.
^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$
Passing criteria is
greater than 10-000-000 or 010-000-000 and
less than 150-000-000
The tried example "09-546-943" passes. This should be a fail.
Any idea how to create a regex that makes this example a fail instead of a pass?
You may use
^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$
See the regex demo.
The pattern is partially generated with this online number range regex generator, I set the min number to 10 and max to 150, then merged the branches that match 1-8 and 9 (the tool does a bad job here), added 0? to the two digit numbers to match an optional leading 0 and -[0-9]{3}-[0-9]{3} for 10-149 part and -000-000 for 150.
Details
^ - start of string
(?: - start of a container non-capturing group making the anchors apply to both alternatives:
(?:0?[1-9][0-9]|1[0-4][0-9]) - an optional 0 and then a number from 10 to 99 or 1 followed with a digit from 0 to 4 and then any digit (100 to 149)
-[0-9]{3}-[0-9]{3} - a hyphen and three digits repeated twice (=(?:-[0-9]{3}){2})
| - or
150-000-000 - a 150-000-000 value
) - end of the non-capturing group
$ - end of string.
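A quick way to confirm the boundaries is to run the pattern over values just inside and just outside the range. This sketch assumes Python's re module; in_range is a made-up helper name:

```python
import re

# Pattern from the answer: (0)10-000-000 through 150-000-000 inclusive.
RANGE_RE = re.compile(
    r"^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$"
)

def in_range(value):
    """Return True if the formatted number is accepted by the pattern."""
    return RANGE_RE.match(value) is not None

passing = ["10-000-000", "010-000-000", "99-999-999", "149-999-999", "150-000-000"]
failing = ["09-546-943", "9-000-000", "150-000-001", "200-000-000"]

for value in passing + failing:
    print(value, in_range(value))
```

Note that this pattern includes both 10-000-000 and 150-000-000; if the limits must be exclusive, those two values need separate handling.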
This expression, or a slightly modified version of it, might work:
^[1][0-4][0-9]-[0-9]{3}-[0-9]{3}$|^[1][0]-[0-9]{3}-[0-9]{2}[1-9]$
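It would also fail 10-000-000 and 150-000-000. Note that, as written, it also rejects two-digit prefixes other than 10 (for example 50-000-000) and values such as 10-000-010, so it would need to be extended to cover the full range.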
In this demo, the expression is explained, if you might be interested.
This pattern:
((0?[1-9])|(1[0-4]))[0-9]-[0-9]{3}-[0-9]{3}
matches the range from (0)10-000-000 to 149-999-999 inclusive. To keep the regex simple, you may need to handle the extremes ((0)10-000-000 and 150-000-000) separately - depending on your need of them to be included or excluded.
Test here.
This regex:
((0?[1-9])|(1[0-4]))[0-9][- ]?[0-9]{3}[- ]?[0-9]{3}
also accepts a space or nothing in place of each -.
Test here.
I require a RegEx that validates the entered number to be either
0
1-9 with one decimal (i.e. 5.5, but not 3.33 or 7)
10.0
I was trying with the below RegEx, but I'm not succeeding..
/(0|[1-9]\.[1-9]|\10.0)/g
fyi, in my system, a '0' means 'No grade entered', a 10.0 is the maximum and the minimum is a 1.0
You're having problems probably because you're missing the start and end markers, e.g., ^ and $, hence, your solution is not bounded to the whole input. Try:
/^(0|[1-9]\.[0-9]|10\.0)$/g
I think this one is more robust?
^0$|^[1-9]{1}\.[0-9]{1}$|^10\.0$
The main thing to worry about is that an unanchored pattern like the one in the question will, for example, match inside 12.0, because the 0 alternative is not anchored. You also need [0-9] after the decimal point (so 7.0 is matched). I added {1} quantifiers in the decimal case as well.
EDIT: Explanation of changes
Adding ^ and $ to each of the three options ensures that the match is the whole of the string. This means that for example, ^0$ matches 0 but does not match 0.0, 01, 6.0 or any other string where 0 is only part of the string.
Changing [1-9] after the decimal point in the second option to [0-9] allows 7.0 to be matched, where it previously would not
Adding the quantifiers {1} to the [] groups before and after the decimal point in the second option makes it explicit that we match only a single digit. A character class without a quantifier already matches exactly one character, so {1} is redundant; an unanchored [1-9]\.[0-9] would match inside 12.1, 5.33 and other strings where the match is only part of the string, and it is the ^ and $ that prevent this. But with regexes I like failsafes...
I also moved the escaping \ in the third option to be before the decimal point, which seemed like a typo (we want to match a literal . not use . to mean any character)
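The changes described above can be checked with a few sample grades. A minimal sketch, assuming Python's re module (the question uses JavaScript-style /.../ literals, but this pattern behaves the same in both for these inputs); is_grade is a hypothetical helper:

```python
import re

# Each alternative carries its own ^ and $ anchors.
GRADE_RE = re.compile(r"^0$|^[1-9]{1}\.[0-9]{1}$|^10\.0$")

def is_grade(text):
    """Return True for 0, for 1.0 through 9.9 with one decimal, and for 10.0."""
    return GRADE_RE.search(text) is not None

accepted = ["0", "1.0", "5.5", "7.0", "9.9", "10.0"]
rejected = ["7", "3.33", "12.0", "0.0", "01", "10.5"]

for value in accepted + rejected:
    print(value, is_grade(value))
```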
My proposal:
(?<![0-9.])(0|(?:[1-9]\.[0-9])|(?:10\.0))(?![0-9.])
All of the following will match: 0, 1.1, 1.0, 1.9, 2.0, 2.1, 9.0, 9.1, 9.9, 10.0, but all of the following will not: 0.1, 0.2, 0.9, 1.11, 1.20, 1.01, 10.05, 110.05. Does not require one-number per line, can extract numbers embedded in text.
Here is the example: regex101
More detailed explanation:
(?<![0-9.])
is a look-behind that prevents us from ripping out pieces of number-literals in multi-line input, e.g. 10000010.0 should not be matched.
(0|(?:[1-9]\.[0-9])|(?:10\.0))
This is the part that matches your specification. The ?: is needed only if you want to keep the matched groups "clean", in the sense that there will be no group(2) for the middle case
(?![0-9.])
This is a look-ahead that does the same on the right-hand side; again, it is important only for multi-line text.
If you drop look-behinds, look-aheads and "environmentally friendly match-groups", you end up with something like:
0|([1-9]\.[0-9])|(10\.0)
and if you are working with one-item-per-line input, you can add prefix ^ and suffix $, and go with that.
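To see the look-arounds at work on numbers embedded in text, here is a small sketch assuming Python's re module; the sample sentence is invented for illustration:

```python
import re

# Look-behind and look-ahead keep us from matching pieces of longer numbers.
GRADE_IN_TEXT = re.compile(
    r"(?<![0-9.])(0|(?:[1-9]\.[0-9])|(?:10\.0))(?![0-9.])"
)

text = "grades: 0, 5.5, 10.0, 1.11, 10.05 and 0.9"

# With exactly one capturing group, findall returns that group's text.
found = GRADE_IN_TEXT.findall(text)
print(found)  # ['0', '5.5', '10.0']
```

The values 1.11, 10.05 and 0.9 are skipped because a digit or dot sits directly next to every candidate match inside them.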
Does anyone have experience with Regex 0.12?
At the moment, we're using Watson Explorer enterprise.
Therefore we have to construct an XSL script which can retrieve particular metadata.
The patterns we need look like 3.2.14P5879 or 14.1.1Z5526
Thus: Digits Dot Digits Dot Digits Letter-P_or_Z Digits
For example, given the text:
There was an issue with project 3.2.14P5879, regarding to document 14.1.1Z5526-ABC.docx it says that we've to use the documents of "14.1.1P5526 - xyz.pdf"
we would like to get the following result:
- <content name="test">3.2.14P5879</content>
- <content name="test">14.1.1Z5526</content>
- <content name="test">14.1.1P5526</content>
But when we tried the following regular expression
\d+\.\d+\.\d+[PZ]\d+
we noticed that it didn't work. We think the reason is that Watson still uses regex 0.12,
according to this link : https://www.ibm.com/support/knowledgecenter/SS8NLW_11.0.2/com.ibm.swg.im.infosphere.dataexpl.engine.man.doc/r_viv_match.html
and (see the regex specification for detailed information) :
http://www.delorie.com/gnu/docs/regex/regex_toc.html
Thus the question is:
How do you write \d+\.\d+\.\d+[PZ]\d+ in a regex 0.12 compatible form?
Plus:
How and where can I test such things?
(I don't want to rely on Stack Overflow for each new query.)
You may use
[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+[PZ][[:digit:]]+
if I recall correctly.
[0-9]+\.[0-9]+\.[0-9]+[PZ][0-9]+
The above regular expression will find patterns like 123.12.12P1234
If we decompose this expression, we get 7 sections.
Thus:
Find me the pattern in the text which contains:
[0-9]+ <-- one or more digits from 0 to 9, e.g. 1 or 02736
\. <-- followed by a single dot "."
[0-9]+ <-- followed by one or more digits
\. <-- followed by a single dot "."
[0-9]+ <-- followed by one or more digits
[PZ] <-- followed by a single P or Z, but not both
[0-9]+ <-- and ends with one or more digits
[ A plus sign (+) means that the preceding element may repeat one or more times. ]
Some patterns :
1.1.1P1
1.1.1Z1
100.100.100P100
1.100.1P1
etc
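As for where to test such things: the [0-9] form uses only syntax that most engines share, so it can be tried in almost any regex tool before it goes into the XSL script. The sketch below assumes Python's re module, which is not the GNU regex 0.12 engine, so it only confirms the logic of the pattern and not Watson's handling of it:

```python
import re

# Only bracket expressions, + and an escaped dot: no \d shorthand.
CODE_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+[PZ][0-9]+")

# Sample text from the question.
text = (
    "There was an issue with project 3.2.14P5879, regarding to document "
    "14.1.1Z5526-ABC.docx it says that we've to use the documents of "
    '"14.1.1P5526 - xyz.pdf"'
)

codes = CODE_RE.findall(text)
print(codes)  # ['3.2.14P5879', '14.1.1Z5526', '14.1.1P5526']
```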
Hi, I am working on a RegEx. A correct response should NOT allow a number to the tenths only, as in RESPONSE = "925.0", nor should it allow trailing zeros after the hundredths place, as in RESPONSE = "925.000". The only correct responses are: 925, 0925, 0925., 925., 925.00, 00925
I worked on it and finally came up with this
"^-?(0)*(\d*(\.(00))?\d+.|(\d){1,3}(,(\d){3})*(\.(00))?)$"
It works for three-digit numbers, but if I want it to accept 38400.00, it doesn't allow it
I am not quite certain whether the decimal places can be any digit or if they have to be zero. If the former, then this should do the trick:
^-?\d{1,3}(,?\d{3})*(\.(\d{2})?)?$
If the latter, then this:
^-?\d{1,3}(,?\d{3})*(\.(00)?)?$
The entire match starting with the decimal point is optional, and the two decimal places in that match are optional as well.
UPDATE I just realized that it appears you need to accept commas in the response as well - I assume for thousands, millions, etc.
UPDATE #2 per OP's comment
^-?(\d+|\d{1,3}(,\d{3})*)(\.(00)?)?$
UPDATE #3 Added link to regex101 for explanation of this regular expression.
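The last pattern can be checked against the responses listed in the question. A minimal sketch, assuming Python's re module; the value 38,400.00 is added here only to exercise the comma branch:

```python
import re

# Either plain digits, or groups of three separated by commas,
# optionally followed by "." or ".00".
NUMBER_RE = re.compile(r"^-?(\d+|\d{1,3}(,\d{3})*)(\.(00)?)?$")

accepted = ["925", "0925", "0925.", "925.", "925.00", "00925",
            "38400.00", "38,400.00"]
rejected = ["925.0", "925.000"]

for value in accepted + rejected:
    print(value, bool(NUMBER_RE.match(value)))
```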
Have a try with:
^-?\d{1,3}(?:,?\d{3})*(?:\.(?:00)?)?$
I think your problem is that you're trying to match it in chunks of three, with commas separating, but 38400.00 doesn't have commas.
Try this:
^-?\d+(\.?(\d{2})?)$
The - indicates the character, -. With the ? after, it says that it may or may not apply. This allows negative numbers, so if you only want positive numbers matched, delete the -? after the ^.
\d represents every digit. The + after says that there can be as many as you want, as long as there's at least one.
Then there's a \., which is just a dot in the number. The ? does the same as before. Since you seem to allow trailing periods, I assumed you wanted it to be considered separately from the following digits.
The () encloses the next group, which is the period (\.) followed by two characters that match \d -- two digits -- and which may be repeated 0 or 1 times, as dictated by the ?. This allows people to either have no digits after the period or two, but nothing else.
The ^ at the beginning specifies it has to be at the beginning of the line, and the $ at the end specifies it has to end at the end of the line. Enable the multiline (m) flag only if you are matching individual lines within a multi-line input.
Disclaimer: I've not done much regex work before, so I could well be totally off. If it doesn't work, let me know.
Couldn't you do this without the ?'s
^[0-9,]+(\.){0,1}(\d{2}){0,1}$
improved: ^\d+[0-9,]*(\.){0,1}(\d{2}){0,1}$
Edit:
Broken down a bit as requested
Old one:
[0-9,]+
1 or more digits/commas (would have accepted ',' as true) so improved version:
\d+
for starts with 1 or more digits
[0-9,]*
0 or more digits/commas
followed by
(\.){0,1}
0 or 1 decimal point
Followed by
(\d{2}){0,1}
0 or 1 of (exactly 2 digits)