Regex ending character can be one of 2 - regex

I have a url that I am trying to get the id out of. Problem is, the url could look like this "https://www.website.com/blah/1234567890:0", "https://www.website.com/blah/1234567890/" or "https://www.website.com/blah/1234567890"
It is now the first option that is giving me trouble. Basically, all I want is "1234567890", so for the first option, I need to omit the :0.
Here is what I tried when capturing id with/without ending /:
([^/]*)\/?$
Here is what I tried to cover both with/without ending / and the :0, but it does not work as I thought (id is match 6, but I've no way of knowing it will always be 6):
([^/]*)[/:?$]

Note that your [/:?$] subpattern is a character class that matches a single char: /, :, ? or $ (inside [...], the ? and $ symbols are literal characters, not special regex operators).
You may make the first negated character class lazily quantified, and add an optional group that would match / or :0 one or zero times:
([^\/]*?)(?:\/|:0)?$
See the regex demo. Replace 0 with [0-9] to match any digit at the end of the string.
Details:
([^\/]*?) - Group 1: zero or more chars other than / as few as possible (due to the *? quantifier)
(?:\/|:0)? - an optional non-capturing group matching one of the two alternatives, 1 or 0 times: / or :0
$ - end of string.
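A minimal Python sketch of the pattern above, assuming the id is always the last path segment of the URL. The name extract_id is made up for illustration, and the \/ escapes are dropped because Python does not need them.

```python
import re

# Pattern from the answer above; Python needs no backslash before "/".
ID_PATTERN = re.compile(r"([^/]*?)(?:/|:0)?$")

def extract_id(url):
    """Return the last path segment of url without a trailing "/" or ":0"."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None

urls = [
    "https://www.website.com/blah/1234567890:0",
    "https://www.website.com/blah/1234567890/",
    "https://www.website.com/blah/1234567890",
]
for url in urls:
    print(extract_id(url))  # prints 1234567890 for each URL
```

Because the id is in Group 1, there is no need to guess which match number it ends up in.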

Related

capturing values after an optional slash

I am trying to write in regex a string that allows me to have
an alphanumeric string of length no longer than 5 (as an example) [a-z0-9]{3,5}
followed by an optional forward slash /?
that cannot end in a 3
I want to capture any group of at least 3, with or without a slash, and then anything after it.
And I am having a very hard time accomplishing this. If I require the slash / it is much easier to do so.
When I try
(?=.+\/?.+)[a-z0-9]{2,5}\/?(?<!3\/|3)
I can capture what I want - up until the slash, but can't crack how to get anything after IF legit things occur
(?=.+\/?.+)[a-z0-9]{2,62}\/?.?
My requirement for length goes up by 1 - to 4 instead of 3 - due to the additional . I put after the \/?. I could change my match to account for it, but it becomes really difficult.
(?=.+\/?.+)[a-z0-9]{2,5}\/?(?<!3\/|3)$
This only gives me the last slash or non-slash followed by 2 to 5 characters.
(?=.+\/?.+)[a-z0-9]{2,62}\/?.*
or
(?=.+\/?.+)[a-z0-9]{2,62}\/?.?+
simply then ignores my ending rule of not being able to close with 3/ or 3. Also, this allows me to use more than 5 characters before the slash. Def not what I want :)
Is there a way to make an optional field still maintain length and ending rules?
I am running this script on both regexr.com and https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_regexp and gitbash and not getting the results I would like
Try:
^[a-z0-9]{3,5}(?<!3)(?:$|\/.*)
Regex demo.
^ - beginning of the string
[a-z0-9]{3,5} - match a char in a-z0-9 between 3 and 5 times
(?<!3) - the last character should not be 3
(?:$|\/.*) - match either end of string $ or / and any number of characters.
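A quick way to check this pattern against a few inputs, sketched in Python. The sample strings and the helper name is_valid are my own, not from the question.

```python
import re

# Pattern from the answer above (the "/" is left unescaped for Python).
PATTERN = re.compile(r"^[a-z0-9]{3,5}(?<!3)(?:$|/.*)")

def is_valid(text):
    """True if text starts with 3-5 chars of [a-z0-9] not ending in 3,
    followed by either the end of the string or "/" plus anything."""
    return PATTERN.search(text) is not None

samples = ["abc", "abc/", "abc/anything", "ab3", "ab3/x", "abcd3", "abcdef", "ab"]
for sample in samples:
    print(sample, is_valid(sample))
```

The first three samples should be accepted and the rest rejected: the lookbehind fires right after the 3-5 chars, so both the length rule and the ending rule hold whether or not a slash follows.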
If the last character in the range [a-z0-9] must not be a 3, you can exclude it with a character class like [a-z0-24-9]
^[a-z0-9]{2,4}[a-z0-24-9](?:\/.*)?$
Explanation
^ Start of string
[a-z0-9]{2,4} Match 2-4 chars in the ranges a-z and 0-9
[a-z0-24-9] Match a single char in a-z, 0-2 or 4-9 (anything in [a-z0-9] except 3)
(?:\/.*)? Optionally match / and the rest of the line
$ End of string
See a regex101 demo.
If a 3 must not appear anywhere in that part:
^[a-z0-24-9]{3,5}(?:\/.*)?$
See another regex101 demo

RegEx for matching operation sequences

I have a numbers operation like this:
-2-28*95+874-1545*-5+36
I need to extract operands, not implied in a multiplication operation with a regex:
-2
+874
+36
I tried things like that without success:
[\+,-]\d+(?=\+|-|$)
This regex matches -5, too, and
(?(?=\d+)[\+,-]|^)\d+(?=\+|-|$)
matches nothing.
How do I solve this problem?
You may use
(?<!\*)[-+]\d*\.?\d+(?![*\d])
See the regex demo
Details
(?<!\*) - (a negative lookbehind making sure the current position is) not immediately preceded by a * char
[-+] - - or +
\d* - 0 or more digits
\.? - an optional . char
\d+ - 1+ digits
(?![*\d]) - not immediately followed with a * or digit char.
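For a quick check outside the demo, here is a small Python sketch that runs the pattern over the sample expression from the question. Only the variable names are my own.

```python
import re

# Pattern from the answer above: a signed number that is neither preceded
# by "*" nor followed by "*" or another digit.
PATTERN = re.compile(r"(?<!\*)[-+]\d*\.?\d+(?![*\d])")

expression = "-2-28*95+874-1545*-5+36"
operands = PATTERN.findall(expression)
print(operands)  # ['-2', '+874', '+36']
```

The digit in the lookahead matters: without it, the engine could backtrack and return -2 out of -28*95.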
This regex might help you capture the unwanted pattern (the multiplication operations) in a group, which leaves your desired output unmatched:
(((-|\+|)\d+\*(-|\+|)\d+))
In engines that support backtracking control verbs (PCRE, for example), you can also use (*SKIP)(*FAIL), or its short form (*SKIP)(*F), to discard those matches and get the desired output:
((((-|\+|)\d+\*(-|\+|)\d+))(*SKIP)(*FAIL)|([\s\S]))
You can also DRY up your expression, if you wish, by removing groups that you do not need.
Another option could be to match what you don't want and capture in a group what you want to keep. Your values are then in the first capturing group:
[+-]?\d+(?:\*[+-]?\d+)+|([+-]?\d+)
Explanation
[+-]?\d+ Optional + or - followed by 1+ digits
(?:\*[+-]?\d+)+ Repeat the previous pattern 1+ times with an * prepended
| Or
([+-]?\d+) Capture in group 1 matching an optional + or - and 1+ digits
Regex demo
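The "match what you don't want, capture what you want" idea could be used roughly like this in Python. The function name standalone_operands is hypothetical; the point is that matches from the first alternative leave Group 1 unset and are skipped.

```python
import re

# First alternative consumes whole multiplication chains (no group);
# second alternative captures a standalone operand in group 1.
PATTERN = re.compile(r"[+-]?\d+(?:\*[+-]?\d+)+|([+-]?\d+)")

def standalone_operands(expression):
    """Collect group 1 from every match where it participated."""
    return [
        m.group(1)
        for m in PATTERN.finditer(expression)
        if m.group(1) is not None
    ]

print(standalone_operands("-2-28*95+874-1545*-5+36"))  # ['-2', '+874', '+36']
```

This version needs no lookarounds, so it should also carry over to engines that lack lookbehind support.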

How to write that the pattern should be repeated?

I have a line of pattern:
double1, +double2,-double3.
For single double value pattern is :
[+-]?([0-9]+([.][0-9]*)?|[.][0-9]+)
How to make it for triple value?
Such as:
1.1, 0, -0
0, -123, 33
Not valid for:
""
1,123
123,123,123,123
You can use a slightly simpler pattern:
^(?:(?:^[+-]?|, ?[+-]?)\d+(?:\.\d+)?){3}$
Matches only triple occurrences as you specified in your edit.
You can try it here.
As correctly pointed out by The Fourth Bird in his comments below, if you wish to match entries such as .9, where no digits precede the full stop you can use:
^(?:(?:^[+-]?|, ?[+-]?)(?:\d+(?:\.\d+)?|\.\d+)){3}$
You can check this pattern here.
In your pattern, the fractional part ([.][0-9]*)? is optional: it matches 0 or 1 times.
To match three values, you could match a single double using [-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+), which matches an optional + or - followed by an alternation that matches either one or more digits with an optional part consisting of a dot and one or more digits, or a dot followed by one or more digits.
Repeat that pattern 2 more times using the quantifier {2}, with each repetition preceded by a comma and zero or more whitespace characters \s*.
Add anchors to assert the start ^ and the end $ of the string, and you could make use of a non-capturing group (?:...) if you only want to check whether there is a match and do not need to refer to the groups afterwards.
^[-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:,\s*[-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)){2}$
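To avoid writing the number part three times, the pattern can be assembled from one piece. This is a Python sketch; NUMBER, TRIPLE and is_triple are illustrative names, and the test strings are the ones from the question.

```python
import re

# One signed decimal: digits with an optional fraction, or a bare fraction.
NUMBER = r"[-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)"

# One number, then exactly two more, each preceded by a comma and
# optional whitespace. "{{2}}" renders as "{2}" in the f-string.
TRIPLE = re.compile(rf"^{NUMBER}(?:,\s*{NUMBER}){{2}}$")

def is_triple(text):
    return TRIPLE.match(text) is not None

for text in ["1.1, 0, -0", "0, -123, 33", "", "1,123", "123,123,123,123"]:
    print(repr(text), is_triple(text))
```

The first two strings should be accepted and the last three rejected, which matches the valid and invalid examples listed in the question.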

How to use regular expression to use as few groups as possible to match as long string as possible

For example, this is the regular expression
([a]{2,3})
This is the string
aaaa // 1 match "(aaa)a" but I want "(aa)(aa)"
aaaaa // 2 match "(aaa)(aa)"
aaaaaa // 2 match "(aaa)(aaa)"
However, if I change the regular expression
([a]{2,3}?)
Then the results are
aaaa // 2 match "(aa)(aa)"
aaaaa // 2 match "(aa)(aa)a" but I want "(aaa)(aa)"
aaaaaa // 3 match "(aa)(aa)(aa)" but I want "(aaa)(aaa)"
My question is: is it possible to use as few groups as possible to match as long a string as possible?
How about something like this:
(a{3}(?!a(?:[^a]|$))|a{2})
This looks for either the character a three times (as long as those three are not followed by exactly one more a and then a different character or the end of the string) or the character a two times.
Breakdown:
( # Start of the capturing group.
a{3} # Matches the character 'a' exactly three times.
(?! # Start of a negative Lookahead.
a # Matches the character 'a' literally.
(?: # Start of the non-capturing group.
[^a] # Matches any character except for 'a'.
| # Alternation (OR).
$ # Asserts position at the end of the line/string.
) # End of the non-capturing group.
) # End of the negative Lookahead.
| # Alternation (OR).
a{2} # Matches the character 'a' exactly two times.
) # End of the capturing group.
Here's a demo.
Note that if you don't need the capturing group, you can actually use the whole match instead by converting the capturing group into a non-capturing one:
(?:a{3}(?!a(?:[^a]|$))|a{2})
Which would look like this.
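A short Python check of the capturing version against the three strings from the question. Since the pattern has exactly one capturing group, findall returns the contents of that group for each match.

```python
import re

# Capturing version of the pattern from the answer above.
PATTERN = re.compile(r"(a{3}(?!a(?:[^a]|$))|a{2})")

for text in ["aaaa", "aaaaa", "aaaaaa"]:
    print(text, PATTERN.findall(text))
# aaaa ['aa', 'aa']
# aaaaa ['aaa', 'aa']
# aaaaaa ['aaa', 'aaa']
```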
Try this Regex:
^(?:(a{3})*|(a{2,3})*)$
Click for Demo
Explanation:
^ - asserts the start of the line
(?:(a{3})*|(a{2,3})*) - a non-capturing group containing 2 sub-sequences separated by OR operator
(a{3})* - The first subsequence tries to match 3 occurrences of a. The * at the end allows this subsequence to match 0 or 3 or 6 or 9.... occurrences of a before the end of the line
| - OR
(a{2,3})* - matches 2 to 3 occurrences of a, as many as possible. The * at the end would repeat it 0+ times before the end of the line
$ - asserts the end of the line
Try this short regex:
a{2,3}(?!a([^a]|$))
Demo
How it's made:
I started with this simple regex: a{2}a?. It looks for 2 consecutive a's that may be followed by another a. If the 2 a's are followed by another a, it matches all three a's.
This worked for most cases. However, it failed in cases like aaaa, where it matched aaa and left a single a behind.
So now, I knew I had to modify my regex in such a way that it would match the third a only if the third a is not followed by a([^a]|$). So now, my regex looked like a{2}a?(?!a([^a]|$)), and it worked for all cases. Then I just simplified it to a{2,3}(?!a([^a]|$)).
That's it.
EDIT
If you want the capturing behavior, then add parentheses around the regex, like:
(a{2,3}(?!a([^a]|$)))
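The step from the first attempt to the final regex can be replayed in Python. This is only a sketch: FIRST_TRY, FINAL and chunks are names I made up, and the inputs are the ones from the question.

```python
import re

FIRST_TRY = re.compile(r"a{2}a?")
FINAL = re.compile(r"a{2,3}(?!a([^a]|$))")

def chunks(pattern, text):
    # Use finditer and group(0): FINAL has a capturing group inside the
    # lookahead, so findall would return that group instead of the match.
    return [m.group(0) for m in pattern.finditer(text)]

print(chunks(FIRST_TRY, "aaaa"))  # ['aaa'], the last a is left over
print(chunks(FINAL, "aaaa"))      # ['aa', 'aa']
print(chunks(FINAL, "aaaaa"))     # ['aaa', 'aa']
print(chunks(FINAL, "aaaaaa"))    # ['aaa', 'aaa']
```

On aaaa the lookahead rejects the three-character match, so the engine backtracks to two characters and leaves two for the next match.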

Strange behavior of regex

Background:
I need to identify a pair of numbers separated by a hyphen (-), the numbers can optionally include +/- and can be decimal.
So below are examples of that:
3-4, +3-+4, .3-.4, 0.3-0.4, -0.3--0.4, 0.3--0.4 etc...
I was using below expression:
(-?\+?\d*.?\d*)-(-?\+?\d*.?\d*)
It works well in most cases but fails in below:
-0.3--0.4
The groups it forms are: -0.3- and 0.4
But if I replace it like:
(-?\+?\d*.?\d+)-(-?\+?\d*.?\d+), it works fine.
I am wondering what difference replacing the * with + is making?
We have used this in javascript.
The wrong capturing is accounted for by the fact that your patterns inside capturing groups (-?\+?\d*.?\d*) can match an empty string and - more importantly here - . matches any char, not only a dot. You must escape it to match a literal dot. Note how (-?\+?\d*.?\d*)-(-?\+?\d*.?\d*) matches 3-4, (the , is captured with Group 2 pattern .) and note Matches 5 and 6 where . matches a space and a hyphen.
Also, your -?\+? actually allows matching a -+ sequence of signs, which does not seem to be what you need. Just use an optional character class, [-+]?, instead.
So, you might want to use ([-+]?\d*\.?\d*)-([-+]?\d*\.?\d*) pattern, but I'd advise to make sure at least 1 digit is matched, and you may use ([-+]?\d*\.?\d+)-([-+]?\d*\.?\d+) pattern for it.
Details:
([-+]?\d*\.?\d+) - Group 1: a sequence of
[-+]? - an optional - or +
\d* - 0+ digits
\.? - an optional .
\d+/\d* - 1 or more digits (or 0 or more with *)
- - a hyphen
([-+]?\d*\.?\d+) - see above.
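The question is about JavaScript, but the same behavior can be reproduced with a short Python sketch, since these constructs work the same way in both engines. ORIGINAL and FIXED are just labels for the question's pattern and the suggested one.

```python
import re

ORIGINAL = re.compile(r"(-?\+?\d*.?\d*)-(-?\+?\d*.?\d*)")
FIXED = re.compile(r"([-+]?\d*\.?\d+)-([-+]?\d*\.?\d+)")

# With the unescaped dot, Group 2 swallows the comma after the pair.
print(ORIGINAL.search("3-4,").groups())  # ('3', '4,')

# The fixed pattern splits every example from the question as intended.
pairs = ["3-4", "+3-+4", ".3-.4", "0.3-0.4", "-0.3--0.4", "0.3--0.4"]
for pair in pairs:
    print(pair, FIXED.fullmatch(pair).groups())
```

For -0.3--0.4 the fixed pattern should give -0.3 and -0.4, because each group now has to end in a digit and the dot can only match a literal dot.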