URL Match Regex: Positive and Negative terms

I'm trying to match both negative and positive terms in a URL-matching regex.
"Word1" and "Word2" are the negative ones, and "Word3" is the positive one. To match, the URL must contain the positive keyword and must not contain any of the negative ones.
https://example.net - don't match
https://example.net/word3 - match
https://example.net/word3/word2 - don't match
https://example.net/word3/word1 - don't match
Currently I'm excluding the homepage too, but with the positive match I think that it will be unnecessary.
^((?!(word1|word2|http(s?):\/\/example\.net\/?($|[?#][&=#+%0-9a-zA-Z]+$))).)*$
How can I combine positive and negative matches in a single regex?

If lookaheads are supported, you could use:
In your example you use both exemple and example, but I have used example here.
^(?!.*\bword[12]\b)(?=.*\bword3\b)https?://example\.net/word\d.*$
Explanation:
^ Assert start of the string
(?!.*\bword[12]\b) Negative lookahead to assert that neither word1 nor word2 appears anywhere to the right
(?=.*\bword3\b) Positive lookahead to assert that word3 appears somewhere to the right
https?://example\.net/word\d Match the start of the URL followed by /word and a digit
.* Match any character 0+ times
$ Assert end of the string
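To check the lookahead pattern against the URLs from the question, here is a small JavaScript sketch (JavaScript because it is the language used elsewhere on this page). The helper name `isWanted` is hypothetical, and the forward slashes are escaped only because the pattern is written as a regex literal.

```javascript
// Lookahead pattern from the answer above; "/" is escaped for the regex literal.
const lookaheadRe = /^(?!.*\bword[12]\b)(?=.*\bword3\b)https?:\/\/example\.net\/word\d.*$/;

// Hypothetical helper: true when the URL contains word3 and neither word1 nor word2.
function isWanted(url) {
  return lookaheadRe.test(url); // no g flag, so test() keeps no state between calls
}

const samples = [
  "https://example.net",
  "https://example.net/word3",
  "https://example.net/word3/word2",
  "https://example.net/word3/word1",
];

for (const url of samples) {
  console.log(url, isWanted(url) ? "match" : "don't match");
}
```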

You can use the following regex:
^(?=.*word3)(?!.*word(?:1|2)).*$
It starts by anchoring at the start of the string, then uses a positive lookahead for 'word3', then a negative lookahead for 'word' followed by either 1 or 2.
Finally, it matches the rest of the string (if both lookaheads have passed).
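The shorter pattern can be exercised the same way. This JavaScript sketch reuses the sample URLs; the `keyword3` and `word10` URLs are made-up inputs that show a side effect of leaving out word boundaries: `word3` also counts when it is part of a longer segment, and `word1` is also rejected inside `word10`.

```javascript
// Pattern from the answer above: "word3" must appear, "word1"/"word2" must not.
const simpleRe = /^(?=.*word3)(?!.*word(?:1|2)).*$/;

console.log(simpleRe.test("https://example.net"));              // false
console.log(simpleRe.test("https://example.net/word3"));        // true
console.log(simpleRe.test("https://example.net/word3/word2"));  // false
// No word boundaries, so substrings count as well (hypothetical URLs):
console.log(simpleRe.test("https://example.net/keyword3"));     // true
console.log(simpleRe.test("https://example.net/word3/word10")); // false
```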

Related

Regex match an entire number with lookbehind and lookahead logic (without word boundaries)

I am trying to detect whether a string contains a number, based on a few conditions. For example, I don't want to match the number if it's surrounded by braces (or other brackets), and I'm using a lookahead and a lookbehind to do this, but I'm running into issues when the number contains multiple digits. Also, the number can be embedded in text without any space separators.
My regex:
(?https://regex101.com/r/RnTSMJ/1
Sample inputs:
{2}: Should NOT Match. //My regex Works
{34: Should NOT Match. //My regex matches 4 in {34
45}: Should NOT Match. //My regex matches 4 in 45}
{123}: Should NOT Match. //My regex matches 2 in {123}
I looked at Regex.Match whole words, but that approach doesn't work for me. If I use word boundaries, the cases above work as expected, but cases like the ones below, where numbers are surrounded by text, don't. I also want to add some additional logic, e.g. don't match specific strings like 1st, 2nd, etc., or #1, #2, etc.
Updated regex:
(?<!\[|\{|\(|#)(\b\d+\b)(?!\]|\}|\|st|nd|rd|th)
See here https://regex101.com/r/DhE3K4/4
123abd //should match 123
abc345 //should match 345
ab2123cd // should match 2123
Is this possible with pure regex or do I need something more comprehensive?
You could match 1 or more digits, asserting that what is on the left is not {, # or a digit, and that what is on the right is not }, a digit, or one of the suffixes st, nd, rd or th:
(?<![{\d#])\d+(?![\d}]|st|nd|rd|th)
Explanation
(?<![{\d#]) Negative lookbehind, assert what is on the left is not {, # or a digit
\d+ Match 1+ digits
(?! Negative lookahead, assert what is on the right is not
[\d}]|st|nd|rd|th Match a digit, } or any of the alternatives
) Close lookahead
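A minimal JavaScript sketch of the first pattern, run over the sample inputs from the question plus 1st and #1. It assumes an engine with lookbehind support (ES2018 or later, e.g. a current Node.js); `findNumbers` is a hypothetical helper name.

```javascript
// Pattern from the answer above, with the g flag so String.prototype.match returns every hit.
const numberRe = /(?<![{\d#])\d+(?![\d}]|st|nd|rd|th)/g;

// Hypothetical helper: all matched numbers, or an empty array when there are none.
function findNumbers(text) {
  return text.match(numberRe) || [];
}

const inputs = ["{2}", "{34", "45}", "{123}", "123abd", "abc345", "ab2123cd", "1st", "#1"];
for (const s of inputs) {
  console.log(s, findNumbers(s));
}
```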
The following regex gives the expected result:
(?<![#\d{])\d+(?!\w*(?:}|(?:st|th|rd|nd)\b))
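The second pattern can be run over the same inputs. In this JavaScript sketch the `}` inside the lookahead is written as `\}`, which matches the same character and is also valid with the `u` flag. The extra input `1stuff` is made up: because of the `\b` after the suffix, only a standalone ordinal suffix blocks the match here, whereas the first pattern also rejects `1stuff`. `findNumbers2` is a hypothetical helper name.

```javascript
// Pattern from the second answer; "}" is escaped as \} but means the same thing.
const numberRe2 = /(?<![#\d{])\d+(?!\w*(?:\}|(?:st|th|rd|nd)\b))/g;

// Hypothetical helper: all matched numbers, or an empty array when there are none.
function findNumbers2(text) {
  return text.match(numberRe2) || [];
}

for (const s of ["{2}", "{34", "45}", "{123}", "123abd", "abc345", "ab2123cd", "1st", "#1", "1stuff"]) {
  console.log(s, findNumbers2(s));
}
```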

Regular Expression to extract alphanumeric parts of a URL?

Given any URL, like:
https://stackoverflow.com/v1/summary/1243PQ/details/P1/9981
How do I extract the numeric or alphanumeric parts of the URL, i.e. the following strings from the URL given above:
1. v1
2. 1243PQ
3. P1
4. 9981
To rephrase: I need a regex that extracts the '/'-separated parts of a string (URL) that contain at least 1 digit and 0 or more letters.
I tried to capture a repeating group with (^[a-zA-Z0-9]+)+ and ([a-zA-Z]{0,100}[0-9]{1,100})+, but it didn't work. In hindsight, my intuition says this shouldn't work. I am unsure how to match patterns over a group and not just a single character.
If I understand what you really want, i.e. extracting parts that contain only digits, or digits with letters before or after them, then I can suggest this regex:
\b[a-zA-Z]*[0-9]+[a-zA-Z]*\b
I use \b to assert a word boundary, i.e. the start or end of a part.
Digits are required and letters may come before or after them, which is what the regex above expresses.
If letters and digits may be mixed in any order (e.g. a1b2), I can suggest this regex instead:
\b[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*\b
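Both patterns from this answer can be tried on the example URL with the `g` flag. This JavaScript sketch uses `[a-zA-Z]` everywhere (the `[a-zA-z]` range would also match characters such as `_`, `[` and `^`), and the `/a1b2/` input is a made-up segment that shows where the two patterns differ.

```javascript
const exampleUrl = "https://stackoverflow.com/v1/summary/1243PQ/details/P1/9981";

// First pattern: optional letters, at least one digit, optional letters.
const lettersDigitsLetters = /\b[a-zA-Z]*[0-9]+[a-zA-Z]*\b/g;
// Second pattern: any run of letters/digits that contains at least one digit.
const anyAlnumWithDigit = /\b[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*\b/g;

console.log(exampleUrl.match(lettersDigitsLetters)); // [ 'v1', '1243PQ', 'P1', '9981' ]
console.log(exampleUrl.match(anyAlnumWithDigit));    // [ 'v1', '1243PQ', 'P1', '9981' ]

// A hypothetical segment mixing letters and digits shows the difference:
console.log("/a1b2/".match(lettersDigitsLetters)); // null
console.log("/a1b2/".match(anyAlnumWithDigit));    // [ 'a1b2' ]
```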
I believe this should work for you:
(\d*\w+\d+\w*)
EDIT: actually, this should be sufficient:
(\w+\d+\w*)
or
(\w*\d+\w*)
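The difference between the `\w+\d+\w*` and `\w*\d+\w*` variants shows up with a single-digit part. The path in this JavaScript sketch is hypothetical. Note also that `\w` includes `_`, so a part such as `a_1` would be returned as a whole.

```javascript
// Hypothetical path containing a one-character part "5".
const path = "/v1/summary/5/P1";

// \w+ requires at least one word character before the digits, so a lone digit fails.
const atLeastTwoChars = /\w+\d+\w*/g;
// \w* allows nothing before the digits, so any run containing a digit matches.
const atLeastOneDigit = /\w*\d+\w*/g;

console.log(path.match(atLeastTwoChars)); // [ 'v1', 'P1' ]
console.log(path.match(atLeastOneDigit)); // [ 'v1', '5', 'P1' ]
```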
Well, you could do this:
(\w*\d+\w*) with the g (global) regex option
On the example URL, it would look like this:
const regex = /(\w*\d+\w*)/g;
const url = 'https://stackoverflow.com/v1/summary/1243PQ/details/P1/9981';
console.log(url.match(regex)); // [ 'v1', '1243PQ', 'P1', '9981' ]
Try \/[a-zA-Z]*\d+[a-zA-Z0-9]*
Explanation:
\/ - match / literally
[a-zA-Z]* - 0+ letters
\d+ - 1+ digits - thanks to this, we require at least one digit
[a-zA-Z0-9]* - 0+ letters or digits
Each match will include the leading /, so you need to trim it.
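Trimming the leading slash takes one extra step in JavaScript. A minimal sketch, assuming the example URL from the question; `slice(1)` simply drops the first character of each match.

```javascript
const sampleUrl = "https://stackoverflow.com/v1/summary/1243PQ/details/P1/9981";

// Pattern from the answer above; every match starts with the "/" separator.
const withSlash = /\/[a-zA-Z]*\d+[a-zA-Z0-9]*/g;

const raw = sampleUrl.match(withSlash) || []; // [ '/v1', '/1243PQ', '/P1', '/9981' ]
const parts = raw.map((m) => m.slice(1));     // drop the leading "/"
console.log(parts); // [ 'v1', '1243PQ', 'P1', '9981' ]
```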

Using regex to determine a straight (unordered hand)

A straight in poker is five cards in a row, for example 23456 or 89TJQ. With a "sorted" hand, the regex could be written as:
^(A2345|23456|34567|45678|56789|6789T|789TJ|89TJQ|9TJQK|TJQKA)$
It's a bit verbose but straightforward enough. However, would it be possible to generate a (sensible) regex if the hand was unordered? For example, if the hand was 52634 or JQ89T?
One possible way would be to use a (?=.*<item>) lookahead per card (which is essentially order-independent), for example:
^(?:
(?=.*A)(?=.*2)(?=.*3)(?=.*4)(?=.*5)
|(?=.*2)(?=.*3)(?=.*4)(?=.*5)(?=.*6)
|(?=.*3)(?=.*4)(?=.*5)(?=.*6)(?=.*7)
|(?=.*4)(?=.*5)(?=.*6)(?=.*7)(?=.*8)
|(?=.*5)(?=.*6)(?=.*7)(?=.*8)(?=.*9)
|(?=.*6)(?=.*7)(?=.*8)(?=.*9)(?=.*T)
|(?=.*7)(?=.*8)(?=.*9)(?=.*T)(?=.*J)
|(?=.*8)(?=.*9)(?=.*T)(?=.*J)(?=.*Q)
|(?=.*9)(?=.*T)(?=.*J)(?=.*Q)(?=.*K)
|(?=.*T)(?=.*J)(?=.*Q)(?=.*K)(?=.*A)
)
.{5}$
Are there other / better approaches to finding if a straight exists using regex only?
You can use the following regex:
(?!.*(.).*\1)(?:[A2345]{5}|[23456]{5}|[34567]{5}|[45678]{5}|[56789]{5}|[6789T]{5}|[789TJ]{5}|[89TJQ]{5}|[9TJQK]{5}|[TJQKA]{5})
This works by first using a negative lookahead, (?!.*(.).*\1), to ensure that the string doesn't contain any duplicate characters. Then it matches 5 characters from any one of the straight sets.
 (?!.*(.).*\1)
#^^^         ^  negative lookahead ensuring what follows doesn't match
#   ^^          match any character any number of times
#     ^^^       capture a character into capture group #1
#        ^^     match any character any number of times
#          ^^   match the same text as most recently matched by the 1st capture group
Against JQQ89, it works as follows:
- .* matches J
- (.) captures Q
- .* matches nothing
- \1 tries to match Q (and succeeds)
- The lookahead's inner pattern matches, so the negative lookahead fails, and so does the overall match.
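A JavaScript sketch of the duplicate check plus character classes. Note the assumption: `^` and `$` are added here (the answer's pattern has no anchors) so that longer inputs, such as a six-character string that contains a straight, are rejected; `isStraight` is a hypothetical name. `KA234` is included to show that wrap-around sequences are not accepted.

```javascript
// Answer's pattern, anchored with ^ and $ (an assumption) so the whole hand must be a straight.
const straightRe =
  /^(?!.*(.).*\1)(?:[A2345]{5}|[23456]{5}|[34567]{5}|[45678]{5}|[56789]{5}|[6789T]{5}|[789TJ]{5}|[89TJQ]{5}|[9TJQK]{5}|[TJQKA]{5})$/;

// Hypothetical helper name.
function isStraight(hand) {
  return straightRe.test(hand);
}

for (const hand of ["52634", "JQ89T", "JQQ89", "A2345", "KA234", "523467"]) {
  console.log(hand, isStraight(hand));
}
```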

Regex not returning all matches

I have the following regex (my real regex is a lot more complex, but I pinned my problem down to this): \s(?<number>123|456)\s
And the following test data:
" 123 456 "
I expected the regex to produce 2 matches: one with "number" being "123" and a second with "number" being "456". However, I'm only getting 1 match, with "number" being "123".
I did notice that adding another space between "123" and "456" in the test data does give 2 matches...
Why don't I get the result I want? How to get it right?
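Your pattern contains consuming \s patterns that match a whitespace char before and after a number, and the input contains consecutive numbers separated by a single whitespace. The first match consumes the space after 123, so there is no space left in front of 456 for the leading \s to match. If there were two spaces between the numbers, it would work.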
Use whitespace boundaries based on lookarounds:
(?<!\S)(?<number>123|456)(?!\S)
The (?<!\S) is a negative lookbehind that will fail the match if there is a non-whitespace char immediately to the left of the current location, and (?!\S) is a negative lookahead that will fail the match if there is a non-whitespace char immediately to the right of the current location.
(?<!\S) is the same as (?<=^|\s) and (?!\S) is the same as (?=$|\s), but more efficient.
Note that in many situations you can even get away with a single lookahead:
\s(?<number>123|456)(?!\S)
This ensures that consecutive whitespace-separated matches are found, although each match still starts with the consumed leading whitespace char.
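The three variants can be compared with `String.prototype.matchAll` and the named group in JavaScript (named groups and lookbehind need ES2018 or later). A minimal sketch on the test data from the question; `numbers` is a hypothetical helper name.

```javascript
const input = " 123 456 ";

// Original pattern: the trailing \s of match 1 uses up the space before 456.
const consuming = /\s(?<number>123|456)\s/g;
// Whitespace boundaries built from lookarounds: nothing around the number is consumed.
const boundaries = /(?<!\S)(?<number>123|456)(?!\S)/g;
// Single-lookahead variant: the leading \s is consumed, the trailing one is only checked.
const oneLookahead = /\s(?<number>123|456)(?!\S)/g;

// Collect the "number" group from every match.
const numbers = (re) => [...input.matchAll(re)].map((m) => m.groups.number);

console.log(numbers(consuming));    // [ '123' ]
console.log(numbers(boundaries));   // [ '123', '456' ]
console.log(numbers(oneLookahead)); // [ '123', '456' ]
```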

Global regex with multiple matches, where the separator should be shared in several matches

First of all, sorry for the unclear title; it's hard to describe (and hard to search for an existing solution, for the same reason).
I use this regex in JavaScript to collect numbers in a string:
/(?:^|[^\d])([\d]+)(?:$|[^\d])/g
Executing it on "5358..2145" returns 2 matches, where the submatches are "5358" and "2145"
But if I use it on "5358.2145", I receive only 1 match: "5358"
As I understand it:
The first match found is "5358.", so the dot is consumed by the first match
The number I want as the second match is not preceded by the start of the string or by the dot, because that dot already belongs to the first match
How can I change my pattern to find all numbers separated by 1 non-digit character?
Use a negative lookahead at the end:
/(?:^|\D)(\d+)(?!\d)/g
The pattern matches:
(?:^|\D) - either start of string (^) or any non-digit char (\D)
(\d+) - Group 1: one or more digits
(?!\d) - the negative lookahead failing the match if there is a digit immediately to the right of the current location.
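A JavaScript sketch comparing the two patterns on the inputs from the question. The capture group holds the number, while the full match may include a leading separator; `group1` is a hypothetical helper.

```javascript
// Original pattern: the trailing (?:$|[^\d]) consumes the separator.
const original = /(?:^|[^\d])([\d]+)(?:$|[^\d])/g;
// Answer's pattern: the trailing check is a lookahead, so the separator stays available.
const fixed = /(?:^|\D)(\d+)(?!\d)/g;

// Return capture group 1 of every match.
const group1 = (re, s) => [...s.matchAll(re)].map((m) => m[1]);

console.log(group1(original, "5358..2145")); // [ '5358', '2145' ]
console.log(group1(original, "5358.2145"));  // [ '5358' ]
console.log(group1(fixed, "5358.2145"));     // [ '5358', '2145' ]
```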