REGEX - how to match exactly 3 words from url? - regex

I would like a regex that matches exactly 3 words in the search phrase of a URL, but does not match 4 or more. The URL can have some variations, as shown below. The regex should match and not match the following examples:
SHOULD MATCH:
https://example.com/search=any%20url%20encoded_word-here
https://example.com/search=any%20url%20encoded_word-here%20
https://example.com/search=z%C5%82oty%20z%C5%82oty%20z%C5%82oty
https://example.com/search=z%C5%82oty%20z%C5%82ota%20%C5%82ata
https://example.com/search=any%20%20word%20%20here
https://example.com/search=any%20word%20here&color=blue
https://example.com/search=any-1st%20word_2nd%20here3
SHOULD NOT MATCH:
https://example.com/search=one%20two%20three%20four
https://example.com/search=one%20%20two%20%20three%20%20four
https://example.com/search=one%20%20two%20three%20%20four
https://example.com/search=one%20two%20three%20four&color=blue
https://example.com/search=z%C5%82oty%20z%C5%82oty%20z%C5%82oty%20word
I started here: https://regex101.com/r/0qzCJV/1 but I have no idea how to reject a match based on a condition. Can you pls help me guys?

You may use this regex with a negative lookahead to fail the match when there are 3 %20 followed by at least 1 more character:
^(?!(?:.+?%20){3}.)(?:.+?%20){2}.+?(?:%20)?$
RegEx Demo
RegEx Details:
^: Start
(?!(?:.+?%20){3}.): Negative lookahead to fail the match when we have 3 occurrences of %20 followed by at least 1 character
(?:.+?%20){2}: Match 1+ of any characters followed by %20. Repeat this match 2 times to match 2 words
.+?: Match 1+ of any characters
(?:%20)?: Match optional %20 before end
$: End
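As a quick check, the pattern above can be run against the question's examples. This is a minimal sketch in Python's `re` module (the answer itself is not tied to Python); the names `THREE_WORDS` and `is_three_words` are illustrative and not part of the original answer.

```python
import re

# Pattern copied verbatim from the answer above.
THREE_WORDS = re.compile(r"^(?!(?:.+?%20){3}.)(?:.+?%20){2}.+?(?:%20)?$")

BASE = "https://example.com/search="

should_match = [
    BASE + "any%20url%20encoded_word-here",
    BASE + "any%20url%20encoded_word-here%20",
    BASE + "z%C5%82oty%20z%C5%82oty%20z%C5%82oty",
    BASE + "z%C5%82oty%20z%C5%82ota%20%C5%82ata",
    BASE + "any%20%20word%20%20here",
    BASE + "any%20word%20here&color=blue",
    BASE + "any-1st%20word_2nd%20here3",
]

should_not_match = [
    BASE + "one%20two%20three%20four",
    BASE + "one%20%20two%20%20three%20%20four",
    BASE + "one%20%20two%20three%20%20four",
    BASE + "one%20two%20three%20four&color=blue",
    BASE + "z%C5%82oty%20z%C5%82oty%20z%C5%82oty%20word",
]


def is_three_words(url: str) -> bool:
    """Return True when the pattern accepts the URL."""
    return THREE_WORDS.search(url) is not None


for url in should_match + should_not_match:
    print(is_three_words(url), url)
```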
Or use possessive quantifier to reduce backtracking:
^(?!(?:.+?%20){3}+.)(?:.+?%20){2}.+?(?:%20)?$

Try this:
^(((?!%20).)*(%20)+){2}((?!%20).)*(%20)?$
See live demo.
This uses a negative lookahead to match every character up to the next %20, then one or more %20, and repeats all of that twice. It then finishes with any characters that do not start a %20, optionally followed by a single %20 at the end.
Note: your non-matching examples did not include URLs with fewer than 3 words, e.g.
https://example.com/search=one%20two
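The fewer-than-3-words case mentioned in the note can be checked directly. The following is a small Python sketch, assuming the pattern is used as-is with the `re` module; `TEMPERED` and `accepts` are hypothetical names chosen for illustration.

```python
import re

# Pattern copied verbatim from this answer.
TEMPERED = re.compile(r"^(((?!%20).)*(%20)+){2}((?!%20).)*(%20)?$")

BASE = "https://example.com/search="


def accepts(url: str) -> bool:
    """Return True when the pattern matches the whole URL."""
    return TEMPERED.search(url) is not None


# Two words: the second repetition of the group never finds its %20.
print(accepts(BASE + "one%20two"))
# Three words, with single or doubled separators.
print(accepts(BASE + "any%20url%20encoded_word-here"))
print(accepts(BASE + "any%20%20word%20%20here"))
# Four words: text is left over before the end anchor.
print(accepts(BASE + "one%20two%20three%20four"))
```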

Related

Regex to allow groups of 7 numbers and spaces

I'm looking for help with a regex.
My input field should allow only groups of up to 7 digits, and an unlimited number of spaces, whether at the beginning, middle, or end.
Here are a few examples of valid matches
Match1:
478 2635478 14587 9652
Match2 (spaces at the end):
14 2 55586
I tried this regex
^( )*[0-9]{1,7}(( )*[0-9]{1-7})*( )*$
It matches when the group is 8 digits.
Converting my comment to answer so that solution is easy to find for future visitors.
You may use this regex:
^ *[0-9]{1,7}(?: +[0-9]{1,7})* *$
RegEx Demo
RegEx Breakup:
^: Start
 *: Match 0 or more spaces
[0-9]{1,7}: Match 1 to 7 digits
(?: +[0-9]{1,7})*: Match 1+ spaces followed by a match of 1 to 7 digits. Repeat this group 0 or more times
 *: Match 0 or more spaces
$: End
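To see the breakup in action, here is a minimal Python sketch that applies the pattern to the question's samples and to an 8-digit group. The helper name `is_valid_groups` is an assumption made for this example.

```python
import re

# Pattern copied from the answer: groups of 1-7 digits separated by 1+ spaces.
GROUPS = re.compile(r"^ *[0-9]{1,7}(?: +[0-9]{1,7})* *$")


def is_valid_groups(text: str) -> bool:
    """Return True when every digit group has at most 7 digits."""
    return GROUPS.fullmatch(text) is not None


print(is_valid_groups("478 2635478 14587 9652"))  # sample from the question
print(is_valid_groups("14 2 55586   "))           # trailing spaces
print(is_valid_groups("12345678"))                # 8 digits in one group
```

Requiring at least one space (` +`) between groups is what stops an 8-digit run from being read as a 7-digit group followed by a 1-digit group.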
An idea with one group and use of a word boundary to separate blocks:
^ *(?:\d{1,7}\b *)+$
See this demo at regex101 (more explanation on the right side)
\b will require a space or the end after each \d{1,7} repetition.
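One way to gain confidence in the word-boundary variant is to compare it with the explicit-space pattern `^ *[0-9]{1,7}(?: +[0-9]{1,7})* *$` on a few inputs. This is a sketch in Python; the sample list is made up for illustration. Note that in Python `\d` also matches non-ASCII digits, so the two patterns could differ on such input.

```python
import re

EXPLICIT = re.compile(r"^ *[0-9]{1,7}(?: +[0-9]{1,7})* *$")
BOUNDARY = re.compile(r"^ *(?:\d{1,7}\b *)+$")

# Hypothetical samples covering valid and invalid inputs.
samples = [
    "478 2635478 14587 9652",
    "14 2 55586   ",
    "   7",
    "12345678",
    "1234567 12345678",
    "12a 34",
    "",
]

# For each sample, record what both patterns decide.
results = [
    (s, EXPLICIT.fullmatch(s) is not None, BOUNDARY.fullmatch(s) is not None)
    for s in samples
]

for text, explicit_ok, boundary_ok in results:
    print(repr(text), explicit_ok, boundary_ok)
```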

Match with optional positive lookahead

I've got 2 strings in the format:
Some_thing_here_1234 Match Me 1 & 1234 Match Me 1_1
In both cases I want the resultant match to be 1234 Match Me 1
So far I've got (?<=^|_)\d{4}\s.+ which works but in the case of string 2 also captures the _1 at the end. I thought I could use a lookahead at the end with an optional such as (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) but it always seems to revert to the second option and so the _1 gets through.
Any help would be great
You can use
(?<=^|_)\d{4}\s[^_]+
See the regex demo.
Details:
(?<=^|_) - a positive lookbehind that matches a location that is immediately preceded with either start of string or a _ char (equal to (?<![^_]))
\d{4} - four digits
\s - a whitespace
[^_]+ - one or more chars other than _.
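A minimal Python sketch of this pattern follows. Python's `re` module rejects the variable-width lookbehind `(?<=^|_)`, so the sketch uses the equivalent `(?<![^_])` form mentioned in the details above; the function name `extract` is illustrative.

```python
import re
from typing import Optional

# (?<![^_]) means "not preceded by a char other than _", i.e. start of string or _.
PATTERN = re.compile(r"(?<![^_])\d{4}\s[^_]+")


def extract(text: str) -> Optional[str]:
    """Return the first match, or None when nothing matches."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


print(extract("Some_thing_here_1234 Match Me 1"))
print(extract("1234 Match Me 1_1"))
```

Both calls print `1234 Match Me 1`, because `[^_]+` stops at the first underscore after the whitespace.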
Your second pattern (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) is greedy and at the end of the string the second alternative |$ will match so you will keep matching the whole line.
Note that you can omit {1}.
If you want to use an optional part in the lookahead, you can make the match non-greedy and optionally match _\d in the lookahead, followed by the end of the string.
(?<=^|_)\d{4}\s.+?(?=(?:_\d)?$)
See a regex demo.
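The difference between the greedy and the non-greedy version can be shown side by side. This is a sketch in Python, which does not accept `(?<=^|_)`, so both patterns use the equivalent `(?<![^_])` lookbehind instead; `GREEDY` and `LAZY` are names chosen for this example.

```python
import re

# Greedy version from the question (with {1} omitted): .+ runs to the end,
# where the |$ alternative of the lookahead succeeds.
GREEDY = re.compile(r"(?<![^_])\d{4}\s.+(?=_\d$|$)")

# Non-greedy version: .+? stops at the first position where the
# lookahead (optional _\d, then end of string) succeeds.
LAZY = re.compile(r"(?<![^_])\d{4}\s.+?(?=(?:_\d)?$)")

first = "Some_thing_here_1234 Match Me 1"
second = "1234 Match Me 1_1"

print(GREEDY.search(second).group(0))  # keeps the trailing _1
print(LAZY.search(second).group(0))    # drops the trailing _1
print(LAZY.search(first).group(0))
```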

Regex TRYING to search with multiple criteria or backwards

Appreciating regex but still beginning.
I tried many workarounds but can't figure how to solve my problem.
String A : 4 x 120glgt
String B : 120glgt
I'd like the proper regex to return 120 as the number after "x".
But sometimes there won't be "x". So, be it [A] or [B] looking for one unique approach.
I tried :
to start the search from the END
Start right after the "x"
I clearly have some syntax issues and didn't quite get the logic of (?=)
(?=[^x])(?=[0-9]+)
So looking forward to learn with your help
As you tagged pcre, you could optionally match the leading digits followed by x and use \K to clear the match buffer to only match the digits after it.
^(?:\d+\h*x\h*)?\K\d+
The pattern matches:
^ Start of string
(?:\d+\h*x\h*)? Optionally match 1+ digits followed by x between optional spaces
\K Forget what is matched so far
\d+ Match 1+ digits
See a regex demo.
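In engines without `\K` and `\h` (Python's `re` module is one of them), a capture group can play the same role. The sketch below is an adaptation, not the answer's PCRE pattern: `[ \t]` is an assumed stand-in for `\h`, and `number_after_x` is a hypothetical helper name.

```python
import re
from typing import Optional

# Optional "digits, x" prefix, then capture the digits that follow.
PATTERN = re.compile(r"^(?:\d+[ \t]*x[ \t]*)?(\d+)")


def number_after_x(text: str) -> Optional[str]:
    """Return the digits after an optional leading 'N x', or None."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


print(number_after_x("4 x 120glgt"))  # string A
print(number_after_x("120glgt"))      # string B
```

Both calls print `120`: in string B the optional prefix fails to find an `x`, so the engine skips it and captures the leading digits.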
If you want to use a lookahead variant, you might use
\d+(?=[^\r\n\dx]*$)
This pattern matches:
\d+ Match 1+ digits
(?= Positive lookahead, assert what is to the right is
[^\r\n\dx]*$ Match optional repetitions of any char except a digit, x or a newline, up to the end of the string
) Close the lookahead
See another regex demo.
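The lookahead variant uses no PCRE-only syntax, so it can be tried unchanged in other engines. A minimal sketch in Python, with `last_number` as an illustrative name:

```python
import re
from typing import Optional

# Digits that are followed only by non-digit, non-x characters up to the end.
PATTERN = re.compile(r"\d+(?=[^\r\n\dx]*$)")


def last_number(text: str) -> Optional[str]:
    """Return the digits with no later digit or 'x' after them, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


print(last_number("4 x 120glgt"))  # the leading 4 is rejected: an x follows it
print(last_number("120glgt"))
```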

I wrote url validation regex but the regex is very slow

I know this is slow because of ([\.\-][a-z0-9])*. But I don't know how to optimize it.
^https:\/\/([a-z0-9]+([\.\-][a-z0-9])*)+(\.([a-z]{2,11}|[0-9]{1,5}))(:[0-9]{1,5})?(\/.*)?$
You don't have to use this part )*)+ in your pattern. This could also potentially lead to catastrophic backtracking.
Note that you only have to escape the forward slash if the delimiters for the regex are also /, and you don't have to escape the dot and the hyphen in [\.\-]
If you don't need the capture groups afterwards, you can omit them.
^https:\/\/[a-z0-9]+(?:[.-][a-z0-9]+)*\.(?:[a-z]{2,11}|[0-9]{1,5})(?::[0-9]{1,5})?(\/.*)?$
The pattern matches:
^ Start of string
https:\/\/ Match https:// As you only want to match https
[a-z0-9]+ Match 1+ times any of the listed
(?:[.-][a-z0-9]+)* Optionally repeat matching . or - and 1+ times any of the listed
\.(?:[a-z]{2,11}|[0-9]{1,5}) Match a dot followed by either 2-11 chars a-z or 1-5 digits
(?::[0-9]{1,5})? Optionally match : and 1-5 digits
(\/.*)? Optionally match / and the rest of the line
$ End of string
Regex demo
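The rewritten pattern can be exercised with a few URLs, including a long input that never reaches a valid ending. This is a sketch in Python; the sample URLs and the name `is_valid_url` are assumptions made for this example, and the original slow pattern is deliberately not run here.

```python
import re

# Pattern copied from the answer; \/ is harmless in Python, which has no delimiters.
URL = re.compile(
    r"^https:\/\/[a-z0-9]+(?:[.-][a-z0-9]+)*"
    r"\.(?:[a-z]{2,11}|[0-9]{1,5})(?::[0-9]{1,5})?(\/.*)?$"
)


def is_valid_url(url: str) -> bool:
    """Return True when the URL fits the answer's pattern."""
    return URL.search(url) is not None


print(is_valid_url("https://example.com"))
print(is_valid_url("https://sub.example.co.uk:8080/path?q=1"))
print(is_valid_url("http://example.com"))  # not https
print(is_valid_url("https://example"))     # no dot-separated ending

# A long run with an invalid tail: without the nested )*)+ quantifiers the
# engine only gives back one character at a time before failing.
print(is_valid_url("https://" + "a" * 1000 + "!"))
```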

how to match a list of fixed length words separated by space or comma?

Each word's length can be 2 or 6-10, and the words can be separated by a space or a comma. The words contain only letters and are not case sensitive.
Here is the groups of words that should be matched:
RE,re,rereRE
Not matching groups:
RE,rere,rel
RE,RERE
Here is the pattern that I have tried
((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|\s+)?)
But unfortunately this pattern can match string like this: RE,RERE
It looks like the word boundary has not been set.
You could match chars a-z either 2 or 6-10 times using an alternation.
Then repeat that pattern 0+ times, each time preceded by a comma or a space [, ].
^(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*$
Explanation
^ Start of string
(?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match chars a-z 6-10 or 2 times
(?: Non capturing group
[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match comma or space and repeat previous pattern
)* Close non capturing group and repeat 0+ times
$ End of string
Regex demo
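The anchored pattern can be checked against the three groups from the question. A minimal Python sketch, with `is_word_list` as an illustrative name:

```python
import re

WORD = r"(?:[A-Za-z]{6,10}|[A-Za-z]{2})"

# One word, then any number of "separator + word" repetitions, anchored at both ends.
WORD_LIST = re.compile(rf"^{WORD}(?:[, ]{WORD})*$")


def is_word_list(text: str) -> bool:
    """Return True when every word has 2 or 6-10 letters."""
    return WORD_LIST.search(text) is not None


print(is_word_list("RE,re,rereRE"))  # 2, 2 and 6 letters
print(is_word_list("RE,rere,rel"))   # 4 and 3 letters are rejected
print(is_word_list("RE,RERE"))       # 4 letters are rejected
```

Because of the anchors, a 4-letter word cannot be split into two 2-letter matches: the second half would have to be preceded by a separator.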
If lookarounds are supported, you might also assert what is directly on the left and on the right is not a non whitespace character \S.
(?<!\S)(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[ ,](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*(?!\S)
Regex demo
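The lookaround variant is useful when the list is embedded in a longer line. The sketch below uses Python, where both lookarounds are supported; the sample sentence is made up for illustration.

```python
import re

WORD = r"(?:[A-Za-z]{6,10}|[A-Za-z]{2})"

# (?<!\S) and (?!\S) require whitespace or a string edge on both sides.
EMBEDDED = re.compile(rf"(?<!\S){WORD}(?:[ ,]{WORD})*(?!\S)")

# Hypothetical input: the surrounding words have 3 and 4 letters, so they do not qualify.
text = "see RE,re,rereRE here"
found = EMBEDDED.findall(text)
print(found)
```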
([a-zA-Z]{2}(,|\s)|[a-zA-Z]{6,10}|(,|\s))
This one will match only the words that have 2 letters, or between 6 and 10 letters
\b,?([a-zA-Z]{6,10}|[a-zA-Z]{2}),?\b
You can use this
^(?!.*\b[a-z]{4}\b)(?:(?:[a-z]{2}|[a-z]{6,10})(?:,|[ ]+)?)+$
Regex Demo
This regex will match your first case, but neither of your two other cases:
^((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|[ ]+|$))+$
I'm making the assumption here that each line should be a single match.
Here it is in action.