Regex to capture string with multiple optional words - regex

I'm using the Overpass API's regex support, but I'm unsure which flavour it uses.
I want to capture these strings:
"Footpath"
"Public Footpath"
"Footpath No. 27001"
"Public Footpath No. 125"
"Footpath #424"
"Public Footpath #5"
The following pattern fails to match the first two strings:
^(Public)?Footpath (No\. |#)?[0-9]
How do I make the "No." / "#" part optional?
I've tried variations on wrapping them in brackets, but to no avail, e.g.
^(Public)?Footpath ((No\. |#)?[0-9])?
I'm afraid I'm out of my depth.

You may use this regex with multiple optional non-capturing groups:
^(?:Public )?Footpath(?: No\.)?(?: #?[0-9]+)?$
RegEx Details:
^: Start
(?:Public )?: Match Public followed by a space in an optional non-capturing group
Footpath: Match Footpath
(?: No\.)?: Match a space followed by No. in an optional non-capturing group
(?: #?[0-9]+)?: Match a space followed by an optional # and 1+ digits in an optional non-capturing group
$: End
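As a quick sanity check, the pattern can be run against the six strings from the question. This is a minimal sketch using Python's `re` module, which is an assumption: the question does not say which flavour the Overpass API uses, so behaviour there may differ. The names `PATTERN`, `samples` and `matched` are illustrative only.

```python
import re

# Pattern from the answer; each optional part sits in its own non-capturing group.
PATTERN = re.compile(r"^(?:Public )?Footpath(?: No\.)?(?: #?[0-9]+)?$")

# The six strings listed in the question.
samples = [
    "Footpath",
    "Public Footpath",
    "Footpath No. 27001",
    "Public Footpath No. 125",
    "Footpath #424",
    "Public Footpath #5",
]

# Keep only the strings the pattern accepts.
matched = [s for s in samples if PATTERN.search(s)]

for s in samples:
    print(f"{s!r}: {bool(PATTERN.search(s))}")
```

Because every optional part carries its own leading space inside the group, the bare "Footpath" no longer needs a trailing space to match.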

Related

Validate string # followed by digits but # increases after every occurrence

I have a string that looks like this:
#123##1234###2356####69
It starts with # followed by digits. Each time the # run appears, it is one # longer than the previous one: one # the first time, two the second time, and so on.
A fixed regex like the one below works for a known length, but since I don't know how many times the pattern repeats, it's not very useful.
^#\d+##\d+###\d+$
I'm using PCRE regex engine, it allows recursion (?R) and conditions (?(1)...) etc.
Is there a regex to validate this pattern?
Valid
#123
#12##235
#1234##12###368
#1234##12###368####22235#####723356
Invalid
##123
#123###456
#123##456##789
I tried ^(?(1)(?|(#\1)|(#))\d+)+$ but it doesn't seem to work at all.
You can do this using PCRE conditional sub-pattern matching:
^(?:((?(1)\1)#)\d+)++$
RegEx Details:
^: Start
(?:: Start non-capture group
(: Start capture group #1
(?(1)\1): Conditional that matches back-reference \1 only if capture group #1 has already participated in the match; otherwise it matches the empty string
#: Match an additional #
): End capture group #1
\d+: Match 1+ digits
)++: End non-capture group. Match 1+ repetitions of this non-capture group possessively.
$: End
One option is to optionally match a backreference to group 1 inside group 1 itself, using a possessive quantifier (\1?+#), which adds one # on every iteration.
^(?:(\1?+#)\d+)++$
^ Start of string
(?: Non capture group
(\1?+#)\d+ Capture group 1: optionally (and possessively) match what group 1 has captured so far, then one more #; after the group, match 1+ digits
)++ Close the non capture group and repeat 1+ times possessively
$ End of string
I think you can use forward-referencing here:
^(?:((?:\1(?!^)|^)#)\d+)+$
Details:
^ - start of string
(?:((?:\1(?!^)|^)#)\d+)+ - one or more occurrences of
((?:\1(?!^)|^)#) - Group 1 (the \1 value): start of string or an occurrence of the Group 1 value if it is not at the string start position
\d+ - one or more digits
$ - end of string.
NOTE: This technique does not work in regex flavors that do not support forward referencing, like ECMAScript based flavors (e.g. JavaScript, VBA, C++ std::regex)
Although there are already working answers, I came up with this idea, inspired by Wiktor's answer:
(?:(^#|#\1)\d+)+$
It is also quite short and effective (and also works in non-PCRE environments).
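For comparison, here is a minimal sketch of the same validation in Python. It deliberately swaps in a different technique: Python's `re` module rejects a backreference to a group that is still open, so the self-referencing patterns above cannot be used there as written. Instead, a plain regex checks the overall shape and ordinary code checks the run lengths. The function name `is_increasing_hash_string` is hypothetical.

```python
import re


def is_increasing_hash_string(text: str) -> bool:
    """Return True if text looks like #d..##d..###d.. with runs of 1, 2, 3, ... hashes."""
    # Step 1: the whole string must consist of (hashes + digits) chunks only.
    if not re.fullmatch(r"(?:#+[0-9]+)+", text):
        return False
    # Step 2: the n-th run of '#' must contain exactly n characters.
    runs = re.findall(r"#+", text)
    return all(len(run) == n for n, run in enumerate(runs, start=1))


valid = ["#123", "#12##235", "#1234##12###368",
         "#1234##12###368####22235#####723356"]
invalid = ["##123", "#123###456", "#123##456##789"]

for s in valid + invalid:
    print(s, is_increasing_hash_string(s))
```

The valid and invalid lists are the examples given in the question.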

How to repeat a regex group until end of string

I have the following string
someaddres.com/?f=[B]a-test,a test,Test[C]test a,test2
I'm trying to pull two groups from it:
[B]a-test,a test,Test
and
[C]test a,test2
How would I repeat the capture group until a character not present in the group is found?
My current regex is: f=(\[[A-Z]\][a-zA-Z0-9,-\s]+)
You may use this regex with a captured group that will match twice:
(\[\w+\][^[]+)
If you want 2 capture groups in a single match, then use:
(\[\w+\][^[]+)(\[\w+\][^[]+)
RegEx Details:
(: Start capture group #1
\[: Match a [
\w+: Match 1+ word characters
\]: Match a ]
[^[]+: Match 1+ of any character that is not [
): End capture group #1
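The "match twice" behaviour can be seen with a find-all call. This is a small sketch in Python (an assumption, since the question does not name a language); the variable names `url` and `groups` are illustrative.

```python
import re

# Input string from the question.
url = "someaddres.com/?f=[B]a-test,a test,Test[C]test a,test2"

# Each match starts at a bracketed tag and runs up to the next "[" or the end.
groups = re.findall(r"(\[\w+\][^[]+)", url)

print(groups)  # ['[B]a-test,a test,Test', '[C]test a,test2']
```

Because `[^[]+` stops at the next opening bracket, the same group simply matches again for every tagged section, however many there are.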

Regex modify capturing group

I have this Regex
^(?!.*\b(?:https?:\/\/|www\.))\w+(?:\.\w+)*\.\w{2,}(?:,\w+(?:\.\w+)*\.\w{2,})+$
that matches multiple URLs separated by commas.
It matches google.com,facebook.com, but not URLs with extra characters such as google.com/home.php?,facebook.com/pages/#ref=?
Assuming your URLs won't contain a comma, you can add another optional non-capturing group in your regex like this:
^(?!.*\b(?:https?:\/\/|www\.))\w+(?:\.\w+)*\.\w{2,}(?:\/[^,]*)?(?:,\w+(?:\.\w+)*\.\w{2,}(?:\/[^,]*)?)*$
Note the addition of an optional non-capturing group in the regex:
(?:\/[^,]*)?: Matches text starting with / followed by 0 or more of any character except a comma. The trailing ? makes this group optional. The final quantifier also changes from + to *, so a single URL is accepted as well.
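A short check of the extended pattern, sketched in Python (an assumption; the question does not state the regex engine). The names `URL_LIST` and `is_url_list` are hypothetical helpers, and the pattern is split across lines only for readability.

```python
import re

# Extended pattern from the answer, with the optional (?:\/[^,]*)? path group.
URL_LIST = re.compile(
    r"^(?!.*\b(?:https?:\/\/|www\.))"           # reject http(s):// and www. anywhere
    r"\w+(?:\.\w+)*\.\w{2,}(?:\/[^,]*)?"        # first host plus optional path
    r"(?:,\w+(?:\.\w+)*\.\w{2,}(?:\/[^,]*)?)*$"  # further comma-separated entries
)


def is_url_list(text: str) -> bool:
    """Return True if text is a comma-separated list accepted by URL_LIST."""
    return URL_LIST.search(text) is not None


for candidate in [
    "google.com,facebook.com",
    "google.com/home.php?,facebook.com/pages/#ref=?",
    "https://google.com,facebook.com",
    "www.google.com",
]:
    print(candidate, is_url_list(candidate))
```

The first two candidates are accepted; the last two are rejected by the negative lookahead at the start of the pattern.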

Regex doesn't ignore the optional groups

I'm trying to create a regex that matches my URL and its optional groups. The regex works fine if the URL is complete, but the optional groups are not optional at all.
Regex :
\/(.+)(?:\/(.+))(?:(?:\?(.+)))
URLs to match:
/taxi
/taxi/lyon
/taxi/lyon?coordinates=7542
https://regex101.com/r/NKFkwq/4/
As you can see, the third line is matched, but I'd like the first and second to match too.
I thought ?: would be enough to do that, but I missed something...
Thanks a lot for your help !
Cheers
EDIT and answer
Thanks to the comments for helping me. Here is the regex I was after: https://regex101.com/r/NKFkwq/8
Indeed, ?: makes a group non-capturing; it does not make it optional.
Your pattern consists of capturing and non capturing groups. The (?: denotes a non capturing group.
If you want to match all 3 lines, you could match the part starting from the first forward slash and make the part starting from the second forward slash optional.
^/[^\s/]+(?:/[^\s/]+)?$
^ Start of string
/[^\s/]+ Match / and match 1+ times any char except a whitespace or /
(?: Non capturing group
/[^\s/]+ Match / and match 1+ times any char except a whitespace or /
)? Close non capturing group and make it optional
$ End of string
If you want to have capturing groups, but don't want to match /taxi?coordinates=7542 you could nest the groups and make them optional as well.
^/\w+(/\w+(\?\S*)?)?$
^ Start of string
/\w+ Match / and 1+ word chars
( Capture group 1
/\w+ Match / and 1+ word chars
( Capture group 2
\?\S* Match ? and 0+ times a non whitespace char
)? Close group 2
)? Close group 1
$ End of string
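The difference between the two patterns can be sketched in Python (an assumption; the question only links to regex101). The names `simple`, `nested` and `urls` are illustrative.

```python
import re

# First pattern: second path segment is optional, query string is not treated specially.
simple = re.compile(r"^/[^\s/]+(?:/[^\s/]+)?$")
# Second pattern: nested optional capture groups; a query needs the second segment.
nested = re.compile(r"^/\w+(/\w+(\?\S*)?)?$")

urls = ["/taxi", "/taxi/lyon", "/taxi/lyon?coordinates=7542"]

for url in urls:
    m = nested.match(url)
    # Groups that did not participate in the match are reported as None.
    print(url, "->", m.groups())

# The nested pattern rejects a query directly after the first segment,
# while the simple pattern accepts it.
print(nested.match("/taxi?coordinates=7542"))
print(bool(simple.match("/taxi?coordinates=7542")))
```

A group followed by ? still exists in the match result; it just reports `None` when it did not take part, which is what makes the shorter URLs match.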

Matching Word Regex

Hello, I want to use a regex to match this phrase
(Parc Installé)
from this text:
31/1/2017 17:19:23,4245986,ct0001#Intotel.int,Parc Installé,100.100.30.100
I wrote this regex: ',[A-Za-zA-zÀ-ú+ \/\w+0-9._%+-]+,'
But the result is: 4245986 and Parc Installé.
How can I match only Parc Installé?
You may try a regex based on a lookahead that will require a comma and digits/dots after it up to the end of string:
[^,]+(?=\s*,[\d.]+$)
Details:
[^,]+ - 1 or more chars other than ,
(?=\s*,[\d.]+$) - a lookahead requiring
\s* - zero or more whitespaces
, - a comma
[\d.]+ - 1+ digits or dots up to...
$ - ... the end of string
To make it a bit more restrictive, you may replace the lookahead with (?=\s*,\d+(?:\.\d+){3}$) to require 4 sequences of dot-separated 1+ digits.
If a lookahead is not supported (case with a RE2 engine), you might want to use a capturing group based solution:
([^,]+)\s*,[\d.]+$
Here, the part within (...) will be captured into Group 1 and will be accessible via a backreference or a function like =REGEXEXTRACT in Google Spreadsheets, which retrieves only the contents of a capturing group if one is present in the pattern.
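Both variants can be compared side by side. This is a minimal sketch in Python (an assumption; the question does not name the tool being used), and the variable names are illustrative.

```python
import re

# Sample line from the question.
line = "31/1/2017 17:19:23,4245986,ct0001#Intotel.int,Parc Installé,100.100.30.100"

# Lookahead variant: the match itself is the field before the trailing digits/dots.
lookahead = re.search(r"[^,]+(?=\s*,[\d.]+$)", line)

# Stricter lookahead: requires four dot-separated digit sequences at the end.
strict = re.search(r"[^,]+(?=\s*,\d+(?:\.\d+){3}$)", line)

# Capturing-group variant for engines without lookahead: read Group 1.
captured = re.search(r"([^,]+)\s*,[\d.]+$", line)

print(lookahead.group(0))
print(strict.group(0))
print(captured.group(1))
```

All three print Parc Installé; the only difference is whether the wanted text is the whole match or Group 1.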