Regex to match path containing one of two strings

I need a regex to match one of two strings in the third path segment, i.e., in pseudocode:
/content/au/(boomer or millenial)/...
Example matches
/content/au/boomer
/content/au/boomer/male/31
/content/au/millenial/female/29/M
/content/au/millenial/male/18/UM
Example non-matches
/content/au
/content/nz/millenial/male/18/UM
/content/au/genz/male
I've tried this, but to no avail:
^/content/au/(?![^/]*/(?:millenial|boomer))([^/]*)

Don't use a lookahead; just use the plain alternation millenial|boomer followed by a word boundary:
^/content/au/(?:millenial|boomer)\b(?:/.*)?
You should probably spell millennial correctly too (two "n"s, not one).

What's with the negative lookahead? This is a simple, if not trivial, positive match.
^/content/au/(?:millenial|boomer)(?:/|$)
The final group requires the matched word to be followed by a slash or the end of the string, which excludes paths whose third segment only begins with one of the alternatives but contains additional text (for example /content/au/boomers).
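A quick way to check the anchored pattern above against the question's examples is a small Python script. This is a sketch assuming Python's re module; matches_path is a hypothetical helper name, and /content/au/boomers is an extra test case added here for illustration.

```python
import re

# Pattern from the answer above: the third segment must be exactly
# "millenial" or "boomer", followed by "/" or the end of the string.
PATH_RE = re.compile(r"^/content/au/(?:millenial|boomer)(?:/|$)")

def matches_path(path: str) -> bool:
    """Hypothetical helper: True if the third segment is one of the two words."""
    return PATH_RE.search(path) is not None

should_match = [
    "/content/au/boomer",
    "/content/au/boomer/male/31",
    "/content/au/millenial/female/29/M",
    "/content/au/millenial/male/18/UM",
]
should_not_match = [
    "/content/au",
    "/content/nz/millenial/male/18/UM",
    "/content/au/genz/male",
    "/content/au/boomers",  # extra text after the alternative is rejected
]

for p in should_match + should_not_match:
    print(f"{p!r}: {matches_path(p)}")
```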

You can use the following regex (note that it is unanchored, so it also matches text such as /content/au/boomerang):
content/au/(?:boomer|millenial)

Related

Regexp - Get everything before two different strings. One can contain both

I have to use regexp.
Current state:
.+?((/=\.czxy)|(?=\.zzzz))
It's working for the first two cases (that's obvious)
So I have decided to do something like this:
.+?((/=\.czxy)|(?=\.zzzz)|(?=\-\-[0-9]))
But this still doesn't work (the alternatives are combined with OR).
I want everything before the extension (examples 1 and 2).
When the name ends with --1, --2, --3 and so on before the extension, I need everything before that suffix (examples 3 and 4).
Note: I cannot use an if construct.
Examples:
123_abc_cb1.czxy -> 123_abc_cb1
123_23c_cb1.zzzz -> 123_23c_cb1
123_abc_cb1--1.czxy -> 123_abc_cb1
123_23c_cb1--1.zzzz -> 123_23c_cb1
EDIT:
123_abc_cb1 is a random combination of letters, numbers and special characters; it can contain anything.
Your attempt has these issues:
A typo: (/= should be (?=
The regex does not require that the --[0-9] part is still followed by the extension. That part should actually be an optional part that precedes the pattern for the extension.
So change to this:
^.+?(?=(?:--\d)?\.(?:czxy|zzzz))
Or, if matches do not necessarily begin at the start of the input/line:
(?<!\S).+?(?=(?:--\d)?\.(?:czxy|zzzz))
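To see the anchored lookahead version in action, here is a minimal Python sketch (assuming Python's re module) run against the four examples from the question. As written, the pattern handles a single-digit suffix such as --1; --\d+ would be needed for multi-digit ones.

```python
import re

# Lookahead version from the answer above: match lazily up to the point
# where an optional "--<digit>" suffix and a .czxy/.zzzz extension follow.
BEFORE_EXT = re.compile(r"^.+?(?=(?:--\d)?\.(?:czxy|zzzz))")

examples = {
    "123_abc_cb1.czxy": "123_abc_cb1",
    "123_23c_cb1.zzzz": "123_23c_cb1",
    "123_abc_cb1--1.czxy": "123_abc_cb1",
    "123_23c_cb1--1.zzzz": "123_23c_cb1",
}

results = {}
for name in examples:
    m = BEFORE_EXT.search(name)
    results[name] = m.group(0) if m else None
    print(name, "->", results[name])
```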
You don't need any lookarounds if you can use a capture group. To match letters, digits and underscores you can use, for example, \w (word characters):
(\w+)(?:--\d+)?\.(?:czxy|zzzz)\b
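The capture-group variant can be sketched in Python as follows; base_name is a hypothetical helper, and Python's re module is assumed. Group 1 holds the result, and the --\d+ suffix covers multi-digit counters too.

```python
import re

# Capture-group version: no lookarounds; group 1 holds the base name.
# \w only covers letters, digits and underscore, so names containing other
# characters (which the question's EDIT says are possible) would not be
# captured in full.
NAME_RE = re.compile(r"(\w+)(?:--\d+)?\.(?:czxy|zzzz)\b")

def base_name(filename: str):
    m = NAME_RE.search(filename)
    return m.group(1) if m else None

for f in ["123_abc_cb1.czxy", "123_23c_cb1--1.zzzz", "123_abc_cb1--12.czxy"]:
    print(f, "->", base_name(f))
```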
Why not use the recurring part "_cb1"?
/.*_cb1/

Split complex string into multiple parts using regex

I've tried a lot to split this string into something I can work with, but my experience isn't enough to reach the goal. I went through the first three pages of Google results, which helped but still didn't show me how to do this properly.
I have a string which looks like this:
My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#
The result should be:
My Dogs
213,220
Gallery
635,210
Screenshot
219,530
Good Morning
412,408
Does anyone have an idea how to use a regex to split the string as shown above?
Given the shared patterns, it seems you're looking for a regex like the following:
[A-Za-z ]+|\d+,\d+
It matches two patterns:
[A-Za-z ]+: one or more letters and/or spaces
\d+,\d+: one or more digits, a comma, then one or more digits
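In Python (assumed here; the question does not name a language), re.findall returns every non-overlapping match in order, which yields the flat list of parts the question asks for:

```python
import re

text = "My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#"

# Alternation from the answer: a run of letters/spaces, or "digits,digits".
parts = re.findall(r"[A-Za-z ]+|\d+,\d+", text)
for p in parts:
    print(p)
```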
If you want a stricter regex, you can wrap the previous pattern in a lookbehind and a lookahead, so that every match is preceded by a comma, a # or the start of the string, and followed by a comma, a # or the end of the string.
(?<=^|,|#)([A-Za-z ]+|\d+,\d+)(?=,|#|$)
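One caveat if you are on Python: the built-in re module only accepts fixed-width lookbehinds, and (?<=^|,|#) mixes a zero-width alternative (^) with one-character ones, so re rejects it. A possible workaround, sketched below as an alternative formulation rather than the answer's exact pattern, moves ^ outside the lookbehind:

```python
import re

text = "My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#"

# Same idea as the strict pattern, rewritten for Python's re:
# start of string OR preceded by "," / "#", and followed by "," / "#" / end.
STRICT = re.compile(r"(?:^|(?<=[,#]))([A-Za-z ]+|\d+,\d+)(?=[,#]|$)")

# With exactly one capturing group, findall returns that group for each match.
parts = STRICT.findall(text)
print(parts)
```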

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (T0000002, T0000003 and so on).
However, some of the lines it searches contain what I can only describe as positioners: a question mark followed by two digits, such as ?21.
These positioners describe a new position if the document is printed from the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work from is the linked image. The creators can't even confirm which regex flavour it is; they believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately, none of the answers below have worked so far. I tested each one in the live system rather than only in a regex tester, in case it behaves differently, but they still didn't work.
There is no replace feature, and as mentioned before, I'm not sure whether it is Python. I appreciate your help.
Do two regex operations
First, use a regex replace to replace the positioners with an empty string:
(\?[0-9]{2})
Then do the regex match:
T[0-9]{7}
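The asker reports that their tool has no replace feature, so this only helps where you control the code. Under that assumption, the two steps look like this in Python; find_ids is a hypothetical helper name.

```python
import re

def find_ids(line: str):
    # Step 1: drop every positioner ("?" followed by two digits).
    cleaned = re.sub(r"\?[0-9]{2}", "", line)
    # Step 2: look for T followed by exactly seven digits.
    return re.findall(r"T[0-9]{7}", cleaned)

for line in ["T123?214567", "T?211234567", "T0000001"]:
    print(line, "->", find_ids(line))
```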
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, capture what comes before and after the '?21' sequence in two groups. You'll then need to concatenate the two captured groups.
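The concatenation step might look like this in Python (assumed, since the target flavour is uncertain); join_around_positioner is a hypothetical helper. Note that this pattern only matches lines that actually contain a positioner.

```python
import re

POS_RE = re.compile(r"(T.*?)\?\d{2}(.*)")

def join_around_positioner(line: str):
    """Hypothetical helper: remove one positioner by joining the two groups."""
    m = POS_RE.search(line)
    if m is None:
        return None  # the pattern requires a positioner to be present
    return m.group(1) + m.group(2)

for line in ["T123?214567", "T?211234567", "T1234567"]:
    print(line, "->", join_around_positioner(line))
```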
First, match ?21 and replace it with a distinctive character such as #:
\?21
Then you can use this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
The target string is captured in group 1 (\1).
Finally, replace # with ?21 or whatever you like.
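A Python script might look like this:
import re
ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
rexpre = re.compile(r'\?21')
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
# Note: the trailing \s requires whitespace after each ID, so the last ID
# (T5435433, at the very end of the string) is not found.
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))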
The output is:
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If a replace function is an option for you, this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement use group 1 followed by group 2: \1\2.
Using word boundaries \b (or the start- and end-of-line anchors ^ and $), this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
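A minimal Python sketch of the replacement step, assuming Python's re module and a sample multi-line string made up for illustration; re.sub replaces all occurrences by default.

```python
import re

# Pattern from the answer: an optional positioner between two digit runs.
ID_RE = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

text = "T123?214567\nT?211234567\nT0000001"
# \1\2 glues group 1 and group 2 back together without the positioner.
cleaned = ID_RE.sub(r"\1\2", text)
print(cleaned)
```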
Is the below what you want?
Use RegExReplace with the multiline flag (m) and enable replacing all occurrences:
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

Regular expression to match line containing some strings and not others

I have lines like this:
example.com/p/stuff/...
example.com/page/thing/...
example.com/page/stuff/...
example.com/page/other-stuff/...
etc
where the dots represent the rest of the URL path. I want to select URLs that contain /page/ NOT followed by thing/. So from the list above we would select:
example.com/page/stuff/...
example.com/page/other-stuff/...
.*?\/page\/[^(thing)].*
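This is meant to match a string in which /page/ is not followed by thing. Note, however, that [^(thing)] is a character class: it only checks that the single character after /page/ is not one of (, t, h, i, n, g or ), so it would also reject a path such as /page/test/. The negative lookahead in the following answers expresses the requirement exactly.
Adding lazy evaluation is suggested because the engine advances one character at a time, which gives better performance.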
You need to use a negative lookahead:
example\.com\/page\/(?!thing\/).*
Use the following regex pattern:
.*?\/page\/(?!thing\/).*
https://regex101.com/r/19wh1w/2
(?!thing\/): a negative lookahead assertion that ensures the /page/ segment is not followed by thing/
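Applied to the question's sample lines, the lookahead pattern can be checked with a short Python sketch (Python's re module assumed). Forward slashes need no escaping in Python, so the \/ escapes are dropped here.

```python
import re

urls = [
    "example.com/p/stuff/...",
    "example.com/page/thing/...",
    "example.com/page/stuff/...",
    "example.com/page/other-stuff/...",
]

# /page/ must appear, and the negative lookahead rejects it when "thing/" follows.
PAGE_NOT_THING = re.compile(r".*?/page/(?!thing/).*")

selected = [u for u in urls if PAGE_NOT_THING.match(u)]
print(selected)
```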

Regexp: replacing all [[??]] with {{param|??}}

I'm hoping a regexp guru can help me out with this:
I have strings such as [[AB]], [[ABC]] and [[BEC]], and I want to replace them with string {{param|AB}}, {{param|ABC}} and {{param|BEC}} respectively.
All source strings are inside [[]] and have 2 or 3 uppercase letters. The idea is to transfer the letters inside the brackets to the new format. It's fine if I need two different regexps for the 2- and 3-letter cases.
(If you're curious, this is for replacing a large number of links with templates on a MediaWiki-based page.)
Thanks in advance!
You can replace the matches of the following regex:
/\[\[([A-Z]{2,3})\]\]/
with:
{{param|\1}}
Note that some regex engines use $ for capture-group references, so you may need to use {{param|$1}} instead.
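For example, in Python (assumed here) the backreference is written \1 in a raw replacement string, and the braces and pipe are plain literal text. The sample sentence is made up for illustration.

```python
import re

source = "See [[AB]], [[ABC]] and [[BEC]]."
# Group 1 captures the 2 or 3 uppercase letters; \1 re-inserts them.
result = re.sub(r"\[\[([A-Z]{2,3})\]\]", r"{{param|\1}}", source)
print(result)
```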
If you want to exclude some words, you can use a negative lookahead:
/^\[\[((?!AAA|BBB|CCC)[A-Z]{2,3})]]$/gm
Note that because the preceding regex uses anchors, you need the m (multiline) flag when dealing with a multiline string.
See demo https://regex101.com/r/cR8zG6/1
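Python has no /.../gm literal syntax; as a sketch under that assumption, the m flag becomes re.MULTILINE, and re.sub already replaces every match (the equivalent of g). The sample text is made up for illustration.

```python
import re

text = "[[AB]]\n[[AAA]]\n[[BEC]]"
# ^ and $ anchor at every line because of re.MULTILINE (Python's "m" flag).
# The negative lookahead skips AAA, BBB and CCC.
EXCLUDE = re.compile(r"^\[\[((?!AAA|BBB|CCC)[A-Z]{2,3})\]\]$", re.MULTILINE)
result = EXCLUDE.sub(r"{{param|\1}}", text)
print(result)
```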
You can search using this regex:
\[\[(\w+)\]\]
and replace using:
{{param|$1}}