Regex help - match one string but not another

Regex help - match one string but not another - regex

I have been using this:
~^\/student-accommodation\/(?:[^\/]+?)\/([^\/]+)\/$
to match for URLs like
/student-accommodation/manchester/ropemaker-court-manchester/
But now I need to edit this regex so it also matches for URLs like the below. All these new URLs will follow the same pattern and add a string that starts with #utm-source. Importantly they won't have another / in them.
/student-accommodation/manchester/ropemaker-court-manchester/#utm_source=afs&utm_medium=email&utm_campaign=ropemakercourt_afs_dec20
But then I don't want the regex to match for URLs like the below:
/student-accommodation/manchester/ropemaker-court-manchester/en-suite/
Can anyone help? I am a novice at regex! Thanks

Use
^\/student-accommodation\/[^\/]+\/([^\/]+)\/(?:#utm_source.*)?$
See proof
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
student- 'student-accommodation'
accommodation
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
[^\/]+ any character except: '\/' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^\/]+ any character except: '\/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
#utm_source '#utm_source'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Related

Regex matches from right instead of left

I don't know if I diagnosed the issue correctly, but it seems the analyzer starts from right to left, instead of from left to right.
My regex is: \/root\/(.+)\/(.+)\/(.+)\/(.+)$, and my trial sentence is irrelevant/root/user/course/exercise/custom/folder/file.txt
The match is correct, but not the groups. I want to get:
group 1: user
group 2: course
group 3: exercise
group 4: custom/folder/file.txt (basically everything that goes after the previous groups).
I'm running this on TS and playing with it on regex101.com (ES set)

Use
\/root\/([^/]+)\/([^/]+)\/([^/]+)\/(.*)$
See proof.
EXPLANATION
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
root 'root'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Optional element between .*?

I'm trying to extract an optional element via PCRE from the following example.
I need to pull out the xxxx-xxx-xxxx-xxxx-xxxxx if ActivityID exists.
I'm guessing I need to use lookaheads or the like but I can't quite wrap my head around it.
</Level><Task>...<Correlation ActivityID='{xxxx-xxx-xxxx-xxxx-xxxxx}'/><Execution...</Channel>
This works if the element exists, saving to taco64:
<\/level>(?<taco16>.*?)ActivityID='{(?<taco64>.*)}'(?<taco32>.*?)<Computer>
Being optional drops everything into taco32.
<\/level>(?<taco16>.*?)(ActivityID='{(?<taco64>.*)}')?(?<taco32>.*?)<Computer>

Use
<\/level>(?:(?<taco16>.*?)(ActivityID='{(?<taco64>.*)}'))?(?<taco32>.*?)<Computer>
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
level> 'level>'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
(?<taco16> group and capture to taco16:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of taco16
--------------------------------------------------------------------------------
(?<taco64> group and capture to taco64:
--------------------------------------------------------------------------------
ActivityID='{ 'ActivityID=\'{'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of taco64
--------------------------------------------------------------------------------
}' '}\''
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
(?<taco32> group and capture to taco32:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of taco32
--------------------------------------------------------------------------------
<Computer> '<Computer>'

Match last occurrence only in perl regex

/1.1/s/1/-/g
I am working on school assignment to reference implementation sed command. I get this string to match "/1/-/". I have experiment
$str =~ m{/[^/]*/[^/]*/}g;
but result is /1.1/s/. How can I get "/1/-/" only
Could somebody help me?

Use
/[^/]*/[^/]*/(?=[^/]*$)
See proof.
EXPLANATION
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
[^/]* any character except: '/' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
[^/]* any character except: '/' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^/]* any character except: '/' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead

Regex for URLs shows ".finance" as invalid/error

I got this regex for my laravel URL validator and if i put a ".finance" domain it shows it's against the regex. What is wrong? All other tested domain endings work so far.
$regex = '/^(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/';

Replace [a-z]{2,5} with [a-z]{2,} to allow any two or more letters in the TLD.
Remove {1}, these are always unnecessary.
(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)? is way to repeptitive, shorten it with optional groups / chars to (?:https?:\/\/(?:www\.)?)?.
Use
/^(?:https?:\/\/(?:www\.)?)?[a-z0-9]+(?:[-.][a-z0-9]+)*\.[a-z]{2,}(?::[0-9]{1,5})?(\/.*)?$/
See proof.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
http 'http'
--------------------------------------------------------------------------------
s? 's' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
www 'www'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
[a-z0-9]+ any character of: 'a' to 'z', '0' to '9'
(1 or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[-.] any character of: '-', '.'
--------------------------------------------------------------------------------
[a-z0-9]+ any character of: 'a' to 'z', '0' to '9'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
[a-z]{2,} any character of: 'a' to 'z' (at least 2
times (matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
[0-9]{1,5} any character of: '0' to '9' (between 1
and 5 times (matching the most amount
possible))
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
( group and capture to \1 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)? end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Multiple regex matching within a lookaround

i am trying to do a multiple match within a lookbehind and a look forward
Let's say i have the following string:
$ Hi, this is an example #
Regex: (?<=$).+a.+(?=#)
I am expecting it to return both 'a' within those boundaries, is there a way to do it with only one regex?
Engine: Python

If you can use quantifiers in the lookbehind, use
(?<=\$[^$#]*?)a(?=[^#]*#)
See proof.
Explanation
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
\$ '$'
--------------------------------------------------------------------------------
[^$#]*? any character except: '$' and '#' (0 or more
times (matching the least amount possible))
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
a 'a'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^#]* any character except: '#' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
) end of look-ahead
PCRE pattern:
(?:\G(?<!^)|\$)[^$]*?\Ka(?=[^#]*#)
See another proof
Explanation
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
\G where the last m//g left off
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\$ '$'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
[^$#]*? any character except: '$' and '#' (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
\K match reset operator
--------------------------------------------------------------------------------
a 'a'
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^#]* any character except: '#' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
) end of look-ahead

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regex help - match one string but not another - regex

Related

Regex matches from right instead of left

Optional element between .*?

Match last occurrence only in perl regex

Regex for URLs shows ".finance" as invalid/error

Multiple regex matching within a lookaround

Categories

Resources