Regular expression groups

For all the regex experts out there! I'm trying to figure out how to split my URL into parts using regular expression groups.
Example:
site.com/user/account/info/settings
I want to capture the user/account/info part of the URL, but NOT /settings.
Can anyone take this challenge and be kind enough to help me out? Thanks!

If you want to get the beginning of the URL try this:
(\/.*\/(?!.*\/.+))
Input:
site.com/foo/remove-me/
site.com/user/account/info/settings
site.com/foo/bar/remove-me
site.com/foo/remove-me?param1=true&param2=hello+world
Output:
/foo/remove-me/
/user/account/info/
/foo/bar/
/foo/
https://regex101.com/r/yI5rG4/2
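As a quick way to check the pattern outside regex101, here is a minimal Python sketch. The helper name leading_path is made up for illustration, and the inputs are assumed to have no scheme (with https:// in front, the first slash would be the one in the scheme).

```python
import re

# Pattern copied from the answer: greedy match from the first slash
# through the last slash in the string.
LEADING_PATH = re.compile(r"(\/.*\/(?!.*\/.+))")


def leading_path(url):
    """Return group 1 (first slash through last slash), or None if no match."""
    match = LEADING_PATH.search(url)
    return match.group(1) if match else None


for url in [
    "site.com/user/account/info/settings",
    "site.com/foo/bar/remove-me",
    "site.com/foo/remove-me?param1=true&param2=hello+world",
]:
    print(url, "->", leading_path(url))
```

Because .* is greedy, a URL that already ends in a slash, such as site.com/foo/remove-me/, is matched all the way through that final slash.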
After consideration of all your comments under your post, I understand that you want to get the last segment for controller name extraction. Hence try this:
(?:\/(?!.*\/.+))([^\?\n]*)
Used on these inputs:
site.com/foo/remove-me/
site.com/user/account/info/settings
site.com/foo/bar/remove-me
site.com/foo/remove-me?param1=true&param2=hello+world
Output for group 1:
remove-me/
settings
remove-me
remove-me
Test here: https://regex101.com/r/kR5tX6/2
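The same check can be run locally. This is only a sketch in Python; last_segment is a hypothetical helper name, and the inputs are the four sample strings from above.

```python
import re

# Pattern copied from the answer: find the last slash that still has
# text after it, then capture everything up to a '?' or a newline.
LAST_SEGMENT = re.compile(r"(?:\/(?!.*\/.+))([^\?\n]*)")


def last_segment(url):
    """Return group 1 (the last path segment), or None if no match."""
    match = LAST_SEGMENT.search(url)
    return match.group(1) if match else None


for url in [
    "site.com/foo/remove-me/",
    "site.com/user/account/info/settings",
    "site.com/foo/bar/remove-me",
    "site.com/foo/remove-me?param1=true&param2=hello+world",
]:
    print(url, "->", last_segment(url))
```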

Related

Multiple slash in URL replacement through regex

I am trying to create a regex in PCRE that will sanitize URLs containing multiple slashes, like the following:
https://www.domin.com/test1/////test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142/////
https://www.domin.com/test1/test2///somemoretests_67142
I want to replace the match with https://\2\4 so that the resulting link looks like this: https://www.domin.com/test1/test2/somemoretests_67142
I have been struggling with it for the past couple of days, so any regex guru help is more than welcome :)
I have tried the following and more:
(http|https):\/\/(.*)(\/\/+)(.*)
(http|https):\/\/(.*)(\/\/){2,}(.*)
(http|https):\/\/(.*)(\/\/{2})(.*)
I am going to use these in Akamai to sanitize our URLs through a cloudlet.

You can try:
(?<!https:\/)(?<!http:\/)(\/+$|(?<=\/)\/+)
Then substitute each match (the first group) with an empty string.
This will produce this output:
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
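The question targets PCRE, but the pattern only uses fixed-width lookbehinds, so it can also be tried in Python's re module. A minimal sketch, where collapse_slashes is a made-up helper name:

```python
import re

# Pattern copied from the answer. The two negative lookbehinds protect
# the "//" after the scheme; the group matches either a run of trailing
# slashes or any slashes that directly follow another slash.
EXTRA_SLASHES = re.compile(r"(?<!https:\/)(?<!http:\/)(\/+$|(?<=\/)\/+)")


def collapse_slashes(url):
    """Remove every match, i.e. substitute it with an empty string."""
    return EXTRA_SLASHES.sub("", url)


for url in [
    "https://www.domin.com/test1/////test2/somemoretests_67142",
    "https://www.domin.com/test1/test2/somemoretests_67142/////",
    "https://www.domin.com/test1/test2///somemoretests_67142",
]:
    print(collapse_slashes(url))
```

Each URL is passed in on its own here, so $ means the end of that URL; when running the pattern over multi-line text, the multiline flag would be needed for the trailing-slash branch.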

Regex for Affiliate URL

For Matomo outgoing link tracking I need a regex pattern that matches the following URLs:
https://www.example.com/product/?sku=12345&utm_source=123456789
and
https://www.example.com/product/?utm_source=123456789
"https://www.example.com/" and "utm_source=123456789" are always fixed in the URL; only the "product/" or "category/product/" part changes and must be matched by the regex pattern.
Thanks

Maybe this example can help you reach your goal:
(?<=https:\/\/www\.example\.com\/).+(?=utm_source=123456789)
It uses a lookbehind and a lookahead to match any characters between these two fixed strings:
https://www.example.com/
utm_source=123456789
Given the examples:
https://www.example.com/product/?sku=12345&utm_source=123456789
https://www.example.com/product/?utm_source=123456789
Your matches would be:
product/?sku=12345&
product/?
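To see the lookarounds in action, here is a small Python sketch. The function name variable_part is only illustrative; the pattern and the sample URLs are the ones above.

```python
import re

# Pattern copied from the answer: the lookbehind and lookahead anchor
# the match between the two fixed strings without consuming them.
BETWEEN = re.compile(
    r"(?<=https:\/\/www\.example\.com\/).+(?=utm_source=123456789)"
)


def variable_part(url):
    """Return the text between the fixed prefix and utm_source, or None."""
    match = BETWEEN.search(url)
    return match.group(0) if match else None


for url in [
    "https://www.example.com/product/?sku=12345&utm_source=123456789",
    "https://www.example.com/product/?utm_source=123456789",
]:
    print(variable_part(url))
```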

How to fix regex url pattern

I need to fix my url pattern:
/^((http(s)?(\:\/\/)){1}(www\.)?([\w\-\.\/])*(\.[a-zA-Z]{2,4}\/?)[^\\\/#?])[^\s\b\n|]*[^\.,;:\?\!\#\^\$ -]/
I thought this regex was OK, but it does not match URLs like https://xx.xx (without www). The 'www.' part should be optional ((www\.)?). Where is the bug?

The problem is not in the (www\.)? part but in the parts after it.
Take a look at the [^\\\/#?] and the [^\.,;:\?\!\#\^\$ -] parts.
Each of them requires one more character. So a valid URL has to be https://xx.xx plus one character that is none of \/#? plus one character that is none of .,;:?!#^$, space, or -. The URL becomes valid if you add those, for example https://xx.xx11.
I advise you not to create your own regex, because you are missing a lot!
For example, tlds like .amsterdam are valid. And why are you capturing so many groups?
You can visualize your regex with https://www.debuggex.com/.
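The effect of the two trailing character classes can be reproduced with a short Python sketch. The pattern is the one from the question with the surrounding / delimiters dropped; it happens to be valid syntax for Python's re module as well, though inside a character class Python reads \b as a backspace character.

```python
import re

# Pattern from the question, split across two string literals only for
# readability. The last class of each half demands one extra character.
URL_PATTERN = re.compile(
    r"^((http(s)?(\:\/\/)){1}(www\.)?([\w\-\.\/])*(\.[a-zA-Z]{2,4}\/?)[^\\\/#?])"
    r"[^\s\b\n|]*[^\.,;:\?\!\#\^\$ -]"
)


def is_match(url):
    """True if the question's pattern matches at the start of the string."""
    return URL_PATTERN.match(url) is not None


print(is_match("https://xx.xx"))    # nothing is left for the two trailing classes
print(is_match("https://xx.xx11"))  # the two extra characters satisfy them
```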

WordPress URL Rewrite unable to get second matches

My URL is http://example.com/locate/ny/2
In functions, I use the code below:
$wp_rewrite->add_rule('locate/([^/]+)','index.php?page_id=294&cs=$matches[1]','top');
This works for a URL like http://example.com/locate/ny, but I want to add pagination after ny, so that ny?cpaged=3 is rewritten to ny/3.
What is the regexp that maps http://example.com/locate/ny/2 to index.php?page_id=294&cs=$matches[1]&cpaged=$matches[2]?

You need to add another capturing group within the regex that just picks out the digits from the url. Assuming your url structure isn't going to change this regex should work.
$wp_rewrite->add_rule('locate\/([^\/]+)\/(\d*)','index.php?page_id=294&cs=$matches[1]&cpaged=$matches[2]','top');
See here for a demo and to play around with it further: https://regex101.com/r/BNkZBo/1/
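To illustrate what the two capturing groups pick up, here is a rough Python simulation. It is not how WordPress applies the rule (that happens in PHP); the rewrite function and the anchoring at the start of the path are assumptions made for the sketch.

```python
import re

# Regex copied from the rewrite rule above.
RULE = re.compile(r"locate\/([^\/]+)\/(\d*)")


def rewrite(path):
    """Build the target query string from the two groups, or return None.

    Hypothetical helper: it only mimics the $matches[1] / $matches[2]
    substitution for a request path without the leading slash.
    """
    match = RULE.match(path)
    if match is None:
        return None
    cs, cpaged = match.groups()
    return f"index.php?page_id=294&cs={cs}&cpaged={cpaged}"


print(rewrite("locate/ny/2"))
print(rewrite("locate/ny"))
```

In this simulation locate/ny on its own does not match, because the pattern requires the slash after the first group, so the original one-group rule would presumably still be needed for unpaginated URLs.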

Regular expression to match string from url

I want to match the shop name from a URL, for URL redirection in a WordPress application.
See the examples given below:
http://example.com/outlets/19-awok?page=2
http://example.com/outlets/19-awok
http://example.com/outlets/159-awok?page=3
In all cases I need to get only awok from the URL. It is the text that comes after '-' and before the query string.
I tried the rule below, but it is not working:
/outlets/(\d+)-(.*)? => /shop/$2
Any help will be greatly appreciated.

You can use this regex:
/outlets/\d+-([^?]+)?
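The class [^?]+ stops the capture at the first ?, so group 1 holds the shop name without the query string; the trailing ? only makes that group optional.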
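A quick Python sketch shows what the single capturing group returns for the sample URLs. The helper name shop_name is made up; since the pattern has only one group, a redirect target would reference it as group 1.

```python
import re

# Regex copied from the answer: digits, a dash, then everything up to
# (but not including) the first '?'.
SHOP = re.compile(r"/outlets/\d+-([^?]+)?")


def shop_name(url):
    """Return group 1 (the shop name), or None if the URL does not match."""
    match = SHOP.search(url)
    return match.group(1) if match else None


for url in [
    "http://example.com/outlets/19-awok?page=2",
    "http://example.com/outlets/19-awok",
    "http://example.com/outlets/159-awok?page=3",
]:
    print(shop_name(url))
```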
