Get whole url without preset parameters - regex

So I have this regex for getting a YouTube link
/(http|https):\/\/www\.youtube\.com\/watch\?v=(\w+)/i
The problem is that it doesn't match the end of a link like this one:
https://www.youtube.com/watch?v=videoID&sdfgsdfgsdfg;jsfdg;lkjsdf;gkj
It picks up https://www.youtube.com/watch?v=videoID and leaves &sdfgsdfgsdfg;jsfdg;lkjsdf;gkj alone. I want it to pick up the whole string while still extracting the video ID.

Try this
/https?:\/\/www\.youtube\.com\/watch\?.*?&?v=(\w+)(?:&[^\s]+)?/i
https? is the same as (?:http|https); you didn't say you needed to capture the protocol
/watch\? - Same as in your regex; \? matches the literal ?
.*? - Lazily consume any query parameters that come before v=
&? - If there are other query parameters before v=, they are separated from it by an &
v=(\w+)(?:&[^\s]+)? - Capture the video ID and, optionally, the rest of the URL up to the next whitespace
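A quick way to check the pattern is Python's `re` module. This is a sketch, not a drop-in solution: the helper name `extract_youtube` is hypothetical, and the `\/` escapes from the slash-delimited literal are dropped because Python strings don't need them. Group 0 is the whole link and group 1 is the video ID. Note that `\w` does not match `-`, which real YouTube video IDs can contain, so you may want `[\w-]+` instead.

```python
import re

# Same pattern as above; Python needs no "\/" escapes.
YOUTUBE_RE = re.compile(
    r"https?://www\.youtube\.com/watch\?.*?&?v=(\w+)(?:&[^\s]+)?",
    re.IGNORECASE,
)


def extract_youtube(text):
    """Return (whole_url, video_id) for the first YouTube link in text, or None."""
    m = YOUTUBE_RE.search(text)
    if m is None:
        return None
    return m.group(0), m.group(1)


url = "https://www.youtube.com/watch?v=videoID&sdfgsdfgsdfg;jsfdg;lkjsdf;gkj"
print(extract_youtube("see " + url + " here"))
# ('https://www.youtube.com/watch?v=videoID&sdfgsdfgsdfg;jsfdg;lkjsdf;gkj', 'videoID')
print(extract_youtube("https://www.youtube.com/watch?feature=share&v=abc123&t=10"))
# ('https://www.youtube.com/watch?feature=share&v=abc123&t=10', 'abc123')
```

The second call shows the `.*?&?` part at work: `v=` does not have to be the first query parameter.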


Matching redirect on url end, ignoring the substring

I'm currently trying to redirect from an old website to the new one.
The domain and the subpath have changed, but the ending is always the same, so I am trying to create a regex that ignores the subpath and matches only the ending, whatever the combination might be.
Example:
http://shop.kmsport.dk/team-sport/bolde/fodbolde
https://kmsport.dk/collections/fodbolde
http://shop.kmsport.dk/fodbolde/fodbold-udstryr/anforerbind-325
These 3 URLs all contain the word "fodbolde", but I only want to match the first two, since they both end with "/fodbolde", ignoring the subpath in the process.
So far I've been able to match up the ends with this:
\/([a-zA-Z]*)*+$
How do I create something to account for the different subpaths?
P.S. It's a massive sporting goods store, so it would be nice not to have to create a unique redirect for every possible combination -.-
If you are only interested in the last part, just go with
url.rsplit('/', 1)[-1]
Your current regex does not take /fodbolde into account. If that has to be at the end, you could use $ to assert the end of the string, as in /fodbolde$
One possibility is to match the start of the string with ^https?:// and optionally match shop. with (?:shop\.)?, followed by kmsport\.dk/
Then use a repeating group that matches one or more characters other than a forward slash, followed by a forward slash, zero or more times, (?:[^/]+/)*, and at the end of the string match fodbolde with fodbolde$
^https?://(?:shop\.)?kmsport\.dk/(?:[^/]+/)*fodbolde$
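A small Python check of both suggestions against the three URLs from the question; it is only a sketch, and the variable names are illustrative. The regex can be used as-is in a raw string, and `re.match` anchors at the start in addition to the pattern's own `^`.

```python
import re

# Pattern from the answer: optional "shop." subdomain, any number of
# "segment/" pieces, and "fodbolde" as the very last path segment.
FODBOLDE_RE = re.compile(r"^https?://(?:shop\.)?kmsport\.dk/(?:[^/]+/)*fodbolde$")

urls = [
    "http://shop.kmsport.dk/team-sport/bolde/fodbolde",
    "https://kmsport.dk/collections/fodbolde",
    "http://shop.kmsport.dk/fodbolde/fodbold-udstryr/anforerbind-325",
]

for url in urls:
    # rsplit variant: only the text after the last "/"
    last_segment = url.rsplit("/", 1)[-1]
    print(bool(FODBOLDE_RE.match(url)), last_segment)
# True fodbolde
# True fodbolde
# False anforerbind-325
```

The third URL fails because "fodbolde" appears only as an intermediate segment, so `fodbolde$` can never line up with the end of the string.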

Conditional Regex to match url

I am trying to make an if/then condition to match the URL, but I can't get it to work. I am trying to match URLs and then capture the non-optional group. So if a URL comes in like this:
/en/testing.aspx
I want to capture /testing.aspx
if the url comes in like this:
/testing.aspx
I want to capture /testing.aspx
Is there an easy way to do this using regex?
EDIT:
The Url can be multi-part url, like /en/sub1/sub2/testing.aspx - I essentially want everything after "/en/".
Use the regex \/en(\/.+)$
Check this out:
https://regex101.com/r/lwowhi/6
If the URL may not contain "/en/" at all and you still want to capture /testing.aspx, here is an edit: (?:\/en)*(\/.+)$
https://regex101.com/r/lwowhi/8
You can use a lazy prefix, .*?, which consumes as little as possible, followed by an optional /en. Then capture everything from the next forward slash to the end of the string.
^.*?(?:\/en)?(\/.*)$
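The three patterns above can be compared side by side in Python. This is a sketch; the constant names are made up for the example, and the `\/` escapes are dropped because Python does not need them.

```python
import re

# The three patterns suggested above; Python needs no "\/" escapes.
EN_REQUIRED = re.compile(r"/en(/.+)$")
EN_OPTIONAL = re.compile(r"(?:/en)*(/.+)$")
LAZY_PREFIX = re.compile(r"^.*?(?:/en)?(/.*)$")

paths = ["/en/testing.aspx", "/testing.aspx", "/en/sub1/sub2/testing.aspx"]

for path in paths:
    required = EN_REQUIRED.search(path)
    print(
        path,
        required.group(1) if required else None,  # no match without /en
        EN_OPTIONAL.search(path).group(1),
        LAZY_PREFIX.search(path).group(1),
    )
```

Only the first pattern fails on "/testing.aspx"; the other two return "/testing.aspx" for both short paths and "/sub1/sub2/testing.aspx" for the multi-part one, which is the "everything after /en" behaviour asked for in the edit.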
Guessing all pages are .aspx, you can use a group.
regex: .*(/.*\.aspx)
This will match "/testing.aspx" in all the samples below:
/testing.aspx or
/en/testing.aspx or
www.abc.com/en-us/testing.aspx
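A quick Python check of the .aspx pattern, as a sketch. The greedy leading `.*` gives characters back from the end only until the group can match, so the group starts at the last "/" that still has ".aspx" after it.

```python
import re

# Greedy ".*" backs off just far enough for the group to match,
# so the group begins at the last "/" before ".aspx".
ASPX_TAIL = re.compile(r".*(/.*\.aspx)")

for s in ["/testing.aspx", "/en/testing.aspx", "www.abc.com/en-us/testing.aspx"]:
    print(ASPX_TAIL.search(s).group(1))  # /testing.aspx each time
```

Because it always keeps only the last segment, this pattern returns "/testing.aspx" for "/en/sub1/sub2/testing.aspx" too, not "/sub1/sub2/testing.aspx".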

Simple regex to replace first part of URL

Given
http://localhost:3000/something
http://www.domainname.com/something
https://domainname.com/something
How do I select whatever is before the /something and replace it with staticpages?
The input URL is the result of a request.referer, but since you can't render request.referer (and I don't want a redirect_to), I'm trying to manually construct the appropriate template using controller/action where action is always the route, and I just need to replace the domain with the controller staticpages.
You could use a regex like this:
(https?://)(.*?)(/.*)
As you can see in the Substitution section, you can use capturing groups and concatenate them with the strings you need to generate the URLs you want.
The idea of the regex is to capture the strings before and after the domain and use \1 + staticpages + \3.
If you want to change the protocol to ftp, you can rearrange the capturing groups and use this replacement string:
ftp://\2\3
So, you would have:
ftp://localhost:3000/something
ftp://www.domainname.com/something
ftp://domainname.com/something
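The same substitutions expressed with Python's `re.sub`, as a sketch (the question is in a Rails context; Ruby's `String#sub` accepts the same `\1`-style backreferences in a single-quoted replacement string). The middle variant, which drops the protocol to leave only the route prefixed with staticpages, is an addition to the answer above.

```python
import re

URL_RE = re.compile(r"(https?://)(.*?)(/.*)")

urls = [
    "http://localhost:3000/something",
    "http://www.domainname.com/something",
    "https://domainname.com/something",
]

for url in urls:
    print(URL_RE.sub(r"\1staticpages\3", url))  # keep protocol, swap the host
    print(URL_RE.sub(r"staticpages\3", url))    # route only, no protocol
    print(URL_RE.sub(r"ftp://\2\3", url))       # change the protocol
```

The lazy `(.*?)` stops at the first "/" after the protocol, so the host (including any port) lands in group 2 and the path in group 3.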

Search & Replace Request URI Filter in Google Analytics

I have 2 landing pages:
/aa/index.php/aa/index/[sessionID]/alpha
/bb/index.php/bb/index/[sessionID]/bravo
Because the sessionID is unique, each landing page is tracked as a different page. Therefore, I need a filter to remove the sessionID. This is what I want to track:
/aa/index.php/aa/index/alpha
/bb/index.php/bb/index/bravo
I created the Search and Replace Custom Filter on the Request URI:
Search String: /(aa|bb)/index\.php/(aa|bb)/index/(.*)
Replace String: /$1/index.php/$2/index/$3
But the next day I see /$1/index.php/$2/index/$3 reported on the dashboard. So I tried /\1/index.php/\2/index/\3, but I got very strange results: //aa/index.php/aa/index/alpha/index.php/aa/index/aa.
Does anyone know how to reference the grouped patterns in the replace string?
My Solution:
I managed to solve it using an Advanced Filter:
Field A => Request URI: /(aa|bb)/index\.php/(aa|bb)/index/(.*)/(.*)
Field B => -
Output to => Request URI: /$A1/index.php/$A2/index/$A4
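The grouping logic of that Advanced Filter can be tried offline with Python's `re.sub`. This is only a sketch of the regex behaviour, not of Google Analytics itself: GA uses its own `$A1`-style references, and the session IDs below are made up.

```python
import re

# Same four groups as the Advanced Filter: 1 and 2 are aa|bb,
# 3 is the session ID, 4 is the page name after it.
LANDING_RE = re.compile(r"/(aa|bb)/index\.php/(aa|bb)/index/(.*)/(.*)")


def strip_session(uri):
    # Python numbers groups from 1 (group 0 is the whole match),
    # so $A1, $A2 and $A4 become \1, \2 and \4 here.
    return LANDING_RE.sub(r"/\1/index.php/\2/index/\4", uri)


print(strip_session("/aa/index.php/aa/index/abc123/alpha"))  # /aa/index.php/aa/index/alpha
print(strip_session("/bb/index.php/bb/index/xyz789/bravo"))  # /bb/index.php/bb/index/bravo
```

Group 3 is greedy, so it runs up to the last "/" and group 4 is always the final path segment.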
I haven't used the Google Analytics regex engine, but it appears to me that \1 is referencing the entire match (which in other regex implementations is called \0), while \2 is the first group, \3 is the second group, and so on.
Your initial regex, however, looks incomplete. I think it should look as follows:
Search String: /(aa|bb)/index\.php/(aa|bb)/index(/.*)/(alpha|bravo)
Replace String: /\2/index.php/\3/index/\5
(Note that I'm not sure whether ? is supported in this regex implementation as the non-greedy modifier, but if it is, the above search string pattern might run a little faster if you change /.* to /.*?.)

Regex to match anything after /

I'm basically clueless about regex, but I need a regex that will recognise anything after the / in a URL.
Basically, I'm developing a site for someone, and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site), which have URLs of the form (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask, you need to use groups. In regular expressions, groups allow you to isolate parts of the whole match.
For example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parentheses mark a group; in this case it will be group 1, since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
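The same example in Python, as a quick sketch: `re.search` returns a match object whose `group(0)` is the whole match and `group(1)` is the first parenthesised group.

```python
import re

m = re.search(r"a*(b*)", "aaaaaaaabbbbcccc")
print(m.group(0))  # aaaaaaaabbbb
print(m.group(1))  # bbbb
```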
In order to achieve what you want with the sweets pattern above, you just need to put a group around the end.
possible solution: /sweets/(.*)
The more precise you are with the pattern before the group, the less likely you are to get a false positive.
If what you really want is to match anything after the last /, you can take another approach:
possible other solution: /([^/]*)$
The pattern above finds a / followed by a string of characters that are NOT another /, anchored to the end of the string by $, and keeps that string in group 1. The issue here is that it could also match URLs that do not have sweets in them.
Note: if you do not mind the / at the beginning, just remove the ( and ), and you do not have to worry about groups.
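Both group-based approaches can be tried in Python on the sample URL from the question; this is a sketch, and the variable names are illustrative. The last line shows why the "last /" pattern needs the `$` anchor: without it, the search stops at the first "/" in "http://" and captures an empty string.

```python
import re

url = "http://localhost/sweettemptations/sweets/sweet-name"

# Group around whatever follows "/sweets/".
after_sweets = re.search(r"/sweets/(.*)", url).group(1)

# "$" anchors the match to the end of the string, so the "/" found is the last one.
last_segment = re.search(r"/([^/]*)$", url).group(1)

# Without "$", the first "/" (in "http://") wins and the group is empty.
unanchored = re.search(r"/([^/]*)", url).group(1)

print(after_sweets, last_segment, repr(unanchored))  # sweet-name sweet-name ''
```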
I like to use http://regexpal.com/ to test my regexes. It marks the different matches in different colors.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without appending the part matched by your * at the end), I would use a regular expression to match the pattern in the URL but then just blindly replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this, I would make sure you do not match the "localhost" part literally. Also, I am assuming the (http://) really means an optional http:// in front, as (http://) is not a valid protocol prefix.
So if that is what you want, this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression will match the http:// part optionally with a host (be it localhost, an IP or the host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
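A minimal Python sketch of "match, then replace the whole URL". The helper name `redirect_target` is hypothetical, the target URL is taken from the question, and `fullmatch` is a design choice so that the entire URL, not just part of it, has to fit the pattern.

```python
import re

# Optional "http://", any host, then the /sweettemptations/sweets/ path.
SWEETS_RE = re.compile(r"(http://)?[^/]+/sweettemptations/sweets/.*")
TARGET = "http://localhost/sweettemptations/available-sweets"


def redirect_target(url):
    """Return the redirect URL if url is under /sweets/, else None."""
    return TARGET if SWEETS_RE.fullmatch(url) else None


print(redirect_target("http://localhost/sweettemptations/sweets/somethingmore.html"))
print(redirect_target("http://localhost/sweettemptations/available-sweets"))  # None
```

As written, the pattern needs the trailing slash after "sweets", so ".../sweets" on its own is not matched, and "https://" URLs are not covered unless you change the prefix to (https?://)?.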
Use this regular expression: (?<=://).+
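What this pattern matches, checked in Python as a sketch: `(?<=://)` is a lookbehind, so the match starts right after "://" without including it, and `.+` then takes the rest of the URL, not only the part after /sweets/.

```python
import re

# Lookbehind: the match begins just after "://" and runs to the end of the line.
m = re.search(r"(?<=://).+", "http://localhost/sweettemptations/sweets/sweet-name")
print(m.group(0))  # localhost/sweettemptations/sweets/sweet-name
```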