How to extract and validate the id from a youtube video link? - regex

I found this RegEx for extracting YouTube IDs:
#^http(?:s?)://(?:www\.)?youtu(?:be\.com/watch\?(?:.*?&(?:amp;)?)?v=|\.be/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?#i
Now I'm trying to modify the RegEx to extract the YouTube ID from a YouTube URL in this format:
http://www.youtube.com/watch?v=ESUYMoJVpYo&feature=share&a=rRL4kwOAewcP9KzId6Ks4A
How do I make sure I get the ID extracted from all possible URL formats?

URLs aren't normally parsed with regular expressions. If you want to modify a URL in any way, you probably shouldn't use a regular expression to do it.
URLs use what's called a query string to pass parameters to a page. The query string starts at the question mark, which is followed by an ampersand-delimited list of name/value pairs.
For example, using your own url: http://www.youtube.com/watch?v=ESUYMoJVpYo&feature=share&a=rRL4kwOAewcP9KzId6Ks4A
Page request: www.youtube.com/watch
Whole query string: ?v=ESUYMoJVpYo&feature=share&a=rRL4kwOAewcP9KzId6Ks4A
Name/Value pairs:
v -> ESUYMoJVpYo
feature -> share
a -> rRL4kwOAewcP9KzId6Ks4A
If you want to parse/modify the URL, do so by breaking down the query string. That'll be much more reliable than trying to write a RegEx for it.
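As a rough illustration of breaking down the query string instead of using a RegEx, here is a minimal Python sketch using the standard urllib.parse module. The function name extract_video_id and the list of accepted host names are assumptions made for this example; the youtu.be branch covers the short-link form that the RegEx in the question also allows.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found.

    Hypothetical helper: the accepted hosts below are an assumption,
    not an exhaustive list of YouTube URL formats.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links carry the ID in the path: http://youtu.be/<id>
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/")
        return candidate or None

    # Watch links carry the ID in the "v" query parameter.
    if host in ("youtube.com", "www.youtube.com") and parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None


url = "http://www.youtube.com/watch?v=ESUYMoJVpYo&feature=share&a=rRL4kwOAewcP9KzId6Ks4A"
print(extract_video_id(url))
```

Because the query string is parsed into name/value pairs, the position of v among the other parameters no longer matters.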

Related

Regular expressions (RegEx) to filter string from URLs in Google Analytics

I want to filter a string from the URLs in Google Analytics. This can be done using the Views > Filter > Exclude using RegEx, but I have been unable to get it to work.
An outline of how these filters are set up can be found here; however, I cannot work out how to isolate the string using RegEx. I believe it will need one filter per URL type.
The URLs follow this format:
/software/11F372288FA/pagename
/software/13F412C5FA/pagename/summary
/software/XIL1P0BFXCKM81/pagename2
I need to exclude this part of the URL:
/11F372288FA/
So that the URL data (e.g. Session time) is recorded against:
/software/pagename
/software/pagename/summary
/software/pagename2
I have worked out that I can isolate the string using the following RegEx:
^\/validate\/(..........)\/accounts\/summary$
It is not very elegant and would require a filter for every URL type.
Thanks for the help!
I'm not certain this will work in your exact case, but instead of using regex it might be easier to build a new string from the leading "/software/" (including its trailing slash) and append everything from "pagename" to the end. In Java this might look something like:
String newString = oldString.substring(0, 10) + oldString.substring(oldString.indexOf("pagename"));
Take note, though, that this only works if the "/software/" prefix is always the same length (10 characters here) and the text you are excluding always sits between "/software/" and "pagename". If "pagename" does not occur in the string, indexOf returns -1 and substring throws an exception.
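If a pattern-based approach is preferred, one way to sketch it is to drop whatever single path segment follows "/software/", whatever its length. The Python below only illustrates the pattern against the three example URLs from the question; the function name strip_id_segment is made up for this example, and how the pattern would be entered into a Google Analytics filter is not covered here.

```python
import re

# Matches "/software/" followed by exactly one path segment and its slash.
ID_SEGMENT = re.compile(r"^/software/[^/]+/")


def strip_id_segment(path):
    """Remove the ID segment that directly follows /software/ (hypothetical helper)."""
    return ID_SEGMENT.sub("/software/", path)


for path in (
    "/software/11F372288FA/pagename",
    "/software/13F412C5FA/pagename/summary",
    "/software/XIL1P0BFXCKM81/pagename2",
):
    print(strip_id_segment(path))
```

This does not depend on the ID having a fixed length, so a single pattern covers all three URL types.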

Regex replace to map paginated URLs to a new format

I'm using a web crawler tool to compare two different website crawls before and after migration and need to map paginated URLs that have changed format.
e.g
Old: https://example.com/page/2/ OR: https://example.com/directory/page/16/
New: https://example.com/?page=2 OR: https://example.com/directory/?page=16
The tool has a regex replace feature for URL mapping.
However, I cannot get the regex right, and the end result has an extra forward slash at the end:
https://example.com/?page=2/
What is the correct regex here to get the result I'm looking for?
Regex: /page/([0-9]+)/
Replace: /?page=$1
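One possible adjustment, sketched here in Python rather than in the crawler tool itself, is to anchor the pattern at the end of the URL and make the trailing slash part of the match, so that it is consumed by the replacement. How the tool applies its regex is not stated in the question, so this is an assumption about the cause. Python writes the backreference as \1 where the tool uses $1, and the function name map_paginated_url is made up for this example.

```python
import re

# "/page/<number>" at the very end of the URL, with an optional trailing slash.
PAGINATED = re.compile(r"/page/([0-9]+)/?$")


def map_paginated_url(url):
    """Rewrite /page/N/ at the end of a URL to /?page=N (hypothetical helper)."""
    return PAGINATED.sub(r"/?page=\1", url)


print(map_paginated_url("https://example.com/page/2/"))
print(map_paginated_url("https://example.com/directory/page/16/"))
```

In the tool's own syntax the equivalent would presumably be Regex: /page/([0-9]+)/?$ with Replace: /?page=$1, but that has not been tested against the tool.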

How to compare two string using RegEx

I have a collection in MongoDB which has a list of URLs like below.
In my business logic, for some requirements, I want to check whether the called URL matches any of the records in the DB.
For example, suppose req.originalUrl gives me:
/logistics/initiator/5ee7a0be36acdc46ae0576d6/users
But this URL obviously contains the actual ID: 5ee7a0be36acdc46ae0576d6
What I tried:
I tried manually concatenating req.baseUrl and req.route.path, but that still gives me the string below
/logistics/initiator/:initiator/users
which is again incomparable.
Replace the ID with \{\w+\}, and use that as a regular expression to match against the url column in the table. So the regexp should be /logistics/initiator/\{\w+\}/users.
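A minimal sketch of that idea in Python, starting from the route path shown in the question: each ":name" segment is turned into \{\w+\} and everything else is matched literally. It assumes the stored URLs use a {placeholder} form such as /logistics/initiator/{initiatorId}/users, which is only implied by the suggested pattern; the function name route_to_pattern and the sample stored values are made up for this example.

```python
import re


def route_to_pattern(route_path):
    """Build a regex matching stored URL templates for a route path.

    Hypothetical helper: assumes stored templates write parameters
    as {name}, while the route path writes them as :name.
    """
    parts = []
    for segment in route_path.split("/"):
        if segment.startswith(":"):
            parts.append(r"\{\w+\}")          # any {placeholder}
        else:
            parts.append(re.escape(segment))  # literal path segment
    return re.compile("^" + "/".join(parts) + "$")


pattern = route_to_pattern("/logistics/initiator/:initiator/users")

# Sample stored values, invented for illustration.
stored = [
    "/logistics/initiator/{initiatorId}/users",
    "/logistics/initiator/{initiatorId}/roles",
]
print([url for url in stored if pattern.match(url)])
```

The resulting pattern string could then be used in a regex query against the collection instead of filtering in application code.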

Get whole url without preset parameters

So I have this regex for getting a YouTube link
/(http|https):\/\/www\.youtube\.com\/watch\?v=(\w+)/i
But the problem is that it won't pick up the end of the link of something like this:
https://www.youtube.com/watch?v=videoID&sdfgsdfgsdfg;jsfdg;lkjsdf;gkj
It picks up https://www.youtube.com/watch?v=videoID and leaves &sdfgsdfgsdfg;jsfdg;lkjsdf;gkj alone. I want it to pick up the whole string while still extracting the video ID.
Try this
/https?:\/\/www\.youtube\.com\/watch\?.*?&?v=(\w+)(?:&[^\s]+)?/i
https? matches the same text as (?:http|https); you didn't say you needed to capture the protocol, so it is not in a capture group
/watch\?...
.*? - Consume any additional query parameters
&? - If there are other query parameters before v, then there will be an &
v=(\w+)(?:&[^\s]+)? - Capture the VideoID and, optionally, the rest of the URL up to whitespace
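To check the behaviour described above, here is a small Python sketch that applies the same pattern to the URL from the question. The pattern is copied from the answer, minus the JavaScript-style delimiters and slash escaping; the variable names are made up for this example.

```python
import re

# Same pattern as in the answer; re.IGNORECASE stands in for the /i flag.
YOUTUBE_LINK = re.compile(
    r"https?://www\.youtube\.com/watch\?.*?&?v=(\w+)(?:&[^\s]+)?",
    re.IGNORECASE,
)

text = "https://www.youtube.com/watch?v=videoID&sdfgsdfgsdfg;jsfdg;lkjsdf;gkj"
match = YOUTUBE_LINK.search(text)

whole_link = match.group(0)  # the full matched URL, trailing parameters included
video_id = match.group(1)    # only the value of v
print(whole_link)
print(video_id)
```

One caveat: \w+ does not match a hyphen, so an ID containing "-" would be cut short; [\w-]+ is a possible substitute.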

Replacing Anchor in JSTL

In my JSP I receive some data coming from the database. The data looks, for example, something like this:
Google is the greatest search engine ever http://www.google.com
What I want to do is simple: I want to wrap this link in an anchor tag using JSTL, something like:
Google is the greatest search engine ever <a href="http://www.google.com">http://www.google.com</a>
That's all!
Note that the URLs are not constant. I can't be sure exactly what they will be; I only mentioned Google here as an example.
Follow this SO question to create a replaceAll function for JSTL and then use the following pattern to replace the url to html link:
String pattern = "(https?://[A-Za-z0-9._/~%-]+)";
String str = "Google is the http://www.test.com greatest search engine ever http://www.google.com";
String replaced = str.replaceAll(pattern, "<a href='$1'>$1</a>");