ColdFusion regex to match valid YouTube links (needs improvement)

Although similar questions have been asked here multiple times already, I've got a request to amend an existing regex to improve it. I'm pretty sure this will help others in the same situation too.
What I'm trying to achieve is to match valid YouTube video URLs using ColdFusion regex.
Here's what I've currently got:
ReMatch('^.*(youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=|\&v=)([^##\&\?]*).*',mylink)
This works for the following URL types:
http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index
http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o
http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s
http://www.youtube.com/embed/0zM3nApSvMg?rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg
http://youtu.be/0zM3nApSvMg
However, the following URL for whatever reason is getting matched too:
http://www.theguardian.com/media/2013/nov/29/russell-brand-rages-sun-rupert-murdoch
How can I amend the code to be a bit more accurate? Maybe requiring the 'youtu' part in the link would help, as I think the current regex treats it as only one of several alternatives. The trouble is that I'm not able to amend this code myself, hence asking for help here.
Edit:
Thanks to Omega's answer below, with a little amendment here's the pattern that worked for my case:
ReMatch('(http:\/\/)(?:www\.)?youtu(?:be\.com\/(?:watch\?|user\/|v\/|embed\/)\S+|\.be\/\S+)',mylink)
Also, it is worth noting I had to strip the lookbehind part from the suggested pattern as ColdFusion does not support it.

(?<=http:\/\/)(?:www\.)?youtu(?:be\.com\/(?:watch\?|user\/|v\/|embed\/)\S+|\.be\/\S+)
See this demo.
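To see why the Guardian URL slipped through: the original pattern's bare v\/ alternative matches the "v/" inside "nov/29", and nothing in the pattern forces a YouTube host. Below is a minimal sketch using Python's re module rather than ColdFusion (so the ColdFusion ## escape is written as a single #), comparing the two patterns from the question. The names OLD, NEW and is_youtube are illustrative only.

```python
import re

# Original pattern from the question (ColdFusion's "##" is a literal "#").
OLD = re.compile(r'^.*(youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=|\&v=)([^#\&\?]*).*')

# Amended pattern from the edit: the host must be youtube.com or youtu.be.
NEW = re.compile(
    r'(http:\/\/)(?:www\.)?youtu'
    r'(?:be\.com\/(?:watch\?|user\/|v\/|embed\/)\S+|\.be\/\S+)'
)

GUARDIAN = ("http://www.theguardian.com/media/2013/nov/29/"
            "russell-brand-rages-sun-rupert-murdoch")

YOUTUBE_URLS = [
    "http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index",
    "http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o",
    "http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0",
    "http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s",
    "http://www.youtube.com/embed/0zM3nApSvMg?rel=0",
    "http://www.youtube.com/watch?v=0zM3nApSvMg",
    "http://youtu.be/0zM3nApSvMg",
]


def is_youtube(url: str) -> bool:
    """Return True if the amended pattern finds a YouTube link in url."""
    return NEW.search(url) is not None


# The old pattern accepts the Guardian URL because "nov/" contains "v/".
print("old pattern matches Guardian:", OLD.search(GUARDIAN) is not None)
# The amended pattern rejects it and still accepts every listed YouTube URL.
print("new pattern matches Guardian:", is_youtube(GUARDIAN))
print("new pattern matches all YouTube URLs:", all(is_youtube(u) for u in YOUTUBE_URLS))
```

Note that the amended pattern only accepts http://. Changing the prefix to https?:\/\/ would presumably be needed for HTTPS links, which the question did not cover.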

Related

how to make regex

I was trying to solve a problem with a regex, but I find it very hard to write one. Let's look at an example; maybe you can help me out and point me to a good source for learning regex. I want a regex for strings such as www.facebook.com, www.goole.com, www.online.facebook.com and www.live.com. In these examples the www and com parts are the same, but the part between them changes. I tried to build it through this link but couldn't.
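One possible starting point, sketched in Python: keep the fixed www. prefix and com suffix literal, and allow one or more dot-separated labels in between. The label character set (letters, digits, hyphens) and the name WWW_COM are assumptions made for illustration, not something stated in the question.

```python
import re

# Assumption: each label between "www." and "com" consists of letters,
# digits, or hyphens, and there is at least one such label.
WWW_COM = re.compile(r'^www\.(?:[a-z0-9-]+\.)+com$', re.IGNORECASE)

hosts = [
    "www.facebook.com",
    "www.goole.com",
    "www.online.facebook.com",
    "www.live.com",
]

# Keep only the hosts that fit the www.<labels>.com shape.
matches = [h for h in hosts if WWW_COM.match(h)]
print(matches)
```

The escaped dots (\.) matter here: an unescaped . would match any character, not just a literal dot.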

LocationMatch rules with Apache

I need a rule (with or without regex) so that "/admin.php?+" is accessible on a site but "/SUBSITE/admin.php?+" is not. Does anyone have a good idea of how to do that?
Here is a regex which might fit your need (your original question was a bit vague):
^(?!www\.mysite\.com\/.*\/admin\.php).*$
I have tested this regex and it blocks "www.mysite.com/*/admin.php", but it allows the following:
www.mysite.com/*/guestbook.php
www.mysite.com/*.php
If you want additional restrictions in the regex, then let me know and I can refine this answer.
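The behaviour of the negative lookahead can be checked outside Apache. The following is a small sketch using Python's re module, which only exercises the regex itself and not a LocationMatch configuration; the helper name allowed and the sample query strings are made up for the test.

```python
import re

# The pattern from the answer: reject anything where "/admin.php" appears
# below at least one extra path segment after www.mysite.com/.
BLOCK_SUBSITE_ADMIN = re.compile(r'^(?!www\.mysite\.com\/.*\/admin\.php).*$')


def allowed(url: str) -> bool:
    """Return True if the pattern matches, i.e. the URL is not blocked."""
    return BLOCK_SUBSITE_ADMIN.match(url) is not None


for url in (
    "www.mysite.com/admin.php?x=1",          # top-level admin page
    "www.mysite.com/SUBSITE/admin.php?x=1",  # admin page under a subsite
    "www.mysite.com/SUBSITE/guestbook.php",  # unrelated page under a subsite
):
    print(url, "->", "allowed" if allowed(url) else "blocked")
```

The top-level admin.php passes because the lookahead requires a second slash directly before admin.php, and the top-level URL has none after the host.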

Regex to find a web address

I'm trying to isolate links from HTML using a regex, and the one I found that is supposed to do it doesn't seem to work.
/^(http?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
Am I missing something? I'm using Brackets as my text editor
^(?:http|https):\/\/(?:[a-z0-9\-\.]+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:\.\?\+=&%#!\-\/\(\)]+)|\?(?:[\w#!:\.\?\+=&%#!\-\/\(\)]+))?$
Messy, but works.
Also, you might want to look at a similar question: Regex expression for valid website link
Hope this helps :)
It is hard to make it 100% accurate.
A URL could also be an IP address, for example:
http://ip/
It can contain query strings.
http://www.google.com/?a=1&b=2
It can contain spaces.
http://www.google.com/this is my url/
It depends on how much accuracy you need.
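As a rough check of those edge cases, here is a sketch that runs the longer pattern from the first answer against the three kinds of URL mentioned, using Python's re module. The concrete address 192.168.0.1 and the names URL and looks_like_url are assumptions standing in for the examples.

```python
import re

# The "messy, but works" pattern from the first answer, unchanged.
URL = re.compile(
    r'^(?:http|https):\/\/(?:[a-z0-9\-\.]+)(?::[0-9]+)?'
    r'(?:\/|\/(?:[\w#!:\.\?\+=&%#!\-\/\(\)]+)|\?(?:[\w#!:\.\?\+=&%#!\-\/\(\)]+))?$'
)


def looks_like_url(text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return URL.match(text) is not None


samples = [
    "http://192.168.0.1/",                    # IP address as host (example value)
    "http://www.google.com/?a=1&b=2",         # query string
    "http://www.google.com/this is my url/",  # spaces in the path
]
for s in samples:
    print(repr(s), "->", looks_like_url(s))
```

With this pattern the IP address and the query string are accepted, while the URL containing spaces is rejected because a space is not in the allowed character class. The pattern is also anchored with ^ and $, so it validates a whole string rather than finding links inside a larger HTML document.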

Need to better my regex for full sentence removal instead of link removal

So, after some help from some lovely folks on Stack Overflow, I got a regex to remove links that people post. Now I think I want one that removes their entire post, perhaps replacing it with " ", so my form will not allow the post. (Instead of "hey, check out my site at [LINK REMOVED]", which is awesome, it would be better if it removed the whole sentence rather than just the link.) I am terrible with regexes atm, so any help would be greatly appreciated!
Here is my current regex:
$a = $_POST['msge'];
$b = preg_replace('%[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)%', '[LINK REMOVED]', $a);
Any ideas?
There are better ways to find links in a string, here's an example in Perl that was given in this related question. If you're dead set on using a regex, this was mentioned in another related question and looks more promising than the one you're currently trying.
If you want to do replacement of the entire sentence given a link, you could use something like the following:
[^.!?]*(link)[^.!?]*[.!?]
Obviously you would want to replace link with your link pattern match.
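A minimal sketch of that idea in Python (the question uses PHP's preg_replace, so this only illustrates the pattern). The sentence part is written with plain character classes, [^.!?], because | and a non-leading ^ are literal characters inside brackets. The link part reuses the domain pattern from the question; the names LINK, SENTENCE_WITH_LINK and strip_link_sentences are hypothetical.

```python
import re

# Domain pattern adapted from the question: something.(com|org|net|mil|edu).
LINK = r'[a-zA-Z0-9.-]+\.(?:com|org|net|mil|edu)'

# A run of non-terminator characters, the link, more non-terminators,
# then an optional sentence terminator.
SENTENCE_WITH_LINK = re.compile(
    r'[^.!?]*' + LINK + r'[^.!?]*[.!?]?',
    re.IGNORECASE,
)


def strip_link_sentences(text: str) -> str:
    """Remove every sentence that contains something shaped like a link."""
    return SENTENCE_WITH_LINK.sub('', text).strip()


post = "Hello there. Check out my site at example.com today! Bye."
print(strip_link_sentences(post))
```

This treats every ".", "!" or "?" as a sentence boundary, so abbreviations and decimal numbers next to a link will confuse it.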
Subjectively I would also suggest it may be a little odd to remove entire sentences from the middle of content that people are posting since it may alter the entire meaning of the post. If your main intent is to remove the link (for example, to prevent spam backlinks) you may just want to obfuscate the link by replacing it with something obvious like -LINK-.

Get value between <b> tag using regex in Yahoo Pipes

I have searched up and down trying to find an answer that will work for me but haven't been able to figure this out. I'm using Yahoo Pipes for this.
Lake Harmony Estates <b>Sleeps: 16</b>
What I need to do is extract the Sleeps: 16 out from the B tag and output just that value and nothing else. I don't suspect this is very hard to do, but given my limited regex knowledge it's giving me troubles. I've tried adapting regex code pertaining to other tags, but just can't seem to get this one to work.
Any help on this would be appreciated. Thanks.
Edit:
Here is my pipe if you wanted to take a look at the regex horrible-ness I've created. The one I'm trying to work though is the item.sleeps, last entry in the 2nd regex
http://pipes.yahoo.com/pipes/pipe.info?_id=567026d850223b0075d80fd3c9bf7e75
This should fit your needs, assuming the HTML isn't laden with quotes and such. Note that the + means that empty <b> tags are ignored. Also, HTML is not truly parsable via regex, so this will only work for basic tags. It should work even if the tag has an ID or a class attribute, but there are certainly ways to break this regex.
/<b[^>]*>([^<]+)<\/b>/
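For illustration, the same capture-group approach in Python's re module (Yahoo Pipes has its own regex module, so this is only a sketch of the pattern's behaviour). One small assumption is added: a \b after <b so that tags such as <br> are not mistaken for <b>. The names B_TAG and first_bold_text are hypothetical.

```python
import re
from typing import Optional

# Capture the text between <b ...> and </b>; "\b" keeps <br> from matching.
B_TAG = re.compile(r'<b\b[^>]*>([^<]+)</b>')


def first_bold_text(html: str) -> Optional[str]:
    """Return the text of the first non-empty <b> element, or None."""
    m = B_TAG.search(html)
    return m.group(1) if m else None


print(first_bold_text("Lake Harmony Estates <b>Sleeps: 16</b>"))
print(first_bold_text('Cabin <b class="x">Sleeps: 4</b>'))
print(first_bold_text("Empty <b></b> tag"))
```

Because the capture group is [^<]+, nested markup inside the <b> element (for example an <i> tag) will prevent a match.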
I posted this question to Twitter and got a response back that worked for me.
(?s)^.*<b>(.*?)</b>.*
Replace with $1 and have G flag checked.
This solution did everything I needed. I had additional data that I had already excluded in my example that became unnecessary with this regex.
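The replace-with-$1 approach can be sketched in Python as well, where the backreference is written \1 instead of $1. This is an approximation of what the Pipes regex module does, not its actual implementation; KEEP_B and keep_bold_only are made-up names.

```python
import re

# (?s) lets "." match newlines, so the pattern swallows the whole input
# and keeps only the text inside the <b> element.
KEEP_B = re.compile(r'(?s)^.*<b>(.*?)</b>.*')


def keep_bold_only(text: str) -> str:
    """Replace the whole string with the contents of its <b> element."""
    return KEEP_B.sub(r'\1', text)


print(keep_bold_only("Lake Harmony Estates <b>Sleeps: 16</b>"))
print(keep_bold_only("line one\n<b>Sleeps: 2</b>\ntrailer"))
```

Because the leading .* is greedy, an input with several <b> elements keeps the last one. A string with no plain <b> tag is returned unchanged.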