How to capture a group from an absolute file path without any slash in it using JavaScript - regex

Here is a sample file path,
/Users/X/Q/Q-doc/src/templates/demos.js
The part I would love to capture is demos.
Here is another example,
/Users/X/Q/Q-doc/src/templates/demos1.js
The target I want is demos1.
I tried to use /\/(.*).js/ to capture the filename, but it seems to also capture everything in between.

([^\/]*?)\.js$
This will grab everything that is not a forward slash, so long as it's followed by .js, from the end of the string.
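In JavaScript, a minimal sketch of applying this pattern could look like the following. The helper name captureJsBaseName is made up for illustration; the first capture group holds the part before .js.

```javascript
// Hypothetical helper (the name is not from the question or the answer).
// Applies ([^\/]*?)\.js$ and returns the first capture group.
function captureJsBaseName(path) {
  const match = /([^\/]*?)\.js$/.exec(path);
  return match ? match[1] : null; // null when the path does not end in .js
}

console.log(captureJsBaseName('/Users/X/Q/Q-doc/src/templates/demos.js'));  // demos
console.log(captureJsBaseName('/Users/X/Q/Q-doc/src/templates/demos1.js')); // demos1
```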

Your pattern is doing what it should; however, your approach needs a fix. You can use this approach instead:
(\w+)\.js
Update: in case you need a match for samples like Kyle Fairns mentioned in his comment you can use
.*\/(.*?)\.js
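To see how the two patterns in this answer differ, here is a small JavaScript sketch. The hyphenated path is a made-up input (it is not taken from the comment mentioned above), chosen because \w does not match a hyphen.

```javascript
const wordPattern = /(\w+)\.js/;     // first pattern from this answer
const lazyPattern = /.*\/(.*?)\.js/; // pattern from the update

const plainPath = '/Users/X/Q/Q-doc/src/templates/demos1.js';
const hyphenPath = '/Users/X/Q/Q-doc/src/templates/my-demo.js'; // hypothetical input

console.log(wordPattern.exec(plainPath)[1]);  // demos1
console.log(lazyPattern.exec(plainPath)[1]);  // demos1

// \w+ stops at the hyphen, so only the tail of the name is captured.
console.log(wordPattern.exec(hyphenPath)[1]); // demo
// The greedy .*\/ runs to the last slash, then the lazy group takes the whole name.
console.log(lazyPattern.exec(hyphenPath)[1]); // my-demo
```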

What is the correct regex pattern to use to clean up Google links in Vim?

As you know, Google links can be pretty unwieldy:
https://www.google.com/search?q=some+search+here&source=hp&newwindow=1&ei=A_23ssOllsUx&oq=some+se....
I have MANY Google links saved that I would like to clean up to make them look like so:
https://www.google.com/search?q=some+search+here
The only issue is that I cannot figure out the correct regex pattern for Vim to do this.
I figure it must be something like this:
:%s/&source=[^&].*//
:%s/&source=[^&].*[^&]//
:%s/&source=.*[^&]//
But none of these are working; they start at &source, and replace until the end of the line.
Also, the search?q=some+search+here can appear anywhere after the .com/, so I cannot rely on it being in the same place every time.
So, what is the correct Vim regex pattern to use in order to clean up these links?
Your example can easily be dealt with by using a very simple pattern:
:%s/&.*
because you want to keep everything that comes before the second parameter, which is marked by the first & in the string.
But, if the q parameter can be anywhere in the query string, as in:
https://www.google.com/search?source=hp&newwindow=1&q=some+search+here&ei=A_23ssOllsUx&oq=some+se....
then no amount of capturing or whatnot will be enough to cover every possible case with a single pattern, let alone a readable one. At this point, scripting is really the only reasonable approach, preferably with a language that understands URLs.
--- EDIT ---
Hmm, scratch that. The following seems to work across the board:
:%s#^\(https://www.google.com/search?\)\(.*\)\(q=.\{-}\)&.*#\1\3
We use # as separator because of the many / in a typical URL.
We capture a first group, up to and including the ? that marks the beginning of the query string.
We match whatever comes between the ? and q=; it lands in a second group, which the replacement does not reuse.
We capture a third group, the q parameter, up to and excluding the next &.
We replace the whole thing with the first capture group followed by the third capture group (\1\3).
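For comparison, the scripting approach mentioned earlier in this answer could be sketched in JavaScript with the standard URL and URLSearchParams objects. This is only an illustration outside of Vim: the helper name keepOnlyQ and the shortened sample link are assumptions, and because the value is decoded and re-encoded, the output may differ cosmetically from the input (for example %20 becomes +).

```javascript
// Hypothetical helper: drops every query parameter except q.
function keepOnlyQ(link) {
  const url = new URL(link);
  const q = url.searchParams.get('q');
  if (q === null) return link; // no q parameter: leave the link untouched
  // Rebuild the query string with q alone; spaces are serialized as +.
  url.search = new URLSearchParams({ q }).toString();
  return url.toString();
}

// Sample link shortened for illustration; q is not the first parameter here.
console.log(keepOnlyQ('https://www.google.com/search?source=hp&newwindow=1&q=some+search+here&ei=A_23ssOllsUx'));
// https://www.google.com/search?q=some+search+here
```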

REGEX: Repeating Character that is Optional

I have a regex problem in which I have to capture these example outputs (github.com, medium.com, www.nytimes.com, www.theguardian.com, techcrunch.com) from the links. The problem is that some links don't have "www", so this is my regex:
https?:\/\/([w]{3}?\.?)
My idea was to make the www optional, but I don't want to capture it. Thanks!
edit: I found the solution!
https?:\/\/([\w\-\.]+)
While your suggested solution achieves the desired effect, the recommended way would be to use a non-capturing group.
That would turn this:
https?:\/\/([w]{3}?\.?)
Into this:
https?:\/\/(?:[w]{3}?\.?)
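Note that [w]{3}? still requires exactly three w characters; the trailing ? only makes the quantifier lazy. A minimal JavaScript sketch of a non-capturing, truly optional prefix follows, assuming the goal is the host name without a leading www. The pattern, the helper name and the sample links are illustrative, not taken from the question.

```javascript
// Assumed pattern: (?:www\.)? is optional and non-capturing,
// so group 1 is the host name that follows it.
const hostPattern = /https?:\/\/(?:www\.)?([\w\-.]+)/;

// Hypothetical helper name.
function hostWithoutWww(link) {
  const match = hostPattern.exec(link);
  return match ? match[1] : null;
}

console.log(hostWithoutWww('https://www.nytimes.com/section/world')); // nytimes.com
console.log(hostWithoutWww('https://github.com/user/repo'));          // github.com
```

Because the www. part sits in a (?:...) group, the match array has only one capture slot.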

Is it possible to remove the slash in this matching?

I want to extend my regexp for file path matching, and I don't know how to do it even though I see the problem.
Input example
"C://species/dinosaurs/trex.json"
Output example
["C://species/dinosaurs" "trex" "json"]
so that I have the folder path, the filename and the extension.
I also want the folder path to be optional
My regexp
I tried
"^(.*[\\\/])?(.*)\.(.*)$"
It outputs
["C://species/dinosaurs/" "trex" "json"]
Almost, but I still have the / at the end of the head.
So I tried
"^((.*)[\\\/])?(.*)\.(.*)$"
It outputs
["C://species/dinosaurs/" "C://species/dinosaurs" "trex" "json"]
Maybe better, because I just have to remove the first match, whereas in the first case I have to post-process the string.
I see the problem: several / can exist in the body, which makes it harder.
Is it possible to say that the first matching group can end with anything but /?
I tried
^(.*(?!\/))[\\\/]?(.*)\.(.*)$
Does not work. I just discovered negative assertions but the output is
["C://species/dinosaurs/trex" "json"]
Any clue?
This one should suit your needs:
^(?:(.*)/)?([^/]+)\.([^.]+)$
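As a rough JavaScript sketch of how this pattern behaves (the helper name splitPath is hypothetical): the folder group is undefined when the path has no folder part, and only forward slashes are treated as separators.

```javascript
// The pattern from this answer, with / escaped for a JavaScript regex literal.
const pathPattern = /^(?:(.*)\/)?([^\/]+)\.([^.]+)$/;

// Hypothetical helper: returns [folder, name, extension] or null.
function splitPath(path) {
  const match = pathPattern.exec(path);
  if (!match) return null;
  const [, folder, name, extension] = match;
  return [folder, name, extension];
}

console.log(splitPath('C://species/dinosaurs/trex.json'));
// [ 'C://species/dinosaurs', 'trex', 'json' ]
console.log(splitPath('trex.json'));
// [ undefined, 'trex', 'json' ]
```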

Regex for youtube URL

I am using the following regex for validating YouTube video share URLs.
var valid = /^(http\:\/\/)?(youtube\.com|youtu\.be)+$/;
alert(valid.test(url));
return false;
I want the regex to support the following URL formats:
http://youtu.be/cCnrX1w5luM
http://youtube/cCnrX1w5luM
www.youtube.com/cCnrX1w5luM
youtube/cCnrX1w5luM
youtu.be/cCnrX1w5luM
I tried different regexes, but I am not getting a suitable one for share links. Can anyone help me solve this?
Here's a regex I use to match and capture the important bits of YouTube URLs with video codes:
^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube(-nocookie)?\.com|youtu\.be))(\/(?:[\w\-]+\?v=|embed\/|v\/)?)([\w\-]+)(\S+)?$
Works with the following URLs:
https://www.youtube.com/watch?v=DFYRQ_zQ-gk&feature=featured
https://www.youtube.com/watch?v=DFYRQ_zQ-gk
http://www.youtube.com/watch?v=DFYRQ_zQ-gk
//www.youtube.com/watch?v=DFYRQ_zQ-gk
www.youtube.com/watch?v=DFYRQ_zQ-gk
https://youtube.com/watch?v=DFYRQ_zQ-gk
http://youtube.com/watch?v=DFYRQ_zQ-gk
//youtube.com/watch?v=DFYRQ_zQ-gk
youtube.com/watch?v=DFYRQ_zQ-gk
https://m.youtube.com/watch?v=DFYRQ_zQ-gk
http://m.youtube.com/watch?v=DFYRQ_zQ-gk
//m.youtube.com/watch?v=DFYRQ_zQ-gk
m.youtube.com/watch?v=DFYRQ_zQ-gk
https://www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
http://www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
//www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
https://www.youtube.com/embed/DFYRQ_zQ-gk?autoplay=1
https://www.youtube.com/embed/DFYRQ_zQ-gk
http://www.youtube.com/embed/DFYRQ_zQ-gk
//www.youtube.com/embed/DFYRQ_zQ-gk
www.youtube.com/embed/DFYRQ_zQ-gk
https://youtube.com/embed/DFYRQ_zQ-gk
http://youtube.com/embed/DFYRQ_zQ-gk
//youtube.com/embed/DFYRQ_zQ-gk
youtube.com/embed/DFYRQ_zQ-gk
https://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk?autoplay=1
https://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
http://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
//www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
https://youtube-nocookie.com/embed/DFYRQ_zQ-gk
http://youtube-nocookie.com/embed/DFYRQ_zQ-gk
//youtube-nocookie.com/embed/DFYRQ_zQ-gk
youtube-nocookie.com/embed/DFYRQ_zQ-gk
https://youtu.be/DFYRQ_zQ-gk?t=120
https://youtu.be/DFYRQ_zQ-gk
http://youtu.be/DFYRQ_zQ-gk
//youtu.be/DFYRQ_zQ-gk
youtu.be/DFYRQ_zQ-gk
https://www.youtube.com/HamdiKickProduction?v=DFYRQ_zQ-gk
The captured groups are:
protocol
subdomain
domain
path
video code
query string
https://regex101.com/r/vHEc61/1
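A minimal JavaScript sketch of pulling the listed parts out of a match is shown below. The helper name parseYoutubeUrl is hypothetical, and the dot in youtu.be is escaped here so that it only matches a literal dot. One detail worth noting: the inner (-nocookie) group is itself a capturing group, so in the match array it occupies index 4 and the path, video code and query string sit at indices 5, 6 and 7.

```javascript
const youtubePattern = /^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube(-nocookie)?\.com|youtu\.be))(\/(?:[\w\-]+\?v=|embed\/|v\/)?)([\w\-]+)(\S+)?$/;

// Hypothetical helper: returns the named parts, or null when nothing matches.
function parseYoutubeUrl(url) {
  const match = youtubePattern.exec(url);
  if (!match) return null;
  // The empty slot skips match[4], the inner (-nocookie) group.
  const [, protocol, subdomain, domain, , path, videoCode, queryString] = match;
  return { protocol, subdomain, domain, path, videoCode, queryString };
}

const parsed = parseYoutubeUrl('https://www.youtube.com/watch?v=DFYRQ_zQ-gk&feature=featured');
console.log(parsed.videoCode);   // DFYRQ_zQ-gk
console.log(parsed.queryString); // &feature=featured
console.log(parseYoutubeUrl('youtu.be/DFYRQ_zQ-gk').domain); // youtu.be
```

Groups that did not take part in the match (for example the protocol of youtu.be/DFYRQ_zQ-gk) come back as undefined.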
You're missing www in your regex
The second \. should be optional if you want to match both youtu.be and youtube (but I didn't change this, since just youtube isn't actually a valid domain; see note below).
+ in your regex allows for one or more of (youtube\.com|youtu\.be), not one or more wild-cards.
You need to use a . to indicate a wild-card, and + to indicate you want one or more of them.
Try:
^(https?\:\/\/)?(www\.youtube\.com|youtu\.be)\/.+$
If you want it to match URLs with or without the www., just make it optional:
^(https?\:\/\/)?((www\.)?youtube\.com|youtu\.be)\/.+$
Invalid alternatives:
If you want www.youtu.be/... to also match (at the time of writing, this doesn't appear to be a valid URL format), put the optional www. outside the brackets:
^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.be)\/.+$
youtube/cCnrX1w5luM (with or without http://) isn't a valid URL, but the question explicitly mentions that the regex should support it. To include this, replace youtu\.be with youtu\.?be in any regex above.
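As a quick check, the optional-www variant with youtu\.?be substituted in can be run against the five formats listed in the question. This is only a sketch in JavaScript; it tests the shape of the string and says nothing about whether the video exists.

```javascript
// Optional-www pattern from this answer, with youtu\.?be as suggested above.
const sharePattern = /^(https?:\/\/)?((www\.)?youtube\.com|youtu\.?be)\/.+$/;

// The formats listed in the question.
const questionUrls = [
  'http://youtu.be/cCnrX1w5luM',
  'http://youtube/cCnrX1w5luM',
  'www.youtube.com/cCnrX1w5luM',
  'youtube/cCnrX1w5luM',
  'youtu.be/cCnrX1w5luM',
];

for (const url of questionUrls) {
  console.log(url, sharePattern.test(url)); // true for each of the five
}
```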
I know I'm like 2 years late to the party, but I needed to write something up anyway, and this seems to fit every test case that I can throw at it. You should be able to reference the first capture group ($1) to get the ID. It matches http, https, www and non-www, youtube.com, youtu.be, /watch? and /watch.php? on youtube.com (youtu.be does not use these), and it supports matching even when there are other variables in the URL string (?t= for time, ?list= for playlists, etc).
(?:https?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]+)
The format for YouTube video URLs has changed. This regex covers the youtube.com/watch?v= and youtu.be forms:
^(http(s)??\:\/\/)?(www\.)?((youtube\.com\/watch\?v=)|(youtu\.be\/))([a-zA-Z0-9\-_]+)
Based on many of the other regexes here, this is the best I've got:
((http(s)?:\/\/)?)(www\.)?((youtube\.com\/)|(youtu\.be\/))[\S]+
Test:
http://regexr.com/3bga2
Try this:
((http://)?)(www\.)?((youtube\.com/)|(youtu\.be)|(youtube)).+
http://regexr.com?36o7a
I took one of the answers from here and added support for a few edge cases that I noticed in my dataset. This should work for pretty much any valid URL.
^(?:https?:)?(?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]{7,15})(?:[\?&][a-zA-Z0-9\_-]+=[a-zA-Z0-9\_-]+)*(?:[&\/\#].*)?$
I tried this one and it works fine for me.
(?:http(?:s)?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user)\/))([^\?&\"'<> #]+)
You can check here https://regex101.com/r/Kvk0nB/1
https://regexr.com/62kgd
^((http|https)\:\/\/)?(www\.youtube\.com|youtu\.?be)\/((watch\?v=)?([a-zA-Z0-9]{11}))(&.*)*$
https://www.youtube.com/watch?v=YPz9zqakRbk
https://www.youtube.com/watch?v=YPz9zqakRbk&t=11
http://youtu.be/cCnrX1w5luM&y=12
http://youtu.be/cCnrX1w5luM
http://youtube/cCnrXswsluM
www.youtube.com/cCnrX1w5luM
youtube/cCnrX1w5luM
Check this pattern instead:
r'(?i)(http.//|https.//)*[A-Za-z0-9._%+-]+\.\w+'

RegEx to find all possible relative links to a specific file - also capture link text

Yes, there are hundreds of [regex] [html] topics on SO, but the first 30 I've checked don't help me with my problem.
I've got 745 total links (all relative, and they have to stay relative) to a file in my site. I need to find all these links and append data before and after them. I also need to capture and use the link text.
I've tried several expressions and the regex below is the closest I can get, but it's not good enough - it keeps finding a few instances of some other href to a different file and captures the content all the way to the </a> of the file I actually care about.
<a href="((.)*?)?myFile.html((.)*?)?>((.)*?)?</a>
In the above, I need to capture the relative path to the file and any anchors that might be present, as well as the actual link text.
What regex should I be using?
It shouldn't matter, but I'm using Adobe Dreamweaver to perform the search.
The following regex should work for what you need:
<a href="([^"]*?a\.fparameters\.html)(#[^"]+?)?".*?>(.*?)<
It will work even if you have URLs like:
JOBMAXNODECOUNT
that do not have #xxxx.
A few examples:
For JOBMAXNODECOUNT you will get:
Group 1: a.fparameters.html
Group 2: #jobmaxnodecount
Group 3: JOBMAXNODECOUNT
For mjobctl -m to modify the job after it has been submitted. See the RSVSEARCHALGO you will get only one match:
Group 1: a.fparameters.html
Group 2: #rsvsearchalgo
Group 3: RSVSEARCHALGO
Try this regex: (updated)
href="([^"]*?)myFile\.html#?([^"]*).*?>(.*?)<\/a>
Explained demo here: http://regex101.com/r/lA6vB7
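Outside of Dreamweaver, the same pattern can be tried in JavaScript with matchAll. This is a sketch only: the markup string is made up for illustration, and the g flag is added so that every link is found.

```javascript
// The pattern from this answer, with the g flag so matchAll can iterate.
const linkPattern = /href="([^"]*?)myFile\.html#?([^"]*).*?>(.*?)<\/a>/g;

// Hypothetical markup: one link to another file, two links to myFile.html.
const html =
  '<a href="other.html">Other</a> ' +
  '<a href="../docs/myFile.html#intro">Intro</a> ' +
  '<a href="myFile.html">Home</a>';

const links = Array.from(html.matchAll(linkPattern), ([, path, anchor, text]) => ({
  path,   // relative path before myFile.html
  anchor, // fragment after #, or '' when absent
  text,   // link text
}));

console.log(links);
// two entries: ../docs/ + intro + Intro, and '' + '' + Home
```

Because the path group is [^"]*? rather than .*?, it cannot run past the closing quote of an unrelated href, which is the failure described in the question.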
First, never do this: (.)* ...or this: (?:.)*
The first one consumes one character at a time and captures it in a group, each time overwriting the previously captured character. The second one avoids most of that overhead by using a non-capturing group, but it's still only matching one character at a time inside that group; why bother? All it's doing is cluttering up the regex.
Adding the ? to make it non-greedy -- e.g. (.)*? -- doesn't make it worse, but it doesn't help, either. And sticking that inside another group and making the group optional -- i.e. ((.)*?)? -- is a recipe for catastrophic backtracking. But performance considerations aside, when I see a capturing group with a quantifier attached, it almost always turns out to be a mistake on the author's part.
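The overwriting behaviour is easy to observe in JavaScript, where a quantified capturing group keeps only its last iteration. A small sketch with a made-up input string:

```javascript
// Quantifier outside the group: the group is re-captured on every iteration.
const perChar = /(.)*/.exec('abc');
console.log(perChar[0]); // abc  (the overall match)
console.log(perChar[1]); // c    (only the last character survives)

// Quantifier inside the group: the group holds the whole run.
const whole = /(.*)/.exec('abc');
console.log(whole[1]);   // abc
```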
As for your question, my solution turns out to be almost identical to Oscar's:
([^<>]*)