Get a part of the second group - regex

I'm having some difficulties with regex.
Here is an example of the string on which I'm doing regex:
This is some useless information (first;second;third;fourth;fifth;sixth) (seventh;eigth;ninth;tenth)
I am looking for a regex that will allow me to pick only one of the words in parentheses, like 'ninth'. The word I need to pick depends on where I am in my program, so I will just adapt the regex once I know how to write it.
The best I have found for the moment is : (?<=\()([^]]+?)(?=\)).*?
That allows me to match the whole content of the group between parentheses.
Can someone help me please?

If the need is to match the contents between parentheses given a variable
input parameter, it can be done like this:
(?<=\()(?:(?![()]).)*?(?<=[(;])(ninth)(?=[);])(?:(?![()]).)*(?=\))
It is dynamically constructed by joining the three parts.
(?<=\()(?:(?![()]).)*?(?<=[(;])( + variable + )(?=[);])(?:(?![()]).)*(?=\))
https://regex101.com/r/6yxQyp/1
Where the variable is captured in group 1 if needed.
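As a rough illustration of the "joining the three parts" idea, here is a minimal Python sketch. The helper name build_pattern is made up for this example, and re.escape is added on the assumption that the variable should be matched literally rather than as regex syntax.

```python
import re

def build_pattern(word):
    """Join the fixed prefix, the escaped word, and the fixed suffix."""
    # Prefix: start right after "(" and lazily skip non-parenthesis
    # characters until the word is preceded by "(" or ";".
    prefix = r"(?<=\()(?:(?![()]).)*?(?<=[(;])("
    # Suffix: the word must be followed by ")" or ";", then the rest of
    # the group is consumed up to (not including) the closing ")".
    suffix = r")(?=[);])(?:(?![()]).)*(?=\))"
    # re.escape is an assumption: it treats the word as literal text.
    return re.compile(prefix + re.escape(word) + suffix)

text = ("This is some useless information "
        "(first;second;third;fourth;fifth;sixth) (seventh;eigth;ninth;tenth)")

match = build_pattern("ninth").search(text)
if match:
    print(match.group(0))  # content of the parenthesized group holding the word
    print(match.group(1))  # the word itself
```

A partial word such as "nin" should not match, because the word has to be delimited by ( or ; on the left and ) or ; on the right.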

Thank you for your help.
I finally found a regex that allows me to get the value I wanted.
To reuse my example :
This is some useless information (first;second;third;fourth;fifth;sixth) (seventh;eigth;ninth;tenth)
The words 'first' and 'seventh' will always be there, so if I want to get the value of second I will use:
\(first;(\w+)
To get the value of ninth I will use:
\(seventh;\w+;(\w+)
Hope this will help someone else !
Have a good day :)
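For anyone adapting this, the two patterns above follow the same shape: a fixed first word, some number of skipped words, then one captured word. Below is a small Python sketch of that idea. The function nth_after and its parameters are hypothetical names for this example, not something from the original post.

```python
import re

def nth_after(anchor, n, text):
    """Return the n-th word after `anchor` inside a "(a;b;c)" group, or None."""
    # Skip n-1 ";word" items, then capture the next word.
    skip = r"(?:;\w+){%d}" % (n - 1)
    pattern = r"\(" + re.escape(anchor) + skip + r";(\w+)"
    m = re.search(pattern, text)
    return m.group(1) if m else None

text = ("This is some useless information "
        "(first;second;third;fourth;fifth;sixth) (seventh;eigth;ninth;tenth)")

print(nth_after("first", 1, text))    # same as \(first;(\w+)
print(nth_after("seventh", 2, text))  # same as \(seventh;\w+;(\w+)
```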

Related

What is the correct regex pattern to use to clean up Google links in Vim?

As you know, Google links can be pretty unwieldy:
https://www.google.com/search?q=some+search+here&source=hp&newwindow=1&ei=A_23ssOllsUx&oq=some+se....
I have MANY Google links saved that I would like to clean up to make them look like so:
https://www.google.com/search?q=some+search+here
The only issue is that I cannot figure out the correct regex pattern for Vim to do this.
I figure it must be something like this:
:%s/&source=[^&].*//
:%s/&source=[^&].*[^&]//
:%s/&source=.*[^&]//
But none of these are working; they start at &source, and replace until the end of the line.
Also, the search?q=some+search+here can appear anywhere after the .com/, so I cannot rely on it being in the same place every time.
So, what is the correct Vim regex pattern to use in order to clean up these links?
Your example can easily be dealt with by using a very simple pattern:
:%s/&.*
because you want to keep everything that comes before the second parameter, which is marked by the first & in the string.
But, if the q parameter can be anywhere in the query string, as in:
https://www.google.com/search?source=hp&newwindow=1&q=some+search+here&ei=A_23ssOllsUx&oq=some+se....
then no amount of capturing or whatnot will be enough to cover every possible case with a single pattern, let alone a readable one. At this point, scripting is really the only reasonable approach, preferably with a language that understands URLs.
--- EDIT ---
Hmm, scratch that. The following seems to work across the board:
:%s#^\(https://www.google.com/search?\)\(.*\)\(q=.\{-}\)&.*#\1\3
We use # as separator because of the many / in a typical URL.
We capture a first group, up to and including the ? that marks the beginning of the query string.
We match whatever comes between the ? and q= in a second group, which the replacement discards.
We capture a third group, the q parameter, up to and excluding the next &.
We replace the whole thing with the first capture group followed by the third capture group.
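For comparison, the scripting route mentioned earlier (a language that understands URLs) could look roughly like this in Python. This is only a sketch: keep_only_q is a made-up helper name, the sample URLs are shortened stand-ins for the ones above, and it assumes you want to drop every parameter except q as well as any fragment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def keep_only_q(url):
    """Return the URL with every query parameter except q removed."""
    parts = urlsplit(url)
    # parse_qsl decodes the query into (key, value) pairs; keep only q.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k == "q"]
    # urlencode re-encodes spaces as "+"; the fragment is dropped on purpose.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

# Shortened example URLs (assumed values, not real saved links).
first = "https://www.google.com/search?q=some+search+here&source=hp&newwindow=1"
middle = "https://www.google.com/search?source=hp&newwindow=1&q=some+search+here&oq=some+se"

print(keep_only_q(first))
print(keep_only_q(middle))
```

Because the query string is parsed rather than pattern-matched, the position of q does not matter, and a parameter like oq is not mistaken for q.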

Get only first match in Regex

Given this string: hello"C07","73" (quotes included) I want to get "C07". I'm using (?:hello)|(?<=")(?<screen>[a-zA-Z0-9]+)?(?=") to try to do this. However, it consistently matches "73" as well. I've tried ...0-9]+){1}..., but that doesn't work either. I must be misunderstanding how this is supposed to work, but I can't figure out any other way.
How can I get just the first set of characters between quotes?
EDIT: Here's a link to show my problem.
EDIT: Ok, here's exactly what I'm trying to do:
Basically, what I'm trying to get is this: 1) a positive match on "hello", 2) a group named "screen" with, in this case, "C07" in it and 3) a group named "format" with, in this case, "73" in it.
Both the "C07" and "73" will vary. "hello" will always be the same. There may or may not be an extra comma between "hello" and the first double-quote.
For your initial question of how to stop after the first match: either removing the global flag or anchoring the search at the start of the string would accomplish that.
For the latter question, you can name your groups and just keep extending the pattern throughout the line(s).
hello"(?<screen>[^"]+)","(?<format>[^"]+)"
Demo: http://regex101.com/r/PBXe8l/1
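If it helps, here is how the same idea might look in Python. Note that Python spells named groups as (?P<name>...) rather than (?<name>...). The optional comma after hello is my reading of "there may or may not be an extra comma", so treat that part as an assumption.

```python
import re

# ",?" allows the optional comma between hello and the first quote.
pattern = re.compile(r'hello,?"(?P<screen>[^"]+)","(?P<format>[^"]+)"')

for line in ['hello"C07","73"', 'hello,"C07","73"']:
    m = pattern.search(line)
    if m:
        # Each named group holds the text between one pair of quotes.
        print(m.group("screen"), m.group("format"))
```

re.search returns only the first match, so there is no global flag to turn off here.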
Based on your regex example, why not:
^(?:hello)"([a-zA-Z\d]+)"

Proper regex that i couldn't reach

^http(?:s)?:\/\/(?:www\.|events\.)?(?:v\.)?(?:(?:youtube)|(?:youku))\.\w{2,}\/(?:(?:\d{4}\/[^\/]+\/api\/video-files\.php\?\w+=\w+|watch\?(?=[^?]*v=?\-?\w+)(?:[^\s?,^\&?]+)?)|(?:v_show\/id_(?:\w{10,})(?:\.html)?))$
I have the regex above, which matches the strings below:
http://v.youku.com/v_show/id_XODQxOTg0ODg0.html
http://www.youtube.com/watch?v=n66NLBbQ53w
http://events.youku.com/2014/misc/api/video-files.php?vid=XMjg3MzQ5NTg4
But I need the last form to accept everything that can follow 'vid='.
For example, this URL needs to match as well:
http://events.youku.com/2015/misc/api/video-files.php?vid=XMTI2MDEyNzQ0OA==
I've tried adding another | alternative for this part, such as \w+=\w+==, but it didn't work...
Can anyone please help me accept the '=' signs at the end of the string? I think \w+ doesn't match them...
https://regex101.com/r/mR5cU3/1
Here it is:
I just changed your regex by adding (?:==)? after the \w+=\w+ part:
^http(?:s)?://(?:www\.|events\.)?(?:v\.)?(?:(?:youtube)|(?:youku))\.\w{2,}/(?:(?:\d{4}/[^/]+/api/video-files\.php\?\w+=\w+(?:==)?|watch\?(?=[^?]*v=?-?\w+)(?:[^\s?,^\&?]+)?)|(?:v_show/id_(?:\w{10,})(?:\.html)?))$
Solution :
^http(?:s)?:\/\/(?:www\.|events\.)?(?:v\.)?(?:(?:youtube)|(?:youku))\.\w{2,}\/(?:(?:\d{4}\/[^\/]+\/api\/video-files\.php\?\w+=\w+(?:==)?|watch\?(?=[^?]*v=?\-?\w+)(?:[^\s?,^\&?]+)?)|(?:v_show\/id_(?:\w{10,})(?:\.html)?))$
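A quick way to check the pattern against the four URLs from the question, sketched in Python (the variable names are mine):

```python
import re

# The "Solution" pattern, copied as-is into a raw string.
SOLUTION = re.compile(
    r"^http(?:s)?:\/\/(?:www\.|events\.)?(?:v\.)?(?:(?:youtube)|(?:youku))"
    r"\.\w{2,}\/(?:(?:\d{4}\/[^\/]+\/api\/video-files\.php\?\w+=\w+(?:==)?"
    r"|watch\?(?=[^?]*v=?\-?\w+)(?:[^\s?,^\&?]+)?)"
    r"|(?:v_show\/id_(?:\w{10,})(?:\.html)?))$"
)

urls = [
    "http://v.youku.com/v_show/id_XODQxOTg0ODg0.html",
    "http://www.youtube.com/watch?v=n66NLBbQ53w",
    "http://events.youku.com/2014/misc/api/video-files.php?vid=XMjg3MzQ5NTg4",
    "http://events.youku.com/2015/misc/api/video-files.php?vid=XMTI2MDEyNzQ0OA==",
]

# True where the whole URL matches the pattern.
results = [bool(SOLUTION.match(u)) for u in urls]
print(results)
```

One caveat: (?:==)? accepts exactly zero or two trailing = signs. If the IDs are Base64 (an assumption on my part), a single = of padding is also possible, and ={0,2} would cover that case.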

Reg Ex Facebook

I am trying to extract some information from facebook using Regex. Here is a link with an example:
https://graph.facebook.com/210989592315921
I would like to know what regular expression would extract just the number of likes from this string.
I have tried for example this expression:
"likes":\s[0-9]$
Thank you in advance for any advice regarding this matter,
Mark
You should follow the "#Hope I helped" comment and use a JSON parser. You can't be sure the text will always be formatted the same way:
Are you always going to have a single space between : and the number?
By the way, here is the error you are looking for: your current regex matches a single digit, not a multi-digit number. You should use something like [0-9]+ and probably remove the $, which is not correct in your example because there is a comma after the number.
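To make both suggestions concrete, here is a small Python sketch. The payload string is invented for illustration (I don't know what the Graph API response actually contains beyond a "likes" field), so treat the field names other than likes and all the values as assumptions.

```python
import json
import re

# Made-up sample response; only the "likes" key is taken from the question.
payload = '{"id": "210989592315921", "name": "Example page", "likes": 1234, "category": "Demo"}'

# Preferred: parse the JSON and read the field directly.
likes_from_json = json.loads(payload)["likes"]

# Regex fallback: \s* tolerates any amount of whitespace, [0-9]+ matches
# a multi-digit number, and there is no $ because a comma follows.
m = re.search(r'"likes":\s*([0-9]+)', payload)
likes_from_regex = int(m.group(1)) if m else None

print(likes_from_json, likes_from_regex)
```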

Regex, how to select all items outside of selection group

I'm a Regex noob and am pretty sure I'm not going about this in the most efficient way - wanted to get some advice.
I have a Regex expression ((\w+\b.*?){100}){1} which selects the first 100 words of my string, the length of which varies.
What I want is to select the entire string except for the first 100 words.
Is there syntax I can add to my current expression to do this, or am I better off trying to directly select the rest of the text instead?
Also, if anyone has any good resources for improving my Regex knowledge, I'd be very appreciative. Thus far I've found http://gskinner.com/RegExr/ to be very helpful.
Thanks in advance!
If you use the pattern below, everything after the first 100 words is captured in group 3, which you can reference as $3.
This one will treat hyphenated words as one word.
(\w+(-\w+|\b).*?){100}(.*)
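Here is a rough Python sketch of using group 3, with the word count as a parameter so it can be tried on a short string. The function name text_after_words is made up, and re.DOTALL is my addition on the assumption that the text may span several lines.

```python
import re

def text_after_words(text, n=100):
    """Return whatever follows the first n words, or "" if there are fewer."""
    # Same pattern as above, with the word count substituted in.
    pattern = re.compile(r"(\w+(-\w+|\b).*?){%d}(.*)" % n, re.DOTALL)
    m = pattern.search(text)
    # Group 3 is the trailing (.*), i.e. everything after the n-th word.
    return m.group(3) if m else ""

sample = "one two well-known four five"
rest = text_after_words(sample, n=3)
print(repr(rest))
```

The result keeps the whitespace right after the last counted word, so you may want to strip it. Also, on long inputs with fewer than n words this pattern can backtrack a lot before giving up, so splitting on whitespace may be a cheaper option there.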
Regex training Here