Get only first match in Regex - regex

Given this string: hello"C07","73" (quotes included) I want to get "C07". I'm using (?:hello)|(?<=")(?<screen>[a-zA-Z0-9]+)?(?=") to try to do this. However, it consistently matches "73" as well. I've tried ...0-9]+){1}..., but that doesn't work either. I must be misunderstanding how this is supposed to work, but I can't figure out any other way.
How can I get just the first set of characters between quotes?
EDIT: Here's a link to show my problem.
EDIT: Ok, here's exactly what I'm trying to do:
Basically, what I'm trying to get is this: 1) a positive match on "hello", 2) a group named "screen" with, in this case, "C07" in it and 3) a group named "format" with, in this case, "73" in it.
Both the "C07" and "73" will vary. "hello" will always be the same. There may or may not be an extra comma between "hello" and the first double-quote.

For you initial question of how to stop after the first match either removing the global search, or searching from the start of the string would accomplish that.
For the latter question you can name your groups and just keep extending the pattern throughout the line(s).
hello"(?<screen>[^"]+)","(?<format>[^"]+)"
Demo: http://regex101.com/r/PBXe8l/1

Based on your regex example, why not:
^(?:hello)"([a-zA-Z\d]+)"
Regex Demo

Related

REGEX to match a pattern within a GET variable

I am looking for a pattern to match if a special character was used within a GET variable, for example, in this case, if the character '<' was used by a user at somepoint in the variable 'x'. Currently, I've created the following regex pattern:
^example\.php\?(.*)x=(.*)<(.*)&
So, for example, the following strings would be matched:
example.php?x=abc"def&y=ghi
example.php?x=abcde"f&y=ghi
But with the pattern that I'm currently using, every other GET variable that might be using "<" is also being matched, for example:
example.php?x=abcdef&y=ghi"&z=klm
How should I overcome this so that it matches everything up to the first appeareance of the '&' character (so that it only analyses the 'x' variable)?
I don't know if the question is understandable, if not, I'll try to provide further details.
It took me sometime but I finally discovered what should I do.
I couldn't parse the url and get the value for X because the regex is going to be on a SQL query (As far as I'm aware, that's no easy task).
The answer would be something in the light of:
example\.php\?(.*)x=[^&]*<
I hope that I can help someone

What is the correct regex pattern to use to clean up Google links in Vim?

As you know, Google links can be pretty unwieldy:
https://www.google.com/search?q=some+search+here&source=hp&newwindow=1&ei=A_23ssOllsUx&oq=some+se....
I have MANY Google links saved that I would like to clean up to make them look like so:
https://www.google.com/search?q=some+search+here
The only issue is that I cannot figure out the correct regex pattern for Vim to do this.
I figure it must be something like this:
:%s/&source=[^&].*//
:%s/&source=[^&].*[^&]//
:%s/&source=.*[^&]//
But none of these are working; they start at &source, and replace until the end of the line.
Also, the search?q=some+search+here can appear anywhere after the .com/, so I cannot rely on it being in the same place every time.
So, what is the correct Vim regex pattern to use in order to clean up these links?
Your example can easily be dealt with by using a very simple pattern:
:%s/&.*
because you want to keep everything that comes before the second parameter, which is marked by the first & in the string.
But, if the q parameter can be anywhere in the query string, as in:
https://www.google.com/search?source=hp&newwindow=1&q=some+search+here&ei=A_23ssOllsUx&oq=some+se....
then no amount of capturing or whatnot will be enough to cover every possible case with a single pattern, let alone a readable one. At this point, scripting is really the only reasonable approach, preferably with a language that understands URLs.
--- EDIT ---
Hmm, scratch that. The following seems to work across the board:
:%s#^\(https://www.google.com/search?\)\(.*\)\(q=.\{-}\)&.*#\1\3
We use # as separator because of the many / in a typical URL.
We capture a first group, up to and including the ? that marks the beginning of the query string.
We match whatever comes between the ? and the first occurrence of q= without capturing it.
We capture a second group, the q parameter, up to and excluding the next &.
We replace the whole thing with the first capture group followed by the second capture group.

Matching within matches by extending an existing Regex

I'm trying to see if its possible to extend an existing arbitrary regex by prepending or appending another regex to match within matches.
Take the following example:
The original regex is cat|car|bat so matching output is
cat
car
bat
I want to add to this regex and output only matches that start with 'ca',
cat
car
I specifically don't want to interpret a whole regex, which could be quite a long operation and then change its internal content to match produce the output as in:
^ca[tr]
or run the original regex and then the second one over the results. I'm taking the original regex as an argument in python but want to 'prefilter' the matches by adding the additional code.
This is probably a slight abuse of regex, but I'm still interested if it's possible. I have tried what I know of subgroups and the following examples but they're not giving me what I need.
Things I've tried:
^ca(cat|car|bat)
(?<=ca(cat|car|bat))
(?<=^ca(cat|car|bat))
It may not be possible but I'm interested in what any regex gurus think. I'm also interested if there is some way of doing this positionally if the length of the initial output is known.
A slightly more realistic example of the inital query might be [a-z]{4} but if I create (?<=^ca([a-z]{4})) it matches against 6 letter strings starting with ca, not 4 letter.
Thanks for any solutions and/or opinions on it.
EDIT: See solution including #Nick's contribution below. The tool I was testing this with (exrex) seems to have a slight bug that, following the examples given, would create matches 6 characters long.
You were not far off with what you tried, only you don't need a lookbehind, but rather a lookahead assertion, and a parenthesis was misplaced. The right thing is: Put the original pattern in parentheses, and prepend (?=ca):
(?=ca)(cat|car|bat)
(?=ca)([a-z]{4})
In the second example (without | alternative), the parentheses around the original pattern wouldn't be required.
Ok, thanks to #Armali I've come to the conclusion that (?=ca)(^[a-z]{4}$) works (see https://regexr.com/3f4vo). However, I'm trying this with the great exrex tool to attempt to produce matching strings, and it's producing matches that are 6 characters long rather than 4. This may be a limitation of exrex rather than the regex, which seems to work in other cases.
See #Nick's comment.
I've also raised an issue on the exrex GitHub for this.

Regex: Non fixed-width look around assertions?

My college asked my to provide him with a regex that only matches if the test-string endswith
.rar or .part1.rar or part01.rar or part001.rar (and so on).
Should match:
foo.part1.rar
xyz.part01.rar
archive.rar
part3_is_the_best.rar
Should not match:
foo.r61
bar.part03.rar
test.sfv
I immediately came up with the regex \.(part0*1\.)?rar$. But this does match for bar.part03.rar.
Next I tried to add a negative look behind assertion: .*(?<!part\d*)\.(part\0*1\.)?rar$ That didn't work either, because look around assertions need to be fixed width.
Then I tried using a regex-conditional. But that didn't work either.
So my question: Can this even be solved by using pure regex?
An answer should either contain a link to regex101.com providing a working solution, or explain why it can't work by using pure regex.
You could use lookahead to verify the one case that fails your original regex (.rar with .part part that isn't 0*1) is discredited:
^(?!.*\.part0*[^1]\.rar$).*\.(part0*1\.)?rar$
See it in action
This is an old question, but here's another approach:
(?:\.part0*1\.rar|^(?<!\.)\w+\.rar)$
The idea is to match either:
A string that ends with .part0*1.rar (ie foo.part01.rar, foo.part1.rar, bar.part001.rar), OR
A string that ends with .rar and doesn't contain any other dots (.) before that.
Works on all your test cases, plus your extra foo.part19.rar.
https://regex101.com/r/EyHhmo/2

RegEx: Match Mr. Ms. etc in a "Title" Database field

I need to build a RegEx expression which gets its text strings from the Title field of my Database. I.e. the complete strings being searched are: Mr. or Ms. or Dr. or Sr. etc.
Unfortunately this field was a free field and anything could be written into it. e.g.: M. ; A ; CFO etc.
The expression needs to match on everything except: Mr. ; Ms. ; Dr. ; Sr. (NOTE: The list is a bit longer but for simplicity I keep it short.)
WHAT I HAVE TRIED SO FAR:
This is what I am using successfully on on another field:
^(?!(VIP)$).* (This will match every string except "VIP")
I rewrote that expression to look like this:
^(?!(Mr.|Ms.|Dr.|Sr.)$).*
Unfortunately this did not work. I assume this is because because of the "." (dot) is a reserved symbol in RegEx and needs special handling.
I also tried:
^(?!(Mr\.|Ms\.|Dr\.|Sr\.)$).*
But no luck as well.
I looked around in the forum and tested some other solutions but could not find any which works for me.
I would like to know how I can build my formula to search the complete (short) string and matches everything except "Mr." etc. Any help is appreciated!
Note: My Question might seem unusual and seems to have many open ends and possible errors. However the rest of my application is handling those open ends. Please trust me with this.
If you want your string simply to not start with one of those prefixes, then do this:
^(?!([MDS]r|Ms)\.).*$
The above simply ensures that the beginning of the string (^) is not followed by one of your listed prefixes. (You shouldn't even need the .*$ but this is in case you're using some engine that requires a complete match.)
If you want your string to not have those prefixes anywhere, then do:
^(.(?!([MDS]r|Ms)\.))*$
The above ensures that every character (.) is not followed by one of your listed prefixes, to the end (so the $ is necessary in this one).
I just read that your list of prefixes may be longer, so let me expand for you to add:
^(.(?!(Mr|Ms|Dr|Sr)\.))*$
You say entirely of the prefixes? Then just do this:
^(?!Mr|Ms|Dr|Sr)\.$
And if you want to make the dot conditional:
^(?!Mr|Ms|Dr|Sr)\.?$
^
Through this | we can define any number prefix pattern which we gonna match with string.
var pattern = /^(Mrs.|Mr.|Ms.|Dr.|Er.).?[A-z]$/;
var str = "Mrs.Panchal";
console.log(str.match(pattern));
this may do it
/(?!.*?(?:^|\W)(?:(?:Dr|Mr|Mrs|Ms|Sr|Jr)\.?|Miss|Phd|\+|&)(?:\W|$))^.*$/i
from that page I mentioned
Rather than trying to construct a regex that matches anything except Mr., Ms., etc., it would be easier (if your application allows it) to write a regex that matches only those strings:
/^(Mr|Ms|Dr|Sr)\.$/
and just swap the logic for handling matching vs non-matching strings.
re.sub(r'^([MmDdSs][RSrs]{1,2}|[Mm]iss)\.{0,1} ','',name)