Why is this regular expression matching so much? - regex

I am trying to use http://www.regexr.com/ to create a regular expression.
Basically I am looking to replace something that matches <Openings>any other tags/text</Openings>
<Openings><opening><item><x>3</x><y>3</y><width>10.5</width><height>13.5</height><type>rectangle</type><clipX>0</clipX><clipY>0</clipY><imgsrc></imgsrc></item></opening></Openings>
I started with ([\<Openings\>])\w+ (http://regexr.com/393mv ) but it seems to be matching too many things. Right now that regular expression should only match <Openings>.

A regex that matches the whole Openings element is:
<Openings>.*?<\/Openings>
To capture the contents inside the Openings element, use a group:
<Openings>(.*?)<\/Openings>

([\<Openings\>])\w+
The square brackets define a character class, which means "match any one of the characters listed inside". Use a plain group instead:
(\<Openings\>)\w+
which matches specifically "<Openings>" plus one or more word characters.
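To see the difference concretely, here is a minimal sketch in Python's re module (an assumption on my part; the question used regexr, which runs JavaScript regexes). The sample string is a hypothetical, shortened version of the XML in the question, and the backslashes before < and > are dropped because they aren't needed here.

```python
import re

# Hypothetical sample, shortened from the XML in the question.
xml = "<Openings><width>10.5</width></Openings><Other>keep</Other>"

# Character class: ONE character out of < O p e n i g s >, followed by \w+.
# This hits all over the string, including fragments such as "idth".
class_hits = [m.group(0) for m in re.finditer(r"([<Openings>])\w+", xml)]

# Literal text: only the opening tag itself.
literal_hits = re.findall(r"<Openings>", xml)

# Whole element, matched lazily; group 1 holds the contents.
element = re.search(r"<Openings>(.*?)</Openings>", xml, re.DOTALL)

# Replacing the whole element with nothing.
replaced = re.sub(r"<Openings>.*?</Openings>", "", xml, flags=re.DOTALL)

print(class_hits)
print(literal_hits)
print(element.group(1))
print(replaced)
```

re.DOTALL is only there so that . also crosses line breaks if the real XML spans several lines.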

Related

Regex Group Capture, how to stop before next word

I have the following regular expression:
Defaults(.*)Class=\"(?<class>.*)\"(.*)StorePath=\"(?<storePath>.*)\"
And the following string:
Defaults Class="Class name here" StorePath="Any store path here" SqlTable="SqlTableName"
I'm trying to achieve the following:
class Class name here
storePath Any store path here
But, what I'm getting as a result is:
class Class name here
storePath Any store path here SqlTable="SqlTableName"
How to stop before the Sqltable text?
The language is C# and the regex engine is the built in for .NET framework.
Thanks a lot!
The solution proposed by #ahmed-abdelhameed solves the problem, I forgot the non-greedy.
Defaults(.*)Class=\"(?<class>.*)\"(.*)StorePath=\"(?<storePath>.*?)\"
Thanks!
In the storePath group, you're matching any character zero or more times with a greedy quantifier. A greedy match takes as many characters as possible, so it keeps going until it reaches the last occurrence of ".
What you need to do is convert the greedy match into a lazy one by replacing .* with .*?. A lazy match takes as few characters as possible, so in your case it keeps matching characters only until it reaches the first occurrence of ".
Simply replace your regex with:
Defaults(.*)Class=\"(?<class>.*)\"(.*)StorePath=\"(?<storePath>.*?)\"
References:
Laziness Instead of Greediness.
What do 'lazy' and 'greedy' mean in the context of regular expressions?
A little easier to read:
Class="(.+?)".+?StorePath="(.+?)"
The .+? is a non-greedy (lazy) quantifier: it matches as little as possible.
That causes each group to capture only up to the next ".
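For anyone who wants to compare the greedy and lazy versions side by side, here is a rough sketch. It uses Python rather than the asker's C#, so the named groups are written (?P<name>...) instead of .NET's (?<name>...), and the group name cls is my own substitute for class.

```python
import re

line = ('Defaults Class="Class name here" '
        'StorePath="Any store path here" SqlTable="SqlTableName"')

# Original pattern: the last .* is greedy and runs to the final quote.
greedy = re.search(
    r'Defaults(.*)Class="(?P<cls>.*)"(.*)StorePath="(?P<storePath>.*)"', line)

# Fixed pattern: .*? stops at the first closing quote after StorePath=".
lazy = re.search(
    r'Defaults(.*)Class="(?P<cls>.*)"(.*)StorePath="(?P<storePath>.*?)"', line)

# The shorter pattern from the last answer, using numbered groups.
short = re.search(r'Class="(.+?)".+?StorePath="(.+?)"', line)

print(greedy.group("storePath"))
print(lazy.group("cls"), "|", lazy.group("storePath"))
print(short.group(1), "|", short.group(2))
```

The greedy version prints the store path together with the trailing SqlTable text, while the other two stop at the closing quote.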

Regular Expression Words stuck together

Is there a way to write regular expressions to stop right before a particular word or characters?
For example, I have a text like:
Advisor:HarrisTeamTeamRole
So I want to write a regular expression that makes the advisor name dynamic, but only capture Harris. How do I write a regular expression to stop right before Team?
You could use a lookbehind and lookahead like this:
(?<=Advisor:).*?(?=Team)
This matches only the text between "Advisor:" and the first "Team"; neither delimiter is part of the match. It requires a regex engine that supports lookbehind. If yours doesn't, you'll have to use a capture group instead, which could be as simple as:
Advisor:(.*?)Team
and then just get the capture group #1
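A small sketch of both variants, assuming Python's re module (which supports fixed-width lookbehind, so "Advisor:" is fine):

```python
import re

text = "Advisor:HarrisTeamTeamRole"

# Lookaround version: the whole match is just the name.
with_lookaround = re.search(r"(?<=Advisor:).*?(?=Team)", text).group(0)

# Capture-group version: the name is in group 1.
with_group = re.search(r"Advisor:(.*?)Team", text).group(1)

print(with_lookaround, with_group)
```

Because .*? is lazy, both stop at the first "Team" even though the word appears twice.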
Try this regular expression:
:([A-Z][a-z]*)
This one captures only the first word after the colon, as long as the text is in CamelCase. The next word doesn't have to be Team; it works for Advisor:HarrisNetworkSomething as well.
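A quick check of that pattern, sketched in Python; the second input is the variant mentioned above:

```python
import re

pattern = re.compile(r":([A-Z][a-z]*)")

# One uppercase letter followed by lowercase letters, so the capture
# ends right before the next capital.
first = pattern.search("Advisor:HarrisTeamTeamRole").group(1)
second = pattern.search("Advisor:HarrisNetworkSomething").group(1)

print(first, second)
```

Note the assumption baked into the pattern: a name with an inner capital (say, McKay) would be cut short.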
You can also match lazily and read the result from capture group 1:
^Advisor:(.*?)Team

how to get individual parameters using regular expression

Here is a string, func("abc", "def", "ghi"). I want to get each individual parameter using the regular expression '"\"".*"\""', but it doesn't work: it matches all of the arguments at once. How can I get the individual parameters using a regular expression?
this one may give what you want:
"[^"]*"
The above is just the regex; you may need to quote or escape it for use in your programming language.
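As an illustration, here is how the greedy, negated-class, and lazy patterns behave on the string from the question. This is a sketch in Python (the question doesn't say which tool is being used); single-quoted raw strings avoid having to escape the double quotes.

```python
import re

call = 'func("abc", "def", "ghi")'

# Greedy: one match running from the first quote to the last.
greedy = re.findall(r'".*"', call)

# Negated class: each match stops at the next quote.
negated = re.findall(r'"[^"]*"', call)

# Lazy quantifier: same result on this input.
lazy = re.findall(r'".*?"', call)

# With a group, findall returns the values without the quotes.
values = re.findall(r'"([^"]*)"', call)

print(greedy)
print(negated)
print(lazy)
print(values)
```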
.* is greedy, i.e. it matches as much as possible.
You have to make .* match lazily by using .*?
.*? is lazy, i.e. it matches as little as possible.
So your regex (quoted as needed for your language) would be
".*?"

How to get first match of string by Regular Expression?

I have the following text string:
$ABCD(file="somefile.txt")$' />Some more text followed by a dollar like this one)$. Some more random text
I am trying to match the $ABCD(file="somefile.txt")$ part of the string using a regular expression.
I am using this (?=[$]ABCD[(]file=).*(?<=[)][$]) regular expression pattern to make the intended match. It's not working as expected because I am getting a match all the way to the second )$ in the string.
For example, the match will be as follows:
$ABCD(file="somefile.txt")$' />Some more text followed by a dollar like this one)$
How should I modify the pattern to match to the end of the first occurrence of the )$?
Here is a good online regular expression engine tester:
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
try appending a ? to the greedy *
(?=[$]ABCD[(]file=).*?(?<=[)][$])
Lazy quantification
The standard quantifiers in regular expressions are greedy, meaning they match as much as they can. Modern regular expression tools allow a quantifier to be specified as lazy (also known as non-greedy, reluctant, minimal, or ungreedy) by putting a question mark after the quantifier.
You could just use this:
\$ABCD\(file="[a-z.]+"\)\$
to get $ABCD(file="somefile.txt")$.
Your problem was the .* bit: it was too general and thus matched everything up to the last $.
I would advise you to use the second quote to define the end of the searched pattern: [^"]* will match anything except ".
So the pattern for the file name would be: \$ABCD\(file="([^"]*)
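Here is a rough comparison of the patterns from this thread, written in Python rather than .NET (an assumption; the lookaround syntax used here is the same in both). The last pattern is my own completion of the [^"]* idea with the closing "\)\$ added.

```python
import re

text = ('$ABCD(file="somefile.txt")$'
        "' />Some more text followed by a dollar like this one)$. "
        "Some more random text")

# Original: greedy .* runs to the last ")$" in the string.
greedy = re.search(r'(?=[$]ABCD[(]file=).*(?<=[)][$])', text).group(0)

# Lazy .*? stops at the first position preceded by ")$".
lazy = re.search(r'(?=[$]ABCD[(]file=).*?(?<=[)][$])', text).group(0)

# Explicit pattern: [^"]* cannot run past the closing quote.
explicit = re.search(r'\$ABCD\(file="([^"]*)"\)\$', text)

print(greedy)
print(lazy)
print(explicit.group(0), "|", explicit.group(1))
```

The explicit version also hands you the file name in group 1, which the lookaround versions don't.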

How I match an HTML attribute using Notepad++ regular expression search?

Here is my text:
<span class="c1">Testing "this string"</span>
and I want to end up with this:
<span>Testing "this string"</span>
so I tried to use this regex in Notepad++ to replace with nothing:
class=".*"
but that matches this:
class="c1">Testing "this string"
How do I stop that match after one instance of "?
By default, regular expression quantifiers are greedy, so .* will match as much as it possibly can, which in your case is c1">Testing "this string. In general, you have two ways of getting around this:
Use a nongreedy (or lazy) modifier (.*?), which will match as little as possible (in your case, just c1). Older versions of Notepad++ don't support lazy modifiers, though.
Specify exactly what you want to match with class="[^"]*", where [^"]* matches any run of characters that aren't a quote. In general, this is also the more efficient solution.
class=".*?"
Will make the * lazy.
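Outside Notepad++, the same three patterns can be compared with a short Python sketch. The leading space in each pattern is my addition, so that the result is exactly <span> with no stray space before the >.

```python
import re

html = '<span class="c1">Testing "this string"</span>'

# Greedy: swallows everything up to the last quote on the line.
greedy = re.sub(r' class=".*"', "", html)

# Negated class: stops at the first closing quote.
negated = re.sub(r' class="[^"]*"', "", html)

# Lazy quantifier: same result here.
lazy = re.sub(r' class=".*?"', "", html)

print(greedy)
print(negated)
print(lazy)
```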