regexp, help with assertions - regex

I have the following string:
<a name="subhd_182"></a>
<a name="st_394"></a>
<a name="st_395"></a>
<a name="qn_494"></a>
<a name="st_495"></a>
<a name="qn_594"></a>
<a name="st_595"></a>
<a name="subhd_282"></a>
<a name="qn_694"></a>
<a name="st_695"></a>
<a name="qn_794"></a>
<a name="st_795"></a>
<a name="qn_894"></a>
<a name="st_895"></a>
I want to replace every <a name="st_\d*"></a> with <a name="qn_\d*"></a>, but only when it immediately follows <a name="subhd_\d*"></a>.
I use the regex %(.*<a name="subhd_.*)(?=<a name="st(?!<a name="qn))(<a name=")st(.*)%sU and replace with $1$2qn$3, but it also makes a replacement in the second case (after subhd_282), which I don't want.

I'm assuming you only want to match the name right after the first subhd row above, but not after the second, since the first is followed by an "st_" and the second by a "qn_".
Try:
(<a name="subhd_\d+">\s*<\/a>\s*<a name=")st(_\d+">)
where you would replace with $1qn$2. Note that here I have assumed that you were quite literal when you said it "follows immediately".
I don't really understand why you're throwing the lookahead in, unless the actual rule you're trying to implement is more complicated than you've stated.
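For illustration, here is a minimal Python sketch of the pattern above applied to the sample string from the question. It assumes Python's `re` module rather than the asker's engine, so the replacement is written `\g<1>qn\g<2>` instead of `$1qn$2`; the variable names are made up for the example.

```python
import re

# The sample anchors from the question, one per line.
html = "\n".join([
    '<a name="subhd_182"></a>',
    '<a name="st_394"></a>',
    '<a name="st_395"></a>',
    '<a name="qn_494"></a>',
    '<a name="st_495"></a>',
    '<a name="qn_594"></a>',
    '<a name="st_595"></a>',
    '<a name="subhd_282"></a>',
    '<a name="qn_694"></a>',
    '<a name="st_695"></a>',
    '<a name="qn_794"></a>',
    '<a name="st_795"></a>',
    '<a name="qn_894"></a>',
    '<a name="st_895"></a>',
])

# Group 1: the subhd anchor plus the start of the next anchor.
# Group 2: the rest of the name. Only the literal "st" between them is swapped.
pattern = re.compile(r'(<a name="subhd_\d+">\s*</a>\s*<a name=")st(_\d+">)')

result, count = pattern.subn(r'\g<1>qn\g<2>', html)
print(count)
print(result)
```

Only the st_ anchor directly after subhd_182 is rewritten; the anchor after subhd_282 is already a qn_, so nothing matches there.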

Try: %(<a name="subhd_\d+"></a>\n<a name=")st(.*)%sU and replace with $1qn$2. As a side note, I don't really know what the U modifier does for you here. Also, you might want to change the \n newline matcher to suit the line endings of your operating system.
I have found RegExr a really useful tool for regular expressions.

Related

Capture specific first matches in regex

I have this text and want to capture each match of the letter 'ñ' under the html href attribute. I want it to match the 'ñ' in both niño.html and niña.html, but not the ones in Niño and Niña:
<a href='niño.html'>Niño</a> <a href='niña.html'>Niña</a>
I tried this but it also matches Niño:
ñ(.*?\.html'>)+?
When replacing with n\1, it gives:
<a href='nino.html'>Nino</a> <a href='niña.html'>Niña</a>
What I would want the text to look like is:
<a href='nino.html'>Niño</a> <a href='nina.html'>Niña</a>
How can I do this?
Try this, where the part between the ñ and .html is not allowed to contain a single quote:
ñ([^']*?\.html'>)+?
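As a quick check, the pattern above can be run in Python (an assumption; the question does not name its engine) with the same `n\1` replacement. Because `[^']` cannot cross the closing quote of the attribute, the ñ in the link text never reaches a following `.html'>`.

```python
import re

text = "<a href='niño.html'>Niño</a> <a href='niña.html'>Niña</a>"

# [^']*? stops the match from running past the end of the href value.
fixed = re.sub(r"ñ([^']*?\.html'>)+?", r"n\1", text)
print(fixed)
```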

Can't seem to capture newline+spaces in Regex

I know regexes aren't the best for web parsing, but I'm using it as an exercise.
I'm using Район:[^<>]*\n\s*<[^<>]*>\n\s*<a[^<>]*>([^<>]+)<\/a>
to try to match:
Район: </span>
<span class="company__contacts-item-text">
<a class="link" href="/moscow/top/marina-roscha/">Марьина роща</a>
I've been looking at it for a while but I don't know what I'm doing wrong. How can I capture something that has newlines and different URLs in the tags?
Try this regex (with the dot-all / s flag enabled, so that . also matches newlines):
Район:.+?<a[^>]+>(.+?)</a>
DEMO
https://regex101.com/r/wA4oH0/1
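A small Python sketch of the suggested pattern follows. It assumes the dot-all behaviour is switched on (`re.DOTALL` in Python), since the sample text spans several lines and `.` does not match a newline by default; without that flag the same pattern finds nothing here.

```python
import re

text = (
    'Район: </span>\n'
    '<span class="company__contacts-item-text">\n'
    '<a class="link" href="/moscow/top/marina-roscha/">Марьина роща</a>'
)

pattern = r'Район:.+?<a[^>]+>(.+?)</a>'

# With DOTALL, ".+?" may cross the line breaks between the tags.
with_dotall = re.search(pattern, text, re.DOTALL)
# Without it, "." stops at the first newline and the match fails.
without_dotall = re.search(pattern, text)

print(with_dotall.group(1))
print(without_dotall)
```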

Textmate Find regex, Replace wild

In textmate-1.5 I can use the regex syntax (.*) to find both lines in the below use case:
<span class="class1"></span>
<span class="class2"></span>
Now I want to append more code to each of them, so my find query is span class="(.*)" and my replace query is span class="(.*)" aria-hidden="true", which I had hoped would result in this:
<span class="class1" aria-hidden="true"></span>
<span class="class2" aria-hidden="true"></span>
but it actually resulted in this:
<span class="(.*)" aria-hidden="true"></span>
<span class="(.*)" aria-hidden="true"></span>
Using find/replace (not column selection, which would work for this example but not for the actual situation), is it possible to keep the text matched by the regex in the replacement, using some kind of placeholder?
Change your replace query to:
span class="$1" aria-hidden="true"
$1 refers to the characters captured by group 1.
(<span class="[^"]*")
Try this. Replace with $1 aria-hidden="true". See demo.
http://regex101.com/r/wQ1oW3/22
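The same idea, sketched in Python for illustration only: the capture group in the find pattern is reused in the replacement through a backreference. Python writes the backreference as `\1` where TextMate uses `$1`.

```python
import re

text = (
    '<span class="class1"></span>\n'
    '<span class="class2"></span>'
)

# (.*) captures the class value; \1 puts it back into the replacement.
result = re.sub(r'span class="(.*)"', r'span class="\1" aria-hidden="true"', text)
print(result)
```

Writing `(.*)` literally in the replacement, as in the question, inserts those characters as plain text, which is why the class names were lost.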

Reg exp: string NOT in pattern

I'm having trouble constructing a regexp. I think I should use lookahead/lookbehind, but I can't get it to work.
I want a regexp that matches all HTML tags that do NOT contain a given string ('rabbit').
For example, the following tags should be matched
<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>
But not the following
<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> </li rabbit=abcd hippo=9876> <div hello=rabbit>
(My next step is to make a substitution so that the word rabbit is inserted into the tags, but that will hopefully come easily.)
(I use PHP5-preg_replace.)
Thanks.
I guess you're matching the HTML tags with a regex something like this:
/<[^>]*>/
You can add a negative look-ahead assertion in there to assert that "rabbit" cannot be found in the tag:
/<(?![^>]*rabbit)[^>]*>/
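To see the lookahead at work, here is a minimal Python sketch run against the two sample lines from the question. Python is used only for illustration (the asker uses PHP's preg_replace), and the variable names are hypothetical.

```python
import re

should_match = "<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>"
should_not_match = ("<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> "
                    "</li rabbit=abcd hippo=9876> <div hello=rabbit>")

# (?![^>]*rabbit) fails the match if "rabbit" occurs before the closing ">".
pattern = re.compile(r'<(?![^>]*rabbit)[^>]*>')

matched = pattern.findall(should_match)
rejected = pattern.findall(should_not_match)
print(matched)
print(rejected)
```

The `[^>]*` inside the lookahead keeps the check within the current tag, so a "rabbit" in a later tag does not block an earlier one.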

Regular expression to parse html links

I have HTML with snippets like the one below all over it:
<li><label for="summary">Summary:</label></li>
<li class="in">
<textarea class="ta" id="summary" name="summary" rows="4" cols="10" tabindex="4">
${fieldValue(bean: book, field: 'summary')}</textarea>
<a href="#" class="tt">
<img src="<g:createLinkTo dir='images/buttons/' file='icon.gif'/>" alt="Help icon for the summary field">
<span class="tooltip">
<span class="top"></span>
<span class="middle">Help text for summary</span>
<span class="bottom"></span>
</span>
</a>
</li>
I want to pull out the alt value and the text between XXXX and replace the a tag with the code below.
This is my stab at the regex:
<a href="#" class="tt">.*alt="(.*)".*<span class="middle">(.*)<\/span><\/a>
Output with the callbacks
<ebs:cssToolTip alt="$1" text="$2"/>
I tried it out on http://rubular.com/ and it does not quite work. Any suggestions?
You may want to ensure your regexp isn't greedily picking up characters - use ".*?" rather than straight ".*".
What do you mean, "it does not quite work"? How does it fail?
A suggestion (not tested your regexp): note that * is a greedy operator, so .* is rarely a good idea because it may match a lot more than what you intended.
Try:
<a href="#" class="tt">.*alt="([^"]*)".*<span class="middle">([^"]*)<\/span><\/a>
I think I solved it with an idea from another Stack Overflow question:
<a href="#" class="tt">.*alt="([^"]*)".*<span class="middle">([^<]*).*<\/a>
This seems to work on the http://rubular.com/ site
Here you go:
http://rubular.com/regexes/8434
You were facing two potential problems. First, without the /m option (in Ruby, which Rubular uses), '.' will not match newline characters. Second, you were using greedy matching; switching to the lazy '*?' quantifier stops each '.*' from matching more than intended.
/<a href="#" class="tt">.*?alt="([^"]*)">.*?<span class="middle">(.*?)<\/span>/m
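For comparison, the final pattern can be sketched in Python, under the assumption that `re.DOTALL` plays the role of Ruby's /m here (both let `.` match a newline). The snippet is trimmed to the `<a>` element from the question, the `\/` escape is dropped because Python needs no delimiter escaping, and the `tooltip` string is only an illustration of the desired output tag.

```python
import re

snippet = """<a href="#" class="tt">
<img src="<g:createLinkTo dir='images/buttons/' file='icon.gif'/>" alt="Help icon for the summary field">
<span class="tooltip">
<span class="top"></span>
<span class="middle">Help text for summary</span>
<span class="bottom"></span>
</span>
</a>
"""

# Lazy ".*?" keeps each gap as short as possible; DOTALL lets it span lines.
pattern = re.compile(
    r'<a href="#" class="tt">.*?alt="([^"]*)">.*?<span class="middle">(.*?)</span>',
    re.DOTALL,
)

m = pattern.search(snippet)
alt_text, help_text = m.group(1), m.group(2)

# Build the replacement tag from the two captured values.
tooltip = '<ebs:cssToolTip alt="{}" text="{}"/>'.format(alt_text, help_text)
print(tooltip)
```

Note that this pattern stops at the closing tag of the "middle" span, so a full replacement of the `<a>` element would still need the pattern extended to the closing `</a>`.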