Regular expression to parse html links - regex

My HTML has snippets like the one below all over the place:
<li><label for="summary">Summary:</label></li>
<li class="in">
<textarea class="ta" id="summary" name="summary" rows="4" cols="10" tabindex="4">
${fieldValue(bean: book, field: 'summary')}</textarea>
<a href="#" class="tt">
<img src="<g:createLinkTo dir='images/buttons/' file='icon.gif'/>" alt="Help icon for the summary field">
<span class="tooltip">
<span class="top"></span>
<span class="middle">Help text for summary</span>
<span class="bottom"></span>
</span>
</a>
</li>
I want to extract the alt value and the text between XXXX, and replace the a tag with the code below.
This is my attempt at the regex:
<a href="#" class="tt">.*alt="(.*)".*<span class="middle">(.*)<\/span><\/a>
Replacement, using the captured groups:
<ebs:cssToolTip alt="$1" text="$2"/>
I tried it on http://rubular.com/ and it doesn't quite work. Any suggestions?

You may want to make sure your regex isn't greedily consuming characters: use ".*?" rather than plain ".*".

What do you mean by "it does not quite work"? How does it fail?
A suggestion (I haven't tested your regex): note that * is a greedy operator, so .* is rarely a good idea, because it may match much more than you intended.
Try:
<a href="#" class="tt">.*alt="([^"]*)".*<span class="middle">([^"]*)<\/span><\/a>
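A minimal sketch of the greedy point in Python's `re` module (Python is used only because this thread mixes Ruby, PHP, Notepad++ and TextMate flavors; greedy `*`, lazy `*?` and negated character classes behave the same way in all of them). The input string is made up for illustration: two middle spans on one line.

```python
import re

# Two tooltip bodies on one line, to show how far each pattern reaches.
text = '<span class="middle">one</span> <span class="middle">two</span>'

# Greedy: ".*" runs to the end of the input, then backtracks only as far as the LAST "</span>".
greedy = re.findall(r'<span class="middle">(.*)</span>', text)
# Lazy: ".*?" stops at the FIRST "</span>" that lets the rest of the pattern match.
lazy = re.findall(r'<span class="middle">(.*?)</span>', text)
# Negated class: "[^<]*" simply cannot run past the next "<".
negated = re.findall(r'<span class="middle">([^<]*)</span>', text)

print(greedy)   # ['one</span> <span class="middle">two']
print(lazy)     # ['one', 'two']
print(negated)  # ['one', 'two']
```

The negated class is the approach the later answers take (`[^"]*`, `[^<]*`): it cannot overshoot its boundary character, but unlike `.*?` it also cannot match content that contains that character.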

I think I solved it, using an idea from another Stack Overflow question:
<a href="#" class="tt">.*alt="([^"]*)".*<span class="middle">([^<]*).*<\/a>
This seems to work on http://rubular.com/.

Here you go:
http://rubular.com/regexes/8434
You were facing two potential problems. First, without the /m option, '.' does not match newline characters (in Ruby, /m is the flag that lets '.' cross line breaks). Second, you were using greedy matching; switching to the lazy '.*?' fixes that.
/<a href="#" class="tt">.*?alt="([^"]*)">.*?<span class="middle">(.*?)<\/span>/m
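The same fix, sketched in Python: `re.DOTALL` plays the role of Ruby's `/m` (letting `.` cross newlines), and every gap uses `.*?`. As an assumption beyond the answer's regex, the pattern is extended with `.*?</a>` so the whole a tag is consumed and replaced, as the question asks; `to_tooltip` is a hypothetical helper name, and `html` is the tooltip anchor from the question.

```python
import re

# The tooltip anchor from the question.
html = """<a href="#" class="tt">
<img src="<g:createLinkTo dir='images/buttons/' file='icon.gif'/>" alt="Help icon for the summary field">
<span class="tooltip">
<span class="top"></span>
<span class="middle">Help text for summary</span>
<span class="bottom"></span>
</span>
</a>"""

# re.DOTALL lets "." match newlines (like Ruby's /m); each ".*?" stops as early as it can.
# The trailing ".*?</a>" is an addition so the whole anchor is consumed and replaced.
pattern = re.compile(
    r'<a href="#" class="tt">.*?alt="([^"]*)">.*?<span class="middle">(.*?)</span>.*?</a>',
    re.DOTALL,
)

def to_tooltip(text: str) -> str:
    # Hypothetical helper: \1 is the alt text and \2 the help text ($1 and $2 in the question).
    return pattern.sub(r'<ebs:cssToolTip alt="\1" text="\2"/>', text)

print(to_tooltip(html))
# <ebs:cssToolTip alt="Help icon for the summary field" text="Help text for summary"/>
```

Without `re.DOTALL` the same pattern finds nothing in this input, because `.` stops at the newline right after the opening a tag.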

Related

Textmate Find regex, Replace wild

In TextMate 1.5 I can use the regex syntax (.*) to find both lines in the use case below:
<span class="class1"></span>
<span class="class2"></span>
Now I want to append more code to each of them, so my find query is span class="(.*)" and my replace query is span class="(.*)" aria-hidden="true", which I had hoped would result in this:
<span class="class1" aria-hidden="true"></span>
<span class="class2" aria-hidden="true"></span>
but it actually resulted in this:
<span class="(.*)" aria-hidden="true"></span>
<span class="(.*)" aria-hidden="true"></span>
Using find/replace (not column selection, which would work for this example but not for my actual situation), is it possible to keep the text matched by the regex in the replacement, using a placeholder or something similar?
Change your replace query to:
span class="$1" aria-hidden="true"
$1 refers to the text captured by group 1.
(<span class="[^"]*")
Try this, replacing with $1 aria-hidden="true". See the demo:
http://regex101.com/r/wQ1oW3/22
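A quick Python check of both the failure and the fix. Python's `re.sub` writes the group reference as `\1` (the answers above use `$1` for TextMate); any other replacement text is copied literally, which is why `(.*)` reappears verbatim in the first result.

```python
import re

lines = '<span class="class1"></span>\n<span class="class2"></span>'

# The question's attempt: "(.*)" in the replacement is plain text, so it is copied as-is.
literal = re.sub(r'span class="(.*)"', 'span class="(.*)" aria-hidden="true"', lines)

# The fix: refer back to group 1. Python spells it \1 where TextMate uses $1.
fixed = re.sub(r'span class="(.*)"', r'span class="\1" aria-hidden="true"', lines)

print(literal)
print(fixed)
```

Without `re.DOTALL`, `.` stops at the line break, so each line is matched and rewritten separately.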

regular expression to add brackets before and after a repeated text

I have the following line, and I want to add brackets before and after its text:
from:
<span class="Footnote">    Matt. xx. 19.</span>
or:
<span class="Footnote">    1 Thess. i. 7.</span>
and other verse references with different values (in other words, anything between the > and <).
to:
<span class="Footnote"> (Matt. xx. 19.)</span>
and so on: take whatever is between the > and < and wrap it in ().
P.S. I use Notepad++ to search and replace.
Edit:
The first three replies work well, even for text that isn't in the same verse format, which is helpful. However, I noticed some cases in the code that don't get changed, for example when the span contains other tags:
<span class="Footnote">    [See <i>Dan</i>, note 12, p. 26, <i>infra</i>.  “Eternal” ="long.”]</span>
or when the content is split across more than one line:
<span class="Footnote">    some text
more text
</span>
Thanks in advance,
Find what:
Footnote">\s*([^>]+)\s*<
Replace with:
Footnote">(\1)<
Search for
<span class="Footnote">\s*([^<>]*?)\s*</span>
and replace with
<span class="Footnote">(\1)</span>
This changes
<span class="Footnote"> Matt. xx. 19. </span>
into
<span class="Footnote">(Matt. xx. 19.)</span>
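Try this (I couldn't test it; my family wants me off the computer at Christmas breakfast):
// Single quotes stop PHP from reading "\1" as an octal escape; "~" avoids escaping the "/" in </span>.
$subject = preg_replace('~Footnote">\s*([^>]*?)</span>~i', 'Footnote">($1)</span>', $subject);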
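For the cases in the edit (footnotes containing tags, or spanning several lines), one option, sketched here in Python, is to swap the negated class for a lazy `.*?` and let `.` match newlines (`re.DOTALL`). This assumes a Footnote span never contains a nested `<span>`, because the lazy match stops at the first `</span>`. `bracket_footnotes` is a hypothetical helper name, and the second sample is the edit's example, shortened. In Notepad++ the corresponding setting is, I believe, the ". matches newline" checkbox in regular-expression mode, with `\1` in the replacement.

```python
import re

# Lazy ".*?" stops at the first "</span>"; re.DOTALL lets it cross line breaks.
# Assumption: a Footnote span never contains a nested <span>.
FOOTNOTE = re.compile(r'<span class="Footnote">\s*(.*?)\s*</span>', re.DOTALL)

def bracket_footnotes(text: str) -> str:
    # Hypothetical helper: \1 is the footnote body with surrounding whitespace trimmed.
    return FOOTNOTE.sub(r'<span class="Footnote">(\1)</span>', text)

samples = [
    '<span class="Footnote">    Matt. xx. 19.</span>',
    '<span class="Footnote">    [See <i>Dan</i>, note 12, p. 26, <i>infra</i>.]</span>',
    '<span class="Footnote">    some text\nmore text\n</span>',
]
for s in samples:
    print(bracket_footnotes(s))
```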

regexp, help with assertions

I have the following string:
<a name="subhd_182"></a>
<a name="st_394"></a>
<a name="st_395"></a>
<a name="qn_494"></a>
<a name="st_495"></a>
<a name="qn_594"></a>
<a name="st_595"></a>
<a name="subhd_282"></a>
<a name="qn_694"></a>
<a name="st_695"></a>
<a name="qn_794"></a>
<a name="st_795"></a>
<a name="qn_894"></a>
<a name="st_895"></a>
I want to replace every <a name="st_\d*"></a> with <a name="qn_\d*"></a> if it immediately follows <a name="subhd_\d*"></a>.
I use this regex %(.*<a name="subhd_.*)(?=<a name="st(?!<a name="qn))(<a name=")st(.*)%sU and replace with $1$2qn$3, but it also replaces the second case.
I'm assuming you only want to match the name after the first subhd row above, but not the second, since the row after the first one is an "st_" and the row after the second one is a "qn_".
Try:
(<a name="subhd_\d+">\s*<\/a>\s*<a name=")st(_\d+">)
and replace with $1qn$2. Note that I have assumed you meant "it immediately follows" literally.
I don't really understand why you're throwing the lookahead in, unless the actual rule you're trying to implement is more complicated than you've stated.
Try: %(<a name="subhd_\d+"></a>\n<a name=")st(.*)%sU and replace with $1qn$2. As a side note, I don't see what the U modifier does for you here. Also, you may need to adjust the \n newline matcher to suit your operating system's line endings.
I have found RegExr a really useful tool for regular expressions.
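The capture-group answer, sketched in Python against the markup from the question. Capture groups are used rather than a lookbehind because Python's `re` only accepts fixed-width lookbehind, and `\d+` is variable-width. The `\s*` between the anchors also absorbs `\n` or `\r\n`, which sidesteps the line-ending concern raised above.

```python
import re

# The anchors from the question, one per line.
markup = "\n".join([
    '<a name="subhd_182"></a>',
    '<a name="st_394"></a>',
    '<a name="st_395"></a>',
    '<a name="qn_494"></a>',
    '<a name="st_495"></a>',
    '<a name="qn_594"></a>',
    '<a name="st_595"></a>',
    '<a name="subhd_282"></a>',
    '<a name="qn_694"></a>',
    '<a name="st_695"></a>',
    '<a name="qn_794"></a>',
    '<a name="st_795"></a>',
    '<a name="qn_894"></a>',
    '<a name="st_895"></a>',
])

# Group 1: the subhd anchor, any whitespace after it, and the start of the next anchor.
# Group 2: the rest of that next anchor's opening tag. Only the "st" between them changes,
# so an "st_" anchor is rewritten only when it comes directly after a subhd anchor.
SUBHD_THEN_ST = re.compile(r'(<a name="subhd_\d+">\s*</a>\s*<a name=")st(_\d+">)')

result = SUBHD_THEN_ST.sub(r'\1qn\2', markup)
print(result)
```

Only `st_394` becomes `qn_394`; the anchor after `subhd_282` is already `qn_694`, so nothing else matches.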

Reg exp: string NOT in pattern

I'm having trouble constructing a regex. I think I should use lookahead/lookbehind, but I can't get it to work.
I want a regex that matches all HTML tags that do NOT contain a given string ('rabbit').
For example, the following tags should be matched:
<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>
But not the following:
<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> </li rabbit=abcd hippo=9876> <div hello=rabbit>
(My next step is to make a substitution so that the word rabbit is inserted into the tags, but that will hopefully be easy.)
(I'm using PHP 5's preg_replace.)
Thanks.
I guess you're matching the HTML tags with a regex something like this:
/<[^>]*>/
You can add a negative look-ahead assertion in there to assert that "rabbit" cannot be found in the tag:
/<(?![^>]*rabbit)[^>]*>/
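A Python check of the negative-lookahead pattern against the question's examples. PCRE, which PHP's preg_replace uses, supports the same `(?!...)` syntax, so the pattern should carry over unchanged apart from the `/` delimiters.

```python
import re

# Negative lookahead right after "<": fail if "rabbit" appears before the tag's closing ">".
TAG_WITHOUT_RABBIT = re.compile(r'<(?![^>]*rabbit)[^>]*>')

should_match = ['<a XXX>', '<span yyy>', '</div x zz>', '</li qwerty=ab cd>', '<div hello=stackoverflow>']
should_not_match = ['<a XXrabbitX>', '<span yyyrabbit>', '</div xrabbitzz>',
                    '</li rabbit=abcd hippo=9876>', '<div hello=rabbit>']

for tag in should_match + should_not_match:
    print(tag, bool(TAG_WITHOUT_RABBIT.fullmatch(tag)))

# On mixed input, findall returns only the rabbit-free tags.
print(TAG_WITHOUT_RABBIT.findall('<a XXX> <a XXrabbitX> <div hello=stackoverflow> <div hello=rabbit>'))
```

Because the lookahead is bounded by `[^>]*`, it only inspects the current tag, so a "rabbit" in a later tag does not block an earlier match.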

replacing html link with regex and yahoo pipes

My question is similar to the one here:
Regular expression on Yahoo! pipes
I have my Gmail status hooked up to Twitter through FriendFeed, but unfortunately they truncate the link text, and my links aren't working once they reach Twitter. I need to take this:
<div style="margin-top:2px;color:black;">/good jquery tips
<a rel="nofollow" style="text-decoration:none;color:#00c;" target="_blank" href="http://james.padolsey.com/javascript/things-you-may-not-know-about-jquery/" title="http://james.padolsey.com/javascript/things-you-may-not-know-about-jquery/">
http://james.padolsey.com/javascr...
</a>
</div>
and replace the truncated link text with the href attribute, so it looks like this:
<div style="margin-top:2px;color:black;">/good jquery tips
<a rel="nofollow" style="text-decoration:none;color:#00c;" target="_blank" href="http://james.padolsey.com/javascript/things-you-may-not-know-about-jquery/" title="http://james.padolsey.com/javascript/things-you-may-not-know-about-jquery/">
http://james.padolsey.com/javascript/things-you-may-not-know-about-jquery/
</a>
</div>
Thanks for the help!
Anthony's warning stands: you can't parse HTML safely with regexes, so please weigh the risks involved if you choose to use Pipes.
Assuming you want to replace only this specific structure, and expect it to break if anything in the source changes subtly, the following will work well enough for your purposes:
replace:
(<a\s[^>]*href=")(.*?)("[^>]*>).*?(</a>)
with:
$1$2$3$2$4
(Using s and i options)
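The same pattern run through Python's `re.sub` on the sample from the question, as a sketch of what the replacement does. `re.S` and `re.I` correspond to the `s` and `i` options, and the `$1` to `$4` references become `\1` to `\4` in Python.

```python
import re

# The sample from the question.
html = """<div style="margin-top:2px;color:black;">/good jquery tips
<a rel="nofollow" style="text-decoration:none;color:#00c;" target="_blank" href="http://james.padolsey.com/javascript/things-you-may-not-know-about-jquery/" title="http://james.padolsey.com/javascript/things-you-may-not-know-about-jquery/">
http://james.padolsey.com/javascr...
</a>
</div>"""

# Same pattern as the answer. re.S lets ".*?" cross the newlines around the link text;
# re.I makes tag and attribute names case-insensitive.
LINK = re.compile(r'(<a\s[^>]*href=")(.*?)("[^>]*>).*?(</a>)', re.S | re.I)

# \2 (the href value) is written twice: back into the attribute, then as the link text.
expanded = LINK.sub(r'\1\2\3\2\4', html)
print(expanded)
```

Note that the newlines that surrounded the truncated text are consumed too, so the full URL ends up directly between the opening tag and `</a>`.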