Regex to match a string if not followed by another string - regex

In Mediawiki via Replace extension (MariaDB 10.6) I want to match the string <span class="sense"><span class="bld">A</span> and delete it, as long as there is no <span class="bld"> further down that line. Here is an example of text where it should not be matched:
<span class="sense"><span class="bld">A</span> [[lay bare at the side]], [[expose]], τι τῆς πλευρᾶς <span class="bibl">Arr. <span class="title">Tact.</span>40.5</span>, cf. <span class="bibl">D.C.49.6</span> (Pass.). </span><span class="sense"><span class="bld">2</span> metaph., [[lay bare]], [[disclose]], τὸν πάντα λόγον <span class="bibl">Hdt.1.126</span>, cf. <span class="bibl">8.19</span>, <span class="bibl">9.44</span>; τὸ βούλευμα <span class="bibl">Conon 50</span>:—Pass., <b class="b3">παρεγυμνώθη διότι</b>… <span class="bibl">Plb.1.80.9</span>.</span>
So far I tried (<span class="sense"><span class="bld">A<\/span>) ((?!<span class="bld">).*) (and replacing with nothing) but it matches instances that do contain the unwanted string.

You can use
<span class="sense"><span class="bld">A<\/span>(?s)(?!.*<span class="bld">)
Details:
<span class="sense"><span class="bld">A<\/span> - a literal <span class="sense"><span class="bld">A</span> string
(?s) - s flag that makes . match across lines
(?!.*<span class="bld">) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
.* - any zero or more chars as many as possible
<span class="bld"> - a literal string.
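As a quick check of the lookahead above, here is a minimal sketch in Python rather than MariaDB. The helper name strip_lone_marker and the shortened sample strings are assumptions made for illustration. The s flag is passed as re.DOTALL instead of an inline (?s), since recent Python versions reject an inline global flag that is not at the start of the pattern.

```python
import re

# Same pattern as in the answer; re.DOTALL plays the role of (?s).
PATTERN = re.compile(
    r'<span class="sense"><span class="bld">A</span>(?!.*<span class="bld">)',
    re.DOTALL,
)

def strip_lone_marker(text: str) -> str:
    """Delete the 'A' marker only when no later bld span follows it."""
    return PATTERN.sub("", text)

# Shortened, made-up samples modelled on the question's text.
single = '<span class="sense"><span class="bld">A</span> [[expose]].</span>'
multi = ('<span class="sense"><span class="bld">A</span> [[expose]]. </span>'
         '<span class="sense"><span class="bld">2</span> metaph.</span>')

print(strip_lone_marker(single))          # marker removed
print(strip_lone_marker(multi) == multi)  # True: text left untouched
```

If "further down that line" should be taken literally, dropping re.DOTALL would limit the lookahead to the current line, because . then stops at a newline.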

Related

Looking for regex to find footer elements

I would like to use regex to search for all instances of a footer in an epub like the following sample:
<p class="calibre1">2 <> GENERAL INTRODUCTION </p>
of the more general format:
<p class="calibre1">[page number from 1-1000][" <>"][Title of section]</p>
My goal is to use calibre's regex search to find all instances of that footer and delete them, but none of the expressions I've tried finds even the example above:
<p class="calibre1">[0-9] <>[^>] </p>
<p class="calibre1">[0-9] <> [\w] </p>
and even the general:
<p class="calibre1">[\w--[\d_]]</p>
<p class="calibre1">[0-9] [.]</p>
<p class="calibre1">[0-9] *[.]</p>
<p class="calibre1">[0-9][*.]</p>
I'm new to regex and am pulling my hair out. Please help with my (mis)understanding.
This should work for what you want:
^<p[ \t]*class="calibre1">[0-9]+[^<]*<>[^<]*<[/]p>$
Please try this:
^<p class="calibre1">\d{1,4}.*</p>$
^ - Anchor to the start of the line
<p class="calibre1"> - Actual text to match
\d{1,4} - match 1 to 4 digits
.* - then zero or more characters
</p> - until the closing tag
$ - anchored to the end of the line
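The attempts in the question fail mainly because a character class such as [0-9] or [\w] matches exactly one character unless a quantifier (+, * or {m,n}) follows it. The sketch below runs both suggested patterns in Python's re module, on the assumption that calibre's engine treats these constructs the same way. The three-line page string is made up, and re.MULTILINE is needed so that ^ and $ anchor at each line.

```python
import re

# Patterns copied from the two answers.
FIRST = re.compile(r'^<p[ \t]*class="calibre1">[0-9]+[^<]*<>[^<]*<[/]p>$', re.MULTILINE)
SECOND = re.compile(r'^<p class="calibre1">\d{1,4}.*</p>$', re.MULTILINE)

# Made-up page fragment: body text, the footer from the question, body text.
page = (
    '<p class="calibre1">Some body text.</p>\n'
    '<p class="calibre1">2 <> GENERAL INTRODUCTION </p>\n'
    '<p class="calibre1">More text.</p>'
)

cleaned_first = FIRST.sub("", page)
cleaned_second = SECOND.sub("", page)
print(cleaned_first)
```

Note that the second pattern does not require the <> marker, so it would also remove any paragraph that merely starts with a number; the first pattern is stricter in that respect.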

RegEx match string but not if string comes after

I'm doing a find/replace, but I have already made a few changes the slow way. I want to use regex to replace the rest but make sure I don't replace the ones I've already done. So I need it to match 1 but not 2. The end result will be replacing all instances that look like 1 with 2. The -icon part can be anything.
1: <span class="glyphicons icon">
2: <span class="glyphicons glyphicons-icon">
More examples:
<span class="glyphicons hand">
<span class="glyphicons flower">
<span class="glyphicons bucket">
<span class="glyphicons glyphicons-stone_head">
<span class="glyphicons glyphicons-decapitated-corpse">
I need to replace the first 3 examples but not the last 2. The application is quite large so I'd really like to be able to do this with one 'replace all'.
Assuming icon can be any word, I'd try replacing glyphicons\s([A-Za-z]+)" by glyphicons glyphicons-$1".
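A minimal Python sketch of that replacement, to show why the already-converted lines are left alone: in "glyphicons glyphicons-stone_head" the letters after the space are followed by a hyphen, not a closing quote, so the pattern cannot match. The helper name add_prefix is an assumption, and Python writes the group reference as \1 where many editors use $1.

```python
import re

# Pattern and replacement follow the answer above.
PATTERN = re.compile(r'glyphicons\s([A-Za-z]+)"')

def add_prefix(html: str) -> str:
    """Turn class="glyphicons NAME" into class="glyphicons glyphicons-NAME"."""
    return PATTERN.sub(r'glyphicons glyphicons-\1"', html)

lines = [
    '<span class="glyphicons hand">',
    '<span class="glyphicons flower">',
    '<span class="glyphicons glyphicons-stone_head">',
]
for line in lines:
    print(add_prefix(line))
```

Running the replacement a second time changes nothing, which is what makes a single "replace all" safe here. An unconverted name containing digits, underscores or hyphens would not be matched by [A-Za-z]+, so the class may need widening for such names.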

Textmate Find regex, Replace wild

In textmate-1.5 I can use the regex syntax (.*) to find both lines in the below use case:
<span class="class1"></span>
<span class="class2"></span>
Now I want to append more code to each of them, so my find query is span class="(.*)" and my replace query is span class="(.*)" aria-hidden="true", which I had hoped would result in this:
<span class="class1" aria-hidden="true"></span>
<span class="class2" aria-hidden="true"></span>
but it actually resulted in this:
<span class="(.*)" aria-hidden="true"></span>
<span class="(.*)" aria-hidden="true"></span>
Using find/replace (not using column selection which would work for this example but not for the actual situation) is it possible to maintain the area matched by regex in the replace action with a representative wild character or something?
Change your replace query to:
span class="$1" aria-hidden="true"
$1 refers to the characters captured by group 1.
(<span class="[^"]*")
Try this. Replace with $1 aria-hidden="true". See the demo:
http://regex101.com/r/wQ1oW3/22
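The same idea can be tried outside TextMate. The sketch below uses Python, where the group reference in the replacement is written \1 instead of $1, and borrows the negated class [^"]* from the second answer so the capture stops at the closing quote.

```python
import re

source = '<span class="class1"></span>\n<span class="class2"></span>'

# Group 1 captures the class value; \1 re-inserts it in the replacement.
result = re.sub(
    r'span class="([^"]*)"',
    r'span class="\1" aria-hidden="true"',
    source,
)
print(result)
```

Writing the literal text (.*) in the replacement, as in the question, inserts those characters verbatim, because regex syntax is only interpreted on the find side.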

Notepad++ Regular Expression with angle bracket data

I'd like to replace text containing angle brackets, as follows:
<p> <b id="docs-guid-785896d2-1" >Choose </span> <span style="font-size: 15px; ">barren</span> <span > passage.</span></b> </p>\r\n', <b id="docs-guid-785896d2-6" > <span >empty</span></b> </p>\r\n\r\n<div> </div>\r\n', '<p> <b id="docs-guid-785896d2-665" > <span >wheat</span></b> </p>\r\n'
All the data is on one line.
I tried to remove the b tag, like "<b id="docs-guid-785896d2-1" > xxxx </b>" => xxxx
I used "<b id="docs-guid-(.*)" >(.*)</b>" with the replacement "\2" to remove that tag, but only one match was found (out of 3).
Could somebody help me find and replace all 3 pairs?
Thanks in advance.
Use the lazy version of (.*) by adding a question mark:
<b id="docs-guid-(.*?)" >(.*?)</b>
                    ^       ^
Otherwise you'll match too much and the replace will remove more than necessary.
Or better yet, use negated class for some more efficiency:
<b id="docs-guid-[^"]+" >(.*?)</b>
Here, replace by $1
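To see the difference between the greedy and lazy forms, here is a small Python sketch. The two-tag sample string is a simplified stand-in for the data in the question, and Python writes the group references as \1 and \2.

```python
import re

# Simplified, made-up one-line sample with two b tags.
text = ('<p> <b id="docs-guid-1" > <span >one</span></b> </p>, '
        '<p> <b id="docs-guid-2" > <span >two</span></b> </p>')

# Greedy: the first (.*) runs to the last '" >', so both tags collapse
# into a single match and the first tag's content is lost.
greedy_result, greedy_count = re.subn(
    r'<b id="docs-guid-(.*)" >(.*)</b>', r'\2', text)

# Negated class plus lazy group: each b tag is matched separately.
lazy_result, lazy_count = re.subn(
    r'<b id="docs-guid-[^"]+" >(.*?)</b>', r'\1', text)

print(greedy_count, lazy_count)
print(lazy_result)
```

The greedy version reports one replacement and drops "one" from the output, while the lazy version reports two and keeps both spans.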

Help with a regular expression

I am fairly new to regular expressions and have been having difficulty using one to extract the data I am after. Specifically, I am looking to extract the date touched and the counter from the following:
<span style="color:blue;"><query></span>
<span style="color:blue;"><pages></span>
<span style="color:blue;"><page pageid="3420" ns="0" title="Test" touched="2011-07-08T11:00:58Z" lastrevid="17889" counter="9" length="6269" /></span>
<span style="color:blue;"></pages></span>
<span style="color:blue;"></query></span>
<span style="color:blue;"></api></span>
I am currently using vs2010. My current expression is:
std::tr1::regex rx("(?:.*touch.*;)?([0-9-]+?)(?:T.*count.*;)([0-9]+)(&.*)?");
std::tr1::regex_search(buffer, match, rx);
match[1] contains the following:
2011-07-08T11:00:58Z" lastrevid="17889" counter="9" length="6269" /></span>
<span style="color:blue;"></pages></span>
<span style="color:blue;"></query></span>
<span style="color:blue;"></api></span>
match[2] contains the following:
6269" /></span>
<span style="color:blue;"></pages></span>
<span style="color:blue;"></query></span>
<span style="color:blue;"></api></span>
I am looking for just "2011-07-08" in match[1] and just "9" in match[2]. The date format will never alter, but the counter will almost certainly be much larger.
Any help would be highly appreciated.
That's because cmatch::operator[](int i) returns a sub_match, whose sub_match::operator basic_string() (used in the context of cout) returns a string starting at the beginning of the match and ending at the end of the source string.
Use sub_match::str(), i.e. match[1].str() and match[2].str().
Moreover, you'll need your expression to be more specific: .* tries to match the world, and gives up some if it can't.
Try std::tr1::regex rx("touched="([0-9-]+).+counter="([0-9]+)");.
You could even use non-greedy matchers (like +? and *?) to prevent excessive matching.
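The more specific pattern can be checked quickly outside C++. The sketch below uses Python's re module instead of std::tr1::regex, purely to illustrate the pattern, and assumes the buffer contains literal double quotes exactly as displayed in the question. If the real buffer holds escaped entities instead, the quote characters in the pattern would need adjusting.

```python
import re

# The <page ...> line as displayed in the question.
line = ('<page pageid="3420" ns="0" title="Test" '
        'touched="2011-07-08T11:00:58Z" lastrevid="17889" '
        'counter="9" length="6269" />')

# [0-9-]+ stops at the 'T', so group 1 holds only the date part.
match = re.search(r'touched="([0-9-]+).+counter="([0-9]+)', line)
date, counter = match.group(1), match.group(2)
print(date, counter)
```

Anchoring on the attribute names touched=" and counter=" is what keeps the groups short; the digits-only class then ends each capture at the first non-matching character.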
Try
std::tr1::regex rx("(?:.*touch.*;)?([0-9-]+)(?:T.*count.*;)([0-9]+)(&.*)?");
Removing the question mark makes the term greedy, so it will match as much as it can.