Regex to select the string between two texts

I have some code like this:
<p>Also: <a>text 1</a></p> <p><a> text 2 </a></p>
I am using a regex like this, because I only want to match up to the first </p>:
<p>Also:(.*?)</p>
but the output is empty.
How do I select everything from <p>Also: up to the first </p>?

I think you want a regex like this:
/(?<=<p>Also:).+?(?=<\/p>)/i
or
/^.*?(?<=<p>Also:).+?(?=<\/p>)/gi
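To try the lookaround version outside an online tester, here is a minimal sketch in Python. This is an assumption on my part (the asker uses VB.NET), but Python's re supports the same fixed-width lookbehind and lazy quantifier. Python patterns have no / delimiters, so the slash needs no escaping.

```python
import re

html = "<p>Also: <a>text 1</a></p> <p><a> text 2 </a></p>"

# The lookbehind anchors the match just after "<p>Also:"; the lazy .+? then
# stops at the first position where the lookahead sees "</p>".
pattern = re.compile(r"(?<=<p>Also:).+?(?=</p>)", re.IGNORECASE)

match = pattern.search(html)
print(repr(match.group(0)))  # ' <a>text 1</a>'
```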

I tried it in VB.NET and found that your regex pattern works for your input. However, when I tried it on regexr.com, I found that the forward slash "/" has to be escaped.
You could try this:
<p>Also:(.+?)<\/p>
Note: I wouldn't recommend using regex for HTML. It's better to use an HTML parser for your programming language.
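For comparison, here is the same extraction done with a parser instead of a regex. This is a sketch using Python's standard-library html.parser, swapped in purely as an example (in VB.NET you would use a .NET HTML library instead); ParagraphText and text_after_also are hypothetical names. Unlike the regexes, it returns the paragraph's text rather than its inner markup.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text content of each <p> element, ignoring nested tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text inside <a> and other nested tags still belongs to the <p>.
        if self._in_p:
            self.paragraphs[-1] += data


def text_after_also(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    for text in parser.paragraphs:
        if text.startswith("Also:"):
            return text[len("Also:"):].strip()
    return None


print(text_after_also("<p>Also: <a>text 1</a></p> <p><a> text 2 </a></p>"))  # text 1
```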

Related

Multiline Search and Replace in Atom editor

I want to find this:
<p>
various text and code
</p>
...and replace it with completely different text. Atom doesn't seem to have a multi-line RegEx flag. How can I accomplish this?
The regular expression (.|\r?\n)*? is what you're looking for.
Applied to the example above, <p>(.|\r?\n)*?</p> will select all three lines, and you can then either replace or delete them.
Try the regex [\s\S]*?; with your example, that becomes <p>[\s\S]*?</p>
See also https://github.com/atom/find-and-replace/issues/303
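Both patterns can be checked in Python, whose re module, like Atom's JavaScript-based search, does not let . match a newline unless a flag is set. This sketch (the "before" and "after" lines are made up) replaces the whole <p> block with each pattern:

```python
import re

text = "before\n<p>\nvarious text and code\n</p>\nafter"

# Without a "dot matches newline" flag, both patterns spell out
# "any character, including a line break" explicitly.
alternation = re.compile(r"<p>(.|\r?\n)*?</p>")
char_class = re.compile(r"<p>[\s\S]*?</p>")

print(alternation.sub("REPLACED", text))
print(char_class.sub("REPLACED", text))
```

The character class version avoids a capturing group per character, which tends to be cheaper on long blocks.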

Sublime: replace everything between quotes

I need some help with a regular expression to search and replace in Sublime to do the following.
I have HTML code with links like:
href="http://www.example.com/test=123"
href="http://www.example.com/test=6546"
href="http://www.example.com/test=3214"
I want to replace them with empty links:
href=""
href=""
href=""
Please help me create a regex to match my case. I guess it would be something like "starts with a quote, followed by http:// .... ends with a quote, and contains digits and an '=' sign", but I'm not confident about how to write this as a regex.
(?<=href=")[^"]*
Try this, and replace the match with an empty string.
See the demo:
https://regex101.com/r/sH8aR8/40
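A Python sketch of the same replacement, assuming the three links are in one string. The lookbehind requires href=" before the match but keeps it out of the match, so only the URL is removed:

```python
import re

links = '''href="http://www.example.com/test=123"
href="http://www.example.com/test=6546"
href="http://www.example.com/test=3214"'''

# [^"]* takes everything up to (not including) the closing quote.
emptied = re.sub(r'(?<=href=")[^"]*', "", links)
print(emptied)
```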

Negative lookahead but with something before it

I'm using a regex to parse some HTML. I have the following regex, which matches all tags except img and a:
\<(?!img|a)[^\>]+\>
This works well, but I also want it to match the closing tags. I've tried the following, but it doesn't work:
\</?(?!img|a)[^\>]+\>
What would be the best way to do this?
(Also, before there is a plethora of comments saying not to use regexes to parse HTML, I'd just like to say that this HTML is generated by a tool and is very uniform.)
EDIT:
<p>So in this</p>
<p>HTML <strong>with nested tags</strong></p>
<p>It should remove <i>everything</i> except This link
and this <img src="#" alt="image" /> but it also needs to keep the textual content</p>
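I think that the simplest solution would be the following:
<\/?(?!\/|(?:img|a)\b)[^>]+>
It simply matches:
a <,
a / (escaped with \) if there is one (quantifier ?),
asserts that what follows is neither another / nor the tag name img or a (the \b keeps tags such as <abbr> from being excluded too),
a sequence of anything but > ([^>]+) and
a >
The \/ inside the lookahead is what makes closing tags work: without it, the engine can let \/? match nothing, the lookahead then sees /a instead of a, and </a> gets matched as well.
You can test it on regex101.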
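The difference between the question's attempt and a working version only shows up on closing tags. This Python sketch (naive and fixed are illustrative names) reports which tags each pattern matches in full; fixed adds \/ to the lookahead so that \/? cannot back off to an empty match, and \b so that <abbr> is not excluded along with <a>:

```python
import re

# The question's attempt: optional slash, then "not img or a".
naive = re.compile(r"<\/?(?!img|a)[^>]+>")
# Also forbid a second "/" and require a whole tag name.
fixed = re.compile(r"<\/?(?!\/|(?:img|a)\b)[^>]+>")

for tag in ["<p>", "</p>", '<a href="#">', "</a>", '<img src="#" />', "<abbr>"]:
    print(tag, bool(naive.fullmatch(tag)), bool(fixed.fullmatch(tag)))
```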
OK, here is a pretty wasteful solution:
<(?!img|a|\/img|\/a)[^>]+>
It would be great if someone could find a better one.
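As for a better one: the four alternatives can be folded into a single lookahead by moving the optional slash inside it. A Python sketch, run on a made-up sample in the spirit of the question's HTML (with an <a> element included):

```python
import re

# The optional "/" sits inside the lookahead, so one assertion covers both
# <a ...> and </a>; \b stops "a" from also matching the start of <abbr>.
keep_img_and_a = re.compile(r"<(?!\/?(?:img|a)\b)[^>]+>")

html = ('<p>HTML <strong>with nested tags</strong> and '
        '<a href="#">a link</a> and <img src="#" alt="image" /></p>')

print(keep_img_and_a.sub("", html))
```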

Get link text and href and wrap it in other html tags

How can something like this:
<b><a class='visit' href='LINK'>LINK'S NAME</a></b>
be turned into this:
<tr><td>LINK'S NAME<a href="LINK">constant text</a></td></tr>
You should never use regex for HTML manipulation unless you have a good reason to do so.
Regex to match:
/<b>\s*<a\s+class='visit'\s+href='([^']*)'\s*>([^<]+)<\/a>\s*<\/b>/
Replacement:
"<tr><td>$2<a href=\"$1\">constant text</a></td></tr>"

How to Find Quotes within a Tag?

I have a string like this:
This <span class="highlight">is</span> a very "nice" day!
What should my regex pattern in VB look like to find the quotes within the tag? I want to replace them with something else, like this:
This <span class=^highlight^>is</span> a very "nice" day!
Something like <(")[^>]+> doesn't work :(
Thanks
It depends on your regex flavor, but this works for most of them:
"(?=[^<]*>)
EDIT: For anyone curious how this works, it translates into English as "Find a quote that is followed by a > before the next <".
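A quick check of that lookahead on the question's string, using Python's re as a stand-in for the VB.NET engine (this simple lookahead behaves the same in both):

```python
import re

text = 'This <span class="highlight">is</span> a very "nice" day!'

# A quote counts as "inside a tag" when a ">" appears before the next "<".
quote_in_tag = re.compile(r'"(?=[^<]*>)')

print(quote_in_tag.sub("^", text))
```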
Regexes are fundamentally bad at parsing HTML (see Can you provide some examples of why it is hard to parse XML and HTML with a regex? for why). What you need is an HTML parser. See Can you provide an example of parsing HTML with your favorite parser? for examples using a variety of parsers.
If you are using VB.NET, you should be able to use the HtmlAgilityPack library.
Try this: <span class="([^"]+?)?">
This should get you the first attribute value in a tag:
<[^">]+"(?<value>[^"]*)"[^>]*>
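Named-group syntax differs between flavors: .NET accepts (?<value>...), while Python's re spells it (?P<value>...). A sketch of the same pattern in Python:

```python
import re

# [^">]+ stops at the first quote, so the group holds the first attribute value.
first_attr = re.compile(r'<[^">]+"(?P<value>[^"]*)"[^>]*>')

m = first_attr.search('This <span class="highlight">is</span> a very "nice" day!')
print(m.group("value"))  # highlight
```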
If your intention is to replace ALL quotation marks within tags, you could use the following regular expression:
(<[^>"]*)(")([^>]*>)
That will isolate the substrings before and after your quotation mark. Note that this does not attempt to match opening and closing quotation marks. It simply matches a quotation mark within a tag.
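Because the last group ([^>]*>) runs to the end of the tag, a single replace pass changes only the first quotation mark in each tag. A Python sketch that repeats the substitution until nothing matches (replace_tag_quotes is a hypothetical helper, and ^ is the replacement from the question):

```python
import re

quote_in_tag = re.compile(r'(<[^>"]*)(")([^>]*>)')


def replace_tag_quotes(text):
    """Replace every quote inside a tag with ^ by repeating the substitution."""
    while True:
        # Each pass rewrites at most one quote per tag; stop when nothing matched.
        text, count = quote_in_tag.subn(lambda m: m.group(1) + "^" + m.group(3), text)
        if count == 0:
            return text


sample = 'This <span class="highlight">is</span> a very "nice" day!'
print(quote_in_tag.sub(r"\1^\3", sample))  # one pass: only the first quote changes
print(replace_tag_quotes(sample))
```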