Multiline Search and Replace in Atom editor - regex

I want to find this:
<p>
various text and code
</p>
...and replace it with completely different text. Atom doesn't seem to have a multi-line RegEx flag. How can I accomplish this?

The regular expression (.|\r?\n)*? is what you're looking for.
Used in the example above, <p>(.|\r?\n)*?</p> will select all three lines and you can then either replace or delete those lines.

Try the regex [\s\S]*?. With your example, that is <p>[\s\S]*?</p>
See also https://github.com/atom/find-and-replace/issues/303
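To see why both suggestions work, here is a minimal sketch in Python. Python's `re` module is an assumption here and is not the engine Atom uses, and the sample text and the name `REPLACEMENT` are made up. The point it shows is the same: by default `.` does not match a newline, so the pattern has to match line breaks explicitly.

```python
import re

# Made-up sample text mirroring the three-line block from the question.
text = "before\n<p>\nvarious text and code\n</p>\nafter"
REPLACEMENT = "completely different text"

# A plain dot does not cross line breaks, so this finds nothing.
no_match = re.search(r"<p>.*?</p>", text)

# Either workaround matches line breaks explicitly.
with_class = re.sub(r"<p>[\s\S]*?</p>", REPLACEMENT, text)
with_alternation = re.sub(r"<p>(.|\r?\n)*?</p>", REPLACEMENT, text)

print(no_match)
print(with_class)
```

Both substitutions should give the same result here. The character-class form avoids the per-character alternation, which is probably the cheaper of the two in most engines.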

Related

Regex to select the string between two texts

I have a code like:
<p>Also: <a>text 1</a></p> <p><a> text 2 </a></p>
I am using the regex below. I just want to remove everything up to the first </p>:
<p>Also:(.*?)</p>
and the output is
empty
How do I select until the first </p> from <p>Also?
I think you want a regex like this:
/(?<=<p>Also:).+?(?=<\/p>)/i
or
/^.*?(?<=<p>Also:).+?(?=<\/p>)/gi
I tried it in VB.NET and found that your regex pattern works for your input. However, when I tried it on regexr.com, I found that the forward slash "/" must be escaped.
You could try this:
<p>Also:(.+?)<\/p>
Note: I wouldn't recommend using regex for HTML. It's better to use an HTML parser for your programming language.
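For comparison, here is a small Python sketch of the greedy, lazy, and lookaround variants on the input from the question. Python is an assumption (the asker mentions VB.NET), so treat this as an illustration of the patterns and not of any particular engine's behavior.

```python
import re

html = "<p>Also: <a>text 1</a></p> <p><a> text 2 </a></p>"

# Greedy: runs on to the LAST </p> in the string.
greedy = re.search(r"<p>Also:(.*)</p>", html).group(1)

# Lazy: stops at the FIRST </p>.
lazy = re.search(r"<p>Also:(.*?)</p>", html).group(1)

# Lookarounds: the match itself excludes the surrounding tags.
lookaround = re.search(r"(?<=<p>Also:).+?(?=</p>)", html, re.IGNORECASE).group(0)

# Removing everything from "<p>Also:" up to and including the first </p>.
removed = re.sub(r"<p>Also:.*?</p>", "", html, count=1)

print(repr(lazy))
print(repr(removed))
```

In Python the forward slash needs no escaping because patterns are plain strings. Escaping matters only where `/` delimits the regex, as on regexr.com.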

Need help in regular expression for notepad ++

Below is my sample text .
<ul>
<li>Google</li>
<li>Yahoo</li>
<li>Bing</li>
</ul>
I would like to add an extra attribute to each anchor tag, using the link text as its value, like below.
<ul>
<li>Google</li>
<li>Yahoo</li>
<li>Bing</li>
</ul>
I want to do this using notepad++ regular expression. Appreciate your help !!
You can use this regular expression find/replace:
Find: >([^<>]+)</a>
Replace:  aria-label="$1"$0
Transforming Quotes
In comments you asked to also replace a single quote by a repeated single quote, in both the texts. This cannot be done in the same replace operation, but you could launch a separate one, that should be executed before the one above:
Find: '(?=[^<>]*</a>)
Replace: ''
And then after this is done, you could apply the first replace operation.
I will assume that all your tags are well formed (no missing closing tag, no missing bracket, etc.). You can then do something like this:
Replace:
(<a[^>]*)>([^<]*)(<\/a>)
by
$1 aria-label="$2">$2$3
Use (?<=www\.)(\w+)(\..+\")(?=>) as a find template and \1\2 aria-label="\1" as replace template.
Click on Replace All button.
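The first two find/replace pairs above can be tried outside Notepad++ with a short Python sketch. Everything in the input is hypothetical: the sample in the question no longer shows its anchor tags, so the `href` values below are placeholders. Note the syntax difference: Notepad++'s `$1` is written `\1` in Python, and `$0` (the whole match) is `\g<0>`.

```python
import re

# Hypothetical input: the hrefs are placeholders, not taken from the question.
html = (
    '<li><a href="https://example.com/google">Google</a></li>\n'
    '<li><a href="https://example.com/yahoo">Yahoo</a></li>'
)

# Find: >([^<>]+)</a>    Replace:  aria-label="$1"$0
via_whole_match = re.sub(r">([^<>]+)</a>", r' aria-label="\1"\g<0>', html)

# Find: (<a[^>]*)>([^<]*)(<\/a>)    Replace: $1 aria-label="$2">$2$3
via_groups = re.sub(r"(<a[^>]*)>([^<]*)(</a>)", r'\1 aria-label="\2">\2\3', html)

print(via_groups)
```

On this input both approaches should produce the same markup. They can differ on anchors with empty link text, because the first pattern requires at least one character between `>` and `</a>`.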

Notepad++ ( perl ) regex match multiple line pattern

I want to remove a div from a couple hundred html files
<div id="mydiv">
blahblah blah
more blah blah
more html
<some javascript here too>
</div>
I thought that this would do the job but it doesn't
<div(.*)</div>
Does anyone know which is the proper regex for this?
Regex
<div[^>]+>(.*?)</div>
Don't forget to check the ". matches newline" option.
Alternatively, you can use this regex: <div[^>]+>([\s\S]*?)</div>, which works with or without the checkbox checked.
Discussion
Since the * quantifier is greedy, you need to tell it to take as few characters as possible by appending ?.
Check that the divs you want to remove do NOT contain nested divs. If they do, the regex at the start of my answer won't help you, because the lazy match stops at the first </div> it finds.
In that case, I'd suggest using an HTML parser.
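A short Python sketch of the same idea, where `re.DOTALL` plays the role of the ". matches newline" checkbox. Python is an assumption here, and the page content and variable names are made up. The nested case at the end shows the failure mode described above.

```python
import re

page = (
    '<p>keep</p>\n<div id="mydiv">\nblahblah blah\nmore html\n</div>\n'
    "<p>also keep</p>"
)

# The original attempt: without DOTALL the dot cannot cross line breaks.
original = re.search(r"<div(.*)</div>", page)

# Lazy quantifier plus DOTALL, equivalent to ticking ". matches newline".
pattern = re.compile(r"<div[^>]+>(.*?)</div>", re.DOTALL)
cleaned = pattern.sub("", page)

# Nested divs: the lazy match ends at the inner </div>, leaving debris behind.
nested = '<div id="outer"><div id="inner">x</div>tail</div>'
leftover = pattern.sub("", nested)

print(repr(cleaned))
print(repr(leftover))
```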

Regex replace is eating up the whole string! How do I make regex ungreedy?

I'm working with a really large spreadsheet in Open Office and I've had to learn regular expressions to clean it up.
Right now I'm trying to remove all <span> tags and I've come up with an expression to do so:
(<span.*?>|</span>)
The problem is that OpenOffice doesn't seem to like the question mark (which should make it ungreedy), so when I try to remove the <span> tags, it removes most of my string.
Here is a sample of the data: http://pastebin.com/AKWZJJCv
What is an alternative way of removing the <span> tags that would work in OpenOffice's find and replace?
You could also try (<span[^>]*>|</span>)
Give this a try:
<(\/)?span([a-zA-Z\-\="0-9 ]*)?>
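The negated character class avoids the need for a lazy quantifier at all, which is why it is a reasonable fallback when an engine does not honor `?`. Here is a rough Python sketch. The cell text is made up, and the greedy line only simulates an engine that treats `.*?` as `.*`; it is not a claim about how OpenOffice behaves internally.

```python
import re

# Made-up cell content; the real data from the question is not reproduced here.
cell = '<span class="a">Hello</span> <span style="x">World</span>'

# What the question intended: lazy match up to the nearest ">".
lazy = re.sub(r"(<span.*?>|</span>)", "", cell)

# Simulating an engine that ignores the "?": the match runs to the LAST ">".
greedy = re.sub(r"(<span.*>|</span>)", "", cell)

# Negated class: cannot run past ">" even though the quantifier is greedy.
negated = re.sub(r"(<span[^>]*>|</span>)", "", cell)

print(repr(greedy))
print(repr(negated))
```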

How to Find Quotes within a Tag?

I have a string like this:
This <span class="highlight">is</span> a very "nice" day!
What should my RegEx-pattern in VB look like, to find the quotes within the tag? I want to replace it with something...
This <span class=^highlight^>is</span> a very "nice" day!
Something like <(")[^>]+> doesn't work :(
Thanks
It depends on your regex flavor, but this works for most of them:
"(?=[^<]*>)
EDIT: For anyone curious how this works. This translates into English as "Find a quote that is followed by a > before the next <".
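Applied to the string from the question, the lookahead can be sketched in Python as follows. Python is an assumption, since the question asks about VB, but the pattern uses no flavor-specific syntax.

```python
import re

s = 'This <span class="highlight">is</span> a very "nice" day!'

# Replace only quotes that are followed by ">" before the next "<".
result = re.sub(r'"(?=[^<]*>)', "^", s)

# Caveat: a literal ">" in plain text also satisfies the lookahead.
caveat = re.sub(r'"(?=[^<]*>)', "^", 'say "x" > y')

print(result)
print(caveat)
```

The second call shows the limitation: the pattern cannot tell a tag's closing `>` from a `>` that appears in ordinary text.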
Regexes are fundamentally bad at parsing HTML (see Can you provide some examples of why it is hard to parse XML and HTML with a regex? for why). What you need is an HTML parser. See Can you provide an example of parsing HTML with your favorite parser? for examples using a variety of parsers.
If you are using VB.net you should be able to use HTMLAgilityPack.
Try this: <span class="([^"]+?)?">
This should get you the first attribute value in a tag:
<[^">]+"(?<value>[^"]*)"[^>]*>
If your intention is to replace ALL quotation marks within tags, you could use the following regular expression:
(<[^>"]*)(")([^>]*>)
That will isolate the substrings before and after your quotation mark. Note that this does not attempt to match opening and closing quotation marks. It simply matches a quotation mark within a tag.
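One thing to watch with this last pattern: each match consumes the whole tag, so a single pass replaces only one quote per tag. Here is a minimal Python sketch that repeats the substitution until nothing changes. The helper name `replace_quotes_in_tags` is hypothetical, and Python is an assumption (the question is about VB).

```python
import re

TAG_QUOTE = re.compile(r'(<[^>"]*)(")([^>]*>)')


def replace_quotes_in_tags(text, replacement="^"):
    """Hypothetical helper: replace every double quote that sits inside a tag.

    The replacement must not itself contain a double quote, otherwise the
    loop would never terminate.
    """
    count = 1
    while count:
        # Each pass rewrites at most one quote per tag.
        text, count = TAG_QUOTE.subn(
            lambda m: m.group(1) + replacement + m.group(3), text
        )
    return text


s = 'This <span class="highlight">is</span> a very "nice" day!'
single_pass = TAG_QUOTE.sub(r"\1^\3", s)
all_passes = replace_quotes_in_tags(s)

print(single_pass)
print(all_passes)
```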