Exclude certain code from being auto-formatted in WebStorm

WebStorm auto-formats my JS file perfectly.
However, I want to exclude some code from auto-formatting, such as anything between <pre><code> and </code></pre>. Auto-formatting introduces new white space. Normally that is not a problem, but I want to preserve the white space between my <pre><code> and </code></pre> tags.

Go to Settings/Preferences | Editor | Code Style | HTML | Other and make sure that such tags are listed in the "Keep white spaces inside" list.

Related

Possible Bug using Regex in Notepad++ with Replace All?

Have I found a bug in Notepad++ or am I doing something wrong?
Background info
(Please note that I do know that one is not supposed to use regex to parse HTML, but I think this is a special case that should work - without the possible Notepad++ bug ;-)
I have exported Apple Notes as HTML using Exporter 3.0 on a Mac. In the HTML output, every note line sits between <div> - </div> elements, as do "header/title lines" such as <h1> - </h1> or <h2> - </h2>. Each "header/title line" is often split into several unnecessary HTML header elements, as in the following simplified example.
<div><h1>TEST </h1><h1>Title<br></h1></div>
<div><b><h2>T1</h2><u><h2>T2</h2></u><h2> </h2></b><h2>(</h2><h2>T3</h2><u><h2>T4</h2></u><h2>)</h2><b><h2><br></h2></b></div>
This HTML can't be imported into OneNote with the same result as seen in Apple Notes: each "header/title" line is split into multiple lines. That's true even when changing the <h1>/<h2> block elements to inline elements using an initial <style>h1, h2 {display: inline;}</style> statement. (Maybe that is a bug or restriction in OneNote, but I need to find a workaround.)
Therefore, I need to strip the unnecessary <h1> or <h2> tags (all but the first in every line) and </h1> or </h2> tags (all but the last in every line) from the example HTML output above, to get the following result, which can be imported into OneNote without problems.
<div><h1>TEST Title<br></h1></div>
<div><b><h2>T1<u>T2</u> </b>(T3<u>T4</u>)<b><br></h2></b></div>
Solution ? - Developed Regex
I'm quite new to Regex, especially advanced Regex, but I think I have found a way to clean the erroneous HTML code using TWO different Regex expressions as follows.
Both work well when tested on regex101.com, I think.
The first one is used to remove unnecessary </h1> or </h2> elements and is a Positive Lookahead function (it works both in regex101 and in Notepad++)
(</h[1-6]>)(?=.*?\1)
(Demo)
Picture 1 shows a working Find All + Mark All in Notepad++
Picture 2 shows a working Replace All
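For readers without Notepad++ at hand, the effect of this first expression can be checked with a minimal Python sketch. The pattern is the one quoted above; the function name `drop_redundant_closers` is only an illustrative choice, and Python's `re` engine is assumed to behave like Notepad++ for this particular pattern.

```python
import re

# Pattern quoted from the question: a closing heading tag that is followed,
# later on the same line, by an identical closing tag.
CLOSING_BEFORE_LAST = re.compile(r"(</h[1-6]>)(?=.*?\1)")

def drop_redundant_closers(line: str) -> str:
    """Remove every closing heading tag except the last identical one."""
    return CLOSING_BEFORE_LAST.sub("", line)

sample = "<div><h1>TEST </h1><h1>Title<br></h1></div>"
print(drop_redundant_closers(sample))
# prints: <div><h1>TEST <h1>Title<br></h1></div>
```

Only the closing tags are touched here; the surplus opening tags are left for the second expression.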
The second one is used to remove unnecessary <h1> or <h2> elements and is a Positive Lookbehind function (it works in regex101 but NOT fully in Notepad++)
(?<=(<(h[1-6])>))(?:.*?)\K\1
(Demo)
Picture 3 shows a working Find All + Mark All in Notepad++ = All 8 occurrences found
Picture 4 shows a NOT working Replace All in Notepad++ = Only 5 occurrences (of the 8 found) are replaced
If I redo the same Replace All a second time, 2 of the remaining 3 occurrences are replaced.
If I redo the same Replace All a third time, the last remaining occurrence is replaced.
BUG ?
Is this a bug in Notepad++ or is this behavior normal or am I doing something strange here? Please help me understand.
So, rather than make multiple passes through your data, you can get it all in one pass with this:
(^.*?<h[1-6]>)?(.*?)</?h[1-6]>(?=.*</h[1-6]>.*?$)
and replace it with \1\2. The first capture group skips the first <h#> on each line and is null after line start. The second capture group captures everything up to the next <h#> tag. The optional slash (/?) scans and deletes both open and close tags. The last part is a positive lookahead to make sure the last </h#> is not deleted.
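As a rough cross-check outside Notepad++, the same pattern and replacement can be run with Python's `re` module. This is only a sketch: the function name `merge_headings` is made up for illustration, each line is processed separately, and the replacement is written as a function so that an unmatched first group becomes an empty string.

```python
import re

# Pattern quoted from the answer; the replacement mirrors \1\2.
ONE_PASS = re.compile(r"(^.*?<h[1-6]>)?(.*?)</?h[1-6]>(?=.*</h[1-6]>.*?$)")

def merge_headings(line: str) -> str:
    """Drop all heading tags except the first opener and last closer."""
    # group(1) is None whenever the match does not begin at the line start,
    # so fall back to an empty string in that case.
    return ONE_PASS.sub(lambda m: (m.group(1) or "") + m.group(2), line)

lines = [
    "<div><h1>TEST </h1><h1>Title<br></h1></div>",
    "<div><b><h2>T1</h2><u><h2>T2</h2></u><h2> </h2></b><h2>(</h2>"
    "<h2>T3</h2><u><h2>T4</h2></u><h2>)</h2><b><h2><br></h2></b></div>",
]
for line in lines:
    print(merge_headings(line))
# prints:
# <div><h1>TEST Title<br></h1></div>
# <div><b><h2>T1<u>T2</u> </b>(T3<u>T4</u>)<b><br></h2></b></div>
```

Both output lines match the target result given in the question.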
In the two lines of your examples all the header levels were the same on the line and this regex is fine. If the first open and last close don't match, then you have a problem but I think your solutions also have that same problem. In any case you can fix that in a second pass with ^(.*<h)([1-6])(.*</h)[1-6] and replace it with \1\2\3\2.
I would also point out that this creates unbalanced HTML with a <b>, followed by <h1>, followed by </b>, followed by </h1>. I don't know if that is OK for your case. If not, it might be better to remove ALL the <h#> tags and anchor new ones just inside the <div> </div> pair.
In any event here is a REGEX101 screenprint with this regex working on your examples:

Dart - Split content of HTML tags

So I got this regex which matches everything inside an HTML tag:
/(?<=<\s*\w+[^>]*>)(.*)(?=<\/\w+>)/gm
Playground: https://regex101.com/r/WthKUd/3
What the regex does:
(?<=<\s*\w+[^>]*>) - Checks for opening HTML tag
(.*) - Checks for any character
(?=<\/\w+>) - Checks for closing HTML tag
Now I need to tweak this so that I can extract content from a tag as a List.
So given the string:
<p>Lazy fox has <b>text</b> and <b>bold text again</b></p>
And doing:
<pattern>.allMatches('<p>Lazy fox has <b>text</b> and <b>bold text again</b></p>');
The result would be:
[
'Lazy fox has ',
'<b>text</b>',
' and ',
'<b>bold text again</b>'
]
It should basically split normal text content from HTML tags so I can populate a RichText widget with the correct styles.
I have tried to modify the regex in quite a few ways but I can't seem to get it to match text as one match group and tags as another.
How would I tweak the regex to do what I want?
EDIT: I am well aware of existing parsers and we are already using flutter_html but it doesn't meet some of our needs which is why I'm creating a simpler, slimmed down version.
It may not be the solution you are looking for, but I have used the flutter_html package for a while now and the rendering is great. Maybe you can switch from your RichText widget to this dependency?
With this dependency, you can choose to only render some html tags and remove some others.
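If a regex-only split is still wanted, one possible approach is to capture the content of the wrapping tag first and then tokenise it with an alternation: either a complete element or a run of plain text. The sketch below is written in Python rather than Dart, the name `split_inner` is hypothetical, and it assumes a single wrapping element and no nested elements of the same name.

```python
import re

# Either a complete element (<b>...</b>) or a run of text without "<".
TOKEN = re.compile(r"<(\w+)[^>]*>.*?</\1>|[^<]+", re.DOTALL)
# The wrapping element: its tag name is group 1, its content group 2.
OUTER = re.compile(r"<(\w+)[^>]*>(.*)</\1>", re.DOTALL)

def split_inner(html: str) -> list[str]:
    """Split the content of the wrapping tag into text runs and elements."""
    outer = OUTER.fullmatch(html)
    inner = outer.group(2) if outer else html
    return [m.group(0) for m in TOKEN.finditer(inner)]

print(split_inner("<p>Lazy fox has <b>text</b> and <b>bold text again</b></p>"))
# prints: ['Lazy fox has ', '<b>text</b>', ' and ', '<b>bold text again</b>']
```

The same two patterns should carry over to Dart's RegExp, but that has not been verified here. An element nested inside another of the same name (for example a <b> inside a <b>) would be cut at the first closing tag.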

Regex to remove empty tags HTML except images

I have a regex to remove empty tags HTML, like <p></p> or <span></span>, but inside them I can have images and I want to ignore the tag <img>. My regex:
(<(?!\/)[^>]+>)+(<\/[^>]+>)+
My use cases:
I want to ignore the last line, because I have an image inside the tag.
Check the live editor: https://regex101.com/r/81M8VR/1
The following seems to work:
(<(?!\/)((?!img)[^>])+>)+(<\/[^>]+>)+
https://regex101.com/r/A0N1rL/1
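A small Python sketch of how this pattern behaves on a few inputs. The sample strings and the name `strip_empty` are made up for illustration; only the pattern comes from the answer.

```python
import re

# Pattern quoted from the answer: one or more opening tags whose text never
# reaches "img", followed directly by one or more closing tags.
EMPTY_TAGS = re.compile(r"(<(?!\/)((?!img)[^>])+>)+(<\/[^>]+>)+")

def strip_empty(html: str) -> str:
    """Remove runs of empty tags, leaving tags that wrap an <img> alone."""
    return EMPTY_TAGS.sub("", html)

print(strip_empty("a<p></p>b<span></span>c"))    # prints: abc
print(strip_empty('<p><img src="a.png"></p>'))   # printed unchanged
print(strip_empty("<p>text</p>"))                # printed unchanged
```

One caveat: the (?!img) check applies at every character inside the opening tag, so an empty tag whose attributes happen to contain "img" (for example class="imgwrap") is also left in place.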

Regex to remove only span tags but preserve content found within them

I have a span tag like this:
<span id="item.2.2">3 October.--As I must do something or go mad, I write this diary.</span>
I'd like to be able to remove the open and closing span but leave the text within it. In addition the id part of the opening span does change, so it could be item.10.2 or item.100.5 so I would need to take that into account.
** edit **
Edited to add: the file(s) I'd want to replace this in also have span tags that do not include the id specifier, and I do NOT want to remove them or their closing </span> tags. Sorry, I should have said that earlier.
Do a regex replace of </?span[^>]*> with an empty string.
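Note that this pattern strips every span tag, including those without an id. To honour the edit in the question, a stricter pattern can match only spans whose id looks like item.N.N and put back the enclosed text. The following is a minimal Python sketch; the name `unwrap_item_spans` is hypothetical, and it assumes the id is the only attribute and that no other span is nested inside the matched one.

```python
import re

# Only spans whose sole attribute is an id of the form item.<n>.<n>;
# the enclosed text is captured so it can be put back.
ID_SPAN = re.compile(r'<span id="item\.\d+\.\d+">(.*?)</span>', re.DOTALL)

def unwrap_item_spans(html: str) -> str:
    """Remove the id-bearing span tags but keep their content."""
    return ID_SPAN.sub(r"\1", html)

sample = '<span id="item.10.2">kept text</span> <span>untouched</span>'
print(unwrap_item_spans(sample))
# prints: kept text <span>untouched</span>
```

In an editor with regex search and replace, the same idea would be to search for the pattern and replace it with the first capture group.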

regular expression to parse html title tag

I need to parse a lot of HTML files to find out which ones contain specific text within the title tag.
Let's suppose that titles are
file1.htm
<title>100 text other text</title>
file2.htm
<title>text 100 text other text</title>
file3.htm
<title>text 1000 text other text</title>
file4.htm
<title>text one hundred text other text</title>
Following my example, I need to find the names of the files that contain 100 or one hundred, that is, files 1, 2 and 4.
My problem is that I don't know how to write the regular expression.
gci "c:\my_folder" | ? {$_.extension -eq ".htm"} |
select-string -pattern '<title>*100*</title>' |
Select-Object -Unique Path
Please note, in case it matters for the regexp, that the title tag is not at the beginning of a row but in the middle.
Thanks in advance.
This should do it.
^.*<title>(.*(100|one\shundred)[^0].*)?</title>.*$
try
<title>(.*[^[:alnum:]])?(100|one hundred)([^[:alnum:]].*)?</title>
for the pattern to match. The pattern syntax is PCRE (as in Perl); it can be reformulated if necessary.
best regards,
carsten
ps:
beware of the pitfalls - all the recommendations and warnings from the comments do hold; still, in your case, the regex approach seems viable (mainly because you're investigating the 'title' tag's content, there should only be a single one per file and spreading it across multiple lines would be plain silly imho).
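The [[:alnum:]] class in the second suggestion is POSIX/PCRE syntax that not every engine accepts. One possible reformulation uses the word boundary \b instead, sketched here in Python with the four example titles. The dictionary of file contents and the name `TITLE_HIT` are made up for illustration, and the title is assumed to sit on a single line.

```python
import re

# "100" or "one hundred" as a whole word anywhere inside <title>...</title>.
TITLE_HIT = re.compile(
    r"<title>[^<]*\b(?:100|one hundred)\b[^<]*</title>", re.IGNORECASE
)

# Hypothetical file contents, with the title tag in the middle of a line.
files = {
    "file1.htm": "<head><title>100 text other text</title></head>",
    "file2.htm": "<head><title>text 100 text other text</title></head>",
    "file3.htm": "<head><title>text 1000 text other text</title></head>",
    "file4.htm": "<head><title>text one hundred text other text</title></head>",
}

matching = [name for name, html in files.items() if TITLE_HIT.search(html)]
print(matching)
# prints: ['file1.htm', 'file2.htm', 'file4.htm']
```

Because \b requires a non-word character (or the tag boundary) on each side, 1000 is rejected while 100 at the very start of the title is still accepted.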