Notepad++ Replace new line inside text - regex

This is just one sample; I have about a million rows like it.
I have this text:
<tr class="even">
<td><a href="http://www.ujk.edu.pl/">Jan Kochanowski
University of Humanities and Sciences (Swietokrzyska Pedagogical
University) / Uniwersytet Humanistyczno Przyrodniczy Jana Kochanowskiego
w Kielcach</a></td>
I want to transform it into this:
<tr class="even">
<td>Jan Kochanowski University of Humanities and Sciences (Swietokrzyska Pedagogical University) / Uniwersytet Humanistyczno Przyrodniczy Jana Kochanowskiegow Kielcach</td>
I tried this regex: (.*)
But it didn't work.

Open the replace window with Ctrl + H
Then enter
Find what: ([^>])[\r\n]{1,2}
Replace with: \1
Check Regular Expression
[^>] matches a character that isn't a >
The {1,2} protects against a file that may only have a newline and not a carriage return.
\1 puts back the character captured by the group ( ), so only the line break itself is removed.
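For readers who would rather script this outside Notepad++, here is a minimal Python sketch of the same idea. Two things in it are my assumptions, not part of the answer above: the character class also excludes \r and \n (so that in a CRLF file the CR is never captured as "the character before the break"), and the break is replaced by a space so the words do not run together. The function name join_wrapped_lines is hypothetical.

```python
import re

# Assumption: \r and \n are excluded from the class so a CR is never
# treated as the captured character in CRLF files.
WRAPPED_BREAK = re.compile(r"([^>\r\n])(?:\r\n|\r|\n)")


def join_wrapped_lines(text: str, joiner: str = " ") -> str:
    """Remove line breaks that do not directly follow a '>'."""
    # Keep the captured character and swap the break for the joiner.
    return WRAPPED_BREAK.sub(lambda m: m.group(1) + joiner, text)


sample = (
    '<tr class="even">\r\n'
    '<td><a href="http://www.ujk.edu.pl/">Jan Kochanowski\r\n'
    'University of Humanities and Sciences</a></td>\r\n'
)
print(join_wrapped_lines(sample))
```

Pass joiner="" to reproduce the answer's behaviour of removing the break without inserting anything.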

Open Notepad++
Click Search >> Replace..
Replace with: \n
At the bottom you will find "Search Mode", click "Extended"
Live example here: http://postimg.org/image/c66pw8kkr/

If you can't make jmstoker's solution work, try this:
First check whether the line breaks are CRLF or just one of the two characters: click the "Show All Characters" icon on the toolbar, or go to the menu View -> Show Symbol -> Show All Characters.
In the Replace dialog, select the "Extended" search mode.
In the "Find what:" field, enter \r\n (or just \r or just \n; \r matches CR and \n matches LF).
Leave the "Replace with:" field empty.
Once this is done, all the line breaks will have been removed, but you want <tr class="even"> to be on its own line. So, still in Extended mode, replace <tr class="even"><td> with <tr class="even">\r\n<td>
I'm guessing you also have rows with class "odd" or something like that, so you might need to repeat that last step with the other class :)
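The steps above can be sketched in Python with plain string replacement, which is roughly what the Extended search mode does. The function name flatten_rows is hypothetical, and the "odd" class is only the guess made in the answer, not something confirmed by the question.

```python
def flatten_rows(text, row_tags=('<tr class="even">', '<tr class="odd">')):
    """Remove all line breaks, then put each row tag back on its own line."""
    # Step 1: drop every line break (CRLF first, then any stray CR or LF).
    flat = text.replace("\r\n", "").replace("\r", "").replace("\n", "")
    # Step 2: re-insert a break between each row tag and the following <td>.
    # The "odd" tag is an assumption; adjust row_tags to the real classes.
    for tag in row_tags:
        flat = flat.replace(tag + "<td>", tag + "\r\n<td>")
    return flat


print(flatten_rows('<tr class="even">\r\n<td>Jan\r\nKochanowski</td>\r\n'))
```

As in the answer, nothing is inserted where a break is removed, so words that were separated only by a line break end up joined.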

Related

Find and replace between first '>' and next '<' in vim

I have an html table with some headers:
<th>Header01</th>
...
<th>Different Header 20</th>
I'd like to find and replace everything between the first > and the next < with {{ }}:
<td>{{ }}</td>
...
<td>{{ }}</td>
I know I can use :s/\%Vth/td/g to replace all the th with td, but how can I use Vim's regex to find and replace everything between > and < with {{ }}?
I have attempted the following without success:
:s/\%V\>(.*?)\</{{ }}/g
If you are willing to take the risk of editing HTML with a regex, in Vim you can just do:
%s/>[^<]*</>{{ }}</g
You may use
%s/>\(.\{-}\)</>{{ \1 }}</g
Outside very magic mode, \< and \> are word boundaries, which is why your pattern did not work. Besides, *?, the Perl-like lazy quantifier, is written as \{-} in Vim. The % symbol at the start tells Vim to search and replace on all lines, not just the current one.
Details
> - matches a >
\(.\{-}\) - captures into Group 1 any zero or more characters other than line breaks, as few as possible (to include line breaks, prepend \_ to the ., giving \_.)
< - matches a <
The replacement is >{{ \1 }}<, that is >{{, the Group 1 value, and }}<. The g flag replaces every match on a line, not just the first one.
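For comparison, the same lazy-quantifier idea can be sketched in Python, where the Perl-style *? syntax is valid. This is only an illustration of the pattern, not a Vim command; the function name wrap_tag_text is hypothetical.

```python
import re


def wrap_tag_text(line: str) -> str:
    """Wrap the text between each '>' and the next '<' in {{ }}."""
    # >(.*?)< is the Python spelling of Vim's >\(.\{-}\)<
    return re.sub(r">(.*?)<", r">{{ \1 }}<", line)


print(wrap_tag_text("<th>Header01</th>"))
```

As with the Vim command, every >...< pair on a line is matched, so this is safest when each line holds a single tag.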
I may be misreading the question, but it appears the desired replacement text is paired double braces around 3 whitespace characters, not around the original matched text ("replace everything between the first > and the next < with {{ }}"). If so, the following simple substitute command should work:
%s/>.*</>{{ }}</g
If you have just one tag with content per line, you can do:
:%norm cit{{ }}
As you may know, "cit" is a Vim text object that stands for "Change Inner Tag".
If you have many tags on each line you can try:
:%s,>\zs[^<]*\ze</.*,{{ }},g
For more read :h text-objects and :h \zs
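Python has no \zs or \ze, but lookbehind and lookahead give a similar effect: the surrounding > and </ must be present yet are not part of what gets replaced. A minimal sketch, with the hypothetical name blank_tag_text:

```python
import re


def blank_tag_text(line: str, placeholder: str = "{{ }}") -> str:
    """Replace the text between '>' and the next '</' with a placeholder."""
    # (?<=>) plays the role of >\zs and (?=</) the role of \ze</
    # A function is used as the replacement so the placeholder is taken literally.
    return re.sub(r"(?<=>)[^<]*(?=</)", lambda m: placeholder, line)


print(blank_tag_text("<th>Header01</th><th>Different Header 20</th>"))
```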

Match whitespace created by Chrome devtools source view?

<span style='mso-tab-count:1'>         </span>
<span style='mso-tab-count:1'> </span>
The bottom line above is from a "View Source Code" page and the top line is from the Chrome Developer Tools Source view. The RegEx below matches the bottom tags, which contain a series of spaces, but not the top tags, which enclose just empty whitespace. See this on the Regex Tester at https://regex101.com/r/P9dUP9/2
(<span style='mso-tab-count:1'>)\s{2,}(<\/span>)
How could I make the Regex also match the top line, and how can I tell the difference between the two kinds of whitespace on the screen without copying and pasting both of them into a text editor?
I guess it's a non-printable control character. My hex editor tells me it's \x20, but that's not captured for me. Your best bet would be using an exclusion such as:
(<span style='mso-tab-count:1'>)[^<]{2,}(<\/span>)
or
(<span style='mso-tab-count:1'>)\W{2,}(<\/span>)
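On the second part of the question (telling the two kinds of whitespace apart), one option outside the browser is to print the code point of each character. This is a minimal Python sketch with a hypothetical helper name, describe_chars; the sample string below is made up, since the source does not say which characters the DevTools view actually contains.

```python
import unicodedata


def describe_chars(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per character in text."""
    # Control characters such as TAB have no Unicode name, hence the default.
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# Made-up sample: a normal space, a no-break space and a tab.
for entry in describe_chars(" \u00a0\t"):
    print(entry)
```

Pasting the whitespace copied from each view into the string makes any difference visible immediately.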

How to change all title attribute's value in Title Case in sublime text

I have 500 HTML files in my project where the casing and quotes (" or ') of the title attribute vary across pages; see a few examples below:
<button title="Next" id="next"> Next</button>
<button title="next"> Next </buton>
<button title=""please go back">Check</button>
I want to change all title attributes in Title Case
<button title="Next" id="next"> Next</button>
<button title="Next"> Next </buton>
<button title="Please Go Back">Check</button>#
I tried Find and Replace with the Regular Expression and Case Sensitive buttons enabled:
Find What: title=(".*")\s
Replace With: title="\u$"
But it didn't work. Please tell me what I am doing wrong.
UPDATE: I also want to remove the extra ' or " (see the line marked #).
To further my comment, first it's the issue of .* being 'greedy' instead of 'lazy', meaning it is matching as much as possible (i.e. Next"> Next</button><button title="Next in your example).
The quick fix is to use a 'lazy' .* instead, i.e. .*? (I also added a ? after \s to make the space optional, because not all of your examples have one):
title=(".*?")\s?
To improve performance, you would use a negated class:
title=("[^"]+")\s?
Where [^"]+ matches any character except ".
And to cope with the different quotes, you can use:
title=("[^"]+"|'[^']+')\s?
Which basically means either "[^"]+" or '[^']+' for the part within the parentheses.
For the replace and consecutive quotes issue:
title=(?:"+([^"]+)"+|'+([^']+)'+)\s?
Replace with:
title="\u$1$2"
The only thing is that the last line will be <button title="Please go back">Check</button>, if that's not an issue...
EDIT: \G actually works. Use a second replace:
(?:(?<=title=")|(?<!^)\G)[^\s"]+\s?
Replace with:
\u$0
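If the two-pass replace is awkward to run over 500 files, the same logic can be sketched as a single pass in Python. This is an illustration under assumptions, not the answer's method: the function name title_case_attr is hypothetical, the trailing \s? from the pattern above is dropped so the space before the next attribute is kept, and only the first letter of each word is uppercased, like \u.

```python
import re

# Same alternation as above: one or more quotes of either kind around the value.
TITLE_ATTR = re.compile(r"""title=(?:"+([^"]+)"+|'+([^']+)'+)""")


def title_case_attr(html: str) -> str:
    """Title-case each title attribute and normalise it to one pair of double quotes."""
    def fix(match):
        value = match.group(1) or match.group(2)
        # Uppercase only the first letter of each word; leave the rest as written.
        words = [word[:1].upper() + word[1:] for word in value.split()]
        return 'title="' + " ".join(words) + '"'
    return TITLE_ATTR.sub(fix, html)


print(title_case_attr('<button title=""please go back">Check</button>'))
```

Like the regex it is based on, this sketch does not handle an empty title="" attribute.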
(?<=title=('|")).+?(?=('|"))
This should give you the matches Next, next and please go back, which you can then use.
You can use the index of each match to locate it in the original string if you want to uppercase the lowercase letters.
Or use title=('|").+?(\1) to find any title attributes in your text, including the quotation marks.

How can I search text using regex when it contains \r\n

I am using Sublime Text 2's regex search and replace tool and would like to search text that includes the \r and \n special characters, but I cannot see how to do it.
For example, I have the text:
<div class="head">\r\n
\r\n Keep this text\r\n</div>
Which I would like to transform into:
<h1>Keep this text</h1>
I would also like to factor in the eventuality that these \r\n characters may not be present.
How might I search accounting for \r\n being present and absent, and then remove them as per above? If two regex are required that's fine too.
So far I have <div class="head">(\w)+</div>, however this is stalled by the aforementioned \r\n.
I think you're looking for \s, which matches white space.
So your regex should be something like the following:
<div class="head">\s*(.+?)\s*</div>
If you can do this in ST2, then I think it would fit your need:
Find:
<div class="head">[\s\r\n]*([\w ]+)[\s\r\n]*<\/div>
Replace by:
<h1>$1</h1>
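The \s* approach from the first answer can be checked quickly in Python. This is a sketch with a hypothetical function name, head_div_to_h1; the DOTALL flag is my addition so that the kept text may itself span lines, which the answers above do not cover.

```python
import re

# \s* absorbs any mix of \r, \n, spaces and tabs, or nothing at all.
HEAD_DIV = re.compile(r'<div class="head">\s*(.+?)\s*</div>', re.DOTALL)


def head_div_to_h1(html: str) -> str:
    """Turn <div class="head">...</div> into <h1>...</h1>, trimming whitespace."""
    return HEAD_DIV.sub(r"<h1>\1</h1>", html)


print(head_div_to_h1('<div class="head">\r\n\r\n Keep this text\r\n</div>'))
```

Because \s* also matches the empty string, the same pattern works whether or not the \r\n characters are present.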

Delete all lines until a specific line in Notepad++

I have to edit a lot of source files that are similar to each other:
random blah
random blah
blah
<table style="width: 232px; font-size: small;" cellpadding="0" cellspacing="0">....
What I want to do is delete all the lines up to the table tag. I think I can do it with a regex search, but I couldn't write the regex pattern.
Thank you
You have to go through multiple steps to do what you stated above:
Go to the Replace window, select the "Extended" mode, type "\r\n" in the "Find what" field, and replace it with "LINEBREAK" (there's a space after 'LINEBREAK'). Click Replace All.
Go to the Replace window again, select the "Regular expression" mode, type "(.*)(.*)(<table)(.*)(>)(.*)(.*)" in the "Find what" field and "\2\3\4\5" in the "Replace with" field. Click Replace All.
Now go to the Replace window again, select the "Extended" mode, type "LINEBREAK" (there's a space after 'LINEBREAK') in the "Find what" field, and replace it with "\r\n". Click Replace All.
Notepad++ doesn't support multi-line regex, which makes it hard to do what you want without going through the steps given above.
You can try something like:
(^.*$\n)*<table(.+)>
The first group will match all the lines before your table tag %)
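When there are many files to process, the same "drop everything before the table tag" step can be sketched in Python. The function name drop_before_table is hypothetical, and the pattern is my own variant using a lazy match with DOTALL rather than either answer's expression.

```python
import re


def drop_before_table(text: str) -> str:
    """Delete everything that precedes the first <table tag."""
    # \A anchors at the very start; .*? with DOTALL lazily crosses line breaks
    # and the lookahead stops just before the first "<table".
    return re.sub(r"\A.*?(?=<table\b)", "", text, count=1, flags=re.DOTALL)


page = 'random blah\nrandom blah\nblah\n<table style="width: 232px;">....\n'
print(drop_before_table(page))
```

If a file contains no table tag, the pattern does not match and the text is returned unchanged.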