Find and replace between first '>' and next '<' in vim - regex

I have an html table with some headers:
<th>Header01</th>
...
<th>Different Header 20</th>
I'd like to find and replace everything between the first > and the next < with {{ }}:
<td>{{ }}</td>
...
<td>{{ }}</td>
I know I can use :s/\%Vth/td/g to replace all the th with td, but how can I use Vim's regex to find and replace everything between > and < with {{ }}?
I have attempted the following without success
:s/\%V\>(.*?)\</{{ }}/g

If you are willing to take the risk of editing HTML with a regex in Vim, you can just do:
:%s/>[^<]*</>{{ }}</g

You may use
%s/>\(.\{-}\)</>{{ \1 }}</g
In non-very-magic mode, \< and \> are word boundaries, which is why your pattern did not work. Besides, the Perl-style lazy quantifier *? is written \{-} in Vim. The % at the start tells Vim to run the substitution on all lines, not just the current one.
Details
> - matches a >
\(.\{-}\) - captures into Group 1 any zero or more chars other than line breaks, as few as possible (to include line breaks, prefix the . with \_)
< - matches a <
The replacement is >{{ \1 }}<: >{{, the Group 1 value, and }}<. The g flag replaces every match on a line, not just the first one.
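To check the pattern outside Vim, here is a rough Python equivalent. It is only a sketch: Python's re syntax is assumed in place of Vim's (.*? for .\{-}, unescaped parentheses for \(...\)), and the sample lines are the ones from the question.

```python
import re

# Python spelling of Vim's  >\(.\{-}\)<  : a lazy capture between > and <.
pattern = re.compile(r">(.*?)<")

lines = ["<th>Header01</th>", "<th>Different Header 20</th>"]

# \1 re-inserts the captured text, like \1 in the Vim replacement.
wrapped = [pattern.sub(r">{{ \1 }}<", line) for line in lines]

# Dropping \1 discards the old text, which is what the question asks for.
blanked = [pattern.sub(">{{ }}<", line) for line in lines]

print(wrapped)
print(blanked)
```

Whether to keep \1 depends on whether the old header text should survive inside the braces.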

I may be misreading the question, but it appears the desired replacement text is paired double braces around 3 whitespaces, not around the original matched text ("replace everything between the first > and the next < with {{ }}"). If so, the following simple substitute command should work as long as each line holds a single tag pair:
%s/>.*</>{{ }}</g
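One caveat with the greedy .* is that it runs to the last < on the line. The Python sketch below illustrates this (Python's re is assumed to behave like Vim here as far as greediness goes; the two-cell line is a made-up sample):

```python
import re

one_cell = "<th>Header01</th>"
two_cells = "<th>Header01</th><th>Header02</th>"

# Greedy .* runs from the first > to the LAST < on the line.
greedy_one = re.sub(r">.*<", ">{{ }}<", one_cell)
greedy_two = re.sub(r">.*<", ">{{ }}<", two_cells)

print(greedy_one)  # fine: only one tag pair on the line
print(greedy_two)  # the second cell has been swallowed
```

With one tag pair per line the result is as intended; with two cells on a line, everything between the first > and the final </th> collapses into a single {{ }}.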

If you have just one tag with content per line, you can do:
:%norm cit{{ }}
As you may know, cit is a Vim text-object command that stands for "Change Inner Tag".
If you have many tags on each line, you can try:
:%s,>\zs[^<]*\ze</.*,{{ }},g
For more, read :h text-objects and :h \zs
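The \zs and \ze markers limit the match to the text between the delimiters. As a sketch of the same idea in Python (an assumption: a lookbehind and a lookahead stand in for \zs and \ze, and the sample line is made up):

```python
import re

line = "<th>Header01</th><th>Header02</th>"

# Only the text between ">" and "</" is part of the match, so the
# replacement does not need to repeat the delimiters.
inner_text = re.compile(r"(?<=>)[^<]*(?=</)")
result = inner_text.sub("{{ }}", line)

# Without the "</" lookahead, the empty gap between </th> and <th> matches too.
naive = re.sub(r">[^<]*<", ">{{ }}<", line)

print(result)
print(naive)
```

Requiring </ after the text is what keeps the empty gap between adjacent tags from being treated as content.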

Related

Regex expression that should select matching text but escape everything between ><

I have a lot of files that contain something like
<span translate>textTo.translate</span>
What it needs to look like is
<span>{{ $t("textTo.translate") }}</span>
What I do now is find all occurrences of span translate> and replace them with span>{{ $t(" and then go through each file and finish step by step.
Is there a regex that I could use in Replace All to achieve this?
I managed to select the text with (span translate>)[^<>]+(?=<) but I cannot work out the replacement.
You can use
Find: <span\s+translate>([^<>]+)<
Replace: <span>{{ $t("$1") }}<
See the regex demo.
Details:
<span\s+translate> - a <span string, one or more whitespaces and then a translate> string
([^<>]+) - Group 1 ($1 refers to this group value): one or more chars other than < and >
< - a < char
The replacement is <span>{{ $t(" + Group 1 value + ") }}< text.
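The same find/replace can be tried in a script. This is a minimal Python sketch, assuming Python's re module rather than the editor's regex engine; the only syntax difference used here is \1 in place of $1.

```python
import re

source = '<span translate>textTo.translate</span>'

# Group 1 is the text between the opening tag and the next "<".
pattern = re.compile(r"<span\s+translate>([^<>]+)<")

# Python writes the back-reference as \1 where the editor uses $1;
# "$" has no special meaning in a Python replacement string.
converted = pattern.sub(r'<span>{{ $t("\1") }}<', source)

print(converted)
```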
Never mind, I found an answer, but for some reason it works only in VSCode and fails in PhpStorm.
Find regex: span translate>([^<>]+(?=))<
Replace regex: span>{{ \$t("$1") }}<

Recursive regex for templating sub loops

So I looked at How to write a recursive regex that matches nested parentheses? and other solutions for recursive regex matching, but I'm still not getting a proper match on RegexBuddy.
I have a generic handlebars-style template that I want to parse myself, a table with headings:
<table>
<thead>
<tr>
{{#each columns as col }}<th>{{col}}</th>{{/each}}
</tr>
</thead>
<tbody>
{{#each rows as row }}
<tr>
{{#each row as col }}<td>col</td>{{/each}}
</tr>
{{/each}}
</tbody>
</table>
And trying to match with
/{{\#each (\w+) as (\w+) }}(.*?|(?R)){{/each}}/s
The regex matches the {{#each columns... in the <thead> just fine, but it seems to ignore the |(?R) part and matches {{#each rows... only until the first {{/each}}. I, of course, would like it to match both the inner and outer #each expressions. How? This is perhaps much more complex than simple nested parentheses.
(I always feel like I'm a pro at RegEx until I run into things like this. I have been trying for a while to make this work, and regular-expressions.info is just confusing me more.)
I'm currently working around this by doing {{#each_sub...}}...{{/each_sub}} so my regex won't stop on the first closing tag, but that's obviously a sub-optimal way of doing it. I have several other applications that would benefit from recursive regex but can't figure out what I'm doing wrong.
It isn't ignoring the recursion, it's just never reaching it. Because .*? is capable of matching your delimiters ({{#each...}} and {{/each}}), it matches the first closing delimiter it finds and reports success without ever needing to recurse.
For this technique to work, the branch before the (?R) has to match anything that's not a delimiter. Since your delimiters consist of multiple characters, you can't use a negated character class, as they did in the question you linked to. Instead, you need to use a tempered greedy token:
(?:(?!{{[#/]each\b).)*
This is the same as .*, except before it consumes each character it checks to make sure it's not the beginning of {{#each or {{/each. Here it is in context:
{{\#each (\w+) as (\w+) }}(?:(?:(?!{{[#/]each\b).)*|(?R))*{{/each}}
If the first branch fails, it means you've encountered something that looks like a delimiter. If it's an opening delimiter, the second branch takes over and tries to match the whole pattern recursively. Otherwise, it pops out of the loop (note the * after the group--you were missing that, too) and tries to match a closing delimiter.
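Python's built-in re module has no (?R), so the sketch below shows only the tempered greedy token on its own. Because the body can never cross a delimiter, the pattern matches innermost blocks only; the template string is a shortened version of the one in the question.

```python
import re

# Tempered greedy token: consume one character at a time, but only while
# the text ahead is not the start of {{#each or {{/each.
BODY = r"(?:(?!\{\{[#/]each\b).)*"

innermost = re.compile(
    r"\{\{#each (\w+) as (\w+) \}\}(" + BODY + r")\{\{/each\}\}",
    re.DOTALL,
)

template = (
    "{{#each rows as row }}<tr>"
    "{{#each row as col }}<td>col</td>{{/each}}"
    "</tr>{{/each}}"
)

found = innermost.findall(template)
print(found)
```

The outer block is not reported here: without recursion, its body would have to cross the inner {{#each, which the token forbids.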
While the regex above will work fine on valid input, it's subject to catastrophic backtracking if input is malformed. To avoid that, you can use an unrolled loop in place of the alternation (as @Wiktor did in his comment):
{{\#each\s+(\w+)\s+as\s+(\w+)\s*}}(?:(?!{{[#/]each\b).)*(?:(?R)(?:(?!{{[#/]each\b).)*)*{{/each}}
Here's a slightly more readable version, with possessive quantifiers added to squeeze out even more speed:
{{\#each\s+(\w+)\s+as\s+(\w+)\s*}}
(?:(?!{{[#/]each\b).)*+
(?:
(?R)
(?:(?!{{[#/]each\b).)*+
)*+
{{/each}}
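Where the regex engine has no recursion support (Python's built-in re, for example), a different technique gives the same pairing of tags: scan for the opening and closing tags and keep a stack. The sketch below is that swapped-in approach, not the recursive regex above; the function name find_each_blocks and the tuple layout are made up for illustration.

```python
import re

# Either an opening {{#each X as Y }} or a closing {{/each}}.
TOKEN = re.compile(r"\{\{#each\s+(\w+)\s+as\s+(\w+)\s*\}\}|\{\{/each\}\}")

def find_each_blocks(template):
    """Return (collection, alias, depth, body) for every balanced block."""
    stack = []   # open blocks: (collection, alias, index where body starts)
    blocks = []
    for m in TOKEN.finditer(template):
        if m.group(1) is not None:          # opening tag
            stack.append((m.group(1), m.group(2), m.end()))
        else:                               # closing tag
            if not stack:
                raise ValueError("unmatched {{/each}} at offset %d" % m.start())
            collection, alias, body_start = stack.pop()
            body = template[body_start:m.start()]
            blocks.append((collection, alias, len(stack), body))
    if stack:
        raise ValueError("unclosed {{#each %s ...}}" % stack[-1][0])
    return blocks

template = (
    "{{#each columns as col }}<th>{{col}}</th>{{/each}}\n"
    "{{#each rows as row }}<tr>"
    "{{#each row as col }}<td>col</td>{{/each}}"
    "</tr>{{/each}}"
)
blocks = find_each_blocks(template)
for collection, alias, depth, body in blocks:
    print(depth, collection, alias, repr(body))
```

Blocks are reported in the order their closing tags appear, so an inner block comes before the block that contains it. Malformed input raises an error instead of backtracking.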

Regular expression replace start and end, ignore middle

In an Ant build file, is there a way to use a replaceregexp to find and replace two tags, and retain what's in between them? For example, to find all of these:
</a>1234abcdefg</P>
</a>123456789. </p>
</a> yop </p>
</a></p>
and replace
</a> and </p>
with
<#> and <##>
so that I have, respectively:
<#>1234abcdefg<##>
<#>123456789. <##>
<#> yop <##>
<#><##>
I can't replace the tags individually since they occur in other places, I just want the instances in which </a> is followed by </p>, in the same line, with either nothing or something in between them, and I want to keep what's in between them.
Try this:
<replaceregexp file="notTested.xml" match="(&lt;)\/a(&gt;.*?&lt;)\/p(&gt;)" replace="\1#\2##\3" byline="true" flags="gi" />
As for "but it replaces what's between the tags with .*": I haven't seen .* used in a replacement expression; it is probably taken as a literal . and *.
As for </a>.*</p>, the greedy >.*< will not work when you have multiple occurrences of </a> and </p> on the same line, such as:
</a>1234abcdefg</P>abcde</a>123456789. </p> which would be replaced with
<#>1234abcdefg</P>abcde</a>123456789. <##>
You need the non-greedy quantifier ?. See the wiki for the use of .*? vs .*.
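The difference is easy to reproduce outside Ant. The following Python sketch assumes Python's re in place of Ant's regex engine and uses re.IGNORECASE so that </P> is treated like </p>; the input is the two-pair line from above.

```python
import re

line = "</a>1234abcdefg</P>abcde</a>123456789. </p>"

# Same three groups as the Ant pattern, once lazy and once greedy.
lazy = re.compile(r"(<)/a(>.*?<)/p(>)", re.IGNORECASE)
greedy = re.compile(r"(<)/a(>.*<)/p(>)", re.IGNORECASE)

replacement = r"\1#\2##\3"

lazy_result = lazy.sub(replacement, line)
greedy_result = greedy.sub(replacement, line)

print(lazy_result)
print(greedy_result)
```

The lazy version rewrites both pairs; the greedy version treats everything between the first </a> and the last </p> as one span.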
Solution 1: You can try this.
Store the match with parentheses, and then replace it.
exp = new Regex(@"YourtagStartRegex(bodyRegex)YourtagClosingRegex");
str = exp.Replace(str, "$1");
Reference: Replace the start and end of a string ignoring the middle with regex, how?
Or
Solution 2:
Regex ignore middle part of capture

How to change all title attribute's value in Title Case in sublime text

I have 500 HTML files in my project where the casing and quotes (" or ') of the title attribute vary across pages; see a few examples below
<button title="Next" id="next"> Next</button>
<button title="next"> Next </buton>
<button title=""please go back">Check</button>
I want to change all title attributes to Title Case
<button title="Next" id="next"> Next</button>
<button title="Next"> Next </buton>
<button title="Please Go Back">Check</button>#
I have tried Find and Replace with the Regular Expression and Case Sensitive buttons enabled
Find What: title=(".*")\s
Replace With: title="\u$"
But it didn't work. Please tell me what I am doing wrong.
UPDATED: I also want to remove the extra ' or " characters; see #
To further my comment, first it's the issue of .* being 'greedy' instead of 'lazy', meaning it is matching as much as possible (i.e. Next"> Next</button><button title="Next in your example).
The quick fix is using a 'lazy' .* instead, i.e. .*? (I also added a ? after \s to make the space optional, because some of your examples have none):
title=(".*?")\s?
To improve performance, you would use a negated class:
title=("[^"]+")\s?
Where [^"]+ matches any character except ".
And to cope with the different quotes, you can use:
title=("[^"]+"|'[^']+')\s?
Which basically means either "[^"]+" or '[^']+' for the part within the parentheses.
For the replace and consecutive quotes issue:
title=(?:"+([^"]+)"+|'+([^']+)'+)\s?
Replace with:
title="\u$1$2"
The only thing is that the last line will be <button title="Please go back">Check</button>, if that's not an issue...
EDIT: \G actually works. Use a second replace:
(?:(?<=title=")|(?<!^)\G)[^\s"]+\s?
Replace with:
\u$0
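For a batch job over many files, the two editor passes can be folded into one scripted substitution. This is a sketch under assumptions: Python's re instead of Sublime's engine, hypothetical helper names (title_case, fix_titles), and "Title Case" taken to mean capitalising the first letter of each space-separated word.

```python
import re

# One or more quotes, the value, one or more quotes; double or single.
TITLE_ATTR = re.compile(r"""title=(?:"+([^"]+)"+|'+([^']+)'+)""")

def title_case(match):
    value = match.group(1) or match.group(2)
    # Capitalise the first letter of each space-separated word.
    words = [w[:1].upper() + w[1:] for w in value.split(" ")]
    return 'title="' + " ".join(words) + '"'

def fix_titles(html):
    """Normalise quoting and casing of every title attribute in html."""
    return TITLE_ATTR.sub(title_case, html)

print(fix_titles('<button title=""please go back">Check</button>'))
```

A replacement function is used because a plain replacement string in Python cannot change case; the trailing \s? from the editor pattern is left out so the space before the next attribute is preserved.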
(?<=title=('|")).+?(?=('|"))
This should give you the matches Next, next and please go back, which you can then use.
You can use the index of each match to locate it in the original string if you want to uppercase the lowercase letters.
Or use title=('|").+?(\1) to find any title attribute in your text, including the quotation marks.

Notepad++ Replace new line inside text

This sample is one of about a million rows like it.
I have this text:
<tr class="even">
<td><a href="http://www.ujk.edu.pl/">Jan Kochanowski
University of Humanities and Sciences (Swietokrzyska Pedagogical
University) / Uniwersytet Humanistyczno Przyrodniczy Jana Kochanowskiego
w Kielcach</a></td>
I want it to look like this:
<tr class="even">
<td>Jan Kochanowski University of Humanities and Sciences (Swietokrzyska Pedagogical University) / Uniwersytet Humanistyczno Przyrodniczy Jana Kochanowskiegow Kielcach</td>
I tried the regex (.*)
but it didn't work.
Open the replace window with Ctrl + H
Then enter
Find what: ([^>])[\r\n]{1,2}
Replace with: \1
Check Regular Expression
[^>] matches a character that isn't a >
The {1,2} protects against a file that may only have a newline and not a carriage return.
\1 replaces just the character that was in the grouping ( ).
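The effect of the pattern can be previewed in a script before running it on a large file. The Python sketch below assumes Python's re in place of Notepad++'s engine and a shortened copy of the sample with \n line endings; the second substitution, which puts a space back in, is an assumption added here and not part of the answer above.

```python
import re

text = (
    '<tr class="even">\n'
    '<td><a href="http://www.ujk.edu.pl/">Jan Kochanowski\n'
    'University of Humanities and Sciences\n'
    'w Kielcach</a></td>'
)

# As in the answer: drop a line break unless the previous character is ">".
joined = re.sub(r"([^>])[\r\n]{1,2}", r"\1", text)

# Variant (an assumption, not part of the answer): put a space back in.
spaced = re.sub(r"([^>])[\r\n]{1,2}", r"\1 ", text)

print(joined)
print(spaced)
```

The line break after <tr class="even"> survives in both cases because it follows a >. If the lines carry no trailing spaces, the plain \1 replacement glues words together, which is why the variant may be the more useful one.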
Open Notepad++
Click Search >> Replace..
Replace with: \n
At the bottom you will find "Search Mode", click "Extended"
Live example here: http://postimg.org/image/c66pw8kkr/
If you can't make jmstoker's solution work, try this:
First check whether the line breaks are CRLF or just one of the two characters: click the "Show All Characters" icon on the toolbar, or go to the menu View -> Show Symbol -> Show All Characters.
In the Replace dialog, select the "Extended" search mode.
In the "Find what:" field, enter \r\n (or just \r or just \n; \r matches CR and \n matches LF).
Leave the "Replace with" field empty.
Once this is done, all the line breaks will have been removed, but you want <tr class="even"> to be on its own line, so, still using the extended search, replace <tr class="even"><td> with <tr class="even">\r\n<td>.
I'm guessing you also have rows with class "odd" or something like that, so you might need to repeat that last step with the different class :)