In my HTML page I have a lot of strings inside tags, like:
<p>Some string 1</p>
<p>Some string 2</p>
<p>Any string 3</p>
I need to move each of them into a translate attribute, lowercase it, and replace all spaces inside the string with underscores.
So I multi-select all of them while holding Ctrl, press Ctrl+K, Ctrl+L to make them lowercase, Ctrl+X to cut, press the left arrow twice to move inside the tags, and type translate="PASTE HERE".
Now I have
<p translate="some string 1"></p>
<p translate="some string 2"></p>
<p translate="any string 3"></p>
Next step - I need to make underscores instead of spaces.
To find all translate strings I use regex (?s)translate=".+?"
But how to replace? Help.
Press Ctrl+H, then use a negative lookbehind to find spaces that are not preceded by p:
(?<!p)\h+
\h matches only horizontal spaces.
Now replace-all it with _.
This is simple, but it works and is faster than looking for a smarter answer.
Find this: translate="(.*) (.*)"
Replace with this: translate="\1_\2"
Keep using Replace All until all your unwanted spaces are underscores (in the example you gave, twice).
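If you would rather do this outside the editor, here is a minimal Python sketch of the same idea. It assumes the markup already looks like the intermediate state shown in the question; the function name is made up for illustration. Python's re module has no \h, so instead of a lookbehind it matches each attribute value and fixes the spaces in a callback, in a single pass.

```python
import re

html = '''<p translate="some string 1"></p>
<p translate="some string 2"></p>
<p translate="any string 3"></p>'''

def underscore_spaces(match):
    # Rebuild the attribute, changing spaces only inside the quoted value.
    return 'translate="' + match.group(1).replace(" ", "_") + '"'

# [^"]* keeps each match inside a single attribute value.
result = re.sub(r'translate="([^"]*)"', underscore_spaces, html)
print(result)
```

The space between p and translate is left alone because it sits outside the captured group.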
I want to extract data from HTML. The problem is that I can't extract two strings, one at the top and one at the bottom of my pattern.
I want to extract 23423423423 and 1234523453245, but only if the string Allan appears between them:
<h4>###### </h4> said12:49:32
</div>
<a href="javascript:void(0)" onclick="replyAnswer(##########,'GET','');" class="reportLink">
report </a>
</div>
<div class="details">
<p class="content">
Hi there, Allan.
</p>
<div id="AddAnswer1234523453245"></div>
Of course, I can do something like this: Profile\/(\d+).*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*Allan.*\s*.*\s*.*AddAnswer(\d+). But the code is horrible. Is there any solution to make it shorter?
I was thinking about:
Profile\/(\d+)(.\sAllan)*AddAnswer(\d+)
or
Profile\/(\d+)(.*Allan\s*)*AddAnswer(\d+)
but none of which works properly. Do you have any ideas?
You can construct a character class that matches any character, including newlines, by using [\S\s]. Whitespace plus non-whitespace covers every character.
Then, your attempts were reasonably close
/Profile\/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)/
This looks for the profile, the number that comes after it, any characters before Allan, any characters before AddAnswer, and the number that comes after it. If you have single-line mode available (/s) then you can use dots instead.
/Profile\/(\d+).*Allan.*AddAnswer(\d+)/s
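To see both forms side by side, here is a small Python sketch. The sample text is an assumption: the excerpt in the question does not show the Profile line, so the one below is invented for illustration. In Python the single-line flag is spelled re.DOTALL.

```python
import re

# Hypothetical sample: the "Profile/..." line is assumed, since the
# question's excerpt does not include it.
html = """<a href="/Profile/23423423423">user</a>
<p class="content">
Hi there, Allan.
</p>
<div id="AddAnswer1234523453245"></div>"""

# [\S\s] matches any character, newlines included, without a flag.
any_char = re.search(r"Profile/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)", html)
# With re.DOTALL a plain dot does the same job.
dotall = re.search(r"Profile/(\d+).*Allan.*AddAnswer(\d+)", html, re.DOTALL)
# Without the flag the dot stops at each newline, so nothing matches.
no_flag = re.search(r"Profile/(\d+).*Allan.*AddAnswer(\d+)", html)

print(any_char.groups())  # ('23423423423', '1234523453245')
print(dotall.groups())    # ('23423423423', '1234523453245')
print(no_flag)            # None
```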
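In Ruby, the m flag makes . match newlines (most other flavors use s for that, and m only changes how ^ and $ behave). Note that this shorter pattern no longer requires Allan to appear between the two numbers.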
/Profile\/(\d+).+AddAnswer(\d+)/m
Better to use a parser instead. If you must use regular expressions for whatever reason, you could get by with a tempered greedy token:
Profile/(\d+) # Profile followed by digits
(?:(?!Allan)[\S\s])+ # any character except when there's Allan ahead
Allan # Allan literally
(?:(?!AddAnswer)[\S\s])+ # same construct as above
AddAnswer(\d+) # AddAnswer, followed by digits
See a demo on regex101.com
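Here is a rough Python sketch of the tempered pattern, written with re.VERBOSE so the comments can stay inside the regex. The sample text with two posts is invented for illustration. It shows the one thing the tempered version buys you over a greedy dot: each match stops at the first Allan and the first AddAnswer after it, so two posts give two matches instead of one oversized match.

```python
import re

# Hypothetical sample with two posts that both mention Allan.
html = """Profile/111
<p class="content">Hi there, Allan.</p>
<div id="AddAnswer222"></div>
Profile/333
<p class="content">Bye, Allan.</p>
<div id="AddAnswer444"></div>"""

tempered = re.compile(r"""
    Profile/(\d+)              # Profile followed by digits
    (?:(?!Allan)[\S\s])+       # any character, as long as Allan does not start here
    Allan                      # Allan literally
    (?:(?!AddAnswer)[\S\s])+   # same construct, stopping before AddAnswer
    AddAnswer(\d+)             # AddAnswer followed by digits
""", re.VERBOSE)

greedy = re.compile(r"Profile/(\d+).*Allan.*AddAnswer(\d+)", re.DOTALL)

print(tempered.findall(html))  # [('111', '222'), ('333', '444')]
print(greedy.findall(html))    # [('111', '444')]
```

Keep in mind that the first tempered segment only refuses to cross Allan. If a post without Allan is followed by one with Allan, the match can still run from the first Profile into the second post, which is one more reason to prefer a parser.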
I have a lot of HTML files whose text has no <p> tags in the code.
I tried find and replace in Adobe Brackets and Sublime Text 2:
Find <br><br>\n
Replace </p>\n<p>
But they do not find the \n in the code.
Simplified, now I have:
Some sentence, some sentence<br><br>
(I have one space here in the code)
Some sentence, some sentence<br><br>
I would like to convert it to:
Some sentence, some sentence</p>
<p>Some sentence, some sentence</p>
(I know I will have to add manually just one <p> at the beginning, this is not important and it is not the point of this question)
Match a br followed by whitespace (in regex, \s includes \n, \r, \t, ...):
<br\s*\/?>\s*
You can then replace it with your string using a global search.
Edit: I saw that your replacement is not just a carriage return, which gets messy with my example.
I would go for two steps: replace every br with \n, then add your p elements by replacing runs of \n\s*.
Find:(.*)<br><br>\n?
Replace:<p>\1</p>\n
Input:
Some sentence, some sentence<br><br>
Some sentence, some sentence<br><br>
Output:
<p>Some sentence, some sentence</p>
<p>Some sentence, some sentence</p>
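If the editor refuses to match \n, the same find/replace pair can be run from a script. A minimal Python sketch, assuming the input has exactly the shape shown in the input above (no stray whitespace-only lines between sentences):

```python
import re

text = (
    "Some sentence, some sentence<br><br>\n"
    "Some sentence, some sentence<br><br>\n"
)

# (.*) captures the sentence, <br><br> and an optional newline are consumed,
# and the replacement wraps the captured sentence in a paragraph.
converted = re.sub(r"(.*)<br><br>\n?", r"<p>\1</p>\n", text)
print(converted)
```

Because . does not match a newline by default, each match stays on its own line.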
In an Ant build file, is there a way to use a replaceregexp to find and replace two tags, and retain what's in between them? For example, to find all of these:
</a>1234abcdefg</P>
</a>123456789. </p>
</a> yop </p>
</a></p>
and replace
</a> and </p>
with
<#> and <##>
so that I have, respectively:
<#>1234abcdefg<##>
<#>123456789. <##>
<#> yop <##>
<#><##>
I can't replace the tags individually since they occur in other places. I only want the instances in which </a> is followed by </p> on the same line, with either nothing or something in between them, and I want to keep what's in between them.
Try this:
<replaceregexp file="notTested.xml" match="(&lt;)/a(>.*?&lt;)/p(>)" replace="\1#\2##\3" byline="true" flags="gi" />
As for the version that replaces what's between the tags with .*: I haven't seen .* used in a replacement/substitution expression. It is probably taken as a literal . and *.
As for </a>.*</p>, the greedy .* will not work when you have multiple occurrences of </a> and </p> on the same line, such as:
</a>1234abcdefg</P>abcde</a>123456789. </p>
which would be replaced with
<#>1234abcdefg</P>abcde</a>123456789. <##>
You need the non-greedy quantifier ?. See the wiki for the use of .*? vs .*.
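To check the greedy versus non-greedy behaviour outside Ant, here is a small Python sketch using the lines from the question. It is only an illustration of the regex, not of replaceregexp itself. re.IGNORECASE is there because the first sample line uses </P>.

```python
import re

lines = [
    "</a>1234abcdefg</P>",
    "</a>123456789. </p>",
    "</a> yop </p>",
    "</a></p>",
]

lazy = re.compile(r"</a>(.*?)</p>", re.IGNORECASE)
greedy = re.compile(r"</a>(.*)</p>", re.IGNORECASE)

# Keep the captured middle part and swap only the surrounding tags.
converted = [lazy.sub(r"<#>\1<##>", line) for line in lines]
print(converted)

# Two tag pairs on one line: the greedy version swallows the inner tags.
two_pairs = "</a>1234abcdefg</P>abcde</a>123456789. </p>"
print(greedy.sub(r"<#>\1<##>", two_pairs))
# <#>1234abcdefg</P>abcde</a>123456789. <##>
print(lazy.sub(r"<#>\1<##>", two_pairs))
# <#>1234abcdefg<##>abcde<#>123456789. <##>
```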
Solution 1: You can try this.
Store the match with parentheses, then replace it.
exp = new Regex(@"YourtagStartRegex(bodyRegex)YourtagClosingRegex");
str = exp.Replace(str, "$1");
Reference: Replace the start and end of a string ignoring the middle with regex, how?
Or
Solution 2:
Regex ignore middle part of capture
I am using Sublime Text 2's regex search and replace tool and would like to search text that includes the \r and \n special characters but cannot see how just at the moment.
For example, I have the text:
<div class="head">\r\n
\r\n Keep this text\r\n</div>
Which I would like to transform into:
<h1>Keep this text</h1>
I would also like to factor in the eventuality that these \r\n characters may not be present.
How might I search accounting for \r\n being present and absent, and then remove them as per above? If two regex are required that's fine too.
So far I have <div class="head">(\w)+</div>, however this is stalled by the aforementioned \r\n.
I think you're looking for \s, which matches white space.
So your regex should be something like the following:
<div class="head">\s*(.+?)\s*</div>
If you can do this in ST2, then I think it would fit your need:
Find:
<div class="head">[\s\r\n]*([\w ]+)[\s\r\n]*<\/div>
Replace by:
<h1>$1</h1>
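A quick way to convince yourself that \s* covers both cases is to run the pattern against one string with \r\n and one without. A minimal Python sketch (Python writes the backreference as \1 where Sublime Text uses $1):

```python
import re

with_newlines = '<div class="head">\r\n\r\n Keep this text\r\n</div>'
without_newlines = '<div class="head">Keep this text</div>'

# \s* absorbs any mix of \r, \n and spaces, and also matches nothing at all.
pattern = re.compile(r'<div class="head">\s*(.+?)\s*</div>')

print(pattern.sub(r"<h1>\1</h1>", with_newlines))     # <h1>Keep this text</h1>
print(pattern.sub(r"<h1>\1</h1>", without_newlines))  # <h1>Keep this text</h1>
```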
I'm referring to this page: http://ergoemacs.org/emacs/emacs_regex.html
which says that to capture a pattern in Emacs Regexp, you need to escape the paren like this: \(myPattern\).
It further says that the syntax for capturing a sequence of ASCII characters is [[:ascii:]]+
In my document, I'm trying to match all strings that occur between <p class="calibre3"> and </p>
So, following the syntax above, I do a replace-regexp for
<p class="calibre3">\([[:ascii:]]+\)</p>
but it finds no matches.
Suggestions?
Regexps are not good for general-purpose HTML parsing, but as paragraph tags cannot be validly nested, the following is going to be fine (provided the mark-up is valid & well-formed).
<p class="calibre3">\(.*?\)</p>
*? is the non-greedy zero-or-more repetitions operator, so it will match as little as possible -- in this case everything until the next </p> (as opposed to the greedy version, which would match everything until the final </p> in the text).
The [^<] approach is fine if it fits the data in question, but it won't work if there are other tags within the paragraphs.
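The difference between the two approaches is easy to see on a paragraph that contains another tag. This is a Python sketch rather than Emacs Lisp, so the groups are written with plain parentheses instead of \( \); the sample markup is made up for illustration.

```python
import re

html = (
    '<p class="calibre3">one <i>two</i></p>\n'
    '<p class="calibre3">three</p>'
)

# Non-greedy: stops at the first </p>, inner tags are kept.
lazy = re.findall(r'<p class="calibre3">(.*?)</p>', html)
# [^<]+ cannot step over the inner <i>, so the first paragraph is skipped.
no_angle = re.findall(r'<p class="calibre3">([^<]+)</p>', html)

print(lazy)      # ['one <i>two</i>', 'three']
print(no_angle)  # ['three']
```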
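I would use [^<] instead of [[:ascii:]]. My first version escaped the angle brackets, but in Emacs regexps \< and \> are word-boundary anchors, so this does not match the tags:
\<p class="calibre3"\>\([^<]+\)</p\>
Leave the angle brackets unescaped and keep the + inside the group so the whole text is captured:
<p class="calibre3">\([^<]+\)</p>
Source: @TooTone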