preg_replace regular expression HTML [closed] - regex

I am using shortcodes in WordPress. After each shortcode's output (its closing div) I get a <br> (or <br />) tag.
I am trying to filter these out, but I don't know how. The generated HTML looks like this:
<div class="fullwidth"><!-- 1st shortcode-->
<div class="fullwidth-content">
<!-- 2nd shortcode-->
<div class="twocol-one"> content
</div><br>
</div><br>
<!-- 3rd shortcode-->
<div class="twocol-second"> content
</div><br>
<div class="clearboth"></div>
</div><br>
It seems each BR comes from a newline in TinyMCE, and I don't want to write very long single-line shortcodes.
I am trying to use preg_replace, but I cannot come up with a correct $pattern.
Can you help me?
My function:
function replace_br($content) {
    $rep = preg_replace("/<\/div>\s*<br\s*\/?>/i", "</div>", $content);
    return $rep;
}
add_filter('the_content', 'replace_br');
This is not working.
When I use
$rep = preg_replace("/\s*<br\s*\/?>/i", "", $content); in the function, all BRs are removed.
That works, but I want to remove only the BRs that follow a closing DIV tag.
str_replace("</div><br>", "</div>", $content); is also not working.
What's wrong with my function?
No error is returned.

You are doing it wrong in the first place, since you have to remove the tags after the fact.
You are also doing it wrong because you're using regex for HTML (although sometimes that's OK-ish).
A variation of the regex you're using should suffice.
You should really consider using DOMDocument or similar:
$html = <<<HTML
...
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$element = $dom->getElementsByTagName('br');
// The node list is live, so collect the nodes first and remove them afterwards
$remove = [];
foreach ($element as $item) {
    $remove[] = $item;
}
foreach ($remove as $item) {
    $item->parentNode->removeChild($item);
}
$html = $dom->saveHTML();
echo $html;
This would remove every br element; you would need to adjust the code to fit your requirements, but it should serve as a pointer.

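An alternative is to use a regex with a lookbehind, so that the closing div is not part of the match. In your case:
/(?<=<\/div>)\s*<br\s*\/?>/i
PHP's preg_replace already replaces every match, so no g modifier is needed (PCRE in PHP does not accept one).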
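The lookbehind idea can be tried outside WordPress. The following is only a sketch in Python rather than PHP, and the pattern, the function name strip_br_after_div and the sample string are assumptions made for illustration; they are not taken from the answers above.

```python
import re

# Matches <br>, <br/> or <br /> (any case) that follows a closing </div>,
# optionally separated by whitespace. The lookbehind keeps the </div> itself.
BR_AFTER_DIV = re.compile(r"(?<=</div>)\s*<br\s*/?>", re.IGNORECASE)


def strip_br_after_div(html: str) -> str:
    """Remove only the <br> tags that come right after a closing div."""
    return BR_AFTER_DIV.sub("", html)


# Hypothetical sample: two <br> tags after </div>, one inside a paragraph.
sample = '<div class="a"> content\n</div><br>\n<p>keep<br />this</p>\n</div> <BR />\n'
print(strip_br_after_div(sample))
```

The <br /> inside the paragraph is left alone because it is not preceded by </div>.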


Select all whole divs with specific class using Regex for VSC [closed]

I have a tonne of code like this in a massive file and I simply want to delete all divs with the qt class.
<div class="qt">
<div class="qy"> Standard Delivery</div>
<div class="qu">€1.95/$2.99</div>
</div>
<div class="qe" data-country="kw.svg">
<p class="qr"> Kuwait </p>
</div>
<div class="qt">
<div class="qy"> Standard Delivery</div>
<div class="qu">€1.95/$2.99</div>
<div class="qs"> Express Delivery</div>
<div class="qb">€2.95/$3.99</div>
</div>
<div class="qe" data-country="ml.svg">
<p class="qr"> Malawi </p>
</div>
I would like to select all divs with the qt class up to the matching closing div in VSC (find and replace). How would we go about doing that especially given there are a varying number of other </div> tags inside the div we want to select?
As I mentioned in the comments above, if your code is similarly formatted to that portion you showed, it becomes easier to do this with find/replace. For instance:
(^\s*<div class="qt">)[\s\S]*?(^\s*<\/div>\n) find and replace with nothing.
This form requires that your inner divs are one-liners like
<div class="qy"> Standard Delivery</div>, where the entire element is on one line.
See https://regex101.com/r/LxCr6k/1
If those inner divs are not necessarily one-liners, this is the simplest version I found that fits your data:
^\s*<div class="qt">[\s\S]*?<\/div>\n^\s*<\/div>\n
see https://regex101.com/r/LxCr6k/6
To explain this latter one:
^\s*<div class="qt"> find a qt class div
[\s\S]*?<\/div>\n^\s*<\/div>\n continue lazily over any characters, including newlines, until 2 consecutive </div> tags.
This version does require that, of the first 2 consecutive closing </div> tags after a qt class opener, the second one is the closing tag of that qt div. So if your data could look like this:
<div class="qt">
<div class="qy"> Standard Delivery</div>
<div class="qu">€1.95/$2.99
<div class="qu">€1.95/$2.99</div> <= inner tag here
</div>
</div>
then these regexes won't work. Nested tags are notoriously "hard" to handle with a single regex. Only you know if your data allows this last regex to work. Your data as presented looks regular and simple enough to allow a regex approach to work.
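When nesting is possible, one way around the single-regex limitation is to count div depth in a small script instead of using find and replace. The Python sketch below is only an illustration under stated assumptions: the function name remove_qt_divs is hypothetical, the class attribute is assumed to be exactly "qt", and every div is assumed to be properly closed.

```python
import re

# Any opening or closing div tag; group 1 is "/" for a closing tag.
DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)
# An opening div whose class attribute is exactly "qt" (assumption).
QT_OPEN = re.compile(r"""<div\b[^>]*\bclass=["']qt["'][^>]*>""", re.IGNORECASE)


def remove_qt_divs(html: str) -> str:
    """Drop every qt div together with everything nested inside it."""
    out = []
    pos = 0
    while True:
        start = QT_OPEN.search(html, pos)
        if start is None:
            out.append(html[pos:])  # no more qt divs: keep the rest
            break
        out.append(html[pos:start.start()])
        depth = 1
        for tag in DIV_TAG.finditer(html, start.end()):
            depth += -1 if tag.group(1) else 1  # closing tag lowers depth
            if depth == 0:
                pos = tag.end()  # resume after the matching </div>
                break
        else:
            # Unbalanced markup: leave the remainder untouched.
            out.append(html[start.start():])
            break
    return "".join(out)


sample = (
    '<div class="qt">\n<div class="qu">a\n<div class="qu">b</div>\n</div>\n</div>\n'
    '<div class="qe">\n<p>Kuwait</p>\n</div>\n'
)
print(remove_qt_divs(sample))
```

The newline that followed the removed block is kept, so a later pass may be wanted to collapse blank lines.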

I need to use RegEx to find a specific word in an HTML page [closed]

I'm trying to extract a specific word (that might change) which comes after a permanent expression. I want to extract the name Taldor in this code:
<h4 class="t-16 t-black t-normal">
<span class="visually-hidden">Company Name</span>
<span class="pv-entity__secondary-title">Taldor</span>
</h4>
So far I am only able to match what follows <h4 class="t-16 t-black t-normal"> using this regex:
(?<=<h4 class="t-16 t-black t-normal">).*
I will be glad for any kind of advice.
I'd suggest using an HTML parsing library such as Jsoup in Java or BeautifulSoup in Python to parse HTML, instead of using regex.
The following kind of code does the job for you:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String s = "<h4 class=\"t-16 t-black t-normal\">\r\n" +
        " <span class=\"visually-hidden\">Company Name</span>\r\n" +
        " <span class=\"pv-entity__secondary-title\">Taldor</span>\r\n" +
        " </h4>";
Document doc = Jsoup.parse(s);
// Print the text of the first element carrying the target class
for (Element element : doc.getElementsByClass("pv-entity__secondary-title")) {
    System.out.println(element.text());
    break;
}
Prints,
Taldor
In the worst case, if you are doing some quick and dirty work, you can use the following regex as a temporary solution, but it is certainly not recommended.
<span class="pv-entity__secondary-title">(.*?)<\/span>
Use this regex and capture your data from group1.
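For the Python route mentioned above, the same extraction can be sketched with only the standard library's html.parser, so nothing needs to be installed. This is a sketch, not the answerer's code: the class name ClassTextExtractor is hypothetical, and it assumes the target element is not a void element.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element that carries a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.tag = None     # tag name of the element being captured
        self.depth = 0      # nesting depth of that tag name while capturing
        self.current = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.tag, self.depth, self.current = tag, 1, []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


html = (
    '<h4 class="t-16 t-black t-normal">\n'
    ' <span class="visually-hidden">Company Name</span>\n'
    ' <span class="pv-entity__secondary-title">Taldor</span>\n'
    '</h4>'
)
parser = ClassTextExtractor("pv-entity__secondary-title")
parser.feed(html)
print(parser.results)
```

Because the lookup is by class rather than by position, the company name can change without the code changing.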

Remove HTML Comment Tags from text [closed]

I'm really struggling trying to remove comment tags from HTML.
I want to keep everything inside the comment tags. I just want to remove <!-- and --> from the text.
I'm writing code using Python 2.7 and BeautifulSoup4.
I've tried using Regex to no avail. I tried the pattern "(<!--.*?-->)", but this seems to remove everything inside also.
I've also tried "(<!--|-->)" but it did not do what I wanted.
How can I achieve this?
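You can use re.sub:
import re

# Read every line first, then write the cleaned lines to the output file
with open('filename.txt', 'r') as infile:
    lines = infile.readlines()

with open('saved.txt', 'a') as outfile:
    for n in lines:
        text = n.rstrip()
        # Strip both comment markers and keep whatever was between them
        othertext = re.sub(r'<!--|-->', '', text)
        outfile.write(othertext + '\n')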
You can also capture the comment markers and their contents in groups and keep only the contents:
import re

source_path = "C:\\Users\\Administrator\\Desktop\\File1.txt"
target_path = "C:\\Users\\Administrator\\Desktop\\File2.txt"

with open(source_path, 'r') as readfile:
    content = readfile.readlines()

with open(target_path, "a+") as writefile:
    for i in content:
        # Group 2 is the text between the markers; lines without comments pass through unchanged
        line = re.sub(r'(<!--)(.*?)(-->)', r'\2', i)
        writefile.write(line)
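Both answers work line by line, so a comment that spans several lines would need separate handling. A possible variation, sketched here with a hypothetical function name unwrap_comments and a made-up sample string, applies one substitution to the whole text with re.DOTALL so that the captured contents may contain newlines.

```python
import re

# Group 1 is everything between the markers; DOTALL lets "." match newlines.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def unwrap_comments(html: str) -> str:
    """Remove <!-- and --> but keep the text that was commented out."""
    return COMMENT.sub(r"\1", html)


sample = "<p>a</p><!-- <b>hidden</b>\nsecond line --><p>b</p>"
print(unwrap_comments(sample))
```

The lazy quantifier (.*?) matters here: a greedy one would treat everything between the first <!-- and the last --> as a single comment.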

HTML5 - simple pattern don't works with Smarty [closed]

I'm using a simple pattern like this :
<input type="text" pattern="[a-z]{2}" required ../>
But the input is never valid. It seems the pattern doesn't work.
I tested it in Firefox.
Is there something I need to activate, or something like that?
My template :
<section class="inscription">
<h1>Inscription</h1>
<form method="post" enctype="multipart/form-data">
<div class="formu-inscription">
<label for="nom"> Nom : </label> <br/><input type="text" id="nom" name="nom" value="{set_value('nom')}" pattern="[a-z]{2}" required /> <br />
{form_error('nom')}
/* others inputs */
For people using Smarty templates, you can do
pattern="[a-z]{literal}{2}{/literal}"
Looking at the rendered source code you posted as a comment on a now deleted answer you have
<input type="text" id="nom" name="nom" value="" pattern="[a-z]2" required />
This is not the pattern in your source that you posted in your question. It seems that you are using some unspecified templating system that is using the {} characters as field identifiers and is misinterpreting your pattern.
The result in your rendered page is the pattern [a-z]2, which will validate for a string like a2 or f2, but not a, or aa, a3 or anything longer.
Since you haven't specified what templating system you're using it's not possible to indicate how you might work around this. Possibly a pattern of [a-z]{{2}} might work.
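The difference between the intended and the rendered pattern can be checked quickly. The sketch below uses Python's re.fullmatch as a stand-in for the browser's whole-value matching of the pattern attribute; that equivalence is an approximation assumed for illustration, and the helper name is_valid is hypothetical.

```python
import re

intended = re.compile(r"[a-z]{2}")  # pattern as written in the template
rendered = re.compile(r"[a-z]2")    # pattern after the braces were consumed


def is_valid(pattern, value):
    """True when the whole value matches, as form validation requires."""
    return pattern.fullmatch(value) is not None


for value in ("aa", "a2", "a", "a3"):
    print(value, is_valid(intended, value), is_valid(rendered, value))
```

With the rendered pattern, a two-letter value such as "aa" is rejected, which matches the behaviour described in the question.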

Regular Expression issue

I have a code like this
<div class="rgz">
<div class="xyz">
</div>
<div class="ckh">
</div>
</div>
The class ckh won't appear every time. Can someone suggest a regex to get the data of the div rgz? The data inside ckh is not needed, and that div won't always be present.
Thanks in advance
@diEcho and @Dve are correct: you should learn to use something like the native DOMDocument class rather than using regex. Your code will be easier to read and maintain, and will handle malformed HTML much better.
Here is some sample code which may or may not do what you want:
$contents = '';
$doc = new DOMDocument();
// loadHTMLFile() parses HTML; load() expects well-formed XML
$doc->loadHTMLFile($page_url);
$nodes = $doc->getElementsByTagName('div');
foreach ($nodes as $node) {
    if ($node->hasAttributes()) {
        $attributes = $node->attributes;
        if (!is_null($attributes)) {
            foreach ($attributes as $index => $attr) {
                // Collect the text of every div whose class is exactly "rgz"
                if ($attr->name == 'class' && $attr->value == 'rgz') {
                    $contents .= $node->nodeValue;
                }
            }
        }
    }
}
Regex is probably not your best option here.
A JavaScript framework such as jQuery will allow you to use CSS selectors to get to the element you require, by doing something like
$('.rgz').children().last().html()
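Neither answer shows how to leave out the optional ckh div. As a rough sketch of that part, here in Python with the standard html.parser rather than PHP or jQuery, the parser below collects the text inside the rgz div and skips any ckh subtree. The class name RgzText and the sample markup are hypothetical, and only div nesting is tracked.

```python
from html.parser import HTMLParser


class RgzText(HTMLParser):
    """Collects text inside <div class="rgz">, skipping <div class="ckh">."""

    def __init__(self):
        super().__init__()
        self.rgz_depth = 0   # > 0 while inside the rgz div (counts nested divs)
        self.skip_depth = 0  # > 0 while inside a ckh div
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth:
            self.skip_depth += 1
        elif self.rgz_depth:
            if "ckh" in classes:
                self.skip_depth = 1
            else:
                self.rgz_depth += 1
        elif "rgz" in classes:
            self.rgz_depth = 1

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif self.rgz_depth:
            self.rgz_depth -= 1

    def handle_data(self, data):
        if self.rgz_depth and not self.skip_depth:
            self.parts.append(data)


sample = (
    '<div class="rgz"><div class="xyz">keep</div>'
    '<div class="ckh">drop <div>x</div></div>tail</div>after'
)
parser = RgzText()
parser.feed(sample)
print("".join(parser.parts))
```

If the ckh div is absent, the skip branch is simply never entered, so the same code covers both cases.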