Remove string between HTML tags with TRegEx

Remove string between HTML tags with TRegEx - regex

I am designing by code a report sent by email with Outlook using HTML format.
To do that, I'm loading first a HTML template where I can insert all dynamic parts using predefined tags like [CustomerName].
<p>You will find below reports for customer [CustomerName] dated [ReportdDate]</p>
<tag-1>
<h3>TableTitleA</h3>
<table>
<thead id="t01">
<tr>
<th align='center' width='80'>Order Nr</th>
<th align='left' width='400'>Date</th>
<th align='left' width='200'>Info</th>
<th align='center' width='200'>Site Name</th>
</tr>
</thead>
<tbody>
[TableA]
</tbody>
</table>
</tag-1>
<tag-2>
<h3>TableTitleB</h3>
<table>
<thead id="t01">
<tr>
<th align='center' width='80'>Order Nr</th>
<th align='left' width='100'>Date</th>
<th align='left' width='400'>Info</th>
<th align='left' width='200'>Site Name</th>
</tr>
</thead>
<tbody>
[TableB]
</tbody>
</table>
</tag-2>
<p>Best regards</p>
This template is ready to insert two HTML tables: [TableA] and [TableB]
But sometimes a table has no data. So, I want to remove that complete HTML section. To achieve this, I have inserted fake tags:
<tag-1></tag-1> and <tag-2></tag-2>
And then removing the complete section including the two fake tags using TRegEx. This is working just fine here:
https://regex101.com/r/5OFlyC/1
But with this code in Delphi, it doesn't work as expected:
TRegEx.Replace(MessageBody.Text, '<tag-1>.*?</tag-1>', '');
Could you tell me what's wrong here?
My problem is fixed. Thanks to all of you

Just use the roSingleLine option to deal with line feeds:
MessageBody.Text := TRegEx.Replace(MessageBody.Text, '<tag-1>.*?</tag-1>', '', [roSingleLine]);

first you have to remove all the CR LF from your string and then use the expression with escape before < and >
S:=StringReplace(messagebody.Text,#13#10,'<br>',[rfReplaceAll]);
S:=TRegEx.Replace(S,'(\<tag-1\>.*?\<\/tag-1\>)','');
messagebody.text:=StringReplace(S,'<br>',#13#10,[rfReplaceAll]);

Related

Jinja2 Repeats Table Rows When Reloading Page

I have a 'dashboard' page that comes up after a user selects a project that they'd like to inspect the details on. Some of these details are displayed in a table format. Below is a portion of the Jinja2 template:
`
{% if release_container.nreleases > 0 %}
<table class="table table-bordered">
<thead>
<tr>
<th scope="col">Name</th>
<th scope="col">Date Created</th>
<th scope="col">Status</th>
<th scope="col">Date Released</th>
</tr>
</thead>
<tbody>
{% for rh in release_container.release_history %}
<tr>
<td>{{rh.name}}</td>
<td>{{rh.creation_date}}</td>
<td>{{rh.status}}</td>
<td>{{rh.release_date}}</td>
</tr>
{% endfor %}
</tbody>
</table><!--ends table-->
{% endif %}
`
If I reload the page, or click on a link and then come back to the page, the same data will get appended. As an example, here's the HTML from a 'fresh' load, which is what I should look like after a reload as well:
`
<tbody>
<tr>
<td>v0.1</td>
<td>2022-11-10</td>
<td>False</td>
<td>2022-11-10</td>
</tr>
<tr>
<td>v0.2</td>
<td>2022-11-17</td>
<td>False</td>
<td>2022-11-17</td>
</tr>
</tbody>
`
But this is what it looks like when the page is reloaded:
`
<tbody>
<tr>
<td>v0.1</td>
<td>2022-11-10</td>
<td>False</td>
<td>2022-11-10</td>
</tr>
<tr>
<td>v0.2</td>
<td>2022-11-17</td>
<td>False</td>
<td>2022-11-17</td>
</tr>
<tr>
<td>v0.1</td>
<td>2022-11-10</td>
<td>False</td>
<td>2022-11-10</td>
</tr>
<tr>
<td>v0.2</td>
<td>2022-11-17</td>
<td>False</td>
<td>2022-11-17</td>
</tr>
</tbody>
`
What do I need to do in order to stop this from happening? I presume this is some kind of cache setting?
Thank you in advance for any assistance.
I've debugged the code by printing out the lengths of the various arrays to see if they change on reload, thinking that the data was being appended to the array, but the length remains '2', even though 4 different entries are rendered.
EDIT
A commenter asked to see the view code. At this point in time, it's a huge function that generates the containers that get passed to the template (which is just one template amongst half a dozen that gets passed), and I'll be refactoring that code shortly into more palatable bites, but hopefully the snippet below provides more context:
for release in releases:
rhistory_container = ProjectDashboardReleaseHistoryContainer()
rhistory_container.rid = release.id
rhistory_container.name = release.name
rhistory_container.creation_date = release.created
rhistory_container.status = release.is_active
release_container.release_history.append(rhistory_container)
"""
"""
return render_template('project_home.html',parent_container=parent_container,release_container=release_container,testing_container=testing_container,reqs_container=reqs_container,sw_container=sw_container,hw_container=hw_container)

How to get date input from table created using for loop in django?

So I have passed a context from views.py to my html template.
I have created a html table using 'For Loop' in the following way and also added a column with input date field.
<table class="table">
<thead style="background-color:DodgerBlue;color:White;">
<tr>
<th scope="col">Barcode</th>
<th scope="col">Owner</th>
<th scope="col">Mobile</th>
<th scope="col">Address</th>
<th scope="col">Asset Type</th>
<th scope="col">Schhedule Date</th>
<th scope="col">Approve Asset Request</th>
</tr>
</thead>
<tbody>
{% for i in deliverylist %}
<tr>
<td class="barcode">{{i.barcode}}</td>
<td class="owner">{{i.owner}}</td>
<td class="mobile">{{i.mobile}}</td>
<td class="address">{{i.address}}</td>
<td class="atype">{{i.atype}}</td>
<td class="deliverydate"><input type="date"></td>
<td><button id="schedulebutton" onclick="schedule({{forloop.counter0}})" style="background-color:#288233; color:white;" class="btn btn-indigo btn-sm m-0">Schedule Date</button></td>
</tr>
{% endfor %}
</tbody>
Now I would like to get that date element value in javascript, but its proving difficult since I am assigning a class instead of id(as multiple elements cant have same id).
I tried in the following way but its not working. The console log shows no value in that variable.
<script> //i is the iteration number passed in function call using forloop.counter0
function schedule(i){
var deldate = document.getElementsByClassName("deliverydate");
deldate2 = deldate[i].innerText;
console.log(deldate2); //log shows no value/empty
console.log(i); //log shows iteration number
</script>

How to parse specific conents from table with Scrapy

I'm trying to parse certain contents from table looking like below:
<table class="dataTbl col-4">
<tr>
<th scope="row">Rent</th>
<td>5.5</td>
<th scope="row">Management</th>
<td>3.3</td>
</tr>
<tr>
<th scope="row">Deposit</th>
<td>No</td>
<th scope="row">Other</th>
<td>No</td>
</tr>
<tr>
<th scope="row">Other2</th>
<td>No</td>
<th scope="row">Insurance</th>
<td>Yes</td>
</tr>
</table>
My goal is to find specific row (for example, Rent) and if there is a match, extract the content in the next <td> tag(For example, 5.5).
But how can I do it in Python?
I'm using Python3/Scrapy 1.3.0.
Thanks

In [9]: Selector(text=html).xpath('//th[text()="Rent"]/following-sibling::td[1]').extract()
Out[9]: ['<td>5.5</td>']
Use text()="Rent" to id the th tag
Use following-sibling:: get it's sibling and use [1] to get first

Using a python's regular expression.
r'\>text\<.+\n +\<td\>(\d+\.\d+)'
In your case, change text by Rent. Also, this is a useful web page to debug regular expressions.

How to stop encoding url in template file on Beego?

I'm in trouble dealing with template and url encoding on Beego.
(Beego is one of the template engines of Go lang)
How to stop encoding url in HTML TAG in template file on Beego?
Please let me know.
--
logcontroller.go
package controllers
import (
"mycode/models"
)
type FiletranslogController struct {
baseController
}
func (this *FiletranslogController) Get() {
// Already encoded url
this.Data["querystring"] = "/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2"
this.TplName = "log/filetrans.html"
}
filetrans.html
<!-- Not in TABLE TAG -->
{{str2html .querystring}}
<!-- In TABLE TAG -->
<table id="table-log"
data-url="{{str2html .querystring}}"
data-toggle="table"
data-toolbar="#toolbar-log"
data-search="true"
data-show-refresh="true"
data-pagination="true"
data-side-pagination="server"
>
<thead>
<tr>
<th data-field="rdate">Date</th>
<th data-field="mail_sender">Mail Sender</th>
<th data-field="trans_type">Trans Type</th>
<th data-field="md5">MD5</th>
</tr>
</thead>
</table>
<script>
view source on Web browser
<!-- Not in TABLE TAG -->
/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2
<!-- In TABLE TAG -->
<table id="table-log"
data-url="/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2"
data-toggle="table"
data-toolbar="#toolbar-log"
data-search="true"
data-show-refresh="true"
data-pagination="true"
data-side-pagination="server"
>
<thead>
<tr>
<th data-field="rdate">Date</th>
<th data-field="mail_sender">Mail Sender</th>
<th data-field="trans_type">Trans Type</th>
<th data-field="md5">MD5</th>
</tr>
</thead>
</table>
<script>
OMG
/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2
---> changed to
/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2
* ex) PHP Smarty template engine supports {literal} bla..bla..never encoded {/literal} tag. *

str2html
Parse string to HTML, no escaping. {{str2html .Strhtml}}
https://beego.me/docs/mvc/view/template.md

Second test result.
template_file.html
{{str2html .querystring}}
<table data-url="{{.querystring}}"
data-url='{{.querystring}}'
data-url="{{str2html .querystring}}"
data-url='{{str2html .querystring}}'
>
<thead>
<tr>
<th data-field="rdate">Date</th>
<th data-field="mail_sender">Mail Sender</th>
<th data-field="trans_type">Trans Type</th>
<th data-field="md5">MD5</th>
</tr>
</thead>
</table>
view source on Web Browser
/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2
<table data-url="/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2"
data-url='/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2'
data-url="/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2"
data-url='/filetranslog/getlogs?sdate=2016-11-13%2000%3A00&edate=2016-12-13%2023%3A59&md5=&trans_type=2'
>
<thead>
<tr>
<th data-field="rdate">Date</th>
<th data-field="mail_sender">Mail Sender</th>
<th data-field="trans_type">Trans Type</th>
<th data-field="md5">MD5</th>
</tr>
</thead>
</table>
Why is literal string encoded? I use "beego.ParseForm" function for form parsing, however, double-encoded url is not parsed by "beego.ParseForm" properly.

Beautifulsoup replace set of html code with different code

I have a set of html code in my beautifulsoup object which is to be replaced with some other code
This is what I am getting in my Beautifulsoup object
<html>
<body>
<table class="bt" width="100%">
<tr class="heading">
<th scope="col">Â </th>
<th class="th-heading" scope="col">B</th>
<th class="tho" scope="col"><b>O</b></th></tr></table></div></div></div></div></div></div></body></html></html>
<th class="thm" scope="col"><b>M</b></th>
<th class="thr" scope="col"><b>R</b></th>
<th class="thw" scope="col"><b>W</b></th>
<th class="thecon" scope="col"><b>E</b></th>
<th class="thw" scope="col"><b>0s</b></th>
<th class="thw" scope="col"><b>F</b></th>
<th class="thw" scope="col"><b>S</b></th>
<th scope="col">Â </th>.............</body></html>
Required code:
<html>
<body>
<table class="bt" width="100%">
<tr class="heading">
<th scope="col">Â </th>
<th class="th-heading" scope="col">B</th>
<th class="tho" scope="col"><b>O</b></th>
<th class="thm" scope="col"><b>M</b></th>
<th class="thr" scope="col"><b>R</b></th>
<th class="thw" scope="col"><b>W</b></th>
<th class="thecon" scope="col"><b>E</b></th>
<th class="thw" scope="col"><b>0s</b></th>
<th class="thw" scope="col"><b>F</b></th>
<th class="thw" scope="col"><b>S</b></th>
<th scope="col">Â </th>.............</body></html>
I have tried but that's not working
soup.replace('<th class="tho" scope="col"><b>O</b></th></tr></table></div></div></div></div></div></div></body></html></html>', '<th class="tho" scope="col"><b>O</b></th>')

In your own solution you're already hinting at string replacements, rather than
actual HTML tree insertions. That's because the HTML you're starting from is terrible.
One solution is to add tags to the original tree that was generated by BeautifulSoup:
from bs4 import BeautifulSoup
import re
start_str = """<html><body><table class="bt" width="100%"><tr class="heading"><th scope="col">Â </th>
<th class="th-heading" scope="col">B</th>
<th class="tho" scope="col"><b>O</b></th></tr></table></div></div></div></div></div></div></body></html></html>
<th class="thm" scope="col"><b>M</b></th>
<th class="thr" scope="col"><b>R</b></th>
<th class="thw" scope="col"><b>W</b></th>
<th class="thecon" scope="col"><b>E</b></th>
<th class="thw" scope="col"><b>0s</b></th>
<th class="thw" scope="col"><b>F</b></th>
<th class="thw" scope="col"><b>S</b></th>
<th scope="col">Â </th>.............</body></html>"""
soup = BeautifulSoup(start_str) # remark: this'll split right after the first '</html>'
substr = re.findall('<th class="thm".*', start_str, re.DOTALL)
subsoup = BeautifulSoup(substr[0])
for tag in subsoup.findAll('th'):
soup.tr.append(tag)
While using regular expressions to parse HTML isn't recommended, this is a
borderline case, and it's not even really parsing, merely selecting a substring.
In that sense, it can even be replaced completely with pure python builtins:
substr = start_str.split('</html></html>')[1]
Another solution is simply to remove those undesired tags, but that will only work if that substring is fixed:
to_remove = '</tr></table></div></div></div></div></div></div></body></html></html>'
soup = BeautifulSoup(''.join(start_str.split(to_remove)))
You could also use the re module in this solution, if there is whitespace between those tags for example.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Remove string between HTML tags with TRegEx - regex

Just use the roSingleLine option to deal with line feeds: MessageBody.Text := TRegEx.Replace(MessageBody.Text, '<tag-1>.*?</tag-1>', '', [roSingleLine]);

first you have to remove all the CR LF from your string and then use the expression with escape before < and > S:=StringReplace(messagebody.Text,#13#10,'<br>',[rfReplaceAll]); S:=TRegEx.Replace(S,'(\<tag-1\>.*?\<\/tag-1\>)',''); messagebody.text:=StringReplace(S,'<br>',#13#10,[rfReplaceAll]);

Related

Jinja2 Repeats Table Rows When Reloading Page

How to get date input from table created using for loop in django?

How to parse specific conents from table with Scrapy

How to stop encoding url in template file on Beego?

Beautifulsoup replace set of html code with different code

Categories

Resources