Multiline regex in PowerShell

I would be grateful for a solution to my problem with parsing an HTML file using regex. The file is:
d:\acc.html
<!-- WebSite-Watcher Demo Report -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>WebSite-Watcher Report</title>
<style type="text/css">
<!--
a:link, a:active {
color: #4040A0;
text-decoration: underline;
}
a:visited {
color: #8080A0;
text-decoration: underline;
}
a:hover {
background: #FFF000;
color: #FF0000;
text-decoration: underline;
}
body, td {
font-size: 11px;
line-height: 15px;
font-family: Verdana, Arial;
}
li {
list-style: square;
font-size: 11px;
line-height: 15px;
margin-top: 10px;
}
-->
</style>
</head>
<body>
<center>
<table cellpadding="2" cellspacing="2" border="0" width="80%">
<tr>
<td>
<font color="#336699" style="font-size: 18px;"><b>Highlighted changes</b></font><br>
<div style="border-top: 1px dashed dadada; margin-top: 5px;"></div>
<br>
<font color="#f00000">This report displays the first 200 characters of highlighted changes,<br>
the length can be changed individually with the <b>wsw_url_highlighted_changes(200)</b> variable.</font><br>
<br>
<table cellpadding="5" cellspacing="0" border="0" width="100%">
<tr>
<td style="border-bottom-color: #d0d0d0; border-bottom-style: solid; border-bottom-width: 1px; background-color: #eaeaea;"><!-- F1E896 -->
<font style="font-size: 13px;"><b>Lorem ipsum</b></font><br><font color="#808080"> | Web page | Local page</font>
</td>
</tr>
<tr>
<td style="border-bottom-color: #f0f0f0; border-bottom-style: solid; border-bottom-width: 1px; background-color: #f8f8f8;"><!-- F5F2C7 -->
<blockquote>
<br>
</blockquote>
</td>
</tr>
</table><br>
<br>
<table cellpadding="5" cellspacing="0" border="0" width="100%">
<tr>
<td style="border-bottom-color: #d0d0d0; border-bottom-style: solid; border-bottom-width: 1px; background-color: #eaeaea;"><!-- F1E896 -->
<font style="font-size: 13px;"><b>Lorem ipsum</b></font><br><font color="#808080">18-06-2015 | Web page | Local page</font>
</td>
</tr>
<tr>
<td style="border-bottom-color: #f0f0f0; border-bottom-style: solid; border-bottom-width: 1px; background-color: #f8f8f8;"><!-- F5F2C7 -->
<blockquote>
Lorem ipsum BBBBBBBBBBBB<br>
</blockquote>
</td>
</tr>
</table><br>
<br>
<table cellpadding="5" cellspacing="0" border="0" width="100%">
<tr>
<td style="border-bottom-color: #d0d0d0; border-bottom-style: solid; border-bottom-width: 1px; background-color: #eaeaea;"><!-- F1E896 -->
<font style="font-size: 13px;"><b>Lorem ipsum</b></font><br><font color="#808080">18-06-2015 | Web page | Local page</font>
</td>
</tr>
<tr>
<td style="border-bottom-color: #f0f0f0; border-bottom-style: solid; border-bottom-width: 1px; background-color: #f8f8f8;"><!-- F5F2C7 -->
<blockquote>
Lorem ipsum BBBBBBBBBBBB<br>AAAAAAAAAAAAAAAaa AA<br>
</blockquote>
</td>
</tr>
</table><br>
<br>
<br>
<br>
<div style="border-top: 1px dashed dadada;"></div>
<font color="#808080"><i>Report date: 18-06-2015</i></font><br>
</td>
</tr>
</table><br>
</center>
</body>
</html>
I need to 'clean' this file of empty entries like the first one (no text, just some whitespace).
I know that PowerShell supports multiline regex matching, and the solution will probably look like this:
d:\pattern.txt
(?=<table cellpadding="5" ).*(?=<blockquote>).{0,6}(?=<\/blockquote>)
Script (thanks, Jisaak):
$content = (Get-Content 'd:\acc.txt' -raw)
$pattern = (Get-Content 'd:\pattern.txt' -raw)
[regex]::Replace($content, $pattern, '',`
[System.Text.RegularExpressions.RegexOptions]::Multiline `
-bor [System.Text.RegularExpressions.RegexOptions]::Singleline)
I mean: everything from the opening table tag through a blockquote that holds only 0-6 characters of any kind.
This regex doesn't work; I have trouble writing such an advanced regex correctly. Thanks for any help.

Would this problem be easier if you didn't have to deal with multiple lines?
My experience with regex is limited and with HTML nonexistent, but the workaround below turns your blocks into single lines (and back into blocks again):
$file = (Get-Content ".\acc.html" -raw)
# Replace new line CR LF with a string (e.g. NEWLINE)
$file2 = ([regex]::Replace($file, ">`r`n", ">NEWLINE", "Singleline"))
$file2 | out-file ".\acc_edited.html"
# Single line regex replacement here to get rid of empty table.
# String NEWLINE can be used to indicate a new line.
# Replace the string with new line characters CR LF after regex replacement.
[regex]::Replace($file2, ">NEWLINE", ">`r`n", "Singleline") | Out-File ".\acc_original.html"

This should work:
(?<=<table cellpadding="5" cellspacing="0" border="0" width="100%">).*
(?=<blockquote>)|(?<=<blockquote>).{0,6}(?=<\/blockquote>)
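For comparison, here is a minimal Python sketch of the same cleanup (Python rather than PowerShell, purely for illustration). It assumes each report entry is one table with cellpadding="5" and treats a blockquote as empty when it holds only whitespace, &nbsp;, or <br> tags. The names EMPTY_ENTRY and drop_empty_entries are made up for this example. The pattern refuses to cross a </table> boundary, since a greedy .* under Singleline can otherwise span several entries.

```python
import re

# Hypothetical pattern: one report entry whose <blockquote> is effectively empty.
EMPTY_ENTRY = re.compile(
    r'<table cellpadding="5"'        # opening tag of a report entry
    r'(?:(?!</table>).)*?'           # any text, but never past this entry's </table>
    r'<blockquote>'
    r'(?:\s|&nbsp;|<br\s*/?>)*'      # only whitespace, &nbsp; or <br>
    r'</blockquote>'
    r'(?:(?!</table>).)*?'
    r'</table>',
    re.DOTALL | re.IGNORECASE,
)


def drop_empty_entries(html: str) -> str:
    """Remove every entry table whose blockquote has no visible text."""
    return EMPTY_ENTRY.sub("", html)


sample = (
    '<table cellpadding="5" cellspacing="0">\n'
    "<tr><td><b>First</b></td></tr>\n"
    "<tr><td>\n<blockquote>\n<br>\n</blockquote>\n</td></tr>\n"
    "</table><br>\n"
    '<table cellpadding="5" cellspacing="0">\n'
    "<tr><td><b>Second</b></td></tr>\n"
    "<tr><td>\n<blockquote>\nLorem ipsum BBBBBBBBBBBB<br>\n</blockquote>\n</td></tr>\n"
    "</table><br>\n"
)
cleaned = drop_empty_entries(sample)
```

Only the first entry is removed here; the <br> that followed its closing tag is left in place, so the trailing markup may need a second pass.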

Table is rendered but dataTables will not modify existing HTML table in Flask

Below I have provided a minimal working example of my Flask app, which uses DataTables to enhance an existing HTML table. After comparing it with current examples online, I think the problem is that the required libraries are not being loaded properly and/or that DataTables cannot find the data.
Guidance is greatly appreciated!
HTML (base.html)
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="{{ url_for('static', filename='css/main.css') }}">
<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/v/dt/dt-1.11.5/af-2.3.7/b-2.2.2/fc-4.0.2/fh-3.2.2/sc-2.0.5/sl-1.3.4/datatables.min.css"/>
</head>
<body>
<header>
<div class="container">
<h1 class="logo"></h1>
<strong><nav>
<ul class="menu">
<li>Home</li>
<li>Entity Map</li>
</ul>
</nav></strong>
<div class = "logoimg">
<img src="{{ url_for('static',filename='images/GENERIC_PDE_PageCard.png') }}" width="10%">
</div>
</div>
</header>
<div class="container">
{% block content %}
{% endblock %}
</div>
<script type="text/javascript" charset="utf8" src="https://code.jquery.com/jquery-3.5.1.js"></script>
<script type="text/javascript" src="https://cdn.datatables.net/v/dt/dt-1.11.5/af-2.3.7/b-2.2.2/fc-4.0.2/fh-3.2.2/sc-2.0.5/sl-1.3.4/datatables.min.js"></script>
{% block scripts %}{% endblock %}
</body>
</html>
HTML (home.html)
{% extends "base.html" %}
{% block content %}
<div class = "home">
<h3>OCFY Entity Database</h3>
<h4>Purpose</h4>
<p>Purpose This application is designed for performing custom queries to a database containing information about private and non-public entities who are serving school-aged children and youth in the Commonwealth of Pennsylvania. Sites in this database include private or non-public licensed schools, as well as non-public entities such as residential and juvenile justice institutions.</p>
<h4>How to Search</h4>
<p>The database provides information at the site-level, as well as by several higher-order aggregates, including region, county, city, type of service, PDE Educational Entity and DHS Entity.
Use the drop-down filters within to perform a custom query that automatically displays in the table on the next tab. Selecting one or more values fromm the filters will automatically remove irrelevant values from the rest of the filters. You can also use the filters in any order. They will still show only relevant options.
The search function at the top right of the table accepts words and/or whole numbers. The search function looks across all columns for all entities in the database and displays every entity with a column containing the number and/or word that was typed.
Use the first drop down box to select multiple fields from the database. Your choices will be displayed automatically in the table. The application defaults to showing several key fields. Use backspace within the search box to remove fields from the table.</p>
</div>
<br>
<div class = "homecontent">
<div class = "sidebar">
<h4>Select Fields from Database</h4>
</div>
<div class = "tablecontainer">
<table id="entity_table" class="table table-striped">
<tr>
<th>DHS Entity Name</th>
<th>DHS Legal Name</th>
<th>Full Address</th>
</tr>
<tr>
<td>Test1</td>
<td>Test2</td>
<td>Test3</td>
</tr>
<tr>
<td>Test1</td>
<td>Test2</td>
<td>Test3</td>
</tr>
<tr>
<td>Test1</td>
<td>Test2</td>
<td>Test3</td>
</tr>
</table>
</div>
</div>
{% endblock %}
{% block scripts %}
<script>
$(document).ready(function () {
$('#entity_table').dataTable();
});
</script>
{% endblock %}
CSS (main.css)
body {
margin: 0;
padding: 0;
font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
color: #444;
}
/*
* Formatting the header area
*/
header {
box-shadow: 3px 5px 3px #d0d1d3; /* offset x, offset y, blur radius */
background-color: #002060;
height: 90px;
width: 100%;
opacity: .9;
margin-bottom: 5px;
}
div.container {
scroll-behavior: auto;
padding-top: 5px;
padding-right: 5px;
padding-bottom: 5px;
padding-left: 5px;
}
div.logoimg{
display: flex;
justify-content: left;
align-items: center;
}
/*
* Formatting the container contents
*/
.container {
width: 1200px;
margin: 0 auto;
}
div.homecontent{
width: 1200px;
}
div.tablecontainer {
float: right;
width: 850px;
background-color: #ffffff;
}
div.sidebar {
float: left;
box-shadow: 3px 5px 3px #d0d1d3;
text-align: center;
width: 300px;
height: 500px;
padding-top: 10px;
padding-right: 10px;
padding-bottom: 10px;
padding-left: 10px;
background-color: #F3F4F5;
-webkit-border-radius: 6px;
-moz-border-radius: 6px;
border-radius: 6px;
}
div.home {
box-shadow: 3px 5px 3px #d0d1d3;
padding-top: 10px;
padding-right: 10px;
padding-bottom: 10px;
padding-left: 10px;
background-color: #F3F4F5;
-webkit-border-radius: 6px;
-moz-border-radius: 6px;
border-radius: 6px;
}
div.map {
padding-top: 10px;
padding-right: 10px;
padding-bottom: 10px;
padding-left: 10px;
background-color: #F3F4F5;
-webkit-border-radius: 6px;
-moz-border-radius: 6px;
border-radius: 6px;
}
h2 {
font-size: 3em;
margin-top: 40px;
text-align: center;
letter-spacing: -2px;
}
h3 {
font-size: 1.7em;
font-weight: 100;
margin-top: 30px;
text-align: center;
letter-spacing: -1px;
color: #999;
}
.menu {
float: right;
margin-top: 8px;
}
.menu li {
display: inline;
}
.menu li + li {
margin-left: 35px;
}
.menu li a {
color: #fff;
text-decoration: none;
}
App:
from flask import Flask, render_template
ocyf = Flask(__name__)
# Route decorators register each view function with the app.
@ocyf.route('/')
def home():
    return render_template("home.html")
@ocyf.route('/map/')
def map():
    return render_template("map.html")
if __name__ == '__main__':
    ocyf.run(debug=True)
You must use thead and tbody tags on your table.
<table id="entity_table" class="table table-striped">
<thead>
<tr>
<th>DHS Entity Name</th>
<th>DHS Legal Name</th>
<th>Full Address</th>
</tr>
</thead>
<tbody>
<tr>
<td>Test1</td>
<td>Test2</td>
<td>Test3</td>
</tr>
<tr>
<td>Test1</td>
<td>Test2</td>
<td>Test3</td>
</tr>
<tr>
<td>Test1</td>
<td>Test2</td>
<td>Test3</td>
</tr>
</tbody>
</table>
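Since the page is served by Flask, the same structure can also be generated in Python. The following is a minimal sketch, not part of the original app: render_table is a hypothetical helper that wraps the header row in thead and the data rows in tbody.

```python
from html import escape


def render_table(headers, rows, table_id="entity_table"):
    """Build an HTML table with explicit <thead> and <tbody> sections."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        f'<table id="{escape(table_id)}" class="table table-striped">'
        f"<thead><tr>{head}</tr></thead>"
        f"<tbody>{body}</tbody>"
        "</table>"
    )


table_html = render_table(
    ["DHS Entity Name", "DHS Legal Name", "Full Address"],
    [["Test1", "Test2", "Test3"], ["Test1", "Test2", "Test3"]],
)
```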

HTML tag <a> does not appear in Gmail and Outlook (maybe others)

I have to fix a bug in the emails my company sends when a new customer account is created for the website. The email must show an HTML <a> tag with a link to the new account, but in Gmail and Outlook this <a> tag does not appear.
Here is the code:
<tr>
<td style="font-size: 0px; padding: 10px 25px; padding-top: 30px; padding-bottom: 30px;
word-break: break-word;" align="center">
<table style="border-collapse: separate; line-height: 100%;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td style="border: none; border-radius: 37px; cursor: auto; mso-padding-alt: 10px 25px;
background: #5F9AB0;" align="center" valign="middle" bgcolor="#5F9AB0"> Demander mon mot de passe
</td>
</tr>
</tbody>
</table>
</td>
</tr>
Screenshot of expected result (actual when I read my mail with the Mail mac app)
Screenshot of the display of the mail in Chrome and Firefox using Gmail and Outlook webapp
@Syfer suggested media queries are involved. If they are, check out https://www.caniemail.com/search/?s=media.
That says support for media queries is absent or spotty in every Gmail client, every Outlook client (except the one that uses Apple Mail), every Yahoo and AOL client, and others.
Ain't it grand writing email HTML? 😁

Unable to style mail body using velocity template

I am using a Velocity template to send email.
I am able to send the mail with static text as the body.
But I am unable to style the mail body, for example by adding colors or styled tables.
Could you please suggest how I can achieve this?
This is the sample .vm file; its content arrives as the mail body as is, i.e. it is not styled.
<!DOCTYPE html>
<html>
<head>
<title>User Information</title>
<style>
body{
font-family: verdana;
}
table {
width: 500px;
}
table, th, td {
border: 1px solid black;
padding: 2px;
}
th{
background-color: #00439A;
color : #FFFFFF;
}
tr.odd{
background-color: #CFCFCF;
}
tr.even{
background-color: #1076F5;
}
</style>
</head>
<body>
<h1>User Information</h1>
<table>
<tr>
<th>ID</th>
<th>First Name</th>
<th>Last Name</th>
<th>Age</th>
<th>Gender</th>
</tr>
</table>
</body>
</html>

Issue with cfdocument page break

I have a master query whose output gives me a list of sports. Then I have two sub-queries: Query 1 returns information on who is in each sport, and Query 2 returns the sport books that the people from Query 1 have. I am trying to output a table row as a sort of header for each sport, then additional rows for the query data. I would like to break the page after every sport so I don't end up with a sport description row at the bottom of a page and the rest of the data on the next.
I have tried adding a page break in every spot imaginable, but I always end up with blank pages equaling the recordcount of the getsports query at the beginning of the document.
Here is the code I am using. I have removed the actual query data. Does anyone have any suggestions or thoughts on what I am doing wrong?
<cfquery name="getterm" datasource="DS1">
select * from dbo.semester where current = 1
</cfquery>
<cfquery name="getsports" datasource="DS1">
SELECT * FROM [sports]
</cfquery>
<body>
<div id="wrap">
<cfinclude template="header.cfm">
<!-- header end -->
<div class="container" style="padding-top:0px;">
<cfdocument format="PDF" mimetype="application/pdf" orientation="landscape">
<table width="100%" cellspacing="0" cellpadding="0" topmargin="0" leftmargin="0" border="0" style="font-size:10px">
<tr>
<td colspan="8" bgcolor="#e3edef" style="padding-top:10px; padding-bottom:10px; padding-left:2px; font-family:Arial;" align="center">Books Not Issued -
<cfoutput>#yearOfSport#</h1>
</cfoutput>
</td>
</tr>
<cfoutput>
<cfloop query="getsports">
<tr>
<td colspan="8" style="padding-top:10px; padding-bottom:10px; padding-left:2px; font-family:Arial;" align="center">
<h1>#getsports.descr# </h1>
</td>
</tr>
<cfquery name="getbooks" datasource="DS1">
...
</cfquery>
<cfloop query="getbooks">
<cfquery name="getbooks2" datasource="DS1">
...
</cfquery>
<tr>
<td align="left" style="border-bottom: 1px solid; border-top: 1px solid; padding-top:10px;padding-bottom:10px;font-family:arial">
<h4>ID</h4></td>
<td align="left" style="border-bottom: 1px solid; border-top: 1px solid; padding-top:10px;padding-bottom:10px;font-family:arial">
<h4>Name</h4></td>
<td colspan="4" align="left" style="border-bottom: 1px solid; border-top: 1px solid; padding-top:10px;padding-bottom:10px;font-family:arial">
<h4>Sport</h4></td>
</tr>
<tr>
<td align="left" style="padding-top:10px;padding-bottom:10px;font-family:arial; font-weight: normal;">#id#</td>
<td align="left" style="padding-top:10px;padding-bottom:10px;font-family:arial; font-weight: normal;">#nameLast#, #nameFirst#, #nameMiddle# </td>
<td colspan="4" align="left" style="padding-top:10px;padding-bottom:10px;font-family:arial; font-weight: normal;">#sport#</td>
</tr>
<tr>
<td align="left" style="border-bottom: 1px solid ##cccccc; padding-top:10px;padding-bottom:10px;font-family:arial">Class</td>
<td align="left" style="border-bottom: 1px solid ##cccccc; padding-top:10px;padding-bottom:10px;font-family:arial">ISBN</td>
<td align="left" style="border-bottom: 1px solid ##cccccc; padding-top:10px;padding-bottom:10px;font-family:arial">Title</td>
<td align="left" style="border-bottom: 1px solid ##cccccc; padding-top:10px;padding-bottom:10px;font-family:arial">Author</td>
<td align="left" style="border-bottom: 1px solid ##cccccc; padding-top:10px;padding-bottom:10px;font-family:arial">Status</td>
</tr>
<cfloop query="getbooks2">
<tr>
<td align="left" style="padding-top:10px;padding-bottom:10px;font-family:arial">#getbooks2.subject#</td>
<td align="left" style="padding-top:10px;padding-bottom:10px;font-family:arial">#getbooks2.ISBN#</td>
<td align="left" style="padding-top:10px;padding-bottom:10px;font-family:arial">#getbooks2.title#</td>
<td align="left" style="padding-top:10px;padding-bottom:10px;font-family:arial">#getbooks2.author#</td>
<td align="left" style="padding-top:10px;padding-bottom:10px;font-family:arial">#getbooks2.status#</td>
</tr>
</cfloop>
</cfloop>
</cfloop>
</cfoutput>
<cfdocumentItem type="footer">
<table width="100%" style="font-size:10px;">
<tr>
<td style="font-family:Arial;" align="left">
<cfoutput>Page #cfdocument.currentpagenumber# of #cfdocument.totalpagecount# - #dateformat(now(), "mm/dd/yyyy")#</cfoutput>
</td>
</tr>
</table>
</cfdocumentItem>
</div>
</table>
</cfdocument>
</div>
<!-- Container end -->
<div id="push"></div>
</div>
<cfinclude template="footer.cfm">
</body>
Disclaimer: I have no idea what is in those header and footer files, but the header and footer are outside of the cfdocument tag. I usually don't do that, but maybe there is a reason why you do.
To get page breaks in looping data, I add the line below to the end of my loops, and sometimes I add a counter in case I need more surgical control.
<div style="page-break-before:always"> </div>
And it just works.
If you need more precision, I would run your page without cfdocument, get the source output from the browser, and start to figure out what you need, what you don't need, and where you want your breaks to occur.
Then take that, wrap cfdocument around the raw HTML output, and see if you get the desired effect.
If your data is enormous, grab a subset to limit your results so you can manage a smaller chunk of that report.
One other thing I do for my sanity:
I use cfsavecontent to set all my HTML output to a variable and then stuff it into cfdocument, so I am not mixing things up, so to speak.
Example:
<cfsavecontent variable="buildUpReport">
<cfinclude template="header.cfm">
nested looping ad nauseum...
<div style="page-break-before:always"> </div>
more ad nauseum looping...
<cfinclude template="footer.cfm">
</cfsavecontent>
<cfdocument localUrl="yes"
format="PDF"
mimetype="text/html"
marginbottom="0.15"
margintop="0"
marginright="0"
marginleft="0">
<cfoutput>#buildUpReport#</cfoutput>
<cfdocumentitem type="footer" evalatprint="true">
<table width="100%" border="0" cellpadding="0" cellspacing="0">
<tr><td align="center">
<cfoutput>
#cfdocument.currentpagenumber# of
#cfdocument.totalpagecount# |
#dateformat(now(),"mm-dd-yyyy")#
</cfoutput>
</td></tr>
</table>
</cfdocumentitem>
</cfdocument>
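The break-placement idea can be sketched outside ColdFusion as well. The Python below is only an illustration under assumptions: build_report and the sample data are invented. Each group gets its own complete table so a break never falls inside an open table, and the break div is placed between groups rather than after every one, so no break follows the last group.

```python
PAGE_BREAK = '<div style="page-break-before:always"> </div>'


def build_report(groups):
    """Join per-group HTML fragments, putting a page break between groups.

    `groups` maps a group title (e.g. a sport) to a list of row strings.
    """
    sections = []
    for title, rows in groups.items():
        row_html = "".join(f"<tr><td>{row}</td></tr>" for row in rows)
        # Each group is a complete table, so the break sits between tables.
        sections.append(f"<h1>{title}</h1><table>{row_html}</table>")
    # Joining (instead of appending after every group) leaves no trailing break.
    return PAGE_BREAK.join(sections)


report = build_report({"Soccer": ["A", "B"], "Tennis": ["C"]})
```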
Thanks for the suggestions. I ended up including additional table tags inside my main output. Once I did that the page breaks worked correctly. I think the way I was doing it before was breaking the table structure.

Use regular expressions to remove HTML tags in Flex/AS3

I'm writing an HTML parser in Flex (AS3) and I need to remove some HTML tags that are not needed.
For example, I want to remove the divs from this code:
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<p style="padding-left: 18px; padding-right: 20px; text-align: center;">
<span></span>
<span style=" font-size: 48px; color: #666666; font-style: normal; font-weight: bold; text-decoration: none; font-family: Arial;">20% OFF.</span>
<span> </span>
<span style=" font-size: 48px; color: #666666; font-style: normal; font-weight: normal; text-decoration: none; font-family: Arial;">Do it NOW!</span>
<span> </span>
</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
and end with something like this:
<div>
<p style="padding-left: 18px; padding-right: 20px; text-align: center;">
<span></span>
<span style=" font-size: 48px; color: #666666; font-style: normal; font-weight: bold; text-decoration: none; font-family: Arial;">20% OFF.</span>
<span> </span>
<span style=" font-size: 48px; color: #666666; font-style: normal; font-weight: normal; text-decoration: none; font-family: Arial;">Do it NOW!</span>
<span> </span>
</p>
</div>
My question is, how can I write a regular expression to remove these unwanted DIVs? Is there a better way to do it?
Thanks in advance.
You can't match arbitrarily nested constructs with a regular expression because nesting means irregularity. A parser (which you are writing) is the correct tool for this.
Now in this very special case, you could do a
result = subject.replace(/^\s*(<\/?div>)(?:\s*\1)*(?=\s*\1)/mg, "");
(which would simply remove all directly subsequent occurrences of <div> or </div> except the last one), but this is bad in so many ways that I'm afraid it will get me downvoted into oblivion.
To explain:
^ # match start of line
\s* # match leading whitespace
(</?div>) # match a <div> or </div>, remember which
(?:\s*\1)* # match any further <div> or </div>, same one as before
(?=\s*\1) # as long as there is another one right ahead
Can you count the ways in which this will fail? (Think comments, unmatched <div>s, etc.)
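To see what the pattern does on well-behaved input, here is a small sketch that applies the same expression with Python's re module (Python is used only for illustration; DIV_RUN and the sample string are assumptions, not part of the answer).

```python
import re

# The pattern from the answer; MULTILINE makes ^ match at every line start.
DIV_RUN = re.compile(r"^\s*(</?div>)(?:\s*\1)*(?=\s*\1)", re.MULTILINE)

subject = "<div>\n<div>\n<div>\n<p>20% OFF.</p>\n</div>\n</div>\n</div>\n"

# Each run of identical tags is trimmed so that only its last tag survives.
result = DIV_RUN.sub("", subject)
```

On this input one opening and one closing div remain around the paragraph; the removed tags leave blank lines behind, and the caveats above still apply.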
Assuming that your target HTML is actually valid XML, you can use a recursive function to drag out the non-div bits.
static function grabNonDivContents(xml:XML):XMLList {
    var out:XMLList = new XMLList();
    var kids:XMLList = xml.children();
    for each (var kid:XML in kids) {
        if (kid.name() && kid.name() == "div") {
            // Recurse into nested divs and keep only their non-div contents.
            var grandkids:XMLList = grabNonDivContents(kid);
            for each (var grandkid:XML in grandkids) {
                out += grandkid;
            }
        } else {
            out += kid;
        }
    }
    return out;
}
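The same recursion can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustrative port under the same assumption (the input is valid XML), not the answer's own code. Note that ElementTree iterates over child elements only, so text placed directly inside a div would be dropped by this version.

```python
import xml.etree.ElementTree as ET


def grab_non_div_contents(node):
    """Collect the non-div elements found by descending through nested divs."""
    out = []
    for kid in node:
        if kid.tag == "div":
            out.extend(grab_non_div_contents(kid))  # unwrap nested divs
        else:
            out.append(kid)  # keep other elements with their subtree intact
    return out


root = ET.fromstring(
    "<div><div><div><p><span>20% OFF.</span></p></div></div></div>"
)
kept = grab_non_div_contents(root)

# Re-wrap the surviving elements in a single div, as in the desired output.
wrapper = ET.Element("div")
wrapper.extend(kept)
flattened = ET.tostring(wrapper, encoding="unicode")
```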
In my experience, parsing complex HTML with regex alone is hell. The regexes quickly get out of hand. It is much more robust to extract the pieces of information you need (maybe with simple regexes) and assemble them back into a simpler document.