regex: remove all but?

I have html that looks like
<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=Yellow">Yellow</a> </td>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=FFFF00">#FFFF00</a></td>
<td bgcolor="#FFFF00"> </td>
<td align="left">Shades</td>
<td align="left">Mix</td>
</tr>
<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=YellowGreen">YellowGreen</a> </td>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=9ACD32">#9ACD32</a></td>
<td bgcolor="#9ACD32"> </td>
<td align="left">Shades</td>
<td align="left">Mix</td>
</tr>
I want to filter the HTML so that I only end up with the lines of the form
<td bgcolor="#XXXXXX"> </td>
and then filter those so that I end up with a list of lines like
XXXXXX
XXXXXX
How would I do that?

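You can use the following regex:
<td bgcolor="#([^"]+)">&nbsp;<\/td>
Use the capture group to get "XXXXXX". This assumes the cell holds a literal &nbsp; entity; if it holds a plain space, replace &nbsp; with a space.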

First, a regex to match the right tags:
<td bgcolor="#[0-9A-Fa-f]{6}"> <\/td>
Then filter that data again with the following (or use a capture group instead, whichever is more convenient in your language):
[0-9A-Fa-f]{6}
That is, if you want to use regex (don't shoot me; the question asks which regular expression to use for this).
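The two-step filter above, and the single-pass capture-group alternative, can be sketched in Python as follows. This is only an illustration under assumptions: the variable names are made up, the sample is a shortened copy of the question's markup, and `\s*` is used for the cell content so that either a plain space or a non-breaking space is accepted.

```python
import re

# Shortened copy of the markup from the question (assumed sample data).
html = '''<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=FFFF00">#FFFF00</a></td>
<td bgcolor="#FFFF00"> </td>
<td align="left">Shades</td>
</tr>
<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=9ACD32">#9ACD32</a></td>
<td bgcolor="#9ACD32"> </td>
<td align="left">Mix</td>
</tr>'''

# Step 1: keep only the bgcolor cells.
cells = re.findall(r'<td bgcolor="#[0-9A-Fa-f]{6}">\s*</td>', html)

# Step 2: pull the six hex digits out of each kept cell.
codes = [re.search(r'[0-9A-Fa-f]{6}', cell).group(0) for cell in cells]

# Alternative: one pass with a capture group around the hex digits.
one_pass = re.findall(r'<td bgcolor="#([0-9A-Fa-f]{6})">', html)

print("\n".join(codes))
```

Both approaches give the same list here; the capture-group form is usually the shorter one when the language supports it.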

If you must use regex, here is a demo using Ruby's irb:
>> %Q{
<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=Yellow">Yellow</a> </td>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=FFFF00">#FFFF00</a></td>
<td bgcolor="#FFFF00"> </td>
<td align="left">Shades</td>
<td align="left">Mix</td>
</tr>
<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=YellowGreen">YellowGreen</a> </td>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=9ACD32">#9ACD32</a></td>
<td bgcolor="#9ACD32"> </td>
<td align="left">Shades</td>
<td align="left">Mix</td>
</tr>
}.scan(/<td[^>]*> <\/td>/).map {|s| s[/#([a-f0-9]+)/i, 1]}
=> ["FFFF00", "9ACD32"]

I wouldn't parse HTML with regexes either, but if I did, I'd do it like this ;)
var html = '<tr>\n<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=Yellow">Yellow</a> </td>\n<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=FFFF00">#FFFF00</a></td>\n<td bgcolor="#FFFF00"> </td>\n<td align="left">Shades</td>\n<td align="left">Mix</td>\n</tr>\n\n\n<tr>\n<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=YellowGreen">YellowGreen</a> </td>\n<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=9ACD32">#9ACD32</a></td>\n<td bgcolor="#9ACD32"> </td>\n<td align="left">Shades</td>\n<td align="left">Mix</td>\n</tr>'
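  .split('\n'), // one array entry per line of markup
  colors = [],
  i, l,
  match;
for (i = 0, l = html.length; i < l; i++) {
  // Only lines containing the bgcolor cell produce a match.
  if ((match = /<td bgcolor="#([\da-fA-F]{6})"> <\/td>/.exec(html[i]))) {
    colors.push(match[1]); // group 1 is the hex code without the '#'
  }
}
console.log(colors);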

Related

Remove all label tags inside a string

I want to remove all label tags from a string.
This is the input string.
<p>
<title>Contact Us</title>
</p>
<table dropzone="copy">
<tbody>
<tr>
<td class="label" style="cursor: default;">Full Name</td>
<td style=
"cursor: default;">[<label id="{0a4a7240-9606-416a-bf7b-ef11a47cca8e}">First name</label>] [<label id="{94263497-683b-46f9-ba0f-69f4c2736598}">Last name</label>]</td>
</tr>
<tr>
<td class="label" style="cursor: d
efault;">Telephone</td>
<td style="cursor: default;">[<label id="{ce68e02e-e9fd-40ee-9375-ee1b05972e9b}">Phone</label>]</td>
</tr>
<tr>
<td class="label" style="cursor: default;">Email</td>
<td style="cursor: default;">[<label id="{411b580e-f7e9-4dd2-a70d-947385360cd0}">Email</label>]</td>
</tr>
<tr>
<td class="label" style="cursor: default;">Message</td>
<td style="cursor: default;">[
<label id="{13e2ff23-135c-4c6d-beb4-2960a533cb98}">Your Message</label>]</td>
</tr>
<tr>
<td class="label" style="cursor: default;">Company</td>
<td style="cursor: default;">[<label id="{c3f22c3a-8fc1
-48a4-8d6a-fe346024ca2b}">Company</label>]</td>
</tr>
</tbody>
</table>
<p> </p>
<p> </p>
The label tags need to be removed, but the text inside them should be kept:
<label id="{0a4a7240-9606-416a-bf7b-ef11a47cca8e}">First name</label> will become First name
<label id="{ce68e02e-e9fd-40ee-9375-ee1b05972e9b}">Phone</label> will become Phone
<label id="{411b580e-f7e9-4dd2-a70d-947385360cd0}">Email</label> will become Email
<label id="{13e2ff23-135c-4c6d-beb4-2960a533cb98}">Your Message</label> will become Your Message
<label id="{c3f22c3a-8fc1-48a4-8d6a-fe346024ca2b}">Company</label> will become Company
I tried the following regex, [Regex]::Match( $text, '(?s)<label(.*)">' ).Groups.Value, but it's not working.
Any suggestions would be appreciated
Thanks in advance
This regex could work. You can use the -replace operator instead of a call to [Regex]::Replace:
(Get-Content path\to\file -Raw) -replace '<label id="\{[\d\w-]+}">([a-z ]+)<\/label>', '$1'
See https://regex101.com/r/3gbJEp/1 for details.
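For comparison, the same "unwrap the label" idea can be sketched in Python. This is a hedged illustration, not the answer's method: the function name `strip_label_tags` is hypothetical, and the pattern is a looser variant (`[^>]*` for the attributes, a lazy `.*?` for the text) chosen so that attributes or text broken across lines still match.

```python
import re

def strip_label_tags(html: str) -> str:
    """Replace each <label ...>text</label> with just its text."""
    # [^>]* skips the attributes (newlines included); (.*?) lazily captures
    # the label text; DOTALL lets the text itself span several lines.
    return re.sub(
        r'<label\b[^>]*>(.*?)</label>',
        r'\1',
        html,
        flags=re.IGNORECASE | re.DOTALL,
    )

sample = (
    '<td style="cursor: default;">'
    '[<label id="{0a4a7240-9606-416a-bf7b-ef11a47cca8e}">First name</label>] '
    '[<label id="{94263497-683b-46f9-ba0f-69f4c2736598}">Last name</label>]</td>'
)
print(strip_label_tags(sample))
```

Like any regex over HTML, this sketch assumes labels are not nested and that no attribute value contains a `>` character.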
It is generally a bad idea to attempt to parse HTML with regular expressions.
Instead, use a dedicated HTML parser such as the HtmlDocument class.
Example:
function ParseHtml($String) {
    $Unicode = [System.Text.Encoding]::Unicode.GetBytes($String)
    $Html = New-Object -Com 'HTMLFile'
    if ($Html.PSObject.Methods.Name -Contains 'IHTMLDocument2_Write') {
        $Html.IHTMLDocument2_Write($Unicode)
    } else {
        $Html.write($Unicode)
    }
    $Html.Close()
    $Html
}
$Html = ParseHtml ' # Your Html
<p>
...
<p> </p>
'
$Html.getElementsByTagName('label') |ForEach-Object { $Null = $_.removeNode() }
$Html.body.innerHtml
<P></P>
<TABLE dropzone="copy">
<TBODY>
<TR>
<TD class=label style="CURSOR: default">Full Name</TD>
<TD style="CURSOR: default">[First name] [Last name]</TD></TR>
<TR>
<TD class=label style="CURSOR: d
efault">Telephone</TD>
<TD style="CURSOR: default">[Phone]</TD></TR>
<TR>
<TD class=label style="CURSOR: default">Email</TD>
<TD style="CURSOR: default">[Email]</TD></TR>
<TR>
<TD class=label style="CURSOR: default">Message</TD>
<TD style="CURSOR: default">[ Your Message]</TD></TR>
<TR>
<TD class=label style="CURSOR: default">Company</TD>
<TD style="CURSOR: default">[Company]</TD></TR></TBODY></TABLE>
<P></P>
<P></P>
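
A parser-based version of the same idea can also be sketched with Python's standard-library `html.parser`. This is only an assumption-laden illustration: the class name `LabelStripper` and the helper `strip_labels` are hypothetical, and the sketch simply re-emits everything it sees except the `label` start and end tags.

```python
from html.parser import HTMLParser

class LabelStripper(HTMLParser):
    """Re-emit the input, dropping <label> tags but keeping their text."""

    def __init__(self):
        # Keep entities such as &nbsp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "label":
            self.out.append(self.get_starttag_text())  # original tag text

    def handle_startendtag(self, tag, attrs):
        if tag != "label":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "label":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

def strip_labels(html: str) -> str:
    parser = LabelStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)

cell = '<td style="cursor: default;">[<label id="{ce68e02e-e9fd-40ee-9375-ee1b05972e9b}">Phone</label>]</td>'
print(strip_labels(cell))
```

Unlike the COM-based output above, this sketch does not normalize the markup (tag case and attribute quoting are left as written).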

Get verification code from a html string code using regex

I am writing an automation script that reads Gmail messages through the API, and I am getting the HTML content below. I only need the code 191418 from this content, and I want to extract it using regex. I tried this:
.*([0-9]{6})
to find the 6-digit code, but it returns 10 matches. I am not good at regex; can someone please help me get the code using regex?
<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr"><br></div><u></u>
<div>
<center id="m_-2051398760120817894wrapper">
<table id="m_-2051398760120817894main" width="100%">
<tbody><tr id="m_-2051398760120817894logo">
<td>
<table width="100%">
<tbody><tr>
<td>
<img src="test.com/logo.png" width="140px" alt="xxxxx Logo" style="padding:0 10px">
</td>
</tr>
</tbody></table>
</td>
</tr>
<tr>
<td height="18px"></td>
</tr>
<tr id="m_-2051398760120817894header">
<td>
<table width="100%">
<tbody><tr>
<td height="64px" style="background-color:#10069f;color:#fff;padding-left:24px;font-weight:700">Reset your password</td>
</tr>
</tbody></table>
</td>
</tr>
<tr id="m_-2051398760120817894content">
<td>
<table width="100%">
<tbody><tr>
<td style="background-color:#f6f5ff;padding:24px 24px 16px 24px">
<p style="margin-top:0">The following is the verification code required to complete your password reset.</p>
<p style="margin-bottom:24px">Enter the following verification code on the screen during the registration, and proceed to the next step.</p>
<div style="display:block;text-align:center;margin-bottom:8px;background-color:#fff;height:92px;font-weight:600;font-size:36px;line-height:92px">191418</div>
<span style="display:block;font-size:12px;color:#5d5d5d">*The verification code is valid only for 24 hours.</span>
</td>
</tr>
</tbody></table>
</td>
</tr>
<tr>
<td height="24px"></td>
</tr>
<tr id="m_-2051398760120817894footer">
<td>
<table width="100%">
<tbody><tr>
<td style="background-color:#6d7777;padding:16px 24px;font-size:12px;color:#fff">
<table width="100%">
<tbody><tr>
<td id="m_-2051398760120817894footer-left">
<span style="display:block">amnimo Inc.</span>
<span style="display:block">0-3-30 usaa-fso, xxxxxxxx-shi, Tokyo, 180-8750, Japan</span>
<span style="display:block">Phone: +81-422-52-6779</span>
<span id="m_-2051398760120817894copyright-mb" style="margin-top:16px">© 2020 <div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr"><br></div><u></u>
<div>
<center id="m_-2051398760120817894wrapper">
<table id="m_-2051398760120817894main" width="100%">
<tbody><tr id="m_-2051398760120817894logo">
<td>
<table width="100%">
<tbody><tr>
<td>
<img src="https://test.com/logo.png" width="140px" alt="Amnimo Logo" style="padding:0 10px">
</td>
</tr>
</tbody></table>
</td>
</tr>
<tr>
<td height="18px"></td>
</tr>
<tr id="m_-2051398760120817894header">
<td>
<table width="100%">
<tbody><tr>
<td height="64px" style="background-color:#10069f;color:#fff;padding-left:24px;font-weight:700">Reset your password</td>
</tr>
</tbody></table>
</td>
</tr>
<tr id="m_-2051398760120817894content">
<td>
<table width="100%">
<tbody><tr>
<td style="background-color:#f6f5ff;padding:24px 24px 16px 24px">
<p style="margin-top:0">The following is the verification code required to complete your password reset.</p>
<p style="margin-bottom:24px">Enter the following verification code on the screen during the registration, and proceed to the next step.</p>
<div style="display:block;text-align:center;margin-bottom:8px;background-color:#fff;height:92px;font-weight:600;font-size:36px;line-height:92px">191418</div>
<span style="display:block;font-size:12px;color:#5d5d5d">*The verification code is valid only for 24 hours.</span>
</td>
</tr>
</tbody></table>
</td>
</tr>
<tr>
<td height="24px"></td>
</tr>
<tr id="m_-2051398760120817894footer">
<td>
<table width="100%">
<tbody><tr>
<td style="background-color:#6d7777;padding:16px 24px;font-size:12px;color:#fff">
<table width="100%">
<tbody><tr>
<td id="m_-2051398760120817894footer-left">
<span style="display:block">test Inc.</span>
<span style="display:block">2-9-32 ssdsa-sss, puakano-shi, Tokyo, 000-8000, Japan</span>
<span style="display:block">Phone: +81-000-00-652</span>
<span id="m_-2051398760120817894copyright-mb" style="margin-top:16px">© 2020 amnimo Inc.</span>
</td>
<td id="m_-2051398760120817894footer-right">
<span style="display:block">© 2020 amnimo Inc.</span>
</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</center>
</div>
</div></div> Inc.</span>
</td>
<td id="m_-2051398760120817894footer-right">
<span style="display:block">© 2020 test Inc.</span>
</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</center>
</div>
</div></div>
You should use a DOM library that lets you query the element you want and get its content. Parsing HTML with regex is a bad idea.
If you must do it, matching six digits is not enough. After inspecting the markup, I see that the code is the content of a div, so I would write something along the lines of:
<div[^>]*>\d{6}<\/div>
Pattern explanation:
<div - match <div literally
[^>]* - match zero or more characters other than >
> - match > literally
\d{6} - match 6 digits
<\/div> - match </div> literally
EDIT
In order to extract the desired text, use a capturing group:
<div[^>]*>(\d{6})<\/div>
Then the text in the first capturing group will be your desired result.
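As a rough illustration of the capturing-group approach in Python (the variable names are hypothetical and the HTML is a shortened excerpt of the email from the question):

```python
import re

# Shortened excerpt of the email body from the question (assumed sample).
html = '''<td height="64px" style="background-color:#10069f;color:#fff">Reset your password</td>
<div style="display:block;text-align:center;font-size:36px;line-height:92px">191418</div>
<span style="display:block">Phone: +81-422-52-6779</span>'''

# Only a <div> whose entire content is six digits can match.
match = re.search(r'<div[^>]*>(\d{6})</div>', html)
code = match.group(1) if match else None
print(code)
```

Checking `match` before calling `group(1)` avoids an exception when an email arrives without a code.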
Maybe try word boundaries, which will prevent matching inside longer numbers:
\b([0-9]{6})\b
https://regex101.com/r/dQAiHU/1/
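A small Python sketch of why the word boundaries help. The sample text is an assumed condensed mix of fragments from the email (a long numeric id, a colour value, the code, a postal code), and the variable names are made up for illustration.

```python
import re

# Condensed fragments from the email in the question (assumed sample).
text = ('<table id="m_-2051398760120817894main" style="background-color:#10069f">'
        '<div>191418</div> Tokyo, 180-8750')

# Without boundaries, six-digit slices of the long id also match.
loose = re.findall(r'[0-9]{6}', text)

# With \b on both sides, only a standalone run of exactly six digits matches.
strict = re.findall(r'\b[0-9]{6}\b', text)

print(loose)
print(strict)
```

Note that `\b` treats letters, digits and underscores as word characters, so digits glued to letters (as in the id) are rejected, while digits wrapped in `>` and `<` are accepted.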

Hyperlink in custom display form

I am trying to add a field that holds a hyperlink in a custom display form. I have the following code:
<td width="190px" valign="top" class="ms-formlabel">
<H3 class="ms-standardheader">
<nobr>ATO</nobr>
</H3>
</td>
<td width="400px" valign="top" class="ms-formbody">
<a href="https://intelshare.intelink.gov/sites/carm/Shared%20Documents/ATO%20Letters/' + @eMASSID + '.pdf'">
<xsl:value-of select="@ATO"/>
</a>
</td>
</tr>
However, it only shows the "ATO" text; it doesn't display the hyperlink. I am pretty new to SharePoint Designer, so I'm not too sure where to go from here.
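Sample test demo (the field reference goes inside curly braces in the href, as an attribute value template, rather than being concatenated with + inside the attribute):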
<tr>
<td width="190px" valign="top" class="ms-formlabel">
<H3 class="ms-standardheader">
<nobr>Employee Type</nobr>
</H3>
</td>
<td width="400px" valign="top" class="ms-formbody">
<a href="https://intelshare.intelink.gov/sites/carm/Shared%20Documents/ATO%20Letters/{@Company_x0020_name}.pdf">
<xsl:value-of select="@Employee_x0020_Type"/>
</a>
<xsl:value-of select="@Employee_x0020_Type"/>
</td>
</tr>

Code after CFInclude seems to disappear or is not rendered

I'm having some issues with a ColdFusion application. I'm trying to add a <cfinclude template="header.cfm"/>, and it renders correctly, but the rest of the CF code seems to disappear. I'm not sure whether it's not being rendered or just not showing up because of the cfinclude statement running. This is for a page header I'm trying to insert.
Is there a way to insert the cfinclude and have it stop so the rest of the page can process? Does my question make sense?
<table width="600" border="0" align="center" cellpadding="0" cellspacing="0">
<!-- fwtable fwsrc="header.png" fwbase="default.gif" fwstyle="Dreamweaver" fwdocid = "742308039" fwnested="1" -->
<tr>
<td><img name="grantpro" src="images/grantpro.gif" width="411" height="80" border="0" alt=""></td>
<td><img name="gpimage" src="images/gpimage.jpg" width="189" height="80" border="0" alt=""></td>
</tr>
<tr>
<td colspan="2" align="center">
<table width="599px" border="0" align="center" cellpadding="1" cellspacing="1">
<tr>
<td colspan="4"><div align="center"><font size="5"><strong>FDC Menu</strong></font></div></td>
</tr>
<td colspan="3"><strong>FDC Pending Proposals:</strong></td>
</tr>
<tr>
<td> </td>
<td colspan="2">By Applicant Name</td>
</tr>
<tr>
<td> </td>
<td colspan="2">By Grant Type</td>
</tr>
<tr>
<td> </td>
<td colspan="2"> </td>
</tr>
<tr>
<td colspan="3"><strong>FDC Funded Proposals:</strong></td>
</tr>
<tr>
<td> </td>
<td colspan="2"><strong><em>Current Year</em></strong></td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>By Applicant Name</td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>By Grant Type</td>
</tr>
<tr>
<td> </td>
<td colspan="2"><em><strong>Prior Years</strong></em></td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>By Applicant Name </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>By Grant Type</td>
<cfinclude template="cssmenu/header.cfm"/>
</table>
<p align="center"><strong>Logout</strong></p> </td>
The following code shows where the problem is:
<tr>
<td> </td>
<td> </td>
<td>By Grant Type</td>
<cfinclude template="cssmenu/header.cfm"/>
</table>
Solution 1 (recommended):
The <cfinclude> should probably be moved outside of the table.
Solution 2:
cssmenu/header.cfm would need to finish the current table row and start a new one. This is not recommended, because it is not modular at all.
</tr>
<tr>
<td colspan="3">
... Content goes here ...
</td>
</tr>
You are missing a </tr> before the <cfinclude>. It also seems like an odd place to include a header. Rather, add another table row and <td> and include the header inside the <td>, not in between the table markup, as this is what causes it to break.

Using Regular Expressions in DW to copy attributes from one tag into another

I'm trying to copy an <a> tag, with its attributes, from one place in my HTML into another via regular expressions in Dreamweaver. Specifically, I would like to take the following code:
<i class="icon-camera"></i></td>
<td class="lastName"><a name="smith" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/joe-smith/" rel="nofollow">Smith</a>
and do a Find/Replace with Regular Expressions enabled, so that the code is replaced with the following syntax:
<a name="smith" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/joe-smith/" rel="nofollow"><i class="icon-camera"></i></a></td>
<td class="lastName"><a name="smith" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/joe-smith/" rel="nofollow">Smith</a>
Basically, it's wrapping the <i></i> tag pair with the same <a> tag used in the next line.
The Find/Replace I've tried so far is:
Find:
<i class="icon-camera"></i></td>
<td class="lastName"><a(.*)>(.*)</a>
Replace:
<a$1><i class="icon-camera"></i></a></td>
<td class="lastName"><a$1>$2</a>
Also, to be clear, I'm trying to do this for about 300 (out of about 450) instances where <i class="icon-camera"></i> exists in my HTML. So some sample data to use would look like:
<tr>
<td class="photo" style="text-align: center;" align="center"><i class="icon-camera"></i></td>
<td class="lastName"><a name="davis" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/kathy-aldrich/" rel="nofollow">Davis</a></td>
<td class="firstName"><a name="david" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/kathy-aldrich/" rel="nofollow">David</a></td>
<td class="businessPhone">509-555-2353</td>
<td class="emailAddress">davidd@mywebsite.com</td>
<td class="office">1822</td>
<td class="department">Shipping</td>
</tr>
<tr>
<td class="photo" style="text-align: center;" align="center"><i class="icon-camera"></i></td>
<td class="lastName"><a name="allen" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/alan-allen/" rel="nofollow">Allen</a></td>
<td class="firstName"><a name="alan" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/alan-allen/" rel="nofollow">Alan</a></td>
<td class="businessPhone">509-555-2027</td>
<td class="emailAddress">alana@mywebsite.com</td>
<td class="office">1481</td>
<td class="department">Marketing</td>
</tr>
<tr>
<td class="photo" style="text-align: center;" align="center"> </td>
<td class="lastName"><a name="buttons" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/benjamin-buttons/" rel="nofollow">Buttons</a></td>
<td class="firstName"><a name="benjamin" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/benjamin-buttons/" rel="nofollow">Benjamin</a></td>
<td class="businessPhone">509-555-2250</td>
<td class="emailAddress">benjaminb@mywebsite.com</td>
<td class="office">3013</td>
<td class="department">HR</td>
</tr>
<tr>
<td class="photo" style="text-align: center;" align="center"><i class="icon-camera"></i></td>
<td class="lastName"><a name="Lenore" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/luis-lenore/" rel="nofollow">Lenore</a></td>
<td class="firstName"><a name="luis" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/luis-lenore/" rel="nofollow">Luis</a></td>
<td class="businessPhone">509-555-2042</td>
<td class="emailAddress">luisl@mywebsite.com</td>
<td class="office">1432</td>
<td class="department">Support</td>
</tr>
In the Find box, put the following:
(<i class="icon-camera"></i>)(</td>)
and in Replace box:
<a name="smith" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/joe-smith/" rel="nofollow">$1</a>$2
To find the first occurrence of <i class="icon-camera"></i> and wrap it with a copy of the next <a> tag:
Find:
(<i class="icon-camera"></i>)([\s\S]*?)(<a [^>]*>)
Replace:
$3$1</a>$2$3
Notice I used [\s\S]*?, which works like .*? except that it also matches newlines.
The trailing ? makes the quantifier lazy.
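The find/replace above can be tried outside Dreamweaver with a short Python sketch. This is an illustration under assumptions: the variable names are hypothetical, the row is a trimmed version of the question's sample, and Python writes backreferences as `\1` where Dreamweaver uses `$1`.

```python
import re

# Trimmed two-cell sample based on the question's markup (assumed data).
row = '''<td class="photo"><i class="icon-camera"></i></td>
<td class="lastName"><a name="smith" class="lbp_secondary" href="http://www.mywebsite.com/contact-us/directory/joe-smith/" rel="nofollow">Smith</a></td>'''

# Group 1: the icon; group 2: everything (lazily, newlines included) up to
# the next opening <a ...> tag; group 3: that opening <a ...> tag.
pattern = re.compile(r'(<i class="icon-camera"></i>)([\s\S]*?)(<a [^>]*>)')

# Emit: <a ...> icon </a>, then the skipped text, then the original <a ...>.
wrapped = pattern.sub(r'\3\1</a>\2\3', row)
print(wrapped)
```

Because the middle group is lazy, each icon is paired with the nearest following `<a>` tag; rows without an icon are left unchanged.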