perl regex for complex multiline search replace - regex

I know there are many questions on this topic, but most are fairly trivial and
I'm unable to find a solution for my case.
I have a set of HTML files with many, many "media" items like the following,
each of which is a "paragraph", separated by "\n\n". Here is a link to a sample file of the type I'm working on.
<li class="media">
<div class="media-left">
<a href="#">
<img class="media-object" src="4_17-HE-assoc.png" width="250" alt="...">
</a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 4.17</h4>
Association plot for the hair-color eye-color data. Left: marginal table, collapsed over
gender; right: full table.
</div>
</li>
For each <img ...> tag, I need to find the src="file" value, and replace the href="#" on the previous line
with href="file" class="fancybox", i.e., so that the item will then look like
<li class="media">
<div class="media-left">
<a href="4_17-HE-assoc.png" class="fancybox">
<img class="media-object" src="4_17-HE-assoc.png" width="250" alt="...">
</a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 4.17</h4>
Association plot for the hair-color eye-color data. Left: marginal table, collapsed over
gender; right: full table.
</div>
</li>
I tried the following as a one-liner, but it has no effect, i.e., it doesn't make the changes.
perl -pi~ -e '$/ = "";s|<a href="#">\n(\s*<img class="media object") src=(".*png")|<a class="fancybox" href="\2">\n\1 src=\2|ms' ch03.html
Can someone help with this? I'd be happy with a simple script that I could
use for this and modify for other similar modifications of a collection of web files.
edit: I'm aware of the advantages of using perl modules such as HTML::TreeBuilder to avoid having to parse HTML directly. If someone
could give me a start, I could probably take it from there.

use strict;
use warnings;
use XML::LibXML qw( );

my $qfn = 'ch03.html';
my $in_qfn = $qfn . "~";
my $out_qfn = $qfn;

# Keep the original as a backup (ch03.html~) and read from it.
rename($qfn, $in_qfn)
   or die("Can't rename \"$qfn\": $!\n");

my $parser = XML::LibXML->new();
my $doc = $parser->parse_html_file($in_qfn);

# For every placeholder link, copy the src of its first <img> child into href.
for my $a_node ($doc->findnodes('//a[@href="#"]')) {
   my ($src_node) = $a_node->findnodes('img[1]/@src')
      or next;

   $a_node->setAttribute('href', $src_node->value());
   $a_node->setAttribute('class', 'fancybox');
}

my $html = $doc->toStringHTML();
open(my $fh, '>', $out_qfn)
   or die("Can't create \"$out_qfn\": $!\n");
print($fh $html);
Tested:
$ diff -u ch03.html{~,}
--- ch03.html~ 2016-01-20 12:41:30.809203040 -0800
+++ ch03.html 2016-01-20 12:41:31.009201042 -0800
@@ -1,7 +1,7 @@
-<div class="contents">
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
+<html><body><div class="contents">
<h1 class="tocpage">Chapter 3: Fitting and Graphing Discrete Distributions</h1>
<hr class="tocpage">
-
<div class="row">
<div class="col-md-6">
<!-- prelude-inserted -->
@@ -18,7 +18,7 @@
<div class="col-md-6">
<h3>Contents</h3>
<dl class="chaptoc">
- <dd>3.1. Introduction to discrete distributions</dd>
+<dd>3.1. Introduction to discrete distributions</dd>
<dd>3.2. Characteristics of discrete distributions</dd>
<dd>3.3. Fitting discrete distributions</dd>
<dd>3.4. Diagnosing discrete distributions: Ord plots</dd>
@@ -27,8 +27,7 @@
<dd>3.7. Chapter summary</dd>
<dd>3.8. Lab exercises</dd>
</dl>
-
- </div>
+</div>
</div>
<!-- more-content -->
@@ -38,11 +37,10 @@
<h3>Selected figures</h3>
<a class="btn btn-primary" href="../../Rcode/ch03.R" role="button">view R code</a>
<ul class="media-list">
- <li class="media">
+<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="saxony-barplot.png" width="250" alt="males in Saxony families">
- </a>
+ <a href="saxony-barplot.png" class="fancybox">
+ <img class="media-object" src="saxony-barplot.png" width="250" alt="males in Saxony families"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.2</h4>
@@ -52,9 +50,8 @@
<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="dbinom2-plot2-1.png" width="250" alt="Binomial distributions">
- </a>
+ <a href="dbinom2-plot2-1.png" class="fancybox">
+ <img class="media-object" src="dbinom2-plot2-1.png" width="250" alt="Binomial distributions"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.9</h4>
@@ -64,9 +61,8 @@
<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="dpois-xyplot2-1.png" width="250" alt="Poisson distributions">
- </a>
+ <a href="dpois-xyplot2-1.png" class="fancybox">
+ <img class="media-object" src="dpois-xyplot2-1.png" width="250" alt="Poisson distributions"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.11</h4>
@@ -76,9 +72,8 @@
<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="Fed0-plots2-1.png" width="250" alt="Hanging rootogram">
- </a>
+ <a href="Fed0-plots2-1.png" class="fancybox">
+ <img class="media-object" src="Fed0-plots2-1.png" width="250" alt="Hanging rootogram"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.15</h4>
@@ -89,9 +84,8 @@
<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="ordplot1-1.png" width="250" alt="Ord plot for the Butterfly data">
- </a>
+ <a href="ordplot1-1.png" class="fancybox">
+ <img class="media-object" src="ordplot1-1.png" width="250" alt="Ord plot for the Butterfly data"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.18</h4>
@@ -100,9 +94,10 @@
</div>
</li>
- </ul> <!-- media-list -->
- </div> <!-- col-md-12 -->
+ </ul>
+<!-- media-list -->
+</div> <!-- col-md-12 -->
<!-- footer -->
</div> <!-- row -->
-</div>
+</div></body></html>

I couldn't resist writing this one-off, super unstable, sends-me-to-parse-HTML-with-regex-hell sed command:
sed -i.bak '/<a href="#"/ {
N
/\n.*<img class=/ {
s/^\( *<a href="\).*\(\n.*src="\)\([^"]*\)\(.*\)/\1\3" class="fancybox">\2\3\4/
}
}' ch03.html
This looks for a line with href="#", appends the next line and then substitutes the filename and fancybox into the a tag.
Diffing the result and the input file:
43c43
< <a href="#">
---
> <a href="saxony-barplot.png" class="fancybox">
55c55
< <a href="#">
---
> <a href="dbinom2-plot2-1.png" class="fancybox">
67c67
< <a href="#">
---
> <a href="dpois-xyplot2-1.png" class="fancybox">
79c79
< <a href="#">
---
> <a href="Fed0-plots2-1.png" class="fancybox">
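The same two-line substitution can be sketched in Python for anyone who wants a script to adapt to other files. This is only a rough equivalent, not the sed command itself: the function name add_fancybox and the sample string are made up for illustration, and it assumes the <img> tag always sits on the line directly after the <a href="#"> line, as in the sample above.

```python
import re

# Match an <a href="#"> line followed directly by an <img ... src="..."> line.
# Group 1: leading indentation; group 2: the line break plus the start of the
# <img> tag up to and including the src attribute; group 3: the src value.
PATTERN = re.compile(
    r'^(\s*)<a href="#">(\s*\n\s*<img\b[^>]*?\bsrc="([^"]*)")',
    re.MULTILINE,
)


def add_fancybox(html: str) -> str:
    """Copy each image's src into the preceding link and add class="fancybox"."""
    return PATTERN.sub(r'\1<a href="\3" class="fancybox">\2', html)


sample = '''<div class="media-left">
  <a href="#">
    <img class="media-object" src="4_17-HE-assoc.png" width="250" alt="...">
  </a>
</div>'''

print(add_fancybox(sample))
```

Like the sed version, this is fragile against markup that deviates from the sample; links whose next line is not an <img> are left untouched.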

Related

How to show a detail page of a product with Laravel Livewire

I have a project in Laravel and Livewire.
This is my view displaying a list of projects submitted.
@forelse ($projects as $index => $project)
<div class="max-w-7xl mx-auto sm:px-6 lg:px-8">
<a href="#" wire:click="showProjectDetails({{ $project->slug }})"
class="block p-6 bg-white border border-gray-200 rounded-sm shadow-md hover:bg-gray-100 dark:bg-gray-800 dark:border-gray-700 dark:hover:bg-gray-700">
<h5 class="mb-2 text-2xl font-bold tracking-tight text-blue-500 dark:text-white">
{{ $project->title }}
</h5>
<p class="my-5">Bugdet Ghc500 - Ghc 700</p>
<p class="font-normal text-gray-700 dark:text-gray-400">{{ $project->description }}</p>
<ul class="flex gap-4 text-blue-600 mt-5">
<li>PHP</li>|
<li>Mobile App Development</li>|
<li>Database Management</li>|
<li>C++</li>
</ul>
</a>
</div>
@empty
<div class="max-w-7xl mx-auto sm:px-6 lg:px-8">
<p class="text-4xl">No Projects found</p>
</div>
@endforelse
When I click on a single project, a new view called show-project should display the details of the particular project that has been clicked.
My Livewire controller
public function showProjectDetails($slug)
{
return redirect()->route('show-project.details', $slug);
}
<a href="{{ route('show.project_details', ['slug' => $project->slug]) }}">
The route in web.php can then point to either a controller or a Livewire page.

bootstrap data toggle tab is working only one tab but the other tab is not working

<div class="widget-header" style="margin-top: 5%;">
<ul class="nav nav-tabs">
<li class="<c:if test="${tab != 'LETTERS'}">selected active </c:if>inline headerDivider">
<a class="<c:if test="${tab != 'LETTERS'}">active </c:if>header" href="action" data-toggle="tab"><Regulations</a>
</li>
<li class="<c:if test="${tab == 'LETTERS'}">selected active </c:if>inline headerDivider">
<a class="<c:if test="${tab == 'LETTERS'}">>selected active </c:if>header" href="action" data-toggle="tab">Letters</a>
</li>
<c:choose>
<c:when test="${tab != 'LETTERS'}">
<a data-href="action" class="btn btn-small" data-toggle="modal" data-reload="regulationDiv">Manage Regulations</a>
</c:when>
<c:otherwise>
<a data-href="action" class="btn btn-small" data-toggle="modal" data-reload="letterDiv">Add Letter</a>
</c:otherwise>
</c:choose>
</ul>
</div>
<div style="overflow: auto; max-height:60vh;" class="tab-content">
<c:if test="${tab != 'LETTERS'}">
<div id="regulationDiv" data-url='action'>
<jsp:include page="regulations.jsp"/>
</div>
</c:if>
<c:if test="${tab == 'LETTERS'}">
<div id="letterDiv" data-url='action'>
<jsp:include page="letters.jsp"/>
</div>
</c:if>
</div>
The first tab works by default, but the second tab does not respond to clicks and looks disabled. Am I missing something that should be included in the div elements? I have tried many approaches and have not found a solution yet.

preg_replace regular expression to replace link within a particular tags

I need some help: I want to replace the href links with my own link, but only within a particular div class.
<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb">
<b class="icon-star"></b> N/A
</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>
</div>
Here I want to change http://oldsite.com/ to http://newsite.com/?id=
so that the href links look like
<a href="http://newsite.com/?id=the-fate-of-the-furious">
Please help me with a preg_replace regular expression.
Thanks
This may help you:
$content = get_the_content();
$pattern = "/(?<=href=(\"|'))[^\"']+(?=(\"|'))/";
$newurl = get_permalink();
$content = preg_replace($pattern,$newurl,$content);
echo $content;
Lookbehinds are too expensive; use \K to restart the fullstring match and avoid a capture group.
<a href="\K[^"]+\/
This pattern will be very efficient. I should state that this pattern will match ALL <a href urls. It also matches greedily until it finds the last / in the url; I assume this is okay given your input sample.
Code:
$in='<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb"><b class="icon-star"></b> N/A</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>';
echo preg_replace('/<a href="\K[^"]+\//','http://newsite.com/?id=',$in);
Output:
<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb"><b class="icon-star"></b> N/A</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>
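For comparison, here is a rough Python sketch of the same replacement. Python's built-in re module does not support \K, so a capture group keeps the prefix instead; that is a swapped-in technique, not the pattern from the answer. The input string is hypothetical: the link markup is assumed from the URLs mentioned in the question.

```python
import re

# Hypothetical input: the <a> tag is assumed, based on the URLs the question
# mentions (http://oldsite.com/ should become http://newsite.com/?id=).
html = '<a href="http://oldsite.com/the-fate-of-the-furious">The Fate of the Furious</a>'

# Group 1 keeps the literal prefix. [^"]+/ greedily consumes the URL up to its
# last slash, which is the part that gets replaced.
pattern = re.compile(r'(<a href=")[^"]+/')

result = pattern.sub(r'\g<1>http://newsite.com/?id=', html)
print(result)
```

As with the PHP pattern, this rewrites every <a href> in the input, not only those inside one particular div.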

Extracting all dojo attach point values from HTML

I have a saved HTML page which I've opened in Notepad++. I would like to extract all the attach points from the HTML file. Example from the HTML below:
<div class="contentBar">
<div class="banner" style="">
<span class="bannerRepeat"></span>
<span class="bannerDecal"></span>
</div>
<div>
<div class="logo" data-dojo-attach-point="pageLogoPt">
ABC
</div>
<div class="title" data-dojo-attach-point="pageTitlePt">
ABC
</div>
<div class="userPane">
<div>
<span class="LoginCell LoginText"><span data-dojo-attach-point="welcomeBlockPt">Welcome</span>, <b data-dojo-attach-point="usernameBlockPt">User Name</b></span>
<span widgetid="acme_Button_0" id="acme_Button_0" class="LoginCell Button" data-dojo-type="acme.Button" data-dojo-props="size: 'small'" data-dojo-attach-point="logOutButtonPt"><span widgetid="dijit_form_Button_0" class="dijit dijitReset dijitInline dijitButton ButtonSmall" role="presentation"><span class="dijitReset dijitInline dijitButtonNode" data-dojo-attach-event="ondijitclick:__onClick" role="presentation"><span style="-moz-user-select: none;" aria-disabled="false" id="dijit_form_Button_0" tabindex="0" class="dijitReset dijitStretch dijitButtonContents" data-dojo-attach-point="titleNode,focusNode" role="button" aria-labelledby="dijit_form_Button_0_label"><span class="dijitReset dijitInline dijitIcon dijitNoIcon" data-dojo-attach-point="iconNode"></span><span class="dijitReset dijitToggleButtonIconChar">●</span><span class="dijitReset dijitInline dijitButtonText" id="dijit_form_Button_0_label" data-dojo-attach-point="containerNode">Logout</span></span></span><input value="" class="dijitOffScreen" data-dojo-attach-event="onclick:_onClick" tabindex="-1" role="presentation" aria-hidden="true" data-dojo-attach-point="valueNode" type="button"></span></span>
</div>
<div>
<span id="printLink" style="display:none;">Print</span>
<span id="zoomPercentageDisplay"><span data-dojo-attach-point="zoomBlockPt">Zoom</span>: 100%</span>
<span id="smallFontSizeLink" style="font-size: .8em;">A</span>
<span id="defaultFontSizeLink" style="font-size: 1em;">AA</span>
<span id="largeFontSizeLink" style="font-size: 1.2em;">AAA</span>
</div>
</div>
</div>
</div>
I would like to get:
pageLogoPt
pageTitlePt
welcomeBlockPt
usernameBlockPt
etc ...
Is this possible? Thanks
You can do the following:
Replace (data-dojo-attach-point="[^"]+)(?=") with \n\1\n. This will put what you're looking for on separate lines.
Mark All based on the regex data-dojo-attach-point="[^"]+. Tick "Bookmark line" checkbox.
Search -> Bookmark -> Remove Unmarked Lines
Replace data-dojo-attach-point=" with blank.
This will give you your list with each item in its own line.
Tested on Notepad++ 6.8.8.
Inspired by https://superuser.com/questions/477628/export-all-regular-expression-matches-in-textpad-or-notepad-as-a-list.
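If a script is an option, the same extraction can be sketched in a few lines of Python. The HTML below is a shortened, hypothetical excerpt of the page in the question, and splitting comma-separated values such as "titleNode,focusNode" into separate names is an assumption about what the list should contain.

```python
import re

# Shortened, hypothetical excerpt of the page from the question.
html = '''
<div class="logo" data-dojo-attach-point="pageLogoPt">ABC</div>
<div class="title" data-dojo-attach-point="pageTitlePt">ABC</div>
<span data-dojo-attach-point="welcomeBlockPt">Welcome</span>,
<b data-dojo-attach-point="usernameBlockPt">User Name</b>
<span data-dojo-attach-point="titleNode,focusNode" role="button"></span>
'''

# Capture the attribute value between the double quotes.
values = re.findall(r'data-dojo-attach-point="([^"]+)"', html)

# Assumption: one attribute may list several attach points separated by commas.
attach_points = [name.strip() for value in values for name in value.split(",")]

print("\n".join(attach_points))
```

To run it on the saved page, read the file contents into html instead of using the inline string.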

Selenium Python UnboundLocalError: local variable 'element' referenced before assignment

I am trying to click on a span tag which contains the text "Clean feed crm"
using an XPATH locator.
I get the error:
UnboundLocalError: local variable 'element' referenced before assignment
Full error trace:
Traceback (most recent call last):
File "C:\Webdriver\ClearCore\TestCases\OperationsPage_TestCase.py", line 56, in test_add_and_run_clean_process
process_lists_page.click_clean_feed_task_from_groups_tab(Globals.process_lists_clean_feed_task_crm)
File "C:\Webdriver\ClearCore\Pages\operations.py", line 90, in click_clean_feed_task_from_groups_tab
clean_feed_crm_element = self.get_element(By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
File "C:\Webdriver\ClearCore 501\Pages\base.py", line 31, in get_element
return element
UnboundLocalError: local variable 'element' referenced before assignment
If I use the full absolute XPath it works fine. With the relative XPath I get the error.
The full absolute XPATH which works is:
(By.XPATH, 'html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[7]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[3]/div/div[2]/div/div[1]/div/div/div/div/div[1]/div[1]/div[2]/div/div[1]/div[1]/div/div/div[2]/div/div[2]/span[1]/span')
The relative XPATH which does not work is:
(By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
The HTML is:
<div id="operations_add_process_list_ct_groups_and_tasks" class="GPI5XK1CDG" __gwtcellbasedwidgetimpldispatchingfocus="true" __gwtcellbasedwidgetimpldispatchingblur="true" role="tree">
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-expanded="true" aria-level="1">
<div class="GPI5XK1CIF GPI5XK1CAG" style="padding-left: 0px;">
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-level="2">
<div class="GPI5XK1CIF" style="padding-left: 16px;">
<div class="GPI5XK1CIF GPI5XK1CKF" style="padding-left: 16px;position:relative;" onclick="">
<div style="position:absolute;display:none;"/>
<div class="GPI5XK1CLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0" style="width:16px;height:16px;background:url() no-repeat 0px 0px;" src="http://justin-pc.infoshare.local:8080/clearcore501/ClearCore/clear.cache.gif" onload="this.__gwtLastUnhandledEvent="load";"/>
</div>
<div>
<span>
<span class=" myinlineblock" title="Clean feed crm" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;width:100%;margin-right:-14px;">Clean feed crm</span>
</span>
<span>
<span class="" title="Turn task off or on." style="">
<input type="checkbox" checked="" tabindex="-1"/>
</span>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="2" aria-level="2">
<div class="GPI5XK1CIF" style="padding-left: 16px;">
<div class="GPI5XK1CIF GPI5XK1CKF" style="padding-left: 16px;position:relative;" onclick="">
<div style="position:absolute;display:none;"/>
<div class="GPI5XK1CLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0" style="width:16px;height:16px;background:url() no-repeat 0px 0px;" src="http://justin-pc.infoshare.local:8080/clearcore501/ClearCore/clear.cache.gif" onload="this.__gwtLastUnhandledEvent="load";"/>
</div>
<div>
<span>
<span class=" myinlineblock" title="Clean feed escr" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;width:100%;margin-right:-14px;">Clean feed escr</span>
</span>
<span>
<span class="" title="Turn task off or on." style="">
<input type="checkbox" checked="" tabindex="-1"/>
</span>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
My method implementation is:
def click_clean_feed_task_from_groups_tab(self, feed):
# Params: feed: clean feed crm, clean feed escr or clean feed orchard
#clean_feed_crm_element = self.driver.find_element(By.XPATH, '//span[@class="myinlineblock" and contains(text(), "%s") % feed]')
clean_feed_crm_element = self.get_element(By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
#clean_feed_crm_element = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//..//.//..//..//..//..//..//..//../span[contains(text(), "%s")] % feed ]')))
clean_feed_crm_element.click()
return self
From my TestCase class I call the method:
project_navigator = ProjectNavigatorPage(self.driver)
process_lists_page = project_navigator.select_projectNavigator_item("Process Lists")
process_lists_page.click_add_button_for_process_lists()
process_lists_page.click_clean_task_arrow_to_expand_it_from_groups_tab("add")
process_lists_page.click_clean_feed_task_from_groups_tab(Globals.process_lists_clean_feed_task_crm)
Globals.py is:
process_lists_clean_feed_task_crm = "Clean feed crm"
I have also tried using WebDriverWait, still with the same error:
clean_feed_crm_element = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable(((By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "%s") % feed]')))
Here %s is filled in from feed, whose value is "Clean feed crm", the text I am looking for (passed in as a parameter to my method).
What am I doing wrong? Which XPath could I use to click the element that has the text "Clean feed crm"?
Thanks,
Riaz
Recall a few elements of XPath syntax:
The expression "//" selects nodes in the document from the current
node that match the selection no matter where they are.
The expression ".." selects the parent of the current node.
Therefore when you write:
//div[@id="operations_add_process_list_ct_groups_and_tasks"]//..
you are selecting the parents of the div and of the nodes under it, which includes the div node itself, so the ".." step is unnecessary. From that node the relative XPath should be:
//div[@id="operations_add_process_list_ct_groups_and_tasks"]//span[contains(text(), "Clean feed crm")]
That way you select the div node with the given id, and look inside it for the span tag which contains the text.
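The descendant search can be tried without a browser. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed, well-formed stand-in for the markup in the question; find_spans is a hypothetical helper, and because ElementTree's limited XPath subset has no contains(), the text test is done in Python. With Selenium, the XPath string above would be passed to find_element(By.XPATH, ...) instead.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the markup in the question.
doc = ET.fromstring('''
<body>
  <div id="operations_add_process_list_ct_groups_and_tasks">
    <div><div>
      <span><span class=" myinlineblock">Clean feed crm</span></span>
      <span><span class=" myinlineblock">Clean feed escr</span></span>
    </div></div>
  </div>
</body>
''')


def find_spans(root, div_id, text):
    """Return every span below the div with the given id whose text contains `text`."""
    # "//span" after the div predicate searches all descendants of that div.
    spans = root.findall(f".//div[@id='{div_id}']//span")
    # ElementTree has no contains(), so filter on the element text here.
    return [span for span in spans if text in (span.text or "")]


matches = find_spans(doc, "operations_add_process_list_ct_groups_and_tasks", "Clean feed crm")
print([m.text for m in matches])
```

The outer span elements have no text of their own, which is why only the inner span is returned.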