Regex: removing specific style attribute in HTML - regex

I have a problem in regex.
I want to remove attribute style html backgroud-image in tag HTML, like this:
<span style="background-image: url ("http://mantis.we.intern/custom/userfiles/image/6y0eC4vzptnIxsikHs0AJA.png"); bgcolor:'red'">11111<br />
<span style="background-image: url('https:// asd asdmantis.we.intern/custom/userfiles/image/6y0eC4vzptnIxsikHs0AJA.png')">22222
<span style="background-image:url('https://fmantis.we.intern/custom/userfiles/image/6y0eC4vzptnIxsikHs0AJA.png')">3333
<span style="background-image:url( 'https://fmantis.we.intern/custom/userfiles/image/6y0eC4vzptnIxsikHs0AJA.png')">444
<span style="background-image: url ( "http://mantis.we.intern/custom/userfiles/image/6y0eC4vzptnIxsikHs0AJA.png");">555<br />
<span style="background-image: url(xx https://trk.workexpert.net/web/include/ckeditor_432/plugins/icons.png?t=E0LB& xx); >666
And the result from regex, could be like this:
<span style=" bgcolor:'red'">11111<br />
<span style="">22222
<span style="">3333
<span style="">444
<span style="">555<br />
<span style=" >666
Thank you, before.

This regex does exactly what you need:
background-image:[^)]*[)];?
It selects the words background-image: and then everything up until the first ) with an optional ;
Regex101 Tested

Related

preg_replace regular expression to replace link within a particular tags

I need one help, i want to replace the href link to my link within a particular div class only.
<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb">
<b class="icon-star"></b> N/A
</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>
</div>
Here i want to change http://oldsite.com/ to http://newsite.com/?id=
i want these href links like
<a href="http://newsite.com/?id=the-fate-of-the-furious">
Please help me with preg_replace regular expression.
Thanks
this may help you
$content = get_the_content();
$pattern = "/(?<=href=(\"|'))[^\"']+(?=(\"|'))/";
$newurl = get_permalink();
$content = preg_replace($pattern,$newurl,$content);
echo $content;
Lookbehinds are too expensive, use \K to start the fullstring match and avoid a capture group.
<a href="\K[^"]+\/ This pattern will be very efficient. I should state that this pattern will match ALL <a href urls. It also matches greedily until it finds the last / in the url -- I assume this is okay by your input sample.
Pattern Demo
Code (PHP Demo):
$in='<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb"><b class="icon-star"></b> N/A</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>';
echo preg_replace('/<a href="\K[^"]+\//','http://newsite.com/?id=',$in);
Output:
<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb"><b class="icon-star"></b> N/A</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>

Selenium Python Xpath how to select the correct span text from many nested div tags

I have a web page with a left hand menu. It is made up of many div tags.
I have noticed when my Selenium Python script runs it is not clicking the text I want clicked from the left hand menu. It is clicking something else.
My Xpath is not correct.
I would like to locate the text "Statistics" (it is in a div\span tag) which has the parent div text "Analysis"
It is not clicking the correct text "Statistics" because there maybe another "Statistics" somewhere in the HTML source. If i start from the div tag which has the text "Analysis" and then find the text "Statistics" then I will get the correct element.
My Xpath is:
.//div//span[#title="Analysis"]/following::div[5]//span[text()="Statistics"]
The HTML is:
<div>
<span class="" title="Analysis"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;">Analysis</span>
</div>
</div>
</div>
</div>
</div>
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-expanded="false"
aria-level="2">
<div class="GJPPK2LBIF" style="padding-left: 16px;">
<div class="GJPPK2LBIF GJPPK2LBKF" style="padding-left: 16px;position:relative;" onclick="">
<div class="GJPPK2LBJF" style="left: 0px;width: 15px;height: 15px;position:absolute;">
<img border="0"
style="width:15px;height:15px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA8AAAAPCAYAAAA7/HbnjJn53wAAAABJRU5ErkJggg==) no-repeat 0px 0px;"
src="http://test1:8080/clearcore/ClearCore/clear.cache.gif"
onload="this.__gwtLastUnhandledEvent=" load";"/>
</div>
<div class="GJPPK2LBLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0"
style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABmJLR0QA/wD/AP+gvaekJggg==) no-repeat 0px 0px;"
src="http://test1:8080/clearcore/ClearCore/clear.cache.gif"
onload="this.__gwtLastUnhandledEvent=" load";"/>
</div>
<div>
<span class="" title="Statistics"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;">Statistics</span>
</div>
</div>
</div>
</div>
</div>
</div>
Thanks,
Riaz
If you have FireFox with FirePath you can test the xpath and see how many and which matches you get. For instance:
//span[text()="Statistics"]
This may result in 1 matching node but also in more. Let's assume there's two matches and the one you want is the second one. Then you'd choose:
//span[text()="Statistics"][2]

Extracting all dojo attach point values from HTML

I have a saved HTML page which I've opened in notepad++. I would like to extract all the attach points out of the html file. Example from the HTML below:
<div class="contentBar">
<div class="banner" style="">
<span class="bannerRepeat"></span>
<span class="bannerDecal"></span>
</div>
<div>
<div class="logo" data-dojo-attach-point="pageLogoPt">
ABC
</div>
<div class="title" data-dojo-attach-point="pageTitlePt">
ABC
</div>
<div class="userPane">
<div>
<span class="LoginCell LoginText"><span data-dojo-attach-point="welcomeBlockPt">Welcome</span>, <b data-dojo-attach-point="usernameBlockPt">User Name</b></span>
<span widgetid="acme_Button_0" id="acme_Button_0" class="LoginCell Button" data-dojo-type="acme.Button" data-dojo-props="size: 'small'" data-dojo-attach-point="logOutButtonPt"><span widgetid="dijit_form_Button_0" class="dijit dijitReset dijitInline dijitButton ButtonSmall" role="presentation"><span class="dijitReset dijitInline dijitButtonNode" data-dojo-attach-event="ondijitclick:__onClick" role="presentation"><span style="-moz-user-select: none;" aria-disabled="false" id="dijit_form_Button_0" tabindex="0" class="dijitReset dijitStretch dijitButtonContents" data-dojo-attach-point="titleNode,focusNode" role="button" aria-labelledby="dijit_form_Button_0_label"><span class="dijitReset dijitInline dijitIcon dijitNoIcon" data-dojo-attach-point="iconNode"></span><span class="dijitReset dijitToggleButtonIconChar">●</span><span class="dijitReset dijitInline dijitButtonText" id="dijit_form_Button_0_label" data-dojo-attach-point="containerNode">Logout</span></span></span><input value="" class="dijitOffScreen" data-dojo-attach-event="onclick:_onClick" tabindex="-1" role="presentation" aria-hidden="true" data-dojo-attach-point="valueNode" type="button"></span></span>
</div>
<div>
<span id="printLink" style="display:none;">Print</span>
<span id="zoomPercentageDisplay"><span data-dojo-attach-point="zoomBlockPt">Zoom</span>: 100%</span>
<span id="smallFontSizeLink" style="font-size: .8em;">A</span>
<span id="defaultFontSizeLink" style="font-size: 1em;">AA</span>
<span id="largeFontSizeLink" style="font-size: 1.2em;">AAA</span>
</div>
</div>
</div>
</div>
I would like to get:
pageLogoPt
pageTitlePt
welcomeBlockPt
usernameBlockPt
etc ...
Is this possible? Thanks
You can do the following:
Replace (data-dojo-attach-point="[^"]+)(?=") with \n\1\n. This will put what you're looking for on separate lines.
Mark All based on the regex data-dojo-attach-point="[^"]+. Tick "Bookmark line" checkbox.
Search -> Bookmark -> Remove Unmarked Lines
Replace data-dojo-attach-point=" with blank.
This will give you your list with each item in its own line.
Tested on Notepad++ 6.8.8.
Inspired by https://superuser.com/questions/477628/export-all-regular-expression-matches-in-textpad-or-notepad-as-a-list.

How can I extract URLs from html content with ruby regexp?

Lets go directly with an example since it is not easy to explain:
<li id="l_f6a1ok3n4d4p" class="online"> <div class="link"> random strings - 4 <a style="float:left; display:block; padding-top:3px;" href="http://www.webtrackerplus.com/?page=flowplayerregister&a_aid=&a_bid=&chan=flow"><img border="0" src="/resources/img/fdf.gif"></a> <!-- a class="none" href="#">random strings - 4 site2.com - # - </a --> </div> <div class="params"> <span>Submited: </span>7 June 2015 | <span>Host: </span>site2.com </div> <div class="report"> <a title="" href="javascript:report(3191274,%203,%202164691,%201)" class="alert"></a> <a title="" href="javascript:report(3191274,%203,%202164691,%200)" class="work"></a> <b>100% said work</b> </div> <div class="clear"></div> </li> <li id="l_zsgn82c4b96d" class="online"> <div class="link"> <a href="javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com');%20" onclick="visited('zsgn82c4b96d');" style
In the above content i want to extract from
javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com')
the string "f6a1ok3n4d4p" and "site2.com" then make it as
http://site2.com/f6a1ok3n4d4p
and same for
javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com')
to become
http://site1.com/zsgn82c4b96d
I need it to be done with ruby regex
This should give you some insight of how to do it.
https://regex101.com/r/wD4oT8/2
javascript:show\(\'(.*?)'.*?\'([^\']*)\'\) will capture the first argument as $1, last part within ' as $2, so you get what you want by substituting as $2/$1.
That's the regex part of it, and, of course, you can adjust the regex as you see fit, for example, to include the usage of " (javascript:show\((?:\'|\")(.*?)(?:\'|\").*?\'([^\'\"]*)(?:\'|\")\) or allow only with 3 arguments.
/yourregex/.match(yourstring) will extract the information you need.

Regular expression to remove lines with special characters

<a class='jdr' href='javascript:void(0);' onClick="return openDiv('jrtp');"></a>
<span class="jcn">
<a href="http://example.com/Ahmedabad/Aptech-N-Power-Hardware-Networking-<near>-Toll-Naka-Opp-Kakadia-Hospital-Below-Sankalp-Reataurant-Bapu-Nagar/079PXX79-XX79-110420173655-D4K6_QWhtZWRhYmFkIENDTkEgVHJhaW5pbmcgSW5zdGl0dXRlcw==_BZDET" title='Aptech N Power Hardware & Networking' >Aptech N Power Hardware & Networkin...</a>
</span>
<section class="jrat">
<a rel="nofollow" href="http://example.com/Ahmedabad/Aptech-N-Power-Hardware-Networking-<near>-Toll-Naka-Opp-Kakadia-Hospital-Below-Sankalp-Reataurant-Bapu-Nagar/079PXX79-XX79-110420173655-D4K6_QWhtZWRhYmFkIENDTkEgVHJhaW5pbmcgSW5zdGl0dXRlcw==_BZDET#rvw"><span class='s10'></span><span class='s10'></span><span class='s10'></span><span class='s10'></span><span class='s0'></span></a>
<a class="jrt" href="http://example.com/Ahmedabad/Aptech-N-Power-Hardware-Networking-<near>-Toll-Naka-Opp-Kakadia-Hospital-Below-Sankalp-Reataurant-Bapu-Nagar/079PXX79-XX79-110420173655-D4K6_QWhtZWRhYmFkIENDTkEgVHJhaW5pbmcgSW5zdGl0dXRlcw==_BZDET#rvw">2 ratings</a>
<span class="jrt"> |</span>
<a class="rate_this" onclick="_ct('ratethis','lspg');" href="http://example.com/Ahmedabad/Aptech-N-Power-Hardware-Networking-<near>-Toll-Naka-Opp-Kakadia-Hospital-Below-Sankalp-Reataurant-Bapu-Nagar/079PXX79-XX79-110420173655-D4K6_QWhtZWRhYmFkIENDTkEgVHJhaW5pbmcgSW5zdGl0dXRlcw==_BZDET/writereview">Rate this</a>
</section>
<section class="jcar">
<section class="jbc">
<a href="http://example.com/Ahmedabad/Aptech-N-Power-Hardware-Networking-<near>-Toll-Naka-Opp-Kakadia-Hospital-Below-Sankalp-Reataurant-Bapu-Nagar/079PXX79-XX79-110420173655-D4K6_QWhtZWRhYmFkIENDTkEgVHJhaW5pbmcgSW5zdGl0dXRlcw==_BZDET">
<img width="83" height="56" border="0" src="http://images.jdmagicbox.com/upload_test/ahmedabad/b4/079pxx79.xx79.110420172948.d4b4/logo/faf3f2409ed7993aaa70f848ab0bb6fb_t.jpg" class="Clogo" />
</a>
<!-- <span class="noLogo"></span> -->
<section class="jrcl">
<p>
**A/35, Lakhani Chamber, Toll Naka, Opp Kakadia Hospital, Below Sankalp Reataurant, Bapu Nagar, Ahmedabad - 380024** | View Map<br>
</p>
From the above XML data I want to extract the following---
A/35, Lakhani Chamber, Toll Naka, Opp Kakadia Hospital, Below Sankalp Reataurant, Bapu Nagar, Ahmedabad - 380024
I need help in creating a regular expression to find and remove all lines containing special characters.
I am using the following regex ----
/(\<.+?>)/g
Please help.Thanks
Try this
/(?<=\*{2})([^<>]*?)(?=\*{2})/g
it matches all content between the **.
I think you want to remove lines which are HTML tags, so try this:
/^<.*>\n/g