jQuery regex: get several keys, not only one - regex

I would like to get:
PA-1400-11PA ADP-40PH ABA
Here is the HTML code:
</div>
<div class="ref">
<h2 id='affiche_sous_titre'>eee :</h2> <p>
<a href='eee' title='PA-1400-11PA' class='lien_menu'>PA-1400-11PA</a> - <a href='uuu' title='ADP-40PH ABA' class='lien_menu'>ADP-40PH ABA</a> </p>
</div>
<div class="modele_tout">
</div>
<div class="star-customer">
Here is my regex code:
line=line.replace(/[\"\']lien_menu[\"\']>(.*?)<\/a>/ig,"$1\n")
But I only get:
ADP-40PH ABA
What is the problem? I don't understand.
Thanks for your help.
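A minimal sketch of one way to collect every link text instead of rewriting the line. It is written in Python rather than JavaScript, purely for illustration; the sample string is the <p> line from the question, and the variable names are assumptions. The pattern is the same as in the question, but the matches are extracted rather than replaced:

```python
import re

# The <p> line from the question's HTML.
line = ("<a href='eee' title='PA-1400-11PA' class='lien_menu'>PA-1400-11PA</a> - "
        "<a href='uuu' title='ADP-40PH ABA' class='lien_menu'>ADP-40PH ABA</a> </p>")

# Same pattern as the question: a quoted lien_menu class, then a lazy capture up to </a>.
pattern = re.compile(r"""["']lien_menu["']>(.*?)</a>""", re.IGNORECASE)

# findall returns the captured group of every non-overlapping match, in order.
keys = pattern.findall(line)
print(keys)
```

In JavaScript, an analogous approach would be to iterate over the matches (for example with String.prototype.matchAll and the g flag) and read capture group 1 of each match.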

Related

Why doesn't this regexp work for this HTML?

<div class="_1zGQT _2ugFP message-in">
<div class="-N6Gq">
<div class="copyable-text" data-pre-plain-text="[18:09, 3.6.2019] Лера сестра: ">
<div class="_12pGw">
<div class="_3X58t selectable-text invisible-space copyable-text">
<span class="_2ZDCk">
<img crossorigin="anonymous" src="URL" alt="😆" draggable="false" class="_298rb _2FANH selectable-text invisible-space copyable-text" data-plain-text="😆" style="visibility: visible;">
</span>
</div>
</div>
</div>
</div>
</div>
I've tried to get it with this code:
soup.find('div', class_=re.compile('^selectable-text invisible-space copyable-text'))
All I got was None.
The problem is that part of the class (_3X58t) changes.
This is likely due to the ^ anchor: the class attribute starts with the changing token (_3X58t), not with selectable-text. We could drop the anchor:
soup.find('div', class_=re.compile('selectable-text invisible-space copyable-text'))
or we might try this expression for the divs:
(.+?selectable-text invisible-space copyable-text)
Rather than resorting to regex, I would first check whether a single class from the compound class list could be used, e.g.:
soup.select_one('.selectable-text')
Otherwise, match the fixed end of the class attribute:
soup.select_one('[class$="selectable-text invisible-space copyable-text"]')
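The effect of the ^ anchor can be checked without BeautifulSoup. The sketch below uses only Python's re module on the class attribute string from the question. The variable names are assumptions, and str.endswith stands in for the [class$="..."] suffix selector:

```python
import re

# Full class attribute of the target div, as shown in the question.
class_attr = "_3X58t selectable-text invisible-space copyable-text"
fixed_part = "selectable-text invisible-space copyable-text"

# Anchored: the match must begin at the start of the string, where "_3X58t" sits.
anchored_hit = re.search("^" + re.escape(fixed_part), class_attr) is not None

# Unanchored: the fixed part may appear anywhere in the string.
unanchored_hit = re.search(re.escape(fixed_part), class_attr) is not None

# Suffix test, the same idea as the [class$="..."] CSS attribute selector.
suffix_hit = class_attr.endswith(fixed_part)

print(anchored_hit, unanchored_hit, suffix_hit)
```

The anchored pattern fails because the changing token comes first, while the unanchored pattern and the suffix test both ignore it.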

preg_replace regular expression to replace links within particular tags

I need some help: I want to replace the href links with my own link, but only within a particular div class.
<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb">
<b class="icon-star"></b> N/A
</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>
</div>
Here I want to change http://oldsite.com/ to http://newsite.com/?id=
I want the href links to look like this:
<a href="http://newsite.com/?id=the-fate-of-the-furious">
Please help me with a preg_replace regular expression.
Thanks
This may help you:
$content = get_the_content();
$pattern = "/(?<=href=(\"|'))[^\"']+(?=(\"|'))/";
$newurl = get_permalink();
$content = preg_replace($pattern,$newurl,$content);
echo $content;
Lookbehinds are too expensive; use \K to restart the full-string match and avoid a capture group:
<a href="\K[^"]+\/
This pattern will be very efficient. I should state that it will match ALL <a href URLs. It also matches greedily until it finds the last / in the URL; I assume this is okay given your input sample.
Code:
$in='<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb"><b class="icon-star"></b> N/A</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>';
echo preg_replace('/<a href="\K[^"]+\//','http://newsite.com/?id=',$in);
Output:
<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb"><b class="icon-star"></b> N/A</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>
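For comparison, here is a minimal sketch of the same replacement in Python. Python's built-in re module has no \K, so a capture group keeps the <a href=" prefix instead. The sample anchor tag is an assumption pieced together from the old and new URLs in the question, not markup taken from the original page:

```python
import re

# Hypothetical input: an anchor pointing at the old site (assumed, see lead-in).
html = '<a href="http://oldsite.com/the-fate-of-the-furious">The Fate of the Furious</a>'

# Group 1 keeps the literal prefix; [^"]+/ greedily consumes the URL up to its last slash.
pattern = re.compile(r'(<a href=")[^"]+/')

# \g<1> re-inserts the prefix, followed by the new base URL.
rewritten = pattern.sub(r'\g<1>http://newsite.com/?id=', html)
print(rewritten)
```

As with the PHP pattern, this rewrites every <a href URL in the input, not only those inside a particular div.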

Selenium Python Xpath how to select the correct span text from many nested div tags

I have a web page with a left hand menu. It is made up of many div tags.
I have noticed that when my Selenium Python script runs, it does not click the text I want from the left-hand menu. It clicks something else.
My XPath is not correct.
I would like to locate the text "Statistics" (it is in a div/span tag), which sits under the parent div with the text "Analysis".
It is not clicking the correct "Statistics" because there may be another "Statistics" somewhere in the HTML source. If I start from the div tag that has the text "Analysis" and then find the text "Statistics", I will get the correct element.
My XPath is:
.//div//span[@title="Analysis"]/following::div[5]//span[text()="Statistics"]
The HTML is:
<div>
<span class="" title="Analysis"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;">Analysis</span>
</div>
</div>
</div>
</div>
</div>
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-expanded="false"
aria-level="2">
<div class="GJPPK2LBIF" style="padding-left: 16px;">
<div class="GJPPK2LBIF GJPPK2LBKF" style="padding-left: 16px;position:relative;" onclick="">
<div class="GJPPK2LBJF" style="left: 0px;width: 15px;height: 15px;position:absolute;">
<img border="0"
style="width:15px;height:15px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA8AAAAPCAYAAAA7/HbnjJn53wAAAABJRU5ErkJggg==) no-repeat 0px 0px;"
src="http://test1:8080/clearcore/ClearCore/clear.cache.gif"
onload="this.__gwtLastUnhandledEvent=" load";"/>
</div>
<div class="GJPPK2LBLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0"
style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABmJLR0QA/wD/AP+gvaekJggg==) no-repeat 0px 0px;"
src="http://test1:8080/clearcore/ClearCore/clear.cache.gif"
onload="this.__gwtLastUnhandledEvent=" load";"/>
</div>
<div>
<span class="" title="Statistics"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;">Statistics</span>
</div>
</div>
</div>
</div>
</div>
</div>
Thanks,
Riaz
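If you have Firefox with FirePath, you can test the XPath and see how many and which matches you get. For instance:
//span[text()="Statistics"]
This may result in one matching node, but also in more. Let's assume there are two matches and the one you want is the second one. Then you'd choose:
(//span[text()="Statistics"])[2]
The parentheses matter: without them, [2] is evaluated per parent, so //span[text()="Statistics"][2] only matches a span that is the second such span under the same parent.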
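The idea from the question, starting at "Analysis" and taking the next "Statistics" in document order, can be sketched with Python's standard library. The markup below is a simplified, well-formed stand-in for the menu, and the id attributes and the helper name first_after are assumptions added for illustration. In a browser, an XPath such as //span[@title="Analysis"]/following::span[@title="Statistics"][1] expresses the same idea:

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the page: one unrelated "Statistics" before the menu entry.
MENU = """
<root>
  <div><span id="other" title="Statistics">Statistics</span></div>
  <div><span id="parent" title="Analysis">Analysis</span></div>
  <div><div><span id="menu" title="Statistics">Statistics</span></div></div>
</root>
"""

def first_after(root, anchor_title, target_title):
    """Return the first span titled target_title that follows a span titled anchor_title."""
    seen_anchor = False
    for span in root.iter("span"):  # iter() walks the tree in document order
        if span.get("title") == anchor_title:
            seen_anchor = True
        elif seen_anchor and span.get("title") == target_title:
            return span
    return None

root = ET.fromstring(MENU)
all_stats = root.findall(".//span[@title='Statistics']")
target = first_after(root, "Analysis", "Statistics")
print(len(all_stats), target.get("id"))
```

A plain search finds both "Statistics" spans, whereas anchoring on "Analysis" picks only the one that comes after it.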

Extracting all dojo attach point values from HTML

I have a saved HTML page which I've opened in Notepad++. I would like to extract all the attach points from the HTML file. Example from the HTML below:
<div class="contentBar">
<div class="banner" style="">
<span class="bannerRepeat"></span>
<span class="bannerDecal"></span>
</div>
<div>
<div class="logo" data-dojo-attach-point="pageLogoPt">
ABC
</div>
<div class="title" data-dojo-attach-point="pageTitlePt">
ABC
</div>
<div class="userPane">
<div>
<span class="LoginCell LoginText"><span data-dojo-attach-point="welcomeBlockPt">Welcome</span>, <b data-dojo-attach-point="usernameBlockPt">User Name</b></span>
<span widgetid="acme_Button_0" id="acme_Button_0" class="LoginCell Button" data-dojo-type="acme.Button" data-dojo-props="size: 'small'" data-dojo-attach-point="logOutButtonPt"><span widgetid="dijit_form_Button_0" class="dijit dijitReset dijitInline dijitButton ButtonSmall" role="presentation"><span class="dijitReset dijitInline dijitButtonNode" data-dojo-attach-event="ondijitclick:__onClick" role="presentation"><span style="-moz-user-select: none;" aria-disabled="false" id="dijit_form_Button_0" tabindex="0" class="dijitReset dijitStretch dijitButtonContents" data-dojo-attach-point="titleNode,focusNode" role="button" aria-labelledby="dijit_form_Button_0_label"><span class="dijitReset dijitInline dijitIcon dijitNoIcon" data-dojo-attach-point="iconNode"></span><span class="dijitReset dijitToggleButtonIconChar">●</span><span class="dijitReset dijitInline dijitButtonText" id="dijit_form_Button_0_label" data-dojo-attach-point="containerNode">Logout</span></span></span><input value="" class="dijitOffScreen" data-dojo-attach-event="onclick:_onClick" tabindex="-1" role="presentation" aria-hidden="true" data-dojo-attach-point="valueNode" type="button"></span></span>
</div>
<div>
<span id="printLink" style="display:none;">Print</span>
<span id="zoomPercentageDisplay"><span data-dojo-attach-point="zoomBlockPt">Zoom</span>: 100%</span>
<span id="smallFontSizeLink" style="font-size: .8em;">A</span>
<span id="defaultFontSizeLink" style="font-size: 1em;">AA</span>
<span id="largeFontSizeLink" style="font-size: 1.2em;">AAA</span>
</div>
</div>
</div>
</div>
I would like to get:
pageLogoPt
pageTitlePt
welcomeBlockPt
usernameBlockPt
etc ...
Is this possible? Thanks
You can do the following:
1. Replace (data-dojo-attach-point="[^"]+)(?=") with \n\1\n. This will put what you're looking for on separate lines.
2. Mark All based on the regex data-dojo-attach-point="[^"]+. Tick the "Bookmark line" checkbox.
3. Search -> Bookmark -> Remove Unmarked Lines.
4. Replace data-dojo-attach-point=" with blank.
This will give you your list with each item in its own line.
Tested on Notepad++ 6.8.8.
Inspired by https://superuser.com/questions/477628/export-all-regular-expression-matches-in-textpad-or-notepad-as-a-list.
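If scripting is an option, the same list can be produced outside Notepad++. This is a minimal sketch using Python's re module. The html string is a shortened excerpt of the page from the question, and the variable names are assumptions:

```python
import re

# Shortened excerpt of the saved page from the question.
html = (
    '<div class="logo" data-dojo-attach-point="pageLogoPt">ABC</div>'
    '<div class="title" data-dojo-attach-point="pageTitlePt">ABC</div>'
    '<span data-dojo-attach-point="welcomeBlockPt">Welcome</span>, '
    '<b data-dojo-attach-point="usernameBlockPt">User Name</b>'
    '<span class="dijitButtonContents" data-dojo-attach-point="titleNode,focusNode"></span>'
)

# Capture whatever sits between the quotes of each attach-point attribute.
values = re.findall(r'data-dojo-attach-point="([^"]+)"', html)

# One attribute may hold several comma-separated names; split them into single entries.
points = [name for value in values for name in value.split(",")]
print("\n".join(points))
```

For a real file, the html string would be read from disk first. Attributes such as data-dojo-attach-event are not matched because the pattern spells out the full attribute name.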

ModX: Using GetResources to display multiple pages in one page

I am trying to use getResources to display multiple resources within one resource, including their Templates and TVs.
The code I have in the page I want to display them is:
[[!getResources? &parents=`50` &sortdir=`ASC` &sortby=`menuindex` &limit=`100` &includeTVs=`1` &processTVs=`1` &tpl=`gigtemp` ]]
Where &tpl=gigtemp is a chunk I have created where all my template HTML and TVs are.
However, nothing is showing on the page.
Can anyone help me out?
Please let me know if I need to explain more.
Update:
Some of the info is showing, but a lot of the html is broken.
My HTML on the Chunk is:
<div class="gig-guide">
<div class="gig-info">
<h2>[[+tv.gigname]]</h2>
<strong>[[=tv.gigcity]]</strong>
<img src="[[+tv.gigthumb]]" alt="Contra Clave Contra Event: [[+tv.gigname]" /></div>
<div class="gig-info">
<h2>[[+tv.gigdate]]</h2>
[[+tv.gigtime]]</div>
<div class="gig-info">
<h2>[[+tv.gigvenue]]</h2>
[[+tv.gigaddress]]</div>
<div class="gig-info">
<h2>[[+tv.gigcost]]</h2>
</div>
<div class="gig-bottom">
<div class="fb-like" data-href="[[+tv.gigfbevent]]" data-send="false" data-width="300" data-colorscheme="dark" data-show-faces="false"> </div>
<div class="gigsocialmedia"><img src="assets/images/ccc-fb.png" alt="This event on Facebook" /> <a class="twitter-share-button" href="https://twitter.com/share?text=[[+tv.gigtwitter]]" target="_blank" data-lang="en"><img src="assets/images/ccc-twiter.png" alt="Tweet this event" /></a> <img src="assets/images/ccc-email.png" alt="Email this event to a friend" /></div>
</div>
<!--END GIG BOTTOM DIV-->
<!--END GIG GUIDE DIV-->
Again, any help is appreciated!
Your code is valid and, as far as I can see, free of errors. That means something else is wrong; I would guess one of the following:
You have not cleared your cache, which is not always necessary but can solve weird problems.
The children of resource 50 are unpublished or hidden.
There is something else wrong around your code that prevents MODX from parsing it correctly.
Edit: You had several errors in your chunk, for example [[=tv.gigcity]] instead of [[+tv.gigcity]], and a missing closing bracket in [[+tv.gigname] inside the alt attribute. Try replacing it with this:
<div class="gig-guide">
<div class="gig-info">
<h2>[[+tv.gigname]]</h2>
<strong>[[+tv.gigcity]]</strong>
<img src="[[+tv.gigthumb]]" alt="Contra Clave Contra Event: [[+tv.gigname]]" /></div>
<div class="gig-info">
<h2>[[+tv.gigdate]]</h2>
[[+tv.gigtime]]</div>
<div class="gig-info">
<h2>[[+tv.gigvenue]]</h2>
[[+tv.gigaddress]]</div>
<div class="gig-info">
<h2>[[+tv.gigcost]]</h2>
</div>
<div class="gig-bottom">
<div class="fb-like" data-href="[[+tv.gigfbevent]]" data-send="false" data-width="300" data-colorscheme="dark" data-show-faces="false"> </div>
<div class="gigsocialmedia"><img src="assets/images/ccc-fb.png" alt="This event on Facebook" /> <a class="twitter-share-button" href="https://twitter.com/share?text=[[+tv.gigtwitter]]" target="_blank" data-lang="en"><img src="assets/images/ccc-twiter.png" alt="Tweet this event" /></a> <img src="assets/images/ccc-email.png" alt="Email this event to a friend" /></div>
</div>
<!--END GIG BOTTOM DIV-->
<!--END GIG GUIDE DIV-->
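Tag typos like these are easy to miss by eye. The sketch below is a small Python check, not part of MODX, that flags any [[ in a chunk that does not start a well-formed [[+name]] placeholder. The helper name malformed_tags and the rule that every tag in this chunk is a [[+...]] placeholder are assumptions that hold for this chunk only; other MODX tag types would need a wider pattern:

```python
import re

# A well-formed placeholder as used in this chunk: [[+ name ]] with word characters and dots.
WELL_FORMED = re.compile(r"\[\[\+[\w.]+\]\]")

def malformed_tags(chunk):
    """Return short fragments that open with [[ but are not [[+name]] placeholders."""
    problems = []
    for opening in re.finditer(r"\[\[", chunk):
        # Try to match a complete placeholder starting exactly at this opening.
        if not WELL_FORMED.match(chunk, opening.start()):
            problems.append(chunk[opening.start():opening.start() + 20])
    return problems

# Two lines from the original chunk and their corrected versions.
broken = ('<strong>[[=tv.gigcity]]</strong>\n'
          '<img src="[[+tv.gigthumb]]" alt="Contra Clave Contra Event: [[+tv.gigname]" />')
fixed = ('<strong>[[+tv.gigcity]]</strong>\n'
         '<img src="[[+tv.gigthumb]]" alt="Contra Clave Contra Event: [[+tv.gigname]]" />')

print(malformed_tags(broken))
print(malformed_tags(fixed))
```

On the two original lines this reports the [[=tv.gigcity]] tag and the unclosed [[+tv.gigname] tag; the corrected lines produce an empty list.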
Are the resources you are trying to display hidden? Then you need &showHidden=1.
Are they unpublished? Then you also need &showUnpublished=1.
You may also need &includeContent=1.
See if you can get away without using &processTVs.
If you are still having issues, leave out the &tpl parameter; getResources will then just dump its output to the page so you can see what is actually being returned. That might give you another clue as to what is going wrong.