Parse specific div from raw text using regex?


I'm in a situation that requires parsing raw HTML as a string; unfortunately this is unavoidable, otherwise I wouldn't post this. I only need a regex that captures the class of a div that has an img tag as a child.
This is the markup I'm dealing with:
<div class="summary">
<h3>Example</h3>
<div class="explanation">
<span>This serves as an example for the site.</span>
</div>
<div class="user-details">
mheathershaw<br>
<img src="res/badge522.png"/> <span class="score">522</span>
</div>
<div class="help">
Help
</div>
</div>
The div whose class I'd like to retrieve is the one that contains the image. For this example, the capture I want (ideally) is user-details. The only criterion is that the div has <img ... /> as a child.
Is anyone able to help? Thanks!

You may try this,
/<div\b[^>]*\bclass="([^"]*)"[^>]*>(?:(?!<\/div>)[\s\S])*?<img\b[^>]*>(?:(?!<\/div>)[\s\S])*?<\/div>/
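As a minimal sketch of how that pattern behaves, the snippet below runs it against the markup from the question with Python's standard re module. Python is an assumption here (the question does not name a language), and the names html, pattern and div_class are purely illustrative.

```python
import re

# The markup from the question, as a raw string.
html = """<div class="summary">
<h3>Example</h3>
<div class="explanation">
<span>This serves as an example for the site.</span>
</div>
<div class="user-details">
mheathershaw<br>
<img src="res/badge522.png"/> <span class="score">522</span>
</div>
<div class="help">
Help
</div>
</div>"""

pattern = re.compile(
    r'<div\b[^>]*\bclass="([^"]*)"[^>]*>'  # opening div, capture its class
    r'(?:(?!</div>)[\s\S])*?'              # any text that does not cross a </div>
    r'<img\b[^>]*>'                        # the img tag
    r'(?:(?!</div>)[\s\S])*?'              # rest of the content, again stopping at </div>
    r'</div>'
)

match = pattern.search(html)
div_class = match.group(1) if match else None
print(div_class)  # user-details
```

Note that the pattern only refuses to cross a closing </div>. If the img sits inside a nested div that opens before it, the match can start at an outer div and report that class instead, so this is best treated as a workaround for simple, flat markup.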

Related

Why doesn't this regexp work for this HTML?

<div class="_1zGQT _2ugFP message-in">
<div class="-N6Gq">
<div class="copyable-text" data-pre-plain-text="[18:09, 3.6.2019] Лера сестра: ">
<div class="_12pGw">
<div class="_3X58t selectable-text invisible-space copyable-text">
<span class="_2ZDCk">
<img crossorigin="anonymous" src="URL" alt="😆" draggable="false" class="_298rb _2FANH selectable-text invisible-space copyable-text" data-plain-text="😆" style="visibility: visible;">
</span>
</div>
</div>
</div>
</div>
</div>
I tried to get it with this code:
soup.find('div', class_=re.compile('^selectable-text invisible-space copyable-text'))
All I got was None.
The problem is that part of the class (_3X58t) changes.

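This is likely due to the ^ anchor, which requires the class string to start with selectable-text, whereas here it starts with the changing _3X58t token. We could drop the anchor: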
soup.find('div', class_=re.compile('selectable-text invisible-space copyable-text'))
or we might try this expression for the divs:
(.+?selectable-text invisible-space copyable-text)
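To illustrate the effect of the anchor in isolation, here is a small sketch that applies the patterns to the class attribute value as a plain string, using only Python's re module. It does not call BeautifulSoup; class_attr and target are illustrative names, and the suffix pattern is an extra variant added for comparison.

```python
import re

# The full class attribute of the div we want, as it appears in the HTML.
class_attr = "_3X58t selectable-text invisible-space copyable-text"
target = "selectable-text invisible-space copyable-text"

anchored = re.compile("^" + target)            # the pattern from the question
unanchored = re.compile(target)                # the pattern without ^
suffix = re.compile(re.escape(target) + "$")   # anchored at the end instead

# ^ forces the match to begin at index 0, where "_3X58t" sits, so it fails.
anchored_hit = anchored.search(class_attr) is not None
# Without ^ the pattern is free to match further into the string.
unanchored_hit = unanchored.search(class_attr) is not None
# Anchoring at the end tolerates a changing prefix.
suffix_hit = suffix.search(class_attr) is not None

print(anchored_hit, unanchored_hit, suffix_hit)  # False True True
```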

I would first see whether a single class from the compound class list could be used, e.g.
soup.select_one('.selectable-text')
Otherwise, match on the end of the class attribute, which stays constant:
soup.select_one('[class$="selectable-text invisible-space copyable-text"]')
Either avoids resorting to regex.

Accordion container with article tag

According to the F6 documentation, the accordion container can be something other than a ul tag. However, I can't get it to work with an article tag.
The problem seems to be that .accordion-title isn't a direct child of .accordion-item. Unfortunately, in my use case, I need to wrap the a tag with the .accordion-title class inside a heading tag.
Does anyone know how to solve this?
Thanks,
Here's an example of my use case:
<div class="accordion"
data-multi-expand="true"
data-allow-all-closed="true"
data-accordion>
<article class="accordion-item" data-accordion-item>
<header>
<h3>
Group Name
</h3>
</header>
<div class="accordion-content" data-tab-content>
<p>Hello World</p>
</div>
</article>
</div>

As you figured out, it doesn't have to do with the <article> tag but rather that a direct child click trigger is needed.
Example of it working with <article>
https://codepen.io/rafibomb/pen/pGKZYg
Without JS modification, it may not work the way you want it to.

Sublime Text Regex Search for alphanumeric string, not working..

I'm trying to replace a common theme used in hundreds of pages in my project:
<div id="PageTitle"> (Page title as a string) </div>
And the title varies each page. I want to replace it with
<div class="row">
<div class="col-md-12 col-sm-12">
<h3><?= $pageTitle?></h3>
</div>
</div>
I've tried searching with <div id="PageTitle">/^\w+$/</div>, and <div id="PageTitle">"^[a-zA-Z0-9_]*$"</div> with no luck. Any ideas?

You are almost there. ^ and $ are anchors that match the start and end of the input (or of a line), so they cannot match in the middle of your pattern and you should remove them. The surrounding / delimiters and quotes should go too, since the search box treats them as literal characters.
Next, if your page title only contains alphanumeric characters (and no spaces), then \w is fine; otherwise you might want to use . instead.
<div id="PageTitle">\w+<\/div>
For a title containing any character:
<div id="PageTitle">.+?<\/div>
Hope this helps!

Try this one as well; I think it's pretty strict:
<div id="PageTitle">(?:(?!<\/div>).)+<\/div>
Or even:
<div id="PageTitle">[\s\S]*?<\/div>
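For anyone who would rather script the replacement than run it through the editor, the following is a rough Python sketch of the same idea. The page string is a made-up sample, and the names page, pattern and replacement are assumptions for illustration; the replacement markup is the one from the question.

```python
import re

# Hypothetical page content; real pages would be read from disk.
page = '<body>\n<div id="PageTitle"> About Us </div>\n<p>Content</p>\n</body>'

# [\s\S]*? matches any characters (including newlines) up to the first </div>.
pattern = re.compile(r'<div id="PageTitle">([\s\S]*?)</div>')

replacement = (
    '<div class="row">\n'
    '<div class="col-md-12 col-sm-12">\n'
    '<h3><?= $pageTitle?></h3>\n'
    '</div>\n'
    '</div>'
)

# Collect the old titles in case they are needed to populate $pageTitle.
titles = [title.strip() for title in pattern.findall(page)]

# A function is used as the replacement so that no character in the
# replacement text is interpreted as a backreference or escape.
result = pattern.sub(lambda m: replacement, page)

# \w+ alone does not match this title because it contains spaces.
word_only = re.search(r'<div id="PageTitle">\w+</div>', page)

print(titles)  # ['About Us']
```

Capturing the old title before replacing it is optional, but it keeps the original text available if each page still has to set its own $pageTitle value.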

RegEx to match against HTML tags with certain attributes

I’m trying to write a RegEx that matches an opening HTML tag with a class attribute on it. Just like the following:
<!-- these should match -->
<div class="
<div class=">
<img src="image.jpg" class="
<img src="image.jpg" class=">
<!-- these should not match -->
<div> class="
</div class=">
So far I have:
<[^/^>]+>
This matches any opening HTML tag. I’m looking to adapt it to look for a class attribute within there too, like in the examples above.

Try this:
<[a-z]{1,} class=">?
This is really simple and only matches tags where class is the first attribute, such as your div examples. If you want to catch any opening tag with a class attribute in any position, you'll have to do something more complex.
Also, I like to use this:
https://regex101.com/
For testing online regex, a pretty helpful little playground.
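As one possible take on the "something more complex" case, the sketch below checks a slightly broader pattern against the six samples from the question. The pattern is an assumption of mine rather than part of the answer above: it allows any characters other than < and > between the tag name and a whitespace-separated class=" attribute.

```python
import re

# "<" + a letter (so closing tags like "</div" are excluded),
# then anything that stays inside the tag, then whitespace and class="
pattern = re.compile(r'<[a-zA-Z][^<>]*\sclass="')

should_match = [
    '<div class="',
    '<div class=">',
    '<img src="image.jpg" class="',
    '<img src="image.jpg" class=">',
]
should_not_match = [
    '<div> class="',   # class= appears after the tag has closed
    '</div class=">',  # closing tag
]

matched = [s for s in should_match + should_not_match if pattern.search(s)]
print(matched == should_match)  # True
```

Requiring whitespace before class keeps attributes such as data-class from matching, but the pattern can still be fooled by a > or the text class=" inside another attribute's value.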

R: how to get xpath expression for the nested structure

The following is the HTML:
<div class="grid1-4">
<a class="largeButton javascript sponsorProject button orangeGrad" href="javascript:;">
<div class="button">
<div class="button progress">
<div class="progressWrapper">
<div class="meter">
<div class="progress" style="width:52%"> </div>
</div>
<p class="progressText">
<span>52% Raised of $20,000 Goal</span>
I want to extract the sentence at the very bottom of the markup, that is, 52% Raised of $20,000 Goal.
What is the XPath expression for that? I googled and searched for hints but couldn't get much out of it. I even used Firebug to find the XPath expression, but still made no progress.
Thank you.
PS: due to the nature of my project, I CANNOT write
//p[@class="progressText"]//span
The XPath expression HAS TO INVOLVE
<div class="grid1-4">

This XPath:
//div[@class="grid1-4"]//text()[contains(., 'Raised of')]
Yields:
52% Raised of $20,000 Goal
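The question is about R, but the same selection can be sketched with Python's standard xml.etree.ElementTree for comparison. Two assumptions apply: the closing tags and the wrapping body element below were added by me, because the fragment in the question is cut off, and ElementTree supports only a subset of XPath (no text() or contains()), so the 'Raised of' filter is done in Python instead.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, closed and wrapped so it parses as XML.
markup = """<body>
<div class="grid1-4">
<a class="largeButton javascript sponsorProject button orangeGrad" href="javascript:;">
<div class="button">
<div class="button progress">
<div class="progressWrapper">
<div class="meter">
<div class="progress" style="width:52%"> </div>
</div>
<p class="progressText">
<span>52% Raised of $20,000 Goal</span>
</p>
</div>
</div>
</div>
</a>
</div>
</body>"""

root = ET.fromstring(markup)

# Start from the required div, then look at every span beneath it.
spans = root.findall(".//div[@class='grid1-4']//span")

# Stand-in for the contains(., 'Raised of') predicate.
texts = [s.text for s in spans if s.text and "Raised of" in s.text]
print(texts)  # ['52% Raised of $20,000 Goal']
```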
