I'm working with existing code where regular expressions are used to parse HTML. For specific reasons it is not possible to use XPath. The HTML is actually an HTML/text email. In the email I have multiple div elements with text content. I'm trying to write a regex which matches the n-th div element. Unfortunately these div elements do not have any attributes like classes or ids. I tried this, but it matches all occurrences:
<div>(.*)<\/div>{1}
There are many suggestions out there, but none of them is working for me.
Thanks.
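For what it's worth, the attempt above grabs everything because .* is greedy, and the trailing {1} only applies to the final > character. Here is a minimal Python sketch of one way to pick the n-th div, assuming the divs are flat (not nested) and attribute-free as described. The helper name nth_div_text is made up for illustration:

```python
import re

# Lazy .*? stops at the first closing tag instead of the last one.
FLAT_DIV = re.compile(r"<div>(.*?)</div>", re.DOTALL | re.IGNORECASE)

def nth_div_text(html, n):
    """Return the inner text of the n-th (1-based) flat <div>, or None."""
    contents = FLAT_DIV.findall(html)
    return contents[n - 1] if 1 <= n <= len(contents) else None

email_body = "<div>one</div><div>two</div><div>three</div>"
print(nth_div_text(email_body, 2))  # two
```

Collecting all matches and indexing into the list sidesteps having to encode "n-th" in the pattern itself. It breaks as soon as a div contains another div.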
Related
I want to find a RegEx that allows me to find a specific text between HTML table tags.
I have: This is a test text <tr><td>text inside table</td></tr> and I want the RegEx to return me just the second 'text' because it is inside the table.
I have tried <tr>(text)<\/tr> but it returns nothing.
It needs to be done with RegEx; it cannot be done with an HTML parser.
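Your <tr>(text)<\/tr> matches only the literal <tr>text</tr>, but in your input there is other markup and text around the word.
So you need <tr>.*?(text).*?<\/tr> for that, where the lazy .*? skips whatever sits between the tags and the word.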
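A quick way to check this is to run both patterns against the sample string from the question. This is only a sketch in Python; the variable names are mine:

```python
import re

sample = "This is a test text <tr><td>text inside table</td></tr>"

# Lazy .*? lets the pattern skip the <td> wrapper and the trailing text in the row.
row_text = re.compile(r"<tr>.*?(text).*?</tr>", re.DOTALL)

match = row_text.search(sample)
inside_table_at = sample.index("<td>") + len("<td>")

# The captured 'text' is the one inside the table, not the one before it.
print(match.group(1), match.start(1) == inside_table_at)

# The original attempt finds nothing, because <td> sits between <tr> and 'text'.
print(re.search(r"<tr>(text)</tr>", sample))
```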
I need to wrap numbers inside HTML tags without affecting attributes.
So far, all I could get is selecting everything that's inside a tag, digits and non-digit characters alike :(
Here's the regular expression I'm using :
/([0-9]+(?:\.[0-9]*)?)/g
Here's the code at RegExr!
I'll be using jQuery to parse it. This is the closest I could get jsfiddle.
How to make this regular expression look only for numbers inside html tags?
Thanks for your help.
This matches 123 in <div>123</div> for example:
[0-9]+(?:\.[0-9]*)|(?<=^|>)\d+(?=<|$)
This regex was edited from the link you provided: http://regexr.com/?361gc
This selects only numbers within html tags. It also works on multi line text.
(?!<[A-Z][A-Z0-9]*\b[^><]*>[^><0-9]*)([0-9]+)(?=[^><0-9]*<)
You can test it here.
But please be advised that <html> and <body> are tags too, so they satisfy the "inside HTML tags" condition you asked for. When you run a complete HTML document through this regex, most or all numbers will match.
Testing on your code on jsfiddle I changed it to this:
$('body').each(function() {
    $(this).html(function(i, v) {
        return v.replace(/(?!<[A-Z][A-Z0-9]*\b[^><]*>[^><0-9]*)([0-9]+)(?=[^><0-9]*<)/gim, '<span>$1</span>');
    });
});
So now it only runs on the elements of the body and not the whole document. Is that giving the expected result?
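To see what the pattern does outside the browser, here is a rough Python port of the same replacement (re.IGNORECASE and re.MULTILINE stand in for the i and m flags; sub already replaces every match). The function name wrap_numbers is just for illustration:

```python
import re

NUMBER_IN_TEXT = re.compile(
    r"(?!<[A-Z][A-Z0-9]*\b[^><]*>[^><0-9]*)([0-9]+)(?=[^><0-9]*<)",
    re.IGNORECASE | re.MULTILINE,
)

def wrap_numbers(html):
    """Wrap digit runs that are followed only by non-digit text up to the next '<'."""
    return NUMBER_IN_TEXT.sub(r"<span>\1</span>", html)

# The 1 inside the id attribute is left alone; the 42 in the text is wrapped.
print(wrap_numbers('<div id="a1">Price 42</div>'))
# <div id="a1">Price <span>42</span></div>
```

One thing this makes visible: the lookahead forbids digits before the next tag, so in a text node with several numbers only the last one gets wrapped.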
I have the following content
<li>Title: [...]</li>
and I'm looking for regex that will match and replace this so that I can parse it as XML. I'm just looking to use a regex find and replace inside Sublime Text 2, so I want to match everything in the above example except for the [...] which is the content.
Why not extract the content and use it to build the XML, rather than trying to mold the wrapper of the content into XML? (Or am I misunderstanding you?)
<li>Title: ([^<]*)<\/li>
is the regular expression to extract the content.
It's pretty self-explanatory, other than the [^<]*, which means: match any number of characters that are not a "<".
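As a sketch of the "extract the content and build the XML from it" idea, here is the same pattern in Python. The target element name title is an assumption of mine, not something from the question:

```python
import re

LI_TITLE = re.compile(r"<li>Title: ([^<]*)</li>")

def li_to_xml(line, element="title"):
    # `element` is a hypothetical target element name chosen for illustration.
    return LI_TITLE.sub(lambda m: f"<{element}>{m.group(1)}</{element}>", line)

print(LI_TITLE.search("<li>Title: Dune</li>").group(1))  # Dune
print(li_to_xml("<li>Title: Dune</li>"))                 # <title>Dune</title>
```

In an editor's find and replace, the same thing would be the pattern in the find field and the captured group reused in the replacement.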
I don't know Sublime, but something like this should suffice to get you the contents of the li. It allows for optional extra attributes on the tag. Make sure to turn off case-sensitivity, in case of LI or Li etc. (lifted straight from http://www.regular-expressions.info/examples.html ):
<li\b[^>]*>(.*?)</li>
<li>\S*(.*)?</li>
That should match your string, with the content being capturing group 1.
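The two patterns above capture slightly different things, which is easy to check. A small Python comparison (the sample strings are made up):

```python
import re

with_attrs = re.compile(r"<li\b[^>]*>(.*?)</li>", re.IGNORECASE)
plain = re.compile(r"<li>\S*(.*)?</li>")

# Attributes and upper-case tags are tolerated by the first pattern.
print(with_attrs.search('<LI class="x">Title: abc</LI>').group(1))  # Title: abc

# In the second pattern \S* consumes "Title:", so group 1 starts at the space.
print(repr(plain.search("<li>Title: abc</li>").group(1)))  # ' abc'
```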
Hello, I'm trying to match multi-nested quote blockquotes and transform them back into BBCode.
This is what I've got so far, as far as the regex is concerned.
I converted it back to HTML entities so it can be seen on Stack Overflow.
<div class="quoteheader"><div class="topslice_quote">([\s\S]*?)</div></div><blockquote>([\s\S]*?)(?:</blockquote><div class="quotefooter"><div class="botslice_quote"></div></div>){2,})
I'm trying to match this
<div class="quoteheader"><div class="topslice_quote">Quote</div></div><blockquote>Outside quote is this
<div class="quoteheader"><div class="topslice_quote">Quote</div></div><blockquote>Inner quote is this</blockquote><div class="quotefooter"><div class="botslice_quote"></div></div>
</blockquote><div class="quotefooter"><div class="botslice_quote"></div></div>
to generate this
[quote]Outside quote is
this[quote]Inner quote is
this[/quote][/quote]
I'm using VBScript 5.5 Regular Expressions for this (but this isn't that important).
I really need help with the expression. I've tried using an HTML parser for this, but it turned out to be more difficult than using regex.
I'm just repeating what's said here.
Regular expressions can't match context-free languages, like groups of nested tags. You can't pair opening tags with their closing tags, so parsing a block (especially a nested one) becomes impossible to do reliably.
You can certainly build a kludge to help, but there will be situations where it won't work.
Well, this is all you need to do with the parser.
Here's the pseudocode. I don't know your parser so this is the best I can offer.
First find the div tag with the quoteheader class. Get the next sibling.
That is the blockquote tag. Let's call this tag theQuote.
Get the first child of theQuote. It will be a html text item. That is the outer quote.
Get the third child of theQuote. It will be another blockquote tag. Let's call this tag theInner.
Get the first child of theInner. It will be a html text item. That is the inner quote.
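The steps above walk a DOM tree. As a rough alternative sketch, here is an event-based version in Python using the standard library's html.parser (not VBScript, and not the sibling/child navigation described above). It assumes the markup looks exactly like the sample in the question; the class and function names are made up:

```python
from html.parser import HTMLParser

class QuoteToBBCode(HTMLParser):
    """Turn quoteheader/blockquote/quotefooter markup into [quote] BBCode."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside a quoteheader/quotefooter div

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth += 1  # nested div inside a skipped wrapper
            return
        if tag == "div" and dict(attrs).get("class") in ("quoteheader", "quotefooter"):
            self.skip_depth = 1
        elif tag == "blockquote":
            self.out.append("[quote]")

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth -= 1
            return
        if tag == "blockquote":
            self.out.append("[/quote]")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

def html_quotes_to_bbcode(html):
    parser = QuoteToBBCode()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because the parser reports every blockquote open and close in order, the nesting depth takes care of itself, which is the part the regex cannot do.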
I need a regex that will find either an opening div tag, or a closing div tag, or both in an html web page. Thanks :)
Just to be safe:
</? *div[^>]*>
You could start with:
</?div>
This won't correctly handle:
whitespace
attributes on the div
self-closing div tags
upper case tags
tags inside HTML comments that should be ignored
etc...
To handle HTML correctly you're better off using an HTML parser rather than regular expressions.
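A few of the cases listed above can be checked directly. This Python snippet runs the simple pattern over a made-up string that contains an attribute, an upper-case tag, a closing tag with whitespace, and a div inside a comment:

```python
import re

simple_div = re.compile(r"</?div>")

markup = '<div><div class="a"><DIV></div ><!-- <div> --></div>'

# Only the bare lower-case tags match, including the one inside the comment.
found = simple_div.findall(markup)
print(found)  # ['<div>', '<div>', '</div>']
```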
If you can use XPath, it would be //div. Look into using an XML parser that supports it instead of regex. If you MUST use regex, go with coding_hero's answer.
Just for show, in PHP:
// $htmldoc is the path or URL of some XHTML document from somewhere
$xhtml = simplexml_load_file($htmldoc);
$divs = $xhtml->xpath('//div'); // array of SimpleXMLElement, one per div in the document
$markup = '';
foreach ($divs as $div) {
    $markup .= $div->asXML(); // XML of each div element and its children
}
return $markup;
HTML, XHTML, and XML can not be parsed using regular expressions. There are parsers designed for this type of thing. If you specify the language(s) you are using, I'm sure someone can suggest the right tool(s) for the job, but I know for a fact that regular expressions will not be on that list.
To find opening and closing div tag I would use
</?\bdiv\b[^>]*>
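The word boundaries are what set this pattern apart from </? *div[^>]*> given earlier: without them, any tag whose name merely starts with "div" matches too. A short Python check (the <divider> tag is a made-up example):

```python
import re

loose = re.compile(r"</? *div[^>]*>", re.IGNORECASE)
bounded = re.compile(r"</?\bdiv\b[^>]*>", re.IGNORECASE)

markup = '<div class="a"><divider></divider></DIV>'

print(loose.findall(markup))    # also picks up <divider> and </divider>
print(bounded.findall(markup))  # ['<div class="a">', '</DIV>']
```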