I need a regular Expression to get a string in between <ul> and </ul> tag...
But the thing is if there is one "<ul></ul>" tag inside the <ul> tag then regex stops with the inner tag...But i need the entire string between the outer two tags...
Can anyone help me?
Try this regex
String text = "<ul>My list</ul>";
String text1 = text.replaceAll("</?ul>", "");
^
? says / one time or none at all
So it will take out <ul> and </ul>
This is java language by the way. The regex may work in different languages
it will pick everything between from the first <ul> to the last </ul>.
<ul>(.*)</ul>
Related
I'm trying to process the html inside a contenteditable div. It might look like:
<div>Hi I'm Jack...</div>
<div><br></div>
<div><br></div>
<div>More text.</div> *<div><br></div>*
*<div><br></div>**<div><br></div>*
*<div><br></div>*
*<div>
<br>
</div>*
What regex expression would match all trailing <div><br></div> but not the ones sandwiched between useful divs containing text, i.e., <div> text (not html) </div>?
I have enclosed all expressions I want to match in asterisks. The asterisk are for reference only and are not part of my string.
Thanks,
Jack
You can use the pattern:
(?:<div>[\n\s]*<br>[\n\s]*<\/div>)(?!.*?<div>[^<]+<\/div>)
You can try it here.
Let me know if this works for all your cases and I will write a detailed explanation of the pattern.
Is it possible to replace all matches between two sub strings in one regular expression?
My case is that I want to inject HTML after the li tag but only inside myList.
Before:
<ul class="myList">
<li>List item</li>
<li>List item</li>
</ul>
After (... would be the injected markup):
<ul class="myList">
<li>...List item</li>
<li>...List item</li>
</ul>
Any help would be great, thanks.
My case is that I want to inject HTML after the li tag but only inside myList.
Other than for trivial cases where it's easy to write a "quick, hacky" script to get the job done, you should never use regular expressions to parse HTML.
A "quick, hacky" script in this case could be e.g. to search for:
<ul class="myList">.*<li>([^<]*)</li>(?=.*</ul>)
(Note: this probably needs to be a multi-line search; if there's no option for this then replace .* with [\s\S]*.)
...And then replace the value of the first match group (probably represented by $1 or \1, depending on how you do this).
However, as per my link above, I'd like to emphasise that this is not a perfect answer. It is literally impossible to perfectly parse HTML with a regular expression.
To do this "properly", you must use an XML parser instead.
I just don't get my Regex right:
I have the following template:
<!-- Defines the template for the tabs. -->
{{TMPL:Import=../../../../Data/Templates/Ribbon/tabs.tmpl; Name=Tabs}}
<div class="tabs">
<ul role="tablist">
{{BOS:Sequence}}
<li role="tab" class="{{TabType}}" id="{{tabId}}">
<span>{{TabFile}}</span>
</li>
{{EOS:Sequence}}
</ul>
</div>
{{Render:Tabs}}
I would like to find everything between {{}} except the tags that begins with {{BOS, {{EOS, {{TMPL, {{Render
Here are a couple approaches:
Attempt 1:
({{).*(}})
This selects everything between {{ }} tags, which is not good.
Attempt 2:
({{)[^TMPL][^BOS][^EOS][^Render].*(}})
This will make that {{TabType}} and {{TabFile}} are not selected anymore and I just don't know why.
With some other regex, I get that {{TabType}}" id="{{tabId}} is selected as one match.
Does anyone have a clue on how to solve this, I really need a regex Guru :-)
You can use negative lookahead based regex like this:
{{(?!TMPL|[BE]OS|Render).*?}}
RegEx Demo
You have to use the following regex to get the content between braces:
\{\{(.*?)\}\}
Working Demo
If you want to exclude the content from the comment you posted you can use a regex technique to exclude what you don't want and keep what you want at the end of the regex:
\{\{BOS:Sequence\}\}|\{\{EOS:Sequence\}\}|\{\{TMPL:Import.*?\}\}|\{\{Render:Tabs\}\}|\{\{(.*?)\}\}
Working demo
By the way, if you want to have a shortcut for above regex you can use:
\{\{(?:BOS|EOS):Sequence\}\}|\{\{TMPL:Import.*?\}\}|\{\{Render:Tabs\}\}|\{\{(.*?)\}\}
This is a very useful technique for pattern exclusion that I glad to learn it from Anubhava and zx81 (they rock using regex pattern). For this regex technique you can find the content you need using capturing groups (check the green highlights on the screenshot below):
Using [^TMPL] and the like won't work because these are character classes. You could use a negative lookahead, though (or even lookbehind depending upon the regex library you are using).
\{\{(?!BOS:)(?!EOS:)(?!Render:)(?!TMPL:)(.*?)\}\}
Still I get the feeling that you want the BOS, EOS, etc. to just be strings in the template with {{ and other values to be interpolated. If you are using handlebars or something, you can have strings interpolated:
{{'{{BOS:Sequence}}'}}
I have some html which I want to grab between 2 tags. However nested tags exist in the html so looking for wouldn't work as it would return on the first nested div.
Basically I want my regex to..
Match some text literally, followed by ANY character upto another literal text string. So my question is how do I get [^<]* to continue matching until it see's the next div.
such as
<div id="test"[^<]*<div id="test2"
Example html
<div id="test" class="whatever">
<div class="wrapper">
<fieldset>Test</fieldset><div class="testclass">some info</div>
</div>
<!-- end test div--></div>
</div>
<div id="test2" class="endFind">
In general, I suspect you want to look at "greedy" vs "lazy" in your regex, assuming that's supported by your platform/language.
For example, <div[^>]*>(.*?)</div> would make $1 match all the text inside a div, but would try to keep it as small as possible. Some people call *? a "lazy star".
But it seems you're looking to find the text within a div that is before the start of the first nested div. That would be something like <div[^>]*>(.*?)<div
Read about greedy vs lazy here and check to make sure that whatever language you're using supports it.
$ php -r '$text="<div>Test<div>foo</div></div>\n"; print preg_replace("/<div[^>]*>(.*?)<div.*/", "\$1", $text);'
Test
$
Regex is not capable of parsing HTML. If this is part of an application, you're doing something wrong. If you absolutely have to parse a document, use a html/xml parser.
If you're trying to screen scrape something and don't want to bother with a parser, look for identifying marks in the page you're scraping. For example, maybe the embedded div ends just before the one you want to match, so you could match </div></div> instead.
Alternatively, here's a regex that meets your requirements. However, it is very fragile: it will break if, for example, #test's children have children, or the html isn't valid, or I missed something, etc, etc ...
/<div id="test"[^<]*(<([^ >]+).+<\/$2>[^<]*)*<\/div>/
I have content something like
<div class="c2">
<div class="c3">
<p>...</p>
</div>
</div>
What I want is to match the div.c2's inner HTML. The contents of it may vary a lot. The only problem I am facing here is that how can I make it to work so that the right closing div is taken?
You can't. This problem is unsolvable with classic regular expressions, and with most of the existing regex implementations.
However, some regex engines have special support for balanced pair matching. See, e.g., here (.NET). Though even in this case your regex will be able to parse only a subset of syntactically correct texts (e.g., what if a < /div > is embedded in a comment?). You need an HTML parser to get reliable results.
Any chance this will always be valid XHTML? If so, you'd be better off parsing it as XML than trying to regex this.
Delete the first line, delete the last line. Problem solved. No need for RegEx.
The following pattern works well with .Net RegEx implementation:
\<div class="c2"\>{[\n a-z.<>="0-9/]+}\</div\>
And we replace that with \1.
Input:
<div class="c2">
<div class="c3">
<p>...</p>
</div></div></div></div></div></div></div></div>
</div>
Output:
<div class="c3">
<p>...</p>
</div></div></div></div></div></div></div></div>