This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 9 years ago.
In my document I have
<Country>US</Country>
<Country>PR</Country>
Between the
<country>
and
</country>
I want to find ANYTHING except for US and PR.
For example
<country>US</country> = ignore
<country>PR</country> = ignore
<country>UP</country> = match found
What I have is
Pattern = "<Country>(.*?[^USPR].*?)</Country>"
but this ignores strings like
<Country>UP</Country>
I'm not sure how to write a pattern that allows only two values between the tags, US and PR, and flags everything else.
This should work.
<country>(?!(US|PR))(.*?)</country>
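Matches an opening <country> tag that is not immediately followed by US or PR, then matches everything up to the closing </country> tag. Note that the lookahead also rejects longer values that merely start with US or PR, such as USA; to exclude only the exact values, put the closing tag inside the lookahead: (?!(US|PR)</country>). The sample document uses <Country>, so either match case-insensitively or adjust the tag name.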
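A minimal sketch of the lookahead approach in Python (the question does not name a language, so Python is an assumption, as are the names PATTERN, unexpected_countries and sample). The lookahead here includes the closing tag, so only the exact values US and PR are skipped, and the pattern is compiled case-insensitively because the question mixes <Country> and <country>.

```python
import re

# Assumption: tag case varies in the input, so match case-insensitively.
# Side effect: lowercase values such as "us" are skipped as well.
PATTERN = re.compile(
    r"<country>(?!(?:US|PR)</country>)(.*?)</country>",
    re.IGNORECASE,
)


def unexpected_countries(text):
    """Return every country value that is not exactly US or PR."""
    return [m.group(1) for m in PATTERN.finditer(text)]


sample = (
    "<Country>US</Country><Country>PR</Country>"
    "<Country>UP</Country><Country>USA</Country>"
)
print(unexpected_countries(sample))  # ['UP', 'USA']
```

Dropping `</country>` from the lookahead would make the pattern skip USA too, since that value starts with US.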
Try this one:
(?<=<Country>(?!US|PR)).*?(?=</Country>)
This question already has answers here:
Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
(10 answers)
Closed 3 years ago.
I know I could use BeautifulSoup, but I want to try writing my own regex.
Regex I've been working on
<br>This Text needed
<a>unwanted text</a>
<br/>
This text needed
<a >unwanted text</a>
This text needed
<a>unwanted text</a>
<br>this text needed
What I have come up with:
(</a>|(<br(/>|>)))(\s.*|\w.*)
I want to match every "This text needed" line, but one of them isn't matching.
How about using a negative lookbehind and a negative lookahead:
(?<!<a>)This Text needed(?!<\/a>)
DEMO: https://regex101.com/r/RT5LZu/1
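The lookaround pattern above only finds the literal phrase. If the wanted text is not known in advance, one alternative (a different technique from the answer, sketched here under the assumption that the input looks like the question's sample and that anchors are never nested) is to delete the anchor elements first, then the remaining tags, and keep whatever text is left. The variable names are illustrative.

```python
import re

html = """<br>This Text needed
<a>unwanted text</a>
<br/>
This text needed
<a >unwanted text</a>
This text needed
<a>unwanted text</a>
<br>this text needed"""

# Step 1: drop each <a ...>...</a> element together with its text.
without_links = re.sub(
    r"<a\b[^>]*>.*?</a>", "", html, flags=re.IGNORECASE | re.DOTALL
)
# Step 2: drop the remaining tags such as <br> and <br/>.
without_tags = re.sub(r"<[^>]+>", "", without_links)
# Step 3: keep the non-empty lines.
wanted = [line.strip() for line in without_tags.splitlines() if line.strip()]
print(wanted)
```

For this sample the result is the four "needed" lines. Real-world HTML is better handled with a parser such as BeautifulSoup, which the question already mentions.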
This question already has answers here:
What is the best way to parse html in C#? [closed]
(15 answers)
Closed 5 years ago.
I have a requirement to not match a specific word when it occurs inside an anchor tag. Anchor tags can have other HTML tags nested inside them.
For Example:
<a title="Test" href="http://www.google.com/"><span style="color: blue;">Test</span></a><p>Test - MANUALLY<br /><br />Google </p><p> Resolving as duplicate of Test</p><p>Test test</p>
Here every "Test" gets selected. I only want the occurrences of "Test" that are neither inside the anchor element nor part of the anchor tag's attributes.
Regex I used was:
(?!<a[^>]*>)(Test)(?![^<]*<\/a>)/gi
Not sure if this will accomplish your needs, but the second capturing group should only include matches that do not fall within the anchor tag.
(<a.*?<\/a>)|(test)/gi
https://regex101.com/r/rTLifk/1
However, I would highly recommend utilizing an XML parser or XPath.
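The alternation trick in that answer can be sketched in Python as follows (Python is an assumption; the question is about C#). The first alternative swallows a whole anchor element, so any "test" inside it is never seen by the second alternative; only matches where group 2 is set are kept. The `\b` after `<a` is an addition to the answer's pattern so that tags like `<abbr>` are not treated as anchors.

```python
import re

html = (
    '<a title="Test" href="http://www.google.com/">'
    '<span style="color: blue;">Test</span></a>'
    "<p>Test - MANUALLY<br /><br />Google </p>"
    "<p> Resolving as duplicate of Test</p><p>Test test</p>"
)

# Group 1: an entire anchor element (discarded).
# Group 2: the word we actually want, found outside any anchor.
pattern = re.compile(r"(<a\b.*?</a>)|(test)", re.IGNORECASE | re.DOTALL)

outside = [m.group(2) for m in pattern.finditer(html) if m.group(2) is not None]
print(outside)  # ['Test', 'Test', 'Test', 'test']
```

The two occurrences inside the anchor (the title attribute and the span text) are consumed by group 1 and therefore skipped. This sketch assumes anchors are not nested or malformed.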
This question already has answers here:
Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
(10 answers)
Closed 5 years ago.
I have an XML file with this data format
<row Id="9" Body="aaaaaaaaa" Target="123456" />
I want to find and replace the value of every Body="..." attribute with a space in my XML file. What is the regex for that?
There are many possibilities; here is one way to remove the content of the Body attribute:
(<row.*Body=").*?("[^>]+>)
This creates two capturing groups for the content before and after the Body attribute. Then, you just use those capturing groups for the replacement:
$1$2
It will transform:
<row Id="9" Body="aaaaaaaaa" Target="123456" />
Into:
<row Id="9" Body="" Target="123456" />
You can see it working here.
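As a rough illustration, the same replacement in Python (an assumption; the answer does not say which tool it targets) looks like this. Python writes the backreferences as `\1` and `\2` instead of `$1` and `$2`. The second call inserts a single space between the groups, which is what the question literally asked for.

```python
import re

line = '<row Id="9" Body="aaaaaaaaa" Target="123456" />'

# Group 1: everything up to and including Body="
# Group 2: the closing quote and the rest of the tag
pattern = re.compile(r'(<row.*Body=").*?("[^>]+>)')

emptied = pattern.sub(r"\1\2", line)    # Body=""
spaced = pattern.sub(r"\1 \2", line)    # Body=" "

print(emptied)
print(spaced)
```

Because `.*` is greedy and does not cross newlines by default, this sketch assumes one `<row ... />` element per line.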
This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 8 years ago.
I'm trying to write a regular expression to see if a string contains any of the typical table tags:
<table></table>
<td></td>
<th></th>
<tr></tr>
<thead></thead>
<tfoot></tfoot>
<tbody></tbody>
Along with tags that may contain other attributes e.g:
<table border="1">
I've come up with this so far, however, it matches <br /> tag and I'm not sure why:
/<\/?[table|td|th|tr|tfoot|thead|tbody]{1,}>?/
http://www.rexfiddle.net/20Xtqka
Regular expressions use parentheses, not square brackets, to group things. A set of characters inside square brackets matches any one of those characters, so your pattern matches <br because both b and r appear in the set.
/<\/?(table|td|th|tr|tfoot|thead|tbody)+>?/
When you want to match 1 or more of something, use + rather than {1,}.
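The difference between the character class and the group can be checked with a short Python sketch (Python and the names below are assumptions). The second pattern is a stricter variant, not the answer's exact regex: it drops the `+`, requires a word boundary after the tag name, allows attributes, and requires the closing `>`.

```python
import re

# The question's pattern: square brackets form a character class, so any
# run of the letters t, a, b, l, e, d, h, r, f, o, y (or "|") is accepted.
char_class = re.compile(r"</?[table|td|th|tr|tfoot|thead|tbody]{1,}>?")

# Stricter variant (an assumption): exact tag names, optional attributes.
table_tag = re.compile(
    r"</?(?:table|thead|tbody|tfoot|tr|td|th)\b[^>]*>", re.IGNORECASE
)


def contains_table_tag(text):
    """Return True if text contains one of the table-related tags."""
    return table_tag.search(text) is not None


print(bool(char_class.search("<br />")))          # True: "b" and "r" are in the class
print(contains_table_tag("<br />"))               # False
print(contains_table_tag('<table border="1">'))   # True
```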
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
How to remove single attribute with quotes via RegEx
I am trying to remove the "sfref" attribute from the html code below:
<a sfref="[Libraries]719c25f9-89b3-4a7c-b6d5-e734b0c06ac1" href="../../HPLC.sflb.ashx">Determination</a> <br />
<img sfref="[Libraries]3e60aebb-acac-4806-bd22-f7986f66e7b3" src="../../Note52011.sflb.ashx">Test</a><br />
So far I have come up with this regex, but it is not matching:
(sfref=")([a-zA-Z0-9:;.\s()-\,]*)(")
This is where I am testing, if it helps:
http://regexr.com?2v4h6
Can someone please help me remove the "sfref" attribute?
You really really really shouldn't use regex (see the link in @Jack Maney's comment), but if you have to, this should work:
sfref="[^"]*"
This will work for single or double quotes.
sfref=('|").*?\1
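A minimal sketch of applying the quote-aware pattern in Python (an assumption; the question does not state a language). The leading `\s+` is an addition so that the space in front of the attribute is removed along with it; the function name is illustrative.

```python
import re

# (['"]) captures the opening quote; \1 requires the same quote to close it.
SFREF = re.compile(r"""\s+sfref=(['"]).*?\1""")


def strip_sfref(html):
    """Remove every sfref attribute, whichever quote style it uses."""
    return SFREF.sub("", html)


link = (
    '<a sfref="[Libraries]719c25f9-89b3-4a7c-b6d5-e734b0c06ac1" '
    'href="../../HPLC.sflb.ashx">Determination</a> <br />'
)
print(strip_sfref(link))
# <a href="../../HPLC.sflb.ashx">Determination</a> <br />
```

As the first answer warns, this is fragile on arbitrary HTML; an HTML parser is the safer choice when the input is not under your control.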