I am currently in a bit of a pickle with JS Regex. From the following code, we need to extract the content inside the div or span:
<span class="code">
!(true ^ false)
</span>
<span class="code">
(true ^ false)
</span>
So we need to match both !(true ^ false) and (true ^ false)
I came up with the following regex: /<(div|span) class="code">([\s\S]+)<\/(div|span)>/im. This works when the text to be matched contains only one div or span. However, in the situation outlined above, the match is:
!(true ^ false)
</span>
<span class="code">
(true ^ false)
So basically it only takes the first opening tag and the last closing tag. How do I fix this?
You can fix it by matching <div> with </div> and <span> with </span>, and by making the quantifier lazy with ?.
Regex: <(div|span) class="code">([\s\S]+?)<\/\1>
Explanation:
The backreference \1 refers to the first captured group, so an opening <div> can only be closed by </div> and an opening <span> only by </span>.
The ? after + makes the quantifier lazy, so it matches as little text as possible and stops at the first matching closing tag.
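As a rough sketch of how this behaves in JavaScript, the snippet below runs the pattern with the g flag over the two spans from the question. The variable names (codeHtml, codePattern, codeContents) are only illustrative, and the trim() call assumes you want the surrounding newlines removed.

```javascript
// The two spans from the question.
const codeHtml = `<span class="code">
!(true ^ false)
</span>
<span class="code">
(true ^ false)
</span>`;

// Lazy quantifier plus \1 so each opening tag pairs with its own closing tag.
// The g flag is needed for matchAll to walk through every match.
const codePattern = /<(div|span) class="code">([\s\S]+?)<\/\1>/gi;

// Group 2 holds the inner content; trim() drops the surrounding newlines.
const codeContents = [...codeHtml.matchAll(codePattern)].map((m) => m[2].trim());

console.log(codeContents);
// [ '!(true ^ false)', '(true ^ false)' ]
```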
All you really need is to put a ? behind your + to make the match non-greedy and then add the g modifier to continue searching after the first match:
<(div|span) class="code">\s*([\s\S]+?)<\/(div|span)>
I have an HTML string, e.g. :
<a href=“{{foo.bar}}”>some text “nice” here</a>
I'm trying to find out whether any opening/closing curly double quote (“ or ”, not ") is present inside an HTML tag (i.e. inside <>, though there could be other things in the tag as well).
In my example, <a href=“{{foo.bar}}”> should match but “nice” or </a> shouldn't.
What is the right regex for this?
Actually, I don't believe you've found it; rather, you fell into a common trap of regular expressions: you found a pattern that matches what you want in one specific case.
If you place a < character inside the text of the link, as in <a href=“{{foo.bar}}”>some text < “nice” here</a>, your regex will match both <a href=“{{foo.bar}}”> and < “nice” here</a>.
So extra caution is needed when it comes to regular expressions. To match any opening HTML tag, it is better to use <\w+.*?>. After that, extract whatever you find inside “”.
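To make the failure case concrete, here is a small JavaScript sketch. It assumes the pattern under discussion is <[^>]*[“”]+[^>]*> and uses hypothetical variable names; the second half is one possible reading of the two-step suggestion, not a definitive implementation.

```javascript
// Input with a stray "<" in the link text.
const quoteSample = '<a href=“{{foo.bar}}”>some text < “nice” here</a>';

// The pattern under discussion also matches from the stray "<" onward.
const naiveMatches = quoteSample.match(/<[^>]*[“”]+[^>]*>/g) || [];
console.log(naiveMatches);
// [ '<a href=“{{foo.bar}}”>', '< “nice” here</a>' ]

// Two-step approach: find opening tags first, then look for curly-quoted runs.
const openingTags = quoteSample.match(/<\w+.*?>/g) || [];
const quotedInTags = openingTags.flatMap((tag) => tag.match(/“[^”]*”/g) || []);
console.log(quotedInTags);
// [ '“{{foo.bar}}”' ]
```

Note that <\w+.*?> stops at the first >, so an attribute value containing > would still cut the tag short.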
OK, found it: <[^>]*[“”]+[^>]*>
That does not work as you probably expect it to. When you add capturing groups, you'll see which parts of the string are actually matched by which groups:
<([^>]*)([“”]+)([^>]*)>
matches your example in this way:
Full match: <a href=“{{foo.bar}}”>
1st group: a href=“{{foo.bar}}
2nd group: ”
3rd group: (nothing)
Building on @Themelis's answer, you probably want to start with something like this:
<(\w+ [^<>“]*)“([^”]+)”([^<>]*)>
matches your example in this way:
Full match: <a href=“{{foo.bar}}”>
1st group: a href=
2nd group: {{foo.bar}}
3rd group: (nothing)
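If you want to check the group contents yourself, a quick JavaScript sketch like the following prints what each pattern captures for the example tag. The variable names are made up for illustration.

```javascript
const tagText = '<a href=“{{foo.bar}}”>';

// First pattern: the greedy [^>]* swallows the opening quote and the value,
// leaving only the final closing quote for the second group.
const looseGroups = tagText.match(/<([^>]*)([“”]+)([^>]*)>/).slice(1);
console.log(looseGroups);
// [ 'a href=“{{foo.bar}}', '”', '' ]

// Second pattern: explicit opening and closing quotes around the value.
const strictGroups = tagText.match(/<(\w+ [^<>“]*)“([^”]+)”([^<>]*)>/).slice(1);
console.log(strictGroups);
// [ 'a href=', '{{foo.bar}}', '' ]
```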
I have a string:
This is a <div> simple div </div> test /n
How can I match:
match_1:
'<div> simple div </div>'
and then, from match_1, get a second, final match?
'simple div'
Or in other words: "get pattern_1 > get pattern_2 (from pattern_1)"
Sounds like you just need to use some simple capture groups in one regex query. No need to do two separate expressions:
.*(<div>([\w\s]+)<\/div>).*
Full match: This is a <div> simple div </div> test /n
Group 1: <div> simple div </div>
Group 2: simple div
Note that group 2 as captured still includes the spaces around the text. If you're using Python, you can call s = s.strip() on it to trim the excess whitespace.
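The same idea in JavaScript, as a minimal sketch: one match call gives both the outer and the inner capture, and trim() plays the role of strip(). The names divLine, divMatch, outerDiv and innerText are placeholders.

```javascript
const divLine = "This is a <div> simple div </div> test /n";

// Group 1 is the whole <div>…</div>, group 2 is the text inside it.
const divMatch = divLine.match(/.*(<div>([\w\s]+)<\/div>).*/);

const outerDiv = divMatch[1];
const innerText = divMatch[2].trim(); // raw group 2 is ' simple div '

console.log(outerDiv);  // <div> simple div </div>
console.log(innerText); // simple div
```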
I need a regex to fetch the value inside the <span> tag:
<span class="booking-id-value">U166097</span>
value required: U166097
Can someone please suggest one? I have tried using
<span class="booking-id-value">(.+?)
but it does not give the desired result; it returns only "U".
I think you need to be more specific about your expected value (below I'll accept only alphabetic and numeric characters) and more flexible about your tag. Then I can suggest a regex like this:
/<\s*span.+?class\s*=\s*"\s*booking-id-value\s*".*?>\s*([A-Za-z0-9]+)\s*<\//
The ? after .+ makes it lazy (non-greedy): it tells the engine to match as little as possible, and with nothing after the group, that is just the first U in this case.
Remove the ?, and instead add the closing </span> after (.+) so that the match terminates at the right place:
<span class="booking-id-value">(.+)<\/span>
https://regex101.com/r/vt4pgN/1/
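A short JavaScript sketch of the difference, using hypothetical variable names. The third variant (lazy plus closing tag) is my own addition rather than something from the answer; it is the usual choice when several spans can appear in the same text.

```javascript
const spanHtml = '<span class="booking-id-value">U166097</span>';

// Lazy group with nothing after it: one character is enough to satisfy .+?
const lazyOnly = spanHtml.match(/<span class="booking-id-value">(.+?)/)[1];
console.log(lazyOnly); // U

// Greedy group terminated by the closing tag, as suggested above.
const greedyClosed = spanHtml.match(/<span class="booking-id-value">(.+)<\/span>/)[1];
console.log(greedyClosed); // U166097

// Assumption: lazy group terminated by the closing tag, for multi-span input.
const lazyClosed = spanHtml.match(/<span class="booking-id-value">(.+?)<\/span>/)[1];
console.log(lazyClosed); // U166097
```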
Regex:
<span.*>(.*)<\/span>
Substitute with:
$1
Result:
U166097
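In JavaScript the substitution could look roughly like this, assuming the input is the single span from the question (bookingHtml and bookingId are illustrative names):

```javascript
const bookingHtml = '<span class="booking-id-value">U166097</span>';

// Replace the whole span with the contents of the first capture group.
const bookingId = bookingHtml.replace(/<span.*>(.*)<\/span>/, "$1");

console.log(bookingId); // U166097
```

Because both .* parts are greedy, this is only reliable when there is a single span on the line.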
Can anyone help me turn this into a regular expression?
<a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>
The alt text will change, and so might the image, but
<a onclick="NavigateChat();" style="cursor:pointer;">
will always start the string, and
</a>
will always end it. How can I use a regex to find this?
Description
I'm not quite sure what you're looking to return, so this generic regular expression will:
find anchor tags
require the anchor tag to have an attribute onclick="NavigateChat();"
require the anchor tag to have an attribute style="cursor:pointer;"
allow the attributes to be matched in any order
require the anchor tag's inner text to be only an image tag
capture the anchor tag's inner image tag in its entirety
avoid many of the edge cases which make pattern matching in HTML difficult
<a(?=\s|>)(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sonclick="NavigateChat\(\);")(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sstyle="cursor:pointer;")(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>\s*(<img\s.*?)\s*<\/a>
Example
Sample Text
<a onmouseover=' a=1; onclick="NavigateChat();" style="cursor:pointer;" href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href='http://InterestedURL.com' id='revSAR'><img src="YouShouldn'tFindMe.nope"></a>
<a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>
Matches
Group 0 gets the entire matched anchor tag
Group 1 gets the inner text
[0][0] = <a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>
[0][1] = <img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/>
Do you need to extract/capture certain pieces of info or just find the whole string?
My usual method for generalizing regexp is to start with the literal text and just replace elements with general placeholders...
<a onclick="NavigateChat\(\);" style="cursor:pointer;"><img src="[^"]+" width="\d+" height="\d+" border="\d+" alt="[^"]+"/></a>
This expression uses the negated character class [^"], which stands for "any character that is not a quote mark". If you just use .* as a wildcard, your regexp will fail when more than one such tag is present in your document: quantifiers are greedy by default, so it would try to select ALL the text from the first tag through to the end of the last link.
Without a data sample, I can't test this for sure, but it should be close.
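Here is a hedged JavaScript sketch of that point. The first link is the one from the question; the second link, the capture groups around src and alt, and all variable names are assumptions added for illustration.

```javascript
// Two chat links on one line; the second one is made up for this example.
const chatPage = [
  '<a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>',
  '<a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/other.jpg" width="100" height="50" border="0" alt="Second link"/></a>',
].join("");

// A .* wildcard is greedy and runs through to the last </a>.
const greedyMatch = chatPage.match(/<a onclick="NavigateChat\(\);".*<\/a>/)[0];
console.log(greedyMatch === chatPage); // true

// The [^"]+ version stays inside each attribute value, so every link
// is matched on its own. Capture groups pick out src and alt.
const chatLink = /<a onclick="NavigateChat\(\);" style="cursor:pointer;"><img src="([^"]+)" width="\d+" height="\d+" border="\d+" alt="([^"]+)"\/><\/a>/g;
const chatLinks = [...chatPage.matchAll(chatLink)].map((m) => ({ src: m[1], alt: m[2] }));
console.log(chatLinks.length); // 2
```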
I have problems constructing a regexp. I think I should use lookahead/lookbehind, but I just can't get it to work.
I want to make a regexp that matches all HTML tags that do NOT contain a given string ('rabbit').
For example, the following tags should be matched
<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>
But not the following
<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> </li rabbit=abcd hippo=9876> <div hello=rabbit>
(My next step is to make a substitution so that the word rabbit is inserted into the tags, but that will hopefully come easily.)
(I use PHP5-preg_replace.)
Thanks.
I guess you're matching the HTML tags with a regex something like this:
/<[^>]*>/
You can add a negative look-ahead assertion in there to assert that "rabbit" cannot be found in the tag:
/<(?![^>]*rabbit)[^>]*>/
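To see the lookahead at work, here is a small sketch in JavaScript rather than PHP; the pattern uses only a negative lookahead, which PCRE (and so preg_replace) also supports. The input mixes a few tags from the question, and the variable names are illustrative.

```javascript
// A mix of tags from the question: some contain "rabbit", some do not.
const rabbitInput =
  "<a XXX> <span yyyrabbit> </div x zz> <div hello=rabbit> <div hello=stackoverflow>";

// (?![^>]*rabbit) fails if "rabbit" appears anywhere before the closing ">".
const noRabbitTag = /<(?![^>]*rabbit)[^>]*>/g;

const rabbitFreeTags = rabbitInput.match(noRabbitTag) || [];
console.log(rabbitFreeTags);
// [ '<a XXX>', '</div x zz>', '<div hello=stackoverflow>' ]
```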