Reg exp: string NOT in pattern - regex

I'm having trouble constructing a regular expression. I think I should use a lookahead or lookbehind, but I can't get it to work.
I want to make a reg-exp that catches all HTML tags that do NOT contain a string ('rabbit').
For example, the following tags should be matched
<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>
But not the following
<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> </li rabbit=abcd hippo=9876> <div hello=rabbit>
(My next step is to make a substitution that inserts the word rabbit into those tags, but that will hopefully come easy.)
(I'm using preg_replace in PHP 5.)
Thanks.

I guess you're matching the HTML tags with a regex something like this:
/<[^>]*>/
You can add a negative look-ahead assertion in there to assert that "rabbit" cannot be found in the tag:
/<(?![^>]*rabbit)[^>]*>/
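As a quick check of the look-ahead pattern above, here is a minimal sketch in Python rather than PHP (the variable names are mine, and the two input strings are the example tags from the question). It only confirms which tags the pattern picks up; it does not do the later substitution step.

```python
import re

# Same pattern as above: a tag whose body does not contain "rabbit".
TAG_WITHOUT_RABBIT = re.compile(r"<(?![^>]*rabbit)[^>]*>")

should_match = "<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>"
should_not_match = (
    "<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> "
    "</li rabbit=abcd hippo=9876> <div hello=rabbit>"
)

# findall returns every non-overlapping match as a string.
matched = TAG_WITHOUT_RABBIT.findall(should_match)
skipped = TAG_WITHOUT_RABBIT.findall(should_not_match)

print(matched)  # all five tags from the first line
print(skipped)  # empty list
```

Because `[^>]*` cannot cross a `>`, the look-ahead only inspects the current tag, so a "rabbit" in a later tag does not affect an earlier one.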

Related

How to match one pattern inside another with a regexp?

I have a string:
This is a <div> simple div </div> test /n
How can I match:
match_1:
'<div> simple div </div>'
and then, from match_1, get a second, final match?
'simple div'
In other words: "get pattern_1 > get pattern_2 (from pattern_1)"
Sounds like you just need to use some simple capture groups in one regex query. No need to do two separate expressions:
.*(<div>([\w\s]+)<\/div>).*
Full match: This is a <div> simple div </div> test /n
Group 1: <div> simple div </div>
Group 2: simple div
If you're using Python, you can always call .strip() on group 2 to trim the surrounding whitespace.
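A minimal Python sketch of the nested capture groups, assuming the input string from the question. I dropped the surrounding `.*` and used `re.search`, which finds the first match anywhere in the string; the variable names are just for illustration.

```python
import re

text = "This is a <div> simple div </div> test /n"

# Group 1 is the whole <div>...</div>, group 2 is the text inside it.
m = re.search(r"(<div>([\w\s]+)</div>)", text)

outer = m.group(1)          # the full div element
inner = m.group(2).strip()  # inner text with surrounding spaces trimmed

print(outer)
print(inner)
```

Note that `[\w\s]+` only accepts word characters and whitespace, so this would not match a div whose content contains punctuation or other tags.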

How do I conditionally add a space in a regex replace

When I woke up this morning, I didn’t know a stroke of regex. By the time I went to Mass, I’d been able to cobble together this regex to find occurrences of ‘Mph’ in an html document.
(?i)(?<=[\s|\d])mph+
If I run it against the following test data:
<div class="vsMph">
<p>95 Mph</p>
</div>
<div class="vsMph">
<p>95Mph</p>
</div>
It correctly matches:
‘ Mph’ and
‘Mph’
And equally correctly leaves the ‘vsMph’ alone, which is exactly what I want. Eventually, I'm going to use the same technique to match knots, ft, in, km and so on.
I’m executing this expression in Sublime Text 3 using RegReplace. Ultimately, I hope to use this regular expression to find all occurrences of ‘Mph’ preceded by a space or a digit and:
1. Enclose ‘Mph’ in <abbr> tags.
2. Add a space between the digit and the opening <abbr> tag if there was no space between the last digit and 'Mph' originally.
In other words, I want to convert the above test data to:
<div class="vsMph">
<p>95 <abbr title="Miles per hour">Mph</abbr></p>
</div>
<div class="vsMph">
<p>95 <abbr title="Miles per hour">Mph</abbr></p>
</div>
I can get RegReplace to add the <abbr> tags as described in 1. above, but I’ve searched around on Google and I can’t find anything that tells me how to conditionally insert a space in a regex replace.
So I’m wondering: is it possible to conditionally add a space in a regex replacement at all, and if so, how do I do it? Or do I have to search for ‘\sMph’ and ‘\dMph’ and replace them separately?
Regards.
I would suggest using groups to match Mph. You could simply search for the following regex:
(\d)(\s)?(Mph)
Then replace using the groups, always writing a space after the digit and dropping the optional one you captured:
$1 <abbr title="Miles per hour">$3</abbr>
Output:
<div class="vsMph">
<p>95 <abbr title="Miles per hour">Mph</abbr></p>
</div>
<div class="vsMph">
<p>95 <abbr title="Miles per hour">Mph</abbr></p>
</div>

How to match only the URL from an <a> tag in Node.js

I have an <a> tag that wraps <span style="color: rgb(255,255,255);">[1]</span>
I am using this regex: <a href="(.*)">(.*)<\/a> but it's not capturing only the URL. It's also capturing <span style="color: rgb(255,255,255);">[1]</span>
How can I get only the URL from <a> tags?
Easily! Capture everything between the keyword href= and the closing >.
href=(.*?)>
If you don't want to capture "", try this one:
href="(.*?)">
I don't have much experience with Node.js, but I think the following should work; adapting it won't be hard if you know regex.
var pattern = new RegExp(/href="(.*?)">/);
Here is Regex101.
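To see the lazy capture in action, here is a minimal sketch in Python rather than Node.js. The question's actual link did not survive, so the URL below is a made-up placeholder, and the variable names are mine.

```python
import re

# Hypothetical input: the href value is an assumption, not from the question.
html = '<a href="https://example.com/page"><span style="color: rgb(255,255,255);">[1]</span></a>'

# Lazy (.*?) stops at the first "> after href=", so only the URL is captured.
urls = re.findall(r'href="(.*?)">', html)

print(urls)
```

This pattern assumes href is the last attribute in the tag and uses double quotes; other attribute orders or quoting styles would need a looser pattern.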

Regex find and replace between <div class="customclass"> and </div> tag

I can't find a working regex anywhere to find and replace the text between div tags.
I have this HTML, where I want to select everything between the <div class="info"> and </div> tags and replace it with some other text:
<div class="extraUserInfo">
<p>Hello World! This is a sample text</p>
<javascript>.......blah blah blah etc etc
</div>
and replace it with
My custom text with some codes
<tags> asdasd asdasdasdasdasd</tags>
so it would look like
<div class="extraUserInfo">
My custom text with some codes
<tags> asdasd asdasdasdasdasd</tags>
</div>
Here is a refiddle with all my code; as you can see, I want to replace everything between the <div class="extraUserInfo"> and </div> tags:
http://refiddle.com/1h6j
Hope you get what I mean :)
If there's no nesting, I would just do a plain non-greedy (lazy) match:
(?s)<div class="extraUserInfo">.*?</div>
.*? matches any amount of any character (as few as possible) until it meets </div>.
The s modifier makes the dot match newlines too.
Edit: Here is a JavaScript version without the s modifier:
/<div class="extraUserInfo">[\s\S]*?<\/div>/g
And replace with new content:
<div class="extraUserInfo">My custom...</div>
See example at regex101; Regex FAQ
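A minimal Python sketch of the same replacement, using `re.DOTALL` as the equivalent of the s modifier. The input and replacement text come from the question; the variable names are mine. As noted above, this only works when the div has no nested divs.

```python
import re

html = """<div class="extraUserInfo">
<p>Hello World! This is a sample text</p>
<javascript>.......blah blah blah etc etc
</div>"""

replacement = """<div class="extraUserInfo">
My custom text with some codes
<tags> asdasd asdasdasdasdasd</tags>
</div>"""

# re.DOTALL lets the dot match newlines; .*? stops at the first </div>.
pattern = re.compile(r'<div class="extraUserInfo">.*?</div>', re.DOTALL)

# A function replacement avoids backslash processing in the new text.
result = pattern.sub(lambda m: replacement, html)

print(result)
```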

Regexp: remove all tags from string except one kind of tags

I have a string like this:
<p>test <span class=\"match\">match</span> <span class=\"testtes\">dddddd</span></p>
I want to get the string without tags, but keep the highlighting span with class "match":
test <span class=\"match\">match</span> dddddd
If I just wanted to remove all tags, I would replace every substring matching the regexp /<\/?[^>]*>/ with an empty string. But what regexp should I use in my special case?
UPD: The algorithm is: if you see <span class=\"match\">, then some text without tags, and then </span>, you shouldn't remove that span; otherwise you should remove all tags.
You could do something like this:
<\/?(?![^>]*class=\\"match)[^>]*>
This would preserve the opening tag and result in this
test <span class=\"match\">match dddddd
See it here on Regexr
But how should I find the matching closing tag?
<p>test <span class=\"match\">match</span> <span class=\"testtes\">dddddd</span></p>
                                   ^^^^^^^ or the next one?              ^^^^^^^
Regex can't know which closing tag belongs to the opening <span> tag that contains that class, and there is no way to pair opening and closing tags with it. So it's not a good idea to do this using regex.
I am quite sure the language you are using has an HTML parser that can be used for this task.
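As an illustration of the parser route, here is a minimal sketch using Python's standard `html.parser` module. The class and function names are hypothetical, and I'm assuming the input has plain double quotes (the `\"` in the question looks like string-literal escaping) and that the class attribute is exactly "match". A stack records, for each open span, whether it was kept, so the right closing tag is paired with it.

```python
from html.parser import HTMLParser


class KeepMatchSpans(HTMLParser):
    """Drop every tag except <span class="match"> and its closing tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.span_stack = []  # one bool per open <span>: was it kept?

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        keep = tag == "span" and ("class", "match") in attrs
        if tag == "span":
            self.span_stack.append(keep)
        if keep:
            self.out.append('<span class="match">')

    def handle_endtag(self, tag):
        # Emit </span> only if the span it closes was kept.
        if tag == "span" and self.span_stack and self.span_stack.pop():
            self.out.append("</span>")

    def handle_data(self, data):
        self.out.append(data)


def strip_tags_except_match(html):
    parser = KeepMatchSpans()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


source = '<p>test <span class="match">match</span> <span class="testtes">dddddd</span></p>'
result = strip_tags_except_match(source)
print(result)
```

This sketch re-emits text after entity decoding and ignores spans with several classes, so treat it as a starting point rather than a complete sanitizer.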