How to match one pattern inside another with a regexp? - regex

I have a string:
This is a <div> simple div </div> test /n
How can I match:
match_1:
'<div> simple div </div>'
and then, from match_1, get a second, final match?
'simple div'
In other words: "get pattern_1 > get pattern_2 (from pattern_1)"

It sounds like you just need a couple of simple capture groups in one regex. There is no need for two separate expressions:
.*(<div>([\w\s]+)<\/div>).*
Full match: This is a <div> simple div </div> test /n
Group 1: <div> simple div </div>
Group 2: simple div
Group 2 actually keeps the spaces around the text. If you're using Python, you can call strip() on it (for example, text = text.strip()) to trim the excess whitespace.
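For anyone who wants to try it, here is a minimal Python sketch of the same single-regex approach using the standard re module. The variable names are only illustrative, and the sample string is the one from the question.

```python
import re

# Sample string from the question (the trailing "/n" is kept as written).
text = "This is a <div> simple div </div> test /n"

# Group 1 is the whole <div>...</div> element, group 2 is its inner text.
pattern = re.compile(r".*(<div>([\w\s]+)</div>).*")

match = pattern.match(text)
if match:
    outer = match.group(1)          # the full element, tags included
    inner = match.group(2).strip()  # inner text with surrounding spaces trimmed
    print(outer)
    print(inner)
```

Both values come out of one match object, so no second pass over group 1 is needed.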

Related

Trouble using Regex in VB for a string that contains \n + space

I want to get this string between the <p> tags here:
<h2>Description</h2>
</div>
<div class="topIdeasDetailsInner clearfix">
<p>There is nothing better than a customized sloth! With this app, you can dress up sloths, face-in-a-hole a sloth, and insert the sloth in any pictures. This app will be like having a small version of photoshop on your phone. </p>
</div>
</div>
In Regexr, I can get that string using the following expression:
<div class="topIdeasDetailsInner clearfix">\n <p>?[^>]*<\/p>
But when I use it in VB, it doesn't work:
"<div class=""topIdeasDetailsInner clearfix"">\n <p>(?<Data>[^>]*)<\/p>"
Maybe in VB I need to describe the spaces after "\n" with a specific expression? I tried " *", but that didn't work either...
P.S.: I use the named group <Data> to get the value I want, so don't mind it.
P.S. 2: Here is the page I'm trying to get the string from, in case you're wondering: http://www.preapps.com/app-ideas/topic/41/sloth-me
You don't need to hardcode the whitespace.
Just use \s with * (0 or more) or + (1 or more):
<div\s+class="topIdeasDetailsInner clearfix">\s*<p>([^<]*)<\/p>
Wrapping the part you need in () puts it into a capture group.
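The question is about VB, but the pattern itself is easy to check in isolation. Below is a rough Python sketch, assuming a shortened copy of the HTML from the question (the paragraph text is truncated here and the indentation is a guess). It keeps the asker's named group Data, written in Python's (?P<Data>...) syntax.

```python
import re

# Shortened sample of the markup from the question; indentation is assumed.
html = (
    '<div class="topIdeasDetailsInner clearfix">\n'
    '                <p>There is nothing better than a customized sloth! </p>\n'
    '</div>'
)

# \s+ and \s* absorb the newline and any number of spaces between the tags.
pattern = re.compile(
    r'<div\s+class="topIdeasDetailsInner clearfix">\s*<p>(?P<Data>[^<]*)</p>'
)

match = pattern.search(html)
data = match.group("Data").strip() if match else None
print(data)
```

Because \s matches newlines as well as spaces and tabs, the same pattern keeps working if the page changes its indentation.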

Replace substring of a string using REGEX in Notepad++

I am using Notepad++ and I want to automate replacing some strings.
In this case I am dealing with the a href tag.
Here are 4 examples of lines I have in my code:
01)
<img src="urlurlurlurl" alt="">
02)
<a href="https://url.com" class="logo"><img src="urlurlurlurl" alt="">
</a>
03)
<img src="urlurlurlurl" alt="">
04)
link
So, if I wanted to replace the full a href tag in all 4 cases above, I would use this: <a href(.*?)a>
Now I am trying to find a way to replace only the URL within the a href tag.
I tried this:
href="(?s)(.*?)"|href ="(?s)(.*?)"
and it works fine, because it also accounts for a possible space before the equals sign.
But now I have to include href="" in the Replace field.
Is there a way to search for the a href tags and then replace only a specific substring of each one?
I want to know because there are cases where other tags include a URL that I want to replace. But a generic replacement of every string within quotes ("string") would not be good, as I do not want to replace all of them.
You can use a negated character class to match everything before the URL and the URL itself, like this:
(a[^>]*href\s*=\s*")[^"]*
Replace with the capture group followed by your new text: $1REPLACE_STRING
What does it do?
a[^>]* matches a followed by anything other than a closing >.
href\s*=\s*" matches href=". Everything up to here is captured in group 1.
[^"]* matches anything other than ". This is the URL that you want to replace.
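To see the capture-group replacement outside Notepad++, here is a small Python sketch of the same idea. The replacement URL https://example.org is a made-up placeholder, and Python writes the group reference as \g<1> where Notepad++ uses $1.

```python
import re

# Same pattern as above: group 1 holds everything up to the opening quote.
pattern = re.compile(r'(a[^>]*href\s*=\s*")[^"]*')

line = '<a href="https://url.com" class="logo"><img src="urlurlurlurl" alt="">'
no_link = '<img src="urlurlurlurl" alt="">'

# Keep group 1 and swap in the new URL (a placeholder for illustration).
replaced = pattern.sub(r'\g<1>https://example.org', line)
untouched = pattern.sub(r'\g<1>https://example.org', no_link)

print(replaced)
print(untouched)
```

Since the pattern starts with a bare a, it would also match other tags that have the letter a somewhere before href (for example a base tag). Anchoring it as <a\b is one possible tightening if that matters for your files.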

Regex to replace a surrounding HTML tag

I'm trying to replace an HTML tag with another one using Notepad++ search and replace.
I would like this:
<strong style="font-size: 1em;"><br />some text</strong>
to become this:
<h3>some text</h3>
So far I have this:
<strong style="font-size: 1em;"\s(.*?)><br />(.*?)</strong>
and I am not sure what to put in "Replace with". Is this OK?
<h3>$1</h3>
Thanks
Try this as the replacement pattern:
<h3>\2</h3>
You can reference capture groups (the parts between parentheses) in the regex with \n, where n is the number of the group.
The search regex should be:
<strong style="font-size: 1em;"\s?(.*?)><br />(.*?)</strong>
The \s should be optional, because your HTML has no whitespace at that position.
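As a quick check of the group numbering, here is a minimal Python sketch of the same search and replace. It assumes the sample line from the question; Python's re.sub accepts the same \2 backreference in the replacement string.

```python
import re

# Group 1 is whatever follows the style attribute, group 2 is the tag's text.
pattern = re.compile(
    r'<strong style="font-size: 1em;"\s?(.*?)><br />(.*?)</strong>'
)

source = '<strong style="font-size: 1em;"><br />some text</strong>'

# \2 inserts the second capture group, i.e. the text between <br /> and </strong>.
result = pattern.sub(r'<h3>\2</h3>', source)
print(result)
```

Using \1 here would insert the (empty) first group and drop the text, which is why the replacement needs group 2.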

Regex for extracting links with specified attributes

I'm trying to build a regex that extracts links from text, but only those that do not have rel="nofollow".
Example:
aiusdiua asudauih <a rel="nofollow" hre="http://uashiuadha.asudh/adas>adsaag</a> uhwaida <br> asdgydug <a href="http://asdha.sda/uduih/dufhuis>aguuia</a>
Thanks!
The following regex will do the job:
<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"
The URLs you want will be in capture group 1. E.g. in Ruby it would be:
if input =~ /<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"/
match = $~[1]
end
Because the negative lookahead allows [^>]*? before rel, the check still works when href or anything else comes before rel. It works equally well when href comes after rel.
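Here is a rough Python equivalent for comparison. The sample text below is made up for illustration (well-formed tags with hypothetical example URLs, unlike the snippet in the question) and covers rel before href, rel after href, and no rel at all.

```python
import re

# Same pattern as above: the lookahead rejects any <a ...> tag that
# contains rel="nofollow" anywhere before its closing >.
pattern = re.compile(r'<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"')

# Hypothetical input, not the asker's original text.
text = (
    'aiusdiua <a rel="nofollow" href="http://nofollow.example/a">x</a> uhwaida <br> '
    '<a href="http://follow.example/b">y</a> '
    '<a href="http://nofollow.example/c" rel="nofollow">z</a>'
)

# findall returns the contents of capture group 1 for every match.
links = pattern.findall(text)
print(links)
```

Only the link without rel="nofollow" is returned, regardless of where rel sits in the other two tags.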
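Try this (with case-insensitive matching enabled, since the tag names are written in upper case):
<(?:A|AREA)\b(?![^<>]*rel="nofollow")[^<>]*?href=['"]([^>"]*)[^>]*?>
If you are using .NET regex, then:
<(?:A|AREA)\b(?![^<>]*rel="nofollow")[^<>]*?href=['"](?<URL>[^>"]*)[^>]*?>
The data is in the group named URL (or in group 1).
The negative lookahead sits right after the tag name so that it scans the whole tag for rel="nofollow". If it were placed after a lazy [^<>]*?, the engine could simply pick a position where rel does not start, and the check would never exclude anything.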

Reg exp: string NOT in pattern

I'm having trouble constructing a regexp. I think I should use lookahead/lookbehind, but I can't get it to work.
I want a regexp that matches all HTML tags that do NOT contain a string ('rabbit').
For example, the following tags should be matched:
<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>
But not the following:
<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> </li rabbit=abcd hippo=9876> <div hello=rabbit>
(My next step is to make a substitution so that the word rabbit is inserted into the tags, but that will hopefully be easy.)
(I use preg_replace in PHP 5.)
Thanks.
I guess you're matching the HTML tags with a regex something like this:
/<[^>]*>/
You can add a negative look-ahead assertion in there to assert that "rabbit" cannot be found in the tag:
/<(?![^>]*rabbit)[^>]*>/
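The question uses PHP, but the pattern can be checked in any engine with lookahead support. Below is a minimal Python sketch that runs it over the two example lines from the question; the variable names are only illustrative.

```python
import re

# A tag matches only if "rabbit" does not occur before its closing >.
pattern = re.compile(r'<(?![^>]*rabbit)[^>]*>')

should_match = '<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>'
should_not = ('<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> '
              '</li rabbit=abcd hippo=9876> <div hello=rabbit>')

matched = pattern.findall(should_match)
rejected = pattern.findall(should_not)

print(matched)
print(rejected)
```

All five tags in the first line are returned, and none from the second.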