I have a string:
This is a <div> simple div </div> test /n
How can I match:
match_1:
'<div> simple div </div>'
and then, from match_1, get a second, final match?
'simple div'
Or, in other words: "get pattern_1 > get pattern_2 (from pattern_1)"
Sounds like you just need to use some simple capture groups in one regex query. No need to do two separate expressions:
.*(<div>([\w\s]+)<\/div>).*
Full match: This is a <div> simple div </div> test /n
Group 1: <div> simple div </div>
Group 2: simple div
If you're using Python, you can always call .strip() on group 2 (e.g. inner = inner.strip()) to trim the whitespace that [\w\s]+ captures around the text.
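As a minimal sketch of the nested capture groups above, assuming Python's built-in re module (the variable names outer and inner are only illustrative):

```python
import re

text = "This is a <div> simple div </div> test /n"

# Group 1 is the whole <div>...</div> element; group 2 is only its inner text.
pattern = re.compile(r".*(<div>([\w\s]+)</div>).*")

match = pattern.match(text)
outer = match.group(1) if match else None
# Group 2 still carries the spaces next to the tags, so trim it.
inner = match.group(2).strip() if match else None

print(outer)
print(inner)
```

One expression yields both results, so there is no need to run a second regex on the first match.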
I want to get this string between the <p> tags here:
<h2>Description</h2>
</div>
<div class="topIdeasDetailsInner clearfix">
<p>There is nothing better than a customized sloth! With this app, you can dress up sloths, face-in-a-hole a sloth, and insert the sloth in any pictures. This app will be like having a small version of photoshop on your phone. </p>
</div>
</div>
In Regexr, I can get that string using the following expression:
<div class="topIdeasDetailsInner clearfix">\n <p>?[^>]*<\/p>
But when I use it in VB it doesn't really work.
"<div class=""topIdeasDetailsInner clearfix"">\n <p>(?<Data>[^>]*)<\/p>"
Maybe in VB I need to match the spaces after "\n" with a specific expression? I tried " *" but that didn't work either.
P.S.: I use the named group <Data> to get the value I want, so don't mind it.
P.S. 2: Here is the page I am trying to get my string from, in case you are wondering: http://www.preapps.com/app-ideas/topic/41/sloth-me
You don't need to hardcode the whitespace.
Just use \s with * (0 or more) or + (1 or more):
<div\s+class="topIdeasDetailsInner clearfix">\s*<p>([^<]*)<\/p>
And surrounding what you need with () will put it into a capture group.
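A quick way to check the \s-based pattern is sketched below in Python rather than VB. The html sample is a shortened, made-up version of the snippet from the question, and the indentation before <p> is an assumption:

```python
import re

# Shortened sample; the real page has more text inside the <p> element.
html = (
    '<div class="topIdeasDetailsInner clearfix">\n'
    '                <p>There is nothing better than a customized sloth! </p>\n'
    '</div>'
)

# \s+ and \s* absorb any mix of spaces, tabs and line breaks between the tags.
pattern = re.compile(
    r'<div\s+class="topIdeasDetailsInner clearfix">\s*<p>([^<]*)</p>'
)

match = pattern.search(html)
description = match.group(1).strip() if match else None
print(description)
```

Because \s also matches \r, the same pattern works whether the page uses \n or \r\n line endings.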
I am using Notepad++ and I want to automate replacing some strings.
In this case I am dealing with the a href tag.
So, I will give 3 examples of some lines I have in my code :
01)
<img src="urlurlurlurl" alt="">
02)
<a href="https://url.com" class="logo"><img src="urlurlurlurl" alt="">
</a>
03)
<img src="urlurlurlurl" alt="">
04)
link
So, if I wanted to replace the full a href tag above in all 4 cases, I would use this one : <a href(.*?)a>
Now, I am trying to think of a way to replace the url within the a href tag only.
I tried using that :
href="(?s)(.*?)"|href ="(?s)(.*?)"
and it works fine because I also take into consideration that there might be a space.
But now in the replace window I have to include href=""
Is there a way to make it search for the a href tags and then replace a specific substring of it?
I want to know because there are cases where I have other tags that include a URL, and I want to replace it. But a generic replacement for all strings enclosed in quotes ("string") would not be good, as I do not want to replace all of them.
You can use a negated character class to match everything before and after the href, like this:
(a[^>]*href\s*=\s*")[^"]*
Replace with the capture group followed by your new value: $1REPLACE_STRING
What does it do?
a[^>]* Matches a followed by anything other than a closing >.
href\s*=\s*" Matches href=", allowing optional whitespace around the =. Everything up to here is captured in group 1.
[^"]* Matches anything other than ". This forms the URL that you want to replace.
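The same replacement can be tried outside Notepad++. The sketch below uses Python's re.sub, where \g<1> plays the role of $1; the replacement URL https://example.org is only a placeholder:

```python
import re

pattern = re.compile(r'(a[^>]*href\s*=\s*")[^"]*')

html = '<a href="https://url.com" class="logo"><img src="urlurlurlurl" alt=""></a>'
spaced = '<a href = "old">link</a>'

# \g<1> re-inserts everything up to the opening quote, so only the URL changes.
result = pattern.sub(r'\g<1>https://example.org', html)
result_spaced = pattern.sub(r'\g<1>https://example.org', spaced)

print(result)
print(result_spaced)
```

The src attribute of the img tag is left alone because the pattern only continues past an a when href follows inside the same tag.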
I'm trying to replace an HTML tag with another one using Notepad++ search and replace.
I would like this:
<strong style="font-size: 1em;"><br />some text</strong>
to become this:
<h3>some text</h3>
So far I have reached this:
<strong style="font-size: 1em;"\s(.*?)><br />(.*?)</strong>
and I am not sure what to put in "Replace with". Is this OK:
<h3>$1</h3>
?
Thanks
Try this as the replacement pattern.
<h3>\2</h3>
You can reference capture groups (the parts between parentheses) in the regex with \n, where n is the number of the group.
The search regex should be:
<strong style="font-size: 1em;"\s?(.*?)><br />(.*?)</strong>
The \s should be optional, because your HTML has no whitespace between the closing quote of the style attribute and the >.
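To see why the replacement needs group 2 rather than group 1, here is a small sketch using Python's re.sub in place of the Notepad++ dialog (the backreference syntax \2 is the same):

```python
import re

html = '<strong style="font-size: 1em;"><br />some text</strong>'

# Group 1 holds any extra attributes after style="..." (empty here);
# group 2 holds the text between <br /> and </strong>.
pattern = re.compile(
    r'<strong style="font-size: 1em;"\s?(.*?)><br />(.*?)</strong>'
)

result = pattern.sub(r'<h3>\2</h3>', html)
print(result)
```

With \1 (or $1) the output would be an empty <h3></h3>, since group 1 matches nothing in this input.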
I'm trying to build a regex to extract links that do not have rel="nofollow" from text.
Example:
aiusdiua asudauih <a rel="nofollow" href="http://uashiuadha.asudh/adas">adsaag</a> uhwaida <br> asdgydug <a href="http://asdha.sda/uduih/dufhuis">aguuia</a>
Thanks!
The following regex will do the job:
<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"
The wanted URLs will be in capture group 1. E.g. in Ruby, for the first match:
if input =~ /<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"/
match = $~[1]
end
Since the negative lookahead accepts [^>]*? before rel, the tag is rejected even when href or anything else comes before rel. If href comes after rel, that is of course handled as well.
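To collect every matching link rather than just the first, the same pattern can be run with findall. This is a sketch in Python with made-up example URLs; the third link is added to show rel="nofollow" placed after href:

```python
import re

text = (
    'aiusdiua <a rel="nofollow" href="http://nofollow.example/a">x</a> '
    'uhwaida <br> <a href="http://follow.example/b">y</a> '
    '<a href="http://nofollow.example/c" rel="nofollow">z</a>'
)

# The lookahead scans the rest of the tag (up to the next >) for rel="nofollow"
# and rejects the tag if it is found anywhere, before or after href.
pattern = re.compile(r'<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"')

urls = pattern.findall(text)
print(urls)
```

Only the link without rel="nofollow" is returned; both orderings of rel and href are filtered out.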
Try this (with case-insensitive matching enabled, since the tag names are written in upper case):
<(?:A|AREA)\b(?![^<>]*?rel="nofollow")[^<>]*?href=['"]([^>"]*)[^>]*?>
If you are using .NET regex, then:
<(?:A|AREA)\b(?![^<>]*?rel="nofollow")[^<>]*?href=['"](?<URL>[^>"]*)[^>]*?>
The data is in the group named URL, or in group 1.
I have problems constructing a regex. I think I should use lookahead/lookbehind, but I just can't get it to work.
I want to make a regex that matches all HTML tags that do NOT contain a string ('rabbit').
For example, the following tags should be matched
<a XXX> <span yyy> </div x zz> </li qwerty=ab cd> <div hello=stackoverflow>
But not the following
<a XXrabbitX> <span yyyrabbit> </div xrabbitzz> </li rabbit=abcd hippo=9876> <div hello=rabbit>
(My next step is to make a substitution so that the word rabbit is inserted into the tags, but that will hopefully come easily.)
(I use PHP 5 with preg_replace.)
Thanks.
I guess you're matching the HTML tags with a regex something like this:
/<[^>]*>/
You can add a negative look-ahead assertion in there to assert that "rabbit" cannot be found in the tag:
/<(?![^>]*rabbit)[^>]*>/
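The lookahead version can be checked against the tags from the question. The sketch below uses Python's re module instead of PHP's preg functions; the pattern itself is unchanged apart from dropping the / delimiters:

```python
import re

tags = (
    '<a XXX> <span yyy> </div x zz> <a XXrabbitX> '
    '<div hello=rabbit> <div hello=stackoverflow>'
)

# [^>]* inside the lookahead cannot cross a >, so the search for "rabbit"
# stays within the current tag.
pattern = re.compile(r'<(?![^>]*rabbit)[^>]*>')

matched = pattern.findall(tags)
print(matched)
```

Tags containing rabbit are skipped, and a rabbit in a later tag does not affect an earlier one.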