Capture specific first matches in regex

I have this text and want to capture each 'ñ' inside the HTML href attribute. I want it to match the 'ñ' in both niño.html and niña.html, but not the ones in the link texts Niño and Niña:
<a href='niño.html'>Niño</a> <a href='niña.html'>Niña</a>
I tried this but it also matches Niño:
ñ(.*?\.html'>)+?
When replacing with n\1, it gives:
<a href='nino.html'>Nino</a> <a href='niña.html'>Niña</a>
What I would want the text to look like is:
<a href='nino.html'>Niño</a> <a href='nina.html'>Niña</a>
How can I do this?

Try this, so that the part in between cannot contain a single quote:
ñ([^']*?\.html'>)+?
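To sanity-check the answer outside the editor, here is a small Python sketch using re.sub, which accepts \1 in the replacement just as the question does. It runs the answer's pattern over the sample line from the question and assumes file names never contain a single quote.

```python
import re

text = "<a href='niño.html'>Niño</a> <a href='niña.html'>Niña</a>"

# ñ, then anything except a single quote up to ".html'>", so a match cannot
# run past the closing quote of the href and into the link text.
pattern = r"ñ([^']*?\.html'>)+?"

result = re.sub(pattern, r"n\1", text)
print(result)  # <a href='nino.html'>Niño</a> <a href='nina.html'>Niña</a>
```

Each match consumes the rest of the file name, so a name containing two ñ characters (say ñiño.html) would only have its first ñ replaced per pass; running the replacement again picks up the next one.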

Related

Replace substring of a string using REGEX in Notepad++

I am using Notepad++ and I want to automate replacing some strings.
In this case I am dealing with the a href tag.
So, here are 4 examples of lines I have in my code:
01)
<img src="urlurlurlurl" alt="">
02)
<a href="https://url.com" class="logo"><img src="urlurlurlurl" alt="">
</a>
03)
<img src="urlurlurlurl" alt="">
04)
link
So, if I wanted to replace the full a href tag in all 4 cases above, I would use this: <a href(.*?)a>
Now, I am trying to find a way to replace only the URL within the a href tag.
I tried this:
href="(?s)(.*?)"|href ="(?s)(.*?)"
and it works, since it also allows for a space before the =.
But now I have to include href="" in the replacement text as well.
Is there a way to make it search for the a href tags and then replace only a specific substring of them?
I ask because there are cases where other tags also include a URL, and I want to replace it. But a generic replacement of all the strings within quotes ("string") would not be good, as I do not want to replace all of them.
You can use negated character classes to match everything before and after the href, like this:
(a[^>]*href\s*=\s*")[^"]*
and replace with $1REPLACE_STRING (capture group 1 followed by your new URL).
What it does:
a[^>]* matches a followed by anything other than a closing >.
href\s*=\s*" matches href=", allowing whitespace around the =. Everything up to here is captured in group 1.
[^"]* matches anything other than ". This is the URL that you want to replace.
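The same replacement can be tried in Python as a sketch; the main difference is that Python's re.sub writes the $1 back-reference as \1 or \g<1>. The URL https://example.org is a placeholder standing in for REPLACE_STRING, and the second sample line is made up to show the optional spaces around =.

```python
import re

# Group 1: from the "a" up to and including href=" (spaces around = allowed).
# [^"]* then matches the old URL, which is left out of the replacement.
pattern = r'(a[^>]*href\s*=\s*")[^"]*'

# Placeholder for REPLACE_STRING. \g<1> is used instead of \1 so the group
# number stays unambiguous even if the new text starts with a digit.
replacement = r'\g<1>https://example.org'

lines = [
    '<a href="https://url.com" class="logo"><img src="urlurlurlurl" alt=""></a>',
    '<a class="x" href = "old.html">link</a>',  # hypothetical sample
]
results = [re.sub(pattern, replacement, line) for line in lines]
for r in results:
    print(r)
```

Only the href value changes; the img src in the first line is left alone because no href follows it before the closing >.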

Match that doesn't end with a slash

I'd like to match URLs that don't end in /, to use it in Dreamweaver's find tool.
What regex could I use?
For example, I'd like the following URL to be matched:
<a href="http://www.sometext"
You can do it with this simple regex:
href=".+?[^/]"
Explanation:
It will match href="________X", where X != /.
The following will match:
<a href="http://some-url.com">
<a href="http://www.another-url-here.com/content">
These ones won't:
<a href="http://www.url.com/">
<a href="http://www.url-2.com/posts/2014/">
Edit:
The following also allows whitespace between the = and the opening quote, as in <a href= "http://www.url.com">:
href=\s*".+?[^/]"
Sure. You can use [^/]" at the end of your link expression to match any non-slash followed by a close-quote.
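Maybe with this, which keeps the match inside one quoted value and tolerates whitespace around the = and before the closing quote: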
href\s*=\s*"[^"]*[^"/\s]\s*"
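Python's re module can stand in for Dreamweaver's find tool to compare two of these patterns (assuming both engines treat these constructs the same way). The four URLs are the examples from the first answer; the two-link line is made up to show what happens when several tags sit on one line.

```python
import re

# First answer: the lazy .+? is allowed to run past the closing quote of one href.
loose = re.compile(r'href=".+?[^/]"')
# Last answer: [^"]* keeps the match inside a single attribute value.
strict = re.compile(r'href\s*=\s*"[^"]*[^"/\s]\s*"')

samples = [
    '<a href="http://some-url.com">',
    '<a href="http://www.another-url-here.com/content">',
    '<a href="http://www.url.com/">',
    '<a href="http://www.url-2.com/posts/2014/">',
]
matched = [s for s in samples if strict.search(s)]
print(matched)  # the first two samples only

# Two links on one line: the first ends in "/", the second does not.
line = '<a href="http://www.url.com/">A</a> <a href="http://some-url.com">B</a>'
print(loose.findall(line))   # one match running from the first href into the second tag
print(strict.findall(line))  # ['href="http://some-url.com"']
```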

Regular Expression in dreamweaver

Can anyone help me turn this into a regular expression?
<a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>
The alt attribute will change, and so might the image, but
<a onclick="NavigateChat();" style="cursor:pointer;">
will always start the string, and
</a>
will always end it. How can I use a regex to find this?
Description
I'm not quite sure what you're looking to return, so this generic regular expression will:
find anchor tags
require the anchor tag to have an attribute onclick="NavigateChat();"
require the anchor tag to have an attribute style="cursor:pointer;"
allow the attributes to be matched in any order
require the anchor tag's inner text to be only an image tag
capture the anchor tag's inner text (the image tag) in its entirety
avoid many of the edge cases that make pattern matching in HTML difficult
<a(?=\s|>)(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sonclick="NavigateChat\(\);")(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sstyle="cursor:pointer;")(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>\s*(<img\s.*?)\s*<\/a>
Example
Sample Text
<a onmouseover=' a=1; onclick="NavigateChat();" style="cursor:pointer;" href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href='http://InterestedURL.com' id='revSAR'><img src="YouShouldn'tFindMe.nope"></a>
<a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>
Matches
Group 0 gets the entire matched anchor tag
Group 1 gets the inner text
[0][0] = <a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>
[0][1] = <img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/>
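To see the lookaheads at work, the sketch below runs the answer's pattern in Python's re module against both sample lines. It assumes Python's engine handles this pattern like the one behind the original demo; the pattern is split into pieces only for readability, and the names SKIP and ATTRS are made up here.

```python
import re

# The answer's pattern, split into pieces (hypothetical names SKIP and ATTRS);
# concatenated they give the original string character for character.
SKIP = r"""(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?"""
ATTRS = r"""(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*"""

ANCHOR = re.compile(
    r"<a(?=\s|>)"
    + r"(?=" + SKIP + r'\sonclick="NavigateChat\(\);")'  # onclick anywhere in the tag
    + r"(?=" + SKIP + r'\sstyle="cursor:pointer;")'      # style anywhere in the tag
    + ATTRS                                               # then consume the attributes
    + r">\s*(<img\s.*?)\s*<\/a>"                         # group 1: the inner <img ...>
)

CLEAN = ('<a onclick="NavigateChat();" style="cursor:pointer;">'
         '<img src="images/online-chat.jpg" width="350" height="150" border="0" '
         'alt="Title Loans Novato - Online Chat"/></a>')
TRAP = """<a onmouseover=' a=1; onclick="NavigateChat();" style="cursor:pointer;" href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href='http://InterestedURL.com' id='revSAR'><img src="YouShouldn'tFindMe.nope"></a>"""

m = ANCHOR.search(CLEAN)
print(m.group(1))
print(ANCHOR.search(TRAP))  # None: that onclick/style text is inside the onmouseover value
```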
Do you need to extract/capture certain pieces of info or just find the whole string?
My usual method for generalizing regexp is to start with the literal text and just replace elements with general placeholders...
<a onclick="NavigateChat\(\);" style="cursor:pointer;"><img src="[^"]+" width="\d+" height="\d+" border="\d+" alt="[^"]+"/></a>
This expression uses the character class [^"], which means "not a quote mark". If you just use .* as a wildcard, your regex can match far too much when there is more than one such tag in your document: .* is greedy and would select ALL the text from the first tag through to the end of the last link.
Without a data sample, I can't test this for sure, but it should be close.
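The greediness point can be checked with a small Python sketch: it builds two tags in the question's format on one line (the helper tag() and the file names are made up), then compares the answer's pattern with a copy in which each [^"]+ is replaced by .*.

```python
import re

# The answer's pattern: the literal tag with [^"]+ and \d+ as placeholders.
STRICT = (r'<a onclick="NavigateChat\(\);" style="cursor:pointer;">'
          r'<img src="[^"]+" width="\d+" height="\d+" border="\d+" alt="[^"]+"/></a>')
# For comparison only: the same pattern with each [^"]+ swapped for .*
LOOSE = STRICT.replace('[^"]+', '.*')


def tag(src, alt):
    """Hypothetical helper: one anchor in the question's format."""
    return ('<a onclick="NavigateChat();" style="cursor:pointer;">'
            f'<img src="{src}" width="350" height="150" border="0" alt="{alt}"/></a>')


line = tag("images/a.jpg", "First") + " " + tag("images/b.jpg", "Second")

print(len(re.findall(STRICT, line)))  # 2: one match per tag
print(len(re.findall(LOOSE, line)))   # 1: a single match from the first tag to the end of the second
```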

regexp, help with assertions

I have the following string:
<a name="subhd_182"></a>
<a name="st_394"></a>
<a name="st_395"></a>
<a name="qn_494"></a>
<a name="st_495"></a>
<a name="qn_594"></a>
<a name="st_595"></a>
<a name="subhd_282"></a>
<a name="qn_694"></a>
<a name="st_695"></a>
<a name="qn_794"></a>
<a name="st_795"></a>
<a name="qn_894"></a>
<a name="st_895"></a>
And I want to replace every <a name="st_\d*"></a> with <a name="qn_\d*"></a> if it immediately follows <a name="subhd_\d*"></a>
I use this regex %(.*<a name="subhd_.*)(?=<a name="st(?!<a name="qn))(<a name=")st(.*)%sU and replace with $1$2qn$3, but it also replaces the second case.
I'm assuming you only want to match the name after the first subhd row above but not the second, since the row after the first one is an "st_" and the row after the second one is a "qn_".
Try:
(<a name="subhd_\d+">\s*<\/a>\s*<a name=")st(_\d+">)
and replace with $1qn$2. Note that I have assumed you meant "it follows immediately" literally.
I don't really understand why you're throwing the lookahead in, unless the actual rule you're trying to implement is more complicated than you've stated.
Try: %(<a name="subhd_\d+"></a>\n<a name=")st(.*)%sU and replace with $1qn$2. On a side note, I'm not sure what the U modifier does for you here. Also, you may need to adjust the \n newline matcher to your operating system's line endings.
I have found RegExr a really useful tool for regular expressions.
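As a quick check, the first answer's pattern can be run with Python's re.sub, where $1 and $2 are written \1 and \2. The input is the list of anchors from the question, one per line.

```python
import re

names = ["subhd_182", "st_394", "st_395", "qn_494", "st_495", "qn_594",
         "st_595", "subhd_282", "qn_694", "st_695", "qn_794", "st_795",
         "qn_894", "st_895"]
text = "\n".join(f'<a name="{n}"></a>' for n in names)

# A subhd anchor, optional whitespace (here the newline), then an anchor whose
# name starts with "st"; only the "st" between the two groups is replaced.
pattern = r'(<a name="subhd_\d+">\s*<\/a>\s*<a name=")st(_\d+">)'
result = re.sub(pattern, r"\1qn\2", text)
print(result)  # only st_394 becomes qn_394; subhd_282 is already followed by qn_694
```

Since \s* also matches \r and \n, this version is not tied to a particular line-ending convention.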

Regex for extracting links with specified attributes

I'm trying to build a regex to extract links from text that do not have rel="nofollow".
Example:
aiusdiua asudauih <a rel="nofollow" href="http://uashiuadha.asudh/adas">adsaag</a> uhwaida <br> asdgydug <a href="http://asdha.sda/uduih/dufhuis">aguuia</a>
Thanks!
The following regex will do the job:
<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"
The URLs you want will be in capture group 1. For example, in Ruby:
if input =~ /<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"/
  match = $~[1]
end
Because the negative lookahead accepts [^>]*? before rel, href or anything else can come before rel and the link is still excluded. If href comes after rel, the link is of course excluded as well.
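Python's re.findall returns every capture of group 1, which makes it easy to try this pattern on a cleaned-up copy of the question's sample (the hre typo fixed and the missing closing quotes added). Unlike the Ruby snippet, which only reads the first match, this collects all links without rel="nofollow". The second input, with rel after href, is made up.

```python
import re

# Reject any <a ...> whose attributes (up to the first ">") contain
# rel="nofollow", wherever rel appears; otherwise capture the href value.
LINK = re.compile(r'<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"')

text = ('aiusdiua asudauih <a rel="nofollow" href="http://uashiuadha.asudh/adas">adsaag</a> '
        'uhwaida <br> asdgydug <a href="http://asdha.sda/uduih/dufhuis">aguuia</a>')
kept = LINK.findall(text)
print(kept)  # ['http://asdha.sda/uduih/dufhuis']

# rel after href is excluded as well.
after = LINK.findall('<a href="http://example.com/x" rel="nofollow">x</a>')
print(after)  # []
```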
Try this (with case-insensitive matching, since the tag names are written in upper case):
<(?:A|AREA)\b[^<>]*?(?!rel="nofollow")[^<>]*?href=['"]([^>"]*)[^>]*?>
If you are using .NET regex, you can name the group instead:
<(?:A|AREA)\b[^<>]*?(?!rel="nofollow")[^<>]*?href=['"](?<URL>[^>"]*)[^>]*?>
The URL is then in the group named URL (also group 1).
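When this second pattern is tried in Python (a sketch; re.IGNORECASE is an assumption needed for the upper-case A/AREA to match <a, and Python writes the named group as (?P<URL>...)), it appears to return the nofollow link as well. The lookahead (?!rel="nofollow") is checked only at the position right after the lazy [^<>]*?; in this input that position is the space before rel, so the lookahead succeeds, and even if it failed the lazy prefix could simply extend past rel. The first answer avoids this by putting [^>]*? inside the lookahead so it scans the whole tag.

```python
import re

# The second answer's pattern with the .NET named group rewritten for Python.
# IGNORECASE is assumed so that "A" and "AREA" match lower-case tags.
ALT = re.compile(r"""<(?:A|AREA)\b[^<>]*?(?!rel="nofollow")[^<>]*?href=['"](?P<URL>[^>"]*)[^>]*?>""",
                 re.IGNORECASE)

text = ('<a rel="nofollow" href="http://uashiuadha.asudh/adas">adsaag</a> '
        '<a href="http://asdha.sda/uduih/dufhuis">aguuia</a>')
urls = [m.group("URL") for m in ALT.finditer(text)]
print(urls)  # both URLs, including the nofollow one
```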