Match that doesn't end with a slash - regex

I'd like to match URLs that don't end in /, to use it in Dreamweaver's find tool.
What regex could I use?
For example, I'd like the following URL to be matched:
<a href="http://www.sometext"

You can do it with this simple regex:
href=".+?[^/]"
Explanation:
It will match href="________X", where X != /.
The following will match:
<a href="http://some-url.com">
<a href="http://www.another-url-here.com/content">
These ones won't:
<a href="http://www.url.com/">
<a href="http://www.url-2.com/posts/2014/">
Edit:
The following also allows whitespace between the = and the opening quote, as in <a href= "http://www.url.com">:
href=\s*".+?[^/]"

Sure. You can use [^/]" at the end of your link expression to match any non-slash followed by a close-quote.

Maybe try this:
href\s*=\s*"[^"]*[^"/\s]\s*"
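A quick way to compare the patterns from the answers above is to run them over a few tags. The sketch below uses Python's re module rather than Dreamweaver, and the last sample tag (with a class attribute) is a made-up case that is not in the question. It suggests that the lazy .+? can run past the closing quote into the next attribute, while the [^"] version stays inside the href value.

```python
import re

# Pattern from the first answer and the stricter pattern from the last answer.
SIMPLE = re.compile(r'href=".+?[^/]"')
STRICT = re.compile(r'href\s*=\s*"[^"]*[^"/\s]\s*"')

samples = [
    '<a href="http://some-url.com">',
    '<a href="http://www.another-url-here.com/content">',
    '<a href="http://www.url.com/">',
    '<a href="http://www.url-2.com/posts/2014/">',
    # Hypothetical extra case: trailing slash followed by another attribute.
    '<a href="http://www.url.com/" class="nav">',
]


def matches(pattern, tag):
    """Return True if the pattern is found anywhere in the tag."""
    return pattern.search(tag) is not None


simple_results = [matches(SIMPLE, tag) for tag in samples]
strict_results = [matches(STRICT, tag) for tag in samples]

for tag, simple, strict in zip(samples, simple_results, strict_results):
    print(f"{tag!r}: simple={simple}, strict={strict}")
```

If tags with several attributes are possible in your files, the stricter pattern is probably the safer choice.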

Related

Capture specific first matches in regex

I have this text and want to capture each match of the letter 'ñ' under the html href attribute. I want it to match the 'ñ' in both niño.html and niña.html, but not the ones in Niño and Niña:
<a href='niño.html'>Niño</a> <a href='niña.html'>Niña</a>
I tried this but it also matches Niño:
ñ(.*?\.html'>)+?
When replacing with n\1, it gives:
<a href='nino.html'>Nino</a> <a href='niña.html'>Niña</a>
What I would want the text to look like is:
<a href='nino.html'>Niño</a> <a href='nina.html'>Niña</a>
How can I do this?
Try this. Here the part between the ñ and .html'> is not allowed to contain a single quote, so the match cannot leave the href value:
ñ([^']*?\.html'>)+?
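As a rough check of that pattern, here is a small Python sketch using the text from the question. The question does not say which tool is used, so Python's re.sub is an assumption, and the optional +? after the group is left out because a single repetition is enough here.

```python
import re

text = "<a href='niño.html'>Niño</a> <a href='niña.html'>Niña</a>"

# [^']*? cannot cross a single quote, so a match has to stay inside one href value.
pattern = re.compile(r"ñ([^']*?\.html'>)")

# Replace the ñ with n and put back everything that was captured after it.
result = pattern.sub(r"n\1", text)
print(result)
```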

Can't seem to capture newline+spaces in Regex

I know regexes aren't the best for web parsing, but I'm using it as an exercise.
I'm using Район:[^<>]*\n\s*<[^<>]*>\n\s*<a[^<>]*>([^<>]+)<\/a>
to try to match:
Район: </span>
<span class="company__contacts-item-text">
<a class="link" href="/moscow/top/marina-roscha/">Марьина роща</a>
I've been looking at it for a while but I don't know what I've been doing wrong. How can I capture something that would have newlines and different urls in the tags?
Try this regex, with the dot-matches-newline (s) flag enabled so that . can cross the line breaks:
Район:.+?<a[^>]+>(.+?)</a>
DEMO
https://regex101.com/r/wA4oH0/1
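For illustration, this is how the pattern behaves in Python on the snippet from the question. It is only a sketch: the question does not name a language, and re.DOTALL is Python's name for the flag that lets . match newlines.

```python
import re

html = (
    'Район: </span>\n'
    '<span class="company__contacts-item-text">\n'
    '<a class="link" href="/moscow/top/marina-roscha/">Марьина роща</a>'
)

pattern = r"Район:.+?<a[^>]+>(.+?)</a>"

# Without DOTALL the dot stops at the first newline, so nothing is found.
without_flag = re.search(pattern, html)

# With DOTALL the lazy .+? can skip over the tags and line breaks in between.
with_flag = re.search(pattern, html, re.DOTALL)
district = with_flag.group(1)
print(district)
```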

Regex for matching url prefix

I want to delete the Google prefix in all URLs.
<a href="http://news.google.com/news/url?sa=t&fd=R&ct2=en&usg=YFo&url=http://www.goo.tv/gd/2015/0509/735557.html
dfgdfgdfgdfgdf9
<a href="http://news.google.com/news/url?sa=t&fd=R&ct2=en&usg=AFQjCNFUS_UVkd9L-r7g&clid=c3878e0698331&cid=5213281008&ei=5DFNVJ4eymQLmyYFo&url=http://www.goo.tv/gd/2015/0509/735557.html
I want to remove http://news.google.com/news/url?sa=t&fd=R&ct2=en&blalba....url=
this Google prefix, so that it only retains the real URL.
I tried the regex below, but instead of matching each prefix separately it matches everything from the first prefix to the last one:
<a href="(http:\/\/news.google.com/news/url\?([\s\S]*)&url=)
Use Lazy Quantifiers:
<a href="(http:\/\/news.google.com\/news\/url\?([\s\S]*?)&url=)
Your regex did not work because the quantifier was greedy (*), so the match ran on until the last &url= in the text. A lazy quantifier (*?) stops at the first &url= it finds, which is the behavior you want here.
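The difference can be seen in a short Python sketch built on the two links from the question. Python is an assumption (the question does not name a tool), the dots in the host name are escaped here, and the group is moved so that only <a href=" is kept in the replacement.

```python
import re

html = (
    '<a href="http://news.google.com/news/url?sa=t&fd=R&ct2=en&usg=YFo'
    '&url=http://www.goo.tv/gd/2015/0509/735557.html\n'
    'dfgdfgdfgdfgdf9\n'
    '<a href="http://news.google.com/news/url?sa=t&fd=R&ct2=en'
    '&usg=AFQjCNFUS_UVkd9L-r7g&clid=c3878e0698331&cid=5213281008'
    '&ei=5DFNVJ4eymQLmyYFo&url=http://www.goo.tv/gd/2015/0509/735557.html'
)

# Greedy: [\s\S]* runs to the last "&url=" in the whole text.
GREEDY = re.compile(r'(<a href=")http://news\.google\.com/news/url\?[\s\S]*&url=')
# Lazy: [\s\S]*? stops at the first "&url=" after each prefix.
LAZY = re.compile(r'(<a href=")http://news\.google\.com/news/url\?[\s\S]*?&url=')

greedy_count = len(GREEDY.findall(html))
lazy_count = len(LAZY.findall(html))

# Keep only the captured '<a href="' and drop the Google prefix.
cleaned = LAZY.sub(r"\1", html)
print(greedy_count, lazy_count)
print(cleaned)
```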

Regular expressions: Find and replace url with c:url tag

I'm having trouble building a regular expression for find and replace. I need to replace all URLs in many .jsf files: every URL starting with XXX should become <c:url value="URL_WITHOUT_XXX"/>. Examples below.
I'm stuck with the find expression "XXX(.*)" and the replace expression "<c:url value="\1"/>". My find expression matches too much, for example "XXX/a" style="", but I need it to match only up to the first " (the end of the href). Can anybody help?
I have:
<a href="XXX/a" style="">
<a href="XXX/b" >
<a href="XXX/c" ...>
I want:
<a href="<c:url value="/a"/>" style="">
<a href="<c:url value="/b"/>" >
<a href="<c:url value="/c"/>" ...>
PS: Sorry for my poor english ;)
Edit:
I use Find/Replace in Eclipse (regular expressions on)
You should specify the language you're working with.
The following regex will match what you want:
<a href="XXX[^\"]*"
If you want to have some particular value, you can group the regex according to your needs. For example:
<a href="(XXX[^\"]*)"
will give you in the first group:
XXX/a
XXX/b
XXX/c
If you want to have only /a, /b, and /c, you can group it like that:
<a href="XXX([^\"]*)"
Edit:
I will explain what <a href="XXX[^\"]*" does:
It will match: <a href="XXX
Then it matches anything except a ", zero or more times: [^\"]*
Finally it matches the closing ", which is not strictly necessary
When you write [^abc], you're telling it to match any character except a, b, or c.
So [^\"] means: match anything except a ".
And the quantifier * means zero or more times, so a* will match either an empty string, or a, aa, aaa, ...
And the last thing: groups.
When you want to keep a value apart from the entire match, so that you can reuse it, put it in a group: (something).
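Putting the pieces together, the replacement could look like the sketch below. It is written in Python purely for illustration; the question uses Eclipse's Find/Replace, where the group reference syntax in the replacement field may differ, so treat the \1 here as Python-specific.

```python
import re

lines = [
    '<a href="XXX/a" style="">',
    '<a href="XXX/b" >',
    '<a href="XXX/c" ...>',
]

# Group 1 captures everything after XXX up to (not including) the closing quote.
pattern = re.compile(r'<a href="XXX([^"]*)"')
replacement = r'<a href="<c:url value="\1"/>"'

converted = [pattern.sub(replacement, line) for line in lines]
for line in converted:
    print(line)
```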

Regex for extracting links with specified attributes

I'm trying to build a regex that extracts links from text, but only those that do not have rel="nofollow".
Example:
aiusdiua asudauih <a rel="nofollow" href="http://uashiuadha.asudh/adas">adsaag</a> uhwaida <br> asdgydug <a href="http://asdha.sda/uduih/dufhuis">aguuia</a>
Thanks!
The following regex will do the job:
<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"
The wanted URLs will be in capture group 1. For example, in Ruby:
if input =~ /<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"/
  match = $~[1]  # the URL captured by group 1
end
Since it accepts [^>]*? before rel in the negative lookahead, href or anything else can come before rel. If href comes after rel, it'll of course also be ok.
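To check that claim, here is a small Python sketch of the same pattern. The sample text is adapted from the question with properly quoted attributes, and the last link (href before rel) is a made-up case added to exercise both attribute orders.

```python
import re

text = (
    'aiusdiua <a rel="nofollow" href="http://uashiuadha.asudh/adas">adsaag</a> '
    'uhwaida <br> <a href="http://asdha.sda/uduih/dufhuis">aguuia</a> '
    # Hypothetical extra link: href comes before rel="nofollow".
    '<a href="http://example.invalid/x" rel="nofollow">x</a>'
)

# The negative lookahead scans the rest of the tag (up to ">") for rel="nofollow"
# and rejects the tag if it is found anywhere, before or after href.
pattern = re.compile(r'<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"')

urls = pattern.findall(text)
print(urls)
```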
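Try this, with case-insensitive matching enabled. The negative lookahead has to scan the rest of the tag, so the [^<>]*? goes inside it:
<(?:A|AREA)\b(?![^<>]*?rel="nofollow")[^<>]*?href=['"]([^>"]*)[^>]*?>
If you are using .NET regex, then:
<(?:A|AREA)\b(?![^<>]*?rel="nofollow")[^<>]*?href=['"](?<URL>[^>"]*)[^>]*?>
The URL is in the group named URL (or in group 1 for the first pattern).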