Extract Regex Match - regex

I want to scrape the src of images in an RSS feed with Yahoo Pipes.
This regex selects exactly what I want:
src="([^"]*)"
Image: <img src="image.jpg" width="300" height="422" alt="urkunde300" style="margin-right: 10px;" />
Pipes: In scrapedimage replace src="([^"]*)" with $1
Output: <img//// width="300" height="422" alt="image300" style="margin-right:10px;"/>
How do I invert the match so I can replace everything except image.jpg with nothing?

Match the whole tag with <img src="([^"]+)".* and replace it with the first capture group ($1). Because the pattern consumes the entire line, only the src value survives the replacement.
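As a rough illustration outside Yahoo Pipes, the same idea can be sketched in Python. The function name extract_src is made up for this example, and Python writes the backreference as \1 where Pipes uses $1.

```python
import re

# Matches the whole <img> tag from src onward; group 1 is the src value.
WHOLE_TAG = re.compile(r'<img src="([^"]+)".*')

def extract_src(tag):
    # Replace the entire match with group 1, leaving only the image path.
    return WHOLE_TAG.sub(r'\1', tag)

tag = ('<img src="image.jpg" width="300" height="422" '
       'alt="urkunde300" style="margin-right: 10px;" />')
print(extract_src(tag))  # image.jpg
```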

Related

dreamweaver regex find and replace to add webp images

I want to find all the JPGs in a big static site and add WebP versions,
so I need to find
<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />
and replace with
<picture>
<source srcset="image/path/my be/long/new_2.webp" type="image/webp">
<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />
</picture>
I've tried all kinds of things, but I'm really unfamiliar with regex, so the closest I've got is
<img src="([^"]+)" alt="(.*?)" />
and replace with
<picture>
<source srcset="$1.webp" type="image/webp">
<img src="$1" alt="$2" />
</picture>
but that produces the file extension .jpg.webp.
Regex is such a huge topic; any help from anyone with some experience would be very welcome.
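To match the source path without the .jpg extension, you can use this regex:
<img src="([a-z0-9\/\s_]+)\.jpg" alt="(.*?)" \/>
The character class in the first group matches:
a-z: lowercase characters
0-9: numbers
\/: slash
\s: whitespace
_: underscore
and the group stops at .jpg" (the dot is escaped so that it matches only a literal dot).
Because the extension is now outside the group, use $1.webp in the srcset and $1.jpg in the img src of the replacement.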
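Here is a minimal sketch of the same find-and-replace in Python rather than Dreamweaver, assuming the paths contain only the characters listed above. The names IMG, PICTURE and add_webp are invented for the example, and \g<1> is Python's spelling of $1.

```python
import re

# Group 1: path without the extension; group 2: alt text.
IMG = re.compile(r'<img src="([a-z0-9/\s_]+)\.jpg" alt="(.*?)" />')

# Replacement template: the extension is added back explicitly in each place.
PICTURE = '\n'.join([
    '<picture>',
    r'<source srcset="\g<1>.webp" type="image/webp">',
    r'<img src="\g<1>.jpg" alt="\g<2>" />',
    '</picture>',
])

def add_webp(html):
    # Wrap every matching <img> in a <picture> with a WebP source.
    return IMG.sub(PICTURE, html)

print(add_webp('<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />'))
```

Paths with uppercase letters, hyphens or dots in directory names would not match this character class; widening it to [^"]+? is one option if that matters for your site.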

regex to get linkable text

I've been trying for hours now.
I need to get the link text, meaning all the text in a webpage's source that lies between <a href> and </a>, excluding any other tags nested inside the <a> element.
Example:
<a href="blabla.net">THIS TEXT
<img src="hhh.jpg" /> THIS TEXT TOO
<span> ALSO THIS TEXT. </span>AND ALSO THIS TEXT</a>
You could use a simple regular expression with a non-greedy group:
<[aA]\b[^\>]*>([\w\W]*?)<\/[aA]>
You can test it on this page by hitting F12 then typing
$(document.body).html().match(/<a\b[^\>]*>([\w\W]*?)<\/a>/ig)
You can try the following regular expression, which returns the text between tags as four separate matches for the example above:
(?<=>)[^<]+?(?=<)
It matches only the text and leaves the tags out.
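The two suggestions above can be combined. The following Python sketch (the helper name link_texts is an assumption, not something from the answers) first finds each anchor with the non-greedy pattern, then collects the text between tags inside it with the lookaround pattern.

```python
import re

# Whole <a>...</a> element, case-insensitive, non-greedy body.
ANCHOR = re.compile(r'<a\b[^>]*>[\w\W]*?</a>', re.IGNORECASE)
# Runs of text that sit between a '>' and the next '<'.
TEXT_BETWEEN_TAGS = re.compile(r'(?<=>)[^<]+?(?=<)')

def link_texts(source):
    texts = []
    for anchor in ANCHOR.finditer(source):
        pieces = TEXT_BETWEEN_TAGS.findall(anchor.group(0))
        # Join the pieces and collapse runs of whitespace and newlines.
        texts.append(' '.join(' '.join(pieces).split()))
    return texts

html = ('<a href="blabla.net">THIS TEXT\n'
        '<img src="hhh.jpg" /> THIS TEXT TOO\n'
        '<span> ALSO THIS TEXT. </span>AND ALSO THIS TEXT</a>')
print(link_texts(html))
```

This is fine for simple markup like the example; nested anchors or a literal < inside attribute values would need a real HTML parser.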

regex to replace HTML surrounding tag

I'm trying to replace an HTML tag with another one using Notepad++ search and replace.
I would like this:
<strong style="font-size: 1em;"><br />some text</strong>
to become this
<h3>some text</h3>
So far I have reached this:
<strong style="font-size: 1em;"\s(.*?)><br />(.*?)</strong>
and I am not sure what to put inside "Replace with". Is this OK?
<h3>$1</h3>
Thanks
Try this as the replacement pattern.
<h3>\2</h3>
You can reference capture groups (the parts between parentheses) in the regex with \n, where n is the number of the group. Here the text you want is in the second group, so $1 would insert the wrong one.
For matching, the regex should be this:
<strong style="font-size: 1em;"\s?(.*?)><br />(.*?)</strong>
The \s has to be optional, because your HTML has no whitespace between the closing quote of the style attribute and the >.
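To check the pattern and replacement outside Notepad++, here is a small Python sketch. The function name strong_to_h3 is made up for the example; Python accepts the same \2 backreference.

```python
import re

# Group 1: any extra attributes after style; group 2: the text to keep.
STRONG = re.compile(r'<strong style="font-size: 1em;"\s?(.*?)><br />(.*?)</strong>')

def strong_to_h3(html):
    # Keep only group 2 and wrap it in <h3>.
    return STRONG.sub(r'<h3>\2</h3>', html)

print(strong_to_h3('<strong style="font-size: 1em;"><br />some text</strong>'))
# <h3>some text</h3>
```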

Regular Expression in dreamweaver

Can anyone help me turn this into a regular expression?
<a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>
The alt tag will change, and so might the image, but
<a onclick="NavigateChat();" style="cursor:pointer;">
will always start the string, and
</a>
will always end it. How can I use a regex to find this?
Description
I'm not quite sure what you're looking to return, so this generic regular expression will:
find anchor tags
require the anchor tag to have an attribute onclick="NavigateChat();"
require the anchor tag to have an attribute style="cursor:pointer;"
allow the attributes to be matched in any order
require the anchor tag's inner text to be only an image tag
capture the anchor tag's inner text in its entirety
avoid many of the edge cases which make pattern matching in HTML difficult
<a(?=\s|>)(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sonclick="NavigateChat\(\);")(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sstyle="cursor:pointer;")(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>\s*(<img\s.*?)\s*<\/a>
Example
Sample Text
<a onmouseover=' a=1; onclick="NavigateChat();" style="cursor:pointer;" href="www.NotYourURL.com" ; if (3 <a && href="www.NotYourURL.com" && id="revSAR" && 6 > 3) { funRotate(href) ; } ; ' href='http://InterestedURL.com' id='revSAR'><img src="YouShouldn'tFindMe.nope"></a>
<a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>
Matches
Group 0 gets the entire matched anchor tag
Group 1 gets the inner text
[0][0] = <a onclick="NavigateChat();" style="cursor:pointer;"><img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/></a>
[0][1] = <img src="images/online-chat.jpg" width="350" height="150" border="0" alt="Title Loans Novato - Online Chat"/>
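The "attributes in any order" part rests on one lookahead per required attribute. The Python sketch below is a deliberately simplified version of that idea, not the expression above: it assumes no > appears inside attribute values, so it skips the quoted-value handling that guards against the edge cases. CHAT_ANCHOR and inner_image are names invented for the example.

```python
import re

CHAT_ANCHOR = re.compile(
    r'<a\b'
    r'(?=[^>]*\sonclick="NavigateChat\(\);")'   # onclick must appear somewhere in the tag
    r'(?=[^>]*\sstyle="cursor:pointer;")'       # style must appear somewhere in the tag
    r'[^>]*>\s*(<img\s.*?)\s*</a>'              # group 1: the inner image tag
)

def inner_image(html):
    m = CHAT_ANCHOR.search(html)
    return m.group(1) if m else None

# Attribute order does not matter, because each lookahead scans from the tag name.
print(inner_image('<a style="cursor:pointer;" onclick="NavigateChat();"><img src="x.jpg"></a>'))
```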
Do you need to extract/capture certain pieces of info or just find the whole string?
My usual method for generalizing regexp is to start with the literal text and just replace elements with general placeholders...
<a onclick="NavigateChat\(\);" style="cursor:pointer;"><img src="[^"]+" width="\d+" height="\d+" border="\d+" alt="[^"]+"/></a>
This expression uses the character set [^"], which stands for "not a quote mark". If you just use .* as a wildcard, your regexp will fail if there is more than one tag present in your document. Quantifiers are greedy by default, so .* would try to select all the text from the first tag through to the end of the last link.
Without a data sample, I can't test this for sure, but it should be close.
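A quick way to try the literal-plus-placeholders expression is a Python sketch like the one below. The named groups src and alt and the helper find_chat_links are additions for illustration only; the rest is the expression as given.

```python
import re

CHAT_LINK = re.compile(
    r'<a onclick="NavigateChat\(\);" style="cursor:pointer;">'
    r'<img src="(?P<src>[^"]+)" width="\d+" height="\d+" border="\d+" '
    r'alt="(?P<alt>[^"]+)"/></a>'
)

def find_chat_links(html):
    # One (src, alt) pair per link; [^"]+ keeps each match inside its own tag.
    return [(m.group('src'), m.group('alt')) for m in CHAT_LINK.finditer(html)]

link = ('<a onclick="NavigateChat();" style="cursor:pointer;">'
        '<img src="images/online-chat.jpg" width="350" height="150" '
        'border="0" alt="Title Loans Novato - Online Chat"/></a>')
print(find_chat_links(link + ' some text ' + link))
```

With two links in the input this returns two separate pairs, which is the behaviour the negated character set is there to guarantee.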

Regex for extracting links with specified attributes

I'm trying to build a regex to extract links that do not have rel="nofollow" from text.
Example:
aiusdiua asudauih <a rel="nofollow" href="http://uashiuadha.asudh/adas">adsaag</a> uhwaida <br> asdgydug <a href="http://asdha.sda/uduih/dufhuis">aguuia</a>
Thanks!
The following regex will do the job:
<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"
The wanted urls will be in the capture group #1. E.g. in Ruby it would be:
if input =~ /<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"/
  match = $~[1]  # URL captured by group 1 of the first match
end
Since it accepts [^>]*? before rel in the negative lookahead, href or anything else can come before rel. If href comes after rel, it'll of course also be ok.
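For comparison, a minimal Python sketch of the same expression that collects every matching URL instead of only the first. The name followed_urls is hypothetical, and the sample text assumes the closing quotes of the href values are present.

```python
import re

# The negative lookahead scans the whole tag for rel="nofollow" before href is read.
FOLLOWED_LINK = re.compile(r'<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"')

def followed_urls(text):
    # findall returns the contents of group 1 for each match.
    return FOLLOWED_LINK.findall(text)

text = ('aiusdiua asudauih <a rel="nofollow" href="http://uashiuadha.asudh/adas">adsaag</a> '
        'uhwaida <br> asdgydug <a href="http://asdha.sda/uduih/dufhuis">aguuia</a>')
print(followed_urls(text))  # ['http://asdha.sda/uduih/dufhuis']
```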
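Try this, with case-insensitive matching enabled. The negative lookahead has to sit directly after the tag name and scan the whole tag; placed after a lazy [^<>]*? it can always be satisfied at some other position and excludes nothing.
<(?:A|AREA)\b(?![^<>]*?rel="nofollow")[^<>]*?href=['"]([^>"]*)[^>]*?>
If you are using .NET regex, then:
<(?:A|AREA)\b(?![^<>]*?rel="nofollow")[^<>]*?href=['"](?<URL>[^>"]*)[^>]*?>
The URL is in the group named URL, or in group 1.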