Dreamweaver regex find and replace to add WebP images

I want to find all the JPGs in a big static site and add WebP versions, so I need to find
<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />
and replace with
<picture>
<source srcset="image/path/my be/long/new_2.webp" type="image/webp">
<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />
</picture>
I've tried all kinds of things, but I'm really unfamiliar with regex, so the closest I've got is
<img src="([^"]+)" alt="(.*?)" />
and replace with
<picture>
<source srcset="$1.webp" type="image/webp">
<img src="$1" alt="$2" />
</picture>
but that gives me the file extension .jpg.webp.
Regex is such a huge topic; any help on this from anyone with some experience will be very welcome.

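To make the first group stop before the .jpg extension, you can use this regex:
<img src="([a-z0-9\/\s_]+)\.jpg" alt="(.*?)" \/>
The first group matches one or more of:
a-z: lowercase letters
0-9: digits
\/: slash
\s: whitespace
_: underscore
and then stops at the literal .jpg" (the dot is escaped so it only matches a real dot). Because .jpg is no longer inside $1, add it back in the img src of the replacement:
<picture>
<source srcset="$1.webp" type="image/webp">
<img src="$1.jpg" alt="$2" />
</picture>
If your paths contain uppercase letters, hyphens or dots, add them to the character class, e.g. [A-Za-z0-9\/\s_.-].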
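As a rough check outside Dreamweaver, here is a sketch of the same find and replace using Python's re module (an assumption: Python is not part of the original question). It uses the answer's character class with the dot escaped, and \1/\2 in place of Dreamweaver's $1/$2; add_webp is a hypothetical helper name.

```python
import re

# The answer's pattern, with the dot before "jpg" escaped so it only matches a
# literal "."; Python does not require "/" to be escaped, so it is written bare.
PATTERN = re.compile(r'<img src="([a-z0-9/\s_]+)\.jpg" alt="(.*?)" />')

# Python's re.sub writes group references as \1, \2 where Dreamweaver uses $1, $2.
# The raw triple-quoted string keeps the backslashes and the line breaks as-is.
REPLACEMENT = r'''<picture>
<source srcset="\1.webp" type="image/webp">
<img src="\1.jpg" alt="\2" />
</picture>'''


def add_webp(html: str) -> str:
    """Wrap each matching <img ... .jpg> tag in a <picture> with a WebP source."""
    return PATTERN.sub(REPLACEMENT, html)


tag = '<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />'
print(add_webp(tag))
```

Only tags whose src consists of characters in the class are rewritten; a path such as Photos/A.jpg (uppercase letters) is left unchanged unless you widen the class.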

Related

Notepad++ RegEx replace with pattern

I want to find the following pattern:
Image not found: /Images/IMG-20160519-WA0015.jpg
And replace with some markup, including the image name from the above text like:
<a href="IMG-20160519-WA0015.jpg"><img src="IMG-20160519-WA0015.jpg" width="180" height="240" alt="IMG-20160519-WA0015.jpg" class="image" />
Is it possible with some kind of regex or plugin, or am I simply burning neurons?
Thanks.
Try finding
^Image not found: \/Images\/(IMG-.*\.jpg)
and replacing it with
<a href="\1"><img src="\1" width="180" height="240" alt="\1" class="image" />
Note that the caret (^) requires the match to start at the beginning of a line; I'm not sure that's the case for you, but I suspect it is. I also assumed that the "IMG-" prefix is constant; if it isn't, just remove those four characters from the regex.
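A sketch of the same replacement in Python's re module, assuming each "Image not found" message sits on its own line (to_markup is a hypothetical helper name; Python is used here only for illustration):

```python
import re

# The answer's search pattern. Python needs no "\/" escapes, and re.MULTILINE
# makes ^ match at the start of every line rather than only the start of the text.
PATTERN = re.compile(r"^Image not found: /Images/(IMG-.*\.jpg)", re.MULTILINE)

# \1 is the captured file name, e.g. IMG-20160519-WA0015.jpg.
REPLACEMENT = r'<a href="\1"><img src="\1" width="180" height="240" alt="\1" class="image" />'


def to_markup(text: str) -> str:
    """Replace every 'Image not found' line with the image markup."""
    return PATTERN.sub(REPLACEMENT, text)


print(to_markup("Image not found: /Images/IMG-20160519-WA0015.jpg"))
```

Because of the caret, a message that does not start its line is left alone.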
If you're not aware of it, RegExr is a nice interactive way to build and test regular expressions.

Extract Regex Match

I want to scrape the src of images in an RSS feed with Yahoo Pipes.
This Regex is selecting exactly what I want.
src="([^"]*)"
Image: <img src="image.jpg" width="300" height="422" alt="urkunde300" style="margin-right: 10px;" />
Pipes: In scrapedimage replace src="([^"]*)" with $1
Output: <img//// width="300" height="422" alt="image300" style="margin-right:10px;"/>
How do I invert the match so that everything except image.jpg is replaced with nothing?
You want to match the whole tag through to the end of the line, <img src="([^"]+)".*, and replace it with the captured group, $1.
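The question is about Yahoo Pipes, so this is only an illustration of the same idea (match the whole tag, keep the group) using Python's re.sub; extract_src is a hypothetical name:

```python
import re

# Match from <img through the end of the line and capture only the src value.
PATTERN = re.compile(r'<img src="([^"]+)".*')


def extract_src(html: str) -> str:
    """Replace each matched <img ...> line with just its src value (group 1)."""
    return PATTERN.sub(r"\1", html)


line = '<img src="image.jpg" width="300" height="422" alt="urkunde300" style="margin-right: 10px;" />'
print(extract_src(line))  # image.jpg
```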

Regex to match url with blank space

Basically, I want to use a regular expression to match this:
/xpto/uuuu/That name tho [1080p].mp4
from
<section class="video">
<video id="video" autoplay="">
<source src="/xpto/uuuu/That name tho [1080p].mp4" type="video/mp4">
</video>
</section>
What I want is to get the relative path ending in .mp4 from a big HTML page.
Can someone help me with this?
Thanks
SOLVED BY RAJ (written as a quoted string literal, with the inner quotes doubled):
"(?<=src="")[^""]+"
Use a lookbehind to match all the characters right after src=" up to the next " symbol (i.e., the value of the src attribute):
(?<=src=")[^"]+
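A minimal Python sketch of the lookbehind approach, using the HTML from the question; the .mp4 filter at the end is an extra step that the answer's regex itself does not perform:

```python
import re

html = """<section class="video">
<video id="video" autoplay="">
<source src="/xpto/uuuu/That name tho [1080p].mp4" type="video/mp4">
</video>
</section>"""

# Lookbehind: start right after src=" and take everything up to the next ".
# [^"] also matches spaces and brackets, so "That name tho [1080p]" is kept whole.
SRC_VALUE = re.compile(r'(?<=src=")[^"]+')

# Extra step (not part of the answer): keep only values ending in .mp4.
mp4_paths = [p for p in SRC_VALUE.findall(html) if p.endswith(".mp4")]
print(mp4_paths)
```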

Why does this regular expression work?

OK, I'm thoroughly confused about why this regular expression works. The text I'm working with is this:
<html>
<body>
hello
<img src="withalt" alt="hi"/>asdf
<img src="noalt" />fdsaasdf
<img src="withalt2" alt="blah" />
</body>
</html>
Using the following regular expression (tested in PHP, but I'm assuming it's true for all Perl-compatible regular expressions), I get back all img tags that do not contain an alt attribute:
/<img(?:(?!alt=).)*?>/
Returns:
<img src="noalt" />
So, based on that, I would think that simply removing the non-capturing group would return the same:
/<img(?!alt=).*?>/
Returns:
<img src="withalt" alt="hi"/>
<img src="noalt" />
<img src="withalt2" alt="blah" />
As you can see, it instead returns all the img tags. To make things even more confusing, removing the ? after the * (which, as far as I'm aware, is simply a wildcard) returns everything up to the final >:
/<img(?!alt=).*>/
Returns:
<img src="withalt" alt="hi"/>
<img src="noalt" />fdsaasdf
<img src="withalt2" alt="blah" />
So would anyone care to explain, or at least point me in the right direction on what's going on here?
/<img(?:(?!alt=).)*?>/
This regex applies the negative lookahead before every character it matches after <img. As soon as the upcoming text is alt=, the group cannot advance any further, and since no > has been reached yet, that tag fails to match. So it only matches img tags that do not have an alt attribute.
/<img(?!alt=).*?>/
This regex applies the negative lookahead only once, immediately after <img. Every tag here continues with a space rather than alt=, so the check always passes, and .*? then matches everything up to the first >, whether or not alt= appears later in the tag; the alt= part is simply consumed by .*?.
/<img(?!alt=).*>/
This is the same as the previous one, except that greedy matching takes it to the last > instead of the first. Without the s modifier, . does not match a newline, so that is the last > on the same line; that is why each tag comes back separately instead of one match running all the way to the > of </html>.
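To see the behaviours side by side, here is a sketch that runs the three patterns with Python's re module instead of PHP (an assumption; for lookahead, lazy and greedy quantifiers, and . not matching newlines by default, Python behaves like PCRE). The last call adds re.DOTALL, Python's counterpart of the s modifier.

```python
import re

TEXT = """<html>
<body>
hello
<img src="withalt" alt="hi"/>asdf
<img src="noalt" />fdsaasdf
<img src="withalt2" alt="blah" />
</body>
</html>"""

# Lookahead checked before every character: a tag whose alt= comes before its ">" fails.
per_char = re.findall(r'<img(?:(?!alt=).)*?>', TEXT)
# Lookahead checked once, right after "<img": the next character is a space, so it passes.
once = re.findall(r'<img(?!alt=).*?>', TEXT)
# Greedy .* runs to the last ">" on the same line, because "." does not match "\n".
greedy = re.findall(r'<img(?!alt=).*>', TEXT)
# With re.DOTALL, "." also matches "\n", so a single match runs on to the ">" of </html>.
greedy_dotall = re.findall(r'<img(?!alt=).*>', TEXT, re.DOTALL)

print(per_char)
print(once)
print(greedy)
print(len(greedy_dotall), greedy_dotall[0].endswith("</html>"))
```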
Now forget all that and use an HTML parser for parsing HTML; parsers are designed specifically for this task. Don't bother with regex here, because you can't reliably parse every kind of HTML with it.
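As one example of the parser route, this sketch uses Python's standard-library html.parser to collect the src of every img tag without an alt attribute (ImgWithoutAlt is a hypothetical class name; any HTML parser would do):

```python
from html.parser import HTMLParser

SAMPLE = """<html>
<body>
hello
<img src="withalt" alt="hi"/>asdf
<img src="noalt" />fdsaasdf
<img src="withalt2" alt="blah" />
</body>
</html>"""


class ImgWithoutAlt(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names are lowercased.
        # Self-closing <img ... /> tags also arrive here via the default handle_startendtag.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.srcs.append(attributes.get("src"))


parser = ImgWithoutAlt()
parser.feed(SAMPLE)
parser.close()
print(parser.srcs)
```

Because the parser works on attributes rather than raw text, attribute order and extra whitespace inside a tag do not change the result.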

Emacs query-replace-regexp with html

I was trying the replace-regexp command in Emacs, but I have no idea how to construct the right regexp. My file looks like this:
<img src="http://s.perros.com/content/perros_com/imagenes/thumbs/1lundehund2.jpg" />
<img src="http://s.perros.com/content/perros_com/imagenes/thumbs/1pleon2.jpg" />
And I want to replace them with:
<img src="" class="class-1lundehund2.jpg" />
<img src="" class="class-1pleon2.jpg" />
I was using this regexp with no success (Replaced 0 occurrences):
M-x replace-regexp
Replace regexp: src\=\"http\:\/\/s\.perros\.com\/content\/perros_com\/imagenes\/thumbs\/\([a-zA-Z0-9._-]+\)\"
Replace regexp with: src\=\"\" class\=\"class-\1\"
But in re-builder mode, with the same regexp (changing \([a-zA-Z0-9.-]+\) to \\([a-zA-Z0-9.-]+\\)), all the results are highlighted correctly. I have no idea what's happening; any tips?
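I think you're escaping too many things. In particular, \= in an Emacs regexp is not an escaped =: it matches the empty string only at point, so a pattern starting with src\= effectively never matches. Use
regexp = src="http://s\.perros\.com/content/perros_com/imagenes/thumbs/\([^"]*\)"
replacement = src="" class="class-\1"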
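For comparison, the same substitution in Python's re module (an assumption; the question is about Emacs). The main syntax difference is grouping: Emacs writes a group as \( \) and treats bare ( ) as literal characters, while Python uses bare ( ) for the group.

```python
import re

# Same pattern as the answer, with Python's ( ) group in place of Emacs's \( \).
PATTERN = re.compile(r'src="http://s\.perros\.com/content/perros_com/imagenes/thumbs/([^"]*)"')
REPLACEMENT = r'src="" class="class-\1"'

lines = """<img src="http://s.perros.com/content/perros_com/imagenes/thumbs/1lundehund2.jpg" />
<img src="http://s.perros.com/content/perros_com/imagenes/thumbs/1pleon2.jpg" />"""

result = PATTERN.sub(REPLACEMENT, lines)
print(result)
```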