Regex to Match Two Strings Including Anything in Between Without Line-Breaks

I want to replace all the .png extensions in my HTML with .webp, so I am using this regular expression to match the PNG links:
\.\/assets\/images\/.*\.png
This works OK if my HTML file has line breaks like this:
<picture>
<source class="d-block w-100" media="(max-width: 575px)"
srcset="./assets/images/slider/advertisers-pt.png">
<source class="d-block w-100"
media="(min-width: 576px) and (max-width: 768px)"
srcset="./assets/images/slider/advertisers-pt.png">
<img class="w-100" srcset="
./assets/images/slider/advertisers-ls.png"
src="./assets/images/slider/advertisers-ls.png" alt="">
</picture>
and it matches all the paths correctly.
But after the file is minified, it no longer works: it matches from the start string up to the last occurrence of the second string, with everything in between. So the following:
<picture><source class="d-block w-100" media="(max-width: 575px)"srcset="./assets/images/slider/advertisers-pt.png"><source class="d-block w-100"media="(min-width: 576px) and (max-width: 768px)" srcset="./assets/images/slider/advertisers-pt.png"><img class="w-100" srcset="./assets/images/slider/advertisers-ls.png" src="./assets/images/slider/advertisers-ls.png" alt=""></picture>
will have a match for:
./assets/images/slider/advertisers-pt.png"><source class="d-block w-100"media="(min-width: 576px) and (max-width: 768px)" srcset="./assets/images/slider/advertisers-pt.png"><img class="w-100" srcset="./assets/images/slider/advertisers-ls.png" src="./assets/images/slider/advertisers-ls.png
How can I do this with regex after my file is minified?

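Try the \S (non-whitespace) character class instead of ., which matches any character except a newline (that is why the original pattern worked while each path sat on its own line). None of the paths contain whitespace, but the markup between two paths does (for example class="d-block w-100"), so \S* cannot run past the end of a path: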
\.\/assets\/images\/\S*\.png
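The difference is easy to see by running both patterns through Python's re module (used here only as a stand-in for whatever tool does the replacing; the variable names are illustrative) on a shortened copy of the minified markup. The last step sketches one way to do the actual .png to .webp swap with a capture group.

```python
import re

# Shortened copy of the minified markup from the question.
html = ('<source class="d-block w-100" media="(max-width: 575px)"'
        'srcset="./assets/images/slider/advertisers-pt.png">'
        '<img class="w-100" srcset="./assets/images/slider/advertisers-ls.png" '
        'src="./assets/images/slider/advertisers-ls.png" alt="">')

greedy = re.compile(r'\./assets/images/.*\.png')    # original pattern
non_ws = re.compile(r'\./assets/images/\S*\.png')   # \S* stops at the first whitespace

# The greedy pattern returns a single match running to the last ".png".
print(greedy.findall(html))
# The \S* pattern returns each path separately.
print(non_ws.findall(html))

# Replacing the extension: capture everything before ".png", then append ".webp".
webp_html = re.sub(r'(\./assets/images/\S*)\.png', r'\1.webp', html)
print(webp_html)
```

The \S* version still relies on whitespace separating consecutive paths. If two attributes were ever glued together without a space, a class such as [^"\s]* would be the safer choice, since it also stops at the closing quote.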

Related

Dreamweaver regex find and replace to add WebP images

I want to find all the JPGs in a big static site and add WebP sources,
so I need to find
<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />
and replace with
<picture>
<source srcset="image/path/my be/long/new_2.webp" type="image/webp">
<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />
</picture>
I've tried all kinds of things, but I'm really unfamiliar with regex, so the closest I've got is
<img src="([^"]+)" alt="(.*?)" />
and replace with
<picture>
<source srcset="$1.webp" type="image/webp">
<img src="$1" alt="$2" />
</picture>
but that produces the file extension .jpg.webp.
Regex is such a huge topic; any help from anyone with some experience would be very welcome.
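To capture the source path without the .jpg extension, you can use this regex:
<img src="([a-z0-9\/\s_]+)\.jpg" alt="(.*?)" \/>
The first group matches:
a-z: lowercase characters
0-9: numbers
\/: slash
\s: whitespace
_: underscore
and stops at .jpg" (the dot is escaped as \. so it only matches a literal period). Add A-Z or - to the brackets (with the - last) if your paths contain uppercase letters or hyphens.
Because $1 no longer includes the extension, add it back in the replacement:
<picture>
<source srcset="$1.webp" type="image/webp">
<img src="$1.jpg" alt="$2" />
</picture>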
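A quick way to test the find-and-replace before running it across the site is a short Python sketch (a stand-in for Dreamweaver, not its own engine). It uses the answer's pattern with the dot before jpg escaped; Python writes back-references as \1 and \2 where Dreamweaver uses $1 and $2. The sample tag is the one from the question.

```python
import re

# The answer's pattern; \. makes the dot before "jpg" literal.
IMG_JPG = re.compile(r'<img src="([a-z0-9/\s_]+)\.jpg" alt="(.*?)" />')

# \1 is the path without ".jpg", so the extension is written back explicitly.
# re.sub turns the \n escapes in the template into real newlines.
REPLACEMENT = (r'<picture>\n'
               r'<source srcset="\1.webp" type="image/webp">\n'
               r'<img src="\1.jpg" alt="\2" />\n'
               r'</picture>')

tag = '<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />'
result = IMG_JPG.sub(REPLACEMENT, tag)
print(result)
```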

Only grep img tags that contain a keyword, but not img tags that don't?

Using grep/regex, I am trying to pull img tags out of a file. I only want img tags that contain 'photobucket' in the source, and I do not want img tags that do not contain photobucket.
Want:
<img src="/photobucket/img21.png">
Do Not Want:
<img src="/imgs/test.jpg">
<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>
What I have tried:
(<img.*?photobucket.*?>)
This did not work, because it pulled the second example in "Do Not Want", as there was a 'photobucket' and then a closing bracket. How can I only check for 'photobucket' up until the first closing bracket, and if photobucket is not contained, ignore it and move on?
'photobucket' may be in different locations within the string.
grep -o '<img[^>]*src="[^"]*photobucket[^>]*>' infile
-o prints only the matched part of each line. Broken down:
<img # Start with <img
[^>]* # Zero or more of "not >"
src=" # start of src attribute
[^"]* # Zero or more of "not quotes"
photobucket # Match photobucket
[^>]* # Zero or more of "not >"
> # Closing angle bracket
For the input file
<img src="/imgs/test.jpg">
<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>
<img src="/photobucket/img21.png">
<img alt="photobucket" src="/something/img21.png">
<img alt="something" src="/photobucket/img21.png">
<img src="/photobucket/img21.png" alt="something">
<img src="/something/img21.png" alt="photobucket">
this returns
$ grep -o '<img[^>]*src="[^"]*photobucket[^>]*>' infile
<img src="/photobucket/img21.png">
<img alt="something" src="/photobucket/img21.png">
<img src="/photobucket/img21.png" alt="something">
The non-greedy .*? only works with the -P option (Perl-compatible regexes); plain grep and grep -E do not treat it as a lazy quantifier.
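For anyone working outside grep, the same pattern behaves identically in Python's re module. The sketch below runs it line by line over the answer's sample input (mirroring grep's line-oriented matching) and, for contrast, shows the question's lazy pattern still picking up the unwanted second line.

```python
import re

INFILE = '''<img src="/imgs/test.jpg">
<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>
<img src="/photobucket/img21.png">
<img alt="photobucket" src="/something/img21.png">
<img alt="something" src="/photobucket/img21.png">
<img src="/photobucket/img21.png" alt="something">
<img src="/something/img21.png" alt="photobucket">'''

# [^>]* cannot leave the current tag; [^"]* cannot leave the src value.
src_only = re.compile(r'<img[^>]*src="[^"]*photobucket[^>]*>')
# The question's attempt: .*? may still cross the first ">" to reach "photobucket".
lazy = re.compile(r'<img.*?photobucket.*?>')

# Like grep -o: collect only the matched text, line by line.
matches = [m for line in INFILE.splitlines() for m in src_only.findall(line)]
for m in matches:
    print(m)

# The lazy pattern matches the unwanted "Do Not Want" line.
print(lazy.search(INFILE.splitlines()[1]).group(0))
```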
Just replace the first .*? with a negated class that excludes >:
(<img[^>]*?photobucket.*?>)
https://regex101.com/r/tZ9lI9/2
Try the following:
<img[^>]*?photobucket[^>]*?>
This way the regex can't go past the first '>'.
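Both [^>]*? answers keep the match inside a single tag, but they accept photobucket anywhere in it, including an alt attribute. A Python sketch over a few sample tags (taken from the grep answer's test input) makes that difference from a src-only pattern visible:

```python
import re

# [^>]*? cannot cross the tag's closing ">".
in_tag = re.compile(r'<img[^>]*?photobucket[^>]*?>')

samples = [
    '<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>',
    '<img src="/photobucket/img21.png">',
    '<img alt="photobucket" src="/something/img21.png">',
]

for s in samples:
    m = in_tag.search(s)
    # None for the first sample; the other two match, because "photobucket"
    # may appear in any attribute, not only in src.
    print(m.group(0) if m else None)
```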
Try with this pattern:
<img.*src=\"[/a-zA-Z0-9_]+photobucket[/a-zA-Z0-9_]+\.\w+\".*>
I'm not sure which characters your folder names allow; if you need more, just add them to the ranges "[]" before and after "photobucket".
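One caution about this pattern: the leading .* and the trailing .*> are greedy, so if several tags share a line the match can run to the last > on it. A Python sketch with a hypothetical two-tag line shows this, along with a variant (an illustrative tweak, not part of the answer) that swaps both .* for [^>]*:

```python
import re

original = re.compile(r'<img.*src=\"[/a-zA-Z0-9_]+photobucket[/a-zA-Z0-9_]+\.\w+\".*>')
# Hypothetical tweak: [^>]* keeps the match inside one tag.
tag_bound = re.compile(r'<img[^>]*src="[/a-zA-Z0-9_]+photobucket[/a-zA-Z0-9_]+\.\w+"[^>]*>')

line = '<img src="/photobucket/a.png"><img src="/x.png">'  # hypothetical input

print(original.search(line).group(0))   # runs through both tags
print(tag_bound.search(line).group(0))  # only the first tag
```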

Extract Regex Match

I want to scrape the src of images in an RSS feed with Yahoo Pipes.
This regex selects exactly what I want:
src="([^"]*)"
Image: <img src="image.jpg" width="300" height="422" alt="urkunde300" style="margin-right: 10px;" />
Pipes: In scrapedimage replace src="([^"]*)" with $1
Output: <img//// width="300" height="422" alt="image300" style="margin-right:10px;"/>
How do I invert the match so that everything except image.jpg is replaced with nothing?
You want to match the whole line, <img src="([^"]+)".*, and replace it with the captured group ($1).
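The whole-tag replacement can be sketched with Python's re.sub as a stand-in for the Pipes replace step (Python spells the back-reference \1 rather than $1). The tag is the sample from the question; the second step shows that extracting the group directly is an alternative to replacing.

```python
import re

tag = ('<img src="image.jpg" width="300" height="422" alt="urkunde300" '
       'style="margin-right: 10px;" />')

# Match the whole tag and keep only the captured src value.
src = re.sub(r'<img src="([^"]+)".*', r'\1', tag)
print(src)  # image.jpg

# Alternatively, read the group directly instead of replacing.
m = re.search(r'src="([^"]*)"', tag)
print(m.group(1) if m else None)
```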

I need to match all a tags that contain imgs from a string

I need to match, in a string, all a tags that contain img tags.
<img src="...." alt="####1" title="####1"/>
<img src="...." alt="####2" title="####2"/>
<img src="...." alt="####2" title="####3"/>
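<a\s*[^>]*>.*?<img\s*[^>]*>.*?<\/a>
Try this; it should do it. The final .*? is lazy so that each match stops at the first closing </a> instead of running on to the last one in the string.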
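Since the question's sample shows only the img tags, the Python sketch below uses a hypothetical string with two links on one line. It compares a greedy closing .* with a lazy .*?: the greedy form runs to the last </a>, while the lazy form stops at the first one. Regex matching of HTML stays fragile (for example, .*?<img can still skip past an a tag without an image to reach a later one), so an HTML parser is the more robust choice for real pages.

```python
import re

# Hypothetical input: two linked images on a single line.
text = ('<a href="/one"><img src="1.png" alt="####1" title="####1"/></a> '
        '<a href="/two"><img src="2.png" alt="####2" title="####2"/></a>')

greedy_end = re.compile(r'<a\s*[^>]*>.*?<img\s*[^>]*>.*<\/a>')
lazy_end = re.compile(r'<a\s*[^>]*>.*?<img\s*[^>]*>.*?<\/a>')

print(greedy_end.findall(text))  # one match covering both links
print(lazy_end.findall(text))    # one match per link
```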

Regex to match url with blank space

Basically, I want to use a regex match to get this:
/xpto/uuuu/That name tho [1080p].mp4
from
<section class="video">
<video id="video" autoplay="">
<source src="/xpto/uuuu/That name tho [1080p].mp4" type="video/mp4">
</video>
</section>
What I want is to get the relative path that ends with .mp4 from a big HTML page.
Can someone help me with this?
SOLVED BY RAJ (written as a string literal in which "" stands for a single quote character):
"(?<=src="")[^""]+"
Use a lookbehind to match all the characters right after src=" up to the next " symbol (i.e., the value of the src attribute):
(?<=src=")[^"]+
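In a language with raw strings the doubled quotes are unnecessary. Below is a Python sketch of the lookbehind on the page fragment from the question, plus an illustrative variant that only accepts values ending in .mp4 (the \.mp4(?=") part is an addition, not part of the answer):

```python
import re

page = '''<section class="video">
<video id="video" autoplay="">
<source src="/xpto/uuuu/That name tho [1080p].mp4" type="video/mp4">
</video>
</section>'''

# Lookbehind from the answer: everything after src=" up to the next quote.
any_src = re.findall(r'(?<=src=")[^"]+', page)

# Variant: keep only src values that end in ".mp4".
mp4_src = re.findall(r'(?<=src=")[^"]+\.mp4(?=")', page)

print(any_src)
print(mp4_src)
```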