Emacs query-replace-regexp with html - regex

I was trying the replace-regexp command in Emacs but I've no idea about how to construct the right regexp. My file looks like the following:
<img src="http://s.perros.com/content/perros_com/imagenes/thumbs/1lundehund2.jpg" />
<img src="http://s.perros.com/content/perros_com/imagenes/thumbs/1pleon2.jpg" />
And I want to replace those lines with:
<img src="" class="class-1lundehund2.jpg" />
<img src="" class="class-1pleon2.jpg" />
I was using this regexp with no success (Replaced 0 occurrences):
M-x replace-regexp
Replace regexp: src\=\"http\:\/\/s\.perros\.com\/content\/perros_com\/imagenes\/thumbs\/\([a-zA-Z0-9._-]+\)\"
Replace regexp with: src\=\"\" class\=\"class-\1\"
But in re-builder mode, the same regexp (with \([a-zA-Z0-9.-]+\) written as \\([a-zA-Z0-9.-]+\\)) highlights all the expected matches. I have no idea what's happening. Any tips?

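I think you're escaping too many things. In an Emacs regexp, none of =, ", : or / is special, so they need no backslash. The \= is the real culprit: it is a construct of its own that matches the empty string only at point, so src\= fails unless point happens to sit right after src. In re-builder's default string syntax, "\=" is most likely read as a plain =, which would explain why the pattern highlighted correctly there. Try:
Replace regexp: src="http://s\.perros\.com/content/perros_com/imagenes/thumbs/\([^"]*\)"
Replace regexp with: src="" class="class-\1"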
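For anyone who wants to check the simplified pattern outside Emacs, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for Emacs, so groups are written ( ) rather than \( \); the names PATTERN, REPLACEMENT and rewrite_img are made up for illustration.

```python
import re

# Python's `re` syntax: groups are written ( ... ) instead of Emacs's \( ... \).
PATTERN = re.compile(
    r'src="http://s\.perros\.com/content/perros_com/imagenes/thumbs/([^"]*)"'
)
# \1 inserts whatever the group captured (the file name).
REPLACEMENT = r'src="" class="class-\1"'


def rewrite_img(line: str) -> str:
    """Empty the src attribute and move the file name into a class attribute."""
    return PATTERN.sub(REPLACEMENT, line)


sample = '<img src="http://s.perros.com/content/perros_com/imagenes/thumbs/1lundehund2.jpg" />'
print(rewrite_img(sample))
```

Only the dots in the host name are escaped, since they are the only characters in the pattern with a special meaning.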

Related

Replace a string using Notepad++ and regex

I have strings like this:
<img src="http://www.example.com/app_res/emoji/1F60A.png" /><img src="http://www.example.com/app_res/emoji/1F389.png" />
<img src="http://www.example.com/app_res/emoji/1F61E.png" /><img src="http://www.example.com/app_res/emoji/1F339.png" />
I want them to be like this:
😊 🎉
😞 🌹
In Notepad++, I tried this:
Find what: ^\s*<img src="http://www.example.com/app_res/emoji/(1F.*).png" />
Replace with: &#x\1;
The result is not as expected:
&#x1F60A.png" /><img src="http://www.example.com/app_res/emoji/1F389;
How can I best isolate the part I need with the regular expression?
Any help is welcome! Thank you.
You're using the unspecific . together with the greedy star *. Don't do that here, as it tends to overshoot the target.
Be more specific.
The file name (in your case) does not contain dots. Let's use "anything except a dot" ([^.]*) instead of "anything" (.*):
^\s*<img src="http://www.example.com/app_res/emoji/(1F[^.]*).png" />
Note that the leading ^\s* anchors the match to the start of the line, so only the first tag on each line is replaced. Drop it if you want every tag to match.
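The difference between the greedy and the specific pattern can be reproduced with a short Python sketch. The variable names are illustrative, and two things differ from the patterns above by assumption: the literal dots are escaped, and the leading ^\s* is dropped so that both tags on the line can match.

```python
import re

line = ('<img src="http://www.example.com/app_res/emoji/1F60A.png" />'
        '<img src="http://www.example.com/app_res/emoji/1F389.png" />')

# Greedy ".*" runs to the LAST '.png" />' on the line.
greedy = re.compile(r'<img src="http://www\.example\.com/app_res/emoji/(1F.*)\.png" />')

# "[^.]*" cannot cross a dot, so it stops at the first ".png".
specific = re.compile(r'<img src="http://www\.example\.com/app_res/emoji/(1F[^.]*)\.png" />')

print(greedy.search(line).group(1))   # overshoots into the second tag
print(specific.findall(line))         # one clean capture per tag
```

The greedy capture swallows the end of the first tag and the start of the second, which is exactly the unexpected result shown in the question.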
You may try the following find and replace, in regex mode:
Find: <img src=".*?/([A-Z0-9]+)\.\w+"\s*/><img src=".*?/([A-Z0-9]+)\.\w+"\s*/>
Replace: &#x$1; &#x$2;
Try:
Find: ^<.*?/(1\w+).*?/(1\w+).*
Replace: &#x$1; &#x$2;
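If the goal is the emoji characters themselves rather than numeric entities, the same capture can be converted directly. This is a minimal sketch in Python, not Notepad++; the pattern IMG and the helper names emoji_entities and emoji_chars are assumptions made for illustration, and the tags on a line are joined with a single space to mirror the desired output in the question.

```python
import re

# Capture the hexadecimal code point that forms the file name of each image.
IMG = re.compile(r'<img src="[^"]*/([0-9A-F]+)\.png" />')


def emoji_entities(line: str) -> str:
    """Turn each emoji image tag into a numeric character reference."""
    return " ".join(f"&#x{code};" for code in IMG.findall(line))


def emoji_chars(line: str) -> str:
    """Turn each emoji image tag into the actual Unicode character."""
    return " ".join(chr(int(code, 16)) for code in IMG.findall(line))


line = ('<img src="http://www.example.com/app_res/emoji/1F60A.png" />'
        '<img src="http://www.example.com/app_res/emoji/1F389.png" />')
print(emoji_entities(line))
print(emoji_chars(line))
```

Both helpers discard any text between the tags, which is fine for lines like the ones in the question but would need adjusting for mixed content.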

dreamweaver regex find and replace to add webp images

I want to find all the JPGs in a big static site and add WebP versions,
so I need to find
<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />
and replace with
<picture>
<source srcset="image/path/my be/long/new_2.webp" type="image/webp">
<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />
</picture>
I've tried all kinds of patterns, but I'm really unfamiliar with regex, so the closest I've got is
<img src="([^"]+)" alt="(.*?)" />
and replace with
<picture>
<source srcset="$1.webp" type="image/webp">
<img src="$1" alt="$2" />
</picture>
but that produces the file extension .jpg.webp
Regex is such a huge topic; any help from anyone with some experience will be very welcome.
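To match the source path without the .jpg extension, you can use this regex:
<img src="([a-z0-9\/\s_]+)\.jpg" alt="(.*?)" \/>
The character class in the first group matches:
a-z: lowercase letters
0-9: digits
\/: slash
\s: whitespace
_: underscore
and the group stops at .jpg". Extend the class if your paths contain other characters, such as uppercase letters or hyphens. Because $1 no longer includes the extension, write $1.webp in the srcset and $1.jpg in the img src of the replacement.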
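The whole substitution can be tried out with a short Python sketch. It is an illustration under assumptions, not Dreamweaver's own behaviour: the names IMG_JPG, wrap_in_picture and add_webp are invented here, and the first group uses [^"]+ (anything up to the closing quote) instead of the narrower character class, so that paths with uppercase letters or hyphens also match.

```python
import re

# Group 1: path without the extension; group 2: the alt text.
IMG_JPG = re.compile(r'<img src="([^"]+)\.jpg" alt="(.*?)" />')


def wrap_in_picture(match: re.Match) -> str:
    """Build the <picture> block for one matched <img> tag."""
    path, alt = match.group(1), match.group(2)
    return (
        "<picture>\n"
        f'<source srcset="{path}.webp" type="image/webp">\n'
        f'<img src="{path}.jpg" alt="{alt}" />\n'
        "</picture>"
    )


def add_webp(html: str) -> str:
    """Wrap every matching .jpg image in a <picture> with a WebP source."""
    return IMG_JPG.sub(wrap_in_picture, html)


sample = '<img src="image/path/my be/long/new_2.jpg" alt="descriptive tag blah" />'
print(add_webp(sample))
```

Like the original pattern, this only matches tags whose attributes are exactly src followed by alt, in that order and with that spacing.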

Notepad++ RegEx replace with pattern

I want to find the following pattern:
Image not found: /Images/IMG-20160519-WA0015.jpg
And replace with some markup, including the image name from the above text like:
<a href="IMG-20160519-WA0015.jpg"><img src="IMG-20160519-WA0015.jpg" width="180" height="240" alt="IMG-20160519-WA0015.jpg" class="image" />
Is it possible with some kind of Regex or plugin or I'm simply burning neurones?
Thanks.
Try finding ^Image not found: \/Images\/(IMG-.*\.jpg) and replacing with <a href="\1"><img src="\1" width="180" height="240" alt="\1" class="image" />
Note that the caret (^) in the regex requires the match to start at the beginning of a line. I'm not sure that's the case for you, but I suspect it is. I also assumed that the "IMG-" prefix is constant; if it isn't, just remove those four characters from the regex.
If you're not aware of it, RegExr is a nice interactive way to build and test regular expressions.
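The same find and replace can be checked outside Notepad++ with a minimal Python sketch. The names MISSING, MARKUP and to_markup are assumptions for illustration; re.MULTILINE is set so that ^ matches at the start of every line, which is how the caret behaves when replacing across a whole file.

```python
import re

# ^ anchors to the start of each line because of re.MULTILINE.
MISSING = re.compile(r'^Image not found: /Images/(IMG-.*\.jpg)', re.MULTILINE)

# \1 is the captured file name, reused three times in the markup.
MARKUP = r'<a href="\1"><img src="\1" width="180" height="240" alt="\1" class="image" />'


def to_markup(text: str) -> str:
    """Replace each 'Image not found' line with the image markup."""
    return MISSING.sub(MARKUP, text)


print(to_markup("Image not found: /Images/IMG-20160519-WA0015.jpg"))
```

The slashes are left unescaped here because / has no special meaning in Python's regex syntax.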
EDIT: Since you mentioned having trouble in the comments, here's an image of my settings: [screenshot not available]

Why does this regular expression work?

OK, I'm thoroughly confused about why this regular expression works. The text I'm working with is this:
<html>
<body>
hello
<img src="withalt" alt="hi"/>asdf
<img src="noalt" />fdsaasdf
<img src="withalt2" alt="blah" />
</body>
</html>
Using the following regular expression (tested in PHP, but I'm assuming it's true for all Perl-style regular expressions), it will return all img tags which do not contain an alt attribute:
/<img(?:(?!alt=).)*?>/
Returns:
<img src="noalt" />
So based on that I would think that simply removing the non-capturing group would return the same:
/<img(?!alt=).*?>/
Returns:
<img src="withalt" alt="hi"/>
<img src="noalt" />
<img src="withalt2" alt="blah" />
As you can see, it instead just returns all image tags. Then, to make things even more confusing, removing the ? after the * (simply a wildcard as far as I'm aware) makes each match run up to the final >:
/<img(?!alt=).*>/
Returns:
<img src="withalt" alt="hi"/>
<img src="noalt" />fdsaasdf
<img src="withalt2" alt="blah" />
So anyone care to inform me, or at least point me in the right direction of what's going on here?
/<img(?:(?!alt=).)*?>/
This regex applies the negative look-ahead before every character it consumes after <img. As soon as it reaches alt=, it cannot advance, and since the next character is not >, the match fails for that tag. So it only matches img tags that do not have an alt attribute.
/<img(?!alt=).*?>/
This regex applies the negative look-ahead only once, immediately after <img. In your input, <img is followed by a space and src, never directly by alt=, so the check always passes. The lazy .*? then matches everything up to the first >, no matter whether alt= appears further along the string; it is simply consumed by .*?
/<img(?!alt=).*>/
This is the same as the previous one, but because * is greedy here it matches up to the last > it can reach. Since . does not match a newline by default, that is the last > on each line; with the s (dot-all) modifier the match would instead run all the way to the final > of </html>.
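The three patterns can be compared side by side. This is a minimal sketch in Python rather than PHP, on the assumption that the look-ahead syntax behaves the same way in both; the variable names are illustrative. The greedy pattern is run twice because . only matches a newline when re.DOTALL is set.

```python
import re

html = """<html>
<body>
hello
<img src="withalt" alt="hi"/>asdf
<img src="noalt" />fdsaasdf
<img src="withalt2" alt="blah" />
</body>
</html>"""

# Look-ahead checked before EVERY consumed character: alt= can never be crossed.
tempered = re.findall(r'<img(?:(?!alt=).)*?>', html)

# Look-ahead checked ONCE, right after "<img"; lazy match to the first ">".
once_lazy = re.findall(r'<img(?!alt=).*?>', html)

# Same single check, greedy match to the last ">" the dot can reach.
once_greedy = re.findall(r'<img(?!alt=).*>', html)
once_greedy_dotall = re.findall(r'<img(?!alt=).*>', html, re.DOTALL)

print(tempered)
print(once_lazy)
print(once_greedy)
print(once_greedy_dotall)
```

On this input the lazy and greedy variants give the same per-line matches without re.DOTALL, because each line contains only one >. With re.DOTALL the greedy variant returns a single match from the first <img to the end of </html>.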
Now forget everything above and use an HTML parser to process HTML. Parsers are designed specifically for this task, whereas a regex cannot reliably handle every kind of HTML.
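As an illustration of the parser approach, here is a minimal sketch using Python's standard-library html.parser module (the question used PHP, so this is a swapped-in language, not the asker's setup). The class name ImgWithoutAlt and its missing attribute are made up for this example.

```python
from html.parser import HTMLParser


class ImgWithoutAlt(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags such as
        # <img ... /> also end up here via the default handle_startendtag.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src"))


html = """<html>
<body>
hello
<img src="withalt" alt="hi"/>asdf
<img src="noalt" />fdsaasdf
<img src="withalt2" alt="blah" />
</body>
</html>"""

parser = ImgWithoutAlt()
parser.feed(html)
print(parser.missing)
```

Unlike the regex, this does not depend on attribute order, spacing, or line breaks inside the tag.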

Coldfusion RegEx to replace characters

I have the following code:
<cfset arguments.textToFormat = Replace(arguments.textToFormat, Chr(10), '<br />', "ALL") />
It replaces all instances of Chr(10) with a <br /> tag.
What I'd like to do afterwards, however, is this: if there are more than two <br /> tags, replace all the extra ones with an empty string (i.e. remove them).
I could do this via code, but I'm sure a regex replace would be faster. Unfortunately I haven't a clue how to construct the regex.
Any help would be great - thanks.
There may be a more elegant regex, but this should do it:
rereplace( myText, '(<br />){2,}', '<br />', 'all' )
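That should find every run of two or more consecutive <br /> tags and replace the whole run with a single tag. If you would rather keep up to two tags, match '(<br />){3,}' and replace with '<br /><br />' instead.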
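The effect of the pattern is easy to verify with a minimal Python sketch. This is not ColdFusion; re.sub stands in for rereplace with the 'all' scope, and the function names collapse_breaks and newlines_to_breaks are assumptions for illustration.

```python
import re


def collapse_breaks(text: str) -> str:
    """Replace every run of two or more <br /> tags with a single tag."""
    return re.sub(r'(?:<br />){2,}', '<br />', text)


def newlines_to_breaks(text: str) -> str:
    """Convert line feeds (Chr(10)) to <br /> tags, then collapse the runs."""
    return collapse_breaks(text.replace('\n', '<br />'))


print(collapse_breaks('a<br /><br /><br />b<br />c'))
print(newlines_to_breaks('a\n\n\nb\nc'))
```

The group is written as non-capturing, (?: ), because the captured text is never used; a plain ( ) group behaves the same for this replacement.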