Regex to get part of image url (sublime text 3) - regex

I have an XML database containing several thousand entries: text plus HTML tags (images and links). I need a regex for Sublime Text 3 that replaces a portion of every image URL (everything before the file name).
For example, I have this:
<img src="/images/fanart/bigfana2121rt/215627676.jpg">
and
<img src="/images/screenshots/goodlooking/tret/215627676.gif">
And I need to get this:
/images/fanart/bigfana2121rt/
and this:
/images/screenshots/goodlooking/tret/
Thank you.

Regex:
<img\b[^>]*\bsrc="([^"]*\/)[^\/"]*"[^<>]*>
Replacement string:
\1
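For anyone who wants to check the pattern outside Sublime Text, here is a minimal Python sketch of the same search and replace. The names IMG_DIR, samples and dirs are only illustrative, and Python's re module stands in for Sublime's regex engine, which may behave differently in edge cases.

```python
import re

# Same pattern as above; "/" needs no escaping in Python.
# Group 1 captures everything in src up to and including the last slash.
IMG_DIR = re.compile(r'<img\b[^>]*\bsrc="([^"]*/)[^/"]*"[^<>]*>')

samples = [
    '<img src="/images/fanart/bigfana2121rt/215627676.jpg">',
    '<img src="/images/screenshots/goodlooking/tret/215627676.gif">',
]

# Replace each whole <img> tag with the captured directory part.
dirs = [IMG_DIR.sub(r'\1', s) for s in samples]

for d in dirs:
    print(d)
```

Note that the replacement swaps the entire tag for the directory path, which is what the question's expected output shows.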

Related

regex to get linkable text

I've been trying for hours now.
I need to get the linkable text, meaning all text from a webpage source that sits between <a href> and </a>, excluding any other tags nested inside the <a> element.
Example:
<a href="blabla.net">THIS TEXT
<img src="hhh.jpg" /> THIS TEXT TOO
<span> ALSO THIS TEXT. </span>AND ALSO THIS TEXT</a>
You could use a simple regular expression with a non-greedy group:
<[aA]\b[^\>]*>([\w\W]*?)<\/[aA]>
You can test it on this page by hitting F12 then typing
$(document.body).html().match(/<a\b[^\>]*>([\w\W]*?)<\/a>/ig)
You can try the following Regular expression, that returns the text between tags in four groups:
(?<=>)[^<]+?(?=<)
It removes tags from the text.
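The two ideas above (grab the body of each link with a non-greedy group, then drop the nested tags) can be combined. This is only a rough Python sketch: the helper name link_texts and the whitespace normalisation are my own additions, and a real HTML parser would be more robust than regex on arbitrary pages.

```python
import re

# Non-greedy group captures everything between <a ...> and the nearest </a>.
ANCHOR = re.compile(r'<a\b[^>]*>([\w\W]*?)</a>', re.IGNORECASE)
# Any remaining tag nested inside the link body.
TAG = re.compile(r'<[^>]+>')

def link_texts(html):
    """Return the visible text of each link, with nested tags removed."""
    texts = []
    for inner in ANCHOR.findall(html):
        without_tags = TAG.sub('', inner)
        # Collapse runs of whitespace left behind by the removed tags.
        texts.append(' '.join(without_tags.split()))
    return texts

sample = (
    '<a href="blabla.net">THIS TEXT\n'
    '<img src="hhh.jpg" /> THIS TEXT TOO\n'
    '<span> ALSO THIS TEXT. </span>AND ALSO THIS TEXT</a>'
)

print(link_texts(sample))
```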

Use Regular Expression to retrieve Url in the row with more than one Url

This is an example string.
<p style="text-align: center;"><img class="aligncenter wp-image-22582 size-full" src="http://the7.dream-demo.com/main/wp-content/uploads/sites/9/2014/05/show-04.png" alt="" width="372" height="225" /></p
There are two URLs in the line.
One points to a PNG, the other to a web page. I want to get the PNG URL, i.e. the pattern "http:.....png".
I simply used "http://.*?png", but that matches from the first "http://" all the way to the end of the second URL, the one with the .png extension.
For now I can tell which URL is the PNG by checking for href versus src, but that misses many PNG URLs that appear in other forms, such as <png>PNG URL</png>.
How can this be solved? Thanks.
Don't parse HTML with regex, as Biffen commented, but you can extract bits, e.g.:
(?<=href=")[^"]+\.png
This does a lookbehind for href=" at the start of the pattern, then matches every character that isn't a " up to the .png at the end.
Spending an hour learning regex will save you time coming here.
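If the PNG URLs can show up outside href or src as well, another option (not the lookbehind above, but a swapped-in variant) is to forbid the characters that end a URL in markup, so a match can never run from one URL into the next. A small Python sketch, where the example.com URLs are made-up test data:

```python
import re

# A URL here is "http(s)://" followed by characters that are not
# whitespace, quotes or angle brackets, ending in ".png".
PNG_URL = re.compile(r'https?://[^\s"<>]+?\.png', re.IGNORECASE)

# Hypothetical input: a page link, an image link, and a <png> element.
text = (
    '<a href="http://example.com/page">'
    '<img src="http://example.com/uploads/show-04.png" alt="" /></a>'
    '<png>http://example.com/other.png</png>'
)

png_urls = PNG_URL.findall(text)
print(png_urls)
```

Because the quote after the page URL is excluded by the character class, the first "http://" can no longer swallow the second URL.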

How do I get Regex to correctly match urls with correct image file names?

I am trying to update paths in a large document using regex, and I want the pattern to match all img src tags whose file type is JPG or PNG.
I used the following lines to test whether the regex matches correctly:
<img src="xanne.nnn.pagespeed.ic.u49smximgo.jpg" alt="test">
<img src="xanne.nnn.pagespeed.ic.u49smximgo.png" alt="test">
but it also matches the bottom two:
<img src="xanne.nnn.pagespeed.ic.u49smximgo.webp" alt="test">
<img src="xanne.nnn.pagespeed.ic.u49smximgo.gif" alt="test">
When I use the following Regex:
<img src="(?=.*(jpg|png)?)
Any ideas how I can get it to match only (1 and 2) and not 1,2,3,4?
If you want to match only the src attribute value
img src="(?=(.*jpg|.*png))
This will match the whole line when it has the format shown in 1 and 2:
(?=^<img src=".*(jpg|png)").*$
Get the matched group at index 1, captured by the parentheses in the regex pattern below:
<img src="(.*\.(jpg|png))"
Or try it without a capturing group, using a non-capturing group plus a positive lookbehind and lookahead, which do not consume characters in the string but only assert whether a match is possible:
(?<=<img src=").*\.(?:jpg|png)(?=")
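To see the capture-group approach on the four test lines from the question, here is a short Python sketch. It uses [^"]* instead of .* so the match cannot run past the closing quote of src; that tweak and the variable names are my own assumptions, not part of the answers above.

```python
import re

# Group 1 is the src value, but only when it ends in .jpg or .png.
IMG_JPG_PNG = re.compile(r'<img src="([^"]*\.(?:jpg|png))"')

lines = [
    '<img src="xanne.nnn.pagespeed.ic.u49smximgo.jpg" alt="test">',
    '<img src="xanne.nnn.pagespeed.ic.u49smximgo.png" alt="test">',
    '<img src="xanne.nnn.pagespeed.ic.u49smximgo.webp" alt="test">',
    '<img src="xanne.nnn.pagespeed.ic.u49smximgo.gif" alt="test">',
]

# findall returns the contents of group 1 for every match.
matches = IMG_JPG_PNG.findall('\n'.join(lines))
print(matches)
```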

get text between html tags correctly

I want to grab the text between html tags using Dreamweaver's search and replace tool.
The link format is a standard a tag e.g.
<a href="...">Text</a>
Or:
<a href="...">Text</a> and <a href="...">Text 2</a>
Or:
<a href="..." target="_blank">Text</a>
I am using the following expression:
(.*)
This works fine for example 1, but it picks up everything between the first opening tag <a href and the last closing tag </a> in the case of example 2.
What can I do to just targeting each individual link?
Also, what can I do in the case of example 3 where links also have a target="_blank" property?
if you just want the "Text" in the body of the tag
<a[^>]*>([^<]*)</a>
would work
if you also want the href
<a[^>]*href="([^>"]*)"[^>]*>([^<]*)</a>
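A quick way to convince yourself that the second pattern handles several links on one line, and an extra target attribute, is to run it in Python. This is just a sketch; the href values one.html and two.html are invented for the test.

```python
import re

# Group 1 is the href value, group 2 the link text.
# [^>]* and [^<]* keep each match inside a single <a>...</a> pair.
LINK = re.compile(r'<a[^>]*href="([^>"]*)"[^>]*>([^<]*)</a>')

# Hypothetical line with two links, one of them with target="_blank".
line = '<a href="one.html">Text</a> and <a href="two.html" target="_blank">Text 2</a>'

links = LINK.findall(line)
print(links)
```

Since [^<]* cannot cross a tag, this only works for links whose body is plain text with no nested elements.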

Regex to convert URLs to HTML <a href> hyperlinks in Notepad++?

I have a list of URLs in a text file I am trying to change to HTML, but I'm failing miserably.
My URLs are in this format:
http://mydomain.com/here-are-my-links.html
Does anybody know of a regex search/replace command I can run in Notepad++ to change my URL list to this format:
<a href="http://mydomain.com/here-are-my-links.html">here are my links</a>
Use the regex
(http://mydomain.com/(.*?)\.html)
and replace it with
<a href="\1">\2</a>
If you want to change - into space you can do this
-(?=[^<>]*?</a>)
and replace it with a single space.
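Outside Notepad++, both steps can be done in one pass. The Python sketch below assumes the goal is an <a> element whose href is the full URL and whose text is the file name with hyphens turned into spaces; the function names to_link and convert are made up for illustration.

```python
import re

# Group 1 is the whole URL, group 2 the file name without ".html".
URL = re.compile(r'(http://mydomain\.com/(.*?)\.html)')

def to_link(match):
    """Build the anchor for one URL, turning hyphens in the text into spaces."""
    url, slug = match.group(1), match.group(2)
    return '<a href="{}">{}</a>'.format(url, slug.replace('-', ' '))

def convert(text):
    """Replace every matching URL in text with its hyperlink."""
    return URL.sub(to_link, text)

print(convert('http://mydomain.com/here-are-my-links.html'))
```

Using a replacement function avoids the second search for hyphens, because the link text is rewritten while the anchor is being built.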