I have deleted all <a> tags in my blog with this regular expression:
<a\s[^>]*> $1
Now I need to change all my image URLs from:
<img src="http://files.tampo.ua/files/news/part_38/388705/1.jpg" width="500" height="291" border="0" class="c24" />
to:
<img src="https://dl.dropbox.com/u/85819604/1.jpg" width="500" height="291" border="0" class="c24" />
So I need to replace the main path to the image server.
http://clip2net.com/s/22PwP
Try replacing http://files.tampo.ua/files/news/[^/]*/[^/]*/ with https://dl.dropbox.com/u/85819604/
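For anyone doing this outside a text editor, here is a minimal Python sketch of the same replacement. The function name `rewrite_image_urls` is made up for illustration, and the dots in the host name are escaped here, which the pattern above does not do.

```python
import re

# Pattern from the answer above, with the literal dots escaped.
# It matches the server path plus the two variable directory levels.
OLD_PREFIX = re.compile(r"http://files\.tampo\.ua/files/news/[^/]*/[^/]*/")
NEW_PREFIX = "https://dl.dropbox.com/u/85819604/"


def rewrite_image_urls(html: str) -> str:
    """Swap the old image server path for the new one (hypothetical helper)."""
    return OLD_PREFIX.sub(NEW_PREFIX, html)


sample = ('<img src="http://files.tampo.ua/files/news/part_38/388705/1.jpg" '
          'width="500" height="291" border="0" class="c24" />')
result = rewrite_image_urls(sample)
print(result)
```

Only the prefix is replaced, so the file name and the remaining attributes are left as they were.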
Hello I have a html file with several img tags:
<img src="https://www.pokeyplay.com/imagenes/backend/publicidad.gif" alt="Publicidad" align="left" />
<img src="https://www.pokeyplay.com/imagenes/backend/spacer.gif" alt="sp" />
<img src="imagenes/backend/etiqueta-pyp-pokedex.gif" alt="P&P PokéDex" width="184" height="100" />
<img src="imagenes/backend/spacer.gif" alt="sp" />
<img src="http://urpgstatic.com/img_library/pokemon_sprites/187.png" style="vertical-align:middle" />
To extract all img tags I am using the following regexp:
'<img[^>]* src=\"([^\"]*)\"[^>]*>'
But I want to extract only all IMG tags from urpgstatic.com
How can I do this?
I made several attempts like this:
<img.*?src="(http[s]?:\/\/)urpgstatic.com?([^\/\s]+\/)(.*)[png]$"[^\>]+>
Thanks
Try this
<img[^>]*(?=\"https?:\/\/(www\.)?urpgstatic\.com)\"([^\"]*)\"[^>]*>
Also, this will work with grep
grep -iP '<img[^>]*(?=\"https?:\/\/(www\.)?urpgstatic\.com)\"([^\"]*)\"[^>]*>' index.html
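If you would rather stay in a script, the same lookahead pattern can be tried in Python. This is only a sketch: the name `IMG_FROM_HOST` is made up, the optional `www.` group is made non-capturing so that `findall` returns just the URL, and the sample input is a subset of the tags from the question.

```python
import re

# Same idea as the pattern above: the lookahead checks that the next quoted
# value starts with the wanted host before that value is captured.
IMG_FROM_HOST = re.compile(
    r'<img[^>]*(?="https?://(?:www\.)?urpgstatic\.com)"([^"]*)"[^>]*>',
    re.IGNORECASE,
)

html = '''<img src="https://www.pokeyplay.com/imagenes/backend/spacer.gif" alt="sp" />
<img src="imagenes/backend/spacer.gif" alt="sp" />
<img src="http://urpgstatic.com/img_library/pokemon_sprites/187.png" style="vertical-align:middle" />'''

# With a single capture group, findall returns the captured URLs.
urls = IMG_FROM_HOST.findall(html)
print(urls)
```

Note that the pattern does not check that the quoted value belongs to `src`, so any attribute whose value starts with that host would also match.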
You may use this grep command:
grep -ioE '<img [^>]*src="https?://(www\.)?urpgstatic\.com/[^>]*>' file.html
<img src="http://urpgstatic.com/img_library/pokemon_sprites/187.png" style="vertical-align:middle" />
Though please remember that parsing HTML with a regex can be error-prone, and using an HTML parser such as DOM in PHP is more reliable.
RegEx Details:
<img [^>]*src=: Match <img followed by a space, then any characters except >, then src=
"https?://: Match "http:// or "https://
(www\.)?urpgstatic\.com/: Match optional www. followed by urpgstatic.com/
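As a rough illustration of the parser-based route mentioned above, here is a sketch using Python's standard `html.parser` rather than PHP's DOM. The class name `ImgSrcCollector` is hypothetical, and the host check (exact host or a `www.` prefix) is an assumption about what counts as "from urpgstatic.com".

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgSrcCollector(HTMLParser):
    """Collect src values of <img> tags served from one host (hypothetical helper)."""

    def __init__(self, host):
        super().__init__()
        self.host = host
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here by default.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        hostname = urlparse(src).hostname or ""
        if hostname in (self.host, "www." + self.host):
            self.sources.append(src)


html = '''<img src="https://www.pokeyplay.com/imagenes/backend/publicidad.gif" alt="Publicidad" align="left" />
<img src="imagenes/backend/spacer.gif" alt="sp" />
<img src="http://urpgstatic.com/img_library/pokemon_sprites/187.png" style="vertical-align:middle" />'''

collector = ImgSrcCollector("urpgstatic.com")
collector.feed(html)
print(collector.sources)
```

Relative paths have no host name, so they are skipped without any special handling.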
I have a requirement where I need to modify html 'img' tags in an html string that do not end with a '/>'
ex: <img src=""> needs to be changed to <img src=""/>
I am using following regex: <img(.*[^/])> to replace with <img$1/>
This works fine however for cases like: <center><img src=""/></center> the regex returns: <center><img src=""></center/>
Any suggestions on how to limit this regex so that it only matches up to the end of the img tag? Thanks.
You may use this:
<\s*img\s+([^>]*=(?:\".*?\"|\'.*?\'))[\s\w\-]*>
with the following replacement:
<img $1/>
this will match these simple and complex cases:
<img src="images/a.jpg" title="test"><br/>
<img src="a/b.jpg" >
<span><img src="a.jpg"></span>
<img src="" title="">
<img src="" data-val>
<img src="a.jpg" title="a'>b">
<img src="a.jpg" title='a">b'>
<img src="a.jpg" title='a>=b"=>' >
but not the following:
<img src="a.jpg" />
<imgXTag src="b.jpg" >
<img src="a.jpg" / >
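To try the pattern outside a regex tester, here is a minimal Python sketch. The helper name `self_close_imgs` is made up, and `$1` becomes `\1` in Python's replacement syntax.

```python
import re

# Pattern from the answer above; the triple-quoted raw string lets both
# quote characters appear unescaped.
UNCLOSED_IMG = re.compile(r"""<\s*img\s+([^>]*=(?:".*?"|'.*?'))[\s\w\-]*>""")


def self_close_imgs(html: str) -> str:
    """Rewrite <img ...> as <img .../> (hypothetical helper)."""
    return UNCLOSED_IMG.sub(r"<img \1/>", html)


fixed = self_close_imgs('<center><img src=""></center>')
untouched = self_close_imgs('<center><img src=""/></center>')
print(fixed)
print(untouched)
```

One thing to watch: anything the trailing `[\s\w\-]*` matches is outside the capture group, so a bare attribute after the last quoted value (as in `<img src="" data-val>`) is dropped by the replacement.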
I use photobucket to host the images for my ebay ads when I sell things, so I copy the html out of photobucket into notepad, and I'm always left with the <img> tag wrapped in photobucket's <a> tag. I have to go through each line and manually delete each <a></a>, which on 26 lines across multiple items can soon equate to hundreds of "highlight and delete" actions.
I already search for the closing tag </a> and replace it with nothing, which removes it. What I cannot work out how to remove is the opening tag, because the image file name is different on every line, as the following example demonstrates:
So it's essentially the section of the anchor tag up to and including the > that I need to be able to remove on a mass scale. Any help would be greatly appreciated!
<img src="http://i1297.photobucket.com/albums/ag35/eye/Programmes/Yes%20joblot/DSC02424c_zpslt9m0cuu.jpg" border="0" alt=" photo DSC05653_zpslt9m0cuu.jpg"/>
<img src="http://i1297.photobucket.com/albums/ag35/eye/Programmes/Yes%20joblot/DSC04444_zpspkgjw6vf.jpg" border="0" alt=" photo DSC05654_zpspkgjw6vf.jpg"/>
<img src="http://i1297.photobucket.com/albums/ag35/eye/Programmes/Yes%20joblot/DSC05655_zpsxuev7czs.jpg" border="0" alt=" photo DSC05655_zpsxuev7czs.jpg"/>
<img src="http://i1297.photobucket.com/albums/ag35/eye/Programmes/Yes%20joblot/DSC06624_zpsifjidypy.jpg" border="0" alt=" photo DSC05656_zpsifjidypy.jpg"/>
<img src="http://i1297.photobucket.com/albums/ag35/eye/Programmes/Yes%20joblot/DSC07777_zpsacyjrnnr.jpg" border="0" alt=" photo DSC05663_zpsacyjrnnr.jpg"/>
<a href="[^"]+?" target="_blank">
would do what you want, or even more general:
<a href=[^>]+?>
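If notepad's search and replace gets tedious, the same two steps could be scripted. This is a minimal Python sketch using the more general pattern; the helper name `strip_anchor_wrappers` and the example.com line are made up for illustration and are not real photobucket output.

```python
import re

# The more general pattern from above: an opening <a href=...> tag up to its first >.
OPENING_ANCHOR = re.compile(r"<a href=[^>]+?>")


def strip_anchor_wrappers(html: str) -> str:
    """Remove opening <a href=...> tags and closing </a> tags (hypothetical helper)."""
    without_open = OPENING_ANCHOR.sub("", html)
    return without_open.replace("</a>", "")


# Hypothetical input line, not taken from the question.
line = ('<a href="http://example.com/album/photo.html" target="_blank">'
        '<img src="http://example.com/photo.jpg" border="0" alt="photo"/></a>')
cleaned = strip_anchor_wrappers(line)
print(cleaned)
```

This removes every link in the text, so it only suits input where all the `<a>` tags are unwanted wrappers.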
Take the following code as an example
<a style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;" href="http://3.bp.blogspot.com/xxxxxxx.JPG"><img src="xxxxxxxxxxxxxx" alt="" width="200" height="150" border="0" /></a>
How can I create a regular expression to strip out any link tag containing the domain 'blogspot.com' from around the img tag?
In the end I would want this
<img src="xxxxxxxxxxxxxx" alt="" width="200" height="150" border="0" />
Thanks in advance.
First of all, I suggest you read this. If you still want the regex:
You can use the following to match:
<[^>]*?href\s*=\s*"[^>]*?blogspot\.com[^>]*>(<img[^>]*?\/>)<\/[^>]*>
And replace or extract with $1
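Here is a small Python sketch of that match-and-replace, run against the markup from the question. The names `BLOGSPOT_LINK` and `unwrap_blogspot_images` are made up, and `$1` is written as `\1` in Python.

```python
import re

# Pattern from the answer above: a tag whose href mentions blogspot.com,
# wrapped directly around a self-closed <img ... /> tag.
BLOGSPOT_LINK = re.compile(
    r'<[^>]*?href\s*=\s*"[^>]*?blogspot\.com[^>]*>(<img[^>]*?/>)</[^>]*>'
)


def unwrap_blogspot_images(html: str) -> str:
    """Keep only the <img> from each blogspot link (hypothetical helper)."""
    return BLOGSPOT_LINK.sub(r"\1", html)


source = ('<a style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;" '
          'href="http://3.bp.blogspot.com/xxxxxxx.JPG">'
          '<img src="xxxxxxxxxxxxxx" alt="" width="200" height="150" border="0" /></a>')
unwrapped = unwrap_blogspot_images(source)
print(unwrapped)
```

The pattern expects the img tag to follow the link tag immediately, so whitespace or other markup between them would prevent a match.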
I am using yahoo pipes to get content matching a certain category from my WordPress.com Blog. Everything is working fine but WordPress adds "share" links to the bottom of the feed that I would like to remove.
Here is what's being added:
<a rel="nofollow" target="_blank" href="http://feeds.wordpress.com/1.0/gocomments/bandonrandon.wordpress.com/87/">
<img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/bandonrandon.wordpress.com/87/"/></a>
<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=bandonrandon.wordpress.com&blog=1046814&post=87&subd=bandonrandon&ref=&feed=1" width="1" height="1"/>
I edited out some of the services but you get the idea. I tried to remove this content with a regex. What I tried was this:
<a rel="nofollow" target="_blank" href="http://feeds.wordpress.com/.*?><img alt="" border="0" src="http://feeds.wordpress.com.*?></a>
and
<img alt="" border="0" src="http://stats.wordpress.com.*?>
however it didn't filter the results at all.
Using this would filter ALL images and works fine
<a.*?><img.*?></a>
<a[^>]+href="http://feeds.wordpress.com[^"]*"[^>]*>\s*<img[^>]+src="http://feeds.wordpress.com/[^"]*"[^>]*>\s*</a>\s*<img[^>]+src="http://stats.wordpress.com/[^"]*"[^>]*>
Regex updated, try that to match the whole lot.
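To check the pattern outside of Pipes, here is a minimal Python sketch against the snippet from the question. The name `SHARE_BLOCK` and the `<p>Post body</p>` line are made up, and the dots in the host names are escaped here, which the pattern above does not do.

```python
import re

# Pattern from the answer above, split across lines for readability.
# It covers one feeds.wordpress.com link plus the stats.wordpress.com image.
SHARE_BLOCK = re.compile(
    r'<a[^>]+href="http://feeds\.wordpress\.com[^"]*"[^>]*>\s*'
    r'<img[^>]+src="http://feeds\.wordpress\.com/[^"]*"[^>]*>\s*</a>\s*'
    r'<img[^>]+src="http://stats\.wordpress\.com/[^"]*"[^>]*>'
)

feed_item = '''<p>Post body</p>
<a rel="nofollow" target="_blank" href="http://feeds.wordpress.com/1.0/gocomments/bandonrandon.wordpress.com/87/">
<img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/bandonrandon.wordpress.com/87/"/></a>
<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=bandonrandon.wordpress.com&blog=1046814&post=87&subd=bandonrandon&ref=&feed=1" width="1" height="1"/>'''

cleaned = SHARE_BLOCK.sub("", feed_item)
print(repr(cleaned))
```

As written, the pattern covers a single share link followed by the stats image. A feed with several share links in a row would need the link part repeated or matched separately.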