Replacing a dynamic url with text? - regex

I need to replace the URL below (including the img tags) with text. I am not very good with regex...
However, as the date will change, it is unfortunately not just a matter of copy and replace.
<img src="http://thailandsbloggare.se/wp-content/uploads/2012/10/icon_wink.gif" alt=";)">
and sometimes with class="wp-smiley"
<img src="http://thailandsbloggare.se/wp-content/uploads/2012/09/icon_wink.gif" alt=";)" class="wp-smiley" />
So any time this image is posted, I want the complete string to be replaced with the text ";)"
It's a site built in WordPress using PHP.
Thanks in advance!
EDIT: I want to clarify why I can not just use a search and replace and why the url is dynamic.
The date part, written as /2012/10/, will be different every time this image is posted. So everything but the date will always be the same!
And sometimes the string ends with alt=";)"> and sometimes with class="wp-smiley" />

This should match your URL and any date (at least until 2099)
http://thailandsbloggare.se/wp-content/uploads/20\d\d/\d+/icon_wink\.gif
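The pattern above only covers the URL, not the surrounding img tag. A minimal sketch of the full replacement, written in Python for illustration (the site itself runs PHP, and the names WINK_IMG and replace_wink are made up for this example), assuming the tag never contains a ">" inside an attribute value:

```python
import re

# The URL pattern from the answer, wrapped in a tolerant <img ...> match.
# [^>]* absorbs whatever attributes come before or after src, so both
# the alt-only form and the class="wp-smiley" form are covered.
WINK_IMG = re.compile(
    r'<img\s[^>]*src="http://thailandsbloggare\.se/wp-content/uploads/'
    r'20\d\d/\d+/icon_wink\.gif"[^>]*>'
)

def replace_wink(html: str) -> str:
    """Replace every wink smiley <img> tag with the plain text ;)"""
    return WINK_IMG.sub(";)", html)
```

The same pattern should carry over to PHP's preg_replace with only delimiter changes, though that has not been checked here.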

Related

Use Regular Expression to retrieve Url in the row with more than one Url

This is an example string.
<p style="text-align: center;"><img class="aligncenter wp-image-22582 size-full" src="http://the7.dream-demo.com/main/wp-content/uploads/sites/9/2014/05/show-04.png" alt="" width="372" height="225" /></p>
There are two URLs in a row.
One is for a PNG, the other is for a web page. I want to get the PNG URL, matching a pattern like "http:.....png".
I simply used "http://.*?png", but it retrieves a string from the first "http://" URL through to the second URL with the PNG file extension.
I can currently do it by using href and src to identify which one is the PNG URL. But that will miss a lot of PNG URLs in other patterns like <png>Png url</png>.
How can this be solved? Thanks.
Don't parse HTML with regex, as Biffen commented, but you can extract bits, e.g.:
(?<=href=")[^"]+\.png
This does a lookbehind for href=" at the start of the pattern, then matches every character that isn't a " up to the .png at the end. The dot is escaped so that it matches only a literal ".".
Spending an hour learning regex will save you time coming here.
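To pick up PNG URLs regardless of the attribute or tag around them, one option is to forbid the characters that cannot appear inside a URL in this kind of markup (whitespace, quotes, angle brackets), so a match can never run from one URL into the next. A rough sketch in Python; the example.com URLs and the names PNG_URL and find_png_urls are made up for illustration:

```python
import re

# A URL here is "http(s)://" followed by characters that are not whitespace,
# quotes, or angle brackets, ending in ".png". Because quotes and brackets
# are excluded, the match cannot span two separate URLs.
PNG_URL = re.compile(r'https?://[^\s"\'<>]+?\.png', re.IGNORECASE)

def find_png_urls(text: str) -> list[str]:
    """Return every PNG URL found in text, in document order."""
    return PNG_URL.findall(text)

sample = (
    '<a href="http://example.com/page">'
    '<img src="http://example.com/uploads/show-04.png" /></a>'
    '<png>http://example.com/a.png</png>'
)
print(find_png_urls(sample))
```

This still treats the HTML as plain text, so the usual caveats about regex and HTML apply.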

Changing regexes to target image links

I need to batch change a folder full of files, changing all image links to lower case and replacing underscores with dashes. Thus, <img src="/images/Maps/South_America.png"> would become <img src="images/maps/south-america.png">
I already performed similar operations on all local links in the same files. I used this regex to change them to lower case:
(?<=(?i)href=")((?:<\?php(?:(?!\?>).)+\?>)?)((?:'[^']+')?)([^"]+)(?=")
\1\2\L\3
And I used this one to replace underscores with dashes:
(href="(?!http)[^_"]+)\_([^"]+")
$1-$2
I'm not even sure if they're the same "language"; I think one only works in Dreamweaver, the other in TextWrangler. Anyway, I haven't figured out how to modify them to match images rather than links. I should emphasize that I only want to change the image paths and names, not any classes, IDs or alt tags.
For example, <img src="Buffalo_Bill.jpg" alt="Buffalo Bill" class="People"> would become <img src="buffalo-bill.jpg" alt="Buffalo Bill" class="People">
Also, I think this covers all the bases if defining image extensions is necessary...
(?:jpe?g|gif|png|svg|swf)
The regexes I posted above are just examples. If you have a regex that's totally different, that's fine - just as long as it will work in a common text editor like Dreamweaver or TextWrangler. (I'm on a Mac.)
With an input like this:
<img id="BoringSnowDay" class="FunkySmellsFromGarden" src="/images/Maps/South_America.png" alt="Powerball Winner!" /> <img id="ExcitingSunNight" class="SmoothTasteInKitchen" src="/images/Flags/Antartica.jpg" alt="Racecar racecaR!" />
This regex in TextWrangler:
(<img [^>]+)(src="[^"]+")
Replace:
\1\L\2
Gives me something that ONLY affects the src="..." portion and nothing else.
Unfortunately, combining that to a "...and replace _ to -" tends to get a little tricky.
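If a scripting language is an option instead of an editor's find-and-replace, both steps can be done in one pass with a replacement callback. The following is only a sketch in Python, not something tested in Dreamweaver or TextWrangler; the names IMG_SRC and normalize_image_paths are made up, and it assumes src values are always double-quoted:

```python
import re

# Group 1: everything from "<img" up to and including src="
# Group 2: the path itself
# Group 3: the closing quote
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)

def _normalize(match: re.Match) -> str:
    # Only the path (group 2) is lowercased and has "_" turned into "-";
    # the rest of the tag is passed through unchanged.
    path = match.group(2).lower().replace("_", "-")
    return match.group(1) + path + match.group(3)

def normalize_image_paths(html: str) -> str:
    """Lowercase image paths and replace underscores with dashes."""
    return IMG_SRC.sub(_normalize, html)
```

Because the callback only rewrites the captured path, the id, class and alt attributes keep their original case.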

Need to replace dynamic image tag with text

I need to replace the URL below (including the img tags) with text. I am not very good with regex... As you can see, it's dynamic with dates, and it ends in two different ways:
with alt=";)"> and sometimes with class="wp-smiley" />
<img src="http://thailandsbloggare.se/wp-content/uploads/2012/10/icon_wink.gif" alt=";)">
and sometimes with class="wp-smiley" at the end
<img src="http://thailandsbloggare.se/wp-content/uploads/2012/09/icon_wink.gif" alt=";)" class="wp-smiley" />
So any time this image is posted, I want the complete string to be replaced with the text ";)"
I have managed to write the regex for everything up to the alt=";)"> or class="wp-smiley" /> ending, but then I am stuck. I presume I need some OR functionality here.
<img src="http://thailandsbloggare.se/wp-content/uploads/20\d\d/\d+/icon_wink\.gif
Updated information after replies below
<img src="http://thailandsbloggare.se/wp-content/uploads/20[0-9]{2}/[01][0-9]/icon_wink.gif" alt=";\)" *(|class="wp-smiley")?>
and
Both fail on strings that include class="wp-smiley" />
It's a site built in WordPress using PHP, and I am using http://urbangiraffe.com/plugins/search-regex/
Thanks in advance!
Normally, in a regex, you can create alternative sub-regexes:
(match this|or this)
In your case
(alt=";\)"|class="wp-smiley")
If alt=";)" is always there, do:
alt=";\)" *(|class="wp-smiley")
Of course, we don't know in which editor or programming language you are operating, and the actual regex implementation can be different from the above example.
Try the following pattern search:
<img src="http://thailandsbloggare\.se/wp-content/uploads/20[0-9]{2}/[01][0-9]/icon_wink\.gif" alt=";\)"(\sclass="wp-smiley")?\s*/?>
Please refer to the syntax supported by the regex engine you are using. But, for most engines the above pattern should work. Note the character class used for date ranges, you should change it appropriately.
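Both endings from the question can be covered by making the class attribute and the self-closing slash optional. A small sketch in Python to check the idea against the two sample tags (the Search Regex plugin uses PHP's regex engine, so treat this as an illustration of the pattern only; WINK_TAG and to_text are made-up names):

```python
import re

WINK_TAG = re.compile(
    r'<img src="http://thailandsbloggare\.se/wp-content/uploads/'
    r'20[0-9]{2}/[01][0-9]/icon_wink\.gif" alt=";\)"'
    r'(?: class="wp-smiley")?'   # optional class attribute
    r'\s*/?>'                    # optional whitespace and self-closing slash
)

def to_text(html: str) -> str:
    """Replace each matching smiley tag with the text ;)"""
    return WINK_TAG.sub(";)", html)
```

The trailing `\s*/?>` is what lets the pattern accept the ` />` ending as well as a bare `>`.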

How to get string of everything between these two em tag?

I want to get the string between the em tags, including any other HTML.
for example:
<em>UNIVERSALPOSTAL UNION - International Bureau Circular<br />
By: K.J.S. McKeown</em>
output should be as:
UNIVERSALPOSTAL UNION - International Bureau Circular<br />
By: K.J.S. McKeown
please help me.
Thanks
Use the regular expression function like this:
REMatch("(?s)<em>.*?</em>", html)
See also: http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=regexp_01.html
The (?s) sets the mode to single line, so that the input text is interpreted as one line even if it contains line feeds. As Peter pointed out in a comment, this is not the default and therefore must be set.
The .*? matches all characters in between <em> and </em>. The question mark after the multiplier makes it "non-greedy", so that as few characters as possible are matched. This is needed in case the input HTML contains something like <em>foo</em><em>bar</em>, where otherwise only the outermost <em></em> tags would be considered.
The returned array contains all matches found, i.e. all texts including html that was in <em> tags.
Note that this could fail in circumstances where </em> also occurs as attribute text and is incorrectly not HTML-encoded, for example: <em><a title="Help for </em> tag">click</a></em>, or in other rare circumstances (e.g. JavaScript script tags etc.). A regex cannot replace a full HTML/XML parser, and if you need 100% accuracy, you should consider using one: http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=functions_t-z_23.html
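The effect of the single-line flag and the non-greedy quantifier can be seen in a short Python sketch (Python rather than ColdFusion, purely for illustration; re.DOTALL plays the role of (?s), and the names EM_BLOCK and em_contents are made up). Unlike REMatch, it uses a capture group so that only the text inside the tags is returned:

```python
import re

# re.DOTALL lets "." match newlines, like (?s) in the pattern above.
# ".*?" is non-greedy, so each <em>...</em> pair is matched separately.
EM_BLOCK = re.compile(r"<em>(.*?)</em>", re.DOTALL)

def em_contents(html: str) -> list[str]:
    """Return the inner HTML of every <em>...</em> pair."""
    return EM_BLOCK.findall(html)

sample = (
    "<em>UNIVERSALPOSTAL UNION - International Bureau Circular<br />\n"
    "By: K.J.S. McKeown</em>"
)
print(em_contents(sample))
```

The same caveat applies: a stray </em> inside an attribute value would cut the match short.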
If your input is exactly in the format given above, you don't even need regex - just strip the outer tags:
<cfsavecontent variable="Input">[text from above]</cfsavecontent>
<cfset Output = mid( Input, 5, len(Input) - 9 ) />
If your input is more than this (i.e. a significant piece of HTML, or a full HTML document), regex is still not the ideal tool. Instead, you should use an HTML parser, such as JSoup:
<cfset jsoup = createObject('java','org.jsoup.Jsoup') />
<cfset Output = jsoup.parse(Input).select('em').html() />
(With CF8, this code requires placing the jsoup JAR file in CF's lib directory, or using a tool such as JavaLoader.)
If you are using jQuery, you can also do this pretty easily.
$("em").html();
This returns the HTML between the em tags (for the first matching element).
See this fiddle
I had to remove any text that followed a particular tag. The HTML content was generated dynamically from a database that caters to 5 different languages, so I only had the div tag to help me. I am not sure why REMatch("(?s).*?", html) did not work for me. However, Ben helped me here (http://www.bennadel.com/blog/769-Learning-ColdFusion-8-REMatch-For-Regular-Expression-Matching.htm). My code looks like this:
<cfset extContentArr = REMatch("(?i)<div class=""inlineBlock"" style=""margin-right:30px;"">.+?</div>",qry_getContent.colval) />
<cfif !ArrayIsEmpty(extContentArr)>
<!--- Loop over the array and do whatever you need with the extracts; I just deleted them. --->
</cfif>

replacing image path with regular expression

I have massive HTML code with loads of images. The problem is that every single image has a different path, for example:
<img src="../media/2010/01/something.jpg" />
<img src="../media/logo.png" />
What I wanted to do with regular expressions is, to find every image path and replace it with:
<img src="../img/FILENAME.EXTENSION" />
I know that it's definitely possible with regular expressions... but it's just not my cup of tea. Could anyone help me, please?
Cheers, Mart
This might not be the best solution but it might work:
(<img.*?src=")([^"]*?(\/[^/]*\.[^"]+))
and then you use capture group 1 and 3 to create the new string (depending on flavor):
$1../img$3
You can see it in action here: http://regexr.com?2v8ir
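For a batch job, the same idea can be scripted. Here is a rough Python sketch that keeps only the file name of each src and puts it under ../img; the names IMG_SRC and flatten_image_paths are made up, and it assumes double-quoted src values with forward-slash paths:

```python
import posixpath
import re

# Group 1: "<img ... src=" up to the opening quote
# Group 2: the original path
# Group 3: the closing quote
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)

def flatten_image_paths(html: str, new_dir: str = "../img") -> str:
    """Point every image src at new_dir, keeping only the file name."""
    def repl(match: re.Match) -> str:
        filename = posixpath.basename(match.group(2))
        return f"{match.group(1)}{new_dir}/{filename}{match.group(3)}"
    return IMG_SRC.sub(repl, html)
```

Note that two images with the same file name in different folders would end up pointing at the same target, so it is worth checking for collisions before moving the files.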
If you want to parse HTML, it's much better to use an HTML parser instead of regex. There are quite a lot of them and they do a very good job.
Html Agility Pack is a good one
Try this link
Using this regex <img src="[\w/\.]+"(\s|)/> and replacing with <img src="../img/FILENAME.EXTENSION" />