I have a simple regex line to extract the src="" value from an image tag:
<cfset variables.attrSrc = REMatch("(?i)src\s*=\s*""[^""]+", variables.myImageTag) />
<!--- REMatch("(?i)src\s*=\s*""[^""]+" --->
However, while this works great, it doesn't work with src='' attributes that use single quotes instead of double quotes.
Ideally, I'd like it to work with both single quotes and double.
Any thoughts?
Thanks,
Michael.
(?i)src\s*=\s*(""[^""]+""|'[^']+')
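For anyone who wants to check the alternation outside ColdFusion, here is a minimal sketch in Python (used purely for illustration; the `extract_src` helper and the sample tags are made up for this example). The doubled `""` in the CFML string is only quote escaping, so the underlying pattern contains single `"` characters.

```python
import re

# Same pattern as the answer above, with CFML's doubled "" written as a plain ".
SRC_ATTR = re.compile(r"""src\s*=\s*("[^"]+"|'[^']+')""", re.IGNORECASE)


def extract_src(tag):
    """Return the src value without its surrounding quotes, or None if absent."""
    match = SRC_ATTR.search(tag)
    if match is None:
        return None
    # group(1) still includes the quotes, so drop the first and last character.
    return match.group(1)[1:-1]


print(extract_src('<img src="a.png" alt="x">'))
print(extract_src("<img SRC = 'b.gif'>"))
```

Because each branch of the alternation excludes only its own quote character, a double-quoted value may contain a single quote and vice versa.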
I need to replace the URL below (including the img tag) with text. I am not very good with regex...
However, because the date part will change, a plain copy and replace unfortunately won't work.
<img src="http://thailandsbloggare.se/wp-content/uploads/2012/10/icon_wink.gif" alt=";)">
and sometimes with class="wp-smiley"
<img src="http://thailandsbloggare.se/wp-content/uploads/2012/09/icon_wink.gif" alt=";)" class="wp-smiley" />
So any time this image is posted, I want the complete string to be replaced with the text ";)"
It's a site built in WordPress using PHP.
Thanks in advance!
EDIT: I want to clarify why I cannot just use a search and replace and why the URL is dynamic.
The date part, written as /2012/10/, will be different every time this image is posted. So everything but the date will be the same all the time!
And sometimes the string ends with alt=";)"> and sometimes with class="wp-smiley" />
This should match your URL and any date (at least until 2099)
http://thailandsbloggare.se/wp-content/uploads/20\d\d/\d+/icon_wink\.gif
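To replace the whole tag rather than just the URL, the pattern can be wrapped in an `<img ...>` match. The following is a rough sketch in Python, not PHP; the `replace_wink` name is hypothetical, and it assumes `src` is always the first attribute, as in both samples from the question. The same pattern should carry over to a PHP regex replace once delimiters are added, but that part is untested here.

```python
import re

# The URL pattern from the answer, wrapped so the entire <img> tag is matched.
# Assumption: src is the first attribute, as in both sample tags.
SMILEY_IMG = re.compile(
    r'<img\s+src="http://thailandsbloggare\.se/wp-content/uploads/'
    r'20\d\d/\d+/icon_wink\.gif"[^>]*>'
)


def replace_wink(html):
    """Replace every wink smiley image tag with the plain text ;)"""
    return SMILEY_IMG.sub(";)", html)


sample = (
    'Hi <img src="http://thailandsbloggare.se/wp-content/uploads/2012/09/'
    'icon_wink.gif" alt=";)" class="wp-smiley" /> there'
)
print(replace_wink(sample))
```

The trailing `[^>]*>` absorbs whatever attributes follow the URL, which covers both the `alt=";)">` ending and the `class="wp-smiley" />` ending.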
I want to get the string between the em tags, including any other HTML inside it.
for example:
<em>UNIVERSALPOSTAL UNION - International Bureau Circular<br />
By: K.J.S. McKeown</em>
output should be as:
UNIVERSALPOSTAL UNION - International Bureau Circular<br />
By: K.J.S. McKeown
please help me.
Thanks
Use the regular expression function like this:
REMatch("(?s)<em>.*?</em>", html)
See also: http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=regexp_01.html
The (?s) flag sets single-line mode, in which the dot also matches line feeds, so the input text is treated as one line even if it spans several. As Peter pointed out in a comment, this is not the default and therefore must be set explicitly.
The .*? matches all characters between <em> and </em>. The question mark after the quantifier makes it "non-greedy", so that as few characters as possible are matched. This is needed in case the input HTML contains something like <em>foo</em><em>bar</em>, where a greedy match would otherwise run from the first <em> to the last </em>.
The returned array contains all matches found, i.e. all texts including html that was in <em> tags.
Note that this could fail where </em> also occurs as attribute text and is incorrectly not HTML-encoded, for example: <em><a title="Help for </em> tag">click</a></em>, or in other rare circumstances (e.g. JavaScript inside script tags). A regex cannot replace a full HTML/XML parser, and if you need 100% accuracy you should consider using one: http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=functions_t-z_23.html
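The effect of (?s) and of the non-greedy quantifier is easy to see side by side. This is a small Python sketch (Python rather than CFML, and the sample string is made up); as far as I know both flags behave the same way in the two engines, but treat that as an assumption.

```python
import re

# Sample input: one <em> spanning two lines, followed by a second <em>.
html = (
    "<em>UNIVERSALPOSTAL UNION - International Bureau Circular<br />\n"
    "By: K.J.S. McKeown</em> and <em>foo</em>"
)

# Non-greedy with (?s): each <em>...</em> pair is matched separately.
lazy = re.findall(r"(?s)<em>.*?</em>", html)

# Greedy with (?s): one match running from the first <em> to the last </em>.
greedy = re.findall(r"(?s)<em>.*</em>", html)

# Without (?s) the dot stops at the line feed, so the two-line <em> is missed.
no_dotall = re.findall(r"<em>.*?</em>", html)

# A capture group returns only the inner HTML, without the <em> tags.
inner = re.findall(r"(?s)<em>(.*?)</em>", html)

print(len(lazy), len(greedy), no_dotall)
print(inner[0])
```

The capture-group variant is closest to the output the question asks for, since the plain match keeps the surrounding tags.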
If your input is exactly in the format given above, you don't even need regex - just strip the outer tags:
<cfsavecontent variable="Input">[text from above]</cfsavecontent>
<cfset Output = mid( Input, 5, len(Input) - 9 ) />
If your input is more than this (i.e. a significant piece of HTML, or a full HTML document), regex is still not the ideal tool. Instead, you should be using an HTML parser, such as JSoup:
<cfset jsoup = createObject('java','org.jsoup.Jsoup') />
<cfset Output = jsoup.parse(Input).select('em').html() />
(With CF8, this code requires placing the jsoup JAR file in CF's lib directory, or using a tool such as JavaLoader.)
If you are using jQuery, you can also do this pretty easily.
$("em").html();
This returns the HTML between the tags of the first matched em element.
I had to remove any text that followed a particular tag. The HTML content was generated dynamically from a database that caters to 5 different languages, so I only had the div tag to help me. I am not sure why REMatch("(?s).*?", html) did not work for me. However, Ben helped me here (http://www.bennadel.com/blog/769-Learning-ColdFusion-8-REMatch-For-Regular-Expression-Matching.htm). My code looks like this:
<cfset extContentArr = REMatch("(?i)<div class=""inlineBlock"" style=""margin-right:30px;"">.+?</div>",qry_getContent.colval) />
<cfif !ArrayIsEmpty(extContentArr)>
<!--- Loop over the array and do whatever you need with each extract; I just deleted them. --->
</cfif>
I am running a query and am trying to output the information using cfoutput like this:
<cfoutput query="the_query">
<p><a href="#">#QueryResult#</a></p>
</cfoutput>
ColdFusion won't allow me to use the # in href. It says "Invalid CFML construct", but I need it to be href="#". Is there a way to escape this?
Just double up on the # character. ## inside a tag will output a single #.
<cfoutput query="the_query">
<p><a href="##">#QueryResult#</a></p>
</cfoutput>
No problem putting these up against regular terms, either, say you wanted to name the anchor using a field from the query:
<p><a href="###QueryResult#">#QueryResult#</a></p>
This would give you
<p><a href="#Result Here">Result Here</a></p>
I currently have code to remove html from a string:
<cfset arguments.textToFormat = REReplaceNoCase(arguments.textToFormat,"<[^>]*>","","ALL") />
However, this does not remove HTML character entities such as &nbsp;
What regex could I use with REReplace to remove these characters?
Thanks
For removing &nbsp; and other similar strings:
&[^;]+?;
HTH
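Putting the tag pattern from the question and the entity pattern from this answer together, a minimal Python sketch looks like the following. The function names are made up, and the stricter entity pattern is my own addition rather than part of the answer: the loose `&[^;]+?;` will also swallow ordinary text when a stray ampersand happens to be followed by a semicolon later on.

```python
import re

TAGS = re.compile(r"<[^>]*>")
LOOSE_ENTITY = re.compile(r"&[^;]+?;")
# Stricter alternative (an assumption, not from the answer): named, decimal
# or hexadecimal character references only.
STRICT_ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def strip_markup(text, entity_pattern=LOOSE_ENTITY):
    """Remove tags first, then remove whatever the entity pattern matches."""
    without_tags = TAGS.sub("", text)
    return entity_pattern.sub("", without_tags)


print(strip_markup("<p>Hello&nbsp;<b>world</b>&amp;</p>"))
print(strip_markup("Tom & Jerry; ok"))
print(strip_markup("Tom & Jerry; ok", STRICT_ENTITY))
```

If the goal is readable text rather than deletion, decoding the entities instead of removing them may be the better choice.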
I'm looking for a regex to strip the h3 tag so that only the content remains
eg.
<h3 id="123">my h3 tag</h3>
I need to strip the tag and be left with
my h3 tag
I currently have reMatchNoCase("(<h3)(.*?)(</h3>)",i)
This was parsed out of many other parts of a string; now I need it cleaned down to the content.
thanks
<cfset content = ReReplace(content, "</?[hH]3[^>]*>", "", "ALL")>
This should be faster than ReReplaceNoCase, and still be case insensitive (because of the [hH]).
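Here is the same tag-stripping pattern as a quick Python sketch, in case you want to try it outside ColdFusion (the `strip_h3` name is hypothetical). It removes the opening and closing h3 tags wherever they appear and leaves everything else untouched.

```python
import re

# Same pattern as the ReReplace call above: optional slash, h3 in either
# case, then any attributes up to the closing bracket.
H3_TAG = re.compile(r"</?[hH]3[^>]*>")


def strip_h3(text):
    """Remove <h3 ...> and </h3> tags, keeping their content."""
    return H3_TAG.sub("", text)


print(strip_h3('<h3 id="123">my h3 tag</h3>'))
```

Since the pattern treats opening and closing tags independently, it also cleans up unbalanced input such as a lone closing tag.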
You shouldn't really use regexes to parse HTML, but if you want to, here's a quick hack:
/<\s*h3[^>]*>(.*?)<\/h3>/
Just replace this with $1.
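A sketch of the capture-group approach in Python, where the backreference is written `\1` rather than `$1`. The `unwrap_h3` name is made up, and the case-insensitive and dot-matches-newline flags are my own additions, not part of the pattern above.

```python
import re

# The pattern from above without the surrounding slashes. Assumptions added
# here: IGNORECASE for <H3>, DOTALL so the content may span several lines.
H3_BLOCK = re.compile(r"<\s*h3[^>]*>(.*?)</h3>", re.IGNORECASE | re.DOTALL)


def unwrap_h3(text):
    """Replace each whole h3 element with just its captured content."""
    return H3_BLOCK.sub(r"\1", text)


print(unwrap_h3('before <h3 id="123">my h3 tag</h3> after'))
```

Unlike stripping the two tags separately, this only touches properly paired h3 elements, so a lone opening or closing tag is left in place.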
You could just replace
<h3[^>]*>
and
</h3>
with nothing.