I currently have code to remove html from a string:
<cfset arguments.textToFormat = REReplaceNoCase(arguments.textToFormat,"<[^>]*>","","ALL") />
However, this does not remove HTML character entities such as &nbsp;.
Which regex could I use with REReplace to remove these as well?
Thanks
To remove &nbsp; and other similar entities:
&[^;]+?;
HTH
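To check how the two patterns behave together, here is a minimal sketch in Python rather than CFML (the pattern features used here work the same way in both). The names strip_html and STRICT_ENTITY and the sample strings are made up for illustration. One thing to watch: &[^;]+?; also removes ordinary text that sits between a bare ampersand and a later semicolon, so the sketch includes a stricter entity pattern as one possible alternative.

```python
import re

TAG = re.compile(r"<[^>]*>")
ENTITY = re.compile(r"&[^;]+?;")  # the pattern suggested above
# Assumption: a stricter alternative that only accepts named or numeric entities.
STRICT_ENTITY = re.compile(r"&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);")

def strip_html(text, entity=ENTITY):
    """Remove tags first, then remove whatever the entity pattern matches."""
    text = TAG.sub("", text)
    return entity.sub("", text)

print(strip_html("<b>Price:</b>&nbsp;10"))
print(strip_html("AT&T rocks; ok"))                 # loose pattern eats "&T rocks;"
print(strip_html("AT&T rocks; ok", STRICT_ENTITY))  # strict pattern leaves it alone
```

If the entities should be turned into real characters instead of deleted, decoding them is a different job from stripping them and a regex replace is not the right tool for that.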
I have HTML like this:
<p>Also: <a>text 1</a></p> <p><a> text 2 </a></p>
I am using a regex like the one below. I just want to remove everything up to the first </p>:
<p>Also:(.*?)</p>
but the output is empty.
How do I select from <p>Also: up to the first </p>?
I think you want a regex like this:
/(?<=<p>Also:).+?(?=<\/p>)/i
or
/^.*?(?<=<p>Also:).+?(?=<\/p>)/gi
I tried this in VB.NET and found that your regex pattern works for your input. However, when I tried it on regexr.com I found that the forward slash "/" has to be escaped.
You could try this:
<p>Also:(.+?)<\/p>
Note: For HTML, I wouldn't recommend using regex. It's better to use an HTML parser for your programming language.
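To compare the suggestions side by side, here is a small Python sketch (Python is only used to exercise the patterns; the variable names are made up). It runs the capture-group pattern, the lookaround pattern, and a greedy variant against the sample HTML from the question. Python has no / delimiters around the pattern, so the slash in </p> needs no escaping here; it only needs escaping where / delimits the regex.

```python
import re

html = "<p>Also: <a>text 1</a></p> <p><a> text 2 </a></p>"

# Capture group: the lazy .+? stops at the first </p>.
captured = re.search(r"<p>Also:(.+?)</p>", html, re.IGNORECASE).group(1)

# Lookaround: the whole match is only the text between the two markers.
looked = re.search(r"(?<=<p>Also:).+?(?=</p>)", html, re.IGNORECASE).group(0)

# Greedy .+ for comparison: it runs on to the last </p> in the input.
greedy = re.search(r"<p>Also:(.+)</p>", html, re.IGNORECASE).group(1)

print(repr(captured))
print(repr(looked))
print(repr(greedy))
```

Both lazy versions give the same text; the greedy one shows what happens when the question mark is left out.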
I have an HTML parser doing the hard work, but I need a regex to select anchors that don't have the attribute id="optout". Here's my current regex, which selects all anchors that have an href starting with http. This is great, but it needs to ignore anchors with id="optout". Any ideas?
Thanks!
<cfset matches = ReMatch('<a[^>]*href="http[^"]*"[^>]*>(.+?)</a>', arguments.htmlCode) />
Regex is the wrong tool for this task, and given that you've already got a HTML parser involved, there's no reason not to keep using it!
Here's the trivial way to do it with a HTML parser (jsoup):
jsoup.parse( Arguments.HtmlCode ).select('a:not([id=optout])')
Here's the far less maintainable regex way to do it:
rematch( '(?i)<a\s*(?:(?!id\s*=\s*[''"]optout[''"])[^>])+>(?:[^<]+|<(?!/a>))+</a>' , Arguments.HtmlCode )
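For anyone who wants to see both approaches run on the same input, here is a rough Python sketch. It is not the jsoup or CFML code above: the regex is the same pattern with CFML's doubled '' written as a single ', and the parser part uses Python's standard html.parser as a stand-in for jsoup. The sample HTML and the class name AnchorCollector are invented for the example.

```python
import re
from html.parser import HTMLParser

html = (
    '<a href="http://a.example/">One</a> '
    '<a id="optout" href="http://b.example/">Two</a> '
    '<a href="http://c.example/"><b>Three</b></a>'
)

# Regex route: same pattern as above, minus the CFML quote escaping.
pattern = re.compile(
    r"""(?i)<a\s*(?:(?!id\s*=\s*['"]optout['"])[^>])+>(?:[^<]+|<(?!/a>))+</a>"""
)
regex_matches = pattern.findall(html)

# Parser route: look at real attributes instead of raw text.
class AnchorCollector(HTMLParser):
    """Collects the href of every <a> whose id is not 'optout'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("id") != "optout":
            self.hrefs.append(attributes.get("href"))

collector = AnchorCollector()
collector.feed(html)

print(regex_matches)
print(collector.hrefs)
```

The parser version does not care about attribute order, quoting style, or spacing, which is most of why it is easier to maintain.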
I want to get the string between the em tags, including any other HTML inside them.
For example:
<em>UNIVERSALPOSTAL UNION - International Bureau Circular<br />
By: K.J.S. McKeown</em>
The output should be:
UNIVERSALPOSTAL UNION - International Bureau Circular<br />
By: K.J.S. McKeown
Please help me.
Thanks
Use the regular expression function like this:
REMatch("(?s)<em>.*?</em>", html)
See also: http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=regexp_01.html
The (?s) sets the mode to single line, so that the input text is treated as one line even if it contains line feeds (that is, the dot also matches line breaks). As Peter pointed out in a comment, this is not the default and therefore must be set.
The .*? matches all characters between <em> and </em>. The question mark after the quantifier makes it "non-greedy", so that as few characters as possible are matched. This is needed in case the input HTML contains something like <em>foo</em><em>bar</em>, where a greedy match would otherwise run from the first <em> to the last </em>.
The returned array contains all matches found, i.e. all texts including html that was in <em> tags.
Note that this can fail where </em> also occurs inside attribute text and is incorrectly not HTML-encoded, for example: <em><a title="Help for </em> tag">click</a></em>, or in other rare circumstances (e.g. JavaScript in script tags). A regex cannot replace a full HTML/XML parser, and if you need 100% accuracy you should consider using one: http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=functions_t-z_23.html
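The points above about (?s), non-greedy matching, and the attribute edge case can be checked with a short Python sketch (Python stands in for REMatch here; findall plays the role of the returned array, and the variable names are made up). The last pattern adds a capture group, which is one way to get the text without the surrounding <em> tags.

```python
import re

html = ("<em>UNIVERSALPOSTAL UNION - International Bureau Circular<br />\n"
        "By: K.J.S. McKeown</em>")

with_dotall = re.findall(r"(?s)<em>.*?</em>", html)
without_dotall = re.findall(r"<em>.*?</em>", html)  # the dot stops at the line break

pair = "<em>foo</em><em>bar</em>"
lazy = re.findall(r"(?s)<em>.*?</em>", pair)    # two separate matches
greedy = re.findall(r"(?s)<em>.*</em>", pair)   # one match spanning both

# A capture group returns only the content between the tags.
inner = re.findall(r"(?s)<em>(.*?)</em>", html)

# The unencoded </em> inside an attribute cuts the match short.
tricky = '<em><a title="Help for </em> tag">click</a></em>'
cut_short = re.findall(r"(?s)<em>.*?</em>", tricky)

print(with_dotall, without_dotall, lazy, greedy, inner, cut_short, sep="\n")
```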
If your input is exactly in the format given above, you don't even need regex - just strip the outer tags:
<cfsavecontent variable="Input">[text from above]</cfsavecontent>
<cfset Output = mid( Input, 5 , len(Input) - 9 ) />
If your input is more than this (i.e. a significant piece of HTML, or a full HTML document), regex is still not the ideal tool - instead, you should be using a HTML parser, such as JSoup:
<cfset jsoup = createObject('java','org.jsoup.Jsoup') />
<cfset Output = jsoup.parse(Input).select('em').html() />
(With CF8, this code requires placing the jsoup JAR file in CF's lib directory, or using a tool such as JavaLoader.)
If you are using jQuery, you can also do this pretty easily.
$("em").html();
This returns the HTML inside the first em element that matches.
I had to remove any text that followed a particular tag. The HTML content was generated dynamically from a database that caters to 5 different languages, so I only had the div tag to help me. I am not sure why REMatch("(?s).*?", html) did not work for me. However, Ben helped me here (http://www.bennadel.com/blog/769-Learning-ColdFusion-8-REMatch-For-Regular-Expression-Matching.htm). My code looks like this:
<cfset extContentArr = REMatch("(?i)<div class=""inlineBlock"" style=""margin-right:30px;"">.+?</div>",qry_getContent.colval) />
<cfif !ArrayIsEmpty(extContentArr)>
<!--- Loop over the array and do whatever you need with each extract; I just deleted them. --->
</cfif>
I have a simple regex line to extract the src="" value from an image tag:
<cfset variables.attrSrc = REMatch("(?i)src\s*=\s*""[^""]+", variables.myImageTag) />
<!--- REMatch("(?i)src\s*=\s*""[^""]+" --->
However, while this works great, it doesn't work with src='' attributes that use single quotes instead of double quotes.
Ideally, I'd like it to work with both single quotes and double.
Any thoughts?
Thanks,
Michael.
(?i)src\s*=\s*(""[^""]+""|'[^']+')
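Here is a quick Python sketch of that pattern, in case it helps to see it run. The doubled "" in the CFML version is only string escaping, so the actual regex contains a single " in each place. The helper name src_value and the sample tags are made up; the group includes the surrounding quotes, so the sketch slices them off.

```python
import re

# Same pattern as above, written without the CFML quote doubling.
SRC = re.compile(r"""(?i)src\s*=\s*("[^"]+"|'[^']+')""")

def src_value(image_tag):
    """Return the src value without its quotes, or None if no src is found."""
    match = SRC.search(image_tag)
    return match.group(1)[1:-1] if match else None

print(src_value('<img src="a.png" alt="x">'))
print(src_value("<IMG SRC = 'b.png'>"))
print(src_value('<img alt="x">'))
```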
I need a regular expression that will find lines with:
<cflocation url="index.cfm" addtoken="No">
but without the addtoken
<cflocation url="index.cfm">
The index.cfm can be any web address
I also added a comment below, but to be clear: this is for my text editor, so that I can search across files for all cflocation tags that are missing addtoken.
Thanks!
Try <cflocation\s+url="[^"]*"\s*>
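A small Python sketch of how that pattern behaves on a few sample lines (the lines and variable names are invented for the test). The suggested pattern only matches when url is the one and only attribute, so the sketch also includes a broader pattern with a negative lookahead. That second pattern is my own assumption about what might be wanted, and it only helps if the editor's regex engine supports lookahead.

```python
import re

# The suggested pattern: url must be the only attribute.
MISSING_ADDTOKEN = re.compile(r'<cflocation\s+url="[^"]*"\s*>')
# Assumption: a broader variant that accepts any attributes except addtoken.
ANY_WITHOUT_ADDTOKEN = re.compile(r'(?i)<cflocation\b(?![^>]*\baddtoken\b)[^>]*>')

lines = [
    '<cflocation url="index.cfm" addtoken="No">',
    '<cflocation url="index.cfm">',
    '<cflocation url="http://example.com/page.cfm" >',
    '<cflocation statuscode="301" url="index.cfm">',
]

flagged = [line for line in lines if MISSING_ADDTOKEN.search(line)]
flagged_broad = [line for line in lines if ANY_WITHOUT_ADDTOKEN.search(line)]

print(flagged)
print(flagged_broad)
```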
You can test regexes against sample data at regexpal.com.
You can use strfriend.com to explain what a regex does.