ColdFusion regex to replace characters

I have the following code:
<cfset arguments.textToFormat = Replace(arguments.textToFormat, Chr(10), '<br />', "ALL") />
It replaces all instances of Chr(10) with a <br /> tag.
However, afterwards I'd like to remove the extra <br /> tags whenever there are more than two of them (i.e. replace the extras with an empty string).
I could do this via code, but I'm sure a regex replace would be faster. Unfortunately I haven't a clue how to construct the regex.
Any help would be great - thanks.

There may be a more elegant regex, but this should do it:
rereplace( myText, '(<br />){2,}', '<br />', 'all' )
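That should find every run of two or more consecutive <br /> tags and replace the whole run with a single tag. If you want to keep two tags, as the question describes, match '(<br />){3,}' and replace it with '<br /><br />' instead.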
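For comparison, the same substitution can be sketched in Java with String.replaceAll, which uses the same pattern syntax for this case. The class and method names are hypothetical. toSingle mirrors the REReplace call in the answer; toDouble is the keep-two variant, an assumption based on how the question reads rather than something the answer wrote:

```java
public class BrCollapse {
    // Same pattern as the REReplace answer: two or more consecutive
    // <br /> tags are replaced by a single tag.
    static String toSingle(String text) {
        return text.replaceAll("(<br />){2,}", "<br />");
    }

    // Variant that keeps two tags: three or more consecutive tags become two.
    static String toDouble(String text) {
        return text.replaceAll("(<br />){3,}", "<br /><br />");
    }

    public static void main(String[] args) {
        String text = "one<br /><br /><br /><br />two";
        System.out.println(toSingle(text)); // one<br />two
        System.out.println(toDouble(text)); // one<br /><br />two
    }
}
```

Both patterns only see tags that are directly adjacent; whitespace or a stray carriage return between two tags breaks the run.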

Related

Regex to ignore HTML tags

Given the text below, I would like to capture dgdhyt2464t6ubvf with a regex. Can you help me? Thank you so much!
<br />For API key "fnt56urkehicdvd", use API key secret:
<br />
<br /> dgdhyt2464t6ubvf
<br />
<br />Note that it's normal to
So far, I have this, but it is not getting past the <br />:
use API key secret:[\s]+</br>*(.*)[\s]+\sNote
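Instead of matching, you can split the text on the <br /> tags:
public class ExtractSecret {
    public static void main(String[] args) {
        String test = "<br />For API key \"fnt56urkehicdvd\", use API key secret:" +
            "<br /><br /> dgdhyt2464t6ubvf<br /><br />Note that it's normal to";
        // Splitting on "<br />" gives ["", "For API key ...", "", " dgdhyt2464t6ubvf", "", "Note ..."],
        // so the secret is piece 3 once the leading space is trimmed.
        String[] temp = test.split("<br />");
        System.out.println(temp[3].trim());
    }
}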
Output:
dgdhyt2464t6ubvf
You have errors in your regex.
The * after the </br> applies only to the >, so it means "zero or more >" rather than making the whole tag optional. My guess is that you wanted the whole tag to be optional.
</br> should be <br />. There is no </br> in the text, so the regex won't find anything. By the way, <br */> is better, since it also matches tags written without the space, such as <br/>.
Since (.*) is greedy, it will match everything from here on up to the last "Note" in the text. I'm sure that's not what you want; the lazy (.*?) stops as early as the rest of the pattern allows.
\sNote matches "Note" only when it is preceded by a whitespace character, but in the example "Note" directly follows a <br /> tag. Remove the \s.
Only when you correct all those errors will the regex work. Heed the disclaimers in the comments though.
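Putting those corrections together, here is a Java sketch (Java only because the other answer uses it; the class name SecretExtractor and the exact pattern are assumptions, not the answerer's code). Each (?:<br */>\s*)* group makes the whole tag optional and repeatable, the capture is lazy, and there is no \s in front of Note:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecretExtractor {
    // Hypothetical pattern: <br */> instead of </br>, whole tags made optional,
    // a lazy capture group, and no \s required before "Note".
    private static final Pattern SECRET = Pattern.compile(
        "use API key secret:\\s*(?:<br */>\\s*)*(.*?)\\s*(?:<br */>\\s*)*Note");

    // Returns the captured secret, or null if the text does not match.
    static String extract(String text) {
        Matcher m = SECRET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "<br />For API key \"fnt56urkehicdvd\", use API key secret:\n"
            + "<br />\n<br /> dgdhyt2464t6ubvf\n<br />\n<br />Note that it's normal to";
        System.out.println(extract(text)); // dgdhyt2464t6ubvf
    }
}
```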

Removing <br> with REReplace in ColdFusion

I have some HTML line break tags in a text file that I would like to remove or replace with chr(10) using the ColdFusion REReplace function. I am trying:
<CFSET newtext = REreplace(text, "<BR>", chr(10), "ALL")>
but it doesn't seem to work. What am I doing wrong?
Can you use a plain <cfset newtext = replaceNoCase(text, '<br>', chr(10), 'ALL')>? REReplace is case-sensitive, so the pattern "<BR>" does not match lowercase <br> tags. Since you are not looking for anything that needs a complex matcher, a case-insensitive plain replace will probably work better for you.
I would recommend using a regex here in case there are XHTML tags like <br/> or <br />:
<cfset newtext = REReplaceNoCase(text, "<br[^>]*>", chr(10), "all") />
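The same case-insensitive replacement can be sketched in Java, where Pattern.CASE_INSENSITIVE plays the role of the NoCase variant and "\n" stands in for chr(10). The class name is hypothetical:

```java
import java.util.regex.Pattern;

public class BrToNewline {
    // Mirrors REReplaceNoCase(text, "<br[^>]*>", chr(10), "all"):
    // matches <br>, <BR>, <br/> and <br /> and replaces each with a line feed.
    private static final Pattern BR =
        Pattern.compile("<br[^>]*>", Pattern.CASE_INSENSITIVE);

    static String convert(String text) {
        return BR.matcher(text).replaceAll("\n");
    }

    public static void main(String[] args) {
        System.out.println(convert("one<BR>two<br/>three<br />four"));
    }
}
```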

Why does Regex Replace delete a quote?

I'm trying to sanitize HTML tags, e.g. turn
<input type="image" name="name" src="image.png">
into the correct empty-element form
<input type="image" name="name" src="image.png" />
with a slash at the end.
I'm using Eclipse's Find/Replace with regular expressions like this:
Find: <(input .*)[^/]>
Replace with: <\1 />
But I end up with
<input type="image" name="name" src="image.png />
I.e. the last quote is missing.
Is that an error in my regex, or a bug in Eclipse?
The term [^/] is consuming the quote. Move it inside the captured group:
Find: <(input .*[^/])>
Replace: <\1 />
The error is in your regex. The [^/] at the end matches the last non-/ character before the >, but it sits outside the group, so it is not captured. \1 refers to the first capturing group, which is (input .*). In short, you get everything inside the tag except the last character, which here is the closing quote. If you put the [^/] inside your group, your replacement should work.
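Eclipse's Find/Replace uses Java regex syntax, so, assuming it behaves like java.util.regex, the difference can be reproduced with String.replaceAll (which writes the group reference as $1 rather than \1). The class and method names here are hypothetical:

```java
public class ClosingSlash {
    // The question's pattern: [^/] sits outside the group, so it consumes
    // the closing quote without capturing it.
    static String broken(String tag) {
        return tag.replaceAll("<(input .*)[^/]>", "<$1 />");
    }

    // The fix from the first answer: [^/] moved inside the group.
    static String fixed(String tag) {
        return tag.replaceAll("<(input .*[^/])>", "<$1 />");
    }

    public static void main(String[] args) {
        String tag = "<input type=\"image\" name=\"name\" src=\"image.png\">";
        System.out.println(broken(tag)); // quote lost before " />"
        System.out.println(fixed(tag));  // quote kept
    }
}
```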
Also, .* can run past the end of the tag when a line contains more than one tag, and it has to backtrack from the end of the line. I would recommend bounding the match by the tag's own closing >:
<(input [^>]*[^/>])>
Here [^>]* cannot leave the tag, so a / inside an attribute value is fine and only one character has to be backtracked; the final [^/>] also leaves tags that already end in /> untouched.
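One way to keep each match inside a single tag is the pattern <(input [^>]*[^/>])>. This Java sketch (hypothetical class name, and again assuming Eclipse behaves like java.util.regex) checks three cases: a slash inside an attribute value, two tags on one line, and a tag that is already self-closed:

```java
public class ClosingSlashBounded {
    // [^>]* stops at the tag's own '>', so a match never spans two tags and
    // slashes inside attribute values are allowed; [^/>] rejects tags that
    // already end in "/>".
    static String close(String html) {
        return html.replaceAll("<(input [^>]*[^/>])>", "<$1 />");
    }

    public static void main(String[] args) {
        System.out.println(close("<input src=\"images/a.png\">"));
        System.out.println(close("<input a=\"1\"><input b=\"2\">"));
        System.out.println(close("<input src=\"a.png\" />")); // unchanged
    }
}
```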

Emacs query-replace-regexp with html

I was trying the replace-regexp command in Emacs, but I have no idea how to construct the right regexp. My file looks like this:
<img src="http://s.perros.com/content/perros_com/imagenes/thumbs/1lundehund2.jpg" />
<img src="http://s.perros.com/content/perros_com/imagenes/thumbs/1pleon2.jpg" />
And I want to replace them with:
<img src="" class="class-1lundehund2.jpg" />
<img src="" class="class-1pleon2.jpg" />
I was using this regexp with no success (Replaced 0 occurrences):
M-x replace-regexp
Replace regexp: src\=\"http\:\/\/s\.perros\.com\/content\/perros_com\/imagenes\/thumbs\/\([a-zA-Z0-9._-]+\)\"
Replace regexp with: src\=\"\" class\=\"class-\1\"
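But in re-builder mode, with the same regexp (replacing \([a-zA-Z0-9.-]+\) with \\([a-zA-Z0-9.-]+\\)), all the results are highlighted correctly. I have no idea what's happening; any tips?
I think you're escaping too many things. In particular, \= in an Emacs regexp is not a literal =: it matches the empty string only at point, which is why you got 0 replacements. re-builder reads the pattern as a Lisp string by default, where \= is just =, so it looked fine there. Use regexp = src="http://s\.perros\.com/content/perros_com/imagenes/thumbs/\([^"]*\)" and replacement = src="" class="class-\1"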
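For comparison only, here is the same substitution in Java regex syntax, where groups are plain parentheses instead of Emacs's \( \) and the replacement refers to the group as $1 instead of \1. It illustrates the [^"]* capture, not Emacs itself; the class name is hypothetical:

```java
public class ThumbClass {
    // Capture the file name after .../thumbs/ up to the closing quote with [^"]*.
    static String rewrite(String html) {
        return html.replaceAll(
            "src=\"http://s\\.perros\\.com/content/perros_com/imagenes/thumbs/([^\"]*)\"",
            "src=\"\" class=\"class-$1\"");
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "<img src=\"http://s.perros.com/content/perros_com/imagenes/thumbs/1pleon2.jpg\" />"));
    }
}
```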

How can I reduce three or more repetitions of some text to only two?

I have some text. I want to find out whether a certain part of it is repeated three or more times and, if so, replace that with only two repetitions.
For example, in the HTML code I'm looking at, there are 3 or more <br /> in a row and I want to change that to just 2 <br /> in a row.
How can I do that?
Is this what you want?
<?php
$s='<br /><br /> <br />';
$s=preg_replace('#(<br />\s*<br />)(?:\s*<br />)+#', "$1", $s);
print($s);
?>
If there are more than 2 consecutive <br /> tags (not counting whitespace), delete all but the first two.
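The same pattern carries over to Java unchanged apart from string escaping. This sketch (hypothetical class name) keeps the first two tags of each run, including any whitespace between them, and drops the rest of the run:

```java
public class ReduceBr {
    // Group 1 holds the first two tags of a run; the non-capturing group
    // swallows every further tag in that run.
    static String reduce(String s) {
        return s.replaceAll("(<br />\\s*<br />)(?:\\s*<br />)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(reduce("<br /><br /> <br />")); // <br /><br />
    }
}
```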
Edit: As noted by Tim below, my original answer was altogether incorrect.
The correct regex for replacement would look like:
$s = preg_replace('/(.)\1{2,}/', '$1$1', $s);
It means: match any character once, then the same character (\1) at least twice more ({2,}), and replace the entire matched set with the first character, but only 2 times.
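A Java sketch of that backreference pattern (hypothetical class name) makes its scope visible: runs of one repeated character shrink to two, while a run of <br /> tags is left alone because no single character repeats three times in a row:

```java
public class CollapseRuns {
    // (.) captures one character; \1{2,} requires at least two more copies.
    // Each run of three or more identical characters becomes two copies.
    static String collapse(String s) {
        return s.replaceAll("(.)\\1{2,}", "$1$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("aaaaab"));             // aab
        System.out.println(collapse("<br /><br /><br />")); // unchanged
    }
}
```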
However, the answers above are probably closer to what you want: this pattern only collapses runs of a single repeated character, so it will not shorten a run of <br /> tags.
For posterity, my original, incorrect regex looked like: /(.){3,}/ig
Not sure if it's possible to do this with a single regex. You probably need something like this:
$temp = preg_split('/<br \/>/', $input, 3);
if (count($temp) == 3) {
$temp[2] = str_replace('<br />', '', $temp[2]);
}
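$result = implode('<br />', $temp);
This keeps the first two <br /> tags in the whole input and removes every later one, so it only suits text with a single run of tags. By the way, it's not a good idea to use regular expressions for HTML parsing.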
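The split-with-limit idea translates directly to Java, since String.split also accepts a limit. Like the PHP version, this sketch (hypothetical class and method names) keeps the first two tags anywhere in the input, not two per run:

```java
public class KeepFirstTwo {
    // Split into at most three pieces around the first two <br /> tags,
    // strip every remaining tag from the last piece, then join back.
    static String keepFirstTwo(String input) {
        String[] temp = input.split("<br />", 3);
        if (temp.length == 3) {
            temp[2] = temp[2].replace("<br />", "");
        }
        return String.join("<br />", temp);
    }

    public static void main(String[] args) {
        System.out.println(keepFirstTwo("x<br /><br /><br /><br />y"));
    }
}
```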
If it is just <br /> you are trying to replace, and not multiple patterns, then this should work (the # delimiter avoids a clash with the / in <br />):
$s = preg_replace('#(<br />){3,}#', '<br /><br />', $s);
If you need to match several different strings then this won't work.