ColdFusion bug in Replace function

Here is my program:
<cfset test = 'a~b~~c~d~~~e'>
<cfset test2 = Replace(test, '~~','~X~','all')>
<cfoutput>
test #test#
<br> test2 #test2#
<br>wanted: a~b~X~c~d~X~X~e
</cfoutput>
The output I got:
test a~b~~c~d~~~e
test2 a~b~X~c~d~X~~e
wanted: a~b~X~c~d~X~X~e
So the output of test2 is wrong. This no doubt has to do with the inner workings of the Replace function, but I need it to work correctly.
Does anyone know of a workaround for this problem?

It's not a bug.
Replace() doesn't have any special "lookaround" capability. It walks the input string until it finds ~~, then jumps to the character after the matched text and continues searching from there. Matches therefore never overlap: the run ~~~ contains only one match, which gives only two matches in the whole string.
It sounds more like the requirement is to insert an "X" in between any two tildes "~~". A regex with a non-capturing look-ahead should accomplish that.
reReplace(test, '~(?=~)','~X','all')
Explanation
~ Find tilde
(?=~) .. followed by another tilde
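The same behaviour can be reproduced outside ColdFusion. The following is only a sketch in Python (not ColdFusion code): it assumes Python's str.replace and re.sub behave like Replace() and reReplace() for this input, which holds for the non-overlapping scan and the lookahead used here.

```python
import re

test = "a~b~~c~d~~~e"

# Plain replacement consumes each "~~" it finds, so matches never overlap
# and the third tilde in "~~~" is left without a partner.
plain = test.replace("~~", "~X~")

# The lookahead checks for a following tilde without consuming it,
# so that tilde can itself start the next match.
with_lookahead = re.sub(r"~(?=~)", "~X", test)

print(plain)           # a~b~X~c~d~X~~e
print(with_lookahead)  # a~b~X~c~d~X~X~e
```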

Perl script running slowly: takes a minute to apply a simple regex replacement

My Perl script takes about a minute to run a regex replacement on the following:
$str = '<![CDATA[$..$]]>';
I have a file that contains LaTeX/TeX code in CDATA sections such as <![CDATA[$..$]]> (at least 1000 occurrences). I need to change each one into comment tags plus a processing instruction, like <!--<![CDATA[--><?processingInstruction $..$?><!--]]>-->.
$SqrBrLoopMany = qw/((?:[^\[\]]*(?:{(?:[^\[\]]*(?:{[^\[\]]*})*[^\[\]]*)*})*[^\[\]]*)*)/; # This is for using `\[ <whatever> \]` Square bracket.
$str=~s/(\<\!\[CDATA\[)$SqrBrLoopMany(\]\]>)/<\!\-\-$1\-\-><\?processingInstruction $2\?><\!\-\-$3\-\->/sg;
This is the regex I am using, but the script takes a minute to perform the replacement.
Output should be:
<!--<![CDATA[--><?processingInstruction $..$?><!--]]>-->
I would appreciate it if someone could help with this.
Simplest possible:
s/<!\[CDATA\[(.*?)]]>/<!--<![CDATA[--><?processingInstruction $1?><!--]]>-->/sg
CDATA can not contain any nested structures, so the pattern just looks for the starting <![CDATA[ and closest ending ]]>, and matches everything in between.
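As an illustration of the lazy pattern only, here is a sketch in Python rather than Perl. The helper name wrap_cdata and the sample input are made up for the example; re.DOTALL plays the role of the /s flag.

```python
import re

# Lazy match from the opening <![CDATA[ to the closest ]]>.
CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)

def wrap_cdata(text: str) -> str:
    """Wrap each CDATA section in comments and a processing instruction."""
    # A function is used as the replacement so that backslashes in the
    # captured TeX code are never interpreted as replacement escapes.
    return CDATA.sub(
        lambda m: "<!--<![CDATA[--><?processingInstruction "
        + m.group(1)
        + "?><!--]]>-->",
        text,
    )

sample = "<a><![CDATA[$..$]]></a>\n<b><![CDATA[$x\n+y$]]></b>"
converted = wrap_cdata(sample)
print(converted)
```

Because the pattern contains no nested quantifiers, each section is scanned once from its start to its closing ]]>.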
The reason your pattern is running slowly is that you are matching non-brackets ([^\[\]]) in between braces { ... }. If the CDATA section contains a [ or ] that is not part of the ending ]]>, the match will fail and the engine will backtrack through each of the [^\[\]]* in turn, leading to quintic (O(x^5)) execution time.
If square brackets are required to be balanced for it to match, you could do
s/<!\[CDATA\[(([^][]|\[(?2)*?])*?)]]>/<!--<![CDATA[--><?processingInstruction $1?><!--]]>-->/sg
The (?2) will recursively match the second subpattern/capture group again. This should work in both Perl and PCRE based regex engines.
Demo: https://regex101.com/r/LmClY9/2
Thanks to Markus Jarderot for the answer. This is what I ended up with:
$str=~s/(\<\!\[CDATA\[)([^\]\]>]*)(\]\]>)/<\!\-\-$1\-\-><\?xmltex $2\?><\!\-\-$3\-\->/sg;
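Use <!\[CDATA\[(.*?)]]> instead of <!\[CDATA\[([^\]\]>]*)]]>. The character class [^\]\]>] excludes every single ] and >, so it stops at the first such character in the content rather than at the ]]> terminator.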

Regular expression remove full stop from bullet text

I'm really struggling to work out how to remove the full stop from the following:
• this is a test bullet.<br>
• this is a test bullet 2.<br>
• this is a test bullet 3.<br>
It needs to only remove the full stops from the bullets as there are other paragraphs containing full stops and break returns.
Any help with this please?
The output would need to look like:
• this is a test bullet<br>
• this is a test bullet 2<br>
• this is a test bullet 3<br>
How about, given that we should be able to use the bullet character, something simple like:
Find: (•.*)\.(.*)
Replace with: $1$2
You could just use the replaceAll method on the String object, like so:
String values = "• this is a test bullet.<br>\n" +
"• this is a test bullet 2.<br>\n" +
"• this is a test bullet 3.<br>";
values = values.replaceAll("(?i)\\.(?=<br>)", "");
// result:
// • this is a test bullet<br>
// • this is a test bullet 2<br>
// • this is a test bullet 3<br>
It will remove any full stop that is immediately followed by a <br> tag, and is case insensitive.
Explanation of regex:
Make pattern case insensitive:
(?i)
Find full stop (.):
\\.
Forward look ahead for <br> tag:
(?=<br>)
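For comparison, a sketch of the same lookahead in Python (an illustration, not part of the Java answer). The flag re.IGNORECASE stands in for the inline (?i), and the upper-case <BR> in the sample is added here to show the case-insensitive match.

```python
import re

values = (
    "• this is a test bullet.<br>\n"
    "• this is a test bullet 2.<BR>\n"
    "• this is a test bullet 3.<br>"
)

# Remove a full stop only when a <br> tag follows it; the lookahead
# leaves the tag itself untouched.
cleaned = re.sub(r"\.(?=<br>)", "", values, flags=re.IGNORECASE)
print(cleaned)
```

Note that this removes the full stop before any <br>, not only on bullet lines.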
Regex:
^(\s*•.*)\.$
Replacement string:
$1
OR
Regex:
^\s*•.*\K\.$
Replacement string:
Empty string
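The patterns above expect the full stop to be the last character on the line. If the line ends with <br> as in the question, one possible variation is to combine the bullet anchor with a lookahead for the tag. This is a sketch in Python under that assumption; the pattern and sample text are illustrative, not taken from the answers.

```python
import re

text = (
    "An ordinary paragraph.<br>\n"
    "• this is a test bullet.<br>\n"
    "• this is a test bullet 2.<br>\n"
    "Another paragraph.<br>"
)

# ^ and $ work per line with re.MULTILINE. Group 1 keeps everything from
# the bullet up to the full stop that sits directly before the final <br>.
BULLET_STOP = re.compile(r"^(\s*•.*)\.(?=<br>$)", re.MULTILINE)

cleaned = BULLET_STOP.sub(r"\1", text)
print(cleaned)
```

Only lines that start with a bullet are changed; the two paragraph lines keep their full stops.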

ColdFusion regex: lazy matching doesn't work

Trying to remove some code by regex where it follows the pattern
<cfif CheckMember.RecordCount gt 0>[SOME TEXT HERE ALL I KNOW IS IT DOESN'T CONTAIN A </cfif>]</cfif>
So I need to find the first occurrence of </cfif> after that first bit. The problem is that lazy matching is not working; it's just getting everything. Is there any way to get everything between some text and the first occurrence of a word?
I was hoping <cfif CheckMember.RecordCount gt 0>.+?</cfif> would work like it does in other engines.
There's no reason what you wrote shouldn't work (aside from . not matching newlines without the appropriate flag set), but in general lazy matching is not the most efficient way to do things, and using a pattern like this is likely to be better:
<cfif CheckMember\.RecordCount gt 0>(?:[^<]++|<(?!/cfif>))*</cfif>
The key part being:
(?:
[^<]++
|
<(?!/cfif>)
)*
i.e. not an angle bracket, or an angle-bracket that isn't starting a </cfif> sequence.
(Depending on what regex engine you are using, you may need to change the possessive ++ to a simple greedy +)
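To make the comparison concrete, here is a sketch in Python rather than ColdFusion. It assumes the block to remove spans several lines. The inner part is written as a single [^<] instead of [^<]++, a deliberate simplification so the example runs on Python versions without possessive quantifiers and avoids a nested greedy quantifier.

```python
import re

source = (
    "<cfif CheckMember.RecordCount gt 0>\n"
    "  <p>Welcome back</p>\n"
    "</cfif>\n"
    "<cfif Other.RecordCount gt 0>keep me</cfif>"
)

OPEN = r"<cfif CheckMember\.RecordCount gt 0>"

# Lazy match without a flag: "." cannot cross the newline, so nothing matches.
lazy_no_flag = re.sub(OPEN + r".+?</cfif>", "", source)

# Lazy match with DOTALL: stops at the first </cfif>.
lazy_dotall = re.sub(OPEN + r".+?</cfif>", "", source, flags=re.DOTALL)

# Negated form: any non-"<" character, or a "<" that does not start </cfif>.
tempered = re.sub(OPEN + r"(?:[^<]|<(?!/cfif>))*</cfif>", "", source)

print(lazy_no_flag == source)   # True
print(lazy_dotall == tempered)  # True
```

The negated form needs no flag because [^<] matches newlines as well.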
This regex should work for what you are looking to do:
<cfif CheckMember.RecordCount gt 0>.*?</cfif>

Regex matching in ColdFusion OR condition

I am attempting to write a CF component that will parse wikiCreole text. I am having trouble getting the correct matches with some of my regular expressions, though. I feel like if I can just get my head around the first one, the rest will just click. Here is an example:
The following is sample input:
You can make things **bold** or //italic// or **//both//** or //**both**//.
Character formatting extends across line breaks: **bold,
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
My first attempt was:
<cfset out = REreplace(out, "\*\*(.*?)\*\*", "<strong>\1</strong>", "all") />
Then I realized that it would not match where the ** is not given, and it should end where there are two carriage returns.
So I tried this:
<cfset out = REreplace(out, "\*\*(.*?)[(\*\*)|(\r\n\r\n)]", "<strong>\1</strong>", "all") />
and it is close but for some reason it gives you this:
You can make things <strong>bold</strong>* or //italic// or <strong>//both//</strong>* or //<strong>both</strong>*//.
Character formatting extends across line breaks: <strong>bold,</strong>
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
Any ideas?
PS: If anyone has any suggestions for better tags, or a better title for this post I am all ears.
The [...] represents a character class, so this:
[(\*\*)|(\r\n\r\n)]
Is effectively the same as this:
[()*|\r\n]
i.e. it matches a single character (one of "(", ")", "*", "|", CR or LF). Inside a class the parentheses don't group and the "|" isn't an alternation.
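This can be checked quickly. The sketch below uses Python's re module as a stand-in, on the assumption that it treats this character class the same way the ColdFusion engine does.

```python
import re

# The bracketed expression from the question: a class, not a group.
broken = re.compile(r"[(\*\*)|(\r\n\r\n)]")

# Every one of these single characters is matched by the class.
candidates = ["*", "|", "(", ")", "\r", "\n"]
single_chars = [ch for ch in candidates if broken.fullmatch(ch)]

# The two-character sequence "**" is not matched as a whole.
two_stars = broken.fullmatch("**")

# A non-capturing group with alternation matches whole sequences.
fixed = re.compile(r"(?:\*\*|\r\n\r\n)")

print(single_chars == candidates)  # True
print(two_stars)                   # None
print(bool(fixed.fullmatch("**"))) # True
```

This is why the original attempt consumed only one star and left the second one behind in the output.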
Another problem is that you replace the double linefeed. Even if your match succeeded you would end up merging paragraphs. You need to either restore it or not consume it in the first place. I'd use a positive lookahead to do the latter.
In Perl I'd write it this way:
$string =~ s/\*\*(.*?)(?:\*\*|(?=\n\n))/<strong>$1<\/strong>/sg;
Taking a wild guess, the ColdFusion probably looks like this:
REreplace(out, "\*\*(.*?)(?:\*\*|(?=\r\n\r\n))", "<strong>\1</strong>", "all")
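Here is a sketch of the same idea in Python, for illustration only. Two details are assumptions added here: re.DOTALL so that "." crosses single line breaks (the role of /s in the Perl version), and \r?\n so that both Windows and Unix line endings count as a paragraph break.

```python
import re

# Bold runs end at the closing ** or just before a blank line.
# The lookahead does not consume the paragraph break.
BOLD = re.compile(r"\*\*(.*?)(?:\*\*|(?=\r?\n\r?\n))", re.DOTALL)

text = "make **bold** text: **bold,\nstill bold.\n\nNot bold."
html = BOLD.sub(r"<strong>\1</strong>", text)
print(html)
```

The paragraph break survives in the output because it is only looked at, never replaced.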
You really should change your
(.*?)
to something like
[^*]*?
to match any character except the *. I don't know if that is the problem, but it could be that the any-character . is eating one of your stars. It is also a generally accepted "best practice", when trying to balance matching characters like the double star or HTML start/end tags, to explicitly exclude them from your match set for the inner text.
*Disclaimer, I didn't test this in ColdFusion for the nuances of the regex engine - but the idea should hold true.
I know this is an older question, but in response to where Ryan Guill said "I tried the $1 but it put a literal $1 in there instead of the match": for ColdFusion you should use \1 instead of $1.
I always use a regex web page. It seems like I start from scratch every time I use regex.
Try using '$1' instead of \1 for this one - the replace is slightly different... but I think the pattern is what you need to get working.
Getting closer with this:
\*\*(.*?)\*\*|//(.*?)//
The tricky part is the //** or **//
OK, first checking for **//both//**, then //**both**//, then **bold**, then //italic//:
\*\*//(.*?)//\*\*|//\*\*(.*?)\*\*//|\*\*(.*?)\*\*|//(.*?)//
I find this app immensely helpful when I'm doing anything with regex:
http://www.gskinner.com/RegExr/desktop/
Still doesn't help with your actual issue, but could be useful going forward.

Regex greedy issue

I'm sure this one is easy, but I've tried a ton of variations and still can't match what I need. The pattern is being too greedy and I can't get it to stop.
Given the text:
test=this=that=more text follows
I want to just select:
test=
I've tried the following regex
(\S+)=(\S.*)
(\S+)?=
[^=]{1}
...
Thanks all.
here:
// matches "test=, test"
(\S+?)=
or
// matches "test=, test" too
(\S[^=]+)=
You should consider using the second version over the first. Given your string "test=this=that=more text follows", version 1 matches one character, checks whether the next character is =, and if it is not, extends the match by one character and checks again, repeating this until it reaches test=.
Version 2 consumes the run of non-= characters in one pass, matches test=, and then stops. You can see the efficiency gains in larger searches like multi-line or whole-document matches.
You probably want something like
^(\S+?=)
The caret ^ anchors the regex to the beginning of the string. The ? after the + makes the + non-greedy.
You might be looking for the lazy quantifiers *?, +?, ?? and {n,m}?
You should be able to use this:
(\S+?)=(\S.*)
Lazy quantifiers work, but they also can be a performance hit because of backtracking.
Consider that what you really want is "a bunch of non-equals, an equals, and a bunch more non-equals."
([^=]+)=([^=]+)
Your example [^=]{1} only matches a single non-equals character.
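The difference between the attempts is easy to see by running them side by side. This is a sketch using Python's re module; the patterns are the ones quoted in the answers above, and the variable names are made up for the example.

```python
import re

text = "test=this=that=more text follows"

# Greedy: \S+ grabs as much as it can, then backs off to the last usable "=".
greedy = re.search(r"(\S+)=(\S.*)", text)

# Lazy: \S+? grows one character at a time and stops at the first "=".
lazy = re.search(r"(\S+?)=", text)

# Negated class: a run of non-"=" characters followed by "=".
negated = re.search(r"([^=]+)=", text)

print(greedy.group(1))   # test=this=that
print(lazy.group(0))     # test=
print(negated.group(0))  # test=
```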
If you want only "test=", I think that a simple:
^(\w+=)
should be fine if you are sure that the string "test=" will always start the line.
The real problem is when the string is like this:
this=that= more test= text follows
If you use the regex above, the result is "this=", and if you modify it with a repetition quantifier at the end, like this:
^(\w+=)*
you get the much longer "this=that=", so I could only imagine the trivial:
[th\w+=]*test=
Bye.