coldfusion regex: lazy quantifier doesn't work

Trying to remove some code by regex where it follows the pattern
<cfif CheckMember.RecordCount gt 0>[SOME TEXT HERE; ALL I KNOW IS IT DOESN'T CONTAIN A </cfif>]</cfif>
So I need to find the first occurrence of </cfif> after that first bit. The problem is that lazy matching is not working; it's just getting everything. Is there any way to get everything between some text and the first occurrence of a word?
I was hoping <cfif CheckMember.RecordCount gt 0>.+?</cfif> would work like it does in other engines.

There's no reason what you wrote shouldn't work (aside from . not matching newlines without the appropriate flag set), but in general lazy matching is not the most efficient way to do things, and using a pattern like this is likely to be better:
<cfif CheckMember\.RecordCount gt 0>(?:[^<]++|<(?!/cfif>))*</cfif>
The key part being:
(?:
[^<]++
|
<(?!/cfif>)
)*
i.e. a run of characters other than an opening angle bracket, or an opening angle bracket that isn't the start of a </cfif> sequence.
(Depending on what regex engine you are using, you may need to change the possessive ++ to a simple greedy +)
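A minimal sketch that uses Python's `re` as a stand-in for ColdFusion's engine (an assumption: flag names and possessive-quantifier support differ between engines). It runs the question's lazy pattern with and without a dot-matches-newline flag next to the tempered pattern. `[^<]` is matched one character at a time rather than as `[^<]++`, which keeps the sketch portable and avoids nested-quantifier backtracking. The sample markup is made up for illustration.

```python
import re

# Lazy pattern from the question. Without re.DOTALL, "." cannot cross a
# newline, so a block that spans several lines is not matched at all.
LAZY_NO_DOTALL = re.compile(r"<cfif CheckMember\.RecordCount gt 0>.+?</cfif>")
LAZY = re.compile(r"<cfif CheckMember\.RecordCount gt 0>.+?</cfif>", re.DOTALL)

# Tempered pattern from the answer: any non-"<" character, or a "<" that does
# not start "</cfif>". [^<] already matches newlines, so no flag is needed.
TEMPERED = re.compile(r"<cfif CheckMember\.RecordCount gt 0>(?:[^<]|<(?!/cfif>))*</cfif>")

SAMPLE = (
    "<cfif CheckMember.RecordCount gt 0>\n"
    "  <p>Welcome back</p>\n"
    "</cfif>\n"
    "<p>keep me</p>\n"
    "<cfif IsDefined('x')>x</cfif>"
)

for name, pattern in [("no DOTALL", LAZY_NO_DOTALL), ("lazy", LAZY), ("tempered", TEMPERED)]:
    print(name, repr(pattern.sub("", SAMPLE)))
```

Both working patterns stop at the first </cfif>, so the second <cfif> block is left alone.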

This regex should work for what you are looking to do
<cfif CheckMember.RecordCount gt 0>.*?</cfif>

Related

perl regex substitution if NOT this string NOR that character

I'm using Perl to highlight errors in my browser as I scan through pages of text. At this point, I want to ensure the text Seq is preceded by a Maltese cross and a space (✠ ); otherwise, highlight it. I also want to ignore n>Seq.
PS: If it's easier, I can ignore any > instead, because it will always be n>. In fact, it will always be </span>, whichever is easiest to check for.
Example phrase: ✠ Seq. S. Evangélii sec. Joánnem. — In illo témpore
I'm trying to replace xySeq if xy is NOT a Maltese cross followed by a space (✠ ), AND xy is NOT the letter n followed by a greater-than sign (n>).
In other words, I don’t want to substitute
✠ Seq
n>Seq
>Seq
</span>Seq
but I do want to replace things like
✠Seq
* Seq
a✠Seq
>aSeq
The following would work if I was just checking for single characters like ✠ or >
my $span_beg = q(<span class='bcy'>); # HTML markup for highlighting
my $span_end = q(</span>);
$phr =~ s/([^✠>]Seq)/$span_beg$1$span_end/g;
but [^✠ >]Seq will naturally treat ✠ and the space as separate single characters rather than as one two-character sequence.
I even tried [^(✠\s)>]Seq and a variable [^$var>], but these didn't work.
I played with (?<!✠\s)Seq but didn't know how to incorporate >, or whether it was even the right way to go.
I hope this is possible. Thanks, all.
Guy
If you always want to tag Seq and exactly two characters before it, a couple of look-behinds might be enough:
s{..(?<!✠\s)(?<!n>)Seq}{$span_beg$&$span_end}g;
Or, with look-ahead:
s{(?!✠\s)(?!n>)..Seq}{$span_beg$&$span_end}g;
This should be more efficient than performing lookaround at every position:
# Doesn't include preceding characters in the span.
s{(✠ |>)?Seq}{ $1 ? $& : "$span_beg$&$span_end" }eg
# Includes two preceding characters in the span.
s{(?:(✠ |>)|..)Seq}{ $1 ? $& : "$span_beg$&$span_end" }seg
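For checking the expected results outside Perl, here is a rough Python port (assumption: Python's `re` stands in for Perl's engine; `highlight` and `highlight_lookbehind` are hypothetical names). It follows the last answer's idea of capturing an allowed prefix and deciding in the replacement, plus a look-behind variant that wraps only Seq:

```python
import re

SPAN_BEG = "<span class='bcy'>"  # HTML markup for highlighting (from the question)
SPAN_END = "</span>"

# Port of the last Perl answer: optionally capture an allowed prefix ("✠ " or ">")
# in front of Seq. If the prefix matched, keep the text; otherwise wrap Seq.
ALLOWED_OR_SEQ = re.compile(r"(✠ |>)?Seq")

def highlight(phrase: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:  # preceded by "✠ " or ">": leave untouched
            return m.group(0)
        return SPAN_BEG + m.group(0) + SPAN_END
    return ALLOWED_OR_SEQ.sub(repl, phrase)

# Look-behind variant. Python requires fixed-width look-behinds, so each
# forbidden prefix gets its own assertion; ">" also covers n> and </span>.
LOOKBEHIND = re.compile(r"(?<!✠ )(?<!>)Seq")

def highlight_lookbehind(phrase: str) -> str:
    return LOOKBEHIND.sub(lambda m: SPAN_BEG + m.group(0) + SPAN_END, phrase)

for s in ["✠ Seq", "n>Seq", ">Seq", "</span>Seq", "✠Seq", "* Seq", "a✠Seq", ">aSeq"]:
    print(repr(s), "->", repr(highlight(s)))
```

The first version scans once and decides per match; the look-behind version is shorter but checks both assertions at every candidate Seq.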

coldfusion bug in Replace function

Here is my program:
<cfset test = 'a~b~~c~d~~~e'>
<cfset test2 = Replace(test, '~~','~X~','all')>
<cfoutput>
test #test#
<br> test2 #test2#
<br>wanted: a~b~X~c~d~X~X~e
</cfoutput>
The output I got:
test a~b~~c~d~~~e
test2 a~b~X~c~d~X~~e
wanted: a~b~X~c~d~X~X~e
So the output of test2 is wrong. This no doubt has to do with the inner workings of the Replace function, but I need it to work correctly.
Does anyone know of a workaround for this problem?
It's not a bug.
Replace() doesn't have any special "lookaround" capability. It just walks the input string until it finds ~~, then resumes searching at the character after the matched text. In ~~~ the first match consumes two tildes, so the third tilde alone can't form another ~~, resulting in only two matches.
It sounds more like the requirement is to insert an "X" between any two adjacent tildes "~~". A regex with a look-ahead, which checks the next character without consuming it, should accomplish that.
reReplace(test, '~(?=~)','~X','all')
Explanation
~ Find tilde
(?=~) .. followed by another tilde
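The same effect can be reproduced in Python, assuming `str.replace` and `re.sub` stand in for `Replace()` and `reReplace(..., 'all')` (both scan for non-overlapping matches from left to right):

```python
import re

test = "a~b~~c~d~~~e"

# str.replace, like ColdFusion's Replace(), resumes after each match, so the
# overlapping "~~" inside "~~~" is only found once.
plain = test.replace("~~", "~X~")

# Zero-width look-ahead: match a "~" only if another "~" follows, without
# consuming that second "~", so every adjacent pair is handled.
lookahead = re.sub(r"~(?=~)", "~X", test)

print(plain)      # a~b~X~c~d~X~~e
print(lookahead)  # a~b~X~c~d~X~X~e
```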

Regex stop at the first double quote

This is my string:
'<input type="hidden" name="mode" value="<?=$modeValue?>"/>'
I am trying to extract 1) name="mode" and 2) value="<?=$modeValue?>"
from it.
I ran two regexes to find them:
/name\s*=\s*['\"].*['\"]/ for name="mode" and /value\s*=\s*['\"].*['\"]/ for value="<?=$modeValue?>"
But the first regex fails to get just name="mode".
Instead I get name="mode" value="$modeValue".
However, I succeeded in getting value="<?=$modeValue?>".
What is wrong with my regex for name="mode"?
I think I have to make the regex stop at the first " it encounters. Does anyone know how to do this? I am running out of time...
A little change and your regex is good to go:
name\s*=\s*['\"].*?['\"]
(the only change is the ? after .*)
Why your regex was not working the way you wanted:
Quantifiers are greedy by default, so .* matches as many characters as it can and then gives back only enough for the final ['\"] to match, which means it ends at the last quote in the string.
Adding ? makes the quantifier lazy, so it matches as few characters as possible and stops at the first quote.
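A short Python sketch of the difference, with Python's `re` used as a stand-in for the asker's engine. The negated class and the back-referenced quote are alternatives not mentioned in the answers, shown only for comparison:

```python
import re

html = '<input type="hidden" name="mode" value="<?=$modeValue?>"/>'

# Greedy: .* runs to the end, then backs off only to the LAST quote.
greedy = re.search(r"name\s*=\s*['\"].*['\"]", html).group(0)

# Lazy: .*? stops at the FIRST quote that lets the rest of the pattern match.
lazy = re.search(r"name\s*=\s*['\"].*?['\"]", html).group(0)

# Alternative: a negated class cannot run past a quote at all.
negated = re.search(r'name\s*=\s*"[^"]*"', html).group(0)

# Alternative: capture the opening quote and require the same one to close.
quoted = re.search(r"name\s*=\s*(['\"])(.*?)\1", html).group(2)

print(greedy)   # name="mode" value="<?=$modeValue?>"
print(lazy)     # name="mode"
print(negated)  # name="mode"
print(quoted)   # mode
```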
In case you want to combine both regexes into one, matching the two attributes in either order:
(name=\".*?\")\s*(value=\".*?\")|(value=\".*?\")\s*(name=\".*?\")
You can create capturing groups to match both,
(name=\".*?\")\s*(value=\".*?\")
Demo:
https://regex101.com/r/z9dDE2/1
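With the either-order pattern, only one pair of groups takes part in a given match, so the caller has to check which pair it was. A sketch in Python, where `name_and_value` is a hypothetical helper and the escaped quotes are dropped because Python does not need them:

```python
import re

# Either order: name then value (groups 1, 2) or value then name (groups 3, 4).
BOTH = re.compile(r'(name=".*?")\s*(value=".*?")|(value=".*?")\s*(name=".*?")')

def name_and_value(tag):
    """Return (name_attr, value_attr) regardless of attribute order, or None."""
    m = BOTH.search(tag)
    if m is None:
        return None
    if m.group(1) is not None:  # first alternative matched
        return m.group(1), m.group(2)
    return m.group(4), m.group(3)  # second alternative: swap back to (name, value)

html = '<input type="hidden" name="mode" value="<?=$modeValue?>"/>'
print(name_and_value(html))
print(name_and_value('<input value="1" name="n"/>'))
```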

How to match many words and use the match in a substitution

I'm using the VS2012 replace feature to replace some text in the editor.
I want to replace >OneWord</Label> with Content="OneWord" />.
And I want to replace >More than one word</Label> with Content="More than one word" />.
At the moment I have "Find what" filled with
>(\w+|[^\S\r\n]+)</Label>
and "Replace with" filled with
Content="$1" />
This works for the first case, where only one word is used, but not for the second case.
If I use >(\w+|[^\S\r\n]+)+</Label> I get Content="word" /> for the second case.
How can I define my regular expressions to work in both cases?
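At the moment you are matching either a sequence of word characters \w+ or a sequence of whitespace [^\S\r\n]+, but never a mix of the two. In your second attempt the + repeats the group, but a repeated capturing group only keeps the text from its last iteration, so $1 ends up as just "word".
To solve your problem, move the quantifier inside and wrap the repetition in another group: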
>((\w|[^\S\r\n])+)</Label>
Your result is still in $1.
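Python's `re` is not the .NET engine behind the VS2012 dialog, but it treats repeated capturing groups the same way, so a quick sketch (replacement written as `\1` instead of `$1`) shows the three patterns side by side:

```python
import re

REPLACEMENT = r'Content="\1" />'  # Python's re.sub spells $1 as \1

# First attempt: either word characters OR horizontal whitespace, not both.
ONE_KIND = re.compile(r">(\w+|[^\S\r\n]+)</Label>")
# Second attempt: the group repeats, but group 1 keeps only its last pass.
REPEATED_GROUP = re.compile(r">(\w+|[^\S\r\n]+)+</Label>")
# Answer: quantifier inside an outer group, so group 1 spans the whole run.
OUTER_GROUP = re.compile(r">((\w|[^\S\r\n])+)</Label>")

for text in [">OneWord</Label>", ">More than one word</Label>"]:
    for pattern in (ONE_KIND, REPEATED_GROUP, OUTER_GROUP):
        print(pattern.pattern, "|", pattern.sub(REPLACEMENT, text))
```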

Regex matching in ColdFusion OR condition

I am attempting to write a CF component that will parse wikiCreole text. I am having trouble getting the correct matches with some of my regular expressions, though. I feel like if I can just get my head around the first one, the rest will just click. Here is an example:
The following is sample input:
You can make things **bold** or //italic// or **//both//** or //**both**//.
Character formatting extends across line breaks: **bold,
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
My first attempt was:
<cfset out = REreplace(out, "\*\*(.*?)\*\*", "<strong>\1</strong>", "all") />
Then I realized that it would not match where the closing ** is missing, and in that case the bold should end where there are two line breaks in a row (the end of the paragraph).
So I tried this:
<cfset out = REreplace(out, "\*\*(.*?)[(\*\*)|(\r\n\r\n)]", "<strong>\1</strong>", "all") />
It is close, but for some reason it gives me this:
You can make things <strong>bold</strong>* or //italic// or <strong>//both//</strong>* or //<strong>both</strong>*//.
Character formatting extends across line breaks: <strong>bold,</strong>
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
Any ideas?
PS: If anyone has any suggestions for better tags, or a better title for this post I am all ears.
The [...] represents a character class, so this:
[(\*\*)|(\r\n\r\n)]
Is effectively the same as this:
[()*|\r\n]
i.e. it matches a single character such as one "*", and the "|" and parentheses are literal members of the set rather than alternation and grouping.
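The character-class pitfall is easy to reproduce in Python's `re`, assuming it parses `[...]` the same way as ColdFusion's engine does here (this is standard across Perl-style flavours):

```python
import re

# Inside [...] the parentheses, "|" and the escaped stars are just members of
# one set, so this class matches exactly ONE of: ( * ) | CR LF
CLASS = re.compile(r"[(\*\*)|(\r\n\r\n)]")

# The asker's pattern: the terminator is a single character from that set.
BROKEN = re.compile(r"\*\*(.*?)[(\*\*)|(\r\n\r\n)]")

print(CLASS.findall("a*b|c(d)e"))  # ['*', '|', '(', ')']
# Only one closing star is consumed, leaving the second one behind.
print(BROKEN.sub(r"<strong>\1</strong>", "**bold** or **//both//**"))
```

The stray trailing "*" is the same leftover star seen in the question's output.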
Another problem is that you replace the double linefeed. Even if your match succeeded you would end up merging paragraphs. You need to either restore it or not consume it in the first place. I'd use a positive lookahead to do the latter.
In Perl I'd write it this way:
$string =~ s/\*\*(.*?)(?:\*\*|(?=\n\n))/<strong>$1<\/strong>/sg;
Taking a wild guess, the ColdFusion probably looks like this:
REreplace(out, "\*\*(.*?)(?:\*\*|(?=\r\n\r\n))", "<strong>\1</strong>", "all")
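A Python port of the Perl version, for testing against the sample input. Assumptions: `re.DOTALL` plays the role of Perl's /s flag, `\r?\n` is an addition so both line-ending styles count as a blank line, and a blank line is assumed between the sample's two paragraphs since the question's copy appears to have lost it:

```python
import re

# Close on "**", or stop just before a blank line. The look-ahead does not
# consume the paragraph break, so it survives the substitution.
BOLD = re.compile(r"\*\*(.*?)(?:\*\*|(?=\r?\n\r?\n))", re.DOTALL)

def bold(text: str) -> str:
    return BOLD.sub(r"<strong>\1</strong>", text)

sample = (
    "You can make things **bold** or //italic// or **//both//** or //**both**//.\n"
    "Character formatting extends across line breaks: **bold,\n"
    "this is still bold. This line deliberately does not end in star-star.\n"
    "\n"
    "Not bold. Character formatting does not cross paragraph boundaries."
)
print(bold(sample))
```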
You really should change your
(.*?)
to something like
([^*]*?)
to match any character except *. I don't know if that is the problem, but it could be that the any-character . is eating one of your stars. It is also a generally accepted "best practice", when balancing delimiters like the double star or HTML start/end tags, to explicitly exclude them from the match set for the inner text.
*Disclaimer: I didn't test this in ColdFusion for the nuances of its regex engine, but the idea should hold true.
I know this is an older question, but in response to Ryan Guill's comment ("I tried the $1 but it put a literal $1 in there instead of the match"): for ColdFusion you should use \1 instead of $1.
I always use a regex web page; it seems like I start from scratch every time I use regex.
Try using '$1' instead of \1 for this one - the replace is slightly different... but I think the pattern is what you need to get working.
Getting closer with this:
\*\*(.*?)\*\*|//(.*?)//
The tricky part is the //** or **// combinations.
OK, first check for **//bold//**, then //**bold**//, then **bold**, then //bold//:
\*\*//(.*?)//\*\*|//\*\*(.*?)\*\*//|\*\*(.*?)\*\*|//(.*?)//
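A Python sketch (the `classify` helper is hypothetical) of why the combined forms must come first: at each position the alternation takes the first alternative that matches, so a two-alternative pattern reads **//both//** as bold with the slashes left inside:

```python
import re

# Combined forms first, then plain bold, then plain italic.
PATTERN = re.compile(r"\*\*//(.*?)//\*\*|//\*\*(.*?)\*\*//|\*\*(.*?)\*\*|//(.*?)//")
LABELS = {1: "bold+italic", 2: "italic+bold", 3: "bold", 4: "italic"}

# Two-alternative pattern without the combined forms, for comparison.
SIMPLE = re.compile(r"\*\*(.*?)\*\*|//(.*?)//")

def classify(text):
    """Return (kind, inner text) for each markup span found in text."""
    # Exactly one group takes part in each match, so lastindex identifies it.
    return [(LABELS[m.lastindex], m.group(m.lastindex)) for m in PATTERN.finditer(text)]

line = "You can make things **bold** or //italic// or **//both//** or //**both**//."
print(classify(line))
print(SIMPLE.search("**//both//**").group(1))  # //both// (italic markers kept inside)
```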
I find this app immensely helpful when I'm doing anything with regex:
http://www.gskinner.com/RegExr/desktop/
Still doesn't help with your actual issue, but could be useful going forward.