Complex regular expression ... AND OR, negation - regex

I would like to search files by their content in Total Commander so I want to create a regex, but I cannot find any manual where it would really be explained. My situation is that I need something like this:
fileContains("<html>") && fileContains("{myVariable1}") && fileNotContains("<script>")
I can write roughly this:
(<html>)+
({myVariable1})+
(<script>){0} ... but this does not work for me
And I cannot put it all together. Any ideas, please? Or do you have a link to an excellent regex explanation?

try this regex:
(?=.*\{myVariable1\})(?=.*<html>)(?!.*<script>)
It's just three lookaheads in a row, one of which is a negative lookahead. Note that the "single line" modifier is needed so that the dot also matches newlines.
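As a rough illustration of how the three lookaheads behave in an engine that supports them, here is a minimal sketch using Python's re module (not Total Commander's engine). The helper name file_matches and the sample strings are made up for the example; re.DOTALL plays the role of the "single line" modifier.

```python
import re

# Three lookaheads evaluated at the start of the text.
# re.DOTALL lets "." cross newlines, like the "single line" modifier.
PATTERN = re.compile(
    r"(?=.*\{myVariable1\})(?=.*<html>)(?!.*<script>)",
    re.DOTALL,
)


def file_matches(text: str) -> bool:
    """Hypothetical helper: True if text has <html> and {myVariable1} but no <script>."""
    # match() only tries position 0; each lookahead scans the whole text from there.
    return PATTERN.match(text) is not None


good = "<html>\n<body>{myVariable1}</body>\n</html>"
bad = "<html>\n<script>x</script>\n{myVariable1}\n</html>"

print(file_matches(good))
print(file_matches(bad))
```

Using match() rather than search() keeps the check anchored at the beginning, so the negative lookahead really covers the whole content.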
edit (per comment): I guess Total Commander's regex engine does not support lookarounds at all. You could combine the two positive lookaheads into a roughly equivalent 'consuming' pattern with something like this untested regex: (.*(\{myVariable1\}|<html>)){2} (note that it would also match a file that contains the same token twice). However, you cannot include the 'negated search' within a single regex unless the engine supports lookarounds.
You could try this Total Commander regex plug-in:
A RegEx content plug-in with Unicode support - based on Perl Compatible Regular Expressions (PCRE) library. This plug-in may replace TC's RegEx engine for file content

Related

Negative lookbehind does not work with multiple alternatives in FileLocatorPro regex

(?<!ing|how|out)\sto\b
The expression above works fine in FileLocatorPro, but after I added another word to it, like
(?<!ing|how|out|wants)\sto\b
it stopped working. Is there a limit on using "|"?
The Regular Expression flavor used for the Perl compatible option is Boost, see the FileLocatorPro docs:
Perl compatible regexp syntax is based around the Boost regular expression engine and includes not only the functionality of the 'classic' regular expression engine but also additional Perl style expression enhancements detailed here: http://www.boost.org/doc/libs/release/libs/regex.
The Boost docs say that (?<!pattern) consumes zero characters and matches only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).
That means all alternatives inside a lookbehind must be of the same length.
The workaround is to chain lookbehinds, each containing alternatives of the same length:
(?<!ing|how|out)(?<!wants)\sto\b
See the regex demo (Python option is used because Python has the same lookbehind length restriction).
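Since Python's re module has the same fixed-length restriction, the chained form can be tried there. This is only a sketch: the helper name has_plain_to and the sample sentences are invented for illustration, and Boost's exact error behaviour may differ from Python's.

```python
import re

# Alternatives of different lengths inside one lookbehind are rejected by Python's re.
try:
    re.compile(r"(?<!ing|how|out|wants)\sto\b")
    mixed_length_ok = True
except re.error:
    mixed_length_ok = False

# Chained lookbehinds: every alternative within a group has the same length.
CHAINED = re.compile(r"(?<!ing|how|out)(?<!wants)\sto\b")


def has_plain_to(sentence: str) -> bool:
    """Hypothetical helper: True if ' to' occurs without a forbidden word before it."""
    return CHAINED.search(sentence) is not None


print(mixed_length_ok)
print(has_plain_to("I need to go"))
print(has_plain_to("she wants to go"))
```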

RegEx Lookarounds - Using own escape sequence

I'm currently writing a little flatfile database for a project and in that context need to escape list item delimiters.
I decided to use ; as the delimiter and /; as my escaped version of that.
Since I already used RegEx lookarounds in the past, I was sure the following expression I use to split would do the job.
(?<!/);
My expression should match the ; in
abc;def
but should not match the ; in
abc/;def
I used RegExPal, and the expression doesn't match any of my examples.
Isn't this the correct structure of a regular expression to achieve my goal?
(?<!ForbiddenPrecedingExpression)CharacterFollowing
Any hints on where my problem lies?
There is nothing wrong with the regex.
The problem is that Regexpal is a JavaScript regular expression tester, and JavaScript did not support lookbehinds at the time (they were only added to the language later, in ECMAScript 2018).
The regex works in the PCRE (PHP) demo, whereas it does not in the JavaScript demo.
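To show the split working in an engine with lookbehind support, here is a small sketch in Python. The function name split_items is hypothetical, and the unescaping step (turning /; back into ;) is an assumption about what the flatfile format would want, not something stated in the question.

```python
import re
from typing import List


def split_items(record: str) -> List[str]:
    """Hypothetical helper: split on ';' unless it is escaped as '/;'."""
    # The negative lookbehind rejects any ';' directly preceded by '/'.
    parts = re.split(r"(?<!/);", record)
    # Assumed post-processing: restore escaped delimiters inside each item.
    return [part.replace("/;", ";") for part in parts]


print(split_items("abc;def"))
print(split_items("abc/;def"))
```

One limitation of this scheme is that an item ending in a literal / cannot be told apart from an escaped delimiter.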

regex using Sublime

Using my text editor of choice, Sublime 2, I want to search through code for alerts that are not commented out. So I need a regex that finds "alert" but not "//alert" or "// alert". I don't know how to invert and then combine the two results. Sublime Text uses the Boost syntax for regular expressions. Thank you for any help.
You can search for text not preceded by //, thus
(?<!\/\/\s?)alert
EDIT:
If the editor doesn't support variable-length lookbehinds, you must specify each possibility in its own lookbehind
(?<!\/\/\s)(?<!\/\/)alert
try this:
(?<!//)(?<!// )alert
Boost syntax is based on Perl regular expressions, so negative lookbehind (?<!text) should be supported. In this example I use the negative lookbehind twice (with and without the space) because the lookbehind text has to be of fixed length.
You can read more about the lookaround feature in regular expressions here:
http://www.regular-expressions.info/lookaround.html
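The two fixed-length lookbehinds can be checked quickly outside the editor. The following is a sketch with Python's re module, which has the same fixed-length rule; the sample lines are invented for the example.

```python
import re

# One lookbehind per fixed-length prefix: "//" and "// ".
UNCOMMENTED_ALERT = re.compile(r"(?<!//)(?<!// )alert")

lines = [
    "alert('a');",
    "//alert('b');",
    "// alert('c');",
    "foo(); alert('d');",
]

# Keep only the lines where "alert" is not directly preceded by a comment marker.
hits = [line for line in lines if UNCOMMENTED_ALERT.search(line)]
print(hits)
```

Note that a comment marker followed by two or more spaces would still slip through; each extra variant needs its own lookbehind.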

regular expression and substitution

In LaTeX, I have a lot of math expressions with subscripts written as 1, 2, 3; now I need to change them to \alpha, \beta, \gamma instead.
For example:
$E_{223}$ to $E_{\beta\beta\gamma}$
and
$E_{31}$ to $E_{\gamma\alpha}$
However, I also have exponents, which are not supposed to be altered; for example, $E^3_{112}$ should be changed to $E^3_{\alpha\alpha\beta}$.
Is there a way to use regular expressions to make this task easier? I know some regular expressions from Unix and Perl, but that seems inadequate for this problem.
Thank you for anything!
I'm not 100% familiar with LaTeX, but a typical regex would look like this:
(?<!\^)#
where # is 1, 2 or 3. Then, in your replace, you would replace the matches with \alpha, \beta and \gamma. The (?<!\^) is a negative lookbehind that says to only replace instances of that number when they aren't preceded by a ^ character (your power indicator).
If typical regex doesn't permit, I'll delete my answer.
In Perl you could do things like:
$text =~ s#\$\w[^${\s]*_{\K([123]+)(?=}\$)#
local $_ = $1;
s/1/\\alpha/g; s/2/\\beta/g; s/3/\\gamma/g;
$_
#ge;
Try these:
replace (?<!\^\d|\d{2}|\d{3}|\d{4})1 with \alpha
replace (?<!\^\d|\d{2}|\d{3}|\d{4})2 with \beta
replace (?<!\^\d|\d{2}|\d{3}|\d{4})3 with \gamma
Edit: These regexes make sure that it won't replace a number from an exponent. You may have to tweak them to check for optional - if you have negative exponents.
Edit 2: @QTax pointed out that you can't use variable-length lookbehinds.
Subexp of look-behind must be fixed character length.
But different character length is allowed in top level
alternatives only.
Reference: http://tacosw.com/latexian/help/find/regex.html
I don't know what editor or regex engine you're using for this, but here's the basic idea I'd go with in Perl-ish regex:
Replace this:
(?<=\{\d*)1(?=\d*\})
With this:
\\alpha
I think you'll want to set the g flag as well.
Not sure if I have the right escaping syntax (it's been a while since I touched Perl) but I think so.
Repeat as necessary for \beta, \gamma, etc.
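An alternative that avoids lookbehinds altogether is to match the whole subscript group and translate its digits in a callback. This is a sketch in Python rather than Perl, assuming subscripts always appear as _{...} containing only the digits 1 to 3; the names GREEK and greek_subscripts are made up for the example.

```python
import re

# Assumed digit-to-macro mapping taken from the question.
GREEK = {"1": r"\alpha", "2": r"\beta", "3": r"\gamma"}


def greek_subscripts(tex: str) -> str:
    """Hypothetical helper: rewrite _{123}-style subscripts, leaving exponents alone."""

    def replace(match: re.Match) -> str:
        # Translate every digit inside the braces, then rebuild the subscript.
        return "_{" + "".join(GREEK[digit] for digit in match.group(1)) + "}"

    # Only groups introduced by "_" are touched, so "^3" is never matched.
    return re.sub(r"_\{([123]+)\}", replace, tex)


print(greek_subscripts("$E_{223}$"))
print(greek_subscripts("$E^3_{112}$"))
```

Using a function as the replacement also sidesteps backslash-escaping issues in the replacement string.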

RegExp: want to find all links that do not end in ".html"

I'm a relative novice to regular expressions (although I've used them many times successfully).
I want to find all links in a document that do not end in ".html"
The regular expression I came up with is:
href=\"([^"]*)(?<!html)\"
In Notepad++, my editor, href=\"([^"]*)\" finds all the links (both those that end in "html" and those that do not).
Why doesn't negative lookbehind work?
I've also tried lookahead:
href=\"[^"]*(?!html\")
but that didn't work either.
Can anybody help?
Cheers, grovel
That regular expression would work fine if you were using Perl or PCRE (e.g. preg_match in PHP). However, lookahead and lookbehind assertions are not supported by most regular expression engines, especially the simpler ones, like the one used by Notepad++. Only the most basic syntax, such as quantifiers, subpatterns and character classes, is supported by almost all regular expression engines.
You can find the documentation for the notepad++ regular expression engine at: http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions
Edit: Notepad++ uses the SciTE regular expression engine, which does not support lookaround expressions.
For more info, take a look here: http://www.scintilla.org/SciTERegEx.html
Original Answer
^.*(?<!\.html)$
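In an engine that does support lookarounds, the lookbehind version from the question behaves as intended, while the lookahead attempt does not filter anything: [^"]* has already consumed the text up to the closing quote, so the lookahead is checked at the quote, where html" can never match. A small sketch in Python's re module, with made-up sample markup:

```python
import re

html = '<a href="a.html">A</a> <a href="b.php">B</a> <a href="img/c.png">C</a>'

# Lookbehind: the four characters before the closing quote must not be "html".
lookbehind_hits = re.findall(r'href="([^"]*)(?<!html)"', html)

# Lookahead attempt from the question: it is tested at the closing quote,
# so it always succeeds and every link is matched.
lookahead_hits = re.findall(r'href="[^"]*(?!html")', html)

print(lookbehind_hits)
print(len(lookahead_hits))
```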
You can make a regexp that does it, but it would probably be too complex:
href=\"((([^"]*)([^h"][^"][^"][^"]|[^t"][^"][^"]|[^m"][^"]|[^l]))|([^"]|)([^"]|)([^"]|))\"
Thank you all very much.
In the end the regular expression did indeed not work.
I simply used a workaround: I replaced all links with themselves + ".html", then replaced all occurrences of ".html.html" with ".html".
So I replaced href=\"([^"]*)\" with href="\1.html" and then .html.html with .html
Thanks anyway, grovel
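The two-step workaround needs no lookarounds at all, so it carries over to any engine with capture groups. Here is a sketch of the same idea in Python; the function name normalize_links and the sample markup are invented for illustration.

```python
import re


def normalize_links(html: str) -> str:
    """Hypothetical helper: make every href end in '.html' using two plain replacements."""
    # Step 1: append ".html" to every link target.
    appended = re.sub(r'href="([^"]*)"', r'href="\1.html"', html)
    # Step 2: collapse the doubled extension on links that already had it.
    return appended.replace(".html.html", ".html")


print(normalize_links('<a href="a.html">A</a> <a href="b">B</a>'))
```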
Note that Notepad++ (now?) supports assertions like this. (I have Notepad++ 6.3, dated Feb 3 2012.)
I believe the Regular Expressions documentation implies that both replace-variants use the same PCRE-dialect:
standard: Search | Replace (default shortcut Ctrl H)
plugin: TextFX | TextFX Quick | Find/Replace (default shortcut Ctrl R)