RegEx Lookarounds - Using own escape sequence

I'm currently writing a little flatfile database for a project and in that context need to escape list item delimiters.
I decided to use ; as the delimiter and /; as my escaped version of that.
Since I already used RegEx lookarounds in the past, I was sure the following expression I use to split would do the job.
(?<!/);
My expression should match the ; in
abc;def
but should not match the ; in
abc/;def
I tested it in RegExPal and the expression doesn't match any of my examples.
Isn't this the correct structure of a regular expression to achieve my goal?
(?<!ForbiddenPrecedingExpression)CharacterFollowing
Any hints on where to find my problem?

There is nothing wrong with the regex.
The problem is that RegexPal is a JavaScript regular expression tester, and JavaScript engines older than ECMAScript 2018 do not support lookbehind.
Compare this demo, where the expression works:
pcre(php) Demo
whereas this one won't:
Javascript Demo
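To check the expression in an engine that does support lookbehind, here is a minimal Python sketch. The names UNESCAPED_SEMICOLON and split_items are made up for illustration, and the unescaping step is an assumption about how the flatfile items would be read back.

```python
import re

# Split on ';' unless it is immediately preceded by '/'.
UNESCAPED_SEMICOLON = re.compile(r"(?<!/);")

def split_items(record):
    """Split a record on unescaped ';' and turn the escaped '/;' back into ';'."""
    parts = UNESCAPED_SEMICOLON.split(record)
    return [part.replace("/;", ";") for part in parts]

print(split_items("abc;def"))    # ['abc', 'def']
print(split_items("abc/;def"))   # ['abc;def']
print(split_items("a;b/;c;d"))   # ['a', 'b;c', 'd']
```

Note that this scheme has no way to express an item that ends in a literal / directly before a delimiter; that would need an escape for the escape character itself.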

Related

Complex regular expression ... AND OR, negation

I would like to search files by their content in Total Commander so I want to create a regex, but I cannot find any manual where it would really be explained. My situation is that I need something like this:
fileContains("<html>") && fileContains("{myVariable1}") && fileNotContains("<script>")
I can write roughly this:
(<html>)+
({myVariable1})+
(<script>){0} ... but this does not work for me
And I cannot put it all together. Any ideas, please? Or do you have a link to an excellent regex explanation?
try this regex:
(?=.*\{myVariable1\})(?=.*<html>)(?!.*<script>)
It's just three lookaheads in a row, one of which is a negative lookahead. Note that the "single line" modifier must be enabled so that the dot also matches newlines.
edit (per comment): I guess Total Commander's regex engine does not support lookarounds at all. While you could combine two positive lookaheads into an equivalent 'consuming' pattern with something like this untested regex: (.*(\{myVariable1\}|<html>)){2}, you cannot include the 'negated search' within a single regex unless you have a legitimate regex engine.
You could try this Total Commander regex plug-in:
"A RegEx content plug-in with Unicode support - based on Perl Compatible Regular Expressions (PCRE) library. This plug-in may replace TC's RegEx engine for file content"
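The three-lookahead pattern can be tried out in any engine with lookaround support. Below is a small Python sketch; CONTENT_FILTER and file_matches are hypothetical names, and (?s) is used as the inline form of the "single line" modifier.

```python
import re

# (?s) lets '.' match newlines, so each lookahead can scan the whole text.
CONTENT_FILTER = re.compile(
    r"(?s)(?=.*\{myVariable1\})(?=.*<html>)(?!.*<script>)"
)

def file_matches(text):
    """True if text contains <html> and {myVariable1} but no <script>."""
    # match() tries only at position 0, which is enough because every
    # lookahead starts with '.*' and therefore looks at the full text.
    return CONTENT_FILTER.match(text) is not None

print(file_matches("<html>\n{myVariable1}\n</html>"))          # True
print(file_matches("<html>\n<script>\n{myVariable1}</html>"))  # False
print(file_matches("<html>\nno variable here\n</html>"))       # False
```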

What's the eclipse c++ search regex dialect?

I want to learn more about the regex syntax of the search and replace function in eclipse c++.
Does it use its own regex syntax (in which case, does anyone know a good tutorial?) or does it use the syntax of another language (like Java regex, grep, Perl regex)?
Eclipse search and replace feature uses Java regex:
The regular expression must respect Java Regex.
However, one of the peculiarities is that you cannot match zero-length strings (i.e. (?=,) won't match the empty string before ,). In such cases, use capturing groups in the regex and use backreferences to those groups in the replacement patterns (e.g. to add newlines after a comma use , in the search and $0\n in the replacement).
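The "match the text itself, then re-insert it" idea is not specific to Eclipse. As a rough illustration in Python (not Eclipse's dialog), where \g<0> plays the role that $0 plays in the replacement above:

```python
import re

text = "a,b,c"

# Consume the comma and put it back, followed by a newline,
# instead of relying on a zero-length match next to the comma.
result = re.sub(r",", r"\g<0>\n", text)

print(repr(result))  # 'a,\nb,\nc'
```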

Why would a regex work in Sublime and not in vim?

Tried searching for regex found in this answer:
(,)(?=(?:[^']|'[^']*')*$)
I tried doing a search in Sublime and it worked out (around 700 results). When trying to replace the results it runs out of memory. Tried /(,)(?=(?:[^']|'[^']*')*$) in vim for searching first but it does not find any instances of the pattern. Also tried escaping all the ( and ) with \ in the regex.
Vim uses its own regular expression engine and syntax (which predates PCRE, by the way), so porting a regex from Perl or some other editor will most likely need some work.
The differences are too numerous to list in detail here, but :help pattern and :help perl-patterns will help.
Anyway, this quick and dirty rewrite of your regular expression seems to work on the sample given in the linked question:
/\v(,)(\#=([^']|'[^']*')*$)
See :help \#= and :help \v.
One possible explanation is that the regular expression engine used in Sublime is different than the engine used in vim.
Not all regex engines are created equal; they don't all support the same features. (For example, a "negative lookahead" feature can be very powerful, but not all engines support it. And the syntax for some features differs between engines.)
A brief comparison of regular expression engines is available here:
http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
Unfortunately Vim uses a different engine, and "normal" regular expressions won't work.
The regex you've mentioned isn't perfect: it doesn't skip escaped quotes, but, as I understand, it's good enough for you. Try this one, and if it doesn't match something, please send me that piece.
\v^([^']|'[^']*')*\zs,
A little explanation:
\v enables very magic search to avoid complex escaping rules
([^']|'[^']*') matches either any character other than a quote, or a pair of quotes together with everything between them
\zs indicates the beginning of selection; you can think of it as of a replacement for lookbehind.
In Vim's default ("magic") mode you have to escape the |, otherwise it is not treated as alternation. You should also escape the round brackets, unless you are searching for the literal '(' or ')' characters.
More information on regex usage in vim can be found on vimregex.com.
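For reference, this is how the original (non-Vim) pattern behaves in an engine that accepts it unchanged. It is a Python sketch on a made-up sample line; the capturing group around the comma is dropped so that split() does not return the delimiters, and COMMA_OUTSIDE_QUOTES is just an illustrative name.

```python
import re

# A comma counts only if the rest of the line consists of non-quote
# characters and complete '...' pairs, i.e. the comma is outside quotes.
COMMA_OUTSIDE_QUOTES = re.compile(r",(?=(?:[^']|'[^']*')*$)")

line = "a,'b,c',d"

positions = [m.start() for m in COMMA_OUTSIDE_QUOTES.finditer(line)]
fields = COMMA_OUTSIDE_QUOTES.split(line)

print(positions)  # [1, 7]
print(fields)     # ['a', "'b,c'", 'd']
```

Each candidate comma triggers a scan to the end of the line, so the work grows quickly on long inputs.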

Perl Extended Regular Expressions - match with multiple question marks inside

I have got a weird thing to solve in perl using regular expressions.
Consider the strings -
abcdef000000123
blaDeF002500456
wefdEF120045423
All of these strings are matching with the below regular expression when I tried in C with pcre library support :
???[dD][eE][fF][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
But I'm unable to achieve the same in perl code. I'm getting some weird errors.
Please help with the piece of perl code with which these two things match.
Thanks in advance...
? is a quantifier that makes the preceding pattern or group optional. On its own, ? doesn't make any sense in a regex, which is why you are getting an error like: Quantifier follows nothing in regex.
Following regex should work for you in perl:
...[dD][eE][fF][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
OR even more concise regex:
.{3}[dD][eE][fF][0-9]{9}
Each dot means match any character.
PS: You probably are getting confused by shell's glob vs regex.
That looks more like a file-system wildcard (glob) pattern than a PCRE. In Perl, the ? is a quantifier, not a wildcard. You may want to replace each ? with . to get the same results in anything Perl compatible.
I might use ...[dD][eE][fF][0-9]{9} or even replace the [0-9] with \d.
qr/[A-Za-z]{3}def[0-9]{9}/i
should be the Perl Regex object used to validate the mentioned strings.
Regards
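The glob-versus-regex distinction mentioned in the answers can be seen side by side in a short Python sketch (Python rather than Perl, purely for illustration). The same pattern text is accepted as a glob by fnmatch but rejected by the regex compiler, while the dot-based rewrite matches all three sample strings.

```python
import re
from fnmatch import fnmatchcase

samples = ["abcdef000000123", "blaDeF002500456", "wefdEF120045423"]

# In a glob, '?' means "any single character".
glob_pattern = "???[dD][eE][fF]" + "[0-9]" * 9
# In a regex, '.' has that meaning and '?' is a quantifier.
regex = re.compile(r".{3}[dD][eE][fF][0-9]{9}")

for s in samples:
    print(s, fnmatchcase(s, glob_pattern), bool(regex.fullmatch(s)))

# Feeding the glob text to the regex compiler fails, because the
# leading '?' has nothing before it to quantify.
try:
    re.compile(glob_pattern)
    compiled_as_regex = True
except re.error as exc:
    compiled_as_regex = False
    print("not a valid regex:", exc)
```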

RegExp: want to find all links that do not end in ".html"

I'm a relative novice to regular expressions (although I've used them many times successfully).
I want to find all links in a document that do not end in ".html"
The regular expression I came up with is:
href=\"([^"]*)(?<!html)\"
In Notepad++, my editor, href=\"([^"]*)\" finds all the links (both those that end in "html" and those that do not).
Why doesn't negative lookbehind work?
I've also tried lookahead:
href=\"[^"]*(?!html\")
but that didn't work either.
Can anybody help?
Cheers, grovel
That regular expression would work fine if you were using Perl or PCRE (e.g. preg_match in PHP). However, lookahead and lookbehind assertions are not supported by most regular expression engines, especially the simpler ones, such as the one used by Notepad++. Only the most basic syntax, such as quantifiers, subpatterns and character classes, is supported by almost all regular expression engines.
You can find the documentation for the notepad++ regular expression engine at: http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions
Edit: Notepad++ uses the SciTE regular expression engine, which does not support lookaround expressions.
For more info take a look here http://www.scintilla.org/SciTERegEx.html
Original Answer
^.*(?<!\.html)$
You can make a regexp that does it, but it would probably be too complex:
href=\"((([^"]*)([^h"][^"][^"][^"]|[^t"][^"][^"]|[^m"][^"]|[^l]))|([^"]|)([^"]|)([^"]|))\"
Thank you all very much.
In the end the regular expression did indeed not work.
I simply used a workaround: I replaced all links with themselves+".html", then replaced all occurrences of ".html.html" with ".html".
So I replaced href=\"([^"]*)\" with href="\1.html" and then .html.html with .html
Thanks anyway, grovel
Note that Notepad++ (now?) supports assertions like this. (I have Notepad++ 6.3, dated Feb 3 2012.)
I believe the Regular Expressions documentation implies that both replace-variants use the same PCRE-dialect:
standard: Search | Replace (default shortcut Ctrl H)
plugin: TextFX | TextFX Quick | Find/Replace (default shortcut Ctrl R)
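For anyone who wants to check the two patterns from the question outside the editor, here is a small Python sketch on a made-up HTML snippet. It suggests that the lookbehind version is fine in an engine that supports it, and shows why the lookahead attempt cannot work in any engine: after [^"]* has consumed everything up to the closing quote, the next characters are never html", so the negative lookahead always succeeds.

```python
import re

html = '<a href="a.html">A</a> <a href="b.php">B</a> <a href="c/">C</a>'

# Lookbehind version: the text just before the closing quote
# must not be "html".
NOT_HTML = re.compile(r'href="([^"]*)(?<!html)"')
print(NOT_HTML.findall(html))
# ['b.php', 'c/']

# Lookahead attempt: the assertion is tested at the closing quote,
# where it is trivially true, so every link is matched.
LOOKAHEAD_ATTEMPT = re.compile(r'href="[^"]*(?!html")')
print(LOOKAHEAD_ATTEMPT.findall(html))
# ['href="a.html', 'href="b.php', 'href="c/']
```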