PERL Regex Negation Issue - regex

I have written a regex in Perl to pick files of the format
(ABC.*\.DAT).
How do I write a negation of the above regex?
I already tried expressions like (?!ABC.*)\.DAT or (?!(ABC.*\.DAT))
Any help is appreciated.

(?s:(?!ABC).)*\.DAT
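You can try this negation-based regex. Each character is consumed only if ABC does not start at that position, so the match can never run across an ABC.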
The above can be safely embedded into a larger pattern. For example,
/^(?:(?!ABC).)*\.DAT\z/s
If you are trying to match the whole input, and if ABC doesn't end with ., .D, .DA or .DAT, then the following will be faster:
/^(?!.*ABC).*\.DAT\z/s
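For readers who want to try the tempered pattern outside Perl, here is a minimal Python sketch. It assumes that re.DOTALL stands in for Perl's /s and that fullmatch() replaces the ^ ... \z anchoring; the names NOT_ABC_DAT and is_non_abc_dat and the sample file names are made up for illustration.

```python
import re

# Tempered pattern from the answer above; re.DOTALL plays the role of Perl's /s.
NOT_ABC_DAT = re.compile(r"(?:(?!ABC).)*\.DAT", re.DOTALL)

def is_non_abc_dat(name: str) -> bool:
    """True if name ends in .DAT and 'ABC' does not occur anywhere in it."""
    # fullmatch() anchors at both ends, like ^ ... \z in the Perl version.
    return NOT_ABC_DAT.fullmatch(name) is not None

# Hypothetical file names, only for demonstration.
names = ["ABC123.DAT", "XYZ.DAT", "notes.txt", "XABC.DAT"]
selected = [n for n in names if is_non_abc_dat(n)]
```

Note that this rejects ABC anywhere in the name, not only at the start, which is what the pattern in the answer does as well.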

Related

Complex regular expression ... AND OR, negation

I would like to search files by their content in Total Commander so I want to create a regex, but I cannot find any manual where it would really be explained. My situation is that I need something like this:
fileContains("<html>") && fileContains("{myVariable1}") && fileNotContains("<script>")
I can write roughly this:
(<html>)+
({myVariable1})+
(<script>){0} ... but this does not work for me
And I cannot put it all together. Any ideas, please? Or do you have a link to an excellent regex explanation?
Try this regex:
(?=.*\{myVariable1\})(?=.*<html>)(?!.*<script>)
It's just three lookaheads in a row, one of which is a negative lookahead. Enable the "single line" modifier so that the dot also matches newlines; otherwise each lookahead only scans to the end of the current line.
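As a rough illustration of the three lookaheads in an engine that supports them, here is a small Python sketch. It is not Total Commander syntax; re.DOTALL is assumed to be the counterpart of the "single line" modifier, and CONTENT_FILTER and content_matches are hypothetical names.

```python
import re

# Two positive lookaheads and one negative lookahead, all tried at position 0.
# re.DOTALL lets '.' cross line breaks so the whole text is scanned.
CONTENT_FILTER = re.compile(
    r"(?=.*\{myVariable1\})(?=.*<html>)(?!.*<script>)",
    re.DOTALL,
)

def content_matches(text: str) -> bool:
    """True if text contains <html> and {myVariable1} but no <script>."""
    return CONTENT_FILTER.match(text) is not None

good = "<html>\n<body>{myVariable1}</body>\n</html>"
bad = "<html>\n<script>x</script>\n{myVariable1}\n</html>"
```

Here content_matches(good) is true and content_matches(bad) is false, because the negative lookahead finds <script> in the second text.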
Edit (per comment): I guess Total Commander's regex engine does not support lookarounds at all. You could combine the two positive lookaheads into a roughly equivalent 'consuming' pattern with something like this untested regex: (.*(\{myVariable1\}|<html>)){2} (note that it also accepts two occurrences of the same marker). However, you cannot include the 'negated search' within a single regex unless you have a legitimate regex engine.
You could try this Total Commander regex plug-in:
A RegEx content plug-in with Unicode support - based on Perl
Compatible Regular Expressions (PCRE) library. This plug-in may
replace TC's RegEx engine for file content

RegEx Lookarounds - Using own escape sequence

I'm currently writing a little flatfile database for a project and in that context need to escape list item delimiters.
I decided to use ; as the delimiter and /; as my escaped version of that.
Since I already used RegEx lookarounds in the past, I was sure the following expression I use to split would do the job.
(?<!/);
My expression should match the ; in
abc;def
but should not match the ; in
abc/;def
I used RegExPal, and the expression doesn't match any of my examples.
Isn't this the correct structure of a regular expression to achieve my goal?
(?<!ForbiddenPrecedingExpression)CharacterFollowing
Any hints on where to find my problem?
There is nothing wrong with the regex.
The problem is that RegExPal is a JavaScript regular expression tester, and JavaScript engines of that time did not support lookbehinds (support arrived with ECMAScript 2018).
The pattern works in a PCRE (PHP) demo, whereas it fails in a JavaScript demo.
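To show the lookbehind doing its job in an engine that supports it, here is a minimal Python sketch of the split. DELIM and split_items are hypothetical names, and the unescaping step (turning /; back into ;) is an assumption about how the flatfile format is meant to be read.

```python
import re

# ';' is a delimiter only when it is not preceded by the escape character '/'.
DELIM = re.compile(r"(?<!/);")

def split_items(record: str) -> list[str]:
    """Split on unescaped ';' and turn the escaped form '/;' back into ';'."""
    return [item.replace("/;", ";") for item in DELIM.split(record)]

items = split_items("abc;def/;ghi;jkl")
```

This simple scheme has no way to express a literal / directly in front of a real delimiter; that would need an escape for the escape character as well.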

Perl Extended Regular Expressions - match with multiple question marks inside

I have got a weird thing to solve in perl using regular expressions.
Consider the strings -
abcdef000000123
blaDeF002500456
wefdEF120045423
All of these strings are matching with the below regular expression when I tried in C with pcre library support :
???[dD][eE][fF][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
But I'm unable to achieve the same in perl code. I'm getting some weird errors.
Please help with the piece of perl code with which these two things match.
Thanks in advance...
? is a quantifier that makes the preceding pattern or group optional. On its own, ? doesn't make any sense in a regex, which is why you are getting an error like: Quantifier follows nothing in regex.
Following regex should work for you in perl:
...[dD][eE][fF][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
OR even more concise regex:
.{3}[dD][eE][fF][0-9]{9}
Each dot means match any character.
PS: You probably are getting confused by shell's glob vs regex.
That looks more like a file system regex than a PCRE. In Perl, the ? is a quantifier, not a wild card. You may want to replace them with . to get the same results in anything Perl compatible.
I might use ...[dD][eE][fF][0-9]{9} or even replace the [0-9] with \d.
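The difference between the glob-style ??? and the regex dot can be checked quickly. The following Python sketch is not Perl, but Python's re module rejects a leading ? for the same reason; PATTERN, samples and compiles are illustrative names.

```python
import re

# Dot-based pattern from the answers: any three characters, "def" in any case, nine digits.
PATTERN = re.compile(r".{3}[dD][eE][fF][0-9]{9}")

samples = ["abcdef000000123", "blaDeF002500456", "wefdEF120045423"]
all_match = all(PATTERN.fullmatch(s) is not None for s in samples)

def compiles(pattern: str) -> bool:
    """Report whether the regex engine accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A leading '?' has nothing to quantify, so the glob-style pattern is rejected.
glob_style_ok = compiles("???[dD][eE][fF]")
```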
qr/[a-z]{3}def[0-9]{9}/i
should be the Perl regex object used to validate the mentioned strings. (Avoid [A-z]: that range also covers the punctuation characters between Z and a.)
Regards

Regular expression theory

I have a little problem with RE theory.
Given the alphabet {0, 1}, I have to create a regular expression that matches all strings that do NOT contain the substring 111.
I can't work out how to do it, even for a simpler substring like 00.
Edit: The solution must contain only the three standard operations: concatenation, alternation, and Kleene star, as you can see in the wiki link.
Thank you.
As far as I understand, the language you want to regexify must not contain three or more consecutive 1's. Such a regexp could be (0|10|110)*(ε|1|11): every run of at most two 1's is closed by a 0, and the string may end with up to two 1's.
How about this:
{ε|1}{ε|1}{ε|{0{ε|1}{ε|1}}*}
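Candidate expressions like these are easy to get subtly wrong, so a brute-force check over all short strings is a useful sanity test. The sketch below checks two candidates in Python syntax: (0|10|110)*(ε|1|11), and the expression just above rewritten with parentheses. It assumes that x? may be used as shorthand for (ε|x); FIRST, SECOND and agrees are hypothetical names.

```python
import re
from itertools import product

# (0|10|110)*(eps|1|11): blocks of at most two 1's closed by a 0, then at most two 1's.
FIRST = re.compile(r"(?:0|10|110)*(?:1|11)?")
# (eps|1)(eps|1)(0(eps|1)(eps|1))*: at most two 1's, then 0-led blocks of at most two 1's.
SECOND = re.compile(r"1?1?(?:01?1?)*")

def agrees(regex: re.Pattern, max_len: int = 10) -> bool:
    """Check that regex accepts exactly the strings without '111', up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            accepted = regex.fullmatch(s) is not None
            if accepted != ("111" not in s):
                return False
    return True
```

A finite check like this is not a proof, but it catches mistakes such as an expression that rejects 101.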
Back in the days when we didn't have the ?! negative lookahead facility I would use a negation match. So for grep I would
grep -v (pattern I'm searching for) someFile.txt
which would give the lines in the file that don't contain the pattern.
In Perl I would use the
!~
negation matcher rather than the usual
=~
matcher.
I don't know which regex variant you are using, but I'm struggling to see a way to solve your problem without either an overall negation or a ?! negative lookahead.
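The "overall negation" idea (grep -v, or !~ in Perl) amounts to searching for the forbidden pattern and inverting the result. A minimal Python sketch of that idea follows; lines_without is a hypothetical helper.

```python
import re

def lines_without(pattern: str, lines: list[str]) -> list[str]:
    """Keep the lines in which pattern does NOT occur, like grep -v."""
    rx = re.compile(pattern)
    return [line for line in lines if not rx.search(line)]

kept = lines_without("111", ["0110", "01110", "1"])
```

This sidesteps the exercise rather than solving it, since the negation happens outside the regular expression.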

How do I properly match Regular Expressions?

I have a list of objects output from ldapsearch as follows:
dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=HGRANGER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=RWEASLEY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=DMALFOY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=SSNAPE,ou=FACULTY,ou=HOGWARTS,o=SCHOOL
dn: cn=ADUMBLED,ou=FACULTY,ou=HOGWARTS,o=SCHOOL
So far, I have the following regex:
/\bcn=\w*,/g
Which returns results like this:
cn=HPOTTER,
cn=HGRANGER,
cn=RWEASLEY,
cn=DMALFOY,
cn=SSNAPE,
cn=ADUMBLED,
I need a regex that returns results like this:
HPOTTER
HGRANGER
RWEASLEY
DMALFOY
SSNAPE
ADUMBLED
What do I need to change in my regex so the pattern (the cn= and the comma) is not included in the results?
EDIT: I will be using sed to do the pattern matching, and piping the output to other command line utilities.
You will have to perform a grouping. This is done by modifying the regex to:
/\bcn=\(\w*\),/g
This will then capture your result in a group. How you extract the captured value depends on your language. (With sed, it is available as \1.)
Note that in most regex flavors you don't have to escape the parentheses (), but since you're using sed you will need to, as shown above.
For an excellent resource on Regular Expressions I suggest: Mastering Regular Expressions
OK, the place where you asked the more specific question was closed as "exact duplicate" of this, so I'm copying my answer from there to here:
If you want to use sed, you can use something like the following:
sed -e 's/dn: cn=\([^,]*\),.*$/\1/'
You have to use [^,]* because in sed, .* is "greedy" meaning it will match everything it can before looking at any following character. That means if you use \(.*\), in your pattern it will match up to the last comma, not up to the first comma.
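The effect of greediness is easy to see side by side. This Python sketch mirrors the sed substitution with re.sub (Python uses unescaped parentheses for groups, unlike sed's basic syntax); first_field and greedy are illustrative names.

```python
import re

line = "dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL"

# [^,]* stops at the first comma, so only the cn value is captured.
first_field = re.sub(r"dn: cn=([^,]*),.*$", r"\1", line)

# (.*) is greedy and runs up to the LAST comma before the trailing part.
greedy = re.sub(r"dn: cn=(.*),.*$", r"\1", line)
```

Here first_field is HPOTTER, while greedy is HPOTTER,ou=STUDENTS,ou=HOGWARTS.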
Check out Expresso; I have used it in the past to build my regexes. It is a good learning aid too.
The quick and dirty method is to use submatches assuming your engine supports it:
/\bcn=(\w*),/g
Then you would want to get the first submatch.
Without knowing what language you're using, we can't tell for sure, but in most regular expression parsers, if you use parentheses, such as
/\bcn=(\w*),/g
then you'll be able to get the first matching pattern (often \1) as exactly what you are searching for. To be more specific, we need to know what language you are using.
If your regex supports Lookaheads and Lookbehinds then you can use
/(?<=\bcn=)\w*(?=,)/g
That will match
HPOTTER
HGRANGER
RWEASLEY
DMALFOY
SSNAPE
ADUMBLED
But not the cn= or the , on either side. The comma and cn= still have to be there for the match, it just isn't included in the result.
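For comparison, here is a short Python sketch showing that the capture-group approach and the lookaround approach return the same names. It is not sed (which has no lookarounds); the sample data is a subset of the question's ldapsearch output, and with_group and with_lookaround are illustrative names.

```python
import re

ldap_output = """\
dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=HGRANGER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=SSNAPE,ou=FACULTY,ou=HOGWARTS,o=SCHOOL
"""

# Capture group: findall returns only the text captured inside the parentheses.
with_group = re.findall(r"\bcn=(\w*),", ldap_output)

# Lookarounds: 'cn=' and ',' must be present but are not part of the match.
with_lookaround = re.findall(r"(?<=\bcn=)\w*(?=,)", ldap_output)
```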
Sounds more like a simple parsing problem and not regex. An ANTLR grammar would sort this out in no time.