regular expression to match nested braces - regex

I need a regular expression that matches braces correctly, i.e. one closing brace for every opening one.
For abc{abc{bc}xyz} I need it to match all of {abc{bc}xyz}, not just {abc{bc}.
I tried using (\{.*?})

This is not possible with regular expressions. A context-free grammar is necessary for this, and regular expressions can only describe regular languages.
According to this link there is an extension available for the regular expressions in .NET that can do this, but this just means that .NET regular expressions are more than just regular expressions.

This is not a task for a regular expression. What you're looking for at that point is a parser. Which means a language grammar, LL(1), LALR, recursive-descent, the dragon book, and generally a splitting migraine.

Balanced parentheses of arbitrary nesting depth do not form a regular language. They form a context-free language.
That said, many "regular expression" implementations actually recognize more than regular languages, so this is possible with some implementations but not others.
Wikipedia
Regular language
Pumping lemma for regular languages
Context-free language
Regular expression
Many features found in modern regular expression libraries provide an expressive power that far exceeds the regular languages.

As Bryan said, regular expressions might not be the right tool here, but if you're using PHP, the manual gives an example of how you might be able to use regular expressions in a recursive/nested fashion:
$input = "plain [indent] deep [indent] deeper [/indent] deep [/indent] plain";
function parseTagsRecursive($input)
{
    $regex = '#\[indent]((?:[^[]|\[(?!/?indent])|(?R))+)\[/indent]#';
    if (is_array($input)) {
        $input = '<div style="margin-left: 10px">'.$input[1].'</div>';
    }
    return preg_replace_callback($regex, 'parseTagsRecursive', $input);
}
$output = parseTagsRecursive($input);
echo $output;
I'm not sure if that'll be helpful to you or not.

This is not possible in the "standard" regular expression language. However, a few different implementations have extensions that allow you to implement it. For example, here's a blog post that explains how to do it with .NET's regex library.
Generally speaking though, this is a task that regular expressions are not really suited to.

Assuming what you want to do is select a maximal substring between { and }:
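.*? is a lazy quantifier. That is, it matches as few characters as possible, so it stops at the first }. If you change your expression to the greedy {.*} (written \{.*\} if your regex flavor requires the braces to be escaped), it will match from the first { to the last }. Note that this only gives the result you want when the string contains a single top-level group.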
If what you want to do is to verify that the braces are matched correctly, then as the other answers have stated, this is not possible with a (single) regular expression. You can do it by scanning the string with a stack though. Or with some voodoo of iterating your regular expression over the previous maximal match. Yikes.
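To make the "scan the string with a stack" idea concrete, here is a minimal sketch in Python. Since there is only one kind of bracket, a depth counter stands in for the stack. The function name outermost_brace_groups is made up for illustration, and the choice to silently skip stray or unclosed braces is an assumption, not something the question specifies.

```python
def outermost_brace_groups(text):
    """Return every top-level {...} substring whose braces are balanced."""
    groups = []
    depth = 0      # number of '{' currently open
    start = None   # index of the '{' that opened the current top-level group
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}":
            if depth == 0:
                continue  # stray closing brace: ignore it (assumed policy)
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    # A group that is still open at the end of the input is dropped.
    return groups


print(outermost_brace_groups("abc{abc{bc}xyz}"))  # ['{abc{bc}xyz}']
```

If you need to report unbalanced input instead of skipping it, the two places marked above are where an error could be raised.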

Related

A regular expression to match a Perl expression?

Trying to mangle Perl source code (trying to implement a macro), I wonder (I doubt it) whether there exists a Perl regular expression to match one Perl expression (or one function parameter, if you prefer that view of things).
The expression may be arbitrary complex, even using multiple lines.
AFAIK things like "balanced parentheses" are impossible to do with pure regular expressions. Unfortunately all the source filters (like Filter::Simple) seem to be based on regular expressions.
See (?&PerlExpression) in PPR and PPI::Statement::Expression from PPI.

POSIX Regular Expressions: Excluding a word in an expression?

I am trying to create a regular expression using POSIX (Extended) Regular Expressions that I can use in my C program code.
Specifically, I have come up with the following; however, I want to exclude the word "http" from the matched expressions. After some searching, it doesn't look like POSIX offers an obvious way to exclude specific strings. I am using something called a "negative lookahead" in the example below (i.e. the (?!http:) ). However, I fear that this may only be available in regular expression dialects other than POSIX.
Is negative lookahead allowed? Is the logical NOT operator allowed in POSIX (i.e. ! )?
Working regular expression example:
href|HREF|src[[:space:]]=[[:space:]]\"(?!http:)[^\"]+\"[/]
If I cannot use negative-lookahead like in other dialects, what can I do to the above regular expression to filter out the specific word "http:"? Ideally, is there any way without inverse logic and ultimately creating a ridiculously long regular expression in the process? (the one I have above is quite long already, I'd rather it not look more confusing if possible)
[NOTE: I have consulted other related threads on Stack Overflow, but the most relevant ones seem to ask this question only "generically", so the answers given weren't necessarily POSIX-flavored. In another thread or two, I've seen the above (?!insertWordToExcludeHere) negative lookahead, but I fear it's only for PHP.]
[NOTE 2: I will take any POSIX regular expression phrasings as well, any help would be appreciated. Does anyone have a suggestion on how whatever regular expression that would filter out "http:" would look like and how it could be fit into my current regular expression, replacing the (?!http:)?]
According to http://www.regular-expressions.info/refflavors.html lookaheads and lookbehinds are not in the POSIX flavour.
You may consider thinking in terms of lexing (tokenization) and parsing if your problem is too complex to be represented cleanly as a regex.
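One way to get by without lookahead is to match more loosely and then reject the unwanted values in ordinary code. The sketch below shows the idea in Python; the pattern only uses constructs that POSIX ERE also has (alternation, groups, bracket expressions, * and +), so the same two-step approach should carry over to regcomp()/regexec() plus a strncmp() check in C. The names ATTR and non_http_urls and the sample HTML are invented for illustration, and the pattern assumes that the three attribute names were meant to be grouped, which the original expression does not do.

```python
import re

# No lookahead: capture the attribute value, filter it afterwards.
ATTR = re.compile(r'(href|HREF|src)[ \t]*=[ \t]*"([^"]+)"')


def non_http_urls(html):
    """Return attribute values that do not start with 'http:'."""
    values = [m.group(2) for m in ATTR.finditer(html)]
    return [v for v in values if not v.startswith("http:")]


sample = '<a href="http://example.com/a"> <img src = "img/logo.png"> <a HREF="page.html">'
print(non_http_urls(sample))  # ['img/logo.png', 'page.html']
```

This keeps the regular expression short and moves the "not http:" rule into a plain string comparison, which is easy to read and to change.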

Construction of pattern that doesn't contain binary string

I was trying to write a pattern that matches strings which do not contain a given binary string (let's assume 101). I know that such expressions cannot be written using Regular Expression considering http://en.wikipedia.org/wiki/Regular_language.
I tried writing the pattern for the above problem using Regular Expression though and it seems to be working.
\b(?!101)\w+\b
What I wanted to ask is: can a regular expression be written for my problem, and why? And if so, is my regular expression correct?
To match a whole string that doesn't contain 101:
^(?!.*101).*$
Lookaheads are indeed an easy way to check a condition on a string through regex, but your regex will only match alphanumeric words that do not start with 101.
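A quick way to see the difference between the two patterns is to run both over a few inputs. This is a small sketch in Python; the variable names whole_string and per_word and the sample inputs are made up for the demonstration.

```python
import re

whole_string = re.compile(r"^(?!.*101).*$")   # rejects any string containing 101
per_word = re.compile(r"\b(?!101)\w+\b")      # only rejects words starting with 101

for s in ["0011", "110100", "0101", "1010"]:
    print(s, bool(whole_string.match(s)), bool(per_word.search(s)))
```

For example, 0101 contains 101 and is rejected by the whole-string pattern, yet the word-based pattern still accepts it, because the word does not start with 101.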
You wrote:
"I know that such expressions cannot be written using Regular Expression considering http://en.wikipedia.org/wiki/Regular_language."
In that Wikipedia article, you seem to have missed this note:
"Note that the 'regular expression' features provided with many programming languages are augmented with features that make them capable of recognizing languages that can not be expressed by the formal regular expressions (as formally defined below)."
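The negative lookahead construct is one such augmentation: it is not part of the formal regular expression notation. In this particular case it does not take you beyond the regular languages, though. The set of strings that do not contain 101 is itself regular, because regular languages are closed under complement.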
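As a side note, the language of binary strings that do not contain 101 can be recognized by a small finite automaton, so lookahead is a convenience here rather than a necessity. The following Python sketch spells that automaton out; the names TRANSITIONS and avoids_101 are invented for illustration, and treating any non-binary character as a rejection is an assumption.

```python
# States record how much of "101" has just been read:
#   0 = nothing useful, 1 = "1", 2 = "10".
# Reading "1" in state 2 would complete "101", so that transition is missing.
TRANSITIONS = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 2, (1, "1"): 1,
    (2, "0"): 0,
}


def avoids_101(bits):
    """Return True if the string of 0s and 1s does not contain 101."""
    state = 0
    for b in bits:
        if (state, b) not in TRANSITIONS:
            return False  # completed "101", or saw a character other than 0/1
        state = TRANSITIONS[(state, b)]
    return True


print(avoids_101("110011"), avoids_101("0101"))  # True False
```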

Is it possible to have regexp that matches all valid regular expressions?

Is it possible to detect whether a given string is a valid regular expression, using just regular expressions?
Say I have some strings that may or may not be valid regular expressions. I'd like to have a regular expression that matches exactly those strings that are valid regular expressions. Is that possible? Or do I have to use some higher-level grammar (i.e. a context-free language) to detect this? Does it make a difference if I am using some extended version of regexps, like Perl regexps?
If it is possible, what is the regexp-matching regexp?
No, it is not possible. This is because valid regular expressions involve grouping, which requires balanced parentheses.
Balanced delimiters cannot be matched by a regular expression; they must instead be matched with a context-free grammar. (The first example on that article deals with balanced parentheses.)
See an excellent write-up here:
Regular expression for regular expressions?
The answer is that regexes are NOT written using a regular grammar, but a context-free one.
If your question had been "match all valid regular expressions", the answer would be (perhaps surprisingly) 'yes'. The regular expression .* matches all valid (and invalid) regular expressions, but is pretty useless for determining whether you're looking at a valid one.
However, as the question is "match all and only valid regular expressions", the answer is (as DVK and Platinum Azure have said) 'no'.

Regular expression which matches regular expressions

Is it possible to write a regular expression which matches regular expressions? Does anyone have examples? If there is some theoretical obstruction, does anyone know of a regex which will match at least the most common regex patterns?
Regular expressions are not a regular language, and thus cannot be described by a regular expression!
Update: More useful practical answer
You cannot detect valid regular expressions using any regular expression. To detect its validity, you should just parse the string using the regex library and it would fail if it is an invalid regular expression. For example, in Java, it would be something like:
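import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

boolean isValidRegexp(String s) {
    try {
        Pattern.compile(s);  // throws if s is not a valid pattern
        return true;
    } catch (PatternSyntaxException e) {
        return false;
    }
}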
This technique should work with almost any language.
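For instance, the same idea in Python might look like the sketch below. It relies on re.compile raising re.error for an invalid pattern; the function name is_valid_regex is just an illustrative choice. Keep in mind that "valid" here means valid for Python's regex dialect, which is not the same as being valid in Java, Perl or POSIX.

```python
import re


def is_valid_regex(pattern):
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(is_valid_regex("(a(b(c(d))))"))  # True
print(is_valid_regex("(a(b"))          # False
```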
You're all wrong! In my secret laboratories, my evil scientists have discovered the regular expression that can match any regular expression:
.*
It will even match the null expression. Let's see you try to match that!
As an added benefit, it will even match strings that are not regular expressions.
It is not possible using standard regular expressions.
Regular expressions can be nested indefinitely (eg, /(a(b(c(d))))/), which is impossible to match using standard regex.
According to Crockford, this is a regex which matches a regular expression (at least in JavaScript):
/\/(\\[^\x00-\x1f]|\[(\\[^\x00-\x1f]|[^\x00-\x1f\\\/])*\]|[^\x00-\x1f\\\/\[])+\/[gim]*/
Yes. Example: This regex ^[a-z][+*]$ will match this regex z+ and this a* and this c+ and so on.
This is not possible. Regular expressions can only match regular languages. Regular expressions are not a regular language. If memory serves I believe they are a context-free language and require a context-free grammar to match.
Here we go:
m{/([^\\/]++|\\.)*/}
Should match a regular expression delimited by //.
Of course, it won't ensure that the regular expression parses correctly - it just identifies where it is (say, for a tokenizer).
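Here is a rough Python rendering of that tokenizer idea, as a sketch only. It is my adaptation rather than the exact Perl pattern: the group is repeated so that escaped characters can appear anywhere in the body, and the possessive quantifier is left out. The names REGEX_LITERAL and find_regex_literals are hypothetical.

```python
import re

# A '/', then any run of ordinary characters or backslash-escaped characters,
# then the closing '/'.
REGEX_LITERAL = re.compile(r"/(?:[^\\/]|\\.)*/")


def find_regex_literals(source):
    """Return the //-delimited spans found in source, left to right."""
    return [m.group(0) for m in REGEX_LITERAL.finditer(source)]


print(find_regex_literals(r"x = /a\/b/; y = /\d+/;"))
```

Like the original, this only locates candidate spans. It will happily treat two division operators as delimiters, so a real tokenizer would still need context to decide whether a / starts a regex at all.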