regex worst possible complexity [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I'm aware that regular expressions, as a concept, each describe a single regular language, which can be processed via a DFA/NFA with complexity between O(n) and O(2^m), where n is the size of the string and m the size of the regex. Most Stack Overflow discussions about the subject quote this awesome article that proves it.
However, regexes as implemented in modern languages deal with more than regular languages. For instance, it's possible to recognize palindromes with regex, a context-free grammar problem that, when solved via a pushdown automaton (PDA), is known to have O(n³) complexity.
I would like to know where exactly in the Extended Chomsky Hierarchy modern regex implementations (Perl or Python's re, for example) fit, or, at least, their worst possible complexity.
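To make the "more than regular languages" point concrete, here is a minimal sketch using Python's re module. It relies on a backreference to match the "copy language" { ww }, which is not even context-free. The helper name is_square is made up for this illustration. Matching with backreferences is known to be NP-hard in general, which gives a rough idea of the worst case for engines that support them.

```python
import re

# (.+) captures a non-empty string w; \1 demands the same text again,
# so a full match means the input has the form ww.
SQUARE = re.compile(r"(.+)\1")


def is_square(text: str) -> bool:
    """Return True if text is some non-empty string written twice in a row."""
    return SQUARE.fullmatch(text) is not None


print(is_square("abcabc"))  # the two halves are identical
print(is_square("abccba"))  # a palindrome, but not of the form ww
```

Note that Python's re has no recursion construct, so the palindrome trick mentioned above needs an engine such as Perl or PCRE.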

Related

Snobol Pattern Matching [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
My question is simple. Is the programming language Snobol4 still useful to provide a modern day solution for pattern matching, or has regex in other procedural languages pretty much wiped it out in application?
The pattern language is modeled on context free grammars,
with context sensitive extensions that provide full (type 0)
computational capabilities.
This is from an introduction originally written by Robert Dewar, creator of the SPITBOL implementation of SNOBOL-4. Since both references are in relatively new and maintained libraries, I'd think that even though the pattern language is not part of, say, JavaScript with its statistically many uses, or part of other modern languages, it might surprise users of REs in terms of speed and power.
That being said, SNOBOL-4 patterns have been criticized for leading to hard-to-understand programs, for using FENCE, not NOT (Farber? Gimpel?), and for other phenomena that seem to be making a comeback with Perl 5 compatible "regular expressions" and ICU's. They, too, are rediscovering some effects of backtracking and anchors. R.E. Griswold, creator of SNOBOL-4, later created the Icon programming language. It features generators and goal-directed evaluation, thus taking backtracking to a level at which search is arguably more clearly expressed than it could be using the implications of complex patterns.
Insofar as this historic development precedes today's "REs" growing in power, I'd say that SNOBOL-4 patterns offer the profession something to evaluate: what to do again and what not to do again.
Say, do we need a BAL pattern in practice?

Among regular and context-free grammars which one is more powerful. Please give me the reason too [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I was just going through the principles of programming languages. I know the concepts of regular and context-free grammars and their usage. But still I am unable to decide which one is more powerful and why. Please help
Thanks in advance.
Every regular language is context-free, but some context-free languages are not regular. In that sense, the context-free languages are more "powerful" than the regular languages.
As one example of a nonregular language that is context-free, consider the language of all palindromes made up of the characters a and b. You can prove that this language is nonregular by using the pumping lemma or the Myhill-Nerode theorem. However, it is context-free, since it's generated by the grammar
S → aSa | bSb | a | b | ε
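As a rough illustration, the grammar above can be turned into a recursive recognizer almost line by line. This is only a sketch: the function name is_palindrome_ab is invented here, and the recursion depth grows with the input length, so very long strings would hit Python's recursion limit.

```python
def is_palindrome_ab(s: str) -> bool:
    """Recognize the language generated by S -> aSa | bSb | a | b | ε."""
    # Only the terminals a and b belong to the alphabet.
    if any(ch not in "ab" for ch in s):
        return False
    # Base cases: S -> ε, S -> a, S -> b.
    if len(s) <= 1:
        return True
    # Recursive cases: S -> aSa or S -> bSb (outer symbols must agree).
    return s[0] == s[-1] and is_palindrome_ab(s[1:-1])


print(is_palindrome_ab("abba"))
print(is_palindrome_ab("abab"))
```

The call stack here plays the role of the unbounded stack that a finite-memory machine lacks.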
Intuitively, regular languages correspond to yes/no questions that can be solved on a computer with finite memory (the Myhill-Nerode theorem is one way of formalizing this intuition). This means that any yes/no problem that can't be solved with only finite memory won't correspond to a regular language. Context-free languages occupy a strange middle ground: they correspond to problems that can be solved on computers with finite memory and an unbounded stack.
If you're interested in learning more about this, I'd recommend reading through a book on formal languages and computability. There are a lot of amazingly beautiful results about these classes of languages and there's no way I can compress them into a single answer.
Hope this helps!

Regular Expression - At most One Repeated Digit [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I am struggling with a homework problem. I have tried this problem for hours literally. I found a similar question here, but it is not exactly my problem.
The homework problem says: "1. (20 points) Construct regular expressions for the following languages."
a) All strings of digits with at most one repeated digit.
The only way I see how this is possible would be to exhaustively somehow take care of every possible case. There are 10 different digits, so it's like A LOT of different cases. I think the max length string can be 11, because after 11, you have to have a second repeated digit. So the number of possible combinations is 10^11. I thought even about writing a DFA and just converting it to a regex, but even that seems like it's impossible.
Does anyone have any advice? We aren't allowed to use non-standard regex features, like groups, lookahead, etc. This is just a plain old regex kind of problem.
Response to comment:
It is not binary. I already asked the teacher.
"Commenters, “regular expression” has one well-defined meaning in computer science. Since this is homework, it’s almost certainly that which is meant (and even more so as it talks of “languages”), and not some specific library. There’s no ambiguity here, and no clarification needed." This is basically what we want. The standard regex stuff often used in theoretical CS classes. As far as what we learned in class, I go to USC if anyone is familiar with that and we only barely talked about this at all. We're onto a completely different topic now.
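For reference, a small membership checker can pin down which strings are meant before attempting the regex itself. This sketch assumes the reading used in the question (every digit is distinct, except that at most one digit may appear exactly twice, so the longest string has 11 characters). The function name at_most_one_repeat is made up, and other readings of the assignment are possible.

```python
from collections import Counter


def at_most_one_repeat(s: str) -> bool:
    """Check membership under the assumed reading of the homework language."""
    # Only the ten decimal digits are allowed; the empty string is accepted.
    if any(ch not in "0123456789" for ch in s):
        return False
    counts = Counter(s)
    repeated = [digit for digit, n in counts.items() if n > 1]
    if not repeated:
        return True
    # Exactly one digit is repeated, and it occurs exactly twice.
    return len(repeated) == 1 and counts[repeated[0]] == 2


print(at_most_one_repeat("01234567899"))  # length 11, only 9 is repeated
print(at_most_one_repeat("0011"))         # two different repeated digits
```

Under this reading the language is finite, so it is regular, even though writing the expression out by hand is tedious.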

NFA DFA and Regex to Transition Table [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I have been looking for an algorithm that takes a regular expression or a string as input, converts it into an NFA and then a DFA, and prints out the transition table of the resulting DFA.
I am thus wondering if there is already an algorithm or a C or Python library that does that, or if you have suggestions of algorithms that I could implement.
Thank you.
I'm not certain if either of these links might help you.
The first provides a very simple NFA/DFA implementation in Python, with conversion from NFA to DFA. It doesn't generate the NFA from a regex, though that is not so difficult to do. The second site provides a long discussion on NFA vs DFA, including numerous code examples (mostly in C) and links to external libraries that I know little of. The third and fourth links provide the source code of two regex engine implementations developed by the second article's author, including parsing from regex to NFA, then conversion from NFA to DFA. Note however that I haven't had a look at either of these projects.
https://gist.github.com/Arachnid/491973
http://swtch.com/~rsc/regexp/
https://code.google.com/p/re1/source/browse/
https://code.google.com/p/re2/source/browse/
Otherwise, I would mention that most real-world regex engines use NFAs, not DFAs, because of some extended features that simply can't be performed with a DFA. So if none of the links above can help you, then you might have some luck looking at compiler-compilers, since they are the ones that actually use DFAs.
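If you end up implementing it yourself, the NFA-to-DFA step is the classic subset construction. Below is a minimal sketch, not taken from any of the links above. The representation (a dict from (state, symbol) to a set of states, with None standing for an epsilon move), the function names, and the sample NFA for (a|b)*ab are all assumptions made for this example.

```python
from collections import deque


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction; returns (start_set, transition_table, final_sets)."""

    def closure(states):
        # Follow epsilon moves (symbol None) until no new state appears.
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in nfa.get((q, None), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return frozenset(seen)

    start_set = closure({start})
    table = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in table:
            continue
        table[current] = {}
        for sym in alphabet:
            moved = set()
            for q in current:
                moved |= set(nfa.get((q, sym), ()))
            target = closure(moved)
            table[current][sym] = target
            if target not in table:
                queue.append(target)
    finals = {s for s in table if s & set(accepting)}
    return start_set, table, finals


def format_table(start_set, table, finals, alphabet):
    """Render the DFA as text; '->' marks the start state, '*' a final state."""
    def name(s):
        return "{" + ",".join(str(q) for q in sorted(s)) + "}"

    lines = ["state\t" + "\t".join(alphabet)]
    for state, row in table.items():
        mark = ("->" if state == start_set else "") + ("*" if state in finals else "")
        lines.append(mark + name(state) + "\t" + "\t".join(name(row[a]) for a in alphabet))
    return "\n".join(lines)


# Hypothetical NFA for (a|b)*ab with states 0, 1, 2 and accepting state 2.
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
start_set, table, finals = nfa_to_dfa(nfa, 0, {2}, "ab")
print(format_table(start_set, table, finals, "ab"))
```

Each DFA state is a frozenset of NFA states, which keeps the table hashable. Building the NFA from a regex (for example with Thompson's construction) would be a separate step in front of this.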

C++ Business rule expression parser/evaluation [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I'm looking for suggestions of portable lightweight libraries written in C++, that support mathematical and business rule expression and evaluation. I understand C++ doesn't provide such functionality in the STL.
The basic requirement is as follows:
The expressions to be evaluated will be comprised of numbers and strings and variables either representing numbers or strings.
Some of the expressions are expected to be evaluated many times per second (1000-2000 times), hence there is a requirement for high performance evaluations of the expressions.
Originally, in the project at my company, we encoded all the business rules as classes derived from a base expression class. The problem is that this approach does not scale well as the number of expressions increases.
I've googled around, but most "libraries" I could find are pretty much simple examples of the shunting yard algorithm. Most of the expression parsers perform parsing and evaluation in the same step, making them unsuitable for continuous reevaluation, and most only support numbers.
What I'm looking for:
Library written in C++ (C++03 or C++11)
Stable/production worthy
Fast evaluations
Portable (win32/linux)
Any suggestions for building a high-performance business rules engine are also welcome.
Example business rule:
'rule_result = (remaining_items < min_items) and (item == "beach ball")'
See the C++ Mathematical Expression Library outlined in this answer.
But, if you really want speed, consider compiling the expressions as C/C++ directly, then load them dynamically (shared objects/DLLs).
Have you considered generating your own parser with Bison + Flex? It uses an FSM-based LALR parser implementation that is fast and easy to write, and it supports evaluation of expressions while you're parsing them, as well as AST generation for repeated evaluation.
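The key design point running through these answers is separating the parse step from the evaluation step, so that a rule is parsed once and evaluated many times. The sketch below only illustrates that split and is not a C++ library suggestion. It uses Python's built-in compile and exec, and works only because the example rule happens to be valid Python syntax. The evaluate helper is made up for this example, and exec is not a sandbox, so this assumes trusted rule text.

```python
RULE = 'rule_result = (remaining_items < min_items) and (item == "beach ball")'

# Parse and compile once; this is the expensive step.
COMPILED_RULE = compile(RULE, "<rule>", "exec")


def evaluate(variables: dict) -> bool:
    """Run the precompiled rule against one set of variable values."""
    env = dict(variables)  # copy so the caller's dict is not modified
    # Empty builtins keep the rule from calling arbitrary functions by name.
    exec(COMPILED_RULE, {"__builtins__": {}}, env)
    return env["rule_result"]


print(evaluate({"remaining_items": 3, "min_items": 5, "item": "beach ball"}))
print(evaluate({"remaining_items": 7, "min_items": 5, "item": "beach ball"}))
```

A C++ engine would follow the same shape: build an AST or bytecode once, then walk or run it per evaluation with a fresh variable table.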