Library to check if two regular expressions are equal/isomorphic [closed] - c++

I need a library that will take in two regular expressions and determine whether they are isomorphic, i.e. whether they match exactly the same set of strings.
For example, a|b is isomorphic to [ab].
As I understand it, a regular expression can be converted to an NFA, which can in turn be converted to a DFA (possibly at the cost of an exponential blow-up in states). The DFA can then be converted to a minimal DFA which, if I understand correctly, is unique up to isomorphism, so these minimal DFAs can be compared for equality. I realize that not all patterns can be handled this way (especially patterns generated from Perl regexps, which are not truly "regular"), in which case ideally the library would just return an error or some other indication that the conversion is not possible.
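For concreteness, the final comparison step could look something like the following Python sketch. The two DFAs here are hypothetical hand-built ones for a|b and [ab], standing in for whatever the library would produce from the regexes, and the check searches the product automaton directly rather than comparing minimized DFAs:

from collections import deque

ALPHABET = "ab"

# Hand-built complete DFA for "a|b": state 0 is the start, 1 accepts, 2 is a sink.
dfa1 = {"start": 0, "accept": {1},
        "delta": {(0, "a"): 1, (0, "b"): 1,
                  (1, "a"): 2, (1, "b"): 2,
                  (2, "a"): 2, (2, "b"): 2}}

# Same language as "[ab]", with a deliberately different state structure.
dfa2 = {"start": "q0", "accept": {"qa", "qb"},
        "delta": {("q0", "a"): "qa", ("q0", "b"): "qb",
                  ("qa", "a"): "dead", ("qa", "b"): "dead",
                  ("qb", "a"): "dead", ("qb", "b"): "dead",
                  ("dead", "a"): "dead", ("dead", "b"): "dead"}}

def equivalent(d1, d2):
    # BFS over the product automaton: the DFAs accept the same language
    # iff no reachable state pair disagrees on acceptance.
    seen, queue = set(), deque([(d1["start"], d2["start"])])
    while queue:
        pair = queue.popleft()
        if pair in seen:
            continue
        seen.add(pair)
        s1, s2 = pair
        if (s1 in d1["accept"]) != (s2 in d2["accept"]):
            return False  # some string reaches this pair and distinguishes them
        for c in ALPHABET:
            queue.append((d1["delta"][(s1, c)], d2["delta"][(s2, c)]))
    return True

print(equivalent(dfa1, dfa2))  # True: a|b and [ab] match the same strings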
I see tons of articles and academic papers online about doing this (and even some programming assignments asking students to do it), but I can't seem to find a library that implements this functionality. I would prefer a Python and/or C/C++ library, but a library in any language will do. Does anyone know of such a library? If not, does anyone know of a library that comes close, which I could use as a starting point?

Haven't tried it, but Regexp::Compare for Perl looks promising: two regexes are equivalent if the language of the first is a subset of the language of the second, and vice versa.

The brics automaton library for Java supports this.
It can be used to convert regular expressions to minimal Deterministic Finite State Automata, and check if these are equivalent:
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

// Returns true when the two regexes accept exactly the same language.
public static boolean isIsomorphic(String regexA, String regexB) {
    Automaton a = new RegExp(regexA).toAutomaton();
    Automaton b = new RegExp(regexB).toAutomaton();
    return a.equals(b);
}
Note that this library only works for regular expressions that describe a regular language: it does not support some more advanced features, such as backreferences.

Related

Among regular and context-free grammars, which one is more powerful, and why? [closed]

I was just going through the principles of programming languages. I know the concepts of regular and context-free grammars and their usage, but I am still unable to decide which one is more powerful, and why. Please help.
Thanks in advance.
Every regular language is context-free, but some context-free languages are not regular. In that sense, the context-free languages are more "powerful" than the regular languages.
As one example of a nonregular language that is context-free, consider the language of all palindromes made up of the characters a and b. You can prove that this language is nonregular by using the pumping lemma or the Myhill-Nerode theorem. However, it is context-free, since it's generated by the grammar
S → aSa | bSb | a | b | ε
Intuitively, regular languages correspond to yes/no questions that can be solved on a computer with finite memory (the Myhill-Nerode theorem is one way of formalizing this intuition). This means that any yes/no problem that can't be solved with only finite memory therefore won't correspond to regular languages. Context-free languages occupy a strange middle ground - they correspond to problems that can be solved on computers with finite memory and an unbounded stack.
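As a toy illustration of that middle ground, here is a Python sketch of a palindrome checker driven by a single explicit stack. Note the cheat: a program can compute the midpoint from len(s), whereas a true pushdown automaton would have to guess it nondeterministically:

def is_palindrome(s):
    stack = []
    mid, odd = divmod(len(s), 2)
    for ch in s[:mid]:           # push the first half
        stack.append(ch)
    for ch in s[mid + odd:]:     # pop while reading the second half
        if not stack or stack.pop() != ch:
            return False
    return not stack

print(is_palindrome("abba"), is_palindrome("abab"))  # True False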
If you're interested in learning more about this, I'd recommend reading through a book on formal languages and computability. There's a lot of amazingly beautiful results about these classes of languages and there's no way I can compress it into a single answer.
Hope this helps!

NFA DFA and Regex to Transition Table [closed]

I have been looking for an algorithm that takes as input a regular expression or a string, converts it into an NFA and then a DFA, and prints out the transition table of the resulting DFA.
I am thus wondering if there is already an algorithm or a C or Python library that does this, or if you have suggestions of algorithms I could implement.
Thank you.
I'm not certain whether any of these links will help you.
The first provides a very simple NFA/DFA implementation in Python, with conversion from NFA to DFA. It doesn't generate the NFA from a regex, though that is not difficult to add. The second site provides a long discussion of NFA vs. DFA, including numerous code examples (mostly in C) and links to external libraries that I know little about. The third and fourth links provide the source code of two regex engine implementations developed by the second article's author, including parsing from regex to NFA and conversion from NFA to DFA. Note, however, that I haven't had a close look at either of these projects.
https://gist.github.com/Arachnid/491973
http://swtch.com/~rsc/regexp/
https://code.google.com/p/re1/source/browse/
https://code.google.com/p/re2/source/browse/
Otherwise, I would mention that most real-world regex engines use NFAs, not DFAs, because of some extended features that simply can't be implemented with a DFA. So if none of the links above can help you, you might have some luck looking at compiler-compilers, since they are the ones that actually use DFAs.
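If you end up rolling your own, the core is small. Here is a minimal Python sketch of the subset construction that prints a transition table; the NFA is a hypothetical hand-written one for the regex ab* (a real tool would first build the NFA from the regex, e.g. via Thompson's construction):

# NFA as a dict mapping (state, symbol) -> set of states; "" marks epsilon moves.
nfa = {(0, "a"): {1}, (1, "b"): {1}}
nfa_start, nfa_accept, alphabet = 0, {1}, "ab"

def eps_closure(states):
    # Follow epsilon moves until no new state appears.
    closure, stack = set(states), list(states)
    while stack:
        for t in nfa.get((stack.pop(), ""), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction():
    start = eps_closure({nfa_start})
    names, table, work = {start: 0}, {}, [start]
    while work:
        cur = work.pop()
        for c in alphabet:
            nxt = eps_closure({t for s in cur for t in nfa.get((s, c), ())})
            if nxt not in names:          # newly discovered DFA state
                names[nxt] = len(names)
                work.append(nxt)
            table[(names[cur], c)] = names[nxt]
    accepting = sorted(names[s] for s in names if s & nfa_accept)
    return table, len(names), accepting

table, n_states, accepting = subset_construction()
print("state  " + "  ".join(alphabet) + "    accepting:", accepting)
for state in range(n_states):
    print("%5d  %s" % (state, "  ".join(str(table[(state, c)]) for c in alphabet)))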

C++ Business rule expression parser/evaluation [closed]

I'm looking for suggestions for portable, lightweight libraries written in C++ that support parsing and evaluation of mathematical and business-rule expressions. I understand C++ doesn't provide such functionality in the standard library.
The basic requirement is as follows:
The expressions to be evaluated will consist of numbers, strings, and variables representing either numbers or strings.
Some of the expressions are expected to be evaluated many times per second (1000-2000 times), hence the requirement for high-performance evaluation.
In the original version of the project at my company, we encoded all the business rules as classes derived from a base expression class. The problem is that this approach does not scale well as the number of expressions increases.
I've googled around, but most "libraries" I could find are little more than examples of the shunting-yard algorithm; most of the expression parsers perform parsing and evaluation in the same step, making them unsuitable for continuous re-evaluation, and most support only numbers.
What I'm looking for:
Library written in C++ (C++03 or C++11)
Stable/production worthy
Fast evaluations
Portable (win32/linux)
Any suggestions for building a high-performance business rules engine are also welcome.
Example business rule:
'rule_result = (remaining_items < min_items) and (item == "beach ball")'
See the C++ Mathematical Expression Library outlined in this answer.
But if you really want speed, consider compiling the expressions to C/C++ directly, then loading them dynamically (shared objects/DLLs).
Have you considered generating your own parser with Bison + Flex? It uses a fast, FSM-based LALR parser implementation that is easy to write, and it supports evaluating expressions while you parse them, as well as generating an AST for repeated evaluation.
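To make the parse-once / evaluate-many split concrete, here is a minimal sketch using Python's own compiler (illustration only; a C++ engine built with one of the libraries above would follow the same shape: parse the rule to an intermediate form once, then evaluate it thousands of times against fresh variable bindings):

# The right-hand side of the example rule from the question.
rule_src = '(remaining_items < min_items) and (item == "beach ball")'
code = compile(rule_src, "<rule>", "eval")      # parse and compile once

def evaluate(compiled_rule, variables):
    # eval() of untrusted input is unsafe; acceptable in a sketch where
    # the rules come from our own configuration.
    return eval(compiled_rule, {"__builtins__": {}}, variables)

print(evaluate(code, {"remaining_items": 3, "min_items": 5, "item": "beach ball"}))  # True
print(evaluate(code, {"remaining_items": 9, "min_items": 5, "item": "beach ball"}))  # False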

Linux C or C++ library to diff and patch strings? [closed]

Possible Duplicate:
Is there a way to diff files from C++?
I have long text strings that I wish to diff and patch. That is, given strings a and b:
string a = ...;
string b = ...;
string a_diff_b = create_patch(a,b);
string a2 = apply_patch(a_diff_b, b);
assert(a == a2);
If a_diff_b were human-readable, that would be a bonus.
One way to implement this would be to use system(3) to call the diff and patch shell commands from diffutils and pipe the strings to them. Another way would be to implement the functions myself (I was thinking of treating each line atomically and using the standard O(n^2) edit-distance dynamic program line-wise, with backtracking to recover the edits).
I was wondering if anyone knows of a good Linux C or C++ library that would do the job in-process?
You could google implementations of the Myers diff algorithm ("An O(ND) Difference Algorithm and Its Variations") or libraries that solve the "longest common subsequence" problem.
As far as I know, the situation with diff/patch in C++ isn't good - there are several libraries (including diff-match-patch and libmba), but in my experience they're either somewhat poorly documented, have heavy external dependencies (diff-match-patch requires Qt 4, for example), are specialized on a type you don't need (std::string when you need Unicode, for example), aren't generic enough, or use a generic algorithm with very high memory requirements ((M+N)^2, where M and N are the lengths of the input sequences).
You could also try to implement the Myers algorithm (O(M+N) memory) yourself, but it is extremely difficult to understand - expect to spend at least a week reading documentation. A somewhat human-readable explanation of the Myers algorithm is available here.
I believe that https://github.com/cubicdaiya/dtl/wiki/Tutorial may have what you need.
http://code.google.com/p/google-diff-match-patch/
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.
Currently available in Java, JavaScript, Dart, C++, C#, Objective C, Lua and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.
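For prototyping the exact round-trip the question describes before committing to a C++ library, Python's standard difflib is a convenient reference. Note one divergence from the question's API: an ndiff-style patch records both sides, so recovery doesn't actually need the second string (a sketch, not one of the libraries above):

import difflib

def create_patch(a, b):
    # ndiff output is line-based and human-readable ("- ", "+ ", "  " prefixes).
    return list(difflib.ndiff(a.splitlines(keepends=True),
                              b.splitlines(keepends=True)))

def apply_patch(patch, which):
    # restore() rebuilds either side: which=1 recovers a, which=2 recovers b.
    return "".join(difflib.restore(patch, which))

a = "the quick\nbrown fox\njumps\n"
b = "the quick\nred fox\njumps high\n"
patch = create_patch(a, b)
assert apply_patch(patch, 1) == a and apply_patch(patch, 2) == b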

BNF grammar test case generation [closed]

Does anyone have any experience with a tool that generates test strings from a BNF grammar that could then be fed into a unit test?
I don't have an answer to the tool question, but I will say it is fairly easy in any text-processing language (Perl/Python/etc.) to randomly generate sentences from a BNF grammar, and slightly more verbose in a bigger language (Java/C/etc.); it shouldn't be too hard to roll your own (see the sketch below).
The problem with this, of course, is that it can only generate strings in the grammar, and unless your grammar is very simple, the test space is infinitely large.
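To make the roll-your-own option concrete, here is a minimal Python sketch; the grammar is a made-up toy, encoded as a dict from nonterminal to alternative productions:

import random

grammar = {
    "<expr>":  [["<term>"], ["<term>", "+", "<expr>"]],
    "<term>":  [["<digit>"], ["(", "<expr>", ")"]],
    "<digit>": [["0"], ["1"], ["2"]],
}

def generate(symbol="<expr>", depth=0, max_depth=8):
    if symbol not in grammar:                # terminal: emit as-is
        return symbol
    rules = grammar[symbol]
    # Past the depth limit, always take the first alternative, which is
    # non-recursive in this grammar, so expansion terminates.
    rule = rules[0] if depth >= max_depth else random.choice(rules)
    return "".join(generate(s, depth + 1, max_depth) for s in rule)

for _ in range(5):
    print(generate())   # e.g. (1+0)+2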
I've done exactly as hazzen commented (using an embedded DSL in a scripting language). It was a mildly interesting exercise, but except for the most basic tests of, e.g., parsing, it wasn't terribly useful. Most of my interesting tests involve more sophisticated relationships than one can easily express in BNF (or any other context-free grammar).
If, say, you're developing a compiler, then you likely have an abstract syntax tree datatype. If so, you could write a function to generate a random AST; with that, you can print it to a string and feed the result to your unit test. It's guaranteed to be a valid program this way, since you started from your AST.
If I were writing a compiler in Haskell or ML, this is what I would do, using QuickCheck.
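A minimal Python sketch of that random-AST idea, for a toy expression language (the node types here are invented for illustration):

import random

class Num:
    def __init__(self, n): self.n = n
    def show(self): return str(self.n)

class Add:
    def __init__(self, left, right): self.left, self.right = left, right
    def show(self): return "(%s + %s)" % (self.left.show(), self.right.show())

def random_ast(depth=3):
    # Leaves become more likely as the depth budget runs out.
    if depth == 0 or random.random() < 0.3:
        return Num(random.randint(0, 9))
    return Add(random_ast(depth - 1), random_ast(depth - 1))

print(random_ast().show())   # e.g. ((3 + 7) + 2) -- always syntactically valid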
Gramtest is one such tool; it can generate strings from arbitrary user-defined BNF grammars. You can read more about the algorithm behind Gramtest here, and some practical tips on the tool are available here.