I would like to expand the following expression in sympy. This expression uses Einstein summation convention:
I have found a previous post addressing a similar situation. Also the docs provide some information, but it is too abstract to put into use.
How can I expand the expression using sympy while respecting the rules for summation convention? Thanks.
I'm using sympy to generate expressions like this:
for crowd in itertools.combinations(symbs, max_true + 1):
    exprs.append(functools.reduce(operator.and_, crowd))
unaltered = ~functools.reduce(operator.or_, exprs)
Later, I convert them to CNF:
altered = sympy.logic.boolalg.to_cnf(unaltered, simplify=True, force=True)
It takes a lot of computer time. I made a gist with more details:
https://gist.github.com/MatrixManAtYrService/501ea099826a5aeeacc9368710b059ec
Given that I'm generating expressions with for loops, they're in a reliable format. Sympy (understandably) is doing the exhaustive thing and solving them "by hand", because it doesn't know that they're so well behaved. A human who is looking at the unaltered/altered expressions can easily ascertain the pattern and just generate the CNF directly with a for loop.
That's easy enough in this case, but I expect to have more constraints.
I want to know if I'm in uncharted territory, or just failing to ask for help correctly.
Does Sympy have anything to help with this kind of thing? Is there another library I should explore? Is there a name for the "look at it and extrapolate based on a pattern" strategy that I'm proposing? Is there a list of algorithms for the task somewhere?
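For the specific expression above, the "generate the CNF directly" idea can be sketched as follows. By De Morgan's laws, the negation of an OR of ANDs is an AND of ORs of negated literals, so each group of max_true + 1 symbols contributes exactly one clause. The function name at_most_cnf is made up for this illustration; it only relies on the operands supporting ~, | and &, as sympy symbols do, and it assumes max_true < len(symbs).

```python
import functools
import itertools
import operator


def at_most_cnf(symbs, max_true):
    """Build "at most max_true of symbs are true" directly as a conjunction of clauses.

    ~((a & b) | (a & c) | ...) is equivalent to (~a | ~b) & (~a | ~c) & ...,
    so every (max_true + 1)-sized group becomes one clause of negated literals.
    Assumes max_true < len(symbs), so that at least one group exists.
    """
    clauses = [
        functools.reduce(operator.or_, [~s for s in crowd])
        for crowd in itertools.combinations(symbs, max_true + 1)
    ]
    return functools.reduce(operator.and_, clauses)
```

Called with sympy symbols, e.g. at_most_cnf(sympy.symbols('a b c d'), 2), the result should already be a conjunction of disjunctions, so no to_cnf call is needed for this constraint. Other constraint shapes would need their own derivation.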
I am currently using QuadProg++ for solving a dual problem. The problem also has some box constraints, i.e. constraints which limit the variable to be between two values. However, QuadProg++ has no provision which allows for incorporating such constraints. It only takes in the equality and inequality constraints. The equivalent Quadratic Programming tool in MATLAB, on the other hand, does have a provision for including box constraints.
You can take a look at the following link to see what I'm talking about:
http://www.mathworks.in/help/optim/ug/quadprog.html
Basically, I have a constraint equivalent to lb < x < ub.
I tried adding this as an inequality constraint, but it doesn't work. It results in an error, saying that the constraints are linearly dependent. However, I'm pretty sure that the constraints I'm inputting are in no way linearly dependent on each other.
Please suggest a workaround, or some other quadratic programming tool in C++, which can be of help for me. Thanks!
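For reference, this is how I understand the box constraints would be written as inequality constraints. The sketch below is plain Python rather than C++, and it assumes QuadProg++'s convention that inequality constraints have the form CI^T x + ci0 >= 0, with one constraint per column of CI; please check that against the header you are using. The helper names are made up for the illustration. Note that solvers of this kind work with non-strict bounds (lb <= x <= ub).

```python
def box_as_inequalities(lb, ub):
    """Encode lb <= x <= ub as CI^T x + ci0 >= 0 (assumed QuadProg++ convention).

    CI has n rows and 2n columns: column i is +e_i, column n + i is -e_i.
    """
    n = len(lb)
    CI = [[0.0] * (2 * n) for _ in range(n)]
    ci0 = [0.0] * (2 * n)
    for i in range(n):
        CI[i][i] = 1.0         # x_i - lb_i >= 0
        ci0[i] = -lb[i]
        CI[i][n + i] = -1.0    # -x_i + ub_i >= 0
        ci0[n + i] = ub[i]
    return CI, ci0


def satisfies(x, CI, ci0, tol=1e-12):
    """Check CI^T x + ci0 >= 0 for every constraint column."""
    n, m = len(x), len(ci0)
    return all(
        sum(CI[i][j] * x[i] for i in range(n)) + ci0[j] >= -tol
        for j in range(m)
    )
```

The two helpers only build and check the constraint data; they do not explain the "linearly dependent" error, which may come from how the matrices are passed in.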
Given a regular expression, I want to produce the set of strings that that regular expression would match. It is important to note that this set would not be infinite because there would be maximum length for each string. Are there any well known algorithms in place to do this? Are there any research papers I could read to gain insight into this problem?
Thanks.
p.s. Would this sort of question be more appropriate in the theoretical cs stack exchange?
Are there any well known algorithms in place to do this?
In the Perl eco-system the Regexp::Genex CPAN module does this.
In Python, sre_yield generates the matching words. Regex inverter also does this.
A recursive algorithm is described here link1 link2 and several libraries that do this in Java are mentioned here.
Generation of random words/strings that match a given regex: xeger (Python)
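As a rough illustration of the recursive approach (not the code of any of the libraries above), here is a minimal sketch. It assumes the regex has already been parsed into a hypothetical tuple-based AST with four node kinds: ("lit", s), ("alt", l, r), ("cat", l, r) and ("star", x). The length bound is what keeps the result finite.

```python
def enumerate_language(node, max_len):
    """Return the set of strings of length <= max_len matched by the AST node."""
    kind = node[0]
    if kind == "lit":
        return {node[1]} if len(node[1]) <= max_len else set()
    if kind == "alt":
        return enumerate_language(node[1], max_len) | enumerate_language(node[2], max_len)
    if kind == "cat":
        left = enumerate_language(node[1], max_len)
        right = enumerate_language(node[2], max_len)
        return {a + b for a in left for b in right if len(a) + len(b) <= max_len}
    if kind == "star":
        # Drop the empty string so that every round strictly lengthens the words.
        inner = enumerate_language(node[1], max_len) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {
                a + b for a in frontier for b in inner if len(a) + len(b) <= max_len
            } - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind!r}")


# a(b|c)* with words of at most 3 characters
ast = ("cat", ("lit", "a"), ("star", ("alt", ("lit", "b"), ("lit", "c"))))
print(sorted(enumerate_language(ast, 3)))
# ['a', 'ab', 'abb', 'abc', 'ac', 'acb', 'acc']
```

The size of the result can still grow exponentially with max_len, so this is only practical for small bounds.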
Are there any research papers I could read to gain insight into this problem?
Yes, the following papers are available for counting the strings that would match a regex (or obtaining generating functions for them):
Counting occurrences for a finite set of words: an inclusion-exclusion approach by F. Bassino, J. Clement, J. Fayolle, and P. Nicodeme (2007) paper slides
Regexpcount, a symbolic package for counting problems on regular expressions and words by Pierre Nicodeme (2003) paper link link code
I was reading the Java project idea described here:
The user gives examples of what he wants and does not want to match. The program tries to deduce a regex that fits the examples. Then it generates examples that would fit and not fit. The user corrects the examples it got wrong, and it composes a new regex. Iteratively, you get a regex that is close enough to what you need.
This sounds like a really interesting idea to me. Does anyone have an idea of how to do this? My first idea was something like a genetic algorithm, but I would love some input from you guys.
Actually, this starts to look more and more like a compiler application. In fact, if I remember correctly, the Aho Dragon compiler book uses a regex example to build a DFA compiler. That's the place to start. This could be a really cool compiler project.
If that is too much, you can approach it as an optimization in several passes that refine the result further and further, using only predefined algorithms at first:
First pass: Want to match Cat, Catches, Cans
Result: /Cat|Catches|Cans/
Second pass: Look for similar starting conditions:
Result: /Ca(t|tches|ns)/
Third pass: Look for similar ending conditions:
Result: /Ca(t|tche|n)s*/
Fourth pass: Look for more refinements like repetitions and negative conditions
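The "similar starting conditions" pass can be sketched with a trie: put all wanted strings into a prefix tree, then write the tree back out as a regex, so that shared prefixes appear only once. This is just one possible way to do that pass, and the function names are made up for the illustration.

```python
import re


def build_trie(words):
    """Nested dicts keyed by character; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Emit a regex that matches exactly the words stored below this node."""
    ends_here = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    # If a word may also stop at this node, the remaining part is optional.
    return group + "?" if ends_here else group


pattern = trie_to_regex(build_trie(["Cat", "Catches", "Cans"]))
print(pattern)  # Ca(?:ns|t(?:ches)?)
```

The result still matches only the given examples; the later passes (common endings, repetitions) are where real generalization would happen.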
There are algorithms that do exactly this for positive examples.
Regular expressions are equivalent to DFAs (Deterministic Finite Automata).
The strategy is almost always the same:
Look at Alergia (for the theory) and the MDI algorithm (for real usage) if generating a deterministic automaton is enough.
The Alergia algorithm and MDI are both described here:
http://www.info.ucl.ac.be/~pdupont/pdupont/pdf/icml2k.pdf
If you want to generate smaller models you can use another algorithm. The article describing it is here:
http://www.grappa.univ-lille3.fr/~lemay/publi/TCS02.ps.gz
His homepage is here:
http://www.grappa.univ-lille3.fr/~lemay
If you want to use negative examples, I suggest using a simple rule (coloring) that prevents two states of the DFA from being merged.
If you ask these people, I am sure they will share their source code.
I made the same kind of algorithm during my Ph.D. for probabilistic automata. That means you can associate a probability with each string, and I wrote a C++ program that "learns" weighted automata.
Mainly, these algorithms work like this:
from positive examples: {abb, aba, abbb}
create the simplest DFA that accepts exactly these examples:
-> x -- a --> (y) -- b --> (z)
                \
                  b --> t -- b --> (v)
For example, x can go to state y by reading the letter 'a'.
The states are x, y, z, t and v. (z) means it is a final state.
Then "merge" states of the DFA (here, for example, is the result after merging states y and t):
                _
               / \
               v  | a,b ( <- this is a loop :-) )
x -- a -> (y,t) _/
The new state (y,t) is a terminal state obtained by merging y and t, and you can read the letters a and b from it.
Now the DFA can accept: a(a|b)* and it is easy to construct the regular expression from the DFA.
Which states to merge is a choice that makes the main difference between algorithms.
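To make the "build an automaton, then merge states" idea concrete, here is a minimal sketch. It is not Alergia or MDI: it builds a prefix tree acceptor whose states are the prefixes of the examples, and merges one hand-picked pair of states by renaming. The merged automaton may be nondeterministic, so acceptance is checked by tracking a set of states. All names are made up for the illustration.

```python
def build_pta(examples):
    """Prefix tree acceptor: each state is a prefix of some example."""
    transitions = {}  # (state, letter) -> set of target states
    finals = set()
    for word in examples:
        for i, letter in enumerate(word):
            transitions.setdefault((word[:i], letter), set()).add(word[: i + 1])
        finals.add(word)
    return transitions, finals


def merge(transitions, finals, keep, drop):
    """Merge state `drop` into state `keep` by renaming it everywhere."""
    def rename(state):
        return keep if state == drop else state

    merged = {}
    for (src, letter), targets in transitions.items():
        merged.setdefault((rename(src), letter), set()).update(
            rename(t) for t in targets
        )
    return merged, {rename(f) for f in finals}


def accepts(transitions, finals, word, start=""):
    """Simulate the (possibly nondeterministic) automaton on `word`."""
    current = {start}
    for letter in word:
        current = set().union(
            *(transitions.get((s, letter), set()) for s in current)
        )
    return bool(current & finals)


pta = build_pta(["abb", "aba", "abbb"])
print(accepts(*pta, "abbbb"))      # False: not one of the examples
merged = merge(*pta, keep="abb", drop="abbb")
print(accepts(*merged, "abbbb"))   # True: the merge created a loop on 'b'
```

After this single merge the automaton accepts aba and ab followed by one or more b's. A real algorithm decides which pairs to merge with a statistical or structural test.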
The program tries to deduce a regex that fits the examples
I don't think it's a useful question to ask. You have to know semantically what you need to represent in order to deduce something. When you write a regex, you have a purpose: accepting URLs, accepting emails, extracting tokens from code, etc. I would redefine the question like this: given a knowledge base and a semantics for regular expressions, compute the smallest regex. This goes a step further, because you have natural language trying to explain a general expression, and we all know how ambiguous that gets! You need some semantic explanation. Without that, the best you can do with examples is to compute a regex that covers all strings from the ok list.
Algorithm for coverage:
Populate Ok List
Populate Not ok List
Check for repetitions
Check for contradictions (the same string cannot be in both lists)
Create Deterministic Finite Automata (DFA) from Ok List where strings from the list are final states
Simplify the DFA by eliminating repetitive states. ([1] 4.4 )
Convert DFA to regular expression. ([1] 3.2.2 )
Test Ok list and Not ok list
[1] Introduction to Automata Theory, Languages, and Computation. J. Hopcroft, R. Motwani, J.D. Ullman, 2nd Edition, Pearson Education.
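A rough sketch of the bookkeeping steps in this algorithm, with a loud caveat: the DFA construction, simplification and DFA-to-regex conversion are replaced here by a plain alternation of the escaped ok strings, which covers the ok list but does no minimization at all. The function name cover is made up for the illustration.

```python
import re


def cover(ok, not_ok):
    """Return a regex covering the ok list, after checking both lists."""
    ok_set, not_ok_set = set(ok), set(not_ok)  # sets drop repetitions
    clash = ok_set & not_ok_set
    if clash:
        raise ValueError(f"strings in both lists: {sorted(clash)}")

    # Stand-in for the DFA steps: one alternative per ok string.
    pattern = "|".join(re.escape(s) for s in sorted(ok_set))
    regex = re.compile(pattern)

    # Final test against both lists.
    if not all(regex.fullmatch(s) for s in ok_set):
        raise AssertionError("pattern misses a string from the ok list")
    if any(regex.fullmatch(s) for s in not_ok_set):
        raise AssertionError("pattern matches a string from the not ok list")
    return pattern


print(cover(["cat", "cat", "cans"], ["dog"]))  # cans|cat
```

With this stand-in the final test can never fail. It becomes useful once the alternation is replaced by something that generalizes.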
P.S.
I have some experience with genetic programming and I think it's really overkill for your problem. Since the objective function is really light, it's better to evaluate it on a single processor, and this can take a lot of time. To get a shorter expression you just need to minimize the DFA. But a GA may possibly produce interesting results.
Maybe I'm a bit late, but there is a way to solve this problem by means of Genetic Programming.
Genetic Programming (GP) is an evolutionary machine learning technique in which a candidate solution for a given problem is represented as an abstract syntax tree.
Several studies have been published on how to use GP in order to find a regular expression that matches a given set of examples.
You can find the articles and the details here
A webapp that does this is hosted at regex.inginf.units.it.
The source code behind the application has been publicly released on GitHub.
You may try a basic inference algorithm that has been used in other applications. I have implemented a very basic one based on building a state machine. However, it only accounts for positive samples. The source code is at http://github.com/mvaled/inferdtd
You may be interested in AutomataInferrer.py, which is very simple.
RegexBuilder seems to have many of the features you're looking for.
Does anyone know any examples of the following?
Proof developments about regular expressions (possibly extended with backreferences) in proof assistants (such as Coq).
Programs in dependently-typed languages (such as Agda) about regular expressions.
Certified Programming with Dependent Types has a section on creating a verified regular expression matcher. Coq Contribs has an automata contribution that might be useful. Jan-Oliver Kaiser formalized the equivalence between regular expressions, finite automata and the Myhill-Nerode characterization in Coq for his bachelor's thesis.
Moreira, Pereira & de Sousa, On the Mechanisation of Kleene Algebra in Coq gives a nice verified construction of the Antimirov derivative of regexps in Coq. It's pretty easy to read off a CFA from this construction, and to compute the intersection of regexps.
I'm not sure why you separate Coq from dependently typed programming: Coq essentially is programming in a polymorphic dependently typed lambda calculus with inductive types (i.e., CIC, the calculus of inductive constructions).
I've never heard of a formalisation of regexps in a dependently typed language, nor have I heard of something such as an Antimirov-like derivative for regexps with backtracking, but Becchi & Crowley, Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions provide a notion of finite-state automata that match Perl-like regexp languages. That might be attractive to formalisers in the near future.
See Perl Regular Expression Matching is NP-Hard
Regex matching is NP-hard when regexes are allowed to have backreferences.
Reduction of 3-CNF-SAT to Perl Regular Expression Matching
[...] 3-CNF-SAT is NP-complete. If there were an efficient (polynomial-time) algorithm for computing whether or not a regex matched a certain string, we could use it to quickly compute solutions to the 3-CNF-SAT problem, and, by extension, to the knapsack problem, the travelling salesman problem, etc. etc.
I don't know of any development that treats regular expressions by themselves.
Finite automata, however, which are relevant since NFAs are the standard way to match those regular expressions, have been studied in NuPRL. Have a look at: Robert L. Constable, Paul B. Jackson, Pavel Naumov, Juan Uribe. Constructively Formalizing Automata Theory.
Should you be interested in approaching those formal languages through algebra, esp. developing finite semigroup theory, there are a number of algebra libraries developed in various theorem provers that you could think of using, with one particularly efficient in a finite setting.
The proof assistant Isabelle/HOL ships a number of formalized proofs regarding regular expressions (without backreferences):
http://afp.sourceforge.net/browser_info/devel/HOL/Regular-Sets/
(here is a paper by the authors regarding what they did exactly).
Another approach is to characterize regular expressions via the Myhill-Nerode theorem:
http://www.dcs.kcl.ac.uk/staff/urbanc/Publications/itp-11.pdf