regular expression to express number range with ascending order - regex

I have to specify a pair of ports using the format "number1-number2". Number1 and number2 are both in the range [0-65535], and number2 must always be larger than number1.
Is it possible to write a regular expression that expresses the rule "number2 is always larger than number1"?

Extracting the numbers and comparing them in code should be your first choice. There's no good way to do this in regular expressions alone. You should use
\\[(\\d+)-(\\d+)\\]
(written here with the backslashes doubled for a string literal; drop the \\[ and \\] if the input is not wrapped in square brackets) to extract those two numbers and compare them. The conversion from string to integer is minuscule in cost compared with any regex that tries to encode the comparison itself: such a pattern would have to enumerate the valid digit combinations case by case, while extract-and-compare runs in linear time.
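As a rough illustration of the extract-and-compare approach, here is a minimal Python sketch. It assumes the input is exactly "number1-number2" with no brackets or whitespace; the name parse_port_range is made up for this example.

```python
import re

# Assumption: the whole input is "number1-number2", ASCII digits only.
PORT_RANGE = re.compile(r"([0-9]{1,5})-([0-9]{1,5})")

def parse_port_range(text):
    """Return (low, high) for a valid ascending port pair, otherwise None."""
    match = PORT_RANGE.fullmatch(text)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    # The regex only checks the shape; the range and ordering rules are
    # plain integer comparisons.
    if high > 65535 or low >= high:
        return None
    return low, high
```

The regex only validates the shape of the string; the two rules that are awkward to express in a pattern (upper bound and ordering) become one-line comparisons.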

Related

Dynamically Allocating an Array from a Polynomial Function String

So I have a polynomial addition problem like the one below:
(1*x+2*x^3-1*x+7)+(1+1*x^2-1*x+1*x^4)
I need to figure out how to extract the numbers for the coefficients and exponents and enter them into a dynamically allocated 2D array (from here I can sort them and add them together before outputting the answer).
I am pretty lost on how to do this because the polynomials can list their terms in any order of degree and can include any number of terms. I can dynamically allocate the array after I extract all of the numbers. The part I need help with is:
Extracting all of the numbers
Differentiating between them to see if it is a coefficient or an exponent
Allowing this to happen for any number of terms
If anyone could answer this or at least point me in the right direction it would be appreciated.
Your problem looks like parsing and evaluation.
Step 1: Parse the string, assuming an infix expression, so that you can pull out the coefficients.
Step 2: Push those coefficients into a vector, deque, etc. to perform the polynomial calculation.
Here are some good examples:
Evaluating arithmetic expressions from string in C++
What is the best way to evaluate mathematical expressions in C++?
To extract coefficients from a string you need to create a parser. You can use a dedicated library such as Boost.Spirit, a generator tool such as Flex, or write your own by hand, with or without regular expressions.
To store the coefficients you can use std::vector<int>, using the index as the power of x, so for 1*x+2*x^3-1*x+7 your vector would hold:
{ 7, 0, 0, 2 } // 7*x^0 + 0*x^1 + 0*x^2 + 2*x^3 (the 1*x and -1*x terms cancel)
Then you do not need to sort them to add coefficients. To store all the polynomials you would use std::vector<std::vector<int>> accordingly.
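The index-as-exponent idea can be sketched as follows. This is a Python illustration rather than the C++ the question asks about, and it assumes every term looks like "7", "-1*x" or "2*x^3" (integer coefficient, optional *x, optional ^exponent, no spaces). The names parse_polynomial and add_polynomials are made up for the example.

```python
import re
from itertools import zip_longest

# Assumed term grammar: optional sign, integer coefficient, optional "*x",
# optional "^exponent".
TERM = re.compile(r"([+-]?)([0-9]+)(\*x(?:\^([0-9]+))?)?")

def parse_polynomial(text):
    """Return a list of coefficients where the index is the power of x."""
    coeffs = []
    pos = 0
    while pos < len(text):
        match = TERM.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        sign, coeff, x_part, exponent = match.groups()
        # No "*x" means power 0; "*x" without "^n" means power 1.
        power = 0 if x_part is None else int(exponent) if exponent else 1
        value = -int(coeff) if sign == "-" else int(coeff)
        while len(coeffs) <= power:
            coeffs.append(0)
        coeffs[power] += value  # like terms are merged as they are read
        pos = match.end()
    return coeffs

def add_polynomials(a, b):
    """Add two coefficient lists of possibly different lengths."""
    return [x + y for x, y in zip_longest(a, b, fillvalue=0)]
```

Because like terms are merged while parsing, no sorting step is needed. Adding the two polynomials from the question, 1*x+2*x^3-1*x+7 and 1+1*x^2-1*x+1*x^4, gives [8, -1, 1, 2, 1], that is 8 - x + x^2 + 2x^3 + x^4.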

Regular expression to match a missing number in a string sequence

I'm reproducing verbatim a coding exercise statement below:
Given a string consisting only of digits representing a sequence, the goal is to find a missing number. For instance, given the string 596597598600601602 the missing number is 599. You may assume all the numbers are positive integers and the sequence increases by one at each number except the missing number. The numbers will have no more than six digits and the string will have no more than four hundred characters.
I've managed to come up with the solution in C++ code without using any regular expression whatsoever. Will the incorporation of regular expressions in the logic either simplify the logic or reduce the complexity? If yes, some pointers on how I can go about the task would be appreciated.
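For reference, one possible approach without regular expressions is sketched below in Python. It is not the poster's C++ solution, only an illustration of the exercise: try each possible width for the first number, then walk the string expecting either the next number or the one after it. The name find_missing is made up for the example.

```python
def find_missing(digits, max_len=6):
    """Return the missing number, or None if no consistent split exists."""
    for first_len in range(1, min(max_len, len(digits)) + 1):
        current = int(digits[:first_len])
        pos = first_len
        missing = None
        while pos < len(digits):
            expected = str(current + 1)
            skipped = str(current + 2)
            if digits.startswith(expected, pos):
                current += 1
                pos += len(expected)
            elif missing is None and digits.startswith(skipped, pos):
                # Exactly one gap is allowed; remember the number we skipped.
                missing = current + 1
                current += 2
                pos += len(skipped)
            else:
                break  # this first-number width does not fit the string
        else:
            # The whole string was consumed; accept only if a gap was found.
            if missing is not None:
                return missing
    return None
```

The step that matters here is the increment (598 to 599 to 600), which is arithmetic rather than pattern matching, so a regex would at most help with slicing the string.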

How do I find the shortest possible reg exp that accepts a sequence?

I'm looking for a way to find the smallest possible regular expression that accepts a sequence.
To make it interesting, I don't want any stars (Kleene stars), and preferably no wildcards.
For instance, the sequence 'aaaaaaaa' would be accepted by 'a^8', and a^8 would be the shortest possible expression that accepts the sequence.
Does anybody know how to generate such an expression?
The search space for what you are after will most likely grow exponentially as the string grows, since there is usually a large number of regular expressions that can match a given string.
I think that in your case you could try using some search heuristic to try and approximate or even manage to find the optimal solution. I do not think that there is a straightforward solution for that (albeit that is just my opinion).
Given that regular expressions and deterministic finite automata are equivalent, you can minimise a given regular expression by converting it to a DFA and applying any of the DFA minimisation algorithms. You would of course still need a regular expression to start with, but if it only needs to accept one string, the DFA is simply a chain of states with one transition per character of that string. You can then minimise that DFA and convert it back to a regular expression.

find Reg. Expr. over {0,1,2} so last symbol of string is the sum of the symbols so far on the string mod 3.

I'm teaching myself formal languages (Aho, Hopcroft), but I'm having a hard time with regular expressions.
I've been able to tackle simple tasks, but this one has posed a challenge, at least for me. How do you solve this if you can't keep a count of the sum so far? I'm not used to this type of computation.
There must be some property that lets me generalize the answer enough to write it as a regular expression.
So far I've worked out that there may be at least 2 or 3 cases:
sums mod3=0 if sum=3k
sums mod3=1 if sum=3k+1
sums mod3=2 if sum=3k+2.
But I've come to realize that many different combinations can produce a given sum, so I can't find the pattern the regular expression must follow.
For example, the string {122211}0 (braces are only for readability) has the zero at the end because {sum=3k}0 holds. If the sum is 10, as in {1222111}1, the case is {sum=3k+1}, so the one has to be at the end, and so on.
This may or may not be the right track to tackle the problem, but I'm open to any suggestions. Any help is much appreciated.
Here's a hint: think of what distinct final states you can possibly be in. You certainly have at least 3 states, since the sum of the symbols can take three different values mod 3. Also, you need a distinct start state, since the empty string cannot be accepted. Do you need more states?
Hint2: I think you can easily do this with a DFA using a start state and nine other states, of which exactly three will be accepting.
EDIT: Once you have a DFA, you can use Kleene's Theorem to construct an equivalent regular expression. If you'd rather go straight for a regular expression, here's another hint: to any string whose symbols sum to 0 mod 3 you can append 0; to any string whose sum is 1 mod 3 you can append 1; to any string whose sum is 2 mod 3 you can append 2. So if you can write regular expressions for the strings with sums 3k, 3k+1, and 3k+2, you're practically done.
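To make the ten-state DFA from the second hint concrete, here is a small Python simulation. The state encoding is an assumption made for this sketch: besides the start state, each state is a pair (sum of all symbols before the last one mod 3, last symbol), which gives nine states, three of them accepting.

```python
def accepts(string):
    """Simulate the DFA: start state plus nine (prefix_sum_mod_3, last) states."""
    state = "start"
    for ch in string:
        symbol = int(ch)
        if symbol not in (0, 1, 2):
            raise ValueError("alphabet is {0, 1, 2}")
        if state == "start":
            # Nothing precedes the first symbol, so the prefix sum is 0.
            state = (0, symbol)
        else:
            prefix_sum, last = state
            # The previous last symbol now becomes part of the prefix.
            state = ((prefix_sum + last) % 3, symbol)
    # Accepting states are (0, 0), (1, 1) and (2, 2).
    return state != "start" and state[0] == state[1]
```

With this encoding, accepts("1222110") and accepts("12221111") are both true (the two examples from the question), while the empty string is rejected.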

Distance between regular expression

Can we compute some sort of distance between regular expressions?
The idea is to measure how similar two regular expressions are.
You can build deterministic finite-state machines for both regular expressions and compare their transitions. The difference between the two sets of transitions can then be used to measure the distance between the regular expressions.
There are a few metrics you could use:
The length of a valid match. Some regexes have a fixed length, some an upper limit and some a lower limit. Compare how similar their lengths or possible lengths are.
The characters that match. Any regex will have a set of characters a match can contain (maybe all characters). Compare the set of included characters.
Use a large document and see how many matches each regex makes and how many of those are identical.
Are you looking for strict equivalence?
I suppose you could compute a Levenshtein distance between the actual regular expression strings. That's certainly one way of measuring a "distance" between two different regular expression strings.
Of course, I think it's possible that regular expressions are not required here at all, and computing the Levenshtein Distance of the actual "value" strings that the Regular Expressions would otherwise be applied to, may yield a better result.
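For illustration, the Levenshtein distance can be computed with the usual row-by-row dynamic programming. This Python sketch treats the two regexes purely as text, so it says nothing about whether they match the same strings.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]
```

For example, levenshtein("a[0-9]*", "a[0-7]*") is 1, because the two patterns differ in a single character.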
If you have two regular expressions and have a set of example inputs you could try matching every input against each regex. For each input:
If they both match or both don't match, score 0.
If one matches and the other doesn't, score 1.
Sum this score over all inputs, and this will give you a 'distance' between the regular expressions. This will give you an idea of how often two regular expressions will differ for typical input. It will be very slow to calculate if your sample input set is large. It won't work at all if both regexes fail to match for almost all random strings and your expected input is entirely random. For example the regex 'sgjlkwren' and the regex 'ueuenwbkaalf' would probably both never match anything if tested on random input, so this metric would say the distance between them is zero. That might or might not be what you want (probably not).
You might be able to analyze the structure of the regex and use biased random sampling to deliberately hit strings that match more frequently than in completely random input. For example, if both regex require that the string starts with 'foo', you could make sure that your test inputs also always start with foo, to avoid wasting time testing strings that you know will fail for both.
So in conclusion: unless you have a very specific situation with a restricted input set and/or a restricted regular expression language, I'd say it's not possible. If you do have some restrictions on your input and on the regular expressions, it might be possible. Please specify what these restrictions are and maybe I can come up with something better.
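The sample-based score described in this answer might look like the Python sketch below. It assumes each sample must match a pattern in full (re.fullmatch); if a substring match is what you care about, re.search would be the alternative. The name sample_distance is made up for the example.

```python
import re

def sample_distance(pattern_a, pattern_b, samples):
    """Count the samples on which exactly one of the two regexes matches."""
    regex_a = re.compile(pattern_a)
    regex_b = re.compile(pattern_b)
    return sum(
        # Score 1 when the two verdicts differ, 0 when they agree.
        (regex_a.fullmatch(s) is None) != (regex_b.fullmatch(s) is None)
        for s in samples
    )
```

With the samples ["a", "a1", "a8", "a99", "b"], the patterns a[0-9]* and a[0-7]* disagree on "a8" and "a99", so the score is 2. As noted above, the result depends entirely on how the samples are chosen.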
There's an answer hidden in an earlier question here on SO: Generating strings from regexes. You can calculate an (asymmetric) distance measure by generating strings using one regex and checking how many of those match the other regex.
This can be optimized by stripping out shared prefixes/suffixes. E.g. a[0-9]* and a[0-7]* share the a prefix, so you can calculate the distance between [0-9]* and [0-7]* instead.
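A proper generator of strings from a regex is beyond a short example, so the Python sketch below substitutes brute-force enumeration: it lists every string over a small alphabet up to a maximum length, keeps those the first regex accepts, and reports the fraction the second regex rejects. The alphabet, the length bound and the name asymmetric_distance are all assumptions made for illustration.

```python
import re
from itertools import product

def asymmetric_distance(pattern_a, pattern_b, alphabet, max_len):
    """Fraction of strings accepted by pattern_a that pattern_b rejects."""
    regex_a = re.compile(pattern_a)
    regex_b = re.compile(pattern_b)
    generated = rejected = 0
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if regex_a.fullmatch(candidate):
                generated += 1
                if not regex_b.fullmatch(candidate):
                    rejected += 1
    return rejected / generated if generated else 0.0
```

The measure is asymmetric: every string matched by [0-7]* also matches [0-9]*, so the distance in that direction is 0, while the reverse direction is positive as soon as the alphabet contains 8 or 9. The enumeration grows exponentially with max_len, so this only works for small bounds.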
I think first you need to understand for yourself how you see a "difference" between two expressions. Basically, define a distance metric.
In the general case, this would be quite difficult to do. Depending on what you need, you may see allowing one different character in some position as a big difference. In another case, allowing any number of consecutive identical characters may not make much difference.
I'd like to emphasize as well that normally when they talk about distance functions, they apply them to..., well, let's call them, tokens. In our case, character sequences. What you are willing to do, is to apply this method not to those tokens, but to the rules a multitude of tokens will match. I'm not quite sure it even makes sense.
Still, I believe we could think of something, but not in general, but for one particular and quite restricted case. Do you have some sort of example to show us?