C++ Simple use of regex - c++

I'm just trying to mess around and get familiar with using regex in c++.
Let's say I want the user to input the following: ###-$$-###, make #=any number between 0-9 and $=any number between 0-5. This is my idea for accomplishing this:
regex rx("[0-9][0-9][0-9]""\\-""[0-5][0-5]")
That's not the exact code, but that's the general idea for checking whether or not the user's input is a valid string of numbers. However, let's say I won't allow numbers starting with a 0, so 099-55-999 is not acceptable. How can I check something like that and output invalid? Thanks

[0-9]{3}-[0-5]{2}-[0-9]{3}
matches a string that starts with three digits between 0 and 9, followed by a dash, followed by two digits between 0 and 5, followed by a dash, followed by three digits between 0 and 9.
Is that what you're looking for? This is very basic regex stuff. I suggest you look at a good tutorial.
EDIT: (after you changed your question):
[1-9][0-9]{2}-[0-5]{2}-[0-9]{3}
would match the same as above except for not allowing a 0 as the first character.
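As a minimal sketch of how that second pattern could be used from C++, assuming a compiler with C++11 `std::regex` (rather than TR1 or Boost) and a helper name, `is_valid_code`, that is made up for illustration:

```cpp
#include <regex>
#include <string>

// Returns true when `input` has the form ###-$$-###, where # is 0-9,
// $ is 0-5, and the very first digit is not 0.
// regex_match requires the whole string to match, so no ^ or $ is needed.
bool is_valid_code(const std::string& input) {
    static const std::regex pattern("[1-9][0-9]{2}-[0-5]{2}-[0-9]{3}");
    return std::regex_match(input, pattern);
}
```

Calling `is_valid_code` on each line read from the user and printing "invalid" when it returns false covers the case from the question: 099-55-999 is rejected because its first digit is 0.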

std::tr1::regex rx("[0-9]{3}-[0-5]{2}-[0-9]{3}");
You're talking about using tr1 regex in C++, right, and not managed C++? If so, go here where it explains this stuff.
Also, you should know that if you're using VS2010, you don't need the Boost library anymore for regex.

Try this:
#include <regex>
#include <iostream>
#include <string>
int main()
{
std::tr1::regex rx("\\d{3}-[0-5]{2}-\\d{3}");
std::string s;
std::getline(std::cin,s);
if(regex_match(s.begin(),s.end(),rx))
{
std::cout << "Matched!" << std::endl;
}
}
For an explanation, check @Tim's answer. Do note the double \ for the digit metacharacter.

Related

Why does regex_match throw "complexity exception"?

I am trying to test (using boost::regex) whether a line in a file contains only numeric entries separated by spaces. I encountered an exception which I do not understand (see below). It would be great if someone could explain why it is thrown. Maybe I am doing something stupid in the way I define the patterns? Here is the code:
// regex_test.cpp
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
int main(){
// My basic pattern to test for a single numeric expression
const string numeric_value_pattern = "(?:-|\\+)?[[:d:]]+\\.?[[:d:]]*";
// pattern for the full line
const string numeric_sequence_pattern = "([[:s:]]*"+numeric_value_pattern+"[[:s:]]*)+";
regex r(numeric_sequence_pattern);
string line= "1 2 3 4.444444444444";
bool match = regex_match(line, r);
cout<<match<<endl;
//...
}
I compile that successfully with
g++ -std=c++11 -L/usr/lib64/ -lboost_regex regex_test.cpp
The resulting program worked fine so far and match == true as I wanted. But then I test an input line like
string line= "1 2 3 4.44444444e-16";
Of course, my pattern isn't built to recognise the format 4.44444444e-16 and I would expect that match == false. However, instead I get the following runtime error:
terminate called after throwing an instance of
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'
what(): The complexity of matching the regular expression exceeded predefined bounds.
Try refactoring the regular expression to make each choice made by the state machine unambiguous.
This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.
Why is that?
Note: the example I gave is extremal in the sense that putting one digit less after the dot works ok. That means
string line= "1 2 3 4.4444444e-16";
just results in match == false as expected. So, I'm baffled. What is happening here?
Thanks already!
Update:
Problem seems to be solved. Given the hint of alejrb I refactored the pattern to
const string numeric_value_pattern = "(?:-|\\+)?[[:d:]]+(?:\\.[[:d:]]*)?";
That seems to work as it should. Somehow, the isolated optional \\. inside the original pattern [[:d:]]+\\.?[[:d:]]* left too many possibilities to match a long sequence of digits in different ways.
I hope the pattern is safe now. However, if someone finds a way to use it for a blow up in the new form, let me know! It's not so obvious for me whether that might still be possible...
I'd say that your regex is probably exponentially backtracking. To protect you from a loop that would become entirely unworkable if the input were any longer, the regex engine just aborts the attempt.
One of the patterns that often causes this problem is anything of the form (x+x+)+ - which you build up here when you place the first pattern inside the second.
There's a good discussion at http://www.regular-expressions.info/catastrophic.html
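Note that even in the refactored pattern, a run of digits can still be split across several iterations of the outer group, because the surrounding whitespace is optional on both sides. One way to remove that ambiguity is to require at least one whitespace character between consecutive values. The sketch below shows the idea; it assumes C++11 `std::regex` instead of boost::regex, uses `\d` and `\s` in place of `[[:d:]]` and `[[:s:]]`, and the function name `is_numeric_line` is hypothetical:

```cpp
#include <regex>
#include <string>

// Returns true when `line` consists only of numbers separated by whitespace.
// Each value is matched once, and values must be separated by at least one
// whitespace character, so a run of digits can only be read in one way.
bool is_numeric_line(const std::string& line) {
    static const std::string number = "[+-]?\\d+(?:\\.\\d*)?";
    static const std::regex pattern(
        "\\s*" + number + "(?:\\s+" + number + ")*\\s*");
    return std::regex_match(line, pattern);
}
```

With this shape, an input such as 4.44444444e-16 should fail quickly: once the digits have been consumed, the only remaining choice is how many trailing digits to give back, which is linear in the length of the run.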

Finding a string of numbers within another string

So, I'm having a problem in C++.
I need to search for a string of five numbers that won't always be in the same spot in a string.
For example, sometimes the source string might be "sjdjfut93835sxx" and other times it may be "jj3333333335".
In the first string, I would need to extract "93835". In the second string, I wouldn't extract anything since the string of numbers is over five characters.
I need to find strings of numbers that are 5 characters long and only numbers, no letters in-between.
What would the easiest way of doing this be? I'm having a lot of trouble with this and can't find an answer to it anywhere on Google or past StackOverflow questions
Thanks!
Try splitting the task up into two steps.
First, use something like regular expressions to pull out all of the numeric strings (93835 and 3333333335 in your example).
Second, remove any results that aren't 5 characters long.
with std::regex
int extract(const string& str) {
smatch result;
regex r("\\d{5}");
regex_search(str, result, r);
return stoi(result.str());
}
Note that stoi throws an exception (std::invalid_argument) if no number is found, because it is then called on an empty string.
Edit: this function also matches strings that contain more than 5 consecutive digits.
You can modify the regex to (^|\\D)\\d{5}($|\\D), then remove the leading non-digit (if there is one) before calling stoi.
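A sketch of that modified regex, assuming the five digits are placed in their own capture group so that no surrounding character has to be stripped afterwards. The function name `extract_five_digits` and the choice to return an empty string when nothing matches are illustrative assumptions:

```cpp
#include <regex>
#include <string>

// Returns the first run of exactly five digits in `str`,
// or an empty string when there is no such run.
std::string extract_five_digits(const std::string& str) {
    // The digits must be preceded by the start of the string or a non-digit,
    // and followed by the end of the string or a non-digit.
    static const std::regex r("(?:^|\\D)(\\d{5})(?:$|\\D)");
    std::smatch result;
    if (std::regex_search(str, result, r)) {
        return result[1].str();  // group 1 holds only the five digits
    }
    return "";
}
```

Returning a string instead of calling stoi directly leaves it to the caller to decide what "not found" should mean.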
That would be pretty simple to do with DFA (deterministic finite automaton) algorithms and pattern-matching ones. Examples are the Boyer-Moore algorithm or the Knuth-Morris-Pratt one. You can find thorough descriptions of them in any algorithms book.
Otherwise as Joshua noted you might use some ready regex libraries and have the searching and pattern matching work done by it.
Your specific problem might also be solved "manually" with a hand-crafted solution (if I understood it correctly) like the following:
Scan the string one character at a time.
If you meet a digit, start counting how many consecutive digits follow.
If the run ends at exactly 5 digits, you have your match; if it is longer or shorter, drop it and reset the counter until you find another digit.
This is pretty easy and O(N).
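The scan described above could be sketched with the standard algorithms as follows. The function name `five_digit_runs` and the decision to return every matching run (instead of only the first) are assumptions made for illustration:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Collects every maximal run of consecutive digits that is exactly
// five characters long. Each character is visited once, so this is O(N).
std::vector<std::string> five_digit_runs(const std::string& text) {
    auto is_digit = [](unsigned char c) { return std::isdigit(c) != 0; };
    std::vector<std::string> runs;
    auto it = text.begin();
    while (it != text.end()) {
        // Find where the next run of digits starts and where it ends.
        auto first = std::find_if(it, text.end(), is_digit);
        auto last = std::find_if_not(first, text.end(), is_digit);
        if (last - first == 5) {
            runs.emplace_back(first, last);
        }
        it = last;
    }
    return runs;
}
```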
You can create simple finite state machine with the states:
1) Waiting for digit
2) Have first digit, waiting for second digit
3) Have second digit, waiting for third digit
4) ...
5) ...
6) ...
7) Have fifth digit, waiting for letter or end of string
8) Finish. Return string.
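#include <cctype>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string text="sjdjfut93835sxx";
    string aux=""; // current run of consecutive digits
    // iterate one position past the end so a run at the end of the string is also checked
    for(size_t i=0; i<=text.size(); i++)
    {
        if(i<text.size() && isdigit(static_cast<unsigned char>(text[i]))) // if it is a digit
        {
            aux+=text[i];
        }
        else
        {
            // the run has ended: report it only if it is exactly 5 digits long
            if(aux.size()==5)
            {
                cout<<"I found it! "<<aux<<endl;
            }
            aux="";
        }
    }
}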

Quadratic equation parser using QRegExp

I want to implement a parser for a quadratic equation using regular expressions. I want to keep it as a console app. I wrote the regex and tested it in Debuggex. Currently I have 2 problems: I can't get a, b and c from (ax^2+bx+c), and I want to add bash-like history with the up and down arrows. Thanks in advance. My code:
#include <QCoreApplication>
#include <QRegExp>
#include <QString>
#include <QTextStream>
#include <QStringList>
#include <QDebug>
#include <cstdio>
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
Q_UNUSED(a);
QTextStream cin(stdin, QIODevice::ReadOnly | QIODevice::Text);
QTextStream cout(stdout, QIODevice::WriteOnly | QIODevice::Text);
const QString regexText = R"(^[-]?\d*x\^2\s*[+,-]\s*\d*x\s*[+,-]\s*\d*$)";
while(true)
{
QRegExp regex(regexText);
cout << "Enter an equation to solve or press EOF(Ctrl+D/Z) to exit." << endl;
cout << "--> " << flush;
QString equation;
equation = cin.readLine();
if( equation.isNull() )
{
cout << endl;
cout << "Thanks for using quadric equation solver! Exitting..." << endl;
return 0;
}
int pos = regex.indexIn(equation);
QStringList captures = regex.capturedTexts();
qDebug() << captures;
}
}
I think you're looking to learn how to properly use Capturing groups, which debuggex isn't great at showing you the result of. I'd shoot for a regular expression more along these lines:
^(-?\d*)x\^2\s*([+-]\s*\d*)x\s*([+-]\s*\d+)?$
You can see it in action at RegExr, my preferred RegEx tool. Mouse over the highlighted matches to see what the groups have captured.
You can see that the parentheses essentially delineate sub-expressions that can be extracted separately, and parsed for meaning. I've chosen to include the operation (+/-) so you can use it to parse the positive or negative nature of the coefficients. You'll see in the example data that it doesn't cover decimal coefficients, but neither did your original expression, and I think this answers the most pressing issue.
Decimals
Capturing a decimal is as easy as adding a snippet after every set of digits that you capture:
(?:\.\d+)?
Which optionally matches (without capturing) a literal period followed by some other digits. This turns your greater Regular Expression into:
^(-?\d*(?:\.\d+)?)x\^2\s*([+-]\s*\d*(?:\.\d+)?)x\s*([+-]\s*\d+(?:\.\d+)?)?$
Which, as you can see, allows the capture of decimal expressions. They still have to be in order (a shortcoming of regular expressions, but only when you're trying to do everything at once), but you've increased the number of problems you can solve.
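To show how the three capturing groups translate into a, b and c, here is a sketch that uses the decimal-aware expression above. It assumes C++11 `std::regex` instead of QRegExp (with QRegExp the same groups would be read via `cap(1)` to `cap(3)`), and the names `Quadratic`, `to_coefficient` and `parse_quadratic` are made up for this example:

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

struct Quadratic { double a, b, c; };

// Converts captured text such as "+ 3", "-2.5", "-" or "" to a number.
// A missing magnitude means an implicit coefficient of 1.
double to_coefficient(std::string text) {
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](unsigned char ch) { return std::isspace(ch) != 0; }),
               text.end());
    if (text.empty() || text == "+") return 1.0;
    if (text == "-") return -1.0;
    return std::stod(text);
}

// Fills `out` and returns true when `equation` looks like ax^2+bx+c
// (the constant term is optional and defaults to 0).
bool parse_quadratic(const std::string& equation, Quadratic& out) {
    static const std::regex pattern(
        R"(^(-?\d*(?:\.\d+)?)x\^2\s*([+-]\s*\d*(?:\.\d+)?)x\s*([+-]\s*\d+(?:\.\d+)?)?$)");
    std::smatch m;
    if (!std::regex_match(equation, m, pattern)) return false;
    out.a = to_coefficient(m[1].str());
    out.b = to_coefficient(m[2].str());
    out.c = m[3].matched ? to_coefficient(m[3].str()) : 0.0;
    return true;
}
```

Keeping the sign inside each group, as the answer suggests, means the conversion helper only has to strip whitespace before handing the text to `std::stod`.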
Reordered
The next step is to deal with out of order expressions. You could do this in a single regular expression, but I recommend against it for a few reasons:
It's awful to read, and thus maintain
Doing it in a single RegEx makes it hard to exclude extraneous information.
Doing it in pieces solves the problem of multiple terms automagically (like x^2+x+x+2)
Doing it in pieces sets you up to capture higher order polynomials more easily.
1: Validation
The first basic step is to decide what a term looks like. For me, a term is an operator, followed by optional whitespace, followed by a variable expression or a constant. Or:
[+-]\s*(?:\d+(?:\.\d+)?|\d*(?:\.\d+)?x(?:\^\d+(?:\.\d+)?)?)
It's a doozy, so I'll include the Debuggex Visualization.
Wrap your head around the way that expression works, because it's the basic unit for the next one:
^-?\s*(?:\d+(?:\.\d+)?|\d*(?:\.\d+)?x(?:\^\d+(?:\.\d+)?)?)(?:\s*[+-]\s*(?:\d+(?:\.\d+)?|\d*(?:\.\d+)?x(?:\^\d+(?:\.\d+)?)?))+$
When you see that one in Debuggex, it becomes clear that it's basically just the former expression repeated one or more times. I added some whitespace and gave the first one an optional negative instead of an operator, but it's essentially the same.
Now, there is no room here to add or subtract an explicitly signed number (think 3x+ -4x^2), but it's a minor change to the regular expression, so I think I'll move on. Match that regular expression against your line (trimmed, of course), and you'll know whether you have a valid equation.
2: Extraction
Extraction is based off a single regular expression, modified to capture specific terms. It does require the ability to use a lookahead, which I must admit some Regular expression engines don't support. But Debuggex supports it, and I didn't find confirmation or denial of QRegExp, so I'm going to include it.
((?:^-?|[+-])\s*\d*(?:\.\d+)?)
This is your basic regular expression. Used by itself, it will capture a number, with no regard as to whether it's a coefficient or a constant. To capture a constant, add a negative lookahead to ensure it's not followed by a variable:
((?:^-?|[+-])\s*\d*(?:\.\d+)?)(?!\s*x)
To capture a specific exponent, just match it, followed by a space or another sign:
((?:^-?|[+-])\s*\d*(?:\.\d+)?)\s*x\^2(?=[\s+-])
To capture without an exponent, use a negative lookahead to ensure it's missing:
((?:^-?|[+-])\s*\d*(?:\.\d+)?)\s*x(?!\^)
Although, personally, I'd prefer to capture all variable terms at once with this:
((?:^-?|[+-])\s*\d*(?:\.\d+)?)\s*x(?:\^(\d+(?:\.\d+)?))?
Which has exactly two capturing groups: one for the coefficient, and one for the exponent.
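The "in pieces" approach can be sketched as a loop over all terms that sums the coefficients per exponent. This is only an illustration under several assumptions: it uses C++11 `std::regex` rather than QRegExp, it expects a line that has already passed the validation step, it only handles integer exponents, and the term expression and the name `collect_terms` are my own rather than the exact expressions above:

```cpp
#include <map>
#include <regex>
#include <string>

// Walks over every term of an already validated polynomial and sums the
// coefficients per exponent; constants are stored under exponent 0.
std::map<int, double> collect_terms(const std::string& equation) {
    // Group 1: sign, group 2: coefficient of a variable term,
    // group 3: exponent, group 4: constant term.
    static const std::regex term(
        R"(([+-]?)\s*(?:(\d+(?:\.\d+)?)?\s*x(?:\^(\d+))?|(\d+(?:\.\d+)?)))");
    std::map<int, double> coefficients;
    for (std::sregex_iterator it(equation.begin(), equation.end(), term), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        const double sign = (m[1].str() == "-") ? -1.0 : 1.0;
        if (m[4].matched) {
            coefficients[0] += sign * std::stod(m[4].str());
        } else {
            // A bare "x" has an implicit coefficient and exponent of 1.
            const double magnitude = m[2].matched ? std::stod(m[2].str()) : 1.0;
            const int exponent = m[3].matched ? std::stoi(m[3].str()) : 1;
            coefficients[exponent] += sign * magnitude;
        }
    }
    return coefficients;
}
```

Because repeated terms simply accumulate, an input like x^2+x+x+2 ends up with a coefficient of 2 for x, and higher-order terms need no extra code.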

Regex less than or greater than 0

I'm trying to find a regex that validates for a number being greater or less than 0.
It must allow a number to be 1.20, -2, 0.0000001, etc...it simply can't be 0 and it must be a number, also means it can't be 0.00, 0.0
^(?=.*[1-9])(?:[1-9]\d*\.?|0?\.)\d*$
I tried that, but it does not allow negative numbers.
I don't think a regex is the appropriate tool for that problem.
Why not use a simple condition?
long number = ...;
if (number != 0)
{
// ...
}
Why use a bazooka to kill a fly?
also tried something:
-?[0-9]*([1-9][0-9]*(\.[0-9]*)?|\.[0-9]*[1-9][0-9]*)
demo: http://regex101.com/r/bZ8fE5
Just tried something:
[+-]?(?:\d*[1-9]\d*(?:\.\d+)?|0+\.\d*[1-9]\d*)
Take a typical regex for a number, say
^[+-]?[0-9]*(\.[0-9]*)?$
and then require that there be a non-zero digit either before or after the decimal. Based on your examples, you're not expecting leading zeros before the decimal, so a simple regex might be
^[+-]?([1-9][0-9]*(\.[0-9]*)?|0?\.[0-9]*[1-9][0-9]*)$
Then decide if you still want to use a regex for this.
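If a regex is still the preferred route, the idea of requiring a non-zero digit either before or after the decimal point could look like this in C++. The pattern is one possible way to express that idea, it assumes no leading zeros before the decimal point, and the function name `matches_nonzero` is hypothetical:

```cpp
#include <regex>
#include <string>

// Returns true when `text` is a signed decimal number that is not zero.
// First alternative: a non-zero leading digit, optionally followed by a
// fractional part. Second alternative: an optional single 0, a decimal
// point, and a fractional part containing at least one non-zero digit.
bool matches_nonzero(const std::string& text) {
    static const std::regex pattern(
        "[+-]?(?:[1-9][0-9]*(?:\\.[0-9]*)?|0?\\.[0-9]*[1-9][0-9]*)");
    return std::regex_match(text, pattern);
}
```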
Try to negate the regex like this
!^[0\.]+$
If you're feeling the need to use a regex just because the value is stored as a String, you could use Double.parseDouble() to convert the string into a numeric type. This would have the added advantage of checking whether the string is a valid number or not (by catching NumberFormatException).
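The same conversion approach can be sketched in C++ with `std::stod`, which plays the role of Double.parseDouble here. The function name `is_nonzero_number` is an assumption, and note that `std::stod` is more permissive than the regexes above: it also accepts leading whitespace, exponents such as 1e5, and hexadecimal floats.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true when the whole of `text` parses as a finite number
// that is different from zero.
bool is_nonzero_number(const std::string& text) {
    try {
        std::size_t consumed = 0;
        const double value = std::stod(text, &consumed);
        // Reject trailing garbage ("1.2x") as well as "inf" and "nan".
        return consumed == text.size() && std::isfinite(value) && value != 0.0;
    } catch (const std::invalid_argument&) {
        return false;  // not a number at all
    } catch (const std::out_of_range&) {
        return false;  // does not fit into a double
    }
}
```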

replace string through regex using boost C++

I have a string in which tags like this occur (there are multiple such tags):
|{{nts|-2605.2348}}
I want to use boost regex to remove |{{nts| and }}, replacing the whole tag shown above with
-2605.2348
in original string
To make it more clear:
Suppose string is:
number is |{{nts|-2605.2348}}
I want string as:
number is -2605.2348
I am quite new to boost regex and have read many things online, but I have not been able to find an answer to this. Any help would be appreciated.
It really depends on how specific you want to be. Do you want to always remove exactly |{{nts|, or do you want to remove a pipe, followed by {{, followed by any number of letters, followed by a pipe? Or do you want to remove everything that isn't whitespace between the last space and the first part of the number?
One of the many ways to do this would be something like:
#include <iostream>
#include <boost/regex.hpp>
int main()
{
std::string str = "number is |{{nts|-2605.2348}}";
boost::regex re("\\|[^-\\d.]*(-?[\\d.]*)\\}\\}");
std::cout << regex_replace(str, re, "$1") << '\n';
}
online demo: http://liveworkspace.org/code/2B290X
However, since you're using boost, consider the much simpler and faster parsers generated by boost.spirit.
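For the strictest reading of the question (always remove exactly |{{nts| and }}), a sketch with the standard library's `std::regex` could look like the following. It assumes C++11 and that the tag name is always nts; the function name `strip_nts_tags` is made up for illustration:

```cpp
#include <regex>
#include <string>

// Replaces every "|{{nts|<number>}}" tag in `text` with just the number.
std::string strip_nts_tags(const std::string& text) {
    // Group 1 captures an optional minus sign followed by digits and dots.
    static const std::regex tag(R"(\|\{\{nts\|(-?[0-9.]+)\}\})");
    return std::regex_replace(text, tag, "$1");
}
```

Since `std::regex_replace` replaces all non-overlapping matches by default, a string with several such tags is handled in one call.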