Regex library not working correctly in c++ - c++

I have been looking for resources on working with regex in C++, as I want to learn regular expressions in C++ (a link to a step-by-step tutorial would also be welcome). I am using g++ to compile my programs on Ubuntu.
At first my programs would not compile, but then I read this post, which said to compile the program with
"g++ -std=c++0x sample.cpp"
to use the regex header.
My first program works correctly. In it I tried out regex_match:
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main()
{
    string str = "Hello world";
    regex rx("ello");
    // regex_match succeeds only if the whole string matches the pattern
    if (regex_match(str.begin(), str.end(), rx))
    {
        cout << "True" << endl;
    }
    else
    {
        cout << "False" << endl;
    }
    return 0;
}
This program prints False, as expected, because the expression does not match the whole string.
I also rechecked it with an expression that matches the whole string, and it works.
Now I am writing another program that uses regex_replace and regex_search, and neither of them works (for regex_search, just replace regex_match with regex_search in the program above). Please help; I don't know where I am going wrong.

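The <regex> header is not fully supported by older versions of GCC: libstdc++ only gained a working <regex> implementation in GCC 4.9, so with earlier releases the code compiles but functions such as regex_search and regex_replace may misbehave.
You can see GCC support here.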
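For comparison, here is a minimal sketch of how the three functions are expected to behave, assuming a compiler whose standard library implements <regex> completely. The helper names (matches_fully, contains_match, replace_all) are made up for this illustration and are not part of the standard library.

```cpp
#include <regex>
#include <string>

// Returns true only if the WHOLE string matches the pattern.
bool matches_fully(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Returns true if ANY part of the string matches the pattern.
bool contains_match(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// Replaces every match of the pattern with the replacement text.
std::string replace_all(const std::string& text, const std::string& pattern,
                        const std::string& replacement) {
    return std::regex_replace(text, std::regex(pattern), replacement);
}
```

With the string from the question, matches_fully("Hello world", "ello") should be false while contains_match("Hello world", "ello") should be true. If regex_search reports no match here, the installed library is the likely culprit rather than the code.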

Related

regex.h matching differences between OSX and Linux

I need to match the following line with multiple capturing groups:
0.625846 29Si 29 [4934.39 0] [0.84 100000000000000.0]
I use the regex:
^(0+\.[0-9]?e?[+-]?[0-9]+)\s+([0-9]+\.?[0-9]*|[0-9][0-9]?[0-9]?[A-Z][a-z]?)\s+([0-9][0-9]?[0-9]?)\s+(\[.*\])\s+(\[.*\])$
See this link for a regex101 workspace. However, I find that when I try the match using regex.h, it behaves differently on OSX and Linux. Specifically:
Fails on:
OSX: 10.14.6
LLVM: 10.0.1 (clang-1001.0.46.4)
Works on:
linux: Ubuntu 18.04
g++: 7.5.0
I put together a short program that reproduces the problem, compiled with g++ regex.cpp -o regex:
#include <cstdio>
#include <string>
// POSIX regex
#include <regex.h>
using namespace std;
int main(int argc, char** argv) {
    // buffer for the messages produced by regerror()
    char buffer[100];
    // regex object to use
    regex_t regex;
    //*****regex match and input file line*******
    string iline = "0.625846 29Si 29 [4934.39 0] [0.84 100000000000000.0]";
    string matchfile="^(0+\\.[0-9]?e?[+-]?[0-9]+)\\s+([0-9]+\\.?[0-9]*|[0-9][0-9]?[0-9]?[A-Z][a-z]?)\\s+([0-9][0-9]?[0-9]?)\\s+(\\[.*\\])\\s+(\\[.*\\])$";
    // compile the regex
    int reti = regcomp(&regex, matchfile.c_str(), REG_EXTENDED);
    if (reti == 0) {
        printf("regex compile success!\n");
    } else {
        regerror(reti, &regex, buffer, sizeof buffer);
        printf("regcomp() failed with '%s'\n", buffer);
        return 1;
    }
    // match the input line (whole match plus five capture groups)
    regmatch_t input_matchptr[6];
    reti = regexec(&regex, iline.c_str(), 6, input_matchptr, 0);
    if (reti == 0) {
        printf("regex match success!\n");
    } else {
        regerror(reti, &regex, buffer, sizeof buffer);
        printf("regexec() failed with '%s'\n", buffer);
    }
    // release the compiled pattern
    regfree(&regex);
    //******************************************
    return 0;
}
I have also modified my regex to comply with POSIX (I think?), which I understand LLVM requires, by removing my earlier use of the +? and *? operators as per this post, but I may have missed something that still makes it incompatible with POSIX. The regex now seems to compile correctly, which makes me think it is valid, but I still don't understand why no match is obtained.
How can I modify my regex to correctly match?
To answer the immediate question, you need to use
string matchfile="^(0+\\.[0-9]?e?[+-]?[0-9]+)[[:space:]]+([0-9]+\\.?[0-9]*|[0-9][0-9]?[0-9]?[A-Z][a-z]?)[[:space:]]+([0-9][0-9]?[0-9]?)[[:space:]]+(\\[.*\\])[[:space:]]+(\\[.*\\])$";
That is, instead of the Perl-like \s, you can use the POSIX character class [:space:] inside a bracket expression, written as [[:space:]].
You mention that you tried [:space:] outside of a bracket expression, and it did not work - that is expected. As per Character Classes,
[:digit:] is a POSIX character class, used inside a bracket expression like [x-z[:digit:]].
This means that POSIX character classes are only parsed as such when used inside bracket expressions.
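As a small illustration of the bracket-expression rule, here is a sketch of a helper built on the POSIX regcomp/regexec calls used in the question. The name posix_matches is invented for this example, and REG_NOSUB is used because only a yes/no answer is needed.

```cpp
#include <regex.h>
#include <string>

// Returns true if `line` matches the POSIX extended regex `pattern`.
// Returns false if the pattern fails to compile or does not match.
bool posix_matches(const std::string& pattern, const std::string& line) {
    regex_t re;
    // REG_NOSUB: we only want to know whether there is a match.
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;
    }
    const int rc = regexec(&re, line.c_str(), 0, nullptr, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, posix_matches("^[0-9]+[[:space:]]+[0-9]+$", "29 4934") should hold on both platforms, since [[:space:]] is part of POSIX, whereas \s is an extension that not every regex.h implementation accepts.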

C++ std::regex segmentation fault

I have various data that I need to parse and get the weight out of it.
I'm using
C++11
std::regex
Debian 9.9
gcc 6.3.0
The problem is that a segmentation fault sometimes occurs, although it happens very rarely.
The input that triggers the error consists mostly of space and newline characters.
Here is the regex:
(?:\b(?:(kilogram\.*s*\.*|kg\.*s*\.*)(?:[^[:alnum:]])*)(?:\s*weight\s*)*(?:\s*is\s*|\s*are\s*)*)\W*([\d\.,]*\d+\b)|(?:(?:[\s\.]?|^)([\d\.,]*\d+)\W*(kilogram\.*s*\.*|kg\.*s*\.*)\b)
Here is an example regex that works on regex101.com but causes a segmentation fault in C++ on my Debian server: regex101
Here are some more regex101 examples of input, to quickly give an idea of what the regex is searching for.
Here is an example of C++ code that fails.
And here is the same C++ code that works, but using another online compiler (cpp.sh).
Can someone please help me to solve this segmentation fault problem?
Thank you.
I have the same issue with a simple regex .+ and [a-zA-Z0-9\\+/=]+.
I have tried different compilers: g++, clang++, clang-cl on Windows, and g++, clang++ on Linux (WSL).
On Windows, the application freezes and then exits suddenly. On Ubuntu (WSL), I get a segmentation fault.
The error happens for g++ on Windows with c++11, c++14, c++17 and also c++20.
Limit
In your regex101 example, the data has 31275 characters, which, I suppose, is too many for regex_match.
Here is the program I used to guess the maximal length of the data.
#include <iostream>
#include <regex>
#include <string>
int main(int argc, char **argv) {
    // Input length comes from the first argument (default: 30000)
    int length = argc > 1 ? std::stoi(std::string(argv[1])) : 30000;
    std::regex testRegex(".+");
    // Build a string of `length` copies of 'a'
    std::string data(length, 'a');
    std::cout << "Match: " << std::regex_match(data, testRegex) << std::endl;
    return 0;
}
// Limit before crash (it's a bit random so the limit is not accurate)
// Windows 11
//   clang++ Windows  : 4999998
//   clang-cl Windows : 4999998
//   g++ Windows      : 6833
// WSL Ubuntu 20.04
//   clang++ WSL      : 23804
//   g++ WSL          : 26187
How to solve
According to this test, the input has a practical size limit that depends on the compiler and platform, and the application crashes if the limit is exceeded.
What you can do is:
Remove some unnecessary spaces before using regex_match
Split the data in half
On Windows, you can use clang++ to increase the limit to 5M chars
For me, I split my data in half because the regex [a-zA-Z0-9\\+/=]+ doesn't require the entire input.
If anybody knows how we can increase the limit (with some flags or #define), I am interested.
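To illustrate the "split the data" idea, here is a minimal sketch that checks the input one chunk at a time. It assumes the pattern is a plain repetition of one character class (such as [a-zA-Z0-9+/=]+ or .+), because only then is matching every chunk equivalent to matching the whole input. The name matches_in_chunks and the default chunk size of 4096 are my own choices; 4096 was picked only because it is below the smallest limit measured above. The crashes look like the regex engine running out of stack on long inputs, but that is an assumption, not something the test above proves.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>

// Checks `data` against `re` one chunk at a time, so that no single
// regex_match call sees more than `chunk_size` characters.
// Only equivalent to matching the whole input when the pattern is a
// simple repetition of one character class, e.g. "[a-zA-Z0-9+/=]+".
bool matches_in_chunks(const std::string& data, const std::regex& re,
                       std::size_t chunk_size = 4096) {
    if (data.empty() || chunk_size == 0) {
        return false;  // "+" needs at least one character
    }
    for (std::size_t pos = 0; pos < data.size(); pos += chunk_size) {
        const std::size_t len = std::min(chunk_size, data.size() - pos);
        // Match on an iterator range to avoid copying the chunk.
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
        const auto last = first + static_cast<std::ptrdiff_t>(len);
        if (!std::regex_match(first, last, re)) {
            return false;
        }
    }
    return true;
}
```

For patterns where a match can straddle a chunk boundary (such as the weight regex in the question), this approach is not safe as written; the chunks would have to be cut at positions where no match can cross, for example at line breaks.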

regex_error being thrown when trying to do simple things like [:digit:] or \d

Every time I put [:digit:] in a regex, like so: regex r("[:digit:]"), it throws an exception, and .what() just returns regex_error instead of a descriptive, meaningful explanation of the error. The same thing happens when I try regex r("\\d"). And when I try regex r("\d"), my compiler says that \d is an unknown escape sequence. I'm using Code::Blocks, by the way. Here's my code:
#include <regex>
#include <iostream>
using namespace std;
int main()
{
    regex r("\d"); // and/or r("[:digit:]")
    string i = "5";
    if (regex_match(i, r))
    {
        cout << "Integer";
    }
    return 0;
}
After getting a newer version of Code::Blocks and the MinGW GCC compiler suite it worked.
P.S. I kept having an error when trying to set the compiler after downloading Code::Blocks. I had to go to Global compiler settings and click Reset defaults for it to auto-detect my compiler. As seen here.
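For reference, here is a short sketch of how the two digit patterns are normally written once the compiler's <regex> works. The function names are invented for this example. Note the doubled backslash in the string literal and the outer brackets around [:digit:].

```cpp
#include <regex>
#include <string>

// Returns true if `text` is exactly one decimal digit.
bool is_single_digit(const std::string& text) {
    // "\\d" in source code is the two-character regex \d.
    static const std::regex digit("\\d");
    return std::regex_match(text, digit);
}

// Same check with the POSIX class, which needs the outer brackets.
bool is_single_digit_posix(const std::string& text) {
    static const std::regex digit("[[:digit:]]");
    return std::regex_match(text, digit);
}

// Reports whether `pattern` can be compiled, catching regex_error
// instead of letting the exception escape.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Inside the catch block, e.code() returns a std::regex_constants::error_type value, which can be more informative than .what() on implementations that return only a generic message.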

Weird compiling error with CodeBlocks - c++

This is my first time using Code::Blocks, and it has not gone well. I am facing a really weird problem.
I can't even describe it, so I will just tell you what happened.
The problem is that the IDE doesn't compile my project even though the code is correct.
It just opens a new tab called "iostream", and the console window appears but stays empty.
Why does that happen?
Here is the code that the IDE has trouble compiling. It is the simplest code ever:
#include <iostream>
using namespace std;
int main()
{
cout << "Hello world!" << endl;
return 0;
}
And these are the compilation results...
That's all.
Will Code::Blocks stop annoying me?
This line is not valid syntax
usingnamespace std;
Those are two separate keywords
using namespace std;
And since you are just starting C++, Lesson 1: Don't do that.

How can I use wildcards with string::find?

I want to be able to use a wildcard in string::find and then fetch what was in that wildcard place.
For example:
if (string::npos != input.find("How is * doing?"))
{
    cout<<"(the wildcard) is doing fine."<<endl;
}
And so if I ask, "How is Mom doing?", the output would be "Mom is doing fine."
What libraries would I use for this, or how would I write the code manually? If I should use AIML, can AIML execute .bat files?
Regarding your question on how to do it manually, I will give you an idea with this simple code, which solves the example you gave in the question.
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string expr = "How is * doing?";
    string input = "How is Mom doing?";
    // Split the pattern into the text before and after the wildcard.
    size_t wildcard_pos = expr.find('*');
    if (wildcard_pos != string::npos)
    {
        string prefix = expr.substr(0, wildcard_pos);
        string suffix = expr.substr(wildcard_pos + 1);
        size_t foo = input.find(prefix);
        if (foo != string::npos)
        {
            // The wildcard text starts right after the prefix
            // and ends where the suffix begins.
            size_t start = foo + prefix.size();
            size_t bar = input.find(suffix, start);
            if (bar != string::npos)
                cout << input.substr(start, bar - start) << " is doing fine\n";
        }
    }
}
You can easily modify this idea to suit your needs. Else, follow the answer given by src
C++ 2011 provides a regular expression library. If you can't use C++ 2011 yet you can use the Boost.Regex library or any C library like PCRE.
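As a rough sketch of the regular expression approach, assuming a standard library with a working <regex>: the wildcard becomes a capture group, and the captured text is read back from the match results. The function name extract_subject is made up for this example.

```cpp
#include <regex>
#include <string>

// Returns the text captured by the wildcard, or an empty string if
// `input` does not have the form "How is <something> doing?".
std::string extract_subject(const std::string& input) {
    // (.+) plays the role of the '*' wildcard; \\? matches a literal '?'.
    static const std::regex pattern("How is (.+) doing\\?");
    std::smatch match;
    if (std::regex_match(input, match, pattern)) {
        return match[1].str();  // group 1 is the wildcard text
    }
    return "";
}
```

Calling extract_subject("How is Mom doing?") should give "Mom", which can then be printed followed by " is doing fine.".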