Regular Expressions misunderstanding or just broken implementation? - c++

I tried a very simple use of regex_search and cannot understand why I do not get a match.
Alas, the C++0x implementation in gcc 4.5 does not seem to work; I get a link error.
But here is my attempt with gcc 4.7.0, which is quite straightforward:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main () {
regex rxWorld("world");
const string text = "hello world!";
const auto t0 = text.cbegin();
smatch match;
const bool ok = regex_search(text, match, rxWorld);
/* ... */
}
I think I should get ok==true and something in match as well. I reduced the example to a very simple regex for this; I tried something slightly more complicated first.
But the printing code at /* ... */ says otherwise:
cout << " text:'" << text
<< "' ok:" << ok
<< " size:" << match.size();
cout << " pos:" << match.position()
<< " len:"<< match.length();
for(const auto& sub : match) {
cout << " ["<<(sub.first-t0)<<".."<<(sub.second-t0)
<< ":"<<sub.matched
<< "'"<<sub.str()
<< "']";
}
cout << endl;
The output is:
$ ./regex-search-01.x
text:'hello world!' ok:0 size:0 pos:-1 len:0
Update: I also tried regex_search(t0, text.cend(), match, rxWorld) and a const char* text, with no change.
Is my understanding of regex_search wrong? I am completely baffled. Or is it just gcc?

As you can see from the C++0x status page of libstdc++, regex support is incomplete.
In particular match_results are not finished. Iterators are not even started.
Volunteers are welcome ;-)
[EDIT] As of gcc 4.9, regex is fully supported.

Related

c++ regex search pattern not found

Following the example here, I wrote the following code:
using namespace std::regex_constants;
std::string str("{trol,asdfsad},{safsa, aaaaa,aaaaadfs}");
std::smatch m;
std::regex r("\\{(.*)\\}"); // matches anything between {}
std::cout << "Initiating search..." << std::endl;
while (std::regex_search(str, m, r)) {
for (auto x : m) {
std::cout << x << " ";
}
std::cout << std::endl;
str = m.suffix().str();
}
But to my surprise, it doesn't find anything at all, which I fail to understand. I would understand if the regex matched the whole string, since .* is greedy, but nothing at all? What am I doing wrong here?
To be clear: I know that regexes are not suitable for parsing, but I won't deal with more levels of bracket nesting, so I find regexes good enough here.
If you want to use basic POSIX syntax (std::regex::basic), your regex should be
{\\(.*\\)}
If you want to use default ECMAScript, your regex should be
\\{(.*)\\}
With clang and libc++, or with gcc 4.9+ (the first gcc release with full regex support), your code gives:
Initiating search...
{trol,asdfsad},{safsa, aaaaa,aaaaadfs} trol,asdfsad},{safsa, aaaaa,aaaaadfs
Eventually it turned out to really be a problem with the gcc version, so I finally got it working using the boost::regex library and the following code:
std::string str("{trol,asdfsad},{safsa,aaaaa,aaaaadfs}");
boost::regex rex("\\{(.*?)\\}", boost::regex_constants::perl);
boost::smatch result;
while (boost::regex_search(str, result, rex)) {
for (boost::smatch::size_type i = 0; i < result.size(); ++i) {
std::cout << result[i] << " ";
}
std::cout << std::endl;
str = result.suffix().str();
}

Retrieving a regex search in C++

Hello, I am new to regular expressions, and from what I understood from the C++ reference website, it is possible to get match results.
My question is: how do I retrieve these results? What is the difference between smatch and cmatch? For example, I have a string consisting of date and time and this is the regular expression I wrote:
"(1[0-2]|0?[1-9])([:][0-5][0-9])?(am|pm)"
Now when I do a regex_search with the string and the above expression, I can find out whether there is a time in the string or not. But I want to store that time in a structure so I can separate hours and minutes. I am using Visual Studio 2010 C++.
If you use, e.g., std::regex_search, it fills in a std::match_results object, whose operator[] gives you the matched sub-strings.
Edit: Example program:
#include <iostream>
#include <string>
#include <regex>
void test_regex_search(const std::string& input)
{
std::regex rgx("((1[0-2])|(0?[1-9])):([0-5][0-9])((am)|(pm))");
std::smatch match;
if (std::regex_search(input.begin(), input.end(), match, rgx))
{
std::cout << "Match\n";
//for (auto m : match)
// std::cout << " submatch " << m << '\n';
std::cout << "match[1] = " << match[1] << '\n';
std::cout << "match[4] = " << match[4] << '\n';
std::cout << "match[5] = " << match[5] << '\n';
}
else
std::cout << "No match\n";
}
int main()
{
const std::string time1 = "9:45pm";
const std::string time2 = "11:53am";
test_regex_search(time1);
test_regex_search(time2);
}
Output from the program:
Match
match[1] = 9
match[4] = 45
match[5] = pm
Match
match[1] = 11
match[4] = 53
match[5] = am
Just use named groups.
(?<hour>(1[0-2]|0?[1-9]))([:](?<minute>[0-5][0-9]))?(am|pm)
OK, VS2010 doesn't support named groups (std::regex does not support them at all). You are already using unnamed capture groups, so just go through them by index.

Regex C++: extract substring

I would like to extract a substring between two others.
For example: /home/toto/FILE_mysymbol_EVENT.DAT
or just FILE_othersymbol_EVENT.DAT
From these I would like to get mysymbol and othersymbol.
I don't want to use Boost or other libraries, just standard C++, except CERN's ROOT library with TRegexp, but I don't know how to use it...
Since C++11, regular expressions have been built into the standard. This program shows how to use them to extract the string you are after:
#include <regex>
#include <iostream>
#include <string>
int main()
{
const std::string s = "/home/toto/FILE_mysymbol_EVENT.DAT";
std::regex rgx(".*FILE_(\\w+)_EVENT\\.DAT.*");
std::smatch match;
if (std::regex_search(s.begin(), s.end(), match, rgx))
std::cout << "match: " << match[1] << '\n';
}
It will output:
match: mysymbol
It should be noted, though, that it will not work in GCC, as its library support for regular expressions is not very good. It works well in VS2010 (and probably VS2012), and should work in clang.
By now (late 2016) all modern C++ compilers and their standard libraries are fully up to date with the C++11 standard, and most if not all of C++14 as well. GCC 6 and the upcoming Clang 4 support most of the coming C++17 standard as well.
TRegexp only supports a very limited subset of regular expressions compared to other regex flavors. This makes constructing a single regex that suits your needs somewhat awkward.
One possible solution:
[^_]*_([^_]*)_
will match the string until the first underscore, then capture all characters until the next underscore. The relevant result of the match is then found in group number 1.
But in your case, why use a regex at all? Just find the first and second occurrence of your delimiter _ in the string and extract the characters between those positions.
If you want to use regular expressions, I'd really recommend using C++11's regexes or, if you have a compiler that doesn't yet support them, Boost. Boost is something I consider almost-part-of-standard-C++.
But for this particular question, you do not really need any form of regular expressions. Something like this sketch should work just fine, after you add all appropriate error checks (beg != npos, end != npos etc.), test code, and remove my typos:
std::string between(std::string const &in,
std::string const &before, std::string const &after) {
std::string::size_type beg = in.find(before);
beg += before.size();
std::string::size_type end = in.find(after, beg);
return in.substr(beg, end-beg);
}
Obviously, you could change the std::string to a template parameter, and it should then work just as well with std::wstring or less commonly used instantiations of std::basic_string.
I would study corner cases before trusting it, but this is a good candidate:
std::string text = "/home/toto/FILE_mysymbol_EVENT.DAT";
std::regex reg("(.*)(FILE_)(.*)(_EVENT\\.DAT)(.*)");
std::cout << std::regex_replace(text, reg, "$3") << '\n';
The answers by Some programmer dude, Tim Pietzcker, and Christopher Creutzig are good and correct, but they did not seem very obvious to me for beginners.
The following function is an attempt to create an auxiliary illustration for Some programmer dude and Tim Pietzcker's answers:
void ExtractSubString(const std::string& start_string
, const std::string& string_regex_extract_substring_template)
{
std::regex regex_extract_substring_template(
string_regex_extract_substring_template);
std::smatch match;
std::cout << std::endl;
std::cout << "A substring extract template: " << std::endl;
std::cout << std::quoted(string_regex_extract_substring_template)
<< std::endl;
std::cout << std::endl;
std::cout << "Start string: " << std::endl;
std::cout << start_string << std::endl;
std::cout << std::endl;
if (std::regex_search(start_string.begin(), start_string.end()
, match, regex_extract_substring_template))
{
std::cout << "match0: " << match[0] << std::endl;
std::cout << "match1: " << match[1] << std::endl;
std::cout << "match2: " << match[2] << std::endl;
}
std::cout << std::endl;
}
The following overloaded function is an attempt to help illustrate Christopher Creutzig's answer:
void ExtractSubString(const std::string& start_string
, const std::string& before_substring, const std::string& after_substring)
{
std::cout << std::endl;
std::cout << "A before substring: " << std::endl;
std::cout << std::quoted(before_substring) << std::endl;
std::cout << std::endl;
std::cout << "An after substring: " << std::endl;
std::cout << std::quoted(after_substring) << std::endl;
std::cout << std::endl;
std::cout << "Start string: " << std::endl;
std::cout << start_string << std::endl;
std::cout << std::endl;
size_t before_substring_begin
= start_string.find(before_substring);
size_t extract_substring_begin
= before_substring_begin + before_substring.size();
size_t extract_substring_end
= start_string.find(after_substring, extract_substring_begin);
std::cout << "Extract substring: " << std::endl;
std::cout
<< start_string.substr(extract_substring_begin
, extract_substring_end - extract_substring_begin)
<< std::endl;
std::cout << std::endl;
}
This is the main function to run the overloaded functions:
#include <regex>
#include <iostream>
#include <iomanip>
int main()
{
const std::string start_string
= "/home/toto/FILE_mysymbol_EVENT.DAT";
const std::string string_regex_extract_substring_template(
".*FILE_(\\w+)_EVENT\\.DAT.*");
const std::string string_regex_extract_substring_template2(
"[^_]*_([^_]*)_");
ExtractSubString(start_string, string_regex_extract_substring_template);
ExtractSubString(start_string, string_regex_extract_substring_template2);
const std::string before_substring = "/home/toto/FILE_";
const std::string after_substring = "_EVENT.DAT";
ExtractSubString(start_string, before_substring, after_substring);
}
This is the result of executing the main function:
A substring extract template:
".*FILE_(\\w+)_EVENT\\.DAT.*"
Start string:
/home/toto/FILE_mysymbol_EVENT.DAT
match0: /home/toto/FILE_mysymbol_EVENT.DAT
match1: mysymbol
match2:
A substring extract template:
"[^_]*_([^_]*)_"
Start string:
/home/toto/FILE_mysymbol_EVENT.DAT
match0: /home/toto/FILE_mysymbol_
match1: mysymbol
match2:
A before substring:
"/home/toto/FILE_"
An after substring:
"_EVENT.DAT"
Start string:
/home/toto/FILE_mysymbol_EVENT.DAT
Extract substring:
mysymbol

Boost RegEx: Specific Question

I am trying to use this expression:
Expression: "\w{1,}\s*?\-\-(\>)?\s*?\w{1,}"
Keep in mind I am escaping the \ with a second \ in my code.
I want the expression above to find matches in the strings below. I think I am close, but no cigar. Where am I going wrong?
Text: "AB --> CD"
Text: "AB --> Z"
Text: "A --> 123d"
etc.
Resources Used:
http://www.solarix.ru/for_developers/api/regex-en.html
http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/html/boost_regex/introduction_and_overview.html
http://www.regular-expressions.info/reference.html
UPDATE
The comment helped me. For record-keeping purposes, I would still like people to post on my thread the regex sites that have helped them master regex. Anyway, my code (mostly copied from the Boost website) is:
/* All captures from a regular expression */
#include <boost/regex.hpp>
#include <iostream>
#include <string>
/* Compiled with g++ -Wall -o regex_tut ./regex_tut.cpp -lboost_regex */
void print_captures(const std::string& regx, const std::string& text)
{
boost::regex e(regx);
boost::smatch what;
std::cout << "Expression: \"" << regx << "\"\n";
std::cout << "Text: \"" << text << "\"\n";
if(boost::regex_match(text, what, e, boost::match_extra))
{
unsigned i;
std::cout << "** Match found **\n Sub-Expressions:\n";
for(i = 0; i < what.size(); ++i) {
std::cout << " $" << i << " = \"" << what[i] << "\"\n";
}
}
else
{
std::cout << "** No Match found **\n";
}
}
int main(int argc, char* argv[ ])
{
print_captures("^\\w+\\s*-->?\\s*\\w+\\s*(\\(\\d+\\))?", "AB --> CD (12)" );
return 0;
}
It seems to work. Still, so that I can accept an answer, please post your favorite site and give a newb a few pointers =).
Not sure if I understood your question correctly, but if you want your regex to match, for example, AB and CD in "AB --> CD", you can use the following regex:
Expression: "(\w+)\s*-->?\s*(\w+)"

Extract IP address from a string using boost regex?

I was wondering if anyone can help me. I've been looking around for regex examples, but I still can't get my head around it.
The strings look like this:
"User JaneDoe, IP: 12.34.56.78"
"User JohnDoe, IP: 34.56.78.90"
How would I go about making an expression that matches the above strings?
The question is how exactly do you want to match these, and what else do you want to exclude?
It's trivial (but rarely useful) to match any incoming string with a simple .*.
To match these more exactly (and add the possibility of extracting things like the user name and/or IP), you could use something like: "User ([^,]*), IP: (\\d{1,3}(\\.\\d{1,3}){3})". Depending on your input, this might still run into a problem with a name that includes a comma (e.g., "John James, Jr."). If you have to allow for that, it gets quite a bit uglier in a hurry.
Edit: Here's a bit of code to test/demonstrate the regex above. At the moment, this uses the C++0x regex class(es); to use Boost, you'll need to change the namespaces a bit (but I believe that should be about all).
#include <regex>
#include <iostream>
#include <string>
void show_match(std::string const &s, std::regex const &r) {
std::smatch match;
if (std::regex_search(s, match, r))
std::cout << "User Name: \"" << match[1]
<< "\", IP Address: \"" << match[2] << "\"\n";
else
std::cerr << s << " did not match\n";
}
int main(){
std::string inputs[] = {
std::string("User JaneDoe, IP: 12.34.56.78"),
std::string("User JohnDoe, IP: 34.56.78.90")
};
std::regex pattern("User ([^,]*), IP: (\\d{1,3}(\\.\\d{1,3}){3})");
for (int i=0; i<2; i++)
show_match(inputs[i], pattern);
return 0;
}
This prints out the user name and IP address, but in (barely) enough different format to make it clear that it's matching and printing out individual pieces, not just passing entire strings through.
#include <string>
#include <iostream>
#include <boost/regex.hpp>
int main() {
std::string text = "User JohnDoe, IP: 121.1.55.86";
boost::regex expr ("User ([^,]*), IP: (\\d{1,3}(\\.\\d{1,3}){3})");
boost::smatch matches;
try
{
if (boost::regex_match(text, matches, expr)) {
std::cout << matches.size() << std::endl;
for (boost::smatch::size_type i = 1; i < matches.size(); i++) {
std::string match (matches[i].first, matches[i].second);
std::cout << "matches[" << i << "] = " << match << std::endl;
}
}
else {
std::cout << "\"" << expr << "\" does not match \"" << text << "\". matches size(" << matches.size() << ")" << std::endl;
}
}
catch (boost::regex_error& e)
{
std::cout << "Error: " << e.what() << std::endl;
}
return 0;
}
Edited: fixed a missing comma in the string (pointed out by Davka) and changed cmatch to smatch.