return boost::smatch and get substring "\000" - c++

Here is my code. I get garbage output when I move the boost::regex_search call into a separate function, match():
boost::smatch match() {
    std::string s = "foobar";
    std::string re_s = "f(oo)(b)ar";
    boost::regex re(re_s);
    boost::smatch what;
    if (boost::regex_search(s, what, re)) {
        return what;
    }
}
int main(int argc, char **argv) {
    boost::smatch what = match();
    std::cout << what.size() << std::endl;
    std::cout << what[0] << std::endl;
    std::cout << what[1] << std::endl;
    std::cout << what[2] << std::endl;
    return (0);
};
The output is:
3
\000\000\000\000\000
\000\000
\000
How can I make what[n] return the real string?

boost::smatch contains string::iterator values for tracking the matches internally. You are matching against a string object that is on the stack. When the match() function returns, that string is destructed and the iterators become invalid. Try moving the string s to the main() function and passing it into match() as a reference.

In Boost, operator[](int index) of smatch returns a const_reference, which is a typedef for sub_match<BidirectionalIterator>. sub_match<BidirectionalIterator> has a conversion operator to a string, and operator<<(basic_ostream, sub_match) writes the same text that the conversion produces. Converting what[0] to a std::string printed the match on my machine, but the sub_match iterators still point into the string s that was destroyed when match() returned, so this result is undefined behavior and should not be relied on.
This is the code I used:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
boost::smatch match() {
    std::string s = "foobar";
    std::string re_s = "f(oo)(b)ar";
    boost::regex re(re_s);
    boost::smatch what;
    if (boost::regex_search(s, what, re)) {
        return what;
    }
}
int main(int argc, char **argv) {
    boost::smatch what = match();
    std::cout << what.size() << std::endl;
    std::string what0 = what[0];
    std::cout << what0 << std::endl;
    std::cout << what[1] << std::endl;
    std::cout << what[2] << std::endl;
    return (0);
};

If you only need regular expressions, std::regex_search can be used instead of boost::regex_search. std::smatch also stores iterators into the searched string, so in the version below the string lives in main() and is passed by reference:
#include <iostream>
#include <regex>
#include <string>
// s must be an object that outlives the returned match results
std::smatch match(const std::string &s) {
    std::string re_s = "f(oo)(b)ar";
    std::regex re(re_s);
    std::smatch what;
    std::regex_search(s, what, re); // what stays empty if nothing matches
    return what;
}
int main(int argc, char **argv) {
    std::string s = "foobar";
    std::smatch what = match(s);
    std::cout << what.size() << std::endl;
    std::cout << what[0].str() << std::endl;
    std::cout << what[1].str() << std::endl;
    std::cout << what[2].str() << std::endl;
    return (0);
}

Related

For every instance of a character/substring in string

I have a string in C++ that looks like this:
string word = "substring";
I want to read through the word string using a for loop, and each time an s is found, print out "S found!". The end result should be:
S found!
S found!
Maybe you could utilize toupper:
#include <iostream>
#include <locale> // std::locale and the two-argument toupper
#include <string>
void FindCharInString(const std::string &str, const char &search_ch) {
    const char search_ch_upper = toupper(search_ch, std::locale());
    for (const char &ch : str) {
        if (toupper(ch, std::locale()) == search_ch_upper) {
            std::cout << search_ch_upper << " found!\n";
        }
    }
}
int main() {
    std::string word = "substring";
    std::cout << word << '\n';
    FindCharInString(word, 's');
    return 0;
}
Output:
substring
S found!
S found!

Why is my string extraction function using back referencing in regex not working as intended?

Extraction Function
string extractStr(string str, string regExpStr) {
    regex regexp(regExpStr);
    smatch m;
    regex_search(str, m, regexp);
    string result = "";
    for (string x : m)
        result = result + x;
    return result;
}
The Main Code
#include <iostream>
#include <regex>
using namespace std;
string extractStr(string, string);
int main(void) {
    string test = "(1+1)*(n+n)";
    cout << extractStr(test, "n\\+n") << endl;
    cout << extractStr(test, "(\\d)\\+\\1") << endl;
    cout << extractStr(test, "([a-zA-Z])[+-/*]\\1") << endl;
    cout << extractStr(test, "([a-zA-Z])[+-/*]([a-zA-Z])") << endl;
    return 0;
}
The Output
String = (1+1)*(n+n)
n\+n = n+n
(\d)\+\1 = 1+11
([a-zA-Z])[+-/*]\1 = n+nn
([a-zA-Z])[+-/*]([a-zA-Z]) = n+nnn
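If anyone could kindly point out the error I've made, or point me to a similar question on SO that I missed while searching, it would be greatly appreciated.
Iterating over a std::smatch yields the whole match (m[0]) followed by each capture group, so concatenating every element appends the captured text after the match; that is where the extra "1" and "n" in your output come from. Use m[0] for the matched text, and call regex_search in a loop if you want every match in the string. I also have some C++ tips in here (constness and references).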
#include <cassert>
#include <iostream>
#include <sstream>
#include <regex>
#include <string>
// using namespace std; don't do this!
// https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice
// pass strings by const reference
// 1. const, you promise not to change them in this function
// 2. by reference, you avoid making copies
std::string extractStr(const std::string& str, const std::string& regExpStr)
{
    std::regex regexp(regExpStr);
    std::smatch m;
    std::ostringstream os; // streams are more efficient for building up strings
    auto begin = str.cbegin();
    bool comma = false;
    // regex_search finds one match per call, so loop to collect every match
    while (std::regex_search(begin, str.end(), m, regexp))
    {
        if (comma) os << ", ";
        os << m[0]; // m[0] is the whole match; m[1], m[2], ... are the capture groups
        comma = true;
        begin = m.suffix().first; // continue searching after this match
    }
    return os.str();
}
// small helper function to produce nicer output for your tests.
void test(const std::string& input, const std::string& regex, const std::string& expected)
{
    auto output = extractStr(input, regex);
    if (output == expected)
    {
        std::cout << "test succeeded : output = " << output << "\n";
    }
    else
    {
        std::cout << "test failed : output = " << output << ", expected : " << expected << "\n";
    }
}
int main(void)
{
    std::string input = "(1+1)*(n+n)";
    test(input, "n\\+n", "n+n");
    test(input, "(\\d)\\+\\1", "1+1");
    test(input, "([a-zA-Z])[+-/*]\\1", "n+n");
    return 0;
}

tokenizing a string using C++

I have a string
"server ('m1.labs.teradata.com') username ('user5') password ('user)5') dbname ('default') "
I want to separate it as
string1 = server
string2 = 'm1.labs.teradata.com'
and password has a ')' in it.
Can anyone help me do this using regex?
I only tested the regex to extract your items, but I think the following code snippet will work.
#include <iostream>
#include <regex>
#include <string>
int main()
{
    const std::string s = "server ('m1.labs.teradata.com') username ('user5') password ('user)5') dbname ('default') ";
    // Raw string literal so the backslashes reach the regex engine; each quoted value is a capture group.
    std::regex rgx(R"(server\s+\(('[^']+')\)\s+username\s+\(('[^']+')\)\s+password\s+\(('[^']*')\)\s+dbname\s+\(('[^']+')\))");
    std::smatch match;
    if (std::regex_search(s.begin(), s.end(), match, rgx)) {
        std::cout << "match: " << match[1] << '\n';
        std::cout << "match: " << match[2] << '\n';
        // ....
    }
}
In the following example you iterate over the elements of the match results: the whole match first, then each capture group.
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string str("server ('m1.labs.teradata.com') username ('user5') password ('user)5') dbname ('default') ");
    // Raw string literal so the backslashes reach the regex engine; each quoted value is a capture group.
    std::regex r(R"(server\s+\(('[^']+')\)\s+username\s+\(('[^']+')\)\s+password\s+\(('[^']*')\)\s+dbname\s+\(('[^']+')\))");
    std::smatch m;
    std::regex_search(str, m, r);
    for(auto v: m) std::cout << v << std::endl; // whole match first, then each capture group
}
For your other question about passing a string to a function:
void print(const std::string& input)
{
    std::cout << input << std::endl;
}
or a const char*:
void print(const char* input)
{
    std::cout << input << std::endl;
}
Both ways allow you to call it like this:
print("Hello World!\n"); // A temporary is made
std::string someString = //...
print(someString); // No temporary is made
The second version does require c_str() to be called for std::strings:
print("Hello World!\n"); // No temporary is made
std::string someString = //...
print(someString.c_str()); // No temporary is made

Multilines regex in C++

I am trying to match a regular expression against a multi-line string, but I think I am going about it the wrong way when it comes to matching what follows a newline ('\n'). Here is my code:
#include <iostream>
#include <fstream>
#include <sstream>
#include <regex>
#define USE ".*[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}.*(?:\\n)*"
int main(int argc, char **argv)
{
    std::stringstream stream;
    std::filebuf *buffer;
    std::fstream fs;
    std::string str;
    std::regex regex(USE);
    if (argc != 2)
    {
        std::cerr << "Think to the use !" << std::endl;
        return (-1);
    }
    fs.open(argv[1]);
    if (fs)
    {
        stream << (buffer = fs.rdbuf());
        str = stream.str();
        if (std::regex_match(str, regex))
            std::cout << "Yes" << std::endl;
        else
            std::cout << "No" << std::endl;
        fs.close();
    }
    return (0);
}
Flags can be specified when constructing a regex object; see the documentation at http://en.cppreference.com/w/cpp/regex/basic_regex for details.
A short working example with the regex::extended flag, where the newline character '\n' is written explicitly in the pattern, follows:
#include <iostream>
#include <regex>
#include <string>
int main(int argc, char **argv)
{
    std::string str = "Hello, world! \n This is new line 2 \n and last one 3.";
    std::string regex = ".*2.*\n.*3.*";
    std::regex reg(regex, std::regex::extended);
    std::cout << "Input: " << str << std::endl;
    if(std::regex_match(str, reg))
        std::cout << "Match" << std::endl;
    else
        std::cout << "NOT match" << std::endl;
    return 0;
}

This regex doesn't work in c++

It is supposed to match "abababab", since "ab" is repeated more than two times consecutively, but the code isn't printing any output.
Is there some other trick to using regex in C++?
I tried other languages and it works just fine.
#include<bits/stdc++.h>
int main(){
    std::string s ("xaxababababaxax");
    std::smatch m;
    std::regex e ("(.+)\1\1+");
    while (std::regex_search (s,m,e)) {
        for (auto x:m) std::cout << x << " ";
        std::cout << std::endl;
        s = m.suffix().str();
    }
    return 0;
}
The problem is that in an ordinary string literal, "\1" is an escape sequence for the character with value 1, so the backslashes never reach std::regex and the pattern contains no back-references. You can fix this by using a raw string, R"((.+)\1\1+)", or by escaping the backslashes, as shown here:
#include <regex>
#include <string>
#include <iostream>
int main(){
    std::string s ("xaxababababaxax");
    std::smatch m;
    std::regex e ("(.+)\\1\\1+");
    while (std::regex_search (s,m,e)) {
        for (auto x:m) std::cout << x << " ";
        std::cout << std::endl;
        s = m.suffix().str();
    }
    return 0;
}
Which produces the output
abababab ab