C++ split string on multiple substrings - c++

In C++, I'd like to do something similar to:
Split on substring
However, I'd like to specify more than one substring to split on. For example, I'd like to split on "+", "foo", "ba" for the string "fda+hifoolkjba4" into a vector of "fda", "hi", "lkj", "4". Any suggestions? Preferably within STL and Boost (I'd rather not have to incorporate the Qt framework or additional libraries if I can avoid it).

I would go with regular expressions, either from <regex> or <boost/regex.hpp>; the regular expression you need would be something like (.*?)(\+|foo|ba) (plus the final token).
Here's a cut-down example using Boost:
std::string str(argv[1]);
boost::regex r("(.*?)(\\+|foo|ba)");
boost::sregex_iterator rt(str.begin(), str.end(), r), rend;
std::string final;
for ( ; rt != rend; ++rt)
{
std::cout << (*rt)[1] << std::endl;
final = rt->suffix();
}
std::cout << final << std::endl;

I suggest using regular expression support in boost. See here for an example.
Here is sample code that splits the string:
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main()
{
boost::regex re("(\\+|foo|ba)");
std::string s("fda+hifoolkjba4");
boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
boost::sregex_token_iterator j;
while (i != j) {
std::cout << *i++ << " ";
}
std::cout << std::endl;
return 0;
}

Related

Regex not able to show chars after space

I want to break this string into two parts
{[data1]name=NAME1}{[data2]name=NAME2}
1) {[data1]name=NAME1}
2){[data2]name=NAME2}
I am using a regex to do this, and it works fine with the string above, but if I add a space to the name, the regex does not pick up the characters after the space.
{[data1]name=NAME 1}{[data2]name=NAME 2}
With this string, each part stops at NAME and the trailing 1 and 2 are missing.
This is my code:
std::string stdstrData = "{[data1]name=NAME1}{[data2]name=NAME2}";
std::vector<std::string> commandSplitUnderScore;
std::regex re("\\}");
std::sregex_token_iterator iter(stdstrData.begin(), stdstrData.end(), re, -1);
std::sregex_token_iterator end;
while (iter != end) {
if (iter->length()) {
commandSplitUnderScore.push_back(*iter);
}
++iter;
}
for (auto& str : commandSplitUnderScore) {
std::cout << str << std::endl;
}
A good place to start is to use regex101.com and debug your regex before putting it into your c++ code. e.g. https://regex101.com/r/ID6OSj/1 (don't forget to escape your C++ string properly when copying the regex you made on that site).
Example :
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string input{ "{[data]name=NAME1}{[data]name=NAME2}" };
std::regex rx{ "(\\{\\[data\\].*?\\})(\\{\\[data\\].*?\\})" };
std::smatch match;
if (std::regex_match(input, match, rx))
{
std::cout << match[1] << "\n"; // match[1] is first group
std::cout << match[2] << "\n"; // match[2] is second group
}
return 0;
}

Regex search overlapping matches c++11

What regex expression should I use to search all occurrences that match:
Start with 55 or 66
followed by a minimum 8 characters in the range of [0-9a-fA-F] (HEX numbers)
Ends with \r (a carriage return)
Example string: 0205065509085503400066\r09\r
My expected result:
5509085503400066\r
5503400066\r
My current result:
5509085503400066\r
Using
(?:55|66)[0-9a-fA-F]{8,}\r
As you can see, this finds only the first result but not the second one.
Edit clarification
I search the string using Regex. It'll select the message for further parsing. The target string can start anywhere in the string. The target string is only valid if it only contains base-16 (HEX) numbers, and ends with a carriage return.
[start] [information part, minimum 8 chars] [end symbol: carriage return]
I'm using the std::regex library in c++11 with the flag ECMAScript
Edit
I have created an alternative solution that gives me the expected result. But this is not pure regex.
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string log("0055\r0655036608090705\r");
std::regex r("(?:55|66)[0-9a-fA-F]{8,}\r");
std::smatch sm;
while(regex_search(log, sm, r))
{
std::cout << sm.str() << '\n';
log = sm.str();
log += sm.suffix();
log[0] = 'a' ;
}
}
** Edit: Working regex solution based on comments **
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string s("0055\r06550003665508090705\r0970");
std::regex r("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
auto words_begin =
std::sregex_iterator(s.begin(), s.end(), r);
auto words_end = std::sregex_iterator();
std::cout << "Found "
<< std::distance(words_begin, words_end)
<< " words:\n";
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
std::smatch match = *i;
std::string match_str = s.substr(match.position(1), match.length(1) - 1); //-1 cr
std::cout << match_str << " =" << match.position(1) << '\n';
}
}
You are actually looking for overlapping matches. This can be achieved using a regex lookahead like this:
(?=((?:55|66)[0-9a-fA-F]{8,}\/r))
You will find the matches in question in group 1. The full-match, however, is empty.
Regex Demo (using /r instead of a carriage return for demonstration purposes only)
Sample Code:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
std::string subject("0055\r06550003665508090705\r0970");
try {
std::regex re("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
std::sregex_iterator next(subject.begin(), subject.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << "\n";
next++;
}
} catch (std::regex_error& e) {
// Syntax error in the regular expression
}
return 0;
}
See also: Regex-Info: C++ Regular Expressions with std::regex

Preg match all in pcre c++

Hello, this is my string:
last_name, first_name
bjorge, philip
kardashian, kim
mercury, freddie
In PHP I am using preg_match_all (PCRE) to run the regex:
preg_match_all("/(.*), (.*)/", $input_lines, $output_array);
Now I have installed PCRE for C++, and I want to know which C++ PCRE function works like PHP's preg_match_all.
Since C++11, regular expressions are supported by the standard library, so you don't need PCRE unless you have a specific reason to use it.
The example above can be handled with standard regular expressions, e.g.:
#include <iostream>
#include <string>
#include <regex>
#include <vector>
int main()
{
std::vector<std::string> input = {
"last_name, first_name",
"bjorge, philip",
"kardashian, kim",
"mercury, freddie"
};
std::regex re("(.*), (.*)");
std::smatch pieces;
for (const std::string &s : input) {
if (std::regex_match(s, pieces, re)) {
std::cout << "Pieces: " << pieces.size() << std::endl;
for (size_t i = 0; i < pieces.size(); ++i) {
std::cout << pieces[i].str() << std::endl;
}
}
}
return 0;
}

find alphabetic substring

I have the following strings, from which I want to extract only the alphabetic parts (alphabetic substrings) that are longer than one character:
% d. i.p.p. attendu --> attendu
aprà ¨ s. expertise --> apr, expertise
n.c.p.c. condamner --> condamner
I am trying the following piece of code:
#include <regex>
#include <iostream>
void main()
{
const std::string s = "% d. i.p.p. attendu";
std::regex rgx("[a-zA-Z]{2,20}");
std::smatch match;
if (std::regex_search(s.begin(), s.end(), match, rgx))
std::cout << "match: " << match[1] << '\n';
}
But I get the following error when I run the code:
Terminate called after throwing an instance of 'std::regex_error' what(): regex_error
Can you please help me,
Thank you,
Hani.
Ok I managed to use boost since gcc's regex is an abomination.
#include <boost/regex.hpp>
#include <iostream>
#include <string>
void main()
{
const std::string s = "% d. i.p.p. tototo attendu";
boost::regex re("[a-zA-Z]{4,7}");
boost::smatch matches;
if( boost::regex_search( s, matches, re ) )
{
std::string value( matches[0].first, matches[0].second );
std::cout << value << " ";
}
}
Fine, I found attendu, but the output is only tototo. It's not incrementing.
The return value is "tototo attendu". I was wondering if I can return each value one at a time instead of as one string.
The only way of doing this seems to be via regex_iterator. Here’s an example using Boost:
#include <boost/regex.hpp>
#include <iostream>
int main() {
const std::string s = "% d. i.p.p. tototo attendu";
boost::regex rgx("([a-zA-Z]{2,20})");
boost::smatch match;
boost::sregex_iterator begin{s.begin(), s.end(), rgx},
end{};
for (auto&& i = begin; i != end; ++i)
std::cout << "match: " << *i << '\n';
}
This yields:
match: tototo
match: attendu
Two things:
The return type of main is always int. Your code shouldn’t even compile.
I’ve added parentheses around your (first, which was correct!) regular expression so that it creates a capture for each match. The iterators then iterate over each match in turn.

Regex: C++ extract text within double quotes

I want to extract only those words within double quotes. So, if the content is:
Would "you" like to have responses to your "questions" sent to you via email?
The answer must be
1- you
2- questions
std::string str("test \"me too\" and \"I\" did it");
std::regex rgx("\"([^\"]*)\""); // group 1 captures the text between the quotes, e.g. me too
std::sregex_iterator current(str.begin(), str.end(), rgx);
std::sregex_iterator end;
for (; current != end; ++current)
std::cout << (*current)[1] << '\n'; // print only the captured text
If you really want to use Regex, you can do it like so:
#include <regex>
#include <sstream>
#include <vector>
#include <iostream>
int main() {
std::string str = R"d(Would "you" like to have responses to your "questions" sent to you via email?)d";
std::regex rgx(R"(\"(\w+)\")");
std::smatch match;
std::string buffer;
std::stringstream ss(str);
std::vector<std::string> strings;
//Split by whitespaces..
while(ss >> buffer)
strings.push_back(buffer);
for(auto& i : strings) {
if(std::regex_match(i,match, rgx)) {
std::ssub_match submatch = match[1];
std::cout << submatch.str() << '\n';
}
}
}
I think only MSVC and Clang supposedly support <regex> though; otherwise you can use boost.regex like so.
Use the split() function from this answer then extract odd-numbered items:
std::vector<std::string> itms = split("would \"you\" like \"questions\"?", '"');
for (std::size_t i = 1; i < itms.size(); i += 2) { // index loop cannot step past the end
std::cout << itms[i] << std::endl;
}