Regex: C++ extract text within double quotes - c++

I want to extract only those words within double quotes. So, if the content is:
Would "you" like to have responses to your "questions" sent to you via email?
The answer must be
1- you
2- questions

std::string str("test \"me too\" and \"I\" did it");
std::regex rgx("\"([^\"]*)\""); // group 1 captures the quoted text, e.g. me too
std::sregex_iterator current(str.begin(), str.end(), rgx);
std::sregex_iterator end;
for (; current != end; ++current)
    std::cout << (*current)[1] << '\n'; // print only the text inside the quotes

If you really want to use Regex, you can do it like so:
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
int main() {
std::string str = R"d(Would "you" like to have responses to your "questions" sent to you via email?)d";
std::regex rgx(R"(\"(\w+)\")");
std::smatch match;
std::string buffer;
std::stringstream ss(str);
std::vector<std::string> strings;
//Split by whitespaces..
while(ss >> buffer)
strings.push_back(buffer);
for(auto& i : strings) {
if(std::regex_match(i,match, rgx)) {
std::ssub_match submatch = match[1];
std::cout << submatch.str() << '\n';
}
}
}
At the time of writing, I think only MSVC and Clang fully support <regex>; otherwise you can use Boost.Regex in the same way.
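Note that splitting on whitespace and then calling std::regex_match only works when each quoted item is a single word with nothing attached to it: a token such as "questions", (with a trailing comma) or a quoted phrase containing a space would not match. A minimal sketch that searches the whole string instead is shown here. The helper name extract_quoted is made up for illustration and is not part of any library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text found between each pair of double quotes.
std::vector<std::string> extract_quoted(const std::string& text) {
    // Group 1 captures everything between two quotes, spaces included.
    static const std::regex quoted(R"re("([^"]*)")re");
    std::vector<std::string> result;
    for (std::sregex_iterator it(text.begin(), text.end(), quoted), end; it != end; ++it) {
        result.push_back((*it)[1].str());
    }
    return result;
}
```

Calling extract_quoted on the sentence from the question should give the two items you and questions, and a phrase such as "me too" is kept as one item.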

Use the split() function from this answer then extract odd-numbered items:
std::vector<std::string> itms = split("would \"you\" like \"questions\"?", '"');
// Index-based loop: stepping an iterator by 2 could run past end() when the item count is even.
for (std::size_t i = 1; i < itms.size(); i += 2) {
std::cout << itms[i] << std::endl;
}
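The split() helper itself is not shown in this answer. One possible version, assuming it simply cuts the string at every occurrence of a single delimiter character, could look like the sketch below. Both split and quoted_items are illustrative names here, not the code from the linked answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the split() helper the answer refers to.
std::vector<std::string> split(const std::string& text, char delim) {
    std::vector<std::string> items;
    std::istringstream stream(text);
    std::string item;
    while (std::getline(stream, item, delim)) {
        items.push_back(item);
    }
    return items;
}

// Collects the odd-numbered items, i.e. the pieces that follow an opening quote.
std::vector<std::string> quoted_items(const std::string& text) {
    std::vector<std::string> items = split(text, '"');
    std::vector<std::string> result;
    for (std::size_t i = 1; i < items.size(); i += 2) {
        result.push_back(items[i]);
    }
    return result;
}
```

This approach does not check that the quotes are balanced: with an unclosed quote, the text after it is still reported as a quoted item.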

Related

Regex not able to show chars after space

I want to break this string into two parts
{[data1]name=NAME1}{[data2]name=NAME2}
1) {[data1]name=NAME1}
2){[data2]name=NAME2}
I am using a regex to do this, and it works fine with the string above, but if I add a space to the name, the regex does not pick up the characters after the space.
{[data1]name=NAME 1}{[data2]name=NAME 2}
With this string, each part stops at NAME and the 1 and 2 are missing.
This is my code:
std::string stdstrData = "{[data1]name=NAME1}{[data2]name=NAME2}";
std::vector<std::string> commandSplitUnderScore;
std::regex re("\\}");
std::sregex_token_iterator iter(stdstrData.begin(), stdstrData.end(), re, -1);
std::sregex_token_iterator end;
while (iter != end) {
if (iter->length()) {
commandSplitUnderScore.push_back(*iter);
}
++iter;
}
for (auto& str : commandSplitUnderScore) {
std::cout << str << std::endl;
}
A good place to start is to use regex101.com and debug your regex before putting it into your C++ code, e.g. https://regex101.com/r/ID6OSj/1 (don't forget to escape your C++ string properly when copying the regex you made on that site).
Example:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string input{ "{[data]name=NAME1}{[data]name=NAME2}" };
std::regex rx{ "(\\{\\[data\\].*?\\})(\\{\\[data\\].*?\\})" };
std::smatch match;
if (std::regex_match(input, match, rx))
{
std::cout << match[1] << "\n"; // match[1] is first group
std::cout << match[2] << "\n"; // match[2] is second group
}
return 0;
}
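The example above expects exactly two blocks tagged [data]. If the number of blocks is not known in advance, or the names contain spaces, a sketch along the following lines may be more flexible. It assumes that a block is anything between { and the next }, with no nested braces. The helper name split_blocks is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every {...} block found in the input.
std::vector<std::string> split_blocks(const std::string& input) {
    // \{ and \} match literal braces; [^}]* matches anything up to the closing brace.
    static const std::regex block(R"(\{[^}]*\})");
    std::vector<std::string> blocks;
    for (std::sregex_iterator it(input.begin(), input.end(), block), end; it != end; ++it) {
        blocks.push_back(it->str());
    }
    return blocks;
}
```

Because [^}]* also accepts spaces, a name such as NAME 1 stays inside its block.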

Multiple substrings in between the same delimiters

I am new to C++ and would like to know how to extract multiple substrings from a single string when they all sit between the same delimiters.
For example:
"{("id":"4219","firstname":"Paul"),("id":"4349","firstname":"Joe"),("id":"4829","firstname":"Brandy")}"
I want the ids:
4219 , 4349 , 4829
You can use regex to match the ids:
#include <iostream>
#include <regex>
#include <string>
int main() {
// This is your string.
std::string s{ R"({("id":"4219","firstname":"Paul"),("id":"4349","firstname":"Joe"),("id":"4829","firstname":"Brandy")})"};
// Matches "id":"<any number of digits>"
// The id will be captured in the first group
std::regex r(R"("id"\s*:\s*"(\d+))");
// Make iterators that perform the matching
auto ids_begin = std::sregex_iterator(s.begin(), s.end(), r);
auto ids_end = std::sregex_iterator();
// Iterate the matches and print the first group of each of them
// (where the id is captured)
for (auto it = ids_begin; it != ids_end; ++it) {
std::smatch match = *it;
std::cout << match[1].str() << ',';
}
}
Well, here is the q&d hack:
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::string s{ "{(\"id\":\"4219\",\"firstname\":\"Paul\"),"
"(\"id\":\"4349\",\"firstname\":\"Joe\"),"
"(\"id\":\"4829\",\"firstname\":\"Brandy\")}"
};
const std::string key{ "\"id\":\"" };
for (auto f = s.find(key); f != s.npos; f = s.find(key, f)) {
// Skip past the key, then parse the digits that follow it.
std::istringstream iss{ s.substr(f += key.length()) };
int value; iss >> value;
std::cout << value << '\n';
}
}
Reliable? Well, just hope nobody names children "id":" ...

regex_iterator not matching groups in regular expression

How can I extract Test and Again from the string s in the code below?
Currently I am using regex_iterator, and it doesn't seem to be matching the groups in the regular expression: I am getting {{Test}} and {{Again}} in the output.
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\n";
next++;
}
return 0;
}
I also tried using regex_search, but it does not find multiple occurrences and only gives Test as output:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
if (std::regex_search(s, match, rgx,std::regex_constants::match_any))
{
std::cout<<"Match size is "<<match.size()<<std::endl;
for(auto elem:match)
std::cout << "match: " << elem << '\n';
}
}
Also, as a side note, why are two backslashes needed to escape { or }?
To access the contents of the capturing group you need to use .str(1):
std::cout << match.str(1) << std::endl;
See the C++ demo:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
// std::regex rgx("\\{\\{(\\w+)\\}\\}");
// Better, use a raw string literal:
std::regex rgx(R"(\{\{(\w+)\}\})");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << std::endl;
next++;
}
return 0;
}
Output:
Test
Again
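Note that you do not have to use double backslashes to define a regex escape sequence inside raw string literals (here, R"(pattern_here)"). In an ordinary string literal, the compiler turns \\ into a single backslash, so "\\{" reaches the regex engine as \{, which is the regex escape for a literal brace.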
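As for the regex_search attempt in the question: regex_search stops at the first match, so to get every occurrence it has to be called again on the rest of the string. A minimal sketch of that loop is shown here. The helper name placeholders is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects capture group 1 of every {{word}} placeholder.
std::vector<std::string> placeholders(const std::string& s) {
    static const std::regex rgx(R"(\{\{(\w+)\}\})");
    std::vector<std::string> names;
    std::smatch match;
    auto start = s.cbegin();
    // regex_search reports only the first match, so resume after each hit.
    while (std::regex_search(start, s.cend(), match, rgx)) {
        names.push_back(match.str(1));
        start = match.suffix().first;
    }
    return names;
}
```

This does the same job as the sregex_iterator loop, which is usually the shorter way to write it.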

C++ split string using a list of words as separators

I would like to split a string like this one
"this1245is#g$0,therhsuidthing345"
using a list of words like the one below
{"this", "is", "the", "thing"}
into this list
{"this", "1245", "is", "#g$0,", "the", "rhsuid", "thing", "345"}
// ^--------------^---------------^------------------^-- these were the delimiters
The delimiters are allowed to appear more than once in the string to split, and it can be done using regular expressions
The precedence is in the order in which the delimiters appear in the array
The platform I'm developing for has no support for the Boost library
Update
This is what I have for the moment
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this1245is#g$0,therhsuidthing345");
std::string delimiters[] = {"this", "is", "the", "thing"};
for (int i=0; i<4; i++) {
std::string delimiter = "(" + delimiters[i] + ")(.*)";
std::regex e (delimiter); // matches words beginning by the i-th delimiter
// default constructor = end-of-sequence:
std::sregex_token_iterator rend;
std::cout << "1st and 2nd submatches:";
int submatches[] = { 1, 2 };
std::sregex_token_iterator c ( s.begin(), s.end(), e, submatches );
while (c!=rend) std::cout << " [" << *c++ << "]";
std::cout << std::endl;
}
return 0;
}
output:
1st and 2nd submatches:[this][x1245fisA#g$0,therhsuidthing345]
1st and 2nd submatches:[is][x1245fisA#g$0,therhsuidthing345]
1st and 2nd submatches:[the][rhsuidthing345]
1st and 2nd submatches:[thing][345]
I think I need to make some recursive thing to call on each iteration
Build the expression you want for matches only (re), then pass in {-1, 0} to your std::sregex_token_iterator to return all non-matches (-1) and matches (0).
#include <iostream>
#include <regex>
int main() {
std::string s("this1245is#g$0,therhsuidthing345");
std::regex re("(this|is|the|thing)");
std::sregex_token_iterator iter(s.begin(), s.end(), re, { -1, 0 });
std::sregex_token_iterator end;
while (iter != end) {
//Works in vc13, clang requires you increment separately,
//haven't gone into implementation to see if/how ssub_match is affected.
//Workaround: increment separately.
//std::cout << "[" << *iter++ << "] ";
std::cout << "[" << *iter << "] ";
++iter;
}
}
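When the delimiter list is only known at run time, the alternation can be built from it. The sketch below is one way to do that, assuming the words should be matched literally (so regex metacharacters are escaped) and that empty pieces should be dropped. The names escape_regex and split_keep_delimiters are made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: escapes regex metacharacters so a word is matched literally.
std::string escape_regex(const std::string& word) {
    static const std::regex special(R"([.^$|()\[\]{}*+?\\])");
    return std::regex_replace(word, special, R"(\$&)");
}

// Splits `s` on the given words and keeps the words themselves in the output.
std::vector<std::string> split_keep_delimiters(const std::string& s,
                                               const std::vector<std::string>& words) {
    std::vector<std::string> tokens;
    if (words.empty()) {
        if (!s.empty()) tokens.push_back(s);
        return tokens;
    }
    // Join the words into one alternation, e.g. this|is|the|thing.
    std::string pattern;
    for (const std::string& w : words) {
        if (!pattern.empty()) pattern += '|';
        pattern += escape_regex(w);
    }
    const std::regex re(pattern);
    // -1 selects the text between matches, 0 selects the matches themselves.
    std::sregex_token_iterator it(s.begin(), s.end(), re, {-1, 0}), end;
    for (; it != end; ++it) {
        if (it->length() > 0) tokens.push_back(it->str());
    }
    return tokens;
}
```

On precedence: the search always takes the match that starts earliest in the string, and the order of the words only decides between candidates that start at the same position. Whether that is enough for the precedence requirement in the question depends on the input.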
I don't know how to meet the precedence requirement. This seems to work on the given input:
std::vector<std::string> parse (std::string s)
{
std::vector<std::string> out;
std::regex re("(this|is|the|thing).*"); // alternatives are tried left to right
std::string word;
auto i = s.begin();
while (i != s.end()) {
std::match_results<std::string::iterator> m;
if (std::regex_match(i, s.end(), m, re)) {
if (!word.empty()) {
out.push_back(word);
word.clear();
}
out.push_back(std::string(m[1].first, m[1].second));
i += out.back().size();
} else {
word += *i++;
}
}
if (!word.empty()) {
out.push_back(word);
}
return out;
}
vector<string> strs;
boost::split(strs,line,boost::is_space());

C++ split string on multiple substrings

In C++, I'd like to do something similar to:
Split on substring
However, I'd like to specify more than one substring to split on. For example, I'd like to split on "+", "foo", "ba" for the string "fda+hifoolkjba4" into a vector of "fda", "hi", "lkj", "4". Any suggestions? Preferably within STL and Boost (I'd rather not have to incorporate the Qt framework or additional libraries if I can avoid it).
I would go with regular expressions, either from <regex> or <boost/regex.hpp>; the regular expression you need would be something like (.*?)(\+|foo|ba) (plus the final token).
Here's a cut-down example using Boost:
std::string str(argv[1]);
boost::regex r("(.*?)(\\+|foo|ba)");
boost::sregex_iterator rt(str.begin(), str.end(), r), rend;
std::string final;
for ( ; rt != rend; ++rt)
{
std::cout << (*rt)[1] << std::endl;
final = rt->suffix();
}
std::cout << final << std::endl;
I suggest using regular expression support in boost. See here for an example.
Here is sample code that splits the string:
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main()
{
boost::regex re("(\\+|foo|ba)");
std::string s("fda+hifoolkjba4");
boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
boost::sregex_token_iterator j;
while (i != j) {
std::cout << *i++ << " ";
}
std::cout << std::endl;
return 0;
}
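If Boost is not an option, the same split can be sketched with <regex> from the standard library, since std::sregex_token_iterator accepts the same -1 submatch index. The function below is an illustration only: the name split_on_separators is made up, the separators are hard-coded to the three from the question, and empty pieces are dropped.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits on "+", "foo", or "ba" using only the standard library.
std::vector<std::string> split_on_separators(const std::string& s) {
    static const std::regex re(R"(\+|foo|ba)");
    std::vector<std::string> parts;
    // Submatch index -1 selects the text between matches rather than the matches.
    for (std::sregex_token_iterator it(s.begin(), s.end(), re, -1), end; it != end; ++it) {
        if (it->length() > 0) parts.push_back(it->str());
    }
    return parts;
}
```

For the string fda+hifoolkjba4 this should give fda, hi, lkj and 4.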