I want to break this string into two parts
{[data1]name=NAME1}{[data2]name=NAME2}
1) {[data1]name=NAME1}
2){[data2]name=NAME2}
I am using a regex to do this, and it works fine with the string above, but if I add a space to the name, the regex does not capture the characters after the space.
{[data1]name=NAME 1}{[data2]name=NAME 2}
With this string it only matches up to NAME and drops the 1 and the 2.
This is my code
std::string stdstrData = "{[data1]name=NAME1}{[data2]name=NAME2}";
std::vector<std::string> commandSplitUnderScore;
std::regex re("\\}");
std::sregex_token_iterator iter(stdstrData.begin(), stdstrData.end(), re, -1);
std::sregex_token_iterator end;
while (iter != end) {
if (iter->length()) {
commandSplitUnderScore.push_back(*iter);
}
++iter;
}
for (auto& str : commandSplitUnderScore) {
std::cout << str << std::endl;
}
A good place to start is regex101.com, where you can debug your regex before putting it into your C++ code, e.g. https://regex101.com/r/ID6OSj/1 (don't forget to escape the regex properly as a C++ string literal when you copy it from that site).
Example :
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string input{ "{[data]name=NAME1}{[data]name=NAME2}" };
std::regex rx{ "(\\{\\[data\\].*?\\})(\\{\\[data\\].*?\\})" };
std::smatch match;
if (std::regex_match(input, match, rx))
{
std::cout << match[1] << "\n"; // match[1] is first group
std::cout << match[2] << "\n"; // match[2] is second group
}
return 0;
}
Related
I need to use a regex to get the sentences that contain the word "walk" from a string. For now I am just trying to get the sentences:
std::string s ("Hello world! My name is Mike. Why so serious?");
std::smatch m;
std::regex e ("^\\s+[A-Za-z,;'\"\\s]+[.?!]$"); // intended to match a single sentence
while (std::regex_search (s,m,e)) {
for (auto w:m)
std::cout << w << "\n" ;
}
And this doesn't work.
Apart from the start-of-string (^) and end-of-string ($) anchors in the regex, which stop it from matching the individual sentences, you are forgetting to update s with the suffix after each match.
#include <iostream>
#include <regex>
#include <string>
int main()
{
std::string s ("Hello world! My name is Mike. Why so serious?");
std::smatch m;
std::regex e ("\\s?[A-Za-z,;'\"\\s]+[.?!]");
while(std::regex_search (s,m,e))
{
std::cout << m.str() << "\n" ;
s = m.suffix();
}
return 0;
}
What regex expression should I use to search all occurrences that match:
Start with 55 or 66
followed by a minimum 8 characters in the range of [0-9a-fA-F] (HEX numbers)
Ends with \r (a carriage return)
Example string: 0205065509085503400066\r09\r
My expected result:
5509085503400066\r
5503400066\r
My current result:
5509085503400066\r
Using
(?:55|66)[0-9a-fA-F]{8,}\r
As you can see, this finds only the first result but not the second one.
Edit clarification
I search the string using Regex. It'll select the message for further parsing. The target string can start anywhere in the string. The target string is only valid if it only contains base-16 (HEX) numbers, and ends with a carriage return.
[start] [information part, minimum 8 chars] [end symbol: carriage return]
I'm using the std::regex library in C++11 with the ECMAScript flag.
Edit
I have created an alternative solution that gives me the expected result. But this is not pure regex.
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string log("0055\r0655036608090705\r");
std::regex r("(?:55|66)[0-9a-fA-F]{8,}\r");
std::smatch sm;
while(regex_search(log, sm, r))
{
std::cout << sm.str() << '\n';
log = sm.str();
log += sm.suffix();
log[0] = 'a' ;
}
}
** Edit: Working regex solution based on comments **
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string s("0055\r06550003665508090705\r0970");
std::regex r("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
auto words_begin =
std::sregex_iterator(s.begin(), s.end(), r);
auto words_end = std::sregex_iterator();
std::cout << "Found "
<< std::distance(words_begin, words_end)
<< " words:\n";
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
std::smatch match = *i;
std::string match_str = s.substr(match.position(1), match.length(1) - 1); //-1 cr
std::cout << match_str << " =" << match.position(1) << '\n';
}
}
You are actually looking for overlapping matches. This can be achieved using a regex lookahead like this:
(?=((?:55|66)[0-9a-fA-F]{8,}\/r))
You will find the matches in question in group 1. The full-match, however, is empty.
Regex Demo (using /r instead of a carriage return for demonstration purposes only)
Sample Code:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
std::string subject("0055\r06550003665508090705\r0970");
try {
std::regex re("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
std::sregex_iterator next(subject.begin(), subject.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << "\n";
next++;
}
} catch (const std::regex_error& e) {
// Syntax error in the regular expression
std::cerr << "Regex error: " << e.what() << "\n";
return 1;
}
return 0;
}
See also: Regex-Info: C++ Regular Expressions with std::regex
How can I extract Test and Again from the string s in the code below?
Currently I am using regex_iterator, and it doesn't seem to return the capture group of the regular expression: I am getting {{Test}} and {{Again}} in the output.
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\n";
next++;
}
return 0;
}
I also tried using regex_search, but it does not work for multiple matches and only gives the Test output:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
std::regex rgx("\\{\\{(\\w+)\\}\\}");
std::smatch match;
if (std::regex_search(s, match, rgx,std::regex_constants::match_any))
{
std::cout<<"Match size is "<<match.size()<<std::endl;
for(auto elem:match)
std::cout << "match: " << elem << '\n';
}
}
Also, as a side note: why are two backslashes needed to escape { or }?
To access the contents of the capturing group you need to use .str(1):
std::cout << match.str(1) << std::endl;
See the C++ demo:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "<abc>{{Test}}</abc><def>{{Again}}</def>";
// std::regex rgx("\\{\\{(\\w+)\\}\\}");
// Better, use a raw string literal:
std::regex rgx(R"(\{\{(\w+)\}\})");
std::smatch match;
std::sregex_iterator next(s.begin(), s.end(), rgx);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << std::endl;
next++;
}
return 0;
}
Output:
Test
Again
Note you do not have to use double backslashes to define a regex escape sequence inside raw string literals (here, R"(pattern_here)").
I have a string 'CCCC' and I want to match 'CCC' in it, with overlap.
My code:
...
std::string input_seq = "CCCC";
std::regex re("CCC");
std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\t" << "\t" << match.position() << "\t" << "\n";
next++;
}
...
However this only returns
CCC 0
and skips the CCC 1 match, which I need.
I read about non-greedy '?' matching, but I could not make it work.
Put your regex inside a capturing group and wrap that group in a positive lookahead.
To make it work on Mac, too, make sure the regex matches (and thus consumes) a single char at each match by placing a . (or - to also match line break chars - [\s\S]) after the lookahead.
Then, you will need to amend the code to get the first capturing group value like this:
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
std::string input_seq = "CCCC";
std::regex re("(?=(CCC))."); // <-- PATTERN MODIFICATION
std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << "\t" << "\t" << match.position() << "\t" << "\n"; // <-- SEE HERE
next++;
}
return 0;
}
See the C++ demo
Output:
CCC 0
CCC 1
I want to extract only those words within double quotes. So, if the content is:
Would "you" like to have responses to your "questions" sent to you via email?
The answer must be
1- you
2- questions
std::string str("test \"me too\" and \"I\" did it");
std::regex rgx("\"([^\"]*)\""); // captures the text between each pair of double quotes
std::sregex_iterator current(str.begin(), str.end(), rgx);
std::sregex_iterator end;
while (current != end) {
std::cout << (*current)[1] << "\n"; // group 1 is the text without the quotes
++current;
}
If you really want to use Regex, you can do it like so:
#include <regex>
#include <sstream>
#include <vector>
#include <iostream>
int main() {
std::string str = R"d(Would "you" like to have responses to your "questions" sent to you via email?)d";
std::regex rgx(R"(\"(\w+)\")");
std::smatch match;
std::string buffer;
std::stringstream ss(str);
std::vector<std::string> strings;
//Split by whitespaces..
while(ss >> buffer)
strings.push_back(buffer);
for(auto& i : strings) {
if(std::regex_match(i,match, rgx)) {
std::ssub_match submatch = match[1];
std::cout << submatch.str() << '\n';
}
}
}
I think only MSVC and Clang supposedly support <regex>, though; otherwise you can use Boost.Regex instead.
Use the split() function from this answer, then extract the odd-numbered items:
std::vector<std::string> itms = split("would \"you\" like \"questions\"?", '"');
for (std::vector<std::string>::iterator it = itms.begin() + 1; it != itms.end(); it += 2) {
std::cout << *it << std::endl;
}