Split string by regex in VC++ - c++

I am using VC++ 10 in a project. Being new to C/C++, I just Googled it, and it appears that standard C++ doesn't have regex? VC++ 10 seems to have regex. However, how do I do a regex split? Do I need Boost just for that?
Searching the web, I found that many people recommend Boost for many things: tokenizing/splitting strings, parsing (PEG), and now even regex (though this should be built in ...). Can I conclude that Boost is a must-have? It's 180 MB just for trivial things that many languages support natively.

The C++11 standard has std::regex. It is also included in TR1 for Visual Studio 2010. Actually, TR1 has been available since VS2008, under the std::tr1 namespace. So you don't need Boost.Regex for VS2008 or later.
Splitting can be performed using regex_token_iterator:
#include <iostream>
#include <string>
#include <regex>
const std::string s("The-meaning-of-life-and-everything");
const std::tr1::regex separator("-");
const std::tr1::sregex_token_iterator endOfSequence;
std::tr1::sregex_token_iterator token(s.begin(), s.end(), separator, -1);
while(token != endOfSequence)
{
std::cout << *token++ << std::endl;
}
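The same loop can be wrapped in a small reusable function. This is only a sketch: split_by_regex is a made-up name, and it assumes a C++11 compiler where the regex classes live in std rather than std::tr1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of the standard library): splits `text`
// on every match of `separator` and returns the pieces in between.
std::vector<std::string> split_by_regex(const std::string& text,
                                        const std::regex& separator)
{
    // Submatch index -1 selects the text between matches.
    std::sregex_token_iterator token(text.begin(), text.end(), separator, -1);
    const std::sregex_token_iterator endOfSequence;

    std::vector<std::string> pieces;
    for (; token != endOfSequence; ++token)
        pieces.push_back(token->str());
    return pieces;
}
```

Called with the string from the example and std::regex("-"), it should return the six words as separate elements.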
If you also need the separator itself, you can obtain it from the sub_match object that the token iterator points to; it is a pair containing the start and end iterators of the token.
while(token != endOfSequence)
{
const std::tr1::sregex_token_iterator::value_type& subMatch = *token;
if(subMatch.first != s.begin())
{
const char sep = *(subMatch.first - 1);
std::cout << "Separator: " << sep << std::endl;
}
std::cout << *token++ << std::endl;
}
This sample covers the case where the separator is a single char. If the separator can be any substring, you need to do some more complex iterator work and possibly store the previous token's sub_match object.
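For separators of arbitrary length, one possible way to avoid the manual iterator arithmetic is to ask the token iterator for both the text between matches (-1) and the matches themselves (0). A rough sketch, assuming C++11 and std:: names; split_keeping_separators is a made-up name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns tokens and separators interleaved, in the
// order in which they appear in `text`.
std::vector<std::string> split_keeping_separators(const std::string& text,
                                                  const std::regex& separator)
{
    // For each match, -1 yields the text before it and 0 yields the match
    // itself; a non-empty remainder after the last match is yielded last.
    const std::vector<int> submatches{-1, 0};
    std::sregex_token_iterator token(text.begin(), text.end(),
                                     separator, submatches);
    const std::sregex_token_iterator endOfSequence;

    std::vector<std::string> parts;
    for (; token != endOfSequence; ++token)
        parts.push_back(token->str());
    return parts;
}
```

With the input "one, two;three" and the separator regex "[,;]\\s*", the result alternates between tokens and the separators that followed them.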
Or you can use regex groups and place the separators in the first group and the real token in the second:
const std::string s("The-meaning-of-life-and-everything");
const std::tr1::regex separatorAndStr("(-*)([^-]*)");
const std::tr1::sregex_token_iterator endOfSequence;
// Separators will be the 0th, 2nd, 4th... tokens
// Real tokens will be the 1st, 3rd, 5th... tokens
int subMatches[] = { 1, 2 };
std::tr1::sregex_token_iterator token(s.begin(), s.end(), separatorAndStr, subMatches);
while(token != endOfSequence)
{
std::cout << *token++ << std::endl;
}
Not sure it is 100% correct, but just to illustrate the idea.

Here is an example from this blog.
You'll have all your matches in res:
std::tr1::cmatch res;
std::string str = "<h2>Egg prices</h2>";
std::tr1::regex rx("<h(.)>([^<]+)");
std::tr1::regex_search(str.c_str(), res, rx);
std::cout << res[1] << ". " << res[2] << "\n";

Related

Get all numbers from string c++

I know this question has been asked several times, but none of the answers fits my need.
So I have this string
Sep=1, V_Batt=7.40, I_Batt=-559.63, V_SA=7.20, I_SA=-0.55, I_MB=500.25, V_5v=4.95, I_5v=446.20, V_3v=3.28, I_3v=3.45, S=0, T_Batt=25.24, T_SA1=22.95, T_SA2=-4.86
I want to get all of the number after the "=" sign and make a new string like
1,7.40,559.63,7.20,0.55,500.25,4.95,446.20,3.28,3.45,0,25.24,22.95,4.68
Can anyone help me solve this problem? I have used stringstream, but I got 0 for every value in my output.
Thank you
Based on a corrected understanding of what's actually desired, I'd do things quite differently than I originally suggested. Under the circumstances, I agree with Stephen Webb that a regular expression is probably the right way to go, though I differ as to the right regex to use, and a bit in how to use it (though the latter is probably as much about habits I've formed as anything else).
#include <regex>
#include <iostream>
#include <string>
int main()
{
using iter = std::regex_token_iterator<std::string::const_iterator>;
std::string s = "Sep=1, V_Batt=7.40, I_Batt=-559.63, V_SA=7.20,"
" I_SA=-0.55, I_MB=500.25, V_5v=4.95, I_5v=446.20,"
" V_3v=3.28, I_3v=3.45, S=0, T_Batt=25.24, T_SA1=22.95,"
" T_SA2=-4.86";
std::regex re(R"#([A-Z][^=]*=([-\.\d]+))#");
auto begin = iter(s.begin(), s.end(), re, 1);
iter end;
for (auto i = begin; i!= end; ++i)
std::cout << *i << ", ";
std::cout << '\n';
}
Result:
1, 7.40, -559.63, 7.20, -0.55, 500.25, 4.95, 446.20, 3.28, 3.45, 0, 25.24, 22.95, -4.86,
If the number of arguments and their order are known, you can use snprintf like this:
char str[100];
int Sep=1;
double V_Batt = 7.40, I_Batt = 559.63;// etc ...
snprintf(str, 100, "%d,%.2f,%.2f", Sep, V_Batt, I_Batt); //etc...
// str = 1,7.40,559.63
Open your file with the fopen() function.
It returns a FILE* pointer. Of course, if you already have the characters available, just skip this step.
Use this FILE* to read the characters one at a time, for example with fgetc().
Check each character you read and do what you need with it, inserting a comma into your new string where necessary.
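The character-by-character idea can be sketched roughly like this for text that is already in memory, so the fopen()/fgetc() part is skipped. The function name is made up, and it assumes every value sits between an '=' and the next ',' as in the question's input:

```cpp
#include <string>

// Hypothetical helper: copies the text that follows each '=' up to the
// next ',' (or the end of the input) and joins the pieces with commas.
std::string values_after_equals(const std::string& input)
{
    std::string result;
    bool in_value = false;   // true while we are between '=' and ','
    bool first = true;       // true until the first '=' has been seen
    for (char c : input) {
        if (c == '=') {
            if (!first)
                result += ',';   // comma before every value except the first
            first = false;
            in_value = true;
        } else if (c == ',') {
            in_value = false;    // the value ends at the next comma
        } else if (in_value && c != ' ') {
            result += c;         // keep sign, digits and decimal point
        }
    }
    return result;
}
```

Note that this keeps the minus signs; dropping them would need one more check on c.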
That's exactly what std::regex_iterator is for.
#include <regex>
#include <iostream>
#include <string>
int main()
{
const std::string s = "Sep=1, V_Batt=7.40, I_Batt=-559.63, V_SA=7.20, I_SA=-0.55, I_MB=500.25, V_5v=4.95, I_5v=446.20, V_3v=3.28, I_3v=3.45, S=0, T_Batt=25.24, T_SA1=22.95, T_SA2=-4.86";
std::regex re("[-\\d\\.]+");
auto words_begin = std::sregex_iterator(s.begin(), s.end(), re);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i)
std::cout << (*i).str() << ',';
std::cout << "\n";
}
The output of the above complete program is this.
1,7.40,-559.63,7.20,-0.55,500.25,5,4.95,5,446.20,3,3.28,3,3.45,0,25.24,1,22.95,2,-4.86,
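The stray 5, 3, 1 and 2 in that output come from the digits inside names such as V_5v and T_SA1. One way around that, as a sketch only, is to require the '=' in front of each number and capture just the number. The function name is made up, and the pattern assumes plain decimal values with an optional minus sign:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the numbers that directly follow an '='
// sign, joined with commas and without a trailing comma.
std::string numbers_after_equals(const std::string& input)
{
    // '=' followed by an optional sign, digits, and an optional fraction.
    static const std::regex value(R"(=(-?\d+(?:\.\d+)?))");

    std::string result;
    for (std::sregex_iterator it(input.begin(), input.end(), value), end;
         it != end; ++it) {
        if (!result.empty())
            result += ',';
        result += it->str(1);   // capture group 1 is the number itself
    }
    return result;
}
```

The signs are kept here; the desired output in the question drops them, which would need an extra step.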

Get string between 2 delimiters using C++ STl TR1 regular expressions

I want to extract the strings between the opening and closing brackets. So if the content is
145(5)(7)
the output must be
5
7
I tried with C++ STL TR1 using below code
const std::tr1::regex pattern("\\((.*?)\\)");
// the source text
std::string text= "145(5)(7)";
const std::tr1::sregex_token_iterator end;
for (std::tr1::sregex_token_iterator i(text.begin(),
text.end(), pattern);
i != end;
++i)
{
std::cout << *i << std::endl;
}
I am getting the output as,
(5)
(7)
I want the output without delimiters.
Please help me to meet my requirement using STL TR1.
One way is to use sregex_iterator instead of sregex_token_iterator and then access the submatches via .str(n):
const std::tr1::regex pattern1("\\((.*?)\\)");
std::string text= "145(5)(7)";
const std::tr1::sregex_iterator end;
for (std::tr1::sregex_iterator i(text.begin(),
text.end(), pattern1);
i != end;
++i)
{
std::cout << (*i).str(1) << std::endl;
}
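sregex_token_iterator can also do this if you pass the index of the capture group as the fourth argument. A small sketch, written against std:: rather than std::tr1 on the assumption that a C++11 compiler is available; bracket_contents is a made-up name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns whatever stands between each "(" and ")".
std::vector<std::string> bracket_contents(const std::string& text)
{
    static const std::regex pattern("\\((.*?)\\)");

    // The fourth argument selects capture group 1 instead of the whole match.
    std::sregex_token_iterator i(text.begin(), text.end(), pattern, 1);
    const std::sregex_token_iterator end;

    std::vector<std::string> contents;
    for (; i != end; ++i)
        contents.push_back(i->str());
    return contents;
}
```

For the text "145(5)(7)" this should give the two elements 5 and 7.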

c++ regex search pattern not found

Following the example here, I wrote the following code:
using namespace std::regex_constants;
std::string str("{trol,asdfsad},{safsa, aaaaa,aaaaadfs}");
std::smatch m;
std::regex r("\\{(.*)\\}"); // matches anything between {}
std::cout << "Initiating search..." << std::endl;
while (std::regex_search(str, m, r)) {
for (auto x : m) {
std::cout << x << " ";
}
std::cout << std::endl;
str = m.suffix().str();
}
But to my surprise, it doesn't find anything at all which I fail to understand. I would understand if the regex matches whole string since .* is greedy but nothing at all? What am I doing wrong here?
To be clear - I know that regexes are not suitable for Parsing BUT I won't deal with more levels of bracket nesting and therefore I find usage of regexes good enough.
If you want to use basic posix syntax, your regex should be
{\\(.*\\)}
If you want to use default ECMAScript, your regex should be
\\{(.*)\\}
With clang and libc++, or with gcc 4.9+ (the first gcc release that fully supports regex), your code gives:
Initiating search...
{trol,asdfsad},{safsa, aaaaa,aaaaadfs} trol,asdfsad},{safsa, aaaaa,aaaaadfs
Live example on coliru
Eventually it turned out that the problem really was the gcc version, so I finally got it working using the boost::regex library and the following code:
std::string str("{trol,asdfsad},{safsa,aaaaa,aaaaadfs}");
boost::regex rex("\\{(.*?)\\}", boost::regex_constants::perl);
boost::smatch result;
while (boost::regex_search(str, result, rex)) {
for (std::size_t i = 0; i < result.size(); ++i) {
std::cout << result[i] << " ";
}
std::cout << std::endl;
str = result.suffix().str();
}
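For comparison, here is roughly the same thing with the standard library only. It is a sketch that assumes a standard library with a working std::regex (for example gcc 4.9 or later); brace_groups is a made-up name. It uses std::sregex_iterator, so the input string does not have to be shortened after every match:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the contents of every {...} group.
std::vector<std::string> brace_groups(const std::string& text)
{
    // ".*?" is non-greedy, so each match stops at the first closing brace.
    static const std::regex pattern("\\{(.*?)\\}");

    std::vector<std::string> groups;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it)
        groups.push_back(it->str(1));   // capture group 1, without the braces
    return groups;
}
```

With the string from the question this should produce two groups, one per pair of braces, instead of the single greedy match shown above.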

How to use regular expressions to deal with Chinese punctuation symbols in C++

I want to achieve such a result:
Before:
有人可能会问:“那情绪、欲望、冲动、强迫症有什么区别呢?”
After:
有人可能会问 那情绪 欲望 冲动 强迫症有什么区别呢
That is, I want to replace Chinese punctuation symbols with spaces.
I tried to use the replace and replace_if functions but failed. The code looks like this:
char myints[] = "有人可能会问:“那情绪、欲望、冲动、强迫症有什么区别呢?”";
std::vector<char> myvector ;
std::replace_if (myvector.begin(), myvector.end(), "\\pP", " ");
std::cout << "myvector contains:";
for (std::vector<char>::iterator it=myvector.begin(); it!=myvector.end(); ++it)
std::cout << ' ' << *it;
std::cout << '\n';
Assuming you did mean to use a regular expression, rather than a character-by-character replacement function... Here's what I meant by using std::regex_replace. There's probably a more elegant regex that generalizes with fewer surprises, but at least this works for your example.
#include <regex>
#include <string>
int main()
{
std::wstring s(L"有人可能会问:“那情绪、欲望、冲动、强迫症有什么区别呢?”");
// Replace each run of punctuation with a space; use ECMAScript grammar
s = std::regex_replace(s, std::wregex(L"[[:punct:]]+"), L" ");
// Remove extra space at ends of line
s = std::regex_replace(s, std::wregex(L"^ | $"), L"");
return (s != L"有人可能会问 那情绪 欲望 冲动 强迫症有什么区别呢"); // returns 0
}
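If a character-by-character replacement is what you are after, a sketch without any regex could look like the following. The list of marks is only an assumption and should be extended to whatever punctuation you expect; punctuation_to_spaces is a made-up name. The marks are written as escapes so the code does not depend on the source file encoding.

```cpp
#include <string>

// Hypothetical helper: replaces each run of the listed punctuation marks
// with one space and drops runs at the start and end of the string.
std::wstring punctuation_to_spaces(const std::wstring& text)
{
    // Assumed set of marks; extend it as needed.
    static const std::wstring marks =
        L":?,!;"
        L"\uFF1A\uFF1F\uFF0C\uFF01\uFF1B"   // fullwidth : ? , ! ;
        L"\u201C\u201D\u3001\u3002";        // curly quotes, ideographic comma and full stop

    std::wstring result;
    bool pending_space = false;   // a run of marks was seen after some text
    for (wchar_t c : text) {
        if (marks.find(c) != std::wstring::npos) {
            pending_space = !result.empty();   // no space before the first word
        } else {
            if (pending_space)
                result += L' ';
            pending_space = false;
            result += c;
        }
    }
    return result;   // a trailing run of marks never emits its space
}
```

Because it works on single wchar_t values, it only covers marks that fit in one wchar_t on your platform.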

Regex C++: extract substring

I would like to extract a substring between two others.
ex: /home/toto/FILE_mysymbol_EVENT.DAT
or just FILE_othersymbol_EVENT.DAT
And I would like to get : mysymbol and othersymbol
I don't want to use boost or other libs. Just standard stuffs from C++, except CERN's ROOT lib, with TRegexp, but I don't know how to use it...
Since last year C++ has had regular expressions built into the standard. This program shows how to use them to extract the string you are after:
#include <regex>
#include <iostream>
#include <string>
int main()
{
const std::string s = "/home/toto/FILE_mysymbol_EVENT.DAT";
std::regex rgx(".*FILE_(\\w+)_EVENT\\.DAT.*");
std::smatch match;
if (std::regex_search(s.begin(), s.end(), match, rgx))
std::cout << "match: " << match[1] << '\n';
}
It will output:
match: mysymbol
It should be noted, though, that it will not work in GCC, as its library support for regular expressions is not very good. It works well in VS2010 (and probably VS2012), and should work in clang.
By now (late 2016) all modern C++ compilers and their standard libraries are fully up to date with the C++11 standard, and most if not all of C++14 as well. GCC 6 and the upcoming Clang 4 support most of the coming C++17 standard as well.
TRegexp only supports a very limited subset of regular expressions compared to other regex flavors. This makes constructing a single regex that suits your needs somewhat awkward.
One possible solution:
[^_]*_([^_]*)_
will match the string until the first underscore, then capture all characters until the next underscore. The relevant result of the match is then found in group number 1.
But in your case, why use a regex at all? Just find the first and second occurrence of your delimiter _ in the string and extract the characters between those positions.
If you want to use regular expressions, I'd really recommend using C++11's regexes or, if you have a compiler that doesn't yet support them, Boost. Boost is something I consider almost-part-of-standard-C++.
But for this particular question, you do not really need any form of regular expressions. Something like this sketch should work just fine, after you add all appropriate error checks (beg != npos, end != npos etc.), test code, and remove my typos:
std::string between(std::string const &in,
std::string const &before, std::string const &after) {
std::string::size_type beg = in.find(before);
beg += before.size();
std::string::size_type end = in.find(after, beg);
return in.substr(beg, end-beg);
}
Obviously, you could change the std::string to a template parameter and it should work just fine with std::wstring or more seldomly used instantiations of std::basic_string as well.
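With the npos checks mentioned above added, the function might look something like this. It is only a sketch; the name try_between and the bool-plus-output-parameter interface are my own choices, picked so that it compiles as plain C++11:

```cpp
#include <string>

// Hypothetical variant with error checks: returns false, leaving `out`
// untouched, when either delimiter is missing.
bool try_between(const std::string& in, const std::string& before,
                 const std::string& after, std::string& out)
{
    const std::string::size_type beg = in.find(before);
    if (beg == std::string::npos)
        return false;                       // `before` not found

    const std::string::size_type start = beg + before.size();
    const std::string::size_type end = in.find(after, start);
    if (end == std::string::npos)
        return false;                       // `after` not found behind `before`

    out = in.substr(start, end - start);
    return true;
}
```

Calling it with "FILE_" and "_EVENT" on the path from the question should store mysymbol in out and return true.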
I would study corner cases before trusting it.
But this is a good candidate:
std::string text = "/home/toto/FILE_mysymbol_EVENT.DAT";
std::regex reg("(.*)(FILE_)(.*)(_EVENT\\.DAT)(.*)");
std::cout << std::regex_replace(text, reg, "$3") << '\n';
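One corner case worth knowing: when the pattern does not match at all, std::regex_replace returns the input unchanged, so you cannot tell a failed extraction from a real result. A possible alternative, sketched here with a made-up function name, is to use std::regex_match and return an empty string on failure:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text between "FILE_" and "_EVENT.DAT",
// or an empty string when the input does not have that shape.
std::string extract_symbol(const std::string& path)
{
    static const std::regex pattern(".*FILE_(.*)_EVENT\\.DAT");

    std::smatch match;
    if (!std::regex_match(path, match, pattern))
        return std::string();   // no match: report it instead of echoing the input
    return match[1].str();
}
```

Unlike the "$3" version, this requires the whole string to end in _EVENT.DAT, which may or may not be what you want.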
The answers of Some programmer dude, Tim Pietzcker, and Christopher Creutzig are cool and correct, but they seemed to me not very obvious for beginners.
The following function is an attempt to create an auxiliary illustration for Some programmer dude and Tim Pietzcker's answers:
void ExtractSubString(const std::string& start_string
, const std::string& string_regex_extract_substring_template)
{
std::regex regex_extract_substring_template(
string_regex_extract_substring_template);
std::smatch match;
std::cout << std::endl;
std::cout << "A substring extract template: " << std::endl;
std::cout << std::quoted(string_regex_extract_substring_template)
<< std::endl;
std::cout << std::endl;
std::cout << "Start string: " << std::endl;
std::cout << start_string << std::endl;
std::cout << std::endl;
if (std::regex_search(start_string.begin(), start_string.end()
, match, regex_extract_substring_template))
{
std::cout << "match0: " << match[0] << std::endl;
std::cout << "match1: " << match[1] << std::endl;
std::cout << "match2: " << match[2] << std::endl;
}
std::cout << std::endl;
}
The following overloaded function is an attempt to help illustrate Christopher Creutzig's answer:
void ExtractSubString(const std::string& start_string
, const std::string& before_substring, const std::string& after_substring)
{
std::cout << std::endl;
std::cout << "A before substring: " << std::endl;
std::cout << std::quoted(before_substring) << std::endl;
std::cout << std::endl;
std::cout << "An after substring: " << std::endl;
std::cout << std::quoted(after_substring) << std::endl;
std::cout << std::endl;
std::cout << "Start string: " << std::endl;
std::cout << start_string << std::endl;
std::cout << std::endl;
size_t before_substring_begin
= start_string.find(before_substring);
size_t extract_substring_begin
= before_substring_begin + before_substring.size();
size_t extract_substring_end
= start_string.find(after_substring, extract_substring_begin);
std::cout << "Extract substring: " << std::endl;
std::cout
<< start_string.substr(extract_substring_begin
, extract_substring_end - extract_substring_begin)
<< std::endl;
std::cout << std::endl;
}
This is the main function to run the overloaded functions:
#include <regex>
#include <iostream>
#include <iomanip>
#include <string>
int main()
{
const std::string start_string
= "/home/toto/FILE_mysymbol_EVENT.DAT";
const std::string string_regex_extract_substring_template(
".*FILE_(\\w+)_EVENT\\.DAT.*");
const std::string string_regex_extract_substring_template2(
"[^_]*_([^_]*)_");
ExtractSubString(start_string, string_regex_extract_substring_template);
ExtractSubString(start_string, string_regex_extract_substring_template2);
const std::string before_substring = "/home/toto/FILE_";
const std::string after_substring = "_EVENT.DAT";
ExtractSubString(start_string, before_substring, after_substring);
}
This is the result of executing the main function:
A substring extract template:
".*FILE_(\\w+)_EVENT\\.DAT.*"
Start string:
"/home/toto/FILE_mysymbol_EVENT.DAT"
match0: /home/toto/FILE_mysymbol_EVENT.DAT
match1: mysymbol
match2:
A substring extract template:
"[^_]*_([^_]*)_"
Start string:
"/home/toto/FILE_mysymbol_EVENT.DAT"
match0: /home/toto/FILE_mysymbol_
match1: mysymbol
match2:
A before substring:
"/home/toto/FILE_"
An after substring:
"_EVENT.DAT"
Start string:
"/home/toto/FILE_mysymbol_EVENT.DAT"
Extract substring:
mysymbol