Parsing *.cpp file containing enum using boost::regex

I already parsed the file and split the content into enums and enum classes:
std::string sourceString = readFromFile(typesHDestination);
// Outer pattern: enum keyword, enum name, and the body between the braces.
// "\\{" and "\\}" are needed: "\{" is not a valid C++ escape sequence.
const boost::regex enumRx("(?<data_type>enum|enum\\s+class)\\s+(?<enum_name>\\w+)\\s*\\{(?<content>[^}]+?)\\s*\\}\\s*");
// Inner pattern: one enumerator name with an optional "= value"
const boost::regex itemRx("(?<name>\\w+)(?:(?:\\s*=\\s*(?<value>[^,\\s]+)(?:(?:,)|(?:\\s*)))|(?:(?:\\s*)|(?:,)))");
boost::smatch xResults;
std::string::const_iterator Start = sourceString.cbegin();
std::string::const_iterator End = sourceString.cend();
while (boost::regex_search(Start, End, xResults, enumRx))
{
    std::cout << xResults["data_type"]
              << " " << xResults["enum_name"] << "\n{\n";
    // A sub_match is a pair of iterators: .first/.second delimit the captured text
    std::string::const_iterator ContentStart = xResults["content"].first;
    std::string::const_iterator ContentEnd = xResults["content"].second;
    boost::smatch xResultsInner;
    while (boost::regex_search(ContentStart, ContentEnd, xResultsInner, itemRx))
    {
        std::cout << xResultsInner["name"] << ": " << xResultsInner["value"] << std::endl;
        ContentStart = xResultsInner[0].second;
    }
    Start = xResults[0].second;
    std::cout << "}\n";
}
It works as long as the enums contain no comments.
I tried to add a named group <comment> to capture the comments in the enums, but failed every time. My pattern for double-slash comments was (\/{2}\s*.+).
I tested with an online regex tester and with boost::regex.
The first step, from the *.cpp file to <data_type> <enum_name> <content>, uses this regex:
(?'data_type'enum|enum\s+class)\s+(?'enum_name'\w+)\s*{\s*(?'content'[^}]+?)\s*}\s*
The second step, from <content> to <name> <value> <comment>, uses this regex:
(?'name'\w+)(?:(?:\s*=\s*(?'value'[^\,\s/]+)(?:(?:,)|(?:\s*)))|(?:(?:\s*)|(?:,)))
The last one contains an error. Is there any way to fix it and also capture the comments in a group?

As some comments said, it may not be a good idea to parse a source file with regular expressions except in some simple cases,
for example this source file, from: http://en.cppreference.com/w/cpp/language/enum
#include <cstdint>
#include <iostream>
// enum that takes 16 bits
enum smallenum: int16_t
{
a,
b,
c
};
// color may be red (value 0), yellow (value 1), green (value 20), or blue (value 21)
enum color
{
red,
yellow,
green = 20,
blue
};
// altitude may be altitude::high or altitude::low
enum class altitude: char
{
high='h',
low='l', // C++11 allows the extra comma
};
// the constant d is 0, the constant e is 1, the constant f is 3
enum
{
d,
e,
f = e + 2
};
//enumeration types (both scoped and unscoped) can have overloaded operators
std::ostream& operator<<(std::ostream& os, color c)
{
switch(c)
{
case red : os << "red"; break;
case yellow: os << "yellow"; break;
case green : os << "green"; break;
case blue : os << "blue"; break;
default : os.setstate(std::ios_base::failbit);
}
return os;
}
std::ostream& operator<<(std::ostream& os, altitude al)
{
return os << static_cast<char>(al);
}
int main()
{
color col = red;
altitude a;
a = altitude::low;
std::cout << "col = " << col << '\n'
<< "a = " << a << '\n'
<< "f = " << f << '\n';
}
The key pattern here is: a declaration starts with enum and ends with ;, and you cannot predict the text between enum and ; because there are so many possibilities. For that you can use the lazy star .*?
Thus, to extract all enums I use the following (note: it is not the most efficient way):
#include <boost/regex.hpp>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
int main()
{
    boost::regex rx( "^\\s*(enum.*?;)" );
    boost::match_results< std::string::const_iterator > mr; // or boost::smatch
    std::ifstream ifs( "file.cpp" );
    const std::uintmax_t file_size = ifs.seekg( 0, std::ios_base::end ).tellg();
    ifs.seekg( 0, std::ios_base::beg ); // rewind
    std::string whole_file( file_size, ' ' );
    ifs.read( &*whole_file.begin(), file_size );
    ifs.close();
    // print one enum per iteration, then keep searching in the rest of the text
    while( boost::regex_search( whole_file, mr, rx ) ){
        std::cout << mr.str( 1 ) << '\n';
        whole_file = mr.suffix().str();
    }
}
The output will be:
enum smallenum: int16_t
{
a,
b,
c
};
enum color
{
red,
yellow,
green = 20,
blue
};
enum class altitude: char
{
high='h',
low='l', // C++11 allows the extra comma
};
enum
{
d,
e,
f = e + 2
};
And of course, for such a simple thing I prefer to use:
perl -lne '$/=undef;print $1 while/^\s*(enum.*?;)/smg' file.cpp
It produces the same output.
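If Boost is not available, a similar extraction can be sketched with the standard <regex> header. This is a different pattern from the one above, written under two assumptions about std::regex's default ECMAScript grammar: its . does not match a newline, so [\s\S]*? stands in for the lazy .*?, and its ^ does not match after a newline, so (?:^|\n)[ \t]* anchors enum to the start of a line. extractEnums is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every "enum ... ;" declaration that starts at the beginning of a line.
std::vector<std::string> extractEnums(const std::string& source)
{
    // [\s\S]*? is a lazy "any character including newline", so the match stops at the first ';'.
    // (?:^|\n)[ \t]* skips matches such as "// enum that takes 16 bits" inside comments.
    static const std::regex rx(R"((?:^|\n)[ \t]*(enum\b[\s\S]*?;))");
    std::vector<std::string> result;
    for (std::sregex_iterator it(source.begin(), source.end(), rx), end; it != end; ++it)
        result.push_back((*it)[1].str());   // group 1 excludes the leading newline/indent
    return result;
}
```

Like the Boost version, it stops at the first ;, so a ; inside a comment in the enum body would cut the match short.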
And this pattern may help if you want to match each section separately:
^\s*(enum[^{]*)\s*({)\s*([^}]+)\s*(};)
But again, this is not a good idea except for some simple source files, since C++ source code is free-form and not all code writers follow the same conventions. For example, with the pattern above I assumed that the } comes together with the ;, as in (};). If someone separates them (which is still valid code), the pattern will fail to match.
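As a sketch of how the (};) assumption could be relaxed, the hypothetical splitEnum below uses std::regex (not boost::regex) and matches \}\s*; instead, so whitespace or a line break between } and ; no longer breaks the match. It still assumes that no } appears inside the enum body, for example in a comment.

```cpp
#include <regex>
#include <string>
#include <utility>

// Splits one enum declaration into its header ("enum class name : type") and its body.
// Returns a pair of empty strings when no enum declaration is found.
std::pair<std::string, std::string> splitEnum(const std::string& decl)
{
    // group 1: "enum" up to (not including) the whitespace before '{'
    // group 2: everything between '{' and '}'
    // \}\s*; accepts "};" as well as "}" and ";" on different lines
    static const std::regex rx(R"((enum\b[^{]*?)\s*\{([^}]*)\}\s*;)");
    std::smatch m;
    if (!std::regex_search(decl, m, rx))
        return {};
    return { m[1].str(), m[2].str() };
}
```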

I agree that using regex to parse complicated data is not the best solution. I left out a few major conditions. First of all, I was parsing generated source code containing enums and enum classes, so there were no surprises in the code, and the code was regular. So I am parsing regular code with regex.
The Answer:
(the first step is the same, the second was fixed)
How to parse enums/enum classes with regex:
The first step, from the *.cpp file to <data_type> <enum_name> <content>, uses this regex:
(?'data_type'enum|enum\s+class)\s+(?'enum_name'\w+)\s*{\s*(?'content'[^}]+?)\s*}\s*
The second step, from <content> to <name> <value> <comment>, uses this regex:
^\s*(?'name'\w+)(?:(?:\s*=\s*(?'value'[^,\n/]+))|(?:[^,\s/]))(?:(?:\s$)|(?:\s*,\s*$)|(?:[^/]/{2}\s(?'comment'.*$)))
All tests passed.
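For readers without Boost, here is a sketch of the second step with std::regex. It is not the pattern above: the ECMAScript grammar has no named groups, so the groups are numbered (1 = name, 2 = value, 3 = comment), and the body is processed one line at a time with regex_match. Enumerator and parseEnumerators are hypothetical names, and the sketch assumes one enumerator per line, as in regular, generated code.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

struct Enumerator {
    std::string name;
    std::string value;    // empty when there is no "= value"
    std::string comment;  // empty when there is no trailing "// comment"
};

// Parses the text between the braces of one enum.
std::vector<Enumerator> parseEnumerators(const std::string& content)
{
    // name, optional "= value" (lazy, stops before ',' or '/'), optional ',', optional "// comment"
    static const std::regex line_rx(
        R"(\s*(\w+)\s*(?:=\s*([^,/]+?))?\s*,?\s*(?://\s*(.*))?)");
    std::vector<Enumerator> result;
    std::istringstream in(content);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        // regex_match needs the whole line to fit, so blank and
        // comment-only lines do not match and are skipped
        if (std::regex_match(line, m, line_rx))
            result.push_back({ m[1].str(), m[2].str(), m[3].str() });
    }
    return result;
}
```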


Manipulating the standard output stream to print multiline strings horizontally

I have three strings, and each of them is supposed to occupy three lines. I thought this was a good way to represent them:
std::string str1 = "███████\n███1███\n███████";
std::string str2 = "███████\n███2███\n███████";
std::string str3 = "███████\n███3███\n███████";
But I realise that when I do this and just cout the strings, they get printed on top of each other, which is not what I want. I want the output to look like this:
█████████████████████
███1██████2██████3███
█████████████████████
How can I achieve this effect? The only output manipulator I know is setw, but I don't see how it could help here.
Note: I will store these in an array and then loop over the array to print them; I feel like that might change the solution a bit as well.
Store the rows of each card as elements in an array. That makes it pretty easy.
#include <iostream>
int main()
{
const char * str1[3] = {"███████","███1███","███████"};
const char * str2[3] = {"███████","███2███","███████"};
const char * str3[3] = {"███████","███3███","███████"};
for( int row = 0; row < 3; row ++ )
{
std::cout << str1[row] << str2[row] << str3[row] << "\n";
}
}
Output:
█████████████████████
███1██████2██████3███
█████████████████████
It is also easy to add a space between the blocks, if you want.
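Since the plan is to keep the blocks in an array and loop over it, the same row-by-row idea can be wrapped in a function that splits each block on '\n' itself. This is a sketch; joinHorizontally and its sep parameter are hypothetical, and it assumes all blocks have the same number of rows.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Joins multi-line blocks side by side: row r of the result is row r of every
// block, with `sep` between neighbouring blocks. A block with fewer rows
// simply contributes nothing to the extra rows (no padding is added).
std::string joinHorizontally(const std::vector<std::string>& blocks,
                             const std::string& sep = "")
{
    // Split every block on '\n'.
    std::vector<std::vector<std::string>> rows;
    std::size_t height = 0;
    for (const auto& block : blocks) {
        std::vector<std::string> lines;
        std::istringstream in(block);
        for (std::string line; std::getline(in, line); )
            lines.push_back(line);
        height = std::max(height, lines.size());
        rows.push_back(lines);
    }

    // Emit one output line per row index.
    std::string out;
    for (std::size_t r = 0; r < height; ++r) {
        for (std::size_t b = 0; b < rows.size(); ++b) {
            if (b > 0) out += sep;
            if (r < rows[b].size()) out += rows[b][r];
        }
        out += '\n';
    }
    return out;
}
```

For example, std::cout << joinHorizontally({str1, str2, str3}, " "); would print the three cards with one space between them. Because nothing is padded, multi-byte UTF-8 characters such as █ need no special handling.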
You could split each on \n and print them. We can use std::stringstream for splitting by \n.
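#include <array>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
void print(const std::array<std::string, 3>& arr){
    std::vector<std::stringstream> arr_buf{};
    arr_buf.reserve(arr.size());
    for(auto& str: arr){
        arr_buf.emplace_back(str);
    }
    // One output line per row: take the next line from every block in turn.
    // Loops until the first block runs out of lines, so the number of rows does
    // not have to equal arr.size(); getline (not >>) keeps spaces inside a row.
    std::string t;
    while(std::getline(arr_buf.front(), t)){
        std::cout << t;
        for(auto i = 1u; i < arr_buf.size(); ++i){
            std::getline(arr_buf[i], t);
            std::cout << t;
        }
        std::cout << "\n";
    }
}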
Output:
print(arr)
█████████████████████
███1██████2██████3███
█████████████████████
If you are certain that your output will always be displayed on a modern terminal supporting “ANSI Escape Codes” then you can use that to your advantage.
#include <iostream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>
// helper: split a string into a list of views
auto splitv( const std::string & s, const std::string & separators )
{
std::vector <std::string_view> views;
std::string::size_type a = 0, b = 0;
while (true)
{
a = s.find_first_not_of( separators, b );
b = s.find_first_of ( separators, a );
if (a >= s.size()) break;
if (b == s.npos) b = s.size();
views.emplace_back( &(s[a]), b-a );
}
return views;
}
std::string print( const std::string & s )
{
std::ostringstream os;
for (auto sv : splitv( s, "\n" ))
os
<< "\033" "7" // DEC save cursor position
<< sv
<< "\033" "8" // DEC restore cursor position
<< "\033[B"; // move cursor one line down
return os.str().substr( 0, os.str().size()-5 );
}
std::string movexy( int dx, int dy )
{
std::ostringstream os;
if (dy < 0) os << "\033[" << -dy << "A";
else if (dy > 0) os << "\033[" << dy << "B";
if (dx > 0) os << "\033[" << dx << "C";
else if (dx < 0) os << "\033[" << -dx << "D";
return os.str();
}
int main()
{
std::string str1 = "███████\n███1███\n███████";
std::string str2 = "███████\n███2███\n███████";
std::string str3 = "███████\n███3███\n███████";
std::cout
<< "\n" "\n\n" // blank line at top + blocks are three lines high
<< movexy( 2, -2 ) << print( str1 ) // first block is placed two spaces from left edge
<< movexy( 1, -2 ) << print( str2 ) // remaining blocks are placed one space apart
<< movexy( 1, -2 ) << print( str3 )
<< "\n\n"; // newline after last block, plus extra blank line at bottom
}
This produces the output:
███████ ███████ ███████
███1███ ███2███ ███3███
███████ ███████ ███████
The addition of spacing is, of course, entirely optional and only added for demonstrative purposes.
Advantages: UTF-8 and pretty colors!
The advantage of this method is that you do not have to store, or otherwise take special care of, strings containing multi-byte characters (UTF-8, as yours do) or additional information such as terminal color sequences.
That is, you could color each of your blocks differently by adding a color sequence to each strN variable! (The caveat is that you must repeat a color sequence after every newline. This is a known problem with various terminals...)
// red, white, and blue
std::string str1 = "\033[31m███████\n\033[31m███1███\n\033[31m███████";
std::string str2 = "\033[37m███████\n\033[37m███2███\n\033[37m███████";
std::string str3 = "\033[34m███████\n\033[34m███3███\n\033[34m███████";
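Repeating the color sequence by hand after every newline is easy to get wrong, so a small helper can insert it automatically. A sketch: colorize is a hypothetical name, 31, 37 and 34 are the standard SGR foreground codes for red, white and blue used above, and the trailing \033[0m resets the color so it does not leak into later output.

```cpp
#include <string>

// Prefixes every line of a multi-line block with "\033[<code>m"
// and appends "\033[0m" (reset) at the end.
std::string colorize(const std::string& block, int code)
{
    const std::string color = "\033[" + std::to_string(code) + "m";
    std::string out = color;
    for (char c : block) {
        out += c;
        if (c == '\n')
            out += color;   // repeat the color after every newline
    }
    return out + "\033[0m";
}
```

For example, std::string str1 = colorize("███████\n███1███\n███████", 31); builds the red block.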
Relative vs Absolute Caret Positioning
The other caveat to this particular example is that you must be aware of where the text caret (“cursor”) ends up after each output. You could instead use terminal escape sequences to position the caret absolutely before every output.
std::string gotoxy( int x, int y )
{
std::ostringstream os;
os << "\033[" << y << ";" << x << "H";
return os.str();
}
Then you wouldn’t have to care where the caret ends up. Just specify an absolute position before printing. Just don’t let the text scroll!
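Building on gotoxy, here is a sketch of absolute placement for a whole multi-line block: each line gets its own cursor-position sequence, one row further down. placeBlock is a hypothetical name; x and y are 1-based columns and rows, as in gotoxy.

```cpp
#include <sstream>
#include <string>

// Writes every line of `block` at an absolute position: line i goes to row y + i, column x.
std::string placeBlock(const std::string& block, int x, int y)
{
    std::ostringstream os;
    std::istringstream in(block);
    int row = y;
    for (std::string line; std::getline(in, line); ++row)
        os << "\033[" << row << ";" << x << "H" << line;   // same sequence as gotoxy(x, row)
    return os.str();
}
```

For example, std::cout << placeBlock(str1, 3, 2) << placeBlock(str2, 11, 2) << placeBlock(str3, 19, 2); would place the three blocks side by side with one column between them, regardless of where the caret was.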
Windows OS Considerations
Finally, if you are on Windows and using the old Windows Console, you must initialize the terminal for ANSI terminal sequences and for UTF-8 output:
#ifdef _WIN32
#include <windows.h>
void init_terminal()
{
DWORD mode;
HANDLE hStdOut = GetStdHandle( STD_OUTPUT_HANDLE );
GetConsoleMode( hStdOut, &mode );
SetConsoleMode( hStdOut, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING );
SetConsoleOutputCP( 65001 );
}
#else
void init_terminal() { }
#endif
int main()
{
init_terminal();
...
This does no harm to the new Windows Terminal. I recommend you do it either way just because you do not know which of the two your user will use to run your program, alas.

Function behaves differently when run on Windows or Linux

I have a simple function that prints lines of text to the console, centered, with the empty space filled with '=' signs. When I run this function in my program on Linux, I see the text displayed properly at the top of the console window, followed by my program's menu prompt, but on Windows it prints nothing and skips directly to the menu prompt. Both programs are compiled and run in Code::Blocks using GNU gcc with -std=c++11.
void _print_center(vector<string>& tocenter)
{
int center;
for ( int x; x<static_cast<int>(tocenter.size());x++ )
{
char sfill = '=';
string line = tocenter[x];
center = (68/2)-(tocenter[x].length()/2);
line.replace(0, 0, center, sfill);
cout << std::left << std::setfill(sfill);
cout << std::setw(68) << line << endl;
}
}
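You already got an answer to your question (uninitialized variables): in for ( int x; ... ) the loop counter x is never initialized, so its starting value is indeterminate and reading it is undefined behavior. It may happen to be 0 on one platform and garbage on another, which explains the different output; write int x = 0. I also recommend that you untangle and simplify your code so that this kind of issue doesn't creep up as often. For example: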
Create a function that centers a single string.
#include <cstddef>
#include <ostream>
#include <string>
void center( std::ostream& os, const std::string& text, int width ) {
    if ( width <= 0 || text.size() >= static_cast<std::size_t>( width ) ) {
        // Nothing to center, just print the text.
        os << text << std::endl;
    } else {
        // Total number of '=' characters to pad.
        auto to_pad = static_cast<std::size_t>( width ) - text.size();
        // Pad half on the left
        auto left_padding = to_pad / 2;
        // And half on the right (account for uneven numbers)
        auto right_padding = to_pad - left_padding;
        // Print the concatenated strings. The string constructor will
        // correctly handle a padding of zero (it will print zero `=`).
        os << std::string( left_padding, '=' )
           << text
           << std::string( right_padding, '=' )
           << std::endl;
    }
}
Once you've tested that the function works well for a single string, it is trivial to rely on C++ to apply it to a vector of strings:
#include <vector>
void center( std::ostream& os,
const std::vector< std::string >& strings,
int width ) {
for ( auto&& string : strings ) {
center( os, string, width );
}
}
Whether you use the std::string constructor or iomanip manipulators such as std::setfill, the point remains the same: do not implement "iteration and formatting" in the same function.
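For comparison, here is a sketch of the single-string step done with iomanip manipulators instead of the std::string constructor. centerWithManip is a hypothetical name. std::setw affects only the next output operation, which is why it appears twice, while std::setfill stays in effect.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Centers `text` in a field of `width` characters, padding both sides with `fill`.
std::string centerWithManip(const std::string& text, std::size_t width, char fill = '=')
{
    if (text.size() >= width)
        return text;                                         // nothing to center
    const std::size_t left  = (width - text.size()) / 2;
    const std::size_t right = width - text.size() - left;    // gets the extra char when odd
    std::ostringstream os;
    os << std::setfill(fill)
       << std::right << std::setw(static_cast<int>(left + text.size())) << text  // left pad + text
       << std::setw(static_cast<int>(right)) << "";                              // right pad only
    return os.str();
}
```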

why doesn't "cout << Color::green" compile?

I had this question on a test.
I know that I can do something like:
enum class Color { red, green = 1, blue };
Color c = Color::blue;
if( c == Color::blue )
cout << "blue\n";
But when I replace cout << "blue\n"; with cout << Color::green, it doesn't even compile. Why doesn't it compile?
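This error happens because C++ has no predefined way of printing a scoped enumeration: unlike a plain enum, an enum class does not implicitly convert to int, so none of the existing operator<< overloads accepts it. You need to define an operator<< for printing objects of the Color enum type according to your needs.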
For example, if you would like to print the numeric value, cast the color to int inside your operator:
std::ostream& operator<<(std::ostream& ostr, const Color& c) {
    ostr << static_cast<int>(c);   // the explicit conversion an enum class requires
    return ostr;
}
If you would like to print enum value as text, see this Q&A for a sample implementation.
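As a minimal sketch of the text variant (not necessarily what the linked Q&A does), a switch over the enumerators works, with a fallback that prints the numeric value for anything else. colorName is a hypothetical helper added only to make the result easy to check.

```cpp
#include <ostream>
#include <sstream>
#include <string>

enum class Color { red, green = 1, blue };

// Prints the enumerator's name instead of its number.
std::ostream& operator<<(std::ostream& os, Color c)
{
    switch (c) {
    case Color::red:   return os << "red";
    case Color::green: return os << "green";
    case Color::blue:  return os << "blue";
    }
    return os << static_cast<int>(c);   // a value that is not one of the named enumerators
}

// Hypothetical helper: formats a Color into a std::string.
std::string colorName(Color c)
{
    std::ostringstream os;
    os << c;
    return os.str();
}
```

With this operator in scope, cout << Color::green prints green.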

Convert textual date and time to ATL::CTime

Given a textual date and time, for example:
Sat, 13 Jan 2018 07:54:39 -0500 (EST)
How can I convert it into ATL / MFC CTime?
The function should return (in case of my example):
CTime(2018,1,13,7,54,39)
Either in GMT/UTC or adjusted for the time zone.
Update:
I tried writing the following function, but it seems that ParseDateTime() always fails.
CTime DateTimeString2CTime(CString DateTimeStr)
{
COleDateTime t;
if (t.ParseDateTime(DateTimeStr))
{
CTime result(t);
return result;
}
return (CTime)NULL;
}
As an alternative to manual parsing, you could use the COleDateTime class and its member COleDateTime::ParseDateTime:
bool ParseDateTime(
LPCTSTR lpszDate,
DWORD dwFlags = 0,
LCID lcid = LANG_USER_DEFAULT) throw();
From the docs:
The lpszDate parameter can take a variety of formats.
For example, the following strings contain acceptable date/time formats:
"25 January 1996"
"8:30:00"
"20:30:00"
"January 25, 1996 8:30:00"
"8:30:00 Jan. 25, 1996"
"1/25/1996 8:30:00" // always specify the full year,
// even in a 'short date' format
From there you could convert to CTime if needed.
You have to parse the string into the individual time components, convert these to integers and pass them to the appropriate CTime constructor.
There are many ways to parse it; one of the most straightforward and easy-to-maintain ways is to use regular expressions (once you get used to the syntax):
#include <iostream>
#include <regex>
void test( std::wstring const& s, std::wregex const& r );
int main()
{
std::wregex const r{
LR"(.*?)" // any characters (none or more)
LR"((\d+))" // match[1] = day
LR"(\s*)" // whitespace (none or more)
LR"((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))" // match[2] = month
LR"(\s*)" // whitespace (none or more)
LR"((\d+))" // match[3] = year
LR"(\s+)" // whitespace (1 or more)
LR"((\d+))" // match[4] = hour
LR"(\s*:\s*)" // whitespace (none or more), colon (1), whitespace (none or more)
LR"((\d+))" // match[5] = minute
LR"((?:\s*:\s*(\d+))?)" // match[6] = second (optional: zero or one time)
LR"(.*)" // any characters (none or more)
, std::regex_constants::icase };
test( L"Sat, 13 Jan 2018 07:54:39 -0500 (EST)", r );
test( L"Wed, 10 jan2018 18:30 +0100", r );
test( L"10Jan 2018 18 :30 : 00 + 0100", r );
}
void test( std::wstring const& s, std::wregex const& r )
{
std::wsmatch m;
if( regex_match( s, m, r ) )
{
std::wcout
<< L"Day : " << m[ 1 ] << L'\n'
<< L"Month : " << m[ 2 ] << L'\n'
<< L"Year : " << m[ 3 ] << L'\n'
<< L"Hour : " << m[ 4 ] << L'\n'
<< L"Minute : " << m[ 5 ] << L'\n'
<< L"Second : " << m[ 6 ] << L'\n';
}
else
{
std::wcout << "no match" << '\n';
}
std::wcout << std::endl;
}
You specify a pattern (the r variable) that encloses each component in parentheses. After the call to regex_match, the result is stored in the variable m, where you can access each component (aka sub-match) through the subscript operator. Each sub-match converts to a std::wstring via str(), which you can then pass to std::stoi to get the integers for the CTime constructor.
If necessary, catch exceptions that can be thrown by the regex library as well as by std::stoi. I've omitted this code for brevity.
Edit:
After OP commented that a more robust parsing is required, I modified the regex accordingly.
As can be seen in the calls to the test() function, the whitespace requirements are more relaxed now. Also, the seconds part of the timestamp is now optional. This is implemented using a non-capturing group that is introduced with (?: and ends with ). By putting a ? after that group, the whole group (including whitespace, : and digits) can occur zero times or once, but only the digits are captured.
Note: LR"()" designates a wide raw string literal, which makes the regex more readable (it avoids escaping the backslash). So the outer parentheses are not part of the actual regex!
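To connect the sub-matches to CTime, one more step converts them to integers. This is a sketch, assuming C++17 for std::optional: DateParts and parseTimestamp are hypothetical names, the month name is mapped to 1..12 after lower-casing (because of icase), missing seconds become 0, and the time-zone offset is ignored. The fields could then be passed to CTime(d.year, d.month, d.day, d.hour, d.minute, d.second).

```cpp
#include <array>
#include <cstddef>
#include <cwctype>
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type holding the fields the CTime constructor takes.
struct DateParts { int year, month, day, hour, minute, second; };

std::optional<DateParts> parseTimestamp(const std::wstring& s)
{
    // Same groups as the regex above: 1 day, 2 month, 3 year, 4 hour, 5 minute, 6 second (optional)
    static const std::wregex r{
        LR"(.*?(\d+)\s*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s*(\d+)\s+(\d+)\s*:\s*(\d+)(?:\s*:\s*(\d+))?.*)",
        std::regex_constants::icase };
    std::wsmatch m;
    if (!std::regex_match(s, m, r))
        return std::nullopt;

    // The month is matched case-insensitively, so lower-case it before the lookup.
    std::wstring mon = m[2].str();
    for (auto& c : mon)
        c = static_cast<wchar_t>(std::towlower(c));
    static const std::array<const wchar_t*, 12> names{
        L"jan", L"feb", L"mar", L"apr", L"may", L"jun",
        L"jul", L"aug", L"sep", L"oct", L"nov", L"dec" };
    int month = 0;
    for (std::size_t i = 0; i < names.size(); ++i)
        if (mon == names[i])
            month = static_cast<int>(i) + 1;

    DateParts d{};
    d.day    = std::stoi(m[1].str());
    d.month  = month;
    d.year   = std::stoi(m[3].str());
    d.hour   = std::stoi(m[4].str());
    d.minute = std::stoi(m[5].str());
    d.second = m[6].matched ? std::stoi(m[6].str()) : 0;   // seconds are optional
    return d;   // the time-zone offset (e.g. -0500) is ignored
}
```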
For manual parsing one could employ std::wstringstream. In my opinion, its only advantage over regular expressions is probably better performance. Otherwise this solution is just harder to maintain, for instance if the time format must be changed in the future.
#include <iostream>
#include <sstream>
#include <array>
#include <string>
int month_to_int( std::wstring const& m )
{
std::array<wchar_t const*, 12> names{ L"Jan", L"Feb", L"Mar", L"Apr", L"May", L"Jun", L"Jul", L"Aug", L"Sep", L"Oct", L"Nov", L"Dec" };
for( std::size_t i = 0; i < names.size(); ++i )
{
if( names[ i ] == m )
return i + 1;
}
return 0;
}
int main()
{
std::wstringstream s{ L"Sat, 13 Jan 2018 07:54:39 -0500 (EST)" };
std::wstring temp;
int day, month, year, hour, minute, second;
// operator >> reads until whitespace delimiter
s >> temp;
s >> day;
s >> temp; month = month_to_int( temp );
s >> year;
// use getline to explicitly specify the delimiter
std::getline( s, temp, L':' ); hour = std::stoi( temp );
std::getline( s, temp, L':' ); minute = std::stoi( temp );
// last token separated by whitespace again
s >> second;
std::cout
<< "Day : " << day << '\n'
<< "Month : " << month << '\n'
<< "Year : " << year << '\n'
<< "Hour : " << hour << '\n'
<< "Minute : " << minute << '\n'
<< "Second : " << second << '\n';
}
Again, no error handling here for brevity. You should check stream state after each input operation or call std::wstringstream::exceptions() after construction to enable exceptions and handle them.
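A sketch of the stream-state check for the first few fields: a chained >> stops at the first failed extraction (later extractions do nothing), and the stream then converts to false. readDateFields is a hypothetical name, and only the weekday, day, month and year are read here.

```cpp
#include <sstream>
#include <string>

// Reads "<weekday> <day> <month> <year>" from the front of `text`.
// Returns false instead of leaving the caller with garbage when any field fails to parse.
bool readDateFields(const std::wstring& text, int& day, std::wstring& month, int& year)
{
    std::wistringstream s{ text };
    std::wstring weekday;   // e.g. L"Sat," is read and discarded
    return static_cast<bool>(s >> weekday >> day >> month >> year);
}
```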

Anything like substr but instead of stopping at the byte you specified, it stops at a specific string [duplicate]

This question already has answers here:
How do you search a std::string for a substring in C++?
I have a client for a pre-existing server. Let's say I get some packets "MC123, 456!##".
I store these packets in a char called message. To print out a specific part of them, in this case the numbers part of them, I would do something like "cout << message.substr(3, 7) << endl;".
But what if I receive another message, "MC123, 456, 789!##"? "cout << message.substr(3,7)" would only print out "123, 456", whereas I want "123, 456, 789". How would I do this, assuming I know that every message ends with "!##"?
First - Sketch out the indexing.
std::string packet1 = "MC123, 456!##";
// 0123456789012345678
// ^------^ desired text
std::string packet2 = "MC123, 456, 789!##";
// 0123456789012345678
// ^-----------^ desired text
The other answers are OK. If you wish to use std::string::find,
consider rfind and find_first_not_of, as in the following code:
#include <iostream>
#include <string>
// forward declaration
void messageShow(std::string packet,
size_t startIndx = 2);
// /////////////////////////////////////////////////////////////////////////////
int main (int, char** )
{
// 012345678901234567
// |
messageShow("MC123, 456!##");
messageShow("MC123, 456, 789!##");
messageShow("MC123, 456, 789, 987, 654!##");
// error test cases
messageShow("MC123, 456, 789##!"); // missing !##
messageShow("MC123x 456, 789!##"); // extraneous char in packet
return(0);
}
void messageShow(std::string packet,
size_t startIndx) // default value 2
{
static size_t seq = 0;
seq += 1;
std::cout << packet.size() << " packet" << seq << ": '"
<< packet << "'" << std::endl;
do
{
size_t bangAtPound_Indx = packet.rfind("!##");
if(bangAtPound_Indx == std::string::npos){ // not found, can't do anything more
std::cerr << " '!##' not found in packet " << seq << std::endl;
break;
}
size_t printLength = bangAtPound_Indx - startIndx;
const std::string DIGIT_SPACE = "0123456789, ";
size_t allDigitSpace = packet.find_first_not_of(DIGIT_SPACE, startIndx);
if(allDigitSpace != bangAtPound_Indx) {
std::cerr << " extraneous char found in packet " << seq << std::endl;
break; // something extraneous in string
}
std::cout << bangAtPound_Indx << " message" << seq << ": '"
<< packet.substr(startIndx, printLength) << "'" << std::endl;
}while(0);
std::cout << std::endl;
}
This outputs
13 packet1: 'MC123, 456!##'
10 message1: '123, 456'
18 packet2: 'MC123, 456, 789!##'
15 message2: '123, 456, 789'
28 packet3: 'MC123, 456, 789, 987, 654!##'
25 message3: '123, 456, 789, 987, 654'
18 packet4: 'MC123, 456, 789##!'
'!##' not found in packet 4
18 packet5: 'MC123x 456, 789!##'
extraneous char found in packet 5
Note: String indexes start at 0. The index of the digit '1' is 2.
The correct approach is to look for existence / location of the "known termination" string, then take the substring up to (but not including) that substring.
Something like
std::string termination = "!#$";
std::size_t position = message.find(termination);
std::string importantBit = message.substr(0, position);
You could check the front of the string separately as well. Combining these, you could use regular expressions to make your code more robust, using a regex like
MC([0-9,]+)!#\$
This will return the bit between MC and !#$ but only if it consists entirely of numbers and commas. Obviously you can adapt this as needed.
UPDATE: you asked in your comment how to use the regular expression. Here is a very simple program. Note: this uses C++11, so you need to make sure your compiler supports it.
#include <iostream>
#include <regex>
int main(void) {
std::string s ("ABC123,456,789!#$");
std::smatch m;
std::regex e ("ABC([0-9,]+)!#\\$"); // matches the kind of pattern you are looking for
if (std::regex_search (s,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
}
}
On my Mac, I can compile the above program with
clang++ -std=c++0x -stdlib=libc++ match.cpp -o match
If instead of just digits and commas you want "anything" in your expression (but it's still got fixed characters in front and behind) you can simply do
std::regex e ("ABC(.*)!#\\$");
Here, .* means "zero or more of 'anything'", but followed by !#$. The double backslash has to be there to "escape" the dollar sign, which has special meaning in regular expressions (it means "the end of the string").
The more accurately your regular expression reflects exactly what you expect, the better you will be able to trap any errors. This is usually a very good thing in programming. "Always check your inputs".
One more thing: I just noticed you mentioned that you might have "more stuff" in your string. This is where regular expressions quickly become the best tool. You mentioned a string
MC123, 456!##*USRChester.
and wanted to extract 123, 456 and Chester. That is - stuff between MC and !#$, and more stuff after USR (if that is even there). Here is the code that shows how that is done:
#include <iostream>
#include <regex>
int main(void) {
std::string s1 ("MC123, 456!#$");
std::string s2 ("MC123, 456!#$USRChester");
std::smatch m;
std::regex e ("MC([0-9, ]+)!#\\$(?:USR)?(.*)$"); // matches the kind of pattern you are looking for
if (std::regex_search (s1,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
}
if (std::regex_search (s2,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
if (m[2].length() > 0) {
std::cout << m[2] << ": " << m[1] << std::endl;
}
}
}
Output:
match[0] = MC123, 456!#$
match[1] = 123, 456
match[2] =
match[0] = MC123, 456!#$USRChester
match[1] = 123, 456
match[2] = Chester
Chester: 123, 456
The matches are:
match[0] : "everything in the input string that was consumed by the Regex"
match[1] : "the thing in the first set of parentheses"
match[2] : "The thing in the second set of parentheses"
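Note the use of the slightly tricky (?:USR)? expression. The trailing ? makes the group optional, so the text might or might not be followed by the characters USR. The ?: makes the group non-capturing: if USR is present it is consumed but not returned as a sub-match, and (.*) then captures what follows.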
As you can see, simply testing whether m[2] is empty will tell you whether you have just numbers, or number plus "the thing after the USR". I hope this gives you an inkling of the power of regular expressions for chomping through strings like yours.
If you are sure about the ending of the message, message.substr(3, message.size()-6) will do the trick.
However, it is good practice to check everything, just to avoid surprises.
Something like this:
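// std::runtime_error needs #include <stdexcept>
if (message.size() < 6)
    throw std::runtime_error("message too short");
if (message.substr(0,3) != "MCX") //the exact numbers do not match in your example, but you get the point...
    throw std::runtime_error("unexpected header");
if (message.substr(message.size()-3) != "!##")
    throw std::runtime_error("missing !## terminator");
std::string data = message.substr(3, message.size()-6);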
Just calculate the offset first.
string str = ...;
size_t start = 3;
size_t end = str.find("!##");
assert(end != string::npos);
return str.substr(start, end - start);
You can get the index of "!##" by using:
message.find("!##")
Then use that answer instead of 7. You should also check for it equalling std::string::npos which indicates that the substring was not found, and take some different action.
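Putting find and the npos check together, here is a sketch of a helper that validates both ends of the packet. extractPayload is a hypothetical name, the "MC" prefix and "!##" terminator are taken from the question, and std::optional assumes C++17.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Returns the text between `prefix` and the first `terminator` after it,
// or std::nullopt if the message does not start with `prefix` or has no terminator.
std::optional<std::string> extractPayload(const std::string& message,
                                          const std::string& prefix = "MC",
                                          const std::string& terminator = "!##")
{
    if (message.compare(0, prefix.size(), prefix) != 0)
        return std::nullopt;                                  // wrong or missing prefix
    const std::size_t end = message.find(terminator, prefix.size());
    if (end == std::string::npos)
        return std::nullopt;                                  // terminator not found
    return message.substr(prefix.size(), end - prefix.size());
}
```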
std::string msg = "MC4,512,541,3123!##";
// print the characters after "MC", stopping at the "!##" terminator
for (std::size_t i = 2; i + 2 < msg.length() && msg.compare(i, 3, "!##") != 0; i++) {
    std::cout << msg[i];
}
Or use a char[]:
char msg[] = "MC4,123,54!##";
sizeof(msg) - 1; //instead of msg.length()
// -1 for the null byte at the end (each char takes 1 byte so the size -1 == number of chars)