Convert textual date and time to ATL::CTime - c++

Given a textual date and time, for example:
Sat, 13 Jan 2018 07:54:39 -0500 (EST)
How can I convert it into ATL / MFC CTime?
The function should return (in case of my example):
CTime(2018,1,13,7,54,39)
Either in GMT/UTC or together with the time zone.
Update:
I tried writing the following function, but it seems that ParseDateTime() always fails.
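CTime DateTimeString2CTime(CString DateTimeStr)
{
COleDateTime t;
if (t.ParseDateTime(DateTimeStr))
{
// CTime has no constructor taking a COleDateTime; go through SYSTEMTIME
SYSTEMTIME st;
if (t.GetAsSystemTime(st))
return CTime(st);
}
return CTime(); // parsing failed: return a default-constructed CTime
}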

As an alternative to manual parsing, you could use the COleDateTime class and its member function COleDateTime::ParseDateTime:
bool ParseDateTime(
LPCTSTR lpszDate,
DWORD dwFlags = 0,
LCID lcid = LANG_USER_DEFAULT) throw();
From the docs:
The lpszDate parameter can take a variety of formats.
For example, the following strings contain acceptable date/time formats:
"25 January 1996"
"8:30:00"
"20:30:00"
"January 25, 1996 8:30:00"
"8:30:00 Jan. 25, 1996"
"1/25/1996 8:30:00" // always specify the full year,
// even in a 'short date' format
From there you could convert to CTime if needed.
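The string in the question starts with a weekday ("Sat,") and ends with a numeric offset and a zone name ("-0500 (EST)"). None of these appear in the example formats quoted above, which is a plausible (untested) reason ParseDateTime rejects it. The following is a minimal, portable sketch that separates those parts before the MFC call; SplitTimestamp and splitTimestamp are hypothetical names, and the function only assumes the RFC 2822-like layout from the question:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical result type: the part ParseDateTime is more likely to accept,
// plus the numeric UTC offset that was removed from the input.
struct SplitTimestamp
{
    std::string dateTime;   // e.g. "13 Jan 2018 07:54:39"
    int offsetMinutes = 0;  // e.g. -300 for "-0500"
};

// Drops the weekday ("Sat,") and zone name ("(EST)") tokens and converts a
// "+hhmm"/"-hhmm" token into minutes. Every other token is kept as-is.
SplitTimestamp splitTimestamp(const std::string& input)
{
    SplitTimestamp result;
    std::istringstream in(input);
    std::string token;
    while (in >> token)
    {
        if (token.back() == ',' || token.front() == '(')
            continue; // weekday or parenthesised zone name
        if (token.size() == 5 && (token[0] == '+' || token[0] == '-'))
        {
            const int minutes = std::stoi(token.substr(1, 2)) * 60 + std::stoi(token.substr(3, 2));
            result.offsetMinutes = (token[0] == '-') ? -minutes : minutes;
            continue;
        }
        if (!result.dateTime.empty())
            result.dateTime += ' ';
        result.dateTime += token;
    }
    return result;
}
```

For the question's input this yields "13 Jan 2018 07:54:39" and an offset of -300 minutes. The first part could then be handed to COleDateTime::ParseDateTime (whether it is accepted may depend on the locale), and subtracting the offset converts to UTC: 07:54:39 at -0500 is 12:54:39 UTC. The MFC side is not exercised here.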

You have to parse the string into the individual time components, convert these to integers and pass them to the appropriate CTime constructor.
There are many ways for parsing, one of the most straightforward and easy-to-maintain ways is to use regular expressions (once you get used to the syntax):
#include <iostream>
#include <regex>
void test( std::wstring const& s, std::wregex const& r );
int main()
{
std::wregex const r{
LR"(.*?)" // any characters (none or more)
LR"((\d+))" // match[1] = day
LR"(\s*)" // whitespace (none or more)
LR"((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))" // match[2] = month
LR"(\s*)" // whitespace (none or more)
LR"((\d+))" // match[3] = year
LR"(\s+)" // whitespace (1 or more)
LR"((\d+))" // match[4] = hour
LR"(\s*:\s*)" // whitespace (none or more), colon (1), whitespace (none or more)
LR"((\d+))" // match[5] = minute
LR"((?:\s*:\s*(\d+))?)" // match[6] = second (optional)
LR"(.*)" // any characters (none or more)
, std::regex_constants::icase };
test( L"Sat, 13 Jan 2018 07:54:39 -0500 (EST)", r );
test( L"Wed, 10 jan2018 18:30 +0100", r );
test( L"10Jan 2018 18 :30 : 00 + 0100", r );
}
void test( std::wstring const& s, std::wregex const& r )
{
std::wsmatch m;
if( regex_match( s, m, r ) )
{
std::wcout
<< L"Day : " << m[ 1 ] << L'\n'
<< L"Month : " << m[ 2 ] << L'\n'
<< L"Year : " << m[ 3 ] << L'\n'
<< L"Hour : " << m[ 4 ] << L'\n'
<< L"Minute : " << m[ 5 ] << L'\n'
<< L"Second : " << m[ 6 ] << L'\n';
}
else
{
std::wcout << "no match" << '\n';
}
std::wcout << std::endl;
}
You specify a pattern (the r variable) that encloses each component in parentheses. After the call to regex_match, the result is stored in the variable m, where you can access each component (a.k.a. sub-match) through the subscript operator. Each sub-match is a std::wssub_match, which converts to std::wstring (or call .str() on it).
If necessary, catch exceptions that can be thrown by the regex library as well as by std::stoi (when converting the sub-matches to integers). I've omitted this code for brevity.
Edit:
After OP commented that a more robust parsing is required, I modified the regex accordingly.
As can be seen in the calls to the test() function, the whitespace requirements are more relaxed now. Also the seconds part of the timestamp is now optional. This is implemented using a non-capturing group that is introduced with (?: and ends with ). By putting a ? after that group, the whole group (including whitespace, : and digits) can occur none or one time, but only the digits are captured.
Note: LR"()" designates a raw string literal, which makes the regex more readable (it avoids escaping the backslash). So the outer parentheses are not part of the actual regex!
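The demo above only prints the sub-matches; to call the CTime constructor they still have to become integers. The following is a sketch of that step, assuming the same group numbering as the regex above; DateTimeFields, monthFromName and toFields are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cwctype>
#include <regex>
#include <string>

// Hypothetical holder for the converted components.
struct DateTimeFields
{
    int year = 0, month = 0, day = 0, hour = 0, minute = 0, second = 0;
};

// "Jan".."Dec" in any letter case (the regex uses icase) -> 1..12, 0 if unknown.
int monthFromName(std::wstring name)
{
    for (wchar_t& c : name)
        c = static_cast<wchar_t>(std::towlower(c));
    static const std::array<const wchar_t*, 12> names{ L"jan", L"feb", L"mar", L"apr", L"may", L"jun",
                                                        L"jul", L"aug", L"sep", L"oct", L"nov", L"dec" };
    for (std::size_t i = 0; i < names.size(); ++i)
        if (name == names[i])
            return static_cast<int>(i + 1);
    return 0;
}

// Converts the sub-matches (1 day, 2 month, 3 year, 4 hour, 5 minute,
// 6 optional second) into integers. std::stoi may throw
// std::invalid_argument or std::out_of_range.
DateTimeFields toFields(const std::wsmatch& m)
{
    DateTimeFields f;
    f.day    = std::stoi(m[1].str());
    f.month  = monthFromName(m[2].str());
    f.year   = std::stoi(m[3].str());
    f.hour   = std::stoi(m[4].str());
    f.minute = std::stoi(m[5].str());
    f.second = m[6].matched ? std::stoi(m[6].str()) : 0; // seconds are optional
    return f;
}
```

With the fields in hand, the CTime constructor taking year, month, day, hour, minute and second can be used, e.g. CTime(f.year, f.month, f.day, f.hour, f.minute, f.second). That MFC call is not tested here.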
For manual parsing one could employ std::wstringstream. In my opinion, the only advantage over regular expressions is probably better performance. Otherwise this solution is just harder to maintain, for instance if the time format must be changed in the future.
#include <iostream>
#include <sstream>
#include <array>
#include <string>
int month_to_int( std::wstring const& m )
{
std::array<wchar_t const*, 12> names{ L"Jan", L"Feb", L"Mar", L"Apr", L"May", L"Jun", L"Jul", L"Aug", L"Sep", L"Oct", L"Nov", L"Dec" };
for( std::size_t i = 0; i < names.size(); ++i )
{
if( names[ i ] == m )
return static_cast<int>( i + 1 ); // size_t -> int without a narrowing warning
}
return 0;
}
int main()
{
std::wstringstream s{ L"Sat, 13 Jan 2018 07:54:39 -0500 (EST)" };
std::wstring temp;
int day, month, year, hour, minute, second;
// operator >> reads until whitespace delimiter
s >> temp;
s >> day;
s >> temp; month = month_to_int( temp );
s >> year;
// use getline to explicitly specify the delimiter
std::getline( s, temp, L':' ); hour = std::stoi( temp );
std::getline( s, temp, L':' ); minute = std::stoi( temp );
// last token separated by whitespace again
s >> second;
std::cout
<< "Day : " << day << '\n'
<< "Month : " << month << '\n'
<< "Year : " << year << '\n'
<< "Hour : " << hour << '\n'
<< "Minute : " << minute << '\n'
<< "Second : " << second << '\n';
}
Again, no error handling here for brevity. You should check stream state after each input operation or call std::wstringstream::exceptions() after construction to enable exceptions and handle them.
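As a sketch of the stream-state checking mentioned above, here is a version that returns false instead of producing garbage. Fields, monthIndex and parseChecked are hypothetical names, and it swaps getline for reading each ':' as a single character:

```cpp
#include <array>
#include <cassert>
#include <sstream>
#include <string>

struct Fields
{
    int day = 0, month = 0, year = 0, hour = 0, minute = 0, second = 0;
};

// Same lookup as month_to_int above: "Jan".."Dec" -> 1..12, 0 if unknown.
int monthIndex(const std::wstring& m)
{
    static const std::array<const wchar_t*, 12> names{ L"Jan", L"Feb", L"Mar", L"Apr", L"May", L"Jun",
                                                        L"Jul", L"Aug", L"Sep", L"Oct", L"Nov", L"Dec" };
    for (std::size_t i = 0; i < names.size(); ++i)
        if (m == names[i])
            return static_cast<int>(i + 1);
    return 0;
}

// Parses "Www, DD Mon YYYY hh:mm:ss ..." and returns false as soon as a read
// fails; the stream state is tested after each chain of >> operations.
bool parseChecked(const std::wstring& text, Fields& f)
{
    std::wistringstream s(text);
    std::wstring weekday, monthName;
    wchar_t colon1 = 0, colon2 = 0;
    if (!(s >> weekday >> f.day >> monthName >> f.year))
        return false;
    if (!(s >> f.hour >> colon1 >> f.minute >> colon2 >> f.second))
        return false;
    if (colon1 != L':' || colon2 != L':')
        return false;
    f.month = monthIndex(monthName);
    return f.month != 0;
}
```

Reading the separators as wchar_t keeps every step a formatted extraction, so a single `if (!(s >> ...))` covers each group of reads.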


Problem with special characters with RegEx in C++

I have a problem with special characters in a string (from IIS SharePoint log files) that contains a domain name followed by a backslash and a user name starting with t, n or r, which get confused with escape sequences. My code is as follows:
std::setlocale(LC_ALL, ".ACP"); //Sets the locale to the ANSI code page obtained from the operating system. FR characters
std::string subject("2018-08-26 11:38:20 172.20.1.148 GET /BaseDocumentaire/Documents+de+la+page+Notes+de+services/Rappel+du+dispositif+de+Sécurité+relatif+aux+Moyens+de+paiement+et+d’épargne+en+agence.pdf - 80 0#.w|domainname\tonzaro 10.12.105.24 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64;+rv:61.0)+Gecko/20100101+Firefox/61.0 200 0 0 29984");
std::string result;
std::string g1, g2, g5, g9, g10; //str groups in regex
try {
std::regex re("(\\d{4}-\\d{2}-\\d{2})( \\d{2}:\\d{2}:\\d{2})( 172.20.1.148)( GET | POST | HEAD )((/.*){1,4}/.*.(pdf|aspx))( -.*)(domainname.[a-zA-Z0-9]*)( \\d+.\\d+.\\d+.\\d+)");
std::sregex_iterator next(subject.begin(), subject.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\n";
std::cout << "-------------------------------------------" << "\n";
g1 = match.str(1);
g2 = match.str(2);
g5 = match.str(5);
g9 = match.str(9);
g10 = match.str(10);
next++;
}
std::cout << "Date: " + g1 << "\n";
std::cout << "Time: " + g2 << "\n";
std::replace(g5.begin(), g5.end(), '+', ' ');
std::cout << "Link Document : " + g5 << "\n";
std::cout << "User: " + g9 << "\n";
std::cout << "IP: " + g10 << "\n";
}
catch (std::regex_error& e) {
std::cout << "Syntax error in the regular expression" << "\n";
}
My output for domain name is: domainname onzaro
Any help with this problem with \, \t, \n or \r?
I'd urge you to use raw string literals. They are designed for cases like yours, where the literal should not be processed in any way.
The syntax is R"delimiter(raw_characters)delimiter", so in your case it could be:
std::string subject(R"raw(2018-08-26 11:38:20 172.20.1.148 GET /BaseDocumentaire/Documents+de+la+page+Notes+de+services/Rappel+du+dispositif+de+Sécurité+relatif+aux+Moyens+de+paiement+et+d’épargne+en+agence.pdf - 80 0#.w|domainname\tonzaro 10.12.105.24 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64;+rv:61.0)+Gecko/20100101+Firefox/61.0 200 0 0 29984)raw");
std::regex re( R"raw((\d{4}-\d{2}-\d{2})( \d{2}:\d{2}:\d{2})( 172.20.1.148)( GET | POST | HEAD )((/.*){1,4}/.*.(pdf|aspx))( -.*)(domainname.[a-zA-Z0-9]*)( \d+.\d+.\d+.\d+))raw");
(I might have missed some superfluous \ above.)
Those special characters are called escape sequences, and they are processed in string literals at compile time (in translation phase 5, to be precise). For raw string literals this transformation is suppressed.
You don't need to worry about any special-character handling. You just need to make sure that ")delimiter" doesn't appear inside your literal, which I imagine could happen in a regex (that is what a custom delimiter such as raw is for).
'\t' is one character, a horizontal tab. If you want the characters \ and t, you need to escape the backslash: "\\t".
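To make the difference concrete, here is a small sketch comparing an ordinary and a raw literal, plus a raw-string regex that pulls the user name out of a log line. extractUser is a hypothetical helper, and the pattern assumes the user name consists of word characters only:

```cpp
#include <cassert>
#include <regex>
#include <string>

// In an ordinary literal "\t" is ONE character (a tab); in a raw literal
// R"(\t)" it is TWO characters, '\' and 't'.
const std::string cooked = "domainname\tonzaro";   // 17 chars, contains a tab
const std::string raw    = R"(domainname\tonzaro)"; // 18 chars, contains a backslash

// Hypothetical helper: returns the user name after "domainname\" in a log
// line, or an empty string. In the raw regex literal, \\ is the regex escape
// for one literal backslash, and \w needs no extra C++ escaping.
std::string extractUser(const std::string& logLine)
{
    static const std::regex re(R"(domainname\\(\w+))");
    std::smatch m;
    return std::regex_search(logLine, m, re) ? m[1].str() : std::string();
}
```

With the raw subject string, extractUser returns "tonzaro"; with the ordinary one it finds nothing, because the compiler has already turned \t into a tab.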

Convert a char array of escaped UTF-8 octets to a string in C++

I have a char array containing some UTF-8-encoded Turkish characters - in the form of escaped octets. Thus if I run this code in C++11:
void foo(char* utf8_encoded) {
cout << utf8_encoded << endl;
}
it prints \xc4\xb0-\xc3\x87-\xc3\x9c-\xc4\x9e. I want to convert this char[] to an std::string so that it contains UTF-8 decoded values İ-Ç-Ü-Ğ. I have converted that char[] to wstring but it still prints as \xc4\xb0-\xc3\x87-\xc3\x9c-\xc4\x9e. How can I do that?
EDIT: I'm not the one who constructs this char[]. It is a fixed-length member of a parameter passed to a callback function by a private library. The callback function is as follows:
void some_callback_function (INFO *info) {
cout << info->some_char_array << endl;
cout << "*****" << endl;
for(int i=0; i<64; i++) {
cout << "-" << info->some_char_array[i];
}
cout << "*****" << endl;
char bar[65] = "\xc4\xb0-\xc3\x87-\xc3\x9c-\xc4\x9e";
cout << bar << endl;
}
Where the INFO struct is:
typedef struct {
char some_char_array[65];
} INFO;
So when my callback function is called, the output is as follows:
\xc4\xb0-\xc3\x87-\xc3\x9c-\xc4\x9e
*****
-\-x-c-4-\-x-b-0---\-x-c-3-\-x-8-7---\-x-c-3-\-x-9-c---\-x-c-4-\-x-9-e-----------------------------
*****
İ-Ç-Ü-Ğ
So my current question is: what is the difference between the info->some_char_array and bar char arrays? What I want is to change info->some_char_array so that it prints as İ-Ç-Ü-Ğ.
OK, this is a bit of a handful, ripped out of a larger parser I am using. But "a bit of a handful" is the nature of Boost.Spirit. ;-)
The parser will not only parse hexadecimal escapes, but octals (\123) and "standard" escapes (\n) as well. Provided under CC0, so you can do with it whatever you like. ;-)
Boost.Spirit is a "header only" part of Boost, so you don't need to link in any library code. The rather involved "magic" done by the Spirit headers to allow grammars expressed in C++ source this way is a bit hard on the compile time, though.
But it works, and works well.
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include "boost/spirit/include/qi.hpp"
#include "boost/spirit/include/phoenix.hpp"
#include <iostream>
#include <string>
#include <cstring>
#include <sstream>
#include <stdexcept>
namespace
{
// Helper function: Turn on_error positional parameters into error message.
template< typename Iterator >
std::string make_error_message( boost::spirit::info const & info, Iterator first, Iterator last )
{
std::ostringstream oss;
oss << "Invalid sequence. Expecting " << info << " here: \"" << std::string( first, last ) << "\"";
return oss.str();
}
}
// Wrap helper function with Boost.Phoenix boilerplate, so the function
// can be called from within a parser's [].
BOOST_PHOENIX_ADAPT_FUNCTION( std::string, make_error_message_, make_error_message, 3 )
// Supports various escape sequences:
// - Character escapes ( \a \b \f \n \r \t \v \" \\ )
// - Octal escapes ( \n \nn \nnn )
// - Hexadecimal escapes ( \xnn ) (*)
//
// (*): In C/C++, a hexadecimal escape runs until the first non-hexdigit
// is encountered, which is not very helpful. This one takes exactly
// two hexdigits.
// Declaring a grammar that works given any kind of iterator,
// and results in a std::string object.
template < typename Iterator >
class EscapedString : public boost::spirit::qi::grammar< Iterator, std::string() >
{
public:
// Constructor
EscapedString() : EscapedString::base_type( escaped_string )
{
// An escaped string is a sequence of
// characters that are not '\', or
// an escape sequence
escaped_string = *( +( boost::spirit::ascii::char_ - '\\' ) | escapes );
// An escape sequence begins with '\', followed by
// an escaped character (e.g. "\n"), or
// an 'x' and 2..2 hexadecimal digits, or
// 1..3 octal digits.
escapes = '\\' > ( escaped_character
| ( "x" > boost::spirit::qi::uint_parser< char, 16, 2, 2 >() )
| boost::spirit::qi::uint_parser< char, 8, 1, 3 >() );
// The list of special "escape" characters
escaped_character.add
( "a", 0x07 ) // alert
( "b", 0x08 ) // backspace
( "f", 0x0c ) // form feed
( "n", 0x0a ) // new line
( "r", 0x0d ) // carriage return
( "t", 0x09 ) // horizontal tab
( "v", 0x0b ) // vertical tab
( "\"", 0x22 ) // literal quotation mark
( "\\", 0x5c ) // literal backslash
;
// Error handling
boost::spirit::qi::on_error< boost::spirit::qi::fail >
(
escapes,
// backslash not followed by a valid sequence
boost::phoenix::throw_(
boost::phoenix::construct< std::runtime_error >( make_error_message_( boost::spirit::_4, boost::spirit::_3, boost::spirit::_2 ) )
)
);
}
private:
// Qi Rule member
boost::spirit::qi::rule< Iterator, std::string() > escaped_string;
// Helpers
boost::spirit::qi::rule< Iterator, std::string() > escapes;
boost::spirit::qi::symbols< char const, char > escaped_character;
};
int main()
{
// Need to escape the backslashes, or "\xc4" would give *one*
// byte of output (0xc4, decimal 196). I understood the input
// to be the FOUR character hex char literal,
// backslash, x, c, 4 in this case,
// which is what this string literal does.
const char * some_char_array = "\\xc4\\xb0-\\xc3\\x87-\\xc3\\x9c-\\xc4\\x9e";
std::cout << "Input: '" << some_char_array << "'\n";
// result object
std::string s;
// Create an instance of the grammar with "const char *" as the
// iterator type (a string literal cannot initialize a non-const
// char * in C++11).
EscapedString< const char * > es;
// qi::parse advances its first argument, so pass a copy
const char * first = some_char_array;
const char * last = some_char_array + std::strlen( some_char_array );
// start, end, parsing grammar, result object
boost::spirit::qi::parse( first, last, es, s );
std::cout << "Output: '" << s << "'\n";
return 0;
}
This gives:
Input: '\xc4\xb0-\xc3\x87-\xc3\x9c-\xc4\x9e'
Output: 'İ-Ç-Ü-Ğ'
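If a Boost dependency is not wanted and only \xHH escapes occur (as in the callback output above), a small hand-written loop may be enough. This is a sketch under that assumption; decodeHexEscapes is a hypothetical name, and octal or character escapes are not handled:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: replaces each literal "\xHH" (backslash, 'x', two hex
// digits) with the single byte it denotes. Everything else, including a
// malformed escape, is copied through unchanged.
std::string decodeHexEscapes(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const bool isHexEscape =
            in[i] == '\\' && i + 3 < in.size() && in[i + 1] == 'x' &&
            std::isxdigit(static_cast<unsigned char>(in[i + 2])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 3]));
        if (isHexEscape)
        {
            out += static_cast<char>(std::stoi(in.substr(i + 2, 2), nullptr, 16));
            i += 3; // together with ++i, skips '\', 'x' and both hex digits
        }
        else
        {
            out += in[i];
        }
    }
    return out;
}
```

In the callback this would be `std::string decoded = decodeHexEscapes(info->some_char_array);`, assuming the library null-terminates the 65-byte array. On a console that interprets output as UTF-8, printing decoded should show İ-Ç-Ü-Ğ.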

C++ Boost:regex_search expression - Issue combining expressions to catch all sequences

I'm trying to write a template parser and need to pick up three distinct kinds of sequences for string replacement.
// Each of These Expressions Work Perfect Separately!
// All Sequences start with | pipe. Followed by
boost::regex expr {"(\\|[0-9]{2})"}; // 2 Digits only.
boost::regex expr {"(\\|[A-Z]{1,2}+[0-9]{1,2})"}; // 1 or 2 Uppercase Chars and 1 or 2 Digits.
boost::regex expr {"(\\|[A-Z]{2})(?!\\d)"}; // 2 Uppercase Chars with no following digits.
However, once I try to combine them into a single expression, I can't get it to catch all sequences. I must be missing something. Can anyone shed some light on what I'm missing?
Here is what I have so far:
// Each sequence is separated with a | for or between parenthesis.
boost::regex expr {"(\\|[0-9]{2})|(\\|[A-Z]{1,2}+[0-9]{1,2})|(\\|[A-Z]{2})(?!\\d)"};
I'm using the following string for testing; here is the code as well (probably a little more than needed).
#include <boost/regex.hpp>
#include <string>
#include <iostream>
int main()
{
std::string str = "|MC01 |U1 |s |A22 |12 |04 |2 |EW |SSAADASD |15";
boost::regex expr {"(\\|[0-9]{2})|(\\|[A-Z]{1,2}+[0-9]{1,2})|(\\|[A-Z]{2})(?!\\d)"};
boost::smatch matches;
std::string::const_iterator start = str.begin(), end = str.end();
while(boost::regex_search(start, end, matches, expr))
{
std::cout << "Matched Sub '" << matches.str()
<< "' preceded by '" << matches.prefix().str()
<< "' followed by '" << matches.suffix().str() << "'"
<< std::endl;
start = matches[0].second;
for(size_t s = 1; s < matches.size(); ++s)
{
std::cout << "+ Matched Sub " << matches[s].str()
<< " at offset " << matches[s].first - str.begin()
<< " of length " << matches[s].length()
<< std::endl;
}
}
}
I believe this is what you want:
const boost::regex expr {"(\\|[0-9]{2})|(\\|[A-Z]{1,2}+[0-9]{1,2})|(\\|[A-Z]{2})"}; // basically, remove the constraint on the last sub
I also suggest being explicit about the flags, both when constructing expr and when calling regex_search.
I also found that adding an extra check on matches[s].matched removes the sub-expressions that did not take part in the match, which were throwing me off.
for(size_t s = 1; s < matches.size(); ++s)
{
if (matches[s].matched) // Check for bool True/False
{
std::cout << "+ Matched Sub " << matches[s].str()
<< " at offset " << matches[s].first - str.begin()
<< " of length " << matches[s].length()
<< std::endl;
}
}
Without it, the unmatched sub-expressions were shown with an offset at the end of the string and a length of 0. I hope this helps anyone else who runs into this.
Another tip: inside the loop, the index s tells you which part of the expression matched. Since there are three alternatives, s is 1 when the first alternative matched (matches[1].matched is true), and 2 or 3 otherwise. Pretty nice!
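The same idea with the standard library, as a sketch: std::regex's default ECMAScript grammar has no possessive quantifiers, so "{1,2}+" is written as "{1,2}" here, and findTokens is a hypothetical name that records which alternative (1, 2 or 3) produced each match:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns (alternative number, matched text) for every token found.
std::vector<std::pair<int, std::string>> findTokens(const std::string& str)
{
    static const std::regex expr(R"((\|[0-9]{2})|(\|[A-Z]{1,2}[0-9]{1,2})|(\|[A-Z]{2}))");
    std::vector<std::pair<int, std::string>> found;
    for (std::sregex_iterator it(str.begin(), str.end(), expr), end; it != end; ++it)
    {
        const std::smatch& m = *it;
        for (std::size_t s = 1; s < m.size(); ++s)
        {
            if (m[s].matched) // only the alternative that took part in the match
            {
                found.emplace_back(static_cast<int>(s), m[s].str());
                break;
            }
        }
    }
    return found;
}
```

On the test string this finds eight tokens: |MC01, |U1 and |A22 via alternative 2, |12, |04 and |15 via alternative 1, and |EW plus |SS via alternative 3. Note that |SS is taken from inside |SSAADASD; |s and |2 are not matched at all.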

Anything like substr but instead of stopping at the byte you specified, it stops at a specific string [duplicate]

This question already has answers here:
How do you search a std::string for a substring in C++?
(6 answers)
Closed 8 years ago.
I have a client for a pre-existing server. Let's say I get some packets "MC123, 456!##".
I store these packets in a string called message. To print out a specific part of them, in this case the numbers, I would do something like "cout << message.substr(3, 7) << endl;".
But what if I receive another message "MC123, 456, 789!##". "cout << message.substr(3,7)" would only print out "123, 456", whereas I want "123, 456, 789". How would I do this assuming I know that every message ends with "!##".
First - Sketch out the indexing.
std::string packet1 = "MC123, 456!##";
// 0123456789012345678
// ^------^ desired text
std::string packet2 = "MC123, 456, 789!##";
// 0123456789012345678
// ^-----------^ desired text
The other answers are OK. If you wish to use std::string::find,
consider rfind and find_first_not_of, as in the following code:
#include <iostream>
#include <string>
// forward declaration
void messageShow(std::string packet,
size_t startIndx = 2);
// /////////////////////////////////////////////////////////////////////////////
int main (int, char** )
{
// 012345678901234567
// |
messageShow("MC123, 456!##");
messageShow("MC123, 456, 789!##");
messageShow("MC123, 456, 789, 987, 654!##");
// error test cases
messageShow("MC123, 456, 789##!"); // missing !##
messageShow("MC123x 456, 789!##"); // extraneous char in packet
return(0);
}
void messageShow(std::string packet,
size_t startIndx) // default value 2
{
static size_t seq = 0;
seq += 1;
std::cout << packet.size() << " packet" << seq << ": '"
<< packet << "'" << std::endl;
do
{
size_t bangAtPound_Indx = packet.rfind("!##");
if(bangAtPound_Indx == std::string::npos){ // not found, can't do anything more
std::cerr << " '!##' not found in packet " << seq << std::endl;
break;
}
size_t printLength = bangAtPound_Indx - startIndx;
const std::string DIGIT_SPACE = "0123456789, ";
size_t allDigitSpace = packet.find_first_not_of(DIGIT_SPACE, startIndx);
if(allDigitSpace != bangAtPound_Indx) {
std::cerr << " extraneous char found in packet " << seq << std::endl;
break; // something extraneous in string
}
std::cout << bangAtPound_Indx << " message" << seq << ": '"
<< packet.substr(startIndx, printLength) << "'" << std::endl;
}while(0);
std::cout << std::endl;
}
This outputs
13 packet1: 'MC123, 456!##'
10 message1: '123, 456'
18 packet2: 'MC123, 456, 789!##'
15 message2: '123, 456, 789'
28 packet3: 'MC123, 456, 789, 987, 654!##'
25 message3: '123, 456, 789, 987, 654'
18 packet4: 'MC123, 456, 789##!'
'!##' not found in packet 4
18 packet5: 'MC123x 456, 789!##'
extraneous char found in packet 5
Note: String indexes start at 0. The index of the digit '1' is 2.
The correct approach is to look for the existence/location of the "known termination" string, then take the substring up to (but not including) it.
Something like
std::string termination = "!#$";
std::size_t position = message.find(termination);
std::string importantBit = message.substr(0, position);
You could check the front of the string separately as well. Combining these, you could use regular expressions to make your code more robust, using a regex like
MC([0-9,]+)!#\$
This will return the bit between MC and !#$ but only if it consists entirely of numbers and commas. Obviously you can adapt this as needed.
UPDATE: you asked in your comment how to use the regular expression. Here is a very simple program. Note: this uses C++11, so make sure your compiler supports it.
#include <iostream>
#include <regex>
int main(void) {
std::string s ("ABC123,456,789!#$");
std::smatch m;
std::regex e ("ABC([0-9,]+)!#\\$"); // matches the kind of pattern you are looking for
if (std::regex_search (s,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
}
}
On my Mac, I can compile the above program with
clang++ -std=c++0x -stdlib=libc++ match.cpp -o match
If instead of just digits and commas you want "anything" in your expression (but it's still got fixed characters in front and behind) you can simply do
std::regex e ("ABC(.*)!#\\$");
Here, .* means "zero or more of 'anything'", but followed by !#$. The double backslash has to be there to "escape" the dollar sign, which has a special meaning in regular expressions (it means "the end of the string").
The more accurately your regular expression reflects exactly what you expect, the better you will be able to trap any errors. This is usually a very good thing in programming. "Always check your inputs".
One more thing: I just noticed you mentioned that you might have "more stuff" in your string. This is where regular expressions quickly become the best tool. You mentioned a string
MC123, 456!##*USRChester.
and wanted to extract 123, 456 and Chester. That is - stuff between MC and !#$, and more stuff after USR (if that is even there). Here is the code that shows how that is done:
#include <iostream>
#include <regex>
int main(void) {
std::string s1 ("MC123, 456!#$");
std::string s2 ("MC123, 456!#$USRChester");
std::smatch m;
std::regex e ("MC([0-9, ]+)!#\\$(?:USR)?(.*)$"); // matches the kind of pattern you are looking for
if (std::regex_search (s1,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
}
if (std::regex_search (s2,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
if (m[2].length() > 0) {
std::cout << m[2] << ": " << m[1] << std::endl;
}
}
}
Output:
match[0] = MC123, 456!#$
match[1] = 123, 456
match[2] =
match[0] = MC123, 456!#$USRChester
match[1] = 123, 456
match[2] = Chester
Chester: 123, 456
The matches are:
match[0] : "everything in the input string that was consumed by the Regex"
match[1] : "the thing in the first set of parentheses"
match[2] : "The thing in the second set of parentheses"
Note the use of the slightly tricky (?:USR)? expression. It says "this might be followed by the characters USR (that's the trailing ?, which makes the group optional); if it is, skip them without capturing (that's the ?: part, which makes the group non-capturing) and match what follows".
As you can see, simply testing whether m[2] is empty will tell you whether you have just numbers, or number plus "the thing after the USR". I hope this gives you an inkling of the power of regular expressions for chomping through strings like yours.
If you are sure about the ending of the message, message.substr(3, message.size()-6) will do the trick.
However, it is good practice to check everything, just to avoid surprises.
Something like this:
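#include <stdexcept>
#include <string>
std::string extractData(const std::string& message)
{
if (message.size() < 6)
throw std::runtime_error("message too short");
if (message.substr(0, 3) != "MCX") // the exact numbers do not match in your example, but you get the point...
throw std::runtime_error("unexpected prefix");
if (message.substr(message.size() - 3) != "!##")
throw std::runtime_error("missing !## terminator");
return message.substr(3, message.size() - 6);
}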
Just calculate the offset first.
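#include <cassert>
#include <string>
std::string extract(const std::string& str)
{
std::size_t start = 3;
std::size_t end = str.find("!##");
assert(end != std::string::npos); // the terminator must be present
return str.substr(start, end - start);
}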
You can get the index of "!##" by using:
message.find("!##")
Then use that answer instead of 7. You should also check for it equalling std::string::npos which indicates that the substring was not found, and take some different action.
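Put together, that might look like the following sketch. extractPayload is a hypothetical name, and it assumes a fixed two-character prefix ("MC"); note that substr's second argument is a length, so it is the terminator's position minus the start index:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copies the text between a fixed-length prefix
// (default: the 2 characters "MC") and the "!##" terminator.
// Returns false if the terminator is missing or sits inside the prefix.
bool extractPayload(const std::string& message, std::string& payload, std::size_t start = 2)
{
    const std::size_t end = message.find("!##");
    if (end == std::string::npos || end < start)
        return false;
    payload = message.substr(start, end - start); // length, not end index
    return true;
}
```

For "MC123, 456, 789!##" the terminator is found at index 15, so the payload is the 13 characters "123, 456, 789".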
std::string msg = "MC4,512,541,3123!##";
// print from index 2 up to, but not including, the "!##" terminator
for (std::size_t i = 2; i + 2 < msg.length() && msg.compare(i, 3, "!##") != 0; ++i) {
std::cout << msg[i];
}
or use char[]
char msg[] = "MC4,123,54!##";
std::size_t len = sizeof(msg) - 1; // instead of msg.length()
// -1 for the null byte at the end (each char takes 1 byte, so sizeof(msg) - 1 == number of chars)
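For the char[] case, the terminator can also be located directly with std::strstr, which avoids length bookkeeping altogether. This is a sketch; payloadFromCharArray is a hypothetical name and assumes a null-terminated array with a two-character prefix:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical char-array variant: std::strstr locates the "!##" terminator.
// Returns an empty string if the terminator is missing or inside the prefix.
std::string payloadFromCharArray(const char* msg, std::size_t start = 2)
{
    if (std::strlen(msg) < start)
        return std::string();
    const char* term = std::strstr(msg, "!##");
    if (term == nullptr || term < msg + start)
        return std::string();
    return std::string(msg + start, term); // characters in [msg + start, term)
}
```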

Retrieving a regex search in C++

Hello, I am new to regular expressions, and from what I understood from the C++ reference website, it is possible to get match results.
My question is: how do I retrieve these results? What is the difference between smatch and cmatch? For example, I have a string consisting of date and time and this is the regular expression I wrote:
"(1[0-2]|0?[1-9])([:][0-5][0-9])?(am|pm)"
Now when I do a regex_search with the string and the above expression, I can find out whether there is a time in the string or not. But I want to store that time in a structure so I can separate hours and minutes. I am using Visual Studio 2010 (C++).
If you use e.g. std::regex_search, it fills in a std::match_results object, whose operator[] gives you the matched strings.
Edit: Example program:
#include <iostream>
#include <string>
#include <regex>
void test_regex_search(const std::string& input)
{
std::regex rgx("((1[0-2])|(0?[1-9])):([0-5][0-9])((am)|(pm))");
std::smatch match;
if (std::regex_search(input.begin(), input.end(), match, rgx))
{
std::cout << "Match\n";
//for (auto m : match)
// std::cout << " submatch " << m << '\n';
std::cout << "match[1] = " << match[1] << '\n';
std::cout << "match[4] = " << match[4] << '\n';
std::cout << "match[5] = " << match[5] << '\n';
}
else
std::cout << "No match\n";
}
int main()
{
const std::string time1 = "9:45pm";
const std::string time2 = "11:53am";
test_regex_search(time1);
test_regex_search(time2);
}
Output from the program:
Match
match[1] = 9
match[4] = 45
match[5] = pm
Match
match[1] = 11
match[4] = 53
match[5] = am
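To store the result in a structure, as the question asks, the sub-matches can be converted with std::stoi. The following sketch uses a hypothetical TimeOfDay struct and findTime helper; the pattern is the question's, with the ":mm" part turned into a non-capturing optional group so the minutes land in group 2:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical struct holding the separated parts of a time like "9:45pm".
struct TimeOfDay
{
    int hour = 0;
    int minute = 0;
    bool pm = false;
};

// Fills 'out' from the first time found in 'input'; returns false if none.
bool findTime(const std::string& input, TimeOfDay& out)
{
    static const std::regex rgx("(1[0-2]|0?[1-9])(?::([0-5][0-9]))?(am|pm)");
    std::smatch m;
    if (!std::regex_search(input, m, rgx))
        return false;
    out.hour = std::stoi(m[1].str());
    out.minute = m[2].matched ? std::stoi(m[2].str()) : 0; // minutes are optional
    out.pm = (m[3].str() == "pm");
    return true;
}
```

Testing m[2].matched distinguishes "11am" (no minutes) from "11:00am".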
Just use named groups.
(?<hour>(1[0-2]|0?[1-9]))([:](?<minute>[0-5][0-9]))?(am|pm)
OK, VS2010 doesn't support named groups (std::regex's ECMAScript grammar has no named groups in any version). You are already using unnamed capture groups; just go through them by index.
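Neither answer covers the smatch/cmatch part of the question. Both are typedefs of std::match_results and differ only in the iterator type they store, which must match the kind of input being searched. A small sketch (hourFromString and hourFromCString are hypothetical names):

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <type_traits>

// std::smatch stores std::string iterators; std::cmatch stores const char* pointers.
static_assert(std::is_same<std::smatch, std::match_results<std::string::const_iterator>>::value,
              "smatch is for searching std::string");
static_assert(std::is_same<std::cmatch, std::match_results<const char*>>::value,
              "cmatch is for searching C strings");

std::string hourFromString(const std::string& s)
{
    static const std::regex rgx("(1[0-2]|0?[1-9]):([0-5][0-9])(am|pm)");
    std::smatch m; // iterators into s
    return std::regex_search(s, m, rgx) ? m[1].str() : std::string();
}

std::string hourFromCString(const char* s)
{
    static const std::regex rgx("(1[0-2]|0?[1-9]):([0-5][0-9])(am|pm)");
    std::cmatch m; // pointers into s
    return std::regex_search(s, m, rgx) ? m[1].str() : std::string();
}
```

Because both kinds of results point into the searched text, that text must outlive the match object; since C++14 the regex_search overload taking a temporary std::string is deleted for this reason.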