Advice on converting timestamp string in "HH:MM:SS.microseconds" format - c++

I'm given a list of timestamps (suppose we have a ready-made std::vector<std::string>) in string form, e.g. std::vector<std::string> v = {"12:27:37.740002", "19:37:17.314002", "20:00:07.140902",...}. No dates, no timezones. What would be a preferable way to parse these strings into some C++ type (std::chrono::time_point?) so that I can compare and sort them later?
For example: compare the value parsed from "20:00:07.140902" with the value parsed from "20:00:07.000000".
C++17 is ok, but I can't use any third-party library (Boost, Date etc).
Keeping microsecond precision is essential.

You can build this functionality completely with the C++ standard library.
For parsing the string, use std::regex.
For time-related data types, use std::chrono.
Example:
#include <stdexcept>
#include <regex>
#include <chrono>
#include <iostream>
#include <string>
auto parse_to_timepoint(const std::string& input)
{
// setup a regular expression to parse the input string
// https://regex101.com/
// each part between () is a group and will end up in the match
// [01][0-9]|2[0-3] will match the hours 00 to 23
// [0-9]{6} will match exactly 6 digits
static const std::regex rx{ "([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])\\.([0-9]{6})" };
std::smatch match;
// regex_match requires the whole string to match, so trailing garbage is rejected
if (!std::regex_match(input, match, rx))
{
throw std::invalid_argument("input string is not a valid time string");
}
// convert each matched group to the corresponding value
// note match[0] is the complete matched string by the regular expression
// we only need the groups which start at index 1
const auto hours = std::stoul(match[1]);
const auto minutes = std::stoul(match[2]);
const auto seconds = std::stoul(match[3]);
const auto microseconds = std::stoul(match[4]);
// build up a duration
std::chrono::high_resolution_clock::duration duration{};
duration += std::chrono::hours(hours);
duration += std::chrono::minutes(minutes);
duration += std::chrono::seconds(seconds);
duration += std::chrono::microseconds(microseconds);
// then return a time_point (note this will not help you with correctly handling day boundaries)
// since there is no date in the input string
return std::chrono::high_resolution_clock::time_point{ duration };
}
int main()
{
std::string input1{ "20:00:07.140902" };
std::string input2{ "20:00:07.000000" };
auto tp1 = parse_to_timepoint(input1);
auto tp2 = parse_to_timepoint(input2);
std::cout << "start time = " << ((tp1 < tp2) ? input1 : input2) << "\n";
std::cout << "end time = " << ((tp1 >= tp2) ? input1 : input2) << "\n";
return 0;
}
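Since the question also asks about sorting, here is a minimal sketch of a regex-free variant that parses into std::chrono::microseconds (a duration since midnight) and sorts a whole vector. The names parse_time_of_day and parse_and_sort are hypothetical helpers, not part of the answer above, and the sketch assumes the input always has the fixed 15-character layout HH:MM:SS.ffffff.

```cpp
#include <algorithm>
#include <cctype>
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parses "HH:MM:SS.ffffff" into a duration since midnight.
std::chrono::microseconds parse_time_of_day(const std::string& s)
{
    static const std::string layout = "dd:dd:dd.dddddd"; // 'd' stands for one digit
    bool ok = s.size() == layout.size();
    for (std::size_t i = 0; ok && i < s.size(); ++i) {
        ok = (layout[i] == 'd') ? std::isdigit(static_cast<unsigned char>(s[i])) != 0
                                : s[i] == layout[i];
    }
    if (!ok) throw std::invalid_argument("expected HH:MM:SS.ffffff, got: " + s);

    // convert one fixed-width digit field to a number
    const auto field = [&s](std::size_t pos, std::size_t len) {
        return std::stol(s.substr(pos, len));
    };
    const long h = field(0, 2), m = field(3, 2), sec = field(6, 2), us = field(9, 6);
    if (h > 23 || m > 59 || sec > 59)
        throw std::invalid_argument("time field out of range: " + s);

    return std::chrono::hours(h) + std::chrono::minutes(m)
         + std::chrono::seconds(sec) + std::chrono::microseconds(us);
}

// Hypothetical helper: parses every timestamp and returns them in ascending order.
std::vector<std::chrono::microseconds> parse_and_sort(const std::vector<std::string>& stamps)
{
    std::vector<std::chrono::microseconds> out;
    out.reserve(stamps.size());
    for (const auto& s : stamps) out.push_back(parse_time_of_day(s));
    std::sort(out.begin(), out.end());
    return out;
}
```

A plain duration is used here instead of a time_point because the input carries no date; durations compare and sort with the usual operators.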

I don't see why this shouldn't work: use std::chrono::from_stream (C++20) to parse the string into a time point, then just compare the two time points.
However, I tried it with Visual Studio 2022 17.0.2 (Community Edition) and it fails to parse the string into a time point.
There is an answer from Ted Lyngmo about a bug (fixed in VS2022 17.0.3) when parsing seconds with subseconds. I have to say, though, that his solution didn't work for me either in my VS2022.
Anyway, you may want to give it a try.
#include <chrono>
#include <iomanip> // boolalpha
#include <iostream> // cout
#include <sstream> // istringstream
#include <string>
auto parse_string_to_tp(const std::string& str)
{
std::istringstream iss{ str };
std::chrono::sys_time<std::chrono::microseconds> tp{};
std::chrono::from_stream(iss, "%H:%M:%S", tp); // or simply "%T"
return tp;
}
int main()
{
const std::string str1{ "12:27:37.740002" };
const std::string str2{ "13:00:00.500000" };
auto tp1{ parse_string_to_tp(str1) };
auto tp2{ parse_string_to_tp(str2) };
std::cout << "tp1 < tp2: " << std::boolalpha << (tp1 < tp2) << "\n";
std::cout << "tp2 < tp1: " << std::boolalpha << (tp2 < tp1) << "\n";
}
EDIT: it works if you just use durations instead of time points:
#include <chrono>
#include <iomanip> // boolalpha
#include <iostream> // cout
#include <sstream> // istringstream
#include <string>
auto parse_string_to_duration(const std::string& str)
{
std::istringstream iss{ str };
std::chrono::microseconds d{};
std::chrono::from_stream(iss, "%T", d);
return d;
}
int main()
{
const std::string str1{ "12:27:37.740002" };
const std::string str2{ "23:39:48.500000" };
auto d1{ parse_string_to_duration(str1) };
auto d2{ parse_string_to_duration(str2) };
std::cout << "d1 < d2: " << std::boolalpha << (d1 < d2) << "\n";
std::cout << "d2 < d1: " << std::boolalpha << (d2 < d1) << "\n";
}

Related

Why is my string extraction function using back referencing in regex not working as intended?

Extraction Function
string extractStr(string str, string regExpStr) {
regex regexp(regExpStr);
smatch m;
regex_search(str, m, regexp);
string result = "";
for (string x : m)
result = result + x;
return result;
}
The Main Code
#include <iostream>
#include <regex>
using namespace std;
string extractStr(string, string);
int main(void) {
string test = "(1+1)*(n+n)";
cout << extractStr(test, "n\\+n") << endl;
cout << extractStr(test, "(\\d)\\+\\1") << endl;
cout << extractStr(test, "([a-zA-Z])[+-/*]\\1") << endl;
cout << extractStr(test, "([a-zA-Z])[+-/*]([a-zA-Z])") << endl;
return 0;
}
The Output
String = (1+1)*(n+n)
n\+n = n+n
(\d)\+\1 = 1+11
([a-zA-Z])[+-/*]\1 = n+nn
([a-zA-Z])[+-/*]([a-zA-Z]) = n+nnn
If anyone could kindly point the error I've done or point me to a similar question in SO that I've missed while searching, it would be greatly appreciated.
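std::regex_search finds one match at a time, and the resulting std::smatch holds the whole match at index 0 followed by one entry per capture group. Your loop concatenates every entry, so the captured group is appended to the full match ("1+1" followed by "1" gives "1+11"). Use m[0] if you only want the matched text, and call regex_search in a loop if you want all matches. I also have some C++ tips in here (constness and references).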
#include <cassert>
#include <iostream>
#include <sstream>
#include <regex>
#include <string>
// using namespace std; don't do this!
// https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice
// pass strings by const reference
// 1. const, you promise not to change them in this function
// 2. by reference, you avoid making copies
std::string extractStr(const std::string& str, const std::string& regExpStr)
{
std::regex regexp(regExpStr);
std::smatch m;
std::ostringstream os; // streams are more efficient for building up strings
auto begin = str.cbegin();
bool comma = false;
// regex_search finds one match per call, so you need to loop to collect all matches
while (std::regex_search(begin, str.end(), m, regexp))
{
if (comma) os << ", ";
os << m[0];
comma = true;
begin = m.suffix().first;
}
return os.str();
}
// small helper function to produce nicer output for your tests.
void test(const std::string& input, const std::string& regex, const std::string& expected)
{
auto output = extractStr(input, regex);
if (output == expected)
{
std::cout << "test succeeded : output = " << output << "\n";
}
else
{
std::cout << "test failed : output = " << output << ", expected : " << expected << "\n";
}
}
int main(void)
{
std::string input = "(1+1)*(n+n)";
test(input, "n\\+n", "n+n");
test(input, "(\\d)\\+\\1", "1+1");
test(input, "([a-zA-Z])[+-/*]\\1", "n+n");
return 0;
}
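To make the difference between the full match and the capture groups concrete, here is a small sketch. Both first_match and all_submatches are hypothetical helpers written for illustration; the second one deliberately mirrors the loop from the question. The sketch also shows a side issue that is easy to miss: inside a character class, `+-/` is a range (from '+' to '/'), so `[+-/*]` also accepts ',' and '.'. Putting the '-' first avoids that.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns only the text of the first match (sub-match 0).
std::string first_match(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m[0].str() : std::string{};
}

// Hypothetical helper mirroring the question's loop: joins m[0], m[1], m[2], ...
// so every capture group is appended after the full match.
std::string all_submatches(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    std::regex_search(text, m, re);
    std::string joined;
    for (const auto& sub : m) joined += sub.str();
    return joined;
}
```

With the input "(1+1)*(n+n)" and the pattern "(\\d)\\+\\1", first_match gives the matched text alone, while all_submatches reproduces the doubled tail seen in the question's output.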

How can I find the positions from characters in a string with string::find?

I need the positions of characters in a string.
The String contains:
"username":"secret", "password":"also secret", "id":"secret too", "token":"secret"
and I need the positions of the quotation marks from the token that are bold: "token":"secret".
I have experimented with the code from http://www.cplusplus.com/reference/string/string/find
but nothing worked. Can anyone help me?
Here is what I have tried, but it only prints 0:
#include <iostream>
#include <string>
int main() {
std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
size_t found = buffer.find('"');
if (found == std::string::npos)std::cout << "something went wrong\n";
if (found != std::string::npos)
std::cout << "first " << '"' << " found at: " << found << '\n';
for (int j = 0; j <= 17; ++j) {
found = buffer.find('"');
found + 1, 6;
if (found != std::string::npos)
std::cout << "second " << '"' << " found at : " << found << '\n';
}
return 0;
}
There are so many possible solutions. So, it is hard to answer.
What basically needs to be done, is to iterate through the string, position by position, then check if the character is the searched one, and then do something with the result.
A first simple implementation could be:
#include <iostream>
#include <string>
const std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
int main() {
for (size_t position{}, counter{}; position < buffer.length(); ++position) {
if (buffer[position] == '\"') {
++counter;
std::cout << "Character \" number " << counter << " found at position " << position << '\n';
}
}
return 0;
}
But your question was about the usage of std::string::find(). In your implementation, you always start the search at the beginning of the std::string. Because of that, you always find the same " at position 0.
Solution: after you have found the first match, pass the resulting position (incremented by one) as the second parameter to std::string::find(). The search then starts after the first " found and hence finds the next one. All of this can be done in a normal for-loop.
See the next simple example below:
#include <iostream>
#include <string>
const std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
int main() {
for (size_t position{}, counter{1}; std::string::npos != (position = buffer.find("\"", position)); ++position, ++counter) {
std::cout << "Character \" number " << counter << " found at position " << position << '\n';
}
return 0;
}
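Applied to the original goal (the two quotation marks around the value of "token"), the same find-with-start-position idea could look like the sketch below. find_all and value_quotes are hypothetical helper names, and the sketch assumes the simple "key":"value" layout from the question with no escaped quotes inside keys or values.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: positions of every occurrence of ch in text.
std::vector<std::size_t> find_all(const std::string& text, char ch)
{
    std::vector<std::size_t> positions;
    // each search restarts one character after the previous hit
    for (std::size_t pos = text.find(ch); pos != std::string::npos; pos = text.find(ch, pos + 1))
        positions.push_back(pos);
    return positions;
}

// Hypothetical helper: positions of the opening and closing quote of the value
// that belongs to key; both are std::string::npos if the key is not present.
std::pair<std::size_t, std::size_t> value_quotes(const std::string& text, const std::string& key)
{
    const std::string marker = "\"" + key + "\":\""; // e.g. "token":"
    const std::size_t start = text.find(marker);
    if (start == std::string::npos)
        return { std::string::npos, std::string::npos };
    const std::size_t open = start + marker.size() - 1; // last char of marker is the opening quote
    const std::size_t close = text.find('"', open + 1); // next quote closes the value
    return { open, close };
}
```

Calling value_quotes(buffer, "token") on the buffer from the question would return the two positions asked for; for real JSON input a proper parser remains the safer choice.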
There are more solutions, depending on what you really want to do. You could extract all keywords and data with a simple regex.
Something like this:
#include <iostream>
#include <string>
#include <regex>
#include <vector>
const std::regex re{ R"(\"([ a-zA-Z0-9]+)\")" };
const std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
int main() {
std::vector part(std::sregex_token_iterator(buffer.begin(), buffer.end(), re, 1), {});
std::cout << part[7] << '\n';
return 0;
}
Or, you can split everything into tokens and values. Like this:
#include <iostream>
#include <string>
#include <regex>
#include <vector>
#include <map>
#include <iomanip>
const std::regex re1{ "," };
const std::regex re2{ R"(\"([^\"]+)\")" };
const std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
int main() {
std::vector<std::string> block(std::sregex_token_iterator(buffer.begin(), buffer.end(), re1, -1), {});
std::map<std::string, std::string> entry{};
for (const auto& b : block) {
std::vector blockPart(std::sregex_token_iterator(b.begin(), b.end(), re2, 1), {});
entry[blockPart[0]] = blockPart[1];
}
for (const auto& [token, value] : entry)
std::cout << std::setw(20) << token << " --> " << value << '\n';
return 0;
}
But if you have a complex given format, like JSON, there are so many special cases that the only meaningful approach is to use an existing library.

How to parse two strings using boost::spirit?

I am still trying to wrap my head around Boost::Spirit.
I want to parse two words into a variable. When I can do that, into a struct.
The single word compiles, the Variable doesn't. Why?
#include <boost/spirit/include/qi.hpp>
#include <boost/tuple/tuple.hpp>
#include <string>
#include <iostream>
using namespace boost::spirit;
/*
class Syntax : public qi::parser{
};
*/
int main()
{
//get user input
std::string input;
std::getline(std::cin, input);
auto it = input.begin();
bool result;
//define grammar for a single word
auto word_grammar = +qi::alnum - qi::space;
std::string singleWord;
result = qi::parse(
it, input.end(),
word_grammar,
singleWord
);
if(!result){
std::cout << "Failed to parse a word" << '\n';
return -1;
}
std::cout << "\"" << singleWord << "\"" << '\n';
//Now parse two words into a variable
std::cout << "Variable:\n";
typedef boost::tuple<std::string, std::string> Variable;
Variable variable;
auto variable_grammar = word_grammar >> word_grammar;
result = qi::parse(
it, input.end(),
variable_grammar,
variable
);
if(!result){
std::cout << "Failed to parse a variable" << '\n';
return -1;
}
std::cout << "\"" << variable.get<0>() << "\" \"" << variable.get<1>() << "\"" << '\n';
//now parse a list of variables
std::cout << "List of Variables:\n";
std::list<Variable> variables;
result = qi::parse(
it, input.end(),
variable_grammar % +qi::space,
variable
);
if(!result){
std::cout << "Failed to parse a list of variables" << '\n';
return -1;
}
for(auto var : variables)
std::cout << "DataType: " << var.get<0>() << ", VariableName: " << var.get<1>() << '\n';
}
In the end I want to parse something like this:
int a
float b
string name
Templates are nice, but when problems occur the error messages are just not human readable (thus no point in posting them here).
I am using gcc.
Sorry to take so long. I've been building a new web server in a hurry and had much to learn.
Here is what it looks like in X3. I think it is easier to deal with than qi. And then, I've used it a lot more. But then qi is much more mature, richer. That said, x3 is meant to be adaptable, hackable. So you can make it do just about anything you want.
So, live on coliru
#include <string>
#include <iostream>
#include <vector>
#include <boost/spirit/home/x3.hpp>
#include <boost/tuple/tuple.hpp>
//as pointed out, for the error 'The parser expects tuple-like attribute type'
#include <boost/fusion/adapted/boost_tuple.hpp>
//our declarations
using Variable = boost::tuple<std::string, std::string>;
using Vector = std::vector<Variable>;
namespace parsers {
using namespace boost::spirit::x3;
auto const word = lexeme[+char_("a-zA-Z")];
//note, using 'space' as the stock skipper
auto const tuple = word >> word;
}
std::ostream& operator << (std::ostream& os, /*const*/ Variable& obj) {
return os << obj.get<0>() << ' ' << obj.get<1>();
}
std::ostream& operator << (std::ostream& os, /*const*/ Vector& obj) {
for (auto& item : obj)
os << item << " : ";
return os;
}
template<typename P, typename A>
bool test_parse(std::string in, P parser, A& attr) {
auto begin(in.begin());
bool r = phrase_parse(begin, in.end(), parser, boost::spirit::x3::space, attr);
std::cout << "result:\n " << attr << std::endl;
return r;
}
int main()
{
//not recommended but this is testing stuff
using namespace boost::spirit::x3;
using namespace parsers;
std::string input("first second third forth");
//parse one word
std::string singleWord;
test_parse(input, word, singleWord);
//parse two words into a variable
Variable variable;
test_parse(input, tuple, variable);
//parse two sets of two words
Vector vector;
test_parse(input, *tuple, vector);
}
You may like this form of testing. You can concentrate on testing parsers without a lot of extra code. It makes it easier down the road to keep your basic parsers in their own namespace. Oh yeah, x3 compiles much faster than qi!
The single word compiles, the Variable doesn't. Why?
Two #includes are missing:
#include <boost/fusion/adapted/boost_tuple.hpp>
#include <boost/spirit/include/qi_list.hpp>

boost::spirit::x3 phrase_parse doing arithmetic operations before pushing into vector

I'm working on a project for my university studies. My goal is to read double numbers from a large file (2.6 GB) into a double vector.
I am working with the Boost Spirit X3 library with mmap. I found some code on the net, https://github.com/sehe/bench_float_parsing, which I am using.
Before pushing these double values into the vector, I would like to do some arithmetic operations on them. This is where I'm stuck. How can I apply arithmetic operations to the double values before pushing them?
template <typename source_it>
size_t x3_phrase_parse<data::float_vector, source_it>::parse(source_it f, source_it l, data::float_vector& data) const {
using namespace x3;
bool ok = phrase_parse(f, l, *double_ % eol, space, data);
if (ok)
std::cout << "parse success\n";
else
std::cerr << "parse failed: '" << std::string(f, l) << "'\n";
if (f != l) std::cerr << "trailing unparsed: '" << std::string(f, l) << "'\n";
std::cout << "data.size(): " << data.size() << "\n";
return data.size();
}
I am sorry not to answer your question exactly, but Boost Spirit is not the appropriate tool here. Spirit is a parser generator (which, as a subset, of course also does lexical analysis). So it is one level too high in the Chomsky hierarchy of languages. You do not need a parser but regular expressions: std::regex.
A double can easily be found with a regular expression. In the attached code, I created a simple pattern for a double, and a regex can be used to search for it.
So, we will read from an istream (which can be a file, a stringstream, console input or whatever). We will read line by line until the whole input is consumed.
For each line, we will check whether the input matches the expected pattern, being 1 double.
Then we read this double, do some calculations and push it into the vector.
Please see the following very simple code.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <regex>
#include <vector>
std::istringstream input{R"(0.0
1.5
2.0
3.0
4.0
-5.0
)"};
using VectorDouble = std::vector<double>;
// at least one digit is required, so an empty or non-numeric line does not match
const std::regex reDouble{R"(([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)))"};
std::istream& get(std::istream& is, VectorDouble& dd)
{
// Reset vector to empty before reading
dd.clear();
//Read all data from istream
std::string line{};
while (getline(is, line)) {
// Search for a double
std::smatch sm;
if (std::regex_search(line, sm, reDouble)) {
// Convert found strings to double
double d1{std::stod(sm[1])};
// Do some calculations
d1 = d1 + 10.0;
// Push back into vector
dd.emplace_back(d1);
}
else
std::cerr << "Error found in line: " << line << "\n";
}
return is;
}
int main()
{
// Define vector and fill it
VectorDouble dd{};
(void)get(input, dd);
// Some debug output
for (double& d : dd) {
std::cout << d << "\n";
}
return 0;
}
Why not use semantic actions to perform the arithmetic operations, as in the following code?
#include <iostream>
#include <sstream>
#include <string>
#include <cstdio>
#include <stdexcept>
#include <vector>
using VectorDouble = std::vector<double>;
void show( VectorDouble const& dd)
{
std::cout<<"vector result=\n";
for (double const& d : dd) {
std::cout << d << "\n";
}
}
auto arith_ops=[](double&x){ x+=10.0;};
std::string input_err_yes{R"(0.0
1.5
2.0xxx
not double
4.0
-5.0
)"};
std::string input_err_not{R"(0.0
1.5
2.0
3.0
4.0
-5.0
)"};
void stod_error_recov(std::string const&input)
//Use this for graceful error recovery in case input has syntax errors.
{
std::cout<<__func__<<":\n";
VectorDouble dd;
std::istringstream is(input);
std::string line{};
while (getline(is, line) ) {
try {
std::size_t eod;
double d1(std::stod(line,&eod));
arith_ops(d1);
dd.emplace_back(d1);
auto const eol=line.size();
if(eod!=eol) {
std::cerr << "Warning: trailing chars after double in line: "<< line << "\n";
}
}
catch (const std::invalid_argument&) {
if(!is.eof())
std::cerr << "Error: found in line: " << line << "\n";
}
}
show(dd);
}
void stod_error_break(std::string const&input)
//Use this if input is sure to have correct syntax.
{
std::cout<<__func__<<":\n";
VectorDouble dd;
char const*d=input.data();
while(true) {
try {
std::size_t eod;
double d1(std::stod(d,&eod));
d+=eod;
arith_ops(d1);
dd.emplace_back(d1);
}
catch (const std::invalid_argument&) {
//Either syntax error
//Or end of input.
break;
}
}
show(dd);
}
#include <boost/spirit/home/x3.hpp>
void x3_error_break(std::string const&input)
//boost::spirit::x3 method.
{
std::cout<<__func__<<":\n";
VectorDouble dd;
auto f=input.begin();
auto l=input.end();
using namespace boost::spirit::x3;
auto arith_action=[](auto&ctx)
{ arith_ops(_attr(ctx));
};
phrase_parse(f, l, double_[arith_action] % eol, blank, dd);
show(dd);
}
int main()
{
//stod_error_recov(input_err_yes);
//stod_error_break(input_err_not);
x3_error_break(input_err_not);
return 0;
}
The stod_* functions, unlike Armin's, don't need regex because std::stod does the parsing and, because they don't use regex, they probably run a bit faster.
There are 2 stod_* functions shown, with in-source comments indicating which should be used.
For completeness, a third function using boost::spirit::x3 is shown. IMHO, its readability is better than the others; however, it would probably take more time to compile.
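One detail of stod_error_break is worth noting: std::stod takes a std::string, so passing a const char* builds a temporary string from the remaining input on every call. A single pass with std::strtod avoids that. The following is only a sketch under stated assumptions: parse_doubles is a hypothetical helper, and the input must be NUL-terminated (true for std::string, not automatically true for a memory-mapped file).

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: parses whitespace-separated doubles in one pass and
// applies transform to each value before it is stored.
template <typename Transform>
std::vector<double> parse_doubles(const std::string& text, Transform transform)
{
    std::vector<double> values;
    const char* cur = text.c_str(); // NUL-terminated, as std::strtod requires
    char* end = nullptr;
    for (;;) {
        const double d = std::strtod(cur, &end); // skips leading whitespace, including newlines
        if (end == cur) break;                   // nothing converted: end of input or syntax error
        values.push_back(transform(d));
        cur = end;                               // continue right after the parsed number
    }
    return values;
}
```

Usage would be along the lines of parse_doubles(input_err_not, [](double x) { return x + 10.0; }), which performs the same arithmetic as arith_ops before each push.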

Parse RFC3339/ISO 8601 timestamp in Boost

How do I parse a RFC3339 timestamp ("1985-04-12T23:20:50.52Z") (i.e. a subset of ISO8601) in C++03? I'm using Boost, but none of the Boost datetime libraries seem to include a function to do this.
The type of the actual time object doesn't matter, as long as I can easily compare it to 'now'. I only care about timestamps in the UTC timezone.
This approach has limitations in how it parses the time zone.
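#include <cstdlib>
#include <ctime>
#include <sstream>
#include <iostream>
#include <string>
#include <iomanip>
int main() {
std::tm t = {};
std::string s = "2016-01-02T15:04:05+09:00";
int tz_offset = 0; // offset from UTC in seconds
// search only after the 'T' so the dashes of the date are not mistaken for a zone sign
auto pos = s.find_first_of("+-Z", s.find('T'));
if (pos != s.npos) {
std::string zone = s.substr(pos);
if (zone.size() >= 6) { // "+HH:MM" or "-HH:MM"; a plain "Z" keeps the offset at 0
int sign = (zone[0] == '-') ? -1 : 1;
tz_offset = sign * (std::atoi(zone.substr(1, 2).c_str()) * 3600 + std::atoi(zone.substr(4, 2).c_str()) * 60);
}
s.resize(pos);
}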
std::stringstream ss(s);
ss >> std::get_time(&t, "%Y-%m-%dT%H:%M:%S");
if (ss.fail()) {
std::cout << "Parse failed\n";
} else {
std::time_t l = std::mktime(&t);
std::tm tm_utc(*std::gmtime(&l));
std::time_t utc = std::mktime(&tm_utc);
tz_offset += (utc - l);
l = std::mktime(&t) - tz_offset;
t = *std::gmtime(&l);
std::cout << std::put_time(&t, "%c") << std::endl;
}
}
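If depending on the local time zone through mktime is undesirable, the epoch value can also be computed with plain integer arithmetic. The sketch below uses the well-known days-from-civil calculation for the proleptic Gregorian calendar; days_from_civil and rfc3339_to_epoch are hypothetical helper names, fractional seconds are skipped rather than kept, and field ranges are not validated. It is written in C++11 style like the answer above, so it would need adapting for strict C++03.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Days between 1970-01-01 and the given civil date (proleptic Gregorian calendar).
long long days_from_civil(long long y, unsigned m, unsigned d)
{
    y -= m <= 2; // treat January and February as months 11 and 12 of the previous year
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // year of era [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // day of year, March based
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // day of era
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Hypothetical helper: RFC 3339 timestamp -> whole seconds since 1970-01-01T00:00:00Z.
long long rfc3339_to_epoch(const std::string& s)
{
    int y = 0, mo = 0, d = 0, h = 0, mi = 0, sec = 0, n = 0;
    if (std::sscanf(s.c_str(), "%4d-%2d-%2dT%2d:%2d:%2d%n", &y, &mo, &d, &h, &mi, &sec, &n) != 6)
        throw std::invalid_argument("bad timestamp: " + s);
    std::size_t pos = static_cast<std::size_t>(n);
    if (pos < s.size() && s[pos] == '.') { // skip optional fractional seconds
        ++pos;
        while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) ++pos;
    }
    long long offset = 0; // zone offset from UTC in seconds
    if (pos < s.size() && (s[pos] == '+' || s[pos] == '-')) {
        int oh = 0, om = 0;
        if (std::sscanf(s.c_str() + pos + 1, "%2d:%2d", &oh, &om) != 2)
            throw std::invalid_argument("bad zone offset: " + s);
        offset = (s[pos] == '-' ? -1 : 1) * (oh * 3600LL + om * 60LL);
    } else if (pos >= s.size() || s[pos] != 'Z') {
        throw std::invalid_argument("missing zone designator: " + s);
    }
    // local fields minus the zone offset give UTC
    return days_from_civil(y, static_cast<unsigned>(mo), static_cast<unsigned>(d)) * 86400LL
         + h * 3600LL + mi * 60LL + sec - offset;
}
```

The result can be compared directly against std::time(nullptr) on platforms where time_t counts seconds since the Unix epoch.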
You can do this without Boost, using just strptime.
Assuming the same adaptive_parser helper posted here: "Using boost parse datetime string: With single digit hour format"
Note: the samples are taken from the RFC link.
#include "adaptive_parser.h"
#include <string>
#include <iostream>
int main() {
using namespace mylib::datetime;
adaptive_parser parser;
for (std::string const input : {
"1985-04-12T23:20:50.52Z",
"1996-12-19T16:39:57-08:00",
"1990-12-31T23:59:60Z",
"1990-12-31T15:59:60-08:00",
"1937-01-01T12:00:27.87+00:20",
})
try {
std::cout << "Parsing '" << input << "'\n";
std::cout << " -> epoch " << parser(input).count() << "\n";
} catch(std::exception const& e) {
std::cout << "Exception: " << e.what() << "\n";
}
}
Prints
Parsing '1985-04-12T23:20:50.52Z'
-> epoch 482196050
Parsing '1996-12-19T16:39:57-08:00'
-> epoch 851042397
Parsing '1990-12-31T23:59:60Z'
-> epoch 662688000
Parsing '1990-12-31T15:59:60-08:00'
-> epoch 662688000
Parsing '1937-01-01T12:00:27.87+00:20'
-> epoch -1041335973
Of course, you can limit the number of accepted patterns for the subset you require.