Given a string s, what is the minimum length of a substring that has to be rearranged to make s a palindrome?
Example:
Input: abbaabbca
Output: 4
I can rearrange the substring from index 4 to 7 (abbc) and get abbacabba.
It is guaranteed that a palindrome can be obtained by rearranging.
Is there a way to solve it using modification of Manacher's or some other text algorithm?
Thanks.
I don't think this is a case for standard text-processing algorithms. The problem is simple enough that you don't need them: only one part of the string is reshuffled, so four situations can occur.
'ppssXXXXXXXpp'
'ppXXXXXsssspp'
'ppsssiiiXXXpp'
'ppXXXiiissspp'
where
pp is the outer part that is already palindromic (may be empty)
XX is the part we reshuffle
ss is the part we leave as it is (and reshuffle the XX to match it)
ii is the inner part around the center that is also already palindromic (may be empty)
We can check and clip the outer palindromic part first, leaving us with 'ssXXXXXXX', 'XXXXXssss', 'sssiiiXXX' or 'XXXiiisss'.
Then we use the symmetry: if the middle palindromic part exists, we can arbitrarily choose which side to keep and which to shuffle to match it, so we only need to check one direction.
When there is no middle palindromic part, we run the same check from both ends and choose the one that gives the shorter substring.
So, let's proceed from the start. We will simply take one character after the other
's--------'
'ss-------'
'sss------'
and stop when the rest of the string can no longer be rearranged to match the part we kept.
When does that happen? When the kept 'ssss...' part has already gobbled up more than half of all occurrences of some character: that character will then be missing on the other side, and no shuffling can make it match.
On the other hand, the scan can never get past the middle of the string, because by then it would have consumed more than half of some character's occurrences. So three situations can occur:
1) We stop short of the middle. In that case we have found the part to reshuffle: 'sssXXXXXXXXXXXX'.
2) We reach the middle. Then we can also search for the inner part that is already palindromic, yielding something like 'ssssiiiiXXXX'.
3) A special case: we reach the middle of an odd-length string. The character at the middle has to be the one with an odd count. If it is not, proceed as in 1).
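The stopping rule described above can be sketched in isolation. This is a minimal C++ illustration, not part of the Java answer: the helper name keepablePrefixLength is hypothetical, and it assumes the outer palindromic part has already been clipped and that the remaining string can be rearranged into a palindrome.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: returns how many leading characters of `s` can stay in
// place before some character would exceed half of its total occurrences.
std::size_t keepablePrefixLength(const std::string& s) {
    std::unordered_map<char, int> budget;   // per-character allowance for the kept prefix
    for (char c : s) ++budget[c];
    for (auto& entry : budget) entry.second /= 2;   // half of each count, rounded down

    std::size_t pos = 0;
    // Keep consuming characters while the current one still has allowance left.
    while (pos < s.size() && budget[s[pos]] > 0) {
        --budget[s[pos]];
        ++pos;
    }
    return pos;
}
```

For the question's input abbaabbca, clipping the outer 'a' pair leaves bbaabbc. The scan keeps bba and stops at the middle position, which holds 'a' rather than the odd-count 'c', so the remaining four characters (abbc) form the part to reshuffle, matching the expected output of 4.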
The resulting algorithm ( in java, already tried it here ) :
package palindrometest;
import java.io.*;
import java.util.*;
import java.util.stream.*;
class PalindromeTest {
static int[] findReshuffleRange( String s ) {
// first the easy part,
//split away the already palindromatic start and end if there is any
int lo = 0, hi = s.length()-1;
while(true) {
if( lo >= hi ) {
return new int[]{0,0}; // entire string a palindrome
}
if( s.charAt(lo) != s.charAt(hi) ) {
break;
}
lo++;
hi--;
}
// now we compute the char counts and things based on them
Map<Character,Integer> charCounts = countChars( s, lo, hi );
if( !palindromePossible( charCounts ) ) {
return null;
}
Map<Character,Integer> halfCounts = halfValues( charCounts );
char middleChar = 0;
if( (s.length() % 2) != 0 ) { // only an odd-sized string has a middle char
middleChar = findMiddleChar( charCounts );
}
// try from the beginning first
int fromStart[] = new int[2];
if( findMiddlePart( fromStart, s, lo, hi, halfCounts, middleChar, false ) ) {
// if the middle palindromatic part exist, the situation is symmetric
// we don't have to check the opposite direction
return fromStart;
}
// try from the end
int fromEnd[] = new int[2];
findMiddlePart( fromEnd, s, lo, hi, halfCounts, middleChar, true );
// take the shorter
if( fromEnd[1]-fromEnd[0] < fromStart[1]-fromStart[0] ) {
return fromEnd;
} else {
return fromStart;
}
}
static boolean findMiddlePart( int[] result, String s, int lo, int hi, Map<Character,Integer> halfCounts, char middleChar, boolean backwards ) {
Map<Character,Integer> limits = new HashMap<>(halfCounts);
int pos, direction, end, oth;
if( backwards ) {
pos = hi;
direction = -1;
end = (lo+hi)/2; // mid rounded down
oth = (lo+hi+1)/2; // mid rounded up
} else {
pos = lo;
direction = 1;
end = (lo+hi+1)/2; // mid rounded up
oth = (lo+hi)/2; // mid rounded down
}
// scan until we run out of the limits
while(true) {
char c = s.charAt(pos);
int limit = limits.get(c);
if( limit <= 0 ) {
break;
}
limits.put(c,limit-1);
pos += direction;
}
// whether we reached the middle
boolean middleExists = pos == end && ( oth != end || s.charAt(end) == middleChar );
if( middleExists ) {
// scan through the middle until we find the first non-palindromic character
while( s.charAt(pos) == s.charAt(oth) ) {
pos += direction;
oth -= direction;
}
}
// prepare the resulting interval
if( backwards ) {
result[0] = lo;
result[1] = pos+1;
} else {
result[0] = pos;
result[1] = hi+1;
}
return middleExists;
}
static Map<Character,Integer> countChars( String s, int lo, int hi ) {
Map<Character,Integer> charCounts = new HashMap<>();
for( int i = lo ; i <= hi ; i++ ) {
char c = s.charAt(i);
int cnt = charCounts.getOrDefault(c,0);
charCounts.put(c,cnt+1);
}
return charCounts;
}
static boolean palindromePossible(Map<Character,Integer> charCounts) {
int oddCnt = 0;
for( int cnt : charCounts.values() ) {
if( (cnt % 2) != 0 ) {
oddCnt++;
if( oddCnt > 1 ) {
return false; // can not be made palindromic
}
}
}
return true;
}
static char findMiddleChar( Map<Character,Integer> charCounts ) {
Map<Character,Integer> halfCounts = new HashMap<>();
for( Map.Entry<Character,Integer> e : charCounts.entrySet() ) {
char c = e.getKey();
int cnt = e.getValue();
if( (cnt % 2) != 0 ) {
return c;
}
}
return 0;
}
static Map<Character,Integer> halfValues( Map<Character,Integer> charCounts ) {
Map<Character,Integer> halfCounts = new HashMap<>();
for( Map.Entry<Character,Integer> e : charCounts.entrySet() ) {
char c = e.getKey();
int cnt = e.getValue();
halfCounts.put(c,cnt/2); // we round *down*
}
return halfCounts;
}
static String repeat(char c, int cnt ) {
return cnt <= 0 ? "" : String.format("%"+cnt+"s","").replace(" ",""+c);
}
static void testReshuffle(String s ) {
int rng[] = findReshuffleRange( s );
if( rng == null ) {
System.out.println("Result : '"+s+"' is not palindromizable");
} else if( rng[0] == rng[1] ) {
System.out.println("Result : whole '"+s+"' is a palindrome");
} else {
System.out.println("Result : '"+s+"'");
System.out.println(" "+repeat('-',rng[0])+repeat('X',rng[1]-rng[0])+repeat('-',s.length()-rng[1]) );
}
}
public static void main (String[] args) {
testReshuffle( "abcdefedcba" );
testReshuffle( "abcdcdeeba" );
testReshuffle( "abcfdeedcba" );
testReshuffle( "abcdeedbca" );
testReshuffle( "abcdefcdeba" );
testReshuffle( "abcdefgfcdeba" );
testReshuffle( "accdefcdeba" );
}
}
You can use something like this:
#include <string>
using std::string;
// Returns true if character c occurs more than once in s.
bool morethanone(string s, char c)
{
// Count variable
int res = 0;
for (int i=0;i < s.length(); i++)
// checking character in string
if (s[i] == c)
res++;
if(res > 1)
return true;
else
return false;
}
int getsubstringlength(string text)
{
int result = 0;
for (int i = 0; i < text.length(); i++)
{
if(morethanone(text, text[i]))
result++;
}
return result / 2;
}
I am working on my first web app (weather visualization) that requires some light C++ on the back end. I am using wget to download the raw text, and a C++ console program to parse the data and write the HTML. This works great so far.
METAR is basically raw weather data from a station. (Time, Date, Conditions, Temp etc). The one I am using currently is :
2018/08/10 08:09
KBAZ 100809Z AUTO 00000KT 10SM BKN012 26/23 A3002 RMK AO2 T02610233
I have been able to store each set of data into different variables. The set I am looking at with the issue is the "26/23" above, which is the temperature and dew point in Celsius.
So far I have a string called tempAndDewpoint with "26/23" stored in it. I am using substr(0,2) to return just the temperature in a new string called temperature (since the first number is the temperature). This works great.
My question is, what happens if the temperature is below 10, like 9? I could no longer use substr(0,2) because that would then return "9/" as the current temperature.
I hope to find some guidance with this that is not too complicated for me to duplicate. I wasn't even sure what to name this question as I am not sure what this issue is called. Surely it must be common?
Beware: negative temperatures in METAR are prefixed with M, so these are valid temperature groups: 5/M2 or M8/M12 (negative dew points are in fact icing points). So I would use a custom parser here:
struct TTD {
short int t;
short int td;
bool parse(const char *tempAndDewpoint) {
const char *next;
t = parse_partial(tempAndDewpoint, &next);
if (*next != '/') return false;
td = parse_partial(next + 1, &next);
return (*next == '\0');
}
private:
static short int parse_partial(const char *beg, const char **next) {
bool neg = false;
short int val = 0;
if (*beg == 'M') {
neg = true;
beg += 1;
}
while (*beg >= '0' && *beg <= '9') {
val = val * 10 + (*beg - '0');
beg += 1;
}
*next = beg;
if (neg) val = -val;
return val;
}
};
The simple solution is to not store the value as a string at all. Split the string into two independent numbers. As stated in the other answer, you do need to take care of "M" being a prefix for negative numbers, but there is no need to parse the numbers by hand:
#include <stdexcept> // std::invalid_argument
#include <string>    // std::string, std::stoi
int parseNum(const std::string& str)
{
size_t pos;
int num;
if (!str.empty() && str.front() == 'M')
{
num = -std::stoi(str.substr(1), &pos);
if (pos != str.size() - 1)
{
throw std::invalid_argument("invalid input");
}
}
else
{
num = std::stoi(str, &pos);
if (pos != str.size())
{
throw std::invalid_argument("invalid input");
}
}
return num;
}
size_t slash = tempAndDewpoint.find("/");
if (slash == std::string::npos)
{
throw std::invalid_argument("invalid input");
}
int temp = parseNum(tempAndDewpoint.substr(0, slash));
int dew = parseNum(tempAndDewpoint.substr(slash + 1));
I want to parse a relatively simple registry file format; let's assume it's plain ASCII, saved in the old REGEDIT4 format. I want to parse it using the standard C++ regex classes or functions (preferably no Boost). As input it could take, for example, a sample file like this:
REGEDIT4
[HKEY_LOCAL_MACHINE\SOFTWARE\MyCompany\ConfigurationData\v1.0]
[HKEY_LOCAL_MACHINE\SOFTWARE\MyCompany\ConfigurationData\v1.0\General]
"SettingDword"=dword:00000009
"Setting1"="Some string 1"
"SettingString2"="my String"
[HKEY_LOCAL_MACHINE\SOFTWARE\MyCompany\ConfigurationData\v1.0\Networking]
"SettingDword2"=dword:00000002
"Setting2"="Some string 2"
"SettingString3"="my String2"
From what I have briefly analyzed, scanning multiple [] sections can be done using, for example, the cregex_token_iterator class, but the main problem is that it works the opposite way from how I want to use it. I want to match a pattern like regex re("(\\[.*?\\])"), but the token iterator returns all the strings that were not matched, which sounds kind of silly to me.
Basically I would like to first match a whole section with (\\[.*?\\])(.*?\n\n), then pick out the registry path, then the key-value block, and finally split the key-value pairs using a regex.
It's really incredible that in C# it's relatively easy to write a regex matcher like this, but I would prefer to go with C++, as it's native and does not have performance and assembly unload problems.
After further analysis: it is possible to use regex_search, but the search has to be retried, continuing from the char* just past the previously found match.
Below is an almost complete example that loads a .reg file at run time. I'm using MFC's CString because it's slightly easier to use than std::string, and portability is not needed currently.
#include "stdafx.h"
#include <afx.h> //CFile
#include "TestRegex.h"
#include <fstream>
#include <string>
#include <regex>
#include <map>
CWinApp theApp;
using namespace std;
typedef enum
{
eREG_DWORD = REG_DWORD,
eREG_QWORD = REG_QWORD,
eREG_BINARY = REG_BINARY,
eREG_SZ = REG_SZ
}eRegType;
class RegVariant
{
public:
eRegType type;
union
{
DWORD dw;
__int64 qw;
};
CStringA str;
};
class RegKeyNode
{
public:
// Paths to next nodes
map<CStringA, RegKeyNode> keyToNode;
// Values of current key
map<CStringA, RegVariant> keyValues;
};
map<HKEY, RegKeyNode> g_registry;
int char2int(char input)
{
if (input >= '0' && input <= '9')
return input - '0';
if (input >= 'A' && input <= 'F')
return input - 'A' + 10;
if (input >= 'a' && input <= 'f')
return input - 'a' + 10;
return 0;
}
void hexToBin( const char* hex, CStringA& bin, int maxSize = -1 )
{
int size = (strlen(hex) + 1)/ 3;
if(maxSize != -1 && size > maxSize)
size = maxSize;
unsigned char* buf = (unsigned char*)bin.GetBuffer(size);
for( int i = 0; i < size; i++ )
buf[i] = char2int( hex[ i*3 ] ) * 16 + char2int(hex[i * 3 + 1]);
bin.ReleaseBuffer();
}
int main()
{
HMODULE hModule = ::GetModuleHandle(nullptr);
AfxWinInit(hModule, nullptr, ::GetCommandLine(), 0);
//
// Load .reg file.
//
CString fileName = L"test1.reg";
CStringA file;
CFile cfile;
if (cfile.Open(fileName, CFile::modeRead | CFile::shareDenyNone))
{
int len = (int)cfile.GetLength();
cfile.Read(file.GetBuffer(len), len);
file.ReleaseBuffer();
}
cfile.Close();
file.Replace("\r\n", "\n");
const char* pbuf = file.GetBuffer();
regex reSection("\\[(.*?)\\]([^]*?)\n\n");
regex reLine("^\\s*\"(.*?)\"\\s*=\\s*(.*)$");
regex reTypedValue("^(hex|dword|hex\\(b\\)):(.*)$");
regex reStringValue("^\"(.*)\"$" );
cmatch cmSection, cmLine;
//
// For each section:
//
// [registry path]
// "value1"="value 1"
// "value2"="value 1"
//
while( regex_search(pbuf, pbuf + strlen(pbuf), cmSection, reSection) )
{
CStringA path = cmSection[1].str().c_str();
string key_values = cmSection[2].str();
const char* pkv = key_values.c_str();
int iPath = 0;
CStringA hkeyName = path.Tokenize("\\", iPath).MakeUpper();
RegKeyNode* rnode;
if( hkeyName.Compare("HKEY_LOCAL_MACHINE") == 0 )
rnode = &g_registry[HKEY_LOCAL_MACHINE];
else
rnode = &g_registry[HKEY_CURRENT_USER]; // Don't support other HKEY roots.
//
// Locate path where to place values.
//
for( ; hkeyName = path.Tokenize("\\", iPath); )
{
if( hkeyName.IsEmpty() )
break;
rnode = &rnode->keyToNode[hkeyName];
}
//
// Scan "key"="value" pairs.
//
while( regex_search(pkv, pkv+strlen(pkv), cmLine, reLine ))
{
CStringA key = cmLine[1].str().c_str();
string valueType = cmLine[2].str();
smatch cmTypeValue;
RegVariant* rvValue = &rnode->keyValues[key];
//
// Extract type and value.
//
if(regex_search(valueType, cmTypeValue, reTypedValue))
{
string type = cmTypeValue[1].str();
string value = cmTypeValue[2].str();
if( type == "dword")
{
rvValue->type = eREG_DWORD;
rvValue->dw = (DWORD)strtoul(value.c_str(), 0, 16);
}
else if (type == "hex(b)")
{
rvValue->type = eREG_QWORD;
rvValue->qw = 0;
if( value.size() == 8 * 2 + 7 )
{
CStringA v;
hexToBin(value.c_str(), v, sizeof(__int64));
rvValue->qw = *((__int64*)v.GetBuffer());
}
} else //if (type == "hex")
{
rvValue->type = eREG_BINARY;
hexToBin(value.c_str(), rvValue->str);
}
} else if( regex_search(valueType, cmTypeValue, reStringValue))
{
rvValue->type = eREG_SZ;
rvValue->str = cmTypeValue[1].str().c_str();
}
pkv = cmLine[2].second;
} //while
pbuf = cmSection[2].second;
} //while
return 0;
}
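For comparison, the "retry from the end of the last match" loop can also be left to std::sregex_iterator, which advances past each match on its own. The following is only a portable sketch under stated assumptions, not a replacement for the MFC code above: the names RegSections and parseRegText are hypothetical, values are kept as raw text instead of being decoded, the input is assumed to use \n line endings, and only quoted value names are recognized.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical container: section path -> (value name -> raw value text).
using RegSections = std::map<std::string, std::map<std::string, std::string>>;

RegSections parseRegText(const std::string& text) {
    // One pattern with two alternatives:
    //   group 1        : a "[path]" section header
    //   groups 2 and 3 : a "name"=value line (value left unparsed)
    static const std::regex token(R"re(\[([^\]\n]+)\]|"([^"\n]*)"=([^\n]*))re");

    RegSections sections;
    std::string current;   // path of the section currently being filled
    for (std::sregex_iterator it(text.begin(), text.end(), token), end; it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].matched) {
            current = m[1].str();
            sections[current];   // remember sections that have no values, too
        } else if (!current.empty()) {
            sections[current][m[2].str()] = m[3].str();
        }
    }
    return sections;
}
```

Each raw value (for example dword:00000009 or a quoted string) could then be decoded with patterns like reTypedValue and reStringValue from the example above.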
Here's the thing:
A datetime access is created with C# using DateTime.Now. This datetime is passed through JSON to a C++ method. I'm using JsonCpp to handle the Json data, but I'm not sure how to handle when the data is a datetime.
I want to compare the datetime I received with the current datetime and check the difference in minutes between the two (that is, whether the difference is within a defined interval).
If I convert the Json datetime to a string using JsonCpp I have this format:
2015-06-08T11:17:23.746389-03:00
So what I'm trying to do is something like this:
var d1 = oldAccess["Date"].ToString(); //Json datetime converted to a string
var d2 = actualAccess["Date"].ToString()
if((d2 - d1) < 20) { //Difference between the two dates needs to be less than 20 minutes
return true;
} else return false;
I'm new to C++, and even after searching I haven't figured out how to do this.
Well, I got it. Not the best way, nor the prettiest, but it works, since I know that the two dates are set on the same server and always come in the same format \"2015-01-01T23:40:00.000000-03:00\"
Here's what I did:
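#include <algorithm> // std::remove_if
#include <cctype>    // isspace
#include <ctime>     // time_t, tm, mktime, difftime
#include <locale>
#include <sstream>
#include <string>
int convertToInt(std::string number_str){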
int number;
std::istringstream ss(number_str);
ss.imbue(std::locale::classic());
ss >> number;
return number;
}
time_t convertDatetime(std::string date_str) {
time_t rawtime;
struct tm date;
int year, month, day, hour, min, sec;
date_str.erase(std::remove_if(date_str.begin(), date_str.end(), isspace), date_str.end());
// index 0 is the opening quote of the JSON string, so the year starts at index 1
year = convertToInt(date_str.substr(1, 4));
month = convertToInt(date_str.substr(6, 2));
day = convertToInt(date_str.substr(9, 2));
hour = convertToInt(date_str.substr(12, 2));
min = convertToInt(date_str.substr(15, 2));
sec = convertToInt(date_str.substr(18, 2));
time(&rawtime);
localtime_s(&date, &rawtime);
date.tm_year = year - 1900;
date.tm_mon = month - 1;
date.tm_mday = day;
date.tm_hour = hour;
date.tm_min = min;
date.tm_sec = sec;
return mktime(&date);
}
bool isValidIntervalDatetime(std::string actualDatetime_str, std::string oldDatetime_str, int maxMinutesInterval) {
double maxSecondsInterval = 60 * maxMinutesInterval;
time_t actualDatetime = convertDatetime(actualDatetime_str);
time_t oldDatetime = convertDatetime(oldDatetime_str);
double secondsDiff = difftime(actualDatetime, oldDatetime);
return secondsDiff <= maxSecondsInterval;
}
int main(int argc, char* argv[])
{
auto maxMinutesInterval = 20;
auto actualDatetime = JsonConverter::toString(actualAccess["Date"]); // \"2015-01-02T00:00:00.000000-03:00\"
auto oldDatetime = JsonConverter::toString(oldAccess["Date"]); // \"2015-01-01T23:40:00.000000-03:00\"
if (isValidIntervalDatetime(actualDatetime, oldDatetime, maxMinutesInterval)) {
//do something
}
}
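If the two timestamps might carry different offsets, or the result should not depend on the machine's local time zone, the comparison can be done on UTC seconds instead of going through mktime. The following is only a sketch under stated assumptions: the helper names daysFromCivil, toUtcSeconds and withinMinutes are hypothetical, the surrounding JSON quotes are assumed to be stripped already, and every string is assumed to end in a +hh:mm or -hh:mm offset. The day count uses Howard Hinnant's well-known days_from_civil formula.

```cpp
#include <cstdio>
#include <string>

// Days since 1970-01-01 for a Gregorian date (days_from_civil formula).
long long daysFromCivil(int y, int m, int d) {
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const int yoe = static_cast<int>(y - era * 400);                  // [0, 399]
    const int doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;   // [0, 365]
    const int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + doe - 719468;
}

// Parses "YYYY-MM-DDTHH:MM:SS[.ffffff](+|-)hh:mm" into seconds since the Unix epoch (UTC).
// The fractional seconds are ignored.
bool toUtcSeconds(const std::string& s, long long& out) {
    int y, mo, d, h, mi, sec;
    if (std::sscanf(s.c_str(), "%4d-%2d-%2dT%2d:%2d:%2d", &y, &mo, &d, &h, &mi, &sec) != 6)
        return false;
    char sign;
    int oh, om;
    // The offset is assumed to be the last six characters, e.g. "-03:00".
    if (std::sscanf(s.c_str() + s.size() - 6, "%c%2d:%2d", &sign, &oh, &om) != 3 ||
        (sign != '+' && sign != '-'))
        return false;
    const long long local = daysFromCivil(y, mo, d) * 86400LL + h * 3600 + mi * 60 + sec;
    const long long offset = (oh * 60 + om) * 60LL;
    out = local - (sign == '-' ? -offset : offset);   // UTC = local time minus offset
    return true;
}

// True if `newer` is at most maxMinutes after `older` (and not before it).
bool withinMinutes(const std::string& newer, const std::string& older, int maxMinutes) {
    long long a = 0, b = 0;
    if (!toUtcSeconds(newer, a) || !toUtcSeconds(older, b)) return false;
    const long long diff = a - b;
    return diff >= 0 && diff <= maxMinutes * 60LL;
}
```

With the two example timestamps from the answer above, the difference is 1200 seconds, so a 20-minute limit passes and a 19-minute limit does not.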
I have a string which should specify a date and time in ISO 8601 format, which may or may not have milliseconds in it, and I want to get a struct tm from it as well as any millisecond value that may have been specified (which can be assumed to be zero if not present in the string).
What would be involved in detecting whether the string is in the correct format, as well as converting a user-specified string into the struct tm and millisecond values?
If it weren't for the milliseconds issue, I could probably just use the C function strptime(), but I do not know what the defined behavior of that function is supposed to be when the seconds contain a decimal point.
As one final caveat, if it is at all possible, I would greatly prefer a solution that does not have any dependency on functions that are only found in Boost (but I'm happy to accept C++11 as a prerequisite).
The input is going to look something like:
2014-11-12T19:12:14.505Z
or
2014-11-12T12:12:14.505-5:00
Z, in this case, indicates UTC, but any time zone might be used, and will be expressed as a + or - hours/minutes offset from GMT. The decimal portion of the seconds field is optional, but the fact that it may be there at all is why I cannot simply use strptime() or std::get_time(), which do not describe any particular defined behavior if such a character is found in the seconds portion of the string.
New answer for old question. Rationale: updated tools.
Using this free, open-source library (Howard Hinnant's date library, date.h), one can parse into a std::chrono::time_point<system_clock, milliseconds>, which has the advantage over a tm of being able to hold millisecond precision. And if you really need to, you can continue on to the C API via system_clock::to_time_t (losing the milliseconds along the way).
#include "date.h"
#include <iostream>
#include <sstream>
date::sys_time<std::chrono::milliseconds>
parse8601(std::istream&& is)
{
std::string save;
is >> save;
std::istringstream in{save};
date::sys_time<std::chrono::milliseconds> tp;
in >> date::parse("%FT%TZ", tp);
if (in.fail())
{
in.clear();
in.exceptions(std::ios::failbit);
in.str(save);
in >> date::parse("%FT%T%Ez", tp);
}
return tp;
}
int
main()
{
using namespace date;
using namespace std;
cout << parse8601(istringstream{"2014-11-12T19:12:14.505Z"}) << '\n';
cout << parse8601(istringstream{"2014-11-12T12:12:14.505-5:00"}) << '\n';
}
This outputs:
2014-11-12 19:12:14.505
2014-11-12 17:12:14.505
Note that both outputs are UTC. The parse converted the local time to UTC using the -5:00 offset. If you actually want local time, there is also a way to parse into a type called date::local_time<milliseconds> which would then parse but ignore the offset. One can even parse the offset into a chrono::minutes if desired (using a parse overload taking minutes&).
The precision of the parse is controlled by the precision of the chrono::time_point you pass in, instead of by flags in the format string. And the offset can either be of the style +/-hhmm with %z, or +/-[h]h:mm with %Ez.
You can use C's sscanf (http://www.cplusplus.com/reference/cstdio/sscanf/) to parse it:
const char *dateStr = "2014-11-12T19:12:14.505Z";
int y,M,d,h,m;
float s;
sscanf(dateStr, "%d-%d-%dT%d:%d:%fZ", &y, &M, &d, &h, &m, &s);
If you have std::string it can be called like this (http://www.cplusplus.com/reference/string/string/c_str/):
std::string dateStr = "2014-11-12T19:12:14.505Z";
sscanf(dateStr.c_str(), "%d-%d-%dT%d:%d:%fZ", &y, &M, &d, &h, &m, &s);
If it should handle different timezones you need to use sscanf return value - number of parsed arguments:
int tzh = 0, tzm = 0;
if (6 < sscanf(dateStr.c_str(), "%d-%d-%dT%d:%d:%f%d:%dZ", &y, &M, &d, &h, &m, &s, &tzh, &tzm)) {
if (tzh < 0) {
tzm = -tzm; // Fix the sign on minutes.
}
}
And then you can fill tm (http://www.cplusplus.com/reference/ctime/tm/) struct:
tm time = { 0 };
time.tm_year = y - 1900; // Year since 1900
time.tm_mon = M - 1; // 0-11
time.tm_mday = d; // 1-31
time.tm_hour = h; // 0-23
time.tm_min = m; // 0-59
time.tm_sec = (int)s; // 0-61 (0-60 in C++11)
It can also be done with std::get_time (http://en.cppreference.com/w/cpp/io/manip/get_time) since C++11, as @Barry mentioned in a comment.
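One way to combine std::get_time with optional milliseconds is to let it stop after the seconds field and read the fraction by hand. The sketch below is only an illustration of that idea: the function name parseIsoWithMillis is hypothetical, and the time-zone suffix (Z or an offset) is deliberately left uninterpreted.

```cpp
#include <cctype>
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: fills `out` and `millis` from "YYYY-MM-DDTHH:MM:SS[.fff...]".
// Returns false if the date-time part does not parse.
bool parseIsoWithMillis(const std::string& text, std::tm& out, int& millis) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    out = std::tm{};
    millis = 0;

    // get_time only consumes the fields named in the format, so it stops after the seconds.
    in >> std::get_time(&out, "%Y-%m-%dT%H:%M:%S");
    if (in.fail()) return false;

    if (in.peek() == '.') {
        in.get();   // skip the decimal point
        int digits = 0;
        while (std::isdigit(in.peek())) {
            const int digit = in.get() - '0';
            if (digits < 3) millis = millis * 10 + digit;   // keep millisecond precision only
            ++digits;
        }
        if (digits == 0) return false;              // "." with nothing after it
        for (; digits < 3; ++digits) millis *= 10;  // ".5" means 500 ms
    }
    return true;
}
```

Anything after the fraction, such as Z or -5:00, still has to be handled separately, for example with the sscanf approach shown above.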
Modern C++ version of parse ISO 8601* function
* This code supports only a subset of ISO 8601. The only supported forms are "2020-09-19T05:12:32Z" and "2020-09-19T05:12:32.123Z". The milliseconds part is either exactly 3 digits long or absent; no time zone other than Z is supported, and neither are any of the rarer features.
#include <cstdlib>
#include <ctime>
#include <string>
#ifdef _WIN32
#define timegm _mkgmtime
#endif
inline int ParseInt(const char* value)
{
return std::strtol(value, nullptr, 10);
}
// ParseISO8601 returns milliseconds since 1970
std::time_t ParseISO8601(const std::string& input)
{
constexpr const size_t expectedLength = sizeof("1234-12-12T12:12:12Z") - 1;
static_assert(expectedLength == 20, "Unexpected ISO 8601 date/time length");
if (input.length() < expectedLength)
{
return 0;
}
std::tm time = { 0 };
time.tm_year = ParseInt(&input[0]) - 1900;
time.tm_mon = ParseInt(&input[5]) - 1;
time.tm_mday = ParseInt(&input[8]);
time.tm_hour = ParseInt(&input[11]);
time.tm_min = ParseInt(&input[14]);
time.tm_sec = ParseInt(&input[17]);
time.tm_isdst = 0;
const int millis = input.length() > 20 ? ParseInt(&input[20]) : 0;
return timegm(&time) * 1000 + millis;
}
Old question, and I have some old code to contribute ;). I was using the date library mentioned here. While it works great, it comes at a performance cost. For most common cases this would be not really relevant. However, if you have for example a service parsing data like I do, it really does matter.
I was profiling my server application for performance optimization, and found that parsing an ISO timestamp using the date library was 3 times slower compared to parsing the whole (roughly 500 bytes) json document. In total parsing the timestamp accounted for about 4.8% of total CPU time.
On my quest to optimize this part, I did not find much with C++ that I would consider for a living product. And the code which I did consider further mostly had some dependencies (e.g. the ISO parser in CEPH looks ok and seems well tested).
In the end, I turned to good old C and stripped out some code from the SQLite date.c to make it work standalone. The difference:
date: 872ms
SQLite date.c: 54ms
(Profiled function weight of real life service application)
Here it is (all credits to SQLite):
The header file date_util.h
#include <stdint.h>
#include <stdbool.h>
#ifdef __cplusplus
extern "C" {
#endif
// Calculates time since epoch including milliseconds
uint64_t ParseTimeToEpochMillis(const char *str, bool *error);
// Creates an ISO timestamp with milliseconds from epoch with millis.
// The buffer size (resultLen) for result must be at least 100 bytes.
void TimeFromEpochMillis(uint64_t epochMillis, char *result, int resultLen, bool *error);
#ifdef __cplusplus
}
#endif
This is the C file date_util.c:
#include "date_util.h"
#include <ctype.h>
#include <stdio.h>
#include <stdarg.h>
#include <assert.h>
#include <string.h>
/*
** A structure for holding a single date and time.
*/
typedef struct DateTime DateTime;
struct DateTime {
int64_t iJD; /* The julian day number times 86400000 */
int Y, M, D; /* Year, month, and day */
int h, m; /* Hour and minutes */
int tz; /* Timezone offset in minutes */
double s; /* Seconds */
char validJD; /* True (1) if iJD is valid */
char rawS; /* Raw numeric value stored in s */
char validYMD; /* True (1) if Y,M,D are valid */
char validHMS; /* True (1) if h,m,s are valid */
char validTZ; /* True (1) if tz is valid */
char tzSet; /* Timezone was set explicitly */
char isError; /* An overflow has occurred */
};
/*
** Convert zDate into one or more integers according to the conversion
** specifier zFormat.
**
** zFormat[] contains 4 characters for each integer converted, except for
** the last integer which is specified by three characters. The meaning
** of a four-character format specifiers ABCD is:
**
** A: number of digits to convert. Always "2" or "4".
** B: minimum value. Always "0" or "1".
** C: maximum value, decoded as:
** a: 12
** b: 14
** c: 24
** d: 31
** e: 59
** f: 9999
** D: the separator character, or \000 to indicate this is the
** last number to convert.
**
** Example: To translate an ISO-8601 date YYYY-MM-DD, the format would
** be "40f-21a-20c". The "40f-" indicates the 4-digit year followed by "-".
** The "21a-" indicates the 2-digit month followed by "-". The "20c" indicates
** the 2-digit day which is the last integer in the set.
**
** The function returns the number of successful conversions.
*/
static int GetDigits(const char *zDate, const char *zFormat, ...){
/* The aMx[] array translates the 3rd character of each format
** spec into a max size: a b c d e f */
static const uint16_t aMx[] = { 12, 14, 24, 31, 59, 9999 };
va_list ap;
int cnt = 0;
char nextC;
va_start(ap, zFormat);
do{
char N = zFormat[0] - '0';
char min = zFormat[1] - '0';
int val = 0;
uint16_t max;
assert( zFormat[2]>='a' && zFormat[2]<='f' );
max = aMx[zFormat[2] - 'a'];
nextC = zFormat[3];
val = 0;
while( N-- ){
if( !isdigit(*zDate) ){
goto end_getDigits;
}
val = val*10 + *zDate - '0';
zDate++;
}
if( val<(int)min || val>(int)max || (nextC!=0 && nextC!=*zDate) ){
goto end_getDigits;
}
*va_arg(ap,int*) = val;
zDate++;
cnt++;
zFormat += 4;
}while( nextC );
end_getDigits:
va_end(ap);
return cnt;
}
/*
** Parse a timezone extension on the end of a date-time.
** The extension is of the form:
**
** (+/-)HH:MM
**
** Or the "zulu" notation:
**
** Z
**
** If the parse is successful, write the number of minutes
** of change in p->tz and return 0. If a parser error occurs,
** return non-zero.
**
** A missing specifier is not considered an error.
*/
static int ParseTimezone(const char *zDate, DateTime *p){
int sgn = 0;
int nHr, nMn;
int c;
while( isspace(*zDate) ){ zDate++; }
p->tz = 0;
c = *zDate;
if( c=='-' ){
sgn = -1;
}else if( c=='+' ){
sgn = +1;
}else if( c=='Z' || c=='z' ){
zDate++;
goto zulu_time;
}else{
return c!=0;
}
zDate++;
if( GetDigits(zDate, "20b:20e", &nHr, &nMn)!=2 ){
return 1;
}
zDate += 5;
p->tz = sgn*(nMn + nHr*60);
zulu_time:
while( isspace(*zDate) ){ zDate++; }
p->tzSet = 1;
return *zDate!=0;
}
/*
** Parse times of the form HH:MM or HH:MM:SS or HH:MM:SS.FFFF.
** The HH, MM, and SS must each be exactly 2 digits. The
** fractional seconds FFFF can be one or more digits.
**
** Return 1 if there is a parsing error and 0 on success.
*/
static int ParseHhMmSs(const char *zDate, DateTime *p){
int h, m, s;
double ms = 0.0;
if( GetDigits(zDate, "20c:20e", &h, &m)!=2 ){
return 1;
}
zDate += 5;
if( *zDate==':' ){
zDate++;
if( GetDigits(zDate, "20e", &s)!=1 ){
return 1;
}
zDate += 2;
if( *zDate=='.' && isdigit(zDate[1]) ){
double rScale = 1.0;
zDate++;
while( isdigit(*zDate) ){
ms = ms*10.0 + *zDate - '0';
rScale *= 10.0;
zDate++;
}
ms /= rScale;
}
}else{
s = 0;
}
p->validJD = 0;
p->rawS = 0;
p->validHMS = 1;
p->h = h;
p->m = m;
p->s = s + ms;
if( ParseTimezone(zDate, p) ) return 1;
p->validTZ = (p->tz!=0)?1:0;
return 0;
}
/*
** Put the DateTime object into its error state.
*/
static void DatetimeError(DateTime *p){
memset(p, 0, sizeof(*p));
p->isError = 1;
}
/*
** Convert from YYYY-MM-DD HH:MM:SS to julian day. We always assume
** that the YYYY-MM-DD is according to the Gregorian calendar.
**
** Reference: Meeus page 61
*/
static void ComputeJD(DateTime *p){
int Y, M, D, A, B, X1, X2;
if( p->validJD ) return;
if( p->validYMD ){
Y = p->Y;
M = p->M;
D = p->D;
}else{
Y = 2000; /* If no YMD specified, assume 2000-Jan-01 */
M = 1;
D = 1;
}
if( Y<-4713 || Y>9999 || p->rawS ){
DatetimeError(p);
return;
}
if( M<=2 ){
Y--;
M += 12;
}
A = Y/100;
B = 2 - A + (A/4);
X1 = 36525*(Y+4716)/100;
X2 = 306001*(M+1)/10000;
p->iJD = (int64_t)((X1 + X2 + D + B - 1524.5 ) * 86400000);
p->validJD = 1;
if( p->validHMS ){
p->iJD += p->h*3600000 + p->m*60000 + (int64_t)(p->s*1000);
if( p->validTZ ){
p->iJD -= p->tz*60000;
p->validYMD = 0;
p->validHMS = 0;
p->validTZ = 0;
}
}
}
/*
** Parse dates of the form
**
** YYYY-MM-DD HH:MM:SS.FFF
** YYYY-MM-DD HH:MM:SS
** YYYY-MM-DD HH:MM
** YYYY-MM-DD
**
** Write the result into the DateTime structure and return 0
** on success and 1 if the input string is not a well-formed
** date.
*/
static int ParseYyyyMmDd(const char *zDate, DateTime *p){
int Y, M, D, neg;
if( zDate[0]=='-' ){
zDate++;
neg = 1;
}else{
neg = 0;
}
if( GetDigits(zDate, "40f-21a-21d", &Y, &M, &D)!=3 ){
return 1;
}
zDate += 10;
while( isspace(*zDate) || 'T'==*(uint8_t*)zDate ){ zDate++; }
if( ParseHhMmSs(zDate, p)==0 ){
/* We got the time */
}else if( *zDate==0 ){
p->validHMS = 0;
}else{
return 1;
}
p->validJD = 0;
p->validYMD = 1;
p->Y = neg ? -Y : Y;
p->M = M;
p->D = D;
if( p->validTZ ){
ComputeJD(p);
}
return 0;
}
/* The julian day number for 9999-12-31 23:59:59.999 is 5373484.4999999.
** Multiplying this by 86400000 gives 464269060799999 as the maximum value
** for DateTime.iJD.
**
** But some older compilers (ex: gcc 4.2.1 on older Macs) cannot deal with
** such a large integer literal, so we have to encode it.
*/
#define INT_464269060799999 ((((int64_t)0x1a640)<<32)|0x1072fdff)
/*
** Return TRUE if the given julian day number is within range.
**
** The input is the JulianDay times 86400000.
*/
static int ValidJulianDay(int64_t iJD){
return iJD>=0 && iJD<=INT_464269060799999;
}
/*
** Compute the Year, Month, and Day from the julian day number.
*/
static void ComputeYMD(DateTime *p){
int Z, A, B, C, D, E, X1;
if( p->validYMD ) return;
if( !p->validJD ){
p->Y = 2000;
p->M = 1;
p->D = 1;
}else if( !ValidJulianDay(p->iJD) ){
DatetimeError(p);
return;
}else{
Z = (int)((p->iJD + 43200000)/86400000);
A = (int)((Z - 1867216.25)/36524.25);
A = Z + 1 + A - (A/4);
B = A + 1524;
C = (int)((B - 122.1)/365.25);
D = (36525*(C&32767))/100;
E = (int)((B-D)/30.6001);
X1 = (int)(30.6001*E);
p->D = B - D - X1;
p->M = E<14 ? E-1 : E-13;
p->Y = p->M>2 ? C - 4716 : C - 4715;
}
p->validYMD = 1;
}
/*
** Compute the Hour, Minute, and Seconds from the julian day number.
*/
static void ComputeHMS(DateTime *p){
int s;
if( p->validHMS ) return;
ComputeJD(p);
s = (int)((p->iJD + 43200000) % 86400000);
p->s = s/1000.0;
s = (int)p->s;
p->s -= s;
p->h = s/3600;
s -= p->h*3600;
p->m = s/60;
p->s += s - p->m*60;
p->rawS = 0;
p->validHMS = 1;
}
/*
** Compute both YMD and HMS
*/
static void ComputeYMD_HMS(DateTime *p){
ComputeYMD(p);
ComputeHMS(p);
}
/*
** Input "r" is a numeric quantity which might be a julian day number,
** or the number of seconds since 1970. If the value of r is within
** range of a julian day number, install it as such and set validJD.
** If the value is a valid unix timestamp, put it in p->s and set p->rawS.
*/
static void SetRawDateNumber(DateTime *p, double r){
p->s = r;
p->rawS = 1;
if( r>=0.0 && r<5373484.5 ){
p->iJD = (int64_t)(r*86400000.0 + 0.5);
p->validJD = 1;
}
}
/*
** Clear the YMD and HMS and the TZ
*/
static void ClearYMD_HMS_TZ(DateTime *p){
p->validYMD = 0;
p->validHMS = 0;
p->validTZ = 0;
}
// Modified methods that only convert back and forth between epoch milliseconds and an ISO timestamp with millis
uint64_t ParseTimeToEpochMillis(const char *str, bool *error) {
assert(str);
assert(error);
*error = false;
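DateTime dateTime;
memset(&dateTime, 0, sizeof(dateTime)); // zero the valid* flags, which may be read before they are set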
int res = ParseYyyyMmDd(str, &dateTime);
if (res) {
*error = true;
return 0;
}
ComputeJD(&dateTime);
ComputeYMD_HMS(&dateTime);
// get fraction (millis of a full second): 24.355 => 355
// (round to nearest; plain truncation can turn 0.355 * 1000 into 354)
int millis = (int)((dateTime.s - (int)(dateTime.s)) * 1000 + 0.5);
uint64_t epoch = (int64_t)(dateTime.iJD/1000 - 21086676*(int64_t)10000) * 1000 + millis;
return epoch;
}
void TimeFromEpochMillis(uint64_t epochMillis, char *result, int resultLen, bool *error) {
assert(resultLen >= 100);
assert(result);
assert(error);
int64_t seconds = epochMillis / 1000;
int millis = epochMillis - seconds * 1000;
DateTime x;
*error = false;
memset(&x, 0, sizeof(x));
SetRawDateNumber(&x, seconds);
/*
** unixepoch
**
** Treat the current value of p->s as the number of
** seconds since 1970. Convert to a real julian day number.
*/
{
double r = x.s*1000.0 + 210866760000000.0;
if( r>=0.0 && r<464269060800000.0 ){
ClearYMD_HMS_TZ(&x);
x.iJD = (int64_t)r;
x.validJD = 1;
x.rawS = 0;
}
ComputeJD(&x);
if( x.isError || !ValidJulianDay(x.iJD) ) {
*error = true;
}
}
ComputeYMD_HMS(&x);
snprintf(result, resultLen, "%04d-%02d-%02dT%02d:%02d:%02d.%03dZ",
x.Y, x.M, x.D, x.h, x.m, (int)(x.s), millis);
}
These two helper methods simply convert to and from an ISO timestamp with milliseconds. Setting a tm struct from the DateTime should be obvious.
Example usage:
// Calculate milliseconds since epoch
std::string timeStamp = "2019-09-02T22:02:24.355Z";
bool error;
uint64_t time = ParseTimeToEpochMillis(timeStamp.c_str(), &error);
// Get ISO timestamp with milliseconds component from epoch in milliseconds.
// (Multiply by 1000 if you have a standard epoch in seconds.)
uint64_t epochMillis = 1567461744355; // == "2019-09-02T22:02:24.355Z"
char result[100] = {0};
TimeFromEpochMillis(epochMillis, result, sizeof(result), &error);
std::string resultStr(result); // == "2019-09-02T22:02:24.355Z"
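The Julian-day arithmetic behind TimeFromEpochMillis can be tried in isolation. The following is only a minimal sketch, not the code above: CivilTime, kUnixEpochJdMs and civil_from_epoch_millis are hypothetical names introduced for illustration, the date part copies the formula from ComputeYMD, and the time of day is split with integer arithmetic instead of doubles. It assumes a non-negative epoch value inside the supported Julian-day range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch (not part of the original code).
struct CivilTime {
    int year, month, day, hour, minute, second, millis;
};

// Julian day of 1970-01-01T00:00:00Z is 2440587.5; times 86400000 ms per day.
constexpr int64_t kUnixEpochJdMs = 210866760000000LL;

// The shifted encoding used for the upper bound equals the plain literal.
static_assert(((((int64_t)0x1a640) << 32) | 0x1072fdff) == 464269060799999LL,
              "encoded maximum iJD matches 464269060799999");

// Convert milliseconds since the Unix epoch to UTC calendar fields.
// Assumption: epochMillis >= 0 and within the supported range.
CivilTime civil_from_epoch_millis(int64_t epochMillis) {
    const int64_t iJD = epochMillis + kUnixEpochJdMs;  // Julian day in ms
    CivilTime t{};

    // Date part: same arithmetic as ComputeYMD. Julian days start at noon,
    // hence the 12 hour (43200000 ms) shift.
    int Z = (int)((iJD + 43200000) / 86400000);
    int A = (int)((Z - 1867216.25) / 36524.25);
    A = Z + 1 + A - (A / 4);
    int B = A + 1524;
    int C = (int)((B - 122.1) / 365.25);
    int D = (36525 * (C & 32767)) / 100;
    int E = (int)((B - D) / 30.6001);
    int X1 = (int)(30.6001 * E);
    t.day = B - D - X1;
    t.month = E < 14 ? E - 1 : E - 13;
    t.year = t.month > 2 ? C - 4716 : C - 4715;

    // Time part: milliseconds since midnight, split without floating point.
    int64_t ms = (iJD + 43200000) % 86400000;
    t.hour = (int)(ms / 3600000);
    ms %= 3600000;
    t.minute = (int)(ms / 60000);
    ms %= 60000;
    t.second = (int)(ms / 1000);
    t.millis = (int)(ms % 1000);
    return t;
}
```

With the value from the example usage, civil_from_epoch_millis(1567461744355) should give the fields of 2019-09-02T22:02:24.355Z. Keeping the time of day in integers avoids the rounding question that comes with the fractional seconds stored in a double.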
The Boost.Date_Time library provides from_iso_string and from_iso_extended_string:
#include <boost/date_time/posix_time/posix_time.hpp>
using namespace boost::posix_time;
// signatures
ptime from_iso_string(std::string);
ptime from_iso_extended_string(std::string);
// examples
std::string ts1("20020131T235959");
ptime t1(from_iso_string(ts1));
std::string ts2("2020-01-31T23:59:59.123");
ptime t2(from_iso_extended_string(ts2));
I used strptime():
const chrono::time_point<chrono::system_clock, chrono::seconds> iSO8601StringToTimePoint(const string& iso8601) {
std::tm t = {};
// F: Equivalent to %Y-%m-%d, the ISO 8601 date format.
// T: ISO 8601 time format (HH:MM:SS), equivalent to %H:%M:%S
// z: ISO 8601 offset from UTC in timezone (1 minute=1, 1 hour=100). If timezone cannot be determined, no characters
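strptime(iso8601.c_str(), "%FT%T%z", &t);
// Note: mktime() interprets t as local time and does not apply the parsed UTC offset.
// from_time_t() returns a time_point with the clock's native duration, so cast it down to seconds.
return chrono::time_point_cast<chrono::seconds>(chrono::system_clock::from_time_t(mktime(&t)));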
}
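Since mktime() works in local time, a timestamp ending in "Z" may come out shifted by the local UTC offset. Below is a rough sketch of one way around that, assuming a POSIX strptime() is available and that the input is always UTC with an optional trailing "Z". The names days_from_civil and parse_iso8601_utc are hypothetical helpers introduced here; the day count uses the well-known days-from-civil arithmetic for the proleptic Gregorian calendar rather than anything from the answer above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ctime>
#include <string>
#include <time.h>  // strptime (POSIX, not standard C++)

// Days between 1970-01-01 and the given proleptic Gregorian date.
// Hypothetical helper; m is 1..12 and d is 1..31.
int64_t days_from_civil(int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;  // treat January and February as part of the previous year
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Parse "YYYY-MM-DDTHH:MM:SS" with an optional trailing 'Z' as UTC.
// Returns false if the string does not match. Assumption: no numeric offset.
bool parse_iso8601_utc(const std::string& iso8601,
                       std::chrono::system_clock::time_point& out) {
    std::tm t = {};
    const char* end = strptime(iso8601.c_str(), "%Y-%m-%dT%H:%M:%S", &t);
    if (end == nullptr) return false;
    if (*end != '\0' && !(end[0] == 'Z' && end[1] == '\0')) return false;

    // Build the epoch seconds by hand so the local time zone never matters.
    const int64_t secs =
        days_from_civil(t.tm_year + 1900, t.tm_mon + 1, t.tm_mday) * 86400 +
        t.tm_hour * 3600 + t.tm_min * 60 + t.tm_sec;
    out = std::chrono::system_clock::from_time_t(static_cast<std::time_t>(secs));
    return true;
}
```

Fractional seconds and numeric offsets such as +02:00 are deliberately left out of this sketch; they would need extra handling after the strptime() call.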
I went down the sscanf() path at first, but after I switched my IDE to CLion, it suggested replacing sscanf() with std::strtol().
Keep in mind that this is just an example of achieving the same result as the sscanf() version. It is not meant to be shorter, universal, or correct in every way, only to point in the "pure C++ solution" direction. It is based on the timestamp strings I receive from an API, so it handles only the YYYY-MM-DDTHH:mm:ss.sssZ format, but it could easily be modified to handle others.
One thing needs to be done before using std::strtol(): cleaning up the string itself by getting rid of the non-digit markers ("-", ":", "T", "Z", "."). Without that, std::strtol() parses the numbers the wrong way (you might end up with negative month or day values).
This little snippet takes an ISO 8601 string (in the format mentioned above) and converts it into a std::time_t result representing the epoch time in milliseconds. From there it is quite easy to go to std::chrono types.
std::time_t parseISO8601(const std::string &input)
{
// prepare the data output placeholders
struct std::tm time = {0};
int millis;
// string cleaning for strtol() - this could be made cleaner, but for the sake of the example itself...
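// (copy first: input is a const reference, and replace() modifies the string in place)
std::string cleanInput = input;
cleanInput
.replace(4, 1, 1, ' ')
.replace(7, 1, 1, ' ')
.replace(10, 1, 1, ' ')
.replace(13, 1, 1, ' ')
.replace(16, 1, 1, ' ')
.replace(19, 1, 1, ' ');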
// pointers for std::strtol()
const char* timestamp = cleanInput.c_str();
// last parsing end position - it's where strtol finished parsing the last number found
char* endPointer;
// the casts aren't necessary, but I just wanted CLion to be quiet ;)
// first parse - start with the timestamp string, give endPointer the position after the found number
time.tm_year = (int) std::strtol(timestamp, &endPointer, 10) - 1900;
// next parses - use endPointer instead of timestamp (skip the part, that's already parsed)
time.tm_mon = (int) std::strtol(endPointer, &endPointer, 10) - 1;
time.tm_mday = (int) std::strtol(endPointer, &endPointer, 10);
time.tm_hour = (int) std::strtol(endPointer, &endPointer, 10);
time.tm_min = (int) std::strtol(endPointer, &endPointer, 10);
time.tm_sec = (int) std::strtol(endPointer, &endPointer, 10);
millis = (int) std::strtol(endPointer, &endPointer, 10);
// convert the tm struct into time_t and then from seconds to milliseconds
// (std::mktime() interprets the fields as local time, even though the trailing 'Z' denotes UTC)
return std::mktime(&time) * 1000 + millis;
}
Not the cleanest and most universal, but gets the job done without resorting to C-style functions like sscanf().
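To see why the separators matter, here is a small sketch that compares chaining std::strtol() directly on the raw string with skipping every non-digit character first. Both function names are hypothetical and only serve the illustration; the second one is one possible alternative to the fixed-position replace() calls, not the method used above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Chain strtol() calls on the untouched string: each call starts exactly
// where the previous one stopped, so "-09" is read as minus nine.
std::vector<long> chained_strtol_fields(const std::string& input, std::size_t count) {
    std::vector<long> fields;
    const char* p = input.c_str();
    for (std::size_t i = 0; i < count; ++i) {
        char* end = nullptr;
        fields.push_back(std::strtol(p, &end, 10));
        p = end;
    }
    return fields;
}

// Skip every non-digit character before each strtol() call, so separators
// such as '-', ':', 'T', '.' and 'Z' never reach the number parser.
std::vector<long> split_iso8601_fields(const std::string& input) {
    std::vector<long> fields;
    const char* p = input.c_str();
    while (*p != '\0') {
        if (*p < '0' || *p > '9') {
            ++p;  // separator: step over it
            continue;
        }
        char* end = nullptr;
        fields.push_back(std::strtol(p, &end, 10));
        p = end;  // continue right after the digits just parsed
    }
    return fields;
}
```

For "2019-09-02T22:02:24.355Z" the chained version should yield 2019, -9, -2 for the first three fields, while the skipping version should yield 2019, 9, 2, 22, 2, 24, 355. Note that the last field is only a millisecond count when the fraction has exactly three digits; ".5" would come back as 5, not 500.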