Getting the best performance out of {fmt} - C++

I need to format a FILETIME value into a wide string buffer, and configuration provides the format string.
What I am actually doing:
Config provides the format string: L"{YYYY}-{MM}-{DD} {hh}:{mm}:{ss}.{mmm}"
Convert the FILETIME to system time:
SYSTEMTIME stUTC;
FileTimeToSystemTime(&fileTime, &stUTC);
Format the string with
fmt::format_to(std::back_inserter(buffer), strFormat,
               fmt::arg(L"YYYY", stUTC.wYear),
               fmt::arg(L"MM", stUTC.wMonth),
               fmt::arg(L"DD", stUTC.wDay),
               fmt::arg(L"hh", stUTC.wHour),
               fmt::arg(L"mm", stUTC.wMinute),
               fmt::arg(L"ss", stUTC.wSecond),
               fmt::arg(L"mmm", stUTC.wMilliseconds));
I perfectly understand that such a service comes at a cost :) but my code calls this statement millions of times, and the performance penalty is clearly visible (more than 6% of CPU usage).
"Anything" I could do to improve this code would be welcomed.
I saw that {fmt} has a time API.
Unfortunately, it seems to be unable to format the millisecond part of the time/date, and it requires some conversion effort from FILETIME to std::time_t...
Should I forget about the "custom" format string and provide a custom formatter for the FILETIME (or SYSTEMTIME) types? Would that result in a significant performance boost?
I'd appreciate any guidance you can provide.

In the comments I suggested parsing your custom time format string into a simple state machine. It does not even have to be a state machine as such. It is simply a linear series of instructions.
Currently, {fmt} needs to do a bit of work to parse the format string and then convert each integer to a zero-padded string. It is possible, though unlikely, that it is as heavily optimized as what I'm about to suggest.
The basic idea is to have a (large) lookup table, which of course can be generated at runtime, but for the purposes of quick illustration:
const wchar_t zeroPad4[10000][5] = { L"0000", L"0001", L"0002", ..., L"9999" };
You can have 1-, 2- and 3-digit lookup tables if you want, or alternatively recognize that these values are all contained in the 4-digit lookup table if you just add an offset.
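For the purposes of illustration, the table need not be a giant literal; it can be filled once at startup. A minimal sketch (the function name is mine, and the array is non-const so it can be generated at runtime):

wchar_t zeroPad4[10000][5];

void InitZeroPad4()
{
    for( int i = 0; i < 10000; ++i )
    {
        zeroPad4[i][0] = L'0' + (i / 1000) % 10;
        zeroPad4[i][1] = L'0' + (i / 100) % 10;
        zeroPad4[i][2] = L'0' + (i / 10) % 10;
        zeroPad4[i][3] = L'0' + i % 10;
        zeroPad4[i][4] = L'\0';
    }
}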
So to output a number, you just need to know what the offset in SYSTEMTIME is, what type the value is, and what string offset to apply (0 for 4-digit, 1 for 3-digit, etc). It makes things simpler, given that all struct elements in SYSTEMTIME are the same type. And you should reasonably assume that no values require range checks, although you can add that if unsure.
And you can configure it like this:
struct Output {
    int dataOffset; // offset into SYSTEMTIME struct
    int count;      // extra adjustment after string lookup
};
What about literal strings? Well, you can either copy those or just repurpose Output to use a negative dataOffset representing where to start in the format string and count to hold how many characters to output in that mode. If you need extra output modes, extend this struct with a mode member.
Anyway, let's take your string L"{YYYY}-{MM}-{DD} {hh}:{mm}:{ss}.{mmm}" as an example. After you parse this, you would end up with:
Output outputs[] {
    { offsetof(SYSTEMTIME, wYear), 0 },          // "{YYYY}"
    { -6, 1 },                                   // "-"
    { offsetof(SYSTEMTIME, wMonth), 2 },         // "{MM}"
    { -11, 1 },                                  // "-"
    { offsetof(SYSTEMTIME, wDay), 2 },           // "{DD}"
    { -16, 1 },                                  // " "
    // etc... you get the idea
    { offsetof(SYSTEMTIME, wMilliseconds), 1 },  // "{mmm}"
    { -1, 0 },                                   // terminate
};
It shouldn't take much to see that, when you have a SYSTEMTIME as input, a pointer to the original format string, the lookup table, and this basic array of instructions you can go ahead and output the result into a pre-sized buffer very quickly.
I'm sure you can come up with the code to execute these instructions efficiently.
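To make that concrete, here is a minimal sketch of such an executor, assuming the encoding above (a non-negative dataOffset is a SYSTEMTIME field, a negative dataOffset is a literal run in the format string, and a negative dataOffset with count 0 terminates) plus the zeroPad4 table; the function name and signature are illustrative, not from the original post:

#include <windows.h>
#include <cstring>

wchar_t* RunOutputs( const Output* op, const SYSTEMTIME& st,
                     const wchar_t* fmtStr, wchar_t* out )
{
    for( ; !(op->dataOffset < 0 && op->count == 0); ++op )
    {
        if( op->dataOffset >= 0 )
        {
            // numeric field: every SYSTEMTIME member is a WORD
            WORD v = *reinterpret_cast<const WORD*>(
                reinterpret_cast<const BYTE*>(&st) + op->dataOffset );
            size_t digits = 4 - op->count;    // count doubles as the string offset
            std::memcpy( out, zeroPad4[v] + op->count, digits * sizeof(wchar_t) );
            out += digits;
        }
        else
        {
            // literal run copied straight out of the original format string
            std::memcpy( out, fmtStr + (-op->dataOffset),
                         op->count * sizeof(wchar_t) );
            out += op->count;
        }
    }
    *out = L'\0';
    return out;
}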
The main drawback of this approach is that the size of the lookup table may lead to cache issues. However, most lookups will occur in the first 100 elements. You could also compress the table to ordinary char values and then inject the wchar_t zero bytes when copying.
As always: experiment, measure, and have fun!

Related

QJsonDocument::toJson() incorrect double precision

In my project I read from a json file with QJsonDocument::fromJson(). This works great, however when I try to write the QJsonDocument back to file with toJson() some of the doubles have messed up precision.
For example, calling toJson() on a document with a QJsonValue with a double value of 0.15 will save to file as 0.14999999999999999. I do not want this.
This is because the Qt source file qjsonwriter.cpp at line 126 (Qt 5.6.2) reads:
json += QByteArray::number(d, 'g', std::numeric_limits<double>::digits10 + 2); // ::digits10 is 15
That +2 at the end there is messing me up. If this same call to QByteArray::number() instead has a precision of 15 (instead of 17), the result is exactly as I need... 0.15.
I understand how the format of floating point precision causes the double to be limited in what it can represent. But if I limit the precision to 15 instead of 17, this has the effect of matching the input double precision, which I want.
How can I get around this?
Obviously... I could write my own JSON parser, but that's a last resort. And obviously I could edit the Qt source code, but my software is already deployed with Qt5Core.dll included in everyone's install directory, and my updater is not designed to update any DLLs. So I cannot edit the Qt source code.
Fingers crossed someone has a magic fix for this :)
this has the effect of matching the input double precision, which I want.
This request doesn't make much sense. A double doesn't carry any information about its precision - it only carries a value. 0.15, 0.1500 and 0.14999999999999999 are the exact same double value, and the JSON writer has no way to know how it was read from the file in the first place (if it was read from a file at all).
In general you cannot ask for a maximum of 15 digits of precision as you propose: depending on the particular value, up to 17 digits are required for an exact double->text->double round trip, so you would write incorrectly rounded values. What some JSON writers do, however, is write each number with the minimum number of digits required to read the same double back. Doing this numerically correctly is far from trivial, so many writers instead loop from 15 to 17 digits, write the number at each precision, parse it back, and check whether it comes back as the exact same double value. While this generates "nicer" (and smaller) output, it's more work and slows down the JSON write, which is probably why Qt doesn't do it.
Still, you can write your own JSON writing code and have this feature; for a simple recursive implementation I'd expect ~15 lines of code.
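As a concrete illustration of that round-trip loop, here is a minimal sketch in plain C++ (this is not a Qt API; snprintf and strtod stand in for whatever writer/reader you use):

#include <cstdio>
#include <cstdlib>
#include <string>

std::string shortestRoundTrip( double d )
{
    char buf[32];
    for( int prec = 15; prec <= 17; ++prec )
    {
        std::snprintf( buf, sizeof(buf), "%.*g", prec, d );
        if( std::strtod( buf, nullptr ) == d )  // reads back as the same double?
            break;                              // 17 digits always round-trips
    }
    return buf;
}

// shortestRoundTrip(0.15) yields "0.15" rather than "0.14999999999999999".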
That being said, again, if you want to precisely match your input this won't save you - as it's simply impossible.
I just encountered this as well. Rather than replace an entire Qt JSON implementation with a third party library (or roll my own!), however, I kludged a solution...
My full code base related to this is too extensive and elaborate to post and explain here. But the gist of the solution to this point is simple enough.
First, I use a QVariantMap (or QVariantHash) to collect my data, and then convert that to JSON via the built-in QJsonObject::fromVariantMap or QJsonDocument::fromVariant functions.
To control the serialization, I define a class called DataFormatOptions, which has a decimalPrecision member (and sets up easy expansion to other such formatting options), and I call a function called toMagicVar to create "magic variants" for the data structure to be converted to JSON bytes. To control the number format/precision, toMagicVar converts doubles and floats to strings in the desired format and surrounds the string value with some "magic bytes". The way my actual code is written, one can easily do this on any "level" of the map/hash being built / formatted via recursive processing, but I've omitted those details...
const QString NO_QUOTE( "__NO_QUOT__" );
QVariant toMagicVar( const QVariant &var, const DataFormatOptions &opt )
{
    ...
    const QVariant::Type type( var.type() );
    const QMetaType::Type metaType( (QMetaType::Type)type );
    ...
    if( opt.decimalPrecision != DataFormatOptions::DEFAULT_PRECISION
        && (type == QVariant::Type::Double || metaType == QMetaType::Float) )
    {
        static const char FORMAT( 'f' );
        static const QRegExp trailingPointAndZeros( "\\.?0+$" );
        QString formatted( QString::number(
            var.toDouble(), FORMAT, opt.decimalPrecision ) );
        formatted.remove( trailingPointAndZeros );
        return QVariant( QString( NO_QUOTE + formatted + NO_QUOTE ) );
    }
    ...
}
Note that I trim off any extraneous digits via formatted.remove. If you want the data to always include exactly X digits after the decimal point, you may opt to skip that step. (Or you might want to control that via DataFormatOptions?)
Once I have the json bytes I'm going to send across the network as a QByteArray, I remove the magic bytes so my numbers represented as quoted strings become numbers again in the json.
// This is where any "magic residue" is removed, or otherwise manipulated,
// to produce the desired final json bytes...
void scrubMagicBytes( QByteArray &bytes )
{
    static const QByteArray EMPTY, QUOTE( "\"" ),
        NO_QUOTE_PREFIX( QUOTE + NO_QUOTE.toLocal8Bit() ),
        NO_QUOTE_SUFFIX( NO_QUOTE.toLocal8Bit() + QUOTE );
    bytes.replace( NO_QUOTE_PREFIX, EMPTY );
    bytes.replace( NO_QUOTE_SUFFIX, EMPTY );
}
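Putting the pieces together, typical use looks roughly like this (a sketch; the way opt is configured is an assumption based on the description above):

DataFormatOptions opt;
opt.decimalPrecision = 6;  // assumed settable member, per the description

QVariantMap data;
data["price"] = toMagicVar( QVariant( 0.15 ), opt );

QByteArray json = QJsonDocument( QJsonObject::fromVariantMap( data ) ).toJson();
scrubMagicBytes( json );   // quoted "0.15" becomes the bare number 0.15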

Data structure for quick access to glyph textures via char

I am attempting to create an edit box that allows users to input text. I've been working on this for some time now and have tossed around different ideas. Ultimately, the one I think would offer the best performance is to load all the characters from the .ttf (I'm using SDL to manage events, windows, text, and images for OpenGL) onto their own surfaces, and then render those surfaces onto textures one time. Then each frame, I can just bind the appropriate texture in the appropriate location.
However, now I'm wondering how to access these glyphs. My limited background suggests something like this:
struct CharTextures {
    char glyph;
    GLuint TextureID;
    int Width;
    int Height;
    CharTextures* Next;
};
//Code
CharTextures* FindGlyph(char Foo) {
    CharTextures* Poo = _FirstOne;
    while( Poo != NULL ) {
        if( Foo == Poo->glyph ) {
            return Poo;
        }
        Poo = Poo->Next;
    }
    return NULL;
}
I know that will work. However, it seems very wasteful to iterate the entire list each time. My scripting experience has taught me some Lua, which has tables that allow unordered indices of all sorts of types. How could I mimic that in C++ such that, instead of this iteration, I could do something like:
CharTextures* FindGlyph(char Foo) {
    return PooPointers[Foo]; //somehow use the character as a key to get the glyph pointer without iteration
}
I was thinking I could try converting to the numerical value, but I don't know how to convert a char to a UTF-8 value, or whether I could use those as keys. I could convert to ASCII, but would that handle all the characters I would want to be able to type? I am trying to get this application to run on Mac and Windows and am not sure about the machine specifics. I've read about the differences between the formats (ASCII vs. Unicode vs. UTF-8 vs. UTF-16, etc.). I understand it has to do with bit width and endianness, but I understand relatively little about the interface differences between platforms and the implications of said endianness on my code.
Thank you
What you probably want is
std::map<char, CharTextures*> PooPointers;
Using the array access operator will also perform a search in the map behind the scenes, but an optimized one.
What g-makulik has said is probably right; the map may be what you're after. To expand on that reply: maps are automatically sorted based on the key (char in this case), so lookups by character are extremely quick using
CharTextures* pCharTexture = PooPointers[c];
This gives you a sparse data structure where you don't have to predefine the texture for each character.
Note that running the code above where an entry doesn't exist will create a default entry in the map; see the sketch below for how to avoid that.
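To look up without inserting a default entry, use find() instead of operator[] (a sketch; c is the character being looked up):

auto it = PooPointers.find( c );
CharTextures* pCharTexture = ( it != PooPointers.end() ) ? it->second : nullptr;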
Depending on your general needs you could also use a simple vector if generalized sorting isn't important or if you know that you'll always have a fixed number of characters. You could fill the vector with predefined data for each possible character.
It all depends on your memory requirements.
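As a sketch of that fixed-table alternative: since char has at most 256 distinct values, a flat table gives O(1) lookup with no searching at all (std::array here; a std::vector of size 256 works the same way; names are illustrative):

#include <array>

std::array<CharTextures*, 256> glyphTable = {};   // value-initialized to nullptr

CharTextures* FindGlyph( char c )
{
    return glyphTable[ static_cast<unsigned char>(c) ];
}

// Registration at load time: glyphTable[static_cast<unsigned char>(g.glyph)] = &g;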

Can I use JsonCpp to partially-validate JSON input?

I'm using JsonCpp to parse JSON in C++.
e.g.
Json::Reader r;
std::stringstream ss;
ss << "{\"name\": \"sample\"}";
Json::Value v;
assert(r.parse(ss, v)); // OK
assert(v["name"] == "sample"); // OK
But my actual input is a whole stream of JSON messages, that may arrive in chunks of any size; all I can do is to get JsonCpp to try to parse my input, character by character, eating up full JSON messages as we discover them:
Json::Reader r;
std::string input = "{\"name\": \"sample\"}{\"name\": \"aardvark\"}";
for (size_t cursor = 1; cursor <= input.size(); cursor++) {
    std::stringstream ss;
    ss << input.substr(0, cursor);
    Json::Value v;
    if (r.parse(ss, v)) {
        std::cout << v["name"] << " ";
        input.erase(0, cursor);
        cursor = 0; // restart scanning on the remaining input
    }
} // Output: sample aardvark
This is already a bit nasty, but it does get worse. I also need to be able to resync when part of an input is missing (for any reason).
Now it doesn't have to be lossless, but I want to prevent an input such as the following from potentially breaking the parser forever:
{"name": "samp{"name": "aardvark"}
Passing this input to JsonCpp will fail, but that problem won't go away as we receive more characters into the buffer; that second name is simply invalid directly after the " that precedes it; the buffer can never be completed to present valid JSON.
However, if I could be told that the fragment certainly becomes invalid as of the n in the second "name", I could drop everything in the buffer up to that point, and then simply wait for the next { to consider the start of a new object, as a best-effort resync.
So, is there a way that I can ask JsonCpp to tell me whether an incomplete fragment of JSON has already guaranteed that the complete "object" will be syntactically invalid?
That is:
{"name": "sample"} Valid (Json::Reader::parse == true)
{"name": "sam Incomplete (Json::Reader::parse == false)
{"name": "sam"LOL Invalid (Json::Reader::parse == false)
I'd like to distinguish between the two fail states.
Can I use JsonCpp to achieve this, or am I going to have to write my own JSON "partial validator" by constructing a state machine that considers which characters are "valid" at each step through the input string? I'd rather not re-invent the wheel...
It certainly depends if you actually control the packets (and thus the producer), or not. If you do, the most simple way is to indicate the boundaries in a header:
+---+---+---+---+-----------------------
| 3 | 16|132|243|endofprevious"}{"name":...
+---+---+---+---+-----------------------
The header is simple:
- 3 indicates the number of boundaries
- 16, 132 and 243 indicate the position of each boundary, which correspond to the opening bracket of a new object (or list)
and then comes the buffer itself.
Upon receiving such a packet, the following entries can be parsed:
previous + current[0:16]
current[16:132]
current[132:243]
And current[243:] is saved for the next packet (though you can always attempt to parse it in case it's complete).
This way, the packets are auto-synchronizing, and there is no fuzzy detection, with all the failure cases it entails.
Note that there could be 0 boundaries in the packet. It simply implies that one object is big enough to span several packets, and you just need to accumulate for the moment.
I would recommend making the number representation "fixed" (for example, 4 bytes each) and settling on a byte order (that of your machine) to convert them to/from binary easily. I believe the overhead is fairly minimal (4 bytes + 4 bytes per entry, given that {"name":""} is already 11 bytes).
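A minimal sketch of building such a packet (assuming the 4-byte host-order integers suggested above; names are illustrative):

#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

std::string makePacket( const std::string& payload,
                        const std::vector<uint32_t>& boundaries )
{
    std::string pkt;
    auto put32 = [&pkt]( uint32_t v ) {
        char b[4];
        std::memcpy( b, &v, sizeof(v) );  // host byte order, as suggested
        pkt.append( b, 4 );
    };
    put32( static_cast<uint32_t>( boundaries.size() ) );  // boundary count
    for( uint32_t off : boundaries )
        put32( off );                                     // each boundary offset
    pkt += payload;
    return pkt;
}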
Iterating through the buffer character-by-character and manually checking for:
- the presence of alphabetic characters:
  - outside of a string (being careful that " can be escaped with \, though)
  - not part of null, true or false
  - not an e or E inside what looks like a numeric literal with exponent
- the presence of a digit outside of a string but immediately after a "
...is not all-encompassing, but I think it covers enough cases to fairly reliably break parsing at, or reasonably close to, the point of a message truncation. (A sketch of this scan follows the examples below.)
It correctly accepts:
{"name": "samL
{"name": "sam0
{"name": "sam", 0
{"name": true
as valid JSON fragments, but catches:
{"name": "sam"L
{"name": "sam"0
{"name": "sam"true
as being unacceptable.
Consequently, the following inputs will all result in the complete trailing object being parsed successfully:
1. {"name": "samp{"name": "aardvark"}
// ^ ^
// A B - B is point of failure.
// Stripping leading `{` and scanning for the first
// free `{` gets us to A. (*)
{"name": "aardvark"}
2. {"name": "samp{"0": "abc"}
// ^ ^
// A B - B is point of failure.
// Stripping and scanning gets us to A.
{"0": "abc"}
3. {"name":{ "samp{"0": "abc"}
// ^ ^ ^
// A B C - C is point of failure.
// Stripping and scanning gets us to A.
{ "samp{"0": "abc"}
// ^ ^
// B C - C is still point of failure.
// Stripping and scanning gets us to B.
{"0": "abc"}
My implementation passes some quite thorough unit tests. Still, I wonder whether the approach itself can be improved without exploding in complexity.
* Instead of looking for a leading "{", I actually have a sentinel string prepended to every message which makes the "stripping and scanning" part even more reliable.
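A minimal sketch of the scan described above (deliberately simplified: it implements only the "alphanumeric character immediately after a closing quote" checks, not the full null/true/false or exponent handling; the function name is mine). It returns the index at which the fragment becomes definitely invalid, or std::string::npos if the fragment could still be completed into valid JSON:

#include <cctype>
#include <string>

size_t firstDefiniteError( const std::string& frag )
{
    bool inString = false;
    bool afterClosingQuote = false;
    for( size_t i = 0; i < frag.size(); ++i )
    {
        char c = frag[i];
        if( inString )
        {
            if( c == '\\' ) { ++i; continue; }  // skip the escaped character
            if( c == '"' ) { inString = false; afterClosingQuote = true; }
            continue;
        }
        if( afterClosingQuote && std::isalnum( static_cast<unsigned char>(c) ) )
            return i;                           // e.g. {"name": "sam"LOL
        afterClosingQuote = false;
        if( c == '"' ) inString = true;
    }
    return std::string::npos;                   // incomplete, but not yet invalid
}

// On example 1 above, this returns the index of the n in the second "name" (B).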
Just look at expat or other streaming XML parsers; the logic of JsonCpp should be similar if it's not already. (Ask the developers of the library to improve stream reading if needed.)
In other words, and from my point of view:
- If some of your network (not JSON) packets are lost, that's not the JSON parser's problem. Use a more reliable protocol, or invent your own, and only then transfer JSON over it.
- If the JSON parser reports an error and that error happened on the last parsed token (no more data in the stream but more was expected), accumulate data and try again (this task should be done by the library itself). Sometimes it may not report an error, though: for example, when you transfer 123456 and only 123 is received. But this doesn't match your case, since you don't transfer primitive data as a single JSON packet.
- If the stream contains valid packets followed by semi-received packets, some callback should be called for each valid packet.
- If the JSON parser reports an error and it's really invalid JSON, the stream should be closed and opened again if necessary.

How can GetDiskFreeSpaceEx return the (seemingly) wrong amount of disk space?

So I work on a device that outputs large images (anywhere from 30MB to 2GB+). Before we begin creating one of these images we check to see if there is sufficient disk space via GetDiskFreeSpaceEx. Typically (and in this case) we are writing to a shared folder on the same network. There are no user quotas on disk space at play.
Last night, in preparation for a demo, we kicked off a test run. During the run we experienced a failure. We needed 327391776 bytes and were told that we only had 186580992 available. The numbers from GetDiskFreeSpaceEx were:
User free space available: 186580992
Total free space available: 186580992
Those correspond to the QuadPart fields of the two (output) arguments lpFreeBytesAvailable and lpTotalNumberOfFreeBytes to GetDiskFreeSpaceEx.
This code has been in use for years now and I have never seen a false negative. Here is the complete function:
long IsDiskSpaceAvailable( const char* inDirectory,
                           const __int64& inRequestedSize,
                           __int64& outUserFree,
                           __int64& outTotalFree,
                           __int64& outCalcRequest )
{
    ULARGE_INTEGER fba;
    ULARGE_INTEGER tnb;
    ULARGE_INTEGER tnfba;
    ULARGE_INTEGER reqsize;
    string dir;
    size_t len;

    dir = inDirectory;
    len = strlen( inDirectory );

    outUserFree = 0;
    outTotalFree = 0;
    outCalcRequest = 0;

    if( inDirectory[len-1] != '\\' )
        dir += "\\";

    // this is the value of inRequestedSize that was passed in
    // inRequestedSize = 3273917760;

    if( GetDiskFreeSpaceEx( dir.c_str(), &fba, &tnb, &tnfba ) )
    {
        outUserFree = fba.QuadPart;
        outTotalFree = tnfba.QuadPart;

        // this is computed dynamically given a specific compression
        // type, but for simplicity I have hard-coded the value that was used
        float compressionRatio = 10.0;

        reqsize.QuadPart = (ULONGLONG) (inRequestedSize / compressionRatio);
        outCalcRequest = reqsize.QuadPart;

        // this is what was triggered to cause the failure,
        // i.e., user free space was < the request size
        if( fba.QuadPart < reqsize.QuadPart )
            return( RetCode_OutOfSpace );
    }
    else
    {
        return( RetCode_Failure );
    }

    return( RetCode_OK );
}
So, a value of 3273917760 was passed to the function which is the total amount of disk space needed before compression. The function divides this by the compression ratio of 10 to get the actual size needed.
When I checked the disk that the share resides on it had ~177GB free, far more than what was reported. After starting the test again without changing anything it worked.
So my question here is; what could cause something like this? As far as I can tell it is not a programming error and, as I mentioned earlier, this code has been in use for a very long time now with no problems.
I checked the event log of the remote machine and found nothing of interest around the time of the failure. I'm hoping that someone out there has seen something similar before, thanks in advance.
Might not be of any use, but it's "strange" that:
177GB ~= 186580992 * 1000.
This could be explained by a stack corruption (since you don't initialize your local variables) happening elsewhere in the code.
The code "inRequestedSize / compressionRatio" doesn't have to be using float for the division, and since you've silented the "conversion loose precision" warning with the cast, you might actually hit an error too (but the number given in the example should work). You could simply do "inRequestedSize / 10".
Last but not least, you don't say where the code is running. On Windows Mobile, the documentation of GetDiskFreeSpaceEx states:
When Mobile Encryption is enabled, the reporting behavior of this function changes. Each encrypted file has at least one 4-KB page of overhead associated. This function takes this overhead into account when it reports the amount of space available. That is, if a 128-KB disk contains a single 60-KB file, this function reports that 64 KB is available, subtracting the space occupied by both the file and its associated overhead.
Although this function reports the total available space, keep the space requirement for encrypted files in mind when estimating whether multiple new files will fit into the remaining space. Include the amount of space required for overhead when Mobile Encryption is enabled. Each file requires at least an additional 4 KB. For example, a single 60-KB file requires 64 KB, but two 30-KB files actually require 68 KB.

String issue with assert on erase

I am developing a program in C++ using the string container (std::string) to store network data from a socket (this part is peachy). I receive the data in frames of at most 1452 bytes at a time; the protocol uses a header that contains the length of the data portion of the packet, and the header has a fixed 20-byte length. My problem is that a string is giving me an unknown debug assertion: it asserts, but I get NO message about the string. Considering I can receive more than a single packet in a frame at any time, I place all received data into the string, reinterpret_cast it to my data struct, calculate the total length of the packet, then copy the data portion of the packet into a string for regex processing. At this point I do a string erase, as in mybuff.Erase(totalPackLen); <~ THIS is what's calling the assert, but totalPackLen is less than the string's size.
Is there some convention I am missing here? Or is it that the std::string really is an inappropriate choice here? Ty.
Fixed it on my own. Rolled my own VERY simple buffer with a few C calls :)
int ret = recv(socket, m_buff, sizeof(m_buff), 0);
if(ret > 0)
{
    BigBuff.append(m_buff, ret);
    while(BigBuff.size() > 16){
        Header *hdr = reinterpret_cast<Header*>(&BigBuff[0]);
        if(ntohs(hdr->PackLen) <= BigBuff.size() - 20){
            hdr->PackLen = ntohs(hdr->PackLen);
            string lData;
            lData.append(BigBuff.begin() + 20, BigBuff.begin() + 20 + hdr->PackLen);
            Parse(lData); //regex parsing helper function
            BigBuff.erase(hdr->PackLen + 20); //assert here when packlen is 235 and string len is 1458;
        }
    }
}
From the code snippet you provided, it appears that your packet comprises a fixed-length binary header followed by a variable-length ASCII string as a payload. Your first mistake is here:
BigBuff.append(m_buff,ret);
There are at least two problems here:
1. Why the append? You presumably have dispatched with any previous messages. You should be starting with a clean slate.
2. Mixing binary and string data can work, but more often than not it doesn't. It is usually better to keep the binary and ASCII data separate. Don't use std::string for non-string data.
Append adds data to the end of the string. The very next statement after the append is a test for a length of 16, which says to me that you should have started fresh. In the same vein you do that reinterpret cast from BigBuff[0]:
Header *hdr = reinterpret_cast<Header*>(&BigBuff[0]);
Because of your use of append, you are perpetually dealing with the header from the first packet received rather than the current packet. Finally, there's that erase:
BigBuff.erase(hdr->PackLen + 20);
Many problems here:
- If the packet length and the return value from recv are consistent the very first call will do nothing (the erase is at but not past the end of the string).
- There is something very wrong if the packet length and the return value from recv are not consistent. It might mean, for example, that multiple physical frames are needed to form a single logical frame, and that in turn means you need to go back to square one.
- Suppose the physical and logical frames are one and the same, you're still going about this all wrong. As noted, the first time around you are erasing exactly nothing. That append at the start of the loop is exactly what you don't want to do.
Serialization oftentimes is a low-level concept and is best treated as such.
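To make the criticism concrete, here is a minimal sketch of the consume-from-front pattern being advocated. The Header layout, the Parse helper, and all names are assumptions based on the question, not code from it:

#include <cstdint>
#include <cstring>
#include <string>
#include <vector>
#include <winsock2.h>  // recv, ntohs, SOCKET (Windows assumed, as in the question)

void Parse( const std::string& data );  // regex helper from the question

struct Header {            // assumed layout: 20 bytes total, length first
    uint16_t PackLen;      // payload length, network byte order on the wire
    char     reserved[18];
};

void PumpSocket( SOCKET sock, std::vector<char>& buff )
{
    char chunk[1452];      // maximum frame size from the question
    int ret = recv( sock, chunk, sizeof(chunk), 0 );
    if( ret <= 0 )
        return;
    buff.insert( buff.end(), chunk, chunk + ret );

    // Consume every complete packet sitting at the front of the buffer.
    while( buff.size() >= sizeof(Header) )
    {
        Header hdr;
        std::memcpy( &hdr, buff.data(), sizeof(hdr) ); // avoids reinterpret_cast aliasing
        size_t packLen = ntohs( hdr.PackLen );         // convert a copy, not in place
        size_t total = sizeof(Header) + packLen;
        if( buff.size() < total )
            break;                                     // wait for the rest of the packet
        std::string payload( buff.data() + sizeof(Header), packLen );
        Parse( payload );
        buff.erase( buff.begin(), buff.begin() + total ); // drop only the consumed bytes
    }
}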
Your comment doesn't make sense:
BigBuff.erase(hdr->PackLen + 20); //assert here when packlen is 235 and string len is 1458;
BigBuff.erase(hdr->PackLen + 20) will erase from hdr->PackLen + 20 onwards to the end of the string. From the description of the code, it seems to me that you're erasing beyond the end of the content data. Here's the reference for std::string::erase() for you.
Needless to say, std::string is entirely inappropriate here; it should be std::vector.