I'm dealing with the problem of reading a 64-bit unsigned integer (unsigned long long) from a string. My code should work with both GCC 4.3 and Visual Studio 2010.
I read this question and its answers on the topic: Read 64 bit integer string from file, and thought that strtoull would do the job just fine, and more efficiently than a std::stringstream. Unfortunately, strtoull is not available in Visual Studio's stdlib.h.
So I wrote a short templated function:
#include <sstream>
#include <string>

template <typename T>
T ToNumber(const std::string& Str)
{
    T Number;
    std::stringstream S(Str);
    S >> Number;
    return Number;
}
unsigned long long N = ToNumber<unsigned long long>("1234567890123456789");
I'm worried about the efficiency of this solution, so: is there a better option in this scenario?
See http://social.msdn.microsoft.com/Forums/en-US/vclanguage/thread/d69a6afe-6558-4913-afb0-616f00805229/
"It's called _strtoui64(), _wcstoui64() and _tcstoui64(). Plus the _l versions for custom locales.
Hans Passant."
By the way, when Googling things like this, note that Google tends to assume you're wrong (much like newer versions of Visual Studio) and silently searches for something else instead, so be sure to click the link to search for what you actually typed.
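As a concrete illustration (not from the linked thread), here is a minimal sketch of a wrapper that picks the right function per compiler; the name ParseU64 is made up for this example, and error handling is omitted:
#include <cstdlib>
#include <string>

// Hypothetical helper: _strtoui64 on MSVC, strtoull (C99, exposed by GCC) elsewhere.
unsigned long long ParseU64(const std::string& s)
{
#ifdef _MSC_VER
    return _strtoui64(s.c_str(), NULL, 10);
#else
    return strtoull(s.c_str(), NULL, 10);
#endif
}

// Usage:
// unsigned long long N = ParseU64("1234567890123456789");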
Of course you can easily enough write your own function to handle simple decimal strings. The standard functions handle various alternatives according to numeric base and locale, which makes them slower in any case.
Yes, stringstream will add a heap allocation on top of all that. No, performance really doesn't matter until you can tell the difference.
There is a faster option: use the deprecated std::strstream class, which does not own its buffer (and hence does not make a copy or perform an allocation). I wouldn't call that "better", though.
You could parse the string 9 digits at a time starting from the rear, multiplying each group by 1,000,000,000^i, i.e. (last 9 digits * 1) + (next 9 digits * 1 billion) + ..., or
(1000000000000000000)+(234567890000000000)+(123456789)
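If you go the hand-rolled route suggested above, the digit-by-digit baseline is tiny; a minimal sketch (assumes plain decimal input, no validation or overflow checking):
#include <string>

// Minimal sketch: parses an unsigned decimal string; every character is assumed to be a digit.
unsigned long long ToU64(const std::string& s)
{
    unsigned long long n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        n = n * 10 + (s[i] - '0');
    return n;
}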
Related
So I was recently upgrading an old C++ project that was built using the Visual Studio 2012 - Windows XP (v110_xp) platform toolset. In this project's code there are some very precise double calculations that require up to 20 digits of precision. These doubles were then converted to strings and printed using the printf APIs. Here is an example of something that would happen in this project:
double testVal = 123.456789;
// do some calculations on testVal
char str[100] = { 0 };
sprintf(str, "%.20le", testVal);
After this operation str = "1.23456789000...000e+02", which is what is expected.
However, once I update the project to be compatible with Visual Studio 2019, using the Visual Studio 2019 (v142) platform toolset with C++17, the above-mentioned code produces different output for str.
After the call to sprintf to format the value to a string, str = "1.23456789000...556e+02". This problem isn't localized to this one value; there are even more egregious cases. For example, one of the starting values, "2234332.434322", gets formatted by sprintf as "2.23433324343219995499e+07".
From all the documentation I've read, the "l" length modifier should be the correct one for converting long doubles to a string. This behavior feels like a textbook float-to-double conversion, though.
I tried setting the project's floating-point model build argument to precise, strict, and then fast to see if any of these options would help, but none of them has an effect on the problem.
Does anyone know why this is happening?
Use the brand new Ryu (https://github.com/ulfjack/ryu) or Grisu-Exact (https://github.com/jk-jeon/Grisu-Exact) instead which are much faster than sprintf and guaranteed to be roundtrip-correct (and more), or the good old Double-Conversion (https://github.com/google/double-conversion) which is slower than the other two but has the same guarantees, still much faster than sprintf, and is battle-tested.
(Disclaimer: I'm the author of Grisu-Exact.)
I'm not sure if you really need to print exactly 20 decimal digits; I personally have rarely had occasions where the exact number of digits mattered. If the sole purpose of having 20 digits is just to not lose any precision, then the above-mentioned libraries will definitely give you better (and shorter) results. If the number of digits must be precisely 20 for some reason, then, well, Ryu still provides such a feature (it's called Ryu-printf), which again has the round-trip guarantee and is much faster than sprintf.
EDIT
To elaborate more on the last sentence, note that in general it is impossible to have the roundtrip guarantee if the number of digits is fixed, because, well, if that fixed number is too small, say, 3, then there is no way to distinguish 0.123 and 0.1234. However, 20 is big enough so that the best approximation of the true value (which is what Ryu-printf produces) is always guaranteed to be roundtrip-correct.
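As an aside (not from the answer above): since the project is already building with VS 2019 and C++17, the standard <charconv> header also provides shortest round-trip formatting without an external library. A minimal sketch using the question's test value:
#include <charconv>
#include <cstdio>

int main()
{
    double testVal = 123.456789;
    char buf[64];
    // Shortest representation that parses back to exactly the same double.
    std::to_chars_result res = std::to_chars(buf, buf + sizeof buf, testVal);
    *res.ptr = '\0';
    std::printf("%s\n", buf);   // e.g. "123.456789"
}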
Assume that you're working on an x86 32-bit system. Your task is to implement strlen as fast as possible.
There are two problems you have to take care of:
1. address alignment.
2. reading memory a machine word (4 bytes) at a time.
It's not hard to find the first aligned address in the given string.
Then we can read memory 4 bytes at a time and add to the total length, but we should stop as soon as one of those 4 bytes is zero, and count only the bytes before the zero byte. To check for a zero byte quickly, there is a code snippet from glibc:
unsigned long int longword, himagic, lomagic;
himagic = 0x80808080L;
lomagic = 0x01010101L;
// True if at least one of the 4 bytes in longword is zero.
if (((longword - lomagic) & ~longword & himagic) != 0) {
    // handle the bytes up to the zero byte...
}
I used it in Visual C++ to compare against the CRT's implementation. The CRT's is much faster than the one above.
I'm not familiar with the CRT's implementation; did they use a faster way to check for the zero byte?
You could save the length of the string along with the string when creating it, as is done in Pascal.
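A minimal sketch of that idea (the names and the fixed capacity are made up for illustration): carry the length with the data so "strlen" becomes a field read.
#include <cstring>

// Hypothetical Pascal-style string: the length is stored alongside the bytes.
struct PString {
    std::size_t len;
    char        data[256];   // fixed capacity, just for the sketch
};

void ps_set(PString& p, const char* s)
{
    p.len = std::strlen(s);            // pay for strlen once, at creation
    std::memcpy(p.data, s, p.len + 1); // assumes the string fits in data[]
}

std::size_t ps_len(const PString& p)
{
    return p.len;                      // O(1) "strlen"
}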
First, the CRT's one is written directly in assembler. You can see its source code here: C:\Program Files\Microsoft Visual Studio 9.0\VC\crt\src\intel\strlen.asm (this is for VS 2008).
It depends. Microsoft's library really has two different versions of strlen. One is a portable version in C that's about the most trivial version of strlen possible, pretty close (and probably equivalent) to:
size_t strlen(char const *str) {
    char const *pos = str;
    for (; *pos; ++pos)
        ;
    return pos - str;
}
The other is in assembly language (used only for Intel x86), and quite similar to what you have above, at least as far as loading 4 bytes, checking whether one of them is zero, and reacting appropriately. The only obvious difference is that instead of subtracting, they basically pre-negate the bytes and add. I.e., instead of word - 0x01010101, they use word + 0x7efefeff.
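For completeness, here is a minimal sketch of a full word-at-a-time strlen built around the zero-byte test from the question (32-bit words, with a byte-wise prologue to reach alignment). This is an illustration of the technique, not the CRT's or glibc's actual code:
#include <cstddef>
#include <cstdint>

std::size_t strlen_word(const char* str)
{
    const char* p = str;

    // Prologue: walk byte-wise until the pointer is 4-byte aligned (or we hit the terminator).
    while (reinterpret_cast<std::uintptr_t>(p) % 4 != 0) {
        if (*p == '\0')
            return static_cast<std::size_t>(p - str);
        ++p;
    }

    const std::uint32_t himagic = 0x80808080u;
    const std::uint32_t lomagic = 0x01010101u;
    const std::uint32_t* w = reinterpret_cast<const std::uint32_t*>(p);

    for (;;) {
        std::uint32_t word = *w++;
        // Nonzero iff at least one byte in 'word' is zero (the glibc trick).
        if (((word - lomagic) & ~word & himagic) != 0) {
            const char* bytes = reinterpret_cast<const char*>(w - 1);
            std::size_t i = 0;
            while (bytes[i] != '\0')
                ++i;
            return static_cast<std::size_t>(bytes - str) + i;
        }
    }
}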
There are also compiler-intrinsic versions which use the REPNE SCAS instruction pair; though these are generally found on older compilers, they can still be pretty fast. There are also SSE2 versions of strlen, such as the implementation in Agner Fog's performance library, or something such as this.
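Not Agner Fog's code, just a minimal SSE2 sketch of the approach those versions take: compare 16 bytes at a time against zero and use the byte mask to locate the terminator. It aligns the first load down to 16 bytes, so match bits for bytes before the string are shifted away:
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>   // SSE2 intrinsics

std::size_t strlen_sse2(const char* str)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(str);
    const char* block = reinterpret_cast<const char*>(addr & ~std::uintptr_t(15));
    const __m128i zero = _mm_setzero_si128();

    // First (aligned) block: discard match bits for bytes that precede str.
    __m128i chunk = _mm_load_si128(reinterpret_cast<const __m128i*>(block));
    unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero)));
    mask >>= (addr & 15);
    if (mask != 0) {
        std::size_t off = 0;
        while (!(mask & 1)) { mask >>= 1; ++off; }   // portable "count trailing zeros"
        return off;
    }

    // Remaining blocks, 16 bytes per iteration.
    for (;;) {
        block += 16;
        chunk = _mm_load_si128(reinterpret_cast<const __m128i*>(block));
        mask = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero)));
        if (mask != 0) {
            std::size_t off = 0;
            while (!(mask & 1)) { mask >>= 1; ++off; }
            return static_cast<std::size_t>(block - str) + off;
        }
    }
}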
Remove those 'L' suffixes and see... You are promoting all the calculations to long! On my 32-bit tests, that alone doubles the cost.
I also make two micro-optimizations:
Since most strings we scan consist of ASCII chars in the range 0-127, the high bit is (almost) never set, so only check for it in a second test.
Increment an index rather than a pointer, which is cheaper on some architectures (notably x86) and gives you the length for 'free'...
uint32_t gatopeich_strlen32(const char* str)
{
    uint32_t *u32 = (uint32_t*)str, u, abcd, i=0;
    while(1)
    {
        u = u32[i++];
        abcd = (u-0x01010101) & 0x80808080;
        if (abcd && // If abcd is not 0, we have NUL or a non-ASCII char > 127...
            (abcd &= ~u)) // ... Discard non-ASCII chars
        {
#if BYTE_ORDER == BIG_ENDIAN
            return 4*i - (abcd&0xffff0000 ? (abcd&0xff000000?4:3) : abcd&0xff00?2:1);
#else
            return 4*i - (abcd&0xffff ? (abcd&0xff?4:3) : abcd&0xff0000?2:1);
#endif
        }
    }
}
Assuming you know the maximum possible length, and you've initialized the memory to \0 before use, you could do a binary split and go left or right depending on the value (\0: split left; otherwise split right). That way you'd dramatically decrease the number of checks needed to find the length. Not optimal (it requires some setup), but it should be really fast.
// Eric
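A minimal sketch of that idea (maxlen and the zero-filled buffer are the assumptions stated above): since every byte at an index >= length is \0 and every byte before it is non-zero, the length is the index of the first zero byte, which a binary search finds in O(log maxlen) reads.
#include <cstddef>

// Sketch: requires that buf was zero-filled before the string was written,
// so all non-zero bytes come before all zero bytes.
std::size_t strlen_binary(const char* buf, std::size_t maxlen)
{
    std::size_t lo = 0, hi = maxlen;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (buf[mid] == '\0')
            hi = mid;        // terminator is at mid or earlier
        else
            lo = mid + 1;    // still inside the string
    }
    return lo;               // index of the first '\0' == length
}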
Obviously, crafting a tight loop like this in assembler would be fastest; however, if you want or need to keep it more human-readable and/or portable in C(++), you can still increase the speed of the standard function by using the register keyword.
The register keyword prompts the compiler to store the counter in a CPU register instead of in memory, which can significantly speed up the loop.
Note, however, that the register keyword is only a suggestion, and the compiler is free to ignore it if it thinks it can do better, especially when certain optimization options are used. That said, while it is almost certain to be ignored for a local class variable in a triple for-loop, it is likely to be honored for the code below, thus improving performance quite a bit (nearly on par with the assembler version):
size_t strlen(const char* s) {
    register const char* i = s;
    for (; *i; ++i)
        ;
    return i - s;
}
I'm porting one of my C++ libraries to a somewhat wonky compiler -- it doesn't support stringstreams, or C99 features like snprintf(). I need to format int, float, etc. values as char*, and the only options available seem to be 1) use sprintf(), or 2) hand-roll formatting procedures.
Given this, how do I determine (at either compile time or run time) how many bytes are required for a formatted floating-point value? My library might be used for fuzz testing, so it needs to handle even unusual or extreme values.
Alternatively, is there a small (100-200 lines preferred), portable implementation of snprintf() available that I could simply bundle with my library?
Ideally, I would end up with either normal snprintf()-based code, or something like this:
static const size_t FLOAT_BUFFER_SIZE = /* calculate max buffer somehow */;
char *fmt_double(double x)
{
char *buf = new char[FLOAT_BUFFER_SIZE + 1];
sprintf(buf, "%f", x);
return buf;
}
Related questions:
Maximum sprintf() buffer size for integers
Maximum sprintf() buffer size for %g-formatted floats
Does the compiler support any of ecvt, fcvt or gcvt? They are a bit freakish, and hard to use, but they have their own buffer (ecvt, fcvt) and/or you may get lucky and find the system headers have, as in VC++, a definition of the maximum number of chars gcvt will produce. And you can take it from there.
Failing that, I'd consider the following quite acceptable, along the lines of the code provided. 500 chars is pretty conservative for a double; valid values are roughly 10^-308 to 10^308, so even if the implementation is determined to be annoying by printing out all the digits there should be no overflow.
char *fmt_double(double d) {
static char buf[500];
sprintf(buf,"%f",d);
assert(buf[sizeof buf-1]==0);//if this fails, increase buffer size!
return strdup(buf);
}
This doesn't exactly provide any amazing guarantees, but it should be pretty safe(tm). I think that's as good as it gets with this sort of approach, unfortunately. But if you're in the habit of regularly running debug builds, you should at least get early warning of any problems...
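If you'd rather derive the constant than guess 500, here is a rough sketch of a conservative compile-time bound for "%f", assuming the default precision of 6: a sign, up to DBL_MAX_10_EXP + 1 integer digits, the decimal point, 6 fraction digits, and the terminating NUL.
#include <cfloat>
#include <cstddef>
#include <cstdio>

// Conservative upper bound on sprintf(buf, "%f", x) for any finite double.
static const std::size_t FLOAT_BUFFER_SIZE = 1 + (DBL_MAX_10_EXP + 1) + 1 + 6 + 1;

char* fmt_double(double x)
{
    char* buf = new char[FLOAT_BUFFER_SIZE];
    std::sprintf(buf, "%f", x);
    return buf;
}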
I think GNU Libiberty is what you want. You can just include its implementation of snprintf.
vasprintf.c - 152 LOC.
I'm working on a GUI framework where I want all the elements to be identified by ASCII strings of up to 8 characters (7 would also be OK).
Every time an event is triggered (some are just clicks, but some are continuous), the framework calls back into the client code with the id and its value.
I could use actual strings and strcmp(), but I want this to be really fast (for mobile devices), so I was thinking of using character constants (e.g. int id = 'BTN1';) so that testing for an id is a single int comparison. However, 4 chars isn't readable enough.
I tried an experiment, something like-
long int id = L'abcdefg';
... but it looks as if character constants can only hold 4 characters, and the only thing a long character constant gives you is the ability for your 4 characters to be twice as wide, not to have twice as many characters. Am I missing something here?
I want to make it easy for the person writing the client code. The GUI is stored in XML, so the ids are loaded in from strings, but there would be constants written in the client code to compare them against.
So, the long and the short of it is: I'm looking for a cross-platform way to do a quick 7-8 character comparison. Any ideas?
Are you sure this is not premature optimisation? Have you profiled another GUI framework that is slow purely because of string comparisons? Why are you so sure string comparisons will be too slow? Surely you're not doing that many string compares. Also, consider that strcmp should have a near-optimal implementation, possibly written in assembly tailored to the CPU you're compiling for.
Anyway, other frameworks just use named integers, for example:
static const int MY_BUTTON_ID = 1;
You could consider that instead, avoiding the string issue completely. Alternatively, you could simply write a helper function to convert a const char[9] into a 64-bit integer. This would accept a null-terminated string "like so" of up to 8 characters (assuming you intend to throw away the null character). Then your program passes around 64-bit integers, but the programmer deals with strings.
Edit: here's a quick function that turns a string into a number:
#include <string.h>

__int64 makeid(const char* str)
{
    __int64 ret = 0;
    // Copies up to 8 bytes; shorter strings are zero-padded by strncpy.
    strncpy((char*)&ret, str, sizeof(__int64));
    return ret;
}
One possibility is to define your IDs as a union of a 64-bit integer and an 8-character string:
union ID {
    Int64 id;     // Assuming Int64 is an appropriate typedef somewhere
    char name[8];
};
Now you can do things like:
ID id;
strncpy(id.name, "Button1", 8);
if (anotherId.id == id.id) ...
The concept of string interning can be useful for this problem, turning string compares into pointer compares.
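A minimal sketch of interning (the helper name intern is made up for this example): every distinct string is stored exactly once, so two interned ids are equal exactly when their pointers are equal.
#include <set>
#include <string>

// Returns a canonical pointer for each distinct string.
// std::set nodes are stable, so the returned pointer stays valid.
const char* intern(const std::string& s)
{
    static std::set<std::string> pool;
    return pool.insert(s).first->c_str();
}

// Usage: comparing interned ids is a single pointer compare.
// const char* a = intern("BUTTON_1");
// const char* b = intern(std::string("BUTTON") + "_1");
// bool same = (a == b);   // true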
Easy to get pre-rolled Components
binary search tree for the win -- you get a red-black tree from most STL implementations of set and map, so you might want to consider that.
Intrusive versions of the STL containers perform MUCH better when you move the container nodes around a lot (in the general case) -- however they have quite a few caveats.
Specific Opinion -- First Alternative
If I were you, I'd stick to a 64-bit integer type, bundle it in an intrusive container, and use the library provided by Boost. However, if you are new to this sort of thing, then use std::map; it is conceptually simpler to grasp, and there is less chance of leaking resources, since there is more literature and there are more guides out there for these types of containers and their best practices.
Alternative 2
The problem you are trying to solve, I believe, is to have a global naming scheme which maps names to handles. You can create a mapping of names to handles so that you can use the names to retrieve handles:
// WidgetHandle is a polymorphic base class (i.e., it has a virtual method),
// and foo::Luv implements WidgetHandle's interface (public inheritance)
foo::WidgetHandle * LuvComponent =
Factory.CreateComponent<foo::Luv>( "meLuvYouLongTime");
....
.... // in different function
foo::WidgetHandle * LuvComponent =
Factory.RetrieveComponent<foo::Luv>("meLuvYouLongTime");
Alternative 2 is a common idiom for IPC: you create an IPC object, say a pipe, in one process, and you can ask the kernel to retrieve the other end of the pipe by name.
I see a distinction between easily read identifiers in your code, and the representation being passed around.
Could you use an enumerated type (or a large header file of constants) to represent the identifier? The names of the enumerated types could then be as long and meaningful as you wish, and still fit in (I am guessing) a couple of bytes.
In C++0x, you'll be able to use user-defined string literals, so you could add something like 7chars..id or "7chars.."id:
template <char...> constexpr unsigned long long operator ""id();
constexpr unsigned long long operator ""id(const char *, size_t);
Although I'm not sure you can use constexpr for the second one.
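For what it's worth, the second form did end up working with constexpr once C++11 was finalized. A sketch (note that the suffix needs a leading underscore in the final standard, and the recursion is only there because C++11 constexpr functions can't contain loops):
#include <cstddef>

// Packs up to 8 chars of the literal into a 64-bit id.
constexpr unsigned long long pack_id(const char* s, std::size_t n, std::size_t i = 0)
{
    return (i == n || i == 8)
        ? 0ULL
        : (static_cast<unsigned long long>(static_cast<unsigned char>(s[i])) << (8 * i))
          | pack_id(s, n, i + 1);
}

constexpr unsigned long long operator"" _id(const char* s, std::size_t n)
{
    return pack_id(s, n);
}

// Usage: compile-time 64-bit constants from readable names.
// constexpr unsigned long long BTN = "button01"_id;
// if (event_id == BTN) { /* ... */ }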