Optimize non-const variable access - c++

There's an interesting optimization problem I'm facing.
In a large code base, consisting of a large number of classes, in many places the value of a non-constant global (=file scope) variable is very often used/examined and the unnecessary memory accesses of this variable are to be avoided.
This variable is initialized once, but because of the complexity of its initialization and the need to call a number of functions, it cannot be initialized like this, before execution of main():
unsigned size = 1000;
int main()
{
// some code
}
or
unsigned size = CalculateSize();
int main()
{
// some code
}
Instead it has to be initialized like this:
unsigned size;
int main()
{
// some code
size = CalculateSize();
// lots of code (statically/dynamically created class objects, whatnot)
// that makes use of "size"
return 0;
}
Just because size isn't a constant and it is global (=file scope) and the code is large and complex, the compiler is unable to infer that size never changes after size = CalculateSize();. The compiler generates code that fetches and refetches the value of size from the variable and can't "cache" it in a register or in a local (on-stack) variable that's likely to be in the CPU's d-cache together with other frequently accessed local variables.
So, if I have something like the following (a made-up example for illustrative purposes):
size = CalculateSize();
if (size > 200) blah1();
blah2();
if (size > 200) blah3();
The compiler thinks that blah1() and blah2() may change size and it generates a memory read from size in if (size > 200) blah3();.
I'd like to avoid that extra read whenever and wherever possible.
Obviously, hacks like this:
const unsigned size = 0;
int main()
{
// some code
*(unsigned*)&size = CalculateSize();
// lots more code
}
won't do as they invoke undefined behavior.
The question is how to inform the compiler that it can "cache" the value of size once size = CalculateSize(); has been performed and do it without invoking undefined behavior, unspecified behavior and, hopefully, implementation-specific behavior.
This is needed for C++03 and g++ (4.x.x). C++11 may or may not be an option, I'm not sure, I'm trying to avoid using advanced/modern C++ features to stay within the coding guidelines and predefined toolset.
So far I've only come up with a hack to create a constant copy of size within every class that's using it and use the copy, something like this (decltype makes it C++11, but we can do without decltype):
#include <iostream>
using namespace std;
volatile unsigned initValue = 255;
unsigned size;
#define CACHE_VAL(name) \
const struct CachedVal ## name \
{ \
CachedVal ## name() { this->val = ::name; } \
decltype(::name) val; \
} _CachedVal ## name;
#define CACHED(name) \
_CachedVal ## name . val
class C
{
public:
C() { cout << CACHED(size) << endl; }
CACHE_VAL(size);
};
int main()
{
size = initValue;
C c;
return 0;
}
The above may only help up to a point. Are there better and more suggestive-to-the-compiler alternatives that are legal C++? Hoping for a minimally intrusive (source-code-wise) solution.
UPDATE: To make it a bit clearer, this is in a performance-sensitive application. I'm not trying to get rid of unnecessary reads of that particular variable on a whim; I'm trying to let (or make) the compiler produce more optimal code. Any solution that involves reading or writing another variable as often as size, and any additional code (especially branching and conditional branching) executed as often as size is referred to, is also going to affect performance. I don't want to win in one place only to lose the same or even more in another place.
Here's a related non-solution, causing UB (at least in C).

There's the register keyword in C++, which tells the compiler you plan on using a variable a lot. I don't know about the compiler you're using, but most modern compilers do that for the user, keeping a variable in a register if needed. You can also declare the variable as constant and initialize it using const_cast.

what of:
const unsigned getSize( void )
{
static const unsigned size = calculateSize();
return size;
}
This will delay the initialization of size until the first call to getSize(), but still keep it const.
GCC 4.8.2

#include <iostream>
unsigned calculate() {
std::cout<<"calculate()\n";
return 42;
}
const unsigned mySize() {
std::cout<<"mySize()\n";
static const unsigned someSize = calculate();
return someSize;
}
int main() {
std::cout<<"main()\n";
mySize();
}
prints:
main()
mySize()
calculate()
on GCC 4.8.0
Checking for whether it has been initialized already or not will be almost fully mitigated by the branch predictor. You will end up having one false and a quadrillion trues afterwards.
Yes, you will still have to access that state after the pipeline has been basically built, potentially wreaking havoc in the caches, but you can't be sure unless you profile.
Also, the compiler can likely do some extra magic for you (and that is what you're looking for), so I suggest you first compile and profile with this approach before discarding it entirely.
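A fully standard way to get the caching asked for in the question, at least within one function, is to copy the global into a local const at the top of the hot function. Because no callee can modify a local whose address is never taken, the compiler may keep it in a register across the calls to blah1() and blah2(). The sketch below uses the question's made-up example. The function bodies, the `calls` counters, and the name `hotPath` are hypothetical and exist only so the sketch can run.

```cpp
#include <cassert>

unsigned size;              // the file-scope variable from the question
int calls[3];               // hypothetical counters so the sketch can be checked

unsigned CalculateSize() { return 250; }  // stand-in for the real initializer
void blah1() { ++calls[0]; }              // opaque calls in the real code base
void blah2() { ++calls[1]; }
void blah3() { ++calls[2]; }

// Read the global once. Both tests use the local copy, which no callee can
// modify, so the compiler does not need to reload it after blah1()/blah2().
void hotPath()
{
    const unsigned localSize = size;
    if (localSize > 200) blah1();
    blah2();
    if (localSize > 200) blah3();
}
```

The trade-off is that each hot function needs its own copy, and the copy must be taken after size = CalculateSize(); has run.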

Related

How to let the compiler do the offset computations for an odd polymorphism structure, with as little code as possible?

I am not sure if this is possible at all in standard C++, so whether it even is possible to do, could be a secondary way to put my question.
I have this binary data, which I want to read and re-create using structs. The data was originally created as a stream, with the content appended to a buffer one field at a time; nothing special about that. I could simply read it as a stream, the same way it was written. Instead, I wanted to see whether it was possible to let the compiler do the math for me by describing the binary data as a data structure.
The fields of the binary data have a predictable order which allows it to be represented as a data type, the issue I am having is with the depth and variable length of repeating fields. I am hoping the example code below makes it clearer.
Simple Example
struct Common {
int length;
};
struct Boo {
long member0;
char member1;
};
struct FooSimple : Common {
int count;
Boo boo_list[];
};
char buffer[1024];
int index = 15;
((FooSimple *)buffer)->boo_list[index].member0;
Advanced Example
struct Common {
int length;
};
struct Boo {
long member0;
char member1;
};
struct Goo {
int count;
Boo boo_list[];
};
struct FooAdvanced : Common {
int count;
Goo goo_list[];
};
char buffer[1024];
int index0 = 5, index1 = 15;
((FooAdvanced *)buffer)->goo_list[index0].boo_list[index1].member0;
The examples are not supposed to relate. I re-used some code due to lack of creativity for unique names.
For the simple example, there is nothing unusual about it. The Boo struct is of fixed size, therefore the compiler can do the calculations just fine, to reach the member0 field.
For the advanced example, as far as I can tell at least, it isn't as trivial a case. The problem I see is that if I use the array subscript operator to select a Goo object from the inline array of Goo elements (goo_list), the compiler will not be able to do the offset calculations properly unless it makes some assumptions, possibly assuming that all preceding Goo elements in the array have zero Boo elements in their inline array (boo_list), or some other constant number. Naturally, that won't be the case.
Question(s):
What ways are there to achieve the offset computations to be done by the compiler, despite the inline arrays having variable lengths? Unless I am missing something, I believe templates can't help at all, due to their compile-time nature.
Is this even possible to achieve in C++?
How do you handle instantiating a FooAdvanced object by feeding a variable number of Goo and Boo elements to the goo_list and boo_list members, respectively?
If it is impossible, would I have to write some sort of wrapper code to handle the calculations instead?
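The offsets depend on counts that are only known at run time, so the compiler cannot derive them from a struct definition. It can still generate the arithmetic inside a small accessor. The sketch below shows such wrapper code under an assumed packed layout. The field widths, native byte order, and the helper names `readAt`, `gooOffset`, and `member0` are all hypothetical and do not come from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wire layout, assumed packed and in native byte order:
//   FooAdvanced: int32 length, int32 goo_count, then goo_count Goo records
//   Goo:         int32 boo_count, then boo_count Boo records
//   Boo:         int64 member0, char member1   (9 bytes, no padding)
const std::size_t kBooSize = sizeof(std::int64_t) + 1;

// Copy a field out with memcpy, which avoids the alignment and aliasing
// problems that casting the buffer to a struct pointer would have.
template <typename T>
T readAt(const char* buf, std::size_t offset)
{
    T value;
    std::memcpy(&value, buf + offset, sizeof value);
    return value;
}

// Offset of Goo number gooIndex. Every preceding Goo has to be skipped,
// so this is a run-time walk rather than a compile-time constant.
std::size_t gooOffset(const char* buf, int gooIndex)
{
    std::size_t offset = 2 * sizeof(std::int32_t);  // skip length and goo_count
    for (int g = 0; g < gooIndex; ++g) {
        std::int32_t booCount = readAt<std::int32_t>(buf, offset);
        offset += sizeof(std::int32_t) + booCount * kBooSize;
    }
    return offset;
}

// Equivalent of goo_list[gooIndex].boo_list[booIndex].member0
std::int64_t member0(const char* buf, int gooIndex, int booIndex)
{
    std::size_t goo = gooOffset(buf, gooIndex);
    std::size_t boo = goo + sizeof(std::int32_t) + booIndex * kBooSize;
    return readAt<std::int64_t>(buf, boo);
}
```

Reaching goo_list[i] costs a walk over the i preceding records. If that matters, the Goo offsets can be computed once and stored in an index array.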

Is using alloca() for variable length arrays better than using a vector on the heap?

I have some code using a variable length array (VLA), which compiles fine in gcc and clang, but does not work with MSVC 2015.
class Test {
public:
Test() {
P = 5;
}
void somemethod() {
int array[P];
// do something with the array
}
private:
int P;
};
There seem to be two solutions in the code:
using alloca(), taking the risks of alloca in account by making absolutely sure not to access elements outside of the array.
using a vector member variable (assuming that the overhead between vector and c array is not the limiting factor as long as P is constant after construction of the object)
The vector would be more portable (less #ifdef testing for which compiler is used), but I suspect alloca() to be faster.
The vector implementation would look like this:
class Test {
public:
Test() {
P = 5;
init();
}
void init() {
array.resize(P);
}
void somemethod() {
// do something with the array
}
private:
int P;
std::vector<int> array; // requires #include <vector>
};
Another consideration: when I only change P outside of the function, is having an array on the heap that isn't reallocated even faster than having a VLA on the stack?
Maximum P will be about 400.
You could, and probably should, use some dynamically allocated heap memory, such as that managed by a std::vector (as answered by Peter). You could use smart pointers, or plain raw pointers (new, malloc, ...) that you must not forget to release (delete, free, ...). Notice that heap allocation is probably faster than you believe (in practice, much less than a microsecond on current laptops most of the time).
Sometimes you can move the allocation out of some inner loop, or grow it only occasionally (so for a realloc-like thing, better use unsigned newsize=5*oldsize/4+10; than unsigned newsize=oldsize+1; i.e. have some geometrical growth). If you can't use vectors, be sure to keep separate allocated size and used lengths (as std::vector does internally).
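To make the growth-policy advice above concrete, here is a sketch that only simulates the bookkeeping and counts how many reallocations each policy needs. No memory is actually allocated. `growGeometric`, `growByOne`, and `countReallocations` are illustrative names; the geometric formula is the one quoted above.

```cpp
#include <cassert>

// The two growth policies compared in the text.
unsigned growGeometric(unsigned oldsize) { return 5 * oldsize / 4 + 10; }
unsigned growByOne(unsigned oldsize)     { return oldsize + 1; }

// Appends `count` elements one at a time and counts how often the
// allocated size has to grow. Allocated size and used length are kept
// separately, as std::vector does internally.
unsigned countReallocations(unsigned count, unsigned (*grow)(unsigned))
{
    unsigned capacity = 0;       // allocated size
    unsigned length = 0;         // used length
    unsigned reallocations = 0;
    while (length < count) {
        if (length == capacity) {
            capacity = grow(capacity);   // a real buffer would realloc() here
            ++reallocations;
        }
        ++length;
    }
    return reallocations;
}
```

For 400 elements (the question's stated maximum P), the geometric policy needs 11 reallocations, while growing by one needs 400.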
Another strategy would be to special case small sizes vs bigger ones. e.g. for an array less than 30 elements, use the call stack; for bigger ones, use the heap.
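The small-size/large-size split can be written in standard C++ without VLAs or alloca, as in the sketch below. A fixed-size local array serves requests up to the threshold, and a std::vector takes over beyond it. The 30-element threshold comes from the text. `fillAndSum` and `sumOf` are hypothetical stand-ins for the real work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kStackLimit = 30;   // threshold from the text

long sumOf(const int* a, std::size_t n)
{
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += a[i];
    return total;
}

long fillAndSum(std::size_t n)
{
    int stackBuf[kStackLimit];   // fixed size, so this is standard C++
    std::vector<int> heapBuf;    // normally allocates nothing until resized
    int* a = stackBuf;
    if (n > kStackLimit) {       // too big for the stack buffer: use the heap
        heapBuf.resize(n);
        a = &heapBuf[0];
    }
    for (std::size_t i = 0; i < n; ++i) a[i] = static_cast<int>(i);
    return sumOf(a, n);
}
```

boost::small_vector, mentioned in another answer, packages the same idea as a container.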
If you insist on allocating on the call stack (using VLAs, which are a commonly available extension to standard C++11, or alloca), be wise and limit your call frame to a few kilobytes. The total call stack is limited to some implementation-specific size (e.g. often about a megabyte, or a few of them, on many laptops). In some OSes you can raise that limit (see also setrlimit(2) on Linux).
Be sure to benchmark before hand-tuning your code. Don't forget to enable compiler optimization (e.g. g++ -O2 -Wall with GCC) before benchmarking. Remember that cache misses are generally much more expensive than heap allocation. Don't forget that developer time also has a cost (which is often comparable to cumulative hardware costs).
Notice that using static variables or data also has issues (it is not reentrant, not thread safe, and not async-signal-safe; see signal-safety(7)), and it is less readable and less robust.
First of all, you're getting lucky if your code compiles with ANY C++ compiler as is. VLAs are not standard C++. Some compilers support them as an extension.
Using alloca() is also not standard, so is not guaranteed to work reliably (or even at all) when using different compilers.
Using a static vector is inadvisable in many cases. In your case, it gives behaviour that is potentially not equivalent to the original code.
A third option you may wish to consider is
// in definition of class Test
void somemethod()
{
std::vector<int> array(P); // assume preceding #include <vector>
// do something with array
}
A vector is essentially a dynamically allocated array, but will be cleaned up properly in the above when the function returns.
The above is standard C++. Unless you perform rigorous testing and profiling that provides evidence of a performance concern, this should be sufficient.
Why don't you make the array a private member?
#include <vector>
class Test
{
public:
Test()
{
data_.resize(5);
}
void somemethod()
{
// do something with data_
}
private:
std::vector<int> data_;
};
As you've specified a likely maximum size of the array, you could also look at something like boost::small_vector, which could be used like:
#include <boost/container/small_vector.hpp>
class Test
{
public:
Test()
{
data_.resize(5);
}
void somemethod()
{
// do something with data_
}
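private:
// a namespace alias cannot be declared inside a class, so the name is spelled out
static constexpr std::size_t preset_capacity_ = 400; // must be static to be a class-scope constant
boost::container::small_vector<int, preset_capacity_> data_;
};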
You should profile to see if this is actually better, and be aware this will likely use more memory, which could be an issue if there are many Test instances.

Struct hack equivalent in C++

The struct hack where you have an array of length 0 as the last member of a struct from C90 and C99 is well known, and with the introduction of flexible array members in C99, we even got a standardized way of using it with []. Unfortunately, C++ provides no such construct, and (at least with Clang 3.4), compiling a struct with either [0] or [] will yield a compilation warning with --std=c++11 -pedantic:
$ cat test.cpp
struct hack {
char filler;
int things[0];
};
$ clang++ --std=c++11 -pedantic test.cpp
\test.cpp:3:14: warning: zero size arrays are an extension [-Wzero-length-array]
int things[0];
and similarly
$ cat test.cpp
struct fam {
char filler;
int things[];
};
$ clang++ --std=c++11 -pedantic test.cpp
\test.cpp:3:7: warning: flexible array members are a C99 feature [-Wc99-extensions]
int things[];
My question then is this; say that I want to have a struct that contains an array of variable size as the last item in C++. What is the right thing to do given a compiler that supports both? Should I go with the struct hack [0] (which is a compiler extension), or the FAM [] (which is a C99 feature)? As far as I understand it, either will work, but I am trying to figure out which is the lesser evil?
Also, before people start suggesting keeping an int* to a separately allocated piece of memory in the struct instead: that is not a satisfactory answer. I want to allocate a single piece of memory to hold both my struct and the array elements. Using a std::vector also falls into the same category. If you wonder why I don't want to use a pointer instead, R.'s answer to another question gives a good overview.
There have been some similar questions elsewhere, but none give an answer to this particular question:
Are flexible array members valid in C++?: Very similar, but the question there is whether FAM is valid in C++ (no). I am looking for a good reason to pick one or the other.
Conforming variant of the old “struct hack”: Proposes an alternative, but it's neither pretty, nor always correct (what if padding is added to the struct?). Accessing the elements later is also not as clean as doing e.things[42].
You can get more or less the same effect using a member
function and a reinterpret_cast:
int* buffer() { return reinterpret_cast<int*>(this + 1); }
This has one major defect: it doesn't guarantee correct
alignment. For example, something like:
struct Hack
{
char size;
int* buffer() { return reinterpret_cast<int*>(this + 1); }
};
is likely to return a mis-aligned pointer. You can work around
this by putting the data in the struct in a union with the type
whose pointer you are returning. If you have C++11, you can
declare:
struct alignas(alignof(int)) Hack
{
char size;
int* buffer() { return reinterpret_cast<int*>(this + 1); }
};
(I think. I've never actually tried this, and I could have some
details of the syntax wrong.)
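For compilers without alignas, the union workaround mentioned above might look like the sketch below, assuming a C++03 compiler. The anonymous union gives the struct int's alignment, so sizeof(Hack) is a multiple of that alignment and `this + 1` is suitably aligned for int. The member name `alignDummy_` is illustrative.

```cpp
#include <cassert>

struct Hack
{
    union {
        char size;
        int alignDummy_;   // never used; only forces int alignment on Hack
    };
    // Points just past the struct, where the caller placed the int array.
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};
```

The raw memory still has to come from ::operator new or malloc with room for sizeof(Hack) plus the ints, exactly as in the alignas version.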
This idiom has a second important defect: it does nothing to
ensure that the size field corresponds to the actual size of the
buffer, and worse, there is no real way of using new here. To
correct this, somewhat, you can define a class specific
operator new and operator delete:
struct alignas(alignof(int)) Hack
{
void* operator new( size_t, size_t n );
void operator delete( void* );
Hack( size_t n );
char size;
int* buffer() { return reinterpret_cast<int*>(this + 1); }
};
The client code will then have to use placement new to allocate:
Hack* hack = new (20) Hack(20);
The client still has to repeat the size, but he cannot ignore
it.
There are also techniques which can be used to prevent creating
instances which aren't allocated dynamically, etc., to end up
with something like:
struct alignas(alignof(int)) Hack
{
private:
void operator delete( void* p )
{
::operator delete( p );
}
// ban all but dynamic lifetime (and also inheritance, member, etc.)
~Hack() = default;
// ban arrays
void* operator new[]( size_t ) = delete;
void operator delete[]( void* p ) = delete;
public:
Hack( size_t n );
void* operator new( size_t, size_t n )
{
return ::operator new( sizeof(Hack) + n * sizeof(int) );
}
char size;
// Since dtor is private, we need this.
void deleteMe() { delete this; }
int* buffer() { return reinterpret_cast<int*>(this + 1); }
};
Given the fundamental dangers of such a class, it is debatable
if so many protective measures are necessary. Even with them,
it's really only usable by someone who fully understands all of
the constraints, and is carefully paying attention. In all but
extreme cases, in very low level code, you'd just make the
buffer a std::vector<int> and be done with it. In all but the
lowest level code, the difference in performance would not be
worth the risk and effort.
EDIT:
As a point of example, g++'s implementation of
std::basic_string uses something very similar to the above,
with a struct containing a reference count, the current size
and the current capacity (three size_t), followed directly by
the character buffer. And since it was written long before
C++11 and alignas/alignof, something like
std::basic_string<double> will crash on some systems (e.g.
a Sparc). (While technically a bug, most people do not consider
this a critical problem.)
This is C++, so templates are available:
template <int N>
struct hack {
int filler;
int thing [N];
};
Casting between different pointers to different instantiations will be the difficult issue, then.
The first thing that comes to mind is DON'T: don't write C in C++. In 99.99% of cases this hack is not needed, won't make any noticeable improvement in performance over just holding a std::vector, and will complicate your life and that of the other maintainers of the project in which you deploy it.
If you want a standard-compliant approach, provide a wrapper type that dynamically allocates a chunk of memory large enough to contain the hack (minus the array) plus N*sizeof(int) for the equivalent of the array (don't forget to ensure proper alignment). The class would have accessors that map the members and the array elements to the correct locations in memory.
Ignoring alignment and the boilerplate code needed to make the interface nice and the implementation safe:
#include <cstdlib> // malloc, free
#include <new>     // placement new
template <typename T>
class DataWithDynamicArray {
void *ptr;
int* array() {
// char* to int* needs reinterpret_cast; static_cast does not compile here
return reinterpret_cast<int*>(static_cast<char*>(ptr)+sizeof(T)); // align!
}
public:
DataWithDynamicArray(int size) : ptr() {
ptr = std::malloc(sizeof(T) + sizeof(int)*size); // force correct alignment
new (ptr) T();
}
~DataWithDynamicArray() {
static_cast<T*>(ptr)->~T();
std::free(ptr);
}
// copy, assignment...
int& operator[](int pos) {
return array()[pos];
}
T& data() {
return *static_cast<T*>(ptr);
}
};
struct JustSize { int size; };
int main() {
DataWithDynamicArray<JustSize> x(10);
x.data().size = 10;
for (int i = 0; i < 10; ++i) {
x[i] = i;
}
}
Now I would really not implement it that way (I would avoid implementing it at all!!), as for example the size should be a part of the state of DataWithDynamicArray...
This answer is provided only as an exercise, to explain that the same thing can be done without extensions, but beware this is just a toy example that has many issues including but not limited to exception safety or alignment (and yet is better than forcing the user to do the malloc with the correct size). The fact that you can does not mean that you should, and the real question is whether you need this feature and whether what you are trying to do is a good design at all or not.
If you really feel the need to use a hack, why not just use
struct hack {
char filler;
int things[1];
};
followed by
hack *hack_p = static_cast<hack*>(malloc(sizeof(struct hack) + (N-1)*sizeof(int)));
Or don't even bother about the -1 and live with a little extra space.
C++ does not have the concept of "flexible arrays". The only way to have a flexible array in C++ is to use a dynamic array - which leads you to use int* things. You will need a size parameter if you are attempting to read this data from a file so that you can create the appropriate sized array (or use a std::vector and just keep reading until you reach the end of the stream).
The "flexible array" hack keeps spatial locality (that is, the allocated memory is in one contiguous block with the rest of the structure), which you lose when you are forced to use dynamic memory. There isn't really an elegant way around that (e.g. you could allocate a large buffer, but you would have to make it large enough to hold any number of elements you wanted, and if the actual data being read in was smaller than the buffer, there would be wasted space).
Also, before people start suggesting keeping an int* to a separately
allocated piece of memory in the struct instead, that is not a
satisfactory answer. I want to allocate a single piece of memory to
hold both my struct and the array elements. Using a std::vector also
falls into the same category.
A non-standard extension is not going to work when you move to a compiler that does not support it. If you keep to the standard (e.g. avoid using compiler-specific hacks), you are less likely to run into these types of issues.
There is at least one advantage for flexible array members over zero length arrays when the compiler is clang.
struct Strukt1 {
int fam[];
int size;
};
struct Strukt2 {
int fam[0];
int size;
};
Here clang will error if it sees Strukt1 but won't error if it instead sees Strukt2. gcc and icc accept either without errors and msvc errors in either case. gcc does error if the code is compiled as C.
The same applies for this similar but less obvious example:
struct Strukt3 {
int size;
int fam[];
};
struct Strukt4 {
Strukt3 s3;
int i;
};

Order of Local Variables : Best way to declare variables(varying in size) in cpp [duplicate]

I'm currently reviewing some code that has many local variables of varying sizes.
Is it preferable to declare them in increasing order of size, or the reverse?
Please explain the memory layout in either scenario.
Is memory for local variables allocated based on the order of declaration or on size?
int fun()
{
struct S *ptr; // S: placeholder, the struct name is missing in the original
int var1;
long double *ld;
// ...
}
The best place to declare (and initialize) a local variable in C++ is right at the point where it's first needed.
The size of the variable should not be a consideration at all, unless you have specific evidence to the contrary.
The compiler will reorder local variables as it sees fit when optimizing. In short, the order of variables in the same scope does not matter.
What is a good idea, though, is to declare local variables in the scope where they are used, for example:
void func() {
//int i, j; // not here!
for (int i = 0 ; i<10; ++i) {
int j = func2(i);
...
}
// i and j below are different variables than i and j above
// you can consider changing their names if they also have different meaning
for (int i = 0 ; i<10; ++i) {
int j = func3(i);
...
}
}
For a good optimizing compiler, though, this will likely not matter from a performance or memory-footprint point of view (it will detect where variables are actually used). It will still make the code more readable and avoid mixing unrelated values in different scopes, thus protecting against some silly bugs that compiler warnings don't catch (the compiler doesn't know when you accidentally forget to re-initialize a reused variable, but it will know if you forget to initialize a new one).
Also, an important thing when worrying about variables (or anything): remember to turn on compiler warnings, like -Wall -Wextra for gcc. Using valgrind is also a good idea (if you can get your code to run on an OS that has valgrind).
My approach is to declare local variables in the smallest possible scope, at the beginning of that scope, e.g.
void foo()
{
int local1 = 42;
int local2 = bar(local1);
if ( local2 != local1)
{
double local3 = double(local2)/double(local1);
MyMemoryAllocatingObject mmao; // large memory allocation, deallocation in destructor
baz(local3);
bat(mmao);
} // mmao memory gets freed here
}
For unsophisticated compilers it helps optimization; for human readers it helps track the information. Plus, it helps keep the memory footprint as small as possible, because the locals go out of scope (sic!), i.e. their destructors are called.
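The scope-based cleanup described above can be observed with a small tracing sketch. `Tracked` stands in for `MyMemoryAllocatingObject`, and the global `trace` string is a hypothetical log. Neither is part of the original answer.

```cpp
#include <cassert>
#include <string>

std::string trace;   // hypothetical log of constructor/destructor calls

// Stand-in for MyMemoryAllocatingObject: records its creation and destruction.
struct Tracked
{
    explicit Tracked(char id) : id_(id) { trace += '+'; trace += id_; }
    ~Tracked() { trace += '-'; trace += id_; }
    char id_;
};

void foo(bool inner)
{
    Tracked outer('a');
    if (inner) {
        Tracked mmao('b');   // lives only inside this block
        trace += '.';        // work done while both objects exist
    }                        // 'b' is destroyed here, before the rest of foo
    trace += '!';            // 'a' is still alive
}                            // 'a' is destroyed here
```

Calling foo(true) appends "+a+b.-b!-a" to trace. The inner object is destroyed at the closing brace of its block, while the outer one survives until foo returns.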

Meaning of acronym SSO in the context of std::string

In a C++ question about optimization and code style, several answers referred to "SSO" in the context of optimizing copies of std::string. What does SSO mean in that context?
Clearly not "single sign on". "Shared string optimization", perhaps?
Background / Overview
Operations on automatic variables ("from the stack", which are variables that you create without calling malloc / new) are generally much faster than those involving the free store ("the heap", which are variables that are created using new). However, the size of automatic arrays is fixed at compile time, but the size of arrays from the free store is not. Moreover, the stack size is limited (typically a few MiB), whereas the free store is only limited by your system's memory.
SSO is the Short / Small String Optimization. A std::string typically stores the string as a pointer to the free store ("the heap"), which gives similar performance characteristics as if you were to call new char [size]. This prevents a stack overflow for very large strings, but it can be slower, especially with copy operations. As an optimization, many implementations of std::string create a small automatic array, something like char [20]. If you have a string that is 20 characters or smaller (given this example, the actual size varies), it stores it directly in that array. This avoids the need to call new at all, which speeds things up a bit.
EDIT:
I wasn't expecting this answer to be quite so popular, but since it is, let me give a more realistic implementation, with the caveat that I've never actually read any implementation of SSO "in the wild".
Implementation details
At the minimum, a std::string needs to store the following information:
The size
The capacity
The location of the data
The size could be stored as a std::string::size_type or as a pointer to the end. The only difference is whether you want to have to subtract two pointers when the user calls size or add a size_type to a pointer when the user calls end. The capacity can be stored either way as well.
You don't pay for what you don't use.
First, consider the naive implementation based on what I outlined above:
class string {
public:
// all 83 member functions
private:
std::unique_ptr<char[]> m_data;
size_type m_size;
size_type m_capacity;
std::array<char, 16> m_sso;
};
For a 64-bit system, that generally means that std::string has 24 bytes of 'overhead' per string, plus another 16 for the SSO buffer (16 chosen here instead of 20 due to padding requirements). It wouldn't really make sense to store those three data members plus a local array of characters, as in my simplified example. If m_size <= 16, then I will put all of the data in m_sso, so I already know the capacity and I don't need the pointer to the data. If m_size > 16, then I don't need m_sso. There is absolutely no overlap where I need all of them. A smarter solution that wastes no space would look something a little more like this (untested, example purposes only):
class string {
public:
// all 83 member functions
private:
size_type m_size;
union {
class {
// This is probably better designed as an array-like class
std::unique_ptr<char[]> m_data;
size_type m_capacity;
} m_large;
std::array<char, sizeof(m_large)> m_small;
};
};
I'd assume that most implementations look more like this.
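The union-based layout above is untested. As a complementary, compilable sketch, the class below keeps a data pointer that points either at an inline buffer or at a heap allocation, which is a layout some implementations use. It is deliberately minimal: it supports construction and copying only, with no assignment or growth. The name `SmallString` and the 15-character inline capacity are assumptions chosen for illustration, not any library's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class SmallString
{
public:
    static const std::size_t kInline = 15;   // assumed inline capacity

    explicit SmallString(const char* s) : size_(std::strlen(s)) { init(s); }
    SmallString(const SmallString& other) : size_(other.size_) { init(other.data_); }
    ~SmallString() { if (!isSmall()) delete[] data_; }

    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }
    bool isSmall() const { return data_ == local_; }   // true if no heap buffer

private:
    SmallString& operator=(const SmallString&);   // declared only: no assignment here

    void init(const char* s)
    {
        // Short strings use the inline buffer, so no allocation happens at all.
        data_ = size_ <= kInline ? local_ : new char[size_ + 1];
        std::memcpy(data_, s, size_ + 1);          // copies the terminating NUL too
    }

    std::size_t size_;
    char* data_;                 // points at local_ or at a heap buffer
    char local_[kInline + 1];    // inline storage plus room for the NUL
};
```

Copying a short SmallString only copies bytes into the new object's inline buffer. Copying a long one allocates, which is the cost the benchmark in the next answer measures for std::string.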
SSO is the abbreviation for "Small String Optimization", a technique where small strings are embedded in the body of the string class rather than using a separately allocated buffer.
As already explained by the other answers, SSO means Small / Short String Optimization.
The motivation behind this optimization is the evidence that applications in general handle many more short strings than long ones.
As explained by David Stone in his answer above, the std::string class uses an internal buffer to store contents up to a given length, and this eliminates the need to dynamically allocate memory. This makes the code more efficient and faster.
This other related answer clearly shows that the size of the internal buffer depends on the std::string implementation, which varies from platform to platform (see benchmark results below).
Benchmarks
Here is a small program that benchmarks the copy operation of lots of strings with the same length.
It starts printing the time to copy 10 million strings with length = 1.
Then it repeats with strings of length = 2. It keeps going until the length is 50.
#include <string>
#include <iostream>
#include <vector>
#include <chrono>
#include <cstdlib> // rand
static const char CHARS[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
static const int ARRAY_SIZE = sizeof(CHARS) - 1;
static const int BENCHMARK_SIZE = 10000000;
static const int MAX_STRING_LENGTH = 50;
using time_point = std::chrono::high_resolution_clock::time_point;
void benchmark(std::vector<std::string>& list) {
time_point t1 = std::chrono::high_resolution_clock::now();
// force a copy of each string in the loop iteration
for (const auto s : list) {
std::cout << s;
}
time_point t2 = std::chrono::high_resolution_clock::now();
const auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
std::cerr << list[0].length() << ',' << duration << '\n';
}
void addRandomString(std::vector<std::string>& list, const int length) {
std::string s(length, 0);
for (int i = 0; i < length; ++i) {
s[i] = CHARS[rand() % ARRAY_SIZE];
}
list.push_back(s);
}
int main() {
std::cerr << "length,time\n";
for (int length = 1; length <= MAX_STRING_LENGTH; length++) {
std::vector<std::string> list;
for (int i = 0; i < BENCHMARK_SIZE; i++) {
addRandomString(list, length);
}
benchmark(list);
}
return 0;
}
If you want to run this program, you should do it like ./a.out > /dev/null so that the time to print the strings isn't counted.
The numbers that matter are printed to stderr, so they will show up in the console.
I have created charts with the output from my MacBook and Ubuntu machines.
Note that there is a huge jump in the time to copy the strings when the length reaches a given point.
That's the moment when strings don't fit in the internal buffer anymore and memory allocation has to be used.
Note also that on the Linux machine, the jump happens when the length of the string reaches 16.
On the MacBook, the jump happens when the length reaches 23. This confirms that SSO depends on the platform implementation.
[Charts: string copy time versus string length, on Ubuntu and on a MacBook Pro]
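The platform-dependent threshold can also be probed directly. On SSO implementations, an empty std::string usually reports its inline capacity. That value is commonly 15 with libstdc++ (GCC 5 and later) and MSVC, and 22 with libc++ on 64-bit targets, which matches jumps at lengths 16 and 23. An implementation without SSO, such as the older copy-on-write libstdc++ string, reports 0. The standard guarantees none of these values, and the helper name `inlineCapacity` is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Capacity of a default-constructed string. On SSO implementations this is
// usually the number of characters that fit without a heap allocation.
std::size_t inlineCapacity()
{
    return std::string().capacity();
}
```

A string of exactly inlineCapacity() characters should still report the same capacity; one character more forces the first allocation.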