Local-variable declaration inside a "virtual" loop optimization - C++

This time I couldn't find what I was looking for (maybe I'm just not searching with the right terms), so here it is:
In C++, imagine you have a function Bar() that is called once every cycle, like this:
class Foo {
public:
    void Bar() {
        double my_array[3];
        // fills the array based on some calculations
        double my_array1[3];
        // fills the array based on some calculations
        double my_array2[3];
        // fills the array based on some calculations
        _member_of_class = my_array + my_array1 + my_array2; //+ overloaded
    }
private:
    float _member_of_class[3];
};

int main(int argc, char** argv) {
    Foo foo;
    while (/*program running*/) {
        foo.Bar();
        // LOTS of code here
    }
    return 0;
}
Now, the my_array variables are temporary arrays; they don't need to be data members, they're just used to fill the class member. Obviously I'd like to avoid any unnecessary overhead in that function. Is there a way (I'm trying to avoid making them class members) of telling the compiler to "reserve the allocation space" or something, so there is less overhead? Would const give the compiler any hint? I'm not sure I'm being clear...
Anyway, thanks!

Use a profiler
As it stands, the function is declared inline in a class. This code will trivially optimize, and the compiler will probably get the allocations out of the loop anyway (depending on how much cruft you have left out of the picture, really).
Also, the overloaded array operators likely get vectorized in gcc (starting with -O3, or with -ftree-vectorize). Just look at the verbose output with
g++ -O3 -march=native -ftree-vectorizer-verbose=2 ...
to see what loops got vectorized, and if not, why not.
By all means have a look at the output of g++ -S etc.
Use a profiler. Don't 'optimize' if you don't know that it's necessary.

Pass them as parameters to the Bar function. Arrays decay into pointers and will be quite fast to pass.
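A sketch of that approach, reusing the Foo/Bar code from the question (the scratch0/s0 names are made up for illustration): the caller owns the scratch arrays and simply reuses them on every iteration, and the arrays decay to pointers when passed.
class Foo {
public:
    void Bar(double scratch0[3], double scratch1[3], double scratch2[3]) {
        // fill the three scratch arrays, then combine them into _member_of_class
    }
private:
    float _member_of_class[3];
};

int main() {
    Foo foo;
    double s0[3], s1[3], s2[3];            // live for the whole loop, no per-call setup
    while (true /* program running */) {
        foo.Bar(s0, s1, s2);
        // LOTS of code here
    }
    return 0;
}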

You can declare the arrays as static. This will force the compiler to reserve some memory for them, rather than putting them on the stack each time you call the function. Keep in mind that this breaks thread-safety, however.
Also remember that the stack is pre-allocated; this code isn't actually allocating any new memory for your arrays, it's only placing them in memory that's already allocated. There is no overhead here. Your compiler can reserve space for all three arrays with only one instruction.
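For illustration, the static version of the question's function would look roughly like this (thread-safety caveat included):
void Foo::Bar() {
    // Reserved once for the whole program run instead of on the stack each call.
    // Not thread-safe if Bar() can be called from several threads at once.
    static double my_array[3];
    static double my_array1[3];
    static double my_array2[3];
    // ... fill and combine as before ...
}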

Why not consider making the arrays private members? If you want to guarantee no stack-allocation overhead at runtime, making them private members makes it clear to other programmers that that is what is happening, whereas compiler switches or relying on optimisations done automatically by the compiler aren't always obvious to other devs.

Related

Do I need to declare arrays as static local variables?

Let's say that I have a function that I call a lot which has an array in it:
char foo[LENGTH];
Depending upon the value of LENGTH this may be expensive to allocate every time the function is called. I have seen:
static char foo[LENGTH];
So that it is only allocated once and that array is always used: https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables
Is that best practice for arrays?
EDIT:
I've seen several responses that static locals are not best. But what about initialization cost? What if I'd called:
char foo[LENGTH] = "lorem ipsum";
Isn't that going to have to be copied every time I call the function?
As LENGTH is supposed to be a compile time constant (C++, no C99 VLA), foo is just going to use space on the stack. Very fast.
First off, the time to allocate an automatic array of char does not depend on its size; on any sane implementation it is a constant-time bump of the stack pointer, which is super fast. Please note, this would be the same even for a VLA (which is not valid in C++), only the increment would be a run-time value. Also please note, the answer would be different if your array were initialized.
So it is really unclear what performance drawback you are referring to.
On the other hand, if you make the said array static, you would incur no penalty whatsoever in the provided example: since the char array has no initializer, there is no dynamic initialization, and thus no need for the usual synchronization that prevents a static local from being initialized twice. However, your function will (likely) become thread-unsafe.
Bottom line: premature optimization is the root of evil.
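As for the EDIT about initialization cost: yes, char foo[LENGTH] = "lorem ipsum"; is conceptually re-initialized on every call. If the contents never change, something along these lines avoids the per-call work (a sketch; LENGTH is the question's compile-time constant):
void f() {
    char a[LENGTH] = "lorem ipsum";          // conceptually re-initialized on every call
    static const char b[] = "lorem ipsum";   // initialized once, read-only afterwards
    const char* c = "lorem ipsum";           // just points at the string literal
    // ...
}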
"Allocating" an object of primitive data type and with automatic storage duration is usually not a big deal. The question is more: Do you want that the contents of foo to survive the execution of the function or not?
Consider, for example, following function:
char* bar() {
char foo[LENGTH];
strcpy(foo, "Hello!");
return foo; // returning a pointer to a local variable; undefined behaviour if someone will use it.
}
In this case, foo will go out of scope and will not be (legally) accessible when bar has finished.
Everything is OK, however, if you write
char* bar() {
static char foo[LENGTH];
strcpy(foo, "Hello!");
return foo; // foo has static storage duration and will be destroyed at the end of your program (not at the end of bar())
}
An issue with large variables with automatic storage duration might arise, if they get so large that they will exceed a (limited) stack size, or if you call the function recursively. To overcome this issue, however, you'd need to use dynamic memory allocation instead (i.e. new/delete).
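A sketch of that escape hatch, with std::vector shown as a safer stand-in for raw new[]/delete[] (the function name is illustrative; LENGTH is the question's constant):
#include <vector>

void worker() {
    std::vector<char> foo(LENGTH);        // heap-backed, so a huge LENGTH won't blow the stack
    // ... use foo.data() or foo[i] as before ...
}                                         // memory is released automatically here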

Static data memory limit in C++ application

Someone has written a function in our C++ application which is already in production, and I don't know why it isn't crashing the application yet. Below is the code.
char *str_Modify()
{
char buffer[70000] = { 0 };
char targetString[70000] = { 0 };
memset(targetString, '\0', sizeof(targetString));
...
...
...
return targetString;
}
As you can see, the function is returning the address of a local variable and the allocated memory will be released once the function returns.
My question
Wanted to know what is the static data memory limit?
What can be the quick fix for this code? Is it a good practice to make the variable targetString static?
(Note that your call to memset has no effect, all the elements are zero-initialised prior to the call.)
It's not crashing the application because this particular manifestation of the undefined behaviour of your code (returning a pointer to a now out-of-scope variable with automatic storage duration) happens not to crash the application.
Yes, making it static does validate the pointer, but can create other issues centred around concurrent access.
And pick your language: In C++ there are other techniques.
Returning targetString is indeed UB as other answers have said. But there's another supplemental reason why it might crash on some platforms (especially embedded ones): Stack size. The stack segment, where auto variables usually live, is often limited to a few kilobytes; 64K may be common. Two 70K arrays might not be safe to use.
Making targetString static fixes both problems and is an unalloyed improvement IMO; but it might still be problematic if the code is used re-entrantly from multiple threads. In some circumstances it could also be considered an inefficient use of memory.
An alternative approach might be to allocate the return buffer dynamically, return the pointer, and have the calling code free it when no longer required.
As for why might it not crash: if the stack segment is large enough and no other function uses enough of it to overwrite buffer[] and that gets pushed first; then targetString[] might survive unscathed, hanging just below the used stack, effectively in a world of its own. Very unsafe though!
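One such C++-style alternative (a sketch, not the original function; the reserve size and elided body are placeholders) is to build the result in a std::string and return it by value, so ownership is handled automatically:
#include <string>

std::string str_Modify()
{
    std::string target;
    target.reserve(70000);    // heap-backed, nothing large on the stack
    // ... build the result ...
    return target;            // safe: returned by value, no dangling pointer
}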
It is well-defined behaviour in C and C++ to return the address of a static variable, because a static variable continues to exist in memory after the function call is over.
For example:
#include <stdio.h>
int *f()
{
static int a[10]={0};
return a;
}
int main()
{
f();
return 0;
}
It works fine with the GCC compiler.
But, if you remove static keyword then compiler generates warning:
prog.c: In function 'f':
prog.c:6:12: warning: function returns address of local variable [-Wreturn-local-addr]
return a;
^
Also, see the comment written by Ludin on this question:
I believe you are confusing this with int* fun (void) { static int i = 10; return &i; } versus int* fun (void) { int i = 10; return &i; }, which is another story. The former is well-defined, the latter is undefined behavior.
Also, tutorialspoint says:
The second point to remember is that C does not advocate returning the address of a local variable outside of the function, so you would have to define the local variable as a static variable.
Wanted to know what is the static data memory limit?
Platform-specific. You haven't specified a platform (OS, compiler, version), so no-one can possibly tell you. It's probably fine though.
What can be the quick fix for this code?
The quick fix is indeed to make the buffer static.
The good fix is to rewrite the function as
char *modify(char *out, size_t outsz) {
// ...
return out;
}
(returning the input is just to simplify reusing the new function in existing code).
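Usage might then look like this (buffer names and sizes are illustrative); the caller decides how big the buffer is and where it lives:
char small_buf[256];                      // caller chooses size and lifetime
modify(small_buf, sizeof small_buf);

char* big_buf = new char[70000];          // or a heap buffer when a large one is needed
modify(big_buf, 70000);
delete[] big_buf;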
Is it a good practice to make the variable targetString static?
No. Sometimes it's the best you can do, but it has a number of problems:
The buffer is always the same size, and always using ~68Kb of memory and/or address space for no good reason. You can't use a bigger one in some contexts, and a smaller one in others. If you really have to memset the whole thing, this incurs a speed penalty in situations where the buffer could be much smaller.
Using static (or global) variables breaks re-entrancy. Simple example: code like
printf("%s,%s\n", str_Modify(1), str_Modify(2));
cannot work sanely, because the second invocation overwrites the first (compare strtok, which can't be used to interleave the tokenizing of two different strings, because it has persistent state).
Since it isn't re-entrant, it also isn't thread-safe, in case you use multiple threads. It's a mess.

Setting size of custom C++ container as template parameter vs constructor

I've written a fixed-size container (a ring buffer, to be exact) in C++. Currently I'm setting the size of the container in the constructor and then allocate the actual buffer on the heap. However, I've been thinking about moving the size parameter out of the constructor and into the template.
Going from this (RingBuffer fitting 100 integers)
RingBuffer<int> buffer(100);
to this
RingBuffer<int, 100> buffer;
This would allow me to allocate the whole buffer on the stack, which is faster than heap allocation, as far as I know. Mainly it's a matter of readability and maintainability though. These buffers often appear as members of classes. I have to initialize them with a size, so I have to initialize them in the initializer-list of every single constructor of the class. That means if I want to change the capacity of the RingBuffer I have to either remember to change it in every initializer-list or work with awkward static const int BUFFER_SIZE = 100; member variables.
My question is, is there any downside to specifying the container size as a template parameter as opposed to in the constructor? What are the pros and cons of either method?
As far as I know the compiler will generate a new type for each differently-sized RingBuffer. This could turn out to be quite a few. Does that hurt compile times much? Does it bloat the code or prevent optimizations? Of course I'm aware that much of this depends on the exact use case but what are the things I need to be aware of when making this decision?
My question is, is there any downside to specifying the container size as a template parameter as opposed to in the constructor? What are the pros and cons of either method?
If you give the size as template parameter, then it needs to be a constexpr (compile time constant expression). Thus your buffer size cannot depend on any run time characteristics (like user input).
Being a compile-time constant opens the door for some optimizations (loop unrolling and constant folding come to mind) to be more effective.
As far as I know the compiler will generate a new type for each differently-sized RingBuffer.
This is true. But I wouldn't worry about that, as having many different types per se won't have any impact on performance or code size (but probably on compile time).
Does that hurt compile times much?
It will make compilation slower. Though I doubt that in your case (this is a pretty simple template) this will even be noticeable. Thus it depends on your definition of "much".
Does it bloat the code or prevent optimizations?
Prevent optimizations? No. Bloat the code? Possibly. That depends on both how exactly you implement your class and what your compiler does. Example:
#include <array>
#include <cstddef>
#include <functional>

void doIt(char const* data, std::size_t size, std::function<void(char)> f) {
    for (std::size_t i = 0; i < size; ++i) {
        f(data[i]);
    }
}

template <std::size_t N>
struct Buffer {
    std::array<char, N> data;

    void doSomething(std::function<void(char)> f) {
        for (std::size_t i = 0; i < N; ++i) {
            f(data[i]);
        }
    }

    void doSomethingDifferently(std::function<void(char)> f) {
        doIt(data.data(), N, f);
    }
};
doSomething might get compiled to (perhaps completely) unrolled loop code, and you'd have a Buffer<100>::doSomething, a Buffer<200>::doSomething and so on, each a possibly large function. doSomethingDifferently might get compiled to not much more than a simple jump instruction, so having multiple of those wouldn't be much of an issue. Though your compiler could also change doSomething to be implemented similarly to doSomethingDifferently, or the other way around.
So in the end:
Don't try to make this decision depend on performance, optimizations, compile time or code bloat. Decide what's more meaningful in your situation. Will there only ever be buffers with compile time known sizes?
Also:
These buffers often appear as members of classes. I have to initialize them with a size, so I have to initialize them in the initializer-list of every single constructor of the class.
Do you know "delegating constructors"?
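They solve exactly the "repeat the size in every initializer list" problem. A sketch, assuming the constructor-sized RingBuffer from the question (the Widget class and its name_ member are made up for illustration):
#include <string>

class Widget {                                   // illustrative class, not from the question
public:
    Widget() : buffer_(100) {}                   // the capacity literal appears only here
    explicit Widget(const std::string& name)
        : Widget()                               // delegating constructor
    { name_ = name; }
private:
    RingBuffer<int> buffer_;                     // the constructor-sized RingBuffer
    std::string name_;
};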
As Daniel Jour already said, code bloat is not a huge issue and can be dealt with if needed.
The good thing about having the size as a constexpr is that it allows you to detect some errors at compile time that would otherwise happen at run time.
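For illustration, such a compile-time check might look like this (a sketch; the static_assert and the member layout are assumptions, not the question's code):
#include <array>
#include <cstddef>

template <typename T, std::size_t N>
class RingBuffer {
    static_assert(N > 0, "RingBuffer capacity must be non-zero");  // caught at compile time
    std::array<T, N> m_data;   // storage lives inside the object, no heap allocation
    // ... head/tail indices and push/pop omitted ...
};

// RingBuffer<int, 0> oops;    // would now fail to compile instead of misbehaving at run time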
This would allow me to allocate the whole buffer on the stack, which is faster than heap allocation, as far as I know.
These buffers often appear as members of classes
This will happen only if the owning class is itself allocated in automatic storage, which is usually not the case. Consider the following example:
struct A {
int myArray[10];
};
struct B {
B(): dynamic(new A()) {}
A automatic; // should be in the "stack"
A* dynamic; // should be in the "heap"
};
int main() {
B b1;
b1; // automatic memory
b1.automatic; // automatic memory
b1.automatic.myArray; // automatic memory
b1.dynamic; // automatic memory
(*b1.dynamic); // dynamic memory
(*b1.dynamic).myArray; // dynamic memory
B* b2 = new B();
b2; // automatic memory
(*b2); // dynamic memory
(*b2).automatic; // dynamic memory
(*b2).automatic.myArray; // dynamic memory
(*b2).dynamic; // dynamic memory
(*(*b2).dynamic).myArray; // dynamic memory
}

C++ - Loop Efficiency: Storing temporary values of a class member vs pointer to this member

Within a class method, I'm accessing private attributes - or attributes of a nested class. Moreover, I'm looping over these attributes.
I was wondering what is the most efficient way in terms of time (and memory) between:
copying the attributes and accessing them within the loop
Accessing the attributes within the loop
Or maybe using an iterator over the attribute
I feel my question is related to : Efficiency of accessing a value through a pointer vs storing as temporary value. But in my case, I just need to access a value, not change it.
Example
Given two classes
class ClassA
{
public:
    vector<double> GetAVector() { return m_AVector; }
private:
    vector<double> m_AVector;
};
and
class ClassB
{
public:
    void MyFunction();
private:
    vector<double> m_Vector;
    ClassA m_A;
};
I. Should I do:
1.
void ClassB::MyFunction()
{
vector<double> foo;
for(int i=0; i<... ; i++)
{
foo.push_back(SomeFunction(m_Vector[i]));
}
/// do something ...
}
2.
void ClassB::MyFunction()
{
vector<double> foo;
vector<double> VectorCopy = m_Vector;
for(int i=0; i<... ; i++)
{
foo.push_back(SomeFunction(VectorCopy[i]));
}
/// do something ...
}
3.
void ClassB::MyFunction()
{
vector<double> foo;
for(vector<double>::iterator it = m_Vector.begin(); it != m_Vector.end() ; it++)
{
foo.push_back(SomeFunction((*it)));
}
/// do something ...
}
II. What if I'm not looping over m_vector but m_A.GetAVector()?
P.S.: I understood from going through other posts that it's not useful to micro-optimize at first, but my question is more about what really happens and what should be done as a matter of standards (and coding style).
You're in luck: you can actually figure out the answer all by yourself, by trying each approach with your compiler and on your operating system, and timing each approach to see how long it takes.
There is no universal answer here, that applies to every imaginable C++ compiler and operating system that exists on the third planet from the sun. Each compiler, and hardware is different, and has different runtime characteristics. Even different versions of the same compiler will often result in different runtime behavior that might affect performance. Not to mention various compilation and optimization options. And since you didn't even specify your compiler and operating system, there's literally no authoritative answer that can be given here.
Although it's true that for some questions of this type it's possible to arrive at the best implementation with a high degree of certainty, for most use cases, this isn't one of them. The only way you can get the answer is to figure it out yourself, by trying each alternative yourself, profiling, and comparing the results.
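If you do want to measure it, a minimal timing sketch could look like this (the iteration count and the commented-out variants are placeholders, not a rigorous benchmark):
#include <chrono>
#include <iostream>

// Times a callable by running it many times and reporting elapsed microseconds.
template <typename F>
long long time_it(F&& f, int iterations = 100000) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// usage, assuming b is a ClassB instance and the three variants are split into
// separate member functions:
// std::cout << time_it([&] { b.MyFunction1(); }) << " us\n";
// std::cout << time_it([&] { b.MyFunction2(); }) << " us\n";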
I can categorically say that 2. is less efficient than 1. Copying to a local copy, and then accessing it like you would the original would only be of potential benefit if accessing a stack variable is quicker than accessing a member one, and it's not, so it's not (if you see what I mean).
Option 3. is trickier, since it depends on the implementation of the iterator (and of end(), which may be called once per loop iteration) versus the implementation of the operator [] method. I could irritate some C++ die-hards and say there's an option 4: ask the vector for a pointer to the underlying array and use a pointer or array index on that directly. That might just be faster than either!
And as for II, there is a double-indirection there. A good compiler should spot that and cache the result for repeated use - but otherwise it would only be marginally slower than not doing so: again, depending on your compiler.
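For reference, the "option 4" mentioned above might look roughly like this (SomeFunction is the question's placeholder; vector::data() supplies the raw pointer):
void ClassB::MyFunction()
{
    vector<double> foo;
    foo.reserve(m_Vector.size());                 // avoid repeated reallocation
    const double* p = m_Vector.data();            // raw pointer to the elements
    for (size_t i = 0, n = m_Vector.size(); i < n; ++i)
    {
        foo.push_back(SomeFunction(p[i]));
    }
    // do something ...
}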
Without optimizations, option 2 would be slower on every imaginable platform, because it incurs a copy of the vector, and the access time would be identical for a local variable and a class member.
With optimization, depending on SomeFunction, performance might be the same or worse for option 2. The performance would be the same if SomeFunction is either visible to the compiler and known not to modify its argument, or its signature guarantees that the argument will not be modified - in those cases the compiler can optimize away the copy altogether. Otherwise, the copy will remain.

Comparison between constant accessors of private members

The main portion of this question concerns the proper and most computationally efficient method of creating a public read-only accessor for a private data member inside of a class. Specifically, using a const reference to access the variable, such as:
class MyClassReference
{
private:
int myPrivateInteger;
public:
const int & myIntegerAccessor;
// Assign myPrivateInteger to the constant accessor.
MyClassReference() : myIntegerAccessor(myPrivateInteger) {}
};
However, the current established method for solving this problem is to utilize a constant "getter" function as seen below:
class MyClassGetter
{
private:
int myPrivateInteger;
public:
int getMyInteger() const { return myPrivateInteger; }
};
The necessity (or lack thereof) of "getters/setters" has already been hashed out time and again in questions such as: Conventions for accessor methods (getters and setters) in C++. That, however, is not the issue at hand.
Both of these methods offer the same functionality using the syntax:
MyClassGetter a;
MyClassReference b;
int SomeValue = 5;
int A_i = a.getMyInteger(); // Allowed.
a.getMyInteger() = SomeValue; // Not allowed.
int B_i = b.myIntegerAccessor; // Allowed.
b.myIntegerAccessor = SomeValue; // Not allowed.
After discovering this, and finding nothing on the internet concerning it, I asked several of my mentors and professors which is appropriate and what the relative advantages/disadvantages of each are. However, all the responses I received fell nicely into two categories:
I have never even thought of that, but use a "getter" method as it is "Established Practice".
They function the same (They both run with the same efficiency), but use a "getter" method as it is "Established Practice".
While both of these answers were reasonable, they both failed to explain the "why", so I was left unsatisfied and decided to investigate this issue further. I conducted several tests, such as average character usage (they are roughly the same) and average typing time (again roughly the same), but one test showed an extreme discrepancy between the two methods. This was a run-time test for calling the accessor and assigning the result to an integer. Without any -OX flags (in debug mode), MyClassReference performed roughly 15% faster. However, once a -OX flag was added, in addition to both running much faster, both methods ran with the same efficiency.
My question is thus has two parts.
How do these two methods differ, and what causes one to be faster/slower than the other only with certain optimization flags?
Why is it that established practice is to use a constant "getter" function, while using a constant reference is rarely known let alone utilized?
As comments pointed out, my benchmark testing was flawed, and irrelevant to the matter at hand. However, for context it can be located in the revision history.
The answer to question #2 is that sometimes, you might want to change class internals. If you made all your attributes public, they're part of the interface, so even if you come up with a better implementation that doesn't need them (say, it can recompute the value on the fly quickly and shave the size of each instance so programs that make 100 million of them now use 400-800 MB less memory), you can't remove it without breaking dependent code.
With optimization turned on, the getter function should be indistinguishable from direct member access when the code for the getter is just a direct member access anyway. But if you ever want to change how the value is derived to remove the member variable and compute the value on the fly, you can change the getter implementation without changing the public interface (a recompile would fix up existing code using the API without code changes on their end), because a function isn't limited in the way a variable is.
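For instance, a later version of the getter class might drop the stored value entirely without touching the public interface (the myBaseValue/myOffset internals here are made up for illustration):
class MyClassGetter
{
private:
    int myBaseValue;
    int myOffset;
public:
    // Same signature as before, but the result is now computed on the fly;
    // callers keep writing a.getMyInteger() and simply recompile.
    int getMyInteger() const { return myBaseValue + myOffset; }
};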
There are semantic/behavioral differences that are far more significant than your (broken) benchmarks.
Copy semantics are broken
A live example:
#include <iostream>
class Broken {
public:
Broken(int i): read_only(read_write), read_write(i) {}
int const& read_only;
void set(int i) { read_write = i; }
private:
int read_write;
};
int main() {
Broken original(5);
Broken copy(original);
std::cout << copy.read_only << "\n";
original.set(42);
std::cout << copy.read_only << "\n";
return 0;
}
Yields:
5
42
The problem is that when doing a copy, copy.read_only points to original.read_write. This may lead to dangling references (and crashes).
This can be fixed by writing your own copy constructor, but it is painful.
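A sketch of what that hand-written copy constructor might look like (added inside Broken):
    Broken(Broken const& other)
        : read_only(read_write)            // bind the reference to this object's own member
        , read_write(other.read_write)     // copy only the value
    {}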
Assignment is broken
A reference cannot be reseated (you can alter the value of the object it refers to, but you cannot rebind it to another object), leading to:
int main() {
Broken original(5);
Broken copy(4);
copy = original;
std::cout << copy.read_only << "\n";
original.set(42);
std::cout << copy.read_only << "\n";
return 0;
}
generating an error:
prog.cpp: In function 'int main()':
prog.cpp:18:7: error: use of deleted function 'Broken& Broken::operator=(const Broken&)'
copy = original;
^
prog.cpp:3:7: note: 'Broken& Broken::operator=(const Broken&)' is implicitly deleted because the default definition would be ill-formed:
class Broken {
^
prog.cpp:3:7: error: non-static reference member 'const int& Broken::read_only', can't use default assignment operator
This can be fixed by writing your own copy-assignment operator, but it is painful.
Unless you fix it, Broken can only be used in very restricted ways; you may never manage to put it inside a std::vector for example.
Increased coupling
Giving away a reference to your internals increases coupling. You leak an implementation detail (the fact that you are using an int and not a short, long or long long).
With a getter returning a value, you can switch the internal representation to another type, or even elide the member and compute it on the fly.
This is only significant if the interface is exposed to clients expecting binary/source-level compatibility; if the class is only used internally and you can afford to change all users if it changes, then this is not an issue.
Now that semantics are out of the way, we can speak about performance differences.
Increased object size
While references can sometimes be elided, it is unlikely to ever happen here. This means that each reference member will increase the size of an object by at least sizeof(void*), plus potentially some padding for alignment.
The original getter class (MyClassGetter) has a size of 4 on x86 or x86-64 platforms with mainstream compilers.
The Broken class has a size of 8 on x86 and 16 on x86-64 platforms (the latter because of padding, as pointers are aligned on 8-bytes boundaries).
An increased size can hurt CPU caches; with a large number of items you may quickly experience slowdowns because of it (well, not that it'll be easy to have vectors of Broken, given its broken assignment operator).
Better performance in debug
As long as the implementation of the getter is inline in the class definition, then the compiler will strip the getter whenever you compile with a sufficient level of optimizations (-O2 or -O3 generally, -O1 may not enable inlining to preserve stack traces).
Thus, the performance of access should only vary in debug code, where performance is least necessary (and otherwise so crippled by plenty of other factors that it matters little).
In the end, use a getter. It's established convention for a good number of reasons :)
When you implement a constant reference (or constant pointer) accessor, your object also stores a pointer, which makes it bigger. Accessor methods, on the other hand, exist only once in the program and are most likely optimized out (inlined), unless they are virtual or part of an exported interface.
By the way, getter method can also be virtual.
To answer question 2:
const_cast<int&>(mcb.myIntegerAccessor) = 4;
is a pretty good reason to hide it behind a getter function: the const reference hands out the private member itself, so a const_cast like this lets outside code modify it directly, which completely breaks the abstraction of the class.