Is it more memory or possibly computationally efficient to declare variables late?
Example:
int x;
code
..
.
.
. x is able to be used in all this code
.
actually used here
.
end
versus
code
..
.
.
.
int x;
actually used here
.
end
Thanks.
Write whatever logically makes most sense (usually closer to use). The compiler can and will spot things like this and produce code that makes the most sense for your target architecture.
Your time is far more valuable than trying to second guess the interactions of the compiler and the cache on the processor.
For example on x86 this program:
#include <iostream>
int main() {
for (int j = 0; j < 1000; ++j) {
std::cout << j << std::endl;
}
int i = 999;
std::cout << i << std::endl;
}
compared to:
#include <iostream>
int main() {
int i = 999;
for (int j = 0; j < 1000; ++j) {
std::cout << j << std::endl;
}
std::cout << i << std::endl;
}
compiled with:
g++ -Wall -Wextra -O4 -S measure.c
g++ -Wall -Wextra -O4 -S measure2.c
When inspecting the output with diff measure*.s gives:
< .file "measure2.cc"
---
> .file "measure.cc"
Even for:
#include <iostream>
namespace {
struct foo {
foo() { }
~foo() { }
};
}
std::ostream& operator<<(std::ostream& out, const foo&) {
return out << "foo";
}
int main() {
for (int j = 0; j < 1000; ++j) {
std::cout << j << std::endl;
}
foo i;
std::cout << i << std::endl;
}
vs
#include <iostream>
namespace {
struct foo {
foo() { }
~foo() { }
};
}
std::ostream& operator<<(std::ostream& out, const foo&) {
return out << "foo";
}
int main() {
foo i;
for (int j = 0; j < 1000; ++j) {
std::cout << j << std::endl;
}
std::cout << i << std::endl;
}
the results of the diff of the assembly produced by g++ -S are still identical except for the filename, because there are no side effects. If there were side effects then that would dictate where you constructed the object - at what point did you want the side effects to occur?
For fundamental types such as int, it does not matter from a performance point of view. For class types a variable definition includes a constructor invocation as well, which could be ommited if the control flow skips that variable. Furthermore, both for fundamental and class types, a definition should be delayed at least to the point were there is sufficient information to make such variable meaningful. For non default constructible class types this is mandatory; for other types it may not be but it forces you to work with uninitialized states (like -1 or other invalid values). You should define your variables as late as possible, within the minimum scope possible; it may not matter sometimes from a performance point of view but its always important design-wise.
In general, you should declare where and when you use the variables. It improves reability, maintainability and, for purely practical reasons, memory locality.
Even if you have a large object and you declare it outside or inside a loop body, the only difference is going to be between construction and assignment; the actual memory allocation will be virtually identical, since contemporary allocators are very good at short-lived allocations.
You may even consider creating new, anonymous scopes if you have a small part of code whose variables aren't required afterwards (though that usually indicates you're better off with a separate function).
So basically, write the way it makes most logical sense, and you will usually also end up with the most efficient code; or at least you won't do any worse than you would by declaring everything at the top.
It is neither memory nor computationally more efficient either way for simple types. For more complex types, it may be more efficient to have the contents hot in the cache (from being constructed) near where they are used. It can also minimize the amount of time the memory remains allocated.
Related
I'm curious about why my research result is strange
#include <iostream>
int test()
{
return 0;
}
int main()
{
/*include either the next line or the one after*/
const int a = test(); //the result is 1:1
const int a = 0; //the result is 0:1
int* ra = (int*)((void*)&a);
*ra = 1;
std::cout << a << ":" << *ra << std::endl;
return 0;
}
why the constant var initialize while runtime can completely change, but initialize while compile will only changes pointer's var?
The function isn't really that relevant here. In principle you could get same output (0:1) for this code:
int main() {
const int a = 0;
int* ra = (int*)((void*)&a);
*ra = 1;
std::cout << a << ":" << *ra;
}
a is a const int not an int. You can do all sorts of senseless c-casts, but modifiying a const object invokes undefined behavior.
By the way in the above example even for std::cout << a << ":" << a; the compiler would be allowed to emit code that prints 1:0 (or 42:3.1415927). When your code has undefinded behavior anything can happen.
PS: the function and the different outcomes is relevant if you want to study internals of your compiler and what it does to code that is not valid c++ code. For that you best look at the output of the compiler and how it differs in the two cases (https://godbolt.org/).
It is undefined behavior to cast a const variable and change it's value. If you try, anything can happen.
Anyway, what seems to happen, is that the compiler sees const int a = 0;, and, depending on optimization, puts it on the stack, but replaces all usages with 0 (since a will not change!). For *ra = 1;, the value stored in memory actually changes (the stack is read-write), and outputs that value.
For const int a = test();, dependent on optimization, the program actually calls the function test() and puts the value on the stack, where it is modified by *ra = 1, which also changes a.
I know the ISO C standard make a big deal about separating translation behaviour and execution behaviour, partly to ensure cross-compilers don't have to carry the execution environment of every target.
By that, I mean the compiler has limited information available to it compared to a running program. That limits what you can do in the source code, such as not initialising a variable based on the return value from a function like this:
int twice (int val) { return val * 2; }
int xyzzy = twice (7);
int main () { printf ("%d\n", xyzzy); return 0; }
What I'm curious about is how user defined literals in C++11 fit into this scheme. Since the literal evaluation relies on a function, what's to stop that function doing things like:
returning a random value (even if based on the input such as 42_roughly giving you a value between 40 and 44)?
having side effects, such as changing global variables?
Does the fact that a function have to be called means that these aren't really literals in the sense of being calculated at compile time?
If that's the case, what's the advantage of these literals over any other function call. In other words why is:
int xyzzy = 1101_1110_b;
preferable to:
int xyzzy = makeBin ("1101_1110");
?
The secret is all in whether you declare the user-defined literal functions as constexpr or not.
Compare this (normal execution-time function):
#include <iostream>
int operator "" _twice(unsigned long long num) { return num*2; }
int xyzzy = 7_twice;
int main () { std::cout << xyzzy << "\n"; return 0; }
With this (compile-time constant, the static_assert works):
#include <iostream>
constexpr int operator "" _twice(unsigned long long num) { return num*2; }
int xyzzy = 7_twice;
int main () {
static_assert(7_twice == 14, "not compile time constant");
std::cout << xyzzy << "\n";
return 0;
}
Obviously, declaring a function constexpr restricts all statements within to be constexpr also, or compile-time constants; no random number shenanigans allowed.
I have the following sample code. Just wanted to know if is valid to take address of a local variable in a global pointer and then modify it's contents in a sub function. Following program correctly modifies value of variable a . Can such practice cause any issues ?
#include <iostream>
#include <vector>
using namespace std;
vector<int*> va;
void func()
{
int b ;
b = 10;
int * c = va[0];
cout << "VALUE OF C=" << *c << endl;
*c = 20;
cout << "VALUE OF C=" << *c << endl;
}
int main()
{
int a;
a = 1;
va.push_back(&a);
func();
cout << "VALUE IS= " << a << endl;
return 0;
}
This is OK, as long as you don't try to dereference va[0] after a has gone out of scope. You don't, so technically this code is fine.
That said, this whole approach may not be such a good idea because it makes code very hard to maintain.
I'd say that if your program grows you could forget about a change you made in some function and get some weird errors you didn't expect.
Your code is perfectly valid as long as you call func() while being in the scope of a. However, this is not considered to be a good practice. Consider
struct HugeStruct {
int a;
};
std::vector<HugeStruct*> va;
void print_va()
{
for (size_t i = 0; i < va.size(); i++)
std::cout<<va[i].a<<' ';
std::cout<<std:endl;
}
int main()
{
for (int i = 0; i < 4; i++) {
HugeStruct hs = {i};
va.push_back(&hs);
}
print_va(); // oups ...
}
There are 2 problems in the code above.
Don't use global variables unless absolutely necessary. Global variables violate encapsulation and may cause overlay of variable names. In most cases it's much easier to pass them to functions when needed.
The vector of pointers in this code looks awful. As you can see, I forgot that pointers became invalid as soon as I left for-loop, and print_va just printed out garbage. The simple solution could be to store objects in a vector instead of pointers. But what if I don't want HugeStruct objects to be copied again and again? It can take quite a lot of time. (Suppose that instead of one int we have a vector of million integers.) One of the solutions is to allocate HugeStructs dynamically and use vector of smart pointers: std::vector<std::shared_ptr<HugeStruct>>. This way you don't have to bother about memory management and scope. Objects will be destroyed as soon as nobody will refer to them.
I wonder if there are performance comparisons of classes and C style structs in C++ with g++ -O3 option. Is there any benchmark or comparison about this. I've always thought C++ classes as heavier and possibly slower as well than the structs (compile time isn't very important for me, run time is more crucial). I'm going to implement a B-tree, should I implement it with classes or with structs for the sake of performance.
On runtime level there is no difference between structs and classes in C++ at all.
So it doesn't make any performance difference whether you use struct A or class A in your code.
Other thing, is using some features -- like, constructors, destructors and virtual functions, -- could have some performance penalties (but if you use them you probably need them anyway). But you can with equal success use them both inside your class or struct.
In this document you can read about other performance-related subtleties of C++.
In C++, struct is syntactic sugar for classes whose members are public by default.
My honest opinion...don't worry about performance until it actually shows itself to be a problem, then profile your code. Premature optimization is the root of all evil. But, as others have said, there is no difference between a struct and class in C++ at runtime.
Focus on creating an efficient data structure and efficient logic to manipulate the data structure. C++ classes are not inherently slower than C-style structs, so don't let that limit your design.
AFAIK, from a performance point of view, they are equivalent in C++.
Their difference is synctatic sugar like struct members are public by default, for example.
my2c
Just do an experiment, people!
Here is the code for the experiment I designed:
#include <iostream>
#include <string>
#include <ctime>
using namespace std;
class foo {
public:
void foobar(int k) {
for (k; k > 0; k--) {
cout << k << endl;
}
}
void initialize() {
accessor = "asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf";
}
string accessor;
};
struct bar {
public:
void foobar(int k) {
for (k; k > 0; k--) {
cout << k << endl;
}
}
void initialize() {
accessor = "asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf";
}
string accessor;
};
int main() {
clock_t timer1 = clock();
for (int j = 0; j < 200; j++) {
foo f;
f.initialize();
f.foobar(7);
cout << f.accessor << endl;
}
clock_t classstuff = clock();
clock_t timer2 = clock();
for (int j = 0; j < 200; j++) {
bar b;
b.initialize();
b.foobar(7);
cout << b.accessor << endl;
}
clock_t structstuff = clock();
cout << "struct took " << structstuff-timer2 << endl;
cout << "class took " << classstuff-timer1 << endl;
return 0;
}
On my computer, struct took 1286 clock ticks, and class took 1450 clock ticks. To answer your question, struct is slightly faster. However, that shouldn't matter, because computers are so fast these days.
well actually structs can be more efficient than classes both in time and memory (e.g arrays of structs vs arrays of objects),
There is a huge
difference in efficiency in
some cases. While the
overhead of an object
might not seem like very
much, consider an array
of objects and compare it
to an array of structs.
Assume the data
structure contains 16
bytes of data, the array
length is 1,000,000, and
this is a 32-bit system.
For an array of objects
the total space usage is:
8 bytes array overhea
(4 byte pointer size ×
((8 bytes overhead +
= 28 MB
For an array of structs,
the results are
dramatically different:
8 bytes array overhea
(16 bytes data × 1,00
= 16 MB
With a 64-bit process, the
object array takes over
40 MB while the struct
array still requires only 16
MB.
see this article for details.
I am trying to show by example that the prefix increment is more efficient than the postfix increment.
In theory this makes sense: i++ needs to be able to return the unincremented original value and therefore store it, whereas ++i can return the incremented value without storing the previous value.
But is there a good example to show this in practice?
I tried the following code:
int array[100];
int main()
{
for(int i = 0; i < sizeof(array)/sizeof(*array); i++)
array[i] = 1;
}
I compiled it using gcc 4.4.0 like this:
gcc -Wa,-adhls -O0 myfile.cpp
I did this again, with the postfix increment changed to a prefix increment:
for(int i = 0; i < sizeof(array)/sizeof(*array); ++i)
The result is identical assembly code in both cases.
This was somewhat unexpected. It seemed like that by turning off optimizations (with -O0) I should see a difference to show the concept. What am I missing? Is there a better example to show this?
In the general case, the post increment will result in a copy where a pre-increment will not. Of course this will be optimized away in a large number of cases and in the cases where it isn't the copy operation will be negligible (ie., for built in types).
Here's a small example that show the potential inefficiency of post-increment.
#include <stdio.h>
class foo
{
public:
int x;
foo() : x(0) {
printf( "construct foo()\n");
};
foo( foo const& other) {
printf( "copy foo()\n");
x = other.x;
};
foo& operator=( foo const& rhs) {
printf( "assign foo()\n");
x = rhs.x;
return *this;
};
foo& operator++() {
printf( "preincrement foo\n");
++x;
return *this;
};
foo operator++( int) {
printf( "postincrement foo\n");
foo temp( *this);
++x;
return temp;
};
};
int main()
{
foo bar;
printf( "\n" "preinc example: \n");
++bar;
printf( "\n" "postinc example: \n");
bar++;
}
The results from an optimized build (which actually removes a second copy operation in the post-increment case due to RVO):
construct foo()
preinc example:
preincrement foo
postinc example:
postincrement foo
copy foo()
In general, if you don't need the semantics of the post-increment, why take the chance that an unnecessary copy will occur?
Of course, it's good to keep in mind that a custom operator++() - either the pre or post variant - is free to return whatever it wants (or even do whatever it wants), and I'd imagine that there are quite a few that don't follow the usual rules. Occasionally I've come across implementations that return "void", which makes the usual semantic difference go away.
You won't see any difference with integers. You need to use iterators or something where post and prefix really do something different. And you need to turn all optimisations on, not off!
I like to follow the rule of "say what you mean".
++i simply increments. i++ increments and has a special, non-intuitive result of evaluation. I only use i++ if I explicitly want that behavior, and use ++i in all other cases. If you follow this practice, when you do see i++ in code, it's obvious that post-increment behavior really was intended.
Several points:
First, you're unlikely to see a major performance difference in any way
Second, your benchmarking is useless if you have optimizations disabled. What we want to know is if this change gives us more or less efficient code, which means that we have to use it with the most efficient code the compiler is able to produce. We don't care whether it is faster in unoptimized builds, we need to know if it is faster in optimized ones.
For built-in datatypes like integers, the compiler is generally able to optimize the difference away. The problem mainly occurs for more complex types with overloaded increment iterators, where the compiler can't trivially see that the two operations would be equivalent in the context.
You should use the code that clearest expresses your intent. Do you want to "add one to the value", or "add one to the value, but keep working on the original value a bit longer"? Usually, the former is the case, and then a pre-increment better expresses your intent.
If you want to show the difference, the simplest option is simply to impement both operators, and point out that one requires an extra copy, the other does not.
This code and its comments should demonstrate the differences between the two.
class a {
int index;
some_ridiculously_big_type big;
//etc...
};
// prefix ++a
void operator++ (a& _a) {
++_a.index
}
// postfix a++
void operator++ (a& _a, int b) {
_a.index++;
}
// now the program
int main (void) {
a my_a;
// prefix:
// 1. updates my_a.index
// 2. copies my_a.index to b
int b = (++my_a).index;
// postfix
// 1. creates a copy of my_a, including the *big* member.
// 2. updates my_a.index
// 3. copies index out of the **copy** of my_a that was created in step 1
int c = (my_a++).index;
}
You can see that the postfix has an extra step (step 1) which involves creating a copy of the object. This has both implications for both memory consumption and runtime. That is why prefix is more efficient that postfix for non-basic types.
Depending on some_ridiculously_big_type and also on whatever you do with the result of the incrememt, you'll be able to see the difference either with or without optimizations.
In response to Mihail, this is a somewhat more portable version his code:
#include <cstdio>
#include <ctime>
using namespace std;
#define SOME_BIG_CONSTANT 100000000
#define OUTER 40
int main( int argc, char * argv[] ) {
int d = 0;
time_t now = time(0);
if ( argc == 1 ) {
for ( int n = 0; n < OUTER; n++ ) {
int i = 0;
while(i < SOME_BIG_CONSTANT) {
d += i++;
}
}
}
else {
for ( int n = 0; n < OUTER; n++ ) {
int i = 0;
while(i < SOME_BIG_CONSTANT) {
d += ++i;
}
}
}
int t = time(0) - now;
printf( "%d\n", t );
return d % 2;
}
The outer loops are there to allow me to fiddle the timings to get something suitable on my platform.
I don't use VC++ any more, so i compiled it (on Windows) with:
g++ -O3 t.cpp
I then ran it by alternating:
a.exe
and
a.exe 1
My timing results were approximately the same for both cases. Sometimes one version would be faster by up to 20% and sometimes the other. This I would guess is due to other processes running on my system.
Try to use while or do something with returned value, e.g.:
#define SOME_BIG_CONSTANT 1000000000
int _tmain(int argc, _TCHAR* argv[])
{
int i = 1;
int d = 0;
DWORD d1 = GetTickCount();
while(i < SOME_BIG_CONSTANT + 1)
{
d += i++;
}
DWORD t1 = GetTickCount() - d1;
printf("%d", d);
printf("\ni++ > %d <\n", t1);
i = 0;
d = 0;
d1 = GetTickCount();
while(i < SOME_BIG_CONSTANT)
{
d += ++i;
}
t1 = GetTickCount() - d1;
printf("%d", d);
printf("\n++i > %d <\n", t1);
return 0;
}
Compiled with VS 2005 using /O2 or /Ox, tried on my desktop and on laptop.
Stably get something around on laptop, on desktop numbers are a bit different (but rate is about the same):
i++ > 8xx <
++i > 6xx <
xx means that numbers are different e.g. 813 vs 640 - still around 20% speed up.
And one more point - if you replace "d +=" with "d = " you will see nice optimization trick:
i++ > 935 <
++i > 0 <
However, it's quite specific. But after all, I don't see any reasons to change my mind and think there is no difference :)
Perhaps you could just show the theoretical difference by writing out both versions with x86 assembly instructions? As many people have pointed out before, compiler will always make its own decisions on how best to compile/assemble the program.
If the example is meant for students not familiar with the x86 instruction set, you might consider using the MIPS32 instruction set -- for some odd reason many people seem to find it to be easier to comprehend than x86 assembly.
Ok, all this prefix/postfix "optimization" is just... some big misunderstanding.
The major idea that i++ returns its original copy and thus requires copying the value.
This may be correct for some unefficient implementations of iterators. However in 99% of cases even with STL iterators there is no difference because compiler knows how to optimize it and the actual iterators are just pointers that look like class. And of course there is no difference for primitive types like integers on pointers.
So... forget about it.
EDIT: Clearification
As I had mentioned, most of STL iterator classes are just pointers wrapped with classes, that have all member functions inlined allowing out-optimization of such irrelevant copy.
And yes, if you have your own iterators without inlined member functions, then it may
work slower. But, you should just understand what compiler does and what does not.
As a small prove, take this code:
int sum1(vector<int> const &v)
{
int n;
for(auto x=v.begin();x!=v.end();x++)
n+=*x;
return n;
}
int sum2(vector<int> const &v)
{
int n;
for(auto x=v.begin();x!=v.end();++x)
n+=*x;
return n;
}
int sum3(set<int> const &v)
{
int n;
for(auto x=v.begin();x!=v.end();x++)
n+=*x;
return n;
}
int sum4(set<int> const &v)
{
int n;
for(auto x=v.begin();x!=v.end();++x)
n+=*x;
return n;
}
Compile it to assembly and compare sum1 and sum2, sum3 and sum4...
I just can tell you... gcc give exactly the same code with -02.