Performance of Structs vs Classes - C++

I wonder if there are performance comparisons of classes and C-style structs in C++, compiled with g++ -O3. Is there any benchmark or comparison on this? I've always thought of C++ classes as heavier, and possibly slower, than structs (compile time isn't very important for me; run time is more crucial). I'm going to implement a B-tree; should I implement it with classes or with structs for the sake of performance?

At the runtime level there is no difference between structs and classes in C++ at all, so it makes no performance difference whether you use struct A or class A in your code.
A separate matter is that using certain features -- constructors, destructors, and virtual functions -- can carry some performance cost (but if you use them, you probably need them anyway). You can use them equally well inside a struct or a class.
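For instance, a trivial sketch (the types are my own illustration) showing that these features are equally available under either keyword:

struct Base {                  // a struct can have virtual functions too
    virtual ~Base() = default;
    virtual int f() const { return 1; }
};

class Derived : public Base {  // and a class can derive from a struct
public:
    int f() const override { return 2; }
};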
In this document you can read about other performance-related subtleties of C++.

In C++, struct is syntactic sugar for classes whose members are public by default.
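To make that concrete, a minimal sketch: these two definitions declare equivalent types, differing only in the access specifier that must be spelled out.

// These two definitions are interchangeable:
struct A { int x; };   // members are public by default in a struct

class B {
public:                // must be written explicitly in a class
    int x;
};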

My honest opinion: don't worry about performance until it actually shows itself to be a problem; then profile your code. Premature optimization is the root of all evil. But, as others have said, there is no difference between a struct and a class in C++ at runtime.

Focus on creating an efficient data structure and efficient logic to manipulate the data structure. C++ classes are not inherently slower than C-style structs, so don't let that limit your design.

AFAIK, from a performance point of view, they are equivalent in C++.
Their difference is syntactic sugar: struct members are public by default, for example.
my2c

Just do an experiment, people!
Here is the code for the experiment I designed:
#include <iostream>
#include <string>
#include <ctime>
using namespace std;

class foo {
public:
    void foobar(int k) {
        for (; k > 0; k--) {
            cout << k << endl;
        }
    }
    void initialize() {
        accessor = "asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf";
    }
    string accessor;
};

struct bar {
public:
    void foobar(int k) {
        for (; k > 0; k--) {
            cout << k << endl;
        }
    }
    void initialize() {
        accessor = "asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf";
    }
    string accessor;
};

int main() {
    clock_t timer1 = clock();
    for (int j = 0; j < 200; j++) {
        foo f;
        f.initialize();
        f.foobar(7);
        cout << f.accessor << endl;
    }
    clock_t classstuff = clock();

    clock_t timer2 = clock();
    for (int j = 0; j < 200; j++) {
        bar b;
        b.initialize();
        b.foobar(7);
        cout << b.accessor << endl;
    }
    clock_t structstuff = clock();

    cout << "struct took " << structstuff - timer2 << endl;
    cout << "class took " << classstuff - timer1 << endl;
    return 0;
}
On my computer, the struct loop took 1286 clock ticks and the class loop took 1450 clock ticks. To answer your question, in this run the struct came out slightly faster -- though with cout dominating both loops, a difference that small is well within measurement noise. Either way it shouldn't matter, because computers are so fast these days.

Well, actually structs can be more efficient than classes in both time and memory (e.g. arrays of structs vs arrays of objects).
There is a huge difference in efficiency in some cases. While the overhead of an object might not seem like very much, consider an array of objects and compare it to an array of structs. Assume the data structure contains 16 bytes of data, the array length is 1,000,000, and this is a 32-bit system.
For an array of objects the total space usage is:
8 bytes array overhead + (4-byte pointer size × 1,000,000) + ((8 bytes overhead + 16 bytes data) × 1,000,000) = 28 MB
For an array of structs, the results are dramatically different:
8 bytes array overhead + (16 bytes data × 1,000,000) = 16 MB
With a 64-bit process, the object array takes over 40 MB while the struct array still requires only 16 MB.
See this article for details.


Zero sized array in struct managed by shared pointer

Consider the following structure:
struct S
{
    int a;
    int b;
    double arr[0];
} __attribute__((packed));
As you can see, this structure is packed and has a zero-sized array at the end.
I'd like to send this as binary data over the network (assume I took care of endianness).
In C/C++ I could just use malloc to allocate as much space as I want and free it later.
I'd like this memory to be handled by std::shared_ptr.
Is there a straightforward way of doing so without special hacks?

I'd like this memory to be handled by std::shared_ptr. Is there a straightforward way of doing so without special hacks?
Sure, there is:
#include <cstdlib>  // malloc, free
#include <memory>   // std::shared_ptr
#include <new>      // placement new
using std::shared_ptr;

shared_ptr<S> make_buffer(size_t s)
{
    auto buffer = malloc(s);                  // allocate as usual
    auto release = [](void* p) { free(p); };  // a deleter that frees the raw buffer
    shared_ptr<void> sptr(buffer, release);   // make the buffer shared
    return { sptr, new(buffer) S };           // an aliasing pointer to the S within
}
This works with any objects that are placed in a malloced buffer, not just when there are zero-sized arrays, provided that the destructor is trivial (performs no action) because it is never called.
The usual caveats about zero-sized arrays and packed structures still apply as well, of course.
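A hypothetical call site, to show how the aliased pointer is used (the sizes are only illustrative):

// room for the header plus 10 trailing array elements
auto msg = make_buffer(sizeof(S) + 10 * sizeof(double));
msg->a = 1;
msg->arr[0] = 3.14;  // writes into the over-allocated tail
// the buffer is freed when the last shared_ptr copy is destroyed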
double arr[0];
} __attribute__((packed));
Zero-sized arrays are not allowed as data members (nor as any other kind of variable) in C++. Furthermore, there is no such attribute as packed in C++; it is a language extension (and as such, it may be considered a special hack). __attribute__ itself is a language extension. The standard syntax for attributes uses nested square brackets, like this: [[example_attribute]].
I'd like to send this as binary data over the network
You probably should properly serialise the data. There are many serialisation specifications, although none of them is universal and none of them is implemented in the C++ standard library. Indeed, there isn't a standard API for network communication either.
A straightforward solution is to pick an existing serialisation format and use an existing library that implements it.
First, let me explain again why I have this packed structure: it is used for serialization of data over the network, so there's a header file with all the network packet structures.
I know it generates bad assembly due to alignment issues, but I guess that problem persists with regular serialization too (copying to a char* buffer with memcpy).
Zero-sized arrays are supported by both gcc and clang, which I use.
Here's an example of a full program with a solution to my question, and its output (same output for gcc and g++), compiled with the -O3 -std=c++17 flags:
#include <iostream>
#include <memory>
#include <type_traits>
#include <cstddef>

struct S1
{
    ~S1() { std::cout << "deleting s1" << std::endl; }
    char a;
    int b;
    int c[0];
} __attribute__((packed));

struct S2
{
    char a;
    int b;
    int c[0];
};

int main(int argc, char **argv)
{
    auto s1 = std::shared_ptr<S1>(
        static_cast<S1 *>(::operator new(sizeof(S1) + sizeof(int) * 1e6)));
    std::cout << "is standard: " << std::is_standard_layout<S1>::value << std::endl;
    for (int i = 0; i < 1e6; ++i)
    {
        s1->c[i] = i;
    }
    std::cout << sizeof(S1) << ", " << sizeof(S2) << std::endl;
    std::cout << offsetof(S1, c) << std::endl;
    std::cout << offsetof(S2, c) << std::endl;
    return 0;
}
This is the output:
is standard: 1
5, 8
5
8
deleting s1
Is there anything wrong with doing this?
I verified with valgrind that all allocations and frees work properly.

Generic way to generate random struct for testing

I've been given the task of refactoring a bunch of C++ code that has a lot of math and no explanation of what it does.
In order to do that, I've set up a bunch of automated tests that, given random data, compare the old and new code's results.
The thing is that, while it is simple to generate a random vector of any size, I have a lot of structs with many public fields (> 20), and I'm a bit tired of writing custom functions to fill them.
One could think of using some kind of script to parse the definitions and auto-build the corresponding generator functions.
Do you think this is a good idea?
Is there anything like that already done?
If you have only Plain Old Data, a struct is, roughly, merely a blob of memory with some meaning to the compiler.
This means you can treat it as such, and simply fill it with random bytes, using a union:
struct a {
    int i;
    char c;
    float f;
    double d;
};

union u {
    char arr[sizeof(a)];
    a record;
};
#include <iostream>

char generateRandomChar(); // implement some random char generation

int main() {
    u foo;
    for (char& c : foo.arr) {
        c = generateRandomChar();
    }
    std::cout << "i:" << foo.record.i
              << "\nc:" << foo.record.c
              << "\nf:" << foo.record.f
              << "\nd:" << foo.record.d;
}
Technically, this is undefined behavior (type punning through a union is not sanctioned by the C++ standard). In practice, it is well defined in most compilers.

Why does C not have pass by address/reference without pointers?

Consider the trivial test of this swap function in C++ which uses pass by pointer.
#include <iostream>
using std::cout;
using std::endl;
void swap_ints(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

int main(void)
{
    int a = 1;
    int b = 0;
    cout << "a = " << a << "\t" << "b = " << b << "\n\n";
    swap_ints(&a, &b);
    cout << "a = " << a << "\t" << "b = " << b << endl;
    return 0;
}
Does this program use more memory than if I had passed by reference, as in this function declaration:
void swap_ints(int &a, int &b)
{
    int temp = a;
    a = b;
    b = temp;
}
Does this pass-by-reference version of the C++ function use less memory, by not needing to create the pointer variables?
And does C not have the same "pass-by-reference" ability that C++ does? If so, why not, since it would mean more memory-efficient code, right? If not, what is the pitfall that keeps C from adopting this ability? I suppose what I am not considering is that C++ probably creates pointers behind the scenes to achieve this functionality. Is that what the compiler actually does -- so C++ really has no true advantage besides neater code?
The only way to be sure would be to examine the code the compiler generated for each and compare the two to see what you get.
That said, I'd be a bit surprised to see a real difference (at least when optimization was enabled), at least for a reasonably mainstream compiler. You might see a difference for a compiler on some really tiny embedded system that hasn't been updated in the last decade or so, but even there it's honestly pretty unlikely.
I should also add that in most cases I'd expect code for such a trivial function to be generated inline, so there would be no function call or parameter passing involved at all. In a typical case, it's likely to come down to nothing more than a couple of loads and stores.
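If you want to check for yourself, here is a minimal sketch of that comparison (the file name and flags are only an illustration): put both versions in one translation unit and inspect the assembly.

// swap_compare.cpp -- compile with: g++ -O2 -S swap_compare.cpp
// then inspect swap_compare.s; with optimization enabled, mainstream
// compilers typically emit identical bodies for the two functions.
void swap_by_pointer(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

void swap_by_reference(int &a, int &b)
{
    int temp = a;
    a = b;
    b = temp;
}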
Don't confuse counting variables in your code with counting memory used by the processor. C++ has many abstractions that hide the inner workings of the compiler in order to make things simpler and easier for a human to follow.
By design, C does not have quite as many levels of abstractions as C++.

Is it more efficient to declare variables late?

Is it more memory-efficient, or possibly more computationally efficient, to declare variables late?
Example:
int x;
code
..
.
.
. x is able to be used in all this code
.
actually used here
.
end
versus
code
..
.
.
.
int x;
actually used here
.
end
Thanks.
Write whatever logically makes most sense (usually closer to use). The compiler can and will spot things like this and produce code that makes the most sense for your target architecture.
Your time is far more valuable than trying to second guess the interactions of the compiler and the cache on the processor.
For example on x86 this program:
#include <iostream>

int main() {
    for (int j = 0; j < 1000; ++j) {
        std::cout << j << std::endl;
    }
    int i = 999;
    std::cout << i << std::endl;
}
compared to:
#include <iostream>

int main() {
    int i = 999;
    for (int j = 0; j < 1000; ++j) {
        std::cout << j << std::endl;
    }
    std::cout << i << std::endl;
}
}
compiled with:
g++ -Wall -Wextra -O4 -S measure.cc
g++ -Wall -Wextra -O4 -S measure2.cc
inspecting the output with diff measure*.s gives only:
< .file "measure2.cc"
---
> .file "measure.cc"
Even for:
#include <iostream>

namespace {
    struct foo {
        foo() { }
        ~foo() { }
    };
}

std::ostream& operator<<(std::ostream& out, const foo&) {
    return out << "foo";
}

int main() {
    for (int j = 0; j < 1000; ++j) {
        std::cout << j << std::endl;
    }
    foo i;
    std::cout << i << std::endl;
}
vs
#include <iostream>

namespace {
    struct foo {
        foo() { }
        ~foo() { }
    };
}

std::ostream& operator<<(std::ostream& out, const foo&) {
    return out << "foo";
}

int main() {
    foo i;
    for (int j = 0; j < 1000; ++j) {
        std::cout << j << std::endl;
    }
    std::cout << i << std::endl;
}
the results of the diff of the assembly produced by g++ -S are still identical except for the file name, because there are no side effects. If there were side effects, they would dictate where you constructed the object: at what point did you want the side effects to occur?
For fundamental types such as int, it does not matter from a performance point of view. For class types, a variable definition includes a constructor invocation as well, which can be omitted entirely if the control flow skips that variable. Furthermore, both for fundamental and class types, a definition should be delayed at least to the point where there is sufficient information to make the variable meaningful. For non-default-constructible class types this is mandatory; for other types it may not be, but defining them early forces you to work with uninitialized or sentinel states (like -1 or other invalid values). You should define your variables as late as possible, within the smallest scope possible; it may not always matter from a performance point of view, but it is always important design-wise.
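A small sketch of the constructor point (the class and condition are hypothetical):

#include <vector>

struct Expensive {
    std::vector<int> data;
    Expensive() : data(1000000) {}  // the real work happens here
};

void process(bool rarely_true) {
    if (!rarely_true)
        return;   // declared late, Expensive is never constructed on this path
    Expensive e;  // the constructor runs only when actually needed
    // ... use e ...
}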
In general, you should declare variables where and when you use them. It improves readability, maintainability and, for purely practical reasons, memory locality.
Even if you have a large object and you declare it outside or inside a loop body, the only difference is going to be between construction and assignment; the actual memory allocation will be virtually identical, since contemporary allocators are very good at short-lived allocations.
You may even consider creating new, anonymous scopes if you have a small part of code whose variables aren't required afterwards (though that usually indicates you're better off with a separate function).
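For instance, a hypothetical anonymous scope:

#include <fstream>
#include <string>

int main() {
    std::string header;
    {   // anonymous scope: file and buf live only as long as they are needed
        std::ifstream file("data.txt");
        std::string buf;
        std::getline(file, buf);
        header = buf;
    }   // file and buf are destroyed here
    // ... only header remains in scope ...
}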
So basically, write the way it makes most logical sense, and you will usually also end up with the most efficient code; or at least you won't do any worse than you would by declaring everything at the top.
It is neither memory nor computationally more efficient either way for simple types. For more complex types, it may be more efficient to have the contents hot in the cache (from being constructed) near where they are used. It can also minimize the amount of time the memory remains allocated.

Can I define a type based on the result of some calculation?

I perform some calculations and, based on the result, I would like to use either a short int or an int for some of my data for the remainder of the program. Can this be done sensibly in C or C++ (and if so, how)? I don't really care about the amount of memory used (i.e., 2 or 4 bytes); my primary aim is to access generic arrays as if they contained data of this type. I would like to avoid code such as the following:
char s[128];

if (result of preliminary calculations was A)
    *((int*) s) = 50;
else
    *((short int*) s) = 50;
to set the first 4 or 2 bytes of s. A conditional global typedef would be ideal:
if (result of preliminary calculations was A)
    typedef int mytype;
else
    typedef short int mytype;
I am not that familiar with C++ class templates (yet). Do they apply to my problem? Would I have to change the declarations throughout my program (to myclass< > and myclass< >*)?
Many thanks!
Frank
Edit: The values may not always be aligned; i.e., an int can start at position 21. Thanks for the answers.
For plain C, you could do this using function pointers:
static union { int s_int[32]; short s_short[64]; char s_char[128]; } s;

static void set_s_int(int i, int n)
{
    s.s_int[i] = n;
}

static int get_s_int(int i)
{
    return s.s_int[i];
}

static void set_s_short(int i, int n)
{
    s.s_short[i] = n;
}

static int get_s_short(int i)
{
    return s.s_short[i];
}

static void (*set_s)(int, int);
static int (*get_s)(int);
Set them once based on the preliminary calculations:
if (result of preliminary calculations was A)
{
    set_s = set_s_int;
    get_s = get_s_int;
}
else
{
    set_s = set_s_short;
    get_s = get_s_short;
}
Then just use the function pointers in the rest of the program:
set_s(0, 50); /* Set entry 0 in array to 50 */
Your file writing function can directly reference s or s.s_char depending on how it works.
In C and C++, all type information is fixed at compile time. So no, you cannot do this.
If the result of the preliminary calculations can be found at compile time, then this can work. Here are some simple examples to show how this can work. To do more complicated examples, see http://en.wikipedia.org/wiki/Template_metaprogramming
#include <iostream>
using namespace std;

template<int x> struct OddOrEven { typedef typename OddOrEven<x-2>::t t; };
template<> struct OddOrEven<0> { typedef short t; };
template<> struct OddOrEven<1> { typedef int t; };

template<bool makeMeAnInt> struct X { typedef short t; };
template<> struct X<true> { typedef int t; };

int main(void) {
    cout << sizeof(X<false>::t) << endl;
    cout << sizeof(X<true>::t) << endl;
    cout << sizeof(OddOrEven<0>::t) << endl;
    cout << sizeof(OddOrEven<1>::t) << endl;
    cout << sizeof(OddOrEven<2>::t) << endl;
    cout << sizeof(OddOrEven<3>::t) << endl;
    cout << sizeof(OddOrEven<4>::t) << endl;
    cout << sizeof(OddOrEven<5>::t) << endl;
}
I think the above is standard C++, but if not I can tell you it works with g++ (Debian 4.3.2-1.1) 4.3.2.
I think your main problem is how you plan to read the data from s later on if you don't know what type to read.
If you have that part covered, you can use a union:
union myintegers
{
    int ints[32];
    short shorts[64];
};
Now simply use the type you want.
myintegers s;

if (result of preliminary calculations was A)
    s.ints[0] = 50;
else
    s.shorts[0] = 50;
As a step further, you could wrap it all in a class that is constructed from the result of the preliminary calculations and overloads operator* and operator[] to store into one array or the other.
But are you sure you want any of that?
In the current C++ standard (C++03), you can't.
In fact you can use some advanced metaprogramming tricks, but they will not help most of the time.
In the next standard (C++0x, most likely C++11 in the end), you will be able to use the keyword decltype to get the type of an expression. If you're using VC10 (VS2010) or GCC 4.4 or more recent, you already have the feature available.
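A minimal decltype sketch (note that the type is still resolved at compile time, not from a runtime result):

#include <type_traits>

int main() {
    int a = 1;
    short b = 2;
    decltype(a + b) sum = a + b;  // sum is int: the type of the expression a + b
    static_assert(std::is_same<decltype(sum), int>::value,
                  "deduced at compile time");
}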
You could abuse templates for this purpose. Any code that's subject to the decision would have to be templated based on the int type. One branch would instantiate the int version, the other would instantiate the short int version. This is probably a bad idea*.
Edit
*Well, it's only a bad idea to apply this to your overall architecture. If you have a particular data type that encapsulates the varied behavior, a template should work just fine.
Here's a variation on Aaron McDaid's answer to illustrate its use with conditions:
#include <iostream>
#include <string>
using namespace std;

template<int x> struct OddOrEven { typedef typename OddOrEven<x-2>::t t; };
template<> struct OddOrEven<0> { typedef short t; };
template<> struct OddOrEven<1> { typedef int t; };

int main() {
    cout << "int or short? ";
    string which;
    cin >> which;
    if (which.compare("int") == 0)
        cout << sizeof(OddOrEven<1>::t) << endl;
    else if (which.compare("short") == 0)
        cout << sizeof(OddOrEven<0>::t) << endl;
    else
        cout << "Please answer with either int or short next time." << endl;
    return 0;
}
This is a code snippet from a project I had a while back.
void* m_pdata;

if (e_data_type == eU8C1) {
    m_pdata = new unsigned char[size_x * size_y];
}
if (e_data_type == eU16C1) {
    m_pdata = new unsigned short[size_x * size_y];
}
I hope it can help you
Since your stated goal is to store information efficiently on disk, you should learn to stop writing memory images of C/C++ data structures to disk directly and instead serialize your data. Then you can use any of a number of forms of variable-length coding ("vlc") to get the effect you want. The simplest is a coding with 7 bits per byte where the 8th bit is a continuation flag indicating that the value is continued in the next byte. So 259 would be stored as (binary, with continuation bit marked by spacing and byte boundaries marked by ;):
1 0000010 ; 0 0000011
Alternatively you could use the head nibble to signal the number of bytes that follow, or use a scheme similar to UTF-8 with slightly more overhead but stricter resynchronization guarantees. There are also vlcs which are designed to be parsable and easily resynchronized when reading either forward or in reverse.
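A minimal sketch of that 7-bits-per-byte coding, using the same most-significant-group-first byte order as the 259 example above (the function name is my own):

#include <cstdint>
#include <vector>

// Encode an unsigned value, most significant 7-bit group first;
// every byte except the last has its high (continuation) bit set.
std::vector<uint8_t> vlc_encode(uint64_t value) {
    std::vector<uint8_t> groups;
    do {
        groups.insert(groups.begin(), value & 0x7F); // take the low 7 bits
        value >>= 7;
    } while (value != 0);
    for (size_t i = 0; i + 1 < groups.size(); ++i)
        groups[i] |= 0x80;                           // set continuation flags
    return groups;
}

// vlc_encode(259) yields {0x82, 0x03}:
// 1 0000010 ; 0 0000011  ->  2 * 128 + 3 = 259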