Storing multiple structs in continuous memory area in C++

I want to store a state variable composed of multiple POD structs of various types in a single memory area. Since the combination of structs that makes up the state variable is decided at run time, I cannot just place them in a surrounding struct or class. I also want the number of memory allocations to be as low as possible.
What is the best way to do it? Is the following code legal and portable, or can it cause alignment errors on some platforms or with some compilers?
#include <cstdint> // for uint8_t
struct TestA {
int a;
short b;
};
struct TestB {
int c;
float d;
char e;
};
int main() {
void* mem = new uint8_t[sizeof(TestA) + sizeof(TestB)];
TestA* a1 = (TestA*) mem;
a1->a = a1->b = 42;
a1++;
TestB* b = (TestB*) a1;
b->c = 5;
b->d = 23.f;
b->e = 'e';
}

What you're trying to do is essentially "placement new." So all caveats apply here too. If the memory location is not aligned properly for the given type, then you're into undefined behavior. In your code:
a1++;
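is not guaranteed to give an address that's properly aligned for a TestB: it advances the pointer by sizeof(TestA), which only has to be a multiple of TestA's alignment, not of TestB's. So your code is not standard-conformant.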

Related

Coalescing two memory chunks in C++?

I'm trying to make my own memory allocator in C++ for educational purposes, and I have code like this:
class IntObj
{
public:
IntObj(): var_int(6) {}
void setVar(int var)
{
var_int = var;
}
int getVar()
{
return var_int;
}
virtual size_t getMemorySize()
{
return sizeof(*this);
}
int a = 8;
~IntObj()
{}
private:
int var_int;
};
And I'm stuck with how to have unused memory chunks merge. I'm trying to test it like this:
char *pz = new char[sizeof(IntObj) * 2]; //In MacOS, IntObj takes 16 bytes
char *pz2 = &pz[sizeof(IntObj)]; // Take address of 16-th cell
char *pz3 = new char[sizeof(IntObj) / 2]; //Array of 8 bytes
char **pzz = &pz2;
pzz[sizeof(IntObj)] = pz3; // Set address of cell 16 to the pz3 array
new (&pzz) IntObj; //placement new
IntObj *ss = reinterpret_cast<IntObj *>(&pzz);
cout << ss->a;
The output is 8 as expected. My questions:
Why is the output correct?
Is code like this correct? If not, are there any other ways to implement coalescing of two memory chunks?
UPDATE: All methods work correctly.
E.g. this works:
ss->setVar(54);
cout << ss->getVar();
The output is 54.
UPDATE 2: First of all, my task is not to request a new memory block from the OS when instantiating an object, but to hand one out from a linked list of free blocks (which were allocated when the program started). My problem is that I can have polymorphic objects of different sizes, and I don't know how to split memory blocks, or merge them (that is what I understand by merging or coalescing chunks), effectively when an allocation is requested.
There are a number of misunderstandings apparent here:
char *pz = new char[sizeof(IntObj) * 2]; // fine
char *pz2 = &pz[sizeof(IntObj)]; // fine
char *pz3 = new char[sizeof(IntObj) / 2]; // fine
char **pzz = &pz2; // fine
pzz[sizeof(IntObj)] = pz3; // bad
pzz is a pointer that is pointing to only a single char*, which is the variable pz2. Meaning that any access or modification past pzz[0] is undefined behavior (very bad). You're likely modifying the contents of some other variable.
new (&pzz) IntObj; // questionable
This is constructing an IntObj in the space of the variable pzz, not where pzz is pointing to. The constructor of course sets a to 8, thereby stomping on the contents of pzz (it won't be pointing to pz2 anymore). I'm uncertain whether this in and of itself is always undefined behavior (it certainly is when IntObj is larger than a pointer, as with the 16 bytes quoted in the question on a platform with 8-byte pointers), but using it certainly is:
IntObj *ss = reinterpret_cast<IntObj *>(&pzz); // bad
This violates the strict-aliasing rule. While the standard is generous for char* aliases, it does not allow char** to IntObj* aliases. This exhibits more undefined behavior.
If your question comes down to whether or not you can use two independent and contiguous blocks of memory as a single block then no, you cannot.

C++ understanding Unions and Structs

I've come to work on an ongoing project where some unions are defined as follows:
/* header.h */
typedef union my_union_t {
float data[4];
struct {
float varA;
float varB;
float varC;
float varD;
};
} my_union;
If I understand correctly, unions are for saving space, so sizeof(my_union_t) is the size of its largest member. What are the advantages of using the declaration above instead of this one:
typedef struct my_struct {
float varA;
float varB;
float varC;
float varD;
};
Won't the space allocated for both of them be the same?
And how can I initialize varA, varB, ... from my_union?
Unions are often used when implementing a variant-like object (a type field and a union of data types), or in implementing serialisation.
The way you are using a union is a recipe for disaster.
You are assuming that the struct in the union packs the floats with no gaps between them!
The standard guarantees that float data[4]; is contiguous, but not the structure elements. The only other thing you know is that the address of varA; is the same as the address of data[0].
Never use a union in this way.
As for your question: "And how can I initialize varA,varB... from my_union?". The answer is: access the structure members in the normal long-winded way, not via the data[] array.
Unions are not mainly for saving space, but for implementing sum types (for that, you put the union in some struct or class that also has a discriminating field holding the run-time tag). Also, I suggest you use a recent C++ standard, at least C++11, since it has better support for unions (e.g. it more easily permits unions of objects and their construction or initialization).
The advantage of using your union is being able to index the n-th float (with 0 <= n <= 3) as u.data[n].
To assign a union field in some variable declared as my_union u; just code e.g. u.varB = 3.14; which in your case has the same effect as u.data[1] = 3.14; (provided the struct has no padding between its members).
A good example of a well-deserved union is a mutable object which can hold either an int or a string (you could not use derived classes in that case):
#include <new>
#include <string>
#include <utility>
class IntOrString {
    bool isint;
    union {
        int num;         // active when isint is true
        std::string str; // active when isint is false
    };
public:
    IntOrString(int n=0) : isint(true), num(n) {}
    IntOrString(std::string s) : isint(false), str(std::move(s)) {}
    // str is not constructed yet in the copy/move constructors, so use placement new
    IntOrString(const IntOrString& o) : isint(o.isint)
    { if (isint) num = o.num; else new (&str) std::string(o.str); }
    IntOrString(IntOrString&& p) : isint(p.isint)
    { if (isint) num = p.num;
      else new (&str) std::string(std::move(p.str)); }
    // only the string needs an explicit destructor call
    ~IntOrString() { if (!isint) str.~basic_string(); }
    void set (int n)
    { if (!isint) str.~basic_string(); isint=true; num=n; }
    void set (std::string s)
    { if (isint) { new (&str) std::string(std::move(s)); isint=false; }
      else str = std::move(s); }
    bool is_int() const { return isint; }
    int as_int() const { return isint ? num : 0; }
    std::string as_string() const { return isint ? std::string() : str; }
};
Notice the explicit calls to the destructor of the str field. Notice also that you can safely use IntOrString in a standard container (std::vector<IntOrString>).
See also std::variant and std::optional, available since C++17 (std::optional is conceptually a tagged union with void).
BTW, in Ocaml, you simply code:
type intorstring = Integer of int | String of string;;
and you'll use pattern matching. If you wanted to make that mutable, you'll need to make a record or a reference of it.
You would do better to use unions in an idiomatic C++ way.
I think the best way to understand unions is just to give two common practical examples.
The first example is working with images. Imagine you have an RGB image that is arranged in a long buffer.
What most people would do is represent the buffer as a char* and then loop over it in steps of 3 to get the R, G and B values.
What you could do instead is make a little union and use that to loop over the image buffer:
union RGB
{
    char raw[3];
    struct
    {
        char R;
        char G;
        char B;
    } colors;
};
RGB* pixel = reinterpret_cast<RGB*>(buffer); // buffer is the char* image buffer
// pixel->colors.R == the red value of the first pixel.
Another very useful application of unions is working with registers and bit fields.
Let's say you have a 32-bit value that represents some hardware register, or something similar.
Sometimes, to save space, you split the 32 bits into bit fields, but you also want the whole register available as a single 32-bit value.
This saves the bit-shift calculations that a lot of programmers use for no reason at all.
union MySpecialRegister
{
    uint32_t reg; // 'register' is a C++ keyword, so it cannot be used as a member name
    struct
    {
        unsigned int firstField : 5;
        unsigned int somethingInTheMiddle : 21; // widths must total 32 (5 + 21 + 6)
        unsigned int lastField : 6;
    } data;
};
// Now you can read the raw register into the reg field,
// then read the individual fields through the inner data struct
The advantage is that with a union you can access the same memory in two different ways.
In your example the union contains four floats. You can access those floats as varA, varB... which might be more descriptive names or you can access the same variables as an array data[0], data[1]... which might be more useful in loops.
With a union you can also use the same memory for different kinds of data. You might find that useful for things like writing a function that tells you whether you are on a big-endian or little-endian CPU.
No, it is not for saving space. It is for the ability to represent some binary data as various data types.
For example:
#include <iostream>
#include <cstddef>
union Foo {
    int x;
    struct
    {
        unsigned char b0, b1, b2, b3;
    } y; // y must be a member, not just a nested type, to share storage with x
    char z[sizeof(int)];
};
int main()
{
    Foo bar;
    bar.x = 100;
    std::cout << std::hex; // show numbers in hexadecimal
    for (std::size_t i = 0; i < sizeof(int); i++)
    {
        std::cout << "0x" << (int)bar.z[i] << " "; // the cast to int prints values as numbers, not characters
    }
    return 0;
}
Output: 0x64 0x0 0x0 0x0
The same values are available through bar.y, as structure members rather than array elements. This is because my machine is little-endian. If it were big-endian, the output would be reversed: 0x0 0x0 0x0 0x64
You can achieve the same using reinterpret_cast:
#include <iostream>
#include <stdint.h>
int main()
{
int x = 100;
char * xBytes = reinterpret_cast<char*>(&x);
std::cout << std::hex; // to show number in hexadec repr;
for (size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)xBytes[i] << " "; // (int) is just to show values as numbers, not a characters
}
return 0;
}
This is useful, for example, when you need to read a binary file that was written on a machine with a different endianness than yours. You can access the values as a byte array and swap the bytes as you wish.
It is also useful when you have to deal with bit fields, but that's a whole different story :)
First of all: avoid unions where the same memory is accessed as different types!
Unions do not save space as such. They only define multiple names for the same memory area! And you can only store one of the elements in a union at a time.
if you have
union X
{
int x;
char y[4];
};
You can store an int OR 4 chars, but not both! The general problem is that nobody knows which data is actually stored in a union. If you store an int and read the chars, the compiler will not check that, and there is no run-time check either. A common solution is to wrap the union in a struct with an additional member, an enum that records which type is currently stored.
struct Y
{
enum { IS_CHAR, IS_INT } tinfo;
union
{
int x;
char y[4];
};
};
But in C++ you should rather use classes or structs that derive from a (possibly empty) parent class, like this:
class Base
{
};
class Int_Type: public Base
{
...
int x;
};
class Char_Type: public Base
{
...
char y[4];
};
So you can define pointers to Base which can actually hold an Int_Type or a Char_Type for you. With virtual functions you can access the members in an object-oriented way.
As already mentioned in Basile's answer, a useful case can be access via different names to the same type.
union X
{
    struct
    {
        float a;
        float b;
    } data; // data must be a member, not just a nested type, to share storage with arr
    float arr[2];
};
This allows different ways of accessing the same data with the same type. Using different types stored in the same memory should be avoided altogether!

Pass by Reference with Pointers

Why does this program only work when I declare a and b and pass their addresses?
I want to call it without declaring a and b, for example:
numChange(10,15);
Is this possible?
#include <iostream>
using namespace std;
void numChange(int *x,int *y)
{
*x = 99;
*y = 77;
}
int main()
{
numChange(10,15);
//int a=10;
//int b=15;
//numChange(&a,&b);
cout<<a<<" , "<<b<<endl;
return 0;
}
Because you have defined your function to receive pointers, but when you call that function you are trying to pass ints.
The compiler is expecting memory addresses and you are trying to pass constants.
It does not make sense: you are trying to do something like 10 = 99; 15 = 77;
numChange(10,15);
//int a=10;
//int b=15;
It seems that you are hoping that a = 10 = 99 and b = 15 = 77;
If this were possible, it would mean that after the call numChange(10,15); I could never make a variable actually have the value 10, because 10 would be "pointing" to 99 (it is not).
Recall: a pointer is a value containing a location in memory.
This:
int a, b;
...
a = b;
copies the integer stored at the memory location reserved for 'b' to
the memory location reserved for 'a'.
This:
int *a, b;
...
a = &b;
stores the location of 'b' in 'a'. Following it with this:
*a = 42;
will store 42 in the memory location stored in 'a', which is the
variable 'b'.
Now, let's look at your code. This:
void numChange(int *x,int *y)
tells the compiler that 'numChange' will be called with two
pointers--that is, memory addresses. This part:
*x = 99;
*y = 77;
then stores two integers at the locations given in 'x' and 'y'.
When you call:
numChange(10,15);
the arguments are integers instead of memory locations. Under the
hood, memory locations are also integers, but C++ has no implicit
conversion from int to int*, so the compiler rejects this call. (Older
C compilers typically accept it with a warning.) If you forced it
through with casts, you would effectively be doing this:
numChange((int *)10, (int*)15);
(This is almost never a good idea.)
That call would tell 'numChange' that there are integer variables at
memory addresses 10 and 15, and 'numChange' would carry on and store
integers at those memory locations. Since there aren't variables
(that we know of) at those locations, this is undefined behavior: it
may overwrite some other data somewhere, or simply crash.
Meanwhile, this code:
int a=10;
int b=15;
numChange(&a,&b);
creates two integer variables and then passes their addresses in
memory to 'numChange'. BTW, you don't actually need to initialize
them. This works too:
int a, b;
numChange(&a,&b);
What's important is that the variables are created (and the compiler
sets aside RAM for them) and that their locations are then passed to
'numChange'.
(One aside: I'm treating variables as always being stored in RAM.
It's safe to think of them this way but modern compilers will try to
store them in CPU registers as much as possible for performance
reasons, copying them back into RAM when needed.)

Why does assignment to an element of an AVX-Vector-wrapper-class-object-array provoke access violation errors?

I am trying to do some vector work and wrote a wrapper for the __m256d data type from immintrin.h so that I can use overloaded operators.
The following example should give you a basic idea.
Class definition
#include <immintrin.h>
using namespace std;
class vwrap {
public:
__m256d d;
vwrap(void) {
this->d = _mm256_set_pd(0.0,0.0,0.0,0.0);
}
void init (const double &a, const double &b, const double &c) {
this->d = _mm256_set_pd(0.0,c,b,a);
}
};
Array of vwrap objects
Let's imagine an array of vwrap that is allocated dynamically:
vwrap *a = (vwrap*) malloc(sizeof(vwrap)*2);
Access violation errors
Calling a member function of a vwrap object that uses an _mm256 set function provokes an access violation error.
a[0].init(1.3,2.3,1.2);
The same thing happens when assigning to d from an _mm256 set function
(assigning another __m256d object doesn't work either):
a[0].d = _mm256_set_pd(1,2,3,4);
Copying data from another object doesn't work either.
vwrap b;
a[0].d = b.d;
Stuff that works
The m256d-object can be manipulated without any problems:
a[0].d.m256d_f64[0] = 1.0;
a[0].d.m256d_f64[1] = 2.0;
a[0].d.m256d_f64[2] = 3.0;
a[0].d.m256d_f64[3] = 4.0;
The assignments are working in case of a normal class instance:
vwrap b,c;
__m256d t = _mm256_set_pd(1,2,3,5);
b.d = _mm256_set_pd(1,2,3,4);
b.d = t;
b.d = c.d;
I don't understand the problem.
Why can't I use the _mm256 functions (or assign a __m256d object) in the case of a class array?
My only idea is to avoid the _mm256 functions and manipulate the double values directly.
But this is not what I intended to do.
It's likely an alignment problem. __m256d needs to be aligned on a 32-byte boundary. You shouldn't use malloc when alignment is a concern; use new or an aligned malloc.
The reason your stack-allocated variables work correctly is that the compiler aligns them properly, because it knows they need to be aligned. When you call malloc, on the other hand, the runtime has no way of knowing what you plan to store in the memory it gives you. Therefore, you need to either request the alignment explicitly using an aligned malloc, or use type-aware allocation, which is what new is for.
Changing
vwrap *a = (vwrap*) malloc(sizeof(vwrap)*2);
to either
vwrap *a = new vwrap[2];
or
vwrap *a = (vwrap*) _aligned_malloc(sizeof(vwrap)*2, 32);
should work.
EDIT: After trying this out on Windows with GCC 4.6.1 (compiler switch -march=corei7-avx), it seems new doesn't respect the alignment requirement there. Changing the new call to use _aligned_malloc works. (Memory obtained from _aligned_malloc must be released with _aligned_free, not free.)

Initialization of c++ heap objects

I'm wondering whether built-in types in objects created on the heap with new are initialized to zero. Is that mandated by the standard, or is it compiler-specific?
Given the following code:
#include <iostream>
using namespace std;
struct test
{
int _tab[1024];
};
int main()
{
test *p(new test);
for (int i = 0; i < 1024; i++)
{
cout << p->_tab[i] << endl;
}
delete p;
return 0;
}
When run, it prints all zeros.
You can choose whether you want default-initialisation, which leaves fundamental types (and POD types in general) uninitialised, or value-initialisation, which zero-initialises fundamental (and POD) types.
int * garbage = new int[10]; // No initialisation
int * zero = new int[10](); // Initialised to zero.
This is defined by the standard.
No, if you do something like this:
int *p = new int;
or
char *p = new char[20]; // array of 20 bytes
or
struct Point { int x; int y; };
Point *p = new Point;
then the memory pointed to by p will have indeterminate/uninitialized values.
However, if you do something like this:
std::string *pstring = new std::string();
Then you can be assured that the string will have been initialized as an empty string, but that is because of how class constructors work, not because of any guarantees about heap allocation.
It's not mandated by the standard. The memory for the primitive-type members may contain whatever value was last left in that memory.
Some compilers may choose to initialize the bytes; many do in debug builds. They fill the memory with a known byte pattern to give you a hint while debugging that it wasn't initialized by your program code.
Using calloc will return bytes initialized to 0, but that is a property of calloc, not something new guarantees. calloc has been around since C, along with malloc. However, you will pay some run-time overhead for the zeroing.
The advice given previously about using std::string is quite sound because, after all, you're using the standard library and getting the benefits of class construction/destruction behaviour. In other words, the less you have to worry about, such as initialization of data, the less can go wrong.