Arranging global/static objects sequentially in memory


In C++, is it possible to force the compiler to place a series of global or static objects at sequential memory positions? Or is this the default behavior? For example, if I write…
MyClass g_first("first");
MyClass g_second("second");
MyClass g_third("third");
… will these objects occupy a continuous chunk of memory, or is the compiler free to place them anywhere in the address space?

The compiler can do as it pleases when it comes to placing static objects in memory. If you want better control over how your globals are placed, consider writing a struct that encompasses all of them. That guarantees that the objects are laid out in declaration order inside one contiguous object (the compiler may still insert padding between members).
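A minimal sketch of the struct approach, assuming MyClass can be constructed from a string literal (the class body below is a hypothetical stand-in for the question's MyClass). Because the three objects are members of one aggregate, they live inside a single object, in declaration order:

```cpp
// Hypothetical stand-in for the question's MyClass.
class MyClass {
public:
    explicit MyClass(const char* name) : name_(name) {}
    const char* name() const { return name_; }

private:
    const char* name_;
};

// One aggregate holding all three objects: non-static data members with the
// same access are allocated at increasing addresses, in declaration order.
struct Globals {
    MyClass first;
    MyClass second;
    MyClass third;
};

// Aggregate initialization: each member is initialized from its own MyClass.
Globals g_globals = { MyClass("first"), MyClass("second"), MyClass("third") };
```

Code then refers to g_globals.first, g_globals.second and so on instead of three separate globals.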

Placing specific variables or groups of variables into a particular memory segment is not a standard C++ feature.
But some compilers support special methods to do this, especially in embedded systems. For example, in Keil I believe you can use the _at_ keyword to place a variable at a particular address.

The way to force objects to be in a contiguous piece of memory is to put them into an array.
If you use the built-in array type without an initializer list, each element is created by its default constructor (you can still change the values later):
MyClass my_globals[3];
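A built-in array can also be brace-initialized, which removes the need for a default constructor. A sketch, again with a hypothetical stand-in for MyClass:

```cpp
#include <cstddef>

// Hypothetical stand-in for the question's MyClass (note: no default constructor).
class MyClass {
public:
    explicit MyClass(const char* name) : name_(name) {}
    const char* name() const { return name_; }

private:
    const char* name_;
};

// Elements of a built-in array are contiguous and in index order; the brace
// initializer constructs each one, and the compiler deduces the length (3).
MyClass my_globals[] = { MyClass("first"), MyClass("second"), MyClass("third") };

constexpr std::size_t my_globals_count = sizeof my_globals / sizeof my_globals[0];
```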
If you use a dynamic array (called std::vector in C++), you are more flexible:
#include <vector>
namespace {
    typedef std::vector<MyClass> my_globals_type;
    my_globals_type init_my_globals()
    {
        my_globals_type globals;
        globals.push_back(MyClass("first"));
        globals.push_back(MyClass("second"));
        globals.push_back(MyClass("third"));
        return globals;
    }
    my_globals_type my_globals = init_my_globals();
}
Note that global variables are usually frowned upon. And rightly so.

Yes, some compilers now contain optimizations that will automatically do something like this for you:
e.g.
Automatic pool allocation: improving performance by controlling data structure layout in the heap
Restructuring field layouts for embedded memory systems
MPADS: memory-pooling-assisted data splitting

Related

Is there a way to distinguish what type of memory used by the object instance?

If I have this code:
#include <assert.h>
class Foo {
public:
bool is_static();
bool is_stack();
bool is_dynamic();
};
Foo a;
int main()
{
Foo b;
Foo* c = new Foo;
assert( a.is_static() && !a.is_stack() && !a.is_dynamic());
assert(!b.is_static() && b.is_stack() && !b.is_dynamic());
assert(!c->is_static() && !c->is_stack() && c->is_dynamic());
delete c;
}
Is it possible to implement the is_stack, is_static and is_dynamic methods so that these assertions hold?
Example of use: counting how much stack memory particular objects of type Foo use, without counting static or dynamic memory.

This cannot be done using standard C++ facilities, which take pains to ensure that objects work the same way no matter how they are allocated.
You can do it, however, by asking the OS about your process memory map, and figuring out what address range a given object falls into. (Be sure to use uintptr_t for arithmetic while doing this.)
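As a rough, Linux-only illustration of the memory-map approach: /proc/self/maps lists the process's mappings, and the main thread's stack is tagged "[stack]". The helper name is_on_main_stack is hypothetical; it ignores other threads' stacks and other operating systems:

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Linux-specific sketch: reports whether `p` lies inside the mapping that
// /proc/self/maps labels "[stack]", i.e. the main thread's stack only.
bool is_on_main_stack(const void* p) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        if (line.find("[stack]") == std::string::npos) {
            continue;
        }
        // Each line begins with "start-end" as hexadecimal addresses.
        std::istringstream fields(line);
        std::uintptr_t start = 0;
        std::uintptr_t end = 0;
        char dash = 0;
        fields >> std::hex >> start >> dash >> end;
        return addr >= start && addr < end;  // half-open range [start, end)
    }
    return false;  // no "[stack]" entry found, or /proc is unavailable
}
```

The same parsing loop could look for "[heap]" or for the executable's data mappings, but malloc may also serve memory from anonymous mmap regions, so a heap check built this way is less reliable.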

Scroll down to the second answer that gives a wide array of available options depending on the Operating System:
How to determine CPU and memory consumption from inside a process?
I would also recommend reading this article on Tracking Memory Allocations in C++:
http://www.almostinfinite.com/memtrack.html
Just be aware that it's a ton of work.

While the intention is good here, the approach is not the best.
Consider a few things:
On the stack you allocate temporary variables for your methods. You don't always have to worry about how much stack you use, because the lifetime of those temporaries is short.
Related to the stack, what you usually care about is not corrupting it, which can happen if your program uses pointers and accesses data outside the intended bounds. An isStatic function will not help with this type of problem.
For dynamic memory allocation, you usually override the new/delete operators and keep a counter to track the amount of memory used. So, again, an isDynamic function might not do the trick.
In the case of global variables (you said static, but I extended the scope a bit), which are allocated in a separate data section (neither stack nor heap), you don't always care about them: they are statically allocated, and the linker will tell you at link time if you don't have enough space. Plus, you can check the map file if you really want to know address ranges.
So most of your concerns are solved at compile time, and to be honest you rarely care about them. The rest (dynamic memory allocation) are treated differently.
But if you insist on having those methods you can tell the linker to generate a map file which will give you the address ranges for all data sections and use those for your purposes.
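A sketch of the counting approach mentioned above, assuming you only need to recognise objects created with a plain new Foo: a class-specific operator new/operator delete pair records each heap address and the bytes handed out. Array new, placement new and Foo objects embedded in other objects are not covered, and the member names heap_bytes and heap_objects are illustrative, not part of any library:

```cpp
#include <cstddef>
#include <new>
#include <set>

// Hypothetical Foo that remembers where its heap instances live.
class Foo {
public:
    // Class-specific allocation function, used by `new Foo`.
    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        heap_objects().insert(p);
        heap_bytes() += size;
        return p;
    }

    // Sized class-specific deallocation function, used by `delete p`.
    static void operator delete(void* p, std::size_t size) noexcept {
        heap_objects().erase(p);
        heap_bytes() -= size;
        ::operator delete(p);
    }

    // True only for objects created with a plain `new Foo`.
    bool is_dynamic() const { return heap_objects().count(this) != 0; }

    // Bytes currently allocated for Foo objects through this operator new.
    static std::size_t& heap_bytes() {
        static std::size_t bytes = 0;
        return bytes;
    }

private:
    static std::set<const void*>& heap_objects() {
        static std::set<const void*> objects;
        return objects;
    }
};
```

This is not thread-safe; a real tracker would guard the set with a mutex.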

Should arrays be used in C++?

Since std::list and std::vector exist, is there a reason to use traditional C arrays in C++, or should they be avoided, just like malloc?

In C++11, where std::array is available, the answer is "yes, arrays should be avoided". Prior to C++11, you may need to use C arrays to allocate arrays in automatic storage (i.e. on the stack).

Definitely, although with std::array in C++11, practically only for
static data. C style arrays have three important advantages over
std::vector:
They don't require dynamic allocation. For this reason, C style
arrays are to be preferred where you're likely to have a lot of very
small arrays. Say something like an n-dimension point:
template <typename T, int dims>
class Point
{
T myData[dims];
// ...
};
Typically, one might imagine that dims will be very small (2 or 3),
T a built-in type (double), and that you might end up with
std::vector<Point<double, 3>> with millions of elements. You definitely don't
want millions of dynamic allocations of three doubles each.
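To make the point concrete, here is a sketch (the indexing operators and the ArrayPoint name are added for illustration) showing that both the C-array member and a C++11 std::array member store the coordinates inline, so a Point needs no separate allocation. On mainstream compilers sizeof(Point<double, 3>) equals 3 * sizeof(double), although the standard does not promise the absence of padding:

```cpp
#include <array>
#include <cstddef>

// The original C-array version: the coordinates live inside the Point itself.
template <typename T, int dims>
class Point {
public:
    T& operator[](int i) { return myData[i]; }
    const T& operator[](int i) const { return myData[i]; }

private:
    T myData[dims];
};

// C++11 alternative: std::array also stores its elements inline.
template <typename T, std::size_t dims>
class ArrayPoint {
public:
    T& operator[](std::size_t i) { return myData[i]; }
    const T& operator[](std::size_t i) const { return myData[i]; }

private:
    std::array<T, dims> myData;
};
```

A std::vector<Point<double, 3>> then holds all points in one contiguous buffer: one allocation for the whole vector rather than one per point.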
They support static initialization. This is only an issue for static
data, where something like:
struct Data { int i; char const* s; };
Data const ourData[] =
{
{ 1, "one" },
{ 2, "two" },
// ...
};
This is often preferable to using a vector (and std::string), since it
avoids all order of initialization issues; the data is pre-loaded,
before any actual code can be executed.
Finally, related to the above, the compiler can calculate the actual
size of the array from the initializers. You don't have to count them.
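A sketch of the table idiom from the two points above: the array is constant-initialized before any code runs, and its length comes from the initializers. constexpr is used here (a C++11 choice) in place of the original const, and the lookup helper name_for is hypothetical:

```cpp
#include <cstddef>

struct Data { int i; char const* s; };

// Constant-initialized: no constructor runs, so no initialization-order issues.
constexpr Data ourData[] = {
    { 1, "one" },
    { 2, "two" },
    { 3, "three" },
};

// The compiler counts the initializers for us.
constexpr std::size_t ourDataCount = sizeof ourData / sizeof ourData[0];
static_assert(ourDataCount == 3, "three initializers");

// Linear lookup by key; returns nullptr when the key is absent.
char const* name_for(int key) {
    for (std::size_t n = 0; n < ourDataCount; ++n) {
        if (ourData[n].i == key) {
            return ourData[n].s;
        }
    }
    return nullptr;
}
```

Adding an entry only means adding a line to the initializer; ourDataCount follows automatically.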
If you have access to C++11, std::array solves the first two issues,
and should definitely be used in preference to C style arrays in the
first case. It doesn't address the third, however, and having the
compiler dimension the array according to the number of initializers is
still a valid reason to prefer C style arrays.

Never say "never", but I'd agree that their role is greatly diminished by true data structures from STL.
I'd also say that encapsulation inside objects should minimize the impact of choices like this. If the array is a private data member, you can swap it in or out without affecting clients of your class.

I have worked on safety-critical systems where you are unable to use dynamic memory allocation. All memory has to be allocated statically or on the stack. Therefore, in this case you would use arrays, as their size is fixed at compile time.

std::array in C++ gives you a fixed-size, fast alternative to the dynamically sized std::vector and std::list. std::array is one of the additions in C++11. It provides the benefits of the standard containers while still providing the aggregate-type semantics of C-style arrays.
So in C++11 I'd certainly use std::array, where it is required, over vector. But I'd avoid C-style arrays in C++03.

Usually, no: if the code is new, I can't think of a reason to use raw arrays over, say, vectors.
You might have to resort to using arrays if your libraries need to be compatible with code that expects arrays and raw pointers.

I know a lot of people are pointing out std::array for allocating arrays on the stack, and std::vector for the heap. But neither seems to support non-native alignment. If you're doing any kind of numeric code that you want to use SSE or AVX instructions on (thus requiring 128-bit or 256-bit, i.e. 16-byte or 32-byte, alignment respectively), C arrays would still seem to be your best bet.
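One hedged addition: since C++11, alignas can be applied to a std::array object as well as to a C array, so requesting 32-byte alignment for AVX does not by itself force a C array. A sketch (the helper is_aligned is illustrative):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// alignas (C++11) requests 32-byte alignment, enough for aligned 256-bit AVX loads.
alignas(32) float c_style[8];
alignas(32) std::array<float, 8> std_style;

// True if `p` is a multiple of `alignment` (alignment must be a power of two).
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For heap storage, C++17 additionally makes new honor over-aligned types, which also covers std::vector of an over-aligned element type.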

I would say arrays are still useful: if you are storing a small, static amount of data, why not?

The only advantage of an array (wrapped, of course, in something that manages its deallocation automatically when needed) over std::vector that I can think of is that a vector cannot pass ownership of its data, unless your compiler supports C++11 and move constructors.
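With C++11 move semantics, handing the buffer over takes one line. A sketch (take_ownership is a hypothetical name); the moved-from vector is left valid but in an unspecified state, in practice empty:

```cpp
#include <utility>
#include <vector>

// Hands the heap buffer owned by `source` to the returned vector without
// copying any elements: only the internal pointers change hands.
std::vector<int> take_ownership(std::vector<int>& source) {
    return std::move(source);
}
```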

C-style arrays are a fundamental data structure, so there will be cases when it is better to use them. For the general case, however, use the more advanced data structures that round off the corners of the underlying data. C++ allows you to do some very interesting and useful things with memory, many of which work with simple arrays.

You should use STL containers internally, but you should not pass pointers to such containers between different modules, or you will end up in dependency hell. Example:
std::string foo;
// fill foo with stuff
myExternalOutputProc(foo.c_str());
is a very good solution but not
std::string foo;
// fill foo with stuff
myExternalOutputProc(&foo);
The reason is that std::string can be implemented in many different ways but a c-style string is always a c-style string.

Access data in shared memory C++ POSIX

I open a piece of shared memory and get a handle of it. I'm aware there are several vectors of data stored in the memory. I'd like to access those vectors of data and perform some actions on them. How can I achieve this? Is it appropriate to treat the shared memory as an object so that we can define those vectors as fields of the object and those needed actions as member functions of the object?
I've never dealt with shared memory before. To make things worse, I'm new to C++ and POSIX. Could someone please provide some guidance? Simple examples would be greatly appreciated.

int my_shmid = shmget(key, size, shmflgs);
...
void* address_of_my_shm1 = shmat(my_shmid, 0, shmflags);
Object* optr = static_cast<Object*>(address_of_my_shm1);
// ...or, in some other thread/process to which you arranged to pass address_of_my_shm1
// by some other means:
void* address_of_my_shm2 = shmat(my_shmid, address_of_my_shm1, shmflags);
You may want to assert that address_of_my_shm1 == address_of_my_shm2. But note that I say "may": you don't actually have to do this. Some types/structs/classes can be read equally well at different addresses.
If the object will appear in different address spaces, then pointers outside the shm in process A may not point to the same thing as in process B. In general, pointers outside the shm are bad. (Virtual functions are reached through pointers outside the object, and outside the shm. Bad, unless you have other reason to trust them.)
Pointers inside the shm are usable, if they appear at the same address.
Relative pointers can be quite usable, but, again, so long as they point only inside the shm. Relative pointers may be relative to the base of an object, i.e. they may be offsets. Or they may be relative to the pointer itself. You can define some nice classes/templates that do these calculations, with casting going on under the hood.
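A sketch of a self-relative pointer, assuming a flat address space in which converting pointers to std::intptr_t and back behaves as expected. RelPtr is a hypothetical name (Boost.Interprocess ships a full-featured offset_ptr). The target must sit in the same segment as the pointer, and an offset of 0 is reserved to mean null:

```cpp
#include <cstdint>

// Hypothetical self-relative pointer: stores the distance from its own address
// to the target, so it stays meaningful wherever the segment is mapped,
// provided the target lives in the same segment.
template <typename T>
class RelPtr {
public:
    // Points at `target`; nullptr (and, as a known limitation, `this`) encodes as 0.
    void set(T* target) {
        offset_ = target == nullptr
            ? 0
            : reinterpret_cast<std::intptr_t>(target) - reinterpret_cast<std::intptr_t>(this);
    }

    // Rebuilds an absolute pointer from this object's current address.
    T* get() const {
        if (offset_ == 0) {
            return nullptr;
        }
        return reinterpret_cast<T*>(reinterpret_cast<std::intptr_t>(this) + offset_);
    }

private:
    std::intptr_t offset_ = 0;
};
```

Copying a whole segment (or a struct that contains both the RelPtr and its target) keeps the link intact, because the offset is relative to the pointer's own address. Copying a lone RelPtr somewhere else does not.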
Sharing of objects through shmem is simplest if the data is just POD (Plain Old Data). Nothing fancy.
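A minimal POD round trip with the System V calls used in this answer (shmget, shmat, shmdt, shmctl). To stay in a single process, it attaches the same segment twice, which normally yields two different addresses for the same bytes; SharedCounters and shm_round_trip are hypothetical names, and error handling is reduced to returning -1:

```cpp
#include <new>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <type_traits>

// Hypothetical POD record placed in shared memory: no pointers, no virtuals.
struct SharedCounters {
    int values[4];
    int count;
};
static_assert(std::is_trivially_copyable<SharedCounters>::value,
              "keep shared-memory records POD-like");

// Writes `value` through one attachment of a private segment and reads it back
// through a second attachment. Returns the value read, or -1 on failure.
int shm_round_trip(int value) {
    const int id = shmget(IPC_PRIVATE, sizeof(SharedCounters), IPC_CREAT | 0600);
    if (id == -1) {
        return -1;
    }
    void* const failed = reinterpret_cast<void*>(-1);  // shmat's error value
    void* writer_addr = shmat(id, nullptr, 0);
    void* reader_addr = shmat(id, nullptr, 0);  // same segment, usually another address

    int result = -1;
    if (writer_addr != failed && reader_addr != failed) {
        SharedCounters* writer = new (writer_addr) SharedCounters{};  // begin the object's lifetime
        writer->values[0] = value;
        writer->count = 1;
        const SharedCounters* reader = static_cast<const SharedCounters*>(reader_addr);
        result = reader->values[0];
    }

    if (writer_addr != failed) shmdt(writer_addr);
    if (reader_addr != failed) shmdt(reader_addr);
    shmctl(id, IPC_RMID, nullptr);  // segment is destroyed once fully detached
    return result;
}
```

In a real setup the second shmat would happen in another process that obtained the segment id (or key) by some other means.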
Because you are in different processes that are not sharing the whole address space, you may not be guaranteed that things like virtual functions will appear at the same address in all processes using the shm shared memory segment. So probably best to avoid virtual functions. (If you try hard and/or know linkage, you may in some circumstances be able to share virtual functions. But that is one of the first things I would disable if I had to debug.)
You should only do this if you are aware of your implementation's object memory model, and if advanced (for C++) optimizations like splitting structs into discontiguous hot and cold parts are disabled. Since such optimizations are arguably not legal for C++, you are probably safe.
Obviously you are better off if you are casting to the same object type/class on all sides.
You can get away with non-virtual functions. However, note that it can be quite easy to have the same class, but different versions of the class - e.g. differing in size, e.g. adding a new field and changing the offsets of all of the other fields - so you need to be quite careful to ensure all sides are using the same definitions and declarations.

C++ Class Memory Model And Alignment

I have several questions to ask that pertains to data position and alignment in C++. Do classes have the same memory placement and memory alignment format as structs?
More specifically, is data loaded into memory based on the order in which it's declared? Do functions affect memory alignment and data position or are they allocated to another location? Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct. I'm just curious to know whether or not this is intrinsic to classes as it is to structs and whether or not it will translate well into classes if I chose to use that approach.
Edit: Thanks for all your answers. They've really helped a lot.

Do classes have the same memory placement and memory alignment format
as structs?
The memory placement/alignment of objects does not depend on whether their type was declared as a class or a struct. The only difference between a class and a struct in C++ is that a class has private members by default, while a struct has public members by default.
More specifically, is data loaded into memory based on the order in
which it's declared?
I'm not sure what you mean by "loaded into memory". Within an object, however, the compiler is not allowed to rearrange variables. For example:
class Foo {
int a;
int b;
int c;
};
The variable c must be located after b, and b must be located after a, within a Foo object. They are also constructed (initialized) in the order shown in the class declaration when a Foo is created, and destroyed in the reverse order when a Foo is destroyed.
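The ordering can be checked at compile time with offsetof. A sketch using a hypothetical FooLayout with the same members as Foo, made public so offsetof can name them (offsetof is only guaranteed for standard-layout types):

```cpp
#include <cstddef>
#include <type_traits>

// Same members as Foo above, but public so offsetof can refer to them.
struct FooLayout {
    int a;
    int b;
    int c;
};

static_assert(std::is_standard_layout<FooLayout>::value,
              "offsetof is only guaranteed for standard-layout types");

// The first member of a standard-layout struct sits at offset 0.
static_assert(offsetof(FooLayout, a) == 0, "no leading padding");

// Members declared later live at higher offsets.
static_assert(offsetof(FooLayout, a) < offsetof(FooLayout, b), "a before b");
static_assert(offsetof(FooLayout, b) < offsetof(FooLayout, c), "b before c");
```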
It's actually more complicated than this due to inheritance and access modifiers, but that is the basic idea.
Do functions affect memory alignment and data position or are they
allocated to another location?
Functions are not data, so alignment isn't a concern for them. In some executable file formats and/or architectures, function binary code does in fact occupy a separate area from data variables, but the C++ language is agnostic to that fact.
Generally speaking, I keep all of my memory alignment and position
dependent stuff like file headers and algorithmic data within a
struct. I'm just curious to know whether or not this is intrinsic to
classes as it is to structs and whether or not it will translate well
into classes if I chose to use that approach.
Memory alignment is something that's almost automatically taken care of for you by the compiler. It's more of an implementation detail than anything else. I say "almost automatically" since there are situations where it may matter (serialization, ABIs, etc.), but within an application it shouldn't be a concern.
With respect to reading files (since you mention file headers), it sounds like you're reading files directly into the memory occupied by a struct. I can't recommend that approach, since differences in padding, alignment and byte order may make your code work on one platform and not another. Instead, you should read the raw bytes a few at a time from the file and assign them to the struct's members with simple assignment.
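A sketch of field-by-field reading, assuming a hypothetical header format of two 32-bit little-endian fields. The shifts assemble each value independently of the host's byte order, and the struct's own padding never touches the file data:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical file header: both fields are stored little-endian in the file.
struct FileHeader {
    std::uint32_t magic;
    std::uint32_t count;
};

// Assembles a little-endian 32-bit value from four bytes on any host.
std::uint32_t read_u32_le(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

// Fills the struct member by member instead of copying raw bytes over it.
FileHeader parse_header(const unsigned char (&bytes)[8]) {
    FileHeader h;
    h.magic = read_u32_le(bytes);
    h.count = read_u32_le(bytes + 4);
    return h;
}
```

The bytes array would typically be filled with a single istream::read of 8 bytes.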

Do classes have the same memory placement and memory alignment format as structs?
Yes. Technically there is no difference between a class and a struct. The only difference is the default member access specification; otherwise they are identical.
More specifically, is data loaded into memory based on the order in which it's declared?
Yes.
Do functions affect memory alignment and data position or are they allocated to another location?
No. They do not affect alignment. Methods are compiled separately, and the object does not contain any reference to its non-virtual methods. (To those who say virtual tables do affect members: yes and no. The vtable pointer is an implementation detail that does not change the relative order of the members; the compiler is allowed to add implementation-specific data to the object.)
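A quick check of the claim, under the assumption of a mainstream compiler (the standard does not spell out object sizes): adding non-virtual member functions leaves sizeof unchanged, whereas a virtual function typically adds a hidden vtable pointer. The type names are illustrative:

```cpp
// Two members, no member functions.
struct Plain {
    int x;
    int y;
};

// Same members plus ordinary (non-virtual) member functions.
struct WithMethods {
    int x;
    int y;
    int sum() const { return x + y; }
    void scale(int k) { x *= k; y *= k; }
};

// Same members plus a virtual function: implementations typically add a
// hidden vtable pointer to every object.
struct WithVirtual {
    int x;
    int y;
    virtual ~WithVirtual() = default;
};

// Non-virtual member functions add no per-object storage.
static_assert(sizeof(Plain) == sizeof(WithMethods),
              "member functions take no space in the object");
```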
Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct.
OK. Not sure how that affects anything.
I'm just curious to know whether or not this is intrinsic to classes as it is to structs
Class and struct are different names for the same thing.
and whether or not it will translate well into classes if I chose to use that approach.
Choose what approach?

C++ classes essentially translate into structs containing all the instance variables as data, while the member functions are separated from the class and treated like ordinary functions that take a pointer to the struct (this) as a hidden argument.
The exact way instance variables are stored depends on the compiler used, but they generally tend to be in order.

C++ classes do not take part in "persistence" the way binary-mode structures do, and shouldn't have alignment requirements attached to them. Keep the classes simple.
Imposing alignment on classes may hurt performance and may have other side effects too.

Locating objects (structs) in memory - how to?

How would you locate an object in memory? Let's say that you have a struct defined as:
struct POINT {
int x;
int y;
};
How would I scan the memory region of my app to find instances of this struct so that I can read them out?

You can't without adding type information to the struct. In memory a struct like that is nothing else than 2 integers so you can't recognize them any better than you could recognize any other object.

You can't. Structs don't store any type information (unless they have virtual member functions), so you can't distinguish them from any other block of sizeof(POINT) bytes.
Why don't you store your points in a vector or something?

You can't. You have to know the layout to know which section of memory represents a variable. That's a kind of protocol, and that's why we use text-based languages instead of raw values.

You don't. How would you distinguish two arbitrary integers from random noise?
(But given a POINT p; in your source code, you can obtain its address using the address-of operator: POINT* pp = &p;)

Short answer: you can't. Any (appropriately aligned) sequence of 8 bytes could potentially represent a POINT. In fact, an array of ints will be indistinguishable from an array of POINTs. In some cases, you could take advantage of knowledge of the compiler implementation to do better. For instance, if the struct had virtual functions, you could look for the correct vtable pointer, but there could also be false positives.
If you want to keep track of objects, you need to register them in their constructor and unregister them in their destructor (and pay the performance penalty), or give them their own allocator.
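A sketch of the register/unregister idea, with a hypothetical TrackedPoint standing in for POINT. It is not thread-safe, and every object pays for one set insertion and one erase over its lifetime:

```cpp
#include <cstddef>
#include <set>

// Hypothetical POINT variant that keeps a registry of all live instances.
struct TrackedPoint {
    int x;
    int y;

    TrackedPoint(int x_, int y_) : x(x_), y(y_) { registry().insert(this); }
    TrackedPoint(const TrackedPoint& other) : x(other.x), y(other.y) { registry().insert(this); }
    TrackedPoint& operator=(const TrackedPoint&) = default;  // address unchanged, nothing to update
    ~TrackedPoint() { registry().erase(this); }

    // Function-local static avoids static-initialization-order problems.
    static std::set<const TrackedPoint*>& registry() {
        static std::set<const TrackedPoint*> live;
        return live;
    }
};
```

Iterating over TrackedPoint::registry() then visits every live object, wherever it was allocated (stack, heap or static storage).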

There's no way to identify that struct. You need to put the struct somewhere it can be found, on the stack or on the heap.
Sometimes data structures are tagged with identifying information to assist with debugging or memory management. As a means of data organization, it is among the worst possible approaches.
You probably need to do a lot of general reading on memory management.

There is no standard way of doing this. The platform may specify some APIs which allow you to access the stack and the free store. Further, even if you did, without any additional information how would you be sure that you are reading a POINT object and not a couple of ints? The compiler/linker can read this because it deals with (albeit virtual) addresses and has some more information (and control) than you do.

You can't. Something like that would probably be possible on some "tagged" architecture that also supported tagging objects of user-defined types. But on a traditional architecture it is absolutely impossible to say for sure what is stored in memory simply by looking at the raw memory content.
You can come closer to achieving what you want by introducing a unique signature into the type, like
struct POINT {
char signature[8];
int x;
int y;
};
and carefully setting it to some fixed and "unique" pattern in each object of POINT type, and then looking for that pattern in memory. If it is your application, you can be sure with a good degree of certainty that each instance of the pattern is your POINT object. But in general, of course, there will never be any guarantee that the pattern you found belongs to your object, as opposed to being there purely accidentally.
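A sketch of the signature idea: stamp a fixed 8-byte pattern into every POINT and scan a byte range for it. The pattern, make_point and count_signatures are hypothetical, and a hit is only strong evidence, not proof, that a POINT starts there:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical 8-byte tag; any distinctive pattern would do.
const char kPointSignature[8] = { 'P', 'O', 'I', 'N', 'T', 'S', 'I', 'G' };

struct POINT {
    char signature[8];
    int x;
    int y;
};

// Creates a POINT with the signature stamped in.
POINT make_point(int x, int y) {
    POINT p;
    std::memcpy(p.signature, kPointSignature, sizeof p.signature);
    p.x = x;
    p.y = y;
    return p;
}

// Counts the byte offsets in [begin, begin + size) where the signature occurs.
std::size_t count_signatures(const unsigned char* begin, std::size_t size) {
    std::size_t hits = 0;
    for (std::size_t off = 0; off + sizeof kPointSignature <= size; ++off) {
        if (std::memcmp(begin + off, kPointSignature, sizeof kPointSignature) == 0) {
            ++hits;
        }
    }
    return hits;
}
```

Scanning a whole process is another matter: you would still need the OS to tell you which address ranges are mapped and readable.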

What everyone else has said is true. In memory, your struct is just a few bytes, there's nothing in particular to distinguish it.
However, if you feel like a little hacking, you can look up the internals of your C library and figure out where memory is stored on the heap and how it appears. For example, this link shows how stuff gets allocated in one particular system.
Armed with this knowledge, you could scan your heap to find allocated blocks that were sizeof(POINT), which would narrow down the search considerably. If you look at the table you'll notice that the file name and line number of the malloc() call are being recorded - if you know where in your source code you're allocating POINTs, you could use this as a reference too.
However, if your struct was allocated on the stack, you're out of luck.
