Access data in shared memory C++ POSIX

Access data in shared memory C++ POSIX - c++

I open a piece of shared memory and get a handle of it. I'm aware there are several vectors of data stored in the memory. I'd like to access those vectors of data and perform some actions on them. How can I achieve this? Is it appropriate to treat the shared memory as an object so that we can define those vectors as fields of the object and those needed actions as member functions of the object?
I've never dealt with shared memory before. To make things worse, I'm new to C++ and POSIX. Could someone please provide some guidance? Simple examples would be greatly appreciated.

int my_shmid = shmget(key,size,shmflgs);
...
void* address_of_my_shm1 = shat(my_shmid,0,shmflags);
Object* optr = static_cast<Object*>(address_of_my_shm1);
...or, in some other thread/process to which you arranged to pass the address_of_my_shm1
...by some other means
void* address_of_my_shm2 = shat(my_shmid,address_of_my_shm1,shmflags);
You may want to assert that address_of_shm1 == address_of_shm2. But note that I say "may" - you don't actually have to do this. Some types/structs/classes can be read equally well at different addresses.
If the object will appear in different address spaces, then pointers outside the shhm in process A may not point to the same thing as in process B. In general, pointers outside the shm are bad. (Virtual functions are pointers outside the object, and outside the shm. Bad, unless you have other reason to trust them.)
Pointers inside the shm are usable, if they appear at the same address.
Relative pointers can be quite usable, but, again, so long as they point only inside the shm. Relative pointers may be relative to the base of an object, i.e. they may be offsets. Or they may be relative to the pointer itself. You can define some nice classes/templates that do these calculations, with casting going on under the hood.
Sharing of objects through shmem is simplest if the data is just POD (Plain Old Data). Nothing fancy.
Because you are in different processes that are not sharing the whole address space, you may not be guaranteed that things like virtual functions will appear at the same address in all processes using the shm shared memory segment. So probably best to avoid virtual functions. (If you try hard and/or know linkage, you may in some circumstances be able to share virtual functions. But that is one of the first things I would disable if I had to debug.)
You should only do this if you are aware of your implementation's object memory model. And if advanced (for C++) optimizations like splitting structs into discontiguous hot and cold parts are disabled. Since such optimizations rae arguably not legal for C++, you are probably safe.
Obviously you are better off if you are casting to the same object type/class on all sides.
You can get away with non-virtual functions. However, note that it can be quite easy to have the same class, but different versions of the class - e.g. differing in size, e.g. adding a new field and changing the offsets of all of the other fields - so you need to be quite careful to ensure all sides are using the same definitions and declarations.

Related

C++ Class Memory Model And Alignment

I have several questions to ask that pertains to data position and alignment in C++. Do classes have the same memory placement and memory alignment format as structs?
More specifically, is data loaded into memory based on the order in which it's declared? Do functions affect memory alignment and data position or are they allocated to another location? Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct. I'm just curious to know whether or not this is intrinsic to classes as it is to structs and whether or not it will translate well into classes if I chose to use that approach.
Edit: Thanks for all your answers. They've really helped a lot.

Do classes have the same memory placement and memory alignment format
as structs?
The memory placement/alignment of objects is not contingent on whether its type was declared as a class or a struct. The only difference between a class and a struct in C++ is that a class have private members by default while a struct have public members by default.
More specifically, is data loaded into memory based on the order in
which it's declared?
I'm not sure what you mean by "loaded into memory". Within an object however, the compiler is not allowed to rearrange variables. For example:
class Foo {
int a;
int b;
int c;
};
The variables c must be located after b and b must be located after a within a Foo object. They are also constructed (initialized) in the order shown in the class declaration when a Foo is created, and destructed in the reverse order when a Foo is destroyed.
It's actually more complicated than this due to inheritance and access modifiers, but that is the basic idea.
Do functions affect memory alignment and data position or are they
allocated to another location?
Functions are not data, so alignment isn't a concern for them. In some executable file formats and/or architectures, function binary code does in fact occupy a separate area from data variables, but the C++ language is agnostic to that fact.
Generally speaking, I keep all of my memory alignment and position
dependent stuff like file headers and algorithmic data within a
struct. I'm just curious to know whether or not this is intrinsic to
classes as it is to structs and whether or not it will translate well
into classes if I chose to use that approach.
Memory alignment is something that's almost automatically taken care of for you by the compiler. It's more of an implementation detail than anything else. I say "almost automatically" since there are situations where it may matter (serialization, ABIs, etc) but within an application it shouldn't be a concern.
With respect with reading files (since you mention file headers), it sounds like you're reading files directly into the memory occupied by a struct. I can't recommend that approach since issues with padding and alignment may make your code work on one platform and not another. Instead you should read the raw bytes a couple at a time from the file and assign them into the structs with simple assignment.

Do classes have the same memory placement and memory alignment format as structs?
Yes. Technically there is no difference between a class and a struct. The only difference is the default member access specification otherwise they are identical.
More specifically, is data loaded into memory based on the order in which it's declared?
Yes.
Do functions affect memory alignment and data position or are they allocated to another location?
No. They do not affect alignment. Methods are compiled separately. The object does not contain any reference to methods (to those that say virtual tables do affect members the answer is yes and no but this is an implementation detail that does not affect the relative difference between members. The compiler is allowed to add implementation specific data to the object).
Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct.
OK. Not sure how that affects anything.
I'm just curious to know whether or not this is intrinsic to classes as it is to structs
Class/Structs different name for the same thing.
and whether or not it will translate well into classes if I chose to use that approach.
Choose what approach?

C++ classes simply translate into structs with all the instance variables as the data contained inside the structs, while all the functions are separated from the class and are treated like functions with accept those structs as an argument.
The exact way instance variables are stored depends on the compiler used, but they generally tend to be in order.

C++ classes do not participate in "persistence", like binary-mode structures, and shouldn't have alignment attached to them. Keep the classes simple.
Putting alignment with classes may have negative performance benefits and may have side effects too.

C++: What are scenarios where using pointers is a "Good Idea"(TM)? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Common Uses For Pointers?
I am still learning the basics of C++ but I already know enough to do useful little programs.
I understand the concept of pointers and the examples I see in tutorials make sense to me. However, on the practical level, and being a (former) PHP developer, I am not yet confident to actually use them in my programs.
In fact, so far I have not felt the need to use any pointer. I have my classes and functions and I seem to be doing perfectly fine without using any pointer (let alone pointers to pointers). And I can't help feeling a bit proud of my little programs.
Still, I am aware that I am missing on one of C++'s most important feature, a double edged one: pointers and memory management can create havoc, seemingly random crashes, hard to find bugs and security holes... but at the same time, properly used, they must allow for clever and efficient programming.
So: do tell me what I am missing by not using pointers.
What are good scenarios where using pointers is a must?
What do they allow you to do that you couldn't do otherwise?
In which way to they make your programs more efficient?
And what about pointers to pointers???
[Edit: All the various answers are useful. One problem at SO is that we cannot "accept" more than one answer. I often wish I could. Actually, it's all the answers combined that help to understand better the whole picture. Thanks.]

I use pointers when I want to give a class access to an object, without giving it ownership of that object. Even then, I can use a reference, unless I need to be able to change which object I am accessing and/or I need the option of no object, in which case the pointer would be NULL.

This question has been asked on SO before. My answer from there:
I use pointers about once every six lines in the C++ code that I write. Off the top of my head, these are the most common uses:
When I need to dynamically create an object whose lifetime exceeds the scope in which it was created.
When I need to allocate an object whose size is unknown at compile time.
When I need to transfer ownership of an object from one thing to another without actually copying it (like in a linked list/heap/whatever of really big, expensive structs)
When I need to refer to the same object from two different places.
When I need to slice an array without copying it.
When I need to use compiler intrinsics to generate CPU-specific instructions, or work around situations where the compiler emits suboptimal or naive code.
When I need to write directly to a specific region of memory (because it has memory-mapped IO).

Pointers are commonly used in C++. Becoming comfortable with them, will help you understand a broader range of code. That said if you can avoid them that is great, however, in time as your programs become more complex, you will likely need them even if only to interface with other libraries.
Primarily pointers are used to refer to dynamically allocated memory (returned by new).
They allow functions to take arguments that cannot be copied onto the stack either because they are too big or cannot be copied, such as an object returned by a system call. (I think also stack alignment, can be an issue, but too hazy to be confident.)
In embedded programing they are used to refer to things like hardware registers, which require that the code write to a very specific address in memory.
Pointers are also used to access objects through their base class interfaces. That is if I have a class B that is derived from class A class B : public A {}. That is an instance of the object B could be accessed as if it where class A by providing its address to a pointer to class A, ie: A *a = &b_obj;
It is a C idiom to use pointers as iterators on arrays. This may still be common in older C++ code, but is probably considered a poor cousin to the STL iterator objects.
If you need to interface with C code, you will invariable need to handle pointers which are used to refer to dynamically allocated objects, as there are no references. C strings are just pointers to an array of characters terminated by the nul '\0' character.
Once you feel comfortable with pointers, pointers to pointers won't seem so awful. The most obvious example is the argument list to main(). This is typically declared as char *argv[], but I have seen it declared (legally I believe) as char **argv.
The declaration is C style, but it says that I have array of pointers to pointers to char. Which is interpreted as a arbitrary sized array (the size is carried by argc) of C style strings (character arrays terminated by the nul '\0' character).

If you haven't felt a need for pointers, I wouldn't spend a lot of time worrying about them until a need arises.
That said, one of the primary ways pointers can contribute to more efficient programming is by avoiding copies of actual data. For example, let's assume you were writing a network stack. You receive an Ethernet packet to be processed. You successively pass that data up the stack from the "raw" Ethernet driver to the IP driver to the TCP driver to, say, the HTTP driver to something that processes the HTML it contains.
If you're making a new copy of the contents for each of those, you end up making at least four copies of the data before you actually get around to rendering it at all.
Using pointers can avoid a lot of that -- instead of copying the data itself, you just pass around a pointer to the data. Each successive layer of the network stack looks at its own header, and passes a pointer to what it considers the "payload" up to the next higher layer in the stack. That next layer looks at its own header, modifies the pointer to show what it considers the payload, and passes it on up the stack. Instead of four copies of the data, all four layers work with one copy of the real data.

A big use for pointers is dynamic sizing of arrays. When you don't know the size of the array at compile time, you will need to allocate it at run-time.
int *array = new int[dynamicSize];
If your solution to this problem is to use std::vector from the STL, they use dynamic memory allocation behind the scenes.

There are several scenarios where pointers are required:
If you are using Abstract Base Classes with virtual methods. You can hold a std::vector and loop through all these objects and call a virtual method. This REQUIRES pointers.
You can pass a pointer to a buffer to a method reading from a file etc.
You need a lot of memory allocated on the heap.
It's a good thing to care about memory problems right from the start. So if you start using pointers, you might as well take a look at smart pointers, like boost's shared_ptr for example.

What are good scenarios where using pointers is a must?
Interviews. Implement strcpy.
What do they allow you to do that you couldn't do otherwise?
Use of inheritance hierarchy. Data structures like Binary trees.
In which way to they make your programs more efficient?
They give more control to the programmer, for creating and deleting resources at run time.
And what about pointers to pointers???
A frequently asked interview question. How will you create two dimensional array on heap.

A pointer has a special value, NULL, that reference's won't. I use pointers wherever NULL is a valid and useful value.

I just want to say that i rarely use pointers. I use references and stl objects (deque, list, map, etc).
A good idea is when you need to return an object where the calling function should free or when you dont want to return by value.
List<char*>* fileToList(char*filename) { //dont want to pass list by value
ClassName* DataToMyClass(DbConnectionOrSomeType& data) {
//alternatively you can do the below which doesnt require pointers
void DataToMyClass(DbConnectionOrSomeType& data, ClassName& myClass) {
Thats pretty much the only situation i use but i am not thinking that hard. Also if i want a function to modify a variable and cant use the return value (say i need more then one)
bool SetToFiveIfPositive(int**v) {

You can use them for linked lists, trees, etc.
They're very important data structures.

In general, pointers are useful as they can hold the address of a chunk of memory. They are especially useful in some low level drivers where they are efficiently used to operate on a piece of memory byte by byte. They are most powerful invention that C++ inherits from C.
As to pointer to pointer, here is a "hello-world" example showing you how to use it.
#include <iostream>
void main()
{
int i = 1;
int j = 2;
int *pInt = &i; // "pInt" points to "i"
std::cout<<*pInt<<std::endl; // prints: 1
*pInt = 6; // modify i, i = 6
std::cout<<i<<std::endl; // prints: 6
int **ppInt = &pInt; // "ppInt" points to "pInt"
std::cout<<**ppInt<<std::endl; // prints: 6
**ppInt = 8; // modify i, i = 8
std::cout<<i<<std::endl; // prints: 8
*ppInt = &j; // now pInt points to j
*pInt = 10; // modify j, j = 10
std::cout<<j<<std::endl; // prints: 10
}
As we see, "pInt" is a pointer to integer which points to "i" at the beginning. With it, you can modify "i". "ppInt" is a pointer to pointer which points to "pInt". With it, you can modify "pInt" which happens to be an address. As a result, "*ppInt = &j" makes "pInt" points to "j" now. So we have all the results above.

Can you write a polymorphic class to disk and survive?

Firstly, I know that writing a class to disk is bad, but you should see some of our other code. D:
My question is: can I write a polymorphic class to disk and then read it in later and not get undefined behaviour? I am going to guess not because of vtables (I think these are generated at runtime and unique to the object?)
I.e.
class A {
virtual ~A() {}
virtual void foo() = 0;
};
class B : public A {
virtual ~B() {}
virtual void foo() {}
};
A * a = new B;
fwrite( a, 1, sizeof( B ), fp );
delete a;
a = new B;
fread( a, 1, sizeof( B ), fp );
a->foo();
delete a;
Thank-you!

I'll suggest you to take a look at Boost Serialization.
we use the term "serialization" to mean the reversible deconstruction of an arbitrary set of C++ data structures to a sequence of bytes. Such a system can be used to reconstitute an equivalent structure in another program context. Depending on the context, this might used implement object persistence, remote parameter passing or other facility.

You might be able to get away with it if such objects are always read back during the same execution of the program that wrote them (though I really don't recommend it). But if the data in the file must persist between different executions of the program, then using the raw bytes of the in-memory objects will almost certainly lead to significant problems.
Each vtable itself is generated at compile time and stored somewhere in the resulting executable. What each object instance contains is just a pointer to the appropriate vtable, and that pointer does not change for the lifetime of any given object. (Multiple inheritance can be a little more complicated, but for this discussion those details aren't relevant. The pointers are still constant.)
So if an object has a vtable pointer and you write the raw bytes of that object to disk, then the vtable pointer is written to disk as well. If you then read back those bytes during the same execution of the program and push them into an appropriate object, it may work since the vtable will still be in the same location and thus the vtable pointer will still be correct.
(However note that everything I just explained there is an implementation detail. While many compilers typically implement virtual functions in that manner, I don't think any of the exact details are guaranteed by the C++ standard. So there could be additional potential problems.)
Now, if this might be possible, why not store such objects for longer durations? Because you have no guarantee that any particular virtual table will be in the same memory location.
Some operating systems may change the memory layout for each execution of the same program. I don't know whether or not this actually affects virtual table locations, but that's certainly a serious risk.
Furthermore, if you ever compile a new version of the program, the location of each virtual table is completely up to the whims of the compiler. Changes to seemingly unrelated parts of the code may cause the compiler to place the relevant virtual tables in different locations. Obviously, that happening would completely break this scheme. And you have no way to prevent it from happening.
(And beyond the vtables, what if new data members need to be added to those objects in subsequent versions of the program? You might have to deal with reading past versions of raw objects' bytes into new versions that have new members or a different layout of members. That can get complicated and ugly as well as error prone.)
Now, even if you only intend to store the objects temporarily for each execution of the program. I still don't think it's a good idea. You are highly restricted as to what kinds of variables these objects can contain. No smart objects (std::string, std::vector, etc). No pointers to memory allocated per each object. Any strings must therefore be stored in raw character arrays. Other dynamic allocation would have to be turned into fixed members or member arrays. That means you lose a lot of C++'s benefits everywhere these objects are used.
Furthermore, these objects and the scheme of writing this directly to disk would need to be accompanied by comments and documentation warning of all the dangers I've described. Otherwise, some future programmer might unknowingly decide to add the wrong kind of data member. Or even worse, they might decide to try storing such objects longer than the execution of the program, opening them up to serious crashes and failures that might not happen until much later in the future (and probably at the worst possible time).
In the end, I strongly suggest using a scheme that stores the data in a format specifically intended for the file. As someone else already mentioned, Boost Serialization is a good option. If not that, there are may be other usable serialization libraries. Or else depending on your needs, you may be able to roll your own mechanism without too much trouble.

The problem is not the vtable. It is stored per class type, not per instance, so you won't write it to file. Basically your code should work (haven't tried it).
However, you should keep in mind that reading pointers/handles from file does not work.

Getting list of all existing vtables

In my application I have quite some void-pointers (this is because of historical reasons, application was originally written in pure C). In one of my modules I know that the void-pointers points to instances of classes that could inherit from a known base class, but I cannot be 100% sure of it. Therefore, doing a dynamic_cast on the void-pointer might give problems. Possibly, the void-pointer even points to a plain-struct (so no vptr in the struct).
I would like to investigate the first 4 bytes of the memory the void-pointer is pointing to, to see if this is the address of the valid vtable. I know this is platform, maybe even compiler-version-specific, but it could help me in moving the application forward, and getting rid of all the void-pointers over a limited time period (let's say 3 years).
Is there a way to get a list of all vtables in the application, or a way to check whether a pointer points to a valid vtable, and whether that instance pointing to the vtable inherits from a known base class?

I would like to investigate the first
4 bytes of the memory the void-pointer
is pointing to, to see if this is the
address of the valid vtable.
You can do that, but you have no guarantees whatsoever it will work. Y don't even know if the void* will point to the vtable. Last time I looked into this (5+ years ago) I believe some compiler stored the vtable pointer before the address pointed to by the instance*.
I know this is platform, maybe even
compiler-version-specific,
It may also be compiler-options speciffic, depending on what optimizations you use and so on.
but it could help me in moving the
application forward, and getting rid
of all the void-pointers over a
limited time period (let's say 3
years).
Is this the only option you can see for moving the application forward? Have you considered others?
Is there a way to get a list of all
vtables in the application,
No :(
or a way to check whether a pointer
points to a valid vtable,
No standard way. What you can do is open some class pointers in your favorite debugger (or cast the memory to bytes and log it to a file) and compare it and see if it makes sense. Even so, you have no guarantees that any of your data (or other pointers in the application) will not look similar enough (when cast as bytes) to confuse whatever code you like.
and whether that instance pointing to
the vtable inherits from a known base
class?
No again.
Here are some questions (you may have considered them already). Answers to these may give you more options, or may give us other ideas to propose:
how large is the code base? Is it feasible to introduce global changes, or is functionality to spread-around for that?
do you treat all pointers uniformly (that is: are there common points in your source code where you could plug in and add your own metadata?)
what can you change in your sourcecode? (If you have access to your memory allocation subroutines or could plug in your own for example you may be able to plug in your own metadata).
If different data types are cast to void* in various parts of your code, how do you decide later what is in those pointers? Can you use the code that discriminates the void* to decide if they are classes or not?
Does your code-base allow for refactoring methodologies? (refactoring in small iterations, by plugging in alternate implementations for parts of your code, then removing the initial implementation and testing everything)
Edit (proposed solution):
Do the following steps:
define a metadata (base) class
replace your memory allocation routines with custom ones which just refer to the standard / old routines (and make sure your code still works with the custom routines).
on each allocation, allocate the requested size + sizeof(Metadata*) (and make sure your code still works).
replace the first sizeof(Metadata*) bytes of your allocation with a standard byte sequence that you can easily test for (I'm partial to 0xDEADBEEF :D). Then, return [allocated address] + sizeof(Metadata*) to the application. On deallocation, take the recieved pointer, decrement it by `sizeof(Metadata*), then call the system / previous routine to perform the deallocation. Now, you have an extra buffer allocated in your code, specifically for metadata on each allocation.
In the cases you're interested in having metadata for, create/obtain a metadata class pointer, then set it in the 0xDEADBEEF zone. When you need to check metadata, reinterpret_cast<Metadata*>([your void* here]), decrement it, then check if the pointer value is 0xDEADBEEF (no metadata) or something else.
Note that this code should only be there for refactoring - for production code it is slow, error prone and generally other bad things that you do not want your production code to be. I would make all this code dependent on some REFACTORING_SUPPORT_ENABLED macro that would never allow your Metadata class to see the light of a production release (except for testing builds maybe).

I would say it is not possible without related reference (header declaration).

If you want to replace those void pointers to correct interface type, here is what I think to automate it:
Go through your codebase to get a list of all classes that has virtual functions, you could do this fast by writing script, like Perl
Write an function which take a void* pointer as input, and iterate over those classes try to dynamic_cast it, and log information if succeeded, such as interface type, code line
Call this function anywhere you used void* pointer, maybe you could wrap it with a macro so you could get file, line information easy
Run a full automation (if you have) and analyse the output.

The easier way would be to overload operator new for your particular base class. That way, if you know your void* pointers are to heap objects, then you can also with 100% certainty determine whether they're pointing to your object.

Locating objects (structs) in memory - how to?

How would you locate an object in memory, lets say that you have a struct defined as:
struct POINT {
int x;
int y;
};
How would I scan the memory region of my app to find instances of this struct so that I can read them out?
Thanks R.

You can't without adding type information to the struct. In memory a struct like that is nothing else than 2 integers so you can't recognize them any better than you could recognize any other object.

You can't. Structs don't store any type information (unless they have virtual member functions), so you can't distinguish them from any other block of sizeof(POINT) bytes.
Why don't you store your points in a vector or something?

You can't. You have to know the layout to know what section of memory have to represent a variable. That's a kind of protocol and that's why we use text based languages instead raw values.

You don't - how would you distinguish two arbitrary integers from random noise?
( but given a Point p; in your source code, you can obtain its address using the address-of operator ... Point* pp = &p;).

Short answer: you can't. Any (appropriately aligned) sequence of 8 bytes could potentially represent a POINT. In fact, an array of ints will be indistinguishable from an array of POINTS. In some cases, you could take advantage of knowledge of the compiler implementation to do better. For instance, if the struct had virtual functions, you could look for the correct vtable pointer - but there could also be false positives.
If you want to keep track of objects, you need to register them in their constructor and unregister them in their destructor (and pay the performance penalty), or give them their own allocator.

There's no way to identify that struct. You need to put the struct somewhere it can be found, on the stack or on the heap.
Sometimes data structures are tagged with identifying information to assist with debugging or memory management. As a means of data organization, it is among the worst possible approaches.
You probably need to a lot of general reading on memory management.

There is no standard way of doing this. The platform may specify some APIs which allow you to access the stack and the free store. Further, even if you did, without any additional information how would you be sure that you are reading a POINT object and not a couple of ints? The compiler/linker can read this because it deals with (albeit virtual) addresses and has some more information (and control) than you do.

You can't. Something like that would probably be possible on some "tagged" architecture that also supported tagging objects of user-defined types. But on a traditional architecture it is absolutely impossible to say for sure what is stored in memory simply by looking at the raw memory content.
You can come closer to achieving what you want by introducing a unique signature into the type, like
struct POINT {
char signature[8];
int x;
int y;
};
and carefully setting it to some fixed and "unique" pattern in each object of POINT type, and then looking for that pattern in memory. If it is your application, you can be sure with good degree of certainty that each instance of the pattern is your POINT object. But in general, of course, there will never be any guarantee that the pattern you found belongs to your object, as opposed to being there purely accidentally.

What everyone else has said is true. In memory, your struct is just a few bytes, there's nothing in particular to distinguish it.
However, if you feel like a little hacking, you can look up the internals of your C library and figure out where memory is stored on the heap and how it appears. For example, this link shows how stuff gets allocated in one particular system.
Armed with this knowledge, you could scan your heap to find allocated blocks that were sizeof(POINT), which would narrow down the search considerably. If you look at the table you'll notice that the file name and line number of the malloc() call are being recorded - if you know where in your source code you're allocating POINTs, you could use this as a reference too.
However, if your struct was allocated on the stack, you're out of luck.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js