Sharing underlying state in C++

Hi, I'm wondering whether there is anything in the C++ standard that allows one object to "share" its data with another object.
For example:
LinkedList a = std::share(myLinkedListObject);
In this case the data of a is identical to that of myLinkedListObject, and both refer to the same memory. Ownership of the data could either be handled by a shared_ptr, or the original object (myLinkedListObject) would retain ownership and control the lifetime of the data.
I acknowledge that the intuitive solution would simply be to use a pointer to the appropriate object, but I am creating a wrapper class for a set of data, in which case the fields of the wrapper class would be different but the data itself would be identical.

There is nothing built into C++ that codifies the semantics you want, mostly because, unlike move and copy, the full behavior of the source and destination objects isn't really well defined. For example, what if you modify the new object a after your "share" operation? Should the source list also reflect your modification?
If so, then you want some type of reference. Depending on the surrounding code and the ownership of the source object, you might be able to accomplish that with a plain reference or pointer, or you may need to use some type of smart pointer like a shared_ptr.
If not, then you are really talking about a copy operation: the source is copied to the destination, but they have distinct states after that point. In this case sharing the underlying state is still possible, but any write operation to either object needs to modify only the written object. This is referred to as copy-on-write (COW) and can be used to implement certain operations in an efficient way without affecting copy semantics.
You could implement copy-on-write semantics yourself depending on your use case. For example, if you have a linked list of immutable objects, a share operation could conceivably simply point at the original list, and any mutations would adjust only the necessary nodes to ensure that only the written list is logically modified.
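A minimal copy-on-write sketch of the idea above. The CowList class and its member names are hypothetical, and it copies the whole buffer on the first write rather than only the affected nodes of a linked list, so treat it as an illustration of the semantics under a single-threaded assumption, not as a tuned implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write container (single-threaded sketch).
class CowList {
public:
    CowList() : data_(std::make_shared<std::vector<int>>()) {}

    // Reads go straight to the (possibly shared) buffer.
    std::size_t size() const { return data_->size(); }
    int at(std::size_t i) const { return data_->at(i); }

    // Writes detach first, so other owners never observe the change.
    void push_back(int v) {
        detach();
        data_->push_back(v);
    }

    // True while both objects still point at the same buffer.
    bool shares_with(const CowList& other) const { return data_ == other.data_; }

private:
    // Make a private copy of the buffer if anyone else is using it.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<int>>(*data_);
        }
    }

    std::shared_ptr<std::vector<int>> data_;
};
```

Copying a CowList is cheap (only the shared_ptr is copied); the first push_back on either copy pays for the real copy, which keeps ordinary copy semantics intact.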
In your specific case, it isn't really clear which of these options you want (if either). You mention that you want to create a wrapper object around some data in which the fields of the wrapper object may be different but the underlying data will be the same. That itself doesn't require any tricks at all: you just have a wrapper object which has fields, and then the data field which is shared must be a pointer or reference [1] to a data object with appropriate lifetime semantics (e.g., shared_ptr).
[1] In general, reference members are not too useful and pointers (dumb or smart) should generally be preferred.
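A rough sketch of such a wrapper, assuming shared ownership via std::shared_ptr. The Data and Wrapper types and the label field are made up for illustration; the question does not say what the real fields are.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload type shared between wrappers.
struct Data {
    std::vector<int> values;
};

// Each wrapper has its own fields (here: a label) but points at shared Data.
class Wrapper {
public:
    Wrapper(std::string label, std::shared_ptr<Data> data)
        : label_(std::move(label)), data_(std::move(data)) {}

    const std::string& label() const { return label_; }
    Data& data() { return *data_; }
    const Data& data() const { return *data_; }

private:
    std::string label_;            // per-wrapper state
    std::shared_ptr<Data> data_;   // shared state, kept alive by all owners
};
```

Two wrappers built from the same shared_ptr see each other's changes to the data, while their labels stay independent.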

Related

std::function internal memory organization and copies; passing reference vs value

When a std::function is copied, are the code instructions it references copied as well?
A std::function is initialized via some form of callable that points to executable code in some way (like a function pointer typically does). Now, when a function object is copied, is this executable code copied at runtime or is it referenced internally?
To rephrase the question: If one instance of std::function is copied, are there then multiple copies of the same compiled code instructions in memory?
Is std::function an object that actually stores the function code or is it more an abstraction for a function pointer?
The former would seem wasteful and I don't suspect it, but everything I have found so far on the subject is either too vague, lacking, or too specific for me to say for sure. For example:
When the target is a function pointer or a std::reference_wrapper, small object optimization is guaranteed, that is, these targets are always directly stored inside the std::function object, no dynamic allocation takes place. Other large objects may be constructed in dynamic allocated storage and accessed by the std::function object through a pointer. - cppreference
gives some hints about how it's done but seems still too vague and maybe is not related at all to this question, because of further abstractions inside of std::function.
For context: I am trying to refactor some bad C-ish code that maps input events (keystrokes, mouse input and the like) to a certain behavior. That behavior is executed upon a target data structure, which the program can interpret as more specific input with semantic context beyond raw keystrokes (aka keybindings). As one might suspect, the requirements of the behaviors vary drastically.
This was previously implemented with lists of defines and numbers specifying input event IDs, and hard-coded behavior selected by switch-case. We are quickly approaching the point where this initial way of doing it becomes unwieldy.
To get from the lists of defines to an expandable, declarative, object-oriented and flexible design, I am considering higher-order functions.
Some behavior is quite simple and repeatedly needed (for example, toggling one value in the output data structure), while other behaviors are more complex with multiple conditions attached. I'd therefore like to declare some of the behavior statically, but still be free to assign a special lambda in some cases. Since I need to store behavior per input (key, mouse button, mouse axis, etc.), and potentially many copies of one behavior type can be instantiated at one time for different sets of keybindings, I wonder whether this behavior should be referenced rather than stored by value. In the former case, fresh lambdas would need to be owned by the behavior structures, but statically declared behavior would not, which pragmatically would lead to some shared_ptr shenanigans. In the latter case, by value, this would not be an issue, but I wouldn't want multiple copies of, for example, the toggle behavior to cause too much redundant overhead.

(Note: the whole discussion below is a little simplified. AFAIK, none of it is wrong, but I did omit some details and edge cases and definitions and implementation stuff.)
The std::function does not copy any executable code. The executable code is always merely pointed to, by std::function. And when the std::function gets copied, the pointer gets duplicated (which is completely fine, because executable code is never freed either.) So far, there is no difference between a plain old function pointer and a std::function.
But that's not the whole story.
Contrary to function pointers, instances of std::function can carry around "state" as well as a pointer to the executable code, and the whole hubbub about std::function having to allocate/deallocate and copy/move data around is about this extra state, not the function pointer.
Suppose that you have code like this:
int x = 42, y = 17;
std::function<int()> f = [x, y] {return x + y;};
(Note that although I've used a lambda here, the following explanation applies equally to "functors", "function objects", "bind results" and other forms of callable things in C++; everything except plain old function pointers.)
Here, f not only stores the pointer to the executable code for return x + y;, but it also has to remember the values of x and y. Since the amount of state that you can "capture" in this way is not limited, the std::function must in general be able to allocate memory from the heap upon construction, and to deallocate, copy and move that memory at appropriate times. Again, it is this extra "state" that gets copied, not the code.
Let's review: each std::function needs to be able to store at least a pointer to executable code, and 0 or more bytes of extra captured state. If there is no captured state, a std::function is essentially the same as a function pointer (although in practice, std::functions are usually implemented polymorphically and have other stuff in there.)
Some (most) implementations of std::function that I'm aware of employ an optimization that is called "Small Object Optimization". In these implementations, in addition to the space for the pointer to code, the std::function object has some more (fixed amount of) space inside its instance (i.e. as a member of its class, as opposed to somewhere else on the heap) and will use that area if the total number of bytes of the captured state would fit in there. This eliminates the heap allocation, which is important in some use cases and would balance out the additional memory used (when there is no or little state to capture.)
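To make the "state is copied, code is not" point concrete, here is a small sketch. The make_counter and demo_independent_state names are made up for this example; the lambda uses a C++14 init-capture to hold a counter as its captured state.

```cpp
#include <cassert>
#include <functional>

// Builds a callable whose captured state (n) is stored inside the std::function.
inline std::function<int()> make_counter() {
    return [n = 0]() mutable { return ++n; };
}

inline void demo_independent_state() {
    std::function<int()> f = make_counter();
    assert(f() == 1);              // f's n is now 1
    std::function<int()> g = f;    // copies the captured n, not the code
    assert(f() == 2);              // advances f's n only
    assert(g() == 2);              // g continues from its own copy of n == 1
}
```

Both f and g run the same compiled lambda body; only the captured counter exists twice after the copy.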

I think the documentation regarding exceptions sheds some light on this:
Does not throw if other's target is a function pointer or a std::reference_wrapper, otherwise may throw std::bad_alloc or any exception thrown by the constructor used to copy or move the stored callable object. CppReference
This seems to imply that every copy of the std::function copies the contained callable as well. For example, if your function contains a lambda that captures a vector, that lambda, and as a result the vector, gets copied. The actual machine code that is linked to it stays in the read-only part of your executable and won't be copied.
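One way to observe this is to count copy-constructions of the stored callable. The CountingCallable type below is a hypothetical helper written for this sketch; the exact number of copies made while the std::function is first constructed is implementation-dependent, so the sketch only measures the copy of the std::function itself.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callable that counts how often it is copy-constructed.
struct CountingCallable {
    static int copies;
    std::vector<int> payload;

    explicit CountingCallable(std::vector<int> p) : payload(std::move(p)) {}
    CountingCallable(const CountingCallable& other) : payload(other.payload) { ++copies; }
    CountingCallable(CountingCallable&& other) noexcept : payload(std::move(other.payload)) {}

    int operator()() const { return static_cast<int>(payload.size()); }
};
int CountingCallable::copies = 0;

// Copies a std::function and reports how many copies of the target that caused.
inline int copies_caused_by_function_copy() {
    std::function<int()> f = CountingCallable(std::vector<int>{1, 2, 3});
    const int before = CountingCallable::copies;
    std::function<int()> g = f;    // copies the stored callable, vector included
    assert(g() == f());            // both report a payload of size 3
    return CountingCallable::copies - before;
}
```

On the common standard library implementations this should report a single copy of the target, but the standard only promises that the new object targets a copy.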
An update from the C++20 standard draft, 20.14.16.2.1 Constructors and destructor [func.wrap.func.con]:
function(const function& f);
Postconditions: !*this if !f; otherwise, *this targets a copy of f.target().
Throws: Nothing if f's target is a specialization of reference_wrapper or a function pointer. Otherwise, may throw bad_alloc or any exception thrown by the copy constructor of the stored callable object.
[Note:
Implementations should avoid the use of dynamically allocated memory
for small callable objects for example, where f’s target is an object holding only a pointer or reference to an object and a member function pointer. — end note]

It seems that std::function manages only a single callable.
When it is copied, what happens to the code is determined by the callable itself.
In the function pointer case, only the function pointer needs to be copied.
In the lambda or custom callable case, this is determined by how the lambda or the custom callable class implements copying.
The latter two can typically hold members of their own, apart from the reference to code. Therefore std::function must allocate some space to accommodate these cases. This can be misleading, as it may look as if std::function allocates space for code. The management of instruction code seems to be left to the callable, however that is done internally.
In this context, the default copy behavior of commonly used callables (like lambdas) seems far more interesting for the intended question, but it stretches the question too far beyond the context of std::function.
I therefore consider this question solved as posed, and will deepen my knowledge of how lambdas are implemented, especially with regard to how they are compiled and how the compiled code is referenced.

Are there more than two benefits of move semantics?

I'm trying to more fully understand when to implement move semantics, and I believe it's intertwined with what the benefits are.
So far I'm aware of two.
Saving two likely expensive operations when moving an object (that is, when the source is known to vanish soon and not be used again). Without move, a full copy and a full destructor would have been executed. Move saves an atomic increment and decrement for atomically reference-counted objects, the deep copy of bitmaps or other data structures held by pointers, the duplication of a file handle (one copy of which would then be closed), or any other pair of "copy" and "destruct".
Implementing objects that can't be copied (as in duplicated) but can be moved, making sure there is only ever a single object with the same contents, while the object can still be handed off to a function. E.g. unique_ptr, or any object where copying is not possible or desired, but which needs to be created in one place and used somewhere else.
The distinction between the two is that the first deals with performance and the second with preventing semantic copying.
My question is this: are there any other uses or advantages to implementing move semantics for a class, that is, a consideration other than performance or making sure only one location contains a live copy of the object?

When you move a container in certain contexts, the pointers and references and iterators travel with it. When you copy it, they do not. This is neither performance, nor an unmovable object; rather, move offers a different set of guarantees than copy.
This also can hold whenever you can have anything reference-like that talks about the "inside" of a complex object. move logically can make that reference travel, while copy may not.
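A small sketch of that difference for std::vector. The two function names are made up for the example, and it only covers move construction; move assignment with unequal allocators is one of the contexts where the pointers do not travel.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// After move construction, a pointer into the old vector refers to the
// same element, which now belongs to the new vector.
inline bool pointer_survives_move() {
    std::vector<int> v{1, 2, 3};
    int* p = &v[1];
    std::vector<int> w = std::move(v);   // w takes over v's buffer
    return p == &w[1] && *p == 2;
}

// After a copy, the new vector has its own buffer, so the pointer
// still refers to the original's element.
inline bool pointer_survives_copy() {
    std::vector<int> v{1, 2, 3};
    int* p = &v[1];
    std::vector<int> w = v;              // w allocates a separate buffer
    return p == &w[1];
}
```

Here pointer_survives_move() returns true and pointer_survives_copy() returns false, which is exactly the "different set of guarantees" described above.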

It is related to your points, but I think another benefit of move semantics is as a way of documenting ownership in your code. By using move you can make it clear that ownership of an object has been transferred from one place to another. This makes it easier to reason about code.
If you are trying to debug a program and understand the lifetime of an object, it is easier if there is just one path to follow. If ownership transfer is implemented as a copy you have to make sure that the lifetime of the original copy of the object has ended as well.
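As an illustration of move documenting ownership, here is a sketch with a hypothetical Queue that takes a std::unique_ptr by value. The Job and Queue types are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Job {
    int id;
};

// Hypothetical owner: taking unique_ptr by value says "I take ownership".
class Queue {
public:
    void submit(std::unique_ptr<Job> job) { jobs_.push_back(std::move(job)); }
    std::size_t size() const { return jobs_.size(); }

private:
    std::vector<std::unique_ptr<Job>> jobs_;
};
```

A caller has to write q.submit(std::move(job)), so the hand-off is visible at the call site, and job is null afterwards: there is only one lifetime path to follow.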

Wrapping an opengl object into a c++ class with copying

How is this usually done?
For example you might have a texture class. This would of course hold the GLuint id, and maybe other fields such as width and height. When the object needs to be copied for whatever reason, a user-defined copy constructor is needed.
Now, in the case of an opengl texture, it is possible to copy it to another texture object. But what about shader programs, or FBOs? These can't be copied so easily. How do people usually go about doing this? Should they be reference counted? Should copying be disabled on all objects? Should copying be disabled on all objects that can't be copied?
What is the best way to go about this? Thanks in advance for any answers.

For something like a texture there may actually be a point to copying it, but for a shader object much less so (in my experience). Even for objects where copying makes sense, you don't want to do it very often so you want to make it very explicit.
Either you use your wrapper class by-value (so it is something like a handle to your OpenGL entity) and it uses reference counting internally, or you consider your wrapper class instance to own the OpenGL entity and you use reference counting on your wrapper class itself (using std::shared_ptr for example).
In the latter case you could implement a copy-constructor only on those entities for which it makes sense. However, in order to avoid unintentional use I normally resort to a private constructor and a public static factory function on the entity that returns a new instance as a smart pointer, so it is not mistakenly used by-value. In that case it also makes more sense to have an explicit "Copy/Clone" member function (easier to use and it allows for polymorphism).
In the former case the copy-constructor only increments the reference count, so you will need to add some explicit member for copying where appropriate anyway.
I usually go with the latter option by the way... I'm used to passing resource-intensive objects around by smart pointers and I don't see much point in reimplementing their functionality in custom handles.
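A rough sketch of the latter option (private constructor, static factory returning a smart pointer, explicit clone). The fake_gl functions are stand-ins invented so the example is self-contained; a real wrapper would call something like glGenTextures and glDeleteTextures with a current context, and a real clone would also copy the pixel data.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for real OpenGL calls (assumption: not the actual GL API).
namespace fake_gl {
    inline int& live_count() { static int n = 0; return n; }
    inline unsigned create() { static unsigned next = 1; ++live_count(); return next++; }
    inline void destroy(unsigned) { --live_count(); }
}

// Hypothetical texture wrapper that owns exactly one GL object.
class Texture {
public:
    // The only way to create one: a reference-counted instance.
    static std::shared_ptr<Texture> create(int width, int height) {
        return std::shared_ptr<Texture>(new Texture(width, height));
    }
    ~Texture() { fake_gl::destroy(id_); }

    // No implicit copies; sharing happens through the shared_ptr.
    Texture(const Texture&) = delete;
    Texture& operator=(const Texture&) = delete;

    // Explicit, visible copy: creates a second GL object.
    std::shared_ptr<Texture> clone() const { return create(width_, height_); }

    unsigned id() const { return id_; }

private:
    Texture(int w, int h) : id_(fake_gl::create()), width_(w), height_(h) {}

    unsigned id_;
    int width_;
    int height_;
};
```

Copying the shared_ptr shares the same GL object; only an explicit clone() call creates a new one, and the GL object is released when the last shared_ptr goes away.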

Passing smart-pointers by reference

Smart-pointers are generally tiny so passing by value isn't a problem, but is there any problem passing references to them; or rather are there specific cases where this mustn't be done?
I'm writing a wrapper library and several of my classes wrap smart-pointer objects in the underlying library... my classes are not smart-pointers but the APIs currently pass smart-pointer objects by value.
e.g. current code:
void class::method(const AnimalPtr pAnimal) { ... }
becomes
void class::method(const MyAnimal &animal){...}
where MyAnimal is my new wrapper class encapsulating AnimalPtr.
There is no guarantee the Wrapper classes won't one day grow beyond wrapping a smart-pointer, so passing by value makes me nervous.

You should pass shared pointers by reference, not value, in most cases. While the size of a std::shared_ptr is small, the cost of copying involves an atomic operation (conceptually an atomic increment and an atomic decrement on destruction of the copy, although I believe that some implementations manage to do a non-atomic increment).
In other cases, for example with std::unique_ptr, you might prefer to pass by value, as the copy will have to be a move and it clearly documents that ownership of the object is transferred to the function (if you don't want to transfer ownership, then pass a reference to the real object, not the std::unique_ptr).
In other cases your mileage might vary. You need to be aware of what the semantics of copy are for your smart pointer, and whether you need to pay for the cost or not.
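A small sketch of the difference for std::shared_ptr, assuming AnimalPtr is an alias for std::shared_ptr<Animal> (the question does not say which smart pointer it is). The Animal type and function names are made up for the example.

```cpp
#include <cassert>
#include <memory>

struct Animal {
    int legs = 4;
};
using AnimalPtr = std::shared_ptr<Animal>;   // assumption for this sketch

// By value: the parameter is a second owner for the duration of the call.
inline long count_by_value(AnimalPtr p) { return p.use_count(); }

// By const reference: no new owner, no reference-count traffic.
inline long count_by_ref(const AnimalPtr& p) { return p.use_count(); }

// No ownership involved at all: take the object itself.
inline int legs_of(const Animal& a) { return a.legs; }
```

With a single owner a, count_by_value(a) sees two owners while count_by_ref(a) sees one; legs_of(*a) avoids the smart pointer in the interface entirely.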

It's ok to pass a smart pointer by reference, except if it's to a constructor. In a constructor, it's possible to store a reference to the original object, which violates the contract of the smart pointers. You would likely get memory corruption if you did that. Even if your constructor does not today store the reference, I would still be wary because code changes and it's an easy thing to miss if you decide later you need to hold the variable longer.
In a normal function, you cannot store a function parameter as a reference anywhere because references must be set during their initialization. You could assign the reference to some longer-living non-reference variable, but that would be a copy and so would increase its lifetime appropriately. So in either case, you could not hold onto it past when the calling function might have freed it. In this case, you might get a small performance boost with a reference, but I wouldn't plan on noticing it in most cases.
So I would say - constructor, always pass by value; other functions, pass by reference if you want.

What is the difference between: Handle, Pointer and Reference

How does a handle differ from a pointer to an object and also why can't we have a reference to a reference?

A handle is usually an opaque reference to an object. The type of the handle is unrelated to the element referenced. Consider for example a file descriptor returned by the open() system call. The type is int but it represents an entry in the open files table. The actual data stored in the table is unrelated to the int that was returned by open(), freeing the implementation from having to maintain compatibility (i.e. the actual table can be refactored transparently without affecting user code). Handles can only be used by functions in the same library interface, which can map the handle back to the actual object.
A pointer is the combination of an address in memory and the type of the object that resides in that memory location. The value is the address; the type of the pointer tells the compiler what operations can be performed through that pointer and how to interpret the memory location. Pointers are transparent in that the object referenced has a concrete type that is apparent from the pointer. Note that in some cases a pointer can serve as a handle (a void* is fully opaque, a pointer to an empty interface is just as opaque).
References are aliases to an object. That is why you cannot have a reference to a reference: you can have multiple aliases for an object, but you cannot have an alias of an alias. As with pointers, references are typed. In some circumstances, references can be implemented by the compiler as pointers that are automatically dereferenced on use; in other cases the compiler can have references that have no actual storage. The important part is that they are aliases to an object: they must be initialized with an object and cannot be reseated to refer to a different object after they are initialized. Once they are initialized, all uses of the reference are uses of the real object.
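The aliasing behavior can be checked directly. In the sketch below, the IntRef alias and the function name are made up for the example; the static_asserts show what the language does when a type alias would otherwise produce a "reference to a reference" (the references collapse into a single one).

```cpp
#include <cassert>
#include <type_traits>

using IntRef = int&;

// Reference collapsing: a "reference to IntRef" is still just int&.
static_assert(std::is_same<IntRef&, int&>::value, "collapses to int&");
static_assert(std::is_same<IntRef&&, int&>::value, "collapses to int&");

inline bool reference_is_alias() {
    int x = 1;
    int y = 2;
    int& r = x;   // r is another name for x
    r = y;        // assigns y's value to x; it does not reseat r
    return &r == &x && x == 2 && &r != &y;
}
```

Taking the address of r yields the address of x, and assigning to r changes x, which is what "all uses of the reference are uses of the real object" means in practice.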

To even ask the question, "why can't we have a reference to a reference?" means you don't understand what a reference is.
A reference is another name for an object; nothing more. If I have an object stored in variable X, I can create a variable Y that is a reference to this object. They're both talking about the same object, so what exactly would it mean to have a reference to Y? It wouldn't be any different from having a reference to X because they're all referencing the same thing.
A "handle" does not have a definition as far as the C++ language is concerned. Generally speaking, a "handle" is a construct of some form which represents some sort of resource. You get it from some API that creates the resource. You call functions that take the handle as a parameter in order to query the state of the resource or modify it. And when you're done with it, you give it to some other API function that releases the resource.
A pointer could be a handle. A reference could be a handle. An object could be a handle. An integer could be a handle. It all depends on what the system that implements the handle wants to do with it.

A handle is also sometimes called a "magic cookie". It's just a value of some opaque type that identifies an object. In some cases it's implemented as an actual pointer, so if you cast it to a pointer to the correct type, you can dereference it and work with whatever sort of thing it points at.
In other cases, it'll be implemented as something other than a pointer -- for example, you might have a table of objects of that type, and the handle is really just an index into that table. Unless you know the base address of the table, you can't do much of anything with the index.
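A minimal sketch of the index-into-a-table style of handle. The Registry class and its member names are hypothetical; the point is only that the handle value is meaningless without the library that owns the table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical library that hands out opaque integer handles.
class Registry {
public:
    using Handle = std::size_t;

    // Creates an entry and returns its handle (an index into the table).
    Handle open(const std::string& name) {
        table_.push_back(name);
        return table_.size() - 1;
    }

    // Only the registry can map a handle back to the object it identifies.
    const std::string& name_of(Handle h) const { return table_.at(h); }

private:
    std::vector<std::string> table_;
};
```

User code only ever stores and passes the Handle; the table layout can change without affecting callers.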
C++ simply says that references to references aren't possible. There isn't much in the way of a "why" -- if they'd wanted to badly enough, they undoubtedly could have allowed it (as well as arrays of references, for that matter). The decision was made, however, that it was best to restrict references (a lot), so that's what they did.

The difference is the context.
The basic meaning of a handle is that it refers to some object in a very limited context; e.g. an OS can keep only 20 files open for a user or pid. A pointer refers to the same object in the context of "memory". And a reference is an "alias" to an object: it refers to an object in the context of source code; thus a reference to a reference doesn't exist, as a reference already "is" the object.
