Is there an adequate way in C++ to extract a hash from the data that std::any stores?
Or, failing that, a way to get at the stored object as a sequence of bytes plus its length?
std::any is a type-safe mechanism for passing an object of known type from one location to another, through an intermediary that does not need to know what that type is. Computing a hash from it is not its goal. And indeed, it wouldn't be meaningfully possible without compromising any's functionality.
Hashing an object requires some knowledge of what that object is and is doing. Assuming that you can just look at the bytes of the object representation and thereby compute a meaningful hash from it is not going to end well. It might appear to work... for a while. But eventually, it's going to do the wrong thing.
You could create a type-erased type similar to any that requires the object to implement hashing. But std::any is not that type, because anyone who doesn't want to hash the types they put into any would be unable to store said object in any.
This is because any operation that any provides is an operation that all types that get stored into any must also provide. For example, any is copyable, therefore any cannot store move-only types. That is an annoyance for those who want to do so, and the more functionality you dump into any, the more restrictive the type's ability to store "any"thing becomes.
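To illustrate the "type-erased type similar to any that requires hashing" idea, here is a minimal sketch. The name hashable_any and its interface are made up for this example; it assumes every stored type is copy constructible and has a std::hash specialization, which is exactly the extra requirement std::any deliberately does not impose.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <typeinfo>
#include <utility>

// Hypothetical type: an any-like wrapper whose requirements include std::hash<T>.
class hashable_any {
    struct concept_t {
        virtual ~concept_t() = default;
        virtual std::unique_ptr<concept_t> clone() const = 0;
        virtual std::size_t hash() const = 0;
        virtual const std::type_info& type() const noexcept = 0;
    };

    template <class T>
    struct model_t final : concept_t {
        explicit model_t(T v) : value(std::move(v)) {}
        std::unique_ptr<concept_t> clone() const override {
            return std::make_unique<model_t>(value);  // requires copyable T
        }
        // The hash comes from the type's own std::hash, not from raw bytes.
        std::size_t hash() const override { return std::hash<T>{}(value); }
        const std::type_info& type() const noexcept override { return typeid(T); }
        T value;
    };

    std::unique_ptr<concept_t> self_;

public:
    // Fails to compile for any T without std::hash<T> or a copy constructor.
    template <class T>
    hashable_any(T value) : self_(std::make_unique<model_t<T>>(std::move(value))) {}

    // Only copying is offered, so a hashable_any is never left empty.
    hashable_any(const hashable_any& other) : self_(other.self_->clone()) {}
    hashable_any& operator=(hashable_any other) noexcept {
        self_ = std::move(other.self_);
        return *this;
    }

    std::size_t hash() const { return self_->hash(); }
    const std::type_info& type() const noexcept { return self_->type(); }
};
```

Trying to store a type without a std::hash specialization in this wrapper is a compile error, which is the restriction described above: the wrapper gains an operation and loses the ability to hold "any"thing.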
Related
C++17 introduces std::variant and std::any, both able to store values of different types in a single object. To me, they seem somewhat similar (are they?).
Besides that, std::variant restricts the types it can hold. Why should we prefer std::variant over std::any, which is simpler to use?
The more things you check at compile time the fewer runtime bugs you have.
variant guarantees that it contains one of a list of types (plus valueless by exception). It provides a way for you to guarantee that code operating on it considers every case in the variant with std::visit; even every case for a pair of variants (or more).
any does not. With any the best you can do is "if the type isn't exactly what I ask for, some code won't run".
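A small sketch of that difference, using made-up names (Value, describe_variant, describe_any). The variant version is checked against every alternative at compile time; the any version can only guess at run time and silently falls through for anything it did not anticipate.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

using Value = std::variant<int, double, std::string>;

// std::visit instantiates the lambda for every alternative, so adding a new
// alternative that no branch handles stops the build at the static_assert.
inline std::string describe_variant(const Value& v) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, int>) return "int";
        else if constexpr (std::is_same_v<T, double>) return "double";
        else if constexpr (std::is_same_v<T, std::string>) return "string";
        else static_assert(!sizeof(T), "unhandled alternative");
    }, v);
}

// With std::any each any_cast is a runtime check for one exact type;
// a type nobody asked about just ends up in the fallback.
inline std::string describe_any(const std::any& a) {
    if (std::any_cast<int>(&a)) return "int";
    if (std::any_cast<double>(&a)) return "double";
    if (std::any_cast<std::string>(&a)) return "string";
    return "unknown";
}
```

Passing a char to describe_any compiles fine and yields "unknown"; the equivalent mistake with Value does not compile.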
variant exists in automatic storage. any may use the free store; this means any has performance and noexcept(false) issues that variant does not.
Checking for which of N types is in it is O(N) for an any -- for variant it is O(1).
any is a dressed-up void*. variant is a dressed-up union.
any cannot store non-copyable or non-movable types. variant can.
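The copyability point can be checked with type traits. This is a sketch with hypothetical names (MoveOnly, demo_move_only, demo_non_movable); the only library facts it relies on are that std::any's converting constructor requires a copy-constructible type and that a variant's copy/move operations follow those of its alternatives.

```cpp
#include <any>
#include <cassert>
#include <memory>
#include <mutex>
#include <type_traits>
#include <utility>
#include <variant>

// std::any's converting constructor only participates for copy-constructible types.
static_assert(std::is_constructible_v<std::any, int>);
static_assert(!std::is_constructible_v<std::any, std::unique_ptr<int>>);

// A variant simply takes on the properties of its alternatives.
using MoveOnly = std::variant<int, std::unique_ptr<int>>;
static_assert(!std::is_copy_constructible_v<MoveOnly>);
static_assert(std::is_move_constructible_v<MoveOnly>);

// Moving a variant that holds a move-only alternative.
inline int demo_move_only() {
    MoveOnly v = std::make_unique<int>(7);
    MoveOnly w = std::move(v);
    return *std::get<std::unique_ptr<int>>(w);
}

// std::mutex can be neither copied nor moved, yet it can live in a variant
// (the variant itself is then neither copyable nor movable).
inline bool demo_non_movable() {
    std::variant<std::mutex, int> v;  // default-constructs the std::mutex in place
    return v.index() == 0;
}
```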
The type of variant is documentation for the reader of your code.
Passing a variant<Msg1, Msg2, Msg3> through an API makes the operation obvious; passing an any there means understanding the API requires reliable documentation or reading the implementation source.
Anyone who has been frustrated by statically typeless languages will understand the dangers of any.
Now this doesn't mean any is bad; it just doesn't solve the same problems as variant. As a copyable object for type erasure purposes, it can be great. Runtime dynamic typing has its place; but that place is not "everywhere" but rather "where you cannot avoid it".
The difference is that the objects are stored within the memory allocated by std::variant:
cppreference.com - std::variant
As with unions, if a variant holds a value of some object type T, the object representation of T is allocated directly within the object representation of the variant itself. Variant is not allowed to allocate additional (dynamic) memory.
and for std::any this is not possible.
Because of that, a std::variant needs no storage beyond the std::variant object itself, and it can stay on the stack.
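A rough way to see the in-place storage, as a sketch: Big, V and stored_inline are names invented here, and the address comparison via std::uintptr_t assumes the usual flat address space. The same check has no meaningful counterpart for std::any, whose size is implementation-defined and which may place a large object on the free store.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <variant>

using Big = std::array<char, 256>;
using V = std::variant<int, Big>;

// The variant object itself must be large enough for its largest alternative.
static_assert(sizeof(V) >= sizeof(Big));

// True if the held Big lies entirely within the bytes of the variant object.
inline bool stored_inline(const V& v) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&v);
    const auto p = reinterpret_cast<std::uintptr_t>(&std::get<Big>(v));
    return p >= begin && p + sizeof(Big) <= begin + sizeof(V);
}
```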
In addition to never using additional heap memory, variant has one other advantage:
You can std::visit a variant, but not any.
Is there a term for a class/struct that is both trivial and standard-layout but also has no pointer members?
Basically I'd like to refer to "really" plain-old-data types. Data that I can grab from memory and store on disk, and read back into memory for later processing because it is nothing more than a collection of ints, characters, enums, etc.
Is there a way to test at compile time if a type is a "really" plain-old-data type?
related:
What are POD types in C++?
What are Aggregates and PODs and how/why are they special?
This can depend on semantics of the structure. I could imagine a struct having int fields being keys into some volatile temporary data store (or cache). You still shouldn't serialize those, but you need internal knowledge about that struct to be able to tell [1].
In general, C++ lacks features for generic serialization. Making this automatic just on pointers is just a tip of the iceberg (if possibly pretty accurate in general) - it's also impossible in a generic way. C++ still has no reflection, and thus no way to check "every member" for some condition.
The realistic approaches could be:
preprocessing the class sources before build to scan for pointers
declaring all structs that are to be serialized with some macros that track the types
the regular template check could be implemented for a set of known names for fields
All of those have their limitations, though, and together with my earlier reservations, I'm not sure how practical they'd be.
[1] This of course goes both ways; pointers could be used to store relative offsets, and thus be perfectly serializable.
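For reference, this is roughly what the standard type traits can and cannot tell you. The trait name is_flat_pod_candidate_v, the sample structs and round_trip are made up for this sketch. Note that the struct with a pointer member still passes, which is the limitation discussed above.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical trait name; the standard has no "has no pointer members" query.
template <class T>
inline constexpr bool is_flat_pod_candidate_v =
    std::is_trivial_v<T> && std::is_standard_layout_v<T>;

struct Sample { int id; char tag[4]; double value; };
struct WithPointer { int id; const char* name; };
struct WithVirtual { virtual ~WithVirtual() = default; int id; };

static_assert(is_flat_pod_candidate_v<Sample>);
static_assert(!is_flat_pod_candidate_v<WithVirtual>);
// Passes even though it holds a pointer: the traits cannot inspect members.
static_assert(is_flat_pod_candidate_v<WithPointer>);

// Byte-wise round trip through a buffer, standing in for "write to disk, read back".
template <class T>
T round_trip(const T& in) {
    static_assert(is_flat_pod_candidate_v<T>, "T must be trivial and standard-layout");
    unsigned char buf[sizeof(T)];
    std::memcpy(buf, &in, sizeof(T));
    T out;
    std::memcpy(&out, buf, sizeof(T));
    return out;
}
```

So the compile-time test only gets you "safe to memcpy"; whether the bytes still mean anything after a reload remains a property of the struct's semantics.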
As far as I know each polymorphic class in C++ contains a string with a mangled type name. And RTTI is implemented by string comparison.
Is this true? Would it be more efficient to implement a centralized type storage instead?
With centralized type storage each object can just hold a pointer to type information. Dynamic casts can be implemented simply by pointer comparison.
The actual implementation is even more efficient than one pointer per object.
The Standard forbids adding any data to "standard layout" classes, so there's not even room for a pointer, let alone a string. For polymorphic classes, there will be extra metadata, but in real-world implementations, all data specific to the dynamic type of the object is stored together, and there's just one pointer needed to all of it.
As a result, because polymorphic objects already need a pointer to the virtual function dispatch table, there is zero incremental per-object cost to storing the type name. There is just an extra pointer stored in the v-table alongside the function pointers, so the cost is one pointer per polymorphic type no matter how many instances exist.
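From the language side this looks as follows. The sketch uses invented types (Plain, Shape, Circle, Square) and only standard facilities; where the type information physically lives (typically reachable through the v-table) is an implementation detail that the code cannot and does not assert.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

struct Plain { int x; };                        // standard-layout, not polymorphic
struct Shape { virtual ~Shape() = default; };   // polymorphic
struct Circle : Shape {};
struct Square : Shape {};

static_assert(std::is_standard_layout_v<Plain>);
static_assert(!std::is_standard_layout_v<Shape>);  // virtual functions rule it out

// typeid on a polymorphic glvalue reports the dynamic type of the object.
inline bool same_dynamic_type(const Shape& a, const Shape& b) {
    return typeid(a) == typeid(b);
}

// dynamic_cast consults the same per-type information.
inline const Circle* as_circle(const Shape& s) {
    return dynamic_cast<const Circle*>(&s);
}
```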
Polymorphic classes contain whatever the compiler implementers considered worth putting in them; there is no rule or requirement to have any type information.
C++ is a strongly typed language, and the checking is meant to be done by the compiler. The compiled code is typically optimized for performance and/or size, and not to carry information that shouldn't be needed.
Of course, some compilers offer this, but that is not the spirit of the language.
I have been able to find reference material at cppreference.com, cplusplus.com, and this site (What is a scalar Object in C++?) that enables me to determine whether a particular C++ data type is a scalar. Namely, I can apply a mental algorithm that runs like this: "Is it a reference type, a function type, or void? If not, is it an array, class, or union? If not, it's a scalar type." In code, of course, I can apply std::is_scalar<T>. And finally, I can apply the working definition "A scalar type is a type that has built-in functionality for the addition operator without overloads (arithmetic, pointer, member pointer, enum and std::nullptr_t)."
What I have not been able to find is a description of the purpose of the scalar classification. Why would anyone care if something is a scalar? It seems like a kind of "leftover" classification, like "reptile" in zoological taxonomy ("Well, a reptile is, um, an amniote that's not a bird or a mammal"). I'm guessing that it must have some use to justify its messiness. I can understand why someone would want to know whether a type is a reference -- you can't take a reference of a reference, for instance. But why would people care whether something is a scalar? What is scalarness all about?
Given is_scalar<T>, you can be sure that operator=(), operator==() and operator!=() do what you think (that is, assignment, comparison and the inverse of that, respectively) for any T.
a class T might or might not have any of these, with arbitrary meaning;
a union T is problematic;
a function doesn't have =;
a reference might hold any of these;
an array: == and != make both operands decay to pointers and compare the addresses (even for two arrays of different sizes), while = fails at compile time.
Thus if you have is_scalar<T>, you can be sure that these work consistently. Otherwise, you need to look further.
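A quick sketch of the classification and of relying on it in a template. Widget, Color and assign_then_equal are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct Widget { int x; };
enum class Color { red, green };

// Arithmetic, pointer, pointer-to-member, enum and nullptr_t types are scalar.
static_assert(std::is_scalar_v<int>);
static_assert(std::is_scalar_v<double*>);
static_assert(std::is_scalar_v<int Widget::*>);
static_assert(std::is_scalar_v<Color>);
static_assert(std::is_scalar_v<std::nullptr_t>);

// Classes, arrays, references, void and functions are not.
static_assert(!std::is_scalar_v<Widget>);
static_assert(!std::is_scalar_v<int[3]>);
static_assert(!std::is_scalar_v<int&>);
static_assert(!std::is_scalar_v<void>);
static_assert(!std::is_scalar_v<int(int)>);

// Hypothetical helper: for scalars, = and == are the built-in operators, so
// this holds for every value (a floating-point NaN is the usual exception,
// since it never compares equal to anything).
template <class T>
bool assign_then_equal(T value) {
    static_assert(std::is_scalar_v<T>, "only meaningful for scalar types");
    T copy{};
    copy = value;
    return copy == value;
}
```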
One purpose is to write more efficient template specializations. On many architectures, it would be more efficient to pass around pointers to objects than to copy them, but scalars can fit into registers and be copied with a single machine instruction. Or a generic type might need locking, while the machine guarantees that it will read or update a properly-aligned scalar with a single atomic instruction.
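The first point might look like this in practice. It is a sketch only: param_t and Cell are hypothetical names, and whether by-value passing actually wins depends on the ABI and the optimizer.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical alias: scalars travel by value, everything else by const reference.
template <class T>
using param_t = std::conditional_t<std::is_scalar_v<T>, T, const T&>;

static_assert(std::is_same_v<param_t<int>, int>);
static_assert(std::is_same_v<param_t<char*>, char*>);
static_assert(std::is_same_v<param_t<std::string>, const std::string&>);

// A generic container that picks the cheaper parameter style per type.
template <class T>
class Cell {
public:
    void set(param_t<T> v) { value_ = v; }
    param_t<T> get() const { return value_; }

private:
    T value_{};
};
```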
Clue here in the notes on cppreference.com?
Each individual memory location in the C++ memory model, including the hidden memory locations used by language features (e.g virtual table pointer), has scalar type (or is a sequence of adjacent bit-fields of non-zero length). Sequencing of side-effects in expression evaluation, interthread synchronization, and dependency ordering are all defined in terms of individual scalar objects.
I came across an older question, Type erasure techniques, by Xeo, and began to wonder how one should amend that code to make it work with std::unique_ptr and std::shared_ptr.
The code in the post can be found here. The code won't compile if fed with something containing unique_ptrs, and the data in shared_ptrs becomes garbage. What I tried was a class inherited from a templated base class, so maybe it was somewhat complicated too. This is mainly out of curiosity: I began to wonder whether it would be difficult in the general case, as this could come in handy when storing complex objects in, say, a std::vector when Boost.Any isn't available.
Edit: I noticed I just had a bug in my code whilst testing; the code works just fine with shared_ptrs (the contents aren't garbage), though not with unique_ptrs. And then also, why not store a newed instance of this type-erased Any_Virtual (as in the code provided by Xeo) in, say, a std::unique_ptr?
I guess then the questions would be:
How to amend the Any_Virtual so that it could work with std::unique_ptr?
Which one would be better design, a std::vector<Any_Virtual> objects, where Any_Virtual holds a smart pointer, or a std::vector<std::unique_ptr<Any_Virtual>> objects? Or does it even matter?
When you use type-erasure there is always a set of requirements that a type must satisfy to be compatible with the type-erasing container: this set is called the model. In particular, the holder<T>::clone member that copies *this requires in turn that held_ (of type T, the type being erased) be copy constructible. Hence the model of your type-erasing class is copy constructible.
However a type containing an std::unique_ptr will not be copy constructible out of the box.
There is no obvious fix without knowing what you want to achieve. Perhaps you really want the model to be less strict, e.g. it could only require to be move constructible (which a type containing an std::unique_ptr can easily fulfill, right out of the box). Or perhaps you really want those types that do hold std::unique_ptrs to be copy constructible.
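If you go for the less strict model, a sketch could look like the following. MoveOnlyAny is a made-up class, not Xeo's Any_Virtual; it drops clone entirely, so the only requirement on the erased type is move constructibility, and the wrapper itself becomes move-only.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical class: the model is just "move constructible".
class MoveOnlyAny {
    struct holder_base {
        virtual ~holder_base() = default;
        virtual const std::type_info& type() const noexcept = 0;
    };

    template <class T>
    struct holder final : holder_base {
        explicit holder(T v) : held_(std::move(v)) {}
        const std::type_info& type() const noexcept override { return typeid(T); }
        T held_;
    };

    std::unique_ptr<holder_base> ptr_;

public:
    // Taking the value by copy/move and then moving it in is all that is required.
    template <class T>
    MoveOnlyAny(T value) : ptr_(std::make_unique<holder<T>>(std::move(value))) {}

    // No clone(): copying is not part of the model, so it is not offered at all.
    MoveOnlyAny(MoveOnlyAny&&) noexcept = default;
    MoveOnlyAny& operator=(MoveOnlyAny&&) noexcept = default;

    // Returns a pointer to the held object, or nullptr on a type mismatch
    // (or if this object has been moved from).
    template <class T>
    T* get() noexcept {
        if (ptr_ && ptr_->type() == typeid(T))
            return &static_cast<holder<T>&>(*ptr_).held_;
        return nullptr;
    }
};

static_assert(std::is_move_constructible_v<MoveOnlyAny>);
static_assert(!std::is_copy_constructible_v<MoveOnlyAny>);
```

Since the wrapper already owns its contents through a std::unique_ptr and is cheap to move, it can go straight into a std::vector<MoveOnlyAny>; wrapping each element in another std::unique_ptr would only add a second indirection.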
In my opinion the very worst you can do with type-erasure is to compromise, and make an operation on the model work conditionally, depending on whether the type being erased supports such an operation itself. Here that would mean that copying an Any_Virtual value would result in an exception if it happened to hold a value of a non-copy-constructible erased type.
Perhaps more worryingly, the fact that you obtain garbage std::shared_ptrs strongly suggests that there is a problem with either your implementation of Any_Virtual or your use of it. You should definitely not assume that the problem is specific to using std::shared_ptr with Any_Virtual; there might be a problem with using anything with Any_Virtual. Since I haven't noticed a problem in the implementation (but I could easily have overlooked something), I'd like to see an example of a program exhibiting the problem.