How can I zero just the padding bytes of a class? - c++

I want to set the padding bytes of a class to 0, since I am saving/loading/comparing/hashing instances at a byte level, and garbage-initialised padding introduces non-determinism in each of those operations.
I know that this will achieve what I want (for trivially copyable types):
#include <cstring>  // memset

struct Example
{
    Example(char a_, int b_)
    {
        std::memset(this, 0, sizeof(*this));
        a = a_;
        b = b_;
    }

    char a;
    int b;
};
I don't like doing that though, for two reasons: I like constructor initialiser lists, and I know that setting the bits to 0 isn't always the same as zero-initialisation (e.g. pointers and floats don't necessarily have zero values that are all 0 bits).
As an aside, it's obviously limited to types that are trivially copyable, but that's not an issue for me since the operations I listed above (loading/saving/comparing/hashing at a byte level) require trivially copyable types anyway.
What I would like is something like this [magical] snippet:
struct Example
{
    Example(char a_, int b_) : a(a_), b(b_)
    {
        // Leaves all members alone, and sets all padding bytes to 0.
        memset_only_padding_bytes(this, 0);
    }

    char a;
    int b;
};
I doubt such a thing is possible, so if anyone can suggest a non-ugly alternative... I'm all ears :)

There's no way I know of to do this fully automatically in pure C++. We use a custom code generation system to accomplish this (among other things). You could potentially accomplish this with a macro to which you fed all your member variable names; it would simply look for holes between offsetof(memberA)+sizeof(memberA) and offsetof(memberB).
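The hole-zeroing macro described above might look something like the following sketch. The names (`ZERO_PAD_BETWEEN`, `Example`) are made up for illustration; a real version would be generated for every adjacent member pair of every serialized struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Example {
    char a;
    int  b;
};

// Hypothetical helper: zero the hole between two adjacent members,
// located via offsetof/sizeof as described above.
#define ZERO_PAD_BETWEEN(obj, Type, mA, mB)                                  \
    std::memset(reinterpret_cast<unsigned char*>(&(obj))                     \
                    + offsetof(Type, mA) + sizeof((obj).mA),                 \
                0,                                                           \
                offsetof(Type, mB) - offsetof(Type, mA) - sizeof((obj).mA))
```

If the struct happens to have no hole between the two members (e.g. under packed layout), the computed length is zero and the `memset` is a harmless no-op.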
Alternatively, serialize/hash on a memberwise basis, rather than as a binary blob. That's ten kinds of cleaner.
Oh, one other option -- you could provide an operator new which explicitly cleared the memory before returning it. I'm not a fan of that approach, though: for one thing, it doesn't work for stack allocation.

You should never use padded structs when writing/reading them as binary, simply because the padding can vary from one platform to another, which leads to binary incompatibility.
Use compiler directives such as #pragma pack(push, 1) to disable padding when defining those writable structs, and restore it with #pragma pack(pop).
This sadly means losing the optimization padding provides. If that is a concern, you can design your structs carefully and "pad" them manually by inserting dummy variables. Zero-initialization then becomes obvious: you just assign zeros to those dummies. I don't recommend that manual approach, as it's very error-prone, but since you're writing binary blobs you're probably being careful already. But by all means, benchmark the unpadded structs first.
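Both variants described above can be sketched as follows. The `ManuallyPadded` size check assumes a typical 4-byte `int` alignment, so treat it as illustrative rather than guaranteed.

```cpp
#include <cstdint>

// Inside the push/pop region the struct has no internal padding,
// so its bytes are fully determined by its members.
#pragma pack(push, 1)
struct PackedExample {
    char         a;
    std::int32_t b;
};
#pragma pack(pop)

static_assert(sizeof(PackedExample) == sizeof(char) + sizeof(std::int32_t),
              "no padding inside the packed region");

// The manual alternative: pad by hand with dummy members you can zero yourself.
struct ManuallyPadded {
    char         a;
    char         _pad[3];  // explicit dummy bytes; assign zeros to these
    std::int32_t b;
};
static_assert(sizeof(ManuallyPadded) == 8, "layout is now explicit (typical ABI)");
```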

I faced a similar problem - and simply saying that this is a poor design decision (as per dasblinkenlight's comment) doesn't necessarily help as you may have no control over the hashing code (in my case I was using an external library).
One solution is to write a custom iterator for your class, which iterates through the bytes of the data and skips the padding. You then modify your hashing algorithm to use your custom iterator instead of a pointer. One simple way to do this is to templatize the pointer so that it can take an iterator - since the semantics of a pointer and an iterator are the same, you shouldn't have to modify any code beyond the templatizing.
EDIT: Boost provides a nice library which makes it simple to add custom iterators to your container: Boost.Iterator.
Whichever solution you go for, it is highly preferable to avoid hashing the padding as doing so means that your hashing algorithm is highly coupled with your data structure. If you switch data structures (or as Agent_L mentions, use the same data structure on a different platform which pads differently), then it will produce different hashes. On the other hand, if you only hash the actual data itself, then you will always produce the same hash values no matter what data structure you use later.
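One way to make the "skip the padding" idea concrete is to drive the hash from a per-struct table of (offset, size) member ranges instead of a raw byte pointer. The table and the FNV-1a hash below are illustrative, not from any particular library:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Example {
    char a;
    int  b;
};

// Hypothetical member table: one (offset, length) entry per real data
// member. Iterating these ranges skips the padding bytes entirely.
struct MemberRange { std::size_t off, len; };

constexpr MemberRange kExampleMembers[] = {
    { offsetof(Example, a), sizeof(char) },
    { offsetof(Example, b), sizeof(int)  },
};

// FNV-1a over member bytes only; padding never enters the hash.
std::uint64_t hash_members(const Example& e) {
    std::uint64_t h = 1469598103934665603ull;
    const auto* base = reinterpret_cast<const unsigned char*>(&e);
    for (MemberRange m : kExampleMembers)
        for (std::size_t i = 0; i < m.len; ++i) {
            h ^= base[m.off + i];
            h *= 1099511628211ull;
        }
    return h;
}
```

Two instances with equal members but different garbage in the padding now hash identically, which is exactly the determinism the question asks for.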

Related

Is it a good idea to base a non-owning bit container on std::vector<bool>? std::span?

In a couple of projects of mine I have had an increasing need to deal with contiguous sequences of bits in memory - efficiently (*). So far I've written a bunch of inline-able standalone functions, templated on the choice of a "bit container" type (e.g. uint32_t), for getting and setting bits, applying 'or' and 'and' to their values, locating the container, converting lengths in bits to sizes in bytes or lengths in containers, etc. ... it looks like it's class-writing time.
I know the C++ standard library has a specialization of std::vector<bool>, which is considered by many to be a design flaw - as its iterators do not expose actual bools, but rather proxy objects. Whether that's a good idea or a bad one for a specialization, it's definitely something I'm considering - an explicit bit proxy class, which will hopefully "always" be optimized away (with a nice greasing-up with constexpr, noexcept and inline). So, I was thinking of possibly adapting std::vector code from one of the standard library implementations.
On the other hand, my intended class:
Will never own the data / the bits - it'll receive a starting bit container address (assuming alignment) and a length in bits, and won't allocate or free.
It will not be able to resize the data, dynamically or otherwise - not even while retaining the same amount of space, like std::vector::resize(); its length will be fixed during its lifespan/scope.
It shouldn't know anything about the heap (and should work when there is no heap)
In this sense, it's more like a span class for bits. So maybe start out with a span then? I don't know, spans are still not standard; and there are no proxies in spans...
So what would be a good basis (edit: NOT a base class) for my implementation? std::vector<bool>? std::span? Both? None? Or - maybe I'm reinventing the wheel and this is already a solved problem?
Notes:
The bit sequence length is known at run time, not compile time; otherwise, as @SomeProgrammerDude suggests, I could use std::bitset.
My class doesn't need to "be-a" span or "be-a" vector, so I'm not thinking of specializing any of them.
(*) - So far not SIMD-efficiently but that may come later. Also, this may be used in CUDA code where we don't SIMDize but pretend the lanes are proper threads.
Rather than std::vector or std::span I suspect an implementation of your class would share more in common with std::bitset, since it is pretty much the same thing, except with a (fixed) runtime-determined size.
In fact, you could probably take a typical std::bitset implementation and move the <size_t N> template parameter into the class as a size_t size_ member (or whatever name you like), and you'll have your dynamic bitset class with almost no changes. You may want to get rid of anything you consider cruft, like the constructors that take std::string and friends.
The last step is then to remove ownership of the underlying data: basically you'll remove the creation of the underlying array in the constructor and maintain a view of an existing array with some pointers.
If your clients disagree on what the underlying unsigned integer type to use for storage (what you call the "bit container"), then you may also need to make your class a template on this type, although it would be simpler if everyone agreed on say uint64_t.
As far as std::vector<bool> goes, you don't need much from that: everything that vector does that you want, std::bitset probably does too: the main thing that vector adds is dynamic growth - but you've said you don't want that. vector<bool> has the proxy object concept to represent a single bit, but so does std::bitset.
From std::span you take the idea of non-ownership of the underlying data, but I don't think this actually represents a lot of underlying code. You might want to consider the std::span approach of having either a compile-time known size or a runtime provided size (indicated by Extent == std::dynamic_extent) if that would be useful for you (mostly if you sometimes use compile-time sizes and could specialize some methods to be more efficient in that case).
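Putting the pieces above together, a minimal non-owning bit-span might look like this sketch: fixed length, no allocation, and a proxy reference for writes in the style of std::bitset/std::vector<bool>. The class name and the hard-coded uint32_t "bit container" are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal non-owning view over a sequence of bits stored in uint32_t words.
class bit_span {
public:
    // Proxy reference for a single bit, like vector<bool>/bitset expose.
    class reference {
    public:
        reference(std::uint32_t* word, std::uint32_t mask)
            : word_(word), mask_(mask) {}
        reference& operator=(bool b) {
            if (b) *word_ |= mask_; else *word_ &= ~mask_;
            return *this;
        }
        operator bool() const { return (*word_ & mask_) != 0; }
    private:
        std::uint32_t* word_;
        std::uint32_t  mask_;
    };

    // Non-owning: takes an existing word array and a length in bits.
    bit_span(std::uint32_t* data, std::size_t nbits)
        : data_(data), nbits_(nbits) {}

    std::size_t size() const { return nbits_; }

    reference operator[](std::size_t i) {
        return reference(data_ + i / 32, std::uint32_t{1} << (i % 32));
    }
    bool operator[](std::size_t i) const {
        return (data_[i / 32] >> (i % 32)) & 1u;
    }

private:
    std::uint32_t* data_;
    std::size_t    nbits_;
};
```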

Is there any environment where "int" would cause struct padding?

Specifically, this came up in a discussion:
Memory consumption wise, is there a possibility that using a struct of two ints takes more memory than just two ints?
Or, in language terms:
#include <iostream>

struct S { int a, b; };

int main() {
    std::cout << (sizeof(S) > sizeof(int) * 2 ? "bigger" : "the same") << std::endl;
}
Is there any reasonable1 (not necessarily common or current) environment where this small program would print bigger?
1To clarify, what I meant here is systems (and compilers) developed and produced in some meaningful quantity, and specifically not theoretical examples constructed just to prove the point, or one-off prototypes or hobbyist creations.
Is there any reasonable (not necessarily common or current) environment where this small program would print bigger?
Not that I know of. I know that's not completely reassuring, but I have reason to believe there is no such environment due to the requirements imposed by the C++ standard.
In a standard-compliant† compiler the following hold:
(1) arrays cannot have any padding between elements, due to the way they can be accessed with pointers [ref];
(2) standard-layout structs may or may not have padding after each member, but not at the beginning, because they are layout-compatible with "shorter"-but-equal standard-layout structs [ref];
(3) array elements and struct members are properly aligned [ref];
From (1) and (3), it follows that the alignment of a type is less than or equal to its size. Were it greater, an array would need to add padding to have all its elements aligned. For the same reason, the size of a type is always a whole multiple of its alignment.
This means that in a struct as the one given, the second member will always be properly aligned—whatever the size and alignment of ints—if placed right after the first member, i.e., no interstitial padding is required. Under this layout, the size of the struct is also already a multiple of its alignment, so no trailing padding is required either.
There is no standard-compliant set of (size, alignment) values that we can pick that makes this structure need any form of padding.
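The reasoning above can be checked at compile time. The first two assertions follow from (1) and (3); the last one is not literally mandated by the standard, but it is the conclusion of the argument and holds on every mainstream ABI:

```cpp
struct S { int a, b; };

// From (1) and (3): alignment never exceeds size, and size is a
// whole multiple of alignment.
static_assert(alignof(int) <= sizeof(int), "alignment <= size");
static_assert(sizeof(int) % alignof(int) == 0, "size is a multiple of alignment");

// The conclusion drawn above: no interstitial or trailing padding in S.
static_assert(sizeof(S) == 2 * sizeof(int), "no padding expected in S");
```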
Any such padding would then need a different purpose. However, such a purpose seems elusive. Suppose there is an environment that needs this padding for some reason. Whatever the reason for the padding is, it would likely‡ also apply in the case of arrays, but from (1) we know that it cannot.
But suppose such an environment truly exists and we want a C++ compiler for it. It could support this extra required padding in arrays by simply making ints larger by that much, i.e. by putting the padding inside the ints. This would in turn once more allow the struct to be the same size as two ints, and leave us without a reason to add padding.
† A compiler—even one otherwise not-standard-compliant—that gets any of these wrong is arguably buggy, so I'll ignore those.
‡ I guess that in an environment where arrays and structures are primitives there might be some underlying distinction that allows us to have unpadded arrays and padded structs, but again, I don't know of any such thing in use.
In your specific example, struct S { int a, b; };, I cannot see any reasonable argument for padding. int should be naturally aligned already, and if it is, int * can and should be the natural representation for pointers, and there is no need for S * to be any different. But in general:
A few rare systems have pointers with different representations, where e.g. int * is represented as just an integer representing a "word" address, and char * is a combination of a word address and a byte offset into that word (where the byte offset is stored in otherwise unneeded high bits of the word address). Dereferencing a char * happens in software by loading the word, and then masking and shifting to get the right byte.
On such implementations, it may make sense to ensure all structure types have a minimal alignment, even if it's not necessary for the structure's members, just so that that byte offset mess isn't necessary for pointers to that structure. Meaning it's reasonable that given struct S { char a, b; };, sizeof(S) > 2. Specifically, I'd expect sizeof(S) == sizeof(int).
I've never personally worked with such implementations, so I don't know if they do indeed produce such padding. But an implementation that does so would be reasonable, and at the very least very close to an existing real-world implementation.
I know this is not what you asked for, it's not in the spirit of your question (as you probably have standard layout classes in mind), but strictly answering just this part:
Memory consumption wise, is there a possibility that using a struct of two ints takes more memory than just two ints?
the answer is kinda... yes:
struct S
{
    int a;
    int b;

    virtual ~S() = default;
};
with the pedantic note that C++ doesn't have structs, it has classes. struct is a keyword that introduces the declaration/definition of a class.
It would not be totally implausible that a system which can only access memory in 64-bit chunks might have an option to use a 32-bit "int" size for compatibility with other programs that could get tripped up if uint32_t promotes to a larger type. On such a system, a struct with an even number of "int" values would likely not have extra padding, but one with an odd number of values might plausibly do so.
From a practical perspective, the only way a struct with two int values would need padding would be if the alignment of a struct was more than twice as coarse as that of "int". That would in turn require either that the alignment of structures be coarser than 64 bits, or that the size of int be smaller than 32 bits. The latter situation wouldn't be unusual in and of itself, but combining both in a fashion that would make struct alignment more than twice as coarse as int alignment would seem very weird.
Theoretically, padding exists to allow efficient memory access. If adding padding to a struct of two ints would increase efficiency on some machine, then yes, it could have padding. In practice, though, I have never come across a struct of two ints that has padding bits.

Can storing unrelated data in the least-significant-bit of a pointer work reliably?

Let me just say up front that I'm aware that what I'm about to propose is a mortal sin, and that I will probably burn in Programming Hell for even considering it.
That said, I'm still interested in knowing if there's any reason why this wouldn't work.
The situation is: I have a reference-counting smart-pointer class that I use everywhere. It currently looks something like this (note: incomplete/simplified pseudocode):
class IRefCountable
{
public:
    IRefCountable() : _refCount(0) {}
    virtual ~IRefCountable() {}

    void Ref() {_refCount++;}
    bool Unref() {return (--_refCount==0);}

private:
    unsigned int _refCount;
};

class Ref
{
public:
    Ref(IRefCountable * ptr, bool isObjectOnHeap) : _ptr(ptr), _isObjectOnHeap(isObjectOnHeap)
    {
        _ptr->Ref();
    }

    ~Ref()
    {
        if ((_ptr->Unref())&&(_isObjectOnHeap)) delete _ptr;
    }

private:
    IRefCountable * _ptr;
    bool _isObjectOnHeap;
};
Today I noticed that sizeof(Ref)=16. However, if I remove the boolean member variable _isObjectOnHeap, sizeof(Ref) is reduced to 8. That means that for every Ref in my program, there are 7.875 wasted bytes of RAM... and there are many, many Refs in my program.
Well, that seems like a waste of some RAM. But I really need that extra bit of information (okay, humor me and assume for the sake of the discussion that I really do). And I notice that since IRefCountable is a non-POD class, it will (presumably) always be allocated on a word-aligned memory address. Therefore, the least significant bit of (_ptr) should always be zero.
Which makes me wonder... is there any reason why I can't OR my one bit of boolean data into the least-significant bit of the pointer, and thus reduce sizeof(Ref) by half without sacrificing any functionality? I'd have to be careful to AND out that bit before dereferencing the pointer, of course, which would make pointer dereferences less efficient, but that might be made up for by the fact that the Refs are now smaller, and thus more of them can fit into the processor's cache at once, and so on.
Is this a reasonable thing to do? Or am I setting myself up for a world of hurt? And if the latter, how exactly would that hurt be visited upon me? (Note that this is code that needs to run correctly in all reasonably modern desktop environments, but it doesn't need to run in embedded machines or supercomputers or anything exotic like that)
If you want to use only the standard facilities and not rely on any implementation then with C++0x there are ways to express alignment (here is a recent question I answered). There's also std::uintptr_t to reliably get an unsigned integral type large enough to hold a pointer. Now the one thing guaranteed is that a conversion from the pointer type to std::[u]intptr_t and back to that same type yields the original pointer.
I suppose you could argue that if you can get back the original std::intptr_t (with masking), then you can get the original pointer. I don't know how solid this reasoning would be.
[edit: thinking about it there's no guarantee that an aligned pointer takes any particular form when converted to an integral type, e.g. one with some bits unset. probably too much of a stretch here]
The problem here is that it is entirely machine-dependent. It isn't something one often sees in C or C++ code, but it has certainly been done many times in assembly. Old Lisp interpreters almost always used this trick to store type information in the low bit(s). (I have seen it in C code too, but only in projects targeting one specific platform.)
Personally, if I were trying to write portable code, I probably wouldn't do this. The fact is that it will almost certainly work on "all reasonably modern desktop environments". (Certainly, it will work on every one I can think of.)
A lot depends on the nature of your code. If you are maintaining it, and nobody else will ever have to deal with the "world of hurt", then it might be ok. You will have to add ifdef's for any odd architecture that you might need to support later on. On the other hand, if you are releasing it to the world as "portable" code, that would be cause for concern.
Another way to handle this is to write two versions of your smart pointer, one for machines on which this will work and one for machines where it won't. That way, as long as you maintain both versions, it won't be that big a deal to change a config file to use the 16-byte version.
It goes without saying that you would have to avoid writing any other code that assumes sizeof(Ref) is 8 rather than 16. If you are using unit tests, run them with both versions.
Any reason? Unless things have changed in the standard lately, the value representation of a pointer is implementation-defined. It is certainly possible that some implementation somewhere may pull the same trick, defining these otherwise-unused low bits for its own purposes. It's even more possible that some implementation might use word-pointers rather than byte-pointers, so instead of two adjacent words being at "addresses" 0x8640 and 0x8642, they would be at "addresses" 0x4320 and 0x4321.
One tricky way around the problem would be to make Ref a (de facto) abstract class, and all instances would actually be instances of RefOnHeap and RefNotOnHeap. If there are that many Refs around, the extra space used to store the code and metadata for three classes rather than one would be made up by the space savings in having each Ref being half the size. (Won't work too well, the compiler can omit the vtable pointer if there are no virtual methods and introducing virtual methods will add the 4-or-8 bytes back to the class).
You always have at least a free bit to use in the pointer as long as
you're not pointing to arbitrary positions inside a struct or array with alignment of 1, or
the platform gives you a free bit
Since IRefCountable has an alignment of 4, you'll have 2 free bottom bits in IRefCountable* to use
Regarding the first point, storing data in the least significant bit is always reliable if the pointer is aligned to a power of 2 larger than 1. That means it'll work for everything apart from char*/bool* or a pointer to a struct containing all char/bool members, and obviously it'll work for IRefCountable* in your case. In C++11 you can use alignof or std::alignment_of to ensure that you have the required alignment like this
static_assert(alignof(Ref) > 1);
static_assert(alignof(IRefCountable) > 1);
// This check for power of 2 is likely redundant
static_assert((alignof(Ref) & (alignof(Ref) - 1)) == 0);
// Now IRefCountable* is always aligned,
// so its least significant bit can be used freely
Even if you have some object with only 1-byte alignment, for example if you change the _refCount in IRefCountable to uint8_t, then you can still enforce an alignment requirement with alignas, or with other extensions in older C++ like __declspec(align). Dynamically allocated memory is already aligned suitably for max_align_t, or you can use aligned_alloc() for stricter alignment
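The alignas trick just mentioned can be sketched like this (the struct name is made up for illustration): forcing 4-byte alignment on a 1-byte type guarantees two spare low bits in every pointer to it, at the cost of padding the type up to its alignment.

```cpp
#include <cstdint>

// A 1-byte payload, forced to 4-byte alignment so pointers to it
// always have two free low bits.
struct alignas(4) SmallRefCountable {
    std::uint8_t _refCount;  // the only real member is 1 byte...
};

static_assert(alignof(SmallRefCountable) == 4, "two spare low bits in pointers");
static_assert(sizeof(SmallRefCountable) == 4, "...but the type is padded to its alignment");
```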
My second bullet point means that if you really need to store arbitrary pointers to objects with true 1-byte alignment, then most of the time you can still exploit a free bit provided by the platform
On many 32-bit platforms the address space is split in half for user and kernel processes. User pointers will always have the most significant bit unset so you can use that to store data. Of course it won't work on platforms with more than 2GB of user address space, like when the split is 3/1 or 4/4
On 64-bit platforms, currently most have only 48-bit virtual addresses, and a few newer high-end CPUs may have 57-bit virtual addresses, which is still far from the full 64 bits. Therefore you'll have plenty of bits to spare. In practice this always works in personal computing, since you'll never be able to fill that vast address space
This is called tagged pointer
If the data is always heap-allocated then you can tell the OS to limit the range of address space to use to get more bits
For more information read Using the extra 16 bits in 64-bit pointers
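A tagged pointer along the lines discussed in this thread might be sketched as a small wrapper class. The name `TaggedPtr` is made up for illustration; the static_assert encodes the alignment precondition from the answers above.

```cpp
#include <cstdint>

// Stores one bool in the low bit of a sufficiently aligned pointer.
// Precondition (checked at compile time): alignof(T) > 1, so the
// low bit of any valid T* is known to be zero.
template <class T>
class TaggedPtr {
    static_assert(alignof(T) > 1, "need a spare low bit in T*");
public:
    TaggedPtr(T* p, bool flag)
        : bits_(reinterpret_cast<std::uintptr_t>(p)
                | static_cast<std::uintptr_t>(flag)) {}

    // AND out the tag bit before using the value as a pointer.
    T* get() const { return reinterpret_cast<T*>(bits_ & ~std::uintptr_t{1}); }
    bool flag() const { return (bits_ & 1u) != 0; }

private:
    std::uintptr_t bits_;
};
```

Note that, per the caveats above, this relies on the implementation-defined round-trip through std::uintptr_t behaving "sanely" (low bits of an aligned pointer being zero), which holds on mainstream desktop platforms but is not guaranteed by the standard.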
Yes, this can work reliably. This is, in fact, used by the Linux kernel as part of its red-black tree implementation. Instead of storing an extra boolean to indicate whether a node is red or black (which can take up quite a bit of additional space), the kernel uses the low-order bit of the parent node address.
From rbtree_types.h:
struct rb_node {
    unsigned long __rb_parent_color;
    struct rb_node *rb_right;
    struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));
The __rb_parent_color field stores both the address of the node's parent and the color of the node (in the least-significant bit).
Getting The Pointer
To retrieve the parent address from this field you just clear the lower order bits (this clears the lowest 2-bits).
From rbtree.h:
#define rb_parent(r) ((struct rb_node *)((r)->__rb_parent_color & ~3))
Getting The Boolean
To retrieve the color you just extract the lower bit and treat it like a boolean.
From rbtree_augmented.h:
#define __rb_color(pc) ((pc) & 1)
#define __rb_is_black(pc) __rb_color(pc)
#define __rb_is_red(pc) (!__rb_color(pc))
#define rb_color(rb) __rb_color((rb)->__rb_parent_color)
#define rb_is_red(rb) __rb_is_red((rb)->__rb_parent_color)
#define rb_is_black(rb) __rb_is_black((rb)->__rb_parent_color)
Setting The Pointer And Boolean
You set the pointer and boolean value using standard bit manipulation operations (making sure to preserve each part of the final value).
From rbtree_augmented.h:
static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
{
    rb->__rb_parent_color = rb_color(rb) | (unsigned long)p;
}

static inline void rb_set_parent_color(struct rb_node *rb,
                                       struct rb_node *p, int color)
{
    rb->__rb_parent_color = (unsigned long)p | color;
}
You can also clear the boolean value setting it to false via (unsigned long)p & ~1.
Even if this method works, there will always be some lingering uncertainty, because ultimately you are relying on internal architecture details that may or may not be portable.
On the other hand to solve this problem, if you want to avoid bool variable, I would suggest a simple constructor as,
Ref(IRefCountable * ptr) : _ptr(ptr)
{
    if (ptr != 0)
        _ptr->Ref();
}
From the code, I smell that the reference counting is needed only when the object is on heap. For automatic objects, you can simply pass 0 to the class Ref and put appropriate null checks in constructor/destructor.
Have you thought about an out of class storage ?
Depending on whether you have (or not) to worry about multi-threading and control the implementation of new/delete/malloc/free, it might be worth a try.
The point would be that instead of incrementing a counter local to the object, you would maintain a "counter" map (address --> count) that would simply ignore addresses outside the allocated heap area (the stack, for example).
It may seem silly (there is room for contention in MT), but it also plays rather nice with read-only since the object is not "modified" only for counting.
Of course, I have no idea of the performance you might hope to achieve with this :p

(C++) Looking for tips to reduce memory usage

I've got a problem with a really tight and tough memory limit. I'm a CPP geek and I want to reduce my memory usage. Please give me some tips.
One of my friends recommended to take functions inside my structs out of them.
for example instead of using:
struct node {
    int f() { /* ... */ }
};
he recommended me to use:
int f(node x) { /* ... */ }
does this really help?
Note: I have lots of copies of my struct.
here's some more information:
I'm coding some sort of segment tree for a practice problem on an online judge. I get tree nodes in a struct. my struct has these variables:
int start;
int end;
bool flag;
node* left;
node* right;
The memory limit is 16 MB and I'm using 16.38 MB.
I'm guessing by the subtext of your question that the majority of your memory usage is data, not code. Here are a couple of tips:
If your data ranges are limited, take advantage of it. If the range of an integer is -128 to 127, use char instead of int, or unsigned char if it's 0 to 255. Likewise use int16_t or uint16_t for ranges of -32768..32767 and 0..65535.
Rearrange the structure elements so the larger items come first, so that data alignment doesn't leave dead space in the middle of the structure. You can also usually control padding via compiler options, but it's better just to make the layout optimal in the first place.
Use containers that don't have a lot of overhead. Use vector instead of list, for example. Use boost::ptr_vector instead of std::vector containing shared_ptr.
Avoid virtual methods. The first virtual method you add to a struct or class adds a hidden pointer to a vtable.
No, regular member functions don't make the class or struct larger. Introducing a virtual function might (on many platforms) add a vtable pointer to the class. On x86 that would increase the size by four bytes. No more memory will be required as you add virtual functions, though -- one pointer is sufficient. The size of a class or struct type is never zero (regardless of whether it has any member variables or virtual functions). This is to make sure that each instance occupies its own memory space (source, section 9.0.3).
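The claims above can be verified directly with sizeof. The exact equality for the non-virtual case reflects typical ABIs rather than a standard guarantee, but it holds on mainstream compilers:

```cpp
struct Empty {};
struct WithMethods { int x; int f() { return x; } };
struct WithVirtual { int x; virtual ~WithVirtual() {} };

static_assert(sizeof(Empty) >= 1, "objects are never zero-sized");
// Non-virtual member functions add nothing to the object (typical ABIs):
static_assert(sizeof(WithMethods) == sizeof(int), "no per-object cost");
// The first virtual function adds one vtable pointer:
static_assert(sizeof(WithVirtual) > sizeof(int), "vtable pointer added");
```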
In my opinion, the best way to reduce memory is to consider your algorithm's space complexity instead of just doing fine-grained code optimizations. Reconsider things like dynamic programming tables, unnecessary copies, and generally anything that is questionable in terms of memory efficiency. Also, try to free memory resources early, whenever they are no longer needed.
For your final example (the tree), you can use a clever hack with XOR to replace the two node pointers with a single node pointer, as described here. This only works if you traverse the tree in the right order, however. Obviously this hurts code readability, so should be something of a last resort.
You could use compilation flags to do some optimization. If you are using g++ you could test with: -O2
There are great threads about the subject:
C++ Optimization Techniques
Should we still be optimizing "in the small"?
Constants and compiler optimization in C++
What are the known C/C++ optimizations for GCC
The two possibilities are not at all equivalent:
In the first, f() is a member function of node.
In the second, f() is a free (or namespace-scope) function. (Note also that the signatures of the two f()s are different.)
Now note that, in the first style, f() is an inline member function. Defining a member function inside the class body makes it inline. Inlining is not guaranteed, though; it is just a hint to the compiler. For functions with small bodies it may be good to inline them, as it avoids function call overhead. However, I have never seen that be a make-or-break factor.
If you do not want inlining, or if f() does not qualify for it, you should define it outside the class body (probably in a .cpp file) as:
int node::f() { /* stuff here */ }
If memory usage is a problem in your code, then most probably the above topics are not relevant. Exploring the following might give you some hint
Find the sizes of all classes in your program. Use sizeof to find this information, e.g. sizeof(node).
Find what is the maximum number of objects of each class that your program is creating.
Using the above two series of information, estimate worst case memory usage by your program
Worst case memory usage = n1 * sizeof( node1 ) + n2 * sizeof( node2 ) + ...
If the above number is too high, then you have the following choices:
Reduce the maximum number of instances of each class. This probably won't be possible, because it depends on the input to the program, which is beyond your control.
Reduce the size of each class. This is in your control, though.
How to reduce the size of a class? Try arranging the class members compactly to avoid padding.
As others have said, having methods doesn't increase the size of the struct unless one of them is virtual.
You can use bitfields to effectively compress the data (this is especially effective with your boolean...). Also, you can use indices instead of pointers, to save some bytes.
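Combining those two suggestions for the question's segment-tree node might look like this sketch. The names and the 31-bit assumption on `start` are illustrative, and the size check reflects a typical 64-bit ABI:

```cpp
#include <cstdint>

// 32-bit pool indices replace the two 8-byte pointers, and the bool
// is packed into a bitfield with `start` (assumes start fits in 31 bits).
struct node_compact {
    std::uint32_t start : 31;
    std::uint32_t flag  : 1;   // the bool now costs one bit, not 1+ bytes
    std::uint32_t end;
    std::int32_t  left;        // index into a node pool; -1 means "no child"
    std::int32_t  right;
};

// 16 bytes, versus 32 for the pointer-based struct on a typical 64-bit ABI.
static_assert(sizeof(node_compact) == 16, "half the original size");
```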
Remember to allocate your nodes in big chunks rather than individually (e.g., using new[] once, not regular new many times) to avoid memory management overhead.
If you don't need the full flexibility your node pointers provide, you may be able to reduce or eliminate them. For example, heapsort always has a near-full binary tree, so the standard implementation uses an implicit tree, which doesn't need any pointers at all.
Above all, finding a different algorithm may change the game completely...

Is this a safe way to implement a generic operator== and operator<?

After seeing this question, my first thought was that it'd be trivial to define generic equivalence and relational operators:
#include <cstring>
template<class T>
bool operator==(const T& a, const T& b) {
    return std::memcmp(&a, &b, sizeof(T)) == 0;
}

template<class T>
bool operator<(const T& a, const T& b) {
    return std::memcmp(&a, &b, sizeof(T)) < 0;
}
using namespace std::rel_ops would then become even more useful, since it would be made fully generic by the default implementations of operators == and <. Obviously this does not perform a memberwise comparison, but instead a bitwise one, as though the type contains only POD members. This is not entirely consistent with how C++ generates copy constructors, for instance, which do perform memberwise copying.
But I wonder whether the above implementation is indeed safe. The structures would naturally have the same packing, being of the same type, but are the contents of the padding guaranteed to be identical (e.g., filled with zeros)? Are there any reasons why or situations in which this wouldn't work?
No -- just for example, if you have T==(float | double | long double), your operator== doesn't work right. Two NaNs should never compare as equal, even if they have the identical bit pattern (in fact, one common method of detecting a NaN is to compare the number to itself -- if it's not equal to itself, it's a NaN). Conversely, +0.0 and -0.0 compare numerically equal even though they differ in the sign bit, so a bitwise comparison gets that case wrong in the other direction.
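Both failure modes are easy to demonstrate with a small helper that compares object representations the way the proposed `operator==` does:

```cpp
#include <cmath>
#include <cstring>

// Compares the raw object representations, exactly like the proposed
// memcmp-based operator==. This disagrees with numeric comparison:
//  - a NaN has the same bits as itself, yet is never numerically
//    equal to itself;
//  - +0.0 and -0.0 are numerically equal, yet differ in the sign bit.
bool bitwise_equal(double a, double b) {
    return std::memcmp(&a, &b, sizeof(double)) == 0;
}
```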
Your operator< has even less chance of working correctly. For example, consider a typical implementation of std::string that looks something like this:
template <class charT>
class string {
    charT *data;
    size_t length;
    size_t buffer_size;
public:
    // ...
};
With this ordering of the members, your operator< will do its comparison based on the addresses of the buffers where the strings happen to have stored their data. If, for example, it happened to have been written with the length member first, your comparison would use the lengths of the strings as the primary keys. In any case, it won't do a comparison based on the actual string contents, because it will only ever look at the value of the data pointer, not whatever it points at, which is what you really want/need.
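The memberwise alternative is straightforward to write by hand with `std::tie`, which compares tuples of references lexicographically. Here is a sketch using a hypothetical record type:

```cpp
#include <string>
#include <tuple>

// Hypothetical record type for illustration.
struct Record {
    std::string name;
    int id;
};

// Memberwise comparison via std::tie: the string is compared by its
// contents, not by whatever its internal data pointer happens to hold.
bool operator==(const Record& a, const Record& b) {
    return std::tie(a.name, a.id) == std::tie(b.name, b.id);
}

bool operator<(const Record& a, const Record& b) {
    return std::tie(a.name, a.id) < std::tie(b.name, b.id);
}
```

Unlike the memcmp version, this also ignores padding bytes entirely, since only named members participate.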
Edit: As far as padding goes, there's no requirement that the contents of the padding bytes be equal. It's also theoretically possible for padding to hold some sort of trap representation that causes a signal, throws an exception, or something on that order if you try to look at it at all. To avoid trap representations, you need to use something like a cast to look at the object as a buffer of unsigned chars. memcmp might do that, but then again it might not...
Also note that two objects being of the same type does not necessarily mean they use the same alignment of members. That's a common method of implementation, but it's also entirely possible for a compiler to do something like using different alignments based on how often it "thinks" a particular object will be used, and include a tag of some sort in the object (e.g., a value written into the first padding byte) that records the alignment of that particular instance. Likewise, it could segregate objects by (for example) address, so an object located at an even address has 2-byte alignment, one at an address that's a multiple of four has 4-byte alignment, and so on (this can't be done for POD types, but otherwise, all bets are off).
Neither of these is likely or common, but offhand I can't think of anything in the standard that prohibits them either.
Never do this unless you're 100% sure about the memory layout and compiler behavior, you really don't care about portability, and you really want the efficiency gain.
Even for POD, the == operator can be wrong. This is due to the padding that member alignment introduces in structures like the following one, which takes 8 bytes with my compiler.
class Foo {
    char foo; // three padding bytes between foo and bar
    int bar;
};
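You can see the hole directly with `sizeof` and `offsetof`; the exact numbers below assume a typical ABI where `int` is 4 bytes and 4-byte aligned:

```cpp
#include <cstddef>

// The char occupies offset 0; padding follows so that the int lands on
// its alignment boundary. On a typical ABI: offsetof(Foo, bar) == 4 and
// sizeof(Foo) == 8, even though only 5 bytes hold named members.
struct Foo {
    char foo;
    int  bar;
};
```

Those three unnamed bytes are exactly what memcmp reads but no constructor or assignment is required to write, which is the source of the non-determinism.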
That's highly dangerous because the compiler will use these definitions not only for plain old structs, but also for any classes, however complex, for which you forgot to define == and < properly.
One day, it will bite you.
A lot can depend on your definition of equivalence.
e.g. if any of the members that you are comparing within your classes are floating point numbers.
The above implementation may treat two doubles as unequal even though they came from the same mathematical calculation with the same inputs, because the two computations may not have produced exactly the same output - just two very similar numbers.
Typically such numbers should be compared numerically with an appropriate tolerance.
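A common shape for such a comparison is a relative tolerance that falls back to an absolute one near zero; the epsilon value below is an illustrative choice, not a standard one:

```cpp
#include <algorithm>
#include <cmath>

// Treats two doubles as equal when they differ by at most a relative
// epsilon (absolute near zero). The default tolerance is illustrative.
bool almost_equal(double a, double b, double eps = 1e-9) {
    return std::fabs(a - b) <= eps * std::max({1.0, std::fabs(a), std::fabs(b)});
}
```

With this, `0.1 + 0.2` and `0.3` compare equal even though their bit patterns (and exact values) differ.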
Any struct or class containing even a single pointer will instantly fail any sort of meaningful comparison. Those operators can only work for classes that are plain old data (POD) - and, as another answerer correctly pointed out, not even reliably then, because of floating-point members and padding bytes.
Short answer: If this was a smart idea, the language would have it like default copy constructors/assignment operators.