Compare two somewhat large objects for equality - C++

I have to compare two larger objects for equality.
Properties of the objects:
Contain all their members by value (so no pointers to follow).
They also contain some std::arrays.
They contain some other objects for which 1 and 2 hold.
Size is up to several kB.
Some of the members are more likely to differ than others, so comparing them first lets the comparison bail out sooner.
The objects do not change. Basically, the algorithm is just to count how many objects are the same. Each object is only compared once against several "master" objects.
What is the best way to compare these objects? I see three options:
Just use plain, non-overloaded operator==.
Overload == and perform a member-by-member comparison, beginning with members likely to differ.
Overload == and view the object as a plain byte field and compare word by word.
Some thoughts:
Option 1 seems good because it means the least amount of work (and opportunities to introduce errors).
Option 2 seems good, because I can exploit the heuristic about which elements are most likely to differ. But maybe it's still slower because the built-in == of option 1 is ridiculously fast.
Option 3 seems to be most "low-level" optimized, but that's what the compiler probably also does for option 1.
So the questions are:
Is there a well-known best way to solve the task?
Is one of the options an absolute no-go?
Do I have to consider something else?

The default == is fast for small objects, but if you have big data members to compare, try to find some optimizations by thinking about the specific data stored and the way it is updated, redefining an overloaded == comparison operator that is smarter than the default one.
As many have already said, option 3 is wrong, because fields are generally padded to respect data alignment, and for optimization reasons the padding bytes are not initialized to 0 (maybe this is done in the DEBUG version).
I can suggest exploring the option of dividing the check into two stages:
first stage: create some sort of small and fast member that "compresses" the status of the instance (think of it like a hash); this field could be updated every time some big field changes, for example the elements of the std::array. Then check the frequently changing fields and this "compressed" status first, to make a conservative comparison (for example, the sum of all ints in the array, or maybe the xor)
second stage: use an in-depth test for every member. This is the slowest but complete check, and it will likely be reached only sometimes
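A minimal sketch of that two-stage idea, with invented names and an FNV-style mix standing in for whatever digest you choose; equal summaries do not prove equality, but different summaries do prove inequality:
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Big {
    int id;                        // small field that changes often
    std::array<double, 512> data;  // large field that rarely differs
    std::uint64_t summary = 0;     // cached digest of data

    void setData(std::size_t i, double v) {
        data[i] = v;
        recomputeSummary();        // could also be updated incrementally
    }

    void recomputeSummary() {
        std::uint64_t h = 14695981039346656037ULL;   // FNV-1a style mix
        for (double d : data) {
            std::uint64_t bits = 0;
            std::memcpy(&bits, &d, sizeof bits);     // assumes 64-bit double
            h = (h ^ bits) * 1099511628211ULL;
        }
        summary = h;
    }
};

bool operator==(const Big& a, const Big& b) {
    if (a.id != b.id) return false;            // cheap and likely to differ
    if (a.summary != b.summary) return false;  // quick reject on the digest
    return a.data == b.data;                   // full check, rarely reached
}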

A good question.
If you have some heuristic about which members are likely to differ - use it. Overloading operator == and checking the suspected members first seems to be a good idea.
About byte-wise comparison (aka memcmp and friends) - it may be problematic due to struct member alignment. I.e., the compiler sometimes puts "empty spaces" in your struct layout so that each member has the required alignment. Those are not initialized and usually contain garbage.
This may be solved by explicitly zero-initializing your whole object. But I don't see any advantage of memcmp over an automatic operator ==, which is a member-wise comparison. It might save some code size (a single call to memcmp vs explicit reads and comparisons), but from a performance perspective this seems to be pretty much the same.
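For reference, a small sketch of option 2 with made-up member names, ordering the comparisons from most to least likely to differ so a mismatch short-circuits early:
#include <array>

struct Inner {
    int tag;
    std::array<char, 64> payload;

    friend bool operator==(const Inner& a, const Inner& b) {
        return a.tag == b.tag && a.payload == b.payload;
    }
};

struct Object {
    int version;                       // heuristic: differs most often
    Inner header;
    std::array<double, 1024> samples;  // large, usually identical

    friend bool operator==(const Object& a, const Object& b) {
        return a.version == b.version     // cheapest / most discriminating first
            && a.header == b.header
            && a.samples == b.samples;    // expensive check last
    }
};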

Is it a good idea to base a non-owning bit container on std::vector<bool>? std::span?

In a couple of projects of mine I have had an increasing need to deal with contiguous sequences of bits in memory - efficiently (*). So far I've written a bunch of inline-able standalone functions, templated on the choice of a "bit container" type (e.g. uint32_t), for getting and setting bits, applying 'or' and 'and' to their values, locating the container, converting lengths in bits to sizes in bytes or lengths in containers, etc. ... it looks like it's class-writing time.
I know the C++ standard library has a specialization of std::vector<bool>, which is considered by many to be a design flaw - as its iterators do not expose actual bools, but rather proxy objects. Whether that's a good idea or a bad one for a specialization, it's definitely something I'm considering - an explicit bit proxy class, which will hopefully "always" be optimized away (with a nice greasing-up with constexpr, noexcept and inline). So, I was thinking of possibly adapting std::vector code from one of the standard library implementations.
On the other hand, my intended class:
Will never own the data / the bits - it'll receive a starting bit container address (assuming alignment) and a length in bits, and won't allocate or free.
It will not be able to resize the data, dynamically or otherwise - not even while retaining the same amount of space like std::vector::resize(); its length will be fixed during its lifespan/scope.
It shouldn't know anything about the heap (and should work when there is no heap)
In this sense, it's more like a span class for bits. So maybe start out with a span then? I don't know, spans are still not standard; and there are no proxies in spans...
So what would be a good basis (edit: NOT a base class) for my implementation? std::vector<bool>? std::span? Both? None? Or - maybe I'm reinventing the wheel and this is already a solved problem?
Notes:
The bit sequence length is known at run time, not compile time; otherwise, as @SomeProgrammerDude suggests I could use std::bitset.
My class doesn't need to "be-a" span or "be-a" vector, so I'm not thinking of specializing any of them.
(*) - So far not SIMD-efficiently but that may come later. Also, this may be used in CUDA code where we don't SIMDize but pretend the lanes are proper threads.
Rather than std::vector or std::span I suspect an implementation of your class would share more in common with std::bitset, since it is pretty much the same thing, except with a (fixed) runtime-determined size.
In fact, you could probably take a typical std::bitset implementation and move the <size_t N> template parameter into the class as a size_t size_ member (or whatever name you like) and you'll have your dynamic bitset class with almost no changes. You may want to get rid of anything you consider cruft, like the constructors that take std::string and friends.
The last step is then to remove ownership of the underlying data: basically you'll remove the creation of the underlying array in the constructor and maintain a view of an existing array with some pointers.
If your clients disagree on what the underlying unsigned integer type to use for storage (what you call the "bit container"), then you may also need to make your class a template on this type, although it would be simpler if everyone agreed on say uint64_t.
As far as std::vector<bool> goes, you don't need much from that: everything that vector does that you want, std::bitset probably does too: the main thing that vector adds is dynamic growth - but you've said you don't want that. vector<bool> has the proxy object concept to represent a single bit, but so does std::bitset.
From std::span you take the idea of non-ownership of the underlying data, but I don't think this actually represents a lot of underlying code. You might want to consider the std::span approach of having either a compile-time known size or a runtime provided size (indicated by Extent == std::dynamic_extent) if that would be useful for you (mostly if you sometimes use compile-time sizes and could specialize some methods to be more efficient in that case).
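If it helps, here is a rough sketch of such a non-owning view, assuming uint64_t "bit containers"; the class and proxy names (bit_span, reference) are invented for illustration:
#include <cstddef>
#include <cstdint>

class bit_span {
public:
    using word_type = std::uint64_t;
    static constexpr std::size_t bits_per_word = 64;

    // A proxy standing in for a single bit, like vector<bool>/bitset use.
    class reference {
    public:
        reference(word_type* w, std::size_t bit) : word_(w), mask_(word_type{1} << bit) {}
        reference& operator=(bool b) {
            if (b) *word_ |= mask_; else *word_ &= ~mask_;
            return *this;
        }
        operator bool() const { return (*word_ & mask_) != 0; }
    private:
        word_type* word_;
        word_type  mask_;
    };

    bit_span(word_type* data, std::size_t size_in_bits)
        : data_(data), size_(size_in_bits) {}

    std::size_t size() const { return size_; }

    bool operator[](std::size_t i) const {
        return (data_[i / bits_per_word] >> (i % bits_per_word)) & 1u;
    }
    reference operator[](std::size_t i) {
        return reference(data_ + i / bits_per_word, i % bits_per_word);
    }

private:
    word_type*  data_;  // not owned: no allocation, no deallocation
    std::size_t size_;  // fixed for the lifetime of the view
};

// Usage: view 192 bits living in existing storage.
// std::uint64_t storage[3] = {};
// bit_span bits(storage, 192);
// bits[5] = true;
// bool b = bits[5];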

What is the rationale for limitations on pointer arithmetic or comparison?

In C/C++, addition or subtraction on a pointer is defined only if the resulting pointer lies within the original pointed-to complete object. Moreover, comparison of two pointers can only be performed if the two pointed-to objects are subobjects of a unique complete object.
What are the reasons of such limitations?
I supposed that the segmented memory model (see here §1.2.1) could be one of the reasons, but since compilers can actually define a total order on all pointers, as demonstrated by this answer, I am doubting this.
The reason is to keep the possibility of generating reasonable code. This applies to systems with a flat memory model as well as to systems with more complex memory models. If you forbid the (not very useful) corner cases, like adding or subtracting out of arrays or demanding a total order on pointers into different objects, you can skip a lot of overhead in the generated code.
The limitations imposed by the standard allow the compiler to make assumptions about pointer arithmetic and use them to improve the quality of the code. This covers both computing things statically in the compiler instead of at runtime and choosing which instructions and addressing modes to use. As an example, consider a program with two pointers p1 and p2. If the compiler can derive that they point to different data objects, it can safely assume that no operation based on following p1 will ever affect the object pointed to by p2. This allows the compiler to reorder loads and stores based on p1 without considering loads and stores based on p2, and the other way around.
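To make the rules concrete, here is a small illustration (the arrays and variables are mine); note that the std::less specialization for pointer types is required to yield a strict total order even where a raw < comparison is not usable:
#include <functional>  // std::less

int a[4];
int b[4];

void pointer_rules_demo() {
    int* p1 = a + 1;
    int* p2 = b + 1;

    bool same_object = (a < a + 4);         // fine: both point into the same array
    // bool cross = (p1 < p2);              // unspecified in C++ (undefined in C):
                                            // the pointers point into unrelated objects

    bool total = std::less<int*>{}(p1, p2); // std::less gives a total order over pointers
    (void)same_object; (void)total;
}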
There are architectures where program and data spaces are separated, and it's simply impossible to subtract two arbitrary pointers. A pointer to a function or to const static data will be in a completely different address space than a normal variable.
Even if you arbitrarily imposed a ranking between different address spaces, there's a possibility that the ptrdiff_t type would need to be a larger size, and the process of comparing or subtracting two pointers would be greatly complicated. That's a bad idea in a language that is designed for speed.
You only prove that the restriction could be removed - but you miss that it would come with a cost (in terms of memory and code), which was contrary to the goals of C.
Specifically the difference needs to have a type, which is ptrdiff_t, and one would assume it is similar to size_t.
In a segmented memory model you (normally) indirectly have a limitation on the sizes of objects - assuming that the answers in: What's the real size of `size_t`, `uintptr_t`, `intptr_t` and `ptrdiff_t` type on 16-bit systems using segmented addressing mode? are correct.
Thus, at least for differences, removing that restriction would not only add extra instructions to ensure a total order - for an unimportant corner case (as in the other answer) - but would also double the amount of memory spent on differences etc.
C was designed to be more minimalistic and not to force the compiler to spend memory and code on such cases. (In those days memory limitations mattered more.)
Obviously there are also other benefits - like the possibility of detecting errors when mixing pointers from different arrays. Similarly, mixing iterators of two different containers is undefined in C++ (with some minor exceptions) - and some debug implementations detect such errors.
The rationale is that some architectures have segmented memory, and pointers to different objects may point at different memory segments. The difference between the two pointers would then not necessarily be something meaningful.
This goes back all the way to pre-standard C. The C rationale doesn't mention this explicitly, but it hints at this being the reason, if we look at where it explains why using a negative array index is undefined behavior (C99 rationale 5.10 6.5.6, emphasis mine):
In the case of p-1, on the other hand, an entire object would have to be allocated prior to the array of objects that p traverses, so decrement loops that run off the bottom of an array can fail. This restriction allows segmented architectures, for instance, to place objects at the start of a range of addressable memory.
Since the C standard intends to cover the majority of processor architectures, it should also cover this one:
Imagine an architecture (I know one, but won't name it) where pointers are not just plain numbers, but are like structures or "descriptors". Such a structure contains information about the object it points into (its virtual address and size) and the offset within it. Adding to or subtracting from a pointer produces a new structure with only the offset field adjusted; producing a structure with an offset greater than the size of the object is prohibited by the hardware. There are other restrictions (such as how the initial descriptor is produced or what the other ways of modifying it are), but they are not relevant to the topic.
In most cases where the Standard classifies an action as invoking Undefined Behavior, it has done so because:
1. There might be platforms where defining the behavior would be expensive. Segmented architectures could behave weirdly if code tries to do pointer arithmetic that extends beyond object boundaries, and some compilers may evaluate p > q by testing the sign of q-p.
2. There are some kinds of programming where defining the behavior would be useless. Many kinds of code can get by just fine without relying upon forms of pointer addition, subtraction, or relational comparison beyond those given by the Standard.
3. People writing compilers for various purposes should be capable of recognizing cases where quality compilers intended for such purposes should behave predictably, and handling such cases when appropriate, whether or not the Standard compels them to do so.
Both #1 and #2 are very low bars, and #3 was thought to be a "gimme". Although it has become fashionable for compiler writers to show off their cleverness by finding ways of breaking code whose behavior was defined by quality implementations intended for low-level programming, I don't think the authors of the Standard expected compiler writers to perceive a huge difference between actions which were required to behave predictably, versus those where nearly all quality implementations were expected to behave identically, but where it might conceivably be useful to let some arcane implementations do something else.
I would like to answer this by inverting the question. Instead of asking why pointer addition and most of the arithmetic operations are not allowed, ask why pointers allow only adding or subtracting an integer, post- and pre-increment and decrement, and comparison (or subtraction) of pointers pointing into the same array. It has to do with the logical consequence of the arithmetic operation.
Adding/subtracting an integer n to/from a pointer p gives me the address of the nth element from the currently pointed-to element, in either the forward or reverse direction. Similarly, subtracting p1 and p2 pointing into the same array gives me the count of elements between the two pointers.
The fact (or design decision) that the pointer arithmetic operations are defined consistently with the type of the variable being pointed to is a real stroke of genius. Any operation other than the permitted ones defies programming or philosophically logical reasoning and is therefore not allowed.
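A tiny illustration of that consistency (the array and pointer names are arbitrary); the arithmetic is scaled by the element type, and stepping outside the array is what the standard forbids:
#include <cstddef>

void pointer_arithmetic_demo() {
    int a[10] = {};

    int* p = a + 3;                // address of a[3]: 3 elements past a[0]
    int* q = p + 4;                // address of a[7]: moves 4 * sizeof(int) bytes
    std::ptrdiff_t n = q - p;      // 4: the number of elements between them

    int* end = a + 10;             // one past the end: still a valid pointer value
    // int* bad = a - 1;           // undefined behavior: before the start of the array
    (void)n; (void)end;
}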

How many arguments can theoretically be passed as parameters in c++ functions?

I was wondering if there was a limit on the number of parameters you can pass to a function.
I'm just wondering because I have to maintain functions of 5+ arguments here at my job.
And is there a critical threshold in nbArguments, talking about performance, or is it linear?
Neither the C nor C++ standard places an absolute requirement on the number of arguments/parameters you must be able to pass when calling a function, but the C standard suggests that an implementation should support at least 127 parameters/arguments (§5.2.4.1/1), and the C++ standard suggests that it should support at least 256 parameters/arguments (§B/2).
The precise wording from the C standard is:
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits.
So, one such function must be successfully translated, but there's no guarantee that compilation will succeed if your code attempts to do the same (though it probably will, in a modern implementation).
The C++ standard doesn't even go that far, only going so far as to say that:
The bracketed number following each quantity is recommended as the minimum for that quantity. However, these quantities are only guidelines and do not determine compliance.
As far as what's advisable: it depends. A few functions (especially those using variadic parameters/variadic templates) accept an arbitrary number of arguments of (more or less) arbitrary types. In this case, passing a relatively large number of parameters can make sense because each is more or less independent from the others (e.g., printing a list of items).
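For example, a C++17 sketch of such a variadic function (the name print_all is made up); each argument is independent of the others, so a long argument list is natural here:
#include <iostream>

template <typename... Args>
void print_all(const Args&... args) {
    ((std::cout << args << ' '), ...);  // fold expression over all arguments
    std::cout << '\n';
}

// print_all(1, 2.5, "three", 'x');  // any number of (printable) arguments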
When the parameters are more...interdependent, so you're not just passing a list or something on that order, I agree that the number should be considerably more limited. In C, I've seen a few go as high as 10 or so without being terribly unwieldy, but that's definitely starting to push the limit even at best. In C++, it's generally enough easier (and more common) to aggregate related items into a struct or class that I can't quite imagine that many parameters unless it was in a C-compatibility layer or something on that order, where a more...structured approach might force even more work on the user.
In the end, it comes down to this: you're going to either have to pass a smaller number of items that are individually larger, or else break the function call up into multiple calls, passing a smaller number of parameters to each.
The latter can tend to lead toward a stateful interface, that basically forces a number of calls in a more or less fixed order. You've reduced the complexity of a single call, but may easily have done little or nothing to reduce the overall complexity of the code.
In the other direction, a large number of parameters may well mean that you've really defined the function to carry out a large number of related tasks instead of one clearly defined task. In this case, finding more specific tasks for individual functions to carry out, and passing a smaller set of parameters needed by each may well reduce the overall complexity of the code.
It seems like you're veering into subjective territory, considering that C varargs are (usually) passed mechanically the same way as other arguments.
The first few arguments are placed in CPU registers, under most ABIs. How many depends on the number of architectural registers; it may vary from two to ten. In C++, empty classes (such as overload dispatch tags) are usually omitted entirely. Loading data into registers is usually "cheap as free."
After registers, arguments are copied onto the stack. You could say this takes linear time, but such operations are not all created equal. If you are going to be calling a series of functions on the same arguments, you might consider packaging them together as a struct and passing that by reference.
To literally answer your question, the maximum number of arguments is an implementation-defined quantity, meaning that the ISO standard requires your compiler manual to document it. The C++ standard also recommends (Annex B) that no implementation balk at fewer than 256 arguments, which should be Enough For Anyone™. C requires (§5.2.4.1) support for at least 127 arguments, although that requirement is normatively qualified in a way that weakens it to only a recommendation.
It is not really dirty; sometimes you can't avoid using 4+ arguments while maintaining stability and efficiency. If possible, the number should be minimized for the sake of clarity (perhaps by use of structs), especially if you think that some function is becoming a god construct (a function that runs most of the program; these should be avoided for the sake of stability). If that is the case, functions that take large numbers of arguments are pretty good indicators of such constructs.
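As several of the answers suggest, related parameters can be grouped into a struct and passed by reference; a minimal sketch with invented names (RenderSettings, render):
struct RenderSettings {
    int    width  = 0;
    int    height = 0;
    bool   vsync  = true;
    double gamma  = 2.2;
};

// Instead of render(int width, int height, bool vsync, double gamma):
void render(const RenderSettings& s) {
    (void)s;  // real work would go here
}

void caller() {
    RenderSettings s;
    s.width  = 1920;
    s.height = 1080;
    render(s);  // one reference passed, regardless of how many settings exist
}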

Does it make sense to verify if values are different in a setter

I remember I saw somewhere (probably on GitHub) an example like this in a setter:
void MyClass::setValue(int newValue)
{
    if (value != newValue) {
        value = newValue;
    }
}
For me it doesn't make a lot of sense, but I wonder if it gives any performance improvement.
It makes no sense for scalar types, but it may make sense for some user-defined types (since a type can be really "big", or its assignment operator can do some "hard" work).
The deeper the instruction pipeline (and it only gets deeper and deeper, on Intel platforms at least), the higher the cost of a branch misprediction.
When a branch mispredicts, some instructions from the mispredicted path still move through the pipeline. All work performed on these instructions is wasted since they would not have been executed had the branch been correctly predicted.
So yes, adding an if in the code can actually hurt performance. The write would be L1-cached, possibly for a long time. If the write has to be visible then the operation would have to be interlocked to start with.
The only way you can really tell is by actually testing the different alternatives (benchmarking and/or profiling the code). Different compilers, different processors and different calling code will make a big difference.
In general, and for "simple" data types (int, double, char, pointers, etc.), it won't make sense. It will just make the code longer and more complex for the processor [at least if the compiler does what you ask of it - it may realize that "this doesn't make any sense, let's remove this check" - but I wouldn't rely on that; compilers are often smarter than you, but making life more difficult for the compiler almost never leads to better code].
Edit: Additionally, it only makes GOOD sense to compare things that can be easily compared. If it's difficult to compare the data in the case where they are equal (for example, long strings take a lot of reads from both strings if they are equal, or if the strings begin the same and only differ in the last few characters), there is very little saving. The same applies to a class with a bunch of members that are often almost all the same, but where one or two fields are not, and so on. On the other hand, if you have a "customer data" class that has an integer customer ID that must be unique, then comparing just the customer ID will be "cheap", but copying the customer name, address, phone number(s), and other data on the customer will be expensive. [Of course, in that case, why is it not a (smart) pointer or reference?] End Edit.
If the data is "shared" between different processors (multiple threads accessing the same data), then it may help a little bit [in particular if this value is often read, and often written with the same value as before]. This is because "kicking out" the old value from the other processors' caches is expensive, and you only want to do that if you ACTUALLY change something.
And of course, it only makes ANY sense to worry about performance when you are working on code that you know is absolutely on the bleeding edge of the performance hot path. Anywhere else, making the code as readable, clear and concise as possible is always the best choice - this will also, typically, make the compiler better able to determine what is actually going on and ensure the best optimization results.
This pattern is common in Qt, where the API is highly based on signals & slots. This pattern helps to avoid infinite looping in the case of cyclic connections.
In your case, where signals aren't present, this code only kills performance, as pointed out by @remus-rusanu and @mats-petersson.
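For context, a hedged sketch of the Qt-style setter that answer refers to (class and signal names are invented; it assumes a Qt build with moc). The early return skips both the redundant assignment and, more importantly, the signal emission, which is what breaks A -> B -> A connection cycles:
#include <QObject>

class Dial : public QObject {
    Q_OBJECT
public:
    int value() const { return m_value; }

public slots:
    void setValue(int newValue) {
        if (m_value == newValue)
            return;                  // no change: no assignment, no signal
        m_value = newValue;
        emit valueChanged(m_value);  // notify listeners only on real changes
    }

signals:
    void valueChanged(int value);

private:
    int m_value = 0;
};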

Use bit arrays?

Imagine there's a fixed and constant set of 'options' (e.g. skills). Every object (e.g. human) can either have or not have any of the options.
Should I maintain a member list-of-options for every object and fill it with options?
OR:
Is it more efficient (faster) to use a bitarray where each bit represents the respective option's taken (or not taken) status?
-edited:-
To be more specific, the list of skills is a vector of strings (option names), definitely shorter than 256.
The target is for the program to be AS FAST as possible (no memory concerns).
That rather depends. If the number of options is small, then use several bool members to represent them. If the list grows large, then both your options become viable:
a bitset (with an appropriate enum to symbolically represent the options) takes a constant, and very small, amount of space, and getting a certain option takes O(1) time;
a list of options, or rather an std::set or unordered_set of them, might be more space-efficient, but only if the number of options is huge, and it is expected that a very small number of them will be set per object.
When in doubt, use either a bunch of bool members, or a bitset. Only if profiling shows that storing options becomes a burden, consider a dynamic list or set representation (and even then, you might want to reconsider your design).
Edit: with fewer than 256 options, a bitset would take at most 32 bytes, which will definitely beat any list or set representation in terms of memory and likely speed. A bunch of bools, or even an array of unsigned char, might still be faster because accessing a byte is commonly faster than accessing a bit. But copying the structure will be slower, so try several options and measure the result. YMMV.
Using a bit array is faster when testing for the presence of multiple skills in a person in a single operation.
If you use a list of options then you'll have to go over the list one item at a time to find whether a skill set exists, which would obviously take more time and require many comparison operations.
The bit array will generally be faster to edit and faster to search. As for the space required, just do the math. A list of options requires a dynamically sized array (which carries some overhead beyond the set of options itself); but if there are a large number of possible options, the list may be smaller if (typically) only a small number of options are set per object.
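To illustrate the bitset route, a small sketch with made-up skill names; the mask test checks several skills in a single AND-and-compare, which is the "single operation" advantage mentioned above:
#include <bitset>

enum Skill { Cooking, Driving, Swimming, Coding, SkillCount };

using Skills = std::bitset<SkillCount>;

bool has_all(const Skills& person, const Skills& required) {
    return (person & required) == required;   // one AND plus one compare
}

void demo() {
    Skills alice;
    alice.set(Cooking);
    alice.set(Coding);

    Skills wanted;
    wanted.set(Coding);

    bool ok = has_all(alice, wanted);  // true: alice has every required skill
    (void)ok;
}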