C++ test to verify equality operator is kept consistent with struct over time

I upvoted Tomalak Geretkal for a good note about by-contract; I haven't accepted an answer, as my question is how to programmatically check the equality function.
I have a POD struct & an equality operator, a (very) small part of a system with >100 engineers.
Over time I expect the struct to be modified (members added/removed/reordered) and I want to write a test to verify that the equality op is testing every member of the struct (eg is kept up to date as the struct changes).
As Tomalak pointed out - comments & "by contract" is often the best/only way to enforce this; however in my situation I expect issues and want to explore whether there are any ways to proactively catch (at least many) of the modifications.
I'm not coming up with a satisfactory answer - this is the best I've thought of:
-new up two instances of the struct (x, y), fill each with identical non-zero data
-check x==y
-modify x "byte by byte":
-take a ptr as (unsigned char*)&x
-iterate over ptr (for sizeof(x)):
-increment the current byte
-check !(x==y)
-decrement the current byte
-check x==y
The test passes if the equality operator caught every byte. (NOTE: there is a caveat to this: not all bytes are used in the compiler's representation of x, so the test would have to skip these bytes, e.g. via a hard-coded ignore list.)
My proposed test has significant problems: (at least) the 'don't care' bytes, and the fact that incrementing one byte of the types in x may not result in a valid value for the variable at that memory location.
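The steps above can be sketched as follows. The two-int Pod is a hypothetical, padding-free stand-in for the real struct; with a struct that has padding, the "don't care" bytes would have to be skipped, as noted.

```cpp
#include <cstddef>

// Hypothetical padding-free POD standing in for the real struct.
struct Pod {
    int a;
    int b;
    bool operator==(const Pod& rhs) const {
        return a == rhs.a && b == rhs.b;
    }
};

// Returns true if operator== notices a change to every byte of the object.
bool equalityCoversEveryByte() {
    Pod x = {1, 2};
    Pod y = {1, 2};
    unsigned char* p = reinterpret_cast<unsigned char*>(&x);
    for (std::size_t i = 0; i < sizeof(Pod); ++i) {
        ++p[i];                        // perturb one byte of x
        if (x == y) return false;      // operator== missed this byte
        --p[i];                        // restore the original value
        if (!(x == y)) return false;   // sanity check: x equals y again
    }
    return true;
}
```

With a padding-free struct this passes; for a real struct, the hard-coded ignore bytes mentioned above would be needed, which is exactly the maintenance burden the question is worried about.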
Any better solutions?
(This shouldn't matter, but I'm using VS2008, rtti is off, googletest suite)

Though tempting to make code 'fool-proof' with self-checks like this, it's my experience that keeping the self-checks themselves fool-proof is, well, a fool's errand.
Keep it simple and localise the effect of any changes. Write a comment in the struct definition making it clear that the equality operator must also be updated if the struct is; then, if this fails, it's just the programmer's fault.
I know that this will not seem optimal to you as it leaves the potential for user error in the future, but in reality you can't get around this (at least without making your code horrendously complicated), and often it's most practical just not to bother.

I agree with (and upvoted) Tomalak's answer. It's unlikely that you'll find a foolproof solution. Nonetheless, one simple semi-automated approach could be to validate the expected size within the equality operator:
bool MyStruct::operator==(const MyStruct &rhs) const
{
    assert(sizeof(MyStruct) == 42); // reminder to update as new members are added
    // actual functionality here ...
}
This way, if any new members are added, the assert will fire until someone updates the equality operator. This isn't foolproof, of course. (Member vars might be replaced with something of the same size, etc.) Nonetheless, it's a relatively simple check (a one-line assert) that has a good shot of detecting the error case.

I'm sure I'm going to get downvoted for this but...
How about a template equality function that takes a reference to an int parameter, and the two objects being tested. The equality function will return bool, but will increment the size reference (int) by the sizeof(T).
Then have a large test function that calls the template for each object and sums the total size --> compare this sum with the sizeof the object. The existence of virtual functions/inheritance, etc could kill this idea.
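A hedged sketch of this idea (the helper name and the two-member struct are hypothetical). Note that && short-circuits, so the size audit is only complete when the comparison runs on equal objects:

```cpp
#include <cstddef>

// Each field comparison routes through this helper, which also
// accumulates sizeof(T) into a running total.
template <typename T>
bool fieldEqual(std::size_t& total, const T& a, const T& b) {
    total += sizeof(T);
    return a == b;
}

struct Pod { int a; int b; };  // hypothetical padding-free struct under test

bool equalsAudited(const Pod& x, const Pod& y, std::size_t& total) {
    return fieldEqual(total, x.a, y.a) && fieldEqual(total, x.b, y.b);
}
```

In the test, compare two equal instances and check that the accumulated total matches sizeof(Pod); a newly added member that nobody compares would leave the total short. As the answer notes, padding, virtual functions, and inheritance would break the size accounting.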

It's actually a difficult problem to solve correctly in a self-test.
The easiest solution I can think of is to write a few template functions which operate on multiple types, perform the necessary conversions, promotions, and comparisons, then verify the result in an external unit test. When a breaking change is introduced, at least you'll know.
Some of these challenges are more easily maintained/verified using approaches such as composition, rather than extension/subclassing.

Agree with Tomalak and Eric. I have used this for very similar problems.
assert does nothing when NDEBUG is defined (as it typically is in release builds), so you can potentially release code that is wrong. These size tests also will not always work reliably: if the structure contains bit fields, or inserted members fit into slack space caused by the compiler aligning to word boundaries, the size won't change. For this reason they offer limited value. e.g.
struct MyStruct {
    char a;
    ulong l;
};
changed to
struct MyStruct {
    char a;
    char b;
    ulong l;
};
Both structures are 8 bytes (on 32-bit Linux x86), so a size check cannot tell them apart.
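To sidestep the NDEBUG problem specifically, the size check can be moved to compile time. A sketch with C++11 static_assert, using a fixed-width member type so the expected size is portable (this still cannot catch same-size substitutions or slack-space insertions like the example above):

```cpp
#include <cstdint>

// Variant of the struct above using a fixed-width type, so the
// expected size below holds across typical 32- and 64-bit ABIs.
struct MyStruct {
    char          a;
    std::uint32_t l;   // char + 3 padding bytes + 4 bytes = 8
};

// Fires at compile time, in debug and release builds alike,
// whenever the struct's size changes.
static_assert(sizeof(MyStruct) == 8,
              "MyStruct changed: update operator== and this assert");
```

On pre-C++11 compilers (such as the VS2008 mentioned in the question), a negative-size-array trick can serve the same purpose.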


Do constexpr values cause binary size to increase?

I know that templatized types such as the one below cost nothing in compiled binary size:
template<auto value>
struct ValueHolder{};
I'm making a program that will use a LOT of such wrapped types, and I don't think I want to be using integral_constants for that reason, since they have a ::value member. I can get away with something more like:
template<typename ValHolder>
struct Deducer;
template<auto value>
struct Deducer<ValueHolder<value>> {
using Output = ValueHolder<value+1>;
};
But it's definitely a bit more work, so I want to make sure I'm not doing it for nothing. Note that we're talking TONS of such values (I'd explain, but I don't want to go on too far a tangent; I'd probably get more comments about "should I do the project" than the question!).
So the question is: Do [static] constexpr values take any size at all in the compiled binary, or are the values substituted at compile-time, as if they were typed-in literally? I'm pretty sure they DO take size in the binary, but I'm not positive.
I did a little test at godbolt to look at the assembly of a constexpr vs non-constexpr array side-by-side, and everything looked pretty similar to me: https://godbolt.org/z/9hecfq
#include <cstddef>
#include <iostream>
using namespace std;

int main()
{
    // Non-constexpr large array
    size_t arr[0xFFFF] = {};
    // Constexpr large array
    constexpr size_t cArr[0xFFF] = {};
    // Just to avoid unused-variable optimizations / warnings
    cout << arr[0] << cArr[0] << endl;
    return 0;
}
This depends entirely on:
How much the compiler feels like optimizing the variable away.
How you use the variable.
Consider the code you posted. You created a constexpr array. As this is an array, it is 100% legal to index it with a runtime value. This would require the compiler to emit code that accesses the array at that index, which would require that array to actually exist in memory. So if you use it in such a way, it must have storage.
However, since your code only indexes this array with a constant expression index, a compiler that wants to think a bit more than -O0 would allow would realize that it knows the value of all of the elements in that array. So it knows exactly what cArr[0] is. And that means the compiler can just convert that expression into the proper value and just ignore that cArr exists.
Such a compiler could do the same with arr, BTW; it doesn't have to be a constant expression for the compiler to detect a no-op.
Also, note that since both arrays are non-static, neither will take up storage "in the compiled binary". If runtime storage for them is needed, it will be stack space, not executable space.
Broadly speaking, a constexpr variable will take up storage at any reasonable optimization level if you do something that requires it to take up storage. This could be something as innocuous as passing it to a (un-inlined) function that takes the parameter by const&.
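A small illustration of that last point; `timesTwo` is a hypothetical stand-in for an un-inlined function taking its parameter by const&:

```cpp
// Whether a constexpr variable gets storage depends on how it is used.
constexpr int limit = 42;

// Taking the parameter by const& odr-uses the argument, which can force
// the compiler to materialize storage for `limit` when it is passed here.
int timesTwo(const int& x) { return 2 * x; }

// By contrast, an expression like `limit * 2` can be folded to 84 at
// compile time, with no storage for `limit` at all.
```

Which of the two actually happens in a given build is, as the other answer says, something only the compiled binary can confirm.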
Ask your linker :) There is nothing anywhere in the C++ standard that has any bearing on the answer. So you absolutely, positively, must build your code in release mode, and check if in the particular use scenario it does increase the size or not.
Any general results you obtain on other platforms, different compilers (1), other compile options, other modules added/removed to your project, or even any changes to the code, will not have much relevance.
You have a specific question that depends on so many factors that general answers are IMHO useless.
But moreover, if you actually care about the binary size, then it should be already in your test/benchmark suite, you should have integration builds fail when things grow when they shouldn’t, etc. No measurement and no automation are prima facie evidence that you don’t actually care.
So, since you presumably do care about the binary size, just write the code you had in mind and look in your CI dashboard at the binary size metric. Oh, you don’t have it? Well, that’s the first thing to get done before you go any further. I’m serious.
(1): Same compiler = same binary. I’m crazy, you say? No. It bit me once too many. If the compiler binary is different (other than time stamps), it’s not the same compiler, end of story.

C++ memory alignment on a 64-bit machine

class A
{
public:
    void fun1();
    void fun2();
private:
    uint16 temp_var;
};
Is there any reason why I shouldn't just widen this variable from uint16 to a full uint64? Doing it this way (uint16), do I leave memory "holes" in the object? I'm told the processor is more efficient at dealing with full uint64s.
For clarification, temp_var is the only member variable, and nowhere was I using it for sizeof(temp_var) or as a counter relying on wrapping back to zero.
Thank you all for your inputs, appreciate it..
If the question is, can the compiler do the substitution:
You asked for a uint16 so it gave you a uint16. It would be surprising to get something else.
For instance, imagine if a developer was counting on a behavior of integer overflow or underflow. In that case, if the compiler substituted a uint64 behind the scenes, then that would be problematically surprising for the developer.
Along the same lines, one would expect sizeof(temp_var) to equal sizeof(uint16).
There are probably further cases in which such a substitution could lead to unexpected behavior that wouldn't be anticipated by the developer.
If the question is, can you the developer pick something else:
Sure, you can if you want a variable of that size. So then how about some possibilities where you wouldn't...
If you rely on overflow/underflow behavior of a uint16 then of course you would want to stick to that.
Or perhaps this data is going to be passed along to some further location that only supports values in the range of a uint16, so leaving it that size may make logical sense to try to implicitly document what's valid and/or avoid compiler warnings.
Similarly you might want for sizeof(temp_var) to be 2 bytes instead of 8 for some logical reason relevant to other parts of the program.
I also expect there are some games one could play with the packing pragma, but I presume that isn't relevant to the intended question.
Depending on the goal of your program, logical consistency or clarity of code may be more important than maximum possible performance (especially at the micro level of being concerned about size/type of a member variable). To phrase that another way, uint16 is still fast enough for many many use cases.
In some situations though, there won't be any compelling reason to make the decision one way or the other. At that point, I would go with whatever seems to make the most sense as per personal sensibilities.
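A small illustration of the points above (assuming uint16 means std::uint16_t):

```cpp
#include <cstdint>

// The declared type, not the register width, determines sizeof and
// overflow behaviour, even on a 64-bit machine.
std::uint16_t temp_var = 0xFFFF;

// Incrementing wraps modulo 2^16; a compiler silently substituting a
// uint64 behind the scenes would change this observable result.
std::uint16_t incremented() {
    std::uint16_t v = temp_var;
    ++v;
    return v;  // wraps to 0
}
```

This is why the substitution discussed above would be "problematically surprising": both sizeof and the wrap-around point are part of the variable's observable behaviour.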

Do data type ranges matter as a memory-saving measure anymore?

I was always taught to use the appropriate data type depending on the specific needs of the class/method/function/member/variable/what-have-you. That said, does it even matter anymore?
Hypothetically, if I have a class that has a data member that will never be negative and will never be more than the maximum value of unsigned char, does storing it as an unsigned char (1 byte) versus an int (4 bytes) even matter anymore due to implicit type promotion/demotion, internal representation, register size and the often quoted "CPUs are more efficient when working with int"?
Example:
class Foo {
public:
Foo() : _status(0) { /* DO NOTHING */ }
void AddTo(unsigned char value) {
if(std::numeric_limits<unsigned char>::max() - _status < value) {
value = std::numeric_limits<unsigned char>::max() - _status;
}
_status += value;
}
void Increment() {
if(_status == std::numeric_limits<unsigned char>::max()) return;
++_status;
}
private:
unsigned char _status;
};
A main effect of generally using "right-sized" types is that you and others waste a lot of time on it.
If you have a zillion values stored, e.g. a very large picture, or if you absolutely need a 64-bit range, say, then sure, in such cases it makes sense to right-size.
But using right-sizing as a general guideline produces no significant gain and much pain.
Authority argument: Bjarne Stroustrup, who created the language, generally just uses a few types, e.g. int for integers.
"Premature optimization is the root of all evil" Donald Knuth.
Is this one data member's size going to significantly impact the size of the class? Are you serializing the class? Is the serialization representation seeing any reduction? Are you making the code harder to read worrying about this when your boss doesn't care?
Y2K, IPv4 32-bit addresses, ASCII: yes, the future will look back at your code and laugh. Remember Moore's law; write something that works, and expect that something will be wrong. Until it is, you'll never know what. Write testable, maintainable, and refactorable code and it might just stay in production long enough for someone to care.
For most use cases when targeting PCs and servers, you're not going to need to worry about using chars vs using ints to hold numeric values. Just use an int or, if you need a larger range, a long.
However, if you're targeting a platform with a few bytes of RAM and less than 1 KB to store your program, you may need to carefully consider whether that loop counter really has to take up more than 1 byte.
Unless there's a particular reason for choosing some other variable type, just stick with int. A large part of modern programming is managing complexity and there's no reason to start sprinkling your code with a whole variety of types if it doesn't actually help anything. Sure, if you have 5,000 copies of a particular class or working on a system with a tightly constrained memory footprint then it might be important. But on a multigigabyte system this isn't generally going to be a concern. In that case it's more about writing something understandable and maintainable.
You are hitting one of the problems of C-style languages: they lack the range checking that you can get in other languages. If your value should be within a specific range, the ability to say a type can be, say, 1..64 is a big help for error tracking. I have found many bugs in C/C++ code by converting it to Pascal or Ada.
I like to use typedefs in the situation you describe--
COLORCOMPONENT
DEGREES
RADIANS
--for documentation purposes. Even if the compiler does not do the checking for me, I can usually spot when I am using degrees when I should be using radians.
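A sketch of the idea, with a hypothetical conversion function. Since typedefs are mere aliases, the compiler will not object to mixing them; that is why this remains a documentation aid rather than a check:

```cpp
typedef unsigned char COLORCOMPONENT;  // one 0-255 colour channel
typedef double DEGREES;
typedef double RADIANS;

RADIANS toRadians(DEGREES d) {
    return d * 3.14159265358979323846 / 180.0;
}
// Passing a RADIANS value to toRadians compiles without complaint;
// the typedef only helps a human reader spot the mistake.
```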

Use of Literals, yay/nay in C++

I've recently heard that in some cases, programmers believe that you should never use literals in your code. I understand that in some cases, assigning a variable name to a given number can be helpful (especially in terms of maintenance if that number is used elsewhere). However, consider the following case studies:
Case Study 1: Use of Literals for "special" byte codes.
Say you have an if statement that checks for a specific value stored in (for the sake of argument) a uint16_t. Here are the two code samples:
Version 1:
// Descriptive comment as to why I'm using 0xBEEF goes here
if (my_var == 0xBEEF) {
//do something
}
Version 2:
const uint16_t kSuperDescriptiveVarName = 0xBEEF;
if (my_var == kSuperDescriptiveVarName) {
// do something
}
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once. Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
Case Study 2: Use of sizeof
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns. Take the two code examples into account. The scenario is that you are computing the offset into a packet buffer (an array of uint8_t) where the first part of the packet is stored as my_packet_header, which let's say is a uint32_t.
Version 1:
const int offset = sizeof(my_packet_header);
Version 2:
const int offset = 4; // good comment telling reader where 4 came from
Clearly, version 1 is preferred, but what about for cases where you have multiple data fields to skip over? What if you have the following instead:
Version 1:
const int offset = sizeof(my_packet_header) + sizeof(data_field1) + sizeof(data_field2) + ... + sizeof(data_fieldn);
Version 2:
const int offset = 47;
Which is preferred in this case? Does it still make sense to show all the steps involved in computing the offset, or does the literal usage make sense here?
Thanks for the help in advance as I attempt to better my code practices.
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
Sounds like you understand the main point... factoring values (and their comments) that are used in multiple places. Further, it can sometimes help to have a group of constants in one place - so their values can be inspected, verified, modified etc. without concern for where they're used in the code. Other times, there are many constants used in proximity and the comments needed to properly explain them would obfuscate the code in which they're used.
Countering that, having a const variable means all the programmers studying the code will be wondering whether it's used anywhere else, keeping it in mind as they inspect the rest of the scope in which it's declared; the fewer unnecessary things there are to remember, the surer the understanding of the important parts of the code will be.
Like so many things in programming, it's "an art" balancing the pros and cons of each approach, and best guided by experience and knowledge of the way the code's likely to be studied, maintained, and evolved.
Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
There are no performance implications in optimised code.
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns.
And other reasons too. A big factor in good programming is reducing the points of maintenance when changes are done. If you can modify the type of a variable and know that all the places using that variable will adjust accordingly, that's great - saves time and potential errors. Using sizeof helps with that.
Which is preferred [for calculating offsets in a struct]? Does it still make sense to show all the steps involved with computing the offset, or does the literal usage make sense here?
The offsetof macro (#include <cstddef>) is better for this... again reducing maintenance burden. With the this + that approach you illustrate, if the compiler decides to use any padding your offset will be wrong, and further you have to fix it every time you add or remove a field.
Ignoring the offsetof issues and just considering your this + that example as an illustration of a more complex value to assign, again it's a balancing act. You'd definitely want some explanation/comment/documentation re intent here (are you working out the binary size of earlier fields? calculating the offset of the next field?, deliberately missing some fields that might not be needed for the intended use or was that accidental?...). Still, a named constant might be enough documentation, so it's likely unimportant which way you lean....
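To illustrate with a hypothetical packet layout: summing sizeof by hand misses any compiler-inserted padding, which offsetof accounts for automatically:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical packet layout, illustrative only.
struct Packet {
    std::uint32_t my_packet_header;
    std::uint8_t  data_field1;
    std::uint16_t data_field2;
    std::uint32_t payload;
};

// Hand-summed: 4 + 1 + 2 == 7 bytes...
const std::size_t summed = sizeof(std::uint32_t) + sizeof(std::uint8_t)
                         + sizeof(std::uint16_t);

// ...but on a typical ABI the compiler inserts a padding byte after
// data_field1 to align data_field2, so payload actually sits at offset 8.
const std::size_t offset = offsetof(Packet, payload);
```

The hand-summed version would silently compute the wrong offset, and would also need editing every time a field is added, removed, or resized.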
In every example you list, I would go with the name.
In your first example, you almost certainly used that special 0xBEEF number at least twice - once to write it and once to do your comparison. If you didn't write it, that number is still part of a contract with someone else (perhaps a file format definition).
In the last example, it is especially useful to show the computation that yielded the value. That way, if you encounter trouble down the line, you can easily see either that the number is trustworthy, or what you missed and fix it.
There are some cases where I prefer literals over named constants though. These are always cases where a name is no more meaningful than the number. For example, you have a game program that plays a dice game (perhaps Yahtzee), where there are specific rules for specific die rolls. You could define constants for One = 1, Two = 2, etc. But why bother?
Generally it is better to use a name instead of a value. After all, if you need to change it later, you can find it more easily. Also it is not always clear why this particular number is used, when you read the code, so having a meaningful name assigned to it, makes this immediately clear to a programmer.
Performance-wise there is no difference, because the optimizers should take care of it. And it is rather unlikely, even if there would be an extra instruction generated, that this would cause you troubles. If your code would be that tight, you probably shouldn't rely on an optimizer effect anyway.
I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
I think kSuperDescriptiveVarName will definitely be used more than once: once for the check and at least once for assignment, maybe in different parts of your program.
There will be no difference in performance, since an optimization called Constant Propagation exists in almost all compilers. Just enable optimization for your compiler.

`short int` vs `int`

Should I bother using short int instead of int? Is there any useful difference? Any pitfalls?
short vs int
Don't bother with short unless there is a really good reason such as saving memory on a gazillion values, or conforming to a particular memory layout required by other code.
Using lots of different integer types just introduces complexity and possible wrap-around bugs.
On modern computers it might also introduce needless inefficiency.
const
Sprinkle const liberally wherever you can.
const constrains what might change, making it easier to understand the code: you know that this beastie is not gonna move, so, can be ignored, and thinking directed at more useful/relevant things.
Top-level const for formal arguments is however by convention omitted, possibly because the gain is not enough to outweigh the added verbosity.
Also, in a pure declaration of a function top-level const for an argument is simply ignored by the compiler. But on the other hand, some other tools may not be smart enough to ignore them, when comparing pure declarations to definitions, and one person cited that in an earlier debate on the issue in the comp.lang.c++ Usenet group. So it depends to some extent on the toolchain, but happily I've never used tools that place any significance on those consts.
Cheers & hth.,
Absolutely not in function arguments. Few calling conventions are going to make any distinction between short and int. If you're making giant arrays you could use short if your data fits in short to save memory and increase cache effectiveness.
What Ben said. You will actually create less efficient code since all the registers need to strip out the upper bits whenever any comparisons are done. Unless you need to save memory because you have tons of them, use the native integer size. That's what int is for.
EDIT: Didn't even see your sub-question about const. Using const on intrinsic types (int, float) is useless, but any pointers/references should absolutely be const whenever applicable. Same for class methods as well.
The question is technically malformed "Should I use short int?". The only good answer will be "I don't know, what are you trying to accomplish?".
But let's consider some scenarios:
You know the definite range of values that your variable can take.
The ranges for signed integers are:
signed char — -2⁷ to 2⁷-1
short — -2¹⁵ to 2¹⁵-1
int — -2¹⁵ to 2¹⁵-1
long — -2³¹ to 2³¹-1
long long — -2⁶³ to 2⁶³-1
We should note here that these are guaranteed ranges, they can be larger in your particular implementation, and often are. You are also guaranteed that the previous range cannot be larger than the next, but they can be equal.
You will quickly note that short and int actually have the same guaranteed range. This gives you very little incentive to use it. The only reason to use short given this situation becomes giving other coders a hint that the values will be not too large, but this can be done via a comment.
It does, however, make sense to use signed char, if you know that you can fit every potential value in the range -128 to 127.
You don't know the exact range of potential values.
In this case you are in a rather bad position to attempt to minimise memory usage, and should probably use at least int. Although it has the same minimum range as short, on many platforms it may be larger, and this will help you out.
But the bigger problem is that you are trying to write a piece of software that operates on values, the range of which you do not know. Perhaps something wrong has happened before you have started coding (when requirements were being written up).
You have an idea about the range, but realise that it can change in the future.
Ask yourself how close to the boundary are you. If we are talking about something that goes from -1000 to +1000 and can potentially change to -1500 – 1500, then by all means use short. The specific architecture may pad your value, which will mean you won't save any space, but you won't lose anything. However, if we are dealing with some quantity that is currently -14000 – 14000, and can grow unpredictably (perhaps it's some financial value), then don't just switch to int, go to long right away. You will lose some memory, but will save yourself a lot of headache catching these roll-over bugs.
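The guaranteed minimums listed above can be checked against a given implementation via <climits>; the helpers below only assert the standard's guarantees, since the actual widths vary by platform:

```cpp
#include <climits>

// The standard only guarantees that short and int can each hold at
// least the 16-bit range; on most desktop platforms int is wider.
bool intMeetsGuarantee()   { return INT_MAX  >= 32767 && INT_MIN  <= -32767; }
bool shortMeetsGuarantee() { return SHRT_MAX >= 32767 && SHRT_MIN <= -32767; }
```

Portable code that needs more than the guaranteed minimum should move up a type (or use the fixed-width types from <cstdint>) rather than assume a generous implementation.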
short vs int - If your data will fit in a short, use a short. Save memory. Make it easier for the reader to know how much data your variable may fit.
use of const - Great programming practice. If your data should be a const then make it const. It is very helpful when someone reads your code.