Checking if a class has expected attributes - c++

I am going to give some classes about C++ and data structures, and to check students' progress I'd like them to develop the structures I talk about. This is the common approach for data structures classes, I guess. But I want more, I want the students to have a quick feedback on what they are missing, so I developed several unit tests for the classes that check the behavior and give them instant results on what is wrong.
This has been working properly for the past two semesters, but I want a step further on that automatized correction. I've been studying how to check what are the internal components of a class, so I can know if someone has implemented correctly a tree with a node* root and size_t size and hasn't used additional not-necessary attributes, for instance.
I know that I can have a rough approximation of an object size with sizeof, but the results are not that precise. It frequently is different from what I expect, for example: I tested creating a class with a pointer (8 bytes) and an int (4 bytes), but the sizeof was 28. From what I learnt, probably this has something to do with virtual function table and other alignment stuff.
So, how far and further can I go analyzing if someone has coded a data structure the proper and expected manner? How can I check that someone just didn't #include <list> and created an adaptor (for this I know I can just strip the includes but anyway)?

Let's break this answer into two parts, we'll split on the return from is_standard_layout.
1. Virtual Classes
is_standard_layout will return false, meaning the class is virtual. virtual classes will contain all the members from their parents aside from just a virtual function pointer. You can find more info here. Basically, your best bet for finding the size of the members here is to do sizeof the class in question reduced by sizeof(void*) And that's the size of your virtual class's members.
2. Non-Virtual Classes
is_standard_layout will return true meaning this is not a virtual class. In this case we can use offsetof to find the first member variable past the header information. Finding the end of the object with the pointer to the object and sizeof will you to measure the distance to the point returned by offsetof.
Both of these methods should yield the size of the members in the classes. determining an allowable range for class size is a matter of preference. But placing the evaluation in a static_assert will allow you to also provide a compile time message indicating the reason for the assert.

Related

Dealing with hard to avoid large numbers of const members in a C++ class

This is a followup question on my previous question:
Initialize const members using complex function in C++ class
In short, I have a program that has a class Grid that contains the properties of a 3D grid. I would like the properties of this grid to be read-only after creation, such that complex functions within the class cannot accidentally mess the grid up (such as if(bla = 10), instead of if(bla == 10)) etc. Now, this question has been answered well in the previous discussion: calling an initializer lists via a create function.
Here comes my new problem. My Grid has many properties that just plainly describe the grid (number of grid points, coordinates at grid points etc.) for which it just does not make sense to redistribute them among different objects. Still, basic textbooks in C++ always link functions with a large number of parameters to bad design, but I need them in order to be able to have const member variables.
Are there any standard techniques to deal with such problems?
The answer depends on what you're trying to protect.
If you're trying to assure that users of the class can't inadvertently alter the critical parameters, then the way to do that is to declare these members as private or protected and only provide const getters if they're needed at all outside the class implementation.
If you're trying to assure that the implementer of the Grid class doesn't alter these values, then there a few ways to do so. One simple way is to create a subclass that contains just those parameters and then the answer looks just like 1. Another way is to declare them const in which case they must be initialized when a Grid instance is constructed.
If the answer is 2, then there are also some other things that one can do to prevent inadvertently altering critical values. During the time that you're writing and testing the class implementation, you could temporarily use fixed dummy const values for the critical parameters, assuring that the other functions you write cannot alter those values.
One more trick to avoid specifically the if (i=7) ... error when you meant to write if (i == 7) ... is to always put the constant first. That is, write if (7 == i) .... Also, any decent compiler should be able to flag a warning for this kind of error -- make sure you're taking advantage of that feature by turning on all of the warning and error reporting your compiler provides.

c++ alignment using pragma pack and inheritance

I am not too familiar with concept of packing / alignment in C++, I did some reading about this recently and have a question.
I am deriving from a base class (written by somebody else and I have header for that). Author of this class has used pragma pack to align members to 1 byte boundary. however I am not sure if it is necessary for derived class to do the same or not, what are consequences of packing/not packing derived class with same alignment as base class ?
any help/suggestions will be greatly appreciated
thanks
In everyday, well-written C++ code it doesn't normally matter if there's padding or not, though the choice may impact performance. So, you should be able to derive from that base class without worrying about explicitly specifying any packing yourself. That said, the base class may be packed because there'll be a massive number of instances in memory or bitwise-copied to a file or network stream, in which case you'll want to consider whether instances of your new class may end up mixed in with that data, and whether you also want to use packing for the extra data members for the same reasons.
Not all code is well-written though. For example, if the program treats the objects as binary blobs of data and uses functions like memcmp on them, or does a byte-wise void*/size checksum, then garbage data in padding members may break the logic/behaviour. If the data is written object by object with particular separator or delimiter characters, then embedded garbage may inject unwanted separators/delimiters and break the reading/parsing logic. There's no way to assess these risks without doing an impact study on the existing code.

Access v-table at run-time

Is it possible to access a function's v-table at runtime? Can meta-information such as the number of different function versions be determined? This might be more of a theoretical question, but could a developer put a cap on the number of classes that can extend a given base class by making sure the v-table never exceeds a certain number of rows?
Is it possible to access a function's v-table at runtime? Can meta-information such as the number of different function versions be determined?
Not in a portable way. The standard does not even have the concept of virtual table, it is more of an implementation detail than a requirement, even if all implementations I know use vtables. In the general case there will not even be enough information available at runtime (i.e. the compiler does not need to store the number of entries in the vtable, as it sees the type and can count)
Could a developer put a cap on the number of classes that can extend a given base class by making sure the v-table never exceeds a certain number of rows?
Again no, but since this shows a misconception, it might be worth treating it apart. When a base class has any virtual functions the compiler (in all implementations that use vtables) will create the vtable and that table will have exactly 1 entry per virtual function in the base class (plus some additional data --typeinfo or pointer to it, offset to the beginning of the object or other implementation details). When a class extends that base class, it will not add new elements to that vtable, but rather create a separate vtable (or more, depending on the type hierarchy). If the derived function does not add any new virtual function, the vtable for the derived object will contain the exact number of elements that the original vtable had. That is, you can have a huge hierarchy of inheritances without that affecting the vtable layout at all. What will change are the typeinfo data stored and the pointers to each virtual function, that will refer to the final overrider
an meta-information such as the number of different function versions be determined?
No C++ doesn't support reflection. What you are trying to achieve is not possible in C++ AFAIK
Theoretically, yes, because it's stored in memory and you have access to it. In practice, there is no sane, portable way to do it, because the compiler is free to implement virtual functions in any way it wants, so you would have to dig through your compiler's source code to find out how/where to access the desired information and how to interpret it.
The only glimmer of hope I could imagine for your effort, is the handling of dynamic_cast. Each compiler, with corresponding library support, has some concept of traversing a hierarchy to achieve a dynamic cast. If you could hook into that traversal, you might then know something about how many levels of inheritance you are dealing with. That said, even if you made this work, it would be compiler-specific (as others have said) since such implementation is proprietary.
You can use the Debug Interface Access SDK or other debug support interfaces (gdb) for this sort of thing.
RTT data is more portable but may not have sufficient details for your project.
For your specific question on limiting the v-table and preventing it from being extended too far, you can try this method;
IDiaSymbol::get_classParente
Retrieves a reference to the class parent of the symbol.
HRESULT get_classParent (IDiaSymbol** pRetVal);
You can investigate all of the class related symbol types here, what you might want to do is enumerate all class types loaded, get_classParent recursively and keep a tally of all of the classes which extend your base.
Your class could also require symbols to be available on startup to help with enforcement.

Are type fields pure evil?

As discusses in The c++ Programming Language 3rd Edition in section 12.2.5, type fields tend to create code that is less versatile, error-prone, less intuitive, and less maintainable than the equivalent code that uses virtual functions and polymorphism.
As a short example, here is how a type field would be used:
void print(const Shape &s)
{
switch(s.type)
{
case Shape::TRIANGE:
cout << "Triangle" << endl;
case Shape::SQUARE:
cout << "Square" << endl;
default:
cout << "None" << endl;
}
}
Clearly, this is a nightmare as adding a new type of shape to this and a dozen similar functions would be error-prone and taxing.
Despite these shortcomings and those described in TC++PL, are there any examples where such an implementation (using a type field) is a better solution than utilizing the language features of virtual functions? Or should this practice be black listed as pure evil?
Realistic examples would be preferred over contrived ones, but I'd still be interested in contrived examples. Also, have you ever seen this in production code (even though virtual functions would have been easier)?
When you "know" you have a very specific, small, constant set of types, it can be easier to hardcode them like this. Of course, constants aren't and variables don't, so at some point you might have to rewrite the whole thing anyway.
This is, more or less, the technique used for discriminated unions in several of Alexandrescu's articles.
For example, if I was implementing a JSON library, I'd know each Value can only be an Object, Array, String, Integer, Boolean, or Null—the spec doesn't allow any others.
A type enum can be serialized via memcpy, a v-table can't. A similar feature is that corruption of a type enum value is easy to handle, corruption of the v-table pointer means instant badness. There's no portable way to even test a v-table pointer for validity, calling dynamic_cast or typeinfo to do RTTI checks on an invalid object is undefined behavior.
For example, one instance where I choose to use a type hierarchy with static dispatch controlled by a discriminator and not dynamic dispatch is when passing a pointer to a structure through a Windows message queue. This gives me some protection against other software that may have allocated broadcast messages from a range I'm using (it's supposed to be reserved for app-local messages, do not pass GO if you think that rule is actually respected).
The following guideline is from Clean Code by Robert C. Martin.
"My general rule for switch statements is that they can be tolerated if they appear only once, are used to create polymorphic objects, and are hidden behind an inheritance relationship so that the rest of the system can't see them".
The rationale is this: if you expose type fields to the rest of your code, you'll get multiple instances of the above switch statement. That is a clear violation of DRY. When you add a type, all these switches need to change (or, even worse, they become inconsistent without breaking your build).
My take is: It depends.
A parameterized Factory Method design pattern relies on this technique.
class Creator {
public:
virtual Product* Create(ProductId);
};
Product* Creator::Create (ProductId id) {
if (id == MINE) return new MyProduct;
if (id == YOURS) return new YourProduct;
// repeat for remaining products...
return 0;
}
So, is this bad. I don't think so as we do not have any other alternative at this stage. This is a place where it is absolutely necessary as it involves creation of an object. The type of the object is yet to be known.
The example in OP is however an example which sure needs refactoring. Here we are already dealing with an existing object/type (passed as argument to function).
As Herb Sutter mentions -
"Switch off: Avoid switching on the
type of an object to customize
behavior. Use templates and virtual
functions to let types (not their
calling code) decide their behavior."
Aren't there costs associated to virtual functions and polymorphism? Like maintaining a vtable per class, increase of each class object size by 4 byptes, runtime slowness (I have never measured it though) for resolving the virtual function appropriately. So, for simple situations, using a type field appears acceptable.
I think that if the type corresponds precisely to the implied classes then type is wrong. Where it gets complicated is where the type does not quite match or its not so cut and dried.
Taking your example what if type was Red, Green, Blue. Those are types of shapes. You could even have a color class as a mixin; but its probably too much.
I am thinking of using a type field to solve the problem of vector slicing. That is, I want a vector of hierarchical objects. For example I want my vector to be a vector of shapes, but I want to store circles, rectangles, triangles etc.
You can't do that in the most obvious simple way because of slicing. So the normal solution is to have a vector of pointers or smart pointers instead. But I think there are cases where using a type field will be a simpler solution, (avoids new/delete or alternative lifecycle techniques).
The best example I can think of (and the one I've run into before), is when your set of types is fixed and the set of functions you want to do (that depend on those types) is fluid. That way, when you add a new function, you modify a single place (adding a single switch) rather than adding a new base virtual function with the real implementation scattered all across the classes in your type hierarchy.
I don't know of any realistic examples. Contrived ones would depend on there being some good reason that virtual methods cannot be used.

C++: defining maximum/minimum limits for a class

I have created a class that models time slots in a variable-granularity daily schedule, where, for example, the first time slot is 30 minutes, but the second time slot can be 40 minutes and the first available slot starts at (a value comparable to) 1.
What I want to do now is to define somehow the maximum and minimum allowable values that this class takes and I have two practical questions in order to do so:
1.- Does it make sense to define absolute minimum and maximum in such a way for a custom class? Or better, does it suffice that a value always compares as lower-than any other possible value of the type, given the class's defined relational operators, to be defined the min? (and analogusly for the max)
2.- Assuming the previous question has an answer modeled after "yes" (or "yes but ..."), how do I define such max/min? I know that there is std::numeric_limits<> but from what I read it is intended for "numeric types". Do I interpret that as meaning "represented as a number" or can I make a broader assumption like "represented with numbers" or "having a correspondence to integers"? After all, it would make sense to define the minimum and maximum for a date class, and maybe for a dictionary class, but numeric_limits may not be intended for those uses (I don't have much experience with it). Plus, numeric_limits has a lot of extra members and information that I don't know what to make with. If I don't use numeric_limits, what other well-known / widely-used mechanism does C++ offer to indicate the available range of values for a class?
Having trouble making sense of your question. I think what you're asking is whether it makes sense to be assertive about the class's domain (that data which can be fed to it and make sense), and if so how to be assertive.
The first has a very clear answer: yes, absolutely. You want your class to be, "...easy to use correctly and difficult to use incorrectly." This includes making sure the clients of the class are being told when they do something wrong.
The second has a less clear answer. Much of the time you'll simply want to use the assert() function to assert a function or class's domain. Other times you'll want to throw an exception. Sometimes you want to do both. When performance can be an issue sometimes you want to provide an interface that does neither. Usually you want to provide an interface that can at least be checked against so that the clients can tell what is valid/invalid input before attempting to feed it to your class or function.
The reason you might want to both assert and throw is because throwing an exception destroys stack information and can make debugging difficult, but assert only happens during build and doesn't actually do anything to protect you from running calculations or doing things that can cause crashes or invalidate data. Thus asserting and then throwing is often the best answer so that you can debug when you run into it while testing but still protect the user when those bugs make it to the shelf.
For your class you might consider a couple ways to provide min/max. One is to provide min/max functions in the class's interface. Another might be to use external functionality and yes, numeric_limits might just be the thing since a range is sometimes a type of numeric quantity. You could even provide a more generic interface that has a validate_input() function in your class so that you can do any comparison that might be appropriate.
The second part of your question has a lot of valid answers depending on a lot of variables including personal taste.
As the designer of your schedule/slot code, it's up to you as to how much flexibility/practicality you want.
Two simple approaches would be to either define your own values in that class
const long MIN_SLOT = 1;
const long MAX_SLOT = 999; // for example
Or define another class that holds the definitions
class SchedLimits{
public:
const static long MIN_SLOT = 1;
const static long MAX_SLOT = 999;
}
Simplest of all would be enums. (my thanks to the commenter that reminded me of those)
enum {MIN_SLOT = 1, MAX_SLOT = 999};
Just create some const static members that reflect the minimums and maximums.