C++ exceptions and bad reading of a union

I have a class containing a union as a field. The union holds pointers to two different classes. As a second field, my class contains a flag indicating which type is currently stored.
class Item {
    std::string *title;
    bool who_am_I;
    union { Submenu *smenu; Function *call; } content;
public:
    bool am_I_a_submenu();
    bool am_I_a_function();
    Submenu *give_me_submenu();
    Function *give_me_function();
    /*(...)*/
};
Now, before each use of my "give_me" methods I urge the user to check the type through the appropriate flag-accessing method, i.e. the "am_I" methods. Nevertheless, I would like my library to throw an appropriate exception if the user happens to forget. Can I do that without checking the flag inside the "give_me" method? I ask because it would mean that in normal usage the flag is checked twice unnecessarily.
I was wondering whether, which, and when C++ throws some built-in exception once a type conflict appears and causes the program to malfunction. Or should I handle this case some other way, still preferably without checking the flag twice?

OK, first of all... why do you care if the flag is checked twice? Is that really a serious performance bottleneck in your product that profiling has shown must be optimized? I very much doubt it.
But even if it is a performance bottleneck, what's more important? A properly functioning application that doesn't crash seems to me to be an acceptable tradeoff for a super-tiny amount of extra overhead.
But you should probably just redesign your interface(s) so that the user always knows what he has and can't make mistakes like this.

There is no way to check what's stored inside a union. If you use a union, it's up to you to make sure you are always accessing the right member. Reading the wrong one is undefined behavior and will not throw an exception.
I don't know what you are doing, but I would recommend not using a union at all. It's an old-fashioned way to save some memory and not worth the bugs it is likely to introduce. In your case, it does not save memory at all (the preceding boolean takes as much memory as a pointer would, because of memory alignment).

By using a union you are telling the compiler that the two types share storage. Normally converting between the two would cause a compile-time issue, since there is no user-specified nor default conversion, but since you defined them as a union, either member can be read through the same object.
I would postulate that an extra call to check a boolean is probably not too expensive in this case.
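For what it's worth, if a C++17 compiler is available, std::variant does out of the box what the question asks for: it stores the discriminating tag itself, and std::get throws std::bad_variant_access on a type mismatch, so the flag is checked exactly once, inside the accessor. A minimal sketch (the accessor names are illustrative, not the original interface):

#include <variant>

struct Submenu { /* ... */ };
struct Function { /* ... */ };

class Item {
    std::variant<Submenu*, Function*> content;
public:
    explicit Item(Submenu *s) : content(s) {}
    explicit Item(Function *f) : content(f) {}

    bool is_submenu() const { return std::holds_alternative<Submenu*>(content); }

    // std::get verifies the active alternative and throws
    // std::bad_variant_access on a mismatch.
    Submenu *give_submenu() { return std::get<Submenu*>(content); }
    Function *give_function() { return std::get<Function*>(content); }
};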

Related

Should I reset primitive member variable in destructor?

Please see the following code:
#include <iostream>
#include <windows.h> // for Sleep; the question targets VS 2017
using namespace std;

class MyClass
{
public:
    int i;
    MyClass()
    {
        i = 10;
    }
};

MyClass* pObj = nullptr;

int main()
{
    {
        MyClass obj;
        pObj = &obj;
    }
    while (1)
    {
        cout << pObj->i; // pObj is a dangling pointer, still no crash
        Sleep(1000);
    }
    return 0;
}
obj dies once it goes out of scope. But I tested in VS 2017, and I see no crash even after I use it.
Is it good practice to reset the int member variable i?
Accessing a member after an object has been destroyed is undefined behavior. It may seem like a good idea to set members in the destructor to a predictable and most likely unexpected value, e.g., a rather large value or a value with a specific bit pattern, making it easy to recognize in a debugger.
However, this idea is flawed and is thwarted by the system:
All classes would need to play along, and instead of concentrating on writing correct code, developers would spend time (both development time and run time) making pointless changes.
Compilers happen to be rather smart and can detect that changes in a destructor are not needed. Since a correct program cannot detect whether the change was made, they may not make the change at all. This effect is an actual issue for security applications where, e.g., a password should be erased from memory so it cannot be read (which requires some non-portable means).
Even if the value gets set to a specific value, memory gets reused and the values get overwritten. Especially with objects on the stack it is most likely that the memory is used for something else before you see the bad value in a debugger.
Even when resetting values you would not necessarily see a "crash": a crash is caused by something being set up to protect against something invalid. In your example you are accessing an int on the stack: the stack will remain accessible from the CPU's point of view, and at best you'd get an unexpected value. Use of unusual pointer values typically leads to a crash because the memory management system tries to access a location which isn't mapped, but even that isn't guaranteed: on a busy 32-bit system pretty much all memory may be in use. That is, trying to rely on undefined behavior being detected is also futile.
Correspondingly, it is much better to use good coding practices which avoid dangling references right away and concentrate on using these. For example, I'm always initializing members in the member initializer list, even in the rare cases they end up getting changed in the body of the constructor (i.e., you'd write your constructor as MyClass(): i() {}).
As a debugging tool it may be reasonable to replace the allocation functions (ideally the allocator object, but potentially the global operator new()/operator delete() and family) with a version which doesn't quickly hand out released memory and instead fills released memory with a predictable pattern. Since these actions slow the program down, you'd only use this code in a debug build, but it is relatively simple to implement once and easy to enable/disable centrally, so it may be worth the effort. In practice I don't think even such a system pays off, as use of managed pointers and proper design of ownership and lifetime avoid most errors due to dangling references.
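To illustrate, here is a minimal sketch of such a scribbling replacement for the global single-object allocation functions (the array forms and delayed reuse are left out, and the 0xDD fill pattern is an arbitrary choice):

#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// A header in front of each block remembers its size, so that
// operator delete knows how many bytes to scribble over.
void* operator new(std::size_t size) {
    constexpr std::size_t header = alignof(std::max_align_t);
    auto* raw = static_cast<unsigned char*>(std::malloc(header + size));
    if (!raw) throw std::bad_alloc{};
    *reinterpret_cast<std::size_t*>(raw) = size;
    return raw + header;
}

void operator delete(void* ptr) noexcept {
    if (!ptr) return;
    constexpr std::size_t header = alignof(std::max_align_t);
    auto* raw = static_cast<unsigned char*>(ptr) - header;
    std::memset(ptr, 0xDD, *reinterpret_cast<std::size_t*>(raw)); // scribble
    std::free(raw);
}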
The behaviour of the code you gave is undefined. One particular case of undefined behaviour is working as expected, so there is nothing strange in the fact that the code works. The code can work now and it can break at any time depending on compiler version, compiler options, stack contents, and the phase of the moon.
So the first and most important thing is to avoid dangling pointers (and all other kinds of undefined behaviour) everywhere.
As for clearing variables in the destructor, here are the best practices I follow:
Follow coding rules that save you from mistakes of accessing unallocated or destroyed objects. I cannot describe them in a few words, but the rules are pretty common (see here and elsewhere).
Analyze the code by humans (code review) or with static analyzers (like cppcheck or PVS-Studio or others) to catch cases similar to the one you described above.
Do not call delete manually; prefer scoped_ptr or similar object lifetime managers. When delete is reasonable, I usually (usually) set the pointer to nullptr after deletion to protect myself from mistakes.
Use pointers as rarely as possible. References are preferred.
When objects of my class are used outside and I suspect somebody may access them after deletion, I put a signature field inside, set it to something like 0xDEAD in the destructor, and check it on entry to every public method (see the sketch below). Be careful here not to slow your code down to an unacceptable speed.
After all of this, setting i from your example to 0 or -1 is redundant. As far as I'm concerned, it's not a thing you should focus your attention on.
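The signature-field trick from the list above might look like this; a minimal sketch (the magic values and the class name are illustrative):

#include <cassert>
#include <cstdint>

class Tracked {
    static constexpr std::uint32_t kAlive = 0xA11FE;
    static constexpr std::uint32_t kDead = 0xDEAD;
    std::uint32_t signature_ = kAlive;
public:
    ~Tracked() { signature_ = kDead; } // mark the corpse

    void do_work() {
        assert(signature_ == kAlive && "method called on a destroyed object");
        // ... actual work ...
    }
};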

Should operators == or != throw in this case?

I'm writing a program where I have created a std::vector of POD structs. One of the members of the struct is a unique identifier.
In order to be able to use std::binary_search I have to implement operator< for the struct. Following the guidelines here, I'm writing the full set of overloads for ==, !=, <, >, >= and <=.
This presents one issue I wasn't sure how to handle. The vector will be ordered by the unique ID I assign each struct. Two structs are the same if they have the same identifier. However, it occurs to me that a situation could crop up if two structs have the same identifier but different data in the other members.
This should never happen. Would it be appropriate to have the comparison operator check the rest of the fields then and throw an exception if they're different but the ID is the same? What sort of exception would be most appropriate?
This is merely an expansion on SCombinator's answer.
The fact that you said "This should never happen" means you want an assert, not an exception (or a combination of the two). An exception would hide the error - well, not hide it, but you would be able to catch it and continue. Exceptions are better suited for exceptional situations which you didn't really plan for - for example, attempting to open a file that doesn't exist. The missing file is not really part of the logic; it's your little brother accidentally deleting your file, or an aggressive anti-virus that thought it was a virus, or whatever - it's just an exceptional situation.
If two objects with the same ID but otherwise different data is something that should never happen, that's basically an assertion. It's part of the logic - part of the requirements, if you will. Throwing an exception simply points it out, but there's no real way to recover from it. By the time you realize something is wrong, it's already too late. You have two objects with the same ID yet different data; you don't know which one is correct, you don't know why the incorrect one exists, and so on. You probably don't even want to recover from it. The application is already in an erroneous state - the two contradicting objects already exist. Your application is in an unrecoverable state; if you continue, you'll probably get wrong results or worse.
If it's not a critical assertion, you can also throw an exception afterwards and provide a clean way to close the application, but that's just beautification.
In general, I follow a simple rule - If it's something that can happen under extraordinary circumstances, I use exceptions. If it's something that should never happen, and if it does, means something is seriously wrong with the logic in the code, I use an assert (and possibly a force crash).
If such a thing is a logical error, you should probably use an assert.
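A minimal sketch of what that assert could look like inside the comparison operators (the Record type and its members are illustrative stand-ins for the question's POD struct):

#include <cassert>

struct Record {
    int id;      // unique identifier
    int payload; // the other data
};

// The comparison assumes the invariant "same id implies same data"
// was enforced when the Records were created; the assert documents
// that invariant and checks it in debug builds only.
inline bool operator==(const Record &a, const Record &b) {
    assert((a.id != b.id || a.payload == b.payload) &&
           "two Records share an id but differ in their data");
    return a.id == b.id;
}
inline bool operator!=(const Record &a, const Record &b) { return !(a == b); }
inline bool operator<(const Record &a, const Record &b) { return a.id < b.id; }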
Well, you have to ask yourself a few questions:
1) Is this likely to happen during testing (i.e., signaling a bug) but never during normal execution? If the answer is yes, use an assert.
2) Is this going to happen during normal runtime, and if so, can the program recover and continue execution? If the answer is yes, throw an exception, catch it, and handle it.
3) Is this going to happen during normal runtime and is it unrecoverable? If yes, either call abort or something like it (exit, terminate) or throw an exception which you're not going to handle.
Bad data should be rejected when the data is created. It's not the job of comparison operators to weed out mistakes; that should be done before the program does any serious data manipulation. So the answer to the question is no, operator== and operator!= should not throw exceptions on bad data; they should be written with the assumption that the data that's being passed to them is valid.
You could just use a map which is implemented with a binary search tree, although you'd be duplicating an id. Then again, you wouldn't even need to store the id in the struct, since they are supposed to be unique.
#include <map>

// Example POD struct.
struct MyStruct
{
    int id; // Redundant if the map key is the id.
    char a;
};

std::map<int, MyStruct> myMap;

MyStruct m;
m.id = 1;
myMap[m.id] = m;
// Or simply...
myMap[1] = m;

Not sure when to use exceptions in C++

After many years of coding scientific software in C++, I still can't seem to get used to exceptions and I have no idea when I should use them. I know that using them to control program flow is a big no-no, but other than that... consider the following example (an excerpt from a class that represents an image mask and lets the user add areas to it as polygons):
class ImageMask
{
public:
    ImageMask() {}
    ImageMask(const Size2DI &imgSize);

    void addPolygon(const PolygonI &polygon);

protected:
    Size2DI imgSize_;
    std::vector<PolygonI> polygons_;
};
The default constructor for this class creates a useless instance, with an undefined image size. I don't want the user to be able to add polygons to such an object. But I'm not sure how to handle that situation. When the size is undefined, and addPolygon() is called, should I:
Silently return,
assert(imgSize_.valid) to detect violations in code using this class and fix them before a release,
throw an exception?
Most of the time I go with either 1) or 2) (depending on my mood), because it seems to me exceptions are costly, messy, and simply overkill for such a simple scenario. Some insight, please?
The general rule is that you throw an exception when you cannot perform the desired operation. So in your case, yes, it does make sense to throw an exception when addPolygon is called and the size is undefined or inconsistent.
Silently returning is almost always the wrong thing to do. assert is not a good error-handling technique (it is more of a design/documentation technique).
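Concretely, the throwing version of addPolygon might look like this; a minimal sketch with pared-down types, assuming Size2DI carries a validity flag as the question's assert suggests:

#include <stdexcept>
#include <vector>

struct Size2DI { int w = 0, h = 0; bool valid = false; };
struct PolygonI { /* ... */ };

class ImageMask {
    Size2DI imgSize_;
    std::vector<PolygonI> polygons_;
public:
    ImageMask() {} // leaves imgSize_ invalid
    explicit ImageMask(const Size2DI &s) : imgSize_(s) {}

    void addPolygon(const PolygonI &polygon) {
        if (!imgSize_.valid)
            throw std::logic_error("ImageMask::addPolygon: image size is undefined");
        polygons_.push_back(polygon);
    }
};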
However, in your case a redesign of the interface to make an error condition impossible or unlikely may be better. For example, something like this:
class ImageMask
{
public:
    // Constructor requires the collection of polygons and the size.
    // Neither can be changed after construction.
    ImageMask(const std::vector<PolygonI>& polygons, const Size2DI& size);
};
or like this:
class ImageMask
{
public:
    class Builder
    {
    public:
        Builder();
        void addPolygon(const PolygonI& polygon);
    };

    ImageMask(const Builder& builder);
};
// used like this
ImageMask::Builder builder;
builder.addPolygon(polyA);
builder.addPolygon(polyB);
ImageMask mask(builder);
I would try to avoid any situation where it's possible to create data in some kind of useless state. If you need a polygon that is not empty, then don't let empty polygons be created, and you save yourself a lot of trouble because the compiler will enforce that there are no empty polygons.
I never use silent returns, because they hide bugs and make finding them much more complicated than it has to be.
I use asserts when I detect that the program is in a state it can only be in if there is a bug in the software. In your example, if the constructor that takes a Size2DI checks that the size is not empty, then asserting that the stored size is not empty is useful for detecting bugs. Asserts should not have side effects, and it must be possible to remove them without changing the behavior of the software. I find them very useful for finding my own bugs and for documenting the current state of the object, function, etc.
If it's very likely that a runtime error will be handled directly by the caller of a function, I would use conventional return values. If it's likely that the error has to be communicated up several levels of the call stack, I prefer exceptions. When in doubt, I offer two functions.
kind regards
Torsten
To me, 1 is not an option. Whether it is 2 or 3 depends on the design of your program/library, and on whether you consider (and document) default-constructing an image mask and then adding polygons a valid or invalid usage of your component. This is an important design decision. I recommend reading this article by Matthew Wilson.
Note that you have more options:
Invent your own assert that always calls std::terminate and does additional logging
Disable the default constructor (as others already pointed out) -- this is my favourite
"Silently return" - that's real 'the big no-no'. The program should know what's wrong.
"assert" - the second rule is that asserts using only if normal program's flow couldn't be restored.
"throw exception" - yes, this right and good technique. Just take care about exception-safety. There are many articles about exception-safe coding on GotW.
Don't afraid exceptions. They don't bite. :) If you'll take this technique enough, you'll be a strong coder. ;)

Best practice when calling initialize functions multiple times?

This may be a subjective question, but I'm asking it more or less hoping that people share their experiences. (Experience being the biggest thing I lack in C++.)
Anyway, suppose I have, for some obscure reason, an initialize function that initializes a data structure on the heap:
void initialize() {
    initialized = true;
    pointer = new T;
}
Now, when I call the initialize function twice, a memory leak happens (right?). So I can prevent this in multiple ways:
ignore the call (just check whether I am initialized, and if I am, don't do anything)
Throw an error
automatically "cleanup" the code and then reinitialize the thing.
Now, what is generally the "best" method to keep my code manageable in the future?
EDIT: thank you for the answers so far. However, I'd like to know how people handle this in a more generic way: how do people handle "simple" errors which can be ignored (like calling the same function twice when only the first call makes sense)?
You're the only one who can truly answer the question: do you consider that the initialize function could legitimately be called twice, or would that mean your program followed an unexpected execution flow?
If the initialize function can be called multiple times : just ignore the call by testing if the allocation has already taken place.
If the initialize function has no decent reason to be called several times : I believe that would be a good candidate for an exception.
Just to be clear, I don't believe cleanup and regenerate to be a viable option (or you should seriously consider renaming the function to reflect this behavior).
This pattern is not unusual for on-demand or lazy initialization of costly data structures that might not always be needed. Singleton is one example, or for a class data member that meets those criteria.
What I would do is just skip the init code if the struct is already in place.
void initialize() {
    if (!initialized)
    {
        initialized = true;
        pointer = new T;
    }
}
If your program has multiple threads you would have to include locking to make this thread-safe.
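For the multi-threaded case, std::call_once (C++11) is one way to get that locking right; a minimal sketch, with T standing in for the real type:

#include <mutex>

struct T { /* the costly structure */ };

struct LazyHolder {
    std::once_flag once;
    T *pointer = nullptr;

    void initialize() {
        // The lambda runs exactly once, even if many threads race here.
        std::call_once(once, [this] { pointer = new T; });
    }
};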
I'd look at using boost or STL smart pointers.
I think the answer depends entirely on T (and other members of this class). If they are lightweight and there is no side-effect of re-creating a new one, then by all means cleanup and re-create (but use smart pointers). If on the other hand they are heavy (say a network connection or something like that), you should simply bypass if the boolean is set...
You should also investigate boost::optional; this way you don't need an overall flag, and for each object that should exist, you can check whether it is instantiated and then instantiate it as necessary (say, on a first pass some objects construct okay but some fail...).
The idea of setting a data member later than in the constructor is quite common, so don't worry, you're definitely not the first one with this issue.
There are two typical use cases:
On demand / Lazy instantiation: if you're not sure it will be used and it's costly to create, then better NOT to initialize it in the constructor
Caching data: to cache the result of a potentially expensive operation so that subsequent calls need not compute it once again
You are in the "Lazy" category, in which case the simpler way is to use a flag or a nullable value:
flag + value combination: reuse of existing class without heap allocation, however this requires default construction
smart pointer: this bypasses the default-construction issue, at the cost of a heap allocation. Check the copy semantics you need...
boost::optional<T>: similar to a pointer, but with deep copy semantics and no heap allocation. Requires the type to be fully defined though, so heavier on dependencies.
I would strongly recommend the boost::optional<T> idiom, or if you wish to provide dependency insulation you might fall back to a smart pointer like std::unique_ptr<T> (or boost::scoped_ptr<T> if you do not have access to a C++0x compiler).
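On a C++17 compiler the same idiom is available without Boost as std::optional; a minimal sketch (the names are illustrative):

#include <optional>

struct T { /* the lazily created value */ };

struct Holder {
    std::optional<T> value; // flag and storage in one object, no heap allocation

    void initialize() {
        if (!value)
            value.emplace(); // construct T in place; repeated calls are ignored
    }
};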
I think that this could be a scenario where the Singleton pattern could be applied.

How defensive should you be? [duplicate]

Possible Duplicate:
Defensive programming
We had a great discussion this morning about the subject of defensive programming. We had a code review where a pointer was passed in and was not checked if it was valid.
Some people felt that only a check for a null pointer was needed. I questioned whether it could be checked at a higher level rather than in every method it is passed through, and noted that checking for null is a very limited check if the object at the other end of the pointer does not meet certain requirements.
I understand and agree that a check for null is better than nothing, but it feels to me that checking only for null provides a false sense of security since it is limited in scope. If you want to ensure that the pointer is usable, check for more than the null.
What are your experiences on the subject? How do you write defenses in to your code for parameters that are passed to subordinate methods?
In Code Complete 2, in the chapter on error handling, I was introduced to the idea of barricades. In essence, a barricade is code which rigorously validates all input coming into it. Code inside the barricade can assume that any invalid input has already been dealt with, and that the inputs that are received are good. Inside the barricade, code only needs to worry about invalid data passed to it by other code within the barricade. Asserting conditions and judicious unit testing can increase your confidence in the barricaded code. In this way, you program very defensively at the barricade, but less so inside the barricade. Another way to think about it is that at the barricade, you always handle errors correctly, and inside the barricade you merely assert conditions in your debug build.
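A minimal sketch of the barricade idea (the function names are illustrative): validate rigorously at the public boundary, and merely assert inside it:

#include <cassert>
#include <stdexcept>
#include <string>

// Inside the barricade: data is assumed clean; the assert only
// documents and debug-checks that assumption.
void process_internal(const std::string &input) {
    assert(!input.empty() && "the barricade should have rejected empty input");
    // ... real work on known-good data ...
}

// The barricade itself: every input is validated and errors are
// handled properly.
void process(const std::string &input) {
    if (input.empty())
        throw std::invalid_argument("input must not be empty");
    process_internal(input);
}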
As far as using raw pointers goes, usually the best you can do is assert that the pointer is not null. If you know what is supposed to be in that memory then you could ensure that the contents are consistent in some way. This raises the question of why that memory is not wrapped up in an object which can verify its consistency itself.
So, why are you using a raw pointer in this case? Would it be better to use a reference or a smart pointer? Does the pointer point to numeric data, and if so, would it be better to wrap it up in an object which manages the lifecycle of that pointer?
Answering these questions can help you find a way to be more defensive, in that you'll end up with a design that is easier to defend.
The best way to be defensive is not to check pointers for null at runtime, but to avoid using pointers that may be null in the first place.
If the object being passed in must not be null, use a reference! Or pass it by value! Or use a smart pointer of some sort.
The best way to do defensive programming is to catch your errors at compile-time.
If it is considered an error for an object to be null or point to garbage, then you should make those things compile errors.
Ultimately, you have no way of knowing whether a pointer points to a valid object. So rather than checking for one specific corner case (null, which is far less common than the really dangerous case of pointers to invalid objects), make the error impossible by using a data type that guarantees validity.
I can't think of another mainstream language that allows you to catch as many errors at compile time as C++ does. Use that capability.
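As a tiny illustration of the point (Widget is a placeholder type), the two signatures below differ exactly in what the compiler can enforce:

struct Widget { /* ... */ };

void draw(const Widget &w);      // cannot be null: no check needed, enforced at compile time
void maybeDraw(const Widget *w); // may be null: every callee must remember to check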
There is no way to check if a pointer is valid.
In all seriousness, it depends on how many bugs you'd like to have inflicted upon you.
Checking for a null pointer is definitely something that I would consider necessary but not sufficient. There are plenty of other solid principles you can use starting with entry points of your code (e.g., input validation = does that pointer point to something useful) and exit points (e.g., you thought the pointer pointed to something useful but it happened to cause your code to throw an exception).
In short, if you assume that everyone calling your code is going to do their best to ruin your life, you'll probably find a lot of the worst culprits.
EDIT for clarity: some other answers are talking about unit tests. I firmly believe that test code is sometimes more valuable than the code it's testing (depending on who's measuring the value). That said, I also think that unit tests are necessary but not sufficient for defensive coding.
Concrete example: consider a 3rd party search method that is documented to return a collection of values that match your request. Unfortunately, what wasn't clear in the documentation for that method is that the original developer decided that it would be better to return a null rather than an empty collection if nothing matched your request.
So now you call your defensive and well unit-tested method (which is sadly lacking an internal null pointer check) thinking all is well, and boom! A NullPointerException that, without an internal check, you have no way of dealing with:
defensiveMethod(thirdPartySearch("Nothing matches me"));
// You just passed a null to your own code.
I'm a big fan of the "let it crash" school of design. (Disclaimer: I don't work on medical equipment, avionics, or nuclear power-related software.) If your program blows up, you fire up the debugger and figure out why. In contrast, if your program keeps running after illegal parameters have been detected, by the time it crashes you'll probably have no idea what went wrong.
Good code consists of many small functions/methods, and adding a dozen lines of parameter-checking to every one of those snippets of code makes it harder to read and harder to maintain. Keep it simple.
I may be a bit extreme, but I don't like Defensive Programming, I think it's laziness that has introduced the principle.
For this particular example, there is no sense in asserting that the pointer is not null. If you want a pointer that is never null, there is no better way to actually enforce it (and document it clearly at the same time) than to use a reference instead. And it's documentation that will actually be enforced by the compiler and costs zilch at runtime!!
In general, I tend not to use 'raw' types directly. Let's illustrate:
void myFunction(std::string const& foo, std::string const& bar);
What are the possible values of foo and bar ? Well that's pretty much limited only by what a std::string may contain... which is pretty vague.
On the other hand:
void myFunction(Foo const& foo, Bar const& bar);
is much better!
if people mistakenly reverse the order of the arguments, it's detected by the compiler
each class is solely responsible for checking that its value is right; the users are not burdened.
I have a tendency to favor Strong Typing. If I have an entry that should be composed only of alphabetical characters and be at most 12 characters, I'd rather create a small class wrapping a std::string, with a simple validate method used internally to check assignments, and pass that class around instead. This way I know that if I test the validation routine ONCE, I don't have to worry about all the paths through which that value can reach me: it will have been validated before it reaches me.
Of course, that doesn't mean that the code should not be tested. It's just that I favor strong encapsulation, and validation of an input is part of knowledge encapsulation, in my opinion.
And as no rule comes without an exception... an exposed interface is necessarily bloated with validation code, because you never know what might be thrown at you. However, with self-validating objects in your BOM it's quite transparent in general.
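A minimal sketch of such a self-validating wrapper for the alphabetic, at-most-12-characters example (the class name and the exception choice are illustrative):

#include <cctype>
#include <stdexcept>
#include <string>

class Name {
    std::string value_;

    static void validate(const std::string &s) {
        if (s.empty() || s.size() > 12)
            throw std::invalid_argument("Name must be 1 to 12 characters");
        for (unsigned char c : s)
            if (!std::isalpha(c))
                throw std::invalid_argument("Name must be alphabetic");
    }

public:
    explicit Name(std::string s) : value_(std::move(s)) { validate(value_); }
    const std::string &str() const { return value_; }
};

// Every function that takes a Name can now assume the invariant:
// the value was validated once, on construction.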
"Unit tests verifying the code does what it should do" > "production code trying to verify its not doing what its not supposed to do".
I wouldn't even check for null myself, unless its part of a published API.
It very much depends; is the method in question ever called by code external to your group, or is it an internal method?
For internal methods, you can test enough to make this a moot point, and if you're building code where the goal is highest possible performance, you might not want to spend the time on checking inputs you're pretty darn sure are right.
For externally visible methods - if you have any - you should always double check your inputs. Always.
From debugging point of view, it is most important that your code is fail-fast. The earlier the code fails, the easier to find the point of failure.
For internal methods, we usually stick to asserts for these kinds of checks. That does get errors picked up in unit tests (you have good test coverage, right?) or at least in integration tests that are running with assertions on.
Checking for a null pointer is only half of the story; you should also assign null to every unassigned pointer.
Most responsible APIs will do the same.
Checking for a null pointer costs very little in CPU cycles; an application crashing once it's delivered can cost you and your company money and reputation.
You can skip null pointer checks if the code is in a private interface you have complete control of, and/or you check for null in a unit test or some debug-build check (e.g. assert).
There are a few things at work here in this question which I would like to address:
Coding guidelines should specify that you either deal with a reference or a value directly instead of using pointers. By definition, pointers are value types that just hold an address in memory; the validity of a pointer is platform specific and depends on many things (range of addressable memory, platform, etc.).
If you find yourself ever needing a pointer for any reason (like for dynamically generated and polymorphic objects) consider using smart pointers. Smart pointers give you many advantages with the semantics of "normal" pointers.
If a type for instance has an "invalid" state then the type itself should provide for this. More specifically, you can implement the NullObject pattern that specifies how an "ill-defined" or "un-initialized" object behaves (maybe by throwing exceptions or by providing no-op member functions).
You can create a smart pointer that does the NullObject default that looks like this:
#include <memory>

// NullTypeDefault is assumed to derive from Type.
template <class Type, class NullTypeDefault>
struct possibly_null_ptr {
    possibly_null_ptr() : p(new NullTypeDefault) {}
    // Substitute the null-object when a null pointer is handed in.
    possibly_null_ptr(Type* p_) : p(p_ ? p_ : new NullTypeDefault) {}
    Type* operator->() { return p.get(); }
    ~possibly_null_ptr() {}
private:
    std::shared_ptr<Type> p;
    template <class T, class N>
    friend T& operator*(possibly_null_ptr<T, N>&);
};

template <class Type, class NullTypeDefault>
Type& operator*(possibly_null_ptr<Type, NullTypeDefault>& p) {
    return *p.p;
}
Then use the possibly_null_ptr<> template in cases where you support possibly null pointers to types that have a default derived "null behavior". This makes it explicit in the design that there is an acceptable behavior for "null objects", and this makes your defensive practice documented in the code -- and more concrete -- than a general guideline or practice.
Pointers should only be used if you need to do something with the pointer itself, such as pointer arithmetic to traverse a data structure, and even then, if possible, that should be encapsulated in a class.
If the pointer is passed into the function to do something with the object to which it points, then pass in a reference instead.
One method for defensive programming is to assert almost everything that you can. At the beginning of the project it is annoying but later it is a good adjunct to unit testing.
A number of answers address how to write defenses into your code, but not much was said about "how defensive should you be?". That's something you have to evaluate based on the criticality of your software components.
We're doing flight software and the impacts of a software error range from a minor annoyance to loss of aircraft/crew. We categorize different pieces of software based on their potential adverse impacts which affects coding standards, testing, etc. You need to evaluate how your software will be used and the impacts of errors and set what level of defensiveness you want (and can afford). The DO-178B standard calls this "Design Assurance Level".