I ran into a nasty bug in some of my code. Here's the simplified version:
#include <iostream>
#include <string>
class A
{
public:
std::string s;
void run(const std::string& x)
{
// do some "read-only" stuff with "x"
std::cout << "x = " << x << std::endl;
// Since I passed x as a const reference, I expected the string never to change
// but actually, it does get changed by clear() function
clear();
// trying to do something else with "x",
// but now it has a different value although I declared it as
// "const". This killed the code logic.
std::cout << "x = " << x << std::endl;
// is there some way to detect possible change of X here during compile-time?
}
void clear()
{
// in my actual code, this doesn't even happen here, but 3 levels deep on some other code that gets called
s.clear();
}
};
int main()
{
A a;
a.s = "test";
a.run(a.s);
return 0;
}
Basically, the code that calls a.run() used to be used for all kinds of strings in the past. At one point I needed the exact value that "a.s" had, so I just passed a.s in, and some time later I noticed the program behaving weirdly. I tracked it down to this.
Now, I understand why this is happening, but it looks like one of those bugs that are really hard to trace and detect. You see the parameter declared as const & and suddenly its value changes.
Is there some way to detect this during compile-time? I'm using Clang and MSVC.
Thanks.
Is there some way to detect this during compile-time?
I don't think so. There is nothing inherently wrong with modifying a member variable that is also referred to by a const reference, so there is no reason for the compiler to warn about it. The compiler cannot read your mind to find out what your expectations are.
There are some cases where such a wrong assumption leads to a definite bug, such as undefined behaviour, which could be diagnosed if it were identified. I suspect that identifying such cases in general would be quite expensive computationally, so I wouldn't rely on it.
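Where no compile-time diagnostic is available, a runtime assertion can at least make the aliasing visible during testing. The following is only a minimal sketch built on the class from the question; the `aliases_member` helper and the `assert` inside `run` are illustrative additions, not part of the original code:

```cpp
#include <cassert>
#include <iostream>
#include <string>

class A {
public:
    std::string s;

    // Hypothetical helper: true when x is the very object stored in s.
    bool aliases_member(const std::string& x) const { return &x == &s; }

    void run(const std::string& x) {
        // Fails in debug builds if the caller passed this->s itself.
        assert(!aliases_member(x) && "x must not refer to this->s");
        std::cout << "x = " << x << '\n';
        clear();
        std::cout << "x = " << x << '\n';  // x is guaranteed unchanged here
    }

    void clear() { s.clear(); }
};
```

This only catches the case where the argument is exactly the member object, and the check disappears when `NDEBUG` is defined, so it is a testing aid rather than a guarantee.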
Redesigning the interface could make that situation impossible. For example:
struct wrapper {
std::string str;
};
void run(const wrapper& x);
x.str will not alias the member because the member is not inside a wrapper.
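A minimal sketch of how the wrapper idea might look when applied to the class from the question. Making `run` return the string and adding the `assert` are assumptions made here so the effect can be observed; they are not part of the original code:

```cpp
#include <cassert>
#include <iostream>
#include <string>

struct wrapper {
    std::string str;
};

class A {
public:
    std::string s;

    // Returns the value seen after clear(), to show that it is unchanged.
    std::string run(const wrapper& x) {
        // Cannot fire: the member s does not live inside a wrapper object.
        assert(&x.str != &s);
        std::cout << "x = " << x.str << '\n';
        clear();
        std::cout << "x = " << x.str << '\n';
        return x.str;
    }

    void clear() { s.clear(); }
};
```

A caller would write `a.run(wrapper{a.s})`, which copies `a.s` into a temporary wrapper. The cost is one string copy per call in exchange for making the alias impossible to express.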
Here's an (admittedly brain-dead) refactoring algorithm I've performed on several occasions:
Start with a .cpp file that compiles cleanly and (AFAICT) works correctly.
Read through the file, and wherever there is a local/stack-variable declared without the const keyword, prepend the const keyword to its declaration.
Compile the .cpp file again
If any fresh compile-time errors are reported, examine the relevant lines of code to determine why -- if it turns out the local-variable legitimately does need to be non-const, remove the const keyword from it; otherwise fix whatever underlying issue the const keyword's addition has revealed.
Goto (3) until the .cpp file again compiles cleanly
Setting aside for the moment whether or not it's a good idea to "const all the local variables", is there any risk of this practice introducing a run-time/logic error into the program that wouldn't be caught at compile-time? AFAICT this seems "safe" in that it won't introduce regressions, only compile-time errors which I can then fix right away; but C++ is a many-splendored thing so perhaps there is some risk I haven't thought of.
If you're willing to accept a contrived example, you could enter the world of undefined behavior.
#include <iostream>
void increment(int & num)
{
++num;
}
int main()
{
int n = 99;
increment(const_cast<int&>(n));
std::cout << n;
}
The above compiles and outputs 100. The below compiles and is allowed to do whatever it wants (but happened to output 99 for me). Modifying a const object through a non-const access path results in undefined behavior.
#include <iostream>
void increment(int & num)
{
++num;
}
int main()
{
const int n = 99;
increment(const_cast<int&>(n));
std::cout << n;
}
Yes, this is contrived because why would someone do a const_cast on a non-const object? On the other hand, this is a simple example. Maybe in more complex code this might actually come up. Shrug I won't claim that this is a big risk, but it does fall under "any risk", as stated in the question.
Adding const to a variable can cause a different overload to be called:
https://godbolt.org/z/aE3jzPjP3
#include <stdexcept>
void func(int &) {
}
void func(const int &) {
throw std::runtime_error("");
}
int main() {
int var = 0;
func(var);
}
Changing var to const will cause the program to call a different function and thereby throw an exception.
It can easily trigger completely different behaviour:
#include <map>
#include <iostream>
class Map {
public:
std::map<int, int> data;
int func() {
return data[5];
}
int func() const {
const auto iter = data.find(5);
return iter == data.end() ? -1 : iter->second;
}
};
int main() {
Map m; // try with const
std::cout << m.func() << std::endl;
std::cout << m.data.size() << std::endl;
}
const has no runtime implications that the compiler will not report at compile time by failing the build.
For a strictly internal class that is not intended to be used as part of an API provided to an external client, is there anything inherently evil with initializing a class pointer member variable to itself rather than NULL or nullptr?
Please see the below code for an example.
#include <iostream>
class Foo
{
public:
Foo() :
m_link(this)
{
}
Foo* getLink()
{
return m_link;
}
void setLink(Foo& rhs)
{
m_link = &rhs;
// Do other things too.
// Obviously, the name shouldn't be setLink() if the real code is doing multiple things,
// but this is a code sample.
}
void changeState()
{
// This is a code sample, but play along and assume there are actual states to change.
std::cout << "Changing a state." << std::endl;
}
private:
Foo* m_link;
};
void doSomething(Foo& foo)
{
Foo* link = foo.getLink();
if (link == &foo)
{
std::cout << "A is not linked to anything." << std::endl;
}
else
{
std::cout << "A is linked to something else. Need to change the state on the link." << std::endl;
link->changeState();
}
}
int main(int argc, char** argv)
{
Foo a;
doSomething(a);
std::cout << "-------------------" << std::endl;
// This is a mere code sample.
// In the real code, I'm fetching B from a container.
Foo b;
a.setLink(b);
doSomething(a);
return 0;
}
Output
A is not linked to anything.
-------------------
A is linked to something else. Need to change the state on the link.
Changing a state.
Pros
The benefit of initializing the pointer member, Foo::m_link, to the object itself is that it avoids accidental NULL dereferences. Since the pointer can never be NULL, at worst the program will produce erroneous output rather than a segmentation fault.
Cons
However, the clear downside to this strategy is that it appears to be unconventional. Most programmers are used to checking for NULL, and thus don't expect to check for equality with the object invoking the pointer. As such, this technique would be ill-advised to use in a codebase that is targeted for external consumers, that is, developers expecting to use this codebase as a library.
Final Remarks
Any thoughts from anyone else? Has anyone else said anything substantial on this subject, especially with C++98 in consideration? Note that I compiled this code with a GCC compiler with these flags: -std=c++98 -Wall and did not notice any issues.
P.S. Please feel free to edit this post to improve any terminology I used here.
Edits
This question is asked in the spirit of other good practice questions, such as this question about deleting references.
A more extensive code example has been provided to clear up confusion. To be specific, the sample is now 63 lines which is an increase from the initial 30 lines. Thus, the variable names have been changed and therefore comments referencing Foo:p should apply to Foo:link.
It's a bad idea to start with, but a horrendous idea as a solution to null dereferences.
You don't hide null dereferences. Ever. Null dereferences are bugs, not errors. When a bug happens, all the invariants in your program go down the toilet and there can be no guarantee of any behaviour. Not allowing a bug to manifest itself immediately doesn't make the program correct in any sense; it only obfuscates the problem and makes debugging significantly more difficult.
That aside, a structure pointing into itself is a gnarly can of worms. Consider your copy assignment
Foo& operator=(const Foo& rhs) {
if(this == &rhs)
return *this;
if(rhs.m_link == &rhs)
m_link = this; // rhs was "unlinked", so this object must point to itself
else
m_link = rhs.m_link;
return *this;
}
You now have to check whether you're pointing to yourself every time you copy because its value is possibly tied to its own identity.
As it turns out, there's plenty of cases where such checks are required. How is swap supposed to be implemented?
void swap(Foo& x, Foo& y) noexcept {
Foo* tx, *ty;
if(x.m_link == &x)
tx = &y;
else
tx = x.m_link;
if(y.m_link == &y)
ty = &x;
else
ty = y.m_link;
x.m_link = ty;
y.m_link = tx;
}
Suppose Foo has some sort of pointer/reference semantics, then your equality is now also non-trivial
bool operator==(const Foo& rhs) const {
return m_link == rhs.m_link || (m_link == this && rhs.m_link == &rhs);
}
Don't point into yourself. Just don't.
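For comparison, here is a minimal sketch of the conventional alternative, in which a null pointer means "not linked". The `hasLink`, `link`, `changes` and `m_changes` names are hypothetical and invented for this illustration. It uses `nullptr` and so needs C++11; under `-std=c++98` the same idea works with `NULL`.

```cpp
#include <cassert>

class Foo {
public:
    Foo() : m_link(nullptr), m_changes(0) {}

    bool hasLink() const { return m_link != nullptr; }

    // Precondition: hasLink(). A violation is a bug and surfaces at once.
    Foo& link() {
        assert(m_link != nullptr && "link() called on an unlinked Foo");
        return *m_link;
    }

    void setLink(Foo& rhs) { m_link = &rhs; }
    void changeState() { ++m_changes; }
    int changes() const { return m_changes; }

private:
    Foo* m_link;     // nullptr means "not linked"
    int  m_changes;  // hypothetical state: counts changeState() calls
};

// Returns true when a link existed and its state was changed.
bool doSomething(Foo& foo) {
    if (!foo.hasLink()) return false;
    foo.link().changeState();
    return true;
}
```

Because the "unlinked" state no longer depends on the object's own address, the compiler-generated copy constructor and copy assignment do the right thing without any identity checks.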
Foo is responsible for its own state. Especially pointers it exposes to its users.
If you expose a pointer in this fashion, as a public member, it is a very odd design decision. My gut has told me the last 30 odd years a pointer like this is not a responsible way to handle Foo's state.
Consider providing getters for this pointer instead.
Foo* getP() {
// create a safe pointer for user
// and indicate an error state. (exceptions might be an alternative)
}
Unless you share more context what Foo is, advice is hard to provide.
is there anything inherently evil with initializing a class pointer member variable to itself rather than NULL or nullptr?
No. But as you pointed out, there might be different considerations depending on the use case.
I'm not sure this would be relevant under most circumstances, but there are some instances where an object needs to hold a pointer of its own type, so it's really only pertinent to those cases.
For instance, an element in a singly-linked list will have a pointer to the next element, so the last element in the list would normally have a NULL pointer to show there are no further elements. So using this example, the end element could instead point to itself instead of NULL to denote it is the last element. It really just depends on personal implementation preference.
Many times, you can end up obfuscating code needlessly when trying too hard to make it crash-proof. Depending on the situation, you might mask issues and make problems much harder to debug. For instance, going back to the singly-linked example, if the pointer-to-self initialization method is used, and a bug in the program attempts to access the next element from the end element in the list, the list will return the end element again. This would most likely cause the program to continue "traversing" the list for eternity. That might be harder to find/understand than simply letting the program crash and finding the culprit via debugging tools.
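The two termination conventions can be sketched as follows. The `Node` type and both function names are assumptions made for this illustration:

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    int   value;
    Node* next;
};

// Convention 1: the last node's next is nullptr.
std::size_t length_null_terminated(const Node* head) {
    std::size_t n = 0;
    for (const Node* p = head; p != nullptr; p = p->next) ++n;
    return n;
}

// Convention 2: the last node's next points at the node itself.
// Requires a non-empty list.
std::size_t length_self_terminated(const Node* head) {
    assert(head != nullptr);
    std::size_t n = 1;
    for (const Node* p = head; p->next != p; p = p->next) ++n;
    return n;
}
```

Running the first loop on a list built with the second convention never terminates, because `p` stays on the last node forever. That is the "traversing for eternity" failure described above, whereas running past the end of a null-terminated list typically crashes right away.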
I'm currently reading about mixin classes and I think I understand everything more or less. The only thing I don't understand is why I don't need virtual functions anymore. (See here and here)
E.g. greatwolf writes in his answer here that virtual functions are not needed. Here is the example: (I just copied the essential parts)
struct Number
{
typedef int value_type;
int n;
void set(int v) { n = v; }
int get() const { return n; }
};
template <typename BASE, typename T = typename BASE::value_type>
struct Undoable : public BASE
{
typedef T value_type;
T before;
void set(T v) { before = BASE::get(); BASE::set(v); }
void undo() { BASE::set(before); }
};
typedef Undoable<Number> UndoableNumber;
int main()
{
UndoableNumber mynum;
mynum.set(42); mynum.set(84);
cout << mynum.get() << '\n'; // 84
mynum.undo();
cout << mynum.get() << '\n'; // 42
}
But what happens now if I do something like this:
void foo(Number *n)
{
n->set(84); //Which function is called here?
}
int main()
{
UndoableNumber mynum;
mynum.set(42);
foo(&mynum);
mynum.undo();
cout << mynum.get() << '\n'; // 42 ???
}
What value does mynum have and why? Does the polymorphism work in foo()?!?
n->set(84); //Which function is called here?
Number::set will be called here.
Does the polymorphism work in foo()?!?
No, not without virtual. If you try the code, you'll get an unspecified value because before is never set.
I compiled your code in VS 2013, and it gives an unspecified number.
You got no constructor in your struct, which means that the variable before is not initialized.
Your code example invokes undefined behaviour, because you try to read from the int variable n while it is not in a valid state. The question is not what value will be printed. Your program is not required to print anything, or do anything that makes sense, although you are likely using a machine on which the undefined behaviour will only present itself as a seemingly random value in n, or on which it will mostly appear as 0.
Your compiler likely gives you an important hint if you allow it to detect such problems, for example:
34:21: warning: 'mynum.Number::n' is used uninitialized in this function [-Wuninitialized]
However, the undefined behaviour starts even before that. Here's how it happens, step by step:
UndoableNumber mynum;
This also creates the Number sub-object with an uninitialised n. That n is of type int and can thus have its individual bits set to a so-called trap representation.
mynum.set(42);
This calls the derived-class set function. Inside of set, an attempt is made to set the before member variable to the uninitialised n value with the possible trap representation:
void set(T v) { before = BASE::get(); BASE::set(v); }
But you cannot safely do that. The before = BASE::get() part is already wrong, because BASE::get() copies the int with the possible trap representation. This is already undefined behaviour.
Which means that from this point on, C++ as a programming language no longer defines what will happen. Reasoning about the rest of your program is moot.
Still, let's assume for a moment that the copy would be fine. What else would happen afterwards?
BASE::set is called, setting n to a valid value. before remains in its previous invalid state.
Now foo is called:
void foo(Number *n)
{
n->set(84); //Which function is called here?
}
The base-class set is called because n is of type Number* and set is non-virtual.
set happily sets the n member variable to 84. The derived-class before remains invalid.
Now the undo function is called and does the following:
BASE::set(before);
After this assignment, n is no longer 84 but is set to the invalid before value.
And finally...
cout << mynum.get() << '\n';
get returns the invalid value. You try to print it. This will yield unspecified results even on a machine which does not have trap representation for ints (you are very likely using such a machine).
Conclusion:
C++ as a language does not define what your program does. It may print something, print nothing, crash or do whatever it feels like, all because you copy an uninitialised int.
In practice, crashing or doing whatever it feels like is unlikely on a typical end-user machine, but it's still undefined what will be printed.
If you want your derived-class set to be called when invoked on a Number*, then you must make set a virtual function in Number.
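A minimal sketch of the question's example with both issues addressed: the members are initialised and `set` is virtual. The default member initialisers and `override` need C++11. The virtual destructor, the null check in `foo`, and the `run_example` function (which mirrors the question's second `main`) are additions made for this illustration:

```cpp
#include <cassert>

struct Number {
    typedef int value_type;
    int n = 0;                          // initialised, so get() is always defined
    virtual ~Number() {}
    virtual void set(int v) { n = v; }  // virtual: dispatches on the dynamic type
    int get() const { return n; }
};

template <typename BASE, typename T = typename BASE::value_type>
struct Undoable : public BASE {
    typedef T value_type;
    T before = T();                     // initialised as well
    void set(T v) override { before = BASE::get(); BASE::set(v); }
    void undo() { BASE::set(before); }
};

typedef Undoable<Number> UndoableNumber;

void foo(Number* n) {
    assert(n != nullptr);
    n->set(84);  // now calls Undoable<Number>::set for an UndoableNumber
}

// Mirrors the question's second main(); returns the final value.
int run_example() {
    UndoableNumber mynum;
    mynum.set(42);
    foo(&mynum);
    mynum.undo();
    return mynum.get();
}
```

With these changes `run_example()` returns 42: the call inside `foo` records 42 in `before` before storing 84, so `undo()` restores 42.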
Inside this following thread routine :
void* Nibbler::moveRoutine(void* attr)
{
[...]
Nibbler* game = static_cast<Nibbler*>(attr);
while (game->_continue == true)
{
std::cout << game->_snake->_body.front()->getX() << std::endl; // display 0
std::cout << game->getDirection() << std::endl; // display 0
game->moveSnake();
std::cout << game->_snake->_body.front()->getX() << std::endl; // display 0
std::cout << game->getDirection() << std::endl; // display 42
}
}
[...]
}
I am calling the member function moveSnake(), which is supposed to modify the positions of the cells forming my snake's body.
void Nibbler::moveSnake()
{
[...]
std::cout << this->_snake->_body.front()->getX() << std::endl; // display 0
this->_snake->_body.front()->setX(3);
this->_direction = 42;
std::cout << this->_snake->_body.front()->getX() << std::endl; // display 3
[...]
}
Although my coordinates are indeed modified inside moveSnake(), the change is lost once I am back in my routine, where they still have their initial values. I don't understand why this is happening, since if I modify any other member of my class inside moveSnake(), the instance is modified and keeps the new value back in the routine.
The Nibbler class :
class Nibbler
{
public :
[...]
void moveSnake();
static void* moveRoutine(void*);
private :
[...]
int _direction;
Snake* _snake;
IGraphLib* _lib;
pthread_t _moveThread;
...
};
The snake :
class Snake
{
public :
[...]
std::vector<Cell*> _body;
};
And finally the cell :
class Cell
{
public :
void setX(const int&);
void setY(const int&);
int getX() const;
int getY() const;
Cell(const int&, const int&);
~Cell();
private :
int _x;
int _y;
};
The cell.cpp code :
void Cell::setX(const int& x)
{
this->_x = x;
}
void Cell::setY(const int& y)
{
this->_y = y;
}
int Cell::getX() const
{
return this->_x;
}
int Cell::getY() const
{
return this->_y;
}
Cell::Cell(const int& x, const int& y)
{
this->_x = x;
this->_y = y;
}
Cell::~Cell()
{}
On its face, your question ("why does this member not get modified when it should?") seems reasonable. The design intent of what has been shown is clear enough and I think it matches what you have described. However, other elements of your program have conspired to make it not so.
One thing that may plague you is Undefined Behavior. Believe it or not, even the most experienced C++ developers run afoul of UB occasionally. Also, stack and heap corruption are extremely easy ways to cause terribly difficult-to-isolate problems. You have several things to turn to in order to root it out:
Debuggers (START HERE!)
With a simple single-step debugger, you can walk through your code and check your assumptions at every turn. Set a breakpoint, execute until you reach it, check the state of memory/variables, bisect the problem space again, iterate.
Static analysis
Starting with compiler warnings and moving up to lint and sophisticated commercial tools, static analysis can help point out "code smell" that may not necessarily be UB, but could be dead code or other places where your code likely doesn't do what you think it does.
Have you ignored the errors returned by the library/OS you're making calls into? In your case, it seems as if you're manipulating the memory directly, but this is a frequent source of mismatch between expectations and reality.
Do you have a rubber duck handy?
Dynamic analysis
Tools like Electric Fence, Purify, Valgrind (memcheck, helgrind), AddressSanitizer, ThreadSanitizer and mudflap can help identify places where you've written to memory outside of what's been allocated.
If you haven't used a debugger yet, that's your first step. If you've never used one before, now is the time when you must take a brief timeout and learn how. If you plan on making it beyond this level, you will be thankful that you did.
If you're developing on Windows, there's a good chance you're using Visual Studio. The debugger is likely well-integrated into your IDE. Fire it up!
If you are developing on Linux/BSD/OS X, you have access to either gdb or Xcode, both of which should be simple enough for this problem. Read a tutorial, watch a video, do whatever it takes and get that debugger rolling. You may quickly discover that your code has been modifying one instance of Snake and printing out the contents of another (or something similarly frustrating).
If you can't duplicate the problem condition when you use a debugger, CONGRATULATIONS! You have found a heisenbug. It likely indicates a race condition, and that information alone will help you home in on the source of the problem.
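To illustrate the "modifying one instance and printing another" failure mode, here is a purely hypothetical sketch. It is not a diagnosis of the code in the question, which stores `Cell*` and may behave differently. The `bodyCopy` and `bodyRef` accessors and both `move` functions are invented for this example:

```cpp
#include <cassert>
#include <vector>

struct Cell { int x; };

struct Snake {
    std::vector<Cell> body;
    std::vector<Cell>  bodyCopy() { return body; }  // returns a copy
    std::vector<Cell>& bodyRef()  { return body; }  // returns the real vector
};

// Writes into a temporary copy: the snake itself is left untouched.
void moveViaCopy(Snake& s) {
    assert(!s.body.empty());
    s.bodyCopy().front().x = 3;
}

// Writes into the snake's own storage.
void moveViaRef(Snake& s) {
    assert(!s.body.empty());
    s.bodyRef().front().x = 3;
}
```

After `moveViaCopy`, the caller still sees the old coordinate, which matches the symptom described in the question. Comparing object addresses in the debugger at both print sites is a quick way to rule this kind of cause in or out.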
Suppose I am using a library which implements the function foo, and my code could look something like this:
#include <iostream>
void foo(const int &) { }
int main() {
int x = 1;
foo(x);
std::cout << (1/x) << std::endl;
}
Everything works fine. But now suppose at one point either foo gets modified or overloaded for some reason. Now what we get could be something like this:
void foo(int & x) {
x--;
}
void foo(const int &) {}
int main() {
int x = 1;
foo(x);
std::cout << (1/x) << std::endl;
}
BAM. Suddenly the program breaks. This is because what we actually wanted to pass in that snippet was a constant reference, but with the API change suddenly the compiler selects the version we don't want and the program breaks unexpectedly.
What we wanted was actually this:
int main() {
int x = 1;
foo(static_cast<const int &>(x));
std::cout << (1/x) << std::endl;
}
With this fix, the program starts working again. However, I must say I've not seen many of these casts around in code, as everybody seems to simply trust this type of error not to happen. In addition, this seems needlessly verbose, and if there's more than one parameter and names start to become longer, function calls get really messy.
Is this a reasonable concern and how should I go about it?
If you change a function that takes a const reference so that the parameter is no longer const, you are likely to break things. This means you have to inspect EVERY place where that function is called, and ensure that it is safe. Further, having two functions with the same name, one with const and one without, is definitely a bad plan in this sort of scenario.
The correct thing to do is to create a new function, which does the x-- variant, with a different name from the existing one.
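A minimal sketch of that suggestion: the mutating variant gets its own name, so existing calls to `foo` keep their meaning. The names `foo_decrement`, `bar` and `demo` are hypothetical. For cases where both overloads must share one name anyway, the sketch also shows `std::as_const` from `<utility>` (C++17), a shorter spelling of the `static_cast<const int &>` used in the question.

```cpp
#include <cassert>
#include <utility>  // std::as_const (C++17)

// The existing read-only function keeps its name and signature.
void foo(const int&) {}

// Hypothetical new name for the mutating variant.
void foo_decrement(int& x) { --x; }

// An overload pair like the one in the question, under another name.
void bar(int& x) { --x; }
void bar(const int&) {}

int demo() {
    int x = 1;
    foo(x);                 // cannot modify x
    bar(std::as_const(x));  // selects the const overload; x is still 1
    assert(x != 0);
    return 1 / x;
}
```

Before C++17, `static_cast<const int&>(x)` or a local `const int& cx = x;` serves the same purpose as `std::as_const`.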
Any API supplier that does something like this should be severely and physically punished, possibly with slightly less violence involved if there is a BIG notice in the documentation saying "We have changed function foo, it now decrements x unless the parameter is cast to const". It's one of the worst possible binary breaks one can imagine (in terms of "it'll be terribly hard to find out what went wrong").