accessing element of boost sparse_matrix seems to stall program - c++

I've got a strange bug that I'm hoping a more experience programmer might have some insight into. I'm using the boost ublas sparse matrices, specifically mapped_matrix, and there is an intermittent bug that occurs eventually, but not in the initial phases of the program. This is a large program, so I cannot post all the code but the core idea is that I call a function which belongs to a particular class:
bool MyClass::get_cell(unsigned int i, unsigned int j) const
{
return c(i,j);
}
The variable c is defined as a member of the class
boost::numeric::ublas::mapped_matrix<bool> c;
When the bug occurs, the program seems to stop (but does not crash). Debugging with Eclipse, I can see that the program enters the boost mapped_matrix code and continues several levels down into std::map, std::_Rb_tree, and std::less. Also, the program occasionally traces down to std::map, std::_Rb_tree, and std::_Select1st. While code is executing and the active line what's in memory changes in _Rb_tree, execution never seems to return in the level of std::map. The line in std::map the program is stuck on is the return of the following function.
const_iterator
find(const key_type& __x) const
{ return _M_t.find(__x); }
It seems to me that there is some element in the c matrix that the program is looking for but somehow the underlying storage mechanism has "misplaced it". However, I'm not sure why or how to fix it. That could also be completely off base.
Any help you can provide would be greatly appreciated. If I have not included the right information in this question, please let me know what I'm missing. Thank you.

Some things to try to debug the code (not necessarily permanent changes):
Change the bool to an int in the matrix type for c, to see if the matrix expects numeric types.
Change the matrix type to another with a similar interface, possibly plain old matrix.
Valgrind the app (if you're on linux) to check you're not corrupting memory.
If that fails, you could try calling get_cell every time you modify the matrix to see what might be causing the problem.
Failing that, you may have to try reduce the problem to a much smaller subset of code which you can post here.
It might help if you tell us what compiler and OS you're using.

Is this part of a multithreaded program?
I ask, because usually when I see problems in STL, it ends up being a problem with unsynchronized access.

Related

C++ What could cause a line of code in one place to change behavior in an unrelated function?

Consider this mock-up of my situation.
in an external header:
class ThirdPartyObject
{
...
}
my code: (spread among a few headers and source files)
class ThirdPartyObjectWrapper
{
private:
ThirdPartyObject myObject;
}
class Owner
{
public:
Owner() {}
void initialize();
private:
ThirdPartyObjectWrapper myWrappedObject;
};
void Owner::initialize()
{
//not weird:
//ThirdPartyObjectWrapper testWrappedObject;
//weird:
//ThirdPartyObject testObject;
}
ThirdPartyObject is, naturally, an object defined by a third party (static precompiled) library I'm using. ThirdPartyObjectWrapper is a convenience class that eliminates a lot of boiler-plating for working with ThirdPartyObject. Owner::initialize() is called shortly after an instance of Owner is created.
Notice the two lines I have labeled as "weird" and "not weird" in Owner::initialize(). All I'm doing here is creating a couple of objects on the stack with their default constructors. I don't do anything with those objects and they get destroyed when they leave scope. There are no build or linker errors involved, I can uncomment either or both lines and the code will build.
However, if I uncomment "weird" then I get a segmentation fault, and (here's why I say it's weird) it's in a completely unrelated location. Not in the constructor of testObject, like you might expect, but in the constructor of Owner::myObjectWrapper::myObject. The weird line never even gets called, but somehow its presence or absence consistently changes the behavior of an unrelated function in a static library.
And consider that if I only uncomment "not weird" then it runs fine, executing the ThirdPartyObject constructor twice with no problems.
I've been working with C++ for a year so it's not really a surprise to me that something like this would be able happen, but I've about reached the limit of my ability to figure out how this gotcha is happening. I need the input of people with significantly more C++ experience than me.
What are some possibilities that could cause this to happen? What might be going on here?
Also, note, I'm not asking for advice on how to get rid of the segfault. Segfaults I understand, I suspect it's a simple race condition. What I don't understand is the behavior gotcha so that's the only thing I'm trying to get answers for.
My best lead is that it has to do with headers and macros. The third party library actually already has a couple of gotchas having to do with its headers and macros, for example the code won't build if you put your #include's in the wrong order. I'm not changing any #include's so strictly this still wouldn't make sense, but perhaps the compiler is optimizing includes based on the presence of a symbol here? (it would be the only mention of ThirdPartyObject in the file)
It also occurs to me that because I am using Qt, it could be that the Meta-Object Compiler (which generates supplementary code between compilations) might be involved in this. Very unlikely, as Qt has no knowledge of the third party library where the segfault is happening and this is not actually relevant to the functionality of the MOC (since at no point ThirdPartyObject is being passed as an argument), but it's worth investigating at least.
Related questions have suggested that it could be a relatively small buffer overflow or race condition that gets tripped up by compiler optimizations. Continuing to investigate but all leads are welcome.
Typical culprits:
Some build products are stale and not binary-compatible.
You have a memory bug that has corrupted the state of your process, and are seeing a manifestation of that in a completely unrelated location.
Fixing #1 is trivial: delete the build folder and build again. If you're not building in a shadow build folder, you've set yourself up for failure, hopefully you now know enough to stop :)
Fixing #2 is not trivial. View manual memory management and possible buffer overflows with suspicion. Use modern C++ programming techniques to leverage the compiler to help you out: store things by value, use containers, use smart pointers, and use iterators and range-for instead of pointers. Don't use C-style arrays. Abhor C-style APIs of the (Type * array, int count) kind - they don't belong in C++.
What fun. I've boiled this down to the bottom.
//#include <otherthirdpartyheader.h>
#include <thirdpartyobject.h>
int main(...)
{
ThirdPartyObject test;
return 0;
}
This code runs. If I uncomment the first include, delete all build artifacts, and build again, then it breaks. There's obviously a header/macro component, and probably some kind of compiler-optimization component. But, get this, according to the library documentation it should give me a segfault every time because I haven't been doing a required initialization step. So the fact that it runs at all indicates unexpected behavior.
I'm chalking this up to library-specific issues rather than broad spectrum C++ issues. I'll be contacting the vendor going forward from here, but thanks everyone for the help.

C++ Segmentation fault in binary_function

I'm using Visual Studio 2010 Beta 2 (also tried with NetBeans), and I'm having a segmentation fault in the following code:
// One of the #link s20_3_3_comparisons comparison functors#endlink.
template <class _Tp>
struct less : public binary_function<_Tp, _Tp, bool>
{
bool
operator()(const _Tp& __x, const _Tp& __y) const
{ return __x < __y; } //this is the problem line
};
I don't know what in my program calls it, but I am trying to find out. (I think it's a map) Does anyone know what to do, or has encountered this before?
There's nothing wrong with this code; the problem lies in your code that's calling it. (In fact, since this is part of the STL, it's extremely unlikely that there's a problem with this code.) It's probably getting passed an invalid reference due to deallocated memory, a NULL pointer, or similar.
less is used, by default, for std::map, std::set, and likely some other containers that I'm not thinking of right now, so you can check if you have any containers such as those that are being left with invalid values.
(Really, though, the easiest approach is to do as James McNellis said - run it in a debugger and look at your stack trace.)
I had the same problem yesterday.
This is tried and tested code so there is a very low probability this is causing the crash.
Usually there are three possibilities for this crash:
another problem is causing a corruption of the data that this function is called on
another problem is causing a corruption of your stack causing this function to be called when it shouldn't be
any combination of the two possibilities above.
To diagnose this, run your code in the VS debugger. When your application crashes, look at the parameter values and check that the stack trace shown in the debugger is the same as the stack trace you should see (click on each entry in the stack trace and look and see the code is calling what it should).

Remove never-run call to templated function, get allocation error on run-time

I have a piece of templated code that is never run, but is compiled. When I remove it, another part of my program breaks.
First off, I'm a bit at a loss as to how to ask this question. So I'm going to try throwing lots of information at the problem.
Ok, so, I went to completely redesign my test project for my experimental core library thingy. I use a lot of template shenanigans in the library. When I removed the "user" code, the tests gave me a memory allocation error. After quite a bit of experimenting, I narrowed it down to this bit of code (out of a couple hundred lines):
void VOODOO(components::switchBoard &board) {
board.addComponent<using_allegro::keyInputs<'w'> >();
}
Fundementally, what's weirding me out is that it appears that the act of compiling this function (and the template function it then uses, and the template functions those then use...), makes this bug not appear. This code is not being run. Similar code (the same, but for different key vals) occurs elsewhere, but is within Boost TDD code.
I realize I certainly haven't given enough information for you to solve it for me; I tried, but it more-or-less spirals into most of the code base. I think I'm most looking for "here's what the problem could be", "here's where to look", etc. There's something that's happening during compile because of this line, but I don't know enough about that step to begin looking.
Sooo, how can a (presumably) compilied, but never actually run, bit of templated code, when removed, cause another part of code to fail?
Error:
Unhandled exceptionat 0x6fe731ea (msvcr90d.dll) in Switchboard.exe:
0xC0000005: Access violation reading location 0xcdcdcdc1.
Callstack:
operator delete(void * pUser Data)
allocator< class name related to key inputs callbacks >::deallocate
vector< same class >::_Insert_n(...)
vector< " " >::insert(...)
vector<" ">::push_back(...)
It looks like maybe the vector isn't valid, because _MyFirst and similar data members are showing values of 0xcdcdcdcd in the debugger. But the vector is a member variable...
Update: The vector isn't valid because it's never made. I'm getting a channel ID value stomp, which is making me treat one type of channel as another.
Update:
Searching through with the debugger again, it appears that my method for giving each "channel" it's own, unique ID isn't giving me a unique ID:
inline static const char channel<template args>::idFunction() {
return reinterpret_cast<char>(&channel<CHANNEL_IDENTIFY>::idFunction);
};
Update2: These two are giving the same:
slaveChannel<switchboard, ALLEGRO_BITMAP*, entityInfo<ALLEGRO_BITMAP*>
slaveChannel<key<c>, char, push<char>
Sooo, having another compiled channel type changing things makes sense, because it shifts around the values of the idFunctions? But why are there two idFunctions with the same value?
you seem to be returning address of the function as a character? that looks weird. char has much smaller bit count than pointer, so it's highly possible you get same values. that could reason why changing code layout fixes/breaks your program
As a general answer (though aaa's comment alludes to this): When something like this affects whether a bug occurs, it's either because (a) you're wrong and it is being run, or (b) the way that the inclusion of that code happens to affect your code, data, and memory layout in the compiled program causes a heisenbug to change from visible to hidden.
The latter generally occurs when something involves undefined behavior. Sometimes a bogus pointer value will cause you to stomp on a bit of your code (which might or might not be important depending on the code layout), or sometimes a bogus write will stomp on a value in your data stack that might or might not be a pointer that's used later, or so forth.
As a simple example, supposing you have a stack that looks like:
float data[10];
int never_used;
int *important pointer;
And then you erroneously write
data[10] = 0;
Then, assuming that stack got allocated in linear order, you'll stomp on never_used, and the bug will be harmless. However, if you remove never_used (or change something so the compiler knows it can remove it for you -- maybe you remove a never-called function call that would use it), then it will stomp on important_pointer instead, and you'll now get a segfault when you dereference it.

Visual Studio 2005 C compiler problem when optimizing a switch statement

General Question which may be of interest to others:
I ran into a, what I believe, C++-compiler optimization (Visual Studio 2005) problem with a switch statement. What I'd want to know is if there is any way to satisfy my curiosity and find out what the compiler is trying to but failing to do. Is there any log I can spend some time (probably too much time) deciphering?
My specific problem for those curious enough to continue reading - I'd like to hear your thoughts on why I get problems in this specific case.
I've got a tiny program with about 500 lines of code containing a switch statement. Some of its cases contain some assignment of pointers.
double *ptx, *pty, *ptz;
double **ppt = new double*[3];
//some code initializing etc ptx, pty and ptz
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
The middle statement seems to hang the compiler. The compilation never ends. OK, I didn't wait for longer than it took to walk down the hall, talk to some people, get a cup of coffee and return to my desk, but this is a tiny program which usually compiles in less than a second. Remove a single line (the one indicated in the code above) and the problem goes away, as it also does when removing the optimization (on the whole program or using #pragma on the function).
Why does this middle line cause a problem? The compilers optimizer doesn't like pty.
There is no difference in the vectors ptx, pty, and ptz in the program. Everything I do to pty I do to ptx and ptz. I tried swapping their positions in ppt, but pty was still the line causing a problem.
I'm asking about this because I'm curious about what is happening. The code is rewritten and is working fine.
Edit:
Almost two weeks later, I check out the closest version to the code I described above and I can't edit it back to make it crash. This is really annoying, embarrassing and irritating. I'll give it another try, but if I don't get it breaking anytime soon I guess this part of the question is obsolete and I'll remove it. Really sorry for taking your time.
If you need to make this code compilable without changing it too much consider using memcpy where you assign a value to ppt[1]. This should at least compile fine.
However, you problem seems more like another part of the source code causes this behaviour.
What you can also try is to put this stuff:
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
in another function.
This should also help compiler a bit to avoid the path it is taking to compile your code.
Did you try renaming pty to something else (i.e. pt_y)? I encountered a couple of times (i.e. with a variable "rect2") the problem that some names seem to be "reserved".
It sounds like a compiler bug. Have you tried re-ordering the lines? e.g.,
ppt[1]=pty;
ppt[0]=ptx;
ppt[2]=ptz;
Also what happens if you juggle about the values that are assigned (which will introduce bugs in your code, but may indicator whether its the pointer or the array that's the issue), e.g.:
ppt[0] = pty;
ppt[1] = ptz;
ppt[2] = ptx;
(or similar).
It's probably due to your declaration of ptx, pty and ptz with them being optimised out to use the same address. Then this action is causing your compiler problems later in your code.
Try
static double *ptx;
static double *pty;
static double *ptz;

VC++ 6.0 vector access violation crash. Known bug?

I'm trying to use a std::vector<>::const_iterator and I get an 'access violation' crash. It looks like the std::vector code is crashing when it uses its own internal First_ and Last_ pointers. Presumably this is a known bug. I'm hoping someone can point me to the correct workaround. It's probably relevant that the crashing function is called from an external library?
const Thing const* AClass::findThing (const std::string& label) const
{
//ThingList_.begin() blows up at run time. Compiles fine.
for (std::vector<Thing*>::const_iterator it = ThingList_.begin(); it != ThingList_.end(); ++it) {
//Irrelevant.
}
return 0;
}
Simply calling ThingList_.size() also crashes.
This is sp6, if it matters.
If you're passing C++ objects across external library boundaries, you must ensure that all libraries are using the same runtime library (in particular, the same heap allocator). In practice, this means that all libraries must be linked to the DLL version of MSVCRT.
It's almost certainly a bug in your code and not std::vector. This code is used by way too many projects to have such an easy to repro bug.
What's likely happening is that the ThnigList_ variable has been corrupted in some way. Was the underlying array accessed directly and/or modified?
I agree with Jared that it is probably in your code,
never the less, you should be sure your stl libs are up to date.
The dinkumware site has the patched files you need.
You should update just to be safe