OpenCV (C++) using SIFT descriptors increases the number of detected features?

I have a somewhat confusing situation with the SIFT descriptor implementation from OpenCV.
I'm trying to test various feature detector + descriptor calculation methods, so I am using a combination of cv::FeatureDetector and cv::DescriptorExtractor interfaces which allow me to simply change between different detector methods and descriptors.
When calling cv::DescriptorExtractor::compute(...) (the variant for a single image), the documentation says that it is possible for the number of key points given to the algorithm to decrease if it is impossible to calculate their descriptors, and I understand how and why that is done.
But what happens in my case is that the number of key points after the descriptor computation actually increases. It is clearly so, and I'm not trying to stop it from happening; I am just hoping for an explanation as to why (just an intuitive description would be cool, although I appreciate anything more than that).
I've got layers upon layers of wrappers around the actual OpenCV calls that don't do any real work (they just set some local non-OpenCV flags), so here's the OpenCV code that's being called at the bottom of it all:
cv::Ptr<cv::FeatureDetector> dect = cv::FeatureDetector::create("MSER");
cv::Mat input = cv::imread("someImg.ppm", 0);
std::vector<cv::KeyPoint> keypoints;
dect->detect(input, keypoints);
cv::Ptr<cv::DescriptorExtractor> deEx = cv::DescriptorExtractor::create("SIFT");
std::cout << "before computing, feats size " << keypoints.size() << std::endl;
// code to print out 10 features
cv::Mat desc;
deEx->compute(input, keypoints, desc);
std::cout << "after computing, feats size " << keypoints.size() << std::endl;
// code to print out 10 features
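For reference, the print placeholders above boil down to a loop like the following (a sketch; judging from the output below, the five fields are pt.x, pt.y, angle, octave and size):
for (size_t i = 0; i < keypoints.size() && i < 10; ++i)
    std::cout << "feat[" << i << "]: " << keypoints[i].pt.x << " "
              << keypoints[i].pt.y << " " << keypoints[i].angle << " "
              << keypoints[i].octave << " " << keypoints[i].size << std::endl;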
I've printed out the first 10 key points just before and after the descriptor calculations, so here are some concrete numbers as an example:
before computing, feats size 379
feat[0]: 10.7584 39.9262 176.526 0 12.5396
feat[1]: 48.2209 207.904 275.091 0 11.1319
feat[2]: 160.894 313.781 170.278 0 9.63786
feat[3]: 166.061 239.115 158.33 0 19.5027
feat[4]: 150.043 233.088 171.887 0 11.9569
feat[5]: 262.323 322.173 188.103 0 8.65429
feat[6]: 189.501 183.462 177.396 0 12.3069
feat[7]: 218.135 253.027 171.763 0 123.069
feat[8]: 234.508 353.236 173.281 0 11.8375
feat[9]: 234.404 394.079 176.23 0 8.99652
after computing, feats size 463
feat[0]: 10.7584 39.9262 13.1313 0 12.5396
feat[1]: 48.2209 207.904 69.0472 0 11.1319
feat[2]: 48.2209 207.904 107.438 0 11.1319
feat[3]: 160.894 313.781 9.57937 0 9.63786
feat[4]: 166.061 239.115 166.144 0 19.5027
feat[5]: 150.043 233.088 78.8696 0 11.9569
feat[6]: 262.323 322.173 167.259 0 8.65429
feat[7]: 189.501 183.462 -1.49394 0 12.3069
feat[8]: 218.135 253.027 -117.067 3 123.069
feat[9]: 218.135 253.027 7.44055 3 123.069
I can see from this example that the original feat[1] and feat[7] have each been split into two new key points, but I do not see any logical explanation for the compute method to do that :(
The printout I have given here is from using MSER to detect the keypoints and then trying to calculate SIFT descriptors, but the same increase in size also happens with STAR, SURF, and SIFT (i.e. DoG) keypoints. I didn't try changing the SIFT descriptor into something else, but if someone thinks it's relevant to the question, I'll try it and edit my question.

First of all, as you can see in the documentation, cv::DescriptorExtractor::compute takes a std::vector<cv::KeyPoint> argument which is non-const. This means that the vector can be modified by cv::DescriptorExtractor::compute.
In practice, KeyPointsFilter::runByImageBorder and KeyPointsFilter::runByKeypointSize (two non-const functions) will be applied to the vector and will remove the keypoints for which a descriptor cannot be computed. No re-extraction of keypoints will be done.
You should post the few lines of code you are using for further diagnostic.
--
Well, I finally found where the problem occurs: the cv::SiftDescriptorExtractor::compute method calls SIFT::operator(), which (re)calculates the orientation of the features and also duplicates points with several dominant orientations.
A solution could be to set descriptorParams.recalculateAngles to false.
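For example (a sketch against the 2.x API: I'm assuming the SIFT::DescriptorParams struct and the corresponding SiftDescriptorExtractor constructor, so check your version's features2d header):
cv::SIFT::DescriptorParams descriptorParams;  // default magnification/normalisation
descriptorParams.recalculateAngles = false;   // keep compute() from re-orienting/splitting
cv::Ptr<cv::DescriptorExtractor> deEx =
    new cv::SiftDescriptorExtractor(descriptorParams, cv::SIFT::CommonParams());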

Looks like it is due to OpenCV using Rob Hess's SIFT implementation, which sometimes duplicates the keypoints with more than one dominant orientation.
Looking around the reported OpenCV bugs did the trick; the issue was reported here.
It is not a bug: the behavior was not changed in newer versions but instead just documented. Since I am tied to the OpenCV version I am using right now (v2.1), it did not occur to me to look at newer documentation for additional behavior, since the behavior described in the old one made sense to me.

This is not a bug but by design:
SIFT returns multiple interest points at the same location with different orientations if there is not clearly a single dominant orientation. Usually, up to three (depending on the actual image patch) orientations are estimated.
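In rough pseudo-C++ (illustrative only, not OpenCV's actual code; hist, isLocalPeak, original and out are hypothetical names):
// A 36-bin gradient-orientation histogram is built around each keypoint;
// every local peak within 80% of the strongest peak yields its own copy of
// the keypoint, identical except for the angle (Lowe 2004).
float maxPeak = *std::max_element(hist, hist + 36);
for (int bin = 0; bin < 36; ++bin)
    if (isLocalPeak(hist, bin) && hist[bin] >= 0.8f * maxPeak) {
        cv::KeyPoint kp = original;  // same pt, size, octave as the input keypoint
        kp.angle = bin * 10.0f;      // one output keypoint per strong peak
        out.push_back(kp);
    }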

Related

How to correctly format input and resize output data while using a TensorRT engine?

I'm trying to implement a deep learning model in the TensorRT runtime. The model conversion step went fine, and I'm fairly confident about it.
The two parts I'm currently struggling with are copying data from host to device (e.g. from OpenCV to TensorRT) and getting the right output shape so I can read the right data. So my questions are:
How does the shape of the input dims relate to the memory buffer? What is the difference when the model input dims are NCHW versus NHWC? When I read an OpenCV image it's NHWC, and the model input is also NHWC; do I have to rearrange the buffer data, and if yes, what is the actual consecutive memory format I have to produce? Or simply, what format or sequence of data is the engine expecting?
About the output (assuming the input is correctly buffered): how do I get the right result shape for each task (detection, classification, etc.)? E.g. an array or something similar to what you get when working with Python.
I've read the Nvidia docs and they're not beginner-friendly at all.
// Let's say I have a model with a dynamic-shape input dim in the NHWC format.
auto input_dims = nvinfer1::Dims4{1, 386, 342, 3}; //Using fixed H, W for testing
context->setBindingDimensions(input_idx, input_dims);
auto input_size = getMemorySize(input_dims, sizeof(float));
// How do I format an OpenCV Mat to these dims, and if I encounter a new input dim format, how do I adapt to that?
The expected output dims are something like (1,32,53,8), for example; the output lands in a raw buffer behind a pointer, and I don't know the sequence of the data needed to reconstruct the expected array shape.
// Run TensorRT inference
void* bindings[] = {input_mem, output_mem};
bool status = context->enqueueV2(bindings, stream, nullptr);
if (!status)
{
std::cout << "[ERROR] TensorRT inference failed" << std::endl;
return false;
}
auto output_buffer = std::unique_ptr<int[]>{new int[output_size]}; // int[] so the array form delete[] is used
if (cudaMemcpyAsync(output_buffer.get(), output_mem, output_size, cudaMemcpyDeviceToHost, stream) != cudaSuccess)
{
std::cout << "ERROR: CUDA memory copy of output failed, size = " << output_size << " bytes" << std::endl;
return false;
}
cudaStreamSynchronize(stream);
//How do i use this output_buffer to form right shape of output, (1,32,53,8) in this case ?
Could you please edit your question and tell us which model you're using, if it's a commonly known NN? Perhaps one we can download to test locally?
That said, here's an answer, since it mostly doesn't depend on the model (even though knowing it would help):
How does the shape of the input dims relate to the memory buffer?
If the input is NxCxHxW, you need to allocate N*C*H*W*sizeof(float) memory for that on your CPU and GPU. To be more precise, you need to allocate space on GPU for all the bindings and on CPU for only input and output bindings.
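Concretely, the allocation for one such binding looks something like this (a minimal sketch, assuming float data and taking N, C, H, W from the binding dimensions; error checking omitted):
size_t count = (size_t)N * C * H * W;        // number of elements in the binding
std::vector<float> host(count);              // CPU-side staging (input/output bindings only)
void* device = nullptr;
cudaMalloc(&device, count * sizeof(float));  // GPU-side memory for the binding
cudaMemcpyAsync(device, host.data(), count * sizeof(float),
                cudaMemcpyHostToDevice, stream);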
when I read an OpenCV image, it's NHWC and the model input is also NHWC; do I have to re-arrange the buffer data?
No, you do not have to re-arrange the buffer data. If you did have to convert between NHWC and NCHW, you could check this or google 'opencv NHWC to NCHW'.
Full working code example here, especially this function.
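In case your network does turn out to want NCHW, the usual repack looks like this (a hedged sketch, assuming a CV_8UC3 cv::Mat named img; it relies on cv::split writing into preallocated plane views):
cv::Mat img32;
img.convertTo(img32, CV_32FC3);                    // to float, still HWC
const int H = img32.rows, W = img32.cols;
std::vector<float> chw(3 * H * W);                 // contiguous CHW destination buffer
cv::Mat planes[3] = {                              // one Mat view per channel plane
    cv::Mat(H, W, CV_32F, chw.data()),
    cv::Mat(H, W, CV_32F, chw.data() + H * W),
    cv::Mat(H, W, CV_32F, chw.data() + 2 * H * W)};
cv::split(img32, planes);                          // fills chw in CHW order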
Or simply, what format or sequence of data is the engine expecting?
This depends on how the neural network was trained. You should in general know exactly which kind of preprocessing and image data formats were used to train the NN. You should even use the same libraries to load and process the images, if possible. This is an open problem in ML: if you try to replicate the results of some paper and use their models, but they haven't open-sourced the preprocessing, you might get worse results. In the "worst" case you can implement both NHWC and NCHW and test which of them works.
About the output (assuming the input is correctly buffered): how do I get the right result shape for each task (detection, classification, etc.)? E.g. an array or something similar to what you get when working with Python.
This question clearly requires me to understand which NNs you are referring to. But I myself do the following:
Load the TensorRT .engine file in my code like this and deserialize like this
Print the bindings like this
Then I know the size of the input binding or bindings if there are many inputs, and the size of the output binding or bindings if there are many outputs.
This way you know the right result shape for each task. I hope this answered your question. If not, please add detailed comments and edit your post to be more precise. Thank you.
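Once you have the output binding's dims, reading the flat buffer copied back from the device is plain row-major index arithmetic; a sketch for the (1,32,53,8) example from the question:
// Element (n, i, j, k) of a row-major (1, 32, 53, 8) output buffer:
const int C = 32, H = 53, W = 8;
float value_at(const float* out, int n, int i, int j, int k) {
    return out[((n * C + i) * H + j) * W + k];
}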
I've read the Nvidia docs and they're not beginner-friendly at all.
Yes, I agree. You're better off searching for TensorRT C++ (or Python) repositories on GitHub and studying their code. Have you seen the TensorRT samples? It doesn't really take many lines of code to implement TensorRT inference.

Parse Individual Curves from General_polygon_set_2 in CGAL

To start, I want to thank everyone who has helped me so far on previous problems I have had while working through the CGAL library; it is greatly appreciated.
Background on myself: I am still very new to C++ and my coding experience is in MATLAB, so there are a lot of concepts that I am learning very quickly and that are therefore very new to me. Please excuse any erroneous language I may use with regard to C++.
The Problem:
I recently wrote some code that finds the Minkowski sum of a polyline and a circle (i.e. the buffer of a polyline) using the code found in the documentation on Boolean Set Operations on General Polygons.
Here, a General_polygon_set_2 concept is used for the output, and if the output code from the example above is used, I get the following output of a Polygon_with_holes_2 class:
48 [775.718 -206.547 --> 769.134 -157.991] (769 -157 1 1) [769.134 -157.991 --> 770 -157] (769 -157 1 1) [770 -157 --> 768.866 -156.009] [768.866 -156.009 --> 762.282 -107.453] [762.282 -107.453 --> 703.282 -115.453] [703.282 -115.453 --> 708.072 -150.778] ...
7 15 [549.239 -193.612 --> 569.403 -216.422] ... 3 [456.756 -657.812 --> 657.930 908.153] ...
Here, if I understand correctly, the first integer refers to the number of vertices in the .outer_boundary(), followed by descriptions of the curves for each "edge" of the general polygon. In my problem the outputs will only consist of line segments and circular arcs.
Linear: [775.718 -206.547 --> 769.134 -157.991]
Circular Arc (x-monotone): (769 -157 1 1) [769.134 -157.991 --> 770 -157]
The linear element is simple: go from one x-y coordinate to the other along a line. The circular arc is a little different: it says to use the circle described by the arguments in the parentheses () to go from one x-y coordinate to the other, contained in the brackets []. The arguments to the circle are (x, y, radius, orientation).
Next, since we have holes, after the .outer_boundary() has been written out, more integers are displayed. The first one states the number of holes, the second states the number of vertices in the first hole, followed by the curves for that hole. Once that hole is written out, another integer gives the number of vertices in the next hole, and this continues for all of the holes, completing the description of the polygon.
So with that, my current problem is parsing out each individual curve one at a time so that I can do operations on them.
I have the following functions from the documentation to work with:
.outer_boundary(): returns the general polygon that represents the outer boundary.
.holes_begin(): returns the begin iterator of the holes.
.holes_end(): returns the past-the-end iterator of the holes.
So my thought is to break the General_polygon_set_2 down into General_polygon_2s, then break those down into the .outer_boundary() and the different holes. Finally, for each set of curves, break those down into individual curves.
I am not really sure how to go about this, I just know that I need individual curve data so I can do my own operations on them. Any help, will be, as always, greatly appreciated!
Note: I actually deleted this post after reading through the arrangements documentation, thinking that this was too obvious a question, but after some time I still really do not see how to pull this info out properly. I think the biggest issue is my lacking knowledge of C++. Sorry about this being a noob-ish question.
Solution in Progress:
list<Polygon_with_holes_2> res;
S.polygons_with_holes (back_inserter (res));
list<Polygon_with_holes_2>::iterator i = res.begin();
Polygon_with_holes_2 mink = *i;
minkOuter = mink.outer_boundary();
cout << minkOuter << endl;
int numHoles = std::distance(mink.holes_begin(), mink.holes_end());
cout << numHoles << endl;
Now I am working on isolating the holes, followed by breaking those down into each individual curve.
The doc here states that the value_type of a Hole_const_iterator is a General_polygon_2, which means that you can iterate through all the holes using holes_begin() and holes_end(), like you thought. To do that, use the following syntax:
for(auto h_it = mink.holes_begin(); h_it != mink.holes_end(); ++h_it)
{
// Here h_it is an iterator with value type General_polygon_2, so *h_it is the polygon describing a hole. Each step of this loop gives you another hole.
}
Then, you can iterate the curves of each polygon with curves_begin() and curves_end() the same way.
So to iterate each curve of a polygon_with_holes:
for(auto h_it = mink.holes_begin(); h_it != mink.holes_end(); ++h_it)
{
for(auto curve_it = h_it->curves_begin(); curve_it != h_it->curves_end(); ++curve_it)
{
//*curve_it gives you a curve.
}
}
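To then tell the two kinds of curves apart, something like the following should work (a sketch assuming the Gps_circle_segment_traits_2 traits from the Minkowski sum example, whose X_monotone_curve_2 exposes is_linear() and is_circular(); check your traits class for the exact interface):
for (auto c_it = minkOuter.curves_begin(); c_it != minkOuter.curves_end(); ++c_it)
{
    if (c_it->is_linear())         // straight edge: [source --> target]
        std::cout << "segment " << c_it->source() << " -> " << c_it->target() << "\n";
    else if (c_it->is_circular())  // arc: the (x, y, radius, orientation) circle
        std::cout << "arc on " << c_it->supporting_circle() << "\n";
}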

Declaring variables in Python 2.7x to avoid issues later

I am new to Python, coming from MATLAB, and long ago from C. I have written a script in MATLAB which simulates sediment transport in rivers as a Markov process. The code randomly places circles of random diameter within a rectangular area of specified dimension. The circles are non-uniform in size, drawn randomly from a specified range of sizes. I do not know how many times I will step through the circle placement operation, so I use a while loop to complete the process. In an attempt to be more community oriented, I am translating the MATLAB script to Python. I used the online tool OMPC to get started, and have been working through it manually from the auto-translated version (it was not that helpful, which is not surprising). To debug the code as I go, I use the MATLAB-generated results to compare and contrast against the results in Python. It seems clear to me that I have declared variables in a way that introduces problems as calculations proceed in the script. Here are two examples of consistent problems between different instances of code execution. First, the code generated what I think are arrays within arrays, because the script is returning results which look like:
array([[ True]
[False]], dtype=bool)
This result was generated for the following code snippet at the overlap_logix operation:
CenterCoord_Array = np.asarray(CenterCoordinates)
Diameter_Array = np.asarray(Diameter)
dist_check = ((CenterCoord_Array[:,0] - x_Center) ** 2 + (CenterCoord_Array[:,1] - y_Center) ** 2) ** 0.5
radius_check = (Diameter_Array / 2) + radius
radius_check_update = np.reshape(radius_check,(len(radius_check),1))
radius_overlap = (radius_check_update >= dist_check)
# Now actually check the overlap condition.
if np.sum([radius_overlap]) == 0:
# The new circle does not overlap so proceed.
newCircle_Found = 1
debug_value = 2
elif np.sum([radius_overlap]) == 1:
# The new circle overlaps with one other circle
overlap = np.arange(0,len(radius_overlap), dtype=int)
overlap_update = np.reshape(overlap,(len(overlap),1))
overlap_logix = (radius_overlap == 1)
idx_true = overlap_update[overlap_logix]
radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
A similar result for the same run was produced for variables:
radius_check_update
radius_overlap
overlap_update
Here is the same code snippet for the working MATLAB version (as requested):
distcheck = ((Circles.CenterCoordinates(1,:)-x_Center).^2 + (Circles.CenterCoordinates(2,:)-y_Center).^2).^0.5;
radius_check = (Circles.Diameter ./ 2) + radius;
radius_overlap = (radius_check >= distcheck);
% Now actually check the overlap condition.
if sum(radius_overlap) == 0
% The new circle does not overlap so proceed.
newCircle_Found = 1;
debug_value = 2;
elseif sum(radius_overlap) == 1
% The new circle overlaps with one other circle
temp = 1:size(radius_overlap,2);
idx_true = temp(radius_overlap == 1);
radius = distcheck(1,idx_true) - (Circles.Diameter(1,idx_true)/2);
In the Python version I have created arrays from lists to more easily operate on the contents (the first two lines of the code snippet). The array-within-array result, and having to create arrays just to access the data, suggest to me that I have incorrectly declared variable types, but I am not sure. Furthermore, some variables have a size of, for example, (2L,) (the numerical dimension changes as circles are placed) where there is no second dimension. This produces obvious problems when I try to use such an array in an operation with another array of size (2L,1L). Because of these problems I started reshaping arrays, and then I stopped, because I decided these reshapes were hacks and that I had declared one or more variables incorrectly. Second, for the same run I encountered the following error:
TypeError: 'numpy.ndarray' object is not callable
for the operation:
radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
which occurs at the bottom of the above code snippet. I have posted the entire script at the following link because it is probably more useful to execute the script for oneself:
https://github.com/smchartrand/MarkovProcess_Bedload
I have set up the code to run with some initial parameter values so decisions do not need to be made; these parameter values produce the expected results in the MATLAB-based script, which look something like this when plotted: [plot image not reproduced here]
So I seem to be having issues specifically with the operations on lines 151-165, depending on the test value np.sum([radius_overlap]), and I think it is because I incorrectly declared variable types, but I am really not sure. I can say with confidence that the Python version and the MATLAB version are consistent in output through the first step of the while loop, up to code line 127, which enters the second step of the while loop. Below this point in the code, the issues documented above eventually cause the script to crash. Sometimes the script executes to 15% complete, and sometimes it does not make it to 5%; this is due to the random nature of circle placement. I am preparing the code in the Spyder (Python 2.7) IDE and will share the working code publicly as part of my research. I would greatly appreciate any help that can be offered to identify my mistakes and misapplications of Python coding practice.
I believe I have answered my own question, and maybe it will be of use to someone down the road. The main sources of instruction for me can be found at the following three web pages:
Stackoverflow Question 176011
SciPy FAQ
SciPy NumPy for Matlab users
The third web page was very helpful for me coming from MATLAB. Here is the modified and working python code snippet which relates to the original snippet provided above:
dist_check = ((CenterCoordinates[0,:] - x_Center) ** 2 + (CenterCoordinates[1,:] - y_Center) ** 2) ** 0.5
radius_check = (Diameter / 2) + radius
radius_overlap = (radius_check >= dist_check)
# Now actually check the overlap condition.
if np.sum([radius_overlap]) == 0:
# The new circle does not overlap so proceed.
newCircle_Found = 1
debug_value = 2
elif np.sum([radius_overlap]) == 1:
# The new circle overlaps with one other circle
overlap = np.arange(0,len(radius_overlap[0]), dtype=int).reshape(1, len(radius_overlap[0]))
overlap_logix = (radius_overlap == 1)
idx_true = overlap[overlap_logix]
radius = dist_check[idx_true] - (Diameter[0,idx_true] / 2)
In the end it was clear to me that it was more straightforward for this example to use numpy arrays vs. lists to store results for each iteration of filling the rectangular area. For the corrected code snippet this means I initialized the variables:
CenterCoordinates, and
Diameter
as numpy arrays whereas I initialized them as lists in the posted question. This made a few mathematical operations more straightforward. I was also incorrectly indexing into variables with parentheses () as opposed to the correct method using brackets []. Here is an example of a correction I made which helped the code execute as envisioned:
Incorrect: radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
Correct: radius = dist_check[idx_true] - (Diameter[0,idx_true] / 2)
This example also shows that I had issues with array dimensions which I corrected variable by variable. I am still not sure if my working code is the most pythonic or most efficient way to fill a rectangular area in a random fashion, but I have tested it about 100 times with success. The revised and working code can be downloaded here:
Working Python Script to Randomly Fill Rectangular Area with Circles
Here is an image of the final result for a successful run of the working code: [image not reproduced here]
The main lessons for me were (1) numpy arrays are more efficient for repetitive numerical calculations, and (2) the dimensionality of the arrays I created was not always what I expected it to be, and care must be taken when establishing arrays. Thanks to those who looked at my question and asked for clarification.

C++: OpenCV2.4.11(!) access to webcam parameters

This is a direct follow-up to the last question I asked, which was aptly named "C++: OpenCV2.3.1(!) access to webcam parameters" and where I was told to install OpenCV2.4.11 instead (OpenCV3.0 did not work)... which I did. And yes, most of this text is an exact copy&paste of the last thread, since my problem hasn't actually vanished...
Again, I've searched here, on other forums (Google, OpenCV etc), looked at the code of the videoInput library, the different header files and especially OpenCV's highgui_c.h and still seem to be unable to find an answer to this very simple question:
How do I change exposure and gain (or, to be general, any webcam property) in my Logitech C310 webcam with OpenCV2.4.11 the same way I was able to with OpenCV2.1.0? (using Win7 64-bit, Visual Studio 10)
EDIT: This has been solved. I do not know how, but when I tested my code this morning, it was able to report and set the exposure using VideoCapture and the set/get methods.
There's the nice and easy VideoCapture get and set method, I know, similar to the videoInput's [Set/Get]VideoSetting[Camera/Filter] functions. Here's my short example in OpenCV2.4.11 that doesn't work:
EDIT: It does work now. What I don't understand is why the values of several properties are reported as -8.58993E+008 (namely hue, monochrome, gamma, temperature, zoom, focus, pan, tilt, roll and iris) and why property 6 (fourcc) is -4.66163E+008. I know I don't have these features on my webcam, but all other unimplemented features report -1.
int __stdcall WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, char* CmdArgs, int CmdShow) {
int device0 = 0;
VideoCapture VC(device0);
if(!VC.isOpened()) // check if we succeeded
return -1;
ostringstream oss;
double CamProp;
for(int i=-4; i<27; i++) {
CamProp = VC.get(i);
Sleep(5);
oss << "Item " << i << ": " << CamProp << "\n";
}
MessageBox(NULL, oss.str().c_str(), "Webcam Values", MB_OK);
return 0;
}
It compiles, it runs, it accesses the webcam alright (and even shows a picture with imshow if I add it to the code) but it only opens a nice window saying this:
Item -4: 0
Item -3: 0
Item -2: 0
...
Item 2: 0
Item 3: 640
Item 4: 480
Item 5: 0
...
Item 25: 0
Item 26: 0
EDIT: See above, this works now. I get values for all supported parameters like exposure, gain, sharpness, brightness, contrast and so on. Perhaps I was still linking to the 2.3.1 libraries or whatever.
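For the record, the now-working get/set calls look like this (a sketch using the OpenCV 2.x property constants; which values are legal is driver-dependent, which is exactly the Min/Max/Step information I complain about below):
double expo = VC.get(CV_CAP_PROP_EXPOSURE);
double gain = VC.get(CV_CAP_PROP_GAIN);
VC.set(CV_CAP_PROP_EXPOSURE, expo - 1.0); // one step darker; whether this value
VC.set(CV_CAP_PROP_GAIN, gain + 1.0);     // is actually allowed is not queryable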
The point is: this was all perfectly settable with this camera under OpenCV 2.1.0 using videoInput. I had a running application doing its own lighting control instead of using the Logitech functions (RightLight, Auto Exposure, Auto White Balance). Setting and getting parameters has since been integrated into OpenCV's highgui, but with a strongly reduced feature list (no querying of parameter ranges - Min/Max/Step - and no setting of auto exposure, RightLight and similar), and for some reason it's incompatible with my Logitech webcam: I can report the resolution but nothing else.
EDIT: I still miss the Min, Max, Step, Auto/Manual features of videoInput. I can set a value but I don't know whether it's allowed.
The videoInput code is now merged into OpenCV in the file cap_dshow.cpp, but I can't find a header file that declares the videoInput class, and simply using my old code doesn't work. So I have a cpp file which contains all the functions I need, and which I know did the job for me a while back, but which I can't access now. Any clues on how to do that? Has anyone accessed and changed camera parameters in OpenCV2.4.11 using the videoInput/DirectShow interface?
EDIT: This now seems to work, unlike in 2.3.1; no direct interaction with videoInput appears to be needed. However, it would still be nice to have it, for the aforementioned reasons.
There's also the funny problem that using e.g.
VideoCapture cam(0)
addresses exactly the same camera as
VideoCapture cam(1)
or
VideoCapture cam(any integer value)
which seems odd to me and hints in the same direction - that CV's VideoCapture does not work properly for me. A similar problem is described here but I also tried the code with a Sleep(1000) after opening the capture - without success.
EDIT: This is also working correctly now. I get my webcam with (0) and an error with (1), which is absolutely OK.

my c++ extension behaves differently with faulthandler

Background
I have a C++ extension which runs a 3D watershed pass over a buffer. It's got a nice Cython wrapper to initialise a massive buffer of signed chars to represent the voxels. I initialise some native data structures in Python (in a compiled Cython file) and then call one C++ function to initialise the buffer, and another to actually run the algorithm (I could have written these in Cython too, but I'd like it to work as a C++ library as well, without a python.h dependency).
Weirdness
I'm in the process of debugging my code, trying different image sizes to gauge RAM usage and speed, etc., and I've noticed something very strange about the results: they change depending on whether I run python test.py (specifically /usr/bin/python on Mac OS X 10.7.5/Lion, which is Python 2.7) or start python, run import test, and call a function on it. (Indeed, on my laptop (OS X 10.6.latest, with MacPorts Python 2.7) the results also differ deterministically; each platform/situation is different, but each one is always the same as itself.) In all cases, the same function is called, loads some input data from a file, and runs the C++ module.
A note on 64-bit Python: I am not using distutils to compile this code, but something akin to my answer here (i.e. with an explicit -arch x86_64 flag). This shouldn't mean anything, and all my processes in Activity Monitor are listed as Intel (64-bit).
As you may know, the point of watershed is to find objects in the pixel soup - in 2D it's often used on photos. Here, I'm using it to find lumps in 3D in much the same way - I start with some lumps ("grains") in the image and I want to find the inverse lumps ("cells") in the space between them.
The way the results change is that I literally find a different number of lumps. For exactly the same input data:
python test.py:
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 1242 cells from 1434 original grains!
...
however,
python, import test, test.run():
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 927 cells from 1434 original grains!
...
This is the same in the interactive python shell and bpython, which I originally thought was to blame.
Note the "average value" number is exactly the same - this indicates that the same fraction of voxels have initially been marked as in the problem space - i.e. that my input file was initialised in (very very probably) exactly the same way both times in voxel-space.
Also note that no part of the algorithm is non-deterministic; there are no random numbers or approximations; subject to floating point error (which should be the same each time) we should be performing exactly the same computations on exactly the same numbers both times. Watershed runs using a big buffer of integers (here signed chars) and the results are counting clusters of those integers, all of which is implemented in one big C++ call.
I have tested the __file__ attribute of the relevant module objects (which are themselves attributes of the imported test), and they're pointing to the same installed watershed.so in my system's site-packages.
Questions
I don't even know where to begin debugging this:
- How is it possible to call the same function with the same input data and get different results?
- What about interactive python might cause this (e.g. by changing the way the data is initialised)?
- Which parts of the (rather large) codebase are relevant to these questions?
In my experience it's much more useful to post ALL the code in a Stack Overflow question, and not to assume you know where the problem is. However, there are thousands of lines of code here, and I have literally no idea where to start! I'm happy to post small snippets on request.
I'm also happy to hear debugging strategies - interpreter state that I can check, details about the way python might affect an imported C++ binary, and so on.
Here's the structure of the code:
project/
clibs/
custom_types/
adjacency.cpp (and hpp) << graph adjacency (2nd pass; irrelevant = irr)
*array.c (and h) << dynamic array of void*s
*bit_vector.c (and h) << int* as bitfield with helper functions
polyhedron.cpp (and hpp) << for voxel initialisation; convex hull result
smallest_ints.cpp (and hpp) << for voxel entity affiliation tracking (irr)
custom_types.cpp (and hpp) << wraps all files in custom_types/
delaunay.cpp (and hpp) << marshals calls to stripack.f90
*stripack.f90 (and h) << for computing the convex hulls of grains
tensors/
*D3Vector.cpp (and hpp) << 3D double vector impl with operators
watershed.cpp (and hpp) << main algorithm entry points (ini, +two passes)
pywat/
__init__.py
watershed.pyx << cython class, python entry points.
geometric_graph.py << python code for post processing (irr)
setup.py << compile and install directives
test/
test.py << entry point for testing installed lib
(files marked * have been used extensively in other projects and are very well tested, those suffixed irr contain code only run after the problem has been caused.)
Details
as requested, the main stanza in test/test.py:
testfile = 'valid_filename'
if __name__ == "__main__":
# handles segfaults...
import faulthandler
faulthandler.enable()
run(testfile)
and my interactive invocation looks like:
import test
test.run(test.testfile)
Clues
when I run this at the straight interpreter:
import faulthandler
faulthandler.enable()
import test
test.run(test.testfile)
I get the results from the file invocation (i.e. 1242 cells), although when I run it in bpython, it just crashes.
This is clearly the source of the problem - hats off to Ignacio Vazquez-Abrams for asking the right question straight away.
UPDATE:
I've opened a bug on the faulthandler github and I'm working towards a solution. If I find something that people can learn from I'll post it as an answer.
After debugging this application extensively (printf()ing out all the data at multiple points during the run, piping outputs to log files, diffing the log files) I found what seemed to cause the strange behaviour.
I was using uninitialised memory in a couple of places, and (for some bizarre reason) this gave me repeatable behaviour differences between the two cases I describe above - one without faulthandler and one with.
Incidentally, this is also why the bug disappeared from one machine but continued to manifest itself on another, part way through debugging (which really should have given me a clue!)
My mistake here was to assume things about the problem based on a spurious correlation: in theory, the garbage RAM should have been differently random each time I accessed it (ah, theory). In this case I would have found the problem more quickly with a printout of the main calculation function and a rubber duck.
So, as usual, the answer is that the bug is not in the library; it is somewhere in your code. In this case, it was my fault for malloc()ing a chunk of RAM and falsely assuming that other parts of my code were going to initialise it (which they only did sometimes).
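For anyone hitting the same class of bug: the fix boils down to zero-initialising at allocation time instead of relying on later code to do it (a minimal sketch; n stands in for whatever your buffer size is):
signed char* a = (signed char*)malloc(n);     // contents indeterminate: reading is UB
signed char* b = (signed char*)calloc(n, 1);  // zero-filled
signed char* c = new signed char[n]();        // value-initialised (C++)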