Chi-Squared Probability Function in C++

The following code of mine computes a confidence interval using the chi-squared distribution's quantile function from Boost.
I am trying to implement this functionality myself to avoid a dependency on Boost. Is there any resource where I can find such an implementation?
#include <boost/math/distributions/chi_squared.hpp>
#include <boost/cstdint.hpp>
#include <vector>

using namespace std;
using boost::math::chi_squared;
using boost::math::quantile;

vector<double> ConfidenceInterval(double x) {
    vector<double> ConfInts;
    // x is an estimated value for which
    // we want to derive the confidence interval.
    chi_squared distl(2);
    chi_squared distu((x + 1) * 2);
    double alpha = 0.90;
    double lower_limit = 0;
    if (x != 0) {
        chi_squared distl(x * 2);
        lower_limit = (quantile(distl, (1 - alpha) / 2)) / 2;
    }
    double upper_limit = (quantile(distu, 1 - ((1 - alpha) / 2))) / 2;
    ConfInts.push_back(lower_limit);
    ConfInts.push_back(upper_limit);
    return ConfInts;
}

If you're looking for source code you can copy/paste, here are some links:
AlgLib
Koders
YMMV...

Have a look at the GNU Scientific Library (GSL). Or look in Numerical Recipes. There's also a Java version in Apache Commons Math, which should be straightforward to translate.
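For reference, the quantity Boost computes here is the inverse of the regularized lower incomplete gamma function: if X ~ chi-squared(k), then CDF(x) = P(k/2, x/2), so quantile(q) = 2 * Pinv(k/2, q). That inverse is what GSL's gsl_cdf_chisq_Pinv and the Numerical Recipes incomplete-gamma routines give you. If all you need is a rough, dependency-free stand-in, a sketch along the following lines works (the function names are mine, the constants are the Abramowitz & Stegun 26.2.23 normal-quantile approximation, and accuracy is only a few decimal places, degrading for very small degrees of freedom or extreme tails):

#include <cmath>

// Approximate standard normal quantile (lower tail), via the
// Abramowitz & Stegun 26.2.23 rational approximation (|error| < 4.5e-4).
static double normal_quantile(double q) {
    double p = (q < 0.5) ? q : 1.0 - q;          // work in the lower tail
    double t = std::sqrt(-2.0 * std::log(p));
    double z = t - (2.515517 + 0.802853 * t + 0.010328 * t * t) /
                   (1.0 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t * t * t);
    return (q < 0.5) ? -z : z;
}

// Approximate chi-squared quantile with k degrees of freedom, using the
// Wilson-Hilferty transformation: (X/k)^(1/3) is roughly N(1 - 2/(9k), 2/(9k)).
double chi_squared_quantile(double q, double k) {
    double z = normal_quantile(q);
    double a = 2.0 / (9.0 * k);
    double c = 1.0 - a + z * std::sqrt(a);
    return k * c * c * c;
}

For production use you would instead invert P(k/2, x/2) numerically (bisection or Newton on the incomplete gamma), which is essentially what Boost, GSL and the Numerical Recipes code do.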

I am trying to implement this function as to avoid dependency to Boost.
Another option is to reduce the Boost dependency rather than avoid it entirely. If you reduce the dependency, you might be able to use a Boost folder with, say, 200 or 300 source files rather than the entire 1+ GB of material. (Yes, 200 or 300 can be accurate - it's what I ended up with when copying out shared_ptr.)
To reduce the dependency, use bcp (Boost copy) to copy out just the files needed for chi_squared.hpp. The drawback is that you usually have to build bcp from source because it's not distributed in the ZIPs or tarballs.
For instructions on building bcp, see How to checkout latest stable Boost (not development or bleeding edge)?
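As a rough illustration (the destination directory name here is arbitrary), the bcp invocation for this case would look something like:

bcp boost/math/distributions/chi_squared.hpp ./boost-subset

bcp then copies that header plus everything it transitively depends on into ./boost-subset, which you add to your include path.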

Related

How to access C++ libraries in Octave coding

I have the task of making some scripts compatible with Octave (or creating counterparts that run in Octave).
I am trying to handle this Matlab part:
%ft = fittype( 'a+b*y*x+c*y^2', 'independent', {'x', 'y'}, 'dependent', 'z' );
%
%opts = fitoptions( 'Method', 'NonlinearLeastSquares' );
%opts.Display = 'Off';
%opts.Lower = [0 0 0];
%
%opts.MaxFunEvals = 10000;
%opts.MaxIter = 10000;
%opts.Robust = 'Bisquare';
%opts.StartPoint = [0.12345 0.456789 0.23456];
%[fitresult, gof] = fit( [xData, yData], zData, ft, opts );
%coeffs=coeffvalues(fitresult);
There's obviously no direct equivalent to this toolbox from Matlab in Octave.
I can do linear and nonlinear regression, but not robust. So I looked around for alternative ways.
I found that GSL can calculate fits using the same biweight (bisquare) robust algorithm.
https://www.gnu.org/software/gsl/doc/html/lls.html?highlight=robust#c.gsl_multifit_robust_alloc
I think there are ways to call GSL libraries from Octave (the Forge package gsl does not include the functions I need).
So my questions:
a) Working on Windows, can I make Octave use those libraries? (I know it can call C++ functions.) And can I include those libraries to give people with "just" Octave a "package" to make it all work together?
b) Any leads on how to go this way?
In the documentation it is only described how to make .oct files (C++ code) that use Octave and C++ libraries. But I'd like to stay within Octave code (and .m files).
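For reference, the .oct route the documentation describes would look roughly like the sketch below. Everything here is illustrative: the function name robust_fit, the argument order, and the design-matrix columns (1, x.*y, y.^2, taken from the fittype above) are my assumptions; the GSL calls are the ones from the linked manual chapter.

// robust_fit.cc - hypothetical oct-file wrapping GSL's bisquare robust fit
#include <octave/oct.h>
#include <gsl/gsl_multifit.h>

DEFUN_DLD (robust_fit, args, ,
           "c = robust_fit (x, y, z)\nBisquare robust fit of z = a + b*x.*y + c*y.^2")
{
  ColumnVector x = args(0).column_vector_value ();
  ColumnVector y = args(1).column_vector_value ();
  ColumnVector z = args(2).column_vector_value ();

  const size_t n = x.numel ();
  const size_t p = 3;                              // number of coefficients

  gsl_matrix *X   = gsl_matrix_alloc (n, p);
  gsl_vector *zv  = gsl_vector_alloc (n);
  gsl_vector *c   = gsl_vector_alloc (p);
  gsl_matrix *cov = gsl_matrix_alloc (p, p);

  for (size_t i = 0; i < n; i++)
    {
      gsl_matrix_set (X, i, 0, 1.0);               // a
      gsl_matrix_set (X, i, 1, x(i) * y(i));       // b * x * y
      gsl_matrix_set (X, i, 2, y(i) * y(i));       // c * y^2
      gsl_vector_set (zv, i, z(i));
    }

  // Bisquare (Tukey biweight) robust regression, as in the linked GSL docs.
  gsl_multifit_robust_workspace *w =
    gsl_multifit_robust_alloc (gsl_multifit_robust_bisquare, n, p);
  gsl_multifit_robust (X, zv, c, cov, w);
  gsl_multifit_robust_free (w);

  ColumnVector coeffs (3);
  for (int i = 0; i < 3; i++)
    coeffs(i) = gsl_vector_get (c, i);

  gsl_matrix_free (X);
  gsl_vector_free (zv);
  gsl_vector_free (c);
  gsl_matrix_free (cov);

  return octave_value (coeffs);
}

Built with something like mkoctfile robust_fit.cc -lgsl -lgslcblas, it can then be called from plain .m code as coeffs = robust_fit(x, y, z), so the rest of the script stays in Octave.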

Preloading for Importr in Django

Is there a way to preload libraries for the R instance that rpy2 talks to? I am spending 25-30% of my response time (about .5s per chart) in importr calls to lattice or grdevices, and would like to cut down if possible.
Code snippet:
# imports assumed by this snippet
from uuid import uuid4
from django.core.files import File
from rpy2 import robjects
from rpy2.robjects.packages import importr

grdevices = importr('grDevices')
importr('lattice')

imagefile = File(open('1d_%s.png' % str(uuid4()), 'w'))
grdevices.png(file=imagefile.name, type='cairo', width=400, height=350)
rcmd = """
print(
    xyplot(yvec~xvec, labels=labels, type=c('p','r'),
           ylab='%s', xlab='%s'
    )
)""" % (y_lab, x_lab)
robjects.r(rcmd)
grdevices.dev_off()
imagefile.close()
If I do not invoke importr("lattice"), robjects.r freaks out at the "xyplot(..." call I make later. Can I use R_PROFILE or R_ENVIRON_USER to speed up the lattice and grDevices calls?
importr is a pretty high-level function, trading performance for ease of use. It does a lot besides just loading an R package: it also maps all R objects in that package to Python (rpy2) objects. That effort is wasted when you call importr('lattice') in your script and never use the result.
Beside that, importing packages in R itself is not free (for the larger R packages with S4 class definitions, this can be noticeable when the script is short). rpy2 can't do much about this.
Using R variables such as R_PROFILE is possible, but this was not enabled by default until very recently. How to enable it is described on SO (here).
Now, here importr is taking "only" 25% of the response time, so optimization efforts focused on it cannot make the whole thing more than 25% faster (and that's a very optimistic limit). Interpolating data into a string and evaluating it as R code is also not optimal (as warned about in the rpy2 documentation). Consider calling the R function through rpy2, passing the data as anything exporting the buffer interface (for example).

my c++ extension behaves differently with faulthandler

Background
I have a C++ extension which runs a 3D watershed pass over a buffer. It's got a nice Cython wrapper to initialise a massive buffer of signed chars to represent the voxels. I initialise some native data structures in Python (in a compiled Cython file) and then call one C++ function to initialise the buffer, and another to actually run the algorithm (I could have written these in Cython too, but I'd like it to work as a C++ library as well, without a Python.h dependency).
Weirdness
I'm in the process of debugging my code, trying different image sizes to gauge RAM usage and speed, etc., and I've noticed something very strange about the results: they change depending on whether I use python test.py (specifically /usr/bin/python on Mac OS X 10.7.5/Lion, which is Python 2.7) or run python, import test, and call a function on it. (Indeed, on my laptop (OS X 10.6.latest, with MacPorts Python 2.7) the results are also deterministically different: each platform/situation is different, but each one is always the same as itself.) In all cases, the same function is called, loads some input data from a file, and runs the C++ module.
A note on 64-bit Python: I am not using distutils to compile this code, but something akin to my answer here (i.e. with an explicit -arch x86_64 flag). This shouldn't matter, and all my processes in Activity Monitor are listed as Intel (64-bit).
As you may know, the point of watershed is to find objects in the pixel soup - in 2D it's often used on photos. Here, I'm using it to find lumps in 3D in much the same way - I start with some lumps ("grains") in the image and I want to find the inverse lumps ("cells") in the space between them.
The way the results change is that I literally find a different number of lumps. For exactly the same input data:
python test.py:
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 1242 cells from 1434 original grains!
...
however,
python, import test, test.run():
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 927 cells from 1434 original grains!
...
This is the same in the interactive python shell and bpython, which I originally thought was to blame.
Note the "average value" number is exactly the same - this indicates that the same fraction of voxels have initially been marked as in the problem space - i.e. that my input file was initialised in (very very probably) exactly the same way both times in voxel-space.
Also note that no part of the algorithm is non-deterministic; there are no random numbers or approximations; subject to floating point error (which should be the same each time) we should be performing exactly the same computations on exactly the same numbers both times. Watershed runs using a big buffer of integers (here signed chars) and the results are counting clusters of those integers, all of which is implemented in one big C++ call.
I have tested the __file__ attribute of the relevant module objects (which are themselves attributes of the imported test), and they're pointing to the same installed watershed.so in my system's site-packages.
Questions
I don't even know where to begin debugging this:
- How is it possible to call the same function with the same input data and get different results?
- What about interactive Python might cause this (e.g. by changing the way the data is initialised)?
- Which parts of the (rather large) codebase are relevant to these questions?
In my experience it's much more useful to post ALL the code in a Stack Overflow question, and not assume you know where the problem is. However, there are thousands of lines of code here, and I have literally no idea where to start! I'm happy to post small snippets on request.
I'm also happy to hear debugging strategies - interpreter state that I can check, details about the way python might affect an imported C++ binary, and so on.
Here's the structure of the code:
project/
clibs/
custom_types/
adjacency.cpp (and hpp) << graph adjacency (2nd pass; irrelevant = irr)
*array.c (and h) << dynamic array of void*s
*bit_vector.c (and h) << int* as bitfield with helper functions
polyhedron.cpp (and hpp) << for voxel initialisation; convex hull result
smallest_ints.cpp (and hpp) << for voxel entity affiliation tracking (irr)
custom_types.cpp (and hpp) << wraps all files in custom_types/
delaunay.cpp (and hpp) << marshals calls to stripack.f90
*stripack.f90 (and h) << for computing the convex hulls of grains
tensors/
*D3Vector.cpp (and hpp) << 3D double vector impl with operators
watershed.cpp (and hpp) << main algorithm entry points (ini, +two passes)
pywat/
__init__.py
watershed.pyx << cython class, python entry points.
geometric_graph.py << python code for post processing (irr)
setup.py << compile and install directives
test/
test.py << entry point for testing installed lib
(files marked * have been used extensively in other projects and are very well tested, those suffixed irr contain code only run after the problem has been caused.)
Details
as requested, the main stanza in test/test.py:
testfile = 'valid_filename'
if __name__ == "__main__":
    # handles segfaults...
    import faulthandler
    faulthandler.enable()
    run(testfile)
and my interactive invocation looks like:
import test
test.run(test.testfile)
Clues
when I run this at the straight interpreter:
import faulthandler
faulthandler.enable()
import test
test.run(test.testfile)
I get the results from the file invocation (i.e. 1242 cells), although when I run it in bpython, it just crashes.
This is clearly the source of the problem - hats off to Ignacio Vazquez-Abrams for asking the right question straight away.
UPDATE:
I've opened a bug on the faulthandler github and I'm working towards a solution. If I find something that people can learn from I'll post it as an answer.
After debugging this application extensively (printf()ing out all the data at multiple points during the run, piping outputs to log files, diffing the log files) I found what seemed to cause the strange behaviour.
I was using uninitialised memory in a couple of places, and (for some bizarre reason) this gave me repeatable behaviour differences between the two cases I describe above - one without faulthandler and one with.
Incidentally, this is also why the bug disappeared from one machine but continued to manifest itself on another, part way through debugging (which really should have given me a clue!)
My mistake here was to assume things about the problem based on a spurious correlation - in theory the garbage ram should have been differently random each time I accessed it (ahh, theory.) In this case I would have been quicker finding the problem with a printout of the main calculation function and a rubber duck.
So, as usual, the answer is the bug is not in the library, it is somewhere in your code - in this case, it was my fault for malloc()ing a chunk of RAM, falsely assuming that other parts of my code were going to initialise it (which they only did sometimes.)
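To make that failure mode concrete, here is a minimal sketch (the function and buffer are made up, not taken from the project above): memory returned by malloc() has indeterminate contents, so any code path that reads a byte before something has written it can legitimately produce results that differ between interpreters, runs and machines, while still being repeatable within each setup. Zero-initialising at allocation time removes that source of non-determinism, and tools like Valgrind's memcheck report exactly this kind of read of uninitialised memory.

#include <cstdlib>
#include <cstring>

// Hypothetical voxel-buffer setup illustrating the bug class described above.
signed char *alloc_voxels(std::size_t count) {
    // Bug: the contents of buf are indeterminate until every byte is written.
    signed char *buf = static_cast<signed char *>(std::malloc(count));

    // Fix: make the initial state deterministic, e.g. zero the whole buffer
    // (or allocate with calloc(count, 1), which returns zeroed memory).
    std::memset(buf, 0, count);
    return buf;
}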

what is an easy / easiest way to *plot* a std::vector<double>?

I'm looking for something like:
std::vector<double> X = some_math_function( );
somenamespace :: plot( Wrapper( X ) ); // pop-up and display a graph of X on y-axis, 1 to X.size() on x-axis.
Obviously there are heavier-weight methods like setting up gnu-plot or whatever, and I've used the stuff in VTK charts. I just want a stupid, ghetto, plot to appear. This is for coarse debug checking things like "is the vector even changing? does it suddenly jerk when I move the camera?" and so on.
If this is for debugging, why not just output the vector to a delimited file and plot it in Excel or gnuplot or something as a separate step?
so something like
// untested
#include <fstream>
#include <algorithm>
#include <iterator>
std::ofstream myfilestream("myfile");
std::copy(X.begin(), X.end(), std::ostream_iterator<double>(myfilestream, "\n"));
then just plot the file in whatever tool you like, e.g. gnuplot:
gnuplot
plot "myfile" with lines
This thread seems to have quite a few suggestions on the matter. I haven't seen anything that stands out as a simple library for the purposes you want.
Here are a few lightweight examples, but it seems to me that if you've got to learn enough to stand up any library, you may as well stand up a respected one like gnuplot. In many cases the time you lose by having to deal with a more complex library is more than made up for by the community support and (relative) bugless..ness... of a more mature product.
koolplot
GOBLIN
You can use MathGL (a cross-platform GPL plotting library). The code looks like:
mglGraphZB gr;// create canvas
mglData d; d.Set(X); // convert to internal format
gr.YRange(d); // set range for y-axis
gr.Plot(d); // plot it
gr.Axis(); // draw axis if you need
gr.WritePNG("1.png"); // save it
Using C++11: I would recommend matplotlibcpp, which uses Python under the hood for the plots. The library is REALLY straightforward to use and you only need to copy the header file into your repository.
The code would look like:
#include "matplotlibcpp.h"
#include <vector>
#include <numeric> // for std::iota

namespace plt = matplotlibcpp;

int main()
{
    std::vector<double> y = {0.1, 0.2, 0.4, 0.8, 1.6};
    std::vector<int> x(y.size());
    std::iota(x.begin(), x.end(), 0);

    plt::plot(x, y);
    plt::save("plot.png"); // save before show(), or the figure may already be closed
    plt::show();
}
In your CMakeLists.txt:
find_package(PythonLibs 2.7)
target_include_directories(myproject PRIVATE ${PYTHON_INCLUDE_DIRS})
target_link_libraries(myproject ${PYTHON_LIBRARIES})
Or pass it directly to your compiler:
g++ main.cpp -std=c++11 -I/usr/include/python2.7 -lpython2.7
I remember it being easy to plot curves with gd in PHP, but that was a long time ago.

Looping through the samples of CAF file

Could anyone give me a suggestion or an example of how I would loop through the samples of a CAF (Core Audio Format) file? For example, taking the first 1000 samples and changing them to 0?
Something like:
for (int i = 0; i < numberOfSamples; i++)
{
    cafFile.sample[i] = methodThatUsesSampleValue(cafFile.sample[i]);
}
Keep in mind:
I don't need to do this live. No buffers, etc. needed.
I would like to do this in C, C++ or Objective-C.
Would like to be able to use the code on the iOS platform.
Any help appreciated!
The libsndfile library is well known and provides, among many other functions, routines for reading audio files such as AIFF: http://www.mega-nerd.com/libsndfile/. The lib is distributed under the GNU Lesser General Public License; I don't know if that is an issue. Recently I stumbled over the amazing openFrameworks library: http://www.openframeworks.cc/about and I know it compiles on Mac OS and iPhone. Amongst many(!) other libs it comes with an interface to RtAudio: http://www.music.mcgill.ca/~gary/rtaudio/index.html. You might want to consider using that directly as well. Hope that helps.
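Going the libsndfile route (it also lists CAF among its supported formats), a rough sketch of the kind of loop asked for might look like this; the file names are placeholders, and zeroing the first 1000 frames is just the example from the question:

#include <sndfile.h> // libsndfile
#include <vector>

int main()
{
    SF_INFO info = {};                                    // filled in by sf_open for reads
    SNDFILE *in = sf_open("input.caf", SFM_READ, &info);
    if (!in) return 1;

    // Read the whole file as interleaved floats (fine since this isn't live).
    std::vector<float> samples(info.frames * info.channels);
    sf_readf_float(in, samples.data(), info.frames);
    sf_close(in);

    // "Take the first 1000 samples and change them to 0" (here: 1000 frames, all channels).
    for (sf_count_t i = 0; i < 1000 * info.channels && i < (sf_count_t) samples.size(); ++i)
        samples[i] = 0.0f;

    // Write the result back out, reusing the same format/sample rate/channel count.
    SNDFILE *out = sf_open("output.caf", SFM_WRITE, &info);
    if (!out) return 1;
    sf_writef_float(out, samples.data(), info.frames);
    sf_close(out);
    return 0;
}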
Recently I stumbled over the amazing OpenFrameWorks library: http://www.openframeworks.cc/about And I know it compiles on MacOS and iPhone. Amongst many(!) other libs it come with an interface to rtAudio: http://www.music.mcgill.ca/~gary/rtaudio/index.html. You might want to consider also using that directly. Hope that helps.