How does R/C compute the binomial distribution? - c++

In R the cumulative distribution function for the binomial distribution is called via an underlying C/C++ function called C_pbinom. I am trying to find the underlying code for this algorithm, so that I can find out what algorithm this function uses to compute the cumulative distribution function. Unfortunately, I have not been successful in finding the underlying code, nor any information on the algorithm that is used.
My question: How do I find the underlying C/C++ code for the function C_pbinom. Alternatively, is there any information source available showing the algorithm used by this function?
What I have done so far: Calling pbinom in R gives the following details:
function (q, size, prob, lower.tail = TRUE, log.p = FALSE)
.Call(C_pbinom, q, size, prob, lower.tail, log.p)
<bytecode: 0x000000000948c5a0>
<environment: namespace:stats>
I have located and opened the underlying NAMESPACE file in the stats library. This file lists various functions, including the pbinom function, but does not give code for the C_pbinom function, nor any pointer to where it can be found. I have also read a related answer on finding source code in R, and an article here on "compiling source codes", but neither has been of sufficient assistance to let me find the code. At this point I have his a dead end.

I went to the Github mirror for the R source code, searched for pbinom, and filtered to C: that got me here. The meat of the function is simply
pbeta(p, x + 1, n - x, !lower_tail, log_p)
This is invoking the incomplete beta function (= CDF of the Beta distribution): it means you need to in turn look up the pbeta function in the code: here, it says that the code is "a wrapper for TOMS708" , which is in src/nmath/toms708.c and described in a little more detail here (google "TOMS 708") ... original code here.
The full reference is here: Didonato and Morris, Jr.,
ACM Transactions on Mathematical Software (TOMS), Volume 18 Issue 3, Sept. 1992, Pages 360-373.

Related

How to call function from external library in C/C++

I want to find the symmetry group of an integer linear program. I think there is a function in skip called SCIPgetGeneratorsSymmetry . I how I can use this function?
You are right, to access symmetry information in SCIP, you have to call the function SCIPgetGeneratorsSymmetry() via C/C++. Note that you need to link SCIP against the external software bliss, because otherwise, SCIP is not able to compute symmetries of your (mixed-integer) linear program.
If you set up your (mixed-integer) linear program using a C/C++ project, you have several options for computing symmetries.
If you set the "recompute" parameter to FALSE, SCIP will return the currently available symmetry information - if symmetries have not been computed yet, SCIP will compute symmetries to give you access to this information.
If you set "recompute" to TRUE, SCIP will discard the available symmetry information and you get access to the generators of the current symmetry group. Moreover, you can control the kind of symmetries that are computed via the parameters "symspecrequire" and "symspecrequirefixed", e.g., to only compute symmetries of binary variables that fix continuous variables.
Edit:
If you have no experience with coding in C/C++ and you are only interested in printing the generators of the symmetry group, the easiest way is probably to modify SCIP's source code in presol_symmetry.c as follows:
Add two integer paramaters int i and int p at the very beginning of determineSymmetry().
Search within determineSymmetry() for the line in which computeSymmetryGroup() is called.
Add the following code snippet right after this function call:
for (p = 0; p < presoldata->nperms; ++p)
{
printf("permutation %d\n", p);
for (i = 0; i < presoldata->npermvars; ++i)
{
if ( TRUE )
printf("%d ", presoldata->perms[p][i]);
else
printf("%s ", SCIPvarGetName(presoldata->permvars[presoldata->perms[p][i]]));
}
printf("\n");
}
This code prints the generators of the symmetry group as a list of variable indices, e.g., 1 2 0 is the permutation that maps 0 -> 1, 1 -> 2, and 2 -> 0. If you change TRUE to FALSE, you get the same list but variable indices are replaced by their names.
Do not forget to recompile SCIP.
If you solve an instance with SCIP and symmetry handling is enabled, SCIP will print the generators in the above format whenever it computes the symmetry group. If you are interested in the symmetry group of the original problem, you should use the parameter setting presolving/symbreak/addconsstiming = 0 and propagating/orbitalfixing/symcomptiming = 0. If you are fine with symmetries of the presolved problem, change the zeros to ones.

change FMI variable/function during simulation

I have a simple FMU file which contains a sine block that takes u as input and outputs y. In this case, u is set to equal to time. In my C++ code I have loaded the FMI library from FMILibrary and had done all the necessary steps up to a point where I want to give my input u a new value of pi(as 3.14). So I went:
fmistatus = fmi2_import_set_real(fmu, &uRef, 1, &pi);
while (timeCurrent < timeEnd){
fmistatus = fmi2_import_do_step(fmu, timeCurrent , stepSize, fmi2_true);
timeCurrent += stepSize;
}
u was still set to time even though I tried to give it a new value. Did I miss something?
PS. Is there anywhere I can find a more detailed description on the FMI library functions? Currently I can only find input output descriptions or did I miss something again.
UPDATE: After a few trials, I think this issue might be because I was trying to redefine my equation u = time. In other words when I change my u variable into RealInput block in openmodelica everything goes fine. So what if I really wants to redefine a certain equation? what do I have to do?
You shall not be able to set any variable in FMI - and especially not a variable with a binding equation - and I assume your Modelica model has "u=time;". Instead of having "u=time" you need to add a top-level input without any equation (so that the exported FMI has it as an input) - and then connect that to the sine-block.
Details:
For a co-simulation FMI the restriction on what you can set are in the state-diagram in section 4.2.4 of FMI2 specification.
Between fmi2DoStep you can only set Real variables that have causality="input" or causality="parameter" and variability="tunable" - and an input with an equation doesn't qualify.
Before starting the integration you could set it for other variables as well, but that is only guess-values for the initialization - and should not over-write the "u=time" equation.

custom kernel in e1071

i am currently trying to tune the svm function in the e1071 package for R. my input is genomic data (that is each attribute takes a value in the set {-1, 0, 1}) and none of the four kernels currently offered in the package is really good for this kind of data --- i would like to use Hamming distance as my kernel instead.
the svm function, it seems, is written in C++. i have downloaded the source via
download.packages(pkgs = "e1071",
destdir = ".",
type = "source")
found the svm.cpp file containing code for the function and the corresponding kernel portion, where i can potentially add my own custom kernel. has anyone tried doing this? is it possible to do this? once i've finished modifying svm.cpp (provided i figure out how..), how do i make the package "see" the modified file?
You can modify the existing kernel.
I changed the return statement of radial kernel to make the changes..
You can try with that

Igraph eigenvector centrality Run-Time error c++

I'm writing program on c++ that needs to generate graphs and calculate some measures.I'm working with Visual Studio 2013 and Igraph C library. At this point I can create graphs from custom info and calculate some metrics like betweennes and closeness centrality, but when i try to calculate eigenvector centrality, the program crash and show me this message:
"Run-Time Check Failure #3 - The variable 'tgetv0' is being used without being initialized."
The tgetv0 variable is used inside of dgetv.c from Igraph source.
Here is my code:
void GraphObject::calcEigen()
{
igraph_arpack_options_t options;
igraph_real_t value;
igraph_vector_t weights;
igraph_vector_init(&weights, igraph_ecount(&cGraph)); //cGraph is already created.
igraph_vector_init(&eigenRes, igraph_vcount(&cGraph)); //All ..Res igraph_vector_t are declarated in header
igraph_vector_init(&betweennesRes, 0);
igraph_vector_init(&closenessRes, 0);
igraph_arpack_options_init(&options);
igraph_betweenness(&cGraph, &betweennesRes, igraph_vss_all(), 0, 0, 1);
igraph_closeness(&cGraph, &closenessRes, igraph_vss_all(), IGRAPH_ALL, 0, 1);
igraph_eigenvector_centrality(&cGraph, &eigenRes, &value, 0, 1, &weights, &options);
}
The closeness and betwenness are correctly calculated an "couted" but crash on eigenvector function.
After lot of research on documentation, internet and the debugger i cant't figure which is the problem, especially when I tryed the example code in the documentation http://igraph.org/c/doc/igraph-Structural.html#igraph_eigenvector_centrality (copy/paste) and makes the same. Is this a library or example issue, I a'm missing something?
When I init the weights vector and then I call igraph_null(&weights), it works but the result of all eigenvalues is 1, and this is incorrect result. What I'm doing wrong?
Let us assume that Visual Studio is right and we indeed have a variable named tgetv0 that is being used uninitialized. I scanned igraph's source code and it looks like there are two places where it could indeed be the case. One of them is in src/lapack/dnaupd.c, the other one is in src/lapack/dsaupd.c. Both of these files were converted from Fortran using f2c so it is hard to tell whether the issue was present in the original Fortran code or whether this was introduced during the conversion. Either way, you can probably fix this easily by looking up the lines where tgetv0 is declared in src/lapack/dnaupd.c and src/lapack/dsaupd.c and initializing it to a value of 0. In my version, the lines to change are line 486 in src/lapack/dnaupd.c and line 482 in src/lapack/dsaupd.c.
Please add a comment to confirm whether the solution works for you or not - if it works, I'll commit a patch to the igraph source tree.

my c++ extension behaves differently with faulthandler

Background
I have a C++ extension which runs a 3D watershed pass on a buffer. It's got a nice Cython wrapper to initialise a massive buffer of signed chars to represent the voxels. I initialise some native data structures in python (in a compiled cython file) and then call one C++ function to initialise the buffer, and another to actually run the algorithm (I could have written these in Cython too, but I'd like it to work as a C++ library as well without a python.h dependancy.)
Weirdness
I'm in the process of debugging my code, trying different image sizes to gauge RAM usage and speed, etc, and I've noticed something very strange about the results - they change depending on whether I use python test.py (specifically /usr/bin/python on Mac OS X 10.7.5/Lion, which is python 2.7) or python and running import test, and calling a function on it (and indeed, on my laptop (OS X 10.6.latest, with macports python 2.7) the results are also deterministically different - each platform/situation is different, but each one is always the same as itself.). In all cases, the same function is called, loads some input data from a file, and runs the C++ module.
A note on 64-bit python - I am not using distutils to compile this code, but something akin to my answer here (i.e. with an explicit -arch x86_64 call). This shouldn't mean anything, and all my processes in Activity Monitor are called Intel (64-bit).
As you may know, the point of watershed is to find objects in the pixel soup - in 2D it's often used on photos. Here, I'm using it to find lumps in 3D in much the same way - I start with some lumps ("grains") in the image and I want to find the inverse lumps ("cells") in the space between them.
The way the results change is that I literally find a different number of lumps. For exactly the same input data:
python test.py:
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 1242 cells from 1434 original grains!
...
however,
python, import test, test.run():
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 927 cells from 1434 original grains!
...
This is the same in the interactive python shell and bpython, which I originally thought was to blame.
Note the "average value" number is exactly the same - this indicates that the same fraction of voxels have initially been marked as in the problem space - i.e. that my input file was initialised in (very very probably) exactly the same way both times in voxel-space.
Also note that no part of the algorithm is non-deterministic; there are no random numbers or approximations; subject to floating point error (which should be the same each time) we should be performing exactly the same computations on exactly the same numbers both times. Watershed runs using a big buffer of integers (here signed chars) and the results are counting clusters of those integers, all of which is implemented in one big C++ call.
I have tested the __file__ attribute of the relevant module objects (which are themselves attributes of the imported test), and they're pointing to the same installed watershed.so in my system's site-packages.
Questions
I don't even know where to begin debugging this - how is it possible to call the same function with the same input data and get different results? - what about interactive python might cause this (e.g. by changing the way the data is initialised)? - Which parts of the (rather large) codebase are relevant to these questions?
In my experience it's much more useful to post ALL the code in a stackoverflow question, and not assume you know where the problem is. However, that is thousands of lines of code here, and I have literally no idea where to start! I'm happy to post small snippets on request.
I'm also happy to hear debugging strategies - interpreter state that I can check, details about the way python might affect an imported C++ binary, and so on.
Here's the structure of the code:
project/
clibs/
custom_types/
adjacency.cpp (and hpp) << graph adjacency (2nd pass; irrelevant = irr)
*array.c (and h) << dynamic array of void*s
*bit_vector.c (and h) << int* as bitfield with helper functions
polyhedron.cpp (and hpp) << for voxel initialisation; convex hull result
smallest_ints.cpp (and hpp) << for voxel entity affiliation tracking (irr)
custom_types.cpp (and hpp) << wraps all files in custom_types/
delaunay.cpp (and hpp) << marshals calls to stripack.f90
*stripack.f90 (and h) << for computing the convex hulls of grains
tensors/
*D3Vector.cpp (and hpp) << 3D double vector impl with operators
watershed.cpp (and hpp) << main algorithm entry points (ini, +two passes)
pywat/
__init__.py
watershed.pyx << cython class, python entry points.
geometric_graph.py << python code for post processing (irr)
setup.py << compile and install directives
test/
test.py << entry point for testing installed lib
(files marked * have been used extensively in other projects and are very well tested, those suffixed irr contain code only run after the problem has been caused.)
Details
as requested, the main stanza in test/test.py:
testfile = 'valid_filename'
if __name__ == "__main__":
# handles segfaults...
import faulthandler
faulthandler.enable()
run(testfile)
and my interactive invocation looks like:
import test
test.run(test.testfile)
Clues
when I run this at the straight interpreter:
import faulthandler
faulthandler.enable()
import test
test.run(test.testfile)
I get the results from the file invocation (i.e. 1242 cells), although when I run it in bpython, it just crashes.
This is clearly the source of the problem - hats off to Ignacio Vazquez-Abrams for asking the right question straight away.
UPDATE:
I've opened a bug on the faulthandler github and I'm working towards a solution. If I find something that people can learn from I'll post it as an answer.
After debugging this application extensively (printf()ing out all the data at multiple points during the run, piping outputs to log files, diffing the log files) I found what seemed to cause the strange behaviour.
I was using uninitialised memory in a couple of places, and (for some bizarre reason) this gave me repeatable behaviour differences between the two cases I describe above - one without faulthandler and one with.
Incidentally, this is also why the bug disappeared from one machine but continued to manifest itself on another, part way through debugging (which really should have given me a clue!)
My mistake here was to assume things about the problem based on a spurious correlation - in theory the garbage ram should have been differently random each time I accessed it (ahh, theory.) In this case I would have been quicker finding the problem with a printout of the main calculation function and a rubber duck.
So, as usual, the answer is the bug is not in the library, it is somewhere in your code - in this case, it was my fault for malloc()ing a chunk of RAM, falsely assuming that other parts of my code were going to initialise it (which they only did sometimes.)