mlpack: Lasso regression that takes in pointer to function - c++

from
http://www.mlpack.org/doxygen.php?doc=classmlpack_1_1regression_1_1LARS.html
I'm trying to use
void mlpack::regression::LARS::Regress
but the function itself only takes a &gramMatrix as input. If I want to pass in a function that computes some sum R_i * X_i, I'm stuck, because it only takes a pointer to a matrix. Any idea how to get around this? (Beta is constantly updated within the optimization function void mlpack::regression::LARS::Regress, and beta is needed to compute the sum R_i * X_i.)
Any suggestion for other C++ ML libraries would also be madly helpful.
Thanks!
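As far as the linked page shows, the interface is matrix-only (there is no hook for a user-supplied callback), so the usual workaround is to hand Regress the data itself, or give the constructor a precomputed Gram matrix and let LARS form the residual correlations internally. A minimal sketch, assuming the mlpack 1.x API from the linked documentation:
#include <mlpack/methods/lars/lars.hpp>

using namespace mlpack::regression;

int main()
{
    // Toy data; the layout Regress expects is governed by its transposeData
    // flag, documented on the linked page.
    arma::mat X(100, 10, arma::fill::randn);
    arma::vec y(10, arma::fill::randn);

    // lambda1 > 0 turns LARS into the LASSO; an overloaded constructor
    // instead accepts a precomputed Gram matrix (X^T X) if forming it
    // yourself is cheaper.
    LARS lars(/* useCholesky */ true, /* lambda1 */ 0.5);

    arma::vec beta; // solution vector, filled in by Regress
    lars.Regress(X, y, beta);
    return 0;
}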

Related

Critical points of an absolute value periodic function in Sympy

I want to find the critical points of the periodic function abs(sin(x)) for x over the reals.
So far I have tried:
from sympy import *
x = Symbol("x")
f = abs(sin(x))
solveset(diff(f), x, S.Reals)
with no luck. I then tried to unpack the absolute value into a piecewise function:
as_piece = Piecewise((sin(x), sin(x)>=0), (-sin(x), sin(x)<0))
solveset(diff(as_piece), x, S.Reals)
and again solveset cannot solve it. The interesting thing is that solve is able to handle as_piece, but of course it returns only the first two critical points, since its output is not a set.
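For reference, what I expect solveset to return can be worked out by hand, which at least gives something to check against:
f(x) = |\sin x|, \qquad f'(x) = \operatorname{sgn}(\sin x)\,\cos x \quad (\sin x \neq 0)
f'(x) = 0 \iff \cos x = 0 \iff x = \tfrac{\pi}{2} + k\pi \quad \text{(smooth maxima)}
\sin x = 0 \iff x = k\pi \quad \text{(cusps, where } f \text{ is not differentiable: the minima)}
\text{All critical points: } x = \tfrac{k\pi}{2}, \; k \in \mathbb{Z}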

Correct implementation of a step-size controller for embedded Runge-Kutta methods

I have been trying to write C++ code for embedded Runge-Kutta methods (explicit and Rosenbrock for the moment). The idea is to keep the code simple and general, so that one can pass in a Butcher tableau (of any order) and just run it.
I have verified that the code works in general, but there are cases (when I have a very complicated system of 4 differential equations) where the step-size control fails to adapt (I get a constant step size, or just wrong behaviour in general).
The step-size control I use (I found it in this paper) is:
// beta is some safety parameter
fac = beta;
if (Delta <= 1) { // current step is accepted
    if (delta_rej <= 1) { fac *= h / h_old; } // previous step was rejected
    fac *= std::pow(Delta, -0.65 / ((LD)method.p + 1.));
    fac *= std::pow(delta_acc / Delta, 0.3 / ((LD)method.p + 1.));
    h_stop = true; // used to exit the loop from inside which the step-size control is called
} else { // current step is rejected
    fac *= std::pow(Delta, -1. / ((LD)method.p + 1.));
}
// don't allow h to increase or decrease too much
if (fac > fac_max) { fac = fac_max; }
if (fac < fac_min) { fac = fac_min; }
// update h
h = h * fac;
Here, h_old is the previously accepted step size and h is the step size of the current trial step.
Also, Delta [1] is the relative (local) error estimate for the current try (which the controller tries to keep ~1), delta_rej is the Delta of the previous try, delta_acc is the Delta of the previous accepted step, and method.p is the order of the method (LD is a macro that can be double or long double).
I have tried the simple version of this (i.e. just fac *= std::pow(Delta, -1./((LD)method.p + 1.));), but the one above seems to be a bit more stable.
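For context, the surrounding accept/reject loop looks roughly like this (a sketch; trial_step and controller are placeholders for the actual step routine and the snippet above), showing where each bookkeeping variable from the definitions above gets updated:
#include <cmath>
typedef long double LD; // standing in for the LD macro from the text

// Placeholders, not the real code:
LD trial_step(LD t, LD h); // attempt a step of size h, return the error estimate Delta
void controller(LD Delta, LD& h, LD h_old, LD delta_acc, LD delta_rej, bool& h_stop); // the snippet above

void integrate(LD t, LD t_end, LD h)
{
    LD h_old = h, delta_acc = 1, delta_rej = 1;
    while (t < t_end) {
        bool h_stop = false;
        while (!h_stop) {
            LD Delta = trial_step(t, h); // try a step of size h
            LD h_trial = h;              // remember the size of this try
            controller(Delta, h, h_old, delta_acc, delta_rej, h_stop); // updates h, h_stop
            if (Delta <= 1) {            // accepted: commit the step
                t += h_trial;
                h_old = h_trial;         // previously accepted step size
                delta_acc = Delta;       // Delta of the last accepted step
            }
            delta_rej = Delta;           // Delta of the previous try
        }
    }
}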
For example, these are histograms showing the number of steps taken by my code vs SciPy:
Explicit RKF,
Rosenbrock. As you can see, they are close, and the difference can be caused by differences in the details of the implementation.
Having said that, I am still not sure, and what I really would like to know is whether I am using the controller correctly.
Thanks
[1]: This is the definition of Delta

How does R/C compute the binomial distribution?

In R the cumulative distribution function for the binomial distribution is called via an underlying C/C++ function called C_pbinom. I am trying to find the underlying code for this algorithm, so that I can find out what algorithm this function uses to compute the cumulative distribution function. Unfortunately, I have not been successful in finding the underlying code, nor any information on the algorithm that is used.
My question: How do I find the underlying C/C++ code for the function C_pbinom. Alternatively, is there any information source available showing the algorithm used by this function?
What I have done so far: Calling pbinom in R gives the following details:
function (q, size, prob, lower.tail = TRUE, log.p = FALSE)
.Call(C_pbinom, q, size, prob, lower.tail, log.p)
<bytecode: 0x000000000948c5a0>
<environment: namespace:stats>
I have located and opened the underlying NAMESPACE file in the stats library. This file lists various functions, including the pbinom function, but does not give code for the C_pbinom function, nor any pointer to where it can be found. I have also read a related answer on finding source code in R, and an article here on "compiling source codes", but neither has been of sufficient assistance to let me find the code. At this point I have hit a dead end.
I went to the Github mirror for the R source code, searched for pbinom, and filtered to C: that got me here. The meat of the function is simply
pbeta(p, x + 1, n - x, !lower_tail, log_p)
This is invoking the incomplete beta function (= the CDF of the Beta distribution), which means you need to look up the pbeta function in the code in turn: there, it says that the code is "a wrapper for TOMS708", which lives in src/nmath/toms708.c and is described in a little more detail here (google "TOMS 708") ... original code here.
The full reference is: Didonato and Morris, Jr., "Algorithm 708: Significant Digit Computation of the Incomplete Beta Function Ratios",
ACM Transactions on Mathematical Software (TOMS), Volume 18, Issue 3, Sept. 1992, pages 360-373.
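For a quick numerical check of the pbeta identity quoted above, the same routines also ship in R's standalone math library (libRmath), so it can be exercised directly from C++ (assuming libRmath is installed; build with -lRmath):
// Check pbinom(q, n, p) against the pbeta identity from the source.
#define MATHLIB_STANDALONE
#include <Rmath.h>
#include <cstdio>

int main()
{
    double q = 7, size = 20, prob = 0.3;
    double direct  = pbinom(q, size, prob, /* lower_tail */ 1, /* log_p */ 0);
    // P(X <= q) = pbeta(p, q + 1, n - q, !lower_tail, log_p)
    double viaBeta = pbeta(prob, q + 1, size - q, /* lower_tail */ 0, /* log_p */ 0);
    std::printf("pbinom:    %.17g\nvia pbeta: %.17g\n", direct, viaBeta);
    return 0;
}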

vtktriangle compute normal from arbitrary points with python

I am using the Python wrappings for VTK. I want my script to let the user pick three arbitrary points and return a triangle with its normal information. In the VTK vtkTriangle reference there is vtkTriangle::ComputeNormal(double v1[3], double v2[3], double v3[3], double n[3]).
I checked the Cxx implementation examples for vtkTriangle, but I don't understand how to implement this in Python. Does n[3] stand for the normal? If so, what should it be as an input parameter?
@g.stevo I understand that. However, when I give a random value, the method ComputeNormal returns None. To be clearer, you can find the snippet of related code below:
p0 = trianglePolyData.GetPoints().GetPoint(0)
p1 = trianglePolyData.GetPoints().GetPoint(1)
p2 = trianglePolyData.GetPoints().GetPoint(2)
print vtk.vtkTriangle().TriangleArea(p0, p1, p2)
n = [0.0, 0.0, 0.0]
print vtk.vtkTriangle().ComputeNormal(p0, p1, p2, n)
Your code is working. The result you are looking for is in the array n. The function ComputeNormal returns void, according to the documentation.
Try this:
n=[0.0,0.0,0.0]
vtk.vtkTriangle().ComputeNormal(p0,p1,p2,n)
print n
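For comparison, the C++ equivalent looks like this (a short sketch with made-up points): ComputeNormal is a static method, and n is a plain output array that it fills in, which is why the return value is void in both languages.
#include <vtkTriangle.h>
#include <cstdio>

int main()
{
    double p0[3] = {0.0, 0.0, 0.0};
    double p1[3] = {1.0, 0.0, 0.0};
    double p2[3] = {0.0, 1.0, 0.0};
    double n[3];
    vtkTriangle::ComputeNormal(p0, p1, p2, n); // writes the unit normal into n
    std::printf("normal = (%g, %g, %g)\n", n[0], n[1], n[2]); // (0, 0, 1) here
    return 0;
}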

Cuda: least-squares solving, poor in speed

Recently, I used CUDA to write an algorithm called 'orthogonal matching pursuit'. In my ugly CUDA code the entire iteration takes 60 s, while the Eigen lib takes just 3 s...
In my code matrix A is [640,1024] and y is [640,1]; in each step I select some vectors from A to compose a new matrix called A_temp [640,itera], itera = 1:500. I keep an array MaxDex_Host[] on the CPU to record which column to select.
I want to get x_temp[itera,1] from A_temp*x_temp=y using least squares; I use the CULA API culaDeviceSgels and the cuBLAS matrix-vector multiplication API.
So culaDeviceSgels gets called 500 times, and I thought this would be faster than Eigen's QR solver.
I checked the Nsight performance analysis and found that cuStreamDestroy takes a long time. I initialize cuBLAS before the iteration and destroy it after I get the result. So I want to know: what is cuStreamDestroy, and how is it different from cublasDestroy?
The main problems are the memcpys and the function gemm_kernel1x1val; I think this function comes from culaDeviceSgels.
while (itera < 500): I use cublasSgemv and cublasIsamax to get MaxDex_Host[itera], then
MaxDex_Host[itera] = pos;
itera++;
float* A_temp_cpu = new float[M * itera]; // matrices are all col-major
// build A_temp [M, itera]; MaxDex_Host[] holds the positions of the columns of A to choose
for (int j = 0; j < itera; j++)
{
    for (int i = 0; i < M; i++) // M = 640, A is 640*1024, itera grows by 1 each step
    {
        A_temp_cpu[j * M + i] = A[MaxDex_Host[j] * M + i];
    }
}
// I must allocate one more array because culaDeviceSgels decomposes its input
// array in place, and I want to use A_temp after the least-squares solve.
float* A_temp_gpu;
float* A_temp2_gpu;
cudaMalloc((void**)&A_temp_gpu, Size_float * M * itera);
cudaMalloc((void**)&A_temp2_gpu, Size_float * M * itera);
cudaMemcpy(A_temp_gpu, A_temp_cpu, Size_float * M * itera, cudaMemcpyHostToDevice);
cudaMemcpy(A_temp2_gpu, A_temp_gpu, Size_float * M * itera, cudaMemcpyDeviceToDevice);
// the x_temp I want comes back in y_Gpu_temp, stored in y_Gpu_temp[0]..y_Gpu_temp[itera-1]
culaDeviceSgels('N', M, itera, 1, A_temp_gpu, M, y_Gpu_temp, M);
float* x_temp;
cudaMalloc((void**)&x_temp, Size_float * itera);
cudaMemcpy(x_temp, y_Gpu_temp, Size_float * itera, cudaMemcpyDeviceToDevice);
CUDA's memory management seems too complex; is there any more convenient method to solve least squares?
I think that cuStreamDestroy and gemm_kernel1x1val are internally called by the APIs you are using, so there is not much you can do about them.
To improve your code, I would suggest the following.
You can get rid of A_temp_cpu by keeping a device copy of the matrix A. Then you can copy the selected columns of A into the columns of A_temp_gpu and A_temp2_gpu with a kernel assignment, as in the sketch below. This would avoid performing the first two cudaMemcpys.
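A hedged sketch of such a gather kernel (all names are illustrative, not from your code): it assumes A and the index array have been copied to the device once, and fills A_temp in the same col-major layout your host loop uses.
// Gather the columns of A listed in maxDex into A_temp (both col-major,
// M rows). One thread per output element.
__global__ void gatherColumns(const float* A, const int* maxDex,
                              float* A_temp, int M, int itera)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < M * itera) {
        int col = idx / M;                      // destination column
        int row = idx - col * M;                // row within that column
        A_temp[idx] = A[maxDex[col] * M + row]; // source column maxDex[col]
    }
}

// Launch, e.g.:
// int threads = 256, blocks = (M * itera + threads - 1) / threads;
// gatherColumns<<<blocks, threads>>>(A_gpu, MaxDex_gpu, A_temp_gpu, M, itera);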
You can preallocate A_temp_gpu and A_temp2_gpu outside the while loop by using the maximum possible value of itera instead of itera. This will avoid the first two cudaMallocs inside the loop. The same applies to x_temp.
As far as I know, culaDeviceSgels solves a linear least-squares problem. I think you can do the same by using cuBLAS APIs only. For example, after forming the square normal equations A_temp^T A_temp x = A_temp^T y (LU applies to square systems), you can perform the LU factorization with cublasSgetrfBatched() and then use cublasStrsv() two times to solve the two arising triangular systems. You may wish to see if this solution leads to a faster algorithm.
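Alternatively, here is a sketch that swaps the LU route above for a Cholesky solve of the normal equations with cuBLAS + cuSOLVER (all names below are illustrative; work/lwork must be sized beforehand with cusolverDnSpotrf_bufferSize, and devInfo is a device int):
#include <cublas_v2.h>
#include <cusolverDn.h>

// Solve min ||A x - y|| via the normal equations (A^T A) x = A^T y,
// Cholesky-factorized on the GPU. A is M x k col-major; AtA is a k x k
// scratch buffer; x returns the solution.
void lsqNormalEquations(cublasHandle_t blas, cusolverDnHandle_t solver,
                        const float* A, const float* y, float* x, float* AtA,
                        int M, int k, float* work, int lwork, int* devInfo)
{
    const float one = 1.0f, zero = 0.0f;
    // AtA = A^T * A (only the lower triangle is referenced below)
    cublasSsyrk(blas, CUBLAS_FILL_MODE_LOWER, CUBLAS_OP_T,
                k, M, &one, A, M, &zero, AtA, k);
    // x = A^T * y
    cublasSgemv(blas, CUBLAS_OP_T, M, k, &one, A, M, y, 1, &zero, x, 1);
    // Cholesky factorization of AtA, then the triangular solves (in place)
    cusolverDnSpotrf(solver, CUBLAS_FILL_MODE_LOWER, k, AtA, k, work, lwork, devInfo);
    cusolverDnSpotrs(solver, CUBLAS_FILL_MODE_LOWER, k, 1, AtA, k, x, k, devInfo);
}
Note that the normal equations square the condition number of A_temp, so the QR-based Sgels remains the more robust option when the selected columns are close to collinear.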