I just implemented the numerical integration for a set of coupled ODEs
from a discretized PDE using the odeint C++ library. It works nicely and
is lightning fast, but there is one issue:
My system of ODEs has, so-called, absorbing boundary conditions: the time
derivatives of my state variable, n, which is a vector of N doubles
(a population density) gets calculated in the system function, but before that happens
(or after the time integration) I would like to set:
n[N]=n[N-2];
n[N-1]=n[N-2];
However, of course this doesn't work because the state variable in the system
function is declared as const, and it looks as if this could not be changed
other than through meddling with the library... is there any way around this?
I should mention that setting dndt[N] and dndt[N-1] to zero might look like a
solution, but it doesn't really help as it defies the concept of absorbing boundary
conditions (n[N] and n[N-1] would then always have the values they had at t=0, rather
then the value of n[N-2] at any point in time), and so I'd really prefer to change n.
Thanks for any help!
Regards,
Michael
Usually absorbing boundary condition manifests itself in the equations of motion. n[N] = n[N-1] = n[N-2], so can insert n[N]=n[N-2] and n[N-1]=n[N-2] into the equation for dndt[N-2].
For example, the discrete Laplacian Lx[i] = x[i+1]-2 x[i] +x[i-1] with absorbing boundaries x[n]=x[n-1] can be written as Lx[n-1] = x[n-2] - x[n-1]. The equation for x[n] can then be omitted.
Related
When optimizing a SVGP with Poisson Likelihood for a big data set I see what I think are exploding gradients.
After a few epochs I see a spiky drop of the ELBO, which then very slowly recovers after getting rid of all progress made before.
Roughly 21 iterations correspond to an Epoch.
This spike (at least the second one) resulted in a complete shift of the parameters (for vectors of parameters I just plotted the norm to see changes):
How can I deal with that? My first approach would be to clip the gradient, but that seems to require digging around the gpflow code.
My Setup:
Training works via Natural Gradients for the variational parameters and ADAM for the rest, with a slowly (linearly) increasing schedule for the Natural Gradient Gamma.
The batch and inducing point sizes are as large as possible for my setup
(both 2^12, with the data set consisting of ~88k samples). I include 1e-5 jitter and initialize the inducing points with kmeans.
I use a combined kernel, consisting of a combination of RBF, Matern52, a periodic and a linear kernel on a total of 95 features (a lot of them due to a one-hot encoding), all learnable.
The lengthscales are transformed with gpflow.transforms.
with gpflow.defer_build():
k1 = Matern52(input_dim=len(kernel_idxs["coords"]), active_dims=kernel_idxs["coords"], ARD=False)
k2 = Periodic(input_dim=len(kernel_idxs["wday"]), active_dims=kernel_idxs["wday"])
k3 = Linear(input_dim=len(kernel_idxs["onehot"]), active_dims=kernel_idxs["onehot"], ARD=True)
k4 = RBF(input_dim=len(kernel_idxs["rest"]), active_dims=kernel_idxs["rest"], ARD=True)
#
k1.lengthscales.transform = gpflow.transforms.Exp()
k2.lengthscales.transform = gpflow.transforms.Exp()
k3.variance.transform = gpflow.transforms.Exp()
k4.lengthscales.transform = gpflow.transforms.Exp()
m = gpflow.models.SVGP(X, Y, k1 + k2 + k3 + k4, gpflow.likelihoods.Poisson(), Z,
mean_function=gpflow.mean_functions.Constant(c=np.ones(1)),
minibatch_size=MB_SIZE, name=NAME)
m.mean_function.set_trainable(False)
m.compile()
UPDATE: Using only ADAM
Following the suggestion by Mark, I switched to ADAM only,
which helped me get rid of that sudden explosion. However, I still initialized with an epoch of natgrad only, which seems to save a lot of time.
In addition, the variational parameters seem to change a lot less abrupt (in terms of their norm at least). I guess they'll converge way slower now, but at least it's stable.
Just to add to Mark's answer above, when using nat grads in non-conjugate models it can take a bit of tuning to get the best performance, and instability is potentially a problem. As Mark points out, the large steps that provide potentially faster convergence can also lead to the parameters ending up in in bad regions of the parameter space. When the variational approximation is good (i.e. the true and approximate posterior are close) then there is good reason to expect that the nat grad will perform well, but unfortunately there is no silver bullet in the general case. See https://arxiv.org/abs/1903.02984 for some intuition.
This is very interesting. Perhaps trying to not use natgrads is a good idea as well. Clipping gradients indeed seems like a hack that could work. And yes, this would require digging around in the GPflow code a bit. One tip that can help towards this, is by not using the GPflow optimisers directly. The model._likelihood_tensor contains the TF tensor that should be optimised. Perhaps with some manual TensorFlow magic, you can do the gradient clipping on here before running an optimiser.
In general, I think this sounds like you've stumbled on an actual research problem. Usually these large gradients have a good reason in the model, which can be addressed with careful thought. Is it variance in some monte carlo estimate? Is the objective function behaving badly?
Regarding why not using natural gradients helps. Natural gradients use the Fisher matrix as a preconditioner to perform second order optimisation. Doing so can result in quite aggressive moves in parameter space. In certain cases (when there are usable conjugacy relations) these aggressive moves can make optimisation much faster. This case, with the Poisson likelihood, is not one where there are conjugacy relations that will necessarily help optimisation. In fact, the Fisher preconditioner can often be detrimental, particularly when variational parameters are not near the optimum.
I am looking for a simple way to obtain a lot of "good" solutions in a LP problem (not MIP) with CPLEX, and not only (one of the) optimal basic solution(s). By "good" solutions I mean that the corresponding objective values are not so far from the real optimal value. Such pool of solutions could help the decision-maker...
More precisely, given a certain polyedron Ax<=b with x>=0 and an objective function z=cx I want to maximize, after running the LP, I can obtain the optimal value z*. Then I want to enumerate all the extreme points of the polyhedron given by the set of constraints
Ax <= b
cx >= z* - epsilon
x >= 0
when epsilon is a given tolerance.
I know that CPLEX offers way to generate solution pool (see here), but it will not function because this method is for MIP : it enumerates all the solutions of an IP (or one solution for every given set of fixed integer variables if the problem is a MIP).
An interesting efficient way is to visit the adjacent solutions of the optimal basic solution, i.e. all the adjacent extreme points : if I suppose the polyhedron is not degenerative, for each pair of basic variable x_B and non-basic variable x_N, I compute the basic solution obtained when x_B leaves the basis and x_N enters in the basis. Then I throw the solutions with cx < z*-epsilon, and for the others I repeat the procedure. [I know that I could improve this algorithm, but this is the general idea].
The routine CPPXpivot of the Callable Library could help to do this pivoting operation, but I did not find an equivalent in the C++ API (concert technology). Does someone know if such an equivalent exist, or could propose me an other way to answer my original problem ?
Thanks a lot :) !
Rémi L.
There is one interesting way to make this suitable for use with the Cplex solution pool. Use binary variables to encode the current basis, e.g. basis[k] = 0 meaning nonbasic and basis[k] = 1 indicating variable (or row) k is basic. Of course we have sum(k, basis[k]) = m (number of rows). Finally we have x[k] <= basis[k] * upperbound[k] (i.e. if nonbasic then zero -- assuming positive variables). When we add this to the LP model we end up with a MIP and can enumerate (all or some, optimal or near optimal) bases using the Cplex solution pool. See here and here.
I've been reading this website: http://www.csimn.com/CSI_pages/PIDforDummies.html and I'm confused about the proportional integral part. Here's what it says.
Proportional control
Here’s a diagram of the controller when we have enabled only P control:
In Proportional Only mode, the controller simply multiplies the Error by the Proportional Gain (Kp) to get the controller output.
The Proportional Gain is the setting that we tune to get our desired performance from a “P only” controller.
A match made in heaven: The P + I Controller
If we put Proportional and Integral Action together, we get the humble PI controller. The Diagram below shows how the algorithm in a PI controller is calculated.
The tricky thing about Integral Action is that it will really screw up your process unless you know exactly how much Integral action to apply.
A good PID Tuning technique will calculate exactly how much Integral to apply for your specific process - but how is the Integral Action adjusted in the first place?
As you can see, the proportional part is easy to understand it says that you multiply error by tuning variable. The part that I don't get is where you get the P and I from on the second part, and what mathematical operation you do with them. I don't have a degree in mathematics or advanced calculus knowledge, so I would appreciate it if you would try to keep it algebra level.
There is a big part missing from the text, the actual physical system that turns the control into a process and the actual physical variable.
Think of the integral as some kind of averaging operation that filters out small oscillations in the PV input. It also represents some kind of memory of the immediate past of the process.
A moving exponential average, for instance, can be thought of being a mix of integral and proportional action.
Staying with the car driving example, if you come to a curb where you need the steering wheel in a certain position to go in a circle, you don't just yank the wheel to that position, you move it gradually (most of the time). Exactly such ramp-up and -down actions are effects of using the integral action part.
I integral part is just summation also multiplied by some constant.
Analogue integration is done by nonlinear gain and amplifier.
Digital integration of first order is just:
output += input*dt;
second order is:
temp += input*dt;
output += temp*dt;
dt is the duration time of iteration loop (timer or what ever)
do not forget that PI regulator can have more complicated response
i1 += input*dt;
i2 += i1*dt;
i3 += i2*dt;
output = a0*input + a1*i1 + a2*i2 +a3*i3 ...;
where a0 is the P part
Now the I regulator adds more and more amount of control value
until the controlled value is the same as the preset value
the longer it takes to match it the faster it controls
this creates fast oscillations around preset value
in comparison to P with the same gain
but in average the control time is smaller then in just P regulators
therefore the I gain is usually much much smaller which creates the memory and smooth effect LutzL mentioned. (while the regulation time is similar or smaller then just for P regulation)
The controlled device has its own response
this can be represented as differential function
there is a lot of theory in cybernetics about obtaining the right regulator response
to match your process needs as:
quality of control
reaction times
max oscillations amplitude
stability
but for all you need differential math like solving system of differential equations of any order
strongly recommend use of Laplace transform
but many people also use Z transform instead
So I-regulator add speed to regulation
but it also create bigger oscillations
and when not matching the regulated system properly also creates instability
Integration adds overflow risks to regulation (Analog integration is very sensitive to it)
Also take in mind you can also substracting the I part from control value
which will make the exact opposite
sometimes the combination of more I parts are used to match desired regulation response shape
I am trying to solve two point boundary problem with odeint. My equation has the form of
y'' + a*y' + b*y + c = 0
It is pretty trivial when I have boundary conditions of y(x_1) = y_1 , y'(x_2) = y_2, but when boundary conditions are y(x_1) = y_1 , y(x_2) = y_2 I am lost. Does anybody know the way to deal with problems like this with odeint or other scientific library?
In this case you need a shooting method. odeint does not have such a method, it solved the initial value problem (IVP) which is your first case. I think in the Numerical Recipies this method is explained and you can use Boost.Odeint to do the time stepping.
An alternative and more efficient method to solve this type of problem is finite differences or finite elements method. For finite differences you can check Numerical Recipes. For finite elements I recommend dealii library.
Another approach is to use b-splines: Assuming you do know the initial x0 and final xfinal points of integration, then you can expand the solution y(x) in a b-spline basis, defined over (x0,xfinal), i.e.
y(x)= \sum_{i=1}^n A_i*B_i(x),
where A_i are constant coefficients to be determined, and B_i(x) are b-spline basis (well defined polynomial functions, that can be differentiated numerically). For scientific applications you can find an implementation of b-splines in GSL.
With this substitution the boundary value problem is reduced to a linear problem, since (am using Einstein summation for repeated indices):
A_i*[ B_i''(x) + a*B_i'(x) + b*B_i(x)] + c =0
You can choose a set of points x and create a linear system from the above equation. You can find information for this type of method in the following review paper "Applications of B-splines in Atomic and Molecular Physics" - H Bachau, E Cormier, P Decleva, J E Hansen and F Martín
http://iopscience.iop.org/0034-4885/64/12/205/
I do not know of any library solving directly this problem, but there are several libraries for B-splines (I recommend GSL for your needs), that will allow you to form the linear system. See this stackoverflow question:
Spline, B-Spline and NURBS C++ library
First, please excuse my ignorance in this field, I'm a programmer by trade but have been stuck in a situation a little beyond my expertise (in math and signals processing).
I have a Matlab script that I need to port to a C++ program (without compiling the matlab code into a DLL). It uses the hilbert() function with one argument. I'm trying to find a way to implement the same thing in C++ (i.e. have a function that also takes only one argument, and returns the same values).
I have read up on ways of using FFT and IFFT to build it, but can't seem to get anything as simple as the Matlab version. The main thing is that I need it to work on a 128*2000 matrix, and nothing I've found in my search has showed me how to do that.
I would be OK with either a complex value returned, or just the absolute value. The simpler it is to integrate into the code, the better.
Thank you.
The MatLab function hilbert() does actually not compute the Hilbert transform directly but instead it computes the analytical signal, which is the thing one needs in most cases.
It does it by taking the FFT, deleting the negative frequencies (setting the upper half of the array to zero) and applying the inverse FFT. It would be straight forward in C/C++ (three lines of code) if you've got a decent FFT implementation.
This looks pretty good, as long as you can deal with the GPL license. Part of a much larger numerical computing resource.
Simple code below. (Note: this was part of a bigger project). The value for L is based on the your determination of your order, N. With N = 2L-1. Round N to an odd number. xbar below is based on the signal you define as the input to your designed system. This was implemented in MATLAB.
L = 40;
n = -L:L; % index n from [-40,-39,....,-1,0,1,...,39,40];
h = (1 - (-1).^n)./(pi*n); %impulse response of Hilbert Transform
h(41) = 0; %Corresponds to the 0/0 term (for 41st term, 0, in n vector above)
xhat = conv(h,xbar); %resultant from Hilbert Transform H(w);
plot(abs(xhat))
Not a true answer to your question but maybe a way of making you sleep better. I believe that you won't be able to be much faster than Matlab in the particular case of what is basically ffts on a matrix. That is where Matlab excels!
Matlab FFTs are computed using FFTW, the de-facto fastest FFT algorithm written in C which seem to be also parallelized by Matlab. On top of that, quoting from http://www.mathworks.com/help/matlab/ref/fftw.html:
For FFT dimensions that are powers of 2, between 214 and 222, MATLAB
software uses special preloaded information in its internal database
to optimize the FFT computation.
So don't feel bad if your code is slightly slower...