Related
I am trying to implement Non Linear MPC for a 7-DOF manipulator in drake. To do this, in my constraints, I need to have dynamic parameters like the Mass matrix M(q) and the bias term C(q,q_dot)*q_dot, but those depend on the decision variables q, q_dot.
I tried the following
// finalize plant
// create builder, diagram, context, plant context
...
// formulate optimazation problem
drake::solvers::MathematicalProgram prog;
// create decision variables
...
std::vector<drake::solvers::VectorXDecisionVariable> q_v;
std::vector<drake::solvers::VectorXDecisionVariable> q_ddot;
for (int i = 0; i < H; i++) {
q_v.push_back(prog.NewContinuousVariables<14>(state_var_name));
q_ddot.push_back(prog.NewContinuousVariables<7>(input_var_name));
}
// add cost
...
// add constraints
...
for (int i = 0; i < H; i++) {
plant.SetPositionsAndVelocities(*plant_context, q_v[i]);
plant.CalcMassMatrix(*plant_context, M);
plant.CalcBiasTerm(*plant_context, C_q_dot);
}
...
for (int i = 0; i < H; i++) {
prog.AddConstraint( M * q_ddot[i] + C_q_dot + G >= lb );
prog.AddConstraint( M * q_ddot[i] + C_q_dot + G <= ub );
}
// solve prog
...
The above code will not work, because plant.SetPositionsAndVelocities(.) doesn't accept symbolic variables.
Is there any way to integrate M,C in my ocp constraints ?
I think you want to impose the following nonlinear nonconvex constraint
lb <= M * qddot + C(q, v) + g(q) <= ub
This constraint is non-convex. We will need to solve it through nonlinear optimization, and evaluate the constraint in every iteration of the nonlinear optimization. We can't do this evaluation using symbolic computation (it would be horribly slow with symbolic computation).
So you will need a constraint evaluator, something like this
// This constraint takes [q;v;vdot] and evaluate
// M * vdot + C(q, v) + g(q)
class MyConstraint : public solvers::Constraint {
public:
MyConstraint(const MultibodyPlant<AutoDiffXd>& plant, systems::Context<AutoDiffXd>* context, const Eigen::Ref<const Eigen::VectorXd>& lb, const Eigen::Ref<const Eigen::VectorXd>& ub) : solvers::Constraint(plant.num_velocitiex(), plant.num_positions() + 2 * plant.num_velocities(), lb, ub), plant_{plant}, context_{context} {
...
}
private:
void DoEval(const Eigen::Ref<const AutoDiffVecXd>& x, AutoDiffVecXd* y) const {
...
}
MultibodyPlant<AutoDiffXd> plant_;
systems::Context<AutoDiffXd>* context_;
};
int main() {
...
// Construct the constraint and add it to every time instances
std::vector<std::unique_ptr<systems::Context<AutoDiffXd>>> plant_contexts;
for (int i = 0; i < H; ++i) {
plant_contexts.push_back(plant.CreateDefaultContext());
prog.AddConstraint(std::make_shared<MyConstraint>(plant, plant_context[i], lb, ub), {q_v[i], qddot[i]});
}
}
You could refer to the class CentroidalMomentumConstraint on how to construct your own MyConstraint class.
In C++, when using LibTorch (The C++ version of PyTorch), what should you store a batch of tensors in? I'm running into the problem of not being able to reset the batch on the next step because C++ doesn't allow storing a new variable over an existing variable.
In my attempt my batch of tensors is one single 385x385 tensor. The batch size is 385. In a for loop I use torch::cat to concatenate 385 smaller 1D tensors, which are 385 numbers long. (Maybe 'stack' or 'append' are better terms for what I'm doing since the are stacked together picket fence style more than 'concatenated', but that's what I'm using.) Anyways, there is not problem with this shape. It seems to work fine for one forward and backward pass but then the tensor becomes 770x385 on the next pass instead of a 385x385 tensor of the next 385, 385 long arrays. I hope I am painting a picture and not being too verbose.
The code.
Near the bottom I have the line all_step_obs = torch::tensor({}); to try to wipe out the contents of the tensor, AKA, the batch, but this gives me a Segmentation fault (core dumped). I guess for trying to access the tensor outside of the loop(?)
If I don't have this line I get a 770x385 tensor after the next step.
The model
#include "mujoco/mujoco.h"
struct Net : torch::nn::Module {
torch::Tensor action_high, action_low;
public:
Net(torch::Tensor action_high, torch::Tensor action_low) : action_high(action_high), action_low(action_low){
// Construct and register two Linear submodules.
fc1 = torch::nn::Linear(385, 385);
fc2 = torch::nn::Linear(385, 385);
fc3 = torch::nn::Linear(385, 42);
// cholesky_layer = torch::nn::Linear(385, (42 * (42 + 1)) / 2);
cholesky_layer = torch::nn::Linear(385, 385);
}
// Implement the Net's algorithm.
torch::Tensor forward(torch::Tensor x) {
// Use one of many tensor manipulation functions.
x = torch::relu(fc1->forward(x));
x = torch::dropout(x, /*p=*/0.2, /*train=*/is_training());
x = torch::relu(fc2->forward(x));
auto mean_layer = fc3->forward(x);
auto mean = action_low + (action_high - action_low) * mean_layer;
auto chol_l = cholesky_layer->forward(x);
// auto chol = torch::rand({385, 385});
auto chol = torch::matmul(chol_l, chol_l.transpose(0, 1));
chol = torch::nan_to_num(chol, 0, 2.0);
chol = chol.add(torch::eye(385));
auto cholesky = torch::linalg::cholesky(chol);
// return torch::cat({mean, cholesky}, 0);
return mean_layer;
}
// Use one of many "standard library" modules.
torch::nn::Linear fc1{nullptr}, fc2{nullptr}, fc3{nullptr}, cholesky_layer{nullptr};
};
The training
auto high = torch::ones({385, 42}) * 0.4;
auto low = torch::ones({385, 42}) * -0.4;
auto actor = Net(low, high);
int max_steps = 385;
int steps = 2000;
auto l1_loss = torch::smooth_l1_loss;
auto optimizer = torch::optim::Adam(actor.parameters(), 3e-4);
torch::Tensor train() {
torch::Tensor all_step_obs;
for (int i = 0; i<steps; ++i)
{
for (int i = 0; i<max_steps; ++i)
{
all_step_obs = torch::cat({torch::rand({385}).unsqueeze(0), all_step_obs});
}
auto mean = actor.forward(all_step_obs);
auto loss = l1_loss(mean, torch::rand({385, 42}), 1, 0);
optimizer.zero_grad();
loss.backward();
optimizer.step();
all_step_obs = torch::tensor({});
if (steps == 1999) {
return loss;
}
}
};
int main (int argc, const char** argv) {
std::cout << train();
}
I have been struggling for a few days now to figure out how to port code from MatLab or Python to C++...
My problem is with porting a Non-Linear Least Squares optimization function and I have basically hit a roadblock since everything so far is working perfectly.
MathLab implementation: https://github.com/SuwoongHeo/Deformation-Transfer-Matlab
Python implementation: https://github.com/ziyeshanwai/python-deformation-transfer
The exact bit of code I`m having trouble porting ...
The MatLab Code is:
[b, res, resi] = lsqnonlin(#(b) resSimXform( b,double(A),double(B) ),(double(b0)),[],[],options);
function result = resSimXform( b,A,B )
r = b(1:4);
t = b(5:7);
s = b(8);
% s = b(8:10);
n = size(A,2);
if ~isreal(r)
a = 1;
end
R = vrrotvec2mat(r);
test =repmat(t', 1, n);
rot_A = diag(s) * R * A + repmat(t', 1, n);
result = sum(sum((B-rot_A).^2,2));
end
and the equivalent python code is:
b = least_squares(fun=resSimXform, x0=b0, jac='3-point', method='lm', args=(Points_A, Points_B),
ftol=1e-12, xtol=1e-12, gtol=1e-12, max_nfev=100000)
def resSimXform(b, A, B):
print("resSimXform function")
t = b[4:7]
R = np.zeros((3, 3))
R = R_axis_angle(R, b[0:3], b[3])
rot_A = b[7]*R.dot(A) + t[:, np.newaxis]
result = np.sqrt(np.sum((B-rot_A)**2, axis=0))
return result
I have tried various optimizing libraries in C++ with no luck... The way they work is apparently different from the ones in Python or Matlab.
I have tried Eigen's LevenbergMarquardt and DLibs solve_least_squares_lm but failed... I couldn't figure out how to set up them since apparently they are different from their counterparts.
I have already implemented the "Residual of similarity" function(resSimXform) in C++ and it works fine. I`ve been constantly debugging the results and checking against the output from MathLab and Python.
My version in c++ looks something like: ( I'll try to clean it up more when I figure out how the use it with a least squared function...but at the moment it returns the exact value as Matlab )
double residual(MatrixXd x, MatrixXd A, MatrixXd B) {
MatrixXd r(1, 4);
r(0, 0) = x(0, 0);
r(0, 1) = x(0, 1);
r(0, 2) = x(0, 2);
r(0, 3) = x(0, 3);
MatrixXd t(1, 3);
t(0, 0) = x(0, 4);
t(0, 1) = x(0, 5);
t(0, 2) = x(0, 6);
double s = x(0, 7);
MatrixXd R(3, 3);
R = R_axis_angle(r);
MatrixXd rot_A(A.rows(), A.cols());
t.transposeInPlace();
t = t.col(0).replicate(1, A.cols());
rot_A = s * R * A + t;
MatrixXd fvecM = B - rot_A;
for (int i = 0; i < fvecM.rows(); i++) {
for (int j = 0; j < fvecM.cols(); j++) {
fvecM(i, j) = pow(fvecM(i, j), 2);
}
}
Eigen::VectorXd sums(3);
sums(0) = fvecM.row(0).sum();
sums(1) = fvecM.row(1).sum();
sums(2) = fvecM.row(2).sum();
return fvecM.sum();
}
I would greatly appreciate it if anyone out there can help me figure out what optimization library for c++ would work in this scenario and how to implement it... if there are vast differences from the ones in Matlab or Python since my experience with this sort of optimizers is limited.
Thanks in advance...
I have already tried googling for a couple of days and couldn't find a solution so that's why I'm here :)
Figured it out with Eigen`s not supported LevenbergMarquardt. You build a functor... my_functor(void): Functor(8, 8) {} .. If you want to solve 8 values... it will have to be 8x8 ...
The operator function... is basically the residual of similarity...
also if your functor function returns only one value.. you need to pass it 8 times otherwise it will not converge if you just pass it like fvec(0) = result
for (int i = 0; i < x.rows(); i++) {
fvec(i) = sum;
}
And then you use the NumericalDiff mode...
Eigen::NumericalDiff<my_functor> numDiff(functor);
Eigen::LevenbergMarquardt<Eigen::NumericalDiff<my_functor>, double> lm(numDiff);
Hope this helps someone...Cheers
Is there a standard function that does val = cos(phase) + sin(phas) j. Is there a conventional name for applying "Euler's formula" to do the opposite of the "argument" function? Similar to std::polar, but that expects the magnitude component.
I am looking at a large code base that does the following pattern a lot:
Eigen::VectorXf phase = ...;
Eigen::VectorXcf cplx = ...;
cplx.real() = phase.array().cos();
cplx.imag() = phase.array().sin();
This unfortunately does not take advantage of sincos optimization.
Is there a natural way to do this that I haven't found in the documentation? Is there a conventional name for this?
This is probably what I'm going with, just checking with the world before I change a lot of code.
void phase2Cplx(const Eigen::Ref<const Eigen::VectorXf> phaseIn,
Eigen::Ref<Eigen::VectorXcf> cplxOut) {
// assert len in == len out
Eigen::Index len = cplxOut.size();
std::complex<float> *out = cplxOut.data();
const float *in = phaseIn.data();
for (Eigen::Index i = 0; i < len; ++i) {
float val = in[i];
out[i] = { cosf(val), sinf(val) };
}
}
// #Details: Using () was measurably worse than []
I presume, or rather hope, that I have a singular fixable problem or perhaps many smaller ones and should give up. Either way I am relatively new to Rcpp and extremely uninformed on parallel computation and can't find a solution online.
The problem is typically, a 'fatal error' in R or R gets stuck in a loop, something like 5 minuets for 10 iterations, when the non-parallel version will do 5K iterations in the same time, roughly speaking.
As this algorithm fits into a much larger project I call on several other functions, these are all in Rcpp and I rewrote them with only 'arma' objects as that seemed to help other people, here. I also ran the optimization part with a 'heat map' optimizer I wrote in Rcpp, again exclusively in 'arma' without improvement - I should also point out this returned as an 'arma::vec'.
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::depends("RcppParallel")]]
#include <RcppArmadillo.h>
#include <RcppParallel.h>
using namespace Rcpp;
using namespace std;
using namespace arma;
using namespace RcppParallel;
struct Boot_Worker : public Worker {
//Generate Inputs
// Source vector to keep track of the number of bootstraps
const arma::vec Boot_reps;
// Initial non-linear theta parameter values
const arma::vec init_val;
// Decimal date vector
const arma::colvec T_series;
// Generate the price series observational vector
const arma::colvec Y_est;
const arma::colvec Y_res;
// Generate the optimization constants
const arma::mat U;
const arma::colvec C;
const int N;
// Generate Output Matrix
arma::mat Boots_out;
// Initialize with the proper input and output
Boot_Worker( const arma::vec Boot_reps, const arma::vec init_val, const arma::colvec T_series, const arma::colvec Y_est, const arma::colvec Y_res, const arma::mat U, const arma::colvec C, const int N, arma::mat Boots_out)
: Boot_reps(Boot_reps), init_val(init_val), T_series(T_series), Y_est(Y_est), Y_res(Y_res), U(U), C(C), N(N), Boots_out(Boots_out) {}
void operator()(std::size_t begin, std::size_t end){
//load necessary stuffs from around
Rcpp::Environment stats("package:stats");
Rcpp::Function constrOptim = stats["constrOptim"];
Rcpp::Function SDK_pred_mad( "SDK_pred_mad");
arma::mat fake_data(N,2);
arma::colvec index(N);
for(unsigned int i = begin; i < end; i ++){
// Need a nested loop to create and fill the fake data matrix
arma::vec pool = arma::regspace(0, N-1) ;
std::random_shuffle(pool.begin(), pool.end());
for(int k = 0; k <= N-1; k++){
fake_data(k, 0) = Y_est[k] + Y_res[ pool[k] ];
fake_data(k, 1) = T_series[k];
}
// Call the optimization
Rcpp::List opt_results = constrOptim(Rcpp::_["theta"] = init_val,
Rcpp::_["f"] = SDK_pred_mad,
Rcpp::_["data_in"] = fake_data,
Rcpp::_["grad"] = "NULL",
Rcpp::_["method"] = "Nelder-Mead",
Rcpp::_["ui"] = U,
Rcpp::_["ci"] = C );
/// fill the output matrix ///
// need to create an place holder arma vector for the parameter output
arma::vec opt_param = Rcpp::as<arma::vec>(opt_results[0]);
Boots_out(i, 0) = opt_param[0];
Boots_out(i, 1) = opt_param[1];
Boots_out(i, 2) = opt_param[2];
// for the cost function value at optimization
arma::vec opt_value = Rcpp::as<arma::vec>(opt_results[1]);
Boots_out(i, 3) = opt_value[0];
// for the number of function calls (?)
arma::vec counts = Rcpp::as<arma::vec>(opt_results[2]);
Boots_out(i, 4) = counts[0];
// for thhe convergence code
arma::vec convergence = Rcpp::as<arma::vec>(opt_results[3]);
Boots_out(i, 5) = convergence[0];
}
}
};
// [[Rcpp::export]]
arma::mat SDK_boots_test(arma::vec init_val, arma::mat data_in, int boots_n){
//First establish theta_sp, estimate and residuals
const int N = arma::size(data_in)[0];
// Create the constraints for the constrained optimization
// Make a boundry boundry condition matrix of the form Ui*theta - ci >= 0
arma::mat U(6, 3);
U(0, 0) = 1;
U(1, 0) = -1;
U(2, 0) = 0;
U(3, 0) = 0;
U(4, 0) = 0;
U(5, 0) = 0;
U(0, 1) = 0;
U(1, 1) = 0;
U(2, 1) = 1;
U(3, 1) = -1;
U(4, 1) = 0;
U(5, 1) = 0;
U(0, 2) = 0;
U(1, 2) = 0;
U(2, 2) = 0;
U(3, 2) = 0;
U(4, 2) = 1;
U(5, 2) = -1;
arma::colvec C(6);
C[0] = 0;
C[1] = -data_in(N-1, 9)-0.5;
C[2] = 0;
C[3] = -3;
C[4] = 0;
C[5] = -50;
Rcpp::Function SDK_est( "SDK_est");
Rcpp::Function SDK_res( "SDK_res");
arma::vec Y_est = as<arma::vec>(SDK_est(init_val, data_in));
arma::vec Y_res = as<arma::vec>(SDK_res(init_val, data_in));
// Generate feed items for the Bootstrap Worker
arma::vec T_series = data_in( span(0, N-1), 9);
arma::vec Boots_reps(boots_n+1);
// Allocate the output matrix
arma::mat Boots_out(boots_n, 6);
// Pass input and output the Bootstrap Worker
Boot_Worker Boot_Worker(Boots_reps, init_val, T_series, Y_est, Y_res, U, C, N, Boots_out);
// Now finnaly call the parallel for loop
parallelFor(0, Boots_reps.size(), Boot_Worker);
return Boots_out;
}
So I wrote back in my 'heat algorithm' to solve the optimization, this is entirely in Rcpp-armadillo, this simplifies the code massively as the constraints are written into the optimizer. Additionally, I removed the randomization, so it just has to solve the same optimization; just to see if that was the only problem. Without fail I am still having the same 'fatal error'.
as it stands here is code:
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::depends("RcppParallel")]]
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include <random>
using namespace Rcpp;
using namespace std;
using namespace arma;
using namespace RcppParallel;
struct Boot_Worker : public Worker {
//Generate Inputs
// Source vector to keep track of the number of bootstraps
const arma::vec Boot_reps;
// Initial non-linear theta parameter values
const arma::vec init_val;
// Decimal date vector
const arma::colvec T_series;
// Generate the price series observational vector
const arma::colvec Y_est;
const arma::colvec Y_res;
const int N;
// Generate Output Matrix
arma::mat Boots_out;
// Initialize with the proper input and output
Boot_Worker( const arma::vec Boot_reps, const arma::vec init_val, const arma::colvec T_series, const arma::colvec Y_est, const arma::colvec Y_res, const int N, arma::mat Boots_out)
: Boot_reps(Boot_reps), init_val(init_val), T_series(T_series), Y_est(Y_est), Y_res(Y_res), N(N), Boots_out(Boots_out) {}
void operator()(std::size_t begin, std::size_t end){
//load necessary stuffs from around
Rcpp::Function SDK_heat( "SDK_heat");
arma::mat fake_data(N,2);
arma::colvec index(N);
for(unsigned int i = begin; i < end; i ++){
// Need a nested loop to create and fill the fake data matrix
//arma::vec pool = arma::shuffle( arma::regspace(0, N-1) );
for(int k = 0; k <= N-1; k++){
fake_data(k, 0) = Y_est[k] + Y_res[ k ];
//fake_data(k, 0) = Y_est[k] + Y_res[ pool[k] ];
fake_data(k, 1) = T_series[k];
}
// Call the optimization
arma::vec opt_results = Rcpp::as<arma::vec>( SDK_heat(Rcpp::_["data_in"] = fake_data, Rcpp::_["tol"] = 0.1) );
/// fill the output matrix ///
// need to create an place holder arma vector for the parameter output
Boots_out(i, 0) = opt_results[0];
Boots_out(i, 1) = opt_results[1];
Boots_out(i, 2) = opt_results[2];
// for the cost function value at optimization
Boots_out(i, 3) = opt_results[3];
}
}
};
// [[Rcpp::export]]
arma::mat SDK_boots_test(arma::vec init_val, arma::mat data_in, int boots_n){
//First establish theta_sp, estimate and residuals
const int N = arma::size(data_in)[0];
Rcpp::Function SDK_est( "SDK_est");
Rcpp::Function SDK_res( "SDK_res");
const arma::vec Y_est = as<arma::vec>(SDK_est(init_val, data_in));
const arma::vec Y_res = as<arma::vec>(SDK_res(init_val, data_in));
// Generate feed items for the Bootstrap Worker
const arma::vec T_series = data_in( span(0, N-1), 9);
arma::vec Boots_reps(boots_n+1);
// Allocate the output matrix
arma::mat Boots_out(boots_n, 4);
// Pass input and output the Bootstrap Worker
Boot_Worker Boot_Worker(Boots_reps, init_val, T_series, Y_est, Y_res, N, Boots_out);
// Now finnaly call the parallel for loop
parallelFor(0, Boots_reps.size(), Boot_Worker);
return Boots_out;
}
Looking at your code I see the following:
struct Boot_Worker : public Worker {
[...]
void operator()(std::size_t begin, std::size_t end){
//load necessary stuffs from around
Rcpp::Environment stats("package:stats");
Rcpp::Function constrOptim = stats["constrOptim"];
Rcpp::Function SDK_pred_mad( "SDK_pred_mad");
[...]
// Call the optimization
Rcpp::List opt_results = constrOptim(Rcpp::_["theta"] = init_val,
Rcpp::_["f"] = SDK_pred_mad,
Rcpp::_["data_in"] = fake_data,
Rcpp::_["grad"] = "NULL",
Rcpp::_["method"] = "Nelder-Mead",
Rcpp::_["ui"] = U,
Rcpp::_["ci"] = C );
You are calling an R function from a multi-threaded C++ context. That's something you should not do. R is single-threaded so this will lead to undefined behavior or crashes:
API Restrictions
The code that you write within parallel workers should not call the R or Rcpp API in any fashion. This is because R is single-threaded and concurrent interaction with it’s data structures can cause crashes and other undefined behavior. Here is the official guidance from Writing R Extensions:
Calling any of the R API from threaded code is ‘for experts only’: they will need to read the source code to determine if it is thread-safe. In particular, code which makes use of the stack-checking mechanism must not be called from threaded code.
Besides, calling back to R from C++ even in a single threaded context is not the best thing you can do for performance. It should be more efficient to use a optimization library that offers a direct C(++) interface. One possibility might be the development version of nlopt, c.f. this issue for a discussion and references to examples. In addition, std::random_shuffle is not only deprecated in C++14 and removed from C++17, but it is also not thread-safe.
In your second example, you say that the function SDK_heat is actually implemented in C++. In that case you can call it directly:
Remove importing the corresponding R function, i.e. the Rcpp::Function SDK_heat( "SDK_heat");
Make sure that the compiler knows the declaration of the C++ function and that the linker has the actual function:
Quick and dirty: Copy the function definition into your cpp file before the definition of BootWorker.
For a cleaner approach, see section "1.10 Sharing code" in the Rcpp attributes vignette
Call the function like any other C++ function, i.e. using positional arguments with types compatible to the function declaration.
All this assumes you are using sourceCpp as indicated by your usage of [[Rcpp::depends(...)]]. You are reaching a complexity that warrants to build a package from this.