I am writing a function f to be used in a Runge-Kutta integrator:
output RungeKutta(function f, initial conditions IC, etc.)
Since the function will be called many times, I am looking for a way to generate the function f at compile time.
In this case, f depends on a fixed parameter vector p, which is sparse and known before the code is compiled. To be concrete:
double f(const std::vector<double>& x) {
    return dot(x, p); // pseudocode: the dot product x . p
}
Since p is sparse, computing the full dot product in f is inefficient. Hard-coding the non-zero terms of x . p seems to be the way to go, but p can be very long (around 1000 elements).
What are my options?
Is writing another program (taking p as input) to generate a .cpp file my only option?
Thanks for the comments. Here is a more concrete example of the differential equation.
dy/dx = f_p(x)
One example for f_p(x):
p = [0, 1, 0]; x = [x1, x2, x3]
double f_p(const std::vector<double>& x) {
    return x[1]; // This is what I meant by hard-coding: only p[1] is non-zero
}
instead of:
double f(const std::vector<double>& p, const std::vector<double>& x) {
    double r = 0;
    for (std::size_t i = 0; i < p.size(); i++) {
        r += p[i] * x[i];
    }
    return r;
}
The key problem you are trying to solve is that a "leaf" function in your calculation, one that will be called many times, will most often do no useful work given the problem domain. The hope is that the redundant work - namely multiplying a value with an element of an array known at compile time to be zero - can be collapsed as part of a compile-time step.
C++ has language facilities to deal with this, namely template metaprogramming. C++ templates are very powerful (i.e., Turing complete) and allow for things like recursive calculations based on compile-time constants.
Below is an example of how to implement your example using templates and template specialization (you can also find a runnable example I've created here http://ideone.com/BDtBt7). The basic idea behind the code is to generate a type with a static function that returns the resulting dot product of an input vector of values and a compile time constant array. The static function recursively calls instances of itself, passing a lower index value as it moves through the input/constant arrays of elements. It is also templated with whether the value in the compile time constant array p that is being evaluated is zero. If it is, we can skip calculating that and move onto the next value in the recursion. Lastly, there is a base case that stops the recursion once we have reached the first element in the array.
#include <array>
#include <iostream>
#include <vector>
constexpr std::array<double, 5> p = { 1.0, 0.0, 3.0, 5.0, 0.0 };
template<size_t index, bool isZero>
struct DotProductCalculator
{
static double Calculate(const std::vector<double>& xArg)
{
return (xArg[index] * p[index])
+ DotProductCalculator<index - 1, p[index - 1] == 0.0>::Calculate(xArg);
}
};
template<>
struct DotProductCalculator<0, true>
{
static double Calculate(const std::vector<double>& xArg)
{
return 0.0;
}
};
template<>
struct DotProductCalculator<0, false>
{
static double Calculate(const std::vector<double>& xArg)
{
return xArg[0] * p[0];
}
};
template<size_t index>
struct DotProductCalculator<index, true>
{
static double Calculate(const std::vector<double>& xArg)
{
return 0.0 + DotProductCalculator<index - 1, p[index - 1] == 0.0>::Calculate(xArg);
}
};
template<typename ArrayType>
double f_p_driver(const std::vector<double>& xArg, const ArrayType& pAsArgument)
{
return DotProductCalculator<std::tuple_size<ArrayType>::value - 1,
p[std::tuple_size<ArrayType>::value -1] == 0.0>::Calculate(xArg);
}
int main()
{
std::vector<double> x = { 1.0, 2.0, 3.0, 4.0, 5.0 };
double result = f_p_driver(x, p);
std::cout << "Result: " << result;
return 0;
}
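As an aside, if C++17 is available, the same zero-term elimination can be expressed far more compactly with an index sequence and a fold expression. This is my sketch, not part of the original answer; it assumes the same constexpr array p:
#include <array>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>
constexpr std::array<double, 5> p = { 1.0, 0.0, 3.0, 5.0, 0.0 };
// Expand the dot product over all indices; terms where p[I] == 0.0 fold to
// the constant 0.0, so the optimizer drops them entirely.
template<std::size_t... I>
double dot_impl(const std::vector<double>& x, std::index_sequence<I...>)
{
    return (... + (p[I] == 0.0 ? 0.0 : p[I] * x[I]));
}
double f_p(const std::vector<double>& x)
{
    return dot_impl(x, std::make_index_sequence<p.size()>{});
}
int main()
{
    std::vector<double> x = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    std::cout << "Result: " << f_p(x) << '\n'; // 1*1 + 3*3 + 5*4 = 30
    return 0;
}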
You say in the comments that P really is a row or column of a matrix, and that the matrix is sparse. I'm not familiar with the specific physical problem you are solving, but often, sparse matrices have a fixed diagonal "banding" structure of some kind, e.g.:
| a1 b1 0 0 0 0 0 d1 |
| c1 a2 b2 0 0 0 0 0 |
| 0 c2 a3 b3 0 0 0 0 |
| 0 0 c3 a4 b4 0 0 0 |
| 0 0 0 c4 a5 b5 0 0 |
| 0 0 0 0 c5 a6 b6 0 |
| 0 0 0 0 0 c6 a7 b7 |
| e1 0 0 0 0 0 c7 a8 |
The most efficient way to store such matrices tends to be to store the diagonals as arrays/vectors, so:
A = [a1, a2, a3, a4, a5, a6, a7, a8]
B = [b1, b2, b3, b4, b5, b6, b7]
C = [c1, c2, c3, c4, c5, c6, c7]
D = [d1]
E = [e1]
Multiplying a row-vector X = [x1, x2, x3, x4, x5, x6, x7, x8] by the above matrix thus becomes:
Y = X . M
Y[0] = X[0] * A[0] + X[1] * C[0] + X[7] * E[0]
Y[1] = X[0] * B[0] + X[1] * A[1] + X[2] * C[1]
etc.
or more generally:
Y[i] = X[i-7] * D[i-7] + X[i-1] * B[i-1] + X[i] * A[i] + X[i+1] * C[i] + X[i+7] * E[i]
where out-of-range array accesses (index < 0 or >= 8) should be treated as evaluating to 0. To avoid having to test for out-of-bounds everywhere, you can actually store each diagonal and the vector itself in oversize arrays which have leading and trailing elements filled with zeroes.
Note that this will also be highly cache efficient, as all array accesses are linear.
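For illustration, the tridiagonal part of such a product might look like this (my sketch, not part of the original answer; the D and E corner terms would be handled analogously):
#include <cstddef>
#include <vector>
// Y = X . M restricted to the A, B, C diagonals above:
// Y[i] = X[i-1]*B[i-1] + X[i]*A[i] + X[i+1]*C[i], out-of-range terms dropped.
std::vector<double> bandedProduct(const std::vector<double>& A,
                                  const std::vector<double>& B,
                                  const std::vector<double>& C,
                                  const std::vector<double>& X)
{
    const std::size_t n = A.size();
    std::vector<double> Y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        Y[i] = X[i] * A[i];
        if (i > 0)     Y[i] += X[i - 1] * B[i - 1];
        if (i + 1 < n) Y[i] += X[i + 1] * C[i];
    }
    return Y;
}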
With the given constraints I would create a custom function object which stores the matrix p and computes the operation in its function call operator. I would implement two versions of the function: one which preprocesses the matrix upon construction to "know" where the non-zero elements are, and one which just does the operations as stated, accepting that many of the computations simply produce 0. The quoted figure of 10% non-zero elements sounds too dense for the complication of exploiting the sparsity to pay off.
Ignoring that p is a matrix and using it as a vector, the version without preprocessing would be something like this:
#include <numeric>
#include <vector>
class dotProduct {
    std::vector<double> p;
public:
    dotProduct(std::vector<double> const& p): p(p) {}
    double operator()(std::vector<double> const& x) const {
        return std::inner_product(p.begin(), p.end(), x.begin(), 0.0);
    }
};
// ...
... RungeKutta(dotProduct(p), initial conditions IC, etc.);
When using C++11, a lambda function could be used instead:
... RungeKutta([=](std::vector<double> const& x) {
        return std::inner_product(p.begin(), p.end(), x.begin(), 0.0);
    }, initial conditions IC, etc.);
For the preprocessing version you'd store a std::vector<std::pair<double, std::size_t>> indicating which indices actually need to be multiplied:
class sparseDotProduct {
    typedef std::vector<std::pair<double, std::size_t>> Vector;
    Vector p;
public:
    sparseDotProduct(std::vector<double> const& op) {
        for (std::size_t i(0), s(op.size()); i != s; ++i) {
            if (op[i]) {
                p.push_back(std::make_pair(op[i], i));
            }
        }
    }
    double operator()(std::vector<double> const& x) const {
        double result(0);
        for (Vector::const_iterator it(p.begin()), end(p.end()); it != end; ++it) {
            result += it->first * x[it->second];
        }
        return result;
    }
};
The use of this function object is just the same although it may be reasonable to keep this object around if p doesn't change.
I would personally expect the non-sparse version to outperform the sparse version if 10% of the values are non-zero. However, with these two versions around it should be relatively simple to measure the performance of the different approaches. I wouldn't expect custom-generated code to be substantially better, although it could improve the computation somewhat; if so, it may work to use metaprogramming techniques to create the code, but I doubt that would be very practical.
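For what it's worth, a rough timing harness along these lines (my sketch, not part of the original answer; it assumes the dotProduct and sparseDotProduct classes above) makes the comparison easy:
#include <chrono>
#include <iostream>
#include <random>
#include <vector>
// Time a function object on the same input many times; the checksum keeps
// the compiler from optimizing the calls away.
template <typename F>
double timeIt(F f, const std::vector<double>& x, int reps)
{
    auto t0 = std::chrono::steady_clock::now();
    double sink = 0.0;
    for (int i = 0; i < reps; ++i) sink += f(x);
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "(checksum " << sink << ") ";
    return std::chrono::duration<double>(t1 - t0).count();
}
int main()
{
    std::mt19937 gen(42);
    std::uniform_real_distribution<> val(0.0, 1.0);
    std::vector<double> p(1000), x(1000);
    for (std::size_t i = 0; i < p.size(); ++i) {
        p[i] = (i % 10 == 0) ? val(gen) : 0.0; // roughly 10% non-zero
        x[i] = val(gen);
    }
    std::cout << "dense:  " << timeIt(dotProduct(p), x, 100000) << " s\n";
    std::cout << "sparse: " << timeIt(sparseDotProduct(p), x, 100000) << " s\n";
    return 0;
}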
Related
The problem I'm having is that I'm unsure how to multiply a matrix by the same matrix over and over. What I'm trying to achieve is to be able to update the matrix as I go. Here is my code:
int fib3(int a, int b, int n) {
int num[2][2] = { {0,1}, {1,1} };
const int num2[2][2] = { {0,1}, {1,1} };
int factArray[2][1] = { {0}, {1} };
if (n == 0) {
return a;
}
else if (n == 1) {
return b;
}
else {
for (int i = 0; i <= n; i++) {
num[0][0] = ((num2[0][0] * 0) + num2[0][1] * 1);
num[0][1] = ((num2[0][0] * 1) + num2[0][1] * 1);
num[1][0] = ((num2[1][0] * 0) + num2[1][1] * 1);
num[1][1] = ((num2[1][0] * 1) + num2[1][1] * 1);
}
factArray[0][0] = ((num[0][0] * factArray[0][0]) + num[0][1] * factArray[1][0]);
factArray[1][0] = ((num[1][0] * factArray[0][0]) + num[1][1] * factArray[1][0]);
return factArray[0][0];
}
Here I would take the previous matrix and multiply it by a constant matrix, but I am unsure how to update the matrix as I do.
So the matrix is raised to some power.
So for example, I want to find f(5), the 5th Fibonacci number, which should be 5, and I am getting 1 as the result of the program.
The formula in matrix representation is mainly of interest for theoretical analysis. The trick is that you always have two elements of the sequence in the vector, instead of having to refer back to earlier elements of the sequence. However, I don't see the benefit of implementing it compared to using the recursive formula. Consider that
| 1 1 | | a | | a+b |
| 1 0 | * | b | = | a |
Hence the matrix multiplication effectively does exactly the same: add the last two elements, remember the current one (a).
That being said, your code has some problems:
you pass a and b, but you only ever use them for the first and second elements of the sequence. You don't need a and b: the initial values are already in the starting value of the matrix.
you have a loop, but in each iteration you calculate the same values and write them into the same array elements.
I cannot really follow the logic of your code. Why is there another multiplication after the loop? The matrix formula says, in a nutshell, "take some starting vector, apply a matrix n times, done". To be honest I cannot find that anywhere in your code ;)
If you insist on using matrix multiplications, I would suggest staying away from C-style arrays. They don't like to be passed around. Use std::array instead. I have a slight aversion to nesting, hence I'd suggest using
constexpr size_t N = 2;
using matrix = std::array<int,N*N>;
using vector = std::array<int,N>;
std::arrays can be returned with no pain:
vector multiply(const matrix& a,const vector& b) {
vector result;
auto ma = [&a](size_t row,size_t col) { return a[row*N+col];};
result[0] = ma(0,0)*b[0] + ma(0,1)*b[1];
result[1] = ma(1,0)*b[0] + ma(1,1)*b[1];
return result;
}
Now it should be straightforward to implement the Fibonacci sequence.
Spoiler Alert
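A minimal complete sketch of that implementation (my code, building on the multiply function above):
#include <array>
#include <cstddef>
#include <iostream>
constexpr size_t N = 2;
using matrix = std::array<int, N * N>;
using vector = std::array<int, N>;
vector multiply(const matrix& a, const vector& b) {
    vector result;
    auto ma = [&a](size_t row, size_t col) { return a[row * N + col]; };
    result[0] = ma(0, 0) * b[0] + ma(0, 1) * b[1];
    result[1] = ma(1, 0) * b[0] + ma(1, 1) * b[1];
    return result;
}
// Apply the step matrix n times to the starting vector (fib(1), fib(0)).
int fib(int n) {
    const matrix step = { 1, 1,
                          1, 0 };
    vector v = { 1, 0 };
    for (int i = 0; i < n; ++i) v = multiply(step, v);
    return v[1]; // v is now (fib(n+1), fib(n))
}
int main() {
    std::cout << fib(5) << '\n'; // prints 5
    return 0;
}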
I am trying to evaluate the coefficients and time of two fifth-order polynomials (one each for x and y position) that minimizes effort and time (the objective function) when connecting an initial position, velocity, and orientation to a desired final position and orientation with 0 velocity (equality constraints). Here is the code:
#include <vector>
#include <cppad/cppad.hpp>
#include <cppad/ipopt/solve.hpp>
using CppAD::AD;
typedef struct {
double x, y, theta, linear_velocity;
} Waypoint;
typedef std::vector<Waypoint> WaypointList;
struct TrajectoryConfig {
//! gain on accumulated jerk term in cost function
double Kj;
//! gain on time term in cost function
double Kt;
//! gain on terminal velocity term in cost function
double Kv;
};
class Trajectory {
public:
explicit Trajectory(TrajectoryConfig config);
~Trajectory();
void updateConfigs(TrajectoryConfig config);
void solve(WaypointList waypoints);
private:
//! solution vector
std::vector<double> solution_;
//! gain on accumulated jerk term in cost function
double Kj_;
//! gain on time term in cost function
double Kt_;
//! gain on terminal velocity term in cost function
double Kv_;
};
/*
Trajectory(TrajectoryConfig)
class constructor. Initializes class given configuration struct
*/
Trajectory::Trajectory(TrajectoryConfig config) {
Kj_ = config.Kj;
Kt_ = config.Kt;
Kv_ = config.Kv;
}
Trajectory::~Trajectory() {
std::cerr << "Trajectory Destructor!" << std::endl;
}
enum Indices { A0 = 0, A1, A2, A3, A4, A5, B0, B1, B2, B3, B4, B5, T };
class FGradEval {
public:
size_t M_;
// gains on cost;
double Kj_, Kt_;
// constructor
FGradEval(double Kj, double Kt) {
M_ = 13; // no. of parameters per trajectory segment: 2 x 6 coefficients + 1 time
Kj_ = Kj;
Kt_ = Kt;
}
typedef CPPAD_TESTVECTOR(AD<double>) ADvector;
void operator()(ADvector& fgrad, const ADvector& vars) {
fgrad[0] = 0;
AD<double> accum_jerk;
AD<double> a0, a1, a2, a3, a4, a5;
AD<double> b0, b1, b2, b3, b4, b5;
AD<double> T, T2, T3, T4, T5;
AD<double> x, y, vx, vy;
size_t offset = 1;
a0 = vars[Indices::A0];
a1 = vars[Indices::A1];
a2 = vars[Indices::A2];
a3 = vars[Indices::A3];
a4 = vars[Indices::A4];
a5 = vars[Indices::A5];
b0 = vars[Indices::B0];
b1 = vars[Indices::B1];
b2 = vars[Indices::B2];
b3 = vars[Indices::B3];
b4 = vars[Indices::B4];
b5 = vars[Indices::B5];
T = vars[Indices::T];
T2 = T*T;
T3 = T*T2;
T4 = T*T3;
T5 = T*T4;
x = a0 + a1*T + a2*T2 + a3*T3 + a4*T4 + a5*T5;
y = b0 + b1*T + b2*T2 + b3*T3 + b4*T4 + b5*T5;
vx = a1 + 2*a2*T + 3*a3*T2 + 4*b4*T3 + 5*a5*T4;
vy = b1 + 2*b2*T + 3*b3*T2 + 4*b4*T3 + 5*b5*T4;
//! cost-terms
//! accum_jerk is the analytic integral of int_0^T (jerk_x^2 + jerk_y^2) dt
accum_jerk = 36 * T * (a3*a3 + b3*b3) + 144 * T2 * (a3*a4 + b3*b4) + T3 * (240*(a3*a5 + b3*b5) + 192*(a4*a4 + b4*b4))
+ 720 * T4 * (a4*a5 + b4*b5) + 720 * T5 * (a5*a5 + b5*b5);
fgrad[0] += Kj_ * accum_jerk;
fgrad[0] += Kt_ * T;
//! initial equality constraints
fgrad[offset] = vars[Indices::A0];
fgrad[1 + offset] = vars[Indices::B0];
fgrad[2 + offset] = vars[Indices::A1];
fgrad[3 + offset] = vars[Indices::B1];
offset += 4;
//! terminal inequality constraints
fgrad[offset] = x;
fgrad[offset + 1] = y;
fgrad[offset + 2] = vx;
fgrad[offset + 3] = vy;
}
};
void Trajectory::solve(WaypointList waypoints) {
if (waypoints.size() != 2) {
std::cerr << "Trajectory::solve - Function requires 2 waypoints." << std::endl;
return;
}
//! status flag for solution
bool ok;
//! typedef for ipopt/cppad
typedef CPPAD_TESTVECTOR(double) Dvector;
//! no. of variables for optimization problem
size_t n_vars = 13;
//! no. of constraints
size_t n_cons = 4 * 2; // the start and final waypoint each contribute 4 constraints (x, y, theta, v) -> (x, y, vx, vy)
//! create vector container for optimizer solution
//! and initialize it to zero
Dvector vars(n_vars);
for (size_t i = 0; i < n_vars; i++) {
vars[i] = 0;
}
//! set initial state (this will only determine the first two coefficients of the initial polynomials)
double v = (fabs(waypoints[0].linear_velocity) < 1e-3)
? 1e-3 : waypoints[0].linear_velocity;
vars[Indices::A0] = waypoints[0].x;
vars[Indices::B0] = waypoints[0].y;
vars[Indices::A1] = v * cos(waypoints[0].theta);
vars[Indices::B1] = v * sin(waypoints[0].theta);
vars[Indices::T] = 0;
//! there are no explicit bounds on vars, so set to something large for the optimizer
//! we could perhaps put bounds on the coeffs corresponding to acc, jerk, snap, ..
Dvector vars_lb(n_vars);
Dvector vars_ub(n_vars);
for (size_t i = 0; i < n_vars; i++) {
vars_lb[i] = -1e10;
vars_ub[i] = 1e10;
}
//! time must be non-negative!
vars_lb[Indices::T] = 0;
//! set the bounds on the constraints
Dvector cons_lb(n_cons);
Dvector cons_ub(n_cons);
//! offset term on index
size_t offset = 0;
//! initial equality constraint - we must start from where we are!
cons_lb[0] = waypoints[0].x;
cons_ub[0] = waypoints[0].x;
cons_lb[1] = waypoints[0].y;
cons_ub[1] = waypoints[0].y;
cons_lb[2] = v * cos(waypoints[0].theta);
cons_ub[2] = v * cos(waypoints[0].theta);
cons_lb[3] = v * sin(waypoints[0].theta);
cons_ub[3] = v * sin(waypoints[0].theta);
offset += 4;
//! terminal point
cons_lb[offset] = waypoints[1].x;
cons_ub[offset] = waypoints[1].x;
cons_lb[offset + 1] = waypoints[1].y;
cons_ub[offset + 1] = waypoints[1].y;
cons_lb[offset + 2] = 1e-3 * cos(waypoints[1].theta);
cons_ub[offset + 2] = 1e-3 * cos(waypoints[1].theta);
cons_lb[offset + 3] = 1e-3 * sin(waypoints[1].theta);
cons_ub[offset + 3] = 1e-3 * sin(waypoints[1].theta);
//! create instance of objective function class
FGradEval fg_eval(Kj_, Kt_);
//! IPOPT INITIALIZATION
std::string options;
options += "Integer print_level 5\n";
options += "Sparse true forward\n";
options += "Sparse true reverse\n";
options += "Integer max_iter 100\n";
// options += "Numeric tol 1e-4\n";
//! compute the solution
CppAD::ipopt::solve_result<Dvector> solution;
//! solve
CppAD::ipopt::solve<Dvector, FGradEval>(
options, vars, vars_lb, vars_ub, cons_lb, cons_ub, fg_eval, solution);
//! check if the solver was successful
ok = solution.status == CppAD::ipopt::solve_result<Dvector>::success;
//! if the solver was unsuccessful, exit
//! this case will be handled by calling method
if (!ok) {
std::cerr << "Trajectory::solve - Failed to find a solution!" << std::endl;
return;
}
//! (DEBUG) output the final cost
std::cout << "Final Cost: " << solution.obj_value << std::endl;
//! populate output with argmin vector
for (size_t i = 0; i < n_vars; i++) {
solution_.push_back(solution.x[i]);
}
return;
}
Where I am having problems is in the following:
The initial equality constraint (starting position, velocity, and orientation) is being upheld, while the terminal velocity constraint is not. The algorithm terminates at the correct final (x,y,angle), but the velocity is not zero. I have looked through the code and I cannot understand why the position and orientation at the endpoint would be obeyed while the velocity would not. My suspicion is that my definition of the equality constraints is not what I think it is.
The problem does not converge reliably, even though it seems a fairly simple problem as defined (see the output below).
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************
This is Ipopt version 3.11.9, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).
Number of nonzeros in equality constraint Jacobian...: 30
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 23
Total number of variables............................: 13
variables with only lower bounds: 0
variables with lower and upper bounds: 13
variables with only upper bounds: 0
Total number of equality constraints.................: 8
Total number of inequality constraints...............: 0
inequality constraints with only lower bounds: 0
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 0
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 9.9999900e-03 1.00e+00 5.00e-04 -1.0 0.00e+00 - 0.00e+00 0.00e+00 0
1 5.9117705e-02 1.00e+00 1.20e+02 -1.0 5.36e+07 - 1.04e-05 7.63e-06f 18
2 1.1927070e+00 1.00e+00 2.62e+06 -1.0 9.21e+05 -4.0 6.16e-15 2.29e-23H 1
3 2.9689692e-01 1.00e+00 1.80e+05 -1.0 2.24e+13 - 1.83e-07 8.42e-10f 20
4r 2.9689692e-01 1.00e+00 1.00e+03 -0.0 0.00e+00 - 0.00e+00 4.58e-07R 11
5r 2.1005820e+01 9.99e-01 5.04e+02 -0.0 6.60e-02 - 9.90e-01 4.95e-01f 2
6r 7.7118141e+04 9.08e-01 5.18e+03 -0.0 2.09e+00 - 4.21e-01 1.00e+00f 1
7r 1.7923891e+04 7.82e-01 1.54e+03 -0.0 3.63e+00 - 9.90e-01 1.00e+00f 1
8r 5.9690221e+03 5.41e-01 5.12e+02 -0.0 2.92e+00 - 9.90e-01 1.00e+00f 1
9r 4.6855625e+03 5.54e-01 1.95e+02 -0.0 5.14e-01 - 9.92e-01 1.00e+00f 1
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
10r 8.4901226e+03 5.55e-01 5.18e+01 -0.0 2.24e-01 - 1.00e+00 1.00e+00f 1
Number of Iterations....: 10
(scaled) (unscaled)
Objective...............: 8.4901225582208808e+03 8.4901225582208808e+03
Dual infeasibility......: 6.3613117039244315e+06 6.3613117039244315e+06
Constraint violation....: 5.5503677023620179e-01 5.5503677023620179e-01
Complementarity.........: 9.9999982900301554e-01 9.9999982900301554e-01
Overall NLP error.......: 6.3613117039244315e+06 6.3613117039244315e+06
Number of objective function evaluations = 43
Number of objective gradient evaluations = 6
Number of equality constraint evaluations = 71
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 12
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 10
Total CPU secs in IPOPT (w/o function evaluations) = 0.006
Total CPU secs in NLP function evaluations = 0.001
EXIT: Maximum Number of Iterations Exceeded.
I am not looking for an answer to my problem specifically. What I am hoping for are some suggestions as to why my problem may not be working as expected. Specifically, do my constraints make sense, as defined? Is the variable initialization done properly?
The problem was in the following lines:
x = a0 + a1*T + a2*T2 + a3*T3 + a4*T4 + a5*T5;
y = b0 + b1*T + b2*T2 + b3*T3 + b4*T4 + b5*T5;
vx = a1 + 2*a2*T + 3*a3*T2 + 4*b4*T3 + 5*a5*T4;
vy = b1 + 2*b2*T + 3*b3*T2 + 4*b4*T3 + 5*b5*T4;
Specifically,
vx = a1 + 2*a2*T + 3*a3*T2 + 4*b4*T3 + 5*a5*T4;
should be
vx = a1 + 2*a2*T + 3*a3*T2 + 4*a4*T3 + 5*a5*T4;
based upon the mapping of a's to the x-coordinate and b's to the y-coordinate.
This fixed the problem of constraint violation.
With regards to the problem of convergence/feasibility, I found that ensuring that the initial guess is in the feasible set (obeys the equality constraints) fixed this problem; measures of optimizer performance (inf_pr and inf_du, etc...) were much smaller after fixing the initial condition.
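For illustration, the feasible seeding amounts to something like the following sketch (my reconstruction, not the original fix; the exact values are problem-dependent):
// Sketch: start at a point satisfying the initial equality constraints,
// and give T a strictly positive guess rather than 0, which sits exactly
// on its lower bound.
vars[Indices::A0] = waypoints[0].x;
vars[Indices::B0] = waypoints[0].y;
vars[Indices::A1] = v * cos(waypoints[0].theta);
vars[Indices::B1] = v * sin(waypoints[0].theta);
vars[Indices::T]  = 1.0; // hypothetical guess; any T > 0 away from the bound helps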
Consider the following problem: given two points A and B defining a line, and a third point M, compute the orthogonal projection P of M onto the line, the distance d0 from M to P, the distance d1 from A to P, and the distance d2 from P to B.
My question is the following: how do I optimize the following independent functions:
// Computation of the coordinates of P
inline std::array<double, 3> P(const std::array<double, 3>& A,
const std::array<double, 3>& B,
const std::array<double, 3>& M)
{
// The most inefficient version in the world (to be verified)
std::array<double, 3> AB = {B[0]-A[0], B[1]-A[1], B[2]-A[2]};
std::array<double, 3> AM = {M[0]-A[0], M[1]-A[1], M[2]-A[2]};
double norm = std::sqrt(AB[0]*AB[0]+AB[1]*AB[1]+AB[2]*AB[2]);
double dot = AB[0]*AM[0]+AB[1]*AM[1]+AB[2]*AM[2];
double d1 = dot/norm;
std::array<double, 3> AP = {AB[0]*(d1/norm), AB[1]*(d1/norm), AB[2]*(d1/norm)};
std::array<double, 3> P = {A[0]+AP[0], A[1]+AP[1], A[2]+AP[2]};
return P;
}
// Computation of the distance d0
inline double d0(const std::array<double, 3>& A,
const std::array<double, 3>& B,
const std::array<double, 3>& M)
{
// The most inefficient version in the world (to be verified)
std::array<double, 3> AB = {B[0]-A[0], B[1]-A[1], B[2]-A[2]};
std::array<double, 3> AM = {M[0]-A[0], M[1]-A[1], M[2]-A[2]};
double norm = std::sqrt(AB[0]*AB[0]+AB[1]*AB[1]+AB[2]*AB[2]);
double dot = AB[0]*AM[0]+AB[1]*AM[1]+AB[2]*AM[2];
double d1 = dot/norm;
std::array<double, 3> AP = {AB[0]*(d1/norm), AB[1]*(d1/norm), AB[2]*(d1/norm)};
std::array<double, 3> P = {A[0]+AP[0], A[1]+AP[1], A[2]+AP[2]};
std::array<double, 3> MP = {P[0]-M[0], P[1]-M[1], P[2]-M[2]};
double d0 = std::sqrt(MP[0]*MP[0]+MP[1]*MP[1]+MP[2]*MP[2]);
return d0;
}
// Computation of the distance d1
inline double d1(const std::array<double, 3>& A,
const std::array<double, 3>& B,
const std::array<double, 3>& M)
{
// The most inefficient version in the world (to be verified)
std::array<double, 3> AB = {B[0]-A[0], B[1]-A[1], B[2]-A[2]};
std::array<double, 3> AM = {M[0]-A[0], M[1]-A[1], M[2]-A[2]};
double norm = std::sqrt(AB[0]*AB[0]+AB[1]*AB[1]+AB[2]*AB[2]);
double dot = AB[0]*AM[0]+AB[1]*AM[1]+AB[2]*AM[2];
double d1 = dot/norm;
return d1;
}
// Computation of the distance d2
inline double d2(const std::array<double, 3>& A,
const std::array<double, 3>& B,
const std::array<double, 3>& M)
{
// The most inefficient version in the world (to be verified)
std::array<double, 3> AB = {B[0]-A[0], B[1]-A[1], B[2]-A[2]};
std::array<double, 3> AM = {M[0]-A[0], M[1]-A[1], M[2]-A[2]};
double norm = std::sqrt(AB[0]*AB[0]+AB[1]*AB[1]+AB[2]*AB[2]);
double dot = AB[0]*AM[0]+AB[1]*AM[1]+AB[2]*AM[2];
double d1 = dot/norm;
double d2 = norm-d1;
return d2;
}
so that each function is as optimized as possible? (I will execute these functions billions of times.)
From an algorithmic point of view, you can calculate the projection of one vector onto another without using a sqrt call.
Here is the pseudocode, from
http://www.euclideanspace.com/maths/geometry/elements/line/projections/
// projection of vector v1 onto v2
inline vector3 projection( const vector3& v1, const vector3& v2 ) {
float v2_ls = v2.len_squared();
return v2 * ( dot( v2, v1 )/v2_ls );
}
where dot() is the dot product of two vectors and len_squared() is the dot product of a vector with itself.
NOTE: try to precompute the inverse of v2_ls before the main loop, if possible.
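In concrete C++ that might look like this (my rendering; vector3, dot() and len_squared() are not defined in the original answer):
#include <array>
using vector3 = std::array<double, 3>;
inline double dot(const vector3& a, const vector3& b)
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}
// Projection of v1 onto v2, no sqrt involved.
inline vector3 projection(const vector3& v1, const vector3& v2)
{
    const double inv_v2_ls = 1.0 / dot(v2, v2); // precompute if v2 is fixed
    const double s = dot(v2, v1) * inv_v2_ls;
    return { s * v2[0], s * v2[1], s * v2[2] };
}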
It is probably better to compute all requested quantities in a single go.
Let P = A + t . AB be the vector equation giving the position of P. Express that MP and AB are orthogonal: MP . AB = 0 = (MA + t . AB) . AB, which yields t = - (MA . AB) / AB^2, and hence P.
t is the ratio AP / AB, hence d1 = t . |AB|. Similarly, d2 = (1 - t) . |AB|. d0 is obtained from Pythagoras, d0^2 = MA^2 - d1^2, or by direct computation of |MP|.
Accounting: compute MA (3 add), AB (3 add), AB^2 (2 add, 3 mul), MA.AB (2 add, 3 mul), t (1 div), P (3 add, 3 mul), |AB| (1 sqrt), d1 (1 mul), d2 (1 add, 1 mul), MA^2 (2 add, 3 mul), d0 (1 add, 1 mul, 1 sqrt).
Total 17 add, 15 mul, 1 div, 2 sqrt.
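A direct C++ rendering of that accounting (my sketch, not the answerer's code) computes everything in one pass:
#include <algorithm>
#include <array>
#include <cmath>
struct Result { std::array<double, 3> P; double d0, d1, d2; };
// t = (AM . AB) / AB^2, P = A + t * AB, d1 = t*|AB|, d2 = (1-t)*|AB|,
// d0 from Pythagoras (clamped against tiny negative round-off).
inline Result project(const std::array<double, 3>& A,
                      const std::array<double, 3>& B,
                      const std::array<double, 3>& M)
{
    std::array<double, 3> AB, AM;
    for (int i = 0; i < 3; ++i) { AB[i] = B[i] - A[i]; AM[i] = M[i] - A[i]; }
    const double ab2 = AB[0]*AB[0] + AB[1]*AB[1] + AB[2]*AB[2];
    const double am2 = AM[0]*AM[0] + AM[1]*AM[1] + AM[2]*AM[2];
    const double dot = AB[0]*AM[0] + AB[1]*AM[1] + AB[2]*AM[2];
    const double t = dot / ab2;
    const double lenAB = std::sqrt(ab2);
    Result r;
    for (int i = 0; i < 3; ++i) r.P[i] = A[i] + t * AB[i];
    r.d1 = t * lenAB;
    r.d2 = (1.0 - t) * lenAB;
    r.d0 = std::sqrt(std::max(0.0, am2 - r.d1 * r.d1));
    return r;
}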
If you want portable code, so no processor specific features are used, I'd suggest the following:-
1) As I mentioned in my comment above, create a 3D vector class, it will just make it a lot easier to write the code (optimise development time)
2) Create an intersection class that uses lazy evaluation to get P, d0 and d1, like this:-
class Intersection
{
public:
    Intersection (A, B, M) { store A, B, M; constants_calculated = false; }
    Point GetP () { CalculateConstants(); return P; }
    double GetD0 () { CalculateConstants(); return D0; }
    double GetD1 () { CalculateConstants(); return D1; }
private:
    CalculateConstants ()
    {
        if (!constants_calculated)
        {
            calculate and store common expressions required for P, d0 and d1
            constants_calculated = true
        }
    }
};
3) Don't call it a billion times. Not doing something is infinitely quicker. Why does it need to be called so often? Is there a way to do the same thing with fewer calls to find P, d0 and d1?
If you can use processor specific features, then you could look into doing things like using SIMD, but that may require dropping the precision from double to float.
The following is a C++ implementation of the point-to-line projection calculation in 2D, with the line given as A*x + B*y + C = 0:
#include <iostream>
#include <cmath>
using namespace std;
int main() {
    // the point
    double x0 = 1.0, y0 = 1.0;
    // the line equation A*x + B*y + C = 0
    double A = 1.0, B = 1.0, C = 0.0;
    // calc point-to-line distance
    double dist = fabs(A * x0 + B * y0 + C) / sqrt(A * A + B * B);
    // signed factor; using the signed value (not dist) keeps the direction correct
    double t = (A * x0 + B * y0 + C) / (A * A + B * B);
    // calc projected point coordinates
    double x1 = x0 - t * A;
    double y1 = y0 - t * B;
    // result
    cout << "distance: " << dist << endl;
    cout << "project point:(" << x1 << ", " << y1 << ")" << endl;
    return 0;
}
The question is simple. Let's say you have the function
double interpolate (double x);
and you have a table that holds a map of known x -> y values,
for example:
5 15
7 18
10 22
Note: real tables are bigger, of course; this is just an example.
So for 8 you would return 18 + ((8-7)/(10-7))*(22-18) = 19.3333333.
One cool way I found is
http://www.bnikolic.co.uk/blog/cpp-map-interp.html
(long story short it uses std::map, key= x, value = y for x->y data pairs).
If somebody asks what the if-else-if-else way in the title is,
it is basically:
if ((x>=5) && (x<=7))
{
//interpolate
}
else
if((x>=7) && x<=10)
{
//interpolate
}
So is there a more clever way to do it, or is the map way the state of the art? :)
Btw, I prefer solutions in C++, but obviously any language solution that has a 1:1 mapping to C++ is nice.
Well, the easiest way I can think of would be to use a binary search to find the interval where your point lies. Try to avoid maps if you can, as they are very slow in practice.
This is a simple way:
#include <algorithm>
#include <cstdio>
#include <vector>
using namespace std;
const double INF = 1.e100;
vector<pair<double, double> > table;
double interpolate(double x) {
    // Assumes that "table" is sorted by .first
    // Check if x is out of bounds
    if (x > table.back().first) return INF;
    if (x < table[0].first) return -INF;
    vector<pair<double, double> >::iterator it, it2;
    // Find the first entry whose x-coordinate is >= x
    it = lower_bound(table.begin(), table.end(), make_pair(x, -INF));
    // Corner case
    if (it == table.begin()) return it->second;
    it2 = it;
    --it2;
    return it2->second + (it->second - it2->second)*(x - it2->first)/(it->first - it2->first);
}
int main() {
table.push_back(make_pair(5., 15.));
table.push_back(make_pair(7., 18.));
table.push_back(make_pair(10., 22.));
// If you are not sure if table is sorted:
sort(table.begin(), table.end());
printf("%f\n", interpolate(8.));
printf("%f\n", interpolate(10.));
printf("%f\n", interpolate(10.1));
}
You can use a binary search tree to store the interpolation data. This is beneficial when you have a large set of N interpolation points, as interpolation can then be performed in O(log N) time. However, in your example, this does not seem to be the case, and the linear search suggested by RedX is more appropriate.
#include <stdio.h>
#include <assert.h>
#include <map>
static double interpolate (double x, const std::map<double, double> &table)
{
assert(table.size() > 0);
std::map<double, double>::const_iterator it = table.lower_bound(x);
if (it == table.end()) {
return table.rbegin()->second;
} else {
if (it == table.begin()) {
return it->second;
} else {
double x2 = it->first;
double y2 = it->second;
--it;
double x1 = it->first;
double y1 = it->second;
double p = (x - x1) / (x2 - x1);
return (1 - p) * y1 + p * y2;
}
}
}
int main ()
{
std::map<double, double> table;
table.insert(std::pair<double, double>(5, 6));
table.insert(std::pair<double, double>(8, 4));
table.insert(std::pair<double, double>(9, 5));
double y = interpolate(5.1, table);
printf("%f\n", y);
}
Store your points sorted:
index X Y
1 1 -> 3
2 3 -> 7
3 10-> 8
Then loop from max to min, and as soon as you get below a number, you know it's the one you want.
Say you want 6:
// pseudo
for i = 3 to 1
if x[i] <= 6
// you found your range!
// interpolate between x[i] and x[i - 1]
break; // Do not look any further
end
end
Yes, I guess that you should think of a map between those intervals and the natural numbers. I mean, just label the intervals and use a switch:
switch(I) {
case Int1: //whatever
break;
...
default:
}
I don't know, it's the first thing that I thought of.
EDIT: a switch is more efficient than if-else if your numbers are within a relatively small interval (that's something to take into account when doing the mapping).
If your x-coordinates must be irregularly spaced, then store the x-coordinates in sorted order, and use a binary search to find the nearest coordinate, for example using Daniel Fleischman's answer.
However, if your problem permits it, consider pre-interpolating to regularly spaced data. So
5 15
7 18
10 22
becomes
5 15
6 16.5
7 18
8 19.3333333
9 20.6666667
10 22
Then at run-time you can interpolate with O(1) using something like this:
double interp1( double x0, double dx, double* y, int n, double xi )
{
    double f = ( xi - x0 ) / dx;
    if (f<0) return y[0];
    if (f>=(n-1)) return y[n-1];
    int i = (int) f;
    double w = f-(double)i;
    return y[i]*(1.0-w) + y[i+1]*w;
}
using
double y[6] = {15, 16.5, 18, 19.3333333, 20.6666667, 22};
double yi = interp1( 5.0, 1.0, y, 6, xi );
This isn't necessarily suitable for every problem -- you could end up losing accuracy (if there's no nice grid that contains all your x-samples), and it could have a bad cache penalty if it would make your table much much bigger. But it's a good option for cases where you have some control over the x-coordinates to begin with.
How you've already got it is fairly readable and understandable, and there's a lot to be said for that over a "clever" solution. You can however do away with the lower bounds check and clumsy && because the sequence is ordered:
if (x < 5)
return 0;
else if (x <= 7)
// interpolate
else if (x <= 10)
// interpolate
...
I recently came across the following interview question. I was wondering if a dynamic programming approach would work, and/or if there is some kind of mathematical insight that would make the solution easier... It's very similar to how IEEE 754 doubles are constructed.
Question:
There is a vector V of N double values, where the value at the ith index of the vector is equal to 1/2^(i+1), e.g. 1/2, 1/4, 1/8, 1/16, etc.
You're to write a function that takes one double 'r' as input, where 0 < r < 1, and outputs to stdout the indexes of V that, when summed, give a value closer to r than any other combination of indexes from V.
Furthermore, the number of indexes should be a minimum, and in the event there are two solutions, the one closest to zero should be preferred.
void getIndexes(std::vector<double>& V, double r)
{
....
}
int main()
{
std::vector<double> V;
// populate V...
double r = 0.3;
getIndexes(V,r);
return 0;
}
Note: it seems like there are a few SO'ers who aren't in the mood to read the question completely. So let's all note the following:
The solution, a.k.a. the sum, may be larger than r - hence any strategy that incrementally subtracts fractions from r until it hits zero or near zero is wrong.
There are examples of r where there will be two solutions, that is |r-s0| == |r-s1| and s0 < s1 - in this case s0 should be selected. This makes the problem slightly more difficult, as knapsack-style solutions tend to greedily overestimate first.
If you believe this problem is trivial, you most likely haven't understood it. Hence it would be a good idea to read the question again.
EDIT (Matthieu M.): 2 examples for V = {1/2, 1/4, 1/8, 1/16, 1/32}
r = 0.3, S = {1, 3}
r = 0.256652, S = {1}
Algorithm
Consider a target number r and a set F of fractions {1/2, 1/4, ... 1/(2^N)}. Let the smallest fraction, 1/(2^N), be denoted P.
Then the optimal sum will be equal to:
S = P * round(r/P)
That is, the optimal sum S will be some integer multiple of the smallest fraction available, P. The maximum error, err = r - S, is ± 1/2 * 1/(2^N). No better solution is possible because this would require the use of a number smaller than 1/(2^N), which is the smallest number in the set F.
Since the fractions F are all power-of-two multiples of P = 1/(2^N), any integer multiple of P can be expressed as a sum of the fractions in F. To obtain the list of fractions that should be used, encode the integer round(r/P) in binary and read off 1 in the kth binary place as "include the kth fraction in the solution".
Example:
Take r = 0.3 and F as {1/2, 1/4, 1/8, 1/16, 1/32}.
Multiply the entire problem by 32.
Take r = 9.6, and F as {16, 8, 4, 2, 1}.
Round r to the nearest integer.
Take r = 10.
Encode 10 as a binary integer (five places)
10 = 0b 0 1 0 1 0 ( 8 + 2 )
^ ^ ^ ^ ^
| | | | |
| | | | 1
| | | 2
| | 4
| 8
16
Associate each binary bit with a fraction.
= 0b 0 1 0 1 0 ( 1/4 + 1/16 = 0.3125 )
^ ^ ^ ^ ^
| | | | |
| | | | 1/32
| | | 1/16
| | 1/8
| 1/4
1/2
Proof
Consider transforming the problem by multiplying all the numbers involved by 2**N so that all the fractions become integers.
The original problem:
Consider a target number r in the range 0 < r < 1, and a list of fractions {1/2, 1/4, ..., 1/(2**N)}. Find the subset of the list of fractions that sums to S such that error = r - S is minimised.
Becomes the following equivalent problem (after multiplying by 2**N):
Consider a target number r in the range 0 < r < 2**N and a list of integers {2**(N-1), 2**(N-2), ... , 4, 2, 1}. Find the subset of the list of integers that sums to S such that error = r - S is minimised.
Choosing powers of two that sum to a given number (with as little error as possible) is simply binary encoding of an integer. This problem therefore reduces to binary encoding of an integer.
Existence of solution: Any positive floating point number r, 0 < r < 2**N, can be cast to an integer and represented in binary form.
Optimality: The maximum error in the integer version of the solution is the round-off error of ±0.5. (In the original problem, the maximum error is ±0.5 * 1/2**N.)
Uniqueness: for any positive (floating point) number there is a unique integer representation and therefore a unique binary representation. (Possible exception: the 0.5 case - see below.)
Implementation (Python)
This function converts the problem to the integer equivalent, rounds off r to an integer, then reads off the binary representation of r as an integer to get the required fractions.
def conv_frac (r,N):
# Convert to equivalent integer problem.
R = r * 2**N
S = int(round(R))
# Convert integer S to N-bit binary representation (i.e. a character string
# of 1's and 0's.) Note use of [2:] to trim leading '0b' and zfill() to
# zero-pad to required length.
bin_S = bin(S)[2:].zfill(N)
nums = list()
for index, bit in enumerate(bin_S):
k = index + 1
if bit == '1':
print "%i : 1/%i or %f" % (index, 2**k, 1.0/(2**k))
nums.append(1.0/(2**k))
S = sum(nums)
e = r - S
print """
Original number `r` : %f
Number of fractions `N` : %i (smallest fraction 1/%i)
Sum of fractions `S` : %f
Error `e` : %f
""" % (r,N,2**N,S,e)
Sample output:
>>> conv_frac(0.3141,10)
1 : 1/4 or 0.250000
3 : 1/16 or 0.062500
8 : 1/512 or 0.001953
Original number `r` : 0.314100
Number of fractions `N` : 10 (smallest fraction 1/1024)
Sum of fractions `S` : 0.314453
Error `e` : -0.000353
>>> conv_frac(0.30,5)
1 : 1/4 or 0.250000
3 : 1/16 or 0.062500
Original number `r` : 0.300000
Number of fractions `N` : 5 (smallest fraction 1/32)
Sum of fractions `S` : 0.312500
Error `e` : -0.012500
Addendum: the 0.5 problem
If r * 2**N ends in 0.5, then it could be rounded up or down. That is, there are two possible representations as a sum-of-fractions.
If, as in the original problem statement, you want the representation that uses fewest fractions (i.e. the least number of 1 bits in the binary representation), just try both rounding options and pick whichever one is more economical.
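Since the question asked for C++, here is my sketch of the same rounding approach in the question's signature (ties, the 0.5 case, would still need the extra check described above):
#include <cmath>
#include <cstdio>
#include <vector>
// Scale r by 2^N, round to the nearest integer, and read off the set bits:
// bit (N-1-i) of the rounded integer corresponds to V[i] = 1/2^(i+1).
void getIndexes(const std::vector<double>& V, double r)
{
    const int N = static_cast<int>(V.size());
    const long long S = std::llround(r * std::pow(2.0, N));
    for (int i = 0; i < N; ++i) {
        if ((S >> (N - 1 - i)) & 1LL)
            std::printf("%d ", i);
    }
    std::printf("\n");
}
int main()
{
    std::vector<double> V;
    for (int i = 1; i <= 5; ++i) V.push_back(std::pow(0.5, i));
    getIndexes(V, 0.3);      // prints: 1 3   (0.25 + 0.0625 = 0.3125)
    getIndexes(V, 0.256652); // prints: 1     (0.25)
    return 0;
}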
Perhaps I am dumb...
The only trick I can see here is that the sum of (1/2)^(i+1) for i in [0..n), as n tends towards infinity, gives 1. This simple fact proves that (1/2)^i is always greater than the sum of (1/2)^j for j in [i+1, n), whatever n is.
So, when looking for our indices, it does not seem we have much choice. Let's start with i = 0:
either r is greater than or equal to 2^-(i+1), and thus we need it,
or it is smaller, and we need to choose whether 2^-(i+1) or the sum of 2^-j for j in [i+2, N] is closer (deferring to the latter in case of equality).
The only step that could be costly is obtaining the sum, but it can be precomputed once and for all (and even precomputed lazily).
#include <algorithm>
#include <cmath>
#include <vector>
#include <boost/foreach.hpp>
// The resulting vector contains at index i the sum of 2^-j for j in [i+1, N]
// and is padded with one 0 to get the same length as `v`
static std::vector<double> partialSums(std::vector<double> const& v) {
std::vector<double> result;
// When summing doubles, we need to start with the smaller ones
// because of the precision of representations...
double sum = 0;
BOOST_REVERSE_FOREACH(double d, v) {
sum += d;
result.push_back(sum);
}
result.pop_back(); // there is a +1 offset in the indexes of the result
std::reverse(result.begin(), result.end());
result.push_back(0); // pad the vector to have the same length as `v`
return result;
}
// The resulting vector contains the indexes elected
static std::vector<size_t> getIndexesImpl(std::vector<double> const& v,
std::vector<double> const& ps,
double r)
{
std::vector<size_t> indexes;
for (size_t i = 0, max = v.size(); i != max; ++i) {
if (r >= v[i]) {
r -= v[i];
indexes.push_back(i);
continue;
}
// We favor the closest to 0 in case of equality
// which is the sum of the tail as per the theorem above.
if (std::fabs(r - v[i]) < std::fabs(r - ps[i])) {
indexes.push_back(i);
return indexes;
}
}
return indexes;
}
std::vector<size_t> getIndexes(std::vector<double>& v, double r) {
std::vector<double> const ps = partialSums(v);
return getIndexesImpl(v, ps, r);
}
The code runs (with some debug output) at ideone. Note that for 0.3 it gives:
0.3:
1: 0.25
3: 0.0625
=> 0.3125
which is slightly different from the other answers.
At the risk of downvotes, this problem seems to be rather straightforward. Just start with the largest and smallest numbers you can produce out of V, adjust each index in turn until you have the two possible closest answers. Then evaluate which one is the better answer.
Here is untested code (in a language that I don't write):
void getIndexes(std::vector<double>& V, double r)
{
double v_lower = 0;
double v_upper = 1.0 - std::pow(0.5, V.size());
std::vector<int> index_lower;
std::vector<int> index_upper;
if (v_upper <= r)
{
// The answer is trivial.
for (int i = 0; i < V.size(); i++)
cout << i;
return;
}
for (int i = 0; i < V.size(); i++)
{
if (v_lower + V[i] <= r)
{
v_lower += V[i];
index_lower.push_back(i);
}
if (r <= v_upper - V[i])
v_upper -= V[i];
else
index_upper.push_back(i);
}
if (r - v_lower < v_upper - r)
printIndexes(index_lower);
else if (v_upper - r < r - v_lower)
printIndexes(index_upper);
else if (index_upper.size() < index_lower.size())
printIndexes(index_upper);
else
printIndexes(index_lower);
}
void printIndexes(std::vector<int>& ind)
{
for (int i = 0; i < ind.size(); i++)
{
cout << ind[i];
}
}
Did I get the job! :D
(Please note, this is horrible code that relies on our knowing exactly what V has in it...)
I will start by saying that I do believe that this problem is trivial...
(waits until all stones have been thrown)
Yes, I did read the OP's edit that says that I have to re-read the question if I think so. Therefore I might be missing something that I fail to see - in this case please excuse my ignorance and feel free to point out my mistakes.
I don't see this as a dynamic programming problem. At the risk of sounding naive, why not try keeping two estimations of r while searching for indices - namely an under-estimation and an over-estimation. After all, if r does not equal any sum that can be computed from elements of V, it will lie between some two sums of the kind. Our goal is to find these sums and to report which is closer to r.
I threw together some quick-and-dirty Python code that does the job. The answer it reports is correct for the two test cases that the OP provided. Note that the return is structured such that at least one index always has to be returned - even if the best estimate is no indices at all.
def estimate(V, r):
lb = 0 # under-estimation (lower-bound)
lbList = []
ub = 1 - 0.5**len(V) # over-estimation = sum of all elements of V
ubList = range(len(V))
# calculate closest under-estimation and over-estimation
for i in range(len(V)):
if r == lb + V[i]:
return (lbList + [i], lb + V[i])
elif r == ub:
return (ubList, ub)
elif r > lb + V[i]:
lb += V[i]
lbList += [i]
elif lb + V[i] < ub:
ub = lb + V[i]
ubList = lbList + [i]
return (ubList, ub) if ub - r < r - lb else (lbList, lb) if lb != 0 else ([len(V) - 1], V[len(V) - 1])
# populate V
N = 5 # number of elements
V = []
for i in range(1, N + 1):
V += [0.5**i]
# test
r = 0.484375 # this value is equidistant from both under- and over-estimation
print "r:", r
estimate = estimate(V, r)
print "Indices:", estimate[0]
print "Estimate:", estimate[1]
Note: after finishing writing my answer I noticed that this answer follows the same logic. Alas!
I don't know if you have test cases; try the code below. It is a dynamic-programming approach.
1] exp: given 1/2^i, find the largest i as exp. E.g. 1/32 returns 5.
2] max: 10^exp, where exp = i.
3] Create an array of size max+1 to hold all possible sums of the elements of V. Actually the array holds the indexes, since that's what you want.
4] Dynamically compute the sums (all invalids remain null).
5] The last while loop finds the nearest correct answer.
Here is the code:
public class Subset {
public static List<Integer> subsetSum(double[] V, double r) {
int exp = exponent(V);
int max = (int) Math.pow(10, exp);
//list to hold all possible sums of the elements in V
List<Integer> indexes[] = new ArrayList[max + 1];
indexes[0] = new ArrayList();//base case
//dynamically compute the sums
for (int x=0; x<V.length; x++) {
int u = (int) (max*V[x]);
for(int i=max; i>=u; i--) if(null != indexes[i-u]) {
List<Integer> tmp = new ArrayList<Integer>(indexes[i - u]);
tmp.add(x);
indexes[i] = tmp;
}
}
//find the best answer
int i = (int)(max*r);
int j=i;
while(null == indexes[i] && null == indexes[j]) {
i--;j++;
}
return indexes[i]==null || indexes[i].isEmpty()?indexes[j]:indexes[i];
}// subsetSum
private static int exponent(double[] V) {
double d = V[V.length-1];
int i = (int) (1/d);
String s = Integer.toString(i,2);
return s.length()-1;
}// summation
public static void main(String[] args) {
double[] V = {1/2.,1/4.,1/8.,1/16.,1/32.};
double r = 0.6, s=0.3,t=0.256652;
System.out.println(subsetSum(V,r));//[0, 3, 4]
System.out.println(subsetSum(V,s));//[1, 3]
System.out.println(subsetSum(V,t));//[1]
}
}// class
Here are results of running the code:
For 0.600000 get 0.593750 => [0, 3, 4]
For 0.300000 get 0.312500 => [1, 3]
For 0.256652 get 0.250000 => [1]
For 0.700000 get 0.687500 => [0, 2, 3]
For 0.710000 get 0.718750 => [0, 2, 3, 4]
This solution implements a polynomial-time approximate algorithm. The output of the program is the same as the outputs of the other solutions.
#include <math.h>
#include <stdio.h>
#include <vector>
#include <algorithm>
#include <functional>
void populate(std::vector<double> &vec, int count)
{
double val = .5;
vec.clear();
for (int i = 0; i < count; i++) {
vec.push_back(val);
val *= .5;
}
}
void remove_values_with_large_error(const std::vector<double> &vec, std::vector<double> &res, double r, double max_error)
{
std::vector<double>::const_iterator iter;
double min_err, err;
min_err = 1.0;
for (iter = vec.begin(); iter != vec.end(); ++iter) {
err = fabs(*iter - r);
if (err < max_error) {
res.push_back(*iter);
}
min_err = std::min(err, min_err);
}
}
void find_partial_sums(const std::vector<double> &vec, std::vector<double> &res, double r)
{
std::vector<double> svec, tvec, uvec;
std::vector<double>::const_iterator iter;
int step = 0;
svec.push_back(0.);
for (iter = vec.begin(); iter != vec.end(); ++iter) {
step++;
printf("step %d, svec.size() %d\n", step, svec.size());
tvec.clear();
std::transform(svec.begin(), svec.end(), back_inserter(tvec),
std::bind2nd(std::plus<double>(), *iter));
uvec.clear();
uvec.insert(uvec.end(), svec.begin(), svec.end());
uvec.insert(uvec.end(), tvec.begin(), tvec.end());
sort(uvec.begin(), uvec.end());
uvec.erase(unique(uvec.begin(), uvec.end()), uvec.end());
svec.clear();
remove_values_with_large_error(uvec, svec, r, *iter * 4);
}
sort(svec.begin(), svec.end());
svec.erase(unique(svec.begin(), svec.end()), svec.end());
res.clear();
res.insert(res.end(), svec.begin(), svec.end());
}
double find_closest_value(const std::vector<double> &sums, double r)
{
std::vector<double>::const_iterator iter;
double min_err, res, err;
min_err = fabs(sums.front() - r);
res = sums.front();
for (iter = sums.begin(); iter != sums.end(); ++iter) {
err = fabs(*iter - r);
if (err < min_err) {
min_err = err;
res = *iter;
}
}
printf("found value %lf with err %lf\n", res, min_err);
return res;
}
void print_indexes(const std::vector<double> &vec, double value)
{
std::vector<double>::const_iterator iter;
int index = 0;
printf("indexes: [");
for (iter = vec.begin(); iter != vec.end(); ++iter, ++index) {
if (value >= *iter) {
printf("%d, ", index);
value -= *iter;
}
}
printf("]\n");
}
int main(int argc, char **argv)
{
std::vector<double> vec, sums;
double r = .7;
int n = 5;
double value;
populate(vec, n);
find_partial_sums(vec, sums, r);
value = find_closest_value(sums, r);
print_indexes(vec, value);
return 0;
}
Sort the vector and search for the biggest fraction available that does not exceed r. Store that index, subtract the value from r, and repeat with the remainder of r. Iterate until r is reached, or no such index can be found.
Example :
0.3 - the biggest value available would be 0.25 (index 1). The remainder is now 0.05.
0.05 - the biggest value available would be 0.03125; the remainder will be 0.01875.
And so on.
Every step would be an O(log N) search in a sorted array, and the number of steps will also be O(log N), so the total complexity will then be O((log N)^2).
This is not a dynamic programming question.
The output should rather be a vector of ints (indexes), not a vector of doubles.
This might be off by 0-2 in exact values; this is just the concept:
A) Output index zero while r0 (r minus the index values already output) is bigger than 1/2.
B) Inspect the internal representation of the double r0 and:
x (1st bit shift) = -Exponent; // the bigger the exponent, the smaller the numbers (the bigger the x in 1/2^(x) you begin with)
Then inspect the bit representation of the fraction part of the double in a loop with this body
(direction depends on little/big endian):
{
if (bit is 1)
output index x;
x++;
}
The complexity of each step is constant, so overall it is O(n), where n is the size of the output.
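A portable sketch of that idea (my code) uses frexp instead of poking at the raw bit pattern:
#include <cmath>
#include <cstdio>
// Decompose r = m * 2^e with 0.5 <= m < 1, then walk the mantissa bits.
// Note this truncates rather than rounds, matching the "might be off" caveat.
void printIndexes(double r, int N)
{
    int e;
    double m = std::frexp(r, &e); // r = m * 2^e
    int x = -e;                   // index of the first candidate bit
    for (int i = 0; i < 53 && x < N; ++i) { // a double has 53 mantissa bits
        m *= 2.0;
        if (m >= 1.0) { std::printf("%d ", x); m -= 1.0; }
        ++x;
    }
    std::printf("\n");
}
int main()
{
    printIndexes(0.3, 5); // prints: 1 4  (0.25 + 0.03125, truncated expansion)
    return 0;
}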
To paraphrase the question, what are the one bits in the binary representation of r (after the binary point)? N is the 'precision', if you like.
In C-ish pseudo-code:
for (int i=0; i<N; i++) {
if (r>V[i]) {
print(i);
r -= V[i];
}
}
You could add an extra test for r == 0 to terminate the loop early.
Note that this gives the least binary number closest to 'r', i.e. the one closer to zero if there are two equally 'right' answers.
If the Nth digit was a one, you'll need to add '1' to the 'binary' number obtained and check both against the original 'r'. (Hint: construct vectors a[N], b[N] of 'bits', set '1' bits instead of 'print'ing above. Set b = a and do a manual add, digit by digit from the end of 'b' until you stop carrying. Convert to double and choose whichever is closer.)
Note that a[] <= r <= a[] + 1/2^N and that b[] = a[] + 1/2^N.
The 'least number of indexes [sic]' is a red herring.