boost odeint gives very different values from Python3 scipy - c++

I am trying to integrate a very simple ODE using boost odeint.
For some cases, the values are the same (or very similar) to Python's scipy odeint function.
But for other initial conditions, the values are vastly different.
The function is: d(uhat) / dt = - alpha^2 * kappa^2 * uhat
where alpha is 1.0, and kappa is a constant depending on the case (see values below).
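(For reference, this ODE has the exact solution uhat(t) = uhat(0) * exp(-alpha^2 * kappa^2 * t), so in every case the solution should decay toward zero; for the third case kappa^2 is about 803, so the decay is extremely fast.)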
I have tried several different ODE solvers from boost, and none seem to work.
Update: The code below is now working.
In the code below, the first case gives nearly identical results, the 2nd case is kind of trivial (but reassuring), and the 3rd case gives erroneous answers in the C++ version.
Here is the C++ version:
#include <boost/numeric/odeint.hpp>
#include <cstdlib>
#include <iostream>
typedef double State_Type;
typedef boost::numeric::odeint::runge_kutta_dopri5<State_Type> Stepper_Type;
struct ResultsObserver
{
    std::ostream& m_out;
    ResultsObserver( std::ostream &out ) : m_out( out ) { }

    void operator()(const State_Type& x , double t ) const
    {
        m_out << t << " : " << x << std::endl;
    }
};
// The rhs: d_uhat_dt = - alpha^2 * kappa^2 * uhat
class Eq {
public:
    Eq(double alpha, double kappa)
        : m_constant(-1.0 * alpha * alpha * kappa * kappa) {}

    void operator()(double uhat, double& d_uhat_dt, const double t) const
    {
        d_uhat_dt = m_constant * uhat;
    }

private:
    double m_constant;
};
void integrate(double kappa, double initValue)
{
    const unsigned numTimeIncrements = 100;
    const double dt = 0.1;
    const double startTime = 0.0;
    const double endTime = numTimeIncrements * dt;
    const double alpha = 1.0;
    double uhat = initValue;      //Init condition
    std::vector<double> uhats;    //Results vector
    Eq rhs(alpha, kappa);         //The RHS of the ODE

    //This is what I was doing that did not work
    //
    //boost::numeric::odeint::runge_kutta_dopri5<double> stepper;
    //for(unsigned step = 0; step < numTimeIncrements; ++step) {
    //    uhats.push_back(uhat);
    //    stepper.do_step(rhs, uhat, step*dt, dt);
    //}

    //This works
    integrate_const(
        boost::numeric::odeint::make_dense_output<Stepper_Type>( 1E-12, 1E-6 ),
        rhs, uhat, startTime, endTime, dt, ResultsObserver(std::cout)
    );

    std::cout << "kappa = " << kappa << ", initial value = " << initValue << std::endl;
    for(auto val : uhats)
        std::cout << val << std::endl;
    std::cout << "---" << std::endl << std::endl;
}
int main() {
    const double kappa1 = 0.062831853071796;
    const double initValue1 = -187.097241230045967;
    integrate(kappa1, initValue1);

    const double kappa2 = 28.274333882308138;
    const double initValue2 = 0.000000000000;
    integrate(kappa2, initValue2);

    const double kappa3 = 28.337165735379934;
    const double initValue3 = -0.091204068895190;
    integrate(kappa3, initValue3);

    return EXIT_SUCCESS;
}
and the corresponding Python3 version:
#!/usr/bin/env python3
import numpy as np
from scipy.integrate import odeint
def Eq(uhat, t, kappa, a):
    d_uhat = -a**2 * kappa**2 * uhat
    return d_uhat

def integrate(kappa, initValue):
    dt = 0.1
    t = np.arange(0, 10, dt)
    a = 1.0
    print("kappa = " + str(kappa))
    print("initValue = " + str(initValue))
    uhats = odeint(Eq, initValue, t, args=(kappa, a))
    print(uhats)
    print("---")
    print()
kappa1 = 0.062831853071796
initValue1 = -187.097241230045967
integrate(kappa1, initValue1)
kappa2 = 28.274333882308138
initValue2 = 0.000000000000
integrate(kappa2, initValue2)
kappa3 = 28.337165735379934
initValue3 = -0.091204068895190
integrate(kappa3, initValue3)

With boost::odeint you should use the integrate... interface functions.
What happens in your code using do_step is that you use the dopri5 method as a fixed-step method with your provided step size. In combination with the large coefficient L = kappa^2 of about 800, the product L*dt = 80 is far outside the stability region of the method, whose boundary lies between 2 and 3.5. Divergence, or oscillating divergence, is the expected result.
What should happen, and what is implemented in the scipy odeint, ode-dopri5 and solve_ivp-RK45 methods, is adaptive step-size control. Internally, the optimal step size for the error tolerances is chosen, and in each internal step an interpolation polynomial is determined. The output or observed values are computed with this interpolator, also called "dense output" when the interpolation function is returned from the integrator. One result of the error control is that the internal step size always stays inside the stability region; for stiff problems with moderate error tolerances it is on or close to the boundary of that region.
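For illustration, here is a minimal sketch (not the OP's exact code) of the stiff third case integrated with a controlled dopri5 stepper; make_controlled adds the step-size control described above, while integrate_const still observes at the fixed interval dt:

#include <boost/numeric/odeint.hpp>
#include <iostream>

int main() {
    namespace odeint = boost::numeric::odeint;
    typedef odeint::runge_kutta_dopri5<double> Stepper;

    const double kappa = 28.337165735379934; // the stiff third case
    const double c = -kappa * kappa;         // alpha = 1.0
    double uhat = -0.091204068895190;

    // make_controlled wraps dopri5 with error control (abs_err, rel_err);
    // the internal step size is then adapted to stay inside the stability
    // region, and dt = 0.1 is only the observation interval.
    odeint::integrate_const(
        odeint::make_controlled<Stepper>(1e-12, 1e-6),
        [c](const double& u, double& dudt, double /*t*/) { dudt = c * u; },
        uhat, 0.0, 10.0, 0.1,
        [](const double& u, double t) { std::cout << t << " : " << u << "\n"; });
}

With the error control in place, the third case should decay rapidly toward zero, matching the exact solution and the scipy results.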

This has all the hallmarks of precision issues.
Simply replacing double with long double gives:
Live On Compiler Explorer
#include <boost/numeric/odeint.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iostream>
using Value = long double;
//using Value = boost::multiprecision::number<
//boost::multiprecision::backends::cpp_bin_float<100>,
//boost::multiprecision::et_off>;
// The rhs: d_uhat_dt = - alpha^2 * kappa^2 * uhat
class Eq {
public:
    Eq(Value alpha, Value kappa)
        : m_constant(-1.0 * alpha * alpha * kappa * kappa)
    {
    }

    void operator()(Value uhat, Value& d_uhat_dt, const Value) const
    {
        d_uhat_dt = m_constant * uhat;
    }

private:
    Value m_constant;
};
void integrate(Value const kappa, Value const initValue)
{
    const unsigned numTimeIncrements = 100;
    const Value dt = 0.1;
    const Value alpha = 1.0;
    Value uhat = initValue;       // Init condition
    std::vector<Value> uhats;     // Results vector
    Eq rhs(alpha, kappa);         // The RHS of the ODE

    boost::numeric::odeint::runge_kutta_dopri5<Value> stepper;
    for (unsigned step = 0; step < numTimeIncrements; ++step) {
        uhats.push_back(uhat);
        auto&& stepdt = step * dt;
        stepper.do_step(rhs, uhat, stepdt, dt);
    }

    fmt::print("kappa = {}, initial value = {}\n{}\n---\n", kappa, initValue,
               uhats);
}
int main() {
    for (auto [kappa, initValue] :
         {
             std::pair //
             { 0.062831853071796, -187.097241230045967},
             {28.274333882308138,    0.000000000000    },
             {28.337165735379934,   -0.091204068895190 },
         }) //
    {
        integrate(kappa, initValue);
    }
}
Prints
kappa = 0.0628319, initial value = -187.097
[-187.097, -187.023, -186.95, -186.876, -186.802, -186.728, -186.655, -186.581, -186.507, -186.434, -186.36, -186.287, -186.213, -186.139, -186.066, -185.993, -185.919, -185.846, -185.772, -185.699, -185.626, -185.553, -185.479, -185.406, -185.333, -185.26, -185.187, -185.114, -185.04, -184.967, -184.894, -184.821, -184.748, -184.676, -184.603, -184.53, -184.457, -184.384, -184.311, -184.239, -184.166, -184.093, -184.021, -183.948, -183.875, -183.803, -183.73, -183.658, -183.585, -183.513, -183.44, -183.368, -183.296, -183.223, -183.151, -183.079, -183.006, -182.934, -182.862, -182.79, -182.718, -182.645, -182.573, -182.501, -182.429, -182.357, -182.285, -182.213, -182.141, -182.069, -181.998, -181.926, -181.854, -181.782, -181.71, -181.639, -181.567, -181.495, -181.424, -181.352, -181.281, -181.209, -181.137, -181.066, -180.994, -180.923, -180.852, -180.78, -180.709, -180.638, -180.566, -180.495, -180.424, -180.353, -180.281, -180.21, -180.139, -180.068, -179.997, -179.926]
---
kappa = 28.2743, initial value = 0
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
---
kappa = 28.3372, initial value = -0.0912041
[-0.0912041, -38364100, -1.61375e+16, -6.78809e+24, -2.85534e+33, -1.20107e+42, -5.0522e+50, -2.12516e+59, -8.93928e+67, -3.76022e+76, -1.5817e+85, -6.65327e+93, -2.79864e+102, -1.17722e+111, -4.95186e+119, -2.08295e+128, -8.76174e+136, -3.68554e+145, -1.55029e+154, -6.52114e+162, -2.74306e+171, -1.15384e+180, -4.85352e+188, -2.04159e+197, -8.58774e+205, -3.61235e+214, -1.5195e+223, -6.39163e+231, -2.68858e+240, -1.13092e+249, -4.75713e+257, -2.00104e+266, -8.41718e+274, -3.54061e+283, -1.48932e+292, -6.26469e+300, -2.63518e+309, -1.10846e+318, -4.66265e+326, -1.9613e+335, -8.25002e+343, -3.47029e+352, -1.45975e+361, -6.14028e+369, -2.58285e+378, -1.08645e+387, -4.57005e+395, -1.92235e+404, -8.08618e+412, -3.40137e+421, -1.43075e+430, -6.01833e+438, -2.53155e+447, -1.06487e+456, -4.47929e+464, -1.88417e+473, -7.92559e+481, -3.33382e+490, -1.40234e+499, -5.89881e+507, -2.48128e+516, -1.04373e+525, -4.39033e+533, -1.84675e+542, -7.76818e+550, -3.26761e+559, -1.37449e+568, -5.78166e+576, -2.432e+585, -1.023e+594, -4.30314e+602, -1.81008e+611, -7.61391e+619, -3.20272e+628, -1.34719e+637, -5.66684e+645, -2.3837e+654, -1.00268e+663, -4.21768e+671, -1.77413e+680, -7.4627e+688, -3.13911e+697, -1.32044e+706, -5.55429e+714, -2.33636e+723, -9.82768e+731, -4.13392e+740, -1.73889e+749, -7.31449e+757, -3.07677e+766, -1.29421e+775, -5.44399e+783, -2.28996e+792, -9.6325e+800, -4.05182e+809, -1.70436e+818, -7.16922e+826, -3.01567e+835, -1.26851e+844, -5.33587e+852]
---
As you can see, I made some simple attempts to get it to use Boost Multiprecision, but that didn't immediately work and may require someone with an actual understanding of the maths / ODEINT.

Related

ArrayFire: Translate a batch of images at the same time

I'm using arrayfire and I need to translate a lot of images at once and store it in a new array. The images are contained in a single array of size (w, h, c, b) and the amount by which each image needs to be translated is inside a (2, 1, 1, b) array.
The sequential implementation is as follows
for (int i = 0; i < b; i++)
{
    float x = coords(0, 0, 0, i).scalar<float>();
    float y = coords(1, 0, 0, i).scalar<float>();
    t_imgs(af::span, af::span, af::span, i) =
        af::translate(imgs(af::span, af::span, af::span, i), x, y);
}
How could I parallelize it? Translate doesn't accept arrays as arguments, so I can't do something like this:
gfor(af::seq i, b)
{
    af::array x = coords(0, 0, 0, i);
    af::array y = coords(1, 0, 0, i);
    t_imgs(af::span, af::span, af::span, i) =
        af::translate(imgs(af::span, af::span, af::span, i), x, y);
}

AVX calculation precision

I wrote a program to display the Mandelbrot set. To speed it up, I used AVX (really AVX2) instructions through the <immintrin.h> header.
The problem is: the result of the AVX computation (with double precision) has artifacts, and it differs from the result computed using "normal" doubles.
In detail, there is a function getIterationCount that calculates the number of iterations until the Mandelbrot sequence exceeds 4, or assumes the point is included in the set if the sequence does not exceed 4 during the first N steps.
The code looks like this:
#include "stdafx.h"
#include <iostream>
#include <complex>
#include <immintrin.h>
class MandelbrotSet {
public:
int getIterationCount(const std::complex<double>, const int) const noexcept;
__m256i getIterationCount(__m256d cReal, __m256d cIm, unsigned maxIterations) const noexcept;
};
inline int MandelbrotSet::getIterationCount(const std::complex<double> c, const int maxIterations) const noexcept
{
    double currentReal = 0;
    double currentIm = 0;
    double realSquare;
    double imSquare;
    for (int i = 0; i < maxIterations; ++i) {
        realSquare = currentReal * currentReal;
        imSquare = currentIm * currentIm;
        currentIm = 2 * currentReal * currentIm + c.imag();
        currentReal = realSquare - imSquare + c.real();
        if (realSquare + imSquare >= 4) {
            return i;
        }
    }
    return -1;
}
const __m256i negone = _mm256_set_epi64x(-1, -1, -1, -1);
const __m256i one = _mm256_set_epi64x(1, 1, 1, 1);
const __m256d two = _mm256_set_pd(2, 2, 2, 2);
const __m256d four = _mm256_set_pd(4, 4, 4, 4);

//calculates for i = 0,1,2,3
//output[i] = if ctrl[i] == 0b11...1 then onTrue[i] else onFalse[i]
inline __m256i _mm256_select_si256(__m256i onTrue, __m256i onFalse, __m256i ctrl) {
    return _mm256_or_si256(_mm256_and_si256(onTrue, ctrl), _mm256_and_si256(onFalse, _mm256_xor_si256(negone, ctrl)));
}
inline __m256i MandelbrotSet::getIterationCount(__m256d cReal, __m256d cIm, unsigned maxIterations) const noexcept {
    __m256i result = _mm256_set_epi64x(0, 0, 0, 0);
    __m256d currentReal = _mm256_set_pd(0, 0, 0, 0);
    __m256d currentIm = _mm256_set_pd(0, 0, 0, 0);
    __m256d realSquare;
    __m256d imSquare;
    for (unsigned i = 0; i <= maxIterations; ++i)
    {
        realSquare = _mm256_mul_pd(currentReal, currentReal);
        imSquare = _mm256_mul_pd(currentIm, currentIm);
        currentIm = _mm256_mul_pd(currentIm, two);
        currentIm = _mm256_fmadd_pd(currentIm, currentReal, cIm);
        currentReal = _mm256_sub_pd(realSquare, imSquare);
        currentReal = _mm256_add_pd(currentReal, cReal);
        __m256i isSmaller = _mm256_castpd_si256(_mm256_cmp_pd(_mm256_add_pd(realSquare, imSquare), four, _CMP_LE_OS));
        result = _mm256_select_si256(_mm256_add_epi64(one, result), result, isSmaller);
        //if (i % 10 == 0 && !isSmaller.m256i_i64[0] && !isSmaller.m256i_i64[1] && !isSmaller.m256i_i64[2] && !isSmaller.m256i_i64[3]) return result;
    }
    return result;
}
using namespace std;

int main() {
    MandelbrotSet m;
    std::complex<double> point(-0.14203954214360026, 1);
    __m256i result_avx = m.getIterationCount(_mm256_set_pd(-0.14203954214360026, -0.13995837669094691, -0.13787721123829355, -0.13579604578563975),
                                             _mm256_set_pd(1, 1, 1, 1), 2681);
    int result_normal = m.getIterationCount(point, 2681);
    cout << "Normal: " << result_normal << ", AVX: " << result_avx.m256i_i64[0] << ", at point " << point << endl;
    return 0;
}
When I run this code, I get the following result:
(The point -0.14203954214360026 + i is chosen intentionally, because both methods return the same/almost the same value in most points)
Normal: 13, AVX: 20, at point (-0.14204,1)
A difference of 1 might be acceptable, but a difference of 7 seems quite big, since both methods use double precision.
Do AVX instructions have lower precision than "normal" instructions? If not, why do the two results differ so much?
I use MS Visual Studio 2017, MS Visual C++ 2017 15.6 v14.13 141, and my computer has an i7-7700K processor. The project is compiled for x64. The result is the same whether it is compiled with no optimization or full optimization.
The rendered results look like this (the AVX and "normal" renderings are omitted here; the AVX image shows the artifacts described above).
The values of realSquare and imSquare during the loop (listed as i, realSquare, imSquare) are as follows:
0, 0, 0
1, 0.0201752, 1
2, 1.25858, 0.512543
3, 0.364813, 0.367639
4, 0.0209861, 0.0715851
5, 0.0371096, 0.850972
6, 0.913748, 0.415495
7, 0.126888, 0.0539759
8, 0.00477863, 0.696364
9, 0.69493, 0.782567
10, 0.0527514, 0.225526
11, 0.0991077, 1.48388
12, 2.33115, 0.0542994
13, 4.5574, 0.0831971
In the AVX loop the values are:
0, 0, 0
1, 0.0184406, 1
2, 1.24848, 0.530578
3, 0.338851, 0.394109
4, 0.0365017, 0.0724287
5, 0.0294888, 0.804905
6, 0.830307, 0.478687
7, 0.04658, 0.0680608
8, 0.024736, 0.78746
9, 0.807339, 0.519651
10, 0.0230712, 0.0872787
11, 0.0400014, 0.828561
12, 0.854433, 0.404359
13, 0.0987707, 0.0308286
14, 0.00460416, 0.791455
15, 0.851277, 0.773114
16, 0.00332154, 0.387519
17, 0.270393, 1.14866
18, 1.02832, 0.0131355
19, 0.773319, 1.51892
20, 0.776852, 10.0336
Reversing the order of the arguments passed to _mm256_set_pd solves the problem.
If you inspect the value of cReal in the debugger, you'll see that the first element is set to -0.13579604578563975, not -0.14203954214360026.
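For reference, _mm256_set_pd(e3, e2, e1, e0) places its last argument in element 0, whereas _mm256_setr_pd takes its arguments in element order. A small sketch of the difference:

#include <immintrin.h>
#include <iostream>

int main() {
    // _mm256_set_pd lists elements from the highest lane down, so the LAST
    // argument lands in lane 0 -- the lane the question compares against
    // the scalar result.
    __m256d v = _mm256_set_pd(-0.14203954214360026, -0.13995837669094691,
                              -0.13787721123829355, -0.13579604578563975);
    double lanes[4];
    _mm256_storeu_pd(lanes, v);
    std::cout << lanes[0] << "\n"; // -0.135796..., not -0.142039...

    // _mm256_setr_pd takes arguments in lane order, so listing the points
    // left-to-right puts the intended value in lane 0.
    __m256d w = _mm256_setr_pd(-0.14203954214360026, -0.13995837669094691,
                               -0.13787721123829355, -0.13579604578563975);
    _mm256_storeu_pd(lanes, w);
    std::cout << lanes[0] << "\n"; // -0.142039...
}

So either reverse the argument list, as the answer says, or switch to _mm256_setr_pd and keep the natural order.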

Neural Network not learning - MNIST data - Handwriting recognition

I have written a Neural Network program. It works for logic gates, but when I try to use it for recognizing handwritten digits, it simply does not learn.
Please find the code below:
// This is a single neuron; this might be necessary in order to understand the remaining code
typedef struct SingleNeuron
{
    double outputValue;
    std::vector<double> weight;
    std::vector<double> deltaWeight;
    double gradient;
    double sum;
} SingleNeuron;
Then I initialize the net. I set the weights to random values between -0.5 and +0.5, sums to 0, and deltaWeights to 0.
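(A hypothetical sketch of that initialization, under the assumption that the net is a vector of layers of SingleNeuron and that each neuron stores one outgoing weight per neuron of the next layer, which is how the feed-forward loop below indexes weight[j]; the RNG choice is an assumption, not the OP's actual code:)

#include <cstdlib>
#include <vector>

// Hypothetical initializer: weights uniform in [-0.5, +0.5], sums and
// deltaWeights zeroed. Bias handling from the OP's code is omitted here.
void initNet(std::vector<std::vector<SingleNeuron>>& neuralNet)
{
    for (std::size_t layer = 0; layer < neuralNet.size(); ++layer) {
        // number of outgoing connections = size of the next layer (0 for output)
        std::size_t fanOut =
            (layer + 1 < neuralNet.size()) ? neuralNet[layer + 1].size() : 0;
        for (SingleNeuron& n : neuralNet[layer]) {
            n.sum = 0.0;
            n.outputValue = 0.0;
            n.gradient = 0.0;
            n.weight.assign(fanOut, 0.0);
            n.deltaWeight.assign(fanOut, 0.0);
            for (double& w : n.weight)
                w = (std::rand() / (double)RAND_MAX) - 0.5; // uniform in [-0.5, 0.5]
        }
    }
}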
Then comes the FeedForward:
for (unsigned i = 0; i < inputValues.size(); ++i)
{
    neuralNet[0][i].outputValue = inputValues[i];
    neuralNet[0][i].sum = 0.0;
    // std::cout << "o/p Val = " << neuralNet[0][i].outputValue << std::endl;
}

for (unsigned i = 1; i < neuralNet.size(); ++i)
{
    std::vector<SingleNeuron> prevLayerNeurons = neuralNet[i - 1];
    unsigned j = 0;
    double thisNeuronOPVal = 0;
    // std::cout << std::endl;
    for (j = 0; j < neuralNet[i].size() - 1; ++j)
    {
        double sum = 0;
        for (unsigned k = 0; k < prevLayerNeurons.size(); ++k)
        {
            sum += prevLayerNeurons[k].outputValue * prevLayerNeurons[k].weight[j];
        }
        neuralNet[i][j].sum = sum;
        neuralNet[i][j].outputValue = TransferFunction(sum);
        // std::cout << neuralNet[i][j].outputValue << "\t";
    }
    // std::cout << std::endl;
}
My transfer function and its derivative are given at the end.
After this I try to back-propagate using:
// calculate output layer gradients
for (unsigned i = 0; i < outputLayer.size() - 1; ++i)
{
    double delta = actualOutput[i] - outputLayer[i].outputValue;
    outputLayer[i].gradient = delta * TransferFunctionDerivative(outputLayer[i].sum);
}
// std::cout << "Found Output gradients "<< std::endl;

// calculate hidden layer gradients
for (unsigned i = neuralNet.size() - 2; i > 0; --i)
{
    std::vector<SingleNeuron>& hiddenLayer = neuralNet[i];
    std::vector<SingleNeuron>& nextLayer = neuralNet[i + 1];
    for (unsigned j = 0; j < hiddenLayer.size(); ++j)
    {
        double dow = 0.0;
        for (unsigned k = 0; k < nextLayer.size() - 1; ++k)
        {
            dow += nextLayer[k].gradient * hiddenLayer[j].weight[k];
        }
        hiddenLayer[j].gradient = dow * TransferFunctionDerivative(hiddenLayer[j].sum);
    }
}
// std::cout << "Found hidden layer gradients "<< std::endl;

// from output to 1st hidden layer, update all weights
for (unsigned i = neuralNet.size() - 1; i > 0; --i)
{
    std::vector<SingleNeuron>& currentLayer = neuralNet[i];
    std::vector<SingleNeuron>& prevLayer = neuralNet[i - 1];
    for (unsigned j = 0; j < currentLayer.size() - 1; ++j)
    {
        for (unsigned k = 0; k < prevLayer.size(); ++k)
        {
            SingleNeuron& thisNeuron = prevLayer[k];
            double oldDeltaWeight = thisNeuron.deltaWeight[j];
            double newDeltaWeight = ETA * thisNeuron.outputValue * currentLayer[j].gradient + (ALPHA * oldDeltaWeight);
            thisNeuron.deltaWeight[j] = newDeltaWeight;
            thisNeuron.weight[j] += newDeltaWeight;
        }
    }
}
These are the TransferFunction and its derivative:
double TransferFunction(double x)
{
    double val;
    //val = tanh(x);
    val = 1 / (1 + exp(x * -1));
    return val;
}

double TransferFunctionDerivative(double x)
{
    //return 1 - x * x;
    double val = exp(x * -1) / pow((exp(x * -1) + 1), 2);
    return val;
}
One thing I observed: if I use the standard sigmoid function as my transfer function AND pass the output of the neuron to the derivative, the result is INFINITY. But tanh(x) works fine with that value.
So if I use 1/(1+e^(-x)) as the transfer function I have to pass the sum of net inputs, and with tanh as my transfer function I have to pass the output of the current neuron.
I do not completely understand why this is the way it is; maybe this calls for a different question.
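(The likely reason: the commented-out tanh derivative, 1 - x*x, is written in terms of the neuron's output y = tanh(x), while the sigmoid derivative above is written in terms of the pre-activation sum x. A sketch of the two equivalent sigmoid forms:)

#include <cmath>

// Sigmoid derivative in terms of the pre-activation sum x:
// s'(x) = s(x) * (1 - s(x)), which matches TransferFunctionDerivative above.
double SigmoidDerivativeFromSum(double x)
{
    double s = 1.0 / (1.0 + std::exp(-x));
    return s * (1.0 - s);
}

// The same derivative in terms of the neuron's OUTPUT y = s(x); this is the
// form that must be fed the output, exactly like "1 - y*y" for tanh.
double SigmoidDerivativeFromOutput(double y)
{
    return y * (1.0 - y);
}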
But this question is really about something else: THE NETWORK IS WORKING FOR LOGIC GATES BUT NOT FOR CHARACTER RECOGNITION.
I have tried many variations/combinations of learning rate, acceleration, and the number and sizes of hidden layers. Please find the results below:
AvgErr: 0.299399 #Pass799
AvgErr : 0.305071 #Pass809
AvgErr : 0.303046 #Pass819
AvgErr : 0.299569 #Pass829
AvgErr : 0.30413 #Pass839
AvgErr : 0.304165 #Pass849
AvgErr : 0.300529 #Pass859
AvgErr : 0.302973 #Pass869
AvgErr : 0.299238 #Pass879
AvgErr : 0.304708 #Pass889
AvgErr : 0.30068 #Pass899
AvgErr : 0.302582 #Pass909
AvgErr : 0.301767 #Pass919
AvgErr : 0.303167 #Pass929
AvgErr : 0.299551 #Pass939
AvgErr : 0.301295 #Pass949
AvgErr : 0.300651 #Pass959
AvgErr : 0.297867 #Pass969
AvgErr : 0.304221 #Pass979
AvgErr : 0.303702 #Pass989
After looking at the results you might feel this guy is simply stuck in a local minimum, but please wait and read through:
Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
Output = 0.0910903, 0.105674, 0.064575, 0.0864824, 0.128682, 0.0878434, 0.0946296, 0.154405, 0.0678767, 0.0666924
Input = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Output = 0.0916106, 0.105958, 0.0655508, 0.086579, 0.126461, 0.0884082, 0.110953, 0.163343, 0.0689315, 0.0675822
Input = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Output = 0.105344, 0.105021, 0.0659517, 0.0858077, 0.123104, 0.0884107, 0.116917, 0.161911, 0.0693426, 0.0675156
Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
Output = 0.107113, 0.101838, 0.0641632, 0.0967766, 0.117149, 0.085271, 0.11469, 0.153649, 0.0672772, 0.0652416
Above are the outputs of epochs #996, #997, #998 and #999.
So the network is simply not learning. For this example I used ALPHA = 0.4, ETA = 0.7, 10 hidden layers each of 100 neurons, and the average is over 10 epochs. If you are worried about the learning rate being 0.4, or about there being so many hidden layers, I have already tried variations of them; for example, with a learning rate of 0.1 and 4 hidden layers, each of 16:
Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
Output = 0.0883238, 0.0983253, 0.0613749, 0.0809751, 0.124972, 0.0897194, 0.0911235, 0.179984, 0.0681346, 0.0660039
Input = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Output = 0.0868767, 0.0966924, 0.0612488, 0.0798343, 0.120353, 0.0882381, 0.111925, 0.169309, 0.0676711, 0.0656819
Input = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Output = 0.105252, 0.0943837, 0.0604416, 0.0781779, 0.116231, 0.0858496, 0.108437, 0.1588, 0.0663156, 0.0645477
Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
Output = 0.102023, 0.0914957, 0.059178, 0.09339, 0.111851, 0.0842454, 0.104834, 0.149892, 0.0651799, 0.063558
I am so damn sure that I have missed something. I am not able to figure it out. I have read Tom Mitchell's algorithm so many times, but I don't know what is wrong. Whatever example I solve by hand works! (Please don't ask me to solve MNIST data images by hand ;) ) I do not know where to change the code or what to do.. please help out..
EDIT -- Uploading more data as per suggestions in comments
1 hidden layer of 32 -- still no learning.
Expected output -- the inputs are images of the digits 0-9, so the target is a simple vector describing which digit the current image is: that bit is 1, all others are 0. So I would want the output for that particular bit to be as close to 1 as possible, and the others close to 0. E.g. if the input is Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0] I would want the output to be something like Output = 0.002023, 0.0914957, 0.059178, 0.09339, 0.011851, 0.0842454, 0.924834, 0.049892, 0.0651799, 0.063558 (this is vague, hand-generated)
Here are links to other researchers' work.
Stanford
SourceForge -- this is rather a library
Not only these 2; there are so many sites showing the demos.
Things work quite fine for them. If I set my network parameters (ALPHA, ETA) like theirs I do not get results like theirs, so this is reassurance that something is wrong with my code.
EDIT 2
Adding more failure cases
Acceleration 0.7, learning rate 0.1
Acceleration 0.7, learning rate 0.6
In both of the above cases the hidden layers were 3, each of 32 neurons.
This answer is copied from the OP's comment on the question.
I solved the puzzle. I had made the worst possible mistake: I was giving the wrong input. I used OpenCV to scan the images; instead of using reshape I was using resize, so the input was a linear interpolation of the images. So my input was wrong. There was nothing wrong with the code. My network is 784 - 65 - 10, giving 96.43% accuracy.
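(To illustrate the fix: cv::resize rescales by interpolating pixel values, while cv::Mat::reshape only reinterprets the existing pixels as a different shape, which is what flattening an image into an input vector requires. A minimal sketch, with the 28x28 image size and the normalization as assumptions:)

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<double> imageToInput(const cv::Mat& digit /* 28x28, CV_8UC1 */)
{
    // WRONG for this purpose: interpolates pixels, producing a blurred
    // 1x784 image rather than the original pixels laid out in a row.
    // cv::Mat wrong; cv::resize(digit, wrong, cv::Size(784, 1));

    // RIGHT: reinterpret the same 784 pixels as one row, no interpolation.
    cv::Mat row = digit.reshape(1, 1); // 1 channel, 1 row => 1x784

    std::vector<double> input;
    input.reserve(row.cols);
    for (int i = 0; i < row.cols; ++i)
        input.push_back(row.at<uchar>(0, i) / 255.0); // normalize to [0,1]
    return input;
}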

inverse fft of fft not returning expected data

I'm trying to make sure FFTW does what I think it should do, but I'm having problems. I'm using OpenCV's cv::Mat. I made a test program that, given a Mat f, computes ifft(fft(f)) and compares the result to f. I would expect the difference between the two to be negligible, but there's a strange pattern in the data.
In this case, f is initialized to be an 8x8 array of floats with positive values less than 1.
Here's my test program code:
Mat f = .. //populate f
if (f.type() != CV_32FC1)
    DLOG << "Bad f type";

const int y = f.rows;
const int x = f.cols;

double* input = fftw_alloc_real(y * 2*(x/2 + 1));

// forward fft
fftw_plan plan = fftw_plan_dft_r2c_2d(x, y, input, (fftw_complex*)input, FFTW_MEASURE);
// inverse fft
fftw_plan iplan = fftw_plan_dft_c2r_2d(x, y, (fftw_complex*)input, input, FFTW_MEASURE);

// populate fftw data from f
for (int yi = 0; yi < y; ++yi)
{
    const float* yptr = f.ptr<float>(yi);
    for (int xi = 0; xi < x; ++xi)
        input[yi*x + xi] = (double)yptr[xi];
}

fftw_execute(plan);
fftw_execute(iplan);

// put data into another cv::Mat for comparison
Mat check(y, x, f.type());
for (int yi = 0; yi < y; ++yi)
{
    float* yptr = check.ptr<float>(yi);
    for (int xi = 0; xi < x ; ++xi)
        yptr[xi] = (float)input[yi*x + xi];
}

DLOG << Util::summary(f, "f");
DLOG << f;
DLOG << Util::summary(check, "check");
DLOG << check;

Mat diff = f*x*y - check;
DLOG << Util::summary(diff, "diff");
DLOG << diff;
Where DLOG is my logger and Util::summary(cv::Mat m) just prints the passed string along with the dimensions, channels, min, and max of the mat.
Here's what the data looks like (output):
f: rows:8 cols:8 chans:1 min:0.00257996 max:0.4
[0.050668437, 0.04509116, 0.033668514, 0.10986148, 0.12855141, 0.048241843, 0.12613985, 0.09731093;
0.028602425, 0.0092236707, 0.037089188, 0.118964, 0.075040311, 0.40000001, 0.11959606, 0.071930833;
0.0025799556, 0.051522054, 0.22233701, 0.052993439, 0.032000393, 0.12673819, 0.015244827, 0.044803992;
0.13946071, 0.019708242, 0.0112687, 0.047459811, 0.019342113, 0.030085485, 0.018739942, 0.0098618753;
0.041809395, 0.029681522, 0.026837418, 0.16038358, 0.29034778, 0.17247421, 0.1789207, 0.042179305;
0.025630442, 0.017192598, 0.060540862, 0.1854037, 0.21287154, 0.04813192, 0.042614728, 0.034764063;
0.0030835248, 0.018511582, 0.0071733585, 0.017076733, 0.064545207, 0.0026390438, 0.088922881, 0.045725599;
0.12798512, 0.23215951, 0.027465452, 0.03174505, 0.04352935, 0.025079668, 0.044403922, 0.035459157]
check: rows:8 cols:8 chans:1 min:-3.26489 max:25.6
[3.24278, 2.8858342, 2.1547849, 7.0311346, 8.2272902, 3.0874779, 8.0729504, 6.2278996;
0.30818239, 0, 2.373708, 7.6136961, 4.8025799, 25.6, 7.6541481, 4.6035733;
0.16511716, 3.2974114, -3.2648909, 0, 2.0480251, 8.1112442, 0.97566891, 2.8674555;
8.9254856, 1.2613275, 0.72119683, 3.0374279, -0.32588482, 0, 1.1993563, 0.63116002;
2.6758013, 1.8996174, 1.7175947, 10.264549, 18.582258, 11.038349, 0.042666838, 0;
1.6403483, 1.1003263, 3.8746152, 11.865837, 13.623778, 3.0804429, 2.7273426, 2.2249;
0.44932228, 0, 0.45909494, 1.0929109, 4.1308932, 0.16889881, 5.6910644, 2.9264383;
8.1910477, 14.858209, -0.071794562, 0, 2.7858784, 1.6050987, 2.841851, 2.2693861]
diff: rows:8 cols:8 chans:1 min:-0.251977 max:17.4945
[0, 0, 0, 0, 0, 0, 0, 0;
1.5223728, 0.59031492, 0, 0, 0, 0, 0, 0;
0, 0, 17.494459, 3.3915801, 0, 0, 0, 0;
0, 0, 0, 0, 1.5637801, 1.9254711, 0, 0;
0, 0, 0, 0, 0, 0, 11.408258, 2.6994755;
0, 0, 0, 0, 0, 0, 0, 0;
-0.2519767, 1.1847413, 0, 0, 0, 0, 0, 0;
0, 0, 1.8295834, 2.0316832, 0, 0, 0, 0]
The difficult part for me is the nonzero entries in the diff matrix. I've accounted for the scaling FFTW does on the values and the padding needed to do an in-place fft on real data; what am I missing?
I find it surprising that the data could be off by a value of 17 (which is 66% of the max value), when there are so many zeros. Also, the data irregularities seem to form a diagonal pattern.
As you may have noticed when writing fftw_alloc_real(y * 2*(x/2 + 1)), fftw needs extra space in the x direction to store complex data. In your case, as x=8, it needs 2*(x/2+1)=10 reals.
http://www.fftw.org/doc/Real_002ddata-DFT-Array-Format.html#Real_002ddata-DFT-Array-Format
So... you should take care of this as you populate the input array or retrieve values from it.
You may change

input[yi*x + xi] = (double)yptr[xi];

to

int xfft = 2*(x/2 + 1);
...
input[yi*xfft + xi] = (double)yptr[xi];

and

yptr[xi] = (float)input[yi*x + xi];

to

yptr[xi] = (float)input[yi*xfft + xi];

That should solve your problem, since the non-null points in your diff correspond to the extra padding.
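(Putting the fix together, a sketch of the corrected populate and read-back loops with the padded stride, reusing the variables from the question:)

const int xfft = 2*(x/2 + 1); // padded row length for in-place r2c/c2r

// populate fftw data from f, using the padded stride
for (int yi = 0; yi < y; ++yi)
{
    const float* yptr = f.ptr<float>(yi);
    for (int xi = 0; xi < x; ++xi)
        input[yi*xfft + xi] = (double)yptr[xi];
}

fftw_execute(plan);
fftw_execute(iplan);

// read back with the same stride
for (int yi = 0; yi < y; ++yi)
{
    float* yptr = check.ptr<float>(yi);
    for (int xi = 0; xi < x; ++xi)
        yptr[xi] = (float)input[yi*xfft + xi];
}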
Bye,

Which is a good C++ BigInteger class for programming contests?

I was just wondering which would be the best BigInteger class in C++ for programming contests that do not allow external libraries.
Mainly I was looking for a class that I could use in my code (I will of course write it on my own, along similar lines).
The primary factors that I think are important are (in order of importance):
Arbitrary-length numbers and their operations should be supported.
It should be as small as possible, code-wise. Usually there's a limit of ~50KB on the size of the source code that can be submitted, so the code should be (much) smaller than that.
It should be as fast as possible. I read somewhere that bigint classes take O(log(n)) time, so this should have similar complexity. What I mean is that it should be as fast as possible.
So far I've only needed unsigned integer big numbers for codechef, but codechef only gives 2KB, so I don't have the full implementation up there anywhere, just the members needed for that problem. My code also assumes that long long has at least twice as many bits as an unsigned, though that's pretty safe. The only real trick to it is that different biguint classes may have different data lengths. Here are summaries of the more interesting functions.
#define BIG_LEN() (data.size()>rhs.data.size()?data.size():rhs.data.size())
//the length of data or rhs.data, whichever is bigger
#define SML_LEN() (data.size()<rhs.data.size()?data.size():rhs.data.size())
//the length of data or rhs.data, whichever is smaller
const unsigned char baselut[256]={ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0,
0,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,
25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,
41,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,
25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
};
const unsigned char base64lut[256]={ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,62, 0, 0, 0,63,
52,53,54,55,56,57,58,59,60,61, 0, 0, 0, 0, 0, 0,
0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,
15,16,17,18,19,20,21,22,23,24,25, 0, 0, 0, 0, 0,
0,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,
41,42,43,44,45,46,47,48,49,50,51, 0, 0, 0, 0, 0
};
//lookup tables for creating from strings
void add(unsigned int amount, unsigned int index)
    adds amount at index with carry, simplifies other members

void sub(unsigned int amount, unsigned int index)
    subtracts amount at index with borrow, simplifies other members

biguint& operator+=(const biguint& rhs)
    resize data to BIG_LEN()
    int carry = 0
    for each element i in data up to SML_LEN()
        data[i] += rhs.data[i] + carry
        carry = ((data[i]<rhs[i]+carry || (carry && rhs[i]+carry==0)) ? 1u : 0u);
    if data.length > rhs.length
        add(carry, SML_LEN())
biguint& operator*=(const biguint& rhs)
biguint lhs = *this;
resize data to data.length + rhs.length
zero out data
for each element j in lhs
long long t = lhs[j]
for each element i in rhs (and j+i<data.size)
t*=rhs[i];
add(t&UINT_MAX, k);
if (k+1<data.size())
add(t>>uint_bits, k+1);
//note this was public, so I could do both at the same time when needed
//operator /= and %= both just call this
//I have never needed to divide by a biguint yet.
biguint& div(unsigned int rhs, unsigned int & mod)
long long carry = 0
for each element i from data length to zero
carry = (carry << uint_bits) | data[i]
data[i] = carry/rhs;
carry %= rhs
mod = carry
//I have never needed to shift by a biguint yet
biguint& operator<<=(unsigned int rhs)
resize to have enough room, always at least 1 bigger
const unsigned int bigshift = rhs/uint_bits;
const unsigned int lilshift = rhs%uint_bits;
const unsigned int carry_shift = (uint_bits-lilshift)%32;
for each element i from bigshift to zero
t = data[i-bigshift] << lilshift;
t |= data[i-bigshift-1] >> carry_shift;
data[i] = t;
if bigshift < data.size
data[bigshift] = data[0] << lilshift
zero each element i from 0 to bigshift
std::ofstream& operator<<(std::ofstream& out, biguint num)
unsigned int mod
vector reverse
do
num.div(10,mod);
push back mod onto reverse
while num greater than 0
print out elements of reverse in reverse
std::ifstream& operator>>(std::ifstream& in, biguint num)
char next
do
in.get(next)
while next is whitespace
num = 0
do
num = num * 10 + next
while in.get(next) and next is digit
//these are handy for initializing to known values.
//I also have constructors that simply call these
biguint& assign(const char* rhs, unsigned int base)
for each char c in rhs
data *= base
add(baselut[c], 0)
biguint& assign(const char* rhs, std::integral_constant<unsigned int, 64> base)
for each char c in rhs
data *= base
add(base64lut[c], 0)
//despite being 3 times as much, code, the hex version is _way_ faster.
biguint& assign(const char* rhs, std::integral_constant<unsigned int, 16>)
if first two characters are "0x" skip them
unsigned int len = strlen(rhs);
grow(len/4+1);
zero out data
const unsigned int hex_per_int = uint_bits/4;
if (len > hex_per_int*data.size()) { //calculate where first digit goes
rhs += len-hex_per_int*data.size();
len = hex_per_int*data.size();
}
for(unsigned int i=len; i --> 0; ) { //place each digit directly in it's place
unsigned int t = (unsigned int)(baselut[*(rhs++)]) << (i%hex_per_int)*4u;
data[i/hex_per_int] |= t;
}
I also made specializations of multiplication, division, modulo, shifts and others for std::integral_constant<unsigned int, Value>, which made massive improvements to my serializing and deserializing functions, amongst others.
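(To make the summaries concrete, here is a minimal self-contained sketch following the div and stream-output pseudocode above; the limb layout and the trimming of leading zero limbs are assumptions, not the original code:)

#include <iostream>
#include <string>
#include <vector>

struct biguint {
    std::vector<unsigned> data; // little-endian 32-bit limbs
    static constexpr unsigned uint_bits = 32;

    biguint(unsigned v = 0) : data(1, v) {}

    // divide in place by a small unsigned, returning the remainder in mod
    biguint& div(unsigned rhs, unsigned& mod) {
        unsigned long long carry = 0;
        for (std::size_t i = data.size(); i-- > 0; ) {
            carry = (carry << uint_bits) | data[i];
            data[i] = (unsigned)(carry / rhs);
            carry %= rhs;
        }
        mod = (unsigned)carry;
        while (data.size() > 1 && data.back() == 0)
            data.pop_back(); // drop leading zero limbs
        return *this;
    }

    bool isZero() const { return data.size() == 1 && data[0] == 0; }
};

// decimal printing via repeated division by 10, as in the pseudocode
std::ostream& operator<<(std::ostream& out, biguint num) {
    std::string digits;
    unsigned mod;
    do {
        num.div(10, mod);
        digits.push_back((char)('0' + mod));
    } while (!num.isZero());
    for (std::size_t i = digits.size(); i-- > 0; )
        out << digits[i];
    return out;
}

int main() {
    unsigned rem;
    biguint n(123456789u);
    std::cout << n << "\n";                         // prints 123456789
    n.div(1000u, rem);
    std::cout << n << " remainder " << rem << "\n"; // 123456 remainder 789
}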