What are some optimization tricks to make my code run faster? (C++)

I'm moving outside my comfort zone and trying to write a random number distribution program while making sure the result is still somewhat uniform.
Here is my code.
This is the RandomDistribution.h file:
#pragma once
#include <vector>
#include <random>
#include <iostream>
static float randy(float low, float high) {
static std::random_device rd;
static std::mt19937 random(rd());
std::uniform_real_distribution<float> ran(low, high);
return ran(random);
}
typedef std::vector<float> Vfloat;
class RandomDistribution
{
public:
RandomDistribution();
RandomDistribution(float percent, float contents, int container);
~RandomDistribution();
void setvariables(float percent, float contents, int container);
Vfloat RunDistribution();
private:
float divider;
float _percent;
int jar_limit;
float _contents;
float _maxdistribution;
Vfloat Jar;
bool is0;
};
this is my RandomDistribution.cpp
#include "RandomDistribution.h"
RandomDistribution::RandomDistribution() {
}
RandomDistribution::RandomDistribution(float percent, float contents, int containers):_contents(contents),jar_limit(containers)
{
Jar.resize(containers);
if (percent < 0)
_percent = 0;
else {
_percent = percent;
}
divider = jar_limit * percent;
is0 = false;
}
RandomDistribution::~RandomDistribution()
{
}
void RandomDistribution::setvariables(float percent, float contents, int container) {
if (jar_limit != container)
Jar.resize(container);
_contents = contents;
jar_limit = container;
is0 = false;
if (percent < 0)
_percent = 0;
else {
_percent = percent;
}
divider = jar_limit * percent;
}
Vfloat RandomDistribution::RunDistribution() {
for (int i = 0; i < jar_limit; i++) {
if (!is0) {
if (i + 1 >= jar_limit || _contents < 2) {
Jar[i] = _contents;
_contents -= Jar[i];
is0 = true;
}
if (_percent > 0) {//making sure it does not take the whole container at once
_maxdistribution = (_contents / (divider)) * (i + 1);
}
else {
_maxdistribution = _contents;
}
Jar[i] = randy(0, _maxdistribution);
if (Jar[i] < 1) {
Jar[i] = 0;
continue;
}
_contents -= Jar[i];
}
else {
Jar[0];
}
//mixing Jar so it is randomly spaced out instead all at the top
int swapper = randy(0, i);
float hold = Jar[i];
Jar[i] = Jar[swapper];
Jar[swapper] = hold;
}
return Jar;
}
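This is the main function:
#include <chrono>
#include <iostream>
#include "RandomDistribution.h"
using namespace std;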
int main(){
RandomDistribution distribution[100];
for (int i = 0; i < 100; i++) {
distribution[i] = {RandomDistribution(1.0f, 5000.0f, 2000) };
}
Vfloat k;
k.resize(200);
for (int i = 0; i < 10; i++) {
auto t3 = chrono::steady_clock::now();
for (int b = 0; b < 100; b++) {
k = distribution[b].RunDistribution();
distribution[b].setvariables(1.0f, 5000.0f, 2000);
}
auto t4 = chrono::steady_clock::now();
auto time_span = chrono::duration_cast<chrono::duration<double>>(t4 - t3);
cout << time_span.count() << " seconds\n";
}
}
What prints out is usually between 1 and 2 seconds for each cycle. I want to bring it down to a tenth of a second if possible, because this is going to be only one step of the whole process and I want to run it a lot more than 100 times. What can I do to speed this up? Is there a trick or something I'm just missing here?
Here is a sample of the timings:
4.71113 seconds
1.35444 seconds
1.45008 seconds
1.74961 seconds
2.59192 seconds
2.76171 seconds
1.90149 seconds
2.2822 seconds
2.36768 seconds
2.61969 seconds

Cheinan Marks has some benchmarks and performance tips for random number generators and related facilities in his CppCon 2016 talk "I Just Wanted a Random Integer!". He mentions some fast generators as well, IIRC. I'd start there.
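As a starting point for that kind of experiment, here is a minimal sketch (not taken from the talk) of a fill routine that takes the engine as a template parameter, so std::mt19937 can be swapped for a lighter engine such as std::minstd_rand and the two can be timed against each other. The name fillUniform is hypothetical, and whether any engine is actually faster on your machine is an assumption you would have to measure in an optimized build.

```cpp
#include <random>
#include <vector>

// Hypothetical helper (not from the question): fills `out` with uniform
// floats between `low` and `high`, reusing one engine and one distribution
// object for the whole vector instead of constructing them per value.
template <class Engine>
void fillUniform(std::vector<float>& out, float low, float high, Engine& engine) {
    std::uniform_real_distribution<float> dist(low, high);
    for (float& value : out) {
        value = dist(engine);
    }
}
```

Usage would look like `std::minstd_rand fast(std::random_device{}()); std::vector<float> jar(2000); fillUniform(jar, 0.0f, 1.0f, fast);`. The caller owns the engine, so it is seeded once and reused across runs.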

Related

Why is multi-threading of matrix calculation not faster than single-core?

This is my first time using multi-threading to speed up a heavy calculation.
Background: The idea is to calculate a kernel covariance matrix by reading a list of 3D points x_test and calculating the corresponding matrix, which has dimensions x_test.size() x x_test.size().
I already sped up the calculation by computing only the lower triangular matrix. Since all the entries are independent of each other, I tried to speed up the process further (x_test.size() = 27000 in my case) by splitting the calculation of the matrix entries row-wise, assigning a range of rows to each thread.
On a single core the calculation took about 280 seconds each time; on 4 cores it took 270-290 seconds.
main.cpp
int main(int argc, char *argv[]) {
double sigma0sq = 1;
double lengthScale [] = {0.7633, 0.6937, 3.3307e+07};
const std::vector<std::vector<double>> x_test = parse2DCsvFile(inputPath);
/* Finding data slices of similar size */
//This piece of code works, each thread is assigned roughly the same number of matrix entries
int numElements = x_test.size()*x_test.size()/2;
const int numThreads = 4;
int elemsPerThread = numElements / numThreads;
std::vector<int> indices;
int j = 0;
for(std::size_t i=1; i<x_test.size()+1; ++i){
int prod = i*(i+1)/2 - j*(j+1)/2;
if (prod > elemsPerThread) {
i--;
j = i;
indices.push_back(i);
if(indices.size() == numThreads-1)
break;
}
}
indices.insert(indices.begin(), 0);
indices.push_back(x_test.size());
/* Spreading calculations to multiple threads */
std::vector<std::thread> threads;
for(std::size_t i = 1; i < indices.size(); ++i){
threads.push_back(std::thread(calculateKMatrixCpp, x_test, lengthScale, sigma0sq, i, indices.at(i-1), indices.at(i)));
}
for(auto & th: threads){
th.join();
}
return 0;
}
As you can see, each thread performs the following calculations on the data assigned to it:
void calculateKMatrixCpp(const std::vector<std::vector<double>> xtest, double lengthScale[], double sigma0sq, int threadCounter, int start, int stop){
char buffer[8192];
std::ofstream out("lower_half_matrix_" + std::to_string(threadCounter) +".csv");
out.rdbuf()->pubsetbuf(buffer, sizeof(buffer));
for(int i = start; i < stop; ++i){
for(int j = 0; j < i+1; ++j){
double kij = seKernel(xtest.at(i), xtest.at(j), lengthScale, sigma0sq);
if (j!=0)
out << ',';
out << kij;
}
if(i!=xtest.size()-1 )
out << '\n';
}
out.close();
}
and
double seKernel(const std::vector<double> x1,const std::vector<double> x2, double lengthScale[], double sigma0sq) {
double sum(0);
for(std::size_t i=0; i<x1.size();i++){
sum += pow((x1.at(i)-x2.at(i))/lengthScale[i],2);
}
return sigma0sq*exp(-0.5*sum);
}
Aspects I considered
locking by simultaneous access to data vector -> I don't pass a reference to the threads, but a copy of the data. I know this is not optimal in terms of RAM usage, but as far as I know this should prevent simultaneous data access since every thread has its own copy
Output -> every thread writes its part of the lower triangular matrix to its own file. My task manager doesn't indicate a full SSD utilization in the slightest
Compiler and machine
Windows 11
GNU GCC Compiler
Code::Blocks (although I don't think that should be of importance)

There are many details that can be improved in your code, but I think the two biggest issues are:
using vectors of vectors, which leads to fragmented data;
writing each piece of data to file as soon as its value is computed.
The first point is easy to fix: use something like std::vector<std::array<double, 3>>. In the code below I use an alias to make it more readable:
using Point3D = std::array<double, 3>;
std::vector<Point3D> x_test;
The second point is slightly harder to address. I assume you wanted to write to the disk inside each thread because you couldn't manage to write to a shared buffer that you could then write to a file.
Here is a way to do exactly that:
void calculateKMatrixCpp(
std::vector<Point3D> const& xtest, Point3D const& lengthScale, double sigma0sq,
int threadCounter, int start, int stop, std::vector<double>& kMatrix
) {
// ...
double& kij = kMatrix[i * xtest.size() + j];
kij = seKernel(xtest[i], xtest[j], lengthScale, sigma0sq);
// ...
}
// ...
threads.push_back(std::thread(
calculateKMatrixCpp, x_test, lengthScale, sigma0sq,
i, indices[i-1], indices[i], std::ref(kMatrix)
));
Here, kMatrix is the shared buffer and represents the whole matrix you are trying to compute. You need to pass it to the thread via std::ref. Each thread will write to a different location in that buffer, so there is no need for any mutex or other synchronization.
Once you make these changes and try to write kMatrix to the disk, you will realize that this is the part that takes the most time, by far.
Below is the full code I tried on my machine, and the computation time was about 2 seconds whereas the writing-to-file part took 300 seconds! No amount of multithreading can speed that up.
If you truly want to write all that data to the disk, you may have some luck with file mapping. Computing the exact size needed should be easy enough if all values have the same number of digits, and it looks like you could write the values with multithreading. I have never done anything like that, so I can't really say much more about it, but it looks to me like the fastest way to write multiple gigabytes of memory to the disk.
#include <vector>
#include <thread>
#include <iostream>
#include <string>
#include <cmath>
#include <array>
#include <random>
#include <fstream>
#include <chrono>
using Point3D = std::array<double, 3>;
auto generateSampleData() -> std::vector<Point3D> {
static std::minstd_rand g(std::random_device{}());
std::uniform_real_distribution<> d(-1.0, 1.0);
std::vector<Point3D> data;
data.reserve(27000);
for (auto i = 0; i < 27000; ++i) {
data.push_back({ d(g), d(g), d(g) });
}
return data;
}
double seKernel(Point3D const& x1, Point3D const& x2, Point3D const& lengthScale, double sigma0sq) {
double sum = 0.0;
for (auto i = 0u; i < 3u; ++i) {
double distance = (x1[i] - x2[i]) / lengthScale[i];
sum += distance*distance;
}
return sigma0sq * std::exp(-0.5*sum);
}
void calculateKMatrixCpp(std::vector<Point3D> const& xtest, Point3D const& lengthScale, double sigma0sq, int threadCounter, int start, int stop, std::vector<double>& kMatrix) {
std::cout << "start of thread " << threadCounter << "\n" << std::flush;
for(int i = start; i < stop; ++i) {
for(int j = 0; j < i+1; ++j) {
double& kij = kMatrix[i * xtest.size() + j];
kij = seKernel(xtest[i], xtest[j], lengthScale, sigma0sq);
}
}
std::cout << "end of thread " << threadCounter << "\n" << std::flush;
}
int main() {
double sigma0sq = 1;
Point3D lengthScale = {0.7633, 0.6937, 3.3307e+07};
const std::vector<Point3D> x_test = generateSampleData();
/* Finding data slices of similar size */
//This piece of code works, each thread is assigned roughly the same number of matrix entries
int numElements = x_test.size()*x_test.size()/2;
const int numThreads = 4;
int elemsPerThread = numElements / numThreads;
std::vector<int> indices;
int j = 0;
for(std::size_t i = 1; i < x_test.size()+1; ++i){
int prod = i*(i+1)/2 - j*(j+1)/2;
if (prod > elemsPerThread) {
i--;
j = i;
indices.push_back(i);
if(indices.size() == numThreads-1)
break;
}
}
indices.insert(indices.begin(), 0);
indices.push_back(x_test.size());
auto start = std::chrono::system_clock::now();
std::vector<double> kMatrix(x_test.size() * x_test.size(), 0.0);
std::vector<std::thread> threads;
for (std::size_t i = 1; i < indices.size(); ++i) {
threads.push_back(std::thread(calculateKMatrixCpp, x_test, lengthScale, sigma0sq, i, indices[i - 1], indices[i], std::ref(kMatrix)));
}
for (auto& t : threads) {
t.join();
}
auto end = std::chrono::system_clock::now();
auto elapsed_seconds = std::chrono::duration<double>(end - start).count();
std::cout << "computation time: " << elapsed_seconds << "s" << std::endl;
start = std::chrono::system_clock::now();
constexpr int buffer_size = 131072;
char buffer[buffer_size];
std::ofstream out("matrix.csv");
out.rdbuf()->pubsetbuf(buffer, buffer_size);
for (int i = 0; i < x_test.size(); ++i) {
for (int j = 0; j < i + 1; ++j) {
if (j != 0) {
out << ',';
}
out << kMatrix[i * x_test.size() + j];
}
if (i != x_test.size() - 1) {
out << '\n';
}
}
end = std::chrono::system_clock::now();
elapsed_seconds = std::chrono::duration<double>(end - start).count();
std::cout << "writing time: " << elapsed_seconds << "s" << std::endl;
}
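The row-splitting loop reused from the question can also be sketched in closed form. Rows [0, r) of a lower triangular matrix hold r(r+1)/2 entries, roughly r²/2, so giving each of T threads an equal share means putting the k-th boundary near n·sqrt(k/T). The helper name triangularRowBounds is hypothetical, and the result is only approximately balanced; treat it as an illustration, not a drop-in replacement.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns parts + 1 row boundaries for an n x n lower
// triangular matrix, so that rows [bounds[k], bounds[k + 1]) hold roughly
// the same number of entries for every k.
std::vector<std::size_t> triangularRowBounds(std::size_t n, std::size_t parts) {
    if (parts == 0) {
        return {0, n};  // degenerate request: a single slice covering all rows
    }
    std::vector<std::size_t> bounds;
    bounds.reserve(parts + 1);
    for (std::size_t k = 0; k <= parts; ++k) {
        // The cumulative entry count grows like r^2 / 2, hence the square root.
        const double fraction = static_cast<double>(k) / static_cast<double>(parts);
        const double row = static_cast<double>(n) * std::sqrt(fraction);
        bounds.push_back(static_cast<std::size_t>(std::llround(row)));
    }
    return bounds;
}
```

With n = 27000 and 4 threads, the first slice ends at row 13500: the first half of the rows holds only about a quarter of the entries.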

Okay, I've written an implementation with optimized formatting.
With @Nelfeal's code, the run took around 250 seconds on my system, with the write time taking the most by far. Or rather, the std::ofstream formatting took most of the time.
I've written a C++20 version using std::format_to/std::format. It is a multi-threaded version that takes around 25-40 seconds to complete all the computation, formatting, and writing. Run in a single thread, it takes around 70 seconds on my system. The same performance should be achievable with the fmt library on C++11/14/17.
Here is the code:
import <vector>;
import <thread>;
import <iostream>;
import <string>;
import <cmath>;
import <array>;
import <random>;
import <fstream>;
import <chrono>;
import <format>;
import <filesystem>;
using Point3D = std::array<double, 3>;
auto generateSampleData(Point3D scale) -> std::vector<Point3D>
{
static std::minstd_rand g(std::random_device{}());
std::uniform_real_distribution<> d(-1.0, 1.0);
std::vector<Point3D> data;
data.reserve(27000);
for (auto i = 0; i < 27000; ++i)
{
data.push_back({ d(g)* scale[0], d(g)* scale[1], d(g)* scale[2] });
}
return data;
}
double seKernel(Point3D const& x1, Point3D const& x2, Point3D const& lengthScale, double sigma0sq) {
double sum = 0.0;
for (auto i = 0u; i < 3u; ++i) {
double distance = (x1[i] - x2[i]) / lengthScale[i];
sum += distance * distance;
}
return sigma0sq * std::exp(-0.5 * sum);
}
void calculateKMatrixCpp(std::vector<Point3D> const& xtest, Point3D lengthScale, double sigma0sq, int threadCounter, int start, int stop, std::filesystem::path localPath)
{
using namespace std::string_view_literals;
std::vector<char> buffer;
buffer.reserve(15'000);
std::ofstream out(localPath);
std::cout << std::format("starting thread {}: from {} to {}\n"sv, threadCounter, start, stop);
for (int i = start; i < stop; ++i)
{
for (int j = 0; j < i; ++j)
{
double kij = seKernel(xtest[i], xtest[j], lengthScale, sigma0sq);
std::format_to(std::back_inserter(buffer), "{:.6g}, "sv, kij);
}
double kii = seKernel(xtest[i], xtest[i], lengthScale, sigma0sq);
std::format_to(std::back_inserter(buffer), "{:.6g}\n"sv, kii);
out.write(buffer.data(), buffer.size());
buffer.clear();
}
}
int main() {
double sigma0sq = 1;
Point3D lengthScale = { 0.7633, 0.6937, 3.3307e+07 };
const std::vector<Point3D> x_test = generateSampleData(lengthScale);
/* Finding data slices of similar size */
//This piece of code works, each thread is assigned roughly the same number of matrix entries
int numElements = x_test.size() * (x_test.size()+1) / 2;
const int numThreads = 3;
int elemsPerThread = numElements / numThreads;
std::vector<int> indices;
int j = 0;
for (std::size_t i = 1; i < x_test.size() + 1; ++i) {
int prod = i * (i + 1) / 2 - j * (j + 1) / 2;
if (prod > elemsPerThread) {
i--;
j = i;
indices.push_back(i);
if (indices.size() == numThreads - 1)
break;
}
}
indices.insert(indices.begin(), 0);
indices.push_back(x_test.size());
auto start = std::chrono::system_clock::now();
std::vector<std::thread> threads;
using namespace std::string_view_literals;
for (std::size_t i = 1; i < indices.size(); ++i)
{
threads.push_back(std::thread(calculateKMatrixCpp, std::ref(x_test), lengthScale, sigma0sq, i, indices[i - 1], indices[i], std::format("./matrix_{}.csv"sv, i-1)));
}
for (auto& t : threads)
{
t.join();
}
auto end = std::chrono::system_clock::now();
auto elapsed_seconds = std::chrono::duration<double>(end - start);
std::cout << std::format("total elapsed time: {}"sv, elapsed_seconds);
return 0;
}
Note: I used 6 digits of precision here as it is the default for std::ofstream. More digits means more writing time to disk and lower performance.
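For toolchains without <format>, the same "format a whole row into a buffer, then write it once" idea can be sketched with std::snprintf and the "%.6g" conversion, which also gives 6 significant digits. The helper name appendRow is hypothetical, and I have not measured this variant, so the timings above do not apply to it.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: appends one CSV row ("v0, v1, ..., vn\n") to `buffer`,
// formatting every value with 6 significant digits.
void appendRow(std::string& buffer, const std::vector<double>& row) {
    char cell[32];  // "%.6g" output such as "-1.23457e-308" fits comfortably
    for (std::size_t j = 0; j < row.size(); ++j) {
        const int len = std::snprintf(cell, sizeof cell, "%.6g", row[j]);
        if (len > 0) {
            buffer.append(cell, static_cast<std::size_t>(len));
        }
        // Separator between values, newline after the last one.
        buffer += (j + 1 < row.size()) ? ", " : "\n";
    }
}
```

Each thread would keep its own std::string, call appendRow once per matrix row, pass the result to out.write(buffer.data(), buffer.size()), and then clear() the string so its capacity is reused.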

Fixing Neural Net vanishing gradients problem?

This is going to be a long one. I am still very new to coding (I started 3 months ago), so I know my code is not perfect; any criticism beyond the question is more than welcome. I have specifically avoided using pointers because I do not fully understand them. I can use them, but I don't trust that I will use them correctly in a program like this.
First things first, I have a version of this where there is only 1 hidden layer and the net works perfectly. I have started running into problems since I tried to expand the number of hidden layers.
Some info on the net:
-I am using softmax output activation as I have 3 output neurons.
-I am using tanh as my activation function on the rest of the net.
-The file being read for the input has a format of
"input: 0.56 0.76 0.23 0.67"
"output: 0.0 0.0 1.0" (this is the target)
-The weights connecting a layer 1 neuron to a layer 2 neuron are stored in the layer 1 neuron.
-The biases for each neuron are stored in that neuron.
-The target is 1.0 0.0 0.0 if the sum of the input numbers is below one, 0.0 1.0 0.0 if sum is between 1 and 2, 0.0 0.0 1.0 if sum is above 2.
-using L1 regularization.
Those problems specifically being:
The softmax output values do not move from a relatively equalised range, i.e.:
positions 1 and 2 in the target vector have a roughly 50/50 occurrence rate, while position 3 has a less than 3% occurrence rate. So by relatively equalised I mean the softmax output generally looks something like
"0.56.... 0.48.... 0.02..." even after 500 epochs.
The weights at the hidden layer closest to the input layer don't change much at all, which is what I think vanishing gradients are. I might be wrong on this. But the weights at the hidden layer closest to the output end up between -50 and 50 (which I think is okay?).
Things I have tried:
I have tried using ReLU, parametric ReLU, and exponential ReLU, but with all of these the softmax output value for neuron 3 keeps rising while the other 2 neurons' values keep falling. These values continue on their trajectory until either 500 epochs have been reached or they just turn into NaNs. (I think this has to do with the structure of my code rather than the ReLU function itself.)
If I set the number of hidden layers above 3 while using ReLU, it immediately spits out NaNs, within the first epoch.
The backprop function is pretty long, but this is specifically because I have deconstructed it many times over to try to figure out where I might be mismatching values or something. I do have a condensed version, but I feel I have a higher chance of being completely off the mark there than I do with the deconstructed one.
I have included the ReLU function code that I used. It is the first time I have used it, so I might be wrong on that as well, but I don't think so; I have double-checked multiple times. The "Relu" in the code is specifically ELU, or exponential ReLU.
here is the code for the net:
#include <iostream>
#include <fstream>
#include <cmath>
#include <vector>
#include <sstream>
#include <random>
#include <string>
#include <iomanip>
double randomt(double x, double y)
{
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<double> dist(x, y);
return dist(mt);
}
class InputN
{
public:
double val{};
std::vector <double> weights{};
};
class HiddenN
{
public:
double preactval{};
double actval{};
double actvalPD{};
double preactvalpd{};
std::vector <double> weights{};
double bias{};
};
class OutputN
{
public:
double preactval{};
double actval{};
double preactvalpd{};
double bias{};
};
class Net
{
public:
std::vector <InputN> inneurons{};
std::vector <std::vector <HiddenN>> hiddenneurons{};
std::vector <OutputN> outputneurons{};
double lambda{ 0.015 };
double alpha{ 0.02 };
};
double tanhderiv(double val)
{
return 1 - tanh(val) * tanh(val);
}
double Relu(double val)
{
if (val < 0) return 0.01 *(exp(val) - 1);
else return val;
}
double Reluderiv(double val)
{
if (val < 0) return Relu(val) + 0.01;
else return 1;
}
double regularizer(double weight)
{
double absval{};
if (weight < 0) absval = weight - weight - weight;
else if (weight > 0 || weight == 0) absval = weight;
else;
if (absval > 0) return 1;
else if (absval < 0) return -1;
else if (absval == 0) return 0;
else return 2;
}
void feedforward(Net& net)
{
double sum{};
int prevlayer{};
for (size_t Hsize = 0; Hsize < net.hiddenneurons.size(); Hsize++)
{
//std::cout << "in first loop" << '\n';
prevlayer = Hsize - 1;
for (size_t Hel = 0; Hel < net.hiddenneurons[Hsize].size(); Hel++)
{
//std::cout << "in second loop" << '\n';
if (Hsize == 0)
{
//std::cout << "in first if" << '\n';
for (size_t Isize = 0; Isize < net.inneurons.size(); Isize++)
{
//std::cout << "in fourth loop" << '\n';
sum += (net.inneurons[Isize].val * net.inneurons[Isize].weights[Hel]);
}
net.hiddenneurons[Hsize][Hel].preactval = net.hiddenneurons[Hsize][Hel].bias + sum;
net.hiddenneurons[Hsize][Hel].actval = tanh(sum);
sum = 0;
//std::cout << "first if done" << '\n';
}
else
{
//std::cout << "in else" << '\n';
for (size_t prs = 0; prs < net.hiddenneurons[prevlayer].size(); prs++)
{
//std::cout << "in fourth loop" << '\n';
sum += net.hiddenneurons[prevlayer][prs].actval * net.hiddenneurons[prevlayer][prs].weights[Hel];
}
//std::cout << "fourth loop done" << '\n';
net.hiddenneurons[Hsize][Hel].preactval = net.hiddenneurons[Hsize][Hel].bias + sum;
net.hiddenneurons[Hsize][Hel].actval = tanh(sum);
//std::cout << "else done" << '\n';
sum = 0;
}
}
}
//std::cout << "first loop done " << '\n';
int lasthid = net.hiddenneurons.size() - 1;
for (size_t Osize = 0; Osize < net.outputneurons.size(); Osize++)
{
for (size_t Hsize = 0; Hsize < net.hiddenneurons[lasthid].size(); Hsize++)
{
sum += (net.hiddenneurons[lasthid][Hsize].actval * net.hiddenneurons[lasthid][Hsize].weights[Osize]);
}
net.outputneurons[Osize].preactval = net.outputneurons[Osize].bias + sum;
}
}
void softmax(Net& net)
{
double sum{};
for (size_t Osize = 0; Osize < net.outputneurons.size(); Osize++)
{
sum += exp(net.outputneurons[Osize].preactval);
}
for (size_t Osize = 0; Osize < net.outputneurons.size(); Osize++)
{
net.outputneurons[Osize].actval = exp(net.outputneurons[Osize].preactval) / sum;
}
}
void lossfunc(Net& net, std::vector <double> target)
{
int pos{ -1 };
double val{};
for (size_t t = 0; t < target.size(); t++)
{
pos += 1;
if (target[t] > 0)
{
break;
}
}
for (size_t s = 0; net.outputneurons.size(); s++)
{
val = -log(net.outputneurons[pos].actval);
}
}
void backprop(Net& net, std::vector<double>& target)
{
for (size_t outI = 0; outI < net.outputneurons.size(); outI++)
{
double PD = target[outI] - net.outputneurons[outI].actval;
net.outputneurons[outI].preactvalpd = PD * -1;
}
size_t lasthid = net.hiddenneurons.size() - 1;
for (size_t LH = 0; LH < net.hiddenneurons[lasthid].size(); LH++)
{
for (size_t LHW = 0; LHW < net.hiddenneurons[lasthid][LH].weights.size(); LHW++)
{
double weight = net.hiddenneurons[lasthid][LH].weights[LHW];
double PD = net.outputneurons[LHW].preactvalpd * net.hiddenneurons[lasthid][LH].actval;
PD = PD * -1;
double delta = PD - (net.lambda * regularizer(weight));
weight = weight + (net.alpha * delta);
net.hiddenneurons[lasthid][LH].weights[LHW] = weight;
}
}
for (size_t OB = 0; OB < net.outputneurons.size(); OB++)
{
double bias = net.outputneurons[OB].bias;
double BPD = net.outputneurons[OB].preactvalpd;
BPD = BPD * -1;
double Delta = BPD;
bias = bias + (net.alpha * Delta);
}
for (size_t HPD = 0; HPD < net.hiddenneurons[lasthid].size(); HPD++)
{
double PD{};
for (size_t HW = 0; HW < net.outputneurons.size(); HW++)
{
PD += net.hiddenneurons[lasthid][HPD].weights[HW] * net.outputneurons[HW].preactvalpd;
}
net.hiddenneurons[lasthid][HPD].actvalPD = PD;
PD = 0;
}
for (size_t HPD = 0; HPD < net.hiddenneurons[lasthid].size(); HPD++)
{
net.hiddenneurons[lasthid][HPD].preactvalpd = net.hiddenneurons[lasthid][HPD].actvalPD * tanhderiv(net.hiddenneurons[lasthid][HPD].preactval);
}
for (size_t AllHid = net.hiddenneurons.size() - 2; AllHid > -1; AllHid--)
{
size_t uplayer = AllHid + 1;
for (size_t cl = 0; cl < net.hiddenneurons[AllHid].size(); cl++)
{
for (size_t clw = 0; clw < net.hiddenneurons[AllHid][cl].weights.size(); clw++)
{
double weight = net.hiddenneurons[AllHid][cl].weights[clw];
double PD = net.hiddenneurons[uplayer][clw].preactvalpd * net.hiddenneurons[AllHid][cl].actval;
PD = PD * -1;
double delta = PD - (net.lambda * regularizer(weight));
weight = weight + (net.alpha * delta);
net.hiddenneurons[AllHid][cl].weights[clw] = weight;
}
}
for (size_t up = 0; up < net.hiddenneurons[uplayer].size(); up++)
{
double bias = net.hiddenneurons[uplayer][up].bias;
double PD = net.hiddenneurons[uplayer][up].preactvalpd;
PD = PD * -1;
double delta = PD;
bias = bias + (net.alpha * delta);
}
for (size_t APD = 0; APD < net.hiddenneurons[AllHid].size(); APD++)
{
double PD{};
for (size_t APDW = 0; APDW < net.hiddenneurons[AllHid][APD].weights.size(); APDW++)
{
PD += net.hiddenneurons[AllHid][APD].weights[APDW] * net.hiddenneurons[uplayer][APDW].preactvalpd;
}
net.hiddenneurons[AllHid][APD].actvalPD = PD;
PD = 0;
}
for (size_t PPD = 0; PPD < net.hiddenneurons[AllHid].size(); PPD++)
{
double PD = net.hiddenneurons[AllHid][PPD].actvalPD * tanhderiv(net.hiddenneurons[AllHid][PPD].preactval);
net.hiddenneurons[AllHid][PPD].preactvalpd = PD;
}
}
for (size_t IN = 0; IN < net.inneurons.size(); IN++)
{
for (size_t INW = 0; INW < net.inneurons[IN].weights.size(); INW++)
{
double weight = net.inneurons[IN].weights[INW];
double PD = net.hiddenneurons[0][INW].preactvalpd * net.inneurons[IN].val;
PD = PD * -1;
double delta = PD - (net.lambda * regularizer(weight));
weight = weight + (net.alpha * delta);
net.inneurons[IN].weights[INW] = weight;
}
}
for (size_t hidB = 0; hidB < net.hiddenneurons[0].size(); hidB++)
{
double bias = net.hiddenneurons[0][hidB].bias;
double PD = net.hiddenneurons[0][hidB].preactvalpd;
PD = PD * -1;
double delta = PD;
bias = bias + (net.alpha * delta);
net.hiddenneurons[0][hidB].bias = bias;
}
}
int main()
{
std::vector <double> invals{ };
std::vector <double> target{ };
Net net;
InputN Ineuron;
HiddenN Hneuron;
OutputN Oneuron;
int IN = 4;
int HIDLAYERS = 4;
int HID = 8;
int OUT = 3;
for (int i = 0; i < IN; i++)
{
net.inneurons.push_back(Ineuron);
for (int m = 0; m < HID; m++)
{
net.inneurons.back().weights.push_back(randomt(0.0, 0.5));
}
}
//std::cout << "first loop done" << '\n';
for (int s = 0; s < HIDLAYERS; s++)
{
net.hiddenneurons.push_back(std::vector <HiddenN>());
if (s == HIDLAYERS - 1)
{
for (int i = 0; i < HID; i++)
{
net.hiddenneurons[s].push_back(Hneuron);
for (int m = 0; m < OUT; m++)
{
net.hiddenneurons[s].back().weights.push_back(randomt(0.0, 0.5));
}
net.hiddenneurons[s].back().bias = 1.0;
}
}
else
{
for (int i = 0; i < HID; i++)
{
net.hiddenneurons[s].push_back(Hneuron);
for (int m = 0; m < HID; m++)
{
net.hiddenneurons[s].back().weights.push_back(randomt(0.0, 0.5));
}
net.hiddenneurons[s].back().bias = 1.0;
}
}
}
//std::cout << "second loop done" << '\n';
for (int i = 0; i < OUT; i++)
{
net.outputneurons.push_back(Oneuron);
net.outputneurons.back().bias = randomt(0.0, 0.5);
}
//std::cout << "third loop done" << '\n';
int count{};
std::ifstream fileread("N.txt");
for (int epoch = 0; epoch < 500; epoch++)
{
count = 0;
if (epoch == 100 || epoch == 100 * 2 || epoch == 100 * 3 || epoch == 100 * 4 || epoch == 499)
{
printvals("no", net);
}
fileread.clear(); fileread.seekg(0, std::ios::beg);
while (fileread.is_open())
{
std::cout << '\n' << "epoch: " << epoch << '\n';
std::string fileline{};
fileread >> fileline;
if (fileline == "in:")
{
std::string input{};
double nums{};
std::getline(fileread, input);
std::stringstream ss(input);
while (ss >> nums)
{
invals.push_back(nums);
}
}
if (fileline == "out:")
{
std::string output{};
double num{};
std::getline(fileread, output);
std::stringstream ss(output);
while (ss >> num)
{
target.push_back(num);
}
}
count += 1;
if (count == 2)
{
for (size_t inv = 0; inv < invals.size(); inv++)
{
net.inneurons[inv].val = invals[inv];
}
//std::cout << "calling feedforward" << '\n';
feedforward(net);
//std::cout << "ff done" << '\n';
softmax(net);
printvals("output", net);
std::cout << "target: " << '\n';
for (auto element : target) std::cout << element << " / ";
std::cout << '\n';
backprop(net, target);
invals.clear();
target.clear();
count = 0;
}
if (fileread.eof()) break;
}
}
//std::cout << "fourth loop done" << '\n';
return 1;
}
Much appreciated to anyone who actually made it through all that! :)
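One detail in backprop above is worth checking against the "weights near the input layer barely change" symptom. The loop over AllHid counts down with a size_t index and tests AllHid > -1. In that comparison, -1 is converted to the largest size_t value, so the condition is false on the first check and the loop body never runs. The sketch below shows that comparison spelled out, plus one common way to write a reverse loop over unsigned indices. Both function names are hypothetical and are not part of the net's code.

```cpp
#include <cstddef>
#include <vector>

// What `i > -1` amounts to when i is a std::size_t: -1 is converted to the
// maximum std::size_t value, so no index can ever be greater than it.
bool unsignedGreaterThanMinusOne(std::size_t i) {
    return i > static_cast<std::size_t>(-1);
}

// Hypothetical helper: returns the indices visited by a reverse loop over
// [0, count). The test `i-- > 0` checks the old value and then decrements,
// so the body sees count - 1, ..., 1, 0 and the loop stops without wrapping.
std::vector<std::size_t> reverseIndices(std::size_t count) {
    std::vector<std::size_t> visited;
    for (std::size_t i = count; i-- > 0;) {
        visited.push_back(i);
    }
    return visited;
}
```

Applied to the net, a loop written as `for (size_t AllHid = net.hiddenneurons.size() - 1; AllHid-- > 0;)` would visit size() - 2 down to 0, which is the range the original loop appears to be aiming for.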

C++ Memory Error

When I run my code, I repeatedly get the error
free(): invalid next size (fast)
Yet the code only goes so far as to create references. Specifically, commenting out a specific line seems to fix the error; however, it's a very important line.
void neuron::updateWeights(layer &prevLayer) {
for(unsigned i = 0; i < prevLayer.size(); i++) {
double oldDeltaWeight = prevLayer[i].m_connections[m_index].m_deltaWeight;
double newDeltaWeight = eta * prevLayer[i].m_output * m_gradient + alpha * oldDeltaWeight;
prevLayer[i].m_connections[m_index].m_deltaWeight = newDeltaWeight; // THIS LINE
prevLayer[i].m_connections[m_index].m_weight += newDeltaWeight;
}
}
Any help would be very appreciated!
EDIT:
Additional code
// Headers
#include "../../Include/neuralNet.h"
// Libraries
#include <vector>
#include <iostream>
#include <cmath>
// Namespace
using namespace std;
// Class constructor
neuron::neuron(unsigned index, unsigned outputs) {
m_index = index;
for(unsigned i = 0; i < outputs; i++) {
m_connections.push_back(connection());
}
// Set default neuron output
setOutput(1.0);
}
double neuron::eta = 0.15; // overall net learning rate, [0.0..1.0]
double neuron::alpha = 0.5; // momentum, multiplier of last deltaWeight, [0.0..1.0]
// Definition of transfer function method
double neuron::transferFunction(double x) const {
return tanh(x); // -1 -> 1
}
// Transfer function derivation method
double neuron::transferFunctionDerivative(double x) const {
return 1 - x*x; // Derivative of tanh
}
// Set output value
void neuron::setOutput(double value) {
m_output = value;
}
// Forward propagate
void neuron::recalculate(layer &previousLayer) {
double sum = 0.0;
for(unsigned i = 0; i < previousLayer.size(); i++) {
sum += previousLayer[i].m_output * previousLayer[i].m_connections[m_index].m_weight;
}
setOutput(transferFunction(sum));
}
// Change weights based on target
void neuron::updateWeights(layer &prevLayer) {
for(unsigned i = 0; i < prevLayer.size(); i++) {
double oldDeltaWeight = prevLayer[i].m_connections[m_index].m_deltaWeight;
double newDeltaWeight = eta * prevLayer[i].m_output * m_gradient + alpha * oldDeltaWeight;
prevLayer[i].m_connections[m_index].m_deltaWeight = newDeltaWeight;
prevLayer[i].m_connections[m_index].m_weight += newDeltaWeight;
}
}
// Complex math stuff
void neuron::calculateOutputGradients(double target) {
double delta = target - m_output;
m_gradient = delta * transferFunctionDerivative(m_output);
}
double neuron::sumDOW(const layer &nextLayer) {
double sum = 0.0;
for(unsigned i = 1; i < nextLayer.size(); i++) {
sum += m_connections[i].m_weight * nextLayer[i].m_gradient;
}
return sum;
}
void neuron::calculateHiddenGradients(const layer &nextLayer) {
double dow = sumDOW(nextLayer);
m_gradient = dow * neuron::transferFunctionDerivative(m_output);
}
Also the line is called here
// Update weights
for(unsigned layerIndex = m_layers.size() - 1; layerIndex > 0; layerIndex--) {
layer &currentLayer = m_layers[layerIndex];
layer &previousLayer = m_layers[layerIndex - 1];
for(unsigned i = 1; i < currentLayer.size(); i++) {
currentLayer[i].updateWeights(previousLayer);
}
}

Your constructor initializes m_connections with 'outputs' elements.
But you have a lot of places calling:
m_connections[m_index]
What happens if m_index >= outputs? Is this possible in your problem?
Try including an assert (http://www.cplusplus.com/reference/cassert/assert/) in the first line of the constructor:
assert(index < outputs);
You probably have an out-of-bounds memory access somewhere.
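Another way to locate such an access, sketched under the assumption that m_connections is a std::vector, is to temporarily replace operator[] with at(), which throws std::out_of_range instead of silently corrupting the heap. The Connection struct and tryUpdate function below are hypothetical stand-ins, not the asker's actual types.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the question's connection type.
struct Connection {
    double weight = 0.0;
    double deltaWeight = 0.0;
};

// Applies a weight update through bounds-checked access.
// Returns false, leaving the vector untouched, if `index` is out of range.
bool tryUpdate(std::vector<Connection>& connections, std::size_t index, double delta) {
    try {
        Connection& c = connections.at(index);  // throws std::out_of_range if index >= size()
        c.deltaWeight = delta;
        c.weight += delta;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

In the real code you would probably let the exception propagate, or break on it in a debugger, so that the call stack shows which layer and index are involved.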

How to set a timeout in a function that is running in a thread

I'm doing blocking communication with a server from a client. The function runs in a thread, and I would like to add timeout functionality. I'm not using Boost or anything like that; I'm using the Windows threading library.
Here is the function that I want to add the timeout to:
bool S3W::IWFSData::WaitForCompletion(unsigned int timeout)
{
if (m_Buffer)
{
while (!m_Buffer.IsEmpty())
{
unsigned int i = 0;
char gfname[255]; // must be changed to SBuffer
char minHeightArr[8], maxHeightArr[8], xArr[8], yArr[8];
m_PingTime += timeout;
if (m_PingTime > PONG_TIMEOUT)
{
m_PingTime = 0;
return false;
}
while (m_Buffer[i] != '\0')
{
gfname[i] = m_Buffer[i];
i++;
}
gfname[i] = '\0';
for (unsigned int j = 0; j < 8; j++)
{
minHeightArr[j] = m_Buffer[i++];
}
for (unsigned int j = 0; j < 8; j++)
{
maxHeightArr[j] = m_Buffer[i++];
}
double minH = *(double*)minHeightArr;
double maxH = *(double*)maxHeightArr;
for (unsigned int j = 0; j < 8; j++)
{
xArr[j] = m_Buffer[i++];
}
for (unsigned int j = 0; j < 8; j++)
{
yArr[j] = m_Buffer[i++];
}
double x = *(double*)xArr;
double y = *(double*)yArr;
OGRFeature *poFeature = OGRFeature::CreateFeature(m_Layer->GetLayerDefn());
if(poFeature)
{
poFeature->SetField("gfname", gfname);
poFeature->SetField("minHeight", minH);
poFeature->SetField("maxHeight", maxH);
OGRPoint point;
point.setX(x);
point.setY(y);
poFeature->SetGeometry(&point);
if (m_Layer->CreateFeature(poFeature) != OGRERR_NONE)
{
std::cout << "error inserting an area" << std::endl;
}
else
{
std::cout << "Created a feature" << std::endl;
}
}
OGRFeature::DestroyFeature(poFeature);
m_Buffer.Cut(0, i);
}
}
return true;
}
This is the thread that writes the data into the buffer:
int S3W::ImplConnection::Thread(void * pData)
{
SNet::SAutoLock lockReader(m_sLock);
// RECEIVE DATA
SNet::SBuffer buffer;
m_data->SrvReceive(buffer);
// Driver code for inserting data into the buffer in blocking communication
SNet::SAutoLock lockWriter(m_sLockWriter);
m_data->SetData("ahmed", strlen("ahmed"));
double minHeight = 10;
double maxHeight = 11;
double x = 4;
double y = 2;
char minHeightArr[sizeof(minHeight)];
memcpy(&minHeightArr, &minHeight, sizeof(minHeight));
char maxHeightArr[sizeof(maxHeight)];
memcpy(&maxHeightArr, &maxHeight, sizeof(maxHeight));
char xArr[sizeof(x)];
memcpy(&xArr, &x, sizeof(x));
char yArr[sizeof(y)];
memcpy(&yArr, &y, sizeof(y));
m_data->SetData(minHeightArr, sizeof(minHeightArr));
m_data->SetData(maxHeightArr, sizeof(maxHeightArr));
m_data->SetData(xArr, sizeof(xArr));
m_data->SetData(yArr, sizeof(yArr));
m_data->WaitForCompletion(1000);
return LOOP_TIME;
}

In general, you should not use threads for this purpose, because terminating a thread like this can leave the process and the other threads in an unknown state. Look here for the explanation.
Therefore, consider using processes instead. Read here about opening processes in C++.
If you do want to use threads, you can have the thread exit once the time has passed.
Make a loop (as you have) that breaks when some time has elapsed:
#include <ctime>
#define NUM_SECONDS_TO_WAIT 5
// outside your loop
std::time_t t1 = std::time(0);
// and in your while loop, each iteration:
std::time_t t2 = std::time(0);
if ((t2 - t1) >= NUM_SECONDS_TO_WAIT)
{
break; // ...
}

You can have a class member which holds a time stamp (when to timeout, set its value to currentTime + intervalToTimeout). In WaitForCompletion(), get current time and compare with the timeout time.
I assume in your code, m_PingTime is the time you start communication. You want to timeout after 1000 ms. What you need to do is, in WaitForCompletion():
while (!m_Buffer.IsEmpty())
{
...
long time = getCurrentTime(); // fake code
if (m_PingTime + timeout < time)
{
m_PingTime = 0;
return false;
}
...
}
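The getCurrentTime() call above is a placeholder. One possible way to fill it in, assuming a C++11 compiler is available, is std::chrono::steady_clock, which never jumps backwards when the system time is adjusted. Deadline and waitFor are hypothetical names for this sketch, not part of the asker's code.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: remembers the point in time `timeout` from now.
class Deadline {
public:
    using Clock = std::chrono::steady_clock;

    explicit Deadline(std::chrono::milliseconds timeout)
        : m_end(Clock::now() + timeout) {
        assert(timeout.count() >= 0 && "timeout must not be negative");
    }

    // True once the monotonic clock has reached the stored end point.
    bool expired() const { return Clock::now() >= m_end; }

private:
    Clock::time_point m_end;
};

// Polls `done` until it returns true or the timeout elapses.
// Returns false on timeout, true if `done` succeeded in time.
template <typename Predicate>
bool waitFor(Predicate done, std::chrono::milliseconds timeout) {
    Deadline deadline(timeout);
    while (!done()) {
        if (deadline.expired()) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}
```

In WaitForCompletion this would mean constructing a Deadline from the timeout argument once, before the while loop, and returning false as soon as expired() is true, instead of adding timeout to m_PingTime on every iteration.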

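Here is something I did, if you want to implement it yourself:
#include <ctime>
clock_t startTime = clock(); // taken once, before the loop
while (true) {
// seconds elapsed since startTime; note that clock() counts processor time
// on POSIX systems but wall-clock time with the Microsoft C runtime
double elapsedSeconds = static_cast<double>(clock() - startTime) / CLOCKS_PER_SEC;
if (elapsedSeconds >= 10) {
// do what you want upon timeout, in this case after 10 secs
break;
}
// ... do the actual work here ...
}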

Debugging a bad_alloc error c++

When I run my code everything seems to be working fine but after a certain number of timesteps (usually ~100, but a different number each time) I get the error:
"terminate called after throwing an instance of 'std::bad_alloc' "
Not really sure how to go about debugging this as it doesn't happen at the same point each time the code runs. I will post my code but it's quite long and is admittedly a bit of a mess (this is my first real attempt at writing a program in c++), but I will try and explain the structure and where I would expect the most likely place for the origin of the error to be.
The basic structure is that I have an array of "birds" (a class I define) that choose how to update themselves at every time step by some quite complicated calculation. In doing so it regularly calls the function getVisualState to update a linked list that every bird stores as its "visual state". I believe this is the only time I allocate any memory dynamically during the simulation, so I guess there's a pretty good chance this is the source of the error. The function Bird::resetVisualState() should clear the allocated memory after it's been used (but it doesn't seem like I am running out of memory, at least monitoring it in the task manager).
If anyone can see anything they think may be the source of the problem that would be fantastic, or if not just any suggestions for how I should actually debug this!
#include <iostream>
#include <cmath>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <ctime>
#include <vector>
#include <algorithm>
#include <fstream>
#include "birdClasses.h"
using namespace std;
/*
nBirds, nSteps, nF, v, dt, birdRad defined in "birdClasses.h"
*/
//define other parameters.
const int nSensors = 20;
const int nMoves = 3; //no. possible moves at each step.
double dTheta = 15*M_PI/180.0; //angle that birds can change their orientation by in a timestep.
double moves[nMoves] = {-dTheta, 0, dTheta}; //possible moves.
double noise = 0.0;
double initBoxX = 20, initBoxY = 20; //size of initial box particles are placed in.
double sensorFrac[nSensors];
double sensorRef[nSensors];
double sensorRange = 2*M_PI/((double)nSensors);
int counter = 0;
int nps = numStates(nMoves,nF);
int *possibleStates = new int[nps];
//variables to record positions and orientations.
double xPositions[nSteps][nBirds], yPositions[nSteps][nBirds], orientations[nSteps][nBirds];
//array to keep track of which collisions are possible.
int couldCollide[nF][nBirds][nBirds];
//function prototypes
bool checkCollision(int i, int nFut, Bird *birds, double xi, double yi);
unsigned long int getVisualState(Bird *birdList, int nFut, int i, double cX, double cY, double cAng);
void updateTree(double exploreX, double exploreY, double exploreO, Bird *bird, int bn, int nFut);
int main()
{
sensorRef[0] = sensorRange;
for(int u=1; u<nSensors; u++) sensorRef[u] = sensorRef[u-1] + sensorRange;
//set up GSL random number generator.
const gsl_rng_type * Tr;
gsl_rng * RNG;
gsl_rng_env_setup();
Tr = gsl_rng_default;
RNG = gsl_rng_alloc (Tr);
gsl_rng_set(RNG,time(NULL));
//set up output
ofstream output("output.txt");
//initialize birds in a box randomly, all with the same orientation.
Bird birdList[nBirds];
for(int i=0; i<nBirds; i++) {
birdList[i].set_position(gsl_ran_flat(RNG,0,initBoxX),gsl_ran_flat(RNG,0,initBoxY));
}
//ACTUAL CODE
int uniqueVisStates[nMoves];
double cX, cY, fX, fY, exploreX, exploreY, exploreO;
//main time step loop
for(int ts=0; ts<nSteps; ts++) {
//save current positions
for(int i=0; i<nBirds; i++) {
xPositions[ts][i] = birdList[i].get_xPos();
yPositions[ts][i] = birdList[i].get_yPos();
orientations[ts][i] = birdList[i].get_orientation();
birdList[i].updateFuture();
}
//update list of possible collisions.
for(int nFut=0; nFut<nF; nFut++) {
for(int i=0; i<nBirds; i++) {
cX = birdList[i].get_xPos(); cY = birdList[i].get_yPos();
counter = 0;
for(int j=0; j<nBirds; j++) {
if(i==j) {
continue;
} else {
fX = birdList[j].get_futureX(nFut); fY = birdList[j].get_futureY(nFut);
if((cX-fX)*(cX-fX)+(cY-fY)*(cY-fY) < ((nFut+1)*v*dt+2*birdRad)*((nFut+1)*v*dt+2*birdRad)) {
couldCollide[nFut][i][counter]=j;
counter++;
}
}
}
if(counter < nBirds) couldCollide[nFut][i][counter]=-1;
}
}
//loop over birds to choose how they update their orientation.
for(int bn=0; bn<nBirds; bn++) {
//loop over possible moves bird can make NOW.
for(int l=0; l<nMoves; l++) {
uniqueVisStates[l]=0;
}
for(int mn=0; mn<nMoves; mn++) {
for(int l=0; l<nps; l++) {
possibleStates[l]=0;
}
counter = 0;
exploreO = birdList[bn].get_orientation() + moves[mn];
exploreX = birdList[bn].get_xPos() + cos(exploreO)*v*dt;
exploreY = birdList[bn].get_yPos() + sin(exploreO)*v*dt;
updateTree(exploreX,exploreY,exploreO,&birdList[0],bn,0);
vector<int> visStates (possibleStates,possibleStates+counter);
vector<int>::iterator it;
sort (visStates.begin(),visStates.end());
it = unique(visStates.begin(),visStates.end());
uniqueVisStates[mn] = distance(visStates.begin(),it);
}
int maxInd = 0, maxVal = uniqueVisStates[0];
for(int h=1; h<nMoves; h++) {
if(uniqueVisStates[h] > maxVal) {
maxInd = h; maxVal = uniqueVisStates[h];
} else if(uniqueVisStates[h]==maxVal) {
if(abs(moves[h])<abs(moves[maxInd])) {
maxInd = h;
}
}
}
birdList[bn].update_Orientation(moves[maxInd]);
birdList[bn].update_Pos(birdList[bn].get_xPos()+cos(birdList[bn].get_orientation())*v*dt,birdList[bn].get_yPos()+sin(birdList[bn].get_orientation())*v*dt);
}
for(int bn=0; bn<nBirds; bn++) birdList[bn].finishUpdate();
cout << ts << "\n";
}
//OUTPUT DATA INTO A TEXT FILE.
for(int ts=0; ts<(nSteps-1); ts++) {
for(int bn=0; bn<nBirds; bn++) {
output << xPositions[ts][bn] << " " << yPositions[ts][bn] << " " << orientations[ts][bn] << "\n";
}
}
delete[] possibleStates;
return 0;
}
bool checkCollision(int i, int nFut, Bird *birds, double xi, double yi) {
int cond = 1; int index, counti=0;
while(cond) {
index = couldCollide[nFut][i][counti];
if(index==-1) break;
double xj = birds[index].get_futureX(nFut);
double yj = birds[index].get_futureY(nFut);
if((xi-xj)*(xi-xj)+(yi-yj)*(yi-yj) < 4*birdRad*birdRad) {
return 1;
}
counti++;
if(counti==nBirds) break;
}
return 0;
}
unsigned long int getVisualState(Bird *birdList, int nFut, int i, double cX, double cY, double cAng) {
//finds the visual state of bird i based on its current "exploring position" and the predicted positions of other birds at timestep nFut.
//visual state is defined by discretizing the bird's field of view into nSensors (relative to current orientation) and creating a vector of
//0s and 1s depending on whether each sensor is < half covered or not. This is then converted to an integer (as we are actually interested only
//in the number of unique visual states.
double relX, relY, relDist, dAng, s, dTheta, ang1, ang2;
//clear current visual state.
birdList[i].resetVisualState();
for(int j=0; j<nBirds; j++) {
if(i==j) continue;
relX = birdList[j].get_futureX(nFut)-cX;
relY = birdList[j].get_futureY(nFut)-cY;
relDist = sqrt(relX*relX+relY*relY);
dAng = acos((cos(cAng)*relX+sin(cAng)*relY)/relDist);
dTheta = atan(birdRad/relDist);
s = cos(cAng)*relY - sin(cAng)*relX;
if( s<0 ) dAng = 2*M_PI-dAng;
ang1 = dAng - dTheta; ang2 = dAng + dTheta;
if( ang1 < 0 ) {
birdList[i].addInterval(0,ang2);
birdList[i].addInterval(2*M_PI+ang1,2*M_PI);
} else if( ang2 > 2*M_PI ) {
birdList[i].addInterval(0,fmod(ang2,2*M_PI));
birdList[i].addInterval(ang1,2*M_PI);
} else {
birdList[i].addInterval(ang1,ang2);
}
}
Node *sI = birdList[i].get_visualState();
birdList[i].cleanUp(sI);
int ind1, ind2;
for(int k=0; k<nSensors; k++) sensorFrac[k]=0.0; //initialize.
while(sI->next->next != 0) {
ang1 = sI->value; ang2 = sI->next->value;
ind1 = floor(ang1/sensorRange); ind2 = floor(ang2/sensorRange);
if(ind2==nSensors) ind2--; //this happens if ang2 = 2pi (which can happen a lot).
if(ind1==ind2) {
sensorFrac[ind1] += (ang2-ang1)/sensorRange;
} else if(ind2-ind1==1) {
sensorFrac[ind1] += (sensorRef[ind1]-ang1)/sensorRange;
sensorFrac[ind2] += (ang2-sensorRef[ind1])/sensorRange;
} else {
sensorFrac[ind1] += (sensorRef[ind1]-ang1)/sensorRange;
sensorFrac[ind2] += (ang2-sensorRef[ind2-1])/sensorRange;
for(int y=ind1+1;y<ind2;y++) sensorFrac[y] = 1.0;
}
sI=sI->next->next;
}
//do final interval separately.
ang1 = sI->value; ang2 = sI->next->value;
ind1 = floor(ang1/sensorRange); ind2 = floor(ang2/sensorRange);
if(ind2==nSensors) ind2--; //this happens if ang2 = 2pi (which can happen a lot).
if(ind1==ind2) {
sensorFrac[ind1] += (ang2-ang1)/sensorRange;
} else if(ind2-ind1==1) {
sensorFrac[ind1] += (sensorRef[ind1]-ang1)/sensorRange;
sensorFrac[ind2] += (ang2-sensorRef[ind1])/sensorRange;
} else {
sensorFrac[ind1] += (sensorRef[ind1]-ang1)/sensorRange;
sensorFrac[ind2] += (ang2-sensorRef[ind2-1])/sensorRange;
for(int y=ind1+1;y<ind2;y++) sensorFrac[y] = 1.0;
}
int output = 0, multiplier = 1;
for(int y=0; y<nSensors; y++) {
if(sensorFrac[y]>0.5) output += multiplier;
multiplier *= 2;
}
return output;
}
void updateTree(double exploreX, double exploreY, double exploreO, Bird *bird, int bn, int nFut) {
double o,x,y;
if(checkCollision(bn,nFut,bird,exploreX,exploreY)) return;
int vs = getVisualState(bird,nFut,bn,exploreX,exploreY,exploreO);
possibleStates[counter] = vs;
counter++;
if(nFut < (nF-1)) {
for(int m=0; m<nMoves; m++) {
o = exploreO + moves[m];
x = exploreX + cos(o)*v*dt;
y = exploreY + sin(o)*v*dt;
updateTree(x,y,o,bird,bn,nFut+1);
}
} else {
return;
}
}
"birdClasses.h":
#ifndef BIRDCLASSES_H_INCLUDED
#define BIRDCLASSES_H_INCLUDED
#include <iostream>
#include <cmath>
using namespace std;
//DEFINE SOME GLOBAL PARAMETERS OF THE SIMULATION
const int nBirds = 50;
const int nF = 6; //number of future timesteps to consider.
const int nSteps = 200;
const double v = 20, dt = 0.1, birdRad = 0.2;
int numStates(int numMoves, int nFut) {
int num = 1; int multiplier = numMoves;
for(int i=1; i<nFut; i++) {
num += multiplier;
multiplier *= numMoves;
}
return num;
}
//Node class is just for a linked list (used in constructing the visual states),
class Node {
public:
int identifier; // 0 is left side of interval, 1 is right side
double value; //angular value.
Node *next; //pointer to the next interval.
void display(Node *start);
};
//printout linked list if necessary (mainly for debugging purposes).
void Node::display(Node *start) {
if(start != 0) {
double inter = start->value;
cout << inter << " ";
display(start->next);
}
}
//bird class.
class Bird {
double currX, currY;
double updatedX, updatedY;
double currOrientation;
double futureX[nF], futureY[nF];
Node *visualState;
public:
Bird() {
currOrientation=0.0; currX = 0.0; currY = 0.0;
visualState = new Node;
visualState->value = 0.0;
visualState->next = new Node;
visualState->next->value = 0.0;
visualState->next->next = 0;
}
Bird(double x, double y, double o) {
currX = x; currY = y; currOrientation = o;
visualState = new Node;
visualState->value = 0.0;
visualState->next = new Node;
visualState->next->value = 0.0;
visualState->next->next = 0;
}
void set_position(double x, double y) {
currX = x; currY = y;
}
double get_xPos() {
return currX;
}
double get_yPos() {
return currY;
}
double get_orientation() {
return currOrientation;
}
double get_futureX(int ts) {
return futureX[ts];
}
double get_futureY(int ts) {
return futureY[ts];
}
//return pointer to first node.
Node* get_visualState() {
return visualState;
}
void updateFuture() {
//use current orientation and position to update future positions.
for(int i=0; i<nF; i++) {
futureX[i] = currX + v*(i+1)*cos(currOrientation)*dt;
futureY[i] = currY + v*(i+1)*sin(currOrientation)*dt;
}
}
void update_Pos(double x, double y) {
updatedX = x;
updatedY = y;
}
//run this after all birds have updated positions:
void finishUpdate() {
currX = updatedX;
currY = updatedY;
}
void update_Orientation(double o) {
currOrientation += o;
}
//add the interval defined by [l r] to the visual state.
void addInterval(double l, double r) {
int placed = 0; double cL = 0.0; double cR = 0.0;
if(visualState->value==0.0 && visualState->next->value==0.0) { //then this is first interval to place.
visualState->value = l;
visualState->next->value = r;
placed = 1;
return;
}
Node *curr_L = visualState;
Node *prev_L = visualState;
while(placed==0) {
cL = curr_L->value;
cR = curr_L->next->value;
if(l<cL && r<cL) { //add new interval before this one.
Node *newRoot = new Node;
newRoot->value = l;
newRoot->identifier = 0;
newRoot->next = new Node;
newRoot->next->value = r;
newRoot->next->next = curr_L;
if(curr_L == visualState) {
visualState = newRoot;
} else {
prev_L->next->next = newRoot;
}
placed = 1;
} else if(l <= cL && r >= cR) {
curr_L->value = l;
curr_L->next->value = r;
placed = 1;
} else if(l <= cL && r <= cR) {
curr_L->value = l;
placed = 1;
} else if(l >= cL && r <= cR) {
placed = 1; //dont need to do anything.
} else if(l >= cL && l<=cR && r >= cR) {
curr_L->next->value = r;
placed = 1;
}
if(l > cR && r > cR) {
if(curr_L->next->next != 0) {
prev_L = curr_L;
curr_L = curr_L->next->next;
} else {
Node *newEndL = new Node;
newEndL->value = l;
newEndL->identifier = 0;
newEndL->next = new Node;
newEndL->next->value = r;
newEndL->next->identifier = 1;
newEndL->next->next = 0;
curr_L->next->next = newEndL;
placed = 1;
}
}
}
}
//remove any overlaps.
void cleanUp(Node *start) {
Node *NP, *NNP; NP = start->next->next;
if(NP==0) return;
NNP = start->next->next->next->next;
double cL = start->value, cR = start->next->value, nL = start->next->next->value, nR = start->next->next->next->value;
if(nL < cR) {
if(nR > cR) {
start->next->value = nR;
}
start->next->next = NNP;
}
if(NNP!=0) cleanUp(NP);
}
//reset the visual state.
void resetVisualState() {
Node *cNode = visualState;
Node *nNode = visualState->next;
while(nNode != 0) {
delete cNode;
cNode = nNode;
nNode = nNode->next;
}
delete cNode;
delete nNode;
visualState = new Node;
visualState->identifier = 0;
visualState->value = 0.0;
visualState->next = new Node;
visualState->next->identifier = 1;
visualState->next->value = 0.0;
visualState->next->next = 0;
return;
}
};
#endif // BIRDCLASSES_H_INCLUDED

"or if not just any suggestions for how I should actually debug this!"
You can try setting a catchpoint in gdb to catch the std::bad_alloc exception:
(gdb) catch throw bad_alloc
(See Setting Catchpoints)
If you are able to reproduce this bad_alloc in gdb, you can then look at the backtrace (bt) to see the possible reason for the exception.

I think this is a logic bug and not necessarily memory related.
In void addInterval(double l, double r) you declare
Node *curr_L = visualState;
Node *prev_L = visualState;
These pointers will now point to whatever the member visualState is pointing to.
Later on you change visualState to point to a newly created Node:
Node *newRoot = new Node;
// ....
if(curr_L == visualState) {
visualState = newRoot;
but your pointers curr_L and prev_L will still point to whatever visualState was pointing to before. The only time you change those pointers is at
if(curr_L->next->next != 0) {
prev_L = curr_L;
curr_L = curr_L->next->next;
which is the same as
if(WHATEVER_VISUAL_STATE_USED_TO_POINT_TO->next->next != 0) {
prev_L = curr_L;
curr_L = curr_L->next->next;
Is this your intention? You can follow the assignments to curr_L by searching for "curr_L =" in your editor.
I would suggest testing your code on a small data sample to make sure it follows your intentions. Use a debugger or trace output. Use valgrind if you have access to it; I think you will appreciate it.
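As an illustration of testing on a small data sample, here is a sketch of the interval-merging step written against std::vector instead of the hand-rolled linked list. This is a swapped-in technique, not the asker's implementation; the Interval alias and this free-function addInterval are assumptions made for the example. It has no manual new/delete, so its result on a handful of intervals can be compared with what the linked-list version produces.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Interval = std::pair<double, double>; // (left, right) angles

// Merge [l, r] into a sorted list of disjoint intervals, keeping the
// list sorted and disjoint. Hypothetical alternative to Bird::addInterval.
void addInterval(std::vector<Interval>& intervals, double l, double r) {
    assert(l <= r && "interval bounds must be ordered");
    std::vector<Interval> merged;
    merged.reserve(intervals.size() + 1);
    bool placed = false;
    for (const Interval& cur : intervals) {
        if (cur.second < l) {
            merged.push_back(cur);          // entirely left of the new one
        } else if (r < cur.first) {
            if (!placed) {                  // first interval to the right:
                merged.emplace_back(l, r);  // the new one goes just before it
                placed = true;
            }
            merged.push_back(cur);
        } else {
            l = std::min(l, cur.first);     // overlap: absorb cur into [l, r]
            r = std::max(r, cur.second);
        }
    }
    if (!placed) merged.emplace_back(l, r); // nothing lies to the right
    intervals.swap(merged);
}
```

Feeding both versions the same three or four hand-picked intervals (one before, one after, one overlapping two neighbours) is a quick way to see whether the linked-list code loses or duplicates nodes.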


what are some optimization tricks to make my code run faster - c++

Cheinan Marks has some benchmarks and performance tips related to random generators & friends in his CppCon 2016 talk "I Just Wanted a Random Integer!". He mentions some fast generators as well, IIRC. I'd start there.
