Comparing elapsed time to copy memory buffers: memcpy vs ForLoopIndexCopying - c++

I am trying to understand the difference between "memcpy" and "for-loop-index-copying" of a buffer in the time it takes to copy.
results:
CopyingType     | MemoryType | RunMode | Elapsed Time (ms)
----------------|------------|---------|------------------
memcpy          | stack      | Debug   | x
forLoopIndexing | stack      | Debug   | 30x
memcpy          | stack      | Release | 0
forLoopIndexing | stack      | Release | 0
memcpy          | heap       | Release | 0
forLoopIndexing | heap       | Release | 2000
Here is the code I ran... maybe I am doing something wrong?
It seems odd that copying a 500,000-byte buffer 100,000 times takes no time at all, or at least less than the timer resolution of the machine (16 ms in my case).
#include "stdafx.h"
#include <windows.h>
#include <iostream>

int main()
{
    long baseTime;
    const long packetLength = 500000;
    //char packet1[packetLength];//stack
    //char packet2[packetLength];//stack
    char *packet1 = (char*)calloc(packetLength, sizeof(char)); //heap
    char *packet2 = (char*)calloc(packetLength, sizeof(char)); //heap
    memset(packet1, 0, packetLength); //init
    memset(packet2, 0, packetLength); //init
    long NumPackets = 100000;
    long NumRuns = 10;
    for (long k = 0; k < NumRuns; k++)
    {
        //create packet
        printf("\npacket1:\n");
        for (long i = 0; i < packetLength; i++) {
            packet1[i] = (char)(i % 26 + 65);
        }
        printf("\nk:%ld\n", k);
        //index copy
        baseTime = GetTickCount();
        for (long j = 0; j < NumPackets; j++) {
            for (long i = 0; i < packetLength; i++) {
                packet2[i] = packet1[i];
            }
        }
        printf("Time(IndexCopy): %ld\n", GetTickCount() - baseTime);
        //memcpy
        memset(packet2, 0, packetLength); //reset
        baseTime = GetTickCount();
        for (long j = 0; j < NumPackets; j++) {
            memcpy(packet2, packet1, packetLength); //Changed via PaulMcKenzie.
        }
        printf("Time(memcpy): %ld\n", GetTickCount() - baseTime);
        //printf("\npacket2\n");
        for (long i = 0; i < packetLength; i++) {
            //printf("%c", packet2[i]);
        }
    }
    int iHalt;
    scanf_s("%d", &iHalt);
    return 0;
}
With that change applied, the new table:
CopyingType     | MemoryType | RunMode | Elapsed Time (ms)
----------------|------------|---------|------------------
memcpy          | stack      | Debug   | x
forLoopIndexing | stack      | Debug   | 50x
memcpy          | stack      | Release | 0
forLoopIndexing | stack      | Release | 0
memcpy          | heap       | Release | 2000
forLoopIndexing | heap       | Release | 2000

Maybe I am doing something wrong ???
You are doing something wrong, and it is precisely in the code that uses memcpy.
const long packetLength = 500000;
char *packet1 = (char*)calloc(packetLength, sizeof(char));
char *packet2 = (char*)calloc(packetLength, sizeof(char));
//...
for (long j = 0; j < NumPackets; j++) {
    memcpy(packet2, packet1, sizeof(packet2)); // <-- Incorrect
}
Here sizeof(packet2) is the same as sizeof(char *), which is more than likely either 4 or 8.
What you want is not sizeof(char *), but the actual number of bytes to copy:
for (long j = 0; j < NumPackets; j++) {
    memcpy(packet2, packet1, packetLength);
}


My Neural Network is only learning some data sets

I've created the following NN that should learn based on back propagation.
I've pieced it together from a lot of reading and a bunch of different tutorials.
To test it, I've given it the XOR problem. Each data set is 2 inputs and 2 outputs. The two inputs are both either a 1 or 0, and the two outputs should indicate whether a 0 should be output (the first output) or a 1 should be output (the second output).
What's happening when I give it the following data:
+---------+---------+------------+------------+-------------+-------------+
| Input 1 | Input 2 | Expected 1 | Expected 2 | NN Output 1 | NN Output 2 |
+---------+---------+------------+------------+-------------+-------------+
|    0    |    1    |     1      |     0      |    0.49     |    0.50     |
|    1    |    0    |     1      |     0      |    0.98     |    0.01     |
|    1    |    1    |     0      |     1      |    0.01     |    0.98     |
|    0    |    0    |     0      |     1      |    0.49     |    0.50     |
+---------+---------+------------+------------+-------------+-------------+
What is hopefully clear in the above is that for two of the four cases it has sort of worked: allowing some margin of error, getting within 0.01 of the answer is pretty good.
But for the other two answers it's way off. Sure, a step function would still give the correct result, but the network is basically saying there's a 50/50 split.
This is with 100,000 epochs and a learning rate of 0.03 and what you see above was the actual training data.
If I increase the learning rate to 0.9; the results are different but also make me question things:
+---------+---------+------------+------------+-------------+-------------+
| Input 1 | Input 2 | Expected 1 | Expected 2 | NN Output 1 | NN Output 2 |
+---------+---------+------------+------------+-------------+-------------+
|    0    |    1    |     1      |     0      |    0.99     |    0.00     |
|    1    |    0    |     1      |     0      |    0.99     |    0.00     |
|    1    |    1    |     0      |     1      |    0.49     |    0.99     |
|    0    |    0    |     0      |     1      |    0.00     |    0.99     |
+---------+---------+------------+------------+-------------+-------------+
Much better; but there's still the weird output for the 1,1 input.
My code is fairly short; here it is in full:
#include <iostream>
#include <array>
#include <cmath>
#include <random>
#include <vector>

class RandomGenerator
{
public:
    RandomGenerator(const double min, const double max)
        :
        m_ran(),
        m_twister(m_ran()),
        m_distrib(min,max)
    {
    }
    double operator()(void) { return m_distrib(m_twister); }
private:
    std::random_device m_ran;
    std::mt19937_64 m_twister;
    std::uniform_real_distribution<double> m_distrib;
} randGen(-2,2);

double sigmoid(const double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

double softplus(const double x)
{
    return std::log(1.0 + std::exp(x));
}

double step(const double x)
{
    return x > 0 ? 1 : 0;
}

template<int NumInputs, double(*ActivationFunction)(const double)>
class Neuron
{
public:
    void SetInput(const std::size_t index, const double value)
    {
        m_inputsAndWeights[index].value = value;
    }
    double GetInput(const std::size_t index) const { return m_inputsAndWeights[index].value; }
    void SetWeight(const std::size_t index, const double weight)
    {
        m_inputsAndWeights[index].weight = weight;
    }
    double GetWeight(const std::size_t index) const { return m_inputsAndWeights[index].weight; }
    void SetBiasWeight(const double weight) { m_biasWeight = weight; }
    double GetBiasWeight() const { return m_biasWeight; }
    double GetOutput() const
    {
        double output = 0;
        for(const auto& p : m_inputsAndWeights)
            output += p.value * p.weight;
        output += 1.0 * m_biasWeight;
        return ActivationFunction(output);
    }
private:
    struct DataPair
    {
        double value;
        double weight;
    };
    std::array<DataPair,NumInputs> m_inputsAndWeights;
    double m_biasWeight;
};

template<std::size_t NumInputs, std::size_t NumOutputs>
class NeuralNetwork
{
public:
    static constexpr std::size_t NumHidden() { return (NumInputs+NumOutputs) / 2; }
    void SetInputs(std::array<double,NumInputs> inputData)
    {
        for(auto& i : m_hiddenNeurons)
        {
            for(auto index = 0; index < inputData.size(); ++index)
                i.SetInput(index,inputData[index]);
        }
    }
    std::array<double,NumOutputs> GetOutputs() const
    {
        std::array<double,NumOutputs> outputs;
        for(auto i = 0; i < NumOutputs; ++i)
        {
            outputs[i] = m_outputNeurons[i].GetOutput();
        }
        return outputs;
    }
    void PassForward(std::array<double,NumInputs> inputData)
    {
        SetInputs(inputData);
        for(auto i = 0; i < NumHidden(); ++i)
        {
            for(auto& o : m_outputNeurons)
            {
                o.SetInput(i,m_hiddenNeurons[i].GetOutput());
            }
        }
    }
    void Train(std::vector<std::array<double,NumInputs>> trainingData,
               std::vector<std::array<double,NumOutputs>> targetData,
               double learningRate, std::size_t numEpochs)
    {
        for(auto& h : m_hiddenNeurons)
        {
            for(auto i = 0; i < NumInputs; ++i)
                h.SetWeight(i,randGen());
            h.SetBiasWeight(randGen());
        }
        for(auto& o : m_outputNeurons)
        {
            for(auto h = 0; h < NumHidden(); ++h)
                o.SetWeight(h,randGen());
            o.SetBiasWeight(randGen());
        }
        for(std::size_t e = 0; e < numEpochs; ++e)
        {
            for(std::size_t dataIndex = 0; dataIndex < trainingData.size(); ++dataIndex)
            {
                PassForward(trainingData[dataIndex]);
                std::array<double,NumHidden()+1> deltaHidden;
                std::array<double,NumOutputs> deltaOutput;
                for(auto i = 0; i < NumOutputs; ++i)
                {
                    auto output = m_outputNeurons[i].GetOutput();
                    deltaOutput[i] = output * (1.0 - output) * (targetData[dataIndex][i] - output);
                }
                for(auto i = 0; i < NumHidden(); ++i)
                {
                    double error = 0;
                    for(auto j = 0; j < NumOutputs; ++j)
                    {
                        error += m_outputNeurons[j].GetWeight(i) * deltaOutput[j];
                    }
                    auto output = m_hiddenNeurons[i].GetOutput();
                    deltaHidden[i] = output * (1.0 - output) * error;
                }
                for(auto i = 0; i < NumOutputs; ++i)
                {
                    for(auto j = 0; j < NumHidden(); ++j)
                    {
                        auto currentWeight = m_outputNeurons[i].GetWeight(j);
                        m_outputNeurons[i].SetWeight(j,currentWeight + learningRate * deltaOutput[i] * m_hiddenNeurons[j].GetOutput());
                    }
                    auto currentWeight = m_outputNeurons[i].GetBiasWeight();
                    m_outputNeurons[i].SetBiasWeight(currentWeight + learningRate * deltaOutput[i] * (1.0*currentWeight));
                }
                for(auto i = 0; i < NumHidden(); ++i)
                {
                    for(auto j = 0; j < NumInputs; ++j)
                    {
                        auto currentWeight = m_hiddenNeurons[i].GetWeight(j);
                        m_hiddenNeurons[i].SetWeight(j,currentWeight + learningRate * deltaHidden[i] * m_hiddenNeurons[i].GetInput(j));
                    }
                    auto currentWeight = m_hiddenNeurons[i].GetBiasWeight();
                    m_hiddenNeurons[i].SetBiasWeight(currentWeight + learningRate * deltaHidden[i] * (1.0*currentWeight));
                }
            }
        }
    }
private:
    std::array<Neuron<NumInputs,sigmoid>,NumHidden()> m_hiddenNeurons;
    std::array<Neuron<NumHidden(),sigmoid>,NumOutputs> m_outputNeurons;
};

int main()
{
    NeuralNetwork<2,2> NN;
    std::vector<std::array<double,2>> trainingData = {{{0,1},{1,0},{1,1},{0,0}}};
    std::vector<std::array<double,2>> targetData = {{{1,0},{1,0},{0,1},{0,1}}};
    NN.Train(trainingData,targetData,0.03,100000);
    for(auto i = 0; i < trainingData.size(); ++i)
    {
        NN.PassForward(trainingData[i]);
        auto outputs = NN.GetOutputs();
        for(auto o = 0; o < outputs.size(); ++o)
        {
            std::cout << "Out " << o << ":\t" << outputs[o] << std::endl;
        }
    }
    return 0;
}
I did the same thing a few days ago, and I can tell you that 100,000 iterations of back propagation are not enough if you hit an unfortunate weight initialization. Don't initialize your weights over a wide random range: the sigmoid can easily fall into saturation for large weights, while weights of 0 won't help either. I initialized my weights to +/-(0.3, 0.7) and convergence improved significantly.

Strange behaviour in nested For Loops

First of all, I'm pretty new to C++ so try not to be too harsh on me. I wrote this block of code:
int LargestProduct (string numStr, int groupSize) {
    int numOfGroups = numStr.size() / groupSize;
    int groupsRemaining = numStr.size() % groupSize;
    int largestProduct = 0, thisProduct = 1;
    for (int i = 1; i <= numOfGroups; i++) {
        for (int j = i; j <= groupSize; j++)
            thisProduct *= (numStr[j-1] - '0');
        if (thisProduct > largestProduct)
            largestProduct = thisProduct;
        thisProduct = 1;
    }
    // .. A bit more irrelevant code here
    return largestProduct;
}
The function call LargestProduct("1234567890", 2) should yield 72, but it wrongly yields 6. So for some reason this code runs, but not as expected. (Note: this code should compute the largest product of groupSize adjacent digits in a big number given as the string numStr.)
I did some debugging, and found a strange behaviour in the nested for-loop. I set up a breakpoint inside the second for-loop
thisProduct *= (numStr[j] - '0');
After some iterations (for example, 8 iterations), this is what I would expect i and j to be:
+---+---+
| i | j |
+---+---+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 4 | 1 |
| 4 | 2 |
+---+---+
This is what really happens:
+---+---+
| i | j |
+---+---+
| 1 | 1 |
| 1 | 2 |
| 2 | 2 |
+---+---+
And suddenly the program spits out a wrong result (6, instead of 72)
But this seems counterintuitive, to say the least. The variable i goes from 0 to numOfGroups, which in the example above equals 5. On the other hand, j goes from 0 to groupSize, which happens to be 2.
There should be 5*2 = 10 iterations, but there are only 3 of them. Also, in the last iteration, j should be "re-initialized" to 0. This doesn't happen though.
Anyone please help this C++ newbie?
EDIT
The problem was that the j-for-loop ranged from a moving index (i) to a fixed index (groupSize). This was causing the "shrinking" effect in the second for-loop, which is easily fixed by changing this line:
for (int j = i; j <= groupSize; j++)
To this other one:
for (int j = i; j <= i + groupSize - 1; j++)
And to make the full algorithm to work as expected, one should also replace these lines:
int numOfGroups = numStr.size() / groupSize;
int groupsRemaining = numStr.size() % groupSize;
with this single one:
int numOfGroups = numStr.size() - 1;
EDIT 2
Everything is OK now, thank you for your kindness guys! I appreciate it. The whole code is:
int LargestProduct (string numStr, int groupSize) {
    int numOfGroups = numStr.size() - 1;
    int largestProduct = 0, thisProduct = 1;
    for (int i = 1; i <= numOfGroups; i++) {
        for (int j = i; j <= i + groupSize - 1; j++)
            thisProduct *= (numStr[j-1] - '0');
        if (thisProduct > largestProduct)
            largestProduct = thisProduct;
        thisProduct = 1;
    }
    return largestProduct;
}
You said:
On the other hand, j goes from 0 to groupSize
But the code says:
for (int j = i; j <= groupSize; j++)
This means j is going from i to groupSize, not from 0 to groupSize.

Reading triple nested for loop

My code is as follows. My confusion occurs during the 2nd and 3rd loops. Why does the result print 1***, then 12**, then 123*, then 1234? I get that the j loop is reset to 0, but doesn't it re-enter the k loop whenever it's true that j <= i?
for(int i = 1; i <= 4; i++)
{
    for(int j = 1; j <= i; j++)
        cout << j;
    for(int k = 4 - i; k >= 1; k--)
        cout << "*";
    cout << endl;
}
Some clarification first:
Firstly: j is never reset to 0, but to 1.
Secondly: this is, imho, not a triple-nested for loop. That would look like the following (and is not needed to make your code behave the way you describe):
for(...) {
    for(...) {
        for(...) {
        }
    }
}
To your confusion:
Pretty printing your code:
for(int i=1; i<=4; i++) {
    // Write the digits 1..i (1, 12, 123, 1234)
    for(int j=1; j<=i; j++) {
        std::cout << j;
    }
    // Write the stars (***, **, *)
    for(int k=(4-i); k>=1; k--) {
        std::cout << "*";
    }
    std::cout << std::endl;
}
Imagine the following sequence:
// Iteration | i | j | k | String
//     1     | 1 | 1 | 3 | 1*
//     2     | 1 | 1 | 2 | 1**
//     3     | 1 | 1 | 1 | 1***\n
//     4     | 2 | 1 | - | 1
//     5     | 2 | 2 | - | 12
//     6     | 2 | 2 | 2 | 12*
//     7     | 2 | 2 | 1 | 12**\n
//     8     | 3 | 1 | - | 1
//     9     | 3 | 2 | - | 12
//    10     | 3 | 3 | - | 123
//    11     | 3 | 3 | 1 | 123*\n
//    12     | 4 | 1 | - | 1
//    13     | 4 | 2 | - | 12
//    14     | 4 | 3 | - | 123
//    15     | 4 | 4 | - | 1234\n
The k-loop is re-entered if its initial index satisfies:
// k := (4-i) >= 1
So entering the k-loop depends exclusively on the index i.
Mathematically:
// (4-i) >= 1
// <=> -i >= (1-4)
// <=> -i >= -3
// <=> i <= 3
So the k-loop is re-entered as long as i is <= 3.
In order to get the effect you want your code should be like this:
for(int i = 1; i <= 4; i++)
{
    for(int j = 1; j <= i; j++)
    {
        cout << j;
        for(int k = 4 - i; k >= 1; k--)
            cout << "*";
    }
    cout << endl;
}
If you don't have the {}, the k loop is executed only after the j loop has finished.

Remove if conditions from simple function

I need to remove as many if conditions as possible from the two functions below:
inline int inc_with_1bit_saturation(int counter)
{
    if (counter == 1)
        return --counter;
    return ++counter;
}

void branch_prediction_1bit_saturation(int* input, int* output, int size)
{
    int counter = 0;
    for (int i = 0; i < size; ++i)
    {
        if (input[i] != counter)
        {
            counter = inc_with_1bit_saturation(counter);
            output[i] = 0;
        }
        else output[i] = 1;
    }
}
How can I do that? Which if branch is absolutely necessary and cannot be removed, and which ones can be replaced by simple bitwise operations or something like that?
Update 1
Following user JSF's great tip, the code now looks like this:
void branch_prediction_1bit_saturation(int* input, int* output, int size)
{
    int counter = 0;
    for (int i = 0; i < size; ++i)
    {
        if (input[i] != counter)
        {
            counter = 1 - counter;
            output[i] = 0;
        }
        else output[i] = 1;
    }
}
Update 2
Thanks to Cantfindname, the code became like this:
void branch_prediction_1bit_saturation(int* input, int* output, int size)
{
    int counter = 0;
    for (int i = 0; i < size; ++i)
    {
        output[i] = counter == input[i];
        counter = output[i] * counter + (1 - output[i]) * (1 - counter);
    }
}
And this completely solves the question.
For the if statement inside the loop:
output[i] = (int)(input[i] == counter);
counter = output[i]*counter + (1 - output[i])*(1 - counter); // used JSF's trick
True converts to 1 and false to 0, according to this: bool to int conversion
The function inc_with_1bit_saturation is equivalent to incrementing modulo 2. So you can replace
counter = inc_with_1bit_saturation(counter);
with
counter = (counter + 1) % 2;
void branch_prediction_1bit_saturation(int* input, int* output, int size)
{
    int counter = 0;
    for (int i = 0; i < size; ++i)
    {
        output[i] = (int)!((!!input[i]) ^ counter);
        counter = (int)((!!input[i]) & counter) | ((!!input[i]) & !counter);
    }
}
Here A is the logical value of input[i],
and B is the logical value of counter.
The truth table for input[i] != counter is:
A B
0 0 | 0 --> (0 & 0) | (0 & !0) = 0 | 0 = 0
0 1 | 0 --> (0 & 1) | (0 & !1) = 0 | 0 = 0
1 0 | 1 --> (1 & 0) | (1 & !0) = 0 | 1 = 1
1 1 | 1 --> (1 & 1) | (1 & !1) = 1 | 0 = 1
The truth table for output[i]
A B
0 0 | 1 --> !(0 ^ 0) = !(0) = 1
0 1 | 0 --> !(0 ^ 1) = !(1) = 0
1 0 | 0 --> !(1 ^ 0) = !(1) = 0
1 1 | 1 --> !(1 ^ 1) = !(0) = 1
:)

Dynamic 2D Array Creation Runtime Error

I'm trying to create a multidimensional int array with the following function:
int ** createIntMatrix(unsigned int rows, unsigned int cols)
{
    int ** matrix;
    unsigned int i,j;
    matrix = (int **) calloc(cols, sizeof(int *));
    for(i = 0; i < cols; i++)
        matrix[i] = (int *) calloc(rows, sizeof(int));
    for(i = 0; i < cols; i++)
        for(j = 0; j < rows; j++)
            matrix[i][j] = 0;
    return matrix;
}
I create three instances using this function in the following code,
cout<<"allocating temporary data holders..."<<endl;
int ** temp_meanR;
int ** temp_meanG;
int ** temp_meanB;
temp_meanR = createIntMatrix(img->height,img->width);
temp_meanG = createIntMatrix(img->height,img->width);
temp_meanB = createIntMatrix(img->height,img->width);
cout<<"....done!"<<endl;
I'm accessing these elements like temp_meanB[4][5].
But unfortunately, I get the following error during runtime:
allocating temporary data holders...
....done!
tp6(1868) malloc: *** error for object 0x122852e08: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap
Where am I going wrong here?
for(i = 0; i < cols; i++)
    for(j = 0; i < rows; i++)
        matrix[i][j] = 0;
Note the inner for loop: it says j = 0; i < rows; i++ (this was the question's code before Aarohi Johal's edit).
Next, you do not have to set the memory to 0 manually, as calloc does that for you.
In C++, you should use new and delete.
In the code segment
matrix = (int **) calloc(cols, sizeof(int *));
for(i = 0; i < cols; i++)
    matrix[i] = (int *) calloc(rows, sizeof(int));
I think the rows should be allocated first, and then each row should be linked to an int array.
Visualize it like this:
+--------+
| matrix |
+--------+
| c o l s
| +----------------------------+
V | |
+-- +---+ +---+---+---+ +---+
| | |-->| | | | . . . | |
| +---+ +---+---+---+ +---+
| | |--+
r | +---+ | +---+---+---+ +---+
o | | | +-->| | | | . . . | |
w | +---+ +---+---+---+ +---+
s . .
. .
. .
| | |
| +---+ +---+---+---+ +---+
| | |-->| | | | . . . | |
+-- +---+ +---+---+---+ +---+
First do the rows and then the cols, as in the above visualization; then the arr[i][j] interpretation works like a normal array.