I am kind of new to C++ and I was doing a physics simulation in python which was taking forever to finish so I decided to switch to C++, and I don t understand how to make a function which will return a 2D array (or 3D array)
#include <iostream>
#include <cmath>
// #include <complex> //
using namespace std;
double** psiinit(int L, int n, double alpha){
double yj[400][400] = {};
for (int j = 0; j < n; j++)
{
double xi[400] = {};
for (int i = 0; i < n; i++)
{
xi[i] = exp(-(pow((i-(L/4)), 2) + (pow((j-(L/4)), 2)))/alpha) / (sqrt(2)*3.14159*alpha);
};
yj[j] = xi;
};
return yj;
}
int main(){
int L = 10;
int n = 400;
int nt = 200*n;
double alpha = 1;
double m = 1;
double hbar = 1;
double x[n] = {};
double y[n] = {};
double t[nt] = {};
double psi[nt][n][n] = {};
psi[0] = psiinit(L, n, alpha);
cout << psi <<endl;
return 0;
}
I have look for answers but it doesn't seems to be for my kind of problems
Thanks
If you're new to c++ you should read about the concepts of heap and stack, and about stack frames. There are a ton of good resources for that.
In short, when you declare a C-style array (such as yj), it is created in the stack frame of the function, and therefore there are no guarantees about it once you exit the frame, and your program invokes undefined behavior when it references that returned array.
There are 3 options:
Pass the array to the function as an output parameter (very C-style and not recommended).
Wrap the array in a class (like std::array already does for you), in which case it remains on the stack and is copied to the calling frame when returned, but then its size has to be known at compile time.
Allocate the array on the heap and return it, which seems to me to best suit your case. std::vector does that for you:
std::vector<std::vector<double>> psiinit(int L, int n, double alpha){
std::vector<std::vector<double>> yj;
for (int j = 0; j < n; j++)
{
std::vector<double> xi;
for (int i = 0; i < n; i++)
{
const int value = exp(-(pow((i-(L/4)), 2) + (pow((j-(L/4)), 2)))/alpha) / (sqrt(2)*3.14159*alpha);
xi.push_back(value);
}
yj.push_back(xi);
}
return yj;
}
If you're concerned with performance and all of your inner vectors are of a fixed size N, it might be better to use std::vector<std::array<double, N>>.
Either make a wrapper as said above, or use a vector of vectors.
#include <vector>
#include <iostream>
auto get_2d_array()
{
// use std::vector since it will allocate (the large amount of) data on the heap
// construct a vector of 400 vectors with 400 doubles each
std::vector<std::vector<double>> arr(400, std::vector<double>(400));
arr[100][100] = 3.14;
return arr;
}
int main()
{
auto arr = get_2d_array();
std::cout << arr[100][100];
}
Your understanding of arrays, pointers and return values is incomplete. I cannot write you a whole tutorial on the topic but I recommend you read up on this.
In the mean time, I recommend you use std::vector instead of C-style arrays and treat your multidimensional arrays as 1D vectors with proper indexing, e.g. cell = vector[row * cols + col]
Something like this:
#include <cmath>
// using std::exp, M_PI, M_SQRT2
#include <vector>
std::vector<double> psiinit(int L, int n, double alpha) {
std::vector<double> yj(n * n);
double div = M_SQRT2 * M_PI * alpha;
for (int j = 0; j < n; j++)
{
double jval = j - L/4;
jval = jval * jval;
for (int i = 0; i < n; i++)
{
double ival = i - L/4;
ival = ival * ival;
yj[j * n + i] = std::exp(-(ival + jval) / alpha) / div;
}
}
return yj;
}
Addendum: There are also specialized libraries to support matrices better and faster. For example Eigen
https://eigen.tuxfamily.org/dox/GettingStarted.html
heap allocating and returning that pointer will also work...
instead of
double yj[400][400] = {};
do,
double** yj;
yj = new double*[400];
yj[i] = new double[400];
then just,
return yj;
I have a matrix (relatively big) that I need to transpose. For example assume that my matrix is
a b c d e f
g h i j k l
m n o p q r
I want the result be as follows:
a g m
b h n
c I o
d j p
e k q
f l r
What is the fastest way to do this?
This is a good question. There are many reason you would want to actually transpose the matrix in memory rather than just swap coordinates, e.g. in matrix multiplication and Gaussian smearing.
First let me list one of the functions I use for the transpose (EDIT: please see the end of my answer where I found a much faster solution)
void transpose(float *src, float *dst, const int N, const int M) {
#pragma omp parallel for
for(int n = 0; n<N*M; n++) {
int i = n/N;
int j = n%N;
dst[n] = src[M*j + i];
}
}
Now let's see why the transpose is useful. Consider matrix multiplication C = A*B. We could do it this way.
for(int i=0; i<N; i++) {
for(int j=0; j<K; j++) {
float tmp = 0;
for(int l=0; l<M; l++) {
tmp += A[M*i+l]*B[K*l+j];
}
C[K*i + j] = tmp;
}
}
That way, however, is going to have a lot of cache misses. A much faster solution is to take the transpose of B first
transpose(B);
for(int i=0; i<N; i++) {
for(int j=0; j<K; j++) {
float tmp = 0;
for(int l=0; l<M; l++) {
tmp += A[M*i+l]*B[K*j+l];
}
C[K*i + j] = tmp;
}
}
transpose(B);
Matrix multiplication is O(n^3) and the transpose is O(n^2), so taking the transpose should have a negligible effect on the computation time (for large n). In matrix multiplication loop tiling is even more effective than taking the transpose but that's much more complicated.
I wish I knew a faster way to do the transpose (Edit: I found a faster solution, see the end of my answer). When Haswell/AVX2 comes out in a few weeks it will have a gather function. I don't know if that will be helpful in this case but I could image gathering a column and writing out a row. Maybe it will make the transpose unnecessary.
For Gaussian smearing what you do is smear horizontally and then smear vertically. But smearing vertically has the cache problem so what you do is
Smear image horizontally
transpose output
Smear output horizontally
transpose output
Here is a paper by Intel explaining that
http://software.intel.com/en-us/articles/iir-gaussian-blur-filter-implementation-using-intel-advanced-vector-extensions
Lastly, what I actually do in matrix multiplication (and in Gaussian smearing) is not take exactly the transpose but take the transpose in widths of a certain vector size (e.g. 4 or 8 for SSE/AVX). Here is the function I use
void reorder_matrix(const float* A, float* B, const int N, const int M, const int vec_size) {
#pragma omp parallel for
for(int n=0; n<M*N; n++) {
int k = vec_size*(n/N/vec_size);
int i = (n/vec_size)%N;
int j = n%vec_size;
B[n] = A[M*i + k + j];
}
}
EDIT:
I tried several function to find the fastest transpose for large matrices. In the end the fastest result is to use loop blocking with block_size=16 (Edit: I found a faster solution using SSE and loop blocking - see below). This code works for any NxM matrix (i.e. the matrix does not have to be square).
inline void transpose_scalar_block(float *A, float *B, const int lda, const int ldb, const int block_size) {
#pragma omp parallel for
for(int i=0; i<block_size; i++) {
for(int j=0; j<block_size; j++) {
B[j*ldb + i] = A[i*lda +j];
}
}
}
inline void transpose_block(float *A, float *B, const int n, const int m, const int lda, const int ldb, const int block_size) {
#pragma omp parallel for
for(int i=0; i<n; i+=block_size) {
for(int j=0; j<m; j+=block_size) {
transpose_scalar_block(&A[i*lda +j], &B[j*ldb + i], lda, ldb, block_size);
}
}
}
The values lda and ldb are the width of the matrix. These need to be multiples of the block size. To find the values and allocate the memory for e.g. a 3000x1001 matrix I do something like this
#define ROUND_UP(x, s) (((x)+((s)-1)) & -(s))
const int n = 3000;
const int m = 1001;
int lda = ROUND_UP(m, 16);
int ldb = ROUND_UP(n, 16);
float *A = (float*)_mm_malloc(sizeof(float)*lda*ldb, 64);
float *B = (float*)_mm_malloc(sizeof(float)*lda*ldb, 64);
For 3000x1001 this returns ldb = 3008 and lda = 1008
Edit:
I found an even faster solution using SSE intrinsics:
inline void transpose4x4_SSE(float *A, float *B, const int lda, const int ldb) {
__m128 row1 = _mm_load_ps(&A[0*lda]);
__m128 row2 = _mm_load_ps(&A[1*lda]);
__m128 row3 = _mm_load_ps(&A[2*lda]);
__m128 row4 = _mm_load_ps(&A[3*lda]);
_MM_TRANSPOSE4_PS(row1, row2, row3, row4);
_mm_store_ps(&B[0*ldb], row1);
_mm_store_ps(&B[1*ldb], row2);
_mm_store_ps(&B[2*ldb], row3);
_mm_store_ps(&B[3*ldb], row4);
}
inline void transpose_block_SSE4x4(float *A, float *B, const int n, const int m, const int lda, const int ldb ,const int block_size) {
#pragma omp parallel for
for(int i=0; i<n; i+=block_size) {
for(int j=0; j<m; j+=block_size) {
int max_i2 = i+block_size < n ? i + block_size : n;
int max_j2 = j+block_size < m ? j + block_size : m;
for(int i2=i; i2<max_i2; i2+=4) {
for(int j2=j; j2<max_j2; j2+=4) {
transpose4x4_SSE(&A[i2*lda +j2], &B[j2*ldb + i2], lda, ldb);
}
}
}
}
}
This is going to depend on your application but in general the fastest way to transpose a matrix would be to invert your coordinates when you do a look up, then you do not have to actually move any data.
Some details about transposing 4x4 square float (I will discuss 32-bit integer later) matrices with x86 hardware. It's helpful to start here in order to transpose larger square matrices such as 8x8 or 16x16.
_MM_TRANSPOSE4_PS(r0, r1, r2, r3) is implemented differently by different compilers. GCC and ICC (I have not checked Clang) use unpcklps, unpckhps, unpcklpd, unpckhpd whereas MSVC uses only shufps. We can actually combine these two approaches together like this.
t0 = _mm_unpacklo_ps(r0, r1);
t1 = _mm_unpackhi_ps(r0, r1);
t2 = _mm_unpacklo_ps(r2, r3);
t3 = _mm_unpackhi_ps(r2, r3);
r0 = _mm_shuffle_ps(t0,t2, 0x44);
r1 = _mm_shuffle_ps(t0,t2, 0xEE);
r2 = _mm_shuffle_ps(t1,t3, 0x44);
r3 = _mm_shuffle_ps(t1,t3, 0xEE);
One interesting observation is that two shuffles can be converted to one shuffle and two blends (SSE4.1) like this.
t0 = _mm_unpacklo_ps(r0, r1);
t1 = _mm_unpackhi_ps(r0, r1);
t2 = _mm_unpacklo_ps(r2, r3);
t3 = _mm_unpackhi_ps(r2, r3);
v = _mm_shuffle_ps(t0,t2, 0x4E);
r0 = _mm_blend_ps(t0,v, 0xC);
r1 = _mm_blend_ps(t2,v, 0x3);
v = _mm_shuffle_ps(t1,t3, 0x4E);
r2 = _mm_blend_ps(t1,v, 0xC);
r3 = _mm_blend_ps(t3,v, 0x3);
This effectively converted 4 shuffles into 2 shuffles and 4 blends. This uses 2 more instructions than the implementation of GCC, ICC, and MSVC. The advantage is that it reduces port pressure which may have a benefit in some circumstances.
Currently all the shuffles and unpacks can go only to one particular port whereas the blends can go to either of two different ports.
I tried using 8 shuffles like MSVC and converting that into 4 shuffles + 8 blends but it did not work. I still had to use 4 unpacks.
I used this same technique for a 8x8 float transpose (see towards the end of that answer).
https://stackoverflow.com/a/25627536/2542702. In that answer I still had to use 8 unpacks but I manged to convert the 8 shuffles into 4 shuffles and 8 blends.
For 32-bit integers there is nothing like shufps (except for 128-bit shuffles with AVX512) so it can only be implemented with unpacks which I don't think can be convert to blends (efficiently). With AVX512 vshufi32x4 acts effectively like shufps except for 128-bit lanes of 4 integers instead of 32-bit floats so this same technique might be possibly with vshufi32x4 in some cases. With Knights Landing shuffles are four times slower (throughput) than blends.
If the size of the arrays are known prior then we could use the union to our help. Like this-
#include <bits/stdc++.h>
using namespace std;
union ua{
int arr[2][3];
int brr[3][2];
};
int main() {
union ua uav;
int karr[2][3] = {{1,2,3},{4,5,6}};
memcpy(uav.arr,karr,sizeof(karr));
for (int i=0;i<3;i++)
{
for (int j=0;j<2;j++)
cout<<uav.brr[i][j]<<" ";
cout<<'\n';
}
return 0;
}
Consider each row as a column, and each column as a row .. use j,i instead of i,j
demo: http://ideone.com/lvsxKZ
#include <iostream>
using namespace std;
int main ()
{
char A [3][3] =
{
{ 'a', 'b', 'c' },
{ 'd', 'e', 'f' },
{ 'g', 'h', 'i' }
};
cout << "A = " << endl << endl;
// print matrix A
for (int i=0; i<3; i++)
{
for (int j=0; j<3; j++) cout << A[i][j];
cout << endl;
}
cout << endl << "A transpose = " << endl << endl;
// print A transpose
for (int i=0; i<3; i++)
{
for (int j=0; j<3; j++) cout << A[j][i];
cout << endl;
}
return 0;
}
transposing without any overhead (class not complete):
class Matrix{
double *data; //suppose this will point to data
double _get1(int i, int j){return data[i*M+j];} //used to access normally
double _get2(int i, int j){return data[j*N+i];} //used when transposed
public:
int M, N; //dimensions
double (*get_p)(int, int); //functor to access elements
Matrix(int _M,int _N):M(_M), N(_N){
//allocate data
get_p=&Matrix::_get1; // initialised with normal access
}
double get(int i, int j){
//there should be a way to directly use get_p to call. but i think even this
//doesnt incur overhead because it is inline and the compiler should be intelligent
//enough to remove the extra call
return (this->*get_p)(i,j);
}
void transpose(){ //twice transpose gives the original
if(get_p==&Matrix::get1) get_p=&Matrix::_get2;
else get_p==&Matrix::_get1;
swap(M,N);
}
}
can be used like this:
Matrix M(100,200);
double x=M.get(17,45);
M.transpose();
x=M.get(17,45); // = original M(45,17)
of course I didn't bother with the memory management here, which is crucial but different topic.
template <class T>
void transpose( const std::vector< std::vector<T> > & a,
std::vector< std::vector<T> > & b,
int width, int height)
{
for (int i = 0; i < width; i++)
{
for (int j = 0; j < height; j++)
{
b[j][i] = a[i][j];
}
}
}
Modern linear algebra libraries include optimized versions of the most common operations. Many of them include dynamic CPU dispatch, which chooses the best implementation for the hardware at program execution time (without compromising on portability).
This is commonly a better alternative to performing manual optimization of your functinos via vector extensions intrinsic functions. The latter will tie your implementation to a particular hardware vendor and model: if you decide to swap to a different vendor (e.g. Power, ARM) or to a newer vector extensions (e.g. AVX512), you will need to re-implement it again to get the most of them.
MKL transposition, for example, includes the BLAS extensions function imatcopy. You can find it in other implementations such as OpenBLAS as well:
#include <mkl.h>
void transpose( float* a, int n, int m ) {
const char row_major = 'R';
const char transpose = 'T';
const float alpha = 1.0f;
mkl_simatcopy (row_major, transpose, n, m, alpha, a, n, n);
}
For a C++ project, you can make use of the Armadillo C++:
#include <armadillo>
void transpose( arma::mat &matrix ) {
arma::inplace_trans(matrix);
}
intel mkl suggests in-place and out-of-place transposition/copying matrices. here is the link to the documentation. I would recommend trying out of place implementation as faster ten in-place and into the documentation of the latest version of mkl contains some mistakes.
I think that most fast way should not taking higher than O(n^2) also in this way you can use just O(1) space :
the way to do that is to swap in pairs because when you transpose a matrix then what you do is: M[i][j]=M[j][i] , so store M[i][j] in temp, then M[i][j]=M[j][i],and the last step : M[j][i]=temp. this could be done by one pass so it should take O(n^2)
my answer is transposed of 3x3 matrix
#include<iostream.h>
#include<math.h>
main()
{
int a[3][3];
int b[3];
cout<<"You must give us an array 3x3 and then we will give you Transposed it "<<endl;
for(int i=0;i<3;i++)
{
for(int j=0;j<3;j++)
{
cout<<"Enter a["<<i<<"]["<<j<<"]: ";
cin>>a[i][j];
}
}
cout<<"Matrix you entered is :"<<endl;
for (int e = 0 ; e < 3 ; e++ )
{
for ( int f = 0 ; f < 3 ; f++ )
cout << a[e][f] << "\t";
cout << endl;
}
cout<<"\nTransposed of matrix you entered is :"<<endl;
for (int c = 0 ; c < 3 ; c++ )
{
for ( int d = 0 ; d < 3 ; d++ )
cout << a[d][c] << "\t";
cout << endl;
}
return 0;
}
I'm working on graph implementations in C++ and came across an implementation for an adjacency matrix that mostly made sense to me. The implementation uses an "init" function to initialize the matrix:
void init(int n) {
numVertex = 0;
numEdge = 0;
mark = new int[n]; //initialize mark array
for (int i = 0; i < numVertex; i++) {
mark[i] = 0;
}
matrix = (int**) new int*[numVertex]; //make matrix
for (int i = 0; i < numVertex; i++) {
matrix[i] = new int[numVertex];
}
for (int i = 0; i < numVertex; i++) { //mark all matrix cells as false
for (int j = 0; j < numVertex; j++) {
matrix[i][j] = 0;
}
}
}
The line I'm confused about is:
matrix = (int**) new int*[numVertex]; //make matrix
What does the (int**) aspect do? Why would I choose to use this instead of matrix = new int**[numVertex];?
Thanks so much!
(int**)value is a C-style cast operation.
Notes:
Don't use those in C++, it tends to cause or hide problems, like mismatches between right and left side of an assignment.
The code is relatively low quality, proper C++ would rather use std::vector.
The code is also not complete, so little can be said with certainty about how it functions.
Note that matrix = new int**[numVertex]; as mentioned by you would create (for this example) a 3D array, because you'd have numVertex entries of int**.
The (int**) cast does not accomplish much, if anything at all, because if matrix is of type int**, there is no need for the cast (you get back an int** already from the new).
If column dimension is fixed, you can use vector of array there.
godbolt
wandbox
#include <vector>
#include <array>
#include <iostream>
#include <iomanip>
template<typename T, int col>
using row_templ = std::array<T,col>;
template<typename T, int col, template <typename,int> typename U = row_templ>
using mat_templ = std::vector<U<T,col>>;
int main()
{
constexpr int numVertex = 30;
constexpr int numEdge = 30;
constexpr int numCol = numVertex;
int numRow = numEdge;
using row_t = row_templ<int, numCol>; // alias to the explicit class template specialization
using mat_t = mat_templ<int, numCol>;
auto make_mat = [&](){ return mat_t(numRow); }; // define a maker if lazy
mat_t my_mat(numRow);
mat_t my_mat2 = make_mat(); // or just use our maker
// Due to that default allocator uses value initialization, a.k.a T().
// At this point, all positions are value init to int(), which is zero,
// from value init of array<int, col>() by the default allocator.
// numVertex x numEdge is one solid contaguous chunk and now ready to roll.
// range for
for (row_t r : my_mat) {
for (int n : r) {
std::cout << std::setw(4) << n;
}
std::cout << '\n';
}
// classic for
for (int i = 0; i < numRow; ++i) {
for (int j = 0; j < numCol; ++j) {
std::cout << std::setw(4) << (my_mat2[i][j] = i*numRow + numCol);
}
std::cout << '\n';
}
}
#include <iostream>
using namespace std;
int main() {
int n,d,i=0,temp;
cin>>n>>d;
int a[1000000];
for(i=0;i<n;i++){
cin>>a[i];
}
while(d--){
temp=a[0];
for(i=1;i<n;i++){
a[i-1]=a[i];}
a[n-1]=temp;
}
for(i=0;i<n;i++){
cout<<a[i]<<" ";
}
return 0;
}
how to optimize it further as it's giving TLE error. the input file is very large obviously.
Some suggestions:
Rotate by the full amount d in a single loop (note that the result is a different array b):
for (i = 0; i < n; i++) {
b[(i+n-d) % n]=a[i];
}
Don't touch the array at all but transform the index when accessing it, for example:
cout << a[(i+n-d) % n] << " ";
The second version requires extra calculation to be done whenever accessing an array element but it should be faster if you don't need to access all array elements after each rotate operation.
There is also a way to do the rotation in-place by using a helper function that reverses a range of the array. It's a bit odd but might be the best solution. For convenience I have used a std::vector instead of an array here:
void ReverseVector( std::vector<int>& a, int from, int to ) {
for (auto i = 0; i < (to - from) / 2; i++) {
auto tmp = a[from + i];
a[from + i] = a[to - i];
a[to-i] = tmp;
}
}
void RotateVector( std::vector<int>& a, int distance ) {
distance = (distance + a.size()) % a.size();
ReverseVector( a, 0, a.size() - 1 );
ReverseVector( a, 0, distance - 1 );
ReverseVector( a, distance, a.size() - 1 );
}
I wrote a multithreaded simulated annealing program but its not running. I am not sure if the code is correct or not. The code is able to compile but when i run the code it crashes. Its just a run time error.
#include <stdio.h>
#include <time.h>
#include <iostream>
#include <stdlib.h>
#include <math.h>
#include <string>
#include <vector>
#include <algorithm>
#include <fstream>
#include <ctime>
#include <windows.h>
#include <process.h>
using namespace std;
typedef vector<double> Layer; //defines a vector type
typedef struct {
Layer Solution1;
double temp1;
double coolingrate1;
int MCL1;
int prob1;
}t;
//void SA(Layer Solution, double temp, double coolingrate, int MCL, int prob){
double Rand_NormalDistri(double mean, double stddev) {
//Random Number from Normal Distribution
static double n2 = 0.0;
static int n2_cached = 0;
if (!n2_cached) {
// choose a point x,y in the unit circle uniformly at random
double x, y, r;
do {
// scale two random integers to doubles between -1 and 1
x = 2.0*rand()/RAND_MAX - 1;
y = 2.0*rand()/RAND_MAX - 1;
r = x*x + y*y;
} while (r == 0.0 || r > 1.0);
{
// Apply Box-Muller transform on x, y
double d = sqrt(-2.0*log(r)/r);
double n1 = x*d;
n2 = y*d;
// scale and translate to get desired mean and standard deviation
double result = n1*stddev + mean;
n2_cached = 1;
return result;
}
} else {
n2_cached = 0;
return n2*stddev + mean;
}
}
double FitnessFunc(Layer x, int ProbNum)
{
int i,j,k;
double z;
double fit = 0;
double sumSCH;
if(ProbNum==1){
// Ellipsoidal function
for(j=0;j< x.size();j++)
fit+=((j+1)*(x[j]*x[j]));
}
else if(ProbNum==2){
// Schwefel's function
for(j=0; j< x.size(); j++)
{
sumSCH=0;
for(i=0; i<j; i++)
sumSCH += x[i];
fit += sumSCH * sumSCH;
}
}
else if(ProbNum==3){
// Rosenbrock's function
for(j=0; j< x.size()-1; j++)
fit += 100.0*(x[j]*x[j] - x[j+1])*(x[j]*x[j] - x[j+1]) + (x[j]-1.0)*(x[j]-1.0);
}
return fit;
}
double probl(double energychange, double temp){
double a;
a= (-energychange)/temp;
return double(min(1.0,exp(a)));
}
int random (int min, int max){
int n = max - min + 1;
int remainder = RAND_MAX % n;
int x;
do{
x = rand();
}while (x >= RAND_MAX - remainder);
return min + x % n;
}
//void SA(Layer Solution, double temp, double coolingrate, int MCL, int prob){
void SA(void *param){
t *args = (t*) param;
Layer Solution = args->Solution1;
double temp = args->temp1;
double coolingrate = args->coolingrate1;
int MCL = args->MCL1;
int prob = args->prob1;
double Energy;
double EnergyNew;
double EnergyChange;
Layer SolutionNew(50);
Energy = FitnessFunc(Solution, prob);
while (temp > 0.01){
for ( int i = 0; i < MCL; i++){
for (int j = 0 ; j < SolutionNew.size(); j++){
SolutionNew[j] = Rand_NormalDistri(5, 1);
}
EnergyNew = FitnessFunc(SolutionNew, prob);
EnergyChange = EnergyNew - Energy;
if(EnergyChange <= 0){
Solution = SolutionNew;
Energy = EnergyNew;
}
if(probl(EnergyChange ,temp ) > random(0,1)){
//cout<<SolutionNew[i]<<endl;
Solution = SolutionNew;
Energy = EnergyNew;
cout << temp << "=" << Energy << endl;
}
}
temp = temp * coolingrate;
}
}
int main ()
{
srand ( time(NULL) ); //seed for getting different numbers each time the prog is run
Layer SearchSpace(50); //declare a vector of 20 dimensions
//for(int a = 0;a < 10; a++){
for (int i = 0 ; i < SearchSpace.size(); i++){
SearchSpace[i] = Rand_NormalDistri(5, 1);
}
t *arg1;
arg1 = (t *)malloc(sizeof(t));
arg1->Solution1 = SearchSpace;
arg1->temp1 = 1000;
arg1->coolingrate1 = 0.01;
arg1->MCL1 = 100;
arg1->prob1 = 3;
//cout << "Test " << ""<<endl;
_beginthread( SA, 0, (void*) arg1);
Sleep( 100 );
//SA(SearchSpace, 1000, 0.01, 100, 3);
//}
return 0;
}
Please help.
Thanks
Avinesh
As leftaroundabout pointed out, you're using malloc in C++ code. This is the source of your crash.
Malloc will allocate a block of memory, but since it was really designed for C, it doesn't call any C++ constructors. In this case, the vector<double> is never properly constructed. When
arg1->Solution1 = SearchSpace;
Is called, the member variable "Solution1" has an undefined state and the assignment operator crashes.
Instead of malloc try
arg1 = new t;
This will accomplish roughly the same thing but the "new" keyword also calls any necessary constructors to ensure the vector<double> is properly initialized.
This also brings up another minor issue, that this memory you've newed also needs to be deleted somewhere. In this case, since arg1 is passed to another thread, it should probably be cleaned up like
delete args;
by your "SA" function after its done with the args variable.
While I don't know the actual cause for your crashes I'm not really surprised that you end up in trouble. For instance, those "cached" static variables in Rand_NormalDistri are obviously vulnerable to data races. Why don't you use std::normal_distribution? It's almost always a good idea to use standard library routines when they're available, and even more so when you need to consider multithreading trickiness.
Even worse, you're heavily mixing C and C++. malloc is something you should virtually never use in C++ code – it doesn't know about RAII, which is one of the few intrinsically safe things you can cling onto in C++.