Trying to output the element of an array in C++

I am trying to find which element of this C++ array contains the highest value and then output the key (index) of that element, but it keeps outputting a list of numbers like 9044007. What I want to know is how to output only the first number, because I'm guessing it is outputting all the keys of the elements with the same value.
double max = 0;
for (int i = 1; i <= 9; i++) {
    if (responselayer[i] > max) {
        max = responselayer[i];
    }
}
for (int i = 1; i < 10; i++) {
    if (responselayer[i] == max) {
        return i;
    }
}
Here is the part of the code that fills the entire thing. It's a lot of code, so it's hard to make it minimal. This is my first time using Stack Overflow, so I don't really know how everything works on here.
//final layer input calculation and output//
for (int j = 1; j <= 10; j++) {
    double subval = 0;
    if (j == 1) {
        for (int i = 0; i < 5; i++) {
            subval = (midlayerweight4[i] * midlayer3[i]) + subval;
        }
        responselayer[j] = subval;
        responselayer[j] = 1 / (1 + exp(-responselayer[j]));
    }
    else if (j == 2) {
        int k = 0;
        for (int i = 5; i < 10; i++) {
            subval = (midlayerweight4[i] * midlayer3[k]) + subval;
        }
        responselayer[j] = subval;
        responselayer[j] = 1 / (1 + exp(-responselayer[j]));
    }
    else if (j == 3) {
        int k = 0;
        for (int i = 10; i < 15; i++) {
            subval = (midlayerweight4[i] * midlayer3[k]) + subval;
        }
        responselayer[j] = subval;
        responselayer[j] = 1 / (1 + exp(-responselayer[j]));
    }
    else if (j == 4) {
        int k = 0;
        for (int i = 15; i < 20; i++) {
            subval = (midlayerweight4[i] * midlayer3[k]) + subval;
        }
        responselayer[j] = subval;
        responselayer[j] = 1 / (1 + exp(-responselayer[j]));
    }
    else if (j == 5) {
        int k = 0;
        for (int i = 20; i < 25; i++) {
            subval = (midlayerweight4[i] * midlayer3[k]) + subval;
        }
        responselayer[j] = subval;
        responselayer[j] = 1 / (1 + exp(-responselayer[j]));
    }
    else if (j == 6) {
        int k = 0;
        for (int i = 25; i < 30; i++) {
            subval = (midlayerweight4[i] * midlayer3[k]) + subval;
        }
        responselayer[j] = subval;
        responselayer[j] = 1 / (1 + exp(-responselayer[j]));
    }
    else if (j == 7) {
        int k = 0;
        for (int i = 30; i < 35; i++) {
            subval = (midlayerweight4[i] * midlayer3[k]) + subval;
        }
        responselayer[j] = subval;
        responselayer[j] = 1 / (1 + exp(-responselayer[j]));
    }
    else if (j == 8) {
        int k = 0;
        for (int i = 35; i < 40; i++) {
            subval = (midlayerweight4[i] * midlayer3[k]) + subval;
        }
        responselayer[j] = subval;
        responselayer[j] = 1 / (1 + exp(-responselayer[j]));
    }
    else if (j == 9) {
        int k = 0;
        for (int i = 40; i < 45; i++) {
            subval = (midlayerweight4[i] * midlayer3[k]) + subval;
        }
        responselayer[j] = subval;
        responselayer[j] = 1 / (1 + exp(-responselayer[j]));
    }
}
double max = 0;
for (int i = 1; i < 10; i++) {
    if (responselayer[i] > max) {
        max = responselayer[i];
    }
}
for (int i = 1; i < 10; i++) {
    if (responselayer[i] == max) {
        return i;
    }
}
midlayerweight4[] was filled from an array before it in this same fashion, and so were the others behind that. This is the declaration for responselayer; all the other arrays used are of type double, and they were all declared outside of the array.
double responselayer[10];

A few things:
Array indexing in C/C++ starts at zero.
You do not show us the type of your array, so I'm going to assume it is double responselayer[SZ]; If this is not the case, you have some issues involving implicit type conversion from integral to floating point types.
It is sloppy to use 2 different styles of conditions in your for-loop.
It is sloppy to use magic numbers in your code.
I would write the code as follows:
size_t find_max_index(const double data[], size_t size) {
    size_t max_index = 0;
    // No need to compare data[0] to itself on the first iteration, so
    // loop starts at one
    for (size_t i = 1; i < size; ++i)
        if (data[i] > data[max_index])
            max_index = i;
    return max_index;
}
Or getting a little fancier and taking care of the possibility of floating point strangeness, we can make it a templated function:
template <typename T>
size_t find_max_index(const T data[], size_t size) {
    size_t max_index = 0;
    for (size_t i = 1; i < size; ++i)
        if (data[i] > data[max_index])
            max_index = i;
    return max_index;
}
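For what it's worth, the standard library can do the same search. Below is a minimal sketch, assuming the data is a plain array of double as in the question; the helper name index_of_max is made up for illustration. std::max_element returns an iterator to the first largest element, and std::distance turns that into a zero-based index, so ties resolve to the lowest index.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Hypothetical helper (not from the question): zero-based index of the
// first largest element. Assumes size > 0.
std::size_t index_of_max(const double* data, std::size_t size) {
    const double* best = std::max_element(data, data + size);
    return static_cast<std::size_t>(std::distance(data, best));
}
```

Calling index_of_max(responselayer, 10) would then give the index directly, with no second loop and no == comparison on doubles.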


OpenMP with good performance in C function, but not in C++ class method

I have been trying to port code for a university-related project from C to C++ while also adapting it to an object-oriented paradigm. The original code uses OpenMP pragmas to create a parallel variation of a function that calculates certain values and matrices according to input given by the user.
In C, I wrote a function that returns a struct data type I defined myself, like so. A note: the DoubVet1D/2D functions basically call calloc to initialize arrays of a size defined by their argument(s).
SaidaEstacionario getDDParallel(DadosEntrada e){
SaidaEstacionario dd;
/* Fill the vectors with the domain data */
int J = 0;
for (int i = 0; i < e.numReg; i++)
J += e.partReg[i];
int N = e.ordQuad;
dd.dimFi = J + 1;
//NNR = e.NNR;
//double Stop = e.erro;
//TipoContorno = e.CCETipo + e.CCDTipo;
double *St = DoubVet1D(J);
double *Ss = DoubVet1D(J);
double *Q = DoubVet1D(J);
double *h = DoubVet1D(J);
double *S = DoubVet1D(J);
//double *Fi = DoubVet1D(J + 1);
//double *x = DoubVet1D(J + 1);
//double *TaxaAbR = DoubVet1D(e.numReg);
double **psiOld = DoubVet2D(e.ordQuad, J + 1);
double **psi = DoubVet2D(e.ordQuad, J + 1);
double tmpErro = 0;
double erro = 0;
int k = 0;
dd.fi = DoubVet1D(J + 1);
dd.x = DoubVet1D(J + 1);
dd.x[0] = 0;
for (int i = 0; i < e.numReg; i++)
double temp = e.tamReg[i] / e.partReg[i];
for (int j = 0; j < e.partReg[i]; j++)
h[k] = temp;
Ss[k] = e.ssZon[e.zonReg[i]];
St[k] = e.stZon[e.zonReg[i]];
Q[k] = e.fontReg[i];
dd.x[k + 1] = dd.x[k] + temp;
/* Setting the boundary condition */
for (int n = 0; n < e.ordQuad/2; n++)
if(e.cce < 0){
psi[n][0] = 0;
psiOld[n][0] = psi[n][0];
psi[n][0] = e.cce;
psiOld[n][0] = psi[n][0];
if(e.ccd < 0){
psi[e.ordQuad/2 + n][J] = 0;
psiOld[e.ordQuad/2 + n][0] = psi[e.ordQuad/2 + n][0];
psi[e.ordQuad/2 + n][J] = e.ccd;
psiOld[e.ordQuad/2 + n][0] = psi[e.ordQuad/2 + n][0];
/* Obtaining the Gauss-Legendre quadratures */
PLegendre pl = GaussLegendreAbsPes(e.ordQuad);
double *Mi = pl.mi;
double *W = pl.w;
double start;
dd.numInter = 0;
/* Computing the scattering source */
#pragma omp parallel for
for (int j = 0; j < J; j++)
{
    S[j] = 0;
    for (int n = 0; n < N; n++)
        S[j] += 0.25 * Ss[j] * (psi[n][j + 1] + psi[n][j]) * W[n];
}
/* Sweep to the right/left */
#pragma omp sections
{
    #pragma omp section
    {
        #pragma omp parallel for
        for (int m = 0; m < N / 2; m++)
            for (int j = 0; j < J; j++)
            {
                /* Sweep to the right */
                psiOld[m][j + 1] = psi[m][j + 1];
                psi[m][j + 1] = ((Mi[m] / h[j] - 0.5 * St[j]) * psi[m][j] + S[j] + Q[j]) / (Mi[m] / h[j] + 0.5 * St[j]);
            }
    }
    #pragma omp section
    {
        #pragma omp parallel for
        for (int m = 0; m < N / 2; m++)
            for (int j = 0; j < J; j++)
            {
                /* Sweep to the left */
                psiOld[m + N / 2][J - 1 - j] = psi[m + N / 2][J - 1 - j];
                psi[m + N / 2][J - 1 - j] = ((Mi[m] / h[J - 1 - j] - 0.5 * St[J - 1 - j]) * psi[m + N / 2][J - j] + S[J - 1 - j] + Q[J - 1 - j]) / (Mi[m] / h[J - 1 - j] + 0.5 * St[J - 1 - j]);
            }
    }
} // There is an implicit barrier here
/* Setting the boundary condition */
if(e.cce < 0 || e.ccd < 0){
#pragma omp parallel for
for (int n = 0; n < e.ordQuad/2; n++)
if(e.cce < 0){
psiOld[n][0] = psi[n][0];
psi[n][0] = psi[e.ordQuad/2 + n][ 0];
if(e.ccd < 0){
psiOld[e.ordQuad/2 + n][0] = psi[n][0];
psi[e.ordQuad/2 + n][J] = psi[n][J];
/* Computing the error for the stopping criterion */
tmpErro = 0;
erro = 0;
for (int m = 0; m < N; m++)
for (int j = 0; j < J; j++){
tmpErro = Abs(psi[m][j] - psiOld[m][j])/psi[m][j];
if(tmpErro > erro) erro = tmpErro;
}while(erro > e.erro);
dd.tempoVar = omp_get_wtime() - start;
dd.psi = DoubVet2D(e.ordQuad, J + 1);
/* Compute the scalar flux */
#pragma omp parallel for
for (int j = 0; j <= J; j++){
    dd.fi[j] = 0;
    for (int m = 0; m < N; m++){
        dd.fi[j] += psi[m][j] * W[m];
        dd.psi[m][j] = psi[m][j];
    }
}
/* Compute the absorption rate */
int i = 0;
dd.taxaAbsorRegiao = DoubVet1D(e.numReg);
dd.taxaAbsorTotal = 0;
for (int r = 0; r < e.numReg; r++)
{
    dd.taxaAbsorRegiao[r] = 0;
    #pragma omp parallel for
    for (int j = i; j < i + e.partReg[r]; j++)
        dd.taxaAbsorRegiao[r] += 0.5 * (dd.fi[j] + dd.fi[j + 1]);
    dd.taxaAbsorRegiao[r] *= (St[i] - Ss[i]) * h[i];
    dd.taxaAbsorTotal += dd.taxaAbsorRegiao[r];
    i += e.partReg[r];
}
return dd;
/// Runs the calculations of the stationary Diamond Difference (DD) or Dg method, parallel version
SaidaEstacionario getDgDDParallel(DadosEntrada e){
SaidaEstacionario saida;
/* Fill the vectors with the domain data */
int J = 0;
for (int i = 0; i < e.numReg; i++)
J += e.partReg[i];
int N = e.ordQuad;
saida.dimFi = J + 1;
//NNR = e.NNR;
//double Stop = e.erro;
//TipoContorno = e.CCETipo + e.CCDTipo;
double *St = DoubVet1D(J);
double *Ss = DoubVet1D(J);
double *Q = DoubVet1D(J);
double *h = DoubVet1D(J);
double *S = DoubVet1D(J);
//double *Fi = DoubVet1D(J + 1);
//double *x = DoubVet1D(J + 1);
//double *TaxaAbR = DoubVet1D(e.numReg);
double **psiOld = DoubVet2D(e.ordQuad, J + 1);
double **psi = DoubVet2D(e.ordQuad, J + 1);
double tmpErro = 0;
double erro = 0;
int k = 0;
saida.fi = DoubVet1D(J + 1);
saida.x = DoubVet1D(J + 1);
saida.x[0] = 0;
for (int i = 0; i < e.numReg; i++)
double temp = e.tamReg[i] / e.partReg[i];
for (int j = 0; j < e.partReg[i]; j++)
h[k] = temp;
Ss[k] = e.ssZon[e.zonReg[i]];
St[k] = e.stZon[e.zonReg[i]];
Q[k] = e.fontReg[i];
saida.x[k + 1] = saida.x[k] + temp;
/* Setting the boundary condition */
for (int n = 0; n < e.ordQuad/2; n++)
if(e.cce < 0){
psi[n][0] = 0;
psiOld[n][0] = psi[n][0];
psi[n][0] = e.cce;
psiOld[n][0] = psi[n][0];
if(e.ccd < 0){
psi[e.ordQuad/2 + n][J] = 0;
psiOld[e.ordQuad/2 + n][0] = psi[e.ordQuad/2 + n][0];
psi[e.ordQuad/2 + n][J] = e.ccd;
psiOld[e.ordQuad/2 + n][0] = psi[e.ordQuad/2 + n][0];
/* Obtaining the Gauss-Legendre quadratures */
PLegendre pl = GaussLegendreAbsPes(e.ordQuad);
double *Mi = pl.mi;
double *W = pl.w;
double start;
start = omp_get_wtime();
saida.numInter = 0;
/* Computing the scattering source */
#pragma omp parallel for
for (int j = 0; j < J; j++)
{
    S[j] = 0;
    for (int n = 0; n < N; n++)
        S[j] += 0.25 * Ss[j] * (psi[n][j + 1] + psi[n][j]) * W[n];
}
/* Sweep to the right/left */
#pragma omp sections
{
    #pragma omp section
    {
        #pragma omp parallel for
        for (int m = 0; m < N / 2; m++)
            for (int j = 0; j < J; j++)
            {
                /* Sweep to the right */
                psiOld[m][j + 1] = psi[m][j + 1];
                psi[m][j + 1] = ((Mi[m] / h[j] - 0.5 * St[j] * (1 - e.teta)) * psi[m][j] + S[j] + Q[j]) / (Mi[m] / h[j] + 0.5 * St[j] * (1 + e.teta));
            }
    }
    #pragma omp section
    {
        #pragma omp parallel for
        for (int m = 0; m < N / 2; m++)
            for (int j = 0; j < J; j++)
            {
                /* Sweep to the left */
                psiOld[m + N / 2][J - 1 - j] = psi[m + N / 2][J - 1 - j];
                psi[m + N / 2][J - 1 - j] = ((Mi[m] / h[J - 1 - j] - 0.5 * St[J - 1 - j] * (1 - e.teta)) * psi[m + N / 2][J - j] + S[J - 1 - j] + Q[J - 1 - j]) / (Mi[m] / h[J - 1 - j] + 0.5 * St[J - 1 - j] * (1 + e.teta));
            }
    }
} // There is an implicit barrier here
/* Setting the boundary condition */
if(e.cce < 0 || e.ccd < 0){
#pragma omp parallel for
for (int n = 0; n < e.ordQuad/2; n++)
if(e.cce < 0){
psiOld[n][0] = psi[n][0];
psi[n][0] = psi[e.ordQuad/2 + n][ 0];
if(e.ccd < 0){
psiOld[e.ordQuad/2 + n][0] = psi[n][0];
psi[e.ordQuad/2 + n][J] = psi[n][J];
/* Computing the error for the stopping criterion */
tmpErro = 0;
erro = 0;
for (int m = 0; m < N; m++)
for (int j = 0; j < J; j++){
tmpErro = Abs(psi[m][j] - psiOld[m][j])/psi[m][j];
if(tmpErro > erro) erro = tmpErro;
}while(erro > e.erro);
saida.tempoVar = omp_get_wtime() - start;
saida.psi = DoubVet2D(e.ordQuad, J + 1);
/* Compute the scalar flux */
#pragma omp parallel for
for (int j = 0; j <= J; j++){
    saida.fi[j] = 0;
    for (int m = 0; m < N; m++){
        saida.fi[j] += psi[m][j] * W[m];
        saida.psi[m][j] = psi[m][j];
    }
}
/* Compute the absorption rate */
int i = 0;
saida.taxaAbsorRegiao = DoubVet1D(e.numReg);
saida.taxaAbsorTotal = 0;
for (int r = 0; r < e.numReg; r++)
{
    saida.taxaAbsorRegiao[r] = 0;
    #pragma omp parallel for
    for (int j = i; j < i + e.partReg[r]; j++)
        saida.taxaAbsorRegiao[r] += 0.5 * (saida.fi[j] + saida.fi[j + 1]);
    saida.taxaAbsorRegiao[r] *= (St[i] - Ss[i]) * h[i];
    saida.taxaAbsorTotal += saida.taxaAbsorRegiao[r];
    i += e.partReg[r];
}
return saida;
As you can see, I only use #pragma omp parallel for and #pragma omp sections. In C++, as part of the object orientation, I transformed the original SaidaEstacionario struct into two classes called Method and StationaryMethod (which inherits from Method), and implemented the function above as a method of a subclass at the lower end of the inheritance, while also encapsulating the variables and creating getters/setters. The end result was a StationaryDD class, which inherits from StationaryMethod, with the following method:
void StationaryDD::calculateParallel(){
/* Fill the vectors with the domain data */
int J = 0;
for (int i = 0; i < this->inputData.getNumReg(); i++)
J += this->inputData.getPartReg()[i];
int N = this->inputData.getQuadOrder();
this->dimFi = J + 1;
//NNR = this->inputData.NNR;
//double Stop = this->inputData.erro;
//TipoContorno = this->inputData.getCcl()Tipo + this->inputData.getCcr()Tipo;
double *St = DoubVet1D(J);
double *Ss = DoubVet1D(J);
double *Q = DoubVet1D(J);
double *h = DoubVet1D(J);
double *S = DoubVet1D(J);
//double *Fi = DoubVet1D(J + 1);
//double *x = DoubVet1D(J + 1);
//double *TaxaAbR = DoubVet1D(this->inputData.numReg);
double **psiOld = DoubVet2D(this->inputData.getQuadOrder(), J + 1);
double **psi = DoubVet2D(this->inputData.getQuadOrder(), J + 1);
double tmpErro = 0;
double erro = 0;
int k = 0;
this->fi = DoubVet1D(J + 1);
this->x = DoubVet1D(J + 1);
this->x[0] = 0;
for (int i = 0; i < this->inputData.getNumReg(); i++)
double temp = this->inputData.getSizeReg()[i] / this->inputData.getPartReg()[i];
for (int j = 0; j < this->inputData.getPartReg()[i]; j++)
h[k] = temp;
Ss[k] = this->inputData.getSsZon()[this->inputData.getZonReg()[i]];
St[k] = this->inputData.getStZon()[this->inputData.getZonReg()[i]];
Q[k] = this->inputData.getSrcReg()[i];
this->x[k + 1] = this->x[k] + temp;
/* Setting the boundary condition */
for (int n = 0; n < this->inputData.getQuadOrder()/2; n++)
if(this->inputData.getCcl() < 0){
psi[n][0] = 0;
psiOld[n][0] = psi[n][0];
psi[n][0] = this->inputData.getCcl();
psiOld[n][0] = psi[n][0];
if(this->inputData.getCcr() < 0){
psi[this->inputData.getQuadOrder()/2 + n][J] = 0;
psiOld[this->inputData.getQuadOrder()/2 + n][0] = psi[this->inputData.getQuadOrder()/2 + n][0];
psi[this->inputData.getQuadOrder()/2 + n][J] = this->inputData.getCcr();
psiOld[this->inputData.getQuadOrder()/2 + n][0] = psi[this->inputData.getQuadOrder()/2 + n][0];
/* Obtaining the Gauss-Legendre quadratures */
PLegendre pl = PLegendre(this->inputData.getQuadOrder());
double *Mi = pl.getMi();
double *W = pl.getW();
double start;
this->numIter = 0;
/* Computing the scattering source */
#pragma omp parallel for
for (int j = 0; j < J; j++)
{
    printf("T%d: j = %d\n", omp_get_thread_num(), j);
    S[j] = 0;
    for (int n = 0; n < N; n++)
        S[j] += 0.25 * Ss[j] * (psi[n][j + 1] + psi[n][j]) * W[n];
}
/* Sweep to the right/left */
#pragma omp sections
{
    #pragma omp section
    {
        #pragma omp parallel for
        for (int m = 0; m < N / 2; m++)
            for (int j = 0; j < J; j++)
            {
                /* Sweep to the right */
                psiOld[m][j + 1] = psi[m][j + 1];
                psi[m][j + 1] = ((Mi[m] / h[j] - 0.5 * St[j]) * psi[m][j] + S[j] + Q[j]) / (Mi[m] / h[j] + 0.5 * St[j]);
            }
    }
    #pragma omp section
    {
        #pragma omp parallel for
        for (int m = 0; m < N / 2; m++)
            for (int j = 0; j < J; j++)
            {
                /* Sweep to the left */
                psiOld[m + N / 2][J - 1 - j] = psi[m + N / 2][J - 1 - j];
                psi[m + N / 2][J - 1 - j] = ((Mi[m] / h[J - 1 - j] - 0.5 * St[J - 1 - j]) * psi[m + N / 2][J - j] + S[J - 1 - j] + Q[J - 1 - j]) / (Mi[m] / h[J - 1 - j] + 0.5 * St[J - 1 - j]);
            }
    }
} // There is an implicit barrier here
/* Setting the boundary condition */
if(this->inputData.getCcl() < 0 || this->inputData.getCcr() < 0){
#pragma omp parallel for
for (int n = 0; n < this->inputData.getQuadOrder()/2; n++)
if(this->inputData.getCcl() < 0){
psiOld[n][0] = psi[n][0];
psi[n][0] = psi[this->inputData.getQuadOrder()/2 + n][ 0];
if(this->inputData.getCcr() < 0){
psiOld[this->inputData.getQuadOrder()/2 + n][0] = psi[n][0];
psi[this->inputData.getQuadOrder()/2 + n][J] = psi[n][J];
/* Computing the error for the stopping criterion */
tmpErro = 0;
erro = 0;
for (int m = 0; m < N; m++)
for (int j = 0; j < J; j++){
tmpErro = fabs(psi[m][j] - psiOld[m][j])/psi[m][j];
if(tmpErro > erro) erro = tmpErro;
}while(erro > this->inputData.getE());
this->tempoVar = omp_get_wtime() - start;
this->psi = DoubVet2D(this->inputData.getQuadOrder(), J + 1);
/* Compute the scalar flux */
#pragma omp parallel for
for (int j = 0; j <= J; j++){
    this->fi[j] = 0;
    for (int m = 0; m < N; m++){
        this->fi[j] += psi[m][j] * W[m];
        this->psi[m][j] = psi[m][j];
    }
}
/* Compute the absorption rate */
int i = 0;
this->taxaAbsorRegiao = DoubVet1D(this->inputData.getNumReg());
this->taxaAbsorTotal = 0;
for (int r = 0; r < this->inputData.getNumReg(); r++)
{
    this->taxaAbsorRegiao[r] = 0;
    #pragma omp parallel for
    for (int j = i; j < i + this->inputData.getPartReg()[r]; j++)
        this->taxaAbsorRegiao[r] += 0.5 * (this->fi[j] + this->fi[j + 1]);
    this->taxaAbsorRegiao[r] *= (St[i] - Ss[i]) * h[i];
    this->taxaAbsorTotal += this->taxaAbsorRegiao[r];
    i += this->inputData.getPartReg()[r];
}
This basically does the same as the function above, but instead of returning the struct data type, it simply stores all the data in the object itself.
However, when comparing the time spent in each method, the method implemented in the C++ class takes a LOT more time to execute than the original C function. The original C function saves a fair bit of time compared to a non-parallelized variation of the same method (that is, one without OpenMP pragmas), whereas this one is much worse than even the non-parallelized variation, taking up to a minute to finish a calculation that would otherwise be done in a fraction of a second. At first, I thought that the encapsulation could be slowing things down, but simply making everything public and accessing the attributes directly instead of through getters did not help. Would anyone have any insight as to why this could be happening?

Low Accuracy of DNN

I've recently been implementing a neural network based on a book, and I've written the whole algorithm for backprop and SGD almost the same way as the author of the book. The problem is that while he gets an accuracy of around 90% after one epoch, I get 30% after 5 epochs, even though I use the same hyperparameters. Do you have any idea what might be the cause?
Here's my repository.
Here is the part with the algorithm for backprop and SGD, implemented in Network.cpp:
void Network::Train(MatrixD_Array& TrainingData, MatrixD_Array& TrainingLabels, int BatchSize,int epochs, double LearningRate)
assert(TrainingData.size() == TrainingLabels.size() && CostFunc != nullptr && CostFuncDer != nullptr && LearningRate > 0);
std::vector<long unsigned int > indexes;
for (int i = 0; i < TrainingData.size(); i++) indexes.push_back(i);
std::random_device rd;
std::mt19937 g(rd());
std::vector<Matrix<double>> NablaWeights;
std::vector<Matrix<double>> NablaBiases;
for (int i = 0; i < Layers.size(); i++)
NablaWeights[i] = Matrix<double>(Layers[i].GetInDim(), Layers[i].GetOutDim());
NablaBiases[i] = Matrix<double>(1, Layers[i].GetOutDim());
//---- Epoch iterating
for (int i = 0; i < epochs; i++)
cout << "Epoch number: " << i << endl;
shuffle(indexes.begin(), indexes.end(), g);
// Batch iterating
for (int batch = 0; batch < TrainingData.size(); batch = batch + BatchSize)
for (int i = 0; i < Layers.size(); i++)
int i = 0;
while( i < BatchSize && (i+batch)< TrainingData.size())
std::vector<Matrix<double>> ActivationOutput;
std::vector<Matrix<double>> Z_Output;
ActivationOutput.resize(Layers.size() + 1);
ActivationOutput[0] = TrainingData[indexes[i + batch]];
int index = 0;
// Pushing values through
for (auto layer : Layers)
Z_Output[index] = layer.Mul(ActivationOutput[index]);
ActivationOutput[index + 1] = layer.ApplyActivation(Z_Output[index]);
// ---- Calculating Nabla that will be later devided by batch size element wise
auto DeltaNabla = BackPropagation(ActivationOutput, Z_Output, TrainingLabels[indexes[i + batch]]);
for (int i = 0; i < Layers.size(); i++)
NablaWeights[i] = NablaWeights[i] + DeltaNabla.first[i];
NablaBiases[i] = NablaBiases[i] + DeltaNabla.second[i];
for (int g = 0; g < Layers.size(); g++)
Layers[g].Weights = Layers[g].Weights - NablaWeights[g] * LearningRate;
Layers[g].Biases = Layers[g].Biases - NablaBiases[g] * LearningRate;
// std::transform(NablaWeights.begin(), NablaWeights.end(), NablaWeights.begin(),[BatchSize, LearningRate](Matrix<double>& Weight) {return Weight * (LearningRate / BatchSize);});
//std::transform(NablaBiases.begin(), NablaBiases.end(), NablaBiases.begin(), [BatchSize, LearningRate](Matrix<double>& Bias) {return Bias * (LearningRate / BatchSize); });
std::pair<MatrixD_Array, MatrixD_Array> Network::BackPropagation( MatrixD_Array& ActivationOutput, MatrixD_Array& Z_Output,Matrix<double>& label)
MatrixD_Array NablaWeight;
MatrixD_Array NablaBias;
auto zs = Layers[Layers.size() - 1].ActivationPrime(Z_Output[Z_Output.size() - 1]);
Matrix<double> Delta_L = Hadamard(CostFuncDer(ActivationOutput[ActivationOutput.size() - 1],label), zs);
NablaWeight[Layers.size() - 1] = Delta_L * ActivationOutput[ActivationOutput.size() - 2].Transpose();
NablaBias[Layers.size() - 1] = Delta_L;
for (int j = 2; j <= Layers.size() ; j++)
auto sp = Layers[Layers.size() - j].ActivationPrime(Z_Output[Layers.size() -j]);
Delta_L = Hadamard(Layers[Layers.size() - j+1 ].Weights.Transpose() * Delta_L, sp);
NablaWeight[Layers.size() - j] = Delta_L * ActivationOutput[ActivationOutput.size() -j-1].Transpose();
NablaBias[Layers.size() - j] = Delta_L;
return make_pair(NablaWeight, NablaBias);
It turned out that the MNIST loader didn't work correctly.

has invalid type for ‘reduction' openmp for c++

I have used OpenMP to parallelize my C++ code as below:
int shell_num = 50, grparallel[shell_num],grbot[shell_num];
double p_x,p_y,grp[shell_num];
for (int f = 0; f < shell_num; f++)
{
    grp[f] = 0;
    grparallel[f] = 0;
    grbot[f] = 0;
}
//some code...
#pragma omp parallel for reduction(+ : grp,grparallel,grbot)
for(int i = 0; i < N; i++){ //some code
    for(int j = 0; j < N; j++){
        if (j==i) continue;
        double delta_x = x[i]-x[j],
               delta_y = y[i]-y[j],
               e_dot_e = e_x[i] * e_x[j] + e_y[i] * e_y[j],
               e_cross_e = e_x[i] * e_y[j] - e_y[i] * e_x[j];
        if (j > i)
        {
            double fasele = sqrt(dist(x[i],y[i],x[j],y[j],L));
            for (int h = 0; h < shell_num; h++) // determine which shell the periodic distance between i and j falls in
            {
                if( L * h / 100 < fasele && fasele < L * (h + 1) / 100 )
                {
                    grp[h]+= e_dot_e;
                    double pdotr = abs(periodic(delta_x,L) * p_x + periodic(delta_y,L) * p_y)/fasele;
                    if (pdotr > 0.9659)
                    {
                        grparallel[h]+= 1;
                    }
                    else if(pdotr < 0.2588)
                    {
                        grbot[h]+= 1;
                    }
                }
            }
        }
    }
}
When I run the code in terminal, there is an error:
‘grp’ has invalid type for ‘reduction’
The same error occurs for grparallel and grbot.
How can I remove the error?
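A possible workaround, sketched under assumptions: GCC tends to report this error when the reduction clause names an array and the OpenMP version in use does not accept arrays there (array sections in C/C++ reduction clauses were only added in OpenMP 4.5). A pattern that does not depend on the OpenMP version is to give each thread a private copy of the array and merge the copies in a critical section. The function bin_counts and its binning rule below are hypothetical stand-ins for the shell loop in the question, not the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the shell histogram in the question: counts how
// many values fall into each of shell_num equal-width bins on [0, 1).
std::vector<int> bin_counts(const std::vector<double>& values, int shell_num) {
    std::vector<int> total(shell_num, 0);
    #pragma omp parallel
    {
        // Each thread accumulates into its own private copy...
        std::vector<int> local(shell_num, 0);
        #pragma omp for nowait
        for (int i = 0; i < static_cast<int>(values.size()); i++) {
            if (values[i] >= 0.0 && values[i] < 1.0) {
                int h = static_cast<int>(values[i] * shell_num);
                if (h < shell_num)
                    local[h] += 1;
            }
        }
        // ...and the private copies are merged one thread at a time.
        #pragma omp critical
        for (int h = 0; h < shell_num; h++)
            total[h] += local[h];
    }
    return total;
}
```

The same shape should carry over to grp, grparallel and grbot: one local array per thread for each, filled inside the parallel loop and added to the shared array at the end. If the compiler is built without OpenMP, the pragmas are ignored and the function simply runs serially.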

Hourglass sum in 2D array

We are given a 6x6 2D array, and we have to find the largest sum of an hourglass in it.
For example, if we create an hourglass using the number 1 within an array full of zeros, it may look like this:
The sum of an hourglass is the sum of all the numbers within it. The sums for the hourglasses above are 7, 4, and 2, respectively.
I wrote the code below for it. It is basically a competitive programming question, and as I am new to the field, I have written the code with very bad complexity, perhaps so bad that the program could not produce the desired output within the stipulated time. Below is my code:
int main(){
vector< vector<int> > arr(6,vector<int>(6));
for(int arr_i = 0;arr_i < 6;arr_i++)
for(int arr_j = 0;arr_j < 6;arr_j++)
cin >> arr[arr_i][arr_j];
} //numbers input
int temp; //temporary sum storing variable
int sum=INT_MIN; //largest sum storing variable
for(int i=0;i+2<6;i++) //check if at least3 exist at bottom
int c=0; //starting point of traversing column wise for row
while(c+2<6) //three columns exist ahead from index
int f=0; //test case variable
{ //if array does not meet requirements,no need of more execution
for(int j=c;j<=j+2;j++)
{ //1st and 3rd row middle element is 0 and 2nd row is non 0.
//condition for hourglass stucture
if((j-c)%2==0 && arr[i+1][j]==0||((j-c)%2==1 && arr[i+1][j]!=0)
//storing 3 dimensional subarray sum column wise
temp+=arr[i][j]+arr[i+1][j]+arr[i+2][j]; //sum storage
f=1; //end traversing further on failure
f=1;//exit condition
}//whiel loop of test variable
temp=0; //reset for next subarray execution
c++; /*begin traversal from one index greater column wise till
}// while loop of c
return 0;
This is my implementation, which failed to finish within the time limit. Please suggest a better solution with respect to time complexity, and feel free to point out any mistakes in my understanding of the problem. The question is from HackerRank.
Here is the link if you need it anyways:
The solution for your problem is:
#include <cstdio>
#include <iostream>
#include <climits>
int main() {
    int m[6][6];
    // Read 2D Matrix-Array
    for (int i = 0; i < 6; ++i) {
        for (int j = 0; j < 6; ++j) {
            std::cin >> m[i][j];
        }
    }
    // Compute the sum of hourglasses
    long temp_sum = 0, MaxSum = LONG_MIN;
    for (int i = 0; i < 6; ++i) {
        for (int j = 0; j < 6; ++j) {
            if (j + 2 < 6 && i + 2 < 6) {
                temp_sum = m[i][j] + m[i][j + 1] + m[i][j + 2] + m[i + 1][j + 1] + m[i + 2][j] + m[i + 2][j + 1] + m[i + 2][j + 2];
                if (temp_sum >= MaxSum) {
                    MaxSum = temp_sum;
                }
            }
        }
    }
    fprintf(stderr, "Max Sum: %ld\n", MaxSum);
    return 0;
}
The algorithm is simple: it sums every hourglass, starting from the upper-left corner. The last two columns and the last two rows are not used as starting positions, because no hourglass can begin there.
The above code is almost correct, but it does not work for negative array elements. We should not initialize the max sum to 0, because an array of negative numbers might never reach a sum >= 0. In this case, initializing the max sum to INT_MIN is a better option.
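The two points above (visit every valid top-left corner, and start the maximum below any reachable sum) can be sketched as a small function. This is an illustrative version, not the code from any answer here; the name max_hourglass_sum is made up, and it assumes a rectangular grid.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest hourglass sum in a rectangular grid.
// Returns INT_MIN when the grid is too small to hold any hourglass.
int max_hourglass_sum(const std::vector<std::vector<int>>& m) {
    int best = INT_MIN;  // below any reachable sum, so negative grids work
    for (std::size_t i = 0; i + 2 < m.size(); ++i) {
        for (std::size_t j = 0; j + 2 < m[i].size(); ++j) {
            // m[i][j] is the top-left cell of the current hourglass.
            int sum = m[i][j] + m[i][j + 1] + m[i][j + 2]
                    + m[i + 1][j + 1]
                    + m[i + 2][j] + m[i + 2][j + 1] + m[i + 2][j + 2];
            if (sum > best)
                best = sum;
        }
    }
    return best;
}
```

For a 6x6 grid this visits 16 hourglasses with 7 additions each, so running time is not a concern; the loop bounds, not extra checks inside the loop, keep the indices in range.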
I have solved it in Python 3 and passed all test cases on HackerRank.
Idea, in just 3 simple steps:
Extract all 16 3x3 sub-matrices of the 6x6 matrix.
Get the sum of each sub-matrix.
Find the max of all sub-matrix sums.
I have initialized max to -1000 to handle negative values; you can also initialize it with the minimum integer value.
# Complete the hourglassSum function below.
def hourglassSum(arr):
    max = -1000
    s = []
    sub_array = []
    for m in range(4):  # Move vertically down the rows (012, 123, 234, 345), taking 3 values horizontally
        for col in range(4):
            for row in range(3):
                sub_array.append(arr[row + m][col:col + 3])
            s = sub_array  # Extracting all 16 3X3 matrices
            hour_sum = sum_list(s[0]) + s[1][1] + sum_list(s[2])  # Mask array for hour_glass index [[1,1,1],[0,1,0],[1,1,1]]
            if (max < hour_sum):
                max = hour_sum
            sub_array = []
    return max
def sum_list(list1):
    total = 0
    for ele in range(0, len(list1)):
        total = total + list1[ele]
    return total
Extra: Try replacing this in your Spyder for lesser lines of code
Instead of
Existing: without numpy
hour_sum = sum_list(s[0]) + s[1][1] + sum_list(s[2])  # Mask array for hour_glass index [[1,1,1],[0,1,0],[1,1,1]]
if (max<hour_sum):
max = hour_sum
With numpy:
import numpy as np
import numpy.ma as ma
hour_glass = ma.array(sub_array, mask=mask)
sum =
Swift 4 version:
func hourglassSum(arr matrix: [[Int]]) -> Int {
    let h = matrix.count
    if h < 3 {
        return 0
    }
    let w = matrix[0].count
    if w < 3 {
        return 0
    }
    var maxSum: Int?
    for i in 0 ..< h - 2 {
        for j in 0 ..< w - 2 {
            // Considering matrix[i][j] as top left cell of hour glass.
            let sum = matrix[i][j] + matrix[i][j+1] + matrix[i][j+2]
                + matrix[i+1][j+1]
                + matrix[i+2][j] + matrix[i+2][j+1] + matrix[i+2][j+2]
            // If the previous sum is less than the current sum, then update maxSum
            if let maxValue = maxSum {
                maxSum = max(maxValue, sum)
            } else {
                maxSum = sum
            }
        }
    }
    return maxSum ?? 0
}
function hourglassSum(arr) {
    // Write your code here
    let count = -63;
    for(let i = 0; i <= 3; i++){
        for(let j = 0; j <= 3; j++){
            let sum = arr[i][j] + arr[i][j+1] + arr[i][j+2] + arr[i+1][j+1]
                + arr[i+2][j] + arr[i+2][j+1] + arr[i+2][j+2]
            if(sum > count){
                count = sum
            }
        }
    }
    return count;
}
Here is a Python implementation of this algorithm.
arr = []
for _ in xrange(6):
    arr.append(map(int, raw_input().rstrip().split()))
maxSum = -99999999
for row in range(len(arr)):
    tempSum = 0
    for col in range(len(arr[row])):
        if col+2 >= len(arr[row]) or row+2 >= len(arr[col]):
            continue
        tempSum = arr[row][col] + arr[row][col+1] + arr[row][col+2] + arr[row+1][col+1] + arr[row+2][col] + arr[row+2][col+1] + arr[row+2][col+2]
        if maxSum < tempSum:
            maxSum = tempSum
Basic solution for Java:
static int hourglassSum(int[][] arr) {
    // Start below any possible sum so that all-negative arrays are handled.
    int sum = Integer.MIN_VALUE;
    for(int i = 2; i<6; i++){
        for(int j = 2; j<6; j++){
            int up = arr[i-2][j-2] + arr[i-2][j-1] + arr[i-2][j];
            int mid = arr[i-1][j-1];
            int down = arr[i][j-2] + arr[i][j-1] + arr[i][j];
            if(up+mid+down > sum){
                sum = up+mid+down;
            }
        }
    }
    return sum;
}
Python clean and fast solution
def hourglassSum(arr):
    arr_sum = -5000
    tmp_sum = 0
    for i in range(0, 6-2):
        for j in range(0, 6-2):
            tmp_sum = arr[i][j] + arr[i][j+1] + arr[i][j+2] + \
                      arr[i+1][j+1] + \
                      arr[i+2][j] + arr[i+2][j+1] + arr[i+2][j+2]
            if arr_sum < tmp_sum:
                arr_sum = tmp_sum
    return arr_sum
Just avoided four for loop iterations
int main()
int arr[6][6],max=-1,sum;
for(int arr_i = 0; arr_i < 6; arr_i++){
for(int arr_j = 0; arr_j < 6; arr_j++){
for(int arr_i = 0; arr_i <4; arr_i++)
for(int arr_j = 0; arr_j < 4; arr_j++){
return 0;
int main(){
vector< vector<int> > arr(6,vector<int>(6));
for(int arr_i = 0;arr_i < 6;arr_i++){
for(int arr_j = 0;arr_j < 6;arr_j++){
cin >> arr[arr_i][arr_j];
int sum=-100, temp;
for(int arr_i = 0;arr_i < 4;arr_i++){
for(int arr_j = 0;arr_j < 4;arr_j++){
cout << sum;
return 0;
def hourglassSum(arr)
maxHourGlass = -82
counter = 0
for i in 1..4
for j in 1..4
acc = arr[i][j]
counter= counter +1
for x in -1..1
acc = acc + arr[i-1][j+x] + arr[i+1][j+x]
maxHourGlass = acc if acc > maxHourGlass
This is written in C++14 and passes all nine test cases. I think someone could improve it to use more C++14 features.
int hourglassSum(vector<vector<int>> arr)
{
    if(arr.size() < 3 || arr[0].size() < 3 )
        return -1;
    int rowSize = arr[0].size();
    int sum = -9 * 7; // smallest possible sum: 7 cells, each at least -9
    for( int i = 1; i < arr.size()-1; i++ )
    {
        int tmp_sum = 0;
        for( int j = 1; j < rowSize-1; j++ )
        {
            tmp_sum = (arr[i - 1][j - 1] + arr[i - 1][j] + arr[i - 1][j + 1] );
            tmp_sum += (arr[i ][j ]);
            tmp_sum += (arr[i + 1][j - 1] + arr[i + 1][j] + arr[i + 1][j + 1]);
            sum = max(tmp_sum, sum);
        }
    }
    return sum;
}
class Solution {
static void Main(string[] args) {
int[][] arr = new int[6][];
for (int i = 0; i < 6; i++) {
arr[i] = Array.ConvertAll(Console.ReadLine().Split(' '), arrTemp => Convert.ToInt32(arrTemp));
int[] sum=new int[16];
int j;
int count=0;
for(int i=0; i<4; i++)
int max=sum.Max();
The smallest possible hourglass sum is -63, since no element can be less than -9 and an hourglass has 7 cells: -9 * 7 = -63. That makes -63 a safe initial value for the maximum:
int max_hourglass_sum = -63;
for (int i = 0; i < arr.Length-2; i++) { //row
    for (int j = 0; j < arr.Length-2; j++) { //column
        int current_hourglass_sum = arr[i][j] + arr[i][j+1] + arr[i][j+2] //1st row
            + arr[i+1][j+1] //2nd row
            + arr[i+2][j] + arr[i+2][j+1] + arr[i+2][j+2]; //3rd row
        max_hourglass_sum = Math.Max(max_hourglass_sum, current_hourglass_sum);
    }
}
static int hourglassSum(int[][] arr) {
    int result = int.MinValue;
    int rowLength = arr.GetLength(0);
    int colLength = arr.Length;
    for (int i = 0; i < rowLength - 2; i++)
    {
        for(int j=0; j< colLength - 2; j++)
        {
            int sum = 0;
            sum = arr[i][j] + arr[i][j+1] + arr[i][j+2]+ arr[i+1][j+1]
                + arr[i+2][j] + arr[i+2][j+1] + arr[i+2][j+2];
            result = Math.Max(result,sum);
        }
    }
    return result;
}
function hourglassSum(arr) {
    const hourGlass = [];
    for (let i = 0; i < 4; i++) {
        for (let x = 0; x < 4; x++) {
            let hourGlassSumValue = arr[i][x] + arr[i][x + 1] + arr[i][x + 2] + arr[i + 1][x + 1] + arr[i + 2][x] + arr[i + 2][x + 1] + arr[i + 2][x + 2];
            hourGlass.push(hourGlassSumValue);
        }
    }
    return Math.max(...hourGlass);
}

Image processing error

I must implement, in C++ using diblook, the Seeded Region Growing algorithm due to Adams and Bischof, which can be found here
It is the pseudocode in Fig. 2.
After I choose the seed points using the mouse, it throws this message: Unhandled exception at 0x00416ca0 in diblook.exe: 0xC0000005: Access violation reading location 0x3d2f6e68.
This is the code of the function:
void CDibView::OnLButtonDblClk(UINT nFlags, CPoint point)
int** labels = new int* [dwHeight];
for(int k = 0;k < dwHeight; k++)
labels[k] = new int[dwWidth];
int noOfRegions = 2;
double meanRegion[2];
double noOfPointsInRegion[2];
for(int i = 0; i < dwHeight ; i++)
for(int j = 0; j < dwWidth ; j++)
labels[i][j] = -1;
if(noOfPoints < 6)
CPoint p = GetScrollPosition() + point;
pos[noOfPoints].x = p.x;
pos[noOfPoints].y = p.y;
int regionLabel = 0;
if(noOfPoints <= noOfPoints / 2)
labels[p.x][p.y] = regionLabel;
labels[p.x][p.y] = regionLabel + 1;
// Calculate the mean of each region
for(int i = 0; i < noOfRegions; i++)
for(int j = 0 ; j < noOfPoints; j++)
if(labels[pos[j].x][pos[j].y] == i)
meanRegion[i] += lpSrc[pos[j].x * w + pos[j].y];
meanRegion[i] /= 3;
noOfPointsInRegion[i] = 3;
for(int seedPoint = 0; seedPoint < noOfPoints; seedPoint++)
// define list
node *start, *temp;
start = (node *) malloc (sizeof(node));
temp = start;
temp -> next = NULL;
for(int i = -1; i <= 1; i++)
for(int j = -1; j<= 1; j++)
if(i == 0 && j == 0) continue;
int gamma = lpSrc[(pos[seedPoint].x + i) * w + pos[seedPoint].y + j] - lpSrc[pos[seedPoint].x * w + pos[seedPoint].y];
push(start, pos[seedPoint].x + i, pos[seedPoint].y + j, gamma);
if(start != NULL)
node *y = start;
int sameNeighbour = 1;
int neighValue = -1;
for(int k = -1; k <= 1; k++)
for(int l = -1; l <= 1;l++)
if(k ==0 && l==0) continue;
if(labels[y -> x + k][y -> y + l] != -1)
neighValue = labels[y -> x + k][y -> y + l];
for(int k = -1; k <= 1; k++)
for(int l = -1; l <= 1;l++)
if(k == 0 && l==0) continue;
if(labels[y -> x + k][y -> y = 1] != -1 && labels[y -> x + k][y -> y + l] != neighValue)
sameNeighbour = 0;
if(sameNeighbour == 1)
labels[y -> x][y -> y] = neighValue;
meanRegion[neighValue] = meanRegion[neighValue] * noOfPointsInRegion[neighValue] / noOfPointsInRegion[neighValue] + 1;
for(int k = -1; k <= 1; k++)
for(int l = -1; l <= 1;l++)
if(k == 0 && l == 0) continue;
if(labels[y -> x + k][y -> y + l] == -1 && find(start, y->x + k, y->y + l) == 0)
int gammak = meanRegion[neighValue] - lpSrc[(y->x +k) * w + (y->y + l)];
push(start, y->x + k, y->y + l, gammak);
labels[y->x][y->y] = -1;
int noOfRegionOne = 0;
int noOfRegionTwo = 0;
int noOfBoundary = 0;
for(int i = 0; i< dwHeight; i++)
for(int j = 0;j<dwWidth; j++)
if(labels[i][j] == -1)
else if(labels[i][j] == 0)
else if(labels[i][j] == 1)
CString info;
info.Format("Boundary %d, One %d, Two %d", noOfBoundary, noOfRegionOne, noOfRegionTwo);
noOfPoints = 0;
CScrollView::OnLButtonDblClk(nFlags, point);
After I choose to break execution, this is what is shown:
Can anybody tell what is wrong and why it doesn't work?
Your screenshot shows that your node (y is a terrible name, incidentally) has garbage values in it. Offhand, I suspect that 'sort' is overwriting your node values, resulting in garbage. I would create a static copy of your current node to prevent it from changing during processing. That is, instead of
node *y = start;
I would write
node y = *start;
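To illustrate the difference between holding a pointer to the head node and holding a copy of it, here is a minimal sketch. The Node layout and the overwrite_head function are assumptions made for illustration, since the question shows neither the node definition nor the list helpers.

```cpp
// Hypothetical node type; the question does not show its definition.
struct Node {
    int x;
    int y;
    int gamma;
    Node* next;
};

// Stand-in for any list operation that rewrites the head node in place.
void overwrite_head(Node* head) {
    head->x = -1;
    head->y = -1;
}

// Reads the head through a pointer: later writes to the list show through.
int x_via_pointer(Node* head) {
    Node* current = head;
    overwrite_head(head);
    return current->x;
}

// Takes a copy of the head first: later writes leave the copy untouched.
int x_via_copy(Node* head) {
    Node current = *head;
    overwrite_head(head);
    return current.x;
}
```

With the copy, members are accessed as current.x rather than current->x, so every y -> x and y -> y in the question's loop would need the same change.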