I'm using C++.
I have 5 vectors.
const int currDim = 5; // const, so it can be used as an array size in standard C++
int a[currDim] = {1,6,11,16,21};
int b[currDim] = {2,7,12,17,22};
int c[currDim] = {3,8,13,18,23};
int d[currDim] = {4,9,14,19,24};
int e[currDim] = {5,10,15,20,25};
I want to merge them into one int matrix[currDim][5].
The matrix should be:
{1,2,3,4,5}
{6,7,8,9,10}
{11,12,13,14,15}
{16,17,18,19,20}
{21,22,23,24,25}
What I did:
int j = 0, k = 0;
for(int i = 0; i<currDim ; i++)
{
matrix[i][k++] = a[j];
matrix[i][k++] = b[j];
matrix[i][k++] = c[j];
matrix[i][k++] = d[j];
matrix[i][k++] = e[j];
k = 0;
j++;
}
The code works, but I'm looking for a better way to increase efficiency. Any suggestions?
First of all, I apologize if my answer doesn't help: I'm not quite sure what you mean by a "better way". It could mean better coding style, increased efficiency, or generality.
Since this is a simple task that takes only 25 assignments, efficiency is not a big issue here.
However, for elegance's sake, you could store a, b, c, d, and e in an array of pointers and replace the five repeated assignment statements with a simple loop, as the following code shows:
typedef int *pInt;
//Each element of arr is a pointer to int
pInt arr[5] = {a, b, c, d, e};
int matrix[5][5] = {0};
for(int i = 0; i < 5; ++i){
for(int j = 0 ; j < 5; ++j){
matrix[i][j] = arr[j][i];
}
}
Let me know if my approach is helpful to you.
P.S. I usually put the opening brace at the end of the line rather than on a new line, and I tend to use ++i instead of i++. That's more of a personal habit, and you don't need to follow my style exactly.
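If generality is what you are after, the same column-to-row copy can be wrapped in a small function template. This is only a sketch: the name mergeColumns and the use of std::array are my own choices for illustration and are not part of the original code.

```cpp
#include <array>
#include <cstddef>

// Builds a Rows x Cols matrix whose j-th column is the array columns[j].
// Assumption: every pointer in `columns` points to at least Rows ints.
template <std::size_t Rows, std::size_t Cols>
std::array<std::array<int, Cols>, Rows>
mergeColumns(const std::array<const int*, Cols>& columns) {
    std::array<std::array<int, Cols>, Rows> matrix{};
    for (std::size_t i = 0; i < Rows; ++i) {
        for (std::size_t j = 0; j < Cols; ++j) {
            matrix[i][j] = columns[j][i];  // element i of column j
        }
    }
    return matrix;
}
```

With the five arrays from the question you would collect them in a std::array<const int*, 5> and call mergeColumns<5, 5> on it. The work is still 25 assignments, so this is about readability and reuse rather than speed.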
I have a struct:
struct xyz{
int x,y,z;
};
and I initialize a vector of struct xyz:
for (int i = 0; i < N; i++)
{
for (int j = 0; j < N; j++)
{
for (int k = 0; k < N; k++)
{
v.x=i;
v.y=j;
v.z=k;
vect.push_back(v);
}
}
}
Then I want to convert that vector to an array, because an array is about 2 times faster to manipulate than a vector, so I do
xyz arr[vect.size()];
std::copy(vect.begin(), vect.end(), arr);
When I run this program I get a segmentation fault, which I think is because vect.size() is too large.
So I am wondering whether there is any way to convert such a large vector to an array without that problem.
I appreciate any help.
My overly pedantic comment got too big, so instead I'll try to make this a somewhat roundabout answer. The short answer is probably just to stick with vector but make sure to use reserve; oh, and benchmark.
You didn't say what compiler or C++ version you're using, so I'll just go with my current gcc.godbolt.org default of gcc 4.9.2, C++14. I'm also assuming that you really want this as a 1-dimensional array, rather than the more natural (for your example) 3-dimensional one.
If you know N at compile time, you could do something like this (assuming I got the array offset calculation correct):
#include <array>
...
std::array<xyz, N*N*N> xyzs;
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
for (int k = 0; k < N; k++) {
xyzs[i*N*N+j*N+k] = {i, j, k};
}
}
}
The biggest downsides, IMO:
error-prone offset calculation
depending on N, where the code is run, etc, this can blow the stack
On the compilers I tried this on, the optimizers seem to understand that we're moving through the array in contiguous order, and the generated machine code is more sensible, but it could also be written like so, if you prefer:
#include <array>
...
std::array<xyz, N*N*N> xyzs;
auto p = xyzs.data();
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
for (int k = 0; k < N; ++k) {
(*p++) = {i, j, k};
}
}
}
Of course, if you actually know N at compile time, and it won't blow the stack, you might consider a 3-dimensional array xyz xyzs[N][N][N]; since this might be more natural for the way these things are ultimately being used.
As pointed out in comments, variable length arrays aren't legal C++, but they are legal in C99; if you don't know N at compile time you should be allocating off the heap.
A vector and an array will wind up being identical in terms of memory layout; they differ in that the vector allocates memory from the heap, while the array (as you are writing it) would be on the stack. The only recommendation I'd make is to call reserve before entering your loop:
vect.reserve(N*N*N);
This means you'll only be doing a single memory allocation up front, rather than going through the grow-and-copy mechanism that you'll get from a default-constructed vector.
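Put together, the reserve-then-push_back version might look like the sketch below. The function name makeGrid is made up for illustration, and n is assumed to be non-negative and small enough that n*n*n elements fit in memory.

```cpp
#include <cstddef>
#include <vector>

struct xyz {
    int x, y, z;
};

// Fills a vector with all (i, j, k) triples for 0 <= i, j, k < n.
// Assumption: n >= 0.
std::vector<xyz> makeGrid(int n) {
    std::vector<xyz> vect;
    const std::size_t total = static_cast<std::size_t>(n) * n * n;
    vect.reserve(total);  // one allocation instead of repeated grow-and-copy
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            for (int k = 0; k < n; ++k) {
                vect.push_back({i, j, k});
            }
        }
    }
    return vect;
}
```

The elements end up in the same order as in the flat-array versions above, so the triple (i, j, k) sits at index i*n*n + j*n + k.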
Assuming xyz is as simple as you declare here, you could also do something like the second example above:
std::vector<xyz> xyzs(N*N*N); // parentheses select the size constructor: N*N*N value-initialized elements
auto p = xyzs.data();
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
for (int k = 0; k < N; ++k) {
(*p++) = {i, j, k};
}
}
}
You lose the safety of push_back, and it is less efficient if xyz's default constructor needs to do anything (for example, if xyz's members were changed to have default values).
Having said all that, you really should benchmark. But then, you should probably be benchmarking the code that ultimately uses this array, rather than the code to construct it; I'd have other concerns if construction was dominating usage.
Dear reader,
Well, I think I've confused myself a bit.
I'm implementing knapsack, and I realized I have implemented a brute-force algorithm only once or twice ever, so I decided to write another one.
And here's where I got stuck.
Let W be the maximum weight and w(min) the weight of the lightest element, so we can put at most k = W/w(min) elements in the knapsack. I'm explaining this so that you, reader, know why I need to ask my question.
Now, imagine that we have 3 types of things we can put in the knapsack, that the knapsack can hold 15 units of mass, and that each type's weight equals its number. We can put in 15 things of the 1st type, or 7 things of the 2nd type and 1 thing of the 1st type. But combinations like 22222221 (seven 2s and a 1) and 12222222 mean the same to us, and counting both is a waste of whatever resources we pay for the decision. (That's a joke, because brute force is a waste when a cheaper algorithm exists, but I'm very interested.)
As far as I can tell, the kind of selection we need in order to go through all possible combinations is called "combinations with repetitions". Their number is C'(n,k) = (n+k-1)!/((n-1)!k!).
(While typing this message I spotted a hole in my theory: we will probably need to add an empty, zero-weight, zero-price item to represent free space, which probably just increases n by 1.)
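As an aside, C'(n,k) can be computed without evaluating the factorials, which overflow 64-bit integers as soon as n+k-1 exceeds 20. The sketch below uses the identity C'(n,k) = C(n+k-1, k) and builds the binomial coefficient one factor at a time; the name multichoose is just a label chosen for this example, and the intermediate product can still overflow for very large arguments.

```cpp
// Number of combinations with repetitions: k items chosen from n types.
// Uses C'(n, k) = C(n + k - 1, k) = product over i = 1..k of (n - 1 + i) / i.
unsigned long long multichoose(unsigned int n, unsigned int k) {
    if (n == 0) {
        return k == 0 ? 1ULL : 0ULL;  // nothing to choose from
    }
    unsigned long long result = 1;
    for (unsigned int i = 1; i <= k; ++i) {
        // After this step result == C(n - 1 + i, i), so the division is exact.
        result = result * (n - 1 + i) / i;
    }
    return result;
}
```

For example, multichoose(3, 2) is 6: with types {0, 1, 2} the selections are 00, 01, 02, 11, 12 and 22.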
So, here's the problem.
https://rosettacode.org/wiki/Combinations_with_repetitions
That problem is well described at the link above, but I don't really want to use a stack that way; I want to generate the combinations in a single loop running from i=0 to i<C'(n,k).
So, if it can be done, how does it work?
We have
int prices[n]; //appear mystically
int weights[n]; // same as previous and I guess we place (0,0) in both of them.
int W, k; // W initialized by our lord and savior
k = W/min(weights);
int road[k], finalroad[k]; //all 0
int curP = 0, curW = 0, maxP = 0, maxW = 0;
for (int i = 0; i < rCombNumber(n, k); i ++) {
/* Please help me work out how to generate this mask: it consists of k indices, each from 0 to n (each index identifies an element). */
curW = 0;
for (int j = 0; j < k; j ++)
curW += weights[road[j]];
if (curW < W) {
curP = 0;
for (int l = 0; l < k; l ++)
curP += prices[road[l]];
if (curP > maxP) {
maxP = curP;
maxW = curW;
std::copy(road, road + k, finalroad); // arrays cannot be assigned with =; needs <algorithm>
}
}
}
The mask (road) is an array of indices, each between 0 and n. The masks have to be generated as the C'(n,k) combinations with repetitions (see the link above) of k elements from { 0, 1, 2, ... , n }, where order is unimportant.
That's it. Prove me wrong or help me. Many thanks in advance.
And yes, of course the algorithm will take a hell of a lot of time, but it looks like it should work, and I'm very interested in it.
UPDATE:
What am I missing?
http://pastexen.com/code.php?file=EMcn3F9ceC.txt
The answer was provided by Minoru here: https://gist.github.com/Minoru/745a7c19c7fa77702332cf4bd3f80f9e
It's enough to increment a single element (the last one in my code below), then propagate all the carries, remember the leftmost position that overflowed, compute the reset value as the maximum of the elements to the left of that position, and overwrite that position and everything to its right with it.
Here's my code:
#include <iostream>
using namespace std;
static long FactNaive(int n)
{
long r = 1;
for (int i = 2; i <= n; ++i)
r *= i;
return r;
}
static long long CrNK (long n, long k)
{
long long u, l;
u = FactNaive(n+k-1);
l = FactNaive(k)*FactNaive(n-1);
return u/l;
}
int main()
{
const int numberOFchoices=7,kountOfElementsInCombination=4; // const, so the array size below is a compile-time constant
int arrayOfSingleCombination[kountOfElementsInCombination] = {0,0,0,0};
int leftmostResetPos = kountOfElementsInCombination;
int resetValue=1;
for (long long iterationCounter = 0; iterationCounter<CrNK(numberOFchoices,kountOfElementsInCombination); iterationCounter++)
{
leftmostResetPos = kountOfElementsInCombination;
if (iterationCounter!=0)
{
arrayOfSingleCombination[kountOfElementsInCombination-1]++;
for (int anotherIterationCounter=kountOfElementsInCombination-1; anotherIterationCounter>0; anotherIterationCounter--)
{
if(arrayOfSingleCombination[anotherIterationCounter]==numberOFchoices)
{
leftmostResetPos = anotherIterationCounter;
arrayOfSingleCombination[anotherIterationCounter-1]++;
}
}
}
if (leftmostResetPos != kountOfElementsInCombination)
{
resetValue = 1;
for (int j = 0; j < leftmostResetPos; j++)
{
if (arrayOfSingleCombination[j] > resetValue)
{
resetValue = arrayOfSingleCombination[j];
}
}
for (int j = leftmostResetPos; j != kountOfElementsInCombination; j++)
{
arrayOfSingleCombination[j] = resetValue;
}
}
for (int j = 0; j<kountOfElementsInCombination; j++)
{
cout<<arrayOfSingleCombination[j]<<" ";
}
cout<<"\n";
}
return 0;
}
Thanks a lot, Minoru.
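For comparison, here is a more compact way to step from one combination to the next. It is a sketch of the usual "next non-decreasing sequence" technique, not Minoru's code, and the function name nextCombinationWithRepetition is invented for this example. It assumes the vector starts out non-decreasing (for instance all zeros) with values in [0, n).

```cpp
#include <cstddef>
#include <vector>

// Advances `combo` to the next combination with repetition in lexicographic
// order. Returns false when `combo` was already the last one.
bool nextCombinationWithRepetition(std::vector<int>& combo, int n) {
    std::size_t pos = combo.size();
    // Find the rightmost element that can still be incremented.
    while (pos > 0 && combo[pos - 1] == n - 1) {
        --pos;
    }
    if (pos == 0) {
        return false;  // every element is n - 1 (or combo is empty)
    }
    const int value = combo[pos - 1] + 1;
    // Setting the whole tail to the same value keeps the sequence non-decreasing.
    for (std::size_t j = pos - 1; j < combo.size(); ++j) {
        combo[j] = value;
    }
    return true;
}
```

Starting from a vector of k zeros and calling this in a do/while loop visits each combination exactly once, so no iteration counter or factorial is needed to know when to stop.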
I'm writing a simple ANN (artificial neural network) for function approximation. It crashes with the message "Heap corrupted". I found some advice on how to resolve it, but nothing helped.
I get the error at the first line of this function:
void LU(double** A, double** &L, double** &U, int s){
U = new double*[s];
L = new double*[s];
for (int i = 0; i < s; i++){
U[i] = new double[s];
L[i] = new double[s];
for (int j = 0; j < s; j++)
U[i][j] = A[i][j];
}
for (int i = 0, j = 0; i < s; i = ++j){
L[i][j] = 1;
for (int k = i + 1; k < s - 1; k++){
L[k][j] = U[k][j] / U[i][j];
double* vec_t = mul(U[i], L[k][j], s);
for (int z = 0; z < s; z++)
U[k][z] = U[k][z] - vec_t[z];
delete[] vec_t;
}
}
};
As I understood from the debugger's information, the two arrays (U and L) are passed to the function with some addresses in memory already set. That's quite strange, because I didn't initialize them. I call this function twice: the first time it works nicely (OK, at least it works), but on the second call it crashes. I have no idea how to resolve it.
Here is a link to the whole project: CLICK
I'm working in MS Visual Studio 2013 under Windows 7 x64.
UPDATE
Following some comments below, I should provide some additional information.
First of all, sorry for the quality of the code. I wrote it only for myself over 2 days.
Second, when I said "at second call", I meant this: the first call to LU happens when I need the determinant of S (I use LU decomposition for this), and it works without any crashes. The second call happens when I try to get the inverse of the same matrix S. When I call detLU at the [0, 0] element of the matrix (to get the cofactor), I get this crash.
Third, if I read the debugger output correctly, the arrays L and U are passed into the function on the second call with memory addresses already defined. I can't understand why, because before the LU call I just wrote "double** L; double** U;" without any initialization.
I can try to provide additional debug information or run some tests if somebody explains what exactly I have to do.
The point at which you get a heap corruption error or crash is typically just a symptom of an actual heap overflow/underflow or other memory error that happened earlier. This is why heap corruption can be difficult to track down.
You have a lot of code, and all the double-pointers are difficult to track, but I did notice one potential issue:
double** initInWeights(double f, int h, int w) {
double** W = new double*[h];
for (int i = 0; i < 10; i++) {
W[i] = new double[w];
The loop will overflow W[] if h is less than 10. Chances are that somewhere in your code you have a buffer overflow/underflow or are using memory after it is freed. The complexity and design of your code make it difficult to pinpoint at a glance.
Is there a reason you are using raw double-pointers instead of simply std::vector<std::vector<double>>? This would remove all your manual memory management code, making your code shorter and simpler, and, more importantly, it would remove the heap corruption issue.
Barring that, you should double-check that all manually allocated memory is the correct size and that access loops can never go out of bounds.
Update -- I think your problem may lie with a buffer overflow in the extract() function in matrix.cpp:
double** extract(double** mat, int s, int col, int row)
{
double** ext = new double*[s - 1];
for (int i = 0; i < s - 1; i++)
{
ext[i] = new double[s - 1];
}
int ext_c = 0, ext_r = 0;
for (int i = 0; i < s; i++)
{
if (i != row)
{
for (int j = 0; j < s; j++)
{ // Overflow on ext_c here
if (j != col) ext[ext_r][ext_c++] = mat[i][j];
}
ext_r++;
}
}
return ext;
};
You never reset ext_c, so it simply keeps increasing up to (s-1)*(s-1), which overflows the rows of ext[] because each row holds only s-1 elements. To fix this you simply need to change the inner loop definition so that ext_c starts from 0 for every row:
for (int j = 0, ext_c = 0; j < s; j++)
At least that one change lets me run your project without any heap corruption errors.
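To illustrate the std::vector suggestion, here is a sketch of what extract() could look like without manual allocation. The name extractMinor, the Matrix alias and the parameter order are my own; the matrix is assumed to be square and non-empty, as in your code.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns a copy of `mat` with row `row` and column `col` removed.
// Assumption: `mat` is square and non-empty.
Matrix extractMinor(const Matrix& mat, std::size_t row, std::size_t col) {
    Matrix ext;
    ext.reserve(mat.size() - 1);
    for (std::size_t i = 0; i < mat.size(); ++i) {
        if (i == row) {
            continue;  // skip the removed row
        }
        std::vector<double> extRow;  // a fresh row each time, so no stale column index
        extRow.reserve(mat.size() - 1);
        for (std::size_t j = 0; j < mat[i].size(); ++j) {
            if (j != col) {
                extRow.push_back(mat[i][j]);
            }
        }
        ext.push_back(extRow);
    }
    return ext;
}
```

Because each row is built with push_back, there is no column counter to forget to reset, and nothing has to be deleted by the caller.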
I would like to optimize this simple loop:
unsigned int i;
while(j-- != 0){ //j is an unsigned int with a start value of about N = 36.000.000
float sub = 0;
i=1;
unsigned int c = j+s[1];
while(c < N) {
sub += d[i][j]*x[c];//d[][] and x[] are arrays of float
i++;
c = j+s[i];// s[] is an array of unsigned int with 6 entries.
}
x[j] -= sub; // only one memory-write per j
}
The loop takes about one second on a 4000 MHz AMD Bulldozer. I thought about SIMD and OpenMP (which I normally use to get more speed), but this loop is recursive: each x[j] depends on entries of x computed in earlier iterations.
Any suggestions?
I think you may want to transpose the matrix d, that is, store it so that the indices are exchanged and i becomes the inner (fastest-varying) index:
sub += d[j][i]*x[c];
instead of
sub += d[i][j]*x[c];
This should result in better cache performance.
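One way to get that layout is to copy d once into a flat buffer in which all values belonging to the same j are adjacent. The following is only a sketch under assumptions: the type and function names are hypothetical, d is represented as a vector of equally long rows, and whether the copy pays off depends on how often the solve is run afterwards.

```cpp
#include <cstddef>
#include <vector>

// Flat, transposed copy of d: the value that used to be d[i][j]
// is stored at data[j * m + i].
struct TransposedDiagonals {
    std::size_t m;            // number of diagonals (range of i)
    std::size_t n;            // length of each diagonal (range of j)
    std::vector<float> data;  // m * n values

    float at(std::size_t i, std::size_t j) const { return data[j * m + i]; }
};

// Assumption: every row of d has the same length.
TransposedDiagonals transposeDiagonals(const std::vector<std::vector<float>>& d) {
    TransposedDiagonals t;
    t.m = d.size();
    t.n = d.empty() ? 0 : d[0].size();
    t.data.resize(t.m * t.n);
    for (std::size_t i = 0; i < t.m; ++i) {
        for (std::size_t j = 0; j < t.n; ++j) {
            t.data[j * t.m + i] = d[i][j];
        }
    }
    return t;
}
```

In the inner loop you would then read t.at(i, j) instead of d[i][j], so consecutive values of i touch consecutive floats.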
I agree with transposing for better caching (but see my comments on that at the end), and there's more to do, so let's see what we can do with the full function...
Original function, for reference (with some tidying for my sanity):
void MultiDiagonalSymmetricMatrix::CholeskyBackSolve(float *x, float *b){
//We want to solve L D Lt x = b where D is a diagonal matrix described by Diagonals[0] and L is a unit lower triangular matrix described by the rest of the diagonals.
//Let D Lt x = y. Then, first solve L y = b.
float *y = new float[n];
float **d = IncompleteCholeskyFactorization->Diagonals;
unsigned int *s = IncompleteCholeskyFactorization->StartRows;
unsigned int M = IncompleteCholeskyFactorization->m;
unsigned int N = IncompleteCholeskyFactorization->n;
unsigned int i, j;
for(j = 0; j != N; j++){
float sub = 0;
for(i = 1; i != M; i++){
int c = (int)j - (int)s[i];
if(c < 0) break;
if(c==j) {
sub += d[i][c]*b[c];
} else {
sub += d[i][c]*y[c];
}
}
y[j] = b[j] - sub;
}
//Now, solve x from D Lt x = y -> Lt x = D^-1 y
// Took this one out of the while, so it can be parallelized now, which speeds up, because division is expensive
#pragma omp parallel for
for(j = 0; j < N; j++){
x[j] = y[j]/d[0][j];
}
while(j-- != 0){
float sub = 0;
for(i = 1; i != M; i++){
if(j + s[i] >= N) break;
sub += d[i][j]*x[j + s[i]];
}
x[j] -= sub;
}
delete[] y;
}
Because of the comment about parallel divide giving a speed boost (despite being only O(N)), I'm assuming the function itself gets called a lot. So why allocate memory? Just mark x as __restrict__ and change y to x everywhere (__restrict__ is a GCC extension, taken from C99. You might want to use a define for it. Maybe the library already has one).
Similarly, though I guess you can't change the signature, you can make the function take only a single parameter and modify it. b is never used when x or y have been set. That would also mean you can get rid of the branch in the first loop which runs ~N*M times. Use memcpy at the start if you must have 2 parameters.
And why is d an array of pointers? Must it be? This seems too deep in the original code, so I won't touch it, but if there's any possibility of flattening the stored array, it will be a speed boost even if you can't transpose it (multiply, add, dereference is faster than dereference, add, dereference).
So, new code:
void MultiDiagonalSymmetricMatrix::CholeskyBackSolve(float *__restrict__ x){
// comments removed so that suggestions are more visible. Don't remove them in the real code!
// these definitions got long. Feel free to remove const; it does nothing for the optimiser
const float *const __restrict__ *const __restrict__ d = IncompleteCholeskyFactorization->Diagonals;
const unsigned int *const __restrict__ s = IncompleteCholeskyFactorization->StartRows;
const unsigned int M = IncompleteCholeskyFactorization->m;
const unsigned int N = IncompleteCholeskyFactorization->n;
unsigned int i;
unsigned int j;
for(j = 0; j < N; j++){ // don't use != as an optimisation; compilers can do more with <
float sub = 0;
for(i = 1; i < M && j >= s[i]; i++){
const unsigned int c = j - s[i];
sub += d[i][c]*x[c];
}
x[j] -= sub;
}
// Consider using processor-specific optimisations for this
#pragma omp parallel for
for(j = 0; j < N; j++){
x[j] /= d[0][j];
}
for( j = N; (j --) > 0; ){ // changed for clarity
float sub = 0;
for(i = 1; i < M && j + s[i] < N; i++){
sub += d[i][j]*x[j + s[i]];
}
x[j] -= sub;
}
}
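Well, it's looking tidier, and the lack of memory allocation and the reduced branching, if nothing else, are a boost. If you can change s to include an extra UINT_MAX value at the end, you can remove more branches (both the i<M checks, which again run ~N*M times). In the last loop the remaining test then has to be written as s[i] < N - j, because j + UINT_MAX wraps around in unsigned arithmetic.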
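Here is a sketch of the last loop with such a sentinel. It is written as a free function over std::vector purely so that it stands alone; the function name and the container types are assumptions, and s is assumed to hold the diagonal offsets followed by one extra UINT_MAX entry. The loop condition is the wrap-safe form of j + s[i] < N.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Back substitution for Lt x = y, in place.
// Assumptions: s[0] is the (unused) offset of the main diagonal, s[1..] are the
// offsets of the other diagonals, and the last entry of s is UINT_MAX.
void backSubstituteWithSentinel(std::vector<float>& x,
                                const std::vector<std::vector<float>>& d,
                                const std::vector<unsigned int>& s) {
    const unsigned int N = static_cast<unsigned int>(x.size());
    for (unsigned int j = N; j-- > 0;) {
        float sub = 0.0f;
        // s[i] < N - j is the same test as j + s[i] < N but cannot wrap around,
        // so the UINT_MAX sentinel always ends the loop and no i < M check is needed.
        for (std::size_t i = 1; s[i] < N - j; ++i) {
            sub += d[i][j] * x[j + s[i]];
        }
        x[j] -= sub;
    }
}
```

The same sentinel already works unchanged for the first loop, where the condition j >= s[i] is false for UINT_MAX.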
Now we can't make any more loops parallel, and we can't combine loops. The boost now will be, as suggested in the other answer, to rearrange d. Except… the work required to rearrange d has exactly the same cache issues as the work to do the loop. And it would need memory allocated. Not good. The only options to optimise further are: change the structure of IncompleteCholeskyFactorization->Diagonals itself, which will probably mean a lot of changes, or find a different algorithm which works better with data in this order.
If you want to go further, your optimisations will need to impact quite a lot of the code (not a bad thing; unless there's a good reason for Diagonals being an array of pointers, it seems like it could do with a refactor).
I want to answer my own question: the bad performance was caused by cache conflict misses, because (at least) Win7 aligns big memory blocks to the same boundary. In my case the addresses of all buffers had the same alignment (buffer address % 4096 was the same for all buffers), so they fell into the same cache set of the L1 cache. I changed the memory allocation to align the buffers to different boundaries to avoid cache conflict misses and got a speedup of factor 2. Thanks for all the answers, especially the answers from Dave!
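For anyone who wants to try the same thing, one possible way to stagger buffers is to over-allocate and start each buffer at a chosen offset from a 4096-byte boundary. This is a sketch of the idea rather than the code I actually used: the names are invented, the 4096 figure is the one from my measurements, and offsetBytes is assumed to be a multiple of sizeof(float).

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

constexpr std::size_t kBoundaryBytes = 4096;

struct StaggeredBuffer {
    std::unique_ptr<unsigned char[]> storage;  // owns the raw memory
    float* data;                               // first usable element
};

// Returns a buffer of `count` floats whose address is congruent to
// offsetBytes modulo 4096. Assumption: offsetBytes % sizeof(float) == 0.
StaggeredBuffer allocateStaggered(std::size_t count, std::size_t offsetBytes) {
    offsetBytes %= kBoundaryBytes;
    // Room for alignment padding (< 4096 bytes) plus the offset (< 4096 bytes).
    const std::size_t rawBytes = count * sizeof(float) + 2 * kBoundaryBytes;
    std::unique_ptr<unsigned char[]> storage(new unsigned char[rawBytes]);
    const auto base = reinterpret_cast<std::uintptr_t>(storage.get());
    const std::uintptr_t aligned =
        (base + kBoundaryBytes - 1) / kBoundaryBytes * kBoundaryBytes;
    unsigned char* start = storage.get() + (aligned - base) + offsetBytes;
    return {std::move(storage), reinterpret_cast<float*>(start)};
}
```

Giving each buffer a different offset (0, 256, 512, ...) makes address % 4096 differ between buffers, which is what removed the conflict misses in my case. The right offsets depend on the cache geometry of the CPU.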
I tried to write a counting sort, but there's some problem with it.
Here's the code:
int *countSort(int* start, int* end, int maxvalue)
{
int *B = new int[(int)(end-start)];
int *C = new int[maxvalue];
for (int i = 0; i < maxvalue; i++)
{
*(C+i) = 0;
}
for (int *i = start; i < end; i++)
{
*(C+*i) += 1;
}
for (int i = 1; i < maxvalue-1 ; i++)
{
*(C+i) += *(C+i-1);
}
for (int *i = end-1; i > start-1; i--)
{
*(B+*(C+(*i))) = *i;
*(C+(*i)) -= 1;
}
return B;
}
In the last loop it throws an exception: "Access violation writing at location: -some RAM address-"
Where did I go wrong?
for (int i = 1; i < maxvalue-1 ; i++)
That's the incorrect upper bound: C has maxvalue elements, so the loop has to run from 1 up to maxvalue-1 inclusive, i.e. the condition should be i < maxvalue.
for (int *i = end-1; i > start-1; i--)
{
*(B+*(C+(*i))) = *i;
*(C+(*i)) -= 1;
}
This loop is also completely incorrect. I don't know what it does, but a brief mental test shows that the first iteration sets the element of B at the index of the value of the last element in the array to the number of times it shows. I guarantee that that is not correct. The last loop should be something like:
int* out = B;
for (int i = 0; i < maxvalue; i++) { //for each value
    //C[i] must hold the plain count of i here, not a running total
    for (int j = 0; j < C[i]; j++) { //for the number of times it's in the source
        *out = i; //add it to the output
        ++out; //in the next open slot
    }
}
As a final note, why are you playing with pointers like that?
*(B + i) //is the same as
B[i] //and people will hate you less
*(B+*(C+(*i))) //is the same as
B[C[*i]]
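If you would rather keep the running-total approach from your original code, the sketch below shows one way it can be made to work. It is not a drop-in patch: I switched to std::vector, made the count array one element larger so that maxValue itself is a legal value, and decrement the counter before using it as an index. The function name is made up.

```cpp
#include <cstddef>
#include <vector>

// Stable counting sort based on running totals.
// Assumption: every value in `input` lies in [0, maxValue].
std::vector<int> countSortStable(const std::vector<int>& input, int maxValue) {
    std::vector<int> output(input.size());
    std::vector<std::size_t> count(static_cast<std::size_t>(maxValue) + 1, 0);
    for (int v : input) {
        ++count[v];
    }
    // After this loop count[v] is the number of elements <= v.
    for (std::size_t v = 1; v < count.size(); ++v) {
        count[v] += count[v - 1];
    }
    // Walk backwards so that equal values keep their relative order.
    for (std::size_t i = input.size(); i-- > 0;) {
        const int v = input[i];
        --count[v];             // decrement first: output positions are 0-based
        output[count[v]] = v;
    }
    return output;
}
```

Without the decrement-first step, the largest value would be written to index input.size(), which is one past the end of the output.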
Since you're using C++ anyway, why not simplify the code (dramatically) by using std::vector instead of dynamically allocated arrays (and leaking one in the process)?
std::vector<int> countSort(int* start, int* end, int maxvalue)
{
std::vector<int> B(end-start);
std::vector<int> C(maxvalue);
for (int *i = start; i < end; i++)
++C[*i];
// etc.
Other than that, the logic you're using doesn't make sense to me. I think to get a working result, you're probably best off sitting down with a sheet of paper and working out the steps you need to use. I've left the counting part in place above, because I believe that much is correct. I don't think the rest really is. I'll even give a rather simple hint: once you've done the counting, you can generate B (your result) based only on what you have in C -- you do not need to refer back to the original array at all. The easiest way to do it will normally use a nested loop. Also note that it's probably easier to reserve the space in B and use push_back to put the data in it, rather than setting its initial size.
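In case the hint is not enough, here is one way the whole thing could be filled in. Treat it as a sketch: the function name is mine, and I assumed that maxvalue is the largest value that can occur, so the count array gets maxvalue + 1 slots.

```cpp
#include <cstddef>
#include <vector>

// Counting sort that rebuilds the result from the counts alone.
// Assumption: every value in [start, end) lies in [0, maxvalue].
std::vector<int> countSortByCounts(const int* start, const int* end, int maxvalue) {
    std::vector<int> C(static_cast<std::size_t>(maxvalue) + 1);  // zero-initialized
    for (const int* p = start; p < end; ++p) {
        ++C[*p];
    }
    std::vector<int> B;
    B.reserve(static_cast<std::size_t>(end - start));
    for (int value = 0; value <= maxvalue; ++value) {  // for each possible value
        for (int n = 0; n < C[value]; ++n) {           // as often as it occurred
            B.push_back(value);
        }
    }
    return B;
}
```

Since the result is returned by value, nothing is leaked and the caller has nothing to delete.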