I have a struct:
struct xyz {
    int x, y, z;
};
and I initialize a vector of struct xyz like this:
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < N; j++)
    {
        for (int k = 0; k < N; k++)
        {
            v.x = i;
            v.y = j;
            v.z = k;
            vect.push_back(v);
        }
    }
}
Then I want to transform that vector into an array, because an array is two times faster than a vector to manipulate, so I do:
xyz arr[vect.size()];
std::copy(vect.begin(), vect.end(), arr);
When I run this program it shows a segmentation fault, which I think is because vect.size() is too large.
So I am wondering: is there any way to convert that large vector to an array without that problem?
I appreciate any help.
My overly pedantic comment got too big, so instead I'll try to make this a somewhat roundabout answer. The short answer is probably just to stick with vector but make sure to use reserve; oh, and benchmark.
You didn't say what compiler or C++ version you're using, so I'll just go with my current gcc.godbolt.org default of gcc 4.9.2, C++14. I'm also assuming that you really want this as a 1-dimensional array, rather than the more natural (for your example) 3-dimensional one.
If you know N at compile time, you could do something like this (assuming I got the array offset calculation correct):
#include <array>
...
std::array<xyz, N*N*N> xyzs;
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        for (int k = 0; k < N; k++) {
            xyzs[i*N*N + j*N + k] = {i, j, k};
        }
    }
}
The biggest downsides, IMO:
error-prone offset calculation
depending on N, where the code is run, etc., this can blow the stack
On the compilers I tried this on, the optimizers seem to understand that we're moving through the array in contiguous order, and the generated machine code is sensible either way; but it could also be written like so, if you prefer:
#include <array>
...
std::array<xyz, N*N*N> xyzs;
auto p = xyzs.data();
for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
        for (int k = 0; k < N; ++k) {
            (*p++) = {i, j, k};
        }
    }
}
Of course, if you actually know N at compile time, and it won't blow the stack, you might consider a 3-dimensional array xyz xyzs[N][N][N]; since this might be more natural for the way these things are ultimately being used.
As pointed out in comments, variable length arrays aren't legal C++, but they are legal in C99; if you don't know N at compile time you should be allocating off the heap.
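If you really do need a plain array whose size is only known at run time, a minimal sketch of copying into heap storage instead of a stack VLA might look like this (it assumes the xyz struct from the question; to_heap_array is just an illustrative name):
#include <algorithm>
#include <memory>
#include <vector>

// Copy an existing std::vector<xyz> into heap-allocated storage.
// The array lives on the heap, so a large vect.size() won't blow the stack.
std::unique_ptr<xyz[]> to_heap_array(const std::vector<xyz>& vect) {
    std::unique_ptr<xyz[]> arr(new xyz[vect.size()]);
    std::copy(vect.begin(), vect.end(), arr.get());
    return arr;
}
That said, vect.data() already gives you a pointer to contiguous xyz elements, so in most cases the copy isn't buying you anything.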
A vector and an array will wind up being identical in terms of memory layout; they differ in that the vector allocates its memory from the heap, while the array (as you are writing it) would be on the stack. The only recommendation I'd make is to call reserve before entering your loop:
vect.reserve(N*N*N);
This means you'll only be doing a single memory allocation up front, rather than the grow-and-copy mechanism that you'll get from a default-constructed vector.
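In other words, a sketch of the question's own loop with just the reservation added up front (assuming the same N, xyz and vect):
#include <vector>
...
std::vector<xyz> vect;
vect.reserve(N*N*N);                 // one allocation up front
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        for (int k = 0; k < N; k++) {
            vect.push_back({i, j, k});   // no reallocations inside the loops
        }
    }
}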
Assuming xyz is as simple as you declare here, you could also do something like the second example above:
std::vector<xyz> xyzs(N*N*N); // parentheses: N*N*N default-constructed elements
auto p = xyzs.data();
for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
        for (int k = 0; k < N; ++k) {
            (*p++) = {i, j, k};
        }
    }
}
You lose the safety of push_back, and it is less efficient if xyz's default constructor needs to do anything (for example, if xyz's members were changed to have default values).
Having said all that, you really should benchmark. But then, you should probably be benchmarking the code that ultimately uses this array, rather than the code to construct it; I'd have other concerns if construction was dominating usage.
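If you want a quick way to time the candidates, a minimal <chrono> sketch (just an illustration, not a rigorous benchmark harness) could be:
#include <chrono>
#include <iostream>

int main() {
    auto t0 = std::chrono::steady_clock::now();
    // ... build the container and, more importantly, run the code that uses it ...
    auto t1 = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration<double>(t1 - t0).count() << " s\n";
}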
Related
I am new to C++ and programming, so I think I am writing inefficient code.
I was wondering whether there is any way I can speed up the matrix calculation process.
For example, this is the sample code I wrote, which finds the maximum difference (in absolute value) between the 3D arrays 'V' and 'Vnew'.
First, I take the subtraction.
Then I assign the value of tempdiff[0][0][0] to 'dif'.
Then I compare 'dif' with tempdiff[i][j][k] and replace 'dif' if the latter is larger.
This is just a part of my code; there are lots of matrix calculations inside, so I have too many 'for' statements.
So I was wondering whether there is any way I could avoid using 'for' in the matrix calculations.
Thanks in advance.
for (int i = 0; i < Na; i++) {
    for (int j = 0; j < Nd; j++) {
        for (int k = 0; k < Ny; k++) {
            tempdiff[i][j][k] = abs(V[i][j][k] - Vnew[i][j][k]);
        }
    }
}
dif = tempdiff[0][0][0];
for (int i = 0; i < Na; i++) {
    for (int j = 0; j < Nd; j++) {
        for (int k = 0; k < Ny; k++) {
            if (tempdiff[i][j][k] > dif) {
                dif = tempdiff[i][j][k];
            }
            else {
                dif = dif;
            }
        }
    }
}
There's not much you can do about the for loops themselves, as the maximum difference can be located anywhere. You have already succeeded in iterating over the array in the correct, linear order.
Compilers are generally quite efficient at optimising, but they apparently fail to flatten nested loops over a contiguous array such as float V[Na][Nd][Ny];. After you flatten it manually to float V[Na*Nd*Ny], at least clang can auto-vectorise it and produce SIMD code for x64 and ARM.
A further optimisation is to avoid doing this in two steps, as the total memory throughput is exactly doubled with the temporary array compared to a one-pass solution.
I was assuming your matrices are of type float -- if you can use int instead, gcc can auto-vectorise this as well (the difference relates to NaN handling); furthermore, int16_t or int8_t types are even quicker to evaluate, as more operations can be packed into a single SIMD instruction.
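As a sketch of the flattened, one-pass version (assuming float data; max_abs_diff is just an illustrative name, and n would be Na*Nd*Ny):
#include <cmath>
#include <cstddef>

// One pass over flat arrays: no temporary array, so half the memory traffic.
float max_abs_diff(const float* V, const float* Vnew, std::size_t n) {
    float dif = 0.0f;                        // all differences are >= 0
    for (std::size_t i = 0; i < n; ++i) {
        float d = std::fabs(V[i] - Vnew[i]);
        if (d > dif) dif = d;
    }
    return dif;
}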
I have the following piece of C++ code. The scale of the problem is N and M. Running the code takes about two minutes on my machine (after g++ -O3 compilation). Is there any way to further accelerate it, on the same machine? Any kind of option (choosing a better data structure, a library, GPU or parallelism, etc.) is on the table.
void demo() {
    int N = 1000000;
    int M = 3000;
    vector<vector<int> > res(M);
    for (int i = 0; i < N; i++) {
        for (int j = 1; j < M; j++) {
            res[j].push_back(i);
        }
    }
}
int main() {
    demo();
    return 0;
}
Some additional info: the second loop above, for (int j=1; j < M; j++), is a simplified version of the real problem. In fact, j could be in a different range for each i (of the outer loop), but the number of iterations is about 3000.
With the exact code as shown when writing this answer, you could create the inner vector once, with the specific size, and call iota to initialize it. Then just pass this vector along to the outer vector constructor to use it for each element.
Then you don't need any explicit loops at all, and instead use the (highly optimized, hopefully) standard library to do all the work for you.
Perhaps something like this:
#include <numeric>   // std::iota
#include <vector>

void demo()
{
    static int const N = 1000000;
    static int const M = 3000;
    std::vector<int> data(N);
    std::iota(begin(data), end(data), 0);
    std::vector<std::vector<int>> res(M, data);
}
Alternatively, you could try to initialize just one vector with those elements, and then create the other vectors by copying that memory using std::memcpy or std::copy.
Another optimization would be to allocate the memory in advance (e.g. by calling reserve with the expected element count before filling a vector).
Also, if you're sure that all the members of res are identical vectors, you could do a hack by creating that vector of N elements just once and having each of the 3,000 entries of res refer to the same one (e.g. via a pointer or std::shared_ptr) rather than hold its own copy.
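A sketch of that sharing hack, assuming the consumers of res only ever read the data (std::shared_ptr is my choice here, not something from the question):
#include <memory>
#include <numeric>
#include <vector>

void demo_shared() {
    const int N = 1000000;
    const int M = 3000;
    // Build the data once...
    auto data = std::make_shared<std::vector<int>>(N);
    std::iota(data->begin(), data->end(), 0);
    // ...and hand out M handles to the same buffer instead of M copies.
    std::vector<std::shared_ptr<const std::vector<int>>> res(M, data);
}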
On my machine, which has enough memory to avoid swapping, your original code took 86 seconds.
Adding reserve:
for (auto& v : res)
{
    v.reserve(N);
}
made basically no difference (85 seconds but I only ran each version once).
Swapping the loop order:
for (int j = 1; j < M; j++) {
    for (int i = 0; i < N; i++) {
        res[j].push_back(i);
    }
}
reduced the time to 10 seconds. This is likely due to a combination of allowing the compiler to use SIMD optimisations and improving cache locality by accessing memory in sequential order.
Creating one vector and copying it into the others:
for (int i = 0; i < N; i++) {
    res[1].push_back(i);
}
for (int j = 2; j < M; j++) {
    res[j] = res[1];
}
reduced the time to 4 seconds.
Using a single vector:
void demo() {
    size_t N = 1000000;
    size_t M = 3000;
    vector<int> res(M * N);
    size_t offset = N;
    for (size_t i = 0; i < N; i++) {
        res[offset++] = i;
    }
    for (size_t j = 2; j < M; j++) {
        std::copy(res.begin() + N, res.begin() + N * 2, res.begin() + offset);
        offset += N;
    }
}
also took 4 seconds. There probably isn't much improvement because you have 3,000 vectors of 4 MB each; there would likely be more of a difference if N were smaller or M were larger.
I am implementing an algorithm that uses rather large vector<vector<double>> objects for storage; no elements will be added or removed after preallocation. I would like to make sure that element access is as fast as possible, and I also need to add and scale several of them (element-wise). What is the best way to do this?
Here are relevant parts of my (naive) code, that I doubt is efficient:
vector<vector<double>> z;
vector<vector<double>> mu;
vector<vector<double>> temp_NNZ;
..
for (int i = 0; i < init.valsA.size(); ++i) {
    z.push_back({});
    mu.push_back({});
    temp_NNZ.push_back({});
    for (int j = 0; j < init.valsA[i].size(); ++j) {
        z[i].push_back(0);
        mu[i].push_back(0);
        temp_NNZ[i].push_back(0);
    }
}
..
for (int i = 0; i < z.size(); ++i) {
    for (int j = 0; j < z[i].size(); ++j) {
        z[i][j] = temp_NNZ[i][j] - mu[i][j]/rho - z[i][j];
    }
}
There are two ways to do this. vector::resize will create all the elements and value-initialize them (or copy-initialize them if you give it an initial value), while vector::reserve will let you allocate the amount of memory you need in advance without initializing it (which may be more efficient). In the first case you'll have to copy the final value into the already existing elements (z[i] = x), whereas in the other you'll have to create the elements as you do in your current code (z.push_back(x)).
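For example, a sketch of the preallocation with resize, assuming the same z, mu, temp_NNZ and init.valsA as in the question (the elements are all zero anyway, so value-initialization does the right thing):
// Size every row up front; each element is value-initialized to 0.0.
z.resize(init.valsA.size());
mu.resize(init.valsA.size());
temp_NNZ.resize(init.valsA.size());
for (std::size_t i = 0; i < init.valsA.size(); ++i) {
    z[i].resize(init.valsA[i].size());
    mu[i].resize(init.valsA[i].size());
    temp_NNZ[i].resize(init.valsA[i].size());
}
After that, the second loop in the question can stay exactly as it is, since all elements already exist.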
I am trying to follow the Gaussian elimination algorithm in https://courses.engr.illinois.edu/cs554/fa2015/notes/06_lu_8up.pdf in order to implement LU factorization and eventually parallelize it with OpenMP. Does the following algorithm look correct, where l is the multiplier and m is the matrix?
void decompose2(double **m) {
    begin = clock();
    int i = 0, j = 0, k = 0;
    for (k = 1; k < size - 1; k++)
    {
        for (i = k + 1; i < size; i++)
        {
            l[i][k] = m[i][k] / m[k][k];
        }
        for (j = k + 1; j < size; j++)
        {
            for (i = k + 1; k < size; k++)
            {
                m[i][j] = m[i][j] - (l[i][k] * m[k][j]);
            }
        }
    }
    end = clock();
}
I don't think it is correct, because the times I am getting after parallelization on the same number of processors are completely different from those reported in another paper.
"Does the following algorithm look correct, …" -- No, because
arrays are 0-indexed in C++,
double[size][size] (which you are likely using) is not convertible to double**,
int is not a good type for iterators (use size_t instead),
you don't check whether m[k][k] is (close to) zero, in which case you would have to swap rows.
Please note that I only looked at the obvious implementation errors, not at possible ways to make the code better, e.g. increasing the numerical stability of the calculation.
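For reference, here is a minimal 0-based sketch of the same elimination loops (no pivoting, so it still assumes m[k][k] never gets close to zero; decompose0 is just an illustrative name, with l and m as in the question):
#include <cstddef>

// 0-based Gaussian elimination without pivoting, storing the multipliers in l.
void decompose0(double** m, double** l, std::size_t n) {
    for (std::size_t k = 0; k + 1 < n; k++) {
        for (std::size_t i = k + 1; i < n; i++) {
            l[i][k] = m[i][k] / m[k][k];      // multiplier for row i
        }
        for (std::size_t i = k + 1; i < n; i++) {
            for (std::size_t j = k + 1; j < n; j++) {
                m[i][j] -= l[i][k] * m[k][j]; // eliminate column k from row i
            }
        }
    }
}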
I'm writing a simple ANN (neural network) for function approximation. I get a crash with the message: "Heap corrupted". I found a few pieces of advice on how to resolve it, but nothing helped.
I get the error at the first line of this function:
void LU(double** A, double** &L, double** &U, int s){
    U = new double*[s];
    L = new double*[s];
    for (int i = 0; i < s; i++){
        U[i] = new double[s];
        L[i] = new double[s];
        for (int j = 0; j < s; j++)
            U[i][j] = A[i][j];
    }
    for (int i = 0, j = 0; i < s; i = ++j){
        L[i][j] = 1;
        for (int k = i + 1; k < s - 1; k++){
            L[k][j] = U[k][j] / U[i][j];
            double* vec_t = mul(U[i], L[k][j], s);
            for (int z = 0; z < s; z++)
                U[k][z] = U[k][z] - vec_t[z];
            delete[] vec_t;
        }
    }
};
As I understood from the debugger's information: two arrays (U and L) have been passed to the function with some addresses in memory. That is quite strange, because I didn't initialize them. I call this function two times; the first time it works nicely (OK, at least it works), but at the second call it crashes. I have no idea how to resolve it.
There is a link to the whole project: CLICK
I'm working in MS Visual Studio 2013 under Windows 7 x64.
UPDATE
According to some commentaries below I should provide some additive information.
First of all, sorry for the quality of the code. I wrote it only for myself, in two days.
Second, when I said "at the second call", I mean that I first call LU when I need to get the determinant of S (I use LU decomposition for this), and it works without any crashes. The second call is when I try to get the inverse of the matrix (the same S). And when I call detLU at the [0, 0] point of the matrix (to get the cofactor) I get this crash.
Third, if I read the debugger's information correctly, the arrays L and U are passed into the function at the second call with already defined memory addresses. I can't understand why, because before the LU call I had just written "double** L; double** U;" without any initialization.
I can try to provide some additional debug information or some tests, if somebody explains what exactly I have to do.
The point at which you get a heap corruption error/crash is typically just a symptom of an actual heap overflow/underflow or other memory error at some other time/point in the past. This is why heap corruptions can be difficult to track down.
You have a lot of code, and all the double-pointers are difficult to track, but I did notice one potential issue:
double** initInWeights(double f, int h, int w) {
    double** W = new double*[h];
    for (int i = 0; i < 10; i++) {
        W[i] = new double[w];
The loop will overflow W[] if h is less than 10. Chances are that somewhere in your code you have a buffer overflow/underflow or are using memory after it is freed. The complexity and design of your code make it difficult to pinpoint at a glance.
Is there a reason you are using raw double-pointers instead of simply std::vector<std::vector<double>>? This would remove all your manual memory management code, making your code shorter and simpler, and, more importantly, removing the heap corruption issue.
Barring that, you should double-check that all manually allocated memory is of the correct size and that access loops can never go out of bounds.
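As a sketch of what the allocation in LU could look like with vectors instead of raw double-pointers (only to illustrate the suggestion; the surrounding code would need its types changed accordingly, and allocateLU is just an illustrative name):
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Allocate L and U as s x s zero-filled matrices and copy A into U.
// No manual new/delete, so sizes can't silently mismatch and nothing leaks.
void allocateLU(const Matrix& A, Matrix& L, Matrix& U, int s) {
    L.assign(s, std::vector<double>(s, 0.0));
    U.assign(s, std::vector<double>(s, 0.0));
    for (int i = 0; i < s; i++)
        for (int j = 0; j < s; j++)
            U[i][j] = A[i][j];
}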
Update -- I think your problem may lie with a buffer overflow in the extract() function in matrix.cpp:
double** extract(double** mat, int s, int col, int row)
{
    double** ext = new double*[s - 1];
    for (int i = 0; i < s - 1; i++)
    {
        ext[i] = new double[s - 1];
    }
    int ext_c = 0, ext_r = 0;
    for (int i = 0; i < s; i++)
    {
        if (i != row)
        {
            for (int j = 0; j < s; j++)
            { // Overflow on ext_c here
                if (j != col) ext[ext_r][ext_c++] = mat[i][j];
            }
            ext_r++;
        }
    }
    return ext;
};
You never reset ext_c, so it simply keeps increasing, up to (s-1)*(s-1), which obviously overflows the ext[] array. To fix this you simply need to change the inner loop definition to:
for (int j = 0, ext_c = 0; j < s; j++)
At least that one change lets me run your project without any heap corruption errors.