[Christian Hacki and Barry point out that a question has been asked specifically about complex previously. My question is more general, as it applies to std::vector, std::array, and all the container classes that use allocators. Also, the answers on the other question are not adequate, IMO. Perhaps I could bump the other question somehow.]
I have a C++ application that uses lots of arrays and vectors of real values (doubles) and complex values. I do not want them initialized to zeros. Hey, compiler and STL: just allocate the dang memory and be done with it. It's on me to put the right values in there. Should I fail to do so, I want the program to crash during testing.
I managed to prevent std::vector from initializing with zeros by defining a custom allocator for use with PODs (a sketch of that approach is below). Is there a better way?
What to do about std::complex? It is not defined as a POD. It has a default constructor that spews zeros. So if I write
std::complex<double> A[compile_time_const];
it spews. Ditto for
std::array<std::complex<double>, compile_time_constant> A;
What's the best way to utilize the std::complex<> functionality without provoking swarms of zeros?
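For reference, here is a minimal sketch of what such a default-initializing allocator can look like (my reconstruction of the approach mentioned above, not the actual code). Note that it helps for PODs held in a std::vector; std::complex will still run its zero-filling constructor:
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Turns value-initialization ("fill with zeros") into default-initialization
// ("leave the memory alone") for types like double.
template <class T, class Base = std::allocator<T>>
struct default_init_allocator : Base {
    template <class U>
    struct rebind {
        using other = default_init_allocator<U,
            typename std::allocator_traits<Base>::template rebind_alloc<U>>;
    };
    using Base::Base;

    // Called by vector for each element when no initializer is given.
    template <class U>
    void construct(U* p) {
        ::new (static_cast<void*>(p)) U;  // default-init: no zeroing for PODs
    }
    // Forward all other construct calls (e.g. push_back with a value).
    template <class U, class... Args>
    void construct(U* p, Args&&... args) {
        ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
    }
};

std::vector<double, default_init_allocator<double>> v(1000);  // elements left uninitialized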
[Edit] Consider this actual example from a real-valued FFT routine.
{
    cvec Out(N);
    for (int k : range(0, N / 2)) {
        complex Evenk = Even[k];
        complex T = twiddle(k, N, sgn);
        complex Oddk = Odd[k] * T;
        Out[k] = Evenk + Oddk;
        Out[k + N / 2] = Evenk - Oddk; // Note: not in order
    }
    return Out;
}
I am trying to compute a product of the values inside of a vector. It is a huge mess of code; I have posted it previously, but no one was able to help. I just want to confirm which is the correct way to do a single part of it. I currently have:
vector<double> taylorNumerator;
for (a = 0; a <= (constant); a++) {
    double Number = /* equation involving a to get numerous values */;
    taylorNumerator.push_back(Number);
    double NewNumber = 1.0; // must start at 1.0 before *= can accumulate a product
    for (b = 0; b <= (constant); b++) {
        NewNumber *= taylorNumerator[b];
    }
}
This is what I have as a snapshot, it is very short from what I actually have. Someone told me it is better to do vector.at(index) instead. Which is the correct or best way to accomplish this? If you so desire I can paste all of the code, it works but the values I get are wrong.
When possible, you should probably avoid using indexes at all. Your options are:
A range-based for loop:
for (auto numerator : taylorNumerators) { ... }
An iterator-based loop:
for (auto it = taylorNumerators.begin(); it != taylorNumerators.end(); ++it) { ... }
A standard algorithm, perhaps with a lambda:
#include <algorithm>
std::for_each(taylorNumerators.begin(), taylorNumerators.end(), [](double numerator) { ... });
In particular, note that some algorithms let you specify a number of iterations, like std::generate_n, so you can create exactly n items without counting to n yourself.
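For instance, here is a sketch of how the code from the question might look with std::generate_n and std::accumulate. (Here constant is the asker's bound and someEquation() is a hypothetical stand-in for the actual formula.)
#include <algorithm>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

std::vector<double> taylorNumerators;
int a = 0;
// Build constant + 1 terms without hand-written index bookkeeping.
std::generate_n(std::back_inserter(taylorNumerators), constant + 1,
                [&a] { return someEquation(a++); });
// Reduce the terms to a single product (what the inner loop above seems to attempt).
double product = std::accumulate(taylorNumerators.begin(), taylorNumerators.end(),
                                 1.0, std::multiplies<double>());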
If you need the index in the calculation, then it can be appropriate to use a traditional for loop. You have to watch for a couple of pitfalls: std::vector<T>::size() returns a std::vector<T>::size_type, which is typically identical to std::size_t, which is (1) unsigned and (2) quite possibly larger than an int.
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) { ... }
Your calculations probably deal with doubles or some numerical type other than std::size_t, so you have to consider the best way to convert it. Many programmers would rely on implicit conversions, but that can be dangerous unless you know the conversion rules very well. I'd generally start by doing a static cast of the index to the type I actually need. For example:
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) {
    const auto x = static_cast<double>(i);
    /* calculation involving x */
}
In C++, it's probably far more common to make sure the index is in range and then use operator[] rather than to use at(). Many projects disable exceptions, so the safety guarantee of at() wouldn't really be available. And, if you can check the range once yourself, then it'll be faster to use operator[] than to rely on the range-check built into at() on each index operation.
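For instance, in the loop shown above, the condition already guarantees the index is valid, so at() would merely repeat a check you've already made. A small sketch, reusing the names from earlier:
double product = 1.0;
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) {
    product *= taylorNumerators[i];  // in range by the loop condition; no at() needed
}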
What you have is fine. Modern compilers can optimize the heck out of the above such that the code is just as fast as the equivalent C code of accessing items directly.
The only optimization for using vector I recommend is to invoke taylorNumerator.reserve(constant) to allocate the needed storage upfront instead of the vector resizing itself as new items are added.
About the only worthy optimization after that is to not use vector at all and just use a static array - especially if constant is small enough that it doesn't blow up the stack (or binary size if global).
double taylorNumerator[constant];
I am using some legacy C code that passes around lots of raw pointers. To interface with the code, I have to pass a function of the form:
const int N = ...;
T * func(T * x) {
    // TODO Put N elements in x
    return x + N;
}
where this function should write the result into x, and then return x.
Internally, in this function, I am using Eigen extensively to perform some calculations. Then I write the result back to the raw pointer using the Map class. A simple example which mimics what I am doing is this:
const int N = 5;
T * func(T * x) {
    // Do a lot of operations that result in some matrices like
    Eigen::Matrix<T, N, 1> A = ...
    Eigen::Matrix<T, N, 1> B = ...
    Eigen::Map<Eigen::Matrix<T, N, 1>> constraint(x);
    constraint = A - B;
    return x + N;
}
Obviously, there is much more complicated stuff going on internally, but that is the gist of it... Do some calculations with Eigen, then use the Map class to write the result back to the raw pointer.
Now the problem is that when I profile this code with Callgrind, and then view the results with KCachegrind, the line
constraint = A - B;
is almost always the bottleneck. This is sort of understandable, because such a line is potentially doing three things:
Constructing the Map object
Performing the calculation
Writing the result to the pointer
So it is understandable that this line might have the longest runtime. But I am a little bit worried that perhaps I am somehow doing an extra copy in that line before the data gets written to the raw pointer.
So is there a better way of writing the result to the raw pointer? Or is that the idiom I should be using?
In the back of my mind, I am wondering if using the placement new syntax would buy me anything here.
Note: This code is mission critical and should run in realtime, so I really need to squeeze every ounce of speed out of it. For instance, getting this call from a runtime of 0.12 seconds to 0.1 seconds would be huge for us. But code legibility is also a huge concern since we are constantly tweaking the model used in the internal calculations.
These two lines of code:
Eigen::Map<Eigen::Matrix<T, N, 1 >> constraint(x);
constraint = A - B;
are essentially compiled by Eigen as:
for (int i = 0; i < N; ++i)
    x[i] = A[i] - B[i];
The reality is a bit more complicated because of explicit unrolling and explicit vectorization (both depend on T), but that's essentially it. So the construction of the Map object is essentially a no-op (it is optimized away by any compiler) and no, there is no extra copy going on here.
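To see it concretely, here is a minimal sketch with T fixed to double, N = 5, and Random() standing in for the real calculations. Even an unnamed temporary Map behaves identically, since constructing it costs nothing:
#include <Eigen/Dense>

const int N = 5;

double* func(double* x) {
    Eigen::Matrix<double, N, 1> A = Eigen::Matrix<double, N, 1>::Random();
    Eigen::Matrix<double, N, 1> B = Eigen::Matrix<double, N, 1>::Random();
    // Same as naming the Map first: the expression writes straight into x.
    Eigen::Map<Eigen::Matrix<double, N, 1>>(x) = A - B;
    return x + N;
}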
Actually, if your profiler is able to tell you that the bottleneck lies on this simple expression, then that very likely means that this piece of code has not been inlined, meaning that you did not enable compiler optimization flags (like -O3 with gcc/clang).
A long time ago, inspired by Numerical Recipes in C, I started to use the following construct for storing matrices (2D arrays).
double **allocate_matrix(int NumRows, int NumCol)
{
    double **x;
    int i;
    x = (double **)malloc(NumRows * sizeof(double *));
    for (i = 0; i < NumRows; ++i) x[i] = (double *)calloc(NumCol, sizeof(double));
    return x;
}
double **x = allocate_matrix(1000,2000);
x[m][n] = ...;
But recently I noticed that many people implement matrices as follows:
double *x = (double *)malloc(NumRows * NumCols * sizeof(double));
x[NumCol * m + n] = ...;
From the locality point of view the second method seems perfect, but has awful readability... So I started to wonder: is my first method, with its auxiliary array of double* pointers, really bad, or will the compiler eventually optimize it to be more or less equivalent in performance to the second method? I am suspicious because in the first method two jumps are made when accessing a value, x[m] and then x[m][n], and there is a chance that each time the CPU will first load the x array and then the x[m] array.
P.S. Do not worry about the extra memory for storing the double* pointers; for large matrices it is just a small percentage.
P.P.S. Since many people did not understand my question very well, I will try to re-shape it: do I understand correctly that the first method is a kind of locality hell, where each time x[m][n] is accessed, first the x array is loaded into the CPU cache and then the x[m] array, making each access run at the speed of talking to RAM? Or am I wrong, and the first method is also OK from a data-locality point of view?
For C-style allocations you can actually have the best of both worlds:
double **allocate_matrix(int NumRows, int NumCol)
{
    double **x;
    int i;
    x = (double **)malloc(NumRows * sizeof(double *));
    x[0] = (double *)calloc(NumRows * NumCol, sizeof(double)); // <<< single contiguous memory allocation for entire array
    for (i = 1; i < NumRows; ++i) x[i] = x[i - 1] + NumCol;
    return x;
}
This way you get data locality and its associated cache/memory access benefits, and you can treat the array as a double ** or a flattened 2D array (array[i * NumCols + j]) interchangeably. You also have fewer calloc/free calls (2 versus NumRows + 1).
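The matching cleanup then also needs only two calls. A small sketch to pair with the allocator above:
void free_matrix(double **x)
{
    free(x[0]); // releases the single contiguous block holding all the data
    free(x);    // releases the array of row pointers
}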
No need to guess whether the compiler will optimize the first method. Just use the second method which you know is fast, and use a wrapper class that implements for example these methods:
double& operator()(int x, int y);
double const& operator()(int x, int y) const;
... and access your objects like this:
arr(2, 3) = 5;
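A minimal sketch of such a wrapper (the name Matrix and the storage layout are just illustrative choices):
#include <vector>

class Matrix
{
public:
    Matrix(int rows, int cols) : cols_(cols), data_(static_cast<std::size_t>(rows) * cols) {}

    // Row-major (x, y) access over one contiguous block.
    double& operator()(int x, int y) { return data_[static_cast<std::size_t>(x) * cols_ + y]; }
    double const& operator()(int x, int y) const { return data_[static_cast<std::size_t>(x) * cols_ + y]; }

private:
    int cols_;
    std::vector<double> data_;
};

Matrix arr(1000, 2000);
// arr(2, 3) = 5; as above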
Alternatively, if you can bear a little more code complexity in the wrapper class(es), you can implement a class that can be accessed with the more traditional arr[2][3] = 5; syntax. This is implemented in a dimension-agnostic way in the Boost.MultiArray library, but you can do your own simple implementation too, using a proxy class.
Note: Considering your usage of C style (a hardcoded non-generic "double" type, plain pointers, function-beginning variable declarations, and malloc), you will probably need to get more into C++ constructs before you can implement either of the options I mentioned.
The two methods are quite different.
While the first method allows direct x[m][n]-style access to the values, it adds another indirection (the double** array, hence you need 1 + NumRows mallocs), ...
the second method guarantees that ALL values are stored contiguously and only requires one malloc.
I would argue that the second method is always superior. Malloc is an expensive operation and contiguous memory is a huge plus, depending on the application.
In C++, you'd just implement it like this:
std::vector<double> matrix(NumRows * NumCols);
matrix[y * numCols + x] = value; // Access
and if you're concerned with the inconvenience of having to compute the index yourself, add a wrapper that implements operator()(int x, int y) to it.
You are also right that the first method is more expensive when accessing the values. Because you need two memory lookups as you described x[m] and then x[m][n]. There is no way the compiler will "optimize this away". The first array, depending on its size, will be cached, and the performance hit may not be that bad. In the second case, you need an extra multiplication for direct access.
In the first method you use, the double* in the master array point to logical rows (each an array of NumCol elements).
So, if you write something like below, you get the benefits of data locality in some sense (pseudocode):
foreach(row in rows):
    foreach(elem in row):
        // Do something
If you tried the same thing with the second method, and if element access was done the way you specified (i.e. x[NumCol*m + n]), you still get the same benefit. This is because you treat the array to be in row-major order. If you tried the same pseudocode while accessing the elements in column-major order, I assume you'd get cache misses given that the array size is large enough.
In addition to this, the second method has the additional desirable property of being a single contiguous block of memory which further improves the performance even when you loop through multiple rows (unlike the first method).
So, in conclusion, the second method should be much better in terms of performance.
If NumCol is a compile-time constant, or if you are using GCC with language extensions enabled, then you can do:
double (*x)[NumCol] = (double (*)[NumCol]) malloc(NumRows * sizeof (double[NumCol]));
and then use x as a 2D array and the compiler will do the indexing arithmetic for you. The caveat is that unless NumCol is a compile-time constant, ISO C++ won't let you do this, and if you use GCC language extensions you won't be able to port your code to another compiler.
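Usage then reads like a built-in 2D array. A small sketch, for some in-range indices m and n:
x[m][n] = 1.0; // the compiler computes the offset m * NumCol + n for you
free(x);       // one allocation, one matching free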
For an assignment problem, my teacher requires that we pass an array to a function that just finds its median, and if there's an even amount of numbers, it averages the two in the middle. I quote from him: "use pointers whenever possible". The only place I could see using a pointer is for the array itself?
I understand the concept of what needs to happen; I'm just not sure how to properly use pointers, and Googling doesn't reveal very helpful results.
int medianArray(int *pArray, int sizeArray)
{
    if (sizeArray % 2 == 1)
    {
        return (pArray[((int)(sizeArray / 2)) - 1] + pArray[((int)(sizeArray / 2)) + 1]) / 2;
    }
    else
        return pArray[(int) sizeArray / 2];
}
You've got the right idea here, but you're computing your offsets wrong (and your odd/even branches are swapped: the averaging belongs in the even case).
Here's an idea for the even case:
size_t median = sizeArray / 2;
return (pArray[median - 1] + pArray[median]) / 2;
Don't forget this will fail if the sum of the two values exceeds int bounds, and additionally you should be using size_t to express sizes, as a size should never be negative.
Additionally, there's no point in casting the result of an int calculation to int.
Regarding this C++ lesson, I think anything that bucks the principles laid out in the Standard Library had better have a good reason for doing so. While an academic exploration of the benefits of pointers vs. iterators vs. references is always encouraged, advocating pointers "whenever possible" is a bad plan and comes from a C mindset.
In a modern C++ course this assignment would revolve around computing the median of an unsorted Standard Library container of an arbitrary type by defining a template function.
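For illustration, here is a sketch of what that could look like, using std::nth_element over any non-empty random-access container of numbers (the name median and the double return type are just choices for this sketch):
#include <algorithm>
#include <vector>

template <typename Container>
double median(Container c)  // take a copy: nth_element reorders the elements
{
    const auto n = c.size() / 2;
    std::nth_element(c.begin(), c.begin() + n, c.end());
    const double upper = c[n];
    if (c.size() % 2 == 1)
        return upper;  // odd count: the single middle element
    // Even count: the other middle value is the largest of the lower half.
    const double lower = *std::max_element(c.begin(), c.begin() + n);
    return (lower + upper) / 2.0;
}

// e.g. median(std::vector<int>{3, 1, 4, 1, 5}) == 3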
Just so you can get a concrete feel for what people mean when they say "modern C++ doesn't use pointers":
One alternative is to use std::vector:
int medianArray(const std::vector<int> &vec)
{
    int sizeArray = vec.size();
    if (sizeArray % 2 == 1)
    {
        return vec[sizeArray / 2]; // odd: the single middle element
    }
    else
        return (vec[sizeArray / 2 - 1] + vec[sizeArray / 2]) / 2; // even: average the two middles
}
Note that the vector knows how big it is. It also won't run off either end if you use vec.at(index).
I was wondering whether (apart from the obvious syntax differences) there would be any efficiency difference between having a class containing multiple instances of an object (of the same type) or a fixed size array of objects of that type.
In code:
struct A {
    double x;
    double y;
    double z;
};
struct B {
    double xvec[3];
};
In reality I would be using boost::array, which is a better C++ alternative to C-style arrays.
I am mainly concerned with construction/destruction and reading/writing such doubles, because these classes will often be constructed just to invoke one of their member functions once.
Thank you for your help/suggestions.
Typically the representation of those two structs would be exactly the same. It is, however, possible to have poor performance if you pick the wrong one for your use case.
For example, if you need to access each element in a loop, with an array you could do:
for (int i = 0; i < 3; i++)
    dosomething(xvec[i]);
However, without an array, you'd either need to duplicate code:
dosomething(x);
dosomething(y);
dosomething(z);
This duplication can go either way: on the one hand there's no loop overhead; on the other hand, very tight loops can be quite fast on modern processors, and code duplication can blow away the I-cache.
The other option is a switch:
for (int i = 0; i < 3; i++) {
    double *r;
    switch (i) {
        case 0: r = &x; break;
        case 1: r = &y; break;
        case 2: r = &z; break;
    }
    dosomething(*r); // assume this is some big inlined code
}
This avoids the possibly-large i-cache footprint, but has a huge negative performance impact. Don't do this.
On the other hand, it is, in principle, possible for array accesses to be slower, if your compiler isn't very smart:
xvec[0] = xvec[1] + 1;
dosomething(xvec[1]);
Since xvec[0] and xvec[1] are distinct, in principle, the compiler ought to be able to keep the value of xvec[1] in a register, so it doesn't have to reload the value at the next line. However, it's possible some compilers might not be smart enough to notice that xvec[0] and xvec[1] don't alias. In this case, using separate fields might be a very tiny bit faster.
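If you're worried about that, caching the value in a local variable sidesteps the aliasing question entirely. A tiny sketch:
double t = xvec[1]; // load once into a local; no aliasing can force a reload
xvec[0] = t + 1;
dosomething(t);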
In short, it's not about one or the other being fast in all cases. It's about matching the representation to how you use it.
Personally, I would suggest going with whatever makes the code working on xvec most natural. It's not worth spending a lot of human time worrying about something that, at best, will probably only produce such a small performance difference that you'll only catch it in micro-benchmarks.
MSVC++ 2010 generated exactly the same code for reading/writing from two POD structs like in your example. Since the offsets to read/write are computable at compile time, this is not surprising. The same goes for construction and destruction.
As for the actual performance, the general rule applies: profile it if it matters, if it doesn't - why care?
Indexing into an array member is perhaps a bit more work for the user of your struct, but then again, he can more easily iterate over the elements.
In case you can't decide and want to keep your options open, you can use an anonymous union:
#include <iostream>

struct Foo
{
    union
    {
        struct
        {
            double x;
            double y;
            double z;
        } xyz;
        double arr[3];
    };
};

int main()
{
    Foo a;
    a.xyz.x = 42;
    std::cout << a.arr[0] << std::endl;
}
Some compilers also support anonymous structs, in that case you can leave the xyz part out.
It depends. For instance, the example you gave is a classic one in favor of 'old-school' arrays: a math point/vector (or matrix)
has a fixed number of elements
the data itself is usually kept private in an object
since (if?) it has a class as an interface, you can properly initialize the elements in the constructor (otherwise, classic array initialization is something I don't really like, syntax-wise)
In such cases (going with the math vector/matrix examples), I always ended up using C-style arrays internally, as you can loop over them instead of writing copy/pasted code for each component.
But this is a special case -- for me, in C++ nowadays arrays == STL vector, it's fast and I don't have to worry about nuthin' :)
The difference can be in how the variables are stored in memory. In the first example the compiler can add padding to align the data. But in your particular case it doesn't matter.
Raw arrays can offer better cache locality than C++ containers. As presented here, however, the array example's only advantage over the multiple separate members is the ability to iterate over the elements.
The real answer is, of course: create a test case and measure.