C++ Eigen Matrix clarifications

I have only recently started exploring the C++ Eigen library and am a little puzzled by some of the documentation. It would be great if someone could clarify this.
In the common pitfalls (https://eigen.tuxfamily.org/dox-devel/TopicPitfalls.html) Alignment Issues section, it says, "Indeed, since C++17, C++ does not have quite good enough support for explicit data alignment."
On the page about how to get rid of alignment issues (https://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html#getrid), the documentation says, "If you can target [C++17] only with a recent compiler (e.g., GCC >= 7, Clang >= 5, MSVC >= 19.12), then you're lucky: enabling C++17 should be enough."
So is alignment not an issue with Eigen Matrix if I am using C++17 with GCC >= 7.0? Have I understood this right? And does that mean the macro EIGEN_MAKE_ALIGNED_OPERATOR_NEW won't be needed? And if this is correct, what is different between C++14 and C++17 that takes care of the alignment issues?
The second question is regarding the pass-by-value section (https://eigen.tuxfamily.org/dox-devel/group__TopicPassingByValue.html). The documentation claims that pass-by-value could be illegal and could crash the program. This is very puzzling to me. Wouldn't pass-by-value just invoke a copy constructor? As an example:
Eigen::Vector3f veca = ComputeVecA();
Eigen::Vector3f vecb = veca; //< If pass-by-value is unsafe, is this operation safe?
And lastly, can I rely on RVO/NRVO for Eigen's fixed-size matrix classes? I suspect the answer to this is yes.

In the common pitfalls (https://eigen.tuxfamily.org/dox-devel/TopicPitfalls.html) Alignment Issues section, it says, "Indeed, since C++17, C++ does not have quite good enough support for explicit data alignment."
This seems to be a typo. It should say "until C++17" instead of "since C++17", because C++17 actually added support for allocating types with extended alignment requirements. Two comments agree with me.
On the page about how to get rid of alignment issues (https://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html#getrid), the documentation says, "If you can target [C++17] only with a recent compiler (e.g., GCC >= 7, Clang >= 5, MSVC >= 19.12), then you're lucky: enabling C++17 should be enough."
So is alignment not an issue with Eigen Matrix if I am using C++17 with GCC >= 7.0? Have I understood this right? And does that mean the macro EIGEN_MAKE_ALIGNED_OPERATOR_NEW won't be needed?
Yes.
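For example, the classic case the Eigen docs warn about is a heap-allocated struct holding a fixed-size vectorizable member. A minimal sketch (the type Foo is made up for illustration, and it assumes Eigen is available):

#include <Eigen/Core>

struct Foo {
    Eigen::Vector4f v;   // fixed-size vectorizable member, needs 16-byte alignment
    // Pre-C++17 you would add here:
    // EIGEN_MAKE_ALIGNED_OPERATOR_NEW
};

int main() {
    // With -std=c++17 and a recent compiler (GCC >= 7, Clang >= 5, MSVC >= 19.12),
    // plain new already returns suitably aligned storage for Foo, so the macro
    // is unnecessary.
    Foo* f = new Foo;
    delete f;
}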
And if this is correct, what is different between C++14 and C++17 that takes care of the alignment issues?
C++17 added dynamic memory allocation for over-aligned data: operator new now properly allocates over-aligned memory when passed a std::align_val_t argument.
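A small self-contained illustration of that mechanism (the OverAligned type here is just an example, not something from Eigen):

#include <cassert>
#include <cstdint>
#include <new>

struct alignas(32) OverAligned { float data[8]; };

int main() {
    // In C++17 this new-expression is routed to
    // operator new(std::size_t, std::align_val_t), so the returned pointer
    // satisfies the 32-byte requirement.
    OverAligned* p = new OverAligned;
    assert(reinterpret_cast<std::uintptr_t>(p) % 32 == 0);
    delete p;
}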
The second question is regarding the pass-by-value section (https://eigen.tuxfamily.org/dox-devel/group__TopicPassingByValue.html). The documentation claims that pass-by-value could be illegal and could crash the program. This is very puzzling to me. Wouldn't pass-by-value just invoke a copy constructor?
If the variable is a local variable (as vecb is in your example), then the compiler and the library take care to ensure that vecb meets the special alignment restriction required by Eigen. However, if the variable is a function parameter, this alignment restriction is not necessarily respected, meaning the program may operate on ill-aligned memory and thus crash. (This has little to do with the copy constructor.)
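To make the distinction concrete, here is a sketch of the two signatures (the function names are made up; the recommendation to take fixed-size vectorizable types by const reference is the one from the Eigen docs):

#include <Eigen/Core>

// Potentially problematic on old compilers/ABIs: the 16-byte alignment of the
// by-value parameter may not be honoured on the stack.
float sum_by_value(Eigen::Vector4f v) { return v.sum(); }

// The form the Eigen docs recommend instead: pass by const reference.
float sum_by_ref(const Eigen::Vector4f& v) { return v.sum(); }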
And lastly, can I rely on RVO/NRVO for Eigen's fixed-size matrix classes? I suspect the answer to this is yes.
The answer is pretty much the same for Eigen classes and other classes: try and see. Usually the answer is yes.
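For instance, a typical factory function (a hypothetical example, not from the question):

#include <Eigen/Core>

Eigen::Matrix4f make_identity() {
    Eigen::Matrix4f m = Eigen::Matrix4f::Identity();
    // ... modify m ...
    return m;   // NRVO usually applies here; at worst it falls back to a copy
}

int main() {
    // With NRVO, m is constructed directly in r's storage, so no extra copy is made.
    Eigen::Matrix4f r = make_identity();
    (void)r;
}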

Q1: As already commented, this was a typo introduced when updating this paragraph for C++17. It has already been fixed.
Q2: I don't remember all the details about this one, but it is related to two technical issues.
Some compilers failed to properly align the stack; in that case it is hopeless to get aligned function parameters.
Old ABI specifications did not allow over-alignment of function parameters.
I would expect that since C++11 and the standardized alignas keyword this is no longer an issue, but it may still be a problem on some exotic compiler/OS combinations.
Q3: There is nothing preventing RVO/NRVO, and in my experience, when it can apply, it does apply.

Related

memset and a dynamic array of std::complex<double>

Since std::complex is a non-trivial type, compiling the following with GCC 8.1.1
complex<double>* z = new complex<double>[6];
memset(z, 0, 6 * sizeof *z);
delete[] z;
produces a warning
clearing an object of non-trivial type
My question is, is there actually any potential harm in doing so?
The behavior of std::memset is only defined if the pointer it is modifying is a pointer to a TriviallyCopyable type. std::complex is guaranteed to be a LiteralType, but, as far as I can tell, it isn't guaranteed to be TriviallyCopyable, meaning that std::memset(z, 0, ...) is not portable.
That said, std::complex has an array-compatibility guarantee, which states that the storage of a std::complex<T> is exactly two consecutive Ts and can be reinterpreted as such. This seems to suggest that std::memset is actually fine, since it would be accessing through this array-oriented access. It may also imply that std::complex<double> is TriviallyCopyable, but I am unable to determine that.
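For reference, the array-oriented access that guarantee allows looks like this (a minimal sketch):

#include <cassert>
#include <complex>

int main() {
    std::complex<double> z[4] = {{1, 2}, {3, 4}, {5, 6}, {7, 8}};
    // Array-compatibility guarantee: reinterpret_cast<double*>(z)[2*i] and
    // [2*i + 1] designate the real and imaginary parts of z[i].
    double* d = reinterpret_cast<double*>(z);
    assert(d[2] == z[1].real() && d[3] == z[1].imag());
}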
If you wish to do this, I would suggest being on the safe side and static_asserting that std::complex<double> is TriviallyCopyable:
static_assert(std::is_trivially_copyable<std::complex<double>>::value);
If that assertion holds, then you are guaranteed that the memset is safe.
In either case, it would be safe to use std::fill:
std::fill(z, z + 6, std::complex<double>{});
It optimizes down to a call to memset, albeit with a few more instructions before it. I would recommend using std::fill unless your benchmarking and profiling showed that those few extra instructions are causing problems.
Never, never, ever memset non-POD types. They have constructors for a reason. Just writing a bunch of bytes on top of them is highly unlikely to give the desired result (and if it does, either the types themselves are badly designed, as they should clearly just be POD in the first place, or you are simply being unlucky that Undefined Behaviour seems to work in this case; have fun debugging it when it doesn't, after you change optimization level, compiler or platform (or moon phase)).
Just don't do this.
The answer to this question is that for a standard-compliant std::complex there is no need for memset after new.
new complex<double>[6] will initialize each complex to (0, 0) because it calls a default (non-trivial) constructor that initializes them to zero.
(I think this is a mistake unfortunately.)
https://en.cppreference.com/w/cpp/numeric/complex/complex
If the code posted was just an example with missing code between new and memset, then std::fill will do the right thing.
(In part because the specific standard library implementation knows internally how std::complex is implemented.)
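A quick way to check that claim (a minimal sketch, not part of the original code):

#include <complex>
#include <iostream>

int main() {
    // Each element is constructed by std::complex's default constructor, which
    // value-initializes both parts, so everything starts out as (0,0) and no
    // memset is required.
    std::complex<double>* z = new std::complex<double>[6];
    for (int i = 0; i < 6; ++i)
        std::cout << z[i] << '\n';   // prints (0,0) six times
    delete[] z;
}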

should std::vector honour alignof(value_type)?

If I define a simple type with a certain alignment requirement, shouldn't a std::vector<T> of said type honour the alignment for every single element?
Consider the following example
typedef std::array<double,3> alignas(32) avx_point;
std::vector<avx_point> x(10);
assert(!(std::ptrdiff_t(&(x[0]))&31) &&   // assert that x[0] is 32-byte aligned
       !(std::ptrdiff_t(&(x[1]))&31));    // assert that x[1] is 32-byte aligned
I found that the alignment requirement is silently (without any warning) violated by clang 3.2 (with or without -stdlib=libc++), while gcc 4.8.0 issues a warning that it ignores the attributes on the template argument to std::vector (the intel compiler is too daft to understand alignas, but if I use __declspec(align(32)) instead, it behaves like clang). Both create code that triggers the assert.
So, is this correct behaviour or a bug of clang (and icpc) and an issue with gcc?
edit
to answer a question raised in the comments: if I define
typedef typename std::aligned_storage<sizeof(avx_point),
                                      alignof(avx_point)>::type avx_storage;
I get
sizeof (avx_storage) == 32;
alignof(avx_storage) == 32;
but std::vector<avx_storage> still fails to align the first element (and hence all the others too) for clang and gcc (without warning this time). So there are apparently two issues with the implementations: first, that std::allocator<type> ignores any alignment requirements even for the first element (illegal?) and second, that no padding is applied to ensure alignment of subsequent elements.
––––––––––––
edit There is a related, more practical question for how to obtain memory suitably aligned for SSE/AVX operations. In contrast, I want to know whether std::vector<> (or std::allocator<>) shouldn't honour alignas as of the C++ standard (as of 2011). None of the answers to that other question are suitable answers to this one.
first, that std::allocator ignores any alignment requirements even for the first element (illegal?)
I'm far from being an expert on allocators but it seems to me that, unfortunately, this is legal behaviour. More precisely, an allocator might ignore the requested alignment. Indeed, [allocator.requirements], 17.6.3.5/6 states:
If the alignment associated with a specific over-aligned type is not supported by an allocator, instantiation of the allocator for that type may fail. The allocator also may silently ignore the requested alignment.
You can write your own allocator to give you aligned memory. I've done that before at my work but, unfortunately, for copyright reasons, I cannot disclose the code :-( All I can say is the obvious thing: it was based on _aligned_malloc and _aligned_free (which are Microsoft extensions). Or you can Google for "aligned allocator" and a few options will come up, one of which is
https://gist.github.com/donny-dont/1471329
I emphasize that I'm not the author of this aligned allocator and I've never used it.
Update
The aligned allocator above is for Visual Studio/Windows, but it can be used as a base for implementing aligned allocators on other platforms. You can use the posix_memalign family of functions or the C11 function aligned_alloc.
See this post.
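For reference, one modern way to write such an allocator (a minimal sketch assuming C++17, not the allocator from the gist above) is to forward to the over-aligned forms of operator new/delete, which on POSIX systems typically end up in aligned_alloc or posix_memalign anyway:

#include <cstddef>
#include <new>

template <typename T, std::size_t Alignment = alignof(T)>
struct aligned_allocator {
    using value_type = T;

    template <typename U>
    struct rebind { using other = aligned_allocator<U, Alignment>; };

    aligned_allocator() noexcept = default;
    template <typename U>
    aligned_allocator(const aligned_allocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        // C++17: request over-aligned storage explicitly.
        return static_cast<T*>(
            ::operator new(n * sizeof(T), std::align_val_t{Alignment}));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Alignment});
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return false; }

// Usage with the question's type would then be, e.g.:
// std::vector<avx_point, aligned_allocator<avx_point, 32>> x(10);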

Alignment and the STL in VS 2012/VC11

I have a vague memory of the STL having trouble with aligned structs (e.g. SIMD vectors placed in a std::vector), unless you specify a custom allocator.
According to this document, VS 2012/VC11 has partial support for C++11 alignment. Does this mean that the VS STL implementation can handle aligned structs now, without providing a custom allocator?
No. It means that the VC++ compiler supports a method for specifying the required alignment for a type (the __declspec(align(N)) syntax). VC++ has always supported that, and it is basically listed as "partial" because "we have some alignment-related functionality, and it looks better than saying 'not supported'".
Apart from that, I'm not aware of anything in the C++11 alignment specification which indicates that SIMD vectors in a standard library container are guaranteed to work. C++11 alignment is basically just a formalization of what compilers already did, in this regard (as far as I know; I'd love it if you could prove me wrong).
SIMD vectors are what the standard calls "over-aligned types" (see the part about "extended alignment"). What that means is basically "we guarantee nothing, and it's entirely up to the compiler how/if they handle such types".
In other words, implementing this part of C++11 would not necessarily change how SIMD objects are handled.

The future of C++ alignment: passing by value?

Reading the Eigen library documentation, I noticed that some objects cannot be passed by value. Are there any developments in C++11 or planned developments that will make it safe to pass such objects by value?
Also, why is there no problem with returning such objects by value?
It is entirely possible that Eigen is just a terribly written library (or just poorly-thought out); just because something is online doesn't make it true. For example:
Passing objects by value is almost always a very bad idea in C++, as this means useless copies, and one should pass them by reference instead.
This is not good advice in general; it depends on the object. Passing by reference is sometimes necessary pre-C++11 (because you might want an object to be uncopyable), but in C++11 it is never necessary to always pass by reference: you might still do it, but you can just move the object by value if it contains allocated memory or something. Obviously, if it's a "look-but-don't-touch" sort of thing, const& is fine.
Simple struct objects, presumably like Eigen's Vector2d, are probably cheap enough to copy (especially on x86-64, where pointers are 64 bits) that the copy won't mean much in terms of performance. At the same time, it is (theoretically) overhead, so if you're in performance-critical code, it may help.
Then again, it may not.
The particular crash issue that Eigen seems to be talking about has to do with alignment of objects. However, most C++03 compiler-specific alignment support guarantees that alignment in all cases. So there's no reason that should "make your program crash!". I've never seen an SSE/AltiVec/etc.-based library that used compiler-specific alignment declarations and caused crashes with value parameters. And I've used quite a few.
So if they're having some kind of crash problem with this, then I would consider Eigen to be of... dubious merit. Not without further investigation.
Also, if an object is unsafe to pass by value, as the Eigen docs suggest, then the proper way to handle this would be to make the object non-copy-constructible. Copy assignment would be fine, since it requires an already existing object. However, Eigen doesn't do this, which again suggests that the developers missed some of the finer points of API design.
However, for the record, C++11 has the alignas keyword, which is a standard way to declare that an object shall be of a certain alignment.
Also, why is there no problem with returning such objects by value?
Who says that there isn't (noting the copying problem, not the alignment problem)? The difference is that you can't return a temporary value by reference. So they're not doing it because it's not possible.
They could do this in C++11:
class alignas(16) Matrix4f
{
// ...
};
Now the class will always be aligned on a 16-byte boundary.
Also, maybe I'm being silly but this shouldn't be an issue anyway. Given a class like this:
class Matrix4f
{
public:
    // ...
private:
    // their data type (aligned however they decided in that library):
    aligned_data_type data;

    // or in C++11
    alignas(16) float data[16];
};
Compilers are now obligated to allocate a Matrix4f on a 16-byte boundary anyway, because doing otherwise would break the member; the class-level alignas should be redundant. But I've been known to be wrong in the past, somehow.
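A quick compile-time check of that claim (the class names here are hypothetical):

class alignas(16) Matrix4fA { float data[16]; };
static_assert(alignof(Matrix4fA) == 16, "class-level alignas is honoured");

class Matrix4fB { alignas(16) float data[16]; };
static_assert(alignof(Matrix4fB) == 16,
              "the member's alignment already propagates to the enclosing class");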

What is the fastest portable way to copy an array in C++

This question has been bothering me for some time. The possibilities I am considering are
memcpy
std::copy
cblas_dcopy
Does anyone have any clue on what the pros and cons are with these three? Other suggestions are also welcome.
In C++ you should use std::copy by default unless you have good reasons to do otherwise. The reason is that C++ classes define their own copy semantics via the copy constructor and copy assignment operator, and of the operations listed, only std::copy respects those conventions.
memcpy() uses raw, byte-wise copy of data (though likely heavily optimized for cache line size, etc.), and ignores C++ copy semantics (it's a C function, after all...).
cblas_dcopy() is a specialized function for use in linear algebra routines using double precision floating point values. It likely excels at that, but shouldn't be considered general purpose.
If your data is "simple" POD type struct data or raw fundamental type data, memcpy will likely be as fast as you can get. Just as likely, std::copy will be optimized to use memcpy in these situations, so you'll never know the difference.
In short, use std::copy().
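To make the distinction concrete, a small sketch (not from the original answer):

#include <algorithm>
#include <cstring>
#include <string>

int main() {
    // Trivially copyable data: both are fine, and std::copy typically compiles
    // down to the same memcpy anyway.
    double src[4] = {1, 2, 3, 4}, dst[4];
    std::copy(src, src + 4, dst);
    std::memcpy(dst, src, sizeof src);

    // Non-trivial element type: only std::copy is correct here, because it
    // invokes std::string's copy assignment instead of copying raw bytes.
    std::string a[2] = {"foo", "bar"}, b[2];
    std::copy(a, a + 2, b);
    // std::memcpy(b, a, sizeof a);   // undefined behaviour
}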
Use std::copy unless profiling shows you a needed benefit in doing otherwise. It honours the C++ object encapsulation, invoking copy constructors and assignment operators, and the implementation could include other inline optimisations. That's more maintainable if the types being copied are changed from something trivially copyable to something not.
As PeterCordes comments below, modern compilers such as GCC and clang analyse memcpy() requests internally and typically avoid an out-of-line function call, and even before that some systems had memcpy() macros that inlined copies below a certain size threshold.
FWIW, on the old Linux box I have handy (in 2010), GCC doesn't do any spectacular optimisations, but bits/type_traits.h does allow the program to easily specify whether std::copy should fall through to memcpy() (see code below), so there's no reason to avoid std::copy() in favour of calling memcpy() directly.
* Copyright (c) 1997
* Silicon Graphics Computer Systems, Inc.
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Silicon Graphics makes no
* representations about the suitability of this software for any
* purpose. It is provided "as is" without express or implied warranty.
...
/*
This header file provides a framework for allowing compile time dispatch
based on type attributes. This is useful when writing template code.
For example, when making a copy of an array of an unknown type, it helps
to know if the type has a trivial copy constructor or not, to help decide
if a memcpy can be used.
The class template __type_traits provides a series of typedefs each of
which is either __true_type or __false_type. The argument to
__type_traits can be any type. The typedefs within this template will
attain their correct values by one of these means:
1. The general instantiation contain conservative values which work
for all types.
2. Specializations may be declared to make distinctions between types.
3. Some compilers (such as the Silicon Graphics N32 and N64 compilers)
will automatically provide the appropriate specializations for all
types.
EXAMPLE:
//Copy an array of elements which have non-trivial copy constructors
template <class _Tp> void
copy(_Tp* __source,_Tp* __destination,int __n,__false_type);
//Copy an array of elements which have trivial copy constructors. Use memcpy.
template <class _Tp> void
copy(_Tp* __source,_Tp* __destination,int __n,__true_type);
//Copy an array of any type by using the most efficient copy mechanism
template <class _Tp> inline void copy(_Tp* __source,_Tp* __destination,int __n) {
copy(__source,__destination,__n,
typename __type_traits<_Tp>::has_trivial_copy_constructor());
}
*/
memcpy. However, if your array contains non-trivial objects, stick with std::copy.
In most cases memcpy will be the fastest, as it is the lowest level and may be implemented in machine code on a given platform. (However, if your array contains non-trivial objects, memcpy may not do the correct thing, so it may be safer to stick with std::copy.)
However, it all depends on how well the standard library is implemented on the given platform, etc. As the standard does not say how fast operations must be, there is no way to know in a "portable" sense what will be fastest.
Profiling your application will show the fastest choice on a given platform, but will only tell you about the test platform.
However, when you profile your application you will most likely find that the issues are in your design rather than your choice of array copy method. (E.g. why do you need to copy large arrays so much?)
memcpy is probably the fastest way to copy a contiguous block of memory. This is because it will likely be highly optimized to your particular bit of hardware. It is often implemented as a built-in compiler function.
Having said that, a non-POD C++ object's data is unlikely to be contiguous, and therefore copying arrays of C++ objects using memcpy is likely to give you unexpected results. When copying arrays (or collections) of C++ objects, std::copy will use the object's own copy semantics and is therefore suitable for use with non-POD C++ objects.
cblas_dcopy looks like a copy for use with a specific library and probably has little use when not using that library.
I have to think that the others will call memcpy(). Having said that, I can't believe that there will be any appreciable difference.
If it really matters to you, code all three and run a profiler, but it might be better to consider things like readability/maintainability, exception safety, etc. (and code an assembler insert while you are at it, not that you are likely to see a difference).
Is your program threaded?
And, most importantly, how are you declaring your array (what is it an array of), and how large is it?
I've made a small benchmark (VS 2018 Preview, MKL 2017 Update 4) to compare memcpy and the sequential version of cblas_?copy and found them to be equally fast on float and double.
Just profile your application. You will likely find that copying is not the slowest part of it.