Transitive effect of Eigen EIGEN_MAKE_ALIGNED_OPERATOR_NEW? - c++

Recently, I was made aware of the potential issues of memory alignment for Fixed-size vectorizable Eigen objects.
The correct code as stated in the doc:
class Foo
{
...
Eigen::Vector2d v;
...
public:
EIGEN_MAKE_ALIGNED_OPERATOR_NEW
};
...
Foo *foo = new Foo;
I would like to know if this code is ok or not?
class Foo2
{
...
Foo foo;
...
};
...
Foo2 *foo = new Foo2; //?
Or should EIGEN_MAKE_ALIGNED_OPERATOR_NEW be again added in the Foo2 class?
This is what is suggested here I think:
If we were to add EIGEN_MAKE_ALIGNED_OPERATOR_NEW this would only solve the problem for the Cartographer library itself. Users of the library would also have to add EIGEN_MAKE_ALIGNED_OPERATOR_NEW to classes containing vectorized Cartographer classes. This sounds like a maintenance nightmare.
I have no experience with new operator overloading. I think the question is more general and is related to how the new operator works in C++. For instance, is the overloaded new operator in Foo called by the default new operator in Foo2? What about inheritance? If Foo2 inherits from Foo, should we also put EIGEN_MAKE_ALIGNED_OPERATOR_NEW in Foo2?
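The lookup rules behind this question can be checked without Eigen. The following is a minimal sketch, assuming that a class-level operator new (which is what EIGEN_MAKE_ALIGNED_OPERATOR_NEW provides) behaves like any other static member function. The names g_custom_new_calls, HasFooMember and DerivedFromFoo are made up for illustration, and Foo here is a stand-in, not the class from the question.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts how often the class-specific allocation function runs.
static int g_custom_new_calls = 0;

// Stand-in for a class that declares its own operator new.
struct Foo {
    double v[2];

    static void* operator new(std::size_t size) {
        ++g_custom_new_calls;
        if (void* p = std::malloc(size)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) noexcept { std::free(p); }
};

// Composition: Foo is only a member, so Foo::operator new is not in scope.
struct HasFooMember {
    Foo foo;
};

// Inheritance: Foo::operator new is found by ordinary member name lookup.
struct DerivedFromFoo : Foo {
    int extra;
};

int custom_new_calls_for_member() {
    g_custom_new_calls = 0;
    HasFooMember* p = new HasFooMember;  // uses the global ::operator new
    delete p;
    return g_custom_new_calls;
}

int custom_new_calls_for_derived() {
    g_custom_new_calls = 0;
    DerivedFromFoo* p = new DerivedFromFoo;  // uses Foo::operator new
    delete p;
    return g_custom_new_calls;
}
```

Under these assumptions the custom allocator is skipped for the containing class and used for the derived class, which suggests that a class holding Foo as a member needs its own aligned operator new, while a class inheriting from Foo picks it up automatically.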
Since I only became aware of this topic recently, I did a lot of research and found the following:
default alignment on x86-64 is 16 bytes, so it is fine to not have EIGEN_MAKE_ALIGNED_OPERATOR_NEW (if only SSE is enabled)
unless your code is compiled for more recent SIMD sets (e.g. AVX2 with -march=native to have all the optimizations on a local computer), EIGEN_MAKE_ALIGNED_OPERATOR_NEW is now needed
what about other architectures? For instance on ARM, is there any issue if we don't declare EIGEN_MAKE_ALIGNED_OPERATOR_NEW and NEON is enabled?
I found suggestion to use template <typename Scalar> using Isometry3 = Eigen::Transform<Scalar, 3, Eigen::Isometry | Eigen::DontAlign> instead of Isometry
still need to think on how to be able to easily use Eigen type (e.g. Isometry3d) in the code without the alignment issue. So add a new type MyIsometry3d that inherits from Eigen::Transform<double, 3, Eigen::Isometry | Eigen::DontAlign> for instance?
More generally, I would like to "disable alignment" (or vectorization) in fixed-size Eigen type:
I would like to keep the syntax, for instance keeping Isometry3d in the code
and not be bothered with alignment issue when using Isometry3d in a class or when using std::vector<Isometry3d>
something to tell Eigen to always use unaligned load/store (e.g. _loadu_/_storeu_ for x86-64 intrinsics, what about the other architecture, is there an equivalent?) for all fixed-size Eigen type?
else just disable vectorization for fixed-size Eigen type since I believe penalty should be (almost) null between using vectorized instructions and just C++ code for these types
so I guess the solution is to use #define EIGEN_UNALIGNED_VECTORIZE 0, is it correct? So I have to put this #define everywhere before any Eigen/Dense include?
I don't want to replace everywhere with something like Matrix<double,2,2,DontAlign> or a new class
Finally, looking at Fixed-size vectorizable Eigen objects page, I think some types are missing. For the types I am using:
Eigen::Isometry3d, Eigen::Isometry3f?
Eigen::AngleAxisd, Eigen::AngleAxisf?
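One way to reason about the alignment points above without Eigen is to use a stand-in over-aligned type. The sketch below assumes a C++17 compiler, where a plain new-expression on an over-aligned type goes through the aligned allocation functions. Packet and Holder are hypothetical names, not Eigen types.

```cpp
#include <cstdint>
#include <memory>

// Stand-in for a fixed-size vectorizable type that wants 32-byte alignment
// (hypothetical; not an Eigen type).
struct alignas(32) Packet {
    double v[4];
};

// A class that merely contains a Packet takes over its alignment requirement.
struct Holder {
    int id;
    Packet p;
};

static_assert(alignof(Holder) == alignof(Packet),
              "the member's alignment propagates to the enclosing class");

// Assumes C++17: new on an over-aligned type calls
// operator new(std::size_t, std::align_val_t), so the result is aligned.
bool heap_holder_is_aligned() {
    std::unique_ptr<Holder> h(new Holder);
    auto address = reinterpret_cast<std::uintptr_t>(h.get());
    return address % alignof(Holder) == 0;
}
```

The alignment requirement itself is transitive at the type level. What is not transitive before C++17 is the allocation function that honours it, which is why the macro exists.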

Related

How to integrate CUDA into an existing class structure?

I have a working CPU-based implementation of a simple deep learning framework where the main components are nodes of a computation graph which can perform computations on tensors.
Now I need to extend my implementation to the GPU. I would like to keep the existing class structure and only extend its functionality to the GPU; however, I'm not sure if that's even possible.
Most of the classes have methods that work on and return tensors such as:
tensor_ptr get_output();
where tensor_ptr is simply a std::shared_ptr to my tensor class. Now what I would like to do is to add a GPU version of each such method. The idea I had in mind was to define a struct in a separate file tensor_gpu.cuh as follows
struct cu_shape {
int n_dims;
int x,y,z;
int len;
};
struct cu_tensor {
__device__ float * array;
cu_shape shape;
};
and then the previous function would be mirrored by:
cu_tensor cu_get_output();
The problem seems to be that the .cuh file gets treated as a regular header file and is compiled by the default c++ compiler and gives error:
error: attribute "device" does not apply here
on the line with the definition of __device__ float * array.
I am aware that you cannot mix CUDA and pure C++ code, so I planned to hide all the CUDA runtime API functions in .cu files, with their declarations in .h files. The problem is that I wanted to store the device pointers within my class and then pass those to the CUDA-calling functions.
This way I could still use all of my existing object structure and only modify the initialization and computation parts.
If a regular c++ class cannot touch anything with __device__ flag then how can you even integrate CUDA code into C++ code?
Can you only use CUDA runtime calls and keywords literally just in .cu files?
Or is there some smart way to hide the fact from c++ compiler that it is dealing with CUDA pointers?
Any insight is deeply appreciated!
EDIT: There seems to be a misunderstanding on my part. You don't need to put the __device__ flag and you'll still be able to use it as a pointer to device memory. If you have something valuable to add to good practices on CUDA integration or clarify something else, don't hesitate!
Identifiers containing '__' are reserved for the implementation. That is why the Nvidia implementation can use __device__. But any other "regular" C++ implementation has its own reserved symbols.
In hindsight Nvidia could have designed a better solution but that is not going to help you here.
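The point made in the EDIT, that a device address fits in an ordinary pointer, can be sketched as a plain C++ header plus an implementation file. This is only an illustration: cu_tensor_create and cu_tensor_destroy are hypothetical names, and the implementation below is a host-only stand-in so that it compiles without the CUDA toolkit. In a real build the marked calls would be cudaMalloc and cudaFree inside a .cu file.

```cpp
#include <cstddef>
#include <cstdlib>

// ---- tensor_gpu.h: plain C++, safe to include from any .cpp file ----
struct cu_shape {
    int n_dims;
    int x, y, z;
    int len;
};

struct cu_tensor {
    float* array;    // would hold a device address; no __device__ qualifier
    cu_shape shape;
};

cu_tensor cu_tensor_create(cu_shape shape);   // hypothetical helper
void cu_tensor_destroy(cu_tensor& t);         // hypothetical helper

// ---- tensor_gpu.cu in a real build; host stand-in here ----
cu_tensor cu_tensor_create(cu_shape shape) {
    cu_tensor t{nullptr, shape};
    std::size_t bytes = static_cast<std::size_t>(shape.len) * sizeof(float);
    // Assumption: with CUDA this line would be cudaMalloc(&t.array, bytes).
    t.array = static_cast<float*>(std::malloc(bytes));
    return t;
}

void cu_tensor_destroy(cu_tensor& t) {
    // Assumption: with CUDA this line would be cudaFree(t.array).
    std::free(t.array);
    t.array = nullptr;
}
```

With this split the host compiler only ever sees a float* and two function declarations, so existing classes can store a cu_tensor and hand it to functions whose bodies live in .cu files.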

C++ STM32 user defined class constructor problems

I'm having a problem with self-made classes. I have a class to which I can pass a data structure. If I call the function from the "old main.cpp", it fills a pre-existing structure and initializes the hardware from this info.
main.cpp (old way of handling, which works):
UART UARTObj;
IO_t UART1_RX;
IO_t UART1_TX;
...
IOObj.begin(&UART1_RX, GPIOA, 3, GPIO_Mode_AF, GPIO_OType_PP, GPIO_PuPd_UP, GPIO_Speed_Level_3, GPIO_AF_1);
UARTObj.begin(USART2, 230400, &UART1_RX, &UART1_TX);
Because I want to keep my pin assignments in one place, I created a class called IOPin.
IOPin.h :
typedef struct IO_t{
GPIO_InitTypeDef GPIOInfo;
GPIO_TypeDef* GPIOx;
uint8_t GPIO_AF;
bool init;
}IO_t;
class IOPin
{
public:
IOPin(GPIO_TypeDef*, uint16_t, GPIOMode_TypeDef, GPIOOType_TypeDef, GPIOPuPd_TypeDef, GPIOSpeed_TypeDef);
IOPin(GPIO_TypeDef*, uint16_t, GPIOMode_TypeDef, GPIOOType_TypeDef, GPIOPuPd_TypeDef, GPIOSpeed_TypeDef, uint8_t GPIO_AF);
IO_t *PIN = new IO_t;
virtual
~IOPin ();
};
The theory is that I call the constructor with the info that is required for each object.
Later on, I call a function with this object. I take the struct from the object and pass it through the same function as in the old way.
main.cpp (new way of handling, which gives problems):
IOPin UART_RX(GPIOA, 3, GPIO_Mode_AF, GPIO_OType_PP, GPIO_PuPd_UP, GPIO_Speed_Level_3, GPIO_AF_1);
IOPin UART_TX(GPIOA, 2, GPIO_Mode_AF, GPIO_OType_PP, GPIO_PuPd_UP, GPIO_Speed_Level_3, GPIO_AF_1);
....
IOObj.begin(&UART_RX);
IOObj.begin(&UART_TX);
UARTObj.begin(USART2, 230400, &UART_RX, &UART_TX);
I'm using GDB as debugger, and cannot see anything that is wrong.
Problems:
If I rebuild the project, it works once.
Resetting the platform does not help.
Does anyone have an idea why this approach with the class does not work?
I've tried making this a pointer, putting it into the header file, etc.
OK here's some tips that may eventually pan out to a full answer because it's hard to see exactly what's going on from the incomplete fragments posted in the question and this is going to be too long for a comment:
Don't use the heap when the stack will do. The C++11 declaration IO_t *PIN = new IO_t appears to be trivially replaceable with IO_t PIN. Where is PIN initialised with valid content? You don't show this, nor does PIN ever seem to be deleted.
Don't declare members virtual unless there's a very good reason for it. A virtual member instantly introduces a virtual function table, which is implemented in SRAM, which is your scarcest resource. Best practices that you were taught for PC programming don't apply here.
Well, first, I would definitely not use the dynamic allocation that comes with the toolchain. Better still, I wouldn't use dynamic allocation at all. It is a microcontroller: you run in privileged mode and have access to all of the memory available on board.
Second, check your linker script and the initialization part. Make sure it is properly set up, especially your vtable.
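The advice to avoid the heap can be sketched as follows: the IO_t lives inside the object and is filled in by the constructor. This is an illustration under assumptions, not the poster's code. FakeGpioPort and FakeGpioInit are hypothetical stand-ins for the vendor types GPIO_TypeDef and GPIO_InitTypeDef, the constructor takes fewer arguments than the original, and io() is an invented accessor.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the vendor types, so the sketch compiles
// without the STM32 peripheral library.
struct FakeGpioPort { std::uint32_t moder; };
struct FakeGpioInit { std::uint16_t pin; std::uint8_t mode; };

struct IO_t {
    FakeGpioInit GPIOInfo;
    FakeGpioPort* GPIOx;
    std::uint8_t GPIO_AF;
    bool init;
};

class IOPin {
public:
    // Every field of the embedded IO_t gets a defined value here.
    IOPin(FakeGpioPort* port, std::uint16_t pin, std::uint8_t mode,
          std::uint8_t af = 0)
        : pin_{FakeGpioInit{pin, mode}, port, af, false} {}

    // Hands the struct to code that still expects an IO_t*.
    IO_t* io() { return &pin_; }

private:
    IO_t pin_;   // stored by value: no new, no delete, no virtual destructor
};
```

A caller would then pass something like UART_RX.io() to the existing begin() functions, assuming those still take an IO_t*.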

C++ typedef struct vs class

I am not very familiar with C++, and while trying out some test programs I came across a question about the best way, if I may say so, to define some primitive elements in C++ code.
Let's take a class that describes rectangles. It would create them, draw them, rotate them, resize them, etc. In most cases we have to deal with points on the canvas.
The rectangle itself is described by 2 points: the upper left and lower right corners.
Also, in order to rotate it, you need an angle and a point (the anchor point).
Or maybe to move it you need a new anchor point for the given rectangle. I guess I made my point about using points.
So which is more efficient: defining this primitive point as a class or as a struct?
class cPoint
{
public:
int X;
int Y;
};
or
typedef struct
{
int X;
int Y;
}sPoint;
Neither is more efficient. On a technical level, there is no difference between a class and a struct aside from default visibility of members (public in struct, private in class) and default inheritance model (public in struct, private in class).
The typedef struct {} name model is not idiomatic in C++. In fact, it's an abomination -- a holdover from C. Don't use this model. Use struct name {}; instead. Using the typedef struct {} name; model doesn't gain you anything in C++ (it was needed in C), and may cost you something in terms of maintainability. For instance, it might be harder to grep for typedef struct declarations. Since it doesn't gain you anything, there's no compelling reason not to simply write struct name {}; in C++.
Aside from technical issues, the differences between struct and class are semantic ones. It is traditional and expected that objects declared as structs are simple objects which consist of only public: data members (so-called PODs). If it has private or protected data, is expected to be derived from, or has any methods, it is declared as a class.
This guideline is open to interpretation, and is just that -- a guideline. There is nothing to prevent you from declaring an abstract base class as a struct, for example. However you may want to consider following this guideline in order to follow the Principle of Least Surprise, making your code easier to understand and maintain.
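The two default-related differences named in this answer can be checked at compile time. A small sketch, with all type names invented for the illustration:

```cpp
#include <type_traits>

struct Base { int x; };

struct StructDerived : Base {};  // inherits publicly by default
class ClassDerived : Base {};    // inherits privately by default

struct StructPoint { int X; int Y; };  // members are public by default
// class ClassPoint { int X; int Y; };  // members would be private by default

// A pointer to a derived type converts to a base pointer only if the
// base class is accessible.
constexpr bool struct_inherits_publicly =
    std::is_convertible<StructDerived*, Base*>::value;
constexpr bool class_inherits_publicly =
    std::is_convertible<ClassDerived*, Base*>::value;

int read_x() {
    StructPoint p{1, 2};
    return p.X;  // fine: X is public without any access specifier
}
```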
Both are nearly equivalent. More precisely, struct { is the same as class {public:
An optimizing compiler would probably generate exactly the same code. Use MELT or simply pass -fdump-tree-all (beware, that option produces hundreds of dump files) to g++ (assuming you use a recent GCC compiler) -preferably with some optimization like -O - to find out (or look at the produced assembler code with g++ -O -fverbose-asm -S ...)
typedef struct is actually the C way to do this. In C++ the two versions would look very similar: Your class as written, and the struct as follows:
struct sPoint
{
int X;
int Y;
};
The two forms are functionally identical but you can provide your future maintainers with significant information by picking and sticking to some convention about how they're used. For example one approach is that if you intend to make the data elements private and give it useful methods (for example if you use inline accessors you can insert print calls every time the methods are used) then by all means make it a class. If you intend to have the data be public and access them as members then make it a struct.
There's no performance difference between a class and a struct
A class defaults to private access, whilst a struct defaults to public access. If interoperability with C is an issue for you then you'll have to use struct, and obviously it can't have any member functions.
As an aside, there's no std::is_struct in the standard library. Instead, the std::is_class type trait is true if the type is a class or a struct (but not a union).
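A short sketch of what std::is_class reports, using the two point types from the question plus a union and a built-in type for contrast (uPoint is a made-up name):

```cpp
#include <type_traits>

struct sPoint { int X; int Y; };
class cPoint { public: int X; int Y; };
union uPoint { int X; float F; };

// The trait does not distinguish the struct and class keywords.
static_assert(std::is_class<sPoint>::value, "struct counts as a class type");
static_assert(std::is_class<cPoint>::value, "class counts as a class type");

// Unions and fundamental types are not class types for this trait.
static_assert(!std::is_class<uPoint>::value, "a union is reported separately");
static_assert(!std::is_class<int>::value, "int is not a class type");
```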
Simply put, the first way is more C++, and the second way is more C. Both work, while the first way is more 'standard' now.
A struct in C++ is like a class that would have public members by default*
There is no other formal difference, though your code would probably look confusing if you started using structs as classes, especially the inheritance mechanisms where data privacy is a major benefit.
If you are about to declare private/protected members, there is really little point in using a struct, though your code will still be 100% legal.
*including inherited members, since the zealots and nitpickers around seem to think the point is of capital importance and only ignorant heathens would fail to mention it.
Except for the fact that this fine point of doctrine is defined (or rather hinted at, since the inference that base classes are simply defining inherited members is left to the sagacity of the reader) in another verse of the Stroustrup Holy Bible, there is really nothing to fuss about IMHO.
To properly declare the struct in your example, use
struct sPoint {
int X;
int Y;
};
In general, structs and classes in C++ are identical, except that members and base classes are public by default in a struct. The other difference is that the struct keyword cannot be used in place of class or typename to declare a template type parameter, although a struct type can be used as the template argument.
There is a more thorough discussion here: C++ - struct vs. class
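The template remark can be illustrated with a short sketch. Wrapper and wrapped_x are names invented for this example.

```cpp
// 'class' (or 'typename') introduces a template type parameter.
template <class T>
struct Wrapper {
    T value;
};

// template <struct T> struct Bad {};  // ill-formed: 'struct' is not allowed here

struct sPoint {
    int X;
    int Y;
};

int wrapped_x() {
    // A type declared with 'struct' is a perfectly good template argument.
    Wrapper<sPoint> w{{3, 4}};
    return w.value.X;
}
```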
technically, struct{} and class{} are the same.
they differ on semantic level, with different member visibility.
struct{...} is equivalent to class{public:...}
It is also legal to declare a class using the struct keyword (and to add member functions and access specifiers to a struct{}).
Generally, use struct for Plain-Old-Data (POD) types and class for object-oriented types to improve readability.
typedef struct{} should only be used to hide implementation details (e.g. when supplying a closed-source library to users).
In my opinion, using struct is better in your case, because the point's members need to be modified directly by other code.

Is this a proper usage of union

I want to have named fields rather than indexed fields, but for some usage I have to iterate on the fields. Dumb simplified example:
struct named_states {float speed; float position;};
#define NSTATES (sizeof(struct named_states)/sizeof(float))
union named_or_indexed_states {
struct named_states named;
float indexed[NSTATES];
};
...
union named_or_indexed_states states,derivatives;
states.named.speed = 0;
states.named.position = 0;
...
derivatives.named.speed = acceleration;
derivatives.named.position= states.named.speed;
...
/* This code is in a generic library (consider nstates=NSTATES) */
for(i=0;i<nstates;i++)
states.indexed[i] += time_step*derivatives.indexed[i];
This avoids a copy from the named struct to the indexed array and vice versa, replacing it with a generic solution that is easier to maintain (I have very few places to change when I extend the state vector). It also works well with the various compilers I tested (several versions of gcc/g++ and MSVC).
But theoretically, as I understand it, it does not strictly adhere to proper union usage, since I write a named field and then read an indexed field, and I'm not at all sure we can say that they share the same struct fields...
Can you confirm that it's theoretically bad (non-portable)?
Would it be better to use a cast, a memcpy() or something else?
Theory aside, from a pragmatic point of view, is there any REAL portability issue (some incompatible compiler, exotic struct alignment, planned evolutions...)?
EDIT: your answers deserve a bit more clarification about my intentions that were:
to let programmer focus on domain specific equations and release them from maintenance of conversion functions (I don't know how to write a generic one, apart cast or memcpy tricks which do not seem more robust)
to add a bit more coding security by using a struct (fully controlled by the compiler) vs arrays (declaration and access subject to more programmer mistakes)
to avoid polluting namespace too much with enum or #define
I need to know
how portable/dangerous is my steering off the standard (maybe some compiler with aggressive inlining will use full register solution and avoid any memory exchange ruining the trick),
and whether I missed a standard solution that addresses the above concerns in part or in whole.
There's no requirement that the two fields in named_states line up the same way as the array elements. There's a good chance that they do, but you've got a compiler dependency there.
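The compiler dependency mentioned here can at least be detected at build time. A sketch, assuming the named_states layout from the question. Note that these checks only catch unexpected padding; they do not make reading the inactive union member well-defined in C++.

```cpp
#include <cstddef>

struct named_states {
    float speed;
    float position;
};

constexpr std::size_t NSTATES = sizeof(named_states) / sizeof(float);

// Fail the build if the compiler inserted padding between or after fields.
static_assert(sizeof(named_states) == NSTATES * sizeof(float),
              "named_states has trailing padding");
static_assert(offsetof(named_states, speed) == 0 * sizeof(float),
              "speed is not at index 0");
static_assert(offsetof(named_states, position) == 1 * sizeof(float),
              "position is not at index 1");
```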
Here's a simple implementation in C++ of what you're trying to do:
struct named_or_indexed_states {
named_or_indexed_states() : speed(indexed[0]), position(indexed[1]) { }
float &speed;
float &position;
float indexed[2];
};
If the size increase because of the reference elements is too much, use accessors:
struct named_or_indexed_states {
float indexed[2];
float& speed() { return indexed[0]; }
float& position() { return indexed[1]; }
};
The compiler will have no problem inlining the accessors, so reading or writing speed() and position() will be just as fast as if they were member data. You still have to write those annoying parentheses, though.
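To show how the accessor version fits the generic library loop from the question, here is a small sketch. The names states_t, euler_step and position_after_one_step are invented for the illustration.

```cpp
#include <cstddef>

struct states_t {
    static constexpr std::size_t count = 2;
    float indexed[count];

    // Named views onto the array elements.
    float& speed() { return indexed[0]; }
    float& position() { return indexed[1]; }
};

// Generic library code: knows nothing about the field names.
void euler_step(float* states, const float* derivatives,
                std::size_t n, float time_step) {
    for (std::size_t i = 0; i < n; ++i) {
        states[i] += time_step * derivatives[i];
    }
}

float position_after_one_step() {
    states_t states{};
    states_t derivatives{};
    states.speed() = 2.0f;
    states.position() = 0.0f;

    derivatives.speed() = 0.0f;                 // no acceleration
    derivatives.position() = states.speed();    // d(position)/dt = speed

    euler_step(states.indexed, derivatives.indexed, states_t::count, 0.5f);
    return states.position();                   // 0 + 0.5 * 2
}
```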
Only accessing the last written member of a union is well-defined; the code you presented relies, as far as only standard C (or C++) is concerned, on undefined behavior. It may work, but it's the wrong way to do it. It doesn't really matter that the struct uses the same type as the array's element type: there may be padding involved, as well as other invisible tricks used by the compiler.
Some compilers, like GCC, do define it as allowed way to achieve type-punning. Now the question arises - are we talking about standard C (or C++), or GNU or any other extensions?
As for what you should use - proper conversion operators and/or constructors.
This may be a little old-fashioned, but what I would do in this situation is:
enum
{
F_POSITION,
F_SPEED,
F_COUNT
};
float states[F_COUNT];
Then you can reference them as:
states[F_POSITION] and states[F_SPEED].
That's one way that I might write this. I'm sure that there are many other possibilities.
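A sketch of how the enum-indexed array would work with the integration loop from the question. The function speed_after_step is invented for the illustration; F_COUNT sizes the arrays, so adding a state means adding one enumerator before it.

```cpp
#include <cstddef>

enum StateIndex : std::size_t {
    F_POSITION,
    F_SPEED,
    F_COUNT   // must stay last: number of states
};

float speed_after_step(float acceleration, float time_step) {
    float states[F_COUNT] = {};
    float derivatives[F_COUNT] = {};

    // Domain-specific equations use the names...
    derivatives[F_SPEED] = acceleration;
    derivatives[F_POSITION] = states[F_SPEED];

    // ...while the generic update only uses indices.
    for (std::size_t i = 0; i < F_COUNT; ++i) {
        states[i] += time_step * derivatives[i];
    }
    return states[F_SPEED];
}
```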

C++ SLMATH library and SSE optimisation

I have a problem with the SLMATH library. Not sure if anyone uses it or has used it before? Anyway, the issue is that when I compile with SSE optimisation enabled (in VS 2010), I obviously have to provide a container that has the correct byte alignment for SSE type objects. This is OK because there's a little class in SLMATH that's an aligned vector; it aligns the vector allocation on an 8 byte boundary (i.e. I do not use std::vector<>).
Now the problem is that it appears any structure or class that contains something like slm::mat4 must also be aligned on such a boundary too, before it's put into a collection. So, for example, I used an aligned vector to create an array of slm::mat4, but if I create a class called Mesh, and Mesh contains an slm::mat4 and I want to put Mesh into a std::vector, well, I get strange memory errors whilst debugging.
So given the documentation is very sparse indeed, can anyone who's used this library tell me what, precisely, I have to do to use it with SSE optimisation? I mean I don't like the idea of having to use aligned vectors absolutely everywhere in place of std::vector just in case an slm:: component ends up being encapsulated into a class or structure somehow.
Alternatively, a fast vector/matrix/graphics math library as good as SLMATH would be great if there's one around.
Thanks for any advice you can offer.
Edit 1: Simple repro-case not using SLMATH illustrates the problem:
#include <vector>
class Item
{
public:
__declspec(align(8))
struct {
float a, b, c, d;
} Aligned;
};
int main()
{
// Error - won't compile.
std::vector<Item> myItems;
}
Robin
It might work if you use __declspec(align) on your variable declarations, or wrap them in a struct that declares itself to be properly aligned. I have not used the library in question, but it seems that this might be the issue you are facing.
The reference for the align option can be found here.
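For comparison, here is how the repro case might look with the standard alignas specifier instead of __declspec(align). This is a sketch that assumes a C++11 or later compiler, which rules out VS 2010. It uses 16 bytes, the alignment SSE loads expect, rather than the 8 from the question, and Vec4 and all_items_aligned are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Four floats aligned for SSE; the requirement is part of the type.
struct alignas(16) Vec4 {
    float a, b, c, d;
};

class Item {
public:
    Vec4 aligned;   // Item takes over the 16-byte requirement
};

// Returns true if every element stored in a std::vector is aligned.
bool all_items_aligned(std::size_t n) {
    std::vector<Item> items(n);
    for (const Item& item : items) {
        auto address = reinterpret_cast<std::uintptr_t>(&item.aligned);
        if (address % alignof(Vec4) != 0) return false;
    }
    return true;
}
```

Because the alignment is attached to the type, any class that contains a Vec4 is aligned accordingly, and a standard container can then store it without a custom aligned vector, at least for alignments the default allocator already satisfies.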