C++ SLMATH library and SSE optimisation

I have a problem with the SLMATH library. Not sure if anyone uses it or has used it before? Anyway, the issue is that when I compile with SSE optimisation enabled (in VS 2010), I obviously have to provide a container that has the correct byte alignment for SSE type objects. This is OK because there's a little class in SLMATH that's an aligned vector; it aligns the vector allocation on an 8 byte boundary (i.e. I do not use std::vector<>).
Now the problem is that it appears any structure or class that contains something like slm::mat4 must also be aligned on such a boundary too, before it's put into a collection. So, for example, I used an aligned vector to create an array of slm::mat4, but if I create a class called Mesh, and Mesh contains an slm::mat4 and I want to put Mesh into a std::vector, well, I get strange memory errors whilst debugging.
So given the documentation is very sparse indeed, can anyone who's used this library tell me what, precisely, I have to do to use it with SSE optimisation? I mean I don't like the idea of having to use aligned vectors absolutely everywhere in place of std::vector just in case an slm:: component ends up being encapsulated into a class or structure somehow.
Alternatively, a recommendation for a fast vector/matrix/graphics math library as good as SLMATH would be great, if there's one around.
Thanks for any advice you can offer.
Edit 1: Simple repro-case not using SLMATH illustrates the problem:
#include <vector>
class Item
{
public:
__declspec(align(8))
struct {
float a, b, c, d;
} Aligned;
};
int main()
{
// Error - won't compile.
std::vector<Item> myItems;
}
Robin

It might work if you use __declspec(align) on your variable declarations, or wrap them in a struct that declares itself to be properly aligned. I have not used the library in question, but this seems likely to be the issue you are facing.
The reference for the align option can be found here.
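As a rough sketch of the alternative to replacing the container type everywhere, std::vector can be given an allocator that hands back suitably aligned storage. Everything below is an assumption made for illustration: Mat4 is a hypothetical stand-in for an SSE-backed type such as slm::mat4, 16 bytes is assumed to be the required alignment, and AlignedAllocator is not part of SLMATH. It also uses alignas, which needs a C++11 compiler, so it would have to be adapted (for example with __declspec(align(16))) for VS 2010.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-in for an SSE-backed type such as slm::mat4.
struct alignas(16) Mat4 { float m[16]; };

// A class that embeds the aligned type inherits its alignment requirement.
struct Mesh { int id; Mat4 transform; };

// Minimal allocator that returns storage aligned to Align bytes
// (Align is assumed to be a power of two).
template <class T, std::size_t Align>
struct AlignedAllocator {
    typedef T value_type;
    template <class U> struct rebind { typedef AlignedAllocator<U, Align> other; };

    AlignedAllocator() {}
    template <class U> AlignedAllocator(const AlignedAllocator<U, Align>&) {}

    T* allocate(std::size_t n) {
        // Over-allocate: room to round up, plus a slot to remember the raw pointer.
        void* raw = ::operator new(n * sizeof(T) + Align + sizeof(void*));
        std::uintptr_t p = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        p = (p + Align - 1) & ~static_cast<std::uintptr_t>(Align - 1);
        reinterpret_cast<void**>(p)[-1] = raw;  // stored just below the aligned block
        return reinterpret_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) {
        ::operator delete(reinterpret_cast<void**>(p)[-1]);
    }
};

template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return false; }

// Still a std::vector, only with aligned element storage.
typedef std::vector<Mesh, AlignedAllocator<Mesh, 16> > MeshVector;
```

With this, MeshVector keeps the std::vector interface, and the alignment requirement of Mesh follows automatically from its Mat4 member.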

Related

Transitive effect of Eigen EIGEN_MAKE_ALIGNED_OPERATOR_NEW?

Recently, I was made aware of the potential issues of memory alignment for Fixed-size vectorizable Eigen objects.
The correct code as stated in the doc:
class Foo
{
...
Eigen::Vector2d v;
...
public:
EIGEN_MAKE_ALIGNED_OPERATOR_NEW
};
...
Foo *foo = new Foo;
I would like to know if this code is ok or not?
class Foo2
{
...
Foo foo;
...
};
...
Foo2 *foo = new Foo2; //?
Or should EIGEN_MAKE_ALIGNED_OPERATOR_NEW be again added in the Foo2 class?
This is what is suggested here I think:
If we were to add EIGEN_MAKE_ALIGNED_OPERATOR_NEW this would only solve the problem for the Cartographer library itself. Users of the library would also have to add EIGEN_MAKE_ALIGNED_OPERATOR_NEW to classes containing vectorized Cartographer classes. This sounds like a maintenance nightmare.
I have no experience with overloading operator new. I think the question is more general and relates to how operator new works in C++. For instance, is the overloaded operator new in Foo called when a Foo2 is allocated with the default operator new? What about inheritance? If Foo2 inherits from Foo, should EIGEN_MAKE_ALIGNED_OPERATOR_NEW also be put in Foo2?
Since I only became aware of this topic recently, I did a lot of research and found the following:
default alignment on x86-64 is 16 bytes, so it is fine not to have EIGEN_MAKE_ALIGNED_OPERATOR_NEW (if only SSE is enabled)
but if your code is compiled for more recent SIMD sets (e.g. AVX2 with -march=native to have all the optimizations on a local computer), EIGEN_MAKE_ALIGNED_OPERATOR_NEW is needed
what about the other architecture? For instance for ARM, any issue if we don't declare EIGEN_MAKE_ALIGNED_OPERATOR_NEW and NEON is enabled?
I found suggestion to use template <typename Scalar> using Isometry3 = Eigen::Transform<Scalar, 3, Eigen::Isometry | Eigen::DontAlign> instead of Isometry
still need to think on how to be able to easily use Eigen type (e.g. Isometry3d) in the code without the alignment issue. So add a new type MyIsometry3d that inherits from Eigen::Transform<double, 3, Eigen::Isometry | Eigen::DontAlign> for instance?
More generally, I would like to "disable alignment" (or vectorization) in fixed-size Eigen type:
I would like to keep the syntax, for instance keeping Isometry3d in the code
and not be bothered with alignment issue when using Isometry3d in a class or when using std::vector<Isometry3d>
something to tell Eigen to always use unaligned load/store (e.g. _loadu_/_storeu_ for x86-64 intrinsics, what about the other architecture, is there an equivalent?) for all fixed-size Eigen type?
else just disable vectorization for fixed-size Eigen type since I believe penalty should be (almost) null between using vectorized instructions and just C++ code for these types
so I guess the solution is to use #define EIGEN_UNALIGNED_VECTORIZE 0, is it correct? So I have to put this #define everywhere before any Eigen/Dense include?
I don't want to replace everywhere with something like Matrix<double,2,2,DontAlign> or a new class
Finally, looking at Fixed-size vectorizable Eigen objects page, I think some types are missing. For the types I am using:
Eigen::Isometry3d, Eigen::Isometry3f?
Eigen::AngleAxisd, Eigen::AngleAxisf?
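On the general C++ part of the question (whether a class-level operator new carries over to an enclosing or a derived class), the behaviour can be observed without Eigen. The sketch below is only an illustration: Foo here is a plain struct, not an Eigen type, and g_custom_new_calls and custom_new_calls_for are hypothetical names introduced to count calls. A class-specific operator new is looked up in the scope of the class being allocated, so it is inherited by derived classes but plays no part when the class is merely a member of another class.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts how often Foo's class-specific operator new runs (illustration only).
static int g_custom_new_calls = 0;

struct Foo {
    double v[2];

    static void* operator new(std::size_t size) {
        ++g_custom_new_calls;
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) { std::free(p); }
};

// Contains a Foo: "new Holder" uses the global operator new, not Foo's.
struct Holder { Foo foo; };

// Derives from Foo: "new Derived" finds Foo::operator new through inheritance.
struct Derived : Foo { int x; };

// Returns how many times Foo::operator new ran while allocating one T.
template <class T>
int custom_new_calls_for() {
    int before = g_custom_new_calls;
    T* p = new T;
    delete p;
    return g_custom_new_calls - before;
}
```

Here custom_new_calls_for<Foo>() and custom_new_calls_for<Derived>() each report one call, while custom_new_calls_for<Holder>() reports none.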

Defining my 2D array to support [] and () access operations

I am writing an image processing application in C++. To define my image type, I am considering using either a boost multi array, or boost ublas matrix or Eigen 2D matrix. I would like to carefully benchmark these for all various operations I intend to do and choose one accordingly.
However, I cannot afford to pause development. Hence, I would like to start writing code in such a way that it should be easy to swap my image type definition from either ublas, Eigen or multiarray to another. I don't think typedef will save me here, because the element accessor operators are different in these libraries.
For instance, you access elements of 2D array 'myArray' as follows in each of the three libraries :
Boost multiarray : myArray[x][y]
Boost ublas : myArray (x,y)
Eigen 2DMatrix : myArray(x,y)
You can see the conflict is between [][] vs ( _ , _ ) way of accessing elements. Hence, I cannot write code for one type, and make it work with another type using a simple typedef.
Any ideas how to get around this?
I am thinking of wrapping the underlying type in a new universal type that standardizes the access method; then I can simply swap one type for another using a typedef.
Are there any pitfalls I should be worried about?
Will it cost me a lot of efficiency?
Which language features can I best exploit here?
If you could please help me get started, I will write a code and paste it here for further review.
P.S. I am not using any of the rich API of these three types. I am simply creating them, and accessing their elements.

I would use the Proxy pattern in this case. You wrap the underlying object in a proxy that defines a single interface and forwards every access to it. Hope this helps.
Edit:
I guess this link should be useful as well: Template Proxy
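A minimal sketch of such a wrapper, under stated assumptions: BracketGrid and ParenGrid are hypothetical stand-ins for a library type indexed as a[i][j] and one indexed as a(i, j), and Access and Image are names invented here. The real Boost or Eigen types would need their own Access specializations and constructor arguments.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a library type indexed as a[i][j].
struct BracketGrid {
    BracketGrid(std::size_t rows, std::size_t cols)
        : cells(rows, std::vector<int>(cols, 0)) {}
    std::vector<int>& operator[](std::size_t i) { return cells[i]; }
    std::vector<std::vector<int> > cells;
};

// Stand-in for a library type indexed as a(i, j).
struct ParenGrid {
    ParenGrid(std::size_t rows, std::size_t cols)
        : cols_(cols), cells(rows * cols, 0) {}
    int& operator()(std::size_t i, std::size_t j) { return cells[i * cols_ + j]; }
    std::size_t cols_;
    std::vector<int> cells;
};

// One specialization per storage type hides the differing access syntax.
template <class Storage> struct Access;

template <> struct Access<BracketGrid> {
    static int& get(BracketGrid& s, std::size_t i, std::size_t j) { return s[i][j]; }
};

template <> struct Access<ParenGrid> {
    static int& get(ParenGrid& s, std::size_t i, std::size_t j) { return s(i, j); }
};

// The wrapper exposes a single access style, whatever the storage is.
template <class Storage>
class Image {
public:
    Image(std::size_t rows, std::size_t cols) : storage_(rows, cols) {}
    int& operator()(std::size_t i, std::size_t j) {
        return Access<Storage>::get(storage_, i, j);
    }
private:
    Storage storage_;
};

// Changing the storage type means changing this one line.
typedef Image<ParenGrid> MyImage;
```

Because the forwarding functions are small and defined inline, an optimizing compiler would normally be expected to remove the extra call, though that is worth confirming in the benchmark.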

If you don't want to lose any efficiency, you could use a define:
typedef boost::multiarray MyArray ;
#define GET_AT(a,i,j) a[i][j]
Then you just change the typedef and define when you switch type. You could also do a template function (or proper overloading function):
template <class Array>
inline ... getAt (Array <...> const& a, int i, int j) { return a[i][j] ; }
inline ... getAt (2DMatrix <...> const& a, int i, int j) { return a(i,j) ; }
Anyway, if you'd prefer to wrap your class behind a single interface, I think that with optimization enabled there should be no loss of efficiency.
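The overloading idea can be written out in compilable form roughly as follows. This is a sketch with stand-in types rather than the real libraries: NestedArray plays the role of a container indexed as a[i][j], FlatMatrix one indexed as a(i, j), and get_at and trace are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a container indexed as a[i][j].
typedef std::vector<std::vector<float> > NestedArray;

// Stand-in for a container indexed as a(i, j).
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols, 0.0f) {}
    float& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
private:
    std::size_t cols_;
    std::vector<float> data_;
};

// One overload per container type; each uses that type's own access syntax.
inline float& get_at(NestedArray& a, std::size_t i, std::size_t j) { return a[i][j]; }
inline float& get_at(FlatMatrix& a, std::size_t i, std::size_t j) { return a(i, j); }

// Generic code only ever calls get_at, so it compiles against either type.
template <class Array>
float trace(Array& a, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += get_at(a, i, i);
    return sum;
}
```

Switching libraries then means adding one get_at overload and changing the typedef, with the rest of the code untouched.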

cast void* to a struct with an array member

I'm trying to directly cast a stream of data into a structure that actually has a variable number of other structures as members. Here's an example:
struct player
{
double lastTimePlayed;
double timeJoined;
};
struct team
{
uint32_t numberOfPlayers;
player everyone[];
};
then I call:
team *myTeam = (team*)get_stream();
This should work like some kind of serialization, I know my stream is structured exactly as represented above, but I have the problem of the numberOfPlayers being a variable.
My stream starts with 4 bytes representing the number of players of the team, then it contains each player (in this case, each player has only lastTimePlayed and timeJoined).
The code posted seems to be working, though I still get a warning from the compiler because of the default assignment operator and copy constructor. My question is whether it's possible to do this some other, better way.
BTW, my stream is actually a direct mapping to a file, and my goal is to use the structure as if it was the file itself (that part is working properly).

uint32_t is 4 bytes. If it starts with 8 bytes you want a uint64_t.
If you want to get rid of the warning you can make the default copy and assignment private:
struct team {
// ...
private:
team(const team &);
team &operator=(const team &);
};
Since you'd probably want to pass everything by pointer anyways it'll prevent ever doing an accidental copy.
Casting the mapped pointer to the struct is probably the easiest way. The big thing is to just make sure everything is lining up correctly.
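To make the "lining up" concern concrete: a struct holding a uint32_t followed by doubles may have padding after the count on many ABIs, in which case it would not match a stream where the players start right after the first 4 bytes. The sketch below avoids the question by decoding the bytes explicitly. It is only an illustration under assumptions: read_team is a hypothetical helper, the stream is assumed to be a 4-byte count followed immediately by the player records in the machine's native byte order and floating-point format, and it copies the data rather than aliasing the mapped file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct player {
    double lastTimePlayed;
    double timeJoined;
};

// Hypothetical helper: decodes "count, then count players" from raw bytes.
// Returns an empty vector if the buffer is too short for what it claims to hold.
std::vector<player> read_team(const unsigned char* data, std::size_t size) {
    std::vector<player> players;
    std::uint32_t count = 0;
    if (size < sizeof(count)) return players;

    // memcpy has no alignment requirements, unlike a pointer cast.
    std::memcpy(&count, data, sizeof(count));
    const std::size_t offset = sizeof(count);

    // Reject a count that does not fit in the remaining bytes.
    if ((size - offset) / sizeof(player) < count) return players;

    players.resize(count);
    if (count != 0) {
        std::memcpy(&players[0], data + offset, count * sizeof(player));
    }
    return players;
}
```

The cost is one copy per team; whether that matters depends on how large the mapped file is and how often it is read.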

Visual Studio 2012 gives the following:
warning C4200: nonstandard extension used : zero-sized array in struct/union
A structure or union contains an array with zero size.
Level-2 warning when compiling a C++ file and a Level-4 warning when compiling a C file.
This seems to be a legitimate message. I would recommend you to modify your struct to:
struct team
{
uint32_t numberOfPlayers;
player everyone[1];
};
Such a definition is less elegant, but the result will be basically the same. C++ does not check index values, and a lot of existing code relies on this.
New development should avoid the "array size violations" where possible. Describing external structures in this way is acceptable.

Both scaryrawr's solution, and yours do the trick, but I was in fact searching for another way.
I in fact did find it. I used an uint32_t everyonePtr instead of the array, then I will convert the uint32_t to a pointer using a reinterpret_cast like this:
player *entries = reinterpret_cast<player*>(&myTeam->everyonePtr);
then my mapping will work as expected, and I think it's easier to understand than the array[1] or even the empty one. Thank you guys.

Is this a proper usage of union

I want to have named fields rather than indexed fields, but for some usage I have to iterate on the fields. Dumb simplified example:
struct named_states {float speed; float position;};
#define NSTATES (sizeof(struct named_states)/sizeof(float))
union named_or_indexed_states {
struct named_states named;
float indexed[NSTATES];
};
...
union named_or_indexed_states states,derivatives;
states.named.speed = 0;
states.named.position = 0;
...
derivatives.named.speed = acceleration;
derivatives.named.position= states.named.speed;
...
/* This code is in a generic library (consider nstates=NSTATES) */
for(i=0;i<nstates;i++)
states.indexed[i] += time_step*derivatives.indexed[i];
This avoids a copy from the named struct to the indexed array and vice versa, replacing it with a generic solution that is easier to maintain (I have very few places to change when I augment the state vector). It also works well with the various compilers I tested (several versions of gcc/g++ and MSVC).
But theoretically, as I understand it, it does not strictly adhere to proper union usage, since I write a named field and then read an indexed field, and I'm not at all sure we can say that they share the same struct fields...
Can you confirm that it's theoretically bad (non-portable)?
Would it be better to use a cast, a memcpy() or something else?
Theory aside, from a pragmatic point of view, is there any REAL portability issue (some incompatible compiler, exotic struct alignment, planned evolutions...)?
EDIT: your answers deserve a bit more clarification about my intentions that were:
to let programmers focus on domain-specific equations and free them from maintaining conversion functions (I don't know how to write a generic one, apart from cast or memcpy tricks, which do not seem more robust)
to add a bit more coding safety by using a struct (fully controlled by the compiler) instead of arrays (whose declaration and access are more subject to programmer mistakes)
to avoid polluting the namespace too much with enum or #define
I need to know
how portable/dangerous my departure from the standard is (maybe some compiler with aggressive inlining will keep everything in registers and avoid any memory exchange, ruining the trick),
and whether I missed a standard solution that addresses the above concerns in part or in whole.

There's no requirement that the two fields in named_states line up the same way as the array elements. There's a good chance that they do, but you've got a compiler dependency there.
Here's a simple implementation in C++ of what you're trying to do:
struct named_or_indexed_states {
named_or_indexed_states() : speed(indexed[0]), position(indexed[1]) { }
float &speed;
float &position;
float indexed[2];
};
If the size increase because of the reference elements is too much, use accessors:
struct named_or_indexed_states {
float indexed[2];
float& speed() { return indexed[0]; }
float& position() { return indexed[1]; }
};
The compiler will have no problem inlining the accessors, so reading or writing speed() and position() will be just as fast as if they were member data. You still have to write those annoying parentheses, though.

Only accessing the last-written member of a union is well-defined; as far as standard C (or C++) alone is concerned, the code you presented relies on undefined behavior. It may work, but it's the wrong way to do it. It doesn't really matter that the struct uses the same type as the array: there may be padding involved, as well as other invisible tricks used by the compiler.
Some compilers, like GCC, do define it as an allowed way to achieve type-punning. Now the question arises: are we talking about standard C (or C++), or GNU or any other extensions?
As for what you should use - proper conversion operators and/or constructors.
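One standard-conforming way to keep named fields and still iterate over them, offered here as a sketch rather than as what this answer had in mind, is a table of pointers to members. kFields, kFieldCount and integrate are hypothetical names; named_states is the struct from the question.

```cpp
#include <cstddef>

struct named_states {
    float speed;
    float position;
};

// Table of pointers to members: the iteration order is spelled out explicitly,
// so nothing depends on how the struct is laid out in memory.
static float named_states::* const kFields[] = {
    &named_states::speed,
    &named_states::position
};
static const std::size_t kFieldCount = sizeof(kFields) / sizeof(kFields[0]);

// Generic Euler step over every listed field.
inline void integrate(named_states& states,
                      const named_states& derivatives,
                      float time_step) {
    for (std::size_t i = 0; i < kFieldCount; ++i) {
        states.*kFields[i] += time_step * (derivatives.*kFields[i]);
    }
}
```

The trade-off is that the table has to be extended by hand whenever a field is added to the struct, which is one more place to maintain than the union version.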

This may be a little old-fashioned, but what I would do in this situation is:
enum
{
F_POSITION,
F_SPEED,
F_COUNT
};
float states[F_COUNT];
Then you can reference them as:
states[F_POSITION] and states[F_SPEED].
That's one way that I might write this. I'm sure that there are many other possibilities.

put different class in hierarchy in one container in C++

Sometimes we have to put different objects from the same hierarchy into one container. I read an article saying there are some tricks and traps, but I don't have the big picture. Actually, this happens a lot in the real world.
For example, a parking lot has to contain different types of cars; a zoo has to contain different types of animals; a book store has to contain different types of books.
I remember one article saying that neither of the following is a good design, but I forgot where it is.
vector<vehicle> parking_lot;
vector<vehicle*> parking_lot;
Can anybody offer some basic rules for this kind of question?

I learnt a lot writing my reply to a similar question by the same author, so I couldn't resist doing the same here. In short, I've written a benchmark to compare the following approaches to the problem of storing heterogeneous elements in a standard container:
1. Make a class for each type of element, have them all inherit from a common base, and store polymorphic base pointers in a std::vector<boost::shared_ptr<Base> >. This is probably the most general and flexible solution:
struct Shape {
...
};
struct Point : public Shape {
...
};
struct Circle : public Shape {
...
};
std::vector<boost::shared_ptr<Shape> > shapes;
shapes.push_back(boost::shared_ptr<Shape>(new Point(...))); // shared_ptr's constructor is explicit
shapes.push_back(boost::shared_ptr<Shape>(new Circle(...)));
shapes.front()->draw(); // virtual call
2. Same as (1) but store the polymorphic pointers in a boost::ptr_vector<Base>. This is a bit less general because the elements are owned exclusively by the vector, but it should suffice most of the time. One advantage of boost::ptr_vector is that it has the interface of a std::vector<Base> (without the *), so it's simpler to use.
boost::ptr_vector<Shape> shapes;
shapes.push_back(new Point(...));
shapes.push_back(new Circle(...));
shapes.front().draw(); // virtual call
3. Use a C union that can contain all possible elements and then use a std::vector<UnionType>. This is not very flexible as we need to know all element types in advance (they are hard-coded into the union) and also unions are well known for not interacting nicely with other C++ constructs (for example, the stored types can't have constructors).
struct Point {
...
};
struct Circle {
...
};
struct Shape {
enum Type { PointShape, CircleShape };
Type type;
union {
Point p;
Circle c;
} data;
};
std::vector<Shape> shapes;
Point p = { 1, 2 };
Shape s;
s.type = Shape::PointShape; // record which union member is active
s.data.p = p;
shapes.push_back(s);
if(shapes.front().type == Shape::PointShape)
draw_point(shapes.front().data.p);
4. Use a boost::variant that can contain all possible elements and then use a std::vector<Variant>. Like the union, this is not very flexible, but the code to deal with it is much more elegant.
struct Point {
...
};
struct Circle {
...
};
typedef boost::variant<Point, Circle> Shape;
std::vector<Shape> shapes;
shapes.push_back(Point(1,2));
boost::apply_visitor(draw_visitor(), shapes.front()); // draw_visitor derives from boost::static_visitor<>
5. Use boost::any (which can contain anything) and then a std::vector<boost::any>. That is very flexible but the interface is a little clumsy and error prone.
struct Point {
...
};
struct Circle {
...
};
typedef boost::any Shape;
std::vector<Shape> shapes;
shapes.push_back(Point(1,2));
if(shapes.front().type() == typeid(Point))
draw_point(boost::any_cast<Point>(shapes.front()));
This is the code of the full benchmark program (doesn't run on codepad for some reason). And here are my performance results:
time with hierarchy and boost::shared_ptr: 0.491 microseconds
time with hierarchy and boost::ptr_vector: 0.249 microseconds
time with union: 0.043 microseconds
time with boost::variant: 0.043 microseconds
time with boost::any: 0.322 microseconds
My conclusions:
Use vector<shared_ptr<Base> > only if you need the flexibility provided by runtime polymorphism and if you need shared ownership. Otherwise you'll have significant overhead.
Use boost::ptr_vector<Base> if you need runtime polymorphism but don't care about shared ownership. It will be significantly faster than the shared_ptr counterpart and the interface will be more friendly (stored elements not presented like pointers).
Use boost::variant<A, B, C> if you don't need much flexibility (i.e. you have a small set of types which will not grow). It will be lightning fast and the code will be elegant.
Use boost::any if you need total flexibility (you want to store anything).
Don't use unions. If you really need speed then boost::variant is as fast.
Before I finish I want to mention that a vector of std::unique_ptr will be a good option when it becomes widely available (I think it's already in VS2010)
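For reference, the std::unique_ptr variant could look roughly like the following in C++11. It was not part of the benchmark above, so no timing is claimed for it; Square, Rect, total_area and make_shapes are hypothetical names used only for this sketch.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Shape {
    virtual ~Shape() {}               // virtual destructor: deletion through Shape* is safe
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

struct Rect : Shape {
    Rect(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
    double width, height;
};

// The vector owns its elements exclusively; they are destroyed with it.
typedef std::vector<std::unique_ptr<Shape> > ShapeList;

double total_area(const ShapeList& shapes) {
    double sum = 0.0;
    for (std::size_t i = 0; i < shapes.size(); ++i) {
        sum += shapes[i]->area();     // virtual call
    }
    return sum;
}

ShapeList make_shapes() {
    ShapeList shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Square(2.0)));
    shapes.push_back(std::unique_ptr<Shape>(new Rect(2.0, 3.0)));
    return shapes;                    // moved out, no element is copied
}
```

Ownership here is exclusive, as with boost::ptr_vector, but the elements are still accessed through pointer syntax.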

The problem with vector<vehicle> is that the object only holds vehicles. The problem with vector<vehicle*> is that you need to allocate and, more importantly, free the pointers appropriately.
This might be acceptable, depending on your project, etc...
However, one usually uses some kind of smart pointer in the vector (vector<boost::shared_ptr<vehicle>>, or Qt-something, or one of your own) that handles deallocation but still permits storing different types of objects in the same container.
Update
Some people have, in other answers/comments, also mentioned boost::ptr_vector. That works well as a container of pointers too, and solves the memory deallocation problem by owning all the contained elements. I prefer vector<shared_ptr<T>> as I can then store objects all over the place and move them in and out of containers without issues. It's a more generic usage model that I've found is easier for me and others to grasp, and it applies to a larger set of problems.
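What "only holds vehicles" means in practice can be seen in a small sketch. The kind() function and the two helper functions are hypothetical, added only to make the difference observable, and std::shared_ptr from C++11 is used in place of boost::shared_ptr to keep the example self-contained.

```cpp
#include <memory>
#include <string>
#include <vector>

struct vehicle {
    virtual ~vehicle() {}
    virtual std::string kind() const { return "vehicle"; }
};

struct car : vehicle {
    std::string kind() const override { return "car"; }
};

// Storing by value copies only the vehicle part of the car (object slicing).
std::string kind_stored_by_value() {
    std::vector<vehicle> lot;
    lot.push_back(car());
    return lot.front().kind();
}

// Storing a smart pointer keeps the dynamic type and handles deallocation.
std::string kind_stored_by_pointer() {
    std::vector<std::shared_ptr<vehicle> > lot;
    lot.push_back(std::make_shared<car>());
    return lot.front()->kind();
}
```

The first function returns "vehicle" because the car-specific part is lost on insertion; the second returns "car".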

The problems are:
You cannot place polymorphic objects into a container since they may differ in size --i.e., you must use a pointer.
Because of how containers work a normal pointer / auto pointer is not suitable.
The solution is :
Create a class hierarchy, and use at least one virtual function in your base class (if you can't think of any function to virtualize, virtualize the destructor -- which as Neil pointed out is generally a must).
Use a boost::shared_ptr (this will be in the next C++ standard); shared_ptr handles being copied around ad hoc, which containers may do.
Build a class hierarchy and allocate your objects on the heap, i.e., by using new. The pointer to the base class must be encapsulated by a shared_ptr.
Place the base class shared_ptr into your container of choice.
Once you understand the "whys and hows" of the points above look at the boost ptr_containers -- thanks to Manual for the tip.

Say vehicle is a base class, that has certain properties, then, inheriting from it you have say a car, and a truck. Then you can just do something like:
std::vector<vehicle *> parking_lot;
parking_lot.push_back(new car(x, y));
parking_lot.push_back(new truck(x1, y1));
This would be perfectly valid, and in fact very useful sometimes. The only requirement for this type of object handling is sane hierarchy of objects.
Other popular type of objects that can be used like that are e.g. people :) you see that in almost every programming book.
EDIT:
Of course that vector can be packed with boost::shared_ptr or std::tr1::shared_ptr instead of raw pointers for ease of memory management. And in fact that's something I would recommend to do by all means possible.
EDIT2:
I removed a not very relevant example, here's a new one:
Say you are to implement some kind of AV scanning functionality, and you have multiple scanning engines. So you implement some kind of engine management class, say scan_manager which can call bool scan(...) function of those. Then you make an engine interface, say engine. It would have a virtual bool scan(...) = 0; Then you make a few engines like my_super_engine and my_other_uber_engine, which both inherit from engine and implement scan(...). Then your engine manager would somewhere during initialization fill that std::vector<engine *> with instances of my_super_engine and my_other_uber_engine and use them by calling bool scan(...) on them either sequentially, or based on whatever types of scanning you'd like to perform. Obviously what those engines do in scan(...) remains unknown, the only interesting bit is that bool, so the manager can use them all in the same way without any modification.
Same can be applied to various game units, like scary_enemies and those would be orks, drunks and other unpleasant creatures. They all implement void attack_good_guys(...) and your evil_master would make many of them and call that method.
This is indeed a common practice, and I would hardly call it bad design for as long as all those types actually are related.

You can refer to Stroustrup's answer to the question Why can't I assign a vector<Apple*> to a vector<Fruit*>?.
