Why convert to XMFLOAT instead of using XMVECTOR directly? - directx-12

While studying DirectX 12, it says that I should use XMFLOAT instead of XMVECTOR for class data members.
I do not understand why.
Is it wrong that defining the XMVECTOR variables in my class? or using XMVECTOR directly in my class?

This is covered in the DirectXMath Programmer's Guide on Docs.Microsoft which you should take the time to read. In particular, read the Getting Started section titled Type Usage Guidelines.
The XMVECTOR and XMMATRIX types are the work horses for the DirectXMath Library. Every operation consumes or produces data of these types. Working with them is key to using the library. However, since DirectXMath makes use of the SIMD instruction sets, these data types are subject to a number of restrictions. It is critical that you understand these restrictions if you want to make good use of the DirectXMath functions.
You should think of XMVECTOR as a proxy for a SIMD hardware register, and XMMATRIX as a proxy for a logical grouping of four SIMD hardware registers. These types are annotated to indicate they require 16-byte alignment to work correctly. The compiler will automatically place them correctly on the stack when they are used as a local variable, or place them in the data segment when they are used as a global variable. With proper conventions, they can also be passed safely as parameters to a function (see Calling Conventions for details).
Allocations from the heap, however, are more complicated. As such, you need to be careful whenever you use either XMVECTOR or XMMATRIX as a member of a class or structure to be allocated from the heap. On Windows x64, all heap allocations are 16-byte aligned, but for Windows x86, they are only 8-byte aligned. There are options for allocating structures from the heap with 16-byte alignment (see Properly Align Allocations). For C++ programs, you can use operator new/delete/new[]/delete[] overloads (either globally or class-specific) to enforce optimal alignment if desired.
Note  As an alternative to enforcing alignment in your C++ class directly by overloading new/delete, you can use the pImpl idiom. If you ensure your Impl class is aligned via __aligned_malloc internally, you can then freely use aligned types within the internal implementation. This is a good option when the 'public' class is a Windows Runtime ref class or intended for use with std::shared_ptr<>, which can otherwise disrupt careful alignment.
However, often it is easier and more compact to avoid using XMVECTOR or XMMATRIX directly in a class or structure. Instead, make use of the XMFLOAT3, XMFLOAT4, XMFLOAT4X3, XMFLOAT4X4, and so on, as members of your structure. Further, you can use the Vector Loading and Vector Storage functions to move the data efficiently into XMVECTOR or XMMATRIX local variables, perform computations, and store the results. There are also streaming functions (XMVector3TransformStream, XMVector4TransformStream, and so on) that efficiently operate directly on arrays of these data types.
This strict alignment requirement and the verbosity is by design as it makes it clear to the programmer when load/store overhead is being incurred. If, however, you find it a bit tedious, consider making use of the SimpleMath wrapper in the DirectX Tool Kit for DirectX 11 / DirectX 12
Keep in mind that DirectX has nothing particularly to do with DirectXMath. DirectXMath can work just as well with any version of Direct3D or even OpenGL as it just does CPU-side vector and matrix computations. DirectXMath doesn't really depend on the Windows OS at all; it's just a collection of C/C++ code using intrinsics so the compiler is all that really matters.
In fact, since you are apparently new enough to DirectX generally to not already know how to use DirectXMath, you should consider using DirectX 11 and not trying to jump into DirectX 12 cold. DirectX 12 is a very unforgiving API designed for graphics experts, and largely assumes you are already an expert in Direct3D 11 programming.
See DirectX Tool Kit for DirectX 12 tutorials and Getting Started with Direct3D 12.

You are in your right to store one or more XMVECTOR as object members.
When you do so, you need to be sure you respect the alignment constraint of XMVECTOR : 128bits. This is why they introduce the XMFLOATx to deal with storage without the alignment requirements.
Failure to do so may give you crashes at best and incorrect computation at worst. This is more likely to happen with a 32bits executable when new is not required to returned a memory align on at least 16 bytes.

Related

Overriding new and delete for DirectX structures

I follow some common DirectX tutorial on the web which features classes and structuring.
I need to allocate memory for XMVECTOR and XMMATRIX because of the specific memory allocation issue.
Now it all works, but I wish to make teh code cleaner. Question is:
Is there a way to override new and delete for those structures(so the malloc and pointer conversion details are hidden by the word "new", similarly with the delete) and if so how?
Edit 2014-07-11:
The comments so far suggested two way to workaround the problem by:
1) using a wrapper class for the structures and overloading/overriding delete and new for the wrapper class.
The problem with this is the obvious performance hit and the need to access member structure ever single time (less cleaner code, which defeats the whole purpose).
2) Using XMFLOAT4 and similar structures.
Problem with this is that this makes it easier with memory allocation but adds complications in conversions between types (as XMMATRIX and XMVECTOR are the ones returned by DirectXMath functions). Those conversions also make the code less cleaner so it's like replacing a pile of dog poop with cat poop, it's still poop in the end (yeah, the best comparison I could come up with to convey meaning).
The general recommendation is to use the various memory structures (XMFLOAT4, etc.) and Load/Stores. If you were targeting only x64 native, you could use XMVECTOR/XMMATRIX directly since that platform uses 16-byte aligned memory by default.
The overloading new/delete recommendation is not for XMVECTOR or XMMATRIX. Rather you can overload new/delete for your classes that contain these types to use __aligned_malloc( x, 16 ). Global overriding of new/delete is possible, but doing it per-class is actually the recommended solution. See the Scott Meyers "Effective C++" books for detailed discussion of overriding new/delete.
Another approach is to use the pImpl idiom like the DirectX Tool Kit does. The public class is unaligned but the internal class uses __aligned_malloc( x, 16 ). This actually works really well, and both the implementation and the client code doesn't end up looking like "poop".
Finally, you could make use of the SimpleMath wrapper in the DirectX Tool Kit which provides classes that derive from XMFLOAT4, etc. with implicit conversions. It is not as efficient, but it does look clean without worrying about the alignment issues.
BTW, this topic is covered in the DirectXMath Programmer's Guide on MSDN.

16 byte alignment issue

I am using DirectXMath, creating XMMatrix and XMVector in classes.
When I call XMMatrixMultiply it throws unhandled exception on it.
I have found online that it is an issue with byte alligment, since DirectXMath uses SIMD instructions set which results in missaligned heap allocation.
One of the proposed solution was to use XMFLOAT4X4 variables and then change them to temporary XMMatrix whenever needed, but it isn't the nicest and fastest solution imo.
Another one was to use _aligned_malloc, yet I have no idea whatsoever how to use it. I have never had to do any memory allocations and it is black magic for me.
Another one, was to overload new operator, yet they did not provide any information how to do it.
And regarding the overloading method, I'm not using new to create XMMatrix objects since I don't use them as pointers.
It was all working nice untill I have decided to split code into classes.
I think _alligned_malloc solution would be best here, but I have no idea how to use it, where and when to call it.
Unlike XMFLOAT4X4 and XMFLOAT4, which are safe to store, XMMATRIX and XMVECTOR are aliases for hardware registers (SSE, NEON, etc.). Since the library is abstracting away the register type and alignment requirements, you shouldn't attempt to align the types yourself, since you can easily create a program that happens to work on your machine but fails on another. You should either use the safe types for storage (e.g. XMFLOAT4) or pull up the abstraction and use the vector instructions directly, with special storage and alignment code paths in your application for each vector extension you're trying to support.
Also, using these registers outside of the context of the library's vector instructions might cause unexpected failures for other reasons. For example, if you store an XMMATRIX in your own struct, some architectures might fail to create copies of the struct.
Not pretend to be a complete answer.
There are some ways that you didn't mention:
#define _XM_NO_INTRINSICS_. Simple. Slow. Works right now, just one line of code. ;)
Don't store XMVECTOR and XMMATRIX on a heap. Store XMFLOAT4 or XMFLOAT4X4 and convert to SIMD types only when needed (so they will be stored on stack). Slower. Many code to change (probably).
Don't store XMVECTOR and XMMATRIX on a heap, part 2. Just store your classes on stack. Fast. Pretty hard. Many code to change (probably).
Use aligned allocator. Fast. Hard. Many hours to google, many code to write and debug.
Don't use DirectXMath (previously XMMath) library. Choose any other (there are plenty) or write your own. Fast. Many code to change (probably).
If you want aligned allocator, it has nothing to DirectX or DirectXMath. It is advanced topic. No one can give you complete solution. But, here are some results of googling:
returning aligned memory with new?
Harder to C++: Aligned Memory Allocation
many more
Be very attentive. With bad memory allocator you can introduce much more problems than solve.
Hope it helps somehow. Happy Coding! :)

Are classes guaranteed to have the same organization in memory between program runs?

I'm attempting to implement a Save/Load feature into my small game. To accomplish this I have a central class that stores all the important variables of the game such as position, etc. I then save this class as binary data to a file. Then simply load it back for the loading function. This seems to work MOST of the time, but if I change certain things then try to do a save/load the program will crash with memory access violations. So, are classes guaranteed to have the same structure in memory on every run of the program or can the data be arranged at random like a struct?
Response to Jesus - I mean the data inside the class, so that if I save the class to disk, when I load it back, will everything fit nicely back.
Save
fout.write((char*) &game,sizeof Game);
Load
fin.read((char*) &game, sizeof Game);
Your approach is extremely fragile. With many restrictions, it can work. These restrictions are not worth subjecting your users (or yourself!) to in typical cases.
Some Restrictions:
Never refer to external memory (e.g. a pointer or reference)
Forbid ABI changes/differences. Common case: memory layout and natural alignment on 32 vs 64 will vary. The user will need a new 'game' for each ABI.
Not endian compatible.
Altering your type's layouts will break your game. Changing your compiler options can do this.
You're basically limited to POD data.
Use offsets instead of pointers to refer to internal data (This reference would be in contiguous memory).
Therefore, you can safely use this approach in extremely limited situations -- that typically applies only to components of a system, rather than the entire state of the game.
Since this is tagged C++, "boost - Serialization" would be a good starting point. It's well tested and abstracts many of the complexities for you.
Even if this would work, just don't do it. Define a file format at the byte-level and write sensible 'convert to file format' and 'convert from file format' functions. You'll actually know the format of the file. You'll be able to extend it. Newer versions of the program will be able to read files from older versions. And you'll be able to update your platform, build tools, and classes without fear of causing your program to crash.
Yes, classes and structures will have the same layout in memory every time your program runs., although I can't say if the standard enforces this. The machine code generated by C++ compilers use "hard-coded" offsets to access type fields, so they are fixed. Realistically, the layout will only change if you modify the C++ class definition (field sizes, order, virtual methods, etc.), compile with a different compiler or change compiler options.
As long as the type is POD and without pointer fields, it should be safe to simply dump it to a file and read it back with the exact same program. However, because of the above-mentionned concerns, this approach is quite inflexible with regard to versionning and interoperability.
[edit]
To respond to your own edit, do not do this with your "Game" object! It certainly has pointers to other objects, and those objects will not exist anymore in memory or will be elsewhere when you'll reload your file.
You might want to take a look at this.
Classes are not guaranteed to have the same structure in memory as pointers can point to different locations in memory each time a class is created.
However, without posting code it is difficult to say with certainty where the problem is.

C++ object in memory

Is there a standard in storing a C++ objects in memory? I wish to set a char* pointer to a certain address in memory, so that I can read certain objects' variables directly from the memory byte by byte. When I am using Dev C++, the variables are stored one by one right in the memory address of an object in the order that they were defined. Now, can it be different while using a different compiler (like the variables being in a different order, or somewhere else)? Thank you in advance. :-)
The variables can't be in a different order, as far as I know. However, there may be varying amounts of padding between members. Also I think all bets are off with virtual classes and different implementations of user-defined types (such as std::string) may be completely different between libraries (or even build options).
It seems like a very suspicious thing to do. What do you need it for: to access private members?
I believe that the in-memory layout of objects is implementation defined - not the ordering, necessarily, but the amount of space. In particular, you will probably run into issues with byte-alignment and so-forth, especially across platforms.
Can you give us some details of what you're trying to do?
Implementations are free to do anything they want :P. However since C++ has to appeal to certain styles of programming, you will find a deterministic way of accessing your fields for your specific compiler/platform/cpu architecture.
If your byte ordering is varied on a different compiler, my first assumption would be byte packing issues. If you need the class to have a certain specific byte ordering first look up "#pragma pack" directives for your compiler... you can change the packing order into something less optimal but deterministic. Please note this piece of advice generally applies to POD data types.
The C++ compiler is not allowed to reorder variables within a visibility block (public, protected, etc). But it is allowed to reorder variables in separate visibility blocks. For example:
struct A {
int a;
short b;
char c;
};
struct B {
int a;
public:
short b;
protected:
char c;
};
In the above, the variables in A will always be laid out in the order a, b, c. The variables in B might be laid out in another order if the compiler chose. And, of course, there are alignment and packing requirements so there might be "spaces" between some of the variables if needed.
Keep in mind when working with multi-dimensional arrays that they are stored in Row Major Order.
The order of the variables should never change, but as others have said, the byte packing will vary. Another thing to consider is the endianness of the platform.
To get around the byte alignment/packing problem, most compilers offer some way to guide the process. In gcc you could use __attribute__((__packed__)) and in msvc #pragma pack.
I've worked with something that did this professionally, and as far as I could tell, it worked very specifically because it was decoding something another tool encoded, so we always knew exactly how it worked.
We did also use structs that we pointed at a memory address, then read out data via the struct's variables, but the structs notably included packing and we were on an embedded platform.
Basically, you can do this, so long as you know -exactly- how everything is constructed on a byte-by-byte level. (You might be able to get away with knowing when it's constructed the same way, which could save some time and learning)
It sounds like you want to marshall objects between machines over a TCP/IP connection. You can probably get away with this if the code was compiled with the same compiler on each end, otherwise, I'm not so sure. Keep in mind that if the platforms can be different, then you might need to take into account different processor endians!
Sounds like what you real want to ask is how to serialize your objects
http://dieharddeveloper.blogspot.in/2013/07/c-memory-layout-and-process-image.html
In the middle of the process's address space, there is a region is reserved for shared objects. When a new process is created, the process manager first maps the two segments from the executable into memory. It then decodes the program's ELF header. If the program header indicates that the executable was linked against a shared library, the process manager (PM) will extract the name of the dynamic interpreter from the program header. The dynamic interpreter points to a shared library that contains the runtime linker code.

Aligning a class to a class it inherits from? Force all stack alignment? Change sizeof?

I want to have a base class which dictates the alignment of the objects which inherit from it. This works fine for the heap because I can control how that gets allocated, and how arrays of it get allocated in a custom array template. However, the actual size of the class as far as C++ is concerned doesn't change at all. Which is to say if I perform pointer arithmetic on a pointer of the derived class then it will skip to the wrong location. That's issue one.
Issue two is stack alignment. There doesn't seem to be a way to directly force the entire stack to be 16 byte aligned pointers.
The only thing I can see to affect these at all is the vc++ and g++ compiler specific settings, but the problem here is I don't want to have to manually fix the alignment all the time. That would be bound to be error prone, not to mention a pain.
I could also maybe make some kind of smart pointer but that would introduce more issues, too.
If I just align the base class, will the child classes be aligned, as well? If so, that would solve most of my problems but that seems doubtful to be the case (though I will try this).
If a base class has a particular alignment requirement, then any derived classes will have at least that alignment (they could get a stricter alignment requirement due to their own members). Otherwise the compiler couldn't guarantee that accessing the base members would meet the requirements they have.
However, there's really nothing you can portably do for stack alignment - that's handled entirely by the compiler and the compiler's platform ABI requirements. As you indicate, a compiler option or maybe a pragma might let you exert some control, but nothing portable.
Keeping the Michael's answer in mind with respect to portability.. I assume you are aiming at SSE vectorization and are starting to feel that pain and that really ought to be a standard way to do it with explicit hints and low-level control. Both compilers you mention do strive for 16 byte alignment on stack by default.
So while compiler support does vary, can be changed, is challenged etc, it can also be heavily underutilised on one of them; it depends on the optimizers and types you are containing and naturally the vectorizer choices for sanity. Then again, it's not clear what you are doing and what the use is.
Since your problem definition without an example is sketchy, perhaps you can give a shot to declspec(align(16)) for California (with LTCG), and __attribute((aligned (16))) for gcc. If that helps, it'd be good to let us know and what specifically worked or not in example, which compiler warnings or errors too. I would also avoid any smart pointer usage for various reasons.
sizeof being in your question indicates you are also facing challenges with regards to variable arrays and pointer types, in which case the former might be problematic. And even though you can live with losing the copy constructor and assignment facilities, it will do no harm to inspect the machine code. That is, in case of complication, what you need to do is look at the output and in case of VC++ switch to Intel for better hints in Windows, and test for/compilet and runtime assert/trap failures early or for the other one pick any free object file analysis tool. Then you just work around the clearly problematic constructs in a sanest way you can, something that is always possible.
We had a similar problem on PARISC HPUX where the only atomic construct was a 4 byte atomic clear that required 16 byte alignment.
We could fudge it by declaring our 4 byte quantity like so:
struct Stupid_HPUX
{
int 4byteLockWordInHere[4] ;
} ;
and picking the right one to use at runtime. For your problem, you could do something similar
union rawMemory
{
int blah[(sizeof(yourtype) + 16)/4] ;
char c[1] ;
} u ;
then use placement new with a runtime determined address to fix the location required.