Overriding new and delete for DirectX structures - c++

I am following a common DirectX tutorial from the web that structures the code into classes.
I need to allocate memory for XMVECTOR and XMMATRIX in a specific way because of their alignment requirements.
It all works now, but I wish to make the code cleaner. The question is:
Is there a way to override new and delete for those structures (so the malloc and pointer-conversion details are hidden behind the word "new", and similarly for delete), and if so, how?
Edit 2014-07-11:
The comments so far have suggested two ways to work around the problem:
1) Use a wrapper class for the structures and overload new and delete for the wrapper class.
The problem with this is the obvious performance hit and the need to go through the member structure every single time (less clean code, which defeats the whole purpose).
2) Use XMFLOAT4 and similar structures.
The problem with this is that while it makes memory allocation easier, it adds complications in the conversions between types (as XMMATRIX and XMVECTOR are the types returned by the DirectXMath functions). Those conversions also make the code less clean, so it's like replacing a pile of dog poop with cat poop: it's still poop in the end (yeah, the best comparison I could come up with to convey the meaning).

The general recommendation is to use the various memory structures (XMFLOAT4, etc.) and Load/Stores. If you were targeting only x64 native, you could use XMVECTOR/XMMATRIX directly, since heap allocations on that platform are 16-byte aligned by default.
The overloading new/delete recommendation is not for XMVECTOR or XMMATRIX. Rather, you can overload new/delete for your classes that contain these types to use _aligned_malloc(x, 16). Global overriding of new/delete is possible, but doing it per class is actually the recommended solution. See Scott Meyers' "Effective C++" books for a detailed discussion of overriding new/delete.
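A minimal sketch of the per-class approach, assuming MSVC's _aligned_malloc/_aligned_free and a hypothetical Camera class that holds an XMMATRIX member:

#include <DirectXMath.h>
#include <cstddef>
#include <malloc.h>   // _aligned_malloc / _aligned_free (MSVC CRT)
#include <new>        // std::bad_alloc

class Camera
{
public:
    // Ensure heap instances are 16-byte aligned so the XMMATRIX member is valid.
    static void* operator new(std::size_t size)
    {
        void* p = _aligned_malloc(size, 16);
        if (!p) throw std::bad_alloc();
        return p;
    }

    static void operator delete(void* p) noexcept
    {
        _aligned_free(p);
    }

private:
    DirectX::XMMATRIX m_view;  // requires 16-byte alignment
};

If you allocate arrays of such objects, the corresponding operator new[]/operator delete[] pair needs the same treatment.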
Another approach is to use the pImpl idiom like the DirectX Tool Kit does. The public class is unaligned, but the internal implementation class is allocated with _aligned_malloc(x, 16). This actually works really well, and neither the implementation nor the client code ends up looking like "poop".
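A rough sketch of that idiom under the same assumptions; the Effect/Impl names are illustrative, not the DirectX Tool Kit's actual code:

#include <DirectXMath.h>
#include <cstddef>
#include <malloc.h>   // _aligned_malloc / _aligned_free (MSVC CRT)
#include <memory>
#include <new>

// Public class: no alignment requirement of its own, so it is safe to hold in
// containers, std::shared_ptr, Windows Runtime ref classes, and so on.
class Effect
{
    // Internal class owns the aligned SIMD types and controls its own allocation.
    struct Impl
    {
        static void* operator new(std::size_t size)
        {
            void* p = _aligned_malloc(size, 16);
            if (!p) throw std::bad_alloc();
            return p;
        }
        static void operator delete(void* p) noexcept { _aligned_free(p); }

        DirectX::XMMATRIX world;  // always 16-byte aligned, because Impl controls its allocation
    };

    std::unique_ptr<Impl> pImpl;  // the delete-expression picks up Impl::operator delete

public:
    Effect() : pImpl(new Impl) {}

    void SetWorld(const DirectX::XMFLOAT4X4& m)
    {
        pImpl->world = DirectX::XMLoadFloat4x4(&m);  // load from the storage type into the aligned member
    }
};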
Finally, you could make use of the SimpleMath wrapper in the DirectX Tool Kit which provides classes that derive from XMFLOAT4, etc. with implicit conversions. It is not as efficient, but it does look clean without worrying about the alignment issues.
BTW, this topic is covered in the DirectXMath Programmer's Guide on MSDN.

Related

C++ std features and Binary size

I was told recently in a job interview that their project focuses on building the smallest possible binary for their application (it runs embedded), so I would not be able to use things such as templates or smart pointers, as these would increase the binary size. They seemed to imply that using things from std would generally be a no-go (though not in all cases).
After the interview, I tried to research online which coding practices and which features from the standard library cause large binary sizes, and I could find basically nothing on the subject. Is there a way to quantify the size impact of using certain features (without having to write 100 smart pointers in a code base vs. self-managed ones, for example)?
This question probably deserves more attention than it’s likely to get, especially for people trying to pursue a career in embedded systems. So far the discussion has gone about the way that I would expect, specifically a lot of conversation about the nuances of exactly how and when a project built with C++ might be more bloated than one written in plain C or a restricted C++ subset.
This is also why you can’t find a definitive answer from a good old fashioned google search. Because if you just ask the question “is C++ more bloated than X?”, the answer is always going to be “it depends.”
So let me approach this from a slightly different angle. I’ve both worked for, and interviewed at companies that enforced these kinds of restrictions, I’ve even voluntarily enforced them myself. It really comes down to this. When you’re running an engineering organization with more than one person with plans to keep hiring, it is wildly impractical to assume everyone on your team is going to fully understand the implications of using every feature of a language. Coding standards and language restrictions serve as a cheap way to prevent people from doing “bad things” without knowing they’re doing “bad things”.
How you define a “bad thing” is then also context specific. On a desktop platform, using lots of code space isn’t really a “bad” enough thing to rigorously enforce. On a tiny embedded system, it probably is.
C++ by design makes it very easy for an engineer to generate lots of code without having to type it out explicitly. I think that statement is pretty self-evident; it's the whole point of meta-programming, and I doubt anyone would challenge it. In fact, it's one of the strengths of the language.
So then coming back to the organizational challenges, if your primary optimization variable is code space, you probably don’t want to allow people to use features that make it trivial to generate code that isn’t obvious. Some people will use that feature responsibly and some people won’t, but you have to standardize around the least common denominator. A C compiler is very simple. Yes you can write bloated code with it, but if you do, it will probably be pretty obvious from looking at it.
(Partially extracted from comments I wrote earlier)
I don't think there is a comprehensive answer. A lot also depends on the specific use case and needs to be judged on a case-by-case basis.
Templates
Templates may result in code bloat, yes, but they can also avoid it. If your alternative is introducing indirection through function pointers or virtual methods, the non-templated version may itself end up bigger in code size, simply because the indirect function calls take several instructions each and remove optimization potential.
Another aspect where they can at least not hurt is when used in conjunction with type erasure. The idea here is to write generic code, then put a small template wrapper around it that only provides type safety but does not actually emit any new code. Qt's QList is an example that does this to some extent.
This bare-bones vector type shows what I mean:
#include <cstddef>  // std::size_t

class VectorBase
{
protected:
    // Type-erased storage and operations: defined once, shared by all instantiations.
    void **start, **end, **capacity;

    void push_back(void*);
    void* at(std::size_t i);
    void clear(void (*cleanup_function)(void*));
};

template<class T>
class Vector: public VectorBase
{
public:
    void push_back(T* value)
    { this->VectorBase::push_back(value); }

    T* at(std::size_t i)
    { return static_cast<T*>(this->VectorBase::at(i)); }

    ~Vector()
    { clear(+[](void* object) { delete static_cast<T*>(object); }); }
};
By carefully moving as much code as possible into the non-templated base, the template itself can focus on type safety and on providing the necessary indirections, without emitting any code that wouldn't have been there anyway.
(Note: This is just meant as a demonstration of type erasure, not an actually good vector type)
Smart pointers
When written carefully, they won't generate much code that wouldn't be there anyway. Whether an inline function generates a delete statement or the programmer does it manually doesn't really matter.
The main issue that I see with those is that the programmer is better than the compiler at reasoning about the code and avoiding dead code. For example, even after a unique_ptr has been moved from, its destructor still has to emit code. A programmer knows that the value is null; the compiler often doesn't.
Another issue comes up with calling conventions. Objects with destructors are usually passed on the stack, even if you declare them pass-by-value. Same for return values. So a function unique_ptr<foo> bar(unique_ptr<foo> baz) will have higher overhead than foo* bar(foo* baz) simply because pointers have to be put on and off the stack.
Even more egregiously, the calling convention used for example on Linux makes the caller clean up parameters instead of the callee. That means if a function accepts a complex object like a smart pointer by value, a call to the destructor for that parameter is replicated at every call site, instead of putting it once inside the function. Especially with unique_ptr this is so stupid because the function itself may know that the object has been moved away and the destructor is superfluous; but the caller doesn't know this (unless you have LTO).
Shared pointers are a different beast altogether, simply because they allow a lot of different tradeoffs. Should they be atomic? Should they allow type casting, weak pointers, what indirection is used for destruction? Do you really need two raw pointers per shared pointer or can the reference counter be accessed through shared object?
Exceptions, RTTI
Generally avoided and removed via compiler flags (e.g. -fno-exceptions and -fno-rtti on GCC and Clang).
Library components
On a bare-metal system, pulling in parts of the standard library can have a significant effect that can only be measured after the linker step. I suggest that any such project use continuous integration and track the code size as a metric.
For example, I once added a small feature (I don't remember which) whose error handling used std::stringstream. That pulled in the entire iostream library, and the resulting code exceeded my entire RAM and ROM capacity. IIRC the issue was that even though exception handling was deactivated, the exception message was still being set up.
Move constructors and destructors
It's a shame that C++'s move semantics aren't like, for example, Rust's, where objects can be moved with a simple memcpy and their original location then simply "forgotten". In C++ the destructor for a moved-from object is still invoked, which requires more code in the move constructor / move assignment operator, and in the destructor.
Qt for example accounts for such simple cases in its meta type system.

Why convert to XMFLOAT instead of using XMVECTOR directly?

While studying DirectX 12, I read that I should use XMFLOAT instead of XMVECTOR for class data members.
I do not understand why.
Is it wrong to define XMVECTOR variables in my class, or to use XMVECTOR directly as class members?
This is covered in the DirectXMath Programmer's Guide on Docs.Microsoft which you should take the time to read. In particular, read the Getting Started section titled Type Usage Guidelines.
The XMVECTOR and XMMATRIX types are the work horses for the DirectXMath Library. Every operation consumes or produces data of these types. Working with them is key to using the library. However, since DirectXMath makes use of the SIMD instruction sets, these data types are subject to a number of restrictions. It is critical that you understand these restrictions if you want to make good use of the DirectXMath functions.
You should think of XMVECTOR as a proxy for a SIMD hardware register, and XMMATRIX as a proxy for a logical grouping of four SIMD hardware registers. These types are annotated to indicate they require 16-byte alignment to work correctly. The compiler will automatically place them correctly on the stack when they are used as a local variable, or place them in the data segment when they are used as a global variable. With proper conventions, they can also be passed safely as parameters to a function (see Calling Conventions for details).
Allocations from the heap, however, are more complicated. As such, you need to be careful whenever you use either XMVECTOR or XMMATRIX as a member of a class or structure to be allocated from the heap. On Windows x64, all heap allocations are 16-byte aligned, but for Windows x86, they are only 8-byte aligned. There are options for allocating structures from the heap with 16-byte alignment (see Properly Align Allocations). For C++ programs, you can use operator new/delete/new[]/delete[] overloads (either globally or class-specific) to enforce optimal alignment if desired.
Note: As an alternative to enforcing alignment in your C++ class directly by overloading new/delete, you can use the pImpl idiom. If you ensure your Impl class is aligned via _aligned_malloc internally, you can then freely use aligned types within the internal implementation. This is a good option when the 'public' class is a Windows Runtime ref class or intended for use with std::shared_ptr<>, which can otherwise disrupt careful alignment.
However, often it is easier and more compact to avoid using XMVECTOR or XMMATRIX directly in a class or structure. Instead, make use of the XMFLOAT3, XMFLOAT4, XMFLOAT4X3, XMFLOAT4X4, and so on, as members of your structure. Further, you can use the Vector Loading and Vector Storage functions to move the data efficiently into XMVECTOR or XMMATRIX local variables, perform computations, and store the results. There are also streaming functions (XMVector3TransformStream, XMVector4TransformStream, and so on) that efficiently operate directly on arrays of these data types.
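For illustration, a minimal sketch of that storage-plus-load/store pattern (the GameObject name and members are hypothetical):

#include <DirectXMath.h>
using namespace DirectX;

struct GameObject
{
    XMFLOAT3   position;  // storage types: no alignment requirement, safe on the heap
    XMFLOAT4X4 world;

    void UpdateWorld()
    {
        // Load into SIMD register proxies, compute, then store the result back.
        XMVECTOR pos = XMLoadFloat3(&position);
        XMMATRIX m   = XMMatrixTranslationFromVector(pos);
        XMStoreFloat4x4(&world, m);
    }
};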
This strict alignment requirement and the verbosity are by design, as they make it clear to the programmer when load/store overhead is being incurred. If, however, you find it a bit tedious, consider making use of the SimpleMath wrapper in the DirectX Tool Kit for DirectX 11 / DirectX 12.
Keep in mind that DirectX has nothing particularly to do with DirectXMath. DirectXMath can work just as well with any version of Direct3D or even OpenGL as it just does CPU-side vector and matrix computations. DirectXMath doesn't really depend on the Windows OS at all; it's just a collection of C/C++ code using intrinsics so the compiler is all that really matters.
In fact, since you are apparently new enough to DirectX generally to not already know how to use DirectXMath, you should consider using DirectX 11 and not trying to jump into DirectX 12 cold. DirectX 12 is a very unforgiving API designed for graphics experts, and largely assumes you are already an expert in Direct3D 11 programming.
See DirectX Tool Kit for DirectX 12 tutorials and Getting Started with Direct3D 12.
You are within your rights to store one or more XMVECTOR as object members.
When you do so, you need to make sure you respect the alignment constraint of XMVECTOR: 16 bytes (128 bits). This is why the library introduces the XMFLOATx types, which handle storage without the alignment requirement.
Failure to do so may give you crashes at best and incorrect computations at worst. This is more likely to happen in a 32-bit executable, where new is not required to return memory aligned to at least 16 bytes.

16 byte alignment issue

I am using DirectXMath and creating XMMATRIX and XMVECTOR members in my classes.
When I call XMMatrixMultiply, it throws an unhandled exception.
I found online that this is a byte-alignment issue: DirectXMath uses the SIMD instruction set, which breaks when heap allocations are misaligned.
One proposed solution was to use XMFLOAT4X4 variables and convert them to temporary XMMATRIX values whenever needed, but that isn't the nicest or fastest solution, in my opinion.
Another was to use _aligned_malloc, yet I have no idea whatsoever how to use it. I have never had to do any manual memory allocation, and it is black magic to me.
Another was to overload the new operator, yet no information was given on how to do that.
And regarding the overloading method: I'm not using new to create the XMMATRIX objects, since I don't use them as pointers.
It was all working nicely until I decided to split the code into classes.
I think the _aligned_malloc solution would be best here, but I have no idea how to use it, or where and when to call it.
Unlike XMFLOAT4X4 and XMFLOAT4, which are safe to store, XMMATRIX and XMVECTOR are aliases for hardware registers (SSE, NEON, etc.). Since the library is abstracting away the register type and alignment requirements, you shouldn't attempt to align the types yourself, since you can easily create a program that happens to work on your machine but fails on another. You should either use the safe types for storage (e.g. XMFLOAT4) or pull up the abstraction and use the vector instructions directly, with special storage and alignment code paths in your application for each vector extension you're trying to support.
Also, using these registers outside of the context of the library's vector instructions might cause unexpected failures for other reasons. For example, if you store an XMMATRIX in your own struct, some architectures might fail to create copies of the struct.
This doesn't pretend to be a complete answer.
There are some ways that you didn't mention:
#define _XM_NO_INTRINSICS_. Simple. Slow. Works right now, just one line of code. ;)
Don't store XMVECTOR and XMMATRIX on the heap. Store XMFLOAT4 or XMFLOAT4X4 and convert to the SIMD types only when needed (so they will live on the stack). Slower. Probably a lot of code to change.
Don't store XMVECTOR and XMMATRIX on the heap, part 2. Just keep your classes on the stack. Fast. Pretty hard. Probably a lot of code to change.
Use an aligned allocator (see the sketch after this list). Fast. Hard. Many hours of googling, a lot of code to write and debug.
Don't use the DirectXMath (previously XNA Math) library. Choose any other (there are plenty) or write your own. Fast. Probably a lot of code to change.
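As a minimal sketch of the aligned-allocation option, assuming MSVC's _aligned_malloc/_aligned_free, with placement new used to construct the object in the aligned storage (the Transform type is hypothetical):

#include <DirectXMath.h>
#include <malloc.h>   // _aligned_malloc / _aligned_free (MSVC CRT)
#include <new>        // placement new

using namespace DirectX;

// Hypothetical class that stores SIMD types directly.
struct Transform
{
    XMMATRIX world = XMMatrixIdentity();
};

int main()
{
    // Allocate 16-byte aligned storage, then construct the object in it.
    void* raw = _aligned_malloc(sizeof(Transform), 16);
    Transform* t = new (raw) Transform();

    t->world = XMMatrixMultiply(t->world, XMMatrixTranslation(1.0f, 2.0f, 3.0f));

    // Destroy and free: _aligned_malloc must be paired with _aligned_free, not free().
    t->~Transform();
    _aligned_free(raw);
}

Wrapping that pair in a class-specific operator new/delete (as discussed in the answers above) avoids having to write it at every allocation site.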
If you want an aligned allocator, it has nothing to do with DirectX or DirectXMath. It is an advanced topic, and no one can give you a complete solution. But here are some results of googling:
returning aligned memory with new?
Harder to C++: Aligned Memory Allocation
many more
Be very attentive. With a bad memory allocator you can introduce far more problems than you solve.
Hope it helps somehow. Happy Coding! :)

Is it bad design for a class to give access to its data (via ptr/it) when this data can be deleted before the class object is out of scope?

Classic example is iterator invalidation:
#include <iostream>
#include <string>

std::string test("A");
auto it = test.insert(test.begin() + 1, 'B');
test.erase();          // invalidates 'it'
// ...
std::cout << *it;      // undefined behaviour: dereferencing an invalidated iterator
Do you think having this kind of API is bad design, and will it be difficult to learn/use for beginners?
A costly (performance/memory-wise) solution in that type of case would be to reassign the pointer/iterator to an empty string (or a nullptr, but that's not very helpful) when a clear method is used.
Some clarifications
I'm thinking of this design for returning const char*s that can be modified internally (maybe they're stored in a std::vector that can be cleared). I don't want to return a std::string (binary compatibility), and I don't want a get(char*, std::size_t) method because of the size argument that needs to be fetched (too slow). Also, I don't want to create a wrapper around std::string or my own string class.
I would recommend reading up on Stepanov's design philosophy (pages 9-11):
[This example] is written in a clear object-oriented style with getters and setters. The proponents of this style say that the advantage of having such functions is that it allows programmers later on to change the implementation. What they forget to mention is that sometimes it is awfully good to expose the implementation. Let us see what I mean. It is hard for me to imagine an evolution of a system that would let you keep the interface of get and set, but be able to change the implementation. I could imagine that the implementation outgrows int and you need to switch to long. But that is a different interface. I can imagine that you decide to switch from an array to a list but that also will force you to change the interface, since it is really not a very good idea to index into a linked list.
Now let us see why it is really good to expose the implementation. Let us assume that tomorrow you decide to sort your integers. How can you do it? Could you use the C library qsort? No, since it knows nothing about your getters and setters. Could you use the STL sort? The answer is the same. While you design your class to survive some hypothetical change in the implementation, you did not design it for the very common task of sorting. Of course, the proponents of getters and setters will suggest that you extend your interface with a member function sort. After you do that, you will discover that you need binary search and median, etc. Very soon your class will have 30 member functions but, of course, it will be hiding the implementation. And that could be done only if you are the owner of the class. Otherwise, you need to implement a decent sorting algorithm on top of the setter-getter interface from scratch and that is a far more difficult and dangerous activity than one can imagine. ...
Setters and getters make our daily programming hard but promise huge rewards in the future when we discover better ways to store arrays of integers in memory. But I do not know a single realistic scenario when hiding memory locations inside our data structure helps and exposure hurts; it is, therefore, my obligation to expose a much more convenient interface that also happens to be consistent with the familiar interface to the C arrays. When we program in C++ we should not be ashamed of its C heritage, but make full use of it. The only problems with C++, and even the only problems with C, arise when they themselves are not consistent with their own logic. ...
My remark about exposing the address locations of consecutive integers is not facetious. It took a major effort to convince the standard committee that such a requirement is an essential property of vectors; they would not, however, agree that vector iterators should be pointers and, therefore, on several major platforms – including the Microsoft one – it is faster to sort your vector by saying the unbelievably ugly
if (!v.empty()) {
    sort(&*v.begin(), &*v.begin() + v.size());
}
than the intended
sort(v.begin(), v.end());
Attempts to impose pseudo-abstractness at the cost of efficiency can be defeated, but at a terrible cost.
Stepanov has a lot of other interesting documents available, especially in the "Class Notes" section.
Yes, there are several rules of thumb regarding OOP. No, I'm not convinced that they are really the best way to do things. When you're working with the STL it makes a lot of sense to do things the STL compatible way. And when your abstraction is low level (like std::vector, which is meant specifically to make working with dynamically allocated arrays easier; i.e., it should be usable almost like an array with some added features), then some of those OOP rules of thumb make no sense at all.
To answer the original question: even beginners will eventually need to learn about iterators, object lifetimes, and what I'll call an object's useful life (i.e., "the object hasn't fallen out of scope, but is no longer valid to use, like an invalidated iterator"). I don't see any reason to try to hide those facts of life from the user, so I personally wouldn't rule out an iterator-based API on those grounds. The real question is what your API is meant to abstract and what it's meant to expose (similar to the fact that a vector is a nicer array and is meant to expose its array nature). If you answer that, you should have a better idea about whether an iterator-based API makes sense.
As Scott Meyers states in Effective C++: yes it is indeed not a good design to grant access to private/protected members via pointers, iterators or references because you never know what the client code will do with it.
As far as I can remember this should be avoided, and it is sometimes better to create a copy of data members which are then returned to the caller.
It is a bad or faulty implementation rather than a design problem.
As for providing access to private or protected members through pointers: basically, it destroys one of the basic OOP principles, abstraction.
I am unsure, though, as to what the question is. Yes, of course it is bad to have an implementation which invalidates iterators. What is the real question here?

C++ Memory management

In college I learned that you always have to free your unused objects, but not how you actually do it in practice, for example how to structure your code for it, and so on.
Are there any general rules on how to handle pointers in C++?
I'm currently not allowed to use Boost. I have to stick to pure C++ because the framework I'm using forbids any use of generics.
I have worked with the embedded Symbian OS, which had an excellent system in place for this, based entirely on developer conventions.
Only one object will ever own a pointer. By default this is the creator.
Ownership can be passed on. To indicate passing of ownership, the object is passed as a pointer in the method signature (e.g. void Foo(Bar *zonk);).
The owner will decide when to delete the object.
To pass an object to a method just for use, the object is passed as a reference in the method signature (e.g. void Foo(Bar &zonk);).
Non-owner classes may store references (never pointers) to objects they are given, but only when they can be certain that the owner will not destroy them during use.
Basically, if a class simply uses something, it uses a reference. If a class owns something, it uses a pointer.
This worked beautifully and was a pleasure to use. Memory issues were very rare.
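A tiny sketch of what that convention looks like in code (the Bar type and the functions are hypothetical):

struct Bar { /* ... */ };

// Pointer parameter: ownership is transferred; Consume() must eventually delete it.
void Consume(Bar* zonk)
{
    // ... use zonk ...
    delete zonk;  // the new owner decides when to delete
}

// Reference parameter: the caller keeps ownership; Use() only borrows the object.
void Use(Bar& zonk)
{
    // ... use zonk ...
}

int main()
{
    Bar* owned = new Bar();  // the creator owns it by default
    Use(*owned);             // lend it out for the duration of the call
    Consume(owned);          // hand ownership away; do not touch 'owned' afterwards
}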
Rules:
1. Wherever possible, use a smart pointer. Boost has some good ones.
2. If you can't use a smart pointer, null out your pointer after deleting it.
3. Never work anywhere that won't let you use rule 1.
If someone disallows rule 1, remember that if you grab someone else's code, change the variable names and delete the copyright notices, no-one will ever notice. Unless it's a school project, where they actually check for that kind of shenanigans with quite sophisticated tools. See also, this question.
I would add another rule here:
Don't new/delete an object when an automatic object will do just fine.
We have found that programmers who are new to C++, or programmers coming over from languages like Java, seem to learn about new and then obsessively use it whenever they want to create any object, regardless of the context. This is especially pernicious when an object is created locally within a function purely to do something useful. Using new in this way can be detrimental to performance and can make it all too easy to introduce silly memory leaks when the corresponding delete is forgotten. Yes, smart pointers can help with the latter, but they won't solve the performance issues (assuming that new/delete or an equivalent is used behind the scenes). Interestingly (well, maybe), we have found that delete often tends to be more expensive than new when using Visual C++.
Some of this confusion also comes from the fact that functions they call might take pointers, or even smart pointers, as arguments (when references would perhaps be better/clearer). This makes them think that they need to "create" a pointer (a lot of people seem to think that this is what new does) to be able to pass a pointer to a function. Clearly, this requires some rules about how APIs are written to make calling conventions as unambiguous as possible, which are reinforced with clear comments supplied with the function prototype.
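A small sketch of the difference, with a hypothetical Widget type:

#include <string>

struct Widget
{
    std::string name;
    void DoSomething() {}
};

void GoodVersion()
{
    Widget w;            // automatic object: fast, destroyed automatically at scope exit
    w.DoSomething();
}

void BadVersion()
{
    Widget* w = new Widget();  // needless heap allocation
    w->DoSomething();
    delete w;                  // easy to forget, and leaked on any early return or exception
}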
In the general case (resource management, where resource is not necessarily memory), you need to be familiar with the RAII pattern. This is one of the most important pieces of information for C++ developers.
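For example, a minimal RAII wrapper around a C-style resource might look like this (a FILE* is just one possible resource):

#include <cstdio>
#include <stdexcept>

// Acquires the resource in the constructor, releases it in the destructor.
class File
{
public:
    explicit File(const char* path)
        : handle(std::fopen(path, "r"))
    {
        if (!handle) throw std::runtime_error("could not open file");
    }

    ~File() { std::fclose(handle); }  // released on every exit path, including exceptions

    // Non-copyable: exactly one owner of the handle.
    File(const File&) = delete;
    File& operator=(const File&) = delete;

    std::FILE* get() const { return handle; }

private:
    std::FILE* handle;
};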
In general, avoid allocating from the heap unless you have to. If you have to, use reference counting for objects that are long-lived and need to be shared between diverse parts of your code.
Sometimes you need to allocate objects dynamically, but they will only be used within a certain span of time. For example, in a previous project I needed to create a complex in-memory representation of a database schema -- basically a complex cyclic graph of objects. However, the graph was only needed for the duration of a database connection, after which all the nodes could be freed in one shot. In this kind of scenario, a good pattern to use is something I call the "local GC idiom." I'm not sure if it has an "official" name, as it's something I've only seen in my own code, and in Cocoa (see NSAutoreleasePool in Apple's Cocoa reference).
In a nutshell, you create a "collector" object that keeps pointers to the temporary objects that you allocate using new. It is usually tied to some scope in your program, either a static scope (e.g. -- as a stack-allocated object that implements the RAII idiom) or a dynamic one (e.g. -- tied to the lifetime of a database connection, as in my previous project). When the "collector" object is freed, its destructor frees all of the objects that it points to.
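A bare-bones sketch of such a collector, under the assumption that templates are available (the names are illustrative, not an established API):

#include <functional>
#include <vector>

// Owns a batch of heap objects and frees them all when it goes out of scope.
class LocalCollector
{
public:
    LocalCollector() = default;
    LocalCollector(const LocalCollector&) = delete;
    LocalCollector& operator=(const LocalCollector&) = delete;

    // Register an object; returns the same pointer so calls can be chained inline.
    template <class T>
    T* Track(T* object)
    {
        cleanup.push_back([object] { delete object; });
        return object;
    }

    ~LocalCollector()
    {
        for (auto& destroy : cleanup) destroy();  // free everything in one shot
    }

private:
    std::vector<std::function<void()>> cleanup;
};

// Usage: tie the collector to a scope (RAII), e.g.
//   LocalCollector gc;
//   Node* root = gc.Track(new Node());   // freed when gc goes out of scope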
Also, like DrPizza I think the restriction to not use templates is too harsh. However, having done a lot of development on ancient versions of Solaris, AIX, and HP-UX (just recently - yes, these platforms are still alive in the Fortune 50), I can tell you that if you really care about portability, you should use templates as little as possible. Using them for containers and smart pointers ought to be ok, though (it worked for me). Without templates the technique I described is more painful to implement. It would require that all objects managed by the "collector" derive from a common base class.
G'day,
I'd suggest reading the relevant sections of "Effective C++" by Scott Meyers. Easy to read and he covers some interesting gotchas to trap the unwary.
I'm also intrigued by the lack of templates. So no STL or Boost. Wow.
BTW, getting people to agree on conventions is an excellent idea, as is getting everyone to agree on conventions for OOD. Unfortunately, the latest edition of Effective C++ doesn't have the excellent chapter about OOD conventions that the first edition had, which is a pity; e.g., conventions such as "public virtual inheritance always models an 'isa' relationship".
Rob
When you have to manage memory manually, make sure you call delete in the same scope/function/class/module, whichever applies first. For example:
Let the caller of a function allocate the memory that is filled by it; do not return new'ed pointers.
Always call delete in the same exe/dll as you called new in, because otherwise you may have problems with heap corruption (different, incompatible runtime libraries).
You could derive everything from some base class that implements smart-pointer-like functionality (using ref()/unref() methods and a counter).
All the points highlighted by #Timbo are important when designing that base class.
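A minimal sketch of such a base class (not thread-safe; an atomic counter would be needed for that, and the class/member names are only illustrative):

// Intrusive reference counting without templates or library smart pointers.
class RefCounted
{
public:
    void ref() { ++count; }

    void unref()
    {
        if (--count == 0)
            delete this;  // requires that all instances are heap-allocated
    }

protected:
    RefCounted() : count(0) {}
    virtual ~RefCounted() {}  // virtual so 'delete this' destroys the derived object

private:
    unsigned count;

    // Copying would corrupt the count, so forbid it (C++98-style).
    RefCounted(const RefCounted&);
    RefCounted& operator=(const RefCounted&);
};

// Usage convention: whoever stores a pointer calls ref(); when done with it, calls unref().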