DirectX11: How to strip unused variables from a constant buffer? (c++)

I am calling D3DReflect() to deduce the layout of the constant buffers used by a compiled shader, and I noticed that they often contain unused variables.
I am already using D3DStripShader() to strip debug info, and I was wondering if there is a similar way to strip those unused variables from constant buffers before calling D3DReflect()?
Is it usually good practice?
Since it would imply, most of the time, having one cbuffer per original cbuffer/stage/program, I don't know whether the gain from stripping unused variables would outweigh the cost of having more (smaller) cbuffers.

There is no simple way to do this. The naive view of constant buffers was that everyone would make explicit structures to hold their constants, and those structures would be shared by both shaders and the calling C++ code (or C#, whatever). Thus, if the shader compiler altered the layout of the structure, everything would break.
This makes sense in the microscopic view when working on DX sample apps. For a larger project, many people don't do that. Instead, they have older style shaders with constants declared at global scope. On DX9 and other similar platforms, the constants were mapped to registers, so the compiler could strip unused constants (and it did). For DX11, the compiler takes all of those global constants, and puts them in a special "global" constant buffer. Then it decides that you really care about the structure of that buffer, so it refuses to remove anything.
So, there are generally two options:
Break your constants into multiple constant buffers, grouped roughly into sets that are used together. The compiler WILL strip an entire constant buffer that's unused, so you can use that to get coarse stripping. This is time consuming, and you have to maintain your set partition, but it might be good enough, depending on your situation.
Implement constant stripping yourself. This is what we do... After compiling all of the shaders once, we use the reflection API to get a list of all the constants in the binary. That information includes the flag that indicates if the constant is used or not. For each used constant, we simply declare it again, as normal. For each constant that wasn't used, we emit a similar declaration, but mark the variable as static. That has the effect of removing it from any constant buffer (because it's treated as a compile-time constant by the shader compiler). Then we re-compile the shaders, and the newly generated global constant buffer only contains the used constants.
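For reference, here is a minimal, untested sketch of how the reflection API exposes that "used" flag (error handling omitted; bytecode and bytecodeSize are assumed to hold the compiled shader blob):
#include <cstdio>
#include <d3dcompiler.h>
#include <d3d11shader.h>

void ListUnusedConstants(const void* bytecode, SIZE_T bytecodeSize)
{
    ID3D11ShaderReflection* reflector = nullptr;
    if (FAILED(D3DReflect(bytecode, bytecodeSize,
                          IID_ID3D11ShaderReflection, (void**)&reflector)))
        return;

    D3D11_SHADER_DESC shaderDesc = {};
    reflector->GetDesc(&shaderDesc);

    for (UINT cb = 0; cb < shaderDesc.ConstantBuffers; ++cb)
    {
        ID3D11ShaderReflectionConstantBuffer* buffer = reflector->GetConstantBufferByIndex(cb);
        D3D11_SHADER_BUFFER_DESC bufferDesc = {};
        buffer->GetDesc(&bufferDesc);

        for (UINT v = 0; v < bufferDesc.Variables; ++v)
        {
            D3D11_SHADER_VARIABLE_DESC varDesc = {};
            buffer->GetVariableByIndex(v)->GetDesc(&varDesc);
            if (!(varDesc.uFlags & D3D_SVF_USED))
                printf("unused: %s::%s\n", bufferDesc.Name, varDesc.Name);
        }
    }
    reflector->Release();
}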
This is also a bunch of work (and in our implementation, we have to wrap all constant declarations in a macro - the wrapper code builds a big string with all of the static/non-static declarations, and defines STRIPPED_CONSTANT_DEFINITIONS to contain that string):
#if defined (STRIPPED_CONSTANT_DEFINITIONS)
STRIPPED_CONSTANT_DEFINITIONS
#else
bool someConstant;
float4 color;
...
#endif
Note that you still need to declare the stripped constants (as static), because otherwise any unused code paths or uncalled functions that refer to those variables would cause the shader to fail to compile.
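As a rough illustration (hypothetical types and names, building on the reflection query sketched above), the wrapper string for the second compilation pass could be assembled along these lines:
#include <string>
#include <vector>

struct ConstantInfo { std::string typeName, name; bool used; }; // filled in from the reflection data

std::string BuildStrippedDefinitions(const std::vector<ConstantInfo>& constants)
{
    std::string defs;
    for (const ConstantInfo& c : constants)
    {
        if (!c.used)
            defs += "static ";   // static => dropped from the constant buffer on re-compile
        defs += c.typeName + " " + c.name + "; ";
    }
    return defs; // passed to D3DCompile as the STRIPPED_CONSTANT_DEFINITIONS macro
}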

Related

Declaring and initializing very large arrays on the heap

I have an array of about 66,000 elements, with each element being a POD struct of integral data types. This array is constant and will never change, so my original thought was to just put it in as a constant global variable.
I declared it in a header as extern and initialized it in a cpp file, like (obviously simplified here):
const PODStruct bigArray[] =
{
    {1,2,3,4} , {1,2,3,5} , ....
};
with some editing in a text editor so it wasn't just one continuous line.
--EDIT: I was reminded that global variables are of course not stack-allocated, so the paragraph that was here went away! However, what if I still would rather have the data in a vector?
I was thinking since C++11 allows that same syntax for std::vector initialization, I could just use that with a simple edit and have a vector instead. However, in MSVC++ 2013, when I attempt to compile it says I've hit compiler limits. I looked through the C++ standards for compiler limits and MSVC++13's deviations from it, but nothing seemed to be directly the cause. I'm guessing it has to do with how that initializer-list syntax is actually implemented.
I can get the array itself into a vector using the constructor in the answer here: How to initialize std::vector from C-style array?
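For reference, that approach boils down to something like this (a minimal sketch; the PODStruct layout here is only a guess based on the initializers above):
#include <iterator>
#include <vector>

struct PODStruct { int a, b, c, d; }; // integral members, as described above

const PODStruct bigArray[] = { {1,2,3,4}, {1,2,3,5} /* ... */ };

// One extra copy: the vector duplicates the data that already lives in the image.
std::vector<PODStruct> bigVector(std::begin(bigArray), std::end(bigArray));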
However, then I'd still have the array in memory twice, right? It's not on the stack like I originally feared and not that big, so it's not a huge deal, but seems like a sloppy solution.
I'm thinking I could create a class with a default constructor, and declare and initialize the typed-out table in there. Then I can declare the vector in the constructor and construct it with the array. The class's only member could just be the vector.
I could just declare that class then create a global instance of it, right? That'd be similar to the behavior I had with the global array. If I wanted to get away from that, is the best approach to declare the class first thing under main, and then pass it around to the functions and methods that need the table?
Should I want to get away from that? This data is, despite there being a lot of it, along the lines of PI = 3.14.
Your idea of storing your "huge constant array" in a compile-time generated constant is OK; that's what I'd do.
If you try to move all of this into a vector or some other kind of heap-allocated array, you'd simply duplicate the data, since the initialization data resides in your executable image anyway.
To work around the (idiotic) MSVC 2013 compiler limit, here is what I'd try.
Switch to the MSVC 2010 compiler: in the build options for your .cpp file in MSVC 2013 you can set the "platform toolset" to MSVC 2010.
Try to redefine your data type. For instance, instead of an array of structs, try an array of (constant) pointers to structs. All of this should be compile-time generated as well.
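A minimal sketch of that second idea (hypothetical names, reusing the PODStruct from the question), still fully compile-time generated:
static const PODStruct entry0 = {1, 2, 3, 4};
static const PODStruct entry1 = {1, 2, 3, 5};
// ... generated for the remaining entries ...

static const PODStruct* const bigArrayByPointer[] = { &entry0, &entry1 /* ... */ };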
With some effort you can most probably work around this. Good luck.
It seems to me that you have hit some undocumented size limit in MSVC and/or on your machine. Maybe try loading the data from an external file instead (preferably using mmap(2))?
Not really relevant, but since this is a huge amount of data, you could look into things like OpenCL or CUDA to let the GPU help you crunch the numbers, if possible. It would make things a lot faster.

What are __builtin_ functions for in C++?

I am debugging a transactional processing system which is performance sensitive.
I found code which uses __builtin_memcpy and __builtin_memset instead of memcpy and memset.
What are __builtin_ functions for?
Are they there to prevent dependency problems across architectures or compilers?
Or is there a performance reason why __builtin_ functions are preferred?
thank you :D
Traditional library functions such as the standard memcpy are just calls to a function. Unfortunately, memcpy is often called for very small copies, and the overhead of calling a function, shuffling a few bytes and returning is quite large (especially since memcpy adds extra code at the beginning of the function to deal with unaligned memory, loop unrolling, etc., to do well on LARGE copies).
So, for the compiler to optimise those calls, it needs to "know" how to do, for example, memcpy itself - the solution is to have a function "built in" to the compiler, which then contains code such as this:
int generate_builtin_memcpy(expr arg1, expr arg2, expr size)
{
    if (is_constant(size) && eval(size) < SOME_NUMBER)
    {
        ... do magic inline memory copy ...
    }
    else
    {
        ... call "real" memcpy ...
    }
}
[For retargetable compilers, there is typically one of these functions for each CPU architecture, that has different configurations as to what conditions the "real" memcpy gets called, or when an inline memcpy is used.]
The key here is that you MAY actually write your own memcpy function - one that ISN'T based on __builtin_memcpy(), that is ALWAYS a real function call, and that doesn't do the same thing as the normal memcpy [you'd be in a bit of trouble if you changed its behaviour much, since the C standard library probably calls memcpy in a few thousand places - but, for example, gathering statistics on how many times memcpy is called and what sizes are copied could be one such use-case].
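As a very rough sketch of that statistics idea (not from this answer; GCC/Clang assumed, replacing the library memcpy is formally undefined behaviour and is only a debugging hack, and you would likely build the file with -ffreestanding or -fno-builtin so the naive loop below isn't itself turned back into a memcpy call):
#include <cstddef>

static unsigned long long g_memcpyCalls = 0;
static unsigned long long g_memcpyBytes = 0;

extern "C" void* memcpy(void* dst, const void* src, std::size_t n)
{
    ++g_memcpyCalls;
    g_memcpyBytes += n;

    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i)   // deliberately naive byte copy
        d[i] = s[i];
    return dst;
}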
Another big reason for using __builtin_* is that they provide code that would otherwise have to be written in inline assembler, or possibly not available at all to the programmer. Setting/getting special registers would be such a thing.
There are other techniques to solve this problem as well; for example, Clang has a library-call optimisation pass that recognises calls to common library functions and substitutes cheaper alternatives: since printf is much "heavier" than puts, it replaces suitable printf("constant string with no formatting\n") calls with puts("constant string with no formatting"), and many trigonometric and other math functions are folded to simple constant values when called with constant arguments, etc.
Calling __builtin_* directly for functions like memcpy or sin or some such is probably the WRONG thing to do - it just makes your code less portable and not at all certain to be faster. Calling __builtin_special_function when there is no other way to get at that functionality is typically the solution in some tricky situations - but you should probably wrap it in your own function, e.g.
int get_magic_property()
{
    return __builtin_get_magic_property();
}
That way, when you port to Windows, you can easily do:
int get_magic_property()
{
#if WIN32
    return Win32GetMagicPropertyEx();
#else
    return __builtin_get_magic_property();
#endif
}
__builtin_* functions are optimised functions provided by the compiler libraries. These might be builtin versions of standard library functions, such as memcpy, and perhaps more typically some of the maths functions.
Alternatively, they might be highly optimised functions for typical tasks for that particular target - eg a DSP might have built-in FFT functions
Which functions are provided as __builtin_ are determined by the developers of the compiler, and will be documented in the manuals for the compiler.
Different CPU types and compilers are designed for different use cases, and this will be reflected in the range of built-in functions provided.
Built-in functions might make use of specialised instructions in the target processor, or might trade off accuracy for speed by using lookup tables rather than calculating values directly, or any other reasonable optimisation, all of which should be documented.
These are definitely not there to reduce dependency on a particular compiler or CPU; in fact quite the opposite: they actually add a dependency, and so they might be wrapped up in preprocessor checks, e.g.
#ifdef SOME_CPU_FLAG
#define MEMCPY __builtin_memcpy
#else
#define MEMCPY memcpy
#endif
On a compiler note: __builtin_memcpy can fall back to emitting a memcpy function call, and it also gives less-capable compilers the option to simplify things by choosing the slow path of unconditionally emitting a memcpy call.
http://lwn.net/Articles/29183/

const vs #define in modern compilers

I've read a few things saying that a #define doesn't take up any memory, but a colleague at work was very insistent that modern compilers don't have any differences when it comes to const int/strings.
#define STD_VEC_HINT 6
const int stdVecHint = 6;
The conversation came about because an old bit of code that was being modernised, which dealt with encryption, had its key as a #define.
I always thought that a variable would end up getting a memory address which would show its contents, but maybe compiling under release removes such stuff.
A good compiler will not allocate space for a const variable that can be elided. In C++, const variables at module scope are also implicitly static in visibility, so it's easier for the compiler to optimize the variable away. GCC's link-time optimization feature helps with cross-module optimization as well.
Don't forget the even more important fact that const variables have proper scoping and type safety, which are missing from #define.
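A tiny illustration of that difference, reusing the names from the question:
#include <vector>

namespace config {
    const int stdVecHint = 6;   // has a type and lives in a scope: config::stdVecHint
}

#define STD_VEC_HINT 6          // plain text substitution, visible everywhere after this line

void reserveHint(std::vector<int>& v)
{
    v.reserve(config::stdVecHint);   // the compiler type-checks this as an int
}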
As with so many things, it depends!
A #define will just inject the constant straight into your code, so it won't take up any memory. The same is potentially true for a const.
However, you can take the address of a const:
const int *value = &stdVecHint;
And since you're taking its address the compiler will need to store the constant in memory in order to generate an address, so in this case it will require memory.
The compiler is likely to replace occurrences of stdVecHint with the literal 6 everywhere it is used. An address and memory space will be taken up if you take its address explicitly, but then again this is a moot point since you couldn't do that with STD_VEC_HINT at all. Pedantically, yes, stdVecHint is a variable with internal linkage, and every translation unit that sees the definition will have its own copy of it. In practice, it shouldn't increase the memory footprint.
A preprocessor directive like #define needs no memory allocation: the preprocessor simply replaces the name of the constant with its value (e.g. pi with 3.14 for #define pi 3.14) before the code is compiled.
With const (const float pi = 3.14;), however, a memory allocation may be needed.
For compatibility with older code the preprocessor approach is no problem, but its future is unclear.
Good luck.

manifest constants vs C++ keyword "const"

Reading Meyers' book (Item 2, "Prefer const to #define"), I'd like to understand some sentences, which I list below:
With reference to the comparison between #define ASPECT_RATIO 1.653 and const double aspect_ratio = 1.653, Meyers writes that "... in the case of floating point constant (such as in this example) use of the constant may yield smaller code than using #define."
The questions are:
By "smaller code" does Meyers mean a smaller executable file on disk?
Why is it smaller? I thought this might be valid on 32-bit systems, because there an int (or pointer) needs 4 bytes and a double 8 bytes: since ASPECT_RATIO may not get entered into the symbol table, the name is replaced with the value, while in other cases a const pointer to a unique double value might be used. That reasoning would no longer hold on 64-bit machines (because a pointer and a double are the same number of bytes). I do not know if I have explained what I mean well, and especially whether this idea is correct.
Then Meyers writes that " ...though good compilers won't set aside storage for const objects of integral types (unless you create a pointer or reference to the object) sloppy compilers may, and you may not be willing to set aside memory for such objects..."
In this context, is the memory the RAM occupied by the process during execution? If so, can I verify this with Task Manager (on Windows) or top (on Linux)?
First, micro-optimizations are stupid. Don't care about a couple of constant double values eating up all your RAM. It won't happen. If it does, handle it then, not before you know it's even relevant.
Second, #define can have nasty side effects if used too much, even with the ALL_CAPS_DEFINES convention. Sooner or later you're going to mistakenly make a short macro whose name also appears inside some other variable's name, with preprocessor replacement giving you an unfathomable and avoidable error and no debuggability at all. As the linked question in the question comments states, macros lack namespace and class scope, and are definitely bad in C++.
Third, C++11 adds constexpr, which allows typesafe macro-performant (whatever this misnomer should mean) constant expressions. There are even those (see the C++ Lounge in SO Chat) that do whole calculations at compile time using constexpr. Unfortunately, not all major compilers claiming C++11 support actually support enough C++11 features to be truly useful (I'm looking at you, MSVC2012!).
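For illustration, a minimal C++11 constexpr sketch (not from the book):
constexpr double square(double x) { return x * x; }

constexpr double aspect_ratio = 1.653;                     // typesafe, scoped constant
constexpr double aspect_ratio_sq = square(aspect_ratio);   // evaluated at compile time

static_assert(square(2.0) == 4.0, "computed by the compiler");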
The reason it "may" yield smaller code is that multiple uses of a define will probably (probably: optimizers do weird stuff) generate the same constant again and again, whereas a const will generate only one definition and then reference that same definition (if the optimizer doesn't inline the value anyway).
The linker outputs several parts when linking your executable. Some parts contain constants, other parts executable code. Whether or not your (operating) system loads the executable into RAM before executing it is not defined by the C++ standard. I've used systems where the code executes from flash storage, so only the stack and dynamically allocated memory use RAM.

What are the major advantages of const versus #define for global constants?

In embedded programming, for example, #define GLOBAL_CONSTANT 42 is preferred to const int GLOBAL_CONSTANT = 42; for the following reasons:
it does not need space in RAM (which is usually very limited in microcontrollers, and µC applications usually need a large number of global constants)
a const not only needs a storage place in flash, but the compiler also generates extra code at the start of the program to copy it
Against all these advantages of using #define, what are the major advantages of using const?
In a non-µC environment memory is usually not such a big issue, and const is useful because it can be used locally, but what about global constants? Or is the answer just "we should never ever ever use global constants"?
Edit:
The examples might have caused some misunderstanding, so I have to state that they are in C. If the C compiler generated the exact same code for the two, I think that would be an error, not an optimization.
I just extended the question to C++ without thinking much about it, in the hope of getting new insights, but it was clear to me that in an object-oriented environment there is very little room for global constants, regardless of whether they are macros or consts.
Are you sure your compiler is too dumb to optimize your constant by inserting its value where it is needed instead of putting it into memory? Compilers usually are good in optimizations.
And the main advantage of constants versus macros is that constants have scope. Macros are substituted everywhere with no respect for scope or context. And it leads to really hard to understand compiler error messages.
Also debuggers are not aware of macros.
The answer to your question varies for C and C++.
In C, const int GLOBAL_CONSTANT is not a constant expression, so the primary way to define a true constant in C is by using #define.
In C++, one of the major advantages of using const over #define is that #defines don't respect scope, so there is no way to create a class-scoped constant, while const variables can be scoped inside classes.
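A minimal sketch (hypothetical names) of such a class-scoped constant:
class RingBuffer {
public:
    static const int Capacity = 64;   // scoped as RingBuffer::Capacity, type-checked
private:
    int data[Capacity];
};

// A #define named Capacity would instead leak into every file that includes this header.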
Apart from that there are other subtle advantages like:
Avoiding weird magic numbers in compilation errors:
If you are using #define, it is replaced by the preprocessor before compilation, so if you receive an error during compilation it will be confusing: the error message won't refer to the macro name but to the value, the value will appear out of nowhere, and one can waste a lot of time tracking it down in the code.
Ease of debugging:
For the same reasons, a #define provides no real help while debugging, either.
Another reason that hasn't been mentioned yet is that const variables allow the compiler to perform explicit type-checking, but macros do not. Using const can help prevent subtle data-dependent errors that are often difficult to debug.
I think the main advantage is that you can change the constant without having to recompile everything that uses it.
Since a macro change effectively modifies the contents of every file that uses the macro, recompilation of all of them is necessary.
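A minimal sketch of that setup (hypothetical file names); only the one .cpp file has to be rebuilt when the value changes, everything else just relinks:
// constants.h
extern const int GLOBAL_CONSTANT;

// constants.cpp
#include "constants.h"
const int GLOBAL_CONSTANT = 42;

// any_other_file.cpp
#include "constants.h"
int scaled(int x) { return x * GLOBAL_CONSTANT; }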
In C the const qualifier does not define a constant but instead a read-only object:
#define A 42 // A is a constant
const int a = 42; // a is not constant
A const object cannot be used where a real constant is required, for example:
static int bla1 = A; // OK, A is a constant
static int bla2 = a; // compile error, a is not a constant
Note that this is different in C++ where the const really qualifies an object as a constant.
The only problems you list with const sum up as "I've got the most incompetent compiler I can possibly imagine". The problems with #define, however, are universal - for example, no scoping.
There's no reason to use #define instead of a const int in C++. Any decent C++ compiler will substitute the constant value from a const int in the same way it does for a #define where it is possible to do so. Both take approximately the same amount of flash when used the same way.
Using a const does allow you to take the address of the value (where a macro does not). At that point, the behavior obviously diverges from the behavior of a Macro. The const now needs a space in the program in both flash and in RAM to live so that it can have an address. But this is really what you want.
The overhead here is typically going to be an extra 8 bytes, which is tiny compared to the size of most programs. Before you get to this level of optimization, make sure you have exhausted all other options like compiler flags. Using the compiler to carefully optimize for size and not using things like templates in C++ will save you a lot more than 8 bytes.