What is the point of D3D12's SetGraphicsRootSignature? - directx-12

I am a little confused about the existence of the ID3D12GraphicsCommandList::SetGraphicsRootSignature method. From what I understand of this MSDN page, it seems that the only valid usage of it is to always call it after SetPipelineState, giving it the same root signature as was provided when creating the pipeline state object. If that's so, what benefit is there to it not being implicit? Are there other ways to use this method?

It is a CPU optimisation. Internally, it is possible to prepare part of the mapping from the root signature slots to the actual bindings. If you share a root signature between different pipeline state objects, then this work can be done once per root signature instead of once per pipeline state object.
You are likely to call SetGraphicsRootSignature less often than SetPipelineState, which is why the two calls are separate.
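To illustrate the sharing described above, here is a minimal sketch that records several draws and binds the root signature only when it changes. RecordDraws, DrawItem and CountingCommandList are hypothetical names, not part of D3D12. The template only assumes a command list type with SetGraphicsRootSignature and SetPipelineState methods, which ID3D12GraphicsCommandList provides. The counting stand-in is there so the sketch can be exercised without a GPU or the Windows SDK.

```cpp
#include <vector>

// Hypothetical draw record. RootSig and Pso stand in for
// ID3D12RootSignature* and ID3D12PipelineState*.
template <typename RootSig, typename Pso>
struct DrawItem {
    RootSig rootSignature;
    Pso pipelineState;
};

// Records state for each item, binding the root signature only when it
// differs from the one already bound on this command list.
template <typename CommandList, typename RootSig, typename Pso>
void RecordDraws(CommandList& cmd,
                 const std::vector<DrawItem<RootSig, Pso>>& items) {
    bool bound = false;
    RootSig current{};
    for (const auto& item : items) {
        if (!bound || item.rootSignature != current) {
            cmd.SetGraphicsRootSignature(item.rootSignature);
            current = item.rootSignature;
            bound = true;
        }
        cmd.SetPipelineState(item.pipelineState);
        // Root arguments and the draw call would follow here.
    }
}

// Stand-in command list that only counts calls (an assumption made for
// illustration, not a real D3D12 type).
struct CountingCommandList {
    int rootSignatureSets = 0;
    int pipelineStateSets = 0;
    void SetGraphicsRootSignature(int) { ++rootSignatureSets; }
    void SetPipelineState(int) { ++pipelineStateSets; }
};
```

With four items that use two root signatures and are grouped by root signature, this issues two SetGraphicsRootSignature calls and four SetPipelineState calls.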

The "root signature" in DirectX 12 provides the common layout information for sharing data between the CPU data structures and the GPU shader language execution. DirectX 12 makes the programmer decide how many root signatures they want to use, when to use them, and which pipeline state objects need which root signature. In Direct3D 11, there's essentially one "root signature" active at all times which is quite large.
Root signatures can be changed fairly often without a major penalty, but the assumption is that you will have a few root signatures and many PSOs rather than a 1:1 correspondence.
For simplicity in DirectX Tool Kit for DirectX 12, we set the root signature every time we set the PSO in the IEffect::Apply method, even though we only use a few different root signatures.

@Chuck Walbourn
"Root signatures can be changed fairly often without a major penalty"
https://developer.nvidia.com/dx12-dos-and-donts#roots
Minimize the number of Root Signature changes
The problem is not the change of the RS but there is usually a follow up cost of initializing the root signature entries after such a change


"State pattern" vs "one member function per state"?

My class has 3 states. In each state it does some work, and goes to another state, or remains in the same state (in 95% or more of cases it will stay in the same state). I can implement the state pattern (I assume you know it). The alternative, which I rather like, is this:
I have a member function per state, and also a pointer to member function, which points to the current state function. When in a state I want to go to another state, I just point that function pointer to another state function. (maybe this isn't completely equivalent to state pattern, but in my case it works fine).
Those two ways are almost identical, I think.
So, my questions are:
Which solution is better (depends on what)?
Is it worth declaring a class per state (which will have only one function)? I think that would be artificial.
What about performance? Doesn't creating a new object of a state class (in the case of the state pattern) bring a slight overhead with it? (Sure, state classes shouldn't have members, but it should still cost something.)
You don't really mention the constraints under which your program will run, so it's hard to comment specifically on the overhead of one implementation versus the other. I'll just comment on code maintainability.
Personally I think that unless your state machine is extremely simple and will stay simple, declaring a class per state is far more maintainable, extensible and readable. A good rule of thumb might be that if you can't look at the code in your class and keep the entire picture in your head, then your class is probably doing too much. The small overhead you pay in declaring a class per state is likely to be well worth the productivity gains that you (or anyone else who ends up maintaining the code) will get from writing modular code. I've come across far too many 'uber' classes that are essentially one big (very hard to maintain) state machine that probably started out as a simple state machine, to recommend otherwise.
The 'S' and 'O' portions of the SOLID acronym (https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)) are always good things to keep in mind.
It depends on whether you need to access private members of your object. If not, then an out-of-class implementation breaks your code into smaller fragments and may be preferable for that reason (but this is subjective: both solutions have pros and cons).
It's not necessary, but it adds a layer of abstraction and loosens the coupling. Using an interface, you can change each implementation without affecting the others (e.g. by adding class fields).
It doesn't matter much: allocating a new empty class and calling a function have the same order of magnitude of overhead.
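The "one member function per state" approach from the question can be sketched as follows. This is only an illustration: the class name Machine, the three state names and the event strings are all assumptions, since the question does not show any code.

```cpp
#include <string>

class Machine {
public:
    // Feeds one event to whichever state function is current.
    void handle(const std::string& event) { (this->*state_)(event); }
    const char* stateName() const { return name_; }

private:
    using StateFn = void (Machine::*)(const std::string&);

    void idle(const std::string& event) {
        if (event == "start") transition(&Machine::running, "running");
    }
    void running(const std::string& event) {
        if (event == "finish") transition(&Machine::done, "done");
        // Any other event: stay in the same state (the common case).
    }
    void done(const std::string&) {}

    // Changing state is just repointing the member-function pointer.
    void transition(StateFn next, const char* name) {
        state_ = next;
        name_ = name;
    }

    StateFn state_ = &Machine::idle;
    const char* name_ = "idle";
};
```

No object is allocated on a transition here, and every state function can reach the private members of Machine directly. The trade-off raised in the answers is that all states live in one class, which becomes harder to read as the machine grows.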

Performance: should I use a global variable in a function which gets called often?

First off, let me get off my chest the fact that I'm a greenhorn trying to do things the right way, which means that every now and then I run into a contradiction about what the right way is.
I am modifying a driver for a peripheral which contains a function; let's call it Send(). In the function I have a timestamp variable so that the function loops for a specified amount of time.
So, should I declare the variable global (that way it is always in memory and no time is lost for declaring it each time the function runs) or do I leave the variable local to the function context (and avoid a bad design pattern with global variables)?
Please bear in mind that the function can be called multiple times per millisecond.
Speed of execution shouldn't be significantly different for a local vs. a global variable. The only real difference is where the variable lives. Local variables are allocated on the stack, global variables are in a different memory segment. It is true that local variables are allocated every time you enter a routine, but allocating memory is a single instruction to move the stack pointer.
There are much more important considerations when deciding if a variable should be global or local.
When implementing a driver, try to avoid global variables as much as possible, because:
They are thread-unsafe, and you have no idea about the scheduling scheme of the user application (in fact, even without threads, using multiple instances of the same driver is a potential problem).
They force the creation of a data section in the executable image of any application that links to your driver (which is something that the application programmer might want to avoid).
Did you profile a fully-optimized, release build of your code and identify the bottleneck to be small allocations in this function?
The change you are proposing is a micro-optimization: a change to a small part of your code with the intent to make it more efficient. If the answer to the above question is "no", as I'd expect, you shouldn't even be thinking of such things.
Select the correct algorithm for your code. Write your code using idiomatic techniques. Do not write in micro-optimizations. You might be surprised how good your compiler is at optimizing your code for you. It will often be able to optimize away these small allocations, but even if it can't you still don't know if the performance penalty imposed by them is even noticeable or significant.
For drivers, which are usually position-independent, global variables are accessed indirectly through the GOT (global offset table) unless IP-relative addressing is available (e.g. x86_64, ARM, etc.).
With the GOT, you can think of it as one extra pointer indirection.
However, even with an extra pointer, it won't make any observable difference if the function is "only" called at millisecond frequency.
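As a rough sketch of the local-variable version discussed in the answers, the timestamp can simply live inside the function. SendFor is a hypothetical stand-in for the driver's Send(), and the use of std::chrono is an assumption, since the question does not say how the driver reads time.

```cpp
#include <chrono>

// Hypothetical stand-in for the driver's Send(): loops until the time
// budget is spent and returns how many iterations it ran.
unsigned SendFor(std::chrono::microseconds budget) {
    using Clock = std::chrono::steady_clock;
    // Local variable: its stack slot is reserved as part of the normal
    // frame setup on function entry, not by a separate allocation.
    const Clock::time_point deadline = Clock::now() + budget;
    unsigned iterations = 0;
    do {
        ++iterations;  // The real driver would talk to the peripheral here.
    } while (Clock::now() < deadline);
    return iterations;
}
```

Because the deadline is local, two threads (or two driver instances) calling SendFor at the same time each get their own copy, which is the thread-safety point made above.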

Behavior of creating objects in ColdFusion

At one time I had a theory that instantiating objects on every request rather than having them reside in the Application scope was a huge memory hog. As my knowledge of ColdFusion has grown over the years, I don't think I really understood how CF deals with classes in the "black box" of the CF framework, so I'm going to ask this for community correction or confirmation.
I'm just going to throw out what I think is happening:
A CFC is compiled into a class, each method within that CFC is compiled into a class.
Those classes will reside in (PermGen) memory and can be written to disk based on CF administrator settings.
When a new object is created or template requested, the source code is hashed and compared to the hash stored with the compiled class.
If there is a match, it will use the compiled class in memory
If the compiled class doesn't exist, it will compile from source
If the compiled class exists, but the hash doesn't match, it will recompile.
As an aside, whenever you enable trusted cache, ColdFusion will no longer hash the source to check for differences and will continue to use the compiled class in memory.
Whenever you create a new object, you get a new pointer to the compiled class and its methods' classes and any runtime events occur in the pseudo-constructor. Edit: At this point, I'm referring to using createObject and having any "loose" code outside of functions run. When I say pointer, I mean the reference to memory allocated for the object's scopes (this, variables, function variables).
If you request an init, then the constructor runs. The memory consumed at this point is just your new reference and any variables set in the pseudo-constructor and constructor. You are not actually taking up memory for a copy of the entire class. Edit: For this step I'm referring to using the new operator or chaining your createObject().init() old school.
This eliminates a huge fallacy that I, personally, might have heard over the years that instantiating large objects in every request is a massive memory hog (due to having a copy of the class rather than just a reference). Please note that I am not in favor of this, the singleton pattern is amazing. I'm just trying to confirm what is going on under the hood to prevent chasing down red herrings in legacy code.
Edit: Thanks for the input everyone, this was a really helpful Q/A for me.
I've been developing CF for 14 years and I've never heard anyone claim that creating CFC instances on each request consumed memory due to class compilation. At the Java level, your CFML code is compiled directly to bytecode and stored as Java classes in memory and on disk. Java classes are not stored in the heap, but rather in the permanent generation, which is not (usually) a collected memory space. You can create as many instances of that CFC as you like and no more perm gen space will be used; however, heap space will be allocated to store the instance data for that CFC for the duration of its existence. Note that open source Railo does not use separate classes for methods.
Now, if you create a very large number of CFC instances (or any variables, for that matter), that will create a lot of cruft in your heap's young generation. As long as hard references are not held after the request finishes, those objects will be cleared from the heap when the next minor garbage collection runs. This isn't necessarily a bad thing, but heap sizes and GC pauses should always be taken into account when performance tuning an application.
Now, there are reasons to persist CFC instances, either as a singleton pattern or for the duration of a session, request, etc. One reason is the overhead of actual object creation. This often involves disk I/O to check last modified times. Object creation has become significantly faster since the old days, but it is still pretty far behind native Java if you're going to be creating thousands of instances. The other main reason is for your objects to maintain state over the life of the application/session/request, such as a shopping cart stored in session while the user shops.
And for completeness, I'll attempt to address your points categorically:
For Adobe CF yes, for Railo, methods are inner classes
Yes.
Actually, I don't believe there is any hashing involved. It's all based on the datetime last modified on the source file.
Yes, but again, no hashing-- it just skips the disk I/O to check the last modified datetime
I don't think "pointer" is the right term as that implies the Java classes actually live in the heap. CF uses a custom URL classloader to load the class for the template and then an INSTANCE of that class is created and stored in the heap. I can understand how this may be confusing as CFML has no concept of "class". Everything is simply an instance or doesn't exist at all. I'm not sure what you mean by "runtime events occur[ing] in the pseudo-constructor".
To be clear, the JAVA constructor already ran the instant you created the CFC. The CF constructor may be optional, but it has zero bearing on the memory consumed by the CFC instance. Again, I think you're getting unnecessarily hung up on the pseudo-constructor as well. That's just loose code inside the component that runs when it is created and has no bearing on memory allocated in the heap. The Java class is never copied, it is just the template for the instance.

Are classes guaranteed to have the same organization in memory between program runs?

I'm attempting to implement a Save/Load feature into my small game. To accomplish this I have a central class that stores all the important variables of the game such as position, etc. I then save this class as binary data to a file. Then simply load it back for the loading function. This seems to work MOST of the time, but if I change certain things then try to do a save/load the program will crash with memory access violations. So, are classes guaranteed to have the same structure in memory on every run of the program or can the data be arranged at random like a struct?
Response to Jesus - I mean the data inside the class, so that if I save the class to disk, when I load it back, will everything fit nicely back.
Save
fout.write((char*) &game, sizeof(Game));
Load
fin.read((char*) &game, sizeof(Game));
Your approach is extremely fragile. With many restrictions, it can work. These restrictions are not worth subjecting your users (or yourself!) to in typical cases.
Some Restrictions:
Never refer to external memory (e.g. a pointer or reference)
Forbid ABI changes/differences. Common case: memory layout and natural alignment will vary between 32-bit and 64-bit builds. The user will need a new 'game' for each ABI.
Not endian compatible.
Altering your type's layouts will break your game. Changing your compiler options can do this.
You're basically limited to POD data.
Use offsets instead of pointers to refer to internal data (This reference would be in contiguous memory).
Therefore, you can safely use this approach in extremely limited situations -- that typically applies only to components of a system, rather than the entire state of the game.
Since this is tagged C++, "boost - Serialization" would be a good starting point. It's well tested and abstracts many of the complexities for you.
Even if this would work, just don't do it. Define a file format at the byte-level and write sensible 'convert to file format' and 'convert from file format' functions. You'll actually know the format of the file. You'll be able to extend it. Newer versions of the program will be able to read files from older versions. And you'll be able to update your platform, build tools, and classes without fear of causing your program to crash.
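A byte-level format of the kind suggested above might look like the following sketch. The SaveData fields, the version number and the helper names are all assumptions made for illustration. The layout is a version field followed by three 32-bit little-endian values.

```cpp
#include <cstdint>
#include <sstream>  // Also provides std::istream and std::ostream.

// Hypothetical subset of the game state, used only for illustration.
struct SaveData {
    std::int32_t playerX = 0;
    std::int32_t playerY = 0;
    std::uint32_t score = 0;
};

constexpr std::uint32_t kSaveVersion = 1;

// Writes a 32-bit value as four little-endian bytes, independent of the
// host's endianness and of any struct padding.
inline void writeU32(std::ostream& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.put(static_cast<char>((v >> shift) & 0xFF));
}

// Reads four little-endian bytes; returns false if the stream runs out.
inline bool readU32(std::istream& in, std::uint32_t& v) {
    v = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const int byte = in.get();
        if (!in) return false;
        v |= static_cast<std::uint32_t>(byte) << shift;
    }
    return true;
}

inline void save(std::ostream& out, const SaveData& d) {
    writeU32(out, kSaveVersion);
    writeU32(out, static_cast<std::uint32_t>(d.playerX));
    writeU32(out, static_cast<std::uint32_t>(d.playerY));
    writeU32(out, d.score);
}

// Rejects truncated files and unknown versions instead of crashing.
inline bool load(std::istream& in, SaveData& d) {
    std::uint32_t version = 0, x = 0, y = 0, score = 0;
    if (!readU32(in, version) || version != kSaveVersion) return false;
    if (!readU32(in, x) || !readU32(in, y) || !readU32(in, score)) return false;
    d.playerX = static_cast<std::int32_t>(x);
    d.playerY = static_cast<std::int32_t>(y);
    d.score = score;
    return true;
}
```

With real files, the streams would be opened in binary mode (std::ios::binary). A later version of the program can branch on the version field to keep reading older saves.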
Yes, classes and structures will have the same layout in memory every time your program runs, although I can't say whether the standard enforces this. The machine code generated by C++ compilers uses "hard-coded" offsets to access type fields, so they are fixed. Realistically, the layout will only change if you modify the C++ class definition (field sizes, order, virtual methods, etc.), compile with a different compiler, or change compiler options.
As long as the type is POD and has no pointer fields, it should be safe to simply dump it to a file and read it back with the exact same program. However, because of the concerns mentioned above, this approach is quite inflexible with regard to versioning and interoperability.
[edit]
To respond to your own edit, do not do this with your "Game" object! It certainly has pointers to other objects, and those objects will no longer exist in memory, or will be elsewhere, when you reload your file.
You might want to take a look at this.
Classes are not guaranteed to have the same structure in memory as pointers can point to different locations in memory each time a class is created.
However, without posting code it is difficult to say with certainty where the problem is.

What's the best way to set 'deep' configuration options?

Assume there is a function that requires a configuration setting as an input, but this function is called several levels deep from the top-level 'main' function.
What's the best way, in terms of best programming practices, to pass this setting to the function?
One way is to just use a global variable and set that at the top level function and read it in the target function, but I assume that that is considered bad programming practice.
Another way is to pass the setting as an argument all the way from the top, through the several intermediate functions, all the way down to the final target function. This seems very tedious though and perhaps error-prone.
Are there other approaches?
You can use your language of choice for your answer, but FYI, I'm using C/C++, Perl, and Matlab.
I like singleton objects for configuration. It's a shared resource that should only ever have one instance. When you try to create a new object, you get the existing one. You don't worry about global variables or subroutine or method parameters. Simply get a new configuration object and use it as long as you need it.
There's an example in Gang of Four for C++.
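As a minimal sketch of the singleton idea in C++, the following uses a function-local static, which is one common way to write it and not necessarily the form shown in the book. The Config class, its get and set methods, the "output_dir" key and deepFunction are all hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical configuration holder; keys and values are illustrative.
class Config {
public:
    // Function-local static: constructed once on first use, and its
    // initialization is thread-safe since C++11.
    static Config& instance() {
        static Config config;
        return config;
    }

    Config(const Config&) = delete;
    Config& operator=(const Config&) = delete;

    void set(const std::string& key, const std::string& value) {
        values_[key] = value;
    }

    std::string get(const std::string& key,
                    const std::string& fallback = "") const {
        const auto it = values_.find(key);
        return it == values_.end() ? fallback : it->second;
    }

private:
    Config() = default;
    std::map<std::string, std::string> values_;
};

// A function several levels down the call stack reads the setting
// directly, without it being passed through the intermediate functions.
inline std::string deepFunction() {
    return Config::instance().get("output_dir", "/tmp");
}
```

The top-level function would call Config::instance().set(...) once at startup. Note that a singleton is still shared global state, so the usual caveats about hidden dependencies and testing apply.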
Leave the procedural programming style with deep call stacks behind and the answer becomes a banality.
Remodel your program to take advantage of modern object-orientation. Perl roles make for flat hierarchies. A configuration is then just an attribute.
A system I work with uses a Publish-Subscribe (Observer Pattern) implementation to propagate settings/configuration changes to objects that need to know about them.
The object (Subscriber, or Observer in the original Gang of Four description) that needs to be notified of settings changes:
Inherits from Subscriber.
Attaches itself (subscribes) to the Publisher via the Publisher's Attach method.
Is notified by the Publisher whenever settings/configuration changes occur.
We use a variant that allows Subscribers to poll Publishers for settings/configuration data on demand.
Using the Publish-Subscribe pattern minimizes coupling between the object that manages the settings, and the objects that need them.
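The Publish-Subscribe arrangement described above could be sketched roughly like this. Only the Subscriber and Publisher roles and the Attach method come from the description; Detach, Publish, onSettingChanged, the Logger class and the "log_level" key are assumptions for illustration, and the polling variant is not shown.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Interface for objects that want to hear about settings changes.
class Subscriber {
public:
    virtual ~Subscriber() = default;
    virtual void onSettingChanged(const std::string& key,
                                  const std::string& value) = 0;
};

class Publisher {
public:
    void Attach(Subscriber* s) { subscribers_.push_back(s); }
    void Detach(Subscriber* s) {
        subscribers_.erase(
            std::remove(subscribers_.begin(), subscribers_.end(), s),
            subscribers_.end());
    }
    // Notifies every attached subscriber of the new value.
    void Publish(const std::string& key, const std::string& value) {
        for (Subscriber* s : subscribers_) s->onSettingChanged(key, value);
    }

private:
    std::vector<Subscriber*> subscribers_;  // Non-owning pointers.
};

// Hypothetical subscriber that caches the one setting it cares about.
class Logger : public Subscriber {
public:
    void onSettingChanged(const std::string& key,
                          const std::string& value) override {
        if (key == "log_level") level_ = value;
    }
    const std::string& level() const { return level_; }

private:
    std::string level_ = "info";
};
```

The object that manages settings only knows the Subscriber interface, so code deep in the call hierarchy can receive configuration without the intermediate functions passing it along. Subscribers must detach before they are destroyed, because the publisher holds raw pointers.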
In MATLAB, I always have a script allParam.m, where I set all the parameters.
If a function needs one of those parameters, I just call the script, and the parameter is set.