I've done my best to find an answer to this with no luck. Also, I've tested it and don't see any difference whatsoever in an optimized release build (there is a difference in debug)... still, I can't imagine why there is no difference, or how the optimizer is able to remove the penalty, and maybe someone knows what is happening internally.
If I create new instances of a simple class/struct within a loop, is there any penalty in efficiency for creating the class/struct on every loop iteration?
i.e.
struct mystruct
{
inline mystruct(const double &initial) : _myvalue(initial) {}
double myvalue;
}
why does...
for(int i=0; i<big_int; ++i)
{
mystruct a = mystruct(1.1)
}
take the same amount of real time as
for(int i=0; i<big_int; ++i)
{
double s = 1.1
}
?? Shouldn't there be some time required for the constructor/initialization?
This is easy-peasy work for a modern optimizer to handle.
As a programmer you might look at that constructor and struct and think it has to cost something. "The constructor code involves branching, passing arguments through registers/stack, popping from the stack, etc. The struct is a user-defined type, it must add more data somewhere. There's aliasing/indirection overhead for the const reference, etc."
Except the optimizer then has a go at your code, and it notices that the struct has no virtual functions, it has no objects that require a non-trivial constructor. The whole thing fits into a general-purpose register. And then it notices that your constructor is doing little more than assigning one variable to another. And it'll probably even notice that you're just calling it with a literal constant, which translates to a single move/store instruction to a register which doesn't even require any additional memory beyond the instruction.
It's all very magical, and compilers are sophisticated beasts, but they usually do this in multiple passes, and from your original code to intermediate representations, and from intermediate representations to machine code. To really appreciate and understand what they do, it's worth having a peek at the disassembly from time to time.
It's worth noting that C++ has been around for decades. As a successor to C, it originally was pushed mostly as an object-oriented language with hot concepts like encapsulation and information hiding. To promote a language where people start replacing public data members and manual initialization/destruction and things like that for simple accessor functions, constructors, destructors, it would have been very difficult to popularize the language if there was a measurable overhead in even a simple function call. So as magical as this all sounds, C++ optimizers have been doing this now for decades, squashing all that overhead you add to make things easier to maintain down to the same assembly as something which wouldn't be so easy to maintain.
So it's generally worth thinking of things like function calls and small structures as being basically free, since if it's worth inlining and squashing away all the overhead to zilch, optimizers will generally do it. Exceptions arise with indirect function calls: virtual methods, calls through function pointers, etc. But the code you posted is easy stuff for a modern optimizer to squash down.
C++ philosophy is that you should not "pay" (in CPU cycles or in memory bytes) for anything that you do not use. The struct in your example is nothing more than a double with a constructor tied to it. Moreover, the constructor can be inlined, bringing the overhead all the way down to zero.
If your struct had other parts to initialize, such as other fields or a table of virtual functions, there would be some overhead. The way your example is set up, however, the compiler can optimize out the constructor, producing an assembly output that boils down to a single assignment of a double.
Neither of your loops do anything. Dead code may be removed. Furthermore, there is no representational difference between a struct containing a single double and a primitive double. The compier should be able to easily "see through" an inline constructor. C++ relies on optimisations of these things to allow its abstractions to compete with hand-written versions.
There is no reason for the performance to be different, and if it were, I would consider it a bug (up to debug builds, where debug information could change the performance cost).
These quotes from the C++ Standard may help to understand what optimization is permitted:
The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
and also:
The least requirements on a conforming implementation are:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program.
To summarize: the compiler can generate whatever executable it likes so long as that executable performs the same I/O and access to volatile variables as the unoptimized version would. In particular, there are no requirements about timing or memory allocation.
In your code sample, the entire thing could be optimized out as it produces no observable behaviour. However, real-world compilers sometimes decide to leave in things that could be optimized out, if they think the programmer really wanted those operations to happen for some reason.
#Ikes answer is exactly what I was getting at. However, If you are curious about this question, I very much recommend reading answers of #dasblinkenlight, #Mankarse, and #Matt McNabb and the discussions below them, which get at the details of the situation. Thanks all.
Related
Consider this function:
void f(void* loc)
{
auto p = new(loc) volatile int{42};
*p = 0;
}
I have check the generated code by clang, gcc and CL, none of them elide the initialization. (The answer may be seen by the hardwer:).
Is it an extension provided by compilers to the standard? Does the standard allow compilers not to perform the write 42?
Actualy for objects of class type, it is specfied that constructor of an object is executed without consideration for the volatile qualifier [class.ctor]:
A constructor can be invoked for a const, volatile or const volatile object. const and volatile
semantics (10.1.7.1) are not applied on an object under construction. They come into effect when the
constructor for the most derived object (4.5) ends.
[intro.execution]/8 lists the minimum requirements for a conforming implementation; these are also known as “observable behavior”. The first requirement is that “Access to volatile objects are evaluated strictly according to the rules of the abstract machine.” The compiler is required to produce all observable behavior. In particular, it is not allowed to remove accesses to volatile objects. And note that “object” here is used in the compiler-writer’s sense: it includes built-in types.
This is not a coherent question because what it means for a compiler to perform a write is platform-specific. There is no platform-independent notion of performing a write other than perhaps seeing the effects of a write in a subsequent read.
As you see, typical compilers on x86 will emit a write instruction but no memory barrier. The CPU may reorder the write, coalesce it, or even avoid doing any write to main memory because of the way the platform's cache coherence works.
The reason they made this implementation choice is that it makes volatile work for a broad range of applications, including those where the standard requires it to work, and because it has acceptable performance consequences. The standard, being platform-neutral, doesn't dictate platform-specific decisions like this and compiler writers do not understand it to do that.
They could have forced every volatile access to be uncoalsecable, un-reorderable, and pushed through the cache subsystem to main memory. But that would provide terrible performance and, on this platform, no significant benefits. So they don't do it, and they don't understand the C++ standard to suggest that there's some mythical observer on the memory bus who must see specific things. The very existence of a memory bus is platform-specific. The standard is not platform-specific.
You will sometimes see people argue, for example, that the standard somehow requires the compiler to issue instructions to do volatile writes in order but that it doesn't matter if the CPU coalesces or re-orders the writes. This is, frankly, silly. The C++ standard doesn't impose requirements on the instructions compilers generate but rather on what those instructions must actually do when executed. It doesn't distinguish between optimizations done by a CPU and optimizations done by a compiler and any such distinctions would be platform-specific anyway.
If the standard allows a CPU to re-order two writes, then it allows the compiler to re-order them. It does not, and cannot, make that kind of distinction. Of course, compiler writers may still decide that they will issues the writes in order even though the CPU can re-order them because that may make the most sense on their platform.
I understand the abstract functionality of these tricks with "const" are for security issues to not let a programmer unintentionally change things, or to not mess things intentionally, or by accident between different programmers of same project (correct me if I am wrong).
What unclear is:
How this commitment (to be unchanged) is achieved by means of compiler, operating system (processes, threads), hardware (CPU, RAM) related, etc - i.e. "how whole system 'marks' this cell(s) to treat in such a special way".
Does using "const" in a (C/C++) code degrade a performance, for extremely efficient applications, and does it depend on compiler (if yes - please specify on which it does)?
To answer both of your questions at once:
const will never degrade performance, if anything it will increase performance. This is because the compiler is allowed to make certain assumptions when variables are declared const. They can't be changed, so the compiler can use this to optimize your code by either refactoring parts of the consts or adding the values of these consts to a read-only data segment, allowing faster lookup.
const_casting this const away (lying to the compiler) while the data is stored in a read-only segment and trying to overwrite it will result in undefined behaviour which is why the compiler can rely on the programmer not doing this and can therefore increase speed.
The compiler is the only one responsible for optimizing this part, not any of the "operating system (processes, threads), hardware (CPU, RAM)".
I understand the abstract functionality of these tricks with "const" are for security issues
Well, constness isn't a trick, and it's not a security issue in the sense that phrase is normally used
... not let a programmer unintentionally change things ...
Yes, const tells the compiler - and other developers including your future self - that you don't intend to mutate something
... or to not mess things intentionally ...
No, because you can simply use const_cast to get around it (where the underlying object is genuinely mutable). You just can't do it accidentally.
... How this commitment (to be unchanged) is achieved by means of compiler ...
That's it. That's everything. The compiler refuses to compile code which mutates const objects. If code that mutates const objects doesn't compile, nothing needs to be done at runtime, and nothing is.
There are other protections, depending on your OS and hardware platform, against writing to regions of memory your process shouldn't be able to change: see for example the UNIX world's SEGV, which is typically enforced by your memory address-mapping hardware and OS working together.
These runtime concepts aren't directly expressed in the language, although they do affect how the language can be implemented on a given platform.
... Does using "const" in a (C/C++) code degrade a performance ...
No, const is normally used to express more clearly how your code is intended to behave, and this extra information sometimes allows the compiler to make more optimizations. I can't think of an obvious reason why it would make anything slower, unless it forces otherwise-avoidable copies.
You are mistaken in thinking this is for security reasons: in fact the const keyword is not enforced by anything except the compiler. Even the compiler though can be ordered to ignore the const keyword by means of a const-cast.
As such there are no performance considerations associated with const at runtime (there may be at compile time as the compiler may use it to optimize code differently).
This article by Herb Sutter answers most of your questions.
When a variable is const, the compiler assumes that the variable has not been changed in this scope, and uses that information to optimize the code.
Any attempt to modify it will result in undefined behavior:
Except that any class member declared mutable (7.1.1) can be modified,
any attempt to modify a const object during its lifetime (3.8) results
in undefined behavior.
How this commitment (to be unchanged) is achieved by means of
compiler, operating system (processes, threads), hardware (CPU, RAM)
related, etc - i.e. "how whole system 'marks' this cell(s) to treat in
such a special way".
const enforces a read-only contract, and breaching that contract will always be caught at compile-time - for a correct (standard conforming code) code; and with the exception of const_cast.
The compiler typically knows its target architecture very well. And using const correctly may help it structure the code generated in special ways to improve safety and perhaps speed. Note: that const will never degrade speed unless your Compiler is broken.
const objects can be kept in read-only memory, or special memory that can be marked as read-only (which enforces a form of hardware safety, an attempt to write after marking the memory as read-only will trigger an Access Violation Exception.
Does using "const" in a (C/C++) code degrade a performance, for
extremely efficient applications, and does it depend on compiler (if
yes - please specify on which it does)?
No, it doesn't. If it does, that compiler is buggy.
Not sure if this has already been asked before. While answering this very simple question, I asked myself the following instead. Consider this:
void foo()
{
int i{};
const ReallyAnyType[] data = { item1, item2, item3,
/* many items that may be potentially heavy to recreate, e.g. of class type */ };
/* function code here... */
}
Now in theory, local variables are recreated every time control reaches function, right? I.e. look at int i above - it's going to be recreated on the stack for sure. What about the array above? Can a compiler be as smart as to optimize its creation to occur only once, or do I need a static modifier here anyway? What about if the array is not const? (OK, if it's not const, there probably i snot sense in creating it only once, since re-initialization to the default state may be required between calls due to modifications being made during function execution.)
Might sound like a basic question, but for some reason I still ponder. Also, ignore the "why would you want to do this" - this is just a language question, not applied to a certain programming problem or design. I mean both C and C++ here. Should there be differences between the two regarding this question, please outline those.
There a two questions here, I think:
Can a compiler optimize a non-static const object to be effectively static so that it is only created once; and
Is it a reasonable expectation that a given compiler will do so.
I think the answer to the second question is "No", because I don't see the point of doing a huge amount of control flow analysis to save the programmer the trouble of typing the word static. However, I've often been surprised what optimizations people spend their time writing (as opposed to the optimizations which I think they should be working on :-) ). All the same, I would strongly recommend using the word static if that's what you wanted.
For the first question, there are circumstances under which the compiler could perform the optimization based on the "as-if" rule, but in very few cases would it work out.
First of all, if any object or subobject in the initializer has a non-trivial constructor/destructor, then the construction/destruction is visible, and this is not an example of copy elision. (This paragraph is C++ only, of course.)
The same would be true if any computation in the initializer list has visible side-effects.
And it should go without saying that if any subobject's value is not constant, the computation of that subobject would need to be done on each construction.
If the object and all subobjects are trivially copyable, all the initializer-list computations are constant, and the only construction cost is that of copying from a template into the object, then the compiler still couldn't perform the optimization if there is any chance that the addresses of more than one live instance of the object might be simultaneously visible. For example, if the function were recursive, and the object's address was used somewhere (hard to avoid for an array), then there would be the possibility that the addresses of two of these objects from different recursive invocations of the function might be compared. And they would have to compare unequal, since they are in fact separate objects. (And, now that I think of it, the function would not even need to be recursive in a multi-threaded environment.)
So the burden of proof for a compiler wishing to optimize that object into a single static instance is quite high. As I said, it may well be that a given compiler actually attempts to perform that task, but I definitely wouldn't expect it to.
The compiler would almost certainly do whatever is deemed most optimal, but most likely it will have it in read-only memory and turn your local variable into a pointer that points to the array in read-only memory. This assumes your array is equivalent to a POD type (or a class composed of POD types; if your class does something non-trivial and/or modifies other things, there is no way the compiler can fairly do this optimization).
Let's say I define a following C++ object:
class AClass
{
public:
AClass() : foo(0) {}
uint32_t getFoo() { return foo; }
void changeFoo() { foo = 5; }
private:
uint32_t foo;
} aObject;
The object is shared by two threads, T1 and T2. T1 is constantly calling getFoo() in a loop to obtain a number (which will be always 0 if changeFoo() was not called before). At some point, T2 calls changeFoo() to change it (without any thread synchronization).
Is there any practical chance that the values ever obtained by T1 will be different than 0 or 5 with modern computer architectures and compilers? All the assembler code I investigated so far was using 32-bit memory reads and writes, which seems to save the integrity of the operation.
What about other primitive types?
Practical means that you can give an example of an existing architecture or a standard-compliant compiler where this (or a similar situation with a different code) is theoretically possible. I leave the word modern a bit subjective.
Edit: I can see many people noticing that I should not expect 5 to be read ever. That is perfectly fine to me and I did not say I do (though thanks for pointing this aspect out). My question was more about what kind of data integrity violation can happen with the above code.
In practice, you will not see anything else than 0 or 5 as far as I know (maybe some weird 16 bits architecture with 32 bits int where this is not the case).
However whether you actually see 5 at all is not guaranteed.
Suppose I am the compiler.
I see:
while (aObject.getFoo() == 0) {
printf("Sleeping");
sleep(1);
}
I know that:
printf cannot change aObject
sleep cannot change aObject
getFoo does not change aObject (thanks inline definition)
And therefore I can safely transform the code:
while (true) {
printf("Sleeping");
sleep(1);
}
Because there is no-one else accessing aObject during this loop, according to the C++ Standard.
That is what undefined behavior means: blown up expectations.
In practice, all mainstream 32-bit architectures perform 32-bit reads and writes atomically. You'll never see anything other than 0 or 5.
Not sure what you're looking for. On most modern architectures, there is a very distinct possibility that getFoo() always returns 0, even after changeFoo has been called. With just about any decent compiler, it's almost guaranteed that getFoo(), will always return the same value, regardless of any calls to changeFoo, if it is called in a tight loop.
Of course, in any real program, there will be other reads and writes, which will be totally unsynchronized with regards to the changes in foo.
And finally, there are 16 bit processors, and there may also be a possibility with some compilers that the uint32_t isn't aligned, so that the accesses won't be atomic. (Of course, you're only changing bits in one of the bytes, so this might not be an issue.)
In practice (for those who did not read the question), any potential problem boils down to whether or not a store operation for an unsigned int is an atomic operation which, on most (if not all) machines you will likely write code for, it will be.
Note that this is not stated by the standard; it is specific to the architecture you are targeting. I cannot envision a scenario in which a calling thread will red anything other than 0 or 5.
As to the title... I am unaware of varying degrees of "undefined behavior". UB is UB, it is a binary state.
Is there any practical chance that the values ever obtained by T1 will be different than 0 or 5 with modern computer architectures and compilers? What about other primitive types?
Sure - there is no guarantee that the entire data will be written and read in an atomic manner. In practice, you may end up with a read which occurred during a partial write. What may be interrupted, and when that happens depends on several variables. So in practice, the results could easily vary as size and alignment of types vary. Naturally, that variance may also be introduced as your program moves from platform to platform and as ABIs change. Furthermore, observable results may vary as optimizations are added and other types/abstractions are introduced. A compiler is free to optimize away much of your program; perhaps completely, depending of the scope of the instance (yet another variable which is not considered in the OP).
Beyond optimizers, compilers, and hardware specific pipelines: The kernel can even affect the manner in which this memory region is handled. Does your program Guarantee where the memory of each object resides? Probably not. Your object's memory may exist on separate virtual memory pages -- what steps does your program take to ensure the memory is read and written in a consistent manner for all platforms/kernels? (none, apparently)
In short: If you cannot play by the rules defined by the abstract machine, you should not use the interface of said abstract machine (e.g. you should just understand and use assembly if the specification of C++'s abstract machine is truly inadequate for your needs -- highly improbable).
All the assembler code I investigated so far was using 32-bit memory reads and writes, which seems to save the integrity of the operation.
That's a very shallow definition of "integrity". All you have is (pseudo-)sequential consistency. As well, the compiler needs only to behave as if in such a scenario -- which is far from strict consistency. The shallow expectation means that even if the compiler actually made no breaking optimization and performed reads and writes in accordance with some ideal or intention, that the result would be practically useless -- your program would observe changes typically 'long' after its occurrence.
The subject remains irrelevant, given what specifically you can Guarantee.
Undefined behavior means that the compiler can do what ever he wants. He could basically change your program to do what ever he likes, e.g. order a pizza.
See, #Matthieu M. answer for a less sarcastic version than this one. I won't delete this as I think the comments are important for the discussion.
Undefined behavior is guaranteed to be as undefined as the word undefined.
Technically, the observable behavior is pointless because it is simply undefined behavior, the compiler is not needed to show you any particular behavior. It may work as you think it should or not or may burn your computer, anything and everything is possible.
I've looked at the standard but couldn't find any indication that simply writing to memory would be considered observable behaviour. If not, that would mean the compiled code need not actually write to that memory. If a compiler choose to optimize away such access anything involving mapper memory, or shared memory, may not work.
1.9-8 seems to defined a very limited observable behaviour but indicates an implementation may define more. Can one assume than any quality compiler would treat modifying memory as an observable behaviour? That is, it may not guarantee atomicity or ordering, but does guarantee that data will eventually be written.
So, have I overlooked something in the standard, or is the writing to memory merely something the compiler decides to do?
Statements from the current or C++0x standard are good. Please note I'm not talking about accessing memory through a function, I mean direct access, such as writing data to a pointer (perhaps retrieved via mmap or another library function).
This kind of thing is what volatile exists for. Else, writing to memory and never apparently reading from it is not observable behaviour. However, in the general case, it would be quite impossible for the optimizer to prove that you never read it back except in relatively trivial examples, so it's not usually an issue.
Can one assume than any quality compiler would treat modifying memory as an observable behaviour?
No. Volatile is meant for marking that. However, you cannot fully trust the compiler even after adding the volatile qualifier, at least as told by a 2008 paper: http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf
EDIT:
From C standard (not C++) http://c0x.coding-guidelines.com/5.1.2.3.html
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
My reading of C99 is that unless you specify volatile, how and when the variable is actually accessed is implementation defined. If you specify volatile qualifier then code must work according to the rules of an abstract machine.
Relevant parts in the standard are: 6.7.3 Type qualifiers (volatile description) and 5.1.2.3 Program execution (the abstract machine definition).
For some time now I know that many compilers actually have heuristics to detect cases when a variable should be reread again and when it is okay to use a cached copy. Volatile makes it clear to the compiler that every access to the variable should be actually an access to the memory. Without volatile it seems compiler is free to never reread the variable.
And BTW wrapping the access in a function doesn't change that since a function even without inline might be still inlined by the compiler within the current compilation unit.
From your question below:
Assume I use an array on the heap (unspecified where it is allocated),
and I use that array to perform a calculation (temp space). The
optimizer sees that it doesn't actually need any of that space as it
can use strictly registers. Does the compiler nonetheless write the
temp values to the memory?
Per MSalters below:
It's not guaranteed, and unlikely. Consider a a Static Single
Assignment optimizer. This figures out each possible write/read
dependency, and then assigns registers to optimize these dependencies.
As a side effect, any write that's not followed by a (possible) read
creates no dependencies at all, and is eliminated. In your example
("use strictly registers") the optimizer has satisfied all write/read
dependencies with registers, so it won't write to memory at all. All
reads produce the correct values, so it's a correct optimization.