How are non-POD static values initialized? [duplicate] - c++

This question already has answers here:
What is the lifetime of a static variable in a C++ function?
(5 answers)
Closed 8 years ago.
C++, unlike some other languages, allows static data to be of any arbitrary type, not just plain-old-data. Plain-old-data is trivial to initialize (the compiler just writes the value at the appropriate address in the data segment), but other, more complex types are not.
How is initialization of non-POD types typically implemented in C++? In particular, what exactly happens when the function foo is executed for the first time? What mechanisms are used to keep track of whether str has already been initialized or not?
#include <string>
void foo() {
static std::string str("Hello, Stack Overflow!");
}

C++11 requires the initialization of function-local static variables to be thread-safe. So, at least in compliant compilers, a guard variable is typically checked each time the function is entered, and a synchronization primitive is only involved while the variable is still uninitialized.
For example, here's the assembly listing for the code from this program:
#include <string>
void foo() {
static std::string str("Hello, Stack Overflow!");
}
int main() {}
.LC0:
.string "Hello, Stack Overflow!"
foo():
cmpb $0, guard variable for foo()::str(%rip)
je .L14
ret
.L14:
pushq %rbx
movl guard variable for foo()::str, %edi
subq $16, %rsp
call __cxa_guard_acquire
testl %eax, %eax
jne .L15
.L1:
addq $16, %rsp
popq %rbx
ret
.L15:
leaq 15(%rsp), %rdx
movl $.LC0, %esi
movl foo()::str, %edi
call std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
movl guard variable for foo()::str, %edi
call __cxa_guard_release
movl $__dso_handle, %edx
movl foo()::str, %esi
movl std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string(), %edi
call __cxa_atexit
jmp .L1
movq %rax, %rbx
movl guard variable for foo()::str, %edi
call __cxa_guard_abort
movq %rbx, %rdi
call _Unwind_Resume
main:
xorl %eax, %eax
ret
The __cxa_guard_acquire, __cxa_guard_release etc. are guarding initialization of the static variable.
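The effect those guard calls guarantee can be observed without reading assembly. Below is a minimal sketch, assuming a C++11-or-later compiler with thread support; Noisy, construction_count and construct_from_threads are hypothetical names used only for illustration, and foo here returns the object so it can be inspected. It shows the guarantee, not the guard functions themselves.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Counts how many times the constructor below actually runs.
std::atomic<int> construction_count{0};

struct Noisy {
    std::string text;
    explicit Noisy(const char* s) : text(s) { ++construction_count; }
};

// Function-local static: since C++11 the compiler guards its initialization
// (with GCC/Clang on the Itanium C++ ABI, via __cxa_guard_acquire and
// __cxa_guard_release), so the constructor runs exactly once even when
// several threads call foo() at the same time.
const Noisy& foo() {
    static Noisy str("Hello, Stack Overflow!");
    return str;
}

// Calls foo() from several threads at once and reports how many times
// the static object was constructed.
int construct_from_threads(int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([] { foo(); });
    for (auto& t : threads)
        t.join();
    return construction_count.load();
}
```

Whatever the thread count, the constructor should run once.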

The implementation that I've seen uses a hidden boolean variable to check whether the variable is initialized. Modern compilers do this thread-safely, but IIRC some older compilers did not, and if the function was called from several threads at the same time the constructor could be called twice.
Something along the lines of:
static bool __str_initialized = false;
static char __mem_for_str[...]; //std::string str("Hello, Stack Overflow!");
void foo() {
if (!__str_initialized)
{
lock();
if (!__str_initialized) // re-check: another thread may have initialized it meanwhile
{
new (__mem_for_str) std::string("Hello, Stack Overflow!");
__str_initialized = true; // set only after the constructor succeeded
}
unlock();
}
}
Then, in the finalization code of the program:
if (__str_initialized)
((std::string&)__mem_for_str).~basic_string();
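The hidden-flag scheme can be written out by hand to see all three pieces: the flag, the raw storage with placement new, and the destructor registered for program exit. This is a minimal single-threaded sketch, assuming the names str_initialized, str_storage, str_ptr and destroy_str are hypothetical stand-ins for what a compiler generates internally; it has no lock, so it is not thread-safe.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>

// Hand-written stand-in for:  static std::string str("Hello, Stack Overflow!");
// Single-threaded only: there is no lock.
namespace {
bool str_initialized = false;                                         // the hidden flag
alignas(std::string) unsigned char str_storage[sizeof(std::string)];  // raw static storage
std::string* str_ptr = nullptr;  // pointer returned by placement new

// Registered with std::atexit so the destructor runs at program exit.
void destroy_str() {
    str_ptr->~basic_string();
}
}  // namespace

std::string& foo() {
    if (!str_initialized) {
        str_ptr = ::new (static_cast<void*>(str_storage)) std::string("Hello, Stack Overflow!");
        str_initialized = true;  // set only after the constructor returned
        std::atexit(destroy_str);
    }
    return *str_ptr;
}
```

Every call after the first skips straight to the return.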

It's implementation specific.
Typically, there'll be a flag (statically initialised to zero) to indicate whether it's initialised, and (in C++11, or earlier thread-safe implementations) some kind of mutex, also statically initialisable, to protect against multiple threads trying to initialise it.
The generated code would typically behave something along the lines of
static __atomic_flag_type __initialised = false;
static __mutex_type __mutex = __MUTEX_INITIALISER;
if (!__initialised) {
__lock_type __lock(__mutex);
if (!__initialised) {
__initialise(str);
__initialised = true;
}
}
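The double-checked pattern above can be sketched with standard facilities, assuming std::atomic<bool> and std::mutex in place of the hypothetical __atomic_flag_type and __mutex_type. get_str, str_ptr and init_calls are illustrative names; real compilers use their own runtime helpers rather than this exact code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

namespace {
std::atomic<bool> initialised{false};
std::mutex init_mutex;
std::string* str_ptr = nullptr;  // never freed: destruction at exit is left out of this sketch
int init_calls = 0;              // how often the slow path actually constructed the string
}  // namespace

std::string& get_str() {
    // Fast path: a single acquire load, no locking once initialised.
    if (!initialised.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(init_mutex);
        // Re-check under the lock: another thread may have won the race.
        if (!initialised.load(std::memory_order_relaxed)) {
            str_ptr = new std::string("Hello, Stack Overflow!");
            ++init_calls;
            // Release store publishes the fully constructed object.
            initialised.store(true, std::memory_order_release);
        }
    }
    return *str_ptr;
}
```

The acquire/release pair is what makes the unlocked fast path safe: a thread that sees the flag set is guaranteed to also see the constructed string.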

You can check what your compiler does by generating an assembler listing.
MSVC2008 in debug mode generates this code (excluding exception handling prolog/epilog etc):
mov eax, DWORD PTR ?$S1@?1??foo@@YA_NXZ@4IA
and eax, 1
jne SHORT $LN1@foo
mov eax, DWORD PTR ?$S1@?1??foo@@YA_NXZ@4IA
or eax, 1
mov DWORD PTR ?$S1@?1??foo@@YA_NXZ@4IA, eax
mov DWORD PTR __$EHRec$[ebp+8], 0
mov esi, esp
push OFFSET ??_C@_0BH@ENJCLPMJ@Hello?0?5Stack?5Overflow?$CB?$AA@
mov ecx, OFFSET ?str@?1??foo@@YA_NXZ@4V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A
call DWORD PTR __imp_??0?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@QAE@PBD@Z
cmp esi, esp
call __RTC_CheckEsp
push OFFSET ??__Fstr@?1??foo@@YA_NXZ@YAXXZ ; `foo'::`2'::`dynamic atexit destructor for 'str''
call _atexit
add esp, 4
mov DWORD PTR __$EHRec$[ebp+8], -1
$LN1@foo:
That is, there is a static flag variable, ?$S1@?1??foo@@YA_NXZ@4IA, whose lowest bit is tested. If flag & 1 is nonzero, the code branches to the label $LN1@foo and skips the initialization. Otherwise it ORs 1 into the flag, constructs the string at a fixed location, and registers a call to its destructor at program exit using atexit. Then the function continues as normal. Note that nothing here takes a lock, so this code is not thread-safe.

Related

Casting from double always returns zero

Why does the following code return zero when compiled with clang?
#include <stdint.h>
uint64_t square() {
return ((uint32_t)((double)4294967296.0)) - 10;
}
The assembly it produces is:
push rbp
mov rbp, rsp
xor eax, eax
pop rbp
ret
I would have expected that double to become zero (integer) and that the minus would wrap it around. In other words, why doesn't it matter what number is subtracted, given that it always produces zero? Note that gcc does produce different numbers, as expected:
push rbp
mov rbp, rsp
mov eax, 4294967285
pop rbp
ret
I assume casting 4294967296.0 to uint32_t is undefined behaviour, but even then I would expect it to produce different results for different subtrahends.
The behaviour of casting an out-of-range double to an unsigned type is indeed undefined.
Wrap-around does not apply, even for an unsigned type.
This is an oft-forgotten rule.
Once program control reaches an undefined construct, the behaviour of the entire program is undefined, even, somewhat paradoxically, that of statements that have already run.
Reference: https://timsong-cpp.github.io/cppwp/conv.fpint#1
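If wrap-around is what you actually want, it has to be requested explicitly. A minimal sketch, assuming C++17 for std::optional and a platform where uint32_t is unsigned int (true on all mainstream compilers); to_uint32_checked and square_wrapping are hypothetical names. The first function range-checks before converting; the second converts to a wider unsigned type first, where 2^32 fits, and then narrows, which for unsigned types is defined to reduce modulo 2^32.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// double -> uint32_t is only defined if the truncated value fits in
// [0, 2^32 - 1]. Truncation toward zero keeps anything in (-1, 2^32) in range.
// NaN fails both comparisons and is rejected as well.
std::optional<std::uint32_t> to_uint32_checked(double d) {
    if (!(d > -1.0 && d < 4294967296.0))
        return std::nullopt;
    return static_cast<std::uint32_t>(d);
}

// The wrap-around the question expected: 2^32 fits in uint64_t, and
// narrowing an unsigned value to uint32_t is reduction modulo 2^32.
std::uint64_t square_wrapping() {
    std::uint32_t wrapped = static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(4294967296.0));  // becomes 0
    return static_cast<std::uint32_t>(wrapped - 10u);  // 0 - 10 wraps to 4294967286
}
```

Either way, the result no longer depends on what the optimizer decides to do with undefined behaviour.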
g++ 5.4.0 gives 4294967285 (-11 unsigned), so it is up to the compiler what it wants to do with undefined behavior.
__Z6squarev:
LFB1023:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
movl $-11, %eax
movl $0, %edx
popl %ebp
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc

Are arguments loaded into the cache for empty functions?

I know that C++ compilers optimize empty (static) functions.
Based on that knowledge I wrote a piece of code that should get optimized away whenever some identifier is defined (using the -D option of the compiler).
Consider the following dummy example:
#include <iostream>
#ifdef NO_INC
struct T {
static inline void inc(int& v, int i) {}
};
#else
struct T {
static inline void inc(int& v, int i) {
v += i;
}
};
#endif
int main(int argc, char* argv[]) {
int a = 42;
for (int i = 0; i < argc; ++i)
T::inc(a, i);
std::cout << a;
}
The desired behavior would be the following:
Whenever the NO_INC identifier is defined (using -DNO_INC when compiling), all calls to T::inc(...) should be optimized away (due to the empty function body). Otherwise, the call to T::inc(...) should trigger an increment by some given value i.
I got two questions regarding this:
Is my assumption correct that calls to T::inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized?
I wonder if the variables (a and i) are still loaded into the cache when T::inc(a, i) is called (assuming they are not there yet) although the function body is empty.
Thanks for any advice!
Compiler Explorer is a very useful tool for looking at the assembly of your generated program, because there is no other reliable way to tell whether the compiler optimized something away. Demo.
With the increment enabled, your main looks like:
main: # @main
push rax
test edi, edi
jle .LBB0_1
lea eax, [rdi - 1]
lea ecx, [rdi - 2]
imul rcx, rax
shr rcx
lea esi, [rcx + rdi]
add esi, 41
jmp .LBB0_3
.LBB0_1:
mov esi, 42
.LBB0_3:
mov edi, offset std::cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
xor eax, eax
pop rcx
ret
As you can see, the compiler completely inlined the call to T::inc; it even replaced the loop with a closed-form computation of 42 + argc*(argc-1)/2.
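The closed form can be checked against the loop directly. In the listing above, with n = argc > 0 the code computes (n - 1) * (n - 2) / 2 + n + 41, which simplifies to 42 + n * (n - 1) / 2. The sketch below (hypothetical function names, the loop inlined by hand) simply confirms the two agree; it does not reproduce clang's exact instruction sequence.

```cpp
#include <cassert>

// The loop from the question, with T::inc inlined by hand.
int loop_version(int argc) {
    int a = 42;
    for (int i = 0; i < argc; ++i)
        a += i;
    return a;
}

// What the optimized listing computes: the .LBB0_1 path returns 42 for
// argc <= 0, otherwise (n - 1) * (n - 2) / 2 + n + 41.
int closed_form_version(int argc) {
    if (argc <= 0)
        return 42;
    long long n = argc;
    return static_cast<int>((n - 1) * (n - 2) / 2 + n + 41);
}
```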
For an empty T::inc you get:
main: # @main
push rax
mov edi, offset std::cout
mov esi, 42
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
xor eax, eax
pop rcx
ret
The compiler optimized away the entire loop!
Is my assumption correct that calls to t.inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized?
Yes.
If my assumption holds, does it also hold for more complex function bodies (in the #else branch)?
No, for some definition of "complex". Compilers use heuristics to decide whether inlining a function is worth it, and they base their decision on that and nothing else.
I wonder if the variables (a and i) are still loaded into the cache when t.inc(a, i) is called (assuming they are not there yet) although the function body is empty.
No, as demonstrated above, the loop doesn't even exist.
Is my assumption correct that calls to t.inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized? If my assumption holds, does it also hold for more complex function bodies (in the #else branch)?
You are right. I have modified your example (i.e. removed cout which clutters the assembly) in compiler explorer to make it more obvious what happens.
The compiler optimizes everything away and outputs
main: # @main
movl $42, %eax
retq
Only 42 is loaded into eax and returned.
For the more complex case, however, more instructions are needed to compute the return value. See here
main: # @main
testl %edi, %edi
jle .LBB0_1
leal -1(%rdi), %eax
leal -2(%rdi), %ecx
imulq %rax, %rcx
shrq %rcx
leal (%rcx,%rdi), %eax
addl $41, %eax
retq
.LBB0_1:
movl $42, %eax
retq
I wonder if the variables (a and i) are still loaded into the cache when t.inc(a, i) is called (assuming they are not there yet) although the function body is empty.
They are only loaded when the compiler cannot prove that they are unused. See the second example on Compiler Explorer.
By the way: you do not need to create an instance of T (i.e. T t;) in order to call a static member function. That defeats the purpose. Call it as T::inc(...) rather than t.inc(...).
Because the inline keyword is used, you can safely assume (1): using these functions shouldn't negatively affect performance.
Running your code through
g++ -c -Os -g
objdump -S
confirms this. An extract:
int main(int argc, char* argv[]) {
T t;
int a = 42;
1020: b8 2a 00 00 00 mov $0x2a,%eax
for (int i = 0; i < argc; ++i)
1025: 31 d2 xor %edx,%edx
1027: 39 fa cmp %edi,%edx
1029: 7d 06 jge 1031 <main+0x11>
v += i;
102b: 01 d0 add %edx,%eax
for (int i = 0; i < argc; ++i)
102d: ff c2 inc %edx
102f: eb f6 jmp 1027 <main+0x7>
t.inc(a, i);
return a;
}
1031: c3 retq
(I replaced the cout with return for better readability)

Inline function call from inline function?

This sounds too much of a simple question to not be answered already somewhere, but I tried to look around and I couldn't find any simple answer. Take the following example:
struct vec
{
double x;
double y;
};
inline void sum_x(vec & result, vec & a, vec & b)
{
result.x = a.x + b.x;
}
inline void sum(vec & result, vec & a, vec & b)
{
sum_x(result, a, b);
result.y = a.y + b.y;
}
What happens when I call sum and compile? Will both sum and sum_x be inlined, so that the call just turns into inline code that sums the two components?
This looks like a trivial example, but I am working with a vector class that has the dimensionality defined in a template, so iterating over operations on vectors looks a bit like this.
inline is just a hint to the compiler. Whether the compiler actually inlines the function is a different question. For GCC there is an always_inline attribute to force this:
__attribute__((always_inline));
With forced inlining you should get what you described (code generated as if it were written in one function).
However, with all the optimizations and transformations compilers apply, you can only be sure by checking the generated code (assembly).
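Here is how the attribute is typically spelled on the question's functions. A minimal sketch, assuming GCC or Clang (always_inline is a compiler extension, not standard C++; MSVC's rough counterpart is __forceinline), with vec written as a struct so the free functions can access its members and the inputs taken by const reference. Whether the calls really disappear should still be confirmed in the assembly.

```cpp
#include <cassert>

struct vec {
    double x;
    double y;
};

// always_inline asks GCC/Clang to inline the function at every call site
// regardless of the usual inlining heuristics; it is normally combined with
// the inline keyword.
inline __attribute__((always_inline)) void sum_x(vec& result, const vec& a, const vec& b) {
    result.x = a.x + b.x;
}

inline __attribute__((always_inline)) void sum(vec& result, const vec& a, const vec& b) {
    sum_x(result, a, b);  // inlined into sum, which is itself inlined at its callers
    result.y = a.y + b.y;
}
```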
Yes, inlining may be applied recursively.
The entire set of operations that you're performing here can be inlined at the call site.
Note that this has very little to do with your use of the inline keyword, which (other than its effect on the ODR — which can be very noticeable) is just a hint and nowadays mostly ignored for purposes of actually inlining. The functions will be inlined because your clever compiler can see that they are good candidates for it.
The only way you can actually tell whether it's doing this is to inspect the resulting assembly yourself.
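For the templated-dimension case the question mentions, the per-component calls are usually written as a loop over a compile-time bound. A minimal sketch, assuming a hypothetical vec_n<N> backed by std::array: because N is a constant, an optimizing compiler is free to inline add() and unroll the loop into straight-line additions, but that remains its decision, so the assembly is still the place to check.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical N-dimensional vector with the dimension as a template parameter.
template <std::size_t N>
struct vec_n {
    std::array<double, N> c{};
};

// Component-wise sum; N is known at compile time, so the loop is a
// candidate for full unrolling after inlining.
template <std::size_t N>
inline void add(vec_n<N>& result, const vec_n<N>& a, const vec_n<N>& b) {
    for (std::size_t i = 0; i < N; ++i)
        result.c[i] = a.c[i] + b.c[i];
}
```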
It depends. inline is just a hint to the compiler that it might want to think about inlining that function. It's entirely possible for a compiler to inline both calls, but that's up to the implementation.
As an example, here's some prettified assembly output from GCC for this simple program, with and without those inline keywords:
#include <iostream>
int main()
{
vec a;
vec b;
std::cin >> a.x;
std::cin >> a.y;
sum(b,a,a);
std::cout << b.x << b.y;
return 0;
}
With inlining:
main:
subq $40, %rsp
leaq 16(%rsp), %rsi
movl std::cin, %edi
call std::basic_istream<char, std::char_traits<char> >& std::basic_istream<char, std::char_traits<char> >::_M_extract<double>(double&)
leaq 24(%rsp), %rsi
movl std::cin, %edi
call std::basic_istream<char, std::char_traits<char> >& std::basic_istream<char, std::char_traits<char> >::_M_extract<double>(double&)
movsd 24(%rsp), %xmm0
movapd %xmm0, %xmm1
addsd %xmm0, %xmm1
movsd %xmm1, 8(%rsp)
movsd 16(%rsp), %xmm0
addsd %xmm0, %xmm0
movl std::cout, %edi
call std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
movsd 8(%rsp), %xmm0
movq %rax, %rdi
call std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
movl $0, %eax
addq $40, %rsp
ret
subq $8, %rsp
movl std::__ioinit, %edi
call std::ios_base::Init::Init()
movl $__dso_handle, %edx
movl std::__ioinit, %esi
movl std::ios_base::Init::~Init(), %edi
call __cxa_atexit
addq $8, %rsp
ret
Without:
sum_x(vec&, vec&, vec&):
movsd (%rsi), %xmm0
addsd (%rdx), %xmm0
movsd %xmm0, (%rdi)
ret
sum(vec&, vec&, vec&):
movsd (%rsi), %xmm0
addsd (%rdx), %xmm0
movsd %xmm0, (%rdi)
movsd 8(%rsi), %xmm0
addsd 8(%rdx), %xmm0
movsd %xmm0, 8(%rdi)
ret
main:
pushq %rbx
subq $48, %rsp
leaq 32(%rsp), %rsi
movl std::cin, %edi
call std::basic_istream<char, std::char_traits<char> >& std::basic_istream<char, std::char_traits<char> >::_M_extract<double>(double&)
leaq 40(%rsp), %rsi
movl std::cin, %edi
call std::basic_istream<char, std::char_traits<char> >& std::basic_istream<char, std::char_traits<char> >::_M_extract<double>(double&)
leaq 32(%rsp), %rdx
movq %rdx, %rsi
leaq 16(%rsp), %rdi
call sum(vec&, vec&, vec&)
movq 24(%rsp), %rbx
movsd 16(%rsp), %xmm0
movl std::cout, %edi
call std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
movq %rbx, 8(%rsp)
movsd 8(%rsp), %xmm0
movq %rax, %rdi
call std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)
movl $0, %eax
addq $48, %rsp
popq %rbx
ret
subq $8, %rsp
movl std::__ioinit, %edi
call std::ios_base::Init::Init()
movl $__dso_handle, %edx
movl std::__ioinit, %esi
movl std::ios_base::Init::~Init(), %edi
call __cxa_atexit
addq $8, %rsp
ret
As you can see, GCC inlined both functions when asked to.
If your assembly is a bit rusty, simply note that sum is present and called in the second version, but not in the first.
As mentioned, the inline keyword is just a hint. However, compilers do an amazing job here (even without your hints), and they do inline recursively.
If you're really interested in this stuff, I recommend learning a bit about compiler design. I've been studying it recently and it blew my mind what complex beasts our production-quality compilers are today.
About inlining, this is one of the things that compilers tend to do an extremely good job at. It was so by necessity, since if you look at how we write code in C++, we often write accessor functions (methods) just to do nothing more than return the value of a single variable. C++'s popularity hinged in large part on the idea that we can write this kind of code utilizing concepts like information hiding without being forced into creating software that is slower than its C-like equivalent, so you often found optimizers as early as the 90s doing a really good job at inlining (and recursively).
For this next part, it's somewhat speculative as I'm somewhat assuming that what I've been reading and studying about compiler design is applicable towards the production-quality compilers we're using today. Who knows exactly what kind of advanced tricks they're all applying?
... but I believe compilers typically inline code before you get to the kind of machine code level. This is because one of the keys to an optimizer is efficient instruction selection and register allocation. To do that, it needs to know all the memory (variables) the code is going to be working with inside a procedure. It wants that in a form that is somewhat abstract where specific registers haven't been chosen yet but are ready to be assigned. So inlining is usually done at this intermediate representation stage, before you get to the kind of assembly realm of specific machine instructions and registers, so that the compiler can gather up all that information before it does its magical optimizations. It might even apply some heuristics here to kind of 'try' inlining or unrolling away branches of code prior to actually doing it.
A lot of linkers can even inline code, and I'm not sure how that works. I think when they can do that, the object code is actually still in an intermediate representation form, still somewhat abstracted away from specific machine-level instructions and registers. Then the linker can still move that code between object files and inline it, deferring that code generation/optimization process until after.

how do static variables inside functions work?

In the following code:
int count(){
static int n(5);
n = n + 1;
return n;
}
the variable n is instantiated only once at the first call to the function.
There should be a flag or something so that it initializes the variable only once. I tried to look at the generated assembly code from gcc, but couldn't find any clue.
How does the compiler handle this?
This is, of course, compiler-specific.
The reason you didn't see any checks in the generated assembly is that, since n is an int variable, g++ simply treats it as a global variable pre-initialized to 5.
Let's see what happens if we do the same with a std::string:
#include <string>
void count() {
static std::string str;
str += ' ';
}
The generated assembly goes like this:
_Z5countv:
.LFB544:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
.cfi_lsda 0x3,.LLSDA544
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
pushq %r13
pushq %r12
pushq %rbx
subq $8, %rsp
movl $_ZGVZ5countvE3str, %eax
movzbl (%rax), %eax
testb %al, %al
jne .L2 ; <======= bypass initialization
.cfi_offset 3, -40
.cfi_offset 12, -32
.cfi_offset 13, -24
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_acquire ; acquire the lock
testl %eax, %eax
setne %al
testb %al, %al
je .L2 ; check again
movl $0, %ebx
movl $_ZZ5countvE3str, %edi
.LEHB0:
call _ZNSsC1Ev ; call the constructor
.LEHE0:
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_release ; release the lock
movl $_ZNSsD1Ev, %eax
movl $__dso_handle, %edx
movl $_ZZ5countvE3str, %esi
movq %rax, %rdi
call __cxa_atexit ; schedule the destructor to be called at exit
jmp .L2
.L7:
.L3:
movl %edx, %r12d
movq %rax, %r13
testb %bl, %bl
jne .L5
.L4:
movl $_ZGVZ5countvE3str, %edi
call __cxa_guard_abort
.L5:
movq %r13, %rax
movslq %r12d,%rdx
movq %rax, %rdi
.LEHB1:
call _Unwind_Resume
.L2:
movl $32, %esi
movl $_ZZ5countvE3str, %edi
call _ZNSspLEc
.LEHE1:
addq $8, %rsp
popq %rbx
popq %r12
popq %r13
leave
ret
.cfi_endproc
The line I've marked with the bypass initialization comment is the conditional jump instruction that skips the construction if the guard variable (_ZGVZ5countvE3str) shows that the object has already been constructed.
This is entirely up to the implementation; the language standard says nothing about that.
In practice, the compiler will usually include a hidden flag variable somewhere that indicates whether the static variable has already been instantiated or not. The static variable and the flag will probably be in the static storage area of the program (e.g. the data segment, not the stack segment), not in the function scope memory, so you may have to look around in the assembly. (The variable can't go on the call stack, for obvious reasons, so it's really like a global variable. "static allocation" really covers all sorts of static variables!)
Update: As @aix points out, if the static variable is initialized to a constant expression, you may not even need a flag, because the initialization can be performed at load time rather than at the first function call. In C++11 you should be able to take advantage of that better than in C++03 thanks to the wider availability of constant expressions.
It's quite likely that gcc handles this variable just like an ordinary global variable. That means it will be initialized statically, with the value stored directly in the binary.
This is possible because you initialize it with a constant. If you initialized it with, e.g., the return value of another function, the compiler would add a flag and skip the initialization based on that flag.
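The difference between the two cases can be made visible in code. A minimal sketch: count() is the question's function, whose constant initializer lets the value live in the binary; count_dynamic and make_start are hypothetical names for a variant whose initializer has a side effect, so it must run at the first call and therefore needs the guard flag described above.

```cpp
#include <cassert>

// Constant initializer: typically placed straight in the data section,
// with no guard check in the function body.
int count() {
    static int n(5);
    n = n + 1;
    return n;
}

// Hypothetical initializer with a side effect, so it cannot be folded
// into the binary and is run on the first call only.
int init_calls = 0;
int make_start() {
    ++init_calls;
    return 5;
}

int count_dynamic() {
    static int n = make_start();  // guarded: executed once
    n = n + 1;
    return n;
}
```

Generating assembly for both functions should show a guard check in count_dynamic but not in count.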

temporary variables and performance in c++ [duplicate]

This question already has answers here:
Do temp variables slow down my program?
(5 answers)
Closed 5 years ago.
Let's say we have two functions:
int f();
int g();
I want to get the sum of f() and g().
First way:
int fRes = f();
int gRes = g();
int sum = fRes + gRes;
Second way:
int sum = f() + g();
Will be there any difference in performance in this two cases?
Same question for complex types instead of ints
EDIT
Do I understand correctly that I should not worry about performance in such cases (even for frequently performed tasks), and should use temporary variables to increase readability and simplify the code?
You can answer questions like this for yourself by compiling to assembly language (with optimization on, of course) and inspecting the output. If I flesh your example out to a complete, compilable program...
extern int f();
extern int g();
int direct()
{
return f() + g();
}
int indirect()
{
int F = f();
int G = g();
return F + G;
}
and compile it (g++ -S -O2 -fomit-frame-pointer -fno-exceptions test.cc; the last two switches eliminate a bunch of distractions from the output), I get this (further distractions deleted):
__Z8indirectv:
pushq %rbx
call __Z1fv
movl %eax, %ebx
call __Z1gv
addl %ebx, %eax
popq %rbx
ret
__Z6directv:
pushq %rbx
call __Z1fv
movl %eax, %ebx
call __Z1gv
addl %ebx, %eax
popq %rbx
ret
As you can see, the code generated for both functions is identical, so the answer to your question is no, there will be no performance difference. Now let's look at complex numbers -- same code, but s/int/std::complex<double>/g throughout and #include <complex> at the top; same compilation switches --
__Z8indirectv:
subq $72, %rsp
call __Z1fv
movsd %xmm0, (%rsp)
movsd %xmm1, 8(%rsp)
movq (%rsp), %rax
movq %rax, 48(%rsp)
movq 8(%rsp), %rax
movq %rax, 56(%rsp)
call __Z1gv
movsd %xmm0, (%rsp)
movsd %xmm1, 8(%rsp)
movq (%rsp), %rax
movq %rax, 32(%rsp)
movq 8(%rsp), %rax
movq %rax, 40(%rsp)
movsd 48(%rsp), %xmm0
addsd 32(%rsp), %xmm0
movsd 56(%rsp), %xmm1
addsd 40(%rsp), %xmm1
addq $72, %rsp
ret
__Z6directv:
subq $72, %rsp
call __Z1gv
movsd %xmm0, (%rsp)
movsd %xmm1, 8(%rsp)
movq (%rsp), %rax
movq %rax, 32(%rsp)
movq 8(%rsp), %rax
movq %rax, 40(%rsp)
call __Z1fv
movsd %xmm0, (%rsp)
movsd %xmm1, 8(%rsp)
movq (%rsp), %rax
movq %rax, 48(%rsp)
movq 8(%rsp), %rax
movq %rax, 56(%rsp)
movsd 48(%rsp), %xmm0
addsd 32(%rsp), %xmm0
movsd 56(%rsp), %xmm1
addsd 40(%rsp), %xmm1
addq $72, %rsp
ret
That's a lot more instructions and the compiler isn't doing a perfect optimization job, it looks like, but nonetheless the code generated for both functions is identical.
I think in the second way the result is assigned to a temporary anyway when the function returns a value. However, it becomes somewhat significant when you need to use the values from f() and g() more than once, in which case storing them in variables instead of recalculating them each time can help.
If you have optimization turned off, there likely will be. If you have it turned on, they will likely result in identical code. This is especially true if you label fRes and gRes as const.
Because the compiler is allowed to elide the copy constructor call, the two versions will not differ in performance for complex types either.
Someone mentioned using the results of f() and g() more than once. In that case, calling f() or g() again instead of reusing fRes and gRes is obviously potentially less optimal.
As you wrote it, there's only a subtle difference (addressed in another answer: there's a sequence point in one version but not in the other).
They would be different if you had done this instead:
int fRes;
int gRes;
fRes = f();
gRes = g();
int sum = fRes + gRes;
(Imagining that int as actually some other type with a non-trivial default constructor.)
In the case here, you invoke default constructors and then assignment operators, which is potentially more work.
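The extra work can be counted with a type that records its own construction. A minimal sketch, assuming a hypothetical Tracked type standing in for "some other type with a non-trivial default constructor", and hypothetical f, g, sum_assign and sum_init functions. Declaring first and assigning later costs two default constructions and two assignments; initializing directly costs neither.

```cpp
#include <cassert>

// Hypothetical type that counts default constructions and assignments.
struct Tracked {
    static int default_ctors;
    static int assignments;
    int value;
    Tracked() : value(0) { ++default_ctors; }
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked&) = default;
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++assignments;
        return *this;
    }
};
int Tracked::default_ctors = 0;
int Tracked::assignments = 0;

Tracked f() { return Tracked(1); }
Tracked g() { return Tracked(2); }

// Declared first, assigned later: default constructor + assignment for each.
int sum_assign() {
    Tracked fRes;
    Tracked gRes;
    fRes = f();
    gRes = g();
    return fRes.value + gRes.value;
}

// Initialized directly from the calls: no default construction, no assignment.
int sum_init() {
    Tracked fRes = f();
    Tracked gRes = g();
    return fRes.value + gRes.value;
}
```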
It depends entirely on what optimizations the compiler performs. The two could compile to slightly different or exactly the same machine code. Even if the code differs slightly, you couldn't measure a statistically significant difference in time or space costs for these particular samples.
On my platform with full optimization turned on, a function returning the sum from both different cases compiled to exactly the same machine code.
The only minor difference between the two examples is that the first guarantees the order in which f() and g() are called, so in theory the second allows the compiler slightly more flexibility. Whether this ever makes a difference would depend on what f() and g() actually do and, perhaps, whether they can be inlined.
There is a slight difference between the two examples. In expression f() + g() there is no sequence point, whereas when the calls are made in different statements there are sequence points at the end of each statement.
The absence of a sequence point means the order these two functions are called is unspecified, they can be called in any order, which might help the compiler optimize it.
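The ordering difference can be observed by having the functions log their calls. A minimal sketch with hypothetical f and g that append to a global string: the separate-statement version always logs "fg", while for f() + g() the standard allows either order, so only "fg" or "gf" can be relied on.

```cpp
#include <cassert>
#include <string>

// Hypothetical f and g that record the order in which they are called.
std::string call_log;
int f() { call_log += 'f'; return 1; }
int g() { call_log += 'g'; return 2; }

// Order of the two calls is unspecified: the log may read "fg" or "gf".
int direct() { return f() + g(); }

// Separate statements: f() is always called before g().
int indirect() {
    int F = f();
    int G = g();
    return F + G;
}
```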