C++: Pass Vector struct by reference or value?

This is my Vector struct: struct Vector{ float x, y; };
Should I pass it to functions by value or as const Vector&?

If you pass by value, the function will get a copy that it can modify locally without affecting the caller.
If you pass by const-reference, the function will only get a read-only reference. No copying involved, but the called function cannot modify it locally.
Given the size of the struct, the copy overhead will be very small. So choose what is best/easiest for the called function.
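To make the two options concrete, here is a minimal sketch (the function names are mine, just for illustration):

struct Vector{ float x, y; };

// By value: the function owns a copy and may scribble on it freely.
float LengthSquaredByValue(Vector v) {
    v.x *= v.x;   // modifying the local copy does not affect the caller
    v.y *= v.y;
    return v.x + v.y;
}

// By const reference: no copy is made, but access is read-only.
float LengthSquaredByRef(const Vector& v) {
    return v.x * v.x + v.y * v.y;
}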

For such a small structure, either one may be the most efficient, depending on your platform and compiler.
(The name may be a bit confusing to other programmers, since vector in C++ usually means "dynamic array". How about Vector2D?)

It really depends on your platform and compiler, and on whether the function is inlined or not.
When passed by reference, the structure is not copied; only its address is put on the stack (or in a register), not the content. When passed by value, the content is copied. On a 64-bit platform the size of the struct is the same as the size of a pointer to it (assuming 64-bit pointers, which is the more common situation). So the gain from passing by reference is not really clear here.
However, there is another thing to consider. Your structure contains float values. On Intel architectures, these can live in FPU or SIMD registers before the call to the function. In that situation, if the function takes the parameter by reference, the values have to be spilled to memory, and the address of that memory passed to the function. This can be really slow. If they are passed by value instead, no copy to memory is needed (faster). And on some platforms (the PS3, for example), the compiler will not be smart enough to remove that spilling, even for an inline function.
In fact, like every micro-optimisation question, there is no single "good answer": it all depends on how you use the function and on what your compiler and platform do. The best approach is to measure (or use a tool to analyse the assembly) to check what is best for your platform/compiler combination.
I'm going to finish by quoting Jaymin Kessler from Q-Games, who is much more versed in those topics than I can ever be:
2) If a type fits in a register, pass it by value. DO NOT PASS VECTOR TYPES BY REFERENCE, ESPECIALLY CONST REFERENCE. If the function ends up getting inlined, GCC occasionally will go to memory when it hits the reference. I'll say it again: If the type you are using fits in registers (float, int, or vector) do not pass it to the function by anything but value. In the case of non-sane compilers like Visual Studio for x86, it can't maintain the alignment of objects on the stack, and therefore objects that have align directives must be passed to functions by reference. This may be fixed on the Xbox 360. If you are multiplatform, the best thing you can do is make a parameter passing typedef to avoid having to cater to the lowest common denominator.
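For illustration, the "parameter passing typedef" idea from that quote might look something like the following sketch (the macro name is made up; a real project would key it off its own build configuration):

struct Vector{ float x, y; };

#if defined(PLATFORM_PASSES_VECTORS_IN_REGISTERS)  // hypothetical switch
typedef Vector VectorParam;        // pass by value where registers are used
#else
typedef const Vector& VectorParam; // pass by const reference elsewhere
#endif

Vector Add(VectorParam a, VectorParam b); // one signature on every platform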
Consider the following code:
struct Vector { float x, y; };

extern Vector DoSomething1(Vector v);
extern Vector DoSomething2(const Vector& v);

void Test1()
{
    Vector v0 = { 1.0f, 2.0f };
    Vector v1 = DoSomething1(v0);
}

void Test2()
{
    Vector v0 = { 1.0f, 2.0f };
    Vector v1 = DoSomething2(v0);
}
From the point of view of the code, the only difference between Test1 and Test2 is the calling convention used by DoSomething1 and DoSomething2 to receive the Vector struct. When compiled with g++ (version 4.2, architecture x86_64), the generated code is:
.globl __Z5Test1v
__Z5Test1v:
LFB2:
movabsq $4611686019492741120, %rax
movd %rax, %xmm0
jmp __Z12DoSomething16Vector
LFE2:
.globl __Z5Test2v
__Z5Test2v:
LFB3:
subq $24, %rsp
LCFI0:
movl $0x3f800000, (%rsp)
movl $0x40000000, 4(%rsp)
movq %rsp, %rdi
call __Z12DoSomething2RK6Vector
addq $24, %rsp
ret
LFE3:
We can see that in the case of Test1, the values are passed via the %xmm0 SIMD register once loaded from memory (so if they were the result of a previous computation, they would already be in the register and there would be no need to load them from memory). On the other hand, in the case of Test2, the values are passed on the stack (movl $0x3f800000, (%rsp) stores 1.0f on the stack). And if they were the result of a previous computation, that would require copying them from the %xmm0 SIMD register. That can be really slow (it may well stall the pipeline until the value is available, and if the stack is not properly aligned, the copy will also be slow).
So if your function is not inlined, prefer to pass by value instead of by const-reference. If the function is indeed inlined, look at the generated code before making up your mind.

As a reference. That is more efficient than making a copy of the structure and passing it (i.e. passing by value).
The only exception is if your platform can fit the entire structure into a register.


How are rvalues assigned to lvalues in assembly?

First question here. In a few weeks/months I will need to write procedural code in which functions assign big (I mean really big) sets of data directly through pointers. Here is an example of the kind of code I will be writing:
void MyFunction(string* str)
{
    *str = "some data in a string";
}
As it surely is important: I am on Windows 10, using Visual Studio 2019, compiling with the default C++ compiler in release x86 mode.
Imagine something like this but with strings that can contain several million characters, or with int/float arrays also with several million elements.
So, this is a single operation assigning an rvalue through a pointer, whose target is therefore on the heap. Of course, if I created a local variable containing the data, it would be more than 1 MB and would therefore cause a stack overflow, right?
As I understand it, since the data only exists as an rvalue here, it doesn't have a memory existence, but I would like to know: how is the rvalue assigned through the pointer? Like, how is it done in assembly? I must say I have never done any assembly; I have a few (very few) notions, but I'd like to get into it when I have time.
Is it temporarily created on the stack or heap before being put at the final memory address? My guess is that the memory address (the pointer I am assigning the data to) is directly filled with the data, like, bit by bit, so the rvalue never exists in memory.
If I'm correct, the only things that exist on the stack here are: the function call, the pointer copy, and then the instruction, which should be something like "assign rvalue X to lvalue Y"; the size of the instruction doesn't depend on the size of the rvalue and lvalue, so there should not be any problem regarding the stack here.
So, if I'm correct, this code should not cause any problem, no matter how big the rvalue is, but I would still like to know how it is done exactly, assembly-wise. Note that I am not only looking for an answer, but also for some references, books, or docs that explain this in detail. I guess what I am looking for won't be in a C++ book, but rather in an assembly book; this might be a good starting point to get myself into it!
Although a specific OS and compiler were mentioned, the example assembly in this answer will probably differ from what the querent's compiler would output, because I don't have a Windows 10 machine available at the time of writing and used a different environment having forgotten about Godbolt. However, this topic is general enough in my opinion that it shouldn't really matter in this specific case.
What even is a value on the right side of an assignment operator? What does assignment look like at the assembly level? Here's a simple example.
void assign_thing(int *p) {
    *p = 42;
}
movl $42, (%rdi)
retq
"Move the 32-bit integer 42 into the memory location to which rdi is pointing." %rdi here represents p, and (%rdi) means *p. For something dead simple like an integer, it's pretty much that simple. How about a simple structure?
struct stuff {
    int id;
    float value;
    char text[8];
};

void assign_thing(stuff *p) {
    *p = {42, 1.5, "Hello!"};
}
movabsq $4593671619917905962, %rax
movq %rax, (%rdi)
movabsq $36762444129608, %rax
movq %rax, 8(%rdi)
retq
A little harder to read at first glance, but pretty much the same idea. The compiler was smart and packed the integer and float values 42 and 1.5 into a single 64-bit value and stuffed that directly into (%rdi). Likewise with the string "Hello!", which is short enough to fit into a single 64-bit value and gets stuffed into 8(%rdi) (8 bytes past p is the offset of text).
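You can check that packed constant for yourself; a quick sketch (mine, not part of the original answer):

#include <cstdint>
#include <cstdio>

int main() {
    // 1.5f has the bit pattern 0x3FC00000; place it in the high 32 bits
    // and 42 in the low 32 bits, exactly as the compiler did.
    std::uint64_t packed = (std::uint64_t(0x3FC00000) << 32) | 42u;
    std::printf("%llu\n", (unsigned long long)packed); // prints 4593671619917905962
}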
So far, none of the rvalues actually exist in memory when they get assigned. They're just part of the instructions. What if it's something a lot bigger, like a string?
#include <string.h>  // for strcpy

// Overflow checking omitted for brevity.
void assign_thing(char *p) {
    // Assignment with = doesn't actually do what you'd want here,
    // so this'll have to do.
    strcpy(p, "What if it's something a lot bigger, like a string?");
}
vmovups -5484(%rip), %ymm0
vmovups %ymm0, 20(%rdi) ; I'm guessing the disassembler meant to say 0x20
vmovups -5517(%rip), %ymm0
vmovups %ymm0, (%rdi)
vzeroupper
retq
Now, the rvalue does reside in memory when it gets assigned. Do note that this is not because strcpy was used instead of =, but because the compiler decided that it would be better to store that "rvalue" string somewhere in a read-only area like .rodata and just copy it over. If I had used a much shorter string, any reasonably modern compiler would probably optimize it into a few mov or movabsq instructions like in the second example. Unless p points to a buffer on the stack and your strcpy ends up overflowing it, you won't get a stack overflow here.
Now what about your example? I'm guessing that your string type is really std::string, and that's not a trivial type. So what happens there? In C++, the assignment operator = is overloadable, and std::string indeed has its own overloads, so instead of directly stuffing or copying values into the object, a special member function operator= is called. That is to say, your *str = "some data in a string" is really a str->operator=("some data in a string"). How your rvalue string gets copied is up to the implementation of std::string::operator=, but it'll most likely be optimized into something like my last example. The actual string data of an std::string resides on the heap, so stack overflow still isn't a problem here.
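To make that explicit, the two lines below are equivalent ways of writing the same assignment (a sketch, assuming str really does point to an std::string):

#include <string>

void MyFunction(std::string* str) {
    *str = "some data in a string";            // what you write
    str->operator=("some data in a string");   // what the compiler sees
}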
tl;dr (this answer + the comments, compressed into a few sentences)
If your string is small enough, it probably won't exist in memory during assignment. If it's big enough, it'll sit in a read-only area somewhere and get copied over when needed. The stack is often not even involved, so don't worry about overflow.

embed a function's assembly code in a struct

I've a rather special question: is it possible in C/C++ (both, because I am sure the question is the same in both languages) to specify a function's location? Why? I have a very large list of function pointers, and I want to eliminate them.
Currently this looks like the following (repeated over like a million times, stored in the user's RAM):
struct {
    int i;
    void (*funptr)();
} test;
Because I know that in most assembly languages, functions are just "goto" directives, I had the following idea. Is it possible to optimize the above construct so that it looks like this?
struct {
    int i;
    // embed the assembly of the function here
    // so that all the function's
    // instructions are located here,
    // like this: mov rax, rbx
    //            jmp _start ; just demo code
} test2;
In the end, the thing should look like this in memory: an int holding any value, followed by the function's assembly code, referenced by test2. I should be able to call these functions like this: ((void(*)()) (&pointerToTheStruct + sizeof(int)))();
You might think that I'm insane to optimize the app that way, and I cannot disclose any more details on its function, but if anyone has some pointers on how to solve this problem, I would appreciate it.
I do not think that there is a standard way to do this, so any hacky way to do it via inline assembler/other crazy things is also appreciated!
The only thing you really have to do is make the compiler aware of the (constant) value of the function pointer you want in the struct. The compiler will then (presumably/hopefully) inline that function call wherever it sees it called through that function pointer:
template<void (*FPtr)()>
struct function_struct {
    int i;
    static constexpr auto funptr = FPtr;
};

void testFunc()
{
    volatile int x = 0;
}

using test = function_struct<testFunc>;

int main()
{
    test::funptr();
}
Demo - no call or jmp after optimization.
It remains unclear what the point of the int i is. Note that the code is not technically "directly after the i" here, but it is even more unclear what you'd expect instances of the struct to look like (is the code in them, or is it "static" in a way? I feel there is some misunderstanding on your part about what compilers actually produce...). But consider the ways that compiler inlining can help you, and you might find the solution you need. If you're worried about executable size after inlining, tell the compiler and it will compromise between speed and size.
This sounds like a terrible idea for a lot of reasons that probably won't save memory, and will hurt performance by diluting L1I-cache with data and L1D-cache with code. And worse if you ever modify or copy objects: self-modifying code stalls.
But yes, this would be possible in C99/C11 with a flexible array member at the end of the struct, which you cast to a function pointer.
struct int_with_code {
    int i;
    char code[];  // C99 flexible array member. GNU extension in C++
    // Store machine code here.
    // You can't get the compiler to do this for you. Good luck!
};

void foo(struct int_with_code *p) {
    // explicit C-style cast compiles as both C and C++
    void (*funcp)(void) = ( void (*)(void) ) p->code;
    funcp();
}
Compiler output from clang7.0, on the Godbolt compiler explorer is the same when compiled as either C or C++. This is targeting the x86-64 System V ABI, where the first function arg is passed in RDI.
# this is the code that *uses* such an object, not the code that goes in its code[]
# This proves that it compiles,
# without showing any way to get compiler-generated code into code[]
foo: # #foo
add rdi, 4 # move the pointer 4 bytes forward, to point at code[]
jmp rdi # TAILCALL
(If you leave out the (void) arg-type declaration in C, the compiler will zero AL first in the x86-64 SysV calling convention, in case its actually a variadic function, because it's passing no FP args in registers.)
You'd have to allocate your objects in memory that is executable (normally not done unless they're const with static storage), e.g. compile with gcc -zexecstack. Or use a custom mmap/mprotect on POSIX, or VirtualAlloc/VirtualProtect on Windows.
Or if your objects are all statically allocated, it might be possible to massage compiler output to turn functions in the .text section into objects by adding an int member right before each one. Maybe with some .section and linker tricks, and maybe a linker script, you could even somehow automate it.
But unless they're all the same length (e.g. with padding like char code[60]), that won't form an array you can index, so you'll need some way of referencing all these variable-length objects.
There are potentially huge performance downsides if you ever modify an object before calling its function: on x86 you'll get a self-modifying-code pipeline nuke for executing code near a just-written memory location.
Or if you copied an object before calling its function: an x86 pipeline flush, or on other ISAs you need to manually flush caches to get the I-cache in sync with the D-cache (so the newly-written bytes can be executed). But you can't copy such objects, because their size isn't stored anywhere. You can't search the machine code for a ret instruction, because a 0xc3 byte might appear somewhere that's not the start of an x86 instruction. Or on any ISA, the function might have multiple ret instructions (tail-duplication optimization). Or it might end with a jmp instead of a ret (a tailcall).
Storing a size would start to defeat the purpose of saving size, eating up at least an extra byte in each object.
Writing code to an object at runtime and then casting to a function pointer is undefined behaviour in ISO C and C++. In GNU C/C++, make sure you call __builtin___clear_cache on the range to sync caches or do whatever else is necessary. Yes, this is needed even on x86 to disable dead-store elimination optimizations: see this test case. On x86 it's just a compile-time thing, no extra asm; it doesn't actually clear any caches.
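A minimal sketch of that pattern (GNU C/C++ only; it assumes exec_buf already points to executable memory obtained via mmap/VirtualAlloc, and the function/parameter names are mine):

#include <cstring>

typedef void (*code_fn)(void);

void install_and_call(char* exec_buf, const char* code, std::size_t len) {
    std::memcpy(exec_buf, code, len);  // write the machine-code bytes
    // Tell the compiler (and, on non-x86 ISAs, the hardware) that these
    // stores will be executed as code; required even on x86.
    __builtin___clear_cache(exec_buf, exec_buf + len);
    ((code_fn)exec_buf)();  // UB in ISO C++; works in practice with GNU extensions
}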
If you do copy at runtime startup, maybe allocate a big chunk of memory and carve out variable-length chunks of it, while copying. If you malloc each separately, you're wasting memory-management overhead on it.
This idea will not save you memory unless you have about as many functions as you have objects
Normally you have a fairly limited number of actual functions, with many objects having copies of the same function pointer. (You've kind of hand-rolled C++ virtual functions, but with only one function you just have a function pointer directly instead of a vtable pointer to a table of pointers for that class type. One fewer level of indirection, and apparently you're not passing the object's own address to the function.)
One of the several benefits of this level of indirection is that one pointer is usually significantly smaller than the entire code for a function. For that to not be the case, your functions would have to be tiny.
Example: with 10 different functions of 32 bytes each, and 1000 objects with function pointers, you have a total of 320 bytes of code (which will stay hot in I-cache), and 8000 bytes of function pointers. (And in your objects, another 4 bytes per object wasted on padding to align the pointer, making the total size 16 instead of 12 bytes per object.) Anyway, that's 16320 bytes total for entire structs + code. If you allocated each object separately, there's per-object bookkeeping.
With inlining machine code into each object, and no padding, that's 1000 * (4+32) = 36000 bytes, over twice the total size.
x86-64 is probably a best-case scenario, where a pointer is 8 bytes and x86-64 machine code uses a (famously complex) variable-length instruction encoding which allows for high code density in some cases, especially when optimizing for code-size. (e.g. code-golfing. https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code). But unless your functions are mostly something trivial like lea eax, [rdi + rdi*2] (3 bytes=opcode + ModRM + SIB) / ret (1 byte), they're still going to take more than 8 bytes. (That's return x*3; for a function that takes a 32-bit integer x arg, in the x86-64 System V ABI.)
If they're wrappers for larger functions, a normal call rel32 instruction is 5 bytes. A load of static data is at least 6 bytes (opcode + modrm + rel32 for a RIP-relative addressing mode, or loading EAX specifically can use the special no-modrm encoding for an absolute address. But in x86-64 that's a 64-bit absolute unless you use an address-size prefix too, potentially causing an LCP stall in the decoders on Intel. mov eax, [32 bit absolute address] = addr32 (0x67) + opcode + abs32 = 6 bytes again, so this is worse for no benefit).
Your function-pointer type doesn't have any args (assuming this is C++ where foo() means foo(void) in a declaration, not like old C where an empty arg list is somewhat similar to (...)). Thus we can assume you're not passing args, so to do anything useful the functions are probably accessing some static data or making another call.
Ideas that make more sense:
Use an ILP32 ABI like Linux x32, where the CPU runs in 64-bit mode but your code uses 32-bit pointers. This would make each of your objects only 8 bytes instead of 16. Avoiding pointer-bloat is a classic use-case for x32 or ILP32 ABIs in general.
Or (yuck) compile your code as 32-bit. But then you have obsolete 32-bit calling conventions that pass args on the stack instead of registers, and less than half the registers, and much higher overhead for position-independent code. (No EIP/RIP-relative addressing.)
Store an unsigned int table index to a table of function pointers. If you have 100 functions but 10k objects, the table is only 100 pointers long. In asm you could index an array of code directly (computed goto style) if all the functions were padded to the same length, but in C++ you can't do that. An extra level of indirection with a table of function pointers is probably your best bet.
e.g.
void (*const fptrs[])(void) = {
    func1, func2, func3, ...
};

struct int_with_func {
    int i;
    unsigned f;
};

void bar(struct int_with_func *p) {
    fptrs[p->f]();
}
clang/gcc -O3 output:
bar(int_with_func*):
mov eax, dword ptr [rdi + 4] # load p->f
jmp qword ptr [8*rax + fptrs] # TAILCALL # index the global table with it for a memory-indirect jmp
If you were compiling a shared library, PIE executable, or not targeting Linux, the compiler couldn't use a 32-bit absolute address to index a static array with one instruction. So there'd be a RIP-relative LEA in there and something like jmp [rcx+rax*8].
This is an extra level of indirection vs. storing a function pointer in each object, but it lets you shrink each object to 8 bytes, down from 16, like using 32-bit pointers. Or to 5 or 6 bytes, if you use an unsigned short or uint8_t and pack the structs with __attribute__((packed)) in GNU C.
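A sketch of that smaller variant (GNU C/C++; the struct name is mine):

#include <cstdint>

struct __attribute__((packed)) int_with_func_small {
    int i;
    std::uint8_t f;  // index into fptrs[]; allows up to 256 functions
};
// 4 bytes of int + 1 byte of index, no padding thanks to packed:
static_assert(sizeof(int_with_func_small) == 5, "expected 5 bytes");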
No, not really.
The way to specify a function's location is to use a function pointer, which you're already doing.
You could make different types which have their own different member functions, but then you're back to the original problem.
I have in the past experimented with auto-generating (as a pre-build step, using Python) a function with a long switch statement that does the work of mapping int i to a normal function call. This gets rid of the function pointers, at the expense of branching. I don't remember whether it ended up being worthwhile in my case and, even if I did, that wouldn't tell us whether it's worthwhile in your case.
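For illustration, the generated dispatcher might look like this (func0..func2 are hypothetical stand-ins for the real functions):

void func0(); void func1(); void func2();

// The pre-build step would emit one case per function.
void dispatch(int i) {
    switch (i) {
        case 0: func0(); break;
        case 1: func1(); break;
        case 2: func2(); break;
    }
}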
Because I know that in most assembly languages, functions are just "goto" directives
Well, it's perhaps a little more complicated than that…
You might think that I'm insane to optimize the app that way
Perhaps. Trying to eliminate indirection is not, in itself, a bad thing, so I don't think you're wrong to try to improve this. I just don't think that you necessarily can.
but if anyone has some pointers
lol
I don't understand the goal of this "optimization". Is it about saving memory?
I might be misunderstanding the question, but if you just replace your function pointer with a regular member function, then your struct will contain only the int as data, and the function's address will be inserted by the compiler when you take it, instead of being stored in memory.
So just do
struct {
    int i;
    void func();
} test;
Then sizeof(test)==sizeof(int) should hold true if you set alignment/packing to be tight.

Does ^= create a temporary variable in memory?

Does the bitwise operator ^= create a temporary variable in memory when it's used?
so for a example if I have:
a ^= b;
Does it create a copy of a in memory, then check against it and then assign? Or does it just straight up check and then assign without creating a temporary variable?
This is a compiler-specific question, but I tried it with g++ -O2 and clang++ -O2. It compiled this:
int main(int argc, char** argv) {
    int a = argc, b = argc * 3;
    a ^= b;
    return a;
}
to
leal (%rdi,%rdi,2), %eax
xorl %edi, %eax
ret
The a ^= b part corresponds to the xorl line, which, as you can see, is a single instruction. So gcc did not create and then assign a new variable; it just left the operation directly to the CPU.
Note that you should look at this purely because you find it interesting. From a performance point of view, you should not care about such things and leave it to your compiler. It is pretty good in optimizing this stuff, so focus your time and knowledge on writing correct and readable code!
What happens here is the following:
read a
read b
xor the two values read previously
store the result in a
Variables are a high-level construct used in most programming languages (but not all). No variable is created here.
Variables are not identical to memory. When the code above is executed on a CPU, the CPU has to load the operands from RAM if they are not already stored in registers. Whether the input or output of such an operation is stored in RAM (aka "memory") or in registers depends on the previous and the following operations.
So in short: no variable is created; the data is most likely copied into registers before it is used, but not necessarily stored in RAM.
It probably happens in one of the processor's general-purpose registers, not in memory, but it depends on a lot of factors: CPU architecture, compiler optimization flags, surrounding code.
A possible scenario could be:
load the value of a into a register (say, eax);
load the value of b into another register (say, ebx);
xor the values stored in the two registers, with the result going to one of them (eax, for example);
store the value from the eax register into memory, at the address of a.
The variables a and b themselves might not get a place in memory at all and may spend their lives only in processor registers, if they are local variables initialized with constant values, have short lifespans, and the compiler decides it's a waste of time and memory to store them there just to ignore and discard them several instructions later (when the function returns).
In extreme cases, the presented code might not generate CPU instructions at all, e.g. if a is a local variable whose value is not used after the assignment.
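That last case is easy to see for yourself; a sketch of such dead code:

int unused_xor(int argc) {
    int a = argc, b = argc * 3;
    a ^= b;       // the result is never read...
    return argc;  // ...so an optimizing compiler emits no xor at all
}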

C++ 64 bit int: pass by reference or pass by value

This is an efficiency question about 64-bit ints. Assuming I don't need to modify the value of an "int" parameter, should I pass it by value or by reference?
Assuming a 32-bit machine:
1) 32-bit int: I guess the answer is "pass by value", as "pass by reference" would have the overhead of an extra memory lookup.
2) 64-bit int: If I pass by reference, I only pass a 32-bit address on the stack, but I need an extra memory lookup. So which one of them is better (reference or value)?
What if the machine is 64-bit?
Pass by value, definitely. If the system is 64-bit, it copies a 64-bit word extremely fast.
Even on a 64-bit machine, pass by value is better (with some very few exceptions), because it can be passed as a register value.
Pass them as a boost::call_traits<int64_t>::param_type. This template captures the best practices for passing any type on the supported platforms. Hence, it will be different on 32 and 64 bits platforms, but you can use the same code everywhere. It even works inside other templates where you don't know the precise type yet.
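For example, a function using that trait might look like this (a sketch; requires Boost):

#include <boost/call_traits.hpp>
#include <cstdint>

// param_type resolves to plain int64_t where by-value wins,
// and to const int64_t& where it doesn't.
std::int64_t triple(boost::call_traits<std::int64_t>::param_type x) {
    return x * 3;
}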
Use a little common sense:
if the object requires a complex copy constructor, it's probably worth passing by reference (having said that, quite a lot of boost's objects are designed to be passed by value rather than by reference, simply because the internal implementation is quite trivial). There is one odd one which I haven't really worked out, std::string; I always pass this by reference...
If you intend to modify the value that is passed in, use a reference
Else, PASS-BY-VALUE!
Do you have a particular performance bottleneck with arguments to functions? If not, don't spend too much time worrying about which is the best way to pass...
Optimizing by worrying about how an int is passed in is like pi**ing in the sea...
For argument's sake, let's ignore the trivial case of optimisers removing differences. Let's also say you're using Microsoft's Intel 64-bit calling convention (which does differ from the Linux ABI); then you've got 4 64-bit registers for passing such values before you have to resort to pushing them on the stack. That's clearly better.
For a 32-bit app, by value they'd go straight onto the stack. By reference may instead put a pointer in a register (again, a few such register uses are allowed before resorting to the stack). We can see this in some output from g++ -O3 -S, calling f1(99) by value and f2(101) by const reference:
#include <cstdint>  // for int64_t

void f1(int64_t);
void f2(const int64_t&);

int main()
{
    f1(99);
    f2(101);
}
...
pushl $0
pushl $99
call _Z2f1x // by value - pushed two halves to stack
leal -8(%ebp), %eax
movl %eax, (%esp)
movl $101, -8(%ebp)
movl $0, -4(%ebp)
call _Z2f2RKx // by const& - ugly isn't it!?!
The called function would then have to retrieve the value before first usage (if any). The called function is free to cache the values it reads in registers, so that's only needed once. With the stack approach, the value can be reread at will, so a register need not be reserved for it. With the pointer approach, either the pointer or the 64-bit value may need to be saved somewhere more predictable (e.g. pushed, or moved to another less useful register) should that register need to be freed up momentarily for some other work but the 64-bit int parameter be needed again later. All up, it's hard to guess which is faster; it may be CPU/register-usage/optimiser dependent, and it's not worth trying.
A nod to pst's advice...
"efficiency" :( KISS. pass it how you pass every other bloody integer. - pst
...though sometimes you apply KISS to template parameters and make them all const T&, even though some may fit in registers....

what will be the addressing mode in assembly code generated by the compiler here?

Suppose we've got an integer variable and a character variable:
int adad=12345;
char character;
Assuming we're discussing a platform on which the length of an integer variable is at least three bytes, I want to access the third byte of this integer and put it in the character variable. With that said, I'd write it like this:
character=*((char *)(&adad)+2);
Considering that line of code, and given that I'm not a compiler or assembly expert (I know a little about addressing modes in assembly), I'm wondering: would the address of the third byte (or, I guess it's better to say, the offset of the third byte) be encoded within the instructions generated from that line of code themselves, or would it be in a separate variable whose address (or offset) is within those instructions?
The best thing to do in situations like this is to try it. Here's an example program:
int main(int argc, char **argv)
{
    int adad = 12345;
    volatile char character;
    character = *((char *)(&adad) + 2);
    return 0;
}
I added the volatile to avoid the assignment line being completely optimized away. Now, here's what the compiler came up with (for -Oz on my Mac):
_main:
pushq %rbp
movq %rsp,%rbp
movl $0x00003039,0xf8(%rbp)
movb 0xfa(%rbp),%al
movb %al,0xff(%rbp)
xorl %eax,%eax
leave
ret
The only three lines that we care about are:
movl $0x00003039,0xf8(%rbp)
movb 0xfa(%rbp),%al
movb %al,0xff(%rbp)
The movl is the initialization of adad. Then, as you can see, it reads out the 3rd byte of adad, and stores it back into memory (the volatile is forcing that store back).
I guess a good question is: why does it matter to you what assembly gets generated? For example, just by changing my optimization flag to -O0, the assembly output for the interesting part of the code is:
movl $0x00003039,0xf8(%rbp)
leaq 0xf8(%rbp),%rax
addq $0x02,%rax
movzbl (%rax),%eax
movb %al,0xff(%rbp)
Which is pretty straightforwardly seen as the exact logical operations of your code:
Initialize adad
Take the address of adad
Add 2 to that address
Load one byte by dereferencing the new address
Store one byte into character
Various optimizations will change the output... if you really need some specific behaviour/addressing mode for some reason, you might have to write the assembly yourself.
Without knowing anything about the compiler and the underlying CPU architecture, no definitive answer can be given. For example, not all CPU architectures allow addressing every arbitrary byte in memory (though I believe all the currently popular ones do): on a CPU that's word-addressed, instead of byte-addressed, what the compiler will generate is inevitably the loading into some register of the whole word adad (presumably at an offset from a base pointer register, if the variable in question is on the stack [1]), followed by shifting and masking to isolate the byte of interest.
[1] note that, without knowing what CPU architecture we're talking about and how the compiler uses it, we can't even say whether "load a word at a fixed offset from a base register" is something that's done inline within the instruction (as one might hope, and many popular architectures definitely do support;-) or needs separate address arithmetic in an auxiliary register.
IOW, whether it's a good idea or not, it's definitely possible to define a CPU architecture which cannot load/store registers except from other registers, or from memory addresses defined by other registers or constants, and some such architectures exist (though they may not be all that popular at this time;-).
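For reference, the shift-and-mask fallback described above looks like this in C++ (a sketch of mine; it assumes a little-endian layout, so "the third byte" is bits 16-23):

#include <cstdint>

char third_byte(std::uint32_t adad) {
    // Shift byte 2 down to the bottom, then mask off everything else.
    return static_cast<char>((adad >> 16) & 0xFF);
}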