Undefined behavior from pointer math on a C++ array - c++

Why is the output of this program 4?
#include <iostream>
int main()
{
short A[] = {1, 2, 3, 4, 5, 6};
std::cout << *(short*)((char*)A + 7) << std::endl;
return 0;
}
From my understanding, on an x86 little-endian system, where char is 1 byte and short is 2 bytes, the output should be 0x0500, because the data in array A is laid out in hex as follows:
01 00 02 00 03 00 04 00 05 00 06 00
We move 7 bytes forward from the beginning, and then read 2 bytes. What am I missing?

You are violating strict aliasing rules here. You can't just read half-way into an object and pretend it's an object all on its own. You can't invent hypothetical objects using byte offsets like this. GCC is perfectly within its rights to do crazy sh!t like going back in time and murdering Elvis Presley, when you hand it your program.
What you are allowed to do is inspect and manipulate the bytes that make up an arbitrary object, using a char*. Using that privilege:
#include <iostream>
#include <algorithm>
int main()
{
short A[] = {1, 2, 3, 4, 5, 6};
short B;
std::copy(
(char*)A + 7,
(char*)A + 7 + sizeof(short),
(char*)&B
);
std::cout << std::showbase << std::hex << B << std::endl;
}
// Output: 0x500
(live demo)
But you can't just "make up" a non-existent object in the original collection.
Furthermore, even if you have a compiler that can be told to ignore this problem (e.g. with GCC's -fno-strict-aliasing switch), the made-up object is not correctly aligned for any current mainstream architecture. A short cannot legally live at that odd-numbered location in memory†, so you doubly can't pretend there is one there. There's just no way to get around how undefined the original code's behaviour is; in fact, if you pass GCC the -fsanitize=undefined switch it will tell you as much.
† I'm simplifying a little.

The program has undefined behaviour due to casting an incorrectly aligned pointer to (short*). This breaks the rules in 6.3.2.3 p6 in C11, which has nothing to do with strict aliasing as claimed in other answers:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
In [expr.static.cast] p13 C++ says that converting the unaligned char* to short* gives an unspecified pointer value, which might be an invalid pointer, which can't be dereferenced.
The correct way to inspect the bytes is through the char* not by casting back to short* and pretending there is a short at an address where a short cannot live.

This is arguably a bug in GCC.
First, it is to be noted that your code is invoking undefined behavior, due to violation of the rules of strict aliasing.
With that said, here's why I consider it a bug:
The same expression, when first assigned to an intermediate short or short *, causes the expected behavior. It's only when the expression is passed directly as a function argument that the unexpected behavior manifests.
It occurs even when compiled with -O0 -fno-strict-aliasing.
I re-wrote your code in C to eliminate the possibility of any C++ craziness. Your question was tagged c, after all! I added the pshort function to ensure that the variadic nature of printf wasn't involved.
#include <stdio.h>
static void pshort(short val)
{
printf("0x%hx ", val);
}
int main(void)
{
short A[] = {1, 2, 3, 4, 5, 6};
#define EXP ((short*)((char*)A + 7))
short *p = EXP;
short q = *EXP;
pshort(*p);
pshort(q);
pshort(*EXP);
printf("\n");
return 0;
}
After compiling with gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2):
gcc -O0 -fno-strict-aliasing -g -Wall -Werror endian.c
Output:
0x500 0x500 0x4
It appears that GCC is actually generating different code when the expression is used directly as an argument, even though I'm clearly using the same expression (EXP).
Dumping with objdump -Mintel -S --no-show-raw-insn endian:
int main(void)
{
40054d: push rbp
40054e: mov rbp,rsp
400551: sub rsp,0x20
short A[] = {1, 2, 3, 4, 5, 6};
400555: mov WORD PTR [rbp-0x16],0x1
40055b: mov WORD PTR [rbp-0x14],0x2
400561: mov WORD PTR [rbp-0x12],0x3
400567: mov WORD PTR [rbp-0x10],0x4
40056d: mov WORD PTR [rbp-0xe],0x5
400573: mov WORD PTR [rbp-0xc],0x6
#define EXP ((short*)((char*)A + 7))
short *p = EXP;
400579: lea rax,[rbp-0x16] ; [rbp-0x16] is A
40057d: add rax,0x7
400581: mov QWORD PTR [rbp-0x8],rax ; [rbp-0x08] is p
short q = *EXP;
400585: movzx eax,WORD PTR [rbp-0xf] ; [rbp-0xf] is A plus 7 bytes
400589: mov WORD PTR [rbp-0xa],ax ; [rbp-0xa] is q
pshort(*p);
40058d: mov rax,QWORD PTR [rbp-0x8] ; [rbp-0x08] is p
400591: movzx eax,WORD PTR [rax] ; *p
400594: cwde
400595: mov edi,eax
400597: call 400527 <pshort>
pshort(q);
40059c: movsx eax,WORD PTR [rbp-0xa] ; [rbp-0xa] is q
4005a0: mov edi,eax
4005a2: call 400527 <pshort>
pshort(*EXP);
4005a7: movzx eax,WORD PTR [rbp-0x10] ; [rbp-0x10] is A plus 6 bytes ********
4005ab: cwde
4005ac: mov edi,eax
4005ae: call 400527 <pshort>
printf("\n");
4005b3: mov edi,0xa
4005b8: call 400430 <putchar@plt>
return 0;
4005bd: mov eax,0x0
}
4005c2: leave
4005c3: ret
I get the same result with GCC 4.9.4 and GCC 5.5.0 from Docker Hub.

Related

C++ HOW can this out-of-range access inside struct go wrong?

#include <iostream>
#include <random>
using namespace std;
struct TradeMsg {
int64_t timestamp; // 0->7
char exchange; // 8
char symbol[17]; // 9->25
char sale_condition[4]; // 26 -> 29
char source_of_trade; // 30
uint8_t trade_correction; // 31
int64_t trade_volume; // 32->39
int64_t trade_price; // 40->47
};
static_assert(sizeof(TradeMsg) == 48);
char buffer[1000000];
template<class T, size_t N=1>
int someFunc(char* buffer, T* output, int& cursor) {
// read + process data from buffer. Return data in output. Set cursor to the last byte read + 1.
return cursor + (rand() % 20) + 1; // dummy code
}
void parseData(TradeMsg* msg) {
int cursor = 0;
cursor = someFunc<int64_t>(buffer, &msg->timestamp, cursor);
cursor = someFunc<char>(buffer, &msg->exchange, cursor);
cursor++;
int i = 0;
// i is GUARANTEED to be <= 17 after this loop,
// edit: the input data in buffer[] guarantee that fact.
while (buffer[cursor + i] != ',') {
msg->symbol[i] = buffer[cursor + i];
i++;
}
msg->symbol[i] = '\n'; // might access symbol[17].
cursor = cursor + i + 1;
for (i=0; i<4; i++) msg->sale_condition[i] = buffer[cursor + i];
cursor += 5;
//cursor = someFunc...
}
int main()
{
TradeMsg a;
a.symbol[17] = '\0';
return 0;
}
I have this struct that is guaranteed to have a predictable size. In the code, there is a case where the program tries to assign a value to an array element past its size: msg->symbol[17] = ... .
However, in that case, the assignment does not cause any harm as long as:
It is done before the next struct members (sale_condition) are assigned (no unexpected code reordering).
It does not modify any previous members (timestamp, exchange).
It does not access any memory outside the struct.
I read that this is undefined behavior. But what kind of compiler optimization/code generation can make this go wrong? symbol[17] is pretty deep inside the middle of the struct, so I don't see how the compiler could generate an access outside it. Assume the platform is x86-64 only.
Various folks have pointed out debug-mode checks that will fire on access outside the bounds of an array member of a struct, with options like gcc -fsanitize=undefined. Separate from that, it's also legal for a compiler to use the assumption of non-overlap between member accesses to reorder two assignments which actually do alias:
@Peter in comments points out that the compiler is allowed to assume that accesses to msg->symbol[i] don't affect other struct members, and could potentially delay msg->symbol[i] = '\n'; until after the loop that writes msg->sale_condition[i] (i.e. sink that store to the bottom of the function).
There isn't a good reason you'd expect a compiler to want to do that in this function alone, but perhaps after inlining into some caller that also stored something there, it could be relevant. Or just because it's a DeathStation 9000 that exists in this thought experiment to break your code.
You could write this safely, although GCC compiles that worse
Since char* is allowed to alias any other object, you could offset a char* relative to the start of the whole struct, rather than to the start of the member array. Use offsetof to find the right start point like this:
#include <cstddef>
...
((char*)msg + offsetof(TradeMsg, symbol))[i] = '\n'; // might access symbol[17].
That's exactly equivalent to *((char*)msg + offsetof(...) + i) = '\n'; by definition of C++'s [] operator, even though it lets you use [i] to index relative to the same position.
However, that does compile to less efficient asm with GCC11.2 -O2. (Godbolt), mostly because int i, cursor are narrower than pointer-width. The "safe" version that redoes indexing from the start of the struct does more indexing work in asm, not using the msg+offsetof(symbol) pointer that it was already using as the base register in the loop.
# original version, with UB if `i` goes past the buffer.
# gcc11.2 -O2 -march=haswell. -O3 fully unrolls into a chain of copy/branch
... partially peeled first iteration
.L3: # do{
mov BYTE PTR [rbx+8+rax], dl # store into msg->symbol[i]
movsx rdi, eax # not read inside the loop
lea ecx, [r8+rax]
inc rax
movzx edx, BYTE PTR buffer[rsi+1+rax] # load from buffer
cmp dl, 44
jne .L3 # }while(buffer[cursor+i] != ',')
## End of copy-and-search loop.
# Loops are identical up to this point except for MOVSX here vs. MOV in the no-UB version.
movsx rcx, ecx # just redo sign extension of this calculation that was done repeatedly inside the loop just for this, apparently.
.L2:
mov BYTE PTR [rbx+9+rdi], 10 # store a newline
mov eax, 1 # set up for next loop
# offsetof version, without UB
# same loop, but with RDI and RSI usage switched.
# And with mov esi, eax zero extension instead of movsx rdi, eax sign extension
cmp dl, 44
jne .L3 # }while(buffer[cursor+i] != ',')
add esi, 9 # offsetof(TradeMsg, symbol)
movsx rcx, ecx # more stuff getting sign extended.
movsx rsi, esi # including something used in the newline store
.L2:
mov BYTE PTR [rbx+rsi], 10
mov eax, 1 # set up for next loop
The RCX calculation seems to just be for use by the next loop, setting sale_conditions.
BTW, the copy-and-search loop is like strcpy but with a ',' terminator. Unfortunately gcc/clang don't know how to optimize that; they compile to a slow byte-at-a-time loop, not e.g. an AVX512BW masked store using mask-1 from a vec == set1_epi8(',') compare, to get a mask selecting the bytes-before-',' instead of the comma element. (Probably needs a bithack to isolate that lowest-set-bit as the only set bit, though, unless it's safe to always copy 16 or 17 bytes separate from finding the ',' position, which could be done efficiently without masked stores or branching.)
Another option might be a union between a char[21] and struct{ char sym[17], sale[4];}, if you use a C++ implementation that allows C99-style union type-punning. (It's a GNU extension, and also supported by MSVC, but not necessarily literally every x86 compiler.)
Also, style-wise, shadowing int i = 0; with for( int i=0 ; i<4 ; i++ ) is poor style. Pick a different var name for that loop, like j. (Or if there is anything meaningful, a better name for i which has to survive across multiple loops.)
It can go wrong in a few cases:
When variable guard is set up: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
In a C++ interpreter (yes they exist): https://root.cern/cling/
Your symbol has a size of 17. Yet you are trying to assign a value to the 18th element: a.symbol[17] = '\0';
Remember your index value starts off at 0 not 1.
So you have two places that can go wrong: i can equal 17, which will cause an error, and the last line I showed above will also cause an error.

When casting a const to a non-const pointer in C++ 2017 and modifying it, where does the compiler store both values?

In Visual C++ 2017, when experimenting with what happens when you break the rules, I found that if I cast the address of a const int to an int *, and then assign a new value through that pointer, the debugger shows the const's value changing, but the runtime output doesn't.
This happens whether or not I run it in Debug mode or as a released executable. I'm aware it's undefined, but am looking for insight as to where these values are held, as they appear to be identical locations.
const int j = 100;
//int *q = &j; //Compiler disallows
int *q = (int*)&j; //By some magic, now allowed
*q = 300; //After this line, j = 300 in debugger
cout << "j = " << j << endl; //300 in debugger, 100 in console
//^ What is happening here? Where are the two values stored?
cout << "*q = " << *q << endl; //300 in both
//Output:
// j = 100
// *q = 300
Where are the two values being stored? This is like having one bucket that is simultaneously filled with two different liquids.
I'm aware that it's Undefined Behavior, but I was wondering if anyone could shed light on what is happening, internally.
The premise is flawed. The debugger works by the same C++17 rules, so it too can assume that there is no Undefined Behavior. That means it can check the source code and know j==100. There's no reason it would have to check the runtime value.
If an object is in const storage, a compiler may at its leisure replace it with two or more objects that have the same content if it can tell that the addresses are never compared. A compiler would not generally be able to do this if both objects' addresses get exposed to the outside world, but may do so in cases where one object is exposed but the other(s) are not.
Consider, for example:
const char Hey[4] = "Hey";
void test(int index)
{
char const *HeyPtr = Hey;
putchar(HeyPtr[index]);
}
A compiler processing test would be able to see that the value of HeyPtr is never exposed to outside code in any way, and on some platforms might benefit from having the test function use its own copy of the string. On a platform where addresses are 64 bits, if test doesn't include its own copy of the string, then eight bytes would be needed to contain the address of Hey. The four bytes needed to store an extra copy of the string would cost less than the eight bytes needed to hold the address.
There are a few situations where the Standard offers guarantees that are stronger than programmers generally need. For example, given:
const int foo[] = {1,2,3,4};
const int bar[] = {1,2,3,4};
Unless a program happens to compare foo (or an address derived from it) with bar (likewise), using the same storage for both objects would save 16 bytes without affecting program semantics. The Standard, however, provides no means by which a programmer could indicate that code either won't compare those addresses, or would not be adversely affected if they happen to compare equal, so a compiler can only make such substitutions in cases where it can tell that a substituted object's address won't be exposed to code that might perform such comparisons.
Well just look at the generated assembly...
const int j = 100;
00052F50 mov dword ptr [j],64h
//int *q = &j; //Compiler disallows
int *q = (int*)&j; //By some magic, now allowed
00052F58 lea rax,[j]
00052F5D mov qword ptr [q],rax
*q = 300; //After this line, j = 300 in debugger
00052F62 mov rax,qword ptr [q]
00052F67 mov dword ptr [rax],12Ch
cout << "j = " << j << endl; //300 in debugger, 100 in console
00052F6D lea rdx,[__xt_z+114h (07FF679CC6544h)]
00052F74 lea rcx,[std::cout (07FF679D31B80h)]
00052F7B call std::operator<<<std::char_traits<char> > (07FF679B43044h)
00052F80 mov edx,64h
00052F85 mov rcx,rax
00052F88 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF679B417E9h)
00052F8D lea rdx,[std::endl<char,std::char_traits<char> > (07FF679B42C25h)]
00052F94 mov rcx,rax
00052F97 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF679B445F7h)
//^ What is happening here? Where are the two values stored?
cout << "*q = " << *q << endl; //300 in both
00052F9C lea rdx,[__xt_z+11Ch (07FF679CC654Ch)]
00052FA3 lea rcx,[std::cout (07FF679D31B80h)]
00052FAA call std::operator<<<std::char_traits<char> > (07FF679B43044h)
00052FAF mov rcx,qword ptr [q]
00052FB4 mov edx,dword ptr [rcx]
00052FB6 mov rcx,rax
00052FB9 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF679B417E9h)
00052FBE lea rdx,[std::endl<char,std::char_traits<char> > (07FF679B42C25h)]
00052FC5 mov rcx,rax
00052FC8 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF679B445F7h)
Notice the "weird" read from __xt_z+114h. That's an offset from the end of global initializers (__xt_z is probably the nearest symbol that the debugger found), most probably into the read-only data section (.rdata).
That's where the Debug version puts the 100 (it's a constant after all).
Then, an MSVC Debug build always allocates local variables and constants on the stack, hence you get a separate j variable, which you can even modify (note that the compiler does not have to read from it when you read j, since it knows j is a constant containing 100).
If we try the same in Release mode, we see the compiler did value propagation and optimized away both variables, simply inlining the values into the code:
const int j = 100;
//int *q = &j; //Compiler disallows
int *q = (int*)&j; //By some magic, now allowed
*q = 300; //After this line, j = 300 in debugger
cout << "j = " << j << endl; //300 in debugger, 100 in console
000C101D lea rdx,[string "j = " (07FF72FAC3298h)]
000C1024 mov rcx,qword ptr [__imp_std::cout (07FF72FAC30A8h)]
000C102B call std::operator<<<std::char_traits<char> > (07FF72FAC1110h)
000C1030 mov edx,64h
000C1035 mov rcx,rax
000C1038 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF72FAC30A0h)]
000C103E lea rdx,[std::endl<char,std::char_traits<char> > (07FF72FAC12E0h)]
000C1045 mov rcx,rax
000C1048 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF72FAC3098h)]
//^ What is happening here? Where are the two values stored?
cout << "*q = " << *q << endl; //300 in both
000C104E lea rdx,[string "*q = " (07FF72FAC32A0h)]
000C1055 mov rcx,qword ptr [__imp_std::cout (07FF72FAC30A8h)]
000C105C call std::operator<<<std::char_traits<char> > (07FF72FAC1110h)
000C1061 mov edx,12Ch
000C1066 mov rcx,rax
000C1069 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF72FAC30A0h)]
000C106F lea rdx,[std::endl<char,std::char_traits<char> > (07FF72FAC12E0h)]
000C1076 mov rcx,rax
000C1079 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF72FAC3098h)]
In both cases the output is the same. A const variable remains unchanged.
Does any of it matter? No, you shouldn't rely on this behavior and you shouldn't modify constants.

Can someone explain the meaning of malloc(20 * c | -(20 * (unsigned __int64)(unsigned int)c >> 32 != 0))

In decompiled code generated by IDA I see expressions like:
malloc(20 * c | -(20 * (unsigned __int64)(unsigned int)c >> 32 != 0))
malloc(6 * n | -(3 * (unsigned __int64)(unsigned int)(2 * n) >> 32 != 0))
Can someone explain the purpose of these calculations?
c and n are int (signed integer) values.
Update.
Original C++ code was compiled with MSVC for 32-bit platform.
Here's the assembly code for the second line of decompiled C code above (malloc(6 * ...)):
mov ecx, [ebp+pThis]
mov [ecx+4], eax
mov eax, [ebp+pThis]
mov eax, [eax]
shl eax, 1
xor ecx, ecx
mov edx, 3
mul edx
seto cl
neg ecx
or ecx, eax
mov esi, esp
push ecx ; Size
call dword ptr ds:__imp__malloc
I'm guessing that the original source code used the C++ new operator to allocate an array and was compiled with Visual C++. As user3528438's answer indicates, this code is meant to prevent overflows. Specifically, it's a 32-bit unsigned saturating multiply: if the result of the multiplication would be greater than 4,294,967,295, the maximum value of a 32-bit unsigned number, the result is clamped, or "saturated", to that maximum.
Since Visual Studio 2005, Microsoft's C++ compiler has generated code to protect against overflows. For example, I can generate assembly code that could be decompiled into your examples by compiling the following with Visual C++:
#include <stdlib.h>
void *
operator new[](size_t n) {
return malloc(n);
}
struct S {
char a[20];
};
struct T {
char a[6];
};
void
foo(int n, S **s, T **t) {
*s = new S[n];
*t = new T[n * 2];
}
Which, with Visual Studio 2015's compiler generates the following assembly code:
mov esi, DWORD PTR _n$[esp]
xor ecx, ecx
mov eax, esi
mov edx, 20 ; 00000014H
mul edx
seto cl
neg ecx
or ecx, eax
push ecx
call _malloc
mov ecx, DWORD PTR _s$[esp+4]
; Line 19
mov edx, 6
mov DWORD PTR [ecx], eax
xor ecx, ecx
lea eax, DWORD PTR [esi+esi]
mul edx
seto cl
neg ecx
or ecx, eax
push ecx
call _malloc
Most of the decompiled expression is actually meant to handle just one assembly statement. The assembly instruction seto cl sets CL to 1 if the previous MUL instruction overflows, otherwise it sets CL to 0. Similarly the expression 20 * (unsigned __int64)(unsigned int)c >> 32 != 0 evaluates to 1 if the result of 20 * c overflows, and evaluates to 0 otherwise.
If this overflow protection wasn't there and the result of 20 * c did actually overflow then the call to malloc would probably succeed, but allocate much less memory than the program intended. The program would then likely write past the end of the memory actually allocated and trash other bits of memory. This would amount to a buffer overrun, one that could be potentially exploited by hackers.
Since this code is decompiled from ASM, we can only guess what it actually does.
Let's first format it to figure out the precedence:
malloc(20 * c | -(20 * (unsigned __int64)(unsigned int)c >> 32 != 0))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
//this is first evaluated, promoting c to
//64 bit unsigned int without doing sign
//extension, regardless the type of c
malloc(20 * c | -(20 * (uint64_t)c >> 32 != 0))
^^^^^^^^^^^^^^^^
//then, multiply by 20, with uint64 result
malloc(20 * c | -(20 * (uint64_t)c >> 32 != 0))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
//if 20c is greater than 2^32-1, then result is true,
//use -1 to generate a mask of 0xffffffff,
//bitwise operator | then masks 20c to 0xffffffff
//(2^32-1, the maximum of size_t, input type to malloc)
//regardless what 20c actually is
//if 20c is smaller than 2^32-1, then result is false,
//the mask is 0, bitwise operator | keeps the final
//input to malloc as 20c untouched
What are 20 and 6?
Those probably come from the common usage of
malloc(sizeof(Something)*count). Those two calls to malloc are probably made with sizeof(Something) and sizeof(SomethingElse) evaluated to 20 and 6 at compile time.
So what this code actually does:
My guess, it's trying to prevent sizeof(Something)*count from overflowing and cause the malloc to succeed and cause buffer overflow when the memory is used.
By evaluating the product as a 64-bit unsigned int and testing against 2^32-1: when the size is greater than 2^32-1, the input to malloc is set to a very large value that makes the call guaranteed to fail (no 32-bit system can allocate 2^32-1 bytes of memory).
Can someone explain the purpose of these calculations?
It is important to understand that compiling changes the semantic meaning of code. Much unspecified behavior of the original code becomes specified by the compilation process.
IDA has no idea whether things the generated assembly code just happens to do are important or not. To be safe, it tries to perfectly replicate the behavior of the assembly code, even in cases that cannot possibly happen given the way the code is used.
Here, IDA is probably replicating the overflow characteristics that the conversion of types just happens to have on this platform. It can't just replicate the original C code because the original C code likely had unspecified behavior for some values of c or n, likely negative ones.
For example, say I write this C code: int f(unsigned j) { return j; }. My compiler will likely turn that into very simple assembly code giving whatever behavior for negative values of j that my platform just happens to give.
But if you decompile the generated assembly, you cannot decompile it to int f(unsigned j) { return j; } because that will not behave the same as my assembly code did on platforms with different overflow behavior. That could compile to code (on other platforms) that returns different values than my assembly code does for negative values of j.
So it is often literally impossible (in fact, incorrect) to decompile C code into the original code, it will often have these kinds of "portably replicate this platform's behavior" oddities.
My first guess, that it's rounding up to the nearest block size, was wrong; forgive me. What it's actually doing is calculating a multiple of c while simultaneously checking for a negative value (overflow):
#include <iostream>
#include <cstdint>
size_t foo(char c)
{
return 20 * c | -(20 * (std::uint64_t)(unsigned int)c >> 32 != 0);
}
int main()
{
using namespace std;
for (char i = -4 ; i < 4 ; ++i)
{
cout << "input is: " << int(i) << ", result is " << foo(i) << endl;
}
return 0;
}
results:
input is: -4, result is 18446744073709551615
input is: -3, result is 18446744073709551615
input is: -2, result is 18446744073709551615
input is: -1, result is 18446744073709551615
input is: 0, result is 0
input is: 1, result is 20
input is: 2, result is 40
input is: 3, result is 60
To me the number 18446744073709551615 doesn't mean much at a glance. Only after seeing it expressed in hex did I go "ah". – Jongware
adding << hex:
input is: -1, result is ffffffffffffffff

How do instruction sets differentiate value from reference

Let's look at this code:
int main ()
{
int a = 5;
int& b = a;
cout << a << endl; // 5 is displayed
cout << b << endl; // 5 is also displayed
return 0;
}
This is the behavior I saw in my debugger.
int a = 5 will store the value 5 at memory address -0x14(%rbp)
int& b = a will store the address -0x14(%rbp) at memory address -0x8(%rbp)
When I do cout << a << endl, the value at the address of a (i.e. -0x14(%rbp)) will be displayed.
But somehow when I do cout << b << endl, the value at the address of b (i.e. -0x8(%rbp)) is determined to be an address, and then the value at that address (-0x14(%rbp)) is displayed.
This is the assembly for the std::cout calls:
20 cout << a << endl;
0000000000401506: mov -0xc(%rbp),%eax
0000000000401509: mov %eax,%edx
000000000040150b: lea 0x6f8c9c6e(%rip),%rcx # 0x6fccb180 <libstdc++-6!_ZSt4cout>
0000000000401512: callq 0x4015f8 <_ZNSolsEi>
0000000000401517: lea 0xe2(%rip),%rdx # 0x401600 <_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_>
000000000040151e: mov %rax,%rcx
0000000000401521: callq 0x401608 <_ZNSolsEPFRSoS_E>
21 cout << b << endl;
0000000000401526: mov -0x8(%rbp),%rax
000000000040152a: mov (%rax),%eax
000000000040152c: mov %eax,%edx
000000000040152e: lea 0x6f8c9c4b(%rip),%rcx # 0x6fccb180 <libstdc++-6!_ZSt4cout>
0000000000401535: callq 0x4015f8 <_ZNSolsEi>
000000000040153a: lea 0xbf(%rip),%rdx # 0x401600 <_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_>
0000000000401541: mov %rax,%rcx
0000000000401544: callq 0x401608 <_ZNSolsEPFRSoS_E>
24 return 0;
Question:
Both std::cout sequences are very similar, so how is a treated differently from b?
In short: they don't.
The CPU itself doesn't care which type is stored where; it just executes the instructions generated by the compiler.
The compiler knows that b is a reference, not an int, so it instructs the CPU to treat b as a pointer.
If you look at the assembly code for your program, you'll see that the instructions for accessing a and b are different: the part for b contains an extra instruction
mov (%rax),%eax
which is the dereferencing step. (In this assembly notation, parentheses mean dereferencing, so this instruction means something like eax = *rax).
I presume you've requested absolutely no optimization. Although even
then, I would have expected accessing a and accessing b to generate
exactly the same code (in this case, at least).
With regards to how the compiler knows: a and b have different
types, so the compiler knows to do different things with them. The
standard has been designed so that replacing int& with int* const,
and then automatically dereferencing on each access (except the
initialization) will result in a conforming implementation; it looks
like this is what your compiler is doing.

Compile error with embedded assembler

I don't understand why this code
#include <iostream>
using namespace std;
int main(){
int result=0;
_asm{
mov eax,3;
MUL eax,3;
mov result,eax;
}
cout<<result<<endl;
return 0;
}
shows the following error.
1>c:\users\david\documents\visual studio 2010\projects\assembler_instructions\assembler_instructions.cpp(11): error C2414: illegal number of operands
Everything seems fine, and yet why do I get this compiler error?
According to this page, the mul instruction only takes a single argument:
mul arg
This multiplies arg by the value of the corresponding-width A register; see the table below:
operand size                      1 byte   2 bytes   4 bytes
other operand                     AL       AX        EAX
higher part of result stored in:  AH       DX        EDX
lower part of result stored in:   AL       AX        EAX
Thus, following the notes from Justin's link:
#include <iostream>
int main()
{
int result=0;
_asm{
mov eax, 3;
mov ebx, 4;
mul ebx;
mov result,eax;
}
std::cout << result << std::endl;
return 0;
}
Use:
imul eax, 3;
or:
imul eax, eax, 3;
That way you don't need to worry about the edx register being clobbered. imul is "signed integer multiply". You seem to have an int result, so it shouldn't matter whether you use mul or imul.
Sometimes I've gotten errors from not having the edx register zeroed when dividing or multiplying. The CPU was an Intel Core 2 Quad Q9550.
There are numbingly overengineered but correct Intel instruction reference manuals you can read, though Intel broke its websites a while ago. You could try to find the same reference manuals on AMD's site instead.
Update: I found the manual: http://www.intel.com/design/pentiumii/manuals/243191.htm
I don't know when they are going to break their sites again, so you always need to search for it.
Update 2: ARGHL! Those are from 1999... well, most of the details are unfortunately the same.
You should download the Intel architecture manuals.
http://www.intel.com/products/processor/manuals/
For your purpose, volume 2 is going to help you the most.
As of access in July 2010, they are current.