Understand stack layout where local variables are located - gdb

I have the following code:
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <stdlib.h>
char *MASTER_PASSWORD = "password";
bool login(char * password){
bool is_logged_in=false;
char buf[8];
strcpy(buf,password);
if(strcmp(buf, MASTER_PASSWORD)==0){
is_logged_in=true;
}
return is_logged_in;
}
int main(int argc, char *argv[]){
if(argc <2)
{
printf("Syntax: %s <input string>\n", argv[0]);
exit (0);
}
if(login(argv[1]))
printf("you are authorized");
return 0;
}
I'm using gdb to debug it, and I need to know where the value of is_logged_in is saved on the stack. How can I do that?

Unless you take the address of a local variable (&is_logged_in), optimizing compilers mostly won't store them on the stack. You can see this by using info scope in gdb:
$ gcc -Os -g3 stack-layout.c -o stack-layout
$ gdb -q stack-layout
(gdb) info scope login
would show:
Scope for login:
<...>
Symbol is_logged_in is multi-location:
Range 0x40064c-0x40066e: the constant 0
Range 0x40066e-0x400673: a complex DWARF expression:
0: DW_OP_breg0 0 [$rax]
2: DW_OP_const1u 32
4: DW_OP_shl
5: DW_OP_lit0
6: DW_OP_eq
7: DW_OP_stack_value
, length 1.
<...>
Bear with me here even if you're not familiar with x86-64 assembly. Disassembling login() gives:
8 bool login(char * password){
0x000000000040064c <+0>: sub $0x18,%rsp
0x0000000000400650 <+4>: mov %rdi,%rsi
9 bool is_logged_in=false;
10 char buf[8];
11 strcpy(buf,password);
0x0000000000400653 <+7>: lea 0x8(%rsp),%rdi
0x0000000000400658 <+12>: callq 0x4004c0 <strcpy@plt>
12 if(strcmp(buf, MASTER_PASSWORD)==0){
0x000000000040065d <+17>: mov 0x2009ec(%rip),%rsi # 0x601050 <MASTER_PASSWORD>
0x0000000000400664 <+24>: lea 0x8(%rsp),%rdi
0x0000000000400669 <+29>: callq 0x4004f0 <strcmp@plt>
0x000000000040066e <+34>: test %eax,%eax
0x0000000000400670 <+36>: sete %al
13 is_logged_in=true;
14 }
15
16 return is_logged_in;
17 }
0x0000000000400673 <+39>: add $0x18,%rsp
0x0000000000400677 <+43>: retq
What gdb info scope is saying about is_logged_in:
Between 0x40064c and 0x40066e, i.e. between the start of the function and the call to strcmp(), is_logged_in has the constant value 0.
Between 0x40066e and 0x400673, i.e. after the call to strcmp() till the end of the function, the value of is_logged_in can be calculated by:
Reading the 64 bit register that stores the return value from strcmp() (RAX)
Left shift 32 bits
Compare the result to 0. The value of is_logged_in is 1 if the equality comparison is true and 0 otherwise.
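The three steps above can be mirrored in a few lines of C++. This is only a sketch of what the DWARF stack machine computes for this one expression; the function name is made up for illustration. It assumes the shift is applied to the full 64-bit register, so only the low 32 bits (EAX, the int returned by strcmp()) influence the result.

```cpp
#include <cassert>   // only needed for the checks in the note below
#include <cstdint>

// Hypothetical helper: evaluates
//   DW_OP_breg0 0; DW_OP_const1u 32; DW_OP_shl; DW_OP_lit0; DW_OP_eq
// for a given value of RAX.
bool is_logged_in_from_rax(std::uint64_t rax) {
    std::uint64_t shifted = rax << 32;  // throws away the upper 32 bits of RAX
    return shifted == 0;                // true only if EAX (strcmp's result) was 0
}
```

For example, assert(is_logged_in_from_rax(0)) holds, and so does assert(is_logged_in_from_rax(0xFFFFFFFF00000000ULL)), because garbage in the upper half of RAX is shifted out before the comparison.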
At this point some may argue that is_logged_in would be allocated differently if we compile with a lower optimization level, but my point is that local variables are only guaranteed to be on the stack if you take their address and do something with that address that the compiler would not optimize away. In this case, if you want to change the value of is_logged_in you're better off changing the value returned by strcmp() i.e. change RAX right after strcmp() returns.
If is_logged_in is allocated on the stack, p &is_logged_in would print its address in GDB. If it's not on the stack you'd get an error like
(gdb) p &is_logged_in
Can't take address of "is_logged_in" which isn't an lvalue.
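If the goal is just to have a flag that does live in memory so that p &is_logged_in works, one option is to declare it volatile. The following is a minimal sketch, not the code from the question: login_volatile is a hypothetical name, the buffer is enlarged to 16 bytes, and the copy is bounded so the example itself has no overflow. As far as I know GCC and Clang keep a volatile local in a stack slot even at -Os, but the exact offset still depends on the compiler and flags.

```cpp
#include <cassert>   // only needed for the checks in the note below
#include <cstring>

// Hypothetical variant of login(): every access to a volatile object must go
// through memory, so the compiler cannot reduce the flag to a register or a
// constant.
bool login_volatile(const char *password) {
    volatile bool is_logged_in = false;
    char buf[16] = {0};
    std::strncpy(buf, password, sizeof buf - 1);  // bounded copy, buf stays NUL-terminated
    if (std::strcmp(buf, "password") == 0) {
        is_logged_in = true;
    }
    return is_logged_in;
}
```

With a build of this function, breaking inside it and running p &is_logged_in should print an address near $rsp instead of the "isn't an lvalue" error.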
The DWARF debug info format including its stack machine operations are documented at dwarfstd.org.

Related

C++ HOW can this out-of-range access inside struct go wrong?

#include <iostream>
#include <random>
using namespace std;
struct TradeMsg {
int64_t timestamp; // 0->7
char exchange; // 8
char symbol[17]; // 9->25
char sale_condition[4]; // 26 -> 29
char source_of_trade; // 30
uint8_t trade_correction; // 31
int64_t trade_volume; // 32->39
int64_t trade_price; // 40->47
};
static_assert(sizeof(TradeMsg) == 48);
char buffer[1000000];
template<class T, size_t N=1>
int someFunc(char* buffer, T* output, int& cursor) {
// read + process data from buffer. Return data in output. Set cursor to the last byte read + 1.
return cursor + (rand() % 20) + 1; // dummy code
}
void parseData(TradeMsg* msg) {
int cursor = 0;
cursor = someFunc<int64_t>(buffer, &msg->timestamp, cursor);
cursor = someFunc<char>(buffer, &msg->exchange, cursor);
cursor++;
int i = 0;
// i is GUARANTEED to be <= 17 after this loop,
// edit: the input data in buffer[] guarantee that fact.
while (buffer[cursor + i] != ',') {
msg->symbol[i] = buffer[cursor + i];
i++;
}
msg->symbol[i] = '\n'; // might access symbol[17].
cursor = cursor + i + 1;
for (i=0; i<4; i++) msg->sale_condition[i] = buffer[cursor + i];
cursor += 5;
//cursor = someFunc...
}
int main()
{
TradeMsg a;
a.symbol[17] = '\0';
return 0;
}
I have this struct that is guaranteed to have a predictable size. In the code, there is a case where the program tries to assign a value to an array element past the end of the array: msg->symbol[17] = ... .
However, in that case, the assignment does not cause any harm as long as:
It is done before the next struct members (sale_condition) are assigned (no unexpected code reordering).
It does not modify any previous members (timestamp, exchange).
It does not access any memory outside the struct.
I read that this is undefined behavior. But what kind of compiler optimization or code generation can make this go wrong? symbol[17] is well inside the struct, so I don't see how the compiler could generate an access outside it. Assume the platform is x86-64 only.
Various folks have pointed out debug-mode checks that will fire on access outside the bounds of an array member of a struct, with options like gcc -fsanitize=undefined. Separate from that, it's also legal for a compiler to use the assumption of non-overlap between member accesses to reorder two assignments which actually do alias:
@Peter in comments points out that the compiler is allowed to assume that accesses to msg->symbol[i] don't affect other struct members, and potentially delay msg->symbol[i] = '\n'; until after the loop that writes msg->sale_condition[i]. (i.e. sink that store to the bottom of the function).
There isn't a good reason you'd expect a compiler to want to do that in this function alone, but perhaps after inlining into some caller that also stored something there, it could be relevant. Or just because it's a DeathStation 9000 that exists in this thought experiment to break your code.
You could write this safely, although GCC compiles that worse
Since char* is allowed to alias any other object, you could offset a char* relative to the start of the whole struct, rather than to the start of the member array. Use offsetof to find the right start point like this:
#include <cstddef>
...
((char*)msg + offsetof(TradeMsg, symbol))[i] = '\n'; // might access symbol[17].
That's exactly equivalent to *((char*)msg + offsetof(...) + i) = '\n'; by definition of C++'s [] operator, even though it lets you use [i] to index relative to the same position.
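Here is a small self-contained sketch of the offsetof approach, using the struct from the question. The helper name store_newline is made up for illustration, and the static_assert encodes the same layout assumption the question makes (x86-64, 48-byte struct).

```cpp
#include <cassert>   // only needed for the checks in the note below
#include <cstddef>
#include <cstdint>

struct TradeMsg {
    std::int64_t timestamp;       // 0 -> 7
    char exchange;                // 8
    char symbol[17];              // 9 -> 25
    char sale_condition[4];       // 26 -> 29
    char source_of_trade;         // 30
    std::uint8_t trade_correction; // 31
    std::int64_t trade_volume;    // 32 -> 39
    std::int64_t trade_price;     // 40 -> 47
};
static_assert(sizeof(TradeMsg) == 48, "layout assumed by the byte offsets");

// Hypothetical helper: index relative to the start of the whole struct, so
// i == 17 stays inside the object the char* was derived from.
void store_newline(TradeMsg *msg, int i) {
    char *symbol_bytes = reinterpret_cast<char *>(msg) + offsetof(TradeMsg, symbol);
    symbol_bytes[i] = '\n';
}
```

With i == 17 the store lands on byte 26 of the struct, which is sale_condition[0], exactly the byte the original msg->symbol[17] was hoping to hit.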
However, that does compile to less efficient asm with GCC 11.2 -O2 (Godbolt), mostly because int i, cursor are narrower than pointer-width. The "safe" version that redoes indexing from the start of the struct does more indexing work in asm, not using the msg+offsetof(symbol) pointer that it was already using as the base register in the loop.
# original version, with UB if `i` goes past the buffer.
# gcc11.2 -O2 -march=haswell. -O3 fully unrolls into a chain of copy/branch
... partially peeled first iteration
.L3: # do{
mov BYTE PTR [rbx+8+rax], dl # store into msg->symbol[i]
movsx rdi, eax # not read inside the loop
lea ecx, [r8+rax]
inc rax
movzx edx, BYTE PTR buffer[rsi+1+rax] # load from buffer
cmp dl, 44
jne .L3 # }while(buffer[cursor+i] != ',')
## End of copy-and-search loop.
# Loops are identical up to this point except for MOVSX here vs. MOV in the no-UB version.
movsx rcx, ecx # just redo sign extension of this calculation that was done repeatedly inside the loop just for this, apparently.
.L2:
mov BYTE PTR [rbx+9+rdi], 10 # store a newline
mov eax, 1 # set up for next loop
# offsetof version, without UB
# same loop, but with RDI and RSI usage switched.
# And with mov esi, eax zero extension instead of movsx rdi, eax sign extension
cmp dl, 44
jne .L3 # }while(buffer[cursor+i] != ',')
add esi, 9 # offsetof(TradeMsg, symbol)
movsx rcx, ecx # more stuff getting sign extended.
movsx rsi, esi # including something used in the newline store
.L2:
mov BYTE PTR [rbx+rsi], 10
mov eax, 1 # set up for next loop
The RCX calculation seems to just be for use by the next loop, setting sale_condition.
BTW, the copy-and-search loop is like strcpy but with a ',' terminator. Unfortunately gcc/clang don't know how to optimize that; they compile to a slow byte-at-a-time loop, not e.g. an AVX512BW masked store using mask-1 from a vec == set1_epi8(',') compare, to get a mask selecting the bytes-before-',' instead of the comma element. (Probably needs a bithack to isolate that lowest-set-bit as the only set bit, though, unless it's safe to always copy 16 or 17 bytes separate from finding the ',' position, which could be done efficiently without masked stores or branching.)
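The "always copy 17 bytes, find the ',' separately" idea can be sketched in portable C++ with memcpy and memchr, leaving any vectorization to the library. This is only an illustration under stated assumptions: copy_symbol and kSymbolSize are hypothetical names, src must have at least 17 readable bytes, and the bytes copied after the comma are junk that the caller is expected to overwrite or ignore.

```cpp
#include <cassert>   // only needed for the checks in the note below
#include <cstddef>
#include <cstring>

constexpr std::size_t kSymbolSize = 17;

// Hypothetical helper: unconditionally copies 17 bytes, then reports how many
// of them precede the first ','. Returns 17 if no ',' is found in that window.
std::size_t copy_symbol(char (&dst)[kSymbolSize], const char *src) {
    std::memcpy(dst, src, kSymbolSize);  // fixed-size copy, no per-byte branch
    const void *comma = std::memchr(src, ',', kSymbolSize);
    if (comma == nullptr) {
        return kSymbolSize;
    }
    return static_cast<std::size_t>(static_cast<const char *>(comma) - src);
}
```

The returned length plays the role of i after the original while loop, so the terminator store and the cursor update can use it unchanged.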
Another option might be a union between a char[21] and struct{ char sym[17], sale[4];}, if you use a C++ implementation that allows C99-style union type-punning. (It's a GNU extension, and also supported by MSVC, but not necessarily literally every x86 compiler.)
Also, style-wise, shadowing int i = 0; with for( int i=0 ; i<4 ; i++ ) is poor style. Pick a different var name for that loop, like j. (Or if there is anything meaningful, a better name for i which has to survive across multiple loops.)
In a few cases:
When variable guard is set up: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
In a C++ interpreter (yes they exist): https://root.cern/cling/
Your symbol array has a size of 17, yet you are trying to assign a value at index 17, which would be its 18th element: a.symbol[17] = '\0';
Remember that index values start at 0, not 1.
So there are two places that can go wrong: i can equal 17 in parseData, which writes out of bounds, and the a.symbol[17] line shown above writes out of bounds as well.

shellcode calls different syscall while running alone as individual code and while running with C++ code

I have this code that runs a shell:
BITS 64
global _start
_start:
mov rax, 59
jmp short file
c1:
pop rdi
jmp short argv
c2:
pop rsi
mov rdx, 0
syscall
file:
call c1
db '/bin/sh',0
argv:
call c2
dq arg, 0
arg:
db 'sh',0
It works when it's built in this way:
nasm -f elf64 shcode.asm
ld shcode.o -o shcode
However, when I convert it to a flat binary with:
nasm -f bin shcode.asm
paste it into the following C++ code:
int main(void)
{
char kod[]="\xB8\x3B\x00\x00\x00\xEB\x0B\x5F\xEB\x15\x5E\xBA\x00\x00\x00\x00\x0F\x05\xE8\xF0\xFF\xFF\xFF\x2F\x62\x69\x6E\x2F\x73\x68\x00\xE8\xE6\xFF\xFF\xFF\x34\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x73\x68\x00";
reinterpret_cast<void(*)()>(kod)();
return 0;
}
build it with clang++ texp.cpp -o texp.e -Wl,-z,execstack and execute it, the shell doesn't start.
After running it with
strace ./texp.e
I see something like this (I stopped this process with ^C):
syscall_0xffffffffffffffda(0x7ffc23e0a297, 0x7ffc23e0a2a4, 0, 0x4a0, 0x7fe1ff3039b0, 0x7fe1ff69b960) = -1 ENOSYS (Nie zaimplementowana funkcja)
syscall_0xffffffffffffffda(0x7ffc23e0a297, 0x7ffc23e0a2a4, 0, 0x4a0, 0x7fe1ff3039b0, 0x7fe1ff69b960) = -1 ENOSYS (Nie zaimplementowana funkcja)
.
.
.
syscall_0xffffffffffffffda(0x7ffc23e0a297, 0x7ffc23e0a2a4, 0, 0x4a0, 0x7fe1ff3039b0, 0x7fe1ff69b960) = -1 ENOSYS (Nie zaimplementowana funkcja)
^Csyscall_0xffffffffffffffda(0x7ffc23e0a297, 0x7ffc23e0a2a4, 0, 0x4a0, 0x7fe1ff3039b0, 0x7fe1ff69b960strace: Process 2806 detached
<detached ...>
Nie zaimplementowana funkcja - Function not implemented
So the program (i.e. the shellcode) is probably making the wrong syscall.
In your C++ shellcode caller, strace shows your execve system call was
execve("/bin/sh", [0x34], NULL) = -1 EFAULT (Bad address)
The later syscall_0xffffffffffffffda(...) = -1 ENOSYS lines are from an infinite loop with RAX = -EFAULT instead of 59, and then with RAX = -ENOSYS (again not a valid call number). This loop is created by your call/pop.
That 0x34 is presumably the address of arg as it appeared in an unlinked .o, a PIE executable, or a flat binary assembled at origin 0: it is just the offset of arg from the start of the code, not a usable pointer.
Obviously the whole approach of embedding an absolute address in your shellcode doesn't work if it's going to run from a randomized stack address, with no relocation fixup. dq arg, 0 is not position-independent.
You need to construct at least the argv array yourself (usually with push) using pointers. You could also use a push imm32 to construct arg itself. e.g. push 'shsh' / lea rax, [rsp+2].
Or the most common trick is to take advantage of a Linux-specific "feature": you can pass argv=NULL (instead of a pointer to a NULL pointer) with xor esi,esi.
(Using mov reg,0 completely defeats the purpose of the jmp/call/pop trick for avoiding zero bytes. You might as well just use a normal RIP-relative LEA if zero bytes are allowed. But if not, you can jump forward over data then use RIP-relative LEA with a negative displacement.)
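The 0x34 that strace shows inside argv can be read straight out of the byte string from the question. The sketch below does only that; it never executes the bytes. The two offsets are my own count of the bytes in the listing (an assumption of this example, not something stated in the question), and the 8-byte read assumes a little-endian host such as x86-64.

```cpp
#include <cassert>   // only needed for the checks in the note below
#include <cstddef>
#include <cstdint>
#include <cstring>

// The bytes from the question, split per instruction / data item.
static const unsigned char kod[] =
    "\xB8\x3B\x00\x00\x00"              // mov eax, 59
    "\xEB\x0B"                          // jmp short file
    "\x5F"                              // c1: pop rdi
    "\xEB\x15"                          // jmp short argv
    "\x5E"                              // c2: pop rsi
    "\xBA\x00\x00\x00\x00"              // mov edx, 0
    "\x0F\x05"                          // syscall
    "\xE8\xF0\xFF\xFF\xFF"              // file: call c1
    "\x2F\x62\x69\x6E\x2F\x73\x68\x00"  // db '/bin/sh',0
    "\xE8\xE6\xFF\xFF\xFF"              // argv: call c2
    "\x34\x00\x00\x00\x00\x00\x00\x00"  // dq arg
    "\x00\x00\x00\x00\x00\x00\x00\x00"  // dq 0
    "\x73\x68\x00";                     // arg: db 'sh',0

constexpr std::size_t kArgvOffset = 36;  // where "dq arg, 0" starts (counted by hand)
constexpr std::size_t kArgOffset = 52;   // where "db 'sh',0" starts (counted by hand)

// Reads the 8-byte value the shellcode would pass as argv[0].
std::uint64_t embedded_argv0() {
    std::uint64_t value = 0;
    std::memcpy(&value, kod + kArgvOffset, sizeof value);
    return value;
}
```

embedded_argv0() returns 0x34, which is 52, the offset of 'sh' from the start of the blob. Nothing adds the runtime address of the buffer to it, so execve receives a pointer into unmapped low memory and fails with EFAULT.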

C++ inline assembly trying to copy a char from a std::string into a register

I have an assignment in C++ to read a file into a string variable which contains digits (no spaces), and using inline assembly, the program needs to sum up the digits of the string. For this I want to loop until end of string (NULL) and every iteration copy 1 char (which is 1 digit) into a register so I can use compare and subtract on it. The problem is that every time instead of copying the char to the register it copies some random value.
I'm using Visual Studio for debugging. Variable Y is the string and I'm trying to copy every iteration of the loop the current char into register AL.
// read from txt file
string y;
cout << "\n" << "the text is \n";
ifstream infile;
infile.open("1.txt");
getline(infile, y);
cout << y;
infile.close();
// inline assembly
_asm
{
mov edx, 0 // counter
mov ebx, 0
mov eax, 0
loop1:
movzx AL, y[ebx]
cmp AL, 0x00
jz finished
sub AL, 48 // convert ascii to number, assuming digit
add edx, eax // add digit to counter
add ebx, 1 // move pointer to the next byte
loop loop1
finished:
mov i, edx
}
For example assuming Y is "123" and it's the first iteration of the loop, EBX is 0. I expect y[ebx] to point to value 49 ('1') and indeed in debug I see y[ebx]'s value is 49. I want to copy said value into a register, so when I use instruction:
movzx AL, y[ebx]
I expect register AL to change to 49 ('1'), but the value changes to something random instead. For instance last debug session it changed to 192 ('À').
y is the std::string object's control block. You want to access its C string data.
MSVC inline asm syntax is pretty crap, so there's no way to just ask for a pointer to that in a register. I think you have to create a new C++ variable like const char *ystr = y.c_str(); (c_str() returns a pointer to const char).
That C variable is a pointer which you need to load into register with mov ecx, [ystr]. Accessing the bytes of ystr's object-representation directly would give you the bytes of the pointer.
Also, your current code is using the loop instruction, which is slow and equivalent to dec ecx/jnz. But you didn't initialize ECX, and your loop termination condition is based on the zero terminator, not a counter that you know ahead of the first iteration. (Unless you also ask the std::string for its length instead).
There is zero reason to use the loop instruction here. Put a test al,al / jnz loop1 at the bottom of your loop like a normal person.
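For reference, here is the same digit-summing loop in plain C++, as a rough model of what the fixed asm should do: load the data pointer once, then walk bytes until the zero terminator. The function name sum_digits is made up for this sketch, and like the original it assumes the string contains only ASCII digits.

```cpp
#include <cassert>   // only needed for the checks in the note below
#include <string>

// Hypothetical reference implementation of the asm loop.
int sum_digits(const std::string &y) {
    const char *ystr = y.c_str();  // pointer to the character data, not to the std::string object
    int sum = 0;                   // plays the role of EDX
    for (unsigned char c = static_cast<unsigned char>(*ystr); c != 0;
         c = static_cast<unsigned char>(*++ystr)) {
        sum += c - '0';            // same as "sub al, 48" followed by "add edx, eax"
    }
    return sum;
}
```

Comparing the result of the inline asm against sum_digits(y) for a few inputs such as "123" is a cheap way to check the asm version.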

Undefined behavior from pointer math on a C++ array

Why is the output of this program 4?
#include <iostream>
int main()
{
short A[] = {1, 2, 3, 4, 5, 6};
std::cout << *(short*)((char*)A + 7) << std::endl;
return 0;
}
From my understanding, on an x86 little-endian system, where char has 1 byte and short 2 bytes, the output should be 0x0500, because the data in array A is as follows in hex:
01 00 02 00 03 00 04 00 05 00 06 00
We move 7 bytes forward from the beginning, and then read 2 bytes. What am I missing?
You are violating strict aliasing rules here. You can't just read half-way into an object and pretend it's an object all on its own. You can't invent hypothetical objects using byte offsets like this. GCC is perfectly within its rights to do crazy sh!t like going back in time and murdering Elvis Presley, when you hand it your program.
What you are allowed to do is inspect and manipulate the bytes that make up an arbitrary object, using a char*. Using that privilege:
#include <iostream>
#include <algorithm>
int main()
{
short A[] = {1, 2, 3, 4, 5, 6};
short B;
std::copy(
(char*)A + 7,
(char*)A + 7 + sizeof(short),
(char*)&B
);
std::cout << std::showbase << std::hex << B << std::endl;
}
// Output: 0x500
But you can't just "make up" a non-existent object in the original collection.
Furthermore, even if you have a compiler that can be told to ignore this problem (e.g. with GCC's -fno-strict-aliasing switch), the made-up object is not correctly aligned for any current mainstream architecture. A short cannot legally live at that odd-numbered location in memory†, so you doubly can't pretend there is one there. There's just no way to get around how undefined the original code's behaviour is; in fact, if you pass GCC the -fsanitize=undefined switch it will tell you as much.
† I'm simplifying a little.
The program has undefined behaviour due to casting an incorrectly aligned pointer to (short*). This breaks the rules in 6.3.2.3 p7 in C11, and has nothing to do with strict aliasing as claimed in other answers:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
In [expr.static.cast] p13 C++ says that converting the unaligned char* to short* gives an unspecified pointer value, which might be an invalid pointer, which can't be dereferenced.
The correct way to inspect the bytes is through the char* not by casting back to short* and pretending there is a short at an address where a short cannot live.
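One way to do that without ever forming a misaligned short* is to copy the bytes into a real short with memcpy. This is a minimal sketch; read_short_at is a hypothetical helper name, and the value noted below assumes a little-endian machine with a 2-byte short, as in the question.

```cpp
#include <cassert>   // only needed for the checks in the note below
#include <cstddef>
#include <cstring>

// Hypothetical helper: reads sizeof(short) bytes starting at an arbitrary byte
// offset. Only char-typed access and memcpy are used, so alignment of the
// source address does not matter.
short read_short_at(const void *base, std::size_t byte_offset) {
    short value;
    std::memcpy(&value, static_cast<const char *>(base) + byte_offset, sizeof value);
    return value;
}
```

For short A[] = {1, 2, 3, 4, 5, 6}, read_short_at(A, 7) picks up the high byte of A[3] (0x00) and the low byte of A[4] (0x05), which on little-endian x86 reads back as 0x0500.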
This is arguably a bug in GCC.
First, it is to be noted that your code is invoking undefined behavior, due to violation of the rules of strict aliasing.
With that said, here's why I consider it a bug:
The same expression, when first assigned to an intermediate short or short *, causes the expected behavior. It's only when passing the expression directly as a function argument, does the unexpected behavior manifest.
It occurs even when compiled with -O0 -fno-strict-aliasing.
I rewrote your code in C to eliminate the possibility of any C++ craziness. Your question was tagged c after all! I added the pshort function to ensure that the variadic nature of printf wasn't involved.
#include <stdio.h>
static void pshort(short val)
{
printf("0x%hx ", val);
}
int main(void)
{
short A[] = {1, 2, 3, 4, 5, 6};
#define EXP ((short*)((char*)A + 7))
short *p = EXP;
short q = *EXP;
pshort(*p);
pshort(q);
pshort(*EXP);
printf("\n");
return 0;
}
After compiling with gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2):
gcc -O0 -fno-strict-aliasing -g -Wall -Werror endian.c
Output:
0x500 0x500 0x4
It appears that GCC is actually generating different code when the expression is used directly as an argument, even though I'm clearly using the same expression (EXP).
Dumping with objdump -Mintel -S --no-show-raw-insn endian:
int main(void)
{
40054d: push rbp
40054e: mov rbp,rsp
400551: sub rsp,0x20
short A[] = {1, 2, 3, 4, 5, 6};
400555: mov WORD PTR [rbp-0x16],0x1
40055b: mov WORD PTR [rbp-0x14],0x2
400561: mov WORD PTR [rbp-0x12],0x3
400567: mov WORD PTR [rbp-0x10],0x4
40056d: mov WORD PTR [rbp-0xe],0x5
400573: mov WORD PTR [rbp-0xc],0x6
#define EXP ((short*)((char*)A + 7))
short *p = EXP;
400579: lea rax,[rbp-0x16] ; [rbp-0x16] is A
40057d: add rax,0x7
400581: mov QWORD PTR [rbp-0x8],rax ; [rbp-0x08] is p
short q = *EXP;
400585: movzx eax,WORD PTR [rbp-0xf] ; [rbp-0xf] is A plus 7 bytes
400589: mov WORD PTR [rbp-0xa],ax ; [rbp-0xa] is q
pshort(*p);
40058d: mov rax,QWORD PTR [rbp-0x8] ; [rbp-0x08] is p
400591: movzx eax,WORD PTR [rax] ; *p
400594: cwde
400595: mov edi,eax
400597: call 400527 <pshort>
pshort(q);
40059c: movsx eax,WORD PTR [rbp-0xa] ; [rbp-0xa] is q
4005a0: mov edi,eax
4005a2: call 400527 <pshort>
pshort(*EXP);
4005a7: movzx eax,WORD PTR [rbp-0x10] ; [rbp-0x10] is A plus 6 bytes ********
4005ab: cwde
4005ac: mov edi,eax
4005ae: call 400527 <pshort>
printf("\n");
4005b3: mov edi,0xa
4005b8: call 400430 <putchar@plt>
return 0;
4005bd: mov eax,0x0
}
4005c2: leave
4005c3: ret
I get the same result with GCC 4.9.4 and GCC 5.5.0 from Docker hub

x86 Intel Assembly and C++ - Stack around array corrupted

Error:
Run-Time Check Failure #2 - Stack around the variable 'arr' was corrupted.
This seems to be a common error on this forum; however, I was unable to find one that had assembly code mixed into it. Basically, my program is to convert decimal to binary (16-bit representation). After completing the coding, everything seems to compute correctly and convert the decimal to binary without an issue; however, after the "Press any key to continue . . .", the error above pops up.
I do not believe the C++ code is causing the issue as it is very basic, and is there only to invoke the assembly function.
Again, the computation is correct as the program will produce the correct conversion (i.e: Decimal = 10, Binary Conversion: 0000000000001010), but just giving me the error at the end of the program.
C++ Code:
#include <iostream>
using namespace std;
extern"C" void decToBin(char[], int, int);
int main()
{
//Initialize array and variables
const int SIZE = 16;
char arr[SIZE] = { NULL };
int dec = 0;
//Ask user for integer that they want to convert
cout << "Please enter integer you want to convert to binary: ";
cin >> dec;
//Assembly function to convert integer
decToBin(arr, dec, SIZE);
cout << "The 16-bit binary representation of " << dec << " is: ";
//Display the 16-bit binary conversion
for (int i = 0; i < SIZE; i++)
cout << arr[i];
cout << endl;
system("PAUSE");
return 0;
}
Assembly Code:
.686
.model flat
.code
_decToBin PROC ;Start of project
start:
push ebp
mov ebp,esp ;Stack pointer to ebp
mov eax,[ebp+8] ;Address of first array element
mov cx,[ebp+12] ;Integer number being passed - Copying onto 16 bit register
mov edx,[ebp+16] ;Size of array
loopme: ;Loop to fill in array
mov ebx,0 ;Initializes ebx to store carry flag after shift
cmp edx,0 ;Compare edx with 0 to see if we should continue
je alldone
shl cx,1 ;Shift the value to the left
adc ebx,0 ;Check carry flag and add 1 if CF(CY) is set to 1 and stay at 0 if CF(CY) is 0
add ebx,48 ;Since array is CHAR, adding 48 will give correct 0 or 1 instead of null
mov [eax],ebx ;Copy the 0's or 1's into the array location
dec edx ;Decrement the counter
inc eax ;Move the array up an index
jmp loopme
alldone:
pop ebp
ret
_decToBin ENDP
END
I have no assembler to compile your code, but you write 32-bit values into a char[] at this line:
mov [eax],ebx ;Copy the 0's or 1's into the array location
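So the last write updates the memory locations arr[SIZE-1] to arr[SIZE+2], which is three bytes past the end of the array. Store only the low byte instead, for example with mov [eax],bl.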
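To make the intended behaviour concrete, here is a plain C++ model of the routine that writes exactly one byte per digit. It is a sketch for comparison, not a drop-in replacement for the MASM procedure: dec_to_bin and to_binary16 are names invented for this example, and the 16-bit truncation mirrors loading the argument into CX.

```cpp
#include <cassert>   // only needed for the checks in the note below
#include <cstdint>
#include <string>

// Hypothetical C++ model of decToBin: one char store per loop iteration.
void dec_to_bin(char arr[], int dec, int size) {
    std::uint16_t value = static_cast<std::uint16_t>(dec);  // like "mov cx,[ebp+12]"
    for (int i = 0; i < size; ++i) {
        unsigned carry = (value >> 15) & 1u;             // the bit "shl cx,1" moves into CF
        value = static_cast<std::uint16_t>(value << 1);
        arr[i] = static_cast<char>('0' + carry);         // single-byte store, never past arr[size-1]
    }
}

// Convenience wrapper for the 16-bit case used in the question.
std::string to_binary16(int dec) {
    char arr[16];
    dec_to_bin(arr, dec, 16);
    return std::string(arr, 16);
}
```

to_binary16(10) gives "0000000000001010", the same digits the question reports, but without touching the bytes after the array that trip the run-time stack check.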