Access violation on 'ret' instruction - c++

I've got this function, which consists mostly of inline asm.
long *toarrayl(int members, ...){
__asm{
push esp
mov eax, members
imul eax, 4
push eax
call malloc
mov edx, eax
mov edi, eax
xor ecx, ecx
xor esi, esi
loopx:
cmp ecx, members
je done
mov esi, 4
imul esi, ecx
add esi, ebp
mov eax, [esi+0xC]
mov [edi], eax
inc ecx
add edi, 4
jmp loopx
done:
mov eax, edx
pop esp
ret
}
}
And upon running, I get an access violation on the return instruction.
I'm using VC++ 6, and it can sometimes mean to point at the line above, so possible on 'pop esp'.
If you could help me out, it'd be great.
Thanks, iDomo.

You are failing to manage the stack pointer correctly. In particular, your call to malloc unbalances the stack, and your pop esp ends up popping the wrong value into esp. The access violation therefore occurs when you try to ret from an invalid stack (the CPU cannot read the return address). It's unclear why you are pushing and popping esp; that accomplishes nothing.

As you spotted, you should never use the instruction POP ESP - when you see that in the code, you know something extremely wrong has happened. Of course, calling malloc inside asseembler code is also rather a bad thing to do - you have for example forgotten to check if it returned NULL, so you may well crash. Stick that outside your assembler - and check for NULL, it's much easier to debug "Couldn't allocate memory at line 54 in file mycode.c" than "Somewhere in the assembler, we got a
Here's some suggestions for improvement, which should speed up your loop a bit:
long *toarrayl(int members, ...){
__asm{
mov eax, members
imul eax, 4
push eax
call malloc
add esp, 4
mov edx, eax
mov edi, eax
mov ecx, members
lea esi, [ebp+0xc]
loopx:
mov eax, [esi]
mov [edi], eax
add edi, 4
add esi, 4
dec ecx
jnz loopx
mov lret, eax
ret
}
}
Improvements: Remove multiply by four in every loop. Just increment esi instead. Use decrement on ecx, instead of increament, and load it up with members before the loop. This allows usage of just one jump in the loop, rather than two. Remove reduntant move from edx, to eax. Use eax directly.

I've figured out the answer on my own.
For those who have had this same, or alike problem:
The actual exception was occuring after the user code, when vc++ automatically pops/restores the registers to their states before the function was called. Since I miss-aligned the stack pointer when calling malloc, there was an access violation when poping from the stack. I wasn't able to see this in the editor because it wasn't my code, so it was just shown as the last of my code in the function.
To correct this, just add an add esp, (size of parameters for previous call) after the calls you make.
Fixed code:
long *toarrayl(int members, ...){
__asm{
mov eax, members
imul eax, 4
push eax
call malloc
add esp, 4
mov edx, eax
mov edi, eax
xor ecx, ecx
xor esi, esi
loopx:
cmp ecx, members
je done
mov esi, 4
imul esi, ecx
add esi, ebp
mov eax, [esi+0xC]
mov [edi], eax
inc ecx
add edi, 4
jmp loopx
done:
mov eax, edx
ret
}
//return (long*)0;
}
Optimized code:
long *toarrayl(int members, ...){
__asm{
mov eax, members
shl eax, 2
push eax
call malloc
add esp, 4
;cmp eax, 0
;je _error
mov edi, eax
mov ecx, members
lea esi, [ebp+0xC]
loopx:
mov edx, [esi]
mov [edi], edx
add edi, 4
add esi, 4
dec ecx
jnz loopx
}
}

Related

Why there is no `leave` instruction at function epilog on x64? [duplicate]

This question already has answers here:
Why does the x86-64 GCC function prologue allocate less stack than the local variables?
(1 answer)
Why is there no "sub rsp" instruction in this function prologue and why are function parameters stored at negative rbp offsets?
(2 answers)
Closed 4 years ago.
I'm on the way to get idea how the stack works on x86 and x64 machines. What I observed however is that when I manually write a code and disassembly it, it differs from what I see in the code people provide (eg. in their questions and tutorials). Here is little example:
Source
int add(int a, int b) {
int c = 16;
return a + b + c;
}
int main () {
add(3,4);
return 0;
}
x86
add(int, int):
push ebp
mov ebp, esp
sub esp, 16
mov DWORD PTR [ebp-4], 16
mov edx, DWORD PTR [ebp+8]
mov eax, DWORD PTR [ebp+12]
add edx, eax
mov eax, DWORD PTR [ebp-4]
add eax, edx
leave (!)
ret
main:
push ebp
mov ebp, esp
push 4
push 3
call add(int, int)
add esp, 8
mov eax, 0
leave (!)
ret
Now goes x64
add(int, int):
push rbp
mov rbp, rsp
(?) where is `sub rsp, X`?
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-24], esi
mov DWORD PTR [rbp-4], 16
mov edx, DWORD PTR [rbp-20]
mov eax, DWORD PTR [rbp-24]
add edx, eax
mov eax, DWORD PTR [rbp-4]
add eax, edx
(?) where is `mov rsp, rbp` before popping rbp?
pop rbp
ret
main:
push rbp
mov rbp, rsp
mov esi, 4
mov edi, 3
call add(int, int)
mov eax, 0
(?) where is `mov rsp, rbp` before popping rbp?
pop rbp
ret
As you can see, my main confusion is that when I compile against x86 - I see what I expect. When it's x64 - I miss leave instruction or exact following sequence: mov rsp, rbp then pop rbp. What's worng?
UPDATE
It seems like leave is missing, just because it wasn't altered previously. But then, goes another question - why there is no allocation for local vars in the frame?
To this question #melpomene gives pretty straightforward answer - because of "red zone". Which basically means the function that calls no further functions (leaf) can use the first 128 bytes below the stack without allocating space. So if I insert a call inside an add() to any other dumb function - sub rsp, X and add rsp, X will be added to prologue and epilogue respectively.

complier generating a mov back and forth on eax

int test1(int a, int b) {
if (__builtin_expect(a < b, 0))
return a / b;
return b;
}
was compiled by clang with -O3 -march=native to
test1(int, int): # #test1(int, int)
cmp edi, esi
jl .LBB0_1
mov eax, esi
ret
.LBB0_1:
mov eax, edi
cdq
idiv esi
mov esi, eax
mov eax, esi # moving eax back and forth
ret
why eax is being moved back and forth after the idiv ?
gcc has a similar behavior so this seem to be intended.
gcc with -O3 -march=native complied the code to
test1(int, int):
mov r8d, esi
cmp edi, esi
jl .L4
mov eax, r8d
ret
.L4:
mov eax, edi
cdq
idiv esi
mov r8d, eax
mov eax, r8d #back and forth mov
ret
godbolt
This is not a complete solution to the puzzle but should give some clues.
Without the __builtin_expect, clang generates:
test2(int, int): # #test2(int, int)
mov ecx, esi
cmp edi, esi
jge .LBB1_2
mov eax, edi
cdq
idiv ecx
mov ecx, eax
.LBB1_2:
mov eax, ecx
ret
While the register allocation is still weird here, it at least makes sense: if the branch is taken, the value of b in ecx is transfered to eax as the return value. If it is not taken, the result of the division (in eax) has to be transferred to ecx to be in the same register as in the other case.
It could be that a __builtin_expect convinces the compiler to special case the case where the branch is taken late in the compilatin process, orphaning the .LBB1_2 label and causing it to be ultimately absent from the assembly.
idiv esi is 32-bit operand-size, so EAX is already zero-extended to fill RAX. Therefore copying to ESI or R8D and back has no effect on the value in EAX. (And the calling convention doesn't require zero-extension or sign-extension to 64-bit anyway; 32-bit types are returned in 32-bit registers with possible garbage in the upper 32.)
This looks like purely a missed optimization. (There's no microarchitectural performance reason that this would be a good thing either.)

Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call

#include<stdio.h>
int a[100];
int main(){
char UserName[100];
char *n=UserName;
char *q=NULL;
char Serial[200];
q=Serial;
scanf("%s",UserName);
//this is about
__asm{
pushad
mov eax,q
push eax
mov eax,n
push eax
mov EAX,EAX
mov EAX,EAX
CALL G1
LEA EDX,DWORD PTR SS:[ESP+10H]
jmp End
G1:
SUB ESP,400H
XOR ECX,ECX
PUSH EBX
PUSH EBP
MOV EBP,DWORD PTR SS:[ESP+40CH]
PUSH ESI
PUSH EDI
MOV DL,BYTE PTR SS:[EBP]
TEST DL,DL
JE L048
LEA EDI,DWORD PTR SS:[ESP+10H]
MOV AL,DL
MOV ESI,EBP
SUB EDI,EBP
L014:
MOV BL,AL
ADD BL,CL
XOR BL,AL
SHL AL,1
OR BL,AL
MOV AL,BYTE PTR DS:[ESI+1]
MOV BYTE PTR DS:[EDI+ESI],BL
INC ECX
INC ESI
TEST AL,AL
JNZ L014
TEST DL,DL
JE L048
MOV EDI,DWORD PTR SS:[ESP+418H]
LEA EBX,DWORD PTR SS:[ESP+10H]
MOV ESI,EBP
SUB EBX,EBP
L031:
MOV AL,BYTE PTR DS:[ESI+EBX]
PUSH EDI
PUSH EAX
CALL G2
MOV AL,BYTE PTR DS:[ESI+1]
ADD ESP,8
ADD EDI,2
INC ESI
TEST AL,AL
JNZ L031
MOV BYTE PTR DS:[EDI],0
POP EDI
POP ESI
POP EBP
POP EBX
ADD ESP,400H
RETN
L048:
MOV ECX,DWORD PTR SS:[ESP+418H]
POP EDI
POP ESI
POP EBP
MOV BYTE PTR DS:[ECX],0
POP EBX
ADD ESP,400H
RETN
G2:
MOVSX ECX,BYTE PTR SS:[ESP+4]
MOV EAX,ECX
AND ECX,0FH
SAR EAX,4
AND EAX,0FH
CMP EAX,0AH
JGE L009
ADD AL,30H
JMP L010
L009:
ADD AL,42H
L010:
MOV EDX,DWORD PTR SS:[ESP+8]
CMP ECX,0AH
MOV BYTE PTR DS:[EDX],AL
JGE L017
ADD CL,61H
MOV BYTE PTR DS:[EDX+1],CL
RETN
L017:
ADD CL,45H
MOV BYTE PTR DS:[EDX+1],CL
RETN
End:
mov eax,eax
popad
}
printf("%s\n",Serial);
return 0;
}
Can you help me?
this problem about Asm,I don't know why cause this result.
this program is very easy,and it about a program of internal code.
Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
It seems the two parameters which are pushed onto the stack before the call to G1 are never popped from the stack.
Possibly it happens because at the beginning of the function G1 you SUB ESP,400H, after L031 you make ADD ESP,8 and at the end you ADD ESP,400H. It seems like ESP before the G1 call is by 8 less then after call.
EDIT: Regarding to the coding style of assembly function please see this. Here briefly described what are the caller's responsibilities and what are callee's responsibilities, that are regarded to ESP.

Call not returning properly [X86_ASM]

This is C++ using x86 inline assembly [Intel syntax]
Function:
DWORD *Call ( size_t lArgs, ... ){
DWORD *_ret = new DWORD[lArgs];
__asm {
xor edx, edx
xor esi, esi
xor edi, edi
inc edx
start:
cmp edx, lArgs
je end
push eax
push edx
push esi
mov esi, 0x04
imul esi, edx
mov ecx, esi
add ecx, _ret
push ecx
call dword ptr[ebp+esi] //Doesn't return to the next instruction, returns to the caller of the parent function.
pop ecx
mov [ecx], eax
pop eax
pop edx
pop esi
inc edx
jmp start
end:
mov eax, _ret
ret
}
}
The purpose of this function is to call multiple functions/addresses without calling them individually.
Why I'm having you debug it?
I have to start school for the day, and I need to have it done by evening.
Thanks alot, iDomo
Thank you for a complete compile-able example, it makes solving problems much easier.
According to your Call function signature, when the stack frame is set up, the lArgs is at ebp+8 , and the pointers start at ebp+C. And you have a few other issues. Here's a corrected version with some push/pop optimizations and cleanup, tested on MSVC 2010 (16.00.40219.01) :
DWORD *Call ( size_t lArgs, ... ) {
DWORD *_ret = new DWORD[lArgs];
__asm {
xor edx, edx
xor esi, esi
xor edi, edi
inc edx
push esi
start:
cmp edx, lArgs
; since you started counting at 1 instead of 0
; you need to stop *after* reaching lArgs
ja end
push edx
; you're trying to call [ebp+0xC+edx*4-4]
; a simpler way of expressing that - 4*edx + 8
; (4*edx is the same as edx << 2)
mov esi, edx
shl esi, 2
add esi, 0x8
call dword ptr[ebp+esi]
; and here you want to write the return value
; (which, btw, your printfs don't produce, so you'll get garbage)
; into _ret[edx*4-4] , which equals ret[esi - 0xC]
add esi, _ret
sub esi, 0xC
mov [esi], eax
pop edx
inc edx
jmp start
end:
pop esi
mov eax, _ret
; ret ; let the compiler clean up, because it created a stack frame and allocated space for the _ret pointer
}
}
And don't forget to delete[] the memory returned from this function after you're done.
I notice that, before calling, you push EAX, EDX, ESI, ECX (in order), but don't pop in the reverse order after returning. If the first CALL returns properly, but subsequent ones don't, that could be the issue.

How's __RTC_CheckEsp implemented?

__RTC_CheckEsp is a call that verifies the correctness of the esp, stack, register. It is called to ensure that the value of the esp was saved across a function call.
Anyone knows how it's implemented?
Well a little bit of inspection of the assembler gives it away
0044EE35 mov esi,esp
0044EE37 push 3039h
0044EE3C mov ecx,dword ptr [ebp-18h]
0044EE3F add ecx,70h
0044EE42 mov eax,dword ptr [ebp-18h]
0044EE45 mov edx,dword ptr [eax+70h]
0044EE48 mov eax,dword ptr [edx+0Ch]
0044EE4B call eax
0044EE4D cmp esi,esp
0044EE4F call #ILT+6745(__RTC_CheckEsp) (42BA5Eh)
There are 2 lines to note in this. First note at 0x44ee35 it stores the current value of esp to esi.
Then after the function call is completed it does a cmp between esp and esi. They should both be the same now. If they aren't then someone has either unwound the stack twice or not unwound it.
The _RTC_CheckEsp function looks like this:
_RTC_CheckEsp:
00475A60 jne esperror (475A63h)
00475A62 ret
esperror:
00475A63 push ebp
00475A64 mov ebp,esp
00475A66 sub esp,0
00475A69 push eax
00475A6A push edx
00475A6B push ebx
00475A6C push esi
00475A6D push edi
00475A6E mov eax,dword ptr [ebp+4]
00475A71 push 0
00475A73 push eax
00475A74 call _RTC_Failure (42C34Bh)
00475A79 add esp,8
00475A7C pop edi
00475A7D pop esi
00475A7E pop ebx
00475A7F pop edx
00475A80 pop eax
00475A81 mov esp,ebp
00475A83 pop ebp
00475A84 ret
As you can see the first thing it check is whether the result of the earlier comparison were "not equal" ie esi != esp. If thats the case then it jumps to the failure code. If they ARE the same then the function simply returns.
If you're any good at asm, maybe this helps:
jne (Jump if Not Equal) - jumps if the ZERO flag is NZ (NotZero)
_RTC_CheckEsp:
004C8690 jne esperror (4C8693h)
004C8692 ret
esperror:
004C8693 push ebp
004C8694 mov ebp,esp
004C8696 sub esp,0
004C8699 push eax
004C869A push edx
004C869B push ebx
004C869C push esi
004C869D push edi
004C869E mov eax,dword ptr [ebp+4]
004C86A1 push 0
004C86A3 push eax
004C86A4 call _RTC_Failure (4550F8h)
004C86A9 add esp,8
004C86AC pop edi
004C86AD pop esi
004C86AE pop ebx
004C86AF pop edx
004C86B0 pop eax
004C86B1 mov esp,ebp
004C86B3 pop ebp
004C86B4 ret
004C86B5 int 3
004C86B6 int 3
004C86B7 int 3
004C86B8 int 3
004C86B9 int 3
004C86BA int 3
004C86BB int 3
004C86BC int 3
004C86BD int 3
004C86BE int 3
004C86BF int 3