I've been trying to convert this code to C++ without any inlining and I cannot figure it out.
Say you have this line:
sub edx, (offset loc_42C1F5+5)
My hex-rays gives me
edx -= (uint)((char*)loc_42C1F5 + 5);
But what would it really look like without the loc_42C1F5?
I would think it would be
edx -= 0x42C1FA;
But is that correct? (I can't really step through this code in any assembler-level debugger, as it's damaged and well protected.)
loc_42C1F5 is actually a label:
seg000:0042C1F5 loc_42C1F5: ; DATA XREF: sub_4464A0+2B5o
seg000:0042C1F5 mov edx, [esi+4D98h]
seg000:0042C1FB lea ebx, [esi+4D78h]
seg000:0042C201 xor eax, eax
seg000:0042C203 xor ecx, ecx
seg000:0042C205 mov [ebx], eax
loc_42C1F5 is a symbol. Given the information you've provided, I cannot say what its offset is. It may be 0x42C1F5 or it may be something else entirely.
If it is 0x42C1F5, then your translation should be correct.
IDA has incorrectly identified 0x42C1FA as an offset, and Hex-Rays used that interpretation. Just convert it to a plain number (press O) and all will be well. That's why it's called the Interactive Disassembler :)
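To double-check the arithmetic: assuming the label really sits at 0x42C1F5 (as the listing suggests), the operand is a plain constant and the decompiled line collapses to a subtraction of 0x42C1FA. A minimal C++ sketch follows; `kLoc42C1F5`, `kOperand` and `sub_offset` are illustrative names, not anything taken from the binary:

```cpp
#include <cstdint>

// Assumption: loc_42C1F5 is located at address 0x42C1F5, as in the listing.
constexpr std::uint32_t kLoc42C1F5 = 0x42C1F5;

// (offset loc_42C1F5+5) is resolved to a constant when the code is assembled.
constexpr std::uint32_t kOperand = kLoc42C1F5 + 5;
static_assert(kOperand == 0x42C1FA, "label address plus 5");

// Equivalent of: sub edx, (offset loc_42C1F5+5)
// Unsigned arithmetic wraps modulo 2^32, just like the 32-bit register.
constexpr std::uint32_t sub_offset(std::uint32_t edx) {
    return edx - kOperand;
}
```

Calling `sub_offset(edx)` then stands in for the single `sub` instruction; if the label turns out to live at a different address, only `kLoc42C1F5` needs to change.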
I am writing simple programs and then analyzing them.
Today I've written this:
#include <stdio.h>
int x;
int main(void){
printf("Enter X:\n");
scanf("%d",&x);
printf("You enter %d...\n",x);
return 0;
}
It's compiled into this:
push rbp
mov rbp, rsp
lea rdi, s ; "Enter X:"
call _puts
lea rsi, x
lea rdi, aD ; "%d"
mov eax, 0
call ___isoc99_scanf
mov eax, cs:x ; <- I don't understand this
mov esi, eax
lea rdi, format ; "You enter %d...\n"
mov eax, 0
call _printf
mov eax, 0
pop rbp
retn
I don't understand what cs:x means.
I use Ubuntu x64, GCC 10.3.0, and IDA Pro 7.6.
TL;DR: IDA confusingly uses cs: to indicate a RIP-relative addressing mode in 64-bit code.
In IDA mov eax, x means mov eax, DWORD [x] which in turn means reading a DWORD from the variable x.
For completeness, mov rax, OFFSET x means mov rax, x (i.e. putting the address of x in rax).
In 64-bit mode, displacements are still 32 bits wide, so for a position-independent executable it's not always possible to address a variable by encoding its address (the address is 64 bits and would not fit into a 32-bit field). And in position-independent code, encoding an absolute address is not desirable anyway.
Instead, RIP-relative addressing is used.
In NASM, RIP-relative addressing takes the form mov eax, [REL x]; in gas it is mov x(%rip), %eax.
Also, in NASM, if DEFAULT REL is active, the instruction can be shortened to mov eax, [x] which is identical to the 32-bit syntax.
Each disassembler will disassemble a RIP-relative operand differently. As you commented, Ghidra gives mov eax, DWORD PTR [x].
IDA uses mov eax, cs:x to mean mov eax, [REL x]/mov x(%rip), %eax.
;IDA listing, 64-bit code
mov eax, x ;This is mov eax, [x] in NASM and most likely wrong unless your exec is not PIE and always loaded <= 4GiB
mov eax, cs:x ;This is mov eax, [REL x] in NASM and idiomatic to 64-bit programs
In short, you can mostly ignore the cs: because that's just the way variables are addressed in 64-bit mode.
Of course, as the listing above shows, whether RIP-relative addressing is used tells you whether the program can be loaded anywhere or only below 4 GiB.
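To make the RIP-relative form concrete: the CPU forms the address by adding the signed 32-bit displacement to the address of the next instruction. A small C++ sketch of that computation is below; the function name and the addresses in it are made up for illustration and are not taken from the binary in the question:

```cpp
#include <cstdint>

// Effective address of a RIP-relative operand: the address of the *next*
// instruction (current address + instruction length) plus the sign-extended
// 32-bit displacement.
constexpr std::uint64_t rip_relative_target(std::uint64_t insn_addr,
                                            std::uint64_t insn_len,
                                            std::int32_t disp32) {
    return insn_addr + insn_len +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(disp32));
}

// Hypothetical numbers: a 6-byte `mov eax, [rip+disp32]` at 0x1189 with
// disp32 = 0x2E85 reads from 0x1189 + 6 + 0x2E85 = 0x4014.
static_assert(rip_relative_target(0x1189, 6, 0x2E85) == 0x4014,
              "next-instruction address plus displacement");
```

Because only the distance between the instruction and the variable is encoded, the result stays valid wherever the image is loaded, which is why this form is the usual one in PIE code.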
The cs prefix shown by IDA threw me off.
I can see that it could mentally resemble "code" and thus the rip register but I don't think the RIP-relative addressing implies a cs segment override.
In 32-bit mode, the code segment is usually read-only, so an instruction like mov [cs:x], eax will fault.
In this scenario, putting a cs: in front of the operand would be wrong.
In 64-bit mode, segment overrides (other than fs/gs) are ignored (and the read-bit of the code segment is ignored anyway), so the presence of a cs: doesn't really matter because ds and cs are effectively indistinguishable. (Even an ss or ds override doesn't change the #GP or #SS exception for a non-canonical address.)
Probably the AGU doesn't even read the segment shadow registers anymore for segment bases other than fs or gs. (Although even in 32-bit mode, there's a lower latency fast path for the normal case of segment base = 0, so hardware may just let that do its job.)
Still cs: is misleading in my opinion - a 2E prefix byte is still possible in machine code as padding. Most tools still call it a CS prefix, although http://ref.x86asm.net/coder64.html calls it a "null prefix" in 64-bit mode. There's no such byte here, and cs: is not an obvious or clear way to imply RIP-relative addressing.
I am currently analyzing a binary and came across the following three instructions:
movzx ecx, byte [rax+r9]
movzx edx, byte [rbx+r9]
lea ecx, [rcx+rdx]
The meaning of each of these instructions is clear to me, but the first and third in combination do not make any sense. The first movzx copies the value at [rax+r9] into ecx and afterwards ecx is overwritten again by the lea instruction? Why do we need the first movzx here?
I guess I am just missing something and this is a nasty compiler trick, so I appreciate any help.
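One way to read the three instructions: the lea does not load anything from memory here. It only computes rcx+rdx and writes the low 32 bits to ecx, so it behaves like an add whose inputs are the two bytes loaded just before. A rough C++ equivalent, assuming rax and rbx point at byte buffers and r9 is an index (`sum_bytes` and its parameter names are illustrative):

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: a corresponds to rax, b to rbx, and i to r9.
inline std::uint32_t sum_bytes(const std::uint8_t* a,
                               const std::uint8_t* b,
                               std::size_t i) {
    std::uint32_t ecx = a[i];  // movzx ecx, byte [rax+r9]  (zero-extended load)
    std::uint32_t edx = b[i];  // movzx edx, byte [rbx+r9]  (zero-extended load)
    ecx = ecx + edx;           // lea ecx, [rcx+rdx]        (add, no memory access)
    return ecx;
}
```

Under this reading, the value from the first movzx is an input to the lea rather than something that gets thrown away.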
Are the commands the computer performs the same for these two expressions in C++:
myInt += 1;
myInt ++;
Am I supposed to use GCC to check this? Or, if you know the answer, can you tell me?
For many compilers, these will be identical. (Note that I said many compilers - see my disclaimer below). For example, for the following C++ code, both Test 1 and Test 2 will result in the same assembly language:
int main()
{
int test = 0;
// Test 1
test++;
// Test 2
test += 1;
return 0;
}
Many compilers (including Visual Studio) can be configured to show the resulting assembly language, which is the best way to settle questions like this. For example, in Visual Studio, I right-clicked on the project file, went to "Properties", and enabled the assembly listing output.
In this case, as shown below, Visual Studio does, in fact, compile them to the same assembly language:
; Line 8
push ebp
mov ebp, esp
sub esp, 204 ; 000000ccH
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-204]
mov ecx, 51 ; 00000033H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 9
mov DWORD PTR _test$[ebp], 0
; Line 12 - this is Test 1
mov eax, DWORD PTR _test$[ebp]
add eax, 1
mov DWORD PTR _test$[ebp], eax
; Line 15 - This is Test 2 - note that the assembly is identical
mov eax, DWORD PTR _test$[ebp]
add eax, 1
mov DWORD PTR _test$[ebp], eax
; Line 17
xor eax, eax
; Line 18
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END
Interestingly enough, its C# compiler also produces the same MSIL (which is C#'s "equivalent" of assembly language) for similar C# code, so this apparently holds across multiple languages as well.
By the way, if you're using another compiler like gcc, you can follow the directions here to get assembly language output. According to the accepted answer, you should use the -S option, like the following:
gcc -S helloworld.c
If you're writing in Java and would like to do something similar, you can follow these directions to use javap to get the bytecode, which is Java's "equivalent" of assembly language, so to speak.
Also of interest, since this question originally asked about Java as well as C++, this question discusses the relationship between Java code, bytecode, and the eventual machine code (if any).
Caution: Different compilers may produce different assembly language, especially if you're compiling for different processors and platforms. Whether or not you have optimization turned on can also affect things. So, strictly speaking, the fact that Visual Studio is "smart enough" to know that those two statements "mean" the same thing doesn't necessarily mean that all compilers for all possible platforms will be that smart.
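As a source-level companion to the assembly comparison, here is a short sketch (function names are illustrative) showing that the two forms agree when used as standalone statements, and that they can only differ when the value of the expression itself is used:

```cpp
// As standalone statements, both forms leave the same value in the variable.
inline int after_post_increment(int v) { v++; return v; }
inline int after_add_assign(int v) { v += 1; return v; }

// The forms differ only when the expression's own value is consumed:
// v++ yields the old value, while (v += 1) yields the new one.
inline int value_of_post_increment(int v) { return v++; }
inline int value_of_add_assign(int v) { return v += 1; }
```

The question's two lines are of the first kind, which fits the identical assembly in the listing above.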
I am a teaching assistant for computer science and one of my students submitted the following code to check whether an integer is odd or even:
int is_odd (int i) {
if((i % 2 == 1) && (i % 2 == -1));
else;
}
Surprisingly (at least for me) this code gives correct results. I tested numbers up to 100000000, and I honestly cannot explain why this code is behaving as it does.
We are using gcc v6.2.1 and C++.
I know that this is not a typical question for SO, but I hope to find some help.
Flowing off the end of a function without returning anything is undefined behaviour, regardless of what actually happens with your compiler. Note that if you pass -O3 to GCC, or use Clang, then you get different results.
As for why you actually see the "correct" answer, this is the x86 assembly which GCC 6.2 produces at -O0:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov eax, DWORD PTR [rbp-4]
cdq
shr edx, 31
add eax, edx
and eax, 1
sub eax, edx
cmp eax, 1
nop
pop rbp
ret
Don't worry if you can't read x86. The important thing to note is that eax is used for the return value, and all the intermediate calculations for the if statement use eax as their destination. So when the function exits, eax just happens to have the result of the branch check in it.
Of course, this is all a purely academic discussion; the student's code is wrong and I'd certainly give it zero marks, regardless of whether it passes whatever tests you run it through.
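For anyone who wants to follow the listing without reading x86, the sketch below replays its arithmetic in C++ and puts a well-defined version next to it. It assumes a 32-bit two's-complement int; `eax_at_return` is a made-up name for "whatever is left in eax", not something from the student's code:

```cpp
#include <cstdint>

// Replays the listing step by step on unsigned values to avoid overflow.
inline int eax_at_return(int i) {
    std::uint32_t edx = static_cast<std::uint32_t>(i) >> 31;  // cdq; shr edx, 31 -> 1 if i < 0
    std::uint32_t eax = static_cast<std::uint32_t>(i);        // mov eax, [rbp-4]
    eax += edx;                                               // add eax, edx
    eax &= 1u;                                                // and eax, 1
    eax -= edx;                                               // sub eax, edx -> equals i % 2
    // cmp eax, 1 only sets flags, so eax still holds i % 2 at the ret.
    return static_cast<int>(eax);
}

// A well-defined version returns its result explicitly.
inline bool is_odd(int i) {
    return i % 2 != 0;
}
```

The leftover value is 1 or -1 for odd inputs and 0 for even ones, so a caller that treats it as a truth value happens to see the right answer, but only by accident of this particular code generation.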
I was trying to build a function in assembly (FASM) that uses more than 4 parameters. On x86 it works fine, but I know that on x64 with fastcall you have to spill the parameters into the shadow space in the order rcx, rdx, r8, r9. I read that the 5th and later parameters have to be passed on the stack, but I don't know how to do this. This is what I tried, but it keeps saying "invalid operand". I know that I am handling the first 4 parameters correctly because I have written x64 functions before; it is the last 3 that I don't know how to spill.
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
If I try
mov [buffer3],rsp+8*4
it says "extra characters on the line".
I also saw that some people use rsp+20h, rsp+28h, etc., but that does not work either.
How do I pass more than 4 parameters using fastcall on x64?
Also, do I have to make room on the stack? I saw that some people put add rsp,20h right before their spill code. I tried that and it did not help with the invalid operand error.
Thanks
Update:
After playing around with it for a little bit, I found that the only way it seems to work is if I spill the first 4 parameters and then ignore the rest (5 and up):
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
;start the regular code. ignore spilling buffer3,startposition and length
On x86/x64 CPUs the following instructions do not exist, because mov cannot take two memory operands:
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
The workaround is to go through a register such as rax: read the value from memory into rax, then write rax to the destination memory location:
mov rax,[rsp+8*4]
mov [buffer3],rax
mov rax,[rsp+8*5]
mov [startposition],rax
mov rax,[rsp+8*6]
mov [length],rax