I am practicing reverse engineering software. I am using Microsoft Visual Studio. I created an empty project and then created an empty file which I called main.cpp. I then wrote and compiled the following code:
int main()
{
char* str = "hello matthew";
int x = 15;
return 0;
}
When I brought the release version of the executable over to BinText and IDA Pro, the string "hello matthew" was nowhere to be found. I also could not find the value 15, in either base 10 or hexadecimal.
I cannot begin to understand reverse engineering if I cannot find the references to the values I am looking for in the executable.
My theory is that because my program does absolutely nothing, the compiler just omitted it all, but I do not know for sure. Does anyone know why I cannot locate that string or the value 15 in the executable when I disassemble it?
I cannot begin to understand reverse engineering ...
The first step is to actually understand how the program is built out.
Before you can understand how to reverse a program, you need to understand how it's compiled and built; reversing a binary built for Windows is vastly different from reversing a binary for a *nix system.
To that end, since you're using Visual Studio, you can see this answer (option 2) explaining how to enable the assembly output of your code. Alternatively, if you're compiling via the command line, you can pass /FAs and /Fa to generate the assembly interleaved with the source.
Your code produces the following assembly:
; Listing generated by Microsoft (R) Optimizing Compiler Version 18.00.40629.0
TITLE C:\Code\test\test.cpp
.686P
.XMM
include listing.inc
.model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
CONST SEGMENT
$SG2548 DB 'hello matthew', 00H
CONST ENDS
PUBLIC _main
; Function compile flags: /Odtp
; File c:\code\test\test.cpp
_TEXT SEGMENT
_x$ = -8 ; size = 4
_str$ = -4 ; size = 4
_main PROC
; 2 : {
push ebp
mov ebp, esp
sub esp, 8
; 3 : char* str = "hello matthew";
mov DWORD PTR _str$[ebp], OFFSET $SG2548
; 4 :
; 5 : int x = 15;
mov DWORD PTR _x$[ebp], 15 ; 0000000fH
; 6 :
; 7 : return 0;
xor eax, eax
; 8 : }
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END
While this is helpful for understanding what your code is doing and how, one of the best ways to start reversing is to load a binary in a debugger (for example, by attaching Visual Studio to an executable) and view the assembly as the program is running.
It can depend on what you're after, since a binary could potentially be obfuscated; that is to say, there could be strings within the binary, but they could be encrypted or just scrambled so as to be unreadable until decrypted/unscrambled by some function within the binary.
So just searching for strings won't necessarily give you anything, and trying to search for a specific binary value in the assembled code is like trying to find a needle in a stack of needles. Know why you're trying to reverse a program, then attack that vector.
Does anyone know why I cannot locate that string or the value 15 in the executable when I disassemble it?
As has been mentioned, and as you have guessed, the "release" binary you're searching through was optimized, and the compiler just removed the unused variables so the assembly was essentially returning 0.
I hope that can help.
The main reason is that your code does nothing useful with x and str, so they are entirely redundant and there is no need for them to exist in the compiled program. The compiler therefore removes them as part of optimization.
If you really want to see them in the compiled code under a debugger, you need to use them, or tell the compiler not to optimize this part of the code.
This is how to tell the compiler not to optimize away these variables, using the volatile qualifier:
#include <iostream>
int main(int argc, char** argv) {
const char* volatile str = "hello matthew";
volatile int x = 15;
return 0;
}
With this change, the variables are included in the compiled code when viewed in IDA Pro.
Or, as I said, just use them:
#include <iostream>
int main(int argc, char** argv) {
const char* str = "hello matthew";
int x = 15;
std::cout << str << x;
return 0;
}
Related
I have some x86 code in a .asm file and am trying to use it from C++:
C++
#include <iostream>
extern "C" int addInts(int a, int b);
int main() {
int a = 1;
int b = 2;
int result = addInts(a, b);
std::cout << "Result :\t" << result << std::endl;
return 0;
}
Asm
.386
.MODEL FLAT, C
.CODE
addInts PROC
PUSH EBP
MOV EBP, ESP
MOV EAX, [EBP+8]
MOV ECX, [EBP+12]
ADD EAX, ECX
POP EBP
RET
addInts ENDP
END
Attempting to run this results in:
LNK4042 (warn) object specified more than once, ignoring extras (asm object file)
LNK2019 (error) unresolved external symbol _addInts referenced in function _main (asm object file)
Followed by a final act of defiance taking the form of fatal error LNK1120 due to the unresolved external (solution executable)
I'm using Visual Studio 2019 with MSVC v142 and MASM. My other, self-contained assembly code has had no issues, and another function I've written involving reading int arrays in x86 from C++ worked fine. I really can't see what's going wrong here, whether it's a problem with my code, some esoteric setting, or something else entirely.
If I change the last line of the Asm code to END addInts then the program just runs and immediately exits with nothing in std::cout.
The solution file has no entry point defined in linker settings, which was what I did for the last piece of code that called asm from C++.
The asm file is included in the build, using Microsoft Macro Assembler.
The cpp file is set to compile as C++, just in case.
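This problem was resolved after noticing that the .cpp and .asm files had the same base name (IntegerAddition.cpp and IntegerAddition.asm). Both were being built to an object file named IntegerAddition.obj, which is what the LNK4042 warning was pointing at: one object file was ignored, so addInts was never linked in.
Renaming the assembly file to something else fixed the issue.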
I encountered something weird in the MSVC compiler.
It puts function template definitions in the assembly output even though optimization eliminates the need for them.
It seems that Clang and GCC remove the function definitions entirely, but MSVC does not.
Can it be fixed?
main.cpp:
#include <iostream>
template <int n> int value() noexcept
{
return n;
}
int main()
{
return value<5>() + value<10>();
}
assembly:
int value<5>(void) PROC ; value<5>, COMDAT
mov eax, 5
ret 0
int value<5>(void) ENDP ; value<5>
int value<10>(void) PROC ; value<10>, COMDAT
mov eax, 10
ret 0
int value<10>(void) ENDP ; value<10>
main PROC ; COMDAT
mov eax, 15
ret 0
main ENDP
Sample code on godbolt
The /FA switch generates the listing file for each translation unit. Since this happens before the linking stage, MSVC has not yet determined whether those two functions are required anywhere else in the program, so they are still included in the generated .asm file. (Note: this may be for simplicity on MS's part, since it can treat templates the same as regular functions in the generated .obj file, though realistically there's no actual need to store them in the .obj file, as user17732522 points out in the comments.)
During linking, MSVC determines that those functions are in fact not used or needed anywhere else, and thus can be eliminated from the compiled executable (even if they were used elsewhere, since the result can be determined at compile time, they'd still be eliminated).
To see what's in the final executable, you can view it through a disassembler. For example, in Visual Studio, put a breakpoint in the main function, run the program, and when the breakpoint is hit, right-click and choose "Go To Disassembly". There you will see that the two functions no longer exist.
You can also generate a map file with the /MAP linker option, which likewise shows that the functions do not exist.
If I am reading the documentation correctly, it seems as though MS chose to include explicit instantiations of template classes and functions because it "is useful" when creating libraries. Uninstantiated templates are not put into the .obj files, though.
Just add /Zc:inline to your compile statement and it does the same thing as clang/GCC if you also wrap the template in an anonymous namespace to ensure it does not have external visibility.
#include <iostream>
namespace
{
template <int n> int value() noexcept
{
return n;
}
}
Or mark the function template inline:
template <int n> inline int value() noexcept
{
return n;
}
Both result in:
main PROC
mov eax, 15
ret 0
main ENDP
The /Zc:inline (Remove unreferenced COMDAT) switch was added in VS 2013 Update 2 as part of C++11 Standard conformance, which is what allows this optimization.
It is off by default in command-line builds. In MSBuild, <RemoveUnreferencedCodeData> defaults to true.
See Microsoft Docs
Otherwise, it will be cleaned up in the linker phase with /OPT:REF.
I compiled your code as given with VS 2022 in release mode. I get:
return value<5>() + value<10>();
00007FF65CD21000 mov eax,0Fh
}
00007FF65CD21005 ret
I am using Visual C++ 2010, and MASM as my x64-Assembler.
This is my C++ code:
// include directive
#include "stdafx.h"
// functions
extern "C" int Asm();
extern "C" int (convention) sum(int x, int y) { return x + y; }
// main function
int main()
{
// print asm
printf("Asm returned %d.\n", Asm());
// get char, return
_getch();
return EXIT_SUCCESS;
}
And my assembly code:
; external functions
extern sum : proc
; code segment
.code
Asm proc
; create shadow space (32 bytes) plus 8 bytes to keep rsp 16-byte aligned
sub rsp, 28h
; setup parameters
mov ecx, 10
mov edx, 15
; call
call sum
; clean-up shadow space
add rsp, 28h
; return
ret
Asm endp
end
The reason I am doing this is so I can learn the different calling conventions.
I would make sum's calling convention stdcall, and modify the asm code so it would call sum the "stdcall" way. Once I got that working, I would make it, say, fastcall, and then call it in asm the "fastcall" way.
But look at my assembly code right now. When I use that code, no matter whether sum is stdcall, fastcall or cdecl, it compiles, executes fine, and prints 25 as my sum.
My question: How, and why, can __cdecl, __stdcall and __fastcall all be called the exact same way?
The problem is that you're compiling for x64 targets. From MSDN:
Given the expanded register set, x64 just uses the __fastcall calling
convention and a RISC-based exception-handling model. The __fastcall
model uses registers for the first four arguments and the stack frame
to pass the other parameters.
Switch over to compiling for x86 targets, and you should be able to see the various calling conventions in action.
As far as I know, x64 only uses the __fastcall convention. __cdecl and __stdcall will just be compiled as __fastcall.
I am trying to run a function on a separately allocated stack.
I want to keep the stack for later so I can restore it and resume the function.
The following code compiles and runs, but nothing prints to the screen.
#include <cstdlib>
#include <csetjmp>
#include <iostream>
using namespace std;
unsigned char stack[65535];
unsigned char *base_ptr = stack + 65535 - 1;
unsigned char *old_stack;
unsigned char *old_base;
void function()
{
cout << "hello world" << endl;
}
int main()
{
__asm
{
mov old_base, ebp
mov old_stack, esp
mov ebp, base_ptr
mov esp, base_ptr
call function
mov ebp, old_base
mov esp, old_stack
}
}
I am using VS2012 on Windows 8 with an Intel Q9650.
Welcome to C++ and name mangling. Function names in C++ are mangled by the compiler (with gcc, for example, function becomes _Z8functionv for me). This is to facilitate function overloading. The compiler keeps track of the actual names that it has given the different functions in the background, so you aren't aware of it. This is a problem for any other language that tries to interact with C++.
This code won't link on my computer.
The solutions:
1) compile with g++ and pass the -S flag (so g++ -S test.cpp). And then take a look at the assembly output (cat test.s) to see what the function is called. Then change the name in "call function" to be "call _Z8functionv" (for me - it could easily be different for you).
2) use C: change the cout << to a printf statement and the above should work.
I take it that you aren't using gcc though (the operand order is reversed for gas, so I had to swap all the operands in the assembly).
Actually I don't see any problem with your code.
Your sample taken as-is compiles, links and runs as expected.
Perhaps your problem is with console settings, some global STL/CRT initialization, or something similar. In any case, you can put a breakpoint inside your function to make sure you're getting there.
According to Intel's x86 documentation for MOV, page 3-403, you should load the SS register immediately before loading a new ESP value. That blocks any interrupts from running until ESP has been assigned.
I am getting the following error when trying to add a static variable to my struct:
Undefined symbol s2::aa in module file_name.cpp
s2 is the name of the structure and aa is the static variable.
The compiler I am using is Turbo C++ 3.0.
How do I fix this error?
I think you've probably forgotten to define the storage for the static variable:
int C::v = 0;
Static variables aren't allowed in structs in C because C requires all the elements of a structure to be placed together. To get an element's value from a structure, you use the element's offset from the structure's starting address.
However, as far as I know, you can have a static member in a C++ structure. Are you getting a specific error (and with which compiler)?
Why do you say this? Under g++ 4.1.2, this compiles:
#include <iostream>
struct Test
{
static int test; // declare (usually in header file)
};
int Test::test = 8; // define (usually in source file)
int
main()
{
std::cout << Test::test << std::endl;
return 0;
}
Static variables are allowed in C++ structs (as you say, they are just classes with a different default access specifier).
Static variables are not allowed in C structs, however.
This works...
typedef struct _X
{
static int x; // declare (usually in header file)
} X;
int X::x = 1; // define (usually in source file)
void _tmain(int argc, _TCHAR* argv[])
{
printf("%d", X::x);
}
Effectively, you'll get a public symbol with the name [?test@Test@@2HA] placed in a separate (globally accessible) segment/section in memory...
struct Test
{
static int test; // declare (usually in header file)
};
int Test::test = 8; // define (usually in source file)
int main()
{
int x = Test::test++;
return 0;
}
will translate in assembly to:
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.16.27027.1
TITLE C:\WORK\C\Cpp\test.cpp
.686P
.XMM
include listing.inc
.model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC ?test@Test@@2HA ; Test::test
_DATA SEGMENT
?test@Test@@2HA DD 08H ; Test::test
_DATA ENDS
PUBLIC _main
; Function compile flags: /Odtp
_TEXT SEGMENT
_x$ = -4 ; size = 4
_main PROC
; File c:\work\c\cpp\test.cpp
; Line 9
push ebp
mov ebp, esp
push ecx
; Line 10
mov eax, DWORD PTR ?test@Test@@2HA ; Test::test
mov DWORD PTR _x$[ebp], eax
mov ecx, DWORD PTR ?test@Test@@2HA ; Test::test
add ecx, 1
mov DWORD PTR ?test@Test@@2HA, ecx ; Test::test
; Line 11
xor eax, eax
; Line 12
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END
Compile with:
cl /c test.cpp /TP /Fatest.asm /link /NODEFAULTLIB /entry:main
In a C++ structure, you can use static variables the same way as in a class.
But you can't use static variables in C structures.
This is because in C we can't access a static variable through the structure name. In C++ we can access a static member variable through the class name, like below.
ClassName::staticVariableName
C structures don't provide such a facility, but C++ structures do.