I have a stripped binary to analyze. Some interesting code is located at address 0x1234. How do I find all jumps to that address?
(Of course I don't expect to find computed jumps to that address, only the hardcoded ones.) I cannot use a simple byte search, since jump instructions are typically encoded with a relative offset and there are many kinds of jumps (je, jne, jmp...). I am working with GDB-PEDA on x86_64 Linux for now, in case the approach has to be platform specific.
how do I find all jumps to that address?
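Try objdump -d a.out | grep -E 'j[a-z]+ +1234\b' (GNU objdump prints direct jump targets as bare hex, without a 0x prefix, so the pattern matches on 1234 rather than 0x1234).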
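If the grep pattern gets unwieldy, the same filtering can be done in a small script. The following is only a sketch: it assumes the tab-separated line layout that GNU objdump -d normally emits (address, raw bytes, instruction), and the function name find_direct_jumps and the SAMPLE text are made up for illustration.

```python
def find_direct_jumps(disassembly: str, target: int):
    """Return (address, mnemonic) for each direct j* instruction that targets `target`."""
    hits = []
    for line in disassembly.splitlines():
        # Instruction lines look like: "  401136:\teb 05   \tjmp    40113d <sym>"
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue  # header, symbol label, or continuation line
        parts = fields[2].split()
        if len(parts) < 2 or not parts[0].startswith("j"):
            continue
        try:
            dest = int(parts[1], 16)
        except ValueError:
            continue  # indirect jump such as "jmp *%rax"
        if dest == target:
            hits.append((int(fields[0].strip().rstrip(":"), 16), parts[0]))
    return hits


# Hypothetical objdump output, written by hand for this example.
SAMPLE = (
    "    1200:\t74 32                \tje     1234 <.text+0x234>\n"
    "    1202:\teb 05                \tjmp    1209 <.text+0x209>\n"
    "    1204:\tff e0                \tjmp    *%rax\n"
    "    1206:\te9 29 00 00 00       \tjmp    1234 <.text+0x234>\n"
)

for address, mnemonic in find_direct_jumps(SAMPLE, 0x1234):
    print(f"{address:#x}: {mnemonic}")
```

In practice the disassembly text would come from running objdump -d on the binary and capturing its output. Call instructions are not covered; widening the mnemonic check would handle them.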
Related
I'm studying C++ using the website learncpp.com. Chapter 0.5 states that the purpose of a compiler is to translate human-readable source code to machine-readable machine code, consisting of 1's and 0's.
I've written a short hello-world program and used g++ hello-world.cpp to compile it (I'm using macOS). The result is a.out. It does print "Hello World" just fine; however, when I try to look at a.out in vim/less/Atom/..., I don't see 1's and 0's, but rather a lot of this:
H�E�H��X�����H�E�H�}���H��X���H9��
Why are the contents of a.out not just 1's and 0's, as would be expected from machine code?
They are binary bits (1s and 0s) but whatever piece of software you are using to view the file's contents is trying to read them as human readable characters, not as machine code.
If you think about it, everything that you open in a text editor is made up of binary bits stored on bare metal. Those 1s and 0s can be interpreted in many different ways, and most text editors will attempt to read them as characters. Take the character 'A' for example. Its ASCII code is 65, which is 01000001 in binary. When a text editor reads through the file on your computer, it processes those bits as characters rather than machine instructions, so when it reads in 8 bits (one byte) with the pattern 01000001, it knows that it has just read an 'A'.
This process results in that jumble of symbols you see in the executable file. While some of the content happens to be in the right pattern to make human readable characters, the majority of them will likely be outside of what either the character encoding considers valid or knows how to print, resulting in the '�' that you see.
I won't go into the intricacies of how character encodings work here, but read Character Encodings for Beginners for a bit more info.
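To make the difference concrete, here is a small Python sketch that shows one byte sequence in both views. The five bytes are chosen for illustration only (they are not taken from the asker's a.out), and UTF-8 is an assumed editor encoding.

```python
# Illustrative bytes: four that could plausibly appear in x86-64 machine
# code, followed by 0x41, the ASCII code for 'A'.
data = bytes([0x48, 0x8B, 0x45, 0xF8, 0x41])

# View 1: the same bytes written out as the 1s and 0s they are stored as.
bits = " ".join(f"{b:08b}" for b in data)
print(bits)

# View 2: the same bytes decoded as UTF-8 text, the way an editor would.
# Bytes that are not valid UTF-8 are replaced with U+FFFD, the '�' symbol.
text = data.decode("utf-8", errors="replace")
print(text)
```

The last group of bits printed is 01000001, which the second view renders as 'A'. The bytes 0x8B and 0xF8 are not valid on their own in UTF-8, so they show up as replacement characters.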
I'm working with an algorithm that uses uint16_t as the datatype. There are 4 half words in the array so I am trying to display the four half words in hex. I have not done anything other than x/4h:
(gdb) x/4h master_key
0x7fffffffd330: u"Āईᄐᤘ桷"
0x7fffffffd33c: u"桷敥\xbe0#"
0x7fffffffd346: u""
0x7fffffffd348: u"ꆋ翿"
According to the GDB | Memory:
f, the display format
The display format is one of the formats used by print (‘x’, ‘d’, ‘u’, ‘o’, ‘t’, ‘a’, ‘c’, ‘f’, ‘s’), and in addition ‘i’ (for machine instructions). The default is ‘x’ (hexadecimal) initially. The default changes each time you use either x or print.
I'm not sure why x is trying to print strings but I would like it to print the half words in hex.
GDB does not seem to be following the manual. I think I need to change the behavior of x and make it persistent. How do I tell GDB to print the half words in hex?
The following is in my .gdbinit but it looks like GDB is ignoring it (not a surprise).
(gdb) shell cat ~/.gdbinit
set output-radix 16
set history save on
set history size 256
set logging on
set logging overwrite on
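For reference, the x command accepts an explicit format letter next to the size letter, so x/4xh master_key asks for four half words in hex regardless of the current default. Since the manual passage quoted above says the default changes with each use of x or print, an earlier string-format command may be why strings were shown. As a cross-check of what the output should look like, the Python sketch below decodes eight bytes as four little-endian half words. The byte values are an assumption, inferred from the first four characters of the u"..." output above.

```python
import struct

# Assumed contents of master_key: the UTF-16 code units of the first four
# characters GDB printed (U+0100, U+0908, U+1110, U+1918), little-endian.
raw = bytes([0x00, 0x01, 0x08, 0x09, 0x10, 0x11, 0x18, 0x19])

# "<4H" means four little-endian unsigned 16-bit values (uint16_t).
halfwords = struct.unpack("<4H", raw)

formatted = " ".join(f"0x{h:04x}" for h in halfwords)
print(formatted)
```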
I have an application that needs to use GDB/MI to get information about a process. Right now I am setting a breakpoint in main and running the process. By using "info locals" I can get a neat list of the local variables in the current frame. While this is good, I need to be able to see what the global variables are.
Is there a way to do this that isn't too painful? I can use "info variables" and get a list of ALL variables that is way too extensive and could hurt the performance of my application. Is there a simpler way to get a list of the global variables?
EDIT: Added that I'm wanting to use GDB/MI.
According to the GDB docs, info variables prints all variables defined outside of functions. This includes your globals and static variables.
If you know the name of the global, or your globals follow a particular naming pattern, you can pass a regex to info variables to narrow the list down.
So I found a solution for what I want to do.
I followed this answer here. However, I found that when I ran the command that was given in the answer, I got some unneeded garbage (I'm running this on a Mac). I fixed this by eliminating the lines that end in .eh and I noticed that the other lines had lines that started with "__" so I eliminated lines with " __" (that's a space before the two underscores). I used the following to get the correct output:
g++ -O0 -c test.cpp && nm test.o | egrep ' [A-Z] ' | egrep -v ' [UTW] ' | egrep -v '\.eh$' | egrep -v ' __'
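The same filter chain can be written as a short Python function, which may be easier to adjust than stacked egrep calls. This is a sketch that assumes the usual three-column nm output (address, type letter, name); the function name global_symbols and the sample listing are hypothetical.

```python
def global_symbols(nm_output: str):
    """Keep externally visible data symbols, mirroring the egrep pipeline above."""
    names = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address column
        _address, kind, name = parts
        if not kind.isupper() or kind in ("U", "T", "W"):
            continue  # local symbols, undefined, text (functions), weak
        if name.endswith(".eh") or name.startswith("__"):
            continue  # exception-handling entries and "__" names
        names.append(name)
    return names


# Hypothetical nm output for illustration.
SAMPLE_NM = """\
0000000000000000 T _main
0000000000000010 D _counter
0000000000000018 S _table
                 U _printf
0000000000000020 S _main.eh
0000000000000028 S __ZTI3Foo
0000000000000030 b _hidden
"""

print(global_symbols(SAMPLE_NM))
```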
I have a strange question: I am wondering if there is a way to add or edit a string (or something else that the C program can access internally, i.e. not an external file) after the program has been compiled.
The purpose is to change a URL in a Windows program via PHP on Linux (obviously I cannot just compile it).
Many POSIX platforms come with the program strings, which reads through a binary file searching for strings. There is an option to print out the offset of each string. For example:
strings -td myexec
From there you can use a hex editor but the main problem is that you wouldn't be able to make a string bigger than it already is.
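If strings is not available, a rough equivalent is easy to sketch in Python. This version only approximates strings -td: it looks for runs of printable ASCII and reports their decimal offsets, and the function name find_strings and the sample blob are made up for the example.

```python
import re


def find_strings(blob: bytes, min_len: int = 4):
    """Return (offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(blob)]


# Hypothetical file contents; a real run would use open(path, "rb").read().
blob = b"\x00\x01http://example.com/a\x00\xff\xfeok\x00hello\x00"

for offset, text in find_strings(blob):
    print(offset, text)
```

The offset of the URL is what a later seek-and-overwrite step needs, and the length of the matched text is the upper bound for any replacement.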
A Hex Editor is probably your best bet.
A hex editor will work, but you have to be careful not to alter the size of the executable. If the string happens to be in the .res file, you can use ResEdit.
There are specialized tools to modify existing executable files. A notable tool is
Resource Tuner, which can be used to edit all sorts of resources in an executable.
Another option is to use a hex editor, like Hex Workshop, to edit the characters in the strings of an executable. However, bear in mind that with this method you can only edit existing strings in an executable, and the replacement strings must be the same length as the originals or shorter; otherwise you'll end up overwriting executable code.
As others have suggested, you can use a binary file editor (hex editor) to change the string in the executable file. You will want to embed into the string a marker (unique sequence of bytes) so that you can find the string in your file. And you will want to ensure that you are reading/writing the file at correct offsets.
Since the OP plans to use PHP on Linux to rewrite the file, you will need to use fseek to position the file pointer at the starting location of the URL string, make sure you stay within the size of the string as you replace bytes, and then use fseek/rewind and fwrite to change the file.
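The seek-and-overwrite approach looks roughly like this. The sketch is in Python rather than PHP, but the steps map directly onto fopen, fseek and fwrite. It assumes the string is NUL-terminated, so a shorter replacement can be padded with NUL bytes; patch_string is a hypothetical helper name.

```python
def patch_string(path: str, old: bytes, new: bytes) -> int:
    """Overwrite the first occurrence of `old` in the file with `new`, in place."""
    if len(new) > len(old):
        raise ValueError("replacement must not be longer than the original")
    with open(path, "r+b") as f:
        data = f.read()
        offset = data.find(old)
        if offset < 0:
            raise ValueError("original string not found")
        f.seek(offset)
        # Pad with NUL bytes so the file size and all later offsets stay the same.
        f.write(new.ljust(len(old), b"\x00"))
    return offset
```

A call such as patch_string("app.exe", b"http://old.example.com/", b"http://new.example/") would return the offset that was patched; both the file name and the URLs here are placeholders. Embedding a unique marker in the original string, as suggested above, keeps the find step from matching the wrong bytes.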
This technique can be used to change a URL embedded in a binary file, and it can also be used to embed a license key into a binary, or to embed an application checksum value into a binary so that one can detect when the binary has changed.
As some posters have suggested, you may need to recompute a checksum or re-sign a binary file. A quick way to check for this behavior would be to compile two versions of your binary with different URL values. Then compare the files and see if there are differences other than in the URL values.
To properly edit a string in a compiled program you need to:
1. Read in the file's bytes.
2. Search the .rdata section for strings and record the address of the first occurrence of the string.
3. Convert that address to a virtual address using data from the file header.
4. Write a new .rdata section into the executable, write your new string into it, and record its address and virtual address.
5. Search the .text section for references to the virtual address of the old string and replace them with references to your new string.
Fortunately, I made a program to do this on Windows (it only works on 32-bit programs) here
Not unless you want to poke around in the generated hex or assembly code.
I know of some disassembly libraries, but what I'm looking for is one that has an API like:
void * findAddrOfFirstInstructionStartingFrom( void * startAddress , InstructionType instruction);
void* addr = findAddrOfFirstInstructionStartingFrom(startAddress , JMP);
and other APIs similar to this one, i.e. ones that search for something specific rather than disassemble every instruction starting from an address and collect all sorts of info, because that is slow if you only want to find one specific thing.
If you know of any, please let me know; if there isn't one, please suggest one that is open source and easy to modify.
You did not tag nor tell the processor architecture, so it is unlikely that you get a real answer.
Native code instructions commonly vary widely in length depending on the operands they take, so you have to disassemble the code before searching. Otherwise you just find the first sequence of bytes that matches the pattern of the instruction you are searching for. That match is most likely not a real instruction but part of the operands of a preceding instruction.
EDIT: Since you updated the title, the choices I can think of are Borg and PEDasm, which are open source. If you drop the open-source requirement, then definitely IDA Pro.
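A tiny example shows the problem. It assumes x86, which the question did not specify: the byte 0xEB is the opcode of the short jmp, but it also appears inside the immediate operand of the mov that precedes the real jump. The instruction boundaries are written out by hand here, since finding them is exactly what a disassembler is for.

```python
JMP_SHORT = 0xEB  # opcode byte of the two-byte x86 "jmp rel8"

# Hand-assembled bytes: mov eax, 0x5eb (b8 eb 05 00 00), then jmp +5 (eb 05).
code = bytes.fromhex("b8eb050000eb05")

# Naive search: every offset where the opcode byte occurs.
naive_hits = [i for i, b in enumerate(code) if b == JMP_SHORT]

# Instruction start offsets, known only because the bytes were decoded by hand.
instruction_starts = [0, 5]
real_hits = [i for i in naive_hits if i in instruction_starts]

print("naive:", naive_hits)
print("real: ", real_hits)
```

The naive search reports a hit at offset 1, which is the middle of the mov instruction; only the hit at offset 5 is an actual jmp.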
I'm not aware of any API that can do this but it can be accomplished using some command line scripting:
objdump -d --start-address address file | grep -m 1 instruction | cut -d : -f 1
So, for example, to find the first JMP instruction starting at address 0x08048664 in the file a.out, you can do this:
$ objdump -d --start-address 0x08048664 a.out | grep -m 1 jmp | cut -d : -f 1
8048675
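To get closer to the API shape asked for in the question, the pipeline can be wrapped in a function. The sketch below parses objdump -d text that has already been captured and assumes objdump's usual tab-separated layout. The name first_instruction is hypothetical and the sample listing is hand-written, reusing the two addresses from the example above.

```python
def first_instruction(disassembly: str, mnemonic: str, start: int = 0):
    """Return the address of the first `mnemonic` instruction at or after `start`, or None."""
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue  # not an instruction line
        address = int(fields[0].strip().rstrip(":"), 16)
        parts = fields[2].split()
        if address >= start and parts and parts[0] == mnemonic:
            return address
    return None


# Hypothetical disassembly text for illustration.
SAMPLE_DISASM = (
    " 8048664:\t55                   \tpush   %ebp\n"
    " 8048665:\t89 e5                \tmov    %esp,%ebp\n"
    " 8048675:\teb 10                \tjmp    8048687 <main+0x23>\n"
)

found = first_instruction(SAMPLE_DISASM, "jmp", 0x8048664)
print(hex(found) if found is not None else "not found")
```

Unlike grep, this compares the mnemonic exactly, so searching for jmp does not also match other text that merely contains those letters.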
What you probably want is not just a library but a disassembler framework. Have a look at IDA Pro, which also provides a versatile scripting interface (and a disassembler API).