This question already has an answer here:
Use gdb to Modify Binary
(1 answer)
Closed 5 years ago.
I have an ELF file that has certain strings in it that I would like to modify (they are paths to configuration directories). This answer to this question says something about running gdb --write <path_to_executable> to modify a string in <path_to_executable>, but does not go into further detail. What are the other things that I need to do with that command to accomplish my goal?
(There are binary-editing tools better suited to this than gdb: any hex editor, or some reverse-engineering tools.)
The -write option of gdb is documented in the gdb manual: https://sourceware.org/gdb/onlinedocs/gdb/Mode-Options.html#Mode-Options
-write
Open the executable and core files for both reading and writing. This is equivalent to the ‘set write on’ command inside GDB (see Patching).
"Patching" there links to https://sourceware.org/gdb/onlinedocs/gdb/Patching.html#Patching
17.6 Patching Programs
...
If you’d like to be able to patch the binary, you can specify that explicitly with the set write command. For example, you might want to turn on internal debugging flags, or even to make emergency repairs.
set write on
set write off
If you specify ‘set write on’, GDB opens executable and core files for both reading and writing; if you specify set write off (the default), GDB opens them read-only.
If you have already loaded a file, you must load it again (using the exec-file or core-file command) after changing set write, for your new setting to take effect.
show write
Display whether executable files and core files are opened for writing as well as reading.
That still says nothing about how to actually edit the file. Just try the memory-editing command, set something = value, where something is the address with the correct type; use the addresses of your strings (as in https://stackoverflow.com/a/3305200):
https://sourceware.org/gdb/onlinedocs/gdb/Assignment.html#Assignment
To store values into arbitrary places in memory, use the ‘{…}’ construct to generate a value of specified type at a specified address (see Expressions). For example, {int}0x83040 refers to memory location 0x83040 as an integer (which implies a certain size and representation in memory), and
set {int}0x83040 = 4
stores the value 4 into that memory location.
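The hex-editor route mentioned at the top of this answer can also be sketched in C++. This is only a sketch and not what gdb does internally: the helper name patch_string_in_file is hypothetical, and it assumes the new string is no longer than the old one, because growing a string would shift every later offset in the ELF file. A shorter replacement is padded with NUL bytes.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: overwrite the first occurrence of `old_text` in the
// file at `path` with `new_text`, padding with NUL bytes so the file size and
// every other offset stay unchanged. Returns false if the text is not found,
// the replacement is too long, or the file cannot be opened.
bool patch_string_in_file(const std::string& path,
                          const std::string& old_text,
                          const std::string& new_text) {
    if (old_text.empty() || new_text.size() > old_text.size()) return false;

    // Read the whole file into memory to locate the string.
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    in.close();

    const std::string::size_type pos = data.find(old_text);
    if (pos == std::string::npos) return false;

    // Build the same-length replacement: new text followed by NUL padding.
    std::string padded = new_text;
    padded.resize(old_text.size(), '\0');

    // Open for in-place update (in | out keeps the existing contents).
    std::fstream out(path, std::ios::binary | std::ios::in | std::ios::out);
    if (!out) return false;
    out.seekp(static_cast<std::streamoff>(pos));
    out.write(padded.data(), static_cast<std::streamsize>(padded.size()));
    return static_cast<bool>(out.flush());
}
```

A call such as patch_string_in_file("a.out", "/etc/oldconf", "/opt/cfg") would be the typical use. Work on a copy of the binary, since the search is a plain byte match and could also hit a longer string that merely contains the old path.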
This question already has answers here:
Convert assembly to machine code in C++
(3 answers)
Closed 6 years ago.
I would like to include NASM itself (the assembler) in a C++ project. Can I compile NASM as a shared library? If not, is there another assembler that works as a C or C++ library?
I checked libyasm but couldn't understand how I can use it to assemble my code.
Woah, this exploded when I was away.
I had solved this problem by tampering with the YASM source code, and totally forgot about the question on SO as it received absolutely no attention 8 months ago. Below are the details, followed by a better suggestion.
For the project that I had in mind, I needed to use YASM as a library, and I was in a hurry because I was doing this for a company. Back then there were no good libraries that I was aware of, and I had concluded that getting used to the LLVM framework was overkill for the task (because all I wanted was to assemble single x86/x86_64 instructions and receive the bytes).
So I downloaded the source code for YASM.
Upon meddling with the code for a while, I noticed that the executable receives the file paths for input and output files; and passes these two strings along. I wanted char arrays in memory for the input and output; not files. So I figured, maybe if I could find all FILE pointers that are passed around, I can convert them to char pointers, and change every file read/write to array operations.
This turned out to be even more cumbersome than it sounds. Apparently YASM does not open the input/output files once and reuse the same FILE pointers; instead it passes around copies of the file-path strings. Since I needed a script that could make all the necessary changes for me, this approach wasn't workable.
Eventually, I found all fopen/fclose calls in the program with a script, and replaced them with my_fopen/my_fclose. For each file that I made these replacements, I included my header file in which I implemented these two functions.
In both of these functions, I checked the incoming string, compared it with "fake_file". If they are equal, I passed a 'fake' FILE pointer pointing to two portions of memory, obtained from the function calls fmemopen and open_memstream. Otherwise I simply called the actual fopen/fclose functions. In other words, I redirected these two calls (only for a given filename) to a memory file. Then, I called the library with the filename parameter set to 'fake_file'.
Since I had limited myself to Linux at that point, this approach worked for me. I also found out (using Valgrind) that there was a memory leak in the library version, so I wrote a very primitive garbage collector for it. Basically I wrapped malloc etc. to keep track of all allocations that are not freed, and free them after each execution.
This approach also allowed me to automate these changes using a script. Unfortunately I did all of this at a company, so I cannot share any actual code.
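For readers who want to try the redirection idea, here is a minimal sketch. It is not the code described above: the globals g_fake_input, g_fake_output and g_fake_output_size are hypothetical, and the sketch assumes a POSIX system that provides fmemopen and open_memstream, with a non-empty input buffer. Only my_fopen needs to look at the filename here, because the memory streams are ordinary FILE objects that the real fclose can close.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical stand-ins for the wrappers described above; none of these
// names come from YASM.
static std::string g_fake_input;                 // text served when "fake_file" is read
static char*       g_fake_output = nullptr;      // buffer filled by open_memstream
static std::size_t g_fake_output_size = 0;       // number of bytes written to it

// Replacement for fopen: redirect the special name to memory and pass every
// other path through to the real fopen.
FILE* my_fopen(const char* path, const char* mode) {
    if (std::strcmp(path, "fake_file") != 0) return std::fopen(path, mode);
    if (mode[0] == 'r') {
        // Read-only stream over the in-memory input text.
        return fmemopen(&g_fake_input[0], g_fake_input.size(), "r");
    }
    // Write stream backed by a malloc'd buffer; the pointer and size are
    // updated on fflush/fclose, and the caller must free() the buffer.
    return open_memstream(&g_fake_output, &g_fake_output_size);
}

// Memory streams are real FILE objects, so the normal fclose is enough.
int my_fclose(FILE* file) {
    return std::fclose(file);
}
```

With this in place, code that does my_fopen("fake_file", "r") reads whatever was stored in g_fake_input, and whatever is written through my_fopen("fake_file", "w") ends up in g_fake_output once the stream is closed.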
Better suggestion:
As of May 31, 2016, you can use Keystone Engine instead. It is "based on LLVM, but it goes much further with a lot more to offer." The disassembly engine Capstone and this are a near perfect couple for assembly and disassembly. If you need either of these components, I suggest these instead of doing the hacks I described. Both of these engines are currently being developed, and even though Keystone has some small bugs, Capstone is very robust at the moment.
TL;DR: Use keystone.
In *nix systems there is a command called 'file', which can tell you the underlying type of a file. Say you rename a binary executable to foo.txt, or rename an mp3 file to .txt: the system will still tell you the real type of the file. But in Windows there seems to be no such functionality; if you rename an executable to .txt, you cannot execute it. Can anyone explain to me how this is done on *nix systems, and how I can find the real type of a file using C++, especially on Windows, where I cannot use std::system("file blah")?
The file utility uses the libmagic library. It recognises the file type by parsing "special" fields in the file.
Of course, you can program the recognition of some formats yourself, but sometimes this requires plenty of work, e.g. when you try to differentiate between the different formats of MP4.
The developers of that library did a huge amount of work, so it's advisable to use their results if you want good results in identifying the format you are dealing with. (This is a big area, really; if you need to know what format you are working with, better to rely on them than on your own code.)
File utility - http://www.darwinsys.com/file/
You can download the source code and see just how many different recognition types they use.
Download archive file-4.26 -> magic -> Magdir
Personally I had luck compiling file 4.26 on Windows: ftp://ftp.astron.com/pub/file/
Caution: It's merely a convention that files of a given format carry a predefined signature; it holds almost always and helps identify file formats properly.
If that is not a concern for you, you can trust the signature. But keep in mind that anyone with enough knowledge and motivation can open a file in a hex editor and, by playing with the bytes, make it look like another file format.
Even in Unix/Linux, the system doesn't actually definitively know a file's type. The "file" program makes an educated guess by comparing the file's contents against a database of patterns that characterize a variety of common file types, but it's no more than a guess — it doesn't know about all possible file formats, and it can be wrong about the ones that it does know.
It's entirely possible to write a program like "file" for Windows; it doesn't depend on any special capabilities in the OS. Cygwin provides a Windows port of the "file" program, for example.
The issue of renaming a program to have a .txt extension is unrelated to the "file" program. That comes from the fact that Windows decides whether a file is executable based on its name (specifically, its extension), whereas Unix/Linux decides whether a file is executable based on its permissions — not its contents. If you chmod a-x a program on a Linux system, the system will consider it non-executable, just like if you remove the .exe extension from a program on Windows.
The command reference suggests that the type information is saved to an external place for further usage. It also mentions magic numbers, which refers to file signatures.
Being 100% sure of a file type is theoretically impossible since there are no precise rules about what a certain type should contain. Even if there were such rules, it would be possible to alter the file to make it look like another type. While both signatures and extensions can give you a good idea of what the type actually is, you still need to face the possibility of dealing with a wrong type.
The UNIX file command uses heuristics. There is a database of magic numbers, usually in /usr/share/file/magic and /etc/magic/, that allows you to add new file "types" to be recognized by the file command. It simply probes the file to look for magic numbers (signatures) in its contents.
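To make the probing idea concrete, here is a small sketch in standard C++, so it should build on Windows as well. It is nowhere near libmagic: the signature table is a hand-picked handful of well-known leading byte sequences, and the names sniff_type and sniff_file are hypothetical. Like file itself, it only guesses.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Guess a file's type from its leading bytes. The table is a tiny subset
// chosen for illustration; libmagic's database covers far more formats.
std::string sniff_type(const std::string& header) {
    struct Signature { const char* bytes; std::size_t length; const char* name; };
    static const Signature table[] = {
        {"\x7f" "ELF",        4, "ELF executable or object"},
        {"MZ",                2, "DOS/Windows executable"},
        {"\x89PNG\r\n\x1a\n", 8, "PNG image"},
        {"%PDF-",             5, "PDF document"},
        {"PK\x03\x04",        4, "ZIP archive"},
        {"ID3",               3, "MP3 audio with ID3v2 tag"},
    };
    for (const Signature& sig : table) {
        if (header.size() >= sig.length &&
            header.compare(0, sig.length, sig.bytes, sig.length) == 0) {
            return sig.name;
        }
    }
    return "unknown";
}

// Read at most the first 16 bytes of the file and classify them.
std::string sniff_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return "unreadable";
    char buffer[16];
    in.read(buffer, sizeof buffer);
    return sniff_type(std::string(buffer, static_cast<std::size_t>(in.gcount())));
}
```

Calling sniff_file("foo.txt") on a renamed executable would report the ELF or DOS/Windows signature regardless of the extension, which is the behaviour the question asks about.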
UNIX traditionally doesn't have the same type of file extension and type associations that Windows does, although Linux is accumulating that in recent times.
I would think on Windows you'd want to at least check the file extension association, to be correct. But even within a given extension (such as .txt) the individual program may perform its own heuristics. For example, Notepad has to make an educated guess at the character encoding when it opens a file. Raymond Chen wrote a good read about it on his blog: The Old New Thing - The Notepad file encoding problem, redux
This question already has answers here:
How can I measure the actual memory usage of an application or process?
(31 answers)
Closed 8 years ago.
In an interview I was asked to write a C/C++ program to find the size of any process. Can anyone tell me how this could be achieved?
P.S. Before marking the question as a duplicate, please read it carefully: I am asking how to find this via a C/C++ program, not just with a Unix/Linux shell command.
You can make use of getrusage. But keep in mind that it is not implemented on all systems.
Or read /proc/[pid]/statm.
Otherwise, try one of these (command line options).
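Both suggestions can be sketched in a few lines of C++. This is a Linux-oriented sketch, not a portable solution: the names StatmPages, parse_statm, process_size_bytes and peak_rss_self are hypothetical, and it relies on the first two fields of statm being the total program size and the resident set size, both counted in pages.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <sys/resource.h>   // getrusage (POSIX)
#include <unistd.h>         // sysconf

struct StatmPages {
    long total = 0;     // total program size (virtual memory), in pages
    long resident = 0;  // resident set size, in pages
};

// Parse the first two fields of a /proc/[pid]/statm line.
bool parse_statm(const std::string& line, StatmPages& out) {
    std::istringstream fields(line);
    return static_cast<bool>(fields >> out.total >> out.resident);
}

// Read /proc/<pid>/statm (Linux only) and convert pages to bytes.
bool process_size_bytes(long pid, long long& virtual_bytes, long long& resident_bytes) {
    std::ifstream in("/proc/" + std::to_string(pid) + "/statm");
    std::string line;
    StatmPages pages;
    if (!std::getline(in, line) || !parse_statm(line, pages)) return false;
    const long page_size = sysconf(_SC_PAGESIZE);
    virtual_bytes = static_cast<long long>(pages.total) * page_size;
    resident_bytes = static_cast<long long>(pages.resident) * page_size;
    return true;
}

// Peak resident set size of the calling process as reported by getrusage.
// On Linux ru_maxrss is in kilobytes; other systems use different units.
long peak_rss_self() {
    struct rusage usage = {};
    return getrusage(RUSAGE_SELF, &usage) == 0 ? usage.ru_maxrss : -1;
}
```

process_size_bytes(getpid(), vm, rss) reports on the current process; passing another pid works as long as that process's /proc entry is readable.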
It's not part of standard C++ and thus depends on the operating system.
On Linux, for example, that is done by accessing the /proc filesystem.
Another option is of course to just call a system command like ps and parse its output (that is what I'd do in a Python script).
Interpreting the numbers you get is, however, another non-trivial problem.
Use
size <executable>
Output
   text    data     bss     dec     hex filename
1361623    1984    2708 1366315  14d92b <executable>
It shows the sizes of the text, data and bss sections and their total.
I have the source code of a program. The source code is huge and written in C/C++. I have the credentials to modify the source code, compile and execute it.
I want to know the filenames of all the files opened and closed by this program when it executes. It would be a plus if this list is sorted in the order the file operations occurred.
How can I get this information? Is there some monitoring tool I need to use, or can I inject a library call into the C++ code to achieve this? The code is too large and complicated to hunt down every file open/close call and add a printf there, and adding a pseudo macro to the file open API call might also be difficult.
Note that this is not the same as viewing what files are open currently by a process. I am aware of the many questions on StackOverflow that already address this problem (using lsof or /proc and so on).
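You can use strace as below (openat is listed as well because current glibc versions implement open() through the openat system call):
$ strace -e trace=open,openat,close -o /tmp/trace.log <your_program> <program_options>
In the file /tmp/trace.log you will find every open and close operation performed by the program.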
In addition to strace, you can use interposition to intercept the open/close calls. If you Google for "interposition shared library linux" you'll find many other references as well.
I understand that you want to determine statically what files a given source code could open (for many runs of its compiled program).
If you just want to know it dynamically for a given run, use strace(1) as answered by Rohan and/or interposition library as answered by Kec. Notice that ltrace(1) could also be useful, and perhaps more relevant (since you would trace stdio or C++ library calls).
First, a program can (and many do) open a file whose name is some input (or some program argument). Then you cannot add that arbitrary file name to a list.
You could #define fopen and #define open to print a message. You could also use LD_PRELOAD tricks to override open and fopen.
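A rough sketch of the #define variant, for stdio only. Everything named here (file_op_log, open_file_names, logged_fopen, logged_fclose) is hypothetical, and redefining standard library names as macros is a hack that the C++ standard does not bless, so treat it as a debugging aid. It records operations in order instead of printing them, which also gives the sorted list the question asks for.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Ordered record of file operations, oldest first.
inline std::vector<std::string>& file_op_log() {
    static std::vector<std::string> log;
    return log;
}

// Remember which path each open FILE* came from so fclose can report it.
inline std::map<FILE*, std::string>& open_file_names() {
    static std::map<FILE*, std::string> names;
    return names;
}

inline FILE* logged_fopen(const char* path, const char* mode) {
    FILE* file = std::fopen(path, mode);
    file_op_log().push_back(std::string(file ? "open " : "open-failed ") + path);
    if (file) open_file_names()[file] = path;
    return file;
}

inline int logged_fclose(FILE* file) {
    auto it = open_file_names().find(file);
    if (it != open_file_names().end()) {
        file_op_log().push_back("close " + it->second);
        open_file_names().erase(it);
    }
    return std::fclose(file);
}

// Place these after every standard header has been included; from here on,
// unqualified fopen(...)/fclose(...) calls go through the logging wrappers.
#define fopen(path, mode) logged_fopen((path), (mode))
#define fclose(file) logged_fclose((file))
```

Put this in a header that every source file includes last. Calls written as std::fopen(...) will no longer compile after the macros, which at least makes them easy to find, and files opened through other APIs are not covered at all.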
If in C++, the program may open files using std::ifstream etc...
You could consider customizing the GCC compiler with MELT to help you...
I have a compiled program and I want to know whether a certain line exists in it. Is there a way, using my source code, that I could determine that?
Tony commented on my message so I'll add some info:
I'm using the g++ compiler.
I'm compiling the code on Linux(Scientific)/Unix machine
I only use standard library (nothing downloaded from the web)
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
objdump is a utility that can be used as a disassembler to view an executable in assembly form.
Use this command to disassemble a binary,
objdump -Dslx file
Note, though, that disassemblers make use of the symbolic debugging information present in object files (ELF), so that information should be present in your object files. Also, constants and comments in the source code will not be part of the disassembled output.
Summary
Use source code control and keep track of which source code revision the executable's built from... it should write that into the output so you can always cross-reference the two, checkout the same sources and rebuild the executable that gave you those results etc..
Discussion
The desired line is either multiplication by a number (in a subfunction of a while group) or printing a line in a specific case (if statement)
I need this because I'm running several MD simulations and sometimes I find myself in a situation where I'm not sure of the conditions.
For the very simplest case, where you want all the MD simulations to be running the latest source, you can compare timestamps on the source files with the executable to see if you forgot to recompile, and compare the process start time (e.g. as listed by ps) with the executable creation time.
Where you're deliberately deploying multiple versions of the program and only have the latest source, then it gets pretty tricky. A multiplication will typically only generate a single machine code instruction... unless you have some contextual insight you're unlikely to know which multiplication is significant (or if it's missing). The compiler may generate its own multiplications for e.g. array indexing, and may sometimes optimise multiplications into bit shifts (or nothing, as Ira comments), so it's not as simple as saying 'well, it's my only multiplication in function "X"'. If you're printing a specific line that may be easier to distinguish... if there's a unique string literal you can search for it in the executable (e.g. puts("Hello") -> strings program | grep Hello, though that may get other matches too, and the compiler's allowed to reuse string literal sequences so "Well Hello" might cater to your need via a pointer to 'H' too). If there's a new extern symbol involved you might see it in nm output etc..
All that said (woah)... you should do something altogether different really. Best is to use a source control system (e.g. svn, cvs...), and get it configured so you can do something to find out which revision of the codebase was used to create the executable - it should be a FAQ for any revision control system.
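One common way to get the revision into the executable is to pass it in as a macro at build time. A minimal sketch, assuming a hypothetical macro name BUILD_REVISION and hypothetical helpers build_revision_tag and startup_banner; the revision command in the comment is for svn and would differ for other systems:

```cpp
#include <string>

// BUILD_REVISION is a hypothetical macro name supplied by the build, e.g.
//   g++ -DBUILD_REVISION="\"$(svnversion)\"" simulation.cpp
// Fall back to a marker so an unversioned build is still recognisable.
#ifndef BUILD_REVISION
#define BUILD_REVISION "unknown"
#endif

// The fixed prefix makes the tag easy to find in the binary later with
//   strings program | grep build-revision
const char* build_revision_tag() {
    return "build-revision: " BUILD_REVISION;
}

// Line to write at the top of each simulation log.
std::string startup_banner(const std::string& program_name) {
    return program_name + " (" + build_revision_tag() + ")";
}
```

Writing startup_banner(...) to the log at the start of each run ties every result file to a revision, and the same string can be pulled out of the executable itself if the log is lost.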
Failing that, you could, for example, do something to print out what multipliers or conditions the program was using when it starts running, capturing that in your logs. While hackish, macros allow you to "stringify" their parameters, so you can log and execute something without typing all the code twice. Lots of other options too.
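A sketch of the stringify trick, with hypothetical names (log_value, LOG_EXPR, LOG_EXPR_TO). The helper function exists so that the expression is evaluated only once even though its text is also printed:

```cpp
#include <iostream>
#include <sstream>   // only needed when capturing the log into a string
#include <string>

// Write "text = value" to the log and hand the value back to the caller.
// Kept separate from the macro so the expression is evaluated exactly once.
template <typename T>
T log_value(const char* expression_text, T value, std::ostream& log = std::clog) {
    log << expression_text << " = " << value << '\n';
    return value;
}

// #expr turns the argument's source text into a string literal, so the code
// is written once but both logged and executed.
#define LOG_EXPR(expr) log_value(#expr, (expr))
#define LOG_EXPR_TO(stream, expr) log_value(#expr, (expr), (stream))
```

For example, double scaled = LOG_EXPR(timestep * 4); both computes the product and writes the expression text with its value to std::clog, so the multiplier that was actually compiled in shows up in the run's log.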
Hope some of that helps....