How to figure out source line number from Linker Map - c++

For some reason I have only the linker map for an application I am debugging. There is a crash log which says crash occurred at offset "myApp.exe! + 4CA24".
From the linker map I am able to locate the method. Say this is at offset "myApp.exe! + 4BD7C".
Is there any way to figure out the exact line in source code using just the above info?
I know if we have a .cod file it makes it very easy, but I don't have one (and can't create one).

The best you can do if you only have MAP-files is to study the EXE-file in a disassembler and compare to constructs that you recognize from the common ways the compiler generates code. These you have to learn. That means learning at least some assembler is required. This is good knowledge that will help you in the future, especially if you have to debug a lot of code.
A slightly simpler approach is to download the free Intel manuals on processor instructions and look up the size of each instruction. This way you can count your way from the start of the function to the faulting instruction. For best results the two methods should be used in conjunction with each other.
Typically what you'd be looking for is something that looks a bit like this:
mov DWORD PTR [edi+40], eax
(Instruction, register, offset, size and order can be different but indirection is typically where most code crashes)
Whatever you do you should seriously consider turning on COD-file generation for the future as that makes it super-easy to find the faulting line.
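To make the counting step concrete: the position of the faulting instruction inside the function is the crash offset minus the function's start offset from the map. A minimal sketch using the two offsets from the question; the helper name offset_into_function is made up for this example:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: distance in bytes from the start of a function
// (as listed in the linker map) to the crashing instruction.
std::uint32_t offset_into_function(std::uint32_t crash_offset,
                                   std::uint32_t function_start) {
    // The crash must lie at or after the function's first byte.
    assert(crash_offset >= function_start);
    return crash_offset - function_start;
}
```

With the question's numbers, 0x4CA24 - 0x4BD7C gives 0xCA8, so the faulting instruction sits 0xCA8 (3240) bytes into the function. That is how many bytes of disassembly you have to count through.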

It depends on the actual information in the map file - if it has line number information (which is pretty rare nowadays), it'll be obvious and you'll be able to do it. Otherwise the best you can do is guess.

Related

How to measure Code Size?

When certain features or optimizations are discussed, Code Size is often mentioned.
While I certainly understand the basic concept, that is, that a collection of code, compiled to machine code will result in X bytes of machine code (plus static data) I have recently realized that I'm very unsure how to actually measure Code Size of a given binary.
So, how do you measure Code Size?
Do you just check how big the resulting binary ("executable", .exe) is?
Do you need a tool such as dumpbin.exe or some specific linker flags to get detailed results?
You can tell the linker to produce a map file. This gives about the most detailed information that's easy to get (i.e., much short of reverse engineering the code by hand).
Depending on the code, using dumpbin on an object file can produce meaningful results, but can also produce simply "anonymous object" -- especially (exclusively?) when you ask for link-time code generation.
I'd say your best bet is to disassemble the binary.
In the context of code optimizations, total code size isn't typically what is meant, but rather code size for some specific part of your program.
If you literally mean the size of the .exe in bytes, I think you're over-thinking the question. Your file explorer should show the size of each file (if it doesn't, right-click the file and open Properties). The file you're looking for should be in the build output folder (e.g. Debug), with the .exe extension.
If it's something else, sorry.

print the code of a function in a DLL

I want to print the code of a function in a DLL.
I loaded the dll, I have the name of the desired function, what's next?
Thank you!
Realistically, next is getting the code. What you have in the DLL is object code -- binary code in the form ready for the processor to execute, not ready to be printed.
You can disassemble what's in the DLL. If you're comfortable working with assembly language, that may be useful, but it's definitely not the original source code (nor probably anything very close to it either). If you want to disassemble it, loading it in your program isn't (usually) a very good starting point. Try opening a VS command line and using dumpbin /disasm yourfile.dll. Be prepared for a lot of output unless the DLL in question is really tiny.
Your only option to retrieve hints about the actual functionality implemented by that function inside the DLL is to reverse engineer the binary code. What this means is that you pretty much have to use a disassembler (e.g. IDA Pro) or a debugger (e.g. OllyDbg) to translate the opcodes to actual assembly mnemonics and then just work your way through it and try to understand the details of how it functions.
Note that since it is compiled from C/C++, lots and lots of information is lost in the process due to optimization and the nature of compilation; the resulting assembly can (and probably will) seem cryptic and senseless, but it still does its job the exact same way as the programmer programmed it in the higher-level language. It won't be easy. It will take time. You will need luck and nerves. But it IS doable. :)
Nothing. A DLL is compiled binary code; you can't get the source just by downloading it and knowing the name of the function.
If this was a .NET assembly, you might be able to get the source using reflection. However, you mentioned C++, so this is doubtful.
Check out this http://www.cprogramming.com/challenges/solutions/self_print.html and this Program that prints its own code? and this http://en.wikipedia.org/wiki/Quine_%28computing%29
I am not sure if it will do what you want, but I guess it may help you.

Changing parts of compiled binaries

learned english as a second lang, sorry for the mistakes & awkwardness
I have been given a peculiar project to work on. The company has lost the source code for the app, and I have to make changes to it. Now, reverse engineering the whole thing is impossible for one man, it's just too huge; however, patching individual functions would be feasible, since the changes are not that monumental.
So, one possible solution would be compiling C code and somehow -after rewriting addresses- patching it into the actual binary, ideally, replacing the code the CALL instruction jumps to, or inserting a JMP to my code.
Is there any way to accomplish this using MingW32? If so, can you provide a simple example? I'm also interested in books which could help me accomplish the task.
Thanks for your help
I use OllyDBG for this kind of thing. It allows you to see the disassembly and debug it, you can place breakpoints etc., and you can also edit the binary. So, you could edit the PE header of that program, adding a code section with your (compiled) code inside, then call it from the original program.
I can't give you any advice since I've never tried, although I thought about it many times. You know, laziness.. :)
I would disassemble the program with a high-quality disassembler that produces something that can be assembled back into a runnable app, and then replace the parts you need to modify with C code.
Something like this will let you reverse the machine code into source. It won't be pretty but it does work.
http://www.hex-rays.com/idapro/
There are also tools for runtime patching http://www.dyninst.org/ for instance. They really aren't made for patching but they can do the trick.
And of course the last choice is to just use an assembler and write machine code :)
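The "inserting a JMP to my code" idea from the question comes down to writing a 5-byte near jump over the start of the old function. A minimal sketch of how those bytes are computed, assuming a 32-bit x86 target and the standard JMP rel32 encoding (opcode 0xE9); the function name encode_jmp_rel32 and the addresses in the note are made up for illustration:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: build the 5 bytes of an x86 "JMP rel32" that is placed
// at virtual address `from` and transfers control to virtual address `to`.
std::array<std::uint8_t, 5> encode_jmp_rel32(std::uint32_t from, std::uint32_t to) {
    // The displacement is measured from the end of the 5-byte instruction.
    // Unsigned wraparound yields the two's-complement value for backward jumps.
    const std::uint32_t rel = to - (from + 5);
    return {{0xE9,                                   // JMP rel32 opcode
             static_cast<std::uint8_t>(rel),         // displacement, little-endian
             static_cast<std::uint8_t>(rel >> 8),
             static_cast<std::uint8_t>(rel >> 16),
             static_cast<std::uint8_t>(rel >> 24)}};
}
```

For example, a jump placed at 0x401000 that should land at 0x402000 has displacement 0xFFB and encodes as E9 FB 0F 00 00. The addresses must be virtual addresses as seen at run time, not file offsets, so they have to be translated through the PE section table before patching the file on disk.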

Learning to read GCC assembler output

I'm considering picking up some very rudimentary understanding of assembly. My current goal is simple: VERY BASIC understanding of GCC assembler output when compiling C/C++ with the -S switch for x86/x86-64.
Just enough to do simple things such as looking at a single function and verifying whether GCC optimizes away things I expect to disappear.
Does anyone have/know of a truly concise introduction to assembly, relevant to GCC and specifically for the purpose of reading, and a list of the most important instructions anyone casually reading assembly should know?
You should use GCC's -fverbose-asm option. It makes the compiler output additional information (in the form of comments) that make it easier to understand the assembly code's relationship to the original C/C++ code.
If you're using gcc or clang, the -masm=intel argument tells the compiler to generate assembly with Intel syntax rather than AT&T syntax, and the --save-temps argument tells the compiler to save temporary files (preprocessed source, assembly output, unlinked object file) in the directory GCC is called from.
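A small test function is an easy way to try these switches out. The sketch below is only an illustration (the file name dead_store.cpp and the function are invented); whether the unused multiplication really disappears depends on the compiler version and optimization level, which is exactly what the generated .s file lets you check:

```cpp
// dead_store.cpp (hypothetical file name)
// Generate annotated Intel-syntax assembly with, for example:
//   g++ -O2 -S -fverbose-asm -masm=intel dead_store.cpp -o dead_store.s
// Then search dead_store.s for the multiplication by 37.

int scaled(int x) {
    int unused = x * 37;  // never read afterwards; a candidate for elimination
    (void)unused;         // silence the unused-variable warning
    return x * 2;
}
```

Comparing the output at -O0 and -O2 side by side makes it fairly obvious which instructions belong to the dead computation.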
Getting a superficial understanding of x86 assembly should be easy with all the resources out there. Here's one such resource: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html .
You can also just use disasm and gdb to see what a compiled program is doing.
I usually hunt down the processor documentation when faced with a new device, and then just look up the opcodes as I encounter ones I don't know.
On Intel, thankfully the opcodes are somewhat sensible. PowerPC not so much in my opinion. MIPS was my favorite. For MIPS I borrowed my neighbor's little reference book, and for PPC I had some IBM documentation in a PDF that was handy to search through. (And for Intel, mostly I guess and then watch the registers to make sure I'm guessing right! heh)
Basically, the assembly itself is easy. It basically does three things: move data between memory and registers, operate on data in registers, and change the program counter. Mapping between your language of choice and the assembly will require some study (e.g. learning how to recognize a virtual function call), and for this an "integrated" source and disassembly view (like you can get in Visual Studio) is very useful.
"casually reading assembly" lol (nicely)
I would start by following along in gdb at run time; you get a better feel for what's happening. But then maybe that's just me. It will disassemble a function for you (disass func), then you can single-step through it.
If you are doing this solely to check the optimizations - do not worry.
a) the compiler does a good job
b) you won't be able to understand what it is doing anyway (nobody can)
Unlike higher-level languages, there's really not much (if any) difference between being able to read assembly and being able to write it. Instructions have a one-to-one relationship with CPU opcodes -- there's no complexity to skip over while still retaining an understanding of what the line of code does. (It's not like a higher-level language where you can see a line that says "print $var" and not need to know or care about how it goes about outputting it to screen.)
If you still want to learn assembly, try the book Assembly Language Step-by-Step: Programming with Linux, by Jeff Duntemann.
I'm sure there are introductory books and web sites out there, but a pretty efficient way of learning it is actually to get the Intel references and then try to do simple stuff (like integer math and Boolean logic) in your favorite high-level language and then look what the resulting binary code is.

Finding division by zero in a big project

Recently, our big project began crashing on unhandled division by zero. No recent code seems to contain any likely elements so it may be new data sets affecting old code. The problem is the code base is pretty big, and running on an embedded device with no comfortable debug access (debug is done by a lot of printf()s over serial console, there is no gdb for the device and even if there was, the binary compiled with debug symbols wouldn't fit).
The most viable way would likely be to find all the division operations (they are relatively infrequent), and analyze code surrounding each of them to see if any of the divisor variables was left unguarded.
The question is then either how to find all division operations in a big (~200 files, some big) C++ project, or, if you have a better idea how to locate the error, please share it.
extra info: project runs on embedded ARM9, a small custom Linux distro, crosscompiled with Cygwin/Windows crosstools, IDE is Eclipse but there's also Cygwin with all the respective goodies. Thing is the project is very hardware-specific, and the crashes occur only when running at full capacity, all the essential interconnected modules active. Restricted "fault mode" where only bare bones are active doesn't create them.
I think the most direct step would be to try to catch the unhandled exception and generate a dump, or printf stack information or similar.
Take a look at this question or just search in google for info relating to exception catching in your particular environment.
By the way, I think that the division could happen as a result of a call to an external library, so it's not 100% sure that you'll find the culprit just by grepping your code.
If I remember right, the ARM9 doesn't have hardware divide so it's going to be implemented in a function call the compiler makes whenever it has to perform a division.
See if your toolset implements the divide by zero handling in the same way as ARM's toolset does (it's likely that it does something at least similar). If so, you can install a handler that gets called when the problem occurs and you can printf() registers and stack so that you can determine where the problem is occurring. A possible similar alternative is that your small Linux distro is throwing a signal you can catch.
I'm not sure how you're getting your information that a divide by zero is occurring, but if it's because the runtime is spitting out a message to that effect, you always have the option of finding out where that is handled in the runtime, and replacing it with your own more informative message. However, I'd guess that there's a more 'architected' way to get your code to run (a signal handler or ARM's technique).
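If the distro does deliver a signal, the signal-handler route could look roughly like the sketch below. It assumes a POSIX system that raises SIGFPE for the failure; install_fpe_handler, on_sigfpe and format_hex are names invented here. One caveat: si_addr holds the faulting instruction only when the kernel generates the signal from a hardware fault. If the runtime's divide routine raises the signal itself, the value is not the faulting address and you would need the return address or a backtrace instead.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: write "0x<hex>" into buf and return the length.
// printf is avoided because it is not async-signal-safe.
// buf must hold at least 2 + 2 * sizeof(std::uintptr_t) characters.
static std::size_t format_hex(std::uintptr_t value, char* buf) {
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(value)];
    std::size_t n = 0;
    do {                       // collect digits, least significant first
        tmp[n++] = digits[value & 0xF];
        value >>= 4;
    } while (value != 0);
    std::size_t len = 0;
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0) buf[len++] = tmp[--n];
    return len;
}

// Handler: report the address stored in the signal info, then exit.
extern "C" void on_sigfpe(int, siginfo_t* info, void*) {
    static const char msg[] = "SIGFPE at ";
    char buf[2 + 2 * sizeof(std::uintptr_t) + 1];
    std::size_t len = format_hex(reinterpret_cast<std::uintptr_t>(info->si_addr), buf);
    buf[len++] = '\n';
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    write(STDERR_FILENO, buf, len);
    _exit(1);                  // returning would re-run the faulting instruction
}

// Call once at startup; returns true if the handler was installed.
bool install_fpe_handler() {
    struct sigaction sa = {};
    sa.sa_sigaction = on_sigfpe;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, nullptr) == 0;
}
```

The printed address can then be looked up in the linker map or fed to addr2line on the build machine, so the device itself never needs debug symbols.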
Finding all of the divisions shouldn't be hard with a custom grep search. You can easily distinguish that usage from other usages of the / and % characters in C++.
Also, if you know what you are dividing, you could globally overload the / and % operators to add an assertion that reports __FILE__ and __LINE__. Note that C++ only allows this when at least one operand is a class or enumeration type; for plain built-in types you would need a wrapper function or macro instead. If using a makefile, it shouldn't be hard to include the custom operator code in all the linked files without touching the code.
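For built-in operand types, the __FILE__/__LINE__ idea can be approximated with a wrapper instead of an operator overload. This is a sketch, not a drop-in fix: checked_div, checked_mod and the two macros are invented names, both operands must have the same type for template deduction to work, and the suspicious divisions have to be rewritten to use the macros.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: divide, but report the call site first if the divisor is zero.
template <typename T>
T checked_div(T num, T den, const char* file, int line) {
    if (den == 0) {
        std::fprintf(stderr, "division by zero at %s:%d\n", file, line);
        std::abort();
    }
    return num / den;
}

// Same check for the remainder operator (integer types only).
template <typename T>
T checked_mod(T num, T den, const char* file, int line) {
    if (den == 0) {
        std::fprintf(stderr, "modulo by zero at %s:%d\n", file, line);
        std::abort();
    }
    return num % den;
}

// The macros capture the location of the caller, not of the helper.
#define CHECKED_DIV(a, b) checked_div((a), (b), __FILE__, __LINE__)
#define CHECKED_MOD(a, b) checked_mod((a), (b), __FILE__, __LINE__)
```

A line such as `rate = total / count;` would become `rate = CHECKED_DIV(total, count);`. On the device, the fprintf call can be swapped for whatever already writes to the serial console.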
You should use this as an excuse to invest in improving the debug-ability of your device - for both this problem and future issues. Even if you can't get live debugging, you should be able to find a way to generate and save off core dumps for post-mortem debugging (pinpointing the source or any unhandled exception immediately).
PC-Lint might help; it's like FindBugs for C++. It is a commercial product but there is a 30-day money-back guarantee.
Handle the exception.
Usually the exception will be handed a structure that contains the address that caused the exception and other information. You will probably have to become familiar with the microcontroller's datasheet or RTOS manual.
Use the -save-temps for gcc and find the relevant assembly for division in the generated .s file. If you're lucky it will be something fairly distinctive, possibly even a function call. If it's a function call you can use weak linking to override it with your own checked version. Otherwise locating the divisions in the assembly should give you a very good idea where they are in the C/C++ code and you can instrument them directly.
Usually you can modify or override the divide-by-zero exception handler if you have access to the exception handler routines.
In the case of ARM, division is done by a library routine, and there are mechanisms to inform the user code when a divide by zero occurs.
See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4061.html
I would suggest providing a __rt_raise() as described on the page above.
__rt_raise(2,2) will get called when the divide routine detects a divide by zero, so you can print the LR register and then use addr2line to cross-reference it against the source line.
The only way to find those conditions is the usual:
1. Try to reproduce the problem without looking at the source (as the bug already happened you should have info on the part of the program that is affected).
2. If found, check the source for this point and fix it, otherwise:
2.1. grep for each / not followed by a * or / (grep "/[^/*]" I think)
2.2. find the conditions for which the code is executed and reproduce it
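A plain grep for "/[^/*]" still matches the second slash of a // comment, so a slightly smarter filter can cut down the noise. The function below is a heuristic sketch only; the name may_contain_division is invented, it looks at one line at a time, and it does not understand string literals, #include paths or the contents of block comments, so expect some false positives.

```cpp
#include <cstddef>
#include <string>

// Hypothetical filter: true if the line contains a '/' or '%' that is not
// part of a "//", "/*" or "*/" comment delimiter.
bool may_contain_division(const std::string& line) {
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        const char prev = i > 0 ? line[i - 1] : '\0';
        const char next = i + 1 < line.size() ? line[i + 1] : '\0';
        if (c == '/') {
            if (next == '/') return false;             // rest of line is a comment
            if (next == '*' || prev == '*') continue;  // "/*" or "*/" delimiter
            return true;                               // plain '/' operator
        }
        if (c == '%') return true;                     // '%' or '%=' operator
    }
    return false;
}
```

Feeding every source line through this and printing file name and line number for the hits gives a candidate list that can then be reviewed by hand.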
The exception already has the address of the offending divide-by-zero code. The CPU saves register contents when an exception occurs, including the PC (program counter). Your OS should pass this information along (I assume that is how you know it is a divide by zero). Print the address and go look in your code. If you can print a stack trace it would be even easier to solve.
Another option would be to check the differences in your version control software between the last known working version and the first non-working version. This should give you a limited change set within which to search for the problem.