Getting the first and last addresses of the code - LLVM

I am trying to grab the addresses associated with the start of the source code and the one at the end. I tried to do it using LLVM and Clang, but I could not.
Is there a way to get the memory address associated with each line in the source code?
Thanks

There are several possibilities:
You can use debug information for this. Note, however, that this information might not be precise for optimized code.
Alternatively, you can use a special linker script which inserts two symbols before and after all the code in the code section.
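A minimal sketch of the linker-symbol approach, assuming a GNU/Linux toolchain: the default GNU ld script already defines __executable_start and etext around the code, so you can read the address range without writing a custom script (a custom script would let you place your own marker symbols instead):

#include <cstdio>

// Symbols placed by the default GNU ld linker script:
// __executable_start marks the start of the image, etext the end of
// the text segment. Their addresses (not values) are what matter.
extern "C" char __executable_start, etext;

int main() {
    std::printf("code runs from %p to %p\n",
                static_cast<void*>(&__executable_start),
                static_cast<void*>(&etext));
}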

Related

Associate text from source code line to line - too fragile

I need to associate textual data with the lines in a source code file. Something like "these lines are to create a Myclass object" -> lines from 20 to 32.
The problem is that this kind of line tracking is highly fragile: it is enough for someone to add a newline to break the correspondence between the associated text and the lines.
I need an idea to make this link a bit stronger (not too much, but at least resistant to a few line shifts); suggestions are very welcome.
An easy solution would be to hash the lines (MD5 is pretty easy and accessible) and store the hash along with the data.
You can then check the hash against the possibly modified file. If it matches, great; otherwise begin checking the previous/next lines for a match.
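A minimal sketch of that idea, using std::hash as a stand-in for MD5 (the function name and the search radius are just placeholders):

#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Re-locate an annotated line: check the stored hash at the expected
// line first, then fan out to neighboring lines. Returns the new
// 0-based index, or -1 if the line moved too far or was deleted.
int relocate(const std::vector<std::string>& lines,
             std::size_t storedHash, int expected, int radius = 5) {
    std::hash<std::string> h;  // stand-in for MD5
    for (int d = 0; d <= radius; ++d) {
        for (int line : {expected - d, expected + d}) {
            if (line >= 0 && line < static_cast<int>(lines.size()) &&
                h(lines[line]) == storedHash)
                return line;
        }
    }
    return -1;
}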
One approach might be to enlist the help of a source control system. For example, using Git, you could associate textual data with a specific version of the source code. If the source code is changed, you can use a "diff" algorithm to discover which line(s) have been added or removed. Using that delta information, you can then update your annotation lines (for example, adding a line at the top of the file would cause your 20-32 annotation to move to 21-33).
Are you trying to implement some form of automatic documentation system? If so, then basing this around line numbering is indeed fragile. I would suggest using some sort of markup to associate the text with semantic blocks of code that are robust when moved or altered. Perhaps something along the lines of Doxygen might be what you are looking for.
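For illustration (the names are made up), markup in this style stays attached to the entity it documents, so insertions above it do not break the association:

/// \brief These lines create and set up a Myclass object.
/// The note travels with the declaration rather than with line numbers.
class Myclass {};

Myclass makeMyclass() { return Myclass{}; }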

GDB with multiple files of MySQL source code

I am trying to use gdb with the MySQL source code, which is written in C/C++. In mysql-test/t, I create a custom test case file, say example.test, and then debug it using the following command:
./mysql-test-run --gdb example
Now I want to see the flow of execution as it moves from a function in one file to a function in some different file. I am not sure how the execution changes, so I can't predefine the breakpoints. Is there a way to see the flow of execution across multiple source files?
You can use the next command to step through the source line by line. When appropriate, you can use the step command to step "into" the function(s) being called on the current line.
A reasonable method would be to do next until you have just passed the externally visible behavior you're looking for. Then start over, stopping at the line right before you saw the behavior last time, and step this time. Repeat as necessary until you find the code you're looking for. If you believe it's encountering some sort of deadlock, it's significantly easier: just interrupt the program (Ctrl-C) when you think it's stuck, and it should stop right in the interesting spot.
In general, walking through the source you'll build up some places you think are interesting. You can note the source file and line number and/or function names as appropriate and set those breakpoints directly in the future to avoid tedious next/next/next business.
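A typical session might look like the following; the breakpoint function is only an illustration (mysql_execute_command is a well-known entry point in sql_parse.cc), so substitute any function you know is on the path:

(gdb) break mysql_execute_command
(gdb) run
(gdb) next
(gdb) step
(gdb) backtrace

next stays within the current function, step descends into the call on the current line, and backtrace shows the chain of functions (and their source files) that led to the current spot.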

How to find all global variables in C++ source code, DLL or any file created by the VC++ compiler?

I'm making my application thread-safe. One of the steps is to synchronize access to, or eliminate the use of, global variables. I'm using Visual Studio. I can't find any good way to locate all the global variables in my codebase: it's impossible to create a good text search pattern, and I can't find any helpful tool. Do you guys know a good way to do that? It could be a source code analysis tool or a binary file analyzer.
This could help:
Open the project in Visual Studio.
Open the 'Class View' of the project.
Under the project title, you will find 'Global Functions and Variables'.
I have checked this with Visual Studio 2010 and above.
Edit: As suggested by Ajay in the comments, you can also categorize items into groups. For grouping items:
In Class View, right-click on the project title.
Select 'Group By Object/Member Type'.
Expand the required node, such as variables, structures, or enums.
One option might be letting the linker generate a map file (/MAP in Visual Studio).
You will get a .map file for each binary with two sections:
A table of segments
Start Length Name Class
0001:00000000 00010000H .textbss DATA
0002:00000000 000034b4H .text CODE
0003:00000000 00000104H .CRT$XCA DATA
0003:00000104 00000104H .CRT$XCAA DATA
0003:00000208 00000104H .CRT$XCZ DATA
0003:0000030c 00000104H .CRT$XIA DATA
...
A list of symbols (functions and data)
Address Publics by Value Rva+Base Lib:Object
0000:00000000 ___safe_se_handler_count 00000000 <absolute>
0000:00000000 ___safe_se_handler_table 00000000 <absolute>
0000:00000000 ___ImageBase 00400000 <linker-defined>
0001:00000000 __enc$textbss$begin 00401000 <linker-defined>
0001:00010000 __enc$textbss$end 00411000 <linker-defined>
0002:000003a0 _wmain 004113a0 f console4.obj
...
You can tell the functions apart from the variables by the "CODE" / "DATA" designation in the segment list.
Advantage: You will get all symbols, even those in libraries, that were not removed by the linker.
Disadvantage: You will get all symbols, even those in libraries, that were not removed by the linker. I don't know of any tool that does the code/data separation automatically.
I know of the http://code.google.com/p/data-race-test/wiki/ThreadSanitizer program (a product of Google), which works on Windows and on compiled code. It is a dynamic instrumentation tool (like Valgrind, or a bit like QEMU/VirtualBox) which adds checks to memory accesses and will try to find threading problems. You can just run your program under the control of ThreadSanitizer. There will be a slowdown from the dynamic translation and the instrumentation code (up to 20x-50x slower), but some problems will be detected automatically.
It also allows you to annotate custom synchronization functions in your source code.
Wiki of program has links to other thread-race detectors: http://code.google.com/p/data-race-test/wiki/RaceDetectionLinks
cppclean is a static analysis tool that can help you. From the documentation:
cppclean finds global/static data that are potential problems when using threads.
A simple example with a static local variable and a global variable follows.
./example.h:
void foo();
./example.cpp:
#include "example.h"
int globalVar = 42;
void foo(){
static int localStatic = 0;
localStatic++;
}
Open a terminal and run cppclean as follows:
$ cppclean --include-path . example.cpp
example.cpp:3: static data 'globalVar'
example.cpp:6: static data 'localStatic'
Unfortunately, cppclean has some parsing issues and bugs. However, these issues are pretty rare and affected less than one percent of the code I've tested.
You can try CppDepend, using its code query language:
from f in Fields where f.IsGlobal select f
Maybe the dumpbin tool will help here. You can run it with the /SYMBOLS switch to display the COFF symbol table and look for External symbols; global variables should be in this list. See DUMPBIN /SYMBOLS.
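For example (the object file name is a placeholder), you could filter the symbol table down to external symbols like this:

dumpbin /SYMBOLS example.obj | findstr "External"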
You all are making this too complicated.
1.
Copy the code of each of your files (one at a time, separately from the others) into a string, then strip out everything between "{" and "}" (leaving the braces themselves). Save the result to an external file; after the first file, append to that file. A naive sketch of this stripping step appears after these steps.
2.
Even if you have 1,000 lines left after all that parsing, all of your globals are in there (depending on how you created them). If you created them inside a namespace, etc., then go back and parse for that. I doubt that most programmers will have 1,000 globals, but for some applications it might be what they use. If you do not have too many at that point, then manually edit the resulting text file.
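A naive sketch of the stripping step (it ignores comments, string literals, and the preprocessor, so treat the output as a rough cut):

#include <fstream>
#include <iostream>
#include <string>

// Drop everything nested inside braces, keeping only file-scope text
// (plus the outermost braces). Candidate globals survive the cut.
int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: strip <file>\n"; return 1; }
    std::ifstream in(argv[1]);
    std::string out;
    int depth = 0;
    char c;
    while (in.get(c)) {
        if (c == '{') { if (depth++ == 0) out += c; }
        else if (c == '}') { if (depth > 0 && --depth == 0) out += c; }
        else if (depth == 0) out += c;
    }
    std::cout << out;  // append this to your collected results file
}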
I have found that maybe 90+% of the answers on this site are bloated with far too much complexity that just eats up CPU time and memory. Keep it simple.
You might find it handy to have a globals.h file which you load early, and keep most or all of your globals there. It looks like it's time to do a lot of cleanup.

Finding location of crash using Map file

I am investigating a faulty application. Application Verifier shows the heap is corrupted after the call below:
AA!Class::Function+dbaf
I have the map file with me. Please help me figure out how to reach the line number using the above information and the information present in the map file.
Preferred load address is 00400000
0002:00000dc4 __imp_?Class@Function@@QAEXV?$vector@Uty_point@@V?$allocator@Uty_point@@@std@@@std@@0PAV23@@Z 0049bdc4
Note : I have anonymized class and function name.
Do you only have a map file? No PDB? If you have full symbols, then use the map and .pdbs (and the .exe) with WinDBG (are you on Windows?).
I would imagine that you do have symbols, seeing as you have been given the name of the function.
If not... dbaf is your answer. What does that offset equate to? The offset from the start of the function should be the location of the faulty instructions.
Of course, you would then need to figure out how many (assembly) instructions each source line compiles to.
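As a worked example with the numbers above, assuming the module actually loaded at its preferred base of 00400000 (otherwise adjust for the real load address first), the crash address is the function's map address plus the reported offset:

  0x0049bdc4   address of Class::Function from the map (Rva+Base)
+ 0x0000dbaf   offset reported with the crash
= 0x004a9973   address of the faulty instruction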
I remember being able to jump to the faulty code by having only the map file and the value of EIP (the instruction pointer, the address where the code crashed); a quick Google search pointed me to this webpage: Map Files And DLL Rebasing. From what I remember, in an ideal situation you can change the value of EIP directly in the Visual C++ debugger and it will jump to the corresponding code line.
Now, this was really a long time ago, in the Visual C++ 6 era; I don't even know if it's still applicable today. As already pointed out, you should really look into symbols and the program database options in Visual C++; there is tons of information about how to set them up and use them.
MAP File Browser provides functions that let you turn crash addresses, DLL offsets, symbol offsets, or event log XML crash data into the corresponding symbol location.
Load the map file into MAP File Browser then go to the Query Menu.
Full disclosure: I wrote MAP File Browser.

huge C file debugging problem

I have a source file in my project which has more than 65,536 lines of code (112,444 to be exact). I'm using the "SQLite amalgamation", which comes as a single huge source file.
I'm using MSVC 2005. The problem arises during debugging. Everything compiles and links OK, but when I try to step into a function with the debugger, it shows an incorrect code line.
What's interesting is that the difference between the correct line number and the one the debugger shows is exactly 65,536. This makes me suspect (in fact, be almost sure of) an unsigned short overflow.
I also suspect that it's not a bug in MSVC itself. Perhaps it's a limitation of the debug information format; that is, the debug information format used by MSVC stores line numbers as 2-byte shorts.
Is there anything that can be done about this (apart from cutting the huge file into several smaller ones)?
According to an MS moderator, this is a known issue with the debugger only (the compiler seems to handle it fine, as you pointed out). There is apparently no workaround other than using shorter source files. See the official response to a very similar question here.
Well, when I wanted to look at how SQLite works, I took the last 60,000 or so lines, moved them to another file, and then #include'd it. That was easy and did the trick for me. If you do that, be careful not to split inside an #ifdef block.
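The split can be as simple as the following (the second file name is made up); because the tail is pulled back in with #include, the build still sees a single translation unit:

/* sqlite3.c -- roughly the first 60,000 lines stay here ... */

/* re-attach the remainder; do not split inside an #ifdef block */
#include "sqlite3_part2.c"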
If you look at the documentation for the symbolic debugging info, you will see the type used for line numbers. For example, both line and column parameters for IDiaSession::findLinesByLinenum are of type DWORD.
Edit: As @valdo points out, that still doesn't mean the debugger works properly with huge line numbers, so you have to use shorter files. It is unfortunate that such a limitation exists, but even if it didn't, I'd still recommend you split your source.
Have you looked into using WinDBG instead? It's pretty capable, as the Windows team use it for debugging the O/S, and there are some biiiig files in there, or at least there were when I last looked.
For anyone having issues with incorrect line numbers for files of fewer than 65,536 lines: I found my issue was caused by inconsistent line endings in the source file. There were 129 \r newlines where the rest of the file was \r\n style, and the difference between the debugger line and the correct line was 129 as well.
Unless you are modifying SQLite, you should just trust that it is doing its job. No need to step in at all. SQLite is run through a large battery of tests before release.