GDB backtrace with long function names

I am doing some debugging of an application that uses boost::spirit. This means that backtraces are very deep and that many of the intermediate layers have function names that take several pages to print. The length of the function names makes examining the backtrace difficult. How can I have gdb limit the length of a function name to 1 or 2 lines? I'd still like to see the full path to the file and line number, but I don't need four pages of template parameters!

I don't think it can be done directly right now. I think it would be a reasonable feature.
However, you can write your own implementation of "bt" in Python and then apply whatever transforms you like. This isn't actually very hard.


How to print out specific lines of user input to console (C++)

I am using C++ and the terminal. My program takes in user input using read(STDIN_FILENO, buf, BUFFER) and I am trying to write back only specific lines.
So for example, if the user entered in a total of 10 lines, how would I print out lines 3 through 7 or 6 through 10?
I am trying to use the write() function (write(STDOUT_FILENO, buf, BUFFER)) but it's not printing what I want it to.
I have tried messing around with the BUFFER and tried to make it smaller than the total amount of characters that the user has input, but it is still not working.
My understanding is that whatever value I set BUFFER to, write() will output up to that many characters, starting from 0. But if I wanted to start from line 6, that line may start at character #15 and not 0... does this make sense?
please note: I need to use read() and write()
Thank You!
If you are required to only use read(2) and write(2), then you'll also need open(2), close(2), lseek(2) and you need to design and code your own buffered IO library above it. Read carefully the documentation of every system call mentioned here. Use the result of each of them. Handle error cases in your code. See errno(3) & perror(3).
So keep a buffer (or more than one) and several pointers (or offsets) into it (probably at least the currently consumed position, and the last read position, etc).
Perhaps you'll want to use some container. You might start implementing your own equivalent of fgetc on your buffered IO class, and build above that.
Lines do not really exist at the system call level. You need to take care of \n in your code.
BTW you could study, for inspiration, the source code of several free software C libraries implementing <stdio.h>, such as musl-libc.
Of course you should compile with all warnings and debug info (g++ -Wall -Wextra -g with GCC) and you'll need to use the debugger gdb to understand the behavior of your program and find your bugs. Don't be shy about drawing on a board what happens in your virtual address space (with pointers represented by arrows).
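For illustration only, here is a minimal sketch of the kind of buffered reading being described (the buffer size, the fixed array limits and the hard-coded line range 3 through 7 are placeholders of mine, not part of the question): read everything available with read(2), remember the offset at which each line starts, then write(2) just the lines you want.

#include <unistd.h>

#define BUFFER 4096

int main(void)
{
    static char buf[BUFFER];
    size_t len = 0;
    ssize_t n;

    /* Fill the buffer from standard input; a real program would grow the
       buffer or process it in chunks instead of assuming everything fits. */
    while ((n = read(STDIN_FILENO, buf + len, sizeof buf - len)) > 0)
        len += (size_t)n;

    /* Record the offset at which each line begins. */
    size_t line_start[1024];
    size_t nlines = 0;
    line_start[nlines++] = 0;
    for (size_t i = 0; i + 1 < len; i++)
        if (buf[i] == '\n' && nlines < 1024)
            line_start[nlines++] = i + 1;

    /* Write lines 3 through 7 (1-based), clamped to what was actually read. */
    size_t first = 3, last = 7;
    if (first > nlines)
        return 0;
    size_t begin = line_start[first - 1];
    size_t end = (last < nlines) ? line_start[last] : len;
    write(STDOUT_FILENO, buf + begin, end - begin);
    return 0;
}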
NB: SO is not a do-my-homework service.

Can I query LTTNG if a given tracepoint with given args is going to be traced, before tracing it?

We need to adapt a huge number of existing traces, printf-like, to LTTNG. One of the issues we are foreseeing is that we will need a catch-all tracepoint with the format of args plus a char* string. We are trying to find a way to avoid having to compose the string before calling the LTTNG tracepoint. Is there any way to know beforehand if the tracepoint "will be traced" before passing it to the LTTNG library? Any method we can call to know if the trace is a match?
Thanks a lot!
P.S. We know that having this kind of tracepoint is a bad practice, but zillions of trace lines are flying above us.
Use the tracepoint_enabled() and do_tracepoint() macros as follows (code copied from the man page):
if (tracepoint_enabled(ust_tests_hello, tptest)) {
    /* prepare arguments */
    do_tracepoint(ust_tests_hello, tptest, i, netint, values,
                  text, strlen(text), dbl, flt);
}
Note: For this to work you need at least LTTng-UST 2.7.0-rc1.
You could technically query the status of the tracing session through liblttng-ctl. However, if your goal is to improve performance, I am not sure doing a lookup through this library every time you hit a tracepoint will be more efficient than formatting the string. You would have to benchmark it.
As a side note, if you are moving existing printf() calls to LTTng tracepoints, you may want to look at tracef(), which is basically a single-format-string tracepoint already defined by the tracer. There is also a slightly more advanced tracelog() function which will be introduced in LTTng 2.7.
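For illustration, a tracef() call is just a printf-style statement (the function name and message below are made up); it emits a single predefined event carrying the formatted string, so there is no tracepoint provider of your own to declare:

#include <lttng/tracef.h>

void report_status(int request_id)
{
    /* Emits one predefined LTTng-UST event carrying the formatted string;
       no provider or tracepoint definition is required on your side. */
    tracef("request %d handled", request_id);
}

As with other LTTng-UST instrumentation, the program has to be linked against liblttng-ust.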

How do C/C++ compilers know which line an error is on

There is probably a very obvious answer to this, but I was wondering how the compiler knows which line of code my error is on. In some cases it even knows the column.
The only way I can think of to do this is to tokenize the input string into a 2D array that stores [lines][tokens].
C/C++ could be tokenized into one long 1D array, which would probably be more efficient. I am wondering what the usual parsing method is that keeps line information.
Actually, most of this is covered in the Dragon Book.
Compilers do lexing and parsing, i.e. they transform the source code into a tree representation.
When doing so, each token (keyword, variable, etc.) is associated with a line and column number.
However, during parsing the exact origin of a failure can get lost, so the reported location may be off.
This is the first step on the long, complicated path towards "Engineering a Compiler" or compiler theory.
The short answer to that is: there's a module called "front-end" that usually takes care of many phases:
Scanning
Parsing
IR generator
IR optimizer ...
The structure isn't fixed, so each compiler will have its own set of modules, but more or less the steps involved in front-end processing are:
Scanning - maps the character stream into words or tokens (and discards whitespace and comments)
Parsing - this is where syntax and (some) semantic analysis take place, and where syntax errors are reported
To answer your question: the compiler knows the location of your error because when something doesn't fit into the structure called the "abstract syntax tree" (i.e. it cannot be constructed), or doesn't follow any of the syntax-directed translation rules, something is wrong and the compiler reports the location where the mismatch occurred. If the grammar error is on just one word/token, even a precise column can be returned, since nothing matched a terminal symbol: a basic token like the if keyword in C/C++.
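As a rough sketch of the scanning side (the names and structures here are illustrative, not taken from any particular compiler), the scanner simply carries line and column counters along as it consumes characters, so every token it produces can be stamped with a position that later phases report in diagnostics:

#include <stdio.h>

/* Each token is stamped with the position where it starts, so later
   phases (parsing, semantic analysis) can point at it in diagnostics. */
struct token {
    const char *text;
    int line;
    int column;
};

struct scanner {
    const char *src;
    int pos;
    int line;    /* 1-based */
    int column;  /* 1-based */
};

static int next_char(struct scanner *s)
{
    int c = s->src[s->pos];
    if (c == '\0')
        return -1;
    s->pos++;
    if (c == '\n') {
        s->line++;       /* a newline moves us to column 1 of the next line */
        s->column = 1;
    } else {
        s->column++;
    }
    return c;
}

int main(void)
{
    struct scanner s = { "int x\n= oops;", 0, 1, 1 };
    int c;
    while ((c = next_char(&s)) != -1) {
        if (c == ';')    /* pretend the parser rejects this token */
            printf("error at line %d, column %d\n", s.line, s.column - 1);
    }
    return 0;
}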
If you want to know more about this topic my suggestion is to start with the classic academic approach of the "Compiler Book" or "Dragon Book" and then, later on, possibly study an open-source front-end like Clang

Test environment for an Online Judge

I am planning to build an Online Judge along the lines of CodeChef, TechGig, etc. Initially, I will be accepting solutions only in C/C++.
I have thought through a security model, but my concern right now is how to model the execution and testing part.
Method 1
The method that seems to be more popular is to redirect standard input to the executable and redirect standard output to a file, for example:
./submission.exe < input.txt > output.txt
Then compare the output.txt file with some solution.txt file character by character and report the results.
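For concreteness, the checker for this method can be as small as a character-by-character comparison of the two files (the file names match the example above; the rest is just a sketch):

#include <stdio.h>

/* Returns 1 if the two files are byte-for-byte identical, 0 otherwise. */
static int files_match(const char *a_path, const char *b_path)
{
    FILE *a = fopen(a_path, "r");
    FILE *b = fopen(b_path, "r");
    int result = 0;

    if (a && b) {
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
        } while (ca == cb && ca != EOF);
        result = (ca == cb);   /* both hit EOF at the same point */
    }
    if (a) fclose(a);
    if (b) fclose(b);
    return result;
}

int main(void)
{
    printf(files_match("output.txt", "solution.txt") ? "AC\n" : "WA\n");
    return 0;
}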
Method 2
A second approach that I have seen is not to allow the users to write main(). Instead, write a function that accepts some arguments in the form of strings and set a global variable as the output. For example:
//This variable should be set before returning from submissionAlgorithm()
char * output;

void submissionAlgorithm(char * input1, char * input2)
{
    //Write your code here.
}
At each step, and for a test case to be executed, the function submissionAlgorithm() is repeatedly called and the output variable is checked for results.
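For concreteness, a minimal sketch of that driver might look like this (the test inputs, expected outputs and pass/fail reporting are placeholders): the judge links the submitted source against a small main() that calls submissionAlgorithm() once per test case and inspects the global output variable afterwards.

#include <stdio.h>
#include <string.h>

/* Provided by the submitted source file. */
extern char *output;
void submissionAlgorithm(char *input1, char *input2);

int main(void)
{
    /* Placeholder test cases; a real judge would load these from its test set. */
    char *inputs1[]  = { "2", "10" };
    char *inputs2[]  = { "3", "20" };
    char *expected[] = { "5", "30" };
    int passed = 0;

    for (int i = 0; i < 2; i++) {
        output = NULL;
        submissionAlgorithm(inputs1[i], inputs2[i]);
        if (output && strcmp(output, expected[i]) == 0)
            passed++;
    }
    printf("%d/2 test cases passed\n", passed);
    return 0;
}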
From an initial analysis I found that Method 2 would not only be more secure (I would prevent all read and write access to the filesystem from the submitted code), but would also make the execution of test cases faster (maybe?), since the computation of test results would occur in memory.
I would like to know if there is any reason as to why Method 1 would be preferred over Method 2.
P.S: Of course, I would be hosting the online judge engine on a Linux Server.
Don't take this the wrong way, but you will need to look at security from a much higher perspective. The problem will not be the input and output being written to a file, and that should not affect performance too much either. But you will need to manage submissions that can actually take down your process (in the second case) or the whole system (with calls to the OS to write to disk, acquire too much memory, ...).
Disclaimer: I am by no means a security expert.

How do I associate changed lines with functions in a git repository of C code?

I'm attempting to construct a “heatmap” from a multi-year history stored in a git repository where the unit of granularity is individual functions. Functions should grow hotter as they change more times, more frequently, and with more non-blank lines changed.
As a start, I examined the output of
git log --patch -M --find-renames --find-copies-harder --function-context -- *.c
I looked at using Language.C from Hackage, but it seems to want a complete translation unit, expanded headers and all, rather than being able to cope with a source fragment.
The --function-context option is new since version 1.7.8. The foundation of the implementation in v1.7.9.4 is a regex:
PATTERNS("cpp",
/* Jump targets or access declarations */
"!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
/* C/++ functions/methods at top level */
"^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
/* compound type at top level */
"^((struct|class|enum)[^;]*)$",
/* -- */
"[a-zA-Z_][a-zA-Z0-9_]*"
"|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
"|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
This seems to recognize boundaries reasonably well but doesn’t always leave the function as the first line of the diff hunk, e.g., with #include directives at the top or with a hunk that contains multiple function definitions. An option to tell diff to emit separate hunks for each function changed would be really useful.
This isn’t safety-critical, so I can tolerate some misses. Does that mean I likely have Zawinski’s “two problems”?
I realise this suggestion is a bit tangential, but it may help to clarify and rank requirements. This would work for C or C++ ...
Instead of trying to find text blocks which are functions and comparing them, use the compiler to make binary blocks. Specifically, for every C/C++ source file in a change set, compile it to an object. Then use the object code as a basis for comparisons.
This might not be feasible for you, but IIRC there is an option on gcc to compile so that each function is compiled to an 'independent chunk' within the generated object code file. The linker can pull each 'chunk' into a program. (It is getting pretty late here, so I will look this up in the morning, if you are interested in the idea.)
So, assuming we can do this, you'll have lots of functions defined by chunks of binary code, so a simple 'heat' comparison is 'how much longer or shorter is the code between versions for any function?'
I am also thinking it might be practical to use objdump to reconstitute the assembler for the functions. I might use some regular expressions at this stage to trim off the register names, so that changes to register allocation don't cause too many false positives.
I might even try to sort the assembler instructions in the function bodies, and diff them to get a pattern of "removed" vs "added" between two function implementations. This would give a measure of change which is pretty much independent of layout, and even somewhat independent of the order of some of the source.
So it might be interesting to see if two alternative implementations of the same function (i.e. from different change sets) are the same instructions :-)
This approach should also work for C++ because all names have been appropriately mangled, which should guarantee the same functions are being compared.
So, the regular expressions might be kept very simple :-)
Assuming all of this is straightforward, what might this approach fail to give you?
Side Note: This basic strategy could work for any language which targets machine code, as well as VM instruction sets like the Java VM Bytecode, .NET CLR code, etc too.
It might be worth considering building a simple parser, using one of the common tools, rather than just using regular expressions. Clearly it is better to choose something you are familiar with, or which your organisation already uses.
For this problem, a parser doesn't actually need to validate the code (I assume it is valid when it is checked in), and it doesn't need to understand the code, so it might be quite dumb.
It might throw away comments (retaining newlines), ignore the contents of text strings, and treat the program text in a very simple way. It mainly needs to keep track of balanced '{' '}' and balanced '(' ')'; all the other valid program text is just individual tokens which can be passed 'straight through'.
Its output might be a separate file/function to make tracking easier.
If the language is C or C++, and the developers are reasonably disciplined, they might never use 'non-syntactic macros'. If that is the case, then the files don't need to be preprocessed.
Then the parser is mostly just looking for the function name (an identifier) at file scope followed by ( parameter-list ) { ... code ... }
I'd SWAG it would be a few days' work using yacc & lex / flex & bison, and it might be so simple that there is no need for a parser generator.
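As a very rough sketch of how little is needed (this deliberately skips the comment, string and preprocessor handling mentioned above, and only reports brace-delimited blocks rather than real function names; the input file name is a placeholder), tracking brace depth alone is already enough to find where each top-level body starts and ends:

#include <stdio.h>

/* Very dumb scanner: prints the line range of every top-level { ... } block.
   A real version would also skip comments, string literals and preprocessor
   lines, and would capture the identifier before the parameter list as the
   function name. */
int main(void)
{
    const char *path = "example.c";   /* placeholder input file */
    FILE *f = fopen(path, "r");
    if (!f) return 1;

    int c, depth = 0, line = 1, start_line = 0;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n')
            line++;
        else if (c == '{') {
            if (depth == 0)
                start_line = line;    /* a top-level body begins here */
            depth++;
        } else if (c == '}') {
            depth--;
            if (depth == 0)
                printf("block: lines %d-%d\n", start_line, line);
        }
    }
    fclose(f);
    return 0;
}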
If the code is Java, then ANTLR is a possibility, and I think there is a simple Java parser example.
If Haskell is your focus, there may be published student projects which have made a reasonable stab at a parser.