How to start a REPL while debugging C++? - c++

How can I get a REPL to evaluate simple things and test things out in the middle of a program's execution in C++?
I have some debugging tools at my disposal, lldb, but as far as I can tell, I can only view things, not test new things on the fly, even though it seems like it should be able to do it but maybe that functionality of the lldb debugger only works for Swift.
It doesn't have to be a complete REPL, but the ability to do some basic things would be so incredibly helpful. I am working with a massive binary in a new codebase with all sorts of types, classes, composition, and inheritance going on that's overwhelming and undocumented, and every test and repeat of the 50GB binary is about 10 minutes which is painful (running unit tests takes the same 10 minutes). It would be great if I could test statements out in a REPL on a breakpoint in the middle of the execution, one statement after another to play around and figure things out faster.
I'm coming from a drop into ipython/ipdb background, as well as the same lldb for Swift. Very new to C++.

The lldb command expr - aliased to p - should do what you want. expr will run the code you provide "as though it had been inserted in the current context" and will return the expression's result. So for instance, if you have a function and want to see what it returns for various inputs, you can do:
(lldb) expr my_new_function(10, 20, 30)
You can even debug functions while handling such interesting inputs by doing:
(lldb) break set -n my_new_function
(lldb) expr -i 0 -- my_new_function(10, 20, 30)
the -i tells lldb to stop on the breakpoint, by default lldb ignores breakpoints in hand called functions.
You can also define new objects & structs in the expression command, but unlike the Swift REPL mode(*) user-defined objects, functions & variables have to be named with an initial $ to keep them from getting confused with names from your code. For instance, if you have a class A:
(lldb) expr A $myA()
will make a default-constructed object of type A, then you can call methods on it:
(lldb) expr $myA().doSomething()
Note that C++ templates are poorly described in debug information - the current standard only describes instantiations and not the abstract template form. So YMMV when trying to create objects of complex template types. The other danger with C++ is that some functions you want to call may only exist as inlined code, which the debugger can't invoke. This happens for some of the STL types, e.g. the size() method is often not callable on vectors & the like because it's always inlined. That's less of a problem with -O0 code, however.
(*) When you run swift in Terminal w/o arguments to bring up the Swift REPL, you are actually just running lldb in a fancy input mode that feeds what you type to the expr command under the covers. The main difference between the bare expr and the full Swift REPL is that the REPL is focused on making new code, whereas expr is more meant for exploring the code you are stopped in right now, so the name lookup rules are different. There isn't a full C++ REPL mostly because we don't know how to do jobs like "build C++ classes incrementally", which would be required for a real REPL.

Related

Is it possible to start a cling repl from inside lldb?

So I coming from a python world, trying to learn cpp fast. (at least the basics). The thing that I miss most from the python world was - how you can just add breakpoint anywhere in the code and start an interactive repl session with the context.
I am looking for something similar in cpp. I know this not going to be popular in cppworld, but it really helps in prototyping a solution fast! I found https://github.com/tehrengruber/Defrustrator which gives me some more promise.
There is also this: https://github.com/inspector-repl/inspector which looks interesting and is similar to what I am looking for!
Would it help if I embed the cling interpreter inside my program?
Thanks for reading!
EDIT: To clarify, ideally I am seeking something on lines of - how you can enter swift repl while using lldb. I am not sure why CPP community does not see advantage in this. This would be an amazing feature to have! This would encourage a lot of people like myself to take up CPP more readily.
This is not directly answering your question, but if you run your program under lldb and stop somewhere, the lldb expr command is pretty close to a REPL. It doesn't run in a REPL like mode where you are just entering program text, instead you run the "expr" command and either put in the program text as command arguments or hit a return to enter into a little mini-editor. But you can call any methods of any of the objects you have, and can make new objects and in even make new classes and play with them as well as program objects.
There are some provisos. You can create classes in the expr, but you have to do it in one go (C++ is not about incremental building up of classes). Because the expression evaluator is meant primarily for exploring extant code and tries to avoid shadowing program variables, all types & variables defined in the expr command have to be preceded by a $. So for instance:
(lldb) expr
Enter expressions, then terminate with an empty line to evaluate:
1: class $MyClass {
2: public:
3: int doSomething() {
4: return m_ivar1 * m_ivar2;
5: }
6: private:
7: int m_ivar1 = 100;
8: int m_ivar2 = 200;
9: }
(lldb) expr $MyClass $my_var
(lldb) expr $my_var.doSomething()
(int) $0 = 20000
So you can do a fair amount of playing here.
The other funny limitation is that though you can define classes in expr, you can't define free functions. By default the expr command is trying to run expression text "as if it was typed at the point you are stopped in your program", so you will have access to the current frame's ivars, etc. But under the covers that means wrapping the expression in some function, and C/C++ doesn't allow internal functions...
So to define free functions and otherwise more freely add to the state of your program, you can use the --top-level flag to expr to have your expression text evaluated as if it were in a top-level source file. You won't have access to any local state, but you aren't required to use initial $'s and can do some more things that aren't allowed in a function by C.

How can I lint C++ code to find all unused return values?

I would like to statically inspect all calls to non-void functions where the return value is not used.
In effect this would be like applying __attribute__ ((warn_unused_result)) to all non-void functions, but of course for a large project that is not practical to do.
Is there any static analysis tool that can provide this information?
This can be done using clang-query. Here is a shell script that invokes clang-query to find calls that return a value that is not used:
#!/bin/sh
# cmd.sh: Run clang-query to report unused return values.
# When --dump, print the AST of matching syntax.
if [ "x$1" = "x--dump" ]; then
dump="set output dump"
shift
fi
query='m
callExpr(
isExpansionInMainFile(),
hasParent(anyOf(
compoundStmt(),
ifStmt(hasCondition(expr().bind("cond"))),
whileStmt(hasCondition(expr().bind("cond"))),
doStmt(hasCondition(expr().bind("cond")))
)),
unless(hasType(voidType())),
unless(isTypeDependent()),
unless(cxxOperatorCallExpr()),
unless(callee(namedDecl(anyOf(
hasName("memset"),
hasName("setlength"),
hasName("flags"),
hasName("width"),
hasName("__builtin_memcpy")
)))),
unless(equalsBoundNode("cond")))'
clang-query -c="$dump" -c="$query" "$#"
To run this on, say, test1.cc:
$ ./cmd.sh test1.cc --
The basic idea of the query is to look for call expressions whose immediate parent is a compound statement. That is expanded to handle an immediate parent that is a control flow statement, being careful not to report when the call appears as the conditional expression.
Some other complications the query deals with:
This only reports in the main file of a translation unit in order to eliminate the voluminous noise from headers. Remove the isExpansionInMainFile filter to drink from the fire hose.
In C++ templates, we might not know what the type is, so suppress reporting all calls with dependent types.
Some functions like memset have useless or only rarely useful return values. They have to be filtered out to see any useful signal. The list of function names in the query is just the tip of that iceberg.
C++ overloaded operators, including operator<< and operator=, usually return a value, but that value is most often ignored. So suppress reports for all overloaded operators.
I've tested this lightly (with clang-query from clang+llvm-8.0.1) on some files in a utility library of mine, which is how I found some of the things that need to be filtered out for this to be useful. There are probably many more things that need filtering, depending on your application.
The query language is described at https://clang.llvm.org/docs/LibASTMatchersReference.html . See this answer of mine for some more links and information about clang-query.
Cppcheck is a command-line tool that tries to detect bugs that your C/C++ compiler doesn't see, it also includes a web based report generator.
I think there are software can do this like DevExtreme and in social.msdn.microsoft.com in the answer for this question how-to-get-a-warning-for-an-unused-return-value? they mention that Premium and Ultimate versions of visual studio has some tools.
Read this:https://social.msdn.microsoft.com/Forums/vstudio/en-US/4355715a-5af7-4a2b-8aa0-bc2112eaa911/how-to-get-a-warning-for-an-unused-return-value?forum=vclanguage
and this mandatory-error-codes-revisited from: http://www.drdobbs.com/cpp/mandatory-error-codes-revisited/191601612

What is the difference between dprintf vs break + commands + continue?

For example:
dprintf main,"hello\n"
run
Generates the same output as:
break main
commands
silent
printf "hello\n"
continue
end
run
Is there a significant advantage to using dprintf over commands, e.g. it is considerably faster (if so why?), or has some different functionality?
I imagine that dprinf could be in theory faster as it could in theory compile and inject code with a mechanism analogous to the compile code GDB command.
Or is it mostly a convenience command?
Source
In the 7.9.1 source, breakpoint.c:dprintf_command, which defines dprintf, calls create_breakpoint which is also what break_command calls, so they both seem to use the same underlying mechanism.
The main difference is that dprintf passes the dprintf_breakpoint_ops structure, which has different callbacks and gets initialized at initialize_breakpoint_ops.
dprintf stores list of command strings much like that of commands command, depending on the settings. They are:
set at update_dprintf_command_list
which gets called on after a type == bp_dprintf check inside init_breakpoint_sal
which gets called by create_breakpoint.
When a breakpoint is reached:
bpstat_stop_status gets called and invokes b->ops->after_condition_true (bs); for the breakpoint reached
after_condition_true for dprintf is dprintf_after_condition_true
bpstat_do_actions_1 runs the commands
There are two main differences.
First, dprintf has some additional output modes that can be used to make it work in other ways. See help set dprintf-channel, or the manual, for more information. I think these modes are the reason that dprintf was added as a separate entity; though at the same time they are fairly specialized and unlikely to be of general interest.
More usefully, though, dprintf doesn't interfere with next. If you write a breakpoint and use commands, and then next over such a breakpoint, gdb will forget about the next and act as if you had typed continue. This is a longstanding oddity in the gdb scripting language. dprintf doesn't suffer from this problem. (If you need similar functionality from an ordinary breakpoint, you can do this from Python.)

How to modify C++ code from user-input

I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.
I want to replace anything of the form
A->Draw(B1, B2)
with
MyFunc(A, B1, B2).
My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression
My next thought was to call clang as a subprocess, use "-dump-ast" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.
The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the ast in any way that I was able to find.
Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?
What you want is a Program Transformation System.
These are tools that generally let you express changes to source code, written in source level patterns that essentially say:
if you see *this*, replace it by *that*
but operating on Abstract Syntax Trees so the matching and replacement process is
far more trustworthy than what you get with string hacking.
Such tools have to have parsers for the source language of interest.
The source language being C++ makes this fairly difficult.
Clang sort of qualifies; after all it can parse C++. OP objects
it cannot do so without all the environment context. To the extent
that OP is typing (well-formed) program fragments (statements, etc,.)
into the interpreter, Clang may [I don't have much experience with it
myself] have trouble getting focused on what the fragment is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface syntax rewrite rule are convenient but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.
GCC with Melt sort of qualifies in the same way that Clang does.
I'm under the impression that Melt makes GCC at best a bit less
intolerable for this kind of work. YMMV.
Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations
on large scale C++ code bases.
DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have decide how to resolve, see Why can't C++ be parsed with a LR(1) parser? for more discussion] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif not allowed) but that's unlikely a real problem for interactively entered code fragments.
DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text
to feed to the interpreter.
Where it shines in this case is OP can likely write most of his modifications directly as source-to-source syntax rules. For his
example, he can provide DMS with a rewrite rule (untested but pretty close to right):
rule replace_Draw(A:primary,B1:expression,B2:expression):
primary->primary
"\A->Draw(\B1, \B2)" -- pattern
rewrites to
"MyFunc(\A, \B1, \B2)"; -- replacement
and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.
If OP provides a set of such rules, DMS can be asked to apply the entire set.
So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a 3rd party; DMS and its C++ front end are hardly "small" programs. But then modern machines have lots of resources so I think its a question of how badly does OP need to do this.
Try modify the headers to supress the method, then compiling you'll find the errors and will be able to replace all core.
As far as you have a C++ interpreter (as CERN's Root) I guess you must use the compiler to intercept all the Draw, an easy and clean way to do that is declare in the headers the Draw method as private, using some defines
class ItemWithDrawMehtod
{
....
public:
#ifdef CATCHTHEMETHOD
private:
#endif
void Draw(A,B);
#ifdef CATCHTHEMETHOD
public:
#endif
....
};
Then compile as:
gcc -DCATCHTHEMETHOD=1 yourfilein.cpp
In case, user want to input complex algorithms to the application, what I suggest is to integrate a scripting language to the app. So that the user can write code [function/algorithm in defined way] so the app can execute it in the interpreter and get the final results. Ex: Python, Perl, JS, etc.
Since you need C++ in the interpreter http://chaiscript.com/ would be a suggestion.
What happens when someone gets ahold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw-functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.
Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw in there. Then your problem reduces to having your users use your new improved class.
You may again consider the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's the case from what you've told us.
Another possibility might be to use some dynamic linker hackery using LD_PRELOAD to replace the function Draw that gets called at runtime.
There may be a way to accomplish this mostly with regular expressions.
Since anything that appears after Draw( is already formatted correctly as parameters, you don't need to fully parse them for the purpose you have outlined.
Fundamentally, the part that matters is the "SYMBOL->Draw("
SYMBOL could be any expression that resolves to an object that overloads -> or to a pointer of a type that implements Draw(...). If you reduce this to two cases, you can short-cut the parsing.
For the first case, a simple regular expression that searches for any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_\.]", along with the literal expression "->Draw(". This will give you the portion that must be rewritten, since the code following this part is already formatted as valid C++ parameters.
The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine to walk backward through just a complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects, since lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ) then it has to be a valid symbol in this context, so if you find a ) just match nested ) with ( and extract the symbol preceding the nested SYMBOL(...(...)...)->Draw() pattern. This may be possible with regular expressions, but should be fairly easy in normal code as well.
As soon as you have the symbol or expression, the replacement is trivial, going from
SYMBOL->Draw(...
to
YourFunction(SYMBOL, ...
without having to deal with the additional parameters to Draw().
As an added benefit, chained function calls are parsed for free with this model, since you can recursively iterate over the code such as
A->Draw(B...)->Draw(C...)
The first iteration identifies the first A->Draw( and rewrites the whole statement as
YourFunction(A, B...)->Draw(C...)
which then identifies the second ->Draw with an expression "YourFunction(A, ...)->" preceding it, and rewrites it as
YourFunction(YourFunction(A, B...), C...)
where B... and C... are well-formed C++ parameters, including nested calls.
Without knowing the C++ version that your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.
One way is to load user code as a DLL, (something like plugins,)
this way, you don't need to compile your actual application, just the user code will be compiled, and you application will load it dynamically.

Is there a tool that enables me to insert one line of code into all functions and methods in a C++-source file?

It should turn this
int Yada (int yada)
{
return yada;
}
into this
int Yada (int yada)
{
SOME_HEIDEGGER_QUOTE;
return yada;
}
but for all (or at least a big bunch of) syntactically legal C/C++ - function and method constructs.
Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a view lines of code.
My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.
Try Aspect C++ (www.aspectc.org). You can define an Aspect that will pick up every method execution.
In fact, the quickstart has pretty much exactly what you are after defined as an example:
http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf
If you build using GCC and the -pg flag, GCC will automatically issue a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that this function should be written in assembler, and be careful to preserve all registers!
Also, note that this should be passed only in the build phase, not link, or GCC will add in the profiling libraries that normally implement mcount.
I would suggest using the gcc flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another function is called ("__cyg_profile_func_exit") upon exit of the function. Each function is passed a pointer to the function being entered/exited, and the function which called that one.
You can turn instrumenting off on a per-function or per-file basis... see the docs for details.
The feature goes back at least as far as version 3.0.4 (from February 2002).
This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).
This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.
There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.
But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.
People are always asking here for ideas for projects - how about this for one?
I use this regex,
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
to locate the functions and add extra lines of code.
With that regex I also get the function name (group 1) and the arguments (group 2).
Note: you must filter out names like, "while", "do", "for", "switch".
This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.
DMS parses source code (using full langauge front end, in this case for C++),
builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C# program into another with whatever properties you wish. THe transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():function->function
"\visibility \returntype \fnname(int \parametername)
{ \body } "
->
"\visibility \returntype \fnname(int \parametername)
{ Heidigger(\CppString\(\methodname\),
\CppString\(\parametername\),
\parametername);
\body } "
The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.