Reading/Understanding third-party code

Reading/Understanding third-party code - c++

When you get a third-party library (c, c++), open-source (LGPL say), that does not have good documentation, what is the best way to go about understanding it to be able to integrate into your application?
The library usually has some example programs and I end up walking through the code using gdb. Any other suggestions/best-practicies?
For an example, I just picked one from sourceforge.net, but it's just a broad engineering/programming question:
http://sourceforge.net/projects/aftp/

I frequently use a couple of tools to help me with this:
GNU Global. It generates cross-referencing databases and can produce hyperlinked HTML from source code. Clicking function calls will take you to their definitions, and you can see lists of all references to a function. Only works for C and perhaps C++.
Doxygen. It generates documentation from Javadoc-style comments. If you tell it to generate documentation for undocumented methods, it will give you nice summaries. It can also produce hyperlinked source code listings (and can link into the listings provided by htags).
These two tools, along with just reading code in Emacs and doing some searches with recursive grep, are how I do most of my source reverse-engineering.

One of the better ways to understand it is to attempt to document it yourself. By going and trying to document it yourself, it forces you to really dive in and test and test and test and make sure you know what each statement is doing at what times. Then you can really start to understand what the previous developer may have been thinking (or not thinking for that matter).

Great question. I think that this should be addressed thoroughly, so I'm going to try to make my answer as thorough as possible.
One thing that I do when approaching large projects that I've either inherited or contributing to is automatically generate their sources, UML diagrams, and anything that can ease the various amounts of A.D.D. encountered when learning a new project:)
I believe someone here already mentioned Doxygen, that's a great tool! You should look into it and write a small bash script that will automatically generate sources for the application you're developing in some tree structure you've setup.
One thing that I've haven't seen people mention is BOUML! It's fantastic and free! It automatically generates reverse UML diagrams from existing sources and it supports a variety of languages. I use this as a way to really capture the big picture of what's going on in terms of architecture and design before I start reading code.
If you've got the money to spare, look into Understand for %language-here%. It's absolutely great and has helped me in many ways when inheriting legacy code.
EDIT:
Try out ack (betterthangrep.com), it is a pretty convenient script for searching source trees:)

Familiarize yourself with the information available in the headers. The functions you call will be declared there. Then try to identify the valid arguments and pre-/post-conditions of the functions, as those are your primary guidance (even if they are not documented!). The example programs are your next bet.

If you have code completion/intellisense I like opening up the library and going '.' or 'namespace::' and seeing what comes up. I always find it helpful, you can navigate through the objects/namespaces and see what functionality they have. This is of course assuming its an OOP library with relatively good naming of functions/objects.

There really isn't a silver bullet other than just rolling up your sleeves and digging into the code.
This is where we earn our money.

Three things;
(1) try to run the test or example apps available, set low debug levels, and walk through logs.
(2) use source navigator tool / cscope ( available both on windows and linux) and browse the code to understand the flow.
(3) also in parallel use gdb to walk into code while running test/example apps.

Related

Discovering Symbol Usage

Issue
I have recently found myself working with a large, unfamiliar, multi-department, C++ codebase in need of better organization. I would like to discover a way to map which symbols are used by which source files for any given header. This is in the hope that if only one department uses a given function, then it can be moved out of the shared area and into that department's area.
Attempts
My first thoughts were to use the symbol table: ie. compile the project and dump the symbols for each object file. From there I figured I could simply write a script to check if the symbols from my header file were used. While this approach seems viable, it would require me to create a list of symbols I am looking for from the headers. With my limited knowledge, I am unsure of how to automate such a process, and with hundreds of headers files to test, doing it manually is out of the question.
Questions
Is my approach valid? If so..
What can I use to generate the symbol names from my header file?
If not..
What else can I do?
Additionally, while I am using Linux, most of the development teams work in Windows only environments. What utilities could I use on both platforms?
Any and all help is greatly appreciated.

When I need to clean up APIs I sometimes use information from callcatcher. It basically builds a database of all symbols while compiling and allows you to determine what symbols are used in some build product.
I sometimes also use DXR (code on github, an example installation) to browse what code defined where is used how. In contrast to callcatcher with DXR you can drill down to much finer detail. Setting up DXR is pretty heavy duty, but might be worth it if you have enough code to work with.
On the other side of the spectrum there are tools like cscope. Even though it doesn't work super nicely with C++ code it is still very useful. If you deal with more than a couple 100kloc you will quickly feel limited though.
If I had to pick only one of these tools and would be working on a large code base (>1Mloc) I would definitely pick DXR.

You can get a reasonable start on the information that you've described by using doxygen.
Even for source that doesn't contain the doxygen formatted comments the documentation created can contain a list of places (ie. source files) where a particular symbol is used.
And, as doxygen can be used to generate html documentation, navigating through your source tree becomes trivial. It's can be even better if you enable the dot functionality to generate relationship diagrams for the classes in your source tree.

very old-school, simple, and possibly unix only, but are you aware of etags? there's also gnu global which i think is similar.
the gnu global link refers to the "comparison with similar tools" discussion here which might also be useful.

What generic template processor should I use?

This is a potentially dangerous question because interdisciplinary questions and answers will be biased, but I'll have a stab at it anyway. All in good spirit!
So, here we go. I'm writing a major editing mode for Emacs for the language that it has almost no support for yet. And I'm at the point, where I have to decide on a way to generate project files. Below is the syllabus of the task ahead:
The templates have to represent project directory tree, not only single files.
The resulting files are of various formats, potentially including SGML-like languages, but not limited to this variety. They also have to generate C-like source code and, eLisp source code and plain text files, like README, for example.
The templates must be processed in a batch upon user-initiated action (as in user wants to create a project - several files must be created in the user-appointed directory). It may be beneficial to have an ability to supervise the creation, but this is less important then the ability to run the process entirely automatically.
Bonus features:
The template language has already a user base (with a potential of reuse of existing templates).
The templates can be used for code snippets (contain blanks which are filled interactively once the user invokes code-generating routine while editing the file).
Obvious things like cross-platform-ness, ease of use both through graphical interface and command line.
I made a research, but I won't share my results (yet) so I won't bias the answers. The problem with answering this question is not that the answer is hard to find, but that it is hard to chose one from many.

I'm developing a system based on Mustache for exactly the use case that you've described. The template language itself is a very simple extension of Mustache called Groome.
I also released a command-line tool called Molt that renders Groome templates. I'd be curious to know if it does everything that you need. I'm still adding features to the tool and haven't yet announced it. Thanks.

I went to solve a similar problem several years aback, where I wanted to use Emacs to generate code out of a UML diagram (cogre), and also generate Makefiles from project specifications. I first tried to use Tempo, but when I tried to get the templates to nest, I ran into problems. I also looked into skeleton, but that didn't quite fit the plan either.
I ended up using Google Templates for a little bit, and liked the syntax, and developed SRecode instead, and just borrowed the good bits from Google templates. SRecode was written specifically for machine-generated code. The interaction for template insertion (aka - what tempo was written for) isn't first class in SRecode. For generating code from a data structure, however, it is very robust, and has a lot of features, and automatically filled variables. It works closely with your major mode, and allows many nested templates, with control over the nested dictionary values. There is a subsystem that will use Semantic tags and generate code from them for a couple languages. That means you can parse code in one language with Semantic, and generate code in another language with SReocde using those tags. Nifty! Many parts of CEDET Reference manuals were built that way.
The templates themselves allow looping, if statements, and include statements. There are a couple examples in SRecode for making an 'application', such as the comment writer, and EDE uses it to create Makefiles, which is almost exactly what you are trying to do.

Another option is Generator, which offers “language-agnostic project bootstrapping with an emphasis on simplicity”. Installation requires Node.js and npm.
Generator’s emphasis on simplicity means it is very easy to learn how to make a template. Generator also saves you from having to reference templates by file paths – it looks for templates in ~/.generator.
However, there is no way to write README or LICENSE files for the template itself without those files being copied to the generated project. Also, post-generation commands written in the Makefile will be copied to the generated Makefile, even after they are no longer of use. Finally, the ad-hoc templating language doesn’t provide a way to escape its __lowercasevariables__ – though I can’t think of a language where that limitation would be a problem.

C++: Quickly determine appropriate list of header includes?

Is there any tool or method that can speed up this process?
For instance I just split neatTrick.cpp source file into two separate files neatTrickImplementation.cpp and neatTrickTests.cpp.
What I have to do now is to go through the list of #includes at the top of neatTrick.cpp and determine which of them need to go into the implementation file, and which need to go into the tests file. Some of the headers are required for both of them, some are not. Some may even be completely unnecessary.
I feel like my process (start with nothing, compile, see what's broken, add proper include, compile again, repeat) will produce the most unbloated code but it is so frustratingly slow. I think it'd be great if my IDE could analyze the rest of the headers in my project, see which ones could eliminate the current set of errors, and automate this task for me.

There was a talk by Chandler Carruth on Microsoft's "Going Native" (a C++ conference) where he said that the Clang tooling project had something in the pipeline to solve exactly this problem.
From my understanding, it was presented as something no publically available tool is able to do at the moment and most people were pretty impressed by this.
So: At the moment, there currently is no such tool. In the near future you will probably get something like this as a Clang-based tool to compile for yourself. Long-term, expect this to be a standard feature built upon a Clang toolchain.
(A bit OT: There currently is a discussion on the Clang/LLVM developers list dealing with a tooling/service infrastructure. The tools are not there yet but are under active development, currently by Google engineers, later probably by people in the whole industry and Clang open source community).

During the ACCU conference at Oxford last April, one of the speakers, Peter Sommerlad, demoed exactly this functionality with a plugin for Eclipse CDT, written by one of his students. I don't know if this plugin is already publicly available, but maybe you could drop him an e-mail to ask...

C++: sorting/alphabetizing methods

OK, I have been looking for weeks now. I have looked through Eclipse and Visual Studio, but all the plugins for this sort of thing is for Java or C# and not C++. ReSharper does not work, nor does NArrange. How in the world can I sort my methods in a .cpp file without having to go in and cut and paste by hand (there are hundreds of files and there is not enough time in the world to do that)?
I have tried writing the program myself, but I am not very skilled in scripting and have zero experience in Python. Creating the program in C++ I believe is possible but if there is a simpler way then I would like to know.

I didn't use it but take a look at Regionerate. It is a plugin for Visual Studio. I am sorry, I saw now that it is also only for C#. I thought that it worked with C++ too. Sorry.

I have looked for a long time and talked to many co-workers and am now convinced we should not do it. Too many headaches and one of the developers said he didn't want that because of the way he writes his code. Thank the lord he said something!
If anyone else is looking to do this and trying to find a solution, I would just like to let you know that it is not worth the trouble. If you HAVE to do something like this in C++ then you have got to do it by hand. Pray that you don't have to.

I realize you've concluded you don't want to do this, but just in case someone else does, you might be able to use Doxygen to do the "heavy lifting" and extract the functions from your source.
You can configure Doxygen to extract the code structure from undocumented source files.
You'd then have to extract starting line numbers of the functions from Doxygen's output, sort, and reassemble. It gets messy because you might need to introduce forward declarations.
Thankfully you decided against doing it.

Autocompletion in Vim

In a nutshell, I'm searching for a working autocompletion feature for the Vim editor. I've argued before that Vim completely replaces an IDE under Linux and while that's certainly true, it lacks one important feature: autocompletion.
I know about Ctrl+N, Exuberant Ctags integration, Taglist, cppcomplete and OmniCppComplete. Alas, none of these fits my description of “working autocompletion:”
Ctrl+N works nicely (only) if you've forgotton how to spell class, or while. Oh well.
Ctags gives you the rudiments but has a lot of drawbacks.
Taglist is just a Ctags wrapper and as such, inherits most of its drawbacks (although it works well for listing declarations).
cppcomplete simply doesn't work as promised, and I can't figure out what I did wrong, or if it's “working” correctly and the limitations are by design.
OmniCppComplete seems to have the same problems as cppcomplete, i.e. auto-completion doesn't work properly. Additionally, the tags file once again needs to be updated manually.
I'm aware of the fact that not even modern, full-blown IDEs offer good C++ code completion. That's why I've accepted Vim's lack in this area until now. But I think a fundamental level of code completion isn't too much to ask, and is in fact required for productive usage. So I'm searching for something that can accomplish at least the following things.
Syntax awareness. cppcomplete promises (but doesn't deliver for me), correct, scope-aware auto-completion of the following:
variableName.abc
variableName->abc
typeName::abc
And really, anything else is completely useless.
Configurability. I need to specify (easily) where the source files are, and hence where the script gets its auto-completion information from. In fact, I've got a Makefile in my directory which specifies the required include paths. Eclipse can interpret the information found therein, why not a Vim script as well?
Up-to-dateness. As soon as I change something in my file, I want the auto-completion to reflect this. I do not want to manually trigger ctags (or something comparable). Also, changes should be incremental, i.e. when I've changed just one file it's completely unacceptable for ctags to re-parse the whole directory tree (which may be huge).
Did I forget anything? Feel free to update.
I'm comfortable with quite a lot of configuration and/or tinkering but I don't want to program a solution from scratch, and I'm not good at debugging Vim scripts.
A final note, I'd really like something similar for Java and C# but I guess that's too much to hope for: ctags only parses code files and both Java and C# have huge, precompiled frameworks that would need to be indexed. Unfortunately, developing .NET without an IDE is even more of a PITA than C++.

Try YouCompleteMe. It uses Clang through the libclang interface, offering semantic C/C++/Objective-C completion. It's much like clang_complete, but substantially faster and with fuzzy-matching.
In addition to the above, YCM also provides semantic completion for C#, Python, Go, TypeScript etc. It also provides non-semantic, identifier-based completion for languages for which it doesn't have semantic support.

There’s also clang_complete which uses the clang compiler to provide code completion for C++ projects. There’s another question with troubleshooting hints for this plugin.
The plugin seems to work fairly well as long as the project compiles, but is prohibitively slow for large projects (since it attempts a full compilation to generate the tags list).

as per requested, here is the comment I gave earlier:
have a look at this:
Vim integration to MonoDevelop
for .net stuff at least..
OmniCompletion
this link should help you if you want to use monodevelop on a MacOSX
Good luck and happy coding.

I've just found the project Eclim linked in another question. This looks quite promising, at least for Java integration.

I'm a bit late to the party but autocomplpop might be helpful.

is what you are looking for something like intellisense?
insevim seems to address the issue.
link to screenshots here

Did someone mention code_complete?
http://www.vim.org/scripts/script.php?script_id=1764
But you did not like ctags, so this is probably not what you are looking for...

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js