G++ Compilation Statistics - c++

I was wondering if there is any way to gather statistics from GCC/G++ compilation process. Metrics like the number of lines compiled in the entire process, total time spent compiling, number of errors/warnings, number/size of compiled objects and so on.
I would like to write a script (maybe in Python) to generate statistical information on a daily, weekly and monthly basis.
Any ideas?
Thanks

I know one: it's called CDash, and it's part of a larger suite that also includes CMake, CTest and CPack.
This will probably be an interesting video for you
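If you'd rather start with something lighter than a full CDash/CMake setup, one rough approach is to wrap g++ in a small script that records per-invocation statistics and then aggregate the log later. A minimal sketch (the log path, CSV columns and warning/error counting heuristic are just illustrative choices):

#!/usr/bin/env python
# Sketch of a g++ wrapper that logs per-invocation build statistics.
# Point the build at it (e.g. CXX=gxx-stats.py); names and paths are illustrative.
import csv, subprocess, sys, time
from datetime import datetime

LOG = "/tmp/gxx-stats.csv"  # placeholder location for the running log

def main():
    args = sys.argv[1:]
    start = time.time()
    proc = subprocess.run(["g++"] + args, capture_output=True, text=True)
    elapsed = time.time() - start
    warnings = proc.stderr.count("warning:")
    errors = proc.stderr.count("error:")
    sources = [a for a in args if a.endswith((".cpp", ".cc", ".cxx", ".c"))]
    with open(LOG, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(),
                                " ".join(sources),
                                round(elapsed, 3), warnings, errors])
    # Pass the compiler's output and exit status through untouched.
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    sys.exit(proc.returncode)

if __name__ == "__main__":
    main()

A second script (or cron job) can then roll the CSV up into daily/weekly/monthly summaries, and adding GCC's -ftime-report to the wrapped command line gives a per-pass timing breakdown if wall time per file isn't detailed enough.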

Related

Python Sub-Process Coverage

Situation:
I'm attempting to get coverage reports for a project that uses both C++ and Python. I'm using LCOV/GCOV for C++, and attempting to use Coverage.py for the Python side. The only issue is that most of the Python code being used is simply utility functions, called one function at a time: no initialization, no real life-cycle or exit. So there's no real way to use the API to start/stop/save, or to use the coverage command line to measure.
With this, I thought the easiest way to accomplish it would be the sitecustomize.py method as outlined here. I have gotten that to work, and it measures all configured Python code as expected. Now I'm looking at how to accomplish this with compiled Python code (.pyc).
I can get it to work if I keep the source (.py) and compiled (.pyc) files in the same directory when running and then reporting. However, I'm looking for a way to RUN the files and generate the measurement data, and then, at a later time, point to the actual source files and run the actual reports. Ideally I wouldn't need the source (.py) files at all, but I haven't found a way to accomplish this.
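(For reference, the sitecustomize.py hook itself is tiny; roughly the following, with the COVERAGE_PROCESS_START environment variable pointing at the .coveragerc to use. Where it sits on sys.path is up to you.)

# sitecustomize.py -- must be importable by the target interpreter
# (e.g. dropped into its site-packages directory); relies on the
# COVERAGE_PROCESS_START environment variable naming the .coveragerc.
import coverage
coverage.process_startup()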
Objective:
In the end I want to be able to compile the Python files (.pyc), install them on the target, and run coverage as stated above. That will generate coverage data files; I'll then pull those files to my host machine, which houses the source (.py) files, and do the actual coverage reporting.
Is this possible currently?
[Edit] Thanks to Ned's advice, I looked into the [paths] usage, and it worked exactly how I needed it to.
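For anyone else who ends up here, the relevant piece of the .coveragerc looks roughly like this (directory names are illustrative). The first entry under source is the canonical source tree on the host; the later entries are the paths the .pyc files ran from on the target, and coverage combine remaps the measured paths back onto the first entry before reporting.

[paths]
source =
    /home/me/project/src/
    /opt/target/app/lib/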

Handling really large multi language projects

I am working on a really large multi-language project (1000+ classes, plus configs and scripts), with files distributed over network drives. I am having trouble fighting through the code, since the available tools are not helping. The main problem is finding things. For the C++ part: VS with VAX can only find files and symbols which are in the solution, and a lot of them are not. Same problem with ReSharper. Right now I am stuck doing unindexed string and file searches, which is highly inefficient on a network drive. I heard that SourceInsight would be an option, since it allows you to just specify the folders that are part of the project and then indexes them, but my company won't spend money on it.
So my question is: what tools are available to fight through an incredibly large amount of code? If possible, they should be low-cost or even free/open source.
Check out -
ctags
cscope
idutils
snavigator
In every one of these tools, you would have to invest(*) some time in reading the documentation, and then building your index. Consider switching to an editor that will work with these tools.
(*): I do mean invest, because it will reap dividends once you do.
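For example, the initial index builds are usually just one command each, run from the source root; a rough sketch (check each tool's man page for the exact flags you want):
ctags -R .
cscope -Rbq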
hope this helps,
If you need to maintain a large amount of code, you really should have a source code management system; a lot of them will help you find text by indexing all the files. And most of them will work with various languages.
Otherwise you can install some indexer like Apache Lucene and index all your files...
You should take a look at LXR. This is used by many Linux kernel source listings.
Try ndexer (http://code.google.com/p/ndexer/), which promises to handle extremely large codebases!
The Perl program ack is also worth a look -- think of it as multi-file grep on steroids. The new version (in what I would call late beta) even lets you specify regexes for the files to process as well as regexes to search for -- a feature I've used extensively since it came out (I've got a subproject with 30k lines in 300+ classes, where this feature has been very helpful). You can even chain the new ack with itself so you can subselect the files to process.
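For example, with ack 2 the chaining looks something like this (the patterns are purely illustrative): list candidate files with -g, then feed them back in with -x, which reads the file list from stdin:
ack -g 'Widget.*\.(h|cpp)$' | ack -x 'updateLayout'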
VS with VAX can only find files and symbols which are in the solution. A lot of them are not.
You can add all the files that are not in your solution and set them to not build in the settings. Your VS build will not be affected by this, but now VS knows about those files and you can search them along with your VS native files.

Combine several google-pprof files (pproc CPU profiler)

I want to use CPU Profiler from google-perftools (gperftools' libprofiler.so), which is described here:
http://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html
In my setup I want to run the program to be profiled several times (up to 1500; each run is different) and then combine the pprof outputs from all runs (or from some subset of runs) into a single pprof file.
How can I do this?
PS: My program uses almost no shared libraries, so only a single binary (ELF) file will be analyzed.
PPS: Thanks to Chris: pprof can take several profiles at once:
pprof ./program first.pprof.out second.pprof.out ...
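If it helps, the individual profiles can be produced by giving each run its own CPUPROFILE path (the binary has to be linked against libprofiler for the environment variable to take effect). A rough driver sketch; the binary name, run count and file naming are illustrative:

# Run the profiled binary N times, each with its own CPUPROFILE output,
# then hand all of the profiles to pprof at once.
import os, subprocess

runs = 1500
profiles = []
for i in range(runs):
    prof = "run_%04d.pprof.out" % i
    env = dict(os.environ, CPUPROFILE=prof)
    subprocess.run(["./program"], env=env, check=True)  # each run differs in practice
    profiles.append(prof)

# Combined report over all (or any subset of) the runs:
subprocess.run(["pprof", "--text", "./program"] + profiles, check=True)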

Profiling for wall-time on Linux

I have an application that I want to profile with respect to how much time is spent in various activities. Since this application is I/O intensive, I want to get a report that summarizes how much time is spent in every library/system call (wall time).
I've tried oprofile, but it seems to give time in terms of unhalted CPU cycles (that's CPU time, not real time).
I've tried strace -T, which gives wall time, but the data generated is huge and getting a summary report out of it is difficult (do awk/Python scripts exist for this?).
Now I'm looking at SystemTap, but I can't find any script that is close enough to be modified, and the on-site tutorial didn't help much either. I am not sure if what I am looking for can be done at all.
I need someone to point me in the right direction.
Thanks a lot!
Judging from this commit, the recently released strace 4.9 supports this with:
strace -w -c
They call it "syscall latency" (and it's hard to tell from the manpage alone that that's what -w does).
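If you're stuck on an older strace, a rough per-syscall summary can be scraped from strace -T output (the trailing <seconds> field on each line). A minimal sketch, assuming the default single-process output format:

# Sum the wall time that strace -T reports for each syscall.
# Usage: strace -T -o trace.log ./your_app ; python summarize.py trace.log
import re, sys
from collections import Counter

totals, counts = Counter(), Counter()
pattern = re.compile(r'^(\w+)\(.*<([\d.]+)>\s*$')
with open(sys.argv[1]) as f:
    for line in f:
        m = pattern.match(line)
        if m:
            totals[m.group(1)] += float(m.group(2))
            counts[m.group(1)] += 1

for name, total in totals.most_common():
    print("%-20s %10.6f s  (%d calls)" % (name, total, counts[name]))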
Are you doing this just out of measurement curiosity, or because you want to find time-drains that you can fix to make it run faster?
If your goal is to make it run as fast as possible, then try random-pausing.
It doesn't measure anything, except very roughly.
It may be counter-intuitive, but what it does is pinpoint the code that will result in the greatest speed-up.
See the fntimes.stp systemtap sample script. https://sourceware.org/systemtap/examples/index.html#profiling/fntimes.stp
The fntimes.stp script monitors the execution time history of a given function family (assumed non-recursive). Each time (beyond a warmup interval) is then compared to the historical maximum. If it exceeds a certain threshold (250%), a message is printed.
# stap fntimes.stp 'kernel.function("sys_*")'
or
# stap fntimes.stp 'process("/path/to/your/binary").function("*")'
The last line of that .stp script demonstrates the way to track time consumed in a given family of functions:
probe $1.return { elapsed = gettimeofday_us() - @entry(gettimeofday_us()) }

How to measure the amount of data transmitted by my MPI program?

I'm experimenting with my distributed clustering algorithm (implemented with MPI) on 24 computers that I set up as a cluster using BCCD (Bootable Cluster CD), which can be downloaded at http://bccd.net/.
I've written a batch program to run my experiment, which consists of running my algorithm several times while varying the number of nodes and the size of the input data.
I want to know the amount of data used in the MPI communications for each run of my algorithm, so I can see how it changes when varying the previously mentioned parameters. And I want to do all this automatically using a batch program.
Someone told me to use tcpdump, but I ran into some difficulties with this approach.
First, I don't know how to call tcpdump from my batch program (which is written in C++ and uses the system command for making calls) before each run of my algorithm, since tcpdump requires another terminal to run in parallel with my application. And I can't run tcpdump on another computer, since the network uses a switch. So I need to run it on the master node.
Second, I watched the traffic with tcpdump while my experiment was running, and I couldn't figure out which port MPI was using. It seems to use many ports. I wanted to know that in order to filter the packets.
Third, I tried capturing whole packets and saving them to a file using tcpdump, and in a few seconds the file was 3.5 MB. But my whole experiment takes 2 days, so the final log file will be huge if I follow this approach.
The ideal approach would be to capture just the size field in the header of the packets and sum it up to obtain the total amount of data transmitted. That way the log file would be much smaller than if I were capturing whole packets. But I don't know how to do it.
Another restriction is that I don't have access to the computers' disks, so I only have RAM and my 4 GB USB flash drive. So I can't have huge log files.
I have already thought about using an MPI tracing or profiling tool such as those mentioned at http://www.open-mpi.org/faq/?category=perftools. I have only tested Sun Performance Analyzer so far. The problem is that I guess it will be difficult, and maybe even impossible, to install those tools on BCCD. In addition, such a tool will make my experiment take longer to finish, since it adds overhead. But if someone is familiar with BCCD and thinks it is a good idea to use one of those tools, please let me know.
I hope someone has a solution.
Approaches like tcpdump won't work anyway if there are multi-core nodes that use shared memory to communicate.
Using something like MPE is almost certainly the way to go. Those tools add very little overhead, and some overhead is always going to be necessary if you want to count messages. You can use mpitrace to write out every MPI call, and parse the resulting text file yourself. By the way, note that MPE is explicitly discussed on the bccd website. MPICH2 comes with MPE built in, but it can be compiled for any implementation. I've only found a very modest overhead for MPE.
IPM is another nice tool that does counting of messages and sizes; you should be able to either parse the XML output, or use the postprocessing tools and just manually integrate the graphs (say, either bytes_rx/bytes_tx by rank, or the message buffer size/count graph). The overhead for IPM is even less than for MPE, and mostly comes after the program has finished running, to do the file I/O.
If you were really super worried about the overhead of either of these approaches, you could always write your own MPI wrappers using the profiling interface that wrap MPI_Send, MPI_Recv, etc., just count the number of bytes sent and received in each process, and output only that total at the end.