I'm having performance issues with lcov.
I'm executing a program under seven different profiles, collecting the coverage for each of them and then merging the coverage profiles with lcov:
lcov --rc lcov_branch_coverage=1 -a coverage_1.dat -a coverage_2.dat -a coverage_3.dat -a coverage_4.dat -a coverage_5.dat -a coverage_6.dat -a coverage_7.dat -o coverage_full.dat
However, this is excruciatingly slow. It takes about 10 minutes to combine my 7 profiles, which is actually longer than it takes to compile and run them. Each .dat file is about 1M lines.
The lcov --combine and lcov --remove steps are very slow as well, around 45 seconds each.
Is there any way to speed up this combine step? I can use several threads if necessary and I have plenty of memory. If there are other tools that can do this combination correctly, I'd be interested as well (I've tried converting the files to Cobertura format and doing the merge with a Python script I found, but it crashes).
If there is a complete alternative to lcov, I'm also interested. I've been using gcovr, but with it I have to use several other tools to do the combination, which is not optimal, although it is much faster.
Try fastcov - it will use all available cores in parallel (it can output the report in lcov info format):
https://github.com/RPGillespie6/fastcov
It can also combine files. Note: You need GCC 9+
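For what it's worth, a merge with fastcov might look roughly like the sketch below. The -C/--add-tracefile combine flag and the --lcov output flag are from memory and may differ between versions, so check fastcov --help before relying on this:

fastcov -C coverage_1.dat coverage_2.dat coverage_3.dat coverage_4.dat \
        coverage_5.dat coverage_6.dat coverage_7.dat \
        --lcov -o coverage_full.info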
I've recently been attempting to run my scripts in parallel in a more convenient way than opening several terminal instances and executing the scripts separately.
I've been trying to learn how to use GNU parallel for the past couple of days and I am still a bit clueless; I'm hoping someone can provide a direct example.
Suppose I have a g++-compiled executable called blah.exe and a bash script called blah.sh, each of which runs fine on its own, but I want to execute them in different directories.
I've been reading
https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Working-as-xargs--n1.-Argument-appending
and
https://www.biostars.org/p/182136/
but I am not totally clear about the syntax.
To run these in series, I would do:
for i in 1 2 3 4
do
    cp ./blah.exe directory$i    # cp rather than mv, otherwise the source is gone after the first iteration
    cd directory$i
    ./blah.exe all
    cd ..
done
Similarly:
for i in 1 2 3 4
do
    cp ./blah.sh directory$i
    cd directory$i
    source ./blah.sh all
    cd ..
done
I am trying to understand how I would split this load across 4 logical threads in one command using parallel.
Could someone provide an example for this?
Thank you for your time.
Something like:
parallel --dry-run 'cd directory{}; ../blah.exe all; source ../blah.sh all' ::: {1..4}
No need to copy/move the executable, just run the same one.
No need to cd .. afterwards, as it's a new process each time.
Note this is not multi-threading, it is multi-processing.
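If you want to cap it at exactly 4 concurrent jobs (to match your 4 logical threads), you can pass -j to parallel, for example:

parallel -j4 'cd directory{}; ../blah.exe all; source ../blah.sh all' ::: {1..4}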
If you want to process discontiguous directory numbers, you can use:
parallel ... ::: {1..4} 6 7 {11..14}
If you want to process all directories, you can use:
printf "%s\0" */ | parallel -0 'cd {}; pwd'
If you want to process all directories starting with FRED, you can use:
printf "%s\0" FRED*/ | parallel -0 'cd {}; pwd'
I have Fortran code that has been parallelized with OpenMP. I want to test my code on my PC before running it on HPC. My PC has a dual-core CPU and I work on Linux Mint. I installed gfortran-multilib and this is my script:
#!/bin/bash
### Job name
#PBS -N pme
### Keep Output and Error
#PBS -j eo
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
### Switch to the working directory;
cd $PBS_O_WORKDIR
### Run:
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
What more should I do to run my code?
OK, I changed the script as suggested in the answers:
#!/bin/bash
### Switch to the working directory;
cd Desktop/test
### Run:
OMP_NUM_THREADS=2
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
My code and its executable are in the folder test on the Desktop, so:
cd Desktop/test
Is this correct?
Then I compile my simple code:
      implicit none
!$OMP PARALLEL
      write(6,*) 'hi'
!$OMP END PARALLEL
      end
with the command:
gfortran -fopenmp test.f
and then run by:
./a.out
but only one "hi" is printed as output. What should I do?
(And a question about this site: in a situation like this, should I edit my post or just add a comment?)
You don't need and probably don't want to use that script on your PC, not even to learn how to use such a script, because these scripts are too tied to the specifics of each supercomputer.
I use several supercomputers/clusters and I cannot just reuse a script from one on another, because they differ so much.
On your PC you should just do:
Optionally (it is probably the default anyway) set the number of OpenMP threads to 2:
export OMP_NUM_THREADS=2
Adjust if you need some other number.
cd to the working directory
cd my_working_directory
Your working directory is the directory where you have the required data or where the executable resides. In your case it seems to be the directory where a.out is.
run the damn thing
ulimit -s unlimited
./a.out
That's it.
You can also store the standard output and error output in files
./a.out > out.txt 2> err.txt
to mimic the supercomputer behaviour.
The PBS variables are only set when you run the script using qsub. You probably don't have that on your PC and you probably don't want to have it either.
$PBS_O_WORKDIR is the directory where you run the qsub command, unless you set it differently by other means.
$PBS_NUM_PPN is the number you indicated in #PBS -l nodes=1:ppn=2. The queue system reads that and sets this variable for you.
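For reference, those variables only appear once the script is submitted through the queue system on the cluster, e.g. (the script name here is just a placeholder):

qsub myjob.sh   # PBS then defines PBS_O_WORKDIR, PBS_NUM_PPN, etc. for the job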
The script you posted is for the Portable Batch System (https://en.wikipedia.org/wiki/Portable_Batch_System) queue system. That means that the job you want to run on the HPC infrastructure has to go into the queue system first, and when the resources are available the job will run on the system.
Some of the lines (those starting with #PBS) are directives specific to this queue system. Among them, some allow the user to describe the application's process hierarchy (i.e. number of processes and threads). Also keep in mind that, since all the PBS directives start with #, they are ignored by regular shell script execution. In the case you presented, the relevant directive is
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
which, as the comment indicates, tells the queue system that you want to run 1 process with 2 threads per process. The queue system is likely to pass these parameters to the process launcher (srun/mpirun/aprun/... for MPI apps, in addition to OMP_NUM_THREADS for OpenMP apps).
If you want to run this job on a computer that does not have a PBS queue, you should be aware of at least two things.
1) The following command
### Switch to the working directory;
cd $PBS_O_WORKDIR
will effectively become a plain "cd" (i.e. go to your home directory) because the environment variable PBS_O_WORKDIR is only defined within a PBS job context. So you should change this command (or run another cd just before the execution) to set where you want the job to run.
2) Similarly for PBS_NUM_PPN environment variable,
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
this variable won't be defined if you don't run this within a PBS job context, so you should set OMP_NUM_THREADS to the value you want (2, according to your question) manually.
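Putting those two points together, a minimal local version of your script could look like the sketch below (the directory is taken from your question; adjust as needed):

#!/bin/bash
# no PBS on the PC: set the thread count and working directory by hand
export OMP_NUM_THREADS=2
cd ~/Desktop/test
ulimit -s unlimited
./a.out
echo 'done'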
If you want your Linux box environment to be like an HPC login node, you can do the following:
Make sure that your compiler supports OpenMP, test a simple hello world program with OpenMP flags
Install OpenMPI on your system from your favourite package manager or download the source/binary from the website (OpenMPI Download)
I would not recommend installing a cluster manager like Slurm for your experiments
After you are done, you can execute your MPI programs through the mpirun wrapper
mpirun -n <no_of_cores> <executable>
EDIT:
This assumes that you are running MPI only. Note that OpenMP utilizes the cores as well. If you are running MPI+OpenMP, then n * OMP_NUM_THREADS = cores on a single node.
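For example, on a dual-core machine a hybrid run could be split as 1 MPI process with 2 OpenMP threads (just a sketch; adjust the split to your core count):

export OMP_NUM_THREADS=2
mpirun -n 1 ./a.out   # 1 MPI process x 2 OpenMP threads = 2 cores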
I'm backporting ffmpeg to an older version of Debian.
Everything is going well, but it's so slow.
I am running dpkg-buildpackage -us -uc
with a debian rules file that looks like this:
#!/usr/bin/make -f
%:
	dh $@

override_dh_auto_configure:
	./configure
I notice this is only running on 1 core.
Is there anything like make -j4 that I could use to speed this up?
I've been using this guide, but I don't see anything about speeding up the build step:
https://www.debian.org/doc/manuals/maint-guide/
Sure, you can use -j 4 as an argument to dpkg-buildpackage. It is documented in the man page. The relevant section is:
-jjobs Number of jobs allowed to be run simultaneously, equivalent to
the make(1) option of the same name. Will add itself to
the MAKEFLAGS environment variable, which should cause all
subsequent make invocations to inherit the option. Also adds
parallel=jobs to the DEB_BUILD_OPTIONS environment variable which
allows debian/rules files to use this information for their own
purposes. The parallel=jobs in DEB_BUILD_OPTIONS environment
variable will override the -j value if this option is given.
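So in your case something like the following should run up to 4 jobs in parallel (assuming the upstream build system honours MAKEFLAGS/DEB_BUILD_OPTIONS):

dpkg-buildpackage -us -uc -j4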
I'm trying to use "perf" to see what's using all the CPU in my C++ program on Linux. I want to attach to a running process and get a list of symbols or line numbers that I can then go look at to optimize.
To attach to a process and see live updates of hotspots:
perf top -p $(pidof yourapp)
To attach to a process, then analyse it for later evaluation, do:
perf record -p $(pidof yourapp)
And later:
perf report
For both top and record, you can add --call-graph dwarf to get DWARF-based call graphs.
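For example (using the same placeholder application name as above):

perf record --call-graph dwarf -p $(pidof yourapp)
perf report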
Note that you should compile your application with something like -O2 -g to get optimizations and debug symbols, otherwise you won't know function names, files, line numbers and so on.
I have about 130 lettuce tests which run fine locally, but when Travis runs them it hangs after a few tests.
Here the tests fails at the 8th scenario: https://travis-ci.org/h3/django-editlive/jobs/3945466
And when I remove the last scenario it passes: https://travis-ci.org/h3/django-editlive/builds/3945648
I tried splitting my tests in separate features files, same problem.
It doesn't seem to be caused by a specific scenario, but rather by the number of scenarios run.
According to Travis' docs, common causes for a build to hang are:
- Waiting for keyboard input or other kinds of human interaction
- Concurrency issues (deadlocks, livelocks and so on)
- Installation of native extensions that take a very long time to compile
The only possibility I can see is a concurrency issue... but how can I debug it?
My project is open source so the entire source code is available here:
https://github.com/h3/django-editlive
I have no definitive answer about the problem, but I managed to work around it.
Since I had no output whatsoever, I tried to strace my tests so I could see exactly where they hang.
But the strace output was too big and was trimmed by Travis, so I had to grep -v some lines.
Here's what it looks like in my .travis.yml file:
script:
- "strace -q python project/manage.py harvest 2>&1 | grep -v ENOENT"
ENOENT stands for "No such file or directory". I didn't really need those lines to make sense of the strace output, and filtering them out cut enough lines to let me see where it hung.
Turns out it was hanging on a request to selenium:
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(35146), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
send(4, "POST /hub/session/e7cba641-2842-"..., 359, 0) = 359
I couldn't really replace Selenium, so I took a wild guess and replaced Firefox with Google Chrome to run my tests... et voilà. Tests ran perfectly.
It sucks that I haven't really solved the problem, but debugging remotely on Travis CI is a PITA at best, and with a waiting time of 35 minutes between iterations I have more important things to do.