Do you have any idea how to build an OE project with a distributed BitBake compile?
I've thought about distcc.
export PATH=~/distcc/bin:$PATH
make -jn CC=linux-gcc
make will call linux-gcc from my PATH, which points to distcc.
distcc will schedule the compile jobs across all known hosts.
-jn will run n jobs in parallel.
It works fine.
But now I want to use distcc with bitbake.
I know how to use -jn with bitbake.
Just use export PARALLEL_MAKE=-jn
But how do I use export PATH=~/distcc/bin:$PATH with BitBake?
The distcc/bin directory MUST come first in $PATH.
But BitBake places the PATH_prepend value (set in org.openembedded.dev/conf/bitbake.conf) in front of $PATH.
Or does someone know another tool or a better way of doing distributed builds with BitBake?
Try Icecream: https://github.com/icecc/icecream/blob/master/README.md
Like distcc, Icecream takes compile jobs from a build and distributes them among remote machines, allowing a parallel build. But unlike distcc, Icecream uses a central server that dynamically schedules the compile jobs to the fastest free server.
Both OpenEmbedded and Yocto Project support Icecream. See https://git.yoctoproject.org/cgit.cgi/poky/plain/meta/classes/icecc.bbclass
Install Icecream, then add the following to your site.conf or local.conf
INHERIT += "icecc"
# This value overrides PARALLEL_MAKE when ICECC is enabled
# This would enable icecc for local and cross
ICECC_PARALLEL_MAKE = "-j 24"
A slightly more BitBake-ish way of invoking builds with parallelism is to edit local.conf and un-comment BB_NUMBER_THREADS and PARALLEL_MAKE and set their values to twice the number of cores that you have. Now, whenever you invoke BitBake, it will use these values.
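For example, on an 8-core machine the corresponding local.conf entries would look like this (the core count here is only an illustration):
BB_NUMBER_THREADS = "16"
PARALLEL_MAKE = "-j 16"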
I want to measure the time it takes for my C++ video processing program to process a video. I am using CLion to write the program and have CMake set up to compile and automatically run the program with a test video. However, in order to find the execution time I have been using the following command in the macOS terminal:
% time ./main ../Media/test_video.mp4
Is there a way to configure CMake to automatically include time in the execution of ./main to streamline my process further?
So far I've tried using set(CMAKE_ARGS time "test_video.mp4") and some command-line argument functions, but they don't behave the way I'm looking for.
It is possible to use add_custom_target to do what you want. I won't consider this option in depth, as it seems like abusing the build system for something it wasn't designed to do. Yet it may have an advantage over using a CLion configuration: it would be available outside of CLion. That advantage seems minor: why not run the desired command directly in those contexts?
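For reference, a minimal sketch of that add_custom_target route (the target name timed_run is an invented example; adjust the /usr/bin/time path and the media path to your layout):
# Custom target that builds main and then runs it under time
add_custom_target(timed_run
    COMMAND /usr/bin/time $<TARGET_FILE:main> ${CMAKE_SOURCE_DIR}/Media/test_video.mp4
    USES_TERMINAL
    COMMENT "Timing main against the test video"
)
add_dependencies(timed_run main)  # ensure main is rebuilt before timing
You would then build the timed_run target (from CLion's target list or with cmake --build . --target timed_run) instead of running ./main directly.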
The first CLion method would be to define an external tool which runs time on the current build target. In File|Settings...|Tools|External Tools, define a new tool with /bin/time as the program and $CMakeCurrentProductFile$ $Prompt$ as the arguments. When choosing that tool (in Tools|External Tools), it will now prompt you for the arguments and then run /bin/time on the current target with the provided arguments. Advantage: you only have to define the tool once; it will be available in every project. Drawbacks: the external tools are available in every project, so it doesn't make sense to be more specific than $Prompt$ for the arguments and the working directory; it isn't possible to specify environment variables; it isn't possible to enforce a build before running the command.
The second CLion method would be to define a Run/Debug Configuration. Again use /bin/time as the program (choose "Custom Executable"), and specify $CMakeCurrentProductFile$ as the first argument (here it makes sense to provide the other arguments as desired, but note that $Prompt$ is still a valid choice if needed). Advantages: it makes sense to be as specific as needed; you get all the features of run configurations (environment variables, input redirection, actions to be executed before launch). Drawback: it doesn't work with other CLion features which assume that the program is the target, such as the debugger and the profiler, so you may have to duplicate configurations to get them.
Note that the methods aren't exclusive: you can define an external tool and also add run configurations for the cases where they are more convenient.
What is the idiomatic way to write a docker file for building against many different versions of the same compiler?
I have a project which tests against a wide range of versions of different compilers, like gcc and clang, as part of a CI job. At some point, the agents for the CI tasks were updated/changed, resulting in newer jobs failing -- and so I've started looking into dockerizing these builds to try to guarantee better reliability and stability.
However, I'm having some difficulty understanding what a proper and idiomatic approach is to producing build images like this without a large amount of duplication across layers.
For example, let's say I want to build using the following toolset:
gcc 4.8, 4.9, 5.1, ... (various versions)
cmake (latest)
ninja-build
I could write something like:
# syntax=docker/dockerfile:1.3-labs
# Parameterizing here possible, but would cause bloat from duplicated
# layers defined after this
FROM gcc:4.8
ENV DEBIAN_FRONTEND noninteractive
# Set the work directory
WORKDIR /home/dev
COPY . /home/dev/
# Install tools (cmake, ninja, etc)
# this will cause bloat if the FROM layer changes
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
# Default command is to use CMake
CMD ["cmake"]
However, the installation of tools like ninja-build and cmake occur after the base image, which changes per compiler version. Since these layers are built off of a different parent layer, this would (as far as I'm aware) result in layer duplication for each different compiler version that is used.
One alternative to avoid this duplication could hypothetically be using a smaller base image like alpine with separate installations of the compiler instead. The tools could be installed first so the layers remain shared, and only the compiler changes as the last layer -- however this presents its own difficulties, since it's often the case that certain compiler versions may require custom steps, such as installing certain keyrings.
What is the idiomatic way of accomplishing this? Would this typically be done through multiple docker files, or a single docker file with parameters? Any examples would be greatly appreciated.
I would separate preparing the compiler from doing the actual build, so the source doesn't become part of the Docker image.
Prepare Compiler
For preparing the compiler I would take the ARG approach, but without copying the data into the container. If you want fast retries and have enough resources, you can spin up multiple instances at the same time.
# syntax=docker/dockerfile:1.3-labs
ARG COMPILER=gcc:4.8
FROM ${COMPILER}
ENV DEBIAN_FRONTEND noninteractive
# Install tools (cmake, ninja, etc)
# this will cause bloat if the FROM layer changes
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
# Set the work directory
VOLUME /src
WORKDIR /src
CMD ["cmake"]
Build it
Here you have a few options. You could either prepare a volume with the sources or use bind mounts together with docker exec, like this:
#bash style
for compiler in gcc:4.9 gcc:4.8 gcc:5.1
do
docker build -t mytag-${compiler} --build-arg COMPILER=${compiler} .
# place to clean the target folder
docker run -v $(pwd)/src:/src mytag-${compiler}
done
And because the source is not part of the Docker image, you don't have bloat. You can also have two mounts: one for a read-only source tree and one for the output files.
Note: If you remove the cmake CMD you could also spin up the Docker containers in parallel and use docker exec to start the builds. The downside of this is that you have to take care of out-of-source builds to avoid clashes on the output folder.
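A rough sketch of that variant, assuming the containers are kept alive with sleep and each build runs out of source in its own scratch directory (all names here are illustrative):
#bash style, reusing the image tags from the loop above
for compiler in gcc:4.8 gcc:4.9 gcc:5.1
do
    name=$(echo "build-${compiler}" | tr ':.' '__')
    # keep the container running so we can exec into it later
    docker run -d --name "${name}" -v $(pwd)/src:/src:ro mytag-${compiler} sleep infinity
    # out-of-source build inside the container, away from the read-only /src
    docker exec -d "${name}" sh -c "mkdir -p /tmp/out && cd /tmp/out && cmake -G Ninja /src && ninja"
done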
Put an ARG before the FROM and then use the ARG as the FROM, so:
ARG COMPILER=gcc:4.8
FROM ${COMPILER}
# rest goes here
then you
docker build . -t test/clang-8 --build-arg COMPILER=clang-8
or similar.
If you want to automate it, just make a list of compilers and a bash script that loops over the lines in that file, feeding each line into the image tag and the COMPILER build arg.
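For example (compilers.txt is an invented file name holding one base image per line):
#bash style
while read -r compiler; do
    # ':' is not allowed inside an image name, so replace it for the tag
    docker build . -t "test/$(echo ${compiler} | tr ':' '-')" --build-arg COMPILER="${compiler}"
done < compilers.txt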
As for CMake, I'd just do:
RUN wget -qO- "https://cmake.org/files/v3.23/cmake-3.23.1-linux-"$(uname -m)".tar.gz" | tar --strip-components=1 -xz -C /usr/local
When copying, I find it cleaner to do
WORKDIR /app/build
COPY . .
As far as I know, there is no way to do that easily and safely. You could use a RUN --mount=type=cache, but the documentation clearly says that:
Contents of the cache directories persist between builder invocations without invalidating the instruction cache. Cache mounts should only be used for better performance. Your build should work with any contents of the cache directory as another build may overwrite the files or GC may clean it if more storage space is needed.
I have not tried it, but I guess the layers are duplicated anyway; you just save time, assuming the cache is not emptied.
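For completeness, the cache-mount variant of the install step would look roughly like this (it only speeds apt up, it does not deduplicate the resulting layers, and it needs the same # syntax=docker/dockerfile:1.3-labs line at the top of the Dockerfile):
# share the apt caches between builds instead of re-downloading packages
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt update && apt install -y cmake ninja-build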
The other possible solution is similar to the one you mention in the question: starting with the tools installation and then customizing it with the gcc image. Instead of starting with an alpine image, you could start FROM scratch. scratch is basically the empty image; you would COPY the files generated by
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
Then you COPY the entire gcc filesystem. However, I am not sure it will work, because the order of the initial layers is now reversed. This means that some files that were in the upper layer (coming from tools) now end up in the lower layer and could be overwritten. In the comments, I asked you for a working Dockerfile because I wanted to try this out before answering. If you want, you can try this method and let us know. Anyway, the first step is extracting the files created by the tools layer.
How to extract changes from a layer?
Let's consider this Dockerfile and build it with docker build -t test .:
FROM debian:10
RUN apt update && apt install -y cmake && ( echo "test" > test.txt )
RUN echo "new test" > test.txt
Now that we have built the test image, we should find 3 new layers. You mainly have 2 ways to extract the changes from each layer:
the first is to docker inspect the image and then find the IDs of the layers in the /var/lib/docker folder, assuming you are on Linux. Each layer has a diff subfolder containing the changes. Actually, I think it is more complex than this, which is why I would opt for...
skopeo: you can install it with apt install skopeo and it is a very useful tool for operating on Docker images. The command you are interested in is copy, which extracts the layers of an image and exports them as .tar files:
skopeo copy docker-daemon:{image_name}:latest "dir:/home/test_img"
where image_name is test in this case.
Extracting layer content with Skopeo
In the specified folder, you should find some tar files and a configuration file (look at the skopeo copy command output and you will know which one it is). Then extract each {layer}.tar into a different folder and you are done.
Note: to find the layer containing your tools, just open the configuration file (maybe using jq, because it is JSON) and take the diff_id that corresponds to the RUN instruction you find in the history property. You should understand it once you open the JSON configuration. This is unnecessary if you have a small image that has, for example, debian as the parent image and a single RUN instruction containing the tools you want to install.
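As a hypothetical example (<config> stands for the configuration file the skopeo output pointed at; the fields follow the OCI image config format):
# list the layer diff_ids, in order
jq '.rootfs.diff_ids' /home/test_img/<config>
# list only the history entries that actually created a layer, in the same order
jq '[.history[] | select(.empty_layer != true) | .created_by]' /home/test_img/<config>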
Get GCC image content
Now that we have the tools layer content, we need to extract the gcc filesystem. We don't need skopeo for this one; docker export is enough:
create a container from gcc (with the tag you need):
docker create --name gcc4.8 gcc:4.8
export it as tar:
docker export -o gcc4.8.tar gcc4.8
finally, extract the tar file, for example as in the sketch below.
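For instance (the gcc_4.x folder name just matches the Dockerfile below):
mkdir -p gcc_4.x
tar -xf gcc4.8.tar -C gcc_4.x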
Putting all together
The final Dockerfile could be something like:
FROM scratch
COPY ./tools_layer/ /
COPY ./gcc_4.x/ /
In this way, the tools layer is always reused (unless you change the content of that folder, of course), but you can parameterize the gcc_4.x with the ARG instruction for example.
Read carefully: all of this is not tested but you might encounter 2 issues:
the gcc image overwrites some files you have changed in the tools layer. You could check whether this happens by computing the diff between the gcc layer folder and the tools layer folder. If it happens, you can only keep track of those files and add them back in the Dockerfile after the COPY ./gcc ... with another COPY.
When a file is removed in an upper layer, Docker marks that file with a .wh. (whiteout) prefix (not sure if it is different with skopeo). If in the tools layer you delete a file that exists in the gcc layer, then that file will not be deleted using the above Dockerfile (the COPY ./gcc ... instruction would overwrite the .wh. marker). In this case too, you would need to add an additional RUN rm ... instruction.
This is probably not the correct approach if you have a more complex image than the one you are showing us. In my opinion, you could give this a try and just see if it works out with a single Dockerfile. Obviously, if you have many compilers, each with its own tool set, the maintainability of this approach could become a real burden. Instead, if the Dockerfile is more or less linear for all the compilers, this might be good (after all, you do not do this every day).
Now the question is: is avoiding layer replication so important that you are willing to complicate the image-building process this much?
I have 2 applications; both use the same library, but the library should be built with a flag enabled in one and disabled in the other. It is a static library, so there won't be a conflict at runtime. But the library is separate, i.e., the applications are built separately and the library is separate. In each configuration the library is built under a different name, which is taken care of by the makefile. This can be done manually, but now I need to add it to Yocto.
In Yocto, how can I build the same library twice, in separate configurations?
If you're limited to a .bbappend and you don't want to duplicate the recipe, you can add some additional tasks. In these additional tasks (after the regular installation) you can run configuration/compilation/installation once again, but with whatever additional actions or variable overrides you need. Something like this:
do_special_configure() {
oe_runmake clean
export MAGIC_VARIABLE="magic value"
do_configure
}
do_special_compile() {
export MAGIC_VARIABLE="magic value"
do_compile
}
fakeroot do_special_install() {
export MAGIC_VARIABLE="magic value"
do_install
}
do_special_configure[dirs] = "${B}"
do_special_compile[dirs] = "${B}"
do_special_install[dirs] = "${B}"
addtask special_configure after do_install before do_special_compile
addtask special_compile after do_special_configure before do_special_install
addtask special_install after do_special_compile before do_package do_populate_sysroot
If the different configurations really produce different installed files, then you'll have no problems adding two separate recipes that just happen to have the same SRC_URI.
Well, you can't, not without two recipes.
Your two applications can't influence in any way how the library is built. Thus, your options (as long as both applications should be available for the same machine/distro combination) basically are:
Create a second recipe (in this case likely in your own layer, though preferably in the upstream layer; a sketch follows after this list). If the recipe you're copying uses an .inc file and a small .bb that mostly includes that file, you can easily do just the same. Otherwise, your options are to either copy the recipe and modify it, or to have your new recipe
require <PATH_FROM COREBASE-TO-THE-UPSTREAM-RECIPE>/upstream-recipe.bb
If possible, modify the upstream recipe (preferably using a .bbappend) to simultaneously build both versions that you require.
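To make the second-recipe option concrete, a hypothetical sketch could look like this (the recipe name, the require path placeholder, and the flag variable are all made up; adapt them to the real library):
# mylib-altconfig_1.0.bb -- hypothetical second recipe reusing the upstream one
require <PATH_FROM COREBASE-TO-THE-UPSTREAM-RECIPE>/mylib_1.0.bb

# build the alternate configuration; the makefile already gives the library
# a different name in this mode, so the two recipes' outputs don't collide
EXTRA_OEMAKE += "MAGIC_VARIABLE='magic value'"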
I have a project consisting of a set of makefiles that CANNOT be run with make --jobs=N because the dependencies are not specified tightly enough for make to execute the recipes in the correct order (i.e., I get race conditions).
I am currently using Huddle, by Electric-Cloud.com, and it does exactly what I need: it parses the makefile and then executes the jobs in parallel and accounts for the unspecified dependencies.
Question: is there a free or free-er thing that does this?
Yes I know I could re-write the makefiles but project management says "no way".
UPDATE #1
I understand that I'll have to do some work to get functionality similar to Electric-Cloud's functionality.
I know that Electric-Cloud parses the makefile(s) to find the dependencies so wouldn't the same thing be accomplished using makedepend?
I'm thinking:
Run makedepend on existing makefiles
Feed in the output using include <makedepend.output>
make all --jobs=64
UPDATE 2
Turns out makedepend is specific to C/C++: it merely runs the pre-processor on source files and parses any #include statements; not what I need.
I need what this guy is asking for:
Build a makefile dependency / inheritance tree
UPDATE 3
The makefile "dependency graph generator" actually already exists
http://plindenbaum.blogspot.com/2012/11/visualizing-dependencies-of-makefile.html?m=1
but that's not going to help me.
Many of my recipes create directories which are used by other targets' recipes, effectively making them implicit prerequisites.
The dependency graph tool at the above URL works by parsing the build log's statements, but those statements don't reveal the implicit dependencies.
Even if I try to run my makefile with --dry-run, the build fails, because some of the recipes that aren't executed (since it's a dry run) would have created directories that other invocations of make need simply to 'pretend execute' a recipe.
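In other words, the makefiles are missing the usual order-only prerequisite on those directories, something like this sketch (names are made up):
OBJDIR := build/objs

# order-only prerequisite (after the |): the directory must exist before the
# recipe runs, but its timestamp never forces a rebuild
$(OBJDIR)/%.o: %.c | $(OBJDIR)
	$(CC) -c $< -o $@

$(OBJDIR):
	mkdir -p $@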
UPDATE 4
Electric-Cloud has made Huddle - 4 local cores, non-clustered - free for anyone forever.
Furthermore, it outputs an .xml file that lists each job's dependencies, so I can use it to fix my makefiles so they're compatible with the --jobs option.
I am currently using Huddle, by Electric-Cloud.com, and it does exactly what I need: it parses the makefile and then executes the jobs in parallel and accounts for the unspecified dependencies.
I actually don't know about these tools, but can't you provide them with a super makefile under your control that clarifies the inner dependencies of the various targets?
You probably just have to add a level of indirection over these (imported?) projects' directory structure and another Makefile.
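A sketch of what such a super makefile might look like, assuming the imported projects live in subdirectories (the lib and app names are made up):
# Top-level wrapper: state the inter-project ordering explicitly here, then
# let --jobs parallelize safely within each sub-make.
.PHONY: all lib app
all: app

lib:
	$(MAKE) -C lib

# app needs lib, which the inner makefiles never state
app: lib
	$(MAKE) -C app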
Does anyone know how to chain two MapReduce with Pipes API?
I already chained two MapReduce jobs in a previous project in Java, but today I need to use C++. Unfortunately, I haven't seen any examples in C++.
Has someone already done it? Is it impossible?
Use an Oozie Workflow. It allows you to use Pipes along with regular MapReduce jobs.
I finally managed to make Hadoop Pipes work. Here are some steps to get the wordcount examples available in src/examples/pipes/impl/ working.
I have a working Hadoop 1.0.4 cluster, configured following the steps described in the documentation.
To write a Pipes job I had to include the pipes library that is already compiled in the initial package. This can be found in the c++ folder for both 32-bit and 64-bit architectures. However, I had to recompile it, which can be done with the following steps:
# cd /src/c++/utils
# ./configure
# make install
# cd /src/c++/pipes
# ./configure
# make install
Those two commands compile the library for our architecture and create an 'install' directory in /src/c++ containing the compiled files.
Moreover, I had to add the -lssl and -lcrypto link flags to compile my program. Without them I encountered an authentication exception at runtime.
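For reference, the resulting compile/link command looks roughly like this (the source file name is made up; the include and library paths follow the 'install' directory created above, and the hadooppipes/hadooputils library names are assumed from the standard Pipes build):
g++ wordcount.cpp -o wordcount \
    -I/src/c++/install/include \
    -L/src/c++/install/lib \
    -lhadooppipes -lhadooputils -lpthread -lssl -lcrypto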
Thanks to those steps I was able to run the wordcount-simple example that can be found in the src/examples/pipes/impl/ directory.
However, to run the more complex example wordcount-nopipe, I had to take a few more steps. Due to the implementation of the record reader and record writer, we read and write directly on the local file system. That's why we have to specify our input and output paths with file://. Moreover, we have to use a dedicated InputFormat component. Thus, to launch this job I had to use the following command:
# bin/hadoop pipes -D hadoop.pipes.java.recordreader=false -D hadoop.pipes.java.recordwriter=false -libjars hadoop-1.0.4/build/hadoop-test-1.0.4.jar -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat -input file:///input/file -output file:///tmp/output -program wordcount-nopipe
Furthermore, if we look at org.apache.hadoop.mapred.pipes.Submitter.java in version 1.0.4, the current implementation disables the ability to specify a non-Java record reader if you use the InputFormat option.
Thus you have to comment out the line setIsJavaRecordReader(job, true); to make it possible, and recompile the core sources to take this change into account (http://web.archiveorange.com/archive/v/RNVYmvP08OiqufSh0cjR).
if(results.hasOption("-inputformat")) {
  setIsJavaRecordReader(job, true);
  job.setInputFormat(getClass(results, "-inputformat", job, InputFormat.class));
}
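After the change described above, that block would look like this (only the setIsJavaRecordReader call is commented out):
if(results.hasOption("-inputformat")) {
  // setIsJavaRecordReader(job, true);  // commented out so a non-Java record reader can be used
  job.setInputFormat(getClass(results, "-inputformat", job, InputFormat.class));
}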