Processing files through multiple steps with Waf - build

I am trying to create an automated process with Waf to optimize, minify, etc. the source files of a website based on the HTML5 boilerplate's ANT build script. Part of this includes running all the PNG's in the img directory through two utilities, optipng and advpng.
This is my current best attempt at these tasks:
def build(ctx):
ctx(
rule = '${OPTIPNG} -quiet -o5 -i 1 -out ${TGT} ${SRC}',
source = 'img1.png',
target = 'img1-opt.png'
)
ctx(
rule = '${ADVPNG} -z -1 ${SRC}',
source = SOMETHING,
target = SOMETHING ELSE
)
I first run optipng on img1, where my first problem arises. I would like the output file to have the same name as the input file. However, setting target to the same name results in Waf detecting a deadlock. So, I moved ahead by appending a suffix.
advpng is a bit strange in that it doesn't create a new output file: it modifies the input file in place. So, I now need a way to access the output of optipng, which now resides in the build's output directory.
What is the proper way of accomplishing this task?

The first problem is addressed in part by the waf book section 6.3. That is about copying, so you have to tweak it. Part of section 11 on Providing arbitrary configuration files is somewhat relevant, as well.
You have to write some Python to resolve the target file in the build directory. This script works:
top = '.'
out = 'build'
def configure(ctx):
pass
def build(ctx):
txt = ctx.srcnode.find_node('text.txt')
ctx(
rule = 'cp ${SRC} ${TGT}',
source = txt,
target = txt.get_bld() # Or: ctx.bldnode.make_node('text.txt')
)
ctx(
rule = 'wc ${SRC} > ${TGT}',
source = 'text.txt', # Or: txt.get_bld(),
target = 'count.txt'
)
For the second step, waf seemed to resolve the literal source file name in the build directory, but you could also use the bld node from step 1 to resolve it explicitly.

Related

How can I implement a command line option in Bazel to switch between which dependency version is used for building?

For some background, the C++ program I am working on has the possibility to interoperate with some other applications that use various Protobuf versions. In the source code for my program, I have the compiled .pb.cc files from these other applications for the Protobuf interface. These .pb.cc files were compiled with a particular version of Protobuf, and I don't have any control over this. I am using Bazel to build, and I want to be able to specify a Bazel build configuration for my program, which will use a particular version of Protobuf which matches that of one of the possible other applications.
Originally, I wanted to put something in the .bazelrc file so that I can specify a particular version of Protobuf depending on the config, for example:
# in .bazelrc:
build:my_config --protobuf_version=3_20_1
build:my_other_config --protobuf_version=3_21_6
Then from the terminal, I could build with the command
bazel build --config=my_config //path/to/target:target
which would build as if I had typed
bazel build --protobuf_version=3_20_1 //path/to/target:target
At this point, I wanted to use the select() function, as detailed in the Bazel docs for Configurable Build Attributes, to use a particular Protobuf version during building. But, the Protobuf dependencies are all specified in the WORKSPACE file, which is more limited than a BUILD file, and this select() function cannot be used there. So then my idea was to pull in every version of the Protobuf library that I would possibly need, and give them different names in the WORKSPACE file, and then in the BUILD files, use a select() function to choose the correct version. But, the Bazel rule for compiling the proto_library is used as such:
proto_library(
name = "foo",
srcs = ["foo.proto"],
strip_import_prefix = "/foo/bar/baz",
)
I don't see of any opportunity to use a select() function here to specify which Protobuf version's proto_library rule should be used. The proto_library rule is also defined in from the WORKSPACE file with:
load("#rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")
rules_proto_dependencies()
rules_proto_toolchains()
Now, I would say that I am stuck. I don't see a way to specify on the command line which version of Protobuf should be used with the proto_library rule.
In the end, I would like a way to do the equivalent in the WORKSPACE file of
# in WORKSPACE
if my_config:
# specific protobuf version:
http_archive(
name = "com_google_protobuf",
sha256 = "8b28fdd45bab62d15db232ec404248901842e5340299a57765e48abe8a80d930",
strip_prefix = "protobuf-3.20.1",
urls = ["https://github.com/protocolbuffers/protobuf/archive/v3.20.1.tar.gz"],
)
elif my_other_config:
# same as above, but with different version
else:
# same as above, but with default version
According to some google groups discussion, this doesn't seem to be possible in the WORKSPACE file, so I would need to do it in a BUILD file, but the dependencies are specified in the WORKSPACE.
I figured out a way that works that seems to go against Bazel's philosophy, but most importantly does what I want.
The repository dependencies are loaded in the first of two steps, the first involving the WORKSPACE file, and the second involving the BUILD file. Command line flags for the build cannot be normally be directly passed to the WORKSPACE, but it is possible to get some information to the WORKSPACE by setting an environment variable and creating a repository_rule. In the WORKSPACE, this environment variable can be used, for example, to change the url argument to http_archive which specifies the dependency version.
This repository rule is created in a separate file .bzl file, which is then loaded in the WORKSPACE. As a generalized example of how get environment variable values into the WORKSPACE, the following file my_repository_rule.bzl could be created:
# in file my_repository_rule.bzl
def _my_repository_rule_impl(repository_ctx):
# read the particular environment variable we are interested in
config = repository_ctx.os.environ.get("MY_CONFIG_ENV_VAR", "")
# necessary to create empty BUILD file for this rule
# which will be located somewhere in the Bazel build files
repository_ctx.file("BUILD")
# some logic to do something based on the value of the environment variable passed in:
if config.lower() == "example_config_1":
ADDITIONAL_INFO = "foo"
elif config.lower() == "example_config_2":
ADDITIONAL_INFO = "bar"
else:
ADDITIONAL_INFO = "baz"
# create a temporary file called config.bzl to be loaded into WORKSPACE
# passing in any desired information from this rule implementation
repository_ctx.file("config.bzl", content = """
MY_CONFIG = {}
ADDITIONAL_INFO = {}
""".format(repr(config), repr(ADDITIONAL_INFO ))
)
my_repository_rule = repository_rule(
implementation=_my_repository_rule_impl,
environ = ["MY_CONFIG_ENV_VAR"]
)
This can be used in the WORKSPACE as such:
# in file WORKSPACE
load("//:my_repository_rule.bzl", "my_repository_rule ")
my_repository_rule(name = "local_my_repository_rule ")
load("#local_my_repository_rule //:config.bzl", "MY_CONFIG", "ADDITIONAL_INFO")
print("MY_CONFIG = {}".format(MY_CONFIG))
print("ADDITIONAL_INFO = {}".format(ADDITIONAL_INFO))
When a target is built with bazel build, the WORKSPACE will receive the value of the MY_CONFIG_ENV_VAR from the terminal and store it in the Starlark variable MY_CONFIG, and any other additional information determined in the implementation.
The environment variable can be passed by normal means, such as typing in a bash shell, for example:
MY_CONFIG_ENV_VAR=example_config_1 bazel build //path/to/target:target
It can also be passed as a flag with the --repo_env flag. This flag sends an extra environment variable to be available to the repository rules, meaning the following is equivalent:
bazel build --repo_env=MY_CONFIG_ENV_VAR=example_config_1 //path/to/target:target
This can be made easier to switch between by including the following in the .bazelrc file:
# in file .bazelrc
build:my_config_1 --repo_env=MY_CONFIG_ENV_VAR=example_config_1
build:my_config_2 --repo_env=MY_CONFIG_ENV_VAR=example_config_2
So running bazel build --config=my_config_1 //path/to/target:target will show the debug output from the print statements in WORKSPACE as the following:
MY_CONFIG = example_config_1
ADDITIONAL_INFO = foo
If ADDITIONAL_INFO in the rule implementation (in the file my_repository_rule.bzl) were set to a version number such as "3.20.1", then the WORKSPACE could, for example, use this in an http_archive call to pull the desired version of the dependency.
# in file WORKSPACE
if ADDITIONAL_INFO == "3.20.1":
sha256 = "8b28fdd45bab62d15db232ec404248901842e5340299a57765e48abe8a80d930"
http_archive(
name = "com_google_protobuf",
sha256 = sha256,
strip_prefix = "protobuf-{}".format(ADDITIONAL_INFO),
urls = ["https://github.com/protocolbuffers/protobuf/archive/v{}.tar.gz".format(ADDITIONAL_INFO)],
)
Of course, the value of the sha256 kwarg could also be passed in from the repository rule as a separate string variable, or as part of a dictionary, for example.

How to write files to current directory instead of bazel-out

I have the following directory structure:
my_dir
|
--> src
| |
| --> foo.cc
| --> BUILD
|
--> WORKSPACE
|
--> bazel-out/ (symlink)
|
| ...
src/BUILD contains the following code:
cc_binary(
name = "foo",
srcs = ["foo.cc"]
)
The file foo.cc creates a file named bar.txt using the regular way with <fstream> utilities.
However, when I invoke Bazel with bazel run //src:foo the file bar.txt is created and placed in bazel-out/darwin-fastbuild/bin/src/foo.runfiles/foo/bar.txt instead of my_dir/src/bar.txt, where the original source is.
I tried adding an outs field to the foo rule, but Bazel complained that outs is not a recognized attribute for cc_binary.
I also thought of creating a filegroup rule, but there is no deps field where I can declare foo as a dependency for those files.
How can I make sure that the files generated by running the cc_binary rule are placed in my_dir/src/bar.txt instead of bazel-out/...?
Bazel doesn't allow you to modify the state of your workspace, by design.
The short answer is that you don't want the results of the past builds to modify the state of your workspace, hence potentially modifying the results of the future builds. It'll violate reproducibility if running Bazel multiple times on the same workspace results in different outputs.
Given your example: imagine calling bazel run //src:foo which inserts
#define true false
#define false true
at the top of the src/foo.cc. What happens if you call bazel run //src:foo again?
The long answer: https://docs.bazel.build/versions/master/rule-challenges.html#assumption-aim-for-correctness-throughput-ease-of-use-latency
Here's more information on the output directory: https://docs.bazel.build/versions/master/output_directories.html#documentation-of-the-current-bazel-output-directory-layout
There could be a workaround to use genrule. Below is an example that I use genrule to copy a file to the .git folder.
genrule(
name = "precommit",
srcs = glob(["git/**"]),
outs = ["precommit.txt"],
# folder contain this BUILD.bazel file is tool which will be symbol linked, we use cd -P to get to the physical path
cmd = "echo 'setup pre-commit.sh' > $(OUTS) && cd -P tools && ./path/to/your-script.sh",
local = 1, # required
)
If you're passing the name of the output file in when running, you can simply use absolute paths. To make this easier, you can use the realpath utility if you're in linux. If you're on a mac, it is included in brew install coreutils. Then running it looks something like:
bazel run my_app_dir:binary_target -- --output_file=`realpath relative/path/to.output
This has been discussed and explained in a Bazel issue. Recommendation is to use a tool external to Bazel:
As I understand the use-case, this is out-of-scope for building and in the scope of, perhaps, workspace configuration. What I'm sure of is that an external tool would be both easier and safer to write for this purpose, than to introduce such a deep design change to Bazel.
The tool would copy the files from the output tree into the source tree, and update a manifest file (also in the source tree) that lists the path-digest pairs. The sources and the manifest file would all be versioned. A genrule or a sh_test would depend on the file-generating genrules, as well as on this manifest file, and compare the file-generating genrules' outputs' digests (in the output tree) to those in the manifest file, and would fail if there's a mismatch. In that case the user would need to run the external tool, thus update the source tree and the manifest, then rerun the build, which is the same workflow as you described, except you'd run this tool instead of bazel regenerate-autogenerated-sources.

How do I set up scons to build a project that has generated source files?

I'm working on a C++ project that has some hand coded source files, as well as some source and header files that are generated by a command line tool.
The actual source and header files generated are determined by the contents of a JSON file that the tool reads, and so cannot be hardcoded into the scons script.
I would like to set up scons so that if I clean the project, then make it, it will know to run the command line tool to generate the generated source and header files as the first step, and then after that compile both my hand coded files and the generated source files and link them to make my binary.
Is this possible? I'm at a loss as to how to achieve this, so any help would be much appreciated.
Yes, this is possible. Depending on which tool you're using to create the header/source files, you want to check our ToolIndex at https://bitbucket.org/scons/scons/wiki/ToolsIndex , or read our guide https://bitbucket.org/scons/scons/wiki/ToolsForFools for writing your own Builder.
Based on your description you'll probably have to write your own Emitter, which parses the JSON input file and returns the filenames that will finally result from the call. Then, all you need to do is:
# creates foo.h/cpp and bar.h/cpp
env.YourBuilder('input.json')
env.Program(Glob('*.cpp'))
The Glob will find the created files, even if they don't physically exist on the hard drive yet, and add them to the overall dependencies.
If you have further questions or problems arise, please consider subscribing to our User mailing list at scons-users#scons.org (see also http://scons.org/lists.html ).
Thanks to Dirk Baechle I got this working - for anyone else interested here is the code I used.
import subprocess
env = Environment( MSVC_USE_SCRIPT = "c:\\Program Files (x86)\\Microsoft Visual Studio 11.0\\VC\\bin\\vcvars32.bat")
def modify_targets(target, source, env):
#Call the code generator to generate the list of file names that will be generated.
subprocess.call(["d:/nk/temp/sconstest/codegenerator/CodeGenerator.exe", "-filelist"])
#Read the file name list and add a target for each file.
with open("GeneratedFileList.txt") as f:
content = f.readlines()
content = [x.strip('\n') for x in content]
for newTarget in content:
target.append(newTarget)
return target, source
bld = Builder(action = 'd:/nk/temp/sconstest/codegenerator/CodeGenerator.exe', emitter = modify_targets)
env.Append(BUILDERS = {'GenerateCode' : bld})
env.GenerateCode('input.txt')
# Main.exe depends on all the CPP files in the folder. Note that this
# will include the generated files, even though they may not currently
# exist in the folder.
env.Program('main.exe', Glob('*.cpp'))
There's an example at:
https://github.com/SCons/scons/wiki/UsingCodeGenerators
I'll also echo Dirk's suggestion to join the users mailing list.

Adding include path to Waf configuration (C++)

How can I add a include path to wscript?
I know I can declare which files from which folders I want to include per any cpp file, like:
def build(bld):
bld(features='c cxx cxxprogram',
includes='include',
source='main.cpp',
target='app',
use=['M','mylib'],
lib=['dl'])
but I do not want to set it per every file. I want to add a path to "global includes" so it will be included everytime any file will be compiled.
I've found an answer. You have to simply set the value of 'INCLUDES' to list of paths you want.
Do not forget to run waf configure again :)
def configure(cfg):
cfg.env.append_value('INCLUDES', ['include'])
I spent some time working out a good way to do this using the "use" option in bld.program() methods. Working with the boost libraries as an example, I came up with the following. I hope it helps!
'''
run waf with -v option and look at the command line arguments given
to the compiler for the three cases.
you may need to include the boost tool into waf to test this script.
'''
def options(opt):
opt.load('compiler_cxx boost')
def configure(cfg):
cfg.load('compiler_cxx boost')
cfg.check_boost()
cfg.env.DEFINES_BOOST = ['NDEBUG']
### the following line would be very convenient
### cfg.env.USE_MYCONFIG = ['BOOST']
### but this works too:
def copy_config(cfg, name, new_name):
i = '_'+name
o = '_'+new_name
l = len(i)
d = {}
for key in cfg.env.keys():
if key[-l:] == i:
d[key.replace(i,o)] = cfg.env[key]
cfg.env.update(d)
copy_config(cfg, 'BOOST', 'MYCONFIG')
# now modify the new env/configuration
# this adds the appropriate "boost_" to the beginning
# of the library and the "-mt" to the end if needed
cfg.env.LIB_MYCONFIG = cfg.boost_get_libs('filesystem system')[-1]
def build(bld):
# basic boost (no libraries)
bld.program(target='test-boost2', source='test-boost.cpp',
use='BOOST')
# myconfig: boost with two libraries
bld.program(target='test-boost', source='test-boost.cpp',
use='MYCONFIG')
# warning:
# notice the NDEBUG shows up twice in the compilation
# because MYCONFIG already includes everything in BOOST
bld.program(target='test-boost3', source='test-boost.cpp',
use='BOOST MYCONFIG')
I figured this out and the steps are as follows:
Added following check in the configure function in wscript file. This tells the script to check for the given library file (libmongoclient in this case), and we store the results of this check in MONGOCLIENT.
conf.check_cfg(package='libmongoclient', args=['--cflags', '--libs'], uselib_store='MONGOCLIENT', mandatory=True)
After this step, we need to add a package configuration file (.pc) into /usr/local/lib/pkgconfig path. This is the file where we specify the paths to lib and headers. Pasting the content of this file below.
prefix=/usr/local
libdir=/usr/local/lib
includedir=/usr/local/include/mongo
Name: libmongoclient
Description: Mongodb C++ driver
Version: 0.2
Libs: -L${libdir} -lmongoclient
Cflags: -I${includedir}
Added the dependency into the build function of the sepcific program which depends on the above library (i.e. MongoClient). Below is an example.
mobility = bld( target='bin/mobility', features='cxx cxxprogram', source='src/main.cpp', use='mob-objects MONGOCLIENT', )
After this, run the configure again, and build your code.

How do I write a waf file for a custom compiler?

I got sick of looking up the magic symbols in make and decided to try waf.
I'm trying to use calibre to make ebooks and I'd like to create a wscript that takes in a file, runs a program with some arguments that include that file, and produces an output. Waf should only build if the input file is newer than the output.
In make, I'd write a makefile like this:
%.epub: %.recipe
ebook-convert $ .epub --test -vv --debug-pipeline debug
Where % is a magic symbol for the basename of the file and $ a symbol for the output filename (basename.epub).
I could call make soverflow.epub and it would run ebook-convert on soverflow.recipe. If the .recipe hadn't changed since the last build, it wouldn't do anything.
How can I do something similar in waf?
(Why waf? Because it uses a real language that I already know. If this is really easy to use in scons, that's a good answer too.)
I figured out how to make a basic wscript file, but I don't know how to build targets specified on the command-line.
The Waf Book has a section on Task generators. The Name and extension-based file processing section gives an example for lua that I adapted:
from waflib import TaskGen
TaskGen.declare_chain(
rule = 'ebook-convert ${SRC} .epub --test -vv --debug-pipeline debug',
ext_in = '.recipe',
ext_out = '.epub'
)
top = '.'
out = 'build'
def configure(conf):
pass
def build(bld):
bld(source='soverflow.recipe')
It even automatically provides a clean step that removes the epub.