How to read Bazel's binary build event protocol file? - c++

I want to implement fetching of compiler warnings with Bazel (Bazel based build). I know that there are files which can already be used for this. These files are located at:
and are named stderr-XY.
Bazel has the ability to save all build events in a designated file. Note that currently (Bazel 0.14) there are 3 supported formats for that designated file, and those are: text file, JSON file and binary file. This question is related only to binary file.
If I have understood Google's protocol buffers correctly, the workflow for them to be implemented and to work is:
You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files.
Once you've defined your messages, you run the protocol buffer compiler (protoc) for your application's language on your .proto file to generate data access classes.
Include the generated files in your project and use the generated classes in your code, that is, populate, serialize and retrieve protocol buffer messages (e.g. in C++, the language I use, the SerializeToOstream and ParseFromIstream methods serve this purpose).
To conclude this question:
As it is stated here:
"Have Bazel serialize the protocol buffer messages to a file by specifying the option --build_event_binary_file=/path/to/file. The file will contain serialized protocol buffer messages with each message being length delimited."
I do not see how to avoid the fact that a developer who wants to use Bazel's ability to write build events to a binary file needs to know the "format", or more precisely the class architecture, in order to read that binary file. Am I missing something here? Can this be done, and if so, how?
Also, I have tried to use protoc --decode_raw < bazelbepbinary.bin and it says:
Failed to parse input.
All of this was done on Ubuntu 16.04. At the moment I'm not sure which GCC version is installed, but I will add it to the question when I have access to that information.
My side question is: is it possible to capture only those build events which reflect build warnings (without using some kind of filter, e.g. grep, on the generated file)? I have read the documentation and used:
bazel help build --long | grep "relevant_build_event_protocol_keywords"
and was unable to find anything like that in the API.


Boost.Log Configuration Files

I'm adding logging to an old C++ program. After some research, I've decided to use Boost.Log. The documentation is filled with examples of creating sinks and filters. However, I couldn't find any example of a log configuration file.
Is there a way to configure logging from a file that doesn't have to be compiled? Similar to what log4net has? Or Python (well, since Python isn't compiled, anyway...) ?
Eventually I found the official documentation; either it was added recently, or it is so well hidden that I didn't see it before:
Unfortunately, I can't find an exhaustive answer either, but here are some observations:
Certainly it is possible to use a configuration file:
boost::log::init_from_stream(std::basic_istream< CharT > &)
Example of the file (from Boost log severity_logger init_from_stream):
Format="%LineID%: <%Severity%> - %Message%"
From the following link you can identify additional valid setting keys and values (e.g. Destination=TextFile, Filter=, AutoFlush=, FileName=)
Constants in boost's parser_utils.hpp give another idea of keywords that are by default supported by the configuration file (E.g. section [Core] with key DisableLogging).
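Putting the keys mentioned above together, a settings file might look like the following sketch. The sink names (Console, File) and the file name are placeholders chosen for illustration, and which keys are accepted depends on the Boost version and the sink type, so treat this as an assumption to check against your installation rather than a reference.

```ini
[Core]
DisableLogging=false

[Sinks.Console]
Destination=Console
Format="%LineID%: <%Severity%> - %Message%"

[Sinks.File]
Destination=TextFile
FileName="app.log"
AutoFlush=true
Format="%LineID%: <%Severity%> - %Message%"
```

Such a file would be opened as a stream and passed to boost::log::init_from_stream.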
Providing settings for user defined types is described here (with a corresponding snippet of the configuration file at the end of the page):
It seems to me that it is difficult to find a description of the configuration file format entries because the valid entries are derived from the source code implementing the sinks, filters etc. This implementation may be even user defined so it is impossible to give explicit configuration format description.
Maybe you can try to create your configuration programmatically first, and then, when translating it into a configuration file, open separate questions for the particular properties you cannot find out how to set.

How to serialize a XML file to Thrift file ? (to put in HDFS)

For several days I have been reading a lot about Big Data, especially about Thrift and HDFS/Hadoop.
I have a great many XML files which I want to store in an HDFS file system (and afterwards compute statistics etc. from the data in these files).
So I would like to serialize my XML files with Thrift (to validate their structure and to make them durable).
Then, store them in HDFS.
Is it possible (XML => Thrift => HDFS) without using an RPC service?
To test this, I would like to use a Linux VM (for HDFS) and PHP (for Thrift).
Thank you.
You can use the serialization part without the RPC part, yes. Look for "serializer" in the Thrift source tree, you should find some examples. If not for PHP, then for sure for some other languages.
You have to do a little work on your own, because there is no such thing as "the" way to convert XML into Thrift structures. The steps are, roughly, as follows:
define the data structures to hold the XML data as Thrift IDL constructs
generate the desired code using the Thrift Compiler
add the serializer code as needed
put together some code that
reads each XML file
builds the Thrift structures from it
serializes the data and puts them into HDFS
Depending on the layout of your XML data and on the number of XML structures used, this may need some effort. It could be an idea to generate at least the IDL file programmatically by some other tool, maybe even some of the other code needed. Thrift cannot support you with this, although it could be an option - again, depending on your current situation, language and tools available.

Is it a good idea to include a large text variable in compiled code?

I am writing a program that produces a formatted file for the user, but it's not only producing the formatted file, it does more.
I want to distribute a single binary to the end user and when the user runs the program, it will generate the xml file for the user with appropriate data.
In order to achieve this, I want to give the file contents to a char array variable that is compiled in code. When the user runs the program, I will write out the char file to generate an xml file for the user.
const char* buffers = "a xml format file contents, \
this represent many block text \
from a file,...";
I have two questions.
Q1. Do you have any other ideas for how to compile my file contents into binary, i.e, distribute as one binary file.
Q2. Is this even a good idea as I described above?
What you describe is by far the norm for C/C++. For large amounts of text data, or for arbitrary binary data (or indeed any data you can store in a file, e.g. a zip file), you can write the data to a file and link it into your program directly.
An example may be found on sites like this one
I'd recommend keeping the data in a separate file rather than putting it into the binary, unless you have your own reasons. I don't know of other portable ways to put strings into a binary, but your solution seems OK.
However, note that when using \ at the end of a line to form a multi-line string, you have to take care of the indentation: the lines are concatenated from the beginning of the next line, so any leading whitespace there becomes part of the string:
const char* buffers = "a xml format file contents, \
this represent many block text \
from a file,...";
Or you can use another form:
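// Adjacent string literals are concatenated by the compiler;
// the separating spaces must be inside the quotes.
const char *buffers =
"a xml format file contents, "
"this represent many block text "
"from a file,...";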
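If a C++11 compiler is available, a raw string literal is a third form that avoids both the backslash continuations and the per-line quotes. The following is a minimal sketch, not taken from the answers above: the XML content is a placeholder, and the names kXmlTemplate and WriteTemplate are hypothetical.

```cpp
#include <fstream>
#include <string>

// Raw string literal (C++11): newlines and double quotes are kept verbatim.
// The custom delimiter "xml" lets the text itself contain )" sequences.
// The XML below is placeholder content.
static const char kXmlTemplate[] = R"xml(<?xml version="1.0"?>
<config>
  <item name="example"/>
</config>
)xml";

// Writes the embedded text to `path`; returns true if the write succeeded.
bool WriteTemplate(const std::string& path) {
    std::ofstream out(path.c_str(), std::ios::binary);
    out << kXmlTemplate;
    out.flush();
    return out.good();
}
```

Calling WriteTemplate("output.xml") at run time would then produce the file from the single distributed binary, which is what the question asks for.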
My answer probably provides much redundant information for the topic starter, but here is what I'm aware of:
Embedding in source code: the plain C/C++ solution. It is a bad idea because each time you want to change your content, you will need:
It is acceptable only if your content changes very rarely or never, or if build time is not an issue (if your app is small).
Embedding in binary: a few somewhat more flexible solutions for embedding content in executables exist, but none of them is cross-platform (you've not stated your target platform):
Windows: resource files. With most IDEs it is very simple
Linux: objcopy.
MacOS: Application Bundles. Even more simple than on Windows.
You will not need to recompile the C++ file(s), only re-link.
Application virtualization: there are special utilities that wrap all your application resources into a single executable, which then runs in something similar to a virtual machine.
I'm only aware of such utilities for Windows (ThinApp, BoxedApp), but there are probably such things for other OSes too, or even cross-platform ones.
Consider distributing your application in some form of installer: when started, the installer creates all resources and unpacks the executable. It is similar to having the main executable generate everything. This can be a large and complex package or even a simple self-extracting archive.
Of course, the choice depends on what kind of application you are creating, who your target audience is, how you will ship the package to end users, etc. A game targeted at children is not the same as a Unix console utility for C++ coders =)
It depends. If you are writing some small Unix-style utility with no prospect of internationalization, then it's probably fine. You don't want to bloat a distribution with a file no one would ever touch anyway.
But in general it is bad practice, because eventually someone might want to modify this data, and he or she would have to rebuild the whole thing just to fix a typo or the like.
The decision is really up to you.
If you just want to keep your distribution in one piece, you might also find this thread interesting: Store data in executable
Why don't you distribute your application with an additional configuration file? e.g. package your application executable and config file together.
If you do want to make it a single file, try embedding your config file into the executable as a resource.
I see it more as an OS issue than a C/C++ issue. You can add the text to the resource part of your binary/program. In Windows programs, HTML, graphics and even movie files are often compiled into resources that form part of the final binary.
That is handy for possible future translation into another language, plus you can modify the resource part of the binary without recompiling the code.

How to create a dynamic message with Protocol Buffers?

Say we want to create our message without using any preexisting .proto files or the cpp/cxx/h files compiled from them. We want to use protobuf strictly as a library. For example, we have a message description (in some format known only to us): a message called MyMessage has to have MyIntFiels and an optional MyStringFiels. How do we create such a message, for example fill it with simple data, save it to a .bin file, and read its contents back from that binary?
I looked all over the dynamic_message.h documentation, DescriptorPool and so on, but I do not see how to add or remove fields of a message, nor any way to add a message described on the fly to a DescriptorPool.
Can anyone please explain?
Short answer: it can't be used that way.
The overview page of Protobuf says:
XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).
Meaning the whole point of Protobuf is to throw out self-describability in favor of parsing speed, so creating self-describing messages is just not its purpose.
Consider using XML or JSON or any other serialization format. If the protection is needed, you can use symmetric encryption and/or lzip compression.

generate C/C++ command line argument parsing code from XML (or similar)

Is there a tool that generates C/C++ source code from XML (or something similar) to create command line argument parsing functionality?
Now a longer explanation of the question:
I have until now used gengetopt for command line argument parsing. It is a nice tool that generates C source code from its own configuration format (a text file). For instance the gengetopt configuration line
option "max-threads" m "max number of threads" int default="1" optional
among other things generates a variable
int max_threads_arg;
that I later can use.
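For readers who have not used a parser generator, here is a hand-written sketch of roughly what such generated code amounts to. It is not gengetopt's actual output: the names ArgsInfo and ParseArgs are hypothetical, only max_threads_arg and the option names come from the configuration line above, and unknown arguments are simply skipped to keep the sketch short.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a generated options struct.
struct ArgsInfo {
    int max_threads_arg = 1;         // default="1" from the configuration line
    bool max_threads_given = false;  // set when the option appears
};

// Parses "-m N" and "--max-threads N". Other arguments are ignored.
// Returns false if the value is missing or not a whole number.
bool ParseArgs(int argc, const char* const argv[], ArgsInfo& info) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "-m") == 0 ||
            std::strcmp(argv[i], "--max-threads") == 0) {
            if (i + 1 >= argc) return false;  // option without a value
            ++i;
            char* end = nullptr;
            const long value = std::strtol(argv[i], &end, 10);
            if (end == argv[i] || *end != '\0') return false;  // not numeric
            info.max_threads_arg = static_cast<int>(value);
            info.max_threads_given = true;
        }
    }
    return true;
}
```

The point of this shape is that each option becomes a typed struct member, so a misspelled option name in the program is a compile error rather than a failed lookup at run time.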
But gengetopt doesn't provide me with this functionality:
A way to generate Unix man pages from the gengetopt configuration format
A way to generate DocBook or HTML documentation from the gengetopt configuration format
A way to reuse C/C++ source code and to reuse gengetopt configuration lines when I have multiple programs that share some common command line options
Of course gengetopt can provide me with a documentation text by running
command --help
but I am searching for marked up documentation (e.g. HTML, DocBook, Unix man pages).
Do you know if there is any C/C++ command line argument tool/library with a liberal open source license that would suit my needs?
I guess that such a tool would use XML to specify the command line arguments. That would make it easy to generate documentation in different formats (e.g. man pages). The XML file should only be needed at build time to generate the C/C++ source code.
I know it is possible to use some other command line argument parsing library to read a configuration file in XML at runtime, but I am looking for a tool that generates C/C++ source code from XML (or something similar) at build time.
Update 1
I would like to do as much of the computation as possible at compile time and as little as possible at run time. So I would like to avoid libraries that give you a map of the command line options, like for instance boost::program_options::variables_map ( tutorial ).
In other words, I prefer args_info.iterations_arg to vm["iterations"].as<int>()
User tsug303 suggested the library TCLAP. It looks quite nice. It would fit my needs to divide the options into groups so that I could reuse code when multiple programs share some common options. Although it doesn't generate source code from an XML configuration file format, I almost marked that answer as the accepted answer.
But none of the suggested libraries fulfilled all of my requirements, so I started thinking about writing my own library. A sketch: a new tool that would take as input a custom XML format and that would generate both C++ code and an XML schema. Some other C++ code is generated from the XML schema with the tool CodeSynthesis XSD. The two chunks of C++ code are combined into a library. One extra benefit is that we get an XML Schema for the command line options and that we get a way to serialize all of them into a binary format (in CDR format generated from CodeSynthesis XSD). I will see if I get the time to write such a library. Better, of course, is to find a library that has already been implemented.
Today I read about user Nore's suggested alternative. It looks promising and I will be eager to try it out when the planned C++ code generation has been implemented. The suggestion from Nore looks to be the closest thing to what I have been looking for.
Maybe this TCLAP library would fit your needs ?
May I suggest you look at this project. It is something I am currently working on: an XSD schema to describe command line arguments in XML. I made XSLT transformations to create bash and Python code, a XUL frontend interface and HTML documentation.
Unfortunately, I do not generate C/C++ code yet (it is planned).
Edit: a first working version of the C parser is now available. Hope it helps
I will add yet another project called protoargs. It generates C++ argument parser code out of protobuf proto file, using cxxopts.
Unfortunately it does not satisfy all of the author's needs: no documentation is generated and there is no compile-time computation. However, someone may find it useful.
UPD: As mentioned in comments, I must specify that this is my own project