Has Boost ever been used in a regulated project (FDA, FAA)?

While posting a comment recently, I found myself remarking that, in my experience, Boost is not widely used in regulated industries (FDA, FAA). In fact, I don't know of any project that uses it or has used it. I realize, though, that my experience may be lacking here, so I wanted to know if anybody had knowledge of a project using boost in a medical device or in an aviation flight system (lighting, cabin controls, cockpit equipment, etc.).
I am not sure this is the right place to ask it (maybe some other SO site), but I thought this would be a good place to start.
This is not a question about whether or not boost should be used in these areas, it is a question about anybody knowing if it has been used.
EDIT Some examples projects that might help clarify this: Aircraft cabin lighting systems, cabin management systems, cockpit instrumentation, infusion/food/insulin pumps, dialysis machines, laboratory diagnostic devices, blood center data collection systems, etc. Some are life sustaining or potentially flight critical, some collect data, some collect data used to make medical decisions, etc., but I believe all are covered as regulated devices by the FAA/FDA.
EDIT Outside (did not come with the development chain) libraries are often brought into these types of projects for other purposes (graphics libraries, drivers, USB stacks, etc.) These are treated as SOUP. The use of boost would fall under this approach. Does anybody know of a project where boost was used this way?
EDIT Boost is a very large framework, with multiple components. I'm looking for any part of it that has been used in a project. For example "Boost smart pointers" or "Boost Enable" or "Boost Array" or "Boost Optional", etc. But used as a whole, not in part: not used by looking at the Boost code and re-using the idea, but used as a whole component of the system (i.e. in the legal sense).
This is central to the question, because used in this way means that tradeoffs of handling the SOUP must be dealt with. This may place this question outside the scope of this SO site...not sure.

I would say that a lot of it comes down to how the system designer(s) are handling system safety at an architectural level.
Triplication
For instance if the approach taken is triplicate redundancy with a trusted voter then system trials/testing is going to be the major step in approving the implementation. Suppose that one of the triplicates' development team chose to use Boost. If the system as a whole passed all its test vectors then one could argue that one didn't need to trawl through Boost itself looking for implementation errors. Obviously if all three triplicates had chosen to use Boost then that would be cause for concern, because then the scope of the test vectors becomes unmanageable.
Triplication is a standard approach to handling the problem of using software resources like compilers, libraries and programmers all of which are at risk of error. Boost is just another one of these. One could argue that Boost shared pointers are clearly a way of reducing the risk of programmer error. That in the round is most likely to be beneficial to the system as a whole.
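As a hedged sketch of that architecture (the channel logic and names are invented for illustration), the trusted voter is typically kept tiny so it can be verified exhaustively on its own:

    #include <iostream>
    #include <stdexcept>

    // Stand-ins for three independently developed channels that must compute
    // the same output from the same input.
    int channel_a(int in) { return in * 2; }
    int channel_b(int in) { return in * 2; }
    int channel_c(int in) { return in * 2; }

    // The trusted voter: small on purpose, so it can be tested exhaustively.
    int vote(int a, int b, int c) {
        if (a == b || a == c) return a;
        if (b == c) return b;
        // No majority: a real system would enter a defined safe state here.
        throw std::runtime_error("triplicate disagreement");
    }

    int main() {
        std::cout << vote(channel_a(21), channel_b(21), channel_c(21)) << '\n';
    }

If only one channel pulled in Boost, a latent Boost bug would surface as a minority disagreement and be outvoted, which is exactly the argument above.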
Important, but Not Safety Critical
Where it gets interesting is when triplication is not being used, and one is now in the realms of really having to trust stuff. Interestingly the way a lot of systems seem to get round the problem is to say that ultimately there is a human in control who is supervising and able to intervene in the event of system error.
For example the car industry has a set of programming rules called MISRA. The software for an ABS system is supposed to be written to this rule set, and the development tools are supposed to be set to enforce those rules on the source code. The idea is that this will reduce the risk of undetected bugs to an acceptable level. And because ultimately there is a driver driving the car they can always do their own cadence braking. And thus the car industry has avoided having to have a triplicate implementation of ABS.
They are extending the same philosophy to more complex car systems like adaptive cruise control and self-driving cars. Personally I think that such an extension is unreasonable for self-driving cars. The relevant legislation makes it clear that it is the 'driver's' fault if such a vehicle has a crash (i.e. you are still 'driving' it), but the glossy advertising won't dwell on that important aspect.
It's the same in the world of medical devices; there's supposed to be a nurse or someone monitoring the patient anyway, so the occasional blip is covered by that supervision. The whole thing is very poor anyway; whilst the software for a medical device may have been written, tested, and approved, quite often these things run on embedded Windows XP. They all get networked up and end up being infested with computer viruses, etc. The FDA won't let you have an auto-update system in place - only the device vendor can update it, and of course they can't ever hope to keep up. So you end up with a nicely written, well-tested piece of medical software running on top of an OS installation which has had all the world's hackers running around inside it doing god knows what. I think that the use of Boost in these circumstances is not going to add much to the overall system risk.
So if a MISRA compliant toolchain offered Boost as part of that toolchain then I don't see why that would be any different to a toolchain offering a standard C library. If the toolchain vendor is certifying it then it's no different to the situation with anything else.
There are weaknesses with that approach. In my experience I have come across a MISRA-compliant toolchain in widespread use whose compiler turned out junk object code when all optimisations were turned on. I was actually able to verify this in the disassembly. I then took a look at the source code for the toolchain's standard C library, and it clearly wasn't itself written to the MISRA rule set; furthermore, it contained glaring, horrible bugs.
And yet there is no regulatory block to building, testing and selling a car ABS system using this tool chain so long as you tick the MISRA checkbox in the project settings. Adding Boost to that toolchain would hardly make matters worse.
Safety Critical Without Triplication
The final approach is no triplication and no human supervision. This is really hard because you then need formal proof of the correctness of every component of the tool chain, OS, drivers, chips, etc. AFAIK it's never been done for a truly safety critical system like nuclear reactors, flight control avionics, or other systems that really will definitely kill people if they go wrong.
The only thing that comes close so far as I can tell is Green Hills' compiler suite and their INTEGRITY operating system. They can give you (for a large fee) formal testing and verification evidence for every single line of the OS, all their libraries and their compiler, everything. If one were ever to attempt a truly safety-critical system without triplication, that would be a starting point.
I don't think they've done C++11 yet, though I have added Boost to their toolchain and it worked just fine (it wasn't in a safety-critical system, I hasten to add).
Conclusion
Certainly if outfits like Green Hills, with a well-deserved reputation for reliable and thoroughly tested toolchains, offered Boost, then I think one would be in a good position to use it in a regulated system. However, I doubt that the whole of Boost would be offered that way; they are more likely to follow the compiler standards.
I also know that GCC has in the past been put through formal compiler validation testing so that it could be used in Stuff That Matters. I expect that will get repeated sooner or later for the more recent incarnations that have taken on aspects of Boost.

I think the best answer we can have here is "yes and no." I will try to explain why.
Boost is a huge umbrella for many constituent libraries. Some of them depend on each other in various ways (e.g. when a higher-level library needs features provided by a lower-level part of Boost, like Type Traits). This limits the usefulness of a simple answer to the question, because if three parts of Boost have been used in a regulated project, but they are different parts than the ones you want to use, it is of little value to know this. And we will never know the full answer regarding all parts, because you cannot prove a negative (and there are too many parts to ever expect a "100% yes" answer).
Boost is (and always has been) rapidly evolving. Entirely new libraries are added all the time. ASIO is a big one that didn't exist at all until somewhat recently. This makes it even more difficult to answer the question, because over time there are parts of Boost which are young and not as well tested as others. Additionally, existing libraries sometimes go through backward-incompatible revisions (e.g. "Boost Filesystem 3" not too long ago).
Many parts of Boost end up in projects not by a traditional dependency but rather by copy-pasting code from Boost, and perhaps modifying it to taste (e.g. adding or removing support for specific compilers). Likewise, many parts of Boost end up in projects via the fact that Boost is sort of a proving ground for many new C++ standard library features, such as shared_ptr (C++11) and unordered_map (TR1). Some features which are part of the language today were originally part of Boost, so many people have used "Boost code" without even knowing it.
Note that code does not somehow become safer when it transitions to official status within the language--GCC has had bugs which did not exist in the Boost equivalent implementations of the same concepts. This matters when considering practical questions like "Should we allow the use of Boost in our project or should we restrict ourselves to what the compiler vendor gives us?" If you are thinking of using a feature which has been implemented very recently by your compiler vendor (say, within the past year), you may well be better off using a third-party (e.g. Boost) implementation which is more mature.
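As a minimal sketch of that practical point (USE_STD_SHARED_PTR is an invented project-level macro, not a real Boost or compiler switch): since boost::shared_ptr and std::shared_ptr are near drop-in replacements, a project can centralize the choice and revisit it once the vendor's implementation has matured.

    #ifdef USE_STD_SHARED_PTR
      #include <memory>
      using std::shared_ptr;
      using std::make_shared;
    #else
      #include <boost/shared_ptr.hpp>
      #include <boost/make_shared.hpp>
      using boost::shared_ptr;
      using boost::make_shared;
    #endif

    struct Pump { explicit Pump(int id) : id(id) {} int id; };

    int main() {
        // Client code is identical whichever implementation is selected.
        shared_ptr<Pump> p = make_shared<Pump>(7);
        return p->id == 7 ? 0 : 1;
    }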
Finally, since it seems that the impetus for this question is to gain some reassurance that using Boost is not a bad idea for a production project: I would certainly say that in general using Boost is fine and good, with the huge caveat that you need a local expert in Boost who knows which parts of Boost should not be used in your domain. For example, Boost Spirit, Phoenix, and Wave are examples of libraries which have been in Boost for a while but which very few people truly, deeply understand. It's one thing to use library code you don't fully understand (we all do), but quite another to use code which almost no person on earth understands.
In summary, I don't think anyone will be able to give you the reassurance that you seek that Boost is OK for safety-critical systems. You need to evaluate it on your own, the same as you need to evaluate your own compiler vendor's software, your other third-party dependencies, and the code you write yourself. I have used all four categories of software quite a lot, and in my experience Boost had fewer critical bugs than any of the others, and fewer regressions than either GCC or my own code.

Related

How to find Boost libraries that do not contain any platform-specific code

For our current project, we are thinking to use Boost framework.
However, the project should be truly cross-platform and might be shipped to some exotic platforms. Therefore, we would like to use only Boost packages (libraries) that do not contain any platform-specific code: pure C++ and that's all.
Boost has the idea of header-only packages (libraries).
Can one assume that these packages (libraries) are free from platform specific code?
If not, is there a way to identify these kinds of Boost packages?
All C++ code is platform-specific to some extent. On the one side, there is this ideal concept of "pure standard C++ code", and on the other side, there is reality. Most of the Boost libraries are designed to maintain the ideal situation on the user-side, meaning that you, as the user of Boost, can write platform-agnostic standard C++ code, while all the underlying platform-specific code is hidden away in the guts of those Boost libraries (for those that need them).
But at the core of this issue is the problem of how to define platform-specific code versus standard C++ code in the real world. You can, of course, look at the standard document and say that anything outside of it is platform-specific, but that's nothing more than an academic discussion.
If we start from this scenario: assume we have a platform that only has a C++ compiler and a C++ standard library implementation, and no other OS or OS-specific API to rely on for other things that aren't covered by the standard library. Well, at that point, you still have to ask yourself:
What compiler is this? What version?
Is the standard library implementation correct? Bug-free?
Are those two entirely standard-compliant?
As far as I know, there is essentially no universal answer to this and there are no realistic guarantees. Most exotic platforms rely on exotic (or old) compilers with partial or non-compliant standard library implementations, and sometimes have self-imposed restrictions (e.g., no exceptions, no RTTI, etc.). An enormous amount of "pure standard C++ code" would never compile on these platforms.
Then, there is also the reality that most platforms today, even really small embedded systems, have an operating system. The vast majority of them are POSIX-compliant to some level (except for Windows, but Windows doesn't support any exotic platform anyway). So, in effect, platform-specific code that relies on POSIX functions is not really that bad, since it is likely that most exotic platforms have them, for the most part.
I guess what I'm really getting at here is that this pure dividing line that you have in your mind about "pure C++" versus platform-specific code is really just an imaginary one. Every platform (compiler + std-lib + OS + ext-libs) lies somewhere along a continuum of level of support for standard language features, standard library features, OS API functions, and so on. And by that measure, all C++ code is platform-specific.
The only real question is how wide of a net it casts. For example, most Boost libraries (except for recent "flimsy" ones) generally support compilers down to a reasonable level of C++98 support, and many even try to support as far back as early 90s compilers and std-libs.
To know if a library, part of Boost or not, has wide enough support for your intended applications or platforms, you have to define the boundaries of that support. Just saying "pure C++" is not enough; it means nothing in the real world. You cannot say that you will be using C++11 compilers just after you've taken Boost.Thread as an example of a library with platform-specific code. Many C++11 implementations have very flimsy support for std::thread, while others do better, and that issue is as much of a "platform-specific" issue as using Boost.Thread will ever be.
The only real way to ever be sure about your platform support envelope is to actually set up machines (e.g., virtual machines, emulators, or real hardware) that provide representative worst cases. You have to select those worst-case machines based on a realistic assessment of what your clients may be using, and you have to keep that assessment up to date. You can create a regression test suite for your particular project, using the particular (Boost) libraries, and run that suite on all your worst-case test environments. Whatever doesn't pass the test doesn't pass the test; it's that simple. And yes, you might find out in the future that some Boost library won't work under some new exotic platform, and if that happens you need to either get the Boost dev team to add code to support it, or re-write your code to get around it. But that's what software maintenance is all about, and it's a cost you have to anticipate; such problems will come not only from Boost, but from the OS and from the compiler vendors too! At least, with Boost, you can fix the code yourself and contribute it to Boost, which you can't always do with OS or compiler vendors.
We had "Boost or not" discussion too. We decided not to use it.
We had some untypical hardware platforms to serve with one source code. Especially running boost on AVR was simply impossible because RTTI and exceptions, which Boost requires for a lot of things, aren't available.
There are parts of boost which use compiler specific "hacks" to e.g. get information about class structure.
We tried splitting the packages, but the inter dependency is quite high (at least 3 or 4 years ago).
In the meantime, C++11 was underway and GCC started supporting more and more of it. With that, many reasons to use Boost faded (see "Which Boost features overlap with C++11?"). We implemented the rest of our needs from scratch, with relatively low effort thanks to variadic templates and other TMP features in C++11.
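As a hedged example of the kind of utility we rebuilt (the namespace and names are invented): C++11 shipped without make_unique, and a variadic-template stand-in is only a few lines.

    #include <memory>
    #include <utility>

    namespace my {  // hypothetical in-house utility namespace
    // make_unique only arrived in C++14; this is the usual C++11 substitute.
    template <typename T, typename... Args>
    std::unique_ptr<T> make_unique(Args&&... args) {
        return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
    }
    }

    struct Sensor {
        Sensor(int id, double scale) : id(id), scale(scale) {}
        int id;
        double scale;
    };

    int main() {
        std::unique_ptr<Sensor> s = my::make_unique<Sensor>(42, 0.5);  // no Boost needed
        return s->id;
    }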
After a steep learning curve we have all we need without external libraries.
At the same time, we pondered the future of Boost. We expected the newly standardized C++11 features would be removed from Boost. I don't know the current roadmap for Boost, but at the time our uncertainty made us vote against it.
This is not a real answer to your question, but it may help you decide whether to use Boost. (And sorry, it was too large for a comment.)

Boost library acceptance in industry

I have seen lots of people suggesting the Boost library on Stack Overflow, so I am also thinking of learning it. But today I came across this link: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Boost
I wanted to know about its acceptance in industry at a broader level. My current company also doesn't allow me to use this so I am confused whether to look into this or not.
Parts of the Boost library are currently being accepted into the standard library for C++0x, and Boost is regarded as one of the top libraries, with high industry acceptance. I'm actually unaware of any other library getting accepted into the C++ Standard Library on such a large scale.
"Ten Boost libraries are already included in the C++ Standards Committee's Library Technical Report (TR1) and will be in the new C++0x Standard now being finalized. C++0x will also include several more Boost libraries in addition to those from TR1. More Boost libraries are proposed for TR2."
You should definitely look into this. Don't go by Google or any other large institution. They generally have to work to a subset of any complex language like C++. Hence, they'll have restrictions on which parts they can use so that it's easier to hire and train engineers to use the code base.
Additionally, Boost leverages many aspects of the higher forms of functionality within C++, a case in point being template metaprogramming. Boost provides a safer, though bulkier, form of functions as first-class objects. It adds a more powerful "bind", which works so well with the standard library that I'd be lost without it. Finally, it has tuples and hash tables, both fundamental data types in modern development libraries.
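To make the "bind" point concrete, here is a minimal pre-C++11 sketch (std::function and std::bind are the modern equivalents):

    #include <boost/function.hpp>
    #include <boost/bind.hpp>
    #include <algorithm>
    #include <iostream>
    #include <vector>

    int scale(int factor, int x) { return factor * x; }

    int main() {
        // A function as a first-class object, with its first argument pre-bound.
        boost::function<int (int)> triple = boost::bind(&scale, 3, _1);

        std::vector<int> v;
        v.push_back(1); v.push_back(2); v.push_back(3);
        std::transform(v.begin(), v.end(), v.begin(), triple);  // v is now 3 6 9

        for (std::size_t i = 0; i < v.size(); ++i)
            std::cout << v[i] << ' ';
    }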
In short, I really can't name one reason why you wouldn't want to look at Boost, even just to learn something. It's peer reviewed and largely platform independent. The source code is a treasure trove of information and more advanced programming techniques.
I think the "Who's Using Boost?" web page speaks for itself. Notably: Adobe, McAfee, and Real Networks probably qualify as industry.
My current company also does not allow me to use [boost]. So I am confused whether to look into it or not.
You might want to dig a little further and find out why. As others have said, Boost is a fantastically useful set of open source and peer reviewed libraries of extremely high quality. Look at their development LOC chart for an idea how long and how much $$ it would cost your company to reinvent the wheel.

Why does the chip control the choice of language

I've asked before what language I should learn for embedded development. Most embedded engineers said C and C++ are a must, but also pointed out that it depends on the chip.
Can someone clarify? Is it a compiler issue or what? Do chips come with their own specific compilers (like a C compiler or C++ compiler), and is that why you have to use the language the compiler knows? Is it not possible to code and compile it elsewhere, then burn it to the chip directly in its compiled state? (I think I heard an acquaintance say something to this effect.)
I'm not sure how this works, as clearly I don't know much about embedded systems or how they work. It's probably an easy answer for those of you who know.
Probably, they meant that some toolchains do not support C++. Yes, many chips and boards do come with their own toolchains. Different processors have different instruction sets, which means a different compiler (or more specifically a different backend). That doesn't mean you always have to relearn everything; many of these are based on GCC (often considered the most ported compiler). The final executable/image formats also vary, so you need a specific linker. Most likely, you will be (cross-)compiling the code for the chip on a "regular" computer, then burning it to the chip. However, that doesn't mean you can use a typical compiler and linker targeted at a desktop operating system.
It "depends on the chip" in three possible ways:
Some very constrained architectures are not suited to C++, or at least C++ provides constructs not suited to such architectures and so offers no benefit over C. Most 8-bit devices fall into this category, but by no means all; I have seen useful C++ code implemented on MegaAVR, for example.
Some devices are not supported by a C++ compiler. For example Microchip's dsPIC/PIC24 compiler is C only (third-party tools may have C++ support).
The chip architecture is designed specifically for a particular language; for example INMOS Transputers invariably ran OCCAM.
As well as C and C++, other possibilities are assembler, Forth, Ada, Pascal and many others, but C is almost ubiquitous; few chip vendors will release a new architecture or device without a C compiler being available from day one. For other languages you will generally have to wait until a third party decides to develop one, and that wait may be forever for a niche architecture.
Is it not possible to code and compile it elsewhere, then burn it to the chip directly in its compiled state?
That is called cross-compilation or cross-development, and is the usual development method for embedded systems. Most embedded systems lack the OS, file, performance and memory resources to self-host a compiler, and most developers want the comfort of a sophisticated development environment with IDEs, debuggers etc. in a familiar user-oriented desktop OS.
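As a hedged sketch of that workflow (the toolchain invocation is typical for a bare-metal ARM target, and the register address is invented; this is not a recipe for any particular board):

    // main.cpp - built on a desktop host with a cross toolchain, for example:
    //
    //   arm-none-eabi-g++ -mcpu=cortex-m3 -ffreestanding -nostdlib \
    //       -T board.ld -o app.elf main.cpp
    //   arm-none-eabi-objcopy -O binary app.elf app.bin   # then flash app.bin
    //
    // The address below is a made-up memory-mapped LED register.
    volatile unsigned* const LED = reinterpret_cast<volatile unsigned*>(0x40020014);

    int main() {
        *LED |= 1u;   // turn the LED on
        for (;;) {}   // an embedded main() never returns
    }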
I'm not sure how this works, as clearly I don't know much about embedded systems or how they work.
Get up-to-speed with some of these:
http://www.state-machine.com/arm/Building_bare-metal_ARM_with_GNU.pdf
http://www.eetimes.com/design/embedded
http://www.amazon.com/exec/obidos/ASIN/020179523X
http://www.amazon.com/Embedded-Systems-Firmware-Demystified-CD-ROM/dp/1578200997
Yes, there are many architectures for which a C compiler exists but a C++ compiler does not. The smaller and less fully-featured a processor you choose, the more likely this situation is to occur.
For embedded development, you almost always compile the code 'elsewhere', as you say, and then send it to the chip for execution/debugging. The process of compiling code for a different architecture than the compiler itself is built for is called 'cross-compiling'.
You are correct: chips have variations on compilers. Most/many modern chips have a gcc port; but not all.
The term 'embedded' is used to describe a vast range of hardware. Most embedded software engineering will consist of writing C/C++ code to produce a binary for a target microprocessor, but there are devices that you may work with that are not coded with compiled binary.
One example is a Programmable Logic Controller (PLC). These devices use a language called "Ladder Logic". It's a wonderful language. I have enjoyed working with it in the past.
Another thing you may encounter, as I have in the past, is devices that have interpreted BASIC emulators. Hopefully that is rare today.
C/C++ is a very good choice for firmware development, where the software you make will run on an embedded CPU/microcontroller. In order to program the device properly, you will need to know both the language and the device architecture.
The same code probably will not work on different devices. So you have to learn the language, and the device architecture.
Another option is FPGAs, which are not microcontrollers. FPGAs are devices with specialized cells capable of transforming themselves into any type of synchronous circuit, including a microcontroller. FPGAs are programmed with hardware description languages, like Verilog and VHDL. The "compiled" (synthesized) version of the software is called gateware.
The HDLs are the same languages used for ASIC design as well. The path to properly learning such a language is long, so I recommend starting with C/C++ on a PIC from Microchip, which is a low-cost and widely accepted microcontroller.
If you intend to do FPGA development, the knowledge gained with C/C++/PIC will be helpful and important, because most FPGAs have an embedded CPU/microcontroller inside.
There is no direct scientific reason for it. In a lot of cases it has to do with the management and politics of the specific company.
Some companies are driven to create a turnkey system and force you to buy that system and pay for maintenance. It locks out the individual developers, but there are many companies, and especially government agencies, that prefer this model, because the support is often much better and you can often drive the direction of their products to suit your needs.
Other companies do not have the staff or the talent and outsource the solution and sometimes take whatever they can get. And you might end up with a one time developed tool that after the contractor leaves is never updated or fixed again, or if it is fixed it is a patch job by someone else. It takes money to make money, but if you run out of money before you can sell your product you still fail.
Sometimes you have companies that both maintain their in-house, must-buy-from-them tool AND have individuals that also contribute to open tools like GCC.
Sometimes the politics or management in the company have individuals that have a strong opinion of how the world must be and only allow tools to be developed for a specific language. Or perhaps they are owned by or partner with or just like a company that has a specific language and this chip product came to be simply to support that language.
On top of all of this you have the very real technical problems of memory space, the quality and efficiency of the instruction set and how compiler friendly it is. Some architectures may be fine for assembler, but higher level compiled code chews up the limited memory resources too quickly.
GCC in particular has a lot of problems internally (not the people, but the software/source code itself). I challenge you to write a backend, even with the tutorials that are out there. A company requires specialised talent in order to create and then maintain a GCC backend year after year, otherwise the port gets dumped. If your chip architecture is not 32-bit or bigger, you are already fighting a losing battle with GCC; your chip architecture might be compiler-friendly, but just not friendly with the popular compilers' design.
In the near future LLVM is going to shine as a cross-compiler relative to GCC because it has not yet built up this internal bulk, and perhaps because the internal guts are themselves a defined language/system, it may never suffer what has happened to GCC. As more folks get comfortable with LLVM we will see a number of architectures ported to it. The MSP430 backend was done specifically to demonstrate that you can add a target literally in an afternoon; by the end of next month, some motivated individual could have all of the targets most of us have ever heard of ported to LLVM. And you don't have to build a cross-compiler; it is always a cross-compiler. I only mention LLVM because the door is now open for targets that have suffered from bad tools to recover.
Some companies, microcontroller vendors in particular, can and will make the programming interface proprietary so that you must use their programming tool (or hack it and take your chances with publishing those results, and/or a cat-and-mouse game of them changing it to defeat you). And they may have made tools only for Windows, leaving the Linux and Apple folks hanging in the wind. Or they make it so that the only binaries the device will load are the ones generated by their tools; here again you may hack through the binary format to allow an alternate compiler, and they may or may not work to defeat you.
Despite the technical problems, the biggest factors are a company's politics, management, marketing teams, and supply of (or lack of) talent in the engineering staff. The bottom line: follow the dollars, not the technology or science, to understand why this language is supported and not that one, or why the support for a given language is good, bad, or marginal.
What language to learn as a result of all of this? Start with assembler on at least three different architectures, then C, and then C++ if you feel you really need it. C and assembler are your primary languages for embedded (depending on your definition of embedded). Mind you, we write assembler mostly for initial boot code and to support C - interrupt stuff or special instructions that are needed that the compiler cannot create. There are places like microcontrollers where it may very well make sense to use assembler for various reasons, like tools, limited chip resources, etc. Even if you don't use assembler, knowing it makes you a much better high-level programmer.
You do need to decide what your definition of embedded is. Is it API and library calls for an application on a(n embedded) Linux system (indistinguishable from the same program/calls on a desktop system)? Or, at the other end of the spectrum, are you talking about a microcontroller with maybe 256 or 1024 bytes (not mega or giga, but bytes) of program space? Or something in the middle? The majority of the "embedded" folks out there are closer to the API calls for applications on an operating system (an RTOS, Linux, WinCE, etc.) than to the deeply embedded end, so that means C, maybe C++ (always be able to fall back on C), while avoiding Python and other scripty languages that are resource hogs.
Some 8-bit parts cannot efficiently access data from a stack. Instead of using a stack to pass parameters, auto variables and parameters are statically allocated; typically, a linker allocates the automatic variables for main() at one end of memory, then allocates the variables for functions that are called by main() and nothing else, then allocates the variables for functions that are called by those functions and nothing else, etc. This will yield an optimal allocation fairly easily, subject to some caveats:
Recursion can only be supported by adding code to explicitly copy variables onto some sort of stack arrangement; in many compilers, it's simply not supported at all.
If a function looks like it "might" call another function, the linker will assume it can do so in all cases (e.g. it may be that when 'foo' calls 'bar', one of its parameters might always have a value such that 'bar' won't call 'boz', but the linker won't know that).
Any call to a function pointer with a certain signature will be regarded as a call to all functions with the same signature whose address is taken.
If the evaluation of more than one parameter to a function requires making additional function calls, additional temporary storage must generally be pessimistically allocated even if optimal placement of the parameter storage could have avoided that.
There are many types of C programs for which the above restrictions pose no problem at all, and many more for which they pose a nuisance but not a huge one (e.g. by adding dummy parameters or return values to ensure different classes of indirectly-called functions have different signatures). Unfortunately, the code generated by a C++-to-C pre-compiler will almost always involve function pointers whose call graph cannot be reasonably divined, so using C++ on such a platform is apt to be difficult if not impossible.
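Here is a hedged sketch of that "dummy parameter" workaround (the callback names are invented). Giving each class of indirectly-called functions its own signature keeps their call graphs separate for the linker's static allocator:

    // The linker assumes a call through a function pointer of a given
    // signature can reach every address-taken function with that signature.
    // A dummy 'tag' parameter splits the two callback families into distinct
    // signatures, so their storage is not pessimistically lumped together.
    typedef void (*motor_cb)(int step);          // motor-control callbacks
    typedef void (*ui_cb)(int step, char tag);   // UI callbacks: note dummy arg

    void step_motor(int step) { (void)step; /* drive the coil */ }
    void redraw(int step, char tag) { (void)step; (void)tag; /* update display */ }

    void run(motor_cb m, ui_cb u) {
        m(1);        // only motor-signature functions are reachable here
        u(1, 'u');   // only UI-signature functions are reachable here
    }

    int main() { run(&step_motor, &redraw); }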

Boost advocacy - help needed

Possible duplicates
Is there a reason to not use Boost?
What are the advantages of using the C++ BOOST libraries?
OK, the high-level question is "Please provide me with what you consider to be the most effective arguments for why Boost in its entirety, or some specific parts of it, should be compiled on our company's system and endorsed in software engineering standards".
Details of what I need:
I would gladly accept both positive arguments (why install) and proposed rebuttals of likely counter-arguments I might hear (see question context below).
Arguments should be made aimed at both technical Software Engineering team members and/or very technical senior managers - in other words, for the latter, the details of the argument may/should be technical, but the thrust of the argument should be "how would this make/save the company X money vs losing the company Y money as a cost of adding it to our toolset".
Context of the question:
I'm a developer in a company with several hundred developers, many dozens of whom do C++.
I had the (mis)fortune of being reassigned from my beloved Perl development spot to a team where I am also doing C++ development. So far I have found numerous things that I could easily have done in Perl that are very hard/cumbersome to do in C++ (a foreach loop being one example), and any time I hit one of these, the answer is 50% likely to end up being "You can't do this in standard C++, but you can do it with Boost".
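For instance, here is a minimal sketch of the foreach case (C++11's range-based for later made this built-in):

    #include <boost/foreach.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v;
        v.push_back(1); v.push_back(2); v.push_back(3);

        // Pre-C++11 there is no built-in foreach; BOOST_FOREACH fills the gap.
        BOOST_FOREACH(int x, v) {
            std::cout << x << '\n';
        }
    }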
Our toolkit includes some legacy RogueWave libraries and a VERY limited number of Boost libraries (e.g. no regex, no foreach), of a very old vintage.
Any development must use libraries compiled and vetted by the Software Engineering team. That is a hard and fast rule.
The SE team is somewhat resistant to adding new libraries, for a variety of reasons (e.g. the effort to do this; functionality conflicts with RogueWave, for example for regex; the risk of installing and using any new software; the cost of educating developers, etc.). They will add the libraries if presented with a sufficient business need or a majorly convincing cost/benefit argument, but they have a pretty tough threshold.
So, I'm looking for examples of which parts of Boost are so wonderful (with exact cost/benefit estimates) that installing them would be an Obviously Worth It Effort for Software Engineering.
Thanks in advance for any ideas/suggestions/examples.
Please don't mark this question as subjective as I am looking for measurable answers, not merely wonderful feelings :)
Wherever I worked in the last decade, when they had their own smart pointer class, I found bugs in it - usually within a few weeks. And, no, I never went looking at it hoping to find errors.
I got into the habit of posting the following quote from the TR1 smart pointer proposal:
The Boost developers found a shared-ownership smart pointer exceedingly difficult to implement correctly. Others have made the same observation. For example, Scott Meyers [Meyers01] says:
"The STL itself contains no reference-counting smart pointer, and writing a good one - one that works correctly all the time - is tricky enough that you don't want to do it unless you have to. I published the code for a reference-counting smart pointer in More Effective C++ in 1996, and despite basing it on established smart pointer implementations and submitting it to extensive pre- publication reviewing by experienced developers, a small parade of valid bug reports has trickled in for years. The number of subtle ways in which reference-counting smart pointers can fail is remarkable."
This plus a detailed analysis of the bug(s) I found usually got me the job of incorporating the boost libs into the code base. :)
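For a hedged illustration of the kind of subtlety the quote is talking about (this class is deliberately naive, not anyone's real code): a hand-rolled reference-counted pointer whose operator= releases the old object before adopting the new one destroys its own payload on self-assignment.

    template <typename T>
    class NaiveSharedPtr {
        T* p_;
        long* count_;
    public:
        explicit NaiveSharedPtr(T* p) : p_(p), count_(new long(1)) {}
        NaiveSharedPtr(const NaiveSharedPtr& o) : p_(o.p_), count_(o.count_) { ++*count_; }
        NaiveSharedPtr& operator=(const NaiveSharedPtr& o) {
            // BUG: when *this is the sole owner and o is *this, the payload is
            // deleted here and then "re-adopted" through freed memory.
            if (--*count_ == 0) { delete p_; delete count_; }
            p_ = o.p_; count_ = o.count_; ++*count_;
            return *this;
        }
        ~NaiveSharedPtr() {
            if (--*count_ == 0) { delete p_; delete count_; }
        }
    };

    int main() {
        NaiveSharedPtr<int> p(new int(5));
        p = p;   // undefined behaviour here; boost::shared_ptr handles it fine
    }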
It seems to me you're doing this the wrong way around. Since proposals to add new libraries are going to be met with a lot of resistance, don't even bother trying to argue for boost as a whole. Choose your battles instead.
Find the specific Boost libraries which you know (with your knowledge of the application it's to be used in) will be useful and save time and money. And then propose adding those.
I could easily list the Boost libs I've found useful, and why I think they're great, but I don't know if they'll be of any use in your application.
Push for individual boost libraries to be included, and then perhaps, over time, so many of them will be included that it'll be simpler for everyone to just include all of Boost.
It's an open standard not controlled by a specific company (no licensing costs)
It's cross-platform
It's expertly designed and written, very fast and efficient, and extensively tested
There are open source implementations which your team can compile themselves.
Parts of Boost will soon become part of the standard C++ library
Here is a slightly oldish 2005 article from Dr. Dobb's discussing the upcoming C++0x standard.
http://www.ddj.com/cpp/184401958
I had to maintain a component using that old vintage Tools.h++ from RogueWave, on a Solaris system.
On Solaris, if we want to use Boost, we need to use either gcc, or SunStudio with the STLport implementation of the standard library (instead of the RogueWave one). And as Tools.h++ requires the old RogueWave pre-standard implementation - on Solaris - I had to give up on Boost.
In the end I rewrote a simplified version of a few boost-like features I needed.
If you are in that same situation (*), you will not be able to move from the RogueWave library to Boost that easily. There is a non-negligible cost to this operation, as, for instance, the pointer containers from the two libraries have quite different interfaces.
(*) Where we can't slowly change the old code bit by bit to progressively use Boost. In that situation, the migration has to be radical and change simultaneously every occurrence of Tools.h++ into something more trendy, or even better.
NB: Most people are able to progressively use boost in old projects, and may miss a very important, and yes technical, point. Hence my negative answer.
Boost is a great tool and an invaluable part of our product development (we'd be lost without smart_ptr)... but because it is changing so fast, the stability of releases can be affected.
For example, we were happily introducing new versions of Boost as soon as they came out, without thinking twice. That is, until we were stung by a bug in the 1.35 threading library that produced occasional (i.e. difficult to debug) but critical errors. Fortunately we identified the issue before anything was released to the public and could move back to 1.34.
Ever since then we've taken a specific version, extensively tested it, and not updated without a compelling reason to do so.
Here are two suggestions for advocating boost:
Who's Using Boost? (http://www.boost.org/users/uses.html)
lots of major projects use Boost (e.g. Adobe Photoshop, CERN)
Boost Project Cost (http://www.boost.org/development/index.html)
How much would it cost to hire a team to write Boost from scratch? There's a nifty (somewhat gimmicky) calculator there that helps to put it in perspective.

Common libraries in a large team

Assume you have five products, and all of them use one or more of the company's internal libraries, written by individual developers.
It sounds simple but in practice, I found it to be very difficult to maintain.
How do you deal with the following scenarios:
A developer unintentionally introduces a bug and breaks everything in production.
Every library has to mature. That means the API needs to evolve, so how do you deploy the updated version to production if every developer needs to update/test their code while they are extremely busy on other projects? Is this a resource and time issue?
Version control, deployment, and usage. Would you store this in one global location, or force each project to use, say, svn:externals to "tie" a library?
I've found that it is extremely hard to come up with a good strategy. My own pet theory is this:
Each common library has to have a super-thorough set of tests or else it should never be common, even if it means someone else duplicates the effort. Duplicate untested code is better than common untested code (you break only one project).
Each common library has to have a dedicated maintainer (can be offset by a really good test suite in a smaller team).
Each project should check out the version of the library that is known to work with it. This means a developer does not have to get pulled away to update API usage, as the common code gets updated. Which it will be. Every non-trivial piece of code evolves over months and years.
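As a hedged sketch of how that pinning can be wired up with svn:externals (the repository paths and revision number are invented):

    # Pin the shared library to a known-good revision inside this project:
    svn propset svn:externals "-r 4812 ^/shared/common/trunk libs/common" .
    svn commit -m "Pin common library at r4812"

    # 'svn update' now always materializes r4812 under libs/common; the project
    # only moves forward when someone deliberately edits the property.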
Thank you for your thoughts on this!
You have a competing set of goals here. First, a library of reusable components must be open enough that people from the other projects can easily add to it (or submit components to it). If it's too difficult for them to do that, they'll build their own libraries, and ignore the common one, leading to a lot of duplicate code and wasted effort. On the other hand, you want to control the development of the library enough that you can ensure its quality.
I've been in this position. There's no easy answer. However, there are some heuristics that can help.
Treat the library as an internal project. Release it on regular intervals. Ensure that it has a well-defined release procedure, complete with unit tests and quality assurance. And, most important, release often, so that new submissions to the library show up in the product frequently.
Provide incentives for people to contribute to the library, rather than just making their own internal libraries.
Make it easy for people to contribute to the library, and make the criteria clear-cut and well-defined (e.g., new classes must come with unit tests and documentation).
Put one or two developers in charge of the library, and (IMPORTANT!) allocate time for them to work on it. A library that is treated as an afterthought will quickly become an afterthought.
In short, model the development and maintenance of your internal library after a successful open source library project.
I don't agree with this:
Duplicate untested code is better than common untested code (you break only one project).
If you are all equally likely to create bugs by implementing the same thing, then you'll all have to fix potentially different bugs in each instance of the "duplicate" library.
It also seems that it'd be much faster/cheaper to write the library once and, instead of having multiple other teams write the same thing, have some resources allocated to testing.
Now to solve your actual problem: I'd mimic what we do with real third-party libraries. We use a particular version until we're ready, or compelled to upgrade. I don't upgrade everything just because I can--there has to be a reason.
Once I see that reason (bug fix, new feature, etc.), then I upgrade with the risk that the new library may have new bugs or breaking changes.
So, your library project would continue development as necessary, without impacting individual teams until they were ready to "upgrade".
You could publish releases, or use svn pegs/branches/tags, to help with all this.
If all teams have access to the bug tracker, they could easily see what known issues exist in the upgrade-candidate before they upgrade, too. Or, you could maintain that list yourself.
@Brian Clapper provides some excellent guidelines for how to run your library as a project in his answer.
I used to work in a similar situation to what you're describing, only my company had dozens of software products. I worked on the team that was responsible for maintaining and upgrading the core set of libraries that everyone else used.
We dealt with those scenarios as follows:
Test the heck out of the core libraries. Maintaining duplicate code is a nightmare. You're not just maintaining the core and one copy. Somewhere in your company's source control there are several copies of the same code. We had dozens of products, so that would have meant dozens of copies. Hunt them down and kill them.
We had a small team of 10-12 developers dedicated to maintaining the core library and its test suites. We were also responsible for fielding calls from the other 1100 developers in the company about how to use the core library, so as you can imagine, we were very busy.
Every other project needs to work with the version of the core library that it is known to work with. You can use version control branches to test new releases of the core library with old products to make sure you don't break code that works. If the core team does a thorough job of testing, this should go very smoothly. The only time this ever got really complicated for us was when the core API changed, or when we flat-out screwed something up. Even if you're very confident in your core testing, use branches to test individual products.
I agree - this is difficult. In our small team (consulting, not a product company - which made it harder), we had one common component that stood out from the others. In this case the recipe for success was:
Make a good developer responsible for developing the component
Make a good developer the gatekeeper for maintaining the component
Make sure all upgrades (there were several) are backward compatible
Make sure there is some basic documentation (or a simple reference application) explaining how the component is to be used
Make sure all developers know that the component exists (!) and where they can find it (along with the code, if they wish to review it)
Give developers the ability to review the code and suggest better implementations or refactoring, but have the final mods go through an experienced gatekeeper.
When the component was upgraded, older apps did not have to upgrade. If we did a new release, we evaluated whether we wanted to upgrade, and if we did, all we needed to do was swap the libraries - no code needed to change unless we wanted to use some new features available through the upgrade. Resistance is inevitable, but sometimes it is a good sort of resistance, when it comes from good developers who have better ideas for a new generation of the component or a refactoring of it.
Treat the development of the libraries like any other product. Each library has its own repository, its own releases and version numbers. The compiled and officially tested versions of the library are also kept in the repository. Document features and changes from version to version.
Then use the libraries like you would using third party libraries. Your product uses only fixed versions of the compiled libraries. Switch to a new version when you really need to and be aware that this involves more testing. Add the versions you use to your version control.
When you find a bug or require a new feature in a library, a new version or sub-version is created. Using a version control system like svn makes this easy. When you need the source code for debugging purposes, export it and include it in your projects, but do not change it there, but fix problems in the libraries' repositories.
This way, every team can contribute to the libraries without endangering the work of the other teams. Switching versions is done deliberately and not by accident.
Create an anti-corruption layer (in the DDD sense) for the existing library - this is nothing but a facade - and then write unit tests for this anti-corruption layer. Now, even if someone upgrades/updates the library, you will know if something is broken by running the unit tests.
These tests also serve as documentation of the contract. And not every project that needs to use the library has to write its own anti-corruption layer; projects using the same exact functionality can share one.
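A minimal sketch of that idea, with an invented in-house function lib::trim standing in for the shared library:

    #include <cassert>
    #include <string>

    namespace lib {  // stand-in for the shared in-house library
    inline std::string trim(const std::string& s) {
        std::string::size_type b = s.find_first_not_of(" \t");
        std::string::size_type e = s.find_last_not_of(" \t");
        return b == std::string::npos ? "" : s.substr(b, e - b + 1);
    }
    }

    namespace facade {  // the anti-corruption layer owned by this project
    inline std::string clean(const std::string& s) {
        return lib::trim(s);  // adapt names/types here; callers never see lib::
    }
    }

    // Contract tests: re-run these after every upgrade of the library.
    int main() {
        assert(facade::clean("  hello ") == "hello");
        assert(facade::clean("\t") == "");
    }

If an upgrade changes lib::trim's behaviour, the contract tests fail here, in one place, instead of breaking callers scattered across the project.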
"Duplication is the root of all evil"
Sounds to me like you need:
An artifact repository like Ivy so you can have the libraries shared and versioned with a distinction between versions that are API stable and ones that are "maturing"
Tests for the libraries
Tests for the projects using them
A continuous integration system so that when an incompatibility or bug is introduced both the project and the original library developer are notified
I think that one shared library is better than 3 duplicate ones (and 1 tested one is definitely better than 3 untested ones). That's because when you find and fix a problem, it makes the whole application area more solid (and development and maintenance more efficient).
BTW, that's one of the reasons (apart from contributing back to the community) why our company exposes our .NET shared libraries to the public as open source.
Plus, there's less code to write. And you can designate one dev to enforce good development practices on the library and its usages (i.e. through code contracts enforced by the unit tests within library consumers). This improves quality and reduces maintenance costs.
We store shared libraries as binaries in the solution. That comes from the logical requirement that any solution has to be atomic and independent (this rules out svn:externals links).
API compatibility is not an issue at all. Just let your integration server rebuild and retest the whole product stack (while updating all the inner references and propagating changes) and you'll always be sure that all internal APIs are solid. And whoever breaks the API has to either fix it or update the usages.
Duplication is the root of all evil
I would argue that unchecked government is the root of all evil :)
I do get a lot of flack for even suggesting that duplication should be an option. I understand why, but let me complicate this a bit.
Say you have a fairly large library that doesn't actually do anything in particular - it's just a collection of utilities. There are NO tests for this library - at all. You need only one function from it. Say, something that parses out a file's extension.
Pop quiz: do you just write something as small as this in your own project, or do you bite the bullet and use the free-for-all untested set of utilities, which WILL break your application if someone breaks that function?
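To ground the quiz, the utility in question is about this big (a hedged sketch; corner cases like dotfiles are deliberately glossed over) and is trivially testable on its own:

    #include <string>

    // Returns the extension of 'path', or "" if it has none.
    std::string file_extension(const std::string& path) {
        std::string::size_type dot = path.rfind('.');
        std::string::size_type sep = path.find_last_of("/\\");
        if (dot == std::string::npos || (sep != std::string::npos && dot < sep))
            return "";  // no dot, or the dot belongs to a directory component
        return path.substr(dot + 1);
    }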
Also, imagine you are in an environment where writing tests is not part of the culture, since most projects are very intense and have a very short development span.
Duplicating large systems - such as client registration - would be dumb beyond belief, of course. However, aren't there cases where it is safer to duplicate something fairly small in your project, if the alternative is not safe enough (no system for maintaining common code)?
Think of it this way - and this happens all the time - multiple contractors working on different projects, for the same company. They don't even know about each other.
My argument is this:
If a team cannot commit to maintaining a solid common codebase, or if the environment does not give them enough time to, it's best to let them work as separate "contractors".
You will STILL need to use large existing systems that simply cannot be duplicated.
Duplicating large systems - such as client registration - would be dumb beyond belief,
That's why those systems publish external interfaces.
If you define a library as shared code between projects: in my experience that's almost always bad. A project should be stand alone, and updates for one project should not affect other projects.
Even if you start with libraries, you'll end up duplicating code anyway. Want to hotfix project 1? It was released with library 1.34, so to keep the hotfix as small as possible, you'll go back to library 1.34 and fix that. But hey - now you did exactly what the library was supposed to avoid - you've duplicated the code.
Every developer uses Google to find code and copy it into his application. That's probably how they found Stack Overflow in the first place. Imagine what would happen if Stackoverflow published libraries instead of code snippets, and you'll get an idea of the problems that afflicts many well meaning library creators.
Libraries tend to be generic solutions to specific problems. Typically, the generic solution is more complex than the sum of the two specific solutions. This means you need one good programmer to solve a problem that could have been solved by two morons. Sounds like a bad tradeoff to me :D
I would like to point out a problem with the solutions suggested above: treating the library as an internal project with its own versioning scheme.
The problem
If your company has more than one product (let's say two teams and two products, A and B), then each product has its own release schedule. Let's give an example: Team A is working on product A v1.6; their release is two weeks from now (suppose Oct 30th). Team B is working on product B v2.4; their release is 1.5 months from now - Nov 30th. Let's assume both are working against acme-commons-1.2-SNAPSHOT, and both are adding changes to acme-commons as they need them. A couple of days before Oct 30th, team B introduces a change to acme-commons-1.2-SNAPSHOT which is buggy. Team A goes into stress mode, since they discover the bug one day prior to code freeze.
This scenario shows that treating a common library as a third-party library is almost impossible. The trivial, but problematic, solution is for each team to have its own copy of the version it is about to change. For example, team A will create a branch (and version) of acme-commons named "1.2-A-1.6". Team B will also create a branch in acme-commons, called "1.2-B-2.4". Their development will never collide and they will be stress-free once they have tested their product.
Of course, someone will have to merge their changes back to the original branch (let's say master or 1.2).
The problems I found with this solution is:
Branch inflation - the tree structure will be very puffy and it will be harder to understand the flow of changes/merges.
Merges back to 1.2 will probably never happen - unless a team/developer is dedicated to this library, the chances that Team A or Team B merge their code back to the 1.2 branch are slim. They will always stay focused on their own tasks, thus creating and using their own branch space. Allocating a developer/team is expensive, thus not always a viable solution.
I'm still trying to figure this one out, so any thoughts on this matter are welcome.