Strategy to omit unused boost src files while shipping source code


I'm using
#include <boost/numeric/ublas/matrix.hpp>
in fact, that's the only Boost file I've included. Now I want to ship the source code, and I was hoping not to have to include all the hundreds of MBs of boost_1_67_0.
How to deal with this issue?

This is simply something you would add to the list of build-dependencies of your C++ source code.
This kind of dependency could be made technically "bound" to your source code distribution via your version control system. In Git, for example, you could link to certain Boost libraries via a sub-module that links to their official git mirrors (github.com/boostorg as of this writing). When cloning your repository, it would then be an option to take in the Boost libraries at the same time.
That said, given the size of the Boost headers, having them installed as a system-wide library might be less complicated. Tools like CMake can help you write the logic for header inclusion so you can support different header locations.
Of course, if what you seek is a fully isolated copy of your source code, baking all the code into one massive header file might be an option as well (but it should not be necessary).

You can preprocess the one header file you need, which will expand all its #includes:
c++ -E /usr/include/boost/numeric/ublas/matrix.hpp -o boost_numeric_ublas_matrix.hpp
Be aware though: this will expand even your system header files, so it assumes your users will build on the same platform. If they might compile on different platforms, you should simply omit the Boost code from your project and let the users install it themselves in whatever manner they choose.
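If you go the "let users install Boost themselves" route, a small guard in the source can at least detect a missing installation. This is only a sketch: HAVE_BOOST_UBLAS and ublas_status are hypothetical names, and it assumes a compiler that supports __has_include (standard since C++17, available earlier in GCC and Clang as an extension).

```cpp
#include <string>

// The outer check keeps compilers without __has_include from choking on it.
#if defined(__has_include)
#  if __has_include(<boost/numeric/ublas/matrix.hpp>)
#    include <boost/numeric/ublas/matrix.hpp>
#    define HAVE_BOOST_UBLAS 1
#  endif
#endif

#ifndef HAVE_BOOST_UBLAS
#  define HAVE_BOOST_UBLAS 0
// In a real project you would probably stop the build here instead, e.g.:
// #error "Boost.uBLAS not found: install Boost or add its include directory"
#endif

// Hypothetical helper: reports which case the build ended up in.
std::string ublas_status() {
    return HAVE_BOOST_UBLAS ? "Boost.uBLAS found" : "Boost.uBLAS missing";
}
```

The point is that the failure mode becomes a readable message rather than a wall of "file not found" errors; where Boost actually lives is still left to the include path.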

Related

How properly specify the #include paths in c++ to make your program portable

I've been struggling back and forth with this for a while now, looking stuff up and asking questions, and I'm still at a crossroads. Based on what I've been told, this is what I've done so far and where I'm currently at: I've added 2 directories to my repo: src for my .cpp files and include for my .hpp files. In my include directory I have all the .hpp files directly in the folder, whereas in my src directory I have several sub-directories grouping my .cpp files according to the purpose they serve, e.g. \src\ValuationFunctions\MonteCarloFunctions\FunctionHelpers.
I've changed the name of all the #include "header.h" to #include "..\include\header.h". This works for my main file which is directly in the src folder but I found now that it doesn't work for my .cpp files that are in sub-directories like in my example above, it would seem I would have to navigate back to the root folder doing something like #include "../../..\include\header.h" which obviously can't be the way to go.
How do I make this work, am I even on the right track here? I have uploaded my repo to github (https://github.com/OscarUngsgard/Cpp-Monte-Carlo-Value-at-Risk-Engine) and the goal is for someone to be able to go there, see how the program is structured, clone the repo and just run it (I imagine this is what the goal always is? Or does some responsibility usually fall on the cloner of the repo to make it work?).
I'm using Windows and Visual Studio; help greatly appreciated.

How properly specify the #include paths in c++ to make your program portable
Please read the C++11 standard n3337 and see this C++ reference website. An included header might not even be any file on your computer (in principle it could be some database).
If you use some recent GCC as your C++ compiler, it does have precompiled headers and link-time optimization facilities. Read also the documentation of its preprocessor. I recommend enabling all warnings and debug info, so use g++ -Wall -Wextra -g.
If you use Microsoft Visual Studio as your compiler, it has its own documentation and provides a cl command, with various optimization facilities. Be sure to enable warnings.
You could consider using some C++ static analyzer, such as Clang's or Frama-C++. This draft report could be relevant and should interest you (at least for references).
The source code editor (either VisualStudioCode or GNU emacs or vim or many others) and the debugger (e.g. GDB) and the version control system (e.g. git) that you are using also have documentation. Please take time to read them, and read How to debug small programs.
Remember that C++ code can be generated, by tools such as ANTLR or SWIG.
A suggestion is to approach your issue in the dual way: ensure that proper include paths are passed to compilation commands (from your build automation tool such as GNU make or ninja or meson). This is what GNU autoconf does.
You could consider using autoconf in your software project.
I've changed the name of all the #include "header.h" to #include "..\include\header.h".
I believe it was a mistake, and you certainly want to use slashes, e.g. #include "../include/header.h" if you care about porting your code later to other operating systems (e.g. Linux, Android, MacOSX, or some other Unixes). On most operating systems, the separator for directories is a / and most C++ compilers accept it.
Studying the source code of either Qt or POCO could be inspirational, and one or both of these open source libraries could be useful to you. They are cross-platform. The source code of GCC and Clang could also be interesting to look into. Both are open source C++ compilers, written in C++ mostly (with some metaprogramming approaches, that is some generated C++ code).

In program development, it is often necessary to use toolkits developed by others. In Visual Studio projects you rarely use their source files directly; most toolkits ship header files that declare the classes, plus libraries to link against. To use these classes, you need to include the header by name in your file, for example #include "cv.h". But this is not enough, because the header is generally not in the current directory. The solution is as follows:
Open "Project > Properties > Configuration Properties > C/C++ > General > Additional Include Directories" and add all the paths.
Other IDEs offer a similar setting for include directories. So it is quite normal for those who clone the project to have to adjust the include directories configured in the project.

uint32_t does not name a type

I have shared code given to me that compiles on one linux system but not a newer system. The error is uint32_t does not name a type. I realize that this is often fixed by including the <cstdint> or stdint.h. The source code has neither of these includes and I am trying to seek an option that doesn't require modifying due to internal business practices that I can't control. Since it compiles as is on one machine they don't want changes to the source code.
I am not sure if it matters but the older system uses gcc 4.1 while the newer one uses gcc 4.4. I could install different versions of gcc if needed, or add/install library/include files on the newer machine, I have full control of what is on that machine.
What are my options for trying to compile this code on my machine without modifying the source? I can provide other details if needed.

I am not sure if it matters but the older system uses gcc 4.1 while the newer one uses gcc 4.4
Some time ago, GCC's own headers stopped pulling in <stdint.h> implicitly. You now have to include it yourself to get uint32_t...
I realize that this is often fixed by including the <cstdint> or stdint.h. The source code has neither of these includes and I am trying to seek an option that doesn't require modifying due to internal business practices that I can't control...
I hope I am not splitting hairs... If you can't modify the source files, then are you allowed to modify the build system or configuration files; or the environment? If so, you can use a force include to insert the file. See Include header files using command line option?
You can modify the Makefile to force include stdint.h. If the build system honors CFLAGS or CXXFLAGS, then you can force include it in the flags. Your last choice is probably to do something like export CC="gcc -include stdint.h".
The reason I am splitting hairs is OpenSSL and FIPS. The OpenSSL source files for the FIPS Object Module are sequestered and cannot be modified. We have to fall back to modifying supporting scripts and the environment to get some things working as expected.

If you really don't want to amend the file, you could wrap it. Suppose it's called src.c; create a new file src1.c:
#include <stdint.h>
#include "src.c"
And then compile src1.c.
PS: The problem may arise because compilers include other headers in their header files. This can mean some symbols 'officially' defined in other headers are quietly defined when you include a header that isn't specified as including it.
It's an error to write a program relying on a symbol for which the appropriate header hasn't been included - but it's easy to do and difficult to spot.
A changing compiler or version sometimes reveals these quiet issues.
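For contrast, here is a minimal sketch of what the "officially correct" version looks like: the file that uses uint32_t includes the header that defines it, instead of hoping another header drags it in. wrap_add is a hypothetical function, not something from the code in question, and <cstdint> assumes a C++11 (or later) compiler; with an older one, <stdint.h> is the header to use.

```cpp
#include <cstdint>  // explicitly provides std::uint32_t

// Hypothetical example function: adds two 32-bit values with the usual
// unsigned wrap-around (the result is reduced modulo 2^32).
std::uint32_t wrap_add(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + b);
}
```

Written this way, the file no longer depends on which other headers a particular GCC version happens to include behind the scenes.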

Unfortunately, you can't force your code to work on a newer compiler without modifying something.
If you are allowed to modify the build script and add source files to the project, you might be able to add another source file to the project which, in turn, includes your affected file and headers it really needs. Remove the affected source files from the build, add the new ones, and rebuild.
If your shared source is using macro magic (e.g. an #include SOME_MACRO, where SOME_MACRO can be defined on the command line), you might be able to get away with modifying build options (to define that macro for every compilation of each file). Apart from relying on modifying the build process, it also relies on a possible-but-less-than-usual usage of macros in your project.
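To make the macro idea concrete, here is a sketch of what such a hook would look like if the source already had one. EXTRA_HEADER and uint32_bits are made-up names for illustration; most code bases will not contain anything like this, which is why this option is a long shot.

```cpp
#include <limits>

// EXTRA_HEADER is a hypothetical hook: the build could set it on the command
// line with -DEXTRA_HEADER='<stdint.h>'. Here it falls back to <stdint.h>
// when nothing is defined.
#ifndef EXTRA_HEADER
#  define EXTRA_HEADER <stdint.h>
#endif
#include EXTRA_HEADER  // the macro is expanded before the include is resolved

// Number of value bits in uint32_t, just to show the type is now visible.
int uint32_bits() {
    return std::numeric_limits<uint32_t>::digits;
}
```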
It may be possible to modify the standard headers in your compiler/library installation - assuming you have sufficient access (administrative) to do so. The problem with this is that the problem will almost certainly re-emerge whenever an update/patch to the compiler/library is installed. Over time, this approach will lock the code into relying on an older and older compiler/library that has been superseded - no ability to benefit from compiler bug fixes, evolution of standards, etc. This also severely limits your ability to share the code, and ability of others to use it - anyone who receives the code needs to modify their compiler/library installation.
The basic fact, however, is that your shared code relies on a particular implementation (compiler/library) that exhibits non-standard behaviour. Hence it has failed with an update of that implementation, which removed those non-standard occurrences, and it is likely to fail with other implementations too (porting to different compilers in future, etc). The real technical solution is to modify the source, and #include the needed headers correctly. The real business solution is to make a business case justifying the need for such modifications, citing inefficiency - which will grow over time - in terms of cost and effort needed to maintain the shared code whenever it needs to be ported, or whenever a compiler is updated.

Look at the second-to-last line of code above your error. You'll find that everything above it terminates with a , and that you only use a ; on the last entry.

How to use a library in a single file C++ code?

How can I use a library such as GMP in C++ in such a way that the file can be compiled normally, without the person compiling it having to install GMP themselves? This is for personal use (so no license issues etc.). I just want my client to be able to compile my C++ code, which needs the GMP library. I tried using the g++ -E option, and it kinda works, but the problem is that on top of it many files are included which are themselves part of the library (and not available without the library). Also, the linker needs the library to be available even if I do that successfully.
How do I copy the entire library per se, maintaining a single file of code, such that it compiles and doesn't cause problems. I'm quite sure it is doable, because copying some functions works like a charm, but I can't copy paste the 3000 line code manually, so how to do it?

If I understand you correctly, I guess what you want is to have your source and the entire GMP library in one file? And you want to do that automated?
I think there is no standard way to do this. You could write a script in your favorite language (bash, python, etc) which traverses the GMP code tree, appending the files (first the header files, then the cpp files) to your file while ignoring all local #include lines. And hope that there are not too many macros etc which rely on the folder structure being intact.
However, you could (and probably should) just supply the library and an adequate Makefile with your source code. Then the client wouldn't need to install the GMP lib, but could unpack the archive and run make. If the constraint is to do this in one file, maybe it is wiser to change that constraint...
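If you do want to try the concatenation script, the "ignore all local #include lines" step could look roughly like this. It is only a sketch of that one step: strip_local_includes is a hypothetical helper, it treats any line of the form #include "..." as local, and it does not handle includes hidden behind macros or split across lines.

```cpp
#include <sstream>
#include <string>

// Returns `source` with every line of the form  #include "..."  removed.
// Angle-bracket includes (<...>) are kept, since they refer to system headers.
// Every kept line is written back with a trailing newline.
std::string strip_local_includes(const std::string& source) {
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        bool is_local_include = false;
        // Skip leading whitespace, then look for:  #  include  "
        std::size_t i = line.find_first_not_of(" \t");
        if (i != std::string::npos && line[i] == '#') {
            i = line.find_first_not_of(" \t", i + 1);
            if (i != std::string::npos && line.compare(i, 7, "include") == 0) {
                i = line.find_first_not_of(" \t", i + 7);
                is_local_include = (i != std::string::npos && line[i] == '"');
            }
        }
        if (!is_local_include) {
            out << line << '\n';
        }
    }
    return out.str();
}
```

You would run each file of the library through this before appending it to the single output file, in dependency order. Getting that order right is the hard part and is not covered here.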

GCC Dependency Tracking: Is -M better than -MM?

There are basically two options for tracking dependencies which are -M and -MM. The difference is that -MM omits system headers and headers included by them.
My question: Why would anyone want to use -M? It inflates the generated .d files drastically, since a system header usually includes a large pack of other system headers. In addition, system headers cannot be built by make, so having them as dependencies yields no benefit. The only little benefit I could see is that - if a required system header is missing - make reports the missing header instead of gcc reporting it. But what is the benefit of this?
To sum things up, I see no reason why -M would be useful at all. Am I missing something? Which scenarios are there that require one to use -M over -MM.

Most header files can't be "built" by make. They're listed as prerequisites so that if they change, then the source code that relies on them is rebuilt. For example, if you install security fix packages on your system and they modify one of the system headers you use, you may want to be sure all your code is rebuilt. These days the backward-compatibility of most base libraries is such that this is not really needed most of the time, I agree.
Also, if you're cross-compiling then your "system" header files are provided to you from the cross-target; these headers might be for an embedded system or similar, and may change (in non-backward-compatible ways) more often than a standard system.

Why would anyone want to use -M?
If the system headers change you want a rebuild to operate accordingly. That is, if your code uses a header, and that header changes, your code should rebuild even if your code didn't change.
Listing headers as dependencies is rarely about 'building' those headers. System headers are no different.

There could be a number of reasons.
Rebuild only the necessary parts after a system library update. A rare thing, but someone may need it.
Get the full dependency list for some reason - maybe just to draw a dependency graph?
Generate ctags for all potentially used header files.
...
Personally I use it for ctags, so it is not all hypothetical examples.

What are the pros and cons of specifying an include prefix in the source file versus in the search path parameter of the compiler?

When a C or C++ library comes with several headers, they are usually in a specific folder. For example, OpenCV provides cv.h and highgui.h in an opencv folder.
The most common way to include them is to add this opencv folder to the search path of the preprocessor (e.g. gcc -I/pathto/opencv), and simply include the headers by their filename in the source file (e.g. #include <cv.h>).
There is an alternative, as it is quite frequent that folders containing headers are with others (e.g. in /usr/include or some path common to the development team) in a parent folder. In this case, it is enough to specify the header folder in the source file (e.g. #include <opencv/cv.h>), if the parent folder is already in the path.
The only problem I can think of with this alternative is the case of a system where all the headers are in a single folder. However, this alternative prevents ambiguities (e.g. if two libraries have a vector.h header), makes it easier to set up another build system, and is probably more efficient with regard to the preprocessor's search for the header file.
Given this analysis, I would tend toward the alternative, but a vast majority of code I found on internet use the first. For example, Google returns around 218000 results for "#include <cv.h>", versus 79100 for "#include <opencv/cv.h>". Am I missing advantages of the common way, or disadvantages of the alternative?

My personal preference is to have <opencv/cv.h>.
Why? Because I am human, with a limited brain, and much more important things to do than remember that cv.h comes from the opencv library.
Therefore, even though the libraries I work with are always in dedicated folders, I structure them as:
<specific library folder>/include/<library name>/...
This helps me remember where those headers come from.
I have known people saying it was useless, and that IDEs would bring you straight to the file anyway... but
I don't always use an IDE
I don't always want to open each include file merely to know which libraries this particular file is tied to
It also makes it much easier to organize the include list (and group related includes together).

They have slightly different purposes, and I think need to be used carefully.
Consider the fact that in the -I case it was /pathto/opencv. You're suggesting possibly #include <opencv/cv.h> but you'd never write #include </pathto/opencv/cv.h>. There's a reason for that, which is that you expect cv.h is always in a directory called opencv, because that's how it's always released, whereas pathto is just where that library's files happen to have been installed on your machine (or on your distribution, whatever).
Anything that could conceivably differ depending where your code is being compiled, should be in the include path, so that it can be configured without modifying your source. Anything that is guaranteed to be the same wherever that particular cv.h is used can appear in the source, but it doesn't have to, so we need to decide whether we want it there.
As you've already noticed, it's useful to have it there as a disambiguator, especially for a file with a two-character name, but if you think that someone might want to put cv.h in a different place, then you should leave it out. That's pretty much the trade-off you're making - the header should always be in an opencv directory, but is it worth it to you to rely on that as a guarantee, as the price of disambiguating?

I would guess the main problem with using #include <opencv/cv.h> is that you don't necessarily want to add an entire parent path.
Taking an example where it has been installed to /usr/local when using the option with the full path, you'll need to add -I/usr/local/include to your command line. That could have all kinds of side effects.
E.g. for a completely different application, someone may have installed GNU iconv libraries there. Then suddenly, your application, which is also doing #include <iconv.h> is grabbing the headers from the standalone iconv library, instead of the implementation in glibc.
Obviously these are problems that do crop up from time to time, but by including more specific directories, you can hopefully minimise them.

Reasons for the first version:
You are not always able to install the headers/library in a standard path. Sometimes you do not have root access.
You do not want to pollute the path by adding the paths of all the libraries you have installed.
For example, you can have two versions of the same library installed (in your case OpenCV) and want some of your projects to be compiled with one version and some with the other. If you put both in the path, you get a name clash. For me this is one of the main reasons (for OpenCV specifically, I have versions 1.x and 2.x installed, with some projects compiled against 1.x and some against 2.x).
