My aim is to build a simple C++ parser which collects the name of every struct/class in a C++ program.
The problem is concerning c++20 modules. Consider the following code:
import module-name;
How can my parser know where the module originated from and collect the names of its structs/classes?
Had it been an #include directive my parser would just parse the given file name...
Is there any way to get the relevant file names for a given module?
The module files loaded by the compiler are all previously generated by the compiler. The compiler gives those files some name likely based on the module's given name. Exactly how that works is up to the compiler and build environment, and will likely vary among compilers.
Basically, it's not really your job. When you generate a module, you get some output files from the build system, just as you would a library or DLL. You then make those available to later builds that want to consume them, telling the compiler where it can find those module files.
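To see why there is no file name to chase, here is a minimal, hypothetical C++20 sketch (the file names are invented for illustration): the module name lives in the source, and the importer never mentions a path.

// math_utils.cppm -- the file name is arbitrary; only the declared module name matters
export module math.utils;            // module name, not a file name
export struct Vec3 { double x, y, z; };

// consumer.cpp -- your parser only sees the module name here
import math.utils;                   // resolved via compiler-generated module artifacts, not a path
int main() { Vec3 v{1.0, 2.0, 3.0}; return static_cast<int>(v.x); }

In practice, a parser would have to ask the build system (or a compiler-generated module map) which artifact corresponds to math.utils.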
I have a C++ 'library' which consists of a set of reusable classes which are templated (i.e. all source code is in header files) and a set of driver files. Each driver source file includes some (but not necessarily all) headers with class templates.
It would be nice if I could instantiate these class templates in each driver file with specific template parameters (known at compile time) and then automate the initialization of objects with the instantiated type by reading configuration files (this would help me remove some boilerplate code). These configuration files would be read upon object construction.
Suppose the config files would be bundled with the source code. Where should they be placed when drivers are compiled so that each class can locate its config files? I am using CMake to build the code.
Since reusable code is not compiled into a library, I can't place the config files in the same location as the library. I'm not even sure whether that would be a good idea actually.
One solution would be to specify a folder with config files as a CMake variable and hardcode this value in the source code of every configurable class. Is there a better way of doing this? Perhaps there's a standard CMake-style way of handling the problem?
I would consider doing this with good ol' macros. You can use target_compile_definitions() to define macros in your source code. Your config files could then be CMake files themselves, loaded with include(). Then in your source files, you could do an explicit template specialization or a typedef for MyTemplateClass<TEMPLATE_ARG_MACRO_1, TEMPLATE_ARG_MACRO_2>.
Hopefully that makes some sense.
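For concreteness, here is a minimal sketch of that idea under stated assumptions: the macro names TEMPLATE_ARG_MACRO_1/TEMPLATE_ARG_MACRO_2 and the class MyTemplateClass are hypothetical, and the values would normally come from CMake via target_compile_definitions() (e.g. TEMPLATE_ARG_MACRO_1=double TEMPLATE_ARG_MACRO_2=8).

// Defaults so the file still compiles without the CMake-provided definitions.
#ifndef TEMPLATE_ARG_MACRO_1
#define TEMPLATE_ARG_MACRO_1 double
#endif
#ifndef TEMPLATE_ARG_MACRO_2
#define TEMPLATE_ARG_MACRO_2 8
#endif

// Stand-in for one of the reusable, header-only class templates.
template <typename T, int N>
struct MyTemplateClass {
    T data[N];
};

// The driver instantiates the template with the configure-time arguments.
using ConfiguredType = MyTemplateClass<TEMPLATE_ARG_MACRO_1, TEMPLATE_ARG_MACRO_2>;

int main() {
    ConfiguredType obj{};   // type chosen at configure time, object built at run time
    return 0;
}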
I want to know how, during execution of a C++ program, the iostream file is found. We write #include <iostream> in a C++ program, and I know that #include is a preprocessor directive that loads a file and that <iostream> names a file, but I don't know how that file is located.
I have some questions in mind:
Is the Standard Library present in the compiler we are using?
Is that file present in the standard library or somewhere else on our computer?
Can we give a directory path to locate the file through C++ code? If yes, how?
You seem to be confused about the compilation and execution model of C++. C++ is generally not interpreted (though it could be); instead a native binary is produced during the compilation phase, which is then executed... So let's take a detour.
In order to go from a handful of text files to a program being executed, there are several steps:
compilation
link
load
execution proper
I will only describe what traditional compilers do (such as gcc or clang), potential variations will be indicated later on.
Compiling
During compilation, each source file (generally .cpp or .cxx, though the compiler couldn't care less) is processed to produce an object file (generally .o on Linux):
The source file is preprocessed: this means resolving the #include (copy/pasting the included file in the current file), navigating the #if and #else to remove unneeded sources and expanding macros.
The preprocessed file is fed to the compiler, which produces native code for each function, static or global variable, etc.; the format generally depends on the target system
The compiler outputs the object file (it's a binary format, in general)
Linking
In this phase, multiple object files are assembled together into a library or an executable. In the case of a static library or a statically-linked executable, the libraries it depends on are also assembled into the produced file.
Traditionally, the linker's job is relatively simple: it just concatenates all object files, which already contain the binary format the target machine can execute. However, it often does more: in C and C++, inline functions are duplicated across object files, so the linker needs to keep only one of the definitions, for example.
At this point, the program is "compiled", and we leave the realm of the compiler.
Loading
When you ask to execute a program, the OS will load its code into memory (thanks to a loader) and execute it.
In the case of a statically linked executable, this is easy: it's just a single big blob of code that needs to be loaded. In the case of a dynamically linked executable, it implies finding the dependencies and "resolving the symbols"; I'll describe this below:
First of all, your dynamically linked executable and libraries have a section describing which other libraries they depend on. They only have the name of the library, not its exact location, so the loader will search among a list of paths (LD_LIBRARY_PATH for example on Linux) for the libraries and actually load them.
When loading a library, the loader will perform replacements. Your executable had placeholders saying "Here should be the address of the printf function", and the loader will replace that placeholder with the actual address.
Once everything is loaded properly, all symbols should be resolved. If some symbols are missing, i.e. the code for them is not found in either the present library or any of its dependencies, then you generally get an error (either immediately, or only when the symbol is actually needed if you use lazy loading).
Executing
The code (assembly instructions in binary format) is now executed. In C++, this starts with constructing the global and static objects (at file scope, not function-static), and then goes on to calling your main function.
Caveats: this is a simplified view. Nowadays Link-Time Optimization means that the linker does more and more, the loader could perform optimizations too, and as mentioned, with lazy loading the loader might be invoked after execution has started... still, you've got to start somewhere, don't you?
So, what does this mean for your question?
The #include <iostream> in your source code is a pre-processor directive. It is thus fully resolved early in the compilation phase, and only depends on finding the appropriate header (no library code is actually needed yet). Note: a compiler would be allowed not to have a header file sitting around, and just magically inject the necessary code as if the header file existed, because this is a Standard Library header (thus special). For regular headers (yours) the pre-processor is invoked.
Then, at link-time:
if you use static linking, the linker will search for the Standard Library and include it in the executable it produces
if you use dynamic linking, the linker will note that it depends on the Standard Library file (libc++.so for example) and that the produced code is missing an implementation of printf (for example)
Then, at load time:
if you used dynamic linking, the loader will search for the Standard Library and load its code
Then, at execution time, the code (yours and its dependencies) is finally executed.
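As a minimal illustration of the compile-time/link-time split described above (the function name is hypothetical): the following compiles fine on its own, but linking fails unless some object file or library provides the definition.

int answer();                 // declaration only: all the compiler needs

int main() {
    return answer();          // the linker must later find answer()'s definition
}                             // in an object file or library, or it reports an error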
Several Standard Library implementations exist, off the top of my head:
MSVC ships with a modified version of Dinkumware
gcc ships with libstdc++ (which depends on libc)
clang ships with libc++ (which depends on libc), but may use libstdc++ instead (with compiler flags)
And of course, you could provide others... probably... though setting it up might not be easy.
Which one is ultimately used depends on the compiler options you use. By default the most common compilers ship with their own implementation and use it without any intervention on your part.
And finally yes, you can indeed specify paths in a #include directive. For example, using boost:
#include <boost/optional.hpp>
#include <boost/algorithm/string/trim.hpp>
you can even use relative paths:
#include <../myotherproject/x.hpp>
though this is considered poor form by some (since it breaks as soon as your reorganize your files).
What matters is that the pre-processor will look through a list of directories, and for each directory append / and the path you specified. If this yields the path of an existing file, it picks it; otherwise it continues to the next directory... until it runs out (and complains).
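As a rough model of that search (not the actual preprocessor; all names are hypothetical), here is a sketch using std::filesystem:

#include <filesystem>
#include <optional>
#include <string>
#include <vector>

// Return the first directory/requested combination that names an existing file.
std::optional<std::filesystem::path>
find_header(const std::vector<std::filesystem::path>& search_dirs,
            const std::string& requested) {
    for (const auto& dir : search_dirs) {
        auto candidate = dir / requested;        // append "/" and the requested path
        if (std::filesystem::exists(candidate))  // first hit wins
            return candidate;
    }
    return std::nullopt;                         // ran out of directories: error
}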
The <iostream> file is just not needed during execution. But that's just the header. You do need the standard library, but that's generally named differently if not included outright into your executable.
The C++ Standard Library doesn't ship with your OS, although on many Linux systems the line between OS and common libraries such as the C++ Standard Library is a bit thin.
How you locate that library is very much OS dependent.
There are two ways to include a header file (like iostream) in C++.
If you write the code as:
#include <iostream>
it will look up the header file in the include directory of your C++ compiler and load it.
The other way is to give the path of the header file:
#include "path_of_file.h"
And how the file is located is OS dependent, as answered by MSalters.
You definitely need the standard library header files so that the preprocessor directive can locate them.
Yes, those files are present in the library, and on #include their contents are copied into our code.
If we have defined our own header file, then we have to give the path of that file. In that way we can also include *.c or *.cpp files along with the header files in which we defined various functions; those are included at preprocessing time.
How do compilers know when it is not necessary to recompile certain parts of code especially in larger projects?
For example, let's say in C++ we have two C++ files and two header files. The header files depend on one another. (They use the classes specified in each other's files.)
Does a compiler always need to parse both header files, (and maybe C++ files for method implementation,) to obtain the class information in order to generate either of the two C++ files?
I always thought that when you run the compiler at the command prompt, it closes immediately after outputting the object files - so it would be impossible to cache the Abstract Syntax Trees or intermediate code. Do most C++ compilers know when a certain file doesn't need to output to an object file, and is therefore skipped?
All of the compilers I know compile every source file they're told to. Always. And they generate a new version of the object file for every source file they compile.
Only compiling what is necessary is a job generally left to the build system (make or other). Knowing which objects need to be regenerated depends on what each source file includes, directly or indirectly; most compilers have options to output this information in some format, either on the fly or as a separate invocation, and the build systems (the usable ones, at least) use this information to determine dependencies.
As said above, compilers will compile every file they are asked to compile. It is up to tools like make to decide what needs to be compiled.
In make one sets up rules. Each rule has a target, a list of dependencies, and the commands to run if the target is out of date with respect to those dependencies. For example:
target.o : target.c
    gcc -c -o target.o target.c
On most file systems, each file has a timestamp. If target.o has a newer timestamp than target.c (the rule's dependency), then make does not run the gcc command below it. This is because one first edits a source file and then compiles the source file into an object file.
If, however, the dependent source file is newer than the target, then we know the source file was edited after the compile took place and another compile is in order. make will therefore execute the build command for the rule.
It gets a lot more complex when rules are dependent on other rules but the same principle applies.
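A toy model of that timestamp rule in C++ (using std::filesystem; file names taken from the rule above; it assumes target.c exists):

#include <filesystem>
#include <iostream>

int main() {
    namespace fs = std::filesystem;
    const fs::path source = "target.c";
    const fs::path object = "target.o";

    // Rebuild if the object is missing or older than the source.
    const bool rebuild = !fs::exists(object) ||
                         fs::last_write_time(object) < fs::last_write_time(source);

    std::cout << (rebuild ? "would run: gcc -c -o target.o target.c\n"
                          : "target.o is up to date\n");
    return 0;
}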
I don't know how they (don't) implement it (because many don't... don't ask me why), but I'm quite sure it would be VERY easy. You save in the intermediate (obj) file the name and the hash of the source file and of every dependent file you are compiling, together with the compilation options being used, the hash of the compiler (or its internal version) and the compilation result (ok/error). The next time the user tries to recompile the file, the compiler checks whether the intermediate file already exists, whether all the hashes are the same, whether the compilation options are the same and whether the compiler is the same... If everything is the same, it gives the pre-saved error message and exits without doing anything.
The intermediate files would be a little bigger (probably some kb each).
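A minimal sketch of that caching idea (the structure and names are hypothetical; a real implementation would also hash every dependent file, as described above):

#include <cstddef>
#include <functional>
#include <string>

struct CacheEntry {
    std::size_t source_hash;    // hash of the source file contents
    std::size_t options_hash;   // hash of the compilation options
    std::size_t compiler_hash;  // hash (or internal version) of the compiler itself
    bool        succeeded;      // previously saved compilation result
};

// True if nothing relevant changed, so the saved result can be replayed.
bool can_reuse(const CacheEntry& cached,
               const std::string& source,
               const std::string& options,
               const std::string& compiler_version) {
    std::hash<std::string> h;
    return cached.source_hash   == h(source) &&
           cached.options_hash  == h(options) &&
           cached.compiler_hash == h(compiler_version);
}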
Is there an automated way to take a large amount of C++ header files and combine them in a single one?
This operation must, of course, concatenate the files in the right order, so that types, etc. are defined before they are used by subsequent classes and functions.
Basically, I'm looking for something that allows me to distribute my library in two files (libfoo.h, libfoo.a), instead of the current bunch of include files + the binary library.
As your comment says:
.. I want to make it easier for library users, so they can just do one single #include and have it all.
Then you could just spend some time including all your headers in a "wrapper" header, in the right order. 50 headers are not that much. Just do something like:
// libfoo.h
#include "header1.h"
#include "header2.h"
// ..
#include "headerN.h"
This will not take that much time, if you do this manually.
Also, adding new headers later - a matter of seconds, to add them in this "wrapper header".
In my opinion, this is the most simple, clean and working solution.
A little bit late, but here it is. I just recently stumbled into this same problem myself and coded this solution: https://github.com/rpvelloso/oneheader
How does it work?
Your project's folder is scanned for C/C++ headers and a list of headers found is created;
For every header in the list it analyzes its #include directives and assembles a dependency graph in the following way:
If the included header is not located inside the project's folder then it is ignored (e.g., if it is a system header);
If the included header is located inside the project's folder then an edge is created in the dependency graph, linking the included header to the current header being analyzed;
The dependency graph is topologically sorted to determine the correct order to concatenate the headers into a single file. If a cycle is found in the graph, the process is interrupted (i.e., if it is not a DAG);
Limitations:
It currently only detects simple, single-line #include directives;
It does not handle headers with the same name in different paths;
It only gives you a correct order to combine all the headers; you still need to concatenate them yourself (maybe you want to remove or modify some of them prior to merging).
Compiling:
g++ -Wall -ggdb -std=c++1y -lstdc++fs oneheader.cpp -o oneheader[.exe]
Usage:
./oneheader[.exe] project_folder/ > file_sequence.txt
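For illustration, here is a minimal sketch of the topological-sort step described above (this is not oneheader's actual code; deps maps each header to the project-local headers it includes):

#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string>
topo_sort(const std::map<std::string, std::set<std::string>>& deps) {
    std::vector<std::string> order;
    std::map<std::string, int> state;   // 0 = unvisited, 1 = visiting, 2 = done

    std::function<void(const std::string&)> visit = [&](const std::string& h) {
        if (state[h] == 2) return;
        if (state[h] == 1) throw std::runtime_error("include cycle: not a DAG");
        state[h] = 1;
        if (auto it = deps.find(h); it != deps.end())
            for (const auto& d : it->second) visit(d);   // dependencies come first
        state[h] = 2;
        order.push_back(h);              // a header is emitted after its includes
    };

    for (const auto& entry : deps) visit(entry.first);
    return order;                        // a safe order for concatenation
}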
(Adapting an answer to my dupe question:)
There are several other libraries which aim for a single-header form of distribution, but are developed using multiple files; and they too need such a mechanism. For some (most?) it is opaque and not part of the distributed code. Luckily, there is at least one exception: Lyra, a command-line argument parsing library; it uses a Python-based include file fuser/joiner script, which you can find here.
The script is not well-documented, but the way you use it is with 3 command-line arguments:
--src-include - The include file to convert, i.e. to merge its include directives into its body. In your case it's libfoo.h which includes the other files.
--dst-include - The output file to write - the result of the merging.
--src-include-dir - The directory relative to which include files are specified (i.e. an "include search path" of one directory; the script doesn't support the complex mechanism of multiple include paths and search priorities which the C++ compiler offers)
The script acts recursively, so if file1.h includes another file under the --src-include-dir, that should be merged in as well.
Now, I could nitpick at the code of that script, but - hey, it works and it's FOSS - distributed with the Boost license.
If your library is so big that you cannot build and maintain a single wrapping header file like Kiril suggested, this may mean that it is not architected well enough.
So if your library is really huge (above a million lines of source code), you might consider automating that, with tools like
GCC's dependency-generating preprocessor options such as -M, -MD, -MF, etc., with another hand-made script sorting the output
expensive commercial static analysis tools like Coverity
customizing a compiler through plugins or (for GCC 4.6) MELT extensions
But I don't understand why you want an automated way of doing this. If the library is of reasonable size, you should understand it and be able to write and maintain a wrapping header by hand. Automating that task will take you some effort (probably weeks, not minutes), so it is worthwhile only for very large libraries.
If you have a master include file that includes all others available, you could simply hack a C preprocessor re-implementation in Perl. Process only ""-style includes and recursively paste the contents of these files. Should be a twenty-liner.
If not, you have to write one up yourself or try at random. Automatic dependency tracking in C++ is hard. Like in "let's see if this template instantiation causes an implicit instantiation of the argument class" hard. The only automated way I see is to shuffle your include files into a random order, see if the whole bunch compiles, and re-shuffle them until it compiles. Which will take n! time, you might be better off writing that include file by hand.
While the first variant is easy enough to hack, I doubt the wisdom of this hack, because you want to distribute at the package level (source tarball, deb package, Windows installer) instead of the file level.
You really need a build script to generate this as you work, plus a preprocessor flag to disable use of the amalgamated header (that could be for your own use).
To simplify this script/program, it helps to have your header structures and include hygiene in top form.
Your program/script will need to know your discovery paths (hint: minimise the count of search paths to one if possible).
Run the script or program (which you create) to replace include directives with header file contents.
Assuming your headers are all guarded as is typical, you can keep track of what files you have already physically included and perform no action if there is another request to include them. If a header is not found, leave it as-is (as an include directive) -- this is required for system/third party headers -- unless you use a separate header for external includes (which is not at all a bad idea).
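Here is a hedged sketch of that expansion step (simplified to a single search directory and ""-style includes only; all names are hypothetical):

#include <filesystem>
#include <fstream>
#include <ostream>
#include <regex>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Recursively paste ""-style includes found under include_dir; anything else
// (system or third-party headers) is left as an #include directive.
void expand(const fs::path& file, const fs::path& include_dir,
            std::set<fs::path>& seen, std::ostream& out) {
    if (!seen.insert(file).second) return;     // already emitted once: skip it
    std::ifstream in(file);
    static const std::regex inc(R"re(^\s*#\s*include\s*"([^"]+)"\s*$)re");
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        if (std::regex_match(line, m, inc) && fs::exists(include_dir / m[1].str()))
            expand(include_dir / m[1].str(), include_dir, seen, out);
        else
            out << line << '\n';               // ordinary line, or a header left as-is
    }
}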
It's good to have a build phase/translation unit that includes the header alone and produces zero warnings or errors (warnings as errors).
Alternatively, you can create a special distribution repository so they never need to do more than pull from it occasionally.
What you want to do sounds "javascriptish" to me :-). But if you insist, there is always "cat" (or the equivalent in Windows):
$ cat file1.h file2.h file3.h > my_big_file.h
Or if you are using gcc, create a file my_decent_lib_header.h with the following contents:
#include "file1.h"
#include "file2.h"
#include "file3.h"
and then use
$ gcc -C -E my_decent_lib_header.h -o my_big_file.h
and this way you even get file/line directives that will refer to the original files (although that can be disabled, if you wish).
As for how automatic is this for your file order, well, it is not at all; you have to decide the order yourself. In fact, I would be surprised to hear that a tool that orders header dependencies correctly in all cases for C/C++ can be built.
Usually you don't want to include every bit of information from all your headers in the special header that enables a potential user to actually use your library. The non-trivial removal of type definitions, further includes or defines that are not necessary for the user of your interface cannot be done automatically, as far as I know.
Short answer to your main question:
No.
My suggestions:
manually make a new header, that contains all relevant information (nothing more, nothing less) for the user of your library interface. Add nice documentation comments for each component it contains.
use forward declarations where possible, instead of full-fledged included definitions (see the sketch after this list). Put the actual includes in your implementation files. The fewer include statements you have in your headers, the better.
don't build a deeply nested hierarchy of includes. This makes it extremely hard to keep an overview of the contents of every bit you include. The user of your library will look into the header to learn how to use it, and will probably not be able to distinguish relevant code from irrelevant at first sight. You want to maximize the ratio of relevant code to total code in the main header for your library.
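A small sketch of the forward-declaration suggestion from the list above (class names are hypothetical): the public header only declares what it refers to, and the implementation file pulls in the full definition.

// widget.h -- public header: a forward declaration is enough for references/pointers
class Engine;                    // no #include "engine.h" needed here

class Widget {
public:
    explicit Widget(Engine& e);  // only needs to know Engine is a class
private:
    Engine* engine_;             // pointers to incomplete types are fine
};

// widget.cpp -- implementation: include the full definition only here
// #include "engine.h"
// Widget::Widget(Engine& e) : engine_(&e) {}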
EDIT
If you really do have a toolkit library, and the order of inclusion really does not matter, and you have a bunch of independent headers, that you want to enumerate just for convenience into a single header, then you can use a simple script. Like the following Python (untested):
import glob

output = "convenience_header.h"
with open(output, 'w') as f:
    for header in glob.glob("*.h"):
        if header != output:  # avoid including the generated header in itself
            f.write('#include "%s"\n' % header)
I hit this issue about two years ago when I first implemented our SWIG bindings. As soon as we exposed a large amount of code we got to the point where SWIG would output C++ files so large the compiler could not handle them. The only way I could get around the issue was to split up the interfaces into multiple modules and to compile them separately.
This has several downsides:
• Each module must know about dependencies in other modules. I have a script to generate the interface files which handles this side of things, but it adds extra complexity.
• Each additional module increases the time the dynamic linker requires to load the code. I have added an __init__.py file that imports all the submodules, so the fact that the code is split up is transparent to the user, but the long load times are always visible.
I'm currently reviewing our build scripts / build process and I wanted to see if I could find a solution to this issue that was better than what I have now. Ideally, I'd have one shared library containing all the wrapper code.
Does anyone know how I can achieve this with SWIG? I've seen some custom code written in Ruby for a specific project, where the output is post-processed to make this possible, but when I looked at the feasibility for Python wrappers it did not look so easy.
I just did an equivalent hack for a TCL library: I use several SWIG modules, generating several .cpp files that are compiled into several .o files, but I link them all into a single .so file that is loaded by a single TCL "load" command.
The idea is to create a top SWIG module (Top) that calls the initialization functions of all sub-modules (Sub1 and Sub2):
%module Top
%header %{
extern "C" {
SWIGEXPORT int Sub1_Init(Tcl_Interp *);
SWIGEXPORT int Sub2_Init(Tcl_Interp *);
}
%}
%init %{
if (Sub1_Init(interp) != TCL_OK) {return TCL_ERROR;}
if (Sub2_Init(interp) != TCL_OK) {return TCL_ERROR;}
%}
There's nothing special in the submodule files.
I end up with a file Top.so that I load from TCL with the command "load ./Top.so".
I don't know Python, but it's likely to be similar. You may need to understand how Python extensions are loaded, though.
If split properly, the modules don't necessarily need to have the same dependencies as the others - just what's necessary for compilation. If you break things up appropriately, you can have libraries without cyclic dependencies. The issue with using multiple libraries is that by default, SWIG declares its runtime code statically, and as a result has problems passing objects from one module to another. You need to enable a shared version of the SWIG runtime code.
From the documentation (SWIG web page documentation link is broken):
The runtime functions are private to each SWIG-generated module. That is, the runtime functions are declared with "static" linkage and are visible only to the wrapper functions defined in that module. The only problem with this approach is that when more than one SWIG module is used in the same application, those modules often need to share type information. This is especially true for C++ programs where SWIG must collect and share information about inheritance relationships that cross module boundaries.
Check out that section in your downloaded documentation (section 16.2 The SWIG runtime code), and it'll give you details on how to enable this so that objects can be properly handled when passed from one module to the other.
FWIW, I've not worked with Python SWIG, but have done Tcl SWIG.