VC++ 'Generating Code', what does it mean? - c++

When compiling in Visual Studio, the compiler outputs this, seemingly at its own discretion:
1>Generating Code...
What is it doing here exactly?

It is doing what it says: it is generating the machine code. Many compilers translate C/C++ sources into an intermediate internal representation that later serves as the input for generating the actual machine code. The Visual C++ compiler (like many other compilers) does this in batches: first it translates a bunch of source files into that intermediate representation, then it converts them all to machine code (and then starts working on the next batch). This is what is happening when you see the "Generating Code..." message.
I don't know what logic exactly it is using to split the source files into batches. Maybe it works simply by size: once the total size of all intermediate representations generated so far gets to some limit, it switches to "generating code" mode. Maybe there's some other logic at work there as well.
In any case note that the unqualified term "code" in this case does not refer to source code, meaning that it has nothing to do with templates and/or preprocessor or anything like that. Moreover, referring to C sources with unqualified "code" (as opposed to the qualified "source code") is a very niche thing, more at home with marketing department than with actual programmers. At the programmers' level nobody refers to C sources as just "code" :)

The compiler is given multiple input files at once and it reads (parses) several of those in one go, and only then produces output (object files) for them, before it reads more input files. I suppose this is an optimization, presumably because mixed read/write access to the disk is slower than when it is sorted into (first) read access and (then) write access.

Visual Studio is invoking the linker, LINK.exe. It works primarily with object files as input to produce an executable as output, but it is also capable of much other work concerning these and related files. See "Linker Command-Line Syntax" on MSDN.

Template instances (and other types of code) might generate code, or might not under some conditions.


C/C++: get name of translation unit, not file being parsed

(Note I have checked several previous questions on this forum that are similar, but ultimately different, such as getting object file names or some such. Hence, this is not a duplicate.)
I have long used __FILE__ for logging errors' locations.
I have modified my logging module to have the header define a file-scoped structure holding this among other data. (It's highly bizarre to define storage in a header and in a long career with C/C++ I don't think I've ever done so before.)
However, I was surprised to see that __FILE__ now expands to the name of the header, no longer the source file.
I have various technical workarounds but is there a modern way in gcc, clang or Visual Studio, even if not portable, to get the name of the source file being compiled into the preprocessor?
The only options I can see so far are all distasteful:
- Requiring users of the logging library to add -DFILE_NAME=$< to their make commands. (And I'm not sure how to do this in Visual Studio, though I imagine I can figure it out.)
- Requiring users of the logging library to manually add a definition of this object to their code so that it is created with the correct __FILE__.
- Forgetting about storing this file name in such a structure and continuing to do it the old way.

static library maximum file size generated by Visual Studio or other limitations?

Are there any limitations under Visual Studio (2008, 2010, ...), in particular for big C++ projects?
I am thinking of limitations like:
- a maximum number of files for a project to be compiled / linked
- a maximum .lib file size that can be generated
We are working with quite big projects, so we would like to prevent any future problems.
For example, we already had a problem with .obj files that were too big, which we managed to fix thanks to the Visual Studio /bigobj flag.
Your real problem is not that you don't know the various size limitations of your toolset. The problem is that you don't have a plan about how to react when/if you hit some of these limitations. I have two suggestions.
First is to split up a huge monolithic static library into pieces; this is a generic solution to any of the limitations you've mentioned. A giant slab of object code is already evidence enough to question whether there's good abstraction within the code base. Libraries in Visual Studio can contain references to other libraries that are automatically incorporated at link time, so there's no technical reason that mandates a single huge library.
Second is that you don't appear to have any automatic testing that indicates when you have a problem. One kind of automatic test would be to compare file sizes and the number of exposed symbols (for example) against policy limits. If a policy limit is exceeded, it should generate a warning and a message to someone with authority to mandate action about it. Another kind of automatic test would be a test program that links against every entry point (or at least most of them) and ensures that your code is actually linking.
My second suggestion, though, is moot if you address the first one.

Where exactly is the boundary between a preprocessor and a compiler?

According to various sources (for example, the SE radio episode with Kevlin Henney, if I remember correctly), "C with classes" was implemented with preprocessor technology (with the output then being fed to a C compiler), whereas C++ has always been implemented with a compiler (that just happened to spit out C in the early days). This seems to cause some confusion, so I was wondering:
Where exactly is the boundary between a preprocessor and a compiler? When do you call a piece of software that implements a language "a preprocessor", and when do you call it "a compiler"?
By the way, is "a compiled language" an established term? If so, what exactly does it mean?
This is an interesting question. I don't know a definitive answer, but would say this, if pressed for one:
A preprocessor doesn't parse the code, but instead scans for embedded patterns and expands them
A compiler actually parses the code by building an AST (abstract syntax tree) and then transforms that into a different language
The language of the output of the preprocessor is a subset of the language of the input.
The language of the output of the compiler is (usually) very different (machine code) than the language of the input.
From a simplified, personal, point of view:
I consider the preprocessor to be any form of textual manipulation that has no concept of the underlying language (i.e. semantics or constructs), and thus relies only on its own set of rules to perform its duties.
The compiler starts when rules and regulations are applied to what is being processed (yes, that makes 'my' preprocessor a compiler, but why not :P). This includes semantic and lexical checking, and the accompanying transforms from x (textual) to y (binary/intermediate form). As one of my professors would say: "it's a system with inputs, processes and outputs".
The C/C++ compiler cares about type-correctness while the preprocessor simply expands symbols.
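A small sketch of that difference (the names `SQUARE_MACRO` and `square` are made up for the example): the macro pastes its argument text in without any notion of precedence or types, while the function receives a typed, already evaluated value.

```cpp
// Macro: the preprocessor substitutes the argument text verbatim.
#define SQUARE_MACRO(x) x * x

// Function: the compiler type-checks the argument and evaluates it first.
constexpr int square(int x) { return x * x; }

int macro_result()    { return SQUARE_MACRO(1 + 2); }  // expands to 1 + 2 * 1 + 2
int function_result() { return square(1 + 2); }        // argument evaluates to 3 first
```

Here `macro_result()` returns 5 because of operator precedence in the expanded text, whereas `function_result()` returns 9.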
A compiler consists of several processes (components). The preprocessor is only one of these, and a relatively simple one.
From the Wikipedia article, Division of compiler processes:
"All but the smallest of compilers have more than two phases. However, these phases are usually regarded as being part of the front end or the back end. The point at which these two ends meet is open to debate.
The front end is generally considered to be where syntactic and semantic processing takes place, along with translation to a lower level of representation (than source code).
The middle end is usually designed to perform optimizations on a form other than the source code or machine code. This source code/machine code independence is intended to enable generic optimizations to be shared between versions of the compiler supporting different languages and target processors.
The back end takes the output from the middle. It may perform more analysis, transformations and optimizations that are for a particular computer. Then, it generates code for a particular processor and OS."
Preprocessing is only a small part of the front end's job.
The first C++ compiler was made by attaching an additional process in front of an existing C compiler toolset, not because that was good design but because of limited time and resources.
Nowadays, I don't think such a non-native C++ compiler could survive in the commercial field.
I dare say cfront for C++11 is impossible to make.
The answer is pretty simple.
A preprocessor works on text as input and has text as output. Examples are the old unix commands m4 and cpp (the C Pre Processor), and also unix programs like roff, nroff and troff, which were used (and still are) to format man pages (unix command "man") or to format text for printing or typesetting.
Preprocessors are very simple; they don't know anything about the "language of the text" they process. In other words, they can just as well process natural language. The C preprocessor, despite its name, only recognizes directives such as #define, #include, #ifdef, #ifndef, #else etc., and if you use #define MACRO it tries to "expand" that macro everywhere it finds it. But the text does not need to be a C or C++ program; it can just as well be a novel written in Italian or Greek.
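To make the "text in, text out" idea concrete, here is a toy expander. It is only a sketch and not how cpp is implemented (the real C preprocessor tokenizes its input and, for instance, leaves string literals alone); the function name `expand_macros` and the definition of a "word" are assumptions for the example.

```cpp
#include <cctype>
#include <map>
#include <string>

// True for characters that can be part of a macro name in this toy model.
static bool is_word_char(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Replaces every whole word that is a key in `macros` with its value.
// The surrounding text is copied unchanged, whatever language it is written in.
std::string expand_macros(const std::string& text,
                          const std::map<std::string, std::string>& macros) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (is_word_char(text[i])) {
            std::size_t j = i;
            while (j < text.size() && is_word_char(text[j])) ++j;
            const std::string word = text.substr(i, j - i);
            const auto it = macros.find(word);
            out += (it != macros.end()) ? it->second : word;
            i = j;
        } else {
            out += text[i];
            ++i;
        }
    }
    return out;
}
```

Feeding it an Italian sentence with `CITY` defined as `Roma` replaces the word `CITY` and leaves everything else, including `CITYSCAPE`, untouched.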
Compilers that cross compile into a different language are usually called translators. So the old cfront "compiler" for C++ which emitted C code was a C++ translator.
Preprocessors, and later translators, were historically used because old machines simply lacked the memory to do everything in one program; instead the work was done by specialized programs, going from disk to disk.
A typical C program would be compiled from various sources, and the build process would be managed with make. Nowadays the C preprocessor is usually built directly into the C/C++ compiler. A typical make run would call CPP on the *.c files and write the output to a different directory; from there the C compiler CC would either compile it straight to machine code or, more commonly, output assembler code as text. Note: the C compiler only checks syntax; it does not really care about type safety etc. Then the assembler would take that assembler code and output a *.o file, which can later be linked with other *.o files and *.lib files into an executable program. OTOH you likely had a make rule that would call not the C compiler but the lint command, the C language analyser, which looks for typical mistakes and errors (which are ignored by the C compiler).
It is quite interesting to look up lint, nroff, troff, m4 etc. on Wikipedia (or in your machine's terminal using man) ;D

Where do I learn "what I need to know" about C++ compilers?

I'm just starting to explore C++, so forgive the newbiness of this question. I also beg your indulgence on how open ended this question is. I think it could be broken down, but I think that this information belongs in the same place.
(FYI -- I am working predominantly with the Qt SDK and mingw32-make right now and I seem to have configured them correctly for my machine.)
I knew that there was a lot in the language which is compiler-driven -- I've heard about pre-compiler directives, but it seems like someone would be able to write books on the different C++ compilers and their respective parameters. In addition, there are commands which apparently precede make (qmake, for example; is this something only in Qt?).
I would like to know if there is any place which gives me an overview of what compilers are out there, and what their different options are. I'd also like to know how each of them views Makefiles (it seems that there is a difference in syntax between them?).
If there is no website regarding, "Everything you need to know about C++ compilers but were afraid to ask," what would be the best way to go about learning the answers to these questions?
Concerning the "numerous options of the various compilers"
A piece of good news: you needn't worry about the detail of most of these options. You will, in due time, delve into this, only for the very compiler you use, and maybe only for the options that pertain to a particular set of features. But as a novice, generally trust the default options or the ones supplied with the make files.
The broad categories of these features (and I may be missing a few) are:
pre-processor defines (now, you may need a few of these)
code generation (target CPU, FPU usage...)
optimization (hints for the compiler to favor speed over size and such)
inclusion of debug info (which is extra data left in the object/binary and which enables the debugger to know where each line of code starts, what the variables names are etc.)
directives for the linker
output type (exe, library, memory maps...)
C/C++ language compliance and warnings (compatibility with previous version of the compiler, compliance to current and past C Standards, warning about common possible bug-indicative patterns...)
compile-time verbosity and help
Concerning an inventory of compilers with their options and features
I know of no such list, but I'm sure it exists somewhere on the web. However, I suggest that, as a novice, you worry little about these "details", use whatever free compiler you can find (gcc is certainly a great choice), and build experience with the language and the build process. C professionals may argue, with good reason and at length, about the merits of various compilers and associated runtimes etc., but for generic purposes -and then some- the free stuff is all that is needed.
Concerning the build process
The most trivial applications, such as those made of a single unit of compilation (read: a single C/C++ source file), can be built with a simple batch file where the various compiler and linker options are hardcoded, and where the name of the file is specified on the command line.
For all other cases, it is very important to codify the build process so that it can be done
a) automatically and
b) reliably, i.e. with repeatability.
The "recipe" associated with this build process is often encapsulated in a make file or, as the complexity grows, possibly several make files, possibly "bundled together" in a script/bat file.
This (make file syntax) you need to get familiar with, even if you use alternatives to make/nmake, such as Apache Ant; the reason is that many (most?) source code packages include a make file.
In a nutshell, make files are text files that allow defining targets and the associated commands to build each target. Each target is associated with its dependencies, which allows the make logic to decide which targets are out of date and should be rebuilt, and which dependencies should be rebuilt before them. That way, when you modify, say, an include file (and if the make file is properly configured), any C file that uses this header will be recompiled and any binary which links with the corresponding obj file will be rebuilt as well. make also includes options to force all targets to be rebuilt, and this is sometimes handy to be sure that you truly have a current build (for example in case some dependencies of a given object are not declared in the make file).
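The out-of-date decision described above can be sketched in a few lines. This is a simplified model, not make's actual implementation: timestamps are plain integers (larger means newer), and the names `Target` and `needs_rebuild` are hypothetical.

```cpp
#include <vector>

struct Target {
    long modified;                   // last-modified time of the built file, 0 if it does not exist
    std::vector<long> dependencies;  // last-modified times of the files it is built from
};

// A target must be rebuilt if it is missing or if any dependency is newer than it.
bool needs_rebuild(const Target& target) {
    if (target.modified == 0) return true;
    for (long dependency_time : target.dependencies) {
        if (dependency_time > target.modified) return true;
    }
    return false;
}
```

For example, an object file built at time 100 from a source edited at 50 and a header edited at 120 is out of date, because the header is newer than the object file.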
On the Pre-processor:
The pre-processor is the first step toward compiling, although it is technically not part of the compilation. The purposes of this step are:
to remove any comments and extraneous whitespace
to substitute any macro reference with the relevant C/C++ syntax. Some macros, for example, are used to define constant values such as, say, an email address used in the program; during pre-processing any reference to this constant (btw by convention such constants are named with ALL_CAPS_AND_UNDERSCORES) is replaced by the actual C string literal containing the email address.
to exclude all conditional compiling branches that are not relevant (#ifdef and the like)
What's important to know about the pre-processor is that the pre-processor directives are NOT part of the C language proper, and that they serve several important functions, such as the conditional compiling mentioned earlier (used for example to have multiple versions of the program, say for different operating systems, or indeed for different compilers).
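A short sketch of both uses, a macro constant and conditional compilation. The macro names `SUPPORT_EMAIL` and `PLATFORM_NAME`, the address, and the function `banner` are made up for the example; `_WIN32` and `__linux__` are macros that common compilers predefine on the respective platforms.

```cpp
#include <string>

// A constant defined as a macro; every use is replaced by the string literal.
#define SUPPORT_EMAIL "support@example.com"

// Conditional compilation: only one of these branches survives pre-processing.
#if defined(_WIN32)
#define PLATFORM_NAME "Windows"
#elif defined(__linux__)
#define PLATFORM_NAME "Linux"
#else
#define PLATFORM_NAME "another platform"
#endif

std::string banner() {
    // After macro substitution these are adjacent string literals,
    // which are concatenated into a single literal.
    return "Running on " PLATFORM_NAME ", contact " SUPPORT_EMAIL;
}
```

The compiler proper never sees `SUPPORT_EMAIL` or the discarded `#if` branches; it only sees the resulting text.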
Taking it from there...
After this manifesto of mine... I encourage you to read only a little more, and to dive into programming and building binaries. It is a very good idea to try and get a broad picture of the framework etc., but this can be overdone, a bit akin to the exchange student who stays in his/her room reading the Webster dictionary to be "prepared" for meeting native speakers, rather than just "doing it!".
Ideally you shouldn't need to care what C++ compiler you are using. Compatibility with the standard has got much better in recent years (even from Microsoft).
Compiler flags obviously differ, but the same features are generally available; it's just a differently named option to e.g. set the warning level on GCC and ms-cl.
The build system is independent of the compiler; you can use any make with any compiler.
That is a lot of questions in one.
C++ compilers are a lot like hammers: They come in all sizes and shapes, with different abilities and features, intended for different types of users, and at different price points; ultimately they all are for doing the same basic task as the others.
Some are intended for highly specialized applications, like high-performance graphics, and have numerous extensions and libraries to assist the engineer with those types of problems. Others are meant for general purpose use, and aren't necessarily always the greatest for extreme work.
The technique for using each type of hammer varies from model to model—and version to version—but they all have a lot in common. The macro preprocessor is a standard part of C and C++ compilers.
A brief comparison of many C++ compilers is here. Also check out the list of C compilers, since many programs don't use any C++ features and can be compiled by ordinary C.
C++ compilers don't "view" makefiles. The rules of a makefile may invoke a C++ compiler, but also may "compile" assembly language modules (assembling), process other languages, build libraries, link modules, and/or post-process object modules. Makefiles often contain rules for cleaning up intermediate files, establishing debug environments, obtaining source code, etc., etc. Compilation is one link in a long chain of steps to develop software.
Also, many development environments abstract the makefile into a "project file" which is used by an integrated development environment (IDE) in an attempt to simplify or automate many programming tasks. See a comparison here.
As for learning: choose a specific problem to solve and dive in. The target platform (Linux/Windows/etc.) and problem space will narrow the choices pretty well. Which you choose is often linked to other considerations, such as working for a particular company, or being part of a team. C++ has something like 95% commonality among all its flavors. Learn any one of them well, and learning the next is a piece of cake.

MAP file analysis - where does my code size come from?

I am looking for a tool to simplify analysing a linker map file for a large C++ project (VC6).
During maintenance, the binaries grow steadily and I want to figure out where the growth comes from. I suspect some overzealous template expansion in a library shared between different DLLs, but just browsing the map file doesn't give good clues.
Any suggestions?
This is a wonderful compiler generated map file analysis/explorer/viewer tool. Check if you can explore gcc generated map file.
amap : A tool to analyze .MAP files produced by 32-bit Visual Studio compiler and report the amount of memory being used by data and code.
This app can also read and analyze MAP files produced by the Xbox360, Wii, and PS3 compilers.
The map file should have the size of each section, you can write a quick tool to sort symbols by this size. There's also a command line tool that comes with MSVC (undname.exe) which you can use to demangle the symbols.
Once you have the symbols sorted by size, you can generate this weekly or daily as you like and compare how the size of each symbol has changed over time.
The map file alone from any single build may not tell much, but a historical report of compiled map files can tell you quite a bit.
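A sketch of the "sort by size, then compare over time" idea. It assumes the symbol names and sizes have already been extracted from the map files into (name, bytes) records; the parsing itself is left out, and `SymbolSize`, `sorted_by_size` and `growth` are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct SymbolSize {
    std::string name;
    std::size_t bytes;
};

// Largest symbols first.
std::vector<SymbolSize> sorted_by_size(std::vector<SymbolSize> symbols) {
    std::sort(symbols.begin(), symbols.end(),
              [](const SymbolSize& a, const SymbolSize& b) { return a.bytes > b.bytes; });
    return symbols;
}

// Size change per symbol between two builds. Symbols that are new count
// with their full size; symbols that disappeared show up as negative.
std::map<std::string, long long> growth(const std::vector<SymbolSize>& before,
                                        const std::vector<SymbolSize>& after) {
    std::map<std::string, long long> delta;
    for (const SymbolSize& s : after)  delta[s.name] += static_cast<long long>(s.bytes);
    for (const SymbolSize& s : before) delta[s.name] -= static_cast<long long>(s.bytes);
    return delta;
}
```

Running `growth` on the records of last week's and this week's build gives a per-symbol report that points directly at what grew.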
Have you tried using dumpbin.exe on your .obj files?
Stuff to look for:
Using a lot of STL?
A lot of c++ classes with inline methods?
A lot of constants?
If any of the above applies to you, check whether they have wide visibility, i.e. whether they are used/seen in large parts of your application.
No suggestion for a tool, but a guess as to a possible cause: do you have incremental linking enabled? This can cause expansion during subsequent builds...
The linker will strip unused symbols if you're compiling with /opt:ref, so if you're using that and not using incremental linking, I would expect expansion of the binaries to be only a result of actual new code being added. That's as far as I know... hope it helps a little.
Templates, macros, and the STL in general all use a tremendous amount of space. Heralded as a great universal library, Boost adds much space to projects. BOOST_FOREACH is an example of this. It's hundreds of lines of templated code, which could simply be avoided by writing a proper loop, which in general takes only a few more keystrokes.
Get Visual AssistX to save typing, not using templates. Also consider owning the code you use. Macros and inline function expansion are not necessarily going to show up.
Also, if you can, move away from DLL architecture to statically linking everything into one executable which runs in different "modes". There is absolutely nothing wrong with using the same executable image as many times as you want just passing in a different command line parameter depending on what you want it to do.
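A minimal sketch of the "one executable, several modes" idea. The mode names and the `--server`/`--client` switches are invented for the example; a real program would call `parse_mode` on `argv[1]` in `main` and dispatch to the corresponding code path.

```cpp
#include <string>

enum class Mode { Server, Client, Unknown };

// Maps the first command-line argument to the mode the executable should run in.
Mode parse_mode(const std::string& argument) {
    if (argument == "--server") return Mode::Server;
    if (argument == "--client") return Mode::Client;
    return Mode::Unknown;
}

// Human-readable description of what each mode would do in this sketch.
std::string describe(Mode mode) {
    switch (mode) {
        case Mode::Server: return "running as server";
        case Mode::Client: return "running as client";
        default:           return "unknown mode";
    }
}
```

All modes share the same statically linked code, so common library code exists only once in the image.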
DLL's are the worst culprit for wasting space and slowing down the running time of a project. People think they are space savers, when in fact they tend to have the opposite effect, sometimes increasing project size by ten times! Plus they increase swapping. Use fixed code sections (no relocation section) for performance.