What is a Delphi DCU file? - c++

What is a Delphi DCU file?
I believe it stands for "Delphi Compiled Unit". Am I correct in assuming it contains object code, and therefore corresponds to an ".o" file compiled from a C/C++ source code file?

I believe .dcu generally means "Delphi Compiled Unit" as opposed to a .pas file which is simply "Pascal source code".
A .dcu file is the file that the DCC compiler produces after compiling the .pas files (.dfm files are converted to binary resources, then directly processed by the linker).
It's analogous to .o and .obj files that other compilers produce, but contains more information on the symbols (therefore you can reverse engineer the interface section of a unit from it omitting comments and compiler directives).
A .dcu file technically not a "cache" file, although your builds will run faster if you don't delete them and when doesn't need to recompile them. A .dcu file is tied to the compiler version that generated it. In that sense it is less portable than .o or .obj files (though they have their share of compatibility problems too)
Here's some history in case it adds anything.
Compilers have traditionally translated source code languages into some intermediate form. Interpreters don't do that -- they just interpret the language directly and run the application right away. BASIC is the classic example of an interpreted language. The "command line" in DOS and Windows has a language that can be written in files called "batch files" with a .bat extension. But typing things on the command line executed them directly. In *nix environments, there are a bunch of different command-line interpreters (CLIs), such as sh, csh, bash, ksh, and so on. You can create batch files from all of them -- this are usually referred to as "scripting languages". But there are a lot of other languages now that are both interpreted and compiled.
Anyway Java and .Net, for example, compile into something called an intermediate "byte-code" representation.
Pascal was originally written as a single-pass compiler, and Turbo Pascal (originating from PolyPascal) - with different editions for CP/M, CP/M-86 and DOS - directly generated a binary executable (COM) file that ran under those operating systems.
Pascal was originally designed as a small, efficient language intended to encourage good programming practices using structured programming and data structuring; Turbo Pascal 1 was originally designed as a an IDE with built-in very fast compiler, and an affordable competitor in the the DOS and CP/M market against the long edit/compile/link cycles at that time. Turbo Pascal and Pascal had similar limitations as any programming environment back then: memory and disk space were measured in kilobytes, processor speeds in Megahertz.
Linking to an executable binary prevented you from linking to separately compiled units and libraries.
Before Turbo Pascal, there was UCSD p-System operating system (supporting many languages, including Pascal. The UCSD Pascal compiler back then already extended the Pascal language with units) which compiled into a pseudo-machine byte-code (called p-code) format that allowed linking multiple units together. It was slow though,
Meanwhile, c evolved in VAX and Unix environments, and it compiled into .o files, which meant "object code" as opposed to "source code". Note: this is totally unrelated to anything we call "objects" today.
Turbo Pascal up to and including version 3 directly generated .com binary output files (although you could use amend those overlays files), and as of version 4 supported separating code into units that first compiled into .tpu files before linked into the final executable binary. The Turbo C compiler generated .obj (object code) files rather than byte-codes, and Delphi 2 introduced .obj file generation on order to co-operate with C++ Builder.
Object files use relative addressing within each unit, and require what's called "fix-ups" (or relocation) later on to make them run. Fix-ups point to symbolic labels that are expected to exist in other object files or libraries.
There are two kinds of "fix-ups": one is done statically by a tool called a "linker". The linker takes a bunch of object files and seams them together into something analogous to a patchwork quilt. It then "fixes-up" all of the relative references by plugging-in pointers to all of the externally-defined labels.
The second fix-ups are done dynamically when the program is loaded to run. They're done by something called the "loader", but you never see that. When you type a command on the command line, the loader is called to load an EXE file into memory, fix-up the remaining links based on where the file is loaded, and then control is transferred to the entry point of the application.
So .dcu files originated as .tpu files when Borland introduced units in Turbo Pascal, then changed extension with the introduction of Delphi. They are very different from .obj files, though you can link to .obj files from Turbo Pascal and Delphi.
Delphi also hid the linker entirely, so you just do a compile and a run. All of the linker settings are still there, however, in one of Delphi's options panes.

In addition to David Schwartz's answer, there is one case when a dcu actually is quite different from typical obj files generated in other languages: Generic type definitions. If a generic type is defined in a Delphi Unit, the compiler compiles this code into a syntax tree representation rather than to machine code. This syntax tree representation then is stored in the dcu file. When the generic type then is used and instantiated in another unit, the compiler will use this representation and "merge" it with the syntax tree of the unit using the generic type. You could think of this being somewhat analogues to method inlining. This, btw is also the reason why a unit that makes heavy use of generics takes much longer to compile, although the generic types are "linked in" from a dcu file.

A Delphi Compiled Unit contains object code, and pre-compiled headers, and is therefore somewhat comparable to both an obj file and a .pch / .gch file.
The 'interface' section of a Delphi source file corresponds to the header, and the 'implementation' section creates the object code.
Pre-compiled header files may significantly reduce compilation and link time. The DCU header section provides link information to other referenced units, that does not have to be re-discovered.
In the Delphi / Turbo Pascal environment, pre-compiled headers support strict type checking, which would have required source-code referencing if an Object file format like .coff or .obj had been used. (In C++, name mangling provides a similar but less complete function).

Related

How and why is uncompiled C/C++ sometimes used?

I've seen some programs such as ROS use uncompiled C/C++ files (raw code, not yet compiled). In this example, C++ is used but not compiled. Most C/C++ I've learned so far needs to be compiled to run.
My guess here is that either they're compiled by the system every time it runs, or it's interpreted like Python.
How and why exactly are such uncompiled C/C++ files used?
The premise of your question is untrue.
The source code will undergo a translation process just as it always is.
The CMakeLists.txt file tells CMake what to do, with help from some items set up when you followed the build instructions. In examples given on the reference page you linked to, we have the common declaration to build a C++ file into an executable:
add_executable(talker src/talker.cpp)
…along with all the ROS-related business that gets performed.

C++ systems not using "source files"

In The C++ Programming Language (4th edition), §15.1, Stroustrup states:
A file is the traditional unit
of storage (in a file system) and the traditional unit of compilation. There are systems that do not
store, compile, and present C++ programs to the programmer as sets of files.
Sadly, he doesn't give further information. Do you know any example of such systems?
EDIT:
I mean if you know any actual free, commercial, opensource or whatever C++ implementation that doesn't deal with files as we are accustomed to.
And I was wondering: Why do that systems exist? What's the point? What can be the advantages of such a design philosophy? What the drawbacks?
IIRC, in the 1980s IBM Visual Age C++ stored the program source code (or perhaps a faithful representation of its AST) in some proprietary database. (It is rumored that header files also sit in some database at that time).
And current C++ compilers are often able to get the source code from a generated file, or even some pipe. For instance, on my Linux I could have a program mygeneratorgenerating some C++ code on its stdout and invoke the GCC compiler as:
mygenerator | g++ -x c++ /dev/stdin -Wall -O -o myprogram
However, today, most compilers are generally compiling source files and header files from some file system. Notice that an optimizing compiler spend much more time in compilation proper than in disk IO, and you could use some tmpfs file system, so file read&write time is negligible when compiling C++ code (even parsing is often quicker than optimization & code generation).
So I know no C++ compiler used in 2015 which compiles and optimize source code outside of source files
Actually, generating C++ code is often a good idea (I'm doing it in MELT, which enables you to customize GCC), but usually you tweak your build procedure (e.g. your Makefile) to generate then compile some temporary C++ files. With current computers and operating systems and compilers (e.g. Linux & GCC) you could even generate some temporary C++ file, fork a compilation of it into a shared object plugin, and dlopen(3) it.
A possible reason to store the source code in something better than a file -e.g. some database- would be to make an incremental compiler, which would recompile only one function if it was the only modification from the previous compilation. In practice, this is difficult to implement in existing compilers (it has been discussed, and sort-of prototyped, within the GCC community, but nothing stable came out of this). But C++ or C is not the best language for such an approach (Common Lisp is much better, and SBCL is able to compile and optimize in memory and incrementally), in particular because of its preprocessor.
BTW, tinycc is able to compile C code sitting inside some const char* string in memory, but the performance of the generated machine code is bad (since tcc does not do any kind of serious optimizations, that current processors need so much).
Notice also that with link time optimizations (e.g. compile and link with g++ -flto -O2) compilers are keeping some form of the AST (actually the Gimple representation of GCC) in object files.
C++ source code can be stored in a database in various ways.

Difference between code object and executable file

I'm a C++ beginner and I'm studying the basics of the language. There is a topic in my book about the compiler and my problem is that I can not understand what the text wants to say:
C++ is a compiled language so you need to translate the source code in
a file that the computer can execute. This file is generated by the
compiler and is called the object code ( .obj ), but a program like
the "hello world" program is composed by a part that we wrote and a part
of the C++ library. The linker links these two parts of a program and
produces an executable file ( .exe ).
Why does my book tell that the file that is executed by the computer is the one with the obj suffix (the object code) and then say that it is the one with the exe suffix?
Object files are source compiled into binary machine language, but they contain unresolved external references (such as printf,for instance). They may need to be linked against other object files, third party libraries and almost always against C/C++ runtime library.
In Unix, both object and exe files are the same COFF format. The only difference is that object files have unresolved external references, while a.out files don't.
The C++ specification is a technical document in English. For C++11 have a look inside n3337 (or spend a lot of money to buy the paperback ISO standard). In theory you don't need a computer to run a C++ program (you could use a bunch of human slaves, but that would be unethical, inefficient, and unreliable).
You could have a C++ implementation which is an interpreter, not a compiler (e.g. Ch by SoftIntegration)
If you install Linux on your laptop (which I recommend doing to every student) then you could have several free software C++ compilers, in particular GCC and Clang/LLVM (using g++ and clang commands respectively). Source files are suffixed .cc, or .cxx, or .cpp, or even .C (I prefer .cc), and you could ask the compiler to handle a file of some other suffix as a C++ source file (but that is not conventional). Then, both object files (suffixed .o) and executables share the same ELF format. Conventionally, executables don't have any suffix (e.g. g++ is a binary executable, not doing much except starting other processes like cc1plus -the compiler proper-, as -the assembler-, ld -the linker- etc...)
In all cases I strongly recommend:
to enable all warnings and debug info during compilation (e.g. use g++ -Wall -g ....)
to improve your source code till you got no warnings
to learn how to use the debugger (gdb)
to be able to build your program on the command line
to use a version control system like git
to use a good editor like emacs, gedit, geany, or gvim
once you are writing programs in several source files, learn how to use a builder like make
to learn C++11 (or even perhaps C++14) rather than older C++ standards
to also learn other programming languages (Ocaml, Scheme, Haskell, Prolog, Scala, ....) since they would improve your thinking and your way of coding in C++
to study the source code of several free software coded in C++
to read the documentation of every function that you are using, e.g. on cppreference or in man pages (for Linux)
to understand what is undefined behavior (the fact that your program sometimes work does not make it correct).
Concretely, on Linux you could edit your Hello World program (file hello.cc) with gedit or emacs (with a command like gedit hello.cc) etc..., compile it using g++ -Wall -g hello.cc -o hello command, debug it using gdb ./hello, and repeat (don't forget to use git commands for version control).
Sometimes it makes sense to generate some C++ code, e.g. by some shell, Python, or awk script (or even by your own program coded in C++ which generates C++ code!).
Also, understand that an IDE is not a compiler (but runs the compiler for you).
The basic steps for creating an application from a C or C++ source file are as follows:
(1) the source files are created (by a person or generated by a program), (2) the source files are compiled (which is really two steps, Preprocessor and compilation) into object code, (3) the object files that are created by the C/C++ compiler are linked to create the .exe
So you have these steps of transforming one version of the computer program, the source files, to another, the executable. The C++ source is compiled to produce the object files. The object files are then linked to produce the executable file.
In most cases there are several different programs involved in the compile and link process with C and C++. Each program takes in certain files and creates new files.
C/C++ Preprocessor takes in source code files and generates source code files
C/C++ Compiler takes in source code files and generates object code files
the linker takes in object code files and libraries and generates executable files
See What is the difference between - 1) Preprocessor,linker, 2)Header file,library? Is my understanding correct?
Most compiler installations have a program that runs these various applications for you. So if you are using gcc then gcc program will run first the C++ Preprocessor then then C++ compiler and then the linker. However you can modify what gcc does with command line options to tell it to only run the C++ Preprocessor or to only compile the source files but not to link them or to only link the object code files.
A brief history of computer languages and programming
The languages used for programming computers along with the various software development tools have evolved over the years.
The first computers were programmed with numbers entered by switches on a console.
Then people started developing languages and software that could be used to create software more easily and quicker. The first major development was creating assembler language where each line of source was converted by a computer program into a machine code instruction. Along with this came the development of linkers (which link pieces of machine code together into larger pieces). Assemblers were improved by adding a macro or preprocessor facility somewhat like the C/C++ Preprocessor though designed for assembly language.
Then people created programming languages that looked more like people written languages rather than assembler (FORTRAN and COBOL and ALGOL for instance). These languages were easier to read and a single line of source might be converted into several machine instructions so it was more productive to write computer programs in these languages rather than assembler.
The C programming language was a later refinement using lessons learned from the early programming languages such as FORTRAN. And C used some of the same software development tools that already existed such as linkers which already existed. Still later C++ was invented, starting off as a refinement of C introducing object oriented facilities. In fact the first C++ compiler was really a C++ translator which translated C++ source code to C source code which was then compiled with a C compiler. However modern C++ is compiled straight to machine code in order to provide the full functionality of the C++ standard with templates, lambdas, and all the other things with C++11 and later.
linkers and loaders
When you run a program you run the executable file. The executable file contains several kinds of information. The first is the machine instructions that are the result of compiling the C++ source code. The other is information that the loader uses in order to know how to load the executable into memory.
In the old days, long long ago all libraries and object files were linked together into an executable file and the executable file was loaded by the loader and the loader was pretty simple.
Then people invented shared libraries and dynamic link libraries and this required the linker to be more complex and the loader to be more complex.
The linker became more complex because it had to be able to recognize the difference between using a shared library and a static library and be able to generate an executable file that not only contains the linked object code but also information for the loader about any dynamic libraries.
The loader became more complex because not only does the loader have to load the executable file into memory so that it can start running, the loader must also find any shared libraries or dynamic link libraries that are also needed and load those too. And the loader also has to do a certain amount of linking of the additional components, the shared libraries, so the loader does a lot more than it used to do.
See also
Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?
What is an application binary interface (ABI)?
How to make a SIMPLE C++ Makefile
Object code (within an object file): Output from a compiler intended as input for a linker (for the linker to produce executable code).
Executable: A program ready to be run (executed) on a computer

C++ Modules and the C++ ABI

I've been reading about the C++ modules proposal (latest draft) but I don't fully understand what problem(s) it aims to solve.
Is its purpose to allow a module built by one compiler to be used by any other compiler (on the same OS/architecture, of course)? That is, does the proposal amount to standardizing the C++ ABI?
If not, is there another proposal being considered that would standardize the C++ ABI and allow compilers to interoperate?
Pre-compiled headers (PCH) are special files that certain compilers can generate for a .cpp file. What they are is exactly that: pre-compiled source code. They are source code that has been fed through the compiler and built into a compiler-dependent format.
PCHs are commonly used to speed up compilation. You put commonly used headers in the PCH, then just include the PCH. When you do a #include on the PCH, your compiler does not actually do the usual #include work. It instead loads these pre-compiled symbols directly into the compiler. No running a C++ preprocessor. No running a C++ compiler. No #including a million different files. One file is loaded and symbols appear fully formed directly in your compiler's workspace.
I mention all that because modules are PCHs in their perfect form. PCHs are basically a giant hack built on top of a system that doesn't allow for actual modules. The purpose of modules is ultimately to be able to take a file, generate a compiler-specific module file that contains symbols, and then some other file loads that module as needed. The symbols are pre-compiled, so again, there is no need to #include a bunch of stuff, run a compiler, etc. Your code says, import thing.foo, and it appears.
Look at any of the STL-derived standard library headers. Take <map> for example. Odds are good that this file is either gigantic or has a lot of #inclusions of other files that make the resulting file gigantic. That's a lot of C++ parsing that has to happen. It must happen for every .cpp file that has #include <map> in it. Every time you compile a source file, the compiler has to recompile the same thing. Over. And over. And over again.
Does <map> change between compilations? Nope, but your compiler can't know that. So it has to keep recompiling it. Every time you touch a .cpp file, it must compile every header that this .cpp file includes. Even though you didn't touch those headers or source files that affect those headers.
PCH files were a way to get around this problem. But they are limited, because they're just a hack. You can only include one per .cpp file, because it must be the first thing included by .cpp files. And since there is only one PCH, if you do something that changes the PCH (like add a new header to it), you have to recompile everything in that PCH.
Modules have essentially nothing to do with cross-compiler ABI (though having one of those would be nice, and modules would make it a bit easier to define one). Their fundamental purpose is to speed up compile times.
Modules are what Java, C#, and a lot of other modern languages offer. They immensely reduce compile time simply because the code that's in today's header doesn't have to be parsed over and over again, everytime it's included. When you say #include <vector>, the content of <vector> will get copied into the current file. #include really is nothing else than copy and paste.
In the module world, you simply say import std.vector; for example and the compiler loads the query/symbol table of that module. The module file has a format that makes it easy for the compiler to parse and use it. It's also only parsed once, when the module is compiled. After that, the compiler-generated module file is just queried for the information that is needed.
Because module files are compiler-generated, they'll be pretty closely tied to the compiler's internal representation of the C++ code (AST) and will as such most likely not be portable (just like today's .o/.so/.a files, because of name mangling etc.).
Modules in C++ have to be primarily better thing than today solutions, that is, when a library consists of a *.so file and *.h file with API. They have to solve the problems that are today with #includes, that is:
require macroguards (macros that prevent that definitions are provided multiple times)
are strictly text-based (so they can be tricked and in normal conditions they are reinterpreted, which gives also a chance to look differently in different compilation unit to be next linked together)
do not distinguish between dependent libraries being only instrumentally used and being derived from (especially if the header provides inline function templates)
Despite to what Xeo says, modules do not exist in Java or C#. In fact, in these languages "loading modules" relies on that "ok, here you have the CLASSPATH and search through it to find whatever modules may provide symbols that the source file actually uses". The "import" declaration in Java is no "module request" at all - the same as "using" in C++ ("import ns.ns2.*" in Java is the same as "using namespace ns::ns2" in C++). I don't think such a solution can be used in C++. The closest approximation I can imagine are packages in Vala or modules in Tcl (those from 8.5 version).
I imagine that C++ modules are rather not possible to be cross-platform, nor dynamically loaded (requires a dedicated C++ dynamic module loader - it's not impossible, but today hard to define). They will definitely by platform-dependent and should also be configurable when requested. But a stable C++ ABI is practically only required within the range of one system, just as it is with C++ ABI now.

Converting COFF lib file to OMF format

Is there any way to convert COFF library (lib file) to OMF library for using with C++Builder6 ? This coff is not just import library, it conatians some code.
When I try to convert it using borland's coff2omf.exe, I get 1KB file from 15KB file.
Instead of DigitalMars converter, you may use the Object file converter -- objconv -- available at agner.org/optimize
This utility can be used for converting object files between COFF/PE,
OMF, ELF and Mach-O formats for all 32-bit and 64-bit x86 platforms.
Can modify symbol names in object files. Can build, modify and convert
function libraries across platforms. Can dump object files and
executable files. Also includes a very good disassembler supporting
the SSE4, AVX, AVX2, AVX512, FMA3, FMA4, XOP and Knights Corner
instruction sets. Source code included (GPL).
This is a great site for low-level optimization, and there are a lot of useful information in the associated manual PDF file, about the library formats across several platforms.
It's fairly typical for an OMF object file to be a lot smaller than an equivalent COFF object, so what you're getting may well be valid.
If you find that it's really not, you can probably break the lib file into individual object files, disassemble the object files, re-assemble them to OMF object files, and put those together into an OMF lib file.
This is rather late, but if anyone is looking for an answer, you can checkout COFFIMPLIB from DigitalMars. COFF2OMF is available at the same site, but it looks like that's older.
It may be worth noting that in newer versions of Delphi (>= XE2), the compiler accepts COFF as well as OMF. It's probably also true for C++ Builder. The 64 bit compilers use only COFF.
See here for more informations about linking COFF.
Integrating Delphi (omf) and Ada (gcc, coff) required lots of effort until I've given up doing it in a single exe.
I honestly tried to disintegrate gcc rtl and ada rtl .a (coff libraries) into lots of .o (objects), convert them via coff2omf (there were DMD coff2omf and iirc another convobj or so). Some of the coff .o failed to be converted to .obj so I can't say if it was a reliable way at all.
Assembler level conversion is not so simple when it takes to exceptions and other deep details.
It's a pity I haven't tried a tool named
ftp://ftp.styx.cabel.net/pub/UniLink/
It's not obvious, but UniLink can probably be used to achieve the goal. One of its targets is C++ Builder package (both dynamic and static). unilink -Tpp -GI should do the trick