Converting COFF lib file to OMF format - c++

Is there any way to convert a COFF library (.lib file) to an OMF library for use with C++Builder 6? This COFF library is not just an import library; it contains some code.
When I try to convert it using Borland's coff2omf.exe, I get a 1 KB file from a 15 KB file.

Instead of the Digital Mars converter, you may use the object file converter objconv, available at agner.org/optimize:
This utility can be used for converting object files between COFF/PE,
OMF, ELF and Mach-O formats for all 32-bit and 64-bit x86 platforms.
Can modify symbol names in object files. Can build, modify and convert
function libraries across platforms. Can dump object files and
executable files. Also includes a very good disassembler supporting
the SSE4, AVX, AVX2, AVX512, FMA3, FMA4, XOP and Knights Corner
instruction sets. Source code included (GPL).
This is a great site for low-level optimization, and the associated PDF manual contains a lot of useful information about library formats across several platforms.

It's fairly typical for an OMF object file to be a lot smaller than an equivalent COFF object, so what you're getting may well be valid.
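Before concluding that the conversion failed, it can help to check what each file actually is. Below is a minimal sketch, assuming the signatures from the Microsoft PE/COFF specification and the OMF specification: COFF archives start with `!<arch>\n`, OMF libraries with a library header record (0xF0), OMF object modules with THEADR (0x80) or LHEADR (0x82), and COFF objects with a little-endian Machine field (0x014C for i386, 0x8664 for x86-64). The function name `classify_object_format` is hypothetical, and only the two x86 machine types are recognised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Classifies the first bytes of a .lib/.obj file as COFF or OMF (x86 only).
std::string classify_object_format(const unsigned char* data, std::size_t size) {
    // COFF .lib files use the ar container: 8-byte magic "!<arch>\n".
    if (size >= 8 && std::memcmp(data, "!<arch>\n", 8) == 0) return "COFF archive";
    // OMF libraries begin with a library header record (type 0xF0).
    if (size >= 1 && data[0] == 0xF0) return "OMF library";
    // OMF object modules begin with THEADR (0x80) or LHEADR (0x82).
    if (size >= 1 && (data[0] == 0x80 || data[0] == 0x82)) return "OMF object";
    // COFF objects begin with the 16-bit little-endian Machine field.
    if (size >= 2) {
        std::uint16_t machine = static_cast<std::uint16_t>(data[0] | (data[1] << 8));
        if (machine == 0x014C) return "COFF object (i386)";
        if (machine == 0x8664) return "COFF object (x86-64)";
    }
    return "unknown";
}
```

Reading the first few bytes of both the 15 KB input and the 1 KB output and passing them to this function shows at least whether coff2omf produced an OMF library at all.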
If you find that it's really not, you can probably break the lib file into individual object files, disassemble the object files, re-assemble them to OMF object files, and put those together into an OMF lib file.
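Breaking the lib file into individual object files starts with walking the archive. COFF .lib files use the same `!<arch>\n` container as Unix ar: each member has a 60-byte header (name, date, user ID, group ID, mode, a 10-character decimal size, and a two-byte terminator), and member data is padded to an even offset. A sketch, assuming the whole file has been read into memory; `ArchiveMember` and `list_archive_members` are hypothetical names, and long member names stored in the `//` member are not resolved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

struct ArchiveMember {
    std::string name;    // raw 16-byte name field, trailing spaces removed
    std::size_t offset;  // offset of the member's data within the archive
    std::size_t size;    // size of the member's data in bytes
};

// Walks the members of an ar-style archive ("!<arch>\n" + 60-byte headers).
// Special members such as "/" (linker member) and "//" (long names) are listed as-is.
std::vector<ArchiveMember> list_archive_members(const unsigned char* data, std::size_t size) {
    std::vector<ArchiveMember> members;
    if (size < 8 || std::memcmp(data, "!<arch>\n", 8) != 0) return members;
    std::size_t pos = 8;
    while (pos + 60 <= size) {
        const char* hdr = reinterpret_cast<const char*>(data + pos);
        if (hdr[58] != '`' || hdr[59] != '\n') break;  // header terminator missing
        std::string name(hdr, 16);
        name.erase(name.find_last_not_of(' ') + 1);
        std::string size_field(hdr + 48, 10);          // decimal, space padded
        std::size_t member_size = std::strtoul(size_field.c_str(), nullptr, 10);
        std::size_t data_off = pos + 60;
        if (member_size > size - data_off) break;      // truncated archive
        members.push_back({name, data_off, member_size});
        pos = data_off + member_size + (member_size & 1);  // 2-byte alignment
    }
    return members;
}
```

Each (offset, size) pair can then be written out as a separate .obj file before disassembly.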

This is rather late, but if anyone is looking for an answer, you can check out COFFIMPLIB from Digital Mars. COFF2OMF is available at the same site, but it appears to be older.

It may be worth noting that in newer versions of Delphi (XE2 and later), the compiler accepts COFF as well as OMF. This is probably also true for C++ Builder. The 64-bit compilers use only COFF.
See here for more information about linking COFF.

Integrating Delphi (OMF) and Ada (GCC, COFF) took a lot of effort, until I gave up on doing it in a single EXE.
I honestly tried to split the GCC RTL and Ada RTL .a files (COFF libraries) into lots of .o objects and convert them with coff2omf (there was the DMD coff2omf and, IIRC, another tool called convobj or similar). Some of the COFF .o files failed to convert to .obj, so I can't say whether it was a reliable approach at all.
Assembler-level conversion is not so simple when it comes to exceptions and other deep details.
It's a pity I haven't tried a tool named UniLink:
ftp://ftp.styx.cabel.net/pub/UniLink/
It's not obvious, but UniLink can probably be used to achieve the goal. One of its targets is a C++ Builder package (both dynamic and static), so unilink -Tpp -GI should do the trick.

Related

Load and link obj files at Runtime in C/C++ (JIT)

I'm searching for a C or C++ library that can load and link obj files (it doesn't matter whether ELF or obj) dynamically at runtime. I spent some time searching for such a library, but without success.
What I tried:
LLVM:
Currently my best solution! I used Clang to generate .obj files in LLVM's bitcode format and used its JIT functions to dynamically load and execute the function. But LLVM is huge, and my PC at home doesn't have the power to compile all of LLVM just for the JIT. I also ran into problems with relocation overflows and unimplemented relocation types.
libjit:
I read that it can load and link .elf files. Sadly, I couldn't compile it for Windows, so I couldn't try it.
Nanojit and NativeJit:
It seems like they don't support JITting an object file.
So... what can I do? Do I have to stick with LLVM? Are there any alternatives?
As a first approximation, a .bc file is similar to a .o (or .obj) file, in that it is just the translation of C++ code into an intermediate language, and it can contain references to functions not defined in it, which must be found in libraries.
The JIT-compiled code, in turn, is similar to a DLL, in the sense that it is linked dynamically into the executable it runs in.
You don't need to compile LLVM yourself: you can download the binaries for LLVM and assorted utilities (like clang) from the LLVM Download Page.
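The "relocation overflow" mentioned in the question can be reproduced with a little arithmetic. On x86-64, a 32-bit PC-relative relocation such as ELF's R_X86_64_PC32 stores S + A - P: the target symbol's address plus an addend minus the address of the field being patched. When JIT-allocated code ends up more than about 2 GiB away from a symbol it references, that value no longer fits in 32 bits. A minimal sketch of the check; `pc32_value` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Value a 32-bit PC-relative relocation would store: S + A - P.
// Returns std::nullopt when it does not fit in a signed 32-bit field,
// the condition a linker or JIT reports as a relocation overflow.
std::optional<std::int32_t> pc32_value(std::uint64_t S, std::int64_t A, std::uint64_t P) {
    std::int64_t v = static_cast<std::int64_t>(S) + A - static_cast<std::int64_t>(P);
    if (v < std::numeric_limits<std::int32_t>::min() ||
        v > std::numeric_limits<std::int32_t>::max())
        return std::nullopt;
    return static_cast<std::int32_t>(v);
}
```

An addend of -4 is typical for a call or jump whose 4-byte displacement is the last field of the instruction. Keeping JIT memory close to the code it calls, or going through 64-bit absolute stubs, are the usual ways around the overflow.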

Finding all libraries and header files forming a C++ executable

If I have a C++ source file, gcc can give all its dependencies, in a tree structure, using the -H option. But given only the C++ executable, is it possible to find all libraries and header files that went into its compilation and linking?
If you've compiled the executable with debugging symbols, then yes, you can use the symbols to get the files.
If you have .pdb files (Visual Studio creates them to store debugging information separately), you can use all kinds of programs to open them and see the source files and methods.
You can even open one with a text editor, and you'll see, among the gibberish, a list of the functions and source files.
If you're using Linux (or GNU compilers in general), you can use gdb (again, only if debug symbols were enabled at compile time).
Run gdb on your executable, then run the command: info sources
That's an important reason why you should always build production releases without debug symbols (e.g. without gcc's -g flag). You don't want clients to mess around with your sources, functions, and code.
You cannot do that, because the executable might have been built on a machine on which the header files (or the C++ code, or the libraries) are private or even generated. Also, if a static library is linked in, you have no reliable way to find out.
In practice, however, on Linux, running nm, objdump or ldd on the executable will often (but not always) give you a good clue about the needed libraries.
Also, some executables load plugins dynamically, e.g. using dlopen, so your question might not make sense (since such a plugin is known only at runtime).
Notice also that you might not know whether an executable was obtained by compiling some C++ code (you might not be able to tell whether it was obtained from C, C++, D, or OCaml, ... source code, or a mixture of them).
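ldd answers the question for an executable on disk; from inside a running process, glibc's dl_iterate_phdr (declared in <link.h>) reports every ELF object actually mapped, including plugins loaded later with dlopen. A minimal Linux/glibc sketch; `loaded_objects` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <link.h>
#include <string>
#include <vector>

// Collects the path of every ELF object mapped into this process. glibc
// visits the main program first and reports it with an empty name.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> out;
    dl_iterate_phdr(
        [](dl_phdr_info* info, std::size_t, void* data) -> int {
            auto* names = static_cast<std::vector<std::string>*>(data);
            names->emplace_back(info->dlpi_name ? info->dlpi_name : "");
            return 0;  // zero keeps the iteration going
        },
        &out);
    return out;
}
```

Calling it once at startup and again after a dlopen makes the difference between link-time and runtime dependencies visible.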
On Linux, if you build an executable with static linking and stripping, people won't be able to easily guess the source programming language that you have used.
BTW, on Linux distributions, it is the role of the package management system to deal with such dependencies.
As answered by Yochai Timmer, if the executable contains debug information (e.g. in DWARF format), you should be able to get a lot more information.

What is a Delphi DCU file?

I believe it stands for "Delphi Compiled Unit". Am I correct in assuming it contains object code, and therefore corresponds to an ".o" file compiled from a C/C++ source code file?
I believe .dcu generally means "Delphi Compiled Unit" as opposed to a .pas file which is simply "Pascal source code".
A .dcu file is the file that the DCC compiler produces after compiling the .pas files (.dfm files are converted to binary resources, then directly processed by the linker).
It's analogous to .o and .obj files that other compilers produce, but contains more information on the symbols (therefore you can reverse engineer the interface section of a unit from it omitting comments and compiler directives).
A .dcu file is technically not a "cache" file, although your builds will run faster if you don't delete them, since the compiler then doesn't need to recompile those units. A .dcu file is tied to the compiler version that generated it. In that sense it is less portable than .o or .obj files (though those have their share of compatibility problems too).
Here's some history in case it adds anything.
Compilers have traditionally translated source code languages into some intermediate form. Interpreters don't do that: they interpret the language directly and run the application right away. BASIC is the classic example of an interpreted language. The "command line" in DOS and Windows has a language that can be written in files called "batch files" with a .bat extension, but typing things on the command line executes them directly. In *nix environments, there are a bunch of different command-line interpreters (CLIs), such as sh, csh, bash, ksh, and so on. You can write script files for all of them; these are usually referred to as "scripting languages". But there are now a lot of other languages that are both interpreted and compiled.
Java and .NET, for example, compile into an intermediate "byte-code" representation.
Pascal was originally written as a single-pass compiler, and Turbo Pascal (originating from PolyPascal) - with different editions for CP/M, CP/M-86 and DOS - directly generated a binary executable (COM) file that ran under those operating systems.
Pascal was originally designed as a small, efficient language intended to encourage good programming practices using structured programming and data structuring; Turbo Pascal 1 was originally designed as an IDE with a built-in, very fast compiler, and as an affordable competitor in the DOS and CP/M market to the long edit/compile/link cycles of that time. Turbo Pascal and Pascal had the same limitations as any programming environment back then: memory and disk space were measured in kilobytes, processor speeds in megahertz.
Compiling directly to an executable binary prevented you from linking to separately compiled units and libraries.
Before Turbo Pascal, there was the UCSD p-System operating system, which supported many languages, including Pascal (the UCSD Pascal compiler back then already extended the Pascal language with units). It compiled into a pseudo-machine byte-code format (called p-code) that allowed linking multiple units together. It was slow, though.
Meanwhile, C evolved in VAX and Unix environments, and it compiled into .o files, which meant "object code" as opposed to "source code". Note: this is totally unrelated to anything we call "objects" today.
Turbo Pascal up to and including version 3 directly generated .com binary output files (although you could extend them with overlay files), and as of version 4 supported separating code into units that were first compiled into .tpu files before being linked into the final executable binary. The Turbo C compiler generated .obj (object code) files rather than byte-codes, and Delphi 2 introduced .obj file generation in order to cooperate with C++ Builder.
Object files use relative addressing within each unit, and require what's called "fix-ups" (or relocation) later on to make them run. Fix-ups point to symbolic labels that are expected to exist in other object files or libraries.
There are two kinds of "fix-ups": one is done statically by a tool called a "linker". The linker takes a bunch of object files and seams them together into something analogous to a patchwork quilt. It then "fixes-up" all of the relative references by plugging-in pointers to all of the externally-defined labels.
The second kind of fix-up is done dynamically when the program is loaded to run, by something called the "loader", which you never see. When you type a command on the command line, the loader is called to load an EXE file into memory and fix up the remaining links based on where the file is loaded, and then control is transferred to the entry point of the application.
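The two kinds of fix-up described above can be sketched on a toy image. This is not any real object format: it assumes 32-bit little-endian absolute addresses, a symbol table mapping names to final addresses (what the linker builds from the other object files), and a list of offsets that the loader adjusts when the image is not loaded at its preferred base. `Fixup`, `apply_fixups` and `rebase` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fix-up record: "write the absolute address of `symbol` at `offset`".
struct Fixup {
    std::size_t offset;
    std::string symbol;
};

// Stores a 32-bit value little-endian at `off`, refusing to write past the end.
static void put32(std::vector<unsigned char>& buf, std::size_t off, std::uint32_t v) {
    if (off + 4 > buf.size()) throw std::out_of_range("fix-up outside image");
    for (int i = 0; i < 4; ++i) buf[off + i] = static_cast<unsigned char>(v >> (8 * i));
}

static std::uint32_t get32(const std::vector<unsigned char>& buf, std::size_t off) {
    if (off + 4 > buf.size()) throw std::out_of_range("fix-up outside image");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(buf[off + i]) << (8 * i);
    return v;
}

// Link time: resolve each symbolic fix-up against the symbol table.
void apply_fixups(std::vector<unsigned char>& image, const std::vector<Fixup>& fixups,
                  const std::map<std::string, std::uint32_t>& symbols) {
    for (const Fixup& f : fixups) {
        auto it = symbols.find(f.symbol);
        if (it == symbols.end()) throw std::runtime_error("unresolved external: " + f.symbol);
        put32(image, f.offset, it->second);
    }
}

// Load time: the image was linked for `preferred_base` but loaded at `actual_base`,
// so every recorded absolute address moves by the same delta (mod 2^32).
void rebase(std::vector<unsigned char>& image, const std::vector<std::size_t>& reloc_offsets,
            std::uint32_t preferred_base, std::uint32_t actual_base) {
    std::uint32_t delta = actual_base - preferred_base;
    for (std::size_t off : reloc_offsets) put32(image, off, get32(image, off) + delta);
}
```

A missing symbol surfaces as the familiar "unresolved external" error at link time, while rebasing needs no symbol names at all, only the list of places that hold absolute addresses.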
So .dcu files originated as .tpu files when Borland introduced units in Turbo Pascal, then changed extension with the introduction of Delphi. They are very different from .obj files, though you can link to .obj files from Turbo Pascal and Delphi.
Delphi also hid the linker entirely, so you just do a compile and a run. All of the linker settings are still there, however, in one of Delphi's options panes.
In addition to David Schwartz's answer, there is one case where a dcu actually is quite different from typical obj files generated in other languages: generic type definitions. If a generic type is defined in a Delphi unit, the compiler compiles this code into a syntax tree representation rather than into machine code. This syntax tree representation is then stored in the dcu file. When the generic type is then used and instantiated in another unit, the compiler uses this representation and "merges" it with the syntax tree of the unit using the generic type. You could think of this as somewhat analogous to method inlining. This, by the way, is also the reason why a unit that makes heavy use of generics takes much longer to compile, even though the generic types are "linked in" from a dcu file.
A Delphi Compiled Unit contains object code, and pre-compiled headers, and is therefore somewhat comparable to both an obj file and a .pch / .gch file.
The 'interface' section of a Delphi source file corresponds to the header, and the 'implementation' section creates the object code.
Pre-compiled header files may significantly reduce compilation and link time. The DCU header section provides link information to other referenced units, that does not have to be re-discovered.
In the Delphi / Turbo Pascal environment, pre-compiled headers support strict type checking, which would have required source-code referencing if an Object file format like .coff or .obj had been used. (In C++, name mangling provides a similar but less complete function).
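With GCC or Clang, which follow the Itanium C++ ABI, you can see how much type information name mangling carries by decoding a symbol with abi::__cxa_demangle from <cxxabi.h>. Parameter types are encoded, but the return type of an ordinary (non-template) function is not, which is one sense in which mangling is less complete than a DCU's interface information. A minimal sketch; `demangle` is a hypothetical wrapper, and MSVC and C++ Builder use different mangling schemes.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Decodes an Itanium-ABI mangled name into a readable signature.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates the result with malloc
    return result;
}
```

For example, the symbol _Z3fooid decodes to foo(int, double): the linker can tell an overload taking (int, double) from one taking (char const*), but nothing in the name says what either returns.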

Which files contains the implementations for malloc() and new()?

On Linux (Ubuntu) what is the path and file name where I can see the C/C++ code used in the malloc() and new() implementations?
I have looked in /usr/include but started to lose my way around. Does it depend on which version of gcc/g++ I have installed?
If someone could also give a general answer that would help me understand how Linux stores all the "native" functions, it would be most appreciated, and I wouldn't ever have to ask again for a different function.
One thing: new is a C++ operator rather than a library function; in GCC's libstdc++, the default global operator new is implemented on top of malloc.
The source for malloc is in the source for your version of libc, which is probably glibc. Look at their source.
Other functions that are really system calls have only thin wrapper implementations in glibc that invoke the underlying syscall.
The Git repository of the GNU C Library implementation can be found here.
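To make the relationship between new and malloc concrete, this is roughly what a malloc-backed replacement of the global operator new looks like. It is a sketch, not libstdc++'s actual source: it retries through std::get_new_handler as the standard requires and throws std::bad_alloc when no handler is installed. The counter `g_new_calls` is a hypothetical addition for demonstration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, only to show when the replacement is used.
static std::size_t g_new_calls = 0;

// Replaces the global, throwing operator new for the whole program.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) size = 1;  // must return a unique non-null pointer
    for (;;) {
        if (void* p = std::malloc(size)) return p;
        std::new_handler handler = std::get_new_handler();
        if (!handler) throw std::bad_alloc();
        handler();  // may free some memory; then retry
    }
}

// Matching deallocation functions (unsized and sized) return memory to free().
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

Because replacement happens at link time, defining these functions in any one source file of a program is enough for every new expression in that program to use them.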
From this point in the tree you should be able to find the rest as well.
The "implementation" is a library you can link (an "a" file or an "so" file) plus a header that contains the declarations (an "h" file).
The C and C++ source files stayed on the machine that built those libraries before they were used to build up your system. And since the sources are not required for your programs to work (you link against the binaries, not the sources), they are not distributed with the system.
That's why you have to download those files from the source repositories, just as you would if you wanted to rebuild the system yourself.
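To see which installed library actually provides a function your program uses, you can ask the dynamic loader at runtime. A minimal Linux/glibc sketch using dl_iterate_phdr: it returns the path of the loaded object whose PT_LOAD segment contains a given address. `object_containing` is a hypothetical name. Depending on how the program was linked (non-PIE builds, for instance), the address of a libc function taken in the executable may point at a stub inside the executable itself rather than into libc, so treat the answer as a hint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <link.h>
#include <optional>
#include <string>

// Returns the path of the loaded ELF object whose PT_LOAD segment contains
// `addr`: "" for the main executable (glibc reports it with an empty name),
// std::nullopt if no loaded object covers the address.
std::optional<std::string> object_containing(const void* addr) {
    struct Query {
        std::uintptr_t addr;
        std::optional<std::string> name;
    };
    Query q{reinterpret_cast<std::uintptr_t>(addr), std::nullopt};
    dl_iterate_phdr(
        [](dl_phdr_info* info, std::size_t, void* data) -> int {
            auto* query = static_cast<Query*>(data);
            for (int i = 0; i < info->dlpi_phnum; ++i) {
                const auto& ph = info->dlpi_phdr[i];
                if (ph.p_type != PT_LOAD) continue;
                std::uintptr_t start = info->dlpi_addr + ph.p_vaddr;
                if (query->addr >= start && query->addr < start + ph.p_memsz) {
                    query->name = info->dlpi_name ? info->dlpi_name : "";
                    return 1;  // non-zero return stops the iteration
                }
            }
            return 0;  // keep looking in the next object
        },
        &q);
    return q.name;
}
```

Passing the address of malloc typically names the libc shared object on a glibc system; the source for that library is what you would then fetch from its repository.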
You find this in the implementation of the C Standard Library the compiler uses.
I'm not sure about Ubuntu. Debian's gcc uses eglibc, whose sources can be found here.

What is the content of OBJ file?

I know that compiling C/C++ source code with any standard compiler produces an OBJ file, which is later linked with the rest of the required libraries to form the executable file. I want to know the format/structure of the OBJ file.
C++ Builder (and Delphi) use OMF format obj files. See this wikipedia link for details.
Additional information: Microsoft Visual C++ uses COFF, which is incompatible with OMF; that's why C++ Builder has a utility to convert them.
See also: What's the difference between the OMF and COFF format?
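An OMF file is a sequence of variable-length records: a one-byte record type, a two-byte little-endian length covering the contents plus a trailing checksum byte, the contents, and the checksum (chosen so that all bytes of the record sum to zero modulo 256, or left as 0 by producers that don't compute it). A minimal sketch of walking one object module, based on the OMF specification; `OmfRecord` and `walk_omf_records` are hypothetical names, and record contents are not decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct OmfRecord {
    std::uint8_t type;     // e.g. 0x80 THEADR, 0x88 COMENT, 0x8A/0x8B MODEND
    std::size_t offset;    // offset of the record's type byte
    std::uint16_t length;  // contents length plus 1 for the checksum byte
    bool checksum_ok;      // record bytes sum to 0 mod 256, or checksum is 0
};

// Walks the records of one OMF object module:
// [type:1][length:2, little-endian][contents:length-1][checksum:1]
std::vector<OmfRecord> walk_omf_records(const unsigned char* data, std::size_t size) {
    std::vector<OmfRecord> records;
    std::size_t pos = 0;
    while (pos + 3 <= size) {
        std::uint8_t type = data[pos];
        std::uint16_t len = static_cast<std::uint16_t>(data[pos + 1] | (data[pos + 2] << 8));
        if (len == 0 || pos + 3 + len > size) break;  // malformed or truncated record
        unsigned sum = 0;
        for (std::size_t i = pos; i < pos + 3 + len; ++i) sum += data[i];
        bool ok = data[pos + 2 + len] == 0 || (sum & 0xFFu) == 0;
        records.push_back({type, pos, len, ok});
        pos += 3 + len;
        if (type == 0x8A || type == 0x8B) break;  // MODEND closes the module
    }
    return records;
}
```

The compact record encoding is one reason an OMF object can be much smaller than the equivalent COFF file.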
The .obj file format used by Microsoft compilers is described in the Common Object File Format (COFF) spec.
Other compilers use different formats to store object code, e.g. ELF on Linux.
Under Windows, it would be a COFF object; search for this file format to find the spec. COFF objects are linked to produce a PE (Portable Executable) file.
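A COFF .obj starts with a 20-byte file header (Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics), followed by 40-byte section headers. A minimal sketch that lists the sections, assuming the whole file is in memory; `CoffSection` and `coff_sections` are hypothetical names, and long section names of the form /123 (offsets into the string table) are returned unresolved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct CoffSection {
    std::string name;           // up to 8 characters, e.g. ".text"
    std::uint32_t raw_size;     // SizeOfRawData
    std::uint16_t relocations;  // NumberOfRelocations
};

static std::uint16_t rd16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t rd32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Lists the section headers of a COFF object file held in memory.
std::vector<CoffSection> coff_sections(const unsigned char* data, std::size_t size) {
    std::vector<CoffSection> out;
    if (size < 20) return out;
    std::uint16_t nsections = rd16(data + 2);
    std::uint16_t opt_size = rd16(data + 16);  // normally 0 for .obj files
    std::size_t pos = 20 + opt_size;
    for (std::uint16_t i = 0; i < nsections; ++i, pos += 40) {
        if (pos + 40 > size) break;
        const unsigned char* sh = data + pos;
        std::size_t n = 0;
        while (n < 8 && sh[n] != 0) ++n;  // NUL-padded, unterminated if 8 chars long
        out.push_back({std::string(reinterpret_cast<const char*>(sh), n),
                       rd32(sh + 16), rd16(sh + 32)});
    }
    return out;
}
```

Tools such as objconv can dump the same information together with the symbol table and relocations.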