Is it possible to compile a program on one platform and link it on another? What does an object file contain? Can we de-link an executable to produce an object file?
No. In general object file formats might be the same, e.g. ELF, but the contents of the object files will vary from system to system.
An object file contains stuff like:
- Object code that implements the desired functionality
- A symbol table that can be used to resolve references
- Relocation information to allow the linker to locate the object code in memory
- Debugging information
The object code is usually not only processor specific, but also OS specific if, for example, it contains system calls.
Edit:
Is it possible to compile a program on one platform and link it on another?
Absolutely, if you use a cross-compiler. This compiler specifically targets a platform and generates object files (and programs) that are compatible with the target platform. So you can use an x86 Linux system, for example, to make programs for a PowerPC or ARM based system using the appropriate cross-compiler. I do it here.
Is it possible to compile a program on one platform and link it on another?
In general, no. Object files are compiler specific. Some compilers spit out COFF, others spit out ELF, etc. On top of that, you have to worry about calling conventions, system calls, etc. These are platform dependent.
What does an object file contain?
Symbol tables, code, relocation, linking and debugging information.
If what you're after is portability, then write portable C/C++ and let a platform-specific compliant compiler do the work.
In practice, no. There are several things that would have to be the same:
- OS interface (same system calls)
- memory layout of data (endianness, struct padding, etc.)
- calling convention
- object file format (e.g. ELF is pretty standard on Linux)
Look up ABI (application binary interface) for more information.
It doesn't need to be said again: C/C++ object files aren't portable.
On the other hand, ANSI C is one of the most portable languages there is. You may not be able to pick up your object files, but recompiling your source is likely to work if you stick to the ANSI C standard. This might be true of C++ as well.
I don't know how universal GNU C++ is, but if you can compile with gcc on one computer you're good to go on any other machine that also has gcc installed. Just about every machine you can think of has a C compiler. That's portability.
No. They are not platform independent. Take, for instance, the GNU C Compiler (gcc), which generates ELF binary files on Linux. Windows compilers (Borland, Microsoft, Open Watcom) produce the Windows PE (Portable Executable) format. Novell binaries use the NLM (NetWare Loadable Module) format.
As the examples above show, the output format depends on the compiler and platform. A linker on a Windows platform generally knows nothing about the ELF or NLM formats, so it cannot combine different formats to produce an executable that runs on any platform.
Take Apple's Mac OS X before the switch to Intel chips: it ran on the PowerPC platform. Even if you use the GNU C Compiler there, the binary is specifically for PowerPC. If you copy that binary onto a Linux system with a different processor, it will not run, because the instruction sets of the microprocessors differ.
Again, the same principle applies to the OS/390 mainframe system: a binary produced by a GNU C compiler for that platform will not run on a pre-Intel Apple Mac OS X.
Edit: To further clarify what the ELF format looks like, see below. This was obtained by running objdump -s main.o under Linux.
main.o: file format elf32-i386
Contents of section .text:
0000 8d4c2404 83e4f0ff 71fc5589 e55183ec .L$.....q.U..Q..
0010 14894df4 a1000000 00a30000 0000a100 ..M.............
0020 000000a3 00000000 8b45f483 38010f8e .........E..8...
0030 9c000000 8b55f48b 420483c0 048b0083 .....U..B.......
0040 ec086800 00000050 e8fcffff ff83c410 ..h....P........
0050 a3000000 00a10000 000085c0 7520a100 ............u ..
0060 00000050 6a1f6a01 68040000 00e8fcff ...Pj.j.h.......
0070 ffff83c4 10c745f8 01000000 eb5a8b45 ......E......Z.E
0080 f4833802 7e218b55 f48b4204 83c0088b ..8.~!.U..B.....
0090 0083ec08 68240000 0050e8fc ffffff83 ....h$...P......
00a0 c410a300 000000a1 00000000 85c07520 ..............u
00b0 a1000000 00506a20 6a016828 000000e8 .....Pj j.h(....
00c0 fcffffff 83c410c7 45f80100 0000eb08 ........E.......
00d0 e8fcffff ff8945f8 8b45f88b 4dfcc98d ......E..E..M...
00e0 61fcc3 a..
Contents of section .rodata:
0000 72000000 4552524f 52202d20 63616e6e r...ERROR - cann
0010 6f74206f 70656e20 696e7075 74206669 ot open input fi
0020 6c650a00 77000000 4552524f 52202d20 le..w...ERROR -
0030 63616e6e 6f74206f 70656e20 6f757470 cannot open outp
0040 75742066 696c650a 00 ut file..
Contents of section .comment:
0000 00474343 3a202847 4e552920 342e322e .GCC: (GNU) 4.2.
0010 3400 4.
Now compare that to a PE format for a simple DLL
C:\Program Files\Microsoft Visual Studio 9.0\VC\bin>dumpbin /summary "C:\Documents and Settings\Tom\My Documents\Visual Studio 2008\Projects\SimpleLib\Release\SimpleLib.dll"
Microsoft (R) COFF/PE Dumper Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file C:\Documents and Settings\Tom\My Documents\Visual Studio 2008\Projects\SimpleLib\Release\SimpleLib.dll
File Type: DLL
Summary
1000 .data
1000 .rdata
1000 .reloc
1000 .rsrc
1000 .text
Notice the differences in the sections: the ELF object, built for an i386 processor, has sections such as .bss, .text, .rodata and .comment, whereas the PE DLL has .data, .rdata, .reloc, .rsrc and .text.
Hope this helps,
Best regards,
Tom.
They are platform dependent. For example, the file command prints the following:
$ file foo.o
foo.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
C++ has the additional detail that the names that it puts into an object file are typically 'mangled' to deal with type safety for names that are overloaded. The methods used to mangle names are not part of the C++ standard (in fact, name mangling is an implementation detail that's not required at all if the vendor can come up with a different way to implement overloading). So even for the same platform target, you cannot count on being able to link object files from one compiler vendor to another.
There are times when a compiler vendor might change the name mangling scheme from one compiler version to another. For example, I believe there are versions of MSVC for which you can't reliably link C++ object files from an older version to a newer version.
Some platforms have the name mangling specified in an ABI standard for the platform (such as ARM which uses the name mangling specified in the generic C++ ABI that was originally developed for SVr4 on Itanium), but others don't (Windows). Even for the ARM, I'm not sure how interoperable the ABI standard makes linking C++ object files that were created by different compilers.
I just wanted to say that as long as they use the same processor architecture and object format, as well as the same calling convention (nowadays the processor maker usually defines one), there is a good chance that object files will work interchangeably.
However, even in C the compiler makes some assumptions about certain library functions being present, such as stack protection (the one I know of), and these need not be the same on both platforms. If such code is generated, the objects will not be directly compatible.
System calls are not really relevant as long as both systems provide them, since they are normally called through C wrappers in the standard libraries.
In the end this only applies to C and very similar OSes like Linux and the BSDs, but it can happen.
It's possible to compile with GCC and create an object file in ELF file format and convert the object file to work in Visual Studio. I have done this multiple times now.
There are three things you need to know to do this: the function calling convention, the object file format, and the function name mangling.
Function calling conventions: For 32-bit mode the function calling convention is easy: they are the same for Windows and Unix. For 64-bit mode Windows and Unix use different calling conventions. Therefore, in 64-bit mode you have to get the calling convention correct. You can either do this when you compile or from the object file itself. It's much easier to do this when you compile. To have GCC use the Windows calling convention use -mabi=ms. To do this from the object file you need a tool. Agner Fog's objconv tool can do this for some functions.
Object file format: To convert the object file format you need a tool. I use Agner Fog's objconv tool for this. It can convert from several different object file formats. For example, to convert ELF64 to COFF64 (PE32+) do objconv -fcoff64 foo.o foo.obj.
Function name mangling: Due to function overloading in C++, compilers mangle the function names. The details for each compiler can be found in Agner Fog's calling conventions manual. GCC and Visual Studio mangle function names differently. To work around this, precede function definitions with extern "C".
If you get all three of these correct and you don't make any OS specific calls, then you may be able to use your object files between compilers successfully. There are other problems that can occur of course. See the manual in objconv for more details. But so far this method has worked well for me.
Compiling this one-line C++ program with any Solaris Studio compiler gives an error on Solaris 11.4 when using the -library=stlport4 option.
hello.cpp
#include <iostream>
int main()
{
    std::cout << "Hello world" << std::endl;
    return 0;
}
$ /opt/solarisstudio12.4/bin/CC -m64 hello.cpp -o hello -library=stlport4
"/opt/solarisstudio12.4/lib/compilers/include/CC/stlport4/stl/_stdio_file.h", line 161: Error: __pad is not a member of const __FILE.
"/opt/solarisstudio12.4/lib/compilers/include/CC/stlport4/stl/_stdio_file.h", line 163: Error: __pad is not a member of const __FILE.
"/opt/solarisstudio12.4/lib/compilers/include/CC/stlport4/stl/_stdio_file.h", line 165: Error: __pad is not a member of const __FILE.
"/opt/solarisstudio12.4/lib/compilers/include/CC/stlport4/stl/_stdio_file.h", line 165: Error: __pad is not a member of const __FILE.
Please help me resolve this issue.
The 64-bit Solaris FILE structure is opaque:
64-bit applications should not rely on having access to the members of the FILE data structure. Attempts to access private implementation-specific structure members directly can result in compilation errors. Existing 32-bit applications are unaffected by this change, but any direct usage of these structure members should be removed from all code.
You can't access the FILE structure when doing a 64-bit compile on Solaris. It should work if you compile a 32-bit binary, though.
Why did Sun do this?
Because Sun (and now Oracle, for at least a bit longer) provide actual binary compatibility guarantees:
A binary application built on Solaris 2.6 or later that makes use of operating system interfaces as defined in stability.5 run on subsequent releases of Oracle Solaris, including their initial releases and all updates, even if the application has not been recompiled for those latest releases.
That's Oracle's guarantee. Sun's long-ago guarantee was actually stronger, pretty much saying if your code compiled, no later update to Solaris would break it.
And early versions of Solaris had only an 8-bit field for the FILE's associated file descriptor. And that file descriptor field was visible, and code built on early versions of Solaris used it.
So Sun was stuck with an 8-bit field for the file descriptor in the FILE structure.
But that was over three decades ago - before 64-bit processors came about.
And there were no legacy 64-bit binaries Sun had to worry about being binary forward-compatible with.
Since Sun only guaranteed binary compatibility, Sun made the new 64-bit FILE structure opaque so no compliant code could access it. (Yes, Sun was providing 64-bit systems in the early 1990s. My Little Pony killed a great, innovative company.)
Sun did provide an extended FILE library that could be used by programs needing more than 256 FILE's open at any one time:
The extended FILE facility allows 32-bit processes to use any valid file descriptor with the standard I/O (see stdio(3C)) C library functions. Historically, 32-bit applications have been limited to using the first 256 numerical file descriptors for use with standard I/O streams. By using the extended FILE facility this limitation is lifted. Any valid file descriptor can be used with standard I/O.
In Solaris 11.4 that now reads:
The extendedFILE.so.1 is an obsolete, empty, library, kept for binary compatibility only.
Its old purpose, the use of file descriptors larger than 255 for 32 bit binaries, is now the default behavior in Oracle Solaris.
The libc library now handles the environment variables originally handled by extendedFILE.so.1
The bottom line is if you want to access the 64-bit FILE structure on Solaris, you're not going to be able to do it.
This is a bug in the stlport4 headers provided with the compiler, Oracle Bug 27531287. Patches for Studio 12.3, 12.4, 12.5, and 12.6 are available to customers with current support contracts.
(Andrew Henle's answer explains the underlying problem, the stlport4 headers had depended on something they shouldn't have, and broke when that changed in Solaris 11.4.)
Compilers such as GCC and Clang allow compiling C++ programs without the C++ standard library, e.g. using the -nostdlib command line flag. It seems that such programs often fail to link, though. For example:
void f() noexcept { throw 42; }
int main() { f(); }
Usually fails to link due to undefined symbols like __cxa_allocate_exception, typeinfo for int, __cxa_throw, __gxx_personality_v0, __clang_call_terminate, __cxa_begin_catch, std::terminate() etc.
Even a simple
int main() {}
Fails to link with
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400120
and is killed by the OS upon execution. Using -c the compiler still runs the linker which blatantly fails with:
ld: error in mytest(.eh_frame); no .eh_frame_hdr table will be created.
Is it a realistic goal to program and compile C++ applications or libraries without using and linking to the standard library? How can I compile my code using GCC or Clang on Linux? What core language features would one be unable to use without the standard library?
You will basically find all of your questions answered at osdev.org, but I'll give a brief summary anyway.
When you give GCC -nostdlib, you are saying "no startup or library files". This includes:
crti.o, crtbegin.o, crtend.o and crtn.o. Generally kernel developers only care about implementing crti.o and crtn.o and let GCC supply crtbegin.o and crtend.o, whose paths can be found by passing -print-file-name= to the compiler. Generally crti.o and crtn.o are just stubs that open and close the .init and .fini sections, leaving room for GCC to shove the contents of crtbegin.o and crtend.o in between. These files are necessary for calling global constructors/destructors.
You can't avoid linking libgcc, the "low-level runtime library" (-lgcc), because even if you pass -nostdlib GCC will emit calls to its functions whenever it needs them, leading to seemingly inexplicable linking errors. This is the case even when you're implementing/porting a C library.
You don't "need" libstdc++ no, but typically kernel developers want it. Porting a C library then implementing the C++ standard library from scratch is an extremely difficult task.
Since you only want to get rid of the "standard library" but keep libc (on a Linux system), you're essentially programming C++ with just a C library. Of course, there's nothing wrong with this and you do you, but ultimately I don't see the point unless you plan on developing a kernel.
Required reading:
OSDev's C++ page - If you really care about RTTI/exception support, it's more annoying to implement than it sounds. Typically people just pass -fno-rtti or -fno-exceptions and then worry about it down the line or not at all.
"Standard" is a misnomer. In this context it doesn't mean "the library (set of functions, classes etc) as defined by the C++ standard" but "the usual set of libraries and objects (compiled files in a certain format) gcc links with by default". Some of those are necessary for most or even all programs to function.
If you use this flag, it's your responsibility to provide any missing functionality. There are several ways to do so:
- Cherry-pick libraries and objects that your program really needs out of the default set. (Makes little sense as the result will most probably be exactly the same as with the default link flags).
- Provide your own implementation of missing functionality.
- Explicitly disable, through compiler flags, language features your program isn't using. I know of two such features: exceptions and RTTI. This is needed because the compiler needs to generate exceptions-related code and RTTI info even if these features are not explicitly used in this module.
I'm working on C and C++ programs which need to run on several different embedded platforms, for which I have cross-compilers so I can do the build on my x86 desktop.
I have a horrible problem with certain functions, e.g. "strtod()". Here's my simple test program:
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    if ( (argc < 2) || (NULL == argv[1]) ) return 0;
    double myDouble = strtod(argv[1], NULL);
    printf("\nValue: %f\n\n", myDouble);
    return 0;
}
Normally I build all programs with dynamic linking to keep the binaries as small as possible. The above works fine on x86 and PowerPC. However, on the ARM system (BeagleBoard xM with Debian) strtod() misbehaves (the program always outputs "0.000000").
I tried building the program with the option '-static', and that worked on the Beagle:
root#beaglexm:/app# ./test.dynamic 1.23
Value: 0.000000
[Dynamic linked version - WRONG!!]
root#beaglexm:/app# ./test.static 1.23
Value: 1.230000
[Correct!!]
I also tested on a BeagleBone Black, which has a slightly different distribution. Both versions (static and dynamic) worked fine on the BBB.
Digging around in the libraries, I found the following version numbers:
Cross Compiler Toolchain: libc-2.9.so
BeagleBoard XM (DOESN'T WORK): libc-2.13.so
BeagleBone Black (WORKS!): libc-2.16.so
So my cross compiler is building against an older version of glibc. I've read in several places that glibc should be backwards-compatible.
I thought about static linking only libc, but according to this question it's a bad idea unless all libraries are statically linked.
Static linking everything does work, but there are serious constraints on the system which mean I need to keep the binaries as small as possible.
Any ideas what would cause horrible problems with strtod() (and similar functions) and/or why glibc 2.13 is not backwards compatible?
EDIT:
I didn't mention that the "soname" (i.e. top level name) is the same on all platforms: "libc.so.6". From my reading of the docs, the number AFTER the .so in the "soname" is the major version and only changes if the interface changes, hence all these versions should be compatible. The number BEFORE the .so which appears in the actual file name (shown above, and found by following the symlink) is the minor version.
Generally version numbers reflect compatibility. The number that appears between the .so and the next dot represents a MAJOR revision, not guaranteed compatible with any other major revision.
The number(s) that follow that, which you'll only see if you follow the symbolic links, represent a MINOR revision. These can be used interchangeably, and symlinks are used to do just that. The program links against libc.so.6 or whatever, and on the actual filesystem, libc.so.6 is a symbolic link to (for example) libc.so.6.12.
glibc tries to maintain compatibility even across major revisions, but there are times when they simply have to accept a breaking change. Typically this would be when a new version of the C or POSIX standard is released and function signatures get updated in a way that breaks binary compatibility.
Any numbers that appear before the .so will also break compatibility if changed; these usually represent a complete rewrite of a program. For example glib vs glib2. Not of concern for libc.
The tool ldd is very useful for investigating library dependencies and discovering which exact version of the library is actually being loaded.
I'm currently working on a compiler project using llvm. I have followed various tutorials to the point where I have a parser to create a syntax tree and then the tree is converted into an llvm Module using the provided IRBuilder.
My goal is to create an executable, and I am confused as to what to do next. All the tutorials I've found just create the llvm module and print out the assembly using Module.dump(). Additionally, the only documentation I can find is for llvm developers, and not end users of the project.
If I want to generate machine code, what are the next steps? The llvm-mc project looks like it may do what I want, but I can't find any sort of documentation on it.
Perhaps I'm expecting llvm to do something that it doesn't. My expectation is that I can build a Module, then there would be an API that I can call with the Module and a target triple and an object file will be produced. I have found documentation and examples on producing a JIT, and I am not interested in that. I am looking for how to produce compiled binaries.
I am working on OS X, if that has any impact.
Use llc -filetype=obj to emit a linkable object file from your IR. You can look at the code of llc to see the LLVM API calls it makes to emit such code. At least for Mac OS X and Linux, the objects emitted in such a manner should be pretty good (i.e. this is not an "alpha quality" option by now).
LLVM does not contain a linker (yet!), however. So to actually link this object file into some executable or shared library, you will need to use the system linker. Note that even if you have an executable consisting of a single object file, the latter has to be linked anyway. Developers in the LLVM community are working on a real linker for LLVM, called lld. You can visit its page or search the mailing list archives to follow its progress.
As you can read on the llc guide, it is indeed intended to just generate the assembly, and then "The assembly language output can then be passed through a native assembler and linker to generate a native executable" - e.g. the gnu assembler (as) and linker (ld).
So the main answer here is to use native tools for assembling and linking.
However, there's experimental support for generating the native object directly from an IR file, via llc:
-filetype - Choose a file type (not all types are supported by all targets):
=asm - Emit an assembly ('.s') file
=obj - Emit a native object ('.o') file [experimental]
Or you can use llvm-mc to assemble it from the .s file:
-filetype - Choose an output file type:
=asm - Emit an assembly ('.s') file
=null - Don't emit anything (for timing purposes)
=obj - Emit a native object ('.o') file
I don't know about linkers, though.
In addition, I recommend checking out the tools/bugpoint/ToolRunner.h file, which exposes a wrapper combining llc and the platform's native C toolchain for generating machine code. From its header comment:
This file exposes an abstraction around a platform C compiler, used to compile C and assembly code.
Check out these functions in llvm-c/TargetMachine.h:
/** Emits an asm or object file for the given module to the filename. This
wraps several c++ only classes (among them a file stream). Returns any
error in ErrorMessage. Use LLVMDisposeMessage to dispose the message. */
LLVMBool LLVMTargetMachineEmitToFile(LLVMTargetMachineRef T, LLVMModuleRef M,
char *Filename, LLVMCodeGenFileType codegen, char **ErrorMessage);
/** Compile the LLVM IR stored in \p M and store the result in \p OutMemBuf. */
LLVMBool LLVMTargetMachineEmitToMemoryBuffer(LLVMTargetMachineRef T, LLVMModuleRef M,
LLVMCodeGenFileType codegen, char** ErrorMessage, LLVMMemoryBufferRef *OutMemBuf);
To run the example BrainF program, compile it and run:
echo ,. > test.bf
./BrainF test.bf -o test.bc
llc -filetype=obj test.bc
gcc test.o -o a.out
./a.out
then type a single letter and press Enter. It should echo that letter back to you. (That's what ,. does.)
The above was tested with LLVM version 3.5.0.
I want to run an example plugin for Clang/LLVM, specifically llvm\tools\clang\examples\PrintFunctionNames. I managed to build it and I see a PrintFunctionNames.exports file, but I don't think Visual Studio supports it. The file is simply _ZN4llvm8Registry*. I have no idea what that is, but I suspect it's namespace llvm, class Registry, which is defined as
template <typename T, typename U = RegistryTraits<T> >
class Registry {
I suspect the key line is at the end of the example file
static FrontendPluginRegistry::Add<PrintFunctionNamesAction> X("print-fns", "print function names");
print-fns is the name, while the second parameter is the description. When I try loading/running the DLL via
clang -cc1 -load printFunctionNames.dll -plugin print-fns a.c
I get an error about not finding print-fns. I suspect it's because the static variable is never initialized, so it never registers the plugin. A wrong DLL name would instead give an error about loading the module.
I created a def file and added it to my project. It compiled but still no luck. Here is my def file
LIBRARY printFunctionNames
EXPORTS
X DATA
How do I register the plugin or get this example working?
OK, this is becoming slightly clearer. To summarize: Visual Studio has nothing to do with it, really. This is a plugin for the clang executable. Therefore, there must be a method to communicate between them (the plugin interface). This appears to be an undocumented interface, so it's taking a bit of guesswork.
Troubleshooting DLL issues is done with "Dependency Walker" aka "Depends". It offers a profiling mode, in which all symbol lookups can be profiled. I.e. if you profile clang -cc1 -load printFunctionNames.dll -plugin print-fns a.c, you will see what symbols clang expects from your DLL, and in what order.
It looks like you're trying to mix C++ code built with two different, incompatible compilers. That's not supported, and the error you're seeing is a typical sign of that: C++ compilers usually use a "name mangling scheme", and if two compilers are incompatible then their name mangling schemes don't line up. One compiler may mangle llvm::Registry as _ZN4llvm8Registry* while another refers to it as llvm__Registry.