How and why is uncompiled C/C++ sometimes used?

I've seen some programs such as ROS use uncompiled C/C++ files (raw code, not yet compiled). In this example, C++ is used but not compiled. Most C/C++ I've learned so far needs to be compiled to run.
My guess here is that either they're compiled by the system every time it runs, or it's interpreted like Python.
How and why exactly are such uncompiled C/C++ files used?

The premise of your question is untrue.
The source code undergoes a translation process, just as it always does.
The CMakeLists.txt file tells CMake what to do, with help from some items set up when you followed the build instructions. In the examples given on the reference page you linked to, we have the common declaration to build a C++ file into an executable:
add_executable(talker src/talker.cpp)
…along with all the ROS-related business that gets performed.
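To make the point concrete, here is a hedged sketch of roughly what that one add_executable line asks the build system to do. The file contents and plain g++ commands below are illustrative stand-ins; the real catkin/CMake invocation adds many ROS-specific include paths and libraries that are omitted here.

// src/talker.cpp -- stand-in for the real ROS node, shown only to illustrate
// that this is ordinary C++ that gets compiled (the real file uses the ROS client library)
#include <iostream>

int main() {
    std::cout << "talker placeholder\n";
    return 0;
}

// Roughly what add_executable(talker src/talker.cpp) boils down to, stripped of ROS-specific flags:
//   g++ -c src/talker.cpp -o talker.o   # compile the C++ source to object code
//   g++ talker.o -o talker              # link it into the `talker` executable

So the .cpp files in the repository are ordinary source files: CMake/make compiles them when you build the workspace; nothing is interpreted at run time.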

Related

Compiling modules in different directories

I'm trying to follow these instructions to compile a module that depends on another module which I've created: https://ocaml.org/learn/tutorials/modules.html
In my case, I have a module ~/courseFiles/chapter5/moduleA.ml and another module in ~/OCamlCommons/listMethods.ml. I have compiled listMethods.ml using ocamlopt -c listMethods.ml and this seemed to work; it produced a file listMethods.cmx.
The file moduleA.ml contains open ListMethods;;. Now, with my terminal located at ~/courseFiles/chapter5, I ran ocamlopt -c moduleA.ml, but the terminal returns
Error: Unbound module ListMethods
Now I can understand why it would do this, but the instructions at that site seem to indicate that what I've done is how you're supposed to do this. Presumably I need to pass in the location of either the source or the compiled files when compiling moduleA.ml, but I'm not sure what the syntax should be. I've tried a few guesses at the syntax, and also guessed at how I could do this with ocamlfind, but I haven't succeeded. I tried looking for instructions on compiling modules located in different directories, but didn't find anything (or at least nothing I could make sense of).
First of all, the toolkit that is shipped with the OCaml System Distribution (aka the compiler) is very versatile but quite low-level and should be seen as a foundation layer for building more high-level build systems. Therefore, learning it is quite hard and usually makes sense only if you're going to build such systems. It is much easier to learn how to use dune or oasis or ocamlbuild instead. Moreover, it will divert your attention from what actually matters - learning the language.
With all that said, let me answer your question in full detail. OCaml implements a separate compilation scheme, where each compilation unit can be built independently and then linked into a single binary. This scheme is common in C/C++, and in fact the OCaml compiler toolchain is very similar to the C compiler toolchain.
When you run ocamlopt -c x.ml you're creating a compilation unit, and as a result a few files are produced, namely:
x.o - contains the actual compiled machine code
x.cmx - contains optimization data and other compiler-specific information
x.cmi - contains the compiled interface of the module X
In order to compile a module, the compiler doesn't need the code of any other modules used by that module. What it needs is the typing information, i.e., it needs to know the type of the List.find function, or of any other function provided by some module external to yours. This information is stored in cmi files, for which (compiled) header files from C/C++ are the closest counterpart. As in C/C++, the compiler searches for them in the include search path, which by default includes the current directory and the location of the standard library, but can be extended using the -I option (the same as in C/C++). Therefore, if your module uses another module defined in a folder A, you need to tell the compiler where to search for it, e.g.,
ocamlopt -I A -c x.ml
The produced object file will not contain any code from external modules. Therefore, once you reach the final stage of compilation, the linking phase, you have to provide the implementation. E.g., if your module X was using a module implemented in a file with relative path A/y.ml, and you compiled it in that folder, then you need to specify the location of the compiled implementation again, e.g.,
ocamlopt -I A y.cmx x.cmx -o exe
The order is important: all modules used by a module must be specified before that module; otherwise you will get the "No implementations provided" error.
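For comparison, since the same separate-compilation scheme exists in C/C++, here is a hedged sketch of the analogous C++ workflow; all file and folder names below are made up. The header plays the role of the .cmi file, and -I extends the include search path in the same way.

// A/y.h -- the "interface", playing the role of y.cmi
int twice(int n);

// A/y.cpp -- the implementation, compiled separately into A/y.o
int twice(int n) { return 2 * n; }

// x.cpp -- uses the module defined under A/
#include "y.h"   // found via -I A, like OCaml's include search path
int main() { return twice(21); }

// Assumed g++ commands mirroring the ocamlopt ones above:
//   g++ -c A/y.cpp -o A/y.o     # compile the dependency
//   g++ -I A -c x.cpp -o x.o    # compile the user; only the header is needed here
//   g++ A/y.o x.o -o exe        # link both object files into the executable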
As you can see, it is a pretty convoluted process, and it is really not worth investing your time in learning it. So, if you have the option, use a higher-level tool to build your programs. If you're not sure which, choose Dune :)

Difference between object code and executable file

I'm a C++ beginner and I'm studying the basics of the language. There is a topic in my book about the compiler, and my problem is that I cannot understand what the text is trying to say:
C++ is a compiled language, so you need to translate the source code into a file that the computer can execute. This file is generated by the compiler and is called the object code (.obj), but a program like the "hello world" program is composed of a part that we wrote and a part of the C++ library. The linker links these two parts of a program and produces an executable file (.exe).
Why does my book say that the file executed by the computer is the one with the .obj suffix (the object code) and then say that it is the one with the .exe suffix?
Object files are source code compiled into binary machine language, but they contain unresolved external references (such as printf, for instance). They may need to be linked against other object files, third-party libraries, and almost always against the C/C++ runtime library.
On Unix, object files and executables traditionally share the same format (COFF on older systems, ELF on modern ones). The difference is that object files still have unresolved external references, while the linked a.out executable does not.
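As a hedged illustration of those unresolved references (the file names and commands are made up, assuming g++ on a Unix-like system):

// greet.cpp -- becomes greet.o; the call to std::printf stays unresolved until link time
#include <cstdio>
void greet() {
    std::printf("hello from greet()\n");   // resolved against the C runtime by the linker
}

// main.cpp -- becomes main.o; the call to greet() is unresolved here too
void greet();   // declaration only, no definition in this translation unit
int main() {
    greet();
    return 0;
}

// A typical build:
//   g++ -c greet.cpp           # greet.o still references printf
//   g++ -c main.cpp            # main.o still references greet()
//   g++ main.o greet.o -o app  # the linker resolves both, pulling in the C/C++ runtime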
The C++ specification is a technical document in English. For C++11 have a look inside n3337 (or spend a lot of money to buy the paperback ISO standard). In theory you don't need a computer to run a C++ program (you could use a bunch of human slaves, but that would be unethical, inefficient, and unreliable).
You could have a C++ implementation which is an interpreter, not a compiler (e.g. Ch by SoftIntegration)
If you install Linux on your laptop (which I recommend to every student), you have several free software C++ compilers available, in particular GCC and Clang/LLVM (using the g++ and clang++ commands respectively). Source files are suffixed .cc, or .cxx, or .cpp, or even .C (I prefer .cc), and you can ask the compiler to handle a file with some other suffix as a C++ source file (but that is not conventional). Then both object files (suffixed .o) and executables share the same ELF format. Conventionally, executables don't have any suffix (e.g. g++ is itself a binary executable, which does little more than start other processes: cc1plus, the compiler proper; as, the assembler; ld, the linker; and so on).
In all cases I strongly recommend:
to enable all warnings and debug info during compilation (e.g. use g++ -Wall -g ....)
to improve your source code until you get no warnings
to learn how to use the debugger (gdb)
to be able to build your program on the command line
to use a version control system like git
to use a good editor like emacs, gedit, geany, or gvim
once you are writing programs in several source files, learn how to use a builder like make
to learn C++11 (or even perhaps C++14) rather than older C++ standards
to also learn other programming languages (Ocaml, Scheme, Haskell, Prolog, Scala, ....) since they would improve your thinking and your way of coding in C++
to study the source code of several free software coded in C++
to read the documentation of every function that you are using, e.g. on cppreference or in man pages (for Linux)
to understand what undefined behavior is (the fact that your program sometimes works does not make it correct).
Concretely, on Linux you could edit your Hello World program (file hello.cc) with gedit or emacs (with a command like gedit hello.cc), compile it using the command g++ -Wall -g hello.cc -o hello, debug it using gdb ./hello, and repeat (don't forget to use git commands for version control).
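For reference, a minimal hello.cc matching those commands could look like this (just a sketch of the usual hello-world program):

// hello.cc -- minimal program for the g++ / gdb commands above
#include <iostream>

int main() {
    std::cout << "Hello, world!" << std::endl;
    return 0;
}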
Sometimes it makes sense to generate some C++ code, e.g. by some shell, Python, or awk script (or even by your own program coded in C++ which generates C++ code!).
Also, understand that an IDE is not a compiler (but runs the compiler for you).
The basic steps for creating an application from a C or C++ source file are as follows:
(1) the source files are created (by a person or generated by a program),
(2) the source files are compiled (which is really two steps, preprocessing and compilation proper) into object code, and
(3) the object files created by the C/C++ compiler are linked to create the .exe
So you have these steps of transforming one version of the computer program, the source files, to another, the executable. The C++ source is compiled to produce the object files. The object files are then linked to produce the executable file.
In most cases there are several different programs involved in the compile and link process with C and C++. Each program takes in certain files and creates new files.
C/C++ Preprocessor takes in source code files and generates source code files
C/C++ Compiler takes in source code files and generates object code files
the linker takes in object code files and libraries and generates executable files
See What is the difference between - 1) Preprocessor,linker, 2)Header file,library? Is my understanding correct?
Most compiler installations have a driver program that runs these various applications for you. So if you are using gcc, then the gcc program will first run the C++ preprocessor, then the C++ compiler, and then the linker. However, you can modify what gcc does with command-line options: tell it to only run the preprocessor, to only compile the source files but not link them, or to only link the object code files.
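As a hedged illustration of stopping at each stage (the file name is made up; the flags are standard GCC options):

// stages.cpp -- a trivial file used only to demonstrate the separate stages
#include <cstdio>
int main() {
    std::puts("stages demo");
    return 0;
}

// With GCC/g++ each stage can be requested on its own:
//   g++ -E stages.cpp -o stages.ii   # preprocess only (expanded source)
//   g++ -S stages.ii  -o stages.s    # compile to assembly
//   g++ -c stages.s   -o stages.o    # assemble into an object file
//   g++ stages.o      -o stages      # link into an executable
// or simply: g++ stages.cpp -o stages   (the driver runs all stages for you)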
A brief history of computer languages and programming
The languages used for programming computers along with the various software development tools have evolved over the years.
The first computers were programmed with numbers entered by switches on a console.
Then people started developing languages and software that could be used to create software more easily and quickly. The first major development was the creation of assembly language, where each line of source was converted by a computer program into a machine-code instruction. Along with this came the development of linkers (which link pieces of machine code together into larger pieces). Assemblers were improved by adding a macro or preprocessor facility somewhat like the C/C++ preprocessor, though designed for assembly language.
Then people created programming languages that looked more like written human language than assembler (FORTRAN, COBOL, and ALGOL, for instance). These languages were easier to read, and a single line of source might be converted into several machine instructions, so it was more productive to write computer programs in these languages than in assembler.
The C programming language was a later refinement, using lessons learned from the early programming languages such as FORTRAN, and C used some of the software development tools that already existed, such as linkers. Still later, C++ was invented, starting off as a refinement of C that introduced object-oriented facilities. In fact, the first C++ compiler was really a C++ translator, which translated C++ source code to C source code that was then compiled with a C compiler. However, modern C++ is compiled straight to machine code in order to provide the full functionality of the C++ standard, with templates, lambdas, and all the other things in C++11 and later.
linkers and loaders
When you run a program you run the executable file. The executable file contains several kinds of information. The first is the machine instructions that are the result of compiling the C++ source code. The other is information that the loader uses in order to know how to load the executable into memory.
In the old days, long ago, all libraries and object files were linked together into an executable file, the executable file was loaded by the loader, and the loader was pretty simple.
Then people invented shared libraries and dynamic link libraries and this required the linker to be more complex and the loader to be more complex.
The linker became more complex because it had to be able to recognize the difference between using a shared library and a static library and be able to generate an executable file that not only contains the linked object code but also information for the loader about any dynamic libraries.
The loader became more complex because not only does the loader have to load the executable file into memory so that it can start running, the loader must also find any shared libraries or dynamic link libraries that are also needed and load those too. And the loader also has to do a certain amount of linking of the additional components, the shared libraries, so the loader does a lot more than it used to do.
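To make the shared-library case concrete, here is a hedged sketch (file names and commands are made up, assuming g++ on GNU/Linux):

// hello_lib.cpp -- built into a shared library
#include <cstdio>
void say_hello() { std::printf("hello from the shared library\n"); }

// main.cpp -- links against the shared library; say_hello() is resolved by the loader
void say_hello();
int main() { say_hello(); return 0; }

// Assumed commands:
//   g++ -shared -fPIC hello_lib.cpp -o libhello.so   # build the shared library
//   g++ main.cpp -L. -lhello -o app                  # the linker records the dependency
//   LD_LIBRARY_PATH=. ./app                          # the loader finds and maps libhello.so at run time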
See also
Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?
What is an application binary interface (ABI)?
How to make a SIMPLE C++ Makefile
Object code (within an object file): Output from a compiler intended as input for a linker (for the linker to produce executable code).
Executable: A program ready to be run (executed) on a computer

What is a Delphi DCU file?

What is a Delphi DCU file?
I believe it stands for "Delphi Compiled Unit". Am I correct in assuming it contains object code, and therefore corresponds to an ".o" file compiled from a C/C++ source code file?
I believe .dcu generally means "Delphi Compiled Unit" as opposed to a .pas file which is simply "Pascal source code".
A .dcu file is the file that the DCC compiler produces after compiling the .pas files (.dfm files are converted to binary resources, then directly processed by the linker).
It's analogous to .o and .obj files that other compilers produce, but contains more information on the symbols (therefore you can reverse engineer the interface section of a unit from it omitting comments and compiler directives).
A .dcu file is technically not a "cache" file, although your builds will run faster if you don't delete them, since the compiler won't need to recompile those units. A .dcu file is tied to the compiler version that generated it. In that sense it is less portable than .o or .obj files (though they have their share of compatibility problems too).
Here's some history in case it adds anything.
Compilers have traditionally translated source code languages into some intermediate form. Interpreters don't do that -- they just interpret the language directly and run the application right away. BASIC is the classic example of an interpreted language. The "command line" in DOS and Windows has a language that can be written in files called "batch files" with a .bat extension. But typing things on the command line executes them directly. In *nix environments, there are a bunch of different command-line interpreters (CLIs), such as sh, csh, bash, ksh, and so on. You can create batch files for all of them -- these are usually referred to as "scripting languages". But there are a lot of other languages now that are both interpreted and compiled.
Anyway Java and .Net, for example, compile into something called an intermediate "byte-code" representation.
Pascal was originally written as a single-pass compiler, and Turbo Pascal (originating from PolyPascal) - with different editions for CP/M, CP/M-86 and DOS - directly generated a binary executable (COM) file that ran under those operating systems.
Pascal was originally designed as a small, efficient language intended to encourage good programming practices using structured programming and data structuring; Turbo Pascal 1 was originally designed as an IDE with a built-in, very fast compiler, an affordable competitor in the DOS and CP/M market against the long edit/compile/link cycles of that time. Turbo Pascal and Pascal had similar limitations to any programming environment back then: memory and disk space were measured in kilobytes, processor speeds in megahertz.
Compiling directly to an executable binary prevented you from linking in separately compiled units and libraries.
Before Turbo Pascal, there was the UCSD p-System operating system (supporting many languages, including Pascal; the UCSD Pascal compiler back then already extended the Pascal language with units), which compiled into a pseudo-machine byte-code (called p-code) format that allowed linking multiple units together. It was slow, though.
Meanwhile, C evolved in VAX and Unix environments, and it compiled into .o files, which meant "object code" as opposed to "source code". Note: this is totally unrelated to anything we call "objects" today.
Turbo Pascal up to and including version 3 directly generated .com binary output files (although you could also use overlay files), and as of version 4 supported separating code into units that were first compiled into .tpu files and then linked into the final executable binary. The Turbo C compiler generated .obj (object code) files rather than byte-code, and Delphi 2 introduced .obj file generation in order to co-operate with C++ Builder.
Object files use relative addressing within each unit, and require what's called "fix-ups" (or relocation) later on to make them run. Fix-ups point to symbolic labels that are expected to exist in other object files or libraries.
There are two kinds of "fix-ups": one is done statically by a tool called a "linker". The linker takes a bunch of object files and seams them together into something analogous to a patchwork quilt. It then "fixes-up" all of the relative references by plugging-in pointers to all of the externally-defined labels.
The second kind of fix-up is done dynamically when the program is loaded to run. It is done by something called the "loader", but you never see that. When you type a command on the command line, the loader is called to load an EXE file into memory, fix up the remaining links based on where the file is loaded, and then transfer control to the entry point of the application.
So .dcu files originated as .tpu files when Borland introduced units in Turbo Pascal, then changed extension with the introduction of Delphi. They are very different from .obj files, though you can link to .obj files from Turbo Pascal and Delphi.
Delphi also hid the linker entirely, so you just do a compile and a run. All of the linker settings are still there, however, in one of Delphi's options panes.
In addition to David Schwartz's answer, there is one case where a dcu is actually quite different from the typical obj files generated in other languages: generic type definitions. If a generic type is defined in a Delphi unit, the compiler compiles this code into a syntax-tree representation rather than machine code. This syntax-tree representation is then stored in the dcu file. When the generic type is used and instantiated in another unit, the compiler uses this representation and "merges" it with the syntax tree of the unit using the generic type. You could think of this as being somewhat analogous to method inlining. This, by the way, is also the reason why a unit that makes heavy use of generics takes much longer to compile, even though the generic types are "linked in" from a dcu file.
A Delphi Compiled Unit contains object code, and pre-compiled headers, and is therefore somewhat comparable to both an obj file and a .pch / .gch file.
The 'interface' section of a Delphi source file corresponds to the header, and the 'implementation' section creates the object code.
Pre-compiled header files may significantly reduce compilation and link time. The DCU header section provides link information to other referenced units, that does not have to be re-discovered.
In the Delphi / Turbo Pascal environment, pre-compiled headers support strict type checking, which would have required source-code referencing if an Object file format like .coff or .obj had been used. (In C++, name mangling provides a similar but less complete function).
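For the C++ side of that comparison, here is a hedged sketch of the GCC pre-compiled header (.gch) workflow mentioned above; the file names are made up:

// common.h -- the header we want to precompile
#include <vector>
#include <string>

// use.cpp -- includes common.h as its first include, so the .gch can be used
#include "common.h"
int count(const std::vector<std::string>& v) { return static_cast<int>(v.size()); }

// Assumed commands:
//   g++ -x c++-header common.h -o common.h.gch   # precompile the header once
//   g++ -c use.cpp                                # GCC picks up common.h.gch automatically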

make SCons compile everything in one gcc line?

I have a rather complex SCons script that compiles a big C++ project.
This gcc manual page says:
The compiler performs optimization based on the knowledge it has of the program. Compiling multiple files at once to a single output file mode allows the compiler to use information gained from all of the files when compiling each of them.
So it's better to give all my files to a single g++ invocation and let it drive the compilation however it pleases.
But SCons does not do this. It calls g++ separately for every single C++ file in the project and then links them using ld.
Is there a way to make SCons do this?
The main reason to have a build system with the ability to express dependencies is to support some kind of conditional/incremental build. Otherwise you might as well just use a script with the one command you need.
That being said, the gain from having gcc/g++ optimize across files as the manual describes can be substantial, in particular if you have C++ templates you use often. Good for run-time performance, bad for recompile performance.
I suggest you try and make your own builder doing what you need. Here is another question with an inspirational answer: SCons custom builder - build with multiple files and output one file
Currently the answer is no.
Logic similar to this was developed for MSVC only.
You can see this in the man page (http://scons.org/doc/production/HTML/scons-man.html) as follows:
MSVC_BATCH When set to any true value, specifies that SCons should
batch compilation of object files when calling the Microsoft Visual
C/C++ compiler. All compilations of source files from the same source
directory that generate target files in a same output directory and
were configured in SCons using the same construction environment will
be built in a single call to the compiler. Only source files that have
changed since their object files were built will be passed to each
compiler invocation (via the $CHANGED_SOURCES construction variable).
Any compilations where the object (target) file base name (minus the
.obj) does not match the source file base name will be compiled
separately.
As always patches are welcome to add this in a more general fashion.
In general this should be left up to the program developer. Trying to compile everything together as an amalgamation may introduce unintended behaviour into the program, if it even compiles in the first place. Your best bet, if you want this kind of optimisation without editing the source yourself, is to use a compiler with interprocedural optimisation, like icc -ipo.
An example where an amalgamation of two .c files would not compile is when both files define a static symbol with the same name but with different functionality.
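A hedged C++ rendering of that situation (file names are made up):

// a.cpp -- fine on its own
static int helper() { return 1; }   // internal linkage, private to a.cpp
int get_a() { return helper(); }

// b.cpp -- also fine on its own
static int helper() { return 2; }   // a different helper, private to b.cpp
int get_b() { return helper(); }

// Compiled separately (g++ -c a.cpp b.cpp) both translation units build, because each
// helper has internal linkage. Concatenated into one amalgamated file, the second
// definition of helper becomes a redefinition error and the combined file no longer compiles.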

C++ Compile on different platforms

I am currently developing a C++ command line utility to be distributed as an open-source utility on Github. However, I want people who download the program to be able to easily compile and run the program on any platform (specifically Mac, Linux, and Windows) in as few steps as possible. Assuming only small changes have to be made to the code to make it compatible with the various platform-independent C++ compilers (g++ and win32), how can I do this? Are makefiles relevant?
My advice is: do not use makefiles. Maintaining the files for big enough projects is tedious, and errors sometimes happen that you don't catch immediately (because the *.o file is still there).
See this question here
Makefiles are indeed highly relevant. You may find that you need (at least) two different makefiles to compensate for the fact that you have different compilers.
It's hard to be specific about how you solve this, since it depends on how complex the project is. It may be easiest to write a script/batch file and just document it ("Use the command build.sh on Linux/Unix, and build.bat on Windows"), and then let the respective files deal with, for example, setting up the name of the compiler and flags, etc.
Or you can have an include into the makefile, which is determined by the architecture. Or different makefiles.
If the project is REALLY simple, it may be enough to provide a basic makefile - but it's unlikely, as a compile of x.cpp on Linux/macOS makes an object file called x.o, while on Windows the object file is called x.obj. Libraries have different names, DLLs have different names, and on Linux/macOS the final executable has no extension (typically), so it's called "myprog", whereas the executable under Windows is called "myprog.exe".
These sorts of differences mean that the makefile needs to be different.