I have a C++ program that I want to compile to assembly, and then assembler will compile it to machine code.
Now, as far as I know, in order to transform assembly code to machine code the assembler needs some kind of table to map assembly instructions to the actual machine instructions.
Which table will the assembler use? Is there a chance that my C++ program won't run on all CPUs, because CPUs use different tables which means that the same machine code will do different things on different CPUs?
The assembler assembles for whatever architecture it has been told or programmed to assemble for. As the assembly language for each instruction set architecture (ISA) differs, you can only assemble an assembly program written for one architecture for that same architecture. It is generally not possible to accidentally or intentionally assemble the program for the wrong architecture.
When you use a compiler, the compiler invokes the correct assembler with the correct flags to assemble the assembly code it generated for the architecture you told it to compile for. The resulting program will only run on processors of the ISA you have compiled it for. If you want the program to run on processors of a different ISA, you have to compile it for that other ISA.
If your program is poorly written, it is possible that it won't compile or work when compiled for other architectures than the one(s) you developed it for. Such a program is called an unportable program. However, unless you do weird things or make assumptions about properties of the architecture you are programming for, this is unlikely to happen.
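To make "assumptions about properties of the architecture" concrete, here is a minimal sketch of one such assumption, byte order. It is my own illustration, not something from the question, and both function names are made up. The first function gives different results on little-endian and big-endian CPUs; the second gives the same result everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unportable: the result depends on the byte order of the CPU
   the program was compiled for. */
uint32_t load_u32_host_order(const unsigned char *bytes)
{
    uint32_t value;
    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Portable: always interprets the buffer as little-endian,
   no matter which architecture the compiler targets. */
uint32_t load_u32_little_endian(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Code that stores the first variant's result in a file and reads it back on another ISA is the kind of program that compiles everywhere but only works on the architecture it was developed on.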
In general, what is called assembly is roughly a human-readable (text) form of machine code (binary).
As franji1 said in a comment, in general compilers emit an intermediate abstract machine code from the source. And this kind of code can easily (it is intended to) be translated to assembly/machine code.
I have a C++ program that I want to compile to assembly, and then
assembler will compile it to machine code.
This is what a compiler is designed to do. The word "compiler" is somewhat ambiguous: it can mean the "compiler phase" or the whole "compiler toolchain". The compiler phase is the one that translates your source code to the intermediate abstract form, which then needs to be translated to the target assembly/machine code by the assembler. "Compilation" commonly denotes the whole process from source code to executable machine code.
Now, as far as I know, in order to transform assembly code to machine
code the assembler needs some kind of table to map assembly
instructions to the actual machine instructions.
Roughly, yes. This is what a document like an instruction set reference manual is for: describing how the textual form must be translated to byte form.
Which table will the assembler use?
See document...
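The "table" idea can be sketched in a few lines of C. This is only a toy: the struct and function names are hypothetical, and a real assembler's table is far larger and also has to encode operands. The four encodings shown are genuine 32-bit x86 ones, which is exactly why the table is specific to one ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A tiny excerpt of the kind of table an x86 assembler carries. */
struct opcode_entry {
    const char *text;      /* assembly source form */
    unsigned char opcode;  /* single-byte machine encoding */
};

static const struct opcode_entry x86_table[] = {
    { "nop",      0x90 },
    { "ret",      0xC3 },
    { "push eax", 0x50 },
    { "push edx", 0x52 },
};

/* Returns 1 and stores the encoding if the instruction is known, else 0. */
int assemble_one(const char *text, unsigned char *out)
{
    assert(text != NULL && out != NULL);
    for (size_t i = 0; i < sizeof x86_table / sizeof x86_table[0]; i++) {
        if (strcmp(x86_table[i].text, text) == 0) {
            *out = x86_table[i].opcode;
            return 1;
        }
    }
    return 0;   /* not an instruction this table knows about */
}
```

An instruction written for another architecture simply has no entry, so the lookup fails instead of producing wrong bytes.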
Is there a chance that my C++ program won't run on all CPUs, because
CPUs use different tables which means that the same machine code will
do different things on different CPUs?
You have to generate a byte form of your program for each platform (machine/OS). A compiler is designed to generate machine code for a given platform that does exactly what your source code specifies. This is why compilers exist: to free you from writing programs in assembly (which is very hard to do).
REQUIREMENT: For a certain project we have a unique requirement. The application supports an expression language that allows users to define their own complex expressions that can be evaluated at run time (many hundreds of times a second), and they need to be executed at machine level for performance.
WORKING: Our expression parser translates the script into the corresponding assembly language routine perfectly. We checked it by statically linking the generated object files with our C test program, and they produce the correct result.
Since the client can change the script at any time, our program (at run time) detects the change and calls the parser, which generates the corresponding assembly routine. We then call the assembler from the back end to create the object code.
PROBLEM
How can we call this assembly routine dynamically from the C++ program
(Loader)?
We are not supposed to call the C++ compiler to link it with the loader because the loader already would have other subroutines running and we cannot take the loader off, recompile and then execute the new loader program.
I tried searching for a solution online but every time the results are littered with .NET assembly dynamic calling. Our app has nothing to do with .NET.
First, the "generated plugin" approach (on Linux; my answer focuses on Linux but could be adapted to Windows with some effort; you could use many-platform frameworks like Qt or POCO or Glib from GTK; then all wrap plugin loading abilities à la dlopen with a common API that you could use on Windows, on Linux, on MacOSX, on Android) :
generate C (or assembly) code in some file /tmp/generated01.c (you might even generate C++ code using standard C++ containers, but its compilation would be significantly slower; beware of name mangling so emit and use extern "C" functions; read the C++ dlopen mini HowTo). See this answer explaining why generating C is worthwhile (and could be better, and more portable, than generating assembler code).
run (using fork+execve+waitpid, or simply system) a compilation of that generated file into a shared object /tmp/generated01.so by running the gcc -fPIC -Wall -O /tmp/generated01.c -shared -o /tmp/generated01.so command; you practically need to get position-independent code, hence the -fPIC flag. If using dlopen on your generated assembler code you'll need to improve your assembler generator to emit PIC code.
dlopen that new /tmp/generated01.so (so use the dynamic linker), see dlopen(3); you could even remove the now useless generated C file /tmp/generated01.c
dlsym the relevant symbols to get function pointers to the generated code, see dlsym(3); your application would simply call the generated code using these function pointers.
when you are sure that you don't need any functions from it and that no call frame uses it, you could dlclose that shared object library (but you might accept to leak some address space by not calling dlclose at all)
The above approach is worthwhile and can be used a great many times (my manydl.c demonstrates that you could dlopen a million different shared objects), and is practically even compatible (even when emitting C code!) with an interactive Read-Eval-Print-Loop on most current desktops, laptops and servers, since most of the time the generated /tmp/generated01.c would be quite small (e.g. a few hundred lines at most) and so very quickly generated and compiled (by gcc, etc...). I am even using this in MELT for its REPL mode. On Linux this plugin approach generally requires linking the main application with -rdynamic (so that dlopen-ed plugins can reference and call functions from the main application).
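The steps above can be put together roughly as follows. Treat it as a sketch under assumptions: the symbol name expr, the double(double) signature and the function name compile_expression are all made up for illustration, gcc is assumed to be on the PATH, and the error handling is minimal.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef double (*expr_fn)(double);

/* Writes C source for the expression, compiles it into a shared object
   with gcc, loads it with dlopen and returns a pointer to the function.
   Returns NULL on any failure. */
expr_fn compile_expression(const char *c_expression)
{
    assert(c_expression != NULL);

    FILE *src = fopen("/tmp/generated01.c", "w");
    if (src == NULL)
        return NULL;
    fprintf(src, "double expr(double x) { return %s; }\n", c_expression);
    if (fclose(src) != 0)
        return NULL;

    /* system() is the simple route; fork+execve+waitpid gives more control. */
    if (system("gcc -fPIC -Wall -O /tmp/generated01.c -shared "
               "-o /tmp/generated01.so") != 0)
        return NULL;

    void *handle = dlopen("/tmp/generated01.so", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    /* The handle is deliberately never dlclose'd, so the code stays mapped. */
    expr_fn fn = NULL;
    *(void **)(&fn) = dlsym(handle, "expr");
    return fn;
}
```

A caller would do something like expr_fn f = compile_expression("x * x + 1.0"); and then call f(3.0). A real program would typically use a fresh file name for every regenerated version (generated02, generated03, ...), since dlopen on a path that is already loaded tends to hand back the existing object. Older glibc versions also need -ldl when linking the host program.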
Then, other approaches could be to use some Just-In-Time compilation library, like
GNU lightning (which emits slow machine code very quickly - so very short JIT emission time, but the generated code is running slowly since it is very unoptimized)
asmjit; it is x86-64 specific, and enables you to generate individual x86-64 machine instructions
GNU libjit is available for several platforms, and offers an "interpreter" mode for other platforms
LLVM (part of Clang/LLVM compiler, usable as a JIT library)
GCCJIT (a new JIT library front-end to GCC)
Grossly speaking, the first elements of that list are able to emit JIT machine code fairly quickly, but that code won't run as fast as compiling with gcc -fPIC -O1 or -O2 the equivalent generated C code (but would run typically 2x to 5x slower!); the last two elements (LLVM & GCCJIT) are compiler based: so they are able to optimize and emit efficient code, at the expense of slower JIT code emission. All the JIT libraries are able (like dlsym does for plugins) to give function pointers to newly JIT-constructed functions.
Notice that there is a trade-off to be made: some techniques are able to generate quickly some machine code, if you accept that generated code to later run a bit slowly; other techniques (notably GCCJIT or LLVM) are spending time to optimize the generated machine code, so takes more time to emit the machine code, but that code would later run quickly. You should not expect both (small generation time, quick execution time), since there is no such thing as a free lunch.
I believe that generating manually some assembler code is practically not worthwhile. You won't be able to generate very optimized code (because optimization is a very difficult art, and both GCC and Clang have millions of source line code for optimization passes), unless you spend many years of work for that. Using some JIT library is easier, and "compiling" to C or C++ is also quite easy (you leave the burden of optimization to the C compiler you are calling).
You could also consider rewriting your application into some language with homoiconicity and metaprogramming abilities (e.g. multi-stage programming), such as Common Lisp (and many others, e.g. those providing eval). Its SBCL implementation is always emitting machine code...
You could also embed an interpreter like Lua -perhaps even LuaJit- or Guile in your application. The main advantage of embedding an existing language is that there are resources (books, modules, ...) and community of people knowing them (designing a good language is difficult!). Also, the embedded interpreter library is well designed and probably well debugged (since used a lot), and some of them are fast enough (since using bytecode techniques).
As the comments already suggest, LoadLibrary (Windows) and dlopen (Linux/POSIX) are by far the easiest solution. These are specifically intended to dynamically load code. Equally important, they both allow unloading as well, and there are functions to then get a function entry point by name.
You can do it dynamically. I will take Linux as an example. Since your parser is working fine and generates machine code, you should be able to generate a .so (for Linux) or a .dll (for Windows).
Next, load the library:
void *handle = dlopen(so_file_name, RTLD_LAZY); /* NULL on failure, see dlerror() */
Next, get the function pointer:
void (*func)(void) = (void (*)(void))dlsym(handle, "function_name"); /* NULL if not found */
Then you should be able to execute it as func().
One thing you may need to experiment with (in case you do not get the desired result) is closing and reopening the .so or .dll file. Do this only if required, since it may reduce performance.
It sounds like you can generate the proper byte code. So you could just ensure that you generate position independent code, write it into an executable piece of memory, and then call or create thread upon the code. The simplest way would just be to cast the pointer to the base of the memory you wrote the code into as a function pointer, and then call it.
If you write your bytecode to avoid referencing different sections, and instead reference offsets from its loaded base, 'loading' the code is as easy as writing it to executable memory. You could do a call/pop/jmp to find the base of the code once it begins executing.
Conversely, and probably the easiest solution, would be to just write the code out as function expecting arguments, that way you could pass the code's base and any other arguments to it, as you would with any other function, as long as you use the proper typedef for your function pointer, and the generated assembly handles the arguments properly. As long as you avoid creating absolute jumps or data references to absolute addresses, you shouldn't have any issue.
This is late, but I think it may help someone else.
If you want to execute a piece of code dynamically, you can create an interpreter for it.
Compile your expressions into some bytecode, then write an interpreter that executes it.
Here is a tutorial about writing interpreters, although it uses Python:
https://ruslanspivak.com/lsbasi-part1/
You can write the same thing in C or C++.
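For a feel of what that looks like in C, here is a minimal stack-machine sketch. The opcode set and all names are invented for this example; a real expression language would need variables, more operators and proper error reporting instead of asserts.

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct instr {
    enum opcode op;
    double operand;   /* used only by OP_PUSH */
};

/* Executes a program on a small value stack and returns the top of stack. */
double run_bytecode(const struct instr *program)
{
    double stack[64];
    size_t sp = 0;    /* number of values currently on the stack */

    assert(program != NULL);
    for (size_t pc = 0; ; pc++) {
        switch (program[pc].op) {
        case OP_PUSH:
            assert(sp < 64);
            stack[sp++] = program[pc].operand;
            break;
        case OP_ADD:
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
            sp--;
            break;
        case OP_MUL:
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] * stack[sp - 1];
            sp--;
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
}
```

The expression 2 + 3 * 4 would be compiled to PUSH 2, PUSH 3, PUSH 4, MUL, ADD, HALT. Swapping in a new expression is then just a matter of replacing an array, with no assembler or linker involved.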
As I understand it, C++ code is comprised of assembly code, and when I compile a program it is read as its assembly equivalent and then run by the compiler. I also understand that assembly syntax and features change from one model of processor to another. If this is so, how do compilers manage to compile programs without being littered with bugs? I mean, it can't be possible for a compiler to hold every assembly language variant created, can it?
I think you're confusing assembly code with machine code. It's not the same. Machine code is what the CPU executes - a byte stream of instructions and data. Assembly is a human readable representation of machine code.
It's indeed true that all C++ code is compiled into machine code, eventually. Yes, the instruction set changes between CPUs and CPU versions. Compilers have the notion of a "target architecture": when you compile, you have the option of specifying one. If you don't, the architecture of the current machine is usually assumed. Yes, compiler vendors have to expend effort to support every flavor of CPU that they intend to support. Fortunately, there are not that many. Besides, in the C compilation process, code generation is not even the most complex step, so the majority of the compiler's own code is not CPU-specific.
Some compilers work via assembly - rather than generating machine code, they generate assembly and feed that to an assembler for the final stage of compilation. With that kind of design, your compiler normally assumes a certain flavor of assembler to be present on the system - typically GNU assembler (as).
I think you've misunderstood the meaning of "assembly code".
C++ code does not "consist" of assembly code; it consists of, well, C++ code.
A compiler translates this C++ code, ultimately into executable machine code that can be run on a computer (usually under the direction of an operating system).
Assembly code is a human-readable symbolic representation of machine code. Typically a line of assembly code corresponds to a single CPU instruction of machine code. Assembly is a much lower level language than C++ (or even C).
Some C++ compilers generate assembly code as an intermediate step; the assembly code is then translated into executable machine code. Other C++ compilers skip that step and generate machine code directly (though they may have an option to produce a human-readable assembly listing).
Typically each compiler accepts input in a single high-level language (C, C++, etc.) and generates output for one CPU (x86, ARM, MIPS, etc). Compilers are commonly designed in phases, so that the portion that processes the high-level input language can be combined with the portion that generates machine-specific code. gcc is designed this way. There are front ends that process a number of different input languages, and code generators that generate code for different CPUs. Thus if you already have an Ada front end and a MIPS back end, it's not too difficult to join them together to create an Ada compiler that generates MIPS machine code.
As for how compilers manage to do this without being "littered with bugs", well, it's just a lot of work.
So I found out that C(++) programs actually don't compile to plain "binary" (I may have gotten some things wrong here, in that case I'm sorry :D) but to a range of things (symbol table, os-related stuff,...) but...
Does assembler "compile" to pure binary? That means no extra stuff besides resources like predefined strings, etc.
If C compiles to something else than plain binary, how can that small assembler bootloader just copy the instructions from the HDD to memory and execute them? I mean if the OS kernel, which is probably written in C, compiles to something different than plain binary - how does the bootloader handle it?
edit: I know that assembler doesn't "compile" because it only has your machine's instruction set - I didn't find a good word for what assembler "assembles" to. If you have one, leave it here as comment and I'll change it.
C typically compiles to assembler, just because that makes life easy for the poor compiler writer.
Assembly code always assembles (not "compiles") to relocatable object code. You can think of this as binary machine code and binary data, but with lots of decoration and metadata. The key parts are:
Code and data appear in named "sections".
Relocatable object files may include definitions of labels, which refer to locations within the sections.
Relocatable object files may include "holes" that are to be filled with the values of labels defined elsewhere. The official name for such a hole is a relocation entry.
For example, if you compile and assemble (but don't link) this program
#include <stdio.h>
int main(void) { printf("Hello, world\n"); return 0; }
you are likely to wind up with a relocatable object file with
A text section containing the machine code for main
A label definition for main which points to the beginning of the text section
A rodata (read-only data) section containing the bytes of the string literal "Hello, world\n"
A relocation entry that depends on printf and that points to a "hole" in a call instruction in the middle of a text section.
If you are on a Unix system a relocatable object file is generally called a .o file, as in hello.o, and you can explore the label definitions and uses with a simple tool called nm, and you can get more detailed information from a somewhat more complicated tool called objdump.
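Here is a slightly larger file to try nm and objdump on. It is a sketch of my own (all names are invented), and the section names in the comments are what ELF toolchains on Linux typically use; other platforms name things differently.

```c
#include <assert.h>
#include <stdio.h>

int call_count;                          /* zero-initialized: typically .bss   */
int greeting_limit = 3;                  /* initialized data: typically .data  */
const char greeting[] = "Hello, world";  /* read-only data: typically .rodata  */

/* The machine code for greet() goes into the text section, and "greet"
   becomes a label definition. The call to puts() leaves a relocation
   entry, because the address of puts is not known until link time. */
int greet(void)
{
    assert(greeting_limit > 0);   /* sanity check on the initialized global */
    if (call_count < greeting_limit) {
        puts(greeting);
        call_count++;
    }
    return call_count;
}
```

After compiling it to a .o file without linking, nm should list greet as a defined text symbol and puts as undefined, with the three globals showing up under their respective data sections.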
I teach a class that covers these topics, and I have students write an assembler and linker, which takes a couple of weeks, but when they've done that most of them have a pretty good handle on relocatable object code. It's not such an easy thing.
Let's take a C program.
When you run gcc, clang, or 'cl' on the c program, it will go through these stages:
Preprocessor (#include, #ifdef, trigraph analysis, encoding translations, comment management, macros...) including lexing into preprocessor tokens and eventually resulting in flat text for input to the compiler proper.
Lexical analysis (producing tokens and lexical errors).
Syntactical analysis (producing a parse tree and syntactical errors).
Semantic analysis (producing a symbol table, scoping information and scoping/typing errors) Also data-flow, transforming the program logic into an "intermediate representation" that the optimizer can work with. (Often an SSA). clang/LLVM uses LLVM-IR, gcc uses GIMPLE then RTL.
Optimization of the program logic, including constant propagation, inlining, hoisting invariants out of loops, auto-vectorization, and many many other things. (Most of the code for a widely-used modern compiler is optimization passes.) Transforming through intermediate representations is just part of how some compilers work, making it impossible / meaningless to "disable all optimizations"
Outputting into assembly source (or another intermediate format like .NET IL bytecode)
Assembling of the assembly into some binary object format.
Linking of the object code with whatever static libraries are needed, as well as relocating it if needed.
Output of final executable in elf, PE/coff, MachO64, or whatever other format
In practice, some of these steps may be done at the same time, but this is the logical order. Most compilers have options to stop after any given step (e.g. preprocess or asm), including dumping the internal representation between optimization passes for open-source compilers like GCC. (-fdump-tree-...)
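To watch the stages on something small, a file like the following is enough. The file contents and names are just an example of mine; the flags named in the comment are the standard gcc and clang ones for stopping early.

```c
#include <assert.h>

#define SQUARE(x) ((x) * (x))

/* What each stopping point shows (flags are for gcc and clang):
     -E  preprocess only: SQUARE(n) below is already expanded to ((n) * (n))
     -S  stop after compilation proper and emit assembly source (.s)
     -c  also run the assembler, producing a relocatable object file (.o)
     no flag: additionally link, producing an executable                   */
int square_plus_one(int n)
{
    assert(n > -46340 && n < 46340);   /* keep n * n within 32-bit int range */
    return SQUARE(n) + 1;
}
```

Comparing the -S output at -O0 and at -O2 is also a quick way to see how much of the work happens in the optimization stage.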
Note that there's a 'container' of elf or coff format around the actual executable binary, unless it's a DOS .com executable
You will find that a book on compilers (I recommend the Dragon book, the standard introductory book in the field) will have all the information you need and more.
As Marco commented, linking and loading is a large area and the Dragon book more or less stops at the output of the executable binary. To actually go from there to running on an operating system is a decently complex process, which Levine in Linkers and Loaders covers.
I've wiki'd this answer to let people tweak any errors/add information.
There are different phases in translating C++ into a binary executable. The language specification does not explicitly state the translation phases. However, I will describe the common translation phases.
Source C++ To Assembly or Intermediate Language
Some compilers actually translate the C++ code into an assembly language or an intermediate language. This is not a required phase, but helpful in debugging and optimizations.
Assembly To Object Code
The next common step is to translate assembly language into object code. The object code contains machine code with relative addresses and open references to external subroutines (methods or functions). In general, the translator puts as much information into an object file as it can; everything else is left unresolved.
Linking Object Code(s)
The linking phase combines one or more object codes, resolves references and eliminates duplicate subroutines. The final output is an executable file. This file contains information for the operating system and relative addresses.
Executing Binary Files
The Operating System loads the executable file, usually from a hard drive, and places it into memory. The OS may convert relative addresses into physical locations. The OS may also prepare resources (such as DLLs and GUI widgets) that are required by the executable (which may be stated in the Executable file).
Compiling Directly To Binary
Some compilers, such as the ones used in Embedded Systems, have the capability to compile from C++ directly to an executable binary code. This code will have physical addresses instead of relative address and not require an OS to load.
Advantages
One of the advantages of these phases is that C++ programs can be broken into pieces, compiled individually and linked at a later time. They can even be linked with pieces from other developers (a.k.a. libraries). This allows developers to compile only the pieces in development and link in pieces that are already validated. In general, the translation from C++ to object code is the time-consuming part of the process. Also, a person doesn't want to wait for all the phases to complete when there is an error in the source code.
Keep an open mind and always expect the Third Alternative (Option).
To answer your questions, please note that this is subjective as there are different processors, different platforms, different assemblers and C compilers, in this case, I will talk about the Intel x86 platform.
Assemblers do not usually assemble to pure / flat binary (raw machine code), instead usually to a file defined with segments such as data, text and bss to name but a few; this is called an object file. The Linker steps in and adjusts the segments to make it executable, that is, ready to run. Incidentally, the default output when you assemble using GNU as foo.s is a.out, that is a shorthand for Assembler Output. (But the same filename is the gcc default for linker output, with the assembler output being only a temporary.)
Boot loaders are built with a special directive. Back in the days of DOS, it was common to find a directive such as .Org 100h, which marks the assembler code as being of the old .COM variety, before .EXE took over in popularity. Also, you did not need an assembler to produce a .COM file: the old debug.exe that came with MS-DOS did the trick for small simple programs. The .COM files did not need a linker and were in a straight ready-to-run binary format. Here's a simple session using DEBUG.
a 0100
 mov AH,07
 int 21
 cmp AL,00
 jnz 010c
 mov AH,07
 int 21
 mov AH,4C
 int 21

r CX
10
n respond.com
w
q
This produces a ready-to-run .COM program called 'respond.com' that waits for a keystroke and does not echo it to the screen. Notice at the beginning the use of 'a 0100', which shows that the instruction pointer starts at 100h, the defining feature of a .COM file. This old script was mainly used in batch files to wait for a response without echoing it. The original script can be found here.
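To show how "flat" the result is, here are the same instructions hand-assembled into bytes, plus a function that writes them out with no header at all. The encodings are my own reading of the listing, assuming DEBUG picks the short forms shown; they add up to exactly the 10h bytes that the session puts in CX. The array and function names are made up.

```c
#include <assert.h>
#include <stdio.h>

/* The 16 (hex 10) bytes that the DEBUG session assembles at offset 0100h. */
static const unsigned char respond_com[] = {
    0xB4, 0x07,   /* 0100: mov ah,07 */
    0xCD, 0x21,   /* 0102: int 21    */
    0x3C, 0x00,   /* 0104: cmp al,00 */
    0x75, 0x04,   /* 0106: jnz 010c  (0108 + 4 = 010c) */
    0xB4, 0x07,   /* 0108: mov ah,07 */
    0xCD, 0x21,   /* 010a: int 21    */
    0xB4, 0x4C,   /* 010c: mov ah,4c */
    0xCD, 0x21,   /* 010e: int 21    */
};

/* Writes the bytes with no header or metadata; returns 0 on success. */
int write_com_file(const char *path)
{
    assert(path != NULL);
    FILE *out = fopen(path, "wb");
    if (out == NULL)
        return -1;
    size_t written = fwrite(respond_com, 1, sizeof respond_com, out);
    if (fclose(out) != 0 || written != sizeof respond_com)
        return -1;
    return 0;
}
```

Note that the jnz operand is stored as a distance from the next instruction, not as the address 010c itself, which is part of why such a small program needs no relocation.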
Again, in the case of boot loaders, they are converted to a binary format, there was a program that used to come with DOS, called EXE2BIN. That was the job of converting the raw object code into a format that can be copied on to a bootable disk for booting. Remember no linker is run against the assembled code, as the linker is for the runtime environment and sets up the code to make it runnable and executable.
The BIOS when booting, expects code to be at segment:offset, 0x7c00, if my memory serves me correct, the code (after being EXE2BIN'd), will start executing, then the bootloader relocates itself lower down in memory and continue loading by issuing int 0x13 to read from the disk, switch on the A20 gate, enable the DMA, switch onto protected mode as the BIOS is in 16bit mode, then the data read from the disk is loaded into memory, then the bootloader issues a far jump into the data code (likely to be written in C). That is in essence how the system boots.
Ok, the previous paragraph sounds abstracted and simple, I may have missed out something, but that is how it is in a nutshell.
To answer the assembly part of the question, assembly doesn't compile to binary as I understand it. Assembly === binary. It directly translates. Each assembly operation has a binary string that directly matches it. Each operation has a binary code, and each register variable has a binary address.
That is, unless Assembler != Assembly and I'm misunderstanding your question.
They compile to a file in a specific format (COFF for Windows, etc), composed of headers and segments, some of which have "plain binary" op codes. Assemblers and compilers (such as C) create the same sort of output. Some formats, such as the old *.COM files, had no headers, but still had certain assumptions (such as where in memory it would get loaded or how big it could be).
On Windows machines, the OS's bootstrapper is in a disk sector loaded by the BIOS, where both of these are "plain". Once the OS has loaded its loader, it can read files that have headers and segments.
Does that help?
There are two things that you may mix here. Generally there are two topics:
Executable File Formats (see a list here), for example COFF, XCOFF, ELF
Intermediate Languages, like CIL or GIMPLE or bytecode
The latter may compile to the former in the process of assembly. Some intermediate formats are not assembled, but executed by a virtual machine. In the case of C++ it may be compiled into CIL, which is assembled into a .NET assembly, hence there may be some confusion.
But in general C and C++ are usually compiled into binary, or in other words, into an executable file format.
You have a lot of answers to read through, but I think I can keep this succinct.
"Binary code" refers to the bits that feed through the microprocessor's circuits. The microprocessor loads each instruction from memory in sequence, doing whatever they say. Different processor families have different formats for instructions: x86, ARM, PowerPC, etc. You point the processor at the instruction you want by giving it the address of the instruction in memory, and then it chugs merrily along through the rest of the program.
When you want to load a program into the processor, you first have to make the binary code accessible in memory so it has an address in the first place. The C compiler outputs a file in the filesystem, which has to be loaded into a new virtual address space. Therefore, in addition to binary code, that file has to include the information that it has binary code, and what its address space should look like.
A bootloader has different requirements, so its file format might be different. But the idea is the same: binary code is always a payload in a larger file format, which includes at a minimum a sanity check to ensure that it's written in the correct instruction set.
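ELF is one concrete example of such a container (my choice of illustration; the text above does not name a format). Its header starts with a magic number and carries a field saying which instruction set the payload is for. The sketch below reads both; the function and constant names are mine, while the offsets and machine values come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>

/* Values defined by the ELF specification for the e_machine header field. */
enum { ELF_MACHINE_X86 = 3, ELF_MACHINE_ARM = 40, ELF_MACHINE_X86_64 = 62 };

/* Returns the e_machine value of an ELF image, or -1 if the buffer does
   not start with the ELF magic number. Assumes a little-endian ELF file. */
int elf_machine(const unsigned char *image, size_t size)
{
    assert(image != NULL);
    if (size < 20)
        return -1;
    if (image[0] != 0x7F || image[1] != 'E' ||
        image[2] != 'L'  || image[3] != 'F')
        return -1;
    /* e_machine is a 16-bit field at byte offset 18. */
    return image[18] | (image[19] << 8);
}
```

A loader performs this kind of check before it maps any code, which is how an ARM binary gets rejected on an x86 machine instead of being executed as garbage.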
C compilers and assemblers are typically configured to produce static library files. For embedded applications, you're more likely to find a compiler which produces something like a raw memory image with instructions beginning at address zero. Otherwise, you can write a linker which converts the output of the C compiler into whatever else you want.
As I understand it, a chipset (CPU, etc.) will have a set of registers for storing data, and understand a set of instructions for manipulating these registers. The instructions will be things like 'store this value to this register', 'move this value', or 'compare these two values'. These instructions are often expressed in short human-grokable alphabetic codes (assembly language, or assembler) which are mapped to the numbers that the chipset understands - those numbers are presented to the chip in binary (machine code.)
Those codes are the lowest level that the software gets down to. Going deeper than that gets into the architecture of the actual chip, which is something I haven't gotten involved in.
The executable files (PE format on windows) cannot be used to boot the computer because the PE loader is not in memory.
The way bootstrapping works is that the master boot record on the disk contains a blob of a few hundred bytes of code. The BIOS of the computer (in ROM on the motherboard) loads this blob into memory and sets the CPU instruction pointer to the beginning of this boot code.
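For the classic PC BIOS case, the only "format" that blob has is a two-byte signature at the end of the 512-byte sector. A small sketch of the check a BIOS performs before jumping to the code (the function name is mine):

```c
#include <assert.h>
#include <stddef.h>

enum { SECTOR_SIZE = 512 };

/* A classic PC BIOS treats a 512-byte first sector as bootable when its
   last two bytes are 55h AAh. Everything before that is raw code and data. */
int is_boot_sector(const unsigned char *sector, size_t size)
{
    assert(sector != NULL);
    return size == SECTOR_SIZE
        && sector[510] == 0x55
        && sector[511] == 0xAA;
}
```

There are no headers, sections or symbol tables here, which is why the boot code has to be produced as a flat binary rather than as a PE file.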
The boot code then loads a "second stage" loader, on Windows called NTLDR (no extension) from the root directory. This is raw machine code that, like the MBR loader, is loaded into memory cold and executed.
NTLDR has the full capability to load PE files including DLLs and drivers.
C(++) (unmanaged) really compiles to plain binary. The OS-related stuff consists of BIOS and OS function calls; they're different for each OS, but still binary.
1. Assembler compiles to pure binary, but, strange as it may seem, the result is often less optimized than what a C(++) compiler produces.
2. The OS kernel, as well as the bootloader, are also written in C, so no problems here.
Java, Managed C++, and other .NET stuff, compiles into some pseudocode (MSIL in .NET), which makes it cross-OS and cross-platform, but requires local interpreter or translator to run.
I have following basic questions :
When we should involve disassembly in debugging
How to interpret disassembly, For example below what does each segment stands for
00637CE3 8B 55 08 mov edx,dword ptr [arItem]
00637CE6 52 push edx
00637CE7 6A 00 push 0
00637CE9 8B 45 EC mov eax,dword ptr [result]
00637CEC 50 push eax
00637CED E8 3E E3 FF FF call getRequiredFields (00636030)
00637CF2 83 C4 0C add
Language : C++
Platform : Windows
It's quite useful to estimate how efficient is the code emitted by the compiler.
For example, if you use an std::vector::operator[] in a loop without disassembly it's quite hard to guess that each call to operator[] in fact requires two memory accesses but using an iterator for the same would require one memory access.
In your example:
mov edx,dword ptr [arItem] // the value stored at address "arItem" is loaded into the register
push edx // that register is pushed onto the stack
push 0 // zero is pushed onto the stack
mov eax,dword ptr [result] // the value stored at address "result" is loaded into the register
push eax // that register is pushed onto the stack
call getRequiredFields (00636030) // the getRequiredFields function is called
This is a typical sequence for calling a function: the parameters are pushed onto the stack and then control is transferred to that function's code (the call instruction).
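As for what each column in the listing stands for: the first is the address of the instruction, the second is its machine code bytes, and the third is the assembly text for those bytes. The call line is a nice one to check by hand, because the bytes after E8 are not the target address but a distance from the next instruction. A small sketch of that arithmetic (the function name is mine):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes the target of a 5-byte x86 "call rel32" instruction (opcode E8).
   The 32-bit little-endian displacement is relative to the address of the
   NEXT instruction, i.e. the call's own address plus 5. */
uint32_t call_target(uint32_t call_address, const unsigned char *insn)
{
    assert(insn != NULL && insn[0] == 0xE8);
    uint32_t rel = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);
    return call_address + 5 + rel;   /* wraps modulo 2^32, as the CPU does */
}
```

Feeding it the address 00637CED and the bytes E8 3E E3 FF FF from the question gives 00636030, the address the debugger prints next to getRequiredFields.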
Also, using disassembly is quite useful when participating in arguments about "how it works after compilation", as caf points out in his answer to this question.
When you should involve disassembly: When you exactly want to know what the CPU is doing when it's executing your program, or when you don't have the source code in whatever higher level language the program was written in (C++ in your case).
How to interpret assembly code: Learn assembly language. You can find an exhaustive reference on Intel x86 CPU instructions in Intel's processor manuals.
The piece of code that you posted prepares arguments for a function call (by getting and pushing some values on the stack and putting a value in the register eax), and then calls the function getRequiredFields.
1 - I would involve disassembly in debugging only as a last resort. Generally, an optimizing compiler generates code that is not trivial for a human to understand. Instructions are reordered, dead code is eliminated, some code is inlined, and so on. So reading disassembled code is rarely necessary, and it is not easy when it is necessary. One case where I do use it: I sometimes look at the disassembly to see whether constants are part of the opcode or are stored in const variables.
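As a rough illustration of the constants case, the sketch below sets up two constants that a compiler will typically treat differently. All names here are hypothetical. The constexpr value is usually folded into the instruction as an immediate operand, while the volatile qualifier on the second one forces an actual load from memory on every read. The exact instructions depend on your compiler and flags, so check the disassembly rather than taking this on trust.

```cpp
// Hypothetical constants, for illustration only.
constexpr int kImmediateScale = 12;    // usually encoded directly in the instruction
volatile const int kMemoryScale = 12;  // volatile forces a read from memory each time

// Typically compiles to a multiply (or shifts and adds) with no memory
// access for the constant.
int scaleWithImmediate(int x) {
    return x * kImmediateScale;
}

// Must load kMemoryScale from its storage before multiplying.
int scaleFromMemory(int x) {
    return x * kMemoryScale;
}
```

Both functions compute the same value; only the way the constant reaches the CPU differs.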
2 - That piece of code calls a function like getRequiredFields(result, 0, arItem). You have to learn the assembly language of the processor you are targeting. For x86, go to www.intel.com and get the IA-32 manuals.
I started out in 1982 with assembly debugging of PL/M programs on CP/M-80 and later Digital Research OSes. It was the same in the early days of MS-DOS until Microsoft introduced symdeb, a command-line debugger where source and assembly were displayed simultaneously. Symdeb was a leap forward, but not that great, since the earlier debuggers had forced me to learn to recognize what assembly code belonged to which source code line. Before CodeView the best debugger was pfix86 from Phoenix Technologies. NuMega's SoftICE was the best debugger (apart from pure hardware ICEs) I've ever come across, in that it not only debugged my application but effortlessly led me through the inner workings of Windows as well. But I digress.
Late in 1990 a consultant in a project I was working in approached me and said he had this (very early) C++ bug he'd been working on for days but couldn't understand what the problem was. He single-stepped through the source code (on a windowed non-graphic DOS debugger) for me while I got all impatient. Finally I interrupted him and looked through the debugger options and sure enough there was the mixed source/assembly mode with registers and everything. This made it easy to realize that the application was trying to free an internal pointer (for local variables) containing NULL. For this problem, the source code mode was of no help at all. Today's C++ compilers will probably no longer contain a bug such as this but there will be others.
Knowing assembly-level debugging allows you to understand the source-compiler-assembly relationship to the extent of being able to predict what code the compiler will generate. Many people here on stackoverflow say "profile-profile-profile" but this goes a step further in that you learn what source-code constructs (I write in C) to use when and which to avoid. I suspect this is even more important with C++ which can generate a lot of code without the developer suspecting anything. For example there is a standard class for handling lists of objects which appears to be without drawbacks - just a few lines of code and this fantastic functionality! - until you look at the scores of strange procedure calls it generates. I'm not saying it's wrong to use them, I'm just saying that the developer should be aware of the pros and cons of using them. Overloading operators may be great functionality (somewhat weird to a WYSIWYG programmer like me) but what is the price in execution speed? If you say "nothing" I say "prove it."
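One way to take up the "prove it" challenge is to write the same operation twice, once through an overloaded operator and once by hand, and compare the disassembly of the two. The sketch below uses a hypothetical Vec2 type and function names invented for this example. With optimization on, many compilers emit the same code for both, but that is an assumption to verify with your own compiler, not a guarantee.

```cpp
// Hypothetical two-component vector, for illustration only.
struct Vec2 {
    int x;
    int y;
};

// Overloaded operator: at the source level this is just a function call.
inline Vec2 operator+(Vec2 a, Vec2 b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// Version that goes through the overloaded operator.
Vec2 addWithOperator(Vec2 a, Vec2 b) {
    return a + b;
}

// Version written out by hand, with no operator involved.
Vec2 addByHand(Vec2 a, Vec2 b) {
    return Vec2{a.x + b.x, a.y + b.y};
}
```

If the disassembly of addWithOperator shows a call that addByHand does not have, the operator has a measurable price in that build; if the two listings match, it does not.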
It is never wrong to use mixed or pure assembly mode when debugging. Difficult bugs will usually be easier to find and the developer will learn to write more efficient code. Developers from the interpreted camp (C# and Java) will say that their code is just as efficient as the compiled languages but if you know assembly you will also know why they are wrong, why they are dead wrong. You can smile and think "yeah, tell me about it!"
After you've worked with different compilers you will come across one with the most astonishing code-generation ability. One PowerPC compiler condensed three nested loops into one loop simply through the superior code interpretation of its optimizer. Next to the guy who wrote that I'm ... well, let's just say in a different league.
Up until about ten years ago I wrote quite a bit of pure assembly, but with multi-stage pipelines, multiple execution units and now multiple cores to contend with, the C compiler beats me hands down. On the other hand, I know what the compiler can do a good job with and what it shouldn't have to work with: Garbage In still equals Garbage Out. This is true for any compiler that produces assembly output.