Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I'm currently trying to get into OS development, mostly following the articles and tutorials from OSDev. As of now, I have multiple assembly files that I need in order to, e.g., enable paging and set up long mode.
While the only assembler code I'm certain needs to be separated into its own file is the boot assembly file, I'm curious about the practices and "standards" for dealing with assembly in an OS written in C. Is it convenient to separate assembler from C, or is there a reason why, e.g., Linux wraps most of its assembly code inside C functions and calls it using asm volatile statements?
I don't see much difference, as you can return results from assembly by moving the value into the eax register, or when using asm and asm volatile, you can specify parameters and output operands where to store the result. However, you always need to separate multiple instructions by using \n or \n\t.
As of now, I have only found out about the different ways of dealing with assembly in a larger project, but not why some choose to separate assembly code from C or C++, and why some choose to use inline assembly throughout the whole program.
I hope you could give me some insights about the different ways used regarding this topic.
Interface, Protocol & Conventions
In a pure assembly language project, you need to set up protocols or conventions for passing values and returning values.
When mixing assembly language functions with C or C++, you will need to follow the parameter passing conventions of C or C++ as dictated by the compiler you are using.
Some compilers may use a convention of passing the first parameter in R0, while others may pass the last parameter in R0. Some compilers may place the variables on the stack and not use registers. Others may use registers for a few parameters and the remaining on the stack.
Inline vs Separate Assembly functions
One issue is portability. Assembly language is processor specific. For example, ARM assembly doesn't have an EAX register, and 32-bit x86 assembly doesn't have an R10 register. When using inline assembly, the assembly must change depending on the processor, which means modifying the high level language function to account for all target processors. When the code is implemented as a separate assembly function (file), only that file needs to be swapped out when porting to other processors.
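To make the portability cost concrete, here is a minimal sketch of a C function whose inline assembly has to be selected per processor. The names cpu_relax and spin are hypothetical, and the sketch assumes a GCC-compatible compiler; on any other target it falls back to doing nothing.

```c
/* Hint to the CPU that we are busy-waiting. Each architecture needs its
 * own instruction, so the C function grows one branch per target. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile("pause");
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile("yield");
#else
    /* Unknown target: no hint available, do nothing. */
#endif
}

/* Spin for a fixed number of iterations and report how many were run. */
static unsigned spin(unsigned iterations)
{
    unsigned done;
    for (done = 0; done < iterations; ++done)
        cpu_relax();
    return done;
}
```

With a separate assembly file per architecture, the preprocessor branches above would instead live in the build system, which picks one file per target.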
IMHO, pure assembly functions are easier to read than intermixed C and inline assembly.
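For comparison, this is roughly what the intermixed form looks like with GCC-style extended asm: operands are bound to C variables, and several instructions are separated with \n\t. It is a sketch under the assumption of a GCC-compatible compiler on x86; add_then_double is a hypothetical name, and other targets take the plain C path.

```c
#include <stdint.h>

/* Computes (a + b) * 2 with 32-bit wrap-around. */
static inline uint32_t add_then_double(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t result = a;
    __asm__ volatile(
        "addl %1, %0\n\t"   /* result += b      (AT&T order: src, dst) */
        "addl %0, %0"       /* result += result                        */
        : "+&r"(result)     /* read-write operand, written before b is dead */
        : "r"(b)            /* input operand in any general register   */
        : "cc");            /* the flags register is modified          */
    return result;
#else
    return (a + b) * 2u;    /* portable fallback for other targets */
#endif
}
```

The compiler chooses the registers, so the result does not have to be placed in eax by hand as it would in a standalone assembly function.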
Guidelines for High Level Language Usage in OS
The quantity of assembly language should be minimized. Assembly language takes longer to develop (typing and debugging; more lines mean more opportunities to inject defects), whereas a high level language is more productive and carries lower risk.
Prefer to write the entire OS in a high level language. Get this version working robustly. Then replace C functions with assembly functions only where more efficiency is needed or where specific assembly language instructions are required.
This question already has answers here:
What is the difference between 'asm', '__asm' and '__asm__'?
(4 answers)
Closed 3 years ago.
Several years ago I wrote some significant Cpp code that accessed the hardware registers by a coding command that switches to assembler language. I lost the compiler and computer. Please tell me a Cpp compiler that allows inline assembler in the middle of the Cpp code. Intel cpu, Windows. Thank you.
It seems I lacked clarity in the question. My apologies. The answers given were a refresher of the code. Well done. The answers given today suggest the C++ compilers might not have been updated for 64 bit assemblers. Here is a clearer question which has been only partially answered. It needs an updated response.
I am thinking of buying an Intel i7 desk computer. I will write C++ code for i/o and setup. The inner loops will be written in assembler language to take advantage of the hardware register multiply and divide: two multiplicands in separate registers give a double register product. My experience years ago was that not all C++ compilers are alike. Which of the many brands of C++ software out there give a good link to assembler, __asm, and make full advantage of 64 bit machines?
I feel this question has not been asked. Thanks for the great answers so far.
I once used Microsoft Visual Studio to write inline assembly, like this:
// --- Get current frame pointer
ADDR oriFramePtr = 0;
_asm mov DWORD PTR [oriFramePtr], ebp
Unfortunately, this only worked for 32-bit, because at that time the 64-bit compiler of Microsoft didn't support inline assembly (didn't check recently).
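The two-register product the question asks about can still be reached from a 64-bit build with GCC or Clang, whose extended asm works on x86-64. The following is only a sketch: u128_parts and mul_64x64 are hypothetical names, and the fallback branch exists so the file also builds on other targets.

```c
#include <stdint.h>

/* Hypothetical helper type: the two halves of a 128-bit product. */
typedef struct { uint64_t lo, hi; } u128_parts;

static u128_parts mul_64x64(uint64_t a, uint64_t b)
{
    u128_parts p;
#if defined(__GNUC__) && defined(__x86_64__)
    /* mulq multiplies RAX by the operand and leaves the product in RDX:RAX. */
    __asm__("mulq %3"
            : "=a"(p.lo), "=d"(p.hi)
            : "0"(a), "rm"(b)
            : "cc");
#else
    /* Portable fallback: schoolbook multiplication on 32-bit halves. */
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t ll = a_lo * b_lo, lh = a_lo * b_hi;
    uint64_t hl = a_hi * b_lo, hh = a_hi * b_hi;
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;
    p.lo = (mid << 32) | (uint32_t)ll;
    p.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
#endif
    return p;
}
```

On 64-bit MSVC, the _umul128 intrinsic from <intrin.h> covers the same case without inline assembly.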
By default, C++ provides the asm keyword for writing assembly (bolded by me):
7.4 The asm declaration [dcl.asm]
1 An asm declaration has the form
asm-definition:
asm ( string-literal ) ;
The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. — end note ]
GCC appears to support asm based on the above article on asm, but I couldn't find anything besides its support in C
MSVC does support assembly, but not via the asm keyword; one must use __asm:
The __asm keyword invokes the inline assembler and can appear wherever a C or C++ statement is legal.
Visual C++ support for the Standard C++ asm keyword is limited to the fact that the compiler will not generate an error on the keyword. However, an asm block will not generate any meaningful code. Use __asm instead of asm.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
What language is a C or C++ header file written in? Another big doubt that I have is, since a computer understands only binary, how do people actually write a programming language and a compiler that the computer can actually understand?
The C and C++ header files are written as C and C++ source code respectively. The code in these [1] is compiled and linked into your executable file.
A compiler is written in some arbitrary language - these days, you typically take an existing C or C++ compiler for some platform that you have available, but ages ago, the process was basically to write a very basic compiler in assembler [or some other available and suitable language], and then use that to bootstrap into a higher language compiler.
Of course, if you have just invented "Chip X", and haven't got a portable assembler, you'd also have to write an assembler. Hopefully you have some OTHER computer with a programming language - but if we pretend that no computers are available, then we'd have to come up with binary code "by hand", and then enter that into the ROM of the computer. That code would perhaps be able to perform some really simple task such as printing "Hello" to some output device. Once that works, we'd expand it to have a loader, so we can load a binary file (or add new commands some other way). A very simple editor to edit files, and a file-storage would be very useful to have. And then we could start writing some code that can read human readable instructions (assembler code) and produce binary from that. Once we have an assembler, we can write a program in assembler that takes (very simple) C input and outputs assembler. Assemble that code, and we have a (very simple) C compiler. Now we can use the simple C compiler to write a better C compiler in "simple C". Keep at this for a while, and you end up with a decent C compiler... But it's probably a few years worth of work unless you have done this sort of thing many times...
Any language that can read text files and compare strings and output binary files in "free format" is pretty much usable to write a compiler. It's of course not trivial.
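As a sketch of the "read human readable instructions and produce binary" step, here is a toy assembler for a made-up instruction set. The mnemonics, the opcode values and the name assemble_line are all assumptions chosen for illustration; a real assembler also needs labels, directives and an output file format.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcodes for an invented 8-bit machine. */
enum { OP_NOP = 0x00, OP_PUSH = 0x01, OP_ADD = 0x02, OP_HALT = 0xFF };

/* Translate one line of text into machine code.
 * Returns the number of bytes written to out (1 or 2), or -1 on error. */
static int assemble_line(const char *line, uint8_t out[2])
{
    char mnemonic[8];
    unsigned operand = 0;
    int fields = sscanf(line, "%7s %u", mnemonic, &operand);

    if (fields < 1)
        return -1;                          /* empty line */
    if (fields == 1 && strcmp(mnemonic, "NOP") == 0)  { out[0] = OP_NOP;  return 1; }
    if (fields == 1 && strcmp(mnemonic, "ADD") == 0)  { out[0] = OP_ADD;  return 1; }
    if (fields == 1 && strcmp(mnemonic, "HALT") == 0) { out[0] = OP_HALT; return 1; }
    if (fields == 2 && strcmp(mnemonic, "PUSH") == 0 && operand <= 0xFF) {
        out[0] = OP_PUSH;                   /* opcode byte */
        out[1] = (uint8_t)operand;          /* 8-bit immediate */
        return 2;
    }
    return -1;                              /* unknown mnemonic or bad operand */
}
```

Everything here is string comparison plus writing bytes, which is why almost any language with those two abilities can be used for the first bootstrap step.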
I have written a compiler for Pascal which uses the LLVM compiler framework to produce the actual code. That means I've done the simple part of the compiler: the hard part in a good quality compiler is the code-generation pass, and I only generate LLVM Intermediate Representation. That is the whole idea of LLVM: the IR is a simplified machine code "language", and LLVM provides the IR -> machine code step for your language. My compiler is currently about 13400 lines of C++ code; the code generation and optimisations in LLVM are millions of lines, much of which I don't even know how it works [beyond the simple overview of what the function does according to the description].
[1] There are typically also libraries, which contain larger functions that aren't suitable to store in the headers directly. These are built using the same compiler [or one compatible to it] that you use to build your source into a binary file.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Is there a way how I can tell a C++ compiler/linker to compile the source code into my own homemade opcode list? I need it for my virtual machine which will execute on a microcontroller.
I don't want to create a C++ compiler from scratch, but only change the opcodes, addresses of CPU status register, stack pointer and GPIO registers, program memory and data memory from an existing compiler that is open source so that people making programs for it don't have to rewrite the whole code, but just port it using the libraries that are compatible with my own compiler's libraries.
Example is an avr-gcc compiler.
The compiler and its libraries must not be proprietary in the way that I or any programmer have to pay for it and I don't want it to be either GPL in such way that a programmer must reveal source for their own projects. I want all my programmers to freely use my compiler, be free to license their work in whatever way they want as well as choose to make it open source or proprietary.
Let's consider the steps involved:
Retargeting an existing C++ compiler: Several production-quality, retargetable C++ compilers are freely available today. For instance, the LLVM platform (clang++) provides some pointers on writing a backend for a new hardware architecture (this naturally applies to VMs as well!). Unfortunately, up-to-date documentation on porting the GNU compilers is harder to come by. It's entirely possible that many of the older documents remain relevant today, but I know far too little about GCC to say.
Note that the effort required to retarget either compiler is likely to depend on how well the instruction set of your virtual machine matches the compiler's low-level intermediate representation. Since they often (at least semantically) take the form of three-address code ― that is, instructions with two source operands and one destination ― writing a code generator for, say, a stack machine (in which all operands are implicitly addressed) could prove to be a bit more difficult.
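To illustrate the mismatch, here is a sketch of lowering one three-address instruction to a stack machine. The tac_insn type, the LOAD/STORE/ADD/SUB/MUL mnemonics and the function name are hypothetical and are not taken from LLVM or GCC.

```c
#include <stdio.h>

/* One three-address instruction: dst = lhs op rhs. */
typedef struct {
    char op;            /* '+', '-' or '*' */
    const char *dst;
    const char *lhs;
    const char *rhs;
} tac_insn;

/* Emit stack-machine text for one instruction into out.
 * Returns the number of characters written, or -1 on error. */
static int lower_to_stack(const tac_insn *in, char *out, size_t out_size)
{
    const char *mnemonic;
    switch (in->op) {
    case '+': mnemonic = "ADD"; break;
    case '-': mnemonic = "SUB"; break;
    case '*': mnemonic = "MUL"; break;
    default:  return -1;                    /* unsupported operator */
    }
    /* Both sources are pushed explicitly; the operation itself names no
     * operands, and the result must be popped into the destination. */
    int n = snprintf(out, out_size, "LOAD %s\nLOAD %s\n%s\nSTORE %s\n",
                     in->lhs, in->rhs, mnemonic, in->dst);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

One register-style instruction becomes four stack operations here, so a backend written this naively would also want a pass that removes redundant LOAD/STORE pairs.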
From this point on, you really have two options. You could stick to the conventional way in which C++ programs are compiled, i.e., from source, to assembly, to object files, to linked executable or library. That involves going through the steps I have outlined below. But since you are targeting a virtual machine, it may have requirements that are radically different from those of modern hardware architectures. In that case, you may want to steer clear of existing software like binutils and roll your own assembler and linker.
Writing or porting an assembler: Unless your chosen compiler is able to directly generate machine code, you will most likely also need to write an assembler for your virtual machine, or port an existing one. If your virtual machine's instruction set looks anything like that of a modern machine, and if you want to use the standard C++ compilation/linking pipeline, you could look into porting binutils, specifically gas, the GNU assembler.
Writing or porting a linker: The object files produced by your assembler are not in themselves executable programs. Addresses must be assigned to symbols and segments, and references between object files must be resolved. This means that the linker needs some understanding of your instruction set. In particular, it must be able to find and patch locations in code and data that address memory. The binutils porting guide I linked above is relevant here, too; you may also enjoy reading Linkers and Loaders.
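The "find and patch" part of the linker step can be sketched as follows. The record layouts, the 32-bit little-endian absolute address format and the name resolve_relocs are assumptions for illustration; real object formats such as ELF define many relocation types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A symbol with its final assigned address. */
typedef struct { const char *name; uint32_t address; } symbol_entry;

/* A location in the image that must receive a symbol's address. */
typedef struct { size_t offset; const char *name; } reloc_entry;

/* Patch every relocation with the 32-bit little-endian address of its symbol.
 * Returns the number of patched locations, or -1 if a symbol is unresolved
 * or a relocation does not fit inside the image. */
static int resolve_relocs(uint8_t *image, size_t image_size,
                          const reloc_entry *relocs, size_t reloc_count,
                          const symbol_entry *symbols, size_t symbol_count)
{
    for (size_t r = 0; r < reloc_count; ++r) {
        if (relocs[r].offset > image_size || image_size - relocs[r].offset < 4)
            return -1;                      /* patch would run past the image */

        const symbol_entry *found = NULL;
        for (size_t s = 0; s < symbol_count; ++s) {
            if (strcmp(symbols[s].name, relocs[r].name) == 0) {
                found = &symbols[s];
                break;
            }
        }
        if (found == NULL)
            return -1;                      /* undefined reference */

        for (int i = 0; i < 4; ++i)         /* write the address, low byte first */
            image[relocs[r].offset + i] = (uint8_t)(found->address >> (8 * i));
    }
    return (int)reloc_count;
}
```

The linker only needs to know where the address fields sit and how they are encoded, which is the instruction-set knowledge mentioned above.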
As @Mat noted in the comment section above, the GPL doesn't usually "infect" the output of a program licensed under it. See this section. Notably:
The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work.
I am not a lawyer, but I take this to mean that an exception would be made for, say, compiling the compiler with itself ― the output would still be subject to the terms of the GPL.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have been making a programming language compiler in C++, which converts my code to Assembly. However, I do not know how to convert this assembly code into an executable format (.exe preferably). How would I be able to do that?
Use Microsoft Macro Assembler 8.0 (MASM) Package (x86)
The Microsoft Macro Assembler 8.0 (MASM) is a tool that consumes x86 assembly language programs and generates corresponding binaries.
You assemble assembly code rather than compile it, and an assembler is the tool you need for that.
The exact assembler that you will need will depend on the target instruction set. Assembly language is not in fact a single language; the term describes any language where mnemonics are used to represent individual machine code instructions.
Even for a single architecture, assembly language syntax may vary - for example on x86 there are at least two syntaxes - Intel and AT&T, so even for x86 you will need an assembler that copes with whatever syntax your tool outputs. Your tool will of course need to output something that can be used by the assembler, which may mean the generation of additional assembler and target specific directives rather than just the raw assembler mnemonics.
It may have been simpler for your language compiler to output C or C++ code (strictly a translator rather than a compiler). That is how early C++ "compilers" worked (and Comeau C++ still does), and your language will then at least be more easily portable between different architectures. C compilers are almost as ubiquitous as assemblers for any target, but translation to C allows you to have a single back-end for all target architectures.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
My question is specifically for Windows C++ compilers and Visual Studio, but I got offered to interview for a job in finance where they wanted somebody very technical to write real-time multi-threaded code who could analyse at assembly level the code generated by a C++ compiler.
What are the methods one can apply to learn the link between C++ code and the generated assembly and achieve this level of proficiency?
The simple answer to this is to compile code and look at it in the debugger.
Debuggers will show you the connection between the two in a very harsh way. The next step is to understand compiler theory and then look at the source code of compilers to understand what they try to do.
I think the person interviewing you may have been trying to see if you can understand what kind of effort is involved - rather than actually knowing how to do it.
The first thing to do would be to learn the assembler and machine code.
There is some very good documentation of the machine code available at
the Intel site (although it may be more detailed than you need). There
are two common assembler formats in widespread use: the one used by
Microsoft is based on the original Intel assembler, whereas g++ uses
something completely different (based on the original Unix assembler for
PDP-11), so you'll have to choose one (although the assembler syntax
itself is rarely a real problem—knowing what the individual
instructions do is more important).
Once you have some idea of how to read assembler: most compilers
have options to output assembler: for VC++, use /Fa (and /c as well,
if you don't want to actually link the results); for g++, -S (which
causes the compiler to stop once it has generated the assembler). In the
case of VC++, the assembler will be in a file xxx.asm (where xxx.cpp
was the name of the file being compiled), for g++, xxx.s. Try
compiling some code, with different levels of optimization, and then
look at the assembler in an editor.
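A small function like the following is a convenient first subject for that exercise. The file name sum.c and the function sum_to_n are hypothetical; with GCC it could be built as gcc -S -O0 sum.c and again as gcc -S -O2 sum.c, comparing the two sum.s files.

```c
/* Sum of the integers 1..n, written as a plain loop so that the effect of
 * the optimization level on the generated assembler is easy to spot.
 * Not declared static, so the compiler must emit it even if nothing calls it. */
unsigned sum_to_n(unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 1; i <= n; ++i)
        total += i;
    return total;
}
```

Things worth looking for are where the loop counter lives (stack slot or register) and whether the loop survives optimization at all.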
Finally, if the question is asked, it's probably because the interviewer
is concerned about performance issues; what he's really interested in is
whether you know the relative cost of various operations (or the risks
involved when multithreading; e.g. what operations are atomic, etc.) In
which case, it probably wouldn't hurt to point out that issues like
locality (which determines the percent of cache hits) are often more
important than the individual operations.
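A common way to demonstrate the locality point is to traverse the same matrix in two orders. This is a sketch with hypothetical function names; both functions execute the same additions and return the same sum, and only the memory access pattern differs. Any speed difference depends on the matrix size and the machine, so no timings are claimed here.

```c
#include <stddef.h>

/* Walk the matrix in the order it is laid out in memory (row by row):
 * consecutive accesses touch neighbouring addresses. */
static long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

/* Walk the same matrix column by column: consecutive accesses are
 * cols * sizeof(int) bytes apart, which tends to miss the cache more often. */
static long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Timing both on a matrix much larger than the cache is a simple way to see that the access pattern can matter more than the individual instructions.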