Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have been making a programming language compiler in C++, which converts my code to assembly. However, I do not know how to convert this assembly code into an executable format (.exe preferably). How would I be able to do that?
Use Microsoft Macro Assembler 8.0 (MASM) Package (x86)
The Microsoft Macro Assembler 8.0 (MASM) is a tool that consumes x86 assembly language programs and generates corresponding binaries.
You assemble assembly code rather than compile it, and an assembler is the tool you need for that.
The exact assembler that you need depends on the target instruction set. Assembly language is not in fact a single language; the term describes any language where mnemonics are used to represent individual machine code instructions.
Even for a single architecture, assembly language syntax may vary - for example on x86 there are at least two syntaxes - Intel and AT&T, so even for x86 you will need an assembler that copes with whatever syntax your tool outputs. Your tool will of course need to output something that can be used by the assembler, which may mean the generation of additional assembler and target specific directives rather than just the raw assembler mnemonics.
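To illustrate the point about directives, here is a minimal sketch of a code generator that emits more than raw mnemonics. It assumes NASM syntax and an x86 target where the C runtime calls `main`; the function name `emit_main_returning` is made up for this example and is not part of any tool mentioned above.

```cpp
#include <string>

// Hypothetical helper: returns NASM-syntax assembly text for a program
// whose main() simply returns `value`.
std::string emit_main_returning(int value) {
    std::string out;
    out += "global main\n";      // directive: make the symbol visible to the linker
    out += "section .text\n";    // directive: put what follows in the code section
    out += "main:\n";            // label marking the entry point
    out += "    mov eax, " + std::to_string(value) + "\n";  // return value goes in eax
    out += "    ret\n";
    return out;
}
```

Written to a file such as prog.asm, text like this could be assembled with something like `nasm -f win64 prog.asm -o prog.obj` and then linked (for example with a MinGW-w64 `gcc prog.obj -o prog.exe`); the exact commands depend on the toolchain you pick.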
It might have been simpler for your language compiler to output C or C++ code (strictly speaking, that makes it a translator rather than a compiler). That is how early C++ "compilers" worked (and Comeau C++ still does), and your language would then be more easily portable between different architectures. C compilers are almost as ubiquitous as assemblers for any target, and translating to C lets you keep a single back-end for all target architectures.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I'm currently trying to get into OS development, mostly following the articles and tutorials from OSDev. As of now, I have multiple assembly files that I need to, e.g., enable paging and set up long mode.
While the only assembly code I'm certain needs to live in its own file is the boot assembly file, I'm curious about the practices and "standards" for dealing with assembly in an OS written in C. Is it convenient to separate assembly from C, or is there a reason why e.g. Linux wraps most of its assembly code inside C functions using asm volatile statements?
I don't see much difference, as you can return results from assembly by moving the value into the eax register, or when using asm and asm volatile, you can specify parameters and output operands where to store the result. However, you always need to separate multiple instructions by using \n or \n\t.
As of now, I have only found out about the different ways of dealing with assembly in a larger project, but not why some choose to separate assembly code from C or C++, and why some choose to use inline assembly throughout the whole program.
I hope you could give me some insights about the different ways used regarding this topic.
Interface, Protocol & Conventions
In a pure assembly language project, you need to set up protocols or conventions for passing values and returning values.
When mixing assembly language functions with C or C++, you will need to follow the parameter passing conventions of C or C++ as dictated by the compiler you are using.
Some compilers may use a convention of passing the first parameter in R0, while others may pass the last parameter in R0. Some compilers may place the variables on the stack and not use registers. Others may use registers for a few parameters and the remaining on the stack.
Inline vs Separate Assembly functions
One issue is portability. Assembly language is processor specific. For example, ARM assembly doesn't have an EAX register, and 32-bit x86 assembly doesn't have an R10 register. When using inline assembly, the assembly must change depending on the processor, which means modifying the high level language function to account for all target processors. When implementing a separate assembly function (file), only that file needs to be swapped out when porting to other processors.
IMHO, pure assembly functions are easier to read than intermixed C and inline assembly.
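A small sketch of the portability cost of inline assembly, assuming GCC or Clang extended asm (a compiler extension, not standard C++) in its default AT&T syntax. The function name `add_ints` is made up; note how the high level function itself has to carry the per-processor `#if`.

```cpp
// Adds two ints, using one x86 instruction where available.
int add_ints(int a, int b) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    int result = a;
    // AT&T operand order: "addl src, dst".
    // "+r" = read/write register operand, "r" = input register operand,
    // "cc" = the instruction modifies the flags register.
    __asm__ volatile("addl %1, %0"
                     : "+r"(result)
                     : "r"(b)
                     : "cc");
    return result;
#else
    return a + b;  // portable fallback for other compilers and processors
#endif
}
```

With a separate assembly file, the same function would instead be declared once (for example as `extern "C" int add_ints(int, int);`) and the build would pick the matching .s or .asm file per target.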
Guidelines for High Level Language Usage in OS
The quantity of assembly language should be minimized. Assembly language takes longer to develop (typing and debugging; more lines mean more opportunities to inject defects), whereas a high level language is more productive and carries lower risk.
Prefer to write the entire OS in a high level language. Get this version working robustly. Replace C functions with assembly functions for more efficiency, or when specific assembly language instructions are required.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
What language is a C or C++ header file written in? Another big doubt that I have is, since a computer understands only binary, how do people actually write a programming language and a compiler that the computer can actually understand?
The C and C++ header files are written as C and C++ source code respectively. The code in these [1] is compiled and linked into your executable file.
A compiler is written in some arbitrary language - these days, you typically take an existing C or C++ compiler for some platform that you have available, but ages ago, the process was basically to write a very basic compiler in assembler [or some other available and suitable language], and then use that to bootstrap into a higher language compiler.
Of course, if you have just invented "Chip X", and haven't got a portable assembler, you'd also have to write an assembler. Hopefully you have some OTHER computer with a programming language - but if we pretend that no computers are available, then we'd have to come up with binary code "by hand", and then enter that into the ROM of the computer. That code would perhaps be able to perform some really simple task such as printing "Hello" to some output device. Once that works, we'd expand it to have a loader, so we can load a binary file (or add new commands some other way). A very simple editor to edit files, and a file-storage would be very useful to have. And then we could start writing some code that can read human readable instructions (assembler code) and produce binary from that. Once we have an assembler, we can write a program in assembler that takes (very simple) C input and outputs assembler. Assemble that code, and we have a (very simple) C compiler. Now we can use the simple C compiler to write a better C compiler in "simple C". Keep at this for a while, and you end up with a decent C compiler... But it's probably a few years worth of work unless you have done this sort of thing many times...
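The "read human readable instructions and produce binary" step can be sketched in a few lines. The machine here is entirely hypothetical: four invented mnemonics, each encoded as one opcode byte followed by one operand byte.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Toy assembler for an invented 8-bit machine.
// Input: one instruction per line, "MNEMONIC [operand]".
// Output: two bytes per instruction (opcode, operand).
std::vector<std::uint8_t> assemble(const std::string& source) {
    static const std::map<std::string, std::uint8_t> opcodes = {
        {"LOAD", 0x01}, {"ADD", 0x02}, {"STORE", 0x03}, {"HALT", 0xFF}};

    std::vector<std::uint8_t> out;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string mnemonic;
        if (!(fields >> mnemonic)) continue;  // skip blank lines

        auto it = opcodes.find(mnemonic);
        if (it == opcodes.end())
            throw std::runtime_error("unknown mnemonic: " + mnemonic);

        int operand = 0;
        fields >> operand;  // a missing operand is encoded as 0

        out.push_back(it->second);
        out.push_back(static_cast<std::uint8_t>(operand));  // keeps the low 8 bits
    }
    return out;
}
```

A real assembler also needs labels, several operand forms and error reporting, but the core remains a table lookup from text to bytes.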
Any language that can read text files and compare strings and output binary files in "free format" is pretty much usable to write a compiler. It's of course not trivial.
I have written a compiler for Pascal which uses the LLVM compiler framework to produce the actual code. That means I've done the simple part of the compiler: the hard part of a good quality compiler is the code-generation pass, and I only generate LLVM Intermediate Representation (IR). That is the whole idea of LLVM: IR is a simplified machine code "language", and LLVM provides the IR -> machine code step for your language. My compiler is currently about 13400 lines of C++ code; the code generation and optimisations in LLVM are millions of lines, much of which I don't understand the workings of [beyond the simple overview of what each function does according to its description].
[1] There are typically also libraries, which contain larger functions that aren't suitable to store in the headers directly. These are built using the same compiler [or one compatible to it] that you use to build your source into a binary file.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Is there a way how I can tell a C++ compiler/linker to compile the source code into my own homemade opcode list? I need it for my virtual machine which will execute on a microcontroller.
I don't want to create a C++ compiler from scratch, but only change the opcodes, addresses of CPU status register, stack pointer and GPIO registers, program memory and data memory from an existing compiler that is open source so that people making programs for it don't have to rewrite the whole code, but just port it using the libraries that are compatible with my own compiler's libraries.
Example is an avr-gcc compiler.
The compiler and its libraries must not be proprietary in the way that I or any programmer have to pay for it and I don't want it to be either GPL in such way that a programmer must reveal source for their own projects. I want all my programmers to freely use my compiler, be free to license their work in whatever way they want as well as choose to make it open source or proprietary.
Let's consider the steps involved:
Retargeting an existing C++ compiler: Several production-quality, retargetable C++ compilers are freely available today. For instance, the LLVM platform (clang++) provides some pointers on writing a backend for a new hardware architecture (this naturally applies to VM's as well!). Unfortunately, up-to-date documentation on porting the GNU compilers is harder to come by. It's entirely possible that many of the older documents remain relevant today, but I know far too little about GCC to say.
Note that the effort required to retarget either compiler is likely to depend on how well the instruction set of your virtual machine matches the compiler's low-level intermediate representation. Since they often (at least semantically) take the form of three-address code ― that is, instructions with two source operands and one destination ― writing a code generator for, say, a stack machine (in which all operands are implicitly addressed) could prove to be a bit more difficult.
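To make the mismatch concrete, here is a minimal sketch that lowers one three-address instruction into operations for a stack machine. The `ThreeAddr` type and the PUSH/POP/ADD mnemonics are invented for this example and do not correspond to any particular compiler's IR or VM.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// One three-address instruction: dst = lhs op rhs.
struct ThreeAddr {
    std::string dst, lhs, rhs;
    char op;
};

// Map an operator character to a (hypothetical) stack-machine mnemonic.
std::string stack_op(char op) {
    switch (op) {
        case '+': return "ADD";
        case '-': return "SUB";
        case '*': return "MUL";
        case '/': return "DIV";
        default: throw std::invalid_argument("unsupported operator");
    }
}

// On a stack machine the operands are implicit, so both sources must be
// pushed first and the result popped into the destination afterwards.
std::vector<std::string> lower_to_stack(const ThreeAddr& ins) {
    return {"PUSH " + ins.lhs, "PUSH " + ins.rhs, stack_op(ins.op), "POP " + ins.dst};
}
```

One three-address instruction becomes four stack operations here, and a realistic code generator would also try to keep intermediate values on the stack rather than popping and re-pushing them.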
From this point on, you really have two options. You could stick to the conventional way in which C++ programs are compiled, i.e., from source, to assembly, to object files, to linked executable or library. That involves going through the steps I have outlined below. But since you are targeting a virtual machine, it may have requirements that are radically different from those of modern hardware architectures. In that case, you may want to steer clear of existing software like binutils and roll your own assembler and linker.
Writing or porting an assembler: Unless your chosen compiler is able to directly generate machine code, you will most likely also need to write an assembler for your virtual machine, or port an existing one. If your virtual machine's instruction set looks anything like that of a modern machine, and if you want to use the standard C++ compilation/linking pipeline, you could look into porting binutils, specifically gas, the GNU assembler.
Writing or porting a linker: The object files produced by your assembler are not in themselves executable programs. Addresses must be assigned to symbols and segments, and references between object files must be resolved. This means that the linker needs some understanding of your instruction set. In particular, it must be able to find and patch locations in code and data that address memory. The binutils porting guide I linked above is relevant here, too; you may also enjoy reading Linkers and Loaders.
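The "find and patch locations" part of linking can be sketched as follows. Everything here is an assumption made for illustration: a flat byte image, 16-bit little-endian absolute addresses, and the `Relocation` record itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// A relocation says: "the 16-bit address of `symbol` must be written
// at byte `offset` of the image".
struct Relocation {
    std::size_t offset;
    std::string symbol;
};

// Patch every relocation site with the final address of its symbol.
void apply_relocations(std::vector<std::uint8_t>& image,
                       const std::vector<Relocation>& relocs,
                       const std::map<std::string, std::uint16_t>& symbols) {
    for (const auto& r : relocs) {
        auto it = symbols.find(r.symbol);
        if (it == symbols.end())
            throw std::runtime_error("undefined symbol: " + r.symbol);
        if (r.offset + 2 > image.size())
            throw std::out_of_range("relocation outside image");
        image[r.offset]     = static_cast<std::uint8_t>(it->second & 0xFF);  // low byte
        image[r.offset + 1] = static_cast<std::uint8_t>(it->second >> 8);    // high byte
    }
}
```

How addresses are encoded inside instructions is exactly the instruction-set knowledge the linker needs, which is why this step cannot be fully generic.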
As @Mat noted in the comment section above, the GPL doesn't usually "infect" the output of a program licensed under it. See this section. Notably:
The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work.
I am not a lawyer, but I take this to mean that an exception would be made for, say, compiling the compiler with itself ― the output would still be subject to the terms of the GPL.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
My question is specifically for Windows C++ compilers and Visual Studio, but I got offered to interview for a job in finance where they wanted somebody very technical to write real-time multi-threaded code who could analyse at assembly level the code generated by a C++ compiler.
What are the methods one can apply to learn the link between C++ code and the generated assembly and achieve this level of proficiency?
The simple answer is to compile code and look at it in the debugger.
Debuggers will show you the connection between the two in a very harsh way. The next step is to understand compiler theory and then look at the source code of compilers to understand what they try to do.
I think the person interviewing you may have been trying to see if you can understand what kind of effort is involved - rather than actually knowing how to do it.
The first thing to do would be to learn the assembler and machine code. There is some very good documentation of the machine code available at the Intel site (although it may be more detailed than you need). There are two common assembler formats in widespread use: the one used by Microsoft is based on the original Intel assembler, whereas g++ uses something completely different (based on the original Unix assembler for PDP-11), so you'll have to choose one (although the assembler syntax itself is rarely a real problem; knowing what the individual instructions do is more important).
Once you have some idea of how to read assembler: most compilers have options to output assembler. For VC++, use /Fa (and /c as well, if you don't want to actually link the results); for g++, use -S (which causes the compiler to stop once it has generated the assembler). In the case of VC++, the assembler will be in a file xxx.asm (where xxx.cpp was the name of the file being compiled); for g++, in xxx.s. Try compiling some code, with different levels of optimization, and then look at the assembler in an editor.
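A small function like the following is a convenient first subject for that exercise. The file name sum.cpp and the function name are arbitrary choices for this sketch; the flags in the comment are the ones named above plus ordinary optimization levels.

```cpp
// Save as sum.cpp (hypothetical name), then compare the generated assembler:
//   g++ -O0 -S sum.cpp       (writes sum.s, unoptimized)
//   g++ -O2 -S sum.cpp       (writes sum.s, optimized)
//   cl /c /O2 /Fa sum.cpp    (writes sum.asm)
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;  // look for how the loop and this add appear at each level
    }
    return total;
}
```

Comparing the unoptimized and optimized output for the same few lines is usually more instructive than reading a large listing once.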
Finally, if the question is asked, it's probably because the interviewer is concerned about performance issues; what he's really interested in is whether you know the relative cost of various operations (or the risks involved when multithreading, e.g. which operations are atomic). In that case, it probably wouldn't hurt to point out that issues like locality (which determines the percentage of cache hits) are often more important than the individual operations.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 4 years ago.
I have some C++ code. In the code there are many classes defined, their member functions, constructors, destructors for those classes, few template classes and lots of C++ stuff. Now I need to convert the source to plain C code.
I have the following questions:
Is there any tool to convert C++ code and header files to C code?
Will I have to do a total rewrite of the code (I will have to remove the constructors and destructors and move that code into some init(), deinit() functions; change classes to structures; make existing member functions function pointers in those newly defined structures and then invoke those functions through the function pointers, etc.)?
If I have to convert it manually myself, what C++ specific code-data constructs/semantics do I need to pay attention to while doing the conversion from C++ to C?
There is indeed such a tool: Comeau's C++ compiler. It will generate C code which you can't manually maintain, but that's no problem. You'll maintain the C++ code, and just convert to C on the fly.
http://llvm.org/docs/FAQ.html#translatecxx
It handles some code, but will fail for more complex implementations as it hasn't been fully updated for some of the modern C++ conventions. So try compiling your code frequently until you get a feel for what's allowed.
Usage syntax from the command line is as follows for version 9.0.1:
clang -c CPPtoC.cpp -o CPPtoC.bc -emit-llvm
clang -march=c CPPtoC.bc -o CPPtoC.c
For older versions (unsure of transition version), use the following syntax:
llvm-g++ -c CPPtoC.cpp -o CPPtoC.bc -emit-llvm
llc -march=c CPPtoC.bc -o CPPtoC.c
Note that it creates a GNU flavor of C and not true ANSI C. You will want to test that this is useful for you before you invest too heavily in your code. For example, some embedded systems only accept ANSI C.
Also note that it generates functional but fairly unreadable code. I recommend commenting and maintaining your C++ code and not worrying about the final C code.
EDIT: although official support for this functionality was removed, users can still use the unofficial support from the Julia language developers to achieve the functionality described above.
While you can do OO in C (e.g. by adding a theType *this first parameter to methods, and manually handling something like vtables for polymorphism) this is never particularly satisfactory as a design, and will look ugly (even with some pre-processor hacks).
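For reference, the pattern described above looks roughly like this. It is written so that it compiles as C++, but it deliberately uses only C-style constructs (structs, function pointers, an explicit self parameter); in C the `reinterpret_cast` would be a plain cast. All type and function names are invented for the sketch.

```cpp
// "Base class": an object carries a pointer to a table of function pointers.
struct Shape;
struct ShapeVTable {
    double (*area)(const Shape* self);  // explicit 'this' as first parameter
};
struct Shape {
    const ShapeVTable* vtable;
};

// "Derived class": the base must be the first member so that a Shape*
// pointing at it can be converted back to a Rect*.
struct Rect {
    Shape base;
    double w, h;
};

double rect_area(const Shape* self) {
    const Rect* r = reinterpret_cast<const Rect*>(self);
    return r->w * r->h;
}

const ShapeVTable rect_vtable = {rect_area};

// Replaces the constructor: must be called by hand before use.
void rect_init(Rect* r, double w, double h) {
    r->base.vtable = &rect_vtable;
    r->w = w;
    r->h = h;
}

// Replaces a virtual call: dispatch through the table.
double shape_area(const Shape* s) {
    return s->vtable->area(s);
}
```

Even this small case shows the bookkeeping the C++ compiler normally does for you, which is the design cost the answer is pointing at.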
I would suggest at least looking at a re-design to compare how this would work out.
Overall a lot depends on the answer to the key question: if you have working C++ code, why do you want C instead?
Maybe good ol' cfront will do?
A compiler consists of two major blocks: the 'front end' and the 'back end'.
The front end of a compiler analyzes the source code and builds some form of 'intermediary representation' of it, which is much easier for a machine algorithm to analyze than the source code itself (i.e. whereas source code such as C++ is designed to help the human programmer write code, the intermediary form is designed to simplify the algorithms that analyze it).
The back end of a compiler takes the intermediary form and then converts it to a 'target language'.
Now, the target languages for general-use compilers are the assembler languages of various processors, but nothing prohibits a compiler back end from producing code in some other language, as long as that target language is (at least) as flexible as a general CPU assembler.
Now, as you can probably imagine, C is definitely as flexible as a CPU's assembler, so a C++ to C compiler is really no problem to implement from a technical point of view.
So you have: C++ ---frontEnd---> someIntermediaryForm ---backEnd---> C
You may want to check these guys out: http://www.edg.com/index.php?location=c_frontend
(the above link is just informative for what can be done, they license their front ends for tens of thousands of dollars)
PS
As far as I know, there is no such C++ to C compiler from GNU, and this totally beats me (if I'm right about this). Because the C language is fairly small and its internal mechanisms are fairly rudimentary, a C compiler requires something like one man-year of work (I can tell you this first hand because I wrote such a compiler myself many years ago, and it produces a [virtual] stack machine intermediary code), and being able to have a maintained, up-to-date C++ compiler while only having to write a C compiler once would be a great thing to have...
This is an old thread but apparently the C++ Faq has a section (Archived 2013 version) on this. This apparently will be updated if the author is contacted so this will probably be more up to date in the long run, but here is the current version:
Depends on what you mean. If you mean, Is it possible to convert C++ to readable and maintainable C-code? then sorry, the answer is No - C++ features don't directly map to C, plus the generated C code is not intended for humans to follow. If instead you mean, Are there compilers which convert C++ to C for the purpose of compiling onto a platform that yet doesn't have a C++ compiler? then you're in luck - keep reading.
A compiler which compiles C++ to C does full syntax and semantic checking on the program, and just happens to use C code as a way of generating object code. Such a compiler is not merely some kind of fancy macro processor. (And please don't email me claiming these are preprocessors - they are not - they are full compilers.) It is possible to implement all of the features of ISO Standard C++ by translation to C, and except for exception handling, it typically results in object code with efficiency comparable to that of the code generated by a conventional C++ compiler.
Here are some products that perform compilation to C:
Comeau Computing offers a compiler based on Edison Design Group's front end that outputs C code.
LLVM is a downloadable compiler that emits C code. See also here and here. Here is an example of C++ to C conversion via LLVM.
Cfront, the original implementation of C++, done by Bjarne Stroustrup and others at AT&T, generates C code. However it has two problems: it's been difficult to obtain a license since the mid 90s when it started going through a maze of ownership changes, and development ceased at that same time and so it doesn't get bug fixes and doesn't support any of the newer language features (e.g., exceptions, namespaces, RTTI, member templates).
Contrary to popular myth, as of this writing there is no version of g++ that translates C++ to C. Such a thing seems to be doable, but I am not aware that anyone has actually done it (yet).
Note that you typically need to specify the target platform's CPU, OS and C compiler so that the generated C code will be specifically targeted for this platform. This means: (a) you probably can't take the C code generated for platform X and compile it on platform Y; and (b) it'll be difficult to do the translation yourself — it'll probably be a lot cheaper/safer with one of these tools.
One more time: do not email me saying these are just preprocessors - they are not - they are compilers.