Background:
We have acquired a software product that builds to a 32-bit Windows application in Visual Studio. We wish to port this application to 64-bit.
A mission-critical component of this code is a black-box static library (.a file) originally built with gfortran by a third party. The original developer has since passed away, and the Fortran source we were able to obtain is incomplete and not the version this library was built from (it contains critical bugs not present in the compiled library). They did not use a VCS.
Problem:
I would like to create a 64-bit static library whose code is functionally equivalent to the 32-bit static library we have.
What I've Tried:
Using the Snowman decompiler to get C++ source code to recompile in 64-bit. This proved impossible because the generated code uses low-level intrinsic functions that appear to be gcc-specific. It likely wouldn't work anyway, because such intrinsics would compile to code that isn't functionally equivalent in 64-bit. I'd likely need a better decompiler.
Apparently x86 assembly is valid x86_64 assembly, so I looked briefly into why I couldn't just run the assembly through a 64-bit assembler. Turns out the ABI is different in 64-bit, so the calling convention won't match. MAYBE I could manually convert the function calls to the proper convention and leave the rest the same, but that might be an intractable problem. Opinions?
You could keep the 32-bit binary library but load it into a 32-bit host process, and use some kind of IPC (shared memory, named pipes, a local-loopback network connection, etc.) to transfer data to and from your 64-bit process.
Another advantage of this approach is that if the Fortran code crashes, it only brings down the child host process, not your main application, and your program can instantly fire the host up again. And if it's a single-threaded Fortran program, you could spin up several instances for multi-core parallelism.
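A minimal sketch of the 64-bit side of such a bridge, using a Windows named pipe. The 32-bit host executable (fortran_host32.exe) and the one-double-in, one-double-out protocol are hypothetical placeholders for whatever interface the legacy library actually exposes:

    // 64-bit application: create a named pipe, start the 32-bit host process,
    // then exchange one request/result pair over the pipe.
    #include <windows.h>
    #include <cstdio>

    int main() {
        const char* pipeName = R"(\\.\pipe\fortran_bridge)";  // hypothetical pipe name

        HANDLE pipe = CreateNamedPipeA(pipeName, PIPE_ACCESS_DUPLEX,
                                       PIPE_TYPE_BYTE | PIPE_READMODE_BYTE | PIPE_WAIT,
                                       1, 4096, 4096, 0, nullptr);
        if (pipe == INVALID_HANDLE_VALUE) return 1;

        // Launch the 32-bit host, which is assumed to open the same pipe name.
        STARTUPINFOA si{}; si.cb = sizeof(si);
        PROCESS_INFORMATION pi{};
        char cmd[] = "fortran_host32.exe";
        if (!CreateProcessA(nullptr, cmd, nullptr, nullptr, FALSE, 0,
                            nullptr, nullptr, &si, &pi)) return 1;

        ConnectNamedPipe(pipe, nullptr);  // wait for the host to connect

        double request = 42.0, result = 0.0;
        DWORD n = 0;
        WriteFile(pipe, &request, sizeof request, &n, nullptr);  // send input
        ReadFile(pipe, &result, sizeof result, &n, nullptr);     // receive output
        std::printf("result = %f\n", result);

        CloseHandle(pipe);
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
        return 0;
    }

The 32-bit host would be a small program, built with a toolchain that can consume the gfortran-built .a (e.g. MinGW), that sits in a loop reading requests from the pipe, calling into the library, and writing results back.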
I know that inline asm exists, but is it also possible to execute machine code from a file during RUNTIME?
Would I need to write my own interpreter?
I'm using the GNU C++ compiler with C++14 enabled, on Windows 7.
Thanks for reading.
With your rephrasing in terms of machine code, this question starts to take a more reasonable shape.
A short answer: Yes, you can run machine code from within your application.
A longer answer is - it's complicated.
Essentially, any string of bits and bytes in memory can be executed, provided some conditions are met: the data must be legal machine instructions (otherwise the processor raises an illegal-instruction exception and the OS terminates your program), and the memory page into which the data is loaded must be marked with execute permissions.
Having said that, meeting the conditions required for that machine code to actually run correctly and do what you expect is significantly harder, and has to do with understanding virtual memory, dynamic loaders and dynamic linkers.
To bluntly answer your question, for a POSIX-compliant environment at least, you could always use the mmap system call to map a file into memory with PROT_EXEC permissions and jump into that memory space, hoping for the best.
Naturally, any symbols that code would be expecting to find in memory aren't likely to be there, and the code had better be compiled as PIC (position-independent code), but this roughly answers your question with a YES.
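As a rough sketch of that mmap route (POSIX only; blob.bin is a hypothetical file assumed to contain position-independent code for the current CPU with the signature int()):

    // Map a file of raw machine code with execute permission and jump into it.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int fd = open("blob.bin", O_RDONLY);
        if (fd < 0) return 1;

        struct stat st {};
        fstat(fd, &st);

        void* code = mmap(nullptr, st.st_size, PROT_READ | PROT_EXEC,
                          MAP_PRIVATE, fd, 0);
        close(fd);
        if (code == MAP_FAILED) return 1;

        // Interpret the start of the mapping as a function and call it,
        // hoping the bytes really are valid code for this machine.
        using Fn = int (*)();
        int result = reinterpret_cast<Fn>(code)();
        std::printf("blob returned %d\n", result);

        munmap(code, st.st_size);
        return 0;
    }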
For better control, you'd usually prefer to use a more standard method, such as compiling your extra code as a shared object (Dynamic Link Library, DLL in Windows) and loading it into your application with dlopen while using dlsym to access symbols within it. It still allows you to load machine code from the disk into your application, but it also stores the machine code in a well formatted, standard way, which allows the dynamic linker to properly load and link the new code segment into your application, reducing unexpected behavior.
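For instance (a sketch; libextra.so and its compute export are hypothetical):

    // Load a shared object at runtime and call one of its exported functions.
    // (Link with -ldl on Linux.)
    #include <dlfcn.h>
    #include <cstdio>

    int main() {
        void* handle = dlopen("./libextra.so", RTLD_NOW);
        if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

        // Resolve the symbol and cast it to the signature we expect it to have.
        auto compute = reinterpret_cast<int (*)(int)>(dlsym(handle, "compute"));
        if (!compute) { std::fprintf(stderr, "%s\n", dlerror()); dlclose(handle); return 1; }

        std::printf("compute(21) = %d\n", compute(21));
        dlclose(handle);
        return 0;
    }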
In neither of these cases will you need an interpreter, but neither is it a matter of the language or compiler used - this is OS-specific functionality, and it will behave quite differently on Windows.
As a different approach, you could consider using the #include directive to import an external chunk of assembly code into your work while you're still working on it, and properly incorporate it at compile time, which will yield far more deterministic results.
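For example, something along these lines (a toy sketch using GNU extended inline asm; the file and function names are made up):

    // add_one.h -- assembly carried in a header and incorporated at compile time.
    #pragma once

    inline int add_one(int x) {
        asm("addl $1, %0" : "+r"(x));   // x86 increment, GNU inline-asm syntax
        return x;
    }

    // main.cpp
    #include <cstdio>
    #include "add_one.h"

    int main() {
        std::printf("%d\n", add_one(41));   // prints 42
        return 0;
    }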
Edit:
For Windows, the parallel of mmap is CreateFileMapping; the parallel of dlopen is LoadLibrary.
Not a Windows expert, sorry...
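For illustration, a rough sketch of that LoadLibrary route (extra.dll and its compute export are hypothetical):

    // Load a DLL at runtime and call one of its exported functions.
    #include <windows.h>
    #include <cstdio>

    int main() {
        HMODULE mod = LoadLibraryA("extra.dll");
        if (!mod) return 1;

        // Resolve the export and cast it to the expected signature.
        auto compute = reinterpret_cast<int (*)(int)>(GetProcAddress(mod, "compute"));
        if (!compute) { FreeLibrary(mod); return 1; }

        std::printf("compute(21) = %d\n", compute(21));
        FreeLibrary(mod);
        return 0;
    }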
Let us distinguish between "assembler code"/assembly code (which is what this question initially asked about) and machine code (after one of the edits).
Anything you might describe as "assembler code" (or more usually "assembly code") but not machine code (i.e. anything not being actual, binary, executable machine code) cannot be "executed". You can only read it into what I would call an "assembly-code interpreter" and have it processed. I do not know of any such program.
Alternatively, you can have it processed at runtime by a build process and execute the resulting executable. That however seems not to be what you are asking about.
Note that this does not mean that you can execute any machine code you might find in a file on your disk. It needs to be made for the same platform and be supported by the appropriate runtime environment. That applies to executables created for your machine or compatible ones, e.g. the result of a build.
Note that I understand "assembler code" ("assembly code") to mean source code in assembly language, which is a (probably the most basic) representation of programs in (more or less) human-readable form. (As immortal has commented, an assembler is the program that processes assembly code into machine code.) Opcode mnemonics are used, e.g. cmp r1, r2 for comparing two registers. That string of characters, however, is guaranteed not to make any sense when you try to execute it directly. (OK, strictly speaking I should say "almost guaranteed"...)
Machine code which is appropriately made for your environment, including a loader, can be executed from a file. Any operating system will support you in doing that; most will even provide a GUI for it. (I notice this sounds somewhat cynical; sorry, not meant to be.) Windows, for example, will execute an executable if you double-click its icon in Windows Explorer.
An alternative to such executable programs is libraries. Dynamic link libraries in particular are probably quite close to what you are thinking of. They are very similar in that they need to be targeted at your environment. They can then be (usually partially) executed from a linked program via agreed calling mechanisms. Those mechanisms in turn ensure that the code is executed in a matching environment, including being able to return results.
I have a file with the extension .out. I'm running Windows 10. From what I understand, .out files are generated when coding in C and C++ on Linux. I was wondering if there was any way I could execute the file on Windows. Renaming its extension to .exe gave me an error saying the file was incompatible with the 64-bit version of Windows.
So is there any way I could execute the file, or better yet, view its contents as proper code so I can work with it, while using Windows?
There's no way of directly converting a Linux executable to Windows format.
You'll have to recompile it, or use Cygwin, which allows running Linux commands in a Windows environment.
a.out is not necessarily related to C or C++; it can be generated by any other kind of compiler/assembler. If you read the article, you can see that it isn't even guaranteed that this actually is what you may think of as the a.out format.
The only possible way to execute it is to install a Unix OS, but even this won't guarantee that it really can be executed, because there may be missing dependencies or the wrong OS, etc.
To view the content of the file, there are different utilities on different platforms. For example, you can use objdump on Linux or Cygwin/Windows to take a look at it. You can use a disassembler and see if you can make sense of it. On Windows you can use IDA, which covers a broad range of file formats and may be able to dissect it.
Once you have managed to take a look inside it, there is the next issue you asked about: converting it. This is a tedious process, though, because you must do it by hand. If IDA can identify it, you get a good start because you then have assembly source as a starting point, but it will likely not assemble, and it certainly won't run as-is on your target platform (Windows).
Is there a way I can tell a C++ compiler/linker to compile source code into my own homemade opcode list? I need it for my virtual machine, which will execute on a microcontroller.
I don't want to create a C++ compiler from scratch, but only to change the opcodes, the addresses of the CPU status register, stack pointer, GPIO registers, program memory and data memory in an existing open-source compiler, so that people making programs for it don't have to rewrite all their code, but can just port it using libraries that are compatible with my own compiler's libraries.
An example is the avr-gcc compiler.
The compiler and its libraries must not be proprietary in the sense that I or any programmer would have to pay for them, and I don't want them to be GPL-licensed in such a way that a programmer must reveal the source of their own projects. I want all my programmers to use my compiler freely, to be free to license their work in whatever way they want, and to choose whether to make it open source or proprietary.
Let's consider the steps involved:
Retargeting an existing C++ compiler: Several production-quality, retargetable C++ compilers are freely available today. For instance, the LLVM platform (clang++) provides some pointers on writing a backend for a new hardware architecture (this naturally applies to VMs as well!). Unfortunately, up-to-date documentation on porting the GNU compilers is harder to come by. It's entirely possible that many of the older documents remain relevant today, but I know far too little about GCC to say.
Note that the effort required to retarget either compiler is likely to depend on how well the instruction set of your virtual machine matches the compiler's low-level intermediate representation. Since they often (at least semantically) take the form of three-address code ― that is, instructions with two source operands and one destination ― writing a code generator for, say, a stack machine (in which all operands are implicitly addressed) could prove to be a bit more difficult.
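To illustrate the difference (the mnemonics below are made up, just a sketch):

    // Illustration only: how the same C++ function body might lower to the two
    // instruction-set styles described above.
    int lower_me(int a, int b, int c) {
        return a + b * c;
    }

    // Three-address style (two sources, one destination per instruction):
    //   mul t0, b, c
    //   add t1, a, t0
    //   ret t1
    //
    // Stack-machine style (operands are implicit, on an evaluation stack):
    //   push a
    //   push b
    //   push c
    //   mul
    //   add
    //   ret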
From this point on, you really have two options. You could stick to the conventional way in which C++ programs are compiled, i.e., from source, to assembly, to object files, to linked executable or library. That involves going through the steps I have outlined below. But since you are targeting a virtual machine, it may have requirements that are radically different from those of modern hardware architectures. In that case, you may want to steer clear of existing software like binutils and roll your own assembler and linker.
Writing or porting an assembler: Unless your chosen compiler is able to directly generate machine code, you will most likely also need to write an assembler for your virtual machine, or port an existing one. If your virtual machine's instruction set looks anything like that of a modern machine, and if you want to use the standard C++ compilation/linking pipeline, you could look into porting binutils, specifically gas, the GNU assembler.
Writing or porting a linker: The object files produced by your assembler are not in themselves executable programs. Addresses must be assigned to symbols and segments, and references between object files must be resolved. This means that the linker needs some understanding of your instruction set. In particular, it must be able to find and patch locations in code and data that address memory. The binutils porting guide I linked above is relevant here, too; you may also enjoy reading Linkers and Loaders.
As #Mat noted in the comment section above, the GPL doesn't usually "infect" the output of a program licensed under it. See this section. Notably:
The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work.
I am not a lawyer, but I take this to mean that an exception applies in cases such as compiling the compiler with itself: there, the output would still be subject to the terms of the GPL.
While trying to build my own non-GNU cross-platform C++ environment, I have run into the fact that I don't really understand the basics of stack unwinding. The environment I am building is as follows:
libc++ ← libc++abi ← libunwind (or some other unwinder).
I've found that libc++abi already contains some kind of libunwind, but doesn't use it on Linux. From the comments I understood that it's a special libunwind, the LLVM Stack Unwinder, which supports only Darwin and ARM but not x86_64 - and that's confusing. How can it be that the CPU architecture affects the stack unwinding process?
Also, I know about the following stack unwinders:
glibc built-in.
libc++abi LLVM libunwind.
GNU libunwind (from Savannah).
Questions:
How does the platform or CPU architecture affect the stack unwinding process?
Why are there many stack unwinders, and not just a single one?
What kinds of unwinders exist, and what are the differences between them?
Expectations from an answer:
I expect an answer that covers this topic as a whole, not just separate points on each question.
Fundamentally, the stack layout is up to the compiler. It can lay out the stack in almost whatever way it thinks is best. The language standard says nothing about how the stack is laid out.
In practice, different compilers lay the stack out differently, and the same compiler can also lay it out differently when run with different options. The stack layout will depend on the size of types on the target platform (especially the size of pointer types), compiler options such as GCC's -fomit-frame-pointer, and ABI requirements of the platform (e.g. x64 has a defined ABI whereas x86 does not). How to interpret the stack will also depend on how the compiler stores the relevant information. This in turn partly depends on the executable format (probably either ELF or COFF these days, but actually, as long as the OS can load the executable and find the entry point, everything else is pretty much up for grabs) and partly on the debug information format - which is again specific to the compiler/debugger combination in use.
In the end, it is perfectly possible to write inline assembler that manipulates the stack and program flow in a way that no unwinder will be able to follow. Some compilers also allow you to customise the function prologue and epilogue, giving you another opportunity to confuse any unwinding algorithm.
The net effect of all this is that it is not possible to write a single stack-unwinding algorithm that will work everywhere. The unwinding algorithm has to match the compiler, OS and, for more than the most basic information, the debugger. The best you can do is to write a simple stack-unwinding interface and implement it differently for each compiler/OS/debugger combination you support.
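To make that last point concrete, here is a rough sketch of what such an interface might look like, with two possible backends. The names are illustrative, and the backends lean on glibc's backtrace() and Win32's CaptureStackBackTrace rather than hand-rolled unwinding:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Minimal unwinder interface: capture the return addresses of the current stack.
    struct StackUnwinder {
        virtual ~StackUnwinder() = default;
        virtual std::vector<std::uintptr_t> capture(std::size_t maxFrames) = 0;
    };

    #if defined(__GLIBC__)
    #include <execinfo.h>   // glibc's built-in backtrace()

    struct GlibcUnwinder : StackUnwinder {
        std::vector<std::uintptr_t> capture(std::size_t maxFrames) override {
            std::vector<void*> raw(maxFrames);
            int n = backtrace(raw.data(), static_cast<int>(maxFrames));
            std::vector<std::uintptr_t> out;
            for (int i = 0; i < n; ++i)
                out.push_back(reinterpret_cast<std::uintptr_t>(raw[i]));
            return out;
        }
    };
    #elif defined(_WIN32)
    #include <windows.h>    // CaptureStackBackTrace

    struct Win32Unwinder : StackUnwinder {
        std::vector<std::uintptr_t> capture(std::size_t maxFrames) override {
            std::vector<void*> raw(maxFrames);
            USHORT n = CaptureStackBackTrace(0, static_cast<DWORD>(maxFrames),
                                             raw.data(), nullptr);
            std::vector<std::uintptr_t> out;
            for (USHORT i = 0; i < n; ++i)
                out.push_back(reinterpret_cast<std::uintptr_t>(raw[i]));
            return out;
        }
    };
    #endif

Each backend only answers the question of which return addresses are on the stack; turning those into function names or source lines is where the compiler and debug-format dependence described above really comes in.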
I've read that
The compiler converts code written in a human-readable programming language into a machine code representation which is understood by your processor.
How does the compiler know about the instruction set of my CPU?
Any reference for understanding the job of the assembler, linker and loader would be helpful.
How does the compiler know about the instruction set of my CPU?
Most compilers only know how to emit code for a specific CPU (or a small number of them). Each target CPU requires that someone write a compiler back-end for it, and that task is non-trivial.
GCC supports a large variety of targets, but even GCC is built to emit code for only a few targets. In other words, you can build GCC to emit code for x86_64 and i*86 processors, and you can build another copy to emit code for PowerPC, but you can't build a single GCC that will produce code for all three.
Any reference to understand job of assembler, linker and loader
A Google search for the above terms led me here.
Essentially, each hardware/software combination requires its own compiler. So even though you might write C code intended to compile and run on a Windows machine, you will still need a compiler whose implementation is different from that of a C compiler for an Apple machine.
Just to throw in a twist, there are compilers that generate code for hardware that is different from the one they run on. A simple example is the Arduino products. The Arduino IDE runs on a Windows (and other) machine, but the code is compiled for an Atmel microcontroller on the Arduino board. Furthermore, each board style (UNO, etc.) has a different flavor of Atmel microcontroller, which is why in the IDE you have to identify the target of your sketch, so that the compiler can make the necessary adjustments to run on the specific hardware (instruction set).
This idea also applies to assemblers and linkers (maybe a little less so).