How to execute a bitcode file with LLVM 3.3?

I'm starting to program with LLVM and am trying to execute a bitcode file.
I came up with this code, adapted from old examples (my doubt is about the creation of the MemoryBuffer, since getFile(string) no longer exists):
string *errorString = new string;
LLVMContext context;
OwningPtr<MemoryBuffer> *mb = new OwningPtr<MemoryBuffer>;
MemoryBuffer::getFileOrSTDIN(argv[1], *mb);
Module *m = ParseBitcodeFile(mb->take(), context, errorString);
ExecutionEngine *ee = EngineBuilder(m).create();
Function *main = m->getFunction("main");
From this line on nothing works (segmentation fault)
1 - "standard" approach?
void * f = ee->getPointerToFunction(main);
void (*FP)() = (void (*)()) f;
2 - lli's approach, not sure about the '0' for envp
vector<string> *argList = new vector<string>;
ee->runFunctionAsMain(main, *argList, 0);
3 - a generalization of 2.
vector<struct GenericValue> *argList = new vector<struct GenericValue>;
ee->runFunction(main, *argList);

The lli tool is your reference here. As an official LLVM tool and part of the repository and releases, it is always up to date with the latest LLVM APIs. The file tools/lli/lli.cpp is only about 500 lines of code, much of it header includes, option definitions and comments. The main function contains the exact flow of execution and is cleanly structured and commented.
You can pick one of two approaches:
Start with lli.cpp as is, gradually stripping things you don't need.
Take the relevant parts from lli.cpp into your own main file.
If the problem is rather with your main, you can always find examples of bitcode files that actually run with lli within the LLVM tests - test/ExecutionEngine - most tests there are bitcode files on which lli is invoked and runs successfully.

After running into the same problem as you, I searched through lli.cpp for all non-optional calls involving modules, engine builders, etc.
I believe what you are missing is a call to "ee->runStaticConstructorsDestructors(false)".
At least, this fixed the issue for me.
Note: This is under LLVM 3.4, but I have verified that the same call also exists in LLVM 3.1, which suggests it probably exists in 3.3 as well.

Related

How to attach debug information into an instruction in a LLVM Pass

I am trying to collect some information from my LLVM optimization pass during runtime. In other words, I want to know the physical address of a specific IR instruction after compilation. So my idea is to convert the LLVM metadata into LLVM DWARF data that can be used during runtime. Instead of attaching the filename and line numbers, I want to attach my own information. My question falls into two parts:
Here is some code that gets the filename and line number of an instruction:
if (DILocation *Loc = I->getDebugLoc()) { // Here I is an LLVM instruction
    unsigned Line = Loc->getLine();
    StringRef File = Loc->getFilename();
    StringRef Dir = Loc->getDirectory();
    bool ImplicitCode = Loc->isImplicitCode();
}
But how can I set these fields? I could not find a relevant function.
How can I see the updated debug information (filename and line numbers) at runtime? I compiled with -g, but I still do not see the debug information.
Thanks
The function you need is setDebugLoc(). The debug info only ends up in the output if you provide enough of it; the module verifier will tell you what is missing. These two module flags might also be what's tripping you up:
module->addModuleFlag(Module::Warning, "Dwarf Version", dwarf::DWARF_VERSION);
module->addModuleFlag(Module::Warning, "Debug Info Version", DEBUG_METADATA_VERSION);

generate machine code directly via LLVM API

With the following code, I can generate an LLVM bitcode file from a module:
llvm::Module * module;
// fill module with code
module = ...;
std::error_code ec;
llvm::raw_fd_ostream out("anonymous.bc", ec, llvm::sys::fs::F_None);
llvm::WriteBitcodeToFile(module, out);
I can then use that bitcode file to generate an executable machine code file, e.g.:
clang -o anonymous anonymous.bc
Alternatively:
llc anonymous.bc
gcc -o anonymous anonymous.s
My question now is: Can I generate the machine code directly in C++ with the LLVM API without first needing to write the bitcode file?
I am looking for either a code example or at least some starting points in the LLVM API, e.g. which classes to use. A nudge in the right direction might even be enough.
I was also looking for the code for this, and @arrowd's suggestion worked.
To save the trouble for the next person, this is what I came up with.
Given a Module, it generates assembly code on stdout for your native target:
void printASM(Module *M) {
    // Register the native target and its assembly printer.
    InitializeNativeTarget();
    InitializeNativeTargetAsmPrinter();
    auto TargetTriple = sys::getDefaultTargetTriple();
    M->setTargetTriple(TargetTriple);
    std::string Error;
    const Target *target = TargetRegistry::lookupTarget(TargetTriple, Error);
    if (!target) { // lookupTarget returns null and fills Error on failure
        errs() << Error << "\n";
        return;
    }
    // Use the host CPU and the features it reports.
    auto cpu = sys::getHostCPUName();
    SubtargetFeatures Features;
    StringMap<bool> HostFeatures;
    if (sys::getHostCPUFeatures(HostFeatures))
        for (auto &F : HostFeatures)
            Features.AddFeature(F.first(), F.second);
    auto features = Features.getString();
    TargetOptions Options;
    std::unique_ptr<TargetMachine> TM{
        target->createTargetMachine(
            TargetTriple, cpu, features, Options,
            Reloc::PIC_, None, CodeGenOpt::None)
    };
    legacy::PassManager PM;
    M->setDataLayout(TM->createDataLayout());
    // Schedule the codegen passes that print assembly to stdout, then run them.
    TM->addPassesToEmitFile(PM, (raw_pwrite_stream &) outs(), (raw_pwrite_stream *) (&outs()),
        TargetMachine::CodeGenFileType::CGFT_AssemblyFile, true, nullptr);
    PM.run(*M);
}
If anyone knows a shorter way to write this code, feel free to correct me!
Take a look at the source of the llc tool, specifically the compileModule() function. In short, it looks up a Target, configures it via TargetOptions, then calls addPassesToEmitFile() on the resulting target machine, and finally asks the PassManager to run everything that was scheduled.

llvm.stackprotect of LLVM

I just got started with LLVM. I am reading the code for stack protection, which is located in lib/CodeGen/StackProtector.cpp. In this file, the InsertStackProtectors function inserts a call to llvm.stackprotector into the code:
// entry:
//   StackGuardSlot = alloca i8*
//   StackGuard = load __stack_chk_guard
//   call void @llvm.stackprotect.create(StackGuard, StackGuardSlot)
// ...(Skip some lines)
CallInst::Create(Intrinsic::getDeclaration(M, Intrinsic::stackprotector),
                 Args, "", InsPt);
This llvm.stackprotector (http://llvm.org/docs/LangRef.html#llvm-stackprotector-intrinsic) seems to be an LLVM intrinsic function, so I tried to find its source code. However, I cannot find it...
I do find a one-line definition of this function in include/llvm/IR/Intrinsics.td, but it does not tell how it is implemented.
So my questions are:
Where can I find the code for this llvm.stackprotector function?
What is the purpose of these *.td files?
Thank you very much!
The .td files are input for TableGen, LLVM's code-generation tool, which is used to cut down on boilerplate code. In this particular case, ./include/llvm/IR/Intrinsics.gen is generated in the build directory and contains code describing the intrinsics specified in the .td file.
As for stackprotector, there's a bunch of code in the backend for handling it. See for instance lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp: in SelectionDAGBuilder::visitIntrinsicCall it generates the actual DAG nodes that implement this intrinsic.

Tracking code versions in an executable

I have a reasonably sized (around 40k lines) machine learning system written in C++. This is still in active development and I need to run experiments regularly even as I make changes to my code.
The output of my experiments is captured in simple text files. What I would like to do when looking at these results is have some way of figuring out the exact version of the code that produced it. I usually have around 5 to 6 experiments running simultaneously, each on a slightly different version of the code.
I would like to know, for instance, that a set of results was obtained by compiling version 1 of file A, version 2 of file B, etc. (I just need some identifier, and the output of "git describe" will do fine here).
My idea is to somehow include this info when compiling the binary. This way, this can be printed out along with the results.
Any suggestions on how this can be done in a nice way? In particular, is there a nice way of doing this with git?
I generate a single source file as part of my build process that looks like this:
static const char version_cstr[] = "93f794f674 (" __DATE__ ")";
const char * version()
{
    return version_cstr;
}
Then it's easy to log the version on startup.
I originally used a DEFINE on the command line, but that meant that every version change made the build system recompile everything, which is not nice for a big project.
Here's the fragment of SCons I use for generating it; maybe you can adapt it to your needs.
import subprocess
# Let's get the version from git.
# First get the base version (abbreviated commit hash); universal_newlines=True makes the output text rather than bytes.
git_sha = subprocess.Popen(["git","rev-parse","--short=10","HEAD"], stdout=subprocess.PIPE, universal_newlines=True).communicate()[0].strip()
# Then look for uncommitted changes ("Changes not staged for commit" is the wording newer git versions print).
p1 = subprocess.Popen(["git", "status"], stdout=subprocess.PIPE )
p2 = subprocess.Popen(["grep", "Changed but not updated\\|Changes not staged for commit\\|Changes to be committed"], stdin=p1.stdout, stdout=subprocess.PIPE, universal_newlines=True)
result = p2.communicate()[0].strip()
if result != "":
    git_sha += "[MOD]"
print("Building version %s" % git_sha)
def version_action(target,source,env):
    """
    Generate file with current version info
    """
    fd = open(target[0].path,'w')
    fd.write( "static const char version_cstr[] = \"%s (\" __DATE__ \")\";\nconst char * version()\n{\n return version_cstr;\n}\n" % git_sha )
    fd.close()
    return 0
build_version = env.Command( 'src/autogen/version.cpp', [], Action(version_action) )
env.AlwaysBuild(build_version)
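A possible variation on the generator above, as a minimal Python 3 sketch rather than a drop-in replacement. It uses "git describe --always --dirty" (the identifier the question asks for) instead of "git rev-parse" plus grepping "git status", and it rewrites the file only when its text actually changed, which may matter for timestamp-based build tools. The helper names git_describe, render_version_source and write_if_changed are made up for illustration; they are not part of SCons or git.

```python
import os
import subprocess


def git_describe(repo_dir="."):
    """Return the output of `git describe --always --dirty`, or "unknown" on failure."""
    try:
        out = subprocess.check_output(
            ["git", "describe", "--always", "--dirty"],
            cwd=repo_dir,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is missing or not inside a work tree
        return "unknown"
    return out.decode("utf-8", "replace").strip() or "unknown"


def render_version_source(version):
    """Build the C++ source text; __DATE__ is expanded later by the C++ compiler."""
    # Escape characters that would break the C++ string literal.
    escaped = version.replace("\\", "\\\\").replace('"', '\\"')
    return (
        'static const char version_cstr[] = "%s (" __DATE__ ")";\n'
        "const char * version()\n"
        "{\n"
        "    return version_cstr;\n"
        "}\n"
    ) % escaped


def write_if_changed(path, text):
    """Write text to path only if it differs; return True when the file was written."""
    try:
        with open(path, "r") as f:
            if f.read() == text:
                return False
    except OSError:
        pass  # file does not exist yet (or is unreadable): fall through and write it
    directory = os.path.dirname(path)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "w") as f:
        f.write(text)
    return True
```

From a build script, one would call write_if_changed('src/autogen/version.cpp', render_version_source(git_describe())) before compiling.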
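You can put $Id$ in your source file, and Git will expand it on checkout to include a SHA-1 hash, provided the file containing it is listed in .gitattributes with the "ident" attribute (see gitattributes). Note that this is the hash of that file's blob (its contents), not the hash of the commit.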

gdb and GPS: Cannot set a breakpoint on a function or procedure that is part of a protected type Ada object

I've got a protected object that presents functions and procedures in its interface.
In gdb, when I set a bp on the first line of one of those, I get odd results.
Here's a snippet from my gdb console:
(gdb)
(gdb) b database-access_manager.adb:20001
Breakpoint 3 at 0x1a10588: file y:/svs/central_switch/controller/database/
database-access_manager.ads, line 20001.
(gdb)
You can see that gdb is confused. I specified a bp at line 20001 of the .adb file, but gdb responded by saying it had set the bp at line 20001 of the corresponding .ads file, which doesn't have that many lines.
What gives?
That .ads file wouldn't happen to be defining or using a generic, would it?
I have yet to find a debugger that handles Ada generics very well. The compiler often creates a raft of semi-invisible code that confuses the heck out of debuggers. I suspect C++ templates have the same issue.
Another possibility is that you are looking at a source file that has been modified since your program was compiled.
Running on Windows with GNAT Pro 6.3.1 (I realise this isn't an ideal data point for you!) this worked fine.
I did notice that when I requested a bp on the subprogram specification, GDB effectively set two bps, one in the specification and one at the first statement: so, given
package body Protected_Object is
   protected body PO is
      procedure Put (V : Integer) is
      begin
         Value := V;
      end Put;
      function Get return Integer is
      begin
         return Value;
      end Get;
   end PO;
end Protected_Object;
the GDB console says (for Put)
(gdb) break protected_object.adb:4
Breakpoint 1 at 0x401729: file protected_object.adb, line 6. (2 locations)
and at run time, sure enough there are 2 breaks:
Breakpoint 1, <protected_object__po__putP> (<_object>=..., v=42) at protected_object.adb:4
(gdb) cont
Breakpoint 1, protected_object.po.put (<_object>=..., v=42) at protected_object.adb:6
Version: GNU gdb (GDB) 7.0.1 for GNAT Pro 6.3.1 (20100112) [rev:158983]
Here's the update on my problem.
I made a protected type with access methods and used it in a small main and found that breakpoints in my example protected type worked fine.
Now I'm trying to understand why, within the context of my company's very large build, the breakpoints don't work.
I'm using the same gdb, GPS, & compiler switches in each case and it works for the small program but not in the large one.
I'll post my results when/if I have any.
Thanks to all the repliers.
Tom