verify LLVM IR consistency - llvm

We can use LLVMVerifyModule to verify the consistency of a module from within code. But imagine a handcrafted LLVM IR file: it may be syntactically correct, yet I still want to run something that sanity-checks it.
Consider the following module:
; ModuleID = 'main'
source_filename = "main"
define void @gl.main() {
entry:
ret void
ret void
}
I suppose it is not right because it contains two `ret` instructions in the same block. I ran llvm-as on it, but it went through. Am I understanding this correctly? What command do you run to verify a hand-written LLVM source file?
update:
opt doesn't crash, but instead outputs this file:
; ModuleID = 'try.bc.2'
source_filename = "main"
define void @gl.main() {
entry:
ret void
0: ; No predecessors!
ret void
}
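As the update's output suggests, the file is accepted because the parser starts a fresh unnamed basic block (the `0:` label) at the second `ret`; the module is then structurally valid, just with an unreachable block, so the verifier has nothing to reject. For reference, the verifier can also be invoked explicitly through opt (a sketch; flag spellings vary between LLVM versions, and llvm-as itself already runs the verifier unless -disable-verify is passed):

```shell
# Run only the verifier over a hand-written .ll file; -disable-output
# suppresses the bitcode opt would otherwise print.
opt -passes=verify -disable-output try.ll   # newer pass-manager syntax
opt -verify -disable-output try.ll          # legacy pass syntax
```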

Related

how to call a func with parameters from an executable using gdb

I need help calling a function inside an executable using GDB.
I have an executable file named vuln. I do not know the source code, as I am doing a CTF. When I analyzed the executable, I found three interesting functions: main, vuln, and flag. The vuln function is vulnerable to a BOF attack, but I do not want to go that way. What I am trying to do is run the executable in gdb; I used the print (void) flag(param1, param2) command to run the flag function directly, as this is supposed to give me the flag; however, it does not work, as it says my parameters are incorrect, which I am sure they are not. I have also found out about the jump command, but with it I cannot pass any parameters.
So, is there any way to properly run a function from the executable with parameters, or would I have to go through the pain of the BOF?
The GHIDRA-disassembled code of the flag and vuln functions is below.
void flag(int param_1, int param_2) {
    char local_50[64];
    FILE *local_10;
    local_10 = fopen("flag.txt", "r");
    if (local_10 != (FILE *)0x0) {
        fgets(local_50, 0x40, local_10);
        if ((param_1 == -0x21524111) && (param_2 == -0x3f212ff3)) {
            printf(local_50);
        }
        return;
    }
    puts("Hurry up and try in on server side.");
    exit(0);
}
void vuln(void)
{
    char local_bc[180];
    gets(local_bc);
    puts(local_bc);
    return;
}
print (void) flag(param1, param2)
Not sure what your values of param1 and param2 are, but this seems to work just fine for me:
echo "hello" > flag.txt
gdb -q ./a.out
(gdb) start
Temporary breakpoint 4 at 0x555555555307
Starting program: /tmp/a.out
Thread 1 "a.out" hit Temporary breakpoint 4, 0x0000555555555307 in main ()
(gdb) p (void)flag(-0x21524111, -0x3f212ff3)
hello
$2 = void
(gdb)

Exporting functions from LLVM C++ API to WebAssembly

Situation: I currently parse a front-end language and generate function definitions in LLVM IR.
I can compile the function definition to a WebAssembly file using the LLVM12 C++ API.
However, the generated wasm code does not "export" any of the compiled functions, so they cannot be accessed from JavaScript that loads the wasm file.
Question: Could someone let me know what I might be missing? How does one tell the LLVM compiler to create exports for the defined functions? I tried setting the function visibility to llvm::GlobalValue::DefaultVisibility, but that doesn't seem to help.
The generated IR for the function (with default visibility) looks like
define double @f(double %x) #0 {
entry:
%multmp = fmul double %x, 2.000000e+00
ret double %multmp
}
attributes #0 = { "target-features" }
The function to compile the module containing the function definition to the Wasm target looks like this:
llvm::Module *TheModule; // module containing the function definition

// function to compile to the Wasm target
bool compile_file() {
    const char *TargetTriple = "wasm-wasi";
    // create an llvm::Target for the specified triple
    std::string Error;
    const llvm::Target *Target = llvm::TargetRegistry::lookupTarget(TargetTriple, Error);
    if (!Target) {
        llvm::errs() << Error;
        return false;
    }
    // set the options and features for the target and create a TargetMachine instance
    auto CPU = "generic";
    auto Features = "";
    llvm::TargetOptions opt;
    auto RM = llvm::Optional<llvm::Reloc::Model>();
    auto TheTargetMachine = Target->createTargetMachine(TargetTriple, CPU, Features, opt, RM);
    TheModule->setDataLayout(TheTargetMachine->createDataLayout());
    // create an output stream to write the compiled code to a .wasm file in the current directory
    std::error_code EC;
    llvm::raw_fd_ostream dest("output.wasm", EC, llvm::sys::fs::OF_None);
    if (EC) {
        llvm::errs() << "Could not open file: " << EC.message();
        return false;
    }
    // set the visibility of all functions in the module to DefaultVisibility
    auto &functionList = TheModule->getFunctionList();
    for (auto &function : functionList) {
        function.setVisibility(llvm::GlobalValue::DefaultVisibility);
    }
    // add an emit pass to write the generated code to the wasm file
    llvm::legacy::PassManager pass;
    if (TheTargetMachine->addPassesToEmitFile(pass, dest, nullptr, llvm::CGFT_ObjectFile)) {
        llvm::errs() << "TheTargetMachine can't emit a file of this type";
        return false;
    }
    // run the pass on the module and flush the output stream to the file
    pass.run(*TheModule);
    dest.flush();
    // return true on success
    return true;
}
This outputs a wasm file that looks like
(module
(type $t0 (func (param f64) (result f64)))
(import "env" "__linear_memory" (memory $env.__linear_memory 0))
(import "env" "__indirect_function_table" (table $env.__indirect_function_table 0 funcref))
(func $f0 (type $t0) (param $p0 f64) (result f64)
local.get $p0
local.get $p0
f64.add))
However, this generated file has a problem.
It does not add an "export" statement to make the function f0 visible to the outside world, which would allow a javascript loading the wasm module to call the function f0.
Ideally, the generated file should have the function definition line looking like
(func $f0 (export "f") (type $t0) (param $p0 f64) (result f64)
local.get $p0
local.get $p0
f64.add))
This way the loading javascript will have access to a function named "f" that it can call from the wasm.
Is there a way to specify to the LLVM C++ API that the function should be exported?
You can trigger the export of a given symbol by setting the wasm-export-name and wasm-export-module attributes.
In C/C++ these correspond to the export_name and export_module clang attributes.
See llvm/test/CodeGen/WebAssembly/export-name.ll in the llvm tree for an example of this.
You can also ask the linker to export a given symbol with the --export command line flag. See https://lld.llvm.org/WebAssembly.html#exports.
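For illustration, here is a hedged sketch of what the exported function might look like at the IR level, modeled on the export-name.ll test mentioned above (the attribute string goes in the function's attribute group):

```llvm
define double @f(double %x) #0 {
entry:
  %multmp = fmul double %x, 2.000000e+00
  ret double %multmp
}

; "wasm-export-name" gives the name under which the wasm module exports @f
attributes #0 = { "wasm-export-name"="f" }
```

From the C++ API, the same attribute could presumably be attached in the loop that currently sets visibility, e.g. function.addFnAttr("wasm-export-name", function.getName()) — an untested sketch.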

Modifying the debug information of llvm IR

I want to modify the debug information of an llvm instruction so that the modified debug info is subsequently passed into the executable binary. Then, if I use the addr2line utility on the binary, it will return my modified debug information.
I've tried to change by using the following code snippet:
MDNode *N = Inst->getMetadata("dbg");
DebugLoc Loc = DebugLoc::get(newLine, newCol, N);
Inst->setDebugLoc(Loc);
I read the DebugLoc back by using
const DebugLoc D = Inst->getDebugLoc();
unsigned Line = D.getLine();
outs() << Line <<"\n";
But I can't set the debug info correctly. How can I change the debug info correctly through an llvm pass?
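One likely culprit in the snippet above: the third argument of DebugLoc::get is the scope, but the code passes the old !dbg node (a location) itself, which produces a malformed location. A hedged sketch of the usual pattern (modern API names; newLine/newCol as in the question), keeping the old location's scope and inlined-at chain:

```cpp
// Inst is an llvm::Instruction*; newLine/newCol are the desired values.
if (DILocation *OldLoc = Inst->getDebugLoc().get()) {
    // Keep the original scope and inlined-at chain; change only line/column.
    Inst->setDebugLoc(DILocation::get(OldLoc->getContext(), newLine, newCol,
                                      OldLoc->getScope(), OldLoc->getInlinedAt()));
}
```

Note that addr2line can only report the new location if the binary is actually built with debug info (e.g. the front end ran with -g), so each instruction carries a location to rewrite in the first place.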

LLVM JIT: pass C++ exception through JIT code back to host application

I'm working on a project where I use clang to generate some LLVM IR and then JIT-compile and run it from within my host application. The JIT code calls some functions in the host application which may throw an exception. I expect the exception to be thrown through the JIT code and caught back in the host application. AFAIK this is supposed to work with LLVM, but unfortunately my test application always crashes with "terminate called after throwing an instance of 'int'". Let me give a simple example.
I use clang 3.5 to compile the following simple program into LLVM IR:
extern void test();
extern "C" void exec(void*) {
test();
}
with
./clang -O0 -S -emit-llvm test.cpp -c
The result is test.ll
; ModuleID = 'test.cpp'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: uwtable
define void @exec(i8*) #0 {
  %2 = alloca i8*, align 8
  store i8* %0, i8** %2, align 8
  call void @_Z4testv()
  ret void
}
declare void @_Z4testv() #1
attributes #0 = { uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5.0 (224841)"}
My host application looks like this:
static void test() {
    throw 1;
}

int main(int, const char **) {
    llvm::InitializeNativeTarget();
    llvm::InitializeNativeTargetAsmPrinter();
    llvm::InitializeNativeTargetAsmParser();
    llvm::LLVMContext &Context = llvm::getGlobalContext();
    llvm::SMDiagnostic Err;
    llvm::Module *Mod = llvm::ParseIRFile("test.ll", Err, Context);
    llvm::ExecutionEngine *m_EE = llvm::EngineBuilder(Mod)
        .setEngineKind(llvm::EngineKind::JIT)
        .create();
    llvm::Function *f = Mod->getFunction("_Z4testv");
    m_EE->addGlobalMapping(f, reinterpret_cast<void*>(test));
    f = Mod->getFunction("exec");
    void *poi = m_EE->getPointerToFunction(f);
    void (*exec)(void*) = reinterpret_cast<void (*)(void*)>(poi);
    try {
        exec(NULL);
    } catch (...) {
        std::cout << "caught exception" << std::endl;
    }
    return 0;
}
I use LLVM 3.5 which I compiled with cmake. I set LLVM_ENABLE_EH=ON and LLVM_ENABLE_RTTI=ON. Did I miss something when compiling LLVM or is my host application code wrong?
Thanks!
Finally it works; here are the things that were necessary to fix the issue.
First, make sure MCJIT.h is included, otherwise MCJIT is not linked in. Unfortunately, LLVM silently falls back to the old JIT implementation if MCJIT.h has not been included, even though MCJIT has been explicitly requested via:
llvm::EngineBuilder factory(Mod);
factory.setEngineKind(llvm::EngineKind::JIT);
factory.setUseMCJIT(true);
Only MCJIT supports proper exception handling.
In the example in the question I used
ExecutionEngine::addGlobalMapping()
which does not work with MCJIT. External functions must instead be registered via
llvm::sys::DynamicLibrary::AddSymbol()
Here is the complete example:
#include <iostream>
#include <memory>

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h" // required so MCJIT is linked in
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/DynamicLibrary.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"

static void test() {
    throw 1;
}

int main(int, const char **) {
    llvm::InitializeNativeTarget();
    llvm::InitializeNativeTargetAsmPrinter();
    llvm::InitializeNativeTargetAsmParser();
    llvm::LLVMContext &Context = llvm::getGlobalContext();
    llvm::SMDiagnostic Err;
    llvm::Module *Mod = llvm::ParseIRFile("test.ll", Err, Context);
    std::unique_ptr<llvm::RTDyldMemoryManager> MemMgr(new llvm::SectionMemoryManager());
    // Build engine with MCJIT
    std::string err;
    llvm::EngineBuilder factory(Mod);
    factory.setErrorStr(&err);
    factory.setEngineKind(llvm::EngineKind::JIT);
    factory.setUseMCJIT(true);
    factory.setMCJITMemoryManager(MemMgr.release());
    llvm::ExecutionEngine *m_EE = factory.create();
    llvm::sys::DynamicLibrary::AddSymbol("_Z4testv", reinterpret_cast<void*>(test));
    llvm::Function *f = Mod->getFunction("exec");
    m_EE->finalizeObject();
    void *poi = m_EE->getPointerToFunction(f);
    void (*exec)(void*) = reinterpret_cast<void (*)(void*)>(poi);
    try {
        exec(NULL);
    } catch (int e) {
        std::cout << "caught " << e << std::endl;
    }
    return 0;
}
Additionally you can now also get Debug Symbols for the JIT code by adding:
Opts.JITEmitDebugInfo = true;

GCC Plugin, add new optimizing pragma

I'm creating a GCC plugin.
I'm trying to create a plugin for a specific loop transformation: unroll a loop exactly N times (N given as a parameter).
I have installed the plugin correctly and can successfully register my pragma in the compilation process.
When I register the pragma with the function c_register_pragma, I can handle it during lexical analysis (with the function handle_my_pragma), but how can I find it afterwards?
I can also define my own pass and traverse the GIMPLE, but there is no trace of any pragma.
So my question is: where is my pragma, and how can I influence my code with it?
Or what would you suggest to reach my goal? It doesn't have to be a pragma, but it seemed like a good idea.
Also, I know about MELT, but within the study of GCC, I would prefer a pure plugin in C.
My code
static bool looplugin_gate(void)
{
    return true;
}

static unsigned looplugin_exec(void)
{
    printf("===looplugin_exec===\n");
    basic_block bb;
    gimple stmt;
    gimple_stmt_iterator gsi;
    FOR_EACH_BB(bb)
    {
        for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
        {
            stmt = gsi_stmt(gsi);
            print_gimple_stmt(stdout, stmt, 0, TDF_SLIM);
        }
    }
    return 0;
}

void handle_my_pragma(cpp_reader *ARG_UNUSED(dummy))
{
    printf("=======Handling loopragma=======\n");
    enum cpp_ttype token;
    tree x;
    int num = -1;
    token = pragma_lex(&x);
    if (TREE_CODE(x) != INTEGER_CST)
        warning(0, "invalid constant in %<#pragma looppragma%> - ignored");
    num = TREE_INT_CST_LOW(x);
    printf("Detected #pragma loopragma %d\n", num);
}

static void register_my_pragma(void *event_data, void *data)
{
    warning(0, G_("Callback to register pragmas"));
    c_register_pragma(NULL, "loopragma", handle_my_pragma);
}

static struct opt_pass myopt_pass =
{
    .type = GIMPLE_PASS,
    .name = "LoopPlugin",
    .gate = looplugin_gate,
    .execute = looplugin_exec
};

int plugin_init(struct plugin_name_args *info,  /* Argument info */
                struct plugin_gcc_version *ver) /* Version of GCC */
{
    const char *plugin_name = info->base_name;
    struct register_pass_info pass;
    pass.pass = &myopt_pass;
    pass.reference_pass_name = "ssa";
    pass.ref_pass_instance_number = 1;
    pass.pos_op = PASS_POS_INSERT_BEFORE;
    register_callback(plugin_name, PLUGIN_PRAGMAS, register_my_pragma, NULL);
    register_callback(plugin_name, PLUGIN_PASS_MANAGER_SETUP, NULL, &pass);
    return 0;
}
PS: If there is someone familiar with GCC plugin development who has a good heart :), please contact me (mbukovy gmail com). I'm doing this for my final thesis (my own choice) and I welcome any soulmate.
When I register pragma with function c_register_pragma, I can handle it in lexical analysis (with function handle_my_pragma), but how can I find it then?
There is an option (actually, a hack) to create a fictive helper function call at the place of the pragma when parsing. You can then detect this function by name in the intermediate representation.
Also, several days ago there was a question on the GCC mailing list from felix.yang (Huawei), "How to deliver loop-related pragma information from TREE to RTL?" - http://comments.gmane.org/gmane.comp.gcc.devel/135243 - check the thread.
Some recommendations from the list:
Look at how we implement #pragma ivdep (see replace_loop_annotate() and fortran/trans-stmt.c where it builds ANNOTATE_EXPR).
Patch with replace_loop_annotate() function addition and ivdep pragma implementation: "Re: Patch: Add #pragma ivdep support to the ME and C FE" by Tobias Burnus (2013-08-24).
I do not think registering a DEFERRED pragma in a plugin is possible, since the handler for deferred pragmas is not exposed at the GCC plugin level.
So your pragma only works during the preprocessing stage instead of the parsing stage, which makes it quite tricky to achieve an optimization goal.