How to declare a global integer instance in LLVM IR? - c++

I was wondering if anyone knew how to declare a global integer instance in LLVM IR. So far, I've been doing the following:
// Create symbol to identify previous block. Added by Justin.
llvm::Type::TypeID stupidTypeID = llvm::Type::IntegerTyID;
llvm::Type* typePtr = llvm::Type::getPrimitiveType(_context, stupidTypeID);
llvm::GlobalVariable* prevBlockID = new llvm::GlobalVariable(typePtr,
false,
llvm::GlobalValue::LinkerPrivateLinkage,
NULL,
"PREV_BLOCK_ID");
When I try to run, I get the following error:
static llvm::PointerType* llvm::PointerType::get(llvm::Type*, unsigned int): Assertion `EltTy && "Can't get a pointer to <null> type!"' failed.

It's due to the wrong type. You can have a look at the implementation of Type::getPrimitiveType here. Simply put, that's NOT the API you are advised to use; for IntegerTyID it returns nullptr. Also, the definition of TypeID in llvm/IR/Type.h carries this comment:
/// Note: If you add an element to this, you need to add an element to the
/// Type::getPrimitiveType function, or else things will break!
Basically, you can generate the type in two ways:
The static get API of the specific type
In your case:
IntegerType *iTy = IntegerType::get(ctx, 32); // if it's 32bit INT
A helper class named TypeBuilder
It makes type generation easier and more uniform. TypeBuilder is especially useful and intuitive when you need to define more complicated types such as FunctionType, at the cost of slower compilation of your own source code (if you care about that).
IntegerType *intType = TypeBuilder<int, false>::get(ctx); // normal C int
IntegerType *intTy = TypeBuilder<types::i<32>, false>::get(ctx); // if it's 32bit INT
BTW, you can also try the ELLCC online compiler to get the C++ code that generates the LLVM IR for your C/C++ source; choose LLVM C++ API code as the target under Output Options. Alternatively, you can do it yourself on your machine (internally, the online compiler simply invokes llc):
llc input.ll -march=cpp -o -

Related

C++ LinkTime/CompileTime Generate Function Offset From Start Of .Text Section Or Other Reference Point

So I need a way to get the offset of a function from the start of its PE file's .text region (or whatever section it is in), or relative to another function within the file.
I'd like to do something like this:
void func_two()
{
/*...*/
}
void call_our_function()
{
/*...*/
}
void main_loop()
{
constexpr offset_of_two = (int)&func_two - (int)&call_our_function;
// calls func_two
(decltype(&func_two)(offset_of_two + (int)&call_our_function))();
/* OR : */
void* text_region = find_pe_text_region_start();
constexpr offset_from_text = get_offset_from_linker_somehow();
// calls func_two
(decltype(&func_two)(offset_from_text + (int)text_region))();
}
constexpr doesn't allow this. I'm assuming it's because the linker sets the function addresses and similar values at link time. However, I know that the linker could theoretically do this, otherwise export tables and RVAs in the PE file wouldn't work. I know I could export the functions and parse the export table, but that doesn't particularly work for my use case.
Does anybody know of a way to solve this problem without calculating the offsets at runtime? Maybe a plugin for the linker, though I doubt MSVC supports that. It's a very specific use case.
Function pointers are a separate class of pointers, and you can only portably cast them to other function pointers. They may be larger than uintptr_t and certainly will be larger than int on common 64-bit architectures. Using int is totally UB. Using uintptr_t would at least bring it up to implementation-defined behavior.
But you are right that the values are only going to be available at link time. Until you link, the compiler has no idea where in memory the functions will end up and thus can't know the offsets between them.
So there is no way of making this constexpr. It should be evaluated at link time, though. The object format (at least ELF) allows encoding the difference between two symbols and other simple math, and the linker will compute the actual value at link time. There should be no runtime overhead for this.
PS: declare the offsets as globals and check whether the resulting binary contains them as constants or computes them in the init_array / ctors. Local variables might be computed at runtime because that doesn't require defining an extra constant.
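For comparison, here is a minimal sketch of the runtime variant that uses uintptr_t instead of int. It assumes a platform where a function pointer fits in std::uintptr_t, which the standard does not guarantee. The functions return int only so the result can be checked; offset_of_two() and call_via_offset() are hypothetical helper names, not part of the question.

```cpp
#include <cstdint>

int func_two() { return 2; }
int call_our_function() { return 1; }

using Fn = int (*)();

// Byte distance from call_our_function to func_two.
// Unsigned arithmetic wraps, so base + offset gives back func_two's address
// value even when func_two is placed before call_our_function.
std::uintptr_t offset_of_two() {
    return reinterpret_cast<std::uintptr_t>(&func_two) -
           reinterpret_cast<std::uintptr_t>(&call_our_function);
}

// Rebuilds the pointer to func_two from the base address plus the offset
// and calls through it.
int call_via_offset() {
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&call_our_function);
    Fn target = reinterpret_cast<Fn>(base + offset_of_two());
    return target();
}
```

The subtraction here happens at run time (or whenever the optimizer can fold it); it only illustrates the casts, not a link-time constant.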

How to convert function insertion module pass to intrinsic to inline

PROBLEM:
I currently have a traditional module instrumentation pass that
inserts new function calls into a given IR according to some logic
(inserted functions are external from a small lib that is later linked
to given program). Running experiments, my overhead is from
the cost of executing a function call to the library function.
What I am trying to do:
I would like to inline these function bodies into the IR of
the given program to get rid of this bottleneck. I assume an intrinsic
would be a clean way of doing this, since an intrinsic function would
be expanded to its function body when being lowered to ASM (please
correct me if my understanding is incorrect here, this is my first
time working with intrinsics/LTO).
Current Status:
My original library call definition:
void register_my_mem(void *user_vaddr){
... C code ...
}
So far:
I have created a def in: llvm-project/llvm/include/llvm/IR/IntrinsicsX86.td
let TargetPrefix = "x86" in {
def int_x86_register_mem : GCCBuiltin<"__builtin_register_my_mem">,
Intrinsic<[], [llvm_anyint_ty], []>;
}
Added another def in:
otwm/llvm-project/clang/include/clang/Basic/BuiltinsX86.def
TARGET_BUILTIN(__builtin_register_my_mem, "vv*", "", "")
Added my library source (*.c, *.h) to the compiler-rt/lib/test_lib
and added to CMakeLists.txt
Replaced the function insertion with trying to insert the intrinsic
instead in: llvm/lib/Transforms/Instrumentation/myModulePass.cpp
WAS:
FunctionCallee sm_func =
curr_inst->getModule()->getOrInsertFunction("register_my_mem",
func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
NEW:
Intrinsic::ID aREGISTER(Intrinsic::x86_register_my_mem);
Function *sm_func = Intrinsic::getDeclaration(currFunc->getParent(),
aREGISTER, func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
Questions:
If my logic for inserting the intrinsic functions shouldn't be a
module pass, where do I put it?
Am I confusing LTO with intrinsics?
Do I put my library function definitions into the following files as mentioned in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114322.html as for example EmitRegisterMyMem()?
clang/lib/CodeGen/CodeGenFunction.cpp - define llvm::Intrinsic::ID
clang/lib/CodeGen/CodeGenFunction.h - declare llvm::Intrinsic::ID
My LLVM compiles, so it is semantically correct, but currently when
trying to insert this function call, LLVM segfaults saying "Not a valid type for function argument!"
I'm seeing multiple issues here.
Indeed, you're confusing LTO with intrinsics. Intrinsics are special "functions" that are either expanded into special instructions by a backend or lowered to library function calls. This is certainly not what you are trying to achieve. You don't need an intrinsic at all; you just need to inline the function call in question, either by hand (from your module pass) or via LTO.
The particular error comes up because you declared your intrinsic as taking an integer argument (and this is what the declaration would look like), but you are:
requesting the declaration of an overloaded intrinsic with an invalid type (I'd assume your func_type is a non-integer type)
passing a pointer argument
Hope this makes the issue clear.
See also: https://llvm.org/docs/LinkTimeOptimization.html
Thank you for clearing up the issue, @Anton Korobeynikov.
After reading your explanation, I also believe that I have to use LTO to accomplish what I am trying to do. I found this link especially useful: https://llvm.org/docs/LinkTimeOptimization.html. It seems that I am now on the right path.

How can I find all places a given member function or ctor is called in g++ code?

I am trying to find all places in a large and old code base where certain constructors or functions are called. Specifically, these are certain constructors and member functions in the std::string class (that is, basic_string<char>). For example, suppose there is a line of code:
std::string foo(fiddle->faddle(k, 9).snark);
In this example, it is not obvious looking at this that snark may be a char *, which is what I'm interested in.
Attempts To Solve This So Far
I've looked into some of the dump features of gcc, and generated some of them, but I haven't been able to find any that tell me that the given line of code will generate a call to the string constructor taking a const char *. I've also compiled some code with -S to save the generated equivalent assembly code. But this suffers from two things: the function names are "mangled," so it's impossible to know what is being called in C++ terms; and there are no line numbers of any sort, so even finding the equivalent place in the source file would be tough.
Motivation and Background
In my project, we're porting a large, old code base from HP-UX (and their aCC C++ compiler) to RedHat Linux and gcc/g++ v.4.8.5. The HP tool chain allowed one to initialize a string with a NULL pointer, treating it as an empty string. The Gnu tools' generated code fails with some flavor of a null dereference error. So we need to find all of the potential cases of this, and remedy them. (For example, by adding code to check for NULL and using a pointer to a "" string instead.)
So if anyone out there has had to deal with the base problem and can offer other suggestions, those, too, would be welcomed.
Have you considered using static analysis?
Clang has one, the Clang Static Analyzer, which is extensible.
You can write a custom plugin that checks for this particular behavior by implementing a Clang AST visitor that looks for string variable declarations and checks whether they are initialized from null.
There is a manual for that here.
See also: https://github.com/facebook/facebook-clang-plugins/blob/master/analyzer/DanglingDelegateFactFinder.cpp
First I'd create a header like this:
#include <string>
class dbg_string : public std::string {
public:
using std::string::string;
dbg_string(const char*) = delete;
};
#define string dbg_string
Then modify your makefile and add "-include dbg_string.h" to the cflags to force-include it in every source file without modifying the sources.
You could also check how NULL is defined on your platform and add a specific overload for it (e.g. dbg_string(int)).
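As a rough check of the idea, the sketch below verifies at compile time that the deleted overload blocks construction from const char* while the other inherited constructors stay usable. The macro is deliberately left out here: #define string dbg_string would also rewrite qualified uses such as std::string into std::dbg_string, so this only exercises the class itself. repeat_char() is a hypothetical helper added for illustration.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

class dbg_string : public std::string {
public:
    using std::string::string;          // inherit the std::string constructors
    dbg_string(const char*) = delete;   // reject construction from a raw char pointer
};

// Construction from const char* selects the deleted overload, so it is rejected.
static_assert(!std::is_constructible<dbg_string, const char*>::value,
              "const char* construction should be rejected");

// Other inherited constructors, such as (count, char), remain available.
static_assert(std::is_constructible<dbg_string, std::size_t, char>::value,
              "(count, char) construction should still work");

// Hypothetical helper: builds a string of n copies of c via an inherited constructor.
dbg_string repeat_char(std::size_t n, char c) {
    return dbg_string(n, c);
}
```

With the header force-included, every site that builds a string from a const char* should then show up as a "use of deleted function" style compile error.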
You can try CppDepend and its CQLinq, a powerful code query language, to detect where particular constructors/methods/fields/types are used.
from m in Methods where m.IsUsing ("CClassView.CClassView()") select new { m, m.NbLinesOfCode }

how to use llvm analysis pass in standalone program?

I want to use llvm alias analysis result in my standalone program, for example, maybe like this initially:
int main()
{
...
PassManager PM(M);
ImmutablePass* basic_aa = createBasicAliasAnalysisPass();
PM.add(basic_aa);
AliasAnalysis& AA = basic_aa->getAnalysis<AliasAnalysis>();
...
}
but the AA seems to make no sense. So how can I use llvm analysis pass in my standalone program?
That being said, LLVM analysis is not a single pass but a set of passes.
The AliasAnalysis class is used to determine whether or not two pointers can ever point to the same object in memory. Traditionally, alias analyses respond to a query with a Must, May, or No alias response, indicating that two pointers always point to the same object, might point to the same object, or are known to never point to the same object.
Example:
If you want to search for un-aliased global memory buffers that are only read from and pull them into the constant address space, you can create an array of those pointers and check each one for aliasing against the non-read-only inputs:
AA->alias(psAVal, psBVal) != AliasResult::NoAlias
See:
http://llvm.org/docs/AliasAnalysis.html

Failing compilation if return value is unused for a certain type

I would like to make compilation fail for some function calls but not others. The function calls that I want to fail are those that do not handle the return value when the value is of a certain type. In the example below, not handling a function returning Error is a compilation error, but not handling a function that returns anything else should succeed just fine.
Note: our runtime environment (embedded) does not allow us to use the following constructs: RTTI, exceptions.
This code only needs to compile with Clang, and I would prefer not having to annotate each function.
We prefer a solution that fails at compile time instead of at runtime.
enum class Error {
INVAL,
NOERR,
};
// do something that can fail.
Error DoThing();
// may return different return codes, we never care (we can't change prototype)
int DoIgnoredThing();
int main() {
DoThing(); // compilation failure here, unused "Error" result
DoIgnoredThing(); // compilation succeeds, OK to ignore unused "int" result
return 0;
}
I don't know of a way to do it with straight C++, but if you're using g++ you can use the warn_unused_result attribute along with the -Werror=unused-result command-line flag. See the documentation for warn_unused_result for how to specify it (you'll have to specify it on every function, unfortunately; I don't believe you can specify it for a type). Then the compiler flag will turn that warning into an error.
If you're not using g++, your compiler may have similar functionality.
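If C++17 is available (an assumption, the question does not state the language version), the standard [[nodiscard]] attribute can be placed on the enum type itself, so individual functions need no annotation. A minimal sketch using the names from the question; the function bodies and check_things() are made up for illustration:

```cpp
// Marking the type means any discarded return value of type Error is diagnosed.
enum class [[nodiscard]] Error {
    INVAL,
    NOERR,
};

// Placeholder bodies, only here so the sketch links.
Error DoThing() { return Error::NOERR; }
int DoIgnoredThing() { return 0; }

// Hypothetical caller showing which calls are accepted.
int check_things() {
    // DoThing();                  // diagnosed: the Error result is discarded
    if (DoThing() != Error::NOERR) // fine: the result is inspected
        return 1;
    (void)DoThing();               // fine: the discard is explicit
    DoIgnoredThing();              // fine: int is not marked nodiscard
    return 0;
}
```

The diagnostic is a warning by default; building with -Werror=unused-result, as mentioned above, should turn it into a hard error in both Clang and g++.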
You might find it easier to use a code analysis tool to scan the source code.
This might let you use the return type, as you requested, or a different marker for the functions to test like a comment to indicate which functions should be checked.
You might run the analysis tool as part of the compilation process or as part of a build server.
I've tried a few ways to make a class that would do what you want, but I haven't been successful. Have you considered making your "return value" an argument passed by reference? That's the most common way that I've seen APIs force you to pay attention to the return value. So instead of
Error DoThing();
you have
void DoThing(Error& e);
Anything that calls DoThing() has to pass in an Error object or they will get a compiler error. Again, not exactly what you asked for, but maybe good enough?