GEP segmentation fault LLVM C++ API - llvm

I am sure this is really simple, but I have been trying to figure it out for more than an hour without success.
The following code gives me a segmentation fault:
Value *newArray = mBuilder.CreateGEP(alloca, value); // alloca is a `StructType`
but this does not:
Value *newArray = mBuilder.CreateGEP(alloca, ConstantInt::get(mContext, APInt(32, 0)));
Value of value
%bar1 = load double, double* %bar
%3 = fptoui double %bar1 to i32
Debugging
When I debug it using lldb I get:
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x00000001000b9e6e a.out`llvm::PointerType::get(llvm::Type*, unsigned int) + 20
a.out`llvm::PointerType::get:
-> 0x1000b9e6e <+20>: movq (%rdi), %rax
Question
Why am I getting a segmentation fault and how do I fix it?
How to reproduce the problem?
The following code reproduces the problem:
#include <vector>
#include "llvm/ADT/STLExtras.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/Value.h"
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;
static LLVMContext mContext;
static IRBuilder<> mBuilder(mContext);
static std::unique_ptr<Module> mModule = make_unique<Module>("example", mContext);
static Module *M = mModule.get();
static Type *dType = Type::getDoubleTy(mContext);
static Type *i32 = IntegerType::get(mContext, 32);
// helper functions
static AllocaInst *entryCreateBlockAllocaType(Function *func, std::string name, Type* type) {
IRBuilder<> tmpBuilder(&func->getEntryBlock(), func->getEntryBlock().begin());
return tmpBuilder.CreateAlloca(type, nullptr, name);
}
static ArrayRef<Value *> PrefixZero (Value *index) {
std::vector<Value *> out;
out.push_back(ConstantInt::get(mContext, APInt(32, 0)));
out.push_back(index);
return ArrayRef<Value *>(out);
}
static AllocaInst *createVariable () {
auto *func = mBuilder.GetInsertBlock()->getParent();
auto *initValue = ConstantInt::get(mContext, APInt(32, 0));
auto *alloca = entryCreateBlockAllocaType(func, "var", initValue->getType());
mBuilder.CreateStore(initValue, alloca);
return alloca;
}
static std::vector<Type *> elementTypes (3, dType);
static AllocaInst *createStruct () {
auto *func = mBuilder.GetInsertBlock()->getParent();
auto *mStructType = StructType::get(mContext, elementTypes);
return entryCreateBlockAllocaType(func, "str", mStructType);
}
int main () {
// create a main function
auto *FT = FunctionType::get(i32, std::vector<Type *>(), false);
auto *f = Function::Create(FT, Function::ExternalLinkage, "main", M);
// set insert point for out below code
auto *bb = BasicBlock::Create(mContext, "entry", f);
mBuilder.SetInsertPoint(bb);
// Create a variable
auto *variable = createVariable();
// create a struct
auto *mStruct = createStruct();
// Create a GEP with the loaded index
auto *loadedVar = mBuilder.CreateLoad(variable, "loaded_index");
// This is where the problem is.
// If `PrefixZero` is changed to `ConstantInt::get(mContext, APInt(32, 0))` this works
auto *elementPtr = mBuilder.CreateGEP(mStruct, PrefixZero(loadedVar));
mBuilder.CreateRet(ConstantInt::get(mContext, APInt(32, 0)));
f->print(errs()); // print out the function
return 1;
}

There are two problems with your code. The first is the lifetime of the data behind the ArrayRef returned by PrefixZero:
static ArrayRef<Value *> PrefixZero (Value *index) {
std::vector<Value *> out;
out.push_back(ConstantInt::get(mContext, APInt(32, 0)));
out.push_back(index);
return ArrayRef<Value *>(out);
}
From the documentation of ArrayRef:
This class does not own the underlying data, it is expected to be used in situations where the data resides in some other buffer, whose lifetime extends past that of the ArrayRef.
In other words returning an ArrayRef to a local variable is illegal in the same way that returning a pointer to a local variable would be. Internally the ArrayRef just stores out's data pointer and as soon as out goes out of scope (i.e. at the end of PrefixZero), the data is freed and the ArrayRef now contains a pointer to freed memory.
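One way to avoid the dangling view is to return an owning container by value and let the conversion to ArrayRef happen at the call site, while the container is still alive. In LLVM terms that would mean having PrefixZero return std::vector<Value *> (or SmallVector<Value *, 2>), both of which convert implicitly to ArrayRef. The sketch below illustrates the same ownership rule using only the standard library; IntView is a hypothetical stand-in for ArrayRef, and plain integers stand in for the Value pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::ArrayRef: a pointer plus a length, no ownership.
struct IntView {
    const int *data;
    std::size_t size;
    IntView(const std::vector<int> &v) : data(v.data()), size(v.size()) {}
};

// Owning version of PrefixZero: the vector is returned by value,
// so the caller controls how long the elements live.
std::vector<int> prefixZero(int index) {
    std::vector<int> out;
    out.push_back(0);
    out.push_back(index);
    return out;
}

// Consumer that takes a view, the way CreateGEP takes an ArrayRef.
int sumIndices(IntView indices) {
    int total = 0;
    for (std::size_t i = 0; i < indices.size; ++i)
        total += indices.data[i];
    return total;
}
```

A call such as sumIndices(prefixZero(7)) is safe because the temporary vector lives until the end of the full expression. This addresses the lifetime problem only.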
The second problem is the index itself. When using getelementptr on a struct, the index that represents the member access (i.e. the second index in your case) must be a constant. If you think about it, it would be impossible to typecheck the instruction otherwise (keeping in mind that usually the members of a struct don't all have the same type). Plus, calculating the pointer offset for a given non-constant index would basically have to generate an entire lookup table, and it would be counter-intuitive for a pointer-arithmetic instruction to generate that much code. You can think of GEP on a struct as equivalent to the_struct.member_name in C, and you can't replace member_name with a variable there either.
Note that, if assertions are enabled in your build of LLVM, the second issue should cause an assertion failure "Invalid GetElementPtrInst indices for type!", which, while not quite telling you everything you need to know (like in what way the indices are invalid), does point you in the right direction a lot more than just "segmentation fault" would. So if you didn't get that message, make sure you have assertions enabled, so you can benefit from the assertion messages the next time you run into problems.
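If the index really has to be a runtime value and all elements share one type, as the three doubles in the question do, one option is to allocate an array type (for example ArrayType::get(dType, 3)) instead of a struct, since GEP accepts non-constant indices into arrays. The plain C++ analogy below is only a sketch of that distinction; Triple, secondMember and elementAt are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Struct members are selected by name, which is fixed at compile time,
// much like the constant member index in a struct GEP.
struct Triple {
    double first;
    double second;
    double third;
};

double secondMember(const Triple &t) {
    return t.second; // the member cannot be chosen by a runtime variable
}

// Array elements all have the same type, so a runtime index is fine,
// much like a non-constant index in an array GEP.
double elementAt(const std::array<double, 3> &values, std::size_t index) {
    return values[index];
}
```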

Related

Passing array of class objects to a function

Passing an array of objects to a function does not give back the values set in the settingUp function.
I am trying to print the values stored in the first item of the array from the main function.
main.cpp:
//** Libraries included **//
using namespace std;
//#include "common.h"
#include "settingUp.h"
int main(){
statusClass status[5];
//** Main Functions **//
settingUp(status);
status[1].printValues();
}
settingUp.h:
#ifndef settingUp_h
#define settingUp_h
//** Libraries **//
#include "statusClass.h"
#include <stdio.h>
#include "dataClass.h"
void settingUp(statusClass *_status);
#endif
settingUp.cpp (UPDATE: a few lines corrected!):
//** Libraries **//
#include "settingUp.h"
//** Status classes and their functions **//
void settingUp(statusClass *_status){
//statusClass statusProv;
dataClass * prueba0 = new dataClass(); //Corrected!
dataClass * prueba1 = new dataClass(); //Corrected!
dataClass * prueba2 = new dataClass(); //Corrected!
const dataClass * arrayPrueba[3];
prueba0->setValues(1); // prueba0..2 are pointers after the correction, so use ->
prueba1->setValues(2);
prueba2->setValues(3);
arrayPrueba[0] = prueba0; //Corrected!
arrayPrueba[1] = prueba1; //Corrected!
arrayPrueba[2] = prueba2; //Corrected!
_status[1].setValues(1, arrayPrueba);
//_status = &statusProv;
_status[0].printValues();
}
UPDATE:
statusClass.cpp:
//** Libraries **//
#include "statusClass.h"
//** Status classes and their functions **//
void statusClass::setValues (uint8_t _statusSelectorByte, const dataClass **_array){
newStatusSelectorByte = _statusSelectorByte;
array = _array;
};
void statusClass::printValues(){
printf("TP: statusClass -> printValues: Prueba = %d\n", newStatusSelectorByte);
printf("TP: statusClass -> printValues: arrayPrueba = %d\n", array[1]->length);
}
printValues() gives the right values when called inside settingUp(), but not when called from main.cpp.
Update: array[0]->length works, but array[2]->length does not.
When you do the following:
dataClass prueba0;
you create an object on the stack. This object is valid until you exit that function.
One solution is to allocate that object:
dataClass * prueba0 = new dataClass();
That means at some point you'll need to delete the object with:
delete prueba0;
To avoid having to use delete, you should look into using shared pointers.
I think your next problem is that in main you have:
statusClass status[5];
So 5 different status objects.
Then inside the initialization function, you specifically initialize _status[1]:
_status[1].setValues(1, arrayPrueba);
In other words, your _status[0] access within the initialization is going to show the random values that were on the stack when entering main() (which by luck are zeroes by default).
Maybe you are thinking that:
array = _array;
copies the values from one array to another. Right now, all that does is save a pointer. After the assignment, the member array points at arrayPrueba, the local array you created inside settingUp, and that array no longer exists once settingUp returns.
I just don't think you understand your code much and to tell you the truth, you should be using std::vector instead of C arrays. If you really want to write C++ code, learn the standard library (STL).
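As a minimal sketch of the std::vector suggestion, the class below owns its elements instead of storing a pointer to a caller's local array. DataClass and StatusClass are simplified, hypothetical stand-ins for the question's dataClass and statusClass (only the length member is modeled), so treat this as an illustration rather than a drop-in replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-in for the question's dataClass.
struct DataClass {
    int length = 0;
    void setValues(int newLength) { length = newLength; }
};

class StatusClass {
public:
    // Takes the vector by value and moves it in, so the object owns its data.
    void setValues(std::uint8_t selector, std::vector<DataClass> items) {
        selectorByte = selector;
        array = std::move(items);
    }
    int lengthAt(std::size_t i) const { return array.at(i).length; }
    std::uint8_t selector() const { return selectorByte; }

private:
    std::uint8_t selectorByte = 0;
    std::vector<DataClass> array;
};

// Fills element 1; nothing refers to a local once the function returns.
void settingUp(std::vector<StatusClass> &status) {
    std::vector<DataClass> items(3);
    items[0].setValues(1);
    items[1].setValues(2);
    items[2].setValues(3);
    status.at(1).setValues(1, std::move(items));
}
```

With this layout there is no new/delete to balance, and reading the values back from main through the same element that was filled (index 1 here) stays valid.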

can't modify static set declared within a static method

I would like to store the value that is passed to the constructor in a static set returned by a static function.
It seems that the insertion is successful, but when execution reaches the end of the constructor's scope the inserted value disappears.
I have reproduced it in a simple example:
// container.hh
#pragma once
#include <vector>
#include <set>
class container {
public:
container(const int& s);
static std::set<int, std::less<int>>& object_set_instance();
};
#include "container.hxx"
// container.hxx
#pragma once
#include "container.hh"
#include <iostream>
container::container(const int& s)
{
auto set = object_set_instance();
set.insert(s);
std::cout << "Size " << set.size() << "\n";
}
std::set<int, std::less<int>>& container::object_set_instance()
{
static std::set<int, std::less<int>> s;
return s;
}
#include "container.hh"
#include <iostream>
int main()
{
auto a = container(42);
auto b = container(21);
auto b1 = container(51);
auto b2 = container(65);
auto b3 = container(99);
}
Output :
Size 1
Size 1
Size 1 // Size never changes
Size 1
Size 1
Why doesn't the set's size change?
auto set = object_set_instance();
If you use your debugger to inspect what set is, you will discover that it's a std::set and not a std::set& reference. Effectively, a copy of the original std::set is made (object_set_instance() returns a reference, only to copy-construct a new object that has nothing to do with the referenced one), and the next line of code modifies the copy, and it gets thrown away immediately afterwards.
This should be:
auto &set = object_set_instance();
A debugger is a very useful tool for solving these kinds of Scooby-Doo mysteries, and it would clearly reveal what's going on here. If you haven't yet had the opportunity to learn how to use one, hopefully this will inspire you to take a look, and join Mystery, Inc. as a member in good standing.
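The difference between the two declarations can also be checked without a debugger. The following is a small sketch with hypothetical helper names (registry, insertIntoCopy, insertIntoOriginal) that mirrors the question's function-local static set.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Function-local static, as in the question's object_set_instance().
std::set<int> &registry() {
    static std::set<int> s;
    return s;
}

// `auto` drops the reference: this inserts into a copy that is then discarded.
std::size_t insertIntoCopy(int value) {
    auto copy = registry();
    copy.insert(value);
    return registry().size();
}

// `auto &` binds to the static set itself, so the insertion persists.
std::size_t insertIntoOriginal(int value) {
    auto &ref = registry();
    ref.insert(value);
    return registry().size();
}
```

Each helper returns the size of the shared set after the call, so only insertIntoOriginal makes it grow.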

Creating a struct containing a pointer to itself in LLVM

I'm currently using LLVM to build a JIT. There are some C structs that I would like to be able to use in my JIT'd IR. One of them has the following layout:
struct myStruct {
int depth;
myStruct* parent;
}
When compiling with clang and using -S -emit-llvm, I get the following, which seems absolutely reasonable:
type myStruct = { i32, myStruct* }
Alright. Now, if I want to do the same using the LLVM API, I'm not quite sure how I should do it. The following (expectedly) does not work:
auto intType = IntegerType::get(context, 32); // 32 bits integer
Type* myStructPtrType = nullptr; // Pointer to myStruct
// The following crashes because myStructPtrType is null:
auto myStructType = StructType::create(context, { intType, myStructPtrType }, "myStruct"); // myStruct
myStructPtrType = PointerType::get(myStructType, 0); // Initialise the pointer type now
I don't really know how to proceed here.
Any suggestions are welcome.
I was able to answer the question thanks to @arnt's comment. In case anyone has the same goal or problem, the idea is to first create an opaque type, then fetch the pointer type to this opaque type, then set the aggregate body (which is the key to the solution) using setBody.
Here is some code:
auto intType = IntegerType::get(context, 32); // 32 bits integer
auto myStructType = StructType::create(context, "myStruct"); // Create opaque type
auto myStructPtrType = PointerType::get(myStructType, 0); // Initialise the pointer type now
myStructType->setBody({ intType, myStructPtrType }, /* packed */ false); // Set the body of the aggregate

How can I store a vector-list as a global variable?

I am creating a tree that consists of branches. For the purpose of my work, I need to keep track of the branches, and in order to do that, I want to store them in a vector-list. I store the vector-list as a global variable in this file, as I want to use it in both the constructor and the function shown in the code snippet below.
The tricky part here is that I get an error message (running in Visual Studio 2013) that, as far as I can tell, has something to do with the iterator not doing its job properly. The error message appears whenever I call branchList.push_back(root) or branchList.resize(). branchList.size() does NOT result in an error.
So my question is: what am I missing or not understanding to make this work? If I were to place std::vector<branch> branchList; at the beginning of the constructor, everything works as intended. This, however, does not help me, since I also need to use it in other functions later on.
Relevant code snippets from the files I am using.
skeletonBuilder.h:
class TreeSkeleton {
public:
TreeSkeleton();
void growTree();
};
skeletonBuilder.cpp:
#include "skeletonBuilder.h"
#include <cstdint>
#include <vector>
typedef struct branch {
branch *parent;
vec3 position;
vec3 direction;
} branch;
//used by constructor + "treeGrow" function
std::vector<branch> branchList = {};
TreeSkeleton::TreeSkeleton() {
//instantiate the tree root as a starting position.
branch root;
root.parent = NULL;
root.position = vec3(0, 0, 0);
root.direction = vec3(0, 1, 0);
branchList.size(); //works fine
branchList.resize(100); //Crashes here
branchList.push_back(root); //Crashes here
}
void TreeSkeleton::growTree() {
//pushing more branches to branchList
}
main.cpp:
#include "skeletonBuilder.h"
TreeSkeleton tree;
int main(int argc, char *argv[]) {
return 0;
}
The error message I am getting:
Unhandled exception at 0x00507077 in OpenGL_project_Debug.exe: 0xC0000005: Access violation reading location 0x40EAAAB4.
The error message takes me to the following code snippet in a file called "vector":
#if _VECTOR_ORPHAN_RANGE
void _Orphan_range(pointer _First, pointer _Last) const
{ // orphan iterators within specified (inclusive) range
_Lockit _Lock(_LOCK_DEBUG);
const_iterator **_Pnext = (const_iterator **)this->_Getpfirst();
if (_Pnext != 0)
while (*_Pnext != 0) //<----------------This is the row that it gets stuck on
if ((*_Pnext)->_Ptr < _First || _Last < (*_Pnext)->_Ptr)
_Pnext = (const_iterator **)(*_Pnext)->_Getpnext();
else
{ // orphan the iterator
(*_Pnext)->_Clrcont();
*_Pnext = *(const_iterator **)(*_Pnext)->_Getpnext();
}
}
The initialization order of global objects is not guaranteed between implementation files. There is no way to know whether the globals of main.cpp or skeletonBuilder.cpp will be initialized first. In your case, TreeSkeleton tree is initialized before std::vector<branch> branchList, which leads to your problem. The constructor of TreeSkeleton then uses the uninitialized branchList, which is undefined behavior. The solution is to arrange your globals in such a way that the order is guaranteed.
One solution is to make branchList a local static variable. These variables are guaranteed to be initialized when first encountered.
For example :
class TreeSkeleton {
public:
TreeSkeleton();
void growTree();
private:
static std::vector<branch> & getBranches();
};
std::vector<branch> & TreeSkeleton::getBranches()
{
// branchList is initialized the first time this line is encountered
static std::vector<branch> branchList;
return branchList;
}
TreeSkeleton::TreeSkeleton()
{
//instantiate the tree root as a starting position.
branch root;
root.parent = NULL;
root.position = vec3(0, 0, 0);
root.direction = vec3(0, 1, 0);
auto & branchList = getBranches();
branchList.size();
branchList.push_back(root); // Should be fine now
}

c++ dereference pointed object and pass into a function as reference, value is changed

Dear experienced C++ experts:
While coding recently, I ran into a tricky problem involving references and the
dereference operation.
typedef io::SequenceDataAccess<DNA_N> read_access_type;
Here is the class constructor:
AlignmentData( // ignored some arguments
const io::SequenceDataHost* read_data_batch
)
{
read_access_type read_data_access( *read_data_batch );
......
}
In a debug session, I set a breakpoint in the function body (its only line). The
value of read_data_batch (a pointer) is 0x7fff9a0489a0, and here is the rest of the state:
(gdb) print read_data_batch
$7 = (const bxtbio::io::SequenceDataHost *) 0x7fff9a0489a0
(gdb) print *read_data_batch
$8 = (bxtbio::io::SequenceData) {<bxtbio::io::SequenceData> =
{<bxtbio::io::SequenceDataInfo> = {m_alphabet = bxtbio::DNA_N,
m_n_seqs = 250, m_n_segments = 0, m_name_stream_len = 16392,
m_sequence_stream_len = 25000, m_sequence_stream_words = 3125,
......
But when I step into the constructor of class SequenceDataAccess,
shown below:
NVBIO_HOST_DEVICE NVBIO_FORCEINLINE
SequenceDataAccess(const SequenceDataT& data):m_data(data)
the argument data has this state:
(gdb) print &data
$9 = (const bxtbio::io::SequenceDataViewCore<unsigned int const*,
unsigned int const*, char const*, char const*> *) 0x7fffa0aacb50
(gdb) print data
$10 = (const bxtbio::io::SequenceDataViewCore<unsigned int const*, unsigned int const*, char const*, char const*> &) @0x7fffa0aacb50: {<bxtbio::io::SequenceDataInfo> = {m_alphabet = bxtbio::PROTEIN, m_n_seqs = 0, m_n_segments = 0, m_name_stream_len = 0, m_sequence_stream_len = 0, m_sequence_stream_words = 0, m_has_qualities = 0, .......
My questions:
Shouldn't data have the same address as read_data_batch, which is
passed to the constructor?
Why have all of the data members' values changed? What are the possible
reasons?
This code is running in a multi-threaded environment.
Thanks.
Here is a short excerpt of the call chain:
// file1.cpp
void MapSpliceWorker::align(io::HostOutputBatchSE *cpu_batch)
{
log_info(stderr, "MapSpliceWorker::align called cpu_batch.count = %d\n",cpu_batch->count);
for (uint32 c = 0; c < cpu_batch->count; c++) {
AlignmentData alignment = get(*cpu_batch, c); // Here is the entrypoint invocation
}
}
// file2.cpp
AlignmentData get(HostOutputBatchSE& batch, const uint32 aln_id)
{
const uint32 read_id = batch.read_ids.size() ?
batch.read_ids[ aln_id ] : aln_id;
// construct a AlignmentData object and return
return AlignmentData(&batch.alignments[aln_id],
batch.mapq[aln_id],
aln_id,
read_id,
batch.read_data,
&batch.cigar,
&batch.mds);
}
// file3.h
struct AlignmentData
{
AlignmentData(const Alignment* _aln,
const uint32 _mapq,
const uint32 _aln_id,
const uint32 _read_id,
const io::SequenceDataHost* read_data_batch,
const HostCigarArray* cigar_array,
const HostMdsArray* mds_array)
: valid(true),
aln(_aln),
aln_id(_aln_id),
read_id(_read_id),
mapq(_mapq),
read_data_batch_p(read_data_batch),
cigar_array_p(cigar_array),
mds_array_p(mds_array)
{
read_access_type read_data_access( *read_data_batch ); // up to now, read_data_batch has valid states
}
};
// file4.h
template <
Alphabet SEQUENCE_ALPHABET_T,
typename SequenceDataT = ConstSequenceDataView>
struct SequenceDataAccess
{
/// constructor
NVBIO_HOST_DEVICE NVBIO_FORCEINLINE
SequenceDataAccess(const SequenceDataT& data)
: m_data( data ) // here the data's states is cleaned and the address is not same as read_data_batch in previous context.
{
#if !defined(NVBIO_DEVICE_COMPILATION) || defined(NVBIO_CUDA_DEBUG)
assert( m_data.m_alphabet == SEQUENCE_ALPHABET );
// failed by this assert
#endif
}
}
It looks like your program invoked Undefined Behaviour.
If some piece of the program writes out-of-bounds (corrupting heap or stack) the results are unpredictable.
Whenever you see inexplicable things like this in your code, you should be thinking of UB elsewhere.
You could...
Start with minimal code and gradually add code until the problem appears¹
Use a memory checker/instrumenter (Valgrind, Rational Purify)
Use sanitizers (much the same effect, e.g. -fsanitize=address,undefined for GCC/Clang)
Keep in mind that debugging optimized code can be disorienting because the compiler may rearrange code and optimize variables out.
Using a debug (unoptimized) build can help, BUT you might not see the UB then.
¹ This approach is haphazard, because UB means you might not see the problem even when it exists (it's undefined: it might just appear to execute normally).
Passing by reference just lets you work with the variable without dereferencing it. It is still like passing a pointer: a permanently dereferenced pointer.
When you take a reference to an object you are, in effect, taking its address, and a pointer is just an object's address. So a reference and a pointer to the same object refer to the same address. The main difference is that a reference cannot be reseated to refer to another object after it is initialized, while a pointer can be reassigned. There are many smaller differences, but that is a big one.
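The gdb output in the question shows data with type SequenceDataViewCore<...>, while the object passed in is a SequenceDataHost. One possible explanation for the different address (an assumption, not confirmed by the source) is that the const reference binds to a temporary produced by an implicit conversion rather than to the original object. The sketch below shows that general C++ rule with hypothetical types Host and View; it says nothing about why the temporary's fields would be empty.

```cpp
#include <cassert>

struct Host {
    int n_seqs = 250;
};

// Hypothetical view type, constructible from Host through a converting constructor.
struct View {
    const void *source;
    View(const Host &h) : source(&h) {}
};

// Same type: the reference is an alias, so the address matches the argument.
const void *addressSeenAsHost(const Host &h) { return &h; }

// Different type: a temporary View is created and the reference binds to it,
// so its address differs from the Host that was passed in.
bool bindsToOriginal(const View &v, const Host &original) {
    return static_cast<const void *>(&v) == static_cast<const void *>(&original);
}
```

Passing a Host to addressSeenAsHost yields the object's own address, whereas passing the same Host where a const View & is expected goes through a temporary, so bindsToOriginal reports false.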