Reading the variable pointed to by a pointer in LLVM

Pointer type can be deduced through:
Value* v= i->getOperand(0);
.......
if(PointerType* pt=dyn_cast<PointerType>(v->getType())){
pt->getElementType()->getTypeID();
How can I read the value that this pointer points to?
i is a CallInst.

Given a CallInst, you can get an argument via getArgOperand() or iterate over all of them with arg_operands(). The arguments you get this way are just Values, and you can do anything you can do with other Values on them.
In particular, if those Values are constants, you can get the actual values used in the compiler - see this related stackoverflow question: LLVM get constant integer back from Value*

Related

[3.0]Question about how to use the store IR instruction to obtain the blockaddress

I am writing to enquire about a question.
When I read the IR generated from a C program, I found that taking the address of a label in C is translated into a store instruction in the IR:
store i8* blockaddress(@func_name, %label_name), i8** %val_name
However, the official documentation describes blockaddress like this:
blockaddress(@function, %block)
The 'blockaddress' constant computes the address of the specified basic block in the specified function, and always has an i8* type. Taking the address of the entry block is illegal.
This value only has defined behavior when used as an operand to the 'indirectbr' instruction, or for comparisons against null. Pointer equality tests between labels addresses results in undefined behavior (though, again, comparison against null is ok, and no label is equal to the null pointer). This may be passed around as an opaque pointer sized value as long as the bits are not inspected. This allows ptrtoint and arithmetic to be performed on these values so long as the original value is reconstituted before the indirectbr instruction.
Finally, some targets may provide defined semantics when using the value as the operand to an inline assembly, but that is target specific.
So I want to figure out how a store places a blockaddress into a variable such as %5 in the IR.
What should I do if I want to construct this store instruction from C++ to get the addresses of basic blocks?
I made some attempts, such as constructing an indirectbr:
irBuilder.SetInsertPoint(indirectbr_bb);
IndirectBrInst *indirect_br = IndirectBrInst::Create(BlockAddress::get(func, instr2_bb), 0, indirectbr_bb);
indirect_br->addDestination(instr1_bb);
indirect_br->addDestination(instr2_bb);
The IR program generated is as follows:
indirectbr_bb: ; preds = %dispatch_then_bb
indirectbr i8* blockaddress(@jit_func, %instr2_bb), [label %instr1_bb, label %instr2_bb]
In my tests this executes correctly. Therefore, I want to know how to construct a similar store in the IR that stores the address of a basic block into an array.
BlockAddress::get(BasicBlock *BB) returns a BlockAddress pointer; BlockAddress is a subclass of Constant, which in turn derives from Value.
In LLVM IR, every operand, including variables and constants, is a Value.
So we can do this:
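// Global array of 1024 i8* slots that will hold the block addresses.
ArrayType *arrayType = ArrayType::get(irBuilder.getInt8PtrTy(), 1024);
module->getOrInsertGlobal("label_array", arrayType);
GlobalVariable *label_array = module->getNamedGlobal("label_array");
// BlockAddress is a Constant, so it can appear directly in an initializer.
std::vector<Constant *> array_elems;
array_elems.push_back(BlockAddress::get(func, ret_bb));
array_elems.push_back(BlockAddress::get(func, instr1_bb));
array_elems.push_back(BlockAddress::get(func, instr2_bb));
// ConstantArray::get expects one element per slot, so pad the rest with null.
array_elems.resize(1024, Constant::getNullValue(irBuilder.getInt8PtrTy()));
label_array->setInitializer(ConstantArray::get(arrayType, array_elems));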

C++ - Casting variables and how does it treat them?

I've been pretty curious about this for a while now, and maybe I am 100% wrong, but when you cast one type to another, does it look at the memory/value and then treat that memory/value as the new type?
For example:
char Letter = 'A';
int iLetter = static_cast<int>(Letter);
//iLetter is 65
If this is correct, does it look at the memory location / value of "Letter" and then change the value to represent whatever you are casting it to? I came to this theory by picturing all values as integers that are then cast to char/struct/class etc.
Hopefully this is a full question, I'd just like a good understanding of how casting really works with the values / information to change them into new values, etc.
In situations when you cast a value (as opposed to a pointer or a reference) the compiler constructs a new value from the one being cast, as opposed to interpreting an existing location as the new type.
Specifically, the code looks at the value of Letter, which is a char, and constructs an iLetter from it by extending the char to an int using the integer conversion rules of C++. This may include sign extension for signed types, so a negative signed char will become a negative int.
On the other hand, when you cast a pointer, the same location is interpreted as a new type.
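To make the difference between the two kinds of cast concrete, here is a minimal sketch. The function names (widen, first_byte, cast_examples) are made up for illustration, and the checks assume an ASCII character set and a 32-bit int.

```cpp
#include <cassert>

// Value cast: a brand-new int is built from the char's value.
int widen(signed char c) {
    return static_cast<int>(c);  // sign-extends; the original char is untouched
}

// Pointer cast: the same storage is viewed through another type.
// Inspecting an object's bytes through unsigned char* is always allowed.
unsigned char first_byte(const int& value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    return bytes[0];  // which byte comes first depends on endianness
}

// Hypothetical usage; assumes ASCII and a 32-bit int.
void cast_examples() {
    signed char letter = 'A';
    int as_int = widen(letter);
    assert(as_int == 65 && letter == 'A');  // new value, source unchanged

    signed char negative = -1;
    assert(widen(negative) == -1);          // sign extension keeps the value

    int packed = 0x41424344;
    unsigned char b = first_byte(packed);
    assert(b == 0x44 || b == 0x41);         // little-endian or big-endian
}
```

The value cast never touches the memory of the source object, while the pointer cast never creates a new value: it only changes how the existing bytes are read.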
In your case, static_cast creates a temporary of the new type, which is then used to initialize iLetter.
Edit:
This means static_cast doesn't change the type of the original variable; it only reads it. Conceptually, the converted value is not written straight into iLetter: a temporary of the new type is created and iLetter is initialized from it.
The conversion your cast performs would also happen implicitly, so the cast is redundant. You can just write:
int iLetter = Letter; //This is a safe conversion as well
If this is correct does it look at the memory location / value of "Letter"
Yes. Obviously, the value of Letter is looked at, as in the value of the variable seen, otherwise the compiler would have no idea what you are talking about.
- and then change the value to represent whatever you are casting it to?
The original value is not changed, only copied, and the copy is converted to int, which yields the character code.

How to get string representation for the member function?

As a part of hashing, I need to convert a function pointer to a string representation. With global/static functions it's trivial:
string s1{ to_string(reinterpret_cast<uintptr_t>(&global)) };
And from here:
2) Any pointer can be converted to any integral type large enough to
hold the value of the pointer (e.g. to std::uintptr_t)
But I have a problem with member functions:
cout << &MyStruct::member;
outputs 1 though in debugger I can see the address.
string s{ to_string(reinterpret_cast<uintptr_t>(&MyStruct::member)) };
Gives a compile-time error cannot convert. So it seems that not any pointer can be converted.
What else can I do to get a string representation?
cout << &MyStruct::member;
outputs 1 though in debugger I can see the address.
There is no overload for ostream::operator<<(decltype(&MyStruct::member)). However, the member function pointer is implicitly convertible to bool and for that, there exists an overload and that is the best match for overload resolution. The converted value is true if the pointer is not null. true is output as 1.
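A small sketch of that overload resolution, in case it helps. The helper stream_member_pointer is hypothetical; MyStruct::member stands in for the member from the question. With std::boolalpha the same conversion shows up as "true" instead of "1".

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct MyStruct {
    void member() {}
};

// Streams a pointer-to-member-function the way the question does.
// There is no operator<< for pointers-to-members, so the pointer is
// implicitly converted to bool and the bool overload is selected.
std::string stream_member_pointer(bool alpha) {
    void (MyStruct::*pm)() = &MyStruct::member;
    std::ostringstream out;
    if (alpha) {
        out << std::boolalpha;  // print bools as text instead of 0/1
    }
    out << pm;
    return out.str();
}

// Hypothetical usage.
void stream_examples() {
    assert(stream_member_pointer(false) == "1");
    assert(stream_member_pointer(true) == "true");
}
```

So the 1 in the output carries no address information at all; it only says the pointer is not null.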
string s{ to_string(reinterpret_cast<uintptr_t>(&MyStruct::member)) };
Gives a compile-time error cannot convert. So it seems that not any pointer can be converted.
Perhaps confusingly, in standardese "pointer" is not an umbrella term that also covers pointers-to-members and pointers-to-member-functions. It refers only to object pointers and function pointers.
So the quoted rule does not apply to pointers-to-member-functions.
What else can I do to get a string representation?
You can use a buffer of unsigned char, big enough to represent the pointer, and use std::memcpy. Then print it in the format of your own choice. I recommend hexadecimal.
As Martin Bonner points out, the pointer-to-member may contain padding in which case two values that point to the same member may actually have a different value in the buffer. Therefore the printed value is not of much use because two values are not comparable without knowing which bits (if any) are padding - which is implementation defined.
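The memcpy approach could look roughly like the sketch below. bytes_to_hex and member_pointer_hex are names invented here, and the caveat above still applies: the resulting string is only as comparable as the implementation's pointer-to-member representation allows.

```cpp
#include <cassert>
#include <cstring>
#include <string>

struct MyStruct {
    void member() {}
};

// Copies the object representation of a trivially copyable value into a
// byte buffer and renders it as lowercase hexadecimal, two digits per byte.
template <typename T>
std::string bytes_to_hex(const T& value) {
    unsigned char buffer[sizeof(T)];
    std::memcpy(buffer, &value, sizeof(T));
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(2 * sizeof(T));
    for (unsigned char byte : buffer) {
        hex += digits[byte >> 4];
        hex += digits[byte & 0x0F];
    }
    return hex;
}

// Hypothetical usage: hex dump of a pointer-to-member-function.
std::string member_pointer_hex() {
    void (MyStruct::*pm)() = &MyStruct::member;
    std::string hex = bytes_to_hex(pm);
    assert(hex.size() == 2 * sizeof(pm));
    return hex;
}
```

The length of the string depends on sizeof of the pointer-to-member, which is often larger than sizeof(void*).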
Unfortunately I need a robust solution, so because of this padding I can't use this.
No portable robust solution exists.
As Jonathan Wakely points out, there is no padding in the Itanium ABI, so if your compiler uses that, then the suggested memcpy method would work.

LLVM: Replacing all instances of an address with a constant

I'm trying to replace all instances of an address with a constant.
I'm getting & testing the address of store with the following (i is an instruction)
//already know it's a store instruction at this point
llvm::Value *addy = i->getOperand(0);
if(llvm::ConstantInt* c = dyn_cast<llvm::ConstantInt>(addy)){
//replace all uses of the address with the constant
//operand(1) will be the address the const would be stored at
i->getOperand(1)->replaceAllUsesWith(c);
}
I'd think this would work, but I'm getting the error that
"Assertion: New->getType()== getType() && replaceAllUses of value with new value of different type!" failed
and I'm not sure why... my understanding of replaceAllUsesWith is that it would replace every usage of the address (i->getOperand(1)) with the constant?
The error message is pretty straightforward: the type of the new value is not identical to the type of the old value that you are replacing.
LLVM IR is strongly typed, and as you can see in the language reference, every instruction has a specific type it expects as each operand. For example, store requires that the address's type will always be a pointer to the type of the value being stored.
As a result, whenever you replace the usage of a value, you must ensure first that they both have the same type - replaceAllUsesWith actually has an assert to verify it, as you can see, and you failed it. It's also simple to see why: operand 1 of a store instruction is always of some pointer type, and a ConstantInt always represents something of some integer type, so surely they can never match.
What exactly are you trying to achieve? Perhaps you are thinking about replacing each load of that store's address with a usage of the constant? In that case, you'll have to find yourself all the loads that use that address, and for each of them (for each of the loads, I mean, not of the addresses) perform replaceAllUsesWith with the constant. There are standard LLVM passes that can do those things for you, by the way - check out the pass list. I'm guessing mem2reg followed by some constant propagation pass will take care of this.

What is the use of pointer variables?

I've recently tried to really come to grips with references and pointers in C++, and I'm getting a little bit confused. I understand the * and & operators which can respectively get the value at an address and get the address of a value, however why can't these simply be used with basic types like ints?
I don't understand why you can't, for example, do something like the following and not use any weird pointer variable creation:
string x = "Hello";
int y = &x; //Set 'y' to the memory address of 'x'
cout << *y; //Output the value at the address 'y' (which is the memory address of 'x')
The code above should, in my mind, output the value of 'x': 'y' contains the memory address of 'x', and hence '*y' should be 'x'. It doesn't work, though. When I try to compile it, it tells me it can't convert from a string to an int, which doesn't make much sense, since you'd think a memory address could be stored in an int just fine.
Why do we need to use special pointer variable declarations (e.g. string *y = &x)?
And inside this, if we take the * operator in the pointer declaration literally in the example in the line above, we are setting the value of 'y' to the memory address of 'x', but then later when we want to access the value at the memory address ('&x') we can use the same '*y' which we previously set to the memory address.
C and C++ resolve type information at compile-time, not runtime. Even runtime polymorphism relies on the compiler constructing a table of function pointers with offsets fixed at compile time.
For that reason, the only way the program can know that cout << *y; is printing a string is because y is strongly typed as a pointer-to-string (std::string*). The program cannot, from the address alone, determine that the object stored at address y is a std::string. (Even C++ RTTI does not allow this, you need enough type information to identify a polymorphic base class.)
In short, C is a typed language. You cannot store arbitrary things in variables.
Check the type safety article on Wikipedia. C/C++ prevents problematic operations and function calls at compilation time by checking the types of the operands and function parameters (but note that with explicit casts you can change the type of an expression).
It doesn't make sense to store a string in an integer -> The same way it doesn't make sense to store a pointer in it.
Simply put, a memory address has a type, which is pointer. Pointers are not ints, so you can't store a pointer in an int variable. If you're curious why ints and pointers are not fungible, it's because the size of each is implementation defined (with certain restrictions) and there is no guarantee that they will be the same size.
For instance, as @Damien_The_Unbeliever pointed out, pointers on a 64-bit system must be 64 bits long, but it is perfectly legal for an int to be 32 bits, as long as it is no longer than a long and no shorter than a short.
As to why each data type has its own pointer type, that's because each type (especially user-defined types) is structured differently in memory. If we were to dereference typeless (or void) pointers, there would be no information indicating how that data should be interpreted. If, on the other hand, you were to create a universal pointer and do away with the "inconvenience" of specifying types, each entity in memory would probably have to be stored alongside its type information. While this is doable, it's far from efficient, and efficiency is one of C++'s design goals.
Some very low-level languages... like machine language... operate exactly as you describe. A number is a number, and it's up to the programmer to hold it in their heads what it represents. Generally speaking, the hope of higher level languages is to keep you from the concerns and potential for error that comes from that style of development.
You can actually disregard C++'s type-safety, at your peril. For instance, the gcc on a 32-bit machine I have will print "Hello" when I run this:
string x = "Hello";
int y = reinterpret_cast<int>(&x);
cout << *reinterpret_cast<string*>(y) << endl;
But as pretty much every other answerer has pointed out, there's no guarantee it would work on another computer. If I try this on a 64-bit machine, I get:
error: cast from ‘std::string*’ to ‘int’ loses precision
Which I can work around by changing it to a long:
string x = "Hello";
long y = reinterpret_cast<long>(&x);
cout << *reinterpret_cast<string*>(y) << endl;
The C++ standard specifies minimums for these types, but not maximums, so you really don't know what you're going to be dealing with when you face a new compiler. See: What does the C++ standard state the size of int, long type to be?
So the potential for writing non-portable code is high once you start going this route and "casting away" the safeties in the language. reinterpret_cast is the most dangerous type of casting...
When should static_cast, dynamic_cast, const_cast and reinterpret_cast be used?
But that's just technically drilling down into the "why not int" part specifically, in case you were interested. Note that as @BenVoight points out in the comment below, there does exist an integer type as of C99 called intptr_t which is guaranteed to hold any pointer. So there are much larger problems when you throw away type information than losing precision... like accidentally casting back to a wrong type!
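For what it's worth, here is a sketch of the same round trip written with std::uintptr_t from <cstdint> (the unsigned sibling of intptr_t) instead of int or long. The helper names are invented for the example, and it assumes the implementation provides std::uintptr_t, which is optional in the standard but available on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stores an object's address in an integer that is wide enough for it.
std::uintptr_t address_as_integer(const std::string& s) {
    return reinterpret_cast<std::uintptr_t>(&s);
}

// Turns the integer back into a reference. The integer carries no type
// information, so the caller has to name the correct type again.
const std::string& string_at(std::uintptr_t address) {
    return *reinterpret_cast<const std::string*>(address);
}

// Hypothetical usage.
void round_trip_example() {
    std::string x = "Hello";
    std::uintptr_t y = address_as_integer(x);
    assert(string_at(y) == "Hello");
    assert(&string_at(y) == &x);  // same object, not a copy
}
```

This removes the precision problem, but not the bigger one: nothing stops you from casting y back to the wrong pointer type.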
C++ is a strongly typed language, and pointers and integers are different types. By making those separate types the compiler is able to detect misuses and tell you that what you are doing is incorrect.
At the same time, the pointer type maintains information on the type of the pointed-to object: if you obtain the address of a double, you have to store that in a double*, and the compiler knows that dereferencing that pointer gets you a double. In your example code, int y = &x; cout << *y; the compiler would lose the information about what y points to, the type of the expression *y would be unknown, and it would not be able to determine which of the different overloads of operator<< to call. Compare that with std::string *y = &x; where the compiler sees y, knows it is a std::string*, and knows that dereferencing it yields a std::string (and not a double or any other type), enabling the compiler to statically check all expressions that contain y.
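A short sketch of that point: the static type of the pointer fixes the type of the dereferenced expression, and that in turn decides which overload gets called. The describe overloads here are stand-ins for the operator<< overloads and are not part of any library.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Two overloads standing in for the different operator<< overloads.
std::string describe(const std::string&) { return "string"; }
std::string describe(double) { return "double"; }

std::string describe_pointees() {
    std::string text = "Hello";
    double number = 3.5;
    std::string* ps = &text;
    double* pd = &number;

    // The type of *p is known at compile time from the pointer's type.
    static_assert(std::is_same<decltype(*ps), std::string&>::value,
                  "dereferencing std::string* yields std::string&");
    static_assert(std::is_same<decltype(*pd), double&>::value,
                  "dereferencing double* yields double&");

    // Overload resolution is driven purely by those static types.
    return describe(*ps) + "," + describe(*pd);
}

// Hypothetical usage.
void overload_example() {
    assert(describe_pointees() == "string,double");
}
```

With a plain int holding the address, neither static_assert could even be written, because there would be no pointee type to check.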
Finally, while you think that a pointer is just the address of the object and that it should be representable by an integral type (which on 64-bit architectures would have to be int64 rather than int), that is not always the case. There are architectures on which pointers are not really representable by integral values. For example, in architectures with segmented memory, the address of an object can contain both a segment (integral value) and an offset into the segment (another integral value). On other architectures the size of pointers differed from the size of any integral type.
The language is trying to protect you from conflating two different concepts, even though at the hardware level they are both just sets of bits:
Outside of needing to pass values manually between various parts of a debugger, you never need to know the numerical value.
Outside of archaic uses of arrays, it doesn't make sense to "add 10" to a pointer - so you shouldn't treat them as numeric values.
By the compiler retaining type information, it also prevents you from making mistakes - if all pointers were equal, then the compiler couldn't, helpfully, point out that what you're trying to dereference as an int is a pointer to a string.