LLVM non-local memory dependency analysis - c++

I am writing a LLVM pass which needs to analyse memory dependencies. The pass needs to find for a memory read every instruction that could have defined this memory. For example:
int foo(){
    int x = 123;
    if(getchar()){
        x = 321;
    }
    return x;
}
In this case the pass needs to find both definitions of x when it loads x from memory on the return. I used the following code in my pass.
    MemoryDependenceAnalysis &MDA = getAnalysis<MemoryDependenceAnalysis>();
    if (isa<LoadInst>(i)) {
        MemDepResult d = MDA.getDependency(&i);
        if (d.isNonLocal()) {
            SmallVector<NonLocalDepResult, 4> result;
            MDA.getNonLocalPointerDependency(&i, result);
            for (SmallVectorImpl<NonLocalDepResult>::iterator it=result.begin() ; it < result.end(); it++ ) {
                MemDepResult dd = it->getResult();
                if (dd.isUnknown())
                    errs() << "Unknown\n"; //result always unknown
            }
        }
    }
}
MDA.getDependency() gives me the right results for dependencies that are local (in the same basic block). But getNonLocalPointerDependency() always returns null pointers.
How do I get LLVM to find every store that could have defined the memory of a load instruction?


Is this allowed at runtime? How do you extract the top N elements of a stack-like thing?

I'm trying to copy the top N elements of the stack to an array. I want to use it to implement the invokevirtual, invokespecial, invokestatic, invokeinterface, and invokedynamic instructions for a Java ahead-of-time compiler. The stack is a linked list, and __pop() unwinds and returns the top of the stack.
public : void __sipop(){
    topframe = topframe->prev;
}
public : void __longpop(){
    topframe = topframe->prev->prev;
}
public : jvalue __pop(){
    //also shared with bytecode
    jvalue value = topframe->value;
    if(topframe->type == 'J' || topframe->type == 'D'){
        __longpop();
    } else{
        __sipop();
    }
    return value;
}
public : jvalue* __extract(int count){
    jvalue extracted [count];
    for(int i = 0; i < count; i++){
        extracted[count - i - 1] = __pop();
    }
    return extracted;
}
Will my implementation crash at runtime?
Maybe. Your code exhibits undefined behaviour at least in:
jvalue* __extract(int count){
    jvalue extracted [count];
    for (int i = 0; i < count; i++){
        extracted[count - i - 1] = __pop();
    }
    return extracted;
}
Your function returns a pointer to a local variable whose lifetime ends when the function returns. For more information, read this excellent answer on Stack Overflow: Can a local variable's memory be accessed outside its scope? (tl;dr: no).
The simplest solution would be to return a vector:
#include <vector> // preferably in the first lines of your header file (.hpp)
std::vector<jvalue> extract(int count)
{
    auto extracted = std::vector<jvalue>(count);
    for (int i = 0; i < count; i++) {
        extracted[count - i - 1] = __pop();
    }
    return extracted;
}
You may also be interested in std::generate.
Additionally, as mentioned in the comments, names starting with two underscores (__) are reserved for the implementation.
On an unrelated note: I understand your wish to feel comfortable in C++ by mimicking aspects of the Java language, but you should write idiomatic C++ and not repeat the access specifier (public:) before each member function.
Yes. Returning the address of a stack-local object (extracted) is undefined behavior. Return a heap-allocated array (auto extracted = new jvalue[count];) or std::vector<jvalue> instead.

Is access by pointer so expensive?

I have a Process() function that is called very heavily within my DLL (a VST plugin) loaded in a DAW (host software), like this:
for (int i = 0; i < nFrames; i++) {
    // ...
    for (int voiceIndex = 0; voiceIndex < PLUG_VOICES_BUFFER_SIZE; voiceIndex++) {
        Voice &voice = pVoiceManager->mVoices[voiceIndex];
        if (voice.mIsPlaying) {
            for (int envelopeIndex = 0; envelopeIndex < ENVELOPES_CONTAINER_NUM_ENVELOPE_MANAGER; envelopeIndex++) {
                Envelope &envelope = pEnvelopeManager[envelopeIndex]->mEnvelope;
                envelope.Process(voice);
            }
        }
    }
}
void Envelope::Process(Voice &voice) {
    if (mIsEnabled) {
        // update value
        mValue[voice.mIndex] = (mBlockStartAmp[voice.mIndex] + (mBlockStep[voice.mIndex] * mBlockFraction[voice.mIndex]));
    }
    else {
        mValue[voice.mIndex] = 0.0;
    }
}
It basically takes 2% of CPU within the Host (which is nice).
Now, if I slightly change the code to this (which basically adds two increments and one assignment):
void Envelope::Process(Voice &voice) {
    if (mIsEnabled) {
        // update value
        mValue[voice.mIndex] = (mBlockStartAmp[voice.mIndex] + (mBlockStep[voice.mIndex] * mBlockFraction[voice.mIndex]));
        // next phase
        mBlockStep[voice.mIndex] += mRate;
        mStep[voice.mIndex] += mRate;
    }
    else {
        mValue[voice.mIndex] = 0.0;
    }
    // connectors
    mOutputConnector_CV.mPolyValue[voice.mIndex] = mValue[voice.mIndex];
}
CPU usage goes up to 6-7% (note that those variables don't interact with other parts of the code, or at least I think so).
The only reason I can think of is that pointer access is expensive. How can I reduce this CPU usage?
Those arrays are basic double "pointer" arrays (the lightest C++ container):
double mValue[PLUG_VOICES_BUFFER_SIZE];
double mBlockStartAmp[PLUG_VOICES_BUFFER_SIZE];
double mBlockFraction[PLUG_VOICES_BUFFER_SIZE];
double mBlockStep[PLUG_VOICES_BUFFER_SIZE];
double mStep[PLUG_VOICES_BUFFER_SIZE];
OutputConnector mOutputConnector_CV;
Any suggestions?
You might be thinking that "pointer arrays" are the lightest containers, but CPUs don't think in terms of containers. They just read and write values through pointers.
The problem here might very well be that you know that two containers do not overlap (there are no "sub-containers"), but the compiler might not be able to prove that, so it cannot exploit it. As far as the compiler can tell, writing to mBlockStep might affect mBlockFraction. It doesn't have the run-time values, so it has to handle the case where they do overlap. That means more memory reads and less caching of values in registers.
Pack all the data items into a structure and create an array of structures. I would simply use a vector.
In the Process function, get the single element out of this vector and use its members. At the cache-line level, all of one voice's fields would be brought into the L1 cache together, because the members of a struct are stored contiguously. Use a reference or a pointer to the struct to avoid copying.
Try to use integer data types unless double is needed.
EDIT:
struct VoiceInfo
{
    double mValue;
    ...
};
VoiceInfo voices[PLUG_VOICES_BUFFER_SIZE];
// Or vector<VoiceInfo> voices;
...
void Envelope::Process(Voice &voice)
{
    // Get the object (by ref/pointer)
    VoiceInfo& info = voices[voice.mIndex];
    // Work with reference 'info'
    ...
}

Why local arrays in functions seem to prevent TCO?

It looks like having a local array in a function prevents tail-call optimization on all the compilers I've checked:
int foo(int*);
int tco_test() {
    // int arr[5]={1, 2, 3, 4, 5}; // <-- variant 1
    // int* arr = new int[5]; // <-- variant 2
    int x = foo(arr);
    return x > 0 ? tco_test() : x;
}
When variant 1 is active, there is a true call to tco_test() in the end (gcc tries to do some unrolling before, but it still calls the function in the end). Variant 2 does TCO as expected.
Is there something in local arrays which make it impossible to optimize tail calls?
If the compiler still performed TCO, then all of the external foo(arr) calls would receive the same pointer. That's a visible change in semantics, and thus no longer a pure optimization.
The fact that the local variable in question is an array is probably a red herring here; it is its visibility to the outside via a pointer that is important.
Consider this program:
#include <stdio.h>
int *valptr[7], **curptr = valptr, **endptr = valptr + 7;
void reset(void)
{
    curptr = valptr;
}
int record(int *ptr)
{
    if (curptr >= endptr)
        return 1;
    *curptr++ = ptr;
    return 0;
}
int tally(void)
{
    int **pp;
    int count = 0;
    for (pp = valptr; pp < curptr; pp++)
        count += **pp;
    return count;
}
int tail_function(int x)
{
    return record(&x) ? tally() : tail_function(x + 1);
}
int main(void)
{
    printf("tail_function(0) = %d\n", tail_function(0));
    return 0;
}
As tail_function recurses, which it does via a tail call, the record function records the addresses of different instances of the local variable x. When it runs out of room, it returns 1, which makes tail_function call tally and return. tally sweeps through the recorded memory locations and adds up their values.
If tail_function were subject to TCO, then there would be just one instance of x. Effectively, it would be this:
int tail_function(int x)
{
tail:
    if (record(&x))
        return tally();
    x = x + 1;
    goto tail;
}
Now record records the same location over and over again, so tally computes an incorrect value (49, because all seven recorded pointers refer to the single x, which is 7 by then) instead of the expected 21.
The logic of record and tally depends on x being actually instantiated on each activation of the scope, and that outer activations of the scope have a lifetime which endures until the inner ones terminate. That requirement precludes tail_function from recursing in constant space; it must allocate separate x instances.

How can I find the value of what a memory address currently holds?

How do you read an address in a program and return the value of whatever data the address holds? I have the following code, which reads an offset address of a program, but I would like to return the value stored at the address I'm checking so that I can do other things with it.
inline Mem *Read(DWORD64 address)
{
    this->value = address;
    return (this->r(0));
}
inline Mem *r(DWORD64 ofs)
{
    if (!this || !value)
        return 0;
    if (!ReadProcessMemory(Handle, (void*)(value + ofs), &value, sizeof(DWORD64), 0))
        return 0;
    return this;
}
m.Read(0x1428003C0)->r(0x100);
For example, I read offset 0x100 from the address 0x1428003C0, and I know it holds the vehicle speed. How can I return the speed value from that address? I would like to find out the speed and, depending on it, apply the brakes or accelerate. I have tried to cout the result of m.Read, and I get weird garbage that I do not understand. I'm guessing it's the static memory address of the program that I'm currently reading?
On a successful return, value should contain the contents of the memory. You need to check the documentation to determine what format that is.
However, as a first guess I would try:
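Mem m;
if (m.Read(0x1428003C0)->r(0x100))
{
    // Guess: the 8 bytes read into m.value encode a double
    double val;
    std::memcpy(&val, &m.value, sizeof val); // needs <cstring>; avoids a type-punning pointer cast
    std::cout << val << std::endl;
}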
If you know you have a valid address, you can simply cast it to the right pointer type to read the memory. Note that this is dangerous: if the address is not valid, the result is undefined behavior, and you could just be reading garbage.

How to correctly return a vector from a function?

Can you help me with this problem, please? I want to return a vector from a function in class Typ to the main function:
In class Typ:
vector<Test*> NetworkType::createObject(int r1, int r2, r3) {
    vector<test*> te0;
    if (res1 == 1 && res2 == 1 && res3 == 1) {
        TestV *p1 = new TestV("aaa","bbb",3,"ooo","ccc", "ttt", "testX", "sk2");
        TestV *p3 = new TestV("rrr","ddd",3,"ooo","ccc", "ttt", "testY", "sk2");
        //return p1;
        TestV tesV1(*p1);
        te0.push_back(&tesV1);
        TestV tesV2(*p3);
        te0.push_back(&tesV2);
        return te0;
    } else {
        ...
    }
}
main:
Typ nk;
vector<Test*> p;
p = nk.createObject(p0,p1,p2);
output:
for(int i = 0; i < p.size(); i++){
    cout << "\n" + toString(p[i]);
}
toString:
std::string toString(Test* arg) {
    TestV* teV = dynamic_cast<TestV*>(arg);
    TestN* teN = dynamic_cast<TestN*>(arg);
    if (teV)
    {
        return teV->toString();
    }
    else
    {
        return teN->toString();
    }
    return "";
};
It compiles fine, but when I run the program I get this error:
Unhandled exception at 0x76dac41f in VolbaHoneypotu.exe: Microsoft C++ exception: std::__non_rtti_object at memory location 0x002fec9c..
Thank you for your reply.
This has nothing to do with the vectors or with the return value of the function. The exception message says what's wrong: your supposed objects are not really objects, because you're putting the address of a block-scope object with automatic storage duration (a "local variable") into the vector. That's invalid, so your program invokes undefined behavior.
(This holds only if you meant
te0.push_back(&tesV1);
instead of
te0.push_back(&hpv1);
otherwise your code doesn't even compile.)
(There are several typos in your code, I'm assuming you meant "tesV1" instead of "hpv1" and consistently either "Test" or "test" and so on.)
You are pushing pointers to variables allocated on the stack in your createObject() function:
TestV tesV1(*p1);
te0.push_back(&tesV1);
TestV tesV2(*p3);
te0.push_back(&tesV2);
These objects are copies of objects allocated on the heap (with "new"), but that does not matter. Once the scope ends (when you return from the function), the objects on the stack disappear. When you dereference the pointers after that, you get into trouble.
It's not entirely clear what you're trying to do, but if you push pointers to the heap-allocated objects directly you will not get into this kind of trouble:
te0.push_back( p1 );
te0.push_back( p3 );
Note also that when you allocate an object with "new", you are generally expected to deallocate it with "delete". The code as you have written it leaks memory: the memory allocated in createObject() can never be freed.