How to use an LLVM analysis pass in a standalone program? - c++

I want to use LLVM's alias analysis results in my standalone program, initially with something like this:
int main()
{
...
PassManager PM(M);
ImmutablePass* basic_aa = createBasicAliasAnalysisPass();
PM.add(basic_aa);
AliasAnalysis& AA = basic_aa->getAnalysis<AliasAnalysis>();
...
}
But the AA result seems to make no sense. So how can I use an LLVM analysis pass in my standalone program?

In LLVM, alias analysis is not a single pass but a group of passes. That being said:
The AliasAnalysis (AA) class is used to determine whether two pointers can ever point to the same object in memory. Traditionally, alias analyses respond to a query with a Must, May, or No alias response, indicating that two pointers always point to the same object, might point to the same object, or are known to never point to the same object.
Example:
If you want to find un-aliased global memory buffers that are only read from and move them into the constant address space, you can collect those pointers in an array and check each one for aliasing against the non-read-only inputs:
AA->alias(psAVal, psBVal) != AliasResult::NoAlias
See:
http://llvm.org/docs/AliasAnalysis.html

Related

Structure not in memory

I created a structure like this:
struct Options {
double bindableKeys = 567;
double graphicLocation = 150;
double textures = 300;
};
Options options;
Right after this declaration, I open the process that contains the structure from another process and search for a byte array matching the struct's doubles, but nothing is found.
To get a result, I need to add something like std::cout << options.bindableKeys; after the declaration. Then my pattern search finds it.
Why is this behaving like that? Is there any fix?
Minimal reproducible example:
#include <iostream>

struct Options {
    double bindableKeys = 567;
    double graphicLocation = 150;
    double textures = 300;
};

int main() {
    Options options;
    while (true) {
        double val = options.bindableKeys;
        if (val > 10)
            std::cout << "test" << std::endl;
    }
}
You can search for the array with Cheat Engine or another pattern finder.
Contrary to popular belief, C++ source code is not a sequence of instructions provided to the executing computer. It is not a list of things that the executable will contain.
It is merely a description of a program.
Your compiler is responsible for creating an executable program, that follows the same semantics and logical narrative as you've described in your source code.
Creating an Options instance is all well and good, but if creating it does not do anything (has no side effects) and you never use any of its data, then it may as well not exist, and therefore is not a part of the logical narrative of your program.
Consequently, there is no reason for the compiler to put it into the executable program. So, it doesn't.
Some people call this "optimisation". That the instance is "optimised away". I prefer to call it common sense: the instance was never truly a part of your program.
And even if you do use the data in the instance, it may be possible to create an executable program that uses that data more directly. In your case, nothing changes the default values of the members of Options, so there is no reason to include them in the program: the if statement can just have 567 baked into it. Then, since it's baked in, the whole condition becomes the constant expression 567 > 10, which is always true; you'll likely find that the resulting executable contains no branching logic at all. It just starts up, then outputs "test" over and over again until you force-terminate it.
That all being said, because we live in a world governed by physical laws, and because compilers are imperfect, there is always going to be some slight leakage of this abstraction. For this reason, you can trick the compiler into thinking that the instance is "used" in a way that requires its presence to be represented more formally in the executable, even if this isn't necessary to implement the described program. This is common in benchmarking code.
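A minimal sketch of one such trick, assuming GCC or Clang: the empty `__asm__ __volatile__` statement with a `"memory"` clobber is a GNU extension (not standard C++, and not supported by MSVC), and the helper name `keep_in_memory` is hypothetical. Handing the object's address to an opaque asm statement makes the optimiser assume the bytes may be read there, so the three doubles have to actually be stored in the object's memory, where an external scanner could find them for as long as the object is alive.

```cpp
#include <cassert>

struct Options {
    double bindableKeys = 567;
    double graphicLocation = 150;
    double textures = 300;
};

// Hypothetical helper (GCC/Clang only). The empty asm statement receives the
// object's address and clobbers "memory", so the optimiser must assume the
// object's bytes are read here and cannot drop or constant-fold the stores.
template <typename T>
inline void keep_in_memory(T& obj) {
    __asm__ __volatile__("" : : "r"(&obj) : "memory");
}

// Demo: after keep_in_memory(), the three doubles really are written to the
// object's storage (a stack slot here) instead of being optimised away.
double make_options() {
    Options options;
    keep_in_memory(options);
    assert(options.textures == 300);  // re-read from memory after the barrier
    return options.bindableKeys;
}
```

In the question's program, calling `keep_in_memory(options);` right after the declaration plays the same role as the `std::cout` line, without producing output.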

Small classes: how to pass by value and in registers

I need to use small classes consisting essentially of just an integer "handle", and to be able to treat that as a class so that I can attach methods to it.
At the same time, I want to avoid passing just the address of the handle (the "this" pointer) from one function to another, because that means that to read a handle that should simply be there, I would have to read a memory location.
So essentially I need the "handle" to be passed by value, ideally in registers (depending on the calling convention).
Some clarifying code is:
struct F{
int aa,bb,cc;};
F A[0x100];
struct handle{
int hhh;
void elaborateHandle(){ ... operations ;}
};
int main(){
handle h;h.hhh=3;
h.elaborateHandle();
// I need this call to pass the number 3 itself (on the stack or in a register), not the address where 3 was saved on the stack.
}
I don't think you should worry about this: the performance loss is tiny, because dereferencing one pointer is a cheap operation.
If you use an optimizing compiler, there is a good chance that the method call will be inlined into the calling function anyway.
In any case, if you are trying to improve performance, you should look elsewhere.
But if you really think it causes trouble, there is a way:
Declare the function outside your class (not as a member) and take the handle by value; if it needs to access private data, declare it as a friend.
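A minimal sketch of that suggestion. The class name `Handle` and the body of `elaborateHandle` are hypothetical, since the question elides the real operations. Because the class holds a single `int` and has a trivial copy constructor and destructor, common ABIs pass it like an `int` when it is taken by value; on x86-64 System V, for example, it travels in a general-purpose register. Whether it really stays in a register is still up to the calling convention and the optimiser, so check the generated assembly.

```cpp
#include <cassert>

// Hypothetical handle type: just one int, trivially copyable.
class Handle {
public:
    explicit Handle(int h) : hhh(h) {}

    // Free friend function taking the handle by value: the caller passes a
    // copy of the int, not an address (as a `this` pointer would be).
    friend int elaborateHandle(Handle h);

private:
    int hhh;
};

// Stand-in for the question's "... operations": here it simply doubles the value.
int elaborateHandle(Handle h) {
    return h.hhh * 2;
}
```

A call such as `elaborateHandle(Handle(3))` then replaces `h.elaborateHandle()` while still being able to reach the private member.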
First, print the assembly language of the code that calls your function and the first part of your function.
The assembly language will show how the registers are used. Normally, compilers try to make best use of registers when passing values to functions.
To help the compiler better use registers:
Limit the number of parameters.
Only a limited number of registers is reserved for passing arguments to functions. The more parameters a function has, the lower the probability that all of them will be passed in registers.
Also, the compiler may need to save registers before calling a function in order to pass more parameters to the function.
Pass values that fit inside registers.
If the compiler can't fit a data type in one register, it may use two registers (such as when passing 64-bit values on a 32-bit processor).
If it can't fit the data type in two registers, it may push the data onto the stack rather than passing it in registers. This means that the receiving function will have to read the values from the stack.
Pass large items by reference or pointer. On most platforms, the compiler can put a pointer in a register and pass that register to the receiving function, which is usually much faster than pushing and popping the whole object through the stack. Compilers typically implement references as pointers, so the same applies to references.
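A small illustration of the last two points; the type and function names are hypothetical. `Big` is far too large for registers, so it is taken by `const` reference (typically a single address in a register), while the one-`int` `Small` is cheap to take by value. The exact mechanism is decided by the platform ABI, so confirm it in the assembly.

```cpp
#include <cassert>
#include <cstddef>

// Too big for registers: passing it by value would copy 64 ints
// (typically through the stack).
struct Big {
    int values[64];
};

// Passing by const reference usually means passing a single address.
long sumByRef(const Big& b) {
    long total = 0;
    for (std::size_t i = 0; i < 64; ++i) {
        total += b.values[i];
    }
    return total;
}

// Fits in one register: cheap to pass by value.
struct Small {
    int handle;
};

int twiceByValue(Small s) {
    return s.handle * 2;
}
```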
Suggest to the compiler that values be placed in registers.
Using the register keyword with a variable suggests to the compiler that you would like the variable kept in a register. It is only a suggestion, and the compiler is free to ignore it. Note that in C++ the register keyword was deprecated in C++11 and removed in C++17 (the keyword is still reserved); it remains valid in C.
Define variables as close to their point of usage. This allows the compiler to allocate registers when needed rather than reserving them for a while.
Create scope blocks. Using { and } to create new scope blocks helps the compiler allocate and release registers that are used only in a limited area. So if some variables are only used in a limited part of a function, put that part in its own scope block. In C, you can even tag those local variables with the register keyword.
Compile with high optimization levels.
Set your compiler's optimization levels high, then check the assembly language.
The compiler may use memory for variable storage when optimization is at the lowest setting (debugging). At higher optimization levels, the compiler starts using registers more effectively.
Remember, print the assembly language of the functions and the calling code before and after playing with optimization levels.
Using the g++ compiler on the x86 platform, I found that the flag "-freg-struct-return" has a different effect than described in the documentation. According to my tests, that flag makes the compiler pass structures by value (I didn't check this, but it is probably only valid for structures smaller than a certain size; I checked up to 64 bits, compiling with -m32).
Contrary to what the documentation says, structs aren't passed in registers unless a register-passing convention is used.
This behaviour also applies to methods of structures (or classes) that are declared const or that the compiler recognizes as const.
So if a method doesn't change the structure, then the structure is passed by value (in stack-allocated space or in registers, depending, for example, on whether the function is defined with __attribute__((regparm(3)))).
Conversely, and as it should be, if a method modifies the structure, then the address of the structure is passed to the method instead of its value.
The documentation of that flag is misleading because it says: "Return struct and union values in registers when possible. This is more efficient for small structures than -fpcc-struct-return.
If you specify neither -fpcc-struct-return nor -freg-struct-return, GCC defaults to whichever convention is standard for the target. If there is no standard convention, GCC defaults to -fpcc-struct-return, except on targets where GCC is the principal compiler. In those cases, we can choose the standard, and we chose the more efficient register return alternative."
The testing code I used is below; the effects can be seen in the disassembler output.
#include <stdio.h>
int a;
int aa[100];
struct Token{
short int ind; short int ind1; short int ind2;
int v() const{return aa[ind];}
__attribute((noinline)) void setind(int i){ind=i;}
__attribute((noinline)) int tok() {return ind;}
};
__attribute__ ((noinline)) void showIt(Token t){
t.ind+=t.ind;
a+=t.ind;
t.ind=8;
}
Token t0 = {.ind=15};
Token t1 = {.ind=99};
int main(int argc, char **argv)
{
t0.setind(10);
int x=19;
x=t0.tok();
showIt(t0);
t1.setind(20+x);
showIt(t1);
printf("%i\n",a);
return 0;
}

How do I treat string variables as actual code?

That probably wasn't very clear. Say I have a char *a = "reg". Now, I want to write a function that, on reading the value of a, instantiates an object of a particular class and names it reg.
So for example, say I have a class Register, and a separate function create(char *). I want something like this:
void create(char *s) //s == "reg"
{
//what goes here?
Register reg; // <- this should be the result
}
It should be reusable:
void create(char *s) //s == "second"
{
//what goes here?
Register second; // <- this should be the result
}
I hope I've made myself clear. Essentially, I want to treat the value in a variable as a separate variable name. Is this even possible in C/C++? If not, anything similar? My current solution is to hash the string, and the hash table would store the relevant Register object at that location, but I figured that was pretty unnecessary.
Thanks!
Variable names are compile-time artifacts. They don't exist at runtime. It doesn't make sense in C++ to create a dynamically-named variable. How would you refer to it?
Let's say you had this hypothetical create function, and wrote code like:
create("reg");
reg.value = 5;
This wouldn't compile, because the compiler doesn't know what reg refers to in the second line.
C++ doesn't have any way to look up variables at runtime, so creating them at runtime is a nonstarter. A hash table is the right solution for this. Store objects in the hash table and look them up by name.
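A minimal sketch of that approach, assuming a default-constructible `Register`; the struct below, with its `value` member, is a hypothetical stand-in for the question's class, and the names `registers`, `create` and `lookup` are illustrative, not from any library.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the question's Register class.
struct Register {
    int value = 0;
};

// "Variables" created at runtime are really entries in a hash table keyed by name.
std::unordered_map<std::string, Register> registers;

// Creates (or resets) the entry for `name` and returns a reference to it.
// References to unordered_map elements stay valid when the table rehashes.
Register& create(const std::string& name) {
    return registers[name] = Register{};
}

// Returns the named object, or nullptr if nothing was created under that name.
Register* lookup(const std::string& name) {
    auto it = registers.find(name);
    return it == registers.end() ? nullptr : &it->second;
}
```

With this, `create("reg").value = 5;` followed by `lookup("reg")->value` reads 5 back, which is about as close as C++ gets to a runtime-named variable.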
This isn't possible. C++ does not offer any facilities to process code at runtime. Given the nature of a typical C++ implementation (which compiles to machine code ahead of time, losing all information about source code), this isn't even remotely feasible.
Like I said in my comment:
What's the point? A variable name is something the compiler, and most importantly you, the programmer, should care about. Once the application is compiled, the variable name could be anything; it could be mangled and meaningless, and it doesn't matter anymore.
You read and write code, including variable names. Once it is compiled, it's up to the hardware to deal with it.
Neither C nor C++ has an eval function.
Simply because you only compile what you need; eval implies input arriving later on that may make no sense or may require other dependencies.
C and C++ are compiled ahead of time, while eval implies evaluation at runtime. In C, that would mean preprocessing, compiling and linking the string in such a way that it still becomes part of the current process...
Even if it were possible, eval is often said to be evil, and that goes double for languages like the C family, which are meant to run reliably and are often used for time-critical operations. The right tool for the job and all that...
A HashTable with objects that have hash, key, Register, collision members is the sensible thing to do. It's not that much overhead anyway...
Still feel like you need this?
Look into the vast number of scripting languages out there: Perl, Python... They're all better suited to this type of stuff.
If you need some variable creation and lookup you can either:
Use one of the scripting languages, as suggested by others
Do the lookup explicitly, yourself. The simplest approach is a map from a string to your Register object. Then you can have:
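#include <map>
#include <memory>
#include <string>

// Key on std::string, not const char*: a const char* key compares pointer
// addresses, so equal names stored at different addresses would not match.
std::map<std::string, std::unique_ptr<Register>> table;
Register* create(const std::string& name) {
    auto& slot = table[name];
    slot = std::make_unique<Register>();  // replaces (and frees) any previous object
    return slot.get();
}
Register* lookup(const std::string& name) {
    auto it = table.find(name);  // find() does not insert missing keys, unlike operator[]
    return it == table.end() ? nullptr : it->second.get();
}
void destroy(const std::string& name) {
    table.erase(name);  // the unique_ptr deletes the Register
}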
Obviously, each time you want to access a variable created this way, you have to go through the call to lookup.

Which tool can list writing access to a specific variable in C?

Unfortunately, I'm not even sure what this sort of static analysis is called. It's not really control-flow analysis, because I'm not looking for function calls, and I don't really need data-flow analysis, because I don't care about the actual values.
I just need a tool that lists the locations (file, function) where writing access to a specific variable takes place. I don't even care if that list contained lines that are unreachable. I could imagine that writing a simple parser could suffice for this task but I'm certain that there must be a tool out there that does this simple analysis.
As a poor student, I would appreciate free or, better yet, open-source tools, and if someone could tell me what this type of static analysis is actually called, I would be equally grateful!
EDIT: I forgot to mention there's no pointer arithmetic in the code base.
Why don't you make the variable const and then note all the places where your compiler rejects write access?
Note: This won't catch errors where the memory underlying the variable is written to in some erroneous manner such as a buffer overrun.
EDIT: For example:
const int a = 1;
a = 2;
a = 3;
My compiler produces:
1>MyProg.c(46): error C3892: 'a' : you cannot assign to a variable that is const
1>MyProg.c(47): error C3892: 'a' : you cannot assign to a variable that is const
Do you mean something like this?
This works for C programs that you have made the effort to analyze with Frama-C's value analysis. It is Open Source and the dependency information is also available programmatically. As static analyzers go, it is rather on the “precise” side of the spectrum. It will work better if your target is embedded C code.
I am not sure such a tool could be written. Pointers can be used to change arbitrary data in memory without any reference to other variables pointing to that data. Think about functions like memset(), which change whole blocks of memory.
If you are not interested in these kind of mutations, you would still have to take transitive pointers into account. In C, you can have any number of pointers pointing to the same data, and you would have to analyze where copies of these pointers are made. And then these copies can be copied again, ...
So even in the "simple" case it would require quite a big amount of code analysis.

C and C++ Code Interoperability - Data Passing Issues

The situation is as follows. There is a system written entirely in C. This C program spawns a new thread to start a data processing engine written in C++, so the system runs two threads (the main thread and the data processing engine thread). I have written a C function that takes a C struct and passes it to the data processing thread so that a C++ function can access the C struct. While doing so, I observe that the values of certain fields (like an unsigned int) in the C struct change when accessed on the C++ side, and I am not sure why. At the same time, if I pass a primitive data type like an int, the value does not change. It would be great if someone could explain why it behaves like this. Here is the code I wrote:
/* C++ Function */
void DataProcessor::HandleDataRecv(custom_struct* cs)
{
/*Accesses the fields in the structure cs - an unsigned int field. The value of
field here is different from the value when accessed through the C function below.
*/
}
/*C Function */
void forwardData(custom_struct* cs)
{
dataProcessor->HandleDataRecv(cs); //Here dataProcessor is a pointer to the object
//of the C++ class.
}
Also, the two functions are in different source files (one with a .c extension and the other with a .cc extension).
I'd check that both sides lay out the struct in the same way:
Print sizeof(custom_struct) in both languages.
Create an instance of custom_struct in both languages and print the offset of each member variable.
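A minimal sketch of that check. The `custom_struct` definition below is a hypothetical placeholder, since the question doesn't show the real one; replace it with the real definition (or `#include` its header). The code deliberately sticks to the common subset of C and C++ (C headers, `typedef struct`) so the same file can be compiled once as C and once as C++, with the same flags as the real project.

```cpp
#include <assert.h>
#include <stddef.h> /* offsetof */
#include <stdio.h>

/* Hypothetical placeholder: paste the real custom_struct definition here. */
typedef struct custom_struct {
    char tag;
    unsigned int count;
    double value;
} custom_struct;

/* Prints the layout as seen by the compiler building this file. */
void print_layout(void) {
    /* The first member is always at offset 0 in both C and C++. */
    assert(offsetof(custom_struct, tag) == 0);
    printf("sizeof(custom_struct) = %zu\n", sizeof(custom_struct));
    printf("offsetof(tag)   = %zu\n", offsetof(custom_struct, tag));
    printf("offsetof(count) = %zu\n", offsetof(custom_struct, count));
    printf("offsetof(value) = %zu\n", offsetof(custom_struct, value));
}
```

Call `print_layout()` from a small `main` in each build; any difference between the two outputs means the C and C++ sides disagree about the layout.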
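My wild guess is that Michael Andresson is right: structure alignment might be the issue.
Try compiling both the C and C++ files with
-fpack-struct=4
(or some other value instead of 4). This way, the struct is laid out the same in every case. Be aware that this flag changes the layout of every struct in those files; GCC's documentation warns that code built with it is not binary compatible with code built without it.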
If we could see the struct declaration, it would probably be clearer. The struct doesn't contain any #ifdef with C++-specific code, like a constructor, does it? Also, check for #pragma pack directives, which change data alignment.
Maybe on one side the struct has padding bytes added so that the members are aligned on 32-bit boundaries for speed (so that each member can be loaded with a single aligned access).
And on the other side the struct may be packed to save space.
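A small illustration of how that can happen; the struct names are hypothetical. `#pragma pack` (supported by GCC, Clang and MSVC) removes the padding, so if such a directive is in effect on only one side (for example, left active by one header, or placed inside an `#ifdef __cplusplus` block), the C and C++ code read the same field at different offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Default layout: the compiler may insert padding after `c`
// so that `u` is suitably aligned.
struct Natural {
    char c;
    unsigned int u;
};

// Packed layout: no padding, `u` immediately follows `c`.
#pragma pack(push, 1)
struct Packed {
    char c;
    unsigned int u;
};
#pragma pack(pop)

void compare_layouts() {
    std::printf("Natural: size %zu, offsetof(u) %zu\n",
                sizeof(Natural), offsetof(Natural, u));
    std::printf("Packed:  size %zu, offsetof(u) %zu\n",
                sizeof(Packed), offsetof(Packed, u));
}
```

On typical platforms with a 4-byte, 4-byte-aligned int, `Natural::u` sits at offset 4 while `Packed::u` sits at offset 1, which is exactly the kind of mismatch that makes an unsigned int field appear to change value when the struct crosses between the two sides.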
With minor exceptions, C++ is a superset of C (meaning C89), so I'm confused about what is going on. I can only assume it has something to do with how you are passing or typing your variables, and/or the systems they are running on. Unless I am very much mistaken, it should technically have nothing to do with C/C++ interoperability.
Some more details would help.