I want to write a pass that adds a function call inside a for loop.
Let's assume the original source code is below.
for (int i=0; i<10; ++i)
{
printf("%d\n", i);
}
I want the pass to transform the code above into the code below. (I don't mean changing the source code; the pass would actually change the IR.)
for (int i=0; i<10; ++i)
{
EXTERNALFUNC(i); // function call for external function
// and args is induction variable.
printf("%d\n", i);
}
I know how to use the getOrInsertFunction method, but I have no idea how to:
find the loop and its induction variable,
insert the function call inside the loop,
insert function calls in all loops if there are many of them.
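If you run clang -emit-llvm on that source you'll notice that the loop header starts with precisely one phi node, at least once the IR is in SSA form: at -O0 clang keeps i in a stack slot (alloca) and uses loads and stores, so the phi appears only after a pass such as mem2reg has run. (Well, at least it ought to do that, I haven't bothered to check.)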
Some loops don't have induction variables. But the ones that do will have a phi node and some sort of increase, and you can identify those once you've found the loop.
To find loops, you can make a LoopInfo, or if you're doing this in a pass, call FAM.getResult<LoopAnalysis>(F); to get the LoopInfo your pass manager has made. For each loop found by LoopInfo, see if it's one you can analyse and do the right thing if it is, and if it isn't. The Loop's header must contain a phi, one of the incoming blocks must dominate the header, another must be in the loop, and the value in the loop must be an arithmetic operation that uses the phi. Like this:
scan: ; preds = %scan, %notnull
%previous = phi i32 [ -1, %notnull ], [ %current, %scan ]
%current = add i32 %previous, 1
…
br i1 %found, label %scan, label %test
If a loop doesn't have anything like this, it may be e.g. while(true){…}.
BasicBlock::phis() tells you about its phi nodes, and PHINode::users() returns a list of things that use it. In the example above, %current is an arithmetic operation, is one of the users of %previous, and %previous' getIncomingValue(1) returns %current too.
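The check described above can be sketched without LLVM at all. The block below is a toy model, not LLVM code: ToyInstr, ToyLoop and findInductionPhi are hypothetical names invented for illustration, and "the incoming block is outside the loop" stands in for the real dominance requirement. In an actual pass the same questions would be asked through LoopInfo, BasicBlock::phis() and PHINode::getIncomingValue().

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for IR concepts. These are hypothetical types, NOT LLVM classes.
struct ToyInstr {
    std::string name;                         // value defined, e.g. "%current"
    std::string opcode;                       // "phi", "add", "sub", "br", ...
    std::vector<std::string> operands;        // names of the values used
    std::vector<std::string> incomingBlocks;  // phi only: source block of each operand
};

struct ToyLoop {
    std::set<std::string> blocks;             // every block in the loop, header included
    std::vector<ToyInstr> instrs;             // instructions of the loop, header first
};

// Returns the name of the first phi that looks like an induction variable,
// or an empty string if the loop has none (e.g. while(true){...}).
std::string findInductionPhi(const ToyLoop& loop) {
    for (const ToyInstr& phi : loop.instrs) {
        if (phi.opcode != "phi") continue;
        bool entersFromOutside = false;       // simplification of "dominates the header"
        std::string inLoopValue;
        for (std::size_t k = 0; k < phi.operands.size(); ++k) {
            if (loop.blocks.count(phi.incomingBlocks[k])) inLoopValue = phi.operands[k];
            else entersFromOutside = true;
        }
        if (!entersFromOutside || inLoopValue.empty()) continue;
        // The in-loop value must be an arithmetic operation that uses the phi.
        for (const ToyInstr& step : loop.instrs) {
            if (step.name != inLoopValue) continue;
            const bool arithmetic = step.opcode == "add" || step.opcode == "sub";
            bool usesPhi = false;
            for (const std::string& op : step.operands) usesPhi = usesPhi || op == phi.name;
            if (arithmetic && usesPhi) return phi.name;
        }
    }
    return "";
}
```

Fed the scan loop from the IR example, this reports %previous; a loop whose header has no phi yields an empty string, which is the case where the pass would skip the loop.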
I'm coding a simple function using std::vector (below), where the input is an integer vector and the function iterates over the elements of the vector.
In terms of space and time efficiency, which of the following versions is more suitable?
HugeClass is actually a big-integer class with complex arithmetic; I use simple arithmetic below for simplicity.
1) Passes the dimension of the vector as a parameter
void addWithDim(HugeClass& huge, std::vector<int>& vec, int dim){ // name missing in the original; addWithDim is a placeholder
for(int i=0;i<dim;i++){
huge+=vec[i];
}
}
2) Calls a std::vector.size() to iterate
void addWithSize(HugeClass& huge, std::vector<int>& vec){ // name missing in the original; addWithSize is a placeholder
for(int i=0;i<vec.size();i++){
huge+=vec[i];
}
}
dim can range in [100,1000000]
The syntax of a for loop in C++ is:
for ( init; condition; increment ) {
statement(s);
}
Here is the flow of control in a for loop:
The init step is executed first, and only once. This step allows you to declare and initialize any loop control variables. You are not required to put a statement here, as long as a semicolon appears.
Next, the condition is evaluated. If it is true, the body of the loop is executed. If it is false, the body of the loop does not execute and flow of control jumps to the next statement just after the for loop.
After the body of the for loop executes, the flow of control jumps back up to the increment statement. This statement allows you to update any loop control variables. This statement can be left blank, as long as a semicolon appears after the condition.
So in the case of
for(int i=0;i<vec.size();i++) {
huge+=vec[i];
}
vec.size() is called on each iteration, but it is probably inlined and is probably a simple function.
On top of which:
A smart enough optimizer may be able to deduce that it is a loop invariant with no side effects and elide it entirely (this is easier if the code is inlined, but may be possible even if it is not, if the compiler does global optimization).
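To make the comparison concrete, here is a minimal sketch of the two variants plus a range-based loop, which sidesteps the question. HugeClass is a trivial stand-in for the asker's big-integer class, and the function names addWithDim, addWithSize and addWithRangeFor are placeholders, not names from the question.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the asker's big-integer class (assumption: only += is needed).
struct HugeClass {
    long long value = 0;
    HugeClass& operator+=(int x) { value += x; return *this; }
};

// Variant 1: the caller passes the number of elements explicitly.
void addWithDim(HugeClass& huge, const std::vector<int>& vec, int dim) {
    for (int i = 0; i < dim; ++i) {
        huge += vec[i];
    }
}

// Variant 2: vec.size() appears in the condition and is evaluated per iteration
// (in practice it is usually inlined to a cheap pointer subtraction).
void addWithSize(HugeClass& huge, const std::vector<int>& vec) {
    for (std::size_t i = 0; i < vec.size(); ++i) {
        huge += vec[i];
    }
}

// Variant 3: range-based for, which determines begin/end once.
void addWithRangeFor(HugeClass& huge, const std::vector<int>& vec) {
    for (int x : vec) {
        huge += x;
    }
}
```

All three compute the same result as long as dim equals vec.size(); if the arithmetic in HugeClass is expensive, it will likely dominate any cost of the size() call, so measuring is the only reliable way to tell them apart.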
I have a for loop which accesses many memory pointers in each iteration. For each of these memory pointers, I created an index. My problem is that when I try to use OpenMP to parallelize this loop, I get the following error:
error: expected iteration declaration or initialization
I thought that the cause of this error would be one of the following:
- OpenMP does not accept an increment other than ++ or --
- OpenMP does not accept multiple initializations in a loop
For performance reasons, it is important to me to use these multiple indexes. Does anybody know the answer to my problem?
Here is the code:
#pragma omp parallel default(shared)
{
int tID = omp_get_thread_num();
int i, iCF, iPF, iNF, iPJG, iCJG, iNJG, iPRJG, iCRJG;
#pragma omp for nowait
for(i=0, iCF=0, iPF=0, iNF=sqrBcksDim, iPJG=0, iCJG=0, iNJG=sqrBcksSize, iPRJG=0, iCRJG=0 ; iCF<RHSArraySize ; iPF=iCF, iCF=iNF, iNF+=sqrBcksDim, iPJG=iCJG, iCJG=iNJG, iNJG+=sqrBcksSize, iPRJG=iCRJG, iCRJG+=rectBcksSize, ++i)
{
}
}
Well, looking at that third clause, you’re doing a lot of inherently sequential computations that depend on the program state at the end of the previous iteration of the loop. You could move all of those operations but the += and ++ updates inside the body of the loop, and from the look of things possibly make the loop condition depend on iNF, correct? But some of them look like they still might be ordered. For a parallel algorithm, are there closed-form initializers you could use inside the loop body that depend only on i or something loop-invariant?
If not, and the inputs to each iteration really do depend on the results of previous iterations of the loop, then it’s not a parallel algorithm.
One suggestion:
Here’s how I would try to fix this. You can only initialize i and increment it by a constant within the loop; however, you can equivalently move all the rest of those operations inside the loop. For example, I don’t know what else goes on inside the loop body, but if iCF is initialized to 0, iNF to sqrBcksDim and at the end of each iteration, iCF is set to the previous value of iNF and iNF is incremented by sqrBcksDim, it looks like you could rewrite the loop into something like:
int i;
#pragma omp for nowait
for ( i=0; i < RHSArraySize/sqrBcksDim; ++i )
{
const int iCF = i*sqrBcksDim;
const int iNF = iCF + sqrBcksDim;
// ...
}
Can you do that for your other variables? If you really have a parallel algorithm here, you should be able to, because each run of the loop should only depend on i and loop invariants, which you can use in your initializers. You’ll need to declare a variable outside the loop if you’re going to refer to it outside the body of the loop, but for the time being, just declare a new local variable and don’t read any variable outside the loop that you also write to inside the loop. If there are no implicit sequential dependencies, you should be able to initialize them all at the start of the loop body.
You might not end up doing it that way, but it might help you think about how to refactor.
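As a sanity check on that idea, the sketch below derives closed forms for all eight indices in the question's loop header and compares them against the original sequential updates. The helper names (Indices, sequentialIndices, closedFormIndices) are made up for illustration, the closed forms assume sqrBcksDim > 0, and the trip count is rounded up so that it matches the iCF < RHSArraySize condition even when RHSArraySize is not a multiple of sqrBcksDim.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Order: iCF, iPF, iNF, iPJG, iCJG, iNJG, iPRJG, iCRJG
using Indices = std::array<int, 8>;

// The loop header from the question, run sequentially, recording each iteration's indices.
std::vector<Indices> sequentialIndices(int RHSArraySize, int sqrBcksDim,
                                       int sqrBcksSize, int rectBcksSize) {
    std::vector<Indices> out;
    int iCF = 0, iPF = 0, iNF = sqrBcksDim, iPJG = 0, iCJG = 0,
        iNJG = sqrBcksSize, iPRJG = 0, iCRJG = 0;
    for (; iCF < RHSArraySize;
         iPF = iCF, iCF = iNF, iNF += sqrBcksDim, iPJG = iCJG, iCJG = iNJG,
         iNJG += sqrBcksSize, iPRJG = iCRJG, iCRJG += rectBcksSize) {
        out.push_back(Indices{iCF, iPF, iNF, iPJG, iCJG, iNJG, iPRJG, iCRJG});
    }
    return out;
}

// Same values, but each iteration depends only on i and loop invariants,
// which is the canonical form OpenMP needs.
std::vector<Indices> closedFormIndices(int RHSArraySize, int sqrBcksDim,
                                       int sqrBcksSize, int rectBcksSize) {
    // iCF = i * sqrBcksDim < RHSArraySize  <=>  i < ceil(RHSArraySize / sqrBcksDim)
    const int n = std::max(0, (RHSArraySize + sqrBcksDim - 1) / sqrBcksDim);
    std::vector<Indices> out(n);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < n; ++i) {
        const int prev = std::max(i - 1, 0);  // the "previous" indices start at 0
        out[i] = Indices{i * sqrBcksDim,  prev * sqrBcksDim,  (i + 1) * sqrBcksDim,
                         prev * sqrBcksSize, i * sqrBcksSize, (i + 1) * sqrBcksSize,
                         prev * rectBcksSize, i * rectBcksSize};
    }
    return out;
}
```

Whether the real loop body can use these depends on what it reads and writes; the sketch only shows that the index bookkeeping itself carries no sequential dependency.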
I am just wondering why I would do this in C++
for(int i=0, n=something.size(); i<n; ++i)
vs
for(int i=0; i<something.size(); ++i)
..
Assuming syntactically correct versions of both samples, if the call to something.size() were expensive, the first sample would potentially be more efficient because it saves one call per loop iteration. Even so, you should measure whether it actually makes a difference.
Note that the two would have different semantics if the size of something were to change inside of the loop.
The loop condition is evaluated before every loop round, so if the operand of the comparison doesn't change (i.e. you don't mutate the sequence during its iteration), then you don't need to recompute the operand each time and instead hoist it out.
Whether that makes a difference depends on how much the compiler can see of the size() call. For instance, if it can prove that the result cannot change during the iteration, then it may already do the hoisting for you. If in doubt, compile both versions and compare the machine code.
If you do
for(int i=0; i<something.size(); ++i);
it will be correct.
You should check in a C++ reference what a for loop looks like.
Your second example is invalid C++ code.
The two examples are not the same.
for(int i=0, n=something.size(); i<n; ++i)
{
// ....
}
evaluates something.size() only once.
for(int i=0; i<something.size(); ++i) // Syntax corrected
{
// ....
}
evaluates something.size() on each iteration.
So they could behave very differently if something.size() changed during the loop.
If you know something.size() will not change, the first solution may be preferable for performance reasons (i.e. only one call to something.size()).
If something.size() can change (e.g. in the body of the for loop), the second option is the way to go.
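A small sketch of how the two forms diverge when the container grows inside the loop. The function names are hypothetical, and the push_back on the first iteration is only there to force the size to change.

```cpp
#include <cstddef>
#include <vector>

// Bound cached once in the init step: later growth is not seen by the condition.
std::size_t iterationsWithCachedSize(std::vector<int> something) {
    std::size_t count = 0;
    for (std::size_t i = 0, n = something.size(); i < n; ++i) {
        if (i == 0) something.push_back(42);  // grows the container mid-loop
        ++count;
    }
    return count;
}

// Bound re-evaluated before every iteration: the appended element is visited too.
std::size_t iterationsWithLiveSize(std::vector<int> something) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < something.size(); ++i) {
        if (i == 0) something.push_back(42);  // grows the container mid-loop
        ++count;
    }
    return count;
}
```

Starting from three elements, the cached version runs three iterations and the live version runs four, so the choice is about semantics first and performance second.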
I am generalizing another problem I have that has a similar recursive call. In my case, the variables being used are strings, so I can't simply pass by value to avoid the code before and after the recursive call in the loop. Is there a way to turn this into an iterative loop? Please assume that the code before and after the recursive call in the loop cannot be changed to make this specific instance work.
This code tests to see if the sum of any combination of ints from nums adds up to zero. The original value for index is 0, and max is the maximum number of numbers I want to add up in any given solution.
For further clarification, the numbers can be repeated, so I can't just try all possible combinations, because there are infinitely many.
void findSolution(const vector<int>& nums, vector<int>& my_list, int& mySum,
int index, const int max)
{
if(mySum == 0) {
/* print my_list and exit(0) */
}
if(index < max) {
for(int i = 0; i < nums.size(); ++i) {
my_list.push_back(nums[i]);
mySum += nums[i];
findSolution(nums, my_list, mySum, index+1, max);
mySum -= nums[i];
my_list.pop_back();
}
}
}
Maintain a manual stack.
In your case the only independent state of each recursive call is the value of i and the value of index. index is just the recursive call depth.
Create a std::vector<int> to serve as your stack, and reserve max elements for it. Replace the loop with while (true).
Before entering the loop, push zero on the stack.
Break the code in your loop into three parts: A, B and C. A is before the recursive call, B is the recursive call itself, and C is after. Included in C is the increment at the top of the loop, which happens after C in the original code.
The first thing you do in the loop is check if the top of the stack is nums.size() or bigger. If so, pop the stack; if the stack is now empty, break, otherwise execute C and continue.
Then execute A. Then, if the stack size is less than max, push 0 on the stack and continue. Otherwise execute C. You can remove the duplication of the C code with a flag variable.
Remember that the top of the stack replaces any references to i. Basically we are replacing a recursive call, which automatically maintains a stack for us, with manually maintaining the same stack, but only storing the absolute least amount we can get away with. The start of the loop does double duty as both the end of a recursive call and the start of the loop, so we can do away with goto.
Give it a try, and worry about the explanation after you have seen it. The stuff about the flag variable makes more sense after the code is in place.
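One way the steps above might look in code is sketched here. It is not the asker's final code: findSolutionIterative is a made-up name, it returns true instead of printing and calling exit(0), and it tests mySum only after a number has been added, so an empty list is never reported when mySum starts at 0. The C part is written out twice rather than merged with a flag variable, to keep the correspondence with the description visible.

```cpp
#include <cstddef>
#include <vector>

// Iterative replacement for the recursive findSolution, using a manual stack of
// loop counters. On success, my_list holds the combination and mySum is 0.
bool findSolutionIterative(const std::vector<int>& nums, std::vector<int>& my_list,
                           int& mySum, const int max) {
    if (max <= 0 || nums.empty()) return false;

    std::vector<std::size_t> stack;      // stack.back() plays the role of i
    stack.reserve(static_cast<std::size_t>(max));
    stack.push_back(0);                  // enter the outermost loop

    while (true) {
        if (stack.back() >= nums.size()) {
            // The loop at this depth is finished: "return" to the caller.
            stack.pop_back();
            if (stack.empty()) break;
            // C: undo the caller's choice, then advance the caller's i.
            mySum -= nums[stack.back()];
            my_list.pop_back();
            ++stack.back();
            continue;
        }

        // A: code before the recursive call.
        const int value = nums[stack.back()];
        my_list.push_back(value);
        mySum += value;
        if (mySum == 0) return true;     // the check the callee would do on entry

        // B: the "recursive call" is just a new loop counter on the stack.
        if (stack.size() < static_cast<std::size_t>(max)) {
            stack.push_back(0);
            continue;
        }

        // C: the call would have returned immediately (depth limit reached).
        mySum -= value;
        my_list.pop_back();
        ++stack.back();
    }
    return false;
}
```

The stack never holds more than max counters, and when no combination exists my_list and mySum are left as they were on entry.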
Considering this code:
std::vector<myObject*> veryLargeArray;
for (int i = 0; i < veryLargeArray.size(); ++i)
{
param_type* currParams = veryLargeArray[i]->GetParams<param_type>();
currParams->phi = /* some complex formula */;
}
How would I step that code such that I know what answer is being stored in phi before another iteration of the loop starts which will effectively destroy currParams and with it my chances of watching its values in the debugger?
I run into this situation all too often. My current solutions are either to recompile the code with a dummy variable just before the end of the block, where I then put the breakpoint, or to go through the array of values, which may be huge, just to see what value was stored; the latter may also require extra work to convert the stored param_type into the correct object. Neither solution is ideal: the first introduces warnings (which are treated as errors, so I have to set per-file rules) and requires recompiling the code, both of which I would like to avoid, while the second wastes time.
You could have a tracepoint output the value of phi on each iteration through the loop. You should even be able to combine this with breakpoint conditions.
Set a breakpoint on the closing bracket. Open the breakpoints window (Ctrl+D, B) and select your breakpoint in the list. Right-click and select "Condition". In the condition dialog enter "i==veryLargeArray.size()-1". Click OK and press F5 ;-)
You could declare a variable outside of the loop to store your value between iterations and set a breakpoint on the closing bracket.
std::vector<myObject*> veryLargeArray;
int inspector; // assuming currParams->phi is int, change type accordingly
for (int i = 0; i < veryLargeArray.size(); ++i)
{
param_type* currParams = veryLargeArray[i]->GetParams<param_type>();
currParams->phi = /* some complex formula */;
inspector = currParams->phi;
}