Rewriting code to remove loop-carried dependence


I'm trying to rewrite this outer for loop to allow multiple threads to execute the iterations in parallel. Actually, for now I just want two threads to compute this although a more general solution would be ideal. The issue is that there's a loop carried dependence: nval[j] is added to the previous value of nval[j] along with this iteration's value of avval.
void loop_func(Arr* a, int blen, double* nval) {
    // an Arr is a struct of an array and len field
    assert(a->len < blen);
    long long int i = 0, j = 0, j_lower = 0, j_upper = 0;
    double avval = 0.0;
    for (i = 0; i < blen; i++)
        nval[i] = 0.0;
    const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
    for (i = 1; i < a->len - 1; i++) {
        // range of nval entries that a->val[i] is spread over
        j_lower = (long long)(0.5 * (2 * i - 1) * quot);
        j_upper = (long long)(0.5 * (2 * i + 1) * quot);
        printf("a->val index: %lld\t\tnval index: %lld-%lld\n", i, j_lower, j_upper);
        avval = a->val[i] / (j_upper - j_lower + 1);
        for (j = j_lower; j <= j_upper; j++) {
            nval[j] += avval;
        }
    }
}
For convenience, here is some of the output produced by the printf above.
a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13
Ideally, I'd want thread 1 to handle a->val indices 1-3 and thread 2 to handle a->val indices 4-6.
Question 1: can anyone describe a code transformation that would remove this dependence?
Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.
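One common transformation for this kind of overlap is privatization followed by a reduction: each chunk of the outer loop accumulates into its own buffer, and the buffers are summed afterwards. The following is only a minimal sketch under assumptions: the input is held in a std::vector instead of the Arr struct, and the names accumulate_range and spread_two_chunks are made up for illustration. The two accumulate_range calls write to different buffers, so each could be handed to its own thread (for example via std::thread); they are called one after the other here to keep the sketch free of threading setup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds the contribution of val[i_begin .. i_end) to `out` (size blen).
// Same index arithmetic as the original loop body; `out` is only added to.
void accumulate_range(const std::vector<double>& val, std::size_t blen,
                      std::size_t i_begin, std::size_t i_end,
                      std::vector<double>& out) {
    assert(out.size() == blen);
    const double quot = static_cast<double>(blen) / static_cast<double>(val.size());
    for (std::size_t i = i_begin; i < i_end; ++i) {
        const auto j_lower = static_cast<std::size_t>(0.5 * (2.0 * i - 1.0) * quot);
        const auto j_upper = static_cast<std::size_t>(0.5 * (2.0 * i + 1.0) * quot);
        assert(j_upper < blen);
        const double avval = val[i] / static_cast<double>(j_upper - j_lower + 1);
        for (std::size_t j = j_lower; j <= j_upper; ++j)
            out[j] += avval;
    }
}

// Splits the outer loop in two halves, each with a private output buffer,
// then merges the buffers. The halves never touch the same memory.
std::vector<double> spread_two_chunks(const std::vector<double>& val, std::size_t blen) {
    assert(val.size() >= 2 && val.size() < blen);
    const std::size_t first = 1, last = val.size() - 1;  // bounds of the original loop
    const std::size_t mid = first + (last - first) / 2;
    std::vector<double> part_a(blen, 0.0), part_b(blen, 0.0);
    accumulate_range(val, blen, first, mid, part_a);  // work for thread 1
    accumulate_range(val, blen, mid, last, part_b);   // work for thread 2
    for (std::size_t j = 0; j < blen; ++j)            // reduction step
        part_a[j] += part_b[j];
    return part_a;
}
```

The cost of this approach is one extra buffer of blen doubles per chunk plus the merge pass. Since only the boundary entries of neighbouring ranges overlap here, a leaner variant could privatize just those entries.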

Related

Shortest path question involving transfer in cyclic graph

I'm trying to solve a graph problem:
There are (n+1) cities (numbered from 0 to n) and (m+1) bus lines (numbered from 0 to m).
(A line may contain repeated cities, meaning the line has a cycle.)
Each line covers several cities, and it takes t_ij to ride from city i to city j (t_ij may differ between lines).
Moreover, each time you get on a bus it costs an extra transfer_time.
An edge looks like this: city i --(time)--> city j
Example1:
n = 2, m = 2, start = 0, end = 2
line0:
0 --(1)--> 1 --(1)--> 2; transfer_time = 1
line1:
0 --(2)--> 2; transfer_time = 2
Line0 takes 1+1+1 = 3 and line1 takes 2+2 = 4, so the min is 3.
Example2:
n = 4, m = 0, start = 0, end = 4
line0:
0 --(2)--> 1 --(3)--> 2 --(3)-->3 --(3)--> 1 --(2)--> 4; transfer_time = 1
it takes 1(get in at 0) + 2(from 0 to 1) + 1(get off and get in, transfer) + 2 = 6
I've tried to solve it with Dijkstra's algorithm, but it fails on graphs with cycles (like Example2).
Below is my code.
struct Edge {
int len;
size_t line_no;
};
class Solution {
public:
Solution() = default;
//edges[i][j] is a vector, containing ways from city i to j in different lines
int findNearestWay(vector<vector<vector<Edge>>>& edges, vector<int>& transfer_time, size_t start, size_t end) {
size_t n = edges.size();
vector<pair<int, size_t>> distance(n, { INT_MAX / 2, -1 }); //first: len, second: line_no
distance[start].first = 0;
vector<bool> visited(n);
int cur_line = -1;
for (int i = 0; i < n; ++i) {
int next_idx = -1;
//find the nearest city
for (int j = 0; j < n; ++j) {
if (!visited[j] && (next_idx == -1 || distance[j].first < distance[next_idx].first))
next_idx = j;
}
visited[next_idx] = true;
cur_line = distance[next_idx].second;
//update distance of other cities
for (int j = 0; j < n; ++j) {
for (const Edge& e : edges[next_idx][j]) {
int new_len = distance[next_idx].first + e.len;
//transfer
if (cur_line == -1 || cur_line != e.line_no) {
new_len += transfer_time[e.line_no];
}
if (new_len < distance[j].first) {
distance[j].first = new_len;
distance[j].second = e.line_no;
}
}
}
}
return distance[end].first == INT_MAX / 2 ? -1 : distance[end].first;
}
};
Is there a better way to solve it? Thanks in advance.

Your visited set looks wrong. The "nodes" you would feed into Dijkstra's algorithm cannot simply be cities, because that doesn't let you model the cost of switching from one line to another within a city. Each node must be a pair consisting of a city number and a bus line number, representing the bus line you are currently riding on. The bus line number can be -1 to represent that you are not on a bus, and the starting and destination nodes would have a bus line number of -1.
Then each edge would represent either the cost of staying on the line you are currently on to ride to the next city, or getting off the bus in your current city (which is free), or getting on a bus line within your current city (which has the transfer cost).
You said the cities are numbered from 0 to n but I see a bunch of loops that stop at n - 1 (because they use < n as the loop condition), so that might be another problem.
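The state-expansion idea above can be sketched as follows. This is a hedged illustration, not the asker's data layout: it assumes each line is given as an ordered stop list (the BusLine struct and shortestTrip are hypothetical names). It also goes one step further than a (city, line) pair and uses the stop position within the line as the on-bus state, because a line that visits the same city twice (as in Example2) has different onward rides at each visit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical input format: one bus line as an ordered list of stops.
struct BusLine {
    std::vector<int> stops;      // cities in visiting order (a city may repeat)
    std::vector<int> ride_time;  // ride_time[k] = time from stops[k] to stops[k + 1]
    int transfer_time;           // paid every time you board this line
};

// Minimal travel time from start to end, or -1 if end is unreachable.
long long shortestTrip(std::size_t city_count, const std::vector<BusLine>& lines,
                       std::size_t start, std::size_t end) {
    assert(start < city_count && end < city_count);
    using Arc = std::pair<std::size_t, long long>;  // (target node, cost)
    // Nodes [0, city_count) mean "standing in that city, off the bus".
    // Every stop position of every line gets its own node after those.
    std::vector<std::vector<Arc>> adj(city_count);
    for (const BusLine& line : lines) {
        assert(line.ride_time.size() + 1 == line.stops.size());
        const std::size_t base = adj.size();  // first node of this line
        adj.resize(base + line.stops.size());
        for (std::size_t k = 0; k < line.stops.size(); ++k) {
            const std::size_t city = static_cast<std::size_t>(line.stops[k]);
            adj[city].emplace_back(base + k, line.transfer_time);  // board here
            adj[base + k].emplace_back(city, 0);                   // get off for free
            if (k + 1 < line.stops.size())
                adj[base + k].emplace_back(base + k + 1, line.ride_time[k]);  // stay on
        }
    }
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);
    using Item = std::pair<long long, std::size_t>;  // (distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[start] = 0;
    pq.emplace(0, start);
    while (!pq.empty()) {
        const Item top = pq.top();
        pq.pop();
        if (top.first != dist[top.second]) continue;  // stale queue entry
        for (const Arc& arc : adj[top.second]) {
            const long long candidate = top.first + arc.second;
            if (candidate < dist[arc.first]) {
                dist[arc.first] = candidate;
                pq.emplace(candidate, arc.first);
            }
        }
    }
    return dist[end] == INF ? -1 : dist[end];
}
```

All edge costs are non-negative, so cycles in a line are not a problem for Dijkstra once the state includes where on the line you are.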

Construct mirror vector around the centre element in c++

I have a for-loop that is constructing a vector with 101 elements, using (let's call it equation 1) for the first half of the vector, with the centre element using equation 2, and the latter half being a mirror of the first half.
Like so,
double fc = 0.25;
const double PI = 3.1415926;
// initialise vectors
int M = 50;
int N = 101;
std::vector<double> fltr;
fltr.resize(N);
std::vector<int> mArr;
mArr.resize(N);
// Creating vector mArr of 101 elements, going from -50 to +50
int count;
for(count = 0; count < N; count++)
mArr[count] = count - M;
// using these elements, enter in to equations to form vector 'fltr'
int n;
for(n = 0; n < M+1; n++)
// for elements 0 to 50 --> use equation 1
fltr[n] = (sin((fc*mArr[n])-M))/((mArr[n]-M)*PI);
// for element 51 --> use equation 2
fltr[M] = fc/PI;
This part of the code works fine and does what I expect, but for elements 52 to 101, I would like to mirror around element 51 (the output value using equation)
For a basic example;
1 2 3 4 5 6 0.2 6 5 4 3 2 1
This is what I have so far, but it just outputs 0's as the elements:
for(n = N; n > M; n--){
for(i = 0; n < M+1; i++)
fltr[n] = fltr[i];
}
I feel like there is an easier way to mirror part of a vector but I'm not sure how.
I would expect the values to plot like this: (plot not included)

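After you have inserted the middle element, you can take a reverse iterator, step it past the midpoint, and copy that range back into the vector through std::back_inserter. Reserve the final capacity first, because a reallocation during the copy would invalidate the iterators being read from. The vector is named vec in the example.
vec.reserve(2 * vec.size() - 1);
auto rbeg = vec.rbegin(), rend = vec.rend();
++rbeg;
std::copy(rbeg, rend, std::back_inserter(vec));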
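An alternative that avoids reading from and appending to the same vector at once is to resize first and then fill the tail with std::reverse_copy. This is a small sketch; the function name mirror_around_last is made up for illustration, and it assumes the input holds the first half followed by the centre element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Input: first half followed by the centre element.
// Output: the same values with the first half mirrored after the centre.
std::vector<double> mirror_around_last(std::vector<double> vec) {
    assert(!vec.empty());
    const std::size_t half = vec.size() - 1;  // number of elements before the centre
    vec.resize(2 * half + 1);                 // make room for the mirrored tail
    // Source [0, half) and destination [half + 1, 2 * half + 1) do not overlap.
    std::reverse_copy(vec.begin(), vec.begin() + half, vec.begin() + half + 1);
    return vec;
}
```

For the basic example in the question, passing 1 2 3 4 5 6 0.2 yields 1 2 3 4 5 6 0.2 6 5 4 3 2 1.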

Let's look at your code:
for(n = N; n > M; n--)
for(i = 0; n < M+1; i++)
fltr[n] = fltr[i];
And let's make things shorter, N = 5, M = 3,
array is 1 2 3 0 0 and should become 1 2 3 2 1
We start your first outer loop with n = 3, pointing us to the first zero. Then, in the inner loop, we set i to 0 and call fltr[3] = fltr[0], leaving us with the array as
1 2 3 1 0
We could now continue, but it should be obvious that this first assignment was useless.
With this I want to give you a simple way how to go through your code and see what it actually does. You clearly had something different in mind. What should be clear is that we do need to assign every part of the second half once.
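Assuming the inner condition was meant to be i < M+1, what your code does is for each value of n to change the value of fltr[n] repeatedly, ending with setting it to fltr[M] in any case, regardless of what value n has. As actually written, the condition n < M+1 is already false whenever n > M, so the inner body never runs and the second half keeps its initial zeros, which matches the output you see. With the intended condition, all values in the second half of the array would end up the same as the center; in my example it ends with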
1 2 3 3 3
Note that there is also a direct error: starting with n = N and then accessing fltr[n]. N is out of bounds for an array of size N.
To give you a very simple working solution:
for(int i=0; i<M; i++)
{
fltr[N-i-1] = fltr[i];
}
N-i-1 is the mirrored address of i (i = 0 -> N-i-1 = 101-0-1 = 100, last valid address in an array with 101 entries).
Now, I saw several guys answering with a more elaborate code, but I thought that as a beginner, it might be beneficial for you to do this in a very simple manner.
Other than that, as @Pzc already said in the comments, you could do this assignment in the loop where the data is generated.
Another thing, with your code
for(n = 0; n < M+1; n++)
// for elements 0 to 50 --> use equation 1
fltr[n] = (sin((fc*mArr[n])-M))/((mArr[n]-M)*PI);
// for element 51 --> use equation 2
fltr[M] = fc/PI;
I have two issues:
First, the indentation makes it look like fltr[M]=.. would be in the loop. Don't do that, not even if this should have been a mistake when you wrote the question and is not like this in the code. This will lead to errors in the future. Indentation is important. Using the auto-indentation of your IDE is an easy way to go. And try to use brackets, even if it is only one command.
Second, n < M+1 as a condition includes the center. The center is located at address 50, and 50 < 50+1. You haven't seen any problem because you overwrite it after the loop, but in a different situation this can easily produce errors.
There are other small things I'd change, and I recommend that, when your code works, you post it on CodeReview.

Let's use std::iota, std::transform, and std::copy instead of raw loops:
const double fc = 0.25;
constexpr double PI = 3.1415926;
const std::size_t M = 50;
const std::size_t N = 2 * M + 1;
std::vector<double> mArr(M);
std::iota(mArr.rbegin(), mArr.rend(), 1.); // = [M, M - 1, ..., 1]
const auto fn = [=](double m) { return std::sin((fc * m) + M) / ((m + M) * PI); };
std::vector<double> fltr(N);
std::transform(mArr.begin(), mArr.end(), fltr.begin(), fn);
fltr[M] = fc / PI;
std::copy(fltr.begin(), fltr.begin() + M, fltr.rbegin());

Divide array into smaller consecutive parts such that NEO value is maximal

At this year's Bubble Cup (now finished) there was a problem called NEO, which I couldn't solve. It asks:
Given an array of n integers, we divide it into one or more parts, where each part is a run of consecutive elements. The value of a part is the sum of all elements in the part multiplied by its length, and the NEO value of a division is the sum of the values of its parts.
Example: We have array: [ 2 3 -2 1 ]. If we divide it like: [2 3] [-2 1]. Then NEO = (2 + 3) * 2 + (-2 + 1) * 2 = 10 - 2 = 8.
The number of elements in the array is smaller than 10^5 and the numbers are integers between -10^6 and 10^6.
I've tried something like divide and conquer to constantly split array into two parts if it increases the maximal NEO number otherwise return the NEO of the whole array. But unfortunately the algorithm has worst case O(N^2) complexity (my implementation is below) so I'm wondering whether there is a better solution
EDIT: My algorithm (greedy) doesn't work, taking for example [1,2,-6,2,1] my algorithm returns the whole array while to get the maximal NEO value is to take parts [1,2],[-6],[2,1] which gives NEO value of (1+2)*2+(-6)+(1+2)*2=6
#include <iostream>
long long int maxInterval(long long int suma[],int first,int N) // returns long long so large NEO values are not truncated
{
long long int max = -1000000000000000000LL;
long long int curr;
if(first==N) return 0;
int k;
for(int i=first;i<N;i++)
{
if(first>0) curr = (suma[i]-suma[first-1])*(i-first+1)+(suma[N-1]-suma[i])*(N-1-i); // Split the array into elements from [first..i] and [i+1..N-1] store the corresponding NEO value
else curr = suma[i]*(i-first+1)+(suma[N-1]-suma[i])*(N-1-i); // Same except that here first = 0 so suma[first-1] doesn't exist
if(curr > max) max = curr,k=i; // find the maximal NEO value for splitting into two parts
}
if(k==N-1) return max; // If the max when we take the whole array then return the NEO value of the whole array
else
{
return maxInterval(suma,first,k+1)+maxInterval(suma,k+1,N); // Split the 2 parts further if needed and return it's sum
}
}
int main() {
int T;
std::cin >> T;
for(int j=0;j<T;j++) // Iterate over all the test cases
{
int N;
long long int NEO[100010]; // Values, could be long int but just to be safe
long long int suma[100010]; // sum[i] = sum of NEO values from NEO[0] to NEO[i]
long long int sum=0;
int k;
std::cin >> N;
for(int i=0;i<N;i++)
{
std::cin >> NEO[i];
sum+=NEO[i];
suma[i] = sum;
}
std::cout << maxInterval(suma,0,N) << std::endl;
}
return 0;
}

This is not a complete solution but should provide some helpful direction.
1. Combining two groups that each have a positive sum (or one of the sums is non-negative) would always yield a bigger NEO than leaving them separate:
m * a + n * b < (m + n) * (a + b) where a, b > 0 (or a > 0, b >= 0); m and n are subarray lengths
2. Combining a group with a negative sum with an entire group of non-negative numbers always yields a greater NEO than combining it with only part of the non-negative group. But excluding the group with the negative sum could yield an even greater NEO:
[1, 1, 1, 1] [-2] => m * a + 1 * (-b)
Now, imagine we gradually move the dividing line to the left, increasing the sum b is combined with. While the expression on the right is negative, the NEO for the left group keeps decreasing. But if the expression on the right gets positive, relying on our first assertion (see 1.), combining the two groups would always be greater than not.
3. Combining negative numbers alone in sequence will always yield a smaller NEO than leaving them separate:
-a - b - c ... = -1 * (a + b + c ...)
l * (-a - b - c ...) = -l * (a + b + c ...)
-l * (a + b + c ...) < -1 * (a + b + c ...) where l > 1; a, b, c ... > 0
O(n^2) time, O(n) space JavaScript code:
function f(A){
A.unshift(0);
let negatives = [];
let prefixes = new Array(A.length).fill(0);
let m = new Array(A.length).fill(0);
for (let i=1; i<A.length; i++){
if (A[i] < 0)
negatives.push(i);
prefixes[i] = A[i] + prefixes[i - 1];
m[i] = i * (A[i] + prefixes[i - 1]);
for (let j=negatives.length-1; j>=0; j--){
let negative = prefixes[negatives[j]] - prefixes[negatives[j] - 1];
let prefix = (i - negatives[j]) * (prefixes[i] - prefixes[negatives[j]]);
m[i] = Math.max(m[i], prefix + negative + m[negatives[j] - 1]);
}
}
return m[m.length - 1];
}
console.log(f([1, 2, -5, 2, 1, 3, -4, 1, 2]));
console.log(f([1, 2, -4, 1]));
console.log(f([2, 3, -2, 1]));
console.log(f([-2, -3, -2, -1]));
Update
This blog provides that we can transform the dp queries from
dp_i = sum_i*i + max(for j < i) of ((dp_j + sum_j*j) + (-j*sum_i) + (-i*sum_j))
to
dp_i = sum_i*i + max(for j < i) of (dp_j + sum_j*j, -j, -sum_j) ⋅ (1, sum_i, i)
which means we could then look at each iteration for an already seen vector that would generate the largest dot product with our current information. The math alluded to involves convex hull and farthest point query, which are beyond my reach to implement at this point but will make a study of.
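For reference, the untransformed recurrence behind those formulas, dp_i = max over j < i of dp_j + (sum_i - sum_j) * (i - j), can be written directly in C++. This is only a quadratic sketch for checking small cases (the name maxNeo is made up); it does not implement the convex hull optimization and would be too slow for n near 10^5.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// dp[i] = best NEO value for the first i elements.
// sum[i] = a[0] + ... + a[i - 1], so the part (j, i] has sum[i] - sum[j].
long long maxNeo(const std::vector<long long>& a) {
    const std::size_t n = a.size();
    std::vector<long long> sum(n + 1, 0), dp(n + 1, 0);
    for (std::size_t i = 1; i <= n; ++i)
        sum[i] = sum[i - 1] + a[i - 1];
    for (std::size_t i = 1; i <= n; ++i) {
        long long best = std::numeric_limits<long long>::min();
        for (std::size_t j = 0; j < i; ++j) {
            // Last part covers elements j .. i-1 and has length i - j.
            const long long part = (sum[i] - sum[j]) * static_cast<long long>(i - j);
            best = std::max(best, dp[j] + part);
        }
        dp[i] = best;
    }
    return dp[n];
}
```

On the arrays from the question, [2, 3, -2, 1] gives 16 (the whole array as one part) and [1, 2, -6, 2, 1] gives 6, matching the split [1,2],[-6],[2,1].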

How to reduce execution time in C++ for the following code?

I have written this code which has an execution time of 3.664 sec but the time limit is 3 seconds.
The question is this-
N teams participate in a league cricket tournament on Mars, where each
pair of distinct teams plays each other exactly once. Thus, there are a total
of (N × (N-1))/2 matches. An expert has assigned a strength to each team,
a positive integer. Strangely, the Martian crowds love one-sided matches
and the advertising revenue earned from a match is the absolute value of
the difference between the strengths of the two teams. Given the
strengths of the N teams, find the total advertising revenue earned from all
the matches.
Input format
Line 1 : A single integer, N.
Line 2 : N space separated integers, the strengths of the N teams.
#include<iostream>
using namespace std;
int main()
{
int n;
cin>>n;
int stren[200000];
for(int a=0;a<n;a++)
cin>>stren[a];
long long rev=0;
for(int b=0;b<n;b++)
{
int pos=b;
for(int c=pos;c<n;c++)
{
if(stren[pos]>stren[c])
rev+=(long long)(stren[pos]-stren[c]);
else
rev+=(long long)(stren[c]-stren[pos]);
}
}
cout<<rev;
}
Can you please give me a solution??

Rewrite your loop as:
sort(stren, stren + n); // std::sort from <algorithm>
for(int b=0;b<n;b++)
{
rev += (2 * b - n + 1) * static_cast<long long>(stren[b]);
}
Why does it work? Your loops make all pairs of 2 numbers and add the difference to rev. So in a sorted array, the bth item is subtracted (n-1-b) times and added b times. Hence the number 2 * b - n + 1
There can be 1 micro optimization that possibly is not needed:
sort(stren, stren + n); // std::sort from <algorithm>
for(int b = 0, m = 1 - n; b < n; b++, m += 2)
{
rev += m * static_cast<long long>(stren[b]);
}
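The counting argument can be checked against the original double loop. The sketch below is self-contained and uses std::vector instead of the fixed-size array; the function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: sum of |s[b] - s[c]| over all pairs, O(n^2).
long long revenueBruteForce(const std::vector<int>& s) {
    long long rev = 0;
    for (std::size_t b = 0; b < s.size(); ++b) {
        for (std::size_t c = b + 1; c < s.size(); ++c) {
            const long long d = static_cast<long long>(s[b]) - s[c];
            rev += d < 0 ? -d : d;
        }
    }
    return rev;
}

// After sorting, element b is added b times and subtracted (n - 1 - b) times,
// so its net coefficient is 2 * b - n + 1. O(n log n) because of the sort.
long long revenueSorted(std::vector<int> s) {
    std::sort(s.begin(), s.end());
    const long long n = static_cast<long long>(s.size());
    long long rev = 0;
    for (long long b = 0; b < n; ++b)
        rev += (2 * b - n + 1) * s[static_cast<std::size_t>(b)];
    return rev;
}
```

For the strengths 3 10 3 5 both functions return 23.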

In place of the if statement, use
rev += std::abs(stren[pos]-stren[c]);
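abs returns the absolute value of its argument, here the difference between the two strengths. This may be quicker than an if test and the ensuing branching, although an optimising compiler can often produce branch-free code for either form. The (long long) cast is also unnecessary although the compiler will probably optimise that out.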
There are other optimisations you could make, but this one should do it. If your abs function is poorly implemented on your system, you could always make use of this fast version for computing the absolute value of i:
(i + (i >> 31)) ^ (i >> 31) for a 32 bit int.
This has no branching at all and would beat even an inline ternary! (But you should use int32_t as your data type; if you have 64 bit int then you'll need to adjust my formula.) But we are in the realms of micro-optimisation here.

for(int b = 0; b < n; b++)
{
for(int c = b; c < n; c++)
{
rev += abs(stren[b]-stren[c]);
}
}
This should give you a speed increase, might be enough.

An interesting approach might be to collapse down the strengths from an array - if that distribution is pretty small.
So:
std::unordered_map<int, int> strengths;
for (int i = 0; i < n; ++i) {
int next;
cin >> next;
++strengths[next];
}
This way, we can reduce the number of things we have to sum:
long long rev = 0;
for (auto a = strengths.begin(); a != strengths.end(); ++a) {
for (auto b = std::next(a); b != strengths.end(); ++b) {
rev += static_cast<long long>(abs(a->first - b->first)) * a->second * b->second;
// strength difference times the number of pairs with these two strengths (in long long to avoid overflow)
}
}
cout << rev;
If the strengths tend to be repeated a lot, this could save a lot of cycles.

What exactly we are doing in this problem is: For all combinations of pairs of elements, we are adding up the absolute values of the differences between the elements of the pair. i.e. Consider the sample input
3 10 3 5
Ans (Take only absolute values) = (3-10) + (3-3) + (3-5) + (10-3) + (10-5) + (3-5) = 7 + 0 + 2 + 7 + 5 + 2 = 23
Notice that I have fixed 3, iterated through the remaining elements, found the differences and added them to Ans, then fixed 10, iterated through the remaining elements and so on till the last element
Unfortunately, N(N-1)/2 iterations are required for the above procedure, which wouldn't be ok for the time limit.
Could we better it?
Let's sort the array and repeat this procedure. After sorting, the sample input is now 3 3 5 10
Let's start by fixing the greatest element, 10 and iterating through the array like how we did before (of course, the time complexity is the same)
Ans = (10-3) + (10-3) + (10-5) + (5-3) + (5-3) + (3-3) = 7 + 7 + 5 + 2 + 2 = 23
We could rearrange the above as
Ans = (10)(3)-(3+3+5) + 5(2) - (3+3) + 3(1) - (3)
Notice a pattern? Let's generalize it.
Suppose we have an array of strengths arr[N] of size N indexed from 0
Ans = (arr[N-1])(N-1) - (arr[0] + arr[1] + ... + arr[N-2]) + (arr[N-2])(N-2) - (arr[0] + arr[1] + ... + arr[N-3]) + (arr[N-3])(N-3) - (arr[0] + arr[1] + ... + arr[N-4]) + ... and so on
Right. So let's put this new idea to work. We'll introduce a 'sum' variable. Some basic DP to the rescue.
For i=0 to N-2
sum = sum + arr[i]
Ans = Ans + (arr[i+1]*(i+1)-sum)
That's it, you just have to sort the array and iterate only once through it. Excluding the sorting part, it's down to about N iterations from N(N-1)/2, which is O(N) time. EDIT: including the sort, that is O(N log N) time overall.
Hope it helped!
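The running-sum procedure described above can be sketched in C++ like this (the function name is made up; strengths are stored as long long so the products cannot overflow int):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sort, then for each element arr[i + 1] add
// arr[i + 1] * (number of smaller-or-equal elements before it) - (their sum).
long long revenueRunningSum(std::vector<long long> arr) {
    std::sort(arr.begin(), arr.end());
    long long sum = 0;  // arr[0] + ... + arr[i]
    long long ans = 0;
    for (std::size_t i = 0; i + 1 < arr.size(); ++i) {
        sum += arr[i];
        ans += arr[i + 1] * static_cast<long long>(i + 1) - sum;
    }
    return ans;
}
```

On the sorted sample 3 3 5 10 the loop adds 0, then 4, then 19, for a total of 23.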

Time complexity of nested loop: where does cn(n+1)/2 come from?

Consider the following loop:
for (i =1; i <= n; i++) {
for (j = 1; j <= i; j++) {
k = k + i + j;
}
}
The outer loop executes n times. For i = 1, 2, ..., n, the inner loop is executed one time, two times, ..., and n times. Thus, the time complexity for the loop is
T(n) = c + 2c + 3c + 4c + ... + nc
= cn(n+1)/2
= (c/2)n^2 + (c/2)n
= O(n^2)
OK, so I don't understand how the time complexity T(n) becomes c + 2c + 3c + ... and then cn(n+1)/2. Where did that come from?

The sum 1 + 2 + 3 + 4 + ... + n is equal to n(n+1)/2; this is an arithmetic series, and the formula is often attributed to Gauss. Therefore,
c + 2c + 3c + ... + nc
= c(1 + 2 + 3 + ... + n)
= cn(n+1) / 2
This summation comes up a lot in algorithmic analysis and is useful to know when working with big-O notation.
Or is your question where the summation comes from at all?
Hope this helps!
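A quick way to convince yourself is to count how many times the inner body runs and compare that with n(n+1)/2. A small sketch (the function name is made up):

```cpp
#include <cassert>

// Counts executions of the inner loop body of the loop from the question.
long long countInnerIterations(long long n) {
    long long count = 0;
    for (long long i = 1; i <= n; ++i) {
        for (long long j = 1; j <= i; ++j) {
            ++count;  // stands in for the constant-cost statement k = k + i + j
        }
    }
    return count;
}
```

For n = 4 the inner body runs 1 + 2 + 3 + 4 = 10 times, which equals 4 * 5 / 2; multiplying by the constant cost c of one body execution gives T(n) = cn(n+1)/2.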
