Using omp parallel for in multiplication algorithm (BigInt multiplication)

For educational purposes, I'm developing a C++ library for working with large numbers represented as vectors of chars (vector<char>).
Here is the algorithm I am using for multiplication:
    string multiplicationInner(CharVector a, CharVector b) {
        reverse(a.begin(), a.end());
        reverse(b.begin(), b.end());
        IntVector stack(a.size() + b.size() + 1);
        // Accumulate partial products: digit i of a times digit j of b lands in column i + j.
        for (size_t i = 0; i < a.size(); i++)
            for (size_t j = 0; j < b.size(); j++)
                stack[i + j] += charToInt(a[i]) * charToInt(b[j]);
        // Propagate carries; stop one short so that stack[i + 1] stays in bounds.
        for (size_t i = 0; i + 1 < stack.size(); i++) {
            stack[i + 1] += stack[i] / 10;
            stack[i] %= 10;
        }
        CharVector stackChar = intVectorToCharVector(&stack);
        deleteZerosAtEnd(&stackChar);
        reverse(stackChar.begin(), stackChar.end());
        return charVectorToString(&stackChar);
    }
This function is called around a billion times in my program, so I would like to use #pragma omp parallel for in it.
My question is: how can I parallelize the first loop nest?
This is what I have tried:
int i, j;
#pragma omp parallel for
for (i = 0; i < a.size(); i++) {
for (j = 0; j < b.size(); j++)
stack[i + j] += charToInt(a[i]) * charToInt(b[j]);
}
With this change the algorithm no longer produces correct results.
Any advice is welcome.
Edit:
This variant works, but the benchmark shows that with omp parallel for it is 15x-20x slower than without it (CPU: M1 Pro, 8 cores).
#pragma omp parallel for schedule(dynamic)
for (int k = 0; k < a.size() + b.size(); k++) {
for (int i = 0; i < a.size(); i++) {
int j = k - i;
if (j >= 0 && j < b.size()) {
stack[k] += charToInt(a[i]) * charToInt(b[j]);
}
}
}
This is part of my program, where multiplication is called most often. (Miller-Rabin test)
BigInt modularExponentiation(BigInt base, BigInt exponent, BigInt mod) {
BigInt x = B_ONE; // 1
BigInt y = base;
while (exponent > B_ZERO) { // while exponent > 0
if (isOdd(exponent))
x = (x * y) % mod;
y = (y * y) % mod;
exponent /= B_TWO; // exponent /= 2
}
return (x % mod);
};
bool isMillerRabinTestOk(BigInt candidate) {
if (candidate < B_TWO)
return false;
if (candidate != B_TWO && isEven(candidate))
return false;
BigInt canditateMinusOne = candidate - B_ONE;
BigInt s = canditateMinusOne;
while (isEven(s))
s /= B_TWO;
for (int i = 0; i < MILLER_RABIN_TEST_ITERATIONS; i++) {
BigInt a = BigInt(rand()) % canditateMinusOne + B_ONE;
BigInt temp = s;
BigInt mod = modularExponentiation(a, temp, candidate);
while (temp != canditateMinusOne && mod != B_ONE && mod != canditateMinusOne) {
mod = (mod * mod) % candidate;
temp *= B_TWO;
}
if (mod != canditateMinusOne && isEven(temp))
return false;
}
return true;
};

Your loops do not have the right structure for parallelization: different values of i write to the same element stack[i + j], so parallel iterations of the outer loop conflict. However, you can transform them:
    for (k = 0; k < a.size() + b.size(); k++) {
        for (i = 0; i < a.size(); i++) {
            j = k - i;
            stack[k] += a[i] * b[j];
        }
    }
Now the outer loop has no conflicts, because each iteration writes only to its own stack[k]. Look at this as a "coordinate transformation": you're still traversing the same i/j row/column space, but now in new coordinates: k/i stands for diagonal/row.
Btw, this code is only schematic. Check your loop bounds (j must stay inside b), and use the right multiplication. I'm just indicating the principle here.
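The diagonal transformation can be sketched with explicit bounds as follows. This is a minimal sketch, not the asker's library code: the function name multiplyDigits and the representation (a std::vector<int> of decimal digits, least significant first) are assumptions made for illustration. Each thread accumulates into a private sum and writes one stack[k], so the parallel loop has no shared writes; carry propagation stays sequential. If the code is compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiplies two numbers given as decimal digits,
// least significant digit first (e.g. 123 is {3, 2, 1}).
std::vector<int> multiplyDigits(const std::vector<int>& a, const std::vector<int>& b) {
    if (a.empty() || b.empty()) return {0};
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    std::vector<int> stack(n + m, 0);

    // Each k is one anti-diagonal of the i/j grid; only stack[k] is written.
    #pragma omp parallel for schedule(static)
    for (int k = 0; k < n + m - 1; ++k) {
        const int lo = std::max(0, k - (m - 1)); // keeps j = k - i <= m - 1
        const int hi = std::min(k, n - 1);       // keeps j = k - i >= 0
        int sum = 0;
        for (int i = lo; i <= hi; ++i)
            sum += a[i] * b[k - i];
        stack[k] = sum;
    }

    // Carry propagation depends on the previous column, so it stays sequential.
    int carry = 0;
    for (std::size_t k = 0; k < stack.size(); ++k) {
        const int t = stack[k] + carry;
        stack[k] = t % 10;
        carry = t / 10;
    }
    while (stack.size() > 1 && stack.back() == 0) stack.pop_back();
    return stack;
}
```

For operands of only a few dozen digits, the cost of starting and synchronizing threads on every call can easily exceed the work inside the loop, which would be consistent with the slowdown reported in the question. Parallelizing at a coarser level (for example, testing independent candidates in parallel) may be a better fit, though that depends on the rest of the program.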

Related

vector subscript out of range line 1475

I can't figure out where the error is.
vector<int> subVector(const vector<int>& a, const vector<int>& b) {
vector<int> res;
res = a;
for (int i = 0; i < res.size() && i < b.size(); i++)
{
if (res[i] < b[i])
{
int k = 1;
while (res[i + k] == 0)
k++;
for (int j = 0; j < k; j++)
{
res[i + j + 1]--;
res[i + j] += 10;
}
}
res[i] -= b[i];
}
return res;
}
vector<int> addVector(const vector<int>& a, const vector<int>& b) {
vector<int> ans;
ans.resize(max(a.size(), b.size()) + 1);
for (int i = 0; i < a.size(); i++)
ans[i] = a[i];
for (int i = 0; i < b.size(); i++)
{
ans[i] += b[i];
ans[i + 1] += ans[i] / 10;
ans[i] = ans[i] % 10;
int k = i + 1;
while (ans[k] >= 10)
{
ans[k + 1] += ans[k] / 10;
ans[k] %= 10;
k++;
}
}
return ans;
}
At runtime I get the debug error "vector subscript out of range" (line 1475).
I tried to fix it but couldn't, and I don't understand what causes this error.

Wherever you access a vector with the [] operator and compute the index with some arithmetic, you need to make sure that the result does not exceed the vector's boundaries.
For example:
    vector<int> subVector(const vector<int>& a, const vector<int>& b) {
        vector<int> res;
        res = a;
        for (int i = 0; i < res.size() && i < b.size(); i++) // 'i' is guaranteed to be within the bounds of 'res'
        {
            if (res[i] < b[i])
            {
                int k = 1;
                while (res[i + k] == 0) // 'i + k' can exceed the bounds of 'res'
                // Rest of the code ...
This is just one example. The same pattern appears throughout your code (for instance ans[k + 1] in addVector).
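One way to avoid computed indices altogether is to carry a borrow from one position to the next. The following is a minimal sketch under stated assumptions: digits are stored least significant first, a represents a number greater than or equal to b, and the name subVectorSafe is made up for this illustration. Every index used is the loop variable itself, which the loop condition keeps in range.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bounds-safe variant: computes a - b on decimal digits stored
// least significant first. Assumes a >= b as numbers.
std::vector<int> subVectorSafe(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> res = a;
    int borrow = 0;
    for (std::size_t i = 0; i < res.size(); ++i) {
        // b may be shorter than a; treat its missing digits as zero.
        int d = res[i] - borrow - (i < b.size() ? b[i] : 0);
        if (d < 0) {
            d += 10;     // borrow from the next position
            borrow = 1;
        } else {
            borrow = 0;
        }
        res[i] = d;
    }
    return res;
}
```

For example, 1000 - 1 is subVectorSafe({0, 0, 0, 1}, {1}), which yields the digits {9, 9, 9, 0}; leading zeros are left in place, as in the original function.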

Dynamic dice sum - modulo

You have d dice, and each die has f faces numbered 1, 2, ..., f.
Return the number of possible ways (out of f^d total ways) modulo 10^9 + 7 to roll the dice so the sum of the face-up numbers equals target.
My code works well for small values of f, d and target, but it returns 0 for big values such as 30, 30, 500.
I am having a lot of difficulty working out where the modulo should be applied.
What is wrong with my solution?
int numRollsToTarget(int d, int f, int target)
{
long long int dp[d][target];
for (int i = 0; i < d; i++)
{
for (int j = 0; j < target; j++)
{
dp[i][j] = 0;
}
}
for (int i = 0; i < f && i < target; i++)
{
dp[0][i] = 1;
}
for (int i = 1; i < d; i++)
{
for (int j = 0; j < target; j++)
{
if (j >= i)
for (int k = max(0, j - f); k < min(j, f); k++)
dp[i][j] = (dp[i - 1][j - k - 1] % 1000000007 +
dp[i][j] % 1000000007) % 1000000007;
}
}
return dp[d - 1][target - 1];
}
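One likely culprit is the range of the innermost loop rather than the modulo: k starts at max(0, j - f), so once j reaches 2f the range is empty and the entry stays 0. A minimal sketch of the same recurrence with the face value as the loop variable is shown below. It assumes the table is indexed directly by the number of dice and by the sum (an extra row and column compared with the code above) and uses std::vector instead of a variable-length array; treat it as an illustration rather than the only possible fix.

```cpp
#include <algorithm>
#include <vector>

// dp[i][j] = number of ways to reach sum j with exactly i dice, modulo 1e9 + 7.
int numRollsToTarget(int d, int f, int target) {
    const long long MOD = 1000000007LL;
    if (d <= 0 || f <= 0 || target <= 0) return 0;
    std::vector<std::vector<long long>> dp(d + 1, std::vector<long long>(target + 1, 0));
    dp[0][0] = 1; // zero dice give sum 0 in exactly one way
    for (int i = 1; i <= d; ++i) {
        for (int j = 1; j <= target; ++j) {
            // The last die shows 'face'; the other i - 1 dice must sum to j - face.
            for (int face = 1; face <= std::min(f, j); ++face) {
                dp[i][j] = (dp[i][j] + dp[i - 1][j - face]) % MOD;
            }
        }
    }
    return static_cast<int>(dp[d][target]);
}
```

Because every stored entry is already reduced modulo 10^9 + 7, a single % per addition is enough and the intermediate sum cannot overflow a long long.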

How to raise a zero-one matrix to any power in C++?

I wrote code that raises a zero-one matrix to the power 2. However, I want it to work for any power the user enters. I tried several times, but it didn't work.
Here's the relevant part of the code.
Note: assume the user has already entered the (n*m) matrix "a"; n and m are equal, and both are denoted by s.
k=0;
for(int j=0; j<s; j++)
for(int i=0; i<s; i++)
{
m[k]=0;
for(int t=0; t<s; t++)
m[k]+=a[j][t]*a[t][i];
k++;
}

Here is my implementation for matrix exponentiation:
struct matrix {
intt m[K][K];
matrix() {
memset (m, 0, sizeof (m));
}
matrix operator * (matrix b) {
matrix c = matrix();
for (intt i = 0; i < K; i++) {
for (intt k = 0; k < K; k++) {
for (intt j = 0; j < K; j++) {
c.m[i][j] = (c.m[i][j] + m[i][k] * b.m[k][j]) % MOD;
}
}
}
return c;
}
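matrix pow (intt n) {
if (n <= 0) {
matrix id = matrix(); // A^0 is the identity matrix, not the zero matrix
for (intt i = 0; i < K; i++) id.m[i][i] = 1;
return id;
}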
if (n == 1) {
return *this;
}
if (n % 2 == 1) {
return (*this) * pow (n - 1);
} else {
matrix X = pow (n / 2);
return X * X;
}
}
};
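For the zero-one case specifically, a self-contained version might look like the sketch below. It assumes that "power of a zero-one matrix" means the Boolean power (entries combined with AND/OR instead of multiplication and addition); if the ordinary integer product is wanted, the inner statement would use += and * instead. The names BoolMatrix, boolProduct and boolPower are made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

using BoolMatrix = std::vector<std::vector<int>>; // square, entries are 0 or 1

// Boolean product: c[i][j] is 1 if a[i][t] and b[t][j] are both 1 for some t.
BoolMatrix boolProduct(const BoolMatrix& a, const BoolMatrix& b) {
    const std::size_t s = a.size();
    BoolMatrix c(s, std::vector<int>(s, 0));
    for (std::size_t i = 0; i < s; ++i)
        for (std::size_t t = 0; t < s; ++t)
            if (a[i][t])
                for (std::size_t j = 0; j < s; ++j)
                    if (b[t][j]) c[i][j] = 1;
    return c;
}

// Exponentiation by squaring: about log2(p) squarings instead of p - 1 products.
BoolMatrix boolPower(BoolMatrix base, unsigned p) {
    const std::size_t s = base.size();
    BoolMatrix result(s, std::vector<int>(s, 0));
    for (std::size_t i = 0; i < s; ++i) result[i][i] = 1; // identity = base^0
    while (p > 0) {
        if (p & 1u) result = boolProduct(result, base);
        base = boolProduct(base, base);
        p >>= 1;
    }
    return result;
}
```

With the matrix a from the question and a user-supplied exponent p, the call would be boolPower(a, p) after copying a into a BoolMatrix.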

Dynamic approach to the TSP

I'm having trouble recognizing why this algorithm doesn't return the shortest path for the TSP.
vector<int> tsp(int n, vector< vector<float> >& cost)
{
long nsub = 1 << n;
vector< vector<float> > opt(nsub, vector<float>(n));
for (long s = 1; s < nsub; s += 2)
for (int i = 1; i < n; ++i) {
vector<int> subset;
for (int u = 0; u < n; ++u)
if (s & (1 << u))
subset.push_back(u);
if (subset.size() == 2)
opt[s][i] = cost[0][i];
else if (subset.size() > 2) {
float min_subpath = FLT_MAX;
long t = s & ~(1 << i);
for (vector<int>::iterator j = subset.begin(); j != subset.end(); ++j)
if (*j != i && opt[t][*j] + cost[*j][i] < min_subpath)
min_subpath = opt[t][*j] + cost[*j][i];
opt[s][i] = min_subpath;
}
}
vector<int> tour;
tour.push_back(0);
bool selected[n];
fill(selected, selected + n, false);
selected[0] = true;
long s = nsub - 1;
for (int i = 0; i < n - 1; ++i) {
int j = tour.back();
float min_subpath = FLT_MAX;
int best_k;
for (int k = 0; k < n; ++k)
if (!selected[k] && opt[s][k] + cost[k][j] < min_subpath) {
min_subpath = opt[s][k] + cost[k][j];
best_k = k;
}
tour.push_back(best_k);
selected[best_k] = true;
s -= 1 << best_k;
}
tour.push_back(0);
return tour;
}
For example, on a distance cost matrix of just 5 points (5 different nodes in the graph), the algorithm returns a path that's less than optimal. Any help in recognizing a blatant or small error would be appreciated. Or any helpful tips as to what's going wrong.

One thing that looks odd is that the main for loop does work even if i is not part of the subset s.
In other words, opt[17][8] will be set to cost[0][8]. opt[17][8] represents the state of being at node 8 and having visited nodes 0 and 4 (because 17 = 2^0 + 2^4).
This should be marked as impossible, because if we are at node 8, we must certainly have visited node 8!
I would suggest preventing these cases from occurring by changing:
for (int i = 1; i < n; ++i) {
vector<int> subset;
to
for (int i = 1; i < n; ++i) {
vector<int> subset;
if ((s&(1<<i))==0) {
opt[s][i]=FLT_MAX;
continue;
}

The nested loop over j iterates over all nodes in subset, including the starting node. As a result it reads values opt[t][0] that were never computed (they are still the zeros the vector was initialized with), which leads to an incorrect optimal path length.
The easiest fix would be to exclude the starting node from subset:
for (int u = 1; u < n; ++u)
...
if (subset.size() == 1)
...
else if (subset.size() > 1)

Can anyone explain this algorithm for calculating large factorials?

I came across the following program for calculating large factorials (of numbers as big as 100). Can anyone explain the basic idea used in this algorithm?
I just need to understand the mathematics behind the factorial calculation.
#include <cmath>
#include <iostream>
#include <cstdlib>
using namespace std;
int main()
{
unsigned int d;
unsigned char *a;
unsigned int j, n, q, z, t;
int i,arr[101],f;
double p;
cin>>n;
p = 0.0;
for(j = 2; j <= n; j++)
p += log10(j);
d = (int)p + 1;
a = new unsigned char[d];
for (i = 1; i < d; i++)
a[i] = 0; //initialize
a[0] = 1;
p = 0.0;
for (j = 2; j <= n; j++)
{
q = 0;
p += log10(j);
z = (int)p + 1;
for (i = 0; i <= z/*NUMDIGITS*/; i++)
{
t = (a[i] * j) + q;
q = (t / 10);
a[i] = (char)(t % 10);
}
}
for( i = d -1; i >= 0; i--)
cout << (int)a[i];
cout<<"\n";
delete []a;
return 0;
}

Note that
n! = 2 * 3 * ... * n
so that
log(n!) = log(2 * 3 * ... * n) = log(2) + log(3) + ... + log(n)
This is important because if k is a positive integer, then floor(log(k)) + 1 is the number of digits in the base-10 representation of k (log here is the base-10 logarithm). Thus, these lines of code are counting the number of digits in n!.
p = 0.0;
for(j = 2; j <= n; j++)
p += log10(j);
d = (int)p + 1;
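The digit count can be checked in isolation. This is a minimal sketch with a made-up function name, factorialDigits; it relies on the fact that a positive integer k has floor(log10(k)) + 1 decimal digits, applied to k = n! via the sum of logarithms. It uses floating-point arithmetic, so it is an illustration rather than a guaranteed-exact method for very large n.

```cpp
#include <cmath>

// Number of decimal digits of n!, computed as floor(log10(2) + ... + log10(n)) + 1.
int factorialDigits(unsigned n) {
    double p = 0.0;
    for (unsigned j = 2; j <= n; ++j)
        p += std::log10(static_cast<double>(j));
    return static_cast<int>(std::floor(p)) + 1;
}
```

For instance, 5! = 120 has 3 digits and 10! = 3628800 has 7 digits, matching factorialDigits(5) and factorialDigits(10).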
Then, these lines of code allocate space to hold the digits of n!:
a = new unsigned char[d];
for (i = 1; i < d; i++)
a[i] = 0; //initialize
Then we just do the grade-school multiplication algorithm
p = 0.0;
for (j = 2; j <= n; j++) {
q = 0;
p += log10(j);
z = (int)p + 1;
for (i = 0; i <= z/*NUMDIGITS*/; i++) {
t = (a[i] * j) + q;
q = (t / 10);
a[i] = (char)(t % 10);
}
}
The outer loop runs j from 2 to n because at each step we multiply the current result, represented by the digits in a, by j. The inner loop is the grade-school multiplication algorithm, wherein we multiply each digit by j and carry the overflow into q if necessary.
The p = 0.0 before the nested loop and the p += log10(j) inside the loop just keep track of the number of digits in the answer so far.
Incidentally, I think there is a bug in this part of the program. The loop condition should be i < z, not i <= z; otherwise we will be writing past the end of a when z == d, which will happen for sure when j == n. Thus replace
for (i = 0; i <= z/*NUMDIGITS*/; i++)
by
for (i = 0; i < z/*NUMDIGITS*/; i++)
Then we just print out the digits
for( i = d -1; i >= 0; i--)
cout << (int)a[i];
cout<<"\n";
and free the allocated memory
delete []a;
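The same grade-school idea can be written without precomputing the digit count, by letting the digit array grow as carries spill over. This is a minimal sketch that swaps the raw array for std::vector; the function name factorial and the choice to return a std::string are assumptions made for illustration, not part of the original program.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Computes n! as a decimal string using digit-by-digit multiplication.
std::string factorial(unsigned n) {
    std::vector<unsigned> digits{1}; // least significant digit first
    for (unsigned j = 2; j <= n; ++j) {
        unsigned carry = 0;
        for (std::size_t i = 0; i < digits.size(); ++i) {
            const unsigned t = digits[i] * j + carry; // multiply one digit, add carry
            digits[i] = t % 10;
            carry = t / 10;
        }
        while (carry > 0) {          // append any remaining carry as new digits
            digits.push_back(carry % 10);
            carry /= 10;
        }
    }
    std::string out;
    for (std::size_t i = digits.size(); i-- > 0;) // most significant digit first
        out.push_back(static_cast<char>('0' + digits[i]));
    return out;
}
```

Growing the vector on demand removes the need for the log10 bookkeeping, at the cost of occasional reallocations.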
