Modulo operation for max operation in matrix multiplication

Modulo operation for max operation in matrix multiplication - c++

I have a function that find maximum value for a range :
A,B,C are 2 d matrices
void solve()
{
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
C[i][j] = 0;
for (k = 0; k < n; k++)
{
C[i][j] = max(C[i][j],A[i][k]*B[k][j]);
//C[i][j] can become very large if solve() is called multiple times
}
}
}
for(int i = 0;i<n;i++)
{
for(int j=0;j<n;j++)
{
A[i][j] = C[i][j];
}
}
}
solve() method can be called for large number of times (10^7)
A[i][j] , B[i][j], C[i][j] can be 10^9.
n will be small (about 20).
I need to print final matrix C with modulo m (i.e. C[i][j]%m)
Since we cannot apply mod for intermediate results (it can produce wrong results).
The problem is integer overflow since it can cross max of int and long.
Any suggestions to solve this problem (Any solution other than big int) ?

Since this is competitive programming you should think out of the box. Since you need to calculate the actual maximum and not the maximum of modulus, you can't use modulo while processing. However:
You are only doing multiplications and you are not worried about the actual value but rather the comparison between different results (to know which is bigger). You can use isomorphism. That is, calculate the log() of the numbers and intermediate results and keep the modulus as auxiliar information. You can do this because ab < cd <=> log(ab) < log(cd) <=> log(a) + log(b) < log(c) + log(d). So now you only have to do additions between numbers and the value will remain quite small. You will lose some precision but that should be fine considering the context. The problem is that you won't be able to reconstruct the modulus from the log, so you should keep the mod value in a struct or something.

Related

How to calculate Matrix efficiently in C++?

I am new to C++ and programming so I think I am making inefficient codes.
I was wondering whether there is any way I can speed up the matrix calculation process.
For example, this is the sample code I write which finds the maximum differences(in absolute value) between 3d array 'V' and 'Vnew'.
First, I take subtraction.
And then, I put the value of tempdiff[0][0][0] to 'dif'
Then, I compare 'dif' and tempdiff[i][j][k] and replace if the latter is larger than the former.
This is just a part of my code and there are lots of matrix calculations inside so that I have too many 'for' statements.
So I was wondering whether there is any way I could avoid using 'for' in the matrix calculations.
Thanks in advance.
for (int i = 0; i < Na; i++) {
for (int j = 0; j < Nd; j++) {
for (int k = 0; k < Ny; k++) {
tempdiff[i][j][k] = abs(V[i][j][k] - Vnew[i][j][k]);
}
}
}
dif = tempdiff[0][0][0];
for (int i = 0; i < Na; i++) {
for (int j = 0; j < Nd; j++) {
for (int k = 0; k < Ny; k++) {
if (tempdiff[i][j][k] > dif) {
dif = tempdiff[i][j][k];
}
else {
dif = dif;
}
}
}
}

There's not much you can do with the for loops, as the maximum difference can locate at all possible places. You have already succeeded in iterating the array in the correct, linear, order.
Compilers are generally quite efficient in optimising, but they apparently fail to flatten a contiguous array, such as float V[Na][Nd][Ny];. After you flatten it manually to float V[Na*Nd*Ny], at least clang can auto-vectorise and produce SIMD code for x64 and arm.
A further optimisation is to avoid making this in two steps, as the total memory throughput is exactly doubled with the temporary array compared to a one-pass solution.
I was assuming your matrices are of type float -- if you can select int, gcc can auto-vectorise this as well (relates to NaN handling); furthermore int16_t or int8_t types are even quicker to evaluate, as more operations can be packed to a single SIMD instruction.

Reordering Indices for matrix multiplication in c++

const int n=50;
double a[n][n];
double b[n][n];
double c[n][n];
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
for (int k = 0; k < n; k++) {
c[i][j] += a[i][k] * b[k][j];
}
cout << c[i][j] << " ";
}
cout << "\n";
I currently have a working code that multiplies two nxn matrices. I am trying to reorder the indices (ie i,k,j ... k,i,j) without touching the equation that does the multiplication. I am doing this to see how the order of the indices affects performance time, but if I just change the 'j's to 'k's and vice versa in my loops, my multiplication equation will not be correct.
I am wondering if what I am attempting to do is possible and if anyone can shed some light on what steps I can take to achieve this.

First of all, you shouldn't be printing out the c matrix at the point you are doing so, especially if you are trying to time an algorithm. What you should be doing is more similar to this:
const int n=50;
double a[n][n];
double b[n][n];
double c[n][n];
/* First multiply the matrices a,b into c. */
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
for (int k = 0; k < n; k++) {
c[i][j] += a[i][k] * b[k][j];
}
}
}
/* now print out the result for visual correctness check */
for (int i = 0; i < n; i++){
for (int j = 0; j < n; j++){
std::cout << c[i][j] << ' '; //this will leave a space after last character, but for this use case, nobody cares.
}
std::cout << std::endl;
}
Then you can just switch around the lines containing for loops (ie. for (int i = 0; i < n; i++)) around, and see if changing access pattern changes execution time/results.
Spoiler: It shouldn't affect results except in some border cases of weird values inside the matrices, that are caused by inexactness of floating point math. It should however affect execution time, but it will be brutally dominated by time taken by printing the matrix, unless measured properly.

If you are taking about performance time then It always come down to complexity. No matter how you change the order, your complexity is defined by the area of code which is doing most of your work.
Here all your loops run till n. Now no matter what order you change you have complexity of order O(n^3) Unless you change your logic. The best Matrix Multiplication Algorithm known so far is the Coppersmith-Winograd algorithm with O(n^2.3736 ) complexity but it is not used for practical purposes.
But you can use Strassen's algorithm which has O(n^2.8074 ) complexity

C++ Prime number task from the book

I'm a C++ beginner ;)
How good is the code below as a way of finding all prime numbers between 2-1000:
int i, j;
for (i=2; i<1000; i++) {
for (j=2; j<=(i/j); j++) {
if (! (i%j))
break;
if (j > (i/j))
cout << i << " is prime\n";
}
}

You stop when j = i.
A first simple optimization is to stop when j = sqrt(i) (since there can be no factors of a number greater than its square root).
A much faster implementation is for example the sieve of eratosthenes.
Edit: the code looks somewhat mysterious, so here's how it works:
The terminating condition on the inner for is i/j, equivalent to j<i (which is much clearer),since when finally have j==i, we'll have i/j==0 and the for will break.
The next check if(j>(i/j)) is really nasty. Basically it just checks whether the loop hit the for's end condition (therefore we have a prime) or if we hit the explicit break (no prime). If we hit the for's end, then j==i+1 (think about it) => i/j==0 => it's a prime. If we hit a break, it means j is a factor of i,but not just any factor, the smallest in fact (since we exit at the first j that divides i)!
Since j is the smallest factor,the other factor (or product of remaining factors, given by i/j) will be greater or equal to j, hence the test. If j<=i/j,we hit a break and j is the smallest factor of i.
That's some unreadable code!

Not very good. In my humble opinion, the indentation and spacing is hideous (no offense). To clean it up some:
int i, j;
for (i=2; i<1000; i++) {
for (j=2; i/j; j++) {
if (!(i % j))
break;
if (j > i/j)
cout << i << " is prime\n";
}
}
This reveals a bug: the if (j > i/j) ... needs to be on the outside of the inner loop for this to work. Also, I think that the i/j condition is more confusing (not to mention slower) than just saying j < i (or even nothing, because once j reaches i, i % j will be 0). After these changes, we have:
int i, j;
for (i=2; i<1000; i++) {
for (j=2; j < i; j++) {
if (!(i % j))
break;
}
if (j > i/j)
cout << i << " is prime\n";
}
This works. However, the j > i/j confuses the heck out of me. I can't even figure out why it works (I suppose I could figure it out if I spent a while looking like this guy). I would write if (j == i) instead.
What you have implemented here is called trial division. A better algorithm is the Sieve of Eratosthenes, as posted in another answer. A couple things to check if you implement a Sieve of Eratosthenes:
It should work.
It shouldn't use division or modulus. Not that these are "bad" (granted, they tend to be an order of magnitude slower than addition, subtraction, negation, etc.), but they aren't needed, and if they're present, it probably means the implementation isn't really that efficient.
It should be able to compute the primes less than 10,000,000 in about a second (depending on your hardware, compiler, etc.).

First off, your code is both short and correct, which is very good for at beginner. ;-)
This is what I would do to improve the code:
1) Define the variables inside the loops, so they don't get confused with something else. I would also make the bound a parameter or a constant.
#define MAX 1000
for(int i=2;i<MAX;i++){
for(int j=2;j<i/j;j++){
if(!(i%j)) break;
if(j>(i/j)) cout<<i<<" is prime\n";
}
}
2) I would use the Sieve of Eratosthenes, as Joey Adams and Mau have suggested. Notice how I don't have to write the bound twice, so the two usages will always be identical.
#define MAX 1000
bool prime[MAX];
memset(prime, sizeof(prime), true);
for(int i=4;i<MAX;i+=2) prime[i] = false;
prime[1] = false;
cout<<2<<" is prime\n";
for(int i=3;i*i<MAX;i+=2)
if (prime[i]) {
cout<<i<<" is prime\n";
for(int j=i*i;j<MAX;j+=i)
prime[j] = false;
}
The bounds are also worth noting. i*i<MAX is a lot faster than j > i/j and you also don't need to mark any numbers < i*i, because they will already have been marked, if they are composite. The most important thing is the time complexity though.
3) If you really want to make this algorithm fast, you need to cache optimize it. The idea is to first find all the primes < sqrt(MAX) and then use them to find the rest of the
primes. Then you can use the same block of memory to find all primes from 1024-2047, say,
and then 2048-3071. This means that everything will be kept in L1-cache. I once measured a ~12 time speedup by using this optimization on the Sieve of Eratosthenes.
You can also cut the space usage in half by not storing the even numbers, which means that
you don't have to perform the calculations to begin working on a new block as often.
If you are a beginner you should probably just forget about the cache for the moment though.

The one simple answer to the whole bunch of text we posted up here is : Trial division!
If someone mentioned mathematical basis that this task was based on, we'd save plenty of time ;)

#include <stdio.h>
#define N 1000
int main()
{
bool primes[N];
for(int i = 0 ; i < N ; i++) primes[i] = false;
primes[2] = true;
for(int i = 3 ; i < N ; i+=2) { // Check only odd integers
bool isPrime = true;
for(int j = i/2 ; j > 2 ; j-=2) { // Check only from largest possible multiple of current number
if ( j%2 == 0 ) { j = j-1; } // Check only with previous odd divisors
if(!primes[j]) continue; // Check only with previous prime divisors
if ( i % j == 0 ) {
isPrime = false;
break;
}
}
primes[i] = isPrime;
}
return 0;
}
This is working code. I also included many of the optimizations mentioned by previous posters. If there are any other optimizations that can be done, it would be informative to know.

This function is more efficient to see if a number is prime.
bool isprime(const unsigned long n)
{
if (n<2) return false;
if (n<4) return true;
if (n%2==0) return false;
if (n%3==0) return false;
unsigned long r = (unsigned long) sqrt(n);
r++;
for(unsigned long c=6; c<=r; c+=6)
{
if (n%(c-1)==0) return false;
if (n%(c+1)==0) return false;
}

Interesting Problem (Currency arbitrage)

Arbitrage is the process of using discrepancies in currency exchange values to earn profit.
Consider a person who starts with some amount of currency X, goes through a series of exchanges and finally ends up with more amount of X(than he initially had).
Given n currencies and a table (nxn) of exchange rates, devise an algorithm that a person should use to avail maximum profit assuming that he doesn't perform one exchange more than once.
I have thought of a solution like this:
Use modified Dijkstra's algorithm to find single source longest product path.
This gives longest product path from source currency to each other currency.
Now, iterate over each other currency and multiply to the maximum product so far, w(curr,source)(weight of edge to source).
Select the maximum of all such paths.
While this appears good, i still doubt of correctness of this algorithm and the completeness of the problem.(i.e Is the problem NP-Complete?) as it somewhat resembles the traveling salesman problem.
Looking for your comments and better solutions(if any) for this problem.
Thanks.
EDIT:
Google search for this topic took me to this here, where arbitrage detection has been addressed but the exchanges for maximum arbitrage is not.This may serve a reference.

Dijkstra's cannot be used here because there is no way to modify Dijkstra's to return the longest path, rather than the shortest. In general, the longest path problem is in fact NP-complete as you suspected, and is related to the Travelling Salesman Problem as you suggested.
What you are looking for (as you know) is a cycle whose product of edge weights is greater than 1, i.e. w1 * w2 * w3 * ... > 1. We can reimagine this problem to change it to a sum instead of a product if we take the logs of both sides:
log (w1 * w2 * w3 ... ) > log(1)
=> log(w1) + log(w2) + log(w3) ... > 0
And if we take the negative log...
=> -log(w1) - log(w2) - log(w3) ... < 0 (note the inequality flipped)
So we are now just looking for a negative cycle in the graph, which can be solved using the Bellman-Ford algorithm (or, if you don't need the know the path, the Floyd-Warshall algorihtm)
First, we transform the graph:
for (int i = 0; i < N; ++i)
for (int j = 0; j < N; ++j)
w[i][j] = -log(w[i][j]);
Then we perform a standard Bellman-Ford
double dis[N], pre[N];
for (int i = 0; i < N; ++i)
dis[i] = INF, pre[i] = -1;
dis[source] = 0;
for (int k = 0; k < N; ++k)
for (int i = 0; i < N; ++i)
for (int j = 0; j < N; ++j)
if (dis[i] + w[i][j] < dis[j])
dis[j] = dis[i] + w[i][j], pre[j] = i;
Now we check for negative cycles:
for (int i = 0; i < N; ++i)
for (int j = 0; j < N; ++j)
if (dis[i] + w[i][j] < dis[j])
// Node j is part of a negative cycle
You can then use the pre array to find the negative cycles. Start with pre[source] and work your way back.

The fact that it is an NP-hard problem doesn't really matter when there are only about 150 currencies currently in existence, and I suspect your FX broker will only let you trade at most 20 pairs anyway. My algorithm for n currencies is therefore:
Make a tree of depth n and branching factor n. The nodes of the tree are currencies and the root of the tree is your starting currency X. Each link between two nodes (currencies) has weight w, where w is the FX rate between the two currencies.
At each node you should also store the cumulative fx rate (calculated by multiplying all the FX rates above it in the tree together). This is the FX rate between the root (currency X) and the currency of this node.
Iterate through all the nodes in the tree that represent currency X (maybe you should keep a list of pointers to these nodes to speed up this stage of the algorithm). There will only be n^n of these (very inefficient in terms of big-O notation, but remember your n is about 20). The one with the highest cumulative FX rate is your best FX rate and (if it is positive) the path through the tree between these nodes represents an arbitrage cycle starting and ending at currency X.
Note that you can prune the tree (and so reduce the complexity from O(n^n) to O(n) by following these rules when generating the tree in step 1:
If you get to a node for currency X, don't generate any child nodes.
To reduce the branching factor from n to 1, at each node generate all n child nodes and only add the child node with the greatest cumulative FX rate (when converted back to currency X).

Imho, there is a simple mathematical structure to this problem that lends itself to a very simple O(N^3) Algorithm. Given a NxN table of currency pairs, the reduced row echelon form of the table should yield just 1 linearly independent row (i.e. all the other rows are multiples/linear combinations of the first row) if no arbitrage is possible.
We can just perform gaussian elimination and check if we get just 1 linearly independent row. If not, the extra linearly independent rows will give information about the number of pairs of currency available for arbitrage.

Take the log of the conversion rates. Then you are trying to find the cycle starting at X with the largest sum in a graph with positive, negative or zero-weighted edges. This is an NP-hard problem, as the simpler problem of finding the largest cycle in an unweighted graph is NP-hard.

Unless I totally messed this up, I believe my implementation works using Bellman-Ford algorithm:
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>
std::vector<std::vector<double>> transform_matrix(std::vector<std::vector<double>>& matrix)
{
int n = matrix.size();
int m = matrix[0].size();
for (int i = 0; i < n; ++i)
{
for (int j = 0; j < m; ++j)
{
matrix[i][j] = log(matrix[i][j]);
}
}
return matrix;
}
bool is_arbitrage(std::vector<std::vector<double>>& currencies)
{
std::vector<std::vector<double>> tm = transform_matrix(currencies);
// Bellman-ford algorithm
int src = 0;
int n = tm.size();
std::vector<double> min_dist(n, INFINITY);
min_dist[src] = 0.0;
for (int i = 0; i < n - 1; ++i)
{
for (int j = 0; j < n; ++j)
{
for (int k = 0; k < n; ++k)
{
if (min_dist[k] > min_dist[j] + tm[j][k])
min_dist[k] = min_dist[j] + tm[j][k];
}
}
}
for (int j = 0; j < n; ++j)
{
for (int k = 0; k < n; ++k)
{
if (min_dist[k] > min_dist[j] + tm[j][k])
return true;
}
}
return false;
}
int main()
{
std::vector<std::vector<double>> currencies = { {1, 1.30, 1.6}, {.68, 1, 1.1}, {.6, .9, 1} };
if (is_arbitrage(currencies))
std::cout << "There exists an arbitrage!" << "\n";
else
std::cout << "There does not exist an arbitrage!" << "\n";
std::cin.get();
}

Finding composite numbers

I have a range of random numbers. The range is actually determined by the user but it will be up to 1000 integers. They are placed in this:
vector<int> n
and the values are inserted like this:
srand(1);
for (i = 0; i < n; i++)
v[i] = rand() % n;
I'm creating a separate function to find all the non-prime values. Here is what I have now, but I know it's completely wrong as I get both prime and composite in the series.
void sieve(vector<int> v, int n)
{
int i,j;
for(i = 2; i <= n; i++)
{
cout << i << " % ";
for(j = 0; j <= n; j++)
{
if(i % v[j] == 0)
cout << v[j] << endl;
}
}
}
This method typically worked when I just had a series of numbers from 0-1000, but it doesn't seem to be working now when I have numbers out of order and duplicates. Is there a better method to find non-prime numbers in a vector? I'm tempted to just create another vector, fill it with n numbers and just find the non-primes that way, but would that be inefficient?
Okay, since the range is from 0-1000 I am wondering if it's easier to just create vector with 0-n sorted, and then using a sieve to find the primes, is this getting any closer?
void sieve(vector<int> v, BST<int> t, int n)
{
vector<int> v_nonPrime(n);
int i,j;
for(i = 2; i < n; i++)
v_nonPrime[i] = i;
for(i = 2; i < n; i++)
{
for(j = i + 1; j < n; j++)
{
if(v_nonPrime[i] % j == 0)
cout << v_nonPrime[i] << endl;
}
}
}

In this code:
if(i % v[j] == 0)
cout << v[j] << endl;
You are testing your index to see if it is divisible by v[j]. I think you meant to do it the other way around, i.e.:
if(v[j] % i == 0)
Right now, you are printing random divisors of i. You are not printing out random numbers which are known not to be prime. Also, you will have duplicates in your output, perhaps that is ok.

First off, I think Knuth said it first: premature optimization is the cause of many bugs. Make the slow version first, and then figure out how to make it faster.
Second, for your outer loop, you really only need to go to sqrt(n) rather than n.

Basically, you have a lot of unrelated numbers, so for each one you will have to check if it's prime.
If you know the range of the numbers in advance, you can generate all prime numbers that can occur in that range (or the sqrt thereof), and test every number in your container for divisibility by any one of the generated primes.
Generating the primes is best done by the Erathostenes Sieve - many examples to be found of that algorithm.

You should try using a prime sieve. You need to know the maximal number for creating the sieve (O(n)) and then you can build a set of primes in that range (O(max_element) or as the problem states O(1000) == O(1))) and check whether each number is in the set of primes.

Your code is just plain wrong. First, you're testing i % v[j] == 0, which is backwards and also explains why you get all numbers. Second, your output will contain duplicates as you're testing and outputting each input number every time it fails the (broken) divisibility test.
Other suggestions:
Using n as the maximum value in the vector and the number of elements in the vector is confusing and pointless. You don't need to pass in the number of elements in the vector - you just query the vector's size. And you can figure out the max fairly quickly (but if you know it ahead of time you may as well pass it in).
As mentioned above, you only need to test to sqrt(n) [where n is the max value in the vecotr]
You could use a sieve to generate all primes up to n and then just remove those values from the input vector, as also suggested above. This may be quicker and easier to understand, especially if you store the primes somewhere.
If you're going to test each number individually (using, I guess, and inverse sieve) then I suggest testing each number individually, in order. IMHO it'll be easier to understand than the way you've written it - testing each number for divisibility by k < n for ever increasing k.

The idea of the sieve that you try to implement depends on the fact that you start at a prime (2) and cross out multitudes of that number - so all numbers that depend on the prime "2" are ruled out beforehand.
That's because all non-primes can be factorized down to primes. Whereas primes are not divisible with modulo 0 unless you divide them by 1 or by themselves.
So, if you want to rely on this algorithm, you will need some mean to actually restore this property of the algorithm.

Your code seems to have many problems:
If you want to test if your number is prime or non-prime, you would need to check for v[j] % i == 0, not the other way round
You did not check if your number is dividing by itself
You keep on checking your numbers again and again. That's very inefficient.
As other guys suggested, you need to do something like the Sieve of Eratosthenes.
So a pseudo C code for your problem would be (I haven't run this through compilers yet, so please ignore syntax errors. This code is to illustrate the algorithm only)
vector<int> inputNumbers;
// First, find all the prime numbers from 1 to n
bool isPrime[n+1] = {true};
isPrime[0]= false;
isPrime[1]= false;
for (int i = 2; i <= sqrt(n); i++)
{
if (!isPrime[i])
continue;
for (int j = 2; j <= n/i; j++)
isPrime[i*j] = false;
}
// Check the input array for non-prime numbers
for (int i = 0; i < inputNumbers.size(); i++)
{
int thisNumber = inputNumbers[i];
// Vet the input to make sure we won't blow our isPrime array
if ((0<= thisNumber) && (thisNumber <=n))
{
// Prints out non-prime numbers
if (!isPrime[thisNumber])
cout<< thisNumber;
}
}

sorting the number first might be a good start - you can do that in nLogN time. That is a small addition (I think) to your other problem - that of finding if a number is prime.
(actually, with a small set of numbers like that you can do a sort much faster with a copy of the size of the vector/set and do a hash/bucket sort/whatever)
I'd then find the highest number in the set (I assume the numbers can be unbounded - no know upper limit until your sort - or do a single pass to find the max)
then go with a sieve - as others have said

Jeremy is right, the basic problem is your i % v[j] instead of v[j] % i.
Try this:
void sieve(vector<int> v, int n) {
int i,j;
for(j = 0; j <= n; j++) {
cout << v[j] << ": ";
for(i = 2; i < v[j]; i++) {
if(v[j] % i == 0) {
cout << "is divisible by " << i << endl;
break;
}
}
if (i == v[j]) {
cout << "is prime." << endl;
}
}
}
It's not optimal, because it's attempting to divide by all numbers less than v[j] instead of just up to the square root of v[j]. And it is attempting dividion by all numbers instead of only primes.
But it will work.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js