Interesting Problem (Currency arbitrage) - c++

Arbitrage is the process of using discrepancies in currency exchange values to earn profit.
Consider a person who starts with some amount of currency X, goes through a series of exchanges and finally ends up with more amount of X(than he initially had).
Given n currencies and a table (nxn) of exchange rates, devise an algorithm that a person should use to avail maximum profit assuming that he doesn't perform one exchange more than once.
I have thought of a solution like this:
Use modified Dijkstra's algorithm to find single source longest product path.
This gives longest product path from source currency to each other currency.
Now, iterate over each other currency and multiply to the maximum product so far, w(curr,source)(weight of edge to source).
Select the maximum of all such paths.
While this appears good, i still doubt of correctness of this algorithm and the completeness of the problem.(i.e Is the problem NP-Complete?) as it somewhat resembles the traveling salesman problem.
Looking for your comments and better solutions(if any) for this problem.
Thanks.
EDIT:
Google search for this topic took me to this here, where arbitrage detection has been addressed but the exchanges for maximum arbitrage is not.This may serve a reference.

Dijkstra's cannot be used here because there is no way to modify Dijkstra's to return the longest path, rather than the shortest. In general, the longest path problem is in fact NP-complete as you suspected, and is related to the Travelling Salesman Problem as you suggested.
What you are looking for (as you know) is a cycle whose product of edge weights is greater than 1, i.e. w1 * w2 * w3 * ... > 1. We can reimagine this problem to change it to a sum instead of a product if we take the logs of both sides:
log (w1 * w2 * w3 ... ) > log(1)
=> log(w1) + log(w2) + log(w3) ... > 0
And if we take the negative log...
=> -log(w1) - log(w2) - log(w3) ... < 0 (note the inequality flipped)
So we are now just looking for a negative cycle in the graph, which can be solved using the Bellman-Ford algorithm (or, if you don't need the know the path, the Floyd-Warshall algorihtm)
First, we transform the graph:
for (int i = 0; i < N; ++i)
for (int j = 0; j < N; ++j)
w[i][j] = -log(w[i][j]);
Then we perform a standard Bellman-Ford
double dis[N], pre[N];
for (int i = 0; i < N; ++i)
dis[i] = INF, pre[i] = -1;
dis[source] = 0;
for (int k = 0; k < N; ++k)
for (int i = 0; i < N; ++i)
for (int j = 0; j < N; ++j)
if (dis[i] + w[i][j] < dis[j])
dis[j] = dis[i] + w[i][j], pre[j] = i;
Now we check for negative cycles:
for (int i = 0; i < N; ++i)
for (int j = 0; j < N; ++j)
if (dis[i] + w[i][j] < dis[j])
// Node j is part of a negative cycle
You can then use the pre array to find the negative cycles. Start with pre[source] and work your way back.

The fact that it is an NP-hard problem doesn't really matter when there are only about 150 currencies currently in existence, and I suspect your FX broker will only let you trade at most 20 pairs anyway. My algorithm for n currencies is therefore:
Make a tree of depth n and branching factor n. The nodes of the tree are currencies and the root of the tree is your starting currency X. Each link between two nodes (currencies) has weight w, where w is the FX rate between the two currencies.
At each node you should also store the cumulative fx rate (calculated by multiplying all the FX rates above it in the tree together). This is the FX rate between the root (currency X) and the currency of this node.
Iterate through all the nodes in the tree that represent currency X (maybe you should keep a list of pointers to these nodes to speed up this stage of the algorithm). There will only be n^n of these (very inefficient in terms of big-O notation, but remember your n is about 20). The one with the highest cumulative FX rate is your best FX rate and (if it is positive) the path through the tree between these nodes represents an arbitrage cycle starting and ending at currency X.
Note that you can prune the tree (and so reduce the complexity from O(n^n) to O(n) by following these rules when generating the tree in step 1:
If you get to a node for currency X, don't generate any child nodes.
To reduce the branching factor from n to 1, at each node generate all n child nodes and only add the child node with the greatest cumulative FX rate (when converted back to currency X).

Imho, there is a simple mathematical structure to this problem that lends itself to a very simple O(N^3) Algorithm. Given a NxN table of currency pairs, the reduced row echelon form of the table should yield just 1 linearly independent row (i.e. all the other rows are multiples/linear combinations of the first row) if no arbitrage is possible.
We can just perform gaussian elimination and check if we get just 1 linearly independent row. If not, the extra linearly independent rows will give information about the number of pairs of currency available for arbitrage.

Take the log of the conversion rates. Then you are trying to find the cycle starting at X with the largest sum in a graph with positive, negative or zero-weighted edges. This is an NP-hard problem, as the simpler problem of finding the largest cycle in an unweighted graph is NP-hard.

Unless I totally messed this up, I believe my implementation works using Bellman-Ford algorithm:
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>
std::vector<std::vector<double>> transform_matrix(std::vector<std::vector<double>>& matrix)
{
int n = matrix.size();
int m = matrix[0].size();
for (int i = 0; i < n; ++i)
{
for (int j = 0; j < m; ++j)
{
matrix[i][j] = log(matrix[i][j]);
}
}
return matrix;
}
bool is_arbitrage(std::vector<std::vector<double>>& currencies)
{
std::vector<std::vector<double>> tm = transform_matrix(currencies);
// Bellman-ford algorithm
int src = 0;
int n = tm.size();
std::vector<double> min_dist(n, INFINITY);
min_dist[src] = 0.0;
for (int i = 0; i < n - 1; ++i)
{
for (int j = 0; j < n; ++j)
{
for (int k = 0; k < n; ++k)
{
if (min_dist[k] > min_dist[j] + tm[j][k])
min_dist[k] = min_dist[j] + tm[j][k];
}
}
}
for (int j = 0; j < n; ++j)
{
for (int k = 0; k < n; ++k)
{
if (min_dist[k] > min_dist[j] + tm[j][k])
return true;
}
}
return false;
}
int main()
{
std::vector<std::vector<double>> currencies = { {1, 1.30, 1.6}, {.68, 1, 1.1}, {.6, .9, 1} };
if (is_arbitrage(currencies))
std::cout << "There exists an arbitrage!" << "\n";
else
std::cout << "There does not exist an arbitrage!" << "\n";
std::cin.get();
}

Related

Finding most similar Range in an Array

I am finding A[i..j] that is the most similar to B.
Here calcSimilarity is function that returns similarity of two arrays.
Similarity is calculated as
Not than brute force search, I want to know what kind of data structure and algorithm is efficient in range search.
SAMPLE input/output
input: A: [(10,1), (20,1), (-200,2), (33,1), (42,1), (58,1)] B:[(20,1), (30,1), (1000,2)]
output: most similar Range is [1, 3]
match [20, 33] => [20, 30]
This is brute force search code.
struct object{
int type, value;
}A[10000],B[100];
int N, M;
int calcSimilarity(object X[], n, object Y[], m){
if(n > m) return calcSimilarity(Y, m, X, n);
for(all possible match){//match is (i, link[i])
int minDif = 0x7ffff;
int count = 0;
for( i = 0; i< n; i++){
int j = link[i];
int similar = similar(X[i], Y[j]);
minDif = min(similar, minDif);
}
}
if(count == 0) return 0x7fffff;
return minDif/pow(count,3);
}
find_most_similar_range(){
int minSimilar = 0x7fffff, minI, minJ;
for( i = 0; i < N; i ++){
for(j = i+1; j < N; j ++){
int similarity = calcSimilarity(A + i, j-i, B, M);
if (similarity < minSimilar)
{
minSimilar = similarity;
minI= i;
minJ = j;
}
}
}
printf("most similar Range is [%d, %d]", minI, minJ);
}
it will take O((N^M) * (N^2)).
That looks like the Big-O of the find similarity is N^2. With the pairwise comparison of each element.
So it looks more like
The pairwise comparison is M*(M-1). Each list has to be tested against each other list or about M^2.
This is a problem which has been solved for clustering, and there are data structures (e.g. Metric Tree), which allow the distances between similar objects to be stored in a tree.
When looking for the N closest neighbours, the search of this tree limits the number of pairwise comparisons needed and results in a O( ln(M) ) form
The downside of this particular tree, is the similarity measure needs to be metric. Where the distance between A and B, and the distance between B and C allows inferences to be made about the distance range of A and C.
If your similarity measure is not metric, then this can't be done.
Jaccard distance is a metric of distance which allows it to be placed in a Metric tree.

Modulo operation for max operation in matrix multiplication

I have a function that find maximum value for a range :
A,B,C are 2 d matrices
void solve()
{
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
C[i][j] = 0;
for (k = 0; k < n; k++)
{
C[i][j] = max(C[i][j],A[i][k]*B[k][j]);
//C[i][j] can become very large if solve() is called multiple times
}
}
}
for(int i = 0;i<n;i++)
{
for(int j=0;j<n;j++)
{
A[i][j] = C[i][j];
}
}
}
solve() method can be called for large number of times (10^7)
A[i][j] , B[i][j], C[i][j] can be 10^9.
n will be small (about 20).
I need to print final matrix C with modulo m (i.e. C[i][j]%m)
Since we cannot apply mod for intermediate results (it can produce wrong results).
The problem is integer overflow since it can cross max of int and long.
Any suggestions to solve this problem (Any solution other than big int) ?
Since this is competitive programming you should think out of the box. Since you need to calculate the actual maximum and not the maximum of modulus, you can't use modulo while processing. However:
You are only doing multiplications and you are not worried about the actual value but rather the comparison between different results (to know which is bigger). You can use isomorphism. That is, calculate the log() of the numbers and intermediate results and keep the modulus as auxiliar information. You can do this because ab < cd <=> log(ab) < log(cd) <=> log(a) + log(b) < log(c) + log(d). So now you only have to do additions between numbers and the value will remain quite small. You will lose some precision but that should be fine considering the context. The problem is that you won't be able to reconstruct the modulus from the log, so you should keep the mod value in a struct or something.

Find similar distances between all values in vector and subset them

Given is a vector with double values. I want to know which distances between any elements of this vector have a similar distance to each other. In the best case, the result is a vector of subsets of the original values where subsets should have at least n members.
//given
vector<double> values = {1,2,3,4,8,10,12}; //with simple values as example
//some algorithm
//desired result as:
vector<vector<double> > subset;
//in case of above example I would expect some result like:
//subset[0] = {1,2,3,4}; //distance 1
//subset[1] = {8,10,12}; //distance 2
//subset[2] = {4,8,12}; // distance 4
//subset[3] = {2,4}; //also distance 2 but not connected with subset[1]
//subset[4] = {1,3}; //also distance 2 but not connected with subset[1] or subset[3]
//many others if n is just 2. If n is 3 (normally the minimum) these small subsets should be excluded.
This example is simplified as the distances of integer numbers could be iterated and tested for the vector which is not the case for double or float.
My idea so far
I thought of something like calculating the distances and storing them in a vector. Creating a difference distance matrix and thresholding this matrix for some tolerance for similar distances.
//Calculate distances: result is a vector
vector<double> distances;
for (int i = 0; i < values.size(); i++)
for (int j = 0; j < values.size(); j++)
{
if (i >= j)
continue;
distances.push_back(abs(values[i] - values[j]));
}
//Calculate difference of these distances: result is a matrix
Mat DiffDistances = Mat::zero(Size(distances.size(), distances.size()), CV_32FC1);
for (int i = 0; i < distances.size(); i++)
for (int j = 0; j < distances.size(); j++)
{
if (i >= j)
continue;
DiffDistances.at<float>(i,j) = abs(distances[i], distances[j]);
}
//threshold this matrix with some tolerance in difference distances
threshold(DiffDistances, DiffDistances, maxDistTol, 255, CV_THRESH_BINARY_INV);
//get points with similar distances
vector<Points> DiffDistancePoints;
findNonZero(DiffDistances, DiffDistancePoints);
At this point I get stuck with finding the original values corresponding to my similar distances. It should be possible to find them, but it seems very complicated to trace back the indices and I wonder if there isn't an easier way to solve the problem.
Here is a solution that works, as long as there are no branches meaning, that there are no values closer together than 2*threshold. That is the valid neighbor region because neighboring bonds should differ by less than the threshold, if I understood #Phann correctly.
The solution is definitively neither the fastest nor the nicest possible solution. But you might use it as a starting point:
#include <iostream>
#include <vector>
#include <algorithm>
int main(){
std::vector< double > values = {1,2,3,4,8,10,12};
const unsigned int nValues = values.size();
std::vector< std::vector< double > > distanceMatrix(nValues - 1);
// The distanceMatrix has a triangular shape
// First vector contains all distances to value zero
// Second row all distances to value one for larger values
// nth row all distances to value n-1 except those already covered
std::vector< std::vector< double > > similarDistanceSubsets;
double threshold = 0.05;
std::sort(values.begin(), values.end());
for (unsigned int i = 0; i < nValues-1; ++i) {
distanceMatrix.at(i).resize(nValues-i-1);
for (unsigned j = i+1; j < nValues; ++j){
distanceMatrix.at(i).at(j-i-1) = values.at(j) - values.at(i);
}
}
for (unsigned int i = 0; i < nValues-1; ++i) {
for (unsigned int j = i+1; j < nValues; ++j) {
std::vector< double > thisSubset;
double thisDist = distanceMatrix.at(i).at(j-i-1);
// This distance already belongs to another cluster
if (thisDist < 0) continue;
double minDist = thisDist - threshold;
double maxDist = thisDist + threshold;
thisSubset.push_back(values.at(i));
thisSubset.push_back(values.at(j));
//Indicate that this is already clustered
distanceMatrix.at(i).at(j-i-1) = -1;
unsigned int lastIndex = j;
for (unsigned int k = j+1; k < nValues; ++k) {
thisDist = distanceMatrix.at(lastIndex).at(k-lastIndex-1);
// This distance already belongs to another cluster
if (thisDist < 0) continue;
// Check if you found a new valid pair
if ((thisDist > minDist) && (thisDist < maxDist)){
// Update the valid distance interval
minDist = thisDist - threshold;
minDist = thisDist - threshold;
// Add the newly found point
thisSubset.push_back(values.at(k));
// Indicate that this is already clustered
distanceMatrix.at(lastIndex).at(k-lastIndex-1) = -1;
// Continue the search from here
lastIndex = k;
}
}
if (thisSubset.size() > 2) {
similarDistanceSubsets.push_back(thisSubset);
}
}
}
for (unsigned int i = 0; i < similarDistanceSubsets.size(); ++i) {
for (unsigned int j = 0; j < similarDistanceSubsets.at(i).size(); ++j) {
std::cout << similarDistanceSubsets.at(i).at(j);
if (j != similarDistanceSubsets.at(i).size()-1) {
std::cout << " ";
}
else {
std::cout << std::endl;
}
}
}
}
The idea is to precompute the distances and then look for every pair of particles, starting from the smallest and its larger neighbors, if there is another valid pair above it. If so these are all collected in a subset and this is added to the subset vector. For every new value the valid neighbor region has to be updated to ensure that neighboring distances differ by less than the threshold. Afterwards, the program continues with the next smallest value and its larger neighbors and so on.
Here is an algorithm which is slightly different from yours, which is O(n^3) in the length n of the vector - not very efficient.
It is based on the premise that you want to have subsets of at least size 2. So what you can do is consider all the two-element subsets of the vector, then find all other elements that also match.
So given a function
std::vector<int> findSubset(std::vector<int> v, int baseValue, int distance) {
// Find the subset of all elements in v that differ by a multiple of
// distance from the base value
}
you can do
std::vector<std::vector<int>> findSubsets(std::vector<int> v) {
for(int i = 0; i < v.size(); i++) {
for(int j = i + 1; j < v.size(); j++) {
subsets.push_back(findSubset(v, v[i], abs(v[i] - v[j])));
}
}
return subsets;
}
Only remaining problem is keeping track of the duplicates, maybe you can keep a hashed list of (baseValue % distance, distance) pairs for all the subsets you have already found.

C++ algorithm optimization: find K combination from N elements

I am pretty noobie with C++ and am trying to do some HackerRank challenges as a way to work on that.
Right now I am trying to solve Angry Children problem: https://www.hackerrank.com/challenges/angry-children
Basically, it asks to create a program that given a set of N integer, finds the smallest possible "unfairness" for a K-length subset of that set. Unfairness is defined as the difference between the max and min of a K-length subset.
The way I'm going about it now is to find all K-length subsets and calculate their unfairness, keeping track of the smallest unfairness.
I wrote the following C++ program that seems to the problem correctly:
#include <cmath>
#include <cstdio>
#include <iostream>
using namespace std;
int unfairness = -1;
int N, K, minc, maxc, ufair;
int *candies, *subset;
void check() {
ufair = 0;
minc = subset[0];
maxc = subset[0];
for (int i = 0; i < K; i++) {
minc = min(minc,subset[i]);
maxc = max(maxc, subset[i]);
}
ufair = maxc - minc;
if (ufair < unfairness || unfairness == -1) {
unfairness = ufair;
}
}
void process(int subsetSize, int nextIndex) {
if (subsetSize == K) {
check();
} else {
for (int j = nextIndex; j < N; j++) {
subset[subsetSize] = candies[j];
process(subsetSize + 1, j + 1);
}
}
}
int main() {
cin >> N >> K;
candies = new int[N];
subset = new int[K];
for (int i = 0; i < N; i++)
cin >> candies[i];
process(0, 0);
cout << unfairness << endl;
return 0;
}
The problem is that HackerRank requires the program to come up with a solution within 3 seconds and that my program takes longer than that to find the solution for 12/16 of the test cases. For example, one of the test cases has N = 50 and K = 8; the program takes 8 seconds to find the solution on my machine. What can I do to optimize my algorithm? I am not very experienced with C++.
All you have to do is to sort all the numbers in ascending order and then get minimal a[i + K - 1] - a[i] for all i from 0 to N - K inclusively.
That is true, because in optimal subset all numbers are located successively in sorted array.
One suggestion I'd give is to sort the integer list before selecting subsets. This will dramatically reduce the number of subsets you need to examine. In fact, you don't even need to create subsets, simply look at the elements at index i (starting at 0) and i+k, and the lowest difference for all elements at i and i+k [in valid bounds] is your answer. So now instead of n choose k subsets (factorial runtime I believe) you just have to look at ~n subsets (linear runtime) and sorting (nlogn) becomes your bottleneck in performance.

compute error from linear system - extract every column

I want to compute the error in linear least squares method.
I have matrices A,B and X. (AX=B).
Sizes are : A(NxN) , B(NxNRHS) , X(N,NRHS) ,where NRHS is number of right hand side.
The error is computed as sqrt(sum(B-AX)).
But I must take into account every column of B and X in order to make the substraction.
I must substract B[i]-A[..]X[i] -> where i is every column of B and X.
I can't figure how to do it ,hence how to extract every column.I can't find the right indices for B and X matrices (I think) ,because I must go beyond whole A matrix and only beyond every column of B and X.
I am doing something like this (using column major order):
int N=128;
int NRHS =1;
int Asize=N*N;
int Bsize=N*NRHS;
int Xsize=N*NRHS;
A=(double*)malloc(Asize*sizeof(double));
B=(double*)malloc(Bsize*sizeof(double));
X=(double*)malloc(Xsize*sizeof(double));
...
for(int i = 0; i < N; i++)
{
for (int j=0;j<NRHS; j++){
diff[i+j*N] = fabs(B[i+j*N] - A[i+j*N]*X[i+j*N]);
abs_error=sqrt(sums(diff,N));
}
}
I thought of adding some statement using the modulo operator but I couldn't figure.
sums is just a function which gives the sum of an array where the second argument is the number of elements.
You could first do a matrix multiplication of A and X using loops.
Then you could write another 2 loops to compute the difference (B - AX). This would simply your problem.
Edit
After you compute the product of A and X, assuming that you store the product in a variable named AX,the following code will give you the difference between corresponding elements.
differenceMatrix = (double*)malloc(Bsize*sizeof(double));
for(int i = 0; i < N; i++)
{
for (int j = 0; j < NRHS; j++){
differenceMatrix[i+j*N] = fabs(B[i+j*N] - AX[i+j*N]);
}
}
Each column of the differenceMatrix contains the difference between corresponding elements.
Edit
To obtain the sum of difference of each column
double sumOfDifferencePerColumn;
for(int i = 0; i < N; i++)
{
sumOfDifferencePerColumn = 0.0;
for (int j = 0; j < NRHS; j++){
sumOfDifferencePerColumn += ( fabs(B[i+j*N] - AX[i+j*N]) );
}
// add code to take square root or use the sum of difference of each column
}