Checking efficiently if three binary vectors are linearly independent over a finite field - C++

I am given three binary vectors v1, v2, v3, represented by unsigned int in my program, and a finite field F, which is also a set of binary vectors. I need to check whether the vectors are linearly independent, that is, whether there are no f1, f2 in F such that f1*v1 + f2*v2 = v3.
The immediate brute force solution is to iterate over the field and check all possible linear combinations.
Does there exist a more efficient algorithm?
I'd like to emphasize two points:
The field elements are vectors, not scalars. Therefore, a product of a field element f1 and a given vector vi is a dot product. So Gaussian elimination does not work (if I am not missing something)
The field is finite, so even if I find that f1*v1 + f2*v2 = v3 for some f1, f2, it does not mean that f1, f2 belong to F.

If the vectors are in R^2, then they are automatically dependent, because when we make a matrix of them and reduce it to echelon form, there will be at least one free variable (in this case exactly one).
If the vectors are in R^3, then you can make a matrix from them, i.e. a 2D array, and take the determinant of that matrix. If the determinant is equal to 0, the vectors are linearly dependent; otherwise they are not.
If the vectors are in R^4, R^5 and so on, then the appropriate way is to reduce the matrix to echelon form.
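As a small illustration of the determinant test for R^3, here is a sketch in C++ (the exact comparison with 0 is only safe because the sample entries are small integers; with real data you would compare against a tolerance):

```cpp
#include <cassert>

// Determinant of a 3x3 matrix by cofactor expansion along the first row.
double det3(const double m[3][3]) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}
```

For example, the rows {1,2,3}, {4,5,6}, {5,7,9} are dependent (the third is the sum of the first two), so det3 returns 0 for them.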

For any finite set of M vectors defined in a space of dimension N, they are linearly independent iff the MxN matrix constructed by stacking these vectors row by row has rank equal to M.
Regarding numerically stable computation involving linear algebra, the singular value decomposition is usually the way to go, and there are plenty of implementations available out there. The key point in this context is that the rank of a matrix equals the number of its non-zero singular values. One must note, however, that due to floating point approximations, a finite precision must be chosen to decide whether a value is effectively zero.
Your question mentions your vectors are defined in the set of integers and that certainly can be taken advantage of to overcome the finite precision of floating point computations, but I would not know how. Maybe somebody out there could help us out?

Gaussian elimination does work if you do it inside the finite field.
For binary it should be quite simple, because the inverse element is trivial.
For larger finite fields, you will need to somehow find inverse elements, which may turn into a separate problem.

Related

How should I compute the null space of a rectangular sparse matrix over GF(2) in C/C++?

UPDATE: I ended up not using Eigen and implementing my own GF(2) matrix representation where each row is an array of integers, and each bit of the integer represents a single entry. I then use a modified Gaussian elimination with bit operations to obtain the desired vectors.
I currently have a (large) rectangular sparse matrix, stored using Eigen3, for which I want to find the (right) null space over GF(2). I researched around and found some possible approaches to this:
(Modified) Gaussian Elimination
This means simply using some form of Gaussian Elimination to find a reduced form of the matrix that preserves the nullspace then extract the nullspace off of that. Though I know how I would do this by hand, I'm quite clueless as to how I would actually implement this.
SVD Decomposition
QR Decomposition
I'm not familiar with these, but from my understanding the (orthonormal) basis vectors of the nullspace can be extracted from the decomposed form of the matrix.
Now my question is: Which approach should I use in my case (i.e. rectangular sparse matrix over GF(2)) that doesn't involve converting into a dense matrix? And if there are many approaches, what would be recommended in terms of performance and ease of implementation?
I'm also open to using other libraries besides Eigen as well.
For context, I'm trying to combine equivalence relations for factoring algorithms (e.g. as in the Quadratic Sieve). Also, if possible, I would like to look into parallelising these algorithms in the future, so if there exists an approach that would allow this, that would be great!
Let's call the matrix in question M. Then (please correct me if I'm wrong):
GF(2) implies that M is equivalent to a matrix of bits - each element can have one of two values.
Arithmetic in GF(2) is just like integer arithmetic on non-negative numbers, but done modulo 2, so addition is a bitwise XOR, and multiplication is a bitwise AND. It doesn't matter what the exact elements of GF(2) are - they are all equivalent to bits.
Two non-zero vectors over GF(2) are linearly independent as long as they are not equal, i.e. as long as they differ by at least one bit, or v_1 + v_2 ≠ 0 (since addition in GF(2) is a bitwise XOR).
By definition, the (right) nullspace is spanned by the basis vectors that the matrix transforms to 0. A vector v is in the nullspace if multiplying each j-th column of M by the j-th bit of v and summing them gives zero.
I see at least two ways of going about it.
Do dense Gaussian elimination in terms of bit operations, and organize the data and write the loops so that the compiler vectorizes everything and operates on 512-bit data types. You could use Compiler Explorer on godbolt.org to easily check that the vectorization takes place and e.g. AVX512 instructions are used. Linear gains will eventually lose out to the squared scaling of the problem, of course, but the performance increase over a naive bool-based implementation will be massive and may be sufficient for your needs. The sparsity adds a possible complication: if the matrix won't comfortably fit in memory in a dense representation, then a suitable representation has to be devised that makes Gaussian elimination perform well. More needs to be known about the matrices you work on. Generally speaking, row operations will be performed at memory bandwidth if the implementation is correct, on the order of 1E10 elements/s, so a 1E3x1E3 M should process in about a second at most.
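A single-word sketch of this approach (assuming the matrix has n ≤ 64 columns so one uint64_t holds a row; a real implementation would use an array of words per row and rely on the vectorization just described):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Null space basis of a matrix over GF(2). Each row is packed into a
// uint64_t with column j in bit j (n <= 64), so a row operation is one XOR.
// After reducing to RREF, each free column yields one basis vector.
std::vector<uint64_t> nullspace_gf2(std::vector<uint64_t> rows, int n) {
    std::vector<int> pivot_col;  // pivot column of each RREF row
    int r = 0;
    for (int c = 0; c < n && r < (int)rows.size(); ++c) {
        int p = r;
        while (p < (int)rows.size() && !((rows[p] >> c) & 1u)) ++p;
        if (p == (int)rows.size()) continue;  // no pivot in this column
        std::swap(rows[r], rows[p]);
        for (int i = 0; i < (int)rows.size(); ++i)
            if (i != r && ((rows[i] >> c) & 1u)) rows[i] ^= rows[r];
        pivot_col.push_back(c);
        ++r;
    }
    // For each free column f, set x_f = 1 and read the pivot variables
    // off the corresponding RREF rows.
    std::vector<int> row_of_pivot(n, -1);
    for (int i = 0; i < r; ++i) row_of_pivot[pivot_col[i]] = i;
    std::vector<uint64_t> basis;
    for (int f = 0; f < n; ++f) {
        if (row_of_pivot[f] >= 0) continue;  // pivot column, not free
        uint64_t v = 1ull << f;
        for (int c = 0; c < n; ++c)
            if (row_of_pivot[c] >= 0 && ((rows[row_of_pivot[c]] >> f) & 1u))
                v |= 1ull << c;
        basis.push_back(v);
    }
    return basis;
}
```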
Since the problem is equivalent to a set of boolean equations, use a SAT solver (Boolean satisfiability problem solver) to incrementally generate the nullspace. The initial equation set is M × v = 0 and v ≠ 0, where v is a bit vector. Run the solver until it finds some v, let's call it v_1. Then add a constraint v ≠ v_1, and run it again, adding a constraint in each iteration. That is, the k-th iteration has constraints v ≠ 0, v ≠ v_1, ..., v ≠ v_(k-1).
Since any two distinct non-zero bit vectors are linearly independent, the inequality constraints will force incremental generation of nullspace basis vectors.
Modern SAT solvers excel at sparse problems with more boolean equations than variables, so I imagine this would work very well - the sparser the matrix, the better. The problem should be pre-processed to remove all zero columns in M to minimize the combinatorial explosion. Open source SAT solvers can easily deal with 1M-variable problems - so, for a sparse problem, you could realistically be solving with 100k-1M columns in M and about 10 "ones" in each row. Such a 1Mx1M sparse matrix would be a reasonable task for common SAT solvers, and I imagine that the state of the art could deal with 10Mx10M matrices and beyond.
Furthermore, your application is ideal for incremental solvers: you find one solution, stop, add a constraint, resume, and so on. So I imagine you may get very good results, and there are several good open source solvers to choose from.
Since you use Eigen already, the problem would at least fit into the SparseMatrix representation with byte-sized elements, so it's not a very big problem as far as SAT is concerned.
I wonder whether this nullspace basis finding is a case of a cover problem, possibly relaxed. There are some nice algorithms for those, but it's always a question of whether the specialized algorithm will work better than just throwing SAT at it and waiting it out, so to speak.
Updated answer - thanks to harold: QR decomposition is not applicable in general for your case.
See for instance
https://math.stackexchange.com/questions/1346664/how-to-find-orthogonal-vectors-in-gf2
I wrongly assumed QR is applicable here, but by theory it is not.
If you are still interested in details about QR-algorithms, please open a new thread.

Find all pairwise differences in an array of distinct integers less than 1e5

Given an array of distinct positive integers ≤ 10^5, I need to find the differences of all pairs.
I don't really need to count frequency of every difference, just unique differences.
Using brute force, this can be approached by checking all possible pairs. However, this would not be efficient enough considering the size of the array (as all elements are distinct, the maximum size is 10^5). This would lead to O(n^2) complexity.
I need to exploit the property of this array that the differences are ≤ 10^5
So my another approach :
The array elements can be represented using another hash array where the indices representing array elements will be 1 and rest will be 0.
This hash array is interpreted as a polynomial with all coefficients equal to 1 and exponents equal to the respective indices.
Now clone this polynomial and make another polynomial with exponents negated.
If now these polynomials are multiplied, all the positive exponents in the result correspond to differences required.
However, I am not certain how to implement this multiplication efficiently. I think the FFT can be used, as it multiplies two polynomials in O(n log n) complexity, but it requires non-negative exponents.
Please provide suggestions on how to proceed.
I also came across this algorithm, which uses FFT to find pairwise differences in O(n log n), however I can't understand how the algorithm is working. It seems that it is trying to find all possible sums.
A proof of this algorithm would be appreciated.
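The standard trick to avoid negative exponents is to shift them: multiply A(x) = Σ x^s over elements s by the "reversed" polynomial B(x) = Σ x^(M-s); the coefficient of x^(M+d) in the product then counts ordered pairs (a, b) with a - b = d, so no negative exponent ever appears. A sketch using a naive O(M^2) convolution for clarity; replacing the double loop with an FFT-based multiplication gives the O(M log M) algorithm:

```cpp
#include <vector>

// Pairwise differences via polynomial multiplication. A has coefficient 1
// at exponent s for each element s; B has coefficient 1 at exponent M - s.
// In the product C = A * B, C[M + d] counts ordered pairs with difference d.
// The convolution below is naive O(M^2); an FFT would make it O(M log M).
std::vector<long long> diff_counts(const std::vector<int>& S, int M) {
    std::vector<long long> A(M + 1, 0), B(M + 1, 0);
    for (int s : S) { A[s] = 1; B[M - s] = 1; }
    std::vector<long long> C(2 * M + 1, 0);
    for (int i = 0; i <= M; ++i)
        if (A[i])
            for (int j = 0; j <= M; ++j)
                if (B[j]) C[i + j] += 1;
    return C;  // C[M + d] = number of ordered pairs with difference d
}
```

For S = {1, 4, 6} and M = 6, C[6] = 3 (the three zero differences a - a), and C[6 + 3], C[6 + 2], C[6 + 5] are each 1, matching the differences 4-1, 6-4 and 6-1.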

Choice between map or unordered_map for keys consisting of calculated double values

In order to quickly find coplanar entities in a bunch of planar entities in 3d space, I want to create a mapping from 3d planes to the set of entities lying in that plane (estimated max around ~1000 planes and ~100000 entities)
I can create my own custom class to represent 3D-plane keys, but basically this class needs (mathematically) four doubles to uniquely identify a plane: 3 coordinates for the normal vector and one coordinate to specify a point on the plane. So we have:
struct Plane3d
{
double n_x, n_y, n_z;
double u; // (u*n_x, u*n_y, u*n_z) is a point on the plane
// some constructors etc.
}
These four doubles are each time calculated from the entity under consideration, so rounding errors and floating point comparison issues have to be taken into account. Suppose I have calculated a suitable (relative) error tolerance:
const double EPSILON;
Yet I do not want to compare the coplanarity of all pairs of entities one-by-one (categorization in O(n^2) time) but create a map to categorize my entities.
Most ideal would be an unordered_map (creation in O(n) time):
unordered_map<Plane3d, vector<Entity>, PlaneHash, PlaneEquality> mapping;
This would require to write two functors: PlaneEquality is no problem, but...
is it possible to write a hash function for four doubles (or even just a regular double) that takes a comparison error tolerance into account?
in the specification I've read that the equality functor is only used to distinguish equal keys inside the same bucket. Is there a way to ensure that equal keys actually end up in the same bucket?
The other option is to use a normal map (still creation in O(n log n) time)
map<Plane3d, vector<Entity>, PlaneCompare> mapping;
The PlaneCompare functor sounds feasible, I could use a lexicographic ordering of the four doubles and check each "less than" using EPSILON. But I still have a few questions:
Would this actually work? Are there any pitfalls?
The specification says that equality of keys, say p1 and p2, is determined by the test !PlaneCompare(p1,p2) && !PlaneCompare(p2,p1). If I use the lexicographic ordering, this should be equivalent to a direct equality test with error tolerance, but is it not slower?
"is it possible to write a hash function for four doubles (or even just a regular double) that takes a comparison error tolerance into account?"
No, it is not.
That sounds like a very definite statement, how can I be so sure?
Let's assume you want a tolerance of 0.00001. The value doesn't matter, we're just going to use it for an example. That will mean that for this hash function:
hash(1.0) must return the same value as hash(1.00001)
so that they can be considered equal. But it also means:
hash(1.00001) must return the same value as hash(1.00002)
for the same reason, and
hash(1.00002) must return the same value as hash(1.00003)
...and so on, up to the topmost possible value of double - effectively ad infinitum. The same is true for values below 1.
So any hash function that allows for tolerance will have to return the same hash for all values, making it useless.
P.S. To actually recommend an approach that does work: a four-dimensional quadtree (technically something like a sedecimtree) is probably best.
You can round your doubles so that e.g. all values in the range [n - epsilon/2, n + epsilon/2) are rounded to n, where n is a multiple of epsilon, and hash the rounded values. This way close values will have the same hash values.
How to better hash 4 doubles depends on your case, but even summing them can be good enough.
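A sketch of this rounding idea for a trimmed-down Plane3d key (the EPSILON value and the mixing constant are illustrative choices, not part of the question; and, as the answer above argues, two values closer than EPSILON can still straddle a cell boundary and hash differently - a common workaround is to also probe neighbouring cells on lookup):

```cpp
#include <cmath>
#include <cstdint>
#include <functional>
#include <initializer_list>

struct Plane3d { double n_x, n_y, n_z, u; };  // simplified version of the key

const double EPSILON = 1e-9;  // illustrative tolerance

// Snap a double to its grid cell: all values within the same cell of width
// EPSILON map to the same integer, and therefore to the same hash.
inline int64_t snap(double v) { return (int64_t)std::llround(v / EPSILON); }

struct PlaneHash {
    size_t operator()(const Plane3d& p) const {
        size_t h = 0;
        for (double v : {p.n_x, p.n_y, p.n_z, p.u}) {
            // standard hash-combine mixing step
            h ^= std::hash<int64_t>()(snap(v)) + 0x9e3779b97f4a7c15ULL
                 + (h << 6) + (h >> 2);
        }
        return h;
    }
};
```

Keys whose coordinates differ by much less than EPSILON then land in the same bucket of the unordered_map, but exact tolerance semantics near cell boundaries remain out of reach, which matches the accepted answer's argument.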

Perfect hashing function in a hash table implementation of a sparse matrix class

I'm currently implementing a sparse matrix for my matrix library - it will be a hash table. I already implemented a dense matrix as a nested vector, and since I'm doing it just to learn new stuff, I decided that my matrices will be multi-dimensional (not just a 2D table of numbers, but also cubes, tesseracts, etc).
I use an index type which holds n numbers of type size_t, n being a number of dimensions for this particular index. Index of dimension n may be used only as an address of an element in a matrix of appropriate dimension. It is simply a tuple with implicit constructor for easy indexing, like Matrix[{1,2,3}].
My question is centered around the hashing function I plan on using for my sparse matrix implementation. I think that the function is always minimal, but is perfect only up to a certain point - the point of size_t overflow, or an overflow in an intermediate operation of the hashing function (the operands are actually unsigned long long). Sparse matrices have huge boundaries, so it's practically guaranteed to overflow at some point (see below).
What the hashing function does is assign consecutive numbers to matrix elements as follows:
[1 2 3 4 5 6 7 8 ...]^T //transposed 1-dimensional matrix
1 4 7
2 5 8
3 6 9 //2-dimensional matrix
and so on. Unfortunately, I'm unable to show you the ordering for higher-order matrices, but I hope that you get the idea - the value increases top to bottom, left to right, back to front (for cube matrices), etc.
The hashing function is defined like this:
value = x1 + d1*x2 + d1*d2*x3 + d1*d2*d3*x4 + ... + d1*d2*d3*...*d|n-1|*xn
where:
x1...xn are the index members - row, column, height, etc. - {x1, x2, x3, ..., xn}
d1...d|n-1| are the matrix boundary dimensions - one past the end of the matrix in the appropriate direction
I'm actually using a recursive form of this function (simple factoring, but complexity becomes O(n) instead of O(n^2)):
value = x1+d1(x2+d2(x3+d3(...(x|n-1|+d|n-1|(xn))...)))
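The recursive (Horner) form folds into a simple loop. A sketch, with xs holding the index members and ds the boundary dimensions, both 0-based (the names are illustrative, not from the original code):

```cpp
#include <cstddef>
#include <vector>

// Horner evaluation of value = x1 + d1*(x2 + d2*(x3 + ... + d|n-1|*(xn))).
// xs[i] is the i-th index member, ds[i] the i-th boundary dimension;
// ds.back() is never actually used (it multiplies the initial 0).
size_t flat_index(const std::vector<size_t>& xs,
                  const std::vector<size_t>& ds) {
    size_t value = 0;
    for (size_t i = xs.size(); i-- > 0; )  // fold from the innermost term out
        value = xs[i] + ds[i] * value;
    return value;
}
```

With the 3x3 example above, flat_index({0, 1}, {3, 3}) is 3, i.e. the element shown as 4 in the 1-based column-major layout.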
I'm assuming that elements will be distributed randomly (uniformly) across the matrix, but the bucket number is hash(key) mod numberOfBuckets, so it is practically guaranteed to have collisions despite the fact that the function is perfect.
Is there any way to exploit the features of my hash function in order to minimize collisions?
How should I choose the load factor? Should I leave the choice to the user? Is there any good default value for this case?
Is the hash table actually a good solution for this problem? Are there any other data structures that guarantee average O(1) complexity given that I can roll the index into a number and a number into an index (mind the size_t overflow)? I am aware of different ways to store a sparse matrix, but I want to try the DOK way first.

Determinant Value For Very Large Matrix

I have a very large square matrix of order around 100000 and I want to know whether the determinant value is zero or not for that matrix.
What is the fastest way to determine that?
I have to implement that in C++
Assuming you are trying to determine if the matrix is non-singular, you may want to look here:
https://math.stackexchange.com/questions/595/what-is-the-most-efficient-way-to-determine-if-a-matrix-is-invertible
As mentioned in the comments, it's best to use some sort of BLAS library that will do this for you, such as Boost::uBLAS.
Usually, matrices of that size are extremely sparse. Use row and column reordering algorithms to concentrate the entries near the diagonal, then use a QR or LU decomposition. The product of the diagonal entries of the second factor is - up to a sign in the QR case - the determinant. This may still be too ill-conditioned; the most reliable rank result is obtained by performing a singular value decomposition. However, SVD is more expensive.
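A dense toy sketch of the LU idea: after Gaussian elimination with partial pivoting, the matrix is singular iff some pivot is (numerically) zero, with a tolerance playing the same role as the SVD's zero threshold. For n around 100000 you would use a sparse library factorization rather than this O(n^3) loop, but the zero-pivot test is the same principle:

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Singularity test via in-place LU with partial pivoting: eliminate column
// by column; if the best available pivot is below `tol`, the matrix is
// (numerically) rank-deficient, i.e. its determinant is effectively zero.
bool is_singular(std::vector<std::vector<double>> a, double tol = 1e-12) {
    const size_t n = a.size();
    for (size_t k = 0; k < n; ++k) {
        size_t p = k;  // partial pivoting: largest entry in column k
        for (size_t i = k + 1; i < n; ++i)
            if (std::fabs(a[i][k]) > std::fabs(a[p][k])) p = i;
        if (std::fabs(a[p][k]) <= tol) return true;  // zero pivot
        std::swap(a[k], a[p]);
        for (size_t i = k + 1; i < n; ++i) {
            double f = a[i][k] / a[k][k];
            for (size_t j = k; j < n; ++j) a[i][j] -= f * a[k][j];
        }
    }
    return false;
}
```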
There is a property that if any two rows are equal, or one row is a constant multiple of another row, the determinant of that matrix is zero. It applies to columns as well.
From my knowledge, your application doesn't need to calculate the determinant; the rank of the matrix is sufficient to check whether the system of equations has a non-trivial solution:
Rank of Matrix