How to check if elements form a primitive system of a lattice

Given a collection of vectors $V = \{v_1, v_2, \ldots, v_k\}$ belonging to a lattice $L$ with basis $B$, is there an efficient procedure that can determine whether or not $V$ forms a primitive system for $L$? This means that if $L$ has rank $n \geq k$, you can extend $V$ by adding $n-k$ vectors such that the resulting set is a basis for $L$.
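If it helps, here is a sketch of one standard criterion (my own addition, not from the original post): write each $v_i$ in coordinates with respect to $B$ and stack the rows into a $k \times n$ integer matrix $M$; then $V$ is a primitive system for $L$ iff the Smith normal form of $M$ is $(I_k \mid 0)$, i.e. all $k$ invariant factors equal $1$. A minimal SymPy sketch, assuming a reasonably recent SymPy with smith_normal_form:
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

def is_primitive_system(coords):
    """coords: k x n integer matrix; row i = coordinates of v_i in basis B."""
    M = Matrix(coords)
    snf = smith_normal_form(M, domain=ZZ)
    # primitive iff every invariant factor is a unit (1 up to sign)
    return all(abs(snf[i, i]) == 1 for i in range(M.rows))

# {(1,0,2), (0,1,3)} extends to a basis of Z^3: True
print(is_primitive_system([[1, 0, 2], [0, 1, 3]]))
# (2,0,0) is not even primitive on its own: False
print(is_primitive_system([[2, 0, 0], [0, 1, 0]]))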


Is there an algorithm/way to find out how different (or the minimum distance between) 2 list orders?

I have a bunch of items I want to rate in a specific order. For example:
["Person1", "Person2", "Person3", "Person4", "Person5"]
Which can be ordered like this:
["Person4", "Person5", "Person3", "Person1", "Person2"]
Given 2 different orders of the same list, is there a way to quantify how different they are?
I know Levenshtein distance exists for strings, and I'm looking for something similar.
My ideal measurement for distance would be the minimum number of switches between two adjacent items required to change one list to the other - but I'm open to other algorithms if you think they're better.
The answer I'm looking for is an algorithm (and preferably, a [Python] implementation) to perform this kind of measurement (fast).
Thanks in advance!
To quantify how "different" two strings are, as you already noted, you can use Levenshtein distance, which is implemented in this library:
pip install levenshtein
>>> import Levenshtein
>>> Levenshtein.distance("lewenstein", "levenshtein")
2
To determine how "different" two lists are, you could assign each value in the list to a Unicode character.
import Levenshtein

def list_distance(A, B):
    # Assign each unique value of the list to a Unicode character
    unique_map = {v: chr(k) for (k, v) in enumerate(set(A + B))}

    # Create string versions of the lists
    a = ''.join(map(unique_map.get, A))
    b = ''.join(map(unique_map.get, B))

    return Levenshtein.distance(a, b)
A = ["Person1", "Person2", "Person3", "Person4", "Person5"]
B = ["Person4", "Person5", "Person3", "Person1", "Person2"]
list_distance(A, B)
returns 4.
This works by making a unique mapping to arbitrary Unicode characters, for example:
the list A to the string '\x03\x02\x01\x00\x04' and
the list B to the string '\x00\x04\x01\x03\x02',
before taking the Levenshtein distance of the two strings.
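Separately, the asker's ideal metric (the minimum number of adjacent swaps) is the Kendall tau distance between two orderings of the same items. A minimal sketch of my own (an O(n^2) inversion count; a merge-sort-based count gets this down to O(n log n)):
def adjacent_swap_distance(A, B):
    # position of each item in B
    pos = {item: i for i, item in enumerate(B)}
    # rewrite A as a permutation of B's indices
    seq = [pos[item] for item in A]
    # the minimum number of adjacent swaps equals the inversion count
    return sum(1 for i in range(len(seq))
                 for j in range(i + 1, len(seq))
                 if seq[i] > seq[j])

A = ["Person1", "Person2", "Person3", "Person4", "Person5"]
B = ["Person4", "Person5", "Person3", "Person1", "Person2"]
print(adjacent_swap_distance(A, B))  # 8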

How to Use MCMC with a Custom Log-Probability and Solve for a Matrix

The code is in PyMC3, but this is a general problem. I want to find which matrix (combination of variables) gives me the highest probability. Taking the mean of the trace of each element is meaningless because they depend on each other.
Here is a simple case; the code uses a vector rather than a matrix for simplicity. The goal is to find a vector of length 2 where each value is between 0 and 1 and the values sum to 1.
import numpy as np
import theano
import theano.tensor as tt
import pymc3 as mc

# define a theano Op for our likelihood function
class LogLike_Matrix(tt.Op):
    itypes = [tt.dvector]  # expects a vector of parameter values when called
    otypes = [tt.dscalar]  # outputs a single scalar value (the log-likelihood)

    def __init__(self, loglike):
        self.likelihood = loglike  # the log-p function

    def perform(self, node, inputs, outputs):
        # the method that is used when calling the Op
        theta, = inputs  # this will contain my variables

        # call the log-likelihood function
        logl = self.likelihood(theta)
        outputs[0][0] = np.array(logl)  # output the log-likelihood

def logLikelihood_Matrix(data):
    """
    We want sum(data) = 1
    """
    p = 1 - np.abs(np.sum(data) - 1)
    return np.log(p)

logl_matrix = LogLike_Matrix(logLikelihood_Matrix)

# use PyMC3 to sample from the log-likelihood
with mc.Model():
    # data is sampled with a uniform distribution
    # because the log-p Op can't be applied to it directly
    data_matrix = mc.Uniform('data_matrix', shape=(2), lower=0.0, upper=1.0)

    # wrap the free parameters in a tensor vector
    theta = tt.as_tensor_variable(data_matrix)

    # use a DensityDist (use a lambda function to "call" the Op)
    mc.DensityDist('likelihood_matrix', lambda v: logl_matrix(v), observed={'v': theta})

    trace_matrix = mc.sample(5000, tune=100, discard_tuned_samples=True)
If you only want the highest likelihood parameter values, then you want the Maximum A Posteriori (MAP) estimate, which can be obtained using pymc3.find_MAP() (see starting.py for method details). If you expect a multimodal posterior, then you will likely need to run this repeatedly with different initializations and select the one that obtains the largest logp value; that increases the chances of finding the global optimum but cannot guarantee it.
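A rough sketch of that restart loop (my own illustration, assuming the model from the question; find_MAP's start handling and model.logp can differ between PyMC3 versions):
best_logp, best_map = -np.inf, None

with mc.Model() as model:
    # same model as in the question
    data_matrix = mc.Uniform('data_matrix', shape=(2), lower=0.0, upper=1.0)
    theta = tt.as_tensor_variable(data_matrix)
    mc.DensityDist('likelihood_matrix', lambda v: logl_matrix(v), observed={'v': theta})

    for _ in range(10):
        # random initialization for each restart
        start = {'data_matrix': np.random.uniform(0.0, 1.0, size=2)}
        map_estimate = mc.find_MAP(start=start)
        lp = model.logp(map_estimate)
        if lp > best_logp:
            best_logp, best_map = lp, map_estimate

print(best_map, best_logp)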
It should be noted that at high parameter dimensions, the MAP estimate is usually not part of the typical set, i.e., it is not representative of typical parameter values that would lead to the observed data. Michael Betancourt discusses this in A Conceptual Introduction to Hamiltonian Monte Carlo. The fully Bayesian approach is to use posterior predictive distributions, which effectively averages over all the high-likelihood parameter configurations rather than using a single point estimate for parameters.

Building successive union of lists/arrays using comprehension and iteration

Afternoon! I'm currently trying to read up on and implement more clever ways to form the union of sublists after imposing varying conditions on the original list (or, more generally, any list), specifically the successive union over what may be a large number of sublists. The data within may be strings or numerical.
For example, I could form sublists of the following list based on the condition that 'a' appears in the ith position of each item:
mylist = [['a', 'b', 'c', 'd'],
          ['d', 'a', 'b', 'c'],
          ['c', 'b', 'a', 'd'],
          ['b', 'd', 'c', 'a']]

mysublist1 = [item for item in mylist if item[0] == 'a']
# mysublist1 == [['a', 'b', 'c', 'd']]
In this example, I could repeat this process, changing the condition to each of the 4 positions, and generate 4 sublists. I could then, for demonstration's sake, re-form the original list using the handy set functions:
set(thisset).union(thatset)
However, without being clever, this takes n-1 of these calls (forming the first union, then taking successive unions), which can get dicey.
Is anyone aware of any methods that would make this a bit more elegant? I have tried appending sets to a list and then taking the successive union over the length of that list, but I'm getting some type errors!
Thanks!
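A sketch of two standard ways to collapse the successive union into a single expression (my own suggestion, not from the original post): set.union accepts any number of iterable arguments, and functools.reduce expresses the same left fold explicitly.
from functools import reduce

sublists = [['a', 'b', 'c', 'd'],
            ['d', 'a', 'b', 'c'],
            ['c', 'b', 'a', 'd'],
            ['b', 'd', 'c', 'a']]

# one call, no explicit loop: start from an empty set and union everything
combined = set().union(*sublists)  # {'a', 'b', 'c', 'd'}

# the same n-1 unions, written as an explicit fold
combined_2 = reduce(set.union, map(set, sublists))
Either form avoids the type errors that come from mixing lists and sets, since every operand is converted to (or absorbed into) a set.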

Select duplicated lists from a list of lists (Python 2.7.13)

I have two lists. The first is a list of lists (pairs), and the second is a flat list with one value per pair, so they have the same length:
list1 = [['47', '43'], ['299', '295'], ['47', '43'], etc.]
list2 = [9.649, 9.612, 9.42, etc.]
I want to detect repeated pairs in the first list (and delete the duplicates), summing the values at the matching indexes in the second list, creating output like this:
list1 = [['47', '43'], ['299', '295'], etc.]
list2 = [19.069, 9.612, etc.]
The main problem is that the order of the values is important, and I'm really stuck.
You could create a collections.defaultdict to sum the values together, with the sublists (converted to tuples, to be hashable) as keys:
import collections

list1 = [['47', '43'], ['299', '295'], ['47', '43']]
list2 = [9.649, 9.612, 9.42]

c = collections.defaultdict(float)
for l, v in zip(list1, list2):
    c[tuple(l)] += v

print(c)
An alternative using collections.Counter, summing one single-entry Counter per pair, which does the same:
c = sum((collections.Counter({tuple(k): v}) for k, v in zip(list1, list2)), collections.Counter())
At this point, we have the related data:
defaultdict(<class 'float'>, {('299', '295'): 9.612, ('47', '43'): 19.069})
Now, if needed (not sure it is, since the dictionary holds the data very well), we can rebuild the lists, keeping the relative order between them (but not their original order; that shouldn't be a problem since the entries are still linked):
list1 = []
list2 = []
for k, v in c.items():
    list1.append(list(k))
    list2.append(v)

print(list1, list2)
result:
[['299', '295'], ['47', '43']]
[9.612, 19.069]
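If the original first-seen order of the pairs matters (the question is Python 2.7, where plain dicts are unordered), here is a sketch of the same summing pass built on collections.OrderedDict:
import collections

c = collections.OrderedDict()
for l, v in zip(list1, list2):
    key = tuple(l)
    c[key] = c.get(key, 0.0) + v

list1 = [list(k) for k in c]
list2 = list(c.values())
# list1 == [['47', '43'], ['299', '295']]
# list2 == [19.069, 9.612]  (up to float rounding)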

cloudsearch query to boost exact match on range

In a CloudSearch structured query, I am searching on a couple of fields.
On field one, the user selects "2"; on field two, the user selects "1".
I want to run this as a range query, so that the returned results fall within -1 to +1 of the selected values,
e.g. on field one the range would be [1,3] and on field two it would be [0,2].
What I want to do is sort the results so that those matching both field one and field two exactly are at the top, with the rest below,
e.g. results where field one = 2 and field two = 1 would be at the top, and the rest in no particular order.
Note: I do need to end up sorting the results by distance, so that all the exactly matching results come first in distance order, followed by the rest, also ordered by distance.
I am sure I could do this with 2 queries; I'm just trying to make it work in one query, if at all possible, to lighten the load.
Say your fields are 'a' and 'b', and the specified values are a=2 and b=1 (as in your example, except I've named the fields 'a' and 'b' instead of 'one' and 'two'). Here are the various terms of your query.
Range Query
This is the query for the range a±1 and b±1 where a=2 and b=1:
q=(and (range field=a[1,3]) (range field=b[0,2]))
Rank Expression
For your rank expression, compute a distance-based score using absolute value so that scores 'a' and 'b' can't cancel each other out (like a=3,b=0 would, for example):
expr.rank1=abs(a-2)+abs(b-1)
Sort by Rank
That defined a ranking expression named rank1, which we now want to sort by, starting with the lowest values ('0' means a=2,b=1):
sort=rank1 asc
Return the Rank
For debugging purposes, you may want to return the ranking score:
return=rank1
Put all those terms together and you've got your query.
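For reference, the assembled request might look like this (a sketch, shown unencoded for readability; in a real request the parameters would be URL-encoded, and q.parser=structured is assumed since the query uses the structured syntax):
q=(and (range field=a[1,3]) (range field=b[0,2]))
&q.parser=structured
&expr.rank1=abs(a-2)+abs(b-1)
&sort=rank1 asc
&return=rank1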
Further Potentially-Useful Things
If you want to get fancy and penalize things in a non-linear way, you can use exp. For example, if you want to differentiate between 'a' and 'b' both being off by 1 vs 'a' being an exact match and 'b' being off by 2 (e.g. a=3,b=2 will rank ahead of a=2,b=3, even though the previous ranker would give them both a score of 2):
expr.rank1=exp(abs(a-2))+exp(abs(b-1))
And you can use boolean logic and the ternary operator to detect and prefer results that meet certain criteria, e.g. to give a big boost when both 'a' and 'b' are on target, a smaller boost when either 'a' or 'b' is on target, etc. (since we're sorting low-to-high, a boost in rank is actually achieved by adding less to the result):
expr.rank1=((a==2&&b==1)?0:100)+((a==2||b==1)?0:1000)+abs(a-2)+abs(b-1)
See http://docs.aws.amazon.com/cloudsearch/latest/developerguide/configuring-expressions.html