E.g., given the array [1,2,1,3,1,2,1,5], the output should be:
1 -> 2
2 -> 4
3 -> 0
5 -> 0
There is a solution I can think of but it is of O(n^2).
Suggest something better.
In one linear scan, transform your array into a hashmap of arrays indexed by value, each containing the indices where that value was found. For your example this would be:
{
1: [0, 2, 4, 6],
2: [1, 5],
3: [3],
5: [7],
}
Then for each entry l in the hashmap output 0 if len(l) <= 1, and otherwise output l[1] - l[0]. If you also have to check that the period is consistent, check that l[i] - l[i-1] == l[1] - l[0] for all i >= 2.
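A minimal Python sketch of the steps above, assuming "period" means the gap between the first two occurrences of a value. The function name `periods` and the `check_consistent` flag are illustrative, and reporting an inconsistent period as `None` is an assumption, not something the answer specifies:

```python
from collections import defaultdict

def periods(values, check_consistent=False):
    """Map each distinct value to the gap between its first two occurrences."""
    # One linear scan: value -> list of indices where it occurs.
    positions = defaultdict(list)
    for index, value in enumerate(values):
        positions[value].append(index)

    result = {}
    for value, idx in positions.items():
        if len(idx) <= 1:
            result[value] = 0  # value occurs only once
            continue
        period = idx[1] - idx[0]
        # Optional check that every consecutive gap equals the first gap.
        if check_consistent and any(b - a != period for a, b in zip(idx, idx[1:])):
            period = None  # assumption: None marks an inconsistent period
        result[value] = period
    return result

print(periods([1, 2, 1, 3, 1, 2, 1, 5]))
# {1: 2, 2: 4, 3: 0, 5: 0}
```

Both passes touch each index once, so the whole thing is O(n) expected time with a hash-based dict.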
Find the length of the longest continuous sub-sequence of an array the elements of which make up a set of continuous increasing integers.
The input file consists of the number n (the number of elements in the array) followed by n integers.
example input - 10 1 6 4 5 2 3 8 10 7 7
example output - 6 (1 6 4 5 2 3, since they make up the set 1 2 3 4 5 6).
I was able to write an algorithm that satisfies 0<n<5000 but in order to get 100 points the algorithm had to work for 0<=n<=50000.
How about something like this? Arrange the array elements in descending order, each coupled with the index range over which it is a local maximum (for example, A[0] = 10 would be the maximum for array indexes [0,10], while A[3] = 4 would be the local maximum for array indexes [3,3]). Now traverse this list and find the longest continuously descending sequence whose index ranges are all contained in the starting range.
10 1 6 4 5 2 3 8 10 7 7
=> 10, [ 0,10]
    8, [ 1, 7]
    7, [ 9,10]
    6, [ 1, 6] <--
    5, [ 3, 6]   | ranges
    4, [ 3, 3]   | all
    3, [ 5, 6]   | contained
    2, [ 5, 5]   | in [1,6]
    1, [ 1, 1] <--
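The answer does not say how to obtain the index ranges. One possible way, sketched here purely as an assumption, is a monotonic stack run once from each side, which takes O(n) overall. The name `max_ranges` is hypothetical, and equal values are treated as not blocking each other, which matches the table (10 gets [0,10]). This only builds the table; the traversal step is not implemented:

```python
def max_ranges(a):
    """For each index i, return the widest (lo, hi) in which a[i] is a maximum."""
    n = len(a)
    lo = [0] * n
    hi = [n - 1] * n

    stack = []  # indices of candidates that may bound later elements on the left
    for i in range(n):
        # Elements not larger than a[i] cannot stop its range.
        while stack and a[stack[-1]] <= a[i]:
            stack.pop()
        lo[i] = stack[-1] + 1 if stack else 0
        stack.append(i)

    stack = []  # same idea, scanning from the right
    for i in range(n - 1, -1, -1):
        while stack and a[stack[-1]] <= a[i]:
            stack.pop()
        hi[i] = stack[-1] - 1 if stack else n - 1
        stack.append(i)

    return list(zip(lo, hi))

a = [10, 1, 6, 4, 5, 2, 3, 8, 10, 7, 7]
ranges = max_ranges(a)
# Print distinct (value, range) pairs in descending order of value.
for value, rng in sorted(set(zip(a, ranges)), reverse=True):
    print(value, list(rng))
```

For the example this prints the same value/range pairs as the table, one line per distinct pair.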
I have n data points in some arbitrary space and I cluster them.
The result of my clustering algorithm is a partition represented by an int vector l of length n assigning each point to a cluster. The values of l range from 0 to (possibly) n-1.
Example:
l_1 = [ 1 1 1 0 0 2 6 ]
is a partition of n=7 points into 4 clusters: the first three points are clustered together, the fourth and fifth are together, and the last two points form two distinct singleton clusters.
My question:
Suppose I have two partitions l_1 and l_2. How can I efficiently determine whether they represent identical partitions?
Example:
l_2 = [ 2 2 2 9 9 3 1 ]
is identical to l_1 since it represents the same partitions of the points (despite the fact that the "numbers"/"labels" of the clusters are not identical).
On the other hand
l_3 = [ 2 2 2 9 9 3 3 ]
is no longer identical since it groups together the last two points.
I'm looking for a solution in either C++, python or Matlab.
Unwanted direction
A naive approach would be to compare the co-occurrence matrix
c1 = bsxfun( @eq, l_1, l_1' );
c2 = bsxfun( @eq, l_2, l_2' );
l_1_l_2_are_identical = all( c1(:)==c2(:) );
The co-occurrence matrix c1 is of size nxn with true if points k and m are in the same cluster and false otherwise (regardless of the cluster "number"/"label").
Therefore if the co-occurrence matrices c1 and c2 are identical then l_1 and l_2 represent identical partitions.
However, since the number of points, n, might be quite large I would like to avoid O(n^2) solutions...
Any ideas?
Thanks!
When are two partitions identical?
Probably if they have the exact same members.
So if you just want to test for identity, you can do the following:
Substitute each partition ID with the smallest object ID in the partition.
Then two partitionings are identical if and only if this representation is identical.
In your example above, let's assume the vector index 1 .. 7 is your object ID. Then I would get the canonical form
[ 1 1 1 4 4 6 7 ]
  ^ first occurrence at pos 1 of 1 in l_1 / 2 in l_2
        ^ first occurrence at pos 4
for l_1 and l_2, whereas l_3 canonicalizes to
[ 1 1 1 4 4 6 6 ]
To make it more clear, here is another example:
l_4 = [ A B 0 D 0 B A ]
canonicalizes to
[ 1 2 3 4 3 2 1 ]
since the first occurrence of cluster "A" is at position 1, "B" at position 2, etc.
If you want to measure how similar two clusterings are, a good approach is to look at precision/recall/f1 of the object pairs, where the pair (a,b) exists if and only if a and b belong to the same cluster.
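A rough Python sketch of that pair-based similarity measure, in case it helps. It counts pairs from label frequencies instead of enumerating all O(n^2) pairs; the names `pairs` and `pair_f1`, and the convention of returning 1.0 when a clustering has no pairs at all, are my own assumptions:

```python
from collections import Counter

def pairs(count):
    """Number of unordered pairs that can be formed from `count` objects."""
    return count * (count - 1) // 2

def pair_f1(predicted, reference):
    """Pair-counting precision, recall and F1 of `predicted` against `reference`."""
    if len(predicted) != len(reference):
        raise ValueError("both labelings must cover the same objects")
    # Pairs that share a cluster in both labelings.
    both = sum(pairs(c) for c in Counter(zip(predicted, reference)).values())
    pred_pairs = sum(pairs(c) for c in Counter(predicted).values())
    ref_pairs = sum(pairs(c) for c in Counter(reference).values())
    precision = both / pred_pairs if pred_pairs else 1.0  # assumed convention
    recall = both / ref_pairs if ref_pairs else 1.0       # assumed convention
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

l_1 = [1, 1, 1, 0, 0, 2, 6]
l_2 = [2, 2, 2, 9, 9, 3, 1]
l_3 = [2, 2, 2, 9, 9, 3, 3]
print(pair_f1(l_2, l_1))  # identical partitions: (1.0, 1.0, 1.0)
print(pair_f1(l_3, l_1))  # l_3 adds one pair that l_1 does not have
```

Identical partitions score 1.0 on all three values; l_3 loses precision against l_1 because it merges the last two points.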
Update: Since it was claimed that this is quadratic, I will further clarify.
To produce the canonical form, use the following approach (actual python code):
def canonical_form(li):
    """Replace each label with the index of its first occurrence.

    Note, this implementation overwrites li.
    """
    first = dict()  # label -> index of its first occurrence
    for i in range(len(li)):
        v = first.get(li[i])
        if v is None:
            first[li[i]] = i
            v = i
        li[i] = v
    return li

print(canonical_form([ 1, 1, 1, 0, 0, 2, 6 ]))
# [0, 0, 0, 3, 3, 5, 6]
print(canonical_form([ 2, 2, 2, 9, 9, 3, 1 ]))
# [0, 0, 0, 3, 3, 5, 6]
print(canonical_form([ 2, 2, 2, 9, 9, 3, 3 ]))
# [0, 0, 0, 3, 3, 5, 5]
print(canonical_form(['A','B',0,'D',0,'B','A']))
# [0, 1, 2, 3, 2, 1, 0]
print(canonical_form([1,1,1,0,0,2,6]) == canonical_form([2,2,2,9,9,3,1]))
# True
print(canonical_form([1,1,1,0,0,2,6]) == canonical_form([2,2,2,9,9,3,3]))
# False
If you are going to relabel your partitions, as has been previously suggested, you will potentially need to search through n labels for each of the n items. I.e. the solutions are O(n^2).
Here is my idea: Scan through both lists simultaneously, maintaining a counter for each partition label in each list.
You will need to be able to map partition labels to counter numbers.
If the counters for each list do not match, then the partitions do not match.
This would be O(n).
Here is a proof of concept in Python:
l_1 = [ 1, 1, 1, 0, 0, 2, 6 ]
l_2 = [ 2, 2, 2, 9, 9, 3, 1 ]
l_3 = [ 2, 2, 2, 9, 9, 3, 3 ]
d1 = dict()  # label in l_1 -> counter number
d2 = dict()  # label in l_2 -> counter number
c1 = []      # counters for l_1
c2 = []      # counters for l_2
# assume lists same length
match = True
for i in range(len(l_1)):
    if l_1[i] not in d1:
        x1 = len(c1)
        d1[l_1[i]] = x1
        c1.append(1)
    else:
        x1 = d1[l_1[i]]
        c1[x1] += 1
    if l_2[i] not in d2:
        x2 = len(c2)
        d2[l_2[i]] = x2
        c2.append(1)
    else:
        x2 = d2[l_2[i]]
        c2[x2] += 1
    # the counter numbers and counts must agree at every position
    if x1 != x2 or c1[x1] != c2[x2]:
        match = False
print("match = {}".format(match))
In matlab:
function tf = isIdenticalClust( l_1, l_2 )
%
% checks if partitions l_1 and l_2 are identical or not
%
tf = all( accumarray( {l_1} , l_2 , [], @(x) all( x == x(1) ) ) == 1 ) &&...
     all( accumarray( {l_2} , l_1 , [], @(x) all( x == x(1) ) ) == 1 );
What this does:
It groups all elements of l_1 according to the partition of l_2 and checks whether the elements of l_1 in each cluster are all identical. It then repeats the same for l_2 grouped according to l_1.
If both groupings yield homogeneous clusters, the partitions are identical.
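For those without Matlab, roughly the same two-way check can be sketched in Python. This is not a translation of accumarray, just the same idea expressed with two dicts: every label of l_1 must map to a single label of l_2, and vice versa. The function name is hypothetical:

```python
def is_identical_clustering(l_1, l_2):
    """True if the two label vectors induce the same partition of the points."""
    if len(l_1) != len(l_2):
        return False
    forward = {}   # label in l_1 -> the only label it may pair with in l_2
    backward = {}  # label in l_2 -> the only label it may pair with in l_1
    for a, b in zip(l_1, l_2):
        # setdefault stores the pairing on first sight and returns the stored one after.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True

l_1 = [1, 1, 1, 0, 0, 2, 6]
l_2 = [2, 2, 2, 9, 9, 3, 1]
l_3 = [2, 2, 2, 9, 9, 3, 3]
print(is_identical_clustering(l_1, l_2))  # True
print(is_identical_clustering(l_1, l_3))  # False
```

It is a single pass with hash lookups, so O(n) expected time, and unlike the accumarray version it does not need the labels to be positive integers.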
This is a question in Maple.
I understand in terms of Java that I want a counter and an increment, but my logic doesn't convert as simply to Maple code.
I have a very long list of numbers LIST (196 entries) that I wish to turn into a 14x14 Array, but convert(LIST,Array) only gives me a 1-dimensional array.
In Maple code, this will give me my first column.
j:=1;
for i from 1 to 14 do
    B[i,j]:=Longlistvalue[i];
end do;
It's clear that my second column comes from j=2 and i from 15 to 28, but I'm struggling to put this into a loop.
Surely there is either a loop I can use for this or a maple command that puts the first 14 into the first row (or column) then the next 14 into the next row/column etc?
My most recent attempt gets me
B:=Array(1..14,1..14):
n:=1;
m:=14;
for j from 1 to 14 do
    for i from n to m do
        B[i,j]:=Longlistvalue[i];
    end do;
    n:=n+14;
    m:=m+14;
end do;
But now it states that my array index is out of range (because the i in B[i,j] must be less than 15).
Is there a way to get around this by means of a more efficient loop?
The Array (or Matrix) constructor can be used to do this directly, using an operator to assign the entries.
You can lay the list entries down into the Array either by column or by row. Adjust the example to fit your case where m=14 and n=14.
m,n := 3,4:
L:=[seq(i,i=1..m*n)]; # you got your list in another way
L := [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Array(1..m,1..n,(i,j)->L[(j-1)*m+i]);
[1 4 7 10]
[ ]
[2 5 8 11]
[ ]
[3 6 9 12]
Array(1..m,1..n,(i,j)->L[(i-1)*n+j]);
[1 2 3 4]
[ ]
[5 6 7 8]
[ ]
[9 10 11 12]
You could also use nested loops.
Longlistvalue:=[seq(i,i=1..14^2)]: # your values will differ, of course
B:=Array(1..14,1..14):
n:=14; # number of rows
m:=14; # number of columns
for j from 1 to m do
    for i from 1 to n do
        # each column holds n entries, so column j starts at offset (j-1)*n
        B[i,j]:=Longlistvalue[(j-1)*n+i];
    end do;
end do:
# so we can see the contents, displayed in full
interface(rtablesize=infinity):
B;
Given a permutation of 1...n, for example 5 3 4 1 2,
how can I find all ascending subsequences of length 3 in linear time?
Is it possible to find other ascending subsequences of length X ? X
I have no idea how to solve it in linear time.
Do you need the actual ascending sequences? Or just the number of ascending subsequences?
It isn't possible to generate them all in less than the time it takes to list them, which, as has been pointed out, is O(N^X / (X-1)!). (There is a possibly unexpected factor of X because it takes time O(X) to list a data structure of size X.) The obvious recursive search for them scales not far from that.
However, counting them can be done in time O(X * N^2) if you use dynamic programming. Here is Python for that.
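perm = [3, 5, 1, 2, 4, 6]  # the permutation (example values)
X = 3                      # length of the subsequences to count

# counts[i][k] = number of ascending subsequences of length k+1 ending at perm[i]
counts = []
answer = 0
for i in range(len(perm)):
    inner_counts = [0 for k in range(X)]
    inner_counts[0] = 1
    for j in range(i):
        if perm[j] < perm[i]:
            for k in range(1, X):
                inner_counts[k] += counts[j][k-1]
    counts.append(inner_counts)
    answer += inner_counts[-1]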
For your example 3 5 1 2 4 6 and X = 3 you will wind up with:
counts = [
[1, 0, 0],
[1, 1, 0],
[1, 0, 0],
[1, 1, 0],
[1, 3, 1],
[1, 5, 5]
]
answer = 6
(You only found 5 above, the missing one is 2 4 6.)
It isn't hard to extend this answer to create a data structure that makes it easy to list them directly, to find a random one, etc.
You can't find all ascending subsequences in linear time because there may be many more subsequences than that.
For instance, in a sorted original sequence every subset is an increasing subsequence, so a sorted sequence of length N (1,2,...,N) has N choose k = N!/((N-k)! k!) increasing subsequences of length k.
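A small brute-force check of that count, offered only as a sanity test (it is exponential and not meant as a solution). The function name `count_increasing` is made up for the example, and `math.comb` needs Python 3.8 or later:

```python
from itertools import combinations
from math import comb

def count_increasing(seq, k):
    """Count the strictly increasing subsequences of length k by enumeration."""
    # combinations() keeps elements in their original order, so each tuple
    # is a subsequence of seq.
    return sum(
        1
        for candidate in combinations(seq, k)
        if all(x < y for x, y in zip(candidate, candidate[1:]))
    )

N, k = 8, 3
print(count_increasing(range(1, N + 1), k), comb(N, k))  # 56 56
print(count_increasing([3, 5, 1, 2, 4, 6], 3))           # 6
```

For the sorted sequence every k-subset qualifies, so the count equals N choose k, which already exceeds N for small k and is why no linear-time listing can exist.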