The problem is simpler than knapsack (or a type of it, without values and only positive weights). The problem consists of checking whether a number can be a combination of others. The function should return true or false.
For example,
112 and a list with { 17, 100, 101 } should return false, 469 with the same list should return true, 35 should return false, 119 should return true, etc...
Edit: subset sum problem would be more accurate for this than knapsack.
This is a special case of the Subset Sum problem, with sets that only contain one negative number (i.e., express 112 and { 17, 100, 101 } as { -112, 17, 100, 101 }). There's a few algorithms on the Wikipedia page, http://en.wikipedia.org/wiki/Subset_sum_problem.
An observation that will help you is that if your list is {a, b, c...} and the number you want to test is x, then x can be written as a sum of a sublist only if either x or x-a can be written as a sum of the sublist {b, c, ...}. This lets you write a very simple recursive algorithm to solve the problem.
edit: here is some code, taking into account the comments below. Not tested so probably buggy; and not necessarily the fastest. But for a small dataset it will get the job done neatly.
bool is_subset_sum(int x, std::list::const_iterator start, std::list::const_iterator end)
{
// for a 1-element list {a} we just need to test a|x
if (start == end) return (x % *start == 0);
// if x is small enough we don't need to bother testing x - a
if (x<a) return is_subset_sum (x, start+1, end);
// the default case. Note that the shortcut properties of || means the process ends as soon as we get a positive.
return (is_subset_sum (x, start+1, end) || is_subset_sum (x-a, start, end));
}
Note that positive results become denser as the queried number becomes larger. For example, all numbers greater than 100^2 can be generated by { 17, 100, 101 }. So the optimal algorithm may depend upon whether the queried number is much greater than the set's members. You might look into field theory.
At the least, you know the result is always false if the greatest common divisor of the set is not in the query, and that can be checked in negligible time.
If the number to reach is not too large, you can probably generate all the reachable numbers from the set that fall in the range [1,N].
Problem: Reach N using the elements in the list L, where N is small enough not to worry about a vector of size N elements' size.
Algorithm:
Generate a vector V of size N
For each element l in the list L
For each reachable element v in V
mark all elements v + n*l in V as reachable
Related
Question
Given a list L:[a,b,c,d,e,f] is there a built in way of randomly shuffling the elements of the list? Something like:
M:random_order(L);
> [ b, c, d, a, e, f]
I checked the documentation for Functions and Variables for Lists for any built in option for shuffling the order of elements in a list, but didn't see anything obvious.
Context
I'm trying to generate lists of x terms whose maximal sum is s. Right now, I'm doing this by creating a list where each term is a random number between 1 and the max value that ensures that, if the remaining terms have a minimal value of 1, then the sum will be, at most, s:
/* `x` is the total number of terms; `s` is the max sum */
gen_val(x, s):=block([x:x, s:s, vals:makelist(nul,i,x) ],
/*
the first value is a random integer in [1, (s-x)], if
vals[1] = (s-x), then all remaining terms have to be equal to 1
*/
vals[1]: 1 + random(s-x),
/*
subsequent terms are assigned in the same way, subtracting the sum of
previously assigned values, as well as reserving at least 1 unit for
each remaining term
*/
for i:2 thru x
do vals[i]:1 + random(s-sum(vals[k],k,1,i-1)-(x-i+1)),
/* return the list */
vals
);
However, this generates lists where earlier terms (i.e. lower index) have a higher probability of having higher values; whereas I'd like a more even distribution of the values.
The simplest solution I could think of was simply shuffling the elements of the vals list; however, I'd be equally interested in any other method that achieves this desired result (i.e. a list of x terms whose sum is, at most, s).
The even broader context is the problem of dividing an interval of the number line into sub-intervals. I decided to take the length of the interval and the number of partitions as my variables for building the sub-intervals, hence the goal, stated above. If I = [a, b] is the full interval, then for any c, d such that c+d =< b-a we can define the subintervals [a, a+c], [a+c, a+c+d], [a+c+d, b]
I have a large list of items, each item has a weight.
I'd like to select N items randomly without replacement, while the items with more weight are more probable to be selected.
I'm looking for the most performing idea. Performance is paramount. Any ideas?
If you want to sample items without replacement, you have lots of options.
Use a weighted-choice-with-replacement algorithm to choose random indices. There are many algorithms like this. One of them is WeightedChoice, described later in this answer, and another is rejection sampling, described as follows. Assume that the highest weight is max, there are n weights, and each weight is 0 or greater. To choose an index in [0, n) using rejection sampling:
Choose a uniform random integer i in [0, n).
With probability weights[i]/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max] and if that number is weights[i] or less, return i, or go to step 1 otherwise.)
Each time the weighted choice algorithm chooses an index, set the weight for the chosen index to 0 to keep it from being chosen again. Or...
Assign each index an exponentially distributed random number (with a rate equal to that index's weight), make a list of pairs assigning each number to an index, then sort that list by those numbers. Then take each item from first to last, in ascending order. This sorting can be done on-line using a priority queue data structure (a technique that leads to weighted reservoir sampling). Notice that the naïve way to generate the random number, -ln(1-RNDU01())/weight, where RNDU01() is a uniform random number in [0, 1], is not robust, however ("Index of Non-Uniform Distributions", under "Exponential distribution").
Tim Vieira gives additional options in his blog.
A paper by Bram van de Klundert compares various algorithms.
EDIT (Aug. 19): Note that for these solutions, the weight expresses how likely a given item will appear first in the sample. This weight is not necessarily the chance that a given sample of n items will include that item (that is, an inclusion probability). The methods given above will not necessarily ensure that a given item will appear in a random sample with probability proportional to its weight; for that, see "Algorithms of sampling with equal or unequal probabilities".
Assuming you want to choose items at random with replacement, here is pseudocode implementing this kind of choice. Given a list of weights, it returns a random index (starting at 0), chosen with a probability proportional to its weight. This algorithm is a straightforward way to implement weighted choice. But if it's too slow for you, see my section "Weighted Choice With Replacement" for a survey of other algorithms.
METHOD WChoose(weights, value)
// Choose the index according to the given value
lastItem = size(weights) - 1
runningValue = 0
for i in 0...size(weights) - 1
if weights[i] > 0
newValue = runningValue + weights[i]
lastItem = i
// NOTE: Includes start, excludes end
if value < newValue: break
runningValue = newValue
end
end
// If we didn't break above, this is a last
// resort (might happen because rounding
// error happened somehow)
return lastItem
END METHOD
METHOD WeightedChoice(weights)
return WChoose(weights, RNDINTEXC(Sum(weights)))
END METHOD
Let A be the item array with x itens. The complexity of each method is defined as
< preprocessing_time, querying_time >
If sorting is possible: < O(x lg x), O(n) >
sort A by the weight of the itens.
create an array B, for example:
B = [ 0, 0, 0, x/2, x/2, x/2, x/2, x/2 ].
it's clear to see that B has a bigger probability from choosing x/2.
if you haven't picked n elements yet, choose a random element e from B.
pick a random element from A within the interval e : x-1.
If iterating through the itens is possible: < O(x), O(tn) >
iterate through A and find the average weight w of the elements.
define the maximum number of tries t.
try (at most t times) to pick a random number in A whose weight is bigger than w.
test for some t that gives you good/satisfactory results.
If nothing above is possible: < O(1), O(tn) >
define the maximum number of tries t.
if you haven't picked n elements yet, take t random elements in A.
pick the element with biggest value.
test for some t that gives you good/satisfactory results.
My question's header is similar to this link, however that one wasn't answered to my expectations.
I have an array of integers (1 000 000 entries), and need to mask exactly 30% of elements.
My approach is to loop over elements and roll a dice for each one. Doing it in a non-interrupted manner is good for cache coherency.
As soon as I notice that exactly 300 000 of elements were indeed masked, I need to stop. However, I might reach the end of an array and have only 200 000 elements masked, forcing me to loop a second time, maybe even a third, etc.
What's the most efficient way to ensure I won't have to loop a second time, and not being biased towards picking some elements?
Edit:
//I need to preserve the order of elements.
//For instance, I might have:
[12, 14, 1, 24, 5, 8]
//Masking away 30% might give me:
[0, 14, 1, 24, 0, 8]
The result of masking must be the original array, with some elements set to zero
Just do a fisher-yates shuffle but stop at only 300000 iterations. The last 300000 elements will be the randomly chosen ones.
std::size_t size = 1000000;
for(std::size_t i = 0; i < 300000; ++i)
{
std::size_t r = std::rand() % size;
std::swap(array[r], array[size-1]);
--size;
}
I'm using std::rand for brevity. Obviously you want to use something better.
The other way is this:
for(std::size_t i = 0; i < 300000;)
{
std::size_t r = rand() % 1000000;
if(array[r] != 0)
{
array[r] = 0;
++i;
}
}
Which has no bias and does not reorder elements, but is inferior to fisher yates, especially for high percentages.
When I see a massive list, my mind always goes first to divide-and-conquer.
I won't be writing out a fully-fleshed algorithm here, just a skeleton. You seem like you have enough of a clue to take decent idea and run with it. I think I only need to point you in the right direction. With that said...
We'd need an RNG that can return a suitably-distributed value for how many masked values could potentially be below a given cut point in the list. I'll use the halfway point of the list for said cut. Some statistician can probably set you up with the right RNG function. (Anyone?) I don't want to assume it's just uniformly random [0..mask_count), but it might be.
Given that, you might do something like this:
// the magic RNG your stats homework will provide
int random_split_sub_count_lo( int count, int sub_count, int split_point );
void mask_random_sublist( int *list, int list_count, int sub_count )
{
if (list_count > SOME_SMALL_THRESHOLD)
{
int list_count_lo = list_count / 2; // arbitrary
int list_count_hi = list_count - list_count_lo;
int sub_count_lo = random_split_sub_count_lo( list_count, mask_count, list_count_lo );
int sub_count_hi = list_count - sub_count_lo;
mask( list, list_count_lo, sub_count_lo );
mask( list + sub_count_lo, list_count_hi, sub_count_hi );
}
else
{
// insert here some simple/obvious/naive implementation that
// would be ludicrous to use on a massive list due to complexity,
// but which works great on very small lists. I'm assuming you
// can do this part yourself.
}
}
Assuming you can find someone more informed on statistical distributions than I to provide you with a lead on the randomizer you need to split the sublist count, this should give you O(n) performance, with 'n' being the number of masked entries. Also, since the recursion is set up to traverse the actual physical array in constantly-ascending-index order, cache usage should be as optimal as it's gonna get.
Caveat: There may be minor distribution issues due to the discrete nature of the list versus the 30% fraction as you recurse down and down to smaller list sizes. In practice, I suspect this may not matter much, but whatever person this solution is meant for may not be satisfied that the random distribution is truly uniform when viewed under the microscope. YMMV, I guess.
Here's one suggestion. One million bits is only 128K which is not an onerous amount.
So create a bit array with all items initialised to zero. Then randomly select 300,000 of them (accounting for duplicates, of course) and mark those bits as one.
Then you can run through the bit array and, any that are set to one (or zero, if your idea of masking means you want to process the other 700,000), do whatever action you wish to the corresponding entry in the original array.
If you want to ensure there's no possibility of duplicates when randomly selecting them, just trade off space for time by using a Fisher-Yates shuffle.
Construct an collection of all the indices and, for each of the 700,000 you want removed (or 300,000 if, as mentioned, masking means you want to process the other ones) you want selected:
pick one at random from the remaining set.
copy the final element over the one selected.
reduce the set size.
This will leave you with a random subset of indices that you can use to process the integers in the main array.
You want reservoir sampling. Sample code courtesy of Wikipedia:
(*
S has items to sample, R will contain the result
*)
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]
I have the following problem:
Generate M uniformly random integers from the range 0-N, where N >> M, and where no pair has a difference less than K. where M >> K.
At the moment the best method I can think of is to maintain a sorted list, then determine the lower bound of the current generated integer and test it with the lower and upper elements, if it's ok to then insert the element in between. This is of complexity O(nlogn).
Would there happen to be a more efficient algorithm?
An example of the problem:
Generate 1000 uniformly random integers between zero and 100million where the difference between any two integers is no less than 1000
A comprehensive way to solve this would be to:
Determine all the combinations of n-choose-m that satisfy the constraint, lets called it set X
Select a uniformly random integer i in the range [0,|X|).
Select the i'th combination from X as the result.
This solution is problematic when the n-choose-m is large, as enumerating and storing all possible combinations will be extremely costly. Hence an efficient online generating solution is sought.
Note: The following is a C++ implementation of the solution provided by pentadecagon
std::vector<int> generate_random(const int n, const int m, const int k)
{
if ((n < m) || (m < k))
return std::vector<int>();
std::random_device source;
std::mt19937 generator(source());
std::uniform_int_distribution<> distribution(0, n - (m - 1) * k);
std::vector<int> result_list;
result_list.reserve(m);
for (int i = 0; i < m; ++i)
{
result_list.push_back(distribution(generator));
}
std::sort(std::begin(result_list),std::end(result_list));
for (int i = 0; i < m; ++i)
{
result_list[i] += (i * k);
}
return result_list;
}
http://ideone.com/KOeR4R
.
EDIT: I adapted the text for the requirement to create ordered sequences, each with the same probability.
Create random numbers a_i for i=0..M-1 without duplicates. Sort them. Then create numbers
b_i=a_i + i*(K-1)
Given the construction, those numbers b_i have the required gaps, because the a_i already have gaps of at least 1. In order to make sure those b values cover exactly the required range [1..N], you must ensure a_i are picked from a range [1..N-(M-1)*(K-1)]. This way you get truly independent numbers. Well, as independent as possible given the required gap. Because of the sorting you get O(M log M) performance again, but this shouldn't be too bad. Sorting is typically very fast. In Python it looks like this:
import random
def random_list( N, M, K ):
s = set()
while len(s) < M:
s.add( random.randint( 1, N-(M-1)*(K-1) ) )
res = sorted( s )
for i in range(M):
res[i] += i * (K-1)
return res
First off: this will be an attempt to show that there's a bijection between the (M+1)- compositions (with the slight modification that we will allow addends to be 0) of the value N - (M-1)*K and the valid solutions to your problem. After that, we only have to pick one of those compositions uniformly at random and apply the bijection.
Bijection:
Let
Then the xi form an M+1-composition (with 0 addends allowed) of the value on the left (notice that the xi do not have to be monotonically increasing!).
From this we get a valid solution
by setting the values mi as follows:
We see that the distance between mi and mi + 1 is at least K, and mM is at most N (compare the choice of the composition we started out with). This means that every (M+1)-composition that fulfills the conditions above defines exactly one valid solution to your problem. (You'll notice that we only use the xM as a way to make the sum turn out right, we don't use it for the construction of the mi.)
To see that this gives a bijection, we need to see that the construction can be reversed; for this purpose, let
be a given solution fulfilling your conditions. To get the composition this is constructed from, define the xi as follows:
Now first, all xi are at least 0, so that's alright. To see that they form a valid composition (again, every xi is allowed to be 0) of the value given above, consider:
The third equality follows since we have this telescoping sum that cancels out almost all mi.
So we've seen that the described construction gives a bijection between the described compositions of N - (M-1)*K and the valid solutions to your problem. All we have to do now is pick one of those compositions uniformly at random and apply the construction to get a solution.
Picking a composition uniformly at random
Each of the described compositions can be uniquely identified in the following way (compare this for illustration): reserve N - (M-1)*K spaces for the unary notation of that value, and another M spaces for M commas. We get an (M+1)- composition of N - (M-1)*K by choosing M of the N - (M-1)*K + M spaces, putting commas there, and filling the rest with |. Then let x0 be the number of | before the first comma, xM+1 the number of | after the last comma, and all other xi the number of | between commas i and i+1. So all we have to do is pick an M-element subset of the integer interval[1; N - (M-1)*K + M] uniformly at random, which we can do for example with the Fisher-Yates shuffle in O(N + M log M) (we need to sort the M delimiters to build the composition) since M*K needs to be in O(N) for any solutions to exist. So if N is bigger than M by at least a logarithmic factor, then this is linear in N.
Note: #DavidEisenstat suggested that there are more space efficient ways of picking the M-element subset of that interval; I'm not aware of any, I'm afraid.
You can get an error-proof algorithm out of this by doing the simple input validation we get from the construction above that N ≥ (M-1) * K and that all three values are at least 1 (or 0, if you define the empty set as a valid solution for that case).
Why not do this:
for (int i = 0; i < M; ++i) {
pick a random number between K and N/M
add this number to (N/M)* i;
Now you have M random numbers, distributed evenly along N, all of which have a difference of at least K. It's in O(n) time. As an added bonus, it's already sorted. :-)
EDIT:
Actually, the "pick a random number" part shouldn't be between K and N/M, but between min(K, [K - (N/M * i - previous value)]). That would ensure that the differences are still at least K, and not exclude values that should not be missed.
Second EDIT:
Well, the first case shouldn't be between K and N/M - it should be between 0 and N/M. Just like you need special casing for when you get close to the N/M*i border, we need special initial casing.
Aside from that, the issue you brought up in your comments was fair representation, and you're right. As my pseudocode is presented, it currently completely misses the excess between N/M*M and N. It's another edge case; simply change the random values of your last range.
Now, in this case, your distribution will be different for the last range. Since you have more numbers, you have slightly less chance for each number than you do for all the other ranges. My understanding is that because you're using ">>", this shouldn't really impact the distribution, i.e. the difference in size in the sample set should be nominal. But if you want to make it more fair, you divide the excess equally among each range. This makes your initial range calculation more complex - you'll have to augment each range based on how much remainder there is divided by M.
There are lots of special cases to look out for, but they're all able to be handled. I kept the pseudocode very basic just to make sure that the general concept came through clearly. If nothing else, it should be a good starting point.
Third and Final EDIT:
For those worried that the distribution has a forced evenness, I still claim that there's nothing saying it can't. The selection is uniformly distributed in each segment. There is a linear way to keep it uneven, but that also has a trade-off: if one value is selected extremely high (which should be unlikely given a very large N), then all the other values are constrained:
int prevValue = 0;
int maxRange;
for (int i = 0; i < M; ++i) {
maxRange = N - (((M - 1) - i) * K) - prevValue;
int nextValue = random(0, maxRange);
prevValue += nextValue;
store previous value;
prevValue += K;
}
This is still linear and random and allows unevenness, but the bigger prevValue gets, the more constrained the other numbers become. Personally, I prefer my second edit answer, but this is an available option that given a large enough N is likely to satisfy all the posted requirements.
Come to think of it, here's one other idea. It requires a lot more data maintenance, but is still O(M) and is probably the most fair distribution:
What you need to do is maintain a vector of your valid data ranges and a vector of probability scales. A valid data range is just the list of high-low values where K is still valid. The idea is you first use the scaled probability to pick a random data range, then you randomly pick a value within that range. You remove the old valid data range and replace it with 0, 1 or 2 new data ranges in the same position, depending on how many are still valid. All of these actions are constant time other than handling the weighted probability, which is O(M), done in a loop M times, so the total should be O(M^2), which should be much better than O(NlogN) because N >> M.
Rather than pseudocode, let me work an example using OP's original example:
0th iteration: valid data ranges are from [0...100Mill], and the weight for this range is 1.0.
1st iteration: Randomly pick one element in the one element vector, then randomly pick one element in that range.
If the element is, e.g. 12345678, then we remove the [0...100Mill] and replace it with [0...12344678] and [12346678...100Mill]
If the element is, e.g. 500, then we remove the [0...100Mill] and replace it with just [1500...100Mill], since [0...500] is no longer a valid range. The only time we will replace it with 0 ranges is in the unlikely event that you have a range with only one number in it and it gets picked. (In that case, you'll have 3 numbers in a row that are exactly K apart from each other.)
The weight for the ranges are their length over the total length, e.g. 12344678/(12344678 + (100Mill - 12346678)) and (100Mill - 12346678)/(12344678 + (100Mill - 12346678))
In the next iterations, you do the same thing: randomly pick a number between 0 and 1 and determine which of the ranges that scale falls into. Then randomly pick a number in that range, and replace your ranges and scales.
By the time it's done, we're no longer acting in O(M), but we're still only dependent on the time of M instead of N. And this actually is both uniform and fair distribution.
Hope one of these ideas works for you!
I'm not exactly sure how to ask this, but I'll try to be as specific as possible.
Imagine a tetris screen with only rectangles, of different shapes, falling to the bottom.
I want to compute the maximum number of rectangles that I can fit one next to the other without any overlapping ones. I've named them lines in the title because I'm actually only interested in the length of the rectangle when computing, or the line parallel to the x axis that it's falling towards.
So basically I have a custom type with a start and end, both integers between 0 and 100. Say we have a list of these rectangles ranging from 1 to n. rectangle_n.start (unless it's the rectangle closest to the origin) has to be > rectangle_(n-1).end so that they will never overlap.
I'm reading the rectangle coordinates (both are x axis coordinates) from a file with random numbers.
As an example:
consider this list of rectangle type objects
rectangle_list {start, end} = {{1,2}, {3,5}, {4,7} {9,12}}
We can observe that the 3rd object has its start coordinate 4 < the previous rectangle's end coordinate which is 5. So in sorting this list, I would have to remove the 2nd or the 3rd object so that they don't overlap.
I'm not sure if there is a type for this kind of problem so I didn't know how else to name it. I'm interested in an algorithm that can be applied on a list of such objects and would sort them out accordingly.
I've tagged this with c++ because the code I'm writing is c++ but any language would do for the algorithm.
You are essentially solving the following problem. Suppose we have n intervals {[x_1,y_1),[x_2,y_2),...,[x_n,y_n)} with x_1<=x_2<=...<=x_n. We want to find a maximal subset of these intervals such that there are no overlaps between any intervals in the subset.
The naive solution is dynamic programming. It guarantees to find the best solution. Let f(i), 0<=i<=n, be the size of the maximal subset up to interval [x_i,y_i). We have equation (this is latex):
f(i)=\max_{0<=j<i}{f(j)+d(i,j)}
where d(i,j)=1 if and only if [x_i,y_i) and [x_j,y_j) have no overlaps; otherwise d(i,j) takes zero. You can iteratively compute f(i), starting from f(0)=0. f(n) gives the size of the maximal subset. To get the actual subset, you need to keep a separate array s(i)=\argmax_{0<=j<i}{f(j)+d(i,j)}. You then need to backtrack to get the 'path'.
This is an O(n^2) algorithm: you need to compute each f(i) and for each f(i) you need i number of tests. I think there should be a O(nlogn) algorithm, but I am not so sure.
EDIT: an implementation in Lua:
function find_max(list)
local ret, f, b = {}, {}, {}
f[0], b[0] = 0, 0
table.sort(list, function(a,b) return a[1]<b[1] end)
-- dynamic programming
for i, x in ipairs(list) do
local max, max_j = 0, -1
x = list[i]
for j = 0, i - 1 do
local e = j > 0 and list[j][2] or 0
local score = e <= x[1] and 1 or 0
if f[j] + score > max then
max, max_j = f[j] + score, j
end
end
f[i], b[i] = max, max_j
end
-- backtrack
local max, max_i = 0, -1
for i = 1, #list do
if f[i] > max then -- don't use >= here
max, max_i = f[i], i
end
end
local i, ret = max_i, {}
while true do
table.insert(ret, list[i])
i = b[i]
if i == 0 then break end
end
return ret
end
local l = find_max({{1,2}, {4,7}, {3,5}, {8,11}, {9,12}})
for _, x in ipairs(l) do
print(x[1], x[2])
end
The name of this problem is bin packing, it is usually considered as a hard problem but can be computed reasonably well for small number of bins.
Here is a video explaining common approaches to this problem
EDIT : By hard problem, I mean that some kind of brute force has to be employed. You will have to evaluate a lot of solutions and reject most of them, so usually you need some kind of evaluation mechanism. You need to be able to compare solution, such as "This solution packs 4 rectangles with area of 15" is better than "This solution packs 3 rectangles with area of 16".
I can't think of a shortcut, so you may have to enumerate the power set in descending order of size and stop on the first match.
The straightforward way to do this is to enumerate combinations of decreasing size. You could do something like this in C++11:
template <typename I>
std::set<Span> find_largest_non_overlapping_subset(I start, I finish) {
std::set<Span> result;
for (size_t n = std::distance(start, finish); n-- && result.empty();) {
enumerate_combinations(start, finish, n, [&](I begin, I end) {
if (!has_overlaps(begin, end)) {
result.insert(begin, end);
return false;
}
return true;
});
}
return result;
}
The implementation of enumerate_combination is left as an exercise. I assume you already have has_overlap.