What probabilistic function to use - C++

I want to probabilistically choose n edges from e edges stored in sorted order in a vector.
I want the choice to be random, but biased: at the start, smaller edges should get more weight,
and as I take edges, the larger remaining edges should get progressively more weight as well.
What probabilistic function of n and e should I choose?
while (edgesTaken < n) {
    for each edge i, while edgesTaken < n
        prob = pdf(edgesTaken, i)
        if (prob > THRESHOLD)
            take the edge
}

You need the quantile function (inverse CDF) for the distribution you want. Draw a random number q, uniformly distributed in [0, 1), using a standard generator, then call the quantile function with q as its argument. The resulting values will have the required distribution.
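A minimal C++ sketch of that inverse-transform idea, assuming (purely for illustration) a power-law quantile q^bias that favours small indices; the bias parameter and the pick_index name are mine, not part of the question:

#include <cmath>
#include <cstddef>
#include <random>

// Inverse-transform sampling: map a uniform q in [0, 1) through a quantile
// function. Here the quantile q^bias (bias > 1) favours small indices.
std::size_t pick_index(std::size_t remaining, double bias, std::mt19937 &rng)
{
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double q = uniform(rng);                              // uniform in [0, 1)
    double mapped = std::pow(q, bias);                    // quantile of the chosen distribution
    return static_cast<std::size_t>(mapped * remaining);  // index into the remaining edges
}

Shrinking bias toward 1 as edges are taken would gradually flatten the preference, which matches the behaviour asked for above.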

The probability that the first edge is 1 is choose(n-1,e-1)/choose(n,e).
More generally, the probability that the first edge is k is
[choose(n-k,e-1)/choose(n,e)] * 1/k
You also might want the probability that there is exactly one edge in 1-k:
[choose(n-k,e-1)/choose(n,e)]
From here I think you can wrap things up!
P.S. Just to explain, the three functions give the ratio of the number of ways to pick edges that satisfy their condition, to choose(n,e) which is the number of ways to pick e edges from n.

Related

What would be the fastest algorithm to randomly select N items from a list based on weights distribution?

I have a large list of items, each item has a weight.
I'd like to select N items randomly without replacement, where items with more weight are more likely to be selected.
I'm looking for the best-performing approach; performance is paramount. Any ideas?
If you want to sample items without replacement, you have lots of options.
Use a weighted-choice-with-replacement algorithm to choose random indices. There are many algorithms like this. One of them is WeightedChoice, described later in this answer, and another is rejection sampling, described as follows. Assume that the highest weight is max, there are n weights, and each weight is 0 or greater. To choose an index in [0, n) using rejection sampling:
Choose a uniform random integer i in [0, n).
With probability weights[i]/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max] and if that number is weights[i] or less, return i, or go to step 1 otherwise.)
Each time the weighted choice algorithm chooses an index, set the weight for the chosen index to 0 to keep it from being chosen again. Or...
Assign each index an exponentially distributed random number (with a rate equal to that index's weight), make a list of pairs assigning each number to an index, then sort that list by those numbers. Then take each item from first to last, in ascending order. This sorting can be done on-line using a priority queue data structure (a technique that leads to weighted reservoir sampling). Notice that the naïve way to generate the random number, -ln(1-RNDU01())/weight, where RNDU01() is a uniform random number in [0, 1], is not robust, however ("Index of Non-Uniform Distributions", under "Exponential distribution").
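As a concrete illustration of that exponential-key option (my sketch, not the author's code), here is a small C++ function that samples k indices without replacement; it leans on std::exponential_distribution rather than the naïve -ln(1-u)/weight formula mentioned above:

#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Weighted sampling without replacement via exponential keys: give every index
// an Exp(weight) key and take the k indices with the smallest keys.
std::vector<std::size_t> sample_without_replacement(const std::vector<double> &weights,
                                                    std::size_t k, std::mt19937 &rng)
{
    std::vector<std::pair<double, std::size_t>> keyed;  // (key, index)
    keyed.reserve(weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (weights[i] > 0.0) {
            std::exponential_distribution<double> expo(weights[i]);
            keyed.emplace_back(expo(rng), i);
        }
    }
    std::sort(keyed.begin(), keyed.end());               // ascending key order
    k = std::min(k, keyed.size());
    std::vector<std::size_t> chosen;
    chosen.reserve(k);
    for (std::size_t i = 0; i < k; ++i) chosen.push_back(keyed[i].second);
    return chosen;
}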
Tim Vieira gives additional options in his blog.
A paper by Bram van de Klundert compares various algorithms.
EDIT (Aug. 19): Note that for these solutions, the weight expresses how likely a given item will appear first in the sample. This weight is not necessarily the chance that a given sample of n items will include that item (that is, an inclusion probability). The methods given above will not necessarily ensure that a given item will appear in a random sample with probability proportional to its weight; for that, see "Algorithms of sampling with equal or unequal probabilities".
Assuming you want to choose items at random with replacement, here is pseudocode implementing this kind of choice. Given a list of weights, it returns a random index (starting at 0), chosen with a probability proportional to its weight. This algorithm is a straightforward way to implement weighted choice. But if it's too slow for you, see my section "Weighted Choice With Replacement" for a survey of other algorithms.
METHOD WChoose(weights, value)
    // Choose the index according to the given value
    lastItem = size(weights) - 1
    runningValue = 0
    for i in 0...size(weights) - 1
        if weights[i] > 0
            newValue = runningValue + weights[i]
            lastItem = i
            // NOTE: Includes start, excludes end
            if value < newValue: break
            runningValue = newValue
        end
    end
    // If we didn't break above, this is a last
    // resort (might happen because a rounding
    // error happened somehow)
    return lastItem
END METHOD

METHOD WeightedChoice(weights)
    return WChoose(weights, RNDINTEXC(Sum(weights)))
END METHOD
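For reference, here is one way the same linear scan might look in C++; this is an illustrative translation (RNDINTEXC replaced by a uniform real draw, since the weights here are doubles), not the author's reference implementation:

#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Weighted choice with replacement: picks index i with probability
// weights[i] / sum(weights) by scanning the running cumulative sum.
// Assumes a non-empty weight vector.
std::size_t weighted_choice(const std::vector<double> &weights, std::mt19937 &rng)
{
    double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    std::uniform_real_distribution<double> dist(0.0, total);
    double value = dist(rng);                 // uniform in [0, total)
    double running = 0.0;
    std::size_t lastItem = weights.size() - 1;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (weights[i] > 0.0) {
            running += weights[i];
            lastItem = i;
            if (value < running) break;       // value fell in this item's slice
        }
    }
    return lastItem;                          // last resort on rounding error
}

If the weights never change between draws, std::discrete_distribution provides the same with-replacement behaviour out of the box.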
Let A be the item array with x items. The complexity of each method is given as
< preprocessing_time, querying_time >
If sorting is possible: < O(x lg x), O(n) >
sort A by the weight of the items.
create an array B, for example:
B = [ 0, 0, 0, x/2, x/2, x/2, x/2, x/2 ].
it's clear to see that B gives a bigger probability of choosing x/2.
if you haven't picked n elements yet, choose a random element e from B.
pick a random element from A within the interval e : x-1.
If iterating through the items is possible: < O(x), O(tn) >
iterate through A and find the average weight w of the elements.
define the maximum number of tries t.
try (at most t times) to pick a random element of A whose weight is bigger than w.
test for some t that gives you good/satisfactory results.
If nothing above is possible: < O(1), O(tn) >
define the maximum number of tries t.
if you haven't picked n elements yet, take t random elements in A.
pick the element with the biggest weight.
test for some t that gives you good/satisfactory results.

Pick a matrix cell according to its probability

I have a 2D matrix of positive real values, stored as follow:
vector<vector<double>> matrix;
Each cell can have a value greater than or equal to 0, and this value represents the possibility of the cell being chosen. In particular, for example, a cell with value 3 is three times as likely to be chosen as a cell with value 1.
I need to select N cells of the matrix (0 <= N <= total number of cells) randomly, but according to their probability of being selected.
How can I do that?
The algorithm should be as fast as possible.
I describe two methods, A and B.
A works in time approximately N * number of cells, and uses space O(log number of cells). It is good when N is small.
B works in time approximately (number of cells + N) * O(log number of cells), and uses space O(number of cells). So it is good when N is large (or even 'medium'), but it uses a lot more memory; in practice it might be slower in some regimes for that reason.
Method A:
The first thing you need to do is normalize the entries. (It's not clear to me if you assume they are normalized or not.) That means, sum all the entries and divide by the sum. (This part is potentially slow, so it's better if you assume or require that it already happened.)
Then you sample like this:
Choose a random [i,j] entry of the matrix (by choosing i,j each uniformly randomly from the range of integers 0 to n-1).
Choose a uniformly random real number p in the range [0, 1].
Check if matrix[i][j] > p. If so, return the pair [i][j]. If not, go back to step 1.
Why does this work? The probability that we end at step 3 with any particular output is equal to the probability that [i][j] was selected (the same for each entry) times the probability that p was small enough, which is proportional to the value matrix[i][j]; so the sampling chooses each entry in the correct proportion. It's also possible that at step 3 we go back to the start -- does that bias things? Basically, no. The reason is this: suppose we arbitrarily choose a number k and consider the distribution of the algorithm conditioned on stopping after exactly k rounds. No matter which k we choose, that conditional distribution has to be exactly right by the argument above, because eliminating the case where p is too small leaves the remaining possibilities with their proportions intact. Since the distribution is correct for every value of k we might condition on, and the overall distribution (not conditioned on k) is an average of those conditional distributions, the overall distribution is correct as well.
If you want to rigorously analyze the number of rounds typically needed, you can do so by analyzing the probability that we actually stop at step 3 in any particular round. Since the rounds are independent, this probability is the same for every round, which means the number of rounds is geometrically distributed; it is tightly concentrated around its mean, and we can determine the mean from that probability.
The probability that we stop at step 3 can be determined by considering the conditional probability that we stop at step 3, given that we chose any particular entry [i][j]. By the formulas for conditional expectation, you get that
Pr[ stop at step 3 ] = sum_{i,j} ( 1/(n^2) * Matrix[i,j] )
Since we assumed the matrix is normalized, this sum reduces to just 1/n^2. So, the expected number of rounds is about n^2 (that is, n^2 up to a constant factor) no matter what the entries in the matrix are. You can't hope to do a lot better than that I think -- that's about the same amount of time it takes to just read all the entries of the matrix, and it's hard to sample from a distribution that you cannot even read all of.
Note: What I described is a way to correctly sample a single element -- to get N elements from one matrix, you can just repeat it N times.
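A minimal C++ sketch of method A, assuming the matrix is non-empty and has already been normalized so its entries sum to 1 (the function name and the use of std::mt19937 are illustrative choices, not part of the original answer):

#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Method A: rejection sampling on a matrix whose entries sum to 1.
// Returns one cell (i, j) with probability equal to matrix[i][j].
std::pair<std::size_t, std::size_t>
sample_cell(const std::vector<std::vector<double>> &matrix, std::mt19937 &rng)
{
    std::uniform_int_distribution<std::size_t> row(0, matrix.size() - 1);
    std::uniform_real_distribution<double> real(0.0, 1.0);
    for (;;) {
        std::size_t i = row(rng);                              // step 1: uniform cell
        std::uniform_int_distribution<std::size_t> col(0, matrix[i].size() - 1);
        std::size_t j = col(rng);
        double p = real(rng);                                  // step 2: uniform p in [0, 1)
        if (matrix[i][j] > p) return {i, j};                   // step 3: accept, else retry
    }
}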
Method B:
Basically you just want to compute a histogram and sample inversely from it, so that you know you get exactly the right distribution. Computing the histogram is expensive, but once you have it, getting samples is cheap and easy.
In C++ it might look like this:
// Make histogram (needs <map>, <utility>, <vector> and <random>)
typedef unsigned int uint;
typedef std::pair<uint, uint> upair;
typedef std::map<double, upair> histogram_type;
histogram_type histogram;
double cumulative = 0.0;
for (uint i = 0; i < Matrix.size(); ++i) {
    for (uint j = 0; j < Matrix[i].size(); ++j) {
        if (Matrix[i][j] <= 0.0) continue;  // skip zero cells so they can't overwrite a boundary
        cumulative += Matrix[i][j];
        histogram[cumulative] = std::make_pair(i, j);
    }
}
std::mt19937 rng{std::random_device{}()};
std::uniform_real_distribution<double> uniform01(0.0, 1.0);
std::vector<upair> result;
for (uint k = 0; k < N; ++k) {
    // Do a sample (this should never repeat... if it does not find a lower bound
    // you could also assert false quite reasonably, since it means something is
    // wrong with the random number generation)
    while (true) {
        double p = cumulative * uniform01(rng);  // uniform real in [0, cumulative)
        histogram_type::iterator it = histogram.lower_bound(p);
        if (it != histogram.end()) {
            result.push_back(it->second);
            break;
        }
    }
}
return result;
Here the time to make the histogram is something like number of cells * O(log number of cells) since inserting into the map takes time O(log n). You need an ordered data structure in order to get cheap lookup N * O(log number of cells) later when you do repeated sampling. Possibly you could choose a more specialized data structure to go faster, but I think there's only limited room for improvement.
Edit: As #Bob__ points out in comments, in method (B) as written there is potentially going to be some error due to floating-point round-off if the matrices are quite large, even using type double, at this line:
cumulative += Matrix[i][j];
The problem is that, if cumulative becomes much larger than Matrix[i][j], beyond what the floating-point precision can handle, then each time this statement is executed a small error may be introduced, and these errors accumulate into significant inaccuracy.
As he suggests, if that happens, the most straightforward way to fix it is to sort the values Matrix[i][j] first, so the small values are summed before the large ones. You could even do this in the general implementation to be safe -- sorting them isn't going to take more time asymptotically than you already spend anyway.

Code Output on Graph and some claims on Local Contest?

I ran into a question as follows:
We have the following code on a weighted, acyclic graph G(V, E) with positive and negative edge weights. We change the weights of this graph with the code below to obtain a graph G' without negative edges. Let V = {1, 2, ..., n} and let c_ij be the weight of the edge from vertex i to vertex j.
Change_weight(G)
  for i=1 to n
    for j=1 to n
      c_i = min c_ij for all j
      if c_i < 0
        c_ij = c_ij - c_i for all j
        c_ki = c_ki + c_i for all k
We have two claims:
1) The shortest path between every two vertices in G is the same as in G'.
2) The length of the shortest path between every two vertices in G is the same as in G'.
We want to verify these two statements: which one is true and which one is false? Can anyone add a hint as to why they are true or false?
My solution:
I think (2) is false, with the following counterexample: the original graph is on the left, and the result after the algorithm is run is on the right. The shortest path from 1 to 3 changed: it used to pass through vertex 2, but after the algorithm is run it no longer does.
Assumptions:
There are a few problems with your presentation of the question; I made some assumptions, which I clarify here. The answer to your question, given that these assumptions are correct, is in the section below.
First, as #amit said, your use of j is not clear. It seems that you meant this:
Change_weight(G)
  for i = 1 to n
    c_i = min(c_ij) for all j
    if c_i < 0
      c_ij = c_ij - c_i for all j
      c_ki = c_ki + c_i for all k
That is, for every vertex i, if the smallest outgoing edge c_i is negative, then increase the weights of all outgoing edges by -c_i and decrease the weights of all incoming edges by -c_i. Then the smallest outgoing edge will have weight of 0.
Second, by itself, this algorithm will not guarantee that G' has no negative edges! Consider the following graph:
Here, the value of edge (1,2) is pushed up to 0 by the operation on 1, but it is pushed back to -1 by the operation on 2. You must specify that the vertices are processed in reverse topological order, so that edge (i,j) is always operated on by j before being operated on by i. (Alternatively, you could sort the vertices in topological order and iterate from n to 1.)
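For concreteness, here is a small C++ sketch of the corrected procedure. It assumes an adjacency-matrix representation with an infinity sentinel for missing edges and vertex indices 0..n-1 already in reverse topological order; both assumptions are mine, not part of the question:

#include <cstddef>
#include <limits>
#include <vector>

const double NO_EDGE = std::numeric_limits<double>::infinity();

// c[i][j] is the weight of edge (i, j), or NO_EDGE if the edge is absent.
// Vertices are assumed to be indexed in reverse topological order.
void change_weight(std::vector<std::vector<double>> &c)
{
    const std::size_t n = c.size();
    for (std::size_t i = 0; i < n; ++i) {
        // c_i = minimum weight of any outgoing edge of i
        double c_i = NO_EDGE;
        for (std::size_t j = 0; j < n; ++j)
            if (c[i][j] != NO_EDGE && c[i][j] < c_i) c_i = c[i][j];
        if (c_i != NO_EDGE && c_i < 0) {
            for (std::size_t j = 0; j < n; ++j)          // raise outgoing edges by -c_i
                if (c[i][j] != NO_EDGE) c[i][j] -= c_i;
            for (std::size_t k = 0; k < n; ++k)          // lower incoming edges by -c_i
                if (c[k][i] != NO_EDGE) c[k][i] += c_i;
        }
    }
}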
Answer to your question:
1) The shortest path between every two vertices in G is the same as in G'.
This is true. Consider a path not as a tuple of edges but as a tuple of nodes. For vertices s and t, a path is a tuple of nodes (s, v_1, v_2, ..., t) where there is an edge between every two subsequent elements. For every vertex u, u decreased the cost of incoming edges at the same rate that it increased the cost of outgoing edges; therefore, the relative cost of including u in the path is unchanged.
2) The weight of the shortest path between every two vertices in G is the same as in G'.
This is false. The source s increases its outgoing weight by -c_s, while the destination t decreases its incoming weight by -c_t. If c_s != c_t, then the weight of the path will not be the same.
To reiterate, the weight of every path from s to t will be increased by (c_t-c_s). Therefore, the shortest path for a given s and t pair will still be the shortest (since all paths between this pair change by the same amount). However, the weight will obviously not necessarily be the same.

Efficient data structure for sparse data lookup

Situation:
Given some points with coordinates (x, y)
Range: 0 < x < 100,000,000 and 0 < y < 100,000,000
I have to find the smallest square which contains at least N points on its edges or inside it.
I used a vector to store the coordinates and searched all squares with side length from minLength up to maxLength (applying brute force over the relevant space):
struct Point
{
    int x;
    int y;
};

vector<Point> P;
int minLength = sqrt(N) - 1;
int maxLength = 0;

// bigx   = largest  x coordinate of any point
// bigy   = largest  y coordinate of any point
// smallx = smallest x coordinate of any point
// smally = smallest y coordinate of any point
maxLength = ((bigx - smallx) < (bigy - smally)) ? (bigx - smallx) : (bigy - smally);
For each candidate square, I traversed the complete vector to see whether at least N points are on its edges or inside it.
This was quite time-inefficient.
Q1. What data structure should I use to improve time efficiency without changing the algorithm I used?
Q2. Efficient Algorithm for this problem?
The smallest square must have points on 2 opposite edges - if not, you could shrink the square by 1 and still contain the same number of points. That means the possible coordinates of the edges are limited to those of the input points. The input points are probably not on the corners, though. (For a minimum rectangle, there would be points on all 4 edges, as you could shrink one dimension without altering the other.)
The next thing to realize is that each point divides the plane into 4 quadrants, and each quadrant contains a number of points. (These can add up to more than the total number of points, as the quadrants have a one-pixel overlap.) Let's say that NW(p) is the number of points to the northwest of point p, i.e. those that have x >= px and y >= py. Then the number of points in a square is NW(bottomleft) + NW(topright) - NW(bottomright) - NW(topleft).
It's fairly easy to calculate NW(p) for all input points. Sort them by x, and for equal x by y. The most northwestern point has NW(p) == 0. The next point can have NW(p) == 1 if it's to the southeast of the first point, else it has NW(p) == 0. It's also useful to keep track of SW(p) in this stage, as you're working through the points from west to east and they're therefore not sorted north to south. Having calculated NW(p), you can determine the number of points in a square S in O(1).
Recall that the square size is restricted by the need to have points on opposite edges. Assume the points are on the left (western) and right edges - you still have the points sorted by x order. Start by assuming the left edge is at your leftmost x coordinate, and see what the right edge must be to contain N points. Now shift the left edge to the next x coordinate and find a new right edge (and thus a new square). Do this until the right edge of the square is at the rightmost point.
It's also possible that the square is constrained in the y direction. Just sort the points in the y direction and repeat, then choose the smallest square of the two outcomes.
Since you're running linearly through the points in x and y direction, that part is just O(N) and the dominant factor is the O(N log N) sort.
Look at http://en.wikipedia.org/wiki/Space_partitioning for algorithms that use the Divide-and-Conquer technique to solve this. This is definitely solvable in Polynomial time.
Another variant algorithm can be along the following lines:
Generate a Voronoi diagram on the points to get neighbour information. [ O(n log(n)) ]
Now use dynamic programming; the DP will be similar to the problem of finding the maximum subarray in a 2D array. Here, instead of the sum of numbers, you keep a count of the points before it.
2.a Essentially a recurrence similar to the following will hold. [ O(n) ]
Number of points in the square from (0,0) to (x,y) =
    (number of points in the square from (0,0) to (x-1,y))
  + (number of points in the square from (0,0) to (x,y-1))
  - (number of points in the square from (0,0) to (x-1,y-1))
  + (number of points at (x,y))
Your recurrence will have to change for all the points on its neighbourhood and to the left and above, instead of just the points above and left as above.
Once the DP is ready, you can query the number of points in a square in O(1).
Another O(n^2) loop over all possible combinations then finds the smallest square.
You can even greedily start from the smallest squares first; that way you can end your search as soon as you find a suitable square.
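A minimal sketch of that prefix-count DP in C++, assuming the points have already been bucketed onto a rectangular W x H grid of counts (the grid representation and names are illustrative):

#include <cstddef>
#include <vector>

// prefix[x][y] = number of points in the rectangle from (0,0) to (x-1,y-1),
// built with the recurrence above; any axis-aligned rectangle (and hence any
// candidate square) can then be counted in O(1).
struct PrefixCount {
    std::vector<std::vector<int>> prefix;  // (W+1) x (H+1), 1-based internally

    PrefixCount(const std::vector<std::vector<int>> &grid)  // grid[x][y] = points at cell (x,y)
        : prefix(grid.size() + 1,
                 std::vector<int>(grid.empty() ? 1 : grid[0].size() + 1, 0))
    {
        for (std::size_t x = 1; x < prefix.size(); ++x)
            for (std::size_t y = 1; y < prefix[x].size(); ++y)
                prefix[x][y] = prefix[x - 1][y] + prefix[x][y - 1]
                             - prefix[x - 1][y - 1] + grid[x - 1][y - 1];
    }

    // Points with x in [x1, x2] and y in [y1, y2] (inclusive, 0-based cells).
    int count(std::size_t x1, std::size_t y1, std::size_t x2, std::size_t y2) const
    {
        return prefix[x2 + 1][y2 + 1] - prefix[x1][y2 + 1]
             - prefix[x2 + 1][y1] + prefix[x1][y1];
    }
};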
An R-tree allows spatial searching but doesn't have an STL implementation, although SQLite would allow a binding. It can answer queries such as "get all points within range" and "k nearest neighbours".
Finding the region with the densest data is a problem similar to clustering.
Iterate over the points and find the N nearest entries to each point. Then generate the smallest circle - the centre would be at (max(x) - min(x), max(y) - min(y)). A square can be formed which contains all the neighbours, with side length somewhere between 2r and 2*sqrt(r) compared to the circle.
Time taken: O(x) to build the structure,
O(x N log(x)) to search for the smallest cluster.
Note: There are a bunch of answers for your second question (which will probably reap bigger benefits), but I'm only addressing your first one, i.e. what data structure to use without changing the algorithm.
There, I think that your choice of a vector is already pretty good, because in general vectors offer the best payload/overhead ratio and the fastest iteration. In order to find specific bottlenecks, use a profiler; otherwise you are only guessing. With large vectors, there are a few things to avoid though:
Overallocation: this wastes space.
Underallocation: this causes copying when the vector is grown to the necessary size.
Copying.

Generating random integers with a difference constraint

I have the following problem:
Generate M uniformly distributed random integers from the range [0, N], where N >> M and M >> K, such that no pair has a difference less than K.
At the moment the best method I can think of is to maintain a sorted list, then determine the lower bound of the currently generated integer and test it against the neighbouring lower and upper elements; if it is OK, insert the element in between. This has complexity O(n log n).
Would there happen to be a more efficient algorithm?
An example of the problem:
Generate 1000 uniformly random integers between zero and 100 million where the difference between any two integers is no less than 1000.
A comprehensive way to solve this would be to:
Determine all the combinations of n-choose-m that satisfy the constraint; let's call this set X.
Select a uniformly random integer i in the range [0,|X|).
Select the i'th combination from X as the result.
This solution is problematic when the n-choose-m is large, as enumerating and storing all possible combinations will be extremely costly. Hence an efficient online generating solution is sought.
Note: The following is a C++ implementation of the solution provided by pentadecagon
// needs <algorithm>, <random> and <vector>
std::vector<int> generate_random(const int n, const int m, const int k)
{
    // A solution exists only if n - (m - 1) * k >= 0.
    if (m < 1 || k < 0 || n < (m - 1) * k)
        return std::vector<int>();

    std::random_device source;
    std::mt19937 generator(source());
    std::uniform_int_distribution<> distribution(0, n - (m - 1) * k);

    std::vector<int> result_list;
    result_list.reserve(m);
    for (int i = 0; i < m; ++i)
    {
        result_list.push_back(distribution(generator));
    }
    std::sort(std::begin(result_list), std::end(result_list));
    for (int i = 0; i < m; ++i)
    {
        result_list[i] += (i * k);
    }
    return result_list;
}
http://ideone.com/KOeR4R
EDIT: I adapted the text for the requirement to create ordered sequences, each with the same probability.
Create random numbers a_i for i=0..M-1 without duplicates. Sort them. Then create numbers
b_i=a_i + i*(K-1)
Given the construction, those numbers b_i have the required gaps, because the a_i already have gaps of at least 1. In order to make sure those b values cover exactly the required range [1..N], you must ensure a_i are picked from a range [1..N-(M-1)*(K-1)]. This way you get truly independent numbers. Well, as independent as possible given the required gap. Because of the sorting you get O(M log M) performance again, but this shouldn't be too bad. Sorting is typically very fast. In Python it looks like this:
import random

def random_list(N, M, K):
    s = set()
    while len(s) < M:
        s.add(random.randint(1, N - (M - 1) * (K - 1)))
    res = sorted(s)
    for i in range(M):
        res[i] += i * (K - 1)
    return res
First off: this will be an attempt to show that there's a bijection between the (M+1)-compositions (with the slight modification that we will allow addends to be 0) of the value N - (M-1)*K and the valid solutions to your problem. After that, we only have to pick one of those compositions uniformly at random and apply the bijection.
Bijection:
Let x_0, x_1, ..., x_M be non-negative integers with

N - (M-1)*K = x_0 + x_1 + ... + x_M

Then the x_i form an (M+1)-composition (with 0 addends allowed) of the value on the left (notice that the x_i do not have to be monotonically increasing!).
From this we get a valid solution

0 <= m_1 < m_2 < ... < m_M <= N, with m_(i+1) - m_i >= K,

by setting the values m_i as follows:

m_i = x_0 + x_1 + ... + x_(i-1) + (i-1)*K, for i = 1, ..., M

We see that the distance between m_i and m_(i+1) is at least K, and m_M is at most N (compare the choice of the composition we started out with). This means that every (M+1)-composition that fulfills the conditions above defines exactly one valid solution to your problem. (You'll notice that we only use x_M as a way to make the sum turn out right; we don't use it for the construction of the m_i.)
To see that this gives a bijection, we need to see that the construction can be reversed; for this purpose, let

0 <= m_1 < m_2 < ... < m_M <= N, with m_(i+1) - m_i >= K,

be a given solution fulfilling your conditions. To get the composition it is constructed from, define the x_i as follows:

x_0 = m_1,   x_i = m_(i+1) - m_i - K for 1 <= i <= M-1,   x_M = N - m_M

Now first, all x_i are at least 0, so that's alright. To see that they form a valid composition (again, every x_i is allowed to be 0) of the value given above, consider:

x_0 + x_1 + ... + x_M
  = x_0 + (x_1 + ... + x_(M-1)) + x_M
  = m_1 + (sum of (m_(i+1) - m_i - K) for i = 1, ..., M-1) + (N - m_M)
  = m_1 + (m_M - m_1 - (M-1)*K) + (N - m_M)
  = N - (M-1)*K

The third equality follows since we have a telescoping sum that cancels out almost all of the m_i.
So we've seen that the described construction gives a bijection between the described compositions of N - (M-1)*K and the valid solutions to your problem. All we have to do now is pick one of those compositions uniformly at random and apply the construction to get a solution.
Picking a composition uniformly at random
Each of the described compositions can be uniquely identified in the following way (compare this for illustration): reserve N - (M-1)*K spaces for the unary notation of that value, and another M spaces for M commas. We get an (M+1)-composition of N - (M-1)*K by choosing M of the N - (M-1)*K + M spaces, putting commas there, and filling the rest with |. Then let x_0 be the number of | before the first comma, x_M the number of | after the last comma, and every other x_i the number of | between commas i and i+1. So all we have to do is pick an M-element subset of the integer interval [1, N - (M-1)*K + M] uniformly at random, which we can do for example with the Fisher-Yates shuffle in O(N + M log M) (we need to sort the M delimiters to build the composition), since M*K needs to be in O(N) for any solutions to exist. So if N is bigger than M by at least a logarithmic factor, then this is linear in N.
Note: #DavidEisenstat suggested that there are more space efficient ways of picking the M-element subset of that interval; I'm not aware of any, I'm afraid.
You can get an error-proof algorithm out of this by adding the simple input validation the construction above suggests: check that N ≥ (M-1) * K and that all three values are at least 1 (or 0, if you define the empty set as a valid solution for that case).
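For illustration, here is how the construction might be realized in C++, under the formula reconstruction above; it draws the M-element subset with a std::set instead of a Fisher-Yates shuffle, purely for brevity, and assumes N >= (M-1)*K:

#include <random>
#include <set>
#include <vector>

// Draw a uniformly random M-element subset of [1, N - (M-1)*K + M] and map it
// through the bijection above to a valid solution 0 <= m_1 < ... < m_M <= N
// with gaps of at least K.
std::vector<long long> generate_with_gaps(long long N, int M, long long K, std::mt19937_64 &rng)
{
    const long long top = N - (long long)(M - 1) * K + M;
    std::uniform_int_distribution<long long> dist(1, top);
    std::set<long long> commas;                       // the M chosen "comma" positions
    while ((int)commas.size() < M)
        commas.insert(dist(rng));

    std::vector<long long> result;
    result.reserve(M);
    int i = 1;                                        // 1-based index into the sorted subset
    for (long long q : commas) {                      // std::set iterates in ascending order
        result.push_back(q - i + (long long)(i - 1) * K);  // m_i = q_i - i + (i-1)*K
        ++i;
    }
    return result;
}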
Why not do this:
for (int i = 0; i < M; ++i) {
    pick a random number between K and N/M
    add this number to (N/M) * i
}
Now you have M random numbers, distributed evenly along N, all of which have a difference of at least K. It's in O(n) time. As an added bonus, it's already sorted. :-)
EDIT:
Actually, the "pick a random number" part shouldn't be between K and N/M, but between min(K, [K - (N/M * i - previous value)]). That would ensure that the differences are still at least K, and not exclude values that should not be missed.
Second EDIT:
Well, the first case shouldn't be between K and N/M - it should be between 0 and N/M. Just like you need special casing for when you get close to the N/M*i border, we need special initial casing.
Aside from that, the issue you brought up in your comments was fair representation, and you're right. As my pseudocode is presented, it currently completely misses the excess between N/M*M and N. It's another edge case; simply change the random values of your last range.
Now, in this case, your distribution will be different for the last range. Since you have more numbers, you have slightly less chance for each number than you do for all the other ranges. My understanding is that because you're using ">>", this shouldn't really impact the distribution, i.e. the difference in size in the sample set should be nominal. But if you want to make it more fair, you divide the excess equally among each range. This makes your initial range calculation more complex - you'll have to augment each range based on how much remainder there is divided by M.
There are lots of special cases to look out for, but they're all able to be handled. I kept the pseudocode very basic just to make sure that the general concept came through clearly. If nothing else, it should be a good starting point.
Third and Final EDIT:
For those worried that the distribution has a forced evenness: there is nothing saying it has to. The selection is uniformly distributed within each segment. There is a linear way to allow unevenness, but it comes with a trade-off: if one value is selected extremely high (which should be unlikely given a very large N), then all the other values are constrained:
// needs <random> and <vector>; M, N, K as before
std::mt19937 rng{std::random_device{}()};
std::vector<int> values;
int prevValue = 0;
for (int i = 0; i < M; ++i) {
    // leave room for the remaining (M - 1 - i) values, each needing a gap of K
    int maxRange = N - ((M - 1 - i) * K) - prevValue;
    int nextValue = std::uniform_int_distribution<int>(0, maxRange)(rng);
    prevValue += nextValue;
    values.push_back(prevValue);   // store this value
    prevValue += K;                // the next value must be at least K higher
}
This is still linear and random and allows unevenness, but the bigger prevValue gets, the more constrained the other numbers become. Personally, I prefer my second edit answer, but this is an available option that given a large enough N is likely to satisfy all the posted requirements.
Come to think of it, here's one other idea. It requires a lot more data maintenance, but is still O(M) and is probably the most fair distribution:
What you need to do is maintain a vector of your valid data ranges and a vector of probability scales. A valid data range is just the list of high-low values where K is still valid. The idea is you first use the scaled probability to pick a random data range, then you randomly pick a value within that range. You remove the old valid data range and replace it with 0, 1 or 2 new data ranges in the same position, depending on how many are still valid. All of these actions are constant time other than handling the weighted probability, which is O(M), done in a loop M times, so the total should be O(M^2), which should be much better than O(NlogN) because N >> M.
Rather than pseudocode, let me work an example using OP's original example:
0th iteration: valid data ranges are from [0...100Mill], and the weight for this range is 1.0.
1st iteration: Randomly pick one element in the one element vector, then randomly pick one element in that range.
If the element is, e.g. 12345678, then we remove the [0...100Mill] and replace it with [0...12344678] and [12346678...100Mill]
If the element is, e.g. 500, then we remove the [0...100Mill] and replace it with just [1500...100Mill], since [0...500] is no longer a valid range. The only time we will replace it with 0 ranges is in the unlikely event that you have a range with only one number in it and it gets picked. (In that case, you'll have 3 numbers in a row that are exactly K apart from each other.)
The weights for the ranges are their lengths divided by the total length, e.g. 12344678/(12344678 + (100Mill - 12346678)) and (100Mill - 12346678)/(12344678 + (100Mill - 12346678))
In the next iterations, you do the same thing: randomly pick a number between 0 and 1 and determine which of the ranges that scale falls into. Then randomly pick a number in that range, and replace your ranges and scales.
By the time it's done, we're no longer acting in O(M), but we're still only dependent on the size of M instead of N. And this actually gives both a uniform and fair distribution.
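Here is a rough C++ sketch of that range-splitting idea (the names and types are mine; the weighted pick uses std::discrete_distribution, which is O(M) to rebuild each round, matching the O(M^2) estimate above):

#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Keep a list of still-valid [lo, hi] ranges; pick a range weighted by its
// length, pick a value uniformly inside it, then split the range into the
// 0, 1 or 2 pieces that stay at least K away from the chosen value.
std::vector<int64_t> sample_with_gaps(int64_t N, int M, int64_t K, std::mt19937_64 &rng)
{
    struct Range { int64_t lo, hi; };              // inclusive bounds
    std::vector<Range> ranges{{0, N}};
    std::vector<int64_t> result;
    result.reserve(M);

    while ((int)result.size() < M && !ranges.empty()) {
        std::vector<double> lengths;
        lengths.reserve(ranges.size());
        for (const Range &r : ranges)
            lengths.push_back(static_cast<double>(r.hi - r.lo + 1));
        std::discrete_distribution<std::size_t> pickRange(lengths.begin(), lengths.end());
        std::size_t idx = pickRange(rng);

        Range r = ranges[idx];
        int64_t v = std::uniform_int_distribution<int64_t>(r.lo, r.hi)(rng);
        result.push_back(v);

        ranges.erase(ranges.begin() + idx);        // replace with 0, 1 or 2 pieces
        if (v - K >= r.lo) ranges.push_back({r.lo, v - K});
        if (v + K <= r.hi) ranges.push_back({v + K, r.hi});
    }
    return result;
}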
Hope one of these ideas works for you!