Chapel domains: differences between `low/high` and `first/last` methods

Chapel domains have two sets of methods:
domain.low, domain.high
and
domain.first, domain.last
What are the various cases where these return different results (i.e., when is domain.first != domain.low and domain.last != domain.high)?

First, note that these queries are supported not just on domains, but also on ranges (a simpler type representing an integer sequence upon which many domains, and their domain queries, are based). For that reason, my answer will initially focus on ranges for simplicity, before returning to dense rectangular domains (which are defined using a range per dimension).
As background, first and last on a range are designed to specify the indices that you'll get when iterating over that range. In contrast, low and high specify the minimal and maximal indices that define the range.
For a simple range, like 1..10, first and low will be the same, evaluating to 1, while last and high will both evaluate to 10.
The way you iterate through a range in reverse order in Chapel is by using a negative stride like 1..10 by -1. For this range, low and high will still be 1 and 10 respectively, but first will be 10 and last will be 1 since the range represents the integers 10, 9, 8, ..., 1.
Chapel also supports non-unit strides, and they can also result in differences. For example, for the range 1..10 by 2, low and high will still be 1 and 10 respectively, and first will still be 1, but last will be 9, since this range only represents the odd values between 1 and 10.
The following program demonstrates these cases along with 1..10 by -2 which I'll leave as an exercise for the reader (you can also try it online (TIO)):
proc printBounds(r) {
  writeln("For range ", r, ":");
  writeln(" first = ", r.first);
  writeln(" last = ", r.last);
  writeln(" low = ", r.low);
  writeln(" high = ", r.high);
  writeln();
}
printBounds(1..10);
printBounds(1..10 by -1);
printBounds(1..10 by 2);
printBounds(1..10 by -2);
Dense rectangular domains are defined using a range per dimension. Queries like low, high, first, and last on such domains return a tuple of values, one per dimension, corresponding to the results of the queries on the respective ranges. As an example, here's a 4D domain defined in terms of the ranges above (TIO):
const D = {1..10, 1..10 by -1, 1..10 by 2, 1..10 by -2};
writeln("low = ", D.low);
writeln("high = ", D.high);
writeln("first = ", D.first);
writeln("last = ", D.last);

What would be the fastest algorithm to randomly select N items from a list based on a weight distribution?

I have a large list of items, each item has a weight.
I'd like to select N items randomly without replacement, while the items with more weight are more probable to be selected.
I'm looking for the best-performing approach. Performance is paramount. Any ideas?
If you want to sample items without replacement, you have lots of options.
Use a weighted-choice-with-replacement algorithm to choose random indices. There are many algorithms like this. One of them is WeightedChoice, described later in this answer, and another is rejection sampling, described as follows. Assume that the highest weight is max, there are n weights, and each weight is 0 or greater. To choose an index in [0, n) using rejection sampling:
Choose a uniform random integer i in [0, n).
With probability weights[i]/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max] and if that number is weights[i] or less, return i, or go to step 1 otherwise.)
Each time the weighted choice algorithm chooses an index, set the weight for the chosen index to 0 to keep it from being chosen again (a sketch of this appears after this list). Or...
Assign each index an exponentially distributed random number (with a rate equal to that index's weight), make a list of pairs assigning each number to an index, then sort that list by those numbers. Then take each item from first to last, in ascending order. This sorting can be done on-line using a priority queue data structure (a technique that leads to weighted reservoir sampling). Notice that the naïve way to generate the random number, -ln(1-RNDU01())/weight, where RNDU01() is a uniform random number in [0, 1], is not robust, however ("Index of Non-Uniform Distributions", under "Exponential distribution").
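To make these two options concrete, here is a minimal Python sketch of both (the function names are mine; random.expovariate generates the exponential keys, subject to the robustness caveat just mentioned):

import heapq
import random

def rejection_choice(weights):
    # Rejection sampling: pick a uniform index and accept it with
    # probability weights[i]/max (weights must be >= 0, max > 0).
    wmax = max(weights)
    while True:
        i = random.randrange(len(weights))
        if random.random() * wmax < weights[i]:
            return i

def sample_by_rejection(weights, k):
    # Option 1: zero out each chosen weight so it cannot repeat.
    w = list(weights)
    chosen = []
    for _ in range(k):
        i = rejection_choice(w)
        chosen.append(i)
        w[i] = 0
    return chosen

def sample_by_exponential_race(weights, k):
    # Option 2: give each index an exponentially distributed key with
    # rate equal to its weight; the k smallest keys win (the idea
    # behind weighted reservoir sampling).
    keys = [(random.expovariate(w), i) for i, w in enumerate(weights) if w > 0]
    return [i for _, i in heapq.nsmallest(k, keys)]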
Tim Vieira gives additional options in his blog.
A paper by Bram van de Klundert compares various algorithms.
EDIT (Aug. 19): Note that for these solutions, the weight expresses how likely a given item will appear first in the sample. This weight is not necessarily the chance that a given sample of n items will include that item (that is, an inclusion probability). The methods given above will not necessarily ensure that a given item will appear in a random sample with probability proportional to its weight; for that, see "Algorithms of sampling with equal or unequal probabilities".
Assuming you want to choose items at random with replacement, here is pseudocode implementing this kind of choice. Given a list of weights, it returns a random index (starting at 0), chosen with a probability proportional to its weight. This algorithm is a straightforward way to implement weighted choice. But if it's too slow for you, see my section "Weighted Choice With Replacement" for a survey of other algorithms.
METHOD WChoose(weights, value)
  // Choose the index according to the given value
  lastItem = size(weights) - 1
  runningValue = 0
  for i in 0...size(weights) - 1
    if weights[i] > 0
      newValue = runningValue + weights[i]
      lastItem = i
      // NOTE: Includes start, excludes end
      if value < newValue: break
      runningValue = newValue
    end
  end
  // If we didn't break above, this is a last
  // resort (might happen because of rounding error)
  return lastItem
END METHOD

METHOD WeightedChoice(weights)
  return WChoose(weights, RNDINTEXC(Sum(weights)))
END METHOD
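For reference, here is a direct Python rendering of this pseudocode (my own translation; it draws a uniform float in [0, sum) so it also works with non-integer weights):

import random

def weighted_choice(weights):
    # Linear-scan weighted choice with replacement, following WChoose.
    value = random.uniform(0, sum(weights))
    running = 0.0
    last_item = len(weights) - 1
    for i, w in enumerate(weights):
        if w > 0:
            running += w
            last_item = i
            if value < running:
                return i
    # Last resort in case of rounding error.
    return last_item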
Let A be the item array with x items. The complexity of each method is defined as
< preprocessing_time, querying_time >
If sorting is possible: < O(x lg x), O(n) >
sort A by the weight of the items.
create an array B, for example:
B = [ 0, 0, 0, x/2, x/2, x/2, x/2, x/2 ].
it's easy to see that picking from B makes index x/2 more likely.
if you haven't picked n elements yet, choose a random element e from B.
pick a random element from A within the interval e : x-1.
If iterating through the items is possible: < O(x), O(tn) >
iterate through A and find the average weight w of the elements.
define the maximum number of tries t.
try (at most t times) to pick a random element in A whose weight is bigger than w.
test for some t that gives you good/satisfactory results.
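A minimal Python sketch of this above-average heuristic (the function name is mine):

import random

def pick_above_average(weights, t):
    # Up to t attempts to find an index whose weight beats the average;
    # falls back to the last attempt if none succeeds.
    avg = sum(weights) / len(weights)
    for _ in range(t):
        i = random.randrange(len(weights))
        if weights[i] > avg:
            break
    return i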
If nothing above is possible: < O(1), O(tn) >
define the maximum number of tries t.
if you haven't picked n elements yet, take t random elements in A.
pick the element with the biggest weight.
test for some t that gives you good/satisfactory results.
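Similarly, a sketch of this best-of-t method (again, my own illustration):

import random

def pick_best_of_t(weights, t):
    # Take t random indices and keep the one with the largest weight.
    candidates = [random.randrange(len(weights)) for _ in range(t)]
    return max(candidates, key=lambda i: weights[i])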

Stroustrup's C++ Book Challenge, Can someone help me understand this code?

I saw this code in Stroustrup's book, but I can't understand how it works.
I just can't get how it produces the sequence 0, 1, 4, 9, ...
#include <iostream>
using namespace std;

int archaic_square(int v) {
    int total = 0;
    for (int i = 0; i < v; ++i) {
        total += v;
    }
    return total;
}

int main() {
    for (int i = 0; i < 100; ++i) {
        cout << i << '\t' << archaic_square(i) << '\n';
    }
    return 0;
}
The code in archaic_square is starting total off as zero, then adding v to it v times (in the loop).
By definition, it will then end up as:
0 + v + v + … + v
    \___________/
        v times
which is 0 + v * v, or v².
In more explicit detail:
adding zero to zero, zero times, gives you zero (0);
adding one to zero, once, gives you one (0, 1);
adding two to zero, two times, gives you four (0, 2, 4);
adding three to zero, three times, gives you nine (0, 3, 6, 9);
adding four to zero, four times, gives you sixteen (0, 4, 8, 12, 16);
and so on, ad infinitum.
Remember from arithmetic that multiplication is just repeated addition (in fact, that's its definition)? That's all that's happening here.
Since v is getting added v times, it is the same as v * v, or v squared.
That code calculates squares by the method of differences. It's an alternative way of evaluating functions, and has some benefits over the usual plug-in-the-values approach. It was used in Babbage's difference engine, which was designed in the early 1800s to calculate mathematical tables for logarithms, trig functions, etc. to 40 (!) digits.
The underlying idea is that you begin with a list of values that you know how to evaluate. You subtract each value from its neighboring value, giving you a list of first differences. Then you subtract each difference value from its neighboring difference value, giving you a list of second differences. Continue until you reach a level where all the difference values are equal. For a second-order polynomial (such as x^2) the second differences will all be equal. For a third-order polynomial, the third differences will all be equal. And so on.
So for calculating squares, you end up with this:
Value   First Difference   Second Difference
  0
  1            1
  4            3                  2
  9            5                  2
 16            7                  2
 25            9                  2
Now you can reverse the process. Start with a result of 0. Add the first difference (1), giving the next result (1). Then increase the first difference by the second difference (2), giving the next first difference (3); add that to the previous result (1), giving the next result (4). Then increase the new first difference (3) by the second difference (2), giving the next first difference (5); add that to the previous result (4), giving the next result (9). Continue until done.
So with only the first value (0), the first difference (1), and the (constant) second difference (2), we can generate as long a list of squares as we would like.
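A tiny Python sketch of that reverse process (my own illustration):

def squares_by_differences(count):
    # Start with value 0 and first difference 1; the second difference
    # is the constant 2, so each step needs only additions.
    value, first_diff = 0, 1
    squares = []
    for _ in range(count):
        squares.append(value)
        value += first_diff
        first_diff += 2   # apply the constant second difference
    return squares

# squares_by_differences(6) -> [0, 1, 4, 9, 16, 25]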
When you need a list of results, calculating them one after another like this replaces multiplication with addition, which, back in the olden days, was much faster. Further, if a computer (back when a computer was a person who did tedious calculations to produce mathematical tables) made a mistake, all the results after that mistake would be wrong too, so the mathematician in charge of the project didn't have to provide for checking every result; spot checking was sufficient.
Calculating trig functions, of course, is a bit more tricky, because they aren't defined by polynomials. But over a small enough region they can be approximated by a polynomial.
Babbage's engine would have calculated 40 digit values, with up to 7 levels of differences. It mechanically went through the sequence of steps I mentioned above, grinding out results at a rate of one every few seconds. Babbage didn't actually build the full difference engine; he got an inspiration for a much more powerful "Analytical engine" which he also never built. It would have been a precursor to modern digital computers, with 1000 40-digit storage units, an arithmetic processor, and punched cards to control the sequence of operations.

DEAP toolbox: to consider different types and ranges of genes in mutation and crossover operators

I am working on a genetic algorithm implementation and I'm using the DEAP toolbox.
I've written code that initializes chromosomes whose first gene is a float in the range [0.01, 2048], whose second gene is again a float in the range [0.0001, 10], and whose last three genes are boolean. This is my code:
toolbox.register("attr_flt1", random.uniform, 0.01, 2048)
toolbox.register("attr_flt2", random.uniform, 0.0001, 10)
toolbox.register("attr_bool", random.randint, 0, 1)
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.attr_flt1, toolbox.attr_flt2, toolbox.attr_bool,
                  toolbox.attr_bool, toolbox.attr_bool),
                 n=1)
Here is a sample of the created population:
[1817.2852738610263, 6.184224906600851, 0, 0, 1], [1145.7253307024512, 8.618185266721435, 1, 0, 1], ...
Now, I want to do mutation and crossover on my chromosomes by considering differences in the genes types and ranges.
Currently I get an error because, after applying the crossover and mutation operators, a 0 value is produced for the first gene of a chromosome, which breaks my evaluation function.
Can anyone help me write selection, mutation and crossover using the DEAP toolbox so that the new population stays within the ranges defined at initialization?
If you use the mutation operator mutPolynomialBounded (documented here), then you can specify the interval for each gene.
With the bounds you indicated, perhaps using something such as
eta = 0.5    # degree of resemblance between parent and mutated individual
indpb = 0.1  # independent probability of each gene being mutated
low = [0.01, 0.0001, 0, 0, 0]  # lower bound for each gene
up = [2048, 10, 1, 1, 1]       # upper bound for each gene
toolbox.register('mutate', tools.mutPolynomialBounded, eta=eta, low=low, up=up, indpb=indpb)
as a mutation function will solve your error.
This way, first gene is in the interval [0.01, 2048], the second gene is in the interval [0.0001, 10] and the last three genes are in the interval [0, 1].
If you also want the last three genes to be either 0 or 1 (but not a float in between), then you might have to implement your own mutation function. For instance, the following function will select random values for each gene using your requirements
def mutRandom(individual, indpb):
    if random.random() < indpb:
        individual[0] = toolbox.attr_flt1()
        individual[1] = toolbox.attr_flt2()
        for i in range(2, 5):
            individual[i] = toolbox.attr_bool()
    return individual,
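You would then register it much like the built-in operator (the indpb value here is just illustrative):

toolbox.register('mutate', mutRandom, indpb=0.1)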

Fastest way to check list of integers against a list of Ranges in scala?

I have a list of integers and I need to find out the range each falls in. I have a list of ranges which might be of size 2 up to 15 at the maximum. Currently, for every integer, I check through the list of ranges and find its location. But this takes a lot of time, as the list of integers I need to check includes a few thousand entries.
//list of integer tuples
val numList : List[(Int,Int)] = List((1,4),(6,20),(8,15),(9,15),(23,27),(21,25))
//list of ranges
val rangesList:List[(Int,Int)] = List((1,5),(5,10),(15,30))
import scala.util.control.Breaks

def checkRegions(numPos: (Int, Int), posList: List[(Int, Int)]) {
  val loop = new Breaks()
  loop.breakable {
    for (va <- 0 until posList.length) {
      if (numPos._1 >= posList(va)._1 && numPos._2 <= posList(va)._2) {
        // I save "va" here
        loop.break()
      }
    }
  }
}
Currently, for every entry in numList, I go through rangesList to find its range and save that range's location. Is there any faster/better approach to this issue?
Update: It's actually a list of tuples that is compared against a list of ranges.
First of all, using apply on a List is problematic, since it takes linear run time.
List(1,2,3)(2) has to traverse the whole list to finally get the last element at index 2.
If you want your code to be efficient, you should either find a way around it or choose another data structure. Data structures like IndexedSeq have constant time indexing.
You should also avoid Breaks, which, to my knowledge, works via exceptions; that is not good practice. There are always ways around it.
You can do something like this:
val numList : List[(Int,Int)] = List((1,4),(6,20),(8,15),(9,15),(23,27),(21,25))
val rangeList:List[(Int,Int)] = List((1,5),(5,10),(15,30))
def getRegions(numList: List[(Int, Int)], rangeList: List[(Int, Int)]) = {
  val indexedRangeList = rangeList.zipWithIndex
  numList.map { case (a, b) =>
    indexedRangeList
      .find { case ((min, max), index) => a >= min && b <= max }
      .fold(-1)(_._2)
  }
}
And use it like this:
getRegions(numList, rangeList)
//yields List(0, -1, -1, -1, 2, 2)
I chose to yield -1 when no range matches. The key thing is that you zip the ranges with the index beforehand. Therefore we know at each range, what index this range has and are never using apply.
If you use this method to get the indices to again access the ranges in rangeList via apply, you should consider changing to IndexedSeq.
The apply will of course only be costly when the number of ranges gets big. If, as you mentioned, it is only 2-15, then it is no problem. I just want to give you the general idea.
One approach includes the use of parallel collections with par, and also indexWhere which delivers the index of the first item in a collection that holds a condition.
For readability consider this predicate for checking interval inclusion,
def isIn( n: (Int,Int), r: (Int,Int) ) = (r._1 <= n._1 && n._2 <= r._2)
Thus,
val indexes = numList.par.map {n => rangesList.indexWhere(r => isIn(n,r))}
indexes: ParVector(0, -1, -1, -1, 2, 2)
delivers, for each number, the index in the ranges collection where it is included. Value -1 indicates the condition did not hold.
For associating numbers with range indexes, consider this,
numList zip indexes
res: List(((1,4),0), ((6,20),-1), ((8,15),-1),
          ((9,15),-1), ((23,27),2), ((21,25),2))
Parallel collections may prove more efficient than their non-parallel counterparts for performing computations on a very large number of items.

Distribute a set of elements between N buckets based on weights

Given N buckets and some elements E1 (W1), E2 (W2), .... I want to distribute the N buckets between the elements Ei based on their weights Wi.
For example, N = 20, W1 = 5, W2 = 5, W3 = 10, so
E1_buckets = 20*(5/20) = 5
E2_buckets = 20*(5/20) = 5
E3_buckets = 20*(10/20) = 10
The individual bucket counts must sum to N (5+5+10=20).
I was thinking of doing something like this
bucket[i] = round(N*(W[i]/TOT_WGT)) where W[i] = element weight, and TOT_WGT = sum of weights W[i]
It seems however that I could run into error from imprecision in the representation of floating point numbers. Is it possible with floating point arithmetic to guarantee that the sum of buckets would always sum to N?
An alternative way would be to always take the floor and assign the excess to some random element
bucket[i] = floor(N*(W[i]/TOT_WGT))
bucket[k] += (N-sum_of_buckets)
While it doesn't guarantee perfect weighting I do get the buckets summing to N.
Any thoughts, am I missing something and there is a possibly simple way to do this?
Instead of calculating the number of buckets in element i, you could calculate the number of buckets in the first i elements, and then later subtract the number of buckets in the first i-1 elements to get the number of buckets in element i.
In this case, the number of buckets in the first i elements could be round(N * SUM_k_up_to_i(W[k]) / TOT_WGT). The number of buckets across all elements is then round(N * TOT_WGT/TOT_WGT), which is very likely to be exactly N, and which you could in any case replace with N; then you are guaranteed that the sum of the buckets will be N.
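A minimal Python sketch of this cumulative-rounding scheme (my own illustration):

def distribute(N, weights):
    # Bucket counts are differences of rounded cumulative shares,
    # so they telescope and always sum to the rounded total, i.e. N.
    total = sum(weights)
    counts, prev, cum = [], 0, 0
    for w in weights:
        cum += w
        cur = round(N * cum / total)
        counts.append(cur - prev)
        prev = cur
    return counts

# distribute(20, [5, 5, 10]) -> [5, 5, 10]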
The best way to do it is not to represent a bin using its width. You're trying to represent a continuous interval, and doing that by taking the union of subintervals aligned ~just right~ is tricky to say the least.
Instead, calculate the positions of the interior dividers ({5, 10} in your example), then represent your buckets as pairs of endpoints (the endpoints being {0, 5, 10, 20} in the example). Whenever you need the width of a bin, return the difference between that bin's two endpoints. Yeah, the widths of the bins might be off from the weights by a bit, but if your application is sensitive to that error you should really be using an exact numeric type instead.
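To illustrate the endpoint representation (a sketch under the same assumptions; dividers are kept at their exact cumulative positions and widths are derived on demand):

def bucket_endpoints(N, weights):
    # Dividers at cumulative fractions of N; buckets are pairs of
    # adjacent endpoints, e.g. [0, 5, 10, 20] for the example above.
    total = sum(weights)
    cum = 0
    pts = [0.0]
    for w in weights:
        cum += w
        pts.append(N * cum / total)
    return pts

def bucket_width(pts, i):
    # The width of bin i is the difference of its two endpoints.
    return pts[i + 1] - pts[i]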