I am working on an optimization problem. I have a set of candidate ambulance locations, numbered 1 to 39.
There are 39 numbers [ambulance locations] to choose from (1, 2, 3, ..., 39), and we choose 3 of them since I have 3 ambulances.
I can only place my ambulances at three of the 39 locations (restriction). Assume that I want to put my ambulances at positions 5, 19, and 31 -- Chromosome 1 = [000010000000000000100000000000100000000]. In this representation, I am turning on the 5th, 19th, and 31st bits.
Is it possible to flip a bit close to the original solution? For example, keep 2 bits on in their original positions and randomly move the 3rd bit to a position close to the other 2 bits. It is important for me to keep exactly 3 bits on among the 39 bits. I want a controlled mutation with the aim of producing a small change.
My goal is to make small changes, since each bit represents a location; the purpose of the mutation is to make a small change and then evaluate the result. Therefore, the code should do something like this: given Chromosome 1 = (111000000000000000000000000000000000000), I want something like (011010000000000000000000000000000000000), or (011001000000000000000000000000000000000), or (010110000000000000000000000000000000000), or (101010000000000000000000000000000000000), or (001011000000000000000000000000000000000), etc.
To achieve this mutation, what would be a good way to randomly move the current positions to other positions while keeping everything within the 1-39 location range (restriction)?
You could use numpy and do something like:
import numpy

s = "111" + "0" * 36  # 39 candidate locations, 3 ambulances switched on

def mutate(s):
    arr = numpy.array(list(s))
    mask = arr == "1"
    indices_of_ones = numpy.argwhere(mask).flatten()
    # pick one of the "on" bits to move
    pick_one_1_index = numpy.random.choice(indices_of_ones)
    # candidate destinations are the positions that are currently "off"
    potential_swaps = numpy.argwhere(~mask).flatten()
    distances = numpy.abs(pick_one_1_index - potential_swaps)
    probabilities = 1 / distances  # higher probability the smaller the distance from the original position
    # probabilities = 1 / distances**2  # penalizes distance even more strongly
    pick_one_0_index = numpy.random.choice(potential_swaps, p=probabilities / probabilities.sum())
    arr[pick_one_1_index] = "0"
    arr[pick_one_0_index] = "1"
    return "".join(arr)
There is likely a more optimal solution.
Alternatively, you can raise the distances to a power to penalize distance more heavily.
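For example, a small helper along those lines (a sketch; the name and the default exponent are just illustrative) that turns the distances into normalized selection probabilities:
def distance_probabilities(distances, power=2):
    # larger powers concentrate the probability mass on the closest "off" positions
    weights = 1.0 / (distances ** power)
    return weights / weights.sum()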
If you want to test how different powers affect the size of the mutations, you could use something like:
def score_solution(s1, s2):
    # After one mutation the symmetric difference of the two "on"-index sets
    # has exactly two elements: the old and the new position of the moved bit.
    ix1 = set(i for i, v in enumerate(s1) if v == "1")
    ix2 = set(i for i, v in enumerate(s2) if v == "1")
    a, b = ix1 ^ ix2
    return numpy.abs(a - b)

def get_solution_score_quantiles(sample_size=100, quantiles=[0.25, 0.5, 0.75]):
    scores = []
    for _ in range(sample_size):
        s1 = mutate(s)
        scores.append(score_solution(s, s1))
    return numpy.quantile(scores, quantiles)

print(get_solution_score_quantiles(50))
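As a quick sanity check (just an illustration), a mutated chromosome should always keep exactly three bits switched on:
child = mutate(s)
print(child, "ones:", child.count("1"))  # should always report 3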
I'm discovering Halide and got some success with a pipeline doing various
transformations. Most of these are based on the examples within the sources (color-transformations, various filters, hist-eq).
My next step needs to process the image in blocks, and in a more general form, partially overlapping blocks.
Examples
Input:
[ 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32]
Non-overlapping blocks:
Size: 2x4
[ 1, 2, 3, 4,
9, 10, 11, 12]
[ 5, 6, 7, 8,
13, 14, 15, 16]
[ 17, 18, 19, 20,
25, 26, 27, 28]
[ 21, 22, 23, 24,
29, 30, 31, 32]
Overlapping blocks:
Size: 2x4 with 50% overlap (both axes)
[ 1, 2, 3, 4,
9, 10, 11, 12]
[ 3, 4, 5, 6,
11, 12, 13, 14]
[ 5, 6, 7, 8,
13, 14, 15, 16]
-
[ 9, 10, 11, 12,
17, 18, 19, 20]
[11, 12, 13, 14,
19, 20, 21, 22]
...
I suspect there should be a nice way to express these, as those are also quite common
in many algorithms (e.g. macroblocks).
What I checked out
I tried to gather ideas from the tutorial and example apps and found the following,
which seem somewhat connected to what I want to implement:
Halide tutorial lesson 6: Realizing Funcs over arbitrary domains
// We start by creating an image that represents that rectangle
Image<int> shifted(5, 7); // In the constructor we tell it the size
shifted.set_min(100, 50); // Then we tell it the top-left corner
The problem I have: how do I generalize this to multiple shifted domains without looping?
Halide tutorial lesson 9: Multi-pass Funcs, update definitions, and reductions
Here RDom is introduced, which looks like a nice way to create a block view.
Most examples using RDom seem to be sliding-window-like approaches where there are no jumps.
Target
So in general I'm asking how to implement a block-based view which can then be processed by
other steps.
It would be nice if the approach were general enough to realize both overlapping and non-overlapping blocks.
Somehow generating the top-left indices first?
In my case, the image dimensions are known at compile time, which simplifies this.
But I would still like some compact form that is nice to work with from Halide's perspective (no hand-coded stuff like those examples with small filter boxes).
The approach used might depend on the output per block, which is a scalar in my case.
Maybe someone can give me some ideas and/or some examples (which would be very helpful).
I'm sorry for not providing code, as I don't think I could produce anything helpful.
Edit: Solution
After dsharlet's answer and some tiny debugging/discussion here, the following very simplified self-contained code works (assuming a 1-channel 64x128 input like the one I created).
#include "Halide.h"
#include "Halide/tools/halide_image_io.h"
#include <iostream>
int main(int argc, char **argv) {
Halide::Buffer<uint8_t> input = Halide::Tools::load_image("TestImages/block_example.png");
// This is a simple example assuming an input of 64x128
std::cout << "dim 0: " << input.width() << std::endl;
std::cout << "dim 1: " << input.height() << std::endl;
// The "outer" (block) and "inner" (pixel) indices that describe a pixel in a tile.
Halide::Var xo, yo, xi, yi, x, y;
// The distance between the start of each tile in the input.
int tile_stride_x = 32;
int tile_stride_y = 64;
int tile_size_x = 32;
int tile_size_y = 64;
Halide::Func tiled_f;
tiled_f(xi, yi, xo, yo) = input(xo * tile_stride_x + xi, yo * tile_stride_y + yi);
Halide::RDom tile_dom(0, tile_size_x, 0, tile_size_y);
Halide::Func tile_means;
tile_means(xo, yo) = sum(Halide::cast<uint32_t>(tiled_f(tile_dom.x, tile_dom.y, xo, yo))) / (tile_size_x * tile_size_y);
Halide::Func output;
output(xo, yo) = Halide::cast<uint8_t>(tile_means(xo, yo));
// A 64x128 input with non-overlapping 32x64 tiles yields a 2x2 grid of tile means.
Halide::Buffer<uint8_t> output_(2, 2);
output.realize(output_);
Halide::Tools::save_image(output_, "block_based_stuff.png");
}
Here's an example that breaks a Func into blocks of arbitrary stride and size:
Func f = ... // The thing being blocked
// The "outer" (block) and "inner" (pixel) indices that describe a pixel in a tile.
Var xo, yo, xi, yi;
// The distance between the start of each tile in the input.
int tile_stride_x, tile_stride_y;
Func tiled_f;
tiled_f(xi, yi, xo, yo) = f(xo * tile_stride_x + xi, yo * tile_stride_y + yi);
Func tiled_output;
tiled_output(xi, yi, xo, yo) = ... // Your tiled processing here
To compute some reduction (like statistics) on each block, you can do the following:
RDom tile_dom(0, tile_size_x, 0, tile_size_y);
Func tile_means;
tile_means(xo, yo) = sum(tiled_output(tile_dom.x, tile_dom.y, xo, yo)) / (tile_size_x * tile_size_y);
To flatten the tiles back into a result is a bit tricky. It probably depends on your method of combining the results in overlapped areas. If you want to add up the overlapping tiles, the simplest way is probably to use an RDom:
RDom tiles_dom(
0, tile_size_x,
0, tile_size_y,
min_tile_xo, extent_tile_xo,
min_tile_yo, extent_tile_yo);
Var x, y;
Func output;
Expr output_x = tiles_dom[2] * tile_stride_x + tiles_dom[0];
Expr output_y = tiles_dom[3] * tile_stride_y + tiles_dom[1];
output(x, y) = 0;
output(output_x, output_y) += tiled_output(tiles_dom[0], tiles_dom[1], tiles_dom[2], tiles_dom[3]);
Note that in the above two blocks of code, tile_stride_x and tile_size_x are independent parameters, allowing for any tile size and overlap.
In both of your examples, tile_size_x = 4, and tile_size_y = 2. To get non-overlapping tiles, set the tile strides equal to the tile size. To get 50% overlapping tiles, set tile_stride_x = 2, and tile_stride_y = 1.
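For instance, for the 50%-overlap example above, a plausible parameter choice would be (just restating the sentence above as code):
// blocks of 2 rows x 4 columns, advancing by half a block along each axis
int tile_size_x = 4, tile_size_y = 2;
int tile_stride_x = 2, tile_stride_y = 1;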
A useful schedule for an algorithm like this is:
// Compute tiles as needed by the output.
tiled_output.compute_at(output, tiles_dom[2]);
// or
tiled_output.compute_at(tile_means, xo);
There are other options, like using a pure func (no update/RDom) that uses the mod operator to figure out tile inner and outer indices. However, this approach can be difficult to schedule efficiently with overlapping tiles (depending on the processing you do at each tile). I use the RDom approach when this problem comes up.
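For completeness, here is a rough sketch of that pure-Func variant (not from the original answer). It reuses f, x, y, and the tile sizes/strides from above, and assumes the flattened view lays the tiles out back to back along each axis; the underscored names avoid clashing with the Vars already declared:
// Sketch only: recover outer (block) and inner (pixel) indices with integer div/mod.
Func tiled_view;
Expr xo_ = x / tile_size_x, xi_ = x % tile_size_x;
Expr yo_ = y / tile_size_y, yi_ = y % tile_size_y;
tiled_view(x, y) = f(xo_ * tile_stride_x + xi_, yo_ * tile_stride_y + yi_);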
Note that with the RDom approach, you have to supply the bounds of the tile indices you want computed (min_tile_xo, extent_tile_xo, ...), which can be tricky for overlapped tiles.
I would like to perform the following operation as quickly as possible
x / LSB(x)
where x is an integral value unknown at compile time and LSB(x) = x & -x.
(Alternatively, the operation is equivalent to an exact division by the highest power of 2 that divides x.) I am looking for a reasonably portable solution (without compiler intrinsics/builtins like GCC's __builtin_clz or the like).
My concern is that the following simple implementation
x / (x & -x)
would still result in an expensive division, as the compiler might fail to realize that the division is in fact equivalent to a right shift by the number of trailing zeros in the divisor.
If my concerns are reasonable, what would be a more efficient way to implement it?
I would appreciate a solution that is easily extendible to integral types of size 32 bits, 64 bits, 128 bits, ...
How about
x >>= ffs(x)-1;
The ffs function conforms to 4.3BSD, POSIX.1-2001.
It won't work if x is 0.
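Put together, a minimal sketch of that suggestion (ffs is declared in <strings.h> on POSIX systems; the function name is just illustrative):
#include <strings.h>  /* ffs(): 1-based index of the least significant set bit */

static unsigned divide_by_lsb(unsigned x)
{
    /* Sketch only: assumes x != 0 (ffs(0) returns 0, making the shift count -1)
       and the usual two's-complement conversion for values with the top bit set. */
    return x >> (ffs((int) x) - 1);
}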
If you don't want to rely on a CTZ (count trailing zeros) hardware instruction, you can count the trailing zeros as described in this answer. It's very fast, with a look-up and a multiplication by a magic number. I'll re-post the code here:
unsigned x; // input
unsigned c; // output: number of trailing zeros in x (the bit index of its lowest set bit)
static const unsigned MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
c = MultiplyDeBruijnBitPosition[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
Once you have counted the trailing zeros, you no longer need to use a division instruction. Instead, you can just shift the value right by c. That is (eliminating an unneeded temporary value), the code becomes this:
static const unsigned MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
x >>= MultiplyDeBruijnBitPosition[((unsigned)((x & -x) * 0x077CB531U)) >> 27]; // x /= LSB(x)
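The table and multiplier above are specific to 32-bit values. As a width-agnostic (though not constant-time) fallback, a plain loop also avoids the division and works unchanged for 64-bit or wider unsigned types; a sketch:
#include <stdint.h>

/* Sketch only: assumes x != 0, otherwise the loop never terminates. */
static uint64_t divide_by_lsb64(uint64_t x)
{
    while ((x & 1u) == 0)
        x >>= 1;
    return x;
}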
I'm trying to implement simplified DES in Python for learning purposes, but I am having trouble figuring out how to do the permutations based on a "schedule". Essentially, I have a tuple with the appropriate permutation, and I need to shift the bits to the correct locations.
For example, using a key:
K = 00010011 00110100 01010111 01111001 10011011 10111100 11011111 11110001
This would move the 57th bit to the first bit position, the 49th bit to the second bit position, etc.
K+ = 1111000 0110011 0010101 0101111 0101010 1011001 1001111 0001111
Current code:
def keyGen(key):
    PC1table = (57, 49, 41, 33, 25, 17, 9,
                1, 58, 50, 42, 34, 26, 18,
                10, 2, 59, 51, 43, 35, 27,
                19, 11, 3, 60, 52, 44, 36,
                63, 55, 47, 39, 31, 23, 15,
                7, 62, 54, 46, 38, 30, 22,
                14, 6, 61, 53, 45, 37, 29,
                21, 13, 5, 28, 20, 12, 4)
    keyBinary = bin(int(key, 16))[2:].zfill(64)
    print(keyBinary)
    permute(PC1table, keyBinary)

def permute(permutation, permuteInput):
    elements = list(enumerate(permutation))
    for bit in permuteInput:
        ***magic bitshifting goes here***

keyGen("133457799BBCDFF1")
The logic I thought would work was to enumerate the tuple of permutations and, for each bit of my old key, look in the enumeration to find the index corresponding to the bit, then bit-shift the appropriate number of times, but I just can't figure out how to go about doing this. It may be that I am approaching the problem from the wrong angle, but any guidance would be greatly appreciated!
OK, I ended up figuring out a way to make this work, although it probably isn't the most efficient way...
Prior to calling the function, turn the binary string into a list:
keyBinary = bin(int(key, 16))[2:].zfill(64)
keyBinary = [int(i) for i in keyBinary]
Kplus = permute(PC1table, keyBinary)

def permute(mapping, permuteInput):
    permuteOutput = []
    for i in range(len(mapping)):
        permuteOutput.append(permuteInput[mapping[i % 56] - 1])
    return permuteOutput
if anyone has a better way of tackling this, I'd love to see your solutions!
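For what it's worth, a more compact equivalent (a sketch of the same idea, nothing more): since DES permutation tables are 1-based, the whole permutation is a single list comprehension.
def permute(mapping, bits):
    # DES tables are 1-based, hence the "- 1" when indexing the key bits
    return [bits[i - 1] for i in mapping]

# e.g. Kplus = permute(PC1table, keyBinary)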
I've written a similar question, which was closed, so I'm asking not for code but for an efficiency tip. I haven't coded it yet; if I can't find any good hint here, I'll go and code it the straightforward way. My question:
Suppose you have a function listNums that takes a as the lower bound and b as the upper bound.
For example, a=120 and b=400.
I want to print the numbers between these bounds with one rule: the digit permutations of 120 are 102, 201, 210, etc., and since I've already got 120, I would like to skip printing 201 and 210.
Reason: the upper limit can go up to 10^20, and reducing the number of permutations would help the running time.
Again just asking for efficiency tips.
I am not sure how you are handling 0s (e.g., after outputting 1, do you skip 10, 100, etc., since technically 1 = 01 = 001...).
The trick is to select only numbers whose digits are in non-decreasing order (from left to right).
You can do it recursively: at every recursion step, add a digit and make sure it is equal to or higher than the previously added one.
EDIT: If the generated number is less than the lower limit, permute it so that it becomes greater than or equal to the lower limit. If A1A2A3...Ak is your number and it is below the limit, then incrementally check whether any of A2A1A3...Ak, A3A1A2...Ak, ..., AkA1A2...Ak-1 is within the limit. If the need arises, repeat this step while keeping Ak as the first digit and finding a suitable arrangement of A1A2...Ak-1.
E.g., assume we are selecting 3 digits and the lower limit is 99. If the combination is 012, then the lowest permutation that is higher than 99 is 102.
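Here is a rough sketch of that recursive idea (the function name is illustrative; it yields one non-decreasing-digit representative per digit combination up to the upper bound, and leaves the lower-bound adjustment described above to a separate step):
def gen_non_decreasing(limit, prefix=0, last_digit=1):
    # Yield numbers <= limit whose digits never decrease from left to right.
    for d in range(last_digit, 10):
        n = prefix * 10 + d
        if n > limit:
            break  # appending a larger digit only makes n bigger
        yield n
        yield from gen_non_decreasing(limit, n, d)

print(sorted(gen_non_decreasing(400)))  # one representative per digit combination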
When the lower bound is 0, an answer is given by the set of numbers with non-decreasing digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 22, 23, 24, 25, 26, 27, 28, 29, 33, 34, 35, 36, 37, 38, 39, 44, 45, 46, 47, 48, 49, 55, 56, 57, 58, 59, 66, 67, 68, 69, 77, 78, 79, 88, 89, 99, 111, 112...) that fall in the requested range.
This sequence is easily formed by incrementing an integer and, when the increment produces a carry (trailing zeros), replicating the last nonzero digit into those zeros instead. Example: 73 is followed by 73+1 = 74 (no carry); 79 is followed by 79+1 = 80 (carry), so 88 instead; 22356999 is followed by 22356999+1 = 22357000, hence 22357777.
# Python code
A = 0  # CAUTION: this version only works for A == 0 !
B = 1000
N = A
while N < B:
    # Detect zeroes at the end (left behind by a carry in the previous increment)
    S = str(N)
    P = S.find('0')
    if P > 0:
        # Replicate the last nonzero digit into the trailing zeroes
        S = S[:P] + ((len(S) - P) * S[P - 1])
        N = int(S)
    # Next candidate
    print(N)
    N += 1
Dealing with a nonzero lower bound is a lot more tricky.