C++ Inverted Weighted Shuffle/Random - c++

I have a list of weighted objects, e.g.:
A->1 B->1 C->3 D->2 E->3
Is there an efficient algorithm in C++ to pick random elements according to their weight, with lower weights being more likely?
For example, the probability that element A or B (lower weight) is picked should be higher (30% each) than the probability that the algorithm selects C or E (10% each) or D (20%).

As @Dukeling said, we need more info, such as how you interpret and use the selection chance.
At least in the field of evolutionary algorithms, fitness scaling (or selection chance scaling) is a sizable topic.
Suppose you start with badness score
B[i] = how badly you don't want to select the i-th item
And the objective is to calculate a fitness/selection score S[i], which I assume you will use in roulette wheel fashion.
As you say, one obvious way is to use multiplicative inverse:
S[i] = 1 / B[i]
However, there might be a little problem with that.
The same amount of change in B[i] has much more impact when B[i] is low than when B[i] is already high.
Ask yourself this:
Say
B[1] = 1 -> S[1] = 1
B[2] = 2 -> S[2] = 0.5
So item 1 is twice as likely to be selected as item 2.
But with the same amount of change
B[3] = 1000 -> S[3] = 0.001
B[4] = 1001 -> S[4] = 0.000999001
Item 3 is only 1.001 times as likely to be selected compared to item 4
I'll just throw in one possible alternative for now, an additive inverse scheme:
S[i] = max(B) - B[i] + 1
The + 1 part helps so no item has zero chance to be selected.
This ends the part of calculating selection score.
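As a minimal sketch of the additive scheme, assuming integer badness values (the function and variable names here are illustrative, not from the original post):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Additive-inverse scaling: S[i] = max(B) - B[i] + 1.
// `badness` holds the B[i] values; the name is an assumption for this sketch.
std::vector<long long> additive_inverse_scores(const std::vector<long long>& badness) {
    assert(!badness.empty());
    const long long worst = *std::max_element(badness.begin(), badness.end());
    std::vector<long long> scores;
    scores.reserve(badness.size());
    for (long long b : badness) {
        scores.push_back(worst - b + 1);  // the +1 keeps the worst item selectable
    }
    return scores;
}
```

For the weights in the question (1, 1, 3, 2, 3) this gives scores 3, 3, 1, 2, 1, which out of a total of 10 is the 30/30/10/20/10 split the question asks for.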
Next, let's clear up how to use the selection score in roulette wheel fashion.
Assume we decided to use the additive inverse scheme.
B[1] = 1 -> S[1] = 1001
B[2] = 2 -> S[2] = 1000
B[3] = 1000 -> S[3] = 2
B[4] = 1001 -> S[4] = 1
Then imagine that each point of score corresponds to a lottery ticket.
Let's assign the tickets running IDs.
| Item | Score (= number of tickets) | Ticket IDs | Win chance |
|---|---|---|---|
| 1 | 1001 | 0 to 1000 | 1001/2004 ~ 0.499500998 |
| 2 | 1000 | 1001 to 2000 | 1000/2004 ~ 0.499001996 |
| 3 | 2 | 2001 to 2002 | 2/2004 ~ 0.000998004 |
| 4 | 1 | 2003 to 2003 | 1/2004 ~ 0.000499002 |
There are 2004 tickets in total.
To do a selection, pick the winning ticket ID at random, i.e. uniformly from the range [0,2004).
Binary search can be used to quickly look up which item owns the winning ticket, as you have already seen in this question. What needs to be searched are the boundary values of the ticket IDs (1001, 2001, 2003), not the scores themselves.
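The ticket lookup could be sketched as follows. This assumes integer scores; the helper names (`ticket_boundaries`, `owner_of_ticket`, `pick_item`) are made up for illustration. The boundaries are stored as running totals, so the last entry doubles as the total number of tickets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Running totals of the scores: entry i is one past the last ticket ID of item i.
std::vector<long long> ticket_boundaries(const std::vector<long long>& scores) {
    std::vector<long long> bounds(scores.size());
    std::partial_sum(scores.begin(), scores.end(), bounds.begin());
    return bounds;
}

// Index of the item that owns `ticket`, where 0 <= ticket < bounds.back().
std::size_t owner_of_ticket(const std::vector<long long>& bounds, long long ticket) {
    assert(!bounds.empty() && ticket >= 0 && ticket < bounds.back());
    // Binary search for the first boundary strictly greater than the ticket ID.
    auto it = std::upper_bound(bounds.begin(), bounds.end(), ticket);
    return static_cast<std::size_t>(it - bounds.begin());
}

// Draw one item; `rng` can be any standard engine such as std::mt19937.
template <class Rng>
std::size_t pick_item(const std::vector<long long>& bounds, Rng& rng) {
    std::uniform_int_distribution<long long> draw(0, bounds.back() - 1);
    return owner_of_ticket(bounds, draw(rng));
}
```

Building the boundaries is linear in the number of items and only has to be done once; each pick afterwards is a logarithmic-time search.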
For comparison, here is the selection chance in case the multiplicative inverse scheme is used.
| Item | Win chance |
|---|---|
| 1 | 1/1.501999001 ~ 0.665779404 |
| 2 | 0.5/1.501999001 ~ 0.332889702 |
| 3 | 0.001/1.501999001 ~ 0.000665779 |
| 4 | 0.000999001/1.501999001 ~ 0.000665114 |
Notice that in the additive inverse scheme, 1 unit of badness consistently corresponds to a difference of about 0.0005 in selection chance,
whereas in the multiplicative inverse scheme, 1 unit of badness results in a varying difference in selection chance.
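If writing the roulette wheel by hand is not required, the standard library's `std::discrete_distribution` can do the weighted draw. The sketch below assumes the additive inverse scheme and integer weights; `inverted_distribution` is a name invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Builds a distribution over item indices in which lower weights are more
// likely, using score = max(weight) - weight + 1 (the additive scheme).
std::discrete_distribution<> inverted_distribution(const std::vector<int>& weights) {
    assert(!weights.empty());
    const int heaviest = *std::max_element(weights.begin(), weights.end());
    std::vector<double> scores;
    scores.reserve(weights.size());
    for (int w : weights) {
        scores.push_back(static_cast<double>(heaviest - w + 1));
    }
    return std::discrete_distribution<>(scores.begin(), scores.end());
}
```

Calling the returned object with an engine, for example `dist(rng)` with a `std::mt19937 rng`, yields an index into the original list. For the question's weights A->1 B->1 C->3 D->2 E->3, the resulting probabilities are 0.3, 0.3, 0.1, 0.2 and 0.1.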

Related

Memory continuity and search tree

I'm trying to build a search tree on top of my results: some kind of k-ary tree with n leaves at the end. I'm looking for a C++ solution and experimenting with std::vector, but I can't get it done because I need contiguous memory. It could be done with nested vectors, but I can't do that.
Let me explain the details by example:
An unsorted result could be
Result R = { 4, 7, 8, 3, 1, 9, 0, 2, 2, 9, 6 }
On top of that I need a tree with nodes, which in my specific problem are centroids. But to keep it simple I will use artificial values here.
I define the search tree dimensions as
Height H = 2
Branch B = 3
The tree at first
4 7 8 3 1 9 0 2 2 9 6
Second step
layer_0 1.6 5 8.2
| | |
+-+-+-+-+-+ +-+ +-+-+-+
| | | | | | | | | | | |
layer_1 3 1 0 2 2 3 4 6 7 8 9 9
Last step
layer_0 1.6 5 8.2
| | |
+---+---+ +-+---+ +---+----+
layer_1 0.8 1.6 2.4 4.2 5 5.8 6.4 8.2 8.4
| | | | | | |
+-+ +-+-+ | | | | +-+
layer_2 1 0 2 2 3 4 6 7 8 9 9
This last tree is not a k-ary tree, as the sizes of the end leaves satisfy 0 <= size <= |R|.
At this moment I'm experimenting with two vectors.
std::vector<size_t> layer_2;
std::vector<float> leafs;
std::size_t width, height;
With the help of width and height it would be possible to navigate through leafs. But I'm asking myself how to elegantly connect leafs and layer_2.
What would a good solution look like?
Note: This solution is continuous in the sense that it uses a contiguous data structure (vector or array) instead of a Node pointer style tree, but has the potential to have unused space within the data structure depending on the application.
A case where this approach wastes a lot of space: the maximum number of branches per node is large, but most nodes actually have far fewer children. This has no effect on how long it takes to find the leaves, though. In fact, it is a trade-off that makes that part quite fast.
Consider a 3-branch tree with 4 levels in contiguous memory:
R,a,b,c,aa,ab,ac,ba,bb,bc,ca,cb,cc,aaa,aab,aac,aba,abb,abc,baa.... where the indices of a node's children range from (parent_index*3)+1 to (parent_index*3)+3 (with the root R at index 0)
The important caveat I alluded to is that every node must always have its three child slots in the vector, array, or whatever. If a node has, say, only 2 children, just fill the extra slot with a null_child value to hold the space (this is where the wasted space comes from).
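The index arithmetic for this layout can be sketched as below, assuming the root sits at index 0 and every node reserves exactly three child slots. The names `kBranches`, `child_index` and `parent_index` are illustrative.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kBranches = 3;  // maximum children per node, as in the example

// Index of the k-th child (k = 0 .. kBranches-1) of the node stored at `parent`.
inline std::size_t child_index(std::size_t parent, std::size_t k) {
    assert(k < kBranches);
    return parent * kBranches + 1 + k;
}

// Index of the parent of any node other than the root.
inline std::size_t parent_index(std::size_t child) {
    assert(child > 0);  // the root (index 0) has no parent
    return (child - 1) / kBranches;
}
```

With the listing above, node `a` is at index 1, so its children `aa`, `ab`, `ac` land at indices 4, 5 and 6, and `aaa` (the first child of `aa`) lands at index 13.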
The advantage is that now, finding all of the leaves is easy.
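std::size_t first_leaf_index = 0;
std::size_t level_size = 1;           // 3^0 nodes on the root level
for (int i = 0; i < (4 - 1); i++) {   // in this example 4 is the depth
    first_leaf_index += level_size;   // adds 3^i; three is the max branches per node
    level_size *= 3;                  // ^ is XOR in C++, so build the power by multiplying
}
At that point just iterate to the end of the data structure.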
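Wrapped into a function, the first-leaf computation might look like this. It is a sketch for a completely padded tree; `first_leaf_index` taking the branch count and depth as parameters is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>

// Number of nodes above the deepest level of a fully padded tree with
// `branches` child slots per node and `depth` levels. With the root at
// index 0, this count is also the index of the first leaf.
inline std::size_t first_leaf_index(std::size_t branches, std::size_t depth) {
    assert(depth >= 1);
    std::size_t index = 0;
    std::size_t level_size = 1;  // branches^0 nodes on the root level
    for (std::size_t level = 0; level + 1 < depth; ++level) {
        index += level_size;
        level_size *= branches;  // integer power by repeated multiplication
    }
    return index;
}
```

For the 3-branch, 4-level example this returns 1 + 3 + 9 = 13, which is the position of `aaa` in the listing.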

Queue with mod operation

I'm studying the fundamentals of data structures (queues). So far I understand how a queue works, but I don't understand what happens when a queue is combined with the mod operator. Several questions are confusing me. How do I answer this question (refer to picture)?
The best method for handling circular queues is to draw them out. Since circles don't post very well with ASCII art, I'll use a linear array.
+---+---+---+---+---+
|   |   |   |   |   |
+---+---+---+---+---+
  0   1   2   3   4
                  ^
                 Rear
The REAR is at index 4.
Let's perform the operation step by step.
First: Add 1 to REAR. This makes REAR point beyond the array:
+---+---+---+---+---+
|   |   |   |   |   |
+---+---+---+---+---+
  0   1   2   3   4   5
                      ^
                     Rear
Applying the modulo operation, %, this will give us the remainder of 5 / 5 which is zero:
+---+---+---+---+---+
|   |   |   |   |   |
+---+---+---+---+---+
  0   1   2   3   4
  ^
 Rear
Thus the modulo operation wraps around the array, like a circle.
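In code, the wrap-around is a single `%` on the index. Here is a minimal sketch of a fixed-capacity circular queue, assuming `int` elements; the class and member names are invented for illustration and do not come from the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed-capacity circular queue of ints with N slots.
template <std::size_t N>
class CircularQueue {
    static_assert(N > 0, "queue needs at least one slot");

public:
    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == N; }

    bool enqueue(int value) {
        if (full()) return false;
        rear_ = (rear_ + 1) % N;  // stepping past the last slot wraps to 0
        data_[rear_] = value;
        ++count_;
        return true;
    }

    bool dequeue(int& out) {
        if (empty()) return false;
        out = data_[front_];
        front_ = (front_ + 1) % N;  // the front index wraps the same way
        --count_;
        return true;
    }

    int peek() const {
        assert(!empty());
        return data_[front_];
    }

    std::size_t rear() const { return rear_; }

private:
    std::array<int, N> data_{};
    std::size_t front_ = 0;
    std::size_t rear_ = N - 1;  // so that the first enqueue lands in slot 0
    std::size_t count_ = 0;
};
```

With five slots, filling the queue leaves the rear at index 4; after one dequeue, the next enqueue computes (4 + 1) % 5 = 0 and reuses slot 0, as in the drawing above.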
The next question is for you to solve. Remember to draw the array or queue. You can use circles (think of a sliced pie or pizza).
Edit 1: Modulo details
The modulo operation will give a value in the range 0..N-1, where N is the divisor.
Given N == 4, here are some results for modulo:
Index result
0 0
1 1
2 2
3 3
4 0 --> The remainder of 4 / 4 == 0.
5 1
6 2
7 3
8 0 --> The remainder of 8 / 4 == 0.
Modulus returns the remainder of dividing the left operand by the right one. For example, 4%2=0 since 4/2=2 with no remainder, while 4%3=1 since 4/3=1 with remainder 1. Since you can never have a remainder as large as the right operand n, you have an effective "range" of answers for any modulus of 0 to (n-1). With that in mind, just plug in the numbers for the variables ((4+1)%5=? and (1+1)%4=?). Usually to find the remainder you would use long division, but one useful thing to remember is that any number divided by itself has a remainder of 0, and any non-negative number divided by a larger number has a remainder equal to itself.

Determinant of a square binary matrix c++ [duplicate]

Can anyone tell me which is the best algorithm to find the value of determinant of a matrix of size N x N?
Here is an extensive discussion.
There are a lot of algorithms.
A simple one is to take the LU decomposition. Then, since
det M = det LU = det L * det U
and both L and U are triangular, the determinant is a product of the diagonal elements of L and U. That is O(n^3). There exist more efficient algorithms.
Row Reduction
The simplest way (and not a bad way, really) to find the determinant of an nxn matrix is by row reduction. By keeping in mind a few simple rules about determinants, we can solve in the form:
det(A) = α * det(R), where R is the row echelon form of the original matrix A, and α is some coefficient.
Finding the determinant of a matrix in row echelon form is really easy; you just find the product of the diagonal. Solving the determinant of the original matrix A then just boils down to calculating α as you find the row echelon form R.
What You Need to Know
What is row echelon form?
See this [link](http://stattrek.com/matrix-algebra/echelon-form.aspx) for a simple definition
**Note:** Not all definitions require 1s for the leading entries, and it is unnecessary for this algorithm.
You Can Find R Using Elementary Row Operations
Swapping rows, adding multiples of another row, etc.
You Derive α from Properties of Row Operations for Determinants
Rule 1: If B is a matrix obtained by multiplying a row of A by some non-zero constant ß, then
det(B) = ß * det(A)
In other words, you can essentially 'factor out' a constant from a row by just pulling it out front of the determinant.
Rule 2: If B is a matrix obtained by swapping two rows of A, then
det(B) = -det(A)
If you swap rows, flip the sign.
Rule 3: If B is a matrix obtained by adding a multiple of one row to another row in A, then
det(B) = det(A)
The determinant doesn't change.
Note that you can find the determinant, in most cases, with only Rule 3 (whenever no zero turns up in a pivot position during the reduction), and in all cases with only Rules 2 and 3. Rule 1 is helpful for humans doing math on paper, trying to avoid fractions.
Example
(I do unnecessary steps to demonstrate each rule more clearly)
  |  2  3  3  1 |
A=|  0  4  3 -3 |
  |  2 -1 -1 -3 |
  |  0 -4 -3  2 |
R2 <-> R3, -α -> α (Rule 2)
  |  2  3  3  1 |
 -|  2 -1 -1 -3 |
  |  0  4  3 -3 |
  |  0 -4 -3  2 |
R2 - R1 -> R2 (Rule 3)
  |  2  3  3  1 |
 -|  0 -4 -4 -4 |
  |  0  4  3 -3 |
  |  0 -4 -3  2 |
R2/(-4) -> R2, -4α -> α (Rule 1)
  |  2  3  3  1 |
 4|  0  1  1  1 |
  |  0  4  3 -3 |
  |  0 -4 -3  2 |
R3 - 4R2 -> R3, R4 + 4R2 -> R4 (Rule 3, applied twice)
  |  2  3  3  1 |
 4|  0  1  1  1 |
  |  0  0 -1 -7 |
  |  0  0  1  6 |
R4 + R3 -> R4 (Rule 3)
  |  2  3  3  1 |
 4|  0  1  1  1 | = 4 ( 2 * 1 * -1 * -1 ) = 8
  |  0  0 -1 -7 |
  |  0  0  0 -1 |
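Since the question asks about C++, here is one way the row-reduction idea could be sketched there. It uses only row swaps (Rule 2) and row additions (Rule 3), works on `double` values, and picks the largest available pivot in each column, which is a common numerical choice rather than something the text above prescribes. The `Matrix` alias and function name are assumptions of this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Determinant by row reduction. The matrix is taken by value because it is
// reduced to upper triangular form in place.
double determinant(Matrix m) {
    const std::size_t n = m.size();
    for (const auto& row : m) {
        assert(row.size() == n);  // must be square
    }
    double det = 1.0;
    for (std::size_t col = 0; col < n; ++col) {
        // Choose the row at or below `col` with the largest entry in this column.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r) {
            if (std::fabs(m[r][col]) > std::fabs(m[pivot][col])) pivot = r;
        }
        if (m[pivot][col] == 0.0) return 0.0;  // no usable pivot: singular
        if (pivot != col) {
            std::swap(m[pivot], m[col]);
            det = -det;  // Rule 2: a row swap flips the sign
        }
        det *= m[col][col];  // accumulate the product of the diagonal
        for (std::size_t r = col + 1; r < n; ++r) {
            const double factor = m[r][col] / m[col][col];
            for (std::size_t c = col; c < n; ++c) {
                m[r][c] -= factor * m[col][c];  // Rule 3: determinant unchanged
            }
        }
    }
    return det;
}
```

Applied to the 4x4 matrix A from the worked example, this should agree with the hand computation (8, up to floating-point rounding). The cost is O(n^3), the same order as the LU approach.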
def echelon_form(A, size):
    # Reduce A (a size x size list of lists) to row echelon form in place.
    for i in range(size - 1):
        for j in range(size - 1, i, -1):
            if A[j][i] == 0:
                continue
            else:
                try:
                    req_ratio = A[j][i] / A[j - 1][i]
                    # A[j] = A[j] - req_ratio*A[j-1]
                except ZeroDivisionError:
                    # A[j], A[j-1] = A[j-1], A[j]
                    # (each such swap flips the sign of the determinant)
                    for x in range(size):
                        temp = A[j][x]
                        A[j][x] = A[j-1][x]
                        A[j-1][x] = temp
                    continue
                for k in range(size):
                    A[j][k] = A[j][k] - req_ratio * A[j - 1][k]
    return A
If you did some initial research, you've probably found that for N>=4 the calculation of a matrix determinant becomes quite complex. Regarding algorithms, I would point you to the Wikipedia article on matrix determinants, specifically the "Algorithmic Implementation" section.
From my own experience, you can easily find an LU or QR decomposition algorithm in existing matrix libraries such as Alglib. The algorithm itself is not quite simple, though.
I am not too familiar with LU factorization, but I know that in order to get either L or U, you need to make the initial matrix triangular (either upper triangular for U or lower triangular for L). However, once you get the matrix in triangular form for some nxn matrix A, and assuming the only operation your code uses is Rb - k*Ra, you can just compute det(A) = Π T(i,i) for i=0 to n-1 (i.e. det(A) = T(0,0) x T(1,1) x ... x T(n-1,n-1)) for the triangular matrix T. Check this link to see what I'm talking about. http://matrix.reshish.com/determinant.php

Combinational Circuit with LED Lighting

Combinational Circuit design question.
     A
    ____
   |    |
 F |    | B
   |    |
    ____
   | G  |
 E |    | C
   |    |
    ____
     D
Suppose this is an LED display. It takes a 4-bit input
(0000)-(1111) and displays its hex digit. For example,
if (1100) comes in, it displays C by turning on AFED and turning off BCG.
If (1010) comes in, it displays A by turning on ABCEFG
and turning off D.
The display uses capital letters only, so there is no visual
difference between 0 and D or between 8 and B.
Develop a truth table and an optimized expression using Karnaugh Maps.
I'm not exactly sure how to begin. For the truth table, would I use (w,x,y,z) as input variables, or just the ABCDEFG variables, since they are the ones turning on and off?
input (1010)-->A--> ABCEFG~D (~ stands for NOT)
input (1011)-->B--> ABCDEFG
input (1100)-->C--> ADEF~B~C~G
So would I do this for all hex digits 0-F, which would give me the canonical minterm form, and then use Karnaugh maps to optimize it? Any help would be appreciated!
1) Map your lights to bits:
ABCDEFG, so truth table will be:
ABCDEFG
input (1010)-->A-->1110111
and so on.
You will have a big table (with 16 rows).
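Filling in the output columns by hand is error-prone, so a small helper can turn a list of lit segments into one row of the table. This is only a sketch; `segment_row` is a hypothetical name, and the segment lists for digits not spelled out in the question still have to be supplied by you.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Turns a list of lit segments such as "ABCEFG" into the 7-character
// output row in ABCDEFG order, with '1' for on and '0' for off.
std::string segment_row(const std::string& lit_segments) {
    std::string row(7, '0');
    for (char s : lit_segments) {
        assert(s >= 'A' && s <= 'G');
        row[static_cast<std::size_t>(s - 'A')] = '1';
    }
    return row;
}
```

For the three inputs given in the question, `segment_row("ABCEFG")` yields 1110111 for A, `segment_row("ABCDEFG")` yields 1111111 for B, and `segment_row("ADEF")` yields 1001110 for C. Each of the seven columns then becomes its own Karnaugh map.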
2) Then follow sample on wikipedia for every output light.
You need to do 7 of these: Each for one segment in the 7-segment display.
This figure is for illustration only. It doesn't necessarily map to any segment in your problem.
         cd=00  cd=01  cd=11  cd=10
ab=00      1      1      1      1
ab=01      1      1      1      0
ab=11      0      1      1      1
ab=10      1      1      1      0
                ^^^^^^^^^^^^         = d==1 region
                       ^^^^^^^^^^^^  = c==1 region
Each cell is one input abcd (0000 for 0, 0001 for 1, 0010 for 2, ..., 1111 for F): put '1' if the light is on and '0' if it's off for the given segment.
The two middle rows represent "b==1" region and the two last rows are a==1 region.
From that map find maximum size rectangles (that are of size [1,2 or 4] x [1, 2 or 4]); that can be overlapping. The middle 2x4 region is coded as 'd'. The top row is '~a~b'. The top left 2x2 square is '~a~c'. A bottom left square that wraps from row 4 to row 1 is '~b~c'. Finally the small 2x1 region that covers position x=4, y=3 is 'abc'.
This function would thus be 'd + ~a~b + ~a~c + ~b~c + abc'. If there are no redundant rectangles (ones that are completely covered by other rectangles), this formula should be a minimal sum-of-products form (not counting XOR operations). Repeat this 7 times for the real data!
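A simplified expression is easy to get wrong, so it can be worth checking it against the map by brute force over all 16 inputs. The sketch below hard-codes the illustrative map from this answer (not a real segment); `kMap`, `kGray`, `simplified` and `matches_map` are names made up for the example.

```cpp
#include <cassert>

// The example map: rows are ab = 00, 01, 11, 10; columns are cd = 00, 01, 11, 10.
const int kMap[4][4] = {
    {1, 1, 1, 1},
    {1, 1, 1, 0},
    {0, 1, 1, 1},
    {1, 1, 1, 0},
};

// Gray-code order shared by the rows and the columns.
const int kGray[4] = {0, 1, 3, 2};

// d + ~a~b + ~a~c + ~b~c + abc
inline bool simplified(bool a, bool b, bool c, bool d) {
    return d || (!a && !b) || (!a && !c) || (!b && !c) || (a && b && c);
}

// True if the simplified expression reproduces every cell of the map.
inline bool matches_map() {
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            const int ab = kGray[row];
            const int cd = kGray[col];
            const bool a = (ab & 2) != 0, b = (ab & 1) != 0;
            const bool c = (cd & 2) != 0, d = (cd & 1) != 0;
            if (simplified(a, b, c, d) != (kMap[row][col] == 1)) return false;
        }
    }
    return true;
}
```

For this map the check passes; the three zero cells are the inputs 0110, 1100 and 1010.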
Any selection/permutation of the variables should give the same logical circuit, whether you use abcd or dcba or acbd etc.

Calculating a boundary around several linked rectangles

I am working on a project where I need to create a boundary around a group of rectangles.
Let's use this picture as an example of what I want to accomplish.
EDIT: Couldn't get the image tag to work properly, so here is the full link:
http://www.flickr.com/photos/21093416@N04/3029621742/
We have rectangles A and C, which are linked by a special link rectangle B. You could think of this as two nodes in a graph (A,C) and the edge between them (B). That means the rectangles have pointers to each other in the following manner: A->B, A<-B->C, C->B
Each rectangle has four vertices stored in an array where index 0 is bottom left, and index 3 is bottom right.
I want to "traverse" this linked structure and calculate the vertices making up the boundary (red line) around it. I already have some small ideas around how to accomplish this, but want to know if some of you more mathematically inclined have some neat tricks up your sleeves.
The reason I post this here is just that someone might have solved a similar problem before and have some ideas I could use. I don't expect anyone to sit down and think this through long and hard. I'm going to work on a solution in parallel as I wait for answers.
Any input is greatly appreciated.
Using the example, where the rectangles are axis-aligned and can therefore each be represented by four values (two x coordinates and two y coordinates):
    1   2   3   4   5   6
1   +---+---+
    |       |
2   +   A   +---+---+
    |       |   B   |
3   +       +   +---+---+
    |       |   |   |   |
4   +---+---+---+---+   +
                |       |
5               +   C   +
                |       |
6               +---+---+
1) collect all the x coordinates (both left and right) into a list, then sort it and remove duplicates
1 3 4 5 6
2) collect all the y coordinates (both top and bottom) into a list, then sort it and remove duplicates
1 2 3 4 6
3) create a 2D array with (number of gaps between the unique x coordinates) * (number of gaps between the unique y coordinates) cells. It only needs to be one bit per cell, so in C++ a vector<bool> will likely give you a very memory-efficient version of this
4 * 4
4) paint all the rectangles into this grid
    1   3   4   5   6
1   +---+
    | 1 | 0   0   0
2   +---+---+---+
    | 1 | 1 | 1 | 0
3   +---+---+---+---+
    | 1 | 1 | 1 | 1 |
4   +---+---+---+---+
      0   0 | 1 | 1 |
6           +---+---+
5) for each cell in the grid, for each edge, if the cell beside it in that cardinal direction is not painted, draw the boundary line for that edge
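The five steps can be sketched in C++ roughly as follows, assuming axis-aligned rectangles with integer coordinates. To keep the example short it only adds up the lengths of the boundary edges instead of collecting them; the `Rect` struct and function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { int x1, y1, x2, y2; };  // axis-aligned, x1 < x2 and y1 < y2

// Steps 1 and 2: sorted coordinates without duplicates.
static std::vector<int> unique_sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Total length of the outline around the union of the rectangles.
int boundary_length(const std::vector<Rect>& rects) {
    std::vector<int> xs, ys;
    for (const Rect& r : rects) {
        assert(r.x1 < r.x2 && r.y1 < r.y2);
        xs.push_back(r.x1); xs.push_back(r.x2);
        ys.push_back(r.y1); ys.push_back(r.y2);
    }
    xs = unique_sorted(xs);
    ys = unique_sorted(ys);
    if (xs.size() < 2 || ys.size() < 2) return 0;
    const std::size_t w = xs.size() - 1, h = ys.size() - 1;

    // Steps 3 and 4: one flag per grid cell, set where a rectangle covers it.
    std::vector<bool> painted(w * h, false);
    for (const Rect& r : rects)
        for (std::size_t j = 0; j < h; ++j)
            for (std::size_t i = 0; i < w; ++i)
                if (xs[i] >= r.x1 && xs[i + 1] <= r.x2 &&
                    ys[j] >= r.y1 && ys[j + 1] <= r.y2)
                    painted[j * w + i] = true;
    auto filled = [&](std::size_t i, std::size_t j) -> bool { return painted[j * w + i]; };

    // Step 5: an edge is on the boundary if the neighbour across it is unpainted.
    int length = 0;
    for (std::size_t j = 0; j < h; ++j) {
        for (std::size_t i = 0; i < w; ++i) {
            if (!filled(i, j)) continue;
            const int cw = xs[i + 1] - xs[i], ch = ys[j + 1] - ys[j];
            if (i == 0 || !filled(i - 1, j)) length += ch;      // left edge
            if (i + 1 == w || !filled(i + 1, j)) length += ch;  // right edge
            if (j == 0 || !filled(i, j - 1)) length += cw;      // top edge
            if (j + 1 == h || !filled(i, j + 1)) length += cw;  // bottom edge
        }
    }
    return length;
}
```

To get the actual outline instead of its length, the four `length +=` lines would push the corresponding edge segments into a list, which then still has to be chained into a polygon.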
In the question, the rectangles are described as being four vectors where each represents a corner. If each rectangle can be at arbitrary and different rotation from others, then the approach I've outlined above won't work. The problem of finding the path around a complex polygon is regularly solved by vector graphics rasterizers, and a good approach to solving the problem is using a library such as Cairo to do the work for you!
The generalized solution to this problem is to implement boolean operations in terms of a scanline. You can find a brief discussion here to get you started. From the text:
"The basis of the boolean algorithms is scanlines. For the basic principles the book: Computational Geometry an Introduction by Franco P. Preparata and Michael Ian Shamos is very good."
I own this book, though it's at the office now, so I can't look up the page numbers you should read, though chapter 8, on the geometry of rectangles is probably the best starting point.
Calculate the sum of the boundaries (perimeters) of all 3 rectangles separately.
Calculate the overlapping rectangle of A and B, and subtract its boundary from the sum.
Do the same for the overlapping rectangle of B and C.
(To get the overlapping rectangle of A and B, take the middle 2 X positions together with the middle 2 Y positions.)
Example (x1,y1) - (x2,y2):
Rectangle A: (1,1) - (3,4)
Rectangle B: (3,2) - (5,4)
Rectangle C: (4,3) - (6,6)
Calculation:
10 + 8 + 10 = 28
X coords ordered = 1,3,3,5 middle two are 3 and 3
Y coords ordered = 1,2,4,4 middle two are 2 and 4
so: (3,2) - (3,4) : boundary = 4
X coords ordered = 3,4,5,6 middle two are 4 and 5
Y coords ordered = 2,3,4,6 middle two are 3 and 4
so: (4,3) - (5,4) : boundary = 4
28 - 4 - 4 = 20
This is my example visualized:
    1   2   3   4   5   6
1   +---+---+
    |       |
2   +   A   +---+---+
    |       |   B   |
3   +       +   +---+---+
    |       |   |   |   |
4   +---+---+---+---+   +
                |       |
5               +   C   +
                |       |
6               +---+---+
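The subtraction approach could be sketched as below. It assumes axis-aligned rectangles with integer coordinates, given as a chain in which each rectangle touches or overlaps only its direct neighbour; if A and C also overlapped, this would over- or under-count. The `Box` struct and function names are invented for the example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Box { int x1, y1, x2, y2; };  // (x1,y1) - (x2,y2) with x1 <= x2, y1 <= y2

inline int perimeter(const Box& b) {
    assert(b.x1 <= b.x2 && b.y1 <= b.y2);
    return 2 * ((b.x2 - b.x1) + (b.y2 - b.y1));
}

// Overlap of two touching or overlapping boxes: the middle two of the four
// sorted x values together with the middle two of the four sorted y values.
inline Box overlap(const Box& a, const Box& b) {
    std::array<int, 4> xs{{a.x1, a.x2, b.x1, b.x2}};
    std::array<int, 4> ys{{a.y1, a.y2, b.y1, b.y2}};
    std::sort(xs.begin(), xs.end());
    std::sort(ys.begin(), ys.end());
    return Box{xs[1], ys[1], xs[2], ys[2]};
}

// Boundary length of a chain of boxes where only neighbours touch.
inline int chain_perimeter(const std::vector<Box>& chain) {
    int total = 0;
    for (std::size_t i = 0; i < chain.size(); ++i) {
        total += perimeter(chain[i]);
        if (i > 0) total -= perimeter(overlap(chain[i - 1], chain[i]));
    }
    return total;
}
```

For the chain A, B, C from the example this gives 10 + 8 + 10 - 4 - 4 = 20. Note that it yields only the length of the boundary, not its vertices.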
A simple trick should be:
Create a region from the first rectangle
Add the other rectangles to the region
Get the boundary of the region (somehow? :P)
After some thinking I might end up doing something like this:
Pseudo code:
LinkRectsConnectedTo(Rectangle rectangle, Edge startEdge) // Edge can be West, North, East, South
    for each edge in rectangle, starting with the edge facing the last rectangle
        add the vertices of the edge to the final boundary polygon
        if edge is connected to another rectangle
            if edge not equals startEdge
                recursively call LinkRectsConnectedTo(rectangle, startEdge)
Obviously this pseudo code would have to be refined a bit and might not cover all cases, but I think I might have solved my own problem.
I haven't thought this out completely, but I wonder if you couldn't do something like:
Make a list of all the edges.
Get all the edges where P1.X = P2.X
In that list, get the pairs where X are equal
For each pair, replace with one or two edges for the parts where they DON'T overlap
Do something clever to get the edges in the right order
Will your rectangles always be horizontally aligned, if not you'd need to do the same thing but for Y too?
And are they always guaranteed to be touching? If not the algorithm wouldn't be broken, but the 'right order' wouldn't be definable.