Probability improves? - combinations

You have a total of 3n cards, n red cards, n blue cards and n yellow cards. What is the probability of drawing 3 cards one of each color? Now if you have 3n+3 cards instead, n+1 of each color, will the probability compared to the last case improve? Thanks!

Case 1
Experiment : Drawing three cards out of 3n Cards (n red, n blue & n yellow)
Sample Space : (3n)C3 = (3n)(3n-1)(3n-2) / 3!
Event A : Getting three cards of different color.
Favorable Case : nC1 * nC1 * nC1 = n^3
P(A) = Favorable Case / Sample Space
= n^3 * 3! / [(3n)(3n-1)(3n-2)]
Case 2
Experiment : Drawing three cards out of 3n+3 Cards (n+1 red, n+1 blue & n+1 yellow)
Sample Space : (3n+3)C(3) = (3n+3)(3n+2)(3n+1) / 3!
Event A : Getting three cards of different color.
Favorable Case : (n+1)C(1) * (n+1)C(1) * (n+1)C(1) = (n+1)^3
P(A) = Favorable Case / Sample Space
= (n+1)^3 * 3! / [(3n+3)(3n+2)(3n+1)]
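Simplifying, Case 1 gives P = 2n^2 / [(3n-1)(3n-2)] and Case 2 gives P = 2(n+1)^2 / [(3n+1)(3n+2)]. Cross-multiplying, Case 1 exceeds Case 2 exactly when n^2 (3n+1)(3n+2) > (n+1)^2 (3n-1)(3n-2), which reduces to 9n^2 + 5n - 2 > 0 and holds for every n >= 1. Hence the probability in Case 2 is less than in Case 1, i.e., the probability is not improved.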

For 3n cards: P(x)=[C(n,1)C(n,1)C(n,1)]/C(3n,3)
For 3n+3 cards: P(x)=[C(n+1,1)C(n+1,1)C(n+1,1)]/C(3n+3,3)
You can simply plot the above function on Wolfram Alpha by typing
C(n,1)^3/C(3n,3) plot (n from 2 to 10) in the search box and verify that it is decreasing.
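If Wolfram Alpha is not at hand, the same check can be done locally. The sketch below assumes Python 3.8 or newer (for math.comb); the function name prob_one_of_each is made up for illustration. It evaluates the formula exactly with fractions and checks that each value is smaller than the one before.

```python
from fractions import Fraction
from math import comb

def prob_one_of_each(n):
    """Exact probability of drawing one card of each colour from 3n cards."""
    return Fraction(comb(n, 1) ** 3, comb(3 * n, 3))

# Tabulate n = 1 .. 10, both as an exact fraction and as a decimal.
values = [prob_one_of_each(n) for n in range(1, 11)]
for n, p in enumerate(values, start=1):
    print(n, p, float(p))

# True when every value is strictly smaller than the previous one.
is_decreasing = all(a > b for a, b in zip(values, values[1:]))
print("decreasing:", is_decreasing)
```

The values start at 1 for n=1 and fall toward 2/9 as n grows.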

The probability of picking one of each in three draws is the product, draw by draw, of the number of cards that keep the colors distinct over the number of cards left: 3n/3n*2n/(3n-1)*n/(3n-2). This reduces to 6n^3/(3n(9n^2-9n+2)) and then to 2n^2/(9n^2-9n+2). Next you want to know if adding one to n changes the probability, so set the two equal:
2n^2/(9n^2-9n+2)=2(n+1)^2/(9(n+1)^2-9(n+1)+2)
2n^2/(9n^2-9n+2)=2(n+1)^2/(9n^2+18n+9-9n-9+2)
2n^2/(9n^2-9n+2)=2(n+1)^2/(9n^2+9n+2)
n^2/(9n^2-9n+2)=(n+1)^2/(9n^2+9n+2)
n^2(9n^2+9n+2)=(n+1)^2(9n^2-9n+2)
9n^4+9n^3+2n^2=9n^4+9n^3-7n^2-5n+2
9n^2+5n-2=0
n=-0.82494,0.26938
So at -0.82494 and 0.26938, n+1 has the same probability as n. Since we are dealing with n>=1, I'll plug 1 into the equation to see whether the difference in probability (left side minus right side) is greater than or less than 0 for n>0.26938.
9(1)^2+5(1)-2?=0
9+5-2?=0
12>0
meaning that the probability of drawing one of each card does not increase as you add one more of each card but, in fact, it decreases.
note: remember that the left side of the equation represents n and the right side represents n+1
also this answer makes sense because if n=1 the probability has to be 100% because you are drawing every card, but not if n>1 because you are not drawing all of the cards.

How to determine if a point is within a polygon consisting of horizontal and vertical lines only?

I want to find the best way, because all coordinates are integer values and the polygons consist of horizontal and vertical lines only. I think there may be a simple and fast method to do this.
From an asymptotic complexity point of view, a rectilinear polygon is not really simpler to process than a general one: O(N) without preprocessing, and O(Log N) after O(N Log N) preprocessing (but using a complicated procedure).
For the case of no preprocessing, the procedure is simple: consider every vertical side in turn and count those that cross the horizontal half-line from the given point (+1 upward, -1 downward). The point is inside if the final count is nonzero.
The status of points on the outline is application-dependent.
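A minimal sketch of the no-preprocessing procedure, under a few assumptions of mine: the polygon is given as a list of (x, y) vertices in order around the outline, the half-line points toward +x, and the function name is hypothetical. Each vertical side is tested against the half-open range [low, high) so that a vertex shared by two sides is not counted twice. Points lying exactly on the outline are not treated specially here.

```python
def point_in_rectilinear_polygon(px, py, vertices):
    """Return True if (px, py) is inside the polygon (signed crossing count)."""
    count = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if x1 != x2:
            continue  # horizontal side: never crosses a horizontal half-line
        if x1 <= px:
            continue  # side is not strictly to the right of the point
        low, high = min(y1, y2), max(y1, y2)
        if low <= py < high:
            # +1 for a side traversed upward, -1 for one traversed downward
            count += 1 if y2 > y1 else -1
    return count != 0

# L-shaped example polygon (made up for illustration).
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(point_in_rectilinear_polygon(1, 1, l_shape))
print(point_in_rectilinear_polygon(3, 3, l_shape))
```

The loop visits every side once, which matches the O(N) cost quoted above.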
For rectilinear polygons with not too large integer coordinates, you can anyway do a little better, by "compressing" them. By two independent sorts on X and Y, you can obtain a mapping from X (or Y) to integer indexes in range [0,N). This gives the shrunk polygon below, of size NxN.
Now you can embed the polygon in an image and preprocess to label the pixels as inside/outside (by seed-filling). After filling two lookup-tables for coordinate conversion, you can obtain the status of any point in constant time O(1).
This will take O(N²+M) preprocessing time and storage, where M is the range of X and Y values.
Consider any polygon, not necessarily convex, formed only with horizontal and vertical lines:
Take a point (I've drawn A,B,C,D) and draw horizontal and vertical lines passing through the point.
Let's take point A. You see the horizontal line through it crosses four (vertical) segments. Note one segment is at left and the others are at right.
For point B its horizontal line crosses also four segments, but two at left and two at right.
The conditions that a point must fulfill to be inside a polygon are:
At least one segment is horizontally crossed at the left of the point.
At least one segment is horizontally crossed at the right of the point.
Both numbers of crossings, left and right, must be odd.
Same three conditions for vertical lines.
So, in pseudocode it goes like this:
let nL = 0, nR = 0 //left/right counters
let nA = 0, nU = 0 //above/under counters
for each segment s(sx1,sy1, sx2, sy2) in polygon
    if point is on segment
        return true //or false, your choice
    else if segment is vertical and pointY is inside of (sy1,sy2)
        if pointX > min(sx1,sx2)
            nL = nL + 1
        else
            nR = nR + 1
    else if segment is horizontal and pointX is inside of (sx1,sx2)
        if pointY > min(sy1,sy2)
            nU = nU + 1
        else
            nA = nA + 1
//Check conditions
if nL > 0 and nR > 0 and nL is odd and nR is odd
    return true
if nA > 0 and nU > 0 and nA is odd and nU is odd
    return true
return false

How can I optimise this to run in an efficient manner?

The link for the question is as follows: http://codeforces.com/problemset/problem/478/C
You have r red, g green and b blue balloons. To decorate a single table for the banquet you need exactly three balloons. Three balloons attached to some table shouldn't have the same color. What maximum number t of tables can be decorated if we know number of balloons of each color?
Your task is to write a program that for given values r, g and b will find the maximum number t of tables, that can be decorated in the required manner.
Input:
The single line contains three integers r, g and b (0 ≤ r, g, b ≤ 2·10^9), the number of red, green and blue balloons respectively. The numbers are separated by exactly one space.
Output:
Print a single integer t — the maximum number of tables that can be decorated in the required manner.
So, what I did was, in a greedy manner, searched for the maximum and minimum value each time and subtracted 2 and 1 respectively if possible. Here is my code:
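#include <climits>
#include <iostream>
using namespace std;

// Index (1 = r, 2 = g, 3 = b) of the current maximum and minimum.
int indma, indmi;

int maxfind(int r, int g, int b);
int minfind(int r, int g, int b); // analogous to maxfind, not shown

int main (void)
{
    int ans=0,r,g,b;
    cin>>r>>g>>b;
    while (1)
    {
        int a1 = maxfind(r,g,b);
        int a2 = minfind(r,g,b);
        //ans++;
        if (a1 >= 2 && a2 >= 1)
        {
            // Take two balloons of the most frequent colour...
            ans++;
            if (indma == 1)
                r = r-2;
            else if (indma == 2)
                g = g-2;
            else
                b = b-2;
            // ...and one of the least frequent colour.
            if (indmi == 1)
                r = r-1;
            else if (indmi == 2)
                g = g-1;
            else
                b = b-1;
        }
        else if (r == 1 && g == 1 && b == 1)
        {
            ans++;
            break;
        }
        else
            break;
    }
    cout<<ans<<"\n";
    return 0;
}

int maxfind(int r, int g, int b)
{
    indma = 0;
    int temp = INT_MIN;
    if (r >= temp)
    {
        temp = r;
        indma = 1;
    }
    if (g >= temp)
    {
        temp = g;
        indma = 2;
    }
    if (b >= temp)
    {
        temp = b;
        indma = 3;
    }
    return temp;
}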
The function minfind is similar, and I make sure that it does not pick the same number when the maximum and minimum values are equal. However, since the limit is 2*10^9, this obviously exceeds the time limit. How can I optimise it? Thanks!
Edit: You can easily find sample test cases in the link for the question. However, I am still adding one of them.
Input
5 4 3
output
4
Explanation: In the first sample you can decorate the tables with the following balloon sets: "rgg", "gbb", "brr", "rrg", where "r", "g" and "b" represent the red, green and blue balls, respectively.
You can split this problem into two scenarios: either you use all the balloons (with 0, 1, or 2 left over), or you don't because there are too many of one color and not enough of the other two.
If you use all the balloons, the answer is simply (r+g+b)/3 (integer division).
If you don't use all the balloons, then the answer is equal to the sum of the lower two of the three numbers.
t = min((r+g+b)/3,r+g+b-max(r,g,b))
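As a small sketch, the formula translates directly into code. The function name max_tables is made up for illustration. Python integers do not overflow; in C++ the sum r+g+b can reach 6·10^9, which does not fit in a 32-bit int, so a 64-bit type such as long long would be needed there.

```python
def max_tables(r, g, b):
    """Maximum number of tables, using the closed form from the answer above."""
    total = r + g + b
    # Either the total runs out (three balloons per table), or the two
    # less frequent colours run out first.
    return min(total // 3, total - max(r, g, b))

print(max_tables(5, 4, 3))  # sample from the question, prints 4
```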
Without looking at the problem but just at your code:
If the smallest number is less than the middle number and at least two less than the largest number before any iteration then this is true after that iteration as well (because the smallest number will now be less than the number that was the largest, and it will be two less than the number that was the middle one). In that case you can figure out exactly what will happen throughout your algorithm (the largest number will be decreased by two until it is not the largest anymore, and then the two largest numbers will be decreased by two in turn). So you can figure out exactly what ans will be without actually doing all the iterations.
If the two smallest numbers are equal and the largest is at least three larger then in the next two iterations both smallest numbers will be decreased by 1 once, while the largest will be decreased by 2 twice. You can calculate how often that happens.
After that you end up with (x, x+1, x+1), (x, x, x+2), (x, x, x+1) or (x, x, x). Here you can also predict what will happen in the next iteration or the next two iterations. It's a bit complicated, but not very complicated. For example, if three numbers are (x, x, x+1) then the next three numbers will be (x-1, x, x-1) which is the same pattern again.
Example: Start with (10^9, 10^9 + 1, 10^9 + 1000): You will 500 times subtract 1 from the first and 2 from the last number, giving (10^9 - 500, 10^9 + 1, 10^9 + 0). Then you will 10^9 - 500 times decrease the first number by 1, and since the number is even you will decrease each of the other two numbers by two (10^9 - 500) / 2 times. At that point you have (0, 501, 500) and your algorithm ends with ans = 10^9.
Now this shows how to do the calculation in constant time. It doesn't show whether this gives the correct solution.
How can I optimise this to run in an efficient manner?
By looking at the problem more closely. The brute force approach will not work (unfortunately).
Fortunately, the numbers can be calculated in a single closed equation without resorting to recursion or looping.
Let's try a derivation: You start with (r, g, b) balloons. The upper limit of tables is certainly sum(r, g, b) / 3 (integer division, i.e. round down) because you need at least three times as many balloons as tables.
What about the less-than-optimal cases? To decorate a table, you need two balloons of different colors but you do not care about the color of the third one.
Let's assume that you have fewest green (min(r, b, g) = g) balloons. So you can certainly decorate g tables as long as you have enough balloons in total (already covered). How many more tables can you decorate?
Assuming you did not use up all balloons yet (i.e. g < sum(r, b, g) / 3) you have used up 2 x g balloons of the other colors, i.e. you have a total of sum(r, b) - 2 x g balloons left. This can be an arbitrary combination of the available red and blue balloons since we can shuffle them around as we like.
If we assume, that the red (r) balloons are the second least frequent (i.e. most balloons are blue), we can decorate at most min(r, sum(r, b) - 2 x g) more tables. We either run out of red balloons or we run out of balloons, whichever one happens first.
Since we already covered the case of running out of balloons, we can ignore the second term of min(r, sum(r, b) - 2 x g).
So indeed, the number of tables is min(sum(r, b, g) / 3, min(r + b, r + g, b + g)) or simplified min(sum(r, b, g) / 3, sum(r, b, g) - max(r, b, g)), or, colloquially, the minimum of a third of the total number and the sum of the two least frequent colors.
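The derivation can be sanity-checked against an exhaustive search on small inputs. The sketch below is only a check, not a proof; the names brute_force, closed_form and LIMIT are made up for illustration. The search tries every allowed table (any three balloons that are not all one color) and compares the best result with the formula.

```python
from functools import lru_cache
from itertools import product

# Every way to take three balloons for one table with at most two per colour.
TABLES = [t for t in product(range(3), repeat=3) if sum(t) == 3]

@lru_cache(maxsize=None)
def brute_force(r, g, b):
    """Best number of tables found by trying every allowed table recursively."""
    best = 0
    for dr, dg, db in TABLES:
        if r >= dr and g >= dg and b >= db:
            best = max(best, 1 + brute_force(r - dr, g - dg, b - db))
    return best

def closed_form(r, g, b):
    total = r + g + b
    return min(total // 3, total - max(r, g, b))

LIMIT = 7  # check all counts from 0 to 6 per colour
mismatches = [
    (r, g, b)
    for r, g, b in product(range(LIMIT), repeat=3)
    if brute_force(r, g, b) != closed_form(r, g, b)
]
print("mismatches:", mismatches)
```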

Is it possible for matplotlib's alpha transparency values to "sum" to 1?

I am using matplotlib to plot a series of horizontal lines that overlap. I would like to indicate (in a very rough way) how much overlap there is via transparency. For example if I have ten lines and 5 of them overlap over a certain interval, I would like that interval to have an alpha value of 0.5. If all of them overlap over a certain interval then the interval should have an alpha value of 1.0. The following code should illustrate what I want:
import matplotlib.pyplot as plt
y = [1,1,1,1,1,1,1,1,1,1]
x_start = [0,0,0,0,0,0,0,0,0,0]
x_end = [1,2,3,4,5,6,7,8,9,10]
plt.hlines(y, x_start, x_end, linewidth=7, colors='red', alpha=0.1)
plt.hlines(1.2, 0, 10, linewidth=7, colors='red', alpha=1)
plt.ylim(0.8, 1.4)
plt.show()
I would like the transparency of the red from x=0 to x=1 for the line at y=1 to be the same as that of the horizontal line at y=1.2 (not transparent at all). However this is not the case.
Is there a way to achieve what I want with matplotlib and the alpha values? I will know the total number of lines that can possibly overlap (i.e., how many lines overlapping should correspond to 0 transparency).
Thanks to @cphlewis who got me pointed in the right direction I now have an approximation that works well enough for my needs.
My problem is much easier than the general problem since I want to assign each line (layer) the exact same transparency level s. If there are n=2 lines I want the transparency when both lines overlap to be close to 0, e.g. alpha=0.97.
If n=2 and alpha=0.97, solving
0.97 = s + s(1-s)
for s yields s=0.827.
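This step can be checked numerically. Since s + s(1-s) = 1 - (1-s)^2, the two-layer equation has the direct solution s = 1 - sqrt(1 - 0.97). The sketch below assumes that every further layer is blended in the same way, so that n layers of opacity s combine to 1 - (1-s)^n; the function names are made up for illustration.

```python
def per_layer_alpha(num_layers, target_alpha):
    """Opacity s per layer so that num_layers stacked layers reach target_alpha."""
    return 1.0 - (1.0 - target_alpha) ** (1.0 / num_layers)

def stacked_alpha(num_layers, s):
    """Combined opacity of num_layers layers, each of opacity s."""
    return 1.0 - (1.0 - s) ** num_layers

s = per_layer_alpha(2, 0.97)
print(round(s, 3))          # 0.827
print(stacked_alpha(2, s))  # back to approximately 0.97
```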
Generalizing this for any n leads to solving a polynomial where the coefficients are given by the n'th row of Pascal's triangle and where the sign of each coefficient is equal to
(-1)^(n + pos)
where pos is the position of the coefficient in Pascal's triangle from left to right and where pos starts at 1. Also, the last coefficient in Pascal's triangle is replaced with the negative of the desired alpha value.
So for n=5 the polynomial to be solved is
s^5 - 5s^4 + 10s^3 - 10s^2 + 5s - 0.97 = 0
The following Python code solves for the smallest real root (which is the alpha value that I want) given n and alpha (note that alpha < 1).
import numpy as np
import scipy.linalg

num_lines = 5
end_alpha_value = 0.97  # end_alpha_value must be in the interval (0, 1)
pascal_triangle = scipy.linalg.pascal(num_lines + 1, kind='lower')
print('num_reps: 1, minimum real root: %.3f' % end_alpha_value)
for i in range(2, num_lines + 1):
    coeff_list = []
    # Row i of Pascal's triangle with alternating signs (highest power first).
    for j, coeff in enumerate(pascal_triangle[i][:i]):
        coeff_list.append(int(coeff) * ((-1)**(i+j+1)))
    # The constant term is minus the desired alpha.
    coeff_list.append(-end_alpha_value)
    all_roots = np.roots(coeff_list)
    # Keep the roots whose imaginary part is numerically zero.
    real_roots = all_roots[np.abs(all_roots.imag) < 1e-5].real
    print('num_reps: %i, minimum real root: %.3f' % (i, real_roots.min()))
For the case n=10 if the desired transparency is alpha=0.97 then s=0.296 resulting in the following output:
I believe what is going on shows up better using black as the color:

C++: Finding all combinations of array items divisible into two groups

I believe this is more of an algorithmic question but I also want to do this in C++.
Let me illustrate the question with an example.
Suppose I have N objects (not programming objects), each with a different weight, and two vehicles to carry them. Each vehicle is big enough to carry all the objects by itself. The two vehicles have their own mileage and different levels of fuel in the tank, and the mileage also depends on the weight carried.
The objective is to bring these N objects as far as possible. So I need to distribute the N objects in a certain way between the two vehicles. Note that I do not need to bring them the 'same' distance, but rather as far as possible. So, for example, I want the two vehicles to go 5km and 6km, rather than one going 2km and the other going 7km.
I cannot think of a theoretical closed-form calculation to determine which weights should be loaded into each vehicle, because I need to carry all N objects, which is a fixed number.
So as far as I can think, I need to try all the combinations.
Could someone advise an efficient algorithm to try all the combinations?
For example I would have the following:
int weights[5] = {1,4,2,7,5}; // can be more values than 5
float vehicelONEMileage(int totalWeight);
float vehicleTWOMileage(int totalWeight);
How could I efficiently try all the combinations of weights[] with the two functions?
The two functions can be assumed to be linear, i.e. the return values of the two mileage functions are linear functions with (different) negative slopes and (different) offsets.
So what I need to find is something like:
MAX(MIN(vehicleONEMileage(x), vehicleTWOMileage(sum(weights) - x)));
Thank you.
This should be on the cs or the math site.
Simplification: Instead of an array of objects, let's say we can distribute weight linearly.
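The function we want to optimize is the minimum of both travel distances. We replace finding the maximum of the minimum with finding the maximum of the product (without proof; to see the idea, think of the relationship between perimeter and area of rectangles: the rectangle with the biggest area for a given perimeter is a square, which also happens to have the largest minimum side length). Note that the rectangle analogy fixes the perimeter, i.e. the sum of the two distances, so the two maxima are guaranteed to coincide only when both mileage functions have the same slope; otherwise maximizing the product is a heuristic.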
In the following, we will scale the sum of all weights to 1. So, a distribution like (0.7, 0.3) means that 70% of all weights is loaded on vehicle 1. Let's call the load of vehicle 1 x and the load of vehicle 2 1-x.
Given the two linear functions f = a x + b and g = c x + d, where f is the mileage of vehicle 1 when loaded with weight x, and g the same for vehicle 2, we want to maximize
(a*x+b)*(c*(1-x)+d)
Let's ask Wolfram Alpha to do the hard work for us: www.wolframalpha.com/input/?i=derive+%28%28a*x%2Bb%29*%28c*%281-x%29%2Bd%29%29
It tells us that there is an extremum at
x_opt = (a * c + a * d - b * c) / (2 * a * c)
That's all you need to solve your problem efficiently.
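A short sketch for checking this formula on concrete numbers. It compares the product extremum with the point where the two lines cross, x = (c + d - b) / (a + c), found by setting a*x + b equal to c*(1-x) + d. With both slopes negative, vehicle 1's distance falls and vehicle 2's distance rises as x grows, so the smaller of the two is largest at the crossing, clamped to [0, 1]. The two values agree when a equals c and can differ otherwise. The coefficients and function names below are invented for illustration.

```python
def product_extremum(a, b, c, d):
    """x_opt from the formula above (extremum of the product)."""
    return (a * c + a * d - b * c) / (2 * a * c)

def crossing_point(a, b, c, d):
    """x at which a*x + b equals c*(1 - x) + d."""
    return (c + d - b) / (a + c)

def worst_distance(x, a, b, c, d):
    """The smaller of the two travel distances for load share x on vehicle 1."""
    return min(a * x + b, c * (1 - x) + d)

def clamp(x):
    return max(0.0, min(1.0, x))

a, b, c, d = -4, 10, -2, 9  # hypothetical slopes and offsets
for name, x in [("product", product_extremum(a, b, c, d)),
                ("crossing", crossing_point(a, b, c, d))]:
    x = clamp(x)
    print(name, x, worst_distance(x, a, b, c, d))
```

For these numbers the product extremum falls below 0 and clamps to x = 0 (worst distance 7), while the crossing point is x = 0.5 (worst distance 8).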
The complete algorithm:
find a, b, c, d
b = vehicleONEMileage(0)
a = (vehicleONEMileage(1) - b) * sum_of_all_weights
same for c and d
calculate x_opt as above.
if x_opt < 0, load all weight onto vehicle 2
if x_opt > 1, load all weight onto vehicle 1
else, try to load tgt_load = x_opt*sum_of_all_weights onto vehicle 1, the rest onto vehicle 2.
The rest is a knapsack problem. See http://en.wikipedia.org/wiki/Knapsack_problem#0.2F1_Knapsack_Problem
How to apply this? Use the dynamic programming algorithm described there twice.
for maximizing a load up to tgt_load
for maximizing a load up to (sum_of_all_weights - tgt_load)
The first one, if loaded onto vehicle one, gives you a distribution with slightly less than expected on vehicle one.
The second one, if loaded onto vehicle two, gives you a distribution with slightly more than expected on vehicle two.
One of those is the best solution. Compare them and use the better one.
I leave the C++ part to you. ;-)
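For integer weights, the knapsack step can be sketched in Python as follows. This is a variation rather than the two runs described above: it builds the set of all subset sums (the same dynamic-programming idea) and evaluates the smaller distance for each one. The mileage functions and names are hypothetical. The sketch returns only the load for vehicle 1, not which objects make it up.

```python
def reachable_sums(weights):
    """All total weights that some subset of the objects adds up to."""
    sums = {0}
    for w in weights:
        sums |= {s + w for s in sums}
    return sums

def best_load_for_vehicle_one(weights, mileage_one, mileage_two):
    """Load on vehicle 1 that maximizes the smaller of the two distances."""
    total = sum(weights)
    return max(reachable_sums(weights),
               key=lambda s: min(mileage_one(s), mileage_two(total - s)))

weights = [1, 4, 2, 7, 5]

def mileage_one(w):  # hypothetical linear mileage of vehicle 1
    return 100 - 4 * w

def mileage_two(w):  # hypothetical linear mileage of vehicle 2
    return 60 - 2 * w

load = best_load_for_vehicle_one(weights, mileage_one, mileage_two)
print(load, min(mileage_one(load), mileage_two(sum(weights) - load)))
```

The number of distinct sums is at most sum(weights) + 1, so the cost is pseudo-polynomial rather than exponential in the number of objects.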
I can suggest the following solution:
The total number of combinations is 2^(number of weights). Using a bit logic we can loop through the all combinations and calculate maxDistance. Bits in the combination value show which weight goes to which vehicle.
Note that algorithm complexity is exponential and int has a limited number of bits!
float maxDistance = 0.f;
for (int combination = 0; combination < (1 << ARRAYSIZE(weights)); ++combination)
{
    int weightForVehicleONE = 0;
    int weightForVehicleTWO = 0;
    for (int i = 0; i < ARRAYSIZE(weights); ++i)
    {
        if (combination & (1 << i)) // bit is set to 1: weight goes to vehicle TWO
        {
            weightForVehicleTWO += weights[i];
        }
        else // bit is set to 0: weight goes to vehicle ONE
        {
            weightForVehicleONE += weights[i];
        }
    }
    maxDistance = max(maxDistance, min(vehicelONEMileage(weightForVehicleONE), vehicleTWOMileage(weightForVehicleTWO)));
}

Finding nearest RGB colour

I was told to use distance formula to find if the color matches the other one so I have,
struct RGB_SPACE
{
    float R, G, B;
};
RGB_SPACE p = {255, 164, 32}; //pre-defined
RGB_SPACE u = {192, 35, 111}; //user defined
// squared Euclidean distance between the two colours
long distance = static_cast<long>(pow(u.R - p.R, 2) + pow(u.G - p.G, 2) + pow(u.B - p.B, 2));
This gives just a distance, but how would I know if the color matches the user-defined one by at least 25%?
I'm not sure, but I have an idea: check each color value to see if the difference is 25%. For example:
float R = u.R/p.R * 100;
float G = u.G/p.G * 100;
float B = u.B/p.B * 100;
if (R <= 25 && G <= 25 && B <= 25)
{
    //color matches with pre-defined color.
}
I would suggest not checking in RGB space. If you have (0,0,0) and (100,0,0) they are similar according to cababunga's formula (as well as according to casablanca's, which considers too many colors similar). However, they LOOK pretty different.
The HSL and HSV color models are based on human interpretation of colors and you can then easily specify a distance for hue, saturation and brightness independently of each other (depending on what "similar" means in your case).
"Matches by at least 25%" is not a well-defined problem. Matches by at least 25% of what, and according to what metric? There's tons of possible choices. If you compare RGB colors, the obvious ones are distance metrics derived from vector norms. The three most important ones are:
1-norm or "Manhattan distance": distance = abs(r1-r2) + abs(g1-g2) + abs(b1-b2)
2-norm or Euclidean distance: distance = sqrt(pow(r1-r2, 2) + pow(g1-g2, 2) + pow(b1-b2, 2)) (you compute the square of this, which is fine - you can avoid the sqrt if you're just checking against a threshold, by squaring the threshold too)
Infinity-norm: distance = max(abs(r1-r2), abs(g1-g2), abs(b1-b2))
There's lots of other possibilities, of course. You can check if they're within some distance of each other: If you want to allow up to 25% difference (over the range of possible RGB values) in one color channel, the thresholds to use for the 3 methods are 3/4*255, sqrt(3)/4*255 and 255/4, respectively. This is a very coarse metric though.
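A short sketch of the three metrics with the thresholds just given, applied to the two colours from the question. The function and variable names are made up for illustration.

```python
import math

def manhattan(c1, c2):
    return sum(abs(a - b) for a, b in zip(c1, c2))

def euclidean(c1, c2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def chebyshev(c1, c2):
    """Infinity-norm: the largest single-channel difference."""
    return max(abs(a - b) for a, b in zip(c1, c2))

# 25% of the largest possible distance under each metric.
THRESHOLDS = {
    manhattan: 3 / 4 * 255,
    euclidean: math.sqrt(3) / 4 * 255,
    chebyshev: 255 / 4,
}

p = (255, 164, 32)   # pre-defined colour from the question
u = (192, 35, 111)   # user-defined colour from the question
for metric, limit in THRESHOLDS.items():
    d = metric(p, u)
    print(metric.__name__, round(d, 2), "within 25%:", d <= limit)
```

For this pair all three distances exceed their thresholds, so the colours would not count as matching under any of the three metrics.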
A better way to measure distances between colors is to convert your colors to a perceptually uniform color space like CIELAB and do the comparison there; there's a fairly good Wikipedia article on the subject, too. That might be overkill depending on your intended application, but those are the color spaces where measured distances have the best correlation with distances perceived by the human visual system.
Note that the maximum possible distance is between (255, 255, 255) and (0, 0, 0), which are at a squared distance of 3 * 255^2 (the value your code computes is the square of the distance). Obviously these two colours match the least (0% match) and they are a distance 100% apart. Then at least a 25% match means a distance less than 75%, i.e. 3 / 4 * 3 * 255^2 = 9 / 4 * 255 * 255. So you could just check the following (multiplying before dividing, so that integer division does not truncate 9 / 4 to 2):
distance <= 9 * 255 * 255 / 4