How to find weights given a Huffman tree - compression

Huffman's algorithm derives a tree given the weights of the symbols. I want the reverse: given a tree, figure out a set of symbol weights that would generate that tree, or at least a tree with the same bit lengths for each symbol.
I'm aware that there are multiple sets of weights that generate the same tree, so I imagine that the weights can be given as powers of two, and the longest code could be assigned weight 1.
(Not relevant to the question, but the purpose is to fine-tune the fixed tree used internally by an LZ77-type compression algorithm to code the offsets and lengths, checking whether the current bitlengths are reasonable or adjusting them if not).

You imagine correctly. However the powers of two will result in many ties when executing the Huffman algorithm. The tree you get back may have a different topology than the tree you started with, depending on how the ties are decided. But the bit lengths will all be the same.
Here is an example:
I used these frequencies for the alphabet:
817 A
145 B
248 C
431 D
1232 E
209 F
182 G
668 H
689 I
10 J
80 K
397 L
277 M
662 N
781 O
156 P
9 Q
572 R
628 S
905 T
304 U
102 V
264 W
15 X
211 Y
5 Z
That gave me this tree:
Then I assigned powers-of-two frequencies to the symbols per their depths in the tree:
64 A
16 B
32 C
32 D
128 E
16 F
16 G
64 H
64 I
2 J
8 K
32 L
32 M
64 N
64 O
16 P
1 Q
64 R
64 S
128 T
32 U
16 V
32 W
4 X
32 Y
1 Z
Applying Huffman to that, I get a very different tree, but one where all of the symbols have the same depth as before:
I'm pretty sure that there's a way to assign frequencies working up from the bottom, making the next thing to add just big enough to assure the right choice. That will also result in lower weights overall than the powers of two, coming closer to a Fibonacci sequence. This is an interesting problem, so now I am tempted to play with it.
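The power-of-two construction can be checked mechanically. Below is a minimal C++ sketch, assuming the code lengths are already known and come from a full Huffman tree (Kraft sum exactly 1). `huffmanLengths` and `weightsFromLengths` are hypothetical helper names, not part of any library. With weights 2^(maxLen - len), the normalized probabilities are exactly 2^-len, and for such probabilities the only optimal code lengths are those same lengths. Huffman's algorithm therefore reproduces every length however its ties are broken, even though the tree topology may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Code length (tree depth) of each symbol when Huffman's algorithm runs on `weights`.
// Ties are broken by node id via std::pair ordering; any tie-break yields an optimal code.
std::vector<int> huffmanLengths(const std::vector<std::uint64_t>& weights) {
    const std::size_t n = weights.size();
    if (n == 0) return {};
    constexpr std::size_t kNoParent = static_cast<std::size_t>(-1);
    using Item = std::pair<std::uint64_t, std::size_t>;  // (weight, node id)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    std::vector<std::size_t> parent(2 * n - 1, kNoParent);
    for (std::size_t i = 0; i < n; ++i) heap.push({weights[i], i});
    std::size_t next = n;  // ids >= n are internal nodes
    while (heap.size() > 1) {
        const Item a = heap.top(); heap.pop();
        const Item b = heap.top(); heap.pop();
        parent[a.second] = parent[b.second] = next;
        heap.push({a.first + b.first, next++});
    }
    std::vector<int> lengths(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t p = parent[i]; p != kNoParent; p = parent[p]) ++lengths[i];
    return lengths;
}

// Power-of-two weights: the longest code gets weight 1, and each level
// closer to the root doubles the weight.
std::vector<std::uint64_t> weightsFromLengths(const std::vector<int>& lengths) {
    assert(!lengths.empty());
    const int maxLen = *std::max_element(lengths.begin(), lengths.end());
    assert(maxLen < 64);
    std::vector<std::uint64_t> weights;
    for (int len : lengths) weights.push_back(std::uint64_t{1} << (maxLen - len));
    return weights;
}
```

Reading the lengths off the power-of-two table in the answer (weight 1 at length 10, weight 128 at length 3), `weightsFromLengths` maps them back to exactly those weights, and `huffmanLengths` on those weights returns the same lengths.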

Related

How to calculate pow(2,n) when n exceeds 64 in c++?

So, I am new to programming in C++, and I came across this question where I need to calculate pow(2,n)/2 where n > 64.
I tried using unsigned long long int, but its limit is only 2^64. Is there any method to calculate this?
Edit:
1 < n < 10^5
The result of the expression is used in further calculation
The question is asked on an online platform, so I can't use libraries like GMP to handle large numbers.
Question
You are given an array A of size N. An element Ai is said to be charged if its value (Ai) is greater than or equal to Ki, where Ki is the total number of subsets of array A that contain element Ai.
The total charge value of the array is defined as the sum of all charged elements in the array, mod (10^9)+7.
Your task is to output the total charge value of the given array.
An important detail here is that you're not being asked to compute $2^n$ for gigantic $n$. Instead, you're being asked to compute $2^n \bmod (10^9 + 7)$ for large $n$, and that's a different question.
For example, let's suppose you want to compute $2^{70} \bmod (10^9 + 7)$. Notice that $2^{70}$ doesn't fit into a 64-bit machine word. However, $2^{70} = 2^{35} \cdot 2^{35}$, and $2^{35}$ does fit into a 64-bit machine word. Therefore, we could do this calculation to get $2^{70} \bmod (10^9 + 7)$:
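$$
\begin{aligned}
2^{70} \bmod (10^9+7) &= 2^{35} \cdot 2^{35} \bmod (10^9+7) \\
&= \big[(2^{35} \bmod (10^9+7)) \cdot (2^{35} \bmod (10^9+7))\big] \bmod (10^9+7) \\
&= \big[(34359738368 \bmod (10^9+7)) \cdot (34359738368 \bmod (10^9+7))\big] \bmod (10^9+7) \\
&= (359738130 \cdot 359738130) \bmod (10^9+7) \\
&= 129411522175896900 \bmod (10^9+7) \\
&= 270016253
\end{aligned}
$$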
More generally, by using repeated squaring, you can compute $2^n \bmod (10^9 + 7)$ for any value of $n$ in a way that nicely fits into a 64-bit integer, because every intermediate product of two residues is below $(10^9+7)^2 < 2^{63}$.
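A minimal C++ sketch of repeated squaring. `powMod` is a hypothetical name, and the modulus is assumed to be below 2^32 (true for 10^9 + 7) so that 64-bit products cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// (base^exp) mod m by repeated squaring. Requires 1 <= m < 2^32, so every
// intermediate product of two residues is below 2^64 and cannot overflow.
std::uint64_t powMod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    assert(m >= 1 && m <= 0xFFFFFFFFull);
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;  // this bit of exp is set
        base = base * base % m;                   // base^(2^i) -> base^(2^(i+1))
        exp >>= 1;
    }
    return result;
}
```

For the problem in the question, `powMod(2, n - 1, 1000000007)` is the number of subsets containing a given element, reduced modulo 10^9 + 7.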
Hope this helps!
The common approach in serious numerical work is to rewrite the formulas. You store log(x) instead of x, and later, when you do need x, it will typically be in a context where you didn't need all those digits anyway.

Is dilation/erosion with a fixed kernel for a number of iterations similar to dilating/eroding with an equivalent kernel of bigger size?

While going through the OpenCV source code, I noticed that for more than one iteration it just creates a bigger kernel and does a single iteration.
So my question is: if we take a SQUARE structuring element of size 3x3 and dilate/erode with it in three iterations, will it be the same as dilating/eroding once with a 9x9 kernel?
if( iterations > 1 && countNonZero(kernel) == kernel.rows*kernel.cols )
{
    anchor = Point(anchor.x*iterations, anchor.y*iterations);
    kernel = getStructuringElement(MORPH_RECT,
                                   Size(ksize.width + (iterations-1)*(ksize.width-1),
                                        ksize.height + (iterations-1)*(ksize.height-1)),
                                   anchor);
    iterations = 1;
}
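A quick way to check this is to run both versions on a small binary image. The sketch below assumes a binary image stored as nested `std::vector<int>` (not OpenCV's `cv::Mat`), a kernel anchored at its centre, and pixels outside the image treated as 0. `dilate`, `square` and `equivalentSize` are hypothetical helpers; `equivalentSize` mirrors the size expression in the snippet above, which gives 3 + 2 * 2 = 7 for three iterations of a 3x3 kernel.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // binary image: 0 or 1 per pixel

// Binary dilation: every foreground pixel of `img` stamps `kernel` (anchored at
// the kernel's centre) onto the output; stamped pixels outside the image are dropped.
Image dilate(const Image& img, const Image& kernel) {
    assert(!img.empty() && !kernel.empty());
    const int h = static_cast<int>(img.size()), w = static_cast<int>(img[0].size());
    const int kh = static_cast<int>(kernel.size()), kw = static_cast<int>(kernel[0].size());
    Image out(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (!img[y][x]) continue;  // background pixels contribute nothing
            for (int i = 0; i < kh; ++i)
                for (int j = 0; j < kw; ++j) {
                    const int yy = y + i - kh / 2, xx = x + j - kw / 2;
                    if (kernel[i][j] && yy >= 0 && yy < h && xx >= 0 && xx < w) out[yy][xx] = 1;
                }
        }
    return out;
}

// Solid k x k square structuring element.
Image square(int k) { return Image(k, std::vector<int>(k, 1)); }

// Side length used by the OpenCV snippet: k + (iterations - 1) * (k - 1).
int equivalentSize(int k, int iterations) { return k + (iterations - 1) * (k - 1); }
```

With the foreground points kept away from the border, three passes of the 3x3 square match one pass of the 7x7 square, not the 9x9 one.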
Referring to Jordi's answer:
[Quoted] ... Note, however, that this does not hold for all structuring elements...
In fact, it does hold, in the following way (with a different equivalent kernel than the one in Jordi's example):
First, calculate the 5x5 kernel by dilating a 5x5 source image containing a single centre point twice with the 3x3 kernel:
00000          00000          00100
00000   010    00100   010    01110
00100 + 111 -> 01110 + 111 -> 11111   ===> this is the equivalent 5x5 kernel for 2x 3x3 dilation
00000   010    00100   010    01110
00000          00000          00100
Applying the original 3x3 dilation kernel twice is then equivalent to applying this 5x5 dilation kernel once, even on a bigger image. For example:
0000000000                 0000000000   00100
0000000000   010   010     0000000000   01110
0011100000 + 111 + 111 === 0011100000 + 11111
0000001000   010   010     0000001000   01110
0000000000                 0000000000   00100
0000000000                 0000000000
This does not directly answer your question, though. However, I cannot just use a comment, as it is very hard (if not impossible) to format all these equations and explanations there.
In fact, for binary images (images with only the value 0 or 1 in each pixel), it is easy to prove that repeated dilation equals a single dilation with the larger combined kernel:
Let's define the binary operator + to be the dilation operator, where the first operand is the kernel and the second operand is the image to be dilated. So, if we want to dilate image I with kernel K, we write dilated-image = K + I.
Let's define the binary operator U to be the union operator, in other words the binary OR of each pair of corresponding pixels, where the two operands of U must be binary images of the same dimensions. For example, A U B means doing OR on each pair of corresponding pixels of A and B:
A = 0 0 1    B = 0 1 1
    1 0 1        1 1 1
    1 1 0        0 1 0
Then
A U B = 0 1 1
        1 1 1
        1 1 0
We also define U A(i), i=1, ..., n to be A(1) U A(2) U ... U A(n).
Let's define K^n to be the larger dilation kernel obtained by dilating a single-centre-point image n times with kernel K.
Note that any image I can be decomposed into a union of single-point images. For example,
    0 1 0       0 1 0     0 0 0     0 0 0
I = 0 0 0  ===  0 0 0  U  0 0 0  U  0 0 0
    1 0 1       0 0 0     1 0 0     0 0 1
Now it's time to prove it:
For any image I, we define D(i), i = 1, ..., n to be the single point decomposition of I,
and thus I = U D(i), i = 1, ..., n
By definition of the binary dilation, K + I == K + (U D(i)) == U (K+D(i)).
(Remember that dilation is to mask kernel K on each pixel of I, and mark all corresponding 1's).
Now, let's see what is K + (K + I):
K + (K + I) == K + U (K + D(i))
== U(K + (K + D(i))) (Note: this is tricky. see Theorem 1 below)
== U (K^2 + D(i)) (by definition of K^2)
== K^2 + U D(i) (by definition of the dilation)
== K^2 + I (since I = U D(i))
Now we know K + (K + I) == K^2 + I, and it's easy to apply mathematical induction to prove that K + K + K + ... + K + I = K^n + I (note: please apply right association, as I have dropped the parentheses).
Theorem 1: Proof of the deduction from K + U (K + D(i)) to U(K + (K+D(i)))
It suffices to prove that for any two binary images A and B of the same dimensions,
K + (A U B) = (K+A) U (K+B)
It's quite easy to see this: if we decompose images A and B into single points and apply kernel K to each of them, the points common to A and B (i.e. the 1's they share) contribute the same resulting points either way. And by the definition of dilation, we need to union all points contributed by each decomposed image of A and B. Thus Theorem 1 holds.
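Both identities in the proof can also be checked numerically. This is a sketch under the same assumptions as the proof (binary images, kernel anchored at its centre, pixels outside the image treated as 0), with the foreground kept away from the border so nothing is clipped. `dilate`, `unite` and `kernelPower` are hypothetical names for the answer's +, U and K^n. The kernel in the check is deliberately asymmetric, to show that the argument does not rely on the kernel being a square.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // binary image: 0 or 1 per pixel

// The answer's "K + I": every foreground pixel of img stamps k (anchored at
// its centre) onto the output; stamped pixels outside the image are dropped.
Image dilate(const Image& img, const Image& k) {
    assert(!img.empty() && !k.empty());
    const int h = static_cast<int>(img.size()), w = static_cast<int>(img[0].size());
    const int kh = static_cast<int>(k.size()), kw = static_cast<int>(k[0].size());
    Image out(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (!img[y][x]) continue;
            for (int i = 0; i < kh; ++i)
                for (int j = 0; j < kw; ++j) {
                    const int yy = y + i - kh / 2, xx = x + j - kw / 2;
                    if (k[i][j] && yy >= 0 && yy < h && xx >= 0 && xx < w) out[yy][xx] = 1;
                }
        }
    return out;
}

// The answer's "A U B": pixel-wise OR of two images of the same dimensions.
Image unite(const Image& a, const Image& b) {
    Image out = a;
    for (int y = 0; y < static_cast<int>(a.size()); ++y)
        for (int x = 0; x < static_cast<int>(a[y].size()); ++x) out[y][x] = a[y][x] | b[y][x];
    return out;
}

// The answer's K^n: dilate a single centre point n times with K
// (assumes K is square with an odd side length).
Image kernelPower(const Image& k, int n) {
    const int side = n * (static_cast<int>(k.size()) - 1) + 1;
    Image img(side, std::vector<int>(side, 0));
    img[side / 2][side / 2] = 1;
    for (int i = 0; i < n; ++i) img = dilate(img, k);
    return img;
}
```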
=== UPDATE ===
Regarding kid.abr's comment "27 operations compared to 7x7 kernel with 49 operations":
Generally speaking, it is not 27 operations; it depends. For example, take a source image of 100x100 pixels with 20 isolated points (1's) sparsely distributed. Applying a 3x3 solid kernel (i.e. all 1's) 3 times to it requires the following steps for each of the 20 points:
Loop 1: 9 operations, and generate 9 points.
Loop 2: For each of the 9 points generated, it needs 9 operations => 9 x 9 = 81 steps. And it generates 25 points
Loop 3: For each of the 25 points generated, it needs 9 operations => 25 x 9 = 225 steps.
Total: 9 + 81 + 225 = 315 steps.
Note that when we visit a pixel with value 0 in the source image, we don't need to apply the kernel at that point.
In the same case, applying the larger kernel once requires only 7x7 = 49 steps per point.
Yet, if the source image has a large solid area of 1's, the 3-step method wins.
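The step counting above generalizes to a k x k solid kernel applied n times to one isolated point. A small sketch, assuming exactly the answer's cost model (one step per kernel cell per foreground pixel, zero cost for background pixels); `iteratedSteps` and `singlePassSteps` are hypothetical names.

```cpp
#include <cassert>

// Steps needed to dilate ONE isolated foreground point `iterations` times with a
// solid k x k kernel, counted as in the answer: before pass t (t = 0, 1, ...) the
// point has grown into a square of side 1 + t*(k-1), and every one of those
// pixels costs k*k steps.
long long iteratedSteps(long long k, long long iterations) {
    assert(k >= 1 && iterations >= 1);
    long long steps = 0;
    for (long long t = 0; t < iterations; ++t) {
        const long long side = 1 + t * (k - 1);
        steps += side * side * k * k;
    }
    return steps;
}

// Steps for a single pass with the equivalent kernel of side k + (iterations-1)*(k-1).
long long singlePassSteps(long long k, long long iterations) {
    assert(k >= 1 && iterations >= 1);
    const long long side = k + (iterations - 1) * (k - 1);
    return side * side;
}
```

`iteratedSteps(3, 3)` reproduces the 315 steps above and `singlePassSteps(3, 3)` the 49. This model only covers sparse isolated points; for large solid areas, as noted above, the 3-step method wins instead.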
Short answer: with a square structuring element, yes.
Long answer: you need to consider what the erosion/dilation operations do. Dilation, for instance, moves the kernel over the image and sets its centre to 1 whenever any of its grid positions are 1 (I'm assuming binary images, it works the same for greyscale). Increasing the distance between the centre of the structuring element and its edges is then the same as increasing the size of the kernel.
Note, however, that this does not hold for all structuring elements. Suppose you take a structuring element that is just a stretched plus. Dilating twice with the size-3 plus is obviously not the same as dilating once with the size-5 plus:
00000          00000          00100
00000   010    00100   010    01110
00100 + 111 -> 01110 + 111 -> 11111
00000   010    00100   010    01110
00000          00000          00100

00000   00100    00100
00000   00100    00100
00100 + 11111 -> 11111
00000   00100    00100
00000   00100    00100
Of course, this does work if we define the scaled version of plus as a square without its corners (as it usually would be). I think that in general this shortcut works when the kernel of size k+1 is the dilated version of the kernel of size k, but I have no proof for this.
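Jordi's counterexample, and the fact that the diamond shape (the twice-dilated plus) is a kernel that does work, can be reproduced in a few lines of C++. This is a sketch assuming binary images stored as nested `std::vector<int>`, centre-anchored kernels and a single point far from the border; `dilate` and `stretchedPlus` are hypothetical helpers.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // binary image: 0 or 1 per pixel

// Binary dilation with a centre-anchored kernel; pixels outside the image are dropped.
Image dilate(const Image& img, const Image& kernel) {
    assert(!img.empty() && !kernel.empty());
    const int h = static_cast<int>(img.size()), w = static_cast<int>(img[0].size());
    const int kh = static_cast<int>(kernel.size()), kw = static_cast<int>(kernel[0].size());
    Image out(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (!img[y][x]) continue;
            for (int i = 0; i < kh; ++i)
                for (int j = 0; j < kw; ++j) {
                    const int yy = y + i - kh / 2, xx = x + j - kw / 2;
                    if (kernel[i][j] && yy >= 0 && yy < h && xx >= 0 && xx < w) out[yy][xx] = 1;
                }
        }
    return out;
}

// A "stretched plus" of odd side n: only the middle row and middle column are set.
Image stretchedPlus(int n) {
    Image k(n, std::vector<int>(n, 0));
    for (int i = 0; i < n; ++i) k[n / 2][i] = k[i][n / 2] = 1;
    return k;
}
```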
Short answer for a general kernel: Yes for dilation/erosion, but not necessarily with an equivalent kernel.
From wikipedia:
Dilation: (A⊕B)⊕C = A⊕(B⊕C)
Erosion: (A⊖B)⊖C = A⊖(B⊕C)
Where ⊕ denotes the morphological dilation, and ⊖ denotes the morphological erosion.
Basically, performing erosion/dilation on image A with kernel B and then kernel C is equivalent to performing erosion/dilation on image A with the kernel obtained by dilating B with C. This can easily be extended to an arbitrary number of erosions/dilations.
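As a concrete instance of the dilation rule (the erosion identity is not exercised here), dilating with a 1x3 horizontal line and then with a 3x1 vertical line gives the same result as one dilation with the 3x3 square, because the square is one line dilated by the other. A sketch assuming binary images stored as nested `std::vector<int>`, odd kernel sizes and centre anchors; `dilate` and `combineKernels` are hypothetical helpers, the latter computing B ⊕ C by placing B in the middle of a large enough canvas and dilating it with C.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // binary image: 0 or 1 per pixel

// Binary dilation with a centre-anchored kernel; pixels outside the image are dropped.
Image dilate(const Image& img, const Image& kernel) {
    assert(!img.empty() && !kernel.empty());
    const int h = static_cast<int>(img.size()), w = static_cast<int>(img[0].size());
    const int kh = static_cast<int>(kernel.size()), kw = static_cast<int>(kernel[0].size());
    Image out(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (!img[y][x]) continue;
            for (int i = 0; i < kh; ++i)
                for (int j = 0; j < kw; ++j) {
                    const int yy = y + i - kh / 2, xx = x + j - kw / 2;
                    if (kernel[i][j] && yy >= 0 && yy < h && xx >= 0 && xx < w) out[yy][xx] = 1;
                }
        }
    return out;
}

// Equivalent kernel B ⊕ C: centre B on a canvas big enough to hold the result,
// then dilate that canvas with C.
Image combineKernels(const Image& b, const Image& c) {
    const int bh = static_cast<int>(b.size()), bw = static_cast<int>(b[0].size());
    const int ch = static_cast<int>(c.size()), cw = static_cast<int>(c[0].size());
    Image canvas(bh + ch - 1, std::vector<int>(bw + cw - 1, 0));
    for (int i = 0; i < bh; ++i)
        for (int j = 0; j < bw; ++j) canvas[i + (ch - 1) / 2][j + (cw - 1) / 2] = b[i][j];
    return dilate(canvas, c);
}
```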

Decompose integers larger than 100 digits [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
X and Y are integers with more than 100 digits. Find the integer P within the range [X, Y) that guarantees the "best" prime decomposition (i.e. the decomposition with the most unique prime factors).
What I've done is just check the primality and decompose each number in the range and find the number that respects the rule. Is there any other way to do this?
An example on small integers
Edit:
In the above example, 123456 is decomposed to
2^6 * 3^1 * 643^1, that's 2 * 2 * 2 * 2 * 2 * 2 * 3 * 643 but only 3 unique factors.
While the answer, 123690, is decomposed to 6 unique factors
2^1 * 3^1 * 5^1 * 7^1 * 19^1 * 31^1.
The answer to questions about enumerating prime numbers is always to find a way to solve the problem using a sieve; in your case, you are looking for "anti-prime" numbers with a large number of factors, but the principle still applies.
The key to this question is that, for most numbers, most of the factors are small. Thus, my suggestion is to set up a sieve for the range X to Y, containing integers all initialized to zero. Then consider all the primes less than some limit, as large as convenient, but obviously much smaller than X. For each prime, add 1 to each element of the sieve that is a multiple of the prime. After sieving with all the primes, the sieve location with the largest count corresponds to the number between X and Y that has the most distinct prime factors.
Let's consider an example: take the range 100 to 125 and sieve with the primes 2, 3, 5 and 7. You'll get something like this:
100 2 5
101 (101)
102 2 3 (17)
103 (103)
104 2 (13)
105 3 5 7
106 2 (53)
107 (107)
108 2 3
109 (109)
110 2 5 (11)
111 3 (37)
112 2 7
113 (113)
114 2 3 (19)
115 5 (23)
116 2 (29)
117 3 (13)
118 2 (59)
119 7 (17)
120 2 3 5
121 (11)
122 2 (61)
123 3 (41)
124 2 (31)
125 5
So the winners are 105 and 120, each having three prime factors; you'll have to decide for yourself what to do with ties. Note that some factors are missed: 11 divides 110 and 121, 13 divides 104 and 117, 17 divides 102 and 119, 19 divides 114, 23 divides 115, 29 divides 116, 31 divides 124, 37 divides 111, 41 divides 123, 53 divides 106, 59 divides 118, 61 divides 122, and of course 101, 103, 107, 109, and 113 are prime. That means 102, 110 and 114 also tie for the lead, each having three prime factors. So this algorithm isn't perfect, but for X and Y in the hundred-digit range, and assuming you sieve by the primes to a million or ten million, it is unlikely you will substantially miss the answer.
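A minimal C++ sketch of the sieve described above. It uses 64-bit integers instead of 100-digit ones, so it only illustrates the idea; with real 100-digit X and Y, a big-integer type would be needed to find each prime's first multiple in the range. `sieveDistinctFactors` and `bestCandidate` are hypothetical names, the range is taken as inclusive, and ties go to the smallest value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// For every n in [x, y] (inclusive), count how many of `primes` divide n.
std::vector<int> sieveDistinctFactors(std::uint64_t x, std::uint64_t y,
                                      const std::vector<std::uint64_t>& primes) {
    assert(x <= y);
    std::vector<int> count(y - x + 1, 0);
    for (std::uint64_t p : primes) {
        assert(p >= 2);
        const std::uint64_t first = (x + p - 1) / p * p;  // smallest multiple of p >= x
        for (std::uint64_t m = first; m <= y; m += p) ++count[m - x];
    }
    return count;
}

// The value in [x, y] with the largest count; ties go to the smallest value.
std::uint64_t bestCandidate(std::uint64_t x, std::uint64_t y,
                            const std::vector<std::uint64_t>& primes) {
    const std::vector<int> count = sieveDistinctFactors(x, y, primes);
    std::size_t best = 0;
    for (std::size_t i = 1; i < count.size(); ++i)
        if (count[i] > count[best]) best = i;
    return x + best;
}
```

On the 100 to 125 example, sieving with 2, 3, 5 and 7 picks 105; adding the primes up to 61 lets 102 (= 2 * 3 * 17) reach three factors as well, and with smallest-first tie-breaking it wins.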
Good question. Look for it soon at my blog.
Take the list of all primes in order (2, 3, 5, 7, ...) and start multiplying them (2 * 3 * 5 * ...) until you get a number >= X. Call this number P'. If it's <= Y, you're done: P = P'. If not, start computing P'/2, P'/3, P'/5, etc., looking for a number in [X, Y]. If you don't find it and get to a number < X, try multiplying the next prime into P' and continuing. If this still fails, then the range [X, Y] is pretty small, so fall back to the method of factoring all the numbers in that range.
For a small range (Y-X is small), allocate an array of size Y-X+1, zero it, then for all primes <= Y-X, add one to the array elements corresponding to multiples of the prime (a simple sieve). Then search for the element with the largest total. If that total n is such that (Y-X)n >= X, then that is the answer. If not, continue sieving primes larger than Y-X until you get to some prime p such that pn > X for some n in the table...
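The first two steps of the primorial method (build P', then try dividing out one prime at a time) look roughly like this. It is a sketch using 64-bit integers rather than 100-digit numbers, with a hypothetical `primorialCandidate` helper that returns 0 when those two steps fail; the later fallbacks described above are not implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The first 15 primes; their product (about 6.1e17) is the largest primorial
// that fits in 64 bits.
const std::vector<std::uint64_t> kPrimes{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47};

// First two steps of the primorial heuristic for the inclusive range [x, y].
// Returns 0 if neither P' nor any P'/p lands in the range.
std::uint64_t primorialCandidate(std::uint64_t x, std::uint64_t y) {
    assert(x <= y);
    std::uint64_t product = 1;  // P'
    std::size_t used = 0;
    while (product < x && used < kPrimes.size()) product *= kPrimes[used++];
    if (product < x) return 0;         // x is beyond what this 64-bit sketch handles
    if (product <= y) return product;  // P' itself is in range
    for (std::size_t i = 0; i < used; ++i) {
        const std::uint64_t q = product / kPrimes[i];  // P'/2, P'/3, P'/5, ...
        if (q < x) break;                              // later quotients are even smaller
        if (q <= y) return q;
    }
    return 0;
}
```

For the range 100 to 125, P' = 210 is too large and P'/2 = 105 lands in the range.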
One of the two above methods should work, depending on how large the range is...

C++, How to create and draw a Binary Tree then traverse it in Pre-Order

How do I create a Binary Tree and draw it using a Pre-Order Traversal strategy? The root would be the first number going in.
I have a set of numbers: 48 32 51 54 31 24 39. 48 would be the root. How are the child nodes pushed onto the Binary Tree in a Pre-Order traversal?
Imagine the following sub-problem. You have a set of numbers:
N A1...AX B1...BY
You know that N is the root of the corresponding tree. All you need to know is what numbers form the left sub-tree. Obviously the rest of the numbers form the right sub-tree.
If you remember the properties of a binary-search trees, you would know that elements of the left sub-tree have values smaller than the root (while the ones on the right have values bigger).
Therefore, the left sub-tree is the run of numbers immediately after N that are smaller than (or possibly equal to) N. The rest of the numbers are in the right sub-tree.
Recursively solve for
A1...AX
and
B1...BY
For example given:
10 1 5 2 9 3 1 6 4 11 15 12 19 20
You get:
root: 10
left sub-tree: 1 5 2 9 3 1 6 4
right sub-tree: 11 15 12 19 20
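The recursive split can be written directly in C++. This is a sketch assuming the input is a pre-order listing; `Node`, `buildFromPreorder` and `preorder` are hypothetical names. Each call takes the first value as the root, scans the following run of values that are <= the root as the left sub-tree, and recurses on both parts. A pre-order walk of the result gives the original sequence back.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    int value;
    std::unique_ptr<Node> left, right;
    explicit Node(int v) : value(v) {}
};

// Rebuild a tree from the pre-order listing vals[lo, hi).
std::unique_ptr<Node> buildFromPreorder(const std::vector<int>& vals, std::size_t lo, std::size_t hi) {
    assert(hi <= vals.size());
    if (lo >= hi) return nullptr;
    auto root = std::make_unique<Node>(vals[lo]);
    std::size_t split = lo + 1;
    while (split < hi && vals[split] <= vals[lo]) ++split;  // left run: values <= root
    root->left = buildFromPreorder(vals, lo + 1, split);
    root->right = buildFromPreorder(vals, split, hi);
    return root;
}

// Pre-order walk: NODE, LEFT, RIGHT.
void preorder(const Node* n, std::vector<int>& out) {
    if (!n) return;
    out.push_back(n->value);
    preorder(n->left.get(), out);
    preorder(n->right.get(), out);
}
```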
Say you have the following binary tree:
        A
      /   \
     B     C
    / \   / \
   D   E F   G
      / \
     H   I
A Pre-Order Traversal goes NODE, LEFT, RIGHT.
So Pre-Order of this binary tree would be: A B D E H I C F G
For more details on how to implement this in C++: https://stackoverflow.com/a/17658699/445131
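For the lettered tree above, a pre-order walk can also be done without recursion by keeping an explicit stack and pushing the right child before the left one. This is a sketch with a hypothetical minimal `Node` struct (raw pointers, no ownership), assuming H and I are the children of E, as the listed order implies.

```cpp
#include <cassert>
#include <stack>
#include <string>

struct Node {
    char label;
    Node* left;
    Node* right;
};

// Iterative pre-order (NODE, LEFT, RIGHT) using an explicit stack.
std::string preorder(const Node* root) {
    std::string out;
    std::stack<const Node*> pending;
    if (root) pending.push(root);
    while (!pending.empty()) {
        const Node* n = pending.top();
        pending.pop();
        out += n->label;
        // Push right first so that the left subtree is visited first.
        if (n->right) pending.push(n->right);
        if (n->left) pending.push(n->left);
    }
    return out;
}
```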

Find rank of a number on basis of number of 1's

Let f(k) = y where k is the y-th number in the increasing sequence of non-negative integers with the same number of ones in its binary representation as k, e.g. f(0) = 1, f(1) = 1, f(2) = 2, f(3) = 1, f(4) = 3, f(5) = 2, f(6) = 3 and so on. Given k >= 0, compute f(k).
Many of us have seen this question.
One solution to this problem is to categorise numbers by their number of 1's and then find the rank. I did find some patterns this way, but it would be a lengthy process. Can anyone suggest a better solution?
This is a counting problem. I think that if you approach it with this in mind, you can do much better than literally enumerating values and checking how many bits they have.
Consider the number 17. The binary representation is 10001. The number of 1s is 2. We can get smaller numbers with two 1s by (in this case) re-distributing the 1s to any of the four low-order bits. 4 choose 2 is 6, so 17 should be the 7th number with 2 ones in the binary representation. We can check this...
0 00000 -
1 00001 -
2 00010 -
3 00011 1
4 00100 -
5 00101 2
6 00110 3
7 00111 -
8 01000 -
9 01001 4
10 01010 5
11 01011 -
12 01100 6
13 01101 -
14 01110 -
15 01111 -
16 10000 -
17 10001 7
And we were right. Generalize that idea and you should get an efficient function for which you simply compute the rank of k.
EDIT: Hint for generalization
17 is special in that if you don't consider the high-order bit, the number has rank 1; that is, f(z) = 1 where z is everything except the high-order bit. For numbers where this is not the case, how can you account for the fact that you can get smaller numbers without moving the high-order bit?
f(k) is the number of integers less than or equal to k that have the same number of ones in their binary representation as k.
Suppose k needs m bits, that is, k = 2^(m-1) + a, where a < 2^(m-1). The number of integers less than 2^(m-1) that have the same number of ones as k is choose(m-1, bitcount(k)), since you can freely redistribute the ones among the m-1 least significant bits.
Integers from 2^(m-1) up to k that have the same number of ones as k share its most significant bit (which is 1), and their remaining bits form a number no larger than a with the same number of ones as a, so there are f(k - 2^(m-1)) of them. This implies f(k) = choose(m-1, bitcount(k)) + f(k - 2^(m-1)), with f(0) = 1.
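The recurrence translates into a short loop: strip the highest set bit, add the binomial count for that position, and stop at f(0) = 1. This is a C++ sketch assuming 64-bit unsigned inputs; `bitCount`, `choose` and `rankBySetBits` are hypothetical names, and the binomial coefficients come from a precomputed Pascal's triangle so that nothing overflows for n <= 64.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Number of ones in the binary representation of x.
int bitCount(std::uint64_t x) {
    int count = 0;
    for (; x != 0; x &= x - 1) ++count;  // clear the lowest set bit
    return count;
}

// Binomial coefficient C(n, r) for 0 <= n <= 64, from Pascal's triangle.
std::uint64_t choose(int n, int r) {
    using Table = std::array<std::array<std::uint64_t, 65>, 65>;
    static const Table table = []() -> Table {
        Table t{};
        for (int i = 0; i <= 64; ++i) {
            t[i][0] = 1;
            for (int j = 1; j <= i; ++j) t[i][j] = t[i - 1][j - 1] + t[i - 1][j];
        }
        return t;
    }();
    assert(n >= 0 && n <= 64);
    if (r < 0 || r > n) return 0;
    return table[n][r];
}

// f(k) via f(k) = C(m-1, bitcount(k)) + f(k - 2^(m-1)) and f(0) = 1.
std::uint64_t rankBySetBits(std::uint64_t k) {
    std::uint64_t rank = 1;  // contribution of the base case f(0) = 1
    while (k != 0) {
        int m = 0;  // number of bits needed to write k
        for (std::uint64_t t = k; t != 0; t >>= 1) ++m;
        rank += choose(m - 1, bitCount(k));  // same bit count, below 2^(m-1)
        k -= std::uint64_t{1} << (m - 1);     // continue with a = k - 2^(m-1)
    }
    return rank;
}
```

For k = 17 the loop adds choose(4, 2) = 6 and then choose(0, 1) = 0 on top of the base case, giving 7 as in the table above.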
See "Efficiently Enumerating the Subsets of a Set". Look at Table 3, the "Banker's sequence". This is a method to generate exactly the sequence you need (if you reverse the bit order). Just run K iterations for the word with K bits. There is code to generate it included in the paper.