tANS Mininum Size of State Set to Safely Encode a Symbol Frame - compression

Hi I'm trying to implement tANS in a compute shader, but I am confused about the size of the state set. Also apologies but my account is too new to embed pictures of latex formatted equations.
Imagine we have a symbol frame S comprised of symbols s₁ to sₙ:
S = {s₁, s₂, s₁, s₂, ..., sₙ}
|S| = 2ᵏ
and the probability of each symbol is
pₛₙ = frequency(sₙ) / |S|
∑ pₛ₁ + pₛ₂ + ... pₛₙ = 1
According to Jarek Duda's slides (which can be found here) the first step in constructing the encoding function is to calculate the number of states L:
L = |S|
so that we can create a set of states
𝕃 = {L, ..., 2L - 1}
from which we can construct the encoding table from. In our example, this is simple L = |S| = 2^k. However, we don't want L to necessarily equal |S| because |S| could be enormous, and constructing an encoding table corresponding to size |S| would be counterproductive to compression. Jarek's solution is to create a quantization function so that we can choose an
L : L < |S|
which approximates the symbol probabilities
Lₛ / L ≈ pₛₙ
However as L decreases, the quality of the compression decreases, so I have two questions:
How small can we make L while still achieving compression?
What is a "good" way of determining the size of L for a given |S|?
In Jarek's ANS toolkit he uses the depth of a Huffman tree created from S to get the size of L, but this seems like a lot of work when we already know the upper bound of L (|S|; as I understand it when L = |S| we are at the Shannon entropy; thus making L > |S| would not increase compression). Instead it seems like it would be faster to choose an L that is both less than |S| and above some minimum L. A "good" size of L therefore would achieve some amount of compression, but more importantly would be easy to calculate. However we would need to determine the minimum L. Based on the pictures of sample ANS tables it seems like the minimum size of L could be the frequency of the most probable symbol, but I don't know enough about ANS to confirm this.

After mulling it over for awhile, both questions have very simple answers. The smallest L that still achieves lossless compression is L = |A|, where A is the alphabet of symbols to be encoded(I apologize, the lossless criterion should have been included in the original question). If L < |A| then we are pigeonholing symbols, thus losing information. When L = |A| what we essentially have is a fixed length variable code, where each symbol has an equal probability weighting in our encoding table. The answer to the second part is even more simple now that we know the answer to the first question. L can be pretty much whatever you want so long as its greater than the size of the alphabet to be encoded. Usually we want L to be a power of two for computational efficiency and then we want L to be greater than |A| to achieve better compression, so a very common L size is 2 times the greatest power of two equal to or greater than the size of the alphabet. This can easily be found by something like this:
int alphabetSize = SizeOfAlphabet();
int L = pow(2, ceil(log(alphabetSize, 2)) + 1);

Related

Adding values from multiple .rrd file

Problem =====>
Basically there are three .rrd which are generated for three departments.
From that we fetch three values (MIN, MAX, CURRENT) and print ins 3x3 format. There is a python script which does that.
eg -
Dept1: Min=10 Max=20 Cur=15
Dept2: Min=0 Max=10 Cur=5
Dept3: Min=10 Max=30 Cur=25
Now I want to add the values together (Min, Max, Cur) and print in one line.
eg -
Dept: Min=20 Max=60 Cur=45
Issue I am facing =====>
No matter what CDEF i write, I am breaking the graph. :(
This is the part I hate as i do not get any error message.
As far as I understand(please correct me if i am wrong) I definitely cannot store the value anywhere in my program as a graph is returned.
What would be a proper way to add the values in this condition.
Please let me know if my describing the problem is lacking more detail.
You can do this with a VDEF over a CDEF'd sum.
DEF:a=dept1.rrd:ds0:AVERAGE
DEF:b=dept2.rrd:ds0:AVERAGE
DEF:maxa=dept1.rrd:ds0:MAXIMUM
DEF:maxb=dept2.rrd:ds0:MAXIMUM
CDEF:maxall=maxa,maxb,+
CDEF:all=a,b,+
VDEF:maxalltime=maxall,MAXIMUM
VDEF:alltimeavg=all,AVERAGE
PRINT:maxalltime:Max=%f
PRINT:alltimeavg:Avg=%f
LINE:all#ff0000:AllDepartments
However, you should note that, apart form at the highest granularity, the Min and Max totals will be wrong! This is because max(a+b) != max(a) + max(b). If you dont calculate the min/max aggregate at time of storage, the granularity will be gone at time of display.
For example, if a = (1, 2, 3) and b = (3, 2, 1), then max(a) + max(b) = 6; however the maximum at any point in time is in fact 4. The same issue applies to using min(a) + min(b).

Understanding; for i in range, x,y = [int(i) in i.... Python3

I am stuck trying to understand the mechanics behind this combined input(), loop & list-comprehension; from Codegaming's "MarsRover" puzzle. The sequence creates a 2D line, representing a cut-out of the topology in an area 6999 units wide (x-axis).
Understandably, my original question was put on hold, being to broad. I am trying to shorten and to narrow the question: I understand list comprehension basically, and I'm ok experienced with for-loops.
Like list comp:
land_y = [int(j) for j in range(k)]
if k = 5; land_y = [0, 1, 2, 3, 4]
For-loops:
for i in the range(4)
a = 2*i = 6
ab.append(a) = 0,2,4,6
But here, it just doesn't add up (in my head):
6999 points are created along the x-axis, from 6 points(x,y).
surface_n = int(input())
for i in range(surface_n):
land_x, land_y = [int(j) for j in input().split()]
I do not understand where "i" makes a difference.
I do not understand how the data "packaged" inside the input. I have split strings of integers on another task in almost exactly the same code, and I could easily create new lists and work with them - as I understood the structure I was unpacking (pretty simple being one datatype with one purpose).
The fact that this line follows within the "game"-while-loop confuses me more, as it updates dynamically as the state of the game changes.
x, y, h_speed, v_speed, fuel, rotate, power = [int(i) for i in input().split()]
Maybe someone could give an example of how this could be written in javascript, haskell or c#? No need to be syntax-correct, I'm just struggling with the concept here.
input() takes a line from the standard input. So it’s essentially reading some value into your program.
The way that code works, it makes very hard assumptions on the format of the input strings. To the point that it gets confusing (and difficult to verify).
Let’s take a look at this line first:
land_x, land_y = [int(j) for j in input().split()]
You said you already understand list comprehension, so this is essentially equal to this:
inputs = input().split()
result = []
for j in inputs:
results.append(int(j))
land_x, land_y = results
This is a combination of multiple things that happen here. input() reads a line of text into the program, split() separates that string into multiple parts, splitting it whenever a white space character appears. So a string 'foo bar' is split into ['foo', 'bar'].
Then, the list comprehension happens, which essentially just iterates over every item in that splitted input string and converts each item into an integer using int(j). So an input of '2 3' is first converted into ['2', '3'] (list of strings), and then converted into [2, 3] (list of ints).
Finally, the line land_x, land_y = results is evaluated. This is called iterable unpacking and essentially assumes that the iterable on the right has exactly as many items as there are variables on the left. If that’s the case then it’s just a nice way to write the following:
land_x = results[0]
land_y = results[1]
So basically, the whole list comprehension assumes that there is an input of two numbers separated by whitespace, it then splits those into separate strings, converts those into numbers and then assigns each number to a separate variable land_x and land_y.
Exactly the same thing happens again later with the following line:
x, y, h_speed, v_speed, fuel, rotate, power = [int(i) for i in input().split()]
It’s just that this time, it expects the input to have seven numbers instead of just two. But then it’s exactly the same.

Time based rotation

I'm trying to figure out the best way of doing the following:
I have a list of values: L
I'd like to pick a subset of this list, of size N, and get a different subset (if the list has enough members) every X minutes.
I'd like the values to be picked sequentially, or randomly, as long as all the values get used.
For example, I have a list: [google.com, yahoo.com, gmail.com]
I'd like to pick X (2 for this example) values and rotate those values every Y(60 for now) minutes:
minute 0-59: [google.com, yahoo.com]
minute 60-119: [gmail.com, google.com
minute 120-179: [google.com, yahoo.com]
etc.
Random picking is also fine, i.e:
minute 0-59: [google.com, gmail.com]
minute 60-119: [yahoo.com, google.com]
Note: The time epoch should be 0 when the user sets the rotation up, i.e, the 0 point can be at any point in time.
Finally: I'd prefer not to store a set of "used" values or anything like that, if possible. i.e, I'd like this to be as simple as possible.
Random picking is actually preferred to sequential, but either is fine.
What's the best way to go about this? Python/Pseudo-code or C/C++ is fine.
Thank you!
You can use the itertools standard module to help:
import itertools
import random
import time
a = ["google.com", "yahoo.com", "gmail.com"]
combs = list(itertools.combinations(a, 2))
random.shuffle(combs)
for c in combs:
print(c)
time.sleep(3600)
EDIT: Based on your clarification in the comments, the following suggestion might help.
What you're looking for is a maximal-length sequence of integers within the range [0, N). You can generate this in Python using something like:
def modseq(n, p):
r = 0
for i in range(n):
r = (r + p) % n
yield r
Given an integer n and a prime number p (which is not a factor of n, making p greater than n guarantees this), you will get a sequence of all the integers from 0 to n-1:
>>> list(modseq(10, 13))
[3, 6, 9, 2, 5, 8, 1, 4, 7, 0]
From there, you can filter this list to include only the integers that contain the desired number of 1 bits set (see Best algorithm to count the number of set bits in a 32-bit integer? for suggestions). Then choose the elements from your set based on which bits are set to 1. In your case, you would use pass n as 2N if N is the number of elements in your set.
This sequence is deterministic given a time T (from which you can find the position in the sequence), a number N of elements, and a prime P.

How do I count the number of elements in a list?

I need to write a small Prolog program to count the number of occurrence of each element in a list.
numberOfRepetition(input, result)
For example:
numberOfRepetition([a,b,a,d,c,a,b], X)
can be satisfied with X=[a/3,b/2,d/1,c/1] because a occurs three times, b occurs 2 times and c and d one time.
I don't want to give you the answer, so I gonna help you with it:
% Find the occurrences of given element in list
%
% occurrences([a,b,c,a],a,X).
% -> X = 2.
occurrences([],_,0).
occurrences([X|Y],X,N):- occurrences(Y,X,W),N is W + 1.
occurrences([X|Y],Z,N):- occurrences(Y,Z,N),X\=Z.
Depending on your effort and feedback, I can help you to get your answer.
Check out my answer to the related question "How to count number of element occurrences in a list in Prolog"!
In that answer I present the predicate list_counts/2, which should fot your needs.
Sample use:
:- list_counts([a,b,a,d,c,a,b],Ys).
Ys = [a-3, b-2, d-1, c-1].
Note that that this predicate uses a slightly different representation for key-value pairs expressing multiplicity: principal functor (-)/2 instead of (/)/2.
If possible, switch to the representation using (-)/2 for better interoperability with standard library predicates (like keysort/2).
If you wish to find element with max occurrences:
occurrences([],_,0).
occurrences([X|Y],X,N):- occurrences(Y,X,W),N is W + 1.
occurrences([X|Y],Z,N):- occurrences(Y,Z,N),X\=Z.
**make_list(Max):-
findall((Num,Elem),occurrences([d,d,d,a,a,b,c,d,e],Elem,Num),L),
sort(L,Sorted),
last(Sorted,(_,Max)).**

Sequence Compression?

lately i've faced a problem which gets me so confused ,
the problem is :
i want to compress a sequence so no information is lost , for example :
a,a,a,b --> a,b
a,b,a,a,c --> a,b,a,a,c (it can't be compressed to a,b,a,c because in this way we lose a,a)
Is there any algorithm to do such a thing ? what is name of this problem ? is it compression ? or anything else ?
I would really appreciate any help
Thanks in advance
Every algorithm which is able to transform data in such a way that is takes up less memory is called compression. May it be lossless or lossy.
e.g. (compressed form for "example given" :-) )
The following is imho the simples form, called run length encoding, short RLE:
a,a,a,b,c -> 3a,1b,1c
As you can see all subsequent characters which are identical are compressed into one.
You can also search for subsequent patterns which is much more difficult:
a,b,a,b,a,c --> 2(a,b),1(a),1(c)
There are lots of literature and web sources about compression algorithms, you should use them to get a deeper view.
Another good algorithm is Lempel–Ziv–Welch
I found marvellous this simple Javascript LZW function, from the magicians at 140 bytes of javascript :
function (
a // String to compress and placeholder for 'wc'.
){
for (
var b = a + "Ā", // Append first "illegal" character (charCode === 256).
c = [], // dictionary
d = 0, // dictionary size
e = d, // iterator
f = c, // w
g = c, // result
h; // c
h = b.charAt(e++);
)
c[h] = h.charCodeAt(), // Fill in the dictionary ...
f = 1 + c[a = f + h] ? a : (g[d++] = c[f], c[a] = d + 255, h); // ... and use it to compress data.
return g // Array of compressed data.
}
RLE
Yep, compression. A simple algorithm would be runlength encoding. There also information theory, which is the basis for compression algorithms.
Information theory: More common inputs should be shorter, thus making the sentence length shorter.
So, if you're encoding binary, where the sequence 0101 is very commmon (about 25% of the input), then a simple compression would be:
0101 = 0
anything else = 1[original 4 bits]
So the input: 0101 1100 0101 0101 1010 0101 1111 0101
Would be compressed to: 0 11100 0 0 11010 0 11111 0
Thats a compression of 32 bits -> 20 bits.
An important lesson: The compression algorithm choice is entirely dependent on the input. The wrong algorithm and you will likely make the data longer.
Unless you have to code some solution yourself, you could use some ZIP compression library for the programming language you're using.
And yes, this is data compression.
We can use the LZW compression algorithm to compress text files efficiently and quickly by making use of hash tables.