Implementation of Ball-Larus algorithm - llvm

Is an implementation of the Ball-Larus "efficient path profiling" algorithm available anywhere?
[An implementation in LLVM would be especially helpful.]
Here is a CiteSeer link to the original paper (BL96).

There already is an implementation of Ball-Larus path profiling in LLVM. Additional patches in this area are being solicited.

All that I was able to find is some pseudo-code:
for all nodes n in reverse topological order do
    if n is a leaf then
        NumPaths(n) ← 1
    else
        NumPaths(n) ← 0
        for all edges e of the form n → m do
            Val(e) ← NumPaths(n)
            NumPaths(n) ← NumPaths(n) + NumPaths(m)
        end for
    end if
end for
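For what it's worth, the pseudo-code above can be turned into a small runnable sketch. This assumes the CFG is a DAG given as an adjacency map and that the caller supplies a reverse topological order; the names (`number_paths`, `succs`) are mine, not from the paper or from LLVM.

```ocaml
module IntMap = Map.Make (Int)

(* [succs] maps each node to its successor list; [rev_topo] lists the
   nodes in reverse topological order (leaves first).  Returns the
   per-node path counts NumPaths and a value Val(e) for each edge
   (n, m), as an association list. *)
let number_paths (succs : int list IntMap.t) (rev_topo : int list) =
  List.fold_left
    (fun (num_paths, vals) n ->
      match IntMap.find_opt n succs with
      | None | Some [] ->
          (* leaf: exactly one path (the empty one) *)
          (IntMap.add n 1 num_paths, vals)
      | Some ms ->
          let np, vals =
            List.fold_left
              (fun (np, vals) m ->
                (* Val(e) <- NumPaths(n); NumPaths(n) += NumPaths(m) *)
                let vals = ((n, m), np) :: vals in
                (np + IntMap.find m num_paths, vals))
              (0, vals) ms
          in
          (IntMap.add n np num_paths, vals))
    (IntMap.empty, []) rev_topo
```

On a diamond CFG (1 → 2 → 4, 1 → 3 → 4) this gives NumPaths(1) = 2, with the two root-to-leaf paths distinguished by the edge values 0 and 1, as in the paper.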

Related

tANS: Minimum Size of the State Set to Safely Encode a Symbol Frame

Hi, I'm trying to implement tANS in a compute shader, but I am confused about the size of the state set. (Apologies, but my account is too new to embed pictures of LaTeX-formatted equations.)
Imagine we have a symbol frame S comprised of symbols s₁ to sₙ:
S = {s₁, s₂, s₁, s₂, ..., sₙ}
|S| = 2ᵏ
and the probability of each symbol is
pₛₙ = frequency(sₙ) / |S|
pₛ₁ + pₛ₂ + ... + pₛₙ = 1
According to Jarek Duda's slides (which can be found here) the first step in constructing the encoding function is to calculate the number of states L:
L = |S|
so that we can create a set of states
𝕃 = {L, ..., 2L - 1}
from which we can construct the encoding table. In our example this is simply L = |S| = 2ᵏ. However, we don't necessarily want L to equal |S|, because |S| could be enormous, and constructing an encoding table of size |S| would be counterproductive to compression. Jarek's solution is to use a quantization function so that we can choose an
L : L < |S|
which approximates the symbol probabilities
Lₛ / L ≈ pₛₙ
However as L decreases, the quality of the compression decreases, so I have two questions:
How small can we make L while still achieving compression?
What is a "good" way of determining the size of L for a given |S|?
In Jarek's ANS toolkit he uses the depth of a Huffman tree built from S to get the size of L, but this seems like a lot of work when we already know the upper bound of L (namely |S|; as I understand it, when L = |S| we are at the Shannon entropy, so making L > |S| would not increase compression). Instead it seems like it would be faster to choose an L that is both less than |S| and above some minimum. A "good" size of L would therefore achieve some amount of compression, but more importantly would be easy to calculate; however, we would need to determine the minimum L. Based on pictures of sample ANS tables it seems like the minimum size of L could be the frequency of the most probable symbol, but I don't know enough about ANS to confirm this.
After mulling it over for a while, both questions have very simple answers. The smallest L that still achieves lossless compression is L = |A|, where A is the alphabet of symbols to be encoded (I apologize, the lossless criterion should have been included in the original question). If L < |A| then we are pigeonholing symbols, thus losing information. When L = |A|, what we essentially have is a fixed-length code, where each symbol has an equal probability weighting in our encoding table.
The answer to the second question is even simpler now that we know the answer to the first. L can be pretty much whatever you want, so long as it's greater than the size of the alphabet to be encoded. Usually we want L to be a power of two for computational efficiency, and we want L to be greater than |A| to achieve better compression, so a very common choice is twice the smallest power of two greater than or equal to the size of the alphabet. This can easily be found by something like this:
int alphabetSize = SizeOfAlphabet();
/* round |A| up to a power of two, then double it; needs <math.h> */
int L = 1 << ((int)ceil(log2((double)alphabetSize)) + 1);
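The other step the question mentions, picking integer weights Lₛ with Lₛ/L ≈ pₛ, can be sketched as well. The round-then-fix-up scheme below is my own illustration of the quantization idea, not Duda's exact method, and `quantize` is my own name.

```ocaml
(* Given integer symbol frequencies and a chosen table size [l]
   (typically a power of two, l >= alphabet size), pick integer
   weights l_s with l_s / l ~ p_s and sum l_s = l. *)
let quantize freqs l =
  let total = List.fold_left ( + ) 0 freqs in
  (* round each ideal weight, but give every symbol at least one slot
     so no symbol becomes unencodable *)
  let ls =
    List.map
      (fun f ->
        let ideal = float_of_int (f * l) /. float_of_int total in
        max 1 (int_of_float (Float.round ideal)))
      freqs
  in
  (* repair any rounding drift on the heaviest symbol so the weights
     sum to exactly l *)
  let drift = l - List.fold_left ( + ) 0 ls in
  let heaviest = List.fold_left max 0 ls in
  let fixed = ref false in
  List.map
    (fun w ->
      if w = heaviest && not !fixed then (fixed := true; w + drift)
      else w)
    ls
```

For example, frequencies [3; 1] quantized to a table of size 8 give weights [6; 2], preserving the 3:1 ratio.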

Implementing an Automaton in Prolog

I'm new to Prolog. I managed to learn C and Java relatively quickly, but Prolog is giving me a lot of trouble. My trouble is understanding lists and writing predicates. For example, we have this automaton:
I can do this task in C and Java, no problems. But the course wants Prolog. With my current knowledge I could do things like this:
% 1. Check whether all integers of the list are < 10.
less_than_10([]).
less_than_10([Head|Tail]) :-
Head < 10,
less_than_10(Tail).
Just so you know where my knowledge is at. Very basic. I did read the list chapter in Learn Prolog Now but it's still confusing me. They gave us a hint:
Every node should be presented like:
delta(1, d, 2)
% or
alpha(2, a, 2)
They also told us to pass the list in question to a predicate that returns true if the list fits the automaton and false if not:
accept([d,a,b,a,b,b,b,c,d,c]).
The output is true.
Where to go from here? I'm guessing the first step is to check if the Head of the list is 1. How do I do that? Also, should I add every node as fact into the knowledge base?
So that's pretty easy. Super-direct, much more than if you were using C or Java.
Let's write an interpreter for this graph that:
Is given a list of named transitions ;
Walks the transitions using the given graph along a path through that graph ;
Accepts (Succeeds) the list if we end up at a final state ;
Rejects (Fails) the list if we do not ;
And.. let's say throws an exception if the list cannot be generated by the given graph.
Prolog gives us nondeterminism for free in case there are several paths. Which is nice.
We do not have a class to describe the automaton; in a sense, the Prolog program is the automaton. We just have a set of predicates which describe it via inductive definitions. Actually, if you slap a module definition around the source below, you do have such an object.
First describe the graph. This is just a set of Prolog facts.
As required, we give the transitions (labeled by atoms) between nodes (labeled by integers), plus we indicate which are the start and end nodes. There is no need to list the nodes or edges themselves.
delta(1,d,2).
delta(2,a,2).
delta(2,b,2).
delta(2,d,4).
delta(2,e,5).
delta(2,c,3).
delta(3,d,6).
delta(6,c,5).
start(1).
end(4).
end(5).
A simple database. This is just one possible representation of course.
And now for the graph walker. We could use Definite Clause Grammars here because we are handling a list, but let's not.
First, a predicate which "accepts" or "rejects" a list of transitions.
It looks like:
% accepts(+Transitions)
It starts in a start state, then "walks" by removing transitions off the list until the list is empty. Then it checks whether it is at an end state.
accepts(Ts) :- % accept the list of transitions if...
start(S), % you can accept the list starting
accepts_from(S,Ts). % from a start state
accepts_from(S,[T|Ts]) :- % accepts the transitions when at S if...
delta(S,T,NextS), % there is a transition S->NextS via T
accepts_from(NextS,Ts). % and you can accept the remaining Ts from NextS. (inductive definition)
accepts_from(S,[]) :- % if there is no transition left, we accept if...
end(S). % we are a final state
Ah, we wanted to throw if the path was impossible for that graph. So a little modification:
accepts(Ts) :- % accept the list of transitions if...
start(S), % you can accept the list starting
accepts_from(S,Ts). % from a start state
accepts_from(S,[T|Ts]) :- % accepts the transitions when at S if...
delta(S,T,NextS), % there is a transition S->NextS via T
accepts_from(NextS,Ts). % and you can accept the remaining Ts from NextS.
accepts_from(S,[T|Ts]) :- % accepts the transitions when at S if...
\+ delta(S,T,NextS), % there is NO transition S->NextS via T
format(string(Txt),"No transition at ~q to reach ~q",[S,[T|Ts]]),
throw(Txt).
accepts_from(S,[]) :- % if there is no transition left, we accept if...
end(S). % we are a final state
And so:
?- accepts([d,a,b,a,b,b,b,c,d,c]).
true ; % yup, accepts but maybe there are other paths?
false. % nope
?- accepts([d,a,a,a,a,e]).
true ;
false.
?- accepts([d,a,a,a,a]).
false.
?- accepts([d,c,e,a]).
ERROR: Unhandled exception: "No transition at 3 to reach [e,a]"
The above code should also be able to find acceptable paths through the graph. But it does not:
?- accepts(T).
... infinite loop
This is not nice.
The primary reason is that accepts_from/2 will immediately generate an infinite path looping at state 2 via transitions a and b. So one needs to add a "depth limiter" (the keyword is "iterative deepening").
The second reason would be that the test \+ delta(S,T,NextS) would succeed at node 4 for example (because there is nowhere to go from that node) and cause an exception before trying out the possibility of going nowhere (the last clause). So when generating, throwing is a hindrance, one just wants to reject.
Addendum: Also generate
The following only accepts/rejects and does not throw, but can also generate.
:- use_module(library(clpfd)).
accepts(Ts,L) :- % Accept the list of transitions Ts of length L if
start(S), % ...starting from a start state S
accepts_from(S,Ts,L). % ...you can accept the Ts of length L.
accepts_from(S,[T|Ts],L) :- % Accept the transitions [T|Ts] when at S if
(nonvar(L)
-> L >= 1
; true), % L (if it is bound) is at least 1 (this can be replaced by L #> 0)
delta(S,T,SN), % ...and there is a transition S->SN via T
Lm #= L-1, % ...and the new length is **constrained to be** 1 less than the previous length
accepts_from(SN,Ts,Lm). % ...and you can accept the remaining Ts of length Lm from SN.
accepts_from(S,[],0) :- % If there is no transition left, length L must be 0 and we accept if
end(S). % ...we are a final state.
delta(1,d,2).
delta(2,a,2).
delta(2,b,2).
delta(2,d,4).
delta(2,e,5).
delta(2,c,3).
delta(3,d,6).
delta(6,c,5).
start(1).
end(4).
end(5).
generate :-
between(0,7,L),
findall(Ts,accepts(Ts,L),Bag),
length(Bag,BagLength),
format("Found ~d paths of length ~d through the graph\n",[BagLength,L]),
maplist({L}/[Ts]>>format("~d : ~q\n",[L,Ts]),Bag).
And so:
?- accepts([d,a,b,a,b,b,b,c,d,c],_).
true ;
false.
?- accepts([d,a,a,a,a],_).
false.
?- accepts([d,c,e,a],_).
false.
?- generate.
Found 0 paths of length 0 through the graph
true ;
Found 0 paths of length 1 through the graph
true ;
Found 2 paths of length 2 through the graph
2 : [d,d]
2 : [d,e]
true ;
Found 4 paths of length 3 through the graph
3 : [d,a,d]
3 : [d,a,e]
3 : [d,b,d]
3 : [d,b,e]
true ;
Found 9 paths of length 4 through the graph
4 : [d,a,a,d]
4 : [d,a,a,e]
4 : [d,a,b,d]
4 : [d,a,b,e]
4 : [d,b,a,d]
4 : [d,b,a,e]
4 : [d,b,b,d]
4 : [d,b,b,e]
4 : [d,c,d,c]
true
Here's my answer. I sought to completely separate the data from the logic.
There are rules to infer the possible paths, start and end nodes.
The edge/2 predicate stands for either an alpha or a delta line.
The path (DCG) predicate describes a list of edges that ends with an end node.
The start and end nodes are inferred using the start_node/1 and end_node/1 predicates.
Finally, phrase/3 is used to describe the lists of transitions that are valid paths through the automaton.
delta(1, d, 2).
delta(2, d, 4).
delta(2, e, 5).
delta(2, c, 3).
delta(3, d, 6).
delta(6, c, 5).
alpha(2, a, 2).
alpha(2, b, 2).
edge(Node, Node, Via) :-
alpha(Node, Via, Node).
edge(From, To, Via) :-
delta(From, Via, To).
path(From, To) -->
{ end_node(To),
dif(From, To),
edge(From, To, Via)
},
[Via].
path(From, To) -->
{edge(From, Mid, Via)},
[Via],
path(Mid, To).
start_node(Node) :-
node_aux(start_node_aux, Node).
end_node(Node) :-
node_aux(end_node_aux, Node).
start_node_aux(Node) :-
edge(Node, _, _),
\+ edge(_, Node, _).
node_aux(Goal, Node) :-
setof(Node, call(Goal, Node), Nodes),
member(Node, Nodes).
end_node_aux(Node) :-
edge(_, Node, _),
\+ edge(Node, _, _).
automaton -->
{start_node(Start)},
path(Start, _End).
accept(Steps) :-
length(Steps, _N),
phrase(automaton, Steps).
I suspect that David did not use Definite Clause Grammars because you should be familiar with the basics before learning DCGs.

Finding the longest run of positive integers in OCaml

I'm trying an OCaml question: iterate through a list and find the longest run of positive or negative integers. My thinking so far is that you have to use List.fold_left and somehow add 1 to the accumulator each time the next sign is the same as the current sign. However, I'm a bit stuck on how to save that value. Any help would be appreciated.
I suspect you're being downvoted because this is the kind of basic question that's probably best answered by looking at an introduction to OCaml or functional programming in general.
The essential idea of folds in general and List.fold_left in particular is to maintain some state while traversing a collection (or a list in particular). When you say you want to "save" a value, the natural answer is that the value would be part of the state that you maintain while traversing the list.
The template for calling List.fold_left looks like this:
let final_state =
List.fold_left update_function initial_state list
The update function takes the current state and the next element of the list, and returns the value of the next state. So it looks like this:
let update_function old_state list_element =
let new_state =
(* compute based on old state and element *)
in
new_state
So the explicit answer to your question is that your update function (the function that you "fold" over the list) would save a value by returning it as part of the new state.
Here's some code that returns the largest non-negative integer that it sees in a list:
let largest_int list =
let update largest_so_far element =
max largest_so_far element
in
List.fold_left update 0 list
This code "saves" the largest int seen so far by returning it as the value of the update function. (Note that it returns a value of 0 for an empty list.)
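Applying the same pattern to the original question: the state can be a triple of (previous element, current run length, best run length so far), where a "run" is a maximal block of consecutive elements with the same sign. This is a sketch under my reading of the question (zero counted as non-negative), and `longest_run` is my own name.

```ocaml
(* Length of the longest run of same-sign elements, via fold_left. *)
let longest_run list =
  let same_sign a b = (a >= 0) = (b >= 0) in
  let update (prev, run, best) x =
    let run =
      match prev with
      | Some p when same_sign p x -> run + 1  (* run continues *)
      | _ -> 1                                (* run restarts at x *)
    in
    (Some x, run, max best run)
  in
  let _, _, best = List.fold_left update (None, 0, 0) list in
  best
```

For example, `longest_run [1; 2; -3; -4; -5; 6]` is 3 (the run -3, -4, -5), and the empty list gives 0.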
Ah dang, I'm also up all night doing OCaml!
Here's my shot at it. One way is to sort the list and then find the first positive or first negative, depending on how you sort; check out the sort function from https://caml.inria.fr/pub/docs/manual-ocaml/libref/List.html. Then it's easy to get the size.
Let's say you don't want to use this built-in library. This code should be close to working (if my editor/debugger worked with OCaml I'd test it further, but I think it's close to good):
let mostPositives (l : int list) : int =
  let counter = ref 0 in
  let maxSize = ref 0 in
  for i = 0 to List.length l - 1 do
    if List.nth l i >= 0 then
      counter := !counter + 1
    else
      counter := 0;  (* a negative element ends the current run *)
    maxSize := max !counter !maxSize
  done;
  !maxSize

How to exclude a particular tuple from a list in haskell

I'm very confused about how to filter the element (1,1) out of the list in the code below.
take 10 [ (i,j) | i <- [1,2],
j <- [1..] ]
yields
[(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(1,9),(1,10)]
My thoughts were to use something like filter, but I'm not too sure where to implement it.
My go was Filter ((i,j) /=0) "the list"
Thanks
Your attempt
Filter ((i,j) /=0) "the list"
has a few problems, which can be fixed.
First, the function is called filter (lowercase). Second, its first argument must be a function, so you can use \(i,j) -> ... to take a pair as input. Third, you want (i,j) /= (1,1) -- you can't compare a pair (i,j) to the single number 0.
You should now be able to correct your code.
As an alternative to using filter, you can also specify that you don't want (1,1) as an element within your list comprehension by adding a guard expression (i,j) /= (1,1):
take 10 [ (i,j) | i <- [1,2], j <- [1..], (i,j) /= (1,1) ]
This is similar to how you might write a set comprehension (which list comprehensions mimic).
This answer gives a nice example ([x | i <- [0..10], let x = i*i, x > 20]) of the three types of expression you can have in the tail end of a list comprehension:
Generators, eg. i <- [0..10] provide the sources of values.
Guards, eg. x > 20 are arbitrary predicates - for any given values from the generators, the value will only be included in the result if all the predicates hold.
Local declarations, eg. let x = i*i perform the same task as normal let/where statements.
The names for the different kinds of expression are taken from the syntax reference (production "qual").

Time complexity of :: and @ (OCaml)

I was reading this and was wondering if :: is always more efficient than @, or only in that particular implementation of rev:
let rec rev = function
| [] -> []
| h::t -> rev t @ [h]
let rev l =
  let rec aux accu = function
    | [] -> accu
    | h::t -> aux (h :: accu) t
  in
  aux [] l
For example, if I wanted to append an element on a queue, would there be a difference in these two methods:
let append x q = q @ [x]
and
type 'a queue = {front:'a list; back:'a list}
let norm = function
| {front=[]; back} -> {front=List.rev back; back=[]}
| q -> q
let append x q = norm {q with back=x::q.back}
@ is O(n) in the size of its left operand, :: is O(1), and List.rev (as defined in the standard library) is O(n).
So if you apply @ n times, you get an O(n^2) algorithm. But if you apply :: n times, that's only O(n), and if you then reverse the result once at the end, it's still O(n). This is true in general, and for that reason any algorithm that appends to the end of a list multiple times should instead prepend to the beginning of the list multiple times and then reverse the result.
However, your example is different. Here you're replacing one single use of @ with one possible use of rev. Since both are O(n), you end up with the same complexity in the case where you use rev.
And the case where you use rev won't happen often, so the complexity of enqueuing n elements ends up amortized O(n) (and if you don't dequeue anything in between, it's just plain O(n)), whereas your version using @ would lead to O(n^2).
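To make the contrast concrete, here is a small sketch of the two patterns: building a list of n elements by repeated @ (quadratic overall) versus prepending with :: and reversing once at the end (linear overall). The function names are mine; both produce the same list.

```ocaml
(* Repeated append: each step copies the whole accumulator, so
   building n elements costs O(n^2) overall. *)
let build_with_append n =
  let rec go acc i =
    if i = n then acc
    else go (acc @ [i]) (i + 1)  (* O(List.length acc) per step *)
  in
  go [] 0

(* Prepend then reverse once: each step is O(1), plus one O(n)
   List.rev at the end, so O(n) overall. *)
let build_with_cons n =
  let rec go acc i =
    if i = n then List.rev acc
    else go (i :: acc) (i + 1)  (* O(1) per step *)
  in
  go [] 0
```

Both return [0; 1; ...; n-1]; only the cost differs, which is exactly the transformation described above.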
I was reading this and was wondering if :: is always more efficient than @
Basically yes, but the question is blurry enough that there are special cases where both are equally efficient.
The efficiency or complexity of an operation is typically expressed as an asymptotic equivalent of its computing cost in terms of the size of its input. With that, we can compare the complexities of :: and @ precisely:
The complexity of computing x :: lst is O(1); that is, it is bounded by a constant cost independent of the inputs.
The complexity of computing a @ b is O(List.length a).
(The notation used is called big-O notation or Landau notation and should be described in most computer science textbooks.)
For example, if I wanted to append an element on a queue, would there be a difference in these two methods:
These two methods have equivalent worst-case complexity, running in O(length q); the cost of the second operation is carried by List.rev.