Huffman coding for a text file - OCaml

This is only part of my Huffman tree, generated using OCaml. The tree is represented as a (char * int list) list:
[(' ', [0]); ('e', [1; 0]); ('t', [1; 1; 0]); ('a', [1; 1; 1; 0]);
('o', [1; 1; 1; 1; 0]); ('n', [1; 1; 1; 1; 1; 0]).....].
Each (char * int list) pair holds a character and its corresponding code (the bits that replace it in the encoded stream). I'm wondering whether this is a correct tree or whether I understood something wrong. With this shape, the longest code for an ASCII character will be 255 bits. The original file is 213.3k and after encoding it becomes 227k, while the instructions say it should produce a file of around 119k. I don't know where my problem is, because I did everything following the instructions. Can someone tell me what is wrong here?
My biggest problem is this: if I use Huffman coding, only the 8 most frequent chars can save me space, while the other 247 chars will cost extra space. Is that true? If it isn't, why not?
The code I wrote follows the instructions at this link:
http://www.cs.cornell.edu/Courses/cs3110/2012sp/hw/ps3/ps3.html
This is the code of my encoding function:
type huffmantree =
  | Node of huffmantree * (int * int) * huffmantree
  | Leaf of char * int
  | Nil
type encoding = char * (int list)
let look_up (chr : char) (encl : encoding list) : int list =
  let rec look_up_rec encl =
    match encl with
    | [] -> raise (Failure "Not found")
    | (ch, theL) :: tl -> if ch = chr then theL else look_up_rec tl
  in
  look_up_rec encl
;;
let get_codes (hm : huffmantree) : encoding list =
  let rec get_codes_rec aTree word =
    match aTree with
    | Nil -> []
    | Node (Leaf (lKey, lFreq), value, Nil) -> [(lKey, [0])]
    | Node (Leaf (lKey, lFreq), value, Leaf (rKey, rFreq)) ->
      [(lKey, List.append word [0]); (rKey, List.append word [1])]
    | Node (Leaf (lKey, lFreq), value, rNode) ->
      (lKey, List.append word [0]) :: (get_codes_rec rNode (List.append word [1]))
  in
  get_codes_rec hm []
;;
let encode (text : char list) : huffmantree * int list =
  let sortedT = List.fast_sort (fun ch1 ch2 ->
      if (int_of_char ch1) >= (int_of_char ch2) then 1 else -1) text
  in
  let rec cre_freq_list aList m =
    match aList with
    | [] -> []
    | hd :: [] -> [(hd, m + 1)]
    | hd1 :: hd2 :: tl ->
      if hd1 = hd2 then cre_freq_list (hd2 :: tl) (m + 1)
      else (hd1, (m + 1)) :: (cre_freq_list (hd2 :: tl) 0)
  in
  let sortedF = List.fast_sort (fun (ch1, fr1) (ch2, fr2) ->
      if fr1 >= fr2 then 1 else -1) (cre_freq_list sortedT 0)
  in
  let rec createHuff sortedF =
    match sortedF with
    | [] -> Nil
    | (ch, va) :: [] -> Node (Leaf (ch, va), (256, va), Nil)
    | (ach, aval) :: tl ->
      let rec creH_rec the_tl sib n freq =
        match the_tl with
        | (bch, bval) :: [] -> Node (Leaf (bch, bval), (n, bval + freq), sib)
        | (bch, bval) :: btl ->
          creH_rec btl
            (Node (Leaf (bch, bval), (n, bval + freq), sib)) (n + 1)
            (freq + bval)
      in
      creH_rec tl (Leaf (ach, aval)) 256 aval
  in
  let huff = createHuff sortedF
  in
  let rec make_codes text =
    match text with
    | [] -> []
    | hd :: tl -> List.append (look_up hd (get_codes huff)) (make_codes tl)
  in
  (huff, (make_codes text))

Looking at the resulting tree, it appears that you don't implement Huffman's algorithm. I doubt that 'e' is more frequent in your text than any other letter. Without your code I can only guess, but maybe when merging the two lightest trees you inserted the resulting tree at the end of the list of trees to merge, instead of inserting it at the right place according to its weight.
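A minimal standalone sketch of the merging step described above, for comparison. The type htree and the functions weight, insert, build, huffman and codes are hypothetical names, not the API required by the assignment. The trees are kept in a list sorted by weight; the two lightest are merged and the merged tree is inserted back at the position its weight dictates:

```ocaml
(* Hypothetical tree type (not the question's huffmantree): leaves hold a
   character and its frequency, internal nodes hold the combined weight. *)
type htree =
  | Leaf of char * int
  | Node of htree * int * htree

let weight = function Leaf (_, w) -> w | Node (_, w, _) -> w

(* Insert a tree into a list kept sorted by increasing weight. *)
let rec insert t = function
  | [] -> [ t ]
  | x :: rest when weight t <= weight x -> t :: x :: rest
  | x :: rest -> x :: insert t rest

(* Merge the two lightest trees, put the result back at the position
   given by its weight, and repeat until a single tree is left. *)
let rec build = function
  | [] -> invalid_arg "build: empty frequency list"
  | [ t ] -> t
  | a :: b :: rest -> build (insert (Node (a, weight a + weight b, b)) rest)

let huffman (freqs : (char * int) list) : htree =
  freqs
  |> List.map (fun (c, f) -> Leaf (c, f))
  |> List.stable_sort (fun a b -> compare (weight a) (weight b))
  |> build

(* Read the codes off the tree: a left edge is 0, a right edge is 1. *)
let codes (tree : htree) : (char * int list) list =
  let rec go prefix acc = function
    | Leaf (c, _) -> (c, List.rev prefix) :: acc
    | Node (l, _, r) -> go (0 :: prefix) (go (1 :: prefix) acc r) l
  in
  match tree with
  | Leaf (c, _) -> [ (c, [ 0 ]) ] (* a single symbol still needs one bit *)
  | Node _ -> go [] [] tree
```

With skewed frequencies such as a:5, b:2, c:1, d:1 the codes get lengths 1, 2, 3 and 3, while four equally frequent characters all get 2-bit codes: the shape of the tree follows the weights.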
In your code, createHuff is declared recursive (let rec) but never calls itself.
Your function createHuff never compares the values inside the sortedF list; don't you think this is a problem? It means that createHuff will always yield the same tree (with different labels but with the same structure).
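To check the claim in the question that most characters cost extra space, one can compute the encoded size directly: the sum over characters of (number of occurrences) × (code length), compared with 8 bits per character. The helper names and the four-character frequency table below are made up for illustration:

```ocaml
(* Size in bits of a text encoded with [table]: the sum over characters of
   (number of occurrences) * (length of that character's code). *)
let encoded_bits (freqs : (char * int) list) (table : (char * int list) list) : int =
  List.fold_left (fun acc (c, f) -> acc + (f * List.length (List.assoc c table))) 0 freqs

(* Size of the same text stored as plain 8-bit characters. *)
let plain_bits (freqs : (char * int) list) : int =
  8 * List.fold_left (fun acc (_, f) -> acc + f) 0 freqs

(* Made-up example: four characters that each occur 10 times. *)
let freqs = [ ('a', 10); ('b', 10); ('c', 10); ('d', 10) ]

(* One leaf per level, like the tree in the question. *)
let chain = [ ('a', [ 0 ]); ('b', [ 1; 0 ]); ('c', [ 1; 1; 0 ]); ('d', [ 1; 1; 1 ]) ]

(* Balanced tree, which is what Huffman's algorithm builds for equal weights. *)
let balanced = [ ('a', [ 0; 0 ]); ('b', [ 0; 1 ]); ('c', [ 1; 0 ]); ('d', [ 1; 1 ]) ]

let () =
  Printf.printf "plain: %d, chain: %d, balanced: %d\n"
    (plain_bits freqs) (encoded_bits freqs chain) (encoded_bits freqs balanced)
```

This prints plain: 320, chain: 90, balanced: 80. With only four symbols both codes beat 8 bits, but the chain shape adds one bit per extra symbol. A fixed 8-bit code is itself a prefix code, and Huffman's algorithm produces an optimal prefix code, so a correctly built Huffman code never makes the total larger than 8 bits per character (ignoring the cost of storing the tree). Rare characters may get codes longer than 8 bits, but because they are rare the total still shrinks.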

Related

deleting duplicates tail-recursively in OCaml

I tried to write my own solution for this exercise by iterating through the list with an empty complst accumulator, into which all non-duplicates are inserted and which is then returned.
I know it is an overcomplicated approach after looking up the solution, but I would still like to understand why the pattern matching does not work as intended:
let compress list =
  let rec aux complst lst =
    match lst with
    | [] -> complst
    | a :: (b :: c) -> if a = b then aux complst (b::c) else aux (a::complst) (b::c)
    | x -> x
  in aux [] list;;
val compress : 'a list -> 'a list = <fun>
Regardless of the input, the output is always a list with only the last element:
compress [1;1;2;2;3];;
- : int list = [3]
compress [1;2;3];;
- : int list = [3]
Pattern matching
Your pattern-matching matches against three patterns:
The empty list: []
The list with at least two elements: a :: (b :: c)
A catch-all, which must by process of elimination be a list with a single element.
Consider what happens when we evaluate your example:
compress [1; 1; 2; 2; 3]
aux [] [1; 1; 2; 2; 3]
aux [] [1; 2; 2; 3]
aux [1] [2; 2; 3]
aux [1] [2; 3]
aux [2; 1] [3]
[3]
Oops: as soon as lst became [3], it just returned it.
Let's rewrite your function to handle that single-element list by adding its element to complst.
let compress lst =
  let rec aux complst lst =
    match lst with
    | [] -> complst
    | [x] -> aux (x::complst) []
    | a :: (b :: c) ->
      if a = b then aux complst (b::c)
      else aux (a::complst) (b::c)
  in
  aux [] lst
Now:
compress [1; 1; 2; 2; 3]
aux [] [1; 1; 2; 2; 3]
aux [] [1; 2; 2; 3]
aux [1] [2; 2; 3]
aux [1] [2; 3]
aux [2; 1] [3]
aux [3; 2; 1] []
[3; 2; 1]
Cleaning up and reversing the resulting list
Of course, there are also ways to clean up your code a bit using a conditional guard and _ for values you don't need to bind names to. You probably also want to reverse your accumulator.
let compress lst =
  let rec aux complst lst =
    match lst with
    | [] -> List.rev complst
    | [x] -> aux (x::complst) []
    | a :: (b :: _ as tl) when a = b -> aux complst tl
    | a :: (_ :: _ as tl) -> aux (a::complst) tl
  in
  aux [] lst
Fold
When you see this pattern of iterating over a list one element at a time and accumulating a new value, you can usually map that pretty well to List.fold_left.
let compress lst =
  List.(
    fold_left
      (fun i x ->
         match i with
         | (x'::_) when x = x' -> i
         | _ -> x::i)
      [] lst
    |> rev
  )
Because List.fold_left can only be aware of one element of the list at a time, the function we pass as its first argument can't be aware of the next element in the list. But it is aware of the accumulator or "init" value. In this case that's another list, and we can pattern match on that list.
If it's not empty and the first element is equal to the current element we're looking at, don't add it to the result list. Otherwise, do add it. This also handles the first element case where the accumulator is empty.
Kudos on creating a tail-recursive solution to this problem!
The problem with your code is mainly the last case, which handles the last element of your list (here [3]): you return that single-element list instead of adding its element to the accumulator.
What you need to do instead is prepend it to complst, like this:
let compress list =
  let rec aux complst lst =
    match lst with
    | [] -> complst
    | a :: (b :: c) -> if a = b then aux complst (b::c) else aux (a::complst) (b::c)
    | [x] -> x::complst
  in aux [] list;;
val compress : 'a list -> 'a list = <fun>
Now you can check it with the given example:
compress [1;1;2;2;3];;
- : int list = [3; 2; 1]
Hope it helps you understand your mistake better.
Note regarding comments:
you should keep the [] case: although it can only happen in one scenario (an empty input list), it is still a valid input, so it must be handled.

Set a key value to the closest node element given two lists

Say I have a list of keys, k = [2,3,7,15,18,23], and a list of nodes, n = [1,5,10,15,20]. Both lists are sorted.
Then the "closest next node", or successor node, for key k = 2 is n = 5; for k = 3 it is n = 5; for k = 7 it is n = 10, and so on. If the key value is greater than the last node value, its successor node is the first node, so for k = 23 it is n = 1. I want to output a list of lists that maps each successor node to its keys, in the format [[successor_node1, key, key],[successor_node2, key, key],...]. So the result for this example is output_array = [[5,2,3],[10,7],[15,15],[20,18],[1,23]]
How can I achieve this in F# with just ONE function?
You can do this by writing a recursive function that iterates over the two lists and pattern matches on their first elements. To keep the result, the best option is probably an immutable map: as you go, you add each key to the entry for its successor node:
let k = [2;3;7;15;18;23]
let n = [1;5;10;15;20]
let rec findSuccessors first res k n =
    // Add a key 'k' associated with a successor node 'n' to the list
    let add k n =
        match Map.tryFind n res with
        | None -> Map.add n [n; k] res
        | Some l -> Map.add n (l @ [k]) res
    match k, n with
    | [], _ ->
        // If there are no more keys, we return the results
        res |> Map.toList |> List.map snd
    | k::ks, [] ->
        // If there are no more successors, use the special 'first'
        findSuccessors first (add k first) ks []
    | k::ks, n::ns when n < k ->
        // If we have a key 'k', but the next node is smaller, skip it
        findSuccessors first res (k::ks) ns
    | k::ks, n::ns ->
        // Found a key 'k' with a successor 'n': add it to the list
        findSuccessors first (add k n) ks (n::ns)
findSuccessors (List.head n) Map.empty k n
I came up with a new solution to your description of the problem, rather than trying to modify your code. I'm using quite a different approach: no mutable variables or data structures, just pure functional code with one recursive function. I did this because it was easier for me, not because pure code is always better.
let mapNodes startingNodes startingKeys =
    let rec loop remainingNodes remainingKeys acc =
        match remainingNodes, remainingKeys with
        | _, [] ->
            acc
        | [], keys ->
            let next = startingNodes |> List.tryHead |> Option.map (fun firstNode -> firstNode :: keys)
            match next with
            | Some next -> next :: acc
            | None -> acc // this shouldn't happen if there is at least one starting node
        | nextNode :: restNodes, keys ->
            let keysForNode = keys |> List.takeWhile (fun key -> key <= nextNode)
            match keysForNode with
            | [] ->
                loop restNodes keys acc
            | keysForNode ->
                let next = nextNode :: keysForNode
                let restKeys = keys |> List.skip keysForNode.Length
                loop restNodes restKeys (next :: acc)
    loop (startingNodes |> List.tail) startingKeys [] |> List.rev
let nodes = [ 1; 5; 10; 15; 20 ]
let keys = [ 2; 3; 7; 15; 18; 23 ]
let expected = [ [ 5; 2; 3 ]; [ 10; 7 ]; [ 15; 15 ]; [ 20; 18 ]; [ 1; 23 ] ]
let result = mapNodes nodes keys // [[5; 2; 3]; [10; 7]; [15; 15]; [20; 18]; [1; 23]]
result = expected // true
The general approach is to use a recursive loop that explicitly passes through all of the input state required, rather than using mutable variables. An accumulator acc is also passed through to gather the output.
This code uses List.takeWhile followed by List.skip on the same list, so the prefix is traversed twice. This is slightly inefficient. It could be improved if there were a List.splitWhen function in the F# library, or if you wrote one yourself.
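A sketch of such a single-pass helper, written in OCaml here; split_while and map_nodes are hypothetical names, and the same shape translates directly to F#. split_while returns both the matching prefix and the remainder, so the keys are walked only once:

```ocaml
(* Hypothetical helper: split a list into the longest prefix whose
   elements satisfy [p] and the remaining suffix, in a single pass. *)
let split_while p lst =
  let rec go acc = function
    | x :: rest when p x -> go (x :: acc) rest
    | rest -> (List.rev acc, rest)
  in
  go [] lst

(* Same algorithm as mapNodes above, using split_while instead of
   takeWhile followed by skip. *)
let map_nodes nodes keys =
  let rec loop remaining_nodes keys acc =
    match remaining_nodes, keys with
    | _, [] -> List.rev acc
    | [], keys -> (
        (* keys larger than every node wrap around to the first node *)
        match nodes with
        | first :: _ -> List.rev ((first :: keys) :: acc)
        | [] -> List.rev acc)
    | node :: rest_nodes, keys -> (
        match split_while (fun k -> k <= node) keys with
        | [], _ -> loop rest_nodes keys acc
        | taken, rest_keys -> loop rest_nodes rest_keys ((node :: taken) :: acc))
  in
  match nodes with
  | [] -> []
  | _ :: tail -> loop tail keys []
```

Usage: map_nodes [1; 5; 10; 15; 20] [2; 3; 7; 15; 18; 23] gives [[5; 2; 3]; [10; 7]; [15; 15]; [20; 18]; [1; 23]].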
One more attempt in addition to what was proposed earlier :) I'm not very familiar with the F# standard library and idioms, so this might be unidiomatic, suboptimal, or both, but I tried to solve it in a very straightforward way (as I would explain the solution verbally):
let nearest_keys_per_node keys nodes =
    (* Simple helper function that finds the nearest next node for a given key *)
    let nearest_next_node nodes k =
        match nodes with
        | [] -> failwith "Empty nodes list!"
        | hd :: tl ->
            let rec nearest_node_tr k current_best = function
                | [] -> current_best
                | hd :: tl when hd < k -> nearest_node_tr k current_best tl
                | hd :: _ -> hd
            nearest_node_tr k hd tl
    List.map (nearest_next_node nodes) keys (* Get the nearest next node for each key *)
    |> List.zip keys (* "Glue" them together with the keys, getting a list of tuples (key, node) *)
    |> Seq.groupBy (fun (_, node) -> node) (* Group by nodes *)
    |> List.ofSeq
    |> List.map (fun (node, seq) -> (* "Clean up" the structure we got after grouping and transform it into your desired output *)
        node :: (List.ofSeq(seq) |> List.map fst)
    )
;;
> nearest_keys_per_node [2;3;7;15;18;23] [1;5;10;15;20];;
val it : int list list = [[5; 2; 3]; [10; 7]; [15; 15]; [20; 18]; [1; 23]]

How to count the number of recurring character repetitions in a char list?

My goal is to take a char list like:
['a'; 'a'; 'a'; 'a'; 'a'; 'b'; 'b'; 'b'; 'a'; 'd'; 'd'; 'd'; 'd']
Count the number of repeated characters and transform it into a (int * char) list like this:
[(5, 'a'); (3, 'b'); (1, 'a'); (4, 'd')]
I am completely lost and also very, very new to OCaml. Here is the code I have right now:
let to_run_length (lst : char list) : (int * char) list =
  match lst with
  | [] -> []
  | h :: t ->
    let count = int 0 in
    while t <> [] do
      if h = t then
        count := count + 1;
    done;
I am struggling with how to check the list like you would an array in C or Python. I am not allowed to use fold functions, map, or anything like that.
Edit: Updated code, yielding an exception on List.nth:
let rec to_run_length (lst : char list) : (int * char) list =
  let n = ref 0 in
  match lst with
  | [] -> []
  | h :: t ->
    if h = List.nth t 0 then n := !n + 1;
    (!n, h) :: to_run_length t ;;
Edit: Added nested match resulting in a function that doesn't work... but no errors!
let rec to_run_length (lst : char list) : (int * char) list =
  match lst with
  | [] -> []
  | h :: t ->
    match to_run_length t with
    | [] -> []
    | (n, c) :: tail ->
      if h <> c then to_run_length t
      else (n + 1, c) :: tail ;;
Final Edit: Finally got the code running perfectly!
let rec to_run_length (lst : char list) : (int * char) list =
  match lst with
  | [] -> []
  | h :: t ->
    match to_run_length t with
    | (n, c) :: tail when h = c -> (n + 1, h) :: tail
    | tail -> (1, h) :: tail ;;
One way to answer your question is to point out that a list in OCaml isn't like an array in C or Python. There is no (constant-time) way to index an OCaml list like you can an array.
If you want to code in an imperative style, you can treat an OCaml list like a list in C, i.e., a linked structure that can be traversed in one direction from beginning to end.
To make this work you would indeed have a while statement that continues only as long as the list is non-empty. At each step you examine the head of the list and update your output accordingly. Then replace the list with the tail of the list.
For this you would want to use references for holding the input and output. (As a side comment, where you have int 0 you almost certainly wanted ref 0. I.e., you want to use a reference. There is no predefined OCaml function or operator named int.)
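A minimal sketch of that imperative style, assuming List.rev is acceptable even though fold and map are not allowed. One reference holds the remaining input, another holds the groups built so far (in reverse order); each step looks only at the head of the remaining list:

```ocaml
(* Imperative run-length encoding: walk the list through a reference,
   as one would traverse a linked list in C. *)
let to_run_length (lst : char list) : (int * char) list =
  let rest = ref lst in
  let out = ref [] in (* groups found so far, most recent first *)
  while !rest <> [] do
    match !rest with
    | [] -> ()
    | h :: t ->
        (match !out with
         | (n, c) :: groups when c = h -> out := (n + 1, c) :: groups
         | groups -> out := (1, h) :: groups);
        rest := t
  done;
  List.rev !out
```

Because groups are prepended as they are found, the final List.rev restores the original order.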
However, the usual reason to study OCaml is to learn functional style. In that case you should be thinking of a recursive function that will compute the value you want.
For that you need a base case and a way to reduce a non-base case to a smaller case that can be solved recursively. A pretty good base case is an empty list. The desired output for this input is (presumably) also an empty list.
Now assume (by recursion hypothesis) you have a function that works, and you are given a non-empty list. You can call your function on the tail of the list, and it (by hypothesis) gives you a run-length encoded version of the tail. What do you need to do to this result to add one more character to the front? That's what you would have to figure out.
Update
Your code is getting closer, as you say.
You need to ask yourself how to add a new character to the beginning of the encoded value. In your code you have this, for example:
. . .
match to_run_length t with
| [] -> []
. . .
This says to return an empty encoding if the tail is empty. But that doesn't make sense. You know for a fact that there's a character in the input (namely, h). You should be returning some kind of result that includes h.
In general, if the returned list starts with a group for h, you want to add 1 to the count of that first group. Otherwise, you want to add a new group (1, h) to the front of the returned list.

OCaml: counting if a char exists in a char list (syntax error)

let count (l: char list) : bool =
  let cnt = 0 in
  let rec check l =
    match l with
    | [] -> false
    | h::t -> if h = 'a' then
        let cnt + 1
        check t
      else check t
  in check []
;;
What causes the syntax error?
I want to add 1 if 'a' exists and -1 if 'b' exists.
You should write it like this:
let rec count l =
  match l with
  | [] -> 0
  | c::t -> if c = 'a' then 1 + count t else count t
But this isn't tail recursive. To make it tail recursive, you'll have to add an accumulator and write an auxiliary function:
let count l =
  let rec aux cnt l =
    match l with
    | [] -> cnt
    | c::t -> if c = 'a' then aux (cnt + 1) t else aux cnt t
  in
  aux 0 l
Now this is tail recursive but a little bit long to write, so you could just use List.fold_left and still have a tail recursive function:
let count l =
  List.fold_left (fun cnt c -> if c = 'a' then cnt + 1 else cnt) 0 l
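The question also asks for -1 on each 'b'. Following the same accumulator pattern, a sketch of a tail-recursive version (the name score is hypothetical) matches directly on the characters instead of using if:

```ocaml
(* Tail-recursive: +1 for each 'a', -1 for each 'b', other chars ignored. *)
let score (l : char list) : int =
  let rec aux acc = function
    | [] -> acc
    | 'a' :: t -> aux (acc + 1) t
    | 'b' :: t -> aux (acc - 1) t
    | _ :: t -> aux acc t
  in
  aux 0 l
```

For example, score ['a'; 'b'; 'a'; 'c'] evaluates to 1.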
OCaml is a functional language: variables are immutable (their content might be mutable, but that is not the default).
Consequently,
let cnt + 1
is both a syntax error and a logical error: it is not possible to update cnt in this way.
As a first step, I would suggest writing a recursive version of count (and maybe renaming it to exists_a) without using ifs:
let rec count l = match l with
  | [] -> ...
  | 'a' :: t -> ...
  | _ :: t -> ...

Insert implementation for a trie in Ocaml

I have no idea how to change the code for my add function.
type trie = Node of bool * (char * trie) list
let explode word =
  let rec explode' i acc =
    if i < 0 then acc else explode' (i-1) (word.[i] :: acc)
  in explode' (String.length word - 1) []
let rec exists w tr = match w, tr with
  | [], Node (b, _) -> b
  | h::t, Node (_, l) -> try exists t (List.assoc h l) with Not_found -> false
let rec add w tr = match w, tr with
  | [], Node (_, l) -> Node (true, l)
  | h :: t, Node (b, l) ->
    try add t (List.assoc h l)
    with Not_found -> Node (false, (h, add t tr) :: l)
The problem is that when List.assoc h l finds something, I don't keep track of my structure: no Node is built during the recursive call, so I lose data.
Example:
# let empty = Node(true, []);;
val empty : trie = Node (true, [])
# let w = explode "hi";;
val w : char list = ['h'; 'i']
# let ww = explode "hit";;
val ww : char list = ['h'; 'i'; 't']
# let tr = add w empty;;
val tr : trie = Node (false, [('h', Node (false, [('i', Node (true, []))]))])
# add ww tr;;
- : trie = Node (false, [('t', Node (true, []))])
It seems your basic plan is to work down through the data structure with List.assoc, then add your new node when you find the right spot. This makes sense if you can modify the structure. However, your data structure is immutable. With immutable data, your basic plan must be to build a new data structure rather than to modify the old one. So you have to imagine yourself finding the right spot while keeping track of the old structure along the way, then building up a new structure starting from that spot.
Here's some code that keeps an association list counting the number of instances of characters seen so far. Note that it returns a new association list rather than modifying the old one (which is impossible):
let rec add_char_count list char =
  match list with
  | [] -> [(char, 1)]
  | (hchar, hcount) :: t ->
    if hchar = char then (hchar, hcount + 1) :: t
    else (hchar, hcount) :: add_char_count t char
The recursive call (hchar, hcount) :: add_char_count t char is the spot where the old structure is remembered: it rebuilds the part of the old list that comes before the point where the new character is added.
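Applied to the trie, the same idea means rebuilding each node on the way back up from the recursive call. A minimal sketch, assuming a hypothetical helper set_child (which rebuilds a node's association list around the updated child, like add_char_count) and a hypothetical empty_trie that, unlike the question's empty, does not mark the empty word as present:

```ocaml
type trie = Node of bool * (char * trie) list

let explode word =
  let rec explode' i acc =
    if i < 0 then acc else explode' (i - 1) (word.[i] :: acc)
  in
  explode' (String.length word - 1) []

let empty_trie = Node (false, [])

(* Replace (or add) the child for [c], rebuilding the list around it. *)
let rec set_child c child = function
  | [] -> [ (c, child) ]
  | (c', _) :: rest when c' = c -> (c, child) :: rest
  | pair :: rest -> pair :: set_child c child rest

(* Descend into the existing child (or an empty one), then rebuild this
   node with the updated child instead of returning only the subtree. *)
let rec add w (Node (b, children)) =
  match w with
  | [] -> Node (true, children)
  | h :: t ->
      let sub = try List.assoc h children with Not_found -> empty_trie in
      Node (b, set_child h (add t sub) children)

let rec exists w (Node (b, children)) =
  match w with
  | [] -> b
  | h :: t -> (
      match List.assoc_opt h children with
      | Some sub -> exists t sub
      | None -> false)
```

Adding "hi" and then "hit" now keeps the 'h' and 'i' nodes: the result is Node (false, [('h', Node (false, [('i', Node (true, [('t', Node (true, []))]))]))]).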