Insert implementation for a trie in OCaml

I don't know how to fix the code of my add function.
type trie = Node of bool * (char * trie) list
let explode word =
  let rec explode' i acc =
    if i < 0 then acc else explode' (i-1) (word.[i] :: acc)
  in explode' (String.length word - 1) []
let rec exists w tr = match w, tr with
  | [], Node (b, _) -> b
  | h::t, Node (_, l) -> try exists t (List.assoc h l) with Not_found -> false
let rec add w tr = match w, tr with
  | [], Node (_, l) -> Node (true, l)
  | h :: t, Node (b, l) -> try add t (List.assoc h l)
    with Not_found -> Node (false, (h, add t tr) :: l)
The problem is that when List.assoc h l finds something, I don't keep track of my structure: no Node is built around the recursive call, so I lose data.
Example:
# let empty = Node(true, []);;
val empty : trie = Node (true, [])
# let w = explode "hi";;
val w : char list = ['h'; 'i']
# let ww = explode "hit";;
val ww : char list = ['h'; 'i'; 't']
# let tr = add w empty;;
val tr : trie = Node (false, [('h', Node (false, [('i', Node (true, []))]))])
# add ww tr;;
- : trie = Node (false, [('t', Node (true, []))])

It seems your basic plan is to work down through the data structure with List.assoc, then add your new node when you find the right spot. This makes sense if you can modify the structure. However, your data structure is immutable. With immutable data, your basic plan must be to build a new data structure rather than to modify the old one. So you have to imagine yourself finding the right spot while keeping track of the old structure along the way, then building up a new structure starting from that spot.
Here's some code that keeps an association list counting the number of instances of characters seen so far. Note that it returns a new association list rather than modifying the old one (which is impossible):
let rec add_char_count list char =
  match list with
  | [] -> [(char, 1)]
  | (hchar, hcount) :: t ->
    if hchar = char then (hchar, hcount + 1) :: t
    else (hchar, hcount) :: add_char_count t char
The recursive call (hchar, hcount) :: add_char_count t char is the spot where the old structure is remembered. It rebuilds the old structure from the part of the list before where the new character is added.
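Applying the same idea to the trie, here is one possible sketch of add that rebuilds each Node on the way back up. It is an illustration rather than the only way to do it: it assumes OCaml 4.05 or later for List.assoc_opt, it starts from an empty trie whose flag is false, and it moves the updated child to the front of the association list, so the order of siblings may change.

```ocaml
type trie = Node of bool * (char * trie) list

let explode word =
  let rec go i acc = if i < 0 then acc else go (i - 1) (word.[i] :: acc) in
  go (String.length word - 1) []

let rec exists w (Node (b, l)) =
  match w with
  | [] -> b
  | h :: t ->
    (match List.assoc_opt h l with
     | Some child -> exists t child
     | None -> false)

(* Every recursive call is wrapped in a new Node, so the part of the
   trie above the insertion point is rebuilt instead of being dropped. *)
let rec add w (Node (b, l)) =
  match w with
  | [] -> Node (true, l)
  | h :: t ->
    (* Reuse the existing child for h, or start from an empty one. *)
    let child =
      match List.assoc_opt h l with
      | Some c -> c
      | None -> Node (false, [])
    in
    (* Keep the flag b and all other children; replace only h's entry. *)
    Node (b, (h, add t child) :: List.remove_assoc h l)

let empty = Node (false, [])
let tr = add (explode "hit") (add (explode "hi") empty)
```

With these definitions, both exists (explode "hi") tr and exists (explode "hit") tr should hold, while a prefix that was never inserted, such as "h", is not reported as a word.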

Related

Set a key value to the closest node element given two lists

Say I have a list of keys, k = [2,3,7,15,18,23], and a list of nodes, n = [1,5,10,15,20]. Both lists are sorted.
Then the "closest next node", or successor node, for key k = 2 is n = 5; for k = 3 it is n = 5; for k = 7 it is n = 10; and so on. If the key value is greater than the last node value, then its successor node is the first node element, so for k = 23 it is n = 1. I want to output a list that maps each successor node to its keys in the format [[successor_node1, key, key],[successor_node2, key, key],...]. So the result for this example is output_array = [[5,2,3],[10,7],[15,15],[20,18],[1,23]]
How can I achieve this with F# in just ONE function?

You can do this by writing a recursive function that iterates over the two lists and pattern matches on the first elements. To keep the result, the best option is probably to use an immutable map - as you go, you can add the values for the individual keys associated with individual successor nodes:
let k = [2;3;7;15;18;23]
let n = [1;5;10;15;20]
let rec findSuccessors first res k n =
  // Add a key 'k' associated with a successor node 'n' to the list
  let add k n =
    match Map.tryFind n res with
    | None -> Map.add n [n; k] res
    | Some l -> Map.add n (l @ [k]) res
  match k, n with
  | [], _ ->
      // If there are no more keys, we return the results
      res |> Map.toList |> List.map snd
  | k::ks, [] ->
      // If there are no more successors, use the special 'first'
      findSuccessors first (add k first) ks []
  | k::ks, n::ns when n < k ->
      // If we have a key 'k', but the next node is smaller, skip it
      findSuccessors first res (k::ks) ns
  | k::ks, n::ns ->
      // Found a key 'k' with a successor 'n' - add it to the list
      findSuccessors first (add k n) ks (n::ns)
findSuccessors (List.head n) Map.empty k n

I came up with a new solution to your description of the problem, rather than trying to modify your code. I'm using quite a different approach: no mutable variables or data structures, just pure functional code with one recursive function. I did this because it was easier for me, not because pure code is always better.
let mapNodes startingNodes startingKeys =
    let rec loop remainingNodes remainingKeys acc =
        match remainingNodes, remainingKeys with
        | _, [] ->
            acc
        | [], keys ->
            let next = startingNodes |> List.tryHead |> Option.map (fun firstNode -> firstNode :: keys)
            match next with
            | Some next -> next :: acc
            | None -> acc // this shouldn't happen if there is at least one starting node
        | nextNode :: restNodes, keys ->
            let keysForNode = keys |> List.takeWhile (fun key -> key <= nextNode)
            match keysForNode with
            | [] ->
                loop restNodes keys acc
            | keysForNode ->
                let next = nextNode :: keysForNode
                let restKeys = keys |> List.skip keysForNode.Length
                loop restNodes restKeys (next :: acc)
    loop (startingNodes |> List.tail) startingKeys [] |> List.rev
let nodes = [ 1; 5; 10; 15; 20 ]
let keys = [ 2; 3; 7; 15; 18; 23 ]
let expected = [ [ 5; 2; 3 ]; [ 10; 7 ]; [ 15; 15 ]; [ 20; 18 ]; [ 1; 23 ] ]
let result = mapNodes nodes keys // [[5; 2; 3]; [10; 7]; [15; 15]; [20; 18]; [1; 23]]
result = expected // true
The general approach is to use a recursive loop that explicitly passes through all of the input state required, rather than using mutable variables. An accumulator acc is also passed through to gather the output.
This code uses a List.takeWhile, followed by a List.skip on the same list. This is slightly inefficient. It could be improved if there was a List.splitWhen function in the F# library, or if you were to write one yourself.
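As a rough illustration of what such a single-pass helper could look like, here is a sketch written in OCaml rather than F# (the two are close enough that the translation should be nearly mechanical). The name split_while is made up for this example and is not a function from either standard library.

```ocaml
(* Hypothetical helper: return the longest prefix whose elements satisfy p,
   together with the remaining suffix, traversing the prefix only once. *)
let split_while p lst =
  let rec go acc = function
    | x :: rest when p x -> go (x :: acc) rest
    | rest -> (List.rev acc, rest)
  in
  go [] lst

(* Keys that belong to node 5, and the keys left over for later nodes. *)
let keys_for_node, rest_keys = split_while (fun key -> key <= 5) [2; 3; 7; 15; 18; 23]
```

Here keys_for_node plays the role of the takeWhile result and rest_keys the role of the skip result, without walking the prefix twice.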

One more attempt in addition to what was proposed earlier :) I'm not very familiar with the F# standard library and idioms, so it might not be idiomatic, might be suboptimal, or both, but I tried to solve it in a very straightforward way (as I would explain the solution verbally):
let nearest_keys_per_node keys nodes =
    (* Simple helper function that finds the nearest next node for a given key *)
    let nearest_next_node nodes k =
        match nodes with
        | [] -> failwith "Empty nodes list!"
        | hd :: tl ->
            let rec nearest_node_tr k current_best = function
                | [] -> current_best
                | hd :: tl when hd < k -> nearest_node_tr k current_best tl
                | hd :: tl -> hd
            nearest_node_tr k hd tl
    List.map (nearest_next_node nodes) keys (* Get the nearest next node for each key *)
    |> List.zip keys (* "Glue" them together with the keys - getting a list of tuples (key, node) *)
    |> Seq.groupBy (fun (_, node) -> node) (* Group by nodes *)
    |> List.ofSeq
    |> List.map (fun (node, seq) -> (* "Clean up" the structure that we got after the grouping and transform it into your desired output *)
        node :: (List.ofSeq(seq) |> List.map fst)
    )
;;
> nearest_keys_per_node [2;3;7;15;18;23] [1;5;10;15;20];;
val it : int list list = [[5; 2; 3]; [10; 7]; [15; 15]; [20; 18]; [1; 23]]

Removing consecutive duplicates from a list without recursion

I'm supposed to remove consecutive duplicates from an int list without using recursion and using only List.fold, map, filter, fold_left, fold_right.
I almost got it, but the problem with my code is that it checks if each element equals the 2nd element, and not the next element.
For example, with let z = [3;1;4;5;5;1;1] my code returns [3;4;5] and not [3;1;4;5;1]. I'm not sure how to change it so that filter uses a dynamically changing list parameter and not simply the original one (so that it doesn't compare each element to the second element, 1 in this case, each time):
let dupe (ls: int list) : int list =
  List.filter (fun x -> if List.length ls = 0 then true else if x = List.hd (List.tl ls) then false else true) ls

The type of List.filter is this:
# List.filter;;
- : ('a -> bool) -> 'a list -> 'a list = <fun>
Notably, the filter function can see only one element of the list at a time. You need to see two consecutive elements to decide what to do, so I'd say List.filter won't do the job.
You're going to have to use map or one of the folds, I'd say. You can figure out which one(s) will work, with similar reasoning.
(I assume this is the sort of reasoning the assignment is supposed to illustrate. So I'm going to leave it there.)

Without rec
let remove = function
  | [] -> []
  | x::tl ->
    let (_,lxRes) =
      List.fold_left (
        fun (xPrec,lxRes) xCour ->
          if xPrec=xCour then
            (xCour,lxRes)
          else
            (xCour,lxRes@[xCour])
      ) (x+1,[]) (x::tl)
    in
    lxRes
Test:
# remove [3;1;4;5;5;1;1];;
- : int list = [3; 1; 4; 5; 1]
# remove [1;1];;
- : int list = [1]
# remove [1;1;1;1;2;2;3;4;5;5];;
- : int list = [1; 2; 3; 4; 5]
With rec (just for information)
let rec remove =
  function
  | [] -> []
  | x::[] -> x::[]
  | x::y::tl ->
    if x=y then remove (y::tl)
    else x::remove (y::tl)

Using just List.fold_left can be a little bit more concise than the previous answer. This builds up the list in reverse order, so we need to reverse the result.
let remove lst =
  List.(
    lst
    |> fold_left
         (fun acc x ->
            match acc with
            | [] -> [x]
            | hd::_ when x = hd -> acc
            | _ -> x::acc)
         []
    |> rev
  )
Of course, if you're not allowed to use List.rev we can reimplement it easily using List.fold_left, List.cons and Fun.flip.
let rev lst =
  List.fold_left (Fun.flip List.cons) [] lst
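Since the question also allows fold_right, a variant along the following lines should avoid the reversal altogether. This is only a sketch: the name remove_fold_right is made up here, and note that List.fold_right is not tail-recursive, so very long lists may be a concern.

```ocaml
(* Fold from the right: acc already holds the deduplicated tail, so an
   element is dropped exactly when it equals the head of that tail. *)
let remove_fold_right lst =
  List.fold_right
    (fun x acc ->
       match acc with
       | hd :: _ when hd = x -> acc
       | _ -> x :: acc)
    lst []

let example = remove_fold_right [3; 1; 4; 5; 5; 1; 1]
```

On the list from the question this should give [3; 1; 4; 5; 1], built directly in the right order.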

Information hiding with OCaml records

Given
type 'a set = { insert : 'a -> 'a set; contains : 'a -> bool }
How can I implement
val empty : 'a set
?
I've tried closing over something, say a list, but the return type is wrong (ignoring the fact that the performance characteristics here are terrible :-) ):
let empty =
  let rec insert_f set a =
    match set with
    | [] -> a :: []
    | k :: rest ->
      if k = a then
        k :: rest
      else
        k :: insert_f rest a
  in
  let rec contains_f set a =
    match set with
    | [] -> false
    | k :: rest ->
      if k = a then
        true
      else contains_f rest a
  in
  { insert = insert_f []; contains = contains_f []}

Writing empty directly is not the easiest route with such a data structure, as you will need to write its insert, which will in turn contain another insert, and so on. So let's write insert first:
let rec insert : 'a set -> 'a -> 'a set = fun s x -> {
  insert = (fun y -> failwith "TODO");
  contains = (fun y -> if x = y then true else s.contains y) }
Inside insert, you want to call insert recursively, but its first argument has to be the very record you are building. So here is the complete solution:
let rec insert : 'a set -> 'a -> 'a set = fun s x ->
  let rec ss = {
    insert = (fun y -> insert ss y);
    contains = (fun y -> if x = y then true else s.contains y)}
  in ss
let rec empty = {
  insert = (fun x -> insert empty x);
  contains = (fun x -> false)}

First of all, it's bool, not boolean. :)
Second, this definition is quite cumbersome. But you can do something like:
let empty = {
  insert=(fun x -> {
    insert=(fun x -> assert false);
    contains=(fun x-> assert false)});
  contains=(fun x -> false)}
with your implementations of insert and contains for non-empty sets in place of "assert false" of course.
A hint for implementing insert and contains: don't use any lists; build the new set's functions by composing the functions of the existing set.
You can find nice examples in e.g. "On Understanding Data Abstraction, Revisited" by W. Cook, that paper is available online.
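To make the hint a bit more concrete, here is one way it could be read: the set is represented by nothing but its membership predicate, and insert wraps the old predicate in a new one. The helper name of_pred is invented for this sketch and does not come from the question or the paper.

```ocaml
type 'a set = { insert : 'a -> 'a set; contains : 'a -> bool }

(* Hypothetical helper: build a set record from a membership predicate.
   Inserting x composes a new predicate on top of the existing one. *)
let rec of_pred (mem : 'a -> bool) : 'a set =
  { insert = (fun x -> of_pred (fun y -> y = x || mem y));
    contains = mem }

(* Written as a record literal so that empty stays polymorphic. *)
let empty =
  { insert = (fun x -> of_pred (fun y -> y = x));
    contains = (fun _ -> false) }

let s = (empty.insert 1).insert 2
let has_one = s.contains 1
let has_three = s.contains 3
```

No list is stored anywhere; looking up an element walks the chain of closures, so the performance is still linear in the number of insertions.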

Haskell Split list into Sublist using pattern recognition

I am trying to split a list containing Is and Os whenever a certain pattern occurs.
Let's assume I have an input looking like this:
data Bit = O | I deriving (Eq, Show)
let b = [I,I,O,O,O,O,O,I,I,O,O,O,I,O]
That is what I generate when encoding [[Bool]] -> [Bit]; the corresponding input to my encode function would be let a = [[True, False, False, True],[False, False],[False]]
Now my objective is to decode what I've generated, so I need a function that gets me from b to a.
But I can't come up with a way to split the list b into 3 sublists, starting a new one every time it reads either I,O or I,I. Every odd-positioned element says whether the element after it continues the current sublist or starts a new one. I am basically copying UTF Unicode encoding.
After some time I came up with this:
split :: [Bit] -> [[Bit]]
split (x1:x2:xs) = if (x1 == I)
                   then [x2 : split xs]
                   else x2 : split xs
And I can't figure out how to split the list into sublists. Any kind of advice/help/code is greatly appreciated.
EDIT:
split :: [Bit] -> [[Bit]]
split [] = []
split xs = case foo xs of (ys,I,x2) -> -- generate new subarray like [...,[x2]]
                          (ys,O,x2) -> -- append existing subarray with value x2 [.....,[previous values]++x2]
foo :: [a] -> ([a],a,a)
foo (x1:x2:input) = (input,x1,x2)
Those 2 comments are the last thing I need to figure out. After that I'm done :)
When feeding b into the function split, I want this output: [[I,O,O,I],[O,O],[O]]
The final step would be to get from b to [[True, False, False, True],[False, False],[False]]

I would start with if (x1 == 1) ...
If x1 is a Bit that can be either I or O, why are you comparing its equality against a Num, 1?

If I got it right, you need something like:
split [] = []
split xs = case foo xs of (ys,r) -> r : split ys
foo :: [a] -> ([a],r)
foo = undefined
In foo, the list should be partially consumed; foo returns the rest of the list together with the value to collect.
EDIT:
data Bit = O | I deriving (Eq, Show)
sampleA = [[True, False, False, True],[False, False],[False]]
sampleB = [I,I,O,O,O,O,O,I,I,O,O,O,I,O]
type TwoBit = (Bit,Bit)
twobit (x:y:xs) = (x,y) : twobit xs
twobit _ = []
split :: [TwoBit] -> [[Bool]]
split [] = []
split xs = case spli xs of (ys,r) -> r : split ys
  where
    spli :: [TwoBit] -> ([TwoBit],[Bool])
    spli (x:xs) = case span (not . pterm) xs of
      (ys,zs) -> (zs, map ptrue $ x:ys)
    pterm x = (I,O) == x || (I,I) == x
    ptrue x = (O,I) == x || (I,I) == x
splitTB = split . twobit
main = print $ splitTB sampleB == sampleA
PS Functions that look like s -> (s,a) could also be represented as State monad.

huffman coding for a text file

This is only part of my Huffman tree generated using OCaml. The tree is represented as a (char * int list) list:
[(' ', [0]); ('e', [1; 0]); ('t', [1; 1; 0]); ('a', [1; 1; 1; 0]);
('o', [1; 1; 1; 1; 0]); ('n', [1; 1; 1; 1; 1; 0]).....].
Each (char * int list) pair is a character and its corresponding encoded bit sequence. I'm wondering whether this is a correct tree or whether I misunderstood something. With this scheme, the longest code for an ASCII character will be 255 bits. The original file is 213.3k and after encoding it becomes 227k, while the instructions say it should generate a file of around 119k. I don't know where my problem is, because I did everything following the instructions. Can someone tell me what is wrong here?
My biggest problem is this: if I use Huffman coding, only the 8 most frequent chars save me space while the other 247 chars cost extra space. Is that true? If it isn't, why?
The code I wrote follows the instructions at this link:
http://www.cs.cornell.edu/Courses/cs3110/2012sp/hw/ps3/ps3.html
This is the code of my encoding function:
type huffmantree = Node of huffmantree*(int*int)*huffmantree
                 | Leaf of char*int | Nil
type encoding = char * (int list)
let look_up (chr: char) (encl : encoding list) : int list =
  let rec look_up_rec encl =
    match encl with
    | [] -> raise (Failure "Not found")
    | (ch,theL)::tl -> if ch = chr then theL
                       else look_up_rec tl
  in
  look_up_rec encl
;;
let get_codes (hm : huffmantree): encoding list =
  let rec get_codes_rec aTree word=
    match aTree with
    | Nil -> []
    | Node (Leaf(lKey,lFreq),value,Nil) -> [(lKey,[0])]
    | Node (Leaf(lKey,lFreq),value,Leaf(rKey,rFreq)) ->
      [(lKey,List.append word [0]);(rKey,List.append word [1])]
    | Node (Leaf(lKey,lFreq),value,rNode) ->
      (lKey,List.append word [0])::(get_codes_rec rNode (List.append word [1]))
  in
  get_codes_rec hm []
;;
let encode (text : char list) : huffmantree * int list =
  let sortedT = List.fast_sort (fun ch1 ch2->
    if (int_of_char ch1)>=(int_of_char) ch2 then 1 else -1) text
  in
  let rec cre_freq_list aList m =
    match aList with
    | [] -> []
    | hd::[] -> [(hd,m+1)]
    | hd1::hd2::tl -> if hd1=hd2 then cre_freq_list (hd2::tl) (m+1)
                      else (hd1,(m+1))::(cre_freq_list (hd2::tl) 0)
  in
  let sortedF = List.fast_sort (fun (ch1,fr1) (ch2,fr2) ->
    if fr1>=fr2 then 1 else -1) (cre_freq_list sortedT 0)
  in
  let rec createHuff sortedF=
    match sortedF with
    | [] -> Nil
    | (ch,va)::[] -> Node (Leaf (ch,va),(256,va),Nil)
    | (ach,aval)::tl ->
      let rec creH_rec the_tl sib n freq=
        match the_tl with
        | (bch,bval)::[] -> Node(Leaf (bch,bval),(n,bval+freq),sib)
        | (bch,bval)::btl -> creH_rec btl
                               (Node (Leaf (bch,bval),(n,bval+freq),sib)) (n+1)
                               (freq+bval)
      in creH_rec tl (Leaf(ach,aval)) 256 aval
  in
  let huff = createHuff sortedF
  in
  let rec make_codes text =
    match text with
    | [] -> []
    | hd::tl -> List.append (look_up hd (get_codes huff))
                  (make_codes tl)
  in
  (huff,(make_codes text))

Looking at the resulting tree, it appears that you are not implementing Huffman's algorithm. I doubt that 'e' is more frequent in your text than any other letter. Without your code I can only guess, but maybe when merging the two lightest trees you inserted the resulting tree at the end of the list of trees to merge, instead of inserting it at the right place according to its weight.
In your code, createHuff is declared recursive but contains no recursive call.
Your function createHuff never compares the values inside the sortedF list. Don't you think this is a problem? It means that createHuff will always yield the same tree (with different labels but with the same structure).
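For comparison, the merge step described above (take the two lightest trees, merge them, and put the result back at the position given by its weight) might be sketched as follows. The types and names here are simplified stand-ins invented for this example, not the ones from the assignment, and a real implementation would probably use a priority queue instead of a sorted list.

```ocaml
type tree = Leaf of char * int | Node of tree * int * tree

let weight = function Leaf (_, w) -> w | Node (_, w, _) -> w

(* Insert a tree into a list that is kept sorted by increasing weight. *)
let rec insert_sorted t = function
  | [] -> [t]
  | hd :: tl as l ->
    if weight t <= weight hd then t :: l else hd :: insert_sorted t tl

(* Merge the two lightest trees and reinsert the result by weight,
   until a single tree remains. *)
let rec build = function
  | [] -> None
  | [t] -> Some t
  | a :: b :: rest ->
    build (insert_sorted (Node (a, weight a + weight b, b)) rest)

let huffman (freqs : (char * int) list) : tree option =
  freqs
  |> List.map (fun (c, w) -> Leaf (c, w))
  |> List.sort (fun a b -> compare (weight a) (weight b))
  |> build

(* Read the codes off the tree: 0 for a left branch, 1 for a right one.
   A tree consisting of a single leaf gets the empty code here. *)
let rec codes prefix = function
  | Leaf (c, _) -> [ (c, List.rev prefix) ]
  | Node (l, _, r) -> codes (0 :: prefix) l @ codes (1 :: prefix) r

let code_lengths freqs =
  match huffman freqs with
  | None -> []
  | Some t -> List.map (fun (c, bits) -> (c, List.length bits)) (codes [] t)
```

Because every merge compares weights, the shape of the tree depends on the frequencies: four equally frequent characters should all end up with 2-bit codes, whereas a chain-shaped tree like the one in the question only arises when the frequencies are heavily skewed.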
