I need to create an implementation of the regexp() SQLite function in a Haskell database connection so that I can use the "REGEXP" operator in queries.
Now, I have an implementation of a regex matching function that uses PCRE:
import Text.Regex.Base.RegexLike
import qualified Text.Regex.PCRE.ByteString as PCRE
import qualified Data.ByteString as BS
sqlRegex :: BS.ByteString -> BS.ByteString -> IO Bool
sqlRegex reg b = do
    reC <- pcreCompile reg
    re <- case reC of
        (Right r) -> return r
    reE <- PCRE.execute re b
    case reE of
        (Right (Just _)) -> return True
        (Right (Nothing)) -> return False
  where pcreCompile = PCRE.compile defaultCompOpt defaultExecOpt
which works well (please excuse the very explicit calls)
> sqlRegex (Data.ByteString.Char8.pack ".*") (Data.ByteString.Char8.pack "hello")
True
> sqlRegex (Data.ByteString.Char8.pack "H.*") (Data.ByteString.Char8.pack "hello")
False
Now, how do I create the SQLite function??
conn <- open $ pack dbFile
createFunction conn "regexp" (Just 2) True [..... and what should go here?]
The docs for createFunction help me as far as understanding that my function needs to take a context and some arguments, but the references for those types do not help me at all!
How should I make my function take a FuncContext and FuncArgs?
There is an example in the github repo:
https://github.com/IreneKnapp/direct-sqlite/blob/master/test/Main.hs#L743-757
-- implements repeat(n,str)
repeatString ctx args = do
    n <- funcArgInt64 args 0
    s <- funcArgText args 1
    funcResultText ctx $ T.concat $ replicate (fromIntegral n) s
You use the funcArg... functions to get the arguments and the funcResult... functions to return the result.
Links to the docs:
Extract Function Arguments
Set the Result of a Function
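Applied to the question, a minimal sketch of the callback might look like the following. It assumes the direct-sqlite functions funcArgText and funcResultInt64 from the two documentation sections above, and it takes the matcher as a parameter so that the sqlRegex function from the question can be passed in. The names regexpWith and registerRegexp are hypothetical. SQLite rewrites X REGEXP Y as regexp(Y, X), so the pattern arrives as argument 0.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Data.Text.Encoding (encodeUtf8)
import Database.SQLite3

-- Hypothetical helper: wraps any matcher (pattern first, subject second)
-- as a callback of the shape that createFunction expects.
regexpWith :: (BS.ByteString -> BS.ByteString -> IO Bool)
           -> FuncContext -> FuncArgs -> IO ()
regexpWith match ctx args = do
  pat  <- funcArgText args 0   -- SQLite calls regexp(pattern, subject)
  subj <- funcArgText args 1
  ok   <- match (encodeUtf8 pat) (encodeUtf8 subj)
  -- SQLite has no boolean type, so the result is returned as 1 or 0.
  funcResultInt64 ctx (if ok then 1 else 0)

-- Hypothetical helper: registers the callback on an already opened connection.
registerRegexp :: (BS.ByteString -> BS.ByteString -> IO Bool) -> Database -> IO ()
registerRegexp match conn =
  createFunction conn "regexp" (Just 2) True (regexpWith match)
```

With the code from the question this would be called as registerRegexp sqlRegex conn, after which a query such as SELECT 'hello' REGEXP 'h.*' should reach sqlRegex with "h.*" as the pattern.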
Why is this OCaml statement giving me a syntax error?
let a = 0;; if a = 0 then let b = 0;;
Do if then else statements always have to return a value?
EDIT: Here is the code I am struggling with. I want to apply this function over a list with the map function. The function is supposed to look at each word in the list wordlist and add it to the string map. If the word has already been added to the string map, it should add 1 to its count.
module StringMap = Map.Make(String)
let wordcount = StringMap.empty
let findword testword =
let wordcount = (if (StringMap.mem testword wordcount)
then (StringMap.add testword ((StringMap.find testword wordcount)+1) wordcount)
else (StringMap.add testword 1 wordcount))
List.map findword wordlist
You can only have an if then without else if the then expression evaluates to unit (). Otherwise, the expression will not type check. An if without an else is equivalent to writing if x then y else (), which can only type check if y is unit.
Check this out for a reference.
(Terminology note: there are no statements in OCaml because everything is an expression, so the term "if statement" doesn't quite apply. I still understood what you meant, but I thought this was worth noting)
Yes, if is an expression in OCaml, not a statement. The best way to look at it is that there are no statements in OCaml. Everything is an expression. (Admittedly there are expressions that return (), which are similar to statements.)
You can only have if b then e if the type of e is unit (i.e., if it returns ()).
Note also that you can't just say let v = e, except at the top level of a module. At the top level it defines a global name in the module. In other cases you need to say let v = e1 in e2; the let defines a local symbol v for use in the expression e2.
One answer to the let b = problem - it works like this:
let a = 0
let b = if a = 0 then 0 else 1
(* or whatever value you need in the else branch *)
And then the Map problem: the manual says Map is applicative, which means that StringMap.add returns a new map instead of modifying the existing one. To accumulate updates you can store the map in a ref, as in this OCaml toplevel session:
# module StringMap = Map.Make(String);;
# let mymap = ref StringMap.empty ;;
val mymap : '_a StringMap.t ref = {contents = <abstr>}
# mymap := StringMap.add "high" 1 !mymap;;
- : unit = ()
# StringMap.mem "high" !mymap;;
- : bool = true
# StringMap.mem "nono" !mymap;;
- : bool = false
# StringMap.find "high" !mymap;;
- : int = 1
# StringMap.find "nono" !mymap;;
Exception: Not_found.
I'm trying to test a small function (or rather, IO Action) that takes a command line argument and outputs it to the screen. My original (untestable) function is:
-- In Library.hs
module Library where
import System.Environment (getArgs)
run :: IO ()
run = do
    args <- getArgs
    putStrLn $ head args
After looking at this answer about mocking, I have come up with a way to mock getArgs and putStrLn by using a type class constrained type. So the above function becomes:
-- In Library.hs
module Library where
class Monad m => SystemMonad m where
    getArgs :: m [String]
    putStrLn :: String -> m ()
instance SystemMonad IO where
    getArgs = System.Environment.getArgs
    putStrLn = Prelude.putStrLn
run :: SystemMonad m => m ()
run = do
    args <- Library.getArgs
    Library.putStrLn $ head args
The Library., Prelude. and System.Environment. qualifiers are there to avoid compiler complaints about an Ambiguous Occurrence. My test file looks like the following.
-- In LibrarySpec.hs
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
import Library
import Test.Hspec
import Control.Monad.State
data MockArgsAndResult = MockArgsAndResult [String] String
    deriving(Eq, Show)
instance SystemMonad (State MockArgsAndResult) where
    getArgs = do
        MockArgsAndResult args _ <- get
        return args
    putStrLn string = do
        MockArgsAndResult args _ <- get
        put $ MockArgsAndResult args string
        return ()
main :: IO ()
main = hspec $ do
    describe "run" $ do
        it "passes the first command line argument to putStrLn" $ do
            (execState run (MockArgsAndResult ["first", "second"] "")) `shouldBe` (MockArgsAndResult ["first", "second"] "first")
I'm using a State monad that effectively contains 2 fields:
A list of command line arguments, which the mock getArgs reads from.
A string, in which the mock putStrLn stores whatever was passed to it.
The above code works and seems to test what I want it to test. However, I'm wondering if there is some better / cleaner / more idiomatic way of testing this. For one thing, I'm using the same state both to put stuff into the test (my fake command line arguments) and to get stuff out of it (what was passed to putStrLn).
Is there a better way of doing what I'm doing? I'm more familiar with mocking in a JavaScript environment, and my knowledge of Haskell is pretty basic (I arrived at the above solution by a fair bit of trial and error, rather than actual understanding).
The better way is to avoid needing to provide mock versions of getArgs and putStrLn by separating out the heart of the computation into a pure function.
Consider this example:
main = do
    args <- getArgs
    let n = length $ filter (\w -> length w < 5) args
    putStrLn $ "Number of small words: " ++ show n
One could say that the heart of the computation is counting the number of small words, which is a pure function of type [String] -> Int. This suggests that we should refactor the program like this:
main = do
    args <- getArgs
    let n = countSmallWords args
    putStrLn $ "Number of small words: " ++ show n
countSmallWords :: [String] -> Int
countSmallWords ws = ...
Now we just test countSmallWords, and this is easy because it is a pure function.
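As a rough sketch of what that could look like, the block below fills in countSmallWords with the same filter used in the first version of main, and applies the same split to the run action from the question. The names firstArg and checks are hypothetical, and the checks are plain Booleans rather than hspec cases so that the block needs nothing beyond base.

```haskell
-- Pure core of the example: counts the arguments shorter than five characters.
countSmallWords :: [String] -> Int
countSmallWords = length . filter (\w -> length w < 5)

-- Hypothetical pure core of the question's run action.
-- Unlike head, it does not crash on an empty argument list.
firstArg :: [String] -> Maybe String
firstArg (a : _) = Just a
firstArg []      = Nothing

-- Plain Boolean checks; with hspec each entry would become
-- an `it ... shouldBe` case.
checks :: [Bool]
checks =
  [ countSmallWords [] == 0
  , countSmallWords ["tiny", "enormous", "four"] == 2
  , firstArg ["first", "second"] == Just "first"
  , firstArg [] == Nothing
  ]
```

Neither function touches IO, so no mock instance of getArgs or putStrLn is needed to test them; main only has to pass the real arguments in and print the result.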
The function tally below is really simple: it takes a string s as argument, splits it on non-alphanumeric characters, and tallies the numbers of the resulting "words", case-insensitively.
open Core.Std
let tally s =
let get m k =
match Map.find m k with
| None -> 0
| Some n -> n
in
let upd m k = Map.add m ~key:k ~data:(1 + get m k) in
let re = Str.regexp "[^a-zA-Z0-9]+" in
let ws = List.map (Str.split re s) ~f:String.lowercase in
List.fold_left ws ~init:String.Map.empty ~f:upd
I think this function is harder to read than it should be due to clutter. I wish I could write something closer to this (where I've indulged in some "fantasy syntax"):
(* NOT VALID SYNTAX -- DO NOT COPY !!! *)
open Core.Std
let tally s =
let get m k =
match find m k with
| None -> 0
| Some n -> n ,
upd m k = add m k (1 + get m k) ,
re = regexp "[^a-zA-Z0-9]+" ,
ws = map (split re s) lowercase
in fold_left ws empty upd
The changes I made above fall primarily into three groups:
got rid of the repeated let ... in's and consolidated all the bindings into a ,-separated sequence (this, AFAIK, is not valid OCaml);
got rid of the ~foo:-type noise in function calls;
got rid of the prefixes Str., List., etc.
Can I achieve similar effects using valid OCaml syntax?
Readability is difficult to achieve; it depends heavily on the reader's abilities and familiarity with the code. I'll focus simply on the syntax transformations, but you could perhaps refactor the code into a more compact form, if that is what you are really looking for.
To remove the module qualifiers, simply open them beforehand:
open Str
open Map
open List
You must open them in that order to make sure the List values you are using there are still reachable, and not scope-overridden by the Map ones.
For labelled parameters, you may omit the labels if for each function call you provide all the parameters of the function in the function signature order.
To reduce the number of let...in constructs, you have several options:
Use a set of rec definitions:
let tally s =
let rec get m k =
match find m k with
| None -> 0
| Some n -> n
and upd m k = add m k (1 + get m k)
and re = regexp "[^a-zA-Z0-9]+"
and ws = map lowercase (split re s)
in fold_left ws empty upd
Make multiple definitions at once:
let tally s =
let get, upd, ws =
let re = regexp "[^a-zA-Z0-9]+" in
fun m k ->
match find m k with
| None -> 0
| Some n -> n,
fun g m k -> add m k (1 + g m k),
map lowercase (split re s)
in fold_left ws empty (upd get)
Use a module to group your definitions:
let tally s =
let module M = struct
let get m k =
match find m k with
| None -> 0
| Some n -> n
let upd m k = add m k (1 + get m k)
let re = regexp "[^a-zA-Z0-9]+"
let ws = map lowercase (split re s)
end in fold_left ws empty M.upd
The latter is reminiscent of SML syntax, and is perhaps better suited to optimization by the compiler, but it only gets rid of the in keywords.
Please note that since I am not familiar with the Core API, I might have written incorrect code.
If you have a sequence of computations on the same value, OCaml has a |> operator that takes a value from the left and applies the function on the right to it. This can help you to "get rid of" let and in. As for labeled arguments, you can get rid of them by falling back to the vanilla standard library, which makes your code smaller but less readable. There is also a small piece of sugar for labeled arguments: you can always write f ~key ~data instead of f ~key:key ~data:data. And, finally, module names can be removed either with the local open syntax (let open List in ...) or by locally abbreviating them to shorter names (let module L = List in ...).
Anyway, I would like to show you some code that, in my opinion, contains less clutter:
open Core.Std
open Re2.Std
open Re2.Infix
module Words = String.Map
let tally s =
Re2.split ~/"\\PL" s |>
List.map ~f:(fun s -> String.uppercase s, ()) |>
Words.of_alist_multi |>
Words.map ~f:List.length
Here is a sample program from the RWH book. I'm wondering why the first version works fine but the second doesn't even compile. The only difference is that the first one uses 2 tabs after where mainWith func = do, whereas the second uses only 1. What difference does that make? Why does the second fail to compile? And why would the do construct be empty?
Thanks a lot,
Alex
-- Real World Haskell Sample Code Chapter 4:
-- http://book.realworldhaskell.org/read/functional-programming.html
import System.Environment (getArgs)
interactWith func input output = do
    s <- readFile input
    writeFile output (func s)
main = mainWith myFunction
    where mainWith func = do
                args <- getArgs
                case args of
                    [fin, fout] -> do
                        interactWith func fin fout
                    _ -> putStrLn "error: exactly two arguments needed"
          myFunction = id
-- The following code has a compilation error
-- % ghc --make interactWith.hs
-- [1 of 1] Compiling Main ( interactWith.hs, interactWith.o )
--
-- interactWith.hs:8:26: Empty 'do' construct
import System.Environment (getArgs)
interactWith func input output = do
    s <- readFile input
    writeFile output (func s)
main = mainWith myFunction
    where mainWith func = do
        args <- getArgs
        case args of
            [fin, fout] -> do
                interactWith func fin fout
            _ -> putStrLn "error: exactly two arguments needed"
          myFunction = id
The definition of the mainWith function is indented to column 10:
    where mainWith func = do
          ^
The contents of the do block started in this line are only indented to column 8:
        args <- getArgs
        case args of
        ...
        ^
If you increase the indentation of the contents of the do block to be also indented at least to column 10, the code is parsed correctly. With the current indentation the lines that should belong to the do block are seen to be part of the where clause, but not the mainWith function.
The do-block can not be empty, that's why you get the error. When using only one tab args <- getArgs is seen as part of the where-block, not the do-block, so the do-block is empty and you get an error.
The thing is that unless you use {} and ; to explicitly state where each block begins and ends, Haskell relies on indentation. And since you indented your line by only one level, it was seen as part of the where-block.
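To illustrate the explicit form, here is a small sketch in which braces and semicolons delimit the where, do and case blocks. The function parseArgs is hypothetical, and it runs its do block in the Either monad instead of IO (an assumption made only so that the result is easy to check). The indentation is deliberately irregular to show that it no longer matters once the blocks are delimited explicitly.

```haskell
-- Hypothetical example: explicit braces and semicolons switch the layout
-- rule off inside each block, so the ragged indentation below is accepted.
parseArgs :: [String] -> Either String (FilePath, FilePath)
parseArgs = mainWith
    where { mainWith args = do {
  n <- Right (length args) ;
      case args of {
 [fin, fout] -> Right (fin, fout) ;
        _ -> Left ("error: exactly two arguments needed, got " ++ show n)
  } } }
```

Here parseArgs ["in.txt", "out.txt"] gives Right ("in.txt", "out.txt"), and any other number of arguments gives a Left with the error message. In practice most Haskell code relies on layout, so the simpler fix for the program in the question is to indent the do block further than mainWith.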
Can you create a list of functions and then execute them sequentially, perhaps passing them into do notation?
I'm currently doing this by mapping over a list of data, and I'm wondering whether I can somehow pass the result as a series of sequential calls.
Something like this?
sequence [putStrLn "Hello", putStrLn "World"]
If these are functions, ie pure, then you can use ($) or "apply":
execute functions argument = map ($ argument) functions
-- execute [id,(1+),(1-)] 12 => [12,13,-11]
There's no guarantee that this happens sequentially of course, but you'll get a list of the return values.
If these are actions, ie impure, then what you want is called sequence_:
sequence_ [putStr "Hello", putStr " world", putStrLn "!"]
sequence_ is pretty easy to write yourself:
sequence_ [] = return ()
sequence_ (action:actions) = action >> sequence_ actions
There is also a sequence (without the underscore) that runs a bunch of actions and returns their results:
main = do
    ss <- sequence [readFile "foo.txt", readFile "bar.txt", readFile "baz.txt"]
    -- or ss <- mapM readFile ["foo.txt", "bar.txt", "baz.txt"]
    mapM_ putStr ss -- a do block must end with an expression; here the contents are printed
Good answers so far, but if you also want each function to act not on the original data but on the result of the previous function, look at the folding functions, such as foldl, foldl1, and foldr:
fns = [(1-), (+2), (abs), (+1)]
helperFunction a f = f a
test1 n = foldl helperFunction n fns
You may also need the monadic versions, foldM and foldM_:
import Control.Monad
import Data.Char
helperFunction a f = f a
prnt = \s-> do putStrLn s; return s
actions = [return, prnt, return.reverse, prnt, return.(map toUpper), prnt, return.reverse, prnt]
test2 str = foldM_ helperFunction str actions
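Both folds above can be packaged as small reusable helpers. The sketch below is one way to do it: helperFunction is the same function as flip ($), so it can be inlined, and foldl' is used in place of foldl only to avoid building up thunks. The names applyAll and applyAllM are hypothetical.

```haskell
import Control.Monad (foldM)
import Data.List (foldl')

-- Feeds the result of each function into the next one, left to right.
applyAll :: [a -> a] -> a -> a
applyAll fs x = foldl' (flip ($)) x fs

-- The same pipeline for monadic steps; the effects run in list order.
applyAllM :: Monad m => [a -> m a] -> a -> m a
applyAllM fs x = foldM (flip ($)) x fs
```

For example, applyAll [(1 -), (+ 2), abs, (+ 1)] 3 evaluates to 1, because the intermediate values are -2, 0, 0 and 1. In the Maybe monad, applyAllM stops at the first step that returns Nothing.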