I have Haskell code which needs to interface with a C library somewhat like this:
// MyObject.h
typedef struct MyObject *MyObject;
MyObject newMyObject(void);
void myObjectDoStuff(MyObject myObject);
//...
void freeMyObject(MyObject myObject);
The original FFI code wraps all of these functions as pure functions using unsafePerformIO. This has caused bugs and inconsistencies because the sequencing of the operations is undefined.
What I am looking for is a general way of dealing with objects in Haskell without resorting to doing everything in IO. What would be nice is something where I can do something like:
myPureFunction :: String -> Int
-- create object, call methods, call destructor, return results
Is there a nice way to achieve this?
The idea is to keep passing a baton from each component to force each component to be evaluated in sequence. This is basically what the state monad is (IO is really a weird state monad. Kinda).
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.State
data Baton = Baton -- Hide the constructor!
newtype CLib a = CLib {runCLib :: State Baton a} deriving Monad
And then you just string operations together. Injecting them into the CLib monad will mean they're sequenced. Essentially, you're faking your own IO, in a more unsafe way since you can escape.
Then you must ensure that you add construct and destruct to the end of all CLib chains. This is easily done by exporting a function like
clib :: CLib a -> a
clib m = runCLib $ construct >> m >> destruct
The last big hoop to jump through is to make sure that when you unsafePerformIO whatever's in construct, it actually gets evaluated.
Frankly, this is all kinda pointless since it already exists, battle proven in IO. Instead of this whole elaborate process, how about just
construct :: IO Object
destruct :: IO ()
runClib :: (Object -> IO a) -> a
runClib = unsafePerformIO $ construct >>= m >> destruct
If you don't want to use the name IO:
newtype CLib a = {runCLib :: IO a} deriving (Functor, Applicative, Monad)
My final solution. It probably has subtle bugs that I haven't considered, but it is the only solution so far which has met all of the original criteria:
Strict - all operations are sequenced correctly
Abstract - the library is exported as a stateful monad rather than a leaky set of IO operations
Safe - the user can embed this code in pure code without using unsafePerformIO and they can expect the result to be pure
Unfortunately the implementation is a bit complicated.
E.g.
// Stack.h
typedef struct Stack *Stack;
Stack newStack(void);
void pushStack(Stack, int);
int popStack(Stack);
void freeStack(Stack);
c2hs file:
{-# LANGUAGE ForeignFunctionInterface, GeneralizedNewtypeDeriving #-}
module CStack(StackEnv(), runStack, pushStack, popStack) where
import Foreign.C.Types
import Foreign.Ptr
import Foreign.ForeignPtr
import qualified Foreign.Marshal.Unsafe
import qualified Control.Monad.Reader
#include "Stack.h"
{#pointer Stack foreign newtype#}
newtype StackEnv a = StackEnv
(Control.Monad.Reader.ReaderT (Ptr Stack) IO a)
deriving (Functor, Monad)
runStack :: StackEnv a -> a
runStack (StackEnv (Control.Monad.Reader.ReaderT m))
= Foreign.Marshal.Unsafe.unsafeLocalState $ do
s <- {#call unsafe newStack#}
result <- m s
{#call unsafe freeStack#} s
return result
pushStack :: Int -> StackEnv ()
pushStack x = StackEnv . Control.Monad.Reader.ReaderT $
flip {#call unsafe pushStack as _pushStack#} (fromIntegral x)
popStack :: StackEnv Int
popStack = StackEnv . Control.Monad.Reader.ReaderT $
fmap fromIntegral . {#call unsafe popStack as _popStack#}
test program:
-- Main.hs
module Main where
import qualified CStack
main :: IO ()
main = print $ CStack.runStack x where
x :: CStack.StackEnv Int
x = pushStack 42 >> popStack
build:
$ gcc -Wall -Werror -c Stack.c
$ c2hs CStack.chs
$ ghc --make -Wall -Werror Main.hs Stack.o
$ ./Main
42
Disclaimer: I've never actually worked with C stuff from Haskell, so I am not speaking from experience here.
But what springs to mind for me is to write something like:
withMyObject :: NFData r => My -> Object -> Constructor -> Params -> (MyObject -> r) -> r
You wrap the C++ constructor/destructor as IO operations. withMyObject uses IO to sequence the constructor, calling the user-specified function, calling the destructor, and returning the result. It can then unsafePerformIO that entire do block (as opposed to the individual operations within it, which you've already cooking doesn't work). You need to use deepSeq too (which is why the NFData constraint is there), or laziness could defer the use of the MyObject until after it's been destructed.
The advantages of this is are:
You can write pure MyObject -> r functions using whatever ordinary code you like, no monads required
You can decide to construct a MyObject in order to call such functions in the middle of other ordinary pure code, with the help of withMyObject
You can't forget to call the destructor when you use withMyObject
You can't use the MyObject after calling the destructor on it1
There is only one (small) place in your system where you use unsafePerformIO, and therefore that's the only place you have to carefully worry about whether you've got the sequencing correct to justify that it's safe after all. There's also only one place you have to worry about making sure you use the destructor properly.
It's basically the "construct, use, destruct" pattern with the particulars of the "use" step abstracted out as a parameter so that you can has a single implementation cover every time you need to use that pattern.
The main disadvantage is that it's a bit awkward to construct a MyObject and then pass it to several unrelated functions. You have to bundle them up into a function that returns a tuple of each of the original results, and then use withMyObject on that. Alternatively if you also expose the IO versions of he constructor and destructor separately the user has the option of using those if IO is less awkward than making wrapper functions to pass to withMyObject (but then it's possible for the user to accidentally use the MyObject after freeing it, or forget to free it).
1 Unless you do something silly like use id as the MyObject -> r function. Presumably there's no NFData MyObject instance though. Also that sort of error would tend to come from willful abuse rather than accidental misunderstanding.
Related
Say I have a simple file based database Monad. I would define it as shown below.
newtype MyDbFileBased a = MyDbFileBased {
unMyDbDbFileBased :: ExceptT MyDbFileBasedError (ReaderT MyDbFileBasedEnv IO) a
} deriving (
Functor
, Applicative
, Monad
, MonadError MyDbFileBasedError
, MonadReader MyDbFileBasedEnv
, MonadIO
)
I have read that the above pattern (will post a link to the blog when I find it) is not recommended and that I should replace IO with a generic Monad like so.
import qualified Data.ByteString as B
newtype MyDbFileBased m a = MyDbFileBased {
unMyDbFileBased :: ExceptT MyDbFileBasedError (ReaderT MyDbFileBasedEnv m) a
} deriving (
Functor
, Applicative
, Monad
, MonadError MyDbFileBasedError
, MonadReader MyDbFileBasedEnv
)
class Monad m => MonadFileBasedIO m where
readBytes :: FilePath -> m B.ByteString
writeBytes :: FilePath -> B.ByteString -> m ()
...
instance MonadFileBasedIO IO where
readBytes = B.readFile
writeBytes = B.writeFile
This will supposedly make unit testing easier. The advise is to mock IO in testing with something like as follows.
data MockFS = EmptyDir
| SingleFile FilePath String
deriving (Show)
newtype MockFileBasedIO a = MockFileBasedIO {
unMockFileBasedIO :: State MockFS a
} deriving (
Functor
, Applicative
, Monad
, MonadState MockFS
)
instance MonadFileBasedIO MockFileBasedIO where
readBytes pathReq = do
dir <- get
case dir of
EmptyDir -> fail "file not found"
SingleFile path contents -> if pathReq == path
then pure (BU.fromString contents)
else fail "file not found"
writeBytes path = put . SingleFile path . BU.toString
All this looks good to me until now. But then I want to add things like catch and liftIO to functions within MyDbFileBased type. I thought of adding catchMonadFileBasedIO and liftMonadFileBasedIO functions to MonadFileBasedIO typeclass and set catchMonadFileBasedIO = catch and liftMonadFileBasedIO = liftIO for IO monad. But then it brings in dependency on MonadIO and Exception typeclasses and the compiler tells me to add these typeclasses to function signatures of catchMonadFileBasedIO and liftMonadFileBasedIO. Also I would need to derive MonadIO from MyDbFileBased m. Then what's the point of replacing IO with a generic Monad in the first place?
I don't understand if I should be mocking IO in cases such as these or not. How do I use liftIO and catch if we mock it? Should I not catch exceptions in this module and cascade them to application level?
Then what's the point of replacing IO with a generic Monad in the first place?
As a general piece of advice, that might enable you to replace IO with something pure when testing.
Unit tests ought to be deterministic, which is one of the two characteristics of pure functions. Thus, being able to frame any problem in terms of pure functions makes it intrinsically testable.
In order to make a set of interactions pure and testable, you can, for example, replace m with State, and run your unit tests in the State monad. Here's an example. Here's an example with Writer.
In general, I'd recommend avoiding 'mocking' if possible. In object-oriented programming, this may be a necessary evil to enable testing, but it typically leads to hard-to-maintain code. In functional programming, unit testing is much easier, but it typically requires you to design the modules of your application in a functional style.
Introducing a type class as something equivalent to an object-oriented interface or base class is unlikely to lead to a a functional design. This will pull you towards a programming model where the (impure) interactions are at the centre of your application architecture. That's exactly what makes object-oriented programming so difficult.
In functional programming, you're much better off pushing the impure interactions to the edge of the system. This'll enable you to unit test your (pure) domain logic, while the IO remains concrete.
EDIT to provide more details:
1) The code that provides the libraries cannot be (easily) changed so profile_v1_type and profile_v2_type should be assumed to be immutable.
I have implemented #francescalus suggestion and it works for my small test case, but I was insufficiently clear about the problem, I think. The reason being that I can only modify my code not the code/types coming from the library. The problem will be that both will have t in the profile_type that gets imported which clash with the parent type.
But I am going to implement something where I replicate the contents of the derived type that I want and then use pointers and type-bound procedures to point to the components of the profile_type version that I want to use. It's not as clean as I wanted it to be but it's much better than I have now.
I am supporting a code that interfaces with another that has 2 versions - the two versions are very similar in interface and although the inputs and outputs are identical in property, they are obviously different derived types (they come from different libraries and differ slightly in the variables contained within. Most variable names inside these types are the same though crucially).
It is (apparently) necessary to support both at runtime, otherwise I would preprocess this all at compile time.
At the moment I have lazily copied and pasted the same code for each version (and all of the versions of derived types it uses) into separate subroutines (*_v1.f90, *_v2.f90).
This is annoying and not very maintainable.
What I'd like to be able to do is use some kind of pointer that doesn't care about what it's pointing to (or rather gets its type information from what it points to and is smart enough to know what's inside).
As I said above, the names are mostly the same, e.g. (t, for temperature, say)
From v1 of library:
TYPE profile_v1_type
REAL :: t
! loads of other stuff
END TYPE profile_v1_type
From v2 of library:
TYPE profile_v2_type
REAL :: t
! loads of other stuff, more than the first version
END TYPE profile_v2_type
In my code:
TYPE profile_container_type
TYPE(profile_v1_type) :: profile_v1
TYPE(profile_v2_type) :: profile_v2
! other arrays that are common inputs to both
END TYPE
! It's actually USE'd then allocated and initialised elsewhere, but I hope you get the idea
!USE profile_container_mod, ONLY : profile_container
TYPE(profile_container_type), TARGET :: profile_container
TYPE(*) :: p
REAL :: t1
!Version determined by a namelist
IF (Version == 1) THEN
p => profile_container % profile_v1
ELSE IF (Version == 2) THEN
p => profile_container % profile_v2
ENDIF
t1 = p % t + 1
.
.
.
ifort 19 gives these (expected) errors:
test.f90(24): error #8776: An assumed type object must be a DUMMY argument. [P]
TYPE(*), POINTER :: p
--------------------^
test.f90(24): error #8772: An assumed type object must not have the ALLOCATABLE, CODIMENSION, POINTER, INTENT(OUT) or VALUE attribute. [P]
TYPE(*), POINTER :: p
--------------------^
test.f90(39): error #6460: This is not a field name that is defined in the encompassing structure. [T]
t1 = p % t + 1
---------^
compilation aborted for test.f90 (code 1)
replace TYPE(*) with CLASS(*) gives the (still expected):
test2.f90(39): error #6460: This is not a field name that is defined in the encompassing structure. [T]
t1 = p % t + 1 ! or some clever function...
---------^
compilation aborted for test2.f90 (code 1)
This is fixed by SELECTing the type that you want to handle, but my point is that I want to do the same thing for either the v1 and v2 code (it will never be both). And I want to do it many times, not in this routine but in about a dozen routines.
I am open to using C pointers if the responder is able to provide a simple example to follow. I have tried (not recently) to solve this problem using C interoperability, but obviously without success!
Unlimited polymorphic entities are not the correct approach here.
Instead, we can define a base type which incorporates all of the common data and processing for the various other types. Here, let's call this base type profile_base_type:
type profile_base_type
real t
end type
The other two specific profiles can extend this base:
type, extends(profile_base_type) :: profile_v1_type
! v1 specific parts
end type
type, extends(profile_base_type) :: profile_v2_type
! v2 specific parts
end type
Then we can declare a polymorphic pointer of the base type
class(profile_base_type), pointer :: p
which can point to targets of either of the extending types:
p => profile_container%profile_v1
p => profile_container%profile_v2
Now, we can access the components of p which are in the type profile_base_type
t1 = p%t + 1
without having to use a select type construct.
Naturally, those specific aspects of the extending types cannot be accessed in this way but there are other considerations for that.
visitNode :: Castle -> State (Set Castle) Unit
visitNode c = do
s <- get
guard $ not (member c s)
modify \acc -> insert c s
I have some simple code for visiting nodes represented by a custom datatype. I thought MonadZero control functions like guard are supposed to work within all monad structures (such as State in this case). It gives me the error:
No type class instance was found for
Control.MonadZero.MonadZero Identity
Which I don't understand why MonadZero would not work in this context, but regardless, I attempted to derive the Identity for MonadZero with things like this:
newtype Identity a = Identity a
derive instance newtypeIdentity :: Newtype (Identity a) _
derive newtype instance monadZeroIdentity :: MonadZero Identity
None of which helped or compiled and I'm fairly sure I misunderstand what is wrong here. How do I use guard or any other monadic checks in this context?
What you need here is when, not guard.
guard only works for monads where there is a possibility to not produce a result. One example of such monad would be Maybe, where guard will yield Nothing when the condition is false. Another example would be Array, where guard would yield an empty array when the condition is false. And so on.
In your case, your monad always produces a value, so guard is really irrelevant there.
Instead, if I understood your logic correctly, what you want to do is produce an effect when a condition is true, and skip producing it when the condition is false. This can be accomplished via when or its evil twin unless:
visitNode c = do
s <- get
unless (member c s) $
modify \_-> insert c s
Also note that you're not using the parameter acc under modify. I've replaced it with an underscore, but really, if you're not using the argument, you don't need modify, you need put:
visitNode c = do
s <- get
unless (member c s) $
put (insert c s)
But the next thing to notice is that the pattern of get and then immediately put is exactly what modify is for. So in your case, seeing how there are no effects in between get and put, I would actually put all the logic within modify itself:
visitNode c = modify \s ->
if member c s
then insert c s
else s
The less effectful, the better.
EDIT: This answer tackles issues directly pointed in the question like:
guard usage, MonadPlus context and newtype deriving.
I think that #Fyodor Soikin answer tackles the essence of this problem by replacing guard with when so this answer can be treated as supplementary material.
I think that if you try something like:
visitNode :: Castle -> StateT (Set Castle) Maybe Unit
visitNode c = do
s <- get
guard $ not (member c s)
modify \acc -> insert c s
it should work because Maybe has MonadZero instance and StateT instance depends on this.
Now let's go back and try to resolve some of the problems which you have encountered.
It gives me the error:
No type class instance was found for
Control.MonadZero.MonadZero Identity
This message tells us that Identity has no MonadZero instance. If we check what is a MonadZero we are going to discover that it is a class which implicates that given type has also Monad and Alternative instance and which satisfies the Annihilation law... Identity has no Alternative instance because it require that given type has a Plus instance:
The Plus type class extends the Alt type class with a value that should be the left and right identity for (<|>)
(...)
Members:
empty :: forall a. f a
I think that it is impossible to find any good candidate for an empty (where empty :: ∀ a. f a) value when we have only one constructor Identity ∷ ∀ a. a → Identity a.
For example in case of Maybe we have empty = Nothing and <|> with this value always gives Nothing.
Which I don't understand why MonadZero would not work in this context, but regardless, I attempted to derive the Identity for MonadZero with things like this:
newtype Identity a = Identity a
derive instance newtypeIdentity :: Newtype (Identity a) _
derive newtype instance monadZeroIdentity :: MonadZero Identity
When you are using newtype deriving you are telling the compiler that instance for your newtype should use "inner type" instance as an implementation. In this case you have only a type parameter and there is no "underlyning" instance at hand.
I think that if you want to use such a deriving you have to use concrete type which instances you want to use. For example here we are deriving Functor for our type MaybeWrapper which uses Maybe instance to provide appropriate members implementation (map in this case):
newtype MaybeWrapper a = MaybeWrapper (Maybe a)
derive instance newtypeMaybeWrapper :: Newtype (MaybeWrapper a) _
derive newtype instance functorMaybeWrapper :: Functor MaybeWrapper
Happy PureScript Hacking!
Assume now I have a type data in a module (called module A)
type::data
endtype
The definition left empty. In future I will extend it to any type,for instant:
type,extends(data)::newdata
...(something general data )
endtype
Now what I want is to define a process in module A to copy newdata type. Is it possible? Something as following:
subroutine copyBtoA(A,B)
class(data),pinter::A
class(data)::B
A <- B
endsubroutine
Even though we do not know what the type newdata looks like, here A is a pointer and I guess we can allocate it according to B somehow?
To be more clear, what I want to do is define a list structure. Since it is a structure, the type data is not defined yet (in module A). However I want to define a function to copy data type. The point is that I want to copy a data which is NOT defined yet. Logically it looks little strange.
When the subroutine copyBtoA is called, the input A and B, in fact, is of type newdata. And here in moduleA A is not allocated yet. If in the definition I use Allocate(A), I think it is of type data. Then A=B is impossible. If I use select type, then I do not know the type yet.
!---------------------------example code---------------
module moduleA
type::data
endtype
contains
subroutine copyBtoA(A,B)
type(data),allocatable::A
type(data)::B
!-------------------
A = B
endsubroutine
endmodule
module moduleB
use moduleA
type,extends(data)::newdata
real::something
endtype
endmodule
program main
use moduleB
!
type(newdata),allocatable::data2
type(newdata)::data1
data1%something = 1.
! call copyBtoA(data2,data1)
endprogram
With non-polymorphic variables there is really no problem, at least with the program you show.
Maybe your code does not really show the real intended usage after all? You first spoke about some list structure and in the end you only show some simple program...
You can just use the default assignment
type(newdata),allocatable::data2
type(newdata)::data1
data1%something = 1.
data2 = data1
In Fortran 2008 you you can do that even with polymorphics, that means you do NOT directly see type(newdata)
class(data),allocatable::data1,data2
class(data)::data1
allocate(newdata::data1)
data2 = data1
is enough.
In gfortran this is not yet supported. You have to do
class(data),allocatable::data1,data2
class(data)::data1
allocate(newdata::data1)
allocate(data2, source = data1)
There is always a possibility to override the assignment by a user defined subroutine, but it is not really necessary here. You would likely have to define the assignment also for type newdata.
But remember, all this is copying the value. In your very first code snippet you had a pointer. With pointers you often do pointer assignment => and that is something else.
Suppose we have a module that defines an abstract type T:
module AbstractType (T, lexer) where
data T = T String deriving Show
lexer = (fmap T) . words
(Note that we do not export any type constructors for T, so the user would not be able to draft an instance by hand.)
How does one unit test lexer function?
Sure we may use the Show property of T, like this:
module Main where
import AbstractType
main = test
(show $ lexer "summer is miles and miles away")
"[T \"summer\",T \"is\",T \"miles\",T \"and\",T \"miles\",T \"away\"]"
test :: (Eq a) => a -> a -> IO ()
test expression expectation
| expression == expectation = putStrLn "Test passed."
| otherwise = error "Test failed."
— But this is both not beautiful and unfit for cases when our abstract type is not an instance of a class that permits casting to another, constructable type.
Is there a remedy?
P.S. To provide some justification for the case: suppose we have a chain of functions like parser . lexer that we can integration test and see if the whole of it works. As the chain at hand gets more complex, it may nevertheless become desirable to unit test each link individually.
The example is a simplified excerpt from an actual toy text processor I am in the process of writing.
The generally accepted best practice is, for an exposed module A, to create an internal module A.Internal that is either:
Exposed but documented to be unstable or unsafe.
Not exposed to the users of the package, but only to the testing facilities. (This is made possible by the internal libraries feature released in Cabal 2.0.)
It is my understanding that functions that are not exposed enjoy more radical optimizations, particularly inlining. I am not sure if it applies to functions in internal libraries too.
On the other hand, situations often arise when a user desperately needs some internal feature of your library and ends up forking and patching it to gain access. This is, of course, unfortunate and undesirable.
I would say generally that the implementation of an abstract type is best kept in an internal library as a safety measure, but you should use your judgement in each particular case.