Why do we have to export the function used by spawn?

In Erlang, when dealing with processes, you have to export the function used in the spawn call:
-module(echo).
-export([start/0, loop/0]).

start() ->
    spawn(echo, loop, []).
The reason given in the book "Programming Erlang", 2nd Edition, page 188, is:
"Note that we also have to export the argument of spawn from the module. This is a good practice because we will be able to change the internal details of the server without changing the client code.".
And in the book "Erlang Programming", page 121:
-module(frequency).
-export([start/0, stop/0, allocate/0, deallocate/1]).
-export([init/0]).

%% These are the start functions used to create and
%% initialize the server.
start() ->
    register(frequency, spawn(frequency, init, [])).

init() ->
    Frequencies = {get_frequencies(), []},
    loop(Frequencies).
Remember that when spawning a process, you have to export the init/0 function as it is used by the spawn/3 BIF. We have put this function in a separate export clause to distinguish it from the client functions, which are supposed to be called from other modules.
Would you please explain to me the logic behind that reason?

The short answer is: spawn is not a language construct, it's a library function.
That means spawn lives in another module, which has no access to any functions in your module except the exported ones.
You have to pass spawn some way to start your code. It can be a function value (i.e. spawn(fun() -> ... end), where the body can be any code you want, including calls to local functions), or a module/exported function name/arguments triple, which is visible from other modules.

The logic is quite straightforward. Yet confusion can easily arise because:
export does not exactly match object-oriented encapsulation, and especially public methods;
several common patterns require exporting functions that are not meant to be called by regular clients.
What export really does
Export has a very strict meaning: exported functions are the only functions that can be referred to by their fully qualified name, i.e. by module, function name and arity.
For example:
-module(m).
-export([f/0]).

f() -> foo.
f(_Arg) -> bar.
g() -> foobar.
You can call the first function with an expression such as m:f() but this wouldn't work for the other two functions. m:f(ok) and m:g() will fail with an error.
For this reason, the compiler will warn in the example above that f/1 and g/0 are not called and cannot be called (they are unused).
Functions can nevertheless be called from outside a module without being exported: functions are values, and you can take a reference to a local function (within the module) and pass this value outside. For example, you can spawn a new process using a non-exported function, with spawn/1. You could rewrite your example as follows:
start() ->
    spawn(fun loop/0).
This doesn't require exporting loop/0. In other editions of Programming Erlang, Joe Armstrong explicitly suggests transforming the code as above to avoid exporting loop/0.
Common patterns requiring an export
Because exports are the only way to refer to a function by name from outside a module, there are two common patterns that require exported functions even if those functions are not part of a public API.
The example you mention occurs whenever you want to call a library function that takes an MFA, i.e. a module, a function name and a list of arguments. These library functions refer to the function by its fully qualified name. In addition to spawn/3, you might encounter timer:apply_after/4.
Likewise, you can write functions that take MFA arguments, and call the function using apply/3.
Sometimes, there are variants of these library functions that directly take a 0-arity function value. This is the case with spawn, as mentioned above. apply/1 doesn't make sense as you would simply write F().
The other common case is behavior callbacks, and especially OTP behaviors. In this case, you will need to export the callback functions which are of course referred to by name.
Good practice is to use separate export attributes for these functions to make it clear these functions are not part of the regular interface of the module.
Exports and code change
There is a third common case for using exports beyond a public API: code changes.
Imagine you are writing a loop (e.g. a server loop). You would typically implement it as follows:
-module(m).
-export([start/0]).

start() -> spawn(fun() -> loop(state) end).

loop(State) ->
    NewState = receive
                   ...
               end,
    loop(NewState). % not updatable!
This code cannot be updated: since the loop only performs local calls, it never leaves the old version of the module. The proper way is to export loop/1 and perform a fully qualified call:
-module(m).
-export([start/0]).
-export([loop/1]).

start() -> spawn(fun() -> loop(state) end).

loop(State) ->
    NewState = receive
                   ...
               end,
    ?MODULE:loop(NewState).
Indeed, when you refer to an exported function by its fully qualified name, the lookup is always performed against the latest version of the module. So this trick allows the loop to jump to the newer version of the code at every iteration. Code updates are actually quite complex, and OTP, with its behaviors, does it right for you. It typically uses the same construct.
Conversely, when you call a function passed as a value, this is always from the version of the module that created this value. Joe Armstrong argues this is an advantage of spawn/3 over spawn/1 in a dedicated section of his book (8.10, Spawning with MFAs). He writes:
Most programs we write use spawn(Fun) to create a new process. This is fine provided we don’t want to dynamically upgrade our code. Sometimes we want to write code that can be upgraded as we run it. If we want to make sure that our code can be dynamically upgraded, then we have to use a different form of spawn.
This is far-fetched: when you spawn a new process, it starts immediately, and an update is unlikely to occur between the moment the function value is created and the start of the new process. Besides, Armstrong's statement is partly untrue: to make sure the code can be dynamically upgraded, spawn/1 works as well (cf. the example above); the trick is not to use spawn/3, but to perform a fully qualified call within the loop (Joe Armstrong describes this in another section). spawn/3 has other advantages over spawn/1.
Still, the difference between passing a function by value and by name explains why there is no version of timer:apply_after/4 that takes a function value: since there is a delay, the function value might be old when the timer fires. Such a variant would actually be dangerous, because at most two versions of a module can be loaded at a time: the current one and the old one. If you reload a module more than once, processes trying to call even older versions of the code will be killed. For this reason, you would often prefer MFAs and their exports to function values.

When you do a spawn you create a completely new process with its own environment and thread of execution. This means that you are no longer executing "inside" the module where the spawn is called, so you must make an "outside" call into the module. The only functions in a module which can be called from the "outside" are exported functions, hence the spawned function must be exported.
It might seem a little strange seeing as you are spawning a function in the same module, but this is why.
I think it is important to remember that a module is just code and does not contain any deeper meaning than that, for example like a class in an OO language. So even if you have functions from the same module being executed in different processes, a very common occurrence, then there is no implicit connection between them. You still have to send messages between processes even if it is from/to functions in the same module.
EDIT:
About the last part of your question, with the quote about putting init/0 in a separate export declaration: there is no need to do this and it has no semantic significance. You can use as many or as few export declarations as you wish; you could put all the functions in one export declaration or have a separate one for each function. It makes no difference.
The reason to split them is purely visual and for documentation purposes. You typically group functions which go together into separate export declarations to make it easier to see that they form a group. You also typically put "internal" exported functions, functions which aren't meant for the user to call directly, in a separate export declaration. In this case init/0 has to be exported for the spawn but is not meant to be called directly outside the spawn.
Having the user call the start/0 function to start the server, rather than explicitly spawning the init/0 function themselves, allows you to change the internals as you wish later on. The user only sees the start/0 function. Which is what the first quote is trying to say.

If you're wondering why you have to export anything and not have everything visible by default, it's because it's clearer to the user which functions they should call if you hide all the ones they shouldn't. That way, if you change your mind on the implementation, people using your code won't notice. Otherwise, there may be someone who is using a function that you want to change or eliminate.
For example, say you have a module:
-module(somemod).

useful() ->
    helper().

helper() ->
    i_am_helping.
And you want to change it to:
-module(somemod).

useful() ->
    betterhelper().

betterhelper() ->
    i_am_helping_more.
If people should only be calling useful, you should be able to make this change. However, if everything were exported, people might be depending on helper when they shouldn't be, and this change would break their code when it shouldn't.

Related

Per-entity Lua scripts in games?

I'm using Lua for scripts in my C++ game. I want to be able to attach scripts to entities, and based on which functions are defined in the script, register for callbacks which will run the script functions at the appropriate time.
I believe that I can encapsulate different scripts from each other, by making the "script" into a table. Basically, ... lua source code ... would become ScriptName = { ... lua source code ... }. Then instead of calling func(), I'd call ScriptName.func(), and thus two scripts defining the same function (aka registering for the same event) wouldn't trample over each other.
My problem now is in encapsulating different entities sharing the same script. Obviously I don't want them to be sharing variables, but with what I'm doing now, any variable defined by a script would be shared by every instance of that script, which is just really bad. I could maybe try something similar to my above solution on the source level, by wrapping every script with EntityID.ScriptName = { ... } before compiling it. Something tells me there's a better way, though, I just don't know it.
Another factor is that scripts need to be able to reference entities and scripts/components relative to a specific entity. If I use the above method the best solution to this would be passing entity IDs around as strings which could reference the table specific to that entity, I think? At this point I really have no idea what I'm doing.
In order for a script to interact with a C++ object, the typical approach is to have the C++ code expose the object to Lua as a userdata (a wrapper for a pointer) and provide C++ functions that the script can call, passing the userdata as a parameter. In the C++ code, that userdata gives you the object that the function should operate on. It's equivalent to a "this" pointer.
You usually do this by putting the C++ functions into a metatable associated with the userdata, so they can be called like methods in the Lua code (i.e. objectIGotFromCpp:someMethod('foo')).
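Here's a minimal sketch of that approach (the Entity type and the set_name method are invented for illustration):

// Sketch only: Entity and set_name are invented names, not from your code.
#include <lua.hpp>
#include <string>

struct Entity { std::string name; };

// Method callable from Lua as entity:set_name("foo")
static int entity_set_name(lua_State* L) {
    Entity* e = *static_cast<Entity**>(luaL_checkudata(L, 1, "Entity"));
    e->name = luaL_checkstring(L, 2);
    return 0;  // number of return values
}

// Push a userdata wrapping the C++ pointer, with an "Entity" metatable
void push_entity(lua_State* L, Entity* e) {
    *static_cast<Entity**>(lua_newuserdata(L, sizeof(Entity*))) = e;
    if (luaL_newmetatable(L, "Entity")) {   // created only the first time
        lua_pushvalue(L, -1);
        lua_setfield(L, -2, "__index");     // metatable.__index = metatable
        lua_pushcfunction(L, entity_set_name);
        lua_setfield(L, -2, "set_name");
    }
    lua_setmetatable(L, -2);                // userdata stays on the stack
}

C++ pushes each entity with push_entity, and a script can then call entity:set_name('foo') on whatever entity it was handed.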
ScriptName.func(), and thus two scripts defining the same function (aka registering for the same event) wouldn't trample over each other.
Rather than relying on accessing globals or naming conventions, etc., it's much cleaner to simply provide a callback that Lua scripts can use to register for events, as in the sketch below.
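For instance, a minimal sketch of such a registration hook (all names invented):

// Sketch only: expose register_event to Lua and keep each callback
// in the registry.
#include <lua.hpp>
#include <map>
#include <string>

static std::map<std::string, int> g_handlers;   // event name -> registry ref

// Called from Lua as: register_event("on_update", function() ... end)
static int register_event(lua_State* L) {
    std::string name = luaL_checkstring(L, 1);
    luaL_checktype(L, 2, LUA_TFUNCTION);
    g_handlers[name] = luaL_ref(L, LUA_REGISTRYINDEX);  // pops the function
    return 0;
}

// Fire an event from the C++ side
void fire_event(lua_State* L, const std::string& name) {
    auto it = g_handlers.find(name);
    if (it == g_handlers.end()) return;
    lua_rawgeti(L, LUA_REGISTRYINDEX, it->second);      // push the callback
    lua_call(L, 0, 0);
}

// During setup: lua_register(L, "register_event", register_event);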
If I use the above method the best solution to this would be passing entity IDs around as strings
No reason to. An entity in your C++ code is a pointer to an object on the heap. You can pass that pointer directly to Lua as userdata. Lua can pass it back to your C++ code, giving you direct access to the object, rather than going through some object-to-ID mapping.

Getting a reference to a Node module and working with it in a separate thread

Assuming I have 2 different sources:
node_module.cc
threaded_class.cc
node_module.cc is where I am calling NODE_MODULE to initialize my module. This module has a function that makes an instance of threaded_class.cc (in a separate thread). I understand that I need to use Lockers and Isolates to access v8 in a separate thread but my issue is bigger than that.
The NODE_MODULE function is, from my understanding, my only chance to catch the module's instance. I found this article that uses a piece of code that is exactly what I am looking for. The author stores the module handle in a persistent object like this:
auto module_handle = Persistent<Object>::New(target);
But this either seems deprecated or not possible anymore. However I figured that it can be achieved like this:
auto module_handle = Persistent<Object>(context->GetIsolate() ,target);
However, with the latter, when I try to access its properties, I mostly find private methods and properties, nothing worth using, or I don't know how to use them.
My question is: is there any updated guide on how to properly handle this kind of thing when writing a Node module? Or can you show me an example of how I can pass my latter module_handle to my thread and use it, for example, to execute a js function called test?
I also want to know, what is the difference between NODE_MODULE and NODE_MODULE_CONTEXT_AWARE when initializing a node module?

Putting all code of a module behind 1 interface. Good idea or not?

I have several modules (mainly C) that need to be redesigned (using C++). Currently, the main problems are:
many parts of the application rely on the functions of the module
some parts of the application might want to overrule the behavior of the module
I was thinking about the following approach:
redesign the module so that it has a clear, modern class structure (using interfaces, inheritance, STL containers, ...)
writing a global module interface class that can be used to access any functionality of the module
writing an implementation of this interface that simply maps the interface methods to the correct methods of the correct class in the module
Other modules in the application that currently use the C functions of the module directly should be passed [an implementation of] this interface. That way, if the application wants to alter the behavior of one of the functions of the module, it simply inherits from this default implementation and overrules any function that it wants.
An example:
Suppose I completely redesign my module so that I have classes like: Book, Page, Cover, Author, ... All these classes have lots of different methods.
I make a global interface, called ILibraryAccessor, with lots of pure virtual methods
I make a default implementation, called DefaultLibraryAccessor, that simply forwards all methods to the correct method of the correct class, e.g.
DefaultLibraryAccessor::printBook(book) calls book->print()
DefaultLibraryAccessor::getPage(book,10) calls book->getPage(10)
DefaultLibraryAccessor::printPage(page) calls page->print()
Suppose my application has 3 kinds of windows
The first one allows all functionality and as an application I want to allow that
The second one also allows all functionality (internally), but from the application I want to prevent printing separate pages
The third one also allows all functionality (internally), but from the application I want to prevent printing certain kinds of books
When constructing the window, the application passes an implementation of ILibraryAccessor to the window
The first window will get the DefaultLibraryAccessor, allowing everything
I will pass a special MyLibraryAccessor to the second window, and in MyLibraryAccessor, I will overrule the printPage method and let it fail
I will pass a special AnotherLibraryAccessor to the third window, and in AnotherLibraryAccessor, I will overrule the printBook method and check the type of book before I will call book->print().
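A minimal sketch of this arrangement, using the names from the example above (class bodies reduced to stubs):

// Sketch only: bodies are placeholders.
#include <iostream>
#include <stdexcept>

struct Page { void print() { std::cout << "page\n"; } };
struct Book { void print() { std::cout << "book\n"; } };

class ILibraryAccessor {
public:
    virtual ~ILibraryAccessor() = default;
    virtual void printBook(Book& book) = 0;
    virtual void printPage(Page& page) = 0;
};

// Forwards every call to the correct method of the correct class
class DefaultLibraryAccessor : public ILibraryAccessor {
public:
    void printBook(Book& book) override { book.print(); }
    void printPage(Page& page) override { page.print(); }
};

// Passed to the second window: printing separate pages is forbidden
class MyLibraryAccessor : public DefaultLibraryAccessor {
public:
    void printPage(Page&) override {
        throw std::runtime_error("printing separate pages is not allowed");
    }
};

The window simply receives an ILibraryAccessor& and never knows which concrete accessor it got.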
The advantage of this approach is that, as shown in the example, an application can overrule any method it wants to overrule. The disadvantage is that I get a rather big interface, and the class-structure is completely lost for all modules that wants to access this other module.
Good idea or not?
You could represent the class structure with nested interfaces. E.g. instead of DefaultLibraryAccessor::printBook(book), have DefaultLibraryAccessor::Book::print(book). Otherwise it looks like a good design to me.
Maybe look at the design pattern called "Facade". Use one facade per module. Your approach seems good.
ILibraryAccessor sounds like a known anti-pattern, the "god class".
Your individual windows are probably better off inheriting and overriding at Book/Page/Cover/Author level.
The only thing I'd worry about is a loss of granularity, partly addressed by suszterpatt previously. Your implementations might end up being rather heavyweight and inflexible. If you're sure that you can predict the future use of the module at this point then the design is probably ok.
It occurs to me that you might want to keep the interface fine-grained, but find some way of injecting this kind of display-specific behaviour rather than trying to incorporate it at top level.
If you have n methods in your interface class, and there are m behaviors per method, you get m * (nC1 + nC2 + ... + nCn) = m * (2^n - 1) implementations of your interface (I hope I got my math right :) ). Compare this with the m * n implementations you need if you were to have a single interface per function. And this method has added flexibility, which is more important. So, no - I don't think a single interface would do. But you don't have to be extreme about it.
EDIT: I am sure the math is wrong. :(

C/C++ Dynamic loading of functions with unknown prototype

I'm in the process of writing a kind of runtime system/interpreter, and one of the things that I need to be able to do is call C/C++ functions located in external libraries.
On Linux I'm using the dlfcn.h functions to open a library and call a function located within. The problem is that, when using dlsym(), the function pointer returned needs to be cast to an appropriate type before being called so that the function arguments and return type are known; however, if I'm calling some arbitrary function in a library, then obviously I will not know this prototype at compile time.
So what I'm asking is: is there a way to call a dynamically loaded function, pass it arguments, and retrieve its return value without knowing its prototype?
So far I've come to the conclusion there is no easy way to do this, but some workarounds that I've found are:
Ensure all the functions I want to load have the same prototype, and provide some sort of mechanism for these functions to retrieve parameters and return values. This is what I am doing currently (a sketch appears below this list).
Use inline asm to push the parameters onto the stack, and to read the return value. I really want to steer clear of doing this if possible!
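For reference, a rough sketch of that first workaround (the library name, symbol name and shared prototype are all invented):

// Sketch only: libplugin.so, do_work and the prototype are made up.
#include <dlfcn.h>
#include <cstddef>
#include <cstdio>

using plugin_fn = int (*)(const char* in, char* out, std::size_t outlen);

int main() {
    void* lib = dlopen("./libplugin.so", RTLD_NOW);
    if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    // The cast is exactly where the agreed-on prototype comes in
    auto fn = reinterpret_cast<plugin_fn>(dlsym(lib, "do_work"));
    if (!fn) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    char buf[256];
    int rc = fn("hello", buf, sizeof buf);
    std::printf("rc=%d out=%s\n", rc, buf);
    dlclose(lib);
}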
If anyone has any ideas then it would be much appreciated.
Edit:
I have now found exactly what I was looking for:
http://sourceware.org/libffi/
"A Portable Foreign Function Interface Library"
(Although I’ll admit I could have been clearer in the original question!)
What you are asking for is if C/C++ supports reflection for functions (i.e. getting information about their type at runtime). Sadly the answer is no.
You will have to make the functions conform to a standard contract (as you said you were doing), or start implementing mechanics for trying to call functions at runtime without knowing their arguments.
Since having no knowledge of a function makes it impossible to call it, I assume your interpreter/"runtime system" at least has some user input or similar it can use to deduce that it's trying to call a function that will look like something taking those arguments and returning something not entirely unexpected. That lookup is hard to implement in itself, even with reflection and a decent runtime type system to work with. Mix in calling conventions, linkage styles, and platforms, and things get nasty real soon.
Stick to your plan, enforce a well-defined contract for the functions you load dynamically, and hopefully make do with that.
Can you add a dispatch function to the external libraries, e.g. one that takes a function name and N (optional) parameters of some sort of variant type and returns a variant? That way the dispatch function prototype is known. The dispatch function then does a lookup (or a switch) on the function name and calls the corresponding function.
Obviously it becomes a maintenance problem if there are a lot of functions.
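Something along these lines, sketched (the variant type and the function names are only illustrative):

// Sketch only: a single dispatch entry point with a known prototype.
#include <cmath>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

using Variant = std::variant<double, std::string>;

// The one function the external library needs to export
Variant dispatch(const std::string& name, const std::vector<Variant>& args) {
    if (name == "sqrt")
        return std::sqrt(std::get<double>(args.at(0)));
    if (name == "concat")
        return std::get<std::string>(args.at(0)) +
               std::get<std::string>(args.at(1));
    throw std::runtime_error("unknown function: " + name);
}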
I believe the Ruby FFI library achieves what you are asking. It can call functions in external dynamically linked libraries without specifically linking them in.
http://wiki.github.com/ffi/ffi/
You probably can't use it directly in your scripting language, but perhaps the ideas are portable.
--
Brad Phelan
http://xtargets.heroku.com
I'm in the process of writing a kind of runtime system/interpreter, and one of things that I need to be able to do is call c/c++ functions located in external libraries.
You can probably check how Tcl and Python do that for examples. If you are familiar with Perl, you can also check Perl XS.
The general approach is to require an extra gateway library sitting between your interpreter and the target C library. From my experience with Perl XS, the main reasons are memory management/garbage collection and the C data types, which are hard or impossible to map directly onto the interpreter's language.
So what I'm asking is: is there a way to call a dynamically loaded function, pass it arguments, and retrieve its return value without knowing its prototype?
None known to me.
Ensure all the functions I want to load have the same prototype, and provide some sort of mechanism for these functions to retrieve parameters and return values. This is what I am doing currently.
This is what another team in my project is doing too. They have standardized the API for external plug-ins on something like this:
typedef std::list< std::string > string_list_t;
string_list_t func1(string_list_t stdin, string_list_t &stderr);
Common tasks for the plug-ins are to perform transformation, mapping or expansion of the input, often using an RDBMS.
Previous versions of the interface grew unmaintainable over time, causing problems to customers, product developers and 3rd-party plug-in developers alike. The frivolous use of std::string is allowed by the fact that the plug-ins are called relatively seldom (and the overhead is still peanuts compared to the SQL used all over the place). The stdin argument is populated with input depending on the plug-in type. A plug-in call is considered failed if any string in the output parameter stderr starts with 'E:' ('W:' is for warnings; the rest is silently ignored and thus can be used for plug-in development/debugging).
dlsym is used only once, on a function with a predefined name, to fetch from the shared library an array with the function table (public function name, type, pointer, etc.).
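A rough sketch of that scheme (the table layout and entry-point name here are invented, not the actual project code):

// Sketch only: one well-known symbol returns the whole function table.
#include <dlfcn.h>
#include <cstdio>

struct FuncEntry {
    const char* name;             // public name of the plug-in function
    int (*fn)(const char* arg);   // uniform prototype for every entry
};

// The plug-in exports a single well-known symbol returning its table,
// terminated by an entry with a null name.
using get_table_fn = const FuncEntry* (*)();

int main() {
    void* lib = dlopen("./libplugin.so", RTLD_NOW);
    if (!lib) return 1;

    auto get_table =
        reinterpret_cast<get_table_fn>(dlsym(lib, "get_function_table"));
    if (!get_table) return 1;

    for (const FuncEntry* e = get_table(); e->name; ++e)
        std::printf("%s -> %d\n", e->name, e->fn("hello"));

    dlclose(lib);
}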
My solution is to define a generic proxy function which converts the dynamic function to a uniform prototype, something like this:
#include <functional>
#include <string>

using result = std::function<std::string(std::string)>;

template <class F>
result proxy(F func) {
    // A full version would use type traits to inspect F's signature;
    // this sketch assumes a numeric function such as double(double).
    return [func](std::string arg) {
        return std::to_string(func(std::stod(arg)));
    };
}
In the user-defined file, you must add a definition to do the conversion:
double foo(double a) { /*...*/ }
auto local_foo = proxy(foo);
In your runtime system/interpreter, you can then use dlsym to obtain the foo function. It is the user-defined function foo's responsibility to do the calculation.

Calling unexported functions in Win32 C++

How would I go about calling an unexported function in Win32 C++?
Calling unexported functions that are defined in the same module (DLL/EXE) as your code is easy: just call them like any other C++ function. Obviously this isn't what you're asking about. If you want to call unexported functions in a different module, you need to find out their addresses somehow.
One way to do this is to have the first module call an exported function in the second module which returns a function pointer. (Or: a struct containing function pointers, a pointer to an instance of a class, etc.) Think factory pattern.
Another way is to export a registration function from the first module and have the second module's initialization code call it, passing it pointers to unexported functions along with some sort of identifying info. (Better also have a corresponding unregistration function which is called before the second module is unloaded.)
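For illustration, a bare-bones sketch of the factory variant (all names invented); this is the second module's side:

// Sketch only: the only exported symbol hands out pointers to the
// unexported functions.
#include <cstdio>

struct Callbacks {
    int  (*compute)(int);
    void (*log)(const char*);
};

static int  compute_impl(int x)       { return x * 2; }    // not exported
static void log_impl(const char* msg) { std::puts(msg); }  // not exported

extern "C" __declspec(dllexport) const Callbacks* GetCallbacks() {
    static const Callbacks cb = { compute_impl, log_impl };
    return &cb;
}

The first module then does GetProcAddress(hDll, "GetCallbacks") once and calls the unexported functions through the returned struct.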
Yet another way is to grovel through the debug symbols using dbghelp.dll. This would not be recommended for a real-world application because it would require distributing debug symbols and would be extremely slow, not to mention overly complex.
In addition to bk1e's answer, there's still another method (not recommended either).
Obtain the relative address of that function in the DLL (e.g. via disassembly). This has to be done manually and before compiling.
In the program, you now have to obtain the start address of the DLL in memory (for example using an exported function and some calculation).
Now you can directly call that function using the relative address of the function + the start address of the DLL.
I don't recommend this, though. It works only on one specific version of that DLL. Any recompile and the address may change, or the function may no longer be needed and get deleted. There must be a reason why this function is NOT exported. In general, you are trying to achieve something the author of the library intentionally did not want you to do, and that's "evil" most of the time.
You mentioned the IDA name. This name includes the start address.
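For illustration, such a call ends up looking something like this (the DLL name and the 0x1234 offset are made up; a real offset has to come from disassembly of the exact build):

// Sketch only: target.dll and the offset are hypothetical.
#include <windows.h>
#include <cstdint>

using hidden_fn = int (__cdecl*)(int);

int call_hidden(int arg) {
    HMODULE mod = GetModuleHandleA("target.dll");   // DLL already loaded
    if (!mod) return -1;

    // module base + relative address observed in the disassembler
    auto base = reinterpret_cast<std::uintptr_t>(mod);
    auto fn = reinterpret_cast<hidden_fn>(base + 0x1234);
    return fn(arg);   // breaks on any other build of the DLL
}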
No two ways about it, you'll have to study the disassembly to figure out what gets pushed on the stack, and how it's used to determine the types.