Is declaring anonymous functions expensive in Clojure? - clojure

As Clojure programmers we use lots of anonymous functions without thinking it's cost.
What are the relative costs of creating and using anonymous functions in clojure?

Clojure compiles all functions, anonymous or named, the same way. It then stores a pointer to that function in a namespace (contained in a var) so others can find it later.
There is no cost difference in compile time between functions that are compiled and used as anonymous functions, vs functions that are compiled, then have a pointer to them stored in a var that is part of a namespace.
When anonymous functions used at runtime, most of the time (perhaps always) they are created by making closures (the objects) so the cost of creating them is some memory and a little time.
If you are calling eval in time critical loops of course you can create the same problems in Clojure that you can make in any other language.

Related

forcing a function to be pure

In C++ it is possible to declare that a function is const, which means, as far as I understand, that the compiler ensures the function does not modify the object. Is there something analogous in C++ where I can require that a function is pure? If not in C++, is there a language where one can make this requirement?
If this is not possible, why is it possible to require functions to be const but not require them to be pure? What makes these requirements different?
For clarity, by pure I want there to be no side effects and no use of variables other than those passed into the function. As a result there should be no file reading or system calls etc.
Here is a clearer definition of side effects:
No modification to files on the computer that the program is run on and no modification to variables with scope outside the function. No information is used to compute the function other than variables passed into it. Running the function should return the same thing every time it is run.
NOTE: I did some more research and encountered pure script
(Thanks for jarod42's comment)
Based on a quick read of the wikipedia article I am under the impression you can require functions be pure in pure script, however I am not completely sure.
Short answer: No. There is no equivalent keyword called pure that constrains a function like const does.
However, if you have a specific global variable you'd like to remain untouched, you do have the option of static type myVar. This will require that only functions in that file will be able to use it, and nothing outside of that file. That means any function outside that file will be constrained to leave it alone.
As to "side effects", I will break each of them down so you know what options you have:
No modification to files on the computer that the program is run on.
You can't constrain a function to do this that I'm aware. C++ just doesn't offer a way to constrain a function like this. You can, however, design a function to not modify any files, if you like.
No modification to variables with scope outside the function.
Globals are the only variables you can modify outside a function's scope that I'm aware of, besides anything passed by pointer or reference as a parameter. Globals have the option of being constant or static, which will keep you from modifying them, but, beyond that, there's really nothing you can do that I'm aware.
No information is used to compute the function other than variables passed into it.
Again, you can't constrain it to do so that I'm aware. However, you can design the function to work like this if you want.
Running the function should return the same thing every time it is run.
I'm not sure I understand why you want to constrain a function like this, but no. Not that I'm aware. Again, you can design it like this if you like, though.
As to why C++ doesn't offer an option like this? I'm guessing reusability. It appears that you have a specific list of things you don't want your function to do. However, the likelihood that a lot of other C++ users as a whole will need this particular set of constraints often is very small. Maybe they need one or two at a time, but not all at once. It doesn't seem like it would be worth the trouble to add it.
The same, however, cannot be said about const. const is used all the time, especially in parameter lists. This is to keep data from getting modified if it's passed by reference, or something. Thus, the compiler needs to know what functions modify the object. It uses const in the function declaration to keep track of this. Otherwise, it would have no way of knowing. However, with using const, it's quite simple. It can just constrain the object to only use functions that guarantee that it remains constant, or uses the const keyword in the declaration if the function.
Thus, const get's a lot of reuse.
Currently, C++ does not have a mechanism to ensure that a function has "no side effects and no use of variables other than those passed into the function." You can only force yourself to write pure functions, as mentioned by Jack Bashford. The compiler can't check this for you.
There is a proposal (N3744 Proposing [[pure]]). Here you can see that GCC and Clang already support __attribute__((pure)). Maybe it will be standardized in some form in the future revisions of C++.
In C++ it is possible to declare that a function is const, which means, as far as I understand, that the compiler ensures the function does not modify the object.
Not quite. The compiler will allow the object to be modified by (potentially ill-advised) use of const_cast. So the compiler only ensures that the function does not accidentally modify the object.
What makes these requirements [constant and pure] different?
They are different because one affects correct functionality while the other does not.
Suppose C is a container and you are iterating over its contents. At some point within the loop, perhaps you need to call a function that takes C as a parameter. If that function were to clear() the container, your loop will likely crash. Sure, you could build a loop that can handle that, but the point is that there are times when a caller needs assurance that the rug will not be pulled out from under it. Hence the ability to mark things const. If you pass C as a constant reference to a function, that function is promising to not modify C. This promise provides the needed assurance (even though, as I mentioned above, the promise can be broken).
I am not aware of a case where use of a non-pure function could similarly cause a program to crash. If there is no use for something, why complicate the language with it? If you can come up with a good use-case, maybe it is something to consider for a future revision of the language.
(Knowing that a function is pure could help a compiler optimize code. As far as I know, it's been left up to each compiler to define how to flag that, as it does not affect functionality.)

Call a function in OCaml based on a string variable storing the function name

Is there such a mechanism in OCaml, such that I could invoke a function dynamically based on a variable storing the function name, like what I can do in other scripting languages?
For example, I have written a function foo(). And I store the String constants "foo" somewhere in a variable "x". In JavaScript I'm able to do something like this window[x](arguments); to dynamically invoke the method foo(). Can I do something similar in OCaml?
No, this is not the kind of thing that OCaml lets you do easily. The program definition, including the names of functions and so on, isn't available for the program itself to manipulate.
A simple way to get this effect for a set of functions known ahead of time is to make a dictionary (a hash table or a map, say) of functions using the function name as the key. Note that this will require the functions to have the same type (which is a feature of OCaml not a problem :-).

static functions compiler optimization C++

I know that we should declare a function as static in a class if its a utility function or if we have to use in a singleton class to access private static member.
But apart from that does a static function provide any sort of compiler optimization too since it doesn't pass the "this" pointer? Why not just use the utility function through an already instantiated object of class? Or is it just a best practice to make utility functions as statics?
Thanks in advance.
The "non-passing-this" optimization can probably be done by the compiler once you turn the optimizations on. As I see it, a static function has rather idiomatic uses:
Implementing modules. There is a bit of overlap here with namespaces, and I would rather use namespaces.
Factories: you can make the constructor protected/private and then have several static functions (with different, explicit names) creating instances.
For function pointers: static functions do not require the slightly more complicated syntax of pointer to member function. That can be a plus when interacting with libraries written for C.
To keep yourself from using this and have the compiler to enforce it. It makes sense sometimes. For example, if you have a commutative operation that takes two instances, a static function that takes the two instances would emphasize (in my opinion) that the operation is commutative. Of course in many cases you would rather overload an operator.
In general a static function will ease namespace and "friend" clutter by prefixing an otherwise ordinary function with the name of a class, presumably because both are tightly related.
Static exists to associate a method with a class, rather than either:
Associating it with an instance of that class (like writing a normal, non-static member function).
Keeping it in the global namespace or whatever namespace you would otherwise be in (like declaring a function just in the file, not in a class).
Static says that 'conceptually this is something tied to/associated with this class, but it does not depend on any instance of that class'.
In more formal terms: a static member function is the same as a function declared outside of a class in all ways other than that it is part of that class's namespace and in that it has access rights to that class's private/protected data members.
Going back to your question:
There is no optimization gain here.
Utility function has nothing to do with it. It's whether or not it makes sense to scope the function in the class itself (rather than an instance of it).
It does not 'pass the this pointer' because there is no instance to speak of. You can call a static member function without ever invoking that class's constructor.

Memory management for types in complex languages

I've come across a slight problem for writing memory management with regard to the internal representation of types in a compiler for statically typed, complex languages. Consider a simple snippet in C++ which easily demonstrates a type that refers to itself.
class X {
void f(const X&) {}
};
Types can have nearly infinitely complex relationships to each other. So, as a compiler process, how do you make sure that they are properly collected?
So far, I've decided that garbage collection might be the right way to go, which I wouldn't be too happy with because I want to write the compiler in C++, or alternatively, just leave them and never collect them for the life of the compile phase for which they are needed (which has a very fixed lifetime) and then collect them all afterwards. The problem with that is that if you had a lot of complex types, you could lose a lot of memory that way.
Memory management is easy, just have some table type-name -> type-descriptor for each declaration scopes. Types are uniquely identified by name, no matter how complex the nesting is. Even a recursive type is still only a single type. As tp1 says correctly, you typically perform multiple passes to fill in all blanks. For instance, you might check that a type name is known in the first pass and then compute all links, later on, you compute the type.
Keep in mind that languages like C don't have a really complex type system -- even though they have pointers (which allow for recursive types), there is not much type computation going on.
I think you can remove the cycles from the dependency graph by using separate objects to represent declarations and definitions. Assuming a type system similar to C++, you will then have a hierarchical dependency:
Function definitions depend on type definitions and function declarations
Type definitions depend on function and type declarations (and definitions of contained types)
Function declarations depend on type declarations
In your example, the dependency graph is f_def -> X_def -> f_decl -> X_decl.
With no cycles in the graph, you can manage objects using simple reference counting.

Why and how should I use namespaces in C++?

I have never used namespaces for my code before. (Other than for using STL functions)
Other than for avoiding name conflicts, is there any other reason to use namespaces?
Do I have to enclose both declarations and definitions in namespace scope?
One reason that's often overlooked is that simply by changing a single line of code to select one namespaces over another you can select an alternative set of functions/variables/types/constants - such as another version of a protocol, or single-threaded versus multi-threaded support, OS support for platform X or Y - compile and run. The same kind of effect might be achieved by including a header with different declarations, or with #defines and #ifdefs, but that crudely affects the entire translation unit and if linking different versions you can get undefined behaviour. With namespaces, you can make selections via using namespace that only apply within the active namespace, or do so via a namespace alias so they only apply where that alias is used, but they're actually resolved to distinct linker symbols so can be combined without undefined behaviour. This can be used in a way similar to template policies, but the effect is more implicit, automatic and pervasive - a very powerful language feature.
UPDATE: addressing marcv81's comment...
Why not use an interface with two implementations?
"interface + implementations" is conceptually what choosing a namespace to alias above is doing, but if you mean specifically runtime polymorphism and virtual dispatch:
the resultant library or executable doesn't need to contain all implementations and constantly direct calls to the selected one at runtime
as one implementation's incorporated the compiler can use myriad optimisations including inlining, dead code elimination, and constants differing between the "implementations" can be used for e.g. sizes of arrays - allowing automatic memory allocation instead of slower dynamic allocation
different namespaces have to support the same semantics of usage, but aren't bound to support the exact same set of function signatures as is the case for virtual dispatch
with namespaces you can supply custom non-member functions and templates: that's impossible with virtual dispatch (and non-member functions help with symmetric operator overloading - e.g. supporting 22 + my_type as well as my_type + 22)
different namespaces can specify different types to be used for certain purposes (e.g. a hash function might return a 32 bit value in one namespace, but a 64 bit value in another), but a virtual interface needs to have unifying static types, which means clumsy and high-overhead indirection like boost::any or boost::variant or a worst case selection where high-order bits are sometimes meaningless
virtual dispatch often involves compromises between fat interfaces and clumsy error handling: with namespaces there's the option to simply not provide functionality in namespaces where it makes no sense, giving a compile-time enforcement of necessary client porting effort
Here is a good reason (apart from the obvious stated by you).
Since namespace can be discontiguous and spread across translation units, they can also be used to separate interface from implementation details.
Definitions of names in a namespace can be provided either in the same namespace or in any of the enclosing namespaces (with fully qualified names).
It can help you for a better comprehension.
eg:
std::func <- all function/class from C++ standard library
lib1::func <- all function/class from specific library
module1::func <-- all function/class for a module of your system
You can also think of it as module in your system.
It can also be usefull for an writing documentation (eg: you can easily document namespace entity in doxygen)
Aren't name collisions enough of a reason? ADL subtleties, especially with operator overloads, are another.
That's the easiest way. You can also prefix names with the namespace, e.g. my_namespace::name, when defining.
You can think of namespaces as logical separated units for your application, and logical here means that suppose we have two different classes, putting these two classes each in a file, but when you notice that these classes share something enough to be categorized under one category, that's one strong reason to use namespaces.
Answer: If you ever want to overload the new, placement new, or delete functions you're going to want to do them in a namespace. No one wants to be forced to use your version of new if they don't require the things you require.
Yes