D naming conventions: How is Phobos organized?

I'm making my own little library of handy functions and I'm trying to follow Phobos's naming convention but I'm getting really confused. How do I know where things would fit?
Example:
If there was a function like foldRight in Phobos (basically reduce in the reverse direction), which module would I find it in?
I can think of several:
std.algorithm: Because it's expressing an algorithm
std.array: Because I'm likely going to use it on arrays
std.container: Because it's used on containers, rather than single objects
std.functional: Because it's used mainly in functional programming
std.range: Because it operates on ranges as well
but I have no idea which one would be a good choice -- I could give a convincing argument for at least 3 of them.
What's the convention?

std.algorithm: yes, and you can implement it as reduce!fun(retro(r)) (see the sketch after this list)
this module specifies algorithms that run on sequences
std.array: no, because it can also run on other ranges
these are helper functions that run only on built-in arrays
std.container: no, because it doesn't define a data structure (like a tree set)
this defines data structures that are not built into the language (for now a linked list, a binary tree, and an array with deterministic memory management)
std.functional: no, because it doesn't operate on a function but on a range
this one takes a function and returns a different one
std.range: no, because it doesn't define a range or provide a different way to iterate over one
The lack of a clear structure is one of my gripes with the Phobos library, to be honest, but reading the first paragraph of each module's docs should tell you quite a bit about where to put a function.
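To make the std.algorithm suggestion concrete: a fold from the right is just a fold from the left over a reversed view, which is exactly what reduce!fun(retro(r)) composes. A minimal sketch of the same idea in C++ terms (not D), using std::accumulate over reverse iterators:

#include <numeric>
#include <vector>
#include <iostream>

int main() {
    std::vector<int> v{1, 2, 3, 4};
    // Fold from the right by folding the reversed view from the left:
    // ((((0 - 4) - 3) - 2) - 1) = -10
    int result = std::accumulate(v.rbegin(), v.rend(), 0,
                                 [](int acc, int x) { return acc - x; });
    std::cout << result << '\n';
}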

Related

Rules of thumb for function argument ordering in Clojure

What (if any) are the rules for deciding the order of parameters for functions in Clojure core?
Functions like map and filter expect a data structure as the last
argument.
Functions like assoc and select-keys expect a data
structure as the first argument.
Functions like map and filter expect a function as the first
argument.
Functions like update-in expect a function as the last argument.
This can cause pain when using the threading macros (I know I can use as->), so what is the reasoning behind these decisions? It would also be nice to know so my functions can conform as closely as possible to those written by the great man.
Functions that operate on collections (and so take and return data structures, e.g. conj, merge, assoc, get) take the collection first.
Functions that operate on sequences (and therefore take and return an abstraction over data structures, e.g. map, filter) take the sequence last.
Becoming more aware of the distinction [between collection functions and sequence functions] and when those transitions occur is one of the more subtle aspects of learning Clojure.
(Alex Miller, in this mailing list thread)
This is an important part of working intelligently with Clojure's sequence API. Notice, for instance, that they occupy separate sections in the Clojure Cheatsheet. This is not a minor detail. This is central to how the functions are organized and how they should be used.
It may be useful to review this description of the mental model when distinguishing these two kinds of functions:
I am usually very aware in Clojure of when I am working with concrete
collections or with sequences. In many cases I find the flow of data
starts with collections, then moves into sequences (as a result of
applying sequence functions), and then sometimes back to collections
when it comes to rest (via into, vec, or set). Transducers have
changed this a bit as they allow you to separate the target collection
from the transformation and thus it's much easier to stay in
collections all the time (if you want to) by applying into with a
transducer.
When I am building up or working on collections, typically the code
constructing it is "close" and the collection types are known and
obvious. Generally sequential data is far more likely to be vectors
and conj will suffice.
When I am thinking in "sequences", it's very rare for me to do an
operation like "add last" - instead I am thinking in whole collection
terms.
If I do need to do something like that, then I would probably convert
back to collections (via into or vec) and use conj again.
Clojure's FAQ has a few good rules of thumb and visualization techniques for getting an intuition of collection/first-arg versus sequence/last-arg.
Rather than have this be a link-only answer, I'll paste a quote of Rich Hickey's response to the question "Argument order rules of thumb":
One way to think about sequences is that they are read from the left,
and fed from the right:
<- [1 2 3 4]
Most of the sequence functions consume and produce sequences. So one
way to visualize that is as a chain:
map<- filter<-[1 2 3 4]
and one way to think about many of the seq functions is that they are
parameterized in some way:
(map f)<-(filter pred)<-[1 2 3 4]
So, sequence functions take their source(s) last, and any other
parameters before them, and partial allows for direct parameterization
as above. There is a tradition of this in functional languages and
Lisps.
Note that this is not the same as taking the primary operand last.
Some sequence functions have more than one source (concat,
interleave). When sequence functions are variadic, it is usually in
their sources.
I don't think variable arg lists should be a criteria for where the
primary operand goes. Yes, they must come last, but as the evolution
of assoc/dissoc shows, sometimes variable args are added later.
Ditto partial. Every library eventually ends up with a more order-
independent partial binding method. For Clojure, it's #().
What then is the general rule?
Primary collection operands come first. That way one can write -> and
its ilk, and their position is independent of whether or not they have
variable arity parameters. There is a tradition of this in OO
languages and CL (CL's slot-value, aref, elt - in fact the one that
trips me up most often in CL is gethash, which is inconsistent with
those).
So, in the end there are 2 rules, but it's not a free-for-all.
Sequence functions take their sources last and collection functions
take their primary operand (collection) first. Not that there aren't
a few kinks here and there that I need to iron out (e.g. set/
select).
I hope that helps make it seem less spurious,
Rich
Now, how one distinguishes between a "sequence function" and a "collection function" is not obvious to me. Perhaps others can explain this.

Nested Dictionary/Array in C++

Everybody,
I am a Python/C# guy and I am trying to learn C++.
In Python I used to do things like:
myRoutes = {0:[1,2,3], 1:[[1,2],[3,4]], 2:[[1,2,3],[[1,2,3],[1,2,3]],[4]]}
Basically, when you have arrays of variable length and you don't want to waste a 2D matrix on them, nesting arrays in a dictionary to keep track of them is a good option.
In C++ I tried std::map<int, std::map<int, std::map<int, int> > > and it works, but I feel there has got to be a better way to do this.
I prefer to stick to the standard libraries but popular libraries like boost are also acceptable for me.
I appreciate your help,
Ali
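As an aside, for values that nest to a uniform depth, a direct translation just nests standard containers; only the mixed-depth values in the original literal need the heavier machinery discussed in the answers below. A minimal sketch (container choice is mine, not from the question):

#include <iostream>
#include <map>
#include <vector>

int main() {
    // Fixed-depth analogue of myRoutes = {1: [[1,2],[3,4]]}:
    // a map from int keys to jagged (variable-length) 2-D arrays.
    std::map<int, std::vector<std::vector<int>>> myRoutes;
    myRoutes[1] = {{1, 2}, {3, 4}};
    myRoutes[1].push_back({5});             // rows may differ in length

    std::cout << myRoutes[1][0][1] << '\n'; // prints 2
}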
It looks like part of the question is: 'How do I store heterogeneous data in a container?' There are several different approaches:
1) Use a ready-made class of array types that abstracts away the exact details (i.e., dimensionality). Example: Boost Basic Linear Algebra
2) Make an explicit list of element types, using Boost.Variant:
#include <boost/variant.hpp>
#include <map>
typedef boost::variant<ArrayTypeA, ArrayTypeB> mapelement;
typedef std::map<int, mapelement> mappingtype;
Building a visitor for a variant type is a bit involved (it involves writing a subclass of boost::static_visitor<desired_return_type> with a single operator() overload for each type in your variant; a sketch follows this answer). On the plus side, visitors are statically type checked to ensure that they implement exactly the right set of handlers.
3) Use a wrapper type, like Boost.Any to wrap the different types of interest.
Overall, I think the first option (using a specialized array class) is probably the most reliable. I have also used variants heavily in recent code, and although the compile errors are long, once one gets used to those, it is great to have compile time checking, which I prefer strongly over Python's "run and find out you were wrong later" paradigm.
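For option 2, a minimal sketch of such a visitor; the two element types here are stand-ins for the ArrayTypeA/ArrayTypeB placeholders above:

#include <boost/variant.hpp>
#include <iostream>
#include <string>

typedef boost::variant<int, std::string> mapelement;

// One operator() overload per type in the variant; leaving any
// alternative unhandled is a compile-time error.
struct describe : boost::static_visitor<std::string> {
    std::string operator()(int) const { return "held an int"; }
    std::string operator()(const std::string&) const { return "held a string"; }
};

int main() {
    mapelement e = std::string("route");
    std::cout << boost::apply_visitor(describe(), e) << '\n';
}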
Many of us share the pain you're experiencing right now, but there are solutions. One of those is the Boost library (it's like the second standard library of C++). There are quite a few collection libraries; in your case I'd use Boost.MultiArray.
Looks like this:
boost::multi_array<double,3> myArray(boost::extents[2][2][2]);
Which creates a 2x2x2 array. The first template parameter, "double", specifies what type the array will hold, and the second, "3", the dimension count of the array. You then use the "extents" to impart the actual size of each of the dimensions. Quite easy to use and syntactically clear in its intent.
Now, if you're dealing with something in Python like foo = {0:[1,2,3], 1:[3,4,5]}, what you're really looking for is a multimap. This is part of the standard library and is essentially a red-black tree indexed by key, with duplicate keys allowed so that several values can live under the same key.
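A minimal sketch of that multimap shape (toy values of my own, mirroring the foo example):

#include <iostream>
#include <map>

int main() {
    // Rough analogue of foo = {0:[1,2,3], 1:[3,4,5]}: a multimap keeps
    // one entry per (key, value) pair, so duplicate keys play the role
    // of the Python lists.
    std::multimap<int, int> foo;
    foo.insert({0, 1});
    foo.insert({0, 2});
    foo.insert({0, 3});
    foo.insert({1, 3});
    foo.insert({1, 4});
    foo.insert({1, 5});

    // equal_range yields the span of values stored under one key.
    auto range = foo.equal_range(0);
    for (auto it = range.first; it != range.second; ++it)
        std::cout << it->second << ' ';    // prints: 1 2 3
}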

Idiomatic way to do list/dict in Cython?

My problem: I've found that processing large data sets with raw C++ using the STL map and vector can often be considerably faster (and with lower memory footprint) than using Cython.
I figure that part of this speed penalty is due to using Python lists and dicts, and that there might be some tricks to use less encumbered data structures in Cython. For example, this page (http://wiki.cython.org/tutorials/numpy) shows how to make numpy arrays very fast in Cython by predefining the size and types of the ND array.
Question: Is there any way to do something similar with lists/dicts, e.g. by stating roughly how many elements or (key,value) pairs you expect to have in them? That is, is there an idiomatic way to convert lists/dicts to (fast) data structures in Cython?
If not I guess I'll just have to write it in C++ and wrap in a Cython import.
Cython now has template support, and comes with declarations for some of the STL containers.
See http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html#standard-library
Here's the example they give:
from libcpp.vector cimport vector

cdef vector[int] vect
cdef int i
for i in range(10):
    vect.push_back(i)
for i in range(10):
    print vect[i]
Doing the same operations in Python as in C++ can often be slower. list and dict are actually implemented very well, but you incur a lot of overhead using Python objects, which are more abstract than C++ objects and require a lot more lookup at runtime.
Incidentally, std::vector is implemented in a pretty similar way to list. std::map, though, is a balanced tree, so many of its operations become slower than dict's as its size gets large. For suitably large containers, dict overcomes the constant factor by which it is slower per operation and will actually do lookup, insertion, etc. faster than std::map.
If you want to use std::map and std::vector, nothing is stopping you. You'll have to wrap them yourself if you want to expose them to Python. Do not be shocked if this wrapping consumes all or much of the time you were hoping to save. I am not aware of any tools that make this automatic for you.
There are C API calls for controlling the creation of objects with some detail. You can say "Make a list with at least this many elements", but this doesn't improve the overall complexity of your list creation-and-filling operation. It certainly doesn't change much later as you try to change your list.
My general advice is
If you want a fixed-size array (you talk about specifying the size of a list), you may actually want something like a numpy array.
I doubt you are going to get the speedup you want out of using std::vector over list as a general replacement in your code. If you want to use it behind the scenes, it may give you a satisfying speed and space improvement (I of course don't know without measuring, nor do you. ;) ).
dict actually does its job really well. I definitely wouldn't try introducing a new general-purpose type for use in Python based on std::map, which has worse algorithmic complexity in time for many important operations and—in at least some implementations—leaves some optimisations to the user that dict already has.
If I did want something that worked a little more like std::map, I'd probably use a database. This is generally what I do if stuff I want to store in a dict (or for that matter, stuff I store in a list) gets too big for me to feel comfortable storing in memory. Python has sqlite3 in the stdlib and drivers for all other major databases available.
C++ is fast not just because of the static declarations of the vector and the elements that go into it, but crucially because, using templates/generics, one specifies that the vector will only contain elements of a certain type, e.g. a vector of three-element tuples. Cython can't do this last thing, and it sounds nontrivial: it would have to be enforced at compile time, somehow (type-checking at runtime is what Python already does). So right now, when you pop something off a list in Cython, there is no way of knowing in advance what type it is, and putting it in a typed variable only adds a type check, not speed. This means that there is no way of bypassing the Python interpreter in this regard, and it seems to me that this is the most crucial shortcoming of Cython for non-numerical tasks.
The manual way of solving this is to subclass the python list/dict (or perhaps std::vector) with a cdef class for a specific type of element or key-value combination. This would amount to the same thing as the code that templates are generating. As long as you use the resulting class in Cython code it should provide an improvement.
Using databases or arrays just solves a different problem, because this is about putting arbitrary objects (but with a specific type, and preferably a cdef class) in containers.
And std::map shouldn't be compared to dict: std::map maintains keys in sorted order because it is a balanced tree, while dict solves a different problem. A better comparison would be dict and Google's hash table.
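To illustrate the distinction, a sketch; std::unordered_map is the hash-based container that C++11 later standardized, and it is the closer analogue of dict:

#include <iostream>
#include <map>
#include <unordered_map>

int main() {
    // std::map: balanced tree, keys kept sorted, O(log n) per lookup.
    std::map<int, int> tree{{3, 30}, {1, 10}, {2, 20}};
    for (const auto& kv : tree)
        std::cout << kv.first << ' ';         // always prints: 1 2 3

    // std::unordered_map: hash table, amortized O(1) per lookup --
    // the shape of Python's dict.
    std::unordered_map<int, int> table{{3, 30}, {1, 10}, {2, 20}};
    std::cout << '\n' << table.at(2) << '\n'; // hash lookup: 20
}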
You can take a look at the standard array module for Python if this is appropriate for your Cython setting. I'm not sure since I have never used Cython.
There is no way to get native Python lists/dicts up to the speed of a C++ map/vector, or even anywhere close. It has nothing to do with allocation or type declaration, but rather with paying the interpreter overhead. The example you mention (numpy) is a C extension and is written in C for precisely this reason.
Since it was not mentioned here yet: you can easily wrap, for example, a C++ vector in a custom extension type.
from libcpp.vector cimport vector

cdef class pyvector:
    """Extension type wrapping a vector"""
    cdef vector[long] _data

    cpdef void push_back(self, long x):
        self._data.push_back(x)

    @property
    def data(self):
        return self._data
In this way, you can store your data in a vector allowing fast Cython operations while still being able to access the data (with some overhead) from the Python side.

Why is the use of tuples in C++ not more common?

Why does nobody seem to use tuples in C++, either the Boost Tuple Library or the standard library for TR1? I have read a lot of C++ code, and very rarely do I see the use of tuples, but I often see lots of places where tuples would solve many problems (usually returning multiple values from functions).
Tuples allow you to do all kinds of cool things like this:
tie(a,b) = make_tuple(b,a); //swap a and b
That is certainly better than this:
temp=a;
a=b;
b=temp;
Of course you could always do this:
swap(a,b);
But what if you want to rotate three values? You can do this with tuples:
tie(a,b,c) = make_tuple(b,c,a);
Tuples also make it much easier to return multiple values from a function, which is probably a much more common case than swapping values. Using references to return values is certainly not very elegant.
Are there any big drawbacks to tuples that I'm not thinking of? If not, why are they rarely used? Are they slower? Or is it just that people are not used to them? Is it a good idea to use tuples?
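For reference, a complete, compilable version of the snippets above using std::tuple as standardized in C++11 (the equivalent boost::tie/boost::make_tuple work the same way under the boost namespace):

#include <iostream>
#include <tuple>

int main() {
    int a = 1, b = 2, c = 3;

    // The right-hand tuple is built before assignment, so this swaps.
    std::tie(a, b) = std::make_tuple(b, a);       // a=2, b=1

    // Rotate three values in a single statement.
    std::tie(a, b, c) = std::make_tuple(b, c, a); // a=1, b=3, c=2

    std::cout << a << ' ' << b << ' ' << c << '\n'; // prints: 1 3 2
}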
A cynical answer is that many people program in C++, but do not understand and/or use the higher level functionality. Sometimes it is because they are not allowed, but many simply do not try (or even understand).
As a non-boost example: how many folks use functionality found in <algorithm>?
In other words, many C++ programmers are simply C programmers using C++ compilers, and perhaps std::vector and std::list. That is one reason why the use of boost::tuple is not more common.
Because it's not yet standard. Anything non-standard has a much higher hurdle. Pieces of Boost have become popular because programmers were clamoring for them (hash_map leaps to mind). But while tuple is handy, it's not such an overwhelming and clear win that people bother with it.
The C++ tuple syntax can be quite a bit more verbose than most people would like.
Consider:
typedef boost::tuple<MyClass1,MyClass2,MyClass3> MyTuple;
So if you want to make extensive use of tuples you either get tuple typedefs everywhere or you get annoyingly long type names everywhere. I like tuples. I use them when necessary. But it's usually limited to a couple of situations, like an N-element index or when using multimaps to tie the range iterator pairs. And it's usually in a very limited scope.
It's all very ugly and hacky-looking when compared to something like Haskell or Python. When C++0x gets here and we get the 'auto' keyword, tuples will begin to look a lot more attractive.
The usefulness of tuples is inversely proportional to the number of keystrokes required to declare, pack, and unpack them.
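A small sketch of that verbosity point, contrasting the spelled-out type with C++11's auto:

#include <string>
#include <tuple>

// Pre-auto style: the full type is spelled out (or typedef'd everywhere).
typedef std::tuple<int, double, std::string> MyTuple;

int main() {
    MyTuple t1(1, 2.0, "three");

    // With auto, the declaration shrinks to a single keyword.
    auto t2 = std::make_tuple(1, 2.0, std::string("three"));

    return std::get<0>(t1) == std::get<0>(t2) ? 0 : 1;   // returns 0
}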
For me, it's habit, hands down: tuples don't solve any new problems for me, just a few I can already handle just fine. Swapping values still feels easier the old-fashioned way -- and, more importantly, I don't really think about how to swap "better." It's good enough as-is.
Personally, I don't think tuples are a great solution to returning multiple values -- sounds like a job for structs.
But what if you want to rotate three values?
swap(a,b);
swap(b,c); // I knew those permutation theory lectures would come in handy.
OK, so with 4 or more values, eventually the n-tuple becomes less code than n-1 swaps. And with the default swap this does 6 assignments instead of the 4 you'd have if you implemented a three-cycle template yourself, although I'd hope the compiler would solve that for simple types.
You can come up with scenarios where swaps are unwieldy or inappropriate, for example:
tie(a,b,c) = make_tuple(b*c,a*c,a*b);
is a bit awkward to unpack.
Point is, though, there are known ways of dealing with the most common situations that tuples are good for, and hence no great urgency to take up tuples. If nothing else, I'm not confident that:
tie(a,b,c) = make_tuple(b,c,a);
doesn't do 6 copies, making it utterly unsuitable for some types (collections being the most obvious). Feel free to persuade me that tuples are a good idea for "large" types, by saying this ain't so :-)
For returning multiple values, tuples are perfect if the values are of incompatible types, but some folks don't like them if it's possible for the caller to get them in the wrong order. Some folks don't like multiple return values at all, and don't want to encourage their use by making them easier. Some folks just prefer named structures for in and out parameters, and probably couldn't be persuaded with a baseball bat to use tuples. No accounting for taste.
As many people have pointed out, tuples are just not as useful as other features.
The swapping and rotating tricks are just gimmicks. They are utterly confusing to those who have not seen them before, and since that is pretty much everyone, these gimmicks are just poor software-engineering practice.
Returning multiple values using tuples is much less self-documenting than the alternatives: returning named types or using named references. Without that self-documentation, it is easy to confuse the order of the returned values if they are mutually convertible, and be none the wiser.
Not everyone can use boost, and TR1 isn't widely available yet.
When using C++ on embedded systems, pulling in Boost libraries gets complex. They couple to each other, so library size grows. You return data structures or use parameter passing instead of tuples. When returning tuples in Python, the data structure is implicit in the order and types of the returned values; it's just not explicit.
You rarely see them because well-designed code usually doesn't need them: there are not too many cases in the wild where using an anonymous struct is superior to using a named one.
Since all a tuple really represents is an anonymous struct, most coders in most situations just go with the real thing.
Say we have a function "f" where a tuple return might make sense. As a general rule, such functions are usually complicated enough that they can fail.
If "f" CAN fail, you need a status return- after all, you don't want callers to have to inspect every parameter to detect failure. "f" probably fits into the pattern:
struct ReturnInts { int y, z; };
bool f(int x, ReturnInts& vals);

int x = 0;
ReturnInts vals;
if(!f(x, vals)) {
    ..report error..
    ..error handling/return...
}
That isn't pretty, but look at how ugly the alternative is. Note that I still need a status value, but the code is no more readable and not shorter. It is probably slower too, since I incur the cost of 1 copy with the tuple.
std::tuple<int, int, bool> f(int x);
int x = 0;
std::tuple<int, int, bool> result = f(x); // or "auto result = f(x)"
if(!result.get<2>()) {
... report error, error handling ...
}
Another significant downside is hidden in here: with "ReturnInts" I can alter "f"'s return by modifying "ReturnInts" WITHOUT ALTERING "f"'s INTERFACE. The tuple solution does not offer that critical feature, which makes it the inferior answer for any library code.
Certainly tuples can be useful, but as mentioned there's a bit of overhead and a hurdle or two you have to jump through before you can even really use them.
If your program consistently finds places where you need to return multiple values or swap several values, it might be worth it to go the tuple route, but otherwise sometimes it's just easier to do things the classic way.
Generally speaking, not everyone already has Boost installed, and I certainly wouldn't go through the hassle of downloading it and configuring my include directories to work with it just for its tuple facilities. I think you'll find that people already using Boost are more likely to find tuple uses in their programs than non-Boost users, and migrants from other languages (Python comes to mind) are more likely to simply be upset about the lack of tuples in C++ than to explore methods of adding tuple support.
As a data store, std::tuple has the worst characteristics of both a struct and an array: all access is by nth position, but one cannot iterate through a tuple using a for loop.
So if the elements in the tuple are conceptually an array, I will use an array, and if the elements are not conceptually an array, a struct (which has named elements) is more maintainable (a.lastname is more explanatory than std::get<1>(a)).
This leaves the transformation mentioned by the OP as the only viable use case for tuples.
I have a feeling that many use Boost.Any and Boost.Variant (with some engineering) instead of Boost.Tuple.

What is the usefulness of project1st<Arg1, Arg2> in the STL?

I was browsing the SGI STL documentation and ran into project1st<Arg1, Arg2>. I understand its definition, but I am having a hard time imagining a practical usage.
Have you ever used project1st or can you imagine a scenario?
A variant of project1st (taking a std::pair, and returning .first) is quite useful. You can use it in combination with std::transform to copy the keys from a std::map<K,V> to a std::vector<K>. Similarly, a variant of project2nd can be used to copy the values from a map to a vector<V>.
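A sketch of that keys-to-vector idiom; here a lambda plays the role of the project1st-style functor (the SGI original would be a pair-selecting function object):

#include <algorithm>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <vector>

int main() {
    std::map<int, std::string> m{{1, "one"}, {2, "two"}, {3, "three"}};

    std::vector<int> keys;
    // "Project" each (key, value) pair onto its first component.
    std::transform(m.begin(), m.end(), std::back_inserter(keys),
                   [](const std::pair<const int, std::string>& kv) {
                       return kv.first;
                   });

    for (int k : keys)
        std::cout << k << ' ';   // prints: 1 2 3
}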
As it happens, none of the standard algorithms really benefits from project1st. The closest is partial_sum(project1st), which would set all output elements to the first input element. It mainly exists because the STL is heavily founded in mathematical set theory, where operations like project1st are basic building blocks.
My guess is that if you were using the strategy pattern and had a situation where you needed to pass an identity object, this would be a good choice. For example, there might be a case where an algorithm takes several such objects, and perhaps it is possible that you want one of them to do nothing under some situation.
Parallel programming. Imagine a situation where two processes come up with two valid but different results for a given computation, and you need to force them to be the same. project1st/2nd provides a very convenient way to perform this operation on a whole container, using an appropriate parallel call that takes a functor as an argument.
I assume that someone had a practical use for it, or it wouldn't have been written, but I'm drawing a blank on what it might have been. Presumably its use-case is similar to the identity function that the description mentions, where there's no real need for processing but the syntax requires a functor anyway.
The example on that same page suggests using it with the two-container form of std::transform, but if I'm not mistaken, the way they're using it is functionally identical to std::copy, so I don't see the point.
It looks like a solution in search of a problem to me.