Something like boost::multi_index for Python

I have come to appreciate boost::multi_index in C++ a lot, and I would happily use something like it in Python, for scripts that process data coming out of numerically intensive applications. Is there such a thing for Python? I just want to be sure it doesn't exist before I try to implement it myself. Things that won't do it for me:
Wrapping boost::multi_index in Python. It simply doesn't scale.
Using sqlite3 in memory. It is ugly.

Since Python collections only store references to objects, not the objects themselves, there isn't much difference between having one collection with multiple indexing schemes and just having multiple collections.
You can, for example, have several dicts holding your data, each of them using a different key to refer to the same objects.
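A minimal sketch of that idea (the record fields and index names here are invented for illustration): two dicts act as two indexes over the same records, and since both hold references to the same objects, a change made through one index is visible through the other.

```python
# Two dicts index the same records by different keys; both store
# references to the same underlying objects.
records = [
    {"id": 1, "name": "alpha", "score": 9.5},
    {"id": 2, "name": "beta", "score": 7.2},
]

by_id = {r["id"]: r for r in records}
by_name = {r["name"]: r for r in records}

by_id[2]["score"] = 8.0          # mutate through one index...
print(by_name["beta"]["score"])  # ...observe it through the other: 8.0
```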

To answer your question of whether a similar thing exists in Python, I would say no.
One useful feature of Boost.MultiIndex is that elements can be modified in place (via replace() or modify()). Python's native dict doesn't provide such functionality and requires the key to be immutable. I haven't seen other implementations that allow the key to be altered, so in this specific area there is nothing comparable to Boost.MultiIndex in Python.
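To illustrate the limitation (the field names in this sketch are invented): since dict keys must stay immutable, the closest Python equivalent of modify() on a keyed field is a pop followed by a reinsert under the new key.

```python
# Dict keys can't be altered in place, so "modifying" a keyed field
# means removing the entry and reinserting it under the new key.
by_name = {"alpha": {"name": "alpha", "score": 9.5}}

record = by_name.pop("alpha")      # remove under the old key
record["name"] = "alpha2"          # mutate the record itself
by_name[record["name"]] = record   # reinsert under the new key

print("alpha2" in by_name)  # -> True
```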
If you only require multiple static views of your data, then I would agree with Radomir Dopieralski: you can wrap multiple dicts in your own class to provide a unified API and to ensure synchronization between the different views. I don't know what you mean by "performance-aware transformations", but if you were talking about the computational complexity of insertion/deletion, then even with Boost.MultiIndex, "inserting an element into a multi_index_container reduces to a simple combination of elementary insertion operations on each of the indices, and similarly for deletion."
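A minimal sketch of that wrapper approach (the class and method names are invented, not an existing library): every insertion and deletion is a simple combination of elementary operations on each index, mirroring the Boost.MultiIndex behavior quoted above.

```python
# A toy multi-indexed container: one dict per indexed field, kept
# synchronized on every insert and remove.
class MultiIndex:
    def __init__(self, *key_fields):
        self._key_fields = key_fields
        self._indexes = {f: {} for f in key_fields}

    def insert(self, record):
        # Elementary insertion into each index.
        for f in self._key_fields:
            self._indexes[f][record[f]] = record

    def remove(self, record):
        # Elementary deletion from each index.
        for f in self._key_fields:
            del self._indexes[f][record[f]]

    def get(self, field, key):
        return self._indexes[field][key]

mi = MultiIndex("id", "name")
mi.insert({"id": 1, "name": "alpha"})
mi.insert({"id": 2, "name": "beta"})
print(mi.get("name", "beta")["id"])  # -> 2
mi.remove(mi.get("id", 1))           # removal updates every index at once
```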


Intrusive algorithms equivalents in Rust

I'm looking at the Rust programming language and trying to convert my C++ thinking to Rust. Common data structures such as lists and trees have traditionally been implemented with pointers in C++, and I'm not sure how to implement the exact equivalents in Rust. The data structures I'm interested in are the intrusive ones, similar to what is found in the Boost intrusive libraries; these are useful in embedded/systems programming.
The linked list example in Rust (Dlist) is pretty much straightforward, but it uses a container type where the actual type is stored inside the container. The intrusive algorithms I'm looking for work a little bit the other way around: you have a main type into which the list node is inserted or inherited.
The famous linked list in the Linux kernel is another example where the list data lives in the members of the structures. This is like the member variant of Boost's intrusive algorithms, and it lets you keep one object in several lists/trees at the same time. How would this work in Rust?
So I'm unsure how to translate these kinds of design patterns, which I'm used to from C/C++, into Rust. Has anyone had any success understanding this?
Rust wants you to think about ownership and lifetimes. Who owns the members and how long will they live?
In the case of Dlist, the answer is 'the container'. With intrusive algorithms there is no clear answer: members of one list might be reused in another list, while others get destroyed with the first list. Ultimately, you probably want to use reference counting (std::sync::Arc).
I think there are two ways to accomplish something like that in Rust. Let's take a look at implementation of graphs, which typically use intrusive links.
The first approach relies on Rc<RefCell<Node>>. You can find more details here: Graphs and arena allocation
The second approach relies on vector indexes. You can find more information here: Modeling Graphs in Rust Using Vector Indices.
I believe the second approach is better, but I have not done any testing.

How does one best integrate with clojure abstractions?

I am implementing an ordered set in Clojure from which I retrieve elements based on their rank. This means that I can retrieve the 4th element (according to the set's ordering), or the 3rd, or the 7th, all in logarithmic time.
In order to integrate my new data structure with Clojure's common functions (or "abstractions") such as conj, get, nth, etc., which is the better way to do it:
Actually implement conj, for example, in my datatype's protocol, or
Implement Rich Hickey's clojure.lang.IPersistentSet or some interface like it.
The first seems easier, but also easier to mess up the semantics of the functions. The second seems like I am implementing an interface that was never meant to be part of the public API, and the actual methods associated with that interface (protocol) are confusingly different. For example, it seems that in order to implement conj for my set, I must implement a cons method of clojure.lang.IPersistentSet, which has a different name. There seems to be little documentation on how this all works, which poses a large challenge in implementing this ranked set.
Which one should I choose? Should I implement my own functions or the methods of a clojure.lang interface? If the latter, where is some good documentation that can guide me through the process?
EDIT: I want to make it clear that I am trying to make a set from which you can retrieve any element (or "remove" it) in logarithmic time by specifying the element's rank (e.g., "give me the 5th element, mr. set."). To my knowledge, no such set yet exists in clojure.
Firstly, I have just released a library called avl.clj which implements persistent sorted maps and sets with support for the standard Clojure API (they are drop-in replacements for the built-in sorted collections), as well as transients and logarithmic time rank queries (via clojure.core/nth)[1]. Both Clojure and ClojureScript are supported; performance on the Clojure side is mostly on a par with the built-in variants in my preliminary benchmarking. Follow the link above if you'd like to give it a try. Any experience reports would be greatly appreciated!
As for the actual question: I'm afraid there isn't much in the way of documentation on Clojure's internal interfaces, but still, implementing them is the only way of making one's custom data structures fit in with the built-ins. core.rrb-vector (which I have written and now maintain) takes this approach, as do other Contrib libraries implementing various data structures. This is also what I've done with avl.clj, as well as sorted.clj (which is basically the ClojureScript port of the red-black-tree-based sorted collections backported to Clojure). All of these libraries, as well as Clojure's own gvec.clj file which implements the primitive-storing vectors produced by clojure.core/vector-of, can serve as examples of what's involved. (Though I have to say it's easy to miss a method here and there...)
The situation is much simpler in ClojureScript, where all the core protocols are defined at the top of core.cljs, so you can just look at the list and pick the ones relevant to your data structure. Hopefully the same will be true on the Clojure side one day.
[1] Removal by rank is (disj my-set (nth my-set 123)) for now. I might provide a direct implementation later on if it turns out to make enough of a difference performance-wise. (I'll definitely write one to check if it does.)

Why would I ever choose not to use the clojure 1.5 reducers feature?

I was reading about the Clojure reducers introduced in 1.5, here: https://github.com/clojure/clojure/blob/master/changes.md. My understanding is that they're a performance enhancement on the existing map/filter/reduce functions. If that's the case, I'm wondering why they are in a new namespace and do not simply replace the existing map/reduce/filter implementations. Stated differently, why would I ever not choose to use the new reducers feature?
EDIT:
In response to the initial two answers, here is a clarification:
I'm going to quote the release notes here:
Reducers provide a set of high performance functions for working with collections. The actual fold/reduce algorithms are specified via the collection being reduced. This allows each collection to define the most efficient way to reduce its contents.
This does not sound to me like the new map/filter/reduce functions are inherently parallel. For example, further down in the release notes it states:
It contains a new function, fold, which is a parallel reduce+combine
So unless the release note are poorly written, it would appear to me that there is one new function, fold, which is parallel, and the other functions are collection specific implementations that aim to produce the highest performance possible for the particular collection. Am I simply mis-reading the release notes here?
Foreword: you have a problem, and you decide to use parallelism. Now you have two problems.
They're a replacement in the sense that they do the work in parallel (versus the plain old sequential map etc.). Not all operations can be parallelized (in many cases the operation has to be at least associative; also think about lazy sequences and iterators). Moreover, not every operation can be parallelized efficiently (there is always some coordination overhead, and sometimes the overhead is greater than the gain from parallelization).
They cannot replace the old implementations in some cases. For instance if you have infinite sequences or if you actually require sequential processing of the collection.
A couple of good reasons you might decide not to use reducers:
You need to maintain backwards compatibility with Clojure 1.4. This makes it tricky to use reducers in library code, for example, where you don't know which Clojure version your users will be running.
In some circumstances there are better options: for example if you are dealing with numerical arrays then you will almost certainly be better off using something like core.matrix instead.
I found the following write-up by Rich Hickey which, while still somewhat confusing, cleared (some) things up for me: http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html
In particular the summary:
By adopting an alternative view of collections as reducible, rather than seqable things, we can get a complementary set of fundamental operations that tradeoff laziness for parallelism, while retaining the same high-level, functional programming model. Because the two models retain the same shape, we can easily choose whichever is appropriate for the task at hand.

C++ Container selection/choices

There is plenty of discussion on StackOverflow and other sites on what type of C++ container to use, with the not so shocking conclusion "it depends on your needs".
Currently I'm using std::list in my interfaces; however, I have no direct requirement for lists as opposed to vectors or deques, and therein lies my question.
I can't say what my requirements will be down the line. Today it's a list; tomorrow... who knows?
I've been toying with the idea of creating a wrapper class, 'Collection', which does nothing more than expose the STL container interface, allowing me to alter the internals without breaking my interfaces if the need arises.
Is this worth the hassle?
Should I just suck it up and make a decision based on my current needs?
Any opinions?
Cheers,
Ben
EDIT:
Thread safety is important.
The recompilation of code that consumes the interface is unacceptable.
You should write such a class only if you plan to give your program an option to use different container types, or to do some kind of run-time optimization. In general, you should know what the container is used for, and therefore how it's used, and that tells you what your needs are.
Don't write a wrapper class just because you don't understand the different containers; that's a waste of resources. In that case you should instead learn more about the few main container types, such as list, vector, queue, and probably map, and use each where it fits. The only reason there are so many of them is that different situations call for different containers to make programming easier and code more efficient. For example, lists are good if you insert and remove a lot, while a vector is faster if you mostly read. Queues are good when things must be processed in a strict order (priority_queue is similar, except that you define the ordering), and maps are good for looking up values by key.
You should write your code generically. But instead of defining a generic Container, use the STL way of decoupling algorithms from containers (iterators). Since you want to link dynamically, read this article, and you may find some things in boost (any_range...).
If you need a single container and want to change its type quickly, use a typedef as recommended by @icabod.
If you're writing algorithms that should work with different containers selected at compile-time, then implement them as template code on containers, or, if possible, iterators.
Only if you need to select a container type at run-time should you implement a polymorphic Container or Collection class + subclasses.

What is so great about STL? [closed]

Closed 9 years ago.
I am a Java developer trying to learn C++. I have many times read on the internet (including Stack Overflow) that STL is the best collections library that you can get in any language. (Sorry, I do not have any citations for that)
However after studying some STL, I am really failing to see what makes STL so special. Would you please shed some light on what sets STL apart from the collection libraries of other languages and make it the best collection library?
What is so great about the STL ?
The STL is great in that it was conceived very early and yet succeeded in using C++ generic programming paradigm quite efficiently.
It efficiently separated the data structures (vector, map, ...) from the algorithms that operate on them (copy, transform, ...), taking advantage of templates to do so.
It neatly decoupled concerns and provided generic containers with hooks of customization (Comparator and Allocator template parameters).
The result is very elegant (DRY principle) and very efficient thanks to compiler optimizations, so that hand-written algorithms for a given container are unlikely to do better.
It also means that it is easily extensible: you can create your own container with the interface you wish, as long as it exposes STL-compliant iterators you'll be able to use the STL algorithms with it!
And thanks to the use of traits, you can even apply the algorithms to C arrays through plain pointers! Talk about backward compatibility!
However, it could (perhaps) have been better...
What is not so great about the STL ?
It really pisses me off that one always has to use the iterators; I'd really stand for being able to write std::foreach(myVector, [](int x) { return x+1; }); because, face it, most of the time you want to iterate over the whole container...
But what's worse is that because of that:
set<int> mySet = /**/;
set<int>::const_iterator it = std::find(mySet.begin(), mySet.end(), 1005); // [1]
set<int>::const_iterator it = mySet.find(1005); // [2]
[1] and [2] are carried out completely differently, resulting in [1] having O(n) complexity while [2] has O(log n) complexity! Here the problem is that the iterators abstract too much.
I don't mean that iterators are not worthy, I just mean that providing an interface exclusively in terms of iterators was a poor choice.
I much prefer the idea of views over containers; for example, check out what has been done with Boost.MPL. With a view you manipulate your container through a (lazy) layer of transformation. It makes for very efficient structures that allow you to filter out some elements, transform others, etc.
Combining views and concept checking ideas would, I think, produce a much better interface for STL algorithms (and solve this find, lower_bound, upper_bound, equal_range issue).
It would also avoid the common mistake of using ill-defined iterator ranges and the undefined behavior that results from it...
It's not so much that it's "great" or "the best collections library that you can get in *any* language", but it does have a different philosophy to many other languages.
In particular, the standard C++ library uses a generic programming paradigm, rather than an object-oriented paradigm that is common in languages like Java and C#. That is, you have a "generic" definition of what an iterator should be, and then you can implement the function for_each or sort or max_element that takes any class that implements the iterator pattern, without actually having to inherit from some base "Iterator" interface or whatever.
What I love about the STL is how robust it is. It is easy to extend. Some complain that it's small, missing many common algorithms or iterators, but this is precisely when you see how easy it is to add the missing components you need. Not only that, but small is beautiful: you have about 60 algorithms, a handful of containers and a handful of iterators, yet the functionality is on the order of the product of these. The interfaces of the containers remain small and simple.
Because the fashion is to write small, simple, modular algorithms, it becomes easier to spot bugs and holes in your components. Yet, at the same time, as simple as the algorithms and iterators are, they're extraordinarily robust: your algorithms will work with many existing and yet-to-be-written iterators, and your iterators will work with many existing and yet-to-be-written algorithms.
I also love how simple the STL is. You have containers, you have iterators and you have algorithms. That's it (I'm lying here, but this is what it takes to get comfortable with the library). You can mix different algorithms with different iterators with different containers. True, some of these have constraints that forbid them from working with others, but in general there's a lot to play with.
Niklaus Wirth said that a program is algorithms plus data-structures. That's exactly what the STL is about. If Ruby and Python are string superheros, then C++ and the STL are an algorithms-and-containers superhero.
STL's containers are nice, but they're not much different than you'll find in other programming languages. What makes the STL containers useful is that they mesh beautifully with algorithms. The flexibility provided by the standard algorithms is unmatched in other programming languages.
Without the algorithms, the containers are just that. Containers. Nothing special in particular.
Now if you're talking about container libraries for C++ only, it is unlikely you will find libraries as well used and tested as those provided by STL if nothing else because they are standard.
The STL works beautifully with built-in types. A std::array<int, 5> is exactly that -- an array of 5 ints, which consumes 20 bytes on a 32 bit platform.
java.util.Arrays.asList(1, 2, 3, 4, 5), on the other hand, returns a reference to an object containing a reference to an array containing references to Integer objects containing ints. Yes, that's 3 levels of indirection, and I don't dare predict how many bytes that consumes ;)
This is not a direct answer, but as you're coming from Java I'd like to point this out. By comparison to Java equivalents, STL is really fast.
I did find this page, showing some performance comparisons. Generally Java people are very touchy when it comes to performance conversations, and will claim that all kinds of advances are occurring all the time. However similar advances are also occurring in C/C++ compilers.
Keep in mind that the STL is actually quite old now, so other, newer libraries may have specific advantages. Given its age, its popularity is a testament to how good the original design was.
There are four main reasons why I'd say that STL is (still) awesome:
Speed
STL uses C++ templates, which means that the compiler generates code that is specifically tailored to your use of the library. For example, map will automagically generate a new class to implement a map collection of 'key' type to 'value' type. There is no runtime overhead where the library tries to work out how to efficiently store 'key' and 'value' - this is done at compile time. Due to the elegant design some operations on some types will compile down to single assembly instructions (e.g. increment integer-based iterator).
Efficiency
The collections classes have a notion of 'allocators', which you can either provide yourself or use the library-provided ones which allocate only enough storage to store your data. There is no padding nor wastage. Where a built-in type can be stored more efficiently, there are specializations to handle these cases optimally, e.g. vector of bool is handled as a bitfield.
Extensibility
You can use the Containers (collection classes), Algorithms and Functions provided in the STL on any type that is suitable. If your type can be compared, you can put it into a container. If it goes into a container, it can be sorted, searched, compared. If you provide a function like 'bool Predicate(MyType)', it can be filtered, etc.
Elegance
Other libraries/frameworks have to implement the Sort()/Find()/Reverse() methods on each type of collection. STL implements these as separate algorithms that take iterators of whatever collection you are using and operate blindly on that collection. The algorithms don't care whether you're using a Vector, List, Deque, Stack, Bag, Map - they just work.
Well, that is somewhat of a bold claim... perhaps in C++0x when it finally gets a hash map (in the form of std::unordered_map), it can make that claim, but in its current state, well, I don't buy that.
I can tell you, though, some cool things about it, namely that it uses templates rather than inheritance to achieve its level of flexibility and generality. This has both advantages and disadvantages; a disadvantage is that lots of code gets duplicated by the compiler, and any sort of dynamic runtime typing is very hard to achieve; however, a key advantage is that it is incredibly quick. Because each template specialization is really its own separate class generated by the compiler, it can be highly optimized for that class. Additionally, many of the STL algorithms that operate on STL containers have general definitions, but have specializations for special cases that result in incredibly good performance.
STL gives you the pieces.
Languages and their environments are built from smaller component pieces, sometimes via programming language constructs, sometimes via cut-and-paste. Some languages give you a sealed box - Java's collections, for instance. You can do what they allow, but woe betide you if you want to do something exotic with them.
The STL gives you the pieces that the designers used to build its more advanced functionality. Directly exposing the iterators, algorithms, etc. gives you an abstract but highly flexible way of recombining core data structures and manipulations in whatever way is suitable for solving your problem. While Java's design probably hits the 90-95% mark for what you need from data structures, the STL's flexibility raises it to maybe 99%, with the iterator abstraction meaning you're not completely on your own for the remaining 1%.
When you combine that with its speed and its other extensibility and customizability features (allocators, traits, etc.), you have a quite excellent package. I don't know that I'd call it the best data structures package, but certainly a very good one.
Warning: percentages totally made up.
Unique because it
focuses on basic algorithms instead of providing ready-to-use solutions to specific application problems.
uses unique C++ features to implement those algorithms.
As for being best... There is a reason why the same approach wasn't (and probably won't) ever followed by any other language, including direct descendants like D.
The standard C++ library's approach to collections via iterators has come in for some constructive criticism recently. Andrei Alexandrescu, a notable C++ expert, has recently begun working on a new version of a language called D, and describes his experiences designing collections support for it in this article.
Personally I find it frustrating that this kind of excellent work is being put into yet another programming language that overlaps hugely with existing languages, and I've told him so! :) I'd like someone of his expertise to turn their hand to producing a collections library for the so-called "modern languages" that are already in widespread use, Java and C#, that has all the capabilities he thinks are required to be world-class: the notion of a forward-iterable range is already ubiquitous, but what about reverse iteration exposed in an efficient way? What about mutable collections? What about integrating all this smoothly with Linq? etc.
Anyway, the point is: don't believe anyone who tells you that the standard C++ way is the holy grail, the best it could possibly be. It's just one way among many, and has at least one obvious drawback: the fact that in all the standard algorithms, a collection is specified by two separate iterators (begin and end) and hence is clumsy to compose operations on.
Obviously C++, C#, and Java can enter as many pissing contests as you want them to. The clue as to why the STL is at least somewhat great is that Java was initially designed and implemented without type-safe containers. Then Sun decided/realised people actually need them in a typed language, and added generics in 1.5.
You can compare the pros and cons of each, but as to which of the three languages has the "greatest" implementation of generic containers - that is solely a pissing contest. Greatest for what? In whose opinion? Each of them has the best libraries that the creators managed to come up with, subject to other constraints imposed by the languages. C++'s idea of generics doesn't work in Java, and type erasure would be sub-standard in typical C++ usage.
The primary thing is, you can use templates to make using containers switch-in/switch-out, without having to resort to the horrendous mess that is Java's interfaces.
If you fail to see what use the STL has, I recommend buying the book "The C++ Programming Language" by Bjarne Stroustrup. It pretty much explains everything there is to know about C++, because he's the dude who created it.