Nested Dictionary/Array in C++ - c++

Everybody,
I am a Python/C# guy and I am trying to learn C++.
In Python I used to do things like:
myRoutes = {0:[1,2,3], 1:[[1,2],[3,4]], 2:[[1,2,3],[[1,2,3],[1,2,3]],[4]]}
Basically when you have arrays of variable length and you don't want to wast a 2D matrix for them, nesting arrays into a dictionary to keep track of them is a good option.
In C++ I tried std::map<int, std:map<int, std:map<int, int> > > and it works but I feel there got to bo a better way to do this.
I prefer to stick to the standard libraries but popular libraries like boost are also acceptable for me.
I appreciate your help,
Ali

It looks like part of the question is: 'How do I store heterogeneous data in a container?' There are several different approaches:
1) Use a ready-made class of array types that abstracts away the exact details (i.e., dimensionality). Example: Boost Basic Linear Algebra
2) Make an explicit list of element types, using Boost.variant
#import "boost/variant.hpp"
typedef boost::variant< ArrayTypeA, ArrayTypeB > mapelement;
typedef std::map<int, mapelement> mappingtype;
Building a visitor for variant type is a bit involved (it involves writing a subclass of boost::static_visitor<desired_return_type> with a single operator() overload for each type in your variant). On the plus side, visitors are statically type checked to ens ure that they implement the exact right set of handlers.
3) Use a wrapper type, like Boost.Any to wrap the different types of interest.
Overall, I think the first option (using a specialized array class) is probably the most reliable. I have also used variants heavily in recent code, and although the compile errors are long, once one gets used to those, it is great to have compile time checking, which I prefer strongly over Python's "run and find out you were wrong later" paradigm.

Many of us share the pain you're experiencing right now but there are solutions to them. One of those solutions is the Boost library (it's like the 2nd standard library of C++.) There's quite a few collection libraries. In your case I'd use Boost::Multi-Dimentional Arrays.
Looks like this:
boost::multi_array<double,3> myArray(boost::extents[2][2][2]);
Which creates a 2x2x2 array. The first type in the template parameters "double" specifies what type the array will hold and the second "3" the dimension count of the array. You then use the "extents" to impart the actual size of each of the dimensions. Quite easy to use and syntatically clear in its intent.
Now, if you're dealing with something in Python like foo = {0:[1,2,3], 1:[3,4,5]} what you're really looking for is a multimap. This is part of the standard library and is essentially a Red-Black tree, indexed by key but with a List for the value.

Related

In C++ how do you store 2 different data types in 1 data structure?

I most often work with python and therefore I'm used to being able to put a bool and integer in a single list. I realize that C++ has a different paradigm, however, I imagine that there is a workaround to this issue. Ideally I want a vector that could contain data that looks like {1, 7, true, 8, false, true, 9}. So this vector would have to be defined with syntax like (vector int bool intBoolsVec), however, I realize that isn't proper syntax.
I see that some people suggest using variant that was introduced in C++17, is this the best solution? Seems like this would be a common problem if C++ doesn't easily allow you to work with heterogenous containers, even if those containers are constrained to a couple defined types like a vector that only takes only ints and bools.
What is the easiest way to create a vector that contains both integers and booleans in C++? If someone could also provide me more insight on why C++ doesn't have an easy/obvious way to do this, that might help me better understand C++ as well.
My approach would probably be to create my own class, which does exactly what I want it to do. This might be your easiest solution, besides using std::any. Also, you could combine those by creating a custom array of std::any which only allows integers and booleans at certain entries for your example. This would be similar, but not equal to an array of std::variant. Also, In C++ you can store 2 types in a std::pair, if that fits your use-case.

D naming conventions: How is Phobos organized?

I'm making my own little library of handy functions and I'm trying to follow Phobos's naming convention but I'm getting really confused. How do I know where things would fit?
Example:
If there was a function like foldRight in Phobos (basically reduce in the reverse direction), which module would I find it in?
I can think of several:
std.algorithm: Because it's expressing an algorithm
std.array: Because I'm likely going to use it on arrays
std.container: Because it's used on containers, rather than single objects
std.functional: Because it's used mainly in functional programming
std.range: Because it operates on ranges as well
but I have no idea which one would be a good choice -- I could give a convincing argument for at least 3 of them.
What's the convention?
std.algorithm: yep and you can implement it like reduce!fun(retro(r))
this module specifies algorithms that run on sequences
std.array: no because it can also run on other ranges
these are helper functions that run only on build-in arrays
std.container: no because it doesn't define a data structure (like a treeset)
this defines data structures that are not build in into the language (for now a linked list, a binary tree and a deterministic array in terms of memory management)
std.functional: no because it doesn't operate on a function but on a range
this one takes a function and returns a different one
std.range: no because it doesn't define a range or provide a different way to iterate over one
the lack of a clear structure is one of my gripes with the phobos library TBH but really reading the first paragraph of the docs should tell you quite a bit of where to put the function

Initializing a AS 3.0 vector with type array - Vector.<Array>? (plus C++ equivelant)

I have a quick question for you all. I'm trying to convert over some ActionScript code to C++ and am having a difficult time with this one line:
private var edges:Vector.<Array>
What is this exactly? Is this essentially a multidimensional vector then? Or is this simply declaring the vector as a container? I understand from researching that vectors, like C++ vectors, have to be declared with a type. However, in C++ I can't just put down Array, I have to use another vector (probably) so it looks like:
vector<vector<T> example;
or possibly even
vector<int[]> example;
I don't expect you guys to know the C++ equivalent because I'm primarily posting this with AS tags, but if you could confirm my understand of the AS half, that would be great. I did some googling but didn't find any cases where someone used Array as it's type.
From Mike Chambers (adobe evangelist) :
"Essentially, the Vector class is a typed Array, and in addition to ensuring your collection is type safe, can also provide (sometimes significant) performance improvements over using an Array."
http://www.mikechambers.com/blog/2008/08/19/using-vectors-in-actionscript-3-and-flash-player-10/
Essentially the vector in C++ is based on the same principles. As far as porting a vector of Arrays in AS3 to C++, well that's not a conversion that is clear cut in principle, as you could have a collection (array) of various types in C++, such as a char array. However, it appears you've got the idea, as you've pretty much posted examples of both avenues in your question.
I would post some code but I think you've got it exactly. Weather you use a vector within a vector or you declare a specifically typed collection I think comes down to a matter of what works best for you specific project.
Also you might be interested in:
http://www.mikechambers.com/blog/2008/09/24/actioscript-3-vector-array-performance-comparison/

Idiomatic way to do list/dict in Cython?

My problem: I've found that processing large data sets with raw C++ using the STL map and vector can often be considerably faster (and with lower memory footprint) than using Cython.
I figure that part of this speed penalty is due to using Python lists and dicts, and that there might be some tricks to use less encumbered data structures in Cython. For example, this page (http://wiki.cython.org/tutorials/numpy) shows how to make numpy arrays very fast in Cython by predefining the size and types of the ND array.
Question: Is there any way to do something similar with lists/dicts, e.g. by stating roughly how many elements or (key,value) pairs you expect to have in them? That is, is there an idiomatic way to convert lists/dicts to (fast) data structures in Cython?
If not I guess I'll just have to write it in C++ and wrap in a Cython import.
Cython now has template support, and comes with declarations for some of the STL containers.
See http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html#standard-library
Here's the example they give:
from libcpp.vector cimport vector
cdef vector[int] vect
cdef int i
for i in range(10):
vect.push_back(i)
for i in range(10):
print vect[i]
Doing similar operations in Python as in C++ can often be slower. list and dict are actually implemented very well, but you gain a lot of overhead using Python objects, which are more abstract than C++ objects and require a lot more lookup at runtime.
Incidentally, std::vector is implemented in a pretty similar way to list. std::map, though, is actually implemented in a way that many operations are slower than dict as its size gets large. For suitably large examples of each, dict overcomes the constant factor by which it's slower than std::map and will actually do operations like lookup, insertion, etc. faster.
If you want to use std::map and std::vector, nothing is stopping you. You'll have to wrap them yourself if you want to expose them to Python. Do not be shocked if this wrapping consumes all or much of the time you were hoping to save. I am not aware of any tools that make this automatic for you.
There are C API calls for controlling the creation of objects with some detail. You can say "Make a list with at least this many elements", but this doesn't improve the overall complexity of your list creation-and-filling operation. It certainly doesn't change much later as you try to change your list.
My general advice is
If you want a fixed-size array (you talk about specifying the size of a list), you may actually want something like a numpy array.
I doubt you are going to get any speedup you want out of using std::vector over list for a general replacement in your code. If you want to use it behind the scenes, it may give you a satisfying size and space improvement (I of course don't know without measuring, nor do you. ;) ).
dict actually does its job really well. I definitely wouldn't try introducing a new general-purpose type for use in Python based on std::map, which has worse algorithmic complexity in time for many important operations and—in at least some implementations—leaves some optimisations to the user that dict already has.
If I did want something that worked a little more like std::map, I'd probably use a database. This is generally what I do if stuff I want to store in a dict (or for that matter, stuff I store in a list) gets too big for me to feel comfortable storing in memory. Python has sqlite3 in the stdlib and drivers for all other major databases available.
C++ is fast not just because of the static declarations of the vector and the elements that go into it, but crucially because using templates/generics one specifies that the vector will only contain elements of a certain type, e.g. vector with tuples of three elements. Cython can't do this last thing and it sounds nontrivial -- it would have to be enforced at compile time, somehow (typechecking at runtime is what Python already does). So right now when you pop something off a list in Cython there is no way of knowing in advance what type it is , and putting it in a typed variable only adds a typecheck, not speed. This means that there is no way of bypassing the Python interpreter in this regard, and it seems to me it's the most crucial shortcoming of Cython for non-numerical tasks.
The manual way of solving this is to subclass the python list/dict (or perhaps std::vector) with a cdef class for a specific type of element or key-value combination. This would amount to the same thing as the code that templates are generating. As long as you use the resulting class in Cython code it should provide an improvement.
Using databases or arrays just solves a different problem, because this is about putting arbitrary objects (but with a specific type, and preferably a cdef class) in containers.
And std::map shouldn't be compared to dict; std::map maintains keys in sorted order because it is a balanced tree, dict solves a different problem. A better comparison would be dict and Google's hashtable.
You can take a look at the standard array module for Python if this is appropriate for your Cython setting. I'm not sure since I have never used Cython.
There is no way to get native Python lists/dicts up to the speed of a C++ map/vector or even anywhere close. It has nothing to do with allocation or type declaration but rather paying the interpreter overhead. The example you mention (numpy) is a C extension and is written in C for precisely this reason.
Just because it was not mentioned here: You can easily wrap for example a C++ vector in a custom extension type.
from libcpp.vector cimport vector
cdef class pyvector:
"""Extension type wrapping a vector"""
cdef vector[long] _data
cpdef void push_back(self, long x):
self._data.push_back(x)
#property
def data(self):
return self._data
In this way, you can store your data in a vector allowing fast Cython operations while still being able to access the data (with some overhead) from the Python side.

Why is the use of tuples in C++ not more common?

Why does nobody seem to use tuples in C++, either the Boost Tuple Library or the standard library for TR1? I have read a lot of C++ code, and very rarely do I see the use of tuples, but I often see lots of places where tuples would solve many problems (usually returning multiple values from functions).
Tuples allow you to do all kinds of cool things like this:
tie(a,b) = make_tuple(b,a); //swap a and b
That is certainly better than this:
temp=a;
a=b;
b=temp;
Of course you could always do this:
swap(a,b);
But what if you want to rotate three values? You can do this with tuples:
tie(a,b,c) = make_tuple(b,c,a);
Tuples also make it much easier to return multiple variable from a function, which is probably a much more common case than swapping values. Using references to return values is certainly not very elegant.
Are there any big drawbacks to tuples that I'm not thinking of? If not, why are they rarely used? Are they slower? Or is it just that people are not used to them? Is it a good idea to use tuples?
A cynical answer is that many people program in C++, but do not understand and/or use the higher level functionality. Sometimes it is because they are not allowed, but many simply do not try (or even understand).
As a non-boost example: how many folks use functionality found in <algorithm>?
In other words, many C++ programmers are simply C programmers using C++ compilers, and perhaps std::vector and std::list. That is one reason why the use of boost::tuple is not more common.
Because it's not yet standard. Anything non-standard has a much higher hurdle. Pieces of Boost have become popular because programmers were clamoring for them. (hash_map leaps to mind). But while tuple is handy, it's not such an overwhelming and clear win that people bother with it.
The C++ tuple syntax can be quite a bit more verbose than most people would like.
Consider:
typedef boost::tuple<MyClass1,MyClass2,MyClass3> MyTuple;
So if you want to make extensive use of tuples you either get tuple typedefs everywhere or you get annoyingly long type names everywhere. I like tuples. I use them when necessary. But it's usually limited to a couple of situations, like an N-element index or when using multimaps to tie the range iterator pairs. And it's usually in a very limited scope.
It's all very ugly and hacky looking when compared to something like Haskell or Python. When C++0x gets here and we get the 'auto' keyword tuples will begin to look a lot more attractive.
The usefulness of tuples is inversely proportional to the number of keystrokes required to declare, pack, and unpack them.
For me, it's habit, hands down: Tuples don't solve any new problems for me, just a few I can already handle just fine. Swapping values still feels easier the old fashioned way -- and, more importantly, I don't really think about how to swap "better." It's good enough as-is.
Personally, I don't think tuples are a great solution to returning multiple values -- sounds like a job for structs.
But what if you want to rotate three values?
swap(a,b);
swap(b,c); // I knew those permutation theory lectures would come in handy.
OK, so with 4 etc values, eventually the n-tuple becomes less code than n-1 swaps. And with default swap this does 6 assignments instead of the 4 you'd have if you implemented a three-cycle template yourself, although I'd hope the compiler would solve that for simple types.
You can come up with scenarios where swaps are unwieldy or inappropriate, for example:
tie(a,b,c) = make_tuple(b*c,a*c,a*b);
is a bit awkward to unpack.
Point is, though, there are known ways of dealing with the most common situations that tuples are good for, and hence no great urgency to take up tuples. If nothing else, I'm not confident that:
tie(a,b,c) = make_tuple(b,c,a);
doesn't do 6 copies, making it utterly unsuitable for some types (collections being the most obvious). Feel free to persuade me that tuples are a good idea for "large" types, by saying this ain't so :-)
For returning multiple values, tuples are perfect if the values are of incompatible types, but some folks don't like them if it's possible for the caller to get them in the wrong order. Some folks don't like multiple return values at all, and don't want to encourage their use by making them easier. Some folks just prefer named structures for in and out parameters, and probably couldn't be persuaded with a baseball bat to use tuples. No accounting for taste.
As many people pointed out, tuples are just not that useful as other features.
The swapping and rotating gimmicks are just gimmicks. They are utterly confusing to those who have not seen them before, and since it is pretty much everyone, these gimmicks are just poor software engineering practice.
Returning multiple values using tuples is much less self-documenting then the alternatives -- returning named types or using named references. Without this self-documenting, it is easy to confuse the order of the returned values, if they are mutually convertible, and not be any wiser.
Not everyone can use boost, and TR1 isn't widely available yet.
When using C++ on embedded systems, pulling in Boost libraries gets complex. They couple to each other, so library size grows. You return data structures or use parameter passing instead of tuples. When returning tuples in Python the data structure is in the order and type of the returned values its just not explicit.
You rarely see them because well-designed code usually doesn't need them- there are not to many cases in the wild where using an anonymous struct is superior to using a named one.
Since all a tuple really represents is an anonymous struct, most coders in most situations just go with the real thing.
Say we have a function "f" where a tuple return might make sense. As a general rule, such functions are usually complicated enough that they can fail.
If "f" CAN fail, you need a status return- after all, you don't want callers to have to inspect every parameter to detect failure. "f" probably fits into the pattern:
struct ReturnInts ( int y,z; }
bool f(int x, ReturnInts& vals);
int x = 0;
ReturnInts vals;
if(!f(x, vals)) {
..report error..
..error handling/return...
}
That isn't pretty, but look at how ugly the alternative is. Note that I still need a status value, but the code is no more readable and not shorter. It is probably slower too, since I incur the cost of 1 copy with the tuple.
std::tuple<int, int, bool> f(int x);
int x = 0;
std::tuple<int, int, bool> result = f(x); // or "auto result = f(x)"
if(!result.get<2>()) {
... report error, error handling ...
}
Another, significant downside is hidden in here- with "ReturnInts" I can add alter "f"'s return by modifying "ReturnInts" WITHOUT ALTERING "f"'s INTERFACE. The tuple solution does not offer that critical feature, which makes it the inferior answer for any library code.
Certainly tuples can be useful, but as mentioned there's a bit of overhead and a hurdle or two you have to jump through before you can even really use them.
If your program consistently finds places where you need to return multiple values or swap several values, it might be worth it to go the tuple route, but otherwise sometimes it's just easier to do things the classic way.
Generally speaking, not everyone already has Boost installed, and I certainly wouldn't go through the hassle of downloading it and configuring my include directories to work with it just for its tuple facilities. I think you'll find that people already using Boost are more likely to find tuple uses in their programs than non-Boost users, and migrants from other languages (Python comes to mind) are more likely to simply be upset about the lack of tuples in C++ than to explore methods of adding tuple support.
As a data-store std::tuple has the worst characteristics of both a struct and an array; all access is nth position based but one cannot iterate through a tuple using a for loop.
So if the elements in the tuple are conceptually an array, I will use an array and if the elements are not conceptually an array, a struct (which has named elements) is more maintainable. ( a.lastname is more explanatory than std::get<1>(a)).
This leaves the transformation mentioned by the OP as the only viable usecase for tuples.
I have a feeling that many use Boost.Any and Boost.Variant (with some engineering) instead of Boost.Tuple.