Translating python dictionary to C++ - c++

I have python code that contains the following code.
d = {}
d[(0,0)] = 0
d[(1,2)] = 1
d[(2,1)] = 2
d[(2,3)] = 3
d[(3,2)] = 4
for (i,j) in d:
    print d[(i,j)], d[(j,i)]
Unfortunately looping over all the keys in python isn't really fast enough for my purpose, and I would like to translate this code to C++. What is the best C++ data structure to use for a python dictionary that has tuples as its keys? What would be the C++ equivalent of the above code?
I looked at sparse matrices in the boost library, but couldn't find an easy way to loop only over the non-zero elements.

A dictionary would be a std::map in c++, and a tuple with two elements would be a std::pair.
The python code provided would translate to:
#include <iostream>
#include <map>
typedef std::map<std::pair<int, int>, int> Dict;
typedef Dict::const_iterator It;
int main()
{
    Dict d;
    d[std::make_pair(0, 0)] = 0;
    d[std::make_pair(1, 2)] = 1;
    d[std::make_pair(2, 1)] = 2;
    d[std::make_pair(2, 3)] = 3;
    d[std::make_pair(3, 2)] = 4;
    for (It it(d.begin()); it != d.end(); ++it)
    {
        int i(it->first.first);
        int j(it->first.second);
        std::cout << it->second << ' '
                  << d[std::make_pair(j, i)] << '\n';
    }
}
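One caveat with the loop above: operator[] inserts a default-constructed value when the key is missing, so a key without a mirrored (j, i) entry would silently grow the map. A minimal sketch that uses find() instead and skips keys whose mirror is absent might look like this (the function name mirrored_values is made up for illustration):

```cpp
#include <map>
#include <utility>
#include <vector>

typedef std::map<std::pair<int, int>, int> Dict;

// For every key (i, j) whose mirrored key (j, i) also exists, collect the
// pair (d[(i, j)], d[(j, i)]). The map itself is never modified.
std::vector<std::pair<int, int> > mirrored_values(const Dict& d)
{
    std::vector<std::pair<int, int> > out;
    for (Dict::const_iterator it = d.begin(); it != d.end(); ++it)
    {
        const int i = it->first.first;
        const int j = it->first.second;
        Dict::const_iterator mirror = d.find(std::make_pair(j, i));
        if (mirror != d.end())
            out.push_back(std::make_pair(it->second, mirror->second));
    }
    return out;
}
```

Because the function takes the map by const reference, the compiler rejects any accidental use of operator[] inside it.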

The type is
std::map< std::pair<int,int>, int>
The code to add entries to the map looks like this:
typedef std::map< std::pair<int,int>, int> container;
container m;
m[ std::make_pair(1,2) ] = 3; //...
for(container::iterator i = m.begin(); i != m.end(); ++i){
    std::cout << i->second << ' ';
    // not really sure how to translate [i,j] [j,i] idiom here easily
}

Have a look at Boost.Python. It's for interaction between Python and C++ (basically building Python libraries using C++, but also embedding Python in C++ programs). Most of Python's data structures and their C++ equivalents are described (I didn't check for the one you want).

std::map or more likely std::tr1::unordered_map / boost::unordered_map (aka hash_map) is what you want.
Also, as kriss said, Boost.Python is a good idea to look at here. It provides a C++ version of python's dict class already, so if you're doing cross-language stuff, it might be useful.

std::map is usually implemented as a balanced binary tree, not a hash table, so lookups are O(log n). That is not the case for a Python dict, which is a hash table. So you need a C++ data structure with O(1) average lookup to use your pairs with.
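A hash-based container needs a hash function for the key type, and the standard library does not provide one for std::pair. A minimal sketch, assuming a C++11 compiler (the names PairHash and HashDict are made up here, and the mixing formula is just one common way to combine two hashes, not the only one):

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical hash functor for std::pair<int, int> keys.
struct PairHash
{
    std::size_t operator()(const std::pair<int, int>& p) const
    {
        const std::size_t h1 = std::hash<int>()(p.first);
        const std::size_t h2 = std::hash<int>()(p.second);
        // Mix the two hashes so that (1, 2) and (2, 1) usually differ.
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// Same shape as the std::map version, but backed by a hash table.
typedef std::unordered_map<std::pair<int, int>, int, PairHash> HashDict;
```

With this typedef, d[std::make_pair(1, 2)] = 1 works the same way as with std::map, but iteration order is unspecified.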

Do you want to call an optimized C++ routine via Python? If so, read on:
Often times I use PyYaml when dealing with dictionaries in Python. Perhaps you could link in something like LibYAML or yamlcpp to:
Translate a Python dictionary into a YAML string
Use Python to call a C++ function wrapped using something like SWIG, taking the YAML string as a parameter.
Use a C++ library to parse the YAML & obtain a std::map object
Operate on std::map object
Warning: I have never tried this, but using everyone's favorite search engine on "yaml std::map" yields lots of interesting links

As a direct answer to your question (for the Python part, look at my other answer): you can forget the tuple part if you want. You can use any key/value mapping type (hash, etc.) in C++; you just have to find a function that maps each pair to a unique key. In some cases that is easy. For instance, if your two integers are both between 0 and 65535, you can use a 32-bit integer with each 16-bit half holding one of them. A simple shift and an 'or' (or +) to combine the two values does the trick, and it's very efficient.
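The shift-and-or idea can be sketched as follows, assuming both values fit in 16 bits (the helper names pack_key, key_first and key_second are invented for this example):

```cpp
#include <cstdint>

// Pack two 16-bit values into one 32-bit key: i in the high half, j in the low half.
inline std::uint32_t pack_key(std::uint16_t i, std::uint16_t j)
{
    return (static_cast<std::uint32_t>(i) << 16) | j;
}

// Recover the first component (high 16 bits).
inline std::uint16_t key_first(std::uint32_t key)
{
    return static_cast<std::uint16_t>(key >> 16);
}

// Recover the second component (low 16 bits).
inline std::uint16_t key_second(std::uint32_t key)
{
    return static_cast<std::uint16_t>(key & 0xFFFFu);
}
```

The packed value can then be used as the key of a std::map<std::uint32_t, int> or any hash map, and the mirrored key is simply pack_key(j, i).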

Related

C++, How to use maps for holding multiple integer values

I'm working on a word Tagging system for a C++ project. I need a system where a map stores the following key-value information:
word["with"] = 16, 6, 15;
Where ["with"] is the index, and the 3-tuple (16, 6, 15) are the values for that index. I've tried maps, but I keep getting semantic errors, which I understand are a result of not being able to give a key more than 1 value.
I tried multimaps, but I can't seem to get the syntax to suit my needs.
I would like to refrain from using structs or classes, as this database already contains 200 words, and I'm trying to keep my lines of code readable and to a minimum.
How would I go about this? Am I missing something? How would you declare a system like this?
You should declare your map as std::map<std::string, std::vector<unsigned int>>, so you can have a vector of values for your index.
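A minimal sketch of that declaration in use, assuming a C++11 compiler for the braced list (the names TagMap and build_tags, and the second entry, are made up for illustration):

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<unsigned int> > TagMap;

// Build a small tag table; each word maps to a list of values.
TagMap build_tags()
{
    TagMap word;
    word["with"] = {16, 6, 15};    // C++11 braced list replaces the whole vector
    word["from"].push_back(3);     // hypothetical second entry, grown one value at a time
    return word;
}
```

Individual values are then read as word["with"][0], word["with"][1], and so on.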
You can make a map that maps Strings to Vectors or some other data structure that can hold an arbitrary number of integers.
Worth noting, however, that things like Structs and Classes are components of a language meant to organize code. Structs group related data; classes model groups of related data and their associated behaviors. It's certainly possible to do everything without them but that would make for some very unreadable code.
The number of lines and whether or not you use classes/structs are poor metrics for the complexity and readability of your code. And the modularity they offer far exceeds the minute runtime cost of dereferencing those values.
word["with"] = 16, 6, 15;//This usage is wrong
std::multimap or std::unordered_multimap should work for you.
If you define word as follows:
std::multimap<std::string,int> word;
You should insert values to map as shown below:
std::string key="with";
word.insert(std::pair<std::string,int>(key,16));
word.insert(std::pair<std::string,int>(key,6));
word.insert(std::pair<std::string,int>(key,15));
for( auto &x : word)
    std::cout<<x.first<<" " << x.second<<"\n";
As user4581301 pointed out in a comment, if you have a C++11-enabled compiler, you can insert values into the std::multimap as follows:
word.emplace("with",16);
word.emplace("with",6);
word.emplace("with",15);
Demo: http://coliru.stacked-crooked.com/a/c7ede5c497172c5d
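To read back only the values stored under one key, std::multimap offers equal_range(). A small sketch (the typedef WordMap and the function values_for are names chosen for this example):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::multimap<std::string, int> WordMap;

// Collect every value stored under `key`; returns an empty vector if there are none.
std::vector<int> values_for(const WordMap& word, const std::string& key)
{
    std::vector<int> out;
    std::pair<WordMap::const_iterator, WordMap::const_iterator> range =
        word.equal_range(key);
    for (WordMap::const_iterator it = range.first; it != range.second; ++it)
        out.push_back(it->second);
    return out;
}
```

Calling values_for(word, "with") on the map built above would yield the three values inserted for "with".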
Example for using C++ maps to hold multiple integer values:
#include <iostream>
#include <map>
#include <vector>
using namespace std;
int main(){
    std::map<int, std::vector<int> > mymap2;
    std::vector<int> myvector;
    myvector.push_back(8);
    myvector.push_back(11);
    myvector.push_back(53);
    mymap2[5] = myvector;
    cout << mymap2[5][0] << endl;
    cout << mymap2[5][1] << endl;
    cout << mymap2[5][2] << endl;
}
Prints:
8
11
53
Just replace the int datatypes with a string and you should be able to map strings to lists of numbers.

C++ Standard Library hash code sample

I solved a problem to find duplicates in a list
I used the property of a set that it contains only unique members
set<int> s;
// insert the new item into the set
s.insert(nums[index]);
// if size does not increase there is a duplicate
if (s.size() == previousSize)
{
    DuplicateFlag = true;
    break;
}
Now I am trying to solve the same problem with hash functions in the Standard Library. I have sample code like this
#include <functional>
using namespace __gnu_cxx;
using namespace std;
hash<int> hash_fn2;
int x = 34567672;
size_t int_hash2 = hash_fn2(x);
cout << x << " " << int_hash2 << '\n';
x and int_hash2 are always the same
Am I missing something here ?
For std::hash<int>, it's ok to directly return the original int value. From the specification, it only needs to ensure that for two different parameters k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0/std::numeric_limits<size_t>::max(). Clearly returning the original value satisfies the requirement for std::hash<int>.
x and int_hash2 are always the same Am I missing something here ?
Yes. You say "I am trying to solve the same problem with hash functions", but hash functions are not functional alternatives to std::set<>s, and cannot, by themselves, be used to solve your problem. You probably want to use a std::unordered_set<>, which internally uses a hash table, with the std::hash<> function (by default) helping it map from elements to "buckets". For the purposes of a hash table, a hash function for integers that returns the input is usually good enough, and if it's not, the programmer is expected to provide their preferred alternative as a template parameter.
Anyway, all you have to do to try a hash table approach is change set<int> s; to std::unordered_set<int> s; in your original code (and include <unordered_set>).
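For illustration, here is a self-contained sketch of the duplicate check with std::unordered_set, assuming a C++11 compiler. It uses the bool returned by insert() rather than comparing sizes; the function name has_duplicate is invented for this example:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns true as soon as a value is seen for the second time.
bool has_duplicate(const std::vector<int>& nums)
{
    std::unordered_set<int> seen;
    for (std::size_t i = 0; i < nums.size(); ++i)
    {
        // insert() returns a pair; its bool member is false if the value was already present.
        if (!seen.insert(nums[i]).second)
            return true;
    }
    return false;
}
```

The same function works unchanged with std::set<int>; only the container type and the header differ.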

Speed up access to many std::maps with same key

Suppose you have a std::vector<std::map<std::string, T> >. You know that all the maps have the same keys. They might have been initialized with
typedef std::map<std::string, int> MapType;
std::vector<MapType> v;
const int n = 1000000;
v.reserve(n);
for (int i=0;i<n;i++)
{
    std::map<std::string, int> m;
    m["abc"] = rand();
    m["efg"] = rand();
    m["hij"] = rand();
    v.push_back(m);
}
Given a key (e.g. "efg"), I would like to extract all values of the maps for the given key (which definitely exists in every map).
Is it possible to speed up the following code?
std::vector<int> efgValues;
efgValues.reserve(v.size());
BOOST_FOREACH(MapType const& m, v)
{
    efgValues.push_back(m.find("efg")->second);
}
Note that the values are not necessarily int. As profiling confirms that most time is spent in the find function, I was thinking about whether there is a (GCC and MSVC compliant C++03) way to avoid locating the element in the map based on the key for every single map again, because the structure of all the maps is equal.
If no, would it be possible with boost::unordered_map (which is 15% slower on my machine with the code above)? Would it be possible to cache the hash value of the string?
P.S.: I know that having a std::map<std::string, std::vector<T> > would solve my problem. However, I cannot change the data structure (which is actually more complex than what I showed here).
You can cache and playback the sequence of comparison results using a stateful comparator. But that's just nasty; the solution is to adjust the data structure. There's no "cannot." Actually, adding a stateful comparator is changing the data structure. That requirement rules out almost anything.
Another possibility is to create a linked list across the objects of type T so you can get from each map to the next without another lookup. If you might be starting at any of the maps (please, just refactor the structure) then a circular or doubly-linked list will do the trick.
As profiling confirms that most time is spent in the find function
Keeping the tree data structures and optimizing the comparison can only speed up the comparison. Unless the time is spent in operator< (std::string const&, std::string const&), you need to change the way it's linked together.
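If the data structure really cannot change, one thing that may be worth measuring: std::map iterates in key order, so when every map holds exactly the same keys (as stated in the question), a given key sits at the same position in each map. A rough C++03 sketch (the function name extract_by_position is made up here, and whether it beats find() depends on the map size and would need profiling) looks up the position once and then walks that many steps in each map, avoiding string comparisons:

```cpp
#include <iterator>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, int> MapType;

// Assumes every map in v has the same key set, so `key` has the same
// iteration position in each of them.
std::vector<int> extract_by_position(const std::vector<MapType>& v,
                                     const std::string& key)
{
    std::vector<int> out;
    if (v.empty())
        return out;
    MapType::const_iterator first = v.front().find(key);
    if (first == v.front().end())
        return out;                      // key not present at all
    const MapType::difference_type offset =
        std::distance(v.front().begin(), first);
    out.reserve(v.size());
    for (std::vector<MapType>::const_iterator m = v.begin(); m != v.end(); ++m)
    {
        MapType::const_iterator it = m->begin();
        std::advance(it, offset);        // walks `offset` nodes, no key comparisons
        out.push_back(it->second);
    }
    return out;
}
```

Walking to the position is linear in the offset, so this only has a chance of helping when the maps are small or the key sits near the beginning.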

How to make a variable name without creating an array in C++?

How do you make a variable name where you create a variable and then in brackets the variable number? (By the way, I'm just guessing out how the code should be so that you get what I'm trying to say.) For example:
int var[5];
//create a variable var[5], but not var[4], var[3], var[2], etc.
Then, the variable number must be able to be accessed by a variable value:
int number = 5;
int var[number]; //creates a var[5], not a var[4], etc.
int var[2]; //creates a var[2], not a var[1], etc.
cout >>var[number];
number = 2;
cin << var[number];
If I'm way off track with my "example", please suggest something else. I need something similar to this for my game to operate, because I must be able to create an unlimited instance of bullets, but they will also be destroyed at one point.
It looks like you are looking for the functionality provided by std::map which is a container used to map keys to values.
Documentation of std::map
Example use
In the below example we bind the value 123 to the integer key 4, and the value 321 to key 8. We then use a std::map<int,int>::const_iterator to iterate over the key/value pairs in our std::map named m.
#include <map>
...
std::map<int, int> m;
m[4] = 123;
m[8] = 321;
for (std::map<int, int>::const_iterator cit = m.begin (); cit != m.end (); ++cit)
    std::cout << cit->first << " -> " << cit->second << std::endl;
output:
4 -> 123
8 -> 321
It looks like you want variable length arrays, which is not something C++ supports. In most cases, the correct solution is to use an std::vector instead, as in
int number = 42; // or whatever
std::vector<int> var(number);
You can use std::vector as you would use an array in most cases, and you gain a lot of bonus functionality.
If I understand what you want correctly (which I'm not certain that I do), you want to be able to create a place to hold objects and use them according to some index number, but to only create the specific objects which go in it on demand. You want to do this either because 1) you don't know how many objects you're going to create or 2) you aren't going to use every index number or 3) both.
If (1) then you should probably just use a vector, which is an array-like structure which grows automatically as you add more things to it. Look up std::vector.
If (2) then you could use an array of pointers and initially set all of the values to null and then use new to create the objects as needed. (Or you could use the solution recommended in part 3.)
If (3) then you want to use some form of map or hash table. These structures will let you find things by number even when not all numbers are in use and will grow as needed. I would highly recommend a hash table. Before C++11 there wasn't one in the standard library, so you had to build your own or find one in a third-party library; since C++11 there is std::unordered_map. For ease, you can use std::map, which is part of the standard library. It does basically the same thing, but lookups are O(log n) rather than O(1) on average. Some C++ distributions also include a non-standard hash_map extension. If a hash map is available, it will usually be faster than std::map for lookups.
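Applied to the bullets from the question, case (3) could be sketched roughly like this. The Bullet struct, the BulletPool class and all of its member names are hypothetical, invented only to show objects being created and destroyed by id:

```cpp
#include <cstddef>
#include <map>

// Hypothetical game object; a real bullet would carry more state.
struct Bullet
{
    float x;
    float y;
};

// Hypothetical container that hands out ids and stores only live bullets.
class BulletPool
{
public:
    BulletPool() : next_id_(0) {}

    // Create a bullet and return the id it is stored under.
    int spawn(float x, float y)
    {
        const int id = next_id_++;
        Bullet b = {x, y};
        bullets_[id] = b;
        return id;
    }

    // Remove a bullet; returns false if the id was not alive.
    bool destroy(int id) { return bullets_.erase(id) > 0; }

    bool alive(int id) const { return bullets_.find(id) != bullets_.end(); }

    std::size_t count() const { return bullets_.size(); }

private:
    int next_id_;
    std::map<int, Bullet> bullets_;
};
```

Ids are never reused in this sketch, so a destroyed bullet's id simply stops being found.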

How does one test item membership in an unordered sequence (C++)?

I've used python for a long time, and I'm just beginning to use C++
In python, if one has a set or a dictionary it is relatively easy to get a boolean value indicating whether or not a particular item is in that sequence using the in keyword.
i.e.
a = set([2,4,3])
if 4 in a:
    print "yes, 4 is in a, thank you for asking!"
it's much more efficient than doing this:
a = [2,3,4]
for number in a:
    if number == 4:
        return "yes, 4 is in a, thank you for asking!"
Is there a way to make a membership test simple and efficient in C++, or do you always have to iterate through some ordered sequence?
You have functionality like this in std::set and tr1::unordered_set (standardized as std::unordered_set in C++11).
#include <set>
#include <cstdio>
int main() {
    std::set<int> s;
    s.insert(1);
    s.insert(2);
    s.insert(4);
    if (s.find(4) != s.end())
        puts("4 found!");
    return 0;
}
In reality, if your data set is small, linear search may still be the faster option.
Get to know the C++ Standard Template Library. The set class (and others) has a find() method that will return an iterator to an item in the set if it exists.
Take a look at the containers offered by STL and their performance characteristics.
Python's method is not necessarily "much more efficient": that depends on the complexity of the in construct for the container you use.
In C++ there are many ways of storing data. Balanced binary trees such as std::set give O(log n) lookups.
If you work with numbers like 2,3,4 etc, you may consider having an array of bools, and simply see if array[4] == true
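A sketch of that array-of-bools idea, assuming the values are small non-negative integers below a known limit (the function names make_presence_table and is_present are made up for this example):

```cpp
#include <cstddef>
#include <vector>

// Build a lookup table where table[n] is true if n occurs in `values`.
// Values outside [0, limit) are ignored.
std::vector<bool> make_presence_table(const std::vector<int>& values,
                                      std::size_t limit)
{
    std::vector<bool> table(limit, false);
    for (std::size_t k = 0; k < values.size(); ++k)
    {
        if (values[k] >= 0 && static_cast<std::size_t>(values[k]) < limit)
            table[static_cast<std::size_t>(values[k])] = true;
    }
    return table;
}

// Constant-time membership test with bounds checking.
bool is_present(const std::vector<bool>& table, int value)
{
    return value >= 0
        && static_cast<std::size_t>(value) < table.size()
        && table[static_cast<std::size_t>(value)];
}
```

The table costs memory proportional to the limit, so this only pays off when the range of possible values is small.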
template<typename T>
bool contains(const std::set<T>& a, const T& value) {
    return a.find(value) != a.end();
}
if (contains(a, 4)) {
    std::cout << "A contains 4\n";
}
std::find can determine whether or not an element exists in any unordered container or sequence. If the sequence is unordered and not stored in a specialized container designed for lookups, it's unlikely you're going to do any better than O(N).