Using unordered_map to store key-value pairs in STL

I have to store some data in my program as described below.
The data is high dimensional coordinates and the number of points in those coordinates. Following would be a simple example (with coordinate dimension 5):
coordinate         # of points
(3, 5, 3, 5, 7)    6
(6, 8, 5, 8, 9)    4
(4, 8, 6, 7, 9)    3
Please note that even if I use 5 dimensions as an example, the actual problem is of 20 dimensions. The coordinates are always integers.
I want to store this information in some kind of data structure. The first thing that comes to my mind is a hash table. I tried unordered_map from the STL, but I cannot figure out how to use the coordinates as the key in unordered_map. Defining it as:
unordered_map<int[5], int> umap;
or,
unordered_map<int[], int> umap;
gives me a compilation error. What am I doing wrong?

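unordered_map needs to know how to hash your coordinates. In addition, it needs a way to compare coordinates for equality. There is no std::hash specialization for built-in arrays such as int[5], which is why your declarations fail to compile.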
You can wrap your coordinates in a class or struct and provide a custom operator == to compare coordinate points. Then you need to specialise std::hash to be able to use your Point struct as a key in unordered_map. While comparing coordinates for equality is fairly straightforward, it is up to you to decide how coordinates are hashed. The following is an overview of what you need to implement:
#include <vector>
#include <unordered_map>
#include <cmath>
class Point
{
std::vector<int> coordinates;
public:
// Compare against another Point (not a raw vector); const so it works on const keys.
bool operator == (const Point& _other) const
{
// std::vector::operator== already checks the size and every element.
return coordinates == _other.coordinates;
}
};
namespace std
{
template<>
struct hash<Point>
{
std::size_t operator() (const Point& _point) const noexcept
{
std::size_t hash = 0; // placeholder so a defined value is returned; combine the coordinate hashes here
// See https://www.boost.org/doc/libs/1_67_0/doc/html/hash/reference.html#boost.hash_combine
// for an example of hash implementation for std::vector.
// Using Boost just for this might be an overkill - you could use just the hash_combine code here.
return hash;
}
};
}
int main()
{
std::unordered_map<Point, int> points;
// Use points...
return 0;
}
In case you know how many coordinates you are going to have and you can name them like this
struct Point
{
int x1;
int x2;
int x3;
// ...
};
you could use a header-only hashing library I wrote exactly for this purpose. Your mileage may vary.
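As a minimal sketch of the approach outlined above, assuming the dimension is known at compile time, the key can be a std::array<int, 20>. std::array already has operator==, so only a hash functor is needed. The names kDim, Coord, CoordHash, PointCounts and add_points are made up for illustration, and the mixing step is the well-known boost::hash_combine formula written out by hand rather than a call into Boost.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Assumption: the dimension is fixed at compile time (20 in the question).
constexpr std::size_t kDim = 20;
using Coord = std::array<int, kDim>;

// Hash functor that folds the element hashes together using the
// boost::hash_combine mixing step, written out by hand.
struct CoordHash
{
    std::size_t operator()(const Coord& c) const noexcept
    {
        std::size_t seed = 0;
        for (int v : c)
        {
            seed ^= std::hash<int>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

// std::array provides operator==, so only the hash is supplied explicitly.
using PointCounts = std::unordered_map<Coord, int, CoordHash>;

// Hypothetical helper: record n more points at coordinate c.
inline void add_points(PointCounts& counts, const Coord& c, int n)
{
    assert(n > 0);
    counts[c] += n; // operator[] value-initialises a missing entry to 0
}
```

With this in place, add_points(counts, c, 6) records six points at coordinate c, and counts[c] reads the total back.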

Hacky way
I've seen this being used in programming competitions for ease of use. You can convert the set of points to a string (concatenate the coordinates, separated by a space or any other special character) and then use unordered_map<string, int>:
unordered_map<string, int> map;
int p[5] = {3, 5, 3, 5, 7};
string point = to_string(p[0]) + " " + to_string(p[1]) + " " + to_string(p[2]) + " " + to_string(p[3]) + " " + to_string(p[4]);
map[point] = 6;

C++ OpenCV: Convert vector<vector<Point>> to vector<vector<Point2f>>

I get vector<vector<Point>> data from OpenCV. For some reasons (for example offset/scaling), I need to convert the Point data to Point2f. How can I do that?
For example:
std::vector<std::vector<Point> > contours;
std::vector<cv::Vec4i> hierarchy;
cv::findContours(edges, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE);
std::vector<std::vector<Point2f> > new_contour;
I need new_contour, which is a vector of vectors of Point2f. Is there a simple way to convert the data to the float type? Then I can make some modifications (for example offset/scaling) by replacing each Point2f value.
I tried it using push_back, but I still get an error when building...

You can use 2 nested loops: one for the outer vector and one for the inner.
In the code below you can replace the trivial conversion between int and float and apply any transformation you need.
Note that for the output contours, I allocated the outer vector using std::vector::resize and then in a similar way allocated all the inner vectors in the loop.
Alternatively you could use std::vector::reserve to do all the allocations together with std::vector::push_back for adding the elements.
std::vector<std::vector<cv::Point>> contours;
// ... obtain the contours
std::vector<std::vector<cv::Point2f>> contours_float;
contours_float.resize(contours.size()); // allocate the outer vector
for (size_t i = 0; i < contours.size(); ++i)
{
auto const & cur_contour = contours[i];
auto & cur_contour_float = contours_float[i];
cur_contour_float.resize(cur_contour.size()); // allocate the current inner vector
for (size_t j = 0; j < cur_contour.size(); ++j)
{
auto const & cur_point = cur_contour[j];
// Here you can apply any transformation you need:
float x = static_cast<float>(cur_point.x);
float y = static_cast<float>(cur_point.y);
cur_contour_float[j] = cv::Point2f{ x,y };
}
}
Another way to implement this is using std::transform (here I found it convenient to allocate the vectors by using the appropriate std::vector constructor):
std::vector<std::vector<cv::Point>> contours;
// ... obtain the contours
std::vector<std::vector<cv::Point2f>> contours_float(contours.size()); // allocate the outer vector
std::transform(
contours.begin(),
contours.end(),
contours_float.begin(),
[](auto const & cur_contour) -> auto
{
std::vector<cv::Point2f> cur_contour_float(cur_contour.size()); // allocate the current inner vector
std::transform(
cur_contour.begin(),
cur_contour.end(),
cur_contour_float.begin(),
[](auto const & cur_point) -> auto
{
// Here you can apply any transformation you need:
float x = static_cast<float>(cur_point.x);
float y = static_cast<float>(cur_point.y);
return cv::Point2f{ x,y };
});
return cur_contour_float;
});
The 2nd version actually implements the exact operation that you require (transformation from one representation to another).

I'll first focus on your actual question, which is just the type conversion.
Let's say we have some basic type aliases to make our life easier:
using pi_vec = std::vector<cv::Point2i>; // NB: Point is an alias for Point2i
using pf_vec = std::vector<cv::Point2f>;
using pi_vec_vec = std::vector<pi_vec>;
using pf_vec_vec = std::vector<pf_vec>;
To solve this problem, let's use the "divide and conquer" principle:
In order to convert a vector of vectors of points, we need to be able to convert a single vector of points
In order to convert a vector of points, we need to be able to convert a single point
Converting Single Points
This is actually trivial, since cv::Point_ provides a cast operator allowing implicit conversion to points of a different data type. Hence, conversion is done by a simple assignment:
cv::Point2i p1 = { 1,2 };
cv::Point2f p2 = p1;
Converting Vectors of Points
Since implicit conversion of points is possible, this is just as easy -- we simply construct the new vector initializing it using an iterator pair:
pi_vec v1 = { {1,2}, {3,4}, {5,6} };
pf_vec v2{ v1.begin(), v1.end() };
Converting Vectors of Vectors of Points
We do the above in a loop. To improve performance, we can reserve space in the destination vector to avoid reallocations. Then we use emplace_back to construct vectors of points in-place, using an iterator pair to initialize them as above.
pi_vec_vec vv1 = {
{ {1,2}, {3,4}, {5,6} }
, { {7, 8}, {9,10} }
};
pf_vec_vec vv2;
vv2.reserve(vv1.size());
for (auto const& pts : vv1) {
vv2.emplace_back(pts.begin(), pts.end());
}
Additional Operations During Conversion
The answer by @wohlstad already provides some possible approaches. However, looking at the second piece of code using std::transform made me wonder whether there was a way to make it a bit less verbose, perhaps taking advantage of features provided by more recent standards (like C++20). Here is my approach using std::views::transform.
First we create a "range adapter closure" wrapping our conversion lambda function, which applies some constant scaling and offset:
auto const tf = std::views::transform(
[](cv::Point2i const& pt) -> cv::Point2f
{
return cv::Point2f(pt) * 0.5 + cv::Point2f{ 1, -1 };
});
Next, we create an outer transform view of the input vector of vectors of integer points. The lambda function will create an inner transform view using the previously created "range adapter closure", and use this view to construct a vector of float points:
auto const tf_view2 = std::views::transform(vv1
, [&tf](pi_vec const& v) -> pf_vec
{
auto const tf_view = v | tf;
return { tf_view.begin(), tf_view.end() };
});
Finally, we'll construct a vector of vectors of float points, using an iterator pair of this view:
pf_vec_vec vv3{ tf_view2.begin(), tf_view2.end()};
Example Code
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <iostream>
#include <ranges>
#include <vector>
template <typename T>
void dump_v(std::vector<T> const& v)
{
for (auto const& e : v) {
std::cout << e << ' ';
}
std::cout << '\n';
}
template <typename T>
void dump_vv(std::vector<std::vector<T>> const& vv)
{
for (auto const& v : vv) {
dump_v(v);
}
std::cout << '\n';
}
int main()
{
using pi_vec = std::vector<cv::Point2i>;
using pf_vec = std::vector<cv::Point2f>;
using pi_vec_vec = std::vector<pi_vec>;
using pf_vec_vec = std::vector<pf_vec>;
pi_vec_vec vv1 = {
{ {1,2}, {3,4}, {5,6} }
, { {7, 8}, {9,10} }
};
dump_vv(vv1);
pf_vec_vec vv2;
vv2.reserve(vv1.size());
for (auto const& pts : vv1) {
vv2.emplace_back(pts.begin(), pts.end());
}
dump_vv(vv2);
auto const tf = std::views::transform(
[](cv::Point2i const& pt) -> cv::Point2f
{
return cv::Point2f(pt) * 0.5 + cv::Point2f{ 1, -1 };
});
auto const tf_view2 = std::views::transform(vv1
, [&tf](pi_vec const& v) -> pf_vec
{
auto const tf_view = v | tf;
return { tf_view.begin(), tf_view.end() };
});
pf_vec_vec vv3{ tf_view2.begin(), tf_view2.end()};
dump_vv(vv3);
return 0;
}
Example Output
[1, 2] [3, 4] [5, 6]
[7, 8] [9, 10]
[1, 2] [3, 4] [5, 6]
[7, 8] [9, 10]
[1.5, 0] [2.5, 1] [3.5, 2]
[4.5, 3] [5.5, 4]

More efficient way to get indices of a binary mask in Eigen3?

I currently have a bool mask vector generated in Eigen. I would like to use this binary mask as in Python's numpy, where the True values select a sub-matrix or sub-vector on which I can do further calculations.
To achieve this in Eigen, I currently "convert" the mask vector into another vector containing the indices by simply iterating over the mask:
Eigen::Array<bool, Eigen::Dynamic, 1> mask = ... // E.G.: [0, 1, 1, 1, 0, 1];
Eigen::Array<uint32_t, Eigen::Dynamic, 1> mask_idcs(mask.count(), 1);
int z_idx = 0;
for (int z = 0; z < mask.rows(); z++) {
if (mask(z)) {
mask_idcs(z_idx++) = z;
}
}
// do further calculations on vector(mask_idcs)
// E.G.: vector(mask_idcs)*3 + another_vector
However, I want to optimize this further and am wondering whether Eigen3 provides a more elegant solution, something like vector(from_bin_mask(mask)), which may benefit from the library's optimizations.
There are already some questions here on SO, but none seems to answer this simple use case (1, 2). Some refer to the select function, which returns an equally sized vector/matrix/array, but I want to discard elements via a mask and only continue working with a smaller vector/matrix/array.
Is there a more elegant way to do this? Can this be optimized otherwise?
(I am using the Eigen::Array type since most of the calculations are element-wise in my use case.)

As far as I'm aware, there is no off-the-shelf solution using Eigen's methods. However, it is interesting to note that (at least for Eigen 3.4.0 and later) you can use a std::vector<int> for indexing (see this section). Therefore the code you've written could be simplified to
Eigen::Array<bool, Eigen::Dynamic, 1> mask = ... // E.G.: [0, 1, 1, 1, 0, 1];
std::vector<int> mask_idcs;
for (int z = 0; z < mask.rows(); z++) {
if (mask(z)) {
mask_idcs.push_back(z);
}
}
// do further calculations on vector(mask_idcs)
// E.G.: vector(mask_idcs)*3 + another_vector
If you're using C++20, you could use an alternative implementation based on std::ranges that avoids raw for-loops:
int const N = mask.size();
auto c = iota(0, N) | filter([&mask](auto const& i) { return mask[i]; });
auto masked_indices = std::vector(begin(c), end(c));
// ... Use it as vector(masked_indices) ...
I've implemented some minimal examples in Compiler Explorer in case you'd like to check them out. I honestly wish there were a simpler way to initialize the std::vector from the raw range, but it's currently not so simple. Therefore I'd suggest wrapping the code in a helper function, for example
auto filtered_indices(auto const& mask) // or as you've suggested from_bin_mask(auto const& mask)
{
using std::ranges::begin;
using std::ranges::end;
using std::views::filter;
using std::views::iota;
int const N = mask.size();
auto c = iota(0, N) | filter([&mask](auto const& i) { return mask[i]; });
return std::vector(begin(c), end(c));
}
and then use it as, for example,
Eigen::ArrayXd F(5);
F << 0.0, 1.1548, 0.0, 0.0, 2.333;
auto mask = (F > 1e-15).eval();
auto D = (F(filtered_indices(mask)) + 3).eval();
It's not as clean as in numpy, but it's something :)

I have found another way, which seems to be more elegant than checking each element against 0:
Eigen::SparseMatrix<bool> mask_sparse = mask.matrix().sparseView();
for (uint32_t k = 0; k<mask.outerSize(); ++k) {
for (Eigen::SparseMatrix<bool>::InnerIterator it(mask_sparse, k); it; ++it) {
std::cout << it.row() << std::endl; // row index
std::cout << it.col() << std::endl; // col index
// Do Stuff or built up an array
}
}
Here we can at least build up a vector (or multiple vectors, if we have more dimensions) and later use it to "mask" a vector or matrix. (This is taken from the documentation.)
So applied to this specific use case, we simply do:
Eigen::Array<uint32_t, Eigen::Dynamic, 1> mask_idcs(mask.count(), 1);
Eigen::SparseVector<bool> mask_sparse = mask.matrix().sparseView();
int z_idx = 0;
for (Eigen::SparseVector<bool>::InnerIterator it(mask_sparse); it; ++it) {
mask_idcs(z_idx++) = it.index();
}
// do Stuff like vector(mask_idcs)*3 + another_vector
However, I do not know which version is faster for large masks containing thousands of elements.

C++ Making sure 2D vector is compact in memory

I'm writing a C++ program that performs calculations on a huge graph and therefore has to be as fast as possible. I have a 100 MB text file of unweighted edges and am reading them into a 2D vector of integers (first index = nodeID, then a sorted list of nodeIDs of nodes which have edges to that node). Also, during the program, the edges are looked up exactly in the order in which they're stored in the list. So my expectation was that, apart from a few bigger gaps, the data would always be nicely preloaded into the cache. However, according to my profiler, iterating through the edges of a player is an issue. Therefore I suspect that the 2D vector isn't placed in memory compactly.
How can I ensure that my 2D vector is as compact as possible and the subvectors in the order in which they should be?
(I thought for example about making a "2D array" from the 2D vector, first an array of pointers, then the lists.)
BTW: In case it wasn't clear: The nodes can have different numbers of edges, so a normal 2D array is no option. There are a couple ones with lots of edges, but most have very few.
EDIT:
I've solved the problem and my program is now more than twice as fast:
There was a first solution and then a slight improvement:
I put the lists of neighbour ids into a 1D integer array and had another array to know where a certain id's neighbour lists start
I got a noticeable speedup by replacing the pointer array (a pointer needs 64 bit) with a 32 bit integer array containing indices instead
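The layout described in this edit can be sketched roughly as follows. This is only an illustration under assumptions: the names CompactGraph, offsets, neighbours and make_compact are invented here, and the sketch assumes the total number of edges fits into 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; a sketch of the flat layout, not the
// original program.
struct CompactGraph
{
    std::vector<std::uint32_t> offsets;    // size = node count + 1
    std::vector<std::uint32_t> neighbours; // all neighbour lists, back to back

    std::size_t node_count() const
    {
        return offsets.empty() ? 0 : offsets.size() - 1;
    }

    // Half-open pointer range [begin, end) over the neighbours of `node`.
    const std::uint32_t* begin(std::size_t node) const
    {
        assert(node < node_count());
        return neighbours.data() + offsets[node];
    }
    const std::uint32_t* end(std::size_t node) const
    {
        assert(node < node_count());
        return neighbours.data() + offsets[node + 1];
    }
};

// Flattens a vector-of-vectors adjacency list into the compact form.
// Assumes the total edge count fits in a 32-bit index.
inline CompactGraph make_compact(
    const std::vector<std::vector<std::uint32_t>>& adjacency)
{
    CompactGraph g;
    g.offsets.reserve(adjacency.size() + 1);
    g.offsets.push_back(0);
    for (const auto& list : adjacency)
    {
        g.neighbours.insert(g.neighbours.end(), list.begin(), list.end());
        g.offsets.push_back(static_cast<std::uint32_t>(g.neighbours.size()));
    }
    return g;
}
```

Iterating the edges of a node is then a plain pointer walk from g.begin(node) to g.end(node) over one contiguous array, and consecutive nodes are adjacent in memory.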

What data structure are you using for the 2D vector? Each std::vector stores its own elements contiguously, but in a std::vector<std::vector<int>> every inner vector has its own heap allocation, so the rows are not guaranteed to sit next to each other in memory.
Next, if pointers are stored, then only the addresses take advantage of the vector's spatial locality. Are you accessing the objects pointed to when iterating the edges? If so, this could be a bottleneck. To get around this, you could set up your objects so that they are also in contiguous memory and benefit from spatial locality.
Finally, the way in which you access the members of a vector affects the caching. Make sure you are accessing in an order advantageous to the container used (e.g. change the column index first when iterating).
Here are some helpful links:
Cache Blocking Techniques
SO on cache friendly code

I have written a few structures of this type by having a 2D view onto a 1D vector, and there are lots of different ways to do it. I have never made one that allows the internal arrays to vary in length before, so this may contain bugs, but it should illustrate the general approach:
#include <cassert>
#include <iostream>
#include <vector>
template<typename T>
class array_of_arrays
{
public:
array_of_arrays() {}
template<typename Iter>
void push_back(Iter beg, Iter end)
{
m_idx.push_back(m_vec.size());
m_vec.insert(std::end(m_vec), beg, end);
}
T* operator[](std::size_t row) { assert(row < rows()); return &m_vec[m_idx[row]]; }
T const* operator[](std::size_t row) const { assert(row < rows()); return &m_vec[m_idx[row]]; }
std::size_t rows() const { return m_idx.size(); }
std::size_t cols(std::size_t row) const
{
assert(row < m_idx.size()); // m_idx[row] is read below, so row must be a valid index
auto b = m_idx[row];
auto e = row + 1 >= m_idx.size() ? m_vec.size() : m_idx[row + 1];
return std::size_t(e - b);
}
private:
std::vector<T> m_vec;
std::vector<std::size_t> m_idx;
};
int main()
{
array_of_arrays<int> aoa;
auto data = {2, 4, 3, 5, 7, 2, 8, 1, 3, 6, 1};
aoa.push_back(std::begin(data), std::begin(data) + 3);
aoa.push_back(std::begin(data) + 3, std::begin(data) + 8);
for(auto row = 0UL; row < aoa.rows(); ++row)
{
for(auto col = 0UL; col < aoa.cols(row); ++col)
{
std::cout << aoa[row][col] << ' ';
}
std::cout << '\n';
}
}
Output:
2 4 3
5 7 2 8 1

using C++ priority_queue comparator correctly

This question was asked in an interview recently
public interface PointsOnAPlane {
/**
* Stores a given point in an internal data structure
*/
void addPoint(Point point);
/**
* For given 'center' point returns a subset of 'm' stored points that are
* closer to the center than others.
*
* E.g. Stored: (0, 1) (0, 2) (0, 3) (0, 4) (0, 5)
*
* findNearest(new Point(0, 0), 3) -> (0, 1), (0, 2), (0, 3)
*/
vector<Point> findNearest(vector<Point> points, Point center, int m);
}
This is the approach I used:
1) Create a max heap priority_queue to store the closest points
priority_queue<Point,vector<Point>,comp> pq;
2) Iterate the points vector and push a point if priority queue size < m
3) If size == m then compare the queue top with current point and pop if necessary
for(int i=0;i<points.size();i++)
{
if(pq.size() < m)
{
pq.push(points[i]);
}
else
{
if(compareDistance(points[i],pq.top(),center))
{
pq.pop();
pq.push(points[i]);
}
}
}
4) Finally put the contents of priority queue in a vector and return.
How should I write the comp and the compareDistance comparator which will allow me to store m points initially and then compare the current point with the one on top?

I think your approach can be changed so that it uses the priority_queue in a different way. The code becomes a bit complex since there's an if-statement in the for loop, and this if-statement controls when to add to the priority_queue. Why not add all the points to the priority_queue first, and then pop out m points? Let the priority_queue do all the work.
The key to implementing the findNearest function using a priority_queue is to realize that the comparator can be a lambda that captures the center parameter. So you can do something like so:
#include <queue>
#include <vector>
using namespace std;
struct Point { int x, y; };
constexpr int distance(const Point& l, const Point& r)
{
return (l.x - r.x)*(l.x - r.x) + (l.y - r.y)*(l.y - r.y);
}
vector<Point> findNearest(const vector<Point>& points, Point center, int m)
{
auto comparator = [center](const Point& l, const Point& r) {
return distance(l, center) > distance(r, center);
};
priority_queue<Point, vector<Point>, decltype(comparator)> pq(comparator);
for (auto&& p : points) {
pq.emplace(p);
}
vector<Point> result;
for (int i = 0; i < m && !pq.empty(); ++i) { // stop early if fewer than m points are stored
result.push_back(pq.top());
pq.pop();
}
return result;
}
In an interview setting it's also good to talk about the flaws in the algorithm:
This implementation runs in O(n log n). Since only the closest m points are needed, this can be beaten: a heap bounded to m elements takes O(n log m), and a selection algorithm such as std::nth_element takes O(n) on average.
It uses O(n) extra space because of the queue, and we should be able to do better. What's really happening in this function is a sort, and sorts can be implemented in-place.
It is prone to integer overflow, because the squared distance is computed in int. A good idea would be to make the Point struct a template on its coordinate type. You can also use a template to make the points container generic in the findNearest function. The container just has to support iteration.
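For comparison, here is a rough sketch of the bounded max-heap variant that the question describes, which keeps at most m points in the queue. It is an illustration under assumptions rather than a definitive solution: Pt, squaredDistance and findNearestBounded are hypothetical names, and the coordinates are widened to long long to reduce the overflow risk.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

struct Pt { long long x, y; }; // hypothetical stand-in for Point

inline long long squaredDistance(const Pt& a, const Pt& b)
{
    return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
}

std::vector<Pt> findNearestBounded(const std::vector<Pt>& points,
                                   Pt center, std::size_t m)
{
    // "Closer than" acts as the less-than of the queue, so the point
    // farthest from the center sits on top (a max heap on distance).
    auto closer = [center](const Pt& l, const Pt& r) {
        return squaredDistance(l, center) < squaredDistance(r, center);
    };
    std::priority_queue<Pt, std::vector<Pt>, decltype(closer)> pq(closer);

    for (const Pt& p : points)
    {
        if (pq.size() < m)
        {
            pq.push(p); // still filling the first m slots
        }
        else if (m > 0 && closer(p, pq.top()))
        {
            pq.pop();   // evict the current farthest point
            pq.push(p);
        }
    }

    // The queue pops farthest-first, so fill the result from the back
    // to return the points nearest-first.
    std::vector<Pt> result(pq.size());
    for (std::size_t i = result.size(); i-- > 0; )
    {
        result[i] = pq.top();
        pq.pop();
    }
    assert(result.size() <= m);
    return result;
}
```

The same lambda serves both roles asked about in the question: it is the comp of the priority_queue and the compareDistance check against pq.top().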

Swapping two values within a 2D array

I am currently working on a 15 puzzle programming assignment. My question here is about how I would go about swapping the empty tile with an adjacent tile.
So, for example, let's go with the initial setup board.
I have:
int originalBoard[4][4] = {
{1 , 2, 3, 4},
{5 , 6, 7, 8},
{9 ,10,11,12},
{13,14,15, 0}};
So here, the locations of 12, 15, and 0 (the empty tile) in the array are [2][3], [3][2], and [3][3] respectively (array indices start at 0). What would be a method of swapping 0 out with either 12 or 15?
What I had in mind for this was creating a loop that would keep track of the empty tile every time I made a move.
I believe a good method would be to have two functions: one that updates the location of the empty tile, and one that makes the move.
So, right off the top of my head I would have:
void locateEmptyTile(int& blankRow, int& blankColumn, int originalBoard[4][4])
{
for (int row = 0; row < 4; row++)
{
for (int col = 0; col < 4; col++)
{
if (originalBoard[row][col] == 0)
{
blankRow = row;
blankColumn = col;
}
}
}
}
void move(int& blankRow, int& blankColumn, int originalBoard[4][4])
{
}
And in my main function I would have the variables: int blankRow and int blankColumn
Now, how would I take that data from locateEmptyTile and apply it into the move function in the relevant practical manner? The process does not currently connect within my head.
I appreciate any little bits of help.

If you're just asking for swap function you can use std::swap:
#include <algorithm> // until c++11
#include <utility> // since c++11
...
int m[3][3];
...
//somewhere in the code
std::swap(m[i][j], m[j][i]); // this swaps contents of two matrix cells
...
Or, at the place where you want to swap the contents of two variables (for example int a and int b), you can simply write:
int temp = a;
a = b;
b = temp;
As you can see swapping is the same as with normal arrays, c++ does not know if you are swapping two matrix cells or two array elements, it just knows that you are swapping two memory blocks with certain type.

The basic idea behind a swap is to hold one value in a temporary variable. Simply...
template<typename T, typename U>
void swap(T& lhs, U& rhs) {
T t = lhs;
lhs = rhs;
rhs = t;
}
So, you don't need to reference blankRow and blankCol, you just need to reference the values on the grid. Lets say that you want to swap what you know is blank positioned at (2, 1) with (2, 2)...
swap(originalBoard[2][1], originalBoard[2][2]);
... will swap the values within originalBoard.
Rather than writing your own, you can just use std::swap() to swap positions (declared in <algorithm> before C++11 and in <utility> since). That's exactly what it does.
If you would like originalBoard to be immutable and the move to result in a totally different board, just copy the board first before applying the swap.
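To connect this with the blankRow and blankColumn variables from the question, a move function could look roughly like the sketch below. The Direction enum and the name moveBlank are assumptions made for illustration; the function expects blankRow and blankColumn to already hold the position found by locateEmptyTile.

```cpp
#include <cassert>
#include <utility>

// Hypothetical direction type; not part of the original question.
enum class Direction { Up, Down, Left, Right };

// Slides the blank one cell in the given direction by swapping it with
// the neighbouring tile. Returns false and leaves everything untouched
// if the move would leave the 4x4 board.
inline bool moveBlank(int board[4][4], int& blankRow, int& blankColumn,
                      Direction dir)
{
    assert(board[blankRow][blankColumn] == 0); // caller must track the blank

    int newRow = blankRow;
    int newCol = blankColumn;
    switch (dir)
    {
    case Direction::Up:    --newRow; break;
    case Direction::Down:  ++newRow; break;
    case Direction::Left:  --newCol; break;
    case Direction::Right: ++newCol; break;
    }
    if (newRow < 0 || newRow > 3 || newCol < 0 || newCol > 3)
    {
        return false;
    }

    std::swap(board[blankRow][blankColumn], board[newRow][newCol]);
    blankRow = newRow;   // keep the tracked position in sync,
    blankColumn = newCol; // so no rescan of the board is needed
    return true;
}
```

Because the function updates blankRow and blankColumn itself, locateEmptyTile only has to be called once, after the board is first set up.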
