How does one implement a container which exposes multiple ranges?

I have a container which (among other things) exposes a string buffer, and the upper case version of that string buffer. (Well, it isn't just upper case, but it is similar in concept) I want to allow a caller to do something similar to:
container c("Example");
auto const iter = c.begin() + 2;
std::printf("%c\n", iter->get_source()); // Prints a
std::printf("%c\n", iter->get_upper()); // Prints A
iter->set('x');
std::puts(c.get()); // Prints Exxmple
std::puts(c.get_upper()); // Prints EXXMPLE
The problem is, the "proxy" type with the member functions get_source, get_upper, etc. has no obvious place it can be stored, and an iterator is required to return a reference to something, not a value. (vector<bool> has a similar problem)
Alternately I could expose some kind of shell container or range, or expose completely separate iterator begin/end functions. Does anyone have experience doing something like this and know what works well?

My personal approach to this sort of thing is to use property maps: I envision a system of algorithms which can [optionally] take a property map (or actually sometimes multiple property maps) for each range. The idea is that *it yields a key (e.g., the T& it currently does) which is then used with a property map which transforms the key into the actually accessed value. The transformation can, e.g., be the identity, yielding the current behavior of the algorithms and a good default to be used when there is no property map. The example above would look something like this:
auto const cursor = c.begin();
std::printf("%c\n", c.map_source()(*cursor));
std::printf("%c\n", c.map_upper()(*cursor));
c.map_source()(*cursor, 'x');
std::copy(c.map_source(), c, std::ostreambuf_iterator<char>(std::cout));
std::copy(c.map_upper(), c, std::ostreambuf_iterator<char>(std::cout));
std::copy([](unsigned char c)->char{ return std::toupper(c); }, c,
std::ostreambuf_iterator<char>(std::cout));
The code assumes that the property maps yielding the source and the capitalized characters are obtained using c.map_source() and c.map_upper(), respectively. The last variant using std::copy() uses a lambda function as a property map.
Sadly, I still haven't found the time to write up a coherent proposal to apply various improvements to the STL algorithms. ... nor do I have an implementation putting it all together (I have a somewhat clunky implementation which is about 10 years old and doesn't benefit from various C++11 features which make it a lot easier; also, this implementation only concentrates on property maps and doesn't use the interface I currently envision).
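To make the idea a little more concrete, here is a minimal sketch of one property-map-aware algorithm. It is not the interface envisioned above: the names copy_mapped, identity_map and upper_map are hypothetical, and it takes a plain iterator pair rather than a range.

```cpp
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical algorithm: like std::copy, but every key (*first) is passed
// through the property map `map` before it is written to `out`.
template <class Map, class InIt, class OutIt>
OutIt copy_mapped(Map map, InIt first, InIt last, OutIt out) {
    for (; first != last; ++first) {
        *out = map(*first);
        ++out;
    }
    return out;
}

// Identity property map: reproduces the behavior of the plain algorithm.
struct identity_map {
    template <class T>
    const T& operator()(const T& key) const { return key; }
};

// Property map yielding the upper-case version of each character.
struct upper_map {
    char operator()(char key) const {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(key)));
    }
};
```

Calling copy_mapped(upper_map{}, s.begin(), s.end(), std::back_inserter(t)) appends the upper-case version of s to t, while identity_map gives a plain copy.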

Related

How can we customize our own hash function for C++ unordered set to gain a specific order?

In competitive coding we face many questions where we have to provide the output in the order of the input. So, we need to make our own hash function. Any idea how I could write my own hash function?

... you can't. An unordered_set is unordered. Writing your own hash function will not change that. A particular standard library implementation may hold an order for a particular hash function and a particular set of data stored in that unordered_set. But not only is such code not portable, the order can also change simply by adding more elements to that set, since an insertion can trigger a rehash.
If you need to provide some output in the order of the input, then you should use a container that preserves the order you give it, like a vector.
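As a minimal sketch of that advice (the function name unique_in_input_order is made up for illustration), a vector can carry the order while an unordered_set is used only for fast duplicate checks:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the distinct elements of `input`, in the order they first appeared.
std::vector<std::string> unique_in_input_order(const std::vector<std::string>& input) {
    std::unordered_set<std::string> seen;  // membership test only, order irrelevant
    std::vector<std::string> out;          // preserves insertion order
    for (const auto& s : input) {
        if (seen.insert(s).second) {       // .second is true for a new element
            out.push_back(s);
        }
    }
    return out;
}
```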

Since implementations of unordered containers use hash tables (they are pretty much required to by the Standard) with separate chaining with linked lists, if you ensure that the number of buckets exceeds the range of your hash function (using reserve()) then it is reasonably likely - though not guaranteed - that elements will be stored in order of their hash value and for elements with the same hash value in insertion order.
I reiterate that this is not guaranteed, but in a coding competition where you know the implementation you may get away with it.
Also, this is of course inefficient since you will either need to reserve a huge number of buckets, requiring high memory usage, or restrict the range of your hash function resulting in collisions. You would be much better off using the ordered containers.

How can we customize our own hash function for C++ unordered set to gain a specific order?
You cannot reliably do what you want.
But why don't you use std::set (or std::map) for such a purpose? Look into this C++ reference, and read a good C++ programming book (and the C++11 standard n3337), for more.
We don't know what your actual use case is, but I might suggest making your own class, following the C++ rule of five, and having both
a std::map and a std::unordered_map for the same mathematical relation.
class YourClass {
// incomplete, should follow the rule of five
private:
std::map<std::string, long> mapstr;
std::unordered_map<std::string, long> hashstr;
public:
void put(const std::string&str, long n) {
mapstr.insert({str,n});
hashstr.insert({str,n});
}
/// etc...
};
Of course, if you are coding a multi-threaded program you'll need to have some std::mutex field in the class above to serialize access using std::lock_guard....
Any idea how could I write my own hash function?
Writing a good enough hash function is generally easy.
Writing a very efficient hash function could still give you a PhD, and you'll find many papers in ACM sponsored conferences on that topic.
Here is a simple and naive hash function on strings:
std::size_t naive_string_hash(const std::string&str) {
constexpr unsigned k1 = 78139; // a prime number
constexpr unsigned k2 = 98129; // another prime number
std::size_t h = 38197; // yet another prime number
for (char c: str)
h = (k1 * h) ^ (k2 * (unsigned)c);
return h;
}
You might replace the bitwise exclusive or ^ with a + and read about Bézout's identity.
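To actually use such a function with an unordered container, it has to be wrapped in a function object type and passed as the Hash template argument. The sketch below does that with a variant of the naive hash above; the names NaiveStringHash and NaiveStringSet are hypothetical, and the mixing constants are simply reused. Note that this changes how elements are bucketed, not the fact that iteration order is unspecified.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Function object wrapping a naive string hash (illustrative only).
struct NaiveStringHash {
    std::size_t operator()(const std::string& str) const {
        constexpr std::size_t k1 = 78139;  // a prime number
        constexpr std::size_t k2 = 98129;  // another prime number
        std::size_t h = 38197;             // yet another prime number
        for (unsigned char c : str) {
            h = (k1 * h) ^ (k2 * c);
        }
        return h;
    }
};

// An unordered_set that hashes its elements with NaiveStringHash.
using NaiveStringSet = std::unordered_set<std::string, NaiveStringHash>;
```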
Recommendation: study existing open source code
I strongly recommend looking, for inspiration, into the existing open source code in C++ (including the code of GCC and Clang, both being C++ compilers; or of FLTK or Qt) available on websites like github or gitlab. You might need to ask your manager for permission to study such code.
Recommendation: read documentation
I invite you to read the documentation of your C++ compiler (perhaps GCC or Clang), of your linker (perhaps binutils), of your source code editor (I like GNU emacs), of your version control system (e.g. git). If you are allowed to do so, I suggest using a GNU/Linux system (e.g. Debian or Ubuntu) on your computer (because Linux is mostly made of open source components, whose source code you can download and study).
See also http://linuxfromscratch.org/ and https://norvig.com/21-days.html

Updateable Priority Queue

Is there anything built into the C++ Standard Library that allows me to work in a priority queue/heap-like data structure (i.e., can always pop the highest value from the list, can define how the highest value is determined for custom classes, etc.) but allows me to update the keys in the heap? I'm dealing with fairly simple data, pairs to be exact, but I need to be able to update the value of a given key within the heap easily for my algorithm to function. What is the best way to achieve this in C++?

Binary heaps (which are how priority queues are implemented in the C++ standard library) do not support arbitrary update-key operations. A common method if updates are infrequent is to extrinsically flag the original item as invalid, and reinsert the value with the new key; when an invalid value is popped, it is ignored.
The alternative is using a different PQ implementation which does support update-key, such as a binomial heap. Binomial heaps have the particular advantage of being manipulated by swinging pointers, instead of moving values. This streamlines the task of implementing operations like update-key and delete.
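The "flag as invalid and reinsert" method from the first paragraph can be sketched on top of std::priority_queue as follows. This is only one way to do it, assuming int keys and int priorities; the class name LazyUpdatePQ and its members are made up for illustration. Instead of a separate flag, an entry counts as stale when its priority no longer matches the one recorded for its key.

```cpp
#include <queue>
#include <unordered_map>
#include <utility>

// Max-priority queue with "update by reinsertion" (lazy deletion).
class LazyUpdatePQ {
public:
    // Inserts `key`, or changes its priority. Older heap entries for the
    // same key stay in the heap but become stale.
    void set(int key, int priority) {
        current_[key] = priority;
        heap_.push({priority, key});
    }

    // Pops the key with the highest current priority, skipping stale
    // entries. Returns false if no valid entry is left.
    bool pop(int& key, int& priority) {
        while (!heap_.empty()) {
            std::pair<int, int> top = heap_.top();
            heap_.pop();
            auto it = current_.find(top.second);
            if (it != current_.end() && it->second == top.first) {
                key = top.second;
                priority = top.first;
                current_.erase(it);
                return true;
            }
            // Otherwise the entry was superseded by a later set(): ignore it.
        }
        return false;
    }

private:
    std::priority_queue<std::pair<int, int>> heap_;  // (priority, key)
    std::unordered_map<int, int> current_;           // key -> live priority
};
```

The heap can grow by one entry per update, which is why this fits best when updates are infrequent.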

I'm not sure what your take on Boost is, but I always consider it a kind of almost-standard library (some Boost functionality has even ended up in the standard library). In any case, if you're OK with using Boost, then Boost.Heap provides a priority queue with updatable priority.
Like most Boost libraries, it's header-only, so there are no linker hassles to go through and it won't make your build system any more complex. You can just #include it and use it.

I don't have the ability to comment on your question, but here is my understanding.
You mentioned pairs, and it sounds like you need the ability to change the priority from some function of the first element of the pair to some function of the second element.
That is, you want to initially use FirstComparator below, but then switch to SecondComparator.
typedef std::pair<X, Y> MyPair;
// Orders pairs by their first element.
struct FirstComparator
{
    bool operator() (const MyPair& left, const MyPair& right) const
    {
        return left.first < right.first;
    }
};
// Orders pairs by their second element.
struct SecondComparator
{
    bool operator() (const MyPair& left, const MyPair& right) const
    {
        return left.second < right.second;
    }
};
Because std::priority_queue is a template that includes a sorting criterion (as your question mentioned), you can create a second container of a different type. (I wrote less-than comparisons, so, as with the default std::less, the largest element is on top: a max-heap.)
You will need to transfer the members into it.
std::priority_queue<MyPair, std::vector<MyPair>, FirstComparator> firstQueue;
// Populate and use firstQueue
// Now create an initially-empty queue sorting according to other criterion,
// and copy elements in.
std::priority_queue<MyPair, std::vector<MyPair>, SecondComparator> secondQueue;
// 'pop' returns void, so copy the top element before removing it
while (!firstQueue.empty()) {
    secondQueue.push(firstQueue.top());
    firstQueue.pop();
}
// Use secondQueue
An alternative approach is to use a single std::vector, and re-sort it with std::sort using different sorting criteria. C++11 allows you to create named or anonymous lambda functions for establishing such sorting criteria on-the-fly.
Since your question specifically involved priority queues, I won't get into that unless you're specifically interested.
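For completeness, a minimal sketch of that vector-based alternative, assuming pairs of int (the alias IntPair and both function names are hypothetical). Each function sorts with the largest element first, so v.front() plays the role of the top of a priority queue:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using IntPair = std::pair<int, int>;

// Sorts so that the pair with the largest first element comes first.
void sort_by_first_desc(std::vector<IntPair>& v) {
    std::sort(v.begin(), v.end(),
              [](const IntPair& l, const IntPair& r) { return l.first > r.first; });
}

// Sorts so that the pair with the largest second element comes first.
void sort_by_second_desc(std::vector<IntPair>& v) {
    std::sort(v.begin(), v.end(),
              [](const IntPair& l, const IntPair& r) { return l.second > r.second; });
}
```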

Searching data using different keys

I am no expert in C++ and STL.
I use a structure in a Map as data. Key is some class C1.
I would like to access the same data but using a different key C2 too (where C1 and C2 are two unrelated classes).
Is this possible without duplicating the data?
I tried searching in google, but had a tough time finding an answer that I could understand.
This is for an embedded target where boost libraries are not supported.
Can somebody offer help?

You may store pointers to Data as std::map values, and you can have two maps with different keys pointing to the same data.
I think a smart pointer like std::shared_ptr is a good option in this case of shared ownership of data:
#include <map> // for std::map
#include <memory> // for std::shared_ptr
....
std::map<C1, std::shared_ptr<Data>> map1;
std::map<C2, std::shared_ptr<Data>> map2;
Instances of Data can be allocated using std::make_shared().
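A small wrapper keeps the two maps consistent. The sketch below uses int and std::string as stand-ins for C1 and C2, and a made-up Data struct; the class and member names are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>

struct Data { int value; };  // placeholder for the real payload

// Two maps with different key types sharing ownership of the same objects.
class SharedStore {
public:
    void insert(int k1, const std::string& k2, const Data& d) {
        std::shared_ptr<Data> p = std::make_shared<Data>(d);
        by_c1_[k1] = p;  // both maps point at the same Data instance
        by_c2_[k2] = p;
    }

    // Returns a null pointer if the key is unknown.
    std::shared_ptr<Data> find_c1(int k1) const {
        auto it = by_c1_.find(k1);
        if (it == by_c1_.end()) return nullptr;
        return it->second;
    }

    std::shared_ptr<Data> find_c2(const std::string& k2) const {
        auto it = by_c2_.find(k2);
        if (it == by_c2_.end()) return nullptr;
        return it->second;
    }

private:
    std::map<int, std::shared_ptr<Data>> by_c1_;
    std::map<std::string, std::shared_ptr<Data>> by_c2_;
};
```

A change made through one key is visible through the other, because only the pointer is duplicated.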

Not in the Standard Library, but Boost offers boost::multi_index

Two keys of different types
I must admit I've misread a bit, and didn't really notice you want 2 keys of different types, not values. The solution for that will build on what's below, though. Other answers have pretty much what will be needed for that, I'd just add that you could make a universal lookup function (C++14-ish pseudocode):
template<class Key>
auto lookup (Key const& key) { }
And specialize it for your keys (arguably easier than SFINAE)
template<>
auto lookup<KeyA> (KeyA const& key) { return map_of_keys_a[key]; }
And the same for KeyB.
If you wanted to encapsulate it in a class, an obvious choice would be to change lookup to operator[].
Key of the same type, but different value
Idea 1
The simplest solution I can think of in 60 seconds: (simplest meaning exactly that it should be really thought through). I'd also switch to unordered_map as default.
map<Key, Data> data;
map<Key2, Key> keys;
Access via data[keys["multikey"]].
This will obviously waste some space (duplicating objects of Key type), but I am assuming they are much smaller than the Data type.
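A sketch of Idea 1 with lookups that do not insert on a miss (operator[] would default-construct a missing entry). The key types int and std::string, the Data struct, and the class and member names are all placeholders chosen for illustration.

```cpp
#include <map>
#include <string>

struct Data { int value; };  // placeholder for the real payload

// Idea 1: the secondary map stores the primary key, not the data.
class IndirectStore {
public:
    void insert(int primary, const std::string& secondary, const Data& d) {
        data_[primary] = d;
        keys_[secondary] = primary;
    }

    // Returns nullptr if the primary key is unknown.
    const Data* find_primary(int primary) const {
        auto it = data_.find(primary);
        return it == data_.end() ? nullptr : &it->second;
    }

    // Two lookups: secondary key -> primary key -> data.
    const Data* find_secondary(const std::string& secondary) const {
        auto it = keys_.find(secondary);
        return it == keys_.end() ? nullptr : find_primary(it->second);
    }

private:
    std::map<int, Data> data_;
    std::map<std::string, int> keys_;
};
```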
Idea 2
Another solution would be to use pointers; then the only cost of duplicate is a (smart) pointer:
map<Key, shared_ptr<Data>> data;
Object of Data will be alive as long as there is at least one key pointing to it.

What I usually do in these cases is use non-owned pointers. I store my data in a vector:
std::vector<Data> myData;
And then I map pointers to each element. Since it is possible that pointers are invalidated because of the future growth of the vector, though, I will choose to use the vector indexes in this case.
std::map<Key1, int> myMap1;
std::map<Key2, int> myMap2;
Don't expose the data containers to your clients. Encapsulate element insertion and removal in specific functions, which insert everywhere and remove everywhere.

Bartek's "Idea 1" is good (though there's no compelling reason to prefer unordered_map to map).
Alternatively, you could have a std::map<C2, Data*>, or std::map<C2, std::map<C1, Data>::iterator> to allow direct access to Data objects after one C2-keyed search, but then you'd need to be more careful not to access invalid (erased) Data (or more precisely, to erase from both containers atomically from the perspective of any other users).
It's also possible for one or both maps to move to shared_ptr<Data> - the other could use weak_ptr<> if that's helpful ownership-wise. (These are in the C++11 Standard, otherwise the obvious source - boost - is apparently out for you, but maybe you've implemented your own or selected another library? Pretty fundamental classes for modern C++).
EDIT - hash tables versus balanced binary trees
This isn't particularly relevant to the question, but has received comments/interest below and I need more space to address it properly. Some points:
1) Bartek's casually advising to change from map to unordered_map without recommending an impact study re iterator/pointer invalidation is dangerous, and unwarranted given there's no reason to think it's needed (the question doesn't mention performance) and no recommendation to profile.
3) Relatively few data structures in a program are important to performance-critical behaviours, and there are plenty of times when the relative performance of one versus another is of insignificant interest. Supporting this claim - masses of code were written with std::map to ensure portability before C++11, and perform just fine.
4) When performance is a serious concern, the advice should be "Care => profile", but saying that a rule of thumb is ok - in line with "Don't pessimise prematurely" (see e.g. Sutter and Alexandrescu's C++ Coding Standards) - and if asked for one here I'd happily recommend unordered_map by default - but that's not particularly reliable. That's a world away from recommending every std::map usage I see be changed.
5) This container performance side-track has started to pull in ad-hoc snippets of useful insight, but is far from being comprehensive or balanced. This question is not a sane venue for such a discussion. If there's another question addressing this where it makes sense to continue this discussion and someone asks me to chip in, I'll do it sometime over the next month or two.

You could consider having a plain std::list holding all your data, and then various std::map objects mapping arbitrary key values to iterators pointing into the list:
std::list<Data> values;
std::map<C1, std::list<Data>::iterator> byC1;
std::map<C2, std::list<Data>::iterator> byC2;
I.e. instead of fiddling with more-or-less-raw pointers, you use plain iterators. And iterators into a std::list have very good invalidation guarantees: inserting or erasing other elements never invalidates them; only erasing the element itself does.
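One way to keep the list and both maps in sync is to hide them behind a small class. In the sketch below (int and std::string stand in for C1 and C2; ListIndexedStore, Entry and the member names are hypothetical) each list element also remembers its own keys, so that erasing through one key can clean up the other map too.

```cpp
#include <list>
#include <map>
#include <string>

struct Data { int value; };  // placeholder for the real payload

class ListIndexedStore {
    struct Entry { int k1; std::string k2; Data data; };
    using Iter = std::list<Entry>::iterator;

public:
    // Fails (returns false) if either key is already in use.
    bool insert(int k1, const std::string& k2, const Data& d) {
        if (by_c1_.count(k1) || by_c2_.count(k2)) return false;
        Iter it = values_.insert(values_.end(), Entry{k1, k2, d});
        by_c1_[k1] = it;
        by_c2_[k2] = it;
        return true;
    }

    Data* find_c1(int k1) {
        auto it = by_c1_.find(k1);
        return it == by_c1_.end() ? nullptr : &it->second->data;
    }

    Data* find_c2(const std::string& k2) {
        auto it = by_c2_.find(k2);
        return it == by_c2_.end() ? nullptr : &it->second->data;
    }

    // Removes the element from the list and from both indexes.
    bool erase_c1(int k1) {
        auto it = by_c1_.find(k1);
        if (it == by_c1_.end()) return false;
        Iter entry = it->second;
        by_c2_.erase(entry->k2);
        by_c1_.erase(it);
        values_.erase(entry);  // only this element's iterator is invalidated
        return true;
    }

private:
    std::list<Entry> values_;
    std::map<int, Iter> by_c1_;
    std::map<std::string, Iter> by_c2_;
};
```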

I had the same problem. At first, holding two maps of shared pointers sounded very cool, but you still need to manage these two maps (inserting, removing, etc.).
Then I came up with another way of doing this.
My use case was accessing data by either x-y or radius-angle coordinates: each point holds data, but the point can be described either as Cartesian x,y or as radius-angle.
So I wrote a struct like
struct MyPoint
{
    std::pair<int, int> cartesianPoint;
    std::pair<int, int> radianPoint;
    // Two points are equal if either representation matches.
    bool operator== (const MyPoint& rhs) const
    {
        return cartesianPoint == rhs.cartesianPoint || radianPoint == rhs.radianPoint;
    }
};
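After that I could use it as a key (std::unordered_map additionally needs a hash for MyPoint, e.g. a std::hash<MyPoint> specialization, and that hash must return the same value for any two points that compare equal):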
std::unordered_map<MyPoint, DataType> myMultIndexMap;
I am not sure if your case is the same or adjustable to this scenario, but it can be an option.

Mapping vectors of arbitrary type

I need to store a list of vectors of different types, each to be referenced by a string identifier. For now, I'm using std::map with std::string as the key and boost::any as its value (example implementation posted here).
I've come unstuck when trying to run a method on all the stored vectors, e.g.:
std::map<std::string, boost::any>::iterator it;
for (it = map_.begin(); it != map_.end(); ++it) {
it->second.reserve(100); // FAIL: refers to boost::any not std::vector
}
My questions:
Is it possible to cast boost::any to an arbitrary vector type so I can execute its methods?
Is there a better way to map vectors of arbitrary types and retrieve them later on with the correct type?
At present, I'm toying with an alternative implementation which replaces boost::any with a pointer to a base container class as suggested in this answer. This opens up a whole new can of worms with other issues I need to work out. I'm happy to go down this route if necessary but I'm still interested to know if I can make it work with boost::any, or if there are other better solutions.
P.S. I'm a C++ n00b novice (and have been spoilt silly by Python's dynamic typing for far too long), so I may well be going about this the wrong way. Harsh criticism (ideally followed by suggestions) is very welcome.
The big picture:
As pointed out in comments, this may well be an XY problem so here's an overview of what I'm trying to achieve.
I'm writing a task scheduler for a simulation framework that manages the execution of tasks; each task is an elemental operation on a set of data vectors. For example, if task_A is defined in the model to be an operation on "x"(double), "y"(double), "scale"(int) then what we're effectively trying to emulate is the execution of task_A(double x[i], double y[i], int scale[i]) for all values of i.
Every task (function) operates on a different subset of data, so these functions share a common function signature and only have access to data via specific APIs, e.g. get_int("scale") and set_double("x", 0.2).
In a previous incarnation of the framework (written in C), tasks were scheduled statically and the framework generated code based on a given model to run the simulation. The ordering of tasks is based on a dependency graph extracted from the model definition.
We're now attempting to create a common runtime for all models with a run-time scheduler that executes tasks as their dependencies are met. The move from generating model-specific code to a generic one has brought about all sorts of pain. Essentially, I need to be able to generically handle heterogenous vectors and access them by "name" (and perhaps type_info), hence the above question.
I'm open to suggestions. Any suggestion.

Looking through the added detail, my immediate reaction would be to separate the data out into a number of separate maps, with the type as a template parameter. For example, you'd replace get_int("scale") with get<int>("scale") and set_double("x", 0.2) with set<double>("x", 0.2);
Alternatively, using std::map, you could pretty easily change that (for one example) to something like doubles["x"] = 0.2; or int scale_factor = ints["scale"]; (though you may need to be a bit wary with the latter -- if you try to retrieve a nonexistent value, it'll create it with default initialization rather than signaling an error).
Either way, you end up with a number of separate collections, each of which is homogeneous, instead of trying to put a number of collections of different types together into one big collection.
If you really do need to put those together into a single overall collection, I'd think hard about just using a struct, so it would become something like vals.doubles["x"] = 0.2; or int scale_factor = vals.ints["scale"];
At least offhand, I don't see this losing much of anything, and by retaining static typing throughout, it certainly seems to fit better with how C++ is intended to work.
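A minimal sketch of that layout, under the assumption that only int and double vectors are needed (VectorStore, get, table and reserve_all are made-up names): one homogeneous map per element type, selected at compile time by the template argument. Because every vector keeps its static type, an operation such as reserve can be applied to all of them without any casting.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class VectorStore {
public:
    // Returns the vector registered under `name` for element type T,
    // creating an empty one if it does not exist yet (like operator[]).
    template <class T>
    std::vector<T>& get(const std::string& name) { return table<T>()[name]; }

    // Applies reserve() to every stored vector, whatever its element type.
    void reserve_all(std::size_t n) {
        for (auto& kv : ints_) kv.second.reserve(n);
        for (auto& kv : doubles_) kv.second.reserve(n);
    }

private:
    // Selects the map for element type T; only int and double are provided.
    template <class T>
    std::map<std::string, std::vector<T>>& table();

    std::map<std::string, std::vector<int>> ints_;
    std::map<std::string, std::vector<double>> doubles_;
};

template <>
inline std::map<std::string, std::vector<int>>& VectorStore::table<int>() {
    return ints_;
}

template <>
inline std::map<std::string, std::vector<double>>& VectorStore::table<double>() {
    return doubles_;
}
```

Asking for an unsupported element type, e.g. get<char>, fails at link time because no matching table specialization exists.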

Chaining iterators for C++

Python's itertools implements a chain iterator which essentially concatenates a number of different iterators to provide everything through a single iterator.
Is there something similar in C++ ? A quick look at the boost libraries didn't reveal something similar, which is quite surprising to me. Is it difficult to implement this functionality?

Came across this question while investigating for a similar problem.
Even if the question is old, now in the time of C++ 11 and boost 1.54 it is pretty easy to do using the Boost.Range library. It features a join-function, which can join two ranges into a single one. Here you might incur performance penalties, as the lowest common range concept (i.e. Single Pass Range or Forward Range etc.) is used as new range's category and during the iteration the iterator might be checked if it needs to jump over to the new range, but your code can be easily written like:
#include <boost/range/join.hpp>
#include <iostream>
#include <vector>
#include <deque>
int main()
{
std::deque<int> deq = {0,1,2,3,4};
std::vector<int> vec = {5,6,7,8,9};
for(auto i : boost::join(deq,vec))
std::cout << "i is: " << i << std::endl;
return 0;
}

In C++, an iterator usually doesn't make sense outside of the context of the begin and end of a range. The iterator itself doesn't know where the start and the end are. So in order to do something like this, you instead need to chain together ranges of iterators, where a range is a (start, end) pair of iterators.
Take a look at the boost::range documentation. It may provide tools for constructing a chain of ranges. The one difference is that they will have to be the same type and return the same type of iterator. It may be possible to make this more generic and chain together different types of ranges with something like any_iterator, but maybe not.

I've written one before (actually, just to chain two pairs of iterators together). It's not that hard, especially if you use boost's iterator_facade.
Making an input iterator (which is effectively what Python's chain does) is an easy first step. Finding the correct category for an iterator chaining a combination of different iterator categories is left as an exercise for the reader ;-).

Check the Views Template Library (VTL). It may not provide a 'chained iterator' directly, but I think it has all the necessary tools/templates available for implementing your own 'chained iterator'.
From the VTL Page:
A view is a container adaptor, that provides a container interface to
parts of the data or
a rearrangement of the data or
transformed data or
a suitable combination of the data sets
of the underlying container(s). Since views themselves provide the container interface, they can be easily combined and stacked. Because of template trickery, views can adapt their interface to the underlying container(s). More sophisticated template trickery makes this powerful feature easy to use.
Compared with smart iterators, views are just smart iterator factories.

What you are essentially looking for is a facade iterator that abstracts away the traversing through several sequences.
Since you are coming from a python background I'll assume that you care more about flexibility rather than speed. By flexibility I mean the ability to chain-iterate through different sequence types together (vector, array, linked list, set etc....) and by speed I mean only allocating memory from the stack.
If this is the case then you may want to look at the any_iterator from adobe labs:
http://stlab.adobe.com/classadobe_1_1any__iterator.html
This iterator will give you the ability to iterate through any sequence type at runtime. To chain you would have a vector (or array) of 3-tuple any_iterators, that is, three any_iterators for each range you chain together (you need three to iterate forward or backward, if you just want to iterate forward two will suffice).
Let's say that you wanted to chain-iterate through a sequence of integers:
(Untested pseudo-C++ code)
typedef adobe::any_iterator AnyIntIter;
struct AnyRange {
AnyIntIter begin;
AnyIntIter curr;
AnyIntIter end;
};
You could define a range such as:
int int_array[] = {1, 2, 3, 4};
AnyRange sequence_0 = {int_array, int_array, int_array + ARRAYSIZE(int_array)};
Your RangeIterator class would then have a std::vector<AnyRange>.
#include <cassert>
#include <cstddef>
#include <vector>
class RangeIterator {
public:
    RangeIterator() : curr_range_index(0) {}
    template <typename Container>
    void AddAnyRange(Container& c) {
        AnyRange any_range = { c.begin(), c.begin(), c.end() };
        ranges.push_back(any_range);
    }
    // Here's what the operator++() looks like, everything else omitted.
    // Moves to the next element and returns it; asserts if there is none.
    int operator++() {
        if (curr_range_index < ranges.size()) {
            AnyRange& curr_range = ranges[curr_range_index];
            if (curr_range.curr != curr_range.end)
                ++curr_range.curr;
        }
        // Skip over exhausted (or empty) ranges.
        while (curr_range_index < ranges.size() &&
               ranges[curr_range_index].curr == ranges[curr_range_index].end)
            ++curr_range_index;
        if (curr_range_index >= ranges.size()) {
            assert(false && "iterated too far");
            return 0;
        }
        return *ranges[curr_range_index].curr;
    }
private:
    std::vector<AnyRange> ranges;
    std::size_t curr_range_index;
};
I do want to note however that this solution is very slow. The better, more C++-like approach is just to store all the pointers to the objects that you want to operate on and iterate through that. Alternatively, you can apply a functor or a visitor to your ranges.

Not in the standard library. Boost might have something.
But really, such a thing should be trivial to implement. Just make yourself an iterator with a vector of iterators as a member. Some very simple code for operator++, and you're there.

No functionality exists in boost that implements this, to the best of my knowledge - I did a pretty extensive search.
I thought I'd implement this easily last week, but I ran into a snag: the STL that comes with Visual Studio 2008, when range checking is on, doesn't allow comparing iterators from different containers (i.e., you can't compare somevec1.end() with somevec2.end() ). All of a sudden it became much harder to implement this and I haven't quite decided yet on how to do it.
I wrote other iterators in the past using iterator_facade and iterator_adaptor from boost, which are better than writing 'raw' iterators, but I still find writing custom iterators in C++ rather messy.
If someone can post some pseudocode on how this could be done /without/ comparing iterators from different containers, I'd be much obliged.
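One possible answer to that request, as a sketch only: keep track of which range is active and compare the range index first, so that two underlying iterators are only ever compared when they belong to the same range. The names ChainIterator and Chain are made up; all ranges must share one iterator type, the result is only an input iterator, and the Chain object has to outlive its iterators.

```cpp
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

template <class It>
class ChainIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = typename std::iterator_traits<It>::value_type;
    using difference_type = typename std::iterator_traits<It>::difference_type;
    using pointer = typename std::iterator_traits<It>::pointer;
    using reference = typename std::iterator_traits<It>::reference;
    using Range = std::pair<It, It>;

    ChainIterator(const std::vector<Range>* ranges, std::size_t index)
        : ranges_(ranges), index_(index), cur_() {
        if (index_ < ranges_->size()) cur_ = (*ranges_)[index_].first;
        skip_exhausted();
    }

    reference operator*() const { return *cur_; }

    ChainIterator& operator++() {
        ++cur_;
        skip_exhausted();
        return *this;
    }

    bool operator==(const ChainIterator& other) const {
        // Different active ranges: unequal, without touching cur_.
        if (index_ != other.index_) return false;
        // Past the last range: both are "end"; otherwise same range, so
        // comparing the underlying iterators is safe.
        return index_ == ranges_->size() || cur_ == other.cur_;
    }
    bool operator!=(const ChainIterator& other) const { return !(*this == other); }

private:
    // Moves on to the next range while the current one is exhausted.
    void skip_exhausted() {
        while (index_ < ranges_->size() && cur_ == (*ranges_)[index_].second) {
            ++index_;
            if (index_ < ranges_->size()) cur_ = (*ranges_)[index_].first;
        }
    }

    const std::vector<Range>* ranges_;
    std::size_t index_;
    It cur_;
};

// Holds the (begin, end) pairs and hands out chained iterators.
template <class It>
class Chain {
public:
    void add(It first, It last) { ranges_.emplace_back(first, last); }
    ChainIterator<It> begin() const { return ChainIterator<It>(&ranges_, 0); }
    ChainIterator<It> end() const { return ChainIterator<It>(&ranges_, ranges_.size()); }

private:
    std::vector<std::pair<It, It>> ranges_;
};
```

After adding a few (begin, end) pairs with add, a range-based for loop over the Chain visits the elements of each range in turn and skips empty ranges.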
