Which is the best container in C++ that can:
store only unique values (such as a set)
look up those values by index in constant time (such as an array)
I basically need to iterate in phase one and collect all the unique elements; order really doesn't matter.
However, in phase two, I then have to provide each element in the container, but can only provide it one by one. Since the caller knows the size of my container, it provides me an index, one at a time, such that 0 <= idx < size of the container.
Right now, the only solution that comes to my mind is to maintain two containers, a vector and a set. I am wondering: is there any single container that provides the same?
class MyContainer {
private:
    std::set<Fruits> setFruits;
    std::vector<Fruits> arrFruits; // can have indexed access
public:
    void collectFruits(const Fruits& fruit) {
        if (setFruits.find(fruit) == setFruits.end()) {
            // insert only if it isn't already present
            setFruits.insert(fruit);
            arrFruits.push_back(fruit);
        }
    }
};
Alex Stepanov, the creator of the STL, once said "Use vectors whenever you can. If you cannot use vectors, redesign your solution so that you can use vectors." With that good advice in mind:
Phase 1: Collect the unique elements
std::vector<Foo> elements;
// add N elements
elements.push_back(foo1);
...
elements.push_back(fooN);
// done collecting: remove dupes
std::sort(elements.begin(), elements.end());
elements.erase(std::unique(elements.begin(), elements.end()),
elements.end());
Phase 2: Well, now we have a vector of our k unique elements, with constant-time index access (with indices 0..k-1).
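For illustration, phase 2 is then just plain indexed access into the deduplicated vector (a minimal sketch; idx is whatever index the caller hands you):

// assuming 0 <= idx && idx < elements.size()
const Foo& f = elements[idx]; // constant-time access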
You could use a Boost flat_set (boost::container::flat_set).
I don't think it provides an operator[], but it has random-access iterators and a constant-time nth() function that returns an iterator to the element at a particular index.
Inserting may invalidate iterators, but provided you do all insertions in phase 1 and then all index access in phase 2, you should be fine.
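A minimal sketch of the two-phase use, assuming Boost.Container is available and Fruits is the element type from the question:

#include <boost/container/flat_set.hpp>

boost::container::flat_set<Fruits> fruits;

// Phase 1: insert; duplicates are silently ignored, iterators may be invalidated here
fruits.insert(someFruit); // someFruit is a placeholder for your input values

// Phase 2: constant-time access by index via nth()
const Fruits& f = *fruits.nth(idx); // valid for 0 <= idx < fruits.size()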
Splice is a member function that puts part of a linked list into another linked list in constant time.
Why does it need to be a member function? I would expect that I could splice with just iterators into the lists, without having a handle on the list itself. Why should the list being spliced from need to be an argument in addition to the start and end iterators?
For testing, I made three lists and mixed up the containers and iterators.
See in the splice below where the containers (empty) don't match the iterators (test0 and test1):
list<int> test0;
list<int> test1;
list<int> empty;
test0.push_back(1);
test0.push_back(2);
test0.push_back(3);
test1.push_back(4);
test1.push_back(5);
test1.push_back(6);
empty.splice(test0.end(), empty, test1.begin(), test1.end());
printf("empty size: %ld\n", empty.size());
printf("test0 size: %ld\n", test0.size());
printf("test1 size: %ld\n", test1.size());
for (const auto& i : test0) {
printf("%d\n", i);
}
Surprisingly, it all worked fine, even the size!
empty size: 0
test0 size: 6
test1 size: 0
1
2
3
4
5
6
I can somewhat understand the iteration working because it just runs until next is null, without regard to the container's front/back pointers. But how did it get the size right? Maybe size is calculated dynamically?
Edit: Based on this explanation of size, size is calculated dynamically for lists, in linear time. So the container is really just a dummy argument. Maybe it's only needed when adding new elements because it has the allocator for making new nodes in the list?
std::list::splice modifies the size of the container. You can't use a container's iterators to modify its size. You'll notice that there are no free functions in the standard library that can insert new elements into a range using only iterators. At best they can rearrange them.
For example, std::remove shuffles the elements to be removed to the end of the range and returns an iterator identifying the range of elements that need to be removed. It can't really remove elements from the range itself.
There are some workarounds, such as using std::back_inserter, but that works by simulating an unbounded range.
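For example, this is why the classic erase-remove idiom needs the container itself and not just its iterators (a minimal sketch):

#include <algorithm>
#include <vector>

std::vector<int> v{1, 2, 3, 2, 4};

// std::remove only rearranges elements and returns the new logical end;
// it cannot change v.size(), because it only sees iterators.
auto newEnd = std::remove(v.begin(), v.end(), 2);

// Only the container itself can actually shrink:
v.erase(newEnd, v.end()); // v is now {1, 3, 4}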
I was looking at the std::list::splice implementation the other day.
Typically the iterator abstracts a pointer to the list's private node implementation. The list nodes contain the _M_prev and _M_next pointers to their neighbor nodes - this is purely implementation dependent. For an empty list, the list contains a sentinel node which serves as both head and tail (again implementation dependent).
So I thought I would try to implement splice using only the list nodes:
void splice(const_iterator pos, list& other,
            const_iterator first, const_iterator last)
{
    // Hook up last.
    last->_M_next = pos->_M_next;
    pos->_M_next->_M_prev = last;

    // Hook up first.
    pos->_M_next = first;
    first->_M_prev = pos;
}
I think that looks correct, but I could be wrong.
So based on that implementation, and if size is calculated dynamically, then that would work as you said.
However, as François Andrieux pointed out, calculating the size dynamically would be wasteful, and so the container needs to be involved so that the internal size count can be maintained.
This is a programming problem I come across very often and was wondering whether there is a data structure, either in the C++ STL or one I can implement myself which provides both random and sequential access.
An example of why I might need this:
Say there are n types of items (n = 1000000, for example), and there is some number of each type of item (for example, 0 or 10)
I store these items into an array, where the array index represents the type of the item, and the value represents how many items of that given type are there
Now, I have an algorithm which iterates over all EXISTING items. To obtain these items, it is very wasteful to iterate over the entire array when all the entries are 0 except for, say, Array[99999] and Array[999999].
Normally, I solve this by using a linked list which saves the indices of all the nonzero array entries. I implement the standard operations in this way:
Insert(int t):
1) If Array[t] == 0, LinkedList.push_back(t);
2) Array[t]++;
Delete(int t):
1) If Array[t] == 1, find and remove t from LinkedList;
2) Array[t]--;
If I want O(1) complexity for the deletion operation, I make the array store containers instead of integers. Each container contains an integer and a pointer to the respective element of the LinkedList, so I don't have to search through the list.
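In code, the approach looks roughly like this (a sketch with illustrative names; storing the list iterator in each slot is what makes the deletion O(1)):

#include <list>
#include <vector>

struct Slot {
    int count = 0;
    std::list<int>::iterator pos; // only meaningful while count > 0
};

std::vector<Slot> slots(1000000);  // indexed by item type
std::list<int> nonzero;            // types that currently have count > 0

void insertItem(int t) {
    if (slots[t].count == 0)
        slots[t].pos = nonzero.insert(nonzero.end(), t);
    ++slots[t].count;
}

void deleteItem(int t) {
    if (slots[t].count == 1)
        nonzero.erase(slots[t].pos); // O(1): no search through the list
    --slots[t].count;
}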
I would love to know whether there is a data structure which formalizes/improves this approach, or whether there's a better way to do this altogether.
Given the following requirements:
Random access
Fast lookups
Fast insertions
Fast removals
Avoid wasted space
then you probably want something called a sparse array. Sparse arrays are not part of the standard library, so you'll have to build your own, for example on top of a std::map or std::unordered_map. In a sparse array, only non-zero elements occupy space in the collection.
An unordered_map will have average O(1) lookups, insertions, and removals, but does not provide ordered iteration. A map will generally have slower (logarithmic) operations, but will provide ordered iteration. I'm oversimplifying when I say std::map is slower, as it depends on the number of elements and usage patterns (a topic probably already discussed in other questions).
If you absolutely must have both O(1) lookups and ordered iteration, then you can combine a map and an unordered_map and keep them in sync. At that point, you'll want to consider using Boost.MultiIndex.
Here's a rough sketch showing how you can implement your own sparse vector class:
class SparseVector
{
public:
    int get(size_t index) const
    {
        auto kv = map_.find(index);
        return (kv == map_.end()) ? 0 : kv->second;
    }

    void put(size_t index, int value)
    {
        if (value == 0)
            map_.erase(index);
        else
            map_[index] = value; // overwrites any existing entry (emplace would not)
    }

    // etc...

private:
    std::unordered_map<size_t, int> map_;
};
In such a sparse vector class, you can overload operator[] if you wish to allow something like sparseVec[42] = 123.
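One way to do that, sketched under the assumption that reading a missing element should yield 0, is to have operator[] return a small proxy object:

// inside the SparseVector class shown above:
class Ref // proxy so that both `int x = v[i];` and `v[i] = 123;` work
{
public:
    Ref(SparseVector& v, size_t i) : v_(v), i_(i) {}
    operator int() const { return v_.get(i_); }                    // read: falls back to 0 via get()
    Ref& operator=(int value) { v_.put(i_, value); return *this; } // write: goes through put()
private:
    SparseVector& v_;
    size_t i_;
};

Ref operator[](size_t index) { return Ref(*this, index); }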
Linear algebra libraries, such as Eigen or Boost.uBLAS, already provide templates for sparse vectors and sparse matrices.
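For instance, with Eigen (a sketch, assuming the Eigen/SparseCore header is available; doSomething is a placeholder for your own processing), the example from the question could look like:

#include <Eigen/SparseCore>

Eigen::SparseVector<int> counts(1000000); // 1,000,000 logical entries, nothing stored yet

counts.coeffRef(99999)  = 10; // only non-zero entries take up space
counts.coeffRef(999999) = 3;

// iterate over the existing (non-zero) entries only
for (Eigen::SparseVector<int>::InnerIterator it(counts); it; ++it)
    doSomething(it.index(), it.value()); // doSomething is a placeholder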
Does the C++ standard library have an "ordered set" datastructure? By ordered set, I mean something that is exactly the same as the ordinary std::set but that remembers the order in which you added the items to it.
If not, what is the best way to simulate one? I know you could do something like have a set of pairs, with each pair storing the number it was added in and the actual value, but I don't want to jump through hoops if there is a simpler solution.
No single, homogeneous data structure will have this property, since it is either sequential (i.e. elements are arranged in insertion order) or associative (elements are arranged in some order depending on value).
The cleanest approach would perhaps be something like Boost.MultiIndex, which allows you to add multiple indexes, or "views", onto a container, so you can have both a sequential and an ordered index.
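A minimal sketch of that idea with Boost.MultiIndex (assuming the element type is int):

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <iostream>

namespace bmi = boost::multi_index;

// index 0: insertion order ("sequenced"); index 1: unique, sorted like std::set
using InsertionOrderedSet = bmi::multi_index_container<
    int,
    bmi::indexed_by<
        bmi::sequenced<>,
        bmi::ordered_unique<bmi::identity<int>>
    >
>;

int main() {
    InsertionOrderedSet s;
    s.push_back(3);
    s.push_back(1);
    s.push_back(2);
    s.push_back(1); // rejected: the ordered_unique index already contains 1

    for (int x : s)          std::cout << x << ' '; // 3 1 2 (insertion order)
    std::cout << '\n';
    for (int x : s.get<1>()) std::cout << x << ' '; // 1 2 3 (sorted)
    std::cout << '\n';
}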
Instead of making a std::set of whatever type you're using, why not pass it a std::pair of the object and an index that gets incremented at each insertion?
No, it does not.
Such a container presumably would need two different iterators, one to iterate in the order defined by the order of adding, and another to iterate in the usual set order. There's nothing of that kind in the standard libraries.
One option to simulate it is to have a set of some type that contains an intrusive linked list node in addition to the actual data you care about. After adding an element to the set, append it to the linked list. Before removing an element from the set, remove it from the linked list. This is guaranteed to be OK, since pointers to set elements aren't invalidated by any operation other than removing that element.
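A simplified, non-intrusive sketch of that idea, relying on the same pointer-stability guarantee (an intrusive list node inside the element, as described above, would make the removal below O(1) instead of O(n)):

#include <list>
#include <set>
#include <string>

std::set<std::string> values;            // uniqueness + the usual set ordering
std::list<const std::string*> insertion; // insertion order, via stable pointers

void add(const std::string& s) {
    auto [it, inserted] = values.insert(s);
    if (inserted)
        insertion.push_back(&*it); // the pointer stays valid until the element is erased
}

void remove(const std::string& s) {
    auto it = values.find(s);
    if (it == values.end()) return;
    insertion.remove(&*it); // O(n) here; an intrusive list node would make this O(1)
    values.erase(it);
}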
I thought the answer was fairly simple: combine the set with another iterable structure (say, a queue). If you want to iterate over the set in the order the elements were inserted, push the elements into the queue first, do your work on the front element, then pop it and put it into the set.
[Disclaimer: I have given a similar answer to this question already]
If you can use Boost, a very straightforward solution is to use the header-only library Boost.Bimap (bidirectional maps).
Consider the following sample program that will display some dummy entries in insertion order:
#include <iostream>
#include <string>
#include <type_traits>
#include <boost/bimap.hpp>

using namespace std::string_literals;

template <typename T>
void insertByOrder(boost::bimap<T, size_t>& mymap, const T& element) {
    using pos = typename std::remove_reference<decltype(mymap)>::type::value_type;
    // We use size() as index, therefore indexing the elements with 0, 1, ...
    mymap.insert(pos(element, mymap.size()));
}

int main() {
    boost::bimap<std::string, size_t> mymap;
    insertByOrder(mymap, "stack"s);
    insertByOrder(mymap, "overflow"s);
    // Iterate over the right map view (integers) in sorted order
    for (const auto& rit : mymap.right) {
        std::cout << rit.first << " -> " << rit.second << std::endl;
    }
}
The funky type alias in insertByOrder() is needed to insert elements into a boost::bimap in the following line (see referenced documentation).
Yes, it's called a vector or list (or array). Just append to the vector to add an element to the set.
If I have a structure like
std::map<string, int> myMap;
myMap["banana"] = 1;
myMap["apple"] = 1;
myMap["orange"] = 1;
How can I access myMap[0]?
I know that the map sorts internally and I'm fine with this, I want to get a value in the map by index. I've tried myMap[0] but I get the error:
Error 1 error C2679: binary '[' : no operator found which takes a right-hand operand of type 'int' (or there is no acceptable conversion)
I realise I could do something like this:
string getKeyAtIndex(int index) {
    map<string, int>::const_iterator end = myMap.end();
    int counter = 0;
    for (map<string, int>::const_iterator it = myMap.begin(); it != end; ++it) {
        if (counter == index)
            return it->first;
        counter++;
    }
    return ""; // index out of range
}
But surely this is hugely inefficient? Is there a better way?
Your map is not supposed to be accessed that way, it's indexed by keys not by positions. A map iterator is bidirectional, just like a list, so the function you are using is no more inefficient than accessing a list by position. If you want random access by position then use a vector or a deque.
Your function could be written with help from std::advance(iter, index) starting from begin():
auto it = myMap.begin();
std::advance(it, index);
return it->first;
There may be an implementation specific (non-portable) method to achieve your goal, but not one that is portable.
In general, the std::map is implemented as a type of binary tree, usually sorted by key. The definition of the first element differs depending on the ordering. Also, in your definition, is element[0] the node at the top of the tree or the left-most leaf node?
Binary trees are typically implemented as linked node structures, much like linked lists. Such structures cannot be directly accessed like an array, because to find element 5 you have to follow the links. This is by definition.
You can resolve your issue by using both a std::vector and a std::map:
Allocate the object from dynamic memory.
Store the pointer, along with the key, into the std::map.
Store the pointer in the std::vector at the position you want it at.
The std::map will allow an efficient method to access the object by key.
The std::vector will allow an efficient method to access the object by index.
Storing pointers allows for only one instance of the object instead of having to maintain multiple copies.
Well, actually you can't. The way you found is very inefficient; it has a computational complexity of O(n) (n operations worst case, where n is the number of elements in the map).
Accessing an item in a vector or in an array has complexity O(1) by comparison (constant computational complexity, a single operation).
Consider that a map is internally implemented as a red-black tree (or an AVL tree, depending on the implementation) and every insert, delete and lookup operation is O(log n) worst case (it takes on the order of log2(n) operations to find an element in the tree), which is quite good.
One way to deal with this is to use a custom class that contains both a vector and a map.
Insertion at the end of the class will be amortized O(1), lookup by name will be O(log n), lookup by index will be O(1), but in this case the removal operation will be O(n).
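A rough sketch of such a wrapper (illustrative names; removal is omitted since that is the O(n) part):

#include <map>
#include <string>
#include <vector>

class IndexedMap {
public:
    void insert(const std::string& key, int value) {
        auto [it, inserted] = byKey_.emplace(key, value);
        if (inserted)
            byIndex_.push_back(it); // map iterators stay valid across later insertions
    }
    int byKey(const std::string& key) const { return byKey_.at(key); }                      // O(log n)
    const std::string& keyAt(std::size_t index) const { return byIndex_.at(index)->first; } // O(1)
private:
    std::map<std::string, int> byKey_;
    std::vector<std::map<std::string, int>::iterator> byIndex_;
};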
Previous answer (see comment): How about just myMap.begin();
You could implement a random-access map by using a vector backing-store, which is essentially a vector of pairs. You of course lose all the benefits of the standard library map at that point.
You can use other map-like containers.
Keeping a size field in each node makes it easy to give a binary search tree random access.
Here is my implementation, in std style with random-access iterators.
A size-balanced tree:
https://github.com/mm304321141/zzz_lib/blob/master/sbtree.h
and a B+ tree:
https://github.com/mm304321141/zzz_lib/blob/master/bpptree.h
std::map is an ordered container, but its iterators don't support random access, only bidirectional access. Therefore, you can only reach the nth element by walking over all of its predecessors. A shorter alternative to your example uses the standard iterator library:
std::pair<const std::string, int> &nth_element = *std::next(myMap.begin(), N);
This has linear complexity, which is not ideal if you plan to access elements this way frequently in large maps.
An alternative is to use an ordered container that supports random access. For example, boost::container::flat_map provides a member function nth which gives you exactly what you are looking for.
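A small sketch of that, assuming Boost.Container is available:

#include <boost/container/flat_map.hpp>
#include <string>

boost::container::flat_map<std::string, int> myMap;
myMap["banana"] = 1;
myMap["apple"]  = 1;
myMap["orange"] = 1;

// nth() gives an iterator to the element at that position in key order, in constant time
const std::string& key = myMap.nth(0)->first; // "apple"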
std::map<string,int>::iterator it = std::next(mymap.begin(), index);
See this related question on more generic use of the Boost Random library.
My question involves selecting a random element from a std::list, doing some operation, which could potentially include removing the element from the list, and then choosing another random element, until some condition is satisfied.
The boost code and for loop look roughly like this:
// create and insert elements into list
std::list<MyClass> myList;
//[...]
// select uniformly from list indices
boost::uniform_int<> indices( 0, myList.size()-1 );
boost::variate_generator< boost::mt19937, boost::uniform_int<> >
selectIndex(boost::mt19937(), indices);
for( int i = 0; i <= maxOperations; ++i ) {
    int index = selectIndex();
    MyClass & mc = myList.begin() + index;
    // do operations with mc, potentially removing it from myList
    //[...]
}
My problem is as soon as the operations that are performed on an element result in the removal of an element, the variate_generator has the potential to select an invalid index in the list. I don't think it makes sense to completely recreate the variate_generator each time, especially if I seed it with time(0).
I assume that MyClass & mc = myList.begin() + index; is just pseudo code, as begin returns an iterator and I don't think list iterators (non-random-access) support operator+.
As far as I can tell, with variate generator your three basic options in this case are:
Recreate the generator when you remove an item.
Do filtering on the generated index and if it's >= the current size of the list, retry until you get a valid index. Note that if you remove a lot of indexes this could get pretty inefficient as well.
Leave the node in the list but mark it invalid so if you try to operate on that index it safely no-ops. This is just a different version of the second option.
Alternatively, you could devise a different index generation algorithm that's able to adapt to the container changing size.
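For illustration, here is a sketch along those lines using the standard <random> facilities rather than the Boost ones (the distribution is rebuilt from the current list size on every draw, so it always matches the container; myList and maxOperations are the names from the question):

#include <iterator>
#include <list>
#include <random>

std::mt19937 gen(std::random_device{}());
std::list<MyClass> myList;
// ... fill myList ...

for (int i = 0; i < maxOperations && !myList.empty(); ++i) {
    // bounds are recomputed every iteration, so they always match the current size
    std::uniform_int_distribution<std::size_t> pick(0, myList.size() - 1);
    auto it = std::next(myList.begin(), pick(gen)); // linear walk: list iterators are not random access
    // ... operate on *it, possibly erasing it with myList.erase(it) ...
}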
You could create your own uniform_contained_int distribution class that accepts a container in its constructor, aggregates a uniform_int, and recreates the uniform_int distribution each time the container changes size. Look at the description of uniform_int to see which methods you need to implement to create your own distribution.
I think you have more to worry about performance-wise. Particularly this:
std::list<MyClass> myList;
myList.begin() + index;
is not a particularly fast way of getting the index-th element.
I would transform it into something like this (which should operate on a random subsequence of the list):
X_i ~ U(0, 1) for all i
left <- max_ops
N <- list size
for each element
if X_i < left/N
process element
left--
N--
provided you don't need the random permutation of the elements.
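In C++, that selection-sampling pass could look roughly like this (a sketch reusing myList and maxOperations from the question; it visits each element at most once and processes a random subset of up to maxOperations of them, in list order):

#include <list>
#include <random>

std::mt19937 gen(std::random_device{}());
std::uniform_real_distribution<double> uniform01(0.0, 1.0);

std::list<MyClass> myList;
// ... fill myList ...

std::size_t left = maxOperations;      // how many elements we may still process
std::size_t remaining = myList.size(); // how many elements have not been visited yet

for (MyClass& element : myList) {
    if (left == 0) break;
    if (uniform01(gen) < static_cast<double>(left) / remaining) {
        // ... process element ...
        --left;
    }
    --remaining;
}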