Moving from C array to std::map <int, val> operator `-` gotcha - c++

I had a C-style array of values. I needed to turn it into a map to save memory (allocating entries as needed instead of allocating everything up front and keeping it). It could also become a set or, as a further optimization, a vector. But I hit one painful gotcha: given val * v; the expression auto val_index = v - val_collection used to give the item ID, and now that code will not compile. Will it compile in the std::vector case?

std::distance can give you the distance from the beginning of a container (or other sequence):
std::vector<val>::iterator v = whatever();
size_t val_index = std::distance(val_collection.begin(), v);
For random-access containers (including vector, but not map), you could also use - if you like:
size_t val_index = v - val_collection.begin();
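A minimal sketch of the cases discussed above, assuming a placeholder element type `val` and hypothetical helper names (`index_in_array`, `index_in_vector`, `index_of_pointer_in_vector`, `id_in_map`) that are not part of the original post. For an array or a vector the index is recovered by subtraction; for a std::map<int, val> no subtraction exists, so the ID is read from the key instead.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

struct val { int payload; };  // placeholder element type (assumption)

// C array: plain pointer subtraction yields the index.
std::size_t index_in_array(const val* first, const val* v) {
    assert(v >= first);  // v must point into the array that starts at first
    return static_cast<std::size_t>(v - first);
}

// std::vector with an iterator: std::distance (or operator-) gives the index.
std::size_t index_in_vector(const std::vector<val>& c,
                            std::vector<val>::const_iterator it) {
    return static_cast<std::size_t>(std::distance(c.begin(), it));
}

// std::vector with a raw element pointer: subtract from data(), not begin().
std::size_t index_of_pointer_in_vector(const std::vector<val>& c, const val* v) {
    return static_cast<std::size_t>(v - c.data());
}

// std::map<int, val>: the nodes are not contiguous, but the key already is the ID.
int id_in_map(std::map<int, val>::const_iterator it) {
    return it->first;
}
```

So the old `v - val_collection` pattern carries over to a vector almost unchanged, while for a map the ID has to travel with the iterator (or be stored in the value).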

Related

Most optimal way to remove an index from a vector-like data structure without copying data in C++

The problem:
I need to create a simple vector or vector-like data structure that has indexable access, for example:
arr[0] = 'a';
arr[1] = 'b';
//...
arr[25] = 'z';
From this structure, I would like to remove some index, for example index [5]
The actual value at the index does not need to be erased from memory, and the values should not be copied anywhere, I just need the indexes of the data structure to re-arrange afterward, so that:
arr[0] = 'a';
//...
arr[4] = 'e';
arr[5] = 'g';
//...
arr[24] = 'z';
Is std::vector the best data structure to use in this case, and how should I properly remove the index without copying data? Please provide code.
Or is there a more optimal data structure that I can use for this?
Note, I am not intending on accessing the data in any other way except through the index, and I do not need it to be contiguously stored in memory at any time.
What you want is probably covered in one of these:
what has been proposed for std::hive
Hive is a formalisation, extension and optimization of what is typically known as a 'bucket array' container in game programming circles; similar structures exist in various incarnations across the high-performance computing, high performance trading, 3D simulation, physics simulation, robotics, server/client application and particle simulation fields.
std::flat_map in C++23
A flat_map is a kind of associative container that supports unique keys (contains at most one of each key value) and provides for fast retrieval of values of another type T based on the keys.
Since you want the indices to be updated, you need a sequence container: vector, list, deque, or similar. vector and deque will copy values around, and list is slow for virtually all purposes, so none of these is a great fit at first.
Ergo, the best solution is std::vector<std::unique_ptr<Item>>. Then you get very fast accesses, but when removing elements by index, the actual items themselves are not moved, only pointers are rearranged.
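A minimal sketch of that idea, assuming a placeholder `Item` type and hypothetical helpers `make_items` and `remove_at` (none of these names come from the answer). Erasing by index shifts only the pointers; each remaining `Item` stays at the address it was allocated at.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Item { char value; };  // placeholder payload type (assumption)

using ItemList = std::vector<std::unique_ptr<Item>>;

// Build a list holding one heap-allocated Item per character in [first, last].
ItemList make_items(char first, char last) {
    ItemList items;
    for (char c = first; c <= last; ++c) {
        items.push_back(std::make_unique<Item>(Item{c}));
    }
    return items;
}

// Remove the element at `index`. Only the pointers after it are shifted down;
// the remaining Item objects are neither copied nor moved.
void remove_at(ItemList& items, std::size_t index) {
    assert(index < items.size());
    items.erase(items.begin() + static_cast<std::ptrdiff_t>(index));
}
```

After `remove_at(items, 5)` on the 26 letters, `items[5]->value` is 'g' and `items[5].get()` is the same address that `items[6].get()` had before. The cost is one extra indirection per access and one allocation per element.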
Another range-based solution consists of:
enumerate each element of v,
remove(each element)_if it is_contained_in those toBeRemoved based on the key,
and finally transform the result retaining only the value.
auto w = enumerate(v)
| remove_if(is_contained_in(toBeRemoved), key)
| transform(val);
Full example on Compiler Explorer.
I've also used BOOST_HOF_LIFT from Boost.HOF to turn std::get into an object that I can pass around, based on which I've defined key and val.
It is a requirement of the vector data structure that its data is contiguous in memory, so it is not possible to remove an element without moving memory to fill the gap (other than the final element).
A vector is one of the sequence containers. A sequence container with minimal O(1) cost of element removal is a doubly linked list (as implemented by std::list). A list can be efficiently accessed sequentially, but unlike a vector it is O(n) for random access.
For a discussion of the time complexity of different operations on various container classes see for example https://dev.to/pratikparvati/c-stl-containers-choose-your-containers-wisely-4lc4
Each container has different performance characteristics for different operations. You need to choose one that best fits the operations you will mostly perform. If sequential access and element insertion and removal are key, a list is appropriate. If random access is more critical, it may be worth the hit of using a vector if insertion/removal are infrequent. It may be that neither is optimal in your application, but for the specific situation detailed in your question, a linked-list fits the bill.
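To make that trade-off concrete, here is a small sketch using std::list<char>; the helper names `erase_at` and `element_at` are made up for illustration. Unlinking a node is O(1) and leaves every other element where it is, but reaching a position by index means walking the list.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Erase the element at position `index` from a std::list.
// Walking to the position is O(n); unlinking the node itself is O(1)
// and does not move or copy any of the other elements.
void erase_at(std::list<char>& items, std::size_t index) {
    assert(index < items.size());
    auto it = std::next(items.begin(), static_cast<std::ptrdiff_t>(index));
    items.erase(it);
}

// Read the element at position `index`; O(n) because a list has no operator[].
char element_at(const std::list<char>& items, std::size_t index) {
    assert(index < items.size());
    return *std::next(items.begin(), static_cast<std::ptrdiff_t>(index));
}
```

If the code already holds an iterator to the element, `items.erase(it)` alone is enough and the O(n) walk disappears.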
What about using a view-based approach?
Here I show a solution using ranges::any_view. At the bottom of this answer is the complete working code, which shows that the vector-like entity we make up is actually pointing to the very elements of the original std::vector.
Beware, I'm not addressing performance in any way. I don't claim to know much of it, in general, and I know even less of it as related to the cost of the abstractions I'm using below.
The core of the solution is this function for dropping only one element, the one with index i, from the input range:
constexpr auto shoot =
[](std::size_t i, auto&& range)
-> any_view<char const&, category::random_access> {
return concat(range | take(i), range | drop(i + 1));
};
In detail,
given the index i of the item to be removed from the input range,
it creates a range by takeing the first i elements from range (these are the elements before the element of index i),
it creates a range by droping the first i + 1 elements from range (thus retaining the elements after the element of index i),
and finally it concatenates those two ranges
returning the resulting range as an any_view<char const&, category::random_access>, to avoid nesting more and more views for each repeated application of shoot;
category::random_access is what allows a []-based access to the elements.
Given the above, deleting a few elements from the range is as easy as this:
auto w = shoot(3, shoot(9, shoot(10, v)));
However, if you were to call shoot(9, shoot(3, v)) you would be removing the 3rd element first, and then the 9th element of the resulting range, which means that you'd have removed the 3rd and 10th elements with respect to the original vector; this has nothing to do with the range-based approach, but just with providing a function to delete only one element.
Clearly you can build on top of that a function that eliminates all the indices from another range:
sort the indices of elements to be removed,
for-loop on them in reverse (for the reason explained above),
and shoot them one by one (without using any_view we couldn't do view = shoot(n, view); because each application of shoot would change the type of the view):
constexpr auto shoot_many = [](auto&& indices, auto&& range){
any_view<char const&, category::random_access> view{range};
sort(indices);
for (auto const& idx : reverse(indices)) {
view = shoot(idx, view);
}
return view;
};
I have tried another solution for shoot_many, where I would basically index all the elements of the range, filter out those whose index is contained in indices, and finally transform to drop the indices. Here's a sketch of it:
constexpr auto shoot_many = [](auto&& indices, auto&& range){
    std::set<std::size_t> indices_(indices.begin(), indices.end()); // for easier lookup
                                                                    // (assuming they're not sorted)
    auto indexedRange = zip(range, iota(0)); // I'm pretty sure there's a view doing this already
    using RandomAccessViewOfChars
        = any_view<char const&, category::random_access>;
    return RandomAccessViewOfChars{
        // keep only the elements whose index is NOT in the set of indices to remove
        indexedRange | filter([indices_](auto&& pair){ return !indices_.contains(pair.second); })
                     | transform([](auto&& pair){ return pair.first; })};
};
This, however, doesn't work because having filter in the pipe means that we don't know the length of the resulting range until the moment we truly traverse it, which in turn means that the output I'm returning doesn't meet the compile-time requirements for a category::random_access any_view. Sad.
Anyway, here's the solution:
#include <assert.h>
#include <cstddef>
#include <functional>
#include <iostream>
#include <memory>
#include <range/v3/algorithm/sort.hpp>
#include <range/v3/range/conversion.hpp>
#include <range/v3/view/any_view.hpp>
#include <range/v3/view/concat.hpp>
#include <range/v3/view/drop.hpp>
#include <range/v3/view/iota.hpp>
#include <range/v3/view/reverse.hpp>
#include <range/v3/view/take.hpp>
#include <set>
#include <vector>
using namespace ranges;
using namespace ranges::views;
// utility to drop one element from a range and give you back a view
constexpr auto shoot =
[](std::size_t i, auto&& range)
-> any_view<char const&, category::random_access> {
return concat(range | take(i), range | drop(i + 1));
};
constexpr auto shoot_many = [](auto&& indices, auto&& range){
any_view<char const&, category::random_access> view{range};
sort(indices);
for (auto const& idx : reverse(indices)) {
view = shoot(idx, view);
}
return view;
};
int main() {
// this is the input
std::vector<char> v = iota('a') | take(26) | to_vector;
// alternatively, = {'a', 'b', ...}
// remove a few elements by index
auto w = shoot_many(std::vector<int>{3, 10, 9}, v);
for (std::size_t i = 0, j = 0; i != v.size(); ++i, ++j) {
if (i == 10 || i == 9 || i == 3) {
--j;
std::cout << v[i] << ',' << '-' << std::endl;
} else {
std::cout << v[i] << ',' << w[j] << std::endl;
assert( v[i] == w[j]);
assert(&v[i] == &w[j]);
}
}
}

Erase by value in a vector of shared pointers

I want to erase by value from a vector of shared pointers to strings (i.e. vector<shared_ptr<string>>). Is there an efficient way of doing this, instead of iterating over the complete vector and then erasing at the iterator position?
#include <bits/stdc++.h>
using namespace std;
int main()
{
vector<shared_ptr<string>> v;
v.push_back(make_shared<string>("aaa"));
int j = 0,ind;
for(auto i : v) {
if((*i)=="aaa"){
ind = j;
}
j++;
}
v.erase(v.begin()+ind);
}
Also, I don't want to spend memory on a map (value vs. address).
Try like that (Erase-Remove Idiom):
string s = "aaa";
auto cmp = [s](const shared_ptr<string> &p) { return s == *p; };
v.erase(std::remove_if(v.begin(), v.end(), cmp), v.end());
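For completeness, here is the same erase-remove idiom wrapped into a self-contained sketch. The function name `erase_by_value` and the `StringPtrVec` alias are hypothetical, and the null check is an added assumption (the original snippet dereferences every pointer unconditionally).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

using StringPtrVec = std::vector<std::shared_ptr<std::string>>;

// Erase every element whose pointee equals `value`; returns how many were removed.
// Null pointers are skipped rather than dereferenced.
std::size_t erase_by_value(StringPtrVec& v, const std::string& value) {
    const auto matches = [&value](const std::shared_ptr<std::string>& p) {
        return p && *p == value;
    };
    // remove_if moves the kept elements to the front and returns the new logical end.
    const auto new_end = std::remove_if(v.begin(), v.end(), matches);
    const auto removed = static_cast<std::size_t>(std::distance(new_end, v.end()));
    v.erase(new_end, v.end());
    return removed;
}
```

This is still a single O(N) pass over the vector; it only replaces the hand-written loop and the separate erase call.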
There is no better way than O(N): you have to find the object in the vector, and you have to iterate over the vector once to find it. It does not really matter whether the element is a pointer or any other object.
The only way to do better is to use a different data structure, which provides O(1) finding/removal. A set is the first thing that comes to mind, but that would indicate your pointers are unique. A second option would be a map, such that multiple pointers pointing to the same value exist at the same hash key.
If you do not want to use a different structure, then you are out of luck. You could have an additional structure hashing the pointers, if you want to retain the vector but also have O(1) access.
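For example, if you do use a set, define a proper hasher and key_equal that both work on the pointed-to string. The hasher alone is not enough: the default key_equal compares the pointers themselves, so a lookup with a freshly created pointer would never match. Each pointer must then point to a distinct string. For example:
struct myPtrHash {
    size_t operator()(const std::shared_ptr<std::string>& p) const {
        // Maybe we want to add checks/throw a more meaningful error if p is invalid?
        return std::hash<std::string>()(*p);
    }
};
// Added to the original answer: compare the strings, not the pointer values.
struct myPtrEqual {
    bool operator()(const std::shared_ptr<std::string>& a,
                    const std::shared_ptr<std::string>& b) const {
        return *a == *b;
    }
};
such that your set is:
std::unordered_set<std::shared_ptr<std::string>, myPtrHash, myPtrEqual> pointerSet;
Then erasing would be O(1) on average, simply as:
std::shared_ptr<std::string> toErase = std::make_shared<std::string>("aaa");
pointerSet.erase(toErase);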
That said, if you must use a vector, a more idiomatic way to do this is to use remove_if instead of iterating yourself. This will not improve the time complexity, though; it is just better practice.
Don't include bits/stdc++.h, and since you're iterating through the whole vector, you should be using std::for_each with a lambda.

Clamp iterator value to end() with std::min

I have a vector with n strings in it. Now let's say I want to "page" or "group" those strings in a map.
typedef std::vector<std::string> TStringVec;
TStringVec myVec;
// ... fill myVec with n elements
typedef std::map<int, TStringVec> TPagedMap;
TPagedMap myMap;
int ItemsPerPage = 3; // or whatever
int PagesRequired = std::ceil(myVec.size() / ItemsPerPage);
for (int Page = 0; Page < PagesRequired; ++Page)
{
    TStringVec::const_iterator Begin = myVec.begin() + (ItemsPerPage * Page);
    TStringVec::const_iterator End = myVec.begin() + ItemsPerPage * (Page+1);
    myMap[Page] = TStringVec(Begin, End);
}
One can easily spot the problem here. When determining the end iterator, I risk leaving the allocated space by the vector.
Quick example: 5 elements in the vector, ItemsPerPage is 3. That means we need a total of 2 pages in the map to group all elements.
Now when hitting the last iteration, begin is pointing at myVec[3] but end is "pointing" to myVec[6]. Remember, myVec only has 5 elements.
Could this case be safely handled by swapping
TStringVec::const_iterator End = myVec.begin() + ItemsPerPage * (Page+1);
with
TStringVec::const_iterator End = std::min(myVec.begin() + ItemsPerPage * (Page+1), myVec.end());
It compiles of course, and it seems to work. But I'm not sure if this can be considered a safe thing to do. Any advice or a definitive answer?
I think the question is... Is a value "past" .end() guaranteed to be larger than the address returned by .end()?
Thanks in advance.
EDIT: of course, an if check beforehand could solve the problem, but I'm looking for a more elegant solution.
There are two things wrong with your proposed replacement:
You are potentially creating an iterator past the end-iterator, which is UB.
For some unfathomable reason, you want to stop at end() - 1?? Everywhere else, you properly use half-open ranges.
What you want is more like
auto End = myVec.cbegin() + std::min<std::size_t>(ItemsPerPage * (Page + 1), myVec.size());
Also take note that I used auto to avoid needlessly specifying complicated type-names.
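As an aside, using std::ceil on the result of an integer division is not very useful conceptually: the division has already truncated by the time std::ceil sees the value, so nothing is ever rounded up. At least the compiler will likely optimize out the round-trip through double.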
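Putting both points together, here is a hedged sketch of the paging loop. The function name `paginate` and the `StringVec` alias are assumptions made for this example. The page count uses integer ceiling division, and both page boundaries are clamped as indices before they are turned into iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using StringVec = std::vector<std::string>;

// Split `items` into pages of at most `itemsPerPage` elements each.
// Both page boundaries are computed as indices and clamped to size()
// before any iterator arithmetic, so no iterator ever goes past end().
std::map<int, StringVec> paginate(const StringVec& items, std::size_t itemsPerPage) {
    assert(itemsPerPage > 0);
    std::map<int, StringVec> pages;
    // Integer ceiling division; no floating point involved.
    const std::size_t pageCount = (items.size() + itemsPerPage - 1) / itemsPerPage;
    for (std::size_t page = 0; page < pageCount; ++page) {
        const std::size_t first = page * itemsPerPage;
        const std::size_t last = std::min(first + itemsPerPage, items.size());
        pages[static_cast<int>(page)] = StringVec(
            items.begin() + static_cast<std::ptrdiff_t>(first),
            items.begin() + static_cast<std::ptrdiff_t>(last));
    }
    return pages;
}
```

With 5 elements and 3 items per page this yields two pages, of sizes 3 and 2.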
With range-v3, you may use the chunk view:
std::vector<std::string> myVec = /*...*/;
const int ItemsPerPage = 3; // or whatever
std::map<int, std::vector<std::string>> myMap;
int counter = 0;
for (const auto& page : myVec | ranges::views::chunk(ItemsPerPage)) {
    // recent range-v3 versions need an explicit conversion from a view to a container
    myMap[counter++] = ranges::to<std::vector<std::string>>(page);
}
Demo

Accessing elements of a list of lists in C++

I have a list of lists like this:
std::list<std::list<double> > list;
I filled it with some lists with doubles in them (actually quite a lot, which is why I am not using a vector. All this copying takes up a lot of time.)
Say I want to access the element that could be accessed like list[3][3] if the list were not a list but a vector or two-dimensional array. How would I do that?
I know that accessing elements in a list is accomplished by using an iterator. I couldn't figure out how to get out the double though.
double item = *std::next(std::begin(*std::next(std::begin(list), 3)), 3);
Using a vector would usually have much better performance, though; accessing element n of a list is O(n).
If you're concerned about performance of splicing the interior of the container, you could use deque, which has operator[], amortized constant insertion and deletion from either end, and linear time insertion and deletion from the interior.
For C++03 compilers, you can implement begin and next yourself:
template<typename Container>
typename Container::iterator begin(Container &container)
{
return container.begin();
}
template<typename Container>
typename Container::const_iterator begin(const Container &container)
{
return container.begin();
}
template<typename T, int n>
T *begin(T (&array)[n])
{
return &array[0];
}
template<typename Iterator>
Iterator next(Iterator it, typename std::iterator_traits<Iterator>::difference_type n = 1)
{
std::advance(it, n);
return it;
}
To actually answer your question, you should probably look at std::advance.
To strictly answer your question, Joachim Pileborg's answer is the way to go:
std::list<std::list<double> >::iterator it = list.begin();
std::advance(it, 3);
std::list<double>::iterator it2 = (*it).begin();
std::advance(it2, 3);
double d = *it2;
Now, from your question and further comments it is not clear whether you always add elements to the end of the lists or they can be added anywhere. If you always add to the end, vector<double> will work better. A vector<T> does not need to be copied every time its size increases; only whenever its capacity increases, which is a very different thing.
In addition to this, using reserve(), as others said before, will help a lot with the reallocations. You don't need to reserve for the combined size of all vectors, but only for each individual vector. So:
std::vector<std::vector<double> > v;
v.reserve(512); // If you are inserting 400 vectors, with a little extra just in case
And you would also reserve for each vector<double> inside v. That's all.
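A small sketch of reserving at both levels; the function name `make_table` and the fill values are made up for illustration. Because capacity is reserved up front, neither the outer vector nor any inner vector reallocates while it is filled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a rows x cols table, reserving capacity up front so that neither the
// outer vector nor any inner vector reallocates while it is being filled.
std::vector<std::vector<double> > make_table(std::size_t rows, std::size_t cols) {
    std::vector<std::vector<double> > table;
    table.reserve(rows);                       // one allocation for the outer vector
    const std::size_t outerCapacity = table.capacity();
    for (std::size_t r = 0; r < rows; ++r) {
        table.push_back(std::vector<double>());
        table.back().reserve(cols);            // one allocation per inner vector
        for (std::size_t c = 0; c < cols; ++c) {
            table.back().push_back(static_cast<double>(r * cols + c));
        }
    }
    assert(table.capacity() == outerCapacity); // the outer vector never reallocated
    return table;
}
```

If the final sizes are not known exactly, reserving a reasonable estimate still removes most reallocations; the vectors simply grow as usual once the estimate is exceeded.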
Take into account that your list of lists will take much more space. For each double in an inner list, it has to allocate at least two additional pointers, plus two additional pointers for each list inside the outer list. This means that the total memory taken by your container will be roughly three times that of the vector. And all this allocation and management also takes extra runtime.

C++ STL Vectors: Get iterator from index?

So, I wrote a bunch of code that accesses elements in an stl vector by index[], but now I need to copy just a chunk of the vector. It looks like vector.insert(pos, first, last) is the function I want... except I only have first and last as ints. Is there any nice way I can get an iterator to these values?
Try this:
vector<Type>::iterator nth = v.begin() + index;
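The way mentioned by @dirkgently (v.begin() + index) is nice and fast for vectors,
but std::advance(it, index) is the most generic way, and for random-access iterators it works in constant time too. Note that std::advance modifies its first argument in place, so it needs a named iterator rather than the temporary returned by v.begin().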
EDIT
Differences in usage:
std::vector<Type>::iterator it = v.begin() + index;
or
std::vector<Type>::iterator it = v.begin();
std::advance(it, index);
Added after @litb's notes.
Also; auto it = std::next(v.begin(), index);
Update: Needs a C++11-compliant compiler
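Applied to the original question (copying a chunk given integer indices), this could look like the sketch below. The helper name `append_chunk` is hypothetical, and it assumes that `dst` and `src` are different vectors, since inserting a range of a vector into itself is not allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Append the half-open index range [first, last) of `src` to `dst`.
// The integer indices become iterators by adding them to src.begin().
// Assumes dst and src are distinct objects.
void append_chunk(std::vector<int>& dst, const std::vector<int>& src,
                  std::size_t first, std::size_t last) {
    assert(first <= last && last <= src.size());
    dst.insert(dst.end(),
               src.begin() + static_cast<std::ptrdiff_t>(first),
               src.begin() + static_cast<std::ptrdiff_t>(last));
}
```

Using a half-open range means `last == src.size()` is valid and an empty range (`first == last`) inserts nothing.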
You can always use std::advance to move the iterator a certain number of positions (in constant time for random-access iterators such as vector's):
std::vector<int>::iterator it = myvector.begin();
std::advance(it, 2);
Actually, std::vector is meant to be usable as a C array when needed. (As far as I know, the C++ standard requires this of vector implementations; see "replacement for array" on Wikipedia.)
For instance, it is perfectly legal to do the following, as far as I can tell:
#include <vector>
int main()
{
    void foo(const char *);
    std::vector<char> vec;
    vec.push_back('h');
    vec.push_back('e');
    vec.push_back('l');
    vec.push_back('l');
    vec.push_back('o');
    vec.push_back('\0'); // null terminator
    foo(&vec[0]);
}
Of course, either foo must not store the address passed as a parameter, or you must make sure your program never pushes a new item into vec or otherwise changes its capacity afterwards. Otherwise you risk a segmentation fault...
Therefore, in your example it leads to
vector.insert(pos, &vec[first_index], &vec[last_index]);