Removing Vector from 2D vector - c++

I have a 2D vector containing 96 blocks of 600 values, which is what I want.
I need to remove the blocks that do not contain sufficient energy. I have managed to calculate the energy, but I am not sure which approach would be better for removing the blocks that fall short.
In your opinion, would it be better to create a temporary 2D vector, push back the blocks that do contain enough energy, and then delete the original vector from memory, or...
Should I remove the blocks from the vector at their particular positions?

I'm assuming you have this:
typedef std::vector<value> Block;
typedef std::vector< Block > my2dVector;
and you have a function like this:
bool BlockHasInsufficientEnergy( Block const& vec );
and you want to remove the Blocks that do not have sufficient energy.
By remove, do you mean you want there to be fewer than 96 Blocks afterwards? I will assume so.
Then the right way to do this is:
void RemoveLowEnergyBlocks( my2dVector& vec )
{
    my2dVector::iterator erase_after = std::remove_if( vec.begin(), vec.end(), BlockHasInsufficientEnergy );
    vec.erase( erase_after, vec.end() );
}
The above can be done in one line, but splitting it in two makes what is going on clearer.
remove_if finds everything that matches the predicate passed as the third argument and filters it out of the range. It returns an iterator to where the "trash" at the end of the vector begins. We then erase the trash. This is called the erase-remove idiom.
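For completeness, here is a compilable sketch of the idiom; the energy computation and the threshold of 10.0 are invented purely for illustration:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

using Block = std::vector<double>;

// Hypothetical predicate: a block is "low energy" if its values sum below a threshold.
bool BlockHasInsufficientEnergy( Block const& b )
{
    return std::accumulate( b.begin(), b.end(), 0.0 ) < 10.0;
}

// Erase-remove: remove_if compacts the kept blocks to the front,
// then erase() discards the leftover tail in one shot.
void RemoveLowEnergyBlocks( std::vector<Block>& blocks )
{
    blocks.erase( std::remove_if( blocks.begin(), blocks.end(),
                                  BlockHasInsufficientEnergy ),
                  blocks.end() );
}
```

Note that this does a single pass over the vector, so each kept block is moved at most once, instead of the repeated shifting you would get by erasing positions one at a time.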

Maybe you'd want to use a linked list, or set the filtered-out items to NULL, or mark them with a bool member flag, or keep a separate vector of the indexes of filtered items (if you apply several filters at once, this saves memory).
The right solution depends on your constraints. Do you need random access? How expensive is copying an object? Etc.
You can also look at your standard library's vector implementation (this is std::vector, right?) and check whether it does what you're asking about - i.e. copying the vector's data.

It depends, in part, on how you define better in this case. There may be advantages to either method, but it would be hard to know exactly what they are without measuring. Most likely it is somewhat "better", in terms of memory and processing performance, to erase the exact positions you don't want from the vector instead of allocating an entirely new one.
It may be better still to consider using a deque or list for that purpose, since they may avoid large reallocations that the vector is likely to make as it tries to keep a contiguous segment of memory.

Related

"Merge" pointer to vector

Sorry if the title is wrong. I am not sure how to describe my problem in a line.
Suppose I have multiple vectors of type cv::Point and that at each iteration I am to create a new vector that will hold the concatenation of 2 other vectors, what is the most efficient way of doing so? In the example below, vector V4 will hold the elements of vector V1 and V2, and V5 will hold V4 and V3.
I would not like to copy the elements over. It would be better if I could get a single iterator that points to V1 and V2 because the vectors get really large over time.
I have tried the approach from "What is the best way to concatenate two vectors?" but I'm not sure if there are faster options. I also used the boost::join function, but it seems to be much slower.
Ideally, this copy/move operation would be constant-time, because the entire application revolves around it.
AB.reserve( A.size() + B.size() ); // preallocate memory
AB.insert( AB.end(), A.begin(), A.end() );
AB.insert( AB.end(), B.begin(), B.end() );
You might want to consider doing something at least vaguely like a deque is normally implemented--essentially a vector of pointers to vectors that keeps track of the bounds of each component vector, so it can find the appropriate vector and then adjust the index for the vector that contains the element you need.
Depending on the situation, you might also consider attempting to keep the component vectors equal in size to speed computation of the correct index. If they start out with wildly disparate sizes, this probably won't be worthwhile, but if they're close to the same, it may be worth moving some over from one to the next to equalize their sizes. To facilitate this, you'd want to leave an initially-unused section at the beginning of each vector so you can move some elements from one to the beginning of the next quickly and easily.
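The core of the index-mapping idea can be sketched very compactly for the two-vector case; the function name and element type here are illustrative, not from any library:

```cpp
#include <cstddef>
#include <vector>

// Treat v1 followed by v2 as one logical sequence, without copying either.
// Given a "global" index, pick the right component vector and adjust the
// index into it -- the same bounds bookkeeping a deque does internally.
template <typename T>
T& at_joined( std::vector<T>& v1, std::vector<T>& v2, std::size_t i )
{
    return i < v1.size() ? v1[i] : v2[i - v1.size()];
}
```

Generalizing to more than two vectors means storing the running size prefix of each component so the right one can be found (linearly, or by binary search) before adjusting the index.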

Does inserting an element into a vector by re-sizing the vector every time take more time?

I got a decision making problem here. In my application, I need to merge two vectors. I can't use stl algorithms since data order is important (It should not be sorted.).
Both the vectors contains the data which can be same sometimes or 75% different in the worst case.
Currently I am confused b/w two approaches,
Approach 1:
a. take an element in the smaller vector.
b. compare it with the elements in bigger one.
c. If element matches then skip it (I don't want duplicates).
d. If element is not found in bigger one, calculate proper position to insert.
e. re-size the bigger one to insert the element (multiple re-sizes may happen).
Approach 2:
a. Iterate through vectors to find matched element positions.
b. Resize the bigger one at a go by calculating total size required.
c. Take smaller vector and go to elements which are not-matched.
d. Insert the element in appropriate position.
Kindly help me to choose the proper one. And if there is any better approach or simpler techniques (like stl algorithms), or easier container than vector, please post here. Thank you.
You shouldn't be focusing on the resizes. In approach 1, you should use vector.insert() so you don't need to resize the vector yourself. This may cause reallocations of the underlying buffer to happen automatically, but std::vector is carefully implemented so that the total cost of these operations is small.
The real problem with your algorithm is the insert, and maybe the search (which you didn't detail). When you insert into a vector anywhere except at the end, all the elements after the insertion point must be moved up in memory, and this can be quite expensive.
If you want this to be fast, you should build a new vector from your two input vectors by appending one element at a time, with no inserting in the middle.
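An append-only merge along these lines might look like the following; the int element type and the use of an unordered_set for duplicate detection are assumptions for the sketch:

```cpp
#include <unordered_set>
#include <vector>

// Append-only merge: keep the first vector's order, then append elements of
// the second that have not been seen yet. No mid-vector inserts, so no
// element shifting ever happens.
std::vector<int> merge_unique( std::vector<int> const& a,
                               std::vector<int> const& b )
{
    std::vector<int> out;
    out.reserve( a.size() + b.size() );          // at most one allocation
    std::unordered_set<int> seen( a.begin(), a.end() );
    out.insert( out.end(), a.begin(), a.end() ); // copy a as-is
    for ( int x : b )
        if ( seen.insert( x ).second )           // true only for new elements
            out.push_back( x );
    return out;
}
```

This preserves the original (unsorted) order of both inputs and replaces the positional search with an average O(1) hash lookup.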
It doesn't look like you can do this in better time complexity than O(n log n), because removing duplicates from an unsorted vector takes O(n log n) time. So using a std::set to remove duplicates might be the best you can do.
Here, n is the total number of elements in both vectors.
Depending on your actual setup (e.g. if you're adding object pointers to a vector instead of copying values into it), you might get significantly faster results using a std::list. std::list allows constant-time insertion, which avoids a huge performance overhead.
Doing insertion might be a little awkward but is completely do-able by only changing a few pointers (inexpensive) vs insertion via a vector which moves every element out of the way to put the new one down.
If they need to end up as vectors, you can then convert the list to a vector with something like (untested)
std::list<thing> things;
//efficiently combine the vectors into a list
//since list is MUCH better for inserts
//but we still need it as a vector anyway
std::vector<thing> things_vec;
things_vec.reserve(things.size()); //allocate memory
//now move them into the vector
things_vec.insert(
things_vec.begin(),
std::make_move_iterator(things.begin()),
std::make_move_iterator(things.end())
);
//things_vec now has the same content and order as the list with very little overhead

C++ Std queue and vector performance

I've been working with graphs lately, and I am looking into returning a path from a graph. The path needs to be returned as a std vector containing all of the nodes with the starting node first.
I've been looking at two options:
- use the slow vector insert method to add the nodes at the front of the vector
- use a deque to add the nodes to the front (push_front), which is much faster. Then copying the deque to the vector using std::copy
Is there a significant performance boost using one method over the other?
Since you're returning a path, you presumably have an upper bound on its length. Therefore, you can create a vector, call reserve, and later (as @user2079303 writes) call push_back to add vertices to the path.
const auto n = <graph_size>;
std::vector<size_t> path;
path.reserve(n);
...
path.push_back(i); // Push whatever you want.
Now the problem is that, at least from the question, it seems like path is in reversed order. However, you can simply call std::reverse:
std::reverse(std::begin(path), std::end(path));
So, using only a vector:
You're allocating a single data structure instead of two; moreover, thanks to reserve there will be only a single memory allocation.
The use of reverse at the end simply replaces the use of copy you would have to do from the deque to the vector.
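Tying the reserve/push_back/reverse steps together, a sketch might look like this (the function name and the idea that nodes arrive in goal-to-start order are assumptions about the surrounding search code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Build the final path from nodes discovered in reverse (goal -> start):
// push them in arrival order, then reverse once at the end, so the
// starting node comes first. reserve() keeps it to a single allocation.
std::vector<std::size_t> build_path( std::vector<std::size_t> const& reversed_nodes,
                                     std::size_t graph_size )
{
    std::vector<std::size_t> path;
    path.reserve( graph_size );        // upper bound on the path length
    for ( auto node : reversed_nodes )
        path.push_back( node );        // always appends -- the fast case
    std::reverse( path.begin(), path.end() );
    return path;
}
```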
If you are looking at wrapping a std::vector in a std::queue then the std::queue will push elements to the back of the vector (the fast way).
Even if not, however, because a std::vector is contiguous storage, it is possible it will out-perform a std::deque even if you push_front(), because it plays well with the CPU cache, where shuffling data around is fast.
But why not try both and profile the code to see which one performs better?

Keeping vector of iterators of the data

I have a function :
void get_good_items(const std::vector<T>& data,std::vector<XXX>& good_items);
This function should check all data and find items that satisfies a condition and return where they are in good_items.
what is best instead of std::vector<XXX>?
std::vector<size_t> that contains all good indices.
std::vector<T*> that contain a pointers to the items.
std::vector<std::vector<T>::iterator> that contains iterators to the items.
other ??
EDIT:
What will I do with the good_items?
Many things... one of them is to delete them from the vector and save them in other place. maybe something else later
EDIT 2:
One of the most important for me is how will accessing the items in data will be fast depending on the struct of good_items?
EDIT 3:
I have just realized that my thinking was wrong. Isn't it better to keep raw (or smart) pointers as the items of the vector? Then I can keep the real values of the vector (which are pointers) and not worry about heavy copies, because they are just pointers.
If you remove items from the original vector, every one of the methods you listed will be a problem.
If you add items to the original vector, the second and third will be problematic. The first one won't be a problem if you use push_back to add items.
All of them will be fine if you don't modify the original vector.
Given that, I would recommend using std::vector<size_t>.
I would go with std::vector<size_t> or std::vector<T*> because they are easier to type. Otherwise, those three vectors are pretty much equivalent, they all identify positions of elements.
std::vector<size_t> can be made to use a smaller type for indexes if you know the limits.
If you expect that there are going to be many elements in this vector, you may like to consider using boost::dynamic_bitset instead to save memory and increase CPU cache utilization. A bit per element, bit position being the index into the original vector.
If you intend to remove the elements that satisfy the predicate, then the erase-remove idiom is the simplest solution.
If you intend to copy such elements, then std::copy_if is the simplest solution.
If you intend to end up with two partitions of the container i.e. one container has the good ones and another the bad ones, then std::partition_copy is a good choice.
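As a small sketch of the partition option (the int element type and the even/odd predicate are stand-ins for your T and condition):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Split data into "good" and "bad" in one pass, preserving relative order.
// Elements satisfying the predicate go to the first output, the rest to
// the second.
void split( std::vector<int> const& data,
            std::vector<int>& good, std::vector<int>& bad )
{
    std::partition_copy( data.begin(), data.end(),
                         std::back_inserter( good ),
                         std::back_inserter( bad ),
                         []( int x ) { return x % 2 == 0; } );
}
```

If you only need the good ones, replace partition_copy with std::copy_if and drop the second output iterator.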
For generally allowing the iteration of such elements, an efficient solution is returning a range of such iterators that will check the predicate while iterating. I don't think there are such iterators in the standard library, so you'll need to implement them yourself. Luckily boost already has done that for you: http://www.boost.org/doc/libs/release/libs/iterator/doc/filter_iterator.html
The problem you are solving, from my understanding, is the intersection of two sets, and I would go for the solution from standard library: std::set_intersection

An indexed set (for efficient removal in a vector)

I was just about to implement my own class for efficient removal from an array, but thought I'd ask to see if anything like it already exists. What I want is list-like access efficiency but using an array. I want to use an array for reasons of cache coherence and so I don't have to continually be calling a memory allocator (as using std::list would when allocating nodes).
What I thought about doing was creating a class with two arrays. The first is a set of elements and the second array is a set of integers where each integer is a free slot in the first array. So I can add/remove elements from the array fairly easily, without allocating new memory for them, simply by taking an index from the free list and using that for the new element.
Does anything like this exist already? If I do my own, I'll have to also make my own iterators, so you can iterate the set avoiding any empty slots in the array, and I don't fancy that very much.
Thanks.
Note: The kind of operations I want to perform on the set are:
Iteration
Random access of individual elements, by index (or "handle" as I'm thinking of it)
Removal of an element anywhere in the set
Addition of an element to the set (order unimportant)
std::list<T> actually does sound exactly like the theoretically correct data structure for your job, because it supports the four operations you listed, all with optimal space and time complexity. std::list<T>::iterator is a handle that remains valid even if you add/remove other items to/from the list.
It may be that there is a custom allocator (i.e. not std::allocator<T>) that you could use with std::list<T, Allocator> to get the performance you want (internally pool nodes, and then don't do a runtime allocation every time you add or remove a node). But that might be overkill.
I would start just using a std::list<T> with the default allocator and then only look at custom allocators or other data structures if you find the performance is too bad for your application.
If maintaining order of elements is irrelevant, use swap-and-pop.
Copy/move the last element over the one to be removed, then pop the back element. Super easy and efficient. You don't even need to bother with special checks for removing the element since it'll Just Work(tm) if you use the standard C++ vector and operations.
*iter = std::move(container.back());
container.pop_back();
I don't recall if pop_back() invalidates iterators on vector, but I don't think it does (it only invalidates iterators to the erased last element). If it does, just use indices directly, or use them to recalculate a new valid iterator:
auto delta = iter - container.begin();
// mutate container
iter = container.begin() + delta;
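Wrapped up as a reusable helper, with a guard so the last element doesn't get move-assigned onto itself (the index-based signature is one possible choice, not the only one):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Unordered erase: overwrite the doomed element with the last one, then pop
// the back. O(1), but element order is NOT preserved.
template <typename T>
void swap_and_pop( std::vector<T>& v, std::size_t i )
{
    if ( i + 1 != v.size() )              // skip self-move when erasing the back
        v[i] = std::move( v.back() );
    v.pop_back();
}
```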
You can use a single array by storing the information about the "empty" slots in the space of the empty slots.
For a contiguous block of empty slots in your array A, say of k slots starting from index n, store (k, n') at location A[n] (where n' is the index of the next block of free indexes). You may have to pack the two ints into a single word if your array is storing word-sized objects.
You're essentially storing a linked-list of free blocks, like a memory-manager might do.
It's a bit of a pain to code, but this'll allow you to allocate a free index in O(1) time, and to iterate through the allocated indices in O(n) time, where n is the number of allocated slots. Freeing an index will be O(n) time though in the worst case: this is the same problem as fragmented memory.
For the first free block, you can either store the index separately, or have the convention that you never allocate A[0] so you can always start a free-index search from there.
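A much-condensed version of the two-array idea from the question can be sketched like this (fixed int element type, a separate free-slot vector instead of the in-place linked list, and no iterator support -- all simplifications for illustration):

```cpp
#include <cstddef>
#include <vector>

// Slot container: elements live in one vector; freed slots are recycled via
// a free list, so handles (indices) to other elements stay valid across
// removals and no per-element allocation ever happens.
struct SlotArray {
    std::vector<int>         items;      // element storage
    std::vector<bool>        alive;      // which slots are occupied
    std::vector<std::size_t> free_slots; // indices available for reuse

    std::size_t add( int value ) {
        if ( !free_slots.empty() ) {     // reuse a freed slot first
            std::size_t slot = free_slots.back();
            free_slots.pop_back();
            items[slot] = value;
            alive[slot] = true;
            return slot;
        }
        items.push_back( value );        // otherwise grow the array
        alive.push_back( true );
        return items.size() - 1;
    }

    void remove( std::size_t slot ) {    // O(1): just mark and recycle
        alive[slot] = false;
        free_slots.push_back( slot );
    }
};
```

Iteration then means walking items and skipping slots whose alive flag is false, which is the part the question anticipates having to write by hand.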
std::map might be useful in your case.