I want a simplest visualization for STL containers - c++

I have read many things about STL containers (Sequence Containers, Associative Container, Container adapters) but I'm still don't understand the internal form for each of them.
I want visualization form supported with pictures to every container in STL, (ex: what is the form of data inside the container) if is possible.
I hope that my question be clear.

The internal working of the containers is implementation independent, if the performance (Big-O) is correct and the preconditions and post-conditions of the container and his methods (ex: default constructed = empty container) are meet, the implementation is standard.
Example in VS:
1-vector of T - buffer (T[], T*). Normal internal representation is 3 pointer, begin and end of the buffer and current element.
2-deque of T - implemented in similar way to vector, but inserted element not need to be one after the other, because the container support inserting before and after with O(1) normally if you create deque it begin empty as all container, suppose a value is inserted from back, in this case is inserted in position 0, if a value is inserted from front is inserted in position capacity of deque (until the two index begin and end encounter, in this case grow is needed, and copy, like vector, and order of the value updated). Normal internal representation is pointer to begin and end of the buffer, first and last element of the deque.
3-list of T - double linked list (node with next, previous and T)
4-asociative of T (set, multiset, map, multimap) - Red Black Tree with value T or pair in case of maps like. Normal internal representation is left and right Node and value (used for sort)
5-unordered asociative of T (unordered_set, multiset ...): hash map implementation new in C++11.
6-Adaptor - the underlying representation is one of the previous containers (ex: stack is an adapter that is implemented by default with deque), in instantiation time one can change the underlying representation to tweak performance Big-O according to constrain (it may require Random Access, Forward, etc...).
This specifics can change in any version of the libraries, when any optimization is founded, and in any other case. For example some implementation of vector have pointer to begin, capacity (instead of end), and current element.

http://www.cs.usfca.edu/~galles/visualization/Algorithms.html
This will help a little bit, although the underlying data structure can be vary upon implementation

Related

vector vs. list from stl - remove method

std::list has a remove method, while the std::vector doesn't. What is the reason for that?
std::list<>::remove is a physical removal method, which can be implemented by physically destroying list elements that satisfy certain criteria (by physical destruction I mean the end of element's storage duration). Physical removal is only applicable to lists. It cannot be applied to arrays, like std::vector<>. It simply is not possible to physically end storage duration of an individual element of an array. Arrays can only be created and destroyed as a whole. This is why std::vector<> does not have anything similar to std::list<>::remove.
The universal removal method applicable to all modifiable sequences is what one might call logical removal: the target elements are "removed" from the sequence by overwriting their values with values of elements located further down in the sequence. I.e. the sequence is shifted and compacted by copying the persistent data "to the left". Such logical removal is implemented by freestanding functions, like std::remove. Such functions are applicable in equal degree to both std::vector<> and std::list<>.
In cases where the approach based on immediate physical removal of specific elements applies, it will work more efficiently than the generic approach I referred above as logical removal. That is why it was worth providing it specifically for std::list<>.
std::list::remove removes all items in a list that match the provided value.
std::list<int> myList;
// fill it with numbers
myList.remove(10); // physically removes all instances of 10 from the list
It has a similar function, std::list::remove_if, which allows you to specify some other predicate.
std::list::remove (which physically removes the elements) is required to be a member function as it needs to know about the memory structure (that is, it must update the previous and next pointers for each item that needs to be updated, and remove the items), and the entire function is done in linear time (a single iteration of the list can remove all of the requested elements without invalidating any of the iterators pointing to items that remain).
You cannot physically remove a single element from a std::vector. You either reallocate the entire vector, or you move every element after the removed items and adjust the size member. The "cleanest" implementation of that set of operations would be to do
// within some instance of vector
void vector::remove(const T& t)
{
erase(std::remove(t), end());
}
Which would require std::vector to depend on <algorithm> (something that is currently not required).
As the "sorting" is needed to remove the items without multiple allocations and copies being required. (You do not need to sort a list to physically remove elements).
Contrary to what others are saying, it has nothing to do with speed. It has to do with the algorithm needing to know how the data is stored in memory.
As a side note: This is also a similar reason why std::remove (and friends) do not actually remove the items from the container they operate on; they just move all the ones that are not going to be removed to the "front" of the container. Without the knowledge of how to actually remove an object from a container, the generic algorithm cannot actually do the removing.
Consider the implementation details of both containers. A vector has to provide a continuous memory block for storage. In order to remove an element at index n != N (with N being the vector's length), all elements from n+1 to N-1 need to be moved. The various functions in the <algorithm> header implement that behavior, like std::remove or std::remove_if. The advantage of these being free-standing functions is that they can work for any type that offers the needed iterators.
A list on the other hand, is implemented as a linked list structure, so:
It's fast to remove an element from anywhere
It's impossible to do it as efficiently using iterators (since the internal structure has to be known and manipulated).
In general in STL the logic is "if it can be done efficiently - then it's a class member. If it's inefficient - then it's an outside function"
This way they make the distinction between "correct" (i.e. "efficient") use of classes vs. "incorrect" (inefficient) use.
For example, random access iterators have a += operator, while other iterators use the std::advance function.
And in this case - removing elements from an std::list is very efficient as you don't need to move the remaining values like you do in std::vector
It's all about efficiency AND reference/pointer/iterator validity. list items can be removed without disturbing any other pointers and iterators. This is not true for a vector and other containers in all but the most trivial cases. Nothing prevents use the external strategy, but you have a superior options.. That said this fellow said it better than I could on a duplicate question
From another poster on a duplicate question:
The question is not why std::vector does not offer the operation, but
rather why does std::list offer it. The design of the STL is focused
on the separation of the containers and the algorithms by means of
iterators, and in all cases where an algorithm can be implemented
efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can
be implemented much more efficiently with knowledge of the container.
That is the case of removing elements from a container. The cost of
using the remove-erase idiom is linear in the size of the container
(which cannot be reduced much), but that hides the fact that in the
worst case all but one of those operations are copies of the objects
(the only element that matches is the first), and those copies can
represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity
of the operation will still be linear, but the associated cost for
each one of the elements removed is very low, a couple of pointer
copies and releasing of a node in memory. At the same time, the
implementation as part of the list can offer stronger guarantees:
pointers, references and iterators to elements that are not erased do
not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific
container is std::list::sort, that uses mergesort that is less
efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with
iterators unless there is a strong reason to provide a particular
implementation in a concrete container.
std::list is designed to work like a linked list. That is, it is designed (you might say optimized) for constant time insertion and removal ... but access is relatively slow (as it typically requires traversing the list).
std::vector is designed for constant-time access, like an array. So it is optimized for random access ... but insertion and removal are really only supposed to be done at the "tail" or "end", elsewhere they're typically going to be much slower.
Different data structures with different purposes ... therefore different operations.
To remove an element from a container you have to find it first. There's a big difference between sorted and unsorted vectors, so in general, it's not possible to implement an efficient remove method for the vector.

Difference between std::remove and erase for vector?

I have a doubt that I would like to clarify in my head. I am aware of the different behavior for std::vector between erase and std::remove where the first physically removes an element from the vector, reducing size, and the other just moves an element leaving the capacity the same.
Is this just for efficiency reasons? By using erase, all elements in a std::vector will be shifted by 1, causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy, that difference might matter, right?
Is this just for efficiency reason? By using erase all elements in a std::vector will be shifted by 1 causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy that difference mihgt matter, right?
The reason for using this idiom is exactly that. There is a benefit in performance, but not in the case of a single erasure. Where it does matter is if you need to remove multiple elements from the vector. In this case, the std::remove will copy each not removed element only once to its final location, while the vector::erase approach would move all of the elements from the position to the end multiple times. Consider:
std::vector<int> v{ 1, 2, 3, 4, 5 };
// remove all elements < 5
If you went over the vector removing elements one by one, you would remove the 1, causing copies of the remainder elements that get shifted (4). Then you would remove 2 and shift all remainding elements by one (3)... if you see the pattern this is a O(N^2) algorithm.
In the case of std::remove the algorithm maintains a read and write heads, and iterates over the container. For the first 4 elements the read head will be moved and the element tested, but no element is copied. Only for the fifth element the object would be copied from the last to the first position, and the algorithm will complete with a single copy and returning an iterator to the second position. This is a O(N) algorithm. The later std::vector::erase with the range will cause destruction of all the remainder elements and resizing the container.
As others have mentioned, in the standard library algorithms are applied to iterators, and lack knowledge of the sequence being iterated. This design is more flexible than other approaches on which algorithms are aware of the containers in that a single implementation of the algorithm can be used with any sequence that complies with the iterator requirements. Consider for example, std::remove_copy_if, it can be used even without containers, by using iterators that generate/accept sequences:
std::remove_copy_if(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, " "),
[](int x) { return !(x%2); } // is even
);
That single line of code will filter out all even numbers from standard input and dump that to standard output, without requiring the loading of all numbers into memory in a container. This is the advantage of the split, the disadvantage is that the algorithms cannot modify the container itself, only the values referred to by the iterators.
std::remove is an algorithm from the STL which is quite container agnostic. It requires some concept, true, but it has been designed to also work with C arrays, which are static in sizes.
std::remove simply returns a new end() iterator to point to one past the last non-removed element (the number of items from the returned value to end() will match the number of items to be removed, but there is no guarantee their values are the same as those you were removing - they are in a valid but unspecified state). This is done so that it can work for multiple container types (basically any container type that a ForwardIterator can iterate through).
std::vector::erase actually sets the new end() iterator after adjusting the size. This is because the vector's method actually knows how to handle adjusting it's iterators (the same can be done with std::list::erase, std::deque::erase, etc.).
remove organizes a given container to remove unwanted objects. The container's erase function actually handles the "removing" the way that container needs it to be done. That is why they are separate.
I think it has to do with needing direct access to the vector itself to be able to resize it. std::remove only has access to the iterators, so it has no way of telling the vector "Hey, you now have fewer elements".
See yves Baumes answer as to why std::remove is designed this way.
Yes, that's the gist of it. Note that erase is also supported by the other standard containers where its performance characteristics are different (e.g. list::erase is O(1)), while std::remove is container-agnostic and works with any type of forward iterator (so it works for e.g. bare arrays as well).
Kind of. Algorithms such as remove work on iterators (which are an abstraction to represent an element in a collection) which do not necessarily know which type of collection they are operating on - and therefore cannot call members on the collection to do the actual removal.
This is good because it allows algorithms to work generically on any container and also on ranges that are subsets of the entire collection.
Also, as you say, for performance - it may not be necessary to actually remove (and destroy) the elements if all you need is access to the logical end position to pass on to another algorithm.
Standard library algorithms operate on sequences. A sequence is defined by a pair of iterators; the first points at the first element in the sequence, and the second points one-past-the-end of the sequence. That's all; algorithms don't care where the sequence comes from.
Standard library containers hold data values, and provide a pair of iterators that specify a sequence for use by algorithms. They also provide member functions that may be able to do the same operations as an algorithm more efficiently by taking advantage of the internal data structure of the container.

Understanding Iterators in the STL

What exactly are iterators in the C++ STL?
In my case, I'm using a list, and I don't understand why you have to make an iterator std::list <int>::const_iterator iElementLocator; to display contents of the list by the derefrence operator:
cout << *iElementLocator; after assigning it to maybe list.begin().
Please explain what exactly an iterator is and why I have to dereference or use it.
There are three building blocks in the STL:
Containers
Algorithms
Iterators
At the conceptual level containers hold data. That by itself isn't very useful, because you want to do something with the data; you want to operate on it, manipulate it, query it, play with it. Algorithms do exactly that. But algorithms don't hold data, they have no data -- they need a container for this task. Give a container to an algorithm and you have an action going on.
The only problem left to solve is how does an algorithm traverse a container, from a technical point of view. Technically a container can be a linked list, or it can be an array, or a binary tree, or any other data structure that can hold data. But traversing an array is done differently than traversing a binary tree. Even though conceptually all an algorithm wants is to "get" one element at a time from a container, and then work on that element, the operation of getting the next element from a container is technically very container-specific.
It appears as if you'd need to write the same algorithm for each container, so that each version of the algorithm has the correct code for traversing the container. But there's a better solution: ask the container to return an object that can traverse over the container. The object would have an interface algorithms know. When an algorithm asks the object to "get the next element" the object would comply. Because the object came directly from the container it knows how to access the container's data. And because the object has an interface the algorithm knows, we need not duplicate an algorithm for each container.
This is the iterator.
The iterator here glues the algorithm to the container, without coupling the two. An iterator is coupled to a container, and an algorithm is coupled to the iterator's interface. The source of the magic here is really template programming. Consider the standard copy() algorithm:
template<class In, class Out>
Out copy(In first, In last, Out res)
{
while( first != last ) {
*res = *first;
++first;
++res;
}
return res;
}
The copy() algorithm takes as parameters two iterators templated on the type In and one iterator of type Out. It copies the elements starting at position first and ending just before position last, into res. The algorithm knows that to get the next element it needs to say ++first or ++res. It knows that to read an element it needs to say x = *first and to write an element it needs to say *res = x. That's part of the interface algorithms assume and iterators commit to. If by mistake an iterator doesn't comply with the interface then the compiler would emit an error for calling a function over type In or Out, when the type doesn't define the function.
I'm being lazy. So I would not type describing what an iterator is and how they're used, especially when there're already lots of articles online that you can read yourself.
Here are few that I can quote for a start, proividing the links to complete articles:
MSDN says,
Iterators are a generalization of
pointers, abstracting from their
requirements in a way that allows a
C++ program to work with different
data structures in a uniform manner.
Iterators act as intermediaries
between containers and generic
algorithms. Instead of operating on
specific data types, algorithms are
defined to operate on a range
specified by a type of iterator. Any
data structure that satisfies the
requirements of the iterator may then
be operated on by the algorithm. There
are five types or categories of
iterator [...]
By the way, it seems the MSDN has taken the text in bold from C++ Standard itself, specifically from the section ยง24.1/1 which says
Iterators are a generalization of
pointers that allow a C + + program to
work with different data structures
(containers) in a uniform manner. To
be able to construct template
algorithms that work correctly and
efficiently on different types of data
structures, the library formalizes not
just the interfaces but also the
semantics and complexity assumptions
of iterators. All iterators i support
the expression *i, resulting in a
value of some class, enumeration, or
built-in type T, called the value type
of the iterator. All iterators i for
which the expression (*i).m is
well-defined, support the expression
i->m with the same semantics as
(*i).m. For every iterator type X for
which equality is defined, there is a
corresponding signed integral type
called the difference type of the
iterator.
cplusplus says,
In C++, an iterator is any object
that, pointing to some element in a
range of elements (such as an array or
a container), has the ability to
iterate through the elements of that
range using a set of operators (at
least, the increment (++) and
dereference (*) operators).
The most obvious form of iterator is a
pointer [...]
And you can also read these:
What Is an Iterator?
Iterators in the Standard C++ Library
Iterator (at wiki entry)
Have patience and read all these. Hopefully, you will have some idea what an iterator is, in C++. Learning C++ requires patience and time.
An iterator is not the same as the container itself. The iterator refers to a single item in the container, as well as providing ways to reach other items.
Consider designing your own container without iterators. It could have a size function to obtain the number of items it contains, and could overload the [] operator to allow you to get or set an item by its position.
But "random access" of that kind is not easy to implement efficiently on some kinds of container. If you obtain the millionth item: c[1000000] and the container internally uses a linked list, it will have to scan through a million items to find the one you want.
You might instead decide to allow the collection to remember a "current" item. It could have functions like start and more and next to allow you to loop through the contents:
c.start();
while (c.more())
{
item_t item = c.next();
// use the item somehow
}
But this puts the "iteration state" inside the container. This is a serious limitation. What if you wanted to compare each item in the container with every other item? That requires two nested loops, both iterating through all the items. If the container itself stores the position of the iteration, you have no way to nest two such iterations - the inner loop will destroy the working of the outer loop.
So iterators are an independent copy of an iteration state. You can begin an iteration:
container_t::iterator i = c.begin();
That iterator, i, is a separate object that represents a position within the container. You can fetch whatever is stored at that position:
item_t item = *i;
You can move to the next item:
i++;
With some iterators you can skip forward several items:
i += 1000;
Or obtain an item at some position relative to the position identified by the iterator:
item_t item = i[1000];
And with some iterators you can move backwards.
And you can discover if you've reached beyond the contents of the container by comparing the iterator to end:
while (i != c.end())
You can think of end as returning an iterator that represents a position that is one beyond the last position in the container.
An important point to be aware of with iterators (and in C++ generally) is that they can become invalid. This usually happens for example if you empty a container: any iterators pointing to positions in that container have now become invalid. In that state, most operations on them are undefined - anything could happen!
An iterator is to an STL container what a pointer is to an array. You can think of them as pointer objects to STL containers. As pointers, you will be able to use them with the pointer notation (e.g. *iElementLocator, iElementLocator++). As objects, they will have their own attributes and methods (http://www.cplusplus.com/reference/std/iterator).
There already exists a lot of good explanations of iterators. Just google it.
One example.
If there is something specific you don't understand come back and ask.
I'd suggest reading about operator overloading in C++. This will tell why * and -> can mean essentially anything. Only then you should read about the iterators. Otherwise it might appear very confusing.

What are Containers/Adapters? C++

What are containers/adapters? I have basic knowledge of C++ and its sub-topics like (class/templates/STL).
Can anyone please explain in layman's language and give me a practical example of the application of containers/adapters?
A container is a specific data structure that contains data, usually in an unbounded amount. Each container type has limitations on how to access, add, or remove data efficiently.
Below are a few examples of containers using STL classes.
Sequence Containers
Here are the sequence containers, meaning the data is reliably ordered (that is, there is a front and a back to them. I do NOT mean that they automatically sort themselves!).
A vector is a bit like a flexibly-sized array. Vectors are random-access, meaning you can access any element with integer index in constant time (just like an array). You can add or remove from the back of the vector in amortized constant time as well. Anywhere else, though, and you're probably looking at having to recopy potentially all of the elements.
A deque, or double-ended queue, is like a vector but you can add to the front or the back in amortized constant time. You can still access elements in constant time, but deque elements are not guaranteed to be contiguous in memory like vectors or arrays.
A list is a linked list, meaning data which are linked together by pointers. You have constant-time access to the beginning and the end, but in order to get anywhere in the middle you need to iterate through the list. You can add elements anywhere in the list in constant time, though, if you already have a pointer to one of the nearby nodes.
Associative Containers
These are associative containers, meaning that elements are no longer ordered but instead have associations with each other used for determining uniqueness or mappings:
A set is a container with unique elements. You can only add one of each element to a set; any other additions are ignored.
A multiset is like a set, but you can put more than one of an element in. The multiset keeps track of how many of each kind of element are in the structure.
A map, also known as an associative array, is a structure in which you insert key-value pairs; then you can look up any value by supplying the key. So it's a bit like an array that you can access with a string index (key), or any other kind of index. (If you insert another key-value pair and the key already exists, then you just overwrite the value for the original key.)
A multimap is a map that allows for insertion of multiple values for the same key. When you do a key lookup, you get back a container with all the values in it.
Container Adapters
Container adapters, on the other hand, are interfaces created by limiting functionality in a pre-existing container and providing a different set of functionality. When you declare the container adapters, you have an option of specifying which sequence containers form the underlying container. These are:
A stack is a container providing Last-In, First-Out (LIFO) access. Basically, you remove elements in the reverse order you insert them. It's difficult to get to any elements in the middle. Usually this goes on top of a deque.
A queue is a container providing First-In, First-Out (FIFO) access. You remove elements in the same order you insert them. It's difficult to get to any elements in the middle. Usually this goes on top of a deque.
A priority_queue is a container providing sorted-order access to elements. You can insert elements in any order, and then retrieve the "lowest" of these values at any time. Priority queues in C++ STL use a heap structure internally, which in turn is basically array-backed; thus, usually this goes on top of a vector.
See this reference page for more information, including time complexity for each of the operations and links to detailed pages for each of the container types.
<joke>C++ is technical and hard to understand :-D</joke>
Containers are data types from STL that can contain data.
Example: vector as a dynamic array
Adapters are data types from STL that adapt a container to provide specific interface.
Example: stack providing stack interface on top of the chosen container
(side note: both are actually templates not data types, but the definition looks better this way)
The technical definition of "container" from The SGI STL documentation is pretty good:
A Container is an object that stores other objects (its elements), and that has methods for accessing its elements. In particular, every type that is a model of Container has an associated iterator type that can be used to iterate through the Container's elements.
So, a container is a data structure that holds ("contains") a collection of objects of some type. The key idea is that there are different types of containers, each of which stores objects in a different way and provides different performance characteristics, but all of them have a standard interface so that you can swap one out for another easily and without modifying too much of the code that uses the container. The idea is that the containers are designed to be interchangeable as much as possible.
The container adapters are classes that provide a subset of a container's functionality but may provide additional functionality that makes it easier to use containers for certain scenarios. For example, you could easily use std::vector or std::deque for a stack data structure and call push_back, back, and pop_back as the stack interface; std::stack provides an interface that can use a std::vector or std::deque or other sequence container but provides the more standard push, top, and pop member functions for accessing members.

Best c++ container to strip items away from?

I have a list of files (stored as c style strings) that I will be performing a search on and I will remove those files that do not match my parameters. What is the best container to use for this purpose? I'm thinking Set as of now. Note the list of files will never be larger than when it is initialized. I'll only be deleting from the container.
I would definitely not use a set - you don't need to sort it so no point in using a set. Set is implemented as a self-balancing tree usually, and the self-balancing algorithm is unnecessary in your case.
If you're going to be doing this operation once, I would use a std::vector with remove_if (from <algorithm>), followed by an erase. If you haven't used remove_if before, what it does is go through and shifts all the relevant items down, overwriting the irrelevant ones in the process. You have to follow it with an erase to reduce the size of the vector. Like so:
std::vector<const char*> files;
files.erase(remove_if(files.begin(), files.end(), RemovePredicate()), files.end());
Writing the code to do the same thing with a std::list would be a little bit more difficult if you wanted to take advantage of its O(1) deletion time property. Seeing as you're just doing this one-off operation which will probably take so little time you won't even notice it, I'd recommend doing this as it's the easiest way.
And to be honest, I don't think you'll see that much difference in terms of speed between the std::list and std::vector approaches. The vector approach only copies each value once so it's actually quite fast, yet takes much less space. In my opinion, going up to a std::list and using three times the space is only really justified if you're doing a lot of addition and deletion throughout the entire application's lifetime.
Elements in a std::set must be unique, so unless the filenames are globally unique this won't suit your needs.
I would probably recommend a std::list.
From SGI:
A vector is a Sequence that supports random access to elements, constant time insertion and removal of elements at the end, and linear time insertion and removal of elements at the beginning or in the middle.
A list is a doubly linked list. That is, it is a Sequence that supports both forward and backward traversal, and (amortized) constant time insertion and removal of elements at the beginning or the end, or in the middle.
An slist is a singly linked list: a list where each element is linked to the next element, but not to the previous element. That is, it is a Sequence that supports forward but not backward traversal, and (amortized) constant time insertion and removal of elements.
Set is a Sorted Associative Container that stores objects of type Key. Set is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Unique Associative Container, meaning that no two elements are the same.
Multiset is a Sorted Associative Container that stores objects of type Key. Multiset is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Multiple Associative Container, meaning that two or more elements may be identical.
Hash_set is a Hashed Associative Container that stores objects of type Key. Hash_set is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Unique Associative Container, meaning that no two elements compare equal using the Binary Predicate EqualKey.
Hash_multiset is a Hashed Associative Container that stores objects of type Key. Hash_multiset is a simple associative container, meaning that its value type, as well as its key type, is Key. It is also a Multiple Associative Container, meaning that two or more elements may compare equal using the Binary Predicate EqualKey.
(Some containers have been omitted.)
I would go with hash_set if all you care to have is a container which is fast and doesn't contain multiple identical keys. hash_multiset if you do, set or multiset if you want the strings to be sorted, or list or slist if you want the strings to keep their insertion order.
After you've built your list/set, use remove_if to filter out your items based on your criteria.
I will start by tossing out vector since it is a sequential container. Set, I believe is close to being sequential or hashed. I would avoid that. A doublely-linked list, the stl list is one of these, has two pointers and the node. Basically, to remove an item, it breaks the chain then rejoins the two parts with the pointers.
Assuming your search criteria does not depend on the filename (ie. you search for content, file sizes etc.), so you cannot make use of a set, I'd go with a list. It will take you O(N) to construct the whole list, and O(1) per one delete.
If you wanted to make it even faster, and didn't insist on using ready-made STL containers, I would:
use a vector
delete using false delete, ie. marking an item as deleted
when the ratio of deleted/all items raises above certain threshold, I would filter the items with remove_if
This should give you the best space/time/cache performance. (Although you should profile it to be sure)
You can use two lists/vectors/whatever:
using namespace std;
vector<const char *> files;
files.push_back("foo.bat");
files.push_back("bar.txt");
vector<const char *> good_files; // Maybe reserve elements given files.size()?
for(vector<const char *>::const_iterator i = files.begin(); i != files.end(); ++i) {
if(file_is_good(*i)) {
new_files.push_back(*i);
}
}