Efficient Input Handling with std::vector buffers - c++

I'm trying to find a fast way to do input but I've learned that using STL for such purposes might be slow.
I have a callback that fires whenever I get Keyboard input.
It creates an object with (int _key, int _state, int _life)
Everytime I receive this callback, I push_back the object to my std::vector;
Every frame I check the top of this vector and remove the "dead" input.
The vector can be polled for whatever input is valid at that moment which means it will be searched frequently.
Optimizations:
-All the memory should be contiguous so although Link Lists are better for dynamic allocation, should I stick with STL's vector? I'm always adding to the top and removing from the bottom so what data struct should I use?
-I was thinking of having a buffer(second vector) that continuously receives new input from the callback, then each frame copy the data from that vector to the top of my active input vector. Since the active vector will be polled, would this increase performance since it won't be wasting time getting added to during the loop?
Basically I'm trying to squeeze as much performance from this vector as possible and I could use some help.

What you are describing, adding data in one end, removing in another, is the archetypical description of a queue. This is implemented in the standard library with the std::queue class.
This queue class is a so-called container adapter, meaning it uses another container for the actual storage. By default it uses std::deque, but that container doesn't keep its data in a contiguous memory area. However you can declare a std::queue with almost any other standard container, like std::vector (which is the only container guaranteed to store data in a contiguous memory area):
std::queue<int, std::vector> my_queue_with_vector;
my_queue_with_vector.push(1);
my_queue_with_vector.push(2);
my_queue_with_vector.push(3);
while (!my_queue_with_vector.empty())
{
std::cout << my_queue_with_vector.top() << '\n';
my_queue_with_vector.pop(); // Remove top element in the queue
}

std::deque makes the best container. It guarantees O(1) pop_front and push_back() and has random access and a good degree (although not completely) of cache behaviour.
But if you absolutely must have complete contiguity, then you'll need to look into a custom circular buffer container (there's one in Boost IIRC), as pop_front() on a vector is rather expensive.
Edit: As another poster has pointed out, keyboard input is so infrequent even for a very fast typist that I find it difficult to believe that this is a bottleneck in your system.

Sounds like you want a std::deque. Adjacents elements are nearly always allocated continuously in memory and it has constant time insertion and removal at begin and end.

Short answer: It doesn't matter. std::vector and std::list can easily handle millions of insert operations per second, but most typists don't type faster than 10 characters per second on a keyboard.
Long answer: push_back and erase are usually very cheap on a vector if the vector is small (< 100) and the copy/swap operations of the objects stored on the vector are cheap. The allocations used to insert into or remove from an std:list are usually more expensive. If it becomes an issue, measure the cost.
An std::deque also has allocations and depending on the implementation is likely more expensive than the vector in your case, if my assumption that your vector rarely if ever contains more than 10 items - all of which are cheap to copy - is correct.

Related

Efficient linked list in C++?

This document says std::list is inefficient:
std::list is an extremely inefficient class that is rarely useful. It performs a heap allocation for every element inserted into it, thus having an extremely high constant factor, particularly for small data types.
Comment: that is to my surprise. std::list is a doubly linked list, so despite its inefficiency in element construction, it supports insert/delete in O(1) time complexity, but this feature is completely ignored in this quoted paragraph.
My question: Say I need a sequential container for small-sized homogeneous elements, and this container should support element insert/delete in O(1) complexity and does not need random access (though supporting random access is nice, it is not a must here). I also don't want the high constant factor introduced by heap allocation for each element's construction, at least when the number of element is small. Lastly, iterators should be invalidated only when the corresponding element is deleted. Apparently I need a custom container class, which might (or might not) be a variant of doubly linked list. How should I design this container?
If the aforementioned specification cannot be achieved, then perhaps I should have a custom memory allocator, say, bump pointer allocator? I know std::list takes an allocator as its second template argument.
Edit: I know I shouldn't be too concerned with this issue, from an engineering standpoint - fast enough is good enough. It is just a hypothetical question so I don't have a more detailed use case. Feel free to relax some of the requirements!
Edit2: I understand two algorithms of O(1) complexity can have entirely different performance due to the difference in their constant factors.
Your requirements are exactly those of std::list, except that you've decided you don't like the overhead of node-based allocation.
The sane approach is to start at the top and only do as much as you really need:
Just use std::list.
Benchmark it: is the default allocator really too slow for your purposes?
No: you're done.
Yes: goto 2
Use std::list with an existing custom allocator such as the Boost pool allocator
Benchmark it: is the Boost pool allocator really too slow for your purposes?
No: you're done.
Yes: goto 3
Use std::list with a hand-rolled custom allocator finely tuned to your unique needs, based on all the profiling you did in steps 1 and 2
Benchmark as before etc. etc.
Consider doing something more exotic as a last resort.
If you get to this stage, you should have a really well-specified SO question, with lots of detail about exactly what you need (eg. "I need to squeeze n nodes into a cacheline" rather than "this doc said this thing is slow and that sounds bad").
PS. The above makes two assumptions, but both are worth investigation:
as Baum mit Augen points out, it's not sufficient to do simple end-to-end timing, because you need to be sure where your time is going. It could be the allocator itself, or cache misses due to the memory layout, or something else. If something's slow, you still need to be sure why before you know what ought to change.
your requirements are taken as a given, but finding ways to weaken requirements is often the easiest way to make something faster.
do you really need constant-time insertion and deletion everywhere, or only at the front, or the back, or both but not in the middle?
do you really need those iterator invalidation constraints, or can they be relaxed?
are there access patterns you can exploit? If you're frequently removing an element from the front and then replacing it with a new one, could you just update it in-place?
As an alternative, you can use a growable array and handle the links explicitly, as indexes into the array.
Unused array elements are put in a linked list using one of the links. When an element is deleted, it is returned to the free list. When the free list is exhausted, grow the array and use the next element.
For the new free elements, you have two options:
append them to the free list at once,
append them on demand, based on the number of elements in the free list vs. the array size.
The requirement of not invalidating iterators except the one on a node being deleted is forbidding to every container that doesn't allocate individual nodes and is much different from e.g. list or map.
However, I've found that in almost every case when I thought that this was necessary, it turned out with a little discipline I could just as well do without. You might want to verify if you can, you would benefit greatly.
While std::list is indeed the "correct" thing if you need something like a list (for CS class, mostly), the statement that it is almost always the wrong choice is, unluckily, exactly right. While the O(1) claim is entirely true, it's nevertheless abysmal in relation to how actual computer hardware works, which gives it a huge constant factor. Note that not only are the objects that you iterate randomly placed, but the nodes that you maintain are, too (yes, you can somehow work around that with an allocator, but that is not the point). On the average, you have two one guaranteed cache misses for anything you do, plus up to two one dynamic allocations for mutating operations (one for the object, and another one for the node).
Edit: As pointed out by #ratchetfreak below, implementations of std::list commonly collapse the object and node allocation into one memory block as an optimization (akin to what e.g. make_shared does), which makes the average case somewhat less catastrophic (one allocation per mutation and one guaranteed cache miss instead of two).
A new, different consideration in this case might be that doing so may not be entirely trouble-free either. Postfixing the object with two pointers means reversing the direction while dereference which may interfere with auto prefetch.
Prefixing the object with the pointers, on the other hand, means you push the object back by two pointers' size, which will mean as much as 16 bytes on a 64-bit system (that might split a mid-sized object over cache line boundaries every time). Also, there's to consider that std::list cannot afford to break e.g. SSE code solely because it adds a clandestine offset as special surprise (so for example the xor-trick would likely not be applicable for reducing the two-pointer footprint). There would likely have to be some amount of "safe" padding to make sure objects added to a list still work the way they should.
I am unable to tell whether these are actual performance problems or merely distrust and fear from my side, but I believe it's fair to say that there may be more snakes hiding in the grass than one expects.
It's not for no reason that high-profile C++ experts (Stroustrup, notably) recommend using std::vector unless you have a really good reason not to.
Like many people before, I've tried to be smart about using (or inventing) something better than std::vector for one or the other particular, specialized problem where it seems you can do better, but it turns out that simply using std::vector is still almost always the best, or second best option (if std::vector happens to be not-the-best, std::deque is usually what you need instead).
You have way fewer allocations than with any other approach, way less memory fragmentation, way fewer indirections, and a much more favorable memory access pattern. And guess what, it's readily available and just works.
The fact that every now and then inserts require a copy of all elements is (usually) a total non-issue. You think it is, but it's not. It happens rarely and it is a copy of a linear block of memory, which is exactly what processors are good at (as opposed to many double-indirections and random jumps over memory).
If the requirement not to invalidate iterators is really an absolute must, you could for example pair a std::vector of objects with a dynamic bitset or, for lack of something better, a std::vector<bool>. Then use reserve() appropriately so reallocations do not happen. When deleting an element, do not remove it but only mark it as deleted in the bitmap (call the destructor by hand). At appropriate times, when you know that it's OK to invalidate iterators, call a "vacuum cleaner" function that compacts both the bit vector and the object vector. There, all unforeseeable iterator invalidations gone.
Yes, that requires maintaining one extra "element was deleted" bit, which is annoying. But a std::list must maintain two pointers as well, in additon to the actual object, and it must do allocations. With the vector (or two vectors), access is still very efficient, as it happens in a cache-friendly way. Iterating, even when checking for deleted nodes, still means you move linearly or almost-linearly over memory.
std::list is a doubly linked list, so despite its inefficiency in element construction, it supports insert/delete in O(1) time complexity, but this feature is completely ignored in this quoted paragraph.
It's ignored because it's a lie.
The problem of algorithmic complexity is that it generally measures one thing. For example, when we say that insertion in a std::map is O(log N), we mean that it performs O(log N) comparisons. The costs of iterating, fetching cache lines from memory, etc... are not taken into account.
This greatly simplifies analysis, of course, but unfortunately does not necessarily map cleanly to real-world implementation complexities. In particular, one egregious assumption is that memory allocation is constant-time. And that, is a bold-faced lie.
General purpose memory allocators (malloc and co), do not have any guarantee on the worst-case complexity of memory allocations. The worst-case is generally OS-dependent, and in the case of Linux it may involve the OOM killer (sift through the ongoing processes and kill one to reclaim its memory).
Special purpose memory allocators could potentially be made constant time... within a particular range of number of allocations (or maximum allocation size). Since Big-O notation is about the limit at infinity, it cannot be called O(1).
And thus, where the rubber meets the road, the implementation of std::list does NOT in general feature O(1) insertion/deletion, because the implementation relies on a real memory allocator, not an ideal one.
This is pretty depressing, however you need not lose all hopes.
Most notably, if you can figure out an upper-bound to the number of elements and can allocate that much memory up-front, then you can craft a memory allocator which will perform constant-time memory allocation, giving you the illusion of O(1).
Use two std::lists: One "free-list" that's preallocated with a large stash of nodes at startup, and the other "active" list into which you splice nodes from the free-list. This is constant time and doesn't require allocating a node.
The new slot_map proposal claim O(1) for insert and delete.
There is also a link to a video with a proposed implementation and some previous work.
If we knew more about the actual structure of the elements there might be some specialized associative containers that are much better.
I would suggest doing exactly what #Yves Daoust says, except instead of using a linked list for the free list, use a vector. Push and pop the free indices on the back of the vector. This is amortized O(1) insert, lookup, and delete, and doesn't involve any pointer chasing. It also doesn't require any annoying allocator business.
The simplest way I see to fulfill all your requirements:
Constant-time insertion/removal (hope amortized constant-time is okay for insertion).
No heap allocation/deallocation per element.
No iterator invalidation on removal.
... would be something like this, just making use of std::vector:
template <class T>
struct Node
{
// Stores the memory for an instance of 'T'.
// Use placement new to construct the object and
// manually invoke its dtor as necessary.
typename std::aligned_storage<sizeof(T), alignof(T)>::type element;
// Points to the next element or the next free
// element if this node has been removed.
int next;
// Points to the previous element.
int prev;
};
template <class T>
class NodeIterator
{
public:
...
private:
std::vector<Node<T>>* nodes;
int index;
};
template <class T>
class Nodes
{
public:
...
private:
// Stores all the nodes.
std::vector<Node> nodes;
// Points to the first free node or -1 if the free list
// is empty. Initially this starts out as -1.
int free_head;
};
... and hopefully with a better name than Nodes (I'm slightly tipsy and not so good at coming up with names at the moment). I'll leave the implementation up to you but that's the general idea. When you remove an element, just do a doubly-linked list removal using the indices and push it to the free head. The iterator doesn't invalidate since it stores an index to a vector. When you insert, check if the free head is -1. If not, overwrite the node at that position and pop. Otherwise push_back to the vector.
Illustration
Diagram (nodes are stored contiguously inside std::vector, we simply use index links to allow skipping over elements in a branchless way along with constant-time removals and insertions anywhere):
Let's say we want to remove a node. This is your standard doubly-linked list removal, except we use indices instead of pointers and you also push the node to the free list (which just involves manipulating integers):
Removal adjustment of links:
Pushing removed node to free list:
Now let's say you insert to this list. In that case, you pop off the free head and overwrite the node at that position.
After insertion:
Insertion to the middle in constant-time should likewise be easy to figure out. Basically you just insert to the free head or push_back to the vector if the free stack is empty. Then you do your standard double-linked list insertion. Logic for the free list (though I made this diagram for someone else and it involves an SLL, but you should get the idea):
Make sure you properly construct and destroy the elements using placement new and manual calls to the dtor on insertion/removal. If you really want to generalize it, you'll also need to think about exception-safety and we also need a read-only const iterator.
Pros and Cons
The benefit of such a structure is that it does allow very rapid insertions/removals from anywhere in the list (even for a gigantic list), insertion order is preserved for traversal, and it never invalidates the iterators to element which aren't directly removed (though it will invalidate pointers to them; use deque if you don't want pointers to be invalidated). Personally I'd find more use for it than std::list (which I practically never use).
For large enough lists (say, larger than your entire L3 cache as a case where you should definitely expect a huge edge), this should vastly outperform std::vector for removals and insertions to/from the middle and front. Removing elements from vector can be quite fast for small ones, but try removing a million elements from a vector starting from the front and working towards the back. There things will start to crawl while this one will finish in the blink of an eye. std::vector is ever-so-slightly overhyped IMO when people start using its erase method to remove elements from the middle of a vector spanning 10k elements or more, though I suppose this is still preferable over people naively using linked lists everywhere in a way where each node is individually allocated against a general-purpose allocator while causing cache misses galore.
The downside is that it only supports sequential access, requires the overhead of two integers per element, and as you can see in the above diagram, its spatial locality degrades if you constantly remove things sporadically.
Spatial Locality Degradation
The loss of spatial locality as you start removing and inserting a lot from/to the middle will lead to zig-zagging memory access patterns, potentially evicting data from a cache line only to go back and reload it during a single sequential
loop. This is generally inevitable with any data structure that allows removals from the middle in constant-time while likewise allowing that space to be reclaimed while preserving the order of insertion. However, you can restore spatial locality by offering some method or you can copy/swap the list. The copy constructor can copy the list in a way that iterates through the source list and inserts all the elements which gives you back a perfectly contiguous, cache-friendly vector with no holes (though doing this will invalidate iterators).
Alternative: Free List Allocator
An alternative that meets your requirements is implement a free list conforming to std::allocator and use it with std::list. I never liked reaching around data structures and messing around with custom allocators though and that one would double the memory use of the links on 64-bit by using pointers instead of 32-bit indices, so I'd prefer the above solution personally using std::vector as basically your analogical memory allocator and indices instead of pointers (which both reduce size and become a requirement if we use std::vector since pointers would be invalidated when vector reserves a new capacity).
Indexed Linked Lists
I call this kind of thing an "indexed linked list" as the linked list isn't really a container so much as a way of linking together things already stored in an array. And I find these indexed linked lists exponentially more useful since you don't have to get knee-deep in memory pools to avoid heap allocations/deallocations per node and can still maintain reasonable locality of reference (great LOR if you can afford to post-process things here and there to restore spatial locality).
You can also make this singly-linked if you add one more integer to the node iterator to store the previous node index (comes free of memory charge on 64-bit assuming 32-bit alignment requirements for int and 64-bit for pointers). However, you then lose the ability to add a reverse iterator and make all iterators bidirectional.
Benchmark
I whipped up a quick version of the above since you seem interested in 'em: release build, MSVC 2012, no checked iterators or anything like that:
--------------------------------------------
- test_vector_linked
--------------------------------------------
Inserting 200000 elements...
time passed for 'inserting': {0.000015 secs}
Erasing half the list...
time passed for 'erasing': {0.000021 secs}
time passed for 'iterating': {0.000002 secs}
time passed for 'copying': {0.000003 secs}
Results (up to 10 elements displayed):
[ 11 13 15 17 19 21 23 25 27 29 ]
finished test_vector_linked: {0.062000 secs}
--------------------------------------------
- test_vector
--------------------------------------------
Inserting 200000 elements...
time passed for 'inserting': {0.000012 secs}
Erasing half the vector...
time passed for 'erasing': {5.320000 secs}
time passed for 'iterating': {0.000000 secs}
time passed for 'copying': {0.000000 secs}
Results (up to 10 elements displayed):
[ 11 13 15 17 19 21 23 25 27 29 ]
finished test_vector: {5.320000 secs}
Was too lazy to use a high-precision timer but hopefully that gives an idea of why one shouldn't use vector's linear-time erase method in critical paths for non-trivial input sizes with vector above there taking ~86 times longer (and exponentially worse the larger the input size -- I tried with 2 million elements originally but gave up after waiting almost 10 minutes) and why I think vector is ever-so-slightly-overhyped for this kind of use. That said, we can turn removal from the middle into a very fast constant-time operation without shuffling the order of the elements, without invalidating indices and iterators storing them, and while still using vector... All we have to do is simply make it store a linked node with prev/next indices to allow skipping over removed elements.
For removal I used a randomly shuffled source vector of even-numbered indices to determine what elements to remove and in what order. That somewhat mimics a real world use case where you're removing from the middle of these containers through indices/iterators you formerly obtained, like removing the elements the user formerly selected with a marquee tool after he his the delete button (and again, you really shouldn't use scalar vector::erase for this with non-trivial sizes; it'd even be better to build a set of indices to remove and use remove_if -- still better than vector::erase called for one iterator at a time).
Note that iteration does become slightly slower with the linked nodes, and that doesn't have to do with iteration logic so much as the fact that each entry in the vector is larger with the links added (more memory to sequentially process equates to more cache misses and page faults). Nevertheless, if you're doing things like removing elements from very large inputs, the performance skew is so epic for large containers between linear-time and constant-time removal that this tends to be a worthwhile exchange.
I second #Useless' answer, particularly PS item 2 about revising requirements. If you relax the iterator invalidation constraint, then using std::vector<> is Stroustrup's standard suggestion for a small-number-of-items container (for reasons already mentioned in the comments). Related questions on SO.
Starting from C++11 there is also std::forward_list.
Also, if standard heap allocation for elements added to the container is not good enough, then I would say you need to look very carefully at your exact requirements and fine tune for them.
I just wanted to make a small comment about your choice. I'm a huge fan of vector because of it's read speeds, and you can direct access any element, and do sorting if need be. (vector of class/struct for example).
But anyways I digress, there's two nifty tips I wanted to disclose.
With vector inserts can be expensive, so a neat trick, don't insert if you can get away with not doing it. do a normal push_back (put at the end) then swap the element with one you want.
Same with deletes. They are expensive. So swap it with the last element, delete it.
Thanks for all the answers.
This is a simple - though not rigorous - benchmark.
// list.cc
#include <list>
using namespace std;
int main() {
for (size_t k = 0; k < 1e5; k++) {
list<size_t> ln;
for (size_t i = 0; i < 200; i++) {
ln.insert(ln.begin(), i);
if (i != 0 && i % 20 == 0) {
ln.erase(++++++++++ln.begin());
}
}
}
}
and
// vector.cc
#include <vector>
using namespace std;
int main() {
for (size_t k = 0; k < 1e5; k++) {
vector<size_t> vn;
for (size_t i = 0; i < 200; i++) {
vn.insert(vn.begin(), i);
if (i != 0 && i % 20 == 0) {
vn.erase(++++++++++vn.begin());
}
}
}
}
This test aims to test what std::list claims to excel at - O(1) inserting and erasing. And, because of the positions I ask to insert/delete, this race is heavily skewed against std::vector, because it has to shift all the following elements (hence O(n)), while std::list doesn't need to do that.
Now I compile them.
clang++ list.cc -o list
clang++ vector.cc -o vector
And test the runtime. The result is:
time ./list
./list 4.01s user 0.05s system 91% cpu 4.455 total
time ./vector
./vector 1.93s user 0.04s system 78% cpu 2.506 total
std::vector has won.
Compiled with optimization O3, std::vector still wins.
time ./list
./list 2.36s user 0.01s system 91% cpu 2.598 total
time ./vector
./vector 0.58s user 0.00s system 50% cpu 1.168 total
std::list has to call heap allocation for each element, while std::vector can allocate heap memory in batch (though it might be implementation-dependent), hence std::list's insert/delete has a higher constant factor, though it is O(1).
No wonder this document says
std::vector is well loved and respected.
EDIT: std::deque does even better in some cases, at least for this task.
// deque.cc
#include <deque>
using namespace std;
int main() {
for (size_t k = 0; k < 1e5; k++) {
deque<size_t> dn;
for (size_t i = 0; i < 200; i++) {
dn.insert(dn.begin(), i);
if (i != 0 && i % 20 == 0) {
dn.erase(++++++++++dn.begin());
}
}
}
}
Without optimization:
./deque 2.13s user 0.01s system 86% cpu 2.470 total
Optimized with O3:
./deque 0.27s user 0.00s system 50% cpu 0.551 total

Keeping constant number of elements in vector

I am trying to figure out the fastest way to keep constant number of elements in vector (or maybe there is some ready-made structure that do it automatically).
In my app I am adding multiple elements to the vector and I need to do it fast. Because of vector's self resizing at some point it is significantly decreasing overall application speed. What I was thinking about is to do something like this:
if(my_vector.size() < 300)
my_vector.push_back(new_element);
else
{
my_vector.pop_front();
my_vector.push_back(new_element);
}
but after first few tests I've realized that it might not be the best solution, because I am not sure if pop_front() and later push_back() doesn't still need to resize at some point.
Is there any other solution for this?
Use a std::queue. Its underlying container is a std::deque, but like a stack a queue's interface is specifically for FIFO operations (push_back, pop_front), which is exactly what you're doing in your situation. Here's why a deque is better for this situation:
The storage of a deque is automatically expanded and contracted as
needed. Expansion of a deque is cheaper than the expansion of a
std::vector because it does not involve copying of the existing
elements to a new memory location.
The complexity (efficiency) of common operations on deques is as
follows:
Random access - constant O(1)
Insertion or removal of elements at the end or beginning - constant O(1)
To implement a fixed-size container with push_back and pop_front and minimal memory shuffling, use a std::array of the appropriate size. To keep track of things you'll need a front index for pushing elements and a back index for popping things. To push, store the element at the location given by front_index, then increment front_index and take the remainder modulo the container size. To pop, read the element at the location given by back_index, and adjust that index the same way you did front_index. With that in place, the code in the question will do what you need.
You just need to reserve the capacity to a reasonable number. The vector will not automatically shrink. So it only will grow and, possibly, stop at some point.
You might be also interested in the resize policies. For example Facebook made a substantial research and created own implementation of the vector - folly::fbvector which has better performance than std::vector

c++ container very efficient at adding elements to the end

I have been running a c++ program for scientific purpose and I am now looking at optimizing it.
The bottleneck seems to be a function where I need to stack pairs of integers. Their number is impossible to know from the start, and I have been using a std::vector of a custom struct holding two ints. Is there a more efficient data container for repetitively adding elements at the end? Should I use it with two ints instead of a pair or custom structure?
edit:
After timing and profiling my program, I can say that, for my use, vector is slightly faster than deque (by only 3%). My layman conclusion is that the CPU makes a good use of the contiguity of the data. Optimization is more than ever a sorcery for me! And to those it may help: I actually improved significantly my running time by switching from the STL C++ 11 random number generator to the BOOST one.
The cost of the logarithmic number of reallocations you have to do is arguably irrelevant. However, you may consider using a std::deque that guarantees O(1) insertion at the front and at the end. You would lose contiguity, but some cache friendliness is kept. A deque is usually a decent trade-off, especially if you need to pop from the front.
Consider also estimating the number of elements the vector will store, and using reserve. Be careful however not to waste too much memory or you'll get the opposite effect.
As already mentioned by gd1, a std::deque (double-ended queue) allows fast insertion and deletion at both ends. Additionally, insertion and deletion at either end of a deque never invalidates pointers or references. In your case you could for example use std::deque with std::pair (defined in <utility>) to suit your needs:
// Small example:
deque<pair<int, int>> deq;
deq.push_back({1234, 4321});
cout << deq.at(0).first << " " << deq.at(0).second;
// using 'at' above, should normally be inside a try block as it may throw
// out_of_range exception

Would I see a performance gain using std::map instead of vector<pair<string, string> >?

I currently have some code where I am using a vector of pair<string,string>. This is used to store some data from XML parsing and as such, the process is quite slow in places. In terms of trying to speed up the entire process I was wondering if there would be any performance advantage in switching from vector<pair<string,string> > to std::map<string,string> ? I could code it up and run a profiler, but I thought I would see if I could get an answer that suggests some obvious performance gain first. I am not required to do any sorting, I simply add items to the vector, then at a later stage iterate over the contents and do some processing - I have no need for sorting or anything of that nature. I am guessing that perhaps I would not get any performance gain, but I have never actually used a std::map before so I don't know without asking or coding it all up.
No. If (as you say) you are simply iterating over the collection, you will see a small (probably not measurable) performance decrease by using a std::map.
Maps are for accessing a value by its key. If you never do this, map is a bad choice for a container.
If you are not modifying your vector<pair<string,string> > - just iterating it over and over - you will get perfomance degradation by using map. This is because typical map is organized with binary tree of objects, each of which can be allocated in different memory blocks (unless you write own allocator). Plus, each node of map manages pointers to neighbor objects, so it's time and memory overhead, too. But, search by key is O(log) operation. On other side, vector holds data in one block, so processor cache usually feels better with it. Searching in vector is actually O(N) operation which is not so good but acceptable. Search in sorted vector can be upgraded to O(log) using lower_bound etc functions.
It depends on operations you doing on this data. If you make many searches - probably its better to use hashing container like unordered_map since search by key in this containers is O(1) operation. For iterating, as mentioned, vector is faster.
Probably it is worth to replace string in your pair, but this highly depends on what you hold there and how access container.
The answer depends on what you are doing with these data structures and what the size of them is. If you have thousands of elements in your std::vector<std::pair<std::stringm std::string> > and you keep searching for the first element over and over, using a std::map<std::string, std::string> may improve the performance (you might want to consider using std::unordered_map<std::string, std::string> for this use case, instead). If your vectors are relatively small and you don't trying to insert elements into the middle too often, using vectors may very well be faster. If you just iterate over the elements, vectors are a lot faster than maps: iterations isn't really one of their strength. Maps are good at looking things up, assuming the number of elements isn't really small because otherwise a linear search over a vector is still faster.
The best way to determine where the time is spent is to profile the code: it is often not entirely clear up front where the time is spent. Frequently, the suspected hot-spots are actually non-problematic and other areas show unexpected performance problems. For example, you might be passing your objects my value rather than by reference at some obscure place.
If your usage pattern is such that you perform many insertions before performing any lookups, then you might benefit from implementing a "lazy" map where the elements are sorted on demand (i.e. when you acquire an iterator, perform a lookup, etc).
As C++ say std::vector sort items in a linear memory, so first it allocate a memory block with an initial capacity and then when you want to insert new item into vector it will check if it has more room or not and if not it will allocate a new buffer with more space, copy construct all items into new buffer and then delete source buffer and set it to new one.
When you just start inserting items into vector and you have lot of items you suffer from too many reallocation, copy construction and destructor call.
In order to solve this problem, if you now count of input items (not exact but its usual length) you can reserve some memory for the vector and avoid reallocation and every thing.
if you have no idea about the size you can use a collection like std::list witch never reallocate its internal items.

Quickest Queue Container (C++)

I'm using queue's and priority queues, through which I plan on pumping a lot of data quite quickly.
Therefore, I want my q's and pq's to be responsive to additions and subtractions.
What are the relative merits of using a vector, list, or deque as the underlying container?
Update:
At the time of writing, both Mike Seymour and Steve Townsend's answers below are worth reading. Thanks both!
The only way to be sure how the choice effects performance is to measure it, in a situation similar to your expected use cases. That said, here are some observations:
For std::queue:
std::deque is usually the best choice; it supports all the necessary operations in constant time, and allocates memory in chunks as it grows.
std::list also supports the necessary operations, but may be slower due to more memory allocations; in special circumstances, you might be able to get good results by allocating from a dedicated object pool, but that's not entirely straightforward.
std::vector can't be used as it doesn't have a pop_front() operation; such an operation would be slow, as it would have to move all the remaining elements.
A potentially faster, but less flexible, alternative is to implement a circular buffer over a fixed-size array (e.g. std::array, or a std::vector that you don't resize). You'll need to deal with the case of it filling up, either by reporting an error, or allocating a larger buffer and copying all the data.
For std::priority_queue:
std::vector is usually the best choice; it grows exponentially (reducing the number of memory allocations), and is a simple data structure that's very fast to access - an iterator can be implemented simply as a wrapper around a pointer.
std::deque may be slower as it typically grows linearly (requiring more memory allocation), and access is more complicated than with a vector.
std::list can't be used as it doesn't provide random access.
To summarise - the defaults are usually the best choice, but if speed really is important, then measure the alternatives.
I would use std::queue for your basic queue which is (by default at least) a wrapper on deque. Do something more special-purpose if that does not work for you.
std::priority_queue also exists (over vector by default) but the added semantics make it more likely that you could have to roll your own here, depending on perf observed for your particular access pattern.
vector has storage characteristics which make it very ill-suited to removal from front of the dataset. A lot of shuffling down to be done every time you pop_front. For a simple queue, avoid this.
list is likely to be too expensive for any high-hit queue, because by contract it has to offer function you don't need. It could be a candidate for use as a priority queue but my instinct is always to trust the STL.
vector would implement a stack as your fast insertion is at the end and fast removal is also at the end. If you want a FIFO queue, vector would be the wrong implementation to use.
deque or list both provide constant time insertion at either end. list is good for LRU caches where you want to move elements out of the middle fast and where you want your iterators to remain valid no matter how much you move them about. deque is generally used when insertions and deletions are at the end.
The main thing I need to ask about your collection is whether they are accessed by multiple threads. I sort-of assume they are, in which case one of your primary aims is to reduce locking. This is best done if you at least have a multi_push and multi_get feature so that you can put more than one element on at a time without any locking.
There are also lock-free containers or semi-lock-free containers.
You will probably find that your locking strategy is more important than any performance in the collection itself as long as your operations are all constant-time.