caching multiple key hash - c++

I want to do some caching in my project.
Let my API is int foo(int a, float b, float c, int d, char e)
Now in my project, there is lot of calls to above time consuming API with repeating values of a, b, c ,d and e. Now I want to store return value of this function with these arguments as keys.
suppose my call sequence is
foo(23, 3.45, 4.5, 90, 'd') // returns 1000, so I need to store it in cache as (23,3.45, 4.5, 90, 'd')->1000
foo(30, 1.2, 3.5, 100, 'e') // returns 2000, so I need to store it in cache as (30, 1.2, 3.5, 100, 'e')->2000
foo(23, 3.45, 4.5, 90, 'd') // No need to call this API, I just check in my cache value associated with
//(23, 3.45, 4.5, 90, 'd'), which is already stored as 1000
What should be best strategy to implement above in C++? which data structure would be best to make cache table?

One key note: caching is difficult.
Often times people think that caching will solve all their issues, but they forget to take into account the issues that it brings to the table. An unmanaged cache is nothing else than a giant memory leak. Two strategies of note:
Size limit: whenever the cache is full, adding a new entry cause another entry to be evicted (you therefore need a scheme to decide when to evict an entry)
Time limit: entries are flushed out after a certain time elapsed
Usually, when we hear about caches we think LRU (Least Recently Used) Cache. Those cache are limited by size, and the least recently used entry is evicted when the cache is full. Note: might cause contention on multi-threading because read-only accesses in fact imply modifying a value.
Such a cache is implemented in terms of two elements:
A (key -> value) mapping, either using a tree or a hash-map
A priority list, which is interleaved within the nodes for efficiency
If you go this road, I would suggest using the Boost.MultiIndex library. There is an exemple of a MRU implementation which is very similar to your needs.

If you can use boost, look at boost::unordered_map, otherwise you can use a std::map. You will have to provide functor to generate the key.

It doesn't always work and is somewhat compiler dependent, but you can look into using function attributes. Of interest to you might be the const or pure attributes. hot might also be of interest.

Nice question. You have several options. First of all, put all the values into an struct:
struct values
{
int a;
float b;
...
};
If one of the values of the sequence is most representative, you can just use a std::map to map that representative value to a "bucket". Let's say that the most representative is the float b
:
std::map< float, std::list < std::pair< values, int> > >
is represented by the std::list, and stores pairs of value structures and result value (int in this case).
Declare a map from the values to the result, int. For that, you should allow values struct to be compared against others in the map, so you have to write the operator<()
:
int operator<(values const& left, values const& right)
{
if (left.a < left.b) ... // compare two values objects
}
and then declare the map as usual:
std::map<values, int>
There are other questions, such as copy constructors, etc. that you have to deal with, but this is the idea.
Final note, you can also substitute std::map for unordered_map.

Put them all in a structure
struct mykey{ int a; float b; float c; int d; char e; };
Then write them in and hash the structure, and use it as a key
int foo(int a, float b, float c, int d, char e)
{
mykey tk = { a, b, c, d, e };
guid key = md5( &tk, sizeof( tk ) );

I'd use nested maps, so you use the first parameter to lookup a map from a map, until the final map where you lookup using the last parameter and the result is the previously cached value of foo.
When you arrive to the last map and find that foo hasn't been called for this setup of parameters, you only need to store the result of foo for the last parameter.

I suggest using the Hash table. You will only need to calculate hash function of the data. If the hash is strong enough, it is possible to store it and output value, without storing arguments. Also, this metod should work faster than using std::map.
In C++ this can be implemented with unordered_map or std::hash_map.
Using very simple hash function will suffice, for example The String hash function.
By the way, the metod of storing output values for arguments is called Memoization

Related

Efficient way to hash a 2D point

OK, so the task is this, I would be given (x, y) co-ordinates of points with both (x, y) ranging from -10^6 to 10^6 inclusive. I have to check whether a particular point e.g. (x, y) tuple was given to me or not. In simple words how do i answer the query whether a particular point(2D) is set or not. So far the best i could think of is maintaining a std::map<std::pair<int,int>, bool> and whenever a point is given I mark it 1. Although this must be running in logarithmic time and is fairly optimized way to answer the query I am wondering if there's a better way to do this.
Also I would be glad if anyone could tell what actually complexity would be if I am using the above data structure as a hash.I mean is it that the complexity of std::map is going to be O(log N) in the size of elements present irrespective of the structure of key?
In order to use a hash map you need to be using std::unordered_map instead of std::map. The constraint of using this is that your value type needs to have a hash function defined for it as described in this answer. Either that or just use boost::hash for this:
std::unordered_map<std::pair<int, int>, boost::hash<std::pair<int, int> > map_of_pairs;
Another method which springs to mind is to store the 32 bit int values in a 64 bit integer like so:
uint64_t i64;
uint32_t a32, b32;
i64 = ((uint64_t)a32 << 32) | b32;
As described in this answer. The x and y components can be stored in the high and low bytes of the integer and then you can use a std::unordered_map<uint64_t, bool>. Although I'd be interested to know if this is any more efficient than the previous method or if it even produces different code.
Instead of mapping each point to a bool, why not store all the points given to you in a set? Then, you can simply search the set to see if it contains the point you are looking for. It is essentially the same as what you are doing without having to do an additional lookup of the associated bool. For example:
set<pair<int, int>> points;
Then, you can check whether the set contains a certain point or not like this :
pair<int, int> examplePoint = make_pair(0, 0);
set<pair<int, int>>::iterator it = points.find(examplePoint);
if (it == points.end()) {
// examplePoint not found
} else {
// examplePoint found
}
As mentioned, std::set is normally implemented as a balanced binary search tree, so each lookup would take O(logn) time.
If you wanted to use a hash table instead, you could do the same thing using std::unordered_set instead of std::set. Assuming you use a good hash function, this would speed your lookups up to O(1) time. However, in order to do this, you will have to define the hash function for pair<int, int>. Here is an example taken from this answer:
namespace std {
template <> struct hash<std::pair<int, int>> {
inline size_t operator()(const std::pair<int, int> &v) const {
std::hash<int> int_hasher;
return int_hasher(v.first) ^ int_hasher(v.second);
}
};
}
Edit: Nevermind, I see you already got it working!

Tolerant key lookup in std::map

Requirements:
container which sorts itself based on numerically comparing the keys (e.g. std::map)
check existence of key based on float tolerance (e.g. map.find() and use custom comparator )
and the tricky one: the float tolerance used by the comparator may be changed by the user at runtime!
The first 2 can be accomplished using a map with a custom comparator:
struct floatCompare : public std::binary_function<float,float,bool>
{
bool operator()( const float &left, const float &right ) const
{
return (fabs(left - right) > 1e-3) && (left < right);
}
};
typedef std::map< float, float, floatCompare > floatMap;
Using this implementation, floatMap.find( 15.0001 ) will find 15.0 in the map.
However, let's say the user doesn't want a float tolerance of 1e-3.
What is the easiest way to make this comparator function use a variable tolerance at runtime? I don't mind re-creating and re-sorting the map based on the new comparator each time epsilon is updated.
Other posts on modification after initialization here and using floats as keys here didn't provide a complete solution.
You can't change the ordering of the map after it's created (and you should just use plain old operator< even for the floating point type here), and you can't even use a "tolerant" comparison operator as that may vioate the required strict-weak-ordering for map to maintain its state.
However you can do the tolerant search with lower_bound and upper_bound. The gist is that you would create a wrapper function much like equal_range that does a lower_bound for "value - tolerance" and then an upper_bound for "value + tolerance" and see if it creates a non-empty range of values that match the criteria.
You cannot change the definition of how elements are ordered in a map once it's been instantiated. If you were to find some technical hack to do so (such as implementing a custom comparator that takes a tolerance that can change at runtime), it would evoke Undefined Behavior.
Your main alternative to changing the ordering is to create another map with a different ordering scheme. This other map could be an indexing map, where the keys are ordered in a different way, and the values arent the elements themselves, but an index in to the main map.
Alternatively maybe what you're really trying to do isn't change the ordering, but maintain the ordering and change the search parameters.
That you can do, and there are a few ways to do it.
One is to simply use map::lower_bound -- once with the lower bound of your tolerance, and once with the upper bound of your tolerance, just past the end of tolerance. For example, if you want to find 15.0 with a tolerance of 1e-5. You could lower_bound with 14.99995 and then again with 15.00005 (my math might be off here) to find the elements in that range.
Another is to use std::find_if with a custom functor, lambda, or std::function. You could declare the functor in such a way as to take the tolerance and the value at construction, and perform the check in operator().
Since this is a homework question, I'll leave the fiddly details of actually implementing all this up to you. :)
Rather than using a comparator with tolerance, which is going to fail in subtle ways, just use a consistent key that is derived from the floating point value. Make your floating point values consistent using rounding.
inline double key(double d)
{
return floor(d * 1000.0 + 0.5);
}
You can't achieve that with a simple custom comparator, even if it was possible to change it after the definition, or when resorting using a new comparator. The fact is: a "tolerant comparator" is not really a comparator. For three values, it's possible that a < c (difference is large enough) but neither a < b nor b < c (both difference too small). Example: a = 5.0, b = 5.5, c = 6.0, tolerance = 0.6
What you should do instead is to use default sorting using operator< for floats, i.e. simply don't provide any custom comparator. Then, for the lookup don't use find but rather lower_bound and upper_bound with modified values according to the tolerance. These two function calls will give you two iterators which define the sequence which will be accepted using this tolerance. If this sequence is empty, the key was not found, obviously.
You then might want to get the key which is closest to the value to be searched for. If this is true, you should then find the min_element of this subsequence, using a comparator which will consider the difference between the key and the value to be searched.
template<typename Map, typename K>
auto tolerant_find(const Map & map, const K & lookup, const K & tolerance) -> decltype(map.begin()) {
// First, find sub-sequence of keys "near" the lookup value
auto first = map.lower_bound(lookup - tolerance);
auto last = map.upper_bound(lookup + tolerance);
// If they are equal, the sequence is empty, and thus no entry was found.
// Return the end iterator to be consistent with std::find.
if (first == last) {
return map.end();
}
// Then, find the one with the minimum distance to the actual lookup value
typedef typename Map::mapped_type T;
return std::min_element(first, last, [lookup](std::pair<K,T> a, std::pair<K,T> b) {
return std::abs(a.first - lookup) < std::abs(b.first - lookup);
});
}
Demo: http://ideone.com/qT3JIa
It may be better to leave the std::map class alone (well, partly at least), and just write your own class which implements the three methods you mentioned.
template<typename T>
class myMap{
private:
float tolerance;
std::map<float,T> storage;
public:
void setTolerance(float t){tolerance=t;};
std::map<float,T>::iterator find(float val); // ex. same as you provided, just change 1e-3 for tolerance
/* other methods go here */
};
That being said, I don't think you need to recreate the container and sort it depending on the tolerance.
check existence of key based on float tolerance
merely means you have to check if an element exists. The position of the elements inside the map shouldn't change. You could start the search from val-tolerance, and when you find an element (the function find returns an iterator), get the next elements untill you reach the end of the map or untill their values exceed val+tolerance.
That basically means that the behavior of the insert/add/[]/whatever functions isn't based on the tolerance, so there's no real problem of storing the values.
If you're afraid the elements will be too close to eachother, you may want to start the searching from val, and then gradually increase the toleration untill it reaches the user desired one.

Avoid recomputation when data is not changed

Imagine you have a pretty big array of double and a simple function avg(double*,size_t) that computes the average value (just a simple example: both the array and the function could be whatever data structure and algorithm). I would like that if the function is called a second time and the array is not changed in the meanwhile, the return value comes directly from the previous one, without going through the unchanged data.
To hold the previous value looks simple, I just need a static variable inside the function, right? But what about detecting the changes in the array? Do I need to write an interface to access the array which sets a flag to be read by the function? Can something smarter and more portable be done?
As Kerrek SB so astutely put it, this is known as "memoization." I'll cover my personal favorite method at the end (both with double* array and the much easier DoubleArray), so you can skip to there if you just want to see code. However, there are many ways to solve this problem, and I wanted to cover them all, including those suggested by others. Skip to the horizontal rule if you just want to see code.
The first part is some theory and alternate approaches. There are fundamentally four parts to the problem:
Prove the function is idempotent (calling a function once is the same as calling it any number of times)
Cache results keyed to the inputs
Search cached results given a new set of inputs
Invalidating cached results which are no longer accurate/current
The first step is easy for you: average is idempotent. It has no side effects.
Caching the results is a fun step. You obviously are going to create some "key" for the inputs that you can compare against the cached "keys." In Kerrek SB's memoization example, the key is a tuple of all of the arguments, compared against other keys with ==. In your system, the equivalent solution would be to have the key be the contents of the entire array. This means each key comparison is O(n), which is expensive. If the function was more expensive to calculate than the average function is, this price may be acceptable. However in the case of averaging, this key is terribly expensive.
This leads one on the open-ended search for good keys. Dieter Lücking's answer was to key the array pointer. This is O(1), and wicked fast to boot. However, it also makes the assumption that once you've calculated the average for an array, that array's values never change, and that memory address is never re-used for another array. Solutions for this come later, in the invalidation portion of the task.
Another popular key is HotLick's (1) in the comments. You use a unique identifier for the array (pointer or, better yet, a unique integer idx that will never be used again) as your key. Each array then has a "dirty bit for avg" that they are expected to set to true whenever a value is changed. Caches first look for the dirty bit. If it is true, they ignore the cached value, calculate the new value, cache the new value, then clear the dirty bit indicating that the cached value is now valid. (this is really invalidation, but it fit well in this part of the answer)
This technique assumes that there are more calls to avg than updates to the data. If the array is constantly dirty, then avg still has to keep recalculating, but we still pay the price of setting the dirty bit on every write (slowing it down).
This technique also assumes that there is only one function, avg, which needs cached results. If you have many functions, it starts to get expensive to keep all of the dirty bits up to date. The solution is an "epoch" counter. Instead of a dirty bit, you have an integer, which starts at 0. Every write increments it. When you cache a result, you cache not only the identity of the array, but its epoch as well. When you check to see if you have a cached value, you also check to see if the epoch changed. If it did change, you can't prove your old results are current, and have to throw them out.
Storing the results is an interesting task. It is very easy to write a storing algorithm which uses up gobs of memory by remembering hundreds of thousands of old results to avg. Generally speaking, there needs to be a way to let the caching code know that an array has been destroyed, or a way to slowly remove old unused cache results. In the former case, the deallocator of the double arrays needs to let the cache code know that that array is being deallocated. In the latter case, it is common to limit a cache to 10 or 100 entries, and have evict old cache results.
The last piece is invalidation of caches. I spoke earlier regarding the dirty bit. The general pattern for this is that a value inside a cache must be marked invalid if the key it was stored in didn't change, but the values in the array did change. This can obviously never happen if the key is a copy of the array, but it can occur when the key is an identifing integer or a pointer.
Generally speaking, invalidation is a way to add a requirement to your caller: if you want to use avg with caching, here's the extra work you are required to do to help the caching code.
Recently I implemented a system with such caching invalidation scheme. It was very simple, and stemmed from one philosophy: the code which is calling avg is in a better position to determine if the array has changed than avg is itself.
There were two versions of the equvalent of avg: double avg(double* array, int n) and double avg(double* array, int n, CacheValidityObject& validity).
Calling the 2 argument version of avg never cached, because it had no guarantees that array had not changed.
Calling the 3 argument version of avg activated caching. The caller guarentees that, if it passes the same CacheValidityObject to avg without marking it dirty, then the arrays must be the same.
Putting the onus on the caller makes average trivial. CacheValidityObject is a very simple class to hold on to the results
class CacheValidityObject
{
public:
CacheValidityObject(); // creates a new dirty CacheValidityObject
void invalidate(); // marks this object as dirty
// this function is used only by the `avg` algorithm. "friend" may
// be used here, but this example makes it public
boost::shared_ptr<void>& getData();
private:
boost::shared_ptr<void> mData;
};
inline void CacheValidityObject::invalidate()
{
mData.reset(); // blow away any cached data
}
double avg(double* array, int n); // defined as usual
double avg(double* array, int n, CacheValidityObject& validity)
{
// this function assumes validity.mData is null or a shared_ptr to a double
boost::shared_ptr<void>& data = validity.getData();
if (data) {
// The cached result, stored on the validity object, is still valid
return *static_pointer_cast<double>(data);
} else {
// There was no cached result, or it was invalidated
double result = avg(array, n);
data = make_shared<double>(result); // cache the result
return result;
}
}
// usage
{
double data[100];
fillWithRandom(data, 100);
CacheValidityObject dataCacheValidity;
double a = avg(data, 100, dataCacheValidity); // caches the aveerage
double b = avg(data, 100, dataCacheValidity); // cache hit... uses cached result
data[0] = 0;
dataCacheValidity.invalidate();
double c = avg(data, 100, dataCacheValidity); // dirty.. caches new result
double d = avg(data, 100, dataCacheValidity); // cache hit.. uses cached result
// CacheValidityObject::~CacheValidityObject() will destroy the shared_ptr,
// freeing the memory used to cache the result
}
Advantages
Nearly the fastest caching possible (within a few opcodes)
Trivial to implement
Doesn't leak memory, saving cached values only when the caller thinks it may want to use them again
Disadvantages
Requires the caller to handle caching, instead of doing it implicitly for them.
If you wrap the double* array in a class, you can minimize the disadvantage. Assign each algorithm an index (can be done at run time) Have the DoubleArray class maintain a map of cached values. Each modification to DoubleArray invalidates the cached results. This is the most easy to use version, but doesn't work with a naked array... you need a class to help you out
class DoubleArray
{
public:
// all of the getters and setters and constructors.
// Special note: all setters MUST call invalidate()
CacheValidityObject getCache(int inIdx)
{
return mCaches[inIdx];
}
void setCache(int inIdx, const CacheValidityObject& inObj)
{
mCaches[inIdx] = inObj;
}
private:
void invalidate()
{
mCaches.clear();
}
std::map<int, CacheValidityObject> mCaches;
double* mArray;
int mSize;
};
inline int getNextAlgorithmIdx()
{
static int nextIdx = 1;
return nextIdx++;
}
static const int avgAlgorithmIdx = getNextAlgorithmIdx();
double avg(DoubleArray& inArray)
{
CacheValidityObject valid = inArray.getCache(avgAlgorithmIdx);
// use the 3 argument avg in the previous example
double result = avg(inArray.getArray(), inArray.getSize(), valid);
inArray.setCache(avgAlgorithmIdx, valid);
return result;
}
// usage
DoubleArray array(100);
fillRandom(array);
double a = avg(array); // calculates, and caches
double b = avg(array); // cache hit
array.set(0, 5); // invalidates caches
double c = avg(array); // calculates, and caches
double d = avg(array); // cache hit
#include <limits>
#include <map>
// Note: You have to manage cached results - release it with avg(p, 0)!
double avg(double* p, std::size_t n) {
typedef std::map<double*, double> map;
static map results;
map::iterator pos = results.find(p);
if(n) {
// Calculate or get a cached value
if(pos == results.end()) {
pos = results.insert(map::value_type(p, 0.5)).first; // calculate it
}
return pos->second;
}
// Erase a cached value
results.erase(pos);
return std::numeric_limits<double>::quiet_NaN();
}

Speed up access to many std::maps with same key

Suppose you have a std::vector<std::map<std::string, T> >. You know that all the maps have the same keys. They might have been initialized with
typedef std::map<std::string, int> MapType;
std::vector<MapType> v;
const int n = 1000000;
v.reserve(n);
for (int i=0;i<n;i++)
{
std::map<std::string, int> m;
m["abc"] = rand();
m["efg"] = rand();
m["hij"] = rand();
v.push_back(m);
}
Given a key (e.g. "efg"), I would like to extract all values of the maps for the given key (which definitely exists in every map).
Is it possible to speed up the following code?
std::vector<int> efgValues;
efgValues.reserve(v.size());
BOOST_FOREACH(MapType const& m, v)
{
efgValues.push_back(m.find("efg")->second);
}
Note that the values are not necessarily int. As profiling confirms that most time is spent in the find function, I was thinking about whether there is a (GCC and MSVC compliant C++03) way to avoid locating the element in the map based on the key for every single map again, because the structure of all the maps is equal.
If no, would it be possible with boost::unordered_map (which is 15% slower on my machine with the code above)? Would it be possible to cache the hash value of the string?
P.S.: I know that having a std::map<std::string, std::vector<T> > would solve my problem. However, I cannot change the data structure (which is actually more complex than what I showed here).
You can cache and playback the sequence of comparison results using a stateful comparator. But that's just nasty; the solution is to adjust the data structure. There's no "cannot." Actually, adding a stateful comparator is changing the data structure. That requirement rules out almost anything.
Another possibility is to create a linked list across the objects of type T so you can get from each map to the next without another lookup. If you might be starting at any of the maps (please, just refactor the structure) then a circular or doubly-linked list will do the trick.
As profiling confirms that most time is spent in the find function
Keeping the tree data structures and optimizing the comparison can only speed up the comparison. Unless the time is spent in operator< (std::string const&, std::string const&), you need to change the way it's linked together.

Difference between two vector<MyType*> A and B

I've got two vector<MyType*> objects called A and B. The MyType class has a field ID and I want to get the MyType* which are in A but not in B. I'm working on a image analysis application and I was hoping to find a fast/optimized solution.
The unordered approach will typically have quadratic complexity unless the data is sorted beforehand (by your ID field), in which case it would be linear and would not require repeated searches through B.
struct CompareId
{
bool operator()(const MyType* a, const MyType* b) const
{
return a>ID < b->ID;
}
};
...
sort(A.begin(), A.end(), CompareId() );
sort(B.begin(), B.end(), CompareId() );
vector<MyType*> C;
set_difference(A.begin(), A.end(), B.begin(), B.end(), back_inserter(C) );
Another solution is to use an ordered container like std::set with CompareId used for the StrictWeakOrdering template argument. I think this would be better if you need to apply a lot of set operations. That has its own overhead (being a tree) but if you really find that to be an efficiency problem, you could implement a fast memory allocator to insert and remove elements super fast (note: only do this if you profile and determine this to be a bottleneck).
Warning: getting into somewhat complicated territory.
There is another solution you can consider which could be very fast if applicable and you never have to worry about sorting data. Basically, make any group of MyType objects which share the same ID store a shared counter (ex: pointer to unsigned int).
This will require creating a map of IDs to counters and require fetching the counter from the map each time a MyType object is created based on its ID. Since you have MyType objects with duplicate IDs, you shouldn't have to insert to the map as often as you create MyType objects (most can probably just fetch an existing counter).
In addition to this, have a global 'traversal' counter which gets incremented whenever it's fetched.
static unsigned int counter = 0;
unsigned int traversal_counter()
{
// make this atomic for multithreaded applications and
// needs to be modified to set all existing ID-associated
// counters to 0 on overflow (see below)
return ++counter;
}
Now let's go back to where you have A and B vectors storing MyType*. To fetch the elements in A that are not in B, we first call traversal_counter(). Assuming it's the first time we call it, that will give us a traversal value of 1.
Now iterate through every MyType* object in B and set the shared counter for each object from 0 to the traversal value, 1.
Now iterate through every MyType* object in A. The ones that have a counter value which doesn't match the current traversal value(1) are the elements in A that are not contained in B.
What happens when you overflow the traversal counter? In this case, we iterate through all the counters stored in the ID map and set them back to zero along with the traversal counter itself. This will only need to occur once in about 4 billion traversals if it's a 32-bit unsigned int.
This is about the fastest solution you can apply to your given problem. It can do any set operation in linear complexity on unsorted data (and always, not just in best-case scenarios like a hash table), but it does introduce some complexity so only consider it if you really need it.
Sort both vectors (std::sort) according to ID and then use std::set_difference. You will need to define a custom comparator to pass to both of these algorithms, for example
struct comp
{
bool operator()(MyType * lhs, MyType * rhs) const
{
return lhs->id < rhs->id;
}
};
First look at the problem. You want "everything in A not in B". That means you're going to have to visit "everything in A". You'll also have to visit everything in B to have knowledge of what is and is not in B. So that suggests there should be an O(n) + O(m) solution, or taking liberty to elide the difference between n and m, O(2n).
Let's consider the std::set_difference approach. Each sort is O(n log n), and set_difference is O(n). So the sort-sort-set_difference approach is O(n + 2n log n). Let's call that O(4n).
Another approach would be to first place the elements of B in a set (or map). Iteration across B to create the set is O(n) plus insertion O(log n) of each element, followed by iteration across A O(n), with a lookup for each element of A (log n), gives a total: O(2n log n). Let's call that O(3n), which is slightly better.
Finally, using an unordered_set (or unordered_map), and assuming we get average case of O(1) insertion and O(1) lookup, we have an approach that is O(2n). A-ha!
The real win here is that unordered_set (or map) is probably the most natural choice to represent your data in the first place, i.e., the proper design yields the optimized implementation. That doesn't always happen, but it's nice when it does!
If B preexists to A, then while populating A, you can bookkeep in a C vector.