Cache behaviour of std::vector - c++

My question is simple enough, given the following data structure, std::vector<std::pair<int, std::unique_ptr<foo>>>, if I have the following:
auto it = std::find_if(begin(v), end(v), [&](std::pair<...> const& p){ return p.first == some_value; });
Can I expect that whatever is pointed to by the pointer is not fetched (I don't want it fetched, will later pre-fetch as needed) into the cache purely for the find operation? Or is this impossible to determine (if so I will close the question..)

When "find" searches in a vector, it will look at the value of the entry in the vector, and match that with what you are searching for. So, it will use whatever "equal" function that is provided by the find, or "operator==" if there is no function provided to find.
Since in this case, you are just comparing the int value in the pair with your expected value, the unique_ptr<foo> will not be dereferenced (and thus data pointed to by unique_ptr<foo> will not enter the cache).

strictly speaking you don't... nowadays I see no reason why this loop should prefetch smth from stored pointer. from machine's point of view you've iterated over a contiguous memory block full of ints and pointer, where only ints are accessed... no reason to prefetch pointers content...
but, maybe later in you code (quite near find_if) there is another loop which would dereference that pointers, so damn smart compiler could decide to insert a fetch instructions in a first loop (this wouldn't affect find_if anyway, so it could!)... we don't know -- it is compiler + optimization-options + architecture depended... we even don't know that next Intel's BlahBlahBridge won't do it w/o any compiler instructions...

Related

Is there a fast way to modify all the elements of a vector by given value?

Basically the title. Is there a relatively fast way to modify all, or a bunch of elements of a vector by a given value, for eg. 1? If not, is there some other datatype that would perform this kind of an operation better?
I have implemented a for loop that basically adds 1 to every element of a vector. Is there some cleaner/shorter way to go about this?
vector<int> vct;
for (int i = 0;i<10;i++){
vct.push_back(i);
}
for (int i = 0;i<vct.size();i++){
vct[i]++;
}
One possibility is to use a data structure which combines a scalar offset, base, and a vector of numbers, data. The value of element i is then computed as base+data[i], which is still O(1). (On modern CPUs, you probably won't notice the time taken by the addition.)
To increment an individual element, you simply increment the particular value in data. To increment all elements, you can increment base.
If you need to set element i to a specific value v, you can use data[i] = v - base. But normally this data structure is used for problems where the data is always incremented (or decremented), either individually or collectively, and it is desired to make collective increments O(1).
You would need to check if the compiler does not already optimize it. Run the compiler with optimization on (-O3 on clang and g++ for example) and check the assembly code code.
Clang for example does some optimizations on vectors already, and without meassuring, you will not know if your code will be slower or faster.
https://llvm.org/docs/Vectorizers.html
So make sure to measure if you want to know if your optimizations have effect.
My syntax favorite is like a commenter above
for (auto &x: vec) {
x .. something
}

Skip an entry of a random iterator

Suppose we have a function foo that does something
to all the elements between *firsta and *lastb:
foo(RandomAccessIterator1 firsta,RandomAccessIterator1 lasta){
for (RandomAccessIterator1 it=firsta;it!=lasta+1;it++){
//here stuff happens...
}
}
question a): is there a way to skip an index firsta<i<lastb by only
modifying the inputs to foo --e.g. the random iterators,
in other words without changing foo itself, just its input?
--Unfortunately the index I want to skip are not in the edges
(they are often deep between firsta and lasta) and foo
is a complicated divide&conquer algorithm that's not amenable
to being called on subsets of the original array the iterators
are pointing to.
question b): if doing a) is possible, what's the cost of doing that?
constant or does it depend on (lasta-firsta)?
The best way to do this would be to use an iterator that knows how to skip that element. A more generalized idea though, is an iterator that simply iterates over two separate ranges under the hood. I don't know of anything in boost that does this, so, here's one I just whipped up: http://coliru.stacked-crooked.com/a/588afa2a353942fc
Unfortunately, the code to detect which element to skip adds a teeny tiny amount of overhead to each and every iterator increment, so the overhead is technically proportional to lasta-firsta. Realistically, using this wrapper around a vector::iterator or a char* should bring it roughly to the same performance level as std::deque::iterator, so it's not like this should be a major slowdown.
The answer might be a bit picky, but you could call foo(firsta,i-1) and foo(i+1,lastb) or something similar to have the desired effect.

C++11 unordered_map time complexity

I'm trying to figure out the best way to do a cache for resources. I am mainly looking for native C/C++/C++11 solutions (i.e. I don't have boost and the likes as an option).
What I am doing when retrieving from the cache is something like this:
Object *ResourceManager::object_named(const char *name) {
if (_object_cache.find(name) == _object_cache.end()) {
_object_cache[name] = new Object();
}
return _object_cache[name];
}
Where _object_cache is defined something like: std::unordered_map <std::string, Object *> _object_cache;
What I am wondering is about the time complexity of doing this, does find trigger a linear-time search or is it done as some kind of a look-up operation?
I mean if I do _object_cache["something"]; on the given example it will either return the object or if it doesn't exist it will call the default constructor inserting an object which is not what I want. I find this a bit counter-intuitive, I would have expected it to report in some way (returning nullptr for example) that a value for the key couldn't be retrieved, not second-guess what I wanted.
But again, if I do a find on the key, does it trigger a big search which in fact will run in linear time (since the key will not be found it will look at every key)?
Is this a good way to do it, or does anyone have some suggestions, perhaps it's possible to use a look up or something to know if the key is available or not, I may access often and if it is the case that some time is spent searching I would like to eliminate it or at least do it as fast as possible.
Thankful for any input on this.
The default constructor (triggered by _object_cache["something"]) is what you want; the default constructor for a pointer type (e.g. Object *) gives nullptr (8.5p6b1, footnote 103).
So:
auto &ptr = _object_cache[name];
if (!ptr) ptr = new Object;
return ptr;
You use a reference into the unordered map (auto &ptr) as your local variable so that you assign into the map and set your return value in the same operation. In C++03 or if you want to be explicit, write Object *&ptr (a reference to a pointer).
Note that you should probably be using unique_ptr rather than a raw pointer to ensure that your cache manages ownership.
By the way, find has the same performance as operator[]; average constant, worst-case linear (only if every key in the unordered map has the same hash).
Here's how I'd write this:
auto it = _object_cache.find(name);
return it != _object_cache.end()
? it->second
: _object_cache.emplace(name, new Object).first->second;
The complexity of find on an std::unordered_map is O(1) (constant), specially with std::string keys which have good hashing leading to very low rate of collisions. Even though the name of the method is find, it doesn't do a linear scan as you pointed out.
If you want to do some kind of caching, this container is definitely a good start.
Note that a cache typically is not just a fast O(1) access but also a bounded data structure. The std::unordered_map will dynamically increase its size when more and more elements are added. When resources are limited (e.g. reading huge files from disk into memory), you want a bounded and fast data structure to improve the responsiveness of your system.
In contrast, a cache will use an eviction strategy whenever size() reaches capacity(), by replacing the least valuable element.
You can implement a cache on top of a std::unordered_map. The eviction strategy can then be implemented by redefining the insert() member. If you want to go for an N-way (for small and fixed N) associative cache (i.e. one item can replace at most N other items), you could use the bucket() interface to replace one of the bucket's entries.
For a fully associative cache (i.e. any item can replace any other item), you could use a Least Recently Used eviction strategy by adding a std::list as a secondary data structure:
using key_tracker_type = std::list<K>;
using key_to_value_type = std::unordered_map<
K,std::pair<V,typename key_tracker_type::iterator>
>;
By wrapping these two structures inside your cache class, you can define the insert() to trigger a replace when your capacity is full. When that happens, you pop_front() the Least Recently Used item and push_back() the current item into the list.
On Tim Day's blog there is an extensive example with full source code that implements the above cache data structure. It's implementation can also be done efficiently using Boost.Bimap or Boost.MultiIndex.
The insert/emplace interfaces to map/unordered_map are enough to do what you want: find the position, and insert if necessary. Since the mapped values here are pointers, ekatmur's response is ideal. If your values are fully-fledged objects in the map rather than pointers, you could use something like this:
Object& ResourceManager::object_named(const char *name, const Object& initialValue) {
return _object_cache.emplace(name, initialValue).first->second;
}
The values name and initialValue make up arguments to the key-value pair that needs to be inserted, if there is no key with the same value as name. The emplace returns a pair, with second indicating whether anything was inserted (the key in name is a new one) - we don't care about that here; and first being the iterator pointing to the (perhaps newly created) key-value pair entry with key equivalent to the value of name. So if the key was already there, dereferencing first gives the original Ojbect for the key, which has not been overwritten with initialValue; otherwise, the key was newly inserted using the value of name and the entry's value portion copied from initialValue, and first points to that.
ekatmur's response is equivalent to this:
Object& ResourceManager::object_named(const char *name) {
bool res;
auto iter = _object_cache.end();
std::tie(iter, res) = _object_cache.emplace(name, nullptr);
if (res) {
iter->second = new Object(); // we inserted a null pointer - now replace it
}
return iter->second;
}
but profits from the fact that the default-constructed pointer value created by operator[] is null to decide whether a new Object needs to be allocated. It's more succinct and easier to read.

An integer hashing problem

I have a (C++) std::map<int, MyObject*> that contains a couple of millions of objects of type MyObject*. The maximum number of objects that I can have, is around 100 millions. The key is the object's id. During a certain process, these objects must be somehow marked( with a 0 or 1) as fast as possible. The marking cannot happen on the objects themselves (so I cannot introduce a member variable and use that for the marking process). Since I know the minimum and maximum id (1 to 100_000_000), the first thought that occured to me, was to use a std::bit_set<100000000> and perform my marking there. This solves my problem and also makes it easier when marking processes run in parallel, since these use their own bit_set to mark things, but I was wondering what the solution could be, if I had to use something else instead of a 0-1 marking, e.g what could I use if I had to mark all objects with an integer number ?
Is there some form of a data structure that can deal with this kind of problem in a compact (memory-wise) manner, and also be fast ? The main queries of interest are whether an object is marked, and with what was marked with.
Thank you.
Note: std::map<int, MyObject*> cannot be changed. Whatever data structure I use, must not deal with the map itself.
How about making the value_type of your map a std::pair<bool, MyObject*> instead of MyObject*?
If you're not concerned with memory, then a std::vector<int> (or whatever suits your need in place of an int) should work.
If you don't like that, and you can't modify your map, then why not create a parallel map for the markers?
std::map<id,T> my_object_map;
std::map<id,int> my_marker_map;
If you cannot modify the objects directly, have you considered wrapping the objects before you place them in the map? e.g.:
struct
{
int marker;
T *p_x;
} T_wrapper;
std::map<int,T_wrapper> my_map;
If you're going to need to do lookups anyway, then this will be no slower.
EDIT: As #tenfour suggests in his/her answer, a std::pair may be a cleaner solution here, as it saves the struct definition. Personally, I'm not a big fan of std::pairs, because you have to refer to everything as first and second, rather than by meaningful names. But that's just me...
The most important question to ask yourself is "How many of these 100,000,000 objects might be marked (or remain unmarked)?" If the answer is smaller than roughly 100,000,000/(2*sizeof(int)), then just use another std::set or std::tr1::unordered_set (hash_set previous to tr1) to track which ones are so marked (or remained unmarked).
Where does 2*sizeof(int) come from? It's an estimate of the amount of memory overhead to maintain a heap structure in a deque of the list of items that will be marked.
If it is larger, then use std::bitset as you were about to use. It's overhead is effectively 0% for the scale of quantity you need. You'll need about 13 megabytes of contiguous ram to hold the bitset.
If you need to store a marking as well as presence, then use std::tr1::unordered_map using the key of Object* and value of marker_type. And again, if the percentage of marked nodes is higher than the aforementioned comparison, then you'll want to use some sort of bitset to hold the number of bits needed, with suitable adjustments in size, at 12.5 megabytes per bit.
A purpose-built object holding the bitset might be your best choice, given the clarification of the requirements.
Edit: this assumes that you've done proper time-complexity computations for what are acceptable solutions to you, since changing the base std::map structure is no longer permitted.
If you don't mind using hacks, take a look at the memory optimization used in Boost.MultiIndex. It can store one bit in the LSB of a stored pointer.

what happens when you modify an element of an std::set?

If I change an element of an std::set, for example, through an iterator, I know it is not "reinserted" or "resorted", but is there any mention of if it triggers undefined behavior? For example, I would imagine insertions would screw up. Is there any mention of specifically what happens?
You should not edit the values stored in the set directly. I copied this from MSDN documentation which is somewhat authoritative:
The STL container class set is used
for the storage and retrieval of data
from a collection in which the values
of the elements contained are unique
and serve as the key values according
to which the data is automatically
ordered. The value of an element in a
set may not be changed directly.
Instead, you must delete old values
and insert elements with new values.
Why this is is pretty easy to understand. The set implementation will have no way of knowing you have modified the value behind its back. The normal implementation is a red-black tree. Having changed the value, the position in the tree for that instance will be wrong. You would expect to see all manner of wrong behaviour, such as exists queries returning the wrong result on account of the search going down the wrong branch of the tree.
The precise answer is platform dependant but as a general rule, a "key" (the stuff you put in a set or the first type of a map) is suppose to be "immutable". To put it simply, that should not be modified, and there is no such thing as automatic re-insertion.
More precisely, the member variables used for to compare the key must not be modified.
Windows vc compiler is quite flexible (tested with VC8) and this code compile:
// creation
std::set<int> toto;
toto.insert(4);
toto.insert(40);
toto.insert(25);
// bad modif
(*toto.begin())=100;
// output
for(std::set<int>::iterator it = toto.begin(); it != toto.end(); ++it)
{
std::cout<<*it<<" ";
}
std::cout<<std::endl;
The output is 100 25 40, which is obviously not sorted... Bad...
Still, such behavior is useful when you want to modify data not participating in the operator <. But you better know what you're doing: that's the price you get for being too flexible.
Some might prefer gcc behavior (tested with 3.4.4) which gives the error "assignment of read-only location". You can work around it with a const_cast:
const_cast<int&>(*toto.begin())=100;
That's now compiling on gcc as well, same output: 100 25 40.
But at least, doing so will probably makes you wonder what's happening, then go to stack overflow and see this thread :-)
You cannot do this; they are const. There exists no method by which the set can detect you making a change to the internal element, and as a result you cannot do so. Instead, you have to remove and reinsert the element. If you are using elements that are expensive to copy, you may have to switch to using pointers and custom comparators (or switch to a C++1x compiler that supports rvalue references, which would make things a whole lot nicer).