When I use a std::map, it seems that accessing elements takes a different amount of time based on the method used.
First Method: Direct Access
cnt += umap[t];
Second Method:
if (umap.find(t) != umap.end()){
cnt += umap[t];
}
The second method seems to be quite a bit faster than the first method, and I don't understand why. Can someone explain the differences between these two methods?
The two code snippets do different things.
The first snippet takes more time because operator[] inserts the key t into umap if it does not already exist (value-initializing the mapped value to zero) before its value is added to cnt.
In the second snippet no key is ever inserted. Because of the if condition, umap[t] is evaluated (and added to cnt) only when umap already contains the key t.
The second snippet can be optimized further by storing the iterator returned by find: as written, umap[t] performs a second lookup for the same key.
Hence, something like this will be faster (adapted from user253751's comment):
if(auto it = umap.find(t); it != umap.end())
cnt += it->second;
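For context, a complete counting loop using that form (C++17's if-with-initializer, as in the answer) might look like this, assuming umap maps int keys to integer counts as the snippets suggest:
#include <map>
#include <vector>

long long countHits(const std::vector<int>& queries, const std::map<int, long long>& umap) {
    long long cnt = 0;
    for (int t : queries) {
        // Single lookup per query; nothing is ever inserted into umap.
        if (auto it = umap.find(t); it != umap.end())
            cnt += it->second;
    }
    return cnt;
}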
Basically the title. Is there a relatively fast way to modify all, or a bunch of, the elements of a vector by a given value, e.g. 1? If not, is there some other data type that would perform this kind of operation better?
I have implemented a for loop that simply adds 1 to every element of a vector. Is there some cleaner/shorter way to go about this?
vector<int> vct;
for (int i = 0; i < 10; i++) {
    vct.push_back(i);
}
for (int i = 0; i < vct.size(); i++) {
    vct[i]++;
}
One possibility is to use a data structure which combines a scalar offset, base, and a vector of numbers, data. The value of element i is then computed as base+data[i], which is still O(1). (On modern CPUs, you probably won't notice the time taken by the addition.)
To increment an individual element, you simply increment the particular value in data. To increment all elements, you can increment base.
If you need to set element i to a specific value v, you can use data[i] = v - base. But normally this data structure is used for problems where the data is always incremented (or decremented), either individually or collectively, and it is desired to make collective increments O(1).
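A minimal sketch of such a structure, with illustrative names (OffsetVector, base and data are not from any library):
#include <cstddef>
#include <vector>

// Element i is stored as data[i]; its logical value is base + data[i].
struct OffsetVector {
    long long base = 0;
    std::vector<long long> data;

    long long get(std::size_t i) const { return base + data[i]; }      // O(1) read
    void addToAll(long long delta) { base += delta; }                   // O(1) collective increment
    void addToOne(std::size_t i, long long delta) { data[i] += delta; } // O(1) single increment
    void set(std::size_t i, long long v) { data[i] = v - base; }        // store relative to base
};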
You should first check whether the compiler already optimizes this. Run the compiler with optimization enabled (-O3 on clang and g++, for example) and inspect the generated assembly code.
Clang, for example, already performs some vectorization on such loops, and without measuring you will not know whether your code ends up slower or faster.
https://llvm.org/docs/Vectorizers.html
So make sure to measure if you want to know whether your optimizations have any effect.
My favorite syntax is the range-based for loop, as a commenter above suggested:
for (auto &x : vec) {
    ++x;  // or whatever modification each element needs
}
I created an unordered map in C++ and used umap.erase(num) = 0 to delete an element from my hash table. This ran inside a loop and gave me a Time Limit Exceeded error, but when I used umap[num] = 0 instead to perform the same task, it worked. Do these two have such a huge difference in time complexity as to give me an error? If yes, how big is the difference?
Ignoring the fact that the first expression produces a compile-time error (you cannot assign to the value returned by umap.erase(num), whose return type is size_type), the difference is the following:
erase removes the key-value entry from the map completely. Any further access to that key requires a new entry to be created, which can cause unnecessary work that decreases performance (rehashing, allocation of the objects required to store the entry). Furthermore, erase never creates a new mapping; calling erase with a key that has no associated value simply leaves the map unchanged.
umap[num] = 0 simply sets the value associated with num to 0, but keeps the mapping; this may actually create a new mapping, should there be no value associated with num before the operation.
To highlight the difference:
umap[num] = 0;
//umap.erase(num);
auto iter = umap.find(num);
bool mappingExists = (iter != umap.end());
The two options result in different values for mappingExists: umap[num] = 0; will yield true and umap.erase(num) will yield false.
Note that, given the information in the question, it's impossible to tell whether the two versions of your algorithm are even equivalent. They may very well be, since accessing a non-existing mapping via operator[] creates it with a value-initialized value, which is 0 for integral types.
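A small self-contained sketch of that difference (the variable names just mirror the question):
#include <cassert>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> umap;
    const int num = 42;

    umap[num] = 0;                            // creates the mapping num -> 0
    assert(umap.find(num) != umap.end());

    umap.erase(num);                          // removes the entry completely
    assert(umap.find(num) == umap.end());

    int x = umap[num];                        // operator[] on a missing key inserts it, value-initialized to 0
    assert(x == 0 && umap.size() == 1);
}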
I want to find a pair by its second element only; the first element could be anything. All of the second elements are unique.
Here is code using std::find_if, but it takes linear time:
set<pair<int,int>> s;
s.insert(make_pair(3, 1));
s.insert(make_pair(1, 0));
// value holds the second element we are searching for
auto it = find_if(s.begin(), s.end(), [value](const pair<int,int>& p){ return p.second == value; });
if (it == s.end())
    s.insert(make_pair(1, value));
else {
    int v = it->first;
    s.erase(it);
    s.insert(make_pair(v + 1, value));
}
I want to use std::find function of set so that it takes logarithmic time.
There is no standard data structure that does exactly what you want.
However, databases do something similar; they call it an Index Skip Scan. To get the same effect without starting from scratch, you could build a std::map from the first element of the pair to a std::map of the second elements for that first value. A lookup of a single pair is then logarithmic in time, a lookup of all things with a given first entry is also logarithmic (though iterating through those things may be slower), and a lookup by the second entry is linear in the number of first values you have, times logarithmic in the number of second values.
Do note that this is only worthwhile if you have a very large number of pairs and relatively few values for the first entry in the pair, you are constantly changing the data (so maintaining multiple indexes is a lot of overhead), and you only rarely do a lookup on the second value in the pair. Break any of those assumptions and the overhead is not worth it.
That is a rather specific set of assumptions to satisfy. It comes up far more often for databases than C++ programmers. Which is why most databases support the operation, and the standard library of C++ does not.
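A rough sketch of that two-level index for int pairs, using a std::set for the inner level for simplicity (PairIndex, insertPair and findBySecond are illustrative names, not an existing API):
#include <map>
#include <optional>
#include <set>
#include <utility>

// index[first] holds the sorted set of second values paired with that first value.
using PairIndex = std::map<int, std::set<int>>;

void insertPair(PairIndex& index, int first, int second) {
    index[first].insert(second);   // logarithmic overall
}

// Lookup by the second element: linear in the number of distinct first values,
// times logarithmic in the number of second values stored under each of them.
std::optional<std::pair<int, int>> findBySecond(const PairIndex& index, int second) {
    for (const auto& [first, seconds] : index)
        if (seconds.count(second) != 0)
            return std::make_pair(first, second);
    return std::nullopt;
}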
So I've been doing research on the efficiency of the orderings of different unordered_map function calls. Here are two possible workings out of the same code.
Note: keywordMap is an unordered_map that maps strings to vectors of pointers to a home-made struct (e is such a pointer). This is done in a loop.
First option:
auto curKeyWord = someString;
auto curEntryPair = keywordMap.insert(
make_pair( move(curKeyWord), vector<entry*>{e} ) );
if (!curEntryPair.second){//insertion failed
curEntryPair.first->second.push_back(e);
}
Second option:
auto curKeyWord = someString;
auto curEntry = keywordMap.find(curKeyWord);
if( curEntry == end(keywordMap) ){//DNE in map
keywordMap.emplace( make_pair( move(curKeyWord), vector<entry*>{e} ) );
}
else{
curEntry->second.push_back(e);
}
I am interested in which of these blocks of code is faster. The question really boils down to how .insert works. If insert basically works as finding where the key should be and inserting it if it doesn't exist, then the first should be faster, as it is just a single probe. Once I've called insert, I have everything I need to call push_back should the insert not have done anything. It also is, however, significantly uglier. I'm also curious if insert has the same problem emplace does, where it constructs the element before checking whether or not the key exists in the map already.
It is possible that I will have to benchmark these two pieces of code, but I am wondering if there is any piece of information that I am missing that would tell me the answer now.
Generally speaking, the first bit of code is faster because it performs a single lookup.
Here is the usual way of doing insert/overwrite in map:
auto rv = map.insert(std::make_pair(key, value));
if (!rv.second)
rv.first->second = value;
The point here is that std::map is usually implemented as a balanced binary tree (google: red-black tree), so both insert() and find() take O(log(n)) steps. That is, the container has a natural internal order and insertion must place new items at their correct position (which is why the keys must have a strict weak ordering).
std::unordered_map uses hashing, so a lookup is O(1) in the default case (i.e. when there is a single item in every bucket). Once collisions occur (i.e. when you have k items in a bucket), each lookup takes O(k) steps.
Now, going back to the original question: doing less work is always better. The difference in the std::unordered_map case is very small, though (the second lookup takes O(1) steps in the default case).
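As a side note that goes beyond the answer above: with C++17, try_emplace gives the same single-lookup behavior while constructing the mapped vector only when the key is actually new (keywordMap, someString and e are assumed to be declared as in the question):
auto curKeyWord = someString;
// Default-constructs the mapped vector only if curKeyWord is not yet in the map;
// the key is not moved from if the insertion does not happen.
auto [it, inserted] = keywordMap.try_emplace(std::move(curKeyWord));
it->second.push_back(e);   // works for both the freshly inserted and the existing entry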
I often see code like:
if(myQMap.contains("my key")){
myValue = myQMap["my key"];
}
which theoretically performs two lookups in the QMap.
My first reaction is that it should be replaced by the following, which performs only one lookup and should therefore be about twice as fast:
auto it = myQMap.find("my key");
if(it != myQMap.end()){
myValue = it.value();
}
I am wondering if QMap does this optimization automatically for me?
In other words, I am wondering if QMap saves the position of the last element found with QMap::contains() and checks it first before performing the next lookup?
I would expect that QMap provides both functions for a better interface to the class. It's more natural to ask if the map 'contains' a value with a specified key than it is to call the 'find' function.
As the code shows, both find and contains call the following internal function:
Node *n = d->findNode(akey);
So if you're going to use the returned iterator, then using find and checking the return value will be more efficient, but if you just want to know whether the key exists in the map, calling contains reads better.
If you look at the source code, you'll see that QMap is implemented as a binary tree structure of nodes. Calling findNode iterates through the nodes and does not cache the result.
The QMap source code reveals that there is no special caching in the QMap::contains() method.
In some cases you can use QMap::value() or QMap::values() to get the value for a key and check whether it is correct. These methods (and the const operator[]) copy the value, although this is probably fine for most Qt types, since their underlying data is copy-on-write (notably QMap itself).
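For example, QMap::value() takes an optional default that is returned when the key is missing, so the check and the read collapse into a single lookup (the function name and default string below are just illustrative):
#include <QMap>
#include <QString>

QString lookupOrDefault(const QMap<QString, QString>& myQMap) {
    // Single lookup; returns the mapped value, or the supplied default if "my key" is absent.
    return myQMap.value("my key", QStringLiteral("default"));
}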