How is std::set slower than std::map? - c++

I was trying to solve this problem from acm.timus.ru which basically wants me to output the number of different substrings of a given string (max length 5000).
The solutions I am about to present are desperately inefficient and doomed to a Time Limit Exceeded verdict given the constraints. However, the only way in which these two solutions differ (at least as far as I can see/understand) is that one uses std::map<long long, bool>, while the other uses std::set<long long> (see the beginning of the last for loop; the rest is identical, as any diff tool will confirm). The map solution results in "Time Limit Exceeded on Test 3", whereas the set solution results in "Time Limit Exceeded on Test 2", which means that Test 2 is such that the map solution works faster on it than the set solution. This is the case if I choose the Microsoft Visual Studio 2010 compiler. If I choose GCC, then both solutions result in TLE on test 3.
I am not asking how to solve the problem efficiently. What I am asking is how one can explain that using std::map can apparently be more efficient than using std::set. I just fail to see the mechanics of this phenomenon and hope that someone can offer some insight.
Code1 (uses map, TLE 3):
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

int main()
{
    string s;
    cin >> s;
    vector<long long> p;
    p.push_back(1);
    for (int i = 1; i < s.size(); i++)
        p.push_back(31 * p[i - 1]);
    vector<long long> hash_temp;
    hash_temp.push_back((s[0] - 'a' + 1) * p[0]);
    for (int i = 1; i < s.size(); i++)
        hash_temp.push_back((s[i] - 'a' + 1) * p[i] + hash_temp[i - 1]);
    int n = s.size();
    int answer = 0;
    for (int i = 1; i <= n; i++)
    {
        map<long long, bool> hash_ans;
        for (int j = 0; j < n - i + 1; j++)
        {
            if (j == 0)
                hash_ans[hash_temp[j + i - 1] * p[n - j - 1]] = true;
            else
                hash_ans[(hash_temp[j + i - 1] - hash_temp[j - 1]) * p[n - j - 1]] = true;
        }
        answer += hash_ans.size();
    }
    cout << answer;
}
Code2 (uses set, TLE 2):
#include <iostream>
#include <string>
#include <vector>
#include <set>
using namespace std;

int main()
{
    string s;
    cin >> s;
    vector<long long> p;
    p.push_back(1);
    for (int i = 1; i < s.size(); i++)
        p.push_back(31 * p[i - 1]);
    vector<long long> hash_temp;
    hash_temp.push_back((s[0] - 'a' + 1) * p[0]);
    for (int i = 1; i < s.size(); i++)
        hash_temp.push_back((s[i] - 'a' + 1) * p[i] + hash_temp[i - 1]);
    int n = s.size();
    int answer = 0;
    for (int i = 1; i <= n; i++)
    {
        set<long long> hash_ans;
        for (int j = 0; j < n - i + 1; j++)
        {
            if (j == 0)
                hash_ans.insert(hash_temp[j + i - 1] * p[n - j - 1]);
            else
                hash_ans.insert((hash_temp[j + i - 1] - hash_temp[j - 1]) * p[n - j - 1]);
        }
        answer += hash_ans.size();
    }
    cout << answer;
}

The actual differences I see (tell me if I missed anything) are that in the map case you do
hash_ans[key] = true;
while in the set case you do
hash_ans.insert(key);
In both cases, an element is inserted unless it already exists, in which case nothing happens. In both cases, the lookup needs to locate the corresponding element and insert on failure. In effectively every implementation out there, both containers use a tree, making the lookup equally expensive. Even more, the C++ standard actually requires set::insert() and map::operator[]() to be O(log n) in complexity, so the complexity of both implementations should be the same.
Now, what could be the reason that one performs better? One difference is that in one case a node of the underlying tree contains just a long long, while in the other it's a pair<long long const, bool>. Since the pair also carries a bool, it must be larger and put more pressure on the memory interface of the machine, so this doesn't explain why the map would be faster. What it could do is change the node size so that nodes are laid out differently across cache lines, which can be bad for performance on multi-core systems.
In summary, there are a few things I'd try:
use the same data in the set
I'd do this with struct data { long long key; bool b; };, i.e. bundle the key in a struct that should have a similar binary layout to the map's elements. As comparator, use one that compares only the key, so that only the key actually takes part in the comparisons (see the sketch after this list).
use insert() on the map
I don't think this should be an issue, but the insert could incur a copy of the argument, even if no insert takes place in the end. I would hope that it doesn't though, so I'm not too confident this will change anything.
turn off debugging
Most implementations have a diagnostic mode, where iterators are validated. You can use this to catch errors where C++ only says "undefined behaviour", shrugs its shoulders and crashes on you. This mode often doesn't meet complexity guarantees and it always has some overhead.
read the code
If the implementations for set and map have different levels of quality and optimization, this could explain the differences. Under the hood, I'd expect both map and set to be built on the same type of tree though, so not much hope here either.
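To make the first two experiments concrete, here is a rough sketch of what I mean (untested against the judge; the struct and comparator names are just for illustration):

#include <map>
#include <set>
#include <utility>

// 1) give the set elements roughly the same layout as the map's value_type
struct data {
    long long key;
    bool b;
};

struct by_key {
    bool operator()(const data& lhs, const data& rhs) const {
        return lhs.key < rhs.key;   // only the key takes part in comparisons
    }
};

int main()
{
    long long key = 42;

    std::set<data, by_key> s;
    s.insert(data{key, true});              // node payload similar to the map's

    // 2) use insert() on the map instead of operator[]
    std::map<long long, bool> m;
    m.insert(std::make_pair(key, true));    // a no-op if the key already exists
}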

A set is only a little bit faster than a map in this case, I guess. Still, I don't think you should care that much, as TLE on test 2 versus test 3 is not really a big deal. If you are close to the time limit, it may happen that the same solution gets TLE on test 2 on one submit and on test 3 the next time. I have some solutions passing the tests just on the time limit, and I bet if I resubmit them they will fail.
This particular problem I solved using Ukkonen's suffix tree.

It depends on the implementation and the algorithms used. Usually sets are implemented with the same machinery as maps, just using the key field alone. In such a case there would be a very slight overhead to using a set as opposed to a map.
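To picture what "the same machinery" can look like, here is a compilable toy skeleton (all names are made up purely for illustration; real standard libraries differ in detail, but both containers typically wrap one shared balanced-tree template):

#include <functional>
#include <utility>

// one tree template, parameterized on the node's value type and on how the
// key is extracted from it (the tree machinery itself is elided here)
template <class Value, class KeyOfValue, class Compare>
class rb_tree { /* ... */ };

template <class T>
struct identity { const T& operator()(const T& v) const { return v; } };

template <class Pair>
struct select_first { const typename Pair::first_type& operator()(const Pair& p) const { return p.first; } };

// a set stores just the key in each node...
template <class Key, class Compare = std::less<Key>>
class toy_set {
    rb_tree<Key, identity<Key>, Compare> tree_;
};

// ...while a map stores a (key, mapped value) pair, so its nodes are a bit larger
template <class Key, class T, class Compare = std::less<Key>>
class toy_map {
    rb_tree<std::pair<const Key, T>, select_first<std::pair<const Key, T>>, Compare> tree_;
};

int main() {
    toy_set<long long> s;
    toy_map<long long, bool> m;
    (void)s; (void)m;
}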


Find duplicate in unsorted array with best time Complexity

I know there were similar questions, but none of them this specific.
Input: an n-element array of unsorted elements with values from 1 to (n-1).
One of the values is duplicated (e.g. n=5, tab[n] = {3,4,2,4,1}).
Task: find the duplicate with the best complexity.
I wrote this algorithm:
int tab[] = { 1,6,7,8,9,4,2,2,3,5 };
int arrSize = sizeof(tab) / sizeof(tab[0]);
for (int i = 0; i < arrSize; i++) {
    tab[tab[i] % arrSize] = tab[tab[i] % arrSize] + arrSize;
}
for (int i = 0; i < arrSize; i++) {
    if (tab[i] >= arrSize * 2) {
        std::cout << i;
        break;
    }
}
but I don't think it has the best possible complexity.
Do you know a better method/algorithm? I can use any C++ library, but I don't have any ideas.
Is it possible to get better complexity than O(n)?
In terms of big-O notation, you cannot beat O(n) (the same as your solution here). But you can get better constants and a simpler algorithm, by using the property that the sum of the elements 1,...,n-1 is well known.
int sum = 0;
for (int x : tab) {
    sum += x;
}
int duplicate = sum - n * (n - 1) / 2;   // n is the number of elements
The constants here will be significantly better, as each array element is accessed exactly once, which is much more cache friendly and efficient on modern architectures.
(Note, this solution ignores integer overflow, but it's easy to account for it by using twice as many bits in sum as there are in the array's elements.)
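A complete, compilable version of this approach might look like the following sketch (assuming, as in the question, n elements with values from 1 to n-1 and exactly one duplicate; long long is used for the sum to sidestep the overflow issue just mentioned):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> tab = {3, 4, 2, 4, 1};          // n = 5, values 1..4, duplicate 4
    long long n = static_cast<long long>(tab.size());

    long long sum = 0;                               // wider type than the elements
    for (int x : tab)
        sum += x;

    long long duplicate = sum - n * (n - 1) / 2;     // subtract 1 + 2 + ... + (n - 1)
    std::cout << duplicate << "\n";                  // prints 4
}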
Adding the classic answer because it was requested. It is based on the idea that if you xor a number with itself you get 0. So if you xor all numbers from 1 to n - 1 and all numbers in the array you will end up with the duplicate.
// assuming arr is a std::vector<int> holding the n values described above
int duplicate = arr[0];
for (int i = 1; i < (int)arr.size(); i++) {
    duplicate = duplicate ^ arr[i] ^ i;   // XOR in arr[i] and the index i (1..n-1)
}
Don't focus too much on asymptotic complexity. In practice the fastest algorithm is not necessarily the one with the lowest asymptotic complexity. That is because constants are not taken into account: O(huge_constant * N) == O(N) == O(tiny_constant * N).
You cannot inspect N values in less than O(N). You do not need a full pass through the array, though: you can stop once you have found the duplicate:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> vals{1, 2, 4, 6, 5, 3, 2};
    std::vector<bool> present(vals.size());
    for (const auto& e : vals) {
        if (present[e]) {
            std::cout << "duplicate is " << e << "\n";
            break;
        }
        present[e] = true;
    }
}
In the "lucky case" the duplicate is at index 2. In the worst case the whole vector has to be scanned. On average it is again O(N) time complexity. Further it uses O(N) additional memory while yours is using no additional memory. Again: Complexity alone cannot tell you which algorithm is faster (especially not for a fixed input size).
No matter how hard you try, you won't beat O(N), because no matter in what order you traverse the elements (and remember already found elements), the best and worst case are always the same: Either the duplicate is in the first two elements you inspect or it's the last, and on average it will be O(N).

Golomb sequence without using array

Hi guys, I'm looking for a program that finds the nth number of the Golomb sequence without using an array.
I know the program below, but it's very slow...
#include <bits/stdc++.h>
using namespace std;

int findGolomb(int);

int main()
{
    int n;
    cin >> n;
    cout << findGolomb(n);
    return 0;
}

int findGolomb(int n)
{
    if (n == 1)
        return 1;
    else
        return 1 + findGolomb(n - findGolomb(findGolomb(n - 1)));
}
It depends on how large a value you want to calculate. For n <= 50000, the following works:
#include <cmath>
// ...
round(1.201 * pow(n, 0.618));
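For instance, plugged into a tiny program (my own sketch; the constants 1.201 and 0.618 are just the ones from the line above, and the result is only an estimate that this answer claims holds up to about n <= 50000):

#include <cmath>
#include <iostream>

int approxGolomb(int n) {
    // closed-form estimate of the nth Golomb number, per the formula above
    return static_cast<int>(std::round(1.201 * std::pow(n, 0.618)));
}

int main() {
    int n;
    std::cin >> n;
    std::cout << approxGolomb(n) << "\n";
}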
As it turns out, due to the nature of this sequence, you need almost every single entry in it to compute g[n]. I coded up a solution that uses a map to save past calculations, purging it of values that are no longer needed. For n == 500000, the map still had roughly 496000 entries, and since each map entry stores a key and a value where the array would store just the value, you end up using about twice as much memory.
#include <iostream>
#include <cstdlib>   // EXIT_SUCCESS
#include <map>
using namespace std;

class Golomb_Generator {
public:
    int next() {
        if (n == 1)
            return cache[n++] = 1;
        int firstTerm = n - 1;
        int secondTerm = cache[firstTerm];
        int thirdTerm = n - cache[secondTerm];
        if (n != 3) {
            auto itr = cache.upper_bound(secondTerm - 1);
            cache.erase(begin(cache), itr);
        }
        return cache[n++] = 1 + cache[thirdTerm];
    }
    void printCacheSize() {
        cout << cache.size() << endl;
    }
private:
    int n = 1;
    map<int, int> cache;
};

void printGolomb(long long n)
{
    Golomb_Generator g{};
    for (int i = 0; i < n - 1; ++i)
        g.next();
    cout << g.next() << endl;
    g.printCacheSize();
}

int main()
{
    int n = 500000;
    printGolomb(n);
    return EXIT_SUCCESS;
}
You can guess as much. n - g(g(n - 1)) uses g(n - 1) as an argument to g, which is always much, much smaller than n. At the same time, the recurrence also uses n - 1 as an argument, which is close to n. So you can't delete that many entries.
About the best you can do without O(n) memory is recursion combined with the approximation that is accurate for smaller n, but it will still become slow quickly. Additionally, as the recursive calls stack up, you will likely use more memory than having an appropriately sized array would.
You might be able to do a little better though. The sequence grows very slowly. Applying that fact to g(n - g(g(n - 1))), you can convince yourself that this relationship mostly needs stored values near 1 and stored values near n, i.e. nearN(n - near1(nearN(n - 1))). There is a tremendous swath in between that does not need to be stored, because those values would only be used in calculations of g(n) for much, much larger n than you care about. Below is an example of maintaining the first 10000 values of g and the last 20000 values of g. It works at least for n <= 2000000, and it stops working for sure at n >= 2500000. For n == 2000000, it takes about 5 to 10 seconds to compute.
#include <iostream>
#include <unordered_map>
#include <cmath>
#include <cstdlib>   // EXIT_SUCCESS
#include <map>
#include <vector>
using namespace std;

class Golomb_Generator {
public:
    int next() {
        return g(n++);
    }
private:
    int n = 1;
    map<int, int> higherValues{};
    vector<int> lowerValues{1, 1};

    int g(int n) {
        if (n == 1)
            return 1;
        if (n <= 10000) {
            lowerValues.push_back(1 + lowerValues[n - lowerValues[lowerValues[n - 1]]]);
            return higherValues[n] = lowerValues[n];
        }
        removeOldestResults();
        return higherValues[n] = 1 + higherValues[n - lowerValues[higherValues[n - 1]]];
    }

    void removeOldestResults() {
        while (higherValues.size() >= 20000)
            higherValues.erase(higherValues.begin());
    }
};

void printGolomb(int n)
{
    Golomb_Generator g{};
    for (int i = 0; i < n - 1; ++i)
        g.next();
    cout << g.next() << endl;
}

int main()
{
    int n = 2000000;
    printGolomb(n);
    return EXIT_SUCCESS;
}
There are some choices and considerations regarding the runtime.
move the complexity to math
The algorithm is really nothing but math expressed in a programming language, and it may be improved by mathematical substitutions. You may look into the research on this sequence and find a better formulation to substitute.
move the complexity to the compiler
When calling findGolomb(12) with a specific number known at compile time, we may use constexpr to move the calculation to compile time (see the sketch after this list).
constexpr int findGolomb(int);
move the complexity to the memory
Although the question asks not to use an array, this is a considerable constraint. Without any additional memory, the algorithm has no option but to spend runtime, for example by recomputing already-computed values of findGolomb(..).
The memory constraint may also include the size of the compiled program (by additional lines of code).
move the complexity to the runtime
When not using math, the compiler, or memory to enhance the algorithm, there is no option left but to move the complexity to the runtime.
Summarizing, there is no way to improve the runtime outside the four options above. Ruling out compiler and memory optimizations, and considering the current runtime as already optimal, you are only left with math and research.
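As a small illustration of the constexpr point above (a sketch under the assumption that n is a compile-time constant and small enough for the compiler's constexpr evaluation limits):

#include <iostream>

// naive recurrence, but marked constexpr so a compile-time-known argument
// is evaluated entirely by the compiler
constexpr int findGolomb(int n)
{
    return n == 1 ? 1
                  : 1 + findGolomb(n - findGolomb(findGolomb(n - 1)));
}

int main()
{
    constexpr int g12 = findGolomb(12); // computed at compile time
    std::cout << g12 << "\n";           // prints 6
}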

insertion sort : why assigning key to j+1 index?

void insertionSort(vector<int>& v){
    for(int i = 1; i < v.size(); ++i){
        int key = v[i];
        int j = i - 1;
        while(j >= 0 && key < v[j]){
            swap(v[j], v[j+1]);
            j--;
        }
        v[j+1] = key; // works fine without this.
    }
}
In this insertion sort algorithm, I just wonder why the line with the comment was added.
I did several experiments removing that line, and it actually seemed okay to get rid of it.
Could anyone explain the purpose of the line? Any help would be much appreciated!
Since after each swap it does j--, after the final swap (which frees up v[j]), it decreases j once more. Hence you need to put the new element at v[j + 1].
By the way, swap is not necessary for this code, you might as well do v[j + 1] = v[j] instead of swap.
Edit
Regarding the question on the implementation, perhaps the author was making some point which needed the swap; without knowing the context, we can't say for sure.
Since hardly anyone uses insertion sort in practice, I reckon the purpose was mainly educational, likely to analyse complexity by counting the number of swaps. Hence the author may have been demonstrating the sort with swap as a building block.
Back to the question: the implementation is correct, if you are okay with the extra writes that swap does.
(Essentially swap(a, b) is t = a; a = b; b = t;, so two additional writes.)
If you do have the swap, then the line in question is indeed not necessary.
Without the swap you may rewrite it as:
void insertionSort(vector<int>& v){
    for(int i = 1; i < v.size(); ++i){
        int key = v[i];
        int j = i - 1;
        while(j >= 0 && key < v[j]){
            v[j + 1] = v[j];
            j--;
        }
        v[j + 1] = key; // this is now necessary.
    }
}
Note that since this only reduces the time taken by a constant factor, the complexity remains the same as the version with swap, i.e. O(n²).

Sets and Vectors. Are sets fast in C++?

Please read the question here - http://www.spoj.com/problems/MRECAMAN/
The question is to compute Recaman's sequence, where a(0) = 0, a(i) = a(i-1) - i if a(i-1) - i > 0 and has not appeared in the sequence before, and otherwise a(i) = a(i-1) + i.
Now when I use a vector to store the sequence and the find function to check for previous occurrences, the program times out. But when I use an array plus a set to see if the element exists, it gets accepted (very fast). Is using a set faster?
Here are the codes:
Vector implementation
vector<int> sequence;
sequence.push_back(0);
for (int i = 1; i <= 500000; i++)
{
    int a = sequence[i - 1] - i;
    int b = sequence[i - 1] + i;
    if (a > 0 && find(sequence.begin(), sequence.end(), a) == sequence.end())
        sequence.push_back(a);
    else
        sequence.push_back(b);
}
Set Implementation
const int MAXN = 500000;
int a[MAXN + 1];
set<int> exists;
a[0] = 0;
for (int i = 1; i <= MAXN; ++i)
{
    if (a[i - 1] - i > 0 && exists.find(a[i - 1] - i) == exists.end()) a[i] = a[i - 1] - i;
    else a[i] = a[i - 1] + i;
    exists.insert(a[i]);
}
Lookup in an std::vector:
find(sequence.begin(), sequence.end(), a)==sequence.end()
is an O(n) operation (n being the number of elements in the vector).
Lookup in an std::set (which is a balanced binary search tree):
exists.find(a[i-1] - i) == exists.end()
is an O(log n) operation.
So yes, lookup in a set is (asymptotically) faster than a linear lookup in vector.
If you can keep the vector sorted, lookup is faster in most cases than in a set, because it is much more cache friendly.
There is only one valid answer to most "Is XY faster than UV in C++" questions:
Use a profiler.
While most algorithms (including container insertions, searches etc.) have a guaranteed complexity, these complexities can only tell you about the approximate behavior for large amounts of data. The performance for any given smaller set of data can not be easily compared, and the optimizations that a compiler can apply can not be reasonably guessed by humans. So use a profiler and see what is faster. If it matters at all. To see if performance matters in that special part of your program, use a profiler.
However, in your case it might be a safe bet that searching a set of ~250k elements is faster than searching an unsorted vector of that size. That said, if you use the vector only for storing the inserted values and keep sequence[i-1] in a separate variable, you can keep the vector sorted and use an algorithm for sorted ranges like binary_search, which can be way faster than the set.
A sample implementation with a sorted vector:
const static size_t NMAX = 500000;
vector<int> values = {0};
values.reserve(NMAX);
int lastInserted = 0;
for (int i = 1; i <= NMAX; ++i) {
    auto a = lastInserted - i;
    auto b = lastInserted + i;
    auto iter = lower_bound(begin(values), end(values), a);
    // a is always less than the last inserted value, so iter can't be end(values)
    if (a > 0 && a < *iter) {
        lastInserted = a;
    }
    else {
        // b > a => lower_bound(b) >= lower_bound(a)
        iter = lower_bound(iter, end(values), b);
        lastInserted = b;
    }
    values.insert(iter, lastInserted);
}
I hope I did not introduce any bugs...
For the task at hand, set is faster than vector because it keeps its contents sorted and does a binary search to find a specified item, giving logarithmic complexity instead of linear complexity. When the set is small, that difference is also small, but when the set gets large the difference grows considerably. I think you can improve things a bit more than just that though.
First, I'd avoid the clumsy lookup to see if an item is already present by just attempting to insert an item, then see if that succeeded:
if (b > 0 && exists.insert(b).second)
    a[i] = b;
else {
    a[i] = c;
    exists.insert(c);
}
This avoids looking up the same item twice, once to see if it was already present, and again to insert the item. It only does a second lookup when the first one was already present, so we're going to insert some other value.
Second, and even more importantly, you can use std::unordered_set to improve the complexity from logarithmic to (expected) constant. Since unordered_set uses (mostly) the same interface as std::set, this substitution is easy to make (including the optimization above).
Here's some code to compare the three methods:
#include <iostream>
#include <string>
#include <set>
#include <unordered_set>
#include <vector>
#include <numeric>
#include <chrono>

static const int MAXN = 500000;

unsigned original() {
    static int a[MAXN + 1];
    std::set<int> exists;
    a[0] = 0;
    for (int i = 1; i <= MAXN; ++i)
    {
        if (a[i - 1] - i > 0 && exists.find(a[i - 1] - i) == exists.end()) a[i] = a[i - 1] - i;
        else a[i] = a[i - 1] + i;
        exists.insert(a[i]);
    }
    return std::accumulate(std::begin(a), std::end(a), 0U);
}

template <class container>
unsigned reduced_lookup() {
    container exists;
    std::vector<int> a(MAXN + 1);
    a[0] = 0;
    for (int i = 1; i <= MAXN; ++i) {
        int b = a[i - 1] - i;
        int c = a[i - 1] + i;
        if (b > 0 && exists.insert(b).second)
            a[i] = b;
        else {
            a[i] = c;
            exists.insert(c);
        }
    }
    return std::accumulate(std::begin(a), std::end(a), 0U);
}

template <class F>
void timer(F f) {
    auto start = std::chrono::high_resolution_clock::now();
    std::cout << f() << "\t";
    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << "Time: " << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count() << " ms\n";
}

int main() {
    timer(original);
    timer(reduced_lookup<std::set<int>>);
    timer(reduced_lookup<std::unordered_set<int>>);
}
Note how std::set and std::unordered_set provide similar enough interfaces that I've written the code as a single template that can use either type of container, then for timing just instantiated that for both set and unordered_set.
Anyway, here's some results from g++ (version 4.8.1, compiled with -O3):
212972756 Time: 137 ms
212972756 Time: 101 ms
212972756 Time: 63 ms
Changing the lookup strategy improves speed by about 30%[1], and using unordered_set with the improved lookup strategy better than doubles the speed compared to the original. Not bad, especially when the result actually looks cleaner, at least to me. You might not agree that it's cleaner looking, but I think we can at least agree that I didn't write code that was a lot longer or more complex to get the speed improvement.
[1] Simplistic analysis indicates that it should be around 25%. Specifically, if we assume there are even odds of a given number being in the set already, then this eliminates half the lookups about half the time, or about 1/4th of the lookups.
The set is a huge speedup because it's faster to look up. (Btw, exists.count(a) == 0 is prettier than using find.)
That doesn't have anything to do with vector vs array though. Adding the set to the vector version should work just as fine.
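To make that concrete, here is a rough sketch (my own, not the poster's code) of the vector version with a std::set added purely as a lookup index, using count() as suggested:

#include <iostream>
#include <set>
#include <vector>

int main() {
    const int MAXN = 500000;
    std::vector<int> sequence{0};   // the sequence itself still lives in the vector
    std::set<int> exists{0};        // the set only answers "has this value appeared?"
    for (int i = 1; i <= MAXN; i++) {
        int a = sequence[i - 1] - i;
        int b = sequence[i - 1] + i;
        int next = (a > 0 && exists.count(a) == 0) ? a : b;
        sequence.push_back(next);
        exists.insert(next);
    }
    std::cout << sequence.back() << "\n";
}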
It is a classic space-time tradeoff. When you use only the vector, your program uses minimal memory, but you have to search for existing numbers at every step, which is slow. When you use an additional index data structure (like a set in your case) you dramatically speed up your code, but your code now takes at least twice as much memory. More about the tradeoff here.

What container type provides better (average) performance than std::map?

In the following example a std::map structure is filled with 26 entries, with keys from a - z and values from 0 - 25. The time taken (on my system) to look up the last entry (10000000 times) is roughly 250 ms for the vector, and 125 ms for the map. (I compiled using release mode, with the O3 option turned on for g++ 4.4.)
But if for some odd reason I wanted better performance than the std::map, what data structures and functions would I need to consider using?
I apologize if the answer seems obvious to you, but I haven't had much experience in the performance critical aspects of C++ programming.
#include <ctime>
#include <map>
#include <vector>
#include <iostream>

struct mystruct
{
    char key;
    int value;
    mystruct(char k = 0, int v = 0) : key(k), value(v) { }
};

int find(const std::vector<mystruct>& ref, char key)
{
    for (std::vector<mystruct>::const_iterator i = ref.begin(); i != ref.end(); ++i)
        if (i->key == key) return i->value;
    return -1;
}

int main()
{
    std::map<char, int> mymap;
    std::vector<mystruct> myvec;
    for (int i = 'a'; i < 'a' + 26; ++i)
    {
        mymap[i] = i - 'a';
        myvec.push_back(mystruct(i, i - 'a'));
    }
    int pre = clock();
    for (int i = 0; i < 10000000; ++i)
    {
        find(myvec, 'z');
    }
    std::cout << "linear scan: milli " << clock() - pre << "\n";
    pre = clock();
    for (int i = 0; i < 10000000; ++i)
    {
        mymap['z'];
    }
    std::cout << "map scan: milli " << clock() - pre << "\n";
    return 0;
}
For your example, use int value(char x) { return x - 'a'; }
More generally, since the "keys" are contiguous and dense, use an array (or vector) to guarantee Θ(1) access time.
If you don't need the keys to be sorted, use unordered_map, which should turn most operations from logarithmic into (expected) constant time (i.e. O(log n) -> O(1)).
(Sometimes, especially for small data sets, a linear search is faster than a hash table (unordered_map) or a balanced binary tree (map), because the former has a much simpler algorithm, thus reducing the hidden constant in big-O. Profile, profile, profile.)
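As a small sketch of the unordered_map suggestion (my own illustration, mirroring the question's setup rather than benchmarking it):

#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<char, int> mymap;   // drop-in for std::map here, but hashed
    for (int i = 'a'; i < 'a' + 26; ++i)
        mymap[static_cast<char>(i)] = i - 'a';

    auto it = mymap.find('z');             // expected O(1) instead of O(log n)
    std::cout << (it != mymap.end() ? it->second : -1) << "\n";
}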
For starters, you should probably use std::map::find if you want to compare the search times; operator[] has additional functionality over and above the regular find.
Also, your data set is pretty small, which means that the whole vector will easily fit into the processor cache; a lot of modern processors are optimised for this sort of brute-force search so you'd end up getting fairly good performance. The map, while theoretically having better performance (O(log n) rather than O(n)) can't really exploit its advantage of the smaller number of comparisons because there aren't that many keys to compare against and the overhead of its data layout works against it.
TBH for data structures this small, the additional performance gain from not using a vector is often negligible. The "smarter" data structures like std::map come into play when you're dealing with larger amounts of data and a well distributed set of data that you are searching for.
If you really just have values for all entries from a to z, why don't you use the letter (properly adjusted) as the index into a vector?
std::vector<int> direct_map;
direct_map.resize(26);
for (int i = 'a'; i < 'a' + 26; ++i)
{
    direct_map[i - 'a'] = i - 'a';
}

// ...

int find(const std::vector<int>& direct_map, char key)
{
    int index = key - 'a';
    if (index >= 0 && index < direct_map.size())
        return direct_map[index];
    return -1;
}