I want to compute the GCD of pairs of integers and count the results. I find that the time-consuming part is not calculating the GCD but saving the result to the map. Am I using std::map in a bad way?
#include <map>
#include <iostream>
#include <chrono>
#include "timer.h"
using namespace std;
int gcd(int a, int b)
{
    while (b != 0)
    {
        int temp = a % b;
        a = b;
        b = temp;
    }
    return a;
}
int main() {
    map<int, int> res;
    {
        Timer timer;
        for (int i = 1; i < 10000; i++)
        {
            for (int j = 2; j < 10000; j++)
                res[gcd(i, j)]++;
        }
    }
    {
        Timer timer;
        for (int i = 1; i < 10000; i++)
        {
            for (int j = 2; j < 10000; j++)
                gcd(i, j);
        }
    }
}
6627099us(6627.1ms)
0us(0ms)
You should use a real benchmarking library to test this kind of code. In your particular case, the second loop, where you discard the results of gcd, was probably optimized away entirely. With Quick Bench I see not much difference between running just the algorithm and storing the results in a std::map or std::unordered_map. I used randomized integers for testing, which is maybe not ideal for a GCD algorithm, but you can try other approaches.
Code under benchmark without storage:
constexpr int N = 10000;
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> distrib(1, N);
benchmark::DoNotOptimize(gcd(distrib(gen), distrib(gen)));
and with storage:
benchmark::DoNotOptimize(res[gcd(distrib(gen), distrib(gen))]++);
Results:
You are using std::map correctly. However, you are using an inefficient container for your problem. Given that the possible values of gcd(x,y) are bounded by N, a std::vector would be the most efficient container to store the results.
Specifically,
int main() {
    const int N = 10'000;
    std::vector<int> res(N, 0); // initialize to N elements with value 0
    ...
}
Using parallelism will speed up the program even further. Each thread would have its own std::vector to compute local results. Once a thread is finished, its results would be added to the shared result vector in a thread-safe manner (e.g. using a std::mutex), as shown in the sketch below.
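For illustration, a minimal sketch of that scheme with two threads; the gcd() is the question's (condensed to one line), and the two-way split of the i range is arbitrary:

#include <mutex>
#include <thread>
#include <vector>

// Same Euclidean gcd as in the question, condensed to one line.
int gcd(int a, int b) { while (b != 0) { int t = a % b; a = b; b = t; } return a; }

int main() {
    const int N = 10'000;
    std::vector<int> res(N, 0); // all gcd values are < N
    std::mutex m;

    auto worker = [&](int lo, int hi) {
        std::vector<int> local(N, 0); // thread-local counts, no locking needed
        for (int i = lo; i < hi; i++)
            for (int j = 2; j < N; j++)
                local[gcd(i, j)]++;
        std::lock_guard<std::mutex> lock(m); // merge into the shared result
        for (int g = 0; g < N; g++)
            res[g] += local[g];
    };

    std::thread t1(worker, 1, N / 2);
    std::thread t2(worker, N / 2, N);
    t1.join();
    t2.join();
}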
I was trying to solve this question, but codechef.com says the answer is wrong.
#include <iostream>
#include <cmath>
using namespace std;

int main()
{
    int t, n, diff, mindiff;
    cin >> t;
    cin >> n;
    int val[n];
    while (t--)
    {
        mindiff = 1000000000;
        for (int i = 0; i < n; i++)
        {
            cin >> val[i];
        }
        int a = 0;
        for (a = 0; a < n; a++)
        {
            for (int b = a + 1; b < n; b++)
            {
                diff = abs(val[a] - val[b]);
                if (diff <= mindiff)
                {
                    mindiff = diff;
                }
            }
        }
        cout << mindiff << endl;
    }
    return 0;
}
The results are as expected (at least for the tests I did), but the website says it's wrong.
There are a few things in your code that you should change:
Use std::vector<int> and not variable-length arrays (VLAs):
Reasons:
Variable length arrays are not standard C++. A std::vector is standard C++.
Variable length arrays may exhaust stack memory if the number of entries is large. A std::vector gets its memory from the heap, not the stack.
Variable length arrays suffer from the same problem as regular arrays -- going beyond the bounds of the array leads to undefined behavior. A std::vector has an at() member function that can check boundary access when desired.
Use std::numeric_limits to get the maximum integer value.
Instead of
mindiff = 1000000000;
it should be:
#include <limits>
//...
int mindiff = std::numeric_limits<int>::max();
As to the solution you chose, the comments in the main section about the nested loop should be addressed.
Instead of a nested for loop, you should sort the data first. After sorting, the minimum difference must occur between adjacent elements, so it can be found in a single pass with lower time complexity.
The program can look something like this (using the data provided at the link):
#include <iostream>
#include <vector>
#include <limits>
#include <algorithm>

int main()
{
    int n = 5;
    std::vector<int> val = {4, 9, 1, 32, 13};
    int mindiff = std::numeric_limits<int>::max();
    std::sort(val.begin(), val.end());
    for (int a = 0; a < n - 1; a++)
        mindiff = std::min(val[a + 1] - val[a], mindiff);
    std::cout << mindiff;
}
Output:
3
To do this you can use a simple for loop:
// You already have an array called "arr" which contains some numbers.
int biggestNumber = arr[0]; // start from the first element so negative values also work
for (int i = 0; i < arr.size(); i++) {
    if (arr[i] > biggestNumber) {
        biggestNumber = arr[i];
    }
}
arr.size() will get the array's length, so you can check every value from position 0 to the last one, which is arr.size() - 1 (because arrays are 0-based in C++).
Hope this helps.
I'm running the following program:
#include <iostream>
#include <vector>
#include <cmath>
#include <cstdlib>
#include <chrono>
using namespace std;

const int N = 200;        // Number of tests.
const int M = 2000000;    // Number of pseudo-random values generated per test.
const int VALS = 2;       // Number of possible values (values from 0 to VALS-1).
const int ESP = M / VALS; // Expected number of appearances of each value per test.

int main() {
    for (int i = 0; i < N; ++i) {
        unsigned seed = chrono::system_clock::now().time_since_epoch().count();
        srand(seed);
        vector<int> hist(VALS, 0);
        for (int j = 0; j < M; ++j) ++hist[rand() % VALS];
        int Y = 0;
        for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
        cout << Y << endl;
    }
}
This program performs N tests. In each test we generate M numbers between 0 and VALS-1 while we keep counting their appearances in a histogram. Finally, we accumulate in Y the errors, which correspond to the difference between each value of the histogram and the expected value. Since the numbers are generated randomly, each of them would ideally appear M/VALS times per test.
After running my program I analysed the resulting data (i.e., the 200 values of Y) and realised that some things were happening which I cannot explain. I saw that, if the program is compiled with vc++ and given some N and VALS (N = 200 and VALS = 2 in this case), we get different data patterns for different values of M. For some tests the resulting data follows a normal distribution, and for some tests it doesn't. Moreover, these two kinds of results seem to alternate as M (the number of pseudo-random values generated in each test) increases:
M = 10K, data is not normal:
M = 100K, data is normal:
and so on:
As you can see, depending on the value of M the resulting data follows a normal distribution or otherwise follows a non-normal distribution (bimodal, dog food or kind of uniform) in which more extreme values of Y have greater presence.
This diversity of results doesn't occur if we compile the program with other C++ compilers (gcc and clang). In this case, it looks like we always obtain a half-normal distribution of Y values:
What are your thoughts on this? What is the explanation?
I carried out the tests through this online compiler: http://rextester.com/l/cpp_online_compiler_visual
The program will generate poorly distributed random numbers (neither uniform nor independent).
The function rand is a notoriously poor one.
The use of the remainder operator % to bring the numbers into range effectively discards all but the low-order bits.
The RNG is re-seeded every time through the loop.
[edit] I just noticed const int ESP = M / VALS;. You want a floating-point number there instead.
Try the code below and report back. Using the new <random> is a little tedious. Many people write some small library code to simplify its use.
#include <iostream>
#include <vector>
#include <cmath>
#include <random>
#include <chrono>
using namespace std;

const int N = 200;                   // Number of tests.
const int M = 2000000;               // Number of pseudo-random values generated per test.
const int VALS = 2;                  // Number of possible values (values from 0 to VALS-1).
const double ESP = (1.0 * M) / VALS; // Expected number of appearances of each value per test.

static std::default_random_engine engine;

static void seed() {
    std::random_device rd;
    engine.seed(rd());
}

static int rand_int(int lo, int hi) {
    std::uniform_int_distribution<int> dist(lo, hi - 1);
    return dist(engine);
}

int main() {
    seed();
    for (int i = 0; i < N; ++i) {
        vector<int> hist(VALS, 0);
        for (int j = 0; j < M; ++j) ++hist[rand_int(0, VALS)];
        int Y = 0;
        for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
        cout << Y << endl;
    }
}
I am trying to output 9 random non-repeating numbers. This is what I've been trying to do:
#include <iostream>
#include <cmath>
#include <vector>
#include <ctime>
using namespace std;

int main() {
    srand(time(0));
    vector<int> v;
    for (int i = 0; i < 4; i++) {
        v.push_back(rand() % 10);
    }
    for (int j = 0; j < 4; j++) {
        for (int m = j + 1; m < 4; m++) {
            while (v[j] == v[m]) {
                v[m] = rand() % 10;
            }
        }
        cout << v[j];
    }
}
However, I often get repeating numbers. Any help would be appreciated. Thank you.
With a true random number generator, the probability of drawing a particular number is not conditional on any previous numbers drawn. I'm sure you've rolled the same number twice with dice, for example.
rand(), which roughly approximates a true generator, will therefore sometimes give you back the same number, perhaps even consecutively; your use of % 10 further exacerbates this.
If you don't want repeats, then instantiate a vector containing all the numbers you want potentially, then shuffle them. std::shuffle can help you do that.
See http://en.cppreference.com/w/cpp/algorithm/random_shuffle
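For example, a minimal sketch, assuming the goal is 9 distinct digits drawn from 0-9:

#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    std::vector<int> v(10);
    std::iota(v.begin(), v.end(), 0);         // fill with 0, 1, ..., 9
    std::mt19937 gen(std::random_device{}()); // seeded engine required by std::shuffle
    std::shuffle(v.begin(), v.end(), gen);    // random order, so repeats are impossible
    for (int i = 0; i < 9; i++)
        std::cout << v[i];
}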
When j=0, you'll be checking it against m={1, 2, 3}.
But when j=1, you'll be checking it against just m={2, 3}.
You are not checking it against the 0th index again, so a reassigned value can collide with an earlier element. That is where you might be getting repetitions.
Also, note that to reduce the chance of repeated draws, you could increase the range of the random values, say to 100.
Please look at the following code to get distinct random values by constantly checking the used values in a std::set:
#include <iostream>
#include <vector>
#include <set>
#include <cstdlib> // for rand()

int main() {
    int n = 4;
    std::vector<int> values(n);
    std::set<int> used_values;
    for (int i = 0; i < n; i++) {
        int temp = rand() % 10;
        while (used_values.find(temp) != used_values.end())
            temp = rand() % 10;
        values[i] = temp;
        used_values.insert(temp); // remember the value so it cannot repeat
    }
    for (int i = 0; i < n; i++)
        std::cout << values[i] << std::endl;
    return 0;
}
Here is my code; my unordered_map and map are behaving the same and taking the same time to execute. Am I missing something about these data structures?
Update: I've changed my code based on the answers and comments below. I've removed the string operations to reduce their impact on the profile. Also, I am now only measuring find(), which takes almost 40% of the CPU time in my code. The profile shows that unordered_map is 3 times faster; however, is there any other way to make this code faster?
#include <map>
#include <unordered_map>
#include <stdio.h>
#include <stdlib.h> // for rand()
#include <time.h>   // for clock()

struct Property {
    int a;
};

int main() {
    printf("Performance Summary:\n");
    static const unsigned long num_iter = 999999;

    std::unordered_map<int, Property> myumap;
    for (int i = 0; i < 10000; i++) {
        int ind = rand() % 1000;
        Property p;
        p.a = i;
        myumap.insert(std::pair<int, Property>(ind, p));
    }

    clock_t tStart = clock();
    for (int i = 0; i < num_iter; i++) {
        int ind = rand() % 1000;
        std::unordered_map<int, Property>::iterator itr = myumap.find(ind);
    }
    printf("Time taken unordered_map: %.2fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);

    std::map<int, Property> mymap;
    for (int i = 0; i < 10000; i++) {
        int ind = rand() % 1000;
        Property p;
        p.a = i;
        mymap.insert(std::pair<int, Property>(ind, p));
    }

    tStart = clock();
    for (int i = 0; i < num_iter; i++) {
        int ind = rand() % 1000;
        std::map<int, Property>::iterator itr = mymap.find(ind);
    }
    printf("Time taken map: %.2fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);
}
The output is:
Performance Summary:
Time taken unordered_map: 0.12s
Time taken map: 0.36s
Without going into your code, I would make a few general comments.
What exactly are you measuring? Your profiling includes both populating and scanning the data structures. Given that (presumably) populating an ordered map takes longer, measuring both obscures any gains (or otherwise) in lookup. Figure out what you are measuring and measure just that.
You also have a lot going on in the code that is probably incidental to what you are profiling: there is a lot of object creation, string concatenation, etc. This is probably what you are actually measuring. Focus on profiling only what you want to measure (see point 1).
10,000 cases is way too small. At this scale other considerations can overwhelm what you are measuring, particularly when you are measuring everything.
There is a reason we like getting minimal, complete and verifiable examples. Here's my code:
#include <map>
#include <unordered_map>
#include <stdio.h>
#include <stdlib.h> // for rand()
#include <time.h>   // for clock()

struct Property {
    int a;
};

static const unsigned long num_iter = 100000;

int main() {
    printf("Performance Summary:\n");

    clock_t tStart = clock();
    std::unordered_map<int, Property> myumap;
    for (int i = 0; i < num_iter; i++) {
        int ind = rand() % 1000;
        Property p;
        //p.fileName = "hello" + to_string(i) + "world!";
        p.a = i;
        myumap.insert(std::pair<int, Property>(ind, p));
    }
    for (int i = 0; i < num_iter; i++) {
        int ind = rand() % 1000;
        myumap.find(ind);
    }
    printf("Time taken unordered_map: %.2fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);

    tStart = clock();
    std::map<int, Property> mymap;
    for (int i = 0; i < num_iter; i++) {
        int ind = rand() % 1000;
        Property p;
        //p.fileName = "hello" + to_string(i) + "world!";
        p.a = i;
        mymap.insert(std::pair<int, Property>(ind, p));
    }
    for (int i = 0; i < num_iter; i++) {
        int ind = rand() % 1000;
        mymap.find(ind);
    }
    printf("Time taken map: %.2fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);
}
Run time is:
Performance Summary:
Time taken unordered_map: 0.04s
Time taken map: 0.07s
Please note that I am running 10 times the number of iterations you were running.
I suspect there are two problems with your version. The first is that you are running too few iterations for it to make a difference. The second is that you are doing expensive string operations inside the counted loop. The time it takes to run the string operations is greater than the time saved by using the unordered map, hence you are not seeing the difference in performance.
Whether a tree (std::map) or a hash map (std::unordered_map) is faster really depends on the number of entries and the characteristics of the key (the variability of the values, the compare and hashing functions, etc.)
But in theory, a tree is slower than a hash map because insertion and searching inside a binary tree is O(log2(N)) complexity while insertion and searching inside a hash map is roughly O(1) complexity.
Your test didn't show it because:
You call rand() in a loop. That takes ages in comparison with the map insertion. And it generates different values for the two maps you're testing, skewing results even further. Use a lighter-weight generator e.g. a minstd LCG.
You need a higher resolution clock and more iterations so that each test run takes at least a few hundred milliseconds.
You need to make sure the compiler does not reorder your code so the timing calls happen where they should. This is not always easy. A memory fence around the timed test usually helps to solve this.
Your find() calls have a high probability of being optimized away since you're not using their value (I just happen to know that at least GCC in -O2 mode doesn't do that, so I leave it as is).
String concatenation is also very slow in comparison.
Here's my updated version:
#include <atomic>
#include <chrono>
#include <cstdio> // for printf
#include <iostream>
#include <map>
#include <random>
#include <string>
#include <unordered_map>
using namespace std;
using namespace std::chrono;

struct Property {
    string fileName;
};

const int nIter = 1000000;

template<typename MAP_TYPE>
long testMap() {
    std::minstd_rand rnd(12345);
    std::uniform_int_distribution<int> testDist(0, 1000);

    auto tm1 = high_resolution_clock::now();
    atomic_thread_fence(memory_order_seq_cst);

    MAP_TYPE mymap;
    for (int i = 0; i < nIter; i++) {
        int ind = testDist(rnd);
        Property p;
        p.fileName = "hello" + to_string(i) + "world!";
        mymap.insert(pair<int, Property>(ind, p));
    }

    atomic_thread_fence(memory_order_seq_cst);

    for (int i = 0; i < nIter; i++) {
        int ind = testDist(rnd);
        mymap.find(ind);
    }

    atomic_thread_fence(memory_order_seq_cst);
    auto tm2 = high_resolution_clock::now();

    return (long)duration_cast<milliseconds>(tm2 - tm1).count();
}

int main()
{
    printf("Performance Summary:\n");
    printf("Time taken unordered_map: %ldms\n", testMap<unordered_map<int, Property>>());
    printf("Time taken map: %ldms\n", testMap<map<int, Property>>());
}
Compiled with -O2, it gives the following results:
Performance Summary:
Time taken unordered_map: 348ms
Time taken map: 450ms
So using unordered_map in this particular case is faster by ~20-25%.
It's not just the lookup that's faster with an unordered_map. This slightly modified test also compares the fill times.
I have made a couple of modifications:
increased sample size
both maps now use the same sequence of random numbers.
#include <algorithm> // for generate_n
#include <map>
#include <unordered_map>
#include <vector>
#include <stdio.h>
#include <stdlib.h> // for rand()
#include <time.h>   // for clock()

struct Property {
    int a;
};

// Iterator adaptor that presents a vector<int> iterator as an iterator
// over key/value pairs suitable for range insertion into a map.
struct make_property : std::vector<int>::const_iterator
{
    using base_class = std::vector<int>::const_iterator;
    using value_type = std::pair<const base_class::value_type, Property>;
    using base_class::base_class;

    decltype(auto) get() const {
        return base_class::operator*();
    }

    value_type operator*() const
    {
        return std::pair<const int, Property>(get(), Property());
    }
};

int main() {
    printf("Performance Summary:\n");
    static const unsigned long num_iter = 9999999;

    std::vector<int> keys;
    keys.reserve(num_iter);
    std::generate_n(std::back_inserter(keys), num_iter, [](){ return rand() / 10000; });

    auto time = [](const char* message, auto&& func)
    {
        clock_t tStart = clock();
        func();
        clock_t tEnd = clock();
        printf("%s: %.2gs\n", message, double(tEnd - tStart) / CLOCKS_PER_SEC);
    };

    std::unordered_map<int, Property> myumap;
    time("fill unordered map", [&]
    {
        myumap.insert(make_property(keys.cbegin()),
                      make_property(keys.cend()));
    });

    std::map<int, Property> mymap;
    time("fill ordered map", [&]
    {
        mymap.insert(make_property(keys.cbegin()),
                     make_property(keys.cend()));
    });

    time("find in unordered map", [&]
    {
        for (auto k : keys) { myumap.find(k); }
    });

    time("find in ordered map", [&]
    {
        for (auto k : keys) { mymap.find(k); }
    });
}
example output:
Performance Summary:
fill unordered map: 3.5s
fill ordered map: 7.1s
find in unordered map: 1.7s
find in ordered map: 5s
I have an algorithm that picks out and sums specific elements from a 2-dimensional array, implemented with the following two-fold loop:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <cmath>
#include <cstdlib> // for srand, rand
#include <ctime>   // for time
using namespace std;

int main(int argc, char* argv[])
{
    double data[2000][200];
    double result[200];
    int len[200];
    vector< vector<int> > index;

    srand(time(NULL));

    // initialize data here
    for (int i = 0; i < 2000; i++)
        for (int j = 0; j < 200; j++)
            data[i][j] = rand();

    // each index element contains some indices for some elements in data;
    // each index element might have a different length.
    // len[i] tells the size of the vector at index[i]
    for (int i = 0; i < 200; i++)
    {
        vector<int> c;
        len[i] = (int)(rand() % 100 + 1);
        c.reserve(len[i]);
        for (int j = 0; j < len[i]; j++)
        {
            int coord = (int)(rand() % (200 * 2000));
            c.push_back(coord);
        }
        index.push_back(c);
    }

    for (int i = 0; i < 200; i++)
    {
        double acc = 0.0;
        for (int j = 0; j < index[i].size(); j++)
            acc += *(&data[0][0] + (int)(index[i][j]));
        result[i] = acc;
    }

    return 0;
}
Since this algorithm will be applied to a big array, the two-fold loop might take quite a long time to execute. I am wondering whether an STL algorithm would help in this case, but the STL is still too abstract for me and I don't know how to use it in a two-fold loop. Any suggestion or idea is more than welcome.
Following other posts and information I found online, I am trying to use for_each to solve the issue:
double sum = 0.0;

void sumElements(const int& n)
{
    sum += *(&data[0][0] + n);
}

void addVector(const vector<int>& coords)
{
    for_each(coords.begin(), coords.end(), sumElements);
}

for_each(index.begin(), index.end(), addVector);
But this code has two issues. First, it doesn't compile at void sumElements(const int& n); many error messages come out. Second, even if it worked it wouldn't store the result in the right place. For the first for_each, my intention is to enumerate each index entry, calculate the corresponding sum, and store it in the corresponding element of the result array.
First off, STL is not going to give you magic performance benefits.
There already is a std::accumulate, which is easier than building your own. Probably not faster, though. Similarly, there's std::generate_n, which calls a generator (such as &rand) N times.
You fully populate c before you call index.push_back(c);, which copies the vector. It may be cheaper to push an empty vector first and then fill it through std::vector<int>& c = index.back();.
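As an illustration of the std::accumulate point, a minimal sketch of the question's summing loop; data, index, and result are the arrays from the question's main():

#include <numeric> // std::accumulate

// Sketch only: assumes data, index and result as declared in the question.
for (int i = 0; i < 200; i++)
{
    result[i] = std::accumulate(
        index[i].begin(), index[i].end(), 0.0,
        [&](double acc, int coord) {
            // same flattened-index lookup as the original inner loop
            return acc + *(&data[0][0] + coord);
        });
}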