I am getting totally unexpected results when comparing the real-time performance of binary search and linear search in C++ using the code below:
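#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <iostream>
using namespace std;
typedef std::chrono::microseconds us;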
int linear_search(uint64_t* val, int s, int e, uint64_t k) {
while (s < e) {
if (!less<uint64_t>()(val[s], k)) {
break;
}
++s;
}
return {s};
}
int binary_search(uint64_t* val, int s, int e, uint64_t k) {
while (s != e) {
const int mid = (s + e) >> 1;
if (less<uint64_t>()(val[mid], k)) {
s = mid + 1;
} else {
e = mid;
}
}
return {s};
}
int main() {
// Preparing data
int iter = 1000000;
int m = 1000;
uint64_t val[m];
for(int i = 0; i < m;i++) {
val[i] = rand();
}
sort(val, val + m);
uint64_t key = rand();
// Linear search time computation
auto start = std::chrono::system_clock::now();
for (int i = 0; i < iter; i++) {
linear_search(val, 0, m - 1, key);
}
auto end = std::chrono::system_clock::now();
auto elapsed_us = std::chrono::duration_cast<us>(end - start);
std::cout << "Linear search: " << m << " values "
<< elapsed_us.count() << "us\n";
// Binary search time computation
start = std::chrono::system_clock::now();
for (int i = 0; i < iter; i++) {
binary_search(val, 0, m - 1, key);
}
end = std::chrono::system_clock::now();
elapsed_us = std::chrono::duration_cast<us>(end - start);
std::cout << "Binary search: " << m <<" values "
<< elapsed_us.count() << "us\n";
}
Compiling without optimisation, I get the following output:
Linear search: 1000 values 1848621us
Binary search: 1000 values 24975us
When compiled with -O3 optimisation, I get this output:
Linear search: 1000 values 0us
Binary search: 1000 values 13424us
I understand that for small arrays binary search may be more expensive than linear search, but I can't understand why adding -O3 makes a difference of this magnitude.
I benchmarked your code with https://quick-bench.com, and binary search is much faster (measured with m = 100; it breaks for m = 1000). This is my benchmark code:
int linear_search(uint64_t* val, int s, int e, uint64_t k) {
while (s < e) {
if (!std::less<uint64_t>()(val[s], k)) {
break;
}
++s;
}
return s;
}
int binary_search(uint64_t* val, int s, int e, uint64_t k) {
while (s != e) {
const int mid = (s + e) >> 1;
if (std::less<uint64_t>()(val[mid], k)) {
s = mid + 1;
} else {
e = mid;
}
}
return s;
}
constexpr int m = 100;
uint64_t val[m];
uint64_t key = rand();
void init() {
static bool isInitialized = false;
if (isInitialized) return;
for(int i = 0; i < m;i++) {
val[i] = rand();
}
std::sort(val, val + m);
isInitialized = true;
}
static void Linear(benchmark::State& state) {
init();
for (auto _ : state) {
int result = linear_search(val, 0, m - 1, key);
benchmark::DoNotOptimize(result);
}
}
BENCHMARK(Linear);
static void Binary(benchmark::State& state) {
init();
for (auto _ : state) {
int result = binary_search(val, 0, m - 1, key);
benchmark::DoNotOptimize(result);
}
}
BENCHMARK(Binary);
and the result:
Only the code inside the for (auto _ : state) loop is benchmarked.
The compiler realizes that your linear search is a no-op (it has no side effects and its result is never used) and removes it entirely, so it takes zero time.
To fix that, take the return value of each call, add it to a running total, and print the total outside the timing block.
I need to permute N numbers between 0 and N-1 in the fastest way (on a CPU, without multi-threading, but maybe with SIMD). N is not large; in most cases N <= 12, so N! fits in a signed 32-bit integer.
What I have tried so far is roughly the following (some optimizations are omitted; my original code is in Java, but let's discuss performance in C++ or pseudo-code):
#include <random>
#include <cstdint>
#include <iostream>
static inline uint64_t rotl(const uint64_t x, int k) {
return (x << k) | (x >> (64 - k));
}
static uint64_t s[2];
uint64_t Next(void) {
const uint64_t s0 = s[0];
uint64_t s1 = s[1];
const uint64_t result = rotl(s0 + s1, 17) + s0;
s1 ^= s0;
s[0] = rotl(s0, 49) ^ s1 ^ (s1 << 21); // a, b
s[1] = rotl(s1, 28); // c
return result;
}
// Assume the array |dest| must have enough space for N items
void GenPerm(int* dest, const int N) {
for(int i=0; i<N; i++) {
dest[i] = i;
}
uint64_t random = Next();
for(int i=0; i+1<N; i++) {
const int ring = (N-i);
// I hope the compiler optimizes acquisition
// of the quotient and modulo for the same
// dividend and divisor pair into a single
// CPU instruction, at least in Java it does
const int pos = random % ring + i;
random /= ring;
const int t = dest[pos];
dest[pos] = dest[i];
dest[i] = t;
}
}
int main() {
std::random_device rd;
uint32_t* seed = reinterpret_cast<uint32_t*>(s);
for(int i=0; i<4; i++) {
seed[i] = rd();
}
int dest[20];
for(int i=0; i<10; i++) {
GenPerm(dest, 12);
for(int j=0; j<12; j++) {
std::cout << dest[j] << ' ';
}
std::cout << std::endl;
}
return 0;
}
The above is slow because the CPU's modulo operation (%) is slow. I could generate one random number between 0 and N!-1 (inclusive); this would reduce the number of modulo operations and Next() calls, but I don't know how to proceed from there. Another approach could be to replace the division with multiplication by the inverse integer, at the cost of a small bias in the generated modulos, but I don't know these inverse integers, and multiplication will probably not be much faster (bitwise operations and shifts should be faster).
Any more concrete ideas?
UPDATE: I've been asked why this is a bottleneck in the real application. I had simply posted a task that might be of interest to other people. The real task in production is:
struct Item {
uint8_t is_free_; // 0 or 1
// ... other members ...
};
Item* PickItem(const int time) {
// hash-map lookup, non-empty arrays
std::vector<std::vector<Item*>> &arrays = GetArrays(time);
Item* busy = nullptr;
for(int i=0; i<arrays.size(); i++) {
uint64_t random = Next();
for(int j=0; j+1<arrays[i].size(); j++) {
const int ring = (arrays[i].size()-j);
const int pos = random % ring + j;
random /= ring;
Item *cur = arrays[i][pos];
if(cur->is_free_) {
// Return a random free item from the first array
// where there is at least one free item
return cur;
}
arrays[i][pos] = arrays[i][j];
arrays[i][j] = cur;
}
Item* cur = arrays[i][arrays[i].size()-1];
if(cur->is_free_) {
return cur;
} else {
// Return the busy item in the last array if no free
// items are found
busy = cur;
}
}
return busy;
}
I came up with the following solution in C++ (though not very portable to Java, because Java doesn't allow parametrizing generics with a constant - in Java I had to use polymorphism, as well as a lot of code duplication):
#include <random>
#include <cstdint>
#include <iostream>
static inline uint64_t rotl(const uint64_t x, int k) {
return (x << k) | (x >> (64 - k));
}
static uint64_t s[2];
uint64_t Next(void) {
const uint64_t s0 = s[0];
uint64_t s1 = s[1];
const uint64_t result = rotl(s0 + s1, 17) + s0;
s1 ^= s0;
s[0] = rotl(s0, 49) ^ s1 ^ (s1 << 21); // a, b
s[1] = rotl(s1, 28); // c
return result;
}
template<int N> void GenPermInner(int* dest, const uint64_t random) {
// Because N is a constant, the compiler can optimize the division
// by N with more lightweight operations like shifts and additions
const int pos = random % N;
const int t = dest[pos];
dest[pos] = dest[0];
dest[0] = t;
return GenPermInner<N-1>(dest+1, random / N);
}
template<> void GenPermInner<0>(int*, const uint64_t) {
return;
}
template<> void GenPermInner<1>(int*, const uint64_t) {
return;
}
// Assume the array |dest| must have enough space for N items
void GenPerm(int* dest, const int N) {
switch(N) {
case 0:
case 1:
return;
case 2:
return GenPermInner<2>(dest, Next());
case 3:
return GenPermInner<3>(dest, Next());
case 4:
return GenPermInner<4>(dest, Next());
case 5:
return GenPermInner<5>(dest, Next());
case 6:
return GenPermInner<6>(dest, Next());
case 7:
return GenPermInner<7>(dest, Next());
case 8:
return GenPermInner<8>(dest, Next());
case 9:
return GenPermInner<9>(dest, Next());
case 10:
return GenPermInner<10>(dest, Next());
case 11:
return GenPermInner<11>(dest, Next());
case 12:
return GenPermInner<12>(dest, Next());
// You can continue with larger numbers, so long as (N!-1) fits 64 bits
default: {
const uint64_t random = Next();
const int pos = random % N;
const int t = dest[pos];
dest[pos] = dest[0];
dest[0] = t;
return GenPerm(dest+1, N-1);
}
}
}
int main() {
std::random_device rd;
uint32_t* seed = reinterpret_cast<uint32_t*>(s);
for(int i=0; i<4; i++) {
seed[i] = rd();
}
int dest[20];
const int N = 12;
// No need to init again and again
for(int j=0; j<N; j++) {
dest[j] =j;
}
for(int i=0; i<10; i++) {
GenPerm(dest, N);
// Or, if you know N at compile-time, call directly
// GenPermInner<N>(dest, Next());
for(int j=0; j<N; j++) {
std::cout << dest[j] << ' ';
}
std::cout << std::endl;
}
return 0;
}
Suppose I want to get every combination of 1's and 0's with length n. For example, if n = 3, then I want
000
001
010
011
100
101
110
111
My initial thought was to use something like:
#include <iostream>
#include <bitset>
#include <cmath>
int main() {
int n = 3;
for (int i = 0; i < pow(2, n); i++)
std::cout << std::bitset<n>(i).to_string() << '\n';
}
but this does not work because the size of a std::bitset is a template argument and must be a compile-time constant, whereas I need n to be variable (for example, if I am in a loop).
How can I do this?
A straightforward way: extract each bit using a bitwise shift operation.
#include <iostream>
int main() {
int n = 3;
for (int i = 0; i < (1 << n); i++) {
for (int j = n - 1; j >= 0; j--) {
std::cout << ((i >> j) & 1);
}
std::cout << '\n';
}
return 0;
}
Note that this method will work only if n is small enough not to cause an integer overflow (1 << n doesn't exceed INT_MAX).
To generate larger sequences, you can use recursion:
#include <iostream>
#include <string>
void printBits(int leftBits, const std::string& currentBits) {
if (leftBits <= 0) {
std::cout << currentBits << '\n';
} else {
printBits(leftBits - 1, currentBits + "0");
printBits(leftBits - 1, currentBits + "1");
}
}
int main() {
int n = 3;
printBits(n, "");
return 0;
}
C++20 format to the rescue:
#include <format>
#include <iostream>
int main()
{
int p;
while (std::cin >> p) {
std::cout << std::format("--------\n2^{}\n", p);
auto n = 1 << p;
for (int i = 0; i < n; i++) {
std::cout << std::format("{:0{}b}\n", i, p);
}
}
}
https://godbolt.org/z/5so59GGMq
Sadly, at the time of writing only MSVC supported it.
It is also possible to declare and use an Integer class with a configurable number of bits (held in a static variable), as shown below. Usage is simple:
#include "Integer.hpp"
int main (int argc, char* argn []) {
Integer::set_nbit (3);
Integer i (0);
do {
i.write (std::cout); std::cout << std::endl;
++i;
}
while (!i.is_max ());
if (i.is_max ()) {i.write (std::cout); std::cout << std::endl;}
return 0;
}
The results are as expected:
000
001
010
011
100
101
110
111
The Integer class is not very complex for now (it still needs operations other than pre-increment, such as operator= and operator==). It uses a little-endian convention internally and a big-endian convention for output, and it can easily be extended to an Integer with an arbitrary number of bits.
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <string>
class Integer {
static int nbit_;
static int nmax_;
public :
static void set_nbit (int s) {nbit_ = s; nmax_ = (1 << nbit_) - 1;} // nmax_: largest value that fits in nbit_ bits
Integer (int i = 0) : val_ (nbit_, false) {
// Little endian: bit k of i is stored at index k; bits beyond nbit_ are dropped
for (int k = 0; k < nbit_; ++k) {
val_[k] = ((i >> k) & 1) != 0;
}
}
Integer (const Integer& i) : val_ (i.val_) {
}
void operator ++ () {
auto carry ((int) 1);
std::find_if (val_.begin (), val_.end (), [&carry] (std::vector<bool>::reference b) {
if (!carry) return true;
if (b) {
b = false;
//carry continues to be 1
}
else {
b = true; carry = 0;
}
return false;
});
if (carry) exit (1);
}
operator std::string () const {
std::string str (val_.size (), '0');
auto i (str.begin ());
auto b0 ('0'), b1 ('1');
std::for_each (val_.rbegin (), val_.rend (), [&i, &b0, &b1] (const auto& b) {*i++ = b ?b1:b0;});
return str;
}
void write (std::ostream& os) const{
os << operator std::string ();
}
bool is_max () const {
auto i (val_.begin ());
i = std::find_if (val_.begin (), val_.end (), [] (const auto& b) {return !b;});
if (i == val_.end ()) return true;
return false;
}
//operators == (string), < (int), < (Integer), =, == TO be written
private :
std::vector<bool> val_;
};
int Integer::nmax_ (0);
int Integer::nbit_ (0);
Here is my function for generating combinations.
For example, the combinations of "ABCD" taken 2 at a time are AB AC AD BC BD CD.
In the future, I will do some operations with each combination (not just print it).
Is there a way to improve the performance of this code?
#include "stdafx.h"
#include <iostream>
#include <vector>
void display(std::vector<char> v, int* indices, int r)//f() to display combinations
{
for (int i = 0; i < r; ++i)
std::cout << v[indices[i]];
std::cout << std::endl;
}
void combinations(std::vector<char> v, int n, int r, void(*f)(std::vector<char>, int*, int))
{
int* indices = new int[r];
for (int i = 0; i < r; ++i)
indices[i] = i;
int count;
bool b;
f(v, indices, r);
while (true)
{
b = true;
for (count = r - 1; count >= 0; --count)
{
if (indices[count] != count + n - r)
{
b = false;
break;
}
}
if (b)
break;
++indices[count];
for (int i = count + 1; i < r; ++i)
indices[i] = indices[i - 1] + 1;
f(v, indices, r);
}
delete[] indices;
}
int _tmain(int argc, _TCHAR* argv[])
{
std::vector<char> v(4);//pool
v[0] = 'A';
v[1] = 'B';
v[2] = 'C';
v[3] = 'D';
int n = 4;// pool size
int r = 2;// length of each combination
combinations(v, n, r, display);// pool, pool size, len of combination, func for each combination
return 0;
}
This may not improve performance, but readability is also important.
Here is a recursive solution:
http://cpp.sh/2jimb
#include <iostream>
#include <string>
typedef unsigned int uint;
typedef std::string::iterator seq_iterator;
std::string FindCombinations(seq_iterator current, seq_iterator end, uint comb_length, std::string comb)
{
if (comb_length == 0 || comb_length > std::distance(current, end))
return comb;//no more possible combinations
auto next = current + 1;
std::string with;
if (std::distance(next, end) >= comb_length - 1)
with = FindCombinations(next, end, comb_length - 1, comb + *current);//use current in combination
std::string without;
if (std::distance(next, end) >= comb_length)
without = FindCombinations(next, end, comb_length, comb);//skip current in combination
std::string result = with;
if (!result.empty() && !without.empty())
result += ' ';
return result + without;
}
int main()
{
std::string seq = "ABCDE";
std::cout << FindCombinations(seq.begin(), seq.end(), 2, "") << std::endl;
}
I am trying to implement the build_max_heap function that creates a heap (as described in Cormen's "Introduction to Algorithms"), but I am getting a strange error that I cannot locate. My program successfully fills the table with random numbers and shows them, but after build_max_heap() I get strange numbers, probably because my program accesses something outside the table; I cannot find where. I would be glad for any help.
For example I get the table:
0 13 18 0 22 15 24 19 5 23
And my output is:
24 7 5844920 5 22 15 18 19 0 23
My code:
#include <iostream>
#include <ctime>
#include <stdlib.h>
const int n = 12; // the length of my table; I will only use indexes 1...n-1
struct heap
{
int *tab;
int heap_size;
};
void complete_with_random(heap &heap2)
{
srand(time(NULL));
for (int i = 1; i <= heap2.heap_size; i++)
{
heap2.tab[i] = rand() % 25;
}
heap2.tab[0] = 0;
}
void show(heap &heap2)
{
for (int i = 1; i < heap2.heap_size; i++)
{
std::cout << heap2.tab[i] << " ";
}
}
int parent(int i)
{
return i / 2;
}
int left(int i)
{
return 2 * i;
}
int right(int i)
{
return 2 * i + 1;
}
void max_heapify(heap &heap2, int i)
{
if (i >= heap2.heap_size || i == 0)
{
return;
}
int l = left(i);
int r = right(i);
int largest;
if (l <= heap2.heap_size || heap2.tab[l] > heap2.tab[i])
{
largest = l;
}
else
{
largest = i;
}
if (r <= heap2.heap_size || heap2.tab[r] > heap2.tab[i])
{
largest = r;
}
if (largest != i)
{
std::swap(heap2.tab[i], heap2.tab[largest]);
max_heapify(heap2, largest);
}
}
void build_max_heap(heap &heap2)
{
for (int i = heap2.heap_size / 2; i >= 1; i--)
{
max_heapify(heap2, i);
}
}
int main()
{
heap heap1;
heap1.tab = new int[n];
heap1.heap_size = n - 1;
complete_with_random(heap1);
show(heap1);
std::cout << std::endl;
build_max_heap(heap1);
show(heap1);
}
Indeed, the table is accessed with out-of-bounds indexes.
if (l <= heap2.heap_size || heap2.tab[l] > heap2.tab[i])
                         ^^
I think you meant && in this condition.
The same for the next branch with r.
In case you're still having problems, below is my own implementation, which you might use for reference. It is also based on the book by Cormen et al., so it uses more or less the same terminology. It accepts arbitrary types for the container, the comparison, and the swap function. It provides a public queue-like interface, including key incrementing.
Because it's part of a larger software collection, it uses a few entities that are not defined here, but I hope the algorithms are still clear. CHECK is only an assertion mechanism, so you can ignore it. You may also ignore the swap member and just use std::swap.
Some parts of the code are using 1-based offsets, others 0-based, and conversion is necessary. The comments above each method indicate this.
template <
typename T,
typename ARRAY = array <T>,
typename COMP = fun::lt,
typename SWAP = fun::swap
>
class binary_heap_base
{
protected:
ARRAY a;
size_t heap_size;
SWAP swap_def;
SWAP* swap;
// 1-based
size_t parent(const size_t n) { return n / 2; }
size_t left (const size_t n) { return n * 2; }
size_t right (const size_t n) { return n * 2 + 1; }
// 1-based
void heapify(const size_t n = 1)
{
T& x = a[n - 1];
size_t l = left(n);
size_t r = right(n);
size_t select =
(l <= heap_size && COMP()(x, a[l - 1])) ?
l : n;
if (r <= heap_size && COMP()(a[select - 1], a[r - 1]))
select = r;
if (select != n)
{
(*swap)(x, a[select - 1]);
heapify(select);
}
}
// 1-based
void build()
{
heap_size = a.length();
for (size_t n = heap_size / 2; n > 0; n--)
heapify(n);
}
// 1-based
size_t advance(const size_t k)
{
size_t n = k;
while (n > 1)
{
size_t pn = parent(n);
T& p = a[pn - 1];
T& x = a[n - 1];
if (!COMP()(p, x)) break;
(*swap)(p, x);
n = pn;
}
return n;
}
public:
binary_heap_base() { init(); set_swap(); }
binary_heap_base(SWAP& s) { init(); set_swap(s); }
binary_heap_base(const ARRAY& a) { init(a); set_swap(); }
binary_heap_base(const ARRAY& a, SWAP& s) { init(a); set_swap(s); }
void init() { a.init(); build(); }
void init(const ARRAY& a) { this->a = a; build(); }
void set_swap() { swap = &swap_def; }
void set_swap(SWAP& s) { swap = &s; }
bool empty() { return heap_size == 0; }
size_t size() { return heap_size; }
size_t length() { return heap_size; }
void reserve(const size_t len) { a.reserve(len); }
const T& top()
{
CHECK (heap_size != 0, eshape());
return a[0];
}
T pop()
{
CHECK (heap_size != 0, eshape());
T x = a[0];
(*swap)(a[0], a[heap_size - 1]);
heap_size--;
heapify();
return x;
}
// 0-based
size_t up(size_t n, const T& x)
{
CHECK (n < heap_size, erange());
CHECK (!COMP()(x, a[n]), ecomp());
a[n] = x;
return advance(n + 1) - 1;
}
// 0-based
size_t push(const T& x)
{
if (heap_size == a.length())
a.push_back(x);
else
a[heap_size] = x;
return advance(++heap_size) - 1;
}
};
When I run this, it prints a string of numbers and letters (an address, I'm guessing). Where have I gone wrong? I am trying to display the highest and lowest numbers.
intArray is a 1D array of 10 numbers and SIZE = 10.
void greatAndSmall(int intsAray[], const int SZ, int greatAdd, int smallAdd) //def func
{
while (x < SZ)
{
if (intsAray[x] > greatAdd)
greatAdd = intsAray[x];
else
break;
if (intsAray[x] < smallAdd)
smallAdd = intsAray[x];
else
break;
x = x + 1;
}
}
greatAndSmall(intArray, SIZE, &great, &small); //IN MAIN FUNC
cout << "The smallest of these numbers is: " << small << "\n"; //display smallest
cout << "The largest of these numbers is: " << great; //display greatest
Your code, as written, is not valid C/C++ and won't compile. It also has logical problems and wouldn't work even if it did compile (the break statements are unnecessary and end the loop prematurely).
Just use this code:
void greatAndSmall (int intsArray [], int sz, int * largest, int * smallest)
{
if (sz < 1) return;
*largest = *smallest = intsArray[0];
for (int i = 1; i < sz; ++i)
{
if (intsArray[i] > *largest) *largest = intsArray[i];
if (intsArray[i] < *smallest) *smallest = intsArray[i];
}
}
IMPORTANT NOTE: This is C code. Do not for one second think that, just because you use cout, this can be counted as C++ code.
Just for comparison, this is how one might write this in C++:
// Largest value in first and smallest value in second
std::pair<int, int> greatAndSmall (std::vector<int> const & c)
{
if (c.empty()) return {};
std::pair<int, int> ret (c[0], c[0]);
for (unsigned i = 1; i < c.size(); ++i)
{
if (c[i] > ret.first) ret.first = c[i];
if (c[i] < ret.second) ret.second = c[i];
}
return ret;
}
or this more general (and admittedly more complex) version:
template<typename C>
auto greatAndSmall (C const & c)
-> std::pair<typename C::value_type, typename C::value_type>
{
if (c.empty()) return {};
auto ret = std::make_pair(*c.begin(), *c.begin());
for (auto const & v : c)
{
if (v > ret.first) ret.first = v;
if (v < ret.second) ret.second = v;
}
return ret;
}