I want to sort a large array of integers (say 1 million elements) lexicographically.
Example:
input [] = { 100, 21 , 22 , 99 , 1 , 927 }
sorted[] = { 1 , 100, 21 , 22 , 927, 99 }
I have done it using the simplest possible method:
convert all numbers to strings (very costly because it will take huge memory)
use std::sort with strcmp as comparison function
convert back the strings to integers
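The three steps above can be sketched roughly as follows. This is only an illustration of the baseline, not the exact code I ran: the function name `sort_via_strings` is made up for this example, and it uses `std::string`'s built-in `operator<` in place of `strcmp`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Baseline: sort integers by their decimal string representation.
// std::string's operator< already compares lexicographically.
std::vector<int> sort_via_strings(const std::vector<int>& input)
{
    std::vector<std::string> text;
    text.reserve(input.size());
    for (int v : input) text.push_back(std::to_string(v));   // step 1: to strings

    std::sort(text.begin(), text.end());                     // step 2: sort strings

    std::vector<int> result;
    result.reserve(text.size());
    for (const std::string& s : text) result.push_back(std::stoi(s)); // step 3: back to int
    assert(result.size() == input.size());
    return result;
}
```

The extra memory is the vector of strings, which is what makes this costly for a million elements.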
Is there a better method than this?
Use std::sort() with a suitable comparison function. This cuts down on the memory requirements.
The comparison function can use n % 10, n / 10 % 10, n / 100 % 10 etc. to access the individual digits (for positive integers; negative integers work a bit differently).
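A minimal sketch of such a comparison function, assuming non-negative `int` values. The name `digit_less` is invented for this example; it peels off digits with `% 10` and `/ 10` and then compares them from the most significant end.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <limits>

// Compare two non-negative ints by their decimal digits, left to right.
bool digit_less(int a, int b)
{
    assert(a >= 0 && b >= 0);
    constexpr int max_digits = std::numeric_limits<int>::digits10 + 1;
    int da[max_digits], db[max_digits];
    int na = 0, nb = 0;
    // Extract digits, least significant first.
    do { da[na++] = a % 10; a /= 10; } while (a != 0);
    do { db[nb++] = b % 10; b /= 10; } while (b != 0);
    // Walk both arrays from the most significant digit; a proper
    // prefix (e.g. 1 versus 100) compares as smaller.
    return std::lexicographical_compare(
        std::reverse_iterator<int*>(da + na), std::reverse_iterator<int*>(da),
        std::reverse_iterator<int*>(db + nb), std::reverse_iterator<int*>(db));
}
```

It would be used as `std::sort(first, last, digit_less);` and needs only two small stack arrays per comparison.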
To provide any custom sort ordering, you can provide a comparator to std::sort. In this case, it's going to be somewhat complex, using logarithms to inspect individual digits of your number in base 10.
Here's an example — comments inline describe what's going on.
#include <iostream>
#include <algorithm>
#include <cmath>
#include <cassert>
int main() {
int input[] { 100, 21, 22, 99, 1, 927, -50, -24, -160 };
/**
* Sorts the array lexicographically.
*
* The trick is that we have to compare digits left-to-right
* (considering typical Latin decimal notation) and that each of
* two numbers to compare may have a different number of digits.
*
* This is very efficient in storage space, but inefficient in
* execution time; an approach that pre-visits each element and
* stores a translated representation will at least double your
* storage requirements (possibly a problem with large inputs)
* but require only a single translation of each element.
*/
std::sort(
std::begin(input),
std::end(input),
[](int lhs, int rhs) -> bool {
// Returns true if lhs < rhs
// Returns false otherwise
const auto BASE = 10;
const bool LHS_FIRST = true;
const bool RHS_FIRST = false;
const bool EQUAL = false;
// There's no point in doing anything at all
// if both inputs are the same; strict-weak
// ordering requires that we return `false`
// in this case.
if (lhs == rhs) {
return EQUAL;
}
// Compensate for sign
if (lhs < 0 && rhs < 0) {
// When both are negative, sign on its own yields
// no clear ordering between the two arguments.
//
// Remove the sign and continue as for positive
// numbers.
lhs *= -1;
rhs *= -1;
}
else if (lhs < 0) {
// When the LHS is negative but the RHS is not,
// consider the LHS "first" always as we wish to
// prioritise the leading '-'.
return LHS_FIRST;
}
else if (rhs < 0) {
// When the RHS is negative but the LHS is not,
// consider the RHS "first" always as we wish to
// prioritise the leading '-'.
return RHS_FIRST;
}
// Counting the number of digits in both the LHS and RHS
// arguments is *almost* trivial.
const auto lhs_digits = (
lhs == 0
? 1
: std::ceil(std::log(lhs+1)/std::log(BASE))
);
const auto rhs_digits = (
rhs == 0
? 1
: std::ceil(std::log(rhs+1)/std::log(BASE))
);
// Now we loop through the positions, left-to-right,
// calculating the digit at these positions for each
// input, and comparing them numerically. The
// lexicographic nature of the sorting comes from the
// fact that we are doing this per-digit comparison
// rather than considering the input value as a whole.
const auto max_pos = std::max(lhs_digits, rhs_digits);
for (auto pos = 0; pos < max_pos; pos++) {
if (lhs_digits - pos == 0) {
// Ran out of digits on the LHS;
// prioritise the shorter input
return LHS_FIRST;
}
else if (rhs_digits - pos == 0) {
// Ran out of digits on the RHS;
// prioritise the shorter input
return RHS_FIRST;
}
else {
const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;
if (lhs_x < rhs_x)
return LHS_FIRST;
else if (rhs_x < lhs_x)
return RHS_FIRST;
}
}
// If we reached the end and everything still
// matches up, then something probably went wrong
// as I'd have expected to catch this in the tests
// for equality.
assert(false && "Unknown case encountered");
return EQUAL;
}
);
std::cout << '{';
for (auto x : input)
std::cout << x << ", ";
std::cout << '}';
// Output: {-160, -24, -50, 1, 100, 21, 22, 927, 99, }
}
There are quicker ways to calculate the number of digits in a number, but the above will get you started.
Here's another algorithm which does some of the computation before sorting. It seems to be quite fast, despite the additional copying (see comparisons).
Note:
it only supports positive integers
it only supports integers <= std::numeric_limits<int>::max()/10
N.B. you can optimize count_digits and my_pow10; for example, see Three Optimization Tips for C++ from Andrei Alexandrescu and Any way faster than pow() to compute an integer power of 10 in C++?
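As one possible direction for such an optimization (a sketch only, not taken from either linked article), both helpers can be driven by a small table of powers of ten. The names `pow10_table`, `fast_pow10` and `fast_count_digits` are made up here, and the table assumes a 32-bit `int`.

```cpp
#include <cassert>

// Powers of ten that fit in a 32-bit int (assumption: int has 32 bits).
static const int pow10_table[10] = {
    1, 10, 100, 1000, 10000, 100000,
    1000000, 10000000, 100000000, 1000000000
};

// Table lookup instead of repeated multiplication.
int fast_pow10(unsigned exp)
{
    assert(exp < 10);
    return pow10_table[exp];
}

// Digit count by comparing against the table instead of dividing.
// Like the loop-based count_digits, this returns 0 for p == 0.
int fast_count_digits(int p)
{
    assert(p >= 0);
    int digits = 0;
    while (digits < 10 && p >= pow10_table[digits]) ++digits;
    return digits;
}
```

Whether this is actually faster than the division loop depends on the compiler and CPU, so it would need measuring.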
Helpers:
#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <iostream>
#include <algorithm>
#include <limits>
#include <iterator>
// non-optimized version
int count_digits(int p) // returns `0` for `p == 0`
{
int res = 0;
for(; p != 0; ++res)
{
p /= 10;
}
return res;
}
// non-optimized version
int my_pow10(unsigned exp)
{
int res = 1;
for(; exp != 0; --exp)
{
res *= 10;
}
return res;
}
Algorithm (note - not in-place):
// helper to provide integers with the same number of digits
template<class T, class U>
std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits)
{
auto const digits = count_digits(p);
// append zeros so that `l` has `maxDigits` digits
auto const l = static_cast<T>( p * my_pow10(maxDigits-digits) );
return {l, p};
}
template<class RaIt>
using pair_vec
= std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type,
typename std::iterator_traits<RaIt>::value_type>>;
template<class RaIt>
pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end)
{
if(p_beg == p_end) return {};
auto max = *std::max_element(p_beg, p_end);
auto maxDigits = count_digits(max);
pair_vec<RaIt> result;
result.reserve( std::distance(p_beg, p_end) );
for(auto i = p_beg; i != p_end; ++i)
result.push_back( lexicographic_pair_helper(*i, maxDigits) );
using value_type = typename pair_vec<RaIt>::value_type;
std::sort(begin(result), end(result),
[](value_type const& l, value_type const& r)
{
if(l.first < r.first) return true;
if(l.first > r.first) return false;
return l.second < r.second; }
);
return result;
}
Usage example:
int main()
{
std::vector<int> input = { 100, 21 , 22 , 99 , 1 , 927 };
// generate some numbers
/*{
constexpr int number_of_elements = 1E6;
std::random_device rd;
std::mt19937 gen( rd() );
std::uniform_int_distribution<>
dist(0, std::numeric_limits<int>::max()/10);
for(int i = 0; i < number_of_elements; ++i)
input.push_back( dist(gen) );
}*/
std::cout << "unsorted: ";
for(auto const& e : input) std::cout << e << ", ";
std::cout << "\n\n";
auto sorted = lexicographic_sort(begin(input), end(input));
std::cout << "sorted: ";
for(auto const& e : sorted) std::cout << e.second << ", ";
std::cout << "\n\n";
}
Here's a community wiki to compare the solutions. I took nim's code and made it easily extensible. Feel free to add your solutions and outputs.
Sample runs on an old, slow computer (3 GB RAM, Core2Duo U9400) with g++ 4.9 and -O3 -march=native:
number of elements: 1e+03
size of integer type: 4
reference solution: Lightness Races in Orbit
solution "dyp":
duration: 0 ms and 301 microseconds
comparison to reference solution: exact match
solution "Nim":
duration: 2 ms and 160 microseconds
comparison to reference solution: exact match
solution "nyarlathotep":
duration: 8 ms and 126 microseconds
comparison to reference solution: exact match
solution "notbad":
duration: 1 ms and 102 microseconds
comparison to reference solution: exact match
solution "Eric Postpischil":
duration: 2 ms and 550 microseconds
comparison to reference solution: exact match
solution "Lightness Races in Orbit":
duration: 17 ms and 469 microseconds
comparison to reference solution: exact match
solution "pts":
duration: 1 ms and 92 microseconds
comparison to reference solution: exact match
==========================================================
number of elements: 1e+04
size of integer type: 4
reference solution: Lightness Races in Orbit
solution "nyarlathotep":
duration: 109 ms and 712 microseconds
comparison to reference solution: exact match
solution "Lightness Races in Orbit":
duration: 272 ms and 819 microseconds
comparison to reference solution: exact match
solution "dyp":
duration: 1 ms and 748 microseconds
comparison to reference solution: exact match
solution "notbad":
duration: 16 ms and 115 microseconds
comparison to reference solution: exact match
solution "pts":
duration: 15 ms and 10 microseconds
comparison to reference solution: exact match
solution "Eric Postpischil":
duration: 33 ms and 301 microseconds
comparison to reference solution: exact match
solution "Nim":
duration: 17 ms and 83 microseconds
comparison to reference solution: exact match
==========================================================
number of elements: 1e+05
size of integer type: 4
reference solution: Lightness Races in Orbit
solution "Nim":
duration: 217 ms and 4 microseconds
comparison to reference solution: exact match
solution "pts":
duration: 199 ms and 505 microseconds
comparison to reference solution: exact match
solution "dyp":
duration: 20 ms and 330 microseconds
comparison to reference solution: exact match
solution "Eric Postpischil":
duration: 415 ms and 477 microseconds
comparison to reference solution: exact match
solution "Lightness Races in Orbit":
duration: 3955 ms and 58 microseconds
comparison to reference solution: exact match
solution "notbad":
duration: 215 ms and 259 microseconds
comparison to reference solution: exact match
solution "nyarlathotep":
duration: 1341 ms and 46 microseconds
comparison to reference solution: mismatch found
==========================================================
number of elements: 1e+06
size of integer type: 4
reference solution: Lightness Races in Orbit
solution "Lightness Races in Orbit":
duration: 52861 ms and 314 microseconds
comparison to reference solution: exact match
solution "Eric Postpischil":
duration: 4757 ms and 608 microseconds
comparison to reference solution: exact match
solution "nyarlathotep":
duration: 15654 ms and 195 microseconds
comparison to reference solution: mismatch found
solution "dyp":
duration: 233 ms and 779 microseconds
comparison to reference solution: exact match
solution "pts":
duration: 2181 ms and 634 microseconds
comparison to reference solution: exact match
solution "Nim":
duration: 2539 ms and 9 microseconds
comparison to reference solution: exact match
solution "notbad":
duration: 2675 ms and 362 microseconds
comparison to reference solution: exact match
==========================================================
number of elements: 1e+07
size of integer type: 4
reference solution: Lightness Races in Orbit
solution "notbad":
duration: 33425 ms and 423 microseconds
comparison to reference solution: exact match
solution "pts":
duration: 26000 ms and 398 microseconds
comparison to reference solution: exact match
solution "Eric Postpischil":
duration: 56206 ms and 359 microseconds
comparison to reference solution: exact match
solution "Lightness Races in Orbit":
duration: 658540 ms and 342 microseconds
comparison to reference solution: exact match
solution "nyarlathotep":
duration: 187064 ms and 518 microseconds
comparison to reference solution: mismatch found
solution "Nim":
duration: 30519 ms and 227 microseconds
comparison to reference solution: exact match
solution "dyp":
duration: 2624 ms and 644 microseconds
comparison to reference solution: exact match
The algorithms have to be structs with function-call operator templates that support the interface:
template<class RaIt> operator()(RaIt begin, RaIt end);
A copy of the input data is provided as a parameter; the algorithm is expected to provide the result in the same range (e.g. by sorting in place).
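For illustration, a new entry conforming to that interface could look like the sketch below. The struct `by_string` is hypothetical and not one of the benchmarked solutions; it simply compares the decimal strings of the two values on every comparison.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical benchmark entry: sorts the given range in place.
struct by_string
{
    template<class RaIt> void operator()(RaIt b, RaIt e) const
    {
        using T = typename std::iterator_traits<RaIt>::value_type;
        // Builds two temporary strings per comparison, so expect it to be slow.
        std::sort(b, e, [](T l, T r) {
            return std::to_string(l) < std::to_string(r);
        });
    }
};
```

It would be registered next to the other entries with something like `algorithms.push_back( MAKE_BINDER(b, e, by_string(), "by_string") );`.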
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <random>
#include <limits>
#include <utility>
#include <cmath>
#include <cassert>
#include <chrono>
#include <cstring>
#include <climits>
#include <functional>
#include <cstdlib>
#include <iomanip>
using duration_t = decltype( std::chrono::high_resolution_clock::now()
- std::chrono::high_resolution_clock::now());
template<class T>
struct result_t
{
std::vector<T> numbers;
duration_t duration;
char const* name;
};
template<class RaIt, class F>
result_t<typename std::iterator_traits<RaIt>::value_type>
apply_algorithm(RaIt p_beg, RaIt p_end, F f, char const* name)
{
using value_type = typename std::iterator_traits<RaIt>::value_type;
std::vector<value_type> inplace(p_beg, p_end);
auto start = std::chrono::high_resolution_clock::now();
f(begin(inplace), end(inplace));
auto end = std::chrono::high_resolution_clock::now();
auto duration = end - start;
return {std::move(inplace), duration, name};
}
// non-optimized version
int count_digits(int p) // returns `0` for `p == 0`
{
int res = 0;
for(; p != 0; ++res)
{
p /= 10;
}
return res;
}
// non-optimized version
int my_pow10(unsigned exp)
{
int res = 1;
for(; exp != 0; --exp)
{
res *= 10;
}
return res;
}
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
// paste algorithms here
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
int main(int argc, char** argv)
{
using integer_t = int;
constexpr integer_t dist_min = 0;
constexpr integer_t dist_max = std::numeric_limits<integer_t>::max()/10;
constexpr std::size_t default_number_of_elements = 1E6;
const std::size_t number_of_elements = argc>1 ? std::atoll(argv[1]) :
default_number_of_elements;
std::cout << "number of elements: ";
std::cout << std::scientific << std::setprecision(0);
std::cout << (double)number_of_elements << "\n";
std::cout << /*std::defaultfloat <<*/ std::setprecision(6);
std::cout.unsetf(std::ios_base::floatfield);
std::cout << "size of integer type: " << sizeof(integer_t) << "\n\n";
std::vector<integer_t> input;
{
input.reserve(number_of_elements);
std::random_device rd;
std::mt19937 gen( rd() );
std::uniform_int_distribution<> dist(dist_min, dist_max);
for(std::size_t i = 0; i < number_of_elements; ++i)
input.push_back( dist(gen) );
}
auto b = begin(input);
auto e = end(input);
using res_t = result_t<integer_t>;
std::vector< std::function<res_t()> > algorithms;
#define MAKE_BINDER(B, E, ALGO, NAME) \
std::bind( &apply_algorithm<decltype(B),decltype(ALGO)>, \
B,E,ALGO,NAME )
constexpr auto lightness_name = "Lightness Races in Orbit";
algorithms.push_back( MAKE_BINDER(b, e, lightness(), lightness_name) );
algorithms.push_back( MAKE_BINDER(b, e, dyp(), "dyp") );
algorithms.push_back( MAKE_BINDER(b, e, nim(), "Nim") );
algorithms.push_back( MAKE_BINDER(b, e, pts(), "pts") );
algorithms.push_back( MAKE_BINDER(b, e, epost(), "Eric Postpischil") );
algorithms.push_back( MAKE_BINDER(b, e, nyar(), "nyarlathotep") );
algorithms.push_back( MAKE_BINDER(b, e, notbad(), "notbad") );
{
std::srand( std::random_device()() );
std::random_shuffle(begin(algorithms), end(algorithms));
}
std::vector< result_t<integer_t> > res;
for(auto& algo : algorithms)
res.push_back( algo() );
auto reference_solution
= *std::find_if(begin(res), end(res),
[](result_t<integer_t> const& p)
{ return 0 == std::strcmp(lightness_name, p.name); });
std::cout << "reference solution: "<<reference_solution.name<<"\n\n";
for(auto const& e : res)
{
std::cout << "solution \""<<e.name<<"\":\n";
auto ms =
std::chrono::duration_cast<std::chrono::microseconds>(e.duration);
std::cout << "\tduration: "<<ms.count()/1000<<" ms and "
<<ms.count()%1000<<" microseconds\n";
std::cout << "\tcomparison to reference solution: ";
if(e.numbers.size() != reference_solution.numbers.size())
{
std::cout << "output count mismatch\n";
break;
}
auto mismatch = std::mismatch(begin(e.numbers), end(e.numbers),
begin(reference_solution.numbers)).first;
if(end(e.numbers) == mismatch)
{
std::cout << "exact match\n";
}else
{
std::cout << "mismatch found\n";
}
}
}
Current algorithms; note I replaced the digit counters and pow-of-10 with the global function, so we all benefit if someone optimizes.
struct lightness
{
template<class RaIt> void operator()(RaIt b, RaIt e)
{
using T = typename std::iterator_traits<RaIt>::value_type;
/**
* Sorts the array lexicographically.
*
* The trick is that we have to compare digits left-to-right
* (considering typical Latin decimal notation) and that each of
* two numbers to compare may have a different number of digits.
*
* This is very efficient in storage space, but inefficient in
* execution time; an approach that pre-visits each element and
* stores a translated representation will at least double your
* storage requirements (possibly a problem with large inputs)
* but require only a single translation of each element.
*/
std::sort(
b,
e,
[](T lhs, T rhs) -> bool {
// Returns true if lhs < rhs
// Returns false otherwise
const auto BASE = 10;
const bool LHS_FIRST = true;
const bool RHS_FIRST = false;
const bool EQUAL = false;
// There's no point in doing anything at all
// if both inputs are the same; strict-weak
// ordering requires that we return `false`
// in this case.
if (lhs == rhs) {
return EQUAL;
}
// Compensate for sign
if (lhs < 0 && rhs < 0) {
// When both are negative, sign on its own yields
// no clear ordering between the two arguments.
//
// Remove the sign and continue as for positive
// numbers.
lhs *= -1;
rhs *= -1;
}
else if (lhs < 0) {
// When the LHS is negative but the RHS is not,
// consider the LHS "first" always as we wish to
// prioritise the leading '-'.
return LHS_FIRST;
}
else if (rhs < 0) {
// When the RHS is negative but the LHS is not,
// consider the RHS "first" always as we wish to
// prioritise the leading '-'.
return RHS_FIRST;
}
// Counting the number of digits in both the LHS and RHS
// arguments is *almost* trivial.
const auto lhs_digits = (
lhs == 0
? 1
: std::ceil(std::log(lhs+1)/std::log(BASE))
);
const auto rhs_digits = (
rhs == 0
? 1
: std::ceil(std::log(rhs+1)/std::log(BASE))
);
// Now we loop through the positions, left-to-right,
// calculating the digit at these positions for each
// input, and comparing them numerically. The
// lexicographic nature of the sorting comes from the
// fact that we are doing this per-digit comparison
// rather than considering the input value as a whole.
const auto max_pos = std::max(lhs_digits, rhs_digits);
for (auto pos = 0; pos < max_pos; pos++) {
if (lhs_digits - pos == 0) {
// Ran out of digits on the LHS;
// prioritise the shorter input
return LHS_FIRST;
}
else if (rhs_digits - pos == 0) {
// Ran out of digits on the RHS;
// prioritise the shorter input
return RHS_FIRST;
}
else {
const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;
if (lhs_x < rhs_x)
return LHS_FIRST;
else if (rhs_x < lhs_x)
return RHS_FIRST;
}
}
// If we reached the end and everything still
// matches up, then something probably went wrong
// as I'd have expected to catch this in the tests
// for equality.
assert(false && "Unknown case encountered");
// dyp: suppress warning and throw
throw "up";
}
);
}
};
namespace ndyp
{
// helper to provide integers with the same number of digits
template<class T, class U>
std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits)
{
auto const digits = count_digits(p);
// append zeros so that `l` has `maxDigits` digits
auto const l = static_cast<T>( p * my_pow10(maxDigits-digits) );
return {l, p};
}
template<class RaIt>
using pair_vec
= std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type,
typename std::iterator_traits<RaIt>::value_type>>;
template<class RaIt>
pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end)
{
if(p_beg == p_end) return pair_vec<RaIt>{};
auto max = *std::max_element(p_beg, p_end);
auto maxDigits = count_digits(max);
pair_vec<RaIt> result;
result.reserve( std::distance(p_beg, p_end) );
for(auto i = p_beg; i != p_end; ++i)
result.push_back( lexicographic_pair_helper(*i, maxDigits) );
using value_type = typename pair_vec<RaIt>::value_type;
std::sort(begin(result), end(result),
[](value_type const& l, value_type const& r)
{
if(l.first < r.first) return true;
if(l.first > r.first) return false;
return l.second < r.second; }
);
return result;
}
}
struct dyp
{
template<class RaIt> void operator()(RaIt b, RaIt e)
{
auto pairvec = ndyp::lexicographic_sort(b, e);
std::transform(begin(pairvec), end(pairvec), b,
[](typename decltype(pairvec)::value_type const& e) { return e.second; });
}
};
namespace nnim
{
bool comp(int l, int r)
{
int lv[10] = {}; // probably possible to get this from numeric_limits
int rv[10] = {};
int lc = 10; // ditto
int rc = 10;
while (l || r)
{
if (l)
{
auto t = l / 10;
lv[--lc] = l - (t * 10);
l = t;
}
if (r)
{
auto t = r / 10;
rv[--rc] = r - (t * 10);
r = t;
}
}
while (lc < 10 && rc < 10)
{
if (lv[lc] == rv[rc])
{
lc++;
rc++;
}
else
return lv[lc] < rv[rc];
}
return lc > rc;
}
}
struct nim
{
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, nnim::comp);
}
};
struct pts
{
template<class T> static bool lex_less(T a, T b) {
unsigned la = 1, lb = 1;
for (T t = a; t > 9; t /= 10) ++la;
for (T t = b; t > 9; t /= 10) ++lb;
const bool ll = la < lb;
while (la > lb) { b *= 10; ++lb; }
while (lb > la) { a *= 10; ++la; }
return a == b ? ll : a < b;
}
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, lex_less<typename std::iterator_traits<RaIt>::value_type>);
}
};
struct epost
{
static bool compare(int x, int y)
{
static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));
double lx = log10(x);
double ly = log10(y);
double fx = lx - floor(lx); // Get the mantissa of lx.
double fy = ly - floor(ly); // Get the mantissa of ly.
return fabs(fx - fy) < limit ? lx < ly : fx < fy;
}
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, compare);
}
};
struct nyar
{
static bool lexiSmaller(int i1, int i2)
{
int digits1 = count_digits(i1);
int digits2 = count_digits(i2);
double val1 = i1/pow(10.0, digits1-1);
double val2 = i2/pow(10.0, digits2-1);
while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2)
{
digits1--;
digits2--;
val1 = (val1 - (int)val1)*10;
val2 = (val2 - (int)val2)*10;
}
if (digits1 > 0 && digits2 > 0)
{
return (int)val1 < (int)val2;
}
return (digits2 > 0);
}
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, lexiSmaller);
}
};
struct notbad
{
static int up_10pow(int n) {
int ans = 1;
while (ans <= n) ans *= 10; // <= so that exact powers of ten (e.g. 100) sort before 21
return ans;
}
static bool compare(int v1, int v2) {
int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2);
while ( ceil1 != 0 && ceil2 != 0) {
if (v1 / ceil1 < v2 / ceil2) return true;
else if (v1 / ceil1 > v2 / ceil2) return false;
ceil1 /= 10;
ceil2 /= 10;
}
if (v1 < v2) return true;
return false;
}
template<class RaIt> void operator()(RaIt b, RaIt e)
{
std::sort(b, e, compare);
}
};
I believe the following works as a sort comparison function for positive integers provided the integer type used is substantially narrower than the double type (e.g., 32-bit int and 64-bit double) and the log10 routine used returns exactly correct results for exact powers of 10 (which a good implementation does):
static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));
double lx = log10(x);
double ly = log10(y);
double fx = lx - floor(lx); // Get the mantissa of lx.
double fy = ly - floor(ly); // Get the mantissa of ly.
return fabs(fx - fy) < limit ? lx < ly : fx < fy;
It works by comparing the mantissas of the logarithms. The mantissas are the fractional parts of the logarithm, and they indicate the value of the significant digits of a number without the magnitude (e.g., the logarithms of 31, 3.1, and 310 have exactly the same mantissa).
The purpose of fabs(fx - fy) < limit is to allow for errors in taking the logarithm, which occur both because implementations of log10 are imperfect and because the floating-point format forces some error. (The integer portions of the logarithms of 31 and 310 use different numbers of bits, so there are different numbers of bits left for the significand, so they end up being rounded to slightly different values.) As long as the integer type is substantially narrower than the double type, the calculated limit will be much larger than the error in log10. Thus, the test fabs(fx - fy) < limit essentially tells us whether two calculated mantissas would be equal if calculated exactly.
If the mantissas differ, they indicate the lexicographic order, so we return fx < fy. If they are equal, then the integer portion of the logarithm tells us the order, so we return lx < ly.
It is simple to test whether log10 returns correct results for every power of ten, since there are so few of them. If it does not, adjustments can be made easily: Insert if (1-fx < limit) fx = 0; if (1-fy < limit) fy = 0;. This allows for when log10 returns something like 4.99999… when it should have returned 5.
This method has the advantage of not using loops or division (which is time-consuming on many processors).
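To make the mantissa idea concrete, here is a small sketch that wraps the comparison above in a function, together with the optional guard for exact powers of ten. The names `log_less` and `mantissa` are made up for this example, and the usual caveats apply: positive inputs only, and an `int` much narrower than `double`.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Fractional part of log10(v); 31, 310 and 3100 all share (nearly) the same value.
double mantissa(int v)
{
    double l = std::log10(v);
    return l - std::floor(l);
}

// Lexicographic "less than" for positive ints via logarithm mantissas.
bool log_less(int x, int y)
{
    assert(x > 0 && y > 0);   // log10 is not usable for zero or negatives
    static const double limit =
        .5 * (std::log(INT_MAX) - std::log(INT_MAX - 1));
    double lx = std::log10(x);
    double ly = std::log10(y);
    double fx = lx - std::floor(lx);
    double fy = ly - std::floor(ly);
    // Guard for a log10 that returns e.g. 1.99999... instead of 2.
    if (1 - fx < limit) fx = 0;
    if (1 - fy < limit) fy = 0;
    // Equal mantissas: same significant digits, so the shorter number wins.
    return std::fabs(fx - fy) < limit ? lx < ly : fx < fy;
}
```

For example, 1 and 100 have the same mantissa (zero), so the integer parts decide and 1 comes first; 927 and 99 have different mantissas (about 0.967 versus 0.996), so 927 comes first.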
The task sounds like a natural fit for an MSD variant of Radix Sort with padding ( http://en.wikipedia.org/wiki/Radix_sort ).
Depends on how much code you want to throw at it. The simple comparison-based code the others show has O(n log n) complexity, while a fully optimized radix sort would be O(kn), where k is the maximum number of digits.
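A rough sketch of what an MSD radix sort with padding could look like for non-negative `int` values. It favours readability over speed (it allocates fresh buckets at every level, so it is nowhere near "fully optimized"), and the names `digit_at` and `msd_radix_sort` are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Digit of v at position pos counted from the left (0-based),
// or -1 when v has no digit there (this plays the role of the padding).
int digit_at(int v, int pos)
{
    assert(v >= 0 && pos >= 0);
    int digits = 1;
    for (int t = v; t > 9; t /= 10) ++digits;
    if (pos >= digits) return -1;
    for (int i = digits - 1 - pos; i > 0; --i) v /= 10;
    return v % 10;
}

// MSD radix sort: bucket by the current digit, then recurse into each bucket.
void msd_radix_sort(std::vector<int>& a, int pos = 0)
{
    if (a.size() < 2) return;
    // bucket 0: numbers that ran out of digits; buckets 1..10: digits 0..9
    std::array<std::vector<int>, 11> buckets;
    for (int v : a)
        buckets[static_cast<std::size_t>(digit_at(v, pos) + 1)].push_back(v);
    a.clear();
    for (std::size_t b = 0; b < buckets.size(); ++b) {
        // Everything in bucket 0 is the same number, so it needs no more work.
        if (b > 0) msd_radix_sort(buckets[b], pos + 1);
        a.insert(a.end(), buckets[b].begin(), buckets[b].end());
    }
}
```

The recursion depth is bounded by the number of digits (at most 10 for a 32-bit `int`), which is where the k in O(kn) comes from.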
A compact solution if all your numbers are nonnegative and they are small enough so that multiplying them by 10 doesn't cause an overflow:
template<class T> bool lex_less(T a, T b) {
unsigned la = 1, lb = 1;
for (T t = a; t > 9; t /= 10) ++la;
for (T t = b; t > 9; t /= 10) ++lb;
const bool ll = la < lb;
while (la > lb) { b *= 10; ++lb; }
while (lb > la) { a *= 10; ++la; }
return a == b ? ll : a < b;
}
Run it like this:
#include <iostream>
#include <algorithm>
int main(int, char **) {
unsigned short input[] = { 100, 21 , 22 , 99 , 1 , 927 };
unsigned input_size = sizeof(input) / sizeof(input[0]);
std::sort(input, input + input_size, lex_less<unsigned short>);
for (unsigned i = 0; i < input_size; ++i) {
std::cout << ' ' << input[i];
}
std::cout << std::endl;
return 0;
}
You could try using the / and % operators to access each individual digit, e.g. 121 / 100 % 10 gives the first digit, and compare that way, but you'll have to find a way to get around the fact that the numbers have different lengths.
So find the maximum value in the array. The standard library has std::max_element for this, or you could try writing your own:
int Max (int* pdata,int size)
{
int temp_max =0 ;
for (int i =0 ; i < size ; i++)
{
if (*(pdata+i) > temp_max)
{
temp_max = *(pdata+i);
}
}
return temp_max;
}
This function will return the number of digits in the number
int Digit_checker(int n)
{
int num_digits = 1;
while (true)
{
if ((n % 10) == n)
return num_digits;
num_digits++;
n = n/10;
}
return num_digits;
}
Let the number of digits in max equal n.
Once you have this, open a for loop of the form
for (int i = 1; i <= n; i++)
then go through your data and use data[j] / 10^(n-i) % 10 (where 10^k means ten to the power k, not the C++ XOR operator) to access the i-th digit of an n-digit number: the first digit on the first iteration,
the second digit on the next iteration, and so on. Sort by that digit on each pass. I don't know exactly how you'll sort them, though.
It won't work for negative numbers, and you'll have to handle numbers with fewer digits than max, for which this expression does not yield the digit you want.
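The digit access described above could be sketched like this. The helper names `ipow10`, `num_digits` and `digit_from_left` are made up, and returning -1 for a missing digit is just one way to deal with shorter numbers.

```cpp
#include <cassert>

// Integer power of ten; note that 10 ^ k in C++ is bitwise XOR, not a power.
int ipow10(int k)
{
    assert(k >= 0 && k <= 9);   // larger powers would overflow a 32-bit int
    int r = 1;
    while (k-- > 0) r *= 10;
    return r;
}

// Number of decimal digits of a non-negative value (0 has one digit).
int num_digits(int v)
{
    assert(v >= 0);
    int d = 1;
    while (v > 9) { v /= 10; ++d; }
    return d;
}

// i-th digit from the left (1-based), or -1 if value has fewer than i digits.
int digit_from_left(int value, int i)
{
    int d = num_digits(value);
    if (i > d) return -1;
    return value / ipow10(d - i) % 10;
}
```

Using each number's own digit count rather than the maximum's avoids the problem with shorter numbers, at the cost of recomputing the count on every call.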
Write a comparison functor that compares two integers lexicographically. For each integer, find the smallest 10^k that is greater than the given integer. Then compare the leading digits step by step.
#include <algorithm>
#include <vector>
class CmpIntLex {
// Smallest power of ten strictly greater than n (for n >= 0),
// so that exact powers of ten such as 100 are handled correctly.
int up_10pow(int n) {
int ans = 1;
while (ans <= n) ans *= 10;
return ans;
}
public:
bool operator ()(int v1, int v2) {
int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2);
// Compare ever longer prefixes of both numbers.
while ( ceil1 != 0 && ceil2 != 0) {
if (v1 / ceil1 < v2 / ceil2) return true;
else if (v1 / ceil1 > v2 / ceil2) return false;
ceil1 /= 10;
ceil2 /= 10;
}
// One number is a prefix of the other: the shorter (smaller) one goes first.
if (v1 < v2) return true;
return false;
}
};
int main() {
std::vector<int> vi = {12,45,12134,85};
std::sort(vi.begin(), vi.end(), CmpIntLex());
}
While some other answers here (Lightness's, notbad's) are already showing quite good code, I believe I can add one solution which might be more performant (since it requires neither division nor power in each loop; but it requires floating point arithmetic, which again might make it slow, and possibly inaccurate for large numbers):
#include <algorithm>
#include <iostream>
#include <assert.h>
#include <cmath>
// method taken from http://stackoverflow.com/a/1489873/671366
template <class T>
int numDigits(T number)
{
int digits = 0;
if (number < 0) digits = 1; // remove this line if '-' counts as a digit
while (number) {
number /= 10;
digits++;
}
return digits;
}
bool lexiSmaller(int i1, int i2)
{
int digits1 = numDigits(i1);
int digits2 = numDigits(i2);
double val1 = i1/pow(10.0, digits1-1);
double val2 = i2/pow(10.0, digits2-1);
while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2)
{
digits1--;
digits2--;
val1 = (val1 - (int)val1)*10;
val2 = (val2 - (int)val2)*10;
}
if (digits1 > 0 && digits2 > 0)
{
return (int)val1 < (int)val2;
}
return (digits2 > 0);
}
int main(int argc, char* argv[])
{
// just testing whether the comparison function works as expected:
assert (lexiSmaller(1, 100));
assert (!lexiSmaller(100, 1));
assert (lexiSmaller(100, 22));
assert (!lexiSmaller(22, 100));
assert (lexiSmaller(927, 99));
assert (!lexiSmaller(99, 927));
assert (lexiSmaller(1, 927));
assert (!lexiSmaller(927, 1));
assert (lexiSmaller(21, 22));
assert (!lexiSmaller(22, 21));
assert (lexiSmaller(22, 99));
assert (!lexiSmaller(99, 22));
// use the comparison function for the actual sorting:
int input[] = { 100 , 21 , 22 , 99 , 1 ,927 };
std::sort(input, input + 6, lexiSmaller);
std::cout << "sorted: ";
for (int i=0; i<6; ++i)
{
std::cout << input[i];
if (i<5)
{
std::cout << ", ";
}
}
std::cout << std::endl;
return 0;
}
Though I have to admit I haven't tested the performance yet.
Here is the dumb solution that doesn't use any floating-point tricks. It's pretty much the same as the string comparison, but doesn't use a string per se. It also doesn't handle negative numbers; to do that, add a section at the top...
bool comp(int l, int r)
{
int lv[10] = {}; // probably possible to get this from numeric_limits
int rv[10] = {};
int lc = 10; // ditto
int rc = 10;
while (l || r)
{
if (l)
{
auto t = l / 10;
lv[--lc] = l - (t * 10);
l = t;
}
if (r)
{
auto t = r / 10;
rv[--rc] = r - (t * 10);
r = t;
}
}
while (lc < 10 && rc < 10)
{
if (lv[lc] == rv[rc])
{
lc++;
rc++;
}
else
return lv[lc] < rv[rc];
}
return lc > rc;
}
It's fast, and I'm sure it's possible to make it faster still, but it works and it's dumb enough to understand...
EDIT: I hate to dump lots of code, but here is a comparison of all the solutions so far.
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <cassert>
#include <chrono>
std::pair<int, int> lexicographic_pair_helper(int p, int maxDigits)
{
int digits = std::log10(p);
int l = p*std::pow(10, maxDigits-digits);
return {l, p};
}
bool l_comp(int l, int r)
{
int lv[10] = {}; // probably possible to get this from numeric_limits
int rv[10] = {};
int lc = 10; // ditto
int rc = 10;
while (l || r)
{
if (l)
{
auto t = l / 10;
lv[--lc] = l - (t * 10);
l = t;
}
if (r)
{
auto t = r / 10;
rv[--rc] = r - (t * 10);
r = t;
}
}
while (lc < 10 && rc < 10)
{
if (lv[lc] == rv[rc])
{
lc++;
rc++;
}
else
return lv[lc] < rv[rc];
}
return lc > rc;
}
int up_10pow(int n) {
int ans = 1;
while (ans < n) ans *= 10;
return ans;
}
bool l_comp2(int v1, int v2) {
int n1 = up_10pow(v1), n2 = up_10pow(v2);
while ( v1 != 0 && v2 != 0) {
if (v1 / n1 < v2 / n2) return true;
else if (v1 / n1 > v2 / n2) return false;
v1 /= 10;
v2 /= 10;
n1 /= 10;
n2 /= 10;
}
if (v1 == 0 && v2 != 0) return true;
return false;
}
int main()
{
std::vector<int> numbers;
{
constexpr int number_of_elements = 1E6;
std::random_device rd;
std::mt19937 gen( rd() );
std::uniform_int_distribution<> dist;
for(int i = 0; i < number_of_elements; ++i) numbers.push_back( dist(gen) );
}
std::vector<int> lo(numbers);
std::vector<int> dyp(numbers);
std::vector<int> nim(numbers);
std::vector<int> nb(numbers);
std::cout << "starting..." << std::endl;
{
auto start = std::chrono::high_resolution_clock::now();
/**
* Sorts the array lexicographically.
*
* The trick is that we have to compare digits left-to-right
* (considering typical Latin decimal notation) and that each of
* two numbers to compare may have a different number of digits.
*
* This probably isn't very efficient, so I wouldn't do it on
* "millions" of numbers. But, it works...
*/
std::sort(
std::begin(lo),
std::end(lo),
[](int lhs, int rhs) -> bool {
// Returns true if lhs < rhs
// Returns false otherwise
const auto BASE = 10;
const bool LHS_FIRST = true;
const bool RHS_FIRST = false;
const bool EQUAL = false;
// There's no point in doing anything at all
// if both inputs are the same; strict-weak
// ordering requires that we return `false`
// in this case.
if (lhs == rhs) {
return EQUAL;
}
// Compensate for sign
if (lhs < 0 && rhs < 0) {
// When both are negative, sign on its own yields
// no clear ordering between the two arguments.
//
// Remove the sign and continue as for positive
// numbers.
lhs *= -1;
rhs *= -1;
}
else if (lhs < 0) {
// When the LHS is negative but the RHS is not,
// consider the LHS "first" always as we wish to
// prioritise the leading '-'.
return LHS_FIRST;
}
else if (rhs < 0) {
// When the RHS is negative but the LHS is not,
// consider the RHS "first" always as we wish to
// prioritise the leading '-'.
return RHS_FIRST;
}
// Counting the number of digits in both the LHS and RHS
// arguments is *almost* trivial.
const auto lhs_digits = (
lhs == 0
? 1
: std::ceil(std::log(lhs+1)/std::log(BASE))
);
const auto rhs_digits = (
rhs == 0
? 1
: std::ceil(std::log(rhs+1)/std::log(BASE))
);
// Now we loop through the positions, left-to-right,
// calculating the digit at these positions for each
// input, and comparing them numerically. The
// lexicographic nature of the sorting comes from the
// fact that we are doing this per-digit comparison
// rather than considering the input value as a whole.
const auto max_pos = std::max(lhs_digits, rhs_digits);
for (auto pos = 0; pos < max_pos; pos++) {
if (lhs_digits - pos == 0) {
// Ran out of digits on the LHS;
// prioritise the shorter input
return LHS_FIRST;
}
else if (rhs_digits - pos == 0) {
// Ran out of digits on the RHS;
// prioritise the shorter input
return RHS_FIRST;
}
else {
const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;
if (lhs_x < rhs_x)
return LHS_FIRST;
else if (rhs_x < lhs_x)
return RHS_FIRST;
}
}
// If we reached the end and everything still
// matches up, then something probably went wrong
// as I'd have expected to catch this in the tests
// for equality.
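// A bare string literal is always truthy, so force the failure,
// and return a value so the lambda never falls off the end.
assert(false && "Unknown case encountered");
return EQUAL;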
}
);
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = end - start;
std::cout << "Lightness: " << elapsed.count() << '\n';
}
{
auto start = std::chrono::high_resolution_clock::now();
auto max = *std::max_element(begin(dyp), end(dyp));
int maxDigits = std::log10(max);
std::vector<std::pair<int,int>> temp;
temp.reserve(dyp.size());
for(auto const& e : dyp) temp.push_back( lexicographic_pair_helper(e, maxDigits) );
std::sort(begin(temp), end(temp), [](std::pair<int, int> const& l, std::pair<int, int> const& r)
{ if(l.first < r.first) return true; if(l.first > r.first) return false; return l.second < r.second; });
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = end - start;
std::cout << "Dyp: " << elapsed.count() << '\n';
}
{
auto start = std::chrono::high_resolution_clock::now();
std::sort (nim.begin(), nim.end(), l_comp);
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = end - start;
std::cout << "Nim: " << elapsed.count() << '\n';
}
// {
// auto start = std::chrono::high_resolution_clock::now();
// std::sort (nb.begin(), nb.end(), l_comp2);
// auto end = std::chrono::high_resolution_clock::now();
// auto elapsed = end - start;
// std::cout << "notbad: " << elapsed.count() << '\n';
// }
std::cout << (nim == lo) << std::endl;
std::cout << (nim == dyp) << std::endl;
std::cout << (lo == dyp) << std::endl;
// std::cout << (lo == nb) << std::endl;
}
Based on @Oswald's answer, below is some code that does the same.
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
bool compare(string a, string b){
// Check each digit
int i = 0, j = 0;
while(i < a.size() && j < b.size()){
// If different digits
if(a[i] - '0' != b[j] - '0')
return (a[i] - '0' < b[j] - '0');
i++, j++;
}
// Different sizes
return (a.size() < b.size());
}
int main(){
vector<string> array = {"1","2","3","4","5","6","7","8","9","10","11","12"};
sort(array.begin(), array.end(), compare);
for(auto value : array)
cout << value << " ";
return 0;
}
Input: 1 2 3 4 5 6 7 8 9 10 11 12
Output: 1 10 11 12 2 3 4 5 6 7 8 9
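For the original integer array, the same ordering can be obtained by converting each number to text once up front instead of inside the comparator. A minimal sketch, assuming the extra memory for the strings is acceptable; `lex_sort` is an illustrative name, and std::string's own operator< is used because, for digit-only strings, it gives the same order as compare() above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sorts integers by their decimal text, converting each value only once.
inline void lex_sort(std::vector<int>& values)
{
    std::vector<std::pair<std::string, int>> keyed;
    keyed.reserve(values.size());
    for (int v : values)
        keyed.emplace_back(std::to_string(v), v);

    // std::pair compares the string key first, which is the order we want.
    std::sort(keyed.begin(), keyed.end());

    for (std::size_t i = 0; i < values.size(); ++i)
        values[i] = keyed[i].second;
}
```

The trade-off is the one raised in the question: every element carries a string for the duration of the sort.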
Related
The task (from a Bulgarian judge, click on "Език" to change it to English):
I am given the size of the first (S1 = A) of N corals. The size of every subsequent coral (Si, where i > 1) is calculated using the formula (B*S(i-1) + C) % D, where A, B, C and D are some constants. I am told that Nemo is near the Kth coral (when the sizes of all corals are sorted in ascending order).
What is the size of the above-mentioned Kth coral ?
I will have T tests and for every one of them I will be given N, K, A, B, C and D and prompted to output the size of the Kth coral.
The requirements:
1 ≤ T ≤ 3
1 ≤ K ≤ N ≤ 10^7
0 ≤ A < D ≤ 10^18
1 ≤ C, B*D ≤ 10^18
Memory available is 64 MB
Time limit is 1.9 sec
The problem I have:
In the worst case I will need 10^7 * 8 B, which is about 76 MB.
If at least 80 MB of memory were available, the solution would be:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
std::vector<biggie>::iterator it_ans;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
std::vector<biggie> lut{ a };
lut.reserve(n);
for (j = 1; j != n; ++j) {
lut.emplace_back((b * lut.back() + c) % d);
}
it_ans = std::next(lut.begin(), k - 1);
std::nth_element(lut.begin(), it_ans, lut.end());
std::cout << *it_ans << '\n';
}
return 0;
}
Question 1: How can I approach this CP task given the requirements listed above ?
Question 2: Is it somehow possible to use std::nth_element to solve it since I am not able to store all N elements ? I mean using std::nth_element in a sliding window technique (If this is possible).
@Christian Sloper
#include <iostream>
#include <queue>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j, j_lim;
biggie a, b, c, d, prev, curr;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
if (k < n - k + 1) {
std::priority_queue<biggie, std::vector<biggie>, std::less<biggie>> q;
q.push(a);
prev = a;
for (j = 1; j != k; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr < q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
else {
std::priority_queue<biggie, std::vector<biggie>, std::greater<biggie>> q;
q.push(a);
prev = a;
for (j = 1, j_lim = n - k + 1; j != j_lim; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr > q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
}
return 0;
}
This gets accepted (Succeeds all 40 tests. Largest time 1.4 seconds, for a test with T=3 and D≤10^9. Largest time for a test with larger D (and thus T=1) is 0.7 seconds.).
#include <iostream>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
biggie prefix = 0;
for (int shift = d > 1000000000 ? 40 : 20; shift >= 0; shift -= 20) {
biggie prefix_mask = ((biggie(1) << (40 - shift)) - 1) << (shift + 20);
int count[1 << 20] = {0};
biggie s = a;
int rank = 0;
for (j = 0; j != n; ++j) {
biggie s_vs_prefix = s & prefix_mask;
if (s_vs_prefix < prefix)
++rank;
else if (s_vs_prefix == prefix)
++count[(s >> shift) & ((1 << 20) - 1)];
s = (b * s + c) % d;
}
int i = -1;
while (rank < k)
rank += count[++i];
prefix |= biggie(i) << shift;
}
std::cout << prefix << '\n';
}
return 0;
}
The result is a 60 bits number. I first determine the high 20 bits with one pass through the numbers, then the middle 20 bits in another pass, then the low 20 bits in another.
For the high 20 bits, generate all the numbers and count how often each high 20 bits pattern occurs. After that, add up the counts until you reach K. The pattern where you reach K covers the K-th smallest number. In other words, that's the result's high 20 bits.
The middle and low 20 bits are computed similarly, except we take the by then known prefix (the high 20 bits or high+middle 40 bits) into account. As a little optimization, when D is small, I skip computing the high 20 bits. That got me from 2.1 seconds down to 1.4 seconds.
This solution is like user3386109 described, except with bucket size 2^20 instead of 10^6 so I can use bit operations instead of divisions and think of bit patterns instead of ranges.
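The pass structure is perhaps easier to see on a smaller scale. The sketch below is not the accepted program: it selects from a std::vector that is already in memory, uses 8-bit groups on 32-bit values instead of 20-bit groups on 60-bit values, and the name `kth_smallest_radix` is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the k-th smallest value (k is 1-based, 1 <= k <= data.size())
// by fixing the answer 8 bits at a time, highest bits first.
inline std::uint32_t kth_smallest_radix(const std::vector<std::uint32_t>& data, std::size_t k)
{
    std::uint32_t prefix = 0;        // bits of the answer determined so far
    std::uint32_t prefix_mask = 0;   // bit positions of prefix that are already fixed
    for (int shift = 24; shift >= 0; shift -= 8)
    {
        std::size_t count[256] = {};
        std::size_t rank = 0;        // values whose fixed bits are below the prefix
        for (std::uint32_t s : data)
        {
            const std::uint32_t s_prefix = s & prefix_mask;
            if (s_prefix < prefix)
                ++rank;
            else if (s_prefix == prefix)
                ++count[(s >> shift) & 0xFFu];
        }
        // Add up bucket counts until the running total reaches k.
        int bucket = -1;
        while (rank < k)
            rank += count[++bucket];
        prefix |= static_cast<std::uint32_t>(bucket) << shift;
        prefix_mask |= 0xFFu << shift;
    }
    return prefix;
}
```

In the coral task the inner loop would regenerate the sequence instead of reading a vector, which is what keeps the memory use at one count table.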
For the memory constraint you hit:
(B*S(i-1) + C) % D
Each value depends only on the one before it, so S(i) can be recomputed from S(i-2) in two steps. You can therefore compute the values in pairs and store only half of them: index the even values and recompute each odd value on the fly from the even value before it. A half-length LUT is enough, and modern CPUs are fast enough for the extra calculations.
std::vector<biggie> lut{ a_i,a_i_2,a_i_4,... };
a_i_3=computeOddFromEven(lut[1]);
You can use a longer stride, such as 4 or 8, too. If the dataset is large, RAM latency is high, so this is like placing checkpoints across the whole data space to balance memory use against CPU work. Checkpoints 1000 elements apart would spend a lot of CPU cycles on recalculation, but the array would then fit in the CPU's L2/L1 cache, which is not bad. When sorting, each element access would need at most about 1000 recalculation steps, so O(1000 x size) overall. That is a big constant, but the compiler may be able to optimize it if the constants are known at compile time.
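A minimal sketch of the checkpoint idea, assuming a made-up `CheckpointedSequence` type and a stride chosen by the caller: only every stride-th value is stored, and any other value is recomputed from the nearest earlier checkpoint. Indices are 0-based here, and the sketch relies on the task's guarantee that B*D fits in a 64-bit integer.

```cpp
#include <cstddef>
#include <vector>

using biggie = long long;

// Stores S_0, S_stride, S_2*stride, ... and recomputes everything else on demand.
class CheckpointedSequence
{
public:
    CheckpointedSequence(std::size_t n, std::size_t stride,
                         biggie a, biggie b, biggie c, biggie d)
        : stride_(stride), b_(b), c_(c), d_(d)
    {
        biggie s = a;
        for (std::size_t i = 0; i < n; ++i)
        {
            if (i % stride_ == 0) checkpoints_.push_back(s);
            s = step(s);
        }
    }

    // Value at 0-based index i, using at most stride - 1 recomputation steps.
    biggie at(std::size_t i) const
    {
        biggie s = checkpoints_[i / stride_];
        for (std::size_t j = i % stride_; j > 0; --j) s = step(s);
        return s;
    }

private:
    biggie step(biggie s) const { return (b_ * s + c_) % d_; }

    std::size_t stride_;
    biggie b_, c_, d_;
    std::vector<biggie> checkpoints_;
};
```

With stride 2 this is the half-length LUT described above; larger strides trade more recomputation for less memory.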
If CPU performance becomes problem again:
write a function that generates your source code as a string, with all the "constants" given by the user filled in
compile that code from the command line (assuming the target computer has a compiler such as g++ that the main program can invoke)
run it and get the results
The compiler should be able to apply more speed/memory optimizations when those values really are compile-time constants rather than values read from std::cin.
If you really need to add a hard-limit to the RAM usage, then implement a simple cache with the backing-store as your heavy computations with brute-force O(N^2) (or O(L x N) with checkpoints every L elements as in first method where L=2 or 4, or ...).
Here's a sample direct-mapped cache with 8M long-long value space:
int main()
{
std::vector<long long> checkpoints = {
a_0, a_16, a_32,...
};
auto cacheReadMissFunction = [&](int key){
// your pure computational algorithm here, helper meant to show variable
long long result = checkpoints[key/16];
for(int i = 0; i < key % 16; ++i) // steps from the checkpoint at key - key%16 up to key
result = iterate(result);
return result;
};
auto cacheWriteMissFunction = [&](int key, long long value){
/* not useful for your algorithm as it doesn't change behavior per element */
// backing_store[key] = value;
};
// due to special optimizations, size has to be 2^k
int cacheSize = 1024*1024*8;
DirectMappedCache<int, long long> cache(cacheSize,cacheReadMissFunction,cacheWriteMissFunction);
std::cout << cache.get(20)<<std::endl;
return 0;
}
If you use a cache-friendly sorting-algorithm, a direct cache access would make a lot of re-use for nearly all the elements in comparisons if you fill the output buffer/terminal with elements one by one by following something like a bitonic-sort-path (that is known in compile-time). If that doesn't work, then you can try accessing files as a "backing-store" of cache for sorting whole array at once. Is file system prohibited for use? Then the online-compiling method above won't work either.
Implementation of a direct mapped cache (don't forget to call flush() after your algorithm finishes, if you use any cache.set() method):
#ifndef DIRECTMAPPEDCACHE_H_
#define DIRECTMAPPEDCACHE_H_
#include<vector>
#include<functional>
#include<mutex>
#include<iostream>
/* Direct-mapped cache implementation
* Only usable for integer type keys in range [0,maxPositive-1]
*
* CacheKey: type of key (only integers: int, char, size_t)
* CacheValue: type of value that is bound to key (same as above)
*/
template< typename CacheKey, typename CacheValue>
class DirectMappedCache
{
public:
// allocates buffers for numElements number of cache slots/lanes
// readMiss: cache-miss for read operations. User needs to give this function
// to let the cache automatically get data from backing-store
// example: [&](MyClass key){ return redis.get(key); }
// takes a CacheKey as key, returns CacheValue as value
// writeMiss: cache-miss for write operations. User needs to give this function
// to let the cache automatically set data to backing-store
// example: [&](MyClass key, MyAnotherClass value){ redis.set(key,value); }
// takes a CacheKey as key and CacheValue as value
// numElements: has to be integer-power of 2 (e.g. 2,4,8,16,...)
DirectMappedCache(CacheKey numElements,
const std::function<CacheValue(CacheKey)> & readMiss,
const std::function<void(CacheKey,CacheValue)> & writeMiss):size(numElements),sizeM1(numElements-1),loadData(readMiss),saveData(writeMiss)
{
// initialize buffers
for(size_t i=0;i<numElements;i++)
{
valueBuffer.push_back(CacheValue());
isEditedBuffer.push_back(0);
keyBuffer.push_back(CacheKey()-1);// mapping of 0+ allowed
}
}
// get element from cache
// if cache doesn't find it in buffers,
// then cache gets data from backing-store
// then returns the result to user
// then cache is available from RAM on next get/set access with same key
inline
const CacheValue get(const CacheKey & key) noexcept
{
return accessDirect(key,nullptr);
}
// only syntactic difference
inline
const std::vector<CacheValue> getMultiple(const std::vector<CacheKey> & key) noexcept
{
const int n = key.size();
std::vector<CacheValue> result(n);
for(int i=0;i<n;i++)
{
result[i]=accessDirect(key[i],nullptr);
}
return result;
}
// thread-safe but slower version of get()
inline
const CacheValue getThreadSafe(const CacheKey & key) noexcept
{
std::lock_guard<std::mutex> lg(mut);
return accessDirect(key,nullptr);
}
// set element to cache
// if cache doesn't find it in buffers,
// then cache sets data on just cache
// writing to backing-store only happens when
// another access evicts the cache slot containing this key/value
// or when cache is flushed by flush() method
// then returns the given value back
// then cache is available from RAM on next get/set access with same key
inline
void set(const CacheKey & key, const CacheValue & val) noexcept
{
accessDirect(key,&val,1);
}
// thread-safe but slower version of set()
inline
void setThreadSafe(const CacheKey & key, const CacheValue & val) noexcept
{
std::lock_guard<std::mutex> lg(mut);
accessDirect(key,&val,1);
}
// use this before closing the backing-store to store the latest bits of data
void flush()
{
try
{
std::lock_guard<std::mutex> lg(mut);
for (size_t i=0;i<size;i++)
{
if (isEditedBuffer[i] == 1)
{
isEditedBuffer[i]=0;
auto oldKey = keyBuffer[i];
auto oldValue = valueBuffer[i];
saveData(oldKey,oldValue);
}
}
}catch(std::exception &ex){ std::cout<<ex.what()<<std::endl; }
}
// direct mapped access
// opType=0: get
// opType=1: set
CacheValue const accessDirect(const CacheKey & key,const CacheValue * value, const bool opType = 0)
{
// find tag mapped to the key
CacheKey tag = key & sizeM1;
// compare keys
if(keyBuffer[tag] == key)
{
// cache-hit
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
valueBuffer[tag]=*value;
}
// cache hit value
return valueBuffer[tag];
}
else // cache-miss
{
CacheValue oldValue = valueBuffer[tag];
CacheKey oldKey = keyBuffer[tag];
// eviction algorithm start
if(isEditedBuffer[tag] == 1)
{
// if it is "get"
if(opType==0)
{
isEditedBuffer[tag]=0;
}
saveData(oldKey,oldValue);
// "get"
if(opType==0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else /* "set" */
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
else // not edited
{
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
}
// "get"
if(opType == 0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else // "set"
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
}
}
private:
const CacheKey size;
const CacheKey sizeM1;
std::mutex mut;
std::vector<CacheValue> valueBuffer;
std::vector<unsigned char> isEditedBuffer;
std::vector<CacheKey> keyBuffer;
const std::function<CacheValue(CacheKey)> loadData;
const std::function<void(CacheKey,CacheValue)> saveData;
};
#endif /* DIRECTMAPPEDCACHE_H_ */
You can solve this problem using a Max-heap.
Insert the first k elements into the max-heap. The largest element of these k will now be at the root.
For each remaining element e:
Compare e to the root.
If e is larger than the root, discard it.
If e is smaller than the root, remove the root and insert e into the heap structure.
After all elements have been processed, the k-th smallest element is at the root.
This method uses O(K) space and O(n log K) time.
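The steps above can be sketched with std::priority_queue, which is a max-heap by default. The function name and the vector input are illustrative only; in the coral task the values would come from the generator rather than from a container.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Returns the k-th smallest element (k is 1-based, 1 <= k <= values.size()).
inline long long kth_smallest_heap(const std::vector<long long>& values, std::size_t k)
{
    std::priority_queue<long long> heap;   // largest of the kept elements is on top
    for (long long e : values)
    {
        if (heap.size() < k)
            heap.push(e);                  // still filling the first k slots
        else if (e < heap.top())
        {
            heap.pop();                    // e displaces the current k-th smallest
            heap.push(e);
        }
    }
    return heap.top();
}
```

The heap never holds more than k elements, which is where the O(K) space bound comes from.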
There’s an algorithm that people often call LazySelect that I think would be perfect here.
With high probability, we make two passes. In the first pass, we save a random sample of size n much less than N. The answer will be around index (K/N)n in the sorted sample, but due to the randomness, we have to be careful. Save the values a and b at (K/N)n ± r instead, where r is the radius of the window. In the second pass, we save all of the values in [a, b], count the number of values less than a (let it be L), and select the value with index K−L if it’s in the window (otherwise, try again).
The theoretical advice on choosing n and r is fine, but I would be pragmatic here. Choose n so that you use most of the available memory; the bigger the sample, the more informative it is. Choose r fairly large as well, but not quite as aggressively due to the randomness.
C++ code below. On the online judge, it’s faster than Kelly’s (max 1.3 seconds on the T=3 tests, 0.5 on the T=1 tests).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <random>
#include <vector>
namespace {
class LazySelector {
public:
static constexpr std::int32_t kTargetSampleSize = 1000;
explicit LazySelector() { sample_.reserve(1000000); }
void BeginFirstPass(const std::int32_t n, const std::int32_t k) {
sample_.clear();
mask_ = n / kTargetSampleSize;
mask_ |= mask_ >> 1;
mask_ |= mask_ >> 2;
mask_ |= mask_ >> 4;
mask_ |= mask_ >> 8;
mask_ |= mask_ >> 16;
}
void FirstPass(const std::int64_t value) {
if ((gen_() & mask_) == 0) {
sample_.push_back(value);
}
}
void BeginSecondPass(const std::int32_t n, const std::int32_t k) {
sample_.push_back(std::numeric_limits<std::int64_t>::min());
sample_.push_back(std::numeric_limits<std::int64_t>::max());
const double p = static_cast<double>(sample_.size()) / n;
const double radius = 2 * std::sqrt(sample_.size());
const auto lower =
sample_.begin() + std::clamp<std::int32_t>(std::floor(p * k - radius),
0, sample_.size() - 1);
const auto upper =
sample_.begin() + std::clamp<std::int32_t>(std::ceil(p * k + radius), 0,
sample_.size() - 1);
std::nth_element(sample_.begin(), upper, sample_.end());
std::nth_element(sample_.begin(), lower, upper);
lower_ = *lower;
upper_ = *upper;
sample_.clear();
less_than_lower_ = 0;
equal_to_lower_ = 0;
equal_to_upper_ = 0;
}
void SecondPass(const std::int64_t value) {
if (value < lower_) {
++less_than_lower_;
} else if (upper_ < value) {
} else if (value == lower_) {
++equal_to_lower_;
} else if (value == upper_) {
++equal_to_upper_;
} else {
sample_.push_back(value);
}
}
std::optional<std::int64_t> Select(std::int32_t k) {
if (k < less_than_lower_) {
return std::nullopt;
}
k -= less_than_lower_;
if (k < equal_to_lower_) {
return lower_;
}
k -= equal_to_lower_;
if (k < sample_.size()) {
const auto kth = sample_.begin() + k;
std::nth_element(sample_.begin(), kth, sample_.end());
return *kth;
}
k -= sample_.size();
if (k < equal_to_upper_) {
return upper_;
}
return std::nullopt;
}
private:
std::default_random_engine gen_;
std::vector<std::int64_t> sample_ = {};
std::int32_t mask_ = 0;
std::int64_t lower_ = std::numeric_limits<std::int64_t>::min();
std::int64_t upper_ = std::numeric_limits<std::int64_t>::max();
std::int32_t less_than_lower_ = 0;
std::int32_t equal_to_lower_ = 0;
std::int32_t equal_to_upper_ = 0;
};
} // namespace
int main() {
int t;
std::cin >> t;
for (int i = t; i > 0; --i) {
std::int32_t n;
std::int32_t k;
std::int64_t a;
std::int64_t b;
std::int64_t c;
std::int64_t d;
std::cin >> n >> k >> a >> b >> c >> d;
std::optional<std::int64_t> ans = std::nullopt;
LazySelector selector;
do {
{
selector.BeginFirstPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.FirstPass(s);
s = (b * s + c) % d;
}
}
{
selector.BeginSecondPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.SecondPass(s);
s = (b * s + c) % d;
}
}
ans = selector.Select(k - 1);
} while (!ans);
std::cout << *ans << '\n';
}
}
I've got an array (actually std::vector) size ~ 7k elements.
If you plot this data, you get a fuel combustion diagram. I want to reduce this vector from ~7k elements to 721 elements (one every 0.5 degree) or ~1200 elements (one every 0.3 degree). Of course, I want the diagram to keep the same shape. How can I do it?
Right now I copy every 9th element from the big vector into a new one, then trim the excess evenly from the front and back of the vector to get a size of 721.
QVector <double> newVMTVector;
for(QVector <double>::iterator itv = oldVmtDataVector.begin(); itv < oldVmtDataVector.end() - 9; itv+=9){
newVMTVector.push_back(*itv);
}
auto useless = newVMTVector.size() - 721;
if(useless%2 == 0){
newVMTVector.erase(newVMTVector.begin(), newVMTVector.begin() + useless/2);
newVMTVector.erase(newVMTVector.end() - useless/2, newVMTVector.end());
}
else{
newVMTVector.erase(newVMTVector.begin(), newVMTVector.begin() + useless/2+1);
newVMTVector.erase(newVMTVector.end() - useless/2, newVMTVector.end());
}
newVMTVector.squeeze();
oldVmtDataVector.clear();
oldVmtDataVector = newVMTVector;
I can swear there is an algorithm that averages and reduces the array.
The way I understand it, you want to pick the elements [0, k, 2k, 3k ... ] where k is 10 or k is 6.
Here's a simple take:
template <typename It>
It strided_inplace_reduce(It it, It const last, size_t stride) {
It out = it;
if (stride < 1) return last;
while (it < last)
{
*out++ = *it;
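if (static_cast<size_t>(last - it) <= stride) break; // advancing past last would be undefined behaviour
std::advance(it, stride);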
}
return out;
}
Generalizing a bit for non-random-access iterators:
#include <iterator>
namespace detail {
// version for random access iterators
template <typename It>
It strided_inplace_reduce(It it, It const last, size_t stride, std::random_access_iterator_tag) {
It out = it;
if (stride < 1) return last;
while (it < last)
{
*out++ = *it;
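if (static_cast<size_t>(last - it) <= stride) break; // advancing past last would be undefined behaviour
std::advance(it, stride);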
}
return out;
}
// other iterator categories
template <typename It>
It strided_inplace_reduce(It it, It const last, size_t stride, ...) {
It out = it;
if (stride < 1) return last;
while (it != last) {
*out++ = *it;
for (size_t n = stride; n && it != last; --n)
{
it = std::next(it);
}
}
return out;
}
}
template <typename Range>
auto strided_inplace_reduce(Range& range, size_t stride) {
using std::begin;
using std::end;
using It = decltype(begin(range));
It it = begin(range), last = end(range);
return detail::strided_inplace_reduce(it, last, stride, typename std::iterator_traits<It>::iterator_category{});
}
#include <vector>
#include <list>
#include <iostream>
int main() {
{
std::vector<int> v { 1,2,3,4,5,6,7,8,9 };
v.erase(strided_inplace_reduce(v, 2), v.end());
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout << "\nv: ", " "));
}
{
std::list<int> l { 1,2,3,4,5,6,7,8,9 };
l.erase(strided_inplace_reduce(l, 4), l.end());
std::copy(l.begin(), l.end(), std::ostream_iterator<int>(std::cout << "\nl: ", " "));
}
}
Prints
v: 1 3 5 7 9
l: 1 5 9
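If the samples should be averaged rather than dropped, as the question hints, block averaging is one option. A rough sketch, assuming plain std::vector<double> input; `block_average` is a made-up name, and a trailing partial block is averaged over however many samples it contains.

```cpp
#include <cstddef>
#include <vector>

// Replaces each run of `block` consecutive samples with their mean.
inline std::vector<double> block_average(const std::vector<double>& in, std::size_t block)
{
    std::vector<double> out;
    if (block == 0) return out;            // nothing sensible to do
    out.reserve((in.size() + block - 1) / block);
    for (std::size_t start = 0; start < in.size(); start += block)
    {
        const std::size_t stop = (start + block < in.size()) ? start + block : in.size();
        double sum = 0.0;
        for (std::size_t i = start; i < stop; ++i) sum += in[i];
        out.push_back(sum / static_cast<double>(stop - start));
    }
    return out;
}
```

Compared with plain striding, this smooths out noise but also flattens sharp peaks, so which one preserves the diagram better depends on the data.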
What you need is an interpolation. There are many libraries providing many types of interpolation. This one is very lightweight and easy to setup and run:
http://kluge.in-chemnitz.de/opensource/spline/
All you need to do is create the second vector that contains the X values, pass both vectors to generate spline, and generate interpolated results every 0.5 degrees or whatever:
std::vector<double> Y; // Y is your current vector of fuel combustion values with ~7k elements
std::vector<double> X(Y.size()); // sized, not just reserved, so X[i] below is valid
const double step_x = 360.0 / Y.size();
for (std::size_t i = 0; i < X.size(); ++i)
    X[i] = i * step_x;
tk::spline s;
s.set_points(X, Y);
const double interpolation_step = 0.5;
std::vector<double> interpolated_results;
interpolated_results.reserve(static_cast<std::size_t>(std::ceil(360 / interpolation_step)) + 1);
for (double x = 0.0; x <= 360; x += interpolation_step) // <= in order to obtain range <0;360>
    interpolated_results.push_back(s(x));
if (std::fmod(360, interpolation_step) != 0.0) // for steps that don't divide 360 evenly, e.g. 0.7 deg, we need to close the range
    interpolated_results.push_back(s(360));
// now interpolated_results contains values every 0.5 degrees
This should give you an idea of how to use this kind of library. If you need some other interpolation type, just find the one that suits your needs. The usage should be similar.
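If a spline library is more than you need, plain linear interpolation can be written by hand. The following is a sketch under the assumption that the input samples are evenly spaced over the whole range; `resample_linear` is a hypothetical helper, not part of the spline library above.

```cpp
#include <cstddef>
#include <vector>

// Resamples evenly spaced data `y` to `out_size` evenly spaced points
// covering the same range, interpolating linearly between neighbours.
inline std::vector<double> resample_linear(const std::vector<double>& y, std::size_t out_size)
{
    std::vector<double> out;
    if (y.empty() || out_size == 0) return out;
    if (y.size() == 1 || out_size == 1)
    {
        out.assign(out_size, y.front());   // degenerate cases: nothing to interpolate
        return out;
    }
    out.reserve(out_size);
    const double scale = static_cast<double>(y.size() - 1) / static_cast<double>(out_size - 1);
    for (std::size_t j = 0; j < out_size; ++j)
    {
        const double pos = j * scale;                   // position in input index units
        std::size_t i = static_cast<std::size_t>(pos);
        if (i >= y.size() - 1) i = y.size() - 2;        // clamp so that i + 1 stays valid
        const double t = pos - static_cast<double>(i);
        out.push_back(y[i] + t * (y[i + 1] - y[i]));
    }
    return out;
}
```

For the ~7k-sample curve, asking for 721 output points would give one value per 0.5 degree.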
I have a vector holding the digits of a number; the vector represents a big integer in base 2^32. For example:
vector <unsigned> vec = {453860625, 469837947, 3503557200, 40}
This vector represents the following big integer:
base = 2 ^ 32
3233755723588593872632005090577 = 40 * base ^ 3 + 3503557200 * base ^ 2 + 469837947 * base + 453860625
How to get this decimal representation in string?
Here is an inefficient way to do what you want, get a decimal string from a vector of word values representing an integer of arbitrary size.
I would have preferred to implement this as a class, for better encapsulation and so math operators could be added, but to better comply with the question, this is just a bunch of free functions for manipulating std::vector<unsigned> objects. This does use a typedef BiType as an alias for std::vector<unsigned> however.
Functions for doing the binary division make up most of this code. Much of it duplicates what can be done with std::bitset, but for bitsets of arbitrary size, as vectors of unsigned words. If you want to improve efficiency, plug in a division algorithm which does per-word operations, instead of per-bit. Also, the division code is general-purpose, when it is only ever used to divide by 10, so you could replace it with special-purpose division code.
The code generally assumes a vector of unsigned words and also that the base is the maximum unsigned value, plus one. I left a comment wherever things would go wrong for smaller bases or bases which are not a power of 2 (binary division requires base to be a power of 2).
Also, I only tested for 1 case, the one you gave in the OP -- and this is new, unverified code, so you might want to do some more testing. If you find a problem case, I'll be happy to fix the bug here.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
namespace bigint {
using BiType = std::vector<unsigned>;
// cmp compares a with b, returning 1:a>b, 0:a==b, -1:a<b
int cmp(const BiType& a, const BiType& b) {
const auto max_size = std::max(a.size(), b.size());
for(auto i=max_size-1; i+1; --i) {
const auto wa = i < a.size() ? a[i] : 0;
const auto wb = i < b.size() ? b[i] : 0;
if(wa != wb) { return wa > wb ? 1 : -1; }
}
return 0;
}
bool is_zero(BiType& bi) {
for(auto w : bi) { if(w) return false; }
return true;
}
// canonize removes leading zero words
void canonize(BiType& bi) {
const auto size = bi.size();
if(!size || bi[size-1]) return;
for(auto i=size-2; i+1; --i) {
if(bi[i]) {
bi.resize(i + 1);
return;
}
}
bi.clear();
}
// subfrom subtracts b from a, modifying a
// a >= b must be guaranteed by caller
void subfrom(BiType& a, const BiType& b) {
unsigned borrow = 0;
for(std::size_t i=0; i<b.size(); ++i) {
if(b[i] || borrow) {
// TODO: handle error if i >= a.size()
const auto w = a[i] - b[i] - borrow;
// this relies on the automatic w = w (mod base),
// assuming unsigned max is base-1
// if this is not the case, w must be set to w % base here
borrow = w >= a[i];
a[i] = w;
}
}
for(auto i=b.size(); borrow; ++i) {
// TODO: handle error if i >= a.size()
borrow = !a[i];
--a[i];
// a[i] must be set modulo base here too
// (this is automatic when base is unsigned max + 1)
}
}
// binary division and its helpers: these require base to be a power of 2
// hi_bit_set is base/2
// the definition assumes CHAR_BIT == 8
const auto hi_bit_set = unsigned(1) << (sizeof(unsigned) * 8 - 1);
// shift_right_1 divides bi by 2, truncating any fraction
void shift_right_1(BiType& bi) {
unsigned carry = 0;
for(auto i=bi.size()-1; i+1; --i) {
const auto next_carry = (bi[i] & 1) ? hi_bit_set : 0;
bi[i] >>= 1;
bi[i] |= carry;
carry = next_carry;
}
// if carry is nonzero here, 1/2 was truncated from the result
canonize(bi);
}
// shift_left_1 multiplies bi by 2
void shift_left_1(BiType& bi) {
unsigned carry = 0;
for(std::size_t i=0; i<bi.size(); ++i) {
const unsigned next_carry = !!(bi[i] & hi_bit_set);
bi[i] <<= 1; // assumes high bit is lost, i.e. base is unsigned max + 1
bi[i] |= carry;
carry = next_carry;
}
if(carry) { bi.push_back(1); }
}
// sets an indexed bit in bi, growing the vector when required
void set_bit_at(BiType& bi, std::size_t index, bool set=true) {
std::size_t widx = index / (sizeof(unsigned) * 8);
std::size_t bidx = index % (sizeof(unsigned) * 8);
if(bi.size() < widx + 1) { bi.resize(widx + 1); }
if(set) { bi[widx] |= unsigned(1) << bidx; }
else { bi[widx] &= ~(unsigned(1) << bidx); }
}
// divide divides n by d, returning the result and leaving the remainder in n
// this is implemented using binary division
BiType divide(BiType& n, BiType d) {
if(is_zero(d)) {
// TODO: handle divide by zero
return {};
}
std::size_t shift = 0;
while(cmp(n, d) == 1) {
shift_left_1(d);
++shift;
}
BiType result;
do {
if(cmp(n, d) >= 0) {
set_bit_at(result, shift);
subfrom(n, d);
}
shift_right_1(d);
} while(shift--);
canonize(result);
canonize(n);
return result;
}
std::string get_decimal(BiType bi) {
std::string dec_string;
// repeat division by 10, using the remainder as a decimal digit
// this will build a string with digits in reverse order, so
// before returning, it will be reversed to correct this.
do {
const auto next_bi = divide(bi, {10});
const char digit_value = static_cast<char>(bi.size() ? bi[0] : 0);
dec_string.push_back('0' + digit_value);
bi = next_bi;
} while(!is_zero(bi));
std::reverse(dec_string.begin(), dec_string.end());
return dec_string;
}
}
int main() {
bigint::BiType my_big_int = {453860625, 469837947, 3503557200, 40};
auto dec_string = bigint::get_decimal(my_big_int);
std::cout << dec_string << '\n';
}
Output:
3233755723588593872632005090577
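The per-word division suggested above, specialised to a small divisor such as 10, might look like the following. It is only a sketch, assuming 32-bit words (hence std::uint32_t rather than unsigned) stored least significant word first; `div_small` and `to_decimal` are illustrative names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Divides the little-endian base-2^32 number in place by a small non-zero
// divisor and returns the remainder; one 64-bit division per word.
inline std::uint32_t div_small(std::vector<std::uint32_t>& words, std::uint32_t divisor)
{
    std::uint64_t rem = 0;
    for (std::size_t i = words.size(); i-- > 0; )
    {
        const std::uint64_t cur = (rem << 32) | words[i];
        words[i] = static_cast<std::uint32_t>(cur / divisor);
        rem = cur % divisor;
    }
    while (!words.empty() && words.back() == 0) words.pop_back();  // drop leading zero words
    return static_cast<std::uint32_t>(rem);
}

// Repeated division by 10; each remainder is the next decimal digit, lowest first.
inline std::string to_decimal(std::vector<std::uint32_t> words)
{
    std::string digits;
    do {
        digits.push_back(static_cast<char>('0' + div_small(words, 10)));
    } while (!words.empty());
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```

This does one pass over the words per decimal digit, instead of one pass per bit of the quotient per digit.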
I am currently using the code below that removes all digits equal to zero from an integer.
int removeZeros(int candid)
{
int output = 0;
string s = to_string(candid);
for (size_t i = 0; i < s.size(); ++i)
{
if (s[i] != '0') output = output * 10 + (s[i] - '0');
}
return output;
}
The expected output for e.g. 102304 would be 1234.
Is there a more compact way of doing this by directly working on the integer, that is, not string representation? Is it actually going to be faster?
Here's a way to do it without strings and buffers.
I've only tested this with positive numbers. To make this work with negative numbers is an exercise left up to you.
int removeZeros(int x)
{
int result = 0;
int multiplier = 1;
while (x > 0)
{
int digit = x % 10;
if (digit != 0)
{
int val = digit * multiplier;
result += val;
multiplier *= 10;
}
x = x / 10;
}
return result;
}
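The negative case left open above can be handled digit by digit. The following is only a minimal sketch: the name removeZerosSigned is hypothetical, and it assumes C++11 semantics, where x % 10 takes the sign of x. The accumulators are wider than int so that the multiplier cannot overflow.

```cpp
#include <cstdint>

// Hypothetical variant of removeZeros that keeps the sign of the input.
int removeZerosSigned(int x)
{
    const bool negative = x < 0;
    long long result = 0;      // wider than int, so multiplier cannot overflow
    long long multiplier = 1;
    while (x != 0)
    {
        int digit = x % 10;    // in [-9, 9]; the sign follows x since C++11
        if (digit < 0) digit = -digit;
        if (digit != 0)
        {
            result += digit * multiplier;
            multiplier *= 10;
        }
        x /= 10;               // truncates toward zero for negative x as well
    }
    // The magnitude never exceeds that of the input, so it fits in int again.
    return static_cast<int>(negative ? -result : result);
}
```

Taking the magnitude per digit, rather than negating x up front, avoids overflow when x is the most negative int.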
For maintainability, I would suggest not working directly on the numeric value. You can express the requirement very directly with string manipulation, and while it will likely be slower than arithmetic on the number, I expect either approach to be fast enough that performance won't matter unless the code sits in an extremely tight loop.
#include <algorithm>
#include <string>
int removeZeros(int n) {
auto s = std::to_string(n);
s.erase(std::remove(s.begin(), s.end(), '0'), s.end());
return std::stoi(s);
}
As a bonus, this simpler implementation handles negative numbers correctly. For zero, it throws std::invalid_argument, because removing all zeros from 0 doesn't produce a number.
You could try something like this:
template<typename T> T nozeros( T const & z )
{
return z==0 ? 0 : (z%10?10:1)*nozeros(z/10)+(z%10);
}
If you want to take your processing one step further, you can use a nice tail recursion, with no need for a helper function:
template<typename T> inline T pow10(T p, T res=1)
{
return p==0 ? res : pow10(--p,res*10);
}
template<typename T> T nozeros( T const & z , T const & r=0, T const & zp =0)
{
static int digit =-1;
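// stop only when both z and r are exhausted; testing z ^ r would stop early whenever z == r (e.g. input 55)
return ( z == 0 && r == 0 ) ? ( digit=-1, zp ) : nozeros(z/10,z%10, r ? r*pow10(++digit)+zp : zp);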
}
Here is how this will work with input 32040
Ret, z, r, zp, digit
-,32040,0,0, -1
-,3204,0,0, -1
-,320,4,0, -1
-,32,0,4, 0
-,3,2,4, 0
-,0,3,24, 1
-,0,0,324, 2
324,-,-,-, -1
Integer calculations are always faster than actually transforming your integer to string, making comparisons on strings, and looking up strings to turn them back to integers.
The cool thing is that if you try to pass floats you get nice compile time errors.
I claim this to be slightly faster than the other solutions, as it makes fewer conditional evaluations, which should make it behave better with CPU branch prediction.
int number = 9042100;
stringstream strm;
strm << number;
string str = strm.str();
str.erase(remove(str.begin(), str.end(), '0'), str.end());
number = atoi(str.c_str());
No string representation is used here. I can't say anything about the speed though.
int removezeroes(int candid)
{
int x, y = 0, n = 0;
// I did this to reverse the number as my next loop
// reverses the number while removing zeroes.
while (candid>0)
{
x = candid%10;
n = n *10 + x;
candid /=10;
}
candid = n;
while (candid>0)
{
x = candid%10;
if (x != 0)
y = y*10 + x;
candid /=10;
}
return y;
}
If C++11 is available, I like to do this with a lambda function:
int removeZeros(int candid){
std::string s=std::to_string(candid);
std::string output;
std::for_each(s.begin(), s.end(), [&](char& c){ if (c != '0') output += c;});
return std::stoi(output);
}
A fixed implementation of g24l's recursive solution:
template<typename T> T nozeros(T const & z)
{
if (z == 0) return 0;
if (z % 10 == 0) return nozeros(z / 10);
else return (z % 10) + ( nozeros(z / 10) * 10);
}
Below is my implementation of Pollard's rho algorithm for prime factorization:
#include <vector>
#include <queue>
#include <gmpxx.h>
// Interface to the GMP random number functions.
gmp_randclass rng(gmp_randinit_default);
// Returns a divisor of N using Pollard's rho method.
mpz_class getDivisor(const mpz_class &N)
{
mpz_class c = rng.get_z_range(N);
mpz_class x = rng.get_z_range(N);
mpz_class y = x;
mpz_class d = 1;
mpz_class z;
while (d == 1) {
x = (x*x + c) % N;
y = (y*y + c) % N;
y = (y*y + c) % N;
z = x - y;
mpz_gcd(d.get_mpz_t(), z.get_mpz_t(), N.get_mpz_t());
}
return d;
}
// Adds the prime factors of N to the given vector.
void factor(const mpz_class &N, std::vector<mpz_class> &factors)
{
std::queue<mpz_class> to_factor;
to_factor.push(N);
while (!to_factor.empty()) {
mpz_class n = to_factor.front();
to_factor.pop();
if (n == 1) {
continue; // Trivial factor.
} else if (mpz_probab_prime_p(n.get_mpz_t(), 5)) {
// n is a prime.
factors.push_back(n);
} else {
// n is a composite, so push its factors on the queue.
mpz_class d = getDivisor(n);
to_factor.push(d);
to_factor.push(n/d);
}
}
}
It's essentially a straight translation of the pseudocode on Wikipedia, and relies on GMP for big numbers and for primality testing. The implementation works well and can factor composites such as
1000036317378699858851366323 = 1000014599 * 1000003357 * 1000018361
but will choke on e.g.
1000000000002322140000000048599822299 = 1000000000002301019 * 1000000000000021121
My question is: Is there anything I can do to improve on this, short of switching to a more complex factorization algorithm such as Quadratic sieve?
I know one improvement could be to first do some trial divisions by pre-computed primes, but that would not help for products of a few large primes such as the above.
I'm interested in any tips on improvements to the basic Pollard's rho method to get it to handle larger composites of only a few prime factors. Of course if you find any stupidities in the code above, I'm interested in those as well.
For full disclosure: This is a homework assignment, so general tips and pointers are better than fully coded solutions. With this very simple approach I already get a passing grade on the assignment, but would of course like to improve.
Thanks in advance.
You are using the original version of the rho algorithm due to Pollard. Brent's variant makes two improvements: Floyd's tortoise-and-hare cycle-finding algorithm is replaced by a cycle-finding algorithm developed by Brent, and the gcd calculation is delayed so it is performed only once every hundred or so times through the loop instead of every time. But those changes only get a small improvement, maybe 25% or so, and won't allow you to factor the large numbers you are talking about. Thus, you will need a better algorithm: SQUFOF might work for semi-primes the size that you mention, or you could implement quadratic sieve or the elliptic curve method. I have discussion and implementation of all those algorithms at my blog.
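To make the two changes concrete, here is a minimal sketch of Brent's cycle finding combined with a delayed gcd. It is not the code from my blog: the name brentRho and the batch parameter are made up for illustration, and it assumes n is odd, composite and below 2^32 so that every product fits in 64 bits (a real implementation would use GMP or 128-bit arithmetic).

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>

// Hypothetical helper: returns a divisor of n greater than 1.
// The result may be n itself, which means this (c, x0) pair failed.
// Assumes n is odd, composite, n < 2^32, and c, x0 < n.
std::uint64_t brentRho(std::uint64_t n, std::uint64_t c, std::uint64_t x0,
                       std::uint64_t batch = 128)
{
    auto f = [n, c](std::uint64_t v) { return (v * v + c) % n; };
    auto absDiff = [](std::uint64_t a, std::uint64_t b) { return a > b ? a - b : b - a; };

    std::uint64_t x = x0, y = x0, ys = x0, q = 1, g = 1;
    for (std::uint64_t r = 1; g == 1; r *= 2) {
        x = y;                                    // anchor, fixed for this round
        for (std::uint64_t i = 0; i < r; ++i) y = f(y);
        for (std::uint64_t k = 0; k < r && g == 1; k += batch) {
            ys = y;                               // remember where the batch started
            const std::uint64_t steps = std::min(batch, r - k);
            for (std::uint64_t i = 0; i < steps; ++i) {
                y = f(y);
                q = q * absDiff(x, y) % n;        // accumulate instead of taking a gcd
            }
            g = std::gcd(q, n);                   // one gcd per batch
        }
    }
    if (g == n) {
        // The batch collected all prime factors at once; replay it step by step.
        do {
            ys = f(ys);
            g = std::gcd(absDiff(x, ys), n);
        } while (g == 1);
    }
    return g;
}
```

For example, brentRho(8051, 1, 2) yields a non-trivial factor of 8051 = 83 * 97. If the function returns n, retry with a different c.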
Part 1
Very interesting question you have, thanks!
I decided to implement my own, fairly complex, C++ solution to your task from scratch. Although you asked for no code, I still wrote it in full as a proof of concept and to check real speed and timings.
To say it up front, I improved the speed of your program by 250-500x (see Part 2).
Besides the well-known algorithm described on Wikipedia, I made the following extra optimizations:
1. Most computations are done at compile time. This is the main feature of my code. The input number N is provided at compile time as a long macro constant, which lets the compiler do much of the optimization up front, such as inlining constants and optimizing division and other arithmetic. The drawback is that you have to recompile the program every time you change the number you want to factor.
2. In addition to 1, I also support a runtime-only value of N. This is needed for a real comparison of speed in different environments.
3. Another very important speed improvement is Montgomery Reduction, which I use to speed up modular reduction. It makes all computations 2-2.5x faster. Instead of Montgomery you can also use Barrett Reduction. Both replace a single expensive division with several multiplications and additions, which makes the reduction very fast.
4. Unlike your code, I compute the GCD (Greatest Common Divisor) very rarely, once every 8 000 - 16 000 iterations, because GCD is very expensive: it needs around 128 expensive divisions for 128-bit numbers. Instead of computing GCD(x - y, N) every time, it is enough to accumulate the product prod = (prod * (x - y)) % N and, after thousands of such iterations, compute GCD(prod, N) once. This follows from the fact that GCD((a * b * c) % N, N) = GCD(GCD(a, N) * GCD(b, N) * GCD(c, N), N).
5. One very advanced and fast optimization is my own implementation of uint128 and uint256, with all the sub-optimizations needed for this task. It appears only in the code for Part 2 of my answer; see that part after the first code.
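To illustrate the Montgomery idea from the list above on a small scale, here is a minimal sketch for a 32-bit modulus with R = 2^32. It is not the code of this answer: the struct Montgomery32 and its member names are hypothetical, and it assumes an odd modulus below 2^31 so that the intermediate sum fits in 64 bits.

```cpp
#include <cstdint>

// Hypothetical 32-bit Montgomery context; assumes n is odd and n < 2^31.
struct Montgomery32 {
    std::uint32_t n;       // modulus
    std::uint32_t nprime;  // -n^{-1} mod 2^32
    std::uint32_t r2;      // (2^32)^2 mod n, used to enter Montgomery form

    explicit Montgomery32(std::uint32_t modulus) : n(modulus) {
        // Newton iteration for n^{-1} mod 2^32: n is its own inverse mod 8,
        // and each step doubles the number of correct low bits (3, 6, 12, 24, 48).
        std::uint32_t inv = n;
        for (int i = 0; i < 4; ++i) inv *= 2u - n * inv;
        nprime = 0u - inv;
        const std::uint64_t r = (std::uint64_t(1) << 32) % n;
        r2 = static_cast<std::uint32_t>(r * r % n);
    }

    // REDC: returns t / 2^32 mod n for t < n * 2^32, using no division by n.
    std::uint32_t reduce(std::uint64_t t) const {
        const std::uint32_t m = static_cast<std::uint32_t>(t) * nprime;
        const std::uint32_t u =
            static_cast<std::uint32_t>((t + std::uint64_t(m) * n) >> 32);
        return u >= n ? u - n : u;
    }

    std::uint32_t toMont(std::uint32_t x) const { return reduce(std::uint64_t(x) * r2); }
    std::uint32_t fromMont(std::uint32_t x) const { return reduce(x); }
    // Product of two values that are already in Montgomery form.
    std::uint32_t mul(std::uint32_t a, std::uint32_t b) const {
        return reduce(std::uint64_t(a) * b);
    }
};
```

The division by n happens only once, when the context is built. Every later multiplication costs three multiplications, a shift and a conditional subtraction, which is why it pays off in a loop that runs millions of times with the same modulus.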
As a result of the above steps, the total speed of Pollard Rho increased 50-100x, mostly because the GCD is computed only once every few thousand steps. This speed is enough to factor even the biggest number in your question.
Besides the algorithms described above, I also used the following: Extended Euclidean Algorithm (for computing the coefficients for the modular inverse), Modular Multiplicative Inverse, Euclidean Algorithm (for computing GCD), Modular Binary Exponentiation, Trial Division (for checking primality of small numbers), Fermat Primality Test and, as already mentioned, Montgomery Reduction and Pollard Rho itself.
I did following timings and speed measurements:
N 1000036317378699858851366323 (90 bits)
1000003357 (30 bits) * 1000014599 (30 bits) * 1000018361 (30 bits)
PollardRho time 0.1 secs, tries 1 iterations 25599 (2^14.64)
N 780002082420246798979794021150335143 (120 bits)
244300526707007 (48 bits) * 3192797383346267127449 (72 bits)
PollardRho time 32 secs, tries 1 iterations 25853951 (2^24.62)
NO-Montgomery time 70 secs
N 614793320656537415355785711660734447 (120 bits)
44780536225818373 (56 bits) * 13729029897191722339 (64 bits)
PollardRho time 310 secs, tries 1 iterations 230129663 (2^27.78)
N 1000000000002322140000000048599822299 (120 bits)
1000000000000021121 (60 bits) * 1000000000002301019 (60 bits)
PollardRho time 2260 secs, tries 1 iterations 1914068991 (2^30.83)
As you can see above, your smaller number takes just 0.1 seconds to factor, while your bigger number (which you failed to factor at all) takes a quite affordable 2260 seconds (a bit more than half an hour). You can also see that I created one number with a 48-bit smallest factor and another with a 56-bit smallest factor.
As a general rule, if the smallest factor has K bits, Pollard Rho needs about 2^(K/2) iterations to find it. Trial division, by comparison, needs on the order of 2^K steps for a K-bit factor, which is the square of that.
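The rule can be checked against the measurements above with a few lines of arithmetic. This is only an illustrative sketch; both function names are made up for the example.

```cpp
#include <cmath>
#include <cstdint>

// Rule of thumb from the text: about 2^(K/2) iterations for a K-bit smallest factor.
double expectedLog2Iterations(unsigned factorBits)
{
    return factorBits / 2.0;
}

// How far a measured iteration count lies from that estimate, in powers of two.
double log2Excess(std::uint64_t measuredIterations, unsigned factorBits)
{
    return std::log2(static_cast<double>(measuredIterations))
           - expectedLog2Iterations(factorBits);
}
```

For the 48-bit factor, log2Excess(25853951, 48) is about 0.62, so that run took roughly 1.5 times the estimated 2^24 iterations. All three measured runs with 48-, 56- and 60-bit factors stay within a factor of two of the estimate.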
At the very start of the file there is a bunch of #define NUM lines, each defining a compile-time constant containing a number. You can comment out any line, change the value of a number, or add a new line with a new number. Then recompile the program and run it to see the results.
Before reading the code below, don't forget to click the Try it online! link to run the code on the GodBolt server. Also see the example Console Output after the code.
Try it online!
#include <cstdint>
#include <tuple>
#include <iostream>
#include <iomanip>
#include <chrono>
#include <random>
#include <stdexcept>
#include <string>
#include <mutex>
#include <cmath>
#include <type_traits>
#include <vector>
#include <sstream>
#include <algorithm>
#include <boost/multiprecision/cpp_int.hpp>
//#define NUM "1000036317378699858851366323" // 90 bits, 1000003357 (30 bits) * 1000014599 (30 bits) * 1000018361 (30 bits), PollardRho time 0.1 secs, tries 1 iterations 25599 (2^14.64)
#define NUM "780002082420246798979794021150335143" // 120 bits, 244300526707007 (48 bits) * 3192797383346267127449 (72 bits), PollardRho time 32 secs, tries 1 iterations 25853951 (2^24.62), NO-Montgomery time 70 secs
//#define NUM "614793320656537415355785711660734447" // 120 bits, 44780536225818373 (56 bits) * 13729029897191722339 (64 bits), PollardRho time 310 secs, tries 1 iterations 230129663 (2^27.78)
//#define NUM "1000000000002322140000000048599822299" // 120 bits, 1000000000000021121 (60 bits) * 1000000000002301019 (60 bits), PollardRho time 2260 secs, tries 1 iterations 1914068991 (2^30.83)
#define IS_DEBUG 0
#define IS_COMPILE_TIME 1
bool constexpr use_montg = 1;
size_t constexpr gcd_per_nloops = 1 << 14;
#if defined(_MSC_VER) && !defined(__clang__)
#define HAS_INT128 0
#else
#define HAS_INT128 1
#endif
#define ASSERT_MSG(cond, msg) { if (!(cond)) throw std::runtime_error("Assertion (" #cond ") failed at line " + std::to_string(__LINE__) + "! Msg: '" + std::string(msg) + "'."); }
#define ASSERT(cond) ASSERT_MSG(cond, "")
#define COUT(code) { std::unique_lock<std::mutex> lock(cout_mux); std::cout code; std::cout << std::flush; }
#if IS_DEBUG
#define LN { COUT(<< "LN " << __LINE__ << std::endl); }
#define DUMP(var) { COUT(<< __LINE__ << " : " << #var << " = (" << (var) << ")" << std::endl); }
#else
#define LN
#define DUMP(var)
#endif
#define bisizeof(x) (sizeof(x) * 8)
using u32 = uint32_t;
using i64 = int64_t;
using u64 = uint64_t;
using u128 = boost::multiprecision::uint128_t;
using i128 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<128, 128, boost::multiprecision::signed_magnitude, boost::multiprecision::unchecked, void>>;
using u192 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<192, 192, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
using i192 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<192, 192, boost::multiprecision::signed_magnitude, boost::multiprecision::unchecked, void>>;
using u256 = boost::multiprecision::uint256_t;
using i256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::signed_magnitude, boost::multiprecision::unchecked, void>>;
using u384 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<384, 384, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
using i384 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<384, 384, boost::multiprecision::signed_magnitude, boost::multiprecision::unchecked, void>>;
#if HAS_INT128
using u128_cl = unsigned __int128;
using i128_cl = signed __int128;
#endif
template <typename T> struct DWordOf;
template <> struct DWordOf<u64> : std::type_identity<u128> {};
template <> struct DWordOf<i64> : std::type_identity<i128> {};
template <> struct DWordOf<u128> : std::type_identity<u256> {};
template <> struct DWordOf<i128> : std::type_identity<i256> {};
#if HAS_INT128
template <> struct DWordOf<u128_cl> : std::type_identity<u256> {};
template <> struct DWordOf<i128_cl> : std::type_identity<i256> {};
#endif
template <typename T>
using DWordOfT = typename DWordOf<T>::type;
template <typename T> struct SignedOf;
template <> struct SignedOf<u64> : std::type_identity<i64> {};
template <> struct SignedOf<i64> : std::type_identity<i64> {};
template <> struct SignedOf<u128> : std::type_identity<i128> {};
template <> struct SignedOf<i128> : std::type_identity<i128> {};
#if HAS_INT128
template <> struct SignedOf<u128_cl> : std::type_identity<i128> {};
template <> struct SignedOf<i128_cl> : std::type_identity<i128> {};
#endif
template <typename T>
using SignedOfT = typename SignedOf<T>::type;
template <typename T> struct BiSizeOf;
template <> struct BiSizeOf<u64> : std::integral_constant<size_t, 64> {};
template <> struct BiSizeOf<u128> : std::integral_constant<size_t, 128> {};
template <> struct BiSizeOf<u192> : std::integral_constant<size_t, 192> {};
template <> struct BiSizeOf<u256> : std::integral_constant<size_t, 256> {};
template <> struct BiSizeOf<u384> : std::integral_constant<size_t, 384> {};
#if HAS_INT128
template <> struct BiSizeOf<u128_cl> : std::integral_constant<size_t, 128> {};
#endif
template <typename T>
size_t constexpr BiSizeOfT = BiSizeOf<T>::value;
static std::mutex cout_mux;
double Time() {
static auto const gtb = std::chrono::high_resolution_clock::now();
return std::chrono::duration_cast<std::chrono::duration<double>>(
std::chrono::high_resolution_clock::now() - gtb).count();
}
template <typename T, typename DT = DWordOfT<T>>
constexpr DT MulD(T const & a, T const & b) {
return DT(a) * b;
}
template <typename T>
constexpr auto EGCD(T const & a, T const & b) {
using ST = SignedOfT<T>;
using DST = DWordOfT<ST>;
T ro = 0, r = 0, qu = 0, re = 0;
ST so = 0, s = 0;
std::tie(ro, r, so, s) = std::make_tuple(a, b, 1, 0);
while (r != 0) {
std::tie(qu, re) = std::make_tuple(ro / r, ro % r);
std::tie(ro, r) = std::make_tuple(r, re);
std::tie(so, s) = std::make_tuple(s, ST(so - DST(s) * ST(qu)));
}
ST const to = ST((DST(ro) - DST(a) * so) / ST(b));
return std::make_tuple(ro, so, to);
}
template <typename T>
constexpr T ModInv(T x, T mod) {
using ST = SignedOfT<T>;
using DT = DWordOfT<T>;
x %= mod;
auto [g, s, t] = EGCD(x, mod);
//ASSERT(g == 1);
if (s < 0) {
//ASSERT(ST(mod) + s >= 0);
s += mod;
} else {
//ASSERT(s < mod);
}
//ASSERT((DT(x) * s) % mod == 1);
return T(s);
}
template <typename ST>
constexpr std::tuple<ST, ST, ST, ST> MontgKRR(ST n) {
size_t constexpr ST_bisize = BiSizeOfT<ST>;
using DT = DWordOfT<ST>;
DT constexpr r = DT(1) << ST_bisize;
ST const rmod = ST(r % n), rmod2 = ST(MulD<ST>(rmod, rmod) % n), rinv = ModInv<ST>(rmod, n);
DT const k0 = (r * DT(rinv) - 1) / n;
//ASSERT(k0 < (DT(1) << ST_bisize));
ST const k = ST(k0);
return std::make_tuple(k, rmod, rmod2, rinv);
}
template <typename T>
constexpr T GCD(T a, T b) {
while (b != 0)
std::tie(a, b) = std::make_tuple(b, a % b);
return a;
}
template <typename T>
T PowMod(T a, T b, T const & c) {
// https://en.wikipedia.org/wiki/Modular_exponentiation
using DT = DWordOfT<T>;
T r = 1;
while (b != 0) {
if (u32(b) & 1)
r = T(MulD<T>(r, a) % c);
a = T(MulD<T>(a, a) % c);
b >>= 1;
}
return r;
}
template <typename T>
std::pair<bool, bool> IsProbablyPrime_TrialDiv(T const n, u64 limit = u64(-1)) {
// https://en.wikipedia.org/wiki/Trial_division
if (n <= 16)
return {n == 2 || n == 3 || n == 5 || n == 7 || n == 11 || n == 13, true};
if ((n & 1) == 0)
return {false, true};
u64 d = 0;
for (d = 3; d < limit && d * d <= n; d += 2)
if (n % d == 0)
return {false, true};
return {true, d * d > n};
}
template <typename T>
bool IsProbablyPrime_Fermat(T const n, size_t ntrials = 32) {
// https://en.wikipedia.org/wiki/Fermat_primality_test
if (n <= 16)
return n == 2 || n == 3 || n == 5 || n == 7 || n == 11 || n == 13;
thread_local std::mt19937_64 rng{123};
u64 const rand_end = n - 3 <= u64(-5) ? u64(n - 3) : u64(-5);
for (size_t trial = 0; trial < ntrials; ++trial)
if (PowMod<T>(rng() % rand_end + 2, n - 1, n) != 1)
return false;
return true;
}
template <typename T>
bool IsProbablyPrime(T const n) {
if (n < (1 << 12))
return IsProbablyPrime_TrialDiv(n).first;
return IsProbablyPrime_Fermat(n);
}
template <typename T>
std::string IntToStr(T n) {
if (n == 0)
return "0";
std::string r;
while (n != 0) {
u32 constexpr mod = 1'000'000'000U;
std::ostringstream ss;
auto const nm = u32(n % mod);
n /= mod;
if (n != 0)
ss << std::setw(9) << std::setfill('0');
ss << nm;
r = ss.str() + r;
}
return r;
}
template <typename T>
constexpr T ParseNum(char const * s) {
size_t len = 0;
for (len = 0; s[len]; ++len);
T r = 0;
for (size_t i = 0; i < len; ++i) {
r *= 10;
r += s[i] - '0';
}
return r;
}
template <typename T>
std::tuple<T, std::vector<T>, std::vector<T>> Factor_PollardRho(
#if !IS_COMPILE_TIME
T const & n,
#endif
u64 limit = u64(-1), size_t ntrials = 6) {
size_t constexpr T_bisize = BiSizeOfT<T>;
// https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
using DT = DWordOfT<T>;
#if IS_COMPILE_TIME
static auto constexpr n = ParseNum<T>(NUM);
#endif
if (n <= 1)
return {n, {}, {}};
if (IsProbablyPrime<T>(n))
return {n, {n}, {}};
#if IS_COMPILE_TIME
static auto constexpr montg_krr = MontgKRR(n);
static T constexpr mk = std::get<0>(montg_krr), mrm = std::get<1>(montg_krr), mrm2 = std::get<2>(montg_krr), mri = std::get<3>(montg_krr),
mone = use_montg ? mrm : 1, mone2 = use_montg ? mrm2 : 1;
#else
static auto const montg_krr = MontgKRR(n);
static T const mk = std::get<0>(montg_krr), mrm = std::get<1>(montg_krr), mrm2 = std::get<2>(montg_krr), mri = std::get<3>(montg_krr),
mone = use_montg ? mrm : 1, mone2 = use_montg ? mrm2 : 1;
#endif
auto AdjustL = [&](T x) -> T {
if constexpr(1) {
while (x >= n)
x -= n;
return x;
} else {
using SiT = SignedOfT<T>;
return x - (n & (~T(SiT(x - n) >> (T_bisize - 1))));
}
};
auto MontgModL = [&](DT const & x) -> T {
if constexpr(!use_montg)
return T(x % n);
else
return T((x + MulD<T>(n, T(x) * mk)) >> T_bisize);
};
auto ToMontgL = [&](T const & x) -> T {
if constexpr(!use_montg)
return x;
else
return MontgModL(MulD<T>(x, mrm2));
};
auto FromMontgL = [&](T const & x) -> T {
if constexpr(!use_montg)
return x;
else
return AdjustL(MontgModL(x));
};
auto DumpMontgX = [&](char const * name, T const & x, bool from = true){
if constexpr(1) {
COUT(<< __LINE__ << " : " << name << " = " << IntToStr(from ? FromMontgL(x) : x) << std::endl);
}
};
auto f = [&](T x){ return MontgModL(MulD<T>(x, x) + mone2); };
#if IS_DEBUG
#define DUMPM(x) DumpMontgX(#x, x)
#define DUMPI(x) DumpMontgX(#x, x, false)
#else
#define DUMPM(x)
#define DUMPI(x)
#endif
ASSERT(3 <= n);
size_t cnt = 0;
u64 const distr_end = n - 2 <= u64(-5) ? u64(n - 2) : u64(-5);
thread_local std::mt19937_64 rng{123};
for (size_t itry = 0; itry < ntrials; ++itry) {
bool failed = false;
u64 const rnd = rng() % distr_end + 1;
T x = ToMontgL(rnd);
u64 sum_cycles = 0;
for (u64 cycle = 1;; cycle <<= 1) {
T y = x, m = mone, xstart = x, ny = 0;
while (ny < y)
ny += n;
ny -= y;
auto ILast = [&](auto istart){
size_t ri = istart + gcd_per_nloops - 1;
if (ri < cycle)
return ri;
else
return cycle - 1;
};
for (u64 i = 0, istart = 0, ilast = ILast(istart); i < cycle; ++i) {
x = f(x);
m = MontgModL(MulD<T>(m, ny + x));
if (i < ilast)
continue;
cnt += ilast + 1 - istart;
if (cnt >= limit)
return {n, {}, {n}};
if (GCD<T>(n, FromMontgL(m)) == 1) {
istart = i + 1;
ilast = ILast(istart);
xstart = x;
continue;
}
T x2 = xstart;
for (u64 i2 = istart; i2 <= i; ++i2) {
x2 = f(x2);
auto const g = GCD<T>(n, FromMontgL(ny + x2));
if (g == 1) {
continue;
}
sum_cycles += i + 1;
if (g == n) {
failed = true;
break;
}
#if 0
auto res0 = Factor_PollardRho<T>(g, limit, ntrials);
auto res1 = Factor_PollardRho<T>(n / g, limit, ntrials);
res0.first.insert(res0.first.end(), res1.first.begin(), res1.first.end());
res0.second.insert(res0.second.end(), res1.second.begin(), res1.second.end());
#endif
ASSERT(n % g == 0);
COUT(<< "PollardRho tries " << (itry + 1) << " iterations " << sum_cycles << " (2^" << std::fixed << std::setprecision(2) << std::log2(std::max<size_t>(1, sum_cycles)) << ")" << std::endl);
if (IsProbablyPrime<T>(n / g))
return {n, {g, n / g}, {}};
else
return {n, {g}, {n / g}};
}
if (failed)
break;
ASSERT(false);
}
sum_cycles += cycle;
if (failed)
break;
}
}
return {n, {}, {n}};
}
template <typename T>
void ShowFactors(std::tuple<T, std::vector<T>, std::vector<T>> fs) {
auto [N, a, b] = fs;
std::cout << "Factors of " << IntToStr(N) << " (2^" << std::fixed << std::setprecision(3) << std::log2(double(std::max<T>(1, N))) << "):" << std::endl;
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
for (auto const & x: a)
std::cout << x << " ";
std::cout << std::endl;
if (!b.empty()) {
std::cout << "Unfactored:" << std::endl;
for (auto const & x: b)
std::cout << x << " ";
std::cout << std::endl;
}
}
int main() {
try {
using T = u128;
#if !IS_COMPILE_TIME
std::string s;
COUT(<< "Enter number: ");
std::cin >> s;
auto const N = ParseNum<T>(s.c_str());
#endif
auto const tim = Time();
ShowFactors(Factor_PollardRho<T>(
#if !IS_COMPILE_TIME
N
#endif
));
COUT(<< "Time " << std::fixed << std::setprecision(3) << (Time() - tim) << " sec" << std::endl);
return 0;
} catch (std::exception const & ex) {
std::cout << "Exception: " << ex.what() << std::endl;
return -1;
}
}
Console Output:
PollardRho tries 1 iterations 25853951 (2^24.62)
Factors of 780002082420246798979794021150335143 (2^119.231):
244300526707007 3192797383346267127449
Time 35.888 sec
Part 2
I decided to go even further and implement my own highly optimized uint128 and uint256, that is, long arithmetic like that done in Boost or GMP.
Without dwelling on all the details: I optimized every line and every method of these classes, especially the methods that deal with the operations needed for factorization.
This improved version is 6x faster than Part 1 of my answer: if the first version takes 30 seconds to finish, the second takes 5 seconds.
As you can see in the Console Output at the very end of my post, your biggest number takes just 420 seconds to factor. This is 5.4x faster than the 2260 seconds of the first part of this answer.
CODE GOES HERE. Because of the StackOverflow limit of 30 000 symbols per post, I can't inline the second code here, since it alone is 26 KB in size. Hence I'm providing it as a Gist link below and also through a Try it online! link (to run online on the GodBolt server):
Github Gist source code
Try it online!
Console Output:
PollardRho tries 1 iterations 32767 (2^15.00)
Factors of 1000036317378699858851366323 (2^89.692):
1000014599
Unfactored:
1000021718061637877
Time 0.086 sec
PollardRho tries 1 iterations 25853951 (2^24.62)
Factors of 780002082420246798979794021150335143 (2^119.231):
244300526707007 3192797383346267127449
Time 5.830 sec
PollardRho tries 1 iterations 230129663 (2^27.78)
Factors of 614793320656537415355785711660734447 (2^118.888):
44780536225818373 13729029897191722339
Time 49.446 sec
PollardRho tries 1 iterations 1914077183 (2^30.83)
Factors of 1000000000002322140000000048599822299 (2^119.589):
1000000000000021121 1000000000002301019
Time 419.680 sec