I'm still using C++14. So std::sample is out of reach. Is there something equivalent in boost? I do not want to copy my std::multiset which isn't reorderable.
As far as I know, there is no such thing in Boost, but you can write a simple one yourself:
template<typename T>
std::vector<T> sample_items(const std::multiset<T> & ms, int samples)
{
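std::vector<T> ret_value;
if (ms.empty())
return ret_value; // nothing to draw from; also avoids size() - 1 wrapping around
std::random_device rd;
std::mt19937 gen(rd());
// note: each pick is independent, so the same element can be drawn more than once
std::uniform_int_distribution<std::size_t> dis(0, ms.size() - 1);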
for (int i = 0; i < samples; i++)
{
auto first = std::begin(ms);
auto offset = dis(gen);
std::advance(first, offset);
ret_value.push_back(*first);
}
return ret_value;
}
I do not want to copy my std::multiset which isn't reorderable.
If you still prefer not to pass your multiset to a function, just change the function to work with iterators.
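A minimal sketch of what such an iterator-based variant could look like in C++14. The name sample_range and the choice to pass the generator in by reference are my own assumptions, not something from Boost or the standard library. Like the function above, it draws with replacement, so the same element can show up more than once.

```cpp
#include <cstddef>
#include <iterator>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: draws `samples` elements WITH replacement from the
// range [first, last). Works with forward iterators such as multiset's.
template <typename It, typename URBG>
std::vector<typename std::iterator_traits<It>::value_type>
sample_range(It first, It last, std::size_t samples, URBG& gen)
{
    using Diff = typename std::iterator_traits<It>::difference_type;
    std::vector<typename std::iterator_traits<It>::value_type> result;

    Diff const size = std::distance(first, last);
    if (size <= 0)
        return result;                      // nothing to draw from

    result.reserve(samples);
    std::uniform_int_distribution<Diff> dis(0, size - 1);
    for (std::size_t i = 0; i < samples; ++i)
        result.push_back(*std::next(first, dis(gen)));  // linear walk for set iterators
    return result;
}
```

It would be called as sample_range(ms.begin(), ms.end(), 5, gen) with, say, a std::mt19937 gen. Each pick walks the range from the start, so the cost is roughly samples times the container size.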
UPDATE
Added a sequential draw algorithm that doesn't require additional storage by dynamically adjusting the probability for selecting the next item in sequence order.
See sequential_sample below
random_sample
I think the semantics of random_sample should be that you don't pick the same sequence element twice.
With multiset you could get duplicate values. Just use set if you don't want that.
To avoid duplicate picks you can generate a set of unique indices until the size matches n and then project the results:
A problem that lurks here is that when doing it naively, you might always return the results in the input order, which is definitely not what you want.
So you could take a hybrid approach that keeps track of the elements already picked. This implementation does that, while also:
optimizing the storage to avoid dynamic allocation (unless n > 10)
optimizing the storage for locality of reference (cache friendliness)
caching the iterators of the picked items, so that subsequent picks can shorten the iterator traversal instead of always advancing from the start iterator
There are some more comments in the code, and I left in a few trace statements that may help in understanding how the algorithm and the optimizations operate.
#include <random>
#include <set>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <boost/container/flat_set.hpp>
#include <boost/container/small_vector.hpp>
namespace my {
static std::ostream trace(std::clog.rdbuf()/* or: nullptr*/);
template <typename It, typename Out, typename URBG>
Out random_sample(It f, It l, Out out, size_t n, URBG& urbg) {
size_t const size = std::distance(f,l);
// adjust n for size (matches std::sample)
n = std::min(size, n);
// bind distribution to the random bit generator
auto pick = [&urbg,dist=std::uniform_int_distribution<size_t>(0, size-1)]() mutable {
return dist(urbg);
};
// Optimized storage of indices: works best for small n, probably still
// better than `std::set` for large n.
// IDEA: For very large n, prefer just a vector, sort+unique until n
// reached
//
// The loc field is a cached (forward) iterator so we reduce repeated
// traversals.
// IDEA: when It is of random iterator category, specialize without loc
// cache
struct P {
size_t idx; It loc;
bool operator<(P const& rhs) const { return idx < rhs.idx; }
};
namespace bc = boost::container;
bc::flat_set<P, std::less<P>, bc::small_vector<P, 10> > picked;
// generate n unique picks
while (n-->0) {
auto entry = [&] {
while (true) {
auto insertion = picked.insert({pick(), f});
if (insertion.second)
return insertion.first;
}
}();
trace << "accept pick: " << entry->idx << "\n";
// traverse and cache loc
if (entry == begin(picked)) {
// advance from scratch
entry->loc = std::next(f, entry->idx);
} else {
// minimum steps from prior cached loc
auto& prior = *std::prev(entry);
trace << "using prior reference: " << prior.idx << "\n";
entry->loc = std::next(prior.loc, entry->idx - prior.idx);
}
// output
*out++ = *entry->loc;
}
return out;
}
} // namespace my
int main() {
std::multiset<int> const pool {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
};
std::mt19937 engine(std::random_device{}());
for (int i = 0; i<3; ++i) {
my::random_sample(
pool.begin(), pool.end(),
std::ostream_iterator<int>(std::cout << "-- random draw (n=3): ", " "),
3,
engine);
std::cout << "\n";
}
}
Prints, e.g.:
accept pick: 46
accept pick: 98
using prior reference: 46
accept pick: 55
using prior reference: 46
accept pick: 80
accept pick: 12
accept pick: 20
using prior reference: 12
accept pick: 63
accept pick: 80
using prior reference: 63
accept pick: 29
-- random draw (n=3): 46 98 55
-- random draw (n=3): 80 12 20
-- random draw (n=3): 63 80 29
sequential_sample
As announced at the top, if it is not a problem for the results to come out in input order, you can be much more efficient and require no storage at all:
template <typename It, typename Out, typename URBG>
Out sequential_sample(It f, It l, Out out, size_t n, URBG&& urbg) {
using D = std::uniform_int_distribution<size_t>;
size_t size = std::distance(f, l);
n = std::min(n, size);
D dist;
for (; n != 0; ++f) {
if (dist(urbg, D::param_type{ 0, --size }) >= n)
continue;
*out++ = *f;
--n;
}
return out;
}
This program combines random_sample and sequential_sample and demonstrates the difference in results:
#include <random>
#include <algorithm>
namespace my {
template <typename It, typename Out, typename URBG>
Out sequential_sample(It f, It l, Out out, size_t n, URBG&& urbg) {
using D = std::uniform_int_distribution<size_t>;
size_t size = std::distance(f, l);
n = std::min(n, size);
D dist;
for (; n != 0; ++f) {
if (dist(urbg, D::param_type{ 0, --size }) >= n)
continue;
*out++ = *f;
--n;
}
return out;
}
}
#include <boost/container/flat_set.hpp>
#include <boost/container/small_vector.hpp>
namespace my {
template <typename It, typename Out, typename URBG>
Out random_sample(It f, It l, Out out, size_t n, URBG& urbg) {
using Dist = std::uniform_int_distribution<size_t>;
size_t const size = std::distance(f,l);
// adjust n for size (matches std::sample)
n = std::min(size, n);
// bind distribution to the random bit generator
auto pick = [&urbg,dist=Dist(0, size-1)]() mutable {
return dist(urbg);
};
// Optimized storage of indices: works best for small n, probably still
// better than `std::set` for large n.
// IDEA: For very large n, prefer just a vector, sort+unique until n
// reached
//
// The loc field is a cached (forward) iterator so we reduce repeated
// traversals.
// IDEA: when It is of random iterator category, specialize without loc
// cache
struct P {
size_t idx; It loc;
bool operator<(P const& rhs) const { return idx < rhs.idx; }
};
namespace bc = boost::container;
bc::flat_set<P, std::less<P>, bc::small_vector<P, 10> > picked;
// generate n unique picks
while (n-->0) {
auto entry = [&] {
while (true) {
auto insertion = picked.insert({pick(), f});
if (insertion.second)
return insertion.first;
}
}();
// traverse and cache loc
if (entry == begin(picked)) {
// advance from scratch
entry->loc = std::next(f, entry->idx);
} else {
// minimum steps from prior cached loc
auto& prior = *std::prev(entry);
entry->loc = std::next(prior.loc, entry->idx - prior.idx);
}
// output
*out++ = *entry->loc;
}
return out;
}
} // namespace my
#include <set>
#include <iostream>
#include <iterator>
int main() {
std::multiset<int> const pool {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
};
std::mt19937 engine(std::random_device{}());
constexpr int N = 10;
for (int i = 0; i<N; ++i) {
my::sequential_sample(
pool.begin(), pool.end(),
std::ostream_iterator<int>(std::cout << "-- sequential draw (n=3): ", " "),
3,
engine);
std::cout << "\n";
}
for (int i = 0; i<N; ++i) {
my::random_sample(
pool.begin(), pool.end(),
std::ostream_iterator<int>(std::cout << "-- random draw (n=3): ", " "),
3,
engine);
std::cout << "\n";
}
}
Prints e.g.
-- sequential draw (n=3): 14 66 71
-- sequential draw (n=3): 24 26 30
-- sequential draw (n=3): 19 34 65
-- sequential draw (n=3): 16 41 49
-- sequential draw (n=3): 15 25 37
-- sequential draw (n=3): 15 49 84
-- sequential draw (n=3): 12 53 88
-- sequential draw (n=3): 46 70 94
-- sequential draw (n=3): 32 51 56
-- sequential draw (n=3): 32 37 95
-- random draw (n=3): 15 38 35
-- random draw (n=3): 61 64 58
-- random draw (n=3): 4 37 93
-- random draw (n=3): 0 43 84
-- random draw (n=3): 58 52 59
-- random draw (n=3): 81 43 3
-- random draw (n=3): 41 30 89
-- random draw (n=3): 58 9 84
-- random draw (n=3): 15 39 27
-- random draw (n=3): 74 27 9
I am having trouble finding the number of distinct elements in a 2D array using for loops. I know how to do it for a 1D array, but can't figure out how to do it for a 2D array.
I tried searching for it, but can't quite understand how some of the examples work.
I recommend using std::array, and the std::find algorithm for finding a specific element in the array.
int array[5][4] = {{ 34, 56, 79, 12},
{ 25, 37, 41, 18 },
{ 59, 29, 38, 47 },
{ 55, 11, 88, 34 },
{ 45, 19, 34, 66 } };
And use
find(array[0], array[n-1]+m, x)
//array is your 2D array, n is the first dimension, m is the second and x is your value
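Since the question is about counting distinct elements rather than finding one value, here is a small sketch of one way to do that with two nested for loops and a std::set. The helper name count_distinct is made up for illustration, and it assumes the element type can be ordered with operator<.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper: counts the distinct values in a built-in 2D array.
template <typename T, std::size_t N, std::size_t M>
std::size_t count_distinct(const T (&arr)[N][M])
{
    std::set<T> seen;
    for (const auto& row : arr)        // each row is a T[M]
        for (const T& value : row)
            seen.insert(value);        // std::set ignores duplicates
    return seen.size();
}
```

For the 5x4 array above this should give 18, because 34 occurs three times and every other value occurs once.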
Let's say I have:
int array[9][9]= {
{1 , 2 , 3 , 4, 5, 6, 7, 8, 9},
{10, 11, 12, 13, 14, 15, 16, 17, 18},
{19, 20, 21, 22, 23, 24, 25, 26, 27},
{28, 29, 30, 31, 32, 33, 34, 35, 36},
{37, 38, 39, 40, 41, 42, 43, 44, 45},
{46, 47, 48, 49, 50, 51, 52, 53, 54},
{55, 56, 57, 58, 59, 60, 61, 62, 63},
{64, 65, 66, 67, 68, 69, 70, 71, 72},
{73, 74, 75, 76, 77, 78, 79, 80, 81}
};
How can I apply some function only to the first row (values 1 to 9) or only to the first column (values 1 to 73)? Let's say I want index 0 to 9 to all have the value 0.
Is it possible to save this range in a variable?
Try it like this:
for (int i = 0; i < 9; i++) // the first row has 9 elements: indices 0..8
array[0][i] = 0;
There are no true multidimensional arrays in C++.
In a true multidimensional array, all dimensions are on equal standing. Whatever you can do with rows, you can also do with columns.
This is not the case in C++. The fourth row of your array (index 3) is just
array[3]
It's an array on its own in every regard. A range of rows, like any other range, can be represented as a (start, end) pair, e.g. make_pair(array[3], array[7]).
Nothing like that can be done with columns. A column, unlike a row, is not an array; it's just a virtual collection of elements not sitting under any standard data structure umbrella.
The closest thing to multidimensional array slices is a custom iterator, such that ++i moves either to the next element to the right or to the next element below. While you're at it, consider moving away from C-style arrays to STL-style containers.
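To make the custom iterator idea a bit more concrete, here is a rough sketch of a forward iterator that walks down one column of a built-in int array. ColumnIter, column_begin and column_end are hypothetical names of mine, and the sketch is limited to int elements for brevity.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical forward iterator over one column of an int[N][M] array.
// It holds a pointer to the current row plus a fixed column index, so
// ++ moves to the element directly below.
template <std::size_t M>
class ColumnIter {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type        = int;
    using difference_type   = std::ptrdiff_t;
    using pointer           = int*;
    using reference         = int&;

    ColumnIter(int (*row)[M], std::size_t col) : row_(row), col_(col) {}

    reference operator*() const { return (*row_)[col_]; }
    ColumnIter& operator++() { ++row_; return *this; }
    ColumnIter operator++(int) { ColumnIter old = *this; ++row_; return old; }
    bool operator==(ColumnIter const& o) const { return row_ == o.row_ && col_ == o.col_; }
    bool operator!=(ColumnIter const& o) const { return !(*this == o); }

private:
    int (*row_)[M];      // points at the current row
    std::size_t col_;    // column being traversed
};

template <std::size_t N, std::size_t M>
ColumnIter<M> column_begin(int (&a)[N][M], std::size_t col) { return ColumnIter<M>(a, col); }

template <std::size_t N, std::size_t M>
ColumnIter<M> column_end(int (&a)[N][M], std::size_t col) { return ColumnIter<M>(a + N, col); }
```

With that, a (begin, end) pair for a column can be stored in variables and handed to standard algorithms, for example std::fill(column_begin(array, 0), column_end(array, 0), 0) to zero the first column.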
To isolate the rows of the array, you could take a reference to a row of the array:
int (&row)[9] = array[2];
For example the above line takes a reference to the 3rd row of the array.
For columns, it is more complicated.
Alternatively, you could use the following construct, which returns a vector of reference wrappers to either a column or a row of a 2D array.
// if flg == true you get row at idx else if flg == false you get column at idx
template<typename T, int N, int M>
std::vector<std::reference_wrapper<T>>
getRange(T (&arr)[N][M], std::size_t const idx, bool const flg = true) {
if(flg) {
return typename std::vector<std::reference_wrapper<T>>(std::begin(arr[idx]), std::end(arr[idx]));
} else {
typename std::vector<std::reference_wrapper<T>> out;
out.reserve(N);
for(int i(0); i < N; ++i) out.push_back(arr[i][idx]);
return out;
}
}
For rows it's easy, as you can pass them like:
void foo(int * row, int cols) {
for (int col = 0; col < cols; ++col) {
int * x = row + col;
}
}
...
foo(array[3], 9);
...
For columns it's more difficult, but you can think of every column as starting at a specific offset into the array and advancing by one row width per element:
void boo(int * col, int rows, int cols) {
for (int row = 0; row < rows; ++row) {
int * x = col + row * cols;
}
}
....
// process the fifth column (index 4):
boo(array[0]+4, 9, 9);
Of course, using sizeof instead of a hard-coded 9, and C++ vectors or std::array instead of C-style int[][], will make life easier and the code more readable and maintainable.
Another way is to use a Boost.uBLAS matrix, e.g.:
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/matrix_proxy.hpp> // matrix_row, matrix_column
using namespace boost::numeric::ublas;
matrix<double> m(9, 9);
matrix_row<matrix <double> > row(m, 5);
matrix_column<matrix <double> > col(m, 4);
You can do it by passing indices (the start and end of the range) to your function, together with a flag saying whether it should be applied to a row or a column. Since you are using a plain C-style array, it's trickier to deal with pointers. I recommend using vectors, and pairs for the ranges.
An example for C style array
void some_function(int array[][9], bool row_or_column, size_t major, size_t start, size_t end){
if (row_or_column) {
for (size_t i = start; i < end; i++) {
cout << array[major][i]; //perform your operation on row
}
}
else {
for (size_t i = start; i < end; i++) {
cout << array[i][major]; //perform your operation on column
}
}
}
Set row_or_column to true for a row or false for a column; major specifies the row or column number, and start and end give the range. Note: end is exclusive.
To process the second row with start = 0 and end = 5, i.e. 10 to 14:
some_function(array, true, 1, 0, 5);
To process the second column with start = 0 and end = 5, i.e. 2 to 38:
some_function(array, false, 1, 0, 5);
I'm encoding a byte array into a QR code using libqrencode and then trying to decode it using the zbar library. The programming language is C++.
The problem occurs when the values are >= 128. For example, when I decode the QR code which contains the following values:
unsigned char data[17]={111, 127, 128, 224, 255, 178, 201,200, 192, 191,22, 17,20, 34, 65 ,23, 76};
symbol->get_data_length() returns 25 instead of 17, and when I tried to print the values using this small piece of code:
string input_data = symbol->get_data();
for(int k=0; k< 25; k++)
cout<< (int)((unsigned char)input_data[k])<<", ";
I got the following result:
111, 127, 194, 128, 195, 160, 195, 191, 194, 178, 195, 137, 195, 136, 195, 128, 194, 191, 22, 17, 20, 34, 65, 23, 76,
So, as we can see, the values < 128 weren't affected, but I got two bytes for every value >= 128.
Also I printed the values without casting to unsigned char:
for(int k=0; k< 25; k++)
cout<< (int)input_data[k]<<", ";
and the result is:
111, 127, -62, -128, -61, -96, -61, -65, -62, -78, -61, -119, -61, -120, -61, -128, -62, -65, 22, 17, 20, 34, 65, 23, 76
I worked around this problem with the following code:
void process_zbar_output(const string & input_data, vector<unsigned char> & output_data)
{
for (int i = 0; i < input_data.length(); i++)
{
int temp = (int) input_data[i];
// if the original value is >=128 we need to process it to get the original value
if (temp < 0)
{
// if the number is -62 then the original is between 128 and 191
// if the number is -61 then the original is between 192 and 255
if (temp == -62)
output_data.push_back(256 + ((int) input_data[i + 1]));
else
output_data.push_back(256 + ((int) input_data[i + 1] + 64));
i++;
}
else
{
output_data.push_back( input_data[i]);
}
}
}
Can anybody help me with this problem and explain why I got these extra bytes?
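For what it's worth, the byte pairs in the output above (194, 128 for 128; 195, 160 for 224; 195, 191 for 255) are exactly what the two-byte UTF-8 encoding of code points 128 to 255 looks like, which suggests the payload was converted to UTF-8 somewhere on the decoding side. Assuming that is the cause, the workaround can be written as a plain UTF-8 decode. The function name utf8_to_bytes is made up, and the sketch only handles the lead bytes 0xC2 and 0xC3 that appear in this data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: undo an expansion of raw bytes into UTF-8. Assumes every
// multi-byte sequence is a two-byte one encoding U+0080..U+00FF.
std::vector<unsigned char> utf8_to_bytes(const std::string& utf8)
{
    std::vector<unsigned char> out;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {
            out.push_back(lead);                       // plain ASCII byte
        } else if ((lead == 0xC2 || lead == 0xC3) && i + 1 < utf8.size()) {
            unsigned char cont = static_cast<unsigned char>(utf8[i + 1]);
            // 110xxxxx 10yyyyyy -> xxxxxyyyyyy
            out.push_back(static_cast<unsigned char>(((lead & 0x1F) << 6) | (cont & 0x3F)));
            ++i;                                       // skip the continuation byte
        } else {
            out.push_back(lead);                       // unexpected byte: pass through
        }
    }
    return out;
}
```

Feeding it the 25 bytes printed above should give back the original 17 values.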
How have/would you design a function that on each call returns the next value in a nominated numeric range in lexicographical order of string representation...?
Example: range 8..203 --> 10, 100..109, 11, 110..119, 12, 120..129, 13, 130..139, ..., 19, 190..199, 20, 200..203, 30..99.
Constraints: indices 0..~INT_MAX, fixed space, O(range-length) performance, preferably "lazy" so if you stop iterating mid way you haven't wasted processing effort. Please don't post brute force "solutions" iterating numerically while generating strings that are then sorted.
Utility: if you're generating data that ultimately needs to be lexicographically presented or processed, a lexicographical series promises lazy generation as needed, reduces memory requirements and eliminates a sort.
Background: when answering this question today, my solution gave output in numeric order (i.e. 8, 9, 10, 11, 12), not lexicographical order (10, 11, 12, 8, 9) as illustrated in the question. I imagined it would be easy to write or find a solution, but my Google-fu let me down and it was trickier than I expected, so I figured I'd collect/contribute here....
(Tagged C++ as it's my main language and I'm personally particularly interested in C++ solutions, but anything's welcome)
Somebody voted to close this because I either didn't demonstrate a minimal understanding of the problem being solved (hmmmm!?! ;-P), or an attempted solution. My solution is posted as an answer as I'm happy for it to be commented on and regaled in the brutal winds of Stack Overflow wisdom.... O_o
This is actually quite easy. First an observation:
Theorem: if two numbers x and y with x < y are in the series and have the same number of digits, then x comes before y.
Proof: let's view the digits of x as xn...x0 and the digits of y as yn...y0. Let's take the leftmost digit in which the two differ, assumed to be at index i-1. Therefore, we have:
y = yn...yiy(i-1)...y0
x = yn...yix(i-1)...x0
since all digits from n to i are the same in both numbers. If x < y, then mathematically:
x(i-1) < y(i-1)
Lexicographically, if the digit x(i-1) is smaller than the digit y(i-1), then x comes before y.
This theorem means that your specified range [a, b] may contain numbers with different numbers of digits, but the ones that have the same number of digits appear in their mathematical order.
Building on that, here's a simple algorithm. First, let's say a has m digits and b has n digits (n >= m)
1. create a heap with lexicographical order
2. initially, insert `a` and `10^i` for i in [m, n - 1]
3. while the heap is not exhausted
3.1. remove and yield the top of the heap (`next`) as next result
3.2. if `next + 1` is still in range `[a, b]` (and doesn't increase in digits), insert it in heap
Notes:
In step 2, you are inserting the starting numbers of each series of numbers that have the same number of digits.
To change to a function that returns a number on each call, step 3.1 should be changed to store the state of the algorithm and resume on next call. Pretty standard.
Step 3.2 is the part that exploits the above theorem and keeps only the next number in mathematical order in the heap.
Assuming N = b - a, the extra space used by this algorithm is O(log N) and its time complexity is O(N * log log N).
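The steps above can be sketched in C++ roughly as follows. This is only an illustration under some assumptions: the names LexGreater and LexHeap are mine, the range must satisfy 0 <= a <= b, and the heap compares decimal strings via std::to_string, which is simple but not fast.

```cpp
#include <queue>
#include <string>
#include <vector>

// Orders numbers by their decimal string, reversed so that
// std::priority_queue (a max-heap) pops the lexicographically smallest first.
struct LexGreater {
    bool operator()(long x, long y) const {
        return std::to_string(x) > std::to_string(y);
    }
};

// Hypothetical generator for the range [a, b], assuming 0 <= a <= b.
class LexHeap {
public:
    LexHeap(long a, long b) : b_(b) {
        if (a < 0 || a > b) return;           // treated as an empty range
        heap_.push(a);                         // step 2: a itself ...
        for (long p = 10; p <= b; p *= 10) {   // ... plus each power of ten in (a, b]
            if (p > a) heap_.push(p);
            if (p > b / 10) break;             // stop before p * 10 could overflow
        }
    }
    bool done() const { return heap_.empty(); }
    long next() {                              // step 3, one number per call
        long n = heap_.top();
        heap_.pop();
        // step 3.2: n + 1 continues the same-length series unless it rolls
        // over to a power of ten, which was already seeded in the constructor
        if (n < b_ && !is_power_of_ten(n + 1)) heap_.push(n + 1);
        return n;
    }
private:
    static bool is_power_of_ten(long v) {      // true for 10, 100, 1000, ...
        if (v < 10) return false;
        while (v % 10 == 0) v /= 10;
        return v == 1;
    }
    std::priority_queue<long, std::vector<long>, LexGreater> heap_;
    long b_;
};
```

Usage would be along the lines of LexHeap gen(8, 203); while (!gen.done()) use(gen.next()); where use stands for whatever you do with each number. The heap never holds more than one entry per digit length, which is where the small extra space comes from.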
Here's my attempt, in Python:
import math
#iterates through all numbers between start and end, that start with `cur`'s digits
def lex(start, end, cur=0):
if cur > end:
return
if cur >= start:
yield cur
for i in range(0,10):
#add 0-9 to the right of the current number
next_cur = cur * 10 + i
if next_cur == 0:
#we already yielded 0, no need to do it again
continue
for ret in lex(start, end, next_cur):
yield ret
print(list(lex(8, 203)))
Result:
[10, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 11, 110, 111, 112, 113,
114, 115, 116, 117, 118, 119, 12, 120, 121, 122, 123, 124, 125, 126, 127, 128,
129, 13, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 14, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 15, 150, 151, 152, 153, 154, 155, 156, 157,
158, 159, 16, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 17, 170, 171,
172, 173, 174, 175, 176, 177, 178, 179, 18, 180, 181, 182, 183, 184, 185, 186,
187, 188, 189, 19, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 20, 200,
201, 202, 203, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79, 8, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 9, 90, 91, 92, 93, 94, 95,
96, 97, 98, 99]
This uses O(log(end)) stack space, which is bounded by INT_MAX, so it won't go any deeper than five calls for your typical 16 bit int. It runs in O(end) time, since it has to iterate through numbers smaller than start before it can begin yielding valid numbers. This can be considerably worse than O(end-start) if start and end are large and close together.
Iterating through lex(0, 1000000) takes about six seconds on my machine, so it appears to be slower than Tony's method but faster than Shahbaz's. Of course, it's challenging to make a direct comparison since I'm using a different language.
This is a bit of a mess, so I'm curious to see how other people tackle it. There are so many edge cases explicitly handled in the increment operator!
For range low to high:
0 is followed by 1
numbers shorter than high are always followed by 0-appended versions (e.g. 12->120)
numbers other than high that end in 0-8 are followed by the next integer
when low has as many digits as high, you finish after high (return sentinel high + 1)
otherwise you finish at a number 999... with one less digit than high
other numbers ending in 9(s) have the part before the trailing 9s incremented, but if that results in trailing 0s they're removed providing the number's still more than low
template <typename T>
std::string str(const T& t)
{
std::ostringstream oss; oss << t; return oss.str();
}
template <typename T>
class Lex_Counter
{
public:
typedef T value_type;
Lex_Counter(T from, T to, T first = -1)
: from_(from), to_(to),
min_size_(str(from).size()), max_size_(str(to).size()),
n_(first != -1 ? first : get_first()),
max_unit_(pow(10, max_size_ - 1)), min_unit_(pow(10, min_size_ - 1))
{ }
operator T() { return n_; }
T& operator++()
{
if (n_ == 0)
return n_ = 1;
if (n_ < max_unit_ && n_ * 10 <= to_)
return n_ = n_ * 10; // e.g. 10 -> 100, 89 -> 890
if (n_ % 10 < 9 && n_ + 1 <= to_)
return ++n_; // e.g. 108 -> 109
if (min_size_ == max_size_
? n_ == to_
: (n_ == max_unit_ - 1 && to_ < 10 * max_unit_ - 10 || // 99/989
n_ == to_ && to_ >= 10 * max_unit_ - 10)) // eg. 993
return n_ = to_ + 1;
// increment the right-most non-9 digit
// note: all-9s case handled above (n_ == max_unit_ - 1 etc.)
// e.g. 109 -> 11, 19 -> 2, 239999->24, 2999->3
// comments below explain 230099 -> 230100
// search from the right until we have exactly non-9 digit
for (int k = 100; ; k *= 10)
if (n_ % k != k - 1)
{
int l = k / 10; // n_ 230099, k 1000, l 100
int r = ((n_ / l) + 1) * l; // 230100
if (r > to_ && r / 10 < from_)
return n_ = from_; // e.g. from_ 8, r 20...
while (r / 10 >= from_ && r % 10 == 0)
r /= 10; // e.g. 230100 -> 2301
return n_ = r <= from_ ? from_ : r;
}
assert(false);
}
private:
T get_first() const
{
if (min_size_ == max_size_ ||
from_ / min_unit_ < 2 && from_ % min_unit_ == 0)
return from_;
// can "fall" from e.g. 321 to 1000
return min_unit_ * 10;
}
T pow(T n, int exp)
{ return exp == 0 ? 1 : exp == 1 ? n : 10 * pow(n, exp - 1); }
T from_, to_;
size_t min_size_, max_size_;
// declared before n_ so they are initialized before get_first() reads min_unit_
T max_unit_, min_unit_;
T n_;
};
Performance numbers
I can count from 0 to 1 billion in under a second on a standard Intel machine / single threaded, MS compiler at -O2.
The same machine / harness running my attempt at Shahbaz's solution (below) takes over 3.5 seconds to count to 100,000. Maybe the std::set isn't a good heap/heap-substitute, or there's a better way to use it? Any optimisation suggestions welcome.
template <typename T>
struct Shahbaz
{
std::set<std::string> s;
Shahbaz(T from, T to)
: to_(to)
{
s.insert(str(from));
for (int n = 10; n < to_; n *= 10)
if (n > from) s.insert(str(n));
n_ = atoi(s.begin()->c_str());
}
operator T() const { return n_; }
Shahbaz& operator++()
{
if (s.empty())
n_ = to_ + 1;
else
{
s.erase(s.begin());
if (n_ + 1 <= to_)
{
s.insert(str(n_ + 1));
n_ = atoi(s.begin()->c_str());
}
}
return *this;
}
private:
T n_, to_;
};
Perf code for reference...
void perf()
{
DWORD start = GetTickCount();
int to = 1000 *1000;
// Lex_Counter<int> counter(0, to);
Shahbaz<int> counter(0, to);
while (counter <= to)
++counter;
DWORD elapsed = GetTickCount() - start;
std::cout << '~' << elapsed << "ms\n";
}
Some Java code (deriving C++ code from this should be trivial), very similar to Kevin's Python solution:
public static void generateLexicographical(int lower, int upper)
{
for (int i = 1; i < 10; i++)
generateLexicographical(lower, upper, i);
}
private static void generateLexicographical(int lower, int upper, int current)
{
if (lower <= current && current <= upper)
System.out.println(current);
if (current > upper)
return;
for (int i = 0; i < 10; i++)
generateLexicographical(lower, upper, 10*current + i);
}
public static void main(String[] args)
{
generateLexicographical(11, 1001);
}
The order of the if-statements is not important, and one can be made an else of the other, but strangely enough, changing them in any way makes it take about 20% longer.
This just starts with each number from 1 to 9, then recursively appends each possible digit from 0 to 9 to that number, until we get a number bigger than the upper limit.
It similarly uses O(log upper) space (every digit requires a stack frame) and O(upper) time (we go from 1 to upper).
I/O is obviously the most time-consuming part here. If that is removed and replaced by just incrementing a variable, generateLexicographical(0, 100_000_000); takes about 4 seconds, though that figure is by no means from a proper benchmark.
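For reference, a C++ rendering of the same recursion might look something like this. It is a sketch rather than a tuned port: lex_visit and lex_range are names I made up, results are collected into a vector instead of printed, and it assumes upper is small enough that 10 * current + 9 cannot overflow a long.

```cpp
#include <vector>

// Hypothetical C++ port of the recursive idea: visit `current`, then every
// number formed by appending one more digit, in digit order.
void lex_visit(long lower, long upper, long current, std::vector<long>& out)
{
    if (current > upper)
        return;                        // nothing below this prefix can be in range
    if (current >= lower)
        out.push_back(current);
    for (int digit = 0; digit < 10; ++digit)
        lex_visit(lower, upper, 10 * current + digit, out);
}

std::vector<long> lex_range(long lower, long upper)
{
    std::vector<long> out;
    for (long first = 1; first < 10; ++first)   // leading digit 1..9
        lex_visit(lower, upper, first, out);
    return out;
}
```

Like the Java version, it starts from the leading digits 1 to 9, so 0 itself is never produced even if lower is 0.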