Speed comparison of 2 loop styles - c++

I'm reading about STL algorithms and the book pointed out that algorithms like find use a while loop rather than a for loop because it is minimal, efficient, and uses one less variable. I decided to do some testing and the results didn't really match up.
The forfind consistently performed better than the whilefind. At first I simply tested by pushing 10000 ints back into a vector, and then using find to get a single value from it and return it to the iterator. I timed it and output that time.
Then I decided to change it so that the forfind and whilefind functions were used multiple times (in this case 10000 times). However, the for loop find still came up with better performance than the while find. Can anyone explain this? Here is the code.
#include "std_lib_facilities.h"
#include<ctime>
template<class ln, class T>
ln whilefind(ln first, ln last, const T& val)
{
while (first!=last && *first!=val) ++first;
return first;
}
template<class ln, class T>
ln forfind(ln first, ln last, const T& val)
{
for (ln p = first; p!=last; ++p)
if(*p == val) return p;
return last;
}
int main()
{
vector<int> numbers;
vector<int>::iterator whiletest;
vector<int>::iterator fortest;
for (int n = 0; n < 10000; ++n)
numbers.push_back(n);
clock_t while1 = clock(); // start
for (int i = 0; i < 10000; ++i)
whiletest = whilefind(numbers.begin(), numbers.end(), i);
clock_t while2 = clock(); // stop
clock_t for1 = clock(); // start
for (int i = 0; i < 10000; ++i)
fortest = forfind(numbers.begin(), numbers.end(), i);
clock_t for2 = clock(); // stop
cout << "While loop: " << double(while2-while1)/CLOCKS_PER_SEC << " seconds.\n";
cout << "For loop: " << double(for2-for1)/CLOCKS_PER_SEC << " seconds.\n";
}
The while loop consistently reports taking around .78 seconds and the for loop reports .67 seconds.

if(*p = val) return p;
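That should be ==. With the assignment, the condition tests the value that was just assigned, so forfind only goes through the entire vector for the first value, 0 (which evaluates to false every time), and returns immediately at the first element for the numbers 1-9999.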

Related

C++ : Program fails during vector filling and searching

I'm practicing C++ vector and as an exercise I want to fill a vector with 16 million random numbers and then find the position of the first occurrence of a number. The code which I implemented so far is this:
int getIndexOf(std::vector<int>& v, int num) {
for(std::size_t i=0; i < v.size(); i++) {
if(v.at(i) == num) {
return i;
}
}
return -1;
}
int main() {
int searchedNumber = 42;
int vectorSize = 16000000;
std::vector<int> v(vectorSize);
for(std::size_t i=0; i < v.size(); i++) {
v.push_back(rand() % 10000000);
}
//Linear search
auto start = std::chrono::high_resolution_clock::now();
int position = getIndexOf(v, searchedNumber);
auto stop = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::seconds>(stop - start);
std::cout << "The linear search took: " << duration.count() << " seconds" << std::endl;
std::cout << "The number " << searchedNumber << " occur first at position " << position << std::endl;
return 0;
}
Additionally, I measure the time just for some statistics. The problem is that the program crashes with a bad_alloc error, which I associate with running out of stack space. So initially I thought that filling a vector with so many numbers while the vector is on the stack was the reason for the crash, and I created the vector dynamically (through a pointer). However, I still get the same error. What might be the reason for this?
int vectorSize = 16000000;
std::vector<int> v(vectorSize);
for(std::size_t i=0; i < v.size(); i++) {
v.push_back(rand() % 10000000);
}
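This part is bad. The vector already holds vectorSize elements, and push_back() appends one more on every iteration, so size() grows along with i and the condition i < v.size() never becomes false. The loop keeps allocating until memory runs out, which is where bad_alloc comes from.
You should do this instead: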
int vectorSize = 16000000;
std::vector<int> v;
v.reserve(vectorSize); // allocate memory without actually adding elements
for(int i=0; i < vectorSize; i++) { // use the known size
v.push_back(rand() % 10000000);
}

Vector equilibrium point(s) function in C++

So I wanted to clean the rust off my C++ skills and thought I'd start with something fairly simple. An equilibrium point in a vector A of size N is a point K, such that: A[0] + A[1] + ... + A[K−1] = A[K+1] + ... + A[N−2] + A[N−1]. The rationale behind the function algorithm is simple: Check each consecutive element of the vector and compare the sum of the elements before said element with the sum of the elements after it and if they are equal, output the index of that element. While it sounds simple (and I imagine that it is) it turned out to be harder to implement in reality. Here's what the code looks like:
#include <iostream>
#include <vector>
using std::cin;
using std::cout;
using std::endl;
void EquilibriumPoint(std::vector<int> &A);
void VectorPrint(std::vector<int> &V);
void main()
{
int input;
std::vector<int> Vect1;
cout << "Input the vector elements" << endl;
while (cin >> input)
Vect1.push_back(input);
VectorPrint(Vect1);
EquilibriumPoint(Vect1);
}
void EquilibriumPoint(std::vector<int> &A)
{
for (int it = 0; it != A.size(); ++it)
{
int lowersum = 0;
int uppersum = 0;
for (int beg = 0; beg != it; ++beg) lowersum += A[beg];
for (int end = it + 1; end != A.size(); ++end) uppersum += A[end];
if (uppersum == lowersum) cout << it;
}
}
void VectorPrint(std::vector<int> &V)
{
for (int i = 0; i != V.size(); ++i)
cout << V[i] << endl;
}
As you can see I threw in a print function also for good measure. The problem is that the program doesn't seem to execute the EquilibriumPoint function. There must be a problem with the logic of the implementation but I can't find it. Do you guys have any suggestions?
cin >> input
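keeps succeeding as long as you type valid integers, so the loop only ends on end-of-file or invalid input; if you never send either, you effectively have an endless loop. You need to stop collecting elements at some point, for instance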
int input;
while (cin >> input && input != 0)
{
Vect1.push_back(input);
}
This accepts every element that is not zero; when a zero arrives, the loop ends (the zero itself is not stored) and your function runs.
Or you can first input the number of elements (if you want to include zeros), example:
int count;
cin >> count;
for (int i = 0; i < count; ++i)
{
cin >> input;
Vect1.push_back(input);
}
I didn't check the rest of the code, though. One problem at a time.

Is using a vector of boolean values slower than a dynamic bitset?

Is using a vector of boolean values slower than a dynamic bitset?
I just heard about Boost's dynamic_bitset, and I was wondering whether it is worth the trouble. Can I just use a vector of boolean values instead?
A great deal here depends on how many Boolean values you're working with.
Both bitset and vector<bool> normally use a packed representation where a Boolean is stored as only a single bit.
On one hand, that imposes some overhead in the form of bit manipulation to access a single value.
On the other hand, that also means many more of your Booleans will fit in your cache.
If you're using a lot of Booleans (e.g., implementing a sieve of Eratosthenes) fitting more of them in the cache will almost always end up a net gain. The reduction in memory use will gain you a lot more than the bit manipulation loses.
Most of the arguments against std::vector<bool> come back to the fact that it is not a standard container (i.e., it does not meet the requirements for a container). IMO, this is mostly a question of expectations -- since it says vector, many people expect it to be a container (other types of vectors are), and they often react negatively to the fact that vector<bool> isn't a container.
If you're using the vector in a way that really requires it to be a container, then you probably want to use some other combination -- either deque<bool> or vector<char> can work fine. Think before you do that though -- there's a lot of (lousy, IMO) advice that vector<bool> should be avoided in general, with little or no explanation of why it should be avoided at all, or under what circumstances it makes a real difference to you.
Yes, there are situations where something else will work better. If you're in one of those situations, using something else is clearly a good idea. But, be sure you're really in one of those situations first. Anybody who tells you (for example) that "Herb says you should use vector<char>" without a lot of explanation about the tradeoffs involved should not be trusted.
Let's give a real example. Since it was mentioned in the comments, let's consider the Sieve of Eratosthenes:
#include <vector>
#include <iostream>
#include <iterator>
#include <chrono>
unsigned long primes = 0;
template <class bool_t>
unsigned long sieve(unsigned max) {
std::vector<bool_t> sieve(max, false);
sieve[0] = sieve[1] = true;
for (int i = 2; i < max; i++) {
if (!sieve[i]) {
++primes;
for (int temp = 2 * i; temp < max; temp += i)
sieve[temp] = true;
}
}
return primes;
}
// Warning: auto return type will fail with older compilers
// Fine with g++ 5.1 and VC++ 2015 though.
//
template <class F>
auto timer(F f, int max) {
auto start = std::chrono::high_resolution_clock::now();
primes += f(max);
auto stop = std::chrono::high_resolution_clock::now();
return stop - start;
}
int main() {
using namespace std::chrono;
unsigned number = 100000000;
auto using_bool = timer(sieve<bool>, number);
auto using_char = timer(sieve<char>, number);
std::cout << "ignore: " << primes << "\n";
std::cout << "Time using bool: " << duration_cast<milliseconds>(using_bool).count() << "\n";
std::cout << "Time using char: " << duration_cast<milliseconds>(using_char).count() << "\n";
}
We've used a large enough array that we can expect a large portion of it to occupy main memory. I've also gone to a little pain to ensure that the only thing that changes between one invocation and the other is the use of a vector<char> vs. vector<bool>. Here are some results. First with VC++ 2015:
ignore: 34568730
Time using bool: 2623
Time using char: 3108
...then the time using g++ 5.1:
ignore: 34568730
Time using bool: 2359
Time using char: 3116
Obviously, the vector<bool> wins in both cases--by around 15% with VC++, and over 30% with gcc. Also note that in this case, I've chosen the size to show vector<char> in quite favorable light. If, for example, I reduce number from 100000000 to 10000000, the time differential becomes much larger:
ignore: 3987474
Time using bool: 72
Time using char: 249
Although I haven't done a lot of work to confirm, I'd guess that in this case, the version using vector<bool> is saving enough space that the array fits entirely in the cache, while the vector<char> is large enough to overflow the cache, and involve a great deal of main memory access.
You should usually avoid std::vector<bool> because it is not a standard container. It's a packed version, so it breaks some valuable guarantees usually given by a vector. A valid alternative would be to use std::vector<char> which is what Herb Sutter recommends.
You can read more about it in his GotW on the subject.
Update:
As has been pointed out, vector<bool> can be used to good effect, as a packed representation improves locality on large data sets. It may very well be the fastest alternative depending on circumstances. However, I would still not recommend it by default since it breaks many of the promises established by std::vector and the packing is a speed/memory tradeoff which may be beneficial in both speed and memory.
If you choose to use it, I would do so after measuring it against vector<char> for your application. Even then, I'd recommend using a typedef to refer to it via a name which does not seem to make the guarantees which it does not hold.
#include "boost/dynamic_bitset.hpp"
#include <chrono>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <random>
#include <vector>
int main(int, char*[])
{
auto gen = std::bind(std::uniform_int_distribution<>(0, 1), std::default_random_engine());
std::vector<char> randomValues(1000000);
for (char & randomValue : randomValues)
{
randomValue = static_cast<char>(gen());
}
// many accesses, few initializations
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 500; ++i)
{
std::vector<bool> test(1000000, false);
for (int j = 0; j < test.size(); ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Time taken1: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
<< " milliseconds" << std::endl;
auto start2 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 500; ++i)
{
boost::dynamic_bitset<> test(1000000, false);
for (int j = 0; j < test.size(); ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end2 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken2: " << std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2).count()
<< " milliseconds" << std::endl;
auto start3 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 500; ++i)
{
std::vector<char> test(1000000, false);
for (int j = 0; j < test.size(); ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end3 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken3: " << std::chrono::duration_cast<std::chrono::milliseconds>(end3 - start3).count()
<< " milliseconds" << std::endl;
// few accesses, many initializations
auto start4 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; ++i)
{
std::vector<bool> test(1000000, false);
for (int j = 0; j < 500; ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end4 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken4: " << std::chrono::duration_cast<std::chrono::milliseconds>(end4 - start4).count()
<< " milliseconds" << std::endl;
auto start5 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; ++i)
{
boost::dynamic_bitset<> test(1000000, false);
for (int j = 0; j < 500; ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end5 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken5: " << std::chrono::duration_cast<std::chrono::milliseconds>(end5 - start5).count()
<< " milliseconds" << std::endl;
auto start6 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; ++i)
{
std::vector<char> test(1000000, false);
for (int j = 0; j < 500; ++j)
{
test[j] = static_cast<bool>(randomValues[j]);
}
}
auto end6 = std::chrono::high_resolution_clock::now();
std::cout << "Time taken6: " << std::chrono::duration_cast<std::chrono::milliseconds>(end6 - start6).count()
<< " milliseconds" << std::endl;
return EXIT_SUCCESS;
}
Time taken1: 1821 milliseconds
Time taken2: 1722 milliseconds
Time taken3: 25 milliseconds
Time taken4: 1987 milliseconds
Time taken5: 1993 milliseconds
Time taken6: 10970 milliseconds
In these measurements, dynamic_bitset and std::vector<bool> perform about the same.
If you allocate many times but access each array only a few times, go for std::vector<bool>, because it has lower allocation/initialization time.
If you allocate once and access many times, go for std::vector<char>, because of faster access.
Also keep in mind that std::vector<bool> is NOT safe to write to from multiple threads without synchronization, because two threads may write to different bits that live in the same byte.
It appears that the size of a dynamic bitset cannot be changed:
"The dynamic_bitset class is nearly identical to the std::bitset class. The difference is that the size of the dynamic_bitset (the number of bits) is specified at run-time during the construction of a dynamic_bitset object, whereas the size of a std::bitset is specified at compile-time through an integer template parameter." (from http://www.boost.org/doc/libs/1_36_0/libs/dynamic_bitset/dynamic_bitset.html)
As such, it should be slightly faster since it will have slightly less overhead than a vector, but you lose the ability to insert elements.
UPDATE: I just realized that the OP was asking about vector<bool> vs. bitset, and my answer does not answer the question, but I think I should leave it: if you search for "c++ vector bool slow", you end up here.
vector<bool> is terribly slow. At least on my Arch Linux system (you can probably get a better implementation or something... but I was really surprised). If anybody has any suggestions why this is so slow, I'm all ears! (Sorry for the blunt beginning, here's the more professional part.)
I've written two implementations of the SOE, and the 'close to metal' C implementation is 10 times faster. sievec.c is the C implementation, and sievestl.cpp is the C++ implementation. I just compiled with make (implicit rules only, no makefile, so no optimization flags), and the results were 1.4 sec for the C version and 12 sec for the C++/STL version:
sievecmp % make -B sievec && time ./sievec 27
cc sievec.c -o sievec
aa 1056282
./sievec 27 1.44s user 0.01s system 100% cpu 1.455 total
and
sievecmp % make -B sievestl && time ./sievestl 27
g++ sievestl.cpp -o sievestl
1056282./sievestl 27 12.12s user 0.01s system 100% cpu 12.114 total
sievec.c is as follows:
#include <stdio.h>
#include <stdlib.h>
typedef unsigned long prime_t;
typedef unsigned long word_t;
#define LOG_WORD_SIZE 6
#define INDEX(i) ((i)>>(LOG_WORD_SIZE))
#define MASK(i) ((word_t)(1) << ((i)&(((word_t)(1)<<LOG_WORD_SIZE)-1)))
#define GET(p,i) (p[INDEX(i)]&MASK(i))
#define SET(p,i) (p[INDEX(i)]|=MASK(i))
#define RESET(p,i) (p[INDEX(i)]&=~MASK(i))
#define p2i(p) ((p)>>1) // (((p-2)>>1))
#define i2p(i) (((i)<<1)+1) // ((i)*2+3)
unsigned long find_next_zero(unsigned long from,
unsigned long *v,
size_t N){
size_t i;
for (i = from+1; i < N; i++) {
if(GET(v,i)==0) return i;
}
return -1;
}
int main(int argc, char *argv[])
{
size_t N = atoi(argv[1]);
N = 1lu<<N;
// printf("%u\n",N);
unsigned long *v = malloc(N/8);
for(size_t i = 0; i < N/64; i++) v[i]=0;
unsigned long p = 3;
unsigned long pp = p2i(p * p);
while( pp <= N){
for(unsigned long q = pp; q < N; q += p ){
SET(v,q);
}
p = p2i(p);
p = find_next_zero(p,v,N);
p = i2p(p);
pp = p2i(p * p);
}
unsigned long sum = 0;
for(unsigned long i = 0; i+2 < N; i++)
if(GET(v,i)==0 && GET(v,i+1)==0) {
unsigned long p = i2p(i);
// cout << p << ", " << p+2 << endl;
sum++;
}
printf("aa %lu\n",sum);
// free(v);
return 0;
}
sievestl.cpp is as follows:
#include <iostream>
#include <vector>
#include <sstream>
using namespace std;
inline unsigned long i2p(unsigned long i){return (i<<1)+1; }
inline unsigned long p2i(unsigned long p){return (p>>1); }
inline unsigned long find_next_zero(unsigned long from, vector<bool> v){
size_t N = v.size();
for (size_t i = from+1; i < N; i++) {
if(v[i]==0) return i;
}
return -1;
}
int main(int argc, char *argv[])
{
stringstream ss;
ss << argv[1];
size_t N;
ss >> N;
N = 1lu<<N;
// cout << N << endl;
vector<bool> v(N);
unsigned long p = 3;
unsigned long pp = p2i(p * p);
while( pp <= N){
for(unsigned long q = pp; q < N; q += p ){
v[q] = 1;
}
p = p2i(p);
p = find_next_zero(p,v);
p = i2p(p);
pp = p2i(p * p);
}
unsigned sum = 0;
for(unsigned long i = 0; i+2 < N; i++)
if(v[i]==0 and v[i+1]==0) {
unsigned long p = i2p(i);
// cout << p << ", " << p+2 << endl;
sum++;
}
cout << sum;
return 0;
}

Comprehensive vector vs linked list benchmark for randomized insertions/deletions

So I am aware of this question, and others on SO that deal with this issue, but most of those deal with the complexities of the data structures (just to copy here, a linked list theoretically has O(
I understand the complexities would seem to indicate that a list would be better, but I am more concerned with the real world performance.
Note: This question was inspired by slides 45 and 46 of Bjarne Stroustrup's presentation at Going Native 2012 where he talks about how processor caching and locality of reference really help with vectors, but not at all (or enough) with lists.
Question: Is there a good way to test this using CPU time as opposed to wall time, and getting a decent way of "randomly" inserting and deleting elements that can be done beforehand so it does not influence the timings?
As a bonus, it would be nice to be able to apply this to two arbitrary data structures (say vector and hash maps or something like that) to find the "real world performance" on some hardware.
I guess if I were going to test something like this, I'd probably start with code something on this order:
#include <list>
#include <vector>
#include <algorithm>
#include <deque>
#include <cstdlib>
#include <time.h>
#include <iostream>
#include <iterator>
static const int size = 30000;
template <class T>
double insert(T &container) {
srand(1234);
clock_t start = clock();
for (int i=0; i<size; ++i) {
int value = rand();
typename T::iterator pos = std::lower_bound(container.begin(), container.end(), value);
container.insert(pos, value);
}
// uncomment the following to verify correct insertion (in a small container).
// std::copy(container.begin(), container.end(), std::ostream_iterator<int>(std::cout, "\t"));
return double(clock()-start)/CLOCKS_PER_SEC;
}
template <class T>
double del(T &container) {
srand(1234);
clock_t start = clock();
for (int i=0; i<size/2; ++i) {
int value = rand();
typename T::iterator pos = std::lower_bound(container.begin(), container.end(), value);
container.erase(pos);
}
return double(clock()-start)/CLOCKS_PER_SEC;
}
int main() {
std::list<int> l;
std::vector<int> v;
std::deque<int> d;
std::cout << "Insertion time for list: " << insert(l) << "\n";
std::cout << "Insertion time for vector: " << insert(v) << "\n";
std::cout << "Insertion time for deque: " << insert(d) << "\n\n";
std::cout << "Deletion time for list: " << del(l) << '\n';
std::cout << "Deletion time for vector: " << del(v) << '\n';
std::cout << "Deletion time for deque: " << del(d) << '\n';
return 0;
}
Since it uses clock, this should give processor time not wall time (though some compilers such as MS VC++ get that wrong). It doesn't try to measure the time for insertion exclusive of time to find the insertion point, since 1) that would take a bit more work and 2) I still can't figure out what it would accomplish. It's certainly not 100% rigorous, but given the disparity I see from it, I'd be a bit surprised to see a significant difference from more careful testing. For example, with MS VC++, I get:
Insertion time for list: 6.598
Insertion time for vector: 1.377
Insertion time for deque: 1.484
Deletion time for list: 6.348
Deletion time for vector: 0.114
Deletion time for deque: 0.82
With gcc I get:
Insertion time for list: 5.272
Insertion time for vector: 0.125
Insertion time for deque: 0.125
Deletion time for list: 4.259
Deletion time for vector: 0.109
Deletion time for deque: 0.109
Factoring out the search time would be somewhat non-trivial because you'd have to time each iteration separately. You'd need something more precise than clock (usually is) to produce meaningful results from that (more on the order of reading a clock cycle register). Feel free to modify for that if you see fit -- as I mentioned above, I lack motivation because I can't see how it's a sensible thing to do.
This is the program I wrote after watching that talk. I tried running each timing test in a separate process to make sure the allocators weren't doing anything sneaky to alter performance. I have amended the test to allow timing of the random number generation. If you are concerned it is affecting the results significantly, you can time it and subtract the time spent there from the rest of the timings. But I get zero time spent there for anything but very large N. I used getrusage(), which I am pretty sure isn't portable to Windows, but it would be easy to substitute something using clock() or whatever you like.
#include <assert.h>
#include <algorithm>
#include <iostream>
#include <list>
#include <vector>
#include <string>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>
void f(size_t const N)
{
std::vector<int> c;
//c.reserve(N);
for (size_t i = 0; i < N; ++i) {
int r = rand();
auto p = std::find_if(c.begin(), c.end(), [=](int a) { return a >= r; });
c.insert(p, r);
}
}
void g(size_t const N)
{
std::list<int> c;
for (size_t i = 0; i < N; ++i) {
int r = rand();
auto p = std::find_if(c.begin(), c.end(), [=](int a) { return a >= r; });
c.insert(p, r);
}
}
int h(size_t const N)
{
int r;
for (size_t i = 0; i < N; ++i) {
r = rand();
}
return r;
}
double usage()
{
struct rusage u;
if (getrusage(RUSAGE_SELF, &u) == -1) std::abort();
return
double(u.ru_utime.tv_sec) + (u.ru_utime.tv_usec / 1e6) +
double(u.ru_stime.tv_sec) + (u.ru_stime.tv_usec / 1e6);
}
int
main(int argc, char* argv[])
{
assert(argc >= 3);
std::string const sel = argv[1];
size_t const N = atoi(argv[2]);
double t0, t1;
srand(127);
if (sel == "vector") {
t0 = usage();
f(N);
t1 = usage();
} else if (sel == "list") {
t0 = usage();
g(N);
t1 = usage();
} else if (sel == "rand") {
t0 = usage();
h(N);
t1 = usage();
} else {
std::abort();
}
std::cout
<< (t1 - t0)
<< std::endl;
return 0;
}
To get a set of results I used the following shell script.
seq=`perl -e 'for ($i = 10; $i < 100000; $i *= 1.1) { print int($i), " "; }'`
for i in $seq; do
vt=`./a.out vector $i`
lt=`./a.out list $i`
echo $i $vt $lt
done

Are there any better methods to do permutation of string?

void permute(string elems, int mid, int end)
{
static int count;
if (mid == end) {
cout << ++count << " : " << elems << endl;
return ;
}
else {
for (int i = mid; i <= end; i++) {
swap(elems, mid, i);
permute(elems, mid + 1, end);
swap(elems, mid, i);
}
}
}
The above function shows the permutations of str(with str[0..mid-1] as a steady prefix, and str[mid..end] as a permutable suffix). So we can use permute(str, 0, str.size() - 1) to show all the permutations of one string.
But the function uses a recursive algorithm; maybe its performance could be improved?
Are there any better methods to permute a string?
Here is a non-recursive algorithm in C++ from the Wikipedia entry for unordered generation of permutations. For the string s of length n, for any k from 0 to n! - 1 inclusive, the following modifies s to provide a unique permutation (that is, different from those generated for any other k value on that range). To generate all permutations, run it for all n! k values on the original value of s.
#include <algorithm>
#include <string>
void permutation(int k, std::string &s)
{
for(int j = 1; j < s.size(); ++j)
{
std::swap(s[k % (j + 1)], s[j]);
k = k / (j + 1);
}
}
Here std::swap exchanges the characters at the two given positions of the string s.
Why don't you try std::next_permutation() or std::prev_permutation()?
Links:
std::next_permutation()
std::prev_permutation()
A simple example:
#include<string>
#include<iostream>
#include<algorithm>
int main()
{
std::string s="123";
do
{
std::cout<<s<<std::endl;
}while(std::next_permutation(s.begin(),s.end()));
}
Output:
123
132
213
231
312
321
I'd like to second Permaquid's answer. The algorithm he cites works in a fundamentally different way from the various permutation enumeration algorithms that have been offered. It doesn't generate all of the permutations of n objects, it generates a distinct specific permutation, given an integer between 0 and n!-1. If you need only a specific permutation, it's much faster than enumerating them all and then selecting one.
Even if you do need all permutations, it provides options that a single permutation enumeration algorithm does not. I once wrote a brute-force cryptarithm cracker that tried every possible assignment of letters to digits. For base-10 problems it was adequate, since there are only 10! permutations to try. But base-11 problems took a couple of minutes and base-12 problems took nearly an hour.
I replaced the permutation enumeration algorithm that I had been using with a simple for-loop over i = 0 to N-1, using the algorithm Permaquid cited. The result was only slightly slower. But then I split the integer range into quarters and ran four for-loops simultaneously, each in a separate thread. On my quad-core processor, the resulting program ran nearly four times as fast.
Just as finding an individual permutation using the permutation enumeration algorithms is difficult, generating delineated subsets of the set of all permutations is also difficult. The algorithm that Permaquid cited makes both of these very easy.
In particular, you want std::next_permutation.
void permute(string elems, int mid, int end)
{
int count = 0;
sort(elems.begin()+mid, elems.end()); // start from the lexicographically smallest arrangement
do
cout << ++count << " : " << elems << endl;
while(next_permutation(elems.begin()+mid, elems.end()));
}
... or something like that...
Any algorithm for generating all permutations is going to run in factorial time, not polynomial time, because the number of permutations of the characters in an n-length string is n!. That said, there are some pretty simple in-place algorithms for generating permutations. Check out the Johnson-Trotter algorithm.
The Knuth random shuffle algorithm is worth looking into.
// In-place shuffle of char array
void shuffle(char array[], int n)
{
for ( ; n > 1; n--)
{
// Pick a random element to move to the end
int k = rand() % n; // 0 <= k <= n-1
// Simple swap of variables
char tmp = array[k];
array[k] = array[n-1];
array[n-1] = tmp;
}
}
Any algorithm that makes use of or generates all permutations will take O(N!*N) time, O(N!) at the least to generate all permutations and O(N) to use the result, and that's really slow. Note that printing the string is also O(N) afaik.
In a second you can realistically only handle strings of up to 10 or 11 characters, no matter what method you use, since 11!*11 = 439084800 iterations (doing this many in a second on most machines is pushing it) and 12!*12 = 5748019200 iterations. So even the fastest implementation would take about 30 to 60 seconds on 12 characters.
Factorial just grows too fast for you to hope to gain anything by writing a faster implementation, you'd at most gain one character. So I'd suggest Prasoon's recommendation. It's easy to code and it's quite fast. Though sticking with your code is completely fine as well.
I'd just recommend that you take care not to inadvertently have extra characters in your string, such as the null character, since that will make your code a factor of N slower.
I've written a permutation algorithm recently. It uses a vector of type T (template) instead of a string, and it's not super-fast because it uses recursion and there's a lot of copying. But perhaps you can draw some inspiration for the code. You can find the code here.
The only way to significantly improve performance is to find a way to avoid iterating through all the permutations in the first place!
Permuting is an unavoidably slow operation (O(n!), or worse, depending on what you do with each permutation), unfortunately nothing you can do will change this fact.
Also, note that any modern compiler will flatten out your recursion when optimisations are enabled, so the (small) performance gains from hand-optimising are reduced even further.
Do you want to run through all the permutations, or count the number of permutations?
For the former, use std::next_permutation as suggested by others. Each permutation takes O(N) time (but less amortized time) and no memory except its callframe, vs O(N) time and O(N) memory for your recursive function. The whole process is O(N!) and you can't do better than this, as others said, because you can't get more than O(X) results from a program in less than O(X) time! Without a quantum computer, anyway.
For the latter, you just need to know how many unique elements are in the string.
big_int count_permutations( string s ) {
big_int divisor = 1;
sort( s.begin(), s.end() );
for ( string::iterator pen = s.begin(); pen != s.end(); ) {
size_t cnt = 0;
char value = * pen;
while ( pen != s.end() && * pen == value ) ++ cnt, ++ pen;
divisor *= big_int::factorial( cnt );
}
return big_int::factorial( s.size() ) / divisor;
}
Speed is bounded by the operation of finding duplicate elements, which for chars can be done in O(N) time with a lookup table.
I don't think this is better, but it does work and does not use recursion:
#include <iostream>
#include <stdexcept>
#include <string>
#include <cstdint>
::std::uint64_t fact(unsigned int v)
{
::std::uint64_t output = 1;
for (unsigned int i = 2; i <= v; ++i) {
output *= i;
}
return output;
}
void permute(const ::std::string &s)
{
using ::std::cout;
using ::std::uint64_t;
typedef ::std::string::size_type size_t;
static const unsigned int max_size = 20; // 21! > 2^64; const so it can be used as an array bound
const size_t strsize = s.size();
if (strsize > max_size) {
throw ::std::overflow_error("This function can only permute strings of size 20 or less.");
} else if (strsize < 1) {
return;
} else if (strsize == 1) {
cout << "0 : " << s << '\n';
} else {
const uint64_t num_perms = fact(s.size());
// Go through each permutation one-by-one
for (uint64_t perm = 0; perm < num_perms; ++perm) {
// The indexes of the original characters in the new permutation
size_t idxs[max_size];
// The indexes of the original characters in the new permutation in
// terms of the list remaining after the first n characters are pulled
// out.
size_t residuals[max_size];
// We use div to pull our permutation number apart into a set of
// indexes. This holds what's left of the permutation number.
uint64_t permleft = perm;
// For a given permutation figure out which character from the original
// goes in each slot in the new permutation. We start assuming that
// any character could go in any slot, then narrow it down to the
// remaining characters with each step.
for (unsigned int i = strsize; i > 0; permleft /= i, --i) {
uint64_t taken_char = permleft % i;
residuals[strsize - i] = taken_char;
// Translate indexes in terms of the list of remaining characters
// into indexes in terms of the original string.
for (unsigned int o = (strsize - i); o > 0; --o) {
if (taken_char >= residuals[o - 1]) {
++taken_char;
}
}
idxs[strsize - i] = taken_char;
}
cout << perm << " : ";
for (unsigned int i = 0; i < strsize; ++i) {
cout << s[idxs[i]];
}
cout << '\n';
}
}
}
The fun thing about this is that the only state it uses from permutation to permutation is the number of the permutation, the total number of permutations, and the original string. That means it can be easily encapsulated in an iterator or something like that without having to carefully preserve the exact correct state. It can even be a random access iterator.
Of course ::std::next_permutation stores the state in the relationships between elements, but that means it can't work on unordered things, and I would really wonder what it does if you have two equal things in the sequence. You can solve that by permuting indexes of course, but that adds slightly more complication.
Mine will work with any random access iterator range provided it's short enough. And if it isn't, you'll never get through all the permutations anyway.
The basic idea of this algorithm is that every permutation of N items can be enumerated. The total number is N! or fact(N). And any given permutation can be thought of as a mapping of source indices from the original sequence into a set of destination indices in the new sequence. Once you have an enumeration of all permutations the only thing left to do is map each permutation number into an actual permutation.
The first element in the permuted list can be any of the N elements from the original list. The second element can be any of the N - 1 remaining elements, and so on. The algorithm uses the % operator to pull apart the permutation number into a set of selections of this nature. First it takes the permutation number modulo N to get a number in [0,N). It then divides by N to discard the part it has just used, takes what is left modulo N - 1 to get a number in [0,N-1), and so on. That is what the for (unsigned int i = ... loop is doing.
The second step is translating each number into an index into the original list. The first number is easy because it's just a straight index. The second number is an index into a list that contains every element but the one removed at the first index, and so on. That is what the for (unsigned int o = ... loop is doing.
residuals is a list of indices into the successively smaller lists. idxs is a list of indices into the original list. There is a one-to-one mapping between values in residuals and idxs. They each represent the same value in different 'coordinate spaces'.
The answer pointed to by the answer you picked has the same basic idea, but has a much more elegant way of accomplishing the mapping than my rather literal and brute force method. That way will be slightly faster than mine, but the two are in the same ballpark, and both have the same advantage of random access into permutation space, which makes a whole number of things easier, including (as the answer you picked pointed out) parallel algorithms.
Actually you can do it using the Knuth shuffling algorithm!
// find all the permutations of a string
// using the Knuth random shuffling algorithm!
#include <iostream>
#include <string>
template <typename T, class Func>
void permutation(T array, std::size_t N, Func func)
{
func(array);
for (std::size_t n = N-1; n > 0; --n)
{
for (std::size_t k = 0; k <= n; ++k)
{
if (array[k] == array[n]) continue;
using std::swap;
swap(array[k], array[n]);
func(array);
}
}
}
int main()
{
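std::string str;
// Loop on the extraction itself so that a failed read at end of input
// never reaches permutation() with an empty string.
while (std::cin >> str)
{
permutation(str, str.length(), [](std::string const &s){
std::cout << s << std::endl; });
}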
}
This post: http://cplusplus.co.il/2009/11/14/enumerating-permutations/ deals with permuting just about anything, not only strings. The post itself and the comments below it are pretty informative, and I wouldn't want to copy and paste them here.
If you are interested in permutation generation, I wrote a research paper on it a while back: http://www.oriontransfer.co.nz/research/permutation-generation
It comes complete with source code, and there are 5 or so different methods implemented.
Even I found that recursive version difficult to understand the first time, and it took me some time to search for a better way. A better method (that I can think of) is to use the algorithm proposed by Narayana Pandita. The basic idea is:
1. First sort the given string in non-decreasing order, then find the index of the first element from the end that is less than its next character lexicographically. Call this index 'firstIndex'.
2. Now find the smallest character to the right of 'firstIndex' that is greater than the element at 'firstIndex'. Call this index 'ceilIndex'.
3. Swap the elements at 'firstIndex' and 'ceilIndex'.
4. Reverse the part of the string from index 'firstIndex+1' to the end of the string.
5. (Instead of point 4) You can also sort the part of the string from index 'firstIndex+1' to the end of the string.
Print the string after each pass and repeat the search for 'firstIndex' until no such index exists.
Points 4 and 5 do the same thing, but the time complexity with point 4 is O(n*n!) and with point 5 it is O(n^2*n!).
The above algorithm can also be applied when the string contains duplicate characters.
The code for displaying all the permutations of a string:
#include <iostream>
using namespace std;
void swap(char *a, char *b)
{
char tmp = *a;
*a = *b;
*b = tmp;
}
int partition(char arr[], int start, int end)
{
int x = arr[end];
int i = start - 1;
for(int j = start; j <= end-1; j++)
{
if(arr[j] <= x)
{
i = i + 1;
swap(&arr[i], &arr[j]);
}
}
swap(&arr[i+1], &arr[end]);
return i+1;
}
void quickSort(char arr[], int start, int end)
{
if(start<end)
{
int q = partition(arr, start, end);
quickSort(arr, start, q-1);
quickSort(arr, q+1, end);
}
}
int findCeilIndex(char *str, int firstIndex, int n)
{
int ceilIndex;
ceilIndex = firstIndex+1;
for (int i = ceilIndex+1; i < n; i++)
{
if(str[i] > str[firstIndex] && str[i] <= str[ceilIndex]) // strictly greater: a character equal to str[firstIndex] must never become the ceiling
ceilIndex = i;
}
return ceilIndex;
}
void reverse(char *str, int start, int end)
{
while(start<=end)
{
char tmp = str[start];
str[start] = str[end];
str[end] = tmp;
start++;
end--;
}
}
void permutate(char *str, int n)
{
quickSort(str, 0, n-1);
cout << str << endl;
bool done = false;
while(!done)
{
int firstIndex;
for(firstIndex = n-2; firstIndex >=0; firstIndex--)
{
if(str[firstIndex] < str[firstIndex+1])
break;
}
if(firstIndex<0)
done = true;
if(!done)
{
int ceilIndex;
ceilIndex = findCeilIndex(str, firstIndex, n);
swap(&str[firstIndex], &str[ceilIndex]);
reverse(str, firstIndex+1, n-1);
cout << str << endl;
}
}
}
int main()
{
char str[] = "mmd";
permutate(str, 3);
return 0;
}
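The same steps can also be written more compactly with the standard library doing the swap and the reverse. This is only a sketch of the idea, assuming a std::string instead of a char array; next_ordering is a made-up name, and it performs a single step, so the caller sorts once and then loops.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: advances s to the next ordering in lexicographic
// order. Returns false (leaving s unchanged) when s is already the last one.
bool next_ordering(std::string& s) {
    if (s.size() < 2) return false;
    // Scan from the end for the first position whose character is smaller
    // than the one after it ('firstIndex' in the description above).
    std::size_t first = s.size() - 1;
    while (first > 0 && s[first - 1] >= s[first]) --first;
    if (first == 0) return false;  // whole string is non-increasing
    --first;
    // The tail is non-increasing, so the rightmost character that is
    // strictly greater than s[first] is the smallest such one ('ceilIndex').
    std::size_t ceil = s.size() - 1;
    while (s[ceil] <= s[first]) --ceil;
    std::swap(s[first], s[ceil]);
    std::reverse(s.begin() + static_cast<std::ptrdiff_t>(first) + 1, s.end());
    return true;
}
```

Starting from the sorted string "dmm", successive calls produce "mdm" and then "mmd", and the next call returns false.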
Here's one I just rustled up!!
#include <cstring>
#include <iostream>
void permute(const char* str, int level=0, bool print=true) {
if (print) std::cout << str << std::endl;
char temp[30];
for (int i = level; i<strlen(str); i++) {
strcpy(temp, str);
temp[level] = str[i];
temp[i] = str[level];
permute(temp, level+1, level!=i);
}
}
int main() {
permute("1234");
return 0;
}
This is not the best logic, but then, I am a beginner. I'd be quite happy and grateful if anyone could give me suggestions on this code.
#include<iostream.h>
#include<conio.h>
#include<string.h>
int c=1,j=1;
int fact(int p,int l)
{
int f=1;
for(j=1;j<=l;j++)
{
f=f*j;
if(f==p)
return 1;
}
return 0;
}
void rev(char *a,int q)
{
int l=strlen(a);
int m=l-q;
char t;
for(int x=m,y=0;x<q/2+m;x++,y++)
{
t=a[x];
a[x]=a[l-y-1];
a[l-y-1]=t;
}
c++;
cout<<a<<" ";
}
int perm(char *a,int f,int cd)
{
if(c!=f)
{
int l=strlen(a);
rev(a,2);
cd++;
if(c==f)return 0;
if(cd*2==6)
{
for(int i=1;i<=c;i++)
{
if(fact(c/i,l)==1)
{
rev(a,j+1);
rev(a,2);
break;
}
}
cd=1;
}
rev(a,3);
perm(a,f,cd);
}
return 0;
}
void main()
{
clrscr();
char a[32]; // buffer for the input word (cin>>a needs real storage to write into)
cout<<"\n\tEnter a Word";
cin>>a;
int f=1;
for(int o=1;o<=strlen(a);o++)
f=f*o;
perm(a,f,0);
getch();
}
// Prints all permutations of a string
#include<bits/stdc++.h>
using namespace std;
void printPermutations(string input, string output){
if(input.length() == 0){
cout<<output <<endl;
return;
}
for(int i=0; i<=output.length(); i++){
printPermutations(input.substr(1), output.substr(0,i) + input[0] + output.substr(i));
}
}
int main(){
string s = "ABC";
printPermutations(s, "");
return 0;
}
Here is yet another recursive function for string permutations:
void permute(string prefix, string suffix, vector<string> &res) {
if (suffix.size() < 1) {
res.push_back(prefix);
return;
}
for (size_t i = 0; i < suffix.size(); i++) {
permute(prefix + suffix[i], suffix.substr(0,i) + suffix.substr(i + 1), res);
}
}
int main(){
string str = "123";
vector<string> res;
permute("", str, res);
}
The function collects all permutations in vector res.
The idea can be generalized to different types of containers using templates and iterators:
template <typename Cont1_t, typename Cont2_t>
void permute(Cont1_t prefix,
typename Cont1_t::iterator beg, typename Cont1_t::iterator end,
Cont2_t &result)
{
if (beg == end) {
result.insert(result.end(), prefix);
return;
}
for (auto it = beg; it != end; ++it) {
prefix.insert(prefix.end(), *it);
Cont1_t tmp;
for (auto i = beg; i != end; ++i)
if (i != it)
tmp.insert(tmp.end(), *i);
permute(prefix, tmp.begin(), tmp.end(), result);
prefix.erase(std::prev(prefix.end()));
}
}
int main()
{
string str = "123";
vector<string> rStr;
permute<string, vector<string>>("", str.begin(), str.end(), rStr);
vector<int>vint = { 1,2,3 };
vector<vector<int>> rInt;
permute<vector<int>, vector<vector<int>>>({}, vint.begin(), vint.end(), rInt);
list<long> ll = { 1,2,3 };
vector<list<long>> vlist;
permute<list<long>, vector<list<long>>>({}, ll.begin(), ll.end(), vlist);
}
This may be an interesting programming exercise, but in production code you should use a non-recursive permutation routine, like next_permutation.
//***************anagrams**************//
// This code works only when there are no repetitions in the original string.
#include<iostream>
using namespace std;
int counter=0;
void print(char empty[],int size)
{
for(int i=0;i<size;i++)
{
cout<<empty[i];
}
cout<<endl;
}
void makecombination(char original[],char empty[],char comb[],int k,int& nc,int size)
{
nc=0;
int flag=0;
for(int i=0;i<size;i++)
{
flag=0; // {
for(int j=0;j<k;j++)
{
if(empty[j]==original[i]) // remove this code fragment
{ // to print permutations with repetition
flag=1;
break;
}
}
if(flag==0) // }
{
comb[nc++]=original[i];
}
}
//cout<<"checks ";
// print(comb,nc);
}
void recurse(char original[],char empty[],int k,int size)
{
char *comb=new char[size];
int nc;
if(k==size)
{
counter++;
print(empty,size);
//cout<<counter<<endl;
}
else
{
makecombination(original,empty,comb,k,nc,size);
k=k+1;
for(int i=0;i<nc;i++)
{
empty[k-1]=comb[i];
cout<<"k = "<<k<<" nc = "<<nc<<" empty[k-1] = "<<empty[k-1]<<endl;//checks the value of k , nc, empty[k-1] for proper understanding
recurse(original,empty,k,size);
}
}
delete[] comb; // free the scratch buffer allocated at the top of this call
}
int main()
{
const int size=3;
int k=0;
char original[]="ABC";
char empty[size];
for(int f=0;f<size;f++)
empty[f]='*';
recurse(original,empty,k,size);
cout<<endl<<counter<<endl;
return 0;
}