Why is std::vector<char> faster than std::string? - c++

I have written a small test where I'm trying to compare the run speed of resizing a container and then subsequently using std::generate_n to fill it up. I'm comparing std::string and std::vector<char>. Here is the program:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <random>
#include <string>
#include <vector>

int main()
{
    std::random_device rd;
    std::default_random_engine rde(rd());
    std::uniform_int_distribution<int> uid(0, 25);
#define N 100000
#ifdef STRING
    std::cout << "String.\n";
    std::string s;
    s.resize(N);
    std::generate_n(s.begin(), N,
                    [&]() { return (char)(uid(rde) + 65); });
#endif
#ifdef VECTOR
    std::cout << "Vector.\n";
    std::vector<char> v;
    v.resize(N);
    std::generate_n(v.begin(), N,
                    [&]() { return (char)(uid(rde) + 65); });
#endif
    return 0;
}
And my Makefile:
test_string:
	g++ -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test test.cpp -DSTRING
	valgrind --tool=callgrind --log-file="test_output" ./test
	cat test_output | grep "refs"
test_vector:
	g++ -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test test.cpp -DVECTOR
	valgrind --tool=callgrind --log-file="test_output" ./test
	cat test_output | grep "refs"
And the comparisons for certain values of N:
N=10000
String: 1,865,367
Vector: 1,860,906
N=100000
String: 5,295,213
Vector: 5,290,757
N=1000000
String: 39,593,564
Vector: 39,589,108
std::vector<char> comes out ahead every time. Since it seems to be more performant, what is even the point of using std::string?

I used #define N 100000000 and tested three times for each scenario without Valgrind; in all scenarios the string version is faster, so the Valgrind numbers do not make sense to me.
OS: Ubuntu 14.04. Arch: x86_64. CPU: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz.
$COMPILER -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test x.cc -DVECTOR
$COMPILER -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test x.cc -DSTRING
Times:
compiler/variant   | time(1) | time(2) | time(3)
-------------------+---------+---------+--------
g++ 4.8.2/vector   | 1.724s  | 1.704s  | 1.669s
g++ 4.8.2/string   | 1.675s  | 1.678s  | 1.674s
clang++ 3.5/vector | 1.929s  | 1.934s  | 1.905s
clang++ 3.5/string | 1.616s  | 1.612s  | 1.619s

std::vector comes out ahead every time. Since it seems to be more performant, what is even the point of using std::string?
Even if we suppose that your observation holds true for a wide range of different systems and different application contexts, it would still make sense to use std::string for various reasons, which are all rooted in the fact that a string has different semantics than a vector. A string is a piece of text (at least simple, non-internationalised English text), a vector is a collection of characters.
Two things come to mind:
Ease of use. std::string can be constructed from string literals, has a lot of convenient operators, and can be passed to string-specific algorithms. Try std::string x = "foo" + ("bar" + boost::algorithm::replace_all_copy(f(), "abc", "ABC").substr(0, 10)); with a std::vector<char>...
std::string is implemented with the Small String Optimization (SSO) in MSVC (and in most modern standard-library implementations), eliminating heap allocation entirely in many cases. SSO is based on the observation that strings are often very short, which certainly cannot be said about vectors in general.
Try the following:
#include <iostream>
#include <vector>
#include <string>

int main()
{
    char const array[] = "short string";
#ifdef STRING
    std::cout << "String.\n";
    for (int i = 0; i < 10000000; ++i) {
        std::string s = array;
    }
#endif
#ifdef VECTOR
    std::cout << "Vector.\n";
    for (int i = 0; i < 10000000; ++i) {
        std::vector<char> v(std::begin(array), std::end(array));
    }
#endif
}
The std::string version should outperform the std::vector version, at least with MSVC. The difference is about 2-3 seconds on my machine. For longer strings, the results should be different.
Of course, this does not really prove anything either, except two things:
Performance tests depend a lot on the environment.
Performance tests should test what will realistically be done in a real program. In the case of strings, your program may deal with many small strings rather than a single huge one, so test small strings.

Related

-O2 and -fPIC option in gcc

For performance optimization, I would like to make use of the reference of a string rather than its value. Depending on the compilation options, I obtain different results. The behavior is a bit unclear to me, and I do not know the actual gcc flag that causes that difference.
My code is
#include <string>
#include <iostream>

const std::string* test2(const std::string& in) {
    // Here I want to make use of the pointer &in
    // ...
    // it's returned only for demonstration purposes...
    return &in;
}

int main() {
    const std::string* t1 = test2("text");
    const std::string* t2 = test2("text");
    // only for demonstration, the cout is printed....
    std::cout << "References are: " << (t1 == t2 ? "equivalent. " : "different. ")
              << t1 << "\t" << t2 << std::endl;
    return 0;
}
There are three compilation options:
gcc main.cc -o main -lstdc++ -O0 -fPIC && ./main
gcc main.cc -o main -lstdc++ -O2 -fno-PIC && ./main
gcc main.cc -o main -lstdc++ -O2 -fPIC && ./main
The first two print References are: different., i.e. the pointers differ, but the third one prints References are: equivalent., i.e. the pointers are equal.
Why does this happen, and which option do I have to add to -O2 -fPIC so that the pointers become different again?
Since this code is embedded into a larger framework, I cannot drop the options -O2 or -fPIC.
Since I get the desired result with the option -O2 and also with -fPIC, but a different behavior if both flags are used together, the exact behavior of these flags is unclear to me.
I tried with gcc4.8 and gcc8.3.
Both t1 and t2 are dangling pointers: they point to a temporary std::string which has already been destroyed. The temporary std::string is constructed from the string literal during each call to test2("text") and lives until the end of the full-expression (the ;).
Their exact values depend on how the compiler (re-)uses stack space at a particular optimization level.
which option do I have to add to the options -O2 -fPIC such that the pointers become again different?
The code exhibits undefined behavior because it's illegal to compare invalid pointer values. Simply don't do this.
If we ignore the comparing part, then we end up with this version:
#include <string>
#include <iostream>

void test2(const std::string& in) {
    std::cout << "Address of in: " << (void*)&in << std::endl;
}

int main() {
    test2("text");
    test2("text");
}
Now this code is free from UB, and it will print either the same address or different addresses, depending on how the compiler re-uses stack space between function calls. There is no way to control this, but it's no problem because keeping track of addresses of temporaries is a bad idea to begin with.
You can try using const char* as the input argument instead; then no temporary will be created in a call to test2("text"). But here again, whether or not two instances of "text" point to the same location is implementation-defined. Though GCC does coalesce identical string literals, so at least with GCC you should observe the behavior you're after.

My C++11 test shows that sort(vector<string>) is even slower than C++03, any error?

With rvalue references and move semantics, C++11's swap/sort speed should be equal to or greater than C++03's. So I designed a simple experiment to test this.
I compile and run it with -O2, under both the C++03 and C++11 standards.
$g++ test.cpp -O2 && ./a.out
10240000 end construction
sort 10240000 spent1.40035
$g++ test.cpp -O2 -std=c++11 && ./a.out
10240000 end construction
sort 10240000 spent2.25684
So it seems that with C++11 enabled, the program is slower.
I'm on a fairly new Mac, and my gcc environment is:
$gcc -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Below is source code:
#include <string>
#include <algorithm>
#include <vector>
#include <cstdlib>
#include <cstdio>
#include <iostream>
#include <ctime>
using namespace std;

string randomString()
{
    const size_t scale = 600;
    char ret[scale];
    for (size_t i = 0; i < scale; ++i)
    {
        double rand0to1 = (double)rand() / RAND_MAX;
        ret[i] = (char)rand0to1 * 92 + 33;
    }
    return ret;
}

int main()
{
    srand(time(NULL));
    const size_t scale = 10240000;
    vector<string> vs;
    vs.reserve(scale);
    for (size_t i = 0; i < scale; ++i)
    {
        vs.push_back(randomString());
    }
    cout << vs.size() << " end construction\n";
    clock_t begin = clock();
    sort(vs.begin(), vs.end());
    clock_t end = clock();
    double duration = (double)(end - begin) / CLOCKS_PER_SEC;
    cout << "sort " << scale << " spent" << duration << "\n";
    return 0;
}
Is there an error in my program or my understanding? How can I explain my test result?
I'd really appreciate your expertise on this!
Your test code has several issues.
The string you generate in ret is not null-terminated, so it will contain garbage from the stack, which is likely to change with compiler settings. This is the most likely cause of your strange results: the C++11 version ends up sorting longer strings.
Your cast binds to rand0to1 before the multiplication, so every character truncates to 33 ('!') and all the strings are identical. Not an actual problem for the measurements, but probably not what you're interested in testing.
You should not use a truly random seed for benchmarking. You want to produce the same strings on every run to get reproducibility.
This fixed version of the code:
#include <string>
#include <algorithm>
#include <vector>
#include <cstdlib>
#include <cstdio>
#include <iostream>
#include <ctime>
using namespace std;

string randomString()
{
    const size_t scale = 600;
    char ret[scale];
    for (size_t i = 0; i < scale; ++i)
    {
        double rand0to1 = (double)rand() / RAND_MAX;
        ret[i] = (char)(rand0to1 * 92 + 33);
    }
    ret[scale - 1] = 0;
    return ret;
}

int main()
{
    srand(1);
    const size_t scale = 10240000;
    vector<string> vs;
    vs.reserve(scale);
    for (size_t i = 0; i < scale; ++i)
    {
        vs.push_back(randomString());
    }
    cout << vs.size() << " end construction\n";
    clock_t begin = clock();
    sort(vs.begin(), vs.end());
    clock_t end = clock();
    double duration = (double)(end - begin) / CLOCKS_PER_SEC;
    cout << "sort " << scale << " spent " << duration << "\n";
    return 0;
}
produces what I believe you were expecting:
$ g++ -O2 -std=c++03 test.cpp && ./a.out
10240000 end construction
sort 10240000 spent 10.8765
$ g++ -O2 -std=c++11 test.cpp && ./a.out
10240000 end construction
sort 10240000 spent 8.72834
By the way, g++ from Xcode on a Mac is actually clang. But the results are similar:
$ clang++ -O2 -std=c++03 test.cpp && ./a.out
10240000 end construction
sort 10240000 spent 10.9408
$ clang++ -O2 -std=c++11 test.cpp && ./a.out
10240000 end construction
sort 10240000 spent 8.33261
Tested with g++ 6.2.1 and clang 3.9.0. The -std=c++03 switch is important: without it, g++ defaults to a newer standard and produces the fast times.

performance regression with Eigen 3.3.0 vs. 3.2.10?

We're just in the process of porting our codebase over to Eigen 3.3 (quite an undertaking with all the 32-byte alignment issues). However, there are a few places where performance seems to have been badly affected, contrary to expectations (I was looking forward to some speedup, given the extra support for FMA and AVX...). These include eigenvalue decomposition and matrix*matrix.transpose()*vector products. I've written two minimal working examples to demonstrate.
All tests run on an up to date Arch Linux system, using an Intel Core i7-4930K CPU (3.40GHz), and compiled with g++ version 6.2.1.
1. Eigen value decomposition:
A straightforward self-adjoint eigenvalue decomposition takes twice as long with Eigen 3.3.0 as it does with 3.2.10.
File test_eigen_EVD.cpp:
#define EIGEN_DONT_PARALLELIZE
#include <Eigen/Dense>
#include <Eigen/Eigenvalues>
#define SIZE 200

using namespace Eigen;

int main (int argc, char* argv[])
{
    MatrixXf mat = MatrixXf::Random(SIZE, SIZE);
    SelfAdjointEigenSolver<MatrixXf> eig;
    for (int n = 0; n < 1000; ++n)
        eig.compute(mat);
    return 0;
}
Test results:
eigen-3.2.10:
g++ -march=native -O2 -DNDEBUG -isystem eigen-3.2.10 test_eigen_EVD.cpp -o test_eigen_EVD && time ./test_eigen_EVD
real 0m5.136s
user 0m5.133s
sys 0m0.000s
eigen-3.3.0:
g++ -march=native -O2 -DNDEBUG -isystem eigen-3.3.0 test_eigen_EVD.cpp -o test_eigen_EVD && time ./test_eigen_EVD
real 0m11.008s
user 0m11.007s
sys 0m0.000s
Not sure what might be causing this, but if anyone can see a way of maintaining performance with Eigen 3.3, I'd like to know about it!
2. matrix*matrix.transpose()*vector product:
This particular example takes a whopping 200× longer with Eigen 3.3.0...
File test_eigen_products.cpp:
#define EIGEN_DONT_PARALLELIZE
#include <Eigen/Dense>
#define SIZE 200

using namespace Eigen;

int main (int argc, char* argv[])
{
    MatrixXf mat = MatrixXf::Random(SIZE, SIZE);
    VectorXf vec = VectorXf::Random(SIZE);
    for (int n = 0; n < 50; ++n)
        vec = mat * mat.transpose() * VectorXf::Random(SIZE);
    return vec[0] == 0.0;
}
Test results:
eigen-3.2.10:
g++ -march=native -O2 -DNDEBUG -isystem eigen-3.2.10 test_eigen_products.cpp -o test_eigen_products && time ./test_eigen_products
real 0m0.040s
user 0m0.037s
sys 0m0.000s
eigen-3.3.0:
g++ -march=native -O2 -DNDEBUG -isystem eigen-3.3.0 test_eigen_products.cpp -o test_eigen_products && time ./test_eigen_products
real 0m8.112s
user 0m7.700s
sys 0m0.410s
Adding brackets to the line in the loop like this:
vec = mat * ( mat.transpose() * VectorXf::Random(SIZE) );
makes a huge difference, with both Eigen versions then performing equally well (actually 3.3.0 is slightly better), and faster than the unbracketed 3.2.10 case. So there is a fix. Still, it's odd that 3.3.0 would struggle so much with this.
I don't know whether this is a bug, but I guess it's worth reporting in case this is something that needs to be fixed. Or maybe I was just doing it wrong...
Any thoughts appreciated.
Cheers,
Donald.
EDIT
As pointed out by ggael, the EVD in Eigen 3.3 is faster if compiled using clang++, or with -O3 with g++. So that's problem 1 fixed.
Problem 2 isn't really a problem, since I can just put brackets in to force the most efficient order of operations. But just for completeness: there does seem to be a flaw somewhere in the evaluation of these operations. Eigen is an incredible piece of software; I think this probably deserves to be fixed. Here's a modified version of the MWE, just to show that it's unlikely to be related to the first temporary product being taken out of the loop (at least as far as I can tell):
#define EIGEN_DONT_PARALLELIZE
#include <Eigen/Dense>
#include <iostream>
#define SIZE 200

using namespace Eigen;

int main (int argc, char* argv[])
{
    VectorXf vec (SIZE), vecsum (SIZE);
    MatrixXf mat (SIZE, SIZE);
    for (int n = 0; n < 50; ++n) {
        mat = MatrixXf::Random(SIZE, SIZE);
        vec = VectorXf::Random(SIZE);
        vecsum += mat * mat.transpose() * VectorXf::Random(SIZE);
    }
    std::cout << vecsum.norm() << std::endl;
    return 0;
}
In this example, the operands are all initialised within the loop, and the results accumulated in vecsum, so there's no way the compiler can precompute anything or optimise away unnecessary computations. This shows the exact same behaviour (this time testing with clang++ -O3, version 3.9.0):
$ clang++ -march=native -O3 -DNDEBUG -isystem eigen-3.2.10 test_eigen_products.cpp -o test_eigen_products && time ./test_eigen_products
5467.82
real 0m0.060s
user 0m0.057s
sys 0m0.000s
$ clang++ -march=native -O3 -DNDEBUG -isystem eigen-3.3.0 test_eigen_products.cpp -o test_eigen_products && time ./test_eigen_products
5467.82
real 0m4.225s
user 0m3.873s
sys 0m0.350s
So same result, but vastly different execution times. Thankfully, this is easily resolved by placing brackets in the right places, but there does seem to be a regression somewhere in Eigen 3.3's evaluation of operations. With brackets around the mat.transpose() * VectorXf::Random(SIZE) part, the execution times are reduced for both Eigen versions to around 0.020s (so Eigen 3.2.10 clearly also benefits in this case). At least this means we can keep getting awesome performance out of Eigen!
In the meantime, I'll accept ggael's answer, it's all I needed to know to move forward.
For the EVD, I cannot reproduce with clang. With gcc, you need -O3 to avoid an inlining issue. Then, with both compilers, Eigen 3.3 delivers a 33% speedup.
EDIT: my previous answer regarding the matrix*matrix*vector product was wrong. This is a shortcoming in Eigen 3.3.0, and it will be fixed in Eigen 3.3.1. For the record, I leave here my previous analysis, which is still partly valid:
As you noticed, you should really add the parentheses to perform two matrix*vector products instead of one big matrix*matrix product. Then the speed difference is easily explained by the fact that in 3.2 the nested matrix*matrix product is immediately evaluated (at nesting time), whereas in 3.3 it is evaluated at evaluation time, that is, in operator=. This means that in 3.2 the loop is equivalent to:

for (int n = 0; n < 50; ++n) {
    MatrixXf tmp = mat * mat.transpose();
    vec = tmp * VectorXf::Random(SIZE);
}

and thus the compiler can move tmp out of the loop. Production code should not rely on the compiler for this kind of task and should instead explicitly move constant expressions outside loops.
This is true, except that in practice the compiler was not smart enough to move the temporary out of the loop.

(Missing) performance improvements with C++11 move semantics

I've been writing C++11 code for quite some time now, and haven't done any benchmarking of it, only expecting things like vector operations to "just be faster" now with move semantics. So when actually benchmarking with GCC 4.7.2 and clang 3.0 (default compilers on Ubuntu 12.10 64-bit) I get very unsatisfying results. This is my test code:
EDIT: With regards to the (good) answers posted by @DeadMG and @ronag, I changed the element type from std::string to my::string, which does not have a swap(), and made all the inner strings larger (200-700 bytes) so that they aren't subject to SSO.
EDIT2: COW was the reason. Adapted the code again per the great comments: changed the storage from std::string to std::vector<char> and left out the copy/move constructors (letting the compiler generate them instead). Without COW, the speed difference is actually huge.
EDIT3: Re-added the previous solution when compiled with -DCOW. This makes the internal storage a std::string rather than a std::vector<char> as requested by #chico.
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <functional>
#include <utility>

static std::size_t dec = 0;

namespace my { class string
{
public:
    string( ) { }
#ifdef COW
    string( const std::string& ref ) : str( ref ), val( dec % 2 ? - ++dec : ++dec ) {
#else
    string( const std::string& ref ) : val( dec % 2 ? - ++dec : ++dec ) {
        str.resize( ref.size( ) );
        std::copy( ref.begin( ), ref.end( ), str.begin( ) );
#endif
    }
    bool operator<( const string& other ) const { return val < other.val; }
private:
#ifdef COW
    std::string str;
#else
    std::vector< char > str;
#endif
    std::size_t val;
}; }

template< typename T >
void dup_vector( T& vec )
{
    T v = vec;
    for ( typename T::iterator i = v.begin( ); i != v.end( ); ++i )
#ifdef CPP11
        vec.push_back( std::move( *i ) );
#else
        vec.push_back( *i );
#endif
}

int main( )
{
    std::ifstream file;
    file.open( "/etc/passwd" );
    std::vector< my::string > lines;
    while ( ! file.eof( ) )
    {
        std::string s;
        std::getline( file, s );
        lines.push_back( s + s + s + s + s + s + s + s + s );
    }
    while ( lines.size( ) < ( 1000 * 1000 ) )
        dup_vector( lines );
    std::cout << lines.size( ) << " elements" << std::endl;
    std::sort( lines.begin( ), lines.end( ) );
    return 0;
}
What this does is read /etc/passwd into a vector of lines, then duplicate this vector onto itself over and over until we have at least 1 million entries. This is where the first optimization should be useful: not only the explicit std::move() you see in dup_vector(), but also push_back itself should perform better when it needs to resize (create a new inner array + copy or move the elements over).
Finally, the vector is sorted. This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.
I compile and run this two ways, one being as C++98, the next as C++11 (with -DCPP11 for the explicit move):
1> $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
2> $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
3> $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
4> $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
With the following results (twice for each compilation):
GCC C++98
1> real 0m9.626s
1> real 0m9.709s
GCC C++11
2> real 0m10.163s
2> real 0m10.130s
So, it's slightly slower when compiled as C++11 code. Similar results go for clang:
clang C++98
3> real 0m8.906s
3> real 0m8.750s
clang C++11
4> real 0m8.858s
4> real 0m9.053s
Can someone tell me why this is? Are the compilers optimizing so well, even when compiling for pre-C++11, that they practically reach move-semantics behaviour after all? If I add -O2, all code runs faster, but the results between the different standards are almost the same as above.
EDIT: New results with my::string rather than std::string, and larger individual strings:
$ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
real 0m16.637s
$ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m17.169s
$ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
real 0m16.222s
$ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m15.652s
There are very small differences between C++98 and C++11 with move semantics: slightly slower with C++11 on GCC and slightly faster with clang, but still very small differences.
EDIT2: Now without std::string's COW, the performance improvement is huge:
$ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
real 0m10.313s
$ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m5.267s
$ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
real 0m10.218s
$ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m3.376s
With optimization, the difference is a lot bigger too:
$ rm -f a.out ; g++ -O2 --std=c++98 test.cpp ; time ./a.out
real 0m5.243s
$ rm -f a.out ; g++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m0.803s
$ rm -f a.out ; clang++ -O2 --std=c++98 test.cpp ; time ./a.out
real 0m5.248s
$ rm -f a.out ; clang++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m0.785s
That's a factor of ~6-7× speedup with C++11.
Thanks for the great comments and answers. I hope this post will be useful and interesting to others too.
This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.
std::string has a swap member, so sort will already use that, and its internal implementation is effectively move semantics already. And you won't see a difference between copy and move for std::string as long as SSO is involved. In addition, some versions of GCC still have a non-C++11-conforming COW-based implementation, which also would not see much difference between copy and move.
This is probably due to the small string optimization, which can occur (depending on the compiler) for strings shorter than e.g. 16 characters. I would guess that all the lines in the file are quite short, since /etc/passwd lines tend to be.
When small string optimization is active for a particular string then move is done as a copy.
You will need to have larger strings to see any speed improvements with move semantics.
I think you'll need to profile the program. Maybe most of the time is spent in the line T v = vec; and in the std::sort(..) of a vector of over a million strings! Nothing to do with move semantics.

Big number in C++

I am trying to place a big number in a C++ variable. The number is 600851475143.
I tried unsigned long long int but got an error saying the constant was too big.
I then tried a big-integer library called BigInt: http://mattmccutchen.net/bigint/
The problem is I can't compile the code, as I get many errors regarding the lib:
undefined reference to `BigInteger::BigInteger(int)' <-- lots of these.
Here is my code so far:
#include "string"
#include "iostream"
#include "bigint/NumberlikeArray.hh"
#include "bigint/BigUnsigned.hh"
#include "bigint/BigInteger.hh"
#include "bigint/BigIntegerAlgorithms.hh"
#include "bigint/BigUnsignedInABase.hh"
#include "bigint/BigIntegerUtils.hh"
using namespace std;
int main() {
//unsigned long int num = 13195;
//unsigned long long int num = 600851475143;
BigInteger num = 13195;
int divider = 2;
//num = 600851475143;
while (1) {
if ((num % divider) == 0) {
cout << divider << '\n';
num /= divider;
}
else
divider++;
if (num == 1)
break;
}
}
If I put a smaller number and don't use the BigInt lib this program runs fine.
Any help will be appreciated :D
You can specify an integer literal as long with the suffix L, and as long long with the suffix LL.
#include <iostream>

int main()
{
    long long num = 600851475143LL;
    std::cout << num;
}
The number 600851475143 isn't too large for a long long int, but you need to use the LL suffix when writing a long long constant (ULL for unsigned long long int):
unsigned long long int num = 600851475143ULL;
The raison d'être of a big integer library is to represent integers that your language cannot handle natively. That means you cannot even write one down as a literal. Most likely, such a library has a way to parse a string as a big number.
In the more general case, when you cannot fit your number in a long long and can live with the GNU LGPL license (http://www.gnu.org/copyleft/lesser.html), I would suggest trying the GNU Multiple Precision Arithmetic Library (http://gmplib.org/).
It is extremely fast, written in C, and comes with a very nice C++ wrapper library.
Is there a bigint lib to link in or a bigint.cpp to compile?
If you are getting undefined reference errors for the bignum library, you probably didn't link it. On Unix, you will have to pass an option like -lbigint. If you are using an IDE, you will have to find the linker settings and add the library.
As for the numbers, as has already been said, an integer literal defaults to type int. You must use the LL/ll suffix to get a long long.
The first thing to do in this case is to figure out what is the largest number that you can fit into an unsigned long long. Since it is 64 bit, the largest number would be 2^64-1 = 18446744073709551615, which is larger than your number. Then you know that you are doing something wrong, and you look at the answer by Martin York to see how to fix it.
You can also write your own custom class that uses a linked list to store a number of arbitrary size (RAM is the only restriction).
Try this one: https://mattmccutchen.net/bigint/
For anyone else having problems with this library five years after this question was asked, this is the answer for you.
You cannot just compile your program; it will fail to link with an ugly, impenetrable error!
This library is a collection of C++ files which you are supposed to compile to .o files and link against. If you look at the output of the makefile provided with the sample program, you will see this:
g++ -c -O2 -Wall -Wextra -pedantic BigUnsigned.cc
g++ -c -O2 -Wall -Wextra -pedantic BigInteger.cc
g++ -c -O2 -Wall -Wextra -pedantic BigIntegerAlgorithms.cc
g++ -c -O2 -Wall -Wextra -pedantic BigUnsignedInABase.cc
g++ -c -O2 -Wall -Wextra -pedantic BigIntegerUtils.cc
g++ -c -O2 -Wall -Wextra -pedantic sample.cc
g++ sample.o BigUnsigned.o BigInteger.o BigIntegerAlgorithms.o BigUnsignedInABase.o BigIntegerUtils.o -o sample
Replace sample with the name of your program, paste these lines into a makefile or script, and away you go.