Size of float and double - c++

Calculating the dot product of two vectors of float values gives different results on different machines:
6.102435302 (Win7 x64, compiler VS12 version 17.00.50727.1)
6.140244007 (Win7 x64, MinGW, gcc version 5.3.0)
The code is:
#include <iostream>
#include <iterator>
#include <fstream>
#include <vector>
#include <iomanip>

int main(int argc, char** argv) {
    std::ifstream is("test.txt");
    std::istream_iterator<float> start(is), end;
    std::vector<float> numbers(start, end);
    std::cout << "Read " << numbers.size() << " numbers" << std::endl;
    float product = 0;
    // note: i < numbers.size() avoids the unsigned underflow that
    // i <= numbers.size() - 1 would cause on an empty vector
    for (std::size_t i = 0; i < numbers.size(); i++)
        product += numbers[i] * numbers[i]; // accumulate the dot product
    std::cout << std::setprecision(10) << product << std::endl;
    std::cin.get();
}
test.txt is:
-0.082833
0.151422
-0.088526
-0.538506
0.646273
0.266993
0.200206
-0.149989
0.141407
0.158835
-0.119255
-0.039122
-0.045419
0.141848
-0.218912
-0.264521
0.032238
-0.055877
0.100393
-0.097075
-0.006268
-0.070172
-0.275793
0.103654
-0.075405
-0.117017
0.029951
-0.094158
-0.168427
0.381314
0.144073
-0.100971
-0.078645
0.013768
0.144876
0.005855
-0.018223
-0.090576
-0.071564
-0.029456
-0.098014
-0.149181
0.200667
-0.189492
0.264529
-0.061738
-0.097826
0.138872
-0.241878
0.019428
-0.087634
-0.058300
-0.009269
0.039241
-0.066350
0.059845
-0.048516
-0.070653
-0.116227
0.037203
-0.037091
-0.097324
0.043834
-0.340037
0.133938
0.087197
0.213261
-0.170708
-0.151203
0.052959
0.027145
-0.142675
-0.209020
0.001813
-0.022321
0.190862
-0.015501
-0.228589
-0.038538
-0.038480
-0.194482
0.087518
-0.257362
0.160805
-0.114158
0.176832
0.219573
-0.333160
-0.068385
-0.143289
-0.228401
0.214679
0.277186
-0.130965
0.142526
-0.166073
-0.035309
0.001260
-0.064977
0.020747
0.014043
-0.133625
-0.156975
-0.043092
0.154749
-0.181473
-0.288339
-0.144132
-0.004081
-0.071694
-0.094631
0.483994
-0.260140
0.020749
0.031850
0.041064
0.250101
-0.192338
-0.222687
0.114226
-0.227428
0.005388
-0.163509
-0.135427
-0.206788
-0.021093
0.279840
-0.055362
-0.016305
-0.279524
0.277402
0.198076
0.103796
-0.272994
0.306518
-0.024435
0.149532
-0.165079
-0.394348
-0.141590
-0.188541
0.002890
0.064264
-0.045430
-0.026021
0.096325
0.033765
0.111890
-0.012204
0.130457
-0.106022
-0.180052
-0.447620
0.051825
0.089245
-0.265819
-0.087720
0.180074
-0.259521
-0.356145
0.162247
0.282323
-0.096935
-0.040101
-0.214359
0.357032
0.195393
0.150603
-0.120796
0.204032
0.130334
0.115753
-0.123727
-0.107526
0.196002
-0.397541
0.320854
0.013272
-0.058865
0.018108
0.023616
-0.053654
-0.223593
-0.310052
0.109229
-0.107124
0.074454
-0.021471
-0.033081
0.108072
-0.067013
-0.084968
-0.171947
0.308421
-0.204827
-0.060015
0.092264
0.115863
0.131043
0.041844
I suppose it somehow depends on the sizes of float and double, and that they differ between the two machines. Is it possible to get the same output on both computers?
I have no access to the first machine (with the first result 6.102435302), but I can reproduce the same result with the following Python code (using NumPy):
import numpy as np
test = np.loadtxt(test_file, dtype=np.float32)
result = test.dot(test)

The difference is too large to be explained by using float instead of double. Look for actual bugs in your code; or, if the calculation is numerically unstable, you need to examine what is going on and cannot trust any of the numbers until you understand it.
Getting the same output for both compilers is easy: just set the result to zero. But what you want is the correct result. You have one result that is badly wrong and one that cannot be trusted, and you don't know which is which. Making the results the same would only cover this up, not solve any problem.
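As an aside (my sketch, not code from either answer): if you want results that are less sensitive to the order of operations, accumulating in double even though the inputs are float keeps each addition's rounding error far below the inputs' own precision:
#include <vector>

// Summing float products into a double accumulator: each addition's
// rounding error is then tiny relative to the float inputs, so the
// summation order chosen by the compiler matters far less.
double dot_product(const std::vector<float>& v) {
    double product = 0.0;
    for (float x : v)
        product += static_cast<double>(x) * x;
    return product;
}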

It looks likely that VS12 auto-vectorized the loop (and that you then mistyped the result).
If you run the loop vectorised like so:
#include <immintrin.h> // for __m128 and the _mm_* intrinsics

float product = 0;
// assumes numbers.size() is a multiple of 4
for (std::size_t i = 0; i < numbers.size(); i += 4)
{
    // unaligned load of four floats (the original pointer cast assumed alignment)
    __m128 val = _mm_loadu_ps(&numbers[i]);
    // dot product of val with itself; mask 255 = use all four lanes
    __m128 res = _mm_dp_ps(val, val, 255);
    float result;
    _mm_store_ss(&result, res);
    product += result;
}
Then the result you get out is:
6.14024353
This matches your first result of 6.102435302 if you assume the 4 was dropped while transcribing.
At least that's the best explanation I can come up with. I've already spent way too long on this question :-)

Related

rand() not giving random numbers depending on modulo in xcode

I have an array with 7 elements and I'm trying to get a random number between 0 and 6 so I can select an element of the array at random.
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <string>
using namespace std;

class Color {
public:
    Color() {
        colors[0] = "red";
        colors[1] = "orange";
        colors[2] = "yellow";
        colors[3] = "green";
        colors[4] = "blue";
        colors[5] = "indigo";
        colors[6] = "violet";
    }
    void printColors()
    {
        for (int i = 0; i < sizeof(colors) / sizeof(colors[0]); ++i)
        {
            cout << colors[i] << endl;
        }
    }
    void printRandomColor()
    {
        int random_integer = rand() % 7;
        cout << random_integer << endl;
    }
private:
    string colors[7];
};

int main(int argc, const char * argv[]) {
    srand(static_cast<unsigned int>(time(0)));
    Color colorObject;
    colorObject.printRandomColor();
    return 0;
}
When I do rand() % 7 I keep getting 6, but if I do rand() % 6 I end up getting random numbers. What gives?
I call srand( static_cast<unsigned int>(time(0))); in my main()
I noticed the same behavior with the code shown in the question:
rand() % 7 // always shows 6
rand() % 14 // always shows 6 or 13
rand() % 21 // always shows 6, 13, or 20
The problem is peculiar and there seems to be a pattern involved. Based on the comments saying that some people can't reproduce it, I decided to compile the code with gcc on a Linux machine and with clang on macOS; Linux behaves normally from what I can tell, but macOS does not. I even tried completely different code just to make sure it wasn't something else, yet got the same result.
#include <cstdlib>
#include <iostream>
#include <ctime>

int main()
{
    int min = 1;
    int max = 7;
    std::srand(std::time(0)); // use current time as seed for random generator
    // int random_variable = std::rand() % max; // always returns 6
    // int random_variable = std::rand() % (max - min) + min; // produces 'predictable' numbers based on the time
    int random_variable = RAND_MAX % std::rand() % (max - min) + min; // also returns predictable results based on the timing, except in reverse
    std::cout << "Random value on [0 " << RAND_MAX << "]: "
              << random_variable << '\n';
}
The only way I was able to get seemingly random results from rand() was to do:
RAND_MAX % std::rand() % (max-min) + min; // predictable based on timing
The issue is odd and might be a bug in Clang; I'm at a loss as to what exactly is at play here. I would recommend using something other than rand(), such as the <random> library mentioned in the comments.
EDIT: After reporting this bug to Apple this was the response:
Apple Developer Relations July 27 2017, 11:27 AM
There are no plans to address this based on the following:
std::rand directly uses rand from the C library. rand is known and
documented to be broken (and is not going to change since people
depend on its specific behavior).
From the man page: RAND(3) BSD Library Functions Manual
NAME
rand, rand_r, srand, sranddev -- bad random number generator
DESCRIPTION
These interfaces are obsoleted by arc4random(3).
For good pseudorandom numbers in C++, look at <random> from C++11.
E.g.: http://en.cppreference.com/w/cpp/numeric/random
Based on this information, rand() is broken and won't be fixed; use an alternative random number generator.
rand() is terrible. rand() % range is worse. Don't use it. Use arc4random_uniform().
#include <iostream>
#include <cstdlib> // arc4random_uniform() (on BSD/macOS)

int main(int argc, char *argv[]) {
    // Random number between 0 and 6.
    std::cout << arc4random_uniform(7) << std::endl;
}
So in your case:
void printRandomColor()
{
    int random_integer = arc4random_uniform(7);
    cout << random_integer << endl;
}
If portability is desired, then here is a C++ standard example. To me, it's needlessly more complicated and runs slower, but hey… it's the C++ standard.
#include <iostream>
#include <random> // std::random_device and std::uniform_int_distribution

int main() {
    std::random_device randomizer;
    std::uniform_int_distribution<int> distribution(0, 6);
    // Random number between 0 and 6.
    int random_integer = distribution(randomizer);
    std::cout << random_integer << std::endl;
}
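Note that the example above draws every number directly from std::random_device, which can be slow or of limited entropy on some platforms. The more common pattern (a sketch of mine, not part of the original answer) is to use random_device once to seed a fast engine such as std::mt19937:
#include <iostream>
#include <random>

int main() {
    // seed the Mersenne Twister engine once from the OS entropy source
    std::mt19937 engine(std::random_device{}());
    std::uniform_int_distribution<int> distribution(0, 6);
    // draw several well-distributed numbers between 0 and 6
    for (int i = 0; i < 5; ++i)
        std::cout << distribution(engine) << '\n';
}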
I would like to point out that you are calling rand() and then taking the remainder of the result with %. The result of % is the remainder, which is where your strange math comes from. % is known as the modulo (or modulus) operator if you want to Google it; note that it has a slightly different name in C#. There is a post on Stack Overflow about it here:
What does the '%' operator mean?
If you open the Windows Calc.exe program, it is listed in Scientific mode (Alt+2) as Mod.
Specifically, for integers % satisfies x % y == x - (x / y) * y, where / is integer division that truncates toward zero.
The above URL is a direct link to my answer, where I point out specifically how % differs from /, complete with a long step-by-step example; the key point is that / discards the fractional part of the quotient while % keeps only the remainder.
Update
I would at least like to have this answer here to provide a reference for the modulo operator, which is mentioned in the title of this question.
I didn't post this as an answer per se, but more as reference material to avoid duplicate posts in the future.
If this is in fact a discovered bug, then this question is going to be picked apart letter by letter and symbol by symbol, and it will help everybody involved to have this reference material here.
If I didn't already know it was named modulo/modulus in most languages, I would wonder what the asker meant by "modulo", since the question never explains that % is named exactly that.
The linked answer addresses how % and / differ, complete with a referenced compilable example written step by step in longhand, which I then converted to C#.
I would also like to reiterate, as I mentioned in the comments, that I have zero knowledge of Xcode; I am somewhat familiar with C# and have provided this information in the C# context, which this question is tagged with.

Casting float to int inconsistent across MinGw and Clang

Using C++, I'm trying to cast a float value to an int with this code:
#include <iostream>

int main() {
    float NbrToCast = 1.8f;
    int TmpNbr = NbrToCast * 10;
    std::cout << TmpNbr << "\n";
}
I understand the value 1.8 cannot be precisely represented as a float and is actually stored as 1.79999995.
Thus, I would expect that multiplying this value by ten would give 17.9999995, and that casting it to an int would give 17.
When compiling and running this code with MinGW (v4.9.2 32bits) on Windows 7, I get the expected result (17).
When compiling and running this code with Clang (v600.0.57) on my Mac (OS X 10.11), I get 18 as a result, which is not what I was expecting, but which seems more correct in a mathematical sense!
Why do I get this difference?
Is there a way to get consistent behavior regardless of the OS or the compiler?
Like Yuushi said in the comments, the rounding rules may differ between compilers. A portable solution on this topic probably means writing your own rounding function.
So in your case you need to check the digit after the 7 and decide whether to increment the value. Let's say something like:
#include <iostream>
#include <cmath> // for std::modf()

int RoundingFloatToInt(const float &val)
{
    float intPart, fractPart;
    // modf splits val into its integer and fractional parts
    fractPart = std::modf(val, &intPart);
    int result = intPart;
    if (fractPart > 0.5f) // note: this only handles non-negative values
    {
        result++;
    }
    return result;
}

int main() {
    float NbrToCast = 1.8f;
    float TmpNbr = NbrToCast * 10;
    std::cout << RoundingFloatToInt(TmpNbr) << "\n";
}
(a rough sketch, but you get the idea)
If you need performance, this is probably not great, but it should be portable.
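For reference (a side note of mine, not part of the original answer): the standard library already provides portable rounding in <cmath>; std::lround rounds to nearest, with halfway cases going away from zero, regardless of the current rounding mode:
#include <cmath>
#include <iostream>

int main() {
    float NbrToCast = 1.8f;
    // std::lround rounds to the nearest integer, halfway cases away
    // from zero, independent of the floating-point rounding mode
    long TmpNbr = std::lround(NbrToCast * 10);
    std::cout << TmpNbr << "\n"; // prints 18
}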

How does the cout statement affect the O/P of the code written?

#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;

int main() {
    int t;
    double n;
    cin >> t;
    while (t--)
    {
        cin >> n;
        double x;
        for (int i = 1; i <= 10000; i++)
        {
            x = n * i;
            if (x == ceilf(x))
            {
                cout << i << endl;
                break;
            }
        }
    }
    return 0;
}
For the input:
3
5
2.98
3.16
the output is:
1
If my code is:
#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;

int main() {
    int t;
    double n;
    cin >> t;
    while (t--)
    {
        cin >> n;
        double x;
        for (int i = 1; i <= 10000; i++)
        {
            x = n * i;
            cout << ""; // only this statement is added
            if (x == ceilf(x))
            {
                cout << i << endl;
                break;
            }
        }
    }
    return 0;
}
For the same input, the output is:
1
50
25
The only extra line added in the second version is cout << "";
Can anyone help me find why the output differs so much just because of this added cout statement?
Well, this is a veritable Heisenbug. I've tried to strip your code down to a minimal reproducing example, and ended up with this (http://ideone.com/mFgs0S):
#include <iostream>
#include <math.h>
using namespace std;

int main()
{
    float n;
    cin >> n;   // this input is needed to reproduce, but the value doesn't matter
    n = 2.98;   // overwrite the input value
    cout << ""; // comment this out => y = z = 149
    float x = n * 50;   // 149
    float y = ceilf(x); // 150
    cout << ""; // comment this out => y = z = 150
    float z = ceilf(x); // 149
    cout << "x:" << x << " y:" << y << " z:" << z << endl;
}
The behaviour of ceilf appears to depend on the particular sequence of iostream operations that occur around it. Unfortunately I don't have the means to debug in any more detail at the moment, but maybe this will help someone else to figure out what's going on. Regardless, it seems almost certain that it's a bug in gcc-4.9.2 and gcc-5.1. (You can check on ideone that you don't get this behaviour in gcc-4.3.2.)
You're probably hitting an issue with floating point representation: computers cannot perfectly represent all fractions, so while you see 50, the result is probably something closer to 50.00000000001. This is a pretty common problem you'll run across when dealing with doubles and floats.
A common way to deal with it is to define a very small constant (in mathematical terms this is Epsilon, a number which is simply "small enough")
const double EPSILON = 0.000000001;
And then your comparison will change from
if (x==ceilf(x))
to something like
double difference = fabs(x - ceilf(x));
if (difference < EPSILON)
This will smooth out those tiny inaccuracies in your doubles.
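Put together, a minimal self-contained sketch of that suggestion (my code, not the original poster's; I use std::round rather than ceilf so that values sitting slightly above an integer are also caught):
#include <cmath>
#include <iostream>

const double EPSILON = 0.000000001;

// true if x lies within EPSILON of some integer; std::round is used
// instead of ceil so values slightly above an integer also match
bool isNearlyWholeNumber(double x)
{
    return std::fabs(x - std::round(x)) < EPSILON;
}

int main()
{
    double x = 2.98 * 50; // mathematically 149, inexact in binary
    std::cout << std::boolalpha << isNearlyWholeNumber(x) << '\n'; // true
}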
"Comparing for equality
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precisions, so results will differ depending on the details of your environment. If you do a calculation and then compare the results against some expected value it is highly unlikely that you will get exactly the result you intended.
In other words, if you do a calculation and then do this comparison:
if (result == expectedResult)
then it is unlikely that the comparison will be true. If the comparison is true then it is probably unstable – tiny changes in the input values, compiler, or CPU may change the result and make the comparison be false."
From http://www.cygnus-software.com/papers/comparingfloats/Comparing%20floating%20point%20numbers.htm
Hope this answers your question.
You also had a problem with
if (x == ceilf(x))
ceilf() returns a float value, while x is declared as a double. See the usual problems with floating point comparison for why that won't work.
Change x to float and the program runs fine.
I tried this on my laptop and on several online compilers. g++ (4.9.2-10) gave the desired output (three numbers), as did the online compiler at geeksforgeeks.org; however, ideone and codechef did not give the right output.
All I can infer is that the online compilers that label their compiler "C++ (gcc)" give the wrong output, while geeksforgeeks.org, which labels its compiler "C++", runs correctly, as does g++ tested on Linux.
So we could arrive at a hypothesis that the failing sites use gcc to compile C++ code, as suggested at this link. :)

First random number is always smaller than rest

I happened to notice that in C++ the first random number obtained from the std rand() function is most of the time significantly smaller than the second one. With the Qt implementation the first one is nearly always several orders of magnitude smaller.
qsrand(QTime::currentTime().msec());
qDebug() << "qt1: " << qrand();
qDebug() << "qt2: " << qrand();
srand((unsigned int) time(0));
std::cout << "std1: " << rand() << std::endl;
std::cout << "std2: " << rand() << std::endl;
output:
qt1: 7109361
qt2: 1375429742
std1: 871649082
std2: 1820164987
Is this intended, an error in the seeding, or a bug?
Also, while the qrand() output varies strongly, the first rand() output seems to change linearly with time. I just wonder why.
I'm not sure that could be classified as a bug, but it has an explanation. Let's examine the situation:
Look at rand's implementation. You'll see it's just a calculation using the last generated value.
You're seeding using QTime::currentTime().msec(), which is by nature bounded to the small range 0..999, whereas qsrand accepts a uint, with the range 0..4294967295.
By combining those two factors, you have a pattern.
Just out of curiosity: try seeding with QTime::currentTime().msec() + 100000000
Now the first value will probably be bigger than the second most of the time.
I wouldn't worry too much. This "pattern" seems to happen only on the first two generated values. After that, everything seems to go back to normal.
EDIT:
To make things more clear, try running the code below. It'll compare the first two generated values to see which one is smaller, using all possible millisecond values (range: 0..999) as the seed:
int totalCalls, leftIsSmaller = 0;
for (totalCalls = 0; totalCalls < 1000; totalCalls++)
{
    qsrand(totalCalls);
    // note: the evaluation order of the two qrand() calls is unspecified,
    // so "left" is not guaranteed to be the first call on every compiler
    if (qrand() < qrand())
        leftIsSmaller++;
}
qDebug() << (100.0 * leftIsSmaller) / totalCalls;
It will print 94.8, which means 94.8% of the time the first value will be smaller than the second.
Conclusion: when using the current millisecond to seed, you'll see that pattern in the first two values. I did some tests here and the pattern seems to disappear after the second value is generated. My advice: find a "good" value to pass to qsrand (which should obviously be called only once, at the beginning of your program). A good value should span the whole range of the uint type. Take a look at this other question for some ideas:
Recommended way to initialize srand?
Also, take a look at this:
PCG: A Family of Better Random Number Generators
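As a concrete illustration of "span the whole range" (my sketch, assuming the legacy qsrand/qrand API from <QtGlobal>; this is not code from the linked answers): draw the seed from std::random_device instead of the millisecond counter:
#include <QtGlobal> // qsrand(), qrand()
#include <iostream>
#include <random>

int main()
{
    // std::random_device yields a value spanning the full uint range,
    // unlike QTime::currentTime().msec(), which only covers 0..999
    std::random_device rd;
    qsrand(rd());
    std::cout << qrand() << " " << qrand() << std::endl;
    return 0;
}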
Neither current Qt nor the C standard run-time has a quality randomizer, and your test shows it. Qt appears to delegate to the C run-time for this. If C++11 is available in your project, use a much better and more reliable method:
#include <random>
#include <chrono>

auto seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine generator(seed);
std::uniform_int_distribution<unsigned int> distribution; // uint is Qt shorthand; spelled out here
unsigned int randomUint = distribution(generator);
There is a good video that covers the topic. As noted by commenter user2357112, you can combine different random engines with different distributions, but for my specific use the above worked really well.
Keeping in mind that making judgments about a statistical phenomenon based on a small number of samples can be misleading, I decided to run a small experiment. I ran the following code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    int i = 0;
    int j = 0;
    while (i < RAND_MAX)
    {
        srand(time(NULL));
        int r1 = rand();
        int r2 = rand();
        if (r1 < r2)
            ++j;
        ++i;
        if (i % 10000 == 0) {
            printf("%g\n", (float)j / (float)i);
        }
    }
}
which basically printed the percentage of times the first generated number was smaller than the second. The plot of that ratio (not reproduced here) shows it approaching 0.5 after fewer than 50 actual new seeds.
As suggested in the comment, we could modify the code to use consecutive seeds every iteration and speed up the convergence:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    int i = 0;
    int j = 0;
    int t = time(NULL);
    while (i < RAND_MAX)
    {
        srand(t);
        int r1 = rand();
        int r2 = rand();
        if (r1 < r2)
            ++j;
        ++i;
        if (i % 10000 == 0) {
            printf("%g\n", (float)j / (float)i);
        }
        ++t;
    }
}
The resulting ratio (plot again not reproduced here) stays pretty close to 0.5 as well.
While rand is certainly not the best pseudorandom number generator, the claim that it often generates a smaller number on the first call does not seem to be warranted.

Why do std::string operations perform poorly?

I wrote a test comparing string operations in several languages in order to choose a language for a server-side application. The results seemed normal until I finally tried C++, which surprised me a lot. So I wonder whether I missed some optimization, and I've come here for help.
The test mainly exercises intensive string operations, including concatenation and searching. It was performed on Ubuntu 11.10 amd64 with GCC 4.6.1, on a Dell Optiplex 960 with 4 GB RAM and a quad-core CPU.
in Python (2.7.2):
def test():
    x = ""
    limit = 102 * 1024
    while len(x) < limit:
        x += "X"
        if x.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0:
            print("Oh my god, this is impossible!")
    print("x's length is : %d" % len(x))

test()
which gives result:
x's length is : 104448
real 0m8.799s
user 0m8.769s
sys 0m0.008s
in Java (OpenJDK-7):
public class test {
    public static void main(String[] args) {
        int limit = 102 * 1024;
        String s = "";
        while (s.length() < limit) {
            s += "X";
            if (s.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ") > 0)
                System.out.printf("Find!\n");
        }
        System.out.printf("x's length = %d\n", s.length());
    }
}
which gives result:
x's length = 104448
real 0m50.436s
user 0m50.431s
sys 0m0.488s
in Javascript (Nodejs 0.6.3)
function test()
{
var x = "";
var limit = 102 * 1024;
while (x.length < limit) {
x += "X";
if (x.indexOf("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) > 0)
console.log("OK");
}
console.log("x's length = " + x.length);
}();
which gives result:
x's length = 104448
real 0m3.115s
user 0m3.084s
sys 0m0.048s
in C++ (g++ -Ofast)
It's not surprising that Node.js performs better than Python or Java. But I expected libstdc++ to give much better performance than Node.js, so this result really surprised me.
#include <iostream>
#include <string>
using namespace std;

void test()
{
    const string::size_type limit = 102 * 1024;
    string s;
    while (s.size() < limit) {
        s += "X";
        if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != string::npos)
            cout << "Find!" << endl;
    }
    cout << "x's length = " << s.size() << endl;
}

int main()
{
    test();
}
which gives result:
x's length = 104448
real 0m5.905s
user 0m5.900s
sys 0m0.000s
Brief Summary
OK, now let's see the summary:
javascript on Nodejs(V8): 3.1s
Python on CPython 2.7.2 : 8.8s
C++ with libstdc++: 5.9s
Java on OpenJDK 7: 50.4s
Surprising! I tried -O2 and -O3 in C++, but nothing helped. C++ delivers only about 50% of the performance of JavaScript on V8, and is even poorer than CPython. Could anyone explain whether I missed some optimization in GCC, or is this just how it is? Thank you a lot.
It's not that std::string performs poorly (as much as I dislike C++), it's that string handling is so heavily optimized for those other languages.
Your comparisons of string performance are misleading, and presumptuous if they are intended to represent more than just that.
I know for a fact that Python string objects are completely implemented in C, and indeed in Python 2.7 numerous optimizations exist due to the lack of separation between unicode strings and bytes. If you ran this test on Python 3.x you would find it considerably slower.
Javascript has numerous heavily optimized implementations. It's to be expected that string handling is excellent here.
Your Java result may be due to improper string handling, or some other poor case. I expect that a Java expert could step in and fix this test with a few changes.
As for your C++ example, I'd expect performance to slightly exceed the Python version. It does the same operations, with less interpreter overhead. This is reflected in your results. Preceding the test with s.reserve(limit); would remove reallocation overhead.
I'll repeat that you're only testing a single facet of the languages' implementations. The results for this test do not reflect the overall language speed.
I've provided a C version to show how silly such pissing contests can be:
#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>

void test()
{
    int limit = 102 * 1024;
    char s[limit];
    size_t size = 0;
    while (size < limit) {
        s[size++] = 'X';
        if (memmem(s, size, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26)) {
            fprintf(stderr, "zomg\n");
            return;
        }
    }
    printf("x's length = %zu\n", size);
}

int main()
{
    test();
    return 0;
}
Timing:
matt#stanley:~/Desktop$ time ./smash
x's length = 104448
real 0m0.681s
user 0m0.680s
sys 0m0.000s
So I went and played a bit with this on ideone.org.
Here is a slightly modified version of your original C++ program, with the appending in the loop eliminated so that it only measures the call to std::string::find(). Note that I had to cut the number of iterations to ~40%, otherwise ideone.org would kill the process.
#include <iostream>
#include <string>

int main()
{
    const std::string::size_type limit = 42 * 1024;
    unsigned int found = 0;
    //std::string s;
    std::string s(limit, 'X');
    for (std::string::size_type i = 0; i < limit; ++i) {
        //s += 'X';
        if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != std::string::npos)
            ++found;
    }
    if (found > 0)
        std::cout << "Found " << found << " times!\n";
    std::cout << "x's length = " << s.size() << '\n';
    return 0;
}
My result at ideone.org is time: 3.37s. (Of course, this is highly questionable, but indulge me for a moment and wait for the other result.)
Now we take this code and swap the commented lines, to test appending rather than finding. Note that, this time, I increased the number of iterations tenfold, trying to see any time result at all.
#include <iostream>
#include <string>

int main()
{
    const std::string::size_type limit = 1020 * 1024;
    unsigned int found = 0;
    std::string s;
    //std::string s(limit, 'X');
    for (std::string::size_type i = 0; i < limit; ++i) {
        s += 'X';
        //if (s.find("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 0) != std::string::npos)
        //    ++found;
    }
    if (found > 0)
        std::cout << "Found " << found << " times!\n";
    std::cout << "x's length = " << s.size() << '\n';
    return 0;
}
My results at ideone.org, despite the tenfold increase in iterations, are time: 0s.
My conclusion: in C++, this benchmark is highly dominated by the searching operation; appending the character in the loop has no influence on the result at all. Was that really your intention?
The idiomatic C++ solution would be:
#include <iostream>
#include <string>
#include <algorithm>

int main()
{
    const int limit = 102 * 1024;
    std::string s;
    s.reserve(limit);
    const std::string pattern("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    for (int i = 0; i < limit; ++i) {
        s += 'X';
        if (std::search(s.begin(), s.end(), pattern.begin(), pattern.end()) != s.end())
            std::cout << "Omg Wtf found!";
    }
    std::cout << "X's length = " << s.size();
    return 0;
}
I could speed this up considerably by putting the string on the stack and using memmem, but there seems to be no need. Running on my machine, this is already over 10x the speed of the Python solution.
[On my laptop]
time ./test
X's length = 104448
real 0m2.055s
user 0m2.049s
sys 0m0.001s
The most obvious fix: try s.reserve(limit); before the main loop.
Documentation is here.
I should mention that using the standard classes in C++ the same way you are used to in Java or Python will often give you sub-par performance if you are unaware of what happens behind the scenes. There is no magical performance in the language itself; it just gives you the right tools.
My first thought is that there isn't a problem.
C++ gives the second-best performance, nearly ten times faster than Java. Maybe all but Java are running close to the best achievable performance for this functionality, and you should be looking at how to fix the Java issue (hint: StringBuilder).
In the C++ case, there are some things to try to improve performance a bit. In particular...
s += 'X'; rather than s += "X";
Declare std::string searchpattern("ABCDEFGHIJKLMNOPQRSTUVWXYZ"); outside the loop, and pass it to the find calls. A std::string instance knows its own length, whereas a C string requires a linear-time check to determine it, and this may (or may not) be relevant to std::string::find performance.
Try using std::stringstream, for a similar reason to why you should be using StringBuilder for Java, though most likely the repeated conversions back to string will create more problems.
Overall, the result isn't too surprising though. JavaScript, with a good JIT compiler, may be able to optimise a little better than C++ static compilation is allowed to in this case.
With enough work, you should always be able to optimise C++ better than JavaScript, but there will always be cases where that doesn't just naturally happen and where it may take a fair bit of knowledge and effort to achieve that.
What you are missing here is the inherent complexity of the search in find.
You are executing the search 102 * 1024 (104 448) times. A naive search algorithm will, each time, try to match the pattern starting from the first character, then from the second, and so on.
Therefore you have a string growing from length 1 to N, and at each step you search the pattern against it, which is a linear operation in C++. That is N * (N + 1) / 2 = 5 454 744 576 comparisons. I am not as surprised as you are that this takes some time...
Let us verify the hypothesis by using the overload of find that searches for a single A:
Original: 6.94938e+06 ms
Char : 2.10709e+06 ms
About 3 times faster, so we are within the same order of magnitude; the use of a full string is therefore not the real issue.
Conclusion? Maybe find could be optimized a bit, but the problem is not worth it.
Note: to those who tout Boyer-Moore, I am afraid the needle is too small for it to help much. It may cut an order of magnitude (26 characters), but no more.
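As a side note (this postdates the original discussion, so treat it as my addition rather than part of the answer): since C++17 the standard library ships a Boyer-Moore searcher in <functional> that plugs directly into std::search:
#include <algorithm>
#include <functional>
#include <iostream>
#include <string>

int main()
{
    const std::string haystack(102 * 1024, 'X');
    const std::string needle("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    // the searcher preprocesses the needle once; reuse it across calls
    std::boyer_moore_searcher searcher(needle.begin(), needle.end());
    auto it = std::search(haystack.begin(), haystack.end(), searcher);
    std::cout << (it == haystack.end() ? "not found" : "found") << '\n';
}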
For C++, try using a std::string for "ABCDEFGHIJKLMNOPQRSTUVWXYZ": in my implementation, string::find(const charT* s, size_type pos = 0) const recalculates the length of the string argument on every call.
I just tested the C++ example myself. If I remove the call to std::string::find, the program terminates in no time, so the allocations during string concatenation are no problem here.
If I add a variable std::string abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" and replace the occurrence of "ABC...XYZ" in the call to std::string::find, the program needs almost the same time to finish as the original example. This again shows that neither allocation nor computing the string's length adds much to the runtime.
Therefore, it seems that the string search algorithm used by libstdc++ is not as fast for your example as the search algorithms of JavaScript or Python. Maybe you want to try C++ again with your own string search algorithm that fits your purpose better.
C and C++ are not easy languages, and it takes years to learn to write fast programs in them.
Here is a strncmp(3) version, modified from the C version above:
#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>

void test()
{
    int limit = 102 * 1024;
    char s[limit];
    size_t size = 0;
    while (size < limit) {
        s[size++] = 'X';
        /* note: unlike memmem, this only compares the first 26 bytes
           of s against the pattern rather than searching all of s */
        if (!strncmp(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26)) {
            fprintf(stderr, "zomg\n");
            return;
        }
    }
    printf("x's length = %zu\n", size);
}

int main()
{
    test();
    return 0;
}
Your test code is checking a pathological scenario of excessive string concatenation. (The string-search part of the test could have probably been omitted, I bet you it contributes almost nothing to the final results.) Excessive string concatenation is a pitfall that most languages warn very strongly against, and provide very well known alternatives for, (i.e. StringBuilder,) so what you are essentially testing here is how badly these languages fail under scenarios of perfectly expected failure. That's pointless.
An example of a similarly pointless test would be to compare the performance of various languages when throwing and catching an exception in a tight loop. All languages warn that exception throwing and catching is abysmally slow. They do not specify how slow, they just warn you not to expect anything. Therefore, to go ahead and test precisely that, would be pointless.
So, it would make a lot more sense to repeat your test substituting the mindless string concatenation part (s += "X") with whatever construct is offered by each one of these languages precisely for avoiding string concatenation. (Such as class StringBuilder.)
As mentioned by sbi, the test case is dominated by the search operation.
I was curious how the speed of the text allocation compares between C++ and JavaScript.
System: Raspberry Pi 2, g++ 4.6.3, node v0.12.0, g++ -std=c++0x -O2 perf.cpp
C++ : 770ms
C++ without reserve: 1196ms
Javascript: 2310ms
C++
#include <iostream>
#include <string>
#include <chrono>
using namespace std;
using namespace std::chrono;

void test()
{
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    const string::size_type limit = 1024 * 1024 * 100;
    string s;
    s.reserve(1024 * 1024 * 101);
    while (s.size() < limit) {
        s += "SUPER NICE TEST TEXT";
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    auto duration = duration_cast<milliseconds>(t2 - t1).count();
    cout << duration << endl;
}

int main()
{
    test();
}
JavaScript
function test()
{
    var time = process.hrtime();
    var x = "";
    var limit = 1024 * 1024 * 100;
    while (x.length < limit) {
        x += "SUPER NICE TEST TEXT";
    }
    var diff = process.hrtime(time);
    console.log('benchmark took %d ms', diff[0] * 1e3 + diff[1] / 1e6);
}
test();
It seems that Node.js has better algorithms for substring search. You can implement one yourself and try it out.