Finding maximum value in Python vs. C++ - c++

I am just curious as to why finding the maximum value in C++ is faster than in Python3. Here is a snippet of my code in both languages:
C++:
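#include <iostream>
using namespace std;
int main() {
    int arr[] = {45, 67, 89};
    int temp = arr[0];  // start from the first element so all-negative arrays also work
    for (int n = 1; n < 3; n++) {
        if (arr[n] > temp)
            temp = arr[n];
    }
    cout << "Biggest number: " << temp << endl;
    return 0;
}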
Python:
def Main():
    numbers = [87, 67, 32, 43]  # ints: max() on strings compares them alphabetically ('100' < '87')
    print(max(numbers))
if __name__ == "__main__":
    Main()
As the code shows, in C++ I find the maximum value by looping over each element of an array, whereas in Python I use the built-in max() function.
I then ran both programs from the terminal to measure their execution times and found that they take approximately 0.006 s (C++) and 0.032 s (Python). Is there a way to further shorten Python's execution time?
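For reference, C++ also has a ready-made counterpart to the Python max() call: std::max_element from <algorithm>, which does the same linear scan as the hand-written loop. A minimal sketch, assuming a non-empty input (the helper name largest is hypothetical):

```cpp
#include <algorithm>
#include <vector>
// Returns the largest element of a non-empty vector, like Python's max(list).
// std::max_element returns values.end() for an empty range, so the caller
// must make sure the vector is not empty before calling this.
int largest(const std::vector<int>& values) {
    return *std::max_element(values.begin(), values.end());
}
```

For example, largest({45, 67, 89}) gives 89, the same answer as the loop in the question.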

Python is an interpreted language. Python has to read the text file with the python code, parse it, and only then begin executing it.
By the time the C++ code executes, the C++ compiler already did all the heavy lifting of compiling C++ into native machine language code that gets directly executed by the CPU.
It is possible to precompile Python code; this'll save some overhead, but the C++ code will still get the benefit of C++ compile-time optimization. With a small array size, an aggressive C++ compiler is likely to unroll the loop, and maybe even compute the maximum value at compile time, instead of at runtime; so all you end up executing is:
cout << "Biggest number: " << 89 << endl;
This is something that, theoretically, Python can also do; however that'll take even more CPU cycles to figure out, at runtime.
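The compile-time evaluation described above only happens if the optimizer decides to do it. In C++14 and later you can also request it explicitly with constexpr. A sketch, assuming a C++14 compiler (the function name max_of is hypothetical):

```cpp
#include <cstddef>
// A constexpr function may be evaluated by the compiler when its arguments are constants.
// Assumes N >= 1.
template <std::size_t N>
constexpr int max_of(const int (&arr)[N]) {
    int best = arr[0];
    for (std::size_t n = 1; n < N; ++n)
        if (arr[n] > best)
            best = arr[n];
    return best;
}
constexpr int kValues[] = {45, 67, 89};
// static_assert forces the evaluation to happen during compilation,
// so no loop is left to run at run time for this call.
static_assert(max_of(kValues) == 89, "maximum computed at compile time");
```

The same function still works at run time for non-constant arrays; only calls in a constant context are guaranteed to be folded.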

Assuming you're using a larger vector than the toy example above, I would give numpy a shot.
import numpy as np
# set up a vector with 50000 random elements
a = np.random.randint(0,100000,50000)
max_val = np.max(a)
Very fast relative to looping.
My computer shows np.max to be about 12x faster than Python's built-in max() function. C++ would be faster still, as it's a compiled language. (NumPy wraps low-level, optimized C code.)

Related

C++ is way slower than MATLAB

I am trying to generate a 5000-by-5000 random number matrix. Here is what I do in MATLAB:
for i = 1:100
rand(5000)
end
And here is what I do in C++:
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <ctime>
using namespace std;
int main(){
int N = 5000;
double ** A = new double*[N];
for (int i=0;i<N;i++)
A[i] = new double[N];
srand(time(NULL));
clock_t start = clock();
for (int k=0;k<100;k++){
for (int i=0;i<N;i++){
for (int j=0;j<N;j++){
A[i][j] = rand();
}
}
}
cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
}
MATLAB takes around 38 seconds while C++ takes around 90 seconds.
In another question, people executed the same code and got the same speeds for both C++ and MATLAB.
I am using Visual C++ with optimizations enabled (the settings were shown in a screenshot that is not reproduced here).
What am I missing here? Thank you for all the help.
EDIT: Here is the key point, though. In the question "Why MATLAB is faster than C++ in creating random numbers?", people gave me answers where their C++ speeds were the same as MATLAB's. When I use the same code, I get much worse speeds, and I am trying to understand why.
Your test is flawed, as others have noted, and does not even address the claim made by the title. You are comparing a built-in Matlab function to C++, not Matlab code itself, which in fact executes 100x more slowly than C++. Matlab is just a wrapper around the BLAS/LAPACK libraries in C/Fortran, so one would expect a Matlab script and a competently written C++ program to be approximately equivalent, and indeed they are. This code in Matlab 2007b
tic; A = rand(5000); toc
executes in 810ms on my machine and this
#include <iostream>
#include <cstdlib>
#include <ctime>
#define N 5000
int main()
{
    srand(time(NULL));
    clock_t start = clock();
    // one contiguous allocation of N*N doubles (element (i, j) would be A[i*N + j])
    double * A = new double[N*N];
    for (int i=0; i<N*N; ++i)
        A[i] = rand();
    std::cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << std::endl;
    delete[] A;
    return 0;
}
executes in 830ms. A slight advantage for Matlab's in-house RNG over rand() is not too surprising. Note also the single indexing. This is how Matlab does it, internally. It then uses a clever indexing system (developed by others) to give you a matrix-like interface to the data.
In your C++ code, you are doing 5000 separate heap allocations of double[5000]. You would probably get much better speed if you did a single allocation of a double[25000000] and then did your own arithmetic to convert your two indices into one.
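The single-allocation idea can be wrapped so that the (i, j) interface stays convenient. A minimal sketch using row-major order; the class name FlatMatrix is hypothetical, and std::vector is used instead of a raw new[] so the memory is released automatically:

```cpp
#include <cstddef>
#include <vector>
// A rows x cols matrix stored in one contiguous heap allocation.
// Element (i, j) lives at data_[i * cols_ + j] (row-major order).
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}  // zero-initialized
    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    double* data() { return data_.data(); }  // flat access, e.g. for a single fill loop
private:
    std::size_t rows_, cols_;
    std::vector<double> data_;  // one allocation instead of one per row
};
```

Filling a FlatMatrix m(5000, 5000) then becomes one loop over m.data() with 25000000 iterations, while m(i, j) remains available for code that needs two indices.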
I believe MATLAB uses multiple CPU cores on your machine. Have you tried writing a multi-threaded version and measuring the difference?
Also, the quality of the (pseudo-)random number generator can make a slight difference (but not much).
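To try the multi-threaded comparison, each thread can fill its own slice of one flat buffer. A sketch under these assumptions: C++11 threads (GCC/Clang may need -pthread), one std::mt19937 per thread seeded with seed + t, and uniform doubles in [0, 1) like MATLAB's rand. rand() is avoided because it is not required to be thread-safe. The function name parallel_fill is hypothetical, and whether MATLAB itself uses several cores for rand is not verified here.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>
// Fills `out` with uniform doubles using `num_threads` threads.
// Each thread owns a separate generator and a disjoint slice of the buffer,
// so no locking is needed and the result is reproducible for a given seed.
void parallel_fill(std::vector<double>& out, unsigned num_threads, unsigned seed) {
    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (out.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(out.size(), begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&out, begin, end, seed, t] {
            std::mt19937 gen(seed + t);
            std::uniform_real_distribution<double> dist(0.0, 1.0);
            for (std::size_t k = begin; k < end; ++k)
                out[k] = dist(gen);
        });
    }
    for (auto& w : workers) w.join();
}
```

Usage: std::vector<double> A(5000 * 5000); parallel_fill(A, std::thread::hardware_concurrency(), 42); and time it against the single-threaded loop.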
In my experience,
First check that you execute your C++ code in release mode instead of in Debug mode. (Although I see in the picture you are in release mode)
Consider MPI parallelization.
Bear in mind that MATLAB is highly optimized and compiled with the Intel compiler which produces faster executables. You can also try more advanced compilers if you can afford them.
Last, you can aggregate the loops by generating the (i, j) combinations in a single loop. (In Python this is common practice with the product function from the itertools library.)
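A sketch of that loop aggregation in C++: one loop over a flat counter k, with i and j recovered by division and modulo, which visits pairs in the same order as itertools.product(range(rows), range(cols)). The function name fused_indices is hypothetical, and recording the pairs stands in for the real loop body. Whether this beats two nested loops depends on the compiler; the per-iteration division can cost more than it saves, so it is worth measuring.

```cpp
#include <cstddef>
#include <utility>
#include <vector>
// Visits every (i, j) of a rows x cols grid with a single loop (row-major order).
std::vector<std::pair<std::size_t, std::size_t>> fused_indices(std::size_t rows, std::size_t cols) {
    std::vector<std::pair<std::size_t, std::size_t>> visited;
    visited.reserve(rows * cols);
    for (std::size_t k = 0; k < rows * cols; ++k) {
        const std::size_t i = k / cols;  // row
        const std::size_t j = k % cols;  // column
        visited.emplace_back(i, j);      // stand-in for the real body, e.g. A[k] = rand()
    }
    return visited;
}
```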
I hope it helps.

large loop for timing tests gets somehow optimized to nothing?

I am trying to test a series of libraries for matrix-vector computations. For that I just make a large loop and inside it I call the routine I want to time. Very simple. However, I sometimes see that when I increase the compiler's optimization level, the time drops to zero no matter how large the loop is. See the example below, where I try to time a C macro that computes cross products. What is the compiler doing? How can I avoid this while still allowing maximum optimization for floating-point arithmetic? Thank you in advance.
The example below was compiled using g++ 4.7.2 on a computer with an i5 intel processor.
Using optimization level 1 (-O1) it takes 0.35 seconds. For level two or higher it drops down to zero. Remember, I want to time this so I want the computations to actually happen even if, for this simple test, unnecessary.
#include<iostream>
#include<ctime>  // clock() and CLOCKS_PER_SEC
using namespace std;
typedef double Vector[3];
#define VecCross(A,assign_op,B,dummy_op,C) \
( A[0] assign_op (B[1] * C[2]) - (B[2] * C[1]), \
A[1] assign_op (B[2] * C[0]) - (B[0] * C[2]), \
A[2] assign_op (B[0] * C[1]) - (B[1] * C[0]) \
)
double get_time(){
return clock()/(double)CLOCKS_PER_SEC;
}
int main()
{
unsigned long n = 1000000000u;
double start;
{//C macro cross product
Vector u = {1,0,0};
Vector v = {1,1,0};
Vector w = {1.2,1.2,1.2};
start = get_time();
for(unsigned long i=0;i<n;i++){
VecCross (w, =, u, X, v);
}
cout << "C macro cross product: " << get_time()-start << endl;
}
return 0;
}
Ask yourself, what does your program actually do, in terms of what is visible to the end-user?
It displays the result of a calculation: get_time()-start. The contents of your loop have no bearing on the outcome of that calculation, because you never actually use the variables being modified inside the loop.
Therefore, the compiler optimises out the entire loop since it is irrelevant.
One solution is to output the final state of the variables being modified in the loop, as part of your cout statement, thus forcing the compiler to compute the loop. However, a smart compiler could also figure out that the loop always calculates the same thing, and it can simply insert the result directly into your cout statement, because there's no need to actually calculate it at run-time. As a workaround to this, you could for example require that one of the inputs to the loop be provided at run-time (e.g. read it in from a file, command line argument, cin, etc.).
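A sketch that combines both suggestions: the results are folded into a checksum that is returned (and should be printed), and the inputs are function parameters that the caller can take from run-time input. One addition beyond the answer: v[2] changes every iteration, because with fixed inputs the optimizer could still compute a single cross product and hoist it out of the loop. std::chrono::steady_clock replaces clock() here, and the name timed_cross_products is hypothetical.

```cpp
#include <chrono>
typedef double Vector[3];
// w = u x v: the same formula the VecCross macro expands to (with '=' as assign_op).
inline void cross(Vector w, const Vector u, const Vector v) {
    w[0] = u[1] * v[2] - u[2] * v[1];
    w[1] = u[2] * v[0] - u[0] * v[2];
    w[2] = u[0] * v[1] - u[1] * v[0];
}
// Runs n cross products and returns a checksum of all results.
// - The returned checksum makes the loop's work observable.
// - v[2] depends on i, so one product cannot simply be hoisted out of the loop.
// - u0, v0 and scale should come from run-time input so nothing is constant-folded.
// The elapsed wall-clock time is stored in *seconds.
double timed_cross_products(unsigned long n, const Vector u0, const Vector v0,
                            double scale, double* seconds) {
    Vector v = {v0[0], v0[1], v0[2]};
    Vector w = {0.0, 0.0, 0.0};
    double checksum = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned long i = 0; i < n; ++i) {
        v[2] = v0[2] + scale * static_cast<double>(i);
        cross(w, u0, v);
        checksum += w[0] + w[1] + w[2];
    }
    const auto stop = std::chrono::steady_clock::now();
    *seconds = std::chrono::duration<double>(stop - start).count();
    return checksum;
}
```

In main, read scale from the command line (for example with std::atof(argv[1])) and print both the checksum and the time; the extra multiply and add per iteration are part of what gets measured.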
For more (and possibly better) solutions, check out this duplicate thread: Force compiler to not optimize side-effect-less statements

Faster input and output

#include<iostream>
using namespace std;
int main(){
int i,x,max=0;
cin>>x;
int a[x];
for(i=0;i<x;i++){
cin>>a[i];
if(max<a[i]){
max=a[i];}
}
int b[max+1];
for(i=0;i<max+1;i++){
b[i]=-1;
}
for(i=0;i<x;i++){
if(b[a[i]]==-1){
b[a[i]]=1;
}
else{
b[a[i]]++;
}
}
i=0;
while(i<=max){
while(b[i]>0&&b[i]!=-1){
cout<<i<<endl;
b[i]--;
}
i++;
}
return 0;
}
I tried an indexing (counting) method for sorting, and CodeChef reports TLE (time limit exceeded). The complexity of this approach is not O(n), but it is close to it. The problem has a time limit of 5 seconds and a source limit of 50000 bytes.
Any help on how to improve the performance, either with faster I/O or faster computation, would be appreciated.
I'm pretty certain your code is problematic because you are using cout << x << endl; in a loop that will print a huge number of lines.
I will be back with "difference" in a few minutes.
Edit: Not sure I can make much of a difference either way. Obviously, depending on compiler, it may vary greatly, but with my g++ -O2 and 100000 input numbers, it takes 0.16 - 0.18s to use endl; and 0.06 - 0.07s to use '\n' for the output.
Using printf isn't faster than cout, but scanf is a little faster than cin (0.04s +/- 0.05).
However, that is really related to sync_with_stdio. If we use cin.sync_with_stdio(false); then the results are the same for scanf and cin.
All measurements are made with a file as input and a file as output - it takes much longer to write to the shell, but that's because it's scrolling 100k lines of text past me.
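The two output-side fixes from this answer can be packaged as follows. A sketch, assuming the whole sorted output fits in memory: enable_fast_io applies the sync_with_stdio(false) call mentioned above (plus untying cin from cout, an extra commonly used step), and format_lines builds the output with '\n' so nothing is flushed per line; writing it in one go is an additional choice, not something measured in the answer. Both function names are hypothetical.

```cpp
#include <iostream>
#include <string>
#include <vector>
// Call once at the top of main(), before any other I/O.
void enable_fast_io() {
    std::ios_base::sync_with_stdio(false);  // iostreams no longer synchronize with C stdio
    std::cin.tie(nullptr);                  // cout is no longer flushed before every read
}
// Builds the output in memory, one value per line.
// '\n' only ends the line; std::endl would also flush the stream every time.
std::string format_lines(const std::vector<int>& values) {
    std::string out;
    for (int v : values) {
        out += std::to_string(v);
        out += '\n';
    }
    return out;
}
```

Usage: call enable_fast_io(); at the start, then std::cout << format_lines(sorted); once at the end.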
(Your program will crash with either "large" input values or a large number of inputs: if max is greater than about 1 million, the code will crash due to a stack overflow, and on many systems that may happen for lower values too.)
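To avoid the stack overflow, the arrays can live on the heap. A sketch of the same counting sort with std::vector, assuming all inputs are non-negative as in the question; the zero-initialized counts also remove the need for the -1 sentinel. The function name counting_sort is hypothetical.

```cpp
#include <cstddef>
#include <vector>
// Counting sort for non-negative ints. std::vector allocates on the heap,
// unlike the variable-length stack arrays int a[x] and int b[max+1]
// (which are a compiler extension in C++ and limited by the stack size).
std::vector<int> counting_sort(const std::vector<int>& values) {
    int max_value = 0;
    for (int v : values)
        if (v > max_value) max_value = v;
    std::vector<std::size_t> counts(static_cast<std::size_t>(max_value) + 1, 0);
    for (int v : values)
        ++counts[v];
    std::vector<int> sorted;
    sorted.reserve(values.size());
    for (std::size_t i = 0; i < counts.size(); ++i)
        sorted.insert(sorted.end(), counts[i], static_cast<int>(i));  // i repeated counts[i] times
    return sorted;
}
```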
Avoid cin and cout, and use the C I/O functions scanf and printf instead. You'll find they can be up to 5x faster than the C++ stream functions.

C++ To Cuda Conversion/String Generation And Comparison

So I am in a basic high school coding class. We had to come up with one of our semester projects, and I chose to base mine on ideas and applications that aren't used in traditional code. This brought up the idea of using CUDA. One of the best ways I know to compare the speed of traditional methods with unconventional ones is string generation and comparison: one could demonstrate the generation and matching speed of traditional CPU generation with timers and output, and then show the increase (or decrease) in speed and output of GPU processing.
I wrote this C++ code to generate random characters that are put into a character array and then match that array against a predetermined string. However, like most CPU programming, it is incredibly slow compared to GPU programming. I've looked over the CUDA API and could not find anything that would lead me in the right direction for what I'm trying to do.
Below is the code I have written in C++. If anyone could point me toward something like a random number generator whose output I can convert to chars using ASCII codes, that would be excellent.
#include <iostream>
#include <string>
#include <cstdlib>
#include <ctime>
using namespace std;
string inString = "aB1#";  // target string; it can only match when the requested length is 4
static const char alphanum[] =
"0123456789"
"!##$%^&*"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz";
int stringLength = sizeof(alphanum) - 1;
char genRandom()
{
    return alphanum[rand() % stringLength];
}
int main()
{
    srand(time(NULL));  // seed once, otherwise every run draws the same sequence
    int sLength = 0;
    cout << "Length of string to match?" << endl;
    cin >> sLength;
    string sMatch(sLength, ' ');
    // A 64-bit counter replaces the old count/maxValue scheme: 'count == 0;' never
    // reset anything, maxValue*2147000000 overflowed int, and a global named
    // 'count' can clash with std::count under 'using namespace std'.
    unsigned long long charCount = 0;
    while (true)
    {
        for (int x = 0; x < sLength; x++)
        {
            sMatch[x] = genRandom();
            charCount++;
        }
        if (sMatch == inString)
        {
            cout << "It took " << charCount << " randomly generated characters to match the strings." << endl;
            break;
        }
    }
    return 0;
}
If you want to implement a pseudorandom number generator using CUDA, have a look over here. If you want to generate chars from a predetermined set of characters, you can just put all possible chars into that array and create a random index (just as you are doing it right now).
But I think a more valuable comparison might be one that uses brute force. You could adapt your program to try not random strings, but one string after another in some systematic order.
Then, on the other hand, you could implement the brute-force search on the GPU using CUDA. This can be tricky, since you might want to stop all CUDA threads as soon as one of them finds a solution. I could imagine the brute-force process in CUDA working like this: one thread tries aa as the first two letters and brute-forces all following characters, the next thread tries ab as the first two letters and brute-forces all following characters, the next thread tries ac, and so on. All these threads run in parallel. Of course, you could vary the number of predetermined chars, so that e.g. the first thread tries aaaa, the second aaab. Then you could compare different input values.
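As a CPU baseline for that comparison, the systematic enumeration can be written as an "odometer" over the alphabet. A sequential sketch (the function name brute_force_attempts is hypothetical); in the GPU scheme described above, each thread would run this same enumeration over the positions after its own fixed prefix.

```cpp
#include <cstddef>
#include <string>
#include <vector>
// Enumerates candidates of length target.size() over `alphabet` in a fixed
// odometer order (aa, ab, ac, ba, ... for alphabet "abc") and returns how many
// candidates were generated up to and including the match.
// Returns 0 if the target uses a character outside the alphabet.
unsigned long long brute_force_attempts(const std::string& alphabet, const std::string& target) {
    for (char c : target)
        if (alphabet.find(c) == std::string::npos) return 0;
    const std::size_t len = target.size();
    std::vector<std::size_t> digits(len, 0);  // current candidate as indices into alphabet
    std::string candidate(len, ' ');
    unsigned long long attempts = 0;
    while (true) {
        for (std::size_t k = 0; k < len; ++k) candidate[k] = alphabet[digits[k]];
        ++attempts;
        if (candidate == target) return attempts;
        // Advance the odometer: bump the last position and carry to the left.
        std::size_t pos = len;
        while (true) {
            if (pos == 0) return 0;  // every candidate tried (unreachable for a valid target)
            --pos;
            if (++digits[pos] < alphabet.size()) break;
            digits[pos] = 0;
        }
    }
}
```

Unlike the random search, this never repeats a candidate, so the worst case is exactly (#characters)^(length) attempts.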
Anyway, if you have never dealt with CUDA, I recommend the vector addition sample, a very basic CUDA example that serves very well for getting a basic understanding of what's going on. Moreover, you should read the CUDA programming guide to familiarize yourself with CUDA's concept of a grid of thread blocks, each containing a block of threads. Once you understand this, I think it becomes clearer how CUDA organizes work. In short, in CUDA you replace loops with a kernel that is executed many times at once.
First off, I am not sure what your actual question is. Do you need a faster random number generator, or one with a longer period? In that case I would recommend boost::random; the "Mersenne Twister" is generally considered state of the art. It is a little hard to get started with, but Boost is a great library, so it is worth the effort.
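Since C++11, a Mersenne Twister is also available in the standard library as std::mt19937, alongside distributions that avoid the bias of rand() % n. A sketch of drawing random characters from an alphabet with it, assuming a non-empty alphabet (the function name random_string is hypothetical):

```cpp
#include <cstddef>
#include <random>
#include <string>
// Draws `length` characters uniformly from `alphabet` (must be non-empty)
// using the caller's std::mt19937, instead of rand() % stringLength.
std::string random_string(const std::string& alphabet, std::size_t length, std::mt19937& gen) {
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string s(length, ' ');
    for (char& c : s)
        c = alphabet[pick(gen)];
    return s;
}
```

Usage: create the generator once, e.g. std::mt19937 gen(std::random_device{}()); and pass it to every call so the sequence is not restarted.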
I think the method you are using should be fairly efficient. Be aware that it takes on the order of (#characters)^(length of string) attempts to hit the target string (here 70^4 = 24,010,000). The GPU should have an advantage here, since this process is a Monte Carlo simulation and trivially parallelizable.
Have you compiled the code with optimizations?

Error when I call Sleep(1000) to make srand() work in Visual C++

I have the following program:
srand((unsigned) time(NULL));
for (int w = 0; w < 10; w++) {
int ran_x;
ran_x = rand() % 255;
cout << "\nRandom X = " << ran_x << endl;
//some more lines of code
Sleep(1000);
}
I am running it on Visual C++ 2008. When I build this program, it doesn't show any errors or warnings. But when I run it, sometimes it runs fine, and sometimes it stops in the middle with this error: "This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information."
What should I do? Is it possible to do this without the Sleep() function and still get randomly generated values? If I remove Sleep(1000), it doesn't give any error, but it doesn't give random values either.
Obviously you shouldn't have to sleep. Code looks sane to me, as long as you only call srand() once. If you call this entire block of code multiple times intra-second, then time(NULL) will be returning the same second value and srand() will start the pseudo-random number generation at the same number, selecting the same set of 10 subsequent numbers....
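The reseeding effect is easy to demonstrate: std::srand with the same seed restarts exactly the same rand() sequence, which is what happens when srand(time(NULL)) runs twice within the same second (time() typically has one-second resolution). A small sketch; the function name draws_after_seed is hypothetical.

```cpp
#include <cstdlib>
#include <vector>
// Seeds rand() and returns the next `count` values of rand() % 255.
// Calling it twice with the same seed yields the same vector.
std::vector<int> draws_after_seed(unsigned seed, int count) {
    std::srand(seed);
    std::vector<int> out;
    for (int k = 0; k < count; ++k)
        out.push_back(std::rand() % 255);
    return out;
}
```

The fix is to call std::srand(std::time(nullptr)) once at the start of main() and never again.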
Works without any problems with gcc
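#include <iostream>
#include <cstdlib>
#include <ctime>     // time()
#include <unistd.h>  // sleep() (POSIX only; on Windows use Sleep() from <windows.h>)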
int main (int argc, char *argv[])
{
srand( time(0) );
for (int w = 0; w < 10; w++)
{
int ran_x = rand() % 255;
std::cout<<"\nRandom X = " << ran_x << std::endl;
sleep(1);
}
return 0;
}
Seems to me your program should work perfectly without the sleep call. In fact seems to work for me on VS2008 perfectly. I believe your problems must be in code that you have removed thinking it irrelevant.
The code snippet you posted is hardly responsible for your application terminating, Sleep or not.
"Because if I remove Sleep(1000), it doesn't give any error but it doesn't give random values either."
Well, rand() certainly gives you pseudo-random numbers, although the PRNG implementation might not distribute randomness evenly across the bits of the returned value; in many implementations, the higher bits change more often than the lower bits, which is why your code is a poor choice for selecting a random value in that range (note also that rand() % 255 yields 0 to 254, never 255).
In general, I'd recommend switching from your standard library's rand/srand to an implementation like boost's mersenne twister (boost::random), or at least see
http://c-faq.com/lib/randrange.html
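A sketch of a division-based mapping along the lines of that C FAQ entry: dividing by (RAND_MAX / n + 1) uses the high-order bits of rand() instead of the low-order bits that % keeps. The helper name rand_below is hypothetical, and the arithmetic is done in long long because RAND_MAX + 1 overflows int on implementations where RAND_MAX equals INT_MAX.

```cpp
#include <cstdlib>
// Returns a value in 0..n-1 (assumes 0 < n <= RAND_MAX), taken from the
// high-order bits of rand() rather than rand() % n.
int rand_below(int n) {
    const long long bucket = static_cast<long long>(RAND_MAX) / n + 1;
    return static_cast<int>(std::rand() / bucket);
}
```

For the question's case, rand_below(256) gives values from 0 to 255 inclusive.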
What's the content of "some more lines of code"?
<psychic debugging>I bet you have code there that, directly or indirectly, depends on the random value you generated earlier. This code likely involves a division, or sets the length of some container, and breaks when the generated random number is 0.</psychic debugging>