Faster input and output - c++

#include<iostream>
using namespace std;
int main(){
    int i,x,max=0;
    cin>>x;
    int a[x];
    for(i=0;i<x;i++){
        cin>>a[i];
        if(max<a[i]){
            max=a[i];
        }
    }
    int b[max+1];
    for(i=0;i<max+1;i++){
        b[i]=-1;
    }
    for(i=0;i<x;i++){
        if(b[a[i]]==-1){
            b[a[i]]=1;
        }
        else{
            b[a[i]]++;
        }
    }
    i=0;
    while(i<=max){
        while(b[i]>0&&b[i]!=-1){
            cout<<i<<endl;
            b[i]--;
        }
        i++;
    }
    return 0;
}
I tried an indexing (counting) method for sorting, and CodeChef reports TLE (time limit exceeded). The complexity of this solution is not O(n) but close to it. The question has a time limit of 5 seconds and the source limit is 50000 bytes.
Any help on how to improve the performance, either through faster I/O or through the computation itself, would be appreciated.

I'm pretty certain your code is problematic because you are using cout << x << endl; in a loop that will print a huge number of lines.
I will be back with "difference" in a few minutes.
Edit: Not sure I can make much of a difference either way. Obviously, depending on compiler, it may vary greatly, but with my g++ -O2 and 100000 input numbers, it takes 0.16 - 0.18s to use endl; and 0.06 - 0.07s to use '\n' for the output.
Using printf isn't faster than cout, but scanf is a little faster than cin (0.04s +/- 0.05).
However, that is really related to sync_with_stdio. If we use cin.sync_with_stdio(false); then the results are the same for scanf and cin.
All measurements are made with a file as input and a file as output - it takes much longer to write to the shell, but that's because it's scrolling 100k lines of text past me.
(Your program will crash with either large input values or a large number of inputs: if max is greater than about 1 million, the code will crash because it runs out of stack. On many systems that may happen for lower values too.)

Avoid cin and cout, and use the C IO functions scanf and printf instead. You'll find they can be up to 5x faster than the slow c++ functions.

Related

How do I know if this function works in time O(N) and memory O(N)?

The task is:
Given is an array containing N numbers, A[0],A[1],...A[N-1]. Compute the array B of length N, such that B[i]=A[0]*A[1]*...A[i-1]*A[i+1]...*A[N-1]. You shouldn't use division and both time and memory complexity should be O(N).
#include <iostream>
#include <stdio.h>
#include <math.h>
#include <conio.h>
#include <time.h>
long int find_solution(int result, int multiplier, int increment)
{
    int safety=1;
    long int solution=0;
    while(safety==1)
    {
        if (solution*multiplier==result)
        {
            safety=0;
            return solution;
        }
        else if((solution+increment)*multiplier>result)
        {
            safety=0;
            return find_solution(result,multiplier, increment*0.1);
        }
        solution=solution+increment;
    }
}
int main()
{
    double time_spent=0.0;
    int A[20] = {2, 3, 4, 5, 6,1,2,3,4,5,1,1,1,2,3,1,1,1,1,1};
    clock_t begin=clock();
    int i;
    long int main_result=1;
    for(i=0;i<20;i++)
    {
        main_result=main_result*A[i];
    }
    printf("\n%ld\n",main_result); // %ld matches long int
    long int B[20];
    for(i=0;i<20;i++)
    {
        B[i]=find_solution(main_result, A[i],100000000);
    }
    for(i=0;i<20;i++)
    {
        printf("%ld\n",B[i]);
    }
    clock_t end=clock();
    time_spent=(double)(end-begin)/CLOCKS_PER_SEC;
    printf("Time elapsed: %.20f\n",time_spent);
    getch();
    return 0;
}
I analyzed this function myself and it seems OK. When I measure the time on the computer, it's fine if the input size is 5, 10 or 15 (I made 21 measurements for each input size). When it's 20, the time starts growing more than it probably should.
N     time (s)
5     0.01333333
10    0.02276
15    0.03757
20    0.07066
What might be the reason? I didn't use any nested loops. Is my function OK?
I would suggest generating much larger inputs to test time complexity, for example N1=100000, N2=200000, and so on. For small inputs and short time frames, compilers, caches and other processes may affect the time you measure, and they shouldn't be part of your analysis.
Your algorithm looks to be O(N) to me, as each call to find_solution() takes a bounded number of steps for fixed-width integers and it is called once per element. A very cool and creative solution to the problem; however, you might just be reinventing division, as you are answering the question "what multiplied by the denominator equals the numerator".
My solution, after thinking a little, is to calculate for each index the product of all elements to its right, which can be done in O(N) time. Do the same for the left side, multiply the two sides together, and you have B, right?
EDIT: Also, when discussing time complexity, measurements are fine but they do not "prove" the time complexity. That has to be done manually, by understanding the time complexity of the different parts of the algorithm and analysing how many times each part is run.
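The left/right product idea from this answer can be sketched roughly as follows. The function name products_except_self is made up for the illustration, and long long is assumed to be wide enough for the products involved.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: b[i] is the product of every a[j] with j != i,
// computed with two linear passes and no division.
std::vector<long long> products_except_self(const std::vector<int>& a) {
    const std::size_t n = a.size();
    std::vector<long long> b(n, 1);

    // First pass: b[i] = product of everything to the left of i.
    long long left = 1;
    for (std::size_t i = 0; i < n; ++i) {
        b[i] = left;
        left *= a[i];
    }

    // Second pass: multiply in the product of everything to the right of i.
    long long right = 1;
    for (std::size_t i = n; i-- > 0;) {
        b[i] *= right;
        right *= a[i];
    }
    return b;
}
```

Each pass touches every element once, so time is O(N), and the only extra storage is the output array, so memory is O(N) as well.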
O(N) means that the number of operations the algorithm performs should grow in proportion to N, which in your case is the size of the input array A.
In the main function the loop iterates 20 times, but the loop inside the find_solution function adds further work on every one of those iterations.
I think you are a bit confused; your task only requires, for every index i of B, the product of the items of A other than A[i]:
B[i]=A[0]*A[1]*...*A[i-1]*A[i+1]*...*A[N-1].
So multiplying each item of A once and putting the result into B within a for loop over N is O(N).

C++ Visual Studio 2012 Express Command window weird behavior

#include <iostream>
#include <cmath>
using namespace std;
int main()
{
    int riceamount=2,
        squarenumber=1,
        totalamount=0,
        neededrice1000=0,
        neededrice1000000=0,
        neededrice1000000000=0;
    cout<<"Amount of rice you need for the square "<<
        squarenumber<<" is " <<riceamount-1<<endl;
    cout<<"Amount of rice you need for the square "<<
        squarenumber+1<<" is " <<riceamount<<endl;
    squarenumber=2;
    for(int i=2;i<65;i++)
    {
        riceamount=riceamount*2;
        ++squarenumber;
        cout<<"Amount of rice you need for the square "<< squarenumber<<" is " <<riceamount<<endl;
        totalamount=totalamount+ riceamount;
        if (totalamount>1000)
            squarenumber=neededrice1000;
        if (totalamount>10000000 && totalamount<1100000)
            squarenumber=neededrice1000000;
        if (totalamount>1000000000 && totalamount<1100000000)
            squarenumber=neededrice1000000000;
    }
    system("pause");
    return 0;
}
When I debug, the command window prints numbers weirdly (after 10 it weirdly turn back to 1 and keep going printing 1 as squarenumber, then continues from 2 when c++ gave up calculating powers), as you can see in the image below. Why? Thanks for any help. [Command window picture]
after 10 it weirdly turn back to 1 and keep going printing 1 as squarenumber
You told it to:
if (totalamount>1000)
squarenumber=neededrice1000;
This has nothing to do with the Visual Studio command window; it is the stated logic of your program.
I suggest you step through it, line by line, using pencil and paper, so that you understand what you have written.
when c++ gave up calculating powers
It didn't "give up"; you overflowed your int with huge numbers, so your program has undefined behaviour.
For you, this resulted in low values, low enough that the previously pointed-out bug no longer kicks in, and squarenumber is once again free to increment on each iteration.
In this example, a 64-bit type will be enough (so consider uint64_t).
Eventually riceamount * 2 overflows the int type.
The behaviour on doing that is undefined, but in your case the computation is effectively done modulo a power of 2 (2^32 for a typical 32-bit int), and a large enough power of 2 is zero modulo that.
An unsigned long long would be big enough for the total number of grains of rice distributed across 64 squares with 1 grain on the first square.
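As a rough sketch of the 64-bit suggestion, assuming 1 grain on the first square and a doubling on each following square: the helper names grains_on_square, total_grains and first_square_reaching are made up for this example and are not taken from the question's code.

```cpp
#include <cstdint>

// Grains on a single square; `square` is assumed to be in 1..64.
std::uint64_t grains_on_square(int square) {
    return std::uint64_t{1} << (square - 1);
}

// Total grains on squares 1..squares (squares assumed to be in 0..64).
std::uint64_t total_grains(int squares) {
    std::uint64_t total = 0;
    for (int s = 1; s <= squares; ++s) {
        total += grains_on_square(s);
    }
    return total;
}

// First square at which the running total reaches at least `target`.
// Returns 0 if that never happens within 64 squares.
int first_square_reaching(std::uint64_t target) {
    std::uint64_t total = 0;
    for (int s = 1; s <= 64; ++s) {
        total += grains_on_square(s);
        if (total >= target) return s;
    }
    return 0;
}
```

The total for all 64 squares is 2^64 - 1, which is exactly the largest value a std::uint64_t can hold, so nothing overflows.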

Why does the program end even before running?

I want to use this question to improve a bit in my general understanding of how computer works, since I'll probably never have the chance to study in a profound and deep manner. Sorry in advance if the question is silly and not useful in general, but I prefer to learn in this way.
I am learning C++, and I found online a code that implements the Newton-Raphson method for finding the root of a function. The code is pretty simple, as you can see: at the beginning it asks for the required tolerance, and if I give a "decent" number it works fine. If instead, when it asks for the tolerance, I write something like 1e-600, the program breaks down immediately and the output is Enter starting value x: Failed to converge after 100 iterations .
The output of failed convergence should be a consequence of running the loop for more than 100 iterations, but this isn't the case since the loop doesn't even start. It looks like the program already knows it won't reach that level of tolerance.
Why does this happen? How can the program write that output even if it didn't try the loop for 100 times?
Edit: It seems that every meaningless thing (too-small numbers, words) I write when it asks for the tolerance produces pnew=0.25, and then the code runs 100 times and fails.
The code is the following:
#include <iostream>
#include <cmath>
using namespace std;
#define N 100 // Maximum number of iterations
int main() {
    double p, pnew;
    double f, dfdx;
    double tol;
    int i;
    cout << "Enter tolerance: ";
    cin >> tol;
    cout << "Enter starting value x: ";
    cin >> pnew;
    // Main Loop
    for(i=0; i < N; i++){
        p = pnew;
        //Evaluate the function and its derivative
        f = 4*p - cos(p);
        dfdx= 4 + sin(p);
        // The Newton-Raphson step
        pnew = p - f/dfdx;
        // Check for convergence and quit if done
        if(abs(p-pnew) < tol){
            cout << "Root is " << pnew << " to within " << tol << "\n";
            return 0;
        }
    }
    // We reach this point only if the iteration failed to converge
    cerr << "Failed to converge after " << N << " iterations.\n";
    return 1;
}
1e-600 is not representable by most implementations of double. std::cin will fail to convert your input to double and fall into a failed state. This means that, unless you clear the error state, any future std::cin also automatically fails without waiting for user input.
From cppreference (since c++17) :
If extraction fails, zero is written to value and failbit is set. If extraction results in the value too large or too small to fit in value, std::numeric_limits<T>::max() or std::numeric_limits<T>::min() is written and failbit flag is set.
As mentioned, 1e-600 is not a valid double value. However, there's more to it than being outside of the range. What's likely happening is that 1 is scanned into tol, and then some portion of e-600 is being scanned into pnew, and that's why it ends immediately, instead of asking for input for pnew.
Like François said, a double has a limited range. That range is fixed by the floating-point format (typically 64-bit IEEE 754), not by whether the machine and OS are 32-bit or 64-bit. In your program the convergence test in the "if" fails at every iteration, so the function never returns before the loop ends.

Looking for a more efficient way to generate/print a "progress bar" in C++

I am working on some grid generation code, during which I really want to see where I am, so I downloaded a piece of progress bar code from the internet and inserted it into my code, something like:
std::string bar;
for(int i = 0; i < 50; i++)
{
    if( i < (percent/2))
    {
        bar.replace(i,1,"=");
    }
    else if( i == (percent/2))
    {
        bar.replace(i,1,">");
    }
    else
    {
        bar.replace(i,1," ");
    }
}
std::cout<< "\r" "[" << bar << "] ";
std::cout.width( 3 );
std::cout<< percent << "% "
         << " iteration: " << iterationCycle << std::flush;
This is very straightforward. However, it GREATLY slows down the whole process. Note that percent=iterI/nIter.
I am really getting annoyed with this, and I am wondering if there is any smarter and more efficient way to print a progress bar to the screen.
Thanks a million.
Firstly you could consider only updating it on every 100 or 1000 iterations. Secondly, I don't think the division is the bottleneck, but much rather the string operations and the outputting itself.
I guess the only significant improvement would be to just output less often.
Oh and just for good measure - an efficient way to only execute the code every, say, 1024 iterations, would be not to see if 1024 is a divisor using the modulo operation, but rather to use bitwise operations. Something along the lines of
if ((iterationCycle & 1023) == 0) {
would work. You'd be computing the bitwise AND of iterationCycle and 1023, which is zero only when the lowest 10 bits are all 0, that is, once every 1024 iterations. (Testing iterationCycle & 1024 on its own would instead be true for 1024 consecutive iterations out of every 2048.) These kinds of operations are done extremely fast, as your CPU has specific hardware for them.
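Putting the "output less often" advice together, a rough sketch could look like the following. The names is_update_step and render_bar are made up for this illustration, and the interval of 1024 is an arbitrary choice that would need tuning against the real workload.

```cpp
#include <cstddef>
#include <string>

// True once every 1024 iterations (0, 1024, 2048, ...).
bool is_update_step(unsigned long iteration) {
    return (iteration & 1023UL) == 0;
}

// Builds a bar such as "[=====>    ]" for a percentage in 0..100.
// `width` is the number of characters between the brackets.
std::string render_bar(int percent, int width = 50) {
    if (percent < 0) percent = 0;
    if (percent > 100) percent = 100;
    const int filled = percent * width / 100;

    // Construct the whole bar once instead of replacing characters one by one.
    std::string bar(static_cast<std::size_t>(width), ' ');
    for (int i = 0; i < filled; ++i) {
        bar[static_cast<std::size_t>(i)] = '=';
    }
    if (filled < width) {
        bar[static_cast<std::size_t>(filled)] = '>';
    }
    return "[" + bar + "]";
}
```

Inside the main loop one would then write something like if (is_update_step(iterationCycle)) { std::cout << '\r' << render_bar(percent) << std::flush; }, so that the string work and the flush happen only on a small fraction of iterations.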
You might be overthinking this. I would just output a single character every however-many cycles of your main application code. Run some tests to see how many (hundreds? millions?), but you shouldn't print more than say once a second. Then just do:
std::fputc('*', stdout);
std::fflush(stdout);
You should really check "efficiency", but what would work almost the same is boost.progress:
#include <boost/progress.hpp>
...
boost::progress_display pd(50);
for (int i=0; i<=60; i++) {
    ++pd;
}
and, as Joost already answered, output less often.

C++ To Cuda Conversion/String Generation And Comparison

So I am in a basic high school coding class. We had to think up one of our semester projects. I chose to base mine on ideas and applications that aren't used in traditional code. This brought up the idea of using CUDA. One of the best ways I know to compare the speed of traditional methods versus unconventional ones is string generation and comparison. One could demonstrate the generation and matching speed of traditional CPU generation with timers and output, and then show the increase (or decrease) in speed and output of GPU processing.
I wrote this C++ code to generate random characters that are put into a character array and then match that array against a predetermined string. However, like most CPU programming, it is incredibly slow compared to GPU programming. I've looked over the CUDA API and could not find anything that would lead me in the right direction for what I'm looking to do.
Below is the code I have written in C++. If anyone could point me in the direction of such things as a random number generator whose output I can convert to chars using ASCII codes, that would be excellent.
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;
int sLength = 0;
int count = 0;
int stop = 0;
int maxValue = 0;
string inString = "aB1#";
static const char alphanum[] =
    "0123456789"
    "!##$%^&*"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";
int stringLength = sizeof(alphanum) - 1;
char genRandom()
{
    return alphanum[rand() % stringLength];
}
int main()
{
    cout << "Length of string to match?" << endl;
    cin >> sLength;
    string sMatch(sLength, ' ');
    while(true)
    {
        for (int x = 0; x < sLength; x++)
        {
            sMatch[x] = genRandom();
            //cout << sMatch[x];
            count++;
            if (count == 2147000000)
            {
                count = 0; // reset the counter (was a comparison, count == 0)
                maxValue++;
            }
        }
        if (sMatch == inString)
        {
            // 64-bit arithmetic so the total does not overflow int
            cout << "It took " << count + (maxValue*2147000000LL) << " randomly generated characters to match the strings." << endl;
            cin >> stop;
        }
        //cout << endl;
    }
}
If you want to implement a pseudorandom number generator using CUDA, have a look over here. If you want to generate chars from a predetermined set of characters, you can just put all possible chars into that array and create a random index (just as you are doing it right now).
But I think a more valuable comparison might be one that uses brute force. You could adapt your program to try not random strings, but one string after another in some meaningful order.
Then, on the other hand, you could implement the brute-force stuff on the GPU using CUDA. This can be tricky since you might want to stop all CUDA threads as soon as one of them finds a solution. I could imagine the brute force process using CUDA the following way: One thread tries aa as first two letters and brute-forces all following digits, the next thread tries ab as first two letters and brute-forces all following digits, the next thread tries ac as first two letters and brute-forces all following digits, and so on. All these threads run in parallel. Of course, you could vary the number of predetermined chars such that e.g. the first thread tries aaaa, the second aaab. Then, you could compare different input values.
Anyway, if you have never dealt with CUDA, I recommend the vector addition sample, a very basic CUDA example that serves very well for getting a basic understanding of what's going on with CUDA. Moreover, you should read the CUDA programming guide to make yourself familiar with CUDA's concept of a grid of thread-blocks containing a grid of threads. Once you understand this, I think it becomes clearer how CUDA organizes stuff. To be short, in CUDA, you should replace loops with a kernel that is executed multiple times at once.
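As a CPU-side starting point for the brute-force idea above, here is a rough C++ sketch that enumerates candidates in a fixed order. The names candidate_at and brute_force are made up for this illustration, and the alphabet is assumed to be non-empty. Because every candidate is computed from its index alone, a later CUDA version could presumably give each thread its own range of indices, but that part is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Maps an index in [0, k^length) to the index-th string over `alphabet`,
// treating the index as a base-k number (k = alphabet.size(), assumed > 0).
std::string candidate_at(std::uint64_t index, std::size_t length,
                         const std::string& alphabet) {
    const std::uint64_t k = alphabet.size();
    std::string s(length, alphabet[0]);
    for (std::size_t pos = length; pos-- > 0;) {
        s[pos] = alphabet[static_cast<std::size_t>(index % k)];
        index /= k;
    }
    return s;
}

// Tries every candidate of target.size() characters in order.
// Returns how many candidates were tried up to and including the match,
// or 0 if the target cannot be built from `alphabet`.
std::uint64_t brute_force(const std::string& target, const std::string& alphabet) {
    std::uint64_t total = 1;
    for (std::size_t i = 0; i < target.size(); ++i) {
        total *= alphabet.size();
    }
    for (std::uint64_t i = 0; i < total; ++i) {
        if (candidate_at(i, target.size(), alphabet) == target) {
            return i + 1;
        }
    }
    return 0;
}
```

Unlike the random approach, this version is guaranteed to stop after at most k^length candidates, which also makes CPU and GPU timings easier to compare.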
First off, I am not sure what your actual question is? Do you need a faster random number generator or one with a greater period? In that case I would recommend boost::random, the "Mersenne Twister" is generally considered state of the art. It is a little hard to get started, but boost is a great library so worth the effort.
I think the method you are using should be fairly efficient. Be aware that, since the strings are drawn at random, it takes on average about (#characters)^(length of string) tries to hit the target string (here 70^4 = 24010000), and an unlucky run can take more. The GPU should be at an advantage here, since this process is a Monte Carlo simulation and trivially parallelizable.
Have you compiled the code with optimizations?
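For the Mersenne Twister suggestion, here is a small sketch that uses std::mt19937 from the standard <random> header in place of boost::random, which is a swap made for this example and not what the answer literally recommends. The function name random_string is made up, and the alphabet is assumed to be non-empty.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper: draws `length` characters uniformly from `alphabet`
// using the given Mersenne Twister engine.
std::string random_string(std::mt19937& rng, std::size_t length,
                          const std::string& alphabet) {
    // uniform_int_distribution avoids the slight bias of rand() % n.
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string s(length, ' ');
    for (char& c : s) {
        c = alphabet[pick(rng)];
    }
    return s;
}
```

The engine would typically be created once, for example std::mt19937 rng(std::random_device{}());, and then reused for every call.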