How to speed up program execution

How to speed up program execution - c++

This is a very simple question, but unfortunately, I am stuck and do not know what to do. My program is a simple program that keeps on accepting 3 numbers and outputs the largest of the 3. The program keeps on running until the user inputs a character.
As the tittle says, my question is how I can make this execute faster ( There will be a large amount of input data ). Any sort of help which may include using a different algorithm or using different functions or changing the entire code is accepted.
I'm not very experienced in C++ Standard, and thus do not know about all the different functions available in the different libraries, so please do explain your reasons and if you're too busy, at least try and provide a link.
Here is my code
#include<stdio.h>
int main()
{
int a,b,c;
while(scanf("%d %d %d",&a,&b,&c))
{
if(a>=b && a>=c)
printf("%d\n",a);
else if(b>=a && b>=c)
printf("%d\n",b);
else
printf("%d\n",c);
}
return 0;
}
It's working is very simple. The while loop will continue to execute until the user inputs a character. As I've explained earlier, the program accepts 3 numbers and outputs the largest. There is no other part of this code, this is all. I've tried to explain it as much as I can. If you need anything more from my side, please ask, ( I'll try as much as I can ).
I am compiling on an internet platform using CPP 4.9.2 ( That's what is said over there )
Any sort of help will be highly appreciated. Thanks in advance
EDIT
The input is made by a computer, so there is no delay in input.
Also, I will accept answers in c and c++.
UPDATE
I would also like to ask if there are any general library functions or algorithms, or any other sort of advise ( certain things we must do and what we must not do ) to follow to speed up execution ( Not just for this code, but in general ). Any help would be appreciated. ( and sorry for asking such an awkward question without giving any reference material )

Your "algorithm" is very simple and I would write it with the use of the max() function, just because it is better style.
But anyway...
What will take the most time is the scanf. This is your bottleneck. You should write your own read function which reads a huge block with fread and processes it. You may consider doing this asynchronously - but I wouldn't recommend this as a first step (some async implementations are indeed slower than the synchronous implementations).
So basically you do the following:
Read a huge block from file into memory (this is disk IO, so this is the bottleneck)
Parse that block and find your three integers (watch out for the block borders! the first two integers may lie within one block and the third lies in the next - or the block border splits your integer in the middle, so let your parser just catch those things)
Do your comparisions - that runs as hell compared to the disk IO, so no need to improve that

Unless you have a guarantee that the three input numbers are all different, I'd worry about making the program get the correct output. As noted, there's almost nothing to speed up, other than input and output buffering, and maybe speeding up decimal conversions by using custom parsing and formatting code, instead of the general-purpose scanf and printf.
Right now if you receive input values a=5, b=5, c=1, your code will report that 1 is the largest of those three values. Change the > comparisons to >= to fix that.
You can minimize the number of comparisons by remembering previous results. You can do this with:
int d;
if (a >= b)
if (a >= c)
d = a;
else
d = c;
else
if (b >= c)
d = b;
else
d = c;
[then output d as your maximum]
That does exactly 2 comparisons to find a value for d as max(a,b,c).
Your code uses at least two and maybe up to 4.

Related

Measure execution efficiency for compairing two solutions?

I want to measure the efficiency for two solutions for the same problem.
I don't need to include any environmental "noise" into the calculation, just I want to know which of these below solutions would perform better in a perfect world, ie.: which needs more steps to execute?
string a;
int b;
string c;
//SOLUTION A
c = a;
c = std::move(c) + ',' + std:to_string(b);
//SOLUTION B
c = a;
c.append(",").append(std::to_string(b));
I don't really have any experience in measuring execution times in this small scale, so I might have lost in the jungle, sorry if this is the case here.

One way is to benchmark both solutions, e.g. with QuickBench. In the chart:
you can see that second solution is faster. However, execution time of your code might also be dependent on the size and the number of the strings you try to concatenate, so take it into account. I would also recommend benchmarking whole solution (whatever you try to achieve), not just one line of your code (here: append vs operator+).
You can also try it with different compilers and different optimization levels.

Permutations of English Alphabet of a given length

So I have this code. Not sure if it works because the runtime for the program is still continuing.
void permute(std::vector<std::string>& wordsVector, std::string prefix, int length, std::string alphabet) {
if (length == 0) {
//end the recursion
wordsVector.push_back(prefix);
}
else {
for (int i = 0; i < alphabet.length(); ++i) {
permute(wordsVector, prefix + alphabet.at(i), length - 1, alphabet);
}
}}
where I'm trying to get all combinations of characters in the English alphabet of a given length. I'm not sure if the approach is correct at the moment.
Alphabet consists of A-Z in a string of length 26. WordsVectors holds all the different combinations of words. prefix is meant to pass through recursively until a word is made and length is self explanatory.
Example, if I give the length of 7 to the function, I expect a size of 26 x 25 x 24 x 23 x 22 x 21 x 20 = 3315312000 if I'm correct, following the formula for permutations.
I don't think programs are meant to run this long so either I'm hitting an infinite loop or something is wrong with my approach. Please advise. Thanks.

Surely the stack would overflow but concentrating on your question even if you write an iterative program it will take a long time ( not an infinite loop just very long )
[26L, 650L, 15600L, 358800L, 7893600L, 165765600L, 3315312000L, 62990928000L, 1133836704000L, 19275223968000L, 308403583488000L, 4626053752320000L, 64764752532480000L, 841941782922240000L, 10103301395066880000L, 111136315345735680000L, 1111363153457356800000L, 10002268381116211200000L, 80018147048929689600000L, 560127029342507827200000L, 3360762176055046963200000L, 16803810880275234816000000L, 67215243521100939264000000L, 201645730563302817792000000L, 403291461126605635584000000L, 403291461126605635584000000L]
The above list is the number of possibilities for 1<=n<=26. You can see as n increases number of possibilities increases tremendously. Say you have 1GHz processor that does 10^9 operations per second. Say consider number of possibilities for n=26 its 403291461126605635584000000L. Its evident that if you sit down to list all possibilities its so so long ( so so many years ) that
you will feel it has hit an infinite loop. Finally I have not looked that closely into your code , but in nutshell even if you write it correctly,iteratively and don't store (again can't have this much memory) and just print all possibilities its going to take long time for larger values of n.
EDIT
As jaromax and others said if you just want to write it for smaller values of n,
say less than 10-12 you can write an iterative program to list/print them. It will run quite fast for small values. But if you also want to store them them then n will have to be say less than 5 say. (Really depends on how much RAM is available or you could find some permutations write them to disk, then depends on how much disk memory you can spare, again refer the number of possibilities list I posted above. It gives a rough idea of both time and space complexity).

I think there could be quite a problem that you do this on stack. A large part of the calculation you do recursively and this means every time allocated space for function.
Try to reformulate it linearly. I think I had such a problem before.

Your question implies you think there are 26x25x24x ... permutations
Your code doesn't have anything I can see to avoid "AAAAAAA" being a permutation, in which case there are 26x26x26x ...
So in addition to being a very complicated way of counting in base 26, I think it's also giving bad answers?

Is adding 1 to a number repeatedly slower than adding everything at the same time in C++? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
Improve this question
If I have a number a, would it be slower to add 1 to it b times rather than simply adding a + b?
a += b;
or
for (int i = 0; i < b; i++) {
a += 1;
}
I realize that the second example seems kind of silly, but I have a situation where coding would actually be easier that way, and I am wondering if that would impact performance.
EDIT: Thank you for all your answers. It looks like some posters would like to know what situation I have. I am trying to write a function to shift an inputted character a certain number of characters over (ie. a cipher) if it is a letter. So, I want to say that one char += the number of shifts, but I also need to account for the jumps between the lowercase characters and uppercase characters on the ascii table, and also wrapping from z back to A. So, while it is doable in another way, I thought it would be easiest to keep adding one until I get to the end of a block of letter characters, then jump to the next one and keep going.

If your loop is really that simple, I don't see any reason why a compiler couldn't optimize it. I have no idea if any actually would, though. If your compiler doesn't the single addition will be much faster than the loop.

The language C++ does not describe how long either of those operations take. Compilers are free to turn your first statement into the second, and that is a legal way to compile it.
In practice, many compilers would treat those two subexpressions as the same expression, assuming everything is of type int. The second, however, would be fragile in that seemingly innocuous changes would cause massive performance degradation. Small changes in type that 'should not matter', extra statements nearby, etc.
It would be extremely rare for the first to be slower than the second, but if the type of a was such that += b was a much slower operation than calling += 1 a bunch of times, it could be. For example;
struct A {
std::vector<int> v;
void operator+=( int x ) {
// optimize for common case:
if (x==1 && v.size()==v.capacity()) v.reserve( v.size()*2 );
// grow the buffer:
for (int i = 0; i < x; ++i)
v.reserve( v.size()+1 );
v.resize( v.size()+1 );
}
}
};
then A a; int b = 100000; a+=b; would take much longer than the loop construct.
But I had to work at it.

The overhead (CPU instructions) on having a variable being incremented in a loop is likely to be insignificant compared to the total number of instructions in that loop (unless the only thing you are doing in the loop is incrementing). Loop variables are likely to remain in the low levels of the CPU cache (if not in CPU registries) and is very fast to increment as in doesn't need to read from the RAM via the FSB. Anyway, if in doubt just make a quick profile and you'll know if it makes sense to sacrifice code readability for speed.

Yes, absolutely slower. The second example is beyond silly. I highly doubt you have a situation where it would make sense to do it that way.
Lets say 'b' is 500,000... most computers can add that in a single operation, why do 500,000 operations (not including the loop overhead).

If the processor has an increment instruction, the compiler will usually translate the "add one" operation into an increment instruction.
Some processors may have an optimized increment instructions to help speed up things like loops. Other processors can combine an increment operation with a load or store instruction.
There is a possibility that a small loop containing only an increment instruction could be replaced by a multiply and add. The compiler is allowed to do so, if and only if the functionality is the same.
This kind of operation, generally produces negligible results. However, for large data sets and performance critical applications, this kind of operation may be necessary and the time gained would be significant.
Edit 1:
For adding values other than 1, the compiler would emit processor instructions to use the best addition operations.
The add operation is optimized in hardware as a different animal than incrementing. Arithmetic Logic Units (ALU) have been around for a long time. The basic addition operation is very optimized and a lot faster than incrementing in a loop.

2 while loops vs if else statement in 1 while loop

First a little introduction:
I'm a novice C++ programmer (I'm new to programming) writing a little multiplication tables practising program. The project started as a small program to teach myself the basics of programming and I keep adding new features as I learn more and more about programming. At first it just had basics like ask for input, loops and if-else statements. But now it uses vectors, read and writes to files, creates a directory etc.
You can see the code here: Project on Bitbucket
My program now is going to have 2 modes: practise a single multiplication table that the user can choose himself or practise all multiplication tables mixed. Now both modes work quite different internally. And I developed the mixed mode as a separate program, as would ease the development, I could just focus on writing the code itself instead of also bothering how I will integrate it in the existing code.
Below the code of the currently separate mixed mode program:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <time.h>
using namespace std;
using std::string;
int newquestion(vector<int> remaining_multiplication_tables, vector<int> multiplication_tables, int table_selecter){
cout << remaining_multiplication_tables[table_selecter] << " * " << multiplication_tables[remaining_multiplication_tables[table_selecter]-1]<< " =" << "\n";
return remaining_multiplication_tables[table_selecter] * multiplication_tables[remaining_multiplication_tables[table_selecter]-1];
}
int main(){
int usersanswer_int;
int cpu_answer;
int table_selecter;
string usersanswer;
vector<int> remaining_multiplication_tables = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
vector<int> multiplication_tables(10, 1);//fill vector with 10 elements that contain the value '1'. This vector will store the "progress" of each multiplication_table.
srand(time(0));
table_selecter = rand() % remaining_multiplication_tables.size();
cpu_answer = newquestion(remaining_multiplication_tables, multiplication_tables, table_selecter);
while(remaining_multiplication_tables.size() != 0){
getline(cin, usersanswer);
stringstream usersanswer_stream(usersanswer);
usersanswer_stream >> usersanswer_int;
if(usersanswer_int == cpu_answer){
cout << "Your answer is correct! :)" << "\n";
if(multiplication_tables[remaining_multiplication_tables[table_selecter]-1] == 10){
remaining_multiplication_tables.erase(remaining_multiplication_tables.begin() + table_selecter);
}
else{
multiplication_tables[remaining_multiplication_tables[table_selecter]-1] +=1;
}
if (remaining_multiplication_tables.size() != 0){
table_selecter = rand() % remaining_multiplication_tables.size();
cpu_answer = newquestion(remaining_multiplication_tables, multiplication_tables, table_selecter);
}
}
else{
cout << "Unfortunately your answer isn't correct! :(" << "\n";
}
}
return 0;
}
As you can see the newquestion function for the mixed mode is quite different. Also the while loop includes other mixed mode specific code.
Now if I want to integrate the mixed multiplication tables mode into the existing main program I have 2 choices:
-I can clutter up the while loop with if-else statements to check each time the loop runs whether mode == 10 (single multiplication table mode) or mode == 100 (mixed multiplication tables mode). And also place a if-else statement in the newquestion() function to check if mode == 10 or mode == 100
-I can let the program check on startup whether the user chose single multiplication table or mixed multiplication tables mode and create 2 while loops and 2 newquestion() functions. That would look like this:
int newquestion_mixed(){
//newquestion function for mixed mode
}
int newquestion_single(){
//newquestion function for single mode
}
//initialization
if mode == 10
//create necessary variables for single mode
while(){
//single mode loop
}
else{
//create necessary variables for mixed mode
while(){
//mixed mode loop
}
}
Now why would I bother creating 2 separate loops and functions? Well isn't it inefficient if the program checks each time the loop runs (each time the user is asked a new question, for example: '5 * 3 =') which mode the user chose? I'm worried about the performance with this option. Now I hear you think: but why would you bother about performance for such a simple, little non-performance critical application with the extremely powerful processors today and the huge amounts of RAM? Well, as I said earlier this program is mainly about teaching myself a good coding style and learning how to program etc. So I want to teach myself the good habits from the beginning.
The 2 while loops and functions option is much more efficient will use less CPU, but more space and includes duplicating code. I don't know if this is a good style either.
So basically I'm asking the experts what's the best style/way to handle this kind of things. Also if you spot something bad in my code/bad style please tell me, I'm very open to feedback because I'm still a novice. ;)

First, a fundamental rule of programming is that of "don't prematurely optimize the code" - that is, don't fiddle around with little details, before you have the code working correctly, and write code that expresses what you want done as clearly as possible. This is good coding style. To obsess over the details of "which is faster" (in a loop that spends most of it's time waiting for the user to input some number) is not good coding style.
Once it's working correcetly, analyse (using for example a profiler tool) where the code is spending it's time (assuming performance is a major factor in the first place). Once you have located the major "hotspot", then try to make that better in some way - how you go about that depends very much on what that particular hot-spot code does.
As to which performs best will highly depend on details the code and the compiler (and which compiler optimizations are chosen). It is quite likely that having an if inside a while-loop will run slower, but modern compilers are quite clever, and I have certainly seen cases where the compiler hoists such a choice out of the loop, in cases where the conditions don't change. Having two while-loops is harder for the compiler to "make better", because it most likely won't see that you are doing the same thing in both loops [because the compiler works from the bottom of the parse-tree up, and it will optimize the inside of the while-loop first, then go out to the if-else side, and at that point it's "lost track" of what's going on inside each loop].
Which is clearer, to have one while loop with an if inside, or an if with two while-loops, that's another good question.
Of course, the object oriented solution is to have two classes - one for mixed, another for single - and just run one loop, that calls the relevant (virtual) member function of the object created based on an if-else statement before the loop.

Modern CPU branch predictors are so good that if during the loop the condition never changes, it will probably be as fast as having two while loops in each branch.

How to optimize input/output in C++

I'm solving a problem which requires very fast input/output. More precisely, the input data file will be up to 15MB. Is there a fast, way to read/print integer values.
Note: I don't know if it helps, but the input file has the following form:
line 1: a number n
line 2..n+1: three numbers a,b,c;
line n+2: a number r
line n+3..n+4+r: four numbers a,b,c,d
Note 2: The input file will be stdin.
Edit: Something like the following isn't fast enough:
void fast_scan(int &n) {
char buffer[10];
gets(buffer);
n=atoi(buffer);
}
void fast_scan_three(int &a,int &b,int &c) {
char buffval[3][20],buffer[60];
gets(buffer);
int n=strlen(buffer);
int buffindex=0, curindex=0;
for(int i=0; i<n; ++i) {
if(!isdigit(buffer[i]) && !isspace(buffer[i]))break;
if(isspace(buffer[i])) {
buffindex++;
curindex=0;
} else {
buffval[buffindex][curindex++]=buffer[i];
}
}
a=atoi(buffval[0]);
b=atoi(buffval[1]);
c=atoi(buffval[2]);
}

General input/output optimization principle is to perform as less I/O operations as possible reading/writing as much data as possible.
So performance-aware solution typically looks like this:
Read all data from device into some buffer (using the principle mentioned above)
Process the data generating resulting data to some buffer (on place or another one)
Output results from buffer to device (using the principle mentioned above)
E.g. you could use std::basic_istream::read to input data by big chunks instead of doing it line by line. The similar idea with output - generate single string as result adding line feed symbols manually and output it at once.

If you want to minimize the physical I/O operation overhead, load the whole file into memory by a technique called memory mapped files. I doubt you'll get a noticable performance gain though. Parsing will most likely be a lot costlier.

Consider using threads. Threading is useful for lots of things, but this is exactly the kind of problem that motivated the invention of threads.
The underlying idea is to separate the input, processing, and output, so these different operations can run in parallel. Do it right and you will see a significant speedup.
Have one thread doing close to pure input. It reads lines into a buffer of lines. Have a second thread do a quick pre-parse and organize the raw input into blocks. You have two things that need to be parsed, the line that contain the number of lines that contain triples and the line that contains the number of lines that contain quads. This thread forms the raw input into blocks that are still mostly text. A third thread parses the triples and quads, re-forming the the input into fully parsed structures. Since the data are now organized into independent blocks, you can have multiple instances of this third operation so as to better take advantage of the multiple processors on your computer. Finally, other threads will operate on these fully-parsed structures. Note: It might be better to combine some of these operations, for example combining the input and pre-parsing operations into one thread.

Put several input lines in a buffer, split them, and then parse them simultaneously in different threads.

It's only 15MB. I would just slurp the whole thing into a memory buffer, then parse it.
The parsing looks something like this, approximately:
#define DIGIT(c)((c) >= '0' && (c) <= '9')
while(*p == ' ') p++;
if (DIGIT(*p)){
a = 0;
while(DIGIT(*p){
a *= 10; a += (*p++ - '0');
}
}
// and so on...
You should be able to write this kind of code in your sleep.
I don't know if that's any faster than atoi, but it's not fussing around figuring out where numbers begin and end. I would stay away from scanf because it goes through nine yards of figuring out its format string.
If you run this whole thing in a loop 1000 times and grab some stack samples, you should see that it is spending nearly 100% of its time reading in the file and generating output (which you didn't mention).
You just can't beat that.
If you do see that noticeable time is spent in the actual parsing, it might be possible to do overlapped I/O, but the machine would have to be really slow, or the I/O really fast (like from a solid state drive) before that would make sense.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js