Today I'm working on making my program faster. This program scans in 100,000 fake social security numbers, first names, last names, and GPAs. My professor has started talking about pointers, referencing, and dereferencing, and has said that using these can help speed a program up by passing addresses. My explanation probably sucks because I am not really understanding the topics in class, so while you guys help me out I will be reading chapter 9 in my book on call by value and call by reference. Any help would be appreciated. Thanks!!!
#include<iostream>
#include<fstream>
#include<cstdlib>
#include<ctime>
#include<string>
using namespace std;
struct nameType{
string ssno;
string fName;
string lName;
double gpa;
};
int load(istream &in,nameType []);
void shellsort(nameType [],int);
void exchange(nameType &, nameType &);
void print(ostream &out,nameType [],int);
int main(void)
{
ifstream in;
ofstream out;
char infile[40],outfile[40];
nameType name[100000];
clock_t start, stop;
double secl=0;
double secs=0;
double secp=0;
double total=0;
int n;
cout << "Please enter the input data file name(NO SPACES): ";
cin >> infile;
in.open(infile);
if(in.fail()) {
cerr<<"problem input file\n"<<endl;
exit(1);
}
cout << "Please enter the output data file name(NO SPACES): ";
cin >> outfile;
out.open(outfile);
if(out.fail()) {
cerr<<"problem output file\n"<<endl;
exit(1);
}
start = clock();
n = load(in,name);
stop = clock();
secl = (double)(stop - start)/CLOCKS_PER_SEC;
cout << "Load Time: " << secl << endl;
start = clock();
shellsort(name,n);
stop = clock();
secs = (double)(stop - start)/CLOCKS_PER_SEC;
cout << "Sort Time: " << secs << endl;
start = clock();
print(out,name,n);
stop = clock();
secp = (double)(stop - start)/CLOCKS_PER_SEC;
cout << "Print Time: " << secp << endl;
total = secl + secs + secp;
cout << "Total Time: " << total << endl;
in.close();
out.close();
return 0;
}
int load(istream &in,nameType name[])
{
int n=0;
in >> name[n].ssno >> name[n].fName >> name[n].lName >> name[n].gpa;
while(!in.eof()){
n++;
in >> name[n].ssno >> name[n].fName >> name[n].lName >> name[n].gpa;
}
return n;
}
void shellsort(nameType name[],int n)
{
int gap = n/2;
bool passOk;
while(gap>0){
passOk=true;
for(int i=0; i<n-gap; i++){
if(name[i].lName>name[i+gap].lName){
exchange(name[i],name[i+gap]);
passOk=false;
}
else if(name[i].lName == name[i+gap].lName && name[i].fName > name[i+gap].fName){
exchange(name[i],name[i+gap]);
passOk=false;
}
}
if(passOk){
gap/=2;
}
}
}
void exchange(nameType &a, nameType &b)
{
nameType temp;
temp = a;
a = b;
b = temp;
}
void print(ostream &out,nameType name[],int n)
{
for(int i=0;i<n;i++){
out << name[i].ssno << " " << name[i].fName << " " << name[i].lName << " " << name[i].gpa << endl;
}
out << endl;
}
Exact assignment details--- I'm on bullet number 5---
Efficiency - time and space
Time and space are always issues to consider when writing programs.
In this assignment, you will be modifying the sort_v4.cpp program created in the first programming assignment. In that assignment, you were required to process an array of integers. Let's update the program with several items:
Build a structure containing a social security number, first name, last name, and a GPA.
Use the new sort technique that uses the gap concept described in class to sort the information in ascending order based on last name (if the last names are the same, then the first names need to be checked). This sort technique is known as shell sort.
Improve the efficiency - time and space as described below.
Each struct element contained an ID, first name, last name, and gpa. Suppose you had to process 100,000 students. The program is inefficient for two reasons:
Space issue: if the social security number takes up 12 characters, the first and last names take up 20 characters each, and the gpa as a double takes up 8 bytes, then the main array of structure elements consumes (12+20+20+8)*100000 bytes of memory. This may be OK if we load up 100,000 names. But if the average number of students we process is <50,000, then there is a considerable amount of wasted memory.
Time issue: when you exchange two elements that are out of order, 60 bytes are moved around 3 times, for a total of 180 bytes moved in memory. Again, inefficient.
As the number of member variables in the struct increases, the problem gets worse.
The focus of this assignment is to
Read a file containing the social security number, first name, last name, and a gpa into an array of structure elements. Process until EOF. Each line contains information about one student. Inside the struct, the elements may be defined as char [] or strings. Do you think it will make a difference in performance if we use char [] vs strings? Make a prediction.
Dump the array into a file along with timing information.
Try out the new sort technique - sort structure elements based on two items - first and last name.
Time each function to see where the most time is being spent.
Attempt to improve the use of memory by using an array of pointers to the structs.
Attempt to improve the efficiency by making the exchange faster by swapping pointers instead of elements.
With the above list in mind, there will be three possible grades for the assignment. For a maximum grade of a C - 14/20 points, you must complete the first two bullets. For a maximum grade of a B - 16/20 points, you must complete the first 4 bullets. For an A, you must complete all bullets.
I would recommend to start the assignment by implementing what I call a non pointer version. Define a struct above the main function to hold the items. The main function should declare the array of struct elements. Call the load, sort, and print functions as we did in the first assignment making adjustments to accommodate the struct and the new sort technique where there are two items to consider before exchanging two elements. NOTE: I want to see an exchange function this time. This will give you a total of 4 functions. Make sure you follow the guidelines stated in the first programming assignment.
For all that are attempting the A program, make sure you have the basic program for a B completed and tested before continuing. Make a copy of that program and modify the program as follows:
Change the array of struct elements to an array of pointers to struct elements by placing a * in the definition.
Using new (or malloc), dynamically create space for one structure element just before you read in the values for the members of the struct.
When you exchange two elements, exchange two pointers to struct elements instead of the struct elements themselves.
Take a look at your times for the non-pointer version and the pointer version. Is there a significant difference?
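For reference, here is a minimal sketch of what those three changes could look like, reusing the struct and function names from the code above (this is only an illustration of the idea, not the assignment's required code):
#include <iostream>
#include <string>
using namespace std;

struct nameType {       // same struct as the non-pointer version
    string ssno, fName, lName;
    double gpa;
};

// An array of pointers to structs replaces the array of structs.
// Each record is created with new just before its values are read.
int load(istream &in, nameType *name[])
{
    int n = 0;
    name[n] = new nameType;                      // space for one record
    while (in >> name[n]->ssno >> name[n]->fName
              >> name[n]->lName >> name[n]->gpa) {
        n++;
        name[n] = new nameType;
    }
    delete name[n];                              // the last allocation never received data
    return n;
}

// Exchanging two pointers moves two addresses (a few bytes each)
// instead of two whole structs.
void exchange(nameType *&a, nameType *&b)
{
    nameType *temp = a;
    a = b;
    b = temp;
}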
Following is the information about timing one function.
#include<ctime>
//Create a couple variables of clock_t type.
clock_t start, stop;
start = clock();
load(x, n); //call a function to perform a task
stop = clock();
cout << "load time: " << (double)(stop - start)/CLOCKS_PER_SEC << endl;
The function clock() returns the number of cpu clock ticks that have occurred since the program started. Ideally, start in the above code should be 0. To be able to make some sense of how much time a function takes, you have to convert the elapsed time into seconds. This is accomplished by taking the difference between start and stop, typecasting, then dividing by the system-defined CLOCKS_PER_SEC. On linus public (or an Alienware machine in the lab) it is 1,000,000. Also see the class notes on the topic. The following is an example of what should appear at the end of the sorted data.
load time: 0.05
sort time: 2.36
print time: 0.01
Total Run time: 2.42
The penalty for missing the deadline is 1 pt per day, for a max of 7 days. Programs will not be accepted more than 7 days after the deadline.
If you look here: How are arrays passed?
Arrays are passed as pointers already, and your implementation of std::swap (which you called "exchange") is already passing by reference. So that is not an issue in your case. Has your professor complained about your program execution speed?
A place that takes a lot of execution time in a program is called a bottleneck.
A common bottleneck is file input and output. You can make your program faster by reducing or optimizing the bottleneck.
Reading many small pieces of data from a file takes more time than reading one large piece of data. For example, reading 10 lines of data with one request is more efficient than making 10 requests to read one line of data each.
Accessing memory is fast, faster than reading from a file.
The general pattern is:
1. Read a lot of data into memory.
2. Process the data.
3. Repeat at 1 as necessary.
So you could use the "block" read function, istream::read, to pull a large chunk of the file into an array of bytes. Load a string with a line of data from that array and process the line. Repeat loading from the array until you run out, then refill the array from the file.
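A minimal sketch of that idea, assuming a hypothetical input file name and an arbitrary buffer size (the line-parsing step is left as a comment):
#include <cstddef>
#include <fstream>
#include <vector>

int main()
{
    const std::size_t BUFSIZE = 1 << 20;                   // 1 MiB per request (arbitrary)
    std::vector<char> buffer(BUFSIZE);
    std::ifstream in("students.txt", std::ios::binary);    // hypothetical file name

    while (in) {
        in.read(buffer.data(), buffer.size());             // one large read instead of many small ones
        std::streamsize got = in.gcount();                  // bytes actually placed in the buffer
        // ... scan buffer[0..got) for newlines and parse each line from memory here ...
    }
    return 0;
}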
Related
I want to use this question to improve my general understanding of how computers work a bit, since I'll probably never have the chance to study it in a profound and deep manner. Sorry in advance if the question is silly and not useful in general, but I prefer to learn this way.
I am learning C++, and I found online some code that implements the Newton-Raphson method for finding the root of a function. The code is pretty simple: as you can see, at the beginning it asks for the required tolerance, and if I give a "decent" number it works fine. If instead, when it asks for the tolerance, I write something like 1e-600, the program breaks down immediately and the output is Enter starting value x: Failed to converge after 100 iterations.
The failed-convergence output should be a consequence of running the loop for more than 100 iterations, but this isn't the case here, since the loop doesn't even start. It looks like the program already knows it won't reach that level of tolerance.
Why does this happen? How can the program print that output even though it didn't run the loop 100 times?
Edit: It seems that anything meaningless (too-small numbers, words) that I write when it asks for the tolerance produces pnew = 0.25, and then the code runs 100 times and fails.
The code is the following:
#include <iostream>
#include <cmath>
using namespace std;
#define N 100 // Maximum number of iterations
int main() {
double p, pnew;
double f, dfdx;
double tol;
int i;
cout << "Enter tolerance: ";
cin >> tol;
cout << "Enter starting value x: ";
cin >> pnew;
// Main Loop
for(i=0; i < N; i++){
p = pnew;
//Evaluate the function and its derivative
f = 4*p - cos(p);
dfdx= 4 + sin(p);
// The Newton-Raphson step
pnew = p - f/dfdx;
// Check for convergence and quit if done
if(abs(p-pnew) < tol){
cout << "Root is " << pnew << " to within " << tol << "\n";
return 0;
}
}
// We reach this point only if the iteration failed to converge
cerr << "Failed to converge after " << N << " iterations.\n";
return 1;
}
1e-600 is not representable by most implementations of double. std::cin will fail to convert your input to a double and fall into a failed state. This means that, unless you clear the error state, every future std::cin extraction also automatically fails without waiting for user input.
From cppreference (since C++17):
If extraction fails, zero is written to value and failbit is set. If extraction results in the value too large or too small to fit in value, std::numeric_limits<T>::max() or std::numeric_limits<T>::min() is written and failbit flag is set.
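If you want the program to recover instead of letting every later read fail silently, you can test the extraction and reset the stream yourself; a small sketch (the prompts are just an example):
#include <iostream>
#include <limits>

int main()
{
    double tol;
    std::cout << "Enter tolerance: ";
    while (!(std::cin >> tol)) {                 // extraction failed (e.g. 1e-600 or text)
        std::cin.clear();                        // drop failbit so cin works again
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // discard the bad input
        std::cout << "Invalid tolerance, try again: ";
    }
    std::cout << "Got tol = " << tol << "\n";
    return 0;
}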
As mentioned, 1e-600 is not a valid double value. However, there's more to it than being outside of the range. What's likely happening is that 1 is scanned into tol, and then some portion of e-600 is being scanned into pnew, and that's why it ends immediately, instead of asking for input for pnew.
Like François said, you cannot exceed 2^64 when you work on a 64-bit machine (with a corresponding OS) and 2^32 on a 32-bit machine; you can use SSE, which gives you 4 x 32-bit values used for floating-point representation. In your program the convergence test fails at every iteration, so the "if" body is skipped and the function never returns before the loop ends.
For example, when printing a single character like a newline, which might be faster when using cout in C++: passing it as a string or as a character?
cout << "\n";
Or
cout << '\n';
This video motivated me to write efficient code.
How would you go about testing such things? I might want to test other things to see which is faster, so it would be helpful to know how I can test these things myself.
In theory, yes, using '\n' instead of "\n" is quite a bit faster; I took the elapsed time of printing 1000 occurrences of the same good ol' newline:
Remember: a single char can hardly be slower than a string, since the string is accessed through a pointer to a sequence of chars (which is why its size is not fixed), while a char is a single value of just 1 byte.
// Works with C++17 and above...
#include <iostream>
#include <chrono>
#include <functional>   // std::invoke
#include <utility>      // std::forward
template<typename T, typename Duration = std::chrono::milliseconds, typename ...Args>
constexpr static auto TimeElapsedOnOperation(T&& functor, Args&&... arguments)
{
auto const ms = std::chrono::steady_clock::now();
std::invoke(std::forward<decltype(functor)>(functor),
std::forward<Args>(arguments)...);
return std::chrono::duration_cast<std::chrono::
milliseconds>(std::chrono::steady_clock::now() - ms);
}
int main()
{
std::cout << TimeElapsedOnOperation([]
{
for (auto i = 0; i < 1000; i++)
std::cout << "\n";
}).count() << std::endl;
std::cin.get();
std::cout << TimeElapsedOnOperation([]
{
for (auto i = 0; i < 1000; i++)
std::cout << '\n';
}).count() << std::endl;
std::cin.get();
return 0;
}
It gave the following output (it can come out differently on each run):
<1000> newlines follow...
2195 milliseconds For the string "\n"
More <1000> newlines follow...
852 milliseconds For the character '\n'
2195 - 852 = 1343 milliseconds
It took 1343 milliseconds (1.343 seconds) longer... so switching to '\n' saved roughly 61.18% of the time (1343 / 2195 * 100) compared with "\n".
This is just an approximation, since the performance can differ on other machines...
As to why this happens:
A single character constant (1 byte) is much smaller than a string containing a single character: the string "\n" is an array of chars accessed through a const char* pointer, so besides the character itself it carries a terminating null and a pointer indirection, which takes up more space in memory than a single char.
There is also a difference in how a character and a string are handled: a character is used directly, while a string has to be iterated over, with the output operation performed for each individual character it points to.
A string is always a char array, while a char can safely be considered an integer holding the character's numerical value (extended ASCII, from which different character encodings branched). A string of 1 character is really an array of 2 characters (the character plus the terminating null), which is not the same thing as a single char...
So maybe (just maybe) you are on the better side of using '\n' instead...
However, a "tricky" compiler may optimize your code from "\n" to '\n' at any time... so we can never really be sure, but it is still considered good practice to write a char as a char...
Theoretical considerations only:
A single character can just be printed out as is; a string needs to be iterated over to find the terminating null character.
A single character can be passed and used directly as value; a string is passed by pointer, so the address must be resolved before the character(s) can be used.
So at least, the single character cannot be slower. However, a sufficiently clever compiler might spot the constant one-character string and optimise any difference away (especially if operator<< is inline).
How to test: First of all, you'd want a system that disturbs the test as little as possible (context switches between threads are expensive), so it's best to close any open applications.
A very simple test program might repeatedly use both operators sufficiently often, something like:
for(uint32_t loop = 0; loop < SomeLimit; ++loop)
{
// take timestamp in highest precision possible
for(uint32_t i = 0; i < Iterations; ++i)
{
// single character
}
// calculate difference to timestamp, add to sum for character
// take timestamp in highest precision possible
for(uint32_t i = 0; i < Iterations; ++i)
{
// string
}
// calculate difference to timestamp, add to sum for string
}
Interleaving character and string output might help to get a better average over the runtime if OS activity varies during the test; the inner loops should run sufficiently long to get a reasonable time interval for measurement.
The longer the program runs, the more precise the output will be. To prevent overflow, use uint64_t to collect the sums (your program would have to run more than 200000 days even with ns precision to overflow...).
I am trying to code a multicore Markov chain in C++, and while I am trying to take advantage of the many CPUs (up to 24) to run a different chain on each one, I have a problem picking the right container to gather the results of the numerical evaluations from each CPU. What I am trying to measure is basically the average value of an array of boolean variables. I have tried coding a wrapper around a `std::vector` object that looks like this:
struct densityStack {
vector<int> density; //will store the sum of boolean variables
int card; //will store the amount of elements we summed over for normalizing at the end
densityStack(int size){ //constructor taking as only parameter the size of the array, usually size = 30
density = vector<int> (size, 0);
card = 0;
}
void push_back(vector<int> & toBeAdded){ //method summing a new array (of measurements) to our stack
for(auto valStack = density.begin(), newVal = toBeAdded.begin(); valStack != density.end(); ++valStack, ++ newVal)
*valStack += *newVal;
card++;
}
void savef(const char * fname){ //method outputting into a file
ofstream out(fname);
out.precision(10);
out << card << "\n"; //saving the cardinal in first line
for(auto val = density.begin(); val != density.end(); ++val)
out << (double) *val/card << "\n";
out.close();
}
};
Then, in my code I use a single densityStack object and every time a CPU core has data (can be 100 times per second) it will call push_back to send the data back to densityStack.
My issue is that this seems to be slower than the first raw approach, where each core stored each array of measurements in a file and I then used a Python script to average and clean up (I was unhappy with that because it stored too much information and put too much useless stress on the hard drives).
Do you see where I could be losing a lot of performance? I mean, is there an obvious source of overhead? Because to me, copying back the vector even at a frequency of 1000 Hz should not be too much.
How are you synchronizing your shared densityStack instance?
From the limited info here my guess is that the CPUs are blocked waiting to write data every time they have a tiny chunk of data. If that is the issue, a simple technique to improve performance would be to reduce the number of writes. Keep a buffer of data for each CPU and write to the densityStack less frequently.
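For example, assuming the workers currently lock a shared mutex around every push_back (the question doesn't show the synchronization), each worker could accumulate into a local buffer and merge only occasionally; a sketch, where the merge() method on the shared stack is an assumed addition:
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical per-worker buffer: accumulate measurements locally and
// flush to the shared densityStack far less often.
struct localBuffer {
    std::vector<int> sum;   // element-wise running sum
    int card = 0;           // number of arrays summed so far

    explicit localBuffer(std::size_t size) : sum(size, 0) {}

    void add(const std::vector<int> &measurement) {
        for (std::size_t i = 0; i < sum.size(); ++i)
            sum[i] += measurement[i];
        ++card;
    }

    template <class Stack>
    void flush(Stack &shared, std::mutex &m) {
        std::lock_guard<std::mutex> lock(m);  // contend for the lock rarely
        shared.merge(sum, card);              // assumed: adds sum element-wise and card to card
        std::fill(sum.begin(), sum.end(), 0);
        card = 0;
    }
};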
My professor wants us to write a program without using arrays or vectors like this:
Write a program using functions that calculates and prints parking charges for each of the n customers who parked their cars in the garage.
Parking rates:
A parking garage charges a $5.00 minimum fee to park for up to five hours.
The garage charges an additional $0.50 per hour for each hour or part thereof in excess of five hours.
The maximum charge for any given 24-hour period is $10.00. Assume that no car parks for longer than 24 hours at a time.
You should enter the hours parked for each customer. Your program should print the results in a neat tabular format and should calculate and print the total of your receipts.
The program output should look like this:
car------Hours------Charge
1--------2.00--------$5.00
2--------5.00--------$5.00
3--------5.30--------$5.50
etc.
total: 3---12.30----$15.50
I only managed to get this far:
#include <iostream>
#include <conio.h>
#include <cmath>
#include <iomanip>
using namespace std;
double calculate(double);
int main()
{
double hours,charge;
int finish;
double sumhours;
sumhours=0;
finish=0;
charge=0;
int cars;
cars=0;
do
{
cout<<"Enter the number of hours the vehicle has been parked: "<<endl;
cin>>hours;
cars++;
sumhours+=hours;
finish=cin.get();
if(hours>24)
{
cout<<"enter a time below 24hrs."<<endl;
cars--;
sumhours=sumhours-hours;
}
}
while(finish!=EOF);
double total=calculate(hours);
cout<<total<<": "<<(cars-1)<<": "<<sumhours;
while(!_kbhit());
return 0;
}
double calculate(double time)
{
double fees;
if(time<=5)
return 5;
if(time>15)
return 10;
time=ceil(time);
fees=5+(.5*(time-5));
return fees;
}
Since this is homework, here is an algorithm:
1. Print header.
2. Clear running total variables.
3. While not end of file
3.1 read a record.
3.2 print record contents
3.3 add record field values to running total variables (Hint! Hint!)
3.4. end-while
4. print out running total variables.
You may have to do some additional calculations with the running total variables, especially for averages.
Edit 1: Example of a running total variable
int sum = 0; // This is the running total variable.
const unsigned int QUANTITY = 23;
for (unsigned int i = 0; i < QUANTITY; ++i)
{
cout << "Adding " << i << " to sum.\n";
sum += i;
}
cout << "Sum is: " << sum << "\n";
cout.flush();
In this example, the data 'i' is not stored, only used. The sum variable is a running total.
Look for similarities in your assignment.
Edit 2: Example of detecting end of input on cin
char reply = 'n';
while (tolower(reply) != 'y')
{
cout << "Do you want to quit? (y/n)";
cout.flush();
cin >> reply;
cin.ignore(1000, '\n'); // Eat up newline.
}
cout << "Thanks for the answer.\n";
cout.flush();
Since you can't use arrays or vectors, I think you should print the parking data for each car as it's being processed. Pseudocode:
While more cars:
Read data for next car
Calculate cost
Print data
Add to running totals
End while
Print totals
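A sketch of that pseudocode in C++ (the rate rules follow the assignment; the column formatting and the rounding of partial hours are my own choices):
#include <cmath>
#include <iomanip>
#include <iostream>

// $5.00 minimum for up to five hours, $0.50 per started hour beyond
// that, capped at $10.00 for any 24-hour stay.
double charge(double hours)
{
    if (hours <= 5.0) return 5.0;
    double fee = 5.0 + 0.5 * std::ceil(hours - 5.0);
    return fee > 10.0 ? 10.0 : fee;
}

int main()
{
    double hours, totalHours = 0.0, totalCharge = 0.0;
    int cars = 0;

    std::cout << std::fixed << std::setprecision(2);
    std::cout << "Car\tHours\tCharge\n";
    while (std::cin >> hours) {                  // stops on EOF or non-numeric input
        double c = charge(hours);
        ++cars;
        totalHours  += hours;
        totalCharge += c;
        std::cout << cars << '\t' << hours << "\t$" << c << '\n';
    }
    std::cout << "Total: " << cars << '\t' << totalHours << "\t$" << totalCharge << '\n';
    return 0;
}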
On every iteration, generate the relevant output, but don't stream it to std::cout. Instead, stream it to a std::stringstream object. Then, at the end, stream that object to std::cout. The maths can be done simply by maintaining a running accumulation of the input values.
This, of course, assumes that using a std::stringstream is not considered "cheating" in the context of this homework exercise.
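A small sketch of that approach (the charge calculation is omitted here to keep the focus on the buffering):
#include <iostream>
#include <sstream>

int main()
{
    std::ostringstream report;                   // per-car rows are collected here
    double hours, totalHours = 0.0;
    int cars = 0;

    while (std::cin >> hours) {
        ++cars;
        totalHours += hours;
        report << cars << '\t' << hours << '\n'; // nothing is stored except the text
    }
    std::cout << report.str();                   // emit the whole table at the end
    std::cout << "Total: " << cars << '\t' << totalHours << '\n';
    return 0;
}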
You can try storing your values in a linked list structure instead of an array. Linked lists work great for dynamic storage.
Try this tutorial, http://www.cprogramming.com/tutorial/lesson15.html
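For reference, a node for such a list could be as small as this (insertion and traversal are covered in the tutorial):
// Singly linked list node holding one parking record.
struct CarNode {
    double hours;
    double charge;
    CarNode *next;   // nullptr marks the end of the list
};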
My suggestion, then, is to use a recursive method: the method first accepts input, then asks if there is any more input. If there is more input, it calls itself. If there is no more input, it outputs its current car and then returns the sum that has been accumulated so far in a structure.
The only problem with this method is that it would output entered cars in reverse of input, but it would do so without an array or a file to save to.
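A rough sketch of that recursive idea (prompting is simplified to reading from standard input, and, as noted, the per-car lines come out in reverse order of entry):
#include <algorithm>
#include <cmath>
#include <iostream>

// Each call handles one car: it reads the hours, recurses while there is
// more input, prints its own car on the way back out, and returns the
// total charge accumulated so far.
double processCars(int carNumber)
{
    double hours;
    if (!(std::cin >> hours))                    // no more input: start unwinding
        return 0.0;

    double charge = hours <= 5.0 ? 5.0
                  : std::min(10.0, 5.0 + 0.5 * std::ceil(hours - 5.0));

    double totalOfRest = processCars(carNumber + 1);   // recurse first...
    std::cout << "car " << carNumber << ": " << hours
              << " hours, $" << charge << "\n";        // ...then print (hence the reversed order)
    return charge + totalOfRest;
}

int main()
{
    double total = processCars(1);
    std::cout << "total: $" << total << "\n";
    return 0;
}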
So I am in a basic high school coding class. We had to think up one of our semester projects. I chose to base mine on ideas and applications that aren't used in traditional code. This brought up the idea of using CUDA. One of the best ways I know to compare the speed of traditional methods versus unconventional ones is string generation and comparison. One could demonstrate the generation and matching speed of traditional CPU generation with timers and output, and then show the increase (or decrease) in speed and output of GPU processing.
I wrote this C++ code to generate random characters that are put into a character array, and then match that array to a predetermined string. However, like most CPU programming, it is incredibly slow compared to GPU programming. I've looked over the CUDA API and could not find anything that would lead me in the right direction for what I'm trying to do.
Below is the code I have written in C++. If anyone could point me in the direction of such things as a random number generator that I can convert to chars using ASCII codes, that would be excellent.
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;
int sLength = 0;
int count = 0;
int stop = 0;
int maxValue = 0;
string inString = "aB1#";
static const char alphanum[] =
"0123456789"
"!##$%^&*"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz";
int stringLength = sizeof(alphanum) - 1;
char genRandom()
{
return alphanum[rand() % stringLength];
}
int main()
{
cout << "Length of string to match?" << endl;
cin >> sLength;
string sMatch(sLength, ' ');
while(true)
{
for (int x = 0; x < sLength; x++)
{
sMatch[x] = genRandom();
//cout << sMatch[x];
count++;
if (count == 2147000000)
{
count = 0;
maxValue++;
}
}
if (sMatch == inString)
{
cout << "It took " << count + (maxValue*2147000000) << " randomly generated characters to match the strings." << endl;
cin >> stop;
}
//cout << endl;
}
}
If you want to implement a pseudorandom number generator using CUDA, have a look over here. If you want to generate chars from a predetermined set of characters, you can just put all possible chars into that array and create a random index (just as you are doing it right now).
But I think a more valuable comparison might be one that uses brute force. Therefore, you could adapt your program to try not random strings, but one string after another in some meaningful order.
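For the CPU side, "one string after another in a meaningful order" can be a simple odometer over the character set; a sketch reusing the alphanum set and target string from the question:
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

// Enumerate candidate strings in order (like an odometer) instead of
// drawing them at random, stopping when the target is reached.
int main()
{
    static const char alphanum[] =
        "0123456789"
        "!##$%^&*"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz";
    const int base = static_cast<int>(std::strlen(alphanum));

    const std::string target = "aB1#";
    const std::size_t len = target.size();

    std::string candidate(len, alphanum[0]);
    std::vector<int> digit(len, 0);              // index into alphanum for each position
    unsigned long long tried = 0;

    while (true) {
        ++tried;
        if (candidate == target) {
            std::cout << "Matched after " << tried << " candidates\n";
            return 0;
        }
        // Advance the "odometer": rightmost position first, carrying leftwards.
        std::size_t pos = len;
        while (pos > 0) {
            --pos;
            if (++digit[pos] < base) {
                candidate[pos] = alphanum[digit[pos]];
                break;
            }
            digit[pos] = 0;
            candidate[pos] = alphanum[0];
            if (pos == 0)
                return 1;                        // every combination tried, no match
        }
    }
}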
Then, on the other hand, you could implement the brute-force stuff on the GPU using CUDA. This can be tricky since you might want to stop all CUDA threads as soon as one of them finds a solution. I could imagine the brute force process using CUDA the following way: One thread tries aa as first two letters and brute-forces all following digits, the next thread tries ab as first two letters and brute-forces all following digits, the next thread tries ac as first two letters and brute-forces all following digits, and so on. All these threads run in parallel. Of course, you could vary the number of predetermined chars such that e.g. the first thread tries aaaa, the second aaab. Then, you could compare different input values.
Anyway, if you have never dealt with CUDA, I recommend the vector addition sample, a very basic CUDA example that serves very well for getting a basic understanding of what's going on with CUDA. Moreover, you should read the CUDA programming guide to make yourself familiar with CUDA's concept of a grid of thread blocks containing a grid of threads. Once you understand this, I think it becomes clearer how CUDA organizes things. In short, in CUDA you should replace loops with a kernel that is executed multiple times at once.
First off, I am not sure what your actual question is. Do you need a faster random number generator, or one with a greater period? In that case I would recommend boost::random; the "Mersenne Twister" is generally considered state of the art. It is a little hard to get started with, but Boost is a great library, so it's worth the effort.
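If pulling Boost into a class project is too much, the standard <random> header (C++11) ships the same Mersenne Twister; here is a sketch of a drop-in replacement for the question's genRandom(), swapping the suggested boost::random for std::mt19937:
#include <random>

// Mersenne Twister based character generator: seeded once from
// std::random_device, then uniform over the question's character set.
char genRandom()
{
    static const char alphanum[] =
        "0123456789"
        "!##$%^&*"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz";
    static std::mt19937 engine{std::random_device{}()};
    static std::uniform_int_distribution<int> pick(0, static_cast<int>(sizeof(alphanum)) - 2);
    return alphanum[pick(engine)];
}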
I think the method you are using should be fairly efficient. Be aware that it could take up to (#characters)^(length of string) draws to get to the target string (here 70^4 = 24,010,000). The GPU should be at an advantage here, since this process is a Monte Carlo simulation and trivially parallelizable.
Have you compiled the code with optimizations?