I have a process that gets killed immediately after the program is executed. Below is the code of the compiled executable: a small program that reads several graphs, represented as numbers, from standard input (usually a description file) and finds the minimum spanning tree of each graph using Prim's algorithm (it does not show the results yet; it just finds the solution).
#include <cstdlib>
#include <iostream>
using namespace std;

const int MAX_NODOS = 20000;
const int infinito = 10000;

int nnodos;
int nAristas;
int G[MAX_NODOS][MAX_NODOS];
int solucion[MAX_NODOS][MAX_NODOS];
int menorCoste[MAX_NODOS];
int masCercano[MAX_NODOS];

void leeGrafo(){
    if (nnodos < 0 || nnodos > MAX_NODOS) {
        cerr << "Number of nodes (" << nnodos << ") is not valid\n";
        exit(1);
    }
    for (int i = 0; i < nnodos; i++)
        for (int j = 0; j < nnodos; j++)
            G[i][j] = infinito;
    int A, B, P;
    for (int i = 0; i < nAristas; i++){
        cin >> A >> B >> P;
        G[A][B] = P;
        G[B][A] = P;
    }
}

void prepararEstructuras(){
    // output graph
    for (int i = 0; i < nnodos; i++)
        for (int j = 0; j < nnodos; j++)
            solucion[i][j] = infinito;
    for (int i = 1; i < nnodos; i++){
        masCercano[i] = 0;       // nearest node already in the tree
        menorCoste[i] = G[0][i]; // lowest cost of joining the tree
    }
}

void prim(){
    prepararEstructuras();
    int min, k;
    for (int i = 1; i < nnodos; i++){
        // pick the unvisited node with the lowest cost
        min = menorCoste[1];
        k = 1;
        for (int j = 2; j < nnodos; j++){
            if (menorCoste[j] < min){
                min = menorCoste[j];
                k = j;
            }
        }
        solucion[k][masCercano[k]] = G[k][masCercano[k]];
        menorCoste[k] = infinito;
        // update the costs of the remaining nodes
        for (int j = 1; j < nnodos; j++){
            if (G[k][j] < menorCoste[j] && menorCoste[j] != infinito){
                menorCoste[j] = G[k][j];
                masCercano[j] = k;
            }
        }
    }
}

void output(){
    for (int i = 0; i < nnodos; i++){
        for (int j = 0; j < nnodos; j++)
            cout << G[i][j] << ' ';
        cout << endl;
    }
}

int main(){
    while (true){
        cin >> nnodos;
        cin >> nAristas;
        if (nnodos == 0 && nAristas == 0) break;
        leeGrafo();
        output();
        prim();
    }
}
I have learned that I must use strace to find out what is going on, and this is what I get:
execve("./412", ["./412"], [/* 38 vars */] <unfinished ...>
+++ killed by SIGKILL +++
Killed
I am running Ubuntu and this is the first time I have gotten this type of error. The program is supposed to stop after reading two zeros in a row from the input, which I can guarantee are present in my graph description file. The problem also happens even if I execute the program without redirecting input from my graph file.
Although I'm not 100% sure that this is the problem, take a look at the sizes of your global arrays:
const int MAX_NODOS = 20000;
int G[MAX_NODOS][MAX_NODOS];
int solucion[MAX_NODOS][MAX_NODOS];
Assuming int is 4 bytes, you'll need:
20000 * 20000 * 4 bytes * 2 = ~3.2 GB
For one, you might not even have that much memory. Secondly, if you're on 32-bit, it's likely that the OS will not allow a single process to have that much memory at all.
Assuming you're on 64-bit (and assuming you have enough memory), the solution would be to allocate it all at run-time.
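For instance, a minimal sketch using std::vector, keeping the question's variable names (validation omitted); the solucion matrix would follow the same pattern:

#include <iostream>
#include <vector>
using namespace std;

const int infinito = 10000;

int main() {
    int nnodos, nAristas;
    cin >> nnodos >> nAristas;
    // The matrix lives on the heap and is sized to the actual graph,
    // so a 20000 x 20000 array is never created unless the input asks for it.
    vector<vector<int>> G(nnodos, vector<int>(nnodos, infinito));
    int A, B, P;
    for (int i = 0; i < nAristas; i++) {
        cin >> A >> B >> P;
        G[A][B] = G[B][A] = P;
    }
    cout << "allocated a " << nnodos << " x " << nnodos << " matrix\n";
}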
Your arrays G and solucion each contain 400,000,000 integers, which is about 1.6 GiB each on most machines. Unless you have enough (virtual) memory for that (3.2 GiB and counting), and permission to use it (try ulimit -d; that's correct for bash on Mac OS X 10.7.2), your process will fail to start and will be killed by SIGKILL (which cannot be trapped; not that the process ever really gets going).
I have a program that reads data from a file (a matrix of 400,000 × 3 elements) and writes it into a two-dimensional array. However, this all takes a long time (6 seconds). Under the conditions of the tests, it must take no more than 2 seconds.
#include <cstdlib>
#include <fstream>
#include <iostream>
using namespace std;

int main()
{
    ifstream file_for_reading("C:\\Tests\\20");
    int edge, number_of_vertexes;
    file_for_reading >> number_of_vertexes >> edge;
    if (number_of_vertexes < 1 || number_of_vertexes > 30000 || edge < 0 || edge > 400000) {
        cout << "Correct your values";
        exit(1);
    }
    short** matrix = new short*[edge];
    for (int i = 0; i < edge; i++)
        matrix[i] = new short[3];
    for (int i = 0; i < edge; i++)
        file_for_reading >> matrix[i][0] >> matrix[i][1] >> matrix[i][2];
    file_for_reading.close();
    //Dijkstra(matrix, number_of_vertexes);
}
S.M.'s advice is promising - just short* matrix = new short [edge * 3]; then for (int i = 0; i < edge * 3; i++) file_for_reading >> matrix[i]; to read the file. Crucially, this puts all the file content into contiguous memory, which is more CPU cache friendly.
Using the following code I generated test input and measured the performance of your original approach and the contiguous-memory approach:
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <string>

using namespace std::literals;

#define ASSERT(X) \
    do { \
        if (X) break; \
        std::cerr << ':' << __LINE__ << " ASSERT " << #X << '\n'; \
        exit(1); \
    } while (false)

int main(int argc, const char* argv[])
{
    // uncomment to generate the test input:
    // for (int i = 0; i < 400000 * 3; ++i)
    //     std::cout << rand() % 32768 << ' ';

    std::ifstream in{"fasterread.in"};
    ASSERT(in);

    if (argc == 2 && argv[1] == "orig"s) {   // the original two-dimensional approach
        short** m = new short*[400000];
        for (int i = 0; i < 400000; ++i)
            m[i] = new short[3];
        for (int i = 0; i < 400000; ++i)
            in >> m[i][0] >> m[i][1] >> m[i][2];
    }
    if (argc == 2 && argv[1] == "contig"s) { // the contiguous single allocation
        short* m = new short[400000 * 3];
        for (int i = 0; i < 400000 * 3; ++i)
            in >> m[i];
    }
}
I then compiled them with optimisations using GCC on Linux:
g++ -O2 -Wall -std=c++20 fasterread.cc -o fasterread
And ran them with the time utility to show elapsed time:
time ./fasterread orig; time ./fasterread contig
Over a dozen runs of each, the fastest run of the orig version completed in 0.063 seconds (I have a fast SSD), whilst contig took as little as 0.058 seconds. Still not fast enough to meet your 3-fold reduction target.
That said, C++ ifstream supports locale translation whilst parsing numbers - via a slowish virtual-dispatch mechanism - so it may be slower than other text-to-number parsing that you could use or write.
But when you're 100x slower than me, it's obviously your old HDD that sucks, and not the software parsing the numbers.
FWIW, I tried C-style I/O using fscanf and it proved slower for me - 0.077s.
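If stream extraction itself is the bottleneck, one locale-free alternative is std::from_chars from C++17: slurp the file into a buffer, then parse in place. A minimal sketch, reusing the fasterread.in test file from above:

#include <charconv>   // std::from_chars
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    // Read the whole file into memory, then parse with no locale machinery.
    std::ifstream in{"fasterread.in"};
    std::string text{std::istreambuf_iterator<char>{in},
                     std::istreambuf_iterator<char>{}};

    std::vector<short> m;
    m.reserve(400000 * 3);

    const char* p = text.data();
    const char* end = p + text.size();
    while (p < end) {
        while (p < end && (*p == ' ' || *p == '\n')) ++p; // skip separators
        short value;
        auto [next, ec] = std::from_chars(p, end, value);
        if (ec != std::errc{}) break;  // stop at EOF or unparseable text
        m.push_back(value);
        p = next;
    }
}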
Here's an optimized version for you:
int i = 0;
for (; i + 1 < edge; i += 2)
{
    /* register */ int a, b, c, d, e, f;
    file_for_reading >> a >> b >> c >> d >> e >> f;
    matrix[i][0] = a;
    matrix[i][1] = b;
    matrix[i][2] = c;
    matrix[i + 1][0] = d;
    matrix[i + 1][1] = e;
    matrix[i + 1][2] = f;
}
for (; i < edge; ++i)
{
    /* register */ int a, b, c;
    file_for_reading >> a >> b >> c;
    matrix[i][0] = a;
    matrix[i][1] = b;
    matrix[i][2] = c;
}
Here are the optimization principles I'm trying to achieve in the above example:
Keep the file streaming: read more data per transaction.
Group the matrix assignments together, separate from the input.
This allows the compiler and processor to optimize. The processor can reduce memory fetches to take advantage of prefetching.
Hopefully, the compiler can use registers for the local variables. Register access is faster than memory access.
By grouping the assignments, maybe the compiler can use some advanced processor instructions.
Loop unrolling. The loop overhead (comparison and increment) is performed less often.
The best idea is to set your compiler for highest optimization and create a release build. Also have your compiler print the assembly language for the functions. The compiler may already perform some of the above optimizations. IMHO, it never hurts to make your code easier for the compiler to optimize. :-)
Edit 1:
I'm also hoping that the matrix assignment can overlap with reading in the "next" group of variables. That would be a great optimization. I'm open to people suggesting edits to this answer showing how to do that (without using threads).
My program opens a file which contains 100,000 numbers and parses them out into a 10,000 x 10 array correlating to 10,000 sets of 10 physical parameters. The program then iterates through each row of the array, performing overlap calculations between that row and every other row in the array.
The process is quite simple, and being new to C++, I programmed it in the most straightforward way that I could think of. However, I know that I'm not doing this in the most optimal way possible, which is something that I would love to fix, as the program is going to face off against my cohort's identical program, coded in Fortran, in a "race".
I have a feeling that I am going to need to implement multithreading to accomplish my goal of speeding up the program, but not only am I new to C++, I am new to multithreading, so I'm not sure how I should go about creating new threads in a beneficial way, or if it is even something that would give me that much "gain on investment" so to speak.
The program has the potential to be run on a machine with over 50 cores, but because the program is so simple, I'm not convinced that more threads is necessarily better. I think that if I implement two threads to compute the complex parameters of the two gaussians, one thread to compute the overlap between the gaussians, and one thread that is dedicated to writing to the file, I could speed up the program significantly, but I could also be wrong.
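For concreteness, here is a rough sketch of the split I have in mind, with the overlap computation replaced by a stand-in stub to keep the example self-contained. I don't know yet whether this is the right approach:

#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Stand-in for the real compute_params/compute_overlap pipeline.
double overlap_stub(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0;
    for (std::size_t k = 0; k < a.size(); ++k) s += a[k] * b[k];
    return s;
}

int main() {
    const int N = 1000; // 10000 in the real program
    std::vector<std::vector<double>> g(N, std::vector<double>(10, 1.0));

    unsigned nthreads = std::thread::hardware_concurrency();
    if (nthreads == 0) nthreads = 4;

    std::vector<std::string> chunks(nthreads);
    std::vector<std::thread> workers;

    // Each worker takes a contiguous block of rows and formats its results
    // into a private buffer, so no locking is needed anywhere.
    for (unsigned t = 0; t < nthreads; ++t) {
        int begin = static_cast<int>(static_cast<long long>(t) * N / nthreads);
        int end   = static_cast<int>(static_cast<long long>(t + 1) * N / nthreads);
        workers.emplace_back([&, begin, end, t] {
            std::ostringstream ss;
            ss.precision(15);
            for (int i = begin; i < end; ++i)
                for (int j = 0; j < N; ++j)
                    ss << overlap_stub(g[i], g[j]) << (j < N - 1 ? " " : "\n");
            chunks[t] = ss.str();
        });
    }
    for (auto& w : workers) w.join();

    // Buffers are written back in row order, matching the serial output.
    std::ofstream out("OverlapMatrix", std::ios::trunc);
    for (const auto& c : chunks) out << c;
}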
CODE:
cout << "Working...\n";
double **gaussian_array;
gaussian_array = (double **)malloc(N*sizeof(double *));
for(int i = 0; i < N; i++){
gaussian_array[i] = (double *)malloc(10*sizeof(double));
}
fstream gaussians;
gaussians.open("GaussParams", ios::in);
if (!gaussians){
cout << "File not found.";
}
else {
//generate the array of gaussians -> [10000][10]
int i = 0;
while(i < N) {
char ch;
string strNums;
string Num;
string strtab[10];
int j = 0;
getline(gaussians, strNums);
stringstream gaussian(strNums);
while(gaussian >> ch) {
if(ch != ',') {
Num += ch;
strtab[j] = Num;
}
else {
Num = "";
j += 1;
}
}
for(int c = 0; c < 10; c++) {
stringstream dbl(strtab[c]);
dbl >> gaussian_array[i][c];
}
i += 1;
}
}
gaussians.close();
//Below is the process to generate the overlap file between all gaussians:
string buffer;
ofstream overlaps;
overlaps.open("OverlapMatrix", ios::trunc);
overlaps.precision(15);
for(int i = 0; i < N; i++) {
for(int j = 0 ; j < N; j++){
double r1[6][2];
double r2[6][2];
double ol[2];
//compute complex parameters from the two gaussians
compute_params(gaussian_array[i], r1);
compute_params(gaussian_array[j], r2);
//compute overlap between the gaussians using the complex parameters
compute_overlap(r1, r2, ol);
//write to file
overlaps << ol[0] << "," << ol[1];
if(j < N - 1)
overlaps << " ";
else
overlaps << "\n";
}
}
overlaps.close();
return 0;
Any suggestions are greatly appreciated. Thanks!
I am trying to run the C++ code below and I get the following error. Could anyone please help me clarify why this happens?
Input: input/text_4.txt 9
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
After reading a few similar threads, the usual advice is to check dynamic memory allocation. However, my code does not have any dynamically allocated memory.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/stat.h>
using namespace std;

vector<string> arrangefile(vector<string>& scale, int width, int &number) {
    int beginning = 0;
    int total = 0;
    vector<string> result;
    for(int i = 0; i < scale.size(); i++)
    {
        total += scale[i].size();          // add length of each word
        if(total + i - beginning > width)  // checking if the value has exceeded the maximum width
        {
            total -= scale[i].size();
            string sentence = "", low = "";
            int last = i - 1;
            int space = width - total;     // calculate number of spaces in each line
            int check = max(last - beginning, 1);
            int even = space / check;
            while(even--){
                low += " ";
            }
            int mod = space % check;
            for(int j = beginning; j <= last; j++)
            {
                sentence += scale[j];      // find all values in a sentence
                if(j < last || beginning == last)
                    sentence += low;       // add the word low to the larger sentence
                if(j - beginning < mod)
                    sentence += " ";
            }
            result.push_back(sentence);    // add the sentence to the vector
            number++;                      // counts the number of sentences
            beginning = i;
            total = scale[i].size();
        }
    }
    string sentence = "";                  // for the last line
    int last = scale.size() - 1;
    int check = last - beginning;
    int space = width - total - check;
    string low = "";
    while(space--){
        low += " ";
    }
    for(int j = beginning; j <= last; j++)
    {
        sentence += scale[j];
        if(j < last){
            sentence += " ";
        }
    }
    sentence += low;
    result.push_back(sentence);            // add the sentence to the vector
    number++;                              // counts the number of sentences
    return result;
}

int main(){
    string filepath, word;
    int M, number = 0;
    cin >> filepath;
    cin >> M;
    ifstream fin;
    fin.open(filepath.c_str());
    unsigned found = filepath.find_last_of("/");
    string b = filepath.substr(found + 1);
    int create = b.size();
    string between = b.substr(0, create - 4);
    string final = between + "_formatted.txt";
    string ending = "output/" + final;
    mkdir("output", 0777);
    ofstream fout;
    fout.open(ending);
    for(int i = 0, count = 0; i < M; i++, count++){
        if(count == 9){
            fout << count;
            count = -1;
        }
        else
            fout << count;
    }
    fout << endl;
    vector<string> first;
    vector<string> second;
    while(fin >> word){
        first.push_back(word);
    }
    if(first.empty()){
        cout << "0 formatted lines written to " << ending << endl;
    }
    else{
        second = arrangefile(first, M, number);
        for (auto i = second.begin(); i != second.end(); ++i)
            fout << *i << endl;
        cout << number << " formatted lines written to " << ending << endl;
    }
    fin.close();
    fout.close();
    return 0;
}
input file text_4.txt:
This is because not very many happy things happened
in the lives of the three Baudelaire youngsters.
Input: input/text_4.txt 8
When I run your code, on the i==16 iteration of the outer loop in arrangefile, we get width==8 and total==10, with check==1. As a result, even is initialized to -2, and so the while(even--) loop is (nearly) infinite. So it attempts to add spaces to low until it runs out of memory.
(Note that the memory used by std::string is dynamically allocated, so your code does have dynamic memory allocation. The same for std::vector.)
I haven't analyzed your algorithm closely enough to figure out the correct fix, but it's possible your loop should be while(even-- > 0) instead.
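To see the failure mode in isolation, here is a small self-contained demo of the while(even--) pattern, with a guard added so it actually terminates:

#include <iostream>

int main() {
    int even = -2;     // what space/check produces when total exceeds width
    long long iterations = 0;
    while (even-- && iterations < 10)  // guard added so the demo terminates
        ++iterations;
    std::cout << iterations << " iterations; even is now " << even << '\n';
    // Without the guard: every negative value of even is truthy, so the loop
    // keeps appending spaces until memory runs out (std::bad_alloc), which is
    // exactly the crash in the question. With the fix, while (even-- > 0)
    // exits immediately whenever even <= 0.
}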
I'll second the tip in the comments to use your debugger, and I'll repost the link: What is a debugger and how can it help me diagnose problems?. That's how I found this bug.
I ran the program under the debugger gdb. It ran for a few seconds, at which point I got suspicious because the program doesn't appear to do anything complicated enough to take that much computation time. So I interrupted the program (Ctrl-C), which let me see where it was and what it was doing. I could see that it was within the while(even--) loop. That was also suspicious because that loop should complete very fast. So I inspected the value of even (with the command p even) and saw that it was a large negative number. That could only happen if it had started as a negative number, which logically could only happen if total were greater than width. Inspecting their values, I could see that this was indeed the case.
Maybe this will be helpful as you learn more about using your debugger.
I've made a program which processes a lot of data, and it takes forever at runtime, but looking in Task Manager I found out that the executable only uses a small part of my CPU and my RAM.
How can I tell my IDE to allocate more resources (as much as it can) to my program?
Running it in Release x64 helps, but not enough.
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    using namespace std;

    struct library {
        int num = 0;
        unsigned int total = 0;
        int booksnum = 0;
        int signup = 0;
        int ship = 0;
        vector<int> scores;
    };

    unsigned int libraries = 30000; // in the program this number is read from a file
    unsigned int books = 20000;     // in the program this number is read from a file
    unsigned int days = 40000;      // in the program this number is read from a file

    vector<int> scores(books, 0);
    vector<library*> all(libraries);
    for(auto& it : all) {
        it = new library;
        it->booksnum = 15000; // in the program this number is read from a file
        it->signup = 50000;   // in the program this number is read from a file
        it->ship = 99999;     // in the program this number is read from a file
        it->scores.resize(it->booksnum, 0);
    }

    unsigned int past = 0;
    for(size_t done = 0; done < all.size(); done++) {
        if(!(done % 1000)) cout << done << '-' << all.size() << endl;
        for(size_t m = done; m < all.size() - 1; m++) {
            all[m]->total = 0;
            {
                double run = past + all[m]->signup;
                for(auto at : all[m]->scores) {
                    if(days - run > 0) {
                        all[m]->total += scores[at];
                        run += 1. / all[m]->ship;
                    } else
                        break;
                }
            }
        }
        for(size_t n = done; n < all.size(); n++)
            for(size_t m = 0; m < all.size() - 1; m++) {
                if(all[m]->total < all[m + 1]->total) swap(all[m], all[m + 1]);
            }
        past += all[done]->signup;
        if (past > days) break;
    }
    return 0;
}
This is the loop which takes up so much time. For some reason even using pointers to library doesn't speed it up.
RAM doesn't make things go faster. RAM is just there to store data your program uses; if it's not using much then it doesn't need much.
Similarly, in terms of CPU usage, the program will use everything it can (the operating system can change priority, and there are APIs for that, but this is probably not your issue).
If you're seeing it use only a small fraction of the CPU, chances are you're either waiting on I/O or you've written a single-threaded application that can only use a single core at any one time. If you've optimised your solution as much as possible on a single thread, then it's worth looking into breaking its work down across multiple threads.
What you need to do is use a tool called a profiler to find out where your code is spending its time, and then use that information to optimise it. This helps especially with micro-optimisations, but for larger algorithmic changes (i.e. changing how it works entirely), you'll need to think about things at a higher level of abstraction.
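If a full profiler isn't available, even coarse timing with std::chrono narrows things down; a minimal sketch (suspect_block is a placeholder for whatever section you want to measure):

#include <chrono>
#include <iostream>

// Placeholder for the block being measured.
void suspect_block() {
    volatile long long sink = 0;
    for (long long i = 0; i < 100000000; ++i) sink = sink + i;
}

int main() {
    auto start = std::chrono::steady_clock::now();
    suspect_block();
    auto stop = std::chrono::steady_clock::now();
    // Report the wall-clock time spent in the suspect section.
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
}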
I'm working on a parallel sort program to learn MPI, and I've been having problems with MPI_Scatter. Every time I attempt to run, I get the following:
reading input
Scattering input
_pmii_daemon(SIGCHLD): [NID 00012] PE 0 exit signal Segmentation fault
[NID 00012] 2011-03-28 10:12:56 Apid 23655: initiated application termination
A basic look at other questions didn't really answer why I'm having trouble: the arrays are contiguous, so I shouldn't have problems with non-contiguous memory access, and I'm passing the correct pointers in the correct order. Does anyone have any ideas?
Source code is below. It's hard-coded for a specific size because I don't want to deal with variable input and rank sizes just yet.
#include <mpi.h>
#include <iostream>
using std::endl;
using std::cout;
#include <fstream>
using std::ifstream;
using std::ofstream;
#include <algorithm>
using std::sort;

#define SIZEOF_INPUT 10000000
#define NUMTHREADS 100
#define SIZEOF_SUBARRAY (SIZEOF_INPUT/NUMTHREADS)

int main(int argc, char** argv){
    MPI_Init(&argc, &argv);

    int input[SIZEOF_INPUT];
    int tempbuf[SIZEOF_SUBARRAY];

    int myRank;
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    /*
        Read input from file
    */
    if(myRank == 0){
        cout << "reading input" << endl;
        ifstream in(argv[1]);
        for(int i = 0; i < SIZEOF_INPUT; ++i)
            in >> input[i];
        cout << "Scattering input" << endl;
    }

    // Scatter, Sort, and Gather again
    MPI_Scatter(input,SIZEOF_INPUT,MPI_INT,tempbuf,SIZEOF_SUBARRAY,MPI_INT,0,MPI_COMM_WORLD);
    cout << "Rank " << myRank << "Sorting" << endl;
    sort(tempbuf, tempbuf + SIZEOF_SUBARRAY);
    MPI_Gather(tempbuf,SIZEOF_SUBARRAY,MPI_INT,input,SIZEOF_INPUT,MPI_INT,0,MPI_COMM_WORLD);

    if(myRank == 0){
        cout << "Sorting final output" << endl;
        // I'm doing a multi-queue merge here using tricky pointer games

        // list of iterators representing things in the queue
        int* iterators[NUMTHREADS];
        // The ends of those iterators
        int* ends[NUMTHREADS];

        // Set up iterators and ends
        for(int i = 0; i < NUMTHREADS; ++i){
            iterators[i] = input + (i*SIZEOF_SUBARRAY);
            ends[i] = iterators[i] + SIZEOF_SUBARRAY;
        }

        ofstream out(argv[2]);
        int ULTRA_MAX = SIZEOF_INPUT + 1;
        int* ULTRA_MAX_POINTER = &ULTRA_MAX;
        while(true){
            int** curr_min = &ULTRA_MAX_POINTER;
            for(int i = 0; i < NUMTHREADS; ++i)
                if(iterators[i] < ends[i] && *iterators[i] < **curr_min)
                    curr_min = &iterators[i];
            if(curr_min == &ULTRA_MAX_POINTER) break;
            out << **curr_min << endl;
            ++(*curr_min);
        }
    }
    MPI_Finalize();
}
Any help would be much appreciated.
Regards,
Zach
Hah! Took me a while to see this one.
The trick is, in MPI_Scatter, the sendcount is the amount to send to each process, not the total. The same goes for gather; it's the amount to receive from each. That is, it's like MPI_Scatterv with counts, where the count is per process; in this case it's just assumed to be the same for all of them.
so this
MPI_Scatter(input,SIZEOF_SUBARRAY,MPI_INT,tempbuf,SIZEOF_SUBARRAY,MPI_INT,0,MPI_COMM_WORLD);
cout << "Rank " << myRank << "Sorting" << endl;
MPI_Gather(tempbuf,SIZEOF_SUBARRAY,MPI_INT,input,SIZEOF_SUBARRAY,MPI_INT,0,MPI_COMM_WORLD);
works for me.
Also, be careful of allocating big arrays like that on the stack; I know this is just an example problem, but for me this was causing crashes right away. Doing it dynamically
int *input = new int[SIZEOF_INPUT];
int *tempbuf = new int[SIZEOF_SUBARRAY];
//....
delete [] input;
delete [] tempbuf;
solved that problem.
int* iterators[NUMTHREADS];
// The ends of those iterators
int* ends[NUMTHREADS];
// Set up iterators and ends
for(int i = 0; i < NUMTHREADS; ++i){
    iterators[i] = input + (i*SIZEOF_SUBARRAY); // problem
    ends[i] = iterators[i] + SIZEOF_SUBARRAY;   // problem
}
Both iterators and ends are arrays of integer pointers that point nowhere (or at garbage). The for loop then stores values as if they pointed at valid locations, which results in a segmentation fault. The program should first allocate memory for the pointers to point to, and only then store values at the locations they point to.
for( int i = 0; i < NUMTHREADS; ++i )
{
    iterators[i] = new int;
    ends[i] = new int;
}
// Now do the earlier operation which caused the problem
Since the program manages resources (i.e., memory acquired from new), it should return them to the free store (delete for single objects, delete[] for arrays) when no longer needed. Better, use std::vector instead of managing resources yourself, which is very easy.
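For completeness, a minimal sketch of the std::vector version, using the same constants as the question; the vectors live on the heap and free themselves, so both the stack-size and the delete[] concerns disappear:

#include <mpi.h>
#include <vector>

#define SIZEOF_INPUT 10000000
#define NUMTHREADS 100
#define SIZEOF_SUBARRAY (SIZEOF_INPUT/NUMTHREADS)

int main(int argc, char** argv){
    MPI_Init(&argc, &argv);
    // Heap-allocated, automatically freed; no risk of blowing the stack.
    std::vector<int> input(SIZEOF_INPUT);
    std::vector<int> tempbuf(SIZEOF_SUBARRAY);
    // data() yields the contiguous buffer MPI expects; counts are per process.
    MPI_Scatter(input.data(), SIZEOF_SUBARRAY, MPI_INT,
                tempbuf.data(), SIZEOF_SUBARRAY, MPI_INT,
                0, MPI_COMM_WORLD);
    MPI_Gather(tempbuf.data(), SIZEOF_SUBARRAY, MPI_INT,
               input.data(), SIZEOF_SUBARRAY, MPI_INT,
               0, MPI_COMM_WORLD);
    MPI_Finalize();
}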