Suppose you want to read the data from a large text file (~300 MB) into an array of vectors: vector<string> *Data (assume that the number of columns is known).
// file is opened with ifstream; initial value of s is set up, etc.
Data = new vector<string>[col];
string u;
int i = 0;
do
{
    istringstream iLine = istringstream(s);
    i = 0;
    while (iLine >> u)
    {
        Data[i].push_back(u);
        i++;
    }
}
while (getline(file, s));
This code works fine for small files (<50 MB), but memory usage explodes when reading a large file. I'm fairly sure the problem lies in creating the istringstream objects on each loop iteration. However, defining istringstream iLine; outside both loops, loading each line into the stream with iLine.str(s);, and clearing the stream after the inner while loop (iLine.str(""); iLine.clear();) causes a memory explosion of the same order.
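For reference, the stream-reuse variant I tried looks like this (same surrounding setup as above):

istringstream iLine; // constructed once, outside both loops
do
{
    iLine.clear();   // reset the eof/fail state left over from the previous line
    iLine.str(s);    // load the current line
    i = 0;
    while (iLine >> u)
    {
        Data[i].push_back(u);
        i++;
    }
}
while (getline(file, s));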
The questions that arise:
why does istringstream behave this way;
if this is the intended behavior, how can the above task be accomplished?
Thank you
EDIT: Regarding the first answer, I do clean up the memory allocated for the array later in the code:
for (long i = 0; i < col; i++)
    Data[i].clear();
delete[] Data;
FULL COMPILE-READY CODE:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <ctime>
#include <cstdlib>
#include <cstdio>
#include <tchar.h>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    // generate a large test file of random 2-digit numbers
    ofstream testfile;
    testfile.open("testdata.txt");
    srand(time(NULL));
    for (int i = 1; i < 1000000; i++)
    {
        for (int j = 1; j < 100; j++)
        {
            testfile << rand() % 100 << " ";
        }
        testfile << endl;
    }
    testfile.close();

    vector<string> *Data;
    clock_t begin = clock();
    ifstream file("testdata.txt");
    string s;

    // count the columns from the first line
    getline(file, s);
    istringstream iss = istringstream(s);
    string nums;
    int col = 0;
    while (iss >> nums)
    {
        col++;
    }
    cout << "Columns #: " << col << endl;

    // read every line, one vector per column
    Data = new vector<string>[col];
    string u;
    int i = 0;
    do
    {
        istringstream iLine = istringstream(s);
        i = 0;
        while (iLine >> u)
        {
            Data[i].push_back(u);
            i++;
        }
    }
    while (getline(file, s));
    cout << "Rows #: " << Data[0].size() << endl;

    // free the column vectors
    for (long i = 0; i < col; i++)
        Data[i].clear();
    delete[] Data;

    clock_t end = clock();
    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    cout << elapsed_secs << endl;
    getchar();
    return 0;
}
vector<> grows its memory geometrically. A typical pattern is that it doubles its capacity whenever it needs to grow. That can leave a lot of extra space allocated but unused if your loop happens to end right after such a threshold. You could try calling shrink_to_fit() on each vector when you are done.
Additionally, memory allocated by the C++ allocators (or even plain malloc()) is often not returned to the OS but kept in a process-internal free-memory pool. This may lead to further apparent growth, and it may make the effect of shrink_to_fit() invisible from outside the process.
Finally, if you have lots of small strings ("2-digit numbers"), the overhead of each string object may be considerable. Even if the implementation uses a small-string optimization, I'd assume that a typical string takes no less than 16 or 24 bytes (size, capacity, and a data pointer or small-string buffer), probably more on a platform where size_type is 64 bits. That is a lot of memory for 3 bytes of payload.
So I assume you are seeing the normal behaviour of vector<>.
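A minimal sketch of the shrink_to_fit() suggestion (illustrative only; shrink_to_fit() is a non-binding request, so the outcome depends on your implementation):

#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> v;
    for (int i = 0; i < 1000000; ++i)
        v.push_back("42");

    std::cout << "size: " << v.size()
              << ", capacity before: " << v.capacity() << "\n";
    v.shrink_to_fit(); // ask the vector to drop its unused capacity
    std::cout << "capacity after: " << v.capacity() << "\n";
}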
I seriously suspect this is not an istringstream problem (especially given that you get the same result with the iLine construction hoisted out of the loop).
Possibly this is normal behavior of std::vector. To test that, run the exact same code but comment out Data[i].push_back(u);. See whether your memory still grows this way; if it doesn't, you know where the problem is.
Depending on your library, vector::push_back will expand the capacity by a factor of 1.5 (Microsoft) or 2 (GCC's libstdc++) every time it needs more room.
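You can observe your library's growth factor directly with a sketch like this, which prints the capacity at every reallocation:

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::size_t last = v.capacity();
    for (int i = 0; i < 1000; ++i)
    {
        v.push_back(i);
        if (v.capacity() != last) // a reallocation happened
        {
            std::cout << last << " -> " << v.capacity() << "\n";
            last = v.capacity();
        }
    }
}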
Related
I would like to ask whether this part of the code might suffer from memory leaks (I'm quite sure it does, but how severely?).
The "input1" variable is a pointer to double, i.e. double* input1. The reason I didn't use float (a more natural fit in this case) is that I wanted to maintain compatibility with other parts of the code.
else if (filetype == "BinaryFile")
{
    char* memblock;
    std::ifstream file(filename1, std::ios::binary | std::ios::in);

    // determine the file size in bytes
    file.seekg(0, std::ios::end);
    int size = file.tellg();
    file.seekg(0, std::ios::beg);

    std::cout << "Size=" << size << " [in bytes]"
              << "\n";
    std::cout << "There are overall " << grid_points << "^3 = " << std::setprecision(10) << pow(grid_points, 3) << " values of the field1, written as float type.\n";

    // read the whole file into one heap buffer
    memblock = new char[size];
    file.seekg(0, std::ios::beg);
    file.read(memblock, size);
    file.close();

    float* values = (float*)memblock; // reinterpret as float, because the file was saved as float
    for (int i = 0; i < grid_points * grid_points * grid_points; i++) {
        input1[i] = (double)values[i]; // cast to double, since input1 is an array of doubles
    }

    file.close(); // redundant; the file was already closed above
    delete[] memblock;
}
The files that I need to work on are quite big, coming from cosmological simulations; for example, one file is 4 GB and another could be 20 GB. I'm using supercomputer infrastructure for that reason.
This kind of reading works for files that hold 512^3 float values (e.g. a density field evaluated at the points of a cube with side 512), but memory leaks happen for a file with 1024^3 entries.
I had thought I should delete[] the "values" array, but when I do that, I get even worse memory problems, crashing my program even in the case where previously everything was calculated correctly (512^3).
How could I improve this code? I would have used the std::vector container, but I had to use the FFTW library.
EDIT:
Following the suggestions in the comments, I have rewritten the reading part of the code as:
std::ifstream file(filename1, std::ios::binary);
std::vector<float> buf(pow(grid_points, 3));
file.read(reinterpret_cast<char*>(buf.data()), buf.size() * sizeof(float));
std::copy_n(buf.begin(), pow(grid_points, 3), input1);
Where I explicitly make use of the knowledge of how many elements there will be in the input1 array. No memory leaks occur now.
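For files too large to buffer whole, the temporary vector can be kept small by reading in fixed-size chunks and widening as you go. A sketch (readFloatsAsDoubles is a hypothetical helper; it assumes input1 already points to n doubles and the file holds exactly n floats):

#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

void readFloatsAsDoubles(const std::string& filename, double* input1, std::size_t n)
{
    std::ifstream file(filename, std::ios::binary);
    std::vector<float> chunk(1 << 20); // 1M floats (~4 MB) per pass
    std::size_t done = 0;
    while (done < n && file)
    {
        std::size_t want = std::min(chunk.size(), n - done);
        file.read(reinterpret_cast<char*>(chunk.data()),
                  static_cast<std::streamsize>(want * sizeof(float)));
        for (std::size_t i = 0; i < want; ++i)
            input1[done + i] = static_cast<double>(chunk[i]); // float -> double
        done += want;
    }
}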
I need some kind of error handling for the case where the entered string is larger than the set size.
cout << "Enter long of the string" << endl;
cin >> N;
char* st = new char[N];
char* st1 = new char[N];
for (int i = 0; i < N; ++i) {
*(st1 + i) = ' ';
}
cout << "Enter string in the end put 0,without whitespace in the end." << endl;
cin.getline(st, N, '0');
First, some comments.
Do not use C-style arrays in C++ (like char data[N]).
Always use std::string for strings.
Never use char arrays for strings.
Never ever use raw pointers for owned memory in C++.
Nearly never use new in C++.
Avoid pointer arithmetic with raw pointers to owned memory.
So, you should rethink your design and start doing it correctly in the first place.
To answer your concrete question: if you read the documentation of getline, you can see that
count-1 characters have been extracted (in which case setstate(failbit) is executed).
So the failbit will be set. You can check this with
if (std::cin.rdstate() == std::ios_base::failbit)
But as you can also read in the documentation:
Extracts characters from stream until end of line or the specified delimiter delim.
So it will not work as you expect: it will try to read until a '0' has been read. I do not think that is what you want.
You also need to delete the new'ed memory; otherwise you create a memory leak. Look at your example again and try it:
#include <iostream>

int main() {
    size_t N;
    std::cout << "Enter maximum length of the string\n";
    std::cin >> N;

    char* st = new char[N];
    char* st1 = new char[N];
    for (size_t i = 0U; i < N; ++i) {
        *(st1 + i) = ' ';
    }

    std::cout << "Enter string in the end put 0, without whitespace in the end.\n";
    std::cin.getline(st, N, '0');

    if (std::cin.rdstate() == std::ios_base::failbit) {
        std::cin.clear();
        std::cout << "\nError: Wrong string entered\n\n";
    }

    delete[] st;
    delete[] st1;
    return 0;
}
Solution for all your problems: Use std::string and std::getline
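A minimal sketch of that approach (keeping the original length check as an explicit comparison, since a std::string has no fixed buffer to overflow):

#include <cstddef>
#include <iostream>
#include <string>

int main() {
    std::cout << "Enter maximum length of the string\n";
    std::size_t N;
    std::cin >> N;
    std::cin.ignore(); // skip the newline left behind by operator>>

    std::cout << "Enter string\n";
    std::string st;
    std::getline(std::cin, st); // the string grows as needed

    if (st.size() > N) {
        std::cout << "Error: string longer than " << N << " characters\n";
    } else {
        std::cout << "OK: \"" << st << "\"\n";
    }
    return 0;
}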
I have a function which returns a pointer to an array of doubles:
double * centerOfMass(System &system) {
    long unsigned int size = system.atoms.size();
    double x_mass_sum = 0.0; double y_mass_sum = 0.0; double z_mass_sum = 0.0; double mass_sum = 0.0;

    for (int i = 0; i <= size; i++) {
        double atom_mass = system.atoms[i].m;
        mass_sum += atom_mass;
        x_mass_sum += system.atoms[i].pos["x"] * atom_mass;
        y_mass_sum += system.atoms[i].pos["y"] * atom_mass;
        z_mass_sum += system.atoms[i].pos["z"] * atom_mass;
    }

    double comx = x_mass_sum / mass_sum;
    double comy = y_mass_sum / mass_sum;
    double comz = z_mass_sum / mass_sum;

    double* output = new double[3]; // <-------- here is output
    output[0] = comx * 1e10; // convert all to A for writing xyz
    output[1] = comy * 1e10;
    output[2] = comz * 1e10;
    return output;
}
When I try to access the output by saving the array to a variable (in a different function), I get a segmentation fault when the program runs (though it compiles fine):
void writeXYZ(System &system, string filename, int step) {
    ofstream myfile;
    myfile.open(filename, ios_base::app);

    long unsigned int size = system.atoms.size();
    myfile << to_string(size) + "\nStep count: " + to_string(step) + "\n";

    for (int i = 0; i < size; i++) {
        myfile << system.atoms[i].name;
        myfile << " ";
        myfile << system.atoms[i].pos["x"] * 1e10;
        myfile << " ";
        myfile << system.atoms[i].pos["y"] * 1e10;
        myfile << " ";
        myfile << system.atoms[i].pos["z"] * 1e10;
        myfile << "\n";
    }

    // get center of mass
    double* comfinal = new double[3]; // goes fine
    comfinal = centerOfMass(system);  // does NOT go fine..
    myfile << "COM " << to_string(comfinal[0]) << " " << to_string(comfinal[1]) << " " << to_string(comfinal[2]) << "\n";

    myfile.close();
}
The program runs normally until it tries to call centerOfMass.
I've checked most of the likely causes; I think I just lack understanding of pointers and their scope in C++. I'm seasoned in PHP, so dealing with memory explicitly is new territory for me.
Thank you kindly
I'm not sure about the type of system.atoms. If it's an STL container like std::vector, the condition part of the for loop inside the centerOfMass function is wrong.
long unsigned int size = system.atoms.size();
for (int i = 0; i <= size; i++) {
should be
long unsigned int size = system.atoms.size();
for (int i=0; i<size; i++) {
PS1: You can use a range-based for loop (since C++11) to avoid this kind of problem.
PS2: You didn't delete[] the dynamically allocated arrays. Consider using std::vector, std::array, or std::unique_ptr instead; they're designed to help you avoid exactly these kinds of issues.
In addition to the concerns pointed out by songyuanyao, the usage of the function in writeXYZ() causes a memory leak.
To see this, note that centerOfMass() does (with extraneous details removed)
double* output = new double[3]; // <-------- here is output
// assign values to output
return output;
and writeXYZ() does (note that I've changed comments to reflect what is actually happening, as distinct from your comments on what you thought was happening)
double* comfinal = new double[3]; // allocate three doubles
comfinal = centerOfMass(system); // lose reference to them
// output to myfile
If writeXYZ() is called multiple times, the three doubles will be leaked every time, even if delete[] comfinal is eventually performed somewhere. If this function is called many times (for example, in a loop), the amount of leaked memory can eventually exceed what is available, and subsequent allocations will fail.
One fix of this problem is to change the relevant part of writeXYZ() to
double* comfinal = centerOfMass(system);
// output to myfile
delete [] comfinal; // eventually
Introducing std::unique_ptr in the above will alleviate the symptoms, but that is more a happy accident than good logic in the code (allocating memory only to discard it immediately without having used it is rarely good technique).
In practice, you are better off using standard containers (std::vector, etc.) and avoiding operator new altogether. But they still require you to stay within bounds.
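For example, here is a sketch of centerOfMass() rewritten to return a std::array by value (assuming the same System interface as in the question); the caller just writes auto com = centerOfMass(system); and there is nothing to delete:

#include <array>
#include <cstddef>

std::array<double, 3> centerOfMass(System& system)
{
    double sum[3] = {0.0, 0.0, 0.0};
    double mass_sum = 0.0;
    for (std::size_t i = 0; i < system.atoms.size(); ++i) { // note: < size, not <=
        double m = system.atoms[i].m;
        mass_sum += m;
        sum[0] += system.atoms[i].pos["x"] * m;
        sum[1] += system.atoms[i].pos["y"] * m;
        sum[2] += system.atoms[i].pos["z"] * m;
    }
    return { sum[0] / mass_sum * 1e10, // convert to Angstroms, as before
             sum[1] / mass_sum * 1e10,
             sum[2] / mass_sum * 1e10 };
}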
Trying to figure out the reasoning behind the mechanics of C strings.
char** text;
text = new char*[5];

for (int i = 0; i < 5; i++) {
    cout << endl << "Enter a phrase: ";
    cin >> text[i];
    cout << text[i];
}
I'm not entirely sure why this works for the first two iterations, even displaying them successfully, but hits a segfault on the third iteration.
You are using uninitialized memory. You are experiencing undefined behavior.
The line
text = new char*[5];
allocated memory for five pointers, but those pointers haven't been initialized to point to anything valid. Before you can use text[i] to read data, you have to allocate memory for it to point to.
for (int i = 0; i < 5; i++) {
    cout << endl << "Enter a phrase: ";
    text[i] = new char[SOME_SIZE_LARGE_ENOUGH_FOR_YOUR_NEED];
    cin >> text[i];
    cout << text[i];
}
Then, it should work.
You've allocated memory for five pointers, but not for anything those five pointers point to. Assuming a modern 64-bit CPU with 8-byte pointers, your new expression allocated exactly 40 bytes: five eight-byte pointers. Their initial contents are random, uninitialized memory, so when you read into them, those contents get interpreted as pointers to random addresses, which end up being corrupted with what you've read from std::cin. You got lucky at first: the first two iterations scribbled over some memory somewhere and the program continued to limp along, but you won the lottery on the third try, hitting a random address that does not exist, and segfaulted.
Although you could rewrite this to do proper allocation, if you're really trying to write C++ rather than C here, there's no reason to allocate anything yourself. Why deal with allocating memory when C++ will happily do it for you?
std::vector<std::string> text;

for (int i = 0; i < 5; i++)
{
    std::cout << std::endl << "Enter a phrase: ";

    std::string s;
    if (std::getline(std::cin, s).eof())
        break;

    text.push_back(s);
    std::cout << s << std::endl;
}
So this is my main method:
#include <iostream>
#include "TextFileToArray.h"
#include <vector>

using namespace std;

int main()
{
    cout << "Hello world!" << endl;

    TextFileToArray myobject("C:/bunz.txt");
    vector<string> v[10];
    myobject.vectorfiller(*v);

    for (int i = 0; i < 10; i++) {
        cout << v;
    }
}
It constructs an object called myobject and calls a member function on it. Here is that function:
int TextFileToArray::vectorfiller(vector<string>& givenpointer) {
    vector<string> *innervec = &givenpointer;
    const char * constantcharversion = path.c_str();
    ifstream filler(constantcharversion);

    string bunz;
    string lineoutliner = "Line ";
    string equalssign = " = ";
    int numbercounter = 1;

    while (!filler.eof()) {
        std::getline(filler, bunz, ';');

        if (bunz.empty()) {
            lineoutliner = "";
            numbercounter = 0;
            equalssign = "";
        }

        cout << lineoutliner << numbercounter << equalssign << bunz << endl;
        cout << "" << endl;

        innervec->push_back(bunz);
        numbercounter++;
    }

    filler.close();
    return 0;
}
So far it displays the text from the text file, but for some reason it seems to push memory addresses into the vector, so when main() displays the vector, it shows memory locations:
vector<string> v[10]; creates an array of 10 vectors, which is probably not what you want.
Create a single vector, pass that as parameter, and output its contents:
vector<string> v;
myobject.vectorfiller(v);

for (int i = 0; i < v.size(); i++) {
    cout << v[i];
}
Agreed; the loop bound should not be 10 but v.size(), and the output should be cout << v[i].
The problem is that you're printing the array of vectors, not the elements in the first vector. Instead, you want this in your main:
for (int i = 0; i < v[0].size(); i++) {
    cout << v[0][i] << endl;
}
PS: As Luchian said, you are creating 10 vectors, not one vector with 10 slots. To get just one vector do this:
vector<string> v;
You also don't need to mention 10; vectors grow as you push elements onto them. If you happen to know ahead of time how much space you want reserved, you can use the reserve member function, like so:
vector<string> v;
v.reserve(some_number);
reserve doesn't change the size of v; it only makes the vector ready to accept that many elements, so that it doesn't have to reallocate memory and copy things around as much. It is purely an optimization: if you simply commented out the reserve calls in your program, it would behave exactly the same. The only things that might change are performance and memory usage.
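A quick sketch of the size/capacity distinction:

#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> v;
    v.reserve(100);
    std::cout << v.size() << "\n";     // 0 -- reserve adds no elements
    std::cout << v.capacity() << "\n"; // at least 100
    v.push_back("hello");
    std::cout << v.size() << "\n";     // 1
    return 0;
}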