Correct way of reading /proc/pid/status - c++

I read /proc/<pid>/status this way:
std::ifstream file(filename);
std::string line;
int numberOfLinesToRead = 4;
int linesRead = 0;
while (std::getline(file, line)) {
    // do stuff
    if (numberOfLinesToRead == ++linesRead) {
        break;
    }
}
I noticed that in rare cases std::getline hangs.
Why does this happen? I was under the impression that the proc filesystem should be in a reasonably consistent state and that there should not be cases where a newline is missing. My assumption was that getline returns false when EOF or an error occurs.
What is the recommended, safe way to read /proc/<pid>/status?

Perhaps a more reliable approach is to use fread into a large buffer. The status file is small, so allocate a local buffer and read the whole file.
For an example, look at the second answer for the simplest solution.
This may still fail on the fopen or fread, but a sensible error should be returned.
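A minimal sketch of that idea (the 4 KiB buffer size, the function name, and the error handling are assumptions, not part of the original answer):

#include <cstdio>
#include <string>

// Read the whole of /proc/<pid>/status into a string in one go.
// Returns an empty string on failure; 4 KiB is assumed to be a safe
// upper bound for the size of a status file.
std::string readProcStatus(int pid)
{
    char path[64];
    std::snprintf(path, sizeof(path), "/proc/%d/status", pid);

    std::FILE* f = std::fopen(path, "r");
    if (!f) {
        return {};                  // process may already be gone
    }

    char buffer[4096];
    size_t n = std::fread(buffer, 1, sizeof(buffer), f);
    std::fclose(f);

    return std::string(buffer, n);
}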

/proc is a virtual filesystem. That means reading from "files" in it is not the same as reading from a normal filesystem.
If a process exits, the information about it is removed from /proc much faster than it would be from a real filesystem (where dirty-cache flushing delays are involved).
Bearing that in mind, imagine that the process exits before you get to read the next line, which wasn't buffered yet.
The solution is either to account for the file disappearing, since you may not need information about a process that no longer exists, or to buffer the entire file first and only then parse it.
EDIT: the hang is most likely related to the fact that this is a virtual filesystem. It does not behave exactly the same way as a real filesystem; since this is a specific fs type, the issue could be in the fs driver. The code you provide looks fine for normal file reading.
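A sketch of the "buffer the entire file, then parse" approach (the function name and the line limit are illustrative assumptions):

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Slurp the whole status file into memory first, then parse lines from
// the in-memory copy, so a vanishing process cannot stall getline.
std::vector<std::string> readStatusLines(const std::string& filename, int maxLines)
{
    std::ifstream file(filename);
    std::stringstream buffer;
    buffer << file.rdbuf();          // single bulk read into memory

    std::vector<std::string> lines;
    std::string line;
    while (lines.size() < static_cast<size_t>(maxLines) &&
           std::getline(buffer, line)) {
        lines.push_back(line);
    }
    return lines;
}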

Related

Do we need mutex to perform multithreading file IO

I'm trying to do random writes (a benchmark test) to a file using multiple threads (pthread). It looks like if I comment out the mutex lock, the created file size is smaller than expected, as if some writes are getting lost (always in some multiple of the chunk size). But if I keep the mutex it's always the exact size.
Does my code have a problem elsewhere and the mutex is not really required (as suggested by #evan ), or is the mutex necessary here?
void *DiskWorker(void *threadarg) {
    FILE *theFile = fopen(fileToWrite, "a+");
    ....
    for (long i = 0; i < noOfWrites; ++i) {
        //pthread_mutex_lock (&mutexsum);
        // For random access
        fseek(theFile, randomArray[i] * chunkSize, SEEK_SET);
        fputs(data, theFile);
        // Or for sequential access (in this case the above 2 lines would not be here)
        fprintf(theFile, "%s", data);
        // sequential access end
        fflush(theFile);
        //pthread_mutex_unlock(&mutexsum);
    }
    .....
}
You are opening a file using "append mode". According to C11:
Opening a file with append mode ('a' as the first character in the
mode argument) causes all subsequent writes to the file to be forced
to the then current end-of-file, regardless of intervening calls to
the fseek function.
The C standard does not specify how exactly this should be implemented, but on POSIX systems this is usually implemented using the O_APPEND flag of the open function, while flushing data is done using the write function. Note that the fseek call in your code should have no effect.
I think POSIX requires this, as it describes how redirecting output in append mode (>>) is done by the shell:
Appended output redirection shall cause the file whose name results
from the expansion of word to be opened for output on the designated
file descriptor. The file is opened as if the open() function as
defined in the System Interfaces volume of POSIX.1-2008 was called
with the O_APPEND flag. If the file does not exist, it shall be
created.
And since most programs use the FILE interface to send data to stdout, this probably requires fopen to use open with O_APPEND and write (and not functions like pwrite) when writing data.
So if, on your system, fopen with 'a' mode uses O_APPEND, flushing is done using write, and your kernel and filesystem correctly implement the O_APPEND flag, using a mutex should have no effect, as writes do not interleave:
If the O_APPEND flag of the file status flags is set, the file
offset shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
Note that not all filesystems support this behavior. Check this answer.
As for my answer to your previous question, my suggestion was to remove mutex as it should have no effect on the size of a file (and it didn't have any effect on my machine).
Personally, I have never really used O_APPEND and would be hesitant to do so, as its behavior might not be supported at some level, plus its behavior is weird on Linux (see the "BUGS" section of pwrite).
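A minimal sketch of the O_APPEND behavior described above, using the POSIX calls directly (the function name, file mode, and minimal error handling are assumptions):

#include <fcntl.h>
#include <unistd.h>
#include <cstring>

// Each thread can share this pattern: with O_APPEND, the kernel moves the
// offset to the end of the file before each write, with no intervening
// modification allowed, so no mutex is needed for the append itself.
void appendChunk(const char* path, const char* data)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return;                          // real code should report errno

    size_t len = std::strlen(data);
    ssize_t written = write(fd, data, len);
    (void)written;                       // real code should check for short writes

    close(fd);
}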
You definitely need a mutex because you are issuing several different file commands. The underlying file subsystem can't possibly know how many file commands you are going to call to complete your whole operation.
So you need the mutex.
In your situation you may find you get better performance putting the mutex outside the loop. The reason is that, otherwise, switching between threads may cause excessive seeking between different parts of the disk. Hard disks take about 10 ms to move the read/write head, so that could potentially slow things down a lot.
So it might be a good idea to benchmark that.
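A sketch of the "mutex outside the loop" variant suggested here (the variable names follow the question's code; everything else is an assumption):

// Holding the lock for the whole loop keeps each thread's writes
// physically grouped on disk, at the cost of serializing the threads.
pthread_mutex_lock(&mutexsum);
for (long i = 0; i < noOfWrites; ++i) {
    fseek(theFile, randomArray[i] * chunkSize, SEEK_SET);
    fputs(data, theFile);
    fflush(theFile);
}
pthread_mutex_unlock(&mutexsum);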

How to copy a file from one location to another in a fast way with C++ program? [duplicate]

This question already has answers here:
Copy a file in a sane, safe and efficient way
(9 answers)
Closed 7 years ago.
I am trying to understand the code behind the copy command, which copies a file from one place to another. I studied C++ file-system basics and have written the following code for my task.
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    cout << "Copy file\n";
    string from, to;
    cout << "Enter file address: ";
    cin >> from;
    ifstream in(from, ios::in | ios::binary);
    if (!in)
    {
        cout << "could not find file " << from << endl;
        return 1;
    }
    cout << "Enter file destination: ";
    cin >> to;
    ofstream out(to, ios::out | ios::binary);
    char ch;
    while (in.get(ch))
    {
        out.put(ch);
    }
    cout << "file has been copied\n";
    in.close();
    out.close();
}
Though this code works, it is much slower than the copy command of my OS, which is Windows. I want to know how I can make my program faster and reduce the difference between my program's time and my OS's copy command time.
Reading one byte at a time is going to waste a lot of time in function calls... use a bigger buffer:
char ch[4096];
while (in) {
    in.read(ch, sizeof(ch));
    out.write(ch, in.gcount());
}
(you may want to add some more error handling, e.g. out may go into a bad state and the like)
(the most C++-ish way is reported here, but it takes advantage of streambuf functionality that a beginner typically has little reason to know about, and to me it is also far less instructive)
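For reference, the streambuf-based approach alluded to above boils down to a one-liner (a sketch only; in and out are the streams from the question):

// Copy everything remaining in `in` to `out` through the stream buffers,
// letting the library pick its own internal buffering.
out << in.rdbuf();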
You have correctly opened the file for binary read and binary write. However, instead of reading individual characters (which is not meaningful in binary format), use istream::read and ostream::write.
Like other answers say, use bigger buffers. I'd go for 1MB.
But there's a lot more to it.
Also, avoid the stream library and FILE stuff. They buffer the data, so you get 2 memcpy calls instead of 1.
Disabling buffering on the streams can achieve a similar result, but I think you're better off using the system calls directly.
And one last thing, on the "do it yourself" front: you must check the return values from read and write calls. They may read/write fewer bytes than you ask them to.
If you can manage a circular buffer, you should switch between reading and writing whenever a call returns short... the disk may be more ready for reading or for writing, so there is no point wasting time waiting instead of switching to the other thing you have to do.
And now the very last thing you might want to explore: look into the sendfile system call. It was built to speed up web servers by doing the entire copy in the kernel, avoiding context switches and memcpys, but it may serve here if it works with two disk file descriptors.
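A hedged sketch of the sendfile idea on Linux (on kernels since 2.6.33 the output may be a regular file; the function name and error handling are assumptions):

#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

// Copy src to dst entirely inside the kernel; returns false on any error.
bool copyWithSendfile(const char* src, const char* dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return false;

    struct stat st;
    if (fstat(in, &st) < 0) { close(in); return false; }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode);
    if (out < 0) { close(in); return false; }

    off_t offset = 0;
    off_t remaining = st.st_size;
    while (remaining > 0) {
        ssize_t sent = sendfile(out, in, &offset, remaining);
        if (sent <= 0) break;            // error or nothing copied
        remaining -= sent;
    }

    close(in);
    close(out);
    return remaining == 0;
}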

Writing huge txt files without overloading RAM

I need to write the results of a process to a txt file. The process is very long and the amount of data to be written is huge (~150 GB). The program works fine, but the problem is that the RAM gets overloaded and, at a certain point, the program just stops.
The program is simple:
ofstream f;
f.open(filePath);
for (int k = 0; k < nDataset; k++) {
    // treat element of dataset
    f << result;
}
f.close();
Is there a way of writing this file without overloading the memory?
You should flush output periodically.
For example:
if (k%10000 == 0) f.flush();
I'd like to suggest something like this
ogzstream f;
f.open(filePath);
string s("");
for (int k = 0; k < nDataset; k++) {
    // treat element of dataset
    s.append(result);
    if (s.length() >= OPTIMUM_BUFFER_SIZE) {  // >= rather than ==: append may overshoot the exact size
        f << s;
        f.flush();
        s.clear();
    }
}
f << s;
f.flush();
f.close();
Basically, you build the output up in memory rather than sending it to the stream piecemeal, so you don't have to worry about when the stream gets flushed, and when you do send it you explicitly flush it to the actual file. Some ideas for OPTIMUM_BUFFER_SIZE can be found here and here.
I'm not exactly sure whether string or vector is the best option for the buffer. I will do some research myself and update the answer, or you can refer to Effective STL by Scott Meyers.
If that truly is the code where your program gets stuck, then your explanation of the problem is wrong.
There's no text file. Your igzstream is not dealing with text, but a gzip archive.
There's no data being written. The code you show reads from the stream.
I don't know what your program does with result, because you didn't show that. But if it accumulates results into a collection in memory, that will grow. You'll need to find a way to process all your data without loading all of it into RAM at the same time.
Your memory usage could be from the decompressor. For some compression algorithms, an entire block has to be stored in memory. In such cases it's best to break the file into blocks and compress each separately (possibly pre-initializing a dictionary with the results of the previous block). I don't think that gzip is such an algorithm, however. You may need to find a library that supports streaming.

ifstream vs. fread for binary files

Which is faster? ifstream or fread.
Which should I use to read binary files?
fread() puts the whole file into memory.
So after fread, accessing the buffer it creates is fast.
Does ifstream::open() put the whole file into memory?
or does it access the hard disk every time we run ifstream::read()?
So... does ifstream::open() == fread()?
or (ifstream::open(); ifstream::read(file_length);) == fread()?
Or shall I use ifstream::rdbuf()->read()?
edit:
My readFile() method now looks something like this:
void readFile()
{
    std::ifstream fin;
    fin.open("largefile.dat", std::ifstream::binary | std::ifstream::in);

    // in each of these small read methods, there is at least 1 fin.read()
    // call inside.
    readHeaderInfo(fin);
    readPreference(fin);
    readMainContent(fin);
    readVolumeData(fin);
    readTextureData(fin);

    fin.close();
}
Will the multiple fin.read() calls in the small methods slow down the program?
Shall I only use 1 fin.read() in the main method and pass the buffer into the small methods? I guess I am going to write a small program to test.
Thanks!
Are you really sure about fread putting the whole file into memory? File access can be buffered, but I doubt that you really get the whole file put into memory. I think ifstream::read just uses fread under the hood in a more C++ conformant way (and is therefore the standard way of reading binary information from a file in C++). I doubt that there is a significant performance difference.
To use fread, the file has to be open. It doesn't just take a file and put it into memory at once. So ifstream::open == fopen and ifstream::read == fread.
The C++ stream API is usually a little bit slower than the C file API if you use the high-level API, but it provides a cleaner/safer API than C.
If you want speed, consider using memory-mapped files, though there is no portable way of doing this with the standard library.
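A hedged sketch of the memory-mapped approach on POSIX systems (not portable, as noted above; the function name and minimal error handling are assumptions):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map an entire file read-only into the address space; the kernel pages
// data in on demand instead of copying it through read() buffers.
const char* mapFile(const char* path, size_t& length)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return nullptr; }
    length = static_cast<size_t>(st.st_size);

    void* p = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                           // the mapping stays valid after close
    return p == MAP_FAILED ? nullptr : static_cast<const char*>(p);
}

// When done, unmap with munmap(const_cast<char*>(ptr), length).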
As to which is faster, see my comment. For the rest:
Neither of these methods automatically reads the whole file into memory. They both read as much as you specify.
At least for ifstream, I am sure that the IO is buffered, so there will not necessarily be a disk access for every read you make.
See this question for the C++-way of reading binary files.
The idea with C++ file streams is that some or all of the file is buffered in memory (based on what it thinks is optimal) and that you don't have to worry about it.
I would use ifstream::read() and just tell it how much you need.
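A small sketch of that "read just what you need" pattern (the struct, field names, and sizes are made-up placeholders, not from the question's file format):

#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical fixed-size header followed by a payload of payloadSize bytes.
struct Header {
    std::uint32_t magic;
    std::uint32_t payloadSize;
};

bool readHeaderAndPayload(std::ifstream& fin, Header& h, std::vector<char>& payload)
{
    // Read exactly sizeof(Header) bytes; the stream buffers the rest internally.
    if (!fin.read(reinterpret_cast<char*>(&h), sizeof(h)))
        return false;

    payload.resize(h.payloadSize);
    return static_cast<bool>(fin.read(payload.data(), payload.size()));
}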
Use stream operator:
DWORD processPid = 0;
std::ifstream myfile ("C:/Temp/myprocess.pid", std::ios::binary);
if (myfile.is_open())
{
    myfile >> processPid;
    myfile.close();
    std::cout << "PID: " << processPid << std::endl;
}

c++ file bad bit

When I run this code, the open, seekg, and tellg operations all succeed.
But when I read it, it fails; the eof, bad, and fail bits are 0, 1, 1.
What can cause a file's badbit to be set?
Thanks
int readriblock(int blockid, char* buffer)
{
    ifstream rifile("./ri/reverseindex.bin", ios::in | ios::binary);
    rifile.seekg(blockid * RI_BLOCK_SIZE, ios::beg);
    if (!rifile.good()) { cout << "block not exist" << endl; return -1; }

    cout << rifile.tellg() << endl;
    rifile.read(buffer, RI_BLOCK_SIZE);
    cout << rifile.eof() << rifile.bad() << rifile.fail() << endl;

    if (!rifile.good()) { cout << "error reading block " << blockid << endl; return -1; }
    rifile.close();
    return 0;
}
Quoting the Apache C++ Standard Library User's Guide:
The flag std::ios_base::badbit indicates problems with the underlying stream buffer. These problems could be:
Memory shortage. There is no memory available to create the buffer, or the buffer has size 0 for other reasons (such as being provided from outside the stream), or the stream cannot allocate memory for its own internal data, as with std::ios_base::iword() and std::ios_base::pword().
The underlying stream buffer throws an exception. The stream buffer might lose its integrity, as in memory shortage, or code conversion failure, or an unrecoverable read error from the external device. The stream buffer can indicate this loss of integrity by throwing an exception, which is caught by the stream and results in setting the badbit in the stream's state.
That doesn't tell you what the problem is, but it might give you a place to start.
Keep in mind the EOF bit is generally not set until a read is attempted and fails. (In other words, checking rifile.good after calling seekg may not accomplish anything.)
As Andrey suggested, checking errno (or using an OS-specific API) might let you get at the underlying problem. This answer has example code for doing that.
Side note: Because rifile is a local object, you don't need to close it once you're finished. Understanding that is important for understanding RAII, a key technique in C++.
Try good old errno. It should show the real reason for the error. Unfortunately, there is no C++-ish way to do it.
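A minimal sketch of that idea, slotted into the final check of readriblock above (assuming the stream ultimately failed inside a C library or OS call that set errno):

#include <cerrno>
#include <cstring>

// After rifile.read(...) reports failure, errno often still holds the
// code from the underlying read that went wrong.
if (!rifile.good()) {
    cout << "error reading block " << blockid
         << ": " << std::strerror(errno) << endl;
    return -1;
}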