I am working with mmap() to read big files quickly, basing my code on the answers to this question (Fast textfile reading in c++).
I am using the second version from sehe's answer:
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
// for mmap:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

const char* map_file(const char* fname, size_t& length);

int main()
{
    size_t length;
    auto f = map_file("test.cpp", length);
    auto l = f + length;

    uintmax_t m_numLines = 0;
    while (f && f != l)
        if ((f = static_cast<const char*>(memchr(f, '\n', l - f))))
            m_numLines++, f++;

    std::cout << "m_numLines = " << m_numLines << "\n";
}

void handle_error(const char* msg) {
    perror(msg);
    exit(255);
}

const char* map_file(const char* fname, size_t& length)
{
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        handle_error("open");

    // obtain file size
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        handle_error("fstat");
    length = sb.st_size;

    const char* addr = static_cast<const char*>(mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0u));
    if (addr == MAP_FAILED)
        handle_error("mmap");

    // TODO close fd at some point in time, call munmap(...)
    return addr;
}
and it works just great.
But if I run it over a loop of several files (I just change the name of the main() function to:
void readFile(std::string &nomeFile) {
then get the file content into the "f" object with:
size_t length;
auto f = map_file(nomeFile.c_str(), length);
auto l = f + length;
and call readFile() from main() in a loop over a list of filenames), after a while I get:
open: Too many open files
I imagine there must be a way to close the file descriptor returned by open() after working on a file, but I cannot figure out how and where exactly to put it. I tried:
int fc = close(fd);
at the end of the readFile() function, but it didn't change anything.
Thanks a lot in advance for any help!
EDIT:
After the helpful suggestions I received, I did some performance comparisons of different approaches using mmap() and std::cin(); for the results, check out: fast file reading in C++, comparison of different strategies with mmap() and std::cin() results interpretation.
Limit to the number of concurrently open files
As you can imagine, keeping a file open consumes resources. So there is in any case a practical limit to the number of open file descriptors on your system. This is why it's highly recommended to close files that you no longer need.
The exact limit depends on the OS and the configuration. If you want to know more, there are already a lot of answers available for this kind of question.
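As a rough illustration, one way to query this limit programmatically on a POSIX system is getrlimit() (the shell equivalent is ulimit -n); this is an added sketch, not part of the original answer:

#include <cstdio>
#include <sys/resource.h>

int main()
{
    rlimit lim{};
    // RLIMIT_NOFILE is the maximum number of file descriptors
    // this process may have open at the same time.
    if (getrlimit(RLIMIT_NOFILE, &lim) == 0)
        std::printf("soft limit: %llu, hard limit: %llu\n",
                    static_cast<unsigned long long>(lim.rlim_cur),
                    static_cast<unsigned long long>(lim.rlim_max));
}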
Special case of mmap
Obviously, to mmap() a file you first open it. Doing so repeatedly in a loop risks sooner or later hitting the fatal file descriptor limit, as you experienced.
The idea of closing the file is not bad. The problem is that, on its own, it does not work. This is specified in the POSIX documentation:
The mmap() function adds an extra reference to the file associated
with the file descriptor fildes which is not removed by a subsequent
close() on that file descriptor. This reference is removed when there
are no more mappings to the file.
Why? Because mmap() links the file in a special way to your system's virtual memory management, and the file will be needed as long as you use the address range to which it was mapped.
So how do you remove those mappings? The answer is to use munmap():
The function munmap() removes any mappings for those entire pages
containing any part of the address space of the process starting at
addr and continuing for len bytes.
And of course, close() the file descriptor that you no longer need. A prudent approach would be to close after munmap(), but in principle, at least on a POSIX-compliant system, it should not matter when you're closing. Nevertheless, check your latest OS documentation to be on the safe side :-)
Note: file mapping is also available on Windows; the documentation about closing the handles is ambiguous about potential memory leaks if there are remaining mappings. This is why I recommend prudence about when you close.
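To make this concrete, here is a minimal sketch of how the readFile() from the question could do its own cleanup: it maps, counts the lines, then calls munmap() and close() before returning, so nothing accumulates across loop iterations. The open/fstat/mmap part simply inlines what map_file() does, so that the descriptor stays in scope; the function name and parameter come from the question.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void readFile(const std::string& nomeFile)
{
    int fd = open(nomeFile.c_str(), O_RDONLY);
    if (fd == -1) { perror("open"); return; }

    struct stat sb;
    if (fstat(fd, &sb) == -1) { perror("fstat"); close(fd); return; }
    size_t length = sb.st_size;
    if (length == 0) { close(fd); return; }   // mmap of an empty file fails

    const char* addr = static_cast<const char*>(
        mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0));
    if (addr == MAP_FAILED) { perror("mmap"); close(fd); return; }

    // count lines exactly as in the original main()
    const char* f = addr;
    const char* l = addr + length;
    uintmax_t numLines = 0;
    while (f && f != l)
        if ((f = static_cast<const char*>(memchr(f, '\n', l - f))))
            numLines++, f++;

    std::cout << nomeFile << ": " << numLines << " lines\n";

    // release the mapping, then the descriptor: nothing leaks per iteration
    munmap(const_cast<char*>(addr), length);
    close(fd);
}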
Related
I have a C++ program that uses the POSIX API to write a file opened with O_DIRECT. Concurrently, another thread is reading back from the same file via a different file descriptor. I've noticed that occasionally the data read back from the file contains all zeroes, rather than the actual data I wrote. Why is this?
Here's an MCVE in C++17. Compile with g++ -std=c++17 -Wall -otest test.cpp or equivalent. Sorry I couldn't seem to make it any shorter. All it does is write 100 MiB of constant bytes (0x5A) to a file in one thread and read them back in another, printing a message if any of the read-back bytes are not equal to 0x5A.
WARNING, this MCVE will delete and rewrite any file in the current working directory named foo.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <thread>

#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

constexpr size_t CHUNK_SIZE = 1024 * 1024;
constexpr size_t TOTAL_SIZE = 100 * CHUNK_SIZE;

int main(int argc, char *argv[])
{
    ::unlink("foo");

    std::thread write_thread([]()
    {
        int fd = ::open("foo", O_WRONLY | O_CREAT | O_DIRECT, 0777);
        if (fd < 0) std::exit(-1);

        uint8_t *buffer = static_cast<uint8_t *>(
            std::aligned_alloc(4096, CHUNK_SIZE));
        std::fill(buffer, buffer + CHUNK_SIZE, 0x5A);

        size_t written = 0;
        while (written < TOTAL_SIZE)
        {
            ssize_t rv = ::write(fd, buffer,
                std::min(TOTAL_SIZE - written, CHUNK_SIZE));
            if (rv < 0) { std::cerr << "write error" << std::endl; std::exit(-1); }
            written += rv;
        }
    });

    std::thread read_thread([]()
    {
        int fd = ::open("foo", O_RDONLY, 0);
        if (fd < 0) std::exit(-1);

        uint8_t *buffer = new uint8_t[CHUNK_SIZE];

        size_t checked = 0;
        while (checked < TOTAL_SIZE)
        {
            ssize_t rv = ::read(fd, buffer, CHUNK_SIZE);
            if (rv < 0) { std::cerr << "read error" << std::endl; std::exit(-1); }

            for (ssize_t i = 0; i < rv; ++i)
                if (buffer[i] != 0x5A)
                    std::cerr << "readback mismatch at offset " << checked + i << std::endl;

            checked += rv;
        }
    });

    write_thread.join();
    read_thread.join();
}
(Details such as proper error checking and resource management are omitted here for the sake of the MCVE. This is not my actual program but it shows the same behavior.)
I'm testing on Linux 4.15.0 with an SSD. About 1/3 of the time I run the program, the "readback mismatch" message prints. Sometimes it doesn't. In all cases, if I examine foo after the fact I find that it does contain the correct data.
If you remove O_DIRECT from the ::open() flags in the write thread, the problem goes away and the "readback mismatch" message never prints.
I could understand why my ::read() might return 0 or something to indicate I've already read everything that has been flushed to disk so far. But I can't understand why it would perform what appears to be a successful read, yet return data other than what I wrote. Clearly I'm missing something, but what is it?
So, O_DIRECT has some additional constraints that might not make it what you're looking for:
Applications should avoid mixing O_DIRECT and normal I/O to the same
file, and especially to overlapping byte regions in the same file.
Even when the filesystem correctly handles the coherency issues in
this situation, overall I/O throughput is likely to be slower than
using either mode alone.
Instead, I think O_SYNC might be better, since it does provide the expected guarantees:
O_SYNC provides synchronized I/O file integrity completion, meaning
write operations will flush data and all associated metadata to the
underlying hardware. O_DSYNC provides synchronized I/O data
integrity completion, meaning write operations will flush data to the
underlying hardware, but will only flush metadata updates that are
required to allow a subsequent read operation to complete
successfully. Data integrity completion can reduce the number of
disk operations that are required for applications that don't need
the guarantees of file integrity completion.
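For illustration, here is a rough sketch of how the write thread from the MCVE could be changed to use O_SYNC instead of O_DIRECT, as suggested above. This is only a sketch under the assumption that the rest of the program stays as in the question; it has not been verified against the original code.

#include <algorithm>
#include <cstdint>
#include <cstdlib>

#include <fcntl.h>
#include <unistd.h>

// Write total_size bytes of 0x5A with O_SYNC: each write() returns only once
// the data and associated metadata have reached the underlying hardware.
void write_with_osync(const char* path, size_t chunk_size, size_t total_size)
{
    int fd = ::open(path, O_WRONLY | O_CREAT | O_SYNC, 0777);
    if (fd < 0) std::exit(-1);

    // Unlike O_DIRECT, O_SYNC does not require 4096-byte aligned buffers.
    uint8_t* buffer = new uint8_t[chunk_size];
    std::fill(buffer, buffer + chunk_size, 0x5A);

    size_t written = 0;
    while (written < total_size)
    {
        ssize_t rv = ::write(fd, buffer,
                             std::min(total_size - written, chunk_size));
        if (rv < 0) std::exit(-1);
        written += rv;
    }

    delete[] buffer;
    ::close(fd);
}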
I wrote an application which processes data on the GPU. The code works well, but I have the problem that reading the input file (~3 GB, text) is the bottleneck of my application. (The read from the HDD is fast, but the line-by-line processing is slow.)
I read a line with getline(), copy line 1 to a vector, copy line 2 to another vector, and skip lines 3 and 4. And so on for the rest of the 11 million lines.
I tried several approaches to read the file as fast as possible:
The fastest method I found is using boost::iostreams::stream.
Others were:
reading the file as gzip to minimize IO, which is slower than reading it directly;
copying the file to RAM with read(filepointer, chararray, length)
and processing it with a loop to distinguish the lines (also slower than boost).
Any suggestions on how to make it run faster?
void readfastq(char *filename, int SRlength, uint32_t blocksize){
    _filelength = 0;          //total datasets (each 4 lines)
    _SRlength = SRlength;     //length of the 2. line
    _blocksize = blocksize;

    boost::iostreams::stream<boost::iostreams::file_source> ins(filename);
    in = ins;
    readNextBlock();
}

void readNextBlock() {
    timeval start, end;
    gettimeofday(&start, 0);

    string name;
    string seqtemp;
    string garbage;
    string phredtemp;

    _seqs.empty();
    _phred.empty();
    _names.empty();
    _filelength = 0;

    //read only a part of the file i.e the first 4mio lines
    while (std::getline(in, name) && _filelength < _blocksize) {
        std::getline(in, seqtemp);
        std::getline(in, garbage);
        std::getline(in, phredtemp);

        if (seqtemp.size() != _SRlength) {
            if (seqtemp.size() != 0)
                printf("Error on read in fastq: size is invalid\n");
        } else {
            _names.push_back(name);

            for (int k = 0; k < _SRlength; k++) {
                //handle special letters
                if (seqtemp[k] == 'A') ...
                else {
                    _seqs.push_back(5);
                }
            }
            _filelength++;
        }
    }
}
EDIT:
The source-file is downloadable under https://docs.google.com/open?id=0B5bvyb427McSMjM2YWQwM2YtZGU2Mi00OGVmLThkODAtYzJhODIzYjNhYTY2
I changed the function readfastq to read the whole file, because of some pointer problems. So if you call readfastq, the blocksize (in lines) must be bigger than the number of lines to read.
SOLUTION:
I found a solution which gets the time for reading the file down from 60 s to 16 s. I removed the inner loop which handles the special characters and do this on the GPU instead. This decreases the read-in time and only minimally increases the GPU running time.
Thanks for your suggestions.
void readfastq(char *filename, int SRlength) {
    _filelength = 0;
    _SRlength = SRlength;

    size_t bytes_read, bytes_expected;

    FILE *fp;
    fp = fopen(filename, "r");

    fseek(fp, 0L, SEEK_END);     //go to the end of file
    bytes_expected = ftell(fp);  //get filesize
    fseek(fp, 0L, SEEK_SET);     //go to the beginning of the file

    fclose(fp);

    if ((_seqarray = (char *) malloc(bytes_expected/2)) == NULL) //allocate space for file
        err(EX_OSERR, "data malloc");

    string name;
    string seqtemp;
    string garbage;
    string phredtemp;

    boost::iostreams::stream<boost::iostreams::file_source> file(filename);

    while (std::getline(file, name)) {
        std::getline(file, seqtemp);
        std::getline(file, garbage);
        std::getline(file, phredtemp);

        if (seqtemp.size() != SRlength) {
            if (seqtemp.size() != 0)
                printf("Error on read in fastq: size is invalid\n");
        } else {
            _names.push_back(name);

            strncpy(&(_seqarray[SRlength*_filelength]), seqtemp.c_str(), seqtemp.length()); //do not handle special letters here, do on GPU
            _filelength++;
        }
    }
}
First, instead of reading the file into memory you may work with file mappings. You just have to build your program as 64-bit so that 3 GB fits into its virtual address space (for a 32-bit application only 2 GB is accessible in user mode). Alternatively, you may map and process your file in parts.
Next, it sounds to me that your bottleneck is "copying a line to a vector". Dealing with vectors involves dynamic memory allocation (heap operations), which in a critical loop hurts performance very seriously. If this is the case, either avoid using vectors, or make sure they're declared outside the loop. The latter helps because when you reassign/clear vectors they do not free their memory.
Post your code (or a part of it) for more suggestions.
EDIT:
It seems that all your bottlenecks are related to string management.
std::getline(in, seqtemp); reading into a std::string involves dynamic memory allocation.
_names.push_back(name); This is even worse. First, the std::string is placed into the vector by value. That means the string is copied, hence another dynamic allocation/deallocation happens. Moreover, when the vector eventually reallocates internally, all the contained strings are copied again, with all the consequences.
I recommend using neither standard formatted file I/O functions (Stdio/STL) nor std::string. To achieve better performance you should work with pointers to strings (rather than copied strings), which is possible if you map the entire file. Plus you'll have to implement the file parsing (division into lines).
Like in this code:
class MemoryMappedFileParser
{
    const char* m_sz;
    size_t m_Len;

public:
    struct String {
        const char* m_sz;
        size_t m_Len;
    };

    bool getline(String& out)
    {
        out.m_sz = m_sz;

        const char* sz = (char*) memchr(m_sz, '\n', m_Len);
        if (sz)
        {
            size_t len = sz - m_sz;
            m_sz = sz + 1;
            m_Len -= (len + 1);
            out.m_Len = len;

            // for Windows-format text files remove the '\r' as well
            if (len && '\r' == out.m_sz[len-1])
                out.m_Len--;
        } else
        {
            out.m_Len = m_Len;
            if (!m_Len)
                return false;
            m_Len = 0;
        }
        return true;
    }
};
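A rough usage sketch of this idea follows. It assumes MemoryMappedFileParser gains a constructor that initializes m_sz and m_Len from a mapped region; the file name "input.fastq" is only a placeholder.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    int fd = open("input.fastq", O_RDONLY);
    if (fd == -1) return 1;

    struct stat sb;
    if (fstat(fd, &sb) == -1) { close(fd); return 1; }

    const char* data = static_cast<const char*>(
        mmap(nullptr, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
    if (data == MAP_FAILED) { close(fd); return 1; }

    // Assumed constructor: MemoryMappedFileParser(const char*, size_t)
    MemoryMappedFileParser parser(data, static_cast<size_t>(sb.st_size));
    MemoryMappedFileParser::String line;
    while (parser.getline(line))
    {
        // line.m_sz points into the mapping, line.m_Len is its length;
        // no std::string copies are made here.
    }

    munmap(const_cast<char*>(data), sb.st_size);
    close(fd);
}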
If _seqs and _names are std::vectors and you can guess their final size before processing the whole 3 GB of data, you can use reserve to avoid most of the memory reallocation while pushing back new elements in the loop.
You should also be aware that the vectors effectively produce another copy of parts of the file in main memory. So unless your main memory is sufficiently large to store the text file plus the vector and its contents, you will probably end up with a number of page faults that also have a significant influence on the speed of your program.
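A minimal sketch of the reserve idea; the member names and element types below are guesses modeled on the question, and the sizes are placeholders:

#include <string>
#include <vector>

std::vector<std::string> _names;
std::vector<char> _seqs;

void prepareBuffers(size_t expectedReads, size_t readLength)
{
    // One reservation up front instead of many incremental reallocations
    // (and string copies) while pushing back inside the read loop.
    _names.reserve(expectedReads);
    _seqs.reserve(expectedReads * readLength);
}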
You are apparently using <stdio.h>, since you are using getline.
Perhaps fopen-ing the file with fopen(path, "rm"); might help, because the m (a GNU extension) tells it to use mmap for reading.
Perhaps setting a big buffer (e.g. half a megabyte) with setbuffer could also help.
Probably, using the readahead system call (perhaps in a separate thread) could help.
But all of these are guesses. You should really measure things.
General suggestions:
Code the simplest, most straight-forward, clean approach,
Measure,
Measure,
Measure,
Then if all else fails:
Read raw bytes (read(2)) in page-aligned chunks. Do so sequentially, so the kernel's read-ahead plays to your advantage (see the sketch after this list).
Re-use the same buffer to minimize cache flushing.
Avoid copying data, parse in place, pass around pointers (and sizes).
mmap(2)-ing [parts of the] file is another approach. This also avoids kernel-userland copy.
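A sketch of the chunked read(2) approach from the list above; the chunk size and file name are placeholders, and the parsing itself is left as a comment:

#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    const size_t CHUNK = 1 << 20;               // 1 MiB, a multiple of the page size
    int fd = open("input.fastq", O_RDONLY);     // placeholder file name
    if (fd == -1) return 1;

    // One page-aligned buffer, reused for every chunk.
    char* buf = static_cast<char*>(std::aligned_alloc(4096, CHUNK));
    if (!buf) { close(fd); return 1; }

    ssize_t n;
    while ((n = read(fd, buf, CHUNK)) > 0)
    {
        // Parse the n bytes in place here: walk the buffer with pointers,
        // remember any partial line at the end and carry it into the next chunk.
    }

    std::free(buf);
    close(fd);
}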
Depending on your disk speed, using a very fast decompression algorithm might help, like fastlz (there are at least two others that might be more efficient, but under GPL, so the licence can be a problem).
Also, using C++ data structures and functions can increase the speed, as you can maybe achieve better compile-time optimization. Going the C way isn't always the fastest! Under some bad conditions, using char* means you need to parse the whole string to reach the \0, yielding disastrous performance.
For parsing your data, using boost::spirit::qi is also probably the most optimized approach http://alexott.blogspot.com/2010/01/boostspirit2-vs-atoi.html
I'm using code like the following to check whether a file has been created before continuing. The thing is, the file shows up in the file browser much earlier than it is detected by stat... is there a problem with doing this?
//... do something
struct stat buf;
while (stat("myfile.txt", &buf))
    sleep(1);
//... do something else
Alternatively, is there a better way to check whether a file exists?
Using inotify, you can arrange for the kernel to notify you when a change to the file system (such as a file creation) takes place. This may well be what your file browser is using to know about the file so quickly.
The "stat" system call is collecting different information about the file, such as, for example, a number of hard links pointing to it or its "inode" number. You might want to look at the "access" system call which you can use to perform existence check only by specifying "F_OK" flag in "mode".
There is, however, a little problem with your code. It puts the process to sleep for a second every time it checks for file which doesn't exist yet. To avoid that, you have to use inotify API, as suggested by Jerry Coffin, in order to get notified by kernel when file you are waiting for is there. Keep in mind that inotify does not notify you if file is already there, so in fact you need to use both "access" and "inotify" to avoid a race condition when you started watching for a file just after it was created.
There is no better or faster way to check if file exists. If your file browser still shows the file slightly faster than this program detects it, then Greg Hewgill's idea about renaming is probably taking place.
Here is a C++ code example that sets up an inotify watch, checks if file already exists and waits for it otherwise:
#include <cstdio>
#include <cstring>
#include <string>

#include <unistd.h>
#include <sys/inotify.h>

int
main ()
{
  const std::string directory = "/tmp";
  const std::string filename = "test.txt";
  const std::string fullpath = directory + "/" + filename;

  int fd = inotify_init ();
  int watch = inotify_add_watch (fd, directory.c_str (),
                                 IN_MODIFY | IN_CREATE | IN_MOVED_TO);

  if (access (fullpath.c_str (), F_OK) == 0)
    {
      printf ("File %s exists.\n", fullpath.c_str ());
      return 0;
    }

  char buf [1024 * (sizeof (inotify_event) + 16)];
  ssize_t length;
  bool isCreated = false;

  while (!isCreated)
    {
      length = read (fd, buf, sizeof (buf));
      if (length < 0)
        break;

      inotify_event *event;
      for (size_t i = 0; i < static_cast<size_t> (length);
           i += sizeof (inotify_event) + event->len)
        {
          event = reinterpret_cast<inotify_event *> (&buf[i]);
          if (event->len > 0 && filename == event->name)
            {
              printf ("The file %s was created.\n", event->name);
              isCreated = true;
              break;
            }
        }
    }

  inotify_rm_watch (fd, watch);
  close (fd);
}
Your code checks whether the file is there every second. You can use inotify to get an event instead.
If I have a large binary file (say it has 100,000,000 floats), is there a way in C (or C++) to open the file and read a specific float, without having to load the whole file into memory (i.e. how can I quickly find what the 62,821,214th float is)? A second question, is there a way to change that specific float in the file without having to rewrite the entire file?
I'm envisioning functions like:
float readFloatFromFile(const char* fileName, int idx) {
    FILE* f = fopen(fileName, "rb");
    // What goes here?
}

void writeFloatToFile(const char* fileName, int idx, float f) {
    // How do I open the file? fopen can only append or start a new file, right?
    // What goes here?
}
You know the size of a float is sizeof(float), so multiplication can get you to the correct position:
FILE *f = fopen(fileName, "rb");
fseek(f, idx * sizeof(float), SEEK_SET);
float result;
fread(&result, sizeof(float), 1, f);
Similarly, you can write to a specific position using this method.
fopen allows you to open a file for modification (not just appending) by using either the rb+ or wb+ mode. See here: http://www.cplusplus.com/reference/clibrary/cstdio/fopen/
To position the file at a specific float, you can use fseek with index*sizeof(float) as the offset and SEEK_SET as the origin. See here: http://www.cplusplus.com/reference/clibrary/cstdio/fseek/
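Putting those two points together, a minimal sketch of the write function from the question could look like this (error handling kept to a bare minimum):

#include <cstdio>

void writeFloatToFile(const char* fileName, int idx, float f)
{
    // "rb+" opens an existing file for reading and writing without truncating it.
    FILE* fp = fopen(fileName, "rb+");
    if (!fp) return;

    // Jump straight to the idx-th float and overwrite it in place.
    fseek(fp, (long)idx * (long)sizeof(float), SEEK_SET);
    fwrite(&f, sizeof(float), 1, fp);

    fclose(fp);
}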
Here is an example if you would like to use C++ streams:
#include <fstream>
using namespace std;

int main()
{
    // open for both reading and writing in binary mode
    fstream file("floats.bin", ios::in | ios::out | ios::binary);
    float number;

    file.seekg(62821214 * sizeof(float), ios::beg); // position the get pointer
    file.read(reinterpret_cast<char*>(&number), sizeof(float));

    file.seekp(0, ios::beg); // move the put pointer to the beginning of the file
    number = 3.2f;
    // write number at the beginning of the file
    file.write(reinterpret_cast<char*>(&number), sizeof(float));
}
One way would be to call mmap() on the file. Once you've done that, you can read/modify the file as if it was an in-memory array.
Of course that method only works if the file is small enough to fit in your process's address space... if you're running in 64-bit mode, you'll be fine; in 32-bit mode, a file with 100,000,000 floats should fit, but another order or two of magnitude above that and you might run into trouble.
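A minimal sketch of that approach for the question's use case; there is no error handling, the function name is made up for illustration, and the file must already be large enough to contain the requested index:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

float readAndUpdateFloat(const char* fileName, size_t idx, float newValue)
{
    int fd = open(fileName, O_RDWR);
    struct stat sb;
    fstat(fd, &sb);

    // Map the whole file read/write; MAP_SHARED so stores go back to the file.
    float* data = static_cast<float*>(
        mmap(nullptr, sb.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    float old = data[idx];   // read the idx-th float
    data[idx] = newValue;    // modify it in place

    munmap(data, sb.st_size);
    close(fd);
    return old;
}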
I know this question has been answered already, but Linux/Unix provides easy system calls to read/write in the middle of a file: pread and pwrite. If you look at the kernel source code for the system calls read and pread, both eventually call vfs_read(), and vfs_read() requires an offset, i.e. a position in the file to read from. With pread this offset is supplied by the caller, while with read() the offset is maintained internally by the kernel for the file descriptor. pread() offers excellent performance compared to read(), and with pread/pwrite you can read and write through the same file descriptor simultaneously from multiple threads in different parts of the file. In my humble opinion, prefer pread() over read() and other file streams. Hopefully the file-stream libraries wrap the read() calls well; streams perform well by making fewer system calls.
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
    float value;                            /* buffer for the float being read */
    size_t idx = 62821214;                  /* index of the float to read */
    off_t offToStart = idx * sizeof(float);
    size_t sizeToRead = sizeof(float);

    int fd = open("fileName", O_RDONLY);
    ssize_t ret = pread(fd, &value, sizeToRead, offToStart);
    /* process 'value' here if ret == sizeToRead */
    close(fd);
}
I'm using the following code to try to read the results of a df command in Linux using popen.
#include <iostream> // file and std I/O functions

int main(int argc, char** argv) {
    FILE* fp;
    char * buffer;
    long bufSize;
    size_t ret_code;

    fp = popen("df", "r");
    if(fp == NULL) { // head off errors reading the results
        std::cerr << "Could not execute command: df" << std::endl;
        exit(1);
    }

    // get the size of the results
    fseek(fp, 0, SEEK_END);
    bufSize = ftell(fp);
    rewind(fp);

    // allocate the memory to contain the results
    buffer = (char*)malloc( sizeof(char) * bufSize );
    if(buffer == NULL) {
        std::cerr << "Memory error." << std::endl;
        exit(2);
    }

    // read the results into the buffer
    ret_code = fread(buffer, 1, sizeof(buffer), fp);
    if(ret_code != bufSize) {
        std::cerr << "Error reading output." << std::endl;
        exit(3);
    }

    // print the results
    std::cout << buffer << std::endl;

    // clean up
    pclose(fp);
    free(buffer);
    return (EXIT_SUCCESS);
}
This code is giving me a "Memory error" with an exit status of '2', so I can see where it's failing, I just don't understand why.
I put this together from example code that I found on Ubuntu Forums and C++ Reference, so I'm not married to it. If anyone can suggest a better way to read the results of a system() call, I'm open to new ideas.
EDIT to the original: Okay, bufSize is coming up negative, and now I understand why. You can't randomly access a pipe, as I naively tried to do.
I can't be the first person to try to do this. Can someone give (or point me to) an example of how to read the results of a system() call into a variable in C++?
You're making this all too hard. popen(3) returns a regular old FILE * for a standard pipe file, which is to say, newline-terminated records. You can read it with very high efficiency by using fgets(3), like so in C:
#include <stdio.h>

char bfr[BUFSIZ];
FILE * fp;

// ...

if ((fp = popen("/bin/df", "r")) == NULL) {
    // error processing and return
}

// ...

while (fgets(bfr, BUFSIZ, fp) != NULL) {
    // process a line
}
In C++ it's even easier --
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>

FILE * fp;

if ((fp = popen("/bin/df", "r")) == NULL) {
    // error processing and exit
}

// Note: constructing an ifstream from a file descriptor is a non-standard
// extension and is not available in all standard library implementations.
std::ifstream ins(fileno(fp));

std::string s;
while (std::getline(ins, s)) {
    // do something
}
There's some more error handling there, but that's the idea. The point is that you treat the FILE * from popen just like any FILE *, and read it line by line.
Why would std::malloc() fail?
The obvious reason is "because std::ftell() returned a negative signed number, which was then treated as a huge unsigned number".
According to the documentation, std::ftell() returns -1 on failure. One obvious reason it would fail is that you cannot seek in a pipe or FIFO.
There is no escape; you cannot know the length of the command output without reading it, and you can only read it once. You have to read it in chunks, either growing your buffer as needed or parsing on the fly.
But, of course, you can simply avoid the whole issue by directly using the system call df probably uses to get its information: statvfs().
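For example, a minimal sketch of querying a filesystem with statvfs() (the path "/" is a placeholder, and df's actual output involves more bookkeeping than this):

#include <cstdio>
#include <sys/statvfs.h>

int main()
{
    struct statvfs vfs;
    if (statvfs("/", &vfs) == 0)
    {
        // f_frsize is the fragment size in which the block counts are expressed.
        unsigned long long total = (unsigned long long)vfs.f_blocks * vfs.f_frsize;
        unsigned long long avail = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
        std::printf("total: %llu bytes, available: %llu bytes\n", total, avail);
    }
}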
(A note on terminology: "system call" in Unix and Linux generally refers to calling a kernel function from user-space code. Referring to it as "the results of a system() call" or "the results of a system(3) call" would be clearer, but it would probably be better to just say "capturing the output of a process.")
Anyway, you can read a process's output just like you can read any other file. Specifically:
You can start the process using pipe(), fork(), and exec(). This gives you a file descriptor, then you can use a loop to read() from the file descriptor into a buffer and close() the file descriptor once you're done. This is the lowest level option and gives you the most control.
You can start the process using popen(), as you're doing. This gives you a file stream. In a loop, you can read from the stream into a temporary variable or buffer using fread(), fgets(), or fgetc(), as Zarawesome's answer demonstrates, then process that buffer or append it to a C++ string.
You can start the process using popen(), then use the nonstandard __gnu_cxx::stdio_filebuf to wrap that, then create an std::istream from the stdio_filebuf and treat it like any other C++ stream. This is the most C++-like approach. Here's part 1 and part 2 of an example of this approach.
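A sketch of that third approach; it relies on the non-standard libstdc++ extension __gnu_cxx::stdio_filebuf, so it is GCC-specific, and the df command is just the one from the question:

#include <cstdio>
#include <ext/stdio_filebuf.h>   // libstdc++-specific
#include <iostream>
#include <string>

int main()
{
    FILE* fp = popen("df", "r");
    if (!fp) return 1;

    {
        // Wrap the FILE* in a stream buffer and hand it to an istream.
        __gnu_cxx::stdio_filebuf<char> buf(fp, std::ios::in);
        std::istream in(&buf);

        std::string line;
        while (std::getline(in, line))
            std::cout << line << '\n';
    }   // buf is destroyed here; it does not close the underlying FILE*

    pclose(fp);
}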
I'm not sure you can fseek/ftell pipe streams like this.
Have you checked the value of bufSize? One reason malloc might be failing is an insanely sized buffer.
Thanks to everyone who took the time to answer. A co-worker pointed me to the ostringstream class. Here's some example code that does essentially what I was attempting to do in the original question.
#include <cstdio>   // popen, fread, feof, ferror
#include <iostream> // cout
#include <sstream>  // ostringstream
#include <string>   // string

int main(int argc, char** argv) {
    FILE* stream = popen( "df", "r" );
    std::ostringstream output;

    while( !feof( stream ) && !ferror( stream ))
    {
        char buf[128];
        int bytesRead = fread( buf, 1, 128, stream );
        output.write( buf, bytesRead );
    }

    std::string result = output.str();
    std::cout << "<RESULT>" << std::endl << result << "</RESULT>" << std::endl;

    pclose( stream );
    return (0);
}
To answer the question in the update:
char buffer[1024];
char *line = NULL;
while ((line = fgets(buffer, sizeof buffer, fp)) != NULL) {
    // parse one line of df's output here.
}
Would this be enough?
First thing to check is the value of bufSize - if that happens to be <= 0, chances are that malloc returns a NULL as you're trying to allocate a buffer of size 0 at that point.
Another workaround would be to ask malloc to provide you with a buffer of the size (bufSize + n) with n >= 1, which should work around this particular problem.
That aside, the code you posted is pure C, not C++, so including <iostream> is overdoing it a little.
Check your bufSize. ftell can return -1 on error, and this can lead to malloc not allocating anything, leaving buffer with a NULL value.
The reason ftell fails is the popen: you can't seek in pipes.
Pipes are not random access. They're sequential, which means that once you read a byte, the pipe is not going to send it to you again. Which means, obviously, you can't rewind it.
If you just want to output the data back to the user, you can just do something like:
// your file opening code

int c;
while ((c = getc(fp)) != EOF)   // getc() returns EOF (an int) at end of stream
{
    std::cout << static_cast<char>(c);
}
This will pull bytes out of the df pipe, one by one, and pump them straight into the output.
Now if you want to access the df output as a whole, you can either pipe it into a file and read that file, or concatenate the output into a construct such as a C++ std::string.