using stat to detect whether a file exists (slow?) - c++

I'm using code like the following to check whether a file has been created before continuing. The thing is, the file shows up in the file browser well before it is detected by stat... is there a problem with doing this?
// ... do something
struct stat buf;
while (stat("myfile.txt", &buf))
    sleep(1);
// ... do something else
Alternatively, is there a better way to check whether a file exists?

Using inotify, you can arrange for the kernel to notify you when a change to the file system (such as a file creation) takes place. This may well be what your file browser is using to know about the file so quickly.

The "stat" system call is collecting different information about the file, such as, for example, a number of hard links pointing to it or its "inode" number. You might want to look at the "access" system call which you can use to perform existence check only by specifying "F_OK" flag in "mode".
There is, however, a little problem with your code. It puts the process to sleep for a second every time it checks for file which doesn't exist yet. To avoid that, you have to use inotify API, as suggested by Jerry Coffin, in order to get notified by kernel when file you are waiting for is there. Keep in mind that inotify does not notify you if file is already there, so in fact you need to use both "access" and "inotify" to avoid a race condition when you started watching for a file just after it was created.
There is no better or faster way to check if file exists. If your file browser still shows the file slightly faster than this program detects it, then Greg Hewgill's idea about renaming is probably taking place.
Here is a C++ code example that sets up an inotify watch, checks if file already exists and waits for it otherwise:
#include <cstdio>
#include <cstring>
#include <string>
#include <unistd.h>
#include <sys/inotify.h>

int main()
{
    const std::string directory = "/tmp";
    const std::string filename = "test.txt";
    const std::string fullpath = directory + "/" + filename;

    // Set up the watch first, then check for existence, so a file created
    // in between is not missed.
    int fd = inotify_init();
    int watch = inotify_add_watch(fd, directory.c_str(),
                                  IN_MODIFY | IN_CREATE | IN_MOVED_TO);

    if (access(fullpath.c_str(), F_OK) == 0)
    {
        printf("File %s exists.\n", fullpath.c_str());
        inotify_rm_watch(fd, watch);
        close(fd);
        return 0;
    }

    char buf[1024 * (sizeof(inotify_event) + 16)];
    ssize_t length;
    bool isCreated = false;

    while (!isCreated)
    {
        length = read(fd, buf, sizeof(buf));
        if (length < 0)
            break;

        // The buffer may contain several variable-length events.
        inotify_event *event;
        for (size_t i = 0; i < static_cast<size_t>(length);
             i += sizeof(inotify_event) + event->len)
        {
            event = reinterpret_cast<inotify_event *>(&buf[i]);
            if (event->len > 0 && filename == event->name)
            {
                printf("The file %s was created.\n", event->name);
                isCreated = true;
                break;
            }
        }
    }

    inotify_rm_watch(fd, watch);
    close(fd);
}

Your code checks whether the file is there once every second. You can use inotify to get an event instead.


C++: close an open() file read with mmap

I am working with mmap() to read big files quickly, basing my code on the answers to this question (Fast textfile reading in c++).
I am using the second version from sehe's answer:
#include <algorithm>
#include <iostream>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
// for mmap:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

const char* map_file(const char* fname, size_t& length);

int main()
{
    size_t length;
    auto f = map_file("test.cpp", length);
    auto l = f + length;

    uintmax_t m_numLines = 0;
    while (f && f != l)
        if ((f = static_cast<const char*>(memchr(f, '\n', l - f))))
            m_numLines++, f++;

    std::cout << "m_numLines = " << m_numLines << "\n";
}

void handle_error(const char* msg) {
    perror(msg);
    exit(255);
}

const char* map_file(const char* fname, size_t& length)
{
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        handle_error("open");

    // obtain file size
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        handle_error("fstat");
    length = sb.st_size;

    const char* addr = static_cast<const char*>(
        mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0u));
    if (addr == MAP_FAILED)
        handle_error("mmap");

    // TODO close fd at some point in time, call munmap(...)
    return addr;
}
and it works just great.
But if I implement it over a loop of several files (I just change the main() function name to:
void readFile(std::string &nomeFile) {
and then get the file content into the "f" object inside that function with:
size_t length;
auto f = map_file(nomeFile.c_str(), length);
auto l = f + length;
and call it from main() in a loop over a list of filenames), after a while I got:
open: Too many open files
I imagine there must be a way to close the file opened by open() after working on it, but I cannot figure out how and where to do it exactly. I tried:
int fc = close(fd);
at the end of the readFile() function, but it changed nothing.
Thanks a lot in advance for any help!
EDIT:
after the important suggestions I received, I made some performance comparisons between different approaches using mmap() and std::cin(); for the results, see fast file reading in C++, comparison of different strategies with mmap() and std::cin() results interpretation.
Limit to the number of concurrently open files
As you can imagine, keeping a file open consumes resources. So there is in any case a practical limit to the number of open file descriptors on your system. This is why it's highly recommended to close files that you no longer need.
The exact limit depends on the OS and the configuration. If you want to know more, there are already a lot of answers available for this kind of question.
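If you want to check it from code, here is a minimal sketch using getrlimit() on a POSIX system (the printed values depend entirely on your configuration):
#include <cstdio>
#include <sys/resource.h>

int main()
{
    struct rlimit lim;
    // RLIMIT_NOFILE is the maximum number of file descriptors this
    // process may have open at the same time.
    if (getrlimit(RLIMIT_NOFILE, &lim) == 0)
        std::printf("soft limit: %llu, hard limit: %llu\n",
                    static_cast<unsigned long long>(lim.rlim_cur),
                    static_cast<unsigned long long>(lim.rlim_max));
    return 0;
}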
Special case of mmap
Obviously, with mmap() you open a file, and doing so repeatedly in a loop risks hitting the fatal file descriptor limit sooner or later, as you experienced.
The idea of trying to close the file is not bad. The problem is that it does not work. This is specified in the POSIX documentation:
The mmap() function adds an extra reference to the file associated
with the file descriptor fildes which is not removed by a subsequent
close() on that file descriptor. This reference is removed when there
are no more mappings to the file.
Why? Because mmap() links the file in a special way to the virtual memory management in your system, and the file will be needed as long as you use the address range to which it was mapped.
So how do you remove those mappings? The answer is to use munmap():
The function munmap() removes any mappings for those entire pages
containing any part of the address space of the process starting at
addr and continuing for len bytes.
And of course, close() the file descriptor that you no longer need. A prudent approach would be to close after munmap(), but in principle, at least on a POSIX compliant system, it should not matter when you close. Nevertheless, check your latest OS documentation to be on the safe side :-)
Note: file mapping is also available on Windows; the documentation about closing the handles is ambiguous about potential memory leaks if there are remaining mappings. This is why I recommend prudence about the closing moment.
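To make this concrete, here is a minimal sketch of how the readFile() from the question could release everything on each iteration. It assumes map_file is changed to also hand back the file descriptor; that extra out-parameter is my addition, not part of the original code:
#include <cstdio>
#include <string>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

// Variant of map_file that also hands back the descriptor (hypothetical signature).
const char* map_file(const char* fname, size_t& length, int& fd)
{
    fd = open(fname, O_RDONLY);
    if (fd == -1) { perror("open"); return nullptr; }

    struct stat sb;
    if (fstat(fd, &sb) == -1) { perror("fstat"); close(fd); return nullptr; }
    length = sb.st_size;

    const char* addr = static_cast<const char*>(
        mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0u));
    if (addr == MAP_FAILED) { perror("mmap"); close(fd); return nullptr; }
    return addr;
}

void readFile(const std::string& nomeFile)
{
    size_t length = 0;
    int fd = -1;
    const char* f = map_file(nomeFile.c_str(), length, fd);
    if (!f)
        return;

    // ... count lines or otherwise process [f, f + length) ...

    // Release the mapping first, then the descriptor, so neither leaks
    // across loop iterations.
    munmap(const_cast<char*>(f), length);
    close(fd);
}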

Why is data corrupt when reading back from a file as it's being written with O_DIRECT

I have a C++ program that uses the POSIX API to write a file opened with O_DIRECT. Concurrently, another thread is reading back from the same file via a different file descriptor. I've noticed that occasionally the data read back from the file contains all zeroes, rather than the actual data I wrote. Why is this?
Here's an MCVE in C++17. Compile with g++ -std=c++17 -Wall -otest test.cpp or equivalent. Sorry I couldn't seem to make it any shorter. All it does is write 100 MiB of constant bytes (0x5A) to a file in one thread and read them back in another, printing a message if any of the read-back bytes are not equal to 0x5A.
WARNING, this MCVE will delete and rewrite any file in the current working directory named foo.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <thread>

#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

constexpr size_t CHUNK_SIZE = 1024 * 1024;
constexpr size_t TOTAL_SIZE = 100 * CHUNK_SIZE;

int main(int argc, char *argv[])
{
    ::unlink("foo");

    std::thread write_thread([]()
    {
        int fd = ::open("foo", O_WRONLY | O_CREAT | O_DIRECT, 0777);
        if (fd < 0) std::exit(-1);

        uint8_t *buffer = static_cast<uint8_t *>(
            std::aligned_alloc(4096, CHUNK_SIZE));
        std::fill(buffer, buffer + CHUNK_SIZE, 0x5A);

        size_t written = 0;
        while (written < TOTAL_SIZE)
        {
            ssize_t rv = ::write(fd, buffer,
                                 std::min(TOTAL_SIZE - written, CHUNK_SIZE));
            if (rv < 0) { std::cerr << "write error" << std::endl; std::exit(-1); }
            written += rv;
        }
    });

    std::thread read_thread([]()
    {
        int fd = ::open("foo", O_RDONLY, 0);
        if (fd < 0) std::exit(-1);

        uint8_t *buffer = new uint8_t[CHUNK_SIZE];

        size_t checked = 0;
        while (checked < TOTAL_SIZE)
        {
            ssize_t rv = ::read(fd, buffer, CHUNK_SIZE);
            if (rv < 0) { std::cerr << "read error" << std::endl; std::exit(-1); }
            for (ssize_t i = 0; i < rv; ++i)
                if (buffer[i] != 0x5A)
                    std::cerr << "readback mismatch at offset " << checked + i << std::endl;
            checked += rv;
        }
    });

    write_thread.join();
    read_thread.join();
}
(Details such as proper error checking and resource management are omitted here for the sake of the MCVE. This is not my actual program but it shows the same behavior.)
I'm testing on Linux 4.15.0 with an SSD. About 1/3 of the time I run the program, the "readback mismatch" message prints. Sometimes it doesn't. In all cases, if I examine foo after the fact I find that it does contain the correct data.
If you remove O_DIRECT from the ::open() flags in the write thread, the problem goes away and the "readback mismatch" message never prints.
I could understand my ::read() returning 0 or something to indicate that I've already read everything that has been flushed to disk so far. But I can't understand why it would perform what appears to be a successful read, yet return data other than what I wrote. Clearly I'm missing something, but what is it?
So, O_DIRECT has some additional constraints that might not make it what you're looking for:
Applications should avoid mixing O_DIRECT and normal I/O to the same
file, and especially to overlapping byte regions in the same file.
Even when the filesystem correctly handles the coherency issues in
this situation, overall I/O throughput is likely to be slower than
using either mode alone.
Instead, I think O_SYNC might be better, since it does provide the expected guarantees:
O_SYNC provides synchronized I/O file integrity completion, meaning
write operations will flush data and all associated metadata to the
underlying hardware. O_DSYNC provides synchronized I/O data
integrity completion, meaning write operations will flush data to the
underlying hardware, but will only flush metadata updates that are
required to allow a subsequent read operation to complete
successfully. Data integrity completion can reduce the number of
disk operations that are required for applications that don't need
the guarantees of file integrity completion.

inotify stops monitoring file when the file is deleted and created again

I am encountering a problem when using inotify.
I use inotify to monitor changes on files. Here is my code:
int fd = inotify_init();
int wd = inotify_add_watch(fd, "/root/temp", IN_ALL_EVENTS);

int bufSize = 1000;
char *buf = new char[bufSize];
memset(buf, 0, bufSize);   // note: sizeof(buf) would only be the size of the pointer

int nBytes = read(fd, buf, bufSize - 1);
cout << nBytes << " bytes read" << endl;

inotify_event *eventPtr = (inotify_event *)buf;
int offset = 0;
while (offset < nBytes)
{
    cout << eventPtr->mask << endl;
    offset += sizeof(inotify_event) + eventPtr->len;
    eventPtr = (inotify_event *)(buf + offset);
}

delete [] buf;
If I delete "/root/temp" and re-create such a file, any changes to this file is not monitored by inotify, anyone how is this? Thanks.
That's because inotify monitors the underlying inode, not the filename. When you delete the file, the inode you're watching becomes invalid, so you must invoke inotify_rm_watch. If you want to monitor a new file with the same name but a different inode, you must detect when it's created by monitoring its parent folder.
The other two answers are correct. Another useful point is that inotify tells you when the watch is invalidated.
mask & IN_IGNORED
will be non-zero. IN_IGNORED is set when:
"Watch was removed explicitly (inotify_rm_watch(2)) or automatically (file was deleted, or file system was unmounted)."
So, as noted, when this is set, you can rewatch the file (and/or the directory if the file has not yet been re-created).
Whenever you use an API, READ THE DOCUMENTATION.
inotify works using the unique file identifier, the inode, not a filename. The entire Linux kernel works with inodes, in fact; filenames are only a means to look up inodes.
To get what you want, you need to monitor the /root directory. It will report a creation event when a file is added. If that file is named "temp", you can then add a watch on that file.
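A rough sketch of that approach (the paths and names are just the ones from the question): watch the /root directory for creation events, and re-add the watch on the file whenever IN_IGNORED tells you the previous watch was dropped.
#include <cstdio>
#include <string>
#include <unistd.h>
#include <sys/inotify.h>

int main()
{
    const std::string dir = "/root";
    const std::string name = "temp";
    const std::string path = dir + "/" + name;

    int fd = inotify_init();
    int dirWd = inotify_add_watch(fd, dir.c_str(), IN_CREATE | IN_MOVED_TO);
    int fileWd = inotify_add_watch(fd, path.c_str(), IN_ALL_EVENTS);

    alignas(inotify_event) char buf[4096];
    for (;;)
    {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;
        for (ssize_t off = 0; off < n; )
        {
            inotify_event *ev = reinterpret_cast<inotify_event *>(buf + off);
            if (ev->wd == fileWd && (ev->mask & IN_IGNORED))
                fileWd = -1;   // the watched inode is gone, the watch was dropped
            else if (ev->wd == dirWd && ev->len > 0 && name == ev->name)
                fileWd = inotify_add_watch(fd, path.c_str(), IN_ALL_EVENTS);   // re-watch the new inode
            else if (ev->wd == fileWd)
                std::printf("event mask on %s: 0x%x\n", path.c_str(), ev->mask);
            off += sizeof(inotify_event) + ev->len;
        }
    }
    close(fd);
}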

Why is calling close() after fopen() not closing?

I ran across the following code in one of our in-house dlls and I am trying to understand the behavior it was showing:
long GetFD(long* fd, const char* fileName, const char* mode)
{
    string fileMode;
    if (strlen(mode) == 0 || tolower(mode[0]) == 'w' || tolower(mode[0]) == 'o')
        fileMode = string("w");
    else if (tolower(mode[0]) == 'a')
        fileMode = string("a");
    else if (tolower(mode[0]) == 'r')
        fileMode = string("r");
    else
        return -1;

    FILE* ofp;
    ofp = fopen(fileName, fileMode.c_str());
    if (! ofp)
        return -1;

    *fd = (long)_fileno(ofp);
    if (*fd < 0)
        return -1;

    return 0;
}

long CloseFD(long fd)
{
    close((int)fd);
    return 0;
}
After repeated calls to GetFD with the matching CloseFD, the whole dll would no longer be able to do any file IO. I wrote a tester program and found that I could call GetFD 509 times, but the 510th call would fail.
Using Process Explorer, the number of Handles did not increase.
So it seems that the dll is reaching the limit on the number of open files; calling _setmaxstdio(2048) does increase the number of times we can call GetFD. Obviously, the close() isn't working quite right.
After a bit of searching, I replaced the fopen() call with:
long GetFD(long* fd, const char* fileName, const char* mode)
{
    *fd = (long)open(fileName, 2);   // 2 == O_RDWR
    if (*fd < 0)
        return -1;
    return 0;
}
Now, repeatedly calling GetFD/CloseFD works.
What is going on here?
If you open a file with fopen, you have to close it with fclose, symmetrically.
The C++ runtime must be given a chance to clean up/deallocate its inner file-related structures.
You need to use fclose with files opened via fopen, or close with files opened via open.
The standard library you are using has a static array of FILE structures. Because you are not calling fclose(), the standard library doesn't know that the underlying files have been closed, so it doesn't know it can reuse the corresponding FILE structures. You get an error after it has run out of entries in the FILE array.
fopen opens its own file descriptor, so you'd need to do an fclose(ofp) in your original function to prevent running out of file descriptors. Usually, one either uses the lower level file descriptor functions open, close OR the buffered fopen, fclose functions.
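For instance, a minimal sketch of keeping the FILE* around and pairing it with fclose(); the OpenFile/CloseFile names are illustrative, not from the original dll:
#include <cstdio>

// Sketch: hand back the FILE* so the matching fclose() can release both the
// stdio structure and the underlying OS handle.
FILE* OpenFile(const char* fileName, const char* mode)
{
    return std::fopen(fileName, mode);   // pair with CloseFile()
}

long CloseFile(FILE* fp)
{
    if (fp == nullptr)
        return -1;
    return std::fclose(fp);              // fclose() also closes the descriptor
}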
If you open the file with the fopen() function, you have to close it with fclose(). If you open it with the open() function and then try to call fclose() on it, it will not work.

How do I read the results of a system() call in C++?

I'm using the following code to try to read the results of a df command in Linux using popen.
#include <iostream> // file and std I/O functions

int main(int argc, char** argv) {
    FILE* fp;
    char * buffer;
    long bufSize;
    size_t ret_code;

    fp = popen("df", "r");
    if (fp == NULL) { // head off errors reading the results
        std::cerr << "Could not execute command: df" << std::endl;
        exit(1);
    }

    // get the size of the results
    fseek(fp, 0, SEEK_END);
    bufSize = ftell(fp);
    rewind(fp);

    // allocate the memory to contain the results
    buffer = (char*)malloc(sizeof(char) * bufSize);
    if (buffer == NULL) {
        std::cerr << "Memory error." << std::endl;
        exit(2);
    }

    // read the results into the buffer
    ret_code = fread(buffer, 1, sizeof(buffer), fp);
    if (ret_code != bufSize) {
        std::cerr << "Error reading output." << std::endl;
        exit(3);
    }

    // print the results
    std::cout << buffer << std::endl;

    // clean up
    pclose(fp);
    free(buffer);
    return (EXIT_SUCCESS);
}
This code is giving me a "Memory error" with an exit status of '2', so I can see where it's failing, I just don't understand why.
I put this together from example code that I found on Ubuntu Forums and C++ Reference, so I'm not married to it. If anyone can suggest a better way to read the results of a system() call, I'm open to new ideas.
EDIT to the original: Okay, bufSize is coming up negative, and now I understand why. You can't randomly access a pipe, as I naively tried to do.
I can't be the first person to try to do this. Can someone give (or point me to) an example of how to read the results of a system() call into a variable in C++?
You're making this all too hard. popen(3) returns a regular old FILE * for a standard pipe file, which is to say, newline terminated records. You can read it with very high efficiency by using fgets(3) like so in C:
#include <stdio.h>

char bfr[BUFSIZ];
FILE * fp;
// ...
if ((fp = popen("/bin/df", "r")) == NULL) {
    // error processing and return
}
// ...
while (fgets(bfr, BUFSIZ, fp) != NULL) {
    // process a line
}
In C++ it's even easier --
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

FILE * fp;

if ((fp = popen("/bin/df", "r")) == NULL) {
    // error processing and exit
}

ifstream ins(fileno(fp)); // ifstream ctor taking a file descriptor (a nonstandard extension)
string s;
while (getline(ins, s)) {
    // do something
}
There's some more error handling there, but that's the idea. The point is that you treat the FILE * from popen just like any FILE *, and read it line by line.
Why would std::malloc() fail?
The obvious reason is "because std::ftell() returned a negative signed number, which was then treated as a huge unsigned number".
According to the documentation, std::ftell() returns -1 on failure. One obvious reason it would fail is that you cannot seek in a pipe or FIFO.
There is no escape; you cannot know the length of the command output without reading it, and you can only read it once. You have to read it in chunks, either growing your buffer as needed or parsing on the fly.
But, of course, you can simply avoid the whole issue by directly using the system call df probably uses to get its information: statvfs().
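For instance, here is a minimal sketch of querying free space with statvfs() instead of parsing df output (the path "/" is just an example):
#include <cstdio>
#include <sys/statvfs.h>

int main()
{
    struct statvfs vfs;
    if (statvfs("/", &vfs) != 0) {   // filesystem containing "/"
        perror("statvfs");
        return 1;
    }
    // f_bavail: blocks available to unprivileged users; f_frsize: fragment (block) size.
    unsigned long long freeBytes =
        static_cast<unsigned long long>(vfs.f_bavail) * vfs.f_frsize;
    std::printf("free space on /: %llu bytes\n", freeBytes);
    return 0;
}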
(A note on terminology: "system call" in Unix and Linux generally refers to calling a kernel function from user-space code. Referring to it as "the results of a system() call" or "the results of a system(3) call" would be clearer, but it would probably be better to just say "capturing the output of a process.")
Anyway, you can read a process's output just like you can read any other file. Specifically:
You can start the process using pipe(), fork(), and exec(). This gives you a file descriptor, then you can use a loop to read() from the file descriptor into a buffer and close() the file descriptor once you're done. This is the lowest level option and gives you the most control; a sketch of this approach follows after this list.
You can start the process using popen(), as you're doing. This gives you a file stream. In a loop, you can read from the stream into a temporary variable or buffer using fread(), fgets(), or fgetc(), as Zarawesome's answer demonstrates, then process that buffer or append it to a C++ string.
You can start the process using popen(), then use the nonstandard __gnu_cxx::stdio_filebuf to wrap that, then create an std::istream from the stdio_filebuf and treat it like any other C++ stream. This is the most C++-like approach. Here's part 1 and part 2 of an example of this approach.
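Here is the sketch mentioned in the first option above: capturing a command's output with pipe(), fork(), and exec(). The run_df() helper name is just illustrative.
#include <string>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

// Sketch: capture the output of "df" via pipe()/fork()/exec().
std::string run_df()
{
    int fds[2];
    if (pipe(fds) != 0)
        return {};

    pid_t pid = fork();
    if (pid < 0) {                        // fork failed
        close(fds[0]);
        close(fds[1]);
        return {};
    }
    if (pid == 0) {                       // child: write end becomes stdout
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execlp("df", "df", (char*)nullptr);
        _exit(127);                       // exec failed
    }

    close(fds[1]);                        // parent: read until EOF
    std::string output;
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        output.append(buf, static_cast<size_t>(n));
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return output;
}
The same read loop works for any command; popen() essentially does this for you and adds stdio buffering on top.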
I'm not sure you can fseek/ftell pipe streams like this.
Have you checked the value of bufSize? One reason malloc might be failing is an insanely sized buffer.
Thanks to everyone who took the time to answer. A co-worker pointed me to the ostringstream class. Here's some example code that does essentially what I was attempting to do in the original question.
#include <cstdio>   // popen, fread
#include <iostream> // cout
#include <sstream>  // ostringstream

int main(int argc, char** argv) {
    FILE* stream = popen("df", "r");
    std::ostringstream output;

    while (!feof(stream) && !ferror(stream))
    {
        char buf[128];
        int bytesRead = fread(buf, 1, 128, stream);
        output.write(buf, bytesRead);
    }

    std::string result = output.str();
    std::cout << "<RESULT>" << std::endl << result << "</RESULT>" << std::endl;
    return (0);
}
To answer the question in the update:
char buffer[1024];
char * line = NULL;
while ((line = fgets(buffer, sizeof buffer, fp)) != NULL) {
    // parse one line of df's output here.
}
Would this be enough?
First thing to check is the value of bufSize - if that happens to be <= 0, chances are that malloc returns a NULL as you're trying to allocate a buffer of size 0 at that point.
Another workaround would be to ask malloc to provide you with a buffer of the size (bufSize + n) with n >= 1, which should work around this particular problem.
That aside, the code you posted is pure C, not C++, so including <iostream> is overdoing it a little.
Check your bufSize. ftell can return -1 on error, and this can lead to malloc not allocating, leaving buffer as a NULL pointer.
The reason ftell fails is the popen: you can't seek in pipes.
Pipes are not random access. They're sequential, which means that once you read a byte, the pipe is not going to send it to you again. Which means, obviously, you can't rewind it.
If you just want to output the data back to the user, you can just do something like:
// your file opening code
int c;
while ((c = getc(fp)) != EOF)   // getc() returns int so EOF can be detected
{
    std::cout << static_cast<char>(c);
}
This will pull bytes out of the df pipe, one by one, and pump them straight into the output.
Now if you want to access the df output as a whole, you can either pipe it into a file and read that file, or concatenate the output into a construct such as a C++ String.