Need an efficient way to handle ReadDirectoryChangesW in C++ - c++

I want to be notified about changes in a directory (new file addition/deletion/modification).
I used the ReadDirectoryChangesW API, which correctly notifies about any change in the directory. But the API accepts a buffer in which it returns the details of the file(s) added/deleted/modified in the directory.
This poses a limitation, as the amount of change is not known in advance and can sometimes be huge. For example, 1000 files may get added to the directory.
In this case I would always need to be ready with a buffer large enough to hold notifications about all 1000 files.
I don't want to always create this large buffer.
Is there any other, more efficient, alternative?

If I read the documentation correctly, it will return as many changes as fit in your buffer, and when you next call it, it will give you more changes. If you want to get 1000 files' worth of changes at once, you've got to give it a big buffer, but if you can handle them in smaller chunks, just pass in a smaller buffer and you'll get the rest of the changes on subsequent calls.

One approach is to use ReadDirectoryChangesW() simply as a way to be notified that there has been some change in the directory, and then use that notification as an event to review the directory for changes.
The idea is to discover what has changed yourself rather than depending on ReadDirectoryChangesW() to tell you what has changed.
The documentation for the function indicates that a system buffer is allocated to track changes and it is possible, with a large number of changes that the buffer allocated will overflow. This results in an error returned and requires you to discover what has changed for yourself anyway.
This article on using ReadDirectoryChangesW() may help you.
In my case, I am using the function to monitor a print spooler folder into which a number of text files might be dropped. The number of files is small, so I have just allocated a large buffer. I then use a queue to hand the list of files to print over to the actual print functionality, which runs on another thread.

Related

What is the best architecture / data structure for detecting changes in state of data

I'm working on a data structure with moving devices. I already added geohashes (open location code). And I'm able to use where >= geohash_low and where <= geohash_high to search within an area.
If I look this up, I get results within (roughly) this area. The next step is to look back per device when they entered or left the area. If I look back, I can determine this easily, but the next step is to determine it at the moment that a new message is placed in the database.
My first idea would be to create a copy of every incoming message at device/latest, monitor the changes with an onChange cloud function, and then determine whether the change went from inside to outside an area or the other way around.
But this architecture will add an extra write operation (to latest message) at every incoming message. And it will add an extra read operation (onChange).
Another approach would be to do an additional read at every incoming message to read a latest state of inside or outside an area. And compare that with the current position. That would 'cost' only an extra read at every incoming message. If the state needs to be changed, then perform an extra write.
Basically the problem is that I need to manage a state, while functions are stateless.
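As a hedged sketch, the "extra read, conditional write" idea reduces to a pure transition check: compare the stored inside/outside state with the state implied by the new message, and only write when a boundary was crossed. Everything here (the Area type, the lexicographic geohash comparison mirroring the `>= geohash_low` / `<= geohash_high` query) is illustrative:

```cpp
#include <string>

// An area expressed as a geohash range, matching the range query in the post.
struct Area { std::string geohash_low, geohash_high; };

bool inside(const Area& a, const std::string& geohash) {
    return geohash >= a.geohash_low && geohash <= a.geohash_high;
}

enum class Transition { None, Entered, Left };

// Given the previously stored state and the new position, decide whether
// a state write (and an enter/leave event) is needed at all.
Transition check(const Area& a, bool was_inside, const std::string& geohash) {
    bool now_inside = inside(a, geohash);
    if (now_inside == was_inside) return Transition::None;  // no extra write
    return now_inside ? Transition::Entered : Transition::Left;
}
```

Keeping this check pure makes the stateless-function constraint tractable: the only state is the single stored inside/outside flag read once per message.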
Any other thoughts...
Thanks a lot for your thoughts.

Is there a way prevent libcurl from buffering?

I am using libcurl with CURLOPT_WRITEFUNCTION to download a certain file.
I ask for a certain buffer size using CURLOPT_BUFFERSIZE.
When my callback function is called the first time and I get about that many bytes, much more data has actually been downloaded.
For example, if I ask for 1024 bytes of data, by the time I first get that, the process has already consumed 100K of data (based on Process Explorer and similar tools; I can see the continuous stream of data and ACKs in Wireshark), so I assume it is downloading in advance and buffering the data.
The thing I am trying to achieve here is to be able to cancel the retrieval based on first few chunks of data without downloading anything that is unnecessary.
Is there a way to prevent that sort of buffering and only download the next chunk of data once I have finished processing the current one (or at least not to buffer tens and hundreds of kilobytes)?
I would prefer the solution to be server agnostic, so CURLOPT_RANGE won't actually work here.
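One approach, sketched under assumptions: libcurl aborts a transfer with CURLE_WRITE_ERROR when the write callback returns a count different from the bytes it was handed, so you can cancel after inspecting the first chunks. This bounds further download, but cannot undo data the library or kernel has already read ahead into its buffers. The callback below is illustrative and testable on its own, without a network:

```cpp
#include <cstddef>

// State shared with the write callback. In real use you would inspect
// the incoming bytes and clear keep_going to cancel the transfer.
struct DownloadCtx {
    size_t received = 0;
    bool keep_going = true;
};

// libcurl-style write callback: returning any value other than
// size * nmemb makes libcurl abort the transfer (CURLE_WRITE_ERROR).
size_t write_cb(char* /*data*/, size_t size, size_t nmemb, void* userp) {
    auto* ctx = static_cast<DownloadCtx*>(userp);
    const size_t bytes = size * nmemb;
    ctx->received += bytes;
    if (!ctx->keep_going)
        return 0;                 // mismatch => libcurl cancels the transfer
    return bytes;                 // accept this chunk and continue
}
```

Hook it up with curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb) and curl_easy_setopt(curl, CURLOPT_WRITEDATA, &ctx). Note that CURLOPT_BUFFERSIZE only requests a preferred per-callback chunk size; it is not a cap on read-ahead.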

Is FindFirstChangeNotification API doing any disk access? [duplicate]

I've used FileSystemWatcher in the past. However, I am hoping someone can explain how it actually works behind the scenes.
I plan to utilize it in an application I am making and it would monitor about 5 drives and maybe 300,000 files.
Does the FileSystemWatcher actually do "checking" on the drive - as in, will it be causing wear and tear on the drive? Also, does it impact the hard drive's ability to "sleep"?
This is where I do not understand how it works - whether it is scanning the drives on a timer, or waiting for some type of notification from the OS before it does anything.
I just do not want to implement something that is going to cause extra reads on a drive and keep the drive from sleeping.
Nothing like that. The file system driver simply monitors the normal file operations requested by other programs that run on the machine against the filters you've selected. If there's a match, it adds an entry to an internal buffer that records the operation and the filename, completes the driver request, and raises an event in your program. You'll get the details of the operation passed to you from that buffer.
So nothing extra happens to the operations themselves; there is no extra disk activity at all. It is all just software that runs. The overhead is minimal; nothing slows down noticeably.
The short answer is no. The FileSystemWatcher calls the ReadDirectoryChangesW API passing it an asynchronous flag. Basically, Windows will store data in an allocated buffer when changes to a directory occur. This function returns the data in that buffer and the FileSystemWatcher converts it into nice notifications for you.

Reading file that changes over time C++

I am going to read a file in C++. The reading itself is happening in a while-loop, and is reading from one file.
When the function reads information from the file, it is going to push this information up some place in the system. The problem is that this file may change while the loop is ongoing.
How can I catch the new information in the file? I tried std::ifstream reading while changing the file manually on my computer as the endless loop (with a sleep(2) between iterations) was running, but as expected -- nothing happened.
EDIT: the file will overwrite itself at each new entry of data to the file.
Help?
Running Ubuntu Linux 12.04 in VirtualBox, if that may be useful info. And this is not homework.
The usual solution is something along the lines of what MichaelH
proposes: the writing process opens the file in append mode, and
always writes to the end. The reading process does what
MichaelH suggests.
This works fine for small amounts of data in each run. If the
processes are supposed to run a long time, and generate a lot of
data, the file will eventually become too big, since it will
contain all of the processed data as well. In this case, the
solution is to use a directory, generating numbered files in it,
one file per data record. The writing process will write each
data set to a new file (incrementing the number), and the
reading process will try to open the new file, and delete it
when it has finished. This is considerably more complex than
the first suggestion, but will work even for processes
generating large amounts of data per second and running for
years.
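A sketch of that numbered-files scheme, assuming C++17 std::filesystem (the file naming and record format are my own): the writer creates each record under a temporary name and renames it, so the reader never sees a half-written file.

```cpp
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Writer: each record goes into its own numbered file. Writing to a
// temporary name and renaming makes the record appear atomically.
void write_record(const fs::path& dir, unsigned seq, const std::string& data) {
    char name[32];
    std::snprintf(name, sizeof(name), "rec_%08u.dat", seq);
    fs::path tmp = dir / (std::string(name) + ".tmp");
    { std::ofstream out(tmp); out << data; }
    fs::rename(tmp, dir / name);       // atomic on the same filesystem
}

// Reader: consume the lowest-numbered finished record, then delete it.
std::optional<std::string> read_next(const fs::path& dir) {
    fs::path lowest;
    for (const auto& e : fs::directory_iterator(dir)) {
        const auto n = e.path().filename().string();
        if (n.size() > 4 && n.substr(n.size() - 4) == ".dat")   // skip .tmp
            if (lowest.empty() || e.path().filename() < lowest.filename())
                lowest = e.path();
    }
    if (lowest.empty()) return std::nullopt;
    std::string data;
    { std::ifstream in(lowest); std::getline(in, data); }
    fs::remove(lowest);
    return data;
}
```

The zero-padded sequence number keeps lexicographic order equal to numeric order, so the reader processes records in the order they were written.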
EDIT:
Later comments by the OP say that the device is actually a FIFO.
In that case:
you can't seek, so MichaelH's suggestion can't be used
literally, but
you don't need to seek, since data is automatically removed
from the FIFO whenever it has been read, and
depending on the size of the data, and how it is written, the
writes may be atomic, so you don't have to worry about partial
records, even if you happen to read exactly in the middle of
a write.
With regards to the latter: make sure that both the read and
write buffers are large enough to contain a complete record, and
that the writer flushes after each record. And make sure that
the records are smaller than the size needed to guarantee
atomicity. (Historically, on the early Unix I know, this was
4096, but I would be surprised if it hasn't increased since
then. Although... Under Posix, this is defined by PIPE_BUF,
which is only guaranteed to be at least 512, and is only 4096
under modern Linux.)
Just read the file, rename the file, and open the renamed file. Do the processing of data into your system, and at the end of the loop close the file. After a sleep, re-open the file at the top of the while loop, rename it, and repeat.
That's the simplest way to approach the problem and saves having to write code to process dynamic changes to the file during the processing stage.
To be absolutely sure you don't get any corruption, it's best to rename the file. This guarantees that any changes from another process do not affect the processing. It may not be necessary to do this - it depends on the processing and how the file is updated - but it's the safer approach. A move or rename operation is guaranteed to be atomic, so there should be no concurrency issues with this approach.
You can use inotify to watch file changes.
If you need a simpler solution - read the file attributes (with stat()) and check the last modification time (st_mtime) of the file.
However, you may still miss some file modifications while you are opening and rereading the file. So if you have control over the application that writes to the file, I'd recommend using something else to communicate between these processes - pipes, for example.
To be more explicit, if you want tail-like behavior you'll want to:
Open the file, read in the data. Save the length. Close the file.
Wait for a bit.
Open the file again, attempt to seek to the last read position, read the remaining data, close.
rinse and repeat
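Those steps can be sketched as follows (reopening on each poll; a shrinking file is treated as truncation and the position resets, which matches the OP's note that the file overwrites itself):

```cpp
#include <fstream>
#include <string>

// Read anything appended past `last_pos`, then advance `last_pos`.
// Reopening each time means we see whatever the writer flushed since
// the last poll. Returns an empty string when nothing new is available.
std::string read_new_data(const std::string& path, std::streampos& last_pos) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    in.seekg(0, std::ios::end);
    std::streampos end = in.tellg();
    if (end <= last_pos) {
        if (end < last_pos) last_pos = 0;   // file shrank: start over
        return {};
    }
    in.seekg(last_pos);
    std::string chunk(static_cast<size_t>(end - last_pos), '\0');
    in.read(&chunk[0], chunk.size());
    last_pos = end;
    return chunk;
}
```

Call it in the polling loop with a sleep between iterations, keeping the std::streampos across calls.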

FindFirstChangeNotification is notifying about changes twice

I want to monitor a folder in my file system. Let say I want to monitor the folder: C:\MyNewFolder
I have this code to do it:
HANDLE ChangeHandle = FindFirstChangeNotification(_T("C:\\MyNewFolder"), FALSE, FILE_NOTIFY_CHANGE_LAST_WRITE);
for (;;)
{
    DWORD Wait = WaitForSingleObject(ChangeHandle, INFINITE);
    if (Wait == WAIT_OBJECT_0)
    {
        MessageBox(NULL, _T("Change"), _T("Change"), MB_OK);
        FindNextChangeNotification(ChangeHandle);
    }
    else
    {
        break;
    }
}
I want to have a message box notifying me about any file change in my folder. The code works fine, but I have one problem: I get two notifications for each change. What is the problem with my code?
Thanks.
This is entirely normal. A change to a file usually involves a change to the file data as well as a change to the directory entry. Metadata properties like the file length and the last write date are stored there. So you'll get a notification for both. ReadDirectoryChangesW() doesn't otherwise distinguish between the two.
This is no different from a process making multiple changes to the same file. Be sure to handle both conditions. This usually involves a timer, so you don't go overboard with the number of operations you perform per notification. Such a timer is also often required because the process that is changing the file may still have a lock on it that prevents you from doing anything with the file until the process closes it, an indeterminate amount of time later.
What you're probably seeing is multiple changes to the one file (e.g. a file being created, and then written to, or a file being written to multiple times, etc). Unfortunately FindFirstChangeNotification doesn't tell you what has actually happened.
You're better off using ReadDirectoryChangesW for file notification as it will actually tell you what has changed.