I have a folder to which files are copied. I want to watch it and process files as soon as they are copied to the directory. I can detect when a file is in the directory, whether through polling (my current implementation) or in some tests using Windows APIs from a few samples I've found online.
The problem is that I detect when the file is first created and its still being copied. This makes my program, that needs to access the file, through errors (because the file is not yet complete). How can I detect not when the copying started but when the copying ended? I'm using C++ on Windows, so the answer may be platform dependent but, if possible, I'd prefer it to be platform agnostic.
You could use either lock files or a special naming convention. The easiest is the latter and would work something like this:
Say you want to move a file named "fileA.txt" When copying it to the destination directory, instead, copy it to "fileA.txt.partial" or something like that. When the copy is complete, rename the file from "fileA.txt.partial" to "fileA.txt". So the appearance of "fileA.txt" is atomic as far as the watching program can see.
The other option as mentioned earlier is lock files. So when you copy a file named "fileA.txt", you first create a file called "fileA.txt.lock". When the copying is done, you simply delete the lock file. When the watching program see "fileA.txt", it should check if "fileA.txt.lock" exists, if it does, it can wait or revisit that file in the future as needed.
You should not be polling. Use FindFirstChangeNotification (http://msdn.microsoft.com/en-us/library/windows/desktop/aa364417%28v=vs.85%29.aspx) to watch a directory for changes.
Then use the Wait functions (http://msdn.microsoft.com/en-us/library/windows/desktop/ms687069%28v=vs.85%29.aspx) to wait on change notifications to happen.
Overview and examples here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365261%28v=vs.85%29.aspx
I'm not sure how exactly file write completion can be determined. Evan Teran's answer is a good idea.
You can use something like this, This is tested and working
bool IsFileDownloadComplete(const std::wstring& dir, const std::wstring& fileName)
{
std::wstring originalFileName = dir + fileName;
std::wstring tempFileName = dir + L"temp";
while(true)
{
int ret = rename(convertWstringToString(originalFileName).c_str(), convertWstringToString(tempFileName).c_str());
if(ret == 0)
break;
Sleep(10);
}
/** File is not open. Rename to original. */
int ret = rename(convertWstringToString(tempFileName).c_str(), convertWstringToString(originalFileName).c_str());
if(ret != 0)
throw std::exception("File rename failed");
return true;
}
Related
I had the following code which got executed every two
minutes all day long:
int sucessfully_deleted = DeleteFile(dest_filename);
if (!sucessfully_deleted)
{
// this never happens
}
rename(source_filename,dest_filename);
Once every several hours the rename() would fail with errno=13 (EACCES). The files involved were all sitting on a DropBox directory and I had a hunch that DropBox could be the cause. I figured that it might just be possible that the DeleteFile() function may return with a non-zero successfully_deleted but actually DropBox could still be busy doing some stuff in relation to the deletion that prevented rename() from succeeding. What I did next was to change rename() to my_rename() which would attempt a rename() and upon any failure would Sleep() for one second and try a second time. Sure enough that has worked perfectly ever since. What's more, I get a diagnostic message displaying first-attempt-failures every several hours. It has never failed on the second attempt.
So you could say that the problem is entirely solved... but I would like to understand what might be going on so as to better defend myself against any related DropBox issues in the future...
Really I would like to have a new super_delete() function which does not return until the file is properly deleted and finished with in all respects.
under windows request to delete file really never delete file just. it mark it FCB (File Control Block) with special flag (FCB_STATE_DELETE_ON_CLOSE). real deletion will be only when the last file handle will be closed.
The DeleteFile function marks a file for deletion on close. Therefore,
the file deletion does not occur until the last handle to the file is
closed. Subsequent calls to CreateFile to open the file fail with
ERROR_ACCESS_DENIED.
also if exist section ( memory-mapped file ) open on file - file even can not be marked for delete. api call fail with STATUS_CANNOT_DELETE. so in general impossible always delete file.
in case exist another open handles for file (but not section !) begin from windows 10 rs1 exist new functional for delete - FileDispositionInformationEx with FILE_DISPOSITION_POSIX_SEMANTICS. in this case:
Normally a file marked for deletion is not actually deleted until all
open handles for the file have been closed and the link count for the
file is zero. When marking a file for deletion using
FILE_DISPOSITION_POSIX_SEMANTICS, the link gets removed from the visible namespace as soon as the POSIX delete handle has been closed,
but the file’s data streams remain accessible by other existing
handles until the last handle has been closed.
ULONG DeletePosix(PCWSTR lpFileName)
{
HANDLE hFile = CreateFileW(lpFileName, DELETE, FILE_SHARE_VALID_FLAGS, 0, OPEN_EXISTING,
FILE_FLAG_BACKUP_SEMANTICS|FILE_FLAG_OPEN_REPARSE_POINT, 0);
if (hFile == INVALID_HANDLE_VALUE)
{
return GetLastError();
}
static FILE_DISPOSITION_INFO_EX fdi = { FILE_DISPOSITION_DELETE| FILE_DISPOSITION_POSIX_SEMANTICS };
ULONG dwError = SetFileInformationByHandle(hFile, FileDispositionInfoEx, &fdi, sizeof(fdi))
? NOERROR : GetLastError();
// win10 rs1: file removed from parent folder here
CloseHandle(hFile);
return dwError;
}
Update
Sorry i didn't get the question correctly the first time. I thought DeleteFile returned error 13.
Now I understand that DeleteFile succeeds but rename fails immediatlely after.
It could be just a sync issue with the filesystem. After calling DeleteFile the file will be deleted when the OS commits the changes to the filesystem. That may not appen immediately.
If you need to perform multiple operations to the same path, you should have a look at transactions https://learn.microsoft.com/it-it/windows/desktop/api/winbase/nf-winbase-deletefiletransacteda.
-- OLD ANSWER --
That is correct. If the another application handles to that file, DeleteFile will fail.
Citing MSDN docs https://learn.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-deletefile :
The DeleteFile function fails if an application attempts to delete a file that has other handles open for normal I/O or as a memory-mapped file (FILE_SHARE_DELETE must have been specified when other handles were opened).
This applies to dropbox, the antivirus, or in general, any other application that may open those files.
Dropbox may open the file to compute its hash (to look for changes) at any moment. Same goes with the antivirus.
I observe pretty strange behavior in my application.
I have following functions:
bool File::Exists(const std::string & Path)
{
struct stat S;
if(stat(Path.c_str(), &S) != 0)
return false;
if(!S_ISREG(S.st_mode))
return false;
return true;
}
void File::Remove(const std::string & Path)
{
if(unlink(Path.c_str()) != 0)
throw Exceptions::Exception(EXCEPTION_PARAMS, errno);
}
The code i use is:
...
const std::string Path = ...;
if(File::Exists(Path))
File::Remove(Path);
...
And at this point exception is thrown:
No such file or directory (2)
Key facts:
Happens once every 1k-10k calls
All files being removed are binary files, about 20MB
This is called in application thread, however this is the only thread accessing those files (no other threads / process access them). Even no other process / user access partition they are located on.
File being removed is located on mounted CIFS (SMB) (Network) mount point
Why does stat() report file being present, but unlink() sometimes fails?
Why does stat() report file being present, but unlink() sometimes fails?
Because the calls don't happen at the same time.
This bug is so common it has its own name and even a Wikipedia page:
In software development, time of check to time of use (TOCTTOU or
TOCTOU, pronounced "TOCK too") is a class of software bug caused by
changes in a system between the checking of a condition (such as a
security credential) and the use of the results of that check. This is
one example of a race condition.
How intensively are these calls being made? Are you trying to delete thousands of files at a time? Or one or two a second/minute? Over a network you could get all sorts of timeouts that could be interpreted as "No such file or directory", but in actuality is another problem.
And if you're about to delete them anyway, why bother checking for their existence at all? Just delete them, and check the exception to find out if it didn't exist, or whether there was a different error - or not even do that. Either they're gone, or you can't delete them anyway...
PROBLEM HISTORY:
Now I use Windows Media Player SDK 9 to play AVI files in my desktop application. It works well on Windows XP but when I try to run it on Windows 7 I caught an error - I can not remove AVI file immediately after playback. The problem is that there are opened file handles exist. On Windows XP I have 2 opened file handles during the playing file and they are closed after closing of playback window but on Windows 7 I have already 4 opened handles during the playing file and 2 of them remain after the closing of playback window. They are become free only after closing the application.
QUESTION:
How can I solve this problem? How to remove the file which has opened handles? May be exists something like "force deletion"?
The problem is that you're not the only one getting handles to your file. Other processes and services are also able to open the file. So deleting it isn't possible until they release their handles. You can rename the file while those handles are open. You can copy the file while those handles are open. Not sure if you can move the file to another container, however?
Other processes & services esp. including antivirus, indexing, etc.
Here's a function I wrote to accomplish "Immediate Delete" under Windows:
bool DeleteFileNow(const wchar_t * filename)
{
// don't do anything if the file doesn't exist!
if (!PathFileExistsW(filename))
return false;
// determine the path in which to store the temp filename
wchar_t path[MAX_PATH];
wcscpy_s(path, filename);
PathRemoveFileSpecW(path);
// generate a guaranteed to be unique temporary filename to house the pending delete
wchar_t tempname[MAX_PATH];
if (!GetTempFileNameW(path, L".xX", 0, tempname))
return false;
// move the real file to the dummy filename
if (!MoveFileExW(filename, tempname, MOVEFILE_REPLACE_EXISTING))
{
// clean up the temp file
DeleteFileW(tempname);
return false;
}
// queue the deletion (the OS will delete it when all handles (ours or other processes) close)
return DeleteFileW(tempname) != FALSE;
}
Technically you can delete a locked file by using MoveFileEx and passing in MOVEFILE_DELAY_UNTIL_REBOOT. When the lpNewFileName parameter is NULL, the Move turns into a delete and can delete a locked file. However, this is intended for installers and, among other issues, requires administrator privileges.
Have you checked which application is still using the avi file?
you can do this by using handle.exe. You can try deleting/moving the file after closing the process(es) that is/are using that file.
The alternative solution would be to use unlocker appliation (its free).
One of the above two method should fix your problem.
Have you already tried to ask WMP to release the handles instead? (IWMPCore::close seems to do that)
Right now, I am facing a new problem that I can't figure out how to fix. I have two files. One is a video file and other is a thumbnail. They have same name. I want to rename these two files using C++. I am using the rename function and it works. This is what I've written:
if(rename(videoFile) == 0)
{
if(rename(thumbnail) != 0)
{
printf("Fail rename \n");
}
}
The problem occurs when the video file is renamed successfully but for some reason the thumbnail can't be renamed. When this happens, I would like to rollback the renaming of the video file because the video file name and the thumbnail file name should be the same in my program. What I want to do is to rename after both files are okay to rename. Please guide me, any design pattern for function like rollback or third party software.
There is no absolutely foolproof way to do this.
Fundamental rule of disk I/O: The filesystem can change at any time. You can't check whether a rename would succeed; your answer is already wrong. You can't be certain that undoing the rename will succeed; somebody else might have taken the name while you briefly weren't using it.
On systems that support hard links, you can use them to get about 90% of the way there, assuming you're not moving between filesystems. Suppose you're renaming A to B and C to D. Then do these things:
Create hard link B which links to A. This is written as link("A", "B") in C, using the Unix link(2) system call. Windows users should call CreateHardLink() instead.
If (1) succeeded, create hard link D which links to C. Otherwise, return failure now.
If (2) succeeded, delete A and C and return success. Otherwise, delete B and return failure. If the deletions fail, there is no obvious means of recovery. In practice, you can probably ignore failed deletions assuming the reason for failure was "file not found" or equivalent for your platform.
This is still vulnerable to race conditions if someone deletes one of the files out from under you at the wrong time, but that is arguably not an issue since it is largely equivalent to the rename failing (or succeeding) and then the person deleting the file afterwards.
Technically, you should also be opening the containing directory (in O_RDONLY mode) and fsync(2)'ing it after each operation, at least under Unix. If moving between directories, that's both the source and the destination directories. In practice, nobody does this, particularly since it will lead to degraded performance under ext3. Linus takes the position that the filesystem ought to DTRT without this call, but it is formally required under POSIX. As for Windows, I've been unable to find any authoritative reference on this issue on MSDN or elsewhere. So far as I'm aware, Windows does not provide an API for synchronizing directory entries (you can't open() a directory, so you can't get a file descriptor suitable to pass to fsync()).
Nitpick: To some extent, this sort of thing can be done perfectly on transactional filesystems, but just about the only one in common use right now is NTFS, and Microsoft specifically tells developers not to use that feature. If/when btrfs hits stable, transactions might become genuinely useful.
On Windows platform starting from Vista, you can use code such as the following.
#include "KtmW32.h"
bool RenameFileTransact( LPCTSTR lpctszOldVideoFile, LPCTSTR lpctszNewVideoFile, LPCTSTR lpctszOldThumbnailFile, LPCTSTR lpctszNewThumbnailFile )
{
bool bReturn = false;
HANDLE hRnameTransaction = CreateTransaction(NULL, NULL, 0, 0, 0, 0, NULL);
if (MoveFileTransacted(lpctszOldVideoFile, lpctszNewVideoFile, NULL, NULL, 0, hRnameTransaction) &&
MoveFileTransacted(lpctszOldThumbnailFile, lpctszNewThumbnailFile, NULL, NULL, 0, hRnameTransaction))
{
if ( CommitTransaction(hRnameTransaction))
{
bReturn = true;
}
}
CloseHandle( hRnameTransaction );
return bReturn;
}
But as #Kevin pointed out above, Microsoft discourages the usage of this good feature.
I wish for a file to be deleted from disk only when it is closed. Up until that point, other processes should be able to see the file on disk and read its contents, but eventually after the close of the file, it should be deleted from disk and no longer visible on disk to other processes.
Open the file, then delete it while it's open. Other processes will be able to use the file, but as soon as all handles to file are closed, it will be deleted.
Edit: based on the comments WilliamKF added later, this won't accomplish what he wants -- it'll keep the file itself around until all handles to it are closed, but the directory entry for the file name will disappear as soon as you call unlink/remove.
Open files in Unix are reference-counted. Every open(2) increments the counter, every close(2) decrements it. The counter is shared by all processes on the system.
Then there's a link count for a disk file. Brand-new file gets a count of one. The count is incremented by the link(2) system call. The unlink(2) decrements it. File is removed from the file system when this count drops to zero.
The only way to accomplish what you ask is to open the file in one process, then unlink(2) it. Other processes will be able to open(2) or stat(2) it between open(2) and unlink(2). Assuming the file had only one link, it'll be removed when all processes that have it open close it.
Use unlink
#include <unistd.h>
int unlink(const char *pathname);
unlink() deletes a name from the
filesystem. If that name was the last
link to a file and no processes have
the file open the file is deleted and
the space it was using is made
available for reuse.
If the name was the last link to a
file but any processes still have the
file open the file will remain in
existence until the last file
descriptor referring to it is closed.
If the name referred to a symbolic
link the link is removed.
If the name referred to a socket, fifo
or device the name for it is removed
but processes which have the object
open may continue to use it.
Not sure, but you could try remove, but it looks more like c-style.
Maybe boost::filesystem::remove?
bool remove( const path & ph );
Precondition: !ph.empty()
Returns: The value of exists( ph )
prior to the establishment of the
postcondition.
Postcondition: !exists( ph )
Throws: if ph.empty() || (exists(ph)
&& is_directory(ph) && !is_empty(ph)).
See empty path rationale.
Note: Symbolic links are themselves
deleted, rather than what they point
to being deleted.
Rationale: Does not throw when
!exists( ph ) because not throwing:
Works correctly if ph is a dangling
symbolic link. Is slightly
easier-to-use for many common use
cases. Is slightly higher-level
because it implies use of
postcondition semantics rather than
effects semantics, which would be
specified in the somewhat lower-level
terms of interactions with the
operating system. There is, however, a
slight decrease in safety because some
errors will slip by which otherwise
would have been detected. For example,
a misspelled path name could go
undetected for a long time.
The initial version of the library
threw an exception when the path did
not exist; it was changed to reflect
user complaints.
You could create a wrapper class that counts references, using one of the above methods to delete de file .
class MyFileClass{
static unsigned _count;
public:
MyFileClass(std::string& path){
//open file with path
_count++;
}
//other methods
~MyFileClass(){
if (! (--_count)){
//delete file
}
}
};
unsigned MyFileClass::_count = 0; //elsewhere
I think you need to extend your notion of “closing the file” beyond fclose or std::fstream::close to whatever you intend to do. That might be as simple as
class MyFile : public std::fstream {
std::string filename;
public:
MyFile(const std::string &fname) : std::fstream(fname), filename(fname) {}
~MyFile() { unlink(filename); }
}
or it may be something much more elaborate. For all I know, it may even be much simpler – if you close files only at one or two places in your code, the best thing to do may be to simply unlink the file there (or use boost::filesystem::remove, as Tom suggests).
OTOH, if all you want to achieve is that processes started from your process can use the file, you may not need to keep it lying around on disk at all. forked processes inherit open files. Don't forget to dup them, lest seeking in the child influences the position in the parent or vice versa.