Overcome MAX_PATH filename length - c++

I have read a lot of documentation on this subject, but I can't seem to figure it out.
The cause is that I have to process file paths which may be longer than the MAX_PATH parameter, causing a lot of issues
I have already replaced all my ANSI-functions like GetFileAttributesA with the UNICODE equivalent (GetFileAttributesW) in order to support the extended file path length with the prefix: \\?\.
However, I also need to check whether the file path for instance is a symbolic link and I need to know the filesize, last modified date, etc.
In order to do so, I use the stat function, as shown below:
if (fstat(LongFilePath, &file_info) == 0) //THIS FAILS WITH THE ENAMETOOLONG FOR LONG FILEPATHS
So, here again, the problem is in the ENAMETOOLONG error, due to a too long filename (exceeding MAX_PATH).
So, I found out I could use fstat to access the file by its descriptor. However, to obtain the descriptor, I need to use fopen, which also has the ENAMETOOLONG limitation.
So, my question is. How can I get the file information I need (symlink, filesize, last modified, .... as the stat function offers) for file paths exceeding MAX_PATH

Related

iOS file size during write using only C/C++ APIs

Purpose: I am monitoring file writes in a particular directory on iOS using BSD kernel queues, and poll for file sizes to determine write ends (when the size stops changing). The basic idea is to refresh a folder only after any number of file copies coming from iTunes sync. I have a completely working Objective-C implementation for this but I have my reasons for needing to implement the same thing in C++ only.
Problem: The one thing stopping me is that I can't find a C or C++ API that will get the correct file size during a write. Presumably, one must exist because Objective-C's [NSFileManager attributesOfItemAtPath:] seems to work and we all know it is just calling a C API underneath.
Failed Solutions:
I have tried using stat() and lstat() to get st_size and even st_blocks for allocated block count, and they return correct sizes for most files in a directory, but when there is a file write happening that file's size never changes between poll intervals, and every subsequent file iterated in that directory have a bad size.
I have tried using fseek and ftell but they are also resulting in a very similar issue.
I have also tried modified date instead of size using stat() and st_mtimespec, and the date doesn't appear to change during a write - not that I expected it to.
Going back to NSFileManager's ability to give me the right values, does anyone have an idea what C API call that [NSFileManager attributesOfItemAtPath:] is actually using underneath?
Thanks in advance.
Update:
It appears that this has less to do with in-progress write operations and more with specific files. After closer inspection there are some files which always return a size, and other files that never return a size when using the C API (but will work fine with the Objective-C API). Even creating a copy of the "good" files the C API does not want to give a size for the copy but works fine with the original "good" file. I have both failures and successes with text (xml) files and binary (zip) files. I am using iTunes to add these files to the iPad's app's Documents directory. It is an iPad Mini Retina.
Update 2 - Answer:
Probably any of the above file size methods will work, if your path isn't invisibly trashed, like mine was. See accepted answer on why the path was trashed.
Well this weird behavior turned out to be a problem with the paths, which result in strings that will print normally, but are likely trashed in memory enough that file descriptors sometimes didn't like it (thus only occurring in certain file paths). I was using the dirent API to iterate over the files in a directory and concatenating the dir path and file name erroneously.
Bad Path Concatenation: Obviously (or apparently not-so-obvious at runtime) str-copying over three times is not going to end well.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcpy(fullPath, "/");
strcpy(fullPath, file);
long sizeBytes = getSize(fullPath);
free(fullPath);
Correct Path Concatenation: Use proper str-concatenation.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcat(fullPath, "/");
strcat(fullPath, file);
long sizeBytes = getSize(fullPath);
free(fullPath);
Long story short, it was sloppy work on my part, via two typos.

GetDiskFreeSpaceEx with NULL Directory Name failing

I'm trying to use GetDiskFreeSpaceEx in my C++ win32 application to get the total available bytes on the 'current' drive. I'm on Windows 7.
I'm using this sample code: http://support.microsoft.com/kb/231497
And it works! Well, almost. It works if I provide a drive, such as:
...
szDrive[0] = 'C'; // <-- specifying drive
szDrive[1] = ':';
szDrive[2] = '\\';
szDrive[3] = '\0';
pszDrive = szDrive;
...
fResult = pGetDiskFreeSpaceEx ((LPCTSTR)pszDrive,
    (PULARGE_INTEGER)&i64FreeBytesToCaller,
    (PULARGE_INTEGER)&i64TotalBytes,
(PULARGE_INTEGER)&i64FreeBytes);
fResult becomes true and i can go on to accurately calculate the number of free bytes available.
The problem, however, is that I was hoping to not have to specify the drive, but instead just use the 'current' one. The docs I found online (Here) state:
lpDirectoryName [in, optional]
A directory on the disk. If this parameter is NULL, the function uses the root of the current disk.
But if I pass in NULL for the Directory Name then GetDiskFreeSpaceEx ends up returning false and the data remains as garbage.
fResult = pGetDiskFreeSpaceEx (NULL,
    (PULARGE_INTEGER)&i64FreeBytesToCaller,
    (PULARGE_INTEGER)&i64TotalBytes,
(PULARGE_INTEGER)&i64FreeBytes);
//fResult == false
Is this odd? Surely I'm missing something? Any help is appreciated!
EDIT
As per JosephH's comment, I did a GetLastError() call. It returned the DWORD for:
ERROR_INVALID_NAME 123 (0x7B)
The filename, directory name, or volume label syntax is incorrect.
2nd EDIT
Buried down in the comments I mentioned:
I tried GetCurrentDirectory and it returns the correct absolute path, except it prefixes it with \\?\
it returns the correct absolute path, except it prefixes it with \\?\
That's the key to this mystery. What you got back is the name of the directory with the native api path name. Windows is an operating system that internally looks very different from what you are familiar with winapi programming. The Windows kernel has a completely different api, it resembles the DEC VMS operating system a lot. No coincidence, David Cutler used to work for DEC. On top of that native OS were originally three api layers, Win32, POSIX and OS/2. They made it easy to port programs from other operating systems to Windows NT. Nobody cared much for the POSIX and OS/2 layers, they were dropped at XP time.
One infamous restriction in Win32 is the value of MAX_PATH, 260. It sets the largest permitted size of a C string that stores a file path name. The native api permits much larger names, 32000 characters. You can bypass the Win32 restriction by using the path name using the native api format. Which is simply the same path name as you are familiar with, but prefixed with \\?\.
So surely the reason that you got such a string back from GetCurrentDirectory() is because your current directory name is longer than 259 characters. Extrapolating further, GetDiskFreeSpaceEx() failed because it has a bug, it rejects the long name it sees when you pass NULL. Somewhat understandable, it isn't normally asked to deal with long names. Everybody just passes the drive name.
This is fairly typical for what happens when you create directories with such long names. Stuff just starts falling over randomly. In general there is a lot of C code around that uses MAX_PATH and that code will fail miserably when it has to deal with path names that are longer than that. This is a pretty exploitable problem too for its ability to create stack buffer overflow in a C program, technically a carefully crafted file name could be used to manipulate programs and inject malware.
There is no real cure for this problem, that bug in GetDiskFreeSpaceEx() isn't going to be fixed any time soon. Delete that directory, it can cause lots more trouble, and write this off as a learning experience.
I am pretty sure you will have to retrieve the current drive and directory and pass that to the function. I remember attempting to use GetDiskFreeSpaceEx() with the directory name as ".", but that did not work.

How to know when a file is edited?

Is there a way ( or an API ) to know when a text file is edited ( by a program or by a person ) and do a specific action ?
For example: I want to show a MessageBox when the file c:\Users\john\free.txt is edited.
Depends on when you exactly want to know it.
is your application running continuously and do you want to see any change as soon as possible?
is your application a simple command-line application that needs to check for changes once?
In the second case, you could check the modification dates of the file (as suggested by PoweRoy and Michal) or use a hash (as suggested by PoweRoy).
If your application is running continuously, you should use the FindFirstChangeNotification and ReadDirectoryChanges functions. You can read more about it on the following pages:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364417(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365465(v=vs.85).aspx.
Simplest: compare modification dates. But this can be manipulated.
Or make a hash of the original file and compare it with the current file.
GetFileTime should help you.
http://msdn.microsoft.com/en-us/library/ms724320%28v=vs.85%29.aspx
and there is GetFileAttributesEx as well.
check the file's last modify datetime.
This method retrieves status information related to a given CFile object instance or a given file path.
BOOL GetStatus(
CFileStatus& rStatus
) const;
static BOOL PASCAL GetStatus(
LPCTSTR lpszFileName,
CFileStatus& rStatus
);
Parameters
rStatus
A reference to a user-supplied CFileStatus structure that will receive the status information. The CFileStatus structure has the following fields:
CTime m_ctime The date and time the file was created.
CTime m_mtime The date and time the file was last modified.
CTime m_atime The date and time the file was last accessed for reading.
ULONGLONG m_size The logical size of the file in bytes, as reported by the DIR command.
BYTE m_attribute The attribute byte of the file.
char m_szFullName[_MAX_PATH] The absolute filename in the Windows character set.
lpszFileName
A string in the Windows character set that is the path to the desired file. The path can be relative or absolute, or it can contain a network path name.
Return Value
TRUE if the status information for the specified file is successfully obtained; otherwise, FALSE.
PS:information from MSDN

Efficiently List All Sub-Directories in a Directory

Please see edit with advice taken so far...
I am attempting to list all the directories(folders) in a given directory using WinAPI & C++.
Right now my algorithm is slow & inefficient:
- Use FindFirstFileEx() to open the folder I am searching
- I then look at every file in the directory(using FindNextFile()); if its a directory file then I store its absolute path in a vector, if its just a file I do nothing.
This seems extremely inefficient because I am looking at every file in the directory.
Is there a WinAPI function that I can use that will tell me all the sub-directories in a given directory?
Do you know of an algorithm I could use to efficiently locate & identify folders in a directory(folder)?
EDIT:
So after taking the advice I have searched using FindExSearchLimitToDirectories but for me it still prints out all the files(.txt, etc.) & not just folders. Am I doing something wrong?
WIN32_FIND_DATA dirData;
HANDLE dir = FindFirstFileEx( "c:/users/soribo/desktop\\*", FindExInfoStandard, &dirData,
FindExSearchLimitToDirectories, NULL, 0 );
while ( FindNextFile( dir, &dirData ) != 0 )
{
printf( "FileName: %s\n", dirData.cFileName );
}
In order to see a performance boost there must be support at the file system level. If this does not exist then the system must enumerate every single object in the directory.
In principle, you can use FindFirstFileEx specifying the FindExSearchLimitToDirectories flag. However, the documentation states (emphasis mine):
This is an advisory flag. If the file system supports directory filtering, the function searches for a file that matches the specified name and is also a directory. If the file system does not support directory filtering, this flag is silently ignored.
If directory filtering is desired, this flag can be used on all file systems, but because it is an advisory flag and only affects file systems that support it, the application must examine the file attribute data stored in the lpFindFileData parameter of the FindFirstFileEx function to determine whether the function has returned a handle to a directory.
However, from what I can tell, and information is sparse, FindExSearchLimitToDirectories flag is not widely supported on desktop file systems.
Your best bet is to use FindFirstFileEx with FindExSearchLimitToDirectories. You must still perform your own filtering in case you meet a file system that doesn't support directory filtering at file system level. If you get lucky and hit upon a file system that does support it then you will get the performance benefit.
If you're using FindFirstFileEx, then you should be able to specify the _FINDEX_SEARCH_OPS::FindExSearchLimitToDirectories option (to be used as the fSearchOp param in FindFirstFileEx) to limit the first search (and any subsequent FindNextFile()) calls to directories.

Accessing files across the windows network with near MAX_PATH length

I'm using C++ and accessing a UNC path across the network. This path is slightly greater than MAX_PATH. So I cannot obtain a file handle.
But if I run the program on the computer in question, the path is not greater than MAX_PATH. So I can get a file handle. If I rename the file to have less characters (minus length of computer name) I can access the file.
Can this file be accessed across the network even know the computer name in the UNC path puts it over the MAX_PATH limit?
I recall that there is some feature like using \\?\ at the start of the path to get around the MAX_PATH limit. Here is a reference on MSDN:
http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
For remote machines, you would use a path name such as: \\?\unc\server\share\path\file. The \\?\unc\ is the special prefix and is not used as part of the actual filename.
You might be able to get a handle to the file if you try opening the file after converting the file name to a short (8.3) file name. Failing that can you map the dir the file is in as a drive and access the file that way?