On Windows, when should you use the "\\?\" filename prefix? - c++

I came across a C library for opening files given a Unicode filename. Before opening the file, it first converts the filename to a path by prepending "\\?\". Is there any reason to do this other than to increase the maximum number of characters allowed in the path, per this MSDN article?
It looks like these "\\?\" paths require the Unicode versions of the Windows API and standard library.

Yes, it's just for that purpose. However, you will likely see compatibility problems if you decide to create paths longer than MAX_PATH. For example, the Explorer shell and the command prompt (at least on XP, I don't know about Vista) can't handle paths over that length and will return errors.

The best use for this method is probably not to create new files, but to manage existing files, which someone else may have created.
I managed a file server which routinely would get files with path_length > MAX_PATH. You see, the users saw the files as H:\myfile.txt, but on the server it was actually H:\users\username\myfile.txt. So if a user created a file with exactly MAX_PATH characters, on the server it was MAX_PATH+len("users\username").
(Creating a file with MAX_PATH characters is not so uncommon, since when you save a web page on Internet Explorer it uses the page title as the filename, which can be quite long for some pages).
Also, when sharing a drive (via network or USB) with a Mac or a Linux machine, you can end up with files that have names like con, prn or lpt1. Again, the prefix lets you and your scripts handle those files.

I think the first thing to note is that "\\?\" does not make the path a UNC path. You were more accurate the second time when you called it a UNC-style path. But even then, the similarity only comes from having two backslashes at the start. It really has nothing to do with UNC. That's backed up by the fact that you have to use even more characters to get a UNC path with the "\\?\" prefix.
I think you've got the entire reason for using that prefix. It lifts the maximum-length limit as described in the article you cited. And it only applies to Unicode paths; non-Unicode paths don't get to avoid the limit by using that prefix.
One thing to note is that the prefix is not allowed for relative paths, only for absolute ones. You might want to double-check that your C library honors that restriction.
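The rules mentioned in these answers (absolute paths only, extra characters for UNC) can be sketched as a small string helper. This is only an illustration: `to_extended_path` is a hypothetical name, not a Windows API, and it assumes the input already uses backslashes and contains no "." or ".." components, since Windows does not normalize paths that carry the prefix.

```cpp
#include <cassert>   // only needed for quick checks of the helper
#include <cwctype>
#include <string>

// Hypothetical helper: adds the extended-length prefix to an absolute Windows path.
// Relative paths are returned unchanged because the prefix is not allowed on them.
std::wstring to_extended_path(const std::wstring& path) {
    const std::wstring prefix = L"\\\\?\\";
    // Already "\\?\..." or a "\\.\..." device path: leave it alone.
    if (path.size() >= 4 && path[0] == L'\\' && path[1] == L'\\' &&
        (path[2] == L'?' || path[2] == L'.') && path[3] == L'\\') {
        return path;
    }
    // UNC path: \\server\share\... becomes \\?\UNC\server\share\...
    if (path.size() > 2 && path[0] == L'\\' && path[1] == L'\\') {
        return prefix + L"UNC\\" + path.substr(2);
    }
    // Drive-absolute path: C:\... becomes \\?\C:\...
    if (path.size() >= 3 && std::iswalpha(path[0]) && path[1] == L':' &&
        path[2] == L'\\') {
        return prefix + path;
    }
    // Anything else (relative, drive-relative like C:file) is not eligible.
    return path;
}
```

The result would then be passed to the wide-character (W) version of a file API such as CreateFileW.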

As well as allowing longer paths, the "\\?\" prefix also lets you use files and directory names like "con" and "aux". Normally Windows would interpret those as old-fashioned DOS devices.

I've been writing Windows code since 1995, and although I'm aware of that prefix, I've never found any reason to use it. Increasing the path length beyond MAX_PATH seems to be the only reason for it, and neither I nor any of my programs' customers have ever done so, to my knowledge.

Related

How to get real full path of a file or directory having one of its paths?

The same file system entry can be accessible in several paths.
real full path - /home/user/dir1/file1
path which contains parent dirs - /home/user/dir1/../dir1/file1
path with direct symlinks - /home/user/dir1/symlink_to_file1
path with indirect symlinks - /home/user/symlink_to_dir1/file1
...
I want to write a function which, given two paths, will tell whether the file or directory specified by the second path is inside (including sub-directories) the directory specified by the first path.
I think the most obvious solution is to find real full paths of both file system entries then check whether the first real path is a prefix of the second. That is why the title of question is about finding real full paths.
NOTE: I want to write the function for both Windows and POSIX compatible systems.
NOTE: boost::filesystem cannot be used.
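For reference, the prefix check described in the question could look roughly like this. It is a sketch under one big assumption: that C++17's std::filesystem (not boost) is available. The name `is_inside` is made up for illustration. It resolves symlinks and ".." for the parts of the path that exist, but it does not detect hard links, subst drives or mount points.

```cpp
#include <algorithm>
#include <cassert>   // only needed for quick checks of the helper
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Returns true if `candidate` resolves to `base` itself or to something below it.
bool is_inside(const fs::path& base, const fs::path& candidate) {
    std::error_code ec;
    fs::path b = fs::weakly_canonical(base, ec);
    if (ec) return false;
    const fs::path c = fs::weakly_canonical(candidate, ec);
    if (ec) return false;
    // Drop a trailing separator so the last component of `b` is not empty.
    if (!b.has_filename() && b.has_relative_path()) b = b.parent_path();
    // Compare component by component, so /a/bc is not treated as inside /a/b.
    const auto diff = std::mismatch(b.begin(), b.end(), c.begin(), c.end());
    return diff.first == b.end();
}
```

Comparing path components rather than raw strings avoids the false positive where one directory name is merely a string prefix of another.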
In Windows and Unix-land alike there is no single “real path”. In particular a file can have many different directory entries, called hardlinks, in Unix-land created via ln and in Windows 7 and later via mklink. But also, in Windows you can very simply define a local logical drive mapped to some directory, via the subst command, and drives mapped to file server directories via e.g. net use, and you can mount a drive as a directory, e.g. via the mountvol command.
However, the “real path” problem is just an imagined solution to the real problem, which is to establish whether a file or directory is inside a directory specified via a path.
For that, establish a system-specific ID for the filesystem entity that you're searching for, and scan up the parent directory chain looking for that ID. Sorry, I misread the question. I can't think of any efficient way to do this; it sounds like a brute-force ID search through all possible directories, unless you can avail yourself of indexing information.
The question you need to know up front is this: How many ways are there to get to /path/to/filename? With symbolic links the answer is infinite (well, within the bound of the filesystem size). Any symbolic link anywhere on any portion of the filesystem could redirect to the file (or some portion of the path above the file). Even without considering hard links the search space must be the entire filesystem under /base/path/of/interest/ (which may be the entire filesystem).
Allowing symbolic links, and without further limitations, there is no non-brute-force method for establishing whether /path/to/filename is reachable within /base/path/of/interest/.

Is there a faster alternative to enumerating folders than FindFirstFile/FindNextFile with C++?

I need to get all paths to subfolders within a folder (with WinAPIs and C++.) So far the only solution that I found is recursively calling FindFirstFile / FindNextFile but it takes a significant amount of time to do this on a folder with a deeper hierarchy.
So I was wondering, just to get folder names, is there a faster approach?
If you really just need subfolders you should be able to use FindFirstFileEx with search options to filter out non-directories.
The docs suggest this is an advisory flag only, but your filesystem may support this optimization - give it a try.
FindExSearchLimitToDirectories
This is an advisory flag. If the file
system supports directory filtering, the function searches for a file
that matches the specified name and is also a directory. If the file
system does not support directory filtering, this flag is silently
ignored.
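A minimal sketch of that suggestion follows. `list_subdirs` is a hypothetical name, and the non-Windows branch exists only so the snippet builds elsewhere; it is not part of the answer. FindExInfoBasic and FIND_FIRST_EX_LARGE_FETCH are extra options that need Windows 7 or later, and whether they help depends on the volume. Because the directory filter is advisory, the attribute check is still required.

```cpp
#include <cassert>   // only needed for quick checks of the helper
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

namespace fs = std::filesystem;

// Lists the immediate subdirectories of `dir` (no recursion).
std::vector<fs::path> list_subdirs(const fs::path& dir) {
    std::vector<fs::path> out;
#ifdef _WIN32
    WIN32_FIND_DATAW fd;
    const std::wstring pattern = (dir / L"*").wstring();
    HANDLE h = FindFirstFileExW(pattern.c_str(), FindExInfoBasic, &fd,
                                FindExSearchLimitToDirectories, nullptr,
                                FIND_FIRST_EX_LARGE_FETCH);
    if (h == INVALID_HANDLE_VALUE) return out;
    do {
        const std::wstring name = fd.cFileName;
        // The filter is advisory, so files may still be returned here.
        if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) &&
            name != L"." && name != L"..") {
            out.push_back(dir / name);
        }
    } while (FindNextFileW(h, &fd));
    FindClose(h);
#else
    // Portable stand-in so the sketch also compiles outside Windows.
    std::error_code ec;
    for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
        if (it->is_directory(ec)) out.push_back(it->path());
    }
#endif
    return out;
}
```

To walk a whole tree, the function would be called again on each returned path.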
A faster approach would be to bypass the FindFirstFile...() API and go straight to the file system directly. You can use DeviceIoControl() with the FSCTL_ENUM_USN_DATA control to access the master file table, at least on NTFS formatted volumes. With that information, you can directly access the records for files/folders, which includes their attributes, parent info, etc. Yes, it would be more work, but it should also be faster since you can optimize the code to access just the pieces you need.
That is the fastest approach you can come across. Also, you may consider using another thread to manage directory enumeration, as it takes a lot of time. Even Microsoft's File Explorer spends some time if the directory has a lot of subfolders/files.
One more thing here is that you can enumerate directories once and then register for any updates, so the cost of enumerating the folder is paid only once, during startup.

Getting list of files and folders on the user's computer with the filename filtered by the text line

Currently I'm developing a project that should do the thing described above on Windows. My idea is to recursively go through all of the user's drives and collect all information on them, but that seems to be really time consuming. So is there a better way to do this (maybe using the OS's index file or the NTFS MFT)?
I use C++/Qt.
You can search for any of the many code examples for this and use one.
The library functions which you use, FindFirstFile and FindNextFile, are optimized and will go directly to the FAT. They are coded by Microsoft, and I doubt that there is a faster way.
Btw, what do you mean by "filtered by the text line"? Do you mean you want only filenames matching a certain pattern (use the above) or files containing a string?

Finding unique path name for a given input

I'm working on a problem where I need to have a way to convert a user inputted filename to a unique path name. Let's say I let the user specify a path name that points to a file that contains some data. I can then do Data* pData=Open(PathName). Now if the user specifies the same path name again, I'd like to be able to have a table of already opened files and just return a pointer to the same data: Data* pData2=GetOpenedData(PathName). This is easy to accomplish with a simple std::map<std::string,Data*>, the problem is that different values of PathName can point to the same file. The simplest case is on Windows case insensitivity comes into play.
The code is cross platform C++ and I don't have access to .NET stuff (but I'm happy to #ifdef the differences between Windows and UNIX if needed). Does anyone know of either Windows API or POSIX functions that can take a path name and return a unique (to the system) string that I can key off of. The key doesn't have to be the same in both systems (Windows/POSIX), just unique within a running instance of my code.
For now, I'm not worried about links or two ways to get to the same file. Such as in Windows, if I had \myserver\share mapped to S: then \myserver\share\blah and S:\blah are the same file, but I can live with those being thought of as different. But S:\blah and S:\Blah should be the same. If there is a way to make \myserver\share and S:\ also be unique, that's a bonus and I'd be really happy, but I can live without it. (Likewise, if there are multiple links to the same file in UNIX).
Edited to add:
It's not as simple as just doing a case-insensitive search on Windows. For example: c://data/mydata.dat. While that's an "invalid" filename, Windows will accept it and it will actually point to c:\data\mydata.dat
Edited to add another thing:
I'd also like c:\mydirectory\..\blah.dat to be recognized as the same as c:\blah.dat
For Windows, PathCanonicalize() is your friend. The shell path handling package in Windows has a few additional routines that'll help you out.
Unfortunately, I'm not sure what the Unix equivalents to this package are.
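PathCanonicalize() is Windows-only. As a portable stand-in (an assumption on my part, not something this answer proposes), C++17's std::filesystem can do a similar purely lexical cleanup. `path_key` is a hypothetical name. It collapses repeated separators and "." / ".." components without touching the disk, so it does not resolve links or change case.

```cpp
#include <cassert>   // only needed for quick checks of the helper
#include <filesystem>
#include <string>

// Builds a lookup key by lexically normalizing the user's path.
// generic_string() always uses '/' as the separator, on every platform.
std::string path_key(const std::string& user_path) {
    return std::filesystem::path(user_path).lexically_normal().generic_string();
}
```

On Windows, backslashes are treated as separators too, so both spellings from the question should map to the same key there, apart from letter case.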
For Windows you can store the full path of a resource converted to all lowercase (or uppercase).
I don't use *nix so can't tell about that. But I believe in *nix systems case does matter (\home\a and \home\A are different). If that is the case then you can omit converting case of user input for *nix.
You can optionally instantiate std::map with a third template argument, which is the comparison function/functor (see e.g. http://www.cplusplus.com/reference/stl/map/). You could provide a case-insensitive string comparison function.
I believe Scott Meyers provides a good example of such a function in Effective STL; I can check this when I get home.
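Such a comparison functor might look like the sketch below. It is not the Effective STL version, just an illustration. It folds ASCII case only, which is an assumption: Windows applies its own case rules to non-ASCII names. `Data` is a placeholder for the question's type and `OpenFileTable` is a made-up alias.

```cpp
#include <algorithm>
#include <cassert>   // only needed for quick checks of the table
#include <cctype>
#include <map>
#include <string>

struct Data {};  // placeholder for the question's Data type

// Orders strings while ignoring ASCII case, so "S:\blah" and "S:\Blah"
// are treated as the same key.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

using OpenFileTable = std::map<std::string, Data*, CaseInsensitiveLess>;
```

On POSIX builds the default std::less would be kept, since case is significant there.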

Path sanitization in C++

I'm writing a small read-only FTP-like server. Client says "give me that file" and my server sends it.
Is there any standard way (a library function?!?) to make sure that the file requested is not "../../../../../etc/passwd" or any other bad thing? It would be great if I could limit all queries to a directory (and its subdirectories).
Thank you!
Chroot is probably the best way to go, but you can use realpath(3) to determine the canonical path to a given filename. From the man page:
char *realpath(const char *file_name, char *resolved_name);
The realpath() function resolves all symbolic links, extra '/' characters, and references to /./ and /../ in file_name, and copies the resulting absolute pathname into the memory referenced by resolved_name. The resolved_name argument must refer to a buffer capable of storing at least PATH_MAX characters.
From there you can restrict the request in any additional way you like.
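One possible way to apply that restriction, sketched for POSIX systems only. `is_served` and `resolve` are hypothetical names. The sketch passes a null resolved_name, which POSIX.1-2008 allows and which makes realpath() allocate the buffer itself, instead of the PATH_MAX buffer described above.

```cpp
#include <cassert>   // only needed for quick checks of the helper
#include <cstdlib>
#include <memory>
#include <stdlib.h>  // realpath (POSIX)
#include <string>

// Returns the canonical form of `p`, or an empty string if it cannot be resolved.
static std::string resolve(const std::string& p) {
    std::unique_ptr<char, decltype(&std::free)> r(realpath(p.c_str(), nullptr),
                                                  &std::free);
    return r ? std::string(r.get()) : std::string();
}

// True only if `requested` (relative to serving_dir) exists and resolves to
// something strictly below serving_dir.
bool is_served(const std::string& serving_dir, const std::string& requested) {
    const std::string root = resolve(serving_dir);
    const std::string target = resolve(serving_dir + "/" + requested);
    if (root.empty() || target.empty()) return false;
    if (root == "/") return true;  // everything is below the filesystem root
    return target.size() > root.size() &&
           target.compare(0, root.size(), root) == 0 &&
           target[root.size()] == '/';
}
```

The check and the later open are separate steps, so a link swapped in between could still escape. That is one reason chroot remains the stronger option.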
Also take a look at chroot
While this isn't perfect, you can run your ftp server under a specific user/group, and only permission certain directories to that user/group. This, however, may not be exactly what you're looking for.
You can also have a whitelist of directories users can go to, and any others they try to go to, you simply don't allow (thus, basically building your own permissioning).
I, personally, prefer the former, as the work is already done for you by the OS.
Get the inode of the root (/) directory, and that of the serving directory (say /ftp/pub). For the files they request, make sure that:
The file exists.
The parents of the file (accessed using multiple "/.." on the file path) hit the serving directory inode before it hits the root directory inode.
You can use stat to find the inode of any directory. Put this in one function, and call it before serving the file.
Of course using a user/group with appropriate privilege will work as well.
I don't know of a standard library that accomplishes this.
You could try:
Set the permissions of the unix user who is running the server to only have read/write permissions to a certain directory. (maybe using PAM, a chrooted environment, or using standard unix user/group permissions)
You could design your program so that it only accepts absolute paths (in unix, paths beginning with '/'). That way, you can check to make sure it is a valid path - for example, disallow any path which has the string ".."
edit:
From Peter: looks like there is a library function, realpath() which helps with #2 from above.
In Windows, I would do something like (still applies to any OS though):
User requests file
Server finds file
Server checks if path_to_file starts with "C:/SomeFolderWithFiles/"
Finish transaction