Path sanitization in C++

I'm writing a small read-only FTP-like server. A client says "give me that file" and my server sends it.
Is there any standard way (a library function?!?) to make sure that the requested file is not "../../../../../etc/passwd" or some other bad path? It would be great if I could limit all requests to one directory (and its subdirectories).
Thank you!

Chroot is probably the best way to go, but you can use realpath(3) to determine the canonical path to a given filename. From the man page:
char *realpath(const char *file_name, char *resolved_name);
The realpath() function resolves all symbolic links, extra '/' characters, and references to /./ and /../ in file_name, and copies the resulting absolute pathname into the memory referenced by resolved_name. The resolved_name argument must refer to a buffer capable of storing at least PATH_MAX characters.
From there you can restrict the request in any additional way you like.
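A minimal sketch of that idea for a POSIX server, assuming the served tree is given as `root` and every request is treated as relative to it. It relies on the POSIX.1-2008 behaviour where a null `resolved_name` makes realpath() allocate the result, so no PATH_MAX buffer is needed. The helper names `canonical_or_empty` and `is_inside` are made up for illustration. The check and the later open() are separate steps, so a symlink swapped in between them (a TOCTOU race) is not covered.

```cpp
#include <stdlib.h>   // realpath() (POSIX), free()
#include <string>
#include <cassert>

// Canonical absolute form of `p`, or "" if it cannot be resolved
// (for example, because the file does not exist).
std::string canonical_or_empty(const std::string& p) {
    // POSIX.1-2008: a null second argument makes realpath() malloc the result.
    char* resolved = realpath(p.c_str(), nullptr);
    if (resolved == nullptr) return std::string();
    std::string out(resolved);
    free(resolved);
    return out;
}

// True if `request`, taken relative to `root`, resolves to `root` or below it.
bool is_inside(const std::string& root, const std::string& request) {
    const std::string base = canonical_or_empty(root);
    const std::string full = canonical_or_empty(root + "/" + request);
    if (base.empty() || full.empty()) return false;
    if (base == "/" || full == base) return true;
    // Require a '/' right after the prefix so "/srv/ftp2" is not inside "/srv/ftp".
    return full.compare(0, base.size(), base) == 0 && full[base.size()] == '/';
}
```

A server would call `is_inside("/srv/ftp", requested_name)` and refuse the request when it returns false.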

Also take a look at chroot(2).

While this isn't perfect, you can run your FTP server under a specific user/group and grant that user/group permissions on only certain directories. This, however, may not be exactly what you're looking for.
You can also keep a whitelist of directories users may go to, and refuse any others they try to reach (basically building your own permission system).
I, personally, prefer the former, as the work is already done for you by the OS.

Get the inode of the root (/) directory, and that of the serving directory (say /ftp/pub). For the files they request, make sure that:
1. The file exists.
2. The parents of the file (reached by repeatedly appending "/.." to the file path) hit the serving directory's inode before they hit the root directory's inode.
You can use stat to find the inode of any directory. Put this in one function, and call it before serving the file.
Of course using a user/group with appropriate privilege will work as well.
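A minimal POSIX sketch of the two checks, assuming both paths are given without trailing slashes; the name `served_from` is made up. It compares `st_dev` as well as `st_ino`, since inode numbers are only unique within one filesystem. The description above leaves one gap: if the last component is itself a symlink, its parents say nothing about where it points, so this sketch uses lstat() and refuses symlinked files outright.

```cpp
#include <sys/stat.h>
#include <string>
#include <cassert>

// Hypothetical helper: does `file` live under `serve_dir`, judged by inodes?
bool served_from(const std::string& serve_dir, const std::string& file) {
    struct stat serve{}, root{}, entry{}, cur{};
    if (stat(serve_dir.c_str(), &serve) != 0 || stat("/", &root) != 0) return false;

    // Check 1: the file exists. lstat() reports a final symlink as a symlink;
    // refuse it, because its parents do not tell us where it points.
    if (lstat(file.c_str(), &entry) != 0 || S_ISLNK(entry.st_mode)) return false;

    // Start from the directory that contains the file.
    std::string probe;
    const std::string::size_type slash = file.find_last_of('/');
    if (slash == std::string::npos) probe = ".";
    else if (slash == 0) probe = "/";
    else probe = file.substr(0, slash);

    // Check 2: climb with "/.." until we meet serve_dir (ok) or "/" (not ok).
    // stat() resolves ".." physically, so symlinked directories are followed
    // to their real location. "/.." is "/", so the loop always reaches the root.
    while (stat(probe.c_str(), &cur) == 0) {
        if (cur.st_dev == serve.st_dev && cur.st_ino == serve.st_ino) return true;
        if (cur.st_dev == root.st_dev && cur.st_ino == root.st_ino) return false;
        probe += "/..";
    }
    return false;
}
```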

I don't know of a standard library that accomplishes this.
You could try:
1. Set the permissions of the Unix user that runs the server so that it has read/write access to only a certain directory (maybe using PAM, a chrooted environment, or standard Unix user/group permissions).
2. Design your program so that it only accepts absolute paths (in Unix, paths beginning with '/'). That way, you can check that it is a valid path, for example by disallowing any path that contains the string "..".
Edit:
From Peter: it looks like there is a library function, realpath(), which helps with #2 above.
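A sketch of suggestion #2 under two assumptions: the client sends a '/'-separated path that the server later joins to its served directory, and ".." is checked as a whole path component rather than as a substring, so a name such as file..txt is still accepted. The helper name `is_clean_absolute_path` is hypothetical. The check is purely lexical: a symlink inside the served tree can still lead outside it, which is where realpath() comes in.

```cpp
#include <sstream>
#include <string>
#include <cassert>

// Hypothetical helper: accept only absolute, '/'-separated paths that
// contain no ".." component. Does not look at the file system at all.
bool is_clean_absolute_path(const std::string& path) {
    if (path.empty() || path[0] != '/') return false;  // absolute paths only
    std::istringstream parts(path);
    std::string part;
    while (std::getline(parts, part, '/')) {
        if (part == "..") return false;  // a whole component, not a substring
    }
    return true;
}
```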

In Windows, I would do something like (still applies to any OS though):
1. User requests file.
2. Server finds file.
3. Server checks whether path_to_file starts with "C:/SomeFolderWithFiles/".
4. Finish transaction.
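A portable sketch of that prefix test with C++17 std::filesystem, assuming the served folder is passed as `root` and the request may contain "..". The path is first collapsed with lexically_normal(), because "C:/SomeFolderWithFiles/../Windows/win.ini" would otherwise pass a plain string prefix check; the trailing '/' keeps "C:/SomeFolderWithFilesX" from matching. The name `starts_with_root` is made up. This is lexical only (no symlinks, junctions or subst drives are resolved), and the comparison is case-sensitive, which on Windows errs on the side of refusing.

```cpp
#include <filesystem>
#include <string>
#include <cassert>

namespace fs = std::filesystem;

// True if `root / request`, after collapsing "." and "..", still starts with "root/".
// Purely lexical: the file system is not consulted.
bool starts_with_root(const fs::path& root, const fs::path& request) {
    const std::string full = (root / request).lexically_normal().generic_string();
    std::string prefix = root.lexically_normal().generic_string();
    if (prefix.empty() || prefix.back() != '/') prefix += '/';  // e.g. "C:/SomeFolderWithFiles/"
    return full.compare(0, prefix.size(), prefix) == 0;
}
```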

Related

How to get real full path of a file or directory having one of its paths?

The same file system entry can be accessible in several paths.
real full path - /home/user/dir1/file1
path which contains parent dirs - /home/user/dir1/../dir1/file1
path with direct symlinks - /home/user/dir1/symlink_to_file1
path with indirect symlinks - /home/user/symlink_to_dir1/file1
...
I want to write a function which, given two paths, tells whether the file or directory specified by the second path is inside (including subdirectories) the directory specified by the first path.
I think the most obvious solution is to find the real full paths of both file system entries, then check whether the first real path is a prefix of the second. That is why the title of the question is about finding real full paths.
NOTE: I want to write the function for both Windows and POSIX compatible systems.
NOTE: boost::filesystem cannot be used.
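Since boost::filesystem is off the table, the approach described in the question can be sketched with C++17's std::filesystem, which is available on both Windows and POSIX compilers (assuming a C++17 toolchain is allowed). The name `is_within` is made up. fs::canonical() resolves ".", ".." and symlinks, and the prefix test compares path elements rather than characters. It answers "is the canonical path inside", not "is the entry reachable from inside"; the answer below explains why hard links, subst drives and mount points make those different questions. On Windows the element comparison is exact, so case differences are something to verify on your toolchain.

```cpp
#include <algorithm>
#include <filesystem>
#include <system_error>
#include <cassert>

namespace fs = std::filesystem;

// True if `child` resolves to `dir` itself or to an entry somewhere below it.
// Both paths must exist, because fs::canonical() requires that.
bool is_within(const fs::path& dir, const fs::path& child) {
    std::error_code ec;
    const fs::path d = fs::canonical(dir, ec);
    if (ec) return false;
    const fs::path c = fs::canonical(child, ec);
    if (ec) return false;
    // Compare element by element, so "/a/bc" is not taken to be inside "/a/b".
    const auto diff = std::mismatch(d.begin(), d.end(), c.begin(), c.end());
    return diff.first == d.end();
}
```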
In Windows and Unix-land alike there is no single “real path”. In particular a file can have many different directory entries, called hardlinks, in Unix-land created via ln and in Windows 7 and later via mklink. But also, in Windows you can very simply define a local logical drive mapped to some directory, via the subst command, and drives mapped to file server directories via e.g. net use, and you can mount a drive as a directory, e.g. via the mountvol command.
However, the “real path” problem is just an imagined solution to the real problem, which is to establish whether a file or directory is inside a directory specified via a path.
For that, establish a system-specific ID for the filesystem entity that you're searching for, and scan up the parent directory chain looking for that ID. Sorry, I misread the question. I can't think of any efficient way to do this; it sounds like a brute-force ID search through all possible directories, unless you can avail yourself of indexing information.
The question you need to know up front is this: How many ways are there to get to /path/to/filename? With symbolic links the answer is infinite (well, within the bound of the filesystem size). Any symbolic link anywhere on any portion of the filesystem could redirect to the file (or some portion of the path above the file). Even without considering hard links the search space must be the entire filesystem under /base/path/of/interest/ (which may be the entire filesystem).
Allowing symbolic links, and without further limitations, there is no non-brute-force method for establishing whether /path/to/filename is reachable within /base/path/of/interest/.

How to tell if folder is a subfolder for recursive folder copy?

I'm trying to implement a folder copy method that calls FindFirstFile and FindNextFile in a loop, that may call itself recursively on any subfolders.
To prevent an obvious infinite loop, I need to make sure that the destination folder is not a subfolder of the source folder. The question is how to do that. My thinking was to translate a DOS path into a device-specific path (I need to find out how), but there seems to be more to it.
So I'm testing it for this situation:
I set up the My Documents folder to be redirected to the network share \\Server\Home\UserA\Documents, and that folder is also mapped to drive R: on the client machine. That means all of the following folders:
"R:\Documents\Subfolder1"
"\\Server\Home\UserA\Documents\Subfolder1"
"C:\Users\UserA\Documents\Subfolder1"
technically point to the same physical location, a subfolder of My Documents.
The question is how to know this reliably?
Use GetFileInformationByHandle to retrieve the volume serial number and file index for the destination directory and for each possible match. If both the serial number and the file index are the same, they are the same directory.
Note that you will need to use the FILE_FLAG_BACKUP_SEMANTICS flag in CreateFile in order to open a handle to a directory. (You do not need backup privilege to do so.)
It may be possible for cloned volumes to have the same serial number (I'm not sure offhand whether Windows forces a serial number change or not) so it might be wise to provide an option to the user that disables this check.
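The same idea can be sketched portably with C++17's std::filesystem::equivalent, which reports whether two paths name the same file. On POSIX it compares device and inode numbers; Windows implementations are expected to compare the volume serial number and a file ID much as this answer describes, but that is an assumption about the library, so verify it with mapped drives and redirected folders on your toolchain. The name `dest_is_inside_source` is made up. The destination is normalized with weakly_canonical() (it may not exist yet), then each ancestor is compared with the source.

```cpp
#include <filesystem>
#include <system_error>
#include <cassert>

namespace fs = std::filesystem;

// True if `dest` is `src` itself or lies somewhere below it. Identity is
// decided by fs::equivalent(), not by comparing path strings.
bool dest_is_inside_source(const fs::path& src, const fs::path& dest) {
    std::error_code ec;
    // weakly_canonical() tolerates a destination that does not exist yet and
    // removes ".." so that parent_path() climbs the real ancestors.
    fs::path p = fs::absolute(dest, ec);
    if (!ec) p = fs::weakly_canonical(p, ec);
    if (ec) return false;
    for (;;) {
        std::error_code eq_ec;  // a missing path is simply "not equivalent"
        if (fs::equivalent(src, p, eq_ec)) return true;
        const fs::path parent = p.parent_path();
        if (parent.empty() || parent == p) return false;  // reached the root
        p = parent;
    }
}
```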

How to specify the directory for searching file all over the system

I am writing a file system program in C++. Now I am trying to write a file-finding function. First, I want the program to be able to search for the file across the whole system. I use the FindFirstFile and FindNextFile Windows API functions. First I have to call FindFirstFile and give it the directory where it must search for the file. But I don't know how to specify the directory so that FindFirstFile searches the whole system.
Please, help me with that question. I will be very grateful for any help.
This is what I've found here:
You cannot use a trailing backslash (\) in the lpFileName input string for FindFirstFile, therefore it may not be obvious how to search root directories. If you want to see files or get the attributes of a root directory, the following options would apply: To examine files in a root directory, you can use "C:\*" and step through the directory by using FindNextFile. To get the attributes of a root directory, use the GetFileAttributes function. Note: Prepending the string "\\?\" does not allow access to the root directory.
To get a list of available drives, you can use GetLogicalDriveStrings(). It returns a double-null-terminated list of null-terminated strings. For example, say you had drives A, B and C in your machine. The returned string would look like this:
A:\<nul>B:\<nul>C:\<nul><nul>
https://stackoverflow.com/a/18573199/1141471
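A sketch of walking that double-null-terminated buffer. The splitting function `split_double_null_list` is a made-up name and is plain C++, so it runs anywhere; the Windows-only wrapper `logical_drive_roots` (also hypothetical) uses the ANSI GetLogicalDriveStringsA and a fixed 512-byte buffer, which is an assumption that comfortably fits 26 drive roots. Appending "*" to each root gives the "C:\*" pattern for FindFirstFile quoted above.

```cpp
#include <string>
#include <vector>
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Splits a double-null-terminated list, like "A:\<nul>B:\<nul>C:\<nul><nul>",
// into separate strings.
std::vector<std::string> split_double_null_list(const char* buf) {
    std::vector<std::string> items;
    for (const char* p = buf; *p != '\0';) {
        std::string item(p);   // reads up to the next nul
        p += item.size() + 1;  // step past that nul
        items.push_back(item);
    }
    return items;
}

#ifdef _WIN32
// Windows only: drive roots such as "C:\"; append "*" for FindFirstFile.
std::vector<std::string> logical_drive_roots() {
    char buf[512];  // 26 drives * 4 characters + final nul fits easily
    DWORD len = GetLogicalDriveStringsA(static_cast<DWORD>(sizeof buf), buf);
    if (len == 0 || len >= sizeof buf) return {};  // error, or buffer too small
    return split_double_null_list(buf);
}
#endif
```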

Is there any method to know whether a directory contain a sub directory?

I am working in C++.
Is there any method to know whether a directory contains a subdirectory?
CFileFind seems to have to search through all the files.
It is time-consuming if the only subdirectory is at the end of the list and there are lots of files.
For example: directory A contains 99995 files and one subdirectory at the end of the FindNextFile list. Do I have to iterate 99995 times before I can say: yes, it contains a subdirectory?
Raymond Chen from Microsoft has written a post that probably applies here: Computing the size of a directory is more than just adding file sizes. In essence, he explains that information like the size of a dir cannot be stored in the dir's entry, because different users might have different permissions, possibly making some of the files invisible to them. Therefore, the only way to get the size the user should see is to calculate it upon request from the user.
In your case, the answer probably stems from the same reasoning. The list of directories available to your app can only be determined when your app asks for it, as its view of the root directory might be different from another app's running with different credentials. Why Windows stores directories along with files I don't know, but that's a given.
Since Win32 is as close as you'll get to the file system in user mode, I'd avoid any higher-level solutions such as .NET, as they might only simplify the interface. A driver might work quicker, but that's out of the scope of my knowledge.
If you are using the .NET framework, you could use Directory.GetDirectories and check whether the size of the array is 0. I don't know whether this will be any faster.
If you have control over the directories you could apply a naming convention so that directories that have sub directories are named one way and directories with out sub directories are named another.
You can try using the boost filesystem library.
A class named directory_iterator (declared in boost/filesystem/operations.hpp) has many functions that can be used for listing files, finding whether an entry is a subdirectory (is_directory, which I guess is what you are looking for), etc.
It seems you are using MFC (I just noticed the CFileFind); I didn't see that earlier.
Sorry, I don't have much more info. You may have to use FindFirstFile/FindNextFile.
Whether this can be done very fast is entirely platform-dependent.
On Win32 you use FindFirstFile/FindNextFile or wrappers on top of those like MFC CFileFind and they list items in some order that can't be forced to list directories first.
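Since the enumeration order can't be forced, the best a portable version can do is stop at the first directory it meets. A sketch with C++17's std::filesystem::directory_iterator (the standardized descendant of the boost class mentioned above); the name `has_subdirectory` is made up. On Windows it is normally built on FindFirstFile/FindNextFile, so the worst case is still one pass over all 99995 entries. Note that `is_directory()` follows symlinks, so a symlink to a directory counts, and errors are treated as "no subdirectory", which is an assumption you may want to change.

```cpp
#include <filesystem>
#include <system_error>
#include <cassert>

namespace fs = std::filesystem;

// True as soon as one entry of `dir` is a directory (or a symlink to one).
// Unreadable or missing directories are reported as "no subdirectory".
bool has_subdirectory(const fs::path& dir) {
    std::error_code ec;
    fs::directory_iterator it(dir, ec);
    const fs::directory_iterator end;
    while (!ec && it != end) {
        std::error_code type_ec;
        if (it->is_directory(type_ec)) return true;  // stop at the first hit
        it.increment(ec);
    }
    return false;
}
```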

On Windows, when should you use the "\\?\" filename prefix?

I came across a C library for opening files given a Unicode filename. Before opening the file, it first converts the filename to a path by prepending "\\?\". Is there any reason to do this other than to increase the maximum number of characters allowed in the path, per this MSDN article?
It looks like these "\\?\" paths require the Unicode versions of the Windows API and standard library.
Yes, it's just for that purpose. However, you will likely see compatibility problems if you decide to create paths longer than MAX_PATH. For example, the Explorer shell and the command prompt (at least on XP; I don't know about Vista) can't handle paths over that length and will return errors.
The best use for this method is probably not to create new files, but to manage existing files, which someone else may have created.
I managed a file server which routinely would get files with path_length > MAX_PATH. You see, the users saw the files as H:\myfile.txt, but on the server it was actually H:\users\username\myfile.txt. So if a user created a file with exactly MAX_PATH characters, on the server it was MAX_PATH+len("users\username").
(Creating a file with MAX_PATH characters is not so uncommon, since when you save a web page on Internet Explorer it uses the page title as the filename, which can be quite long for some pages).
Also, when sharing a drive (via network or USB) with a Mac or a Linux machine, you can end up with files named con, prn or lpt1. Again, the prefix lets you and your scripts handle those files.
I think the first thing to note is that "\\?\" does not make the path a UNC path. You were more accurate the second time when you called it a UNC-style path. But even then, the similarity only comes from having two backslashes at the start. It really has nothing to do with UNC. That's backed up by the fact that you have to use even more characters to get a UNC path with the "\\?\" prefix.
I think you've got the entire reason for using that prefix. It lifts the maximum-length limit as described in the article you cited. And it only applies to Unicode paths; non-Unicode paths don't get to avoid the limit by using that prefix.
One thing to note is that the prefix is not allowed for relative paths, only for absolute ones. You might want to double-check that your C library honors that restriction.
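A sketch of how such a library might add the prefix while honouring that restriction. The helper name `to_extended_path` is hypothetical. It works on wide strings (for the W functions), returns an empty string for relative and drive-relative paths, rewrites \\server\share as \\?\UNC\server\share, and turns '/' into '\', since paths with the prefix are passed through without normalization. It does not collapse "." or ".." segments; resolving those first (for example with GetFullPathNameW) is left to the caller.

```cpp
#include <algorithm>
#include <string>
#include <cassert>

// Hypothetical helper: convert an absolute Win32 path to its "\\?\" form.
// Returns an empty string for relative paths, which the prefix does not allow.
std::wstring to_extended_path(std::wstring path) {
    if (path.rfind(LR"(\\?\)", 0) == 0) return path;  // already prefixed
    // The prefix turns off normalization, so convert '/' to '\' ourselves.
    std::replace(path.begin(), path.end(), L'/', L'\\');
    if (path.rfind(LR"(\\)", 0) == 0)                  // UNC: \\server\share\...
        return LR"(\\?\UNC\)" + path.substr(2);
    if (path.size() >= 3 && path[1] == L':' && path[2] == L'\\')  // C:\...
        return LR"(\\?\)" + path;
    return std::wstring();  // relative or drive-relative ("C:foo"): refuse
}
```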
As well as allowing longer paths, the "\\?\" prefix also lets you use files and directory names like "con" and "aux". Normally Windows would interpret those as old-fashioned DOS devices.
I've been writing Windows code since 1995, and although I'm aware of that prefix, I've never found any reason to use it. Increasing the path length beyond MAX_PATH seems to be the only reason for it, and neither I nor any of my programs' customers have ever done so, to my knowledge.