Finding unique path name for a given input - c++

I'm working on a problem where I need to convert a user-supplied filename to a unique path name. Say I let the user specify a path name that points to a file containing some data. I can then do Data* pData=Open(PathName). If the user specifies the same path name again, I'd like to look it up in a table of already opened files and return a pointer to the same data: Data* pData2=GetOpenedData(PathName). This is easy to accomplish with a simple std::map<std::string,Data*>; the problem is that different values of PathName can point to the same file. The simplest case is on Windows, where case insensitivity comes into play.
The code is cross-platform C++ and I don't have access to .NET stuff (but I'm happy to #ifdef the differences between Windows and UNIX if needed). Does anyone know of either Windows API or POSIX functions that can take a path name and return a unique (to the system) string that I can key off of? The key doesn't have to be the same on both systems (Windows/POSIX), just unique within a running instance of my code.
For now, I'm not worried about links or two ways to get to the same file. For example, in Windows, if I had \\myserver\share mapped to S:, then \\myserver\share\blah and S:\blah are the same file, but I can live with those being thought of as different. But S:\blah and S:\Blah should be the same. If there is a way to make \\myserver\share and S:\ resolve to the same key as well, that's a bonus and I'd be really happy, but I can live without it. (Likewise if there are multiple links to the same file in UNIX.)
Edited to add:
It's not as simple as just doing a case-insensitive search in Windows. For example, take c://data/mydata.dat: while that's an "invalid" filename, Windows will accept it, and it will actually point to c:\data\mydata.dat
Edited to add another thing:
I'd also like c:\mydirectory\..\blah.dat to be recognized as the same as c:\blah.dat

For Windows, PathCanonicalize() is your friend. The shell path handling package in Windows has a few additional routines that'll help you out.
Unfortunately, I'm not sure what the Unix equivalents to this package are.
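As a portable alternative to a per-platform call, here is a minimal sketch that assumes a C++17 compiler with std::filesystem available. MakePathKey is a hypothetical helper name, not something from the question. It makes the path absolute and then normalizes it lexically, which collapses doubled separators and resolves "." and ".." without touching the disk. It does not resolve symbolic links or mapped drives, and the case folding on Windows is an assumption that the volume is case-insensitive.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>

// Hypothetical helper: builds a map key from a user-supplied path.
// Purely lexical: "." and ".." are resolved and repeated separators are
// collapsed, but links are not followed and the file need not exist.
// Note: std::filesystem::absolute may throw std::filesystem::filesystem_error.
std::string MakePathKey(const std::string& userPath) {
    namespace fs = std::filesystem;
    std::string key =
        fs::absolute(fs::path(userPath)).lexically_normal().generic_string();
#ifdef _WIN32
    // Assumption: the volume is case-insensitive, so fold ASCII case here.
    std::transform(key.begin(), key.end(), key.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
#endif
    return key;
}
```

The returned string can then be used as the key of the std::map<std::string,Data*> from the question. If links should also be resolved, std::filesystem::canonical (or POSIX realpath()) is worth a look, though both require the file to exist.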

For Windows you can store the full path of a resource converted to all lowercase (or uppercase).
I don't use *nix so can't tell about that. But I believe in *nix systems case does matter (/home/a and /home/A are different). If that is the case, then you can skip the case conversion of user input for *nix.

You can optionally instantiate std::map with a third template argument, which is the comparison function/functor (see e.g. http://www.cplusplus.com/reference/stl/map/). You could provide a case-insensitive string comparison function.
I believe Scott Meyers provides a good example of such a function in Effective STL; I can check this when I get home.
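A minimal sketch of that idea follows. It is not the Effective STL version; CaseInsensitiveLess and OpenFileTable are names made up for illustration, Data is a stand-in for the question's type, and only ASCII case is folded.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

struct Data {};  // placeholder for the question's Data type

// Strict weak ordering on strings that ignores ASCII case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// The comparator is the third template argument of std::map.
using OpenFileTable = std::map<std::string, Data*, CaseInsensitiveLess>;
```

With this table, S:\blah and S:\Blah land on the same entry. It does nothing for c://data/mydata.dat or for ".." components, so it only covers the case-sensitivity part of the question.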

Related

How to check in a portable way that a file path is potentially valid?

I need to determine that a string entered by a user is OK to create a file with that name. My application is built on Qt and runs on Windows and Mac OS.
I've found a check function in boost.filesystem, namely native(). The documentation says, 'Returns true for names considered valid by the operating system's native file systems.' Sounds like what I need, but the function doesn't work properly and always returns false. I've tried both back and forward slashes in the path, and tested the function with both existing and non-existing paths; all these tests failed on Windows. Thanks to chris (see a comment below), who pointed out that the function may be intentionally broken (I tend to agree with that).
So the question is: how to achieve what I need?

Check file name is valid windows name

I want to check whether my string is a valid Windows file path. I was searching around and it seems that there is no reliable method to do that. I also checked the Boost filesystem library, and no obvious function exists to do this check, maybe something like is_valid_windows_name.
You could use _splitpath() function and parse the output (based on it, you could easily say if your path is valid or not).
See MSDN for additional information.
Note that this function is windows-specific.
I do not believe there is a standard c++ api for that.
Note that the Windows API allows more filenames than the Windows Shell (the filenames the user is allowed to use in Windows Explorer).
You should have a look at the windows shell api.
Another possibility is to use trial and error; this way you are truly independent of the current filesystem.
The easiest way is to disallow
\ / < > | " : ? *
and you should be fine.
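The blacklist above can be sketched as a small check. IsPlausibleFileName is a hypothetical name, and the function is meant for a single file name component rather than a full path, since it rejects separators. It knows nothing about reserved device names or length limits, so treat it as a first filter only.

```cpp
#include <string>

// Returns true when `name` is non-empty and contains none of the
// characters from the list above: \ / < > | " : ? *
bool IsPlausibleFileName(const std::string& name) {
    static const std::string kForbidden = "\\/<>|\":?*";
    return !name.empty() &&
           name.find_first_of(kForbidden) == std::string::npos;
}
```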
Yes, there is a boost function that does what you want. Take a look at boost::filesystem::windows_name(...). You will need to include boost/filesystem/path.hpp as well as link against the correct (version- and architecture-specific) libboost_system and libboost_filesystem libraries since path is not a header-only lib.
It's a pity even the newest C++17 filesystem library doesn't have a function to verify file names.
You can use the Windows-specific Shell Lightweight Utility function PathFileExists or the Windows API GetFileAttributes and check the last error code specifically for ERROR_INVALID_NAME.
I think it's kind of a misuse (because there really should be a dedicated function for it) but serves the purpose.

How to fix file path case?

I have a set of local paths, and some of them are capitalized (C:\SOMEDIR\SOMEFILE.TXT). I need to convert them to their real names (as shown in Explorer). Suggest a way plz.
Pass your file path to FindFirstFile, the resulting WIN32_FIND_DATA.cFileName will be in the correct case as read from the file system.
Are you looking for that: short names vs. long name?
Note that Explorer applies some tricky conversions to the file names. Your easiest shot is to test it all and write your own function for the purpose.
Otherwise, you can try to access via OLE the Explorer's functions to list files in directory. But that is VERY painful and error prone to code. (Why I guess MS has given up in the end and provided the functions I have linked to above in Win2K).
I suppose you could always use OpenFile to get a handle, and then use the method here to get the filename from that handle. I haven't tried it, but it seems likely to give the "explorer name"
Obtaining a File Name From a File Handle
I don't know of any direct API call to do this in a single line.

On Windows, when should you use the "\\\\?\\" filename prefix?

I came across a c library for opening files given a Unicode filename. Before opening the file, it first converts the filename to a path by prepending "\\?\". Is there any reason to do this other than to increase the maximum number of characters allowed in the path, per this msdn article?
It looks like these "\\?\" paths require the Unicode versions of the Windows API and standard library.
Yes, it's just for that purpose. However, you will likely see compatibility problems if you decide to create paths longer than MAX_PATH. For example, the Explorer shell and the command prompt (at least on XP, I don't know about Vista) can't handle paths over that length and will return errors.
The best use for this method is probably not to create new files, but to manage existing files, which someone else may have created.
I managed a file server which routinely would get files with path_length > MAX_PATH. You see, the users saw the files as H:\myfile.txt, but on the server it was actually H:\users\username\myfile.txt. So if a user created a file with exactly MAX_PATH characters, on the server it was MAX_PATH+len("users\username").
(Creating a file with MAX_PATH characters is not so uncommon, since when you save a web page on Internet Explorer it uses the page title as the filename, which can be quite long for some pages).
Also, sharing a drive (via network or usb) with a Mac or a Linux machine, you can find yourself with files with names like con, prn or lpt1. And again, the prefix lets you and your scripts handle those files.
I think the first thing to note is that "\\?\" does not make the path a UNC path. You were more accurate the second time when you called it a UNC-style path. But even then, the similarity only comes from having two backslashes at the start. It really has nothing to do with UNC. That's backed up by the fact that you have to use even more characters to get a UNC path with the "\\?\" prefix.
I think you've got the entire reason for using that prefix. It lifts the maximum-length limit as described in the article you cited. And it only applies to Unicode paths; non-Unicode paths don't get to avoid the limit by using that prefix.
One thing to note is that the prefix is not allowed for relative paths, only for absolute ones. You might want to double-check that your C library honors that restriction.
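To illustrate the absolute-only rule and the longer form needed for UNC paths, here is a sketch that works purely on strings and calls no Windows API. AddExtendedLengthPrefix is a hypothetical helper, not part of the C library from the question, and its checks for what counts as "absolute" are deliberately simple assumptions.

```cpp
#include <string>

// Hypothetical helper: prepends the extended-length prefix where allowed.
//   C:\dir\file          becomes  \\?\C:\dir\file
//   \\server\share\file  becomes  \\?\UNC\server\share\file
// Relative paths, device paths (\\.\...) and already-prefixed paths are
// returned unchanged.
std::wstring AddExtendedLengthPrefix(const std::wstring& path) {
    const std::wstring prefix = L"\\\\?\\";
    if (path.compare(0, prefix.size(), prefix) == 0) {
        return path;  // already prefixed
    }
    if (path.size() > 2 && path[0] == L'\\' && path[1] == L'\\' &&
        path[2] != L'.') {
        return prefix + L"UNC\\" + path.substr(2);  // UNC path
    }
    const bool hasDriveLetter =
        path.size() >= 3 &&
        ((path[0] >= L'A' && path[0] <= L'Z') ||
         (path[0] >= L'a' && path[0] <= L'z')) &&
        path[1] == L':' && path[2] == L'\\';
    return hasDriveLetter ? prefix + path : path;
}
```

The sketch only accepts backslash separators after the drive letter, on the assumption that a prefixed path is passed through without the usual normalization of forward slashes and ".." components.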
As well as allowing longer paths, the "\\?\" prefix also lets you use files and directory names like "con" and "aux". Normally Windows would interpret those as old-fashioned DOS devices.
I've been writing Windows code since 1995, and although I'm aware of that prefix, I've never found any reason to use it. Increasing the path length beyond MAX_PATH seems to be the only reason for it, and neither I nor any of my programs' customers have ever done so, to my knowledge.

C++ Passing Options To Executable

How do you pass options to an executable? Is there an easier way than making the options boolean arguments?
EDIT: The last two answers have suggested using arguments. I know I can code a workable solution like that, but I'd rather have them be options.
EDIT2: Per requests for clarification, I'll use this simple example:
It's fairly easy to handle arguments because they automatically get parsed into an array.
./printfile file.txt 1000
If I want to know what the name of the file the user wants to print, I access it via argv[1].
Now how about this situation:
./printfile file.txt 1000 --nolinebreaks
The user wants to print the file with no line breaks. This is not required for the program to be able to run (as the filename and number of lines to print are), but the user has the option of using it if s/he would like. Now I could do this using:
./printfile file.txt 1000 true
The usage prompt would inform the user that the third argument is used to determine whether to print the file with line breaks or not. However, this seems rather clumsy.
Command-line arguments is the way to go. You may want to consider using Boost.ProgramOptions to simplify this task.
You seem to think that there is some fundamental difference between "options" that start with "--" and "arguments" that don't. The only difference is in how you parse them.
It might be worth your time to look at GNU's getopt()/getopt_long() option parser. It supports passing arguments with options such as --number-of-line-breaks 47.
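A rough sketch of how getopt_long() could handle the printfile example, assuming a system that provides <getopt.h> (glibc or a BSD libc). The Options struct, ParseOptions, and the short option letters 'n' and 'b' are made up for illustration. It relies on the GNU behavior of permuting argv so that options may follow positional arguments; setting POSIXLY_CORRECT in the environment disables that.

```cpp
#include <getopt.h>

#include <cstdlib>
#include <string>
#include <vector>

struct Options {
    bool noLineBreaks = false;            // set by --nolinebreaks
    int lineBreakCount = 0;               // set by --number-of-line-breaks N
    std::vector<std::string> positional;  // e.g. file name, line count
};

Options ParseOptions(int argc, char* argv[]) {
    static const struct option kLongOptions[] = {
        {"nolinebreaks", no_argument, nullptr, 'n'},
        {"number-of-line-breaks", required_argument, nullptr, 'b'},
        {nullptr, 0, nullptr, 0}};

    Options result;
    optind = 1;  // restart scanning in case getopt was used before
    int c;
    while ((c = getopt_long(argc, argv, "nb:", kLongOptions, nullptr)) != -1) {
        switch (c) {
            case 'n': result.noLineBreaks = true; break;
            case 'b': result.lineBreakCount = std::atoi(optarg); break;
            default: break;  // unknown option; getopt_long reports it on stderr
        }
    }
    // Whatever is left from optind onward are the non-option arguments.
    for (int i = optind; i < argc; ++i) {
        result.positional.push_back(argv[i]);
    }
    return result;
}
```

Calling ParseOptions(argc, argv) from main() leaves the file name and the line count in positional, regardless of where --nolinebreaks appears on the command line.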
I use two methods for passing information:
1/ The use of command line arguments, which are made easier to handle with specific libraries such as getargs.
2/ As environment variables, using getenv.
Pax has the right idea here.
If you need more thorough two-way communication, open the process with pipes and send stuff to stdin/listen on stdout.
You can also use Windows' PostMessage() function. This is very handy if the executable you want to send the options to is already running. I can post some example code if you are interested in this technique.
The question isn't blazingly clear as to the context and just what you are trying to do - you mean running an executable from within a C++ program? There are several standard C library functions with names like execl(), execv(), execve(), ... that take the options as strings or pointer to an array of strings. There's also system() which takes a string containing whatever you'd be typing at a bash prompt, options and all.
I like the popt library. It is C, but works fine from C++ as well.
It doesn't appear to be cross-platform though. I found that out when I had to hack out my own API-compatible version of it for a Windows port of some Linux software.
You can put options in a .ini file and use the GetPrivateProfileXXX API's to create a class that can read the type of program options you're looking for from the .ini.
You can also create an interactive shell for your app to change certain settings real-time.
EDIT:
From your edits, can't you just parse each option looking for special keywords associated with that option that are "optional"?
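A sketch of that keyword scan, using only the standard library. ParsedArgs and ScanArgs are hypothetical names, and --nolinebreaks is the only keyword recognized, taken from the example in the question.

```cpp
#include <string>
#include <vector>

struct ParsedArgs {
    bool noLineBreaks = false;            // true if --nolinebreaks was seen
    std::vector<std::string> positional;  // everything that is not a keyword
};

// Walks argv once: known keywords set flags, anything else is kept in order
// as a positional argument. argv[0] (the program name) is skipped.
ParsedArgs ScanArgs(int argc, const char* const argv[]) {
    ParsedArgs out;
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg == "--nolinebreaks") {
            out.noLineBreaks = true;
        } else {
            out.positional.push_back(arg);
        }
    }
    return out;
}
```

For ./printfile file.txt 1000 --nolinebreaks this leaves file.txt and 1000 in positional and sets noLineBreaks, so the required arguments keep their order wherever the option appears.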