Method for abstracting filesystems in a C program - c++

I'm starting a program in SDL which obviously needs to load resources from the filesystem.
I'd like file calls within the program to be platform-independent. My initial idea is to define a macro (let's call it PTH, for path) that is defined in the preprocessor based on the system type, and then make file calls in the program using it.
For example
SDL_LoadBMP(PTH("data","images","filename"));
would simply translate to something filesystem-relevant.
If macros are the accepted way of doing this, what would such macros look like (how can I check which system is in use, and how can I concatenate strings in the macro)?
If not, what is the accepted way of doing this?

The Boost Filesystem module is probably your best bet. It overloads the "/" operator on paths, so you can do stuff like...
ifstream file2( arg_path / "foo" / "bar" );

GLib has a number of portable path-manipulation functions. If you prefer C++, there's also boost::filesystem.

There's no need to have this as a macro.
One common approach is to abstract paths to use the forward slash as a separator, since that (almost accidentally!) maps very well to a large proportion of actual platforms. For those where it doesn't, you simply translate inside your file system implementation layer.
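A minimal sketch of that translation step, assuming the program stores every path with forward slashes and converts only at the point where it calls into the platform. The names `to_native_path` and `kNativeSeparator` are hypothetical, not part of SDL or any library.

```cpp
#include <algorithm>
#include <string>

// Separator the host platform uses; chosen at compile time.
#if defined(_WIN32)
constexpr char kNativeSeparator = '\\';
#else
constexpr char kNativeSeparator = '/';
#endif

// Hypothetical helper: convert a portable '/'-separated path into the form
// the file system layer expects. The separator is a parameter so that the
// translation can be exercised on any platform.
std::string to_native_path(std::string portable, char native_sep = kNativeSeparator)
{
    std::replace(portable.begin(), portable.end(), '/', native_sep);
    return portable;
}
```

A call such as SDL_LoadBMP(to_native_path("data/images/filename").c_str()) would then be the only place where the platform difference shows up. On platforms whose separator is already '/', the function returns its input unchanged.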

Looking at the Python implementation of os.path.join for Mac OS 9 (macpath):
def join(s, *p):
    path = s
    for t in p:
        if (not s) or isabs(t):
            path = t
            continue
        if t[:1] == ':':
            t = t[1:]
        if ':' not in path:
            path = ':' + path
        if path[-1:] != ':':
            path = path + ':'
        path = path + t
    return path
I'm not familiar with developing under SDL on older Macs. Another alternative in game resources is to use a package file format, and load the resources into memory directly (such as a map < string, SDL_Surface > )
Thereby you would load one file (perhaps even a zip, unzipped at load time)

I would simply do the platform-equivalent version of chdir(data_base_dir); in your program's startup code, then use relative Unix-style paths of the form "images/filename". The last system where this would not work was Mac OS 9, which is completely irrelevant now.
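As a sketch of the startup step, assuming a C++17 compiler is available: std::filesystem::current_path is a portable stand-in for chdir. The function name `enter_data_dir` is hypothetical.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical helper: make `data_base_dir` the current working directory.
// Returns false instead of throwing when the directory cannot be entered,
// so the caller can report a missing installation.
bool enter_data_dir(const std::filesystem::path& data_base_dir)
{
    std::error_code ec;
    std::filesystem::current_path(data_base_dir, ec);
    return !ec;
}
```

After a successful call at startup, relative paths such as "images/filename" resolve against the data directory for the rest of the run.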

Related

How to read a file name containing 'œ' as character in C/C++ on windows

This post is not a duplicate of this one: dirent not working with unicode
That is because I'm using it on a different OS and I'm not trying to do the same thing. The other thread simply tries to count the files, whereas I want to access the file names, which is more complex.
I'm trying to retrieve information from file names on Windows 10.
For this purpose I use dirent.h (an external C library, but still very useful in C++).
DIR* directory = opendir(path);
struct dirent* direntStruct;
if (directory != NULL)
{
    // readdir() returns NULL once there are no more entries
    while ((direntStruct = readdir(directory)) != NULL)
    {
        cout << direntStruct->d_name << endl;
    }
    closedir(directory);
}
This code retrieves the names of all files located in a specific folder, one by one, and it works pretty well.
But when it encounters a file name containing the character 'œ', things go wrong:
Example:
grosse blessure au cœur.txt
is read in my program as:
GUODU0~6.TXT
I'm not able to find the original data in the string name because as you can see my string variable has nothing to do with the current file name!
I can rename the file and it works, but I don't want to do this, I just need to read the data from that file name and it seems impossible. How can I do this?
On Windows you can use FindFirstFile() or FindFirstFileEx() followed by FindNextFile() to read the contents of a directory with Unicode in the returned file names.
Short File Name
The name you receive is the 8.3 short file name that NTFS generates for non-ASCII file names, so that they can be accessed by programs that don't support Unicode.
clinging to dirent
If dirent doesn't support UTF-16, your best bet may be to change your library.
However, depending on the implementation of the library you may have luck with:
adding or changing the manifest of your application to support UTF-8 in char-based Windows APIs. This requires a very recent version of Windows 10.
See MSDN: "Use the UTF-8 code page" (under Windows - Apps - UWP - Design and UI - Usability - Globalization and localization).
setting the C++ Runtime's code page to UTF-8 using setlocale
I do not recommend this, and I don't know if this will work.
life is change
Use std::filesystem to enumerate directory content.
A simple example can be found here (see the "Update 2017").
Windows only
You can use FindFirstFileW and FindNextFileW as platform APIs that support UTF-16 strings. However, with std::filesystem there's little reason to do so (at least for your use case).
If you're in C, use the OS functions directly, specifically FindFirstFileW and FindNextFileW. Note the W at the end: you want to use the wide versions of these functions to get back the full non-ASCII name.
In C++ you have more options, specifically with Boost. You have classes like recursive_directory_iterator which allow cross-platform file searching, and they provide UTF-8/UTF-16 file names.
Edit: Just to be absolutely clear, the file name you get back from your original code is correct. Due to backwards compatibility in Windows filesystems (FAT32 and NTFS), every file has two names: the "full", Unicode aware name, and the "old" 8.3 name from DOS days.
You can absolutely use the 8.3 name if you want, just don't show it to your users or they'll be (correctly) confused. Or just use the proper, modern API to get the real name.

A system function to convert relative path to full path that works even for non-existing paths?

This question has been asked before, but pretty much all the answers boil down to the realpath function, which doesn't work for paths that do not exist. I need a solution that does, and I want to call a POSIX or OS X framework function rather than hand-parse strings.
To reiterate: I need a function that takes an arbitrary path string and returns the equivalent path with no "./" or ".." elements.
Is there such a solution?
Are you sure there can be such a solution? I believe not (because some directories could be typos, or symbolic links yet to be created).
What do you expect your betterrealpath function to return for /tmp/someinexistentdirectory/foobar ? Perhaps the user intent was a symbolic link from his $HOME to /tmp/someinexistentdirectory ? Or perhaps it is a typo and the user wants /tmp/someexistentdirectory/foobar ...? And what about /tmp/someinexistentdirectory/../foobar? Should it be canonicalized as /tmp/foobar? Why?
Maybe using first dirname(3), then doing realpath(3) on that, then appending the basename(3) of the argument should be enough? In C something like:
const char *origpath = something();
/* basename(3) and dirname(3) may modify their argument, so each gets its own copy */
char *dupbase = strdup(origpath);
char *dupdir = strdup(origpath);
if (!dupbase || !dupdir) { perror("strdup"); exit(EXIT_FAILURE); }
char *basepath = basename(dupbase);
char *dirpath = dirname(dupdir);
char *realdirpath = realpath(dirpath, NULL);
if (!realdirpath) { perror("realpath"); exit(EXIT_FAILURE); }
char *canonpath = NULL;
/* asprintf is a GNU/BSD extension; it returns -1 on failure */
if (asprintf(&canonpath, "%s/%s", realdirpath, basepath) < 0)
  { perror("asprintf"); exit(EXIT_FAILURE); }
free(realdirpath), realdirpath = NULL;
free(dupbase), dupbase = NULL;
free(dupdir), dupdir = NULL;
basepath = NULL, dirpath = NULL;
/* use canonpath below, don't forget to free it */
Of course that example won't work for /tmp/someinexistentdirectory/foobar but would work for /home/violet/missingfile, assuming your home directory is /home/violet/ and is accessible (readable & executable) ...
Feel free to improve or adapt to C++ the above code. Don't forget to handle failures.
Remember that i-nodes are central to POSIX filesystems. A file (including a directory) can have one, zero, or several file paths... A directory (or a file) name can be rename-d by some other running process...
Perhaps you want to use a framework like Qt or POCO; they might provide something good enough for you...
Actually, I suggest you code your betterrealpath function entirely yourself, using only syscalls(2) on Linux. You'll then have to think about all the weird cases... Also, use strace(1) on realpath(1) to understand what it is doing...
Alternatively, don't care about non-canonical paths containing ../ or symbolic links in directories, and simply prepend the current directory (see getcwd(3)) to any path not starting with / ...
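If adapting to C++17 is acceptable, the "prepend the current directory and ignore symbolic links" alternative can be sketched with std::filesystem, which is a swapped-in technique rather than the POSIX or OS X call the question asks for. The helper name `normalize_lexically` is hypothetical. It never touches the disk, so it works for paths that do not exist, but it also means ".." is removed textually even when the preceding component would have been a symbolic link.

```cpp
#include <filesystem>

// Hypothetical helper: make `p` absolute relative to `base`, then remove
// "." and ".." components purely by text manipulation. No filesystem
// access takes place, so symbolic links are NOT resolved.
std::filesystem::path normalize_lexically(const std::filesystem::path& p,
                                          const std::filesystem::path& base)
{
    const std::filesystem::path absolute = p.is_absolute() ? p : base / p;
    return absolute.lexically_normal();
}
```

Passing std::filesystem::current_path() as `base` gives the getcwd behaviour described above. The standard library also provides std::filesystem::weakly_canonical, which resolves the part of the path that exists and normalizes the remainder lexically; whether that matches the intent depends on the ambiguities raised earlier in this answer.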

Accessing resources from program in Debian package structure

I've made a DEB package of a C++ app that I've created. I want this app to use resources in the "data" directory, which in my tests (for convenience) is in the same location as the program binary, and which I reference from the code by its relative path. Debian has standard locations for data files (something like /usr/share/...) and for binaries (probably /usr/bin). I'd rather not hard-code the paths in my program; I think it's better practice to access an image as "data/img.png" than as "/usr/share/.../data/img.png". All the classic GNU programs respect the directory structure, and I imagine they do it in a good manner. I tried to use dpkg to find out how those apps are structured, but that didn't help me. Is there a better way to do this than what I'm doing?
PS: I also want my code to be portable to Windows (cross-platform) avoiding using workarounds like "if WIN32" as much as possible.
In your Debian package you should indeed install your data in /usr/share/. When accessing your data, you should use the XDG standard, which states that $XDG_DATA_DIRS is a colon-separated list of data directories to search (also, "if $XDG_DATA_DIRS is either not set or empty, a value equal to /usr/local/share/:/usr/share/ should be used.").
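A minimal sketch of reading that search list, using only the rule quoted above. The function names `split_data_dirs` and `xdg_data_dirs` are hypothetical; a real program would go on to probe each directory for its own subdirectory.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated list such as $XDG_DATA_DIRS into directories,
// skipping empty entries.
std::vector<std::string> split_data_dirs(const std::string& value)
{
    std::vector<std::string> dirs;
    std::istringstream stream(value);
    std::string dir;
    while (std::getline(stream, dir, ':')) {
        if (!dir.empty()) dirs.push_back(dir);
    }
    return dirs;
}

// Return the directories to search, falling back to the default list
// when $XDG_DATA_DIRS is unset or empty.
std::vector<std::string> xdg_data_dirs()
{
    const char* env = std::getenv("XDG_DATA_DIRS");
    const std::string value = (env && *env) ? env : "/usr/local/share/:/usr/share/";
    return split_data_dirs(value);
}
```

The directories are returned in the order they appear in the list, so a caller that tries them front to back and stops at the first hit checks /usr/local/share/ before /usr/share/ in the default case.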
This is not entirely Linux-specific or Debian-specific. I think it has something to do with the Linux Standard Base or the POSIX specifications, but I was unable to find the relevant specification quickly.
But you should not use one "base" directory with subdirectories in it for each type of data. Platform-dependent code belongs in /usr/lib/programname, and platform-independent read-only data in /usr/share/programname/img.png. Data changed by the application goes in /var/lib/programname/cache.db or ~/.programname/cache.db, depending on what kind of application it is and what it does. Note: there is no need for a "data" directory when /usr/share is already there for non-executable data.
You may want to check http://www.debian.org/doc/manuals/developers-reference/best-pkging-practices.html if you are packaging for Debian. But these are not resources as on Android or iPhone, or as in Windows files. They are extracted into the target file system as real files when the package is installed.
Edit: see http://www.debian.org/doc/packaging-manuals/fhs/fhs-2.3.html
Edit2: As for a multiplatform solution, I suggest you write some wrapper functions. On Windows it depends on the installer; programs usually store the path of the directory where they are installed in the registry. On Unix the place for data is more or less given; you may consider a build option for changing the target prefix, or an environment variable to override the default paths. On Windows a prefix would also be sufficient, if it does not need to be too flexible.
I suggest some functions to which you pass the name of an object and which return the path of the file. It depends on the toolkit used; the Qt library may already have something similar implemented.
#include <string>
#ifdef _WIN32
#define ROOT_PREFIX "c:/Program Files/"
const char DATA_PREFIX[] = ROOT_PREFIX "program/data";
#else
#define ROOT_PREFIX "/usr/"
/* #define ROOT_PREFIX "/usr/local/" */
const char DATA_PREFIX[] = ROOT_PREFIX "share/program";
#endif

std::string GetImageBasePath()
{
    return std::string(DATA_PREFIX) + "/images";
}

std::string GetImagePath(const std::string &imagename)
{
    // Multiple directories and/or file types could be tried here, depending on
    // how sophisticated it should be. For example, you could check here whether
    // the file exists and return only an image type that does exist, if you can
    // load multiple types.
    // Note the "/" between the base directory and the image name.
    return GetImageBasePath() + "/" + imagename + ".png";
}

class Image;
extern Image * LoadImage(const char *path);

int main(int argc, char *argv[])
{
    Image *img1 = LoadImage(GetImagePath("toolbox").c_str());
    Image *img2 = LoadImage(GetImagePath("openfile").c_str());
    return 0;
}
It might be wise to make a Settings class, where you can initialize the platform-dependent root paths once per start, and then use Settings::GetImagePath() as a method.

Write a file in a specific path in C++

I have this code that writes successfully a file:
ofstream outfile (path);
outfile.write(buffer,size);
outfile.flush();
outfile.close();
buffer and size are ok in the rest of code.
How can I put the file in a specific path?
Specify the full path in the constructor of the stream; this can be an absolute path or a relative path (relative to the directory the program is run from).
The stream's destructor closes the file for you at the end of the function where the object was created (since ofstream is a class).
Explicit closes are good practice when you want to reuse the same stream object for another file. If this is not needed, you can let the destructor do its job.
#include <fstream>
#include <string>
int main()
{
    const char *path = "/home/user/file.txt";
    std::ofstream file(path); // open in constructor
    std::string data("data to write to file");
    file << data;
} // file destructor
Note that since C++11 you can pass a std::string to the file stream constructor, which is preferable to a const char* in most cases.
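A small variation on the example above, as a sketch: it takes the path as a std::string and also reports whether the write succeeded, which the original does not check. The function name `write_text` is hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write `data` to the file at `path` (absolute or
// relative to the current working directory). Returns false if the file
// could not be opened or the write failed.
bool write_text(const std::string& path, const std::string& data)
{
    std::ofstream file(path);   // since C++11 a std::string is accepted directly
    if (!file) return false;    // e.g. the directory does not exist
    file << data;
    file.flush();               // force the write so that errors show up now
    return static_cast<bool>(file);
}   // the destructor closes the file here
```

Note that opening the stream does not create missing directories, so a path into a directory that does not exist makes the function return false.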
Rationale for posting another answer
I'm posting because none of the other answers cover the problem space.
The answer to your question depends on how you get the path. If you are building the path entirely within your application then see the answer from @James Kanze. However, if you are reading the path or components of the path from the environment in which your program is running (e.g. environment variables, the command line, config files, etc.) then the solution is different. In order to understand why, we need to define what a path is.
Quick overview of paths
On the operating systems (that I am aware of), a path is a string which conforms to a mini-language specified by the operating-system and file-system (system for short). Paths can be supplied to IO functions on a given system in order to access some resource. For example here are some paths that you might encounter on Windows:
\file.txt
\\bob\admin$\file.txt
C:..\file.txt
\\?\C:\file.txt
.././file.txt
\\.\PhysicalDisk1\bob.txt
\\;WebDavRedirector\bob.com\xyz
C:\PROGRA~1\bob.txt
.\A:B
Solving the problem via path manipulation
Imagine the following scenario: your program supports a command line argument, --output-path=<path>, which allows users to supply a path into which your program should create output files. A solution for creating files in the specified directory would be:
Parse the user specified path based on the mini-language for the system you are operating in.
Build a new path in the mini-language which specifies the correct location to write the file using the filename and the information you parsed in step 1.
Open the file using the path generated in step 2.
An example of doing this:
On Linux, say the user has specified --output-path=/dir1/dir2
Parse this mini-language:
/dir1/dir2
--> "/" root
--> "dir1" directory under root
--> "/" path separator
--> "dir2" directory under dir1
Then, when we want to output a file in the specified directory, we build a new path. For example, if we want to output a file called bob.txt, we can build the following path:
/dir1/dir2/bob.txt
--> "/" root
--> "dir1" directory under root
--> "/" path separator
--> "dir2" directory under dir1
--> "/" path separator
--> "bob.txt" file in directory dir2
We can then use this new path to create the file.
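The parse-and-rebuild steps above can be sketched with std::filesystem::path (C++17), which parses the mini-language of the platform it was compiled for and nothing more. The helper names `components` and `output_file` are hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: list the components the library parses out of a path.
// Separators between directories are implied and do not appear as elements.
std::vector<std::string> components(const std::filesystem::path& p)
{
    std::vector<std::string> parts;
    for (const auto& part : p) parts.push_back(part.string());
    return parts;
}

// Hypothetical helper: build the path of `filename` inside the directory
// the user supplied. operator/ inserts a separator only when one is needed.
std::filesystem::path output_file(const std::filesystem::path& output_dir,
                                  const std::string& filename)
{
    return output_dir / filename;
}
```

On a POSIX system, components("/dir1/dir2") yields "/", "dir1" and "dir2", and output_file("/dir1/dir2", "bob.txt") yields /dir1/dir2/bob.txt. One caveat of this sketch: if `filename` is itself an absolute path, operator/ discards `output_dir` and returns `filename`.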
In general it is impossible to implement this solution fully. Even if you could write code that could successfully decode all path mini-languages in existence and correctly represent the information about each system so that a new path could be built correctly - in the future your program may be built or run on new systems which have new path mini-languages that your program cannot handle. Therefore, we need to use a careful strategy for managing paths.
Path handling strategies
1. Avoid path manipulation entirely
Do not attempt to manipulate paths that are input to your program. You should pass these strings directly to API functions that can handle them correctly. This means that you need to use OS-specific APIs directly, avoiding the C++ file IO abstractions (or you need to be absolutely sure how these abstractions are implemented on each OS). Make sure to design the interface to your program carefully to avoid a situation where you might be forced into manipulating paths. Try to implement the algorithms for your program to similarly avoid the need to manipulate paths. Document to the user the API functions that your program uses on each OS. This is because OS API functions themselves become deprecated over time, so in future your program might not be compatible with all possible paths even if you are careful to avoid path manipulation.
2. Document the functions your program uses to manipulate paths
Document to the user exactly how paths will be manipulated. Then make it clear that it is the user's responsibility to specify paths that will work correctly with the documented program behavior.
3. Only support a restricted set of paths
Restrict the path mini-languages your program will accept until you are confident that you can correctly manipulate the subset of paths that meet this set of restrictions. Document this to the user. Error if paths are input that do not conform.
4. Ignore the issues
Do some basic path manipulation without worrying too much. Accept that your program will exhibit undefined behavior for some paths that are input. You could document to the user that the program may or may not work when they input paths to it, and that it is the user's responsibility to ensure that the program has handled the input paths correctly. However, you could also not document anything. Users will commonly expect that your program will not handle some paths correctly (many don't) and therefore will cope well even without documentation.
Closing thoughts
It is important to decide on an effective strategy for working with paths early on in the life-cycle of your program. If you have to change how paths are handled later, it may be difficult to avoid a change in behaviour that might break your program for existing users.
Try this:
ofstream outfile;
string createFile = "";
string path = "/FULL_PATH";
// path is already a std::string, so it can be concatenated directly
createFile = path + "/" + "SAMPLE_FILENAME" + ".txt";
outfile.open(createFile.c_str());
outfile.close();
//It works like a charm.
That needs to be done when you open the file, see std::ofstream constructor or open() member.
It's not too clear what you're asking; if I understand correctly, you're
given a filename, and you want to create the file in a specific
directory. If that's the case, all that's necessary is to specify the
complete path to the constructor of ofstream. You can use string
concatenation to build up this path, but I'd strongly recommend
boost::filesystem::path. It has all of the functions to do this
portably, and a lot more; otherwise, you'll not be portable (without a
lot of effort), and even simple operations on the filename will require
considerable thought.
I was stuck on this for a while and have since figured it out. The path is based off where your executable is and varies a little. For this example assume you do a ls while in your executable directory and see:
myprogram.out Saves
Where Saves is a folder and myprogram.out is the program you are running.
In your code, if you are building the path from a string and passing it with c_str(), like this:
string file;
getline(cin, file, '\n');
ifstream thefile;
thefile.open( ("Saves/" + file + ".txt").c_str() );
and the user types in savefile, it would be
"Saves/savefile.txt"
which will get you to savefile.txt in your Saves folder. Notice there is no leading slash; you just start with the folder name.
However if you are using a string literal like
ifstream thefile;
thefile.open("./Saves/savefile.txt");
it would be like this to get to the same folder:
"./Saves/savefile.txt"
Notice you start with a ./ in front of the foldername.
If you are using linux, try execl(), with the command mv.

Check whether a string is a valid filename with Qt

Is there a way with Qt 4.6 to check if a given QString is a valid filename (or directory name) on the current operating system ? I want to check for the name to be valid, not for the file to exist.
Examples:
// Some valid names
test
under_score
.dotted-name
// Some specific names
colon:name // valid under UNIX OSes, but not on Windows
what? // valid under UNIX OSes, but still not on Windows
How would I achieve this ? Is there some Qt built-in function ?
I'd like to avoid creating an empty file, but if there is no other reliable way, I would still like to see how to do it in a "clean" way.
Many thanks.
This is the answer I got from Silje Johansen - Support Engineer - Trolltech ASA (in March 2008 though)
However. the complexity of including locale settings and finding
a unified way to query the filesystems on Linux/Unix about their
functionality is close to impossible.
However, to my knowledge, all applications I know of ignore this
problem.
(read: they aren't going to implement it)
Boost doesn't solve the problem either; they give only some vague notion of the maximum length of paths, especially if you want to be cross-platform. As far as I know, many have tried and failed to crack this problem (at least in theory; in practice it is most definitely possible to write a program that creates valid filenames in most cases).
If you want to implement this yourself, it might be worth considering a few not immediately obvious things such as:
Complications with invalid characters
The difference between file system limitations and OS and software limitations. Windows Explorer, which I consider part of the Windows OS, does not fully support NTFS, for example. Files containing ':' and '?', etc. can happily reside on an NTFS partition, but Explorer just chokes on them. Other than that, you can play it safe and use the recommendations from Boost Filesystem.
Complications with path length
The second problem not fully tackled by the Boost page is the length of the full path. Probably the only thing that is certain at this moment is that no OS/filesystem combination supports unlimited path lengths. However, statements like "Windows maximum paths are limited to 260 chars" are wrong. The Unicode API of Windows does allow you to create paths up to 32,767 UTF-16 characters long. I haven't checked, but I imagine Explorer chokes on these just as devotedly, which would make this feature utterly useless for software having any users other than yourself (on the other hand, you might prefer not to have your software choke in chorus).
There exists an old variable that goes by the name of PATH_MAX, which sounds promising, but the problem is that PATH_MAX simply isn't.
To end with a constructive note, here are some ideas on possible ways to code a solution.
Use defines to make OS specific sections. (Qt can help you with this)
Use the advice given on the boost page and OS and filesystem documentation to decide on your illegal characters
For path length, the only workable idea that springs to mind is a binary-search, trial-and-error approach that uses the system call's error handling to check for a valid path length. This is rather crude, but might be the only way of getting accurate results on a variety of systems.
Get good at elegant error handling.
Hope this has given some insights.
Based on User7116's answer here:
How do I check if a given string is a legal/valid file name under Windows?
I quit being lazy - looking for elegant solutions, and just coded it. I got:
bool isLegalFilePath(QString path)
{
    if (!path.length())
        return false;

    // Anything following the raw filename prefix should be legal.
    if (path.left(4)=="\\\\?\\")
        return true;

    // Windows filenames are not case sensitive.
    path = path.toUpper();

    // Trim the drive letter off
    if (path[1]==':' && (path[0]>='A' && path[0]<='Z'))
        path = path.right(path.length()-2);

    QString illegal="<>:\"|?*";
    foreach (const QChar& c, path)
    {
        // Check for control characters. unicode() is used because toLatin1()
        // returns 0 for any non-Latin-1 character, which would be rejected here.
        if (c.unicode() < 32)
            return false;

        // Check for illegal characters
        if (illegal.contains(c))
            return false;
    }

    // Check for device names in filenames
    static QStringList devices;
    if (!devices.count())
        devices << "CON" << "PRN" << "AUX" << "NUL" << "COM0" << "COM1" << "COM2"
                << "COM3" << "COM4" << "COM5" << "COM6" << "COM7" << "COM8" << "COM9" << "LPT0"
                << "LPT1" << "LPT2" << "LPT3" << "LPT4" << "LPT5" << "LPT6" << "LPT7" << "LPT8"
                << "LPT9";

    const QFileInfo fi(path);
    const QString basename = fi.baseName();
    foreach (const QString& d, devices)
        if (basename == d)
            // Note: Names with ':' other than with a drive letter have already been rejected.
            return false;

    // Check for trailing periods or spaces
    if (path.right(1)=="." || path.right(1)==" ")
        return false;

    // Check for pathnames that are too long (disregarding raw pathnames)
    if (path.length()>260)
        return false;

    // Exclude raw device names
    if (path.left(4)=="\\\\.\\")
        return false;

    // Since we are checking for a filename, it mustn't be a directory
    if (path.right(1)=="\\")
        return false;

    return true;
}
Features:
Probably faster than using regexes
Checks for illegal characters and excludes device names (note that '\' is not illegal, since it can be in path names)
Allows drive letters
Allows full path names
Allows network path names
Allows anything after \\?\ (raw file names)
Disallows anything starting with \\.\ (raw device names)
Disallows names ending in "\" (i.e. directory names)
Disallows names longer than 260 characters not starting with \\?\
Disallows trailing spaces and periods
Note that it does not check the length of filenames starting with \\?\, since that is not a hard and fast rule. Also note, as pointed out here, names containing multiple backslashes and forward slashes are NOT rejected by the win32 API.
I don't think that Qt has a built-in function, but if Boost is an option, you can use Boost.Filesystem's name_check functions.
If Boost isn't an option, its page on name_check functions is still a good overview of what to check for on various platforms.
Difficult to do reliably on Windows (some odd things, such as a file named "com" still being invalid). And do you want to handle Unicode, or subst tricks to allow a filename longer than 260 characters?
There is already a good answer here How do I check if a given string is a legal / valid file name under Windows?
see example (from Digia Qt Creator sources) in: https://qt.gitorious.org/qt-creator/qt-creator/source/4df7656394bc63088f67a0bae8733f400671d1b6:src/libs/utils/filenamevalidatinglineedit.cpp
I'd just create a simple function to validate the filename for the platform, which just searches through the string for any invalid characters. Don't think there's a built-in function in Qt. You could use #ifdefs inside the function to determine what platform you're on. Clean enough I'd say.
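A minimal sketch of that idea, written against std::string instead of QString so that it is self-contained (a QString can be converted with toStdString()). Both function names are hypothetical, and the character sets are assumptions taken from the examples in this thread. It only looks at characters: it does not check reserved device names, trailing periods, or length.

```cpp
#include <string>

// Hypothetical helper: true if `name` is non-empty and contains none of the
// characters in `illegal` (and, optionally, no control characters below 32).
bool has_only_allowed_chars(const std::string& name,
                            const std::string& illegal,
                            bool reject_control_chars)
{
    if (name.empty()) return false;
    for (unsigned char c : name) {
        if (reject_control_chars && c < 32) return false;
        if (illegal.find(static_cast<char>(c)) != std::string::npos) return false;
    }
    return true;
}

// Pick the character set for the platform the program is compiled for.
bool is_valid_filename_for_host(const std::string& name)
{
#if defined(_WIN32)
    return has_only_allowed_chars(name, "<>:\"/\\|?*", true);
#else
    // On Unix-like systems only '/' and NUL cannot appear in a file name.
    return has_only_allowed_chars(name, std::string("/\0", 2), false);
#endif
}
```

With the Windows character set, "colon:name" and "what?" are rejected; with the Unix set, both are accepted, which matches the examples in the question.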