Check whether a string is a valid filename with Qt - c++

Is there a way with Qt 4.6 to check whether a given QString is a valid filename (or directory name) on the current operating system? I want to check that the name is valid, not that the file exists.
Examples:
// Some valid names
test
under_score
.dotted-name
// Some specific names
colon:name // valid under UNIX OSes, but not on Windows
what? // valid under UNIX OSes, but still not on Windows
How would I achieve this? Is there a built-in Qt function?
I'd like to avoid creating an empty file, but if there is no other reliable way, I would still like to see how to do it in a "clean" way.
Many thanks.

This is the answer I got from Silje Johansen - Support Engineer - Trolltech ASA (in March 2008 though)
However, the complexity of including locale settings and finding
a unified way to query the filesystems on Linux/Unix about their
functionality is close to impossible.
However, to my knowledge, all applications I know of ignore this
problem.
(read: they aren't going to implement it)
Boost doesn't solve the problem either; it gives only a vague notion of the maximum length of paths, especially if you want to be cross-platform. As far as I know, many have tried and failed to crack this problem (at least in theory; in practice it is most definitely possible to write a program that creates valid filenames in most cases).
If you want to implement this yourself, it might be worth considering a few not immediately obvious things such as:
Complications with invalid characters
The difference between file system limitations and OS and software limitations. Windows Explorer, which I consider part of the Windows OS, does not fully support NTFS, for example. Files containing ':' and '?', etc., can happily reside on an NTFS partition, but Explorer just chokes on them. Other than that, you can play it safe and use the recommendations from Boost Filesystem.
Complications with path length
The second problem not fully tackled by the Boost page is the length of the full path. Probably the only thing that is certain at this moment is that no OS/file system combination supports indefinitely long paths. However, statements like "Windows maximum paths are limited to 260 chars" are wrong. The Unicode API of Windows does allow you to create paths up to 32,767 UTF-16 characters long. I haven't checked, but I imagine Explorer chokes just as badly on those, which would make this feature utterly useless for software with any users other than yourself (on the other hand, you might prefer not to have your software choke in chorus).
There is an old constant named PATH_MAX, which sounds promising, but the problem is that PATH_MAX simply isn't (a reliable maximum).
To end with a constructive note, here are some ideas on possible ways to code a solution.
Use defines to make OS-specific sections (Qt can help you with this).
Use the advice given on the Boost page and in the OS and file system documentation to decide on your illegal characters.
For path length, the only workable idea that springs to mind is a binary-search, trial-and-error approach that uses the system calls' error handling to check whether a path length is valid. This is rather roundabout, but it might be the only way to get accurate results on a variety of systems.
Get good at elegant error handling.
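A minimal sketch of the binary-search idea above, assuming C++17 and a file system that accepts every length up to some limit and rejects every length beyond it. To keep it simple it probes the length of a single name component rather than a full path (probing full paths would need nested directories). `largestAccepted` and `canCreateName` are hypothetical helper names, and the probe really creates (and then deletes) a file in the given directory, so point it at a scratch directory.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>
#include <system_error>

// Returns the largest n in [lo, hi] for which accepts(n) is true.
// Assumes accepts(lo) is true and that accepts is monotonic: true up to
// some limit, false for every length beyond it.
std::size_t largestAccepted(std::size_t lo, std::size_t hi,
                            const std::function<bool(std::size_t)>& accepts)
{
    while (lo < hi) {
        // Round up so that "lo = mid" always makes progress.
        const std::size_t mid = lo + (hi - lo + 1) / 2;
        if (accepts(mid))
            lo = mid;      // mid works, so the limit is at least mid
        else
            hi = mid - 1;  // mid fails, so the limit is below mid
    }
    return lo;
}

// Hypothetical probe: lets the OS decide whether a file whose name is n 'x'
// characters long can be created in dir. It creates and then deletes that
// file, so dir must not already contain a file with that name.
bool canCreateName(const std::filesystem::path& dir, std::size_t n)
{
    const std::filesystem::path p = dir / std::string(n, 'x');
    bool ok = false;
    {
        std::ofstream f(p);
        ok = f.is_open();
    }
    if (ok) {
        std::error_code ec;               // cleanup errors are ignored
        std::filesystem::remove(p, ec);
    }
    return ok;
}
```

For example, `largestAccepted(1, 4096, [&](std::size_t n) { return canCreateName(dir, n); })` asks the file system itself instead of trusting PATH_MAX, and needs only about a dozen probes for that range.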
Hope this has given some insights.

Based on User7116's answer here:
How do I check if a given string is a legal/valid file name under Windows?
I quit being lazy (looking for elegant solutions) and just coded it. I got:
#include <QFileInfo>
#include <QString>
#include <QStringList>

bool isLegalFilePath(QString path)
{
    if (path.isEmpty())
        return false;

    // Anything following the raw filename prefix should be legal.
    if (path.startsWith("\\\\?\\"))
        return true;

    // Windows filenames are not case sensitive.
    path = path.toUpper();

    // Trim the drive letter off (check the length first so at(1) is in range).
    if (path.length() >= 2 && path.at(1) == ':' && path.at(0) >= 'A' && path.at(0) <= 'Z')
        path = path.mid(2);

    const QString illegal = "<>:\"|?*";
    foreach (const QChar& c, path)
    {
        // Check for control characters. Compare the UTF-16 value: toLatin1()
        // returns 0 for non-Latin-1 characters, which would wrongly reject them.
        if (c.unicode() < 32)
            return false;
        // Check for illegal characters
        if (illegal.contains(c))
            return false;
    }

    // Check for device names in filenames.
    // Note: names with ':' other than a drive letter have already been rejected.
    static const QStringList devices = QStringList()
        << "CON" << "PRN" << "AUX" << "NUL"
        << "COM0" << "COM1" << "COM2" << "COM3" << "COM4"
        << "COM5" << "COM6" << "COM7" << "COM8" << "COM9"
        << "LPT0" << "LPT1" << "LPT2" << "LPT3" << "LPT4"
        << "LPT5" << "LPT6" << "LPT7" << "LPT8" << "LPT9";
    const QFileInfo fi(path);
    if (devices.contains(fi.baseName()))
        return false;

    // Check for trailing periods or spaces
    if (path.endsWith(".") || path.endsWith(" "))
        return false;

    // Check for pathnames that are too long (disregarding raw pathnames)
    if (path.length() > 260)
        return false;

    // Exclude raw device names
    if (path.startsWith("\\\\.\\"))
        return false;

    // Since we are checking for a filename, it mustn't be a directory
    if (path.endsWith("\\"))
        return false;

    return true;
}
Features:
Probably faster than using regexes
Checks for illegal characters and excludes device names (note that '\' is not illegal, since it can appear in path names)
Allows drive letters
Allows full path names
Allows network path names
Allows anything after \\?\ (raw file names)
Disallows anything starting with \\.\ (raw device names)
Disallows names ending in "\" (i.e. directory names)
Disallows names longer than 260 characters not starting with \\?\
Disallows trailing spaces and periods
Note that it does not check the length of filenames starting with \\?\, since that is not a hard and fast rule. Also note, as pointed out here, that names containing multiple backslashes and forward slashes are NOT rejected by the Win32 API.

I don't think that Qt has a built-in function, but if Boost is an option, you can use Boost.Filesystem's name_check functions.
If Boost isn't an option, its page on name_check functions is still a good overview of what to check for on various platforms.

This is difficult to do reliably on Windows: there are odd cases, such as a file named "con" still being invalid, and you also have to decide whether to handle Unicode, or subst tricks that allow a filename longer than 260 characters.
There is already a good answer here How do I check if a given string is a legal / valid file name under Windows?

See this example (from the Digia Qt Creator sources): https://qt.gitorious.org/qt-creator/qt-creator/source/4df7656394bc63088f67a0bae8733f400671d1b6:src/libs/utils/filenamevalidatinglineedit.cpp

I'd just create a simple function that validates the filename for the platform by searching the string for any invalid characters. I don't think there's a built-in function in Qt. You could use #ifdefs inside the function to determine which platform you're on. Clean enough, I'd say.
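A sketch of that #ifdef approach in plain standard C++ (not Qt), assuming `_WIN32` identifies Windows builds and that everything else is POSIX. `isValidFileName` is a hypothetical name; the Windows rules are the ones listed in the other answers (reserved characters, control characters, trailing spaces and periods, device names). It checks a single name component only and ignores length limits.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Checks one name component (no directory separators) for the current platform.
bool isValidFileName(const std::string& name)
{
    if (name.empty() || name == "." || name == "..")
        return false;
#ifdef _WIN32
    const std::string illegal = "<>:\"/\\|?*";
    for (unsigned char c : name)
        if (c < 32 || illegal.find(static_cast<char>(c)) != std::string::npos)
            return false;
    // Windows rejects trailing spaces and periods.
    if (name.back() == ' ' || name.back() == '.')
        return false;
    // Reserved device names, with or without an extension ("CON", "con.txt").
    std::string base = name.substr(0, name.find('.'));
    std::transform(base.begin(), base.end(), base.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    static const char* const reserved[] = {
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"};
    for (const char* r : reserved)
        if (base == r)
            return false;
#else
    // At the OS level POSIX only forbids '/' and the NUL byte inside a name
    // component (individual file systems may add rules of their own).
    if (name.find('/') != std::string::npos || name.find('\0') != std::string::npos)
        return false;
#endif
    return true;
}
```

With this, "colon:name" and "what?" pass on Linux but fail on Windows, while the three valid names from the question pass on both.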

Related

How to read a file name containing 'œ' as character in C/C++ on windows

This post is not a duplicate of this one: dirent not working with unicode
Because here I'm using it on a different OS and I also don't want to do the same thing. The other thread is trying to simply count the files, and I want to access the file name which is more complex.
I'm trying to extract information from file names on Windows 10.
For this purpose I use dirent.h (an external C library, but still very useful in C++).
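#include <dirent.h>
#include <iostream>

// path: the folder whose entries should be listed
void listDirectory(const char* path)
{
    DIR* directory = opendir(path);
    if (directory != NULL)
    {
        struct dirent* direntStruct;
        while ((direntStruct = readdir(directory)) != NULL)
        {
            std::cout << direntStruct->d_name << std::endl;
        }
        closedir(directory);
    }
}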
This code is able to retrieve all files names located in a specific folder (one by one). And it works pretty well!
But when it encounters a file containing the character 'œ', things go wrong:
Example:
grosse blessure au cœur.txt
is read in my program as:
GUODU0~6.TXT
I'm not able to find the original data in the string name because, as you can see, my string variable has nothing to do with the actual file name!
I can rename the file and it works, but I don't want to do this, I just need to read the data from that file name and it seems impossible. How can I do this?
On Windows you can use FindFirstFileW() or FindFirstFileExW() followed by FindNextFileW() to read the contents of a directory with Unicode in the returned file names.
Short File Name
The name you receive is the 8.3 short file name that NTFS generates for non-ASCII file names, so that they can be accessed by programs that don't support Unicode.
clinging to dirent
If dirent doesn't support UTF-16, your best bet may be to change your library.
However, depending on the implementation of the library you may have luck with:
adding or changing the manifest of your application to enable UTF-8 in the char-based Windows APIs. This requires a very recent version of Windows 10.
see MSDN:
Use the UTF-8 code page (under Globalization and localization).
setting the C++ Runtime's code page to UTF-8 using setlocale
I do not recommend this, and I don't know if this will work.
life is change
Use std::filesystem to enumerate directory content.
A simple example can be found here (see the "Update 2017").
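A minimal sketch of the std::filesystem approach, assuming a C++17 compiler. `listNames` is a hypothetical helper; the point is to read names through `path::wstring()`, which on Windows returns the full UTF-16 name rather than the 8.3 alias that the char-based dirent call produced.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Collects the names of the entries directly inside dir.
std::vector<std::wstring> listNames(const std::filesystem::path& dir)
{
    std::vector<std::wstring> names;
    for (const auto& entry : std::filesystem::directory_iterator(dir))
        names.push_back(entry.path().filename().wstring());
    return names;
}
```

On Windows, print or store the names as wide strings (or convert them to UTF-8) rather than narrowing them through `path::string()`, which converts through the active ANSI code page and can mangle characters that code page cannot represent.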
Windows only
You can use FindFirstFileW and FindNextFileW as platform APIs that support UTF-16 strings. However, with std::filesystem there's little reason to do so (at least for your use case).
If you're in C, use the OS functions directly, specifically FindFirstFileW and FindNextFileW. Note the W at the end, you want to use the wide versions of these functions to get back the full non-ASCII name.
In C++ you have more options, specifically with Boost. You have classes like recursive_directory_iterator which allow cross-platform file searching, and they provide UTF-8/UTF-16 file names.
Edit: Just to be absolutely clear, the file name you get back from your original code is correct. Due to backwards compatibility in Windows filesystems (FAT32 and NTFS), every file has two names: the "full", Unicode aware name, and the "old" 8.3 name from DOS days.
You can absolutely use the 8.3 name if you want, just don't show it to your users or they'll be (correctly) confused. Or just use the proper, modern API to get the real name.

How to use carriage return with multiple line?

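When I want to print another text on the same line, I can do this:
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

int main()
{
    std::string text = "Paragraph ";
    for (int i = 0; i < 10; i++) {
        // "\r" returns the cursor to column 0, so the next write overwrites the line.
        std::cout << text << i + 1 << "\r";
        std::cout.flush();
        // Wait one second between updates without busy-waiting on clock().
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}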
But how can I do this with multiple lines? I want to keep a paragraph as a whole in its initial position in the terminal. If I replace text with a string that contains a paragraph with some newline characters, it prints another new block of paragraph below the last one printed.
How can I keep it in its position?
Your question isn't very clear, but I'm going to assume you want to know how to overwrite text in places other than the current line.
Standard C++ doesn't give you this capability. You will have to use OS-specific functionality to place the cursor at an arbitrary place of the console.
Under Unix-like systems you will generally use ANSI escape sequences.
Under Windows you're best served by the console manipulation functions, in particular SetConsoleCursorPosition. Look here for more console functions.
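As a sketch of the ANSI approach for a whole paragraph: move the cursor back up over the block, then clear and rewrite each line. This assumes an ANSI/VT100-compatible terminal (on Windows 10 and later the console generally needs ENABLE_VIRTUAL_TERMINAL_PROCESSING enabled through SetConsoleMode). `redrawBlock` is a hypothetical helper that only builds the escape string, so it can be sent to std::cout.

```cpp
#include <sstream>
#include <string>
#include <vector>

// "\x1b[<n>A" moves the cursor up n lines; "\x1b[2K" erases the current line.
std::string redrawBlock(const std::vector<std::string>& lines, bool firstDraw)
{
    std::ostringstream out;
    if (!firstDraw && !lines.empty())
        out << "\x1b[" << lines.size() << 'A'; // back to the block's first line
    for (const std::string& line : lines)
        out << "\r\x1b[2K" << line << '\n';    // clear, then rewrite the line
    return out.str();
}
```

In the question's loop, write `std::cout << redrawBlock(lines, i == 0) << std::flush;` on each iteration. The block should keep the same number of lines between redraws; if it shrinks, the old extra lines stay on screen.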
It is not possible in standard C++.
The technique depends on what the standard output device (i.e. std::cout) is - which is difficult, as that depends on the operating system and choices by the end user. For example, a lot of physical terminals (and terminal/console emulators) support escape sequences. Standard output can be redirected to various devices (including to a text file, which makes positioning the cursor a bit pointless).
In general terms, you will need to specify the output device (i.e. what your program can assume output is being written to), the host system, system settings, and a bunch of other things. And then use an API (or library) supported on the host system. Depending on your choices here, the techniques are highly variable.
Under Unix, libraries like curses might be used. If you use curses, it will probably be necessary to use other curses functions to actually write your output (rather than cout).
Under Windows, there is a set of console API functions (a subset of the Windows API), such as SetConsoleCursorPosition(). Again, it might be easier to use other console functions rather than cout.

FindFirstFile undocumented wildcard or bug?

MSDN says:
HANDLE WINAPI FindFirstFile( LPCTSTR lpFileName, LPWIN32_FIND_DATA lpFindFileData );
lpFileName The directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?)...
Until today I hadn't noticed the “for example”.
Assuming you have a “c:\temp” directory, the code below displays “temp”. Notice the searched directory: “c:\temp>”. If you have a “c:\temp1” directory and a “c:\tem” directory, FindNextFile will find “temp1” but will not find “tem”. I assumed that ‘<’ would find “tem”, but I was wrong: it behaves the same way. It does not matter how many ‘<’/’>’ you append: the behavior is the same.
From my point of view, this is a bug (‘>’ and ‘<’ are not valid characters in a file name). From Microsoft’s point of view it may be a feature.
I did not manage to find a complete description of the behavior of FindFirstFile/FindNextFile.
void FindFirstFileTest()
{
    const TCHAR* s = _T("c:\\temp>");
    WIN32_FIND_DATA d;
    HANDLE h = FindFirstFile(s, &d);
    if (h == INVALID_HANDLE_VALUE)
    {
        CString m;
        // GetLastError() returns a DWORD (unsigned long)
        m.Format(_T("FindFirstFile failed (%lu)\n"), GetLastError());
        AfxMessageBox(m);
        return;
    }
    AfxMessageBox(d.cFileName); // displays "temp"
    FindClose(h);
}
Edit 1:
At first I tried the Windows implementation of _stat. It worked fine with the illegal characters ‘*’ and ‘?’, but ignored ‘>’, so I stepped into it and noticed that the implementation took special care of the documented wildcards. I ended up in FindFirstFile.
Edit 2:
I have filed two bug reports: one for FindFirstFile, the other for _stat. I am now waiting for Microsoft’s answer.
I do not think it is normal to peek into something that is supposed to be a black box and speculate. Therefore, my objections are based on what the “contract” says: “lpFileName [in] The directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?). …” I am not a native English speaker. Maybe it means “these are not the only wildcards”, maybe not. However, if these are not the only wildcards, they should have listed all of them (maybe they will). At this point, I think Microsoft’s resolution will be “By Design” or “Won’t Fix”.
Regarding _stat, which I think is an ISO function, MSDN says: “Return value: Each of these functions returns 0 if the file-status information is obtained.” It does not say anything about wildcards, documented or not. I do not see what kind of information _stat could retrieve from “c:\temp*” or “c:\temp>>”. It is highly unlikely that anyone relies on the current behavior, so they may issue a fix.
Edit 3:
Microsoft has closed the _stat bug as Fixed.
"... We have fixed this for the next major release of Visual Studio (this will be Visual Studio “14,” but note that the fix is not present in the Visual Studio “14” CTP that was released last week). In Visual Studio “14,” the _stat functions now use CreateFile to query existence and properties of a path. The change to use CreateFile was done to work around other quirks related to file permissions that were present in the old FindFirstFile-based implementation, but the change has also resolved this issue. ..."
According to a post on the OSR ntfsd list from 2002, this is an intentional feature of NtQueryDirectoryFile/ZwQueryDirectoryFile via FsRtlIsNameInExpression. < and > correspond to * and ?, but perform matching "using MS-DOS semantics".
The FsRtlIsNameInExpression documentation states:
The following wildcard characters can be used in the pattern string.
* (asterisk): Matches zero or more characters.
? (question mark): Matches a single character.
DOS_DOT: Matches either a period or zero characters beyond the name string.
DOS_QM: Matches any single character or, upon encountering a period or end of name string, advances the expression to the end of the set of contiguous DOS_QMs.
DOS_STAR: Matches zero or more characters until encountering and matching the final . in the name.
For some reason, this page does not give the values of the DOS_* macros, but ntifs.h does:
// The following constants provide addition meta characters to fully
// support the more obscure aspects of DOS wild card processing.
#define DOS_STAR (L'<')
#define DOS_QM (L'>')
#define DOS_DOT (L'"')
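Given those definitions, a caller that wants an exact lookup (as _stat does) can reject all five special characters before calling FindFirstFile. A minimal sketch, with `hasFindFirstFileWildcard` as a hypothetical helper name:

```cpp
#include <string>

// True if name contains any character FindFirstFile may treat as a wildcard:
// the documented '*' and '?', plus DOS_STAR ('<'), DOS_QM ('>') and
// DOS_DOT ('"') from ntifs.h. None of these is legal in a Win32 file name,
// so rejecting them up front keeps the lookup exact.
bool hasFindFirstFileWildcard(const std::wstring& name)
{
    return name.find_first_of(L"*?<>\"") != std::wstring::npos;
}
```

With this check, the question's path L"c:\\temp>" is rejected instead of silently matching "temp".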

GetDiskFreeSpaceEx with NULL Directory Name failing

I'm trying to use GetDiskFreeSpaceEx in my C++ win32 application to get the total available bytes on the 'current' drive. I'm on Windows 7.
I'm using this sample code: http://support.microsoft.com/kb/231497
And it works! Well, almost. It works if I provide a drive, such as:
...
szDrive[0] = 'C'; // <-- specifying drive
szDrive[1] = ':';
szDrive[2] = '\\';
szDrive[3] = '\0';
pszDrive = szDrive;
...
fResult = pGetDiskFreeSpaceEx((LPCTSTR)pszDrive,
                              (PULARGE_INTEGER)&i64FreeBytesToCaller,
                              (PULARGE_INTEGER)&i64TotalBytes,
                              (PULARGE_INTEGER)&i64FreeBytes);
fResult becomes true and I can go on to accurately calculate the number of free bytes available.
The problem, however, is that I was hoping to not have to specify the drive, but instead just use the 'current' one. The docs I found online (Here) state:
lpDirectoryName [in, optional]
A directory on the disk. If this parameter is NULL, the function uses the root of the current disk.
But if I pass in NULL for the Directory Name then GetDiskFreeSpaceEx ends up returning false and the data remains as garbage.
fResult = pGetDiskFreeSpaceEx(NULL,
                              (PULARGE_INTEGER)&i64FreeBytesToCaller,
                              (PULARGE_INTEGER)&i64TotalBytes,
                              (PULARGE_INTEGER)&i64FreeBytes);
// fResult == false
Is this odd? Surely I'm missing something? Any help is appreciated!
EDIT
As per JosephH's comment, I did a GetLastError() call. It returned the DWORD for:
ERROR_INVALID_NAME 123 (0x7B)
The filename, directory name, or volume label syntax is incorrect.
2nd EDIT
Buried down in the comments I mentioned:
I tried GetCurrentDirectory and it returns the correct absolute path, except it prefixes it with \\?\
it returns the correct absolute path, except it prefixes it with \\?\
That's the key to this mystery. What you got back is the directory name in the native API path format. Internally, Windows looks very different from what you are familiar with from winapi programming. The Windows kernel has a completely different API; it resembles the DEC VMS operating system a lot. That is no coincidence: David Cutler used to work for DEC. On top of that native OS there were originally three API layers: Win32, POSIX and OS/2. They made it easy to port programs from other operating systems to Windows NT. Nobody cared much for the POSIX and OS/2 layers, and they were dropped around the time of XP.
One infamous restriction in Win32 is the value of MAX_PATH, 260. It sets the largest permitted size of a C string that stores a path name. The native API permits much longer names, about 32,000 characters. You can bypass the Win32 restriction by writing the path name in the native API format, which is simply the familiar path name prefixed with \\?\.
So the reason you got such a string back from GetCurrentDirectory() is almost certainly that your current directory name is longer than 259 characters. Extrapolating further, GetDiskFreeSpaceEx() failed because of a bug: it rejects the long name it sees when you pass NULL. That is somewhat understandable, since it isn't normally asked to deal with long names. Everybody just passes the drive name.
This is fairly typical of what happens when you create directories with such long names: stuff just starts falling over randomly. In general there is a lot of C code around that uses MAX_PATH, and that code will fail miserably when it has to deal with path names longer than that. This is a pretty exploitable problem too, because it can cause a stack buffer overflow in a C program; technically, a carefully crafted file name could be used to manipulate programs and inject malware.
There is no real cure for this problem, that bug in GetDiskFreeSpaceEx() isn't going to be fixed any time soon. Delete that directory, it can cause lots more trouble, and write this off as a learning experience.
I am pretty sure you will have to retrieve the current drive and directory and pass that to the function. I remember attempting to use GetDiskFreeSpaceEx() with the directory name as ".", but that did not work.
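A sketch of that workaround: derive the drive root from the current directory, including the \\?\-prefixed form from the question, and pass it instead of NULL. `driveRootOf` is a hypothetical helper; it only handles drive-letter paths and returns an empty string for UNC paths, where you would pass the share root instead.

```cpp
#include <string>

// "\\?\C:\some\long\dir" or "C:\dir" -> "C:\"; anything else -> "".
std::wstring driveRootOf(const std::wstring& dir)
{
    const std::wstring prefix = L"\\\\?\\";
    std::wstring p = dir;
    if (p.compare(0, prefix.size(), prefix) == 0)
        p.erase(0, prefix.size());                 // drop the \\?\ prefix
    if (p.size() >= 2 && p[1] == L':' &&
        ((p[0] >= L'A' && p[0] <= L'Z') || (p[0] >= L'a' && p[0] <= L'z')))
        return p.substr(0, 2) + L"\\";             // "C:" + "\"
    return std::wstring();
}
```

Feed it the string from GetCurrentDirectoryW and call `GetDiskFreeSpaceExW(root.c_str(), &freeToCaller, &total, &totalFree)` with three ULARGE_INTEGER variables. In C++17, `std::filesystem::space(std::filesystem::current_path())` is another option worth trying, though whether it copes with such long current directories is untested here.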

Method for abstracting filesystems in a C program

I'm starting a program in SDL, which obviously needs to load resources from the filesystem.
I'd like file calls within the program to be platform-independent. My initial idea is to define a macro (let's call it PTH, for path) that is defined by the preprocessor based on the system type, and then make file calls in the program using it.
For example
SDL_LoadBMP(PTH("data","images","filename"));
would simply translate to something filesystem-relevant.
If macros are the accepted way of doing this, what would such macros look like (how can I check for which system is in use, concatenate strings in the macro?)
If not, what is the accepted way of doing this?
The Boost Filesystem library is probably your best bet. It overloads the "/" operator on paths, so you can do stuff like...
ifstream file2( arg_path / "foo" / "bar" );
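The same idea works with C++17's std::filesystem, which adopted Boost's path class including operator/. A minimal sketch; `resourcePath` is a hypothetical helper, and the `.string().c_str()` conversion in the usage line assumes ASCII resource names, since on Windows `string()` converts through the active code page.

```cpp
#include <filesystem>
#include <string>

// Joins the components with the platform's preferred separator,
// so no PTH macro is needed.
std::filesystem::path resourcePath(const std::string& dir,
                                   const std::string& sub,
                                   const std::string& file)
{
    return std::filesystem::path(dir) / sub / file;
}
```

The question's call then becomes `SDL_LoadBMP(resourcePath("data", "images", "filename.bmp").string().c_str());`.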
GLib has a number of portable path-manipulation functions. If you prefer C++, there's also boost::filesystem.
There's no need to have this as a macro.
One common approach is to abstract paths to use the forward slash as a separator, since that (almost accidentally!) maps very well to a large proportion of actual platforms. For those where it doesn't, you simply translate inside your file system implementation layer.
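A sketch of such an implementation layer: game code always spells paths with '/', and one function maps them to the host's separator. `toNativePath` and `kNativeSeparator` are hypothetical names, and `_WIN32` is assumed to mark Windows builds. Since the Win32 file APIs generally accept '/' as well, the conversion mostly matters for paths shown to users or handed to other tools.

```cpp
#include <string>

#ifdef _WIN32
constexpr char kNativeSeparator = '\\';
#else
constexpr char kNativeSeparator = '/';
#endif

// Only this function knows the host's separator. A platform with a
// different path syntax (e.g. classic Mac OS, which uses ':') would need
// more than a character swap.
std::string toNativePath(std::string genericPath, char separator = kNativeSeparator)
{
    for (char& c : genericPath)
        if (c == '/')
            c = separator;
    return genericPath;
}
```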
Here is Python's os.path.join for classic Mac OS (OS 9), from macpath:
def join(s, *p):
    path = s
    for t in p:
        if (not s) or isabs(t):
            path = t
            continue
        if t[:1] == ':':
            t = t[1:]
        if ':' not in path:
            path = ':' + path
        if path[-1:] != ':':
            path = path + ':'
        path = path + t
    return path
I'm not familiar with developing under SDL on older Macs. Another alternative for game resources is to use a package file format and load the resources directly into memory (for example into a map<string, SDL_Surface*>).
That way you load a single file (perhaps even a zip, unzipped at load time).
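A sketch of the load-once idea, using raw bytes in place of SDL_Surface* so that it compiles without SDL. `ResourceCache` is a hypothetical class; a real version would decode each blob (for instance through SDL_RWFromConstMem) and free the surfaces it owns.

```cpp
#include <fstream>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

class ResourceCache {
public:
    // Returns the file's bytes, reading the file only on the first request
    // for a given path; returns nullptr if the file cannot be opened.
    const std::vector<char>* get(const std::string& path)
    {
        auto it = cache_.find(path);
        if (it != cache_.end())
            return &it->second;                    // already loaded
        std::ifstream in(path, std::ios::binary);
        if (!in)
            return nullptr;
        std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
        // std::map never relocates its elements, so the returned pointer
        // stays valid for the lifetime of the cache.
        return &cache_.emplace(path, std::move(bytes)).first->second;
    }

private:
    std::map<std::string, std::vector<char>> cache_;
};
```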
I would simply do the platform equivalent of chdir(data_base_dir); in your program's startup code, then use relative Unix-style paths of the form "images/filename". The last system where this would not work was Mac OS 9, which is completely irrelevant now.
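In C++17 the portable spelling of that chdir is std::filesystem::current_path with a path argument. A sketch, with `enterDataDirectory` as a hypothetical name; where `data_base_dir` comes from is up to you (SDL 2's SDL_GetBasePath(), which returns the application's directory, is one option).

```cpp
#include <filesystem>
#include <system_error>

// After a successful call, relative paths such as "images/filename"
// resolve against dataDir. Returns false instead of throwing on failure.
bool enterDataDirectory(const std::filesystem::path& dataDir)
{
    std::error_code ec;
    std::filesystem::current_path(dataDir, ec); // the chdir() step
    return !ec;
}
```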