MSDN says:
HANDLE WINAPI FindFirstFile( LPCTSTR lpFileName, LPWIN32_FIND_DATA lpFindFileData );
lpFileName The directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?)...
Until today I hadn't noticed the “for example”.
Assuming you have a “c:\temp” directory, the code below displays “temp”. Notice the searched directory: “c:\temp>”. If you have a “c:\temp1” directory and a “c:\tem” directory, FindNextFile will find “temp1” but will not find “tem”. I assumed that ‘<’ will find “tem” but I was wrong: it behaves in the same way. It does not matter how many ‘<’/’>’ you append: the behavior is the same.
From my point of view, this is a bug ('>'&'<' are not valid characters in a file name). From Microsoft’s point of view it may be a feature.
I did not manage to find a complete description of F*F’s behavior.
const TCHAR* s = _T("c:\\temp>");
{
WIN32_FIND_DATA d;
HANDLE h;
h = FindFirstFile( s, &d );
if ( h == INVALID_HANDLE_VALUE )
{
CString m;
m.Format( _T("FindFirstFile failed (%lu)\n"), GetLastError() ); // GetLastError() returns a DWORD
AfxMessageBox( m );
return;
}
else
{
AfxMessageBox( d.cFileName );
FindClose( h );
}
}
Edit 1:
At first I tried to use the Windows implementation of _stat. It worked fine with the illegal characters ‘*’ and ‘?’, but ignored ‘>’, so I stepped into it and noticed that the implementation took special care of the documented wildcards. I ended up in FindFirstFile (FFF).
Edit 2:
I have filed two bug reports: one for FFF, the other for _stat. I am now waiting for MS’s answer.
I do not think that it is normal to peek into something that is supposed to be a black box and speculate. Therefore, my objections are based on what the “contract” says: “lpFileName [in] The directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?). …” I am not a native English speaker. Maybe it means “these are not the only wildcards”, maybe not. However, if these are not the only wildcards, they should have listed them all (maybe they will). At this point, I think MS’s resolution will be “By Design” or “Won’t fix”.
Regarding _stat, which I think is an ISO function, MSDN says: “Return value: Each of these functions returns 0 if the file-status information is obtained.” It does not say a thing about wildcards, documented or not. I do not see what kind of information _stat may retrieve from “c:\temp*” or “c:\temp>>”. It is highly unlikely that anyone relies on the current behavior, so they may issue a fix.
Edit 3:
Microsoft has closed the _stat bug as Fixed.
"... We have fixed this for the next major release of Visual Studio (this will be Visual Studio “14,” but note that the fix is not present in the Visual Studio “14” CTP that was released last week). In Visual Studio “14,” the _stat functions now use CreateFile to query existence and properties of a path. The change to use CreateFile was done to work around other quirks related to file permissions that were present in the old FindFirstFile-based implementation, but the change has also resolved this issue. ..."
According to a post on the OSR ntfsd list from 2002, this is an intentional feature of NtQueryDirectoryFile/ZwQueryDirectoryFile via FsRtlIsNameInExpression. < and > correspond to * and ?, but perform matching "using MS-DOS semantics".
The FsRtlIsNameInExpression documentation states:
The following wildcard characters can be used in the pattern string.
Wildcard character    Meaning
* (asterisk)          Matches zero or more characters.
? (question mark)     Matches a single character.
DOS_DOT               Matches either a period or zero characters beyond the name string.
DOS_QM                Matches any single character or, upon encountering a period or end of name string, advances the expression to the end of the set of contiguous DOS_QMs.
DOS_STAR              Matches zero or more characters until encountering and matching the final . in the name.
For some reason, this page does not give the values of the DOS_* macros, but ntifs.h does:
// The following constants provide addition meta characters to fully
// support the more obscure aspects of DOS wild card processing.
#define DOS_STAR (L'<')
#define DOS_QM (L'>')
#define DOS_DOT (L'"')
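Since FindFirstFile ends up in this matcher, a caller that wants an exact-name lookup may prefer to reject these characters up front. A minimal sketch, assuming the five characters listed above are the complete wildcard set; the helper name hasNtWildcard is hypothetical and not part of any Windows API:

```cpp
#include <string>

// Characters treated as wildcards by FsRtlIsNameInExpression according to
// the documentation and ntifs.h quoted above: '*' and '?', plus
// DOS_STAR ('<'), DOS_QM ('>') and DOS_DOT ('"').
// Hypothetical helper: returns true if `path` contains any of them.
bool hasNtWildcard(const std::wstring& path)
{
    return path.find_first_of(L"*?<>\"") != std::wstring::npos;
}
```

With this check, hasNtWildcard(L"c:\\temp>") is true while hasNtWildcard(L"c:\\temp") is false, so the first string could be refused before it ever reaches FindFirstFile.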
Background
I'm working on a C++/MFC application and we've been converting it to display unicode characters to support foreign languages. For the most part this has been successful and unicode characters are displayed correctly. But I've encountered an issue where certain text on certain controls gets cut off.
Example
Here you can see a button that should display "ログアウト/終了" but gets cut off and displays an unknown character in its place.
But if I pad the string with spaces it displays fine. The number of spaces needed varies by string. This string needed 4 spaces to display correctly, whereas another string with one less character needed 5 spaces; there doesn't seem to be a correlation or pattern with the number of spaces needed. And also, I don't want to pad strings randomly throughout the code, especially when other languages don't need this at all.
What I've tried (doesn't work)
Shrinking the font size
Resizing the control
Changing the font facename
Changing the font character set
Copying the control properties from another control in the application that does not have this issue
Add extra null terminators
Padding with zero-width characters
Using SetWindowTextW
Changing source and execution character sets
Changing system locale
The only thing I've found that works is padding with an arbitrary amount of spaces which is certainly not an ideal solution.
Other info
I've only noticed this issue for Japanese characters, but have only tested English, German, and Japanese.
Japanese characters use 3 bytes of data, which I suspect has something to do with this but I don't know what or why. English characters use 1 byte and certain German characters use 2 bytes.
A control (button, label, etc.) in one place may have the issue whereas a control in a different place that contains the same text does not, even if they're both buttons.
When the text is cut off, it typically either displays a question mark box (like the first image) or a random character/letter at the end. This character changes each time I run the application, but the question mark box is the most common.
For my padding "fix", it doesn't matter if the spaces are at the beginning or end of the string, as long as the number of spaces is enough. It also doesn't need to be spaces, any non-zero-width character works.
Compiled using MBCS (Multibyte Character Set) and the Windows 10 UTF-8 Unicode Support setting enabled. (As opposed to compiling with UNICODE defined which isn't an option. Large old codebase)
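To make the byte counts above concrete, here is a small sketch that compares the number of bytes with the number of characters in the UTF-8 encoding of the button text. It assumes the string is valid UTF-8; the names utf8CodePoints and kLogoutText are made up for illustration, and the bytes are written as escapes so the result does not depend on how the source file is saved:

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes `s` is valid UTF-8.
std::size_t utf8CodePoints(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}

// "ログアウト/終了" spelled out as UTF-8 bytes (the slash is plain ASCII).
const std::string kLogoutText =
    "\xE3\x83\xAD\xE3\x82\xB0\xE3\x82\xA2\xE3\x82\xA6\xE3\x83\x88"
    "/"
    "\xE7\xB5\x82\xE4\xBA\x86";
```

Here kLogoutText.size() is 22 while utf8CodePoints(kLogoutText) is 8, which is the kind of mismatch that could matter if anything along the way treats a byte count as a character count.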
EDIT: Here is an example on how the text is set
GetDlgItem(IDC_SOME_CTRL_ID)->SetWindowText(GetTranslation("Some String"));
Where GetTranslation() is our own function to look up the translation of "Some String" (basically a lookup table) and return a CString. Using a debugger I can see the returned CString always has the correct string value. I can replace GetTranslation with a hardcoded Japanese string and the issue will still happen.
EDIT 2: I got complaints that this code wasn't enough.
myapp.rc
// Microsoft Visual C++ generated resource script.
//
#include "resource.h"
#define APSTUDIO_READONLY_SYMBOLS
#include "afxres.h"
#undef APSTUDIO_READONLY_SYMBOLS
IDD_VIEW_MENU DIALOGEX 0, 0, 50, 232
STYLE DS_SETFONT | WS_CHILD
FONT 14, "Verdana", 0, 0, 0x1
BEGIN
CONTROL "btn0",IDC_BUTTON_MENU_0,"Button",BS_3STATE | BS_PUSHLIKE,12,38,25,13
END
#endif
resource.h
#define IDC_BUTTON_MENU_0 6040
ViewMenu.cpp
#include "stdafx.h"
#include "ViewMenu.h"
CViewMenu::CViewMenu() : CFormView(CViewMenu::IDD)
{
}
void CViewMenu::DoDataExchange(CDataExchange* pDX)
{
CFormView::DoDataExchange(pDX);
DDX_Control(pDX, IDC_BUTTON_MENU_0, m_ctrlMenuButton0);
}
void CViewMenu::OnInitialUpdate()
{
CFormView::OnInitialUpdate();
}
void CViewMenu::OnDraw(CDC* pDC)
{
CFormView::OnDraw(pDC);
GetDlgItem(IDC_BUTTON_MENU_0)->SetWindowText("ログアウト/終了");
return;
}
ViewMenu.h
#include "resource.h"
class CViewMenu : public CFormView
{
protected:
CViewMenu();
public:
enum { IDD = IDD_VIEW_MENU };
CButton m_ctrlMenuButton0;
};
The following should work in Windows 10 versions 1903 and later, regardless of the default system locale, and fulfills OP's requirements (string literals, MBCS build, no Unicode windows etc). It was verified to work in version 2004 set to En-US locale, without "Beta: Use Unicode UTF-8 for worldwide language support" checked, using VS 2019 16.7.5 to build.
Save source files containing characters outside the active codepage in UTF-8 encoding, with or without BOM.
Compile with _MBCS defined (in the IDE: Properties / Advanced / Character Set = MBCS).
Compile with the /utf-8 switch (C/C++ / Command Line / Additional Options = /utf-8).
Create a manifest file declaring UTF-8 as the target codepage for the process (per the activeCodePage documentation).
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
<asmv3:application>
<asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">
<activeCodePage>UTF-8</activeCodePage>
</asmv3:windowsSettings>
</asmv3:application>
</assembly>
Add the manifest file to the project (in the IDE: Manifest Tool / General / Input and Output / Additional Manifest Files = manifest file created at the previous step).
This ain't Python. With C++ you need to know why your code works. Otherwise it doesn't.
GetDlgItem(IDC_BUTTON_MENU_0)->SetWindowText("ログアウト/終了");
That's where you and your compiler start to disagree. You think this should be UTF-8. Your compiler, on the other hand, trusts you, and assumes that you are using the source character set.
While you are unaware of a concept called source character set, you get all confused about something that should be the norm: Garbage in, garbage out.
If you feel like fixing the "Garbage in" part (now, clearly, that is your job), read up on C++ string literals. In case you don't make it to the end, the quickest way to fix your ungodly workaround is to use a u8 prefix.
Seriously, though, the real solution is to use Windows' native character encoding. Which, oddly, you seem to reject, even though you could use it, given a string literal. I mean, it's not like you have to change anything global. Just call SetWindowTextW and use an L prefix.
Just saying, you know...
Suppose I use _findfirst in one of the two forms below:
struct _finddata_t fd;
intptr_t search_handle;
// Either
search_handle = _findfirst( "file*.txt", &fd );
// Or
search_handle = _findfirst( ".\\file*.txt", &fd );
Do the two differ in their results?
I haven't yet found out whether the first form can follow a search strategy other than looking in the directory returned by GetCurrentDirectory().
Can it make a difference with some filesystems (like with Unicode special names)?
From what I tested on a few PCs and different versions of Windows, it looked to be the same, but on a remote system that I can't access it looks like there can be some reason for it to fail -- maybe when the file name selected is long enough, however it's just a supposition I can't prove right now, and I still doubt it exceeds _MAX_PATH.
My goal is to get a file name from a command line call/string. For instance, if I have the following strings on the input:
C:\WINDOWS\system32\mstsc.exe /v:%WKSNAME% /f
"C:\Users\User Name\Desktop\My Program.exe" /?
The API should return the following respectively:
mstsc.exe
My Program.exe
So I tried to use splitpath function, and although it works for a very simple file path, it totally fails on my two examples above.
I understand that I can write my own parser (so please don't offer that.) I'm curious if there's a built-in Windows API that does it already?
PS. There must be one that OS uses internally to parse those.
PS2. Here's the code I've been toying with:
TCHAR buffFileName[MAX_PATH];
TCHAR buffExt[MAX_PATH];
LPCTSTR strInputPath = _T("C:\\WINDOWS\\system32\\mstsc.exe /v:%WKSNAME% /f"); // _T() keeps the literal consistent with TCHAR
if(_tsplitpath_s(strInputPath, NULL, 0, NULL, 0, buffFileName, MAX_PATH, buffExt, MAX_PATH) == 0)
{
//Got something
}
I believe PathFindFileName may be what you're looking for. From the docs:
Searches a path for a file name.
The examples within the docs seem to show the exact behavior you describe.
If you'd rather parse the entire command-line, CommandLineToArgvW may be helpful. It takes a command-line and splits it into an array containing the filename and any arguments.
This function's parsing rules are fairly intricate, so be sure to look over the docs, but a simple explanation of them can be found in this answer.
I'm trying to use GetDiskFreeSpaceEx in my C++ win32 application to get the total available bytes on the 'current' drive. I'm on Windows 7.
I'm using this sample code: http://support.microsoft.com/kb/231497
And it works! Well, almost. It works if I provide a drive, such as:
...
szDrive[0] = 'C'; // <-- specifying drive
szDrive[1] = ':';
szDrive[2] = '\\';
szDrive[3] = '\0';
pszDrive = szDrive;
...
fResult = pGetDiskFreeSpaceEx ((LPCTSTR)pszDrive,
(PULARGE_INTEGER)&i64FreeBytesToCaller,
(PULARGE_INTEGER)&i64TotalBytes,
(PULARGE_INTEGER)&i64FreeBytes);
fResult becomes true and I can go on to accurately calculate the number of free bytes available.
The problem, however, is that I was hoping to not have to specify the drive, but instead just use the 'current' one. The docs I found online state:
lpDirectoryName [in, optional]
A directory on the disk. If this parameter is NULL, the function uses the root of the current disk.
But if I pass in NULL for the Directory Name then GetDiskFreeSpaceEx ends up returning false and the data remains as garbage.
fResult = pGetDiskFreeSpaceEx (NULL,
(PULARGE_INTEGER)&i64FreeBytesToCaller,
(PULARGE_INTEGER)&i64TotalBytes,
(PULARGE_INTEGER)&i64FreeBytes);
//fResult == false
Is this odd? Surely I'm missing something? Any help is appreciated!
EDIT
As per JosephH's comment, I did a GetLastError() call. It returned the DWORD for:
ERROR_INVALID_NAME 123 (0x7B)
The filename, directory name, or volume label syntax is incorrect.
2nd EDIT
Buried down in the comments I mentioned:
I tried GetCurrentDirectory and it returns the correct absolute path, except it prefixes it with \\?\
it returns the correct absolute path, except it prefixes it with \\?\
That's the key to this mystery. What you got back is the name of the directory with the native api path name. Windows is an operating system that internally looks very different from what you are familiar with winapi programming. The Windows kernel has a completely different api, it resembles the DEC VMS operating system a lot. No coincidence, David Cutler used to work for DEC. On top of that native OS were originally three api layers, Win32, POSIX and OS/2. They made it easy to port programs from other operating systems to Windows NT. Nobody cared much for the POSIX and OS/2 layers, they were dropped at XP time.
One infamous restriction in Win32 is the value of MAX_PATH, 260. It sets the largest permitted size of a C string that stores a file path name. The native api permits much larger names, 32000 characters. You can bypass the Win32 restriction by using the path name using the native api format. Which is simply the same path name as you are familiar with, but prefixed with \\?\.
So surely the reason that you got such a string back from GetCurrentDirectory() is because your current directory name is longer than 259 characters. Extrapolating further, GetDiskFreeSpaceEx() failed because it has a bug, it rejects the long name it sees when you pass NULL. Somewhat understandable, it isn't normally asked to deal with long names. Everybody just passes the drive name.
This is fairly typical for what happens when you create directories with such long names. Stuff just starts falling over randomly. In general there is a lot of C code around that uses MAX_PATH and that code will fail miserably when it has to deal with path names that are longer than that. This is a pretty exploitable problem too for its ability to create stack buffer overflow in a C program, technically a carefully crafted file name could be used to manipulate programs and inject malware.
There is no real cure for this problem, that bug in GetDiskFreeSpaceEx() isn't going to be fixed any time soon. Delete that directory, it can cause lots more trouble, and write this off as a learning experience.
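The prefixing described above can be sketched as plain string manipulation. This is only an illustration: the function name toExtendedLengthPath is hypothetical, it assumes the input is already an absolute path written with backslashes, and it uses the \\?\UNC\ form for network paths, which is how the Windows file-naming documentation describes them:

```cpp
#include <string>

// Hypothetical helper: adds the \\?\ prefix to an absolute Windows path.
std::wstring toExtendedLengthPath(const std::wstring& absPath)
{
    const std::wstring prefix = L"\\\\?\\";
    const std::wstring devicePrefix = L"\\\\.\\";
    // Already prefixed, or a raw device path: leave untouched.
    if (absPath.compare(0, prefix.size(), prefix) == 0 ||
        absPath.compare(0, devicePrefix.size(), devicePrefix) == 0)
        return absPath;
    // UNC path \\server\share\... becomes \\?\UNC\server\share\...
    if (absPath.compare(0, 2, L"\\\\") == 0)
        return prefix + L"UNC\\" + absPath.substr(2);
    // Drive path C:\... becomes \\?\C:\...
    return prefix + absPath;
}
```

The prefixed string is meant for the wide-character (W) versions of the file functions; the sketch does not check whether a particular API accepts it.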
I am pretty sure you will have to retrieve the current drive and directory and pass that to the function. I remember attempting to use GetDiskFreeSpaceEx() with the directory name as ".", but that did not work.
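One way to follow that suggestion is to reduce the current directory to its drive root before calling GetDiskFreeSpaceEx. A rough sketch, assuming a drive-letter path such as the one GetCurrentDirectory returns; driveRootOf is a hypothetical helper, and it also strips a leading \\?\ since that prefix showed up in the question:

```cpp
#include <string>

// Returns "X:\" for paths like "X:\dir\file" or "\\?\X:\dir",
// or an empty string if the path does not start with a drive letter.
std::wstring driveRootOf(std::wstring path)
{
    const std::wstring prefix = L"\\\\?\\";
    if (path.compare(0, prefix.size(), prefix) == 0)
        path.erase(0, prefix.size());

    const bool hasDrive =
        path.size() >= 2 && path[1] == L':' &&
        ((path[0] >= L'A' && path[0] <= L'Z') ||
         (path[0] >= L'a' && path[0] <= L'z'));
    if (!hasDrive)
        return std::wstring();

    return path.substr(0, 2) + L"\\";
}
```

The result, for example L"C:\\" for L"\\\\?\\C:\\some\\dir", can then be passed as lpDirectoryName in place of NULL.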
Is there a way with Qt 4.6 to check if a given QString is a valid filename (or directory name) on the current operating system? I want to check for the name to be valid, not for the file to exist.
Examples:
// Some valid names
test
under_score
.dotted-name
// Some specific names
colon:name // valid under UNIX OSes, but not on Windows
what? // valid under UNIX OSes, but still not on Windows
How would I achieve this? Is there some Qt built-in function?
I'd like to avoid creating an empty file, but if there is no other reliable way, I would still like to see how to do it in a "clean" way.
Many thanks.
This is the answer I got from Silje Johansen - Support Engineer - Trolltech ASA (in March 2008 though)
However, the complexity of including locale settings and finding
a unified way to query the filesystems on Linux/Unix about their
functionality is close to impossible.
However, to my knowledge, all applications I know of ignore this
problem.
(read: they aren't going to implement it)
Boost doesn't solve the problem either; they give only some vague notion of the maximum length of paths, especially if you want to be cross-platform. As far as I know many have tried and failed to crack this problem (at least in theory; in practice it is most definitely possible to write a program that creates valid filenames in most cases).
If you want to implement this yourself, it might be worth considering a few not immediately obvious things such as:
Complications with invalid characters
The difference between file system limitations and OS and software limitations. Windows Explorer, which I consider part of the Windows OS does not fully support NTFS for example. Files containing ':' and '?', etc... can happily reside on an ntfs partition, but Explorer just chokes on them. Other than that, you can play safe and use the recommendations from Boost Filesystem.
Complications with path length
The second problem not fully tackled by the Boost page is the length of the full path. Probably the only thing that is certain at this moment is that no OS/filesystem combination supports indefinite path lengths. However, statements like "Windows maximum paths are limited to 260 chars" are wrong. The Unicode API in Windows does allow you to create paths up to 32,767 UTF-16 characters long. I haven't checked, but I imagine Explorer chokes on these just as devotedly, which would make this feature utterly useless for software having any users other than yourself (on the other hand you might prefer not to have your software choke in chorus).
There exists an old variable that goes by the name of PATH_MAX, which sounds promising, but the problem is that PATH_MAX simply isn't.
To end with a constructive note, here are some ideas on possible ways to code a solution.
Use defines to make OS specific sections. (Qt can help you with this)
Use the advice given on the boost page and OS and filesystem documentation to decide on your illegal characters
For path length the only workable idea that springs to my mind is a binary-search, trial-and-error approach that uses the system call's error handling to check for a valid path length. This is quite crude, but might be the only possibility of getting accurate results on a variety of systems.
Get good at elegant error handling.
Hope this has given some insights.
Based on User7116's answer here:
How do I check if a given string is a legal/valid file name under Windows?
I quit being lazy - looking for elegant solutions, and just coded it. I got:
bool isLegalFilePath(QString path)
{
if (!path.length())
return false;
// Anything following the raw filename prefix should be legal.
if (path.left(4)=="\\\\?\\")
return true;
// Windows filenames are not case sensitive.
path = path.toUpper();
// Trim the drive letter off
if (path.length()>=2 && path[1]==':' && (path[0]>='A' && path[0]<='Z'))
path = path.right(path.length()-2);
QString illegal="<>:\"|?*";
foreach (const QChar& c, path)
{
// Check for control characters (toLatin1() returns 0 for non-Latin-1
// characters, which would wrongly reject them, so compare the code unit)
if (c.unicode() < 32)
return false;
// Check for illegal characters
if (illegal.contains(c))
return false;
}
// Check for device names in filenames
static QStringList devices;
if (!devices.count())
devices << "CON" << "PRN" << "AUX" << "NUL" << "COM0" << "COM1" << "COM2"
<< "COM3" << "COM4" << "COM5" << "COM6" << "COM7" << "COM8" << "COM9" << "LPT0"
<< "LPT1" << "LPT2" << "LPT3" << "LPT4" << "LPT5" << "LPT6" << "LPT7" << "LPT8"
<< "LPT9";
const QFileInfo fi(path);
const QString basename = fi.baseName();
foreach (const QString& d, devices)
if (basename == d)
// Note: Names with ':' other than with a drive letter have already been rejected.
return false;
// Check for trailing periods or spaces
if (path.right(1)=="." || path.right(1)==" ")
return false;
// Check for pathnames that are too long (disregarding raw pathnames)
if (path.length()>260)
return false;
// Exclude raw device names
if (path.left(4)=="\\\\.\\")
return false;
// Since we are checking for a filename, it mustn't be a directory
if (path.right(1)=="\\")
return false;
return true;
}
Features:
Probably faster than using regexes
Checks for illegal characters and excludes device names (note that '\' is not illegal, since it can be in path names)
Allows drive letters
Allows full path names
Allows network path names
Allows anything after \\?\ (raw file names)
Disallows anything starting with \\.\ (raw device names)
Disallows names ending in "\" (i.e. directory names)
Disallows names longer than 260 characters not starting with \\?\
Disallows trailing spaces and periods
Note that it does not check the length of filenames starting with \\?\, since that is not a hard and fast rule. Also note, as pointed out here, names containing multiple backslashes and forward slashes are NOT rejected by the win32 API.
I don't think that Qt has a built-in function, but if Boost is an option, you can use Boost.Filesystem's name_check functions.
If Boost isn't an option, its page on name_check functions is still a good overview of what to check for on various platforms.
This is difficult to do reliably on Windows (there are some odd things, such as a file named "com1" still being invalid), and you have to decide whether you want to handle Unicode, or subst tricks that allow a >260 char filename.
There is already a good answer here How do I check if a given string is a legal / valid file name under Windows?
see example (from Digia Qt Creator sources) in: https://qt.gitorious.org/qt-creator/qt-creator/source/4df7656394bc63088f67a0bae8733f400671d1b6:src/libs/utils/filenamevalidatinglineedit.cpp
I'd just create a simple function to validate the filename for the platform, which just searches through the string for any invalid characters. Don't think there's a built-in function in Qt. You could use #ifdefs inside the function to determine what platform you're on. Clean enough I'd say.
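A minimal sketch of that idea in plain C++ rather than Qt. The names hasOnlyLegalChars, isValidFileName, kWindowsIllegal and kUnixIllegal are made up for illustration; the Windows set is the character list from the isLegalFilePath answer plus both path separators, and the check is deliberately conservative in that it rejects control characters on every platform. It only looks at characters, so reserved device names and length limits are not covered:

```cpp
#include <string>

// Characters rejected in a single file name component (assumed sets).
const std::string kWindowsIllegal = "<>:\"|?*\\/";
const std::string kUnixIllegal = "/";

// True if `name` is non-empty and contains no control characters and no
// character from `illegal`.
bool hasOnlyLegalChars(const std::string& name, const std::string& illegal)
{
    if (name.empty())
        return false;
    for (unsigned char c : name) {
        if (c < 32 || illegal.find(static_cast<char>(c)) != std::string::npos)
            return false;
    }
    return true;
}

// Platform wrapper, selected at compile time with #ifdef.
bool isValidFileName(const std::string& name)
{
#ifdef _WIN32
    return hasOnlyLegalChars(name, kWindowsIllegal);
#else
    return hasOnlyLegalChars(name, kUnixIllegal);
#endif
}
```

With the examples from the question, "colon:name" and "what?" pass against kUnixIllegal but fail against kWindowsIllegal, while "under_score" passes both.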