Can I retrieve a path, containing other than Latin characters?

Can I retrieve a path, containing other than Latin characters? - c++

I call GetModuleFileName function, in order to retrieve the fully qualified path of a specified module, in order to call another .exe in the same file, via Process::Start method.
However, .exe cannot be called when the path contains other than Latin characters (in my case Greek characters).
Is there any way I can fix this?
Code:
TCHAR path[1000];
GetModuleFileName(NULL, path, 1000) ; // Retrieves the fully qualified path for the file that
// contains the specified module.
PathRemoveFileSpec(path); // Removes the trailing file name and backslash from a path (TCHAR).
CHAR mypath[1000];
// Convert TCHAR to CHAR.
wcstombs(mypath, path, wcslen(path) + 1);
// Formatting the string: constructing a string by substituting computed values at various
// places in a constant string.
CHAR mypath2[1000];
sprintf_s(mypath2, "%s\\Client_JoypadCodesApplication.exe", mypath);
String^ result;
result = marshal_as<String^>(mypath2);
Process::Start(result);

Strings in .NET are encoded in UTF-16. The fact that you are calling wcstombs() means your app is compiled for Unicode and TCHAR maps to WCHAR, which is what Windows uses for UTF-16. So there is no need to call wcstombs() at all. Retrieve and format the path as UTF-16, then marshal it as UTF-16. Stop using TCHAR altogether (unless you need to compile for Windows 9x/ME):
WCHAR path[1000];
GetModuleFileNameW(NULL, path, 1000);
PathRemoveFileSpecW(path);
WCHAR mypath[1000];
swprintf_s(mypath, 1000, L"%s\\Client_JoypadCodesApplication.exe", path);
String^ result;
result = marshal_as<String^>(mypath);
Process::Start(result);
A better option would be to use a native .NET solution instead (untested):
String^ path = Path::DirectoryName(Application->StartupPath); // uses GetModuleFileName() internally
// or:
//String^ path = Path::DirectoryName(Process::GetCurrentProcess()->MainModule->FileName);
Process::Start(path + L"\\Client_JoypadCodesApplication.exe");

You must use GetModuleFileNameW and store the result in a wchar_t string.
Most Win32 API functions have a "Unicode" variant, which takes/gives UTF-16 strings. Using the ANSI versions is highly discouraged.

Related

GetFullPathNameW and long Windows file paths

In the Windows version of my current personal project, I'm looking to support extended length filepaths. As a result, I'm a little confused with how to use the GetFullPathNameW API to resolve the full name of a long filepath.
According to the MSDN (with regards to the lpFileName parameter):
In the ANSI version of this function, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend "\?\" to the path. For more information, see Naming a File.
If I'm understanding this correctly, in order to use an extended length filepath with GetFullPathNameW, I need to specify a path with the \\?\ prefix attached. Since the \\?\ prefix is only valid before volume letters or UNC paths, this would mean that the API is unusable for resolving the full name of a path relative to the current directory.
If that's the case, is there another API I can use to resolve the full name of a filepath like ..\somedir\somefile.txt if the resulting name's length exceeds MAX_PATH? If not, would I be able to combine GetCurrentDirectory with the relative filepath (\\?\C:\my\cwd\..\somedir\somefile.txt) and use it with GetFullPathNameW, or would I need to handle all of the filepath resolution on my own?

GetFullPathNameA is limited to MAX_PATH characters, because it converts the ANSI name to a UNICODE name beforehand using a hardcoded MAX_PATH-sized (in chars) UNICODE buffer. If the conversion doesn't fail due to the length restrictions, then GetFullPathNameW (or direct GetFullPathName_U[Ex]) is called and the resulting UNICODE name is converted to ANSI.
GetFullPathNameW is a very thin shell over GetFullPathName_U. It is limited to MAXSHORT (0x7fff) length in WCHARs, independent of the \\?\ file prefix. Even without \\?\, it will be work for long (> MAX_PATH) relative names. However, if the lpFileName parameter does not begin with the \\?\ prefix, the result name in the lpBuffer parameter will not begin with \\?\ either.
if you will be use lpBuffer with functions like CreateFileW - this function internally convert Win32Name to NtName. and result will be depended from nape type (RTL_PATH_TYPE). if the name does not begin with \\?\ prefix, the conversion fails because RtlDosPathNameToRelativeNtPathName_U[_WithStatus] fails (because if the path not begin with \\?\ it will be internally call GetFullPathName_U (same function called by GetFullPathNameW) with nBufferLength hardcoded to MAX_PATH (exactly 2*MAX_PATH in bytes – NTDLL functions use buffer size in bytes, not in WCHARs). If name begin with \\?\ prefix, another case in RtlDosPathNameToRelativeNtPathName_U[_WithStatus] is executed – RtlpWin32NtNameToNtPathName, which replaces \\?\ with \??\ and has no MAX_PATH limitation
So the solution may look like this:
if(ULONG len = GetFullPathNameW(FileName, 0, 0, 0))
{
PWSTR buf = (PWSTR)_alloca((4 + len) * sizeof(WCHAR));
buf[0] = L'\\', buf[1] = L'\\', buf[2] = L'?', buf[3] = L'\\';
if (len - 1 == GetFullPathName(FileName, len, buf + 4, &c))
{
CreateFile(buf, ...);
}
}
So we need to specify a path with the \\?\ prefix attached, but not before GetFullPathName - after!
For more info, read this - The Definitive Guide on Win32 to NT Path Conversion

Just to update with the current state:
Starting in Windows 10, version 1607, MAX_PATH limitations have been removed from common Win32 file and directory functions. However, you must opt-in to the new behavior. To enable the new long path behavior, both of the following conditions must be met: ...
For the rest, please see my answer here: https://stackoverflow.com/a/57624626/3736444

Convert wide CString to char*

There are lots of times this question has been asked and as many answers - none of which work for me and, it seems, many others. The question is about wide CStrings and 8bit chars under MFC. We all want an answer that will work in ALL cases, not a specific instance.
void Dosomething(CString csFileName)
{
char cLocFileNamestr[1024];
char cIntFileNamestr[1024];
// Convert from whatever version of CString is supplied
// to an 8 bit char string
cIntFileNamestr = ConvertCStochar(csFileName);
sprintf_s(cLocFileNamestr, "%s_%s", cIntFileNamestr, "pling.txt" );
m_KFile = fopen(LocFileNamestr, "wt");
}
This is an addition to existing code (by somebody else) for debugging.
I don't want to change the function signature, it is used in many places.
I cannot change the signature of sprintf_s, it is a library function.

You are leaving out a lot of details, or ignoring them. If you are building with UNICODE defined (which it seems you are), then the easiest way to convert to MBCS is like this:
CStringA strAIntFileNameStr = csFileName.GetString(); // uses default code page
CStringA is the 8-bit/MBCS version of CString.
However, it will fill with some garbage characters if the unicode string you are translating from contains characters that are not in the default code page.
Instead of using fopen(), you could use _wfopen() which will open a file with a unicode filename. To create your file name, you would use swprintf_s().

an answer that will work in ALL cases, not a specific instance...
There is no such thing.
It's easy to convert "ABCD..." from wchar_t* to char*, but it doesn't work that way with non-Latin languages.
Stick to CString and wchar_t when your project is unicode.
If you need to upload data to webpage or something, then use CW2A and CA2W for utf-8 and utf-16 conversion.
CStringW unicode = L"Россия";
MessageBoxW(0,unicode,L"Russian",0);//should be okay
CStringA utf8 = CW2A(unicode, CP_UTF8);
::MessageBoxA(0,utf8,"format error",0);//WinApi doesn't get UTF-8
char buf[1024];
strcpy(buf, utf8);
::MessageBoxA(0,buf,"format error",0);//same problem
//send this buf to webpage or other utf-8 systems
//this should be compatible with notepad etc.
//text will appear correctly
ofstream f(L"c:\\stuff\\okay.txt");
f.write(buf, strlen(buf));
//convert utf8 back to utf16
unicode = CA2W(buf, CP_UTF8);
::MessageBoxW(0,unicode,L"okay",0);

Storing and retrieving UTF-8 strings from Windows resource (RC) files

I created an RC file which contains a string table, I would like to use some special
characters: ö ü ó ú ő ű á é. so I save the string with UTF-8 encoding.
But when I call in my cpp file, something like this:
LoadString("hu.dll", 12, nn, MAX_PATH);
I get a weird result:
How do I solve this problem?

As others have pointed out in the comments, the Windows APIs do not provide direct support for UTF-8 encoded text. You cannot pass the MessageBox function UTF-8 encoded strings and get the output that you expect. It will, instead, interpret them as characters in your local code page.
To get a UTF-8 string to pass to the Windows API functions (including MessageBox), you need to use the MultiByteToWideChar function to convert from UTF-8 to UTF-16 (what Windows calls Unicode, or wide strings). Passing the CP_UTF8 flag for the first parameter is the magic that enables this conversion. Example:
std::wstring ConvertUTF8ToUTF16String(const char* pszUtf8String)
{
// Determine the size required for the destination buffer.
const int length = MultiByteToWideChar(CP_UTF8,
0, // no flags required
pszUtf8String,
-1, // automatically determine length
nullptr,
0);
// Allocate a buffer of the appropriate length.
std::wstring utf16String(length, L'\0');
// Call the function again to do the conversion.
if (!MultiByteToWideChar(CP_UTF8,
0,
pszUtf8String,
-1,
&utf16String[0],
length))
{
// Uh-oh! Something went wrong.
// Handle the failure condition, perhaps by throwing an exception.
// Call the GetLastError() function for additional error information.
throw std::runtime_error("The MultiByteToWideChar function failed");
}
// Return the converted UTF-16 string.
return utf16String;
}
Then, once you have a wide string, you will explicitly call the wide-string variant of the MessageBox function, MessageBoxW.
However, if you only need to support Windows and not other platforms that use UTF-8 everywhere, you will probably have a much easier time sticking exclusively with UTF-16 encoded strings. This is the native Unicode encoding that Windows uses, and you can pass these types of strings directly to any of the Windows API functions. See my answer here to learn more about the interaction between Windows API functions and strings. I recommend the same thing to you as I did to the other guy:
Stick with wchar_t and std::wstring for your characters and strings, respectively.
Always call the W variants of Windows API functions, including LoadStringW and MessageBoxW.
Ensure that the UNICODE and _UNICODE macros are defined either before you include any of the Windows headers or in your project's build settings.

How do I store value to string with RegOpenKeyEx?

I need to grab the path from the registry. The following code works except for the last part where I'm storing the path to the string. Running the debugger in Visual Studio 2008 the char array has the path, but every other character is a zero. This results in the string only being assigned the first letter. I've tried changing char res[1024] to char *res = new char[1024] and this just makes it store the first letter in the char array instead of the string. The rest of the program needs the path as a string datatype so it can't stay as a char array. What am I missing here?
unsigned long type=REG_SZ, size=1024;
string path;
char res[1024];
HKEY key;
if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, _T("SOFTWARE\\Classes\\dsn\\shell\\open\\command"), NULL, KEY_READ, &key)==ERROR_SUCCESS){
RegQueryValueEx(key,
NULL,// YOUR value
NULL,
&type,
(LPBYTE)res,
&size);
RegCloseKey(key);
path = string(res);
}

You're getting back a Unicode string, but assigning it to a char-based string.
You could switch path's class to being a 'tstring' or 'wstring', or use RegQueryValueExA (A for ASCII).

You are compiling in Unicode. Go to Project Settings>Configuration Properties>General and change "Character Set" to "Not Set", and rebuild your project.
RegOpenKey is actually a macro defined in the WINAPI headers. If Unicode is enabled, it resolves to RegOpenKeyW, if not then it resolves to RegOpenKeyA. If you want to continue to compile under unicode, then you can just call RetgOpenKeyA directly instead of using the macro.
Otherwise, you'll need to deal with Unicode strings which, if needed, we can help you with also.

For C++, you may prefer to access the Registry using the ATL helper class CRegKey. The method for storing string values is QueryStringValue. There are other (somewhat) typesafe methods for retrieving and setting different registry value types.
It's not the best C++ interface (eg. no std::string support) but a little smoother than native Win32.

_wfopen equivalent under Mac OS X

I'm looking to the equivalent of Windows _wfopen() under Mac OS X. Any idea?
I need this in order to port a Windows library that uses wchar* for its File interface. As this is intended to be a cross-platform library, I am unable to rely on how the client application will get the file path and give it to the library.

POSIX API in Mac OS X are usable with UTF-8 strings. In order to convert a wchar_t string to UTF-8, it is possible to use the CoreFoundation framework from Mac OS X.
Here is a class that will wrap an UTF-8 generated string from a wchar_t string.
class Utf8
{
public:
Utf8(const wchar_t* wsz): m_utf8(NULL)
{
// OS X uses 32-bit wchar
const int bytes = wcslen(wsz) * sizeof(wchar_t);
// comp_bLittleEndian is in the lib I use in order to detect PowerPC/Intel
CFStringEncoding encoding = comp_bLittleEndian ? kCFStringEncodingUTF32LE
: kCFStringEncodingUTF32BE;
CFStringRef str = CFStringCreateWithBytesNoCopy(NULL,
(const UInt8*)wsz, bytes,
encoding, false,
kCFAllocatorNull
);
const int bytesUtf8 = CFStringGetMaximumSizeOfFileSystemRepresentation(str);
m_utf8 = new char[bytesUtf8];
CFStringGetFileSystemRepresentation(str, m_utf8, bytesUtf8);
CFRelease(str);
}
~Utf8()
{
if( m_utf8 )
{
delete[] m_utf8;
}
}
public:
operator const char*() const { return m_utf8; }
private:
char* m_utf8;
};
Usage:
const wchar_t wsz = L"Here is some Unicode content: éà€œæ";
const Utf8 utf8 = wsz;
FILE* file = fopen(utf8, "r");
This will work for reading or writing files.

You just want to open a file handle using a path that may contain Unicode characters, right? Just pass the path in filesystem representation to fopen.
If the path came from the stock Mac OS X frameworks (for example, an Open panel whether Carbon or Cocoa), you won't need to do any conversion on it and will be able to use it as-is.
If you're generating part of the path yourself, you should create a CFStringRef from your path and then get that in filesystem representation to pass to POSIX APIs like open or fopen.
Generally speaking, you won't have to do a lot of that for most applications. For example, many applications may have auxiliary data files stored the user's Application Support directory, but as long as the names of those files are ASCII, and you use standard Mac OS X APIs to locate the user's Application Support directory, you don't need to do a bunch of paranoid conversion of a path constructed with those two components.
Edited to add: I would strongly caution against arbitrarily converting everything to UTF-8 using something like wcstombs because filesystem encoding is not necessarily identical to the generated UTF-8. Mac OS X and Windows both use specific (but different) canonical decomposition rules for the encoding used in filesystem paths.
For example, they need to decide whether "é" will be stored as one or two code units (either LATIN SMALL LETTER E WITH ACUTE or LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT). These will result in two different — and different-length — byte sequences, and both Mac OS X and Windows work to avoid putting multiple files with the same name (as the user perceives them) in the same directory.
The rules for how to perform this canonical decomposition can get pretty hairy, so rather than try to implement it yourself it's best to leave it to the functions the system frameworks have provided for you to do the heavy lifting.

#JKP:
Not all functions in MacOS X accept UTF8, but filenames and filepaths may be UTF8, thus all POSIX functions dealing with file access (open, fopen, stat, etc.) accept UTF8.
See here. Quote:
How a file name looks at the API level
depends on the API. Current Carbon
APIs handle file names as an array of
UTF-16 characters; POSIX ones handle
them as an array of UTF-8, which is
why UTF-8 works well in Terminal. How
it's stored on disk depends on the
disk format; HFS+ uses UTF-16, but
that's not important in most cases.
Some other POSIX functions handle UTF8 as well. E.g. functions dealing with user names, group names or user passwords use UTF8 to store the information (thus a user name can be Japanese and your password can be Chinese, no problem).
But not all handle UTF8. E.g. for all string functions an UTF8 string is just a normal C String and characters above 126 have no special meaning. They don't understand the concept of multiple bytes (chars in C) forming a single Unicode character. How other APIs handle char * pointer being passed to them is different from API to API. However, as a rule as the thumb you can say:
Either the function only accepts C strings with pure ASCII characters (only in the range 0 to 126) or it will accept UTF8. Usually functions don't allow characters above 126 and interpret them in any other encoding than UTF8. If this really was the case, it is documented and then there must be a way to pass the encoding along with the string.

If you're using Cocoa it's fairly easy with NSString. Just load the UTF16 data in using -initWithBytes:length:encoding: (or perhaps -initWithCString:encoding:) and then get a UTF8 version by calling UTF8String on the result. Then, just call fopen with your new UTF8 string as the param.
You can definitely call fopen with a UTF-8 string, regardless of language - can't help with C++ on OSX though - sorry.

I have read file name from configuration UTF8 file through wifstream (it uses wchar_t buffer).
Mac implementation is different from Linux and Windows.
wifstream reads each byte from file to separate wchar_t cell in the buffer. So we have 3 empty bytes, although open requires char string. Thus programmer can use wcstombs function to convert wide character string to multi-byte string.
The API supports UTF8. For better understanding use memory watcher and hex editor for your file.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js