How would one get a UTF-8/Unicode string from GetOpenFileName? - c++

I'm developing an application in MinGW/C++ that uses Windows' common dialogs. The need has arisen to collect a file name that might have non-ASCII characters in it. Is there a flag or another option for retrieving a file name in Unicode, or preferably UTF-8?

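Call GetOpenFileNameW. You can do this without converting your entire app to Unicode, which may be the most expedient solution.
The Windows API comes in two flavours, ANSI and Unicode. Functions in the former have an A suffix; functions in the latter have a W suffix. You are currently using the former. The W version fills its buffer with UTF-16 (wchar_t) text; if you need UTF-8, convert the result with WideCharToMultiByte and the CP_UTF8 code page.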
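A minimal sketch of that approach, assuming a MinGW or MSVC build that links against comdlg32. PickFileUtf8 and WideToUtf8 are hypothetical helper names, not part of the Windows API. The encoder is written by hand here so the conversion can be followed step by step; on Windows, WideCharToMultiByte with CP_UTF8 should give the same result. The dialog call sits behind _WIN32 because it only exists on Windows.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <commdlg.h>
#endif

// Encode a wide string as UTF-8. Accepts UTF-16 surrogate pairs (Windows,
// 16-bit wchar_t) as well as whole code points (32-bit wchar_t elsewhere).
// An unpaired surrogate is encoded as-is rather than rejected.
std::string WideToUtf8(const std::wstring& w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(w[i]);
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            unsigned long lo = static_cast<unsigned long>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

#ifdef _WIN32
// Show the Unicode open dialog. Returns the chosen path as UTF-8, or an
// empty string if the user cancelled or the dialog failed.
std::string PickFileUtf8(HWND owner) {
    wchar_t path[MAX_PATH] = L"";
    OPENFILENAMEW ofn = {};
    ofn.lStructSize = sizeof(ofn);
    ofn.hwndOwner = owner;
    ofn.lpstrFile = path;
    ofn.nMaxFile = MAX_PATH;
    ofn.Flags = OFN_FILEMUSTEXIST | OFN_PATHMUSTEXIST;
    if (!GetOpenFileNameW(&ofn)) return std::string();
    return WideToUtf8(path);
}
#endif
```

Keep in mind that the UTF-8 string is only useful inside your own code. To actually open the file on Windows you still need the wide path (for example with CreateFileW or _wfopen), because the char-based file functions expect the ANSI code page, not UTF-8.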

Related

MFC CEdit converts non-ascii characters to ascii

We have an MFC Windows application, originally written in VC++ 6 and updated over the years for newer IDEs; it is currently developed in VS2017.
The application is built with MBCS (not Unicode). Trying to switch to Unicode causes 3806 compile errors, and that is probably just the tip of the iceberg.
However, we want to be able to run the application with a different code page, e.g. 1250 (Central European).
I tried to build a small test application, and managed to get it to work with special characters (čćšđž). I did this by setting dialog font to Microsoft Sans Serif with code page 1250.
The same approach in our application does not work. Note: dialogs in our application are created dynamically, and font is set using SetFont.
There is a difference in how the special characters are treated in these two applications.
In the test application, the special characters are displayed in the edit control, and GetWindowText retrieves the right bytes. However, characters typed from other languages are rendered as "????".
In our application, all special characters are rendered properly, but GetWindowText (or WM_GETTEXT) converts the special characters to similar ASCII counterparts (čćđ -> ccd).
I believe that the edit control in our application displays Unicode text, but GetWindowText converts it to ASCII.
Does anyone have any idea what is happening here, and how I might solve it?
Note: I know how to convert the project to Unicode. We are choosing not to commit resources to it at the moment, as it would probably take weeks or months to implement. The question is how I might get it to work with MBCS, and why the edit control is converting Č to C.
I believe it is absolutely possible to port the application to other languages/code pages. You only need to modify the .rc (resource) files, basically having one resource file per language, which you may want to do anyway, since the strings in menus and string tables would be in a different language. This is the only change needed as far as the application itself is concerned.
The other part is the system you are running it on. A window can be Unicode or non-Unicode. You can see this with the Spy++ utility (spyxx), which tells you whether a window (procedure) is Unicode or not (Window Properties, General tab). Unicode windows work properly, but non-Unicode ones have to convert between Unicode and MBCS when getting or setting the text. The conversion is based on the system default code page, which can only be set globally (for the whole machine), not per application or window. Setting the font's code page is therefore not enough (and in my opinion it is not needed at all if you are running the application on a machine with the "correct" code page). That is, for non-Unicode applications only one code page will work properly; the others won't.
I can see two options:
If you only need to update a small number of controls, it may be possible to change only these controls to Unicode and use the "wide" versions of the get/set window-text functions or messages. You will have to convert the text between Unicode and your desired code page. This requires writing some code, but has the advantage that the conversion is independent of the system default code page; e.g. you can keep the code page in a configuration file, in the registry, or as a command-line option (in the application's shortcut). Some control types can be changed to Unicode and others cannot, so please check the documentation. I used this technique successfully for an MBCS application displaying/editing translated strings in many different languages, but I only had one control, a List-View, which offers the LVM_SETUNICODEFORMAT message and thus allows Unicode text even in an MBCS application.
The easiest method is simply to run the application as is, but it will then only work on machines with the proper default code page, as is the case for most non-Unicode applications.
The system default codepage can be changed by setting the "Language for non-Unicode programs" option, available in the regional settings, Administrative tab, and requires a reboot. Changing the Windows UI language will change this option as well, but by setting this option you don't need to change the UI language, eg you can have English UI and East-European codepage.
See a very similar post here.
Late to the party:
In our application, all special characters are rendered properly, but GetWindowText (or WM_GETTEXT) convert the special characters to the similar ascii counterpart (čćđ -> ccd).
That sounds like the ES_OEMCONVERT flag has been set for the control:
Converts text entered in the edit control. The text is converted from the Windows character set to the OEM character set and then back to the Windows character set. This ensures proper character conversion when the application calls the CharToOem function to convert a Windows string in the edit control to OEM characters. This style is most useful for edit controls that contain file names that will be used on file systems that do not support Unicode.
To change this style after the control has been created, use SetWindowLong.

using CListCtrl to display utf-8 characters

I'm trying to display text in my CListCtrl for the last several hours with no success.
I'm using std::ifstream to read from a UTF-8 encoded .txt file to populate the CListCtrl.
The "Character Set" option in the project properties is "Not Set", and I can't change it to Unicode; this is an old project that was not originally written by me.
Also, conversion from UTF-8 to ANSI doesn't work, and I can't use the Boost libraries.
From what I have read, CListCtrl doesn't support UTF-8.
I will be glad to hear any solution that might work, such as an extended CListCtrl to replace the old one. I am using VS2010, .NET4.
You need to use the Unicode version of the list view APIs (e.g. LVM_SETITEMW). The MBCS version of MFC calls the ANSI versions of the Windows APIs, which may not be able to display some Unicode characters in your file.
This means you need to send an LVM_SETITEMW message with an LVITEMW structure if you want to change an item, for example. If you have many list controls to change, you can probably write a CListCtrlW class using MFC's code as a reference. LVITEMW expects strings to be UTF-16, so you need to convert the string data to UTF-16. You can do this via MultiByteToWideChar or CA2W with the CP_UTF8 code page. Also, if you are using a font that cannot handle some Unicode characters from your input, you need to change the font.
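As a rough sketch of the steps above, assuming you have the HWND of the list control (in MFC, GetSafeHwnd() on the CListCtrl) and that the item already exists. Utf8ToWide and SetItemTextUtf8 are hypothetical helper names, and the whole block is Windows-only, hence the _WIN32 guard.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <commctrl.h>
#include <string>

// Convert UTF-8 bytes (as read from the file) to UTF-16.
// Returns an empty string for empty or invalid input.
std::wstring Utf8ToWide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  static_cast<int>(utf8.size()), NULL, 0);
    if (len <= 0) return std::wstring();
    std::wstring wide(static_cast<std::wstring::size_type>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &wide[0], len);
    return wide;
}

// Set the text of an existing item/subitem through the wide message, so the
// text does not pass through the ANSI code page even in an MBCS build.
bool SetItemTextUtf8(HWND hList, int item, int subItem,
                     const std::string& utf8) {
    std::wstring wide = Utf8ToWide(utf8);
    LVITEMW lvi = {};
    lvi.mask = LVIF_TEXT;
    lvi.iItem = item;
    lvi.iSubItem = subItem;
    lvi.pszText = const_cast<wchar_t*>(wide.c_str());
    return SendMessageW(hList, LVM_SETITEMW, 0,
                        reinterpret_cast<LPARAM>(&lvi)) != 0;
}
#endif
```

Inserting new rows would follow the same pattern with LVM_INSERTITEMW. If the file starts with a UTF-8 byte order mark, strip those three bytes before converting the first line.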
If a lot of places in the UI are required to handle Unicode input, you can try moving the ANSI part of the business logic out to a DLL and then changing your main exe project to Unicode.

Using an ini file without Unicode

Is there any provision in WinAPI or otherwise for using ini files (or similar style config files) without having to use LPCWSTRs for most things?
My app is using single width ASCII strings throughout, and I've just got round to reading the ini file. Unicode strings are proving to be difficult to deal with and convert between.
If I can't find something fairly simple I think I will just use fstream and be done with it.
.INI files are very old. They predate Unicode support in Windows, and they are plain text files. Tons of applications (including mine) work with them using the plain ANSI API, e.g. GetPrivateProfileString.
If your application is built as Unicode by default, you can explicitly call GetPrivateProfileStringA. This forces all of its string parameters to be plain char strings.
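For illustration, a small wrapper around the A function might look like this. ReadIniString is a hypothetical helper name, the 512-character buffer is an arbitrary choice, and the block is Windows-only.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Read one string value through the ANSI entry point. Every parameter is a
// plain char string, whether or not the project defines UNICODE.
// Values longer than the buffer are truncated.
std::string ReadIniString(const char* iniPath, const char* section,
                          const char* key, const char* fallback) {
    char buffer[512] = "";
    GetPrivateProfileStringA(section, key, fallback,
                             buffer, sizeof(buffer), iniPath);
    return std::string(buffer);
}
#endif
```

Pass a full path for the ini file. If the name has no path, the system looks for the file in the Windows directory rather than next to your executable.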

SHGetFolderPath returns path with question marks in it

Our application calls SHGetFolderPath when it runs, to get the My Documents folder. This normally works great. However, for three users - Дмитрий, Jörg and Jörgen (see if you can spot the pattern!) - the call returns some very strange results. For example, for Дмитрий, the call returns:
c:\Users\???????\Documents
I assume there's some sort of character encoding shenanigan going on here, possibly related to Unicode, but I don't have any experience with that sort of thing. How can I get a useful path to the folder (and other related folders) out of windows, without grovelling through registry keys for the information?
In an email to me, Дмитрий ("Dmitry"), told me his "my documents" folder was actually located here:
C:\Users\43D6~1\Documents
So I know there's a way to get a "normal" version of the path out of Windows, I just don't know what it is.
Background: Our application is not unicode-aware, and uses standard "char *" strings. How can we get the "normal" path? I'm not opposed to calling the "unicode" version of the function, then converting it to "normal" text, if that's possible. Converting the application entirely to use unicode is not an option here (we don't have the time).
Thanks.
Go ahead and get the file path in Unicode. Then call GetShortPathNameW to convert to short pathname components. The output shouldn't contain any characters outside of the ASCII range even though it's a Unicode function. You can then truncate each Unicode character back to 8 bits to create a char string.
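Something along these lines, as a sketch rather than tested production code. GetDocumentsPathAscii and NarrowAscii are hypothetical helper names, and CSIDL_PERSONAL is assumed to be the folder you want. The narrowing step checks each character instead of truncating blindly, because short names are not guaranteed to exist: if 8.3 name generation is disabled on the volume, GetShortPathNameW can hand back the long name unchanged.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <shlobj.h>
#endif

// Narrow a wide string to char, but only if every character is plain ASCII.
// Returns false (and leaves 'out' untouched) if narrowing would lose data.
bool NarrowAscii(const std::wstring& wide, std::string& out) {
    std::string result;
    for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
        if (static_cast<unsigned long>(wide[i]) > 0x7F) return false;
        result += static_cast<char>(wide[i]);
    }
    out = result;
    return true;
}

#ifdef _WIN32
// Fetch the My Documents path as UTF-16, shorten it to 8.3 components, and
// narrow it for code that only handles char*.
bool GetDocumentsPathAscii(std::string& out) {
    wchar_t longPath[MAX_PATH] = L"";
    if (FAILED(SHGetFolderPathW(NULL, CSIDL_PERSONAL, NULL,
                                SHGFP_TYPE_CURRENT, longPath))) {
        return false;
    }
    wchar_t shortPath[MAX_PATH] = L"";
    DWORD n = GetShortPathNameW(longPath, shortPath, MAX_PATH);
    if (n == 0 || n >= MAX_PATH) return false;  // failed or buffer too small
    return NarrowAscii(shortPath, out);
}
#endif
```

If GetDocumentsPathAscii returns false, the application has no ASCII-safe path to fall back on and should report that rather than continue with a path full of question marks.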
I'm not opposed to calling the "unicode" version of the function, then converting it to "normal" text, if that's possible.
If you change your call to SHGetFolderPath to SHGetFolderPathW, it will provide you with a string of type LPWSTR, which is a Unicode string. From there, you can use that string with the various Unicode functions that end with W to access the folder or files you need.

How to use resources in VC++?

I am using VC 9 and I want to support the Russian language in my application. I created Russian resource strings, but they only display correctly when my system has the Russian language setting. Without it, every character displays as junk (the code page is 1251). I also made a DLL from the Russian resource file. If I load that DLL from the application in its installed location, it works fine.
But when I change the computer's setting to English and load that DLL from the application, dialogs and message boxes show junk characters. Shouldn't the application read from the DLL, not from the computer's language setting? My problem is how to make a language-independent DLL. Is there any code or setting for this?
By far the easiest solution is to stick to Unicode.
Windows is Unicode internally. (Almost) every API function exists in two variants, FooA and FooW. The FooA variant converts chars to wchar_ts before calling FooW. The exact conversion is defined by the code page.
Now, if you use Unicode, there is no such conversion and no code page. If the user enters ж (U+0436), it is stored as wchar_t(0x0436) and never converted. If your resource contains ж in Unicode, it too is not converted.
If the strings you want to display cannot be represented in the system code page, the only solution is Unicode.
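To make the code page dependence concrete, here is a toy model of the char to wchar_t step. DecodeByte is a made-up function that covers only the two ranges needed for the example (it is not how Windows implements the conversion; real code would call MultiByteToWideChar). It shows why the same byte from a 1251-encoded resource turns into a different character on a machine whose code page is 1252.

```cpp
// Toy model of what a FooA function does before calling FooW: map one byte
// to a wchar_t according to a code page. Only the ranges used in this
// example are modeled.
wchar_t DecodeByte(unsigned char b, int codePage) {
    if (b < 0x80) {
        return static_cast<wchar_t>(b);                 // ASCII is shared
    }
    if (codePage == 1251 && b >= 0xC0) {
        return static_cast<wchar_t>(0x0410 + (b - 0xC0));  // Cyrillic letters
    }
    if (codePage == 1252 && b >= 0xA0) {
        return static_cast<wchar_t>(b);                 // same as Latin-1 here
    }
    return L'?';                                        // not modeled
}

// The Unicode route: the character is already a wchar_t, so no code page
// is consulted at all.
const wchar_t kZhe = L'\u0436';
```

In this model the byte 0xE6 decodes to U+0436 (ж) under code page 1251 but to U+00E6 (æ) under 1252, which is the kind of junk described in the question, while kZhe holds 0x0436 regardless of the system setting.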