non-unicode WM_CHAR in unicode windows

I have written a DLL which exports a function that creates a window using RegisterClassExW and CreateWindowExW. Every message is retrieved via
GetMessageW(&msg, wnd_handle, 0, 0);
TranslateMessage(&msg);
DispatchMessageW(&msg);
Also there is a program which loads the DLL and calls the function.
Despite the Unicode window creation method, the wParam in the WM_CHAR message always contains ASCII characters, even if I type some non-ASCII symbols or use Alt+(code). Instead of UTF-16, the wParam contains some ASCII character between 'A' and 'z'.
The WndProc is a static function inside the DLL.
The problem doesn't occur when all the window-related code is inside one program.
Is there a way to always have Unicode WM_CHAR messages inside the DLL's window?

The problem was in the message retrieval. I called GetMessageW(&msg, wnd_handle, 0, 0), passing the handle of my window, instead of GetMessageW(&msg, 0, 0, 0). As a result, the WM_INPUTLANGCHANGEREQUEST messages were swallowed and the locale remained English.

Your approach seems like it should work.
Is it possible that you're calling the ANSI DefWindowProc instead of the wide version? That would translate a WM_UNICHAR into ANSI WM_CHAR messages. Maybe that would explain what you're seeing.
As an experiment, I'd handle the WM_UNICHAR messages directly, and see what the data looks like at that point.

I am not 100% sure, but it might help:
Take a look at the settings of the project that calls the DLL functions. Make sure that its character set is Unicode as well, and not multi-byte:
(go to Project Properties, then to the General section, and set the Character Set option to "Use Unicode Character Set"). I am assuming that you're using Visual Studio 2003 or later.

Related

Unicode text appears as question marks in edit box, even though I use SetWindowTextW()

I have a problem with unicode filenames appearing as question marks in my edit boxes.
When I paste unicode characters in an edit box, for example Arabic or Thai, they show correctly, but after I run this code, they become question marks. How come?
WCHAR buf[100];
GetWindowTextW(hWndEditBox, buf, 100);
SetWindowTextW(hWndEditBox, buf);
Another thing - the project is ANSI (we have code that can't be ported so the entire project stays ANSI), i.e. _UNICODE macro is undefined, but I explicitly use the Unicode versions of the filenames.

The GetWindowText function actually sends a WM_GETTEXT message to the window (hWndEditBox). Since you're using the *A functions rather than the *W function (specifically CreateWindowExA in this case, I think) your message loop will be converting from wide characters to multi-byte characters using some locale.
Your only solution here seems to be changing the entire window setup - if your code that requires ANSI is not related to UI this should be possible. Alternatively, you may be able to replace the edit box with rich edit boxes, that provide extra messages (such as streaming, for example).
You may want to check whether it is the GetWindowTextW call or the SetWindowTextW call that is doing the bad conversion - if GetWindowTextW works correctly you may be able to convert to multibyte using the correct locale before you set it.
Finally, you can try adjusting the thread's code page before reading the text, though this may cause all sorts of other issues. The usual advice is to use Unicode.
Sources: GetWindowText and this comment from Raymond Chen on his blog.

A useful answer addressing SetWindowTextW() is given in https://stackoverflow.com/a/11515400/1190077 :
intercept the resulting WM_SETTEXT message and re-route it to DefWindowProcW() instead of DefWindowProc().

Get the WNDPROC for windows handle

Is there any Windows API function that retrieves the WNDPROC for a window handle?
Thanks in advance.

Use GetWindowLongPtr(hwnd, GWLP_WNDPROC).
Caution: GetWindowLongPtr is actually #defined to GetWindowLong for 32-bit systems, so in order to import it in Delphi you might need to use GetWindowLong instead. As well, GetWindowLongPtr itself is #defined to either GetWindowLongPtrA or GetWindowLongPtrW (for non-unicode and unicode targets), so again you might need to choose the right name manually for Delphi if the import system there is not really smart.
Remember that if you are going to call the obtained window procedure, you should do it using CallWindowProc. Thanks to @In silico for the hint.
Please note that the value which is returned is not always the real pointer to the window procedure. Sometimes it's just a kind of handle which is recognized and correctly processed by CallWindowProc. For example, you'll not get the real function pointer if your application is ANSI, but the window belongs to a Unicode component (or vice versa). See this posting in The Old New Thing for more details.

c++ win32 get utf8 char from keyboard

How would I read keystrokes using the Win32 API? I would also like to see them from international keyboards, for example German umlauts.
thanks

There's a difference between keyboard presses and the characters they generate.
At the lowest level, you can poll the keyboard state with GetKeyboardState. That's often how keylogging malware does it, since it requires the least privileges and sees everything regardless of where the focus is. The problem with this approach (besides requiring constant polling) is that you have to piece together the keyboard state into keystrokes and then keystrokes into a character stream. You have to know how the keyboard is mapped, and you have to keep track of the state of the shift, control, and alt keys. You also have to know about auto-repeat, dead keys, and possibly other complications.
If you have privileges you can install a keyboard hook, as Jens mentioned in his answer.
If you have focus, and you're a console app, you use one of the functions to read from standard input. On Windows, it's hard to get true Unicode input. You generally get so-called ANSI characters, which correspond to the current code page for the console window. If you know the code page, you can use MultiByteToWideChar to convert the single- or multi-byte input into UTF-16 (which Windows documentation calls Unicode). From there you can convert it to UTF-8 (with WideCharToMultiByte) or whatever other Unicode encoding you want.
If you have focus, and you're a GUI app, you can see keystrokes with WM_KEYDOWN (and friends). You can also get fully resolved UTF-16 characters with WM_CHAR (or UTF-32 from WM_UNICHAR). If you need UTF-8 from those, you'll have to do a conversion.
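One way to do that conversion by hand is sketched below in portable C++. The class name Utf16ToUtf8 is hypothetical. It assumes each WM_CHAR delivers one UTF-16 code unit in the low 16 bits of wParam, so a character outside the Basic Multilingual Plane arrives as two messages (a surrogate pair) that have to be joined before encoding.

```cpp
#include <cstdint>
#include <string>

// Collects UTF-16 code units one at a time and yields UTF-8 once a full
// code point is available. Utf16ToUtf8 is an illustrative name.
class Utf16ToUtf8 {
public:
    // Returns the UTF-8 bytes of a completed code point, or "" while
    // waiting for the second half of a surrogate pair.
    std::string Push(std::uint16_t unit)
    {
        std::uint32_t cp;
        if (unit >= 0xD800 && unit <= 0xDBFF) {        // high surrogate: hold it
            pendingHigh_ = unit;
            return std::string();
        }
        if (unit >= 0xDC00 && unit <= 0xDFFF) {        // low surrogate
            if (pendingHigh_ == 0) {
                return "\xEF\xBF\xBD";                 // unpaired: emit U+FFFD
            }
            cp = 0x10000u + ((pendingHigh_ - 0xD800u) << 10) + (unit - 0xDC00u);
        } else {
            cp = unit;                                 // a dangling high surrogate is dropped
        }
        pendingHigh_ = 0;
        return Encode(cp);
    }

private:
    static std::string Encode(std::uint32_t cp)
    {
        std::string out;
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        return out;
    }

    std::uint16_t pendingHigh_ = 0;
};
```

In a WM_CHAR handler, a long-lived instance would be fed static_cast<std::uint16_t>(wParam) and any non-empty result appended to the UTF-8 output. On Windows, WideCharToMultiByte with CP_UTF8 is an alternative to encoding by hand.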

To get keyboard input regardless of focus, you'll probably need to hook the keyboard.
Take a look at SetWindowsHookEx with WH_KEYBOARD or WH_KEYBOARD_LL. Add a W to the call for the Unicode variant.

Piecewise conversion of an MFC app to Unicode/MBCS

I have a large MFC application that I am extending to allow for multi-lingual input. At the moment I need to allow the user to enter Unicode data in edit boxes on a single dialog.
Is there a way to do this without turning UNICODE or MBCS on for the entire application? I only need a small part of the application converted at the moment. Is it possible to do this piecewise, and if so, how?
Clarification: I could use ::GetWindowTextW() to get Unicode information out of the window. I am trying to figure out how to allow the user to enter Unicode text in the window. Currently, characters the user types outside of the windows-1252 codepage show up as '?'. Is there a way to fix this?

To allow CEdit to show Unicode characters you should create it with CreateWindowW function. I've just tested it in ANSI MFC program.
// allows Unicode characters
CreateWindowW( L"EDIT", L"", WS_CHILD|WS_VISIBLE, 10, 10, 50, 20, GetSafeHwnd(), 0, 0, 0 );
// shows Unicode characters as ?
CreateWindow( "EDIT", "", WS_CHILD|WS_VISIBLE, 10, 10, 50, 20, GetSafeHwnd(), 0, 0, 0 );
You could create all the edit boxes manually in the dialog's OnInitDialog function, and later subclass them to a custom CMyEdit class with Unicode support.

Can you replace these edit boxes with rich edit controls? Then you could enter international characters even in a non-Unicode build; internally, they would be rtf-encoded, but then when you stream the text out from the control, you can use the SF_UNICODE format to get the Unicode representation.

This PowerPoint slideshow may be of interest to you -- it's a bit old (2000) but it talks about converting a program to mixed ANSI/Unicode.
Case Study: Porting an MFC Application to Unicode

Just a thought - you could try turning on UNICODE for your build and use ANSI calls where you need to (eg. CStringA).
(I understand that this may not be an option for you, but thought it worth pointing out that you could tackle this problem the other way round)

Why is my WM_UNICHAR handler never called?

I have an ATL control that I want to be Unicode-aware. I added a message handler for WM_UNICHAR:
MESSAGE_HANDLER( WM_UNICHAR, OnUniChar )
But, for some reason, the OnUniChar handler is never called.
According to the documentation, the handler should first be called with "UNICODE_NOCHAR", on which the handler should return TRUE if you want to receive UTF-32 characters. But, as I said, the handler is never called.
Is there anything special that needs to be done to activate this?

What are you doing that you think should generate a WM_UNICHAR message?
If your code (or the ATL code) ultimately calls CreateWindowW, then your window is already Unicode aware, and WM_CHAR messages will be UTF-16 format.
The documentation is far from clear on when, exactly, a WM_UNICHAR message gets generated, but from what I can gather in very limited poking around on Google Groups and on the Internet it looks like it gets sent by 3rd party apps and not by Windows itself, unless the Window is an ANSI window (CreateWindowA and all that). Have you tried manually sending a WM_UNICHAR message to your window to see what happens? If you get the message then there's nothing wrong with your message dispatch code and there's just nothing happening that would cause WM_UNICHAR. You can also check with Spy++ and see whether you're getting that message, though I suspect it's just not being sent.

My experience today is that Spy++ does not give correct results for WM_CHAR in a Unicode proc. I am getting ASCII translations or '?' showing in the Messages list, even if I view Raw (not Decoded) arguments. The debugger shows wParam to be the Unicode code point though.

void CMFCProView::OnUniChar (UINT xChar, UINT nRepCnt, UINT nFlags)
void CMFCProView::OnChar (UINT xChar, UINT nRepCnt, UINT nFlags)
The range of UINT (unsigned int) is 0 to 4294967295 decimal (32-bit). OnChar can do whatever you want OnUniChar to do. Click an English character A on the soft keyboard, and OnChar will receive 0x0041. Click a CJKV 一 (one), and OnChar will receive 0x4E00. So we don't need OnUniChar in the app.
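One caveat, sketched in portable C++: both examples above lie in the Basic Multilingual Plane, where a character is a single UTF-16 code unit. A character above U+FFFF does not fit in one unit, so a handler that receives UTF-16 would be expected to see it as two calls, one per surrogate. The helper name ToUtf16Units is hypothetical, and the sketch does not validate its input (it assumes a code point no greater than 0x10FFFF that is not itself a surrogate).

```cpp
#include <cstdint>
#include <vector>

// Splits a Unicode code point into its UTF-16 code units.
// ToUtf16Units is an illustrative helper; input is assumed to be valid.
std::vector<std::uint16_t> ToUtf16Units(std::uint32_t codePoint)
{
    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: one unit, the same value as the code point.
        return { static_cast<std::uint16_t>(codePoint) };
    }
    // Supplementary planes: a high surrogate followed by a low surrogate.
    const std::uint32_t v = codePoint - 0x10000;
    return { static_cast<std::uint16_t>(0xD800 + (v >> 10)),
             static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)) };
}
```

For 0x4E00 this yields the single unit 0x4E00, matching the example above; for 0x1F600 it yields 0xD83D followed by 0xDE00.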
