"type" emojis using C++ [duplicate] - c++

This question already has an answer here:
Using SendInput to send unicode characters beyond U+FFFF
(1 answer)
Closed 7 years ago.
I don't really know what to put in the title, but here is what I need:
I write small programs that "type" a given input. Here is a small example that types "test".
#include <windows.h>
void Press(int Touch);
int main()
{
    Sleep(5000); // Sleep a bit, so that you can select where to type
    Press(VkKeyScan('t'));
    Press(VkKeyScan('e'));
    Press(VkKeyScan('s'));
    Press(VkKeyScan('t'));
    return 0;
}
void Press(int Touch)
{
    keybd_event(Touch, 0x9d, 0, 0);               // key down
    keybd_event(Touch, 0x9d, KEYEVENTF_KEYUP, 0); // key up
}
So what I need is basically this, but with emojis. I need to be able to "type" any emoji, such as "😂", from my program. Any ideas?

There are two ways you can approach this.
The first is using "alt codes":
1. Hold the ALT key
2. Press + on the number pad
3. Type the Unicode code point in hex
4. Release the ALT key
However, this method requires having EnableHexNumpad set in the Windows registry.
The second would be using the Windows clipboard:
1. Save the contents of the Windows clipboard
2. Set the clipboard contents to the Unicode character you wish to insert
3. Send CTRL + V to paste the character
4. Revert the clipboard to its previous content

keybd_event is deprecated, as mentioned on its MSDN page. When you look up a Windows function and find that it is deprecated, you really should consider using the newer one.
Use SendInput, which among other things supports Unicode keyboard input emulation (the KEYEVENTF_UNICODE flag).
You can send characters that do not fit in 16 bits (code points above U+FFFF) by packing their two UTF-16 surrogate code units into consecutive INPUT structures.
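As a rough sketch of that last point (my own illustration, not code from the answer): a code point above U+FFFF is first split into its two UTF-16 surrogates, and each surrogate is delivered in its own INPUT structure with KEYEVENTF_UNICODE. The names SurrogatePair, toSurrogates and typeCodePoint are made up for this example, and sending both key-downs before both key-ups is an assumption about what the target application accepts.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper type: the two UTF-16 code units of one character.
struct SurrogatePair {
    char16_t high;
    char16_t low;
};

// Split a code point in the range U+10000..U+10FFFF into its surrogates.
SurrogatePair toSurrogates(char32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const char32_t v = cp - 0x10000; // 20 significant bits remain
    return { static_cast<char16_t>(0xD800 + (v >> 10)),
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };
}

#ifdef _WIN32
// Hypothetical helper: "type" one such character into the focused window.
// Assumption: both key-downs are sent first, then both key-ups.
bool typeCodePoint(char32_t cp)
{
    const SurrogatePair s = toSurrogates(cp);
    const WORD units[2] = { static_cast<WORD>(s.high), static_cast<WORD>(s.low) };
    INPUT in[4] = {};
    for (int i = 0; i < 4; ++i) {
        in[i].type = INPUT_KEYBOARD;
        in[i].ki.wVk = 0;              // must be 0 when KEYEVENTF_UNICODE is set
        in[i].ki.wScan = units[i % 2]; // the UTF-16 code unit to deliver
        in[i].ki.dwFlags = KEYEVENTF_UNICODE | (i >= 2 ? KEYEVENTF_KEYUP : 0);
    }
    return SendInput(4, in, sizeof(INPUT)) == 4;
}
#endif
```

In the question's program, the four Press calls could then be replaced by something like typeCodePoint(0x1F602) after the Sleep, where 0x1F602 is the code point of the emoji shown in the question.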

Related

Text on MFC Controls - Unicode Characters such as Japanese get cut off

Background
I'm working on a C++/MFC application and we've been converting it to display unicode characters to support foreign languages. For the most part this has been successful and unicode characters are displayed correctly. But I've encountered an issue where certain text on certain controls gets cut off.
Example
Here you can see a button that should display "ログアウト/終了" but gets cut off and displays an unknown character in its place.
But if I pad the string with spaces it displays fine. The number of spaces needed varies by string. This string needed 4 spaces to display correctly, whereas another string with one less character needed 5 spaces; there doesn't seem to be a correlation or pattern with the number of spaces needed. And also, I don't want to pad strings randomly throughout the code, especially when other languages don't need this at all.
What I've tried (doesn't work)
Shrinking the font size
Resizing the control
Changing the font facename
Changing the font character set
Copying the control properties from another control in the application that does not have this issue
Add extra null terminators
Padding with zero-width characters
Using SetWindowTextW
Changing source and execution character sets
Changing system locale
The only thing I've found that works is padding with an arbitrary amount of spaces which is certainly not an ideal solution.
Other info
I've only noticed this issue for Japanese characters, but have only tested English, German, and Japanese.
Japanese characters use 3 bytes of data, which I suspect has something to do with this but I don't know what or why. English characters use 1 byte and certain German characters use 2 bytes.
A control (button, label, etc.) in one place may have the issue whereas a control in a different place that contains the same text does not, even if they're both buttons.
When the text is cut off, it typically either displays a question mark box (like the first image) or a random character/letter at the end. This character changes each time I run the application, but the question mark box is the most common.
For my padding "fix", it doesn't matter if the spaces are at the beginning or end of the string, as long as the number of spaces is enough. It also doesn't need to be spaces, any non-zero-width character works.
Compiled using MBCS (Multibyte Character Set) and the Windows 10 UTF-8 Unicode Support setting enabled. (As opposed to compiling with UNICODE defined which isn't an option. Large old codebase)
EDIT: Here is an example on how the text is set
GetDlgItem(IDC_SOME_CTRL_ID)->SetWindowText(GetTranslation("Some String"));
Where GetTranslation() is our own function to look up the translation of "Some String" (basically a lookup table) and return a CString. Using a debugger I can see the returned CString always has the correct string value. I can replace GetTranslation with a hardcoded Japanese string and the issue will still happen.
EDIT 2: I got complaints that this code wasn't enough.
myapp.rc
// Microsoft Visual C++ generated resource script.
//
#include "resource.h"
#define APSTUDIO_READONLY_SYMBOLS
#include "afxres.h"
#undef APSTUDIO_READONLY_SYMBOLS
IDD_VIEW_MENU DIALOGEX 0, 0, 50, 232
STYLE DS_SETFONT | WS_CHILD
FONT 14, "Verdana", 0, 0, 0x1
BEGIN
CONTROL "btn0",IDC_BUTTON_MENU_0,"Button",BS_3STATE | BS_PUSHLIKE,12,38,25,13
END
#endif
resource.h
#define IDC_BUTTON_MENU_0 6040
ViewMenu.cpp
#include "stdafx.h"
#include "ViewMenu.h"
CViewMenu::CViewMenu() : CFormView(CViewMenu::IDD)
{
}
void CViewMenu::DoDataExchange(CDataExchange* pDX)
{
    CFormView::DoDataExchange(pDX);
    DDX_Control(pDX, IDC_BUTTON_MENU_0, m_ctrlMenuButton0);
}
void CViewMenu::OnInitialUpdate()
{
    CFormView::OnInitialUpdate();
}
void CViewMenu::OnDraw(CDC* pDC)
{
    CFormView::OnDraw(pDC);
    // Narrow string literal: this is the call that shows the truncated text
    GetDlgItem(IDC_BUTTON_MENU_0)->SetWindowText("ログアウト/終了");
    return;
}
ViewMenu.h
#include "resource.h"
class CViewMenu : public CFormView
{
protected:
    CViewMenu();
public:
    enum { IDD = IDD_VIEW_MENU };
    CButton m_ctrlMenuButton0;
};
The following should work in Windows 10 versions 1903 and later, regardless of the default system locale, and fulfills OP's requirements (string literals, MBCS build, no Unicode windows etc). It was verified to work in version 2004 set to En-US locale, without "Beta: Use Unicode UTF-8 for worldwide language support" checked, using VS 2019 16.7.5 to build.
Save source files containing characters outside the active codepage in UTF-8 encoding, with or without BOM.
Compile with _MBCS defined (in the IDE: Properties / Advanced / Character Set = MBCS).
Compile with the /utf-8 switch (C/C++ / Command Line / Additional Options = /utf-8).
Create a manifest file declaring UTF-8 as the target codepage for the process (per the activeCodePage documentation).
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
  <asmv3:application>
    <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">
      <activeCodePage>UTF-8</activeCodePage>
    </asmv3:windowsSettings>
  </asmv3:application>
</assembly>
Add the manifest file to the project (in the IDE: Manifest Tool / General / Input and Output / Additional Manifest Files = manifest file created at the previous step).
This ain't Python. With C++ you need to know, why your code works. Otherwise it doesn't.
GetDlgItem(IDC_BUTTON_MENU_0)->SetWindowText("ログアウト/終了");
That's where you and your compiler start to disagree. You think this should be UTF-8. Your compiler, on the other hand, trusts you, and assumes that you are using the source character set.
While you are unaware of a concept called source character set, you get all confused about something that should be the norm: Garbage in, garbage out.
If you feel like fixing the "Garbage in" part (now, clearly, that is your job), read up on C++ string literals. In case you don't make it to the end, the quickest way to fix your ungodly workaround is to use a u8 prefix.
Seriously, though, the real solution is to use Windows' native character encoding, which, oddly, you seem to reject, even though you could use it, given a string literal. I mean, it's not like you have to change anything global. Just call SetWindowTextW and use an L prefix.
Just saying, you know...
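To put numbers on the encodings discussed in this thread, here is a small sketch (my own illustration, not code from either answer). It spells the button text with universal character names, so the result does not depend on how the source file is saved, and then counts UTF-16 code units and UTF-8 bytes. kLogoutUtf16, utf8Length and setLogoutCaption are hypothetical names; the Windows-only part simply calls SetWindowTextW with an L-prefixed literal, as suggested above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// The button text from the question, written with escapes (pure ASCII source).
const std::u16string kLogoutUtf16 =
    u"\u30ED\u30B0\u30A2\u30A6\u30C8/\u7D42\u4E86";

// Number of bytes the same text occupies in UTF-8.
// Sketch only: handles the Basic Multilingual Plane, no surrogate pairs.
std::size_t utf8Length(const std::u16string& s)
{
    std::size_t n = 0;
    for (char16_t c : s) {
        assert(c < 0xD800 || c > 0xDFFF); // surrogates are not handled here
        n += (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;
    }
    return n;
}

#ifdef _WIN32
// Hypothetical helper: set the caption through the wide API, bypassing the
// process code page entirely.
inline BOOL setLogoutCaption(HWND button)
{
    return ::SetWindowTextW(button,
                            L"\u30ED\u30B0\u30A2\u30A6\u30C8/\u7D42\u4E86");
}
#endif
```

The text is 8 UTF-16 code units but 22 UTF-8 bytes (seven characters of 3 bytes plus the slash), which is why any code along the way that assumes one byte per character can misjudge the length of the narrow string.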

Writing unicode(?) character directly from source code to WriteConsoleOutput

I'm trying to use WriteConsoleOutput from the WinApi to write characters to the command prompt window buffer. The thing is, I'd really like to be able to write characters such as ☺ directly into the source code, as-is, instead of using some kind of encoding/notation like '\uFFFF' or '0xFF', since I don't understand them too well (differences between codepages/character sets/etc.)
The code below showcases the simplest form of my problem. Running this code does not print ☺ into the command prompt window, but a question mark (?) instead.
#include <Windows.h>
int main()
{
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    CHAR_INFO c[1] = {0};
    COORD cS = {1, 1};
    COORD cH = {0, 0};
    SMALL_RECT sr = {0, 0, 0, 0};
    c[0].Attributes = FOREGROUND_INTENSITY;
    c[0].Char.UnicodeChar = '☺';
    WriteConsoleOutput(h, c, cS, cH, &sr);
    Sleep(5000);
    return 0;
}
It is vital for my code to display output identically between all Windows versions, regardless of the languages installed/used. So to my knowledge (which admittedly is absolutely minimal), I'd need to set a specific codepage (one which would hopefully be supported by the command prompt in any language Windows).
I've tried:
• Changing from using the CHAR_INFO.UnicodeChar to CHAR_INFO.AsciiChar
• Fiddling around with SetConsoleCP and SetConsoleOutputCP functions, but I haven't got a clue on how to utilize them to help me with this problem.
• Changing the Visual Studio -> Project -> Project properties.. -> Character Set setting to every possible value.
• Using specifically either WriteConsoleOutputA or WriteConsoleOutputW in addition to the aforementioned settings
• Changing the source code file encoding to UTF-8 with(/out) signature.
In my project I'm programmatically setting the command prompt font to 8x8 Terminal, which to my knowledge does not support actual unicode characters. The available characters are displayed here. Those characters do include '☺', so I'm not entirely sure my question is about unicode. I have no idea anymore. Please help.
C source should be ASCII only if you want it to be portable. If you embed non-ASCII characters in a C source file, an IDE might show them in what appears to be the correct format, but the compiler quite likely treats them differently, and the function you pass them to can treat them differently still. It's just not portable or reliable. But you can use the escape sequence \x to embed arbitrary bytes in C strings.
UTF-8 is good for internal use, but Windows APIs don't yet support it, so you need to convert to Windows 16-bit chars (UTF-16, nearly but not quite) to display extended characters. You also have to ensure that you are calling the wide-character version of the Windows API. Most Windows API functions that take a string come in an A and a W version (ANSI and wide) for binary backwards compatibility. If you query the identifier in the IDE (go to definition etc.) you should see which version you have.
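A minimal sketch of what that could look like for the question's snippet, assuming the console font actually contains the glyph: the character is written as a wide literal with a universal character name, so the source file stays pure ASCII, and the W version of the function is called explicitly. kSmiley and drawSmiley are hypothetical names introduced for this illustration.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// U+263A WHITE SMILING FACE, written as an escape so the source stays ASCII.
// A narrow char literal, as used in the question, cannot hold this value.
constexpr wchar_t kSmiley = L'\u263A';

#ifdef _WIN32
// Hypothetical helper: draw the smiley in the top-left cell of the console.
bool drawSmiley()
{
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    CHAR_INFO cell[1] = {};
    cell[0].Char.UnicodeChar = kSmiley;
    cell[0].Attributes = FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE;
    COORD bufferSize = {1, 1};
    COORD bufferPos = {0, 0};
    SMALL_RECT region = {0, 0, 0, 0};
    // Calling the W version directly means the project's Character Set
    // setting no longer decides which function is used.
    return WriteConsoleOutputW(h, cell, bufferSize, bufferPos, &region) != 0;
}
#endif
```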

C++ spanish question mark

I am a beginner in C++ and I am developing a simple console calculator. When my program asks the user whether they want to exit, the character '¿' doesn't appear (questions in Spanish are written between '¿' and '?').
Can someone help me?
PS: The problem only happens on Windows, not on Linux.
EDIT: Here is the code that produces the output:
cout << '¿' <<"Desea salir (S/N)? " ;
There are a few ways to deal with this problem.
The fundamental problem is not that the ¿ doesn't exist in the console, but that the console and your C++ text editor disagree on what that character is. The two are using different character codes for many characters beyond those needed for English. Character codes 32-126 (letters, numbers, punctuation and brackets), are universally the same. However, character codes 128 through 255, which from a Spanish point of view includes all the accented characters, "u with diaeresis" (e.g. "pingüino"), Ñ, and the starting ¿ and ¡, depend on the specific environment.
Why such an inconvenient disagreement in character codes exists is a historical accident, interesting on its own but out of the scope of this question. To keep it simple: in the Windows OS, "consoles" (typically) use the list of characters described in OEM Code Page 437, while Windows applications like your C++ editor (typically) use the Windows-1252 Code Page.
There is no portable (universal) solution for this problem, because the issue of differing charsets is a platform-specific problem. Windows is unfortunately somewhat unique in that the editor and (console) outputs use different sets.
The first and simplest solution - which is fine for toy programs - is to just look up the character code that you want from the OEM 437 code-page, and use that. For ¿, that's #168 (0xa8 in hex, or \250 in octal). You can just embed the character code in the string to make clear what you're trying to do, either of these:
std::cout << "\x0a8" "Cu" "\x0a0" "l es el primer n" "\x0a3" "mero?\n"; // hex
std::cout << "\250Cu\240l es el primer n\243mero?\n"; // octal
Outputs:
¿Cuál es el primer número?
Note how I had to do the same thing with the ú and the á. Unfortunately, writing strings like this gets unwieldy quickly. Using macros or const chars can help, but not much.
A second alternative is to use a Windows function such as CharToOemA. For example [1]:
#include <windows.h>
...
...
char pregunta[] = "¿Cuál es el primer número\n";
char *pregunta_oem = new char[sizeof(pregunta)/sizeof(char)];
CharToOemA(pregunta, pregunta_oem);
std::cout << pregunta_oem;
delete []pregunta_oem;
For a more complex program, I would wrap that pattern into a utility function or class.
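One possible shape for such a utility, as a sketch rather than a definitive implementation: toOem is a hypothetical name, it uses CharToOemBuffA (the length-taking sibling of CharToOemA) so that it can work on std::string, and on non-Windows builds it simply returns its input unchanged.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical utility: convert text from the ANSI code page used by the
// editor to the OEM code page used by the console.
std::string toOem(const std::string& ansi)
{
#ifdef _WIN32
    std::string oem(ansi.size(), '\0');
    if (!ansi.empty()) {
        // One byte in, one byte out, so both buffers have the same size.
        CharToOemBuffA(ansi.c_str(), &oem[0], static_cast<DWORD>(ansi.size()));
    }
    return oem;
#else
    return ansi; // no ANSI/OEM split outside Windows: pass through unchanged
#endif
}
```

Usage would then be a single expression such as std::cout << toOem(pregunta);, with no manual buffer management. As with the snippet above, this assumes the string literals in the source are stored in the ANSI code page.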
A different approach is to change the Code Page of the console, so that it agrees with your C++ editor and the rest of Windows. You can do that via the CHCP console command, or via the SetConsoleOutputCP() function, but that doesn't work on the default "raster font" used by consoles, so you have to change the font as well. When the font is set to a unicode font like Lucida Console, this works:
std::cout << "¿Cuál es el primer número?\n"; // ┐Cußl es el...
UINT originalCP = GetConsoleOutputCP();
SetConsoleOutputCP(1252);
std::cout << "¿Cuál es el primer número?\n"; // ¿Cuál es el...
SetConsoleOutputCP(originalCP);
(I don't know if you can change the font from the program itself; I have to look that up. The standard way to do it from the console is to click on the tiny icon on the corner, click Properties, Font tab, and pick a font from the list).
[1] I have to warn that this snippet contains a number of subtleties that can easily trip up a beginner. You have to make sure the source of the text is a char array; if you're using a char pointer, sizeof won't work correctly and you have to use strlen(source)+1. For the source I used the natural option of a char array initialized from a literal; the destination must be a writable buffer of at least the same size, so a pointer to a string literal will not do, because literals are read-only. If you are using a writable char array, you can use the same array for the source and destination. This example feels very C-like.
You can use _setmode function to do that :
#include <cstdlib> // for system()
#include <iostream>
#include <string>
#if defined(WIN32) && !defined(UNIX)
# include <stdio.h> // for _fileno()
# include <io.h> // for _setmode()
# include <fcntl.h> // for _O_U16TEXT
#endif // WIN32 && !UNIX
int main()
{
#if defined(WIN32) && !defined(UNIX)
    // Switch stdout to UTF-16 text mode; use only wide streams afterwards
    _setmode(_fileno(stdout), _O_U16TEXT);
    //^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#endif // WIN32 && !UNIX
    std::wstring wstr = L"'¿' and '?'";
    std::wcout << L"WString : " << wstr << std::endl;
    system("pause");
    return 0;
}
To write UNICODE chars (assuming LE is the standard Windows variant of UTF-16...) out with the iostream library, call _setmode() with _O_U16TEXT and then use wcout.
But you can't use cout anymore. It throws an assert.
Check this answer.
Assuming you are using simple call to std::cout, you should be able to print Unicode strings, if you set your command line to Unicode mode:
1. Change code page to UTF-8
You can do this by simply calling the command below in your cmd:
chcp 65001
2. Make sure you are using a font which has the characters you want to display
Lucida Console should do the trick, as it supports ¿ (and other characters included in WGL4).
This character is simply not included in basic ASCII. Try using wstring: http://www.cplusplus.com/reference/string/wstring/
As you can see in the extended ASCII table (OEM code page 437), the symbol ¿ has the code 168. You can use an octal escape of the form \ddd in the output stream to print a special character; for code 168 that is \250.
This is because the command console does not support non-ASCII characters by default (ASCII covers mainly English-language characters and no accented ones). To get support for characters in other character sets, play around with the chcp command. Refer to its documentation here.
In your case I think you need to run chcp 850 in the console before running your program.

Linux send unicode character to active application

Ok, so I'm trying to develop an app using C++ and Qt4 for Linux that will map certain key sequences to special Unicode characters. Also, I'm trying to make it bilingual, so the special Unicode character sent depends on the selected language. Example: AltGr+s will send ß or ș, depending whether German or Romanian is selected. On Windows, I have achieved this using AutoHotKey. However, I couldn't get IronAHK to work on Linux so I have written myself a nice Qt Application for it, using Qxt to register "global" shortcuts. I have tried this snippet:
void mainWnd::sendKeypress( unsigned int keycode )
{
    Display *display = QX11Info::display();
    Window curr_focus;
    int revert_to;
    XGetInputFocus( display, &curr_focus, &revert_to );
    XTestFakeKeyEvent( display, keycode, true, 0 );  // key press
    XTestFakeKeyEvent( display, keycode, false, 1 ); // key release
    XFlush( display );
}
copied from another application (where it works), but here it seems to do nothing. Also, there might be a problem with the fact that the characters I'm trying to send aren't found on a US 101 keyboard, which is what I currently use on my laptop (and as the layout in the OS).
So my question is: how do I make the app send a Unicode character to whichever app has focus, inserting a special character (sort of like KCharMap)? Remember, these are special characters which are not found on a normal US keyboard. Thanks in advance.

FillConsoleOutputCharacter/WriteConsoleOutput and special characters

I'm messing with some of the native Windows console functions, and am impressed by their speed, if not their ease of use.
Anyway, I have long known that the following code will produce some interesting characters
for(int i = 0; i < 256; i++)
{
    cout << char(i) << endl;
}
However, I cannot get FillConsoleOutputCharacter or WriteConsoleOutput to produce all of those characters (many simply appear as question marks).
Here is an example of the code I am using:
COORD spot = {0,0};
HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
DWORD Written;
for(int i = 0; i < 256; i++)
{
    FillConsoleOutputAttribute(hOut, 7, 1, spot, &Written);
    FillConsoleOutputCharacterW(hOut, char(i), 1, spot, &Written);
    spot.Y++;
}
Does anyone know of a relatively convenient way to write those characters with the native functions?
By the way, I am using Visual Studio 2010 on Windows 7 x64.
Try using FillConsoleOutputCharacterA instead of FillConsoleOutputCharacterW. The W version takes Unicode characters, which takes a little knowledge to get right.
Edit: I tried using FillConsoleOutputCharacterA and it gives output equivalent to your first case.
FillConsoleOutputCharacterA should write the same set of characters that the cout function does. These characters are determined by the console's current code page.
With FillConsoleOutputCharacterW, you can still generate all the same characters (as well as any additional characters that may be included in the console font) but you need to use the Unicode (16-bit) codes for these characters, rather than the 8-bit codes used with cout.
Note that Windows internally uses an out-of-date version of Unicode, with characters limited to 16 bits (0-65,535) rather than Unicode proper, whose code points run from 0 to 1,114,111 (U+10FFFF), although most of these codes remain unassigned. I believe the console's Unicode character set corresponds to plane 0 of Unicode, the Basic Multilingual Plane.
The question marks appear when you write a control character or a character that isn't included in the current font.
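To illustrate the point about using 16-bit Unicode codes with the W function, here is a sketch that translates a few code page 437 values into their Unicode equivalents before calling FillConsoleOutputCharacterW. The table is only a small excerpt chosen for illustration, and cp437ToUtf16 and putCp437 are hypothetical names; a real program would more likely let MultiByteToWideChar do the full conversion for the console's code page, which I have not exercised here.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Excerpt of the OEM code page 437 -> Unicode mapping (illustration only;
// the full table has 256 entries).
char16_t cp437ToUtf16(unsigned char code)
{
    switch (code) {
    case 0x01: return 0x263A; // white smiling face
    case 0x02: return 0x263B; // black smiling face
    case 0xA0: return 0x00E1; // a with acute
    case 0xA3: return 0x00FA; // u with acute
    case 0xA8: return 0x00BF; // inverted question mark
    default:
        // Printable ASCII is identical in both encodings; anything else is
        // outside this excerpt.
        assert(code >= 0x20 && code <= 0x7E);
        return code;
    }
}

#ifdef _WIN32
// Hypothetical helper: write one code-page-437 character at the given cell.
bool putCp437(HANDLE hOut, unsigned char code, COORD where)
{
    DWORD written = 0;
    return FillConsoleOutputCharacterW(hOut,
                                       static_cast<WCHAR>(cp437ToUtf16(code)),
                                       1, where, &written) != 0;
}
#endif
```

Taking the code as unsigned char also sidesteps a pitfall in the question's loop: where char is signed, char(i) is negative for i of 128 and above, so it does not arrive at the W function as a value between 128 and 255.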