I have an array of wchar_t. I need to add a unicode null character at a specific position in the array.
wchar_t var1[100];
var1[79] = '\u0000';
I tried the above but get the following compilation error.
error C3850: '\u0000': a universal-character-name specifies an invalid character
How do I add a unicode null character?
I think you can use
var1[79] = L'\0';
The compiler is applying a rule that forbids universal character names (UCNs) for control characters and for characters in the basic source character set, that is, characters you can easily write without using a UCN. That's why '\u0000' isn't permitted here. (I'm not quite sure what the rationale for that rule is.)
Since var1 is an array of wchar_t, L'\0' is the most straightforward thing to use.
But since char, wchar_t, and int are all integral types, and since values of any integral type can be assigned to an object of another integral type (as long as the value is in range of the target type), any of the following will work:
var1[79] = L'\0'; // best
var1[79] = '\0'; // char value converted to wchar_t
var1[79] = 0; // int value converted to wchar_t
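As a minimal sketch of the point above, the following checks at compile time that the three spellings denote the same value, and wraps the assignment in a hypothetical helper named terminate_at (the name and the bounds check are illustrative assumptions, not part of the original answer).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// All three spellings from the answer produce the same wchar_t value: zero.
static_assert(L'\0' == 0, "the wide null character is zero");
static_assert(static_cast<wchar_t>('\0') == L'\0', "a narrow null converts to a wide null");

// Hypothetical helper: writes a wide null terminator at index pos.
inline void terminate_at(wchar_t* buffer, std::size_t size, std::size_t pos)
{
    assert(pos < size);   // guard against writing past the end of the buffer
    buffer[pos] = L'\0';  // the most explicit spelling for a wchar_t array
}
```

After terminate_at(var1, 100, 79), functions such as wcslen treat index 79 as the end of the string, provided no earlier element is zero.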
I have programmed some simple Windows applications using the raw Win32 API, and my best guess is to use L'\0'.
Integer zero will also do:
var1[79] = 0;
I'm trying to assign the Chinese character 牛 as a char value in C++. On Xcode, I get the error:
"Character too large for enclosing character literal type."
When I use an online IDE like JDoodle or Browxy, I get the error:
"multi-character character constant."
It doesn't matter whether I use char, char16_t, char32_t or wchar_t, it won't work. I thought any Chinese character could at least fit into wchar_t, but this appears not to be the case. What can I do differently?
char letter = '牛';
char16_t character = '牛';
char32_t hanzi = '牛';
wchar_t word = '牛';
All of your character literals are standard chars. To get a wider type, you need to include the proper prefix on the literal:
char letter = '牛'; // still does not fit: a single char cannot hold this character
char16_t character = u'牛';
char32_t hanzi = U'牛';
wchar_t word = L'牛';
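A small sketch of how the prefix determines the literal's type. It uses only compile-time checks, and writes the character as the escape \u725B (its code point) so the example does not depend on the source file's encoding. The variable names ox16 and ox32 are illustrative.

```cpp
#include <type_traits>

// The prefix selects the type of the character literal.
static_assert(std::is_same<decltype('a'), char>::value, "no prefix gives char");
static_assert(std::is_same<decltype(L'a'), wchar_t>::value, "L gives wchar_t");
static_assert(std::is_same<decltype(u'a'), char16_t>::value, "u gives char16_t");
static_assert(std::is_same<decltype(U'a'), char32_t>::value, "U gives char32_t");

// U+725B is the code point of the character in the question.
constexpr char16_t ox16 = u'\u725B';
constexpr char32_t ox32 = U'\u725B';
static_assert(ox16 == 0x725B, "fits in a single UTF-16 code unit");
static_assert(ox32 == 0x725B, "UTF-32 stores the code point directly");
```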
Is a c-style string containing only one char considered a string or would you call that construct a char?
Zero or more characters followed by a NUL-terminator is a C-style string. You can use the double quotation character notation to define a literal.
In C, a character constant such as '3' has type int, although its value fits in a char.
Something like '34' is a multicharacter literal.
A one element buffer is still technically a buffer. Forming a pointer to the start of it is not at all affected by how many items are in it.
So no, it's not a char. Furthermore, even the type system would differentiate char[1] from char.
It's also worth noting that you may be surprised by what counts as a one-character string: "a" has two characters in its buffer, not one. The only valid C-string that fits in a one-character buffer is the empty string.
Is a c-style string containing only one char considered a string or
would you call that construct a char?
Indeed, a C-style string is a string, which is quite different from the char data type. C has no dedicated built-in string type like C++'s std::string, so one has to use null-terminated character arrays, e.g. char str[SIZE] = "something", to represent strings. A single character, on the other hand, is stored in a char, which is altogether different from char[]. The two are not the same!
Example,
char str[] = "a"; // sizeof(str) is 2 because of the terminating null character
char c = 'a'; // simply a single character
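The size difference described in these answers can be checked at compile time. This is a sketch in C++ (in C, sizeof('a') would equal sizeof(int) instead); c_length is a hypothetical helper written here only so the check needs no runtime call.

```cpp
#include <cstddef>

// A one-letter string literal occupies two chars: the letter and the terminator.
static_assert(sizeof("a") == 2, "letter plus terminating null");
// The empty string is the only C-string that fits in a one-char buffer.
static_assert(sizeof("") == 1, "terminator only");
// In C++, a character literal is a single char.
static_assert(sizeof('a') == 1, "one char, no terminator");

// Hypothetical helper: counts the chars before the terminator, like strlen.
constexpr std::size_t c_length(const char* s)
{
    return *s == '\0' ? 0 : 1 + c_length(s + 1);
}
static_assert(c_length("a") == 1, "one visible character");
```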
I am new to MFC. I am trying to write a simple MFC application and I'm getting confused in some places. For example, SetWindowText has two APIs, SetWindowTextA and SetWindowTextW; one takes char * and the other accepts wchar_t *.
What is the use of char * and wchar_t *?
char is used for the so-called ANSI family of functions (the function name typically ends with A), which interpret text in the current code page; for plain English text this coincides with ASCII.
wchar_t is used for the newer so-called Unicode (or Wide) family of functions (the function name typically ends with W), which use the UTF-16 encoding. It is very similar to UCS-2, but not quite the same: a character that does not fit in a single 16-bit unit is encoded as a pair of 16-bit code units (a surrogate pair), and this can be very confusing.
Converting one to the other is not a simple task. You will need to use something like MultiByteToWideChar, which requires knowing and providing the code page of the input ANSI string.
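To illustrate the two-unit case mentioned above, here is a portable sketch of how UTF-16 splits a code point beyond 0xFFFF into two 16-bit units. The names Utf16Units and encode_utf16 are hypothetical, and the function assumes its input is a valid code point (at most 0x10FFFF and not itself a surrogate).

```cpp
// Result of encoding one code point: one or two 16-bit code units.
struct Utf16Units {
    char16_t units[2];
    int count;
};

// Hypothetical helper; assumes cp is a valid Unicode code point.
constexpr Utf16Units encode_utf16(char32_t cp)
{
    return cp < 0x10000
        // Fits in 16 bits: stored as a single unit, exactly as in UCS-2.
        ? Utf16Units{{static_cast<char16_t>(cp), u'\0'}, 1}
        // Otherwise: subtract 0x10000 and split the remaining 20 bits in half.
        : Utf16Units{{static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10)),
                      static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF))},
                     2};
}
```

For example, encode_utf16(U'\U0001F600') yields the two units 0xD83D and 0xDE00, which is why the length of a wide string in wchar_t units on Windows can exceed its number of characters.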
On Windows, APIs that take char * use the current code page whereas wchar_t * APIs use UTF-16. As a result, you should always use wchar_t on Windows. A recommended way to do this is to:
// Be sure to define this BEFORE including <windows.h>
#define UNICODE 1
#include <windows.h>
When UNICODE is defined, APIs like SetWindowText will be aliased to SetWindowTextW and can therefore be used safely. Without UNICODE, SetWindowText will be aliased to SetWindowTextA and therefore cannot be used without first converting to the current code page.
However, there's no good reason to use wchar_t when you are not calling Windows APIs, since its portable functionality is not useful, and its useful functionality is not portable (wchar_t is UTF-16 only on Windows; on most other platforms it is UTF-32. What a total mess.)
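The platform difference can be made visible with a compile-time check. This is only a sketch; wchar_bits and wchar_holds_any_code_point are illustrative names, and the 21-bit threshold follows from the largest code point, U+10FFFF.

```cpp
#include <climits>

// Width of wchar_t in bits on the platform this is compiled for.
constexpr unsigned wchar_bits = static_cast<unsigned>(sizeof(wchar_t) * CHAR_BIT);

// Hypothetical helper: true when one wchar_t can hold any Unicode code point,
// which requires at least 21 bits.
constexpr bool wchar_holds_any_code_point()
{
    return wchar_bits >= 21;
}
```

On Windows wchar_bits is 16, so the helper returns false there; on platforms with a 32-bit wchar_t it returns true.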
SetWindowTextA takes char*, which is a pointer to ANSI strings.
SetWindowTextW takes wchar_t*, which is a pointer to "wide" strings (Unicode).
SetWindowText has been defined (#define) to either of these in header Windows.h based on the type of application you are building. If you are building a UNICODE build then your code will automatically use SetWindowTextW.
SetWindowTextA is there primarily to support legacy code, which needs to be built as SBCS (Single byte character set).
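The aliasing mechanism can be sketched without any Windows headers. Everything below is a hypothetical stand-in: TitleLengthA and TitleLengthW play the role of an A/W function pair, DEMO_UNICODE plays the role of UNICODE, and DEMO_TEXT is similar in spirit to the TEXT macro.

```cpp
#include <cstddef>

// Hypothetical narrow/wide pair; these are not Windows functions.
constexpr std::size_t TitleLengthA(const char* s)
{
    return *s ? 1 + TitleLengthA(s + 1) : 0;
}
constexpr std::size_t TitleLengthW(const wchar_t* s)
{
    return *s ? 1 + TitleLengthW(s + 1) : 0;
}

// A build-time macro decides which function the unsuffixed name refers to.
#define DEMO_UNICODE 1  // hypothetical switch standing in for UNICODE
#ifdef DEMO_UNICODE
#define TitleLength TitleLengthW
#define DEMO_TEXT(s) L##s  // turns "abc" into L"abc"
#else
#define TitleLength TitleLengthA
#define DEMO_TEXT(s) s
#endif
```

Code written as TitleLength(DEMO_TEXT("hello")) then compiles against whichever variant the build selects, which is the same idea that lets one source file call SetWindowText in both kinds of build.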
char* : It means that this is a pointer to data of type char.
Example
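// Regular char
char aChar = 'a';
// Pointer to a single heap-allocated char
char* aPointer = new char;
*aPointer = 'a';
// Pointer to the first element of a heap-allocated array of 10 chars
char* anArray = new char[ 10 ];
*anArray = 'a';
anArray[ 1 ] = 'b';
// An actual array of 10 chars; its name converts to a pointer to the first element
char anotherArray[ 10 ];
*anotherArray = 'a';
anotherArray[ 1 ] = 'b';
// Release the heap allocations when done
delete aPointer;
delete[] anArray;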
wchar_t* : wchar_t is defined such that any locale's char encoding can be converted to a wchar_t representation where every wchar_t represents exactly one codepoint.
I tried to use this code:
USES_CONVERSION;
LPWSTR temp = A2W(selectedFileName);
but when I check the temp variable, I just get the first character
thanks in advance
If I recall correctly, CString is typedef'd to either CStringA or CStringW, depending on whether you're building Unicode or not.
LPWSTR is a "Long Pointer to a Wide STRing" -- aka: wchar_t*
If you want to pass a CString to a function that takes LPWSTR, you can do:
void some_function(LPWSTR str);
// if building in unicode:
some_function(selectedFileName);
// if building in ansi:
some_function(CA2W(selectedFileName));
// The better way, especially if you're building in both string types:
some_function(CT2W(selectedFileName));
HOWEVER LPWSTR is non-const access to a string. Are you using a function that tries to modify the string? If so, you want to use an actual buffer, not a CString.
Also, when you "check" temp, what do you mean? Did you try cout << temp? Because that won't work (it will display just the first character):
char uses one byte per character. On Windows, wchar_t uses two bytes per character. For plain English text, the wide string contains the same bytes as the original string, but each character is padded with a zero byte. Since the null terminator is also a zero, anything that reads the buffer as narrow (ANSI) text, such as a poor debugger or cout, will only show the first character.
If you want to print a wide string to standard out, use wcout.
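The zero-padding effect described above can be reproduced portably. The sketch below lays out UTF-16 code units as little-endian bytes by hand (the layout Windows uses in memory) and then measures what a narrow-string reader would see; utf16le_bytes and narrow_view_length are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Lays out UTF-16 code units as bytes in little-endian order.
inline std::vector<char> utf16le_bytes(const std::u16string& text)
{
    std::vector<char> bytes;
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // high byte, zero for ASCII
    }
    bytes.push_back('\0');  // the wide terminator is two zero bytes
    bytes.push_back('\0');
    assert(bytes.size() == 2 * text.size() + 2);  // sanity check on the layout
    return bytes;
}

// Length seen by code that treats the buffer as a narrow string:
// it stops at the first zero byte.
inline std::size_t narrow_view_length(const std::u16string& text)
{
    const std::vector<char> bytes = utf16le_bytes(text);
    return std::strlen(bytes.data());
}
```

For the text "temp", the narrow view ends right after the 't', because the second byte of the buffer is already zero.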
In short: You cannot. If you need a non-const pointer to the underlying character buffer of a CString object you need to call GetBuffer.
If you need a const pointer you can simply use static_cast<LPCWSTR>(selectedFileName).
I know this is a decently old question, but I had this same question and none of the previous answers worked for me.
This, however, did work for my unicode build:
LPWSTR temp = (LPWSTR)(LPCWSTR)selectedFileName;
LPWSTR is a "Long Pointer to a Wide String". It is like wchar_t*.
CString strTmp = _T("temp");
// Assumes a Unicode build, where CString converts to const wchar_t*
WCHAR* szTmp = new WCHAR[wcslen(strTmp) + 1];
wcscpy_s(szTmp, wcslen(strTmp) + 1, strTmp);
I have been having trouble working with 3-byte Unicode UTF-8 characters in arrays. When they are in char arrays I get multi-character character constant and implicit constant conversion warnings, but when I use wchar_t arrays, wcout returns nothing at all. Because of the nature of the project, it must be an array and not a string. Below is an example of what I've been trying to do.
#include <iostream>
#include <string>
using namespace std;
int main()
{
wchar_t testing[40];
testing[0] = L'\u0B95';
testing[1] = L'\u0BA3';
testing[2] = L'\u0B82';
testing[3] = L'\0';
wcout << testing[0] << endl;
return 0;
}
Any suggestions? I'm working with OSX.
Since '\u0B95' requires 3 bytes, it is considered a multicharacter literal. A multicharacter literal has type int and an implementation-defined value. (Actually, I don't think gcc is correct to do this)
Putting the L prefix before the literal makes it have type wchar_t and has an implementation defined value (it maps to a value in the execution wide-character set which is an implementation defined superset of the basic execution wide-character set).
The C++11 standard provides us with some more Unicode aware types and literals. The additional types are char16_t and char32_t, whose values are the Unicode code-points that represent the character. They are analogous to UTF-16 and UTF-32 respectively.
Since you need character literals to store characters from the basic multilingual plane, you'll need a char16_t literal. This can be written as, for example, u'\u0B95'. You can therefore write your code as follows, with no warnings or errors:
char16_t testing[40];
testing[0] = u'\u0B95';
testing[1] = u'\u0BA3';
testing[2] = u'\u0B82';
testing[3] = u'\0';
Unfortunately, the I/O library does not play nicely with these new types.
If you do not truly require using character literals as above, you may make use of the new UTF-8 string literals:
const char* testing = u8"\u0B95\u0BA3\u0B82";
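This will encode the characters as UTF-8. (Since C++20, u8 literals have type const char8_t[], so there the pointer must be declared as const char8_t*.)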
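One possible way around the I/O limitation noted above is to convert the char16_t array to UTF-8 by hand and write the result through a narrow stream. The following is a minimal sketch, not a full converter: bmp_to_utf8 is a hypothetical helper that assumes every unit is a Basic Multilingual Plane character (no surrogate pairs), which holds for the three characters in the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts a null-terminated char16_t array to UTF-8.
// Assumes BMP characters only; surrogate pairs are not handled.
inline std::string bmp_to_utf8(const char16_t* text)
{
    std::string out;
    for (; *text != u'\0'; ++text) {
        const unsigned cp = *text;
        assert(cp < 0xD800 || cp > 0xDFFF);  // surrogates are outside this sketch's scope
        if (cp < 0x80) {                     // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {             // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With the testing array from the answer, std::cout << bmp_to_utf8(testing) writes nine bytes, three per character, which a terminal configured for UTF-8 should render as the intended text.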