How big is wchar_t with GCC? - c++

GCC supports the -fshort-wchar flag, which switches wchar_t from four bytes to two.
What is the best way to detect the size of wchar_t at compile time, so I can map it correctly to the appropriate UTF-16 or UTF-32 type?
At least until c++0x is released and gives us the stable char16_t and char32_t types.
#if ?what_goes_here?
typedef wchar_t Utf32;
typedef unsigned short Utf16;
#else
typedef wchar_t Utf16;
typedef unsigned int Utf32;
#endif

You can use the macros
__WCHAR_MAX__
__WCHAR_TYPE__
They are defined by gcc. You can check their value with echo "" | gcc -E - -dM
As the value of __WCHAR_TYPE__ can vary from int to short unsigned int or long int, the best test is IMHO to check whether __WCHAR_MAX__ is above 2^16.
#if __WCHAR_MAX__ > 0x10000
typedef ...
#endif
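For completeness, a hedged sketch of that test filled in with the typedef names from the question (this assumes GCC, since the double-underscore macros are GCC-internal):
#if __WCHAR_MAX__ > 0x10000
typedef wchar_t Utf32;         // 4-byte wchar_t can hold UTF-32 directly
typedef unsigned short Utf16;
#else
typedef wchar_t Utf16;         // 2-byte wchar_t, e.g. with -fshort-wchar
typedef unsigned int Utf32;
#endif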

template<int>
struct blah;

template<>
struct blah<4> {
    typedef wchar_t Utf32;
    typedef unsigned short Utf16;
};

template<>
struct blah<2> {
    typedef wchar_t Utf16;
    typedef unsigned int Utf32;
};

typedef blah<sizeof(wchar_t)>::Utf16 Utf16;
typedef blah<sizeof(wchar_t)>::Utf32 Utf32;

You can use the standard macro WCHAR_MAX:
#include <wchar.h>
#if WCHAR_MAX > 0xFFFFu
// ...
#endif
The WCHAR_MAX macro is defined by the ISO C and ISO C++ standards (see ISO/IEC 9899, 7.18.3 "Limits of other integer types", and ISO/IEC 14882, C.2), so you can use it safely on almost all compilers.

The size depends on the compiler flag -fshort-wchar:
g++ -E -dD -fshort-wchar -xc++ /dev/null | grep WCHAR
#define __WCHAR_TYPE__ short unsigned int
#define __WCHAR_MAX__ 0xffff
#define __WCHAR_MIN__ 0
#define __WCHAR_UNSIGNED__ 1
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __SIZEOF_WCHAR_T__ 2
#define __ARM_SIZEOF_WCHAR_T 4

As Luther Blissett said, wchar_t exists independently of Unicode; they are two different things.
If you are really talking about UTF-16, be aware that there are Unicode code points (U+10000..U+10FFFF) which map to two 16-bit code units, a surrogate pair, although these are rarely used in western countries/languages.
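A small hedged illustration of that caveat: encoding a code point above U+FFFF by hand shows the two 16-bit units it needs (U+1F600 is used here purely as an example):
#include <cstdio>

int main() {
    // U+1F600 lies above U+FFFF, so UTF-16 needs a surrogate pair:
    // high = 0xD800 + ((cp - 0x10000) >> 10), low = 0xDC00 + ((cp - 0x10000) & 0x3FF)
    const unsigned cp   = 0x1F600;
    const unsigned high = 0xD800 + ((cp - 0x10000) >> 10);
    const unsigned low  = 0xDC00 + ((cp - 0x10000) & 0x3FF);
    std::printf("U+%X -> 0x%04X 0x%04X\n", cp, high, low);  // prints 0xD83D 0xDE00
    return 0;
}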

$ g++ -E -dD -xc++ /dev/null | grep WCHAR
#define __WCHAR_TYPE__ int
#define __WCHAR_MAX__ 2147483647
#define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __SIZEOF_WCHAR_T__ 4

Related

C vs C++ why is this macro not expanded to a constant?

I am using gcc/g++. The code below compiles fine with gcc -S test.c, but with g++ -S test.cpp I get error: requested alignment is not an integer constant. If I look at the preprocessor output for both, it looks identical. So my question is: why isn't ALIGN_BYTES evaluated to the constant 64 by the preprocessor in the C++ case? (If I replace ALIGN_BYTES with the constant 64, it works fine.)
/* test.c, test.cpp */
#define BITS 512
#define ALIGN_BYTES (BITS / 8)
#define ALIGN __attribute__ ((aligned(ALIGN_BYTES)))
typedef char* ALIGN char_PT;
It is not a macro expansion issue. The macro is not expanded to a constant in either C or C++ here. The preprocessor does not do arithmetic; it simply generates the expression 512 / 8, which isn't a single constant token, but which the compiler should definitely be able to reduce to one.
The preprocessor generates the same code for C and C++ here, but GCC (for reasons I do not understand) treats the __attribute__ extension differently in the two languages. There likely are good reasons, but someone else will have to explain them.
If you compile C, gcc is happy with aligned((512 / 8)), but if you compile C++ with g++, it will complain that 512 / 8 is not a constant. It is right, I guess, but really also wrong.
There are other cases where it is the opposite, where g++ is happy with a non-constant but gcc is not. If you declare a static const int, for example, you can use it in __attribute__((aligned(...))) in C++ but not in C. Again, I cannot explain why; it's a compiler extension, and GCC can do whatever it wants, and for some reason it treats the two languages differently here.
/* g++ will complain about this one; gcc will not */
typedef char *__attribute__((aligned((512 / 8)))) char_PT;
/* gcc will complain about this one; g++ will not */
static const int A = 16;
typedef char *__attribute__((aligned(A))) char_PT2;
I suppose, though, that since we know one version that works with C and another that works with C++, we could do this:
#define BITS 512
#ifdef __cplusplus
static const unsigned int ALIGN_BYTES = BITS / 8;
#define ALIGN __attribute__((aligned(ALIGN_BYTES)))
#else /* C */
#define ALIGN_BYTES (BITS / 8)
#define ALIGN __attribute__((aligned(ALIGN_BYTES)))
#endif
typedef char *ALIGN char_PT;
As @Deduplicator suggested, __attribute__ is a GCC extension. Using the standard alignas specifier as below fixed the problem.
#ifdef __cplusplus
#define ALIGN alignas(ALIGN_BYTES)
#else
#define ALIGN __attribute__ ((aligned(ALIGN_BYTES)))
#endif
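A hedged sketch of that combined fix, written so one file compiles as both C11 (gcc -std=c11) and C++11 (g++ -std=c++11). Note that alignas appertains to objects and members, so it is applied to a struct member here rather than to the pointer typedef from the question:
#include <stdio.h>
#ifndef __cplusplus
#include <stdalign.h>   /* maps alignas/alignof onto _Alignas/_Alignof in C11 */
#endif

#define BITS 512
#define ALIGN_BYTES (BITS / 8)

/* alignas accepts the computed constant in both languages. */
struct Block { alignas(ALIGN_BYTES) char buf[ALIGN_BYTES]; };

int main(void) {
    printf("alignof(struct Block) = %zu\n", (size_t)alignof(struct Block));
    return 0;
}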

error: 'res_ninit' was not declared in this scope; did you mean 'res_init'?

I am trying to install an application from its source code on Alpine. The build says there is no res_ninit, res_nsearch or res_nclose, but we can see here that they do exist in the Linux headers, and I have already installed them with apk add linux-headers. How can I resolve this issue?
make install
Consolidate compiler generated dependencies of target save_linker_opts
[ 2%] Built target save_linker_opts
[ 2%] Built target build_protobuf
Consolidate compiler generated dependencies of target cdk_foundation
[ 3%] Building CXX object cdk/foundation/CMakeFiles/cdk_foundation.dir/socket_detail.cc.o
/dep/mysql-connector-cpp/cdk/foundation/socket_detail.cc: In function 'std::forward_list<cdk::foundation::connection::detail::Srv_host_detail> cdk::foundation::connection::detail::srv_list(const string&)':
/dep/mysql-connector-cpp/cdk/foundation/socket_detail.cc:1097:3: error: 'res_ninit' was not declared in this scope; did you mean 'res_init'?
1097 | res_ninit(&state);
| ^~~~~~~~~
| res_init
/dep/mysql-connector-cpp/cdk/foundation/socket_detail.cc:1107:13: error: 'res_nsearch' was not declared in this scope; did you mean 'res_search'?
1107 | int res = res_nsearch(&state, hostname.c_str(), ns_c_in, ns_t_srv, query_buffer, sizeof (query_buffer) );
| ^~~~~~~~~~~
| res_search
/dep/mysql-connector-cpp/cdk/foundation/socket_detail.cc:1143:3: error: 'res_nclose' was not declared in this scope
1143 | res_nclose(&state);
| ^~~~~~~~~~
make[2]: *** [cdk/foundation/CMakeFiles/cdk_foundation.dir/build.make:146: cdk/foundation/CMakeFiles/cdk_foundation.dir/socket_detail.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1061: cdk/foundation/CMakeFiles/cdk_foundation.dir/all] Error 2
Update: I cat'ed the header file /usr/include/resolv.h and we can see that the functions do not exist there, so the question is: how can I get Linux headers like Ubuntu's that will work on Alpine?
#ifndef _RESOLV_H
#define _RESOLV_H
#include <stdint.h>
#include <arpa/nameser.h>
#include <netinet/in.h>
#ifdef __cplusplus
extern "C" {
#endif
#define MAXNS 3
#define MAXDFLSRCH 3
#define MAXDNSRCH 6
#define LOCALDOMAINPARTS 2
#define RES_TIMEOUT 5
#define MAXRESOLVSORT 10
#define RES_MAXNDOTS 15
#define RES_MAXRETRANS 30
#define RES_MAXRETRY 5
#define RES_DFLRETRY 2
#define RES_MAXTIME 65535
/* unused; purely for broken apps */
typedef struct __res_state {
    int retrans;
    int retry;
    unsigned long options;
    int nscount;
    struct sockaddr_in nsaddr_list[MAXNS];
# define nsaddr nsaddr_list[0]
    unsigned short id;
    char *dnsrch[MAXDNSRCH+1];
    char defdname[256];
    unsigned long pfcode;
    unsigned ndots:4;
    unsigned nsort:4;
    unsigned ipv6_unavail:1;
    unsigned unused:23;
    struct {
        struct in_addr addr;
        uint32_t mask;
    } sort_list[MAXRESOLVSORT];
    void *qhook;
    void *rhook;
    int res_h_errno;
    int _vcsock;
    unsigned _flags;
    union {
        char pad[52];
        struct {
            uint16_t nscount;
            uint16_t nsmap[MAXNS];
            int nssocks[MAXNS];
            uint16_t nscount6;
            uint16_t nsinit;
            struct sockaddr_in6 *nsaddrs[MAXNS];
            unsigned int _initstamp[2];
        } _ext;
    } _u;
} *res_state;
#define __RES 19960801
#ifndef _PATH_RESCONF
#define _PATH_RESCONF "/etc/resolv.conf"
#endif
struct res_sym {
int number;
char *name;
char *humanname;
};
#define RES_F_VC 0x00000001
#define RES_F_CONN 0x00000002
#define RES_F_EDNS0ERR 0x00000004
#define RES_EXHAUSTIVE 0x00000001
#define RES_INIT 0x00000001
#define RES_DEBUG 0x00000002
#define RES_AAONLY 0x00000004
#define RES_USEVC 0x00000008
#define RES_PRIMARY 0x00000010
#define RES_IGNTC 0x00000020
#define RES_RECURSE 0x00000040
#define RES_DEFNAMES 0x00000080
#define RES_STAYOPEN 0x00000100
#define RES_DNSRCH 0x00000200
#define RES_INSECURE1 0x00000400
#define RES_INSECURE2 0x00000800
#define RES_NOALIASES 0x00001000
#define RES_USE_INET6 0x00002000
#define RES_ROTATE 0x00004000
#define RES_NOCHECKNAME 0x00008000
#define RES_KEEPTSIG 0x00010000
#define RES_BLAST 0x00020000
#define RES_USEBSTRING 0x00040000
#define RES_NOIP6DOTINT 0x00080000
#define RES_USE_EDNS0 0x00100000
#define RES_SNGLKUP 0x00200000
#define RES_SNGLKUPREOP 0x00400000
#define RES_USE_DNSSEC 0x00800000
#define RES_DEFAULT (RES_RECURSE|RES_DEFNAMES|RES_DNSRCH|RES_NOIP6DOTINT)
#define RES_PRF_STATS 0x00000001
#define RES_PRF_UPDATE 0x00000002
#define RES_PRF_CLASS 0x00000004
#define RES_PRF_CMD 0x00000008
#define RES_PRF_QUES 0x00000010
#define RES_PRF_ANS 0x00000020
#define RES_PRF_AUTH 0x00000040
#define RES_PRF_ADD 0x00000080
#define RES_PRF_HEAD1 0x00000100
#define RES_PRF_HEAD2 0x00000200
#define RES_PRF_TTLID 0x00000400
#define RES_PRF_HEADX 0x00000800
#define RES_PRF_QUERY 0x00001000
#define RES_PRF_REPLY 0x00002000
#define RES_PRF_INIT 0x00004000
struct __res_state *__res_state(void);
#define _res (*__res_state())
int res_init(void);
int res_query(const char *, int, int, unsigned char *, int);
int res_querydomain(const char *, const char *, int, int, unsigned char *, int);
int res_search(const char *, int, int, unsigned char *, int);
int res_mkquery(int, const char *, int, int, const unsigned char *, int, const unsigned char*, unsigned char *, int);
int res_send(const unsigned char *, int, unsigned char *, int);
int dn_comp(const char *, unsigned char *, int, unsigned char **, unsigned char **);
int dn_expand(const unsigned char *, const unsigned char *, const unsigned char *, char *, int);
int dn_skipname(const unsigned char *, const unsigned char *);
#ifdef __cplusplus
}
#endif
#endif
in the Linux headers
Linux is generally/colloquially the name of all Unix-ish operating systems with a Linux kernel, but strictly speaking, Linux refers to the Linux kernel itself. The resolver headers are not part of the Linux kernel. The linux-headers package installs the headers needed to compile Linux kernel modules; it's unrelated.
The resolver functions mentioned are implemented inside glibc, the GNU C library. The Alpine distribution uses the musl implementation of the C standard library, not glibc.
how can I resolve this issue?
One of:
implement res_ninit and the related functions on top of what musl does provide, and use that implementation when compiling the application (see the sketch after this list)
patch mysql-connector yourself so that it compiles against musl
compile/install glibc on your system and compile mysql-connector against it
do not use Alpine for programs that require glibc; use a glibc-compatible Linux distribution instead
notify the mysql-connector developers about the issue and, ideally, support them financially so they will fix it
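A minimal hedged sketch of the first option: forward the glibc-style res_n* calls to the res_* functions musl does provide. This assumes musl's resolver calls are safe without per-thread state (musl documents its resolver as thread-safe); it is a porting patch, not an official fix, and the calling code still needs matching declarations (for example via a patched header), since musl's resolv.h does not declare these functions.
// Hypothetical compatibility shim for musl; the statp argument is
// ignored because musl keeps resolver state internally.
#include <resolv.h>

extern "C" {

int res_ninit(res_state statp) {
    (void)statp;
    return res_init();
}

int res_nsearch(res_state statp, const char *dname, int cls, int type,
                unsigned char *answer, int anslen) {
    (void)statp;
    return res_search(dname, cls, type, answer, anslen);
}

void res_nclose(res_state statp) {
    (void)statp;   // nothing to release with musl
}

} // extern "C"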

What is the role of the # symbol in c++?

I saw some code in glog like below:
#if #ac_cv_have_libgflags#
#include <gflags/gflags.h>
#endif
#ac_google_start_namespace#
#if #ac_cv_have_uint16_t# // the C99 format
typedef int32_t int32;
typedef uint32_t uint32;
typedef int64_t int64;
typedef uint64_t uint64;
#elif #ac_cv_have_u_int16_t# // the BSD format
What is the role of the # symbol in c++, and how is it used?
Those "#ac...#" tokens are for autoconf aka ./configure. They are replaced before the file is compiled, by the preprocessor called m4.
After the m4 preprocessing is done on your example, but before the C preprocessing is done, it might look like this:
#if 1
#include <gflags/gflags.h>
#endif
namespace google {
#if 1 // the C99 format
typedef int32_t int32;
typedef uint32_t uint32;
typedef int64_t int64;
typedef uint64_t uint64;
#elif 0 // the BSD format
Some of the tokens in your example are populated by a file like this: https://android.googlesource.com/platform/external/open-vcdiff/+/0a58c5c2f73e5047b36f12b5f12b12d6f2a9f69d/gflags/m4/google_namespace.m4
For more on autoconf, see: http://www.cs.columbia.edu/~sedwards/presentations/autoconf1996.pdf
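As for what # means in C++ itself: at the start of a line it introduces a preprocessor directive, and inside a macro body # stringizes an argument while ## pastes tokens together. A quick illustration:
#include <iostream>

#define STRINGIZE(x) #x        // #x   -> "x" as a string literal
#define CONCAT(a, b) a##b      // a##b -> a single pasted token

int main() {
    std::cout << STRINGIZE(hello) << '\n';  // prints hello
    int CONCAT(var, 1) = 42;                // declares var1
    std::cout << var1 << '\n';
    return 0;
}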

Can anyone explain this code to me?

#ifndef EIGHT_BIT
#define THIRTYTWO_BIT // default 32 bit
#endif
#ifdef THIRTYTWO_BIT
#define WORD unsigned long
#define WORDLENGTH 4
#if defined(WIN32) && !defined(__GNUC__)
#define WORD64 unsigned __int64
#else
#define WORD64 unsigned long long
#endif
// THIRTYTWO_BIT
#endif
#ifdef EIGHT_BIT
#define WORD unsigned short
#define WORDLENGTH 4
// EIGHT_BIT
#endif
It's just a definition of constants (aka defines) depending on whether EIGHT_BIT is defined.
If EIGHT_BIT is defined, WORD means unsigned short and WORDLENGTH is 4. Otherwise, WORD is unsigned long and WORDLENGTH is also 4. Additionally, WORD64 will be defined as unsigned long long unless you are on a WIN32 system and not using GCC.
All the "code" does it setup pre-processor symbols for any "live" code that you do have. If a symbol called EIGHT_BIT is defined before this code is pre-processed, it sets up WORD and WORDLENGTH accordingly (though WORDLENGTH's value is suspect), and it will set up the values differently if EIGHT_BIT is not already defined.
The first thing to note about this code is that none of it will actually be compiled. Every line that isn't whitespace or a comment starts with a pound sign (#), meaning it is a preprocessor directive. A preprocessor directive alters the code before it ever reaches the compiler. For more information on preprocessor directives, see this article.
Now that we know that much, let's look through the code:
#ifndef EIGHT_BIT
#define THIRTYTWO_BIT // default 32 bit
#endif
If the macro EIGHT_BIT is not defined, define another macro called THIRTYTWO_BIT. This is most likely referring to the number of bits in a word on a processor. This code intends to be cross-platform, meaning that it can run on a number of processors. The snippet you posted pertains to managing different word widths.
#ifdef THIRTYTWO_BIT
#define WORD unsigned long
#define WORDLENGTH 4
If the macro THIRTYTWO_BIT is defined, then define a WORD to be an unsigned long, which has a WORDLENGTH of 4 (presumably bytes). Note that this isn't necessarily true, as the C standard only guarantees that a long will be at least as large as an int.
#if defined(WIN32) && !defined(__GNUC__)
#define WORD64 unsigned __int64
#else
#define WORD64 unsigned long long
#endif
If this is a 32-bit Windows platform and the code is not being compiled with the GNU compiler, define WORD64 as the Microsoft-specific 64-bit type (unsigned __int64); otherwise, use unsigned long long.
// THIRTYTWO_BIT
#endif
Every #if and #ifdef directive must be matched by a corresponding #endif to delineate where the conditional section ends. This line ends the #ifdef THIRTYTWO_BIT declaration made previously.
#ifdef EIGHT_BIT
#define WORD unsigned short
#define WORDLENGTH 4
// EIGHT_BIT
#endif
If the target processor has a word width of 8 bits, then define a WORD to be an unsigned short, and define WORDLENGTH to be 4 (again, presumably in bytes; as noted above, this value is suspect, since an unsigned short is typically 2 bytes; see the check below).
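A quick hedged check of the assumption baked into WORDLENGTH (a hypothetical test program, echoing the "suspect" note above):
#include <cstdio>

#define THIRTYTWO_BIT
#ifdef THIRTYTWO_BIT
#define WORD unsigned long
#define WORDLENGTH 4
#endif

int main() {
    // On an LP64 Linux system this prints 8, not WORDLENGTH's 4;
    // with EIGHT_BIT, sizeof(unsigned short) is typically 2, also not 4.
    std::printf("sizeof(WORD) = %zu, WORDLENGTH = %d\n", sizeof(WORD), WORDLENGTH);
    return 0;
}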

_T( ) macro changes for UNICODE character data

I have a UNICODE application in which we use _T(x), which is defined as follows.
#if defined(_UNICODE)
#define _T(x) L ##x
#else
#define _T(x) x
#endif
I understand that L makes the literal a wchar_t string, and that wchar_t will be 4 bytes on any platform. Please correct me if I am wrong. My requirement is that I need L to give me 2-byte characters, so as a compiler hack I started using the -fshort-wchar GCC flag. But now I need my application to be moved to zSeries, where I don't get the effect of the -fshort-wchar flag on that platform.
In order to port my application to zSeries, I need to modify the _T( ) macro in such a way that, even after using L ##x and without using the -fshort-wchar flag, I get 2-byte wide character data. Can someone tell me how I can change the definition of L so that L is always 2 bytes in my application?
You can't - not without c++0x support. c++0x defines the following ways of declaring string literals:
"string of char characters in some implementation defined encoding" - char
u8"String of utf8 chars" - char
u"string of utf16 chars" - char16_t
U"string of utf32 chars" - char32_t
L"string of wchar_t in some implementation defined encoding" - wchar_t
Until c++0x is widely supported, the only way to encode a utf-16 string in a cross platform way is to break it up into bits:
// make a char16_t type to stand in until msvc/gcc/etc supports
// c++0x utf string literals
#ifndef CHAR16_T_DEFINED
#define CHAR16_T_DEFINED
typedef unsigned short char16_t;
#endif
const char16_t strABC[] = { 'a', 'b', 'c', '\0' };
// the same declaration would work for a type that changes from 8 to 16 bits:
#ifdef _UNICODE
typedef char16_t TCHAR;
#else
typedef char TCHAR;
#endif
const TCHAR strABC2[] = { 'a', 'b', 'c', '\0' };
The _T macro can only deliver the goods on platforms where wchar_t is 16 bits wide. And the alternative is still not truly cross-platform: the encoding of char and wchar_t is implementation defined, so 'a' does not necessarily encode the Unicode code point for a (0x61). Thus, to be strictly accurate, this is the only way of writing the string:
const TCHAR strABC[] = { '\x61', '\x62', '\x63', '\0' };
Which is just horrible.
Ah! The wonders of portability :-)
If you have a C99 compiler for all your platforms, use int_least16_t, uint_least16_t, ... from <stdint.h>. Most platforms also define int16_t, but it is not required to exist (if the platform has a type that is exactly 16 bits wide, the typedef int16_t must be defined).
Now wrap all the strings in arrays of uint_least16_t and make sure your code does not expect values of uint_least16_t to wrap at 65535 ...
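A hedged sketch of that approach; uint_least16_t is guaranteed to exist and to hold at least 16 bits (it may be wider, hence the warning about wraparound):
#include <stdint.h>
#include <stdio.h>

typedef uint_least16_t utf16_unit;

// "abc" spelled as explicit code units, sidestepping the source-charset issue:
const utf16_unit strABC[] = { 0x61, 0x62, 0x63, 0 };

int main() {
    for (const utf16_unit *p = strABC; *p; ++p)
        printf("U+%04X ", (unsigned)*p);
    printf("\n");
    return 0;
}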