Values assigned to char in c++ [duplicate]

Why is this a warning? I think there are many cases where it is clearer to use multi-char int constants instead of meaningless numbers, or instead of defining const variables with the same value. When parsing wave/tiff/other file types, it is clearer to compare the values read against 'EVAW', 'data', etc. than against their numeric equivalents.
Sample code:
int waveHeader = 'EVAW';
Why does this give a warning?

According to the standard (§6.4.4.4/10)
The value of an integer character constant containing more than one
character (e.g., 'ab'), [...] is implementation-defined.
long x = '\xde\xad\xbe\xef'; // yes, single quotes
This is valid ISO 9899:2011 C. It compiles without warning under gcc with -Wall, and produces a “multi-character character constant” warning with -pedantic.
From Wikipedia:
Multi-character constants (e.g. 'xy') are valid, although rarely
useful — they let one store several characters in an integer (e.g. 4
ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one).
Since the order in which the characters are packed into one int is not
specified, portable use of multi-character constants is difficult.
For portability's sake, don't use multi-character constants with integral types.
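To see the implementation-defined packing concretely, here is a tiny demo (the value in the comment is what gcc happens to produce; another compiler may pack the bytes differently):

#include <cstdio>

int main()
{
    // gcc packs the first character into the most significant byte,
    // so 'EVAW' evaluates to 0x45564157 there; other compilers may differ.
    std::printf("%08x\n", (unsigned)'EVAW');
    return 0;
}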

This warning is useful for programmers who would mistakenly write 'test' where they should have written "test". That mistake happens much more often than programmers actually wanting multi-char int constants.

If you're happy you know what you're doing and can accept the portability problems, on GCC for example you can disable the warning on the command line:
-Wno-multichar
I use this in my own apps to work with AVI and MP4 file headers, for reasons similar to yours.

Even if you're willing to look up what behavior your implementation defines, multi-character constants will still vary with endianness.
Better to use a (POD) struct { char[4]; } ... and then a UDL like "WAVE"_4cc to construct instances of that class easily.
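A minimal sketch of that idea under C++11 (FourCC and the _4cc suffix are illustrative names, not an existing API):

#include <cstddef>
#include <cstring>

struct FourCC {
    char code[4];
};

inline bool operator==(const FourCC& a, const FourCC& b)
{
    return std::memcmp(a.code, b.code, 4) == 0;
}

inline FourCC operator"" _4cc(const char* s, std::size_t)
{
    // Assumes a literal of exactly four characters, e.g. "WAVE"_4cc.
    return FourCC{{s[0], s[1], s[2], s[3]}};
}

// Usage: compare chunk tags byte-for-byte, independent of endianness:
// if (tag == "WAVE"_4cc) { ... }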

The simplest solution compliant with any C/C++ compiler/standard was mentioned by @leftaroundabout in the comments above:
int x = *(int*)"abcd";
Or a bit more specific:
int x = *(int32_t*)"abcd";
One more solution, also compliant with C/C++ compilers and the C standard since C99 (except clang++, which has a known bug):
int x = ((union {char s[5]; int number;}){"abcd"}).number;
/* just a demo check: */
printf("x=%d stored %s byte first\n", x, x==0x61626364 ? "MSB":"LSB");
Here an anonymous union is used to give a convenient name to the desired numeric result; the string "abcd" is used to initialize a compound literal (C99), which is an lvalue.
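Both tricks above rely on type punning; a memcpy-based version (tag_of is a hypothetical helper) avoids the aliasing and alignment questions while still producing a host-byte-order value:

#include <cstdint>
#include <cstring>

int32_t tag_of(const char (&s)[5])  // four characters plus the terminating NUL
{
    int32_t x;
    std::memcpy(&x, s, sizeof x);   // copies the bytes in host memory order
    return x;
}

// Usage: int32_t x = tag_of("abcd");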

If you want to disable this warning, it is important to know that there are two related warning flags: -Wmultichar (GCC and Clang) and -Wfour-char-constants (Clang); pass -Wno-multichar and -Wno-four-char-constants to disable them.

Related

Forcing sign of a bit field (pre-C++14) when using fixed size types

Skip to the bolded part for the essential question; the rest is just background.
For reasons I prefer not to get into, I'm writing a code generator that generates C++ structs in a (very) pre-C++14 environment. The generator has to create bit-fields; it also needs the tightest possible control over the behaviour of the generated fields, in as portable a fashion as possible. I need to control both the size of the underlying allocation unit and how signed values are handled. I won't get into why I'm on such a fool's errand, one that so obviously runs afoul of implementation-defined behaviour, but there's a paycheck involved, and all the right ways to do what needs to be done have been rejected by the people who arrange the paychecks.
So I'm stuck generating things like:
int32_t x : 11;
because I need to convince the compiler that this field (and other adjacent fields with the same underlying type) live in a 32 bit word. Generating int for the underlying type is not an option because int doesn't have a fixed size, and things would go very wrong the day someone releases a compiler in which int is 64 bits wide, or we end up back on one where it's 16.
In pre-C++14, int x : 11 might or might not be an unsigned field, and you prepend an explicit signed or unsigned to get what you need. I'm concerned that int32_t and friends will have the same ambiguity (why wouldn't they?), but compilers are gagging on signed int32_t.
Does the C++ standard have any words on whether the intxx_t types impose their signedness on bit fields? If not, is there any guarantee that something like
typedef signed int I32;
...
I32 x : 11;
...
assert(sizeof(I32)==4); //when this breaks, you won't have fun
will carry the signed indicator into the bitfield?
Please note that any suggestion that starts with "just generate a function to..." is by fiat off the table. These generated headers will be plugged into code that does things like s->x = 17; and I've had it nicely explained to me that I must not suggest changing it all to s->set_x(17) even one more time, even though I could trivially generate a set_x function that does exactly and safely what I need without any implementation-defined behaviour at all. Also, I'm very aware of the vagaries of bit fields, and left to right and right to left and inside out and whatever else compilers get up to with them, and several other reasons why this is a fool's errand. And I can't just "try stuff", because this needs to work on compilers I don't have, which is why I'm scrambling after guarantees in the standard.
Note: I can't implement any solution that doesn't allow existing code to simply cast a pointer to a buffer of bytes to a pointer to the generated struct, and then use their pointer to get to fields to read and write. The existing code is all about s->x, and must work with no changes. That rules out any solution involving a constructor in generated code.
Does the C++ standard have any words on whether the intxx_t types impose their signedness on bit fields?
No.
The standard's synopsis for the fixed-width integers of <cstdint>, [cstdint.syn] (link to the modern standard; the relevant parts of the synopsis look the same in the C++11 standard), simply specifies, descriptively (not by means of the signed/unsigned keywords), that they shall be of "signed integer type" or "unsigned integer type".
E.g. for gcc, <cstdint> exposes the fixed-width integers of <stdint.h>, which in turn are typedefs for predefined pre-processor macros (e.g. __INT32_TYPE__ for int32_t), the latter being platform-specific.
The standard does not impose any required use of the signed or unsigned keywords in this synopsis, and thus bit fields of fixed-width integer types will, in C++11, suffer the same implementation-defined behavior regarding their signedness as plain integer bit fields do. Recall the relevant part of [class.bit]/3 prior to C++14 (before the resolution of CWG 739):
It is implementation-defined whether a plain (neither explicitly signed nor unsigned) char, short, int, long, or long long bit-field is signed or unsigned. ...
Indeed, the following thread
How are the GNU C preprocessor predefined macros used?
shows an example where e.g. __INT32_TYPE__ on the answerer's particular platform is defined with no explicit presence of the signed keyword:
$ gcc -dM -E - < /dev/null | grep __INT
...
#define __INT32_TYPE__ int
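Since the standard gives no guarantee and the definition above is a plain int, one pragmatic option is a build-time probe (a sketch; it merely documents the implementation-defined choice on each new compiler and cannot substitute for a standard guarantee):

#include <cstdint>
#include <cstdio>

struct Probe { std::int32_t f : 2; };

int main()
{
    Probe p;
    p.f = -1;  // if the bit-field is treated as unsigned, f reads back as 3
    std::printf("int32_t bit-field is %s here\n",
                p.f < 0 ? "signed" : "unsigned");
    return 0;
}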
it also needs the tightest possible control over the behaviour of the generated fields, in as portable a fashion as possible. I need to control both the size of the underlying allocation unit, and how signed values are handled.
These two goals are incompatible. Bitfields inherently have portability problems.
If the standard defined the behaviors you want, then the "vagaries of bit fields" wouldn't exist, and people wouldn't bother recommending using bitmasks and shifts for portability.
What you possibly could do is to provide a class that exposes the same interface as a struct with bitfields but that doesn't actually use bitfields internally. Then you could make its constructor and destructor read or write those fields portably via masks and shifts. For example, something like:
#include <cassert>
#include <cstdint>

class BitfieldProxy
{
public:
    BitfieldProxy(uint32_t& u)
        : x((u >> 4) & 0x7FF),
          y(u & 0xF),
          mDest(u)
    {
    }

    ~BitfieldProxy()
    {
        assert((x & 0x7FF) == x);
        assert((y & 0xF) == y);
        mDest = (x << 4) | y;
    }

    BitfieldProxy(const BitfieldProxy&) = delete;
    BitfieldProxy& operator=(const BitfieldProxy&) = delete;

    // Only the low 11 bits are valid.
    unsigned int x;
    // Only the low 4 bits are valid.
    unsigned int y;

private:
    uint32_t& mDest;
};
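A hypothetical usage sketch (bump_x is an illustrative name), showing how the existing s->x style of access maps onto the proxy:

#include <cstdint>

void bump_x(uint32_t& word)
{
    BitfieldProxy p(word);  // the constructor unpacks x and y from the word
    p.x += 1;               // fields are read and written just like s->x
}                           // the destructor packs x and y back into 'word'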

Purpose of using UINT64_C?

I found this line in boost source:
const boost::uint64_t m = UINT64_C(0xc6a4a7935bd1e995);
I wonder what is the purpose of using a MACRO here?
All this one does is to add ULL to the constant provided.
I assume it may be used to make it harder for people to make the mistake of typing UL instead of ULL, but I wonder if there is any other reason to use it.
If you look at boost/cstdint.h, you can see that the definition of the UINT64_C macro is different on different platforms and compilers.
On some platforms it's defined as value##uL, on others it's value##uLL, and on yet others it's value##ui64. It all depends on the size of unsigned long and unsigned long long on that platform or the presence of compiler-specific extensions.
I don't think using UINT64_C is actually necessary in that context, since the literal 0xc6a4a7935bd1e995 would already be interpreted as a 64-bit unsigned integer. It is necessary in some other contexts, though. For example, here the literal 0x00000000ffffffff would be interpreted as a 32-bit unsigned integer if it weren't specifically given a 64-bit unsigned type by UINT64_C (though I think it would be promoted to uint64_t for the bitwise AND operation anyway).
In any case, explicitly declaring the size of literals where it matters serves a valuable role in code-clarity. Sometimes, even if an operation is perfectly well-defined by the language, it can be difficult for a human programmer to tell what types are involved. Saying it explicitly can make code easier to reason about, even if it doesn't directly alter the behavior of the program.
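For instance, here is a sketch of a case where the constant's width genuinely matters: shifting a plain 1 by 63 is undefined where int is 32 bits, while the UINT64_C form is well defined:

#include <cstdint>
#include <cstdio>

int main()
{
    uint64_t high = UINT64_C(1) << 63;  // OK: the constant is 64 bits wide
    // uint64_t bad = 1 << 63;          // undefined where int is 32 bits
    std::printf("%llx\n", (unsigned long long)high);
    return 0;
}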

g++ warning: conversion to uint16_t from int may alter its value

At the advice of a high-rep SO user, I've recently started compiling with the -Wconversion flag on my codebase. This has generated quite a few warnings, some of which are legitimate (needlessly adding signed and unsigned types, for instance), but also some head-scratchers, demonstrated below:
#include <cstdint>

int main()
{
    uint16_t a = 4;
    uint16_t b = 5;
    b += a;
    return 0;
}
When I compile with g++ -Wconversion -std=c++11 -O0 myFile.cpp, I get
warning: conversion to 'uint16_t {aka short unsigned int}' from 'int' may alter its value [-Wconversion]
b += a;
^
I've perused some similar questions on SO (dealing with | and << operators), taken a look here, and have read the Numeric Promotions and Numeric Conversions sections here. My understanding is, in order to do the math, a and b are promoted to int (since that's the first type that can fit the entire uint16_t value range), math is performed, the result is written back... except the result of the math is an int, and writing that back to uint16_t generates the warning. The consensus of the other questions was basically to cast away the warning, and the only way I've figured out how to do that is b = (uint16_t)(b + a); (or the equivalent b = static_cast<uint16_t>(b + a);).
Don't want this question to get too broad, but assuming my understanding of integer promotions is correct...
What's the best way to handle this moving forward? Should I avoid performing math on types narrower than int? It seems quite odd to me that I have to cast an arithmetic result that has the same type as all of its operands (I'd expect the compiler to recognize that and suppress the warning). Historically, I've liked to use no more bits than I need and just let the compiler handle the promotions/conversions/padding as necessary.
Anyone use the -Wconversion flag frequently? Just after a couple of days of using it myself, I'm starting to think its best use case is to turn it on, look at what it complains about, fix the legitimate complaints, then turn it back off. Or perhaps my definition of "legitimate complaint" needs readjusting. Replacing all of my += operators with spelled-out casts seems more like a nuisance than anything.
I'm tempted to tag this as c as well, since an equivalent c code compiled with gcc -Wconversion -std=c11 -O0 myFile.c produces the exact same warning. But as is, I'm using g++ version 5.3.1 on an x86_64 Fedora 23 box. Please point me to the dupe if I've missed it; if the only answer/advice here is to cast away the warning, then this is a dupe.
What's the best way to handle this moving forward?
-Wno-conversion
Or just leave it unspecified. This is just an opinion, though.
In my experience, the need for narrow integer arithmetic tends to be quite rare, so you could still keep it on for the project, and disable for the few cases where this useless warning occurs. However, this probably depends highly on the type of your project, so your experience may vary.
Should I avoid performing math on types narrower than int?
Usually yes; unless you have a specific reason to use them. "I don't need the extra bits" isn't a specific enough reason in my opinion. Arithmetic operands are promoted to int anyway and it's usually faster and less error prone to use int.
Just after a couple of days of using it myself, I'm starting to think its best use case is to turn it on, look at what it complains about, fix the legitimate complaints, then turn it back off.
This is quite often a useful approach to warning flags that are included in neither -Wall nor -Wextra such as the ones with -Wsuggest- prefix. There is a reason why they aren't included in "all warnings".
I think this can be considered a shortcoming of gcc.
As this code doesn't generate any warning:
int a = ..., b = ...;
a += b;
This code should not generate one either, because semantically they are the same (two numbers of the same type are added, and the result is put into a variable of that same type):
short a = ..., b = ...;
a += b;
But GCC generates a warning because, as you say, the shorts get promoted to ints. Yet the short version isn't any more dangerous than the int one: if the addition overflows, the behavior is implementation-defined in the short case and undefined in the int case (or, if unsigned numbers are used, truncation can happen in both cases).
Clang handles this more intelligently and doesn't warn here. I think it's because it actually tracks the possible bit-width (or maybe the range?) of the result. So, for example, this warns:
int a = ...;
short b = a;
But this doesn't (but GCC warns for this):
int a = ...;
short b = a&0xf; // there is a conversion here, but clang knows that only 4 bits are used, so it doesn't warn
So, until GCC gets a more intelligent -Wconversion, your options are:
don't use -Wconversion
fix all the warnings it prints
use clang instead (maybe for GCC: turn off this warning; and for clang: turn it on)
But don't hold your breath waiting for it to be fixed; there is a bug about this, opened in 2009.
A note:
Historically, I've liked to use no more bits than I need, and just let the compiler handle the promotions/conversions/padding as necessary.
If you use shorter types for storage, that's fine. But there's usually no reason to use types narrower than int for arithmetic: it gives no speedup, and it can even be slower because of the unnecessary masking.
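A minimal sketch of that storage-versus-arithmetic split (sum_u16 is an illustrative name): keep the array narrow, do the math in int, and make the single narrowing conversion explicit so -Wconversion stays quiet:

#include <cstddef>
#include <cstdint>

uint16_t sum_u16(const uint16_t* data, std::size_t n)
{
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];                 // each element promotes to int anyway
    return static_cast<uint16_t>(sum);  // one deliberate, visible narrowing
}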

Disable default numeric types in compiler

When creating custom typedefs for integers, is it possible for the compiler to warn you when a default numeric type is used?
For example,
typedef int_fast32_t kint;
int_fast32_t test = 0; // would be OK
kint test = 0;         // would be OK
int test = 0;          // would throw a warning or error
We're converting a large project, and the platform's default int tops out at 32767 (a 16-bit int), which is causing some issues. This warning would remind users not to use plain ints in the code.
If possible, it would be great if this would work on GCC and VC++2012.
I'm reasonably sure gcc has no such option, and I'd be surprised if VC did.
I suggest writing a program that detects references to predefined types in source code, and invoking that tool automatically as part of your build process. It would probably suffice to search for certain keywords.
Be sure you limit this to your own source files; predefined and third-party headers are likely to make extensive use of predefined types.
But I wouldn't make the prohibition absolute. There are a number of standard library functions that use predefined types. For example, in c = getchar() it makes no sense to declare c as anything other than int. And there's no problem for something like for (int i = 0; i <= 100; i ++) ...
Ideally, the goal should be to use predefined types properly. The language has never guaranteed that an int can exceed 32767. (But "proper" use is difficult or impossible to verify automatically.)
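A crude sketch of such a checker (all names illustrative): it flags whole-word uses of a few banned type keywords and makes no attempt to skip comments or string literals, so expect false positives:

#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char** argv)
{
    const std::regex banned("\\b(int|short|long)\\b");
    for (int i = 1; i < argc; ++i) {
        std::ifstream in(argv[i]);
        std::string line;
        for (int lineno = 1; std::getline(in, line); ++lineno)
            if (std::regex_search(line, banned))
                std::cout << argv[i] << ':' << lineno << ": " << line << '\n';
    }
    return 0;
}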
I'd approach this by doing a replace-all first and then documenting this thoroughly.
You can use a preprocessor directive:
#define int use kint instead
Note that technically this is undefined behavior and you'll run into trouble if you do this definition before including third-party headers.
I would recommend making a bulk replacement of int -> old_int_t at the very beginning of your porting. This way you can continue modifying your code without facing major restrictions, and at the same time you have access to all the places that are not yet updated.
Eventually, at the end of your work, all occurrences of old_int_t should be gone.
Even if one could somehow undefine the keyword int, that would do nothing to prevent usage of that type, since there are many cases where the compiler will end up using that type. Beyond the obvious cases of integer literals, there are some more subtle cases involving integer promotion. For example, if int happens to be 64 bits, operations between two variables of type uint32_t will be performed using type int rather than uint32_t. As nice as it would be to be able to specify that some variables represent numbers (which should be eagerly promoted when practical) while others represent members of a wrapping algebraic ring (which should not be promoted), I know of no facility to do such a thing. Consequently, int is unavoidable.
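To make that promotion trap concrete, here is a sketch assuming a hypothetical platform where int is 64 bits wide: both uint32_t operands promote to signed int, so overflow in the multiplication would be undefined behaviour instead of the modular wraparound that uint32_t arithmetic normally guarantees:

#include <cstdint>

uint32_t scale(uint32_t a, uint32_t b)
{
    // With a 64-bit int, this is a signed int multiplication whose result
    // is then narrowed back to uint32_t; with a 32-bit int it wraps safely.
    return a * b;
}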

Output difference in gcc and turbo C

Why is there a difference in the output produced when the code is compiled with the two compilers, gcc and turbo c?
#include <stdio.h>

int main()
{
    char *p = "I am a string";
    char *q = "I am a string";
    if (p == q)
    {
        printf("Optimized");
    }
    else
    {
        printf("Change your compiler");
    }
    return 0;
}
I get "Optimized" on gcc and "Change your compiler" on turbo c. Why?
Your question has been tagged both C and C++, so I'll answer for both languages.
[C]
From ISO C99 (Section 6.4.5/6)
It is unspecified whether these arrays are distinct provided their elements have the appropriate values.
That means it is unspecified whether p and q point to the same string literal or not. With gcc they both point to the same "I am a string" (gcc pools the literals), whereas with turbo c they do not.
Unspecified Behavior:
Use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance.
[C++]
From ISO C++-98 (Section 2.13.4/2)
Whether all string literals are distinct (that is, are stored in non-overlapping objects) is implementation-defined.
In C++, your code invokes implementation-defined behaviour.
Implementation-defined Behavior:
Unspecified Behavior where each implementation documents how the choice is made
Also see this question.
Since your string literal is a constant expression, i.e. you should not modify it via a pointer, there is no real purpose in storing it in the memory space twice. Being a newer compiler, gcc merges the literals by default while Turbo C does not. It is a sign of gcc's support for the newer language standard that has the notion of const data.
Please disregard the answers along the lines of
"It's because Turbo C is SO TOTALLY OLD and they couldn't do it THEN, because it had to be FAST, but GCC is totally NEW and RAD and that's why it does that!".
Both compilers support merging string constants as an option. The GCC option (-fmerge-constants) is turned on at optimization levels, while the Turbo C option (-d) is off by default. If you are using the TCC IDE, go to Options|Compiler...|Code Generation... and check "Duplicate strings merged".
From the gcc manual page:
-fmerge-constants
Attempt to merge identical constants (string constants and floating point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.
Enabled at levels -O, -O2, -O3, -Os.
Hence the output.
Turbo C was optimized for fast compilation, so it doesn't have any features that would slow it down. Recognizing duplicate strings would be a slow-down, even if only minor.
The compiler may keep two copies of identical literals if it thinks proper. Finding out if that is the case is presumably the point of this program.
In the good old days, assemblers kept all literals in a literal pool, and patching the literal pool was a recognised (if not approved) technique of modifying 'constants' throughout the program.
If by some chance the compiler allowed *p = 'H'; in this case, important differences in behaviour would result.
Historical footnote: Since addresses were smaller than floating-point numeric constants, FORTRAN used to handle floating-point constants much like C handles strings. Since memory was precious, identical constants would be allocated the same space. Also, parameter passing was always done by reference. This meant that if one passed a numeric constant to a procedure that modified its argument, other occurrences of that "constant" would change value.
Hence the old saying: "Variables won't; constants aren't."
Incidentally, has anyone noticed the bug in the Turbo C 2.0 printf, which would fail when using a format like "%1.1f" to print numbers like 99.99 (it outputs 00.0)? Fixed in 2.01, it reminds me of the Windows 3.1 calculator bug.