Is there an orthodox way to avoid compiler warning C4309 - "truncation of constant value" with binary file output? - c++

My program does the common task of writing binary data to a file, conforming to a certain non-text file format. Since the data I'm writing is not already in existing chunks but instead is put together byte by byte at runtime, I use std::ostream::put() instead of write(). I assume this is normal procedure.
The program works just fine. It uses both std::stringstream::put() and std::ofstream::put() with two-digit hex integers as the arguments. But I get compiler warning C4309: "truncation of constant value" (in VC++ 2010) whenever the argument to put() is greater than 0x7f. Obviously the compiler is expecting a signed char, and the constant is out of range. But I don't think any truncation is actually happening; the byte gets written just like it's supposed to.
Compiler warnings make me think I'm not doing things in the normal, accepted way. The situation I described has to be a common one. Is there a common way to avoid such a compiler warning? Or is this an example of a pointless compiler warning that should just be ignored?
I thought of two inelegant ways to avoid it. I could use syntax like mystream.put( char(0xa4) ) on every call. Or instead of using std::stringstream I could use std::basic_stringstream< unsigned char >, but I don't think that trick would work with std::ofstream, which is not a templated type. I feel like there should be a better solution here, especially since ofstream is meant for writing binary files.
Your thoughts?
--EDIT--
Ah, I was mistaken about std::ofstream not being a templated type. It is actually std::basic_ofstream<char>, but I tried that method and realized it won't work anyway for lack of defined methods and polymorphic incompatibility with std::ostream.
Here's a code sample:
stringstream ss;
int a, b;
/* Do stuff */
ss.put( 0 );
ss.put( 0x90 | a ); // oddly, no warning here...
ss.put( b ); // ...or here
ss.put( 0xa4 ); // C4309

I found a solution that I'm happy with. It's more elegant than explicitly casting every constant to unsigned char. This is what I had:
ss.put( 0xa4 ); // C4309
I thought that the "truncation" was happening in implicitly casting unsigned char to char, but Cong Xu pointed out that integer constants are signed by default, so 0xa4 is really an int, and any value greater than 0x7f has to be truncated (cut down to one byte, changing its value) when passed to put(). By using the suffix "u", I can specify an unsigned integer constant, and as long as it's no greater than 0xff its low byte is exactly the byte I want, so the compiler no longer warns about truncation. This is what I have now, without compiler warnings:
ss.put( 0xa4u );
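For completeness, here is a minimal, self-contained sketch of the whole pattern as I'm now using it (the file name out.bin is just a placeholder): the stream is assembled byte by byte with u-suffixed constants, then dumped to the file, and per the explanation above none of the put() calls should trigger C4309.
#include <fstream>
#include <sstream>

int main()
{
    std::stringstream ss;              // assemble the file contents byte by byte
    ss.put( 0x00u );
    ss.put( 0xa4u );                   // u suffix: unsigned constant, no truncation warning
    ss.put( 0xffu );

    std::ofstream out( "out.bin", std::ios::binary );
    out << ss.rdbuf();                 // copy the assembled bytes to the file
    return 0;
}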

std::stringstream ss;
ss.put(0x7f);
ss.put(0x80); //C4309
As you've guessed, the problem is that ostream::put() expects a char, and 0x7F is the maximum value a signed char can hold, so any larger constant (which is an int) no longer fits. You should cast to unsigned char, which is the same width as char so it can hold every byte value safely, and which also keeps truncation warnings legitimate when the constant really is too big:
ss.put(static_cast<unsigned char>(0x80)); // OK
ss.put(static_cast<unsigned char>(0xFFFF)); //C4309

Detection of overflow

How to detect overflow of unsigned char variable in c++?
Unsigned numbers are always positive, in the range 0 to 2^n - 1 (where n is the number of bits in the type). If char is 8 bits, then unsigned char variables have values between 0 and 255, while signed chars have values between -128 and 127.
unsigned char Test = 260;
Because 260 is an integer literal that doesn't fit, your compiler should emit a warning. How do you handle that? Do not ignore compiler warnings (or use an alternative syntax to avoid the automatic conversion, or promote the warning to an error). Also note that integer literals are always non-negative: -1 is not an integer literal, it's the integer literal 1 with the unary operator - applied. For gcc I'd suggest using -Wstrict-overflow=2 (or higher, according to your code policies) and possibly enabling -Werror=strict-overflow. For MS VC++ you may promote warning C4307 to an error with /we4307, or raise its level with /W14307 if you keep warnings at level 1 (you may also do it with a #pragma warning directive).
How to detect overflow of unsigned char variable in c++?
At compile-time, compiler warnings are your friends, but what about at run-time?
There is no portable way (like, for example, checked in C#) to do this, and the best technique depends on which type of operation you want to monitor. For a simple assignment (made with values known only at run-time) you may write something like this:
int32_t bigNumber = 260;
uint8_t smallNumber = static_cast<uint8_t>(bigNumber);
if (static_cast<int32_t>(smallNumber) != bigNumber) {
// Overflow...
}
Alternatively, you may check before assigning:
int32_t bigNumber = 260;
if (bigNumber > UINT8_MAX) {
// Overflow
}
Note that you may also make the compiler's life easier by writing (after the assignment):
if (smallNumber != bigNumber) {
// Overflow
}
It works because the automatic promotions will convert smallNumber to bigNumber's type (unless you end up with a signed/unsigned comparison; in that case you should simply avoid this alternative).
If you need it often you may write a small helper function to perform this conversion. For some ideas and possible implementations, if you're using the MS compiler, you may take a look at the SafeInt family of functions (note, however, that in this case assignment and casting won't throw).
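For example, a tiny helper along these lines (checked_narrow is just an illustrative name, not a standard or SafeInt function) keeps the check in one place:
#include <cstdint>
#include <stdexcept>

// Illustrative only: narrow an int32_t to uint8_t, throwing on overflow.
uint8_t checked_narrow(int32_t value)
{
    if (value < 0 || value > UINT8_MAX) {
        throw std::overflow_error("value does not fit in uint8_t");
    }
    return static_cast<uint8_t>(value);
}

// Usage:
// uint8_t smallNumber = checked_narrow(260);   // throws std::overflow_error at run-time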
You can use braces to initialize your value to force a compile-time error (assuming you use C++11 or later):
unsigned char Test{260};
Brace-initialization doesn't allow narrowing conversions.
Of course, that still wouldn't allow sticking the value 260 into an unsigned char but it would draw attention to the attempt. You'd need a bigger data type, e.g., unsigned short, to represent 260.

C/C++ Why to use unsigned char for binary data?

Is it really necessary to use unsigned char to hold binary data as in some libraries which work on character encoding or binary buffers? To make sense of my question, have a look at the code below -
char c[5], d[5];
c[0] = 0xF0;
c[1] = 0xA4;
c[2] = 0xAD;
c[3] = 0xA2;
c[4] = '\0';
printf("%s\n", c);
memcpy(d, c, 5);
printf("%s\n", d);
Both printf calls output 𤭢 correctly, where f0 a4 ad a2 is the UTF-8 encoding (in hex) for the Unicode code point U+24B62 (𤭢).
Even memcpy correctly copied the bits held by each char.
What reasoning could possibly advocate the use of unsigned char instead of a plain char?
In other related questions unsigned char is highlighted because it is the only (byte/smallest) data type which is guaranteed to have no padding by the C-specification. But as the above example showed, the output doesn't seem to be affected by any padding as such.
I have used VC++ Express 2010 and MinGW to compile the above. Although VC gave the warning
warning C4309: '=' : truncation of constant value
the output doesn't seem to reflect that.
P.S. This could be marked a possible duplicate of Should a buffer of bytes be signed or unsigned char buffer? but my intent is different. I am asking why something which seems to work fine with char should be typed unsigned char?
Update: To quote from N3337,
Section 3.9 Types
2 For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes (1.7) making up the object can be copied into
an array of char or unsigned char. If the content of the array of char
or unsigned char is copied back into the object, the object shall
subsequently hold its original value.
In view of the above fact, and given that my original example was on an Intel machine where char defaults to signed char, I am still not convinced that unsigned char should be preferred over char.
Anything else?
In C the unsigned char data type is the only data type that has all the following three properties simultaneously
it has no padding bits, that is, all storage bits contribute to the value of the data
no bitwise operation starting from a value of that type, when converted back into that type, can produce overflow, trap representations or undefined behavior
it may alias other data types without violating the "aliasing rules", that is that access to the same data through a pointer that is typed differently will be guaranteed to see all modifications
If these are the properties of a "binary" data type you are looking for, you definitely should use unsigned char.
For the second property we need a type that is unsigned. For unsigned types all conversions are defined with modulo arithmetic, here modulo UCHAR_MAX+1, which is 256 on the vast majority of architectures. Any conversion of a wider value to unsigned char thereby just corresponds to truncation to the least significant byte.
The two other character types generally don't work the same. signed char is signed, anyhow, so conversion of values that don't fit is not well defined. char is not fixed to be signed or unsigned, so on a particular platform to which your code is ported it might be signed even if it is unsigned on yours.
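A quick illustration of the difference (assuming 8-bit chars, so UCHAR_MAX+1 is 256; the result for the signed type is implementation-defined):
#include <stdio.h>

int main(void)
{
    unsigned char u = 0x1A4;   // 420 modulo 256 == 0xA4: well defined (the compiler may still warn)
    signed char   s = 0x1A4;   // out of range for signed char: implementation-defined result
    printf("%u %d\n", (unsigned)u, (int)s);
    return 0;
}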
You'll get most of your problems when comparing the contents of individual bytes:
char c[5];
c[0] = 0xff;
/*blah blah*/
if (c[0] == 0xff)
{
    printf("good\n");
}
else
{
    printf("bad\n");
}
can print "bad", because, depending on your compiler, c[0] will be sign extended to -1, which is not any way the same as 0xff
The plain char type is problematic and shouldn't be used for anything but strings. The main problem with char is that you can't know whether it is signed or unsigned: this is implementation-defined. This makes char different from int etc.; int is always guaranteed to be signed.
Although VC gave the warning ... truncation of constant value
It is telling you that you are trying to store int literals inside char variables. This might be related to the signedness: if you try to store an integer with value > 0x7F inside a signed char, unexpected things might happen. Formally, the result of the conversion is implementation-defined in C; practically you'd just get a weird output when printing the result as an integer value stored inside a (signed) char.
In this specific case, the warning shouldn't matter.
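A short illustration of that "weird output" (assuming char is signed on your platform, as it is by default with VC++ and gcc on x86):
#include <stdio.h>

int main(void)
{
    char c = 0xF0;                    // C4309 here; on a signed-char platform the stored value is -16
    printf("%d\n", c);                // prints -16, not 240
    printf("%d\n", (unsigned char)c); // prints 240: same bits, read back as unsigned
    return 0;
}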
EDIT :
In other related questions unsigned char is highlighted because it is the only (byte/smallest) data type which is guaranteed to have no padding by the C-specification.
In theory, all integer types except unsigned char and signed char are allowed to contain "padding bits", as per C11 6.2.6.2:
"For unsigned integer types other than unsigned char, the bits of the
object representation shall be divided into two groups: value bits and
padding bits (there need not be any of the latter)."
"For signed integer types, the bits of the object representation shall
be divided into three groups: value bits, padding bits, and the sign
bit. There need not be any padding bits; signed char shall not have
any padding bits."
The C standard is intentionally vague and fuzzy, allowing these theoretical padding bits because:
It allows different symbol tables than the standard 8-bit ones.
It allows implementation-defined signedness and weird signed integer formats such as one's complement or "sign and magnitude".
An integer may not necessarily use all bits allocated.
However, in the real world outside the C standard, the following applies:
Symbol tables are almost certainly 8 bits (UTF-8 or ASCII). Some weird exceptions exist, but clean implementations use the standard type wchar_t when implementing symbol tables larger than 8 bits.
Signed integer representation is always two's complement.
An integer always uses all bits allocated.
So there is no real reason to use unsigned char or signed char just to dodge some theoretical scenario in the C standard.
Bytes are usually intended as unsigned 8 bit wide integers.
Now, char doesn't specify the sign of the integer: on some compilers char could be signed, on other it may be unsigned.
If I add a bit-shift operation to the code you wrote, the result becomes implementation-defined when char is signed and the value is negative (and some shifts of negative values are outright undefined). The added comparison will also have an unexpected result.
char c[5], d[5];
c[0] = 0xF0;
c[1] = 0xA4;
c[2] = 0xAD;
c[3] = 0xA2;
c[4] = '\0';
c[0] >>= 1; // If char is signed, will the 7th bit go to 0 or stay the same?
bool isBiggerThan0 = c[0] > 0; // FALSE if char is signed!
printf("%s\n", c);
memcpy(d, c, 5);
printf("%s\n", d);
Regarding the warning during the compilation: if char is signed then you are trying to assign the value 0xF0, which cannot be represented in a signed char (range -128 to +127), so it will be converted to a signed value (-16).
Declaring the char as unsigned will remove the warning, and it is always good to have a clean build without any warnings.
The signed-ness of the plain char type is implementation defined, so unless you're actually dealing with character data (a string using the platform's character set - usually ASCII), it's usually better to specify the signed-ness explicitly by either using signed char or unsigned char.
For binary data, the best choice is most probably unsigned char, especially if bitwise operations will be performed on the data (specifically bit shifting, which doesn't behave the same for signed types as for unsigned types).
I am asking why something which seems to be working as fine with char should be typed unsigned char?
If you do things which are not "correct" in the sense of the standard, you rely on undefined behaviour. Your compiler might do it the way you want today, but you don't know what it does tomorrow. You don't know what GCC does or VC++ 2012. Or even if the behaviour depends on external factors or Debug/Release compiles etc. As soon as you leave the safe path of the standard, you might run into trouble.
Well, what do you call "binary data"? This is a bunch of bits, without any meaning assigned to them by that specific part of software that calls them "binary data". What's the closest primitive data type, which conveys the idea of the lack of any specific meaning to any one of these bits? I think unsigned char.
Is it really necessary to use unsigned char to hold binary data as in some libraries which work on character encoding or binary buffers?
"really" necessary? No.
It is a very good idea though, and there are many reasons for this.
Your example uses printf, which is not type-safe. That is, printf takes its formatting cues from the format string and not from the data type. You could just as easily have tried:
printf("%s\n", (void*)c);
... and the result would have been the same. If you try the same thing with C++ iostreams, the result will be different (depending on the signedness of c).
What reasoning could possibly advocate the use of unsigned char instead of a plain char?
Signed means that the most significant bit of the data (for an 8-bit char, the 8th bit) represents the sign. Since you obviously do not need that, you should specify that your data is unsigned (so the "sign" bit carries data rather than the sign of the other bits).

Integer to Character conversion in C

Let us consider this snippet:
int s;
scanf("%c",&s);
Here I have used int, and not char, for variable s. Now, to use s safely for character conversion, I have to make it a char again, because when scanf reads a character it only overwrites one byte of the variable it is assigning to, and not all four bytes that an int has.
For the conversion I could use s = (char)s; as the next line, but is it possible to achieve the same thing by subtracting something from s?
What you've done is technically undefined behaviour. The %c format calls for a char*, you've passed it an int* which will (roughly speaking) be reinterpreted. Even assuming that the pointer value is still good after reinterpreting, storing an arbitrary character to the first byte of an int and then reading it back as int is undefined behaviour. Even if it were defined, reading an int when 3 bytes of it are uninitialized, is undefined behaviour.
In practice it probably does something sensible on your machine, and you just get garbage in the top 3 bytes (assuming little-endian).
Writing s = (char)s converts the value from int to char and then back to int again. This is implementation-defined behaviour: converting an out-of-range value to a signed type. On different implementations it might clean up the top 3 bytes, it might return some other result, or it might raise a signal.
The proper way to use scanf is:
char c;
scanf("%c", &c);
And then either int s = c; or int s = (unsigned char)c;, according to whether you want negative-valued characters to result in a negative integer, or a positive integer (up to 255, assuming 8-bit char).
I can't think of any good reason for using scanf improperly. There are good reasons for not using scanf at all, though:
int s = getchar();
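Putting the pieces together, here is a short sketch showing how the two conversions differ for bytes >= 0x80 (assuming 8-bit char; error handling kept minimal):
#include <stdio.h>

int main(void)
{
    char c;
    if (scanf("%c", &c) != 1) {
        return 1;                      // no input
    }
    int keep_sign = c;                 // may be negative if char is signed and the byte is >= 0x80
    int as_byte   = (unsigned char)c;  // always 0..255
    printf("%d %d\n", keep_sign, as_byte);
    return 0;
}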
Are you trying to convert a digit to its decimal value? If so, then
char c = '8';
int n = c - '0';
n should be 8 at this point.
That's probably not a good idea; GCC gives me a warning for that code:
main.c:10: warning: format ‘%c’ expects type ‘char *’, but
argument 2 has type ‘int *’
In this case you're ok since you're passing a pointer to more space than you need (for most systems), but what if you did it the other way around? Could be crash city. If you really want to do something like what you have there, just do the typecast or mask it - the mask will be endian-dependent.
As written this won't work reliably. The argument &s to scanf is a pointer to int, and scanf is expecting a pointer to char. The two data types (int and char) have different sizes (at least on most architectures), so the data may get put in the wrong spot in memory, and the other part of s may not get properly cleared.
The answers suggesting manipulation of the result after using a pointer to int rely on unspecified behavior (i.e. that scanf will put the character value it reads in the least significant byte of the int you're pointing to), and are not safe.
No, but you could use the following:
s = s & 0xFF;
That will blank out all of the data except the lowest byte of the value. But in general all these ideas (and the ones above) are bad ideas, since not all systems store the lowest part of the integer first in memory. So if you ever have to port this code to a big-endian system, you'll be screwed.
True, you may never have to port the code, but why write unportable code to begin with?
See this for more info:
http://en.wikipedia.org/wiki/Endianness

Worst side effects from char signedness (explanation of signedness effects on chars and casts)

I frequently work with libraries that use char when working with bytes in C++. The alternative is to define a "Byte" as unsigned char, but that is not the standard they decided to use. I frequently pass bytes from C# into the C++ DLLs and cast them to char to work with the library.
When casting ints to chars or chars to other simple types, what are some of the side effects that can occur? Specifically, when has this broken code that you have worked on, and how did you find out it was because of the char signedness?
Luckily I haven't run into this in my own code; I used a signed-char casting trick back in an embedded systems class in school. I'm looking to better understand the issue since I feel it is relevant to the work I am doing.
One major risk is if you need to shift the bytes. A signed char keeps the sign-bit when right-shifted, whereas an unsigned char doesn't.
Here's a small test program:
#include <stdio.h>
int main (void)
{
    signed char a = -1;
    unsigned char b = 255;
    printf("%d\n%d\n", a >> 1, b >> 1);
    return 0;
}
It should print -1 and 127, even though a and b start out with the same bit pattern (given 8-bit chars, two's-complement and signed values using arithmetic shift).
In short, you can't rely on shift working identically for signed and unsigned chars, so if you need portability, use unsigned char rather than char or signed char.
The most obvious gotchas come when you need to compare the numeric value of a char with a hexadecimal constant when implementing protocols or encoding schemes.
For example, when implementing telnet you might want to do this.
// Check for IAC (hex FF) byte
if (ch == 0xFF)
{
// ...
Or when testing for UTF-8 multi-byte sequences.
if (ch >= 0x80)
{
// ...
Fortunately these errors don't usually survive very long as even the most cursory testing on a platform with a signed char should reveal them. They can be fixed by using a character constant, converting the numeric constant to a char or converting the character to an unsigned char before the comparison operator promotes both to an int. Converting the char directly to an unsigned won't work, though.
if (ch == '\xff') // OK
if ((unsigned char)ch == 0xff) // OK, so long as char has 8-bits
if (ch == (char)0xff) // Usually OK, relies on implementation defined behaviour
if ((unsigned)ch == 0xff) // still wrong
I've been bitten by char signedness in writing search algorithms that used characters from the text as indices into state trees. I've also had it cause problems when expanding characters into larger types, and the sign bit propagates causing problems elsewhere.
I found out when I started getting bizarre results and segfaults arising from searching texts other than the ones I'd used during the initial development (obviously characters with values >127 or <0 are going to cause this, and they won't necessarily be present in your typical text files).
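The failure mode looks roughly like this (transition is a hypothetical 256-entry state table, not code from the actual project):
int transition[256];                      // hypothetical state table indexed by byte value

int next_state_buggy(char ch)
{
    return transition[ch];                // if char is signed and ch holds 0xE9, the index is -23: out of bounds
}

int next_state_fixed(char ch)
{
    return transition[(unsigned char)ch]; // index is always 0..255
}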
Always check a variable's signedness when working with it. Generally now I make types signed unless I have a good reason otherwise, casting when necessary. This fits in nicely with the ubiquitous use of char in libraries to simply represent a byte. Keep in mind that the signedness of char is not defined by the standard (unlike with other types), so you should give it special treatment and be mindful.
The one that most annoys me:
typedef char byte;
byte b = 12;
cout << b << endl;
Sure it's cosmetics, but arrr...
When casting ints to chars or chars to other simple types
The critical point is that casting a signed value from one primitive type to another (larger) type does not retain the bit pattern (assuming two's complement). A signed char with bit pattern 0xFF is -1, while a signed short with the decimal value -1 is 0xFFFF. Casting an unsigned char with value 0xFF to an unsigned short, however, yields 0x00FF. Therefore, always think of the proper signedness before you typecast to a larger or smaller data type. Never carry unsigned data in signed data types if you don't need to; if an external library forces you to do so, do the conversion as late as possible (or as early as possible if the external code acts as a data source).
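A small demonstration of that point (two's complement and 8-bit chars assumed):
#include <stdio.h>

int main(void)
{
    signed char   sc = -1;                                    // bit pattern 0xFF
    unsigned char uc = 0xFF;                                  // same bit pattern, value 255

    unsigned short widened_signed   = (unsigned short)sc;     // sign-extended first: 0xFFFF
    unsigned short widened_unsigned = (unsigned short)uc;     // zero-extended: 0x00FF

    printf("%04X %04X\n", widened_signed, widened_unsigned);  // prints FFFF 00FF
    return 0;
}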
The C and C++ language specifications define 3 data types for holding characters: char, signed char and unsigned char. The latter 2 have been discussed in other answers. Let's look at the char type.
The standard(s) say that the char data type may be signed or unsigned, and that this is an implementation decision. This means that some compilers, or versions of compilers, can implement char differently. The implication is that the char data type is not well suited to arithmetic or Boolean operations. For arithmetic and Boolean operations, the signed and unsigned versions of char will work fine.
In summary, there are 3 versions of the char data type. The char data type performs well for holding characters, but is not suited for arithmetic across platforms and translators since its signedness is implementation-defined.
You will fail miserably when compiling for multiple platforms because the C++ standard doesn't define char to be of a certain "signedness".
Therefore GCC introduces -fsigned-char and -funsigned-char options to force certain behavior. More on that topic can be found here, for example.
EDIT:
As you asked for examples of broken code: there are plenty of ways to break code that processes binary data. For example, imagine you process 8-bit audio samples (range -128 to 127) and you want to halve the volume. Now imagine this scenario (in which the naive programmer assumes char == signed char):
char sampleIn;
// If the sample is -1 (= almost silent), and the compiler treats char as unsigned,
// then the value of 'sampleIn' will be 255
read_one_byte_sample(&sampleIn);
// Ok, halve the volume. The value will be 127!
char sampleOut = sampleIn / 2;
// And write the processed sample to the output file, for example.
// (unsigned char)127 has the exact same bit pattern as (signed char)127,
// so this will write a sample with the loudest volume!!
write_one_byte_sample_to_output_file(&sampleOut);
I hope you like that example ;-) But to be honest I've never really come across such problems myself, not even as a beginner, as far as I can remember...
Hope this answer is sufficient for you downvoters. What about a short comment?
Sign extension. The first version of my URL encoding function produced strings like "%FFFFFFA3".
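The bug boiled down to something like this (the escape_byte_* functions are made-up names; the original code isn't shown here):
#include <stdio.h>

// Hypothetical reconstruction of the bug, not the original code.
void escape_byte_buggy(char ch, char *out)
{
    sprintf(out, "%%%02X", ch);                  // ch is sign-extended through int: typically "%FFFFFFA3" for byte 0xA3
}

void escape_byte_fixed(char ch, char *out)
{
    sprintf(out, "%%%02X", (unsigned char)ch);   // always two hex digits: "%A3"
}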

Inconsistent results from printf with long long int?

struct DummyStruct{
    unsigned long long std;
    int type;
};
DummyStruct d;
d.std = 100;
d.type = 10;
/// buggy printf, unsigned long long to int conversion is buggy.
printf("%d,%d\n",d.std, d.type); // OUTPUT: 0,100
printf("%d,%d\n", d.type, d.std); // OUTPUT: 10,100
printf("%lld,%d\n",d.std, d.type); // OUTPUT: 100,10
Please tell me why unsigned long long to int conversion is not properly handled in printf. I am using glibc.
Is this bug in printf ?
why printf does not do internal type conversion ?
The %d argument tells printf to interpret the corresponding argument as an int. Try using %llu for long long. And memorize this reference card.
(So no, it's not a bug)
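Putting the fix into a complete program (same struct and values as in the question; %llu matches unsigned long long):
#include <stdio.h>

struct DummyStruct{
    unsigned long long std;
    int type;
};

int main(void)
{
    struct DummyStruct d;
    d.std = 100;
    d.type = 10;
    printf("%llu,%d\n", d.std, d.type);   // 100,10
    printf("%d,%llu\n", d.type, d.std);   // 10,100
    return 0;
}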
It's your usage that is the problem.
Unless the types specified in the format string are exactly the same as the types of the parameters, things will not work correctly.
This is because the compiler pushes the parameters as-is onto the stack.
There is no type checking or conversion.
At run-time the code pulls the values off the stack and advances to the next object based on the specifier in the format string. If the format string is wrong then the amount advanced is incorrect and you will get funny results.
Rule one: The chances of you finding a bug in the library or the compiler are very, very slim. Always assume the compiler / library is right.
Parameters are passed to printf() through the mechanisms in <stdarg.h> (variable argument lists), which involves some magic on the stack.
Without going into too much detail, what printf() does is assuming that the next parameter it has to pull from the stack is of the type specified in your format string - in the case of %d, a signed int.
This works if the actual value you've put in there is smaller or equal in width to int, because internally any smaller value passed on the stack is extended to the width of int through a mechanism called "integer promotion".
This fails, however, if the type you have passed to printf() is larger than int: printf() is told (by your %d) to expect an int, and pulls the appropriate number of bytes (let's assume 4 bytes for a 32 bit int) from the stack.
In case of your long long, which we'll assume is 8 bytes for a 64 bit value, this results in printf() getting only half of your long long. The rest is still on the stack, and will give pretty strange results if you add another %d to your format string.
;-)
With printf (which is one of the very old functions from the original C library) the compiler does not cast the parameters after the format string to the desired types, i.e. you need to make sure yourself that the types in the parameter list match the ones in the format.
With most other functions, the compiler squeezes the given parameter into the declared types, but printf, scanf and friends require you to tell the compiler exactly which types are following.
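So if you really do want to print such a value with %d, do the conversion yourself before the call, e.g. (reusing d from the question):
printf("%d,%d\n", (int)d.std, d.type);   // the cast converts the unsigned long long before it is passed, so the specifier matches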