Inconsistent results from printf with long long int? - c++

struct DummyStruct {
    unsigned long long std;
    int type;
};

DummyStruct d;
d.std = 100;
d.type = 10;

/// buggy printf, unsigned long long to int conversion is buggy.
printf("%d,%d\n", d.std, d.type);   // OUTPUT: 0,100
printf("%d,%d\n", d.type, d.std);   // OUTPUT: 10,100
printf("%lld,%d\n", d.std, d.type); // OUTPUT: 100,10
Please tell me why the unsigned long long to int conversion is not handled properly by printf. I am using glibc.
Is this a bug in printf?
Why does printf not do internal type conversion?

The %d conversion tells printf to interpret the corresponding argument as an int. Try using %llu for unsigned long long. And memorize a reference card of the conversion specifiers.
(So no, it's not a bug)
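For example, a minimal sketch of the corrected call (the variable names are made up for illustration):
#include <cstdio>

int main()
{
    unsigned long long big = 100; // same type as d.std
    int small = 10;

    // The length modifier must match the argument's actual type:
    std::printf("%llu,%d\n", big, small); // prints: 100,10
}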

It's your usage that is the problem.
Unless the types specified in the format string are exactly the same as the types of the parameters, things will not work correctly.
This is because the compiler pushes the parameters as-is onto the stack.
There is no type checking or conversion.
At run time the code pulls the values off the stack and advances to the next object based on the conversion in the format string. If the format string is wrong, the amount advanced is incorrect and you will get funny results.
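To make that concrete, here is a toy sketch built on <cstdarg> (the %L specifier is invented for brevity; real printf is far more elaborate). The type named in each va_arg call is the only thing telling the callee how far to advance:
#include <cstdarg>
#include <cstdio>

// Toy reader: supports only %d and a made-up %L (standing in for %lld).
void toy_print(const char* fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    for (const char* p = fmt; *p; ++p) {
        if (*p != '%') { std::putchar(*p); continue; }
        ++p;                   // look at the conversion character
        if (*p == '\0') break; // stray '%' at the end of the format
        if (*p == 'd')
            std::printf("%d", va_arg(ap, int));         // advances by one int
        else if (*p == 'L')
            std::printf("%lld", va_arg(ap, long long)); // advances by one long long
    }
    va_end(ap);
}

int main()
{
    toy_print("%L %d\n", 100LL, 10);    // types match the format: prints "100 10"
    // toy_print("%d %d\n", 100LL, 10); // mismatch: undefined behaviour, garbage likely
}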

Rule one: The chances of you finding a bug in the library or the compiler are very, very slim. Always assume the compiler / library is right.
Parameters are passed to printf() through the mechanisms in <stdarg.h> (variable argument lists), which involves some magic on the stack.
Without going into too much detail, what printf() does is assume that the next parameter it has to pull from the stack is of the type specified in your format string - in the case of %d, a signed int.
This works if the actual value you've put in there is smaller or equal in width to int, because internally any smaller value passed on the stack is extended to the width of int through a mechanism called "integer promotion".
This fails, however, if the type you have passed to printf() is larger than int: printf() is told (by your %d) to expect an int, and pulls the appropriate number of bytes (let's assume 4 bytes for a 32 bit int) from the stack.
In case of your long long, which we'll assume is 8 bytes for a 64 bit value, this results in printf() getting only half of your long long. The rest is still on the stack, and will give pretty strange results if you add another %d to your format string.
;-)
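As a small illustration of that rule (a sketch, not from the original post): types narrower than int are safe with %d precisely because they are widened first, while wider types are not:
#include <cstdio>

int main()
{
    char c = 42;
    short s = 7;
    long long big = 100;

    std::printf("%d %d\n", c, s); // fine: c and s are promoted to int
    std::printf("%lld\n", big);   // fine: the specifier matches the 64-bit type
    // std::printf("%d\n", big);  // wrong: printf would pull only part of 'big'
}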

With printf (which is one of the very old functions from original C) the compiler does not cast the parameters after the format string to the desired types, i.e. you need to make sure yourself that the types in the parameter list match those in the format.
With most other functions, the compiler converts the given arguments to the declared parameter types, but printf, scanf and friends require you to tell the compiler exactly which types are coming.
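A short illustration of that difference (the function name is hypothetical):
#include <cstdio>

// Prototyped parameter: the compiler converts the argument for you.
void takes_longlong(long long v) { std::printf("%lld\n", v); }

int main()
{
    int i = 42;
    takes_longlong(i);           // fine: i is implicitly converted to long long
    std::printf("%d\n", i);      // fine: the specifier matches the argument
    // std::printf("%lld\n", i); // broken: no conversion happens for varargs
}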

Related

Converting Integer Types

How does one convert from one integer type to another safely and without setting off alarm bells in compilers and static analysis tools?
Different compilers will warn for something like:
int i = get_int();
size_t s = i;
for loss of signedness or
size_t s = get_size();
int i = s;
for narrowing.
Casting can remove the warnings, but it doesn't solve the safety issue.
Is there a proper way of doing this?
You can try boost::numeric_cast<>.
boost::numeric_cast returns the result of converting a value of type Source to a value of type Target. If an out-of-range condition is detected, an exception is thrown (see bad_numeric_cast, negative_overflow and positive_overflow).
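A minimal usage sketch, assuming Boost is available and int is 32 bits:
#include <iostream>
#include <boost/numeric/conversion/cast.hpp>

int main()
{
    long long big = 3000000000LL; // too large for a 32-bit int

    try {
        int i = boost::numeric_cast<int>(big); // throws positive_overflow here
        std::cout << i << '\n';
    } catch (const boost::numeric::bad_numeric_cast& e) {
        std::cout << "conversion failed: " << e.what() << '\n';
    }
}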
How does one convert from one integer type to another safely and without setting off alarm bells in compilers and static analysis tools?
Control when conversion is needed. Where possible, only convert when there is no value change. Sometimes one must step back and code at a higher level. In other words: was a lossy conversion really needed, or can the code be re-worked to avoid conversion loss?
It is not hard to add an if(). The test just needs to be carefully formed.
An example where size_t n and int len need a comparison. Note that the positive range of int may exceed that of size_t, or vice versa, or the two may be the same. Note that in this case the conversion of int to unsigned only happens for non-negative values, so there is no value change.
int len = snprintf(buf, n, ...);
if (len < 0 || (unsigned)len >= n) {
    // Handle_error();
}
An unsigned to int example, for when it is known that the unsigned value at this point in the code is at most INT_MAX:
unsigned n = ...
int i = n & INT_MAX;
Good analysis tools see that n & INT_MAX always converts into int without loss.
There is no built-in safe narrowing conversion between integer types in C++ or the standard library. You could implement one yourself, using Microsoft GSL (gsl::narrow) as an example.
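If pulling in a library is not an option, a checked narrowing helper in the spirit of gsl::narrow can be sketched in a few lines (an illustration, not the GSL implementation):
#include <cstddef>
#include <stdexcept>

// Round-trip check: convert, convert back, and compare.
// The extra test rejects conversions that silently flip the sign.
template <typename To, typename From>
To narrow(From value)
{
    To result = static_cast<To>(value);
    if (static_cast<From>(result) != value ||
        (result < To{}) != (value < From{}))
        throw std::runtime_error("narrowing changed the value");
    return result;
}

int main()
{
    int ok = narrow<int>(std::size_t{42}); // fine: 42 survives the round trip
    (void)ok;
    // narrow<int>(std::size_t(-1)); // would throw: SIZE_MAX cannot fit in an int
}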
Theoretically, if you want perfect safety, you shouldn't be mixing types like this at all. (And you definitely shouldn't be using explicit casts to silence warnings, as you know.) If you've got values of type size_t, it's best to always carry them around in variables of type size_t.
There is one case where I do sometimes decide I can accept less than 100.000% perfect type safety, and that is when I assign sizeof's return value, which is a size_t, to an int. For any machine I am ever going to use, the only time this conversion might lose information is when sizeof returns a value greater than 2147483647. But I am content to assume that no single object in any of my programs will ever be that big. (In particular, I will unhesitatingly write things like printf("sizeof(int) = %d\n", (int)sizeof(int)), explicit cast and all. There is no possible way that the size of a type like int will not fit in an int!)
[Footnote: Yes, it's true, on a 16-bit machine the assumption is the rather less satisfying threshold that sizeof won't return a value greater than 32767. It's more likely that a single object might have a size like that, but probably not in a program that's running on a 16-bitter.]

Why do Boost Format and printf behave differently on same format string

The Boost Format documentation says:
One of its goals is to provide a replacement for printf, that means
format can parse a format-string designed for printf, apply it to the
given arguments, and produce the same result as printf would have.
When I compare the output of boost::format and printf using the same format string, I get different outputs.
#include <iostream>
#include <boost/format.hpp>

int main()
{
    boost::format f("BoostFormat:%d:%X:%c:%d");
    unsigned char cr = 65; // 'A'
    int cr2i = int(cr);
    f % cr % cr % cr % cr2i;
    std::cout << f << std::endl;
    printf("Printf:%d:%X:%c:%d", cr, cr, cr, cr2i);
}
The output is:
BoostFormat:A:A:A:65
Printf:65:41:A:65
The difference is when I want to display a char as an integral type.
Why is there a difference? Is this a bug or intended behavior?
This is expected behaviour.
The Boost manual says this about the classical type-specification flags you use:
But the classical type-specification flag of printf has a weaker
meaning in format. It merely sets the appropriate flags on the
internal stream, and/or formatting parameters, but does not require
the corresponding argument to be of a specific type.
Please note also that in the stdlib printf call, all char arguments are automatically converted to int due to the vararg call. So the generated code is identical to:
printf("Printf:%d:%X:%c:%d",cr2i,cr2i,cr2i,cr2i);
This automatic conversion is not done with the % operator.
Addition to the accepted answer:
This also happens to arguments of type wchar_t as well as unsigned short and other equivalent types, which may be unexpected, for example, when using members of structs in the Windows API (e.g., SYSTEMTIME), which are short integers of type WORD for historical reasons.
If you are using Boost Format as a replacement for printf and "printf-like" functions in legacy code, you may consider creating a wrapper, which overrides the % operator in such a way that it converts
char and short to int
unsigned char and unsigned short to unsigned int
to emulate the behavior of C variable argument lists. It will still not be 100% compatible, but most of the remaining incompatibilities are actually helpful for fixing potentially unsafe code.
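Such a wrapper might look like the following sketch (the class name is made up; it is not part of Boost):
#include <iostream>
#include <boost/format.hpp>

// Applies the conversions listed above before forwarding to boost::format,
// emulating what a C vararg call would do to small integer types.
class PrintfFormat {
public:
    explicit PrintfFormat(const char* fmt) : f_(fmt) {}

    PrintfFormat& operator%(char v)           { f_ % static_cast<int>(v); return *this; }
    PrintfFormat& operator%(signed char v)    { f_ % static_cast<int>(v); return *this; }
    PrintfFormat& operator%(short v)          { f_ % static_cast<int>(v); return *this; }
    PrintfFormat& operator%(unsigned char v)  { f_ % static_cast<unsigned int>(v); return *this; }
    PrintfFormat& operator%(unsigned short v) { f_ % static_cast<unsigned int>(v); return *this; }

    template <typename T>
    PrintfFormat& operator%(const T& v) { f_ % v; return *this; } // everything else: unchanged

    friend std::ostream& operator<<(std::ostream& os, const PrintfFormat& p)
    { return os << p.f_; }

private:
    boost::format f_;
};

int main()
{
    unsigned char cr = 65;
    std::cout << (PrintfFormat("Wrapped:%d:%X") % cr % cr) << '\n'; // Wrapped:65:41
}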
Newer code should probably not use Boost Format, but the standard std::format, which is not compatible with printf.

Is there an orthodox way to avoid compiler warning C4309 - "truncation of constant value" with binary file output?

My program does the common task of writing binary data to a file, conforming to a certain non-text file format. Since the data I'm writing is not already in existing chunks but instead is put together byte by byte at runtime, I use std::ostream::put() instead of write(). I assume this is normal procedure.
The program works just fine. It uses both std::stringstream::put() and std::ofstream::put() with two-digit hex integers as the arguments. But I get compiler warning C4309: "truncation of constant value" (in VC++ 2010) whenever the argument to put() is greater than 0x7f. Obviously the compiler is expecting a signed char, and the constant is out of range. But I don't think any truncation is actually happening; the byte gets written just like it's supposed to.
Compiler warnings make me think I'm not doing things in the normal, accepted way. The situation I described has to be a common one. Is there a common way to avoid such a compiler warning? Or is this an example of a pointless compiler warning that should just be ignored?
I thought of two inelegant ways to avoid it. I could use syntax like mystream.put( char(0xa4) ) on every call. Or instead of using std::stringstream I could use std::basic_stringstream< unsigned char >, but I don't think that trick would work with std::ofstream, which is not a templated type. I feel like there should be a better solution here, especially since ofstream is meant for writing binary files.
Your thoughts?
--EDIT--
Ah, I was mistaken about std::ofstream not being a templated type. It is actually std::basic_ofstream<char>, but I tried that method and realized it won't work anyway for lack of defined methods and polymorphic incompatibility with std::ostream.
Here's a code sample:
stringstream ss;
int a, b;
/* Do stuff */
ss.put( 0 );
ss.put( 0x90 | a ); // oddly, no warning here...
ss.put( b ); // ...or here
ss.put( 0xa4 ); // C4309
I found a solution that I'm happy with. It's more elegant than explicitly casting every constant to unsigned char. This is what I had:
ss.put( 0xa4 ); // C4309
I thought the "truncation" was happening in the implicit conversion from unsigned char to char, but Cong Xu pointed out that integer constants are signed int by default. A constant greater than 0x7f therefore doesn't fit in a signed char and has to actually be truncated (cut down to one byte) when passed to put(). By using the suffix "u" I can specify an unsigned integer constant, and as long as it's no greater than 0xff its value fits in an unsigned char, so the compiler no longer warns. This is what I have now, without compiler warnings:
ss.put( 0xa4u );
std::stringstream ss;
ss.put(0x7f);
ss.put(0x80); //C4309
As you've guessed, the problem is that ostream::put() expects a char, and 0x7F is the maximum value for a signed char; a constant like 0x80 has type int and gets truncated when converted to char. You should cast to unsigned char, which is as wide as char, so it can hold any byte value safely while keeping truncation warnings legitimate:
ss.put(static_cast<unsigned char>(0x80)); // OK
ss.put(static_cast<unsigned char>(0xFFFF)); //C4309
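If many call sites are involved, another option is to centralize the one unavoidable conversion in a tiny helper (a sketch; the name put_byte is made up):
#include <ostream>
#include <sstream>

// Accepts any value 0x00-0xFF without complaint and performs the
// conversion to char in exactly one place.
inline void put_byte(std::ostream& os, unsigned char byte)
{
    os.put(static_cast<char>(byte));
}

int main()
{
    std::stringstream ss;
    put_byte(ss, 0xa4); // no C4309: 0xa4 fits in unsigned char
}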

Integer to Character conversion in C

Let us consider this snippet:
int s;
scanf("%c",&s);
Here I have used int, not char, for the variable s. To use s safely as a character I have to make it a char again, because when scanf reads a character it only overwrites one byte of the variable it assigns to, not all four bytes of an int.
For the conversion I could use s = (char)s; as the next line, but is it possible to achieve the same by subtracting something from s?
What you've done is technically undefined behaviour. The %c format calls for a char*, you've passed it an int* which will (roughly speaking) be reinterpreted. Even assuming that the pointer value is still good after reinterpreting, storing an arbitrary character to the first byte of an int and then reading it back as int is undefined behaviour. Even if it were defined, reading an int when 3 bytes of it are uninitialized, is undefined behaviour.
In practice it probably does something sensible on your machine, and you just get garbage in the top 3 bytes (assuming little-endian).
Writing s = (char)s converts the value from int to char and then back to int again. This is implementation-defined behaviour: converting an out-of-range value to a signed type. On different implementations it might clean up the top 3 bytes, it might return some other result, or it might raise a signal.
The proper way to use scanf is:
char c;
scanf("%c", &c);
And then either int s = c; or int s = (unsigned char)c;, according to whether you want negative-valued characters to result in a negative integer, or a positive integer (up to 255, assuming 8-bit char).
I can't think of any good reason for using scanf improperly. There are good reasons for not using scanf at all, though:
int s = getchar();
Are you trying to convert a digit to its decimal value? If so, then
char c = '8';
int n = c - '0';
n should be 8 at this point.
That's probably not a good idea; GCC gives me a warning for that code:
main.c:10: warning: format ‘%c’ expects type ‘char *’, but
argument 2 has type ‘int *’
In this case you're ok since you're passing a pointer to more space than you need (for most systems), but what if you did it the other way around? Could be crash city. If you really want to do something like what you have there, just do the typecast or mask it - the mask will be endian-dependent.
As written, this won't work reliably. The argument &s to scanf is a pointer to int, and scanf is expecting a pointer to char. The two data types (int and char) have different sizes on most architectures, so the data may get put in the wrong spot in memory, and the rest of s may not get properly cleared.
The answers suggesting manipulation of the result after using a pointer to int rely on unspecified behavior (i.e. that scanf will put the character value it has in the least significant byte of the int you're pointing to), and are not safe.
No, but you could use the following:
s = s & 0xFF;
That will blank out all of the data except the low byte. But in general all these ideas (and the ones above) are bad ideas, since not all systems store the lowest part of the integer first in memory. So if you ever have to port this code to a big-endian system, you'll be screwed.
True, you may never have to port the code, but why write unportable code to begin with?
See this for more info:
http://en.wikipedia.org/wiki/Endianness

reading bytes directly from RAM C++

Can anyone explain the following behaviour to a relative newbie...
const char cInputFilenameAndPath[] = "W:\\testerfile.bin";
int filesize = 4584;
char* fileinrampointer;
fileinrampointer = (char*)malloc(filesize);

ifstream fsInputFileStream;
fsInputFileStream.open(cInputFilenameAndPath, fstream::in | fstream::binary);
fsInputFileStream.read((char*)(fileinrampointer), filesize);

for (int f = 0; f < 4; f++)
{
    printf("%x\n", *fileinrampointer);
    fileinrampointer++;
}
I was expecting the above code to read the first 4 bytes of the file I just loaded into memory. In the loop I am just displaying the current byte pointed to by the pointer, then incrementing the pointer, ready to display the next byte.
When I run the code I get:
37
ffffff94
42
ffffffd2
The values are correct but every other value seems to be padded up to a 64 bit number.
Because I'm asking it to display the value indicated by a 'char sized' pointer, I was expecting char size results but every other result comes out as a long long.
If I assign *fileinrampointer to an unsigned __int8 it leaves me with the value I want (without the leading 1s), which solves the problem, but I'm just wondering if anyone can explain what is happening above?
The expression *fileinrampointer is of type signed char, and it is promoted to a signed int when passed to printf. Thus, the sign bit propagates. You then print it with %x, which means unsigned int in hex, which causes all the leading 1 bits to be printed (instead of being correctly interpreted as part of a two's complement signed integer). Also, ffffffd2 is 8 hex digits, which means it's a 32-bit signed integer.
If you declare fileinrampointer as unsigned char* or unsigned __int8*, the sign bit doesn't propagate during promotion. You may as well leave it signed and cast it:
printf("%x\n", static_cast<unsigned char>(*fileinrampointer) );
ISO/IEC 9899:1999 6.5.2.2:
6. If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. [...]
[...]
7. If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
This clearly backs up my statement that this is integer promotion, and not printf interpretation.
Also see
ISO/IEC 9899:1999 7.15.1.1
glibc manual A.2.2.4
glibc manual 12.12.4
securecoding.cert.org
You are not asking it to display the value indicated by a char-sized pointer; you are asking it to display a hexadecimal integer (%x) using the contents of a char pointer. I have not tried it, but you could try casting it:
printf("%x\n", (unsigned int)(*fileinrampointer));