One of the clients to whom we provided source code said that by changing int to long and atoi to atol, they get a different result from our program. But as far as I understand, int and long on Windows have the same 4-byte size and the same min/max values. By the same reasoning, I expected atoi and atol to produce the same output, and in our testing they do.
Is there any difference between these functions that I'm not aware of?
In non-error cases, the functions are both defined equivalent to
strtol(nptr, (char **)NULL, 10)
The only difference is that atoi casts the return value to int.
There could be different behavior in error cases (when the string represents a value that is out of range of the type), since the behavior is undefined for both. But I'd be surprised. Even if atoi and atol aren't implemented by calling strtol, they're probably implemented by the same code, or very similar code.
Personally, I'd ask the client to show me the exact code. Maybe they didn't just replace int -> long and atoi -> atol as they claim. If that really is all they changed (but they did so slightly differently from how you assumed when you ran your tests), then they've probably found a symptom of a bug in your code.
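To illustrate the equivalence described above, here is a minimal sketch (my own example, not the client's code) comparing the three calls on in-range input:

#include <cstdio>
#include <cstdlib>

int main() {
    const char *s = "2147483647";          // fits in both int and long on Windows
    int  a = std::atoi(s);                 // equivalent to (int)strtol(s, NULL, 10)
    long b = std::atol(s);                 // equivalent to strtol(s, NULL, 10)
    long c = std::strtol(s, nullptr, 10);
    std::printf("%d %ld %ld\n", a, b, c);  // all three print the same value
}

For input outside the range of the target type the behavior of atoi and atol is undefined, so a test like this cannot tell you what the client actually saw in that case.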
I have read some documents about each, for example
http://www.cplusplus.com/reference/string/stof/
http://www.cplusplus.com/reference/cstdlib/atof/
I understand that atof is part of <cstdlib> and takes a const char* as its input parameter, while std::stof is part of <string> and takes a different input type.
But it's not clear to me:
can they be used interchangeably?
do they convert to the same floating-point value given the same input?
what scenario is best suited to each of them?
I assume you meant to compare std::atof with std::stod (both return double).
Just comparing the two linked reference pages yields the following differences:
std::atof takes a const char*, while std::stod takes either a std::string or a std::wstring (i.e. it has support for wide strings)
std::stod will also return the index of the first unconverted character if the pos parameter is not NULL (useful for further parsing of the string)
if the converted value falls outside the range of a double, std::atof returns an undefined value, while std::stod throws a std::out_of_range exception (definitely better than an undefined value)
if no conversion can be performed, std::atof returns 0.0, while std::stod throws a std::invalid_argument exception (easier to distinguish from an actual converted 0.0)
These are all points in favor of std::stod, making it the more capable alternative of the two.
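A minimal sketch of the error-handling and pos differences listed above (assuming a C++11 compiler; the input strings are just examples):

#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

int main() {
    // atof: no error reporting; "abc" silently yields 0.0
    std::cout << std::atof("abc") << "\n";

    // stod: pos reports where parsing stopped, and bad input throws
    try {
        std::size_t pos = 0;
        double d = std::stod("3.14 apples", &pos);
        std::cout << d << " (parsing stopped at index " << pos << ")\n";
        std::stod("abc");                           // throws std::invalid_argument
    } catch (const std::exception& e) {
        std::cout << "stod error: " << e.what() << "\n";
    }
}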
Everything Sander said is correct. However, you specifically asked:
Can they be used interchangeably?
The answer is no, at least not in the general case. If you're using atof, chances are good that you have legacy C code. In that case, you must be careful about introducing code that can throw exceptions, especially in "routine" situations such as when a user gives bad input.
Do they convert to same float value with same input?
No. They both convert to a double value, not a float. Conversion to the same value isn't specifically guaranteed by the standard (to my knowledge), and it is possible that there is round-off somewhere that is slightly different. However, given valid input, I would be pretty surprised if there were a difference between the return values from the two in the same compiler.
What scenario is best to use for each of these?
If:
You have a conforming C++11 or later compiler, and
Your code can tolerate exceptions (or you're willing to catch them around every call), and
You already have a std::string or the performance hit of a conversion is unimportant.
Or, if you are a C++ beginner and don't know the answers to these questions.
Then I would suggest using std::stod. Otherwise, you could consider using atof.
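For the legacy-code case mentioned above, one option is a small wrapper that keeps std::stod's error detection but reports failure through a return value instead of letting exceptions escape. This is only a sketch under that assumption (the helper name to_double is made up, not part of any library):

#include <stdexcept>
#include <string>

// Hypothetical helper: returns false on failure instead of throwing.
bool to_double(const std::string& s, double& out) {
    try {
        std::size_t pos = 0;
        out = std::stod(s, &pos);
        return pos == s.size();               // reject trailing junk
    } catch (const std::exception&) {         // invalid_argument or out_of_range
        return false;
    }
}

Call sites can then check the bool, much as they would with older C-style conversion APIs.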
The following simple program behaves unpredictably. Sometimes it prints "0.00000", sometimes it prints more "0"s than I can count. Sometimes it uses up all the memory on the system before the system either kills some process or it fails with bad_alloc.
#include "stdio.h"
int main() {
fprintf(stdout, "%.*f", 0.0);
}
I'm aware that this is incorrect usage of fprintf. There should be another int argument specifying the precision. It's just surprising that the behavior is so unpredictable. Sometimes it seems to use a default precision, while sometimes it fails very badly. Could this not be made to always fail or always use some default behaviour?
I came across similar usage in some code at work, and spent a lot of time figuring out what was happening. It only seemed to happen with debug builds, but would not happen while debugging with gdb. Another curiosity is that running it through valgrind would consistently trigger the many-"0"s case, which otherwise happens quite seldom, but the memory usage issue would never occur then either.
I am running Red Hat Enterprise Linux 7, and compiled with gcc 4.8.5.
Formally this is undefined behavior.
As for what you're observing in practice:
My guess is that fprintf ends up using an uninitialized integer as the number of decimal places to output. That's because it'll try to read a number from a location where the caller didn't write any particular value, so you'll just get whatever bits happen to be stored there. If that happens to be a huge number, fprintf will try to allocate a lot of memory to store the result string internally. That would explain the "running out of memory" part.
If the uninitialized value isn't quite that big, the allocation will succeed and you'll end up with a lot of zeroes.
And finally, if the random integer value happens to be just 5, you'll get 0.00000.
Valgrind probably consistently initializes the memory your program sees, so the behavior becomes deterministic.
Could this not be made to always fail
I'm pretty sure it won't even compile if you use gcc -pedantic -Wall -Wextra -Werror.
The format string does not match the arguments, therefore the behaviour of fprintf is undefined. Google "undefined behaviour C" for more information about undefined behaviour.
This would be correct:
// printf 0.0 with 7 decimals
fprintf(stdout, "%.*f", 7, 0.0);
Or maybe you just want this:
// printf 0.0 with the default format
fprintf(stdout, "%f", 0.0);
About this part of your question: "Sometimes it seems to use a default precision, while sometimes it fails very badly. Could this not be made to always fail or always use some default behaviour?"
There cannot be any default behaviour: fprintf reads the arguments according to the format string. If the arguments don't match, fprintf ends up with seemingly random values.
About this part of your question: "Another curiosity is that running it through valgrind would consistently trigger the many-"0"s case, which otherwise happens quite seldom, but the memory usage issue would never occur then either."
This is just another manifestation of undefined behaviour; with valgrind the conditions are quite different, and therefore the actual undefined behaviour can be different.
Undefined behaviour is undefined.
However, with the x86-64 System V ABI it is well known that the first few arguments are not passed on the stack but in registers. Floating-point values are passed in floating-point registers, and integers are passed in general-purpose registers. There is no parameter store on the stack, so the widths of the arguments do not matter. Since you never passed any integer in the variable-argument part, the general-purpose register that would hold the precision argument will contain whatever garbage it held from before.
This program will show how the floating point values and integers are passed separately:
#include <stdio.h>
int main() {
fprintf(stdout, "%.*f\n", 42, 0.0);
fprintf(stdout, "%.*f\n", 0.0, 42);
}
Compiled on x86-64, GCC + Glibc, both printfs will produce the same output:
0.000000000000000000000000000000000000000000
0.000000000000000000000000000000000000000000
This is undefined behaviour in the standard. It means "anything is fair game" because you're doing something wrong.
The worst part is that almost certainly your compiler warned you, but you ignored the warning. Putting some kind of validation anywhere other than in the compiler would incur a cost that everybody pays just so you can do the wrong thing.
That's the opposite of what C and C++ stand for: you only pay for what you use. If you want to pay that cost, it's up to you to do the checking.
What's really happening depends on the ABI, compiler and architecture. It's undefined behaviour because the language gives the implementer the freedom to do what's better on every machine (meaning, sometimes faster code, sometimes shorter code).
As an example, when you call a function on the machine, it just means that you're instructing the microprocessor to go to a certain code location.
In some made-up assembly and ABI, then, printf("%.*f", 5, 1.0); will translate into something like
mov A, STR_F ; // load into register A the 32 bit address of the string "%.*f"
mov B, 5 ; // load second 32 bit parameter into B
mov F0, 1.0 ; // load first floating point parameter into register F0
call printf ; // call the function
Now, if you omit some parameter, in this case B, the register will hold whatever value was there before.
The thing with functions like printf is that they allow anything in their parameter list (the signature is printf(const char*, ...), so anything is valid). That's why you shouldn't use printf in C++: you have better alternatives, such as streams. printf bypasses the compiler's type checking; streams know the types involved and are extensible to your own types. That's also why your code should compile without warnings.
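As a brief illustration of the stream alternative mentioned above (a sketch, with formatting choices picked just for this example), the same output is produced with every operand type-checked at compile time:

#include <iomanip>
#include <iostream>

int main() {
    // Equivalent of printf("%.*f\n", 7, 0.0), but no format/argument mismatch is possible
    std::cout << std::fixed << std::setprecision(7) << 0.0 << "\n";
}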
In both C (n1570 7.21.6.1/10) and C++ (by inclusion of the C standard library) it is undefined behavior to provide an argument to printf whose type does not match its conversion specification. A simple example:
printf("%d", 1.9)
The format string specifies an int, while the argument is a floating point type.
This question is inspired by the question of a user who encountered legacy code with an abundance of conversion mismatches which apparently did no harm, cf. undefined behaviour in theory and in practice.
Declaring a mere format mismatch UB seems drastic at first. It is clear that the output can be wrong, depending on things like the exact mismatch, argument types, endianness, possibly stack layout and other issues. This extends, as one commentator there pointed out, also to subsequent (or even previous?) arguments. But that is far from general UB. Personally, I never encountered anything else but the expected wrong output.
To venture a guess, I would exclude alignment issues. What I can imagine is that providing a format string which makes printf expect large data together with small actual arguments possibly lets printf read beyond the stack, but I lack deeper insight into the varargs mechanism and specific printf implementation details to verify that.
I had a quick look at the printf sources, but they are pretty opaque to the casual reader.
Therefore my question: What are the specific dangers of mis-matching conversion specifiers and arguments in printf which make it UB?
printf only works as described by the standard if you use it correctly. If you use it incorrectly, the behaviour is undefined. Why should the standard define what happens when you use it wrong?
Concretely, on some architectures floating point arguments are passed in different registers to integer arguments, so inside printf when it tries to find an int matching the format specifier it will find garbage in the corresponding register. Since those details are outside the scope of the standard there is no way to deal with that kind of misbehaviour except to say it's undefined.
For an example of how badly it could go wrong, using a format specifier of "%p" but passing a floating point type could mean that printf tries to read a pointer from a register or stack location which hasn't been set to a valid value and could contain a trap representation, which would cause the program to abort.
Some compilers may implement variable-format arguments in a way that allows the types of arguments to be validated; since having a program trap on incorrect usage may be better than possibly having it output seemingly-valid-but-wrong information, some platforms may choose to do that.
Because the behavior of traps is outside the realm of the C Standard, any action which might plausibly trap is classified as invoking Undefined Behavior.
Note that the possibility of implementations trapping based on incorrect formatting means that behavior is considered undefined even in cases where the expected type and the actual passed type have the same representation, except that signed and unsigned numbers of the same rank are interchangeable if the values they hold are within the range which is common to both [i.e. if a "long" holds 23, it may be output with "%lX" but not with "%X" even if "int" and "long" are the same size].
Note also that the C89 committee introduced a rule by fiat, which remains to this day, stating that even if "int" and "long" have the same representation, the code:
long foo = 23;
int *u = (int *)&foo;  /* points an int* at an object of type long */
(*u)++;
invokes Undefined Behavior, since it causes information which was written as type "long" to be read as type "int" (behavior would also be Undefined if it were type "unsigned int"). Since a "%X" format specifier would cause data to be read as type "unsigned int", passing the data as type "long" would almost certainly cause the data to be stored somewhere as "long" but subsequently read as type "unsigned int"; such behavior would almost certainly violate the aforementioned rule.
Just to take your example: suppose that your architecture's procedure call standard says that floating-point arguments are passed in floating-point registers. But printf thinks you are passing an integer, because of the %d format specifier. So it expects an argument on the call stack, which isn't there. Now anything can happen.
Any printf format/argument mismatch will cause erroneous output, so you cannot rely on anything once you do that. It is hard to tell which mismatches will have dire consequences beyond garbage output, because it depends completely on the specifics of the platform you are compiling for and the actual details of the printf implementation.
Passing invalid arguments to a printf instance that has a %s format can cause invalid pointers to be dereferenced. But invalid arguments for simpler types such as int or double can cause alignment errors with similar consequences.
I'll start by pointing out, in case you aren't already aware, that long is 64 bits on 64-bit versions of OS X, Linux, the BSD clones, and various Unix flavors. 64-bit Windows, however, kept long at 32 bits.
What does this have to do with printf() and UB with respect to its conversion specifications?
Internally, printf() will use the va_arg() macro. If you use %ld on 64-bit Linux and only pass an int, the other 32 bits will be retrieved from adjacent memory. If you use %d and pass a long on 64-bit Linux, the other 32 bits will still be sitting in the argument area. In other words, the conversion specification indicates the type (int, long, whatever) to va_arg(), and the size of the corresponding type determines the number of bytes by which va_arg() adjusts its argument pointer. Whereas it will just work on Windows, since sizeof(int)==sizeof(long) there, porting it to another 64-bit platform can cause trouble, especially when you have an int *nptr; and try to use %ld with *nptr. If you don't have access to the adjacent memory, you'll likely get a segfault. So the possible concrete cases are:
adjacent memory is read, and output is messed up from that point on
adjacent memory is attempted to be read, and there's a segfault due to a protection mechanism
the size of long and int are the same, so it just works
the value fetched is truncated, and output is messed up from that point on
I'm not sure if alignment is an issue on some platforms, but if it is, it would depend upon the implementation of passing function parameters. Some "intelligent" compiler-specific printf() with a short argument list might bypass va_arg() altogether and represent the passed data as a string of bytes rather than working with a stack. If that happened, printf("%x %lx\n", LONG_MAX, INT_MIN); has three possibilities:
the size of long and int are the same, so it just works
ffffffff ffffffff80000000 is printed
the program crashes due to an alignment fault
As for why the C standard says that it causes undefined behavior: the standard doesn't specify exactly how va_arg() works, how function parameters are passed and represented in memory, or the explicit sizes of int, long, or other primitive data types, because it doesn't want to unnecessarily constrain implementations. As a result, whatever happens is something the C standard cannot predict. Just looking at the examples above should be an indication of that, and I can't even imagine what other implementations might exist that behave differently altogether.
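To make the role of the conversion specification concrete, here is a minimal sketch of a printf-like variadic function (a toy example of my own, not how any particular libc implements printf): the type named in va_arg() is the only thing telling the callee how to fetch each argument.

#include <cstdarg>
#include <cstdio>

// Toy variadic function: expects one long followed by one int.
void print_long_then_int(const char *tag, ...) {
    va_list ap;
    va_start(ap, tag);
    long l = va_arg(ap, long);   // fetches an argument assuming it was passed as long
    int  i = va_arg(ap, int);    // fetches the next one assuming it was passed as int
    va_end(ap);
    std::printf("%s: %ld %d\n", tag, l, i);
}

int main() {
    print_long_then_int("ok", 23L, 42);    // types match what va_arg() reads: well defined
    // print_long_then_int("bad", 23, 42); // int passed where long is read: undefined behavior
}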
We have some legacy code in which, at one point in time, long data types were refactored to int data types. During this refactor a number of printf / sprintf format statements were left as %ld instead of being changed to %d. For example:
int iExample = 32;
char buf[200];
sprintf(buf, "Example: %ld", iExample);
This code is compiled with both GCC and the VS2012 compiler. We use Coverity for static code analysis, and code like the example above was flagged as a 'Printf arg type mismatch' with a Medium level of severity (CWE-686: Function Call With Incorrect Argument Type). I can see this would definitely be a problem had the format string been a signed one (%d) with an unsigned int argument, or something along those lines.
I am aware that the '_s' versions of sprintf etc are more secure, and that the above code can also be refactored to use std::stringstream etc. It is legacy code however...
I agree that the above code really should be using %d at the very least or refactored to use something like std::stringstream instead.
Out of curiosity, is there any situation where the above code will generate incorrect results? This legacy code has been around for quite some time and appears to be working fine.
UPDATED
Removed the usage of the word STL and just changed it to be std::stringstream.
As far as the standard is concerned, the behavior is undefined, meaning that the standard says exactly nothing about what will happen.
In practice, if int and long have the same size and representation, it will very likely "work", i.e., behave as if the correct format string has been used. (It's common for both int and long to be 32 bits on 32-bit systems).
If long is wider than int, it could still work "correctly". For example, the calling convention might be such that both types are passed in the same registers, or that both are pushed onto the stack as machine "words" of the same size.
Or it could fail in arbitrarily bad ways. If int is 32 bits and long is 64 bits, the code in printf that tries to read a long object might get a 64-bit object consisting of the 32 bits of the actual int that was passed combined with 32 bits of garbage. Or the extra 32 bits might consistently be zero, but with the 32 significant bits at the wrong end of the 64-bit object. It's also conceivable that fetching 64 bits when only 32 were passed could cause problems with other arguments; you might get the correct value for iExample, but following arguments might be fetched from the wrong stack offset.
My advice: The code should be fixed to use the correct format strings (and you have the tools to detect the problematic calls), but also do some testing (on all the C implementations you care about) to see whether it causes any visible symptoms in practice. The results of the testing should be used only to determine the priority of fixing the problems, not to decide whether to fix them or not. If the code visibly fails now, you should fix it now. If it doesn't, you can get away with waiting until later (presumably you have other things to work on).
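For reference, a minimal sketch of the fix for the example in the question (snprintf is used here as a hedge against buffer overruns; the question's original code used sprintf):

#include <cstdio>

int main() {
    int iExample = 32;
    char buf[200];
    // %d now matches the int argument; snprintf also guards against overflowing buf
    std::snprintf(buf, sizeof buf, "Example: %d", iExample);
    std::puts(buf);
}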
It's undefined and depends on the implementation. On implementations where int and long have the same size, it will likely work as expected. But just try it on any system with 32-bit int and 64-bit long, especially if your integer is not the last format argument, and you're likely to get problems: printf reads 64 bits where only 32 were provided, the rest quite possibly garbage, and, depending on alignment, the following arguments may not be accessed correctly either.
I am using Visual studio 2008.
For below code
double main()
{
}
I get error:
error C3874: return type of 'main' should be 'int' instead of 'double'
But if i use below code
char main()
{
}
No errors.
After running and exiting, the output window displays
The program '[5856] test2.exe: Native' has exited with code -858993664 (0xcccccc00).
Question: Is the compiler doing an implicit cast from the default return value of zero (an int) to char?
How did the exit code 0xcccccc00 get generated?
It looks like the last byte of that code is the actual returned value. Where does the 0xcccccc come from?
The correct way to do it, per the C++ standard is:
int main()
{
...
}
Don't change the return type to anything else, or your code will not be C++ and you're just playing with compiler-specific functionality. Those cccccc values in your example are just uninitialized bytes being returned (MSVC's debug runtime fills uninitialized stack memory with 0xCC).
The value returned from the main function becomes the exit status of the process, though the C standard only ascribes specific meaning to two values: EXIT_SUCCESS (traditionally zero) and EXIT_FAILURE. The meaning of other possible return values is implementation-defined. However there is no standard for how non-zero codes are interpreted.
You may refer to a useful post:
What should main() return in C and C++?
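A small sketch of the exit-status point above, using the standard macros for the two portable values (the bool flag is just a stand-in for a real success condition):

#include <cstdlib>

int main() {
    bool ok = true;   // stand-in for the program's real result
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}

On POSIX shells the resulting status can be inspected with echo $?, and on Windows with echo %ERRORLEVEL%.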
Yet another MSVC extension/bug!
The answer to your first question is sort of yes. A char is essentially a very small integral type, so the compiler is being (extremely) lenient. double isn't acceptable because it's not an integral type. The 0xcccccc is memory that never got initialized (except for the purposes of debugging). Since a char is only one byte (two hex digits), returning it set only the low 8 bits to 0 and left the upper 24 bits untouched. What an odd and undesirable compiler trick.
About the main function, §3.6.1/2 says: "It shall have a return type of type int, but otherwise its type is implementation-defined."
As I understand it, anything the Standard says 'shall' hold, and which the code does not adhere to, must be diagnosed by the compiler, unless the Standard specifically says that such a diagnostic is not required.
So I guess VS has a bug if it allows such code.
The main function is supposed to return an int. Not doing that means you're out in undefined territory. Don't forget to wave at the standard on your way past. Your char return probably works because a char can easily be converted to an int. Doubles certainly cannot: not only are they longer (twice the length), but they're floating point, which means the bits end up in wonky places.
Short answer: don't do that.
It is probably because char will implicitly convert to an int, whereas double won't, as there would be data loss.
(See here: http://msdn.microsoft.com/en-us/library/y5b434w4%28v=VS.71%29.aspx for more info)
However, you don't see the conversion problem, because the compiler catches the worst sin (as stated in the other answers) of using a non-standard return type.