effect of using sprintf / printf using %ld format string instead of %d with int data type - c++

We have some legacy code in which, at one point, long data types were refactored to int data types. During this refactor a number of printf / sprintf format statements were left as %ld instead of being changed to %d. For example:
int iExample = 32;
char buf[200];
sprintf(buf, "Example: %ld", iExample);
This code is compiled with both GCC and VS2012. We use Coverity for static code analysis, and code like the example above was flagged as a 'Printf arg type mismatch' with a Medium level of severity, CWE-686: Function Call With Incorrect Argument Type. I can see this would definitely be a problem had the format string been a signed one (%d) used with an unsigned int type, or something along those lines.
I am aware that the '_s' versions of sprintf etc are more secure, and that the above code can also be refactored to use std::stringstream etc. It is legacy code however...
I agree that the above code really should be using %d at the very least or refactored to use something like std::stringstream instead.
Out of curiosity, is there any situation where the above code will generate incorrect results? This legacy code has been around for quite some time and appears to be working fine.
UPDATED
Removed the usage of the word STL and just changed it to be std::stringstream.

As far as the standard is concerned, the behavior is undefined, meaning that the standard says exactly nothing about what will happen.
In practice, if int and long have the same size and representation, it will very likely "work", i.e., behave as if the correct format string has been used. (It's common for both int and long to be 32 bits on 32-bit systems).
If long is wider than int, it could still work "correctly". For example, the calling convention might be such that both types are passed in the same registers, or that both are pushed onto the stack as machine "words" of the same size.
Or it could fail in arbitrarily bad ways. If int is 32 bits and long is 64 bits, the code in printf that tries to read a long object might get a 64-bit object consisting of the 32 bits of the actual int that was passed combined with 32 bits of garbage. Or the extra 32 bits might consistently be zero, but with the 32 significant bits at the wrong end of the 64-bit object. It's also conceivable that fetching 64 bits when only 32 were passed could cause problems with other arguments; you might get the correct value for iExample, but following arguments might be fetched from the wrong stack offset.
My advice: The code should be fixed to use the correct format strings (and you have the tools to detect the problematic calls), but also do some testing (on all the C implementations you care about) to see whether it causes any visible symptoms in practice. The results of the testing should be used only to determine the priority of fixing the problems, not to decide whether to fix them or not. If the code visibly fails now, you should fix it now. If it doesn't, you can get away with waiting until later (presumably you have other things to work on).

It's undefined and depends on the implementation. On implementations where int and long have the same size, it will likely work as expected. But try it on any system with a 32-bit int and a 64-bit long, especially when your integer is not the last format argument, and you're likely to get problems: printf reads 64 bits where only 32 were provided, the rest quite possibly garbage, and, depending on alignment, the following arguments may not be accessed correctly either.

Related

Memory waste? If main() should only return 0 or 1, why is main declared with int and not short int or even char?

For example:
#include <stdio.h>
int main (void) /* Why int and not short int? - Waste of Memory */
{
    printf("Hello World!");
    return 0;
}
Why is main() conventionally defined with the int type, which occupies 4 bytes on a typical 32-bit system, if it usually returns only 0 or 1, while other types such as short int (2 bytes) or even char (1 byte) would save memory?
It is wasting memory space.
NOTE: The question is not a duplicate of the thread given; its answers address only the return value itself, not its data type, which is the explicit focus here.
The question is for both C and C++. If the answer differs between them, please share your wisdom with a mention of which language in particular you are addressing.
At the machine level, a value is usually returned in a register (for example, the register AX on Intel processors). The type int corresponds to the machine word. That is, no conversion is required, as would be needed, for example, to widen a byte-sized char into a machine word.
And in fact, main can return any integer value.
It's because of a machine that's half a century old.
Back in the day when C was created, an int was a machine word on the PDP-11 - sixteen bits - and it was natural and efficient to have main return that.
The "machine word" was the only type in the B language, which Ritchie and Thompson had developed earlier, and which C grew out of.
When C added types, not specifying one gave you a machine word - an int.
(It was very important at the time to save space, so not requiring the most common type to be spelled out was a Very Good Thing.)
So, since a B program started with
main()
and programmers are generally language-conservative, C did the same and returned an int.
There are two reasons I would not consider this a waste:
1. Practical use of a 4-byte exit code
If you want to return an exit code that exactly describes an error, you want more than 8 bits.
As an example, you may want to group errors: the first byte could describe the general type of error, the second byte the function that caused it, the third byte the cause of the error, and the fourth byte additional debug information.
2. Padding
If you return a single short or char, it will still be aligned to fit into a machine word, which is often 4 bytes/32 bits depending on the architecture. This is called padding, and it means that you will most likely still need 32 bits of memory to return a single short or char.
The old-fashioned convention with most shells is to use the least significant 8 bits of int, not just 0 or 1. 16 bits is increasingly common due to that being the minimum size of an int allowed by the standard.
And what would the issue be with wasting space? Is the space really wasted? Is your computer so full of "stuff" that the remaining sizeof(int) * CHAR_BIT - 8 would make a difference? Could the architecture exploit that and use those remaining bits for something else? I very much doubt it.
So I wouldn't say the memory is wasted at all, since you get it back from the operating system when the program finishes. Perhaps extravagant? A bit like using a large wine glass for a small tipple, perhaps?
1st: Your assumption/statement that it usually returns only 0 or 1 is itself wrong.
Usually the return code is expected to be 0 if no error occurred, but otherwise it can be any number representing a different error, and most command-line programs use this. Many programs also output negative numbers.
However there are a few common used codes https://www.tldp.org/LDP/abs/html/exitcodes.html also here another SO member points to a unix header that contains some codes https://stackoverflow.com/a/24121322/2331592
So, after all, it is not just a C or C++ type thing; there are historical reasons in how most operating systems work and expect programs to behave. The languages have to support that, and C-like languages do so by using int main(...).
2nd:
Your conclusion that it is wasting memory space is wrong.
Using an int in comparison to a shorter type does not involve any waste.
Memory is usually handled in word-sized units anyway (what that means depends on your architecture).
Working with sub-word types involves computational overhead on some architectures (read: load the word, mask out the unrelated bits; store: load the word, mask out the variable's bits, OR in the new value, write the word back).
The memory is not wasted unless you use it. If you write return 0;, no memory is ever used at this point. If you return myMemorySaving8bitVar;, you use only 1 byte (most probably on the stack, if not optimized out entirely).
You're either working in or learning C, so I think it's a Real Good Idea that you are concerned with efficiency. However, it seems that there are a few things that seem to need clarifying here.
First, the int data type is not and never was intended to mean "32 bits". The idea was that int would be the most natural binary integer type on the target machine--usually the size of a register.
Second, the return value from main() is meant to accommodate a wide range of implementations on different operating systems. A POSIX system uses an unsigned 8-bit return code. Windows uses 32-bits that are interpreted by the CMD shell as 2's complement signed. Another OS might choose something else.
And finally, if you're worried about memory "waste", that's an implementation issue that isn't even an issue in this case. Return codes from main are typically returned in machine registers, not in memory, so there is no cost or savings involved. Even if there were, saving 2 bytes in the run of a nontrivial program is not worth any developer's time.
The answer is "because it usually doesn't return only 0 or 1." I found this thread from software engineering community that at least partially answers your question. Here are the two highlights, first from the accepted answer:
An integer gives more room than a byte for reporting the error. It can be enumerated (return of 1 means XYZ, return of 2 means ABC, return of 3, means DEF, etc..) or used as flags (0x0001 means this failed, 0x0002 means that failed, 0x0003 means both this and that failed). Limiting this to just a byte could easily run out of flags (only 8), so the decision was probably to use an integer.
An interesting point is also raised by Keith Thompson:
For example, in the dialect of C used in the Plan 9 operating system main is normally declared as a void function, but the exit status is returned to the calling environment by passing a string pointer to the exits() function. The empty string denotes success, and any non-empty string denotes some kind of failure. This could have been implemented by having main return a char* result.
Here's another interesting bit from a unix.com forum:
(Some of the following may be x86 specific.)
Returning to the original question: Where is the exit status stored? Inside the kernel.
When you call exit(n), the least significant 8 bits of the integer n are written to a cpu register. The kernel system call implementation will then copy it to a process-related data structure.
What if your code doesn't call exit()? The c runtime library responsible for invoking main() will call exit() (or some variant thereof) on your behalf. The return value of main(), which is passed to the c runtime in a register, is used as the argument to the exit() call.
Related to the last quote, here's another from cppreference.com
5) Execution of the return (or the implicit return upon reaching the end of main) is equivalent to first leaving the function normally (which destroys the objects with automatic storage duration) and then calling std::exit with the same argument as the argument of the return. (std::exit then destroys static objects and terminates the program)
Lastly, I found this really cool example here (although the author of the post is wrong in saying that the result returned is the returned value modulo 512). After compiling and executing the following:
int main() {
    return 42001;
}
on my POSIX-compliant* system, echo $? returns 17. That is because 42001 % 256 == 17, which shows that 8 bits of data are actually used. With that in mind, choosing int ensures that enough storage is available for passing the program's exit status information, because, as per this answer, the C++ standard guarantees that the size of char (in bits)
can't be less than 8. That's because it must be large enough to hold "the eight-bit code units of the Unicode UTF-8 encoding form," and int is required to be at least as wide as that.
EDIT:
*As Andrew Henle pointed out in the comment:
A fully POSIX compliant system makes the entire int return value available, not just 8 bits. See pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html: "If si_code is equal to CLD_EXITED, then si_status holds the exit value of the process; otherwise, it is equal to the signal that caused the process to change state. The exit value in si_status shall be equal to the full exit value (that is, the value passed to _exit(), _Exit(), or exit(), or returned from main()); it shall not be limited to the least significant eight bits of the value."
I think this makes for an even stronger argument for the use of int over data types of smaller sizes.

Is there a reason to use C++11's std::int_fast32_t or std::int_fast16_t over int in cross-platform code?

In C++11 we are provided with fixed-width integer types, such as std::int32_t and std::int64_t, which are optional and therefore not ideal for writing cross-platform code. However, we also got non-optional variants: the "fast" variants, e.g. std::int_fast32_t and std::int_fast64_t, as well as the "least" variants, e.g. std::int_least32_t, both of which are guaranteed to be at least the specified number of bits in size.
The code I am working on is part of a C++11-based cross-platform library, which supports compilation on the most popular Unix/Windows/Mac compilers. A question that now came up is if there is an advantage in replacing the existing integer types in the code by the C++11 fixed-width integer types.
A disadvantage of using variables like std::int16_t and std::int32_t is the lack of a guarantee that they are available, since they are only provided if the implementation directly supports the type (according to http://en.cppreference.com/w/cpp/types/integer).
However, since int is at least 16 bits and 16 bits are large enough for the integers used in the code, what about using std::int_fast16_t over int? Does it provide a benefit to replace all int types by std::int_fast16_t and all unsigned ints by std::uint_fast16_t, or is this unnecessary?
Analogously, if all supported platforms and compilers are known to feature an int of at least 32 bits in size, does it make sense to replace int and unsigned int by std::int_fast32_t and std::uint_fast32_t respectively?
int can be 16, 32 or even 64 bit on current computers and compilers. In the future, it could be bigger (say, 128 bits).
If your code is ok with that, go with it.
If your code is only tested and working with 32 bit ints, then consider using int32_t. Then the code will fail at compile time instead of at run time when run on a system that doesn't have 32 bit ints (which is extremely rare today).
int_fast32_t is for when you need at least 32 bits, but you care a lot about performance. On hardware where a 32-bit integer is loaded as a 64-bit integer and then bit-shifted back down to 32 bits in a cumbersome process, int_fast32_t may be a 64-bit integer. The cost of this is that on obscure platforms, your code behaves very differently.
If you are not testing on such platforms, I would advise against it.
Having things break at build time is usually better than having breaks at run time. If and when your code is actually run on some obscure processor needing these features, then fix it. The rule of "you probably won't need it" applies.
Be conservative, generate early errors on hardware you are not tested on, and when you need to port to said hardware do the work and testing required to be reliable.
In short:
Use int_fast##_t if and only if you have tested your code (and will continue to test it) on platforms where the int size varies, and you have shown that the performance improvement is worth that future maintenance.
Using int##_t with common ## sizes means that your code will fail to compile on platforms that you have not tested it on. This is good; untested code is not reliable, and unreliable code is usually worse than useless.
Without using int32_t, and using int, your code will sometimes have ints that are 32 and sometimes ints that are 64 (and in theory more), and sometimes ints that are 16. If you are willing to test and support every such case in every such int, go for it.
Note that arrays of int_fast##_t can have cache problems: they could be unreasonably big. As an example, int_fast16_t could be 64 bits. An array of a few thousand or million of them could be individually fast to work with, but the cache misses caused by their bulk could make them slower overall; and the risk that things get swapped out to slower storage grows.
int_least##_t can be faster in those cases.
The same applies, doubly so, to network-transmitted and file-stored data, on top of the obvious issue that network/file data usually has to follow formats that are stable over compiler/hardware changes. This, however, is a different question.
However, when using fixed-width integer types you must pay special attention to the fact that int, long, etc. still have the same width as before. Integer promotion still happens based on the size of int, which depends on the compiler you are using. An integer literal in your code will be of type int, with the associated width. This can lead to unwanted behaviour if you compile your code with a different compiler. For more detailed info: https://stackoverflow.com/a/13424208/3144964
I have just realised that the OP is asking only about int_fast##_t, not int##_t, since the latter is optional. However, I will keep the answer, hoping it may help someone.
I would add something. Fixed-size integers are important (or even a must) for building APIs for other languages. One example is when you want to P/Invoke functions in a native C++ DLL and pass data to them from .NET managed code. In .NET, int is guaranteed to be a fixed size (I believe it is 32-bit). So if you used int in C++ and it turned out to be 64-bit rather than 32-bit, this could cause problems and break the layout of wrapped structs.

porting 32 bit to 64bit code

I was trying to port 32-bit code to 64-bit, and I was wondering if there are some standard rules when it comes to porting.
I have my code compiling in a 64bit environment and now I come across some errors like
cast from pointer to integer of different size [-Werror=pointer-to-int-cast] for
x = (int32_t)y;
To get rid of this I use x = (size_t)y; the error goes away, but is this the correct way? Also, in various locations I have to cast a variable to (unsigned long long). For example:
printf("Total Time : %5qu\n", time->compile_time);
This gives an error error: format '%qu' expects argument of type 'long long unsigned int', but argument 2 has type (XYZ).
To get this fixed I do something like
printf("Total Time : %5qu\n", (unsigned long long) time->compile_time);
Again, is this proper?
I think it's safe to assume that y is a pointer in this case.
Instead of size_t you should use intptr_t or uintptr_t.
See size_t vs. uintptr_t.
As for your second cast, it depends on what you mean by proper.
The usual advice is to avoid casting. However, like all things in programming there is a reason that they are available. When working on an implementation of malloc on an embedded system I had to cast pointers to uintptr_t in order to be able to do the necessary arithmetic on them. This code was tested on a 64 bit PC but ran on a 32 bit micro controller. The fact that I used two architectures was the best way to ensure it was somewhat portable code.
Casting though makes your code dependent on how the underlying type is defined! Just like you noticed with your x = (int32_t)y this line made your code dependent on the fact that a pointer was 32 bits wide.
The only advice I can give you is to know your types. If you want to cast, it is ok (so long as you can't make your variable of the correct type to begin with) but it may reduce your portability unless you choose the "correct" type to cast to.
The same goes for the printf cast. If I was you, I would read the definition of %5qu thoroughly first (this may help). Then I would attempt to use an appropriately typed variable (or conversely a different format string) and only if that failed would I resort to a cast.
I have never used %qu but I would interpret it as a 64 bit unsigned int so I would try using uint64_t (because long long is not guaranteed to be 64 bits across all platforms). Although by what I read on Wikipedia the q specifier is platform specific to begin with so it might be wise to change it.
Any more than this and the question becomes too broad (it's good that we stuck to specific examples). If you get stuck, come back with individual types that you want to check and ask questions only about them.
Was it Stroustrup who said they call it a 'cast' because it props up something that's broken? ;-)
x = (int32_t) y;
In this case you are using an exact-width type, so it really depends on what x and y are. The error message suggests that y is a pointer. A pointer is not an int32_t, so the real question is why y is being assigned to x ... it may indicate a potential problem. Casting it away may just cover up the problem so that it bites you at run time rather than compile time. Figure out what the code thinks it's doing and "re-jigger" the types to fit the code. The reason the error goes away when you use a (size_t) cast is that the pointer is likely 64 bits and size_t is 64 bits, but you could consider that a simple form of random casting luck. The same is true when casting to (unsigned long long). Don't assume that an int is 32 or 64 bits, and don't use casting as a porting tool ... it will get you in trouble. It's tough to be more specific based on a single line of code. If you want to post a function of fewer than 25 lines that has issues, more specific advice may be available.

How to write convertible code, 32 bit/64 bit?

A C++-specific question. I read a question about what makes a program 32-bit or 64-bit, and the answer it got was something like this (sorry, I can't find the question; it was some days ago that I looked at it and I can't find it again): as long as you don't make any "pointer assumptions", you only need to recompile it. So my question is: what are pointer assumptions? To my understanding there are 32-bit pointers and 64-bit pointers, so I figure it has something to do with that. Please show the difference in code between them. Any other good habits to keep in mind while writing code that make it easy to convert between the two are also welcome, though please share examples with them.
Ps. I know there is this post:
How do you write code that is both 32 bit and 64 bit compatible?
but I thought it was kind of too general, with no good examples for new programmers like myself. Like, what is a 32-bit storage unit, etc.? Kinda hoping to break it down a bit more (no pun intended ^^).
In general it means that your program behavior should never depend on the sizeof() of any types (that are not made to be of some exact size), neither explicitly nor implicitly (this includes possible struct alignments as well).
Pointers are just a subset of them, and it probably also means that you should not try to rely on being able to convert between unrelated pointer types and/or integers, unless they are specifically made for this (e.g. intptr_t).
In the same way, you need to take care with things written to disk, where you should also never rely on the size of e.g. built-in types being the same everywhere.
Whenever you have to (because of e.g. external data formats) use explicitly sized types like uint32_t.
For a well-formed program (that is, a program written according to syntax and semantic rules of C++ with no undefined behaviour), the C++ standard guarantees that your program will have one of a set of observable behaviours. The observable behaviours vary due to unspecified behaviour (including implementation-defined behaviour) within your program. If you avoid unspecified behaviour or resolve it, your program will be guaranteed to have a specific and certain output. If you write your program in this way, you will witness no differences between your program on a 32-bit or 64-bit machine.
A simple (forced) example of a program that will have different possible outputs is as follows:
#include <iostream>

int main()
{
    std::cout << sizeof(void*) << std::endl;
    return 0;
}
This program will likely have different output on 32- and 64-bit machines (but not necessarily). The result of sizeof(void*) is implementation-defined. However, it is certainly possible to have a program that contains implementation-defined behaviour but is resolved to be well-defined:
#include <iostream>

int main()
{
    int size = sizeof(void*);
    if (size != 4) {
        size = 4;
    }
    std::cout << size << std::endl;
    return 0;
}
This program will always print out 4, despite the fact it uses implementation-defined behaviour. This is a silly example because we could have just done int size = 4;, but there are cases when this does appear in writing platform-independent code.
So the rule for writing portable code is: aim to avoid or resolve unspecified behaviour.
Here are some tips for avoiding unspecified behaviour:
Do not assume anything about the size of the fundamental types beyond what the C++ standard specifies. That is, a char is at least 8 bits, both short and int are at least 16 bits, and so on.
Don't try to do pointer magic (casting between pointer types or storing pointers in integral types).
Don't use an unsigned char* to read the value representation of a non-char object (for serialisation or related tasks).
Avoid reinterpret_cast.
Be careful when performing operations that may over or underflow. Think carefully when doing bit-shift operations.
Be careful when doing arithmetic on pointer types.
Don't use void*.
There are many more occurrences of unspecified or undefined behaviour in the standard. It's well worth looking them up. There are some great articles online that cover some of the more common differences that you'll experience between 32- and 64-bit platforms.
"Pointer assumptions" is when you write code that relies on pointers fitting in other data types, e.g. int copy_of_pointer = ptr; - if int is a 32-bit type, then this code will break on 64-bit machines, because only part of the pointer will be stored.
So long as pointers are only stored in pointer types, it should be no problem at all.
Typically, pointers are the size of the "machine word", so on a 32-bit architecture, 32 bits, and on a 64-bit architecture, all pointers are 64-bit. However, there are SOME architectures where this is not true. I have never worked on such machines myself [other than x86 with its "far" and "near" pointers - but let's ignore that for now].
Most compilers will tell you when you convert pointers to integers that the pointer doesn't fit into, so if you enable warnings, MOST of the problems will become apparent - fix the warnings, and chances are pretty decent that your code will work straight away.
There should be no difference between 32-bit and 64-bit code; the goal of C/C++ and other high-level programming languages, unlike assembly language, is portability.
The only difference will be the distribution you compile your code on; all the work is done automatically by your compiler/linker, so you mostly don't have to think about it.
But: if you are programming on a 64-bit distribution and you need to use an external library, for example SDL, that library will also have to be compiled as 64-bit if you want your code to link.
One thing to know is that your ELF file will be bigger on a 64-bit distribution than on a 32-bit one; that's only logical.
What about pointers? When you increment/change a pointer, the compiler advances it by the size of the pointed-to type.
The size of the pointer itself is determined by your processor's register size/the distribution you're working on.
But you just don't have to care about this; the compilation does everything for you.
In sum: that's why you can't execute a 64-bit ELF file on a 32-bit distribution.
Typical pitfalls for 32bit/64bit porting are:
The implicit assumption by the programmer that sizeof(void*) == 4 * sizeof(char).
If you're making this assumption and e.g. allocate arrays that way ("I need 20 pointers so I allocate 80 bytes"), your code breaks on 64bit because it'll cause buffer overruns.
The "kitten-killer", int x = (int)&something; (and the reverse, void* ptr = (void*)some_int). Again an assumption of sizeof(int) == sizeof(void*). This doesn't cause overflows but loses data - namely, the upper 32 bits of the pointer.
Both of these issues are of a class called type aliasing (assuming identity / interchangeability / equivalence on a binary-representation level between two types), and such assumptions are common; like on UN*X, assuming time_t, size_t, off_t to be int, or on Windows, HANDLE, void* and long to be interchangeable, etc...
Assumptions about data structure / stack space usage (See 5. below as well). In C/C++ code, local variables are allocated on the stack, and the space used there is different between 32bit and 64bit mode due to the point below, and due to the different rules for passing arguments (32bit x86 usually on the stack, 64bit x86 in part in registers). Code that just about gets away with the default stacksize on 32bit might cause stack overflow crashes on 64bit.
This is relatively easy to spot as a cause of the crash but depending on the configurability of the application possibly hard to fix.
Timing differences between 32bit and 64bit code (due to different code sizes / cache footprints, or different memory access characteristics / patterns, or different calling conventions ) might break "calibrations". Say, for (int i = 0; i < 1000000; ++i) sleep(0); is likely going to have different timings for 32bit and 64bit ...
Finally, the ABI (Application Binary Interface). There's usually bigger differences between 64bit and 32bit environments than the size of pointers...
Currently, two main "branches" of 64-bit environments exist: IL32P64, also known as LLP64 (what Win64 uses - int and long are int32_t; only uintptr_t/void* is uint64_t, talking in terms of the sized integers from <stdint.h>), and LP64 (what UN*X uses - int is int32_t, long is int64_t and uintptr_t/void* is uint64_t). But there are "subdivisions" of different alignment rules as well - some environments assume long, float or double align at their respective sizes, while others assume they align at multiples of four bytes. In 32-bit Linux, they all align at four bytes, while in 64-bit Linux, float aligns at four bytes, and long and double at eight-byte multiples.
The consequence of these rules is that in many cases, both sizeof(struct { ... }) and the offsets of structure/class members are different between 32-bit and 64-bit environments even if the data type declaration is completely identical.
Beyond impacting array/vector allocations, these issues also affect data input/output, e.g. through files - if a 32-bit app writes, say, struct { char a; int b; char c; long d; double e; } to a file that the same app recompiled for 64-bit reads back in, the result will not be quite what's hoped for.
The examples just given are only about language primitives (char, int, long etc.) but of course affect all sorts of platform-dependent / runtime-library data types, whether size_t, off_t, time_t, HANDLE, or essentially any nontrivial struct/union/class - so the scope for error here is large.
And then there's the lower-level differences, which come into play e.g. for hand-optimized assembly (SSE/SSE2/...); 32bit and 64bit have different (numbers of) registers, different argument passing rules; all of this affects strongly how such optimizations perform and it's very likely that e.g. SSE2 code which gives best performance in 32bit mode will need to be rewritten / needs to be enhanced to give best performance 64bit mode.
There's also code design constraints which are very different for 32bit and 64bit, particularly around memory allocation / management; an application that's been carefully coded to "maximize the hell out of the mem it can get in 32bit" will have complex logic on how / when to allocate/free memory, memory-mapped file usage, internal caching, etc - much of which will be detrimental in 64bit where you could "simply" take advantage of the huge available address space. Such an app might recompile for 64bit just fine, but perform worse there than some "ancient simple deprecated version" which didn't have all the maximize-32bit peephole optimizations.
So, ultimately, it's also about enhancements / gains, and that's where more work comes in, partly in programming, partly in design/requirements. Even if your app cleanly recompiles on both 32-bit and 64-bit environments and is verified on both, is it actually benefiting from 64-bit? Are there changes that can/should be made to the code logic to make it do more / run faster in 64-bit? Can you do those changes without breaking 32-bit backward compatibility? Without negative impacts on the 32-bit target? Where will the enhancements be, and how much can you gain?
For a large commercial project, answers to these questions are often important markers on the roadmap because your starting point is some existing "money maker"...

What are possible consequences of format/argument type mismatches in c++ fprintf function calls?

We have a legacy C++ .dll compiled for Windows under Visual Studio. We have been running into issues where we get different results when we compile the program using different compiler options.
I have done a pretty simple port so that I can compile it under Linux using g++. I just wanted to see what kind of warnings gcc would throw at me, and possibly try to run it under valgrind to look for possible errors.
So that is the background. Here is the question. We have a bunch of fprintf function calls that print to a log file. When compiled under g++, I get warnings like this.
../f11.cpp:754:128: warning: format ‘%i’ expects type ‘int’, but argument 8 has type ‘long unsigned int’
Obviously this is a bad thing we need to fix, but I am just curious about the potential consequences of ignoring this warning? Are the consequences only limited to the output in the log file, or could this cause things like buffer overruns or any other type of situation where we are overwriting memory without knowing it?
By definition, it's undefined behavior to have mismatching format strings and argument types.
So anything is allowed to happen. But more likely, you'll get completely garbled and non-sense output - especially if the sizes of the operands you pass don't match what's expected from the format strings.
It is possible for a buffer overrun to happen - if printf() ends up reading past the parameters, it will be reading stack garbage. But it will only be reading it, not writing to it, so it shouldn't corrupt anything. (With one exception - %n; see comments.)
But then again, it's undefined behavior. Anything is possible.
You are getting such warnings because of code like
long x;
printf ("x=%i\n", x);
On a 64-bit x86-64 Linux machine, what is probably happening is that the printf implementation calls va_arg(arglist, int) for the x argument. Since int is 32 bits and long is 64 bits, the 64-bit value is probably truncated to its lower 32 bits, which in that particular case probably doesn't do much harm.
If it is a scanf ("%i", &x); things become much much uglier. Probably, only 32 bits out of 64 of the long x get changed, and that will break the code later.
But as everyone has responded, this is undefined behavior. You'll be sorry if you don't fix it, or at the very least add a big fat /* FIXME */ comment for the person working on the code in a few weeks or months.