Output difference in GCC and Turbo C - C++

Why is there a difference in the output when the following code is compiled with GCC and with Turbo C?
#include <stdio.h>

int main()
{
    char *p = "I am a string";
    char *q = "I am a string";
    if (p == q)
    {
        printf("Optimized");
    }
    else
    {
        printf("Change your compiler");
    }
    return 0;
}
I get "Optimized" on gcc and "Change your compiler" on turbo c. Why?

Your question has been tagged C as well as C++, so I'll answer for both languages.
[C]
From ISO C99 (Section 6.4.5/6)
It is unspecified whether these arrays are distinct provided their elements have the appropriate values.
That means it is unspecified whether p and q are pointing to the same string literal or not. In case of gcc they both are pointing to "I am a string" (gcc optimizes your code) whereas in turbo c they are not.
Unspecified behavior:
Use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance.
[C++]
From ISO C++-98 (Section 2.13.4/2)
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined.
In C++, your code invokes implementation-defined behaviour.
Implementation-defined behavior:
Unspecified behavior where each implementation documents how the choice is made.
Also see this question.
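Whichever choice a compiler makes, pointer equality only tells you whether the literals were merged; comparing the characters is what is portable. A minimal sketch (nothing assumed here beyond the standard headers):

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *p = "I am a string";
    const char *q = "I am a string";

    /* Pointer equality only reveals whether the two literals share storage,
       which is unspecified in C and implementation-defined in C++. */
    printf("same object:   %s\n", p == q ? "yes" : "no");

    /* Comparing the characters themselves is well defined on every compiler. */
    printf("same contents: %s\n", strcmp(p, q) == 0 ? "yes" : "no");
    return 0;
}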

Since your string literal is constant data, i.e. you are not allowed to modify it through the pointer, there is no real purpose in storing it in memory twice. Being a newer compiler, gcc merges the literals by default, while Turbo C does not. It is a sign of gcc's support for the newer language standards, which have a notion of const data.

Please disregard the answers along the lines of
"It's because Turbo C is SO TOTALLY OLD and they couldn't do it THEN, because it had to be FAST, but the GCC is totally NEW and RAD and that's why it does that!".
Both compilers support merging string constants as an option. The GCC option (-fmerge-constants) is enabled at the optimization levels, while the Turbo C option (-d) is off by default. If you are using the TCC IDE, go to Options|Compiler...|Code Generation... and check "Duplicate strings merged".

From the gcc manual page:
-fmerge-constants
Attempt to merge identical constants (string constants and floating point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.
Enabled at levels -O, -O2, -O3, -Os.
Hence the output.
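A quick way to see which case you are in, on either compiler, is to print the two pointer values; if the literals were merged, the addresses are identical. A small sketch of that check (the %p output format is implementation-defined, but equal output means one shared literal):

#include <stdio.h>

int main(void)
{
    const char *p = "I am a string";
    const char *q = "I am a string";

    /* If the compiler merged the two literals, both lines print the same address. */
    printf("p = %p\nq = %p\n", (void *)p, (void *)q);
    return 0;
}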

Turbo C was optimized for fast compilation, so it doesn't have any features that would slow it down. Recognizing duplicate strings would be a slow-down, even if only minor.

The compiler may keep two copies of identical literals if it sees fit. Finding out whether that is the case is presumably the point of this program.
In the good old days, assemblers kept all literals in a literal pool, and patching the literal pool was a recognised (if not approved) technique of modifying 'constants' throughout the program.
If by some chance the compiler allowed *p = 'H'; in this case, important differences in behaviour would result.
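To make that difference concrete, here is a minimal sketch of my own (not from the original question): writing through a pointer to a string literal is undefined behaviour, and with the pointer declared const, as it should be, it does not even compile; an array initialized from the literal, by contrast, is a writable copy.

#include <stdio.h>

int main(void)
{
    const char *p = "I am a string"; /* points at the read-only literal; *p = 'H' would be rejected */
    char buf[]    = "I am a string"; /* a writable copy of the literal */

    buf[0] = 'h';                    /* fine: we own this array */
    printf("%s / %s\n", p, buf);
    return 0;
}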

Historical footnote: Since addresses were smaller than floating-point numeric constants, FORTRAN used to handle floating-point constants much like C handles strings. Since memory was precious, identical constants would be allocated the same space. Also, parameter passing was always done by reference. This meant that if one passed a numeric constant to a procedure that modified its argument, other occurrences of that "constant" would change value.
Hence the old saying: "Variables won't; constants aren't."
Incidentally, has anyone noticed the bug in the Turbo C 2.0 printf which would fail when using a format like "%1.1f" to print numbers like 99.99 (outputs 00.0)? Fixed in 2.01, it reminds me of the Windows 3.1 calculator bug.

Related

Why does this code with a character array, which was given a variable as size, compile? [duplicate]

This question already has answers here:
Variable Length Array (VLA) in C++ compilers
Playing around with templates, I ran into an interesting phenomenon with respect to arrays and how their size is defined, which I thought was not allowed in C++.
I used a global variable to define the size of an array inside main(), and for some reason it worked (see the code below).
1) Why does this even compile? I thought only constexpr values may be used for the array size.
2) Supposing the above is valid, it still does not explain why it works even when sz = 8, which is clearly less than the size of the character string.
3) Why are we getting those weird '#' and '?' characters? I tried different combinations of strings, for example only numeric characters ("123456789"), and they did not appear.
Appreciate any help. Thanks.
Here is my code
#include <iostream>
#include <cstring>
using namespace std;

int sz = 8;

int main()
{
    const char temp[sz] = "123456abc"; // 9 characters + 1 null?
    cout << temp << endl;
    return 0;
}
Output:
123456ab�
#
1) Why does this even compile? I thought only constexpr values may be used for the array size.
The asker is correct that Variable Length Arrays (VLAs) are not Standard C++. The g++ compiler includes support for VLAs as an extension. How can this be? A compiler developer can add pretty much anything they want so long as the behaviour required by the Standard is met. It is up to the developer to document the behaviour.
2) Supposing the above is valid, it still does not explain why it works even when sz = 8, which is clearly less than the size of the character string.
Normally g++ emits an error if an array is initialized with more values than it can hold. I would have to check the Standard to confirm whether an error is required in this case; it seems like a good place for one, but I can't remember.
In this case it appears that the VLA extension has a side effect that eliminates the error g++ normally spits out for trying to overfill an array. This makes a lot of sense since the compiler doesn't know the size of the array, in the general case, at compile time and cannot perform the test. No test, no error.
None of this is covered by the C++ Standard because VLA is not supported.
A quick look through the C standard, which does permit VLAs, didn't turn up any guidance for this case. In C, the rules for initializing VLAs are pretty simple: you can't. The compiler doesn't know how big the array will be at compile time, so it can't initialize it. There may be an exception for string literals, but I haven't found one.
I also have not found a good description of the behaviour in GCC documentation.
clang produces the error I expect based on my read of the C standard: error: variable-sized object may not be initialized
Addendum: Probably should have checked Stack Overflow first and saved myself a lot of time: Initializing variable length array .
3) Why are we getting those weird '#' and '?' characters? I tried different combinations of strings, for example only numeric characters ("123456789"), and they did not appear.
What appears to be happening (and since none of this is standard, I have no quotes to back it up) is that the string literal is copied into the VLA up to the size of the VLA. The portion of the literal past the end of the VLA is silently discarded. This includes the null terminator, and the behaviour of printing an unterminated char array is undefined.
Solution:
Stick to Standardized behaviour where possible. The major compilers have options to warn you of non-Standard code. Use -pedantic with compilers using the gcc options and /permissive- with recent versions of Visual Studio. When forced to step outside the Standard, consult the compiler documentation or the implementers themselves for the sticky-or-undocumented bits.
If you don't get good answers, try to find another path.
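As a sketch of that "another path", assuming the size really is known at compile time or can be left to a container, two standard-conforming alternatives to the VLA would be:

#include <iostream>
#include <string>

constexpr int sz = 10; // large enough for "123456abc" plus its null terminator

int main()
{
    const char temp[sz] = "123456abc"; // OK: sz is a constant expression and sz >= 10
    std::cout << temp << '\n';

    std::string s = "123456abc";       // or let std::string manage the size entirely
    std::cout << s << '\n';
    return 0;
}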

Values assigned to char in c++ [duplicate]

Why is this a warning? I think there are many cases where it is clearer to use multi-char int constants instead of meaningless numbers or instead of defining const variables with the same value. When parsing wave/tiff/other file types, it is clearer to compare the values read against 'EVAW', 'data', etc. instead of their corresponding numeric values.
Sample code:
int waveHeader = 'EVAW';
Why does this give a warning?
According to the standard (§6.4.4.4/10)
The value of an integer character constant containing more than one
character (e.g., 'ab'), [...] is implementation-defined.
long x = '\xde\xad\xbe\xef'; // yes, single quotes
This is valid ISO 9899:2011 C. It compiles without warning under gcc with -Wall, and with a “multi-character character constant” warning with -pedantic.
From Wikipedia:
Multi-character constants (e.g. 'xy') are valid, although rarely
useful — they let one store several characters in an integer (e.g. 4
ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one).
Since the order in which the characters are packed into one int is not
specified, portable use of multi-character constants is difficult.
For portability's sake, don't use multi-character constants with integral types.
This warning is useful for programmers who would mistakenly write 'test' where they should have written "test".
That happens much more often than programmers actually wanting multi-char int constants.
If you're happy you know what you're doing and can accept the portability problems, on GCC for example you can disable the warning on the command line:
-Wno-multichar
I use this for my own apps to work with AVI and MP4 file headers for similar reasons to you.
Even if you're willing to look up what behavior your implementation defines, multi-character constants will still vary with endianness.
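If you still want a compact four-character tag, one portable alternative is to build the integer from the individual characters with shifts; the helper below (fourcc is my own name, not a standard function) fixes the byte order explicitly instead of leaving it to the implementation:

#include <cinttypes>
#include <cstdio>

// Hypothetical helper: build a 32-bit tag from four characters with a fixed,
// explicit byte order instead of relying on a multi-char constant.
constexpr std::uint32_t fourcc(char a, char b, char c, char d)
{
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8)  |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

int main()
{
    constexpr std::uint32_t waveHeader = fourcc('W', 'A', 'V', 'E');
    std::printf("0x%08" PRIX32 "\n", waveHeader); // prints 0x57415645 on every platform
    return 0;
}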
Better to use a (POD) struct { char[4] }; ... and then use a UDL like "WAVE"_4cc to easily construct instances of that class
The simplest C/C++ solution, compliant with any compiler/standard, was mentioned by @leftaroundabout in the comments above:
int x = *(int*)"abcd";
Or a bit more specific:
int x = *(int32_t*)"abcd";
One more solution, also compliant with C/C++ compilers/standards since C99 (except clang++, which has a known bug):
int x = ((union {char s[5]; int number;}){"abcd"}).number;
/* just a demo check: */
printf("x=%d stored %s byte first\n", x, x==0x61626364 ? "MSB":"LSB");
Here an anonymous union is used to give a nice symbolic name to the desired numeric result, and the "abcd" string is used to initialize the compound literal (C99).
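Another alternative, my own suggestion rather than something from the answers above, is to memcpy the bytes into an integer; this sidesteps the alignment and strict-aliasing questions raised by *(int*)"abcd", though the resulting value still depends on byte order exactly as in the union demo:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    std::uint32_t x;
    std::memcpy(&x, "abcd", sizeof x);   // copy the first four bytes of the literal
    std::printf("x=0x%08X stored %s byte first\n",
                static_cast<unsigned>(x), x == 0x61626364 ? "MSB" : "LSB");
    return 0;
}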
If you want to disable this warning, it is important to know that there are two related warning flags in GCC and Clang: -Wno-multichar and -Wno-four-char-constants.

Are pointer variables just integers with some operators or are they "symbolic"?

EDIT: The original word choice was confusing. The term "symbolic" is much better than the original ("mystical").
In the discussion about my previous C++ question, I have been told that pointers are
"a simple value type much like an integer"
not "mystical"
"The Bit pattern (object representation) contains the value (value representation) (§3.9/4) for trivially copyable types, which a pointer is."
This does not sound right! If nothing is symbolic and a pointer is its representation, then I can do the following. Can I?
#include <stdio.h>
#include <string.h>

int main() {
    int a[1] = { 0 }, *pa1 = &a[0] + 1, b = 1, *pb = &b;
    if (memcmp(&pa1, &pb, sizeof pa1) == 0) {
        printf("pa1 == pb\n");
        *pa1 = 2;
    }
    else {
        printf("pa1 != pb\n");
        pa1 = &a[0]; // ensure well defined behaviour in printf
    }
    printf("b = %d *pa1 = %d\n", b, *pa1);
    return 0;
}
This is a C and C++ question.
Testing with Compile and Execute C Online with GNU GCC v4.8.3: gcc -O2 -Wall gives
pa1 == pb
b = 1 *pa1 = 2
Testing with Compile and Execute C++ Online with GNU GCC v4.8.3: g++ -O2 -Wall
pa1 == pb
b = 1 *pa1 = 2
So the modification of b via pa1 (a pointer one past the end of a) fails with GCC in C and C++.
Of course, I would like an answer based on standard quotes.
EDIT: To respond to criticism about UB on &a + 1, now a is an array of 1 element.
Related: Dereferencing an out of bound pointer that contains the address of an object (array of array)
Additional note: the term "mystical" was first used, I think, by Tony Delroy here. I was wrong to borrow it.
The first thing to say is that a single test, on one compiler, generating code for one architecture, is not a basis on which to draw conclusions about the behaviour of the language.
C++ (and C) are general-purpose languages created with the intention of being portable, i.e. a well-formed program written in C++ on one system should run on any other (barring calls to system-specific services).
Once upon a time, for various reasons including backward-compatibility and cost, memory maps were not contiguous on all processors.
For example I used to write code on a 6809 system where half the memory was paged in via a PIA addressed in the non-paged part of the memory map. My c compiler was able to cope with this because pointers were, for that compiler, a 'mystical' type which knew how to write to the PIA.
The 80386 family has an addressing mode where addresses are organised in groups of 16 bytes. Look up FAR pointers and you'll see different pointer arithmetic.
This is the history of pointer development in c++. Not all chip manufacturers have been "well behaved" and the language accommodates them all (usually) without needing to rewrite source code.
Stealing the quote from TartanLlama:
[expr.add]/5 "[for pointer addition, ] if both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
So the compiler can assume that your pointer points into the a array, or one past its last element. If it points one past the end, you cannot dereference it. But since you do dereference it, the compiler concludes it cannot be one past the end, so it can only be inside the array.
So now you have your code (reduced)
b = 1;
*pa1 = 2;
where pa1 points inside the array a and b is a separate variable. And when you print them, you get exactly 1 and 2, the values you have assigned them.
An optimizing compiler can figure that out without even storing a 1 or a 2 to memory. It can just print the final result.
If you turn off the optimiser the code works as expected.
By using pointer arithmetic that is undefined you are fooling the optimiser.
The optimiser has figured out that there is no code writing to b, so it can safely keep it in a register. As it turns out, you have obtained the address of b in a non-standard way and modified the value in a way the optimiser doesn't see.
If you read the C standard, it says that pointers may be mystical. gcc pointers are not mystical. They are stored in ordinary memory and consist of the same type of bytes that make up all other data types. The behaviour you encountered is due to your code not respecting the limitations stated for the optimiser level you have chosen.
Edit:
The revised code is still UB. The standard doesn't allow dereferencing a[1] even if the pointer value happens to be identical to another pointer value. So the optimiser is still allowed to keep the value of b in a register.
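To illustrate the "ordinary bytes" point in a way that is actually well defined, here is a small sketch of my own: copying a pointer's object representation into an integer (or a byte array) is fine; what is not fine is dereferencing a pointer that merely shares that bit pattern while having been derived as one past the end of another object.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int b = 1;
    int *pb = &b;

    /* Inspecting the object representation of a pointer is well defined:
       it is just sizeof(int *) ordinary bytes. */
    uintptr_t bits = 0;
    memcpy(&bits, &pb, sizeof pb);
    printf("pb occupies %zu bytes, value 0x%jx\n",
           sizeof pb, (uintmax_t)bits);

    /* What is NOT defined is dereferencing a different pointer that merely
       has the same bit pattern but points one past the end of another object. */
    return 0;
}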
C was conceived as a language in which pointers and integers were very intimately related, with the exact relationship depending upon the target platform. The relationship between pointers and integers made the language very suitable for purposes of low-level or systems programming. For purposes of discussion below, I'll thus call this language "Low-Level C" [LLC].
The C Standards Committee wrote up a description of a different language, where such a relationship is not expressly forbidden, but is not acknowledged in any useful fashion, even when an implementation generates code for a target and application field where such a relationship would be useful. I'll call this language "High Level Only C" [HLOC].
In the days when the Standard was written, most things that called themselves C implementations processed a dialect of LLC. Most useful compilers process a dialect which defines useful semantics in more cases than HLOC, but not as many as LLC. Whether pointers behave more like integers or more like abstract mystical entities depends upon which exact dialect one is using. If one is doing systems programming, it is reasonable to view C as treating pointers and integers as intimately related, because LLC dialects suitable for that purpose do so, and HLOC dialects that don't do so aren't suitable for that purpose. When doing high-end number crunching, however, one would far more often be using dialects of HLOC which do not recognize such a relationship.
The real problem, and source of so much contention, lies in the fact that LLC and HLOC are increasingly divergent, and yet are both referred to by the name C.

C++ handling of excess precision

I'm currently looking at code which does multi-precision floating-point arithmetic. To work correctly, that code requires values to be reduced to their final precision at well-defined points. So even if an intermediate result was computed in an 80-bit extended-precision floating-point register, at some point it has to be rounded to a 64-bit double for subsequent operations.
The code uses a macro INEXACT to describe this requirement, but doesn't have a perfect definition. The gcc manual mentions -fexcess-precision=standard as a way to force well-defined precision for cast and assignment operations. However, it also writes:
‘-fexcess-precision=standard’ is not implemented for languages other than C
Now I'm thinking about porting those ideas to C++ (comments welcome if anyone knows an existing implementation). So it seems I can't use that switch for C++. But what is the g++ default behavior in absence of any switch? Are there more C++-like ways to control the handling of excess precision?
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know. But I'm still curious.
Are there more C++-like ways to control the handling of excess precision?
The C99 standard defines FLT_EVAL_METHOD, a compiler-set macro that defines how excess precision should happen in a C program (many C compilers still behave in a way that does not exactly conform to the most reasonable interpretation of the value of FLT_EVAL_METHOD that they define: older GCC versions generating 387 code, Clang when generating 387 code, ...). Subtle points relating to the effects of FLT_EVAL_METHOD were clarified in the C11 standard.
Since the 2011 standard, C++ defers to C99 for the definition of FLT_EVAL_METHOD (header <cfloat>).
So GCC should simply allow -fexcess-precision=standard for C++, and hopefully it eventually will. The same semantics as that of C are already in the C++ standard, they only need to be implemented in C++ compilers.
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know.
That is the usual solution.
Be aware that C99 also defines an FP_CONTRACT pragma in math.h that you may want to look at: it relates to the same problem of some expressions being computed at higher precision, but coming at it from a completely different side (the modern fused multiply-add instruction instead of the old 387 instruction set). The pragma decides whether the compiler is allowed to replace source-level additions and multiplications with FMA instructions (the effect being that the multiplication is virtually computed at infinite precision, because that is how the instruction works, instead of being rounded to the precision of the type as it would be with separate multiplication and addition instructions). This pragma has apparently not been incorporated into the C++ standard (as far as I can see).
The default value for this option is implementation-defined and some people argue for the default to be to allow FMA instructions to be generated (for C compilers that otherwise define FLT_EVAL_METHOD as 0).
You should, in C, future-proof your code with:
#include <math.h>
#pragma STDC FP_CONTRACT OFF
And the equivalent incantation in C++ if your compiler documents one.
what is the g++ default behavior in absence of any switch?
I am afraid that the answer to this question is that GCC's behavior, say, when generating 387 code, is nonsensical. See the description of the situation that motivated Joseph Myers to fix the situation for C. If g++ does not implement -fexcess-precision=standard, it probably means that 80-bit computations are randomly rounded to the precision of the type when the compiler happened to have to spill some floating-point registers to memory, leading the program below to print "foo" in some circumstances outside the programmer's control:
if (x == 0.0) return;
... // code that does not modify x
if (x == 0.0) printf("foo\n");
… because the code in the ellipsis caused x, which was held in an 80-bit floating-point register, to be spilt to a 64-bit slot on the stack.
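In the absence of -fexcess-precision=standard for C++, a common workaround (and presumably the sort of thing the INEXACT macro in the question is meant to mark) is to force a store to a double, for example through a volatile object, so that the value is rounded to 64 bits at that exact point. A minimal sketch of my own, not something taken from GCC's documentation:

#include <stdio.h>

/* Forcing a value through a volatile double makes the compiler actually store
   and reload it, which rounds any 80-bit intermediate to 64-bit double here. */
static double force_round(double x)
{
    volatile double v = x;
    return v;
}

int main(void)
{
    double a = 1.0 / 3.0;
    double b = force_round(a * 3.0); /* b is rounded to double at this point */
    printf("%.17g\n", b);
    return 0;
}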
But what is the g++ default behavior in absence of any switch?
I found one answer myself via an experiment, using the following code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
    double a = atof("1.2345678");
    double b = a*a;
    printf("%.20e\n", b - 1.52415765279683990130);
    return 0;
}
If b is rounded (-fexcess-precision=standard), then the result is zero. Otherwise (-fexcess-precision=fast) it is something like 8e-17. Compiling with -mfpmath=387 -O3, I could reproduce both cases for gcc-4.8.2. For g++-4.8.2 I get an error for -fexcess-precision=standard if I try that, and without a flag I get the same behavior as -fexcess-precision=fast gives for C. Adding -std=c++11 does not help. So now the suspicion already voiced by Pascal is official: g++ does not necessarily round everywhere it should.

Can I guarantee the C++ compiler will not reorder my calculations?

I'm currently reading through the excellent Library for Double-Double and Quad-Double Arithmetic paper, and in the first few lines I notice they perform a sum in the following way:
#include <utility> // for std::pair and std::make_pair

std::pair<double, double> TwoSum(double a, double b)
{
    double s = a + b;
    double v = s - a;
    double e = (a - (s - v)) + (b - v);
    return std::make_pair(s, e);
}
The calculation of the error, e, relies on the fact that the calculation follows that order of operations exactly because of the non-associative properties of IEEE-754 floating point math.
If I compile this within a modern optimizing C++ compiler (e.g. MSVC or gcc), can I be ensured that the compiler won't optimize out the way this calculation is done?
Secondly, is this guaranteed anywhere within the C++ standard?
You might like to look at the g++ manual page: http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html#Optimize-Options
Particularly -fassociative-math, -ffast-math and -ffloat-store
According to the g++ manual it will not reorder your expression unless you specifically request it.
Yes, that is safe (at least in this case). You only use two kinds of "operator" there: the primary expression (something) and the binary something +/- something (additive).
Section 1.9 Program execution (of C++0x N3092) states:
Operators can be regrouped according to the usual mathematical rules only where the operators really are associative or commutative.
In terms of the grouping, 5.1 Primary expressions states:
A parenthesized expression is a primary expression whose type and value are identical to those of the enclosed expression. ... The parenthesized expression can be used in exactly the same contexts as those where the enclosed expression can be used, and with the same meaning, except as otherwise indicated.
I believe the use of the word "identical" in that quote requires a conforming implementation to guarantee that it will be executed in the specified order unless another order can give the exact same results.
And for adding and subtracting, section 5.7 Additive operators has:
The additive operators + and - group left-to-right.
So the standard dictates the results. If the compiler can ascertain that the same results can be obtained with different ordering of the operations then it may re-arrange them. But whether this happens or not, you will not be able to discern a difference.
This is a very valid concern, because Intel's C++ compiler, which is very widely used, defaults to performing optimizations that can change the result.
See http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/copts/common_options/option_fp_model.htm#option_fp_model
I would be quite surprised if any compiler wrongly assumed associativity of arithmetic operators with default optimising options.
But be wary of extended precision of FP registers.
Consult compiler documentation on how to ensure that FP values do not have extended precision.
If you really need to, I think you can make a noinline function no_reorder(float x) { return x; }, and then use it instead of parentheses; a sketch follows below. Obviously it's not a particularly efficient solution, though.
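Here is one way that suggestion might look, assuming GCC or Clang for the noinline attribute (the name no_reorder is just the one proposed above, not a library facility): the opaque, non-inlined identity call acts as a barrier the optimizer cannot reassociate across.

#include <cstdio>
#include <utility>

// Opaque identity function: because it is never inlined, the optimizer cannot
// fold or reassociate the expressions that pass through it.
__attribute__((noinline)) double no_reorder(double x) { return x; }

std::pair<double, double> TwoSum(double a, double b)
{
    double s = no_reorder(a + b);
    double v = no_reorder(s - a);
    double e = (a - no_reorder(s - v)) + (b - v);
    return std::make_pair(s, e);
}

int main()
{
    std::pair<double, double> r = TwoSum(1e16, 1.0);
    std::printf("s = %.17g, e = %.17g\n", r.first, r.second);
    return 0;
}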
In general, you should be able to -- the optimizer should be aware of the properties of the real operations.
That said, I'd test the hell out of the compiler I was using.
Yes. The compiler will not change the order of your calculations within a block like that.
Between compiler optimizations and out-of-order execution on the processor, it is almost a guarantee that things will not happen exactly as you ordered them.
HOWEVER, it is also guaranteed that this will NEVER change the result. C++ follows standard order of operations and all optimizations preserve this behavior.
Bottom line: Don't worry about it. Write your C++ code to be mathematically correct and trust the compiler. If anything goes wrong, the problem was almost certainly not the compiler.
As per the other answers, you should be able to rely on the compiler doing the right thing -- most compilers allow you to compile and inspect the assembler (use -S for gcc) -- and you may want to do that to make sure you get the order of operations you expect.
Different optimization levels (in gcc: -O, -O2, etc.) allow code to be rearranged (although sequential code like this is unlikely to be affected) -- but I would suggest you isolate that particular part of the code into a separate file, so that you can control the optimization level for just that calculation.
The short answer is: the compiler will probably change the order of your calculations, but it will never change the behavior of your program (unless your code makes use of expressions with undefined behavior: http://blog.regehr.org/archives/213).
However, you can still influence this behavior by deactivating all compiler optimizations (option "-O0" with gcc). If you still need the compiler to optimize the rest of your code, you may put this function in a separate ".c" file which you can compile with "-O0".
Additionally, you can use some hacks. For instance, if you interleave your code with calls to external functions, the compiler may consider it unsafe to reorder your code, as the functions may have unknown side effects. Calling "printf" to print the values of your intermediate results will lead to similar behavior.
Anyway, unless you have a very good reason (e.g. debugging), you typically don't want to care about that, and you should trust the compiler.