Is std::cout of a char reference defined by the standard? - c++

I have read this post and the answers indicate a behavior described in a paragraph below. I am not trying to make it work on my machine, or find a workaround to make it work on my machine, it is a question of is it defined behavior according to the standard.
Consider the following code which creates an int variable, an int-reference variable, and prints out the result of calling the address-operator on the int-reference variable
#include <iostream>
int main() {
int a = 70;
int& b = a;
std::cout << &b << std::endl;
return 0;
}
It prints out what I would expect, which is an address in memory, i.e., the address of int variable a.
But now I change int to char, or unsigned char, or signed char, and both on Xcode (Version 6.4) and Visual Studio (VS 2013 Ultimate) I get unexpected behavior.
#include <iostream>
int main() {
// or unsigned char or signed char, same weird behavior
char a = 70;
char& b = a;
std::cout << &b << std::endl;
return 0;
}
In Xcode, the console prints something like F\330\367\277_\377 . I get that F is the ASCII code for 70, but I do not understand the rest of it. I assume it is also a set of ASCII characters, since on Visual Studio it prints out the F followed by some weird characters.
I tried other integer types and it worked fine. And I know that often char/signed char/unsigned char or some combination of them are implemented as the same type. The only thing I can think of is that the reference type is being implemented as a pointer type and then interpreting the call to &b as returning a pointer type, and then std::cout is taking its input to mean to print out all characters in a char array.
Is this defined behavior?
To reiterate: my question is more specifically, is this a defined behavior which is part of the standard, is this behavior not defined by the standard, is this a non-standard implementation of the compilers? Something else?

Related

Weird type assignment for array index in C++

Hi I have a sample program
#include <iostream>
int main() {
int a = -5;
int arr[a];
std::cout << "Size of arr: " << sizeof(arr) << std::endl;
return 0;
}
Here I am getting the output of 17179869164.
My question is that the array size value should not accept negative values! and if I try giving a[-5], it throws an error. but now how am I getting the output of 17179869164.
I have my assumption too, the -5 is converted to an unsigned int value of 4294967291 and the total size is given as 17179869164 = 4294967291 * 4(size of int).
So I wanted to know why the compiler is typecasting signed int to unsigned int and not throwing a compile-time error. I needed a clear understanding of how the compiler is executing that piece of code?
It is something called undefined behavior. To catch that kind of bug you could use the help of a static analyser.
Someone else asked something similar here:
Declaring an array of negative length
For C++, variable length arrays are not provided by the standard, but may be provided as a compiler extension. For C, the short answer is that the standard requires the size to be greater than zero whenever it is not an integer constant expression, so a negative size is undefined behavior. In your case the value ends up being used as if it were unsigned (the two's complement bit pattern read as a positive value). Specifically:
C11 Standard - 6.7.6.2 Array declarators (p5)
If the size is an expression that is not an integer constant
expression: if it occurs in a declaration at function prototype scope,
it is treated as if it were replaced by *; otherwise, each time it is
evaluated it shall have a value greater than zero.
I noticed some interesting behavior in GodBolt.
I took your code and added a second copy where a is declared constant:
#include <iostream>
int foo() {
int a = -5;
int arr[a];
std::cout << "Size of arr: " << sizeof(arr) << std::endl;
return 0;
}
int bar() {
const int a = -5;
int arr[a];
std::cout << "Size of arr: " << sizeof(arr) << std::endl;
return 0;
}
Then I threw GCC, Clang, and MSVC at them.
As far as I know, GCC and Clang both support variable length arrays (VLAs) as an "extra feature", and they both ate foo without a single complaint, whereas MSVC, which does not support VLAs, complained.
On the other hand, none of them accepted bar on account of a being negative.
As for why GCC and Clang can't tell that a is negative in foo, that I will leave as a question for people more versed in compiler guts than I.

Is printing of a member pointer to an int defined

Suppose I have this code:
#include <iostream>
struct Mine
{
int a;
int b;
};
int main()
{
int Mine::* memberPointerA = &Mine::a;
int Mine::* memberPointerB = &Mine::b;
std::cout << memberPointerA;
std::cout << "\n";
std::cout << memberPointerB;
}
When I run this with Microsoft Visual C++ (2015)
I get the following output
1
1
The output I expect is something more like this:
1
2
So this raises the question: Is this printing of a member pointer defined behavior?
There's a defined conversion from pointer to bool. Since the member variable pointers are not NULL, they evaluate as true and print as 1.
The key issue at hand is that a pointer-to-member cannot be converted to void*, which is what the overload that usually handles printing pointers takes.
Thus, the next best conversion is used, which is the conversion pointer->bool. Both pointers are not null pointers, thus you get the output you see.
If you try printing "normal" pointers (as opposed to pointers to member), you would get output along the lines of what you expected initially.

Cast an object value without pointers

Let's assume that A and B are two classes (or structures) having no inheritance relationships (thus, object slicing cannot work). I also have an object b of the type B. I would like to interpret its binary value as a value of type A:
A a = b;
I could use reinterpret_cast, but I would need to use pointers:
A a = reinterpret_cast<A>(b); // error: invalid cast
A a = *reinterpret_cast<A *>(&b); // correct [EDIT: see *footnote]
Is there a more compact way (without pointers) that does the same? (Including the case where sizeof(A) != sizeof(B))
Example of code that works using pointers: [EDIT: see *footnote]
#include <iostream>
using namespace std;
struct C {
int i;
string s;
};
struct S {
unsigned char data[sizeof(C)];
};
int main() {
C c;
c.i = 4;
c.s = "this is a string";
S s = *reinterpret_cast<S *>(&c);
C s1 = *reinterpret_cast<C *>(&s);
cout << s1.i << " " << s1.s << endl;
cout << reinterpret_cast<C *>(&s)->i << endl;
return 0;
}
*footnote: It worked when I tried it, but it is actually an undefined behavior (which means that it may work or not) - see comments below
No. I think there's nothing in the C++ syntax that allows you to implicitly ignore types. First, that's against the notion of static typing. Second, C++ lacks standardization at binary level. So, whatever you do to trick the compiler about the types you're using might be specific to a compiler implementation.
That being said, if you really wanna do it, you should check how your compiler's data alignment/padding works (i.e.: struct padding in c++) and if there's a way to control it (i.e.: What is the meaning of "__attribute__((packed, aligned(4))) "). If you're planning to do this across compilers (i.e.: with data transmitted across the network), then you should be extra careful. There are also platform issues, like different addressing models and endianness.
Yes, you can do it without a pointer:
A a = reinterpret_cast<A &>(b); // note the '&'
Note that this may be undefined behaviour. Check out the exact conditions at http://en.cppreference.com/w/cpp/language/reinterpret_cast

const char* to int cast?

I suppose the behaviour of the following snippet is supposed to be undefined but I just wanted to make sure I am understanding things right.
Let's say we have this code:
#include <iostream>
int main()
{
std::cout << "mamut" - 8 << std::endl;
return 0;
}
So what I think this does is (char*)((int)(const char*) - (int)), though the output after this is pretty strange, not that I expect it to make any real sense. So my question is about the casting between char* and int - is it undefined, or is there some logic behind it?
EDIT:
Let me just add this:
#include <iostream>
int main ()
{
const char* a = "mamut";
int b = int(a);
std::cout << b << std::endl;
std::cout << &a <<std::endl;
// seems b!= &a
for( int i = 0; i<100;i++)
{
std::cout<<(const char*)((int)a - i)<<std::endl;
}
return 0;
}
The output after i gets big enough gives me something like _Jv_RegisterClasses etc.
Just for the record:
std::cout << a - i << std::endl;
produces the same result as:
std::cout<<(const char*)((int)a - i)<<std::endl;
There is no cast, you are merely telling cout that you want to print the string at the address of the string literal "mamut" minus 8 bytes. You are doing pointer arithmetic. cout will then print whatever happens to be at that address, or possibly crash & burn, since accessing arrays out of bounds leads to undefined behavior.
EDIT
Regarding the edit by the op: converting an address to int doesn't necessarily result in a correct number identical to the address. An address doesn't necessarily fit in an int and on top of that, int is a signed type and it doesn't make any sense to store addresses in signed types.
To guarantee a conversion from pointer to integer without losses, you need to use uintptr_t from stdint.h.
To quote the C standard 6.3.2.3 (I believe C++ is identical in this case):
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
There is no casting going on. "mamut" is a pointer to characters, and - 8 will do pointer arithmetic on it. You are right that it's undefined behavior, so even though the semantic behavior is pointer arithmetic, the runtime behavior can be literally anything.
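You are printing the string that starts at the address of "mamut" minus 8 bytes and runs up to the first null terminator, which would be 8 + 5 = 13 chars if none of the 8 preceding bytes happens to be zero. Since that address lies outside the string literal, the behavior is undefined and the actual output can be anything.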

Examples of code that compiles but executes differently in C versus C++ [closed]

C and C++ have many differences, and not all valid C code is valid C++ code.
(By "valid" I mean standard code with defined behavior, i.e. not implementation-specific/undefined/etc.)
Is there any scenario in which a piece of code valid in both C and C++ would produce different behavior when compiled with a standard compiler in each language?
To make it a reasonable/useful comparison (I'm trying to learn something practically useful, not to try to find obvious loopholes in the question), let's assume:
Nothing preprocessor-related (which means no hacks with #ifdef __cplusplus, pragmas, etc.)
Anything implementation-defined is the same in both languages (e.g. numeric limits, etc.)
We're comparing reasonably recent versions of each standard (e.g. say, C++98 and C90 or later)
If the versions matter, then please mention which versions of each produce different behavior.
Here is an example that takes advantage of the difference between function calls and object declarations in C and C++, as well as the fact that C90 allows the calling of undeclared functions:
#include <stdio.h>
struct f { int x; };
int main() {
f();
}
int f() {
return printf("hello");
}
In C++ this will print nothing because a temporary f is created and destroyed, but in C90 it will print hello because functions can be called without having been declared.
In case you were wondering about the name f being used twice, the C and C++ standards explicitly allow this, and to make an object you have to say struct f to disambiguate if you want the structure, or leave off struct if you want the function.
For C++ vs. C90, there's at least one way to get different behavior that's not implementation defined. C90 doesn't have single-line comments. With a little care, we can use that to create an expression with entirely different results in C90 and in C++.
int a = 10 //* comment */ 2
+ 3;
In C++, everything from the // to the end of the line is a comment, so this works out as:
int a = 10 + 3;
Since C90 doesn't have single-line comments, only the /* comment */ is a comment. The first / and the 2 are both parts of the initialization, so it comes out to:
int a = 10 / 2 + 3;
So, a correct C++ compiler will give 13, but a strictly correct C90 compiler 8. Of course, I just picked arbitrary numbers here -- you can use other numbers as you see fit.
The following, valid in C and C++, is going to (most likely) result in different values in i in C and C++:
int i = sizeof('a');
See Size of character ('a') in C/C++ for an explanation of the difference.
Another one from this article:
#include <stdio.h>
int sz = 80;
int main(void)
{
struct sz { char c; };
int val = sizeof(sz); // sizeof(int) in C,
// sizeof(struct sz) in C++
printf("%d\n", val);
return 0;
}
C90 vs. C++11 (int vs. double):
#include <stdio.h>
int main()
{
auto j = 1.5;
printf("%d", (int)sizeof(j));
return 0;
}
In C auto means local variable. In C90 it's ok to omit variable or function type. It defaults to int. In C++11 auto means something completely different, it tells the compiler to infer the type of the variable from the value used to initialize it.
Another example that I haven't seen mentioned yet, this one highlighting a preprocessor difference:
#include <stdio.h>
int main()
{
#if true
printf("true!\n");
#else
printf("false!\n");
#endif
return 0;
}
This prints "false!" in C and "true!" in C++. In a C #if expression, any identifier that is not a defined macro evaluates to 0, and true is not a macro unless <stdbool.h> is included. In C++ there is one exception: true evaluates to 1.
Per C++11 standard:
a. The comma operator performs lvalue-to-rvalue conversion in C but not C++:
char arr[100];
int s = sizeof(0, arr); // The comma operator is used.
In C++ the value of this expression will be 100 and in C this will be sizeof(char*).
b. In C++ the type of enumerator is its enum. In C the type of enumerator is int.
enum E { a, b, c };
sizeof(a) == sizeof(int); // In C
sizeof(a) == sizeof(E); // In C++
This means that sizeof(int) may not be equal to sizeof(E).
c. In C++ a function declared with an empty params list takes no arguments. In C an empty params list means that the number and type of function params are unknown.
int f(); // int f(void) in C++
// int f(*unknown*) in C
This program prints 1 in C++ and 0 in C:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int d = (int)(abs(0.6) + 0.5);
printf("%d", d);
return 0;
}
This happens because there is double abs(double) overload in C++, so abs(0.6) returns 0.6 while in C it returns 0 because of implicit double-to-int conversion before invoking int abs(int). In C, you have to use fabs to work with double.
#include <stdio.h>
int main(void)
{
printf("%d\n", (int)sizeof('a'));
return 0;
}
In C, this prints whatever the value of sizeof(int) is on the current system, which is typically 4 in most systems commonly in use today.
In C++, this must print 1.
Another sizeof trap: boolean expressions.
#include <stdio.h>
int main() {
printf("%d\n", (int)sizeof !0);
}
It equals sizeof(int) in C, because the expression is of type int. In C++ the expression has type bool, so it equals sizeof(bool), which is typically 1 (though it's not required to be). In practice they are almost always different.
An old chestnut that depends on the C compiler, not recognizing C++ end-of-line comments...
...
int a = 4 //* */ 2
+2;
printf("%i\n",a);
...
The C++ Programming Language (3rd Edition) gives three examples:
sizeof('a'), as @Adam Rosenfield mentioned;
// comments being used to create hidden code:
int f(int a, int b)
{
return a //* blah */ b
;
}
Structures etc. hiding stuff in outer scopes, as in your example.
Another one listed by the C++ Standard:
#include <stdio.h>
int x[1];
int main(void) {
struct x { int a[2]; };
/* size of the array in C */
/* size of the struct in C++ */
printf("%d\n", (int)sizeof(x));
}
Inline functions in C default to external scope whereas those in C++ do not.
Compiling the following two files together would print "I am inline" in the case of GNU C but nothing for C++.
File 1
#include <stdio.h>
struct fun{};
int main()
{
fun(); // In C, this calls the inline function from file 2, whereas in C++
// this would create a temporary object of type struct fun
return 0;
}
File 2
#include <stdio.h>
inline void fun(void)
{
printf("I am inline\n");
}
Also, C++ implicitly treats any const global as static unless it is explicitly declared extern, unlike C in which extern is the default.
#include <stdio.h>
struct A {
double a[32];
};
int main() {
struct B {
struct A {
short a, b;
} a;
};
printf("%d\n", (int)sizeof(struct A));
return 0;
}
This program prints 32 * sizeof(double) (256 where double is 8 bytes) when compiled using a C++ compiler and 2 * sizeof(short) (typically 4) when compiled using a C compiler.
This is because C does not have the notion of scope resolution. In C structures contained in other structures get put into the scope of the outer structure.
struct abort
{
int x;
};
int main()
{
abort();
return 0;
}
Returns with exit code of 0 in C++, or 3 in C.
This trick could probably be used to do something more interesting, but I couldn't think of a good way of creating a constructor that would be palatable to C. I tried making a similarly boring example with the copy constructor, that would let an argument be passed, albeit in a rather non-portable fashion:
struct exit
{
int x;
};
int main()
{
struct exit code;
code.x=1;
exit(code);
return 0;
}
VC++ 2005 refused to compile that in C++ mode, though, complaining about how "exit code" was redefined. (I think this is a compiler bug, unless I've suddenly forgotten how to program.) It exited with a process exit code of 1 when compiled as C though.
Don't forget the distinction between the C and C++ global namespaces. Suppose you have a foo.cpp
#include <cstdio>
void foo(int r)
{
printf("I am C++\n");
}
and a foo2.c
#include <stdio.h>
void foo(int r)
{
printf("I am C\n");
}
Now suppose you have a main.c and main.cpp which both look like this:
extern void foo(int);
int main(void)
{
foo(1);
return 0;
}
When compiled as C++, it will use the symbol in the C++ global namespace; in C it will use the C one:
$ diff main.cpp main.c
$ gcc -o test main.cpp foo.cpp foo2.c
$ ./test
I am C++
$ gcc -o test main.c foo.cpp foo2.c
$ ./test
I am C
int main(void) {
const int dim = 5;
int array[dim];
}
This is rather peculiar in that it is valid in C++ and in C99, C11, and C17 (though optional in C11, C17); but not valid in C89.
In C99+ it creates a variable-length array, which has its own peculiarities over normal arrays, as it has a runtime type instead of compile-time type, and sizeof array is not an integer constant expression in C. In C++ the type is wholly static.
If you try to add an initializer here:
int main(void) {
const int dim = 5;
int array[dim] = {0};
}
is valid C++ but not C, because variable-length arrays cannot have an initializer.
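Empty structures have size 0 in C (as a GNU extension; standard C does not allow a struct with no members) and a nonzero size, typically 1, in C++: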
#include <stdio.h>
typedef struct {} Foo;
int main()
{
printf("%zu\n", sizeof(Foo));
return 0;
}
This concerns lvalues and rvalues in C and C++.
In the C programming language, both the pre-increment and the post-increment operators return rvalues, not lvalues. This means that they cannot be on the left side of the = assignment operator. Both these statements will give a compiler error in C:
int a = 5;
a++ = 2; /* error: lvalue required as left operand of assignment */
++a = 2; /* error: lvalue required as left operand of assignment */
In C++ however, the pre-increment operator returns an lvalue, while the post-increment operator returns an rvalue. It means that an expression with the pre-increment operator can be placed on the left side of the = assignment operator!
int a = 5;
a++ = 2; // error: lvalue required as left operand of assignment
++a = 2; // No error: a gets assigned to 2!
Now why is this so? The post-increment increments the variable, and it returns the variable as it was before the increment happened. This is actually just an rvalue. The former value of the variable a is copied into a register as a temporary, and then a is incremented. But the former value of a is returned by the expression, it is an rvalue. It no longer represents the current content of the variable.
The pre-increment first increments the variable, and then it returns the variable as it became after the increment happened. In this case, we do not need to store the old value of the variable in a temporary. We just retrieve the new value of the variable after it has been incremented. So the pre-increment returns an lvalue: it returns the variable a itself. We can assign this lvalue to something else, as in the following statements, which involve an implicit conversion of the lvalue into an rvalue.
int x = a;
int x = ++a;
Since the pre-increment returns an lvalue, we can also assign something to it. The following two statements are identical. In the second assignment, first a is incremented, then its new value is overwritten with 2.
int a;
a = 2;
++a = 2; // Valid in C++.