I got the following discrepancy between compiling as 'C' vs. compiling as 'C++'
struct A {
int x;
int y;
};
struct A get() {
return {0};
}
When compiling as 'C++' everything goes fine.
However, when compiling as 'C'; i am getting:
error : expected expression
which i can fix by doing:
return (struct A){0};
However, i wonder where the difference comes from. Can any one point in the language reference where this difference comes from?
The two use completely different mechanisms, one of which is C++11-specific, the other of which is C99-specific.
The first bit,
struct A get() {
return {0};
}
depends on [stmt.return] (6.6.3 (2)) in C++11, which says
(...) A return statement with a braced-init-list initializes the object or reference to be returned from the function by copy-list-initialization from the specified initializer list. [ Example:
std::pair<std::string,int> f(const char *p, int x) {
return {p,x};
}
-- end example ]
This passage does not exist in C (nor C++ before C++11), so the C compiler cannot handle it.
On the other hand,
struct A get() {
return (struct A){0};
}
uses a C99 feature called "compound literals" that does not exist in C++ (although some C++ compilers, notably gcc, provide it as a language extension; gcc warns about it with -pedantic). The semantics are described in detail in section 6.5.2.5 of the C99 standard; the money quote is
4 A postfix expression that consists of a parenthesized type name followed by a brace-enclosed list of initializers is a compound literal. It provides an unnamed object whose value is given by the initializer list. (footnote 80)
80) Note that this differs from a cast expression. For example, a cast specifies a conversion to scalar types or void only, and the result of a cast expression is not an lvalue.
So in this case (struct A){0} is an unnamed object that is copied into the return value and returned. (note that modern compilers will elide this copy, so you do not have to fear runtime overhead from it)
And there you have it, chapters and verses. Why these features exist the way they do in their respective languages may prove to be a fascinating discussion, but I fear that it is difficult for anyone outside the respective standardization committees to give an authoritative answer to the question. Both features were introduced after C and C++ split ways, and they're not developed side by side (nor would it make sense to do so). Divergence is inevitable even in small things.
Related
Suppose I'm using some C library which has a function:
int foo(char* str);
and I know for a fact that foo() does not modify the memory pointed to by str. It's just poorly written and doesn't bother to declare str being constant.
Now, in my C++ code, I currently have:
extern "C" int foo(char* str);
and I use it like so:
foo(const_cast<char*>("Hello world"));
My question: Is it safe - in principle, from a language-lawyering perspective, and in practice - for me to write:
extern "C" int foo(const char* str);
and skip the const_cast'ing?
If it is not safe, please explain why.
Note: I am specifically interested in the case of C++98 code (yes, woe is me), so if you're assuming a later version of the language standard, please say so.
Is it safe for me to write: and skip the const_cast'ing?
No.
If it is not safe, please explain why.
-- From language side:
After reading the dcl.link I think exactly how the interoperability works between C and C++ is not exactly specified, with many "no diagnostic required" cases. The most important part is:
Two declarations for a function with C language linkage with the same function name (ignoring the namespace names that qualify it) that appear in different namespace scopes refer to the same function.
Because they refer to the same function, I believe a sane assumption would be that the declaration of a identifier with C language linkage on C++ side has to be compatible with the declaration of that symbol on C side. In C++ there is no concept of "compatible types", in C++ two declarations have to be identical (after transformations), making the restriction actually more strict.
From C++ side, we read c++draft basic#link-11:
After all adjustments of types (during which typedefs are replaced by their definitions), the types specified by all declarations referring to a given variable or function shall be identical, [...]
Because the declaration int foo(const char *str) with C language linkage in a C++ translation unit is not identical to the declaration int foo(char *str) declared in C translation unit (thus it has C language linkage), the behavior is undefined (with famous "no diagnostic required").
From C side (I think this is not even needed - the C++ side is enough to make the program have undefined behavior. anyway), the most important part would be C99 6.7.5.3p15:
For two function types to be compatible, both shall specify compatible return types. Moreover, the parameter type lists, if both are present, shall agree in the number of parameters and in use of the ellipsis terminator; corresponding parameters shall have compatible types [...]
Because from C99 6.7.5.1p2:
For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.
and C99 6.7.3p9:
For two qualified types to be compatible, both shall have the identically qualified version of a compatible type [...]
So because char is not compatible with const char, thus const char * is not compatible with char *, thus int foo(const char *) is not compatible with int foo(char*). Calling such a function (C99 6.5.2.2p9) would be undefined behavior (you may see also C99 J.2)
-- From practical side:
I do not believe will be able to find a compiler+architecture combination where one translation unit sees int foo(const char *) and the other translation unit defines a function int foo(char *) { /* some stuff */ } and it would "not work".
Theoretically, an insane implementation may use a different register to pass a const char* argument and a different one to pass a char* argument, which I hope would be well documented in that insane architecture ABI and compiler. If that's so, wrong registers will be used for parameters, it will "not work".
Still, using a simple wrapper costs nothing:
static inline int foo2(const char *var) {
return foo(static_cast<char*>(var));
}
I think the base answer is:
Yes, you can cast off const even if the referenced object is itself const such as a string literal in the example.
Undefined behaviour is only specified to arise in the event of an attempt to modify the const object not as a result of the cast.
Those rules and their reason to exist is 'old'. I'm sure they predate C++98.
Contrast it with volatile where any attempt to access a volatile object through a non-volatile reference is undefined behaviour. I can only read 'access' as read and/or write here.
I won't repeat the other suggestions but here is the most paranoid solution.
It's paranoid not because the C++ semantics aren't clear. They are clear. At least if you accept something being undefined behaviour is clear!
But you've described it as 'poorly written' and you want to put some sandbags round it!
The paranoid solution relies on the fact that if you are passing a constant object it will be constant for the whole execution (if the program doesn't risk UB).
So make a single copy of "hello world" lower in the call-stack or even initialised as a file scope object. You can declare it static in a function and it will (with minimal overhead) only be constructed once.
This recovers almost all of the benefits of string literal. The lower down the call stack including file-scope (global you put it the better.
I don't know how long the lifetime of the pointed-to object passed to foo() needs to be.
So it needs to be at least low enough in the chain to satisfy that condition.
NB: C++98 has std::string but it won't quite do here because you're still forbidden for modifying the result of c_str().
Here the semantics are defined.
#include <cstring>
#include <iostream>
class pseudo_const{
public:
pseudo_const(const char*const cstr): str(NULL){
const size_t sz=strlen(cstr)+1;
str=new char[sz];
memcpy(str,cstr,sz);
}
//Returns a pointer to a life-time permanent copy of
//the string passed to the constructor.
//Modifying the string through this value will be reflected in all
// subsequent calls.
char* get_constlike() const {
return str;
}
~pseudo_const(){
delete [] str;
}
private:
char* str;
};
const pseudo_const str("hello world");
int main() {
std::cout << str.get_constlike() << std::endl;
return 0;
}
The following C++ program compiles just fine (g++ 5.4 at least gives a warning when invoked with -Wall):
int main(int argc, char *argv[])
{
int i = i; // !
return 0;
}
Even something like
int& p = p;
is swallowed by the compiler.
Now my question is: Why is such an initialization legal? Is there any actual use-case or is it just a consequence of the general design of the language?
This is a side effect of the rule that a name is in scope immediately after it is declared. There's no need to complicate this simple rule just to prevent writing code that's obvious nonsense.
Just because the compiler accepts it (syntactically valid code) does not mean that it has well defined behaviour.
The compiler is not required to diagnose all cases of Undefined Behaviour or other classes of problems.
The standard gives it pretty free hands to accept and translate broken code, on the assumption that if the results were to be undefined or nonsensical the programmer would not have written that code.
So; the absense of warnings or errors from your compiler does not in any way prove that your program has well defined behaviour.
It is your responsibility to follow the rules of the language.
The compiler usually tries to help you by pointing out obvious flaws, but in the end it's on you to make sure your program makes sense.
And something like int i = i; does not make sense but is syntactically correct, so the compiler may or may not warn you, but in any case is within its rights to just generate garbage (and not tell you about it) because you broke the rules and invoked Undefined Behaviour.
I guess the gist of your question is about why the second identifier is recognized as identifying the same object as the first, in int i = i; or int &p = p;
This is defined in [basic.scope.pdecl]/1 of the C++14 standard:
The point of declaration for a name is immediately after its complete declarator and before its initializer (if any), except as noted below. [Example:
unsigned char x = 12;
{ unsigned char x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
The semantics of these statements are covered by other threads:
Is int x = x; UB?
Why can a Class& be initialized to itself?
Note - the quoted example differs in semantics from int i = i; because it is not UB to evaluate an uninitialized unsigned char, but is UB to evaluate an uninitialized int.
As noted on the linked thread, g++ and clang can give warnings when they detect this.
Regarding rationale for the scope rule: I don't know for sure, but the scope rule existed in C so perhaps it just made its way into C++ and now it would be confusing to change it.
If we did say that the declared variable is not in scope for its initializer, then int i = i; might make the second i find an i from an outer scope, which would also be confusing.
Where in the standard are functions returning functions disallowed? I understand they are conceptually ridiculous, but it seems to me that the grammar would allow them. According to this webpage, a "noptr-declarator [is] any valid declarator" which would include the declarator of a function:
int f()();
Regarding the syntax.
It seems to me that the syntax, as spelled out in [dcl.decl], allows
int f(char)(double)
which could be interpreted as the function f that takes a char and returns a function with same signature as int g(double).
1 declarator:
2 ptr-declarator
3 noptr-declarator parameters-and-qualifiers trailing-return-type
4 ptr-declarator:
5 noptr-declarator
6 ptr-operator ptr-declarator
7 noptr-declarator:
8 declarator-id attribute-specifier-seq opt
9 noptr-declarator parameters-and-qualifiers
10 noptr-declarator [ constant-expression opt ] attribute-specifier-seq opt
11 ( ptr-declarator )
12 parameters-and-qualifiers:
13 ( parameter-declaration-clause ) cv-qualifier-seqAfter
Roughly speaking, after
1->2, 2=4, 4->6, 4->6
you should have
ptr-operator ptr-operator ptr-operator
Then, use 4->5, 5=7, 7->8 for the first declarator; use 4->5, 5=7, 7->9 for the second and third declarators.
From [dcl.fct], pretty explicitly:
Functions shall not have a return type of type array or function, although they may have a return type of
type pointer or reference to such things. There shall be no arrays of functions, although there can be arrays
of pointers to functions.
With C++11, you probably just want:
std::function<int()> f();
std::function<int(double)> f(char);
There is some confusion regarding the C++ grammar. The statement int f(char)(double); can be parsed according to the grammar. Here is a parse tree:
Furthermore such a parse is even meaningful based on [dcl.fct]/1:
In a declaration T D where D has the form
D1 ( parameter-declaration-clause ) cv-qualifier-seqopt
ref-qualifieropt exception-specificationopt attribute-specifier-seqopt
and the type of the contained declarator-id in the declaration T D1 is “derived-declarator-type-list T”, the
type of the declarator-id in D is “derived-declarator-type-list function of (parameter-declaration-clause ) cv-qualifier-seqopt
ref-qualifieropt returning T”.
In this example T == int, D == f(char)(double), D1 == f(char). The type of the declarator-id in T D1 (int f(char)) is "function of (char) returning int". So derived-declarator-type-list is "function of (char) returning". Thus, the type of f would be read as "function of (char) returning function of (double) returning int."
It's ultimately much ado about nothing, as this is an explicitly disallowed declarator form. But not by the grammar.
With C++11 (but not previous versions of C++) you can not only return C-like function pointers, but also C++ closures, notably with anonymous functions. See also std::function
The standard disallows (semantically, not syntactically - so it is not a question of grammar ; see Barry's answer for the citation) returning functions (and also disallow sizeof on functions!) but permits to return function pointers.
BTW, I don't think that you could return entire functions. What would that mean? How would you implement that? Practically speaking, a function is some code block, and its name is (like for arrays) a pointer to the start of the function's machine code.
A nice trick might be to build (using mechanisms outside of the C++ standard) a function at runtime (and then handling its function pointer). Some external libraries might permit that: you could use a JIT library (e.g. asmjit, gccjit, LLVM ...) or simply generate C++ code, then compile and dlopen & dlsym it on POSIX systems, etc.
PS. You are probably right in understanding that the C++11 grammar (the EBNF rules in the standard) does not disallow returning functions. It is a semantic rule stated in plain English which disallows that (it is not any grammar rule). I mean that the EBNF alone would allow:
// semantically wrong... but perhaps not syntactically
typedef int sigfun_T(std::string);
sigfun_T foobar(int);
and it is for semantics reasons (not because of EBNF rules) that a compiler is rightly rejecting the above code. Practically speaking, the symbol table matters a lot to the C++ compiler (and it is not syntax or context-free grammar).
The sad fact about C++ is that (for legacy reasons) its grammar (alone) is very ambiguous. Hence C++11 is difficult to read (for humans), difficult to write (for developers), difficult to parse (for compilers), ....
using namespace std;
auto hello()
{
char name[] = "asdf";
return [&]() -> void
{
printf("%s\n", name);
};
}
int main()
{
hello()();
auto r = hello();
r();
return 0;
}
Actually in C one cannot pass or return function. Only a pointer/address of the function can be passed/returned, which conceptually is pretty close. To be honest, thanks to possiblity of ommiting & and * with function pointers one shouldn't really care if function or pointer is passed (unless it contains any static data). Here is simple function pointer declaration:
void (*foo)();
foo is pointer to function returning void and taking no arguments.
In C++ it is not that much different. One can still use C-style function pointers or new useful std::function object for all callable creations. C++ also adds lambda expressions which are inline functions which work somehow similar to closures in functional languages. It allows you not to make all passed functions global.
And finally - returning function pointer or std::function may seem ridiculous, but it really is not. For example, the state machine pattern is (in most cases) based on returning pointer to function handling next state or callable next state object.
The formal grammar of C actually disallows returning functions but, one can always return a function pointer which, for all intents and purposes, seems like what you want to do:
int (*getFunc())(int, int) { … }
I am fixated in saying, the grammar, is a more fundamental explanation for the lack of support of such feature. The standard's rules are a latter concern.
If the grammar does not provide for a way to accomplish something, I don't think it matters what the semantics or the standard says for any given context-free language.
Unless you are returning a pointer or reference to a function which is ok, the alternative is returning a copy of a function. Now, think about what a copy of a function looks like, acts like, behaves like. That, first of all would be an array of bytes, which isn't allowed either, and second of all those bytes would be the equivalence of a piece of code literally returning a piece of code....nearly all heuristic virus scanners would consider that a virus because there would also be no way to verify the viability of the code being returned by the runtime system or even at compile time. Even if you could return an array, how would you return a function? The primary issue with returning an array (which would be a copy on the stack) is that the size is not known and so there's no way to remove it from the stack, and the same dilemma exists for functions (where the array would be the machine language binary code). Also, if you did return a function in that fashion, how would you turn around and call that function?
To sum up, the notion of returning a function rather than a point to a function fails because the notion of that is a unknown size array of machine code placed (copied) onto the stack. It's not something the C or C++ was designed to allow, and with code there is now way to turn around and call that function, especially i you wanted to pass arguments.
I hope this makes sense
in kinda sense,function_pointer is funciton its self,
and "trailing return type" is a good and hiding thing in c++11,i will rather write like this:
#include<iostream>
auto return0(void)->int
{return 0;}
auto returnf(void)->decltype(&return0)
{return &return0;}
/*Edit: or...
auto returnf(void)->auto (*)(void)->int
{return &return0;}
//that star means function pointer
*/
auto main(void)->int
{
std::cout<<returnf()();
return 0;
}
(if type unmatch,try apply address operator "&")
look,so much of sense.
but down side of this is that "auto" thing in head of function,
that`s non-removable and no type could match it (even lambda could match template type std::function<>)
but if you wish,macro could do magic(sometime curse) to you
#define func auto
Consider the following code snippet:
union
{
int a;
float b;
};
a = /* ... */;
b = a; // is this UB?
b = b + something;
Is the assignment of one union member to another valid?
Unfortunately I believe the answer to this question is that this operation on unions is under specified in C++, although self assignment is perfectly ok.
Self assignment is well defined behavior, if we look at the draft C++ standard section 1.9 Program execution paragraph 15 has the following examples:
void f(int, int);
void g(int i, int* v) {
i = v[i++]; // the behavior is undefined
i = 7, i++, i++; // i becomes 9
i = i++ + 1; // the behavior is undefined
i = i + 1; // the value of i is incremented
f(i = -1, i = -1); // the behavior is undefined
}
and self assignment is covered in the i = i + 1 example.
The problem here is that unlike C89 forward which supports type-punning in C++ it is not clear. We only know that:
In a union, at most one of the non-static data members can be active at any time
but as this discussion in the WG21 UB study group mailing list shows this concept is not well understood, we have the following comments:
While the standard uses the term "active field", it does not define it
and points out this non-normative note:
Note: In general, one must use explicit destructor calls and placement new operators to change the active member of a union. — end note
so we have to wonder whether:
b = a;
makes b the active member or not? I don't know and I don't see a way to prove it with the any of the current versions of the draft standard.
Although in all practicality most modern compilers for example gcc supports type-punning in C++, which means that the whole concept of the active member is bypassed.
I would expect that unless the source and destination variables are the same type, such a thing would be Undefined Behavior in C, and I see no reason to expect C++ to handle it any differently. Given long long *x,*y;, some compilers might process a statement like *x = *y >>8; by generating code to read all of *y, compute the result, and store it to *x, but a compiler might perfectly legitimately write code that copied parts of *y to *x individually. The standard makes clear that if *x and *y are pointers to the same object of the same type, the compiler must ensure that no part of the value gets overwritten while that part is still needed in the computation, but compilers are not required to deal with aliasing in other situations.
struct A {
static const int a = 5;
struct B {
static const int b = a;
};
};
int main() {
return A::B::b;
}
The above code compiles. However if you go by Effective C++ book by Scott Myers(pg 14);
We need a definition for a in addition to the declaration.
Can anyone explain why this is an exception?
C++ compilers allow static const integers (and integers only) to have their value specified at the location they are declared. This is because the variable is essentially not needed, and lives only in the code (it is typically compiled out).
Other variable types (such as static const char*) cannot typically be defined where they are declared, and require a separate definition.
For a tiny bit more explanation, realize that accessing a global variable typically requires making an address reference in the lower-level code. But your global variable is an integer whose size is this typically around the size of an address, and the compiler realizes it will never change, so why bother adding the pointer abstraction?
By really pedantic rules, yes, your code needs a definition for that static integer.
But by practical rules, and what all compilers implement because that's how the rules of C++03 are intended - no, you don't need a definition.
The rules for such static constant integers are intended to allow you to omit the definition if the integer is used only in such situations where a value is immediately read, and if the static member can be used in constant expressions.
In your return statement, the value of the member is immediately read, so you can omit the definition of the static constant integer member if that's the only use of it. The following situation needs a definition, however:
struct A {
static const int a = 5;
struct B {
static const int b = a;
};
};
int main() {
int *p = &A::B::b;
}
No value is read here - but instead the address of it is taken. Therefore, the intent of the C++03 Standard is that you have to provide a definition for the member like the following in some implementation file.
const int A::B::b;
Note that the actual rules appearing in the C++03 Standard says that a definition is not required only where the variable is used where a constant expression is required. That rule, however, if strictly applied, is too strict. It would only allow you to omit a definition for situation like array-dimensions - but would require a definition in cases like a return statement. The corresponding defect report is here.
The wording of C++0x has been updated to include that defect report resolution, and to allow your code as written.
However, if you try the ternary operand without "defining" static consts, you get a linker error in GCC 4x:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13795
So, although constructs like int k = A::CONSTVAL; are illegal in the current standard, they are supported. But the ternary operand is not. Some operators are more equal than others, if you get my drift :)
So much for "lax" rules. I suggest you write code conforming to the standard if you do not want surprises.
In general, most (and recent) C++ compilers allow static const ints
You just lucky, perhaps not. Try older compiler, such as gcc 2.0 and it will vehemently punish you with-less-than-pretty error message.