-Werror=format: how can the compiler know - c++

I wrote this intentionally wrong code
printf("%d %d", 1);
compiling with g++ and -Werror=format.
The compiler gives this very impressive warning:
error: format '%d' expects a matching 'int' argument [-Werror=format]
As far as I can see, there's no way the compiler can tell that the code is wrong, because the format string isn't parsed until runtime.
My question: does the compiler have a special feature that kicks in for printf and similar libc functions, or is this a feature I could use for my own functions? String literals?

As far as I can see, there's no way the compiler can tell that the code is wrong, because the format string isn't parsed until runtime.
As long as the format string is a string literal, it can be parsed at compile time. If it isn't (which is usually a bad idea anyway), then you can get a warning about that from -Wformat-security.
does the compiler have a special feature that kicks in for printf and similar libc functions?
Yes.
or is this a feature I could use for my own functions?
Yes, as long as you're using the same style of format string as printf (or various other standard functions like scanf or strftime).
void my_printf(Something, char const * format, SomethingElse, ...)
__attribute__ ((format (printf,2,4)));
to indicate that the second argument is a printf-style format string, and the values to format begin with the fourth. See http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html.

Well, printf definitely parses the format string at runtime in order to do its job. But nowhere is it written that the compiler may not choose to parse it itself if it wants to.
The documentation for -Wformat says that this is exactly what happens:
-Wformat
-Wformat=n
Check calls to printf and scanf, etc., to make sure that the arguments supplied have > types appropriate to the format string
specified, and that the conversions specified in the format string
make sense. This includes standard functions, and others specified by
format attributes (see Function Attributes), in the printf, scanf,
strftime and strfmon (an X/Open extension, not in the C standard)
families (or other target-specific families). Which functions are
checked without format attributes having been specified depends on the
standard version selected, and such checks of functions without the
attribute specified are disabled by -ffreestanding or -fno-builtin.
The formats are checked against the format features supported by GNU libc version 2.2.
These include all ISO C90 and C99 features, as
well as features from the Single Unix Specification and some BSD and
GNU extensions. Other library implementations may not support all
these features; GCC does not support warning about features that go
beyond a particular library's limitations. However, if -Wpedantic is
used with -Wformat, warnings are given about format features not in
the selected standard version (but not for strfmon formats, since
those are not in any version of the C standard). See Options
Controlling C Dialect.
Update: Turns out you can use it on your own functions. Mike has the details.

Related

c16rtomb()/c32rtomb() locale-independent conversion?

C++11 introduced the c16rtomb()/c32rtomb() conversion functions, along with the inverse (mbrtoc16()/mbrtoc32()). c16rtomb() clearly states in the reference documentation here:
The multibyte encoding used by this function is specified by the currently active C locale
The documentation for c32rtomb() states the same. Both the C and C++ versions agree that these are locale-dependent conversions (as they should be, according to the naming convention of the functions themselves).
However, MSVC seems to have taken a different approach and made them locale-independent (not using the current C locale) according to this document. These conversion functions are specified under the heading Locale-independent multibyte routines.
C++20 adds to the confusion by including the c8rtomb()/mbrtoc8() functions, which if locale-independent would basically do nothing, converting UTF-8 input to UTF-8 output.
Two questions arise from this:
Do any other compilers actually follow the standard and implement locale-dependent Unicode multibyte conversion routines? I couldn't find any concrete information after extensive searching.
Is this a bug in MSVC's implementation?

(v) is actually (*&v) since when?

Could C++ standards gurus please enlighten me:
Since which C++ standard version has this statement failed because (v) seems to be equivalent to (*&v)?
I.e. for example the code:
#define DEC(V) ( ((V)>0)? ((V)-=1) : 0 )
...{...
register int v=1;
int r = DEC(v) ;
...}...
This now produces warnings under -std=c++17 like:
cannot take address of register variable
left hand side of operand must be lvalue
Many C macros enclose ALL macro parameters in parentheses, of which the above is meant only to be a representative example.
The actual macros that produce warnings are for instance
the RTA_* macros in /usr/include/linux/rtnetlink.h.
Short of not using/redefining these macros in C++, is there any workaround?
If you look at the revision summary of the latest C++1z draft, you'd see this in [diff.cpp14.dcl.dcl]
[dcl.stc]
Change: Removal of register storage-class-specifier.
Rationale: Enable repurposing of deprecated keyword in future
revisions of this International Standard.
Effect on original feature: A valid C++ 2014 declaration utilizing the register
storage-class-specifier is ill-formed in this International Standard.
The specifier can simply be removed to retain the original meaning.
The warning may be due to that.
register is no longer a storage class specifier, you should remove it. Compilers may not be issuing the right error or warnings but your code should not have register to begin with
The following is a quote from the standard informing people about what they should do with regards to register in their code (relevant part emphasized), you probably have an old version of that file
C.1.6 Clause 10: declarations [diff.dcl]
Change: In C++, register is not a storage class specifier.
Rationale: The storage class specifier had no effect in C++.
Effect on original feature: Deletion of semantically well-defined feature.
Difficulty of converting: Syntactic transformation.
How widely used: Common.
Your worry is unwarranted since the file in question does not actually contain the register keyword:
grep "register" /usr/include/linux/rtnetlink.h
outputs nothing. Either way, you shouldn't be receiving the warning since:
System headers don't emit warnings by default, at least in GCC
It isn't wise to try to compile a file that belongs to a systems project like the linux kernel in C++ mode, as there may be subtle and nasty breaking changes
Just include the file normally or link the C code to your C++ binary. Report a bug if you really are getting a warning that should normally be suppressed to your compiler vendor.

ISO C++ and the infamous underscore [duplicate]

This MSDN article states that getcwd() has been deprecated and that the ISO C++ compatible _getcwd should be used instead, which raises the question: what makes getcwd() not ISO-compliant?
There is a good discussion about that. P.J. Plauger answers to this
I'm the guy who insisted back in 1983 that the space of
names available to a C program be partitioned into:
a) those defined by the implementation for the benefit of the programmer (such as printf)
b) those reserved to the programmer (such as foo)
c) those reserved to the implementation (such as _unlink)
We knew even then that "the implementation" was too monolithic --
often more than one source supplies bits of the implementation --
but that was the best we could do at the time. Standard C++
has introduced namespaces to help, but they have achieved only
a fraction of their stated goals. (That's what happens when you
standardize a paper tiger.)
In this particular case, Posix supplies a list of category (a) names
(such as unlink) that you should get defined when and only when you
include certain headers. Since the C Standard stole its headers from
Unix, which is the same source as for Posix, some of those headers
overlap historically. Nevertheless, compiler warnings should have
some way of taking into account whether the supported environment
is "pure" Standard C++ (a Platonic ideal) or a mixed C/C++/Posix
environment. The current attempt by Microsoft to help us poor
programmers fails to take that into account. It insists on treating
unlink as a category (b) name, which is myopic.
Well, GCC will not declare POSIX names in strict C mode, at least (though, it still does in C++ mode):
#include <stdio.h>
int main() {
&fdopen;
return 0;
}
Output using -std=c99
test.c: In function 'main':
test.c:4: error: 'fdopen' undeclared (first use in this function)
You will have to tell it explicitly that you are operating in a mixed C/Posix by using feature test macros or not passing any specific standard. It will then default to gnu89 which assumes a mixed environment (man feature_test_macros). Apparently, MSVC does not have that possibility.
Functions not specified in the standard are supposed to be prefixed by an underscore as an indication that they're vendor-specific extensions or adhere to a non-ISO standard. Thus the "compliance" here was for Microsoft to add an underscore to the name of this specific function since it's not part of the ISO standard.
As others have already pointed out, getcwd is not included in ISO C++, but is part of POSIX/IEEE Std 1003.1.
Microsoft has decided to include some of the most commonly used POSIX functions in their C standard library (but prefix these functions with an underscore to essentially discourage their usage).
For the record, getcwd() wasn't deprecated by ISO. It was "deprecated" by Microsoft. Microsoft rewrote many C functions -- often with a little better security in mind (say, string functions that also take a max_length parameter). They then had their compiler spit out these warnings, which I consider bogus because no standards group deprecated any of the functions declared deprecated.
To add on to Dan Olson's post: See ANSI C Compliance page on MSDN
The names of Microsoft-specific functions and global variables begin with a single underscore. These names can be overridden only locally, within the scope of your code. For example, when you include Microsoft run-time header files, you can still locally override the Microsoft-specific function named _open by declaring a local variable of the same name. However, you cannot use this name for your own global function or global variable.
As far as I'm aware getcwd() has never been part of ISO Standard C++. _getcwd() definitely isn't, as standard names will not begin with an underscore.
In fact, the MSDN article links to a man page that says it is declared in direct.h, which is not a Standard C++ header file. The article seems bogus to me.
The MSDN article is somewhat confusing in what a normal person would conclude from just a quick reading (if they don't read it with a very careful lawyer eye).
What the MSDN article says is: getcwd() is not compliant with the ISO C++ standard. To comply with that ISO C++ standard for naming of functions (which is what getcwd violates), Microsoft properly put an _ on the front of the function, so the same function becomes _getcwd(). That is the ISO C++ compliant way of naming the function because getcwd() and _getcwd() are not an ISO C++ standard function, but are a Microsoft (vendor) specific, or implementation specific function.
The article does not indicate what a C++ ISO standard call to get the working directory would be... though thats what folks tend to read at a quick glance.

strcmpi renamed to _strcmpi?

In MSVC++, there's a function strcmpi for case-insensitive C-string comparisons.
When you try and use it, it goes,
This POSIX function is deprecated beginning in Visual C++ 2005. Use the ISO C++ conformant _stricmp instead.
What I don't see is why does ISO not want MSVC++ to use strcmpi, and why is _stricmp the preferred way, and why would they bother to rename the function, and how is a function beginning with an underscore ISO conformant. I know there must be a reason for all this, and I'm suspecting its because strcmpi is non-standard, and perhaps ISO wants non-standard extensions to begin with an _underscore?
ISO C reserves certain identifiers for future expansion (see here), including anything that starts with "str".
IMNSHO, this is Microsoft's way of saying "Do not put Unix software on Windows machines". There are several frustrating aspects to the problem:
strcmpi() is not a POSIX function - the relevant functions are defined in <strings.h> and are called strcasecmp() etc.
Even if you explicitly request support for POSIX functions, Microsoft thinks that you may not use the POSIX names but must prefix them with the wretched underscore.
AFAIK, there isn't a way of overriding the MSVC compiler's view on the issue.
That said, the GCC tool chain gets a bit stroppy about some functions - mktemp() et al. However, it does compile and link successfully, despite the warnings (which are justified).
I note that MSVC also has a bee in its bonnet about snprintf() et al. If their function conformed to the C99 standard (along with the rest of the compiler), then there would never be any risk of an overflow - the standard requires null termination, contrary to the claims of Microsoft.
I haven't got a really good solution to this problem - I'm not sure there is one. One possibility is to create a header (or set of headers) to map all the actual POSIX names to Microsoft's misinterpretation of them. Another is two create a library of trivial functions with the correct POSIX name that each call down onto the Microsoft version of the name (giving you a massive collection of four-line functions - the declarator line, an open brace, a close brace, and a return statement that invokes the Microsoft variant of the POSIX function name.
It's funny how the Microsoft API calls, which also pollute the user's name space, are not deprecated or renamed.
Names begining witth an underscore and a lower case letter are reserved by the C++ Standard for the C++ implementation, if they are declared in the global namespace. This stops them from clashing with similar names in your own code, which must not use this naming convention.
strcmpi goes away altogether in Visual C++ 2008, so you should definitely heed the deprecation if you ever intend to upgrade.
The _ doesn't make the function ISO standard, it's just that functions beginning with _ are safer to add as the language evolves because that's one of the parts of the namespace reserved for the language to use.
According to Microsoft's documentation for _stricmp, it sounds like strcmpi has some practices that result in some unintuitive orderings (including normalizing to lower case instead of simply treating case as irrelevant). Sounds like _stricmp takes more pains to do what one would naturally expect.

Why is getcwd() not ISO C++ compliant?

This MSDN article states that getcwd() has been deprecated and that the ISO C++ compatible _getcwd should be used instead, which raises the question: what makes getcwd() not ISO-compliant?
There is a good discussion about that. P.J. Plauger answers to this
I'm the guy who insisted back in 1983 that the space of
names available to a C program be partitioned into:
a) those defined by the implementation for the benefit of the programmer (such as printf)
b) those reserved to the programmer (such as foo)
c) those reserved to the implementation (such as _unlink)
We knew even then that "the implementation" was too monolithic --
often more than one source supplies bits of the implementation --
but that was the best we could do at the time. Standard C++
has introduced namespaces to help, but they have achieved only
a fraction of their stated goals. (That's what happens when you
standardize a paper tiger.)
In this particular case, Posix supplies a list of category (a) names
(such as unlink) that you should get defined when and only when you
include certain headers. Since the C Standard stole its headers from
Unix, which is the same source as for Posix, some of those headers
overlap historically. Nevertheless, compiler warnings should have
some way of taking into account whether the supported environment
is "pure" Standard C++ (a Platonic ideal) or a mixed C/C++/Posix
environment. The current attempt by Microsoft to help us poor
programmers fails to take that into account. It insists on treating
unlink as a category (b) name, which is myopic.
Well, GCC will not declare POSIX names in strict C mode, at least (though, it still does in C++ mode):
#include <stdio.h>
int main() {
&fdopen;
return 0;
}
Output using -std=c99
test.c: In function 'main':
test.c:4: error: 'fdopen' undeclared (first use in this function)
You will have to tell it explicitly that you are operating in a mixed C/Posix by using feature test macros or not passing any specific standard. It will then default to gnu89 which assumes a mixed environment (man feature_test_macros). Apparently, MSVC does not have that possibility.
Functions not specified in the standard are supposed to be prefixed by an underscore as an indication that they're vendor-specific extensions or adhere to a non-ISO standard. Thus the "compliance" here was for Microsoft to add an underscore to the name of this specific function since it's not part of the ISO standard.
As others have already pointed out, getcwd is not included in ISO C++, but is part of POSIX/IEEE Std 1003.1.
Microsoft has decided to include some of the most commonly used POSIX functions in their C standard library (but prefix these functions with an underscore to essentially discourage their usage).
For the record, getcwd() wasn't deprecated by ISO. It was "deprecated" by Microsoft. Microsoft rewrote many C functions -- often with a little better security in mind (say, string functions that also take a max_length parameter). They then had their compiler spit out these warnings, which I consider bogus because no standards group deprecated any of the functions declared deprecated.
To add on to Dan Olson's post: See ANSI C Compliance page on MSDN
The names of Microsoft-specific functions and global variables begin with a single underscore. These names can be overridden only locally, within the scope of your code. For example, when you include Microsoft run-time header files, you can still locally override the Microsoft-specific function named _open by declaring a local variable of the same name. However, you cannot use this name for your own global function or global variable.
As far as I'm aware getcwd() has never been part of ISO Standard C++. _getcwd() definitely isn't, as standard names will not begin with an underscore.
In fact, the MSDN article links to a man page that says it is declared in direct.h, which is not a Standard C++ header file. The article seems bogus to me.
The MSDN article is somewhat confusing in what a normal person would conclude from just a quick reading (if they don't read it with a very careful lawyer eye).
What the MSDN article says is: getcwd() is not compliant with the ISO C++ standard. To comply with that ISO C++ standard for naming of functions (which is what getcwd violates), Microsoft properly put an _ on the front of the function, so the same function becomes _getcwd(). That is the ISO C++ compliant way of naming the function because getcwd() and _getcwd() are not an ISO C++ standard function, but are a Microsoft (vendor) specific, or implementation specific function.
The article does not indicate what a C++ ISO standard call to get the working directory would be... though thats what folks tend to read at a quick glance.