Is every "normal" use of user-defined literals undefined behavior? - c++

User defined literals must start with an underscore.
This is a more or less universally well-known rule that you can find on every layman-worded site talking about user literals. It is also a rule which I (and possibly others?) have been blatantly ignoring ever since on a "what a bullshit" basis. Now of course, that's strictly not correct. In the strictest sense, this uses a reserved identifier, and thus invokes Undefined Behavior (although you don't get as much as a shrug from the compiler, practically).
So, pondering whether I should continue to deliberately ignore that (in my opinion useless) part of the standard or not, I decided to look at what's actually written. Because, you know, what does it matter what everybody knows. What matters is what's written in the standard.
[over.literal] states that "some" literal suffix identifiers are reserved, linking to [usrlit.suffix]. The latter states that all are reserved, except those that start with an underscore. OK, so that's pretty much exactly what we already knew, explicitly written (or rather, written backwards).
Also, [over.literal] contains a Note which hints to an obvious but troubling thing:
except for the constraints described above, they are ordinary namespace-scope functions and function templates
Well, sure they are. Nowhere does it say that they aren't, so what else would you expect them to be.
But wait a moment. [lex.name] explicitly states that each identifier that begins with an underscore in the global namespace is reserved.
Now, a literal operator usually, unless you explicitly put it into a namespace (which, I believe nobody does!?) is very much in the global namespace. So, the name, which must begin with an underscore, is reserved. There is no mention of a special exception. So, every name (with underscore, or without) is a reserved name.
Are you indeed expected to put user defined literals into a namespace because the "normal" usage (underscore or not) is using a reserved name?

Yes: the combination of forbidding the use of _ as the start of a global identifier coupled with requiring non-standard UDLs to start with _ means that you can't put them in the global namespace. But you shouldn't be dirtying up the global namespace with stuff, especially UDLs, so that shouldn't be much of a problem.
The traditional idiom, as used by the standard, is to put UDLs in a literals namespace (and if you have different sets of UDLs, then you put them in different inline namespaces below that namespace). That literals namespace is typically underneath your main one. When you want to use a particular set of UDLs, you invoke using namespace my_namespace::literals or whichever sub-namespace contains your literal set of choice.
This is important because UDLs tend to be heavily abbreviated. The standard for example uses s for std::string, but also for std::chrono::duration of seconds. While they do apply to different kinds of literals (s applied to a string is a string, while s applied to a number is a duration), it can sometimes be confusing to read code that uses abbreviated literals. So you shouldn't throw literals at all users of your library; they should opt-in to using them.
By using different namespaces for these (std::literals::string_literals and std::literals::chrono_literals), the user can be up-front about which sets of literals they want in which parts of code.
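A minimal sketch of that layout, in case it helps to see it spelled out. Every name here (my_lib, meters, seconds, the _m and _s suffixes, client) is hypothetical and only illustrates the nesting; it is not taken from any real library.

```cpp
// Hypothetical library: every name below is made up for illustration.
namespace my_lib {

struct meters  { long double value; };
struct seconds { long double value; };

inline namespace literals {
inline namespace length_literals {
constexpr meters operator""_m(long double v) { return meters{v}; }
}  // namespace length_literals
inline namespace time_literals {
constexpr seconds operator""_s(long double v) { return seconds{v}; }
}  // namespace time_literals
}  // namespace literals

}  // namespace my_lib

namespace client {
// Opt in to the length literals only; _s is not brought into scope here.
using namespace my_lib::literals::length_literals;
constexpr my_lib::meters track = 400.0_m;
static_assert(track.value == 400.0L, "_m is found through the using-directive");
}  // namespace client
```

Because the nested namespaces are inline, using namespace my_lib::literals (or even using namespace my_lib) would bring in both suffixes at once, which mirrors how std::literals is laid out.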

This is a good question, and I'm not sure about the answer, but I think the answer is "no, it's not UB" based on a particular reading of the standard.
[lex.name]/3.2 reads:
Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
Now, clearly, the restriction "as a name in the global namespace" should be read as applying to the entire rule, not just to how the implementation may use the name. That is, its meaning is not
"each identifier that begins with an underscore is reserved to the implementation, AND the implementation may use such identifiers as names in the global namespace"
but rather,
"the use of any identifier that begins with an underscore as a name in the global namespace is reserved to the implementation".
(If we believed the first interpretation, then it would mean that no one could declare a function called my_namespace::_foo, for example.)
Under the second interpretation, something like a global declaration of operator""_foo (in the global scope) is legal, because such a declaration does not use _foo as a name. Rather, the identifier is just a part of the actual name, which is operator""_foo (which does not start with an underscore).
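To make the two cases concrete, here is a small sketch. The names my_namespace, _foo and the _foo suffix come from the paragraphs above; the function bodies are my own filler. Whether the global declaration is really outside the reservation is exactly the point under dispute, so treat the comment on it as this answer's reading rather than settled fact.

```cpp
namespace my_namespace {
// Leading underscore followed by a lowercase letter, declared outside the
// global namespace: not covered by the "name in the global namespace" rule.
constexpr int _foo() { return 1; }
}  // namespace my_namespace

// The declaration in question: global scope, no whitespace between "" and
// the suffix. Under the second reading the declared name is operator""_foo,
// not _foo.
constexpr unsigned long long operator""_foo(unsigned long long v) { return v; }

static_assert(my_namespace::_foo() == 1, "namespace-scope _foo is callable");
static_assert(42_foo == 42ULL, "the global literal operator is found");
```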

Is every “normal” use of user-defined literals undefined behavior?
Clearly not.
The following is the idiomatic (and thus definitely “normal”) use of UDLs, and it’s well-defined according to the rule you’ve just listed:
namespace si {
struct metre { … };
constexpr metre operator ""_m(long double value) { return metre{value}; }
}
You’ve listed problematic cases and I agree with your assessment about their validity but they’re easily avoided in idiomatic C++ code so I don’t entirely see the problem with the current wording, even if it was potentially accidental.
According to the example in [over.literal]/8, we can even use capital letters after the underscore:
float operator ""E(const char*); // error: reserved literal suffix (20.5.4.3.5, 5.13.8)
double operator""_Bq(long double); // OK: does not use the reserved identifier _Bq (5.10)
double operator"" _Bq(long double); // uses the reserved identifier _Bq (5.10)
The only problematic thing thus seems to be the fact that the standard makes the whitespace between "" and the UDL name significant.

Yes, defining your own user defined literal in the global namespace results in an ill-formed program.
I haven't run into this myself, because I try to follow the rule:
Don't put anything (besides main, namespaces, and extern "C" stuff for ABI stability) in the global namespace.
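#include <iostream>
namespace Mine {
struct meter { double value; };
inline namespace literals {
// A literal operator for integer literals must take unsigned long long
// (a plain double parameter is not allowed).
meter operator ""_m( unsigned long long v ) { return {static_cast<double>(v)}; }
}
}
int main() {
using namespace Mine::literals;
// Parenthesized, because 15_m.value would be lexed as a single pp-number token.
std::cout << (15_m).value << "\n";
}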
This also means you cannot use _CAPS as your literal name, even in a namespace.
Inline namespaces called literals are a great way to package up your user-defined literal operators. They can be imported where you want to use them without having to name exactly which literals you want, or if you import the entire namespace you also get the literals.
This follows how the std library handles literals as well, so should be familiar to users of your code.

Given the literal with suffix _X, the grammar calls _X an "identifier".
So, yes: the standard has, presumably inadvertently, made it impossible to create a UDL at global scope, or UDLs that start with a capital letter, in a well-defined program. (Note that the former is not something you generally want to do anyway!)
This cannot be resolved editorially: the names of user-defined literals would have to have their own lexical "namespace" that prevented clashes with (for example) names of implementation-provided functions. In my opinion, though, it would have been nice for there to be a non-normative note somewhere, pointing out the consequences of these rules and pointing out that they are deliberate.

Related

What's the meaning of "reserved for any use"?

NOTE: This is a c question, though I added c++ in case some C++ expert can provide a rationale or historical reason why C++ is using a different wording than C.
In the C standard library specification, we have this normative text, C17 7.1.3 Reserved identifiers (emphasis mine):
All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
Now I keep reading answers on SO by various esteemed C experts, where they claim it is fine for a compiler or standard library to use identifiers with underscore + uppercase, or double underscore.
Doesn't "reserved for any use" mean reserved for anyone except future extensions to the C language itself? Meaning that the implementation is not allowed to use them.
While the second phrase above, regarding single leading underscore seems to be directed to the implementation?
In general, the C standard is written in a way that expects compiler vendors/library implementers to be the typical reader - not so much the application programmers.
Notably, C++ has a very different wording:
Each name that contains a double underscore (__) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
(See What are the rules about using an underscore in a C++ identifier?)
Is this perhaps a mix-up between C and C++ and the languages are different here?
In the C standard, the meaning of the term "reserved" is defined by 7.1.3p2, immediately below the bullet list you are quoting:
No other identifiers are reserved. If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.
Emphasis mine: reserved identifiers place a restriction on the program, not the implementation. Thus, the common interpretation (reserved identifiers may be used by the implementation for any purpose) is correct for C.
I have not kept up with the C++ standard and no longer feel qualified to interpret it.
While the Standard is primarily written to guide implementers, it is written as a description of what makes a program well-formed, and what its effect is. That's because the basic definition of a standards-conforming compiler is one that does the correct thing for any standards-conforming program:
A strictly conforming program shall use only those features of the language and library
specified in this International Standard....A conforming
hosted implementation shall accept any strictly conforming program.
Read separately, this is hugely restrictive of extensions to a compiler. For instance, based solely on that clause, a compiler shouldn't get to define any of its own reserved words. After all, any given word a particular compiler might want to reserve, could nevertheless show up in a strictly conforming program, forcing the compiler's hand.
The standard goes on, however:
A conforming implementation may have extensions (including additional
library functions), provided they do not alter the behavior of any strictly conforming
program.
That's the key piece. Compiler extensions need to be written in such a way that they affect nonconforming programs (ones which contain undefined behavior, or which shouldn't even compile at all), allowing them to compile and do fun extra things.
So the purpose of defining "reserved identifiers", when the language doesn't actually need those identifiers for anything, is to give implementations some extra wiggle room by providing them with some things which make a program nonconforming. The reason a compiler can recognize, say, __declspec as part of a declaration is because putting __declspec into a declaration is otherwise illegal, so the compiler is allowed to do whatever it wants!
The importance of "reserved for any use", therefore, is that it leaves no question about a compiler's power to treat such identifiers as having any meaning it cares to. Future compatibility is a comparatively distant concern.
The C++ standard works in a similar way, though it's a bit more explicit about the gambit:
A conforming implementation may have extensions (including additional library functions), provided they do
not alter the behavior of any well-formed program. Implementations are required to diagnose programs that
use such extensions that are ill-formed according to this International Standard. Having done so, however,
they can compile and execute such programs.
I suspect the difference in wording is down to the C++ standard just being clearer about how extensions are meant to work. Nevertheless, nothing in the C standard precludes an implementation from doing the same thing. (And we all basically ignore the requirement that the compiler warn you every time you use __declspec.)
Regarding the difference in wording in C versus C++, I'm posting my own little research here as reference:
The early K&R C 1st edition has this text:
...names which are intended for use only by functions of the library begin with an underscore so they are less likely to collide with names in a user's program.
K&R 2nd edition added an Appendix B which addresses the standard library, where we can read
External identifiers that begin with an underscore are reserved for use by the library, as are all
other identifiers that begin with an underscore and an upper-case letter or another underscore.
Early ANSI C drafts, as well as "C90" ISO 9899:1990, have the same text as in the current ISO standard.
The earliest C++ drafts, however, have a different text, as noted by @hvd, possibly a clarification of the C standard. From DRAFT: 20 September 1994:
17.3.3.1.2 Global names
...
Each name that begins with an underscore and either an uppercase letter or another underscore (2.8) is
reserved to the implementation for any use
So apparently the wording "reserved for any use" was invented by the ANSI/ISO C90 committee, whereas the C++ committee some years later used a clearer wording, similar to the wording in the pre-standard K&R book.
The C99 rationale V5.10 says this below 7.1.3:
Also reserved for the implementor are all external identifiers beginning with an underscore, and
all other identifiers beginning with an underscore followed by a capital letter or an underscore.
This gives a name space for writing the numerous behind-the-scenes non-external macros and
functions a library needs to do its job properly.
This makes the committee's intention quite clear: "reserved for any use" means "reserved for the implementor".
Also of note, the current C standard has the following normative text elsewhere, in 6.2.5:
There may also be
implementation-defined extended signed integer types. 38)
where the informative foot note 38 says:
Implementation-defined keywords shall have the form of an identifier reserved for any use as
described in 7.1.3.
C has multiple contexts in which a symbol can have a definition:
The space of macro names,
The space of formal names of arguments to a macro (this space is specific to each function-like macro),
The space of ordinary identifiers,
The space of tag names,
The space of labels (this space is specific to each function), and
The space of structure/union members (this space is specific to each struct/union).
"Reserved for any use" means that user code in a compliant program cannot use[1] symbols that start with an underscore followed by an uppercase letter or another underscore in any of the above contexts. Compare this with identifiers that start with a single underscore followed by a lowercase letter or a digit. These fall into the second class of identifiers that start with an underscore. User code can use these identifiers as the names of macro arguments, as labels, or as the names of structure/union members.
"Reserved for any use" does not mean that the implementation cannot use such symbols. The intent of the reservation is to provide a name space that implementations can freely use without concern that the names defined by the implementation will conflict with the names defined by the user code in a compliant program.
[1] The standard does not quite mean "cannot use". The standard encourages the programmatic use of a small number of names that start with a double underscore. For example, a compliant implementation is required to define __STDC_VERSION__, __FILE__, __LINE__, and __func__. The 2011 version of the standard even gives an example of a presumably compliant program that references __func__.
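As a rough illustration of the second class (single underscore followed by a lowercase letter), here is a sketch that uses such identifiers only outside file scope. It is written as C++ to match the rest of this page, although the rule being illustrated is the C one quoted above; SQUARE, sample and squared_count are hypothetical names of mine.

```cpp
// Parameter of a function-like macro: lives in that macro's own name space.
#define SQUARE(_x) ((_x) * (_x))

struct sample {
    int _count;  // member name: each struct/union has its own name space
};

// Function parameter: block scope, not file scope.
constexpr int squared_count(sample _s) {
    return SQUARE(_s._count);
}

static_assert(squared_count(sample{3}) == 9, "3 squared is 9");
```

Declaring something like int _count; at file scope, or using _Count or __count anywhere, would fall under the reservations quoted from 7.1.3.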
The C Standard allows implementations to attach any meaning they see fit to reserved identifiers. Most implementations will treat unrecognized identifiers of reserved forms the same as any other unrecognized identifiers when there is no reason to do otherwise, thus allowing something like:
#ifdef __ACME_COMPILER
#define near __near
#else
#define near
#endif
int near foo;
to declare an identifier foo using a __near qualifier if the code is being processed in an Acme compiler (which would presumably support such a thing), but also be compatible with other compilers that would not require or benefit from the use of such a directive. Nothing would forbid a conforming implementation from defining __ACME_COMPILER and interpreting __near to mean "launch nuclear missiles", but a quality implementation shouldn't go out of its way to break code like the above. If an implementation doesn't know what __ACME_COMPILER is supposed to mean, treating it like any other unknown identifier would allow it to support useful constructs like the above.
This is months late, but one point remains that the others have not addressed.
Your question can be viewed from the opposite direction. The standard allows the implementation (as you have observed) to use a symbol like _Foo but, more importantly, thereby forbids the implementation from using foo. The latter is reserved for your use.
To understand, for discussion's sake, suppose that a future C standard introduced the new keyword _Foo. The hypothetical implementation was already using this symbol, so what happens?
Answer:
At first, the implementation will not yet have implemented the new standard. Until implemented, the new standard lacks practical effect.
Later, as part of implementing the new standard, the implementation quietly changes each _Foo to _Bar.
No problem.
In fact, if you think about it in this manner, you can say that the way the standard reserves such words is almost the only way it could reserve them.

Is it definitely illegal to refer to a reserved name?

On the std-proposals list, the following code was given:
#include <vector>
#include <algorithm>
#include <iostream>
void foo(const std::vector<int> &v) {
#ifndef _ALGORITHM
std::for_each(v.begin(), v.end(), [](int i){std::cout << i; });
#endif
}
Let's ignore, for the purposes of this question, why that code was given and why it was written that way (as there was a good reason but it's irrelevant here). It supposes that _ALGORITHM is a header guard inside the standard header <algorithm> as shipped with some known standard library implementation. There is no inherent intention of portability here.
Now, _ALGORITHM would of course be a reserved name, per:
[C++11: 2.11/3]: In addition, some identifiers are reserved for use by C++ implementations and standard libraries (17.6.4.3.2) and shall not be used otherwise; no diagnostic is required.
[C++11: 17.6.4.3.2/1]: Certain sets of names and function signatures are always reserved to the implementation:
Each name that contains a double underscore __ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
I was always under the impression that the intent of this passage was to prevent programmers from defining/mutating/undefining names that fall under the above criteria, so that the standard library implementors may use such names without any fear of conflicts with client code.
But, on the std-proposals list, it was claimed that this code is itself ill-formed for merely referring to such a reserved name. I can now see how the use of the phrase "shall not be used otherwise" from [C++11: 2.11/3] may indeed suggest that.
One practical rationale given was that the macro _ALGORITHM could expand to some code that wipes your hard drive, for example. However, taking into account the likely intention of the rule, I'd say that such an eventuality has more to do with the obvious implementation-defined* nature of the _ALGORITHM name, and less to do with it being outright illegal to refer to it.
* "implementation-defined" in its English language sense, not the C++ standard sense of the phrase
I'd say that, as long as we're happy that we are going to have implementation-defined results and that we should investigate what that macro means on our implementation (if it exists at all!), it should not be inherently illegal to refer to such a macro provided we do not attempt to modify it.
For example, code such as the following is used all over the place to distinguish between code compiled as C and code compiled as C++:
#ifdef __cplusplus
extern "C" {
#endif
and I've never heard a complaint about that.
So, what do you think? Does "shall not be used otherwise" include simply writing such a name? Or is it probably not intended to be so strict (which may point to an opportunity to adjust the standard wording)?
Whether it's legal or not is implementation-specific (and identifier-specific).
When the Standard gives the implementation the sole right to use these names, that includes the right to make the names available in user code. If an implementation does so, great.
But if an implementation doesn't expressly give you the right, it is clear from "shall not be used otherwise" that the Standard does not, and you have undefined behavior.
The important part is "reserved to the implementation". It means that the compiler vendor may use those names and even document them. Your code may then use those names as documented. This is often used for extensions like __builtin_expect, where the compiler vendor avoids any clash with your identifiers (that are declared by your code) by using those reserved names. Even the standards use them for new features, such as _Bool and _Static_assert in C, to make sure they don't break existing (legal) code.
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1882
Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
Note the words "any use". (Similar text occurs both before and after that defect fix is applied.)
__cplusplus is defined by the standard. _ALGORITHM is reserved by the standard to be used by implementations. These seem quite different? (The two sections of the standard do conflict, in that one states that __cplusplus is reserved for any use, and another uses it specifically, but I think that the winner of that conflict is clear).
The _ALGORITHM identifier could, under the standard, be used as part of a pre-processing step to say "replace this source code with hard drive deleting code". Its existence (prior to pre-processing, or after) could be sufficient to completely change your program behavior.
Now this is unlikely, but I do not think it results in a non-conforming implementation. It is a matter of quality of implementation only.
An implementation is free to document and define what _ALGORITHM means. For example, it could document that it is a header guard for <algorithm>, and indicates if that header file has been included. Treating your current <algorithm> implementation as documentation is probably going too far.
I'd guess using __cplusplus in C mode is technically "just as bad" as using _ALGORITHM, but this question is a c++ question, not a c question. I haven't delved into the c standard to look for quotes about it.
The names in [cpp.predefined] are different. Those have a specified meaning, so an implementation can't reserve them for any use, and using them in a program has a well-defined portable meaning. Using an implementation-specific identifier like the example of _ALGORITHM is ill-formed because it violates a shall-rule.
Yes, I'm fully aware of multiple examples where the library specification uses "shall" to mean "this is a requirement on user code, and violations are UB, not ill-formed".
Regarding whether it's UB or implementation-defined, running an ill-formed program results in UB. The standard wording clearly says the program is ill-formed, UB occurs if the implementation still chooses to accept the program and run it.
So, if a program uses the identifier _ALGORITHM, that program is ill-formed, and running such a program is UB, but that does not mean it doesn't work fine on an implementation that uses _ALGORITHM as an include guard, nor does it mean that it doesn't work fine on an implementation that doesn't.
If users are concerned about such ill-formedness and potential UB, and said users want to write portable C++, they shouldn't use reserved identifiers in portable C++ programs. If users accept that regardless of the standard prohibiting such a use, no practical implementation will wipe your hard drive, they can freely use such reserved identifiers, but by the letter of the standard, such uses are still ill-formed.
Historically, the purpose of making the use of such tokens "undefined behavior" is that compilers are free to attach any meaning they want to any such tokens that are not defined within the C standard. For example, on some embedded processors, using __xdata as a storage class for a variable will ask that it be stored in an area of RAM which is slower to access than the normal variable-storage area, but is much larger. On typical processors of that family, storage for "normal" variables would be limited to about 100 bytes, but storage for xdata variables may be much larger, up to 64K. The standard says basically nothing about what compilers are allowed to do with such directives, although typically (I'm not sure if the standard mandates this behavior, though I'm unaware of compilers violating it) such tokens are ignored within code that is disabled using #if or similar directives.
Some libraries' header files will start their own internal identifiers with two underscores followed by a pattern that's unlikely to be used by a compiler for any purpose (e.g. version 23 of the Foozle library might prefix its identifiers with __FZ23). It would be perfectly legitimate for a future compiler to use identifiers starting with __FZ23 for other purposes, and if that were to happen the Foozle library would need to be changed to use something else. If, however, a major compiler upgrade would likely necessitate rewrites of the Foozle library for other reasons anyway, that risk may be acceptable compared to the risk of identifiers conflicting with outside code.
Note also that some project header files which are targeted toward a processor that requires __ directives may conditionally define macros with those names when compiled for other processors, for example:
#ifndef USE_XDATA
#define __XDATA
#endif
though a somewhat better pattern would generally be:
#ifdef USE_XDATA
#define XDATA __XDATA
#else
#define XDATA
#endif
When writing new code, the latter pattern is often better, but the former pattern may sometimes be useful when adapting existing code written on a platform that requires __XDATA so that it may be used both on platforms that use/require that directive and on platforms that do not.
Whether or not it is legal is a matter of local law. Whether it means anything, and if so, what, is a matter for the language definition. When you use a name that's reserved to the implementation the behavior of your program is undefined. That means that the language definition does not tell you what the program does. Nothing more, nothing less. If the compiler you're using documents what a particular reserved identifier does, then you can use that identifier with that compiler. If you hunt through headers and guess what various un-documented identifiers mean you might be able to use them, but don't be surprised if your code breaks when a subsequent update changes something.
Don't get hung up on __cplusplus. It's core language, and the stuff about double underscores, etc. is library. If that's not convincing, just consider it a glitch. You can use __cplusplus in C++ programs; its meaning is well defined.
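For what it's worth, a tiny sketch of reading __cplusplus in a way that relies only on its specified meaning. The variable name at_least_cxx11 is hypothetical; the value 201103L is the one the standard assigns to C++11.

```cpp
// __cplusplus is specified by the language, so testing it does not depend
// on any implementation-specific documentation.
#if __cplusplus >= 201103L
constexpr bool at_least_cxx11 = true;   // hypothetical name
#else
const bool at_least_cxx11 = false;
#endif
```

Contrast this with something like _ALGORITHM, whose meaning (if any) comes only from whatever a particular implementation chooses to document.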

Why aren't C++14 standard-defined literals in the global namespace by default?

C++14 includes standard-defined literals for, amongst other things, std::string and various timespans from the <chrono> header.
To use them you must say using namespace std::literals; (or some variation depending on exactly which literals you want, as they're in a variety of inline namespaces).
All this is good, but I'm curious as to why the using declaration is required. UDLs without a leading underscore are reserved for the implementation, so there is no possibility that "hello world"s could ever mean anything else in a standard-conforming programme.
So why isn't #include <string> sufficient to bring the literal conversion function into scope? Why must I explicitly include the literal namespace?
EDIT: N3531 is the most recent version of the proposal I could find -- unfortunately it doesn't discuss the motivation for putting things in a namespace but only says:
One can summarize the requirements of the [Portland] discussion as follows:
use an inline namespace for a (group of related) UDL operator(s)
There already are two UDLs named s: one for strings and one for seconds. Due to the understandably terse names of suffixes, they chronically suffer from name conflicts, so pouring all of them into one namespace cannot go well for long. Hence it was decided that they be put into inline namespaces, which allow for both unambiguous (using namespace std::literals::chrono_literals) and simple using directives (using namespace std).
The standard library already defines multiple versions of what s can mean:
It can be used to define a string literal.
It can be used to define a chrono::seconds literal.
One is based on a string literal, one is based on an integer or a double literal, of course, i.e., they can actually coexist. However, I'd expect that there may be more uses of s in the future. Thus, having to choose which namespaces are imported rather than getting any imposed on you seems like a reasonable approach.
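A short sketch of the two meanings of s side by side, assuming a C++14 compiler. The namespace demo and the variable timeout are hypothetical and exist only to keep the using-directives out of the global namespace.

```cpp
#include <chrono>
#include <string>
#include <type_traits>

namespace demo {  // hypothetical namespace, used only to scope the using-directives
using namespace std::literals::string_literals;
using namespace std::literals::chrono_literals;

// Same suffix, different kinds of literal, so the overloads do not clash.
static_assert(std::is_same<decltype("abc"s), std::string>::value,
              "s on a string literal yields std::string");
static_assert(std::is_same<decltype(90s), std::chrono::seconds>::value,
              "s on an integer literal yields std::chrono::seconds");

constexpr auto timeout = 90s;
}  // namespace demo
```

Code that only wants the chrono suffixes can name chrono_literals alone and never see the string one.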
Look at paper N2765. UDLs are hooked into the regular name lookup process. As string literals have common string types, there's a large chance of a collision if you ignored namespaces.

Global variables, what are they exactly?

First of all, I'm new to c++, and 'trying' to prefix my variables.
But it isn't very clear to me.
So my question is, is it correct to prefix static variables with "g_"?
Thank you!
using namespace std;
// The main window class name.
static TCHAR g_szWindowClass[] = _T("win32app");
// The string that appears in the application's title bar.
static TCHAR g_szTitle[] = _T("Win32 App");
...
It's better to use a prefix than nothing that distinguishes global variables as such. But
it's even better to avoid global variables to the degree possible, and
instead of a C style prefix, in C++ you can use a named namespace.
There are also many advantages to avoiding Microsoft's T macro silliness. It's in support of Windows 9x, and you're probably not targeting Windows 9x. There are likewise many advantages, not least for maintenance, to avoiding Microsoft's silly Hungarian notation thing, that is, prefixes like sz, which was in support of Microsoft's 1980s Programmer's Workbench help system, which just like Windows 98 is not very relevant any longer.
Also, it can be advantageous to use const wherever practically possible.
Note that const at namespace level implies internal linkage (as if the variable were declared static), so an explicit static is then no longer necessary.
Thus, instead of the current
// The main window class name.
static TCHAR g_szWindowClass[] = _T("win32app");
do
namespace g {
auto const windowClassName = L"win32app";
}
with
C++ namespace g instead of C prefix g_,
const added, guaranteeing that this variable is not modified, and
direct use of wide character literal instead of Microsoft Windows 9x T macros.
Then you can refer to g::windowClassName, or without the prefix after a using namespace g;, or even with an alias for g.
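A minimal sketch of those three ways of referring to the constant, assuming nothing beyond the namespace g shown above. The alias name globals and the helper allSpellingsAgree are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cwchar>

namespace g {
auto const windowClassName = L"win32app";  // deduced as wchar_t const* const
}

namespace globals = g;  // hypothetical alias for g

// Hypothetical helper: checks that every spelling names the same object.
bool allSpellingsAgree() {
    using namespace g;  // allows the unqualified name below
    return &g::windowClassName == &globals::windowClassName  // alias
        && &windowClassName == &g::windowClassName           // using-directive
        && std::wcscmp(windowClassName, L"win32app") == 0;
}
```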
The particular braces convention I use for namespaces is in support of nested namespaces without the indentation hassle. Unfortunately that's not supported by common editors.
C++ has no official naming conventions. It does have a few rules for variable names, or identifiers in general, which you have to follow, but other than that, names are entirely up to you, with all the flexibility and dangers that brings (much like the rest of the language).
Here is a good overview of the rules: http://en.cppreference.com/w/cpp/keyword
So, for example, _G_szTitle would be wrong, but g_szTitle is OK.
The real problem is that you almost certainly do not want to use globals. Global variables are almost always bad design. Avoid them.
Another, smaller, problem is that you use the so-called "Hungarian notation". Google a bit for it to find out why many people (myself included) are opposed to it, especially in a language like C++.
The most obvious definition of a global variable is a variable declared at namespace scope (including the outermost namespace).
Now, you could argue that a variable declared at namespace scope which is also declared static, and thus isn't visible outside the given translation unit, is not really global. Likewise, a variable declared in an unnamed namespace might be considered non-global. However, both of these kinds of variables share many of the bad properties of global variables. For example, they introduce a serialization point when being accessed from multiple threads.
Thus, I actually consider a wider range of variables to be global, i.e., also static data members in classes and function-local static variables. Each of these also exists just once throughout a program. Just because these constructs happen to be used for some [anti] design patterns (notably Singleton) doesn't magically bless global variables!
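To make that wider definition concrete, here is a small sketch of each kind of variable mentioned above. All names are hypothetical. The point is only that every one of them is a single object with program lifetime, so state leaks between calls no matter how it is declared.

```cpp
#include <cassert>

int namespaceScoped = 0;          // the "obvious" global, external linkage
static int fileStatic = 0;        // static: not visible outside this translation unit
namespace { int unnamedNs = 0; }  // unnamed namespace: likewise not visible outside

struct Config {
    static int instances;         // static data member: one object for the whole program
};
int Config::instances = 0;

int& callCount() {
    static int count = 0;         // function-local static: also exists exactly once
    return count;
}

// Hypothetical helper: bumps every variable and returns the sum.
// No synchronization is done here, so concurrent callers would race.
int touchAll() {
    ++namespaceScoped;
    ++fileStatic;
    ++unnamedNs;
    ++Config::instances;
    ++callCount();
    return namespaceScoped + fileStatic + unnamedNs + Config::instances + callCount();
}
```

Calling touchAll twice gives different results, because every variable keeps its value between calls.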
With respect to prefixing variable names: do not include a type prefix in your variable names! In C++ types are already sufficiently checked by the compiler. Including the type tends to result in names that eventually become incorrect. Specifically with respect to global variables, here is my recommendation for their prefix: whenever you want to use the prefix for a global variable, stop whatever you are doing! You are in the process of constructing a problem, and you should rather seek to change the design to remove the need for the global variable!
C++11 Standard (draft n3337):
17.6.4.3.2 Global names [global.names]
Certain sets of names and function signatures are always reserved to the implementation:
— Each name that contains a double underscore __ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.
— Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
Other than these there aren't any restrictions on the (identifier) names you choose for global variables.
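A short sketch of how the two quoted rules play out for the names discussed in this thread. It uses plain char instead of TCHAR so that it does not depend on Windows headers. The namespace app and the struct Window are hypothetical. The reserved spellings are left commented out on purpose, since compilers typically accept them without a diagnostic.

```cpp
#include <cassert>

// Fine: no leading underscore and no double underscore.
static char const g_szTitle[] = "Win32 App";

// Reserved for any use, in every scope:
// char const _G_szTitle[] = "...";  // underscore followed by an uppercase letter
// char const g__szTitle[] = "...";  // contains a double underscore

// Reserved only as a name in the global namespace:
// char const _title[] = "...";      // would be reserved if declared here

namespace app {
char const _title[] = "Win32 App";   // underscore + lowercase inside a namespace: not reserved
}

struct Window {
    int _width = 0;                  // class member, not a global name: not reserved
};
```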
It's a convention used by some to prefix global variables with g_, member variables with m_, etc. This is a matter of choice; the language itself doesn't impose such a requirement. So you're free to name them anything and prefix them with anything, as long as the identifier starts with a letter and is not one of the reserved names above.
As for the usage of global variables, I would say if you are just beginning to learn C++, use them, get hurt, and then realize how bad they are; you'll see why they are always condemned by experienced programmers. Just being told they're bad would add little value; some things are better learned by experience.

C++ using C code using double underscores in defines and identifiers

I understand that in C++ double underscores in identifiers are reserved for the compiler. I have some C code which has characteristics similar to this in the corresponding header files:
extern "C" {
#define HELLO__THERE 1
int hello__out__there( int );
}
I will be using this header in a C++ project, and plan to be doing things in C++ like:
if (HELLO__THERE == abc)
hello__out__there(foo);
Is this acceptable behavior in C++, covered by the standard?
In the C++03 standard 17.4.3.1.2 Global names, that use of underscores is defined as reserved:
Each name that contains a double underscore (__) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
Being reserved means that any conforming implementation might use such a name itself, and therefore it is not advisable to use it in your own code.
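One way to limit exposure to those names, sketched here under assumptions, is to confine them to a single wrapper and use clean names everywhere else in the C++ code. The namespace hello, the constant there and the function outThere are hypothetical. The definition of hello__out__there below is only a stand-in so the sketch links; in the real project the C library provides it. This does not make the C names any less reserved, it only keeps them in one place.

```cpp
#include <cassert>

// Stand-in for the C header from the question.
extern "C" {
#define HELLO__THERE 1
int hello__out__there(int);
}

// Stand-in definition so this sketch links on its own (assumption:
// the real function lives in the C library).
extern "C" int hello__out__there(int x) { return x + 1; }

// Hypothetical wrapper: the only place in the C++ code that spells
// the double-underscore names.
namespace hello {
constexpr int there = HELLO__THERE;
inline int outThere(int x) { return hello__out__there(x); }
}
```

The rest of the project would then write hello::there and hello::outThere(foo) instead of the original spellings.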
You should be fine, unless by some fluke one of the defines clashes with one of your compiler's own. If that is the case, you'll likely get a warning or an error (depending on your compiler's configuration) about a duplicate symbol.
Hope it helps. Cheers!
The function call would be OK, but why compare HELLO__THERE to some value abc? If you were testing to see whether the function is there, I would wrap the call in #ifdef ... #endif instead, because if hello__out__there is not declared for some reason, that would be a compile error.
double underlines in identifiers are reserved for the compiler
First, it's underscore, I guess. Second, such identifiers are reserved. That doesn't stop you from using them. You can use them (as long as there is no naming conflict).
Is this acceptable behavior in C++, covered by the standard?
Yes. It's acceptable. However, there is a difference between acceptable code and good code. If you follow proper coding guidelines, then your code will be good as well as acceptable. IMHO, you should refer to some good coding standards on the internet; it will help you a lot.