What is the decimal point ('.') in C++ and can I make one? - c++

I am in a C++ class right now so this question will concern itself primarily with that language, though I haven't been able to find any information for any other language either and I have a feeling whatever the answer is it's probably largely cross language.
In C++ unmarked numbers are assumed to be of integral type ('4', for example, is an integer)
various bounding marks allow for the number to be interpreted differently (''4'', for example, is a character, '"4"' a string).
As far as I know, there is only one kind of unary mark: the decimal point. ('4.' is a double).
I would like to create a new unary mark that designates a constant number in the code to be interpreted as a member of a created datatype. More fundamentally, I would like to know what the '.' and ',' and '"', and ''' are (they aren't operators, keywords, or statements, so what are they?) and how the compiler deals with/interprets them.
More information, if you feel it is necessary:
I am trying to make a complex number header that I can include in any project to do complex math. I am aware of the library but it is, IMHO, ugly and if used extensively slows down coding time. Also I'm mostly trying to code this to improve my programming skills. My goal is to be able to declare a complex variable by doing something of the form cmplx num1= 3 + 4i; where '3' and '4' are arbitrary and 'i' is a mark similar to the decimal point which indicates '4' as imaginary.

I would like to create a new unary mark that designates a constant number in the code to be interpreted as a member of a created datatype.
You can use user defined literals that were introduced in C++11. As an example, assuming you have a class type Type and you want to use the num_y syntax, where num is a NumericType, you can do:
Type operator"" _y(NumericType i) {
return Type(i);
}
Live demo

Things like 4, "4" and 4. are all single tokens,
indivisible. There's no way you can add new tokens to the
language. In C++11, it is possible to define user defined
literals, but they still consist of several tokens; for complex,
a much more natural solution would be to support a constant i,
to allow writing things like 4 + 3*i. (But you'd still need
the C++11 support for constexpr for it to be a compile time
constant.)

Related

C++ std Chrono - How did they manage to let us declare values as 1s, 1000ms, etc? [duplicate]

C++11 introduces user-defined literals which will allow the introduction of new literal syntax based on existing literals (int, hex, string, float) so that any type will be able to have a literal presentation.
Examples:
// imaginary numbers
std::complex<long double> operator "" _i(long double d) // cooked form
{
return std::complex<long double>(0, d);
}
auto val = 3.14_i; // val = complex<long double>(0, 3.14)
// binary values
int operator "" _B(const char*); // raw form
int answer = 101010_B; // answer = 42
// std::string
std::string operator "" _s(const char* str, size_t /*length*/)
{
return std::string(str);
}
auto hi = "hello"_s + " world"; // + works, "hello"_s is a string not a pointer
// units
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
At first glance this looks very cool but I'm wondering how applicable it really is, when I tried to think of having the suffixes _AD and _BC create dates I found that it's problematic due to operator order. 1974/01/06_AD would first evaluate 1974/01 (as plain ints) and only later the 06_AD (to say nothing of August and September having to be written without the 0 for octal reasons). This can be worked around by having the syntax be 1974-1/6_AD so that the operator evaluation order works but it's clunky.
So what my question boils down to is this, do you feel this feature will justify itself? What other literals would you like to define that will make your C++ code more readable?
Updated syntax to fit the final draft on June 2011
At first sight, it seems to be simple syntactic sugar.
But when looking deeper, we see it's more than syntactic sugar, as it extends the C++ user's options to create user-defined types that behave exactly like distinct built-in types. In this, this little "bonus" is a very interesting C++11 addition to C++.
Do we really need it in C++?
I see few uses in the code I wrote in the past years, but just because I didn't use it in C++ doesn't mean it's not interesting for another C++ developer.
We had used in C++ (and in C, I guess), compiler-defined literals, to type integer numbers as short or long integers, real numbers as float or double (or even long double), and character strings as normal or wide chars.
In C++, we had the possibility to create our own types (i.e. classes), with potentially no overhead (inlining, etc.). We had the possibility to add operators to their types, to have them behave like similar built-in types, which enables C++ developers to use matrices and complex numbers as naturally as they would have if these have been added to the language itself. We can even add cast operators (which is usually a bad idea, but sometimes, it's just the right solution).
We still missed one thing to have user-types behave as built-in types: user-defined literals.
So, I guess it's a natural evolution for the language, but to be as complete as possible: "If you want to create a type, and you want it to behave as much possible as a built-in types, here are the tools..."
I'd guess it's very similar to .NET's decision to make every primitive a struct, including booleans, integers, etc., and have all structs derive from Object. This decision alone puts .NET far beyond Java's reach when working with primitives, no matter how much boxing/unboxing hacks Java will add to its specification.
Do YOU really need it in C++?
This question is for YOU to answer. Not Bjarne Stroustrup. Not Herb Sutter. Not whatever member of C++ standard committee. This is why you have the choice in C++, and they won't restrict a useful notation to built-in types alone.
If you need it, then it is a welcome addition. If you don't, well... Don't use it. It will cost you nothing.
Welcome to C++, the language where features are optional.
Bloated??? Show me your complexes!!!
There is a difference between bloated and complex (pun intended).
Like shown by Niels at What new capabilities do user-defined literals add to C++?, being able to write a complex number is one of the two features added "recently" to C and C++:
// C89:
MyComplex z1 = { 1, 2 } ;
// C99: You'll note I is a macro, which can lead
// to very interesting situations...
double complex z1 = 1 + 2*I;
// C++:
std::complex<double> z1(1, 2) ;
// C++11: You'll note that "i" won't ever bother
// you elsewhere
std::complex<double> z1 = 1 + 2_i ;
Now, both C99 "double complex" type and C++ "std::complex" type are able to be multiplied, added, subtracted, etc., using operator overloading.
But in C99, they just added another type as a built-in type, and built-in operator overloading support. And they added another built-in literal feature.
In C++, they just used existing features of the language, saw that the literal feature was a natural evolution of the language, and thus added it.
In C, if you need the same notation enhancement for another type, you're out of luck until your lobbying to add your quantum wave functions (or 3D points, or whatever basic type you're using in your field of work) to the C standard as a built-in type succeeds.
In C++11, you just can do it yourself:
Point p = 25_x + 13_y + 3_z ; // 3D point
Is it bloated? No, the need is there, as shown by how both C and C++ complexes need a way to represent their literal complex values.
Is it wrongly designed? No, it's designed as every other C++ feature, with extensibility in mind.
Is it for notation purposes only? No, as it can even add type safety to your code.
For example, let's imagine a CSS oriented code:
css::Font::Size p0 = 12_pt ; // Ok
css::Font::Size p1 = 50_percent ; // Ok
css::Font::Size p2 = 15_px ; // Ok
css::Font::Size p3 = 10_em ; // Ok
css::Font::Size p4 = 15 ; // ERROR : Won't compile !
It is then very easy to enforce a strong typing to the assignment of values.
Is is dangerous?
Good question. Can these functions be namespaced? If yes, then Jackpot!
Anyway, like everything, you can kill yourself if a tool is used improperly. C is powerful, and you can shoot your head off if you misuse the C gun. C++ has the C gun, but also the scalpel, the taser, and whatever other tool you'll find in the toolkit. You can misuse the scalpel and bleed yourself to death. Or you can build very elegant and robust code.
So, like every C++ feature, do you really need it? It is the question you must answer before using it in C++. If you don't, it will cost you nothing. But if you do really need it, at least, the language won't let you down.
The date example?
Your error, it seems to me, is that you are mixing operators:
1974/01/06AD
^ ^ ^
This can't be avoided, because / being an operator, the compiler must interpret it. And, AFAIK, it is a good thing.
To find a solution for your problem, I would write the literal in some other way. For example:
"1974-01-06"_AD ; // ISO-like notation
"06/01/1974"_AD ; // french-date-like notation
"jan 06 1974"_AD ; // US-date-like notation
19740106_AD ; // integer-date-like notation
Personally, I would choose the integer and the ISO dates, but it depends on YOUR needs. Which is the whole point of letting the user define its own literal names.
Here's a case where there is an advantage to using user-defined literals instead of a constructor call:
#include <bitset>
#include <iostream>
template<char... Bits>
struct checkbits
{
static const bool valid = false;
};
template<char High, char... Bits>
struct checkbits<High, Bits...>
{
static const bool valid = (High == '0' || High == '1')
&& checkbits<Bits...>::valid;
};
template<char High>
struct checkbits<High>
{
static const bool valid = (High == '0' || High == '1');
};
template<char... Bits>
inline constexpr std::bitset<sizeof...(Bits)>
operator"" _bits() noexcept
{
static_assert(checkbits<Bits...>::valid, "invalid digit in binary string");
return std::bitset<sizeof...(Bits)>((char []){Bits..., '\0'});
}
int
main()
{
auto bits = 0101010101010101010101010101010101010101010101010101010101010101_bits;
std::cout << bits << std::endl;
std::cout << "size = " << bits.size() << std::endl;
std::cout << "count = " << bits.count() << std::endl;
std::cout << "value = " << bits.to_ullong() << std::endl;
// This triggers the static_assert at compile time.
auto badbits = 2101010101010101010101010101010101010101010101010101010101010101_bits;
// This throws at run time.
std::bitset<64> badbits2("2101010101010101010101010101010101010101010101010101010101010101_bits");
}
The advantage is that a run-time exception is converted to a compile-time error.
You couldn't add the static assert to the bitset ctor taking a string (at least not without string template arguments).
It's very nice for mathematical code. Out of my mind I can see the use for the following operators:
deg for degrees. That makes writing absolute angles much more intuitive.
double operator ""_deg(long double d)
{
// returns radians
return d*M_PI/180;
}
It can also be used for various fixed point representations (which are still in use in the field of DSP and graphics).
int operator ""_fix(long double d)
{
// returns d as a 1.15.16 fixed point number
return (int)(d*65536.0f);
}
These look like nice examples how to use it. They help to make constants in code more readable. It's another tool to make code unreadable as well, but we already have so much tools abuse that one more does not hurt much.
UDLs are namespaced (and can be imported by using declarations/directives, but you cannot explicitly namespace a literal like 3.14std::i), which means there (hopefully) won't be a ton of clashes.
The fact that they can actually be templated (and constexpr'd) means that you can do some pretty powerful stuff with UDLs. Bigint authors will be really happy, as they can finally have arbitrarily large constants, calculated at compile time (via constexpr or templates).
I'm just sad that we won't see a couple useful literals in the standard (from the looks of it), like s for std::string and i for the imaginary unit.
The amount of coding time that will be saved by UDLs is actually not that high, but the readability will be vastly increased and more and more calculations can be shifted to compile-time for faster execution.
Bjarne Stroustrup talks about UDL's in this C++11 talk, in the first section on type-rich interfaces, around 20 minute mark.
His basic argument for UDLs takes the form of a syllogism:
"Trivial" types, i.e., built-in primitive types, can only catch trivial type errors. Interfaces with richer types allow the type system to catch more kinds of errors.
The kinds of type errors that richly typed code can catch have impact on real code. (He gives the example of the Mars Climate Orbiter, which infamously failed due to a dimensions error in an important constant).
In real code, units are rarely used. People don't use them, because incurring runtime compute or memory overhead to create rich types is too costly, and using pre-existing C++ templated unit code is so notationally ugly that no one uses it. (Empirically, no one uses it, even though the libraries have been around for a decade).
Therefore, in order to get engineers to use units in real code, we needed a device that (1) incurs no runtime overhead and (2) is notationally acceptable.
Let me add a little bit of context. For our work, user defined literals is much needed. We work on MDE (Model-Driven Engineering). We want to define models and metamodels in C++. We actually implemented a mapping from Ecore to C++ (EMF4CPP).
The problem comes when being able to define model elements as classes in C++. We are taking the approach of transforming the metamodel (Ecore) to templates with arguments. Arguments of the template are the structural characteristics of types and classes. For example, a class with two int attributes would be something like:
typedef ::ecore::Class< Attribute<int>, Attribute<int> > MyClass;
Hoever, it turns out that every element in a model or metamodel, usually has a name. We would like to write:
typedef ::ecore::Class< "MyClass", Attribute< "x", int>, Attribute<"y", int> > MyClass;
BUT, C++, nor C++0x don't allow this, as strings are prohibited as arguments to templates. You can write the name char by char, but this is admitedly a mess. With proper user-defined literals, we could write something similar. Say we use "_n" to identify model element names (I don't use the exact syntax, just to make an idea):
typedef ::ecore::Class< MyClass_n, Attribute< x_n, int>, Attribute<y_n, int> > MyClass;
Finally, having those definitions as templates helps us a lot to design algorithms for traversing the model elements, model transformations, etc. that are really efficient, because type information, identification, transformations, etc. are determined by the compiler at compile time.
Supporting compile-time dimension checking is the only justification required.
auto force = 2_N;
auto dx = 2_m;
auto energy = force * dx;
assert(energy == 4_J);
See for example PhysUnits-CT-Cpp11, a small C++11, C++14 header-only library for compile-time dimensional analysis and unit/quantity manipulation and conversion. Simpler than Boost.Units, does support unit symbol literals such as m, g, s, metric prefixes such as m, k, M, only depends on standard C++ library, SI-only, integral powers of dimensions.
Hmm... I have not thought about this feature yet. Your sample was well thought out and is certainly interesting. C++ is very powerful as it is now, but unfortunately the syntax used in pieces of code you read is at times overly complex. Readability is, if not all, then at least much. And such a feature would be geared for more readability. If I take your last example
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
... I wonder how you'd express that today. You'd have a KG and a LB class and you'd compare implicit objects:
assert(KG(1.0f) == LB(2.2f));
And that would do as well. With types that have longer names or types that you have no hopes of having such a nice constructor for sans writing an adapter, it might be a nice addition for on-the-fly implicit object creation and initialization. On the other hand, you can already create and initialize objects using methods, too.
But I agree with Nils on mathematics. C and C++ trigonometry functions for example require input in radians. I think in degrees though, so a very short implicit conversion like Nils posted is very nice.
Ultimately, it's going to be syntactic sugar however, but it will have a slight effect on readability. And it will probably be easier to write some expressions too (sin(180.0deg) is easier to write than sin(deg(180.0)). And then there will be people who abuse the concept. But then, language-abusive people should use very restrictive languages rather than something as expressive as C++.
Ah, my post says basically nothing except: it's going to be okay, the impact won't be too big. Let's not worry. :-)
I have never needed or wanted this feature (but this could be the Blub effect). My knee jerk reaction is that it's lame, and likely to appeal to the same people who think that it's cool to overload operator+ for any operation which could remotely be construed as adding.
C++ is usually very strict about the syntax used - barring the preprocessor there is not much you can use to define a custom syntax/grammar. E.g. we can overload existing operatos, but we cannot define new ones - IMO this is very much in tune with the spirit of C++.
I don't mind some ways for more customized source code - but the point chosen seems very isolated to me, which confuses me most.
Even intended use may make it much harder to read source code: an single letter may have vast-reaching side effects that in no way can be identified from the context. With symmetry to u, l and f, most developers will choose single letters.
This may also turn scoping into a problem, using single letters in global namespace will probably be considered bad practice, and the tools that are supposed mixing libraries easier (namespaces and descriptive identifiers) will probably defeat its purpose.
I see some merit in combination with "auto", also in combination with a unit library like boost units, but not enough to merit this adition.
I wonder, however, what clever ideas we come up with.
I used user literals for binary strings like this:
"asd\0\0\0\1"_b
using std::string(str, n) constructor so that \0 wouldn't cut the string in half. (The project does a lot of work with various file formats.)
This was helpful also when I ditched std::string in favor of a wrapper for std::vector.
Line noise in that thing is huge. Also it's horrible to read.
Let me know, did they reason that new syntax addition with any kind of examples? For instance, do they have couple of programs that already use C++0x?
For me, this part:
auto val = 3.14_i
Does not justify this part:
std::complex<double> operator ""_i(long double d) // cooked form
{
return std::complex(0, d);
}
Not even if you'd use the i-syntax in 1000 other lines as well. If you write, you probably write 10000 lines of something else along that as well. Especially when you will still probably write mostly everywhere this:
std::complex<double> val = 3.14i
'auto' -keyword may be justified though, only perhaps. But lets take just C++, because it's better than C++0x in this aspect.
std::complex<double> val = std::complex(0, 3.14);
It's like.. that simple. Even thought all the std and pointy brackets are just lame if you use it about everywhere. I don't start guessing what syntax there's in C++0x for turning std::complex under complex.
complex = std::complex<double>;
That's perhaps something straightforward, but I don't believe it's that simple in C++0x.
typedef std::complex<double> complex;
complex val = std::complex(0, 3.14);
Perhaps? >:)
Anyway, the point is: writing 3.14i instead of std::complex(0, 3.14); does not save you much time in overall except in few super special cases.

what does operator"" overloading mean in C++? [duplicate]

C++11 introduces user-defined literals which will allow the introduction of new literal syntax based on existing literals (int, hex, string, float) so that any type will be able to have a literal presentation.
Examples:
// imaginary numbers
std::complex<long double> operator "" _i(long double d) // cooked form
{
return std::complex<long double>(0, d);
}
auto val = 3.14_i; // val = complex<long double>(0, 3.14)
// binary values
int operator "" _B(const char*); // raw form
int answer = 101010_B; // answer = 42
// std::string
std::string operator "" _s(const char* str, size_t /*length*/)
{
return std::string(str);
}
auto hi = "hello"_s + " world"; // + works, "hello"_s is a string not a pointer
// units
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
At first glance this looks very cool but I'm wondering how applicable it really is, when I tried to think of having the suffixes _AD and _BC create dates I found that it's problematic due to operator order. 1974/01/06_AD would first evaluate 1974/01 (as plain ints) and only later the 06_AD (to say nothing of August and September having to be written without the 0 for octal reasons). This can be worked around by having the syntax be 1974-1/6_AD so that the operator evaluation order works but it's clunky.
So what my question boils down to is this, do you feel this feature will justify itself? What other literals would you like to define that will make your C++ code more readable?
Updated syntax to fit the final draft on June 2011
At first sight, it seems to be simple syntactic sugar.
But when looking deeper, we see it's more than syntactic sugar, as it extends the C++ user's options to create user-defined types that behave exactly like distinct built-in types. In this, this little "bonus" is a very interesting C++11 addition to C++.
Do we really need it in C++?
I see few uses in the code I wrote in the past years, but just because I didn't use it in C++ doesn't mean it's not interesting for another C++ developer.
We had used in C++ (and in C, I guess), compiler-defined literals, to type integer numbers as short or long integers, real numbers as float or double (or even long double), and character strings as normal or wide chars.
In C++, we had the possibility to create our own types (i.e. classes), with potentially no overhead (inlining, etc.). We had the possibility to add operators to their types, to have them behave like similar built-in types, which enables C++ developers to use matrices and complex numbers as naturally as they would have if these have been added to the language itself. We can even add cast operators (which is usually a bad idea, but sometimes, it's just the right solution).
We still missed one thing to have user-types behave as built-in types: user-defined literals.
So, I guess it's a natural evolution for the language, but to be as complete as possible: "If you want to create a type, and you want it to behave as much possible as a built-in types, here are the tools..."
I'd guess it's very similar to .NET's decision to make every primitive a struct, including booleans, integers, etc., and have all structs derive from Object. This decision alone puts .NET far beyond Java's reach when working with primitives, no matter how much boxing/unboxing hacks Java will add to its specification.
Do YOU really need it in C++?
This question is for YOU to answer. Not Bjarne Stroustrup. Not Herb Sutter. Not whatever member of C++ standard committee. This is why you have the choice in C++, and they won't restrict a useful notation to built-in types alone.
If you need it, then it is a welcome addition. If you don't, well... Don't use it. It will cost you nothing.
Welcome to C++, the language where features are optional.
Bloated??? Show me your complexes!!!
There is a difference between bloated and complex (pun intended).
Like shown by Niels at What new capabilities do user-defined literals add to C++?, being able to write a complex number is one of the two features added "recently" to C and C++:
// C89:
MyComplex z1 = { 1, 2 } ;
// C99: You'll note I is a macro, which can lead
// to very interesting situations...
double complex z1 = 1 + 2*I;
// C++:
std::complex<double> z1(1, 2) ;
// C++11: You'll note that "i" won't ever bother
// you elsewhere
std::complex<double> z1 = 1 + 2_i ;
Now, both C99 "double complex" type and C++ "std::complex" type are able to be multiplied, added, subtracted, etc., using operator overloading.
But in C99, they just added another type as a built-in type, and built-in operator overloading support. And they added another built-in literal feature.
In C++, they just used existing features of the language, saw that the literal feature was a natural evolution of the language, and thus added it.
In C, if you need the same notation enhancement for another type, you're out of luck until your lobbying to add your quantum wave functions (or 3D points, or whatever basic type you're using in your field of work) to the C standard as a built-in type succeeds.
In C++11, you just can do it yourself:
Point p = 25_x + 13_y + 3_z ; // 3D point
Is it bloated? No, the need is there, as shown by how both C and C++ complexes need a way to represent their literal complex values.
Is it wrongly designed? No, it's designed as every other C++ feature, with extensibility in mind.
Is it for notation purposes only? No, as it can even add type safety to your code.
For example, let's imagine a CSS oriented code:
css::Font::Size p0 = 12_pt ; // Ok
css::Font::Size p1 = 50_percent ; // Ok
css::Font::Size p2 = 15_px ; // Ok
css::Font::Size p3 = 10_em ; // Ok
css::Font::Size p4 = 15 ; // ERROR : Won't compile !
It is then very easy to enforce a strong typing to the assignment of values.
Is is dangerous?
Good question. Can these functions be namespaced? If yes, then Jackpot!
Anyway, like everything, you can kill yourself if a tool is used improperly. C is powerful, and you can shoot your head off if you misuse the C gun. C++ has the C gun, but also the scalpel, the taser, and whatever other tool you'll find in the toolkit. You can misuse the scalpel and bleed yourself to death. Or you can build very elegant and robust code.
So, like every C++ feature, do you really need it? It is the question you must answer before using it in C++. If you don't, it will cost you nothing. But if you do really need it, at least, the language won't let you down.
The date example?
Your error, it seems to me, is that you are mixing operators:
1974/01/06AD
^ ^ ^
This can't be avoided, because / being an operator, the compiler must interpret it. And, AFAIK, it is a good thing.
To find a solution for your problem, I would write the literal in some other way. For example:
"1974-01-06"_AD ; // ISO-like notation
"06/01/1974"_AD ; // french-date-like notation
"jan 06 1974"_AD ; // US-date-like notation
19740106_AD ; // integer-date-like notation
Personally, I would choose the integer and the ISO dates, but it depends on YOUR needs. Which is the whole point of letting the user define its own literal names.
Here's a case where there is an advantage to using user-defined literals instead of a constructor call:
#include <bitset>
#include <iostream>
template<char... Bits>
struct checkbits
{
static const bool valid = false;
};
template<char High, char... Bits>
struct checkbits<High, Bits...>
{
static const bool valid = (High == '0' || High == '1')
&& checkbits<Bits...>::valid;
};
template<char High>
struct checkbits<High>
{
static const bool valid = (High == '0' || High == '1');
};
template<char... Bits>
inline constexpr std::bitset<sizeof...(Bits)>
operator"" _bits() noexcept
{
static_assert(checkbits<Bits...>::valid, "invalid digit in binary string");
return std::bitset<sizeof...(Bits)>((char []){Bits..., '\0'});
}
int
main()
{
auto bits = 0101010101010101010101010101010101010101010101010101010101010101_bits;
std::cout << bits << std::endl;
std::cout << "size = " << bits.size() << std::endl;
std::cout << "count = " << bits.count() << std::endl;
std::cout << "value = " << bits.to_ullong() << std::endl;
// This triggers the static_assert at compile time.
auto badbits = 2101010101010101010101010101010101010101010101010101010101010101_bits;
// This throws at run time.
std::bitset<64> badbits2("2101010101010101010101010101010101010101010101010101010101010101_bits");
}
The advantage is that a run-time exception is converted to a compile-time error.
You couldn't add the static assert to the bitset ctor taking a string (at least not without string template arguments).
It's very nice for mathematical code. Out of my mind I can see the use for the following operators:
deg for degrees. That makes writing absolute angles much more intuitive.
double operator ""_deg(long double d)
{
// returns radians
return d*M_PI/180;
}
It can also be used for various fixed point representations (which are still in use in the field of DSP and graphics).
int operator ""_fix(long double d)
{
// returns d as a 1.15.16 fixed point number
return (int)(d*65536.0f);
}
These look like nice examples how to use it. They help to make constants in code more readable. It's another tool to make code unreadable as well, but we already have so much tools abuse that one more does not hurt much.
UDLs are namespaced (and can be imported by using declarations/directives, but you cannot explicitly namespace a literal like 3.14std::i), which means there (hopefully) won't be a ton of clashes.
The fact that they can actually be templated (and constexpr'd) means that you can do some pretty powerful stuff with UDLs. Bigint authors will be really happy, as they can finally have arbitrarily large constants, calculated at compile time (via constexpr or templates).
I'm just sad that we won't see a couple useful literals in the standard (from the looks of it), like s for std::string and i for the imaginary unit.
The amount of coding time that will be saved by UDLs is actually not that high, but the readability will be vastly increased and more and more calculations can be shifted to compile-time for faster execution.
Bjarne Stroustrup talks about UDL's in this C++11 talk, in the first section on type-rich interfaces, around 20 minute mark.
His basic argument for UDLs takes the form of a syllogism:
"Trivial" types, i.e., built-in primitive types, can only catch trivial type errors. Interfaces with richer types allow the type system to catch more kinds of errors.
The kinds of type errors that richly typed code can catch have impact on real code. (He gives the example of the Mars Climate Orbiter, which infamously failed due to a dimensions error in an important constant).
In real code, units are rarely used. People don't use them, because incurring runtime compute or memory overhead to create rich types is too costly, and using pre-existing C++ templated unit code is so notationally ugly that no one uses it. (Empirically, no one uses it, even though the libraries have been around for a decade).
Therefore, in order to get engineers to use units in real code, we needed a device that (1) incurs no runtime overhead and (2) is notationally acceptable.
Let me add a little bit of context. For our work, user defined literals is much needed. We work on MDE (Model-Driven Engineering). We want to define models and metamodels in C++. We actually implemented a mapping from Ecore to C++ (EMF4CPP).
The problem comes when being able to define model elements as classes in C++. We are taking the approach of transforming the metamodel (Ecore) to templates with arguments. Arguments of the template are the structural characteristics of types and classes. For example, a class with two int attributes would be something like:
typedef ::ecore::Class< Attribute<int>, Attribute<int> > MyClass;
Hoever, it turns out that every element in a model or metamodel, usually has a name. We would like to write:
typedef ::ecore::Class< "MyClass", Attribute< "x", int>, Attribute<"y", int> > MyClass;
BUT, C++, nor C++0x don't allow this, as strings are prohibited as arguments to templates. You can write the name char by char, but this is admitedly a mess. With proper user-defined literals, we could write something similar. Say we use "_n" to identify model element names (I don't use the exact syntax, just to make an idea):
typedef ::ecore::Class< MyClass_n, Attribute< x_n, int>, Attribute<y_n, int> > MyClass;
Finally, having those definitions as templates helps us a lot to design algorithms for traversing the model elements, model transformations, etc. that are really efficient, because type information, identification, transformations, etc. are determined by the compiler at compile time.
Supporting compile-time dimension checking is the only justification required.
auto force = 2_N;
auto dx = 2_m;
auto energy = force * dx;
assert(energy == 4_J);
See for example PhysUnits-CT-Cpp11, a small C++11, C++14 header-only library for compile-time dimensional analysis and unit/quantity manipulation and conversion. Simpler than Boost.Units, does support unit symbol literals such as m, g, s, metric prefixes such as m, k, M, only depends on standard C++ library, SI-only, integral powers of dimensions.
Hmm... I have not thought about this feature yet. Your sample was well thought out and is certainly interesting. C++ is very powerful as it is now, but unfortunately the syntax used in pieces of code you read is at times overly complex. Readability is, if not all, then at least much. And such a feature would be geared for more readability. If I take your last example
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
... I wonder how you'd express that today. You'd have a KG and a LB class and you'd compare implicit objects:
assert(KG(1.0f) == LB(2.2f));
And that would do as well. With types that have longer names or types that you have no hopes of having such a nice constructor for sans writing an adapter, it might be a nice addition for on-the-fly implicit object creation and initialization. On the other hand, you can already create and initialize objects using methods, too.
But I agree with Nils on mathematics. C and C++ trigonometry functions for example require input in radians. I think in degrees though, so a very short implicit conversion like Nils posted is very nice.
Ultimately, it's going to be syntactic sugar however, but it will have a slight effect on readability. And it will probably be easier to write some expressions too (sin(180.0deg) is easier to write than sin(deg(180.0)). And then there will be people who abuse the concept. But then, language-abusive people should use very restrictive languages rather than something as expressive as C++.
Ah, my post says basically nothing except: it's going to be okay, the impact won't be too big. Let's not worry. :-)
I have never needed or wanted this feature (but this could be the Blub effect). My knee jerk reaction is that it's lame, and likely to appeal to the same people who think that it's cool to overload operator+ for any operation which could remotely be construed as adding.
C++ is usually very strict about the syntax used - barring the preprocessor there is not much you can use to define a custom syntax/grammar. E.g. we can overload existing operatos, but we cannot define new ones - IMO this is very much in tune with the spirit of C++.
I don't mind some ways for more customized source code - but the point chosen seems very isolated to me, which confuses me most.
Even intended use may make it much harder to read source code: an single letter may have vast-reaching side effects that in no way can be identified from the context. With symmetry to u, l and f, most developers will choose single letters.
This may also turn scoping into a problem, using single letters in global namespace will probably be considered bad practice, and the tools that are supposed mixing libraries easier (namespaces and descriptive identifiers) will probably defeat its purpose.
I see some merit in combination with "auto", also in combination with a unit library like boost units, but not enough to merit this adition.
I wonder, however, what clever ideas we come up with.
I used user literals for binary strings like this:
"asd\0\0\0\1"_b
using std::string(str, n) constructor so that \0 wouldn't cut the string in half. (The project does a lot of work with various file formats.)
This was helpful also when I ditched std::string in favor of a wrapper for std::vector.
Line noise in that thing is huge. Also it's horrible to read.
Let me know, did they reason that new syntax addition with any kind of examples? For instance, do they have couple of programs that already use C++0x?
For me, this part:
auto val = 3.14_i
Does not justify this part:
std::complex<double> operator ""_i(long double d) // cooked form
{
return std::complex(0, d);
}
Not even if you'd use the i-syntax in 1000 other lines as well. If you write, you probably write 10000 lines of something else along that as well. Especially when you will still probably write mostly everywhere this:
std::complex<double> val = 3.14i
'auto' -keyword may be justified though, only perhaps. But lets take just C++, because it's better than C++0x in this aspect.
std::complex<double> val = std::complex(0, 3.14);
It's like.. that simple. Even thought all the std and pointy brackets are just lame if you use it about everywhere. I don't start guessing what syntax there's in C++0x for turning std::complex under complex.
complex = std::complex<double>;
That's perhaps something straightforward, but I don't believe it's that simple in C++0x.
typedef std::complex<double> complex;
complex val = std::complex(0, 3.14);
Perhaps? >:)
Anyway, the point is: writing 3.14i instead of std::complex(0, 3.14); does not save you much time in overall except in few super special cases.

Return Set for a Command Parser

I need to write a parser to parse commands. 5 such commands are:
"a=10"
"b=foo"
"c=10,10"
"clear d"
"c push_back 2"
In the case of the first example, set is the command, a is the object and 10 is the value.
What do you think the parser should return for each line above?
Here is my idea:
"a=10" -> SET (COMMAND_ENUM), INT (VALUE_TYPE), "a", ("10")
"b=foo" -> SET (COMMAND_ENUM), STRING (VALUE_TYPE), "b", ("foo")
Is this a good approach? What is the standard approach for this problem? Should I dispatch instead?
I have a function which checks the type associated with an object. For example, a above is of type INT and must be assigned an INT value, otherwise the parser should return or throw an error of some sort. I also have a convert function for converting values from strings to the desired type. These throw if the conversion is not possible. If the parser tries to convert the values from strings to the required type, then it is probably a good idea to return them via a boost::variant.
You need to come up with at least a semi-formal grammar for the command language you want to recognize, since you've left a whole lot of things really vaguely specified (e.g. in b=foo you want b to be a variable name but foo to be a string literal. How do you distinguish them?. Does a sequence of characters represent an identifier if it's on the right side of an assignment, but a literal if it's on the left side? Or does a single character represent an identifier, but multiple characters represent a literal?) In c=10,10 does 10,10 represent a list or a vector? Writing a grammar will at least force you to think about such things, and it will also serve at least as a guide to how to write your parser (at most it will be something that can be automatically translated into your parser).
You're on the right track by thinking of how statements should be represented as Abstract Syntax Trees (ASTs), but you need to take a step backwards and look at what you want in terms of concrete syntax.

What new capabilities do user-defined literals add to C++?

C++11 introduces user-defined literals which will allow the introduction of new literal syntax based on existing literals (int, hex, string, float) so that any type will be able to have a literal presentation.
Examples:
// imaginary numbers
std::complex<long double> operator "" _i(long double d) // cooked form
{
return std::complex<long double>(0, d);
}
auto val = 3.14_i; // val = complex<long double>(0, 3.14)
// binary values
int operator "" _B(const char*); // raw form
int answer = 101010_B; // answer = 42
// std::string
std::string operator "" _s(const char* str, size_t /*length*/)
{
return std::string(str);
}
auto hi = "hello"_s + " world"; // + works, "hello"_s is a string not a pointer
// units
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
At first glance this looks very cool but I'm wondering how applicable it really is, when I tried to think of having the suffixes _AD and _BC create dates I found that it's problematic due to operator order. 1974/01/06_AD would first evaluate 1974/01 (as plain ints) and only later the 06_AD (to say nothing of August and September having to be written without the 0 for octal reasons). This can be worked around by having the syntax be 1974-1/6_AD so that the operator evaluation order works but it's clunky.
So what my question boils down to is this, do you feel this feature will justify itself? What other literals would you like to define that will make your C++ code more readable?
Updated syntax to fit the final draft on June 2011
At first sight, it seems to be simple syntactic sugar.
But when looking deeper, we see it's more than syntactic sugar, as it extends the C++ user's options to create user-defined types that behave exactly like distinct built-in types. In this, this little "bonus" is a very interesting C++11 addition to C++.
Do we really need it in C++?
I see few uses in the code I wrote in the past years, but just because I didn't use it in C++ doesn't mean it's not interesting for another C++ developer.
We had used in C++ (and in C, I guess), compiler-defined literals, to type integer numbers as short or long integers, real numbers as float or double (or even long double), and character strings as normal or wide chars.
In C++, we had the possibility to create our own types (i.e. classes), with potentially no overhead (inlining, etc.). We had the possibility to add operators to their types, to have them behave like similar built-in types, which enables C++ developers to use matrices and complex numbers as naturally as they would have if these have been added to the language itself. We can even add cast operators (which is usually a bad idea, but sometimes, it's just the right solution).
We still missed one thing to have user-types behave as built-in types: user-defined literals.
So, I guess it's a natural evolution for the language, but to be as complete as possible: "If you want to create a type, and you want it to behave as much possible as a built-in types, here are the tools..."
I'd guess it's very similar to .NET's decision to make every primitive a struct, including booleans, integers, etc., and have all structs derive from Object. This decision alone puts .NET far beyond Java's reach when working with primitives, no matter how much boxing/unboxing hacks Java will add to its specification.
Do YOU really need it in C++?
This question is for YOU to answer. Not Bjarne Stroustrup. Not Herb Sutter. Not whatever member of C++ standard committee. This is why you have the choice in C++, and they won't restrict a useful notation to built-in types alone.
If you need it, then it is a welcome addition. If you don't, well... Don't use it. It will cost you nothing.
Welcome to C++, the language where features are optional.
Bloated??? Show me your complexes!!!
There is a difference between bloated and complex (pun intended).
Like shown by Niels at What new capabilities do user-defined literals add to C++?, being able to write a complex number is one of the two features added "recently" to C and C++:
// C89:
MyComplex z1 = { 1, 2 } ;
// C99: You'll note I is a macro, which can lead
// to very interesting situations...
double complex z1 = 1 + 2*I;
// C++:
std::complex<double> z1(1, 2) ;
// C++11: You'll note that "i" won't ever bother
// you elsewhere
std::complex<double> z1 = 1 + 2_i ;
Now, both C99 "double complex" type and C++ "std::complex" type are able to be multiplied, added, subtracted, etc., using operator overloading.
But in C99, they just added another type as a built-in type, and built-in operator overloading support. And they added another built-in literal feature.
In C++, they just used existing features of the language, saw that the literal feature was a natural evolution of the language, and thus added it.
In C, if you need the same notation enhancement for another type, you're out of luck until your lobbying to add your quantum wave functions (or 3D points, or whatever basic type you're using in your field of work) to the C standard as a built-in type succeeds.
In C++11, you just can do it yourself:
Point p = 25_x + 13_y + 3_z ; // 3D point
Is it bloated? No, the need is there, as shown by how both C and C++ complexes need a way to represent their literal complex values.
Is it wrongly designed? No, it's designed as every other C++ feature, with extensibility in mind.
Is it for notation purposes only? No, as it can even add type safety to your code.
For example, let's imagine a CSS oriented code:
css::Font::Size p0 = 12_pt ; // Ok
css::Font::Size p1 = 50_percent ; // Ok
css::Font::Size p2 = 15_px ; // Ok
css::Font::Size p3 = 10_em ; // Ok
css::Font::Size p4 = 15 ; // ERROR : Won't compile !
It is then very easy to enforce a strong typing to the assignment of values.
Is is dangerous?
Good question. Can these functions be namespaced? If yes, then Jackpot!
Anyway, like everything, you can kill yourself if a tool is used improperly. C is powerful, and you can shoot your head off if you misuse the C gun. C++ has the C gun, but also the scalpel, the taser, and whatever other tool you'll find in the toolkit. You can misuse the scalpel and bleed yourself to death. Or you can build very elegant and robust code.
So, like every C++ feature, do you really need it? It is the question you must answer before using it in C++. If you don't, it will cost you nothing. But if you do really need it, at least, the language won't let you down.
The date example?
Your error, it seems to me, is that you are mixing operators:
1974/01/06AD
^ ^ ^
This can't be avoided, because / being an operator, the compiler must interpret it. And, AFAIK, it is a good thing.
To find a solution for your problem, I would write the literal in some other way. For example:
"1974-01-06"_AD ; // ISO-like notation
"06/01/1974"_AD ; // french-date-like notation
"jan 06 1974"_AD ; // US-date-like notation
19740106_AD ; // integer-date-like notation
Personally, I would choose the integer and the ISO dates, but it depends on YOUR needs. Which is the whole point of letting the user define its own literal names.
Here's a case where there is an advantage to using user-defined literals instead of a constructor call:
#include <bitset>
#include <iostream>
template<char... Bits>
struct checkbits
{
static const bool valid = false;
};
template<char High, char... Bits>
struct checkbits<High, Bits...>
{
static const bool valid = (High == '0' || High == '1')
&& checkbits<Bits...>::valid;
};
template<char High>
struct checkbits<High>
{
static const bool valid = (High == '0' || High == '1');
};
template<char... Bits>
inline constexpr std::bitset<sizeof...(Bits)>
operator"" _bits() noexcept
{
static_assert(checkbits<Bits...>::valid, "invalid digit in binary string");
return std::bitset<sizeof...(Bits)>((char []){Bits..., '\0'});
}
int
main()
{
auto bits = 0101010101010101010101010101010101010101010101010101010101010101_bits;
std::cout << bits << std::endl;
std::cout << "size = " << bits.size() << std::endl;
std::cout << "count = " << bits.count() << std::endl;
std::cout << "value = " << bits.to_ullong() << std::endl;
// This triggers the static_assert at compile time.
auto badbits = 2101010101010101010101010101010101010101010101010101010101010101_bits;
// This throws at run time.
std::bitset<64> badbits2("2101010101010101010101010101010101010101010101010101010101010101_bits");
}
The advantage is that a run-time exception is converted to a compile-time error.
You couldn't add the static assert to the bitset ctor taking a string (at least not without string template arguments).
It's very nice for mathematical code. Out of my mind I can see the use for the following operators:
deg for degrees. That makes writing absolute angles much more intuitive.
double operator ""_deg(long double d)
{
// returns radians
return d*M_PI/180;
}
It can also be used for various fixed point representations (which are still in use in the field of DSP and graphics).
int operator ""_fix(long double d)
{
// returns d as a 1.15.16 fixed point number
return (int)(d*65536.0f);
}
These look like nice examples how to use it. They help to make constants in code more readable. It's another tool to make code unreadable as well, but we already have so much tools abuse that one more does not hurt much.
UDLs are namespaced (and can be imported by using declarations/directives, but you cannot explicitly namespace a literal like 3.14std::i), which means there (hopefully) won't be a ton of clashes.
The fact that they can actually be templated (and constexpr'd) means that you can do some pretty powerful stuff with UDLs. Bigint authors will be really happy, as they can finally have arbitrarily large constants, calculated at compile time (via constexpr or templates).
I'm just sad that we won't see a couple useful literals in the standard (from the looks of it), like s for std::string and i for the imaginary unit.
The amount of coding time that will be saved by UDLs is actually not that high, but the readability will be vastly increased and more and more calculations can be shifted to compile-time for faster execution.
Bjarne Stroustrup talks about UDL's in this C++11 talk, in the first section on type-rich interfaces, around 20 minute mark.
His basic argument for UDLs takes the form of a syllogism:
"Trivial" types, i.e., built-in primitive types, can only catch trivial type errors. Interfaces with richer types allow the type system to catch more kinds of errors.
The kinds of type errors that richly typed code can catch have impact on real code. (He gives the example of the Mars Climate Orbiter, which infamously failed due to a dimensions error in an important constant).
In real code, units are rarely used. People don't use them, because incurring runtime compute or memory overhead to create rich types is too costly, and using pre-existing C++ templated unit code is so notationally ugly that no one uses it. (Empirically, no one uses it, even though the libraries have been around for a decade).
Therefore, in order to get engineers to use units in real code, we needed a device that (1) incurs no runtime overhead and (2) is notationally acceptable.
Let me add a little bit of context. For our work, user defined literals is much needed. We work on MDE (Model-Driven Engineering). We want to define models and metamodels in C++. We actually implemented a mapping from Ecore to C++ (EMF4CPP).
The problem comes when being able to define model elements as classes in C++. We are taking the approach of transforming the metamodel (Ecore) to templates with arguments. Arguments of the template are the structural characteristics of types and classes. For example, a class with two int attributes would be something like:
typedef ::ecore::Class< Attribute<int>, Attribute<int> > MyClass;
Hoever, it turns out that every element in a model or metamodel, usually has a name. We would like to write:
typedef ::ecore::Class< "MyClass", Attribute< "x", int>, Attribute<"y", int> > MyClass;
BUT, C++, nor C++0x don't allow this, as strings are prohibited as arguments to templates. You can write the name char by char, but this is admitedly a mess. With proper user-defined literals, we could write something similar. Say we use "_n" to identify model element names (I don't use the exact syntax, just to make an idea):
typedef ::ecore::Class< MyClass_n, Attribute< x_n, int>, Attribute<y_n, int> > MyClass;
Finally, having those definitions as templates helps us a lot to design algorithms for traversing the model elements, model transformations, etc. that are really efficient, because type information, identification, transformations, etc. are determined by the compiler at compile time.
Supporting compile-time dimension checking is the only justification required.
auto force = 2_N;
auto dx = 2_m;
auto energy = force * dx;
assert(energy == 4_J);
See for example PhysUnits-CT-Cpp11, a small C++11, C++14 header-only library for compile-time dimensional analysis and unit/quantity manipulation and conversion. Simpler than Boost.Units, does support unit symbol literals such as m, g, s, metric prefixes such as m, k, M, only depends on standard C++ library, SI-only, integral powers of dimensions.
Hmm... I have not thought about this feature yet. Your sample was well thought out and is certainly interesting. C++ is very powerful as it is now, but unfortunately the syntax used in pieces of code you read is at times overly complex. Readability is, if not all, then at least much. And such a feature would be geared for more readability. If I take your last example
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
... I wonder how you'd express that today. You'd have a KG and a LB class and you'd compare implicit objects:
assert(KG(1.0f) == LB(2.2f));
And that would do as well. With types that have longer names or types that you have no hopes of having such a nice constructor for sans writing an adapter, it might be a nice addition for on-the-fly implicit object creation and initialization. On the other hand, you can already create and initialize objects using methods, too.
But I agree with Nils on mathematics. C and C++ trigonometry functions for example require input in radians. I think in degrees though, so a very short implicit conversion like Nils posted is very nice.
Ultimately, it's going to be syntactic sugar however, but it will have a slight effect on readability. And it will probably be easier to write some expressions too (sin(180.0deg) is easier to write than sin(deg(180.0)). And then there will be people who abuse the concept. But then, language-abusive people should use very restrictive languages rather than something as expressive as C++.
Ah, my post says basically nothing except: it's going to be okay, the impact won't be too big. Let's not worry. :-)
I have never needed or wanted this feature (but this could be the Blub effect). My knee jerk reaction is that it's lame, and likely to appeal to the same people who think that it's cool to overload operator+ for any operation which could remotely be construed as adding.
C++ is usually very strict about the syntax used - barring the preprocessor there is not much you can use to define a custom syntax/grammar. E.g. we can overload existing operatos, but we cannot define new ones - IMO this is very much in tune with the spirit of C++.
I don't mind some ways for more customized source code - but the point chosen seems very isolated to me, which confuses me most.
Even intended use may make it much harder to read source code: an single letter may have vast-reaching side effects that in no way can be identified from the context. With symmetry to u, l and f, most developers will choose single letters.
This may also turn scoping into a problem, using single letters in global namespace will probably be considered bad practice, and the tools that are supposed mixing libraries easier (namespaces and descriptive identifiers) will probably defeat its purpose.
I see some merit in combination with "auto", also in combination with a unit library like boost units, but not enough to merit this adition.
I wonder, however, what clever ideas we come up with.
I used user literals for binary strings like this:
"asd\0\0\0\1"_b
using std::string(str, n) constructor so that \0 wouldn't cut the string in half. (The project does a lot of work with various file formats.)
This was helpful also when I ditched std::string in favor of a wrapper for std::vector.
Line noise in that thing is huge. Also it's horrible to read.
Let me know, did they reason that new syntax addition with any kind of examples? For instance, do they have couple of programs that already use C++0x?
For me, this part:
auto val = 3.14_i
Does not justify this part:
std::complex<double> operator ""_i(long double d) // cooked form
{
return std::complex(0, d);
}
Not even if you'd use the i-syntax in 1000 other lines as well. If you write, you probably write 10000 lines of something else along that as well. Especially when you will still probably write mostly everywhere this:
std::complex<double> val = 3.14i
'auto' -keyword may be justified though, only perhaps. But lets take just C++, because it's better than C++0x in this aspect.
std::complex<double> val = std::complex(0, 3.14);
It's like.. that simple. Even thought all the std and pointy brackets are just lame if you use it about everywhere. I don't start guessing what syntax there's in C++0x for turning std::complex under complex.
complex = std::complex<double>;
That's perhaps something straightforward, but I don't believe it's that simple in C++0x.
typedef std::complex<double> complex;
complex val = std::complex(0, 3.14);
Perhaps? >:)
Anyway, the point is: writing 3.14i instead of std::complex(0, 3.14); does not save you much time in overall except in few super special cases.

Why can't variable names start with numbers?

I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"
I couldn't come up with an answer except that some numbers can have text in them (123456L, 123456U) and that wouldn't be possible if the compilers were thinking everything with some amount of alpha characters was a variable name.
Was that the right answer? Are there any more reasons?
string 2BeOrNot2Be = "that is the question"; // Why won't this compile?
Because then a string of digits would be a valid identifier as well as a valid number.
int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";
Well think about this:
int 2d = 42;
double a = 2d;
What is a? 2.0? or 42?
Hint, if you don't get it, d after a number means the number before it is a double literal
It's a convention now, but it started out as a technical requirement.
In the old days, parsers of languages such as FORTRAN or BASIC did not require the uses of spaces. So, basically, the following are identical:
10 V1=100
20 PRINT V1
and
10V1=100
20PRINTV1
Now suppose that numeral prefixes were allowed. How would you interpret this?
101V=100
as
10 1V = 100
or as
101 V = 100
or as
1 01V = 100
So, this was made illegal.
Because backtracking is avoided in lexical analysis while compiling. A variable like:
Apple;
the compiler will know it's a identifier right away when it meets letter 'A'.
However a variable like:
123apple;
compiler won't be able to decide if it's a number or identifier until it hits 'a', and it needs backtracking as a result.
Compilers/parsers/lexical analyzers was a long, long time ago for me, but I think I remember there being difficulty in unambiguosly determining whether a numeric character in the compilation unit represented a literal or an identifier.
Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.
This goes way back - before special notations to denote storage or numeric base.
I agree it would be handy to allow identifiers to begin with a digit. One or two people have mentioned that you can get around this restriction by prepending an underscore to your identifier, but that's really ugly.
I think part of the problem comes from number literals such as 0xdeadbeef, which make it hard to come up with easy to remember rules for identifiers that can start with a digit. One way to do it might be to allow anything matching [A-Za-z_]+ that is NOT a keyword or number literal. The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P.
When I was first learning C, I remember feeling the rules for variable names were arbitrary and restrictive. Worst of all, they were hard to remember, so I gave up trying to learn them. I just did what felt right, and it worked pretty well. Now that I've learned alot more, it doesn't seem so bad, and I finally got around to learning it right.
It's likely a decision that came for a few reasons, when you're parsing the token you only have to look at the first character to determine if it's an identifier or literal and then send it to the correct function for processing. So that's a performance optimization.
The other option would be to check if it's not a literal and leave the domain of identifiers to be the universe minus the literals. But to do this you would have to examine every character of every token to know how to classify it.
There is also the stylistic implications identifiers are supposed to be mnemonics so words are much easier to remember than numbers. When a lot of the original languages were being written setting the styles for the next few decades they weren't thinking about substituting "2" for "to".
Variable names cannot start with a digit, because it can cause some problems like below:
int a = 2;
int 2 = 5;
int c = 2 * a;
what is the value of c? is 4, or is 10!
another example:
float 5 = 25;
float b = 5.5;
is first 5 a number, or is an object (. operator)
There is a similar problem with second 5.
Maybe, there are some other reasons. So, we shouldn't use any digit in the beginnig of a variable name.
The restriction is arbitrary. Various Lisps permit symbol names to begin with numerals.
COBOL allows variables to begin with a digit.
Use of a digit to begin a variable name makes error checking during compilation or interpertation a lot more complicated.
Allowing use of variable names that began like a number would probably cause huge problems for the language designers. During source code parsing, whenever a compiler/interpreter encountered a token beginning with a digit where a variable name was expected, it would have to search through a huge, complicated set of rules to determine whether the token was really a variable, or an error. The added complexity added to the language parser may not justify this feature.
As far back as I can remember (about 40 years), I don't think that I have ever used a language that allowed use of a digit to begin variable names. I'm sure that this was done at least once. Maybe, someone here has actually seen this somewhere.
As several people have noticed, there is a lot of historical baggage about valid formats for variable names. And language designers are always influenced by what they know when they create new languages.
That said, pretty much all of the time a language doesn't allow variable names to begin with numbers is because those are the rules of the language design. Often it is because such a simple rule makes the parsing and lexing of the language vastly easier. Not all language designers know this is the real reason, though. Modern lexing tools help, because if you tried to define it as permissible, they will give you parsing conflicts.
OTOH, if your language has a uniquely identifiable character to herald variable names, it is possible to set it up for them to begin with a number. Similar rule variations can also be used to allow spaces in variable names. But the resulting language is likely to not to resemble any popular conventional language very much, if at all.
For an example of a fairly simple HTML templating language that does permit variables to begin with numbers and have embedded spaces, look at Qompose.
Because if you allowed keyword and identifier to begin with numberic characters, the lexer (part of the compiler) couldn't readily differentiate between the start of a numeric literal and a keyword without getting a whole lot more complicated (and slower).
C++ can't have it because the language designers made it a rule. If you were to create your own language, you could certainly allow it, but you would probably run into the same problems they did and decide not to allow it. Examples of variable names that would cause problems:
0x, 2d, 5555
One of the key problems about relaxing syntactic conventions is that it introduces cognitive dissonance into the coding process. How you think about your code could be deeply influenced by the lack of clarity this would introduce.
Wasn't it Dykstra who said that the "most important aspect of any tool is its effect on its user"?
The compiler has 7 phase as follows:
Lexical analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Symbol Table
Backtracking is avoided in the lexical analysis phase while compiling the piece of code. The variable like Apple, the compiler will know its an identifier right away when it meets letter ‘A’ character in the lexical Analysis phase. However, a variable like 123apple, the compiler won’t be able to decide if its a number or identifier until it hits ‘a’ and it needs backtracking to go in the lexical analysis phase to identify that it is a variable. But it is not supported in the compiler.
When you’re parsing the token you only have to look at the first character to determine if it’s an identifier or literal and then send it to the correct function for processing. So that’s a performance optimization.
Probably because it makes it easier for the human to tell whether it's a number or an identifier, and because of tradition. Having identifiers that could begin with a digit wouldn't complicate the lexical scans all that much.
Not all languages have forbidden identifiers beginning with a digit. In Forth, they could be numbers, and small integers were normally defined as Forth words (essentially identifiers), since it was faster to read "2" as a routine to push a 2 onto the stack than to recognize "2" as a number whose value was 2. (In processing input from the programmer or the disk block, the Forth system would split up the input according to spaces. It would try to look the token up in the dictionary to see if it was a defined word, and if not would attempt to translate it into a number, and if not would flag an error.)
Suppose you did allow symbol names to begin with numbers. Now suppose you want to name a variable 12345foobar. How would you differentiate this from 12345? It's actually not terribly difficult to do with a regular expression. The problem is actually one of performance. I can't really explain why this is in great detail, but it essentially boils down to the fact that differentiating 12345foobar from 12345 requires backtracking. This makes the regular expression non-deterministic.
There's a much better explanation of this here.
it is easy for a compiler to identify a variable using ASCII on memory location rather than number .
I think the simple answer is that it can, the restriction is language based. In C++ and many others it can't because the language doesn't support it. It's not built into the rules to allow that.
The question is akin to asking why can't the King move four spaces at a time in Chess? It's because in Chess that is an illegal move. Can it in another game sure. It just depends on the rules being played by.
Originally it was simply because it is easier to remember (you can give it more meaning) variable names as strings rather than numbers although numbers can be included within the string to enhance the meaning of the string or allow the use of the same variable name but have it designated as having a separate, but close meaning or context. For example loop1, loop2 etc would always let you know that you were in a loop and/or loop 2 was a loop within loop1.
Which would you prefer (has more meaning) as a variable: address or 1121298? Which is easier to remember?
However, if the language uses something to denote that it not just text or numbers (such as the $ in $address) it really shouldn't make a difference as that would tell the compiler that what follows is to be treated as a variable (in this case).
In any case it comes down to what the language designers want to use as the rules for their language.
The variable may be considered as a value also during compile time by the compiler
so the value may call the value again and again recursively
Backtracking is avoided in lexical analysis phase while compiling the piece of code. The variable like Apple; , the compiler will know its a identifier right away when it meets letter ‘A’ character in the lexical Analysis phase. However, a variable like 123apple; , compiler won’t be able to decide if its a number or identifier until it hits ‘a’ and it needs backtracking to go in the lexical analysis phase to identify that it is a variable. But it is not supported in compiler.
Reference
There could be nothing wrong with it when comes into declaring variable.but there is some ambiguity when it tries to use that variable somewhere else like this :
let 1 = "Hello world!"
print(1)
print(1)
print is a generic method that accepts all types of variable. so in that situation compiler does not know which (1) the programmer refers to : the 1 of integer value or the 1 that store a string value.
maybe better for compiler in this situation to allows to define something like that but when trying to use this ambiguous stuff, bring an error with correction capability to how gonna fix that error and clear this ambiguity.