Where does the k prefix for constants come from? - c++

It's a pretty common practice that constants are prefixed with k (e.g. k_pi). But what does the k mean?
Is it simply that c already meant char?

It's a historical oddity, still common practice among teams who like to blindly apply coding standards that they don't understand.
Long ago, most commercial programming languages were weakly typed; automatic type checking, which we take for granted now, was still mostly an academic topic. This meant that it was easy to write code with category errors; it would compile and run, but go wrong in ways that were hard to diagnose. To reduce these errors, a chap called Simonyi suggested that you begin each variable name with a tag to indicate its (conceptual) type, making it easier to spot when variables were misused. Since he was Hungarian, the practice became known as "Hungarian notation".
Some time later, as typed languages (particularly C) became more popular, some idiots heard that this was a good idea, but didn't understand its purpose. They proposed adding redundant tags to each variable, to indicate its declared type. The only use for such tags is to make it easier to check the type of a variable, unless someone has changed the type and forgotten to update the tag, in which case they are actively harmful.
The second (useless) form was easier to describe and enforce, so it was blindly adopted by many, many teams; decades later, you still see it used, and even advocated, from time to time.
"c" was the tag for type "char", so it couldn't also be used for "const"; so "k" was chosen, since that's the first letter of "konstant" in German, and is widely used for constants in mathematics.

I haven't seen it that much, but maybe it comes from certain languages' (the Germanic ones in particular) spelling of the word constant: konstant.

Don't use Hungarian Notation. If you want constants to stand out, make them all caps.
As a side note: there are a lot of things in the Google Coding Standards that are poor practice (in terms of code readability). That is what happens when you design a coding standard by committee.

It means the value is k-onstant.

I think mathematical convention was the precedent. k is used in maths all the time as just some constant.

K stands for konstant, a wordplay on constant. It relates to coding style.
It's just a matter of preference; some people and projects use it, which usually means they also embrace Hungarian notation, and many don't. It's not that important.
If you're unsure what a prefix or style might mean, always check if the project has a coding style reference and read that.

Actually, whenever I define constants in TypeScript, I do something like this -
const NODE_ENV = 'production';
But recently, I saw that the k prefix is being used in the Flutter SDK. It makes sense to me to keep using the k prefix because it helps your editor/IDE search out constants in your codebase.

It's a convention, probably from math. But there are other suggestions for constants too; for example, Kernighan and Ritchie in their book "The C Programming Language" suggest writing constants' names in capital letters (e.g. #define MAX 55).

I think it means coefficient (as k often does in math).


Is it a good style to write constants on the left of equal to == in If statement in C++?

Okay, we know that the following two lines are equivalent -
(0 == i)
(i == 0)
Also, the first method was encouraged in the past because that would have allowed the compiler to give an error message if you accidentally used '=' instead of '=='.
My question is - in today's generation of pretty slick IDEs and intelligent compilers, do you still recommend the first method?
In particular, this question popped into my mind when I saw the following code -
if(DialogResult.OK == MessageBox.Show("Message")) ...
In my opinion, I would never recommend the above. Any second opinions?
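For concreteness, here is a minimal sketch of the trap in question (the variable names are illustrative):
int main()
{
    int i = 0;
    if (i = 1) { }     // typo for ==: assigns 1 to i, then tests it; compiles (at best a warning)
    // if (1 = i) { }  // the same typo with the literal first: a hard compile error
    if (1 == i) { }    // the intended comparison
    return 0;
}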
I prefer the second one, (i == 0), because it feels much more natural when reading it. You ask people, "Are you 21 or older?", not, "Is 21 less than or equal to your age?"
It doesn't matter in C# if you put the variable first or last, because assignments don't evaluate to a bool (or something castable to bool) so the compiler catches any errors like "if (i = 0) EntireCompanyData.Delete()"
So, in the C# world at least, it's a matter of style rather than necessity. And putting the variable last is unnatural to English speakers. Therefore, for more readable code, variable first.
If you have a list of ifs that can't be represented well by a switch (because of a language limitation, maybe), then I'd rather see:
if (InterestingValue1 == foo) { } else
if (InterestingValue2 == foo) { } else
if (InterestingValue3 == foo) { }
because it allows you to quickly see which are the important values you need to check.
In particular, in Java I find it useful to do:
if ("SomeValue".equals(someString)) {
}
because someString may be null, and in this way you'll never get a NullPointerException. The same applies if you are comparing constants that you know will never be null against objects that may be null.
(0 == i)
I will always pick this one. It is true that most compilers today do not allow the assignment of a variable in a conditional statement, but the truth is that some do. In programming for the web today, I have to use a myriad of languages on a system. By using 0 == i, I always know that the conditional statement will be correct, and I am not relying on the compiler/interpreter to catch my mistake for me. Now if I have to jump from C# to C++ or JavaScript, I know that I am not going to have to track down assignment errors in conditional statements in my code. For something this small that saves that amount of time, it's a no-brainer.
I used to be convinced that the more readable option (i == 0) was the better way to go.
Then we had a production bug slip through (not mine thankfully), where the problem was a ($var = SOME_CONSTANT) type bug. Clients started getting email that was meant for other clients. Sensitive type data as well.
You can argue that Q/A should have caught it, but they didn't, that's a different story.
Since that day I've always pushed for the (0 == i) version. It basically removes the problem. It feels unnatural, so you pay attention, so you don't make the mistake. There's simply no way to get it wrong here.
It's also a lot easier to catch in a code review that someone didn't reverse the if statement than to catch that someone accidentally assigned a value in an if. If the format is part of the coding standards, people look for it. People don't typically debug code during code reviews, and the eye seems to scan over an (i = 0) vs an (i == 0).
I'm also a much bigger fan of the java "Constant String".equals(dynamicString), no null pointer exceptions is a good thing.
You know, I always use the if (i == 0) format of the conditional and my reason for doing this is that I write most of my code in C# (which would flag the other one anyway) and I do a test-first approach to my development and my tests would generally catch this mistake anyhow.
I've worked in shops where they tried to enforce the 0==i format but I found it awkward to write, awkward to remember and it simply ended up being fodder for the code reviewers who were looking for low-hanging fruit.
Actually, the DialogResult example is a place where I WOULD recommend that style. It places the important part of the if() toward the left, where it can be seen. If it's on the right and the MessageBox call has more parameters (which is likely), you might have to scroll right to see it.
OTOH, I never saw much use in the (0 == i) style. If you can remember to put the constant first, you can remember to use two equals signs.
I always try to use the first form (0 == i), and it has saved my life a few times!
I think it's just a matter of style. And it does help with accidentally using assignment operator.
I absolutely wouldn't ask the programmer to grow up though.
I prefer (i == 0), but I still sort of make a "rule" for myself to do (0 == i), and then break it every time.
"Eh?", you think.
Well, if I'm making a conscious decision to put an lvalue on the left, then I'm paying enough attention to what I'm typing to notice if I type "=" for "==". I hope. In C/C++ I generally use -Wall for my own code, which generates a warning on gcc for most "=" for "==" errors anyway. I don't recall seeing that warning recently, perhaps because the longer I program the more reflexively paranoid I am about errors I've made before...
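For what it's worth, a small sketch of that warning in action; the wording below is gcc's, and other compilers phrase it differently:
// Compile with: g++ -Wall example.cpp
int main()
{
    int i = 5;
    if (i = 0) { }    // warning: suggest parentheses around assignment used as truth value
    if ((i = 0)) { }  // the extra parentheses tell the compiler the assignment is deliberate
    return i;
}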
if(DialogResult.OK == MessageBox.Show("Message"))
seems misguided to me. The point of the trick is to avoid accidentally assigning to something.
But who is to say whether DialogResult.OK is more, or less likely to evaluate to an assignable type than MessageBox.Show("Message")? In Java a method call can't possibly be assignable, whereas a field might not be final. So if you're worried about typing = for ==, it should actually be the other way around in Java for this example. In C++ either, neither or both could be assignable.
(0==i) is only useful because you know for absolute certain that a numeric literal is never assignable, whereas i just might be.
When both sides of your comparison are assignable you can't protect yourself from accidental assignment in this way, and that goes for when you don't know which is assignable without looking it up. There's no magic trick that says "if you put them the counter-intuitive way around, you'll be safe". Although I suppose it draws attention to the issue, in the same way as my "always break the rule" rule.
I use (i == 0) for the simple reason that it reads better. It makes a very smooth flow in my head. When you read through the code back to yourself for debugging or other purposes, it simply flows like reading a book and just makes more sense.
My company has just dropped the requirement to do if (0 == i) from its coding standards. I can see how it makes a lot of sense but in practice it just seems backwards. It is a bit of a shame that by default a C compiler probably won't give you a warning about if (i = 0).
Third option - disallow assignment inside conditionals entirely:
In high reliability situations, you are not allowed (without good explanation in the comments preceeding) to assign a variable in a conditional statement - it eliminates this question entirely because you either turn it off at the compiler or with LINT and only under very controlled situations are you allowed to use it.
Keep in mind that generally the same code is generated whether the assignment occurs inside the conditional or outside - it's simply a shortcut to reduce the number of lines of code. There are always exceptions to the rule, but it never has to be in the conditional - you can always write your way out of that if you need to.
So another option is merely to disallow such statements, and where needed use the comments to turn off the LINT checking for this common error.
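For example, one way to write your way out of the classic getchar idiom (a sketch; the loop body is a placeholder):
#include <cstdio>

int main()
{
    // Shortcut form, with the assignment inside the conditional:
    //     int c;
    //     while ((c = std::getchar()) != EOF) { /* process c */ }

    // The same loop with the assignment moved out of the conditional:
    int c;
    for (;;) {
        c = std::getchar();
        if (c == EOF)
            break;
        /* process c */
    }
    return 0;
}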
-Adam
I'd say that (i == 0) would sound more natural if you attempted to phrase a line in plain (and ambiguous) english. It really depends on the coding style of the programmer or the standards they are required to adhere to though.
Personally I don't like (1) and always do (2); however, that reverses for readability when dealing with dialog boxes and other methods that can be extra long. It doesn't look bad as it is now, but if you expand the MessageBox call out to its full length, you have to scroll all the way right to figure out what kind of result you are returning.
So while I agree with your assertions of the simplistic comparison of value types, I don't necessarily think it should be the rule for things like message boxes.
Both are equal, though I would slightly prefer the 0 == i variant.
When comparing strings, it is less error-prone to write "MyString".equals(getDynamicString()),
since getDynamicString() might return null.
To be more consistent, write 0 == i.
Well, it depends on the language and the compiler in question. Context is everything.
In Java and C#, the "assignment instead of comparison" typo ends up with invalid code apart from the very rare situation where you're comparing two Boolean values.
I can understand why one might want to use the "safe" form in C/C++ - but frankly, most C/C++ compilers will warn you if you make the typo anyway. If you're using a compiler which doesn't, you should ask yourself why :)
The second form (variable then constant) is more readable in my view - so anywhere that it's definitely not going to cause a problem, I use it.
Rule 0 for all coding standards should be "write code that can be read easily by another human." For that reason I go with (most-rapidly-changing value) test-against (less-rapidly-changing value, or constant), i.e. "i == 0" in this case.
Even where this technique is useful, the rule should be "avoid putting an lvalue on the left of the comparison", rather than the "always put any constant on the left", which is how it's usually interpreted - for example, there is nothing to be gained from writing
if (DateClass.SATURDAY == dateObject.getDayOfWeek())
if getDayOfWeek() is returning a constant (and therefore not an lvalue) anyway!
I'm lucky (in this respect, at least) in that these days I'm mostly coding in Java and, as has been mentioned, if (someInt = 0) won't compile.
The caveat about comparing two booleans is a bit of a red herring, as most of the time you're either comparing two boolean variables (in which case swapping them round doesn't help) or testing whether a flag is set, and woe betide you if I catch you comparing anything explicitly with true or false in your conditionals! Grrrr!
In C, yes, but you should already have turned on all warnings and be compiling warning-free, and many C compilers will help you avoid the problem.
I rarely see much benefit from a readability POV.
Code readability is one of the most important things for code larger than a few hundred lines, and i == 0 definitely reads much more easily than the reverse.
Maybe not an answer to your question.
I try to use === (checking for identity) instead of equality. This way no type conversion is done, and it forces the programmer to make sure the right type is passed.
You are right that placing the important component first helps readability, as readers tend to browse the left column primarily, and putting important information there helps ensure it will be noticed.
However, never talk down to a co-worker, and implying that would be your action even in jest will not get you high marks here.
I always go with the second method. In C#, writing
if (i = 0) {
}
results in a compiler error (cannot convert int to bool) anyway, so the possibility of making that mistake is not actually an issue. If you test a bool, the compiler still issues a warning; you shouldn't compare a bool to true or false. Now you know why.
I personally prefer the use of variable-operand-value format in part because I have been using it so long that it feels "natural" and in part because it seems to be the predominant convention. There are some languages that make use of assignment statements such as the following:
:1 -> x
So in the context of those languages it can become quite confusing to see the following even if it is valid:
:if(1=x)
So that is something to consider as well. I do agree with the message box response being one scenario where using a value-operand-variable format works better from a readability standpoint, but if you are looking for consistency, then you should forgo its use.
This is one of my biggest pet peeves. There is no reason to decrease code readability (if (0 == i), what? how can the value of 0 change?) to catch something that any C compiler written in the last twenty years can catch automatically.
Yes, I know, most C and C++ compilers don't turn this on by default. Look up the proper switch to turn it on. There is no excuse for not knowing your tools.
It really gets on my nerves when I see it creeping into other languages (C#, Python) which would normally flag it anyway!
I believe the only factor to ever force one over the other is if the tool chain does not provide warnings to catch assignments in expressions. My preference as a developer is irrelevant. An expression is better served by presenting business logic clearly. If (0 == i) is more suitable than (i == 0) I will choose it. If not I will choose the other.
Many constants in expressions are represented by symbolic names. Some style guides also limit the parts of speech that can be used for identifiers. I use these as a guide to help shape how the expression reads. If the resulting expression reads loosely like pseudocode then I'm usually satisfied. I just let the expression express itself, and if I'm wrong it'll usually get caught in a peer review.
We might go on and on about how good our IDEs have gotten, but I'm still shocked by the number of people who turn the warning levels on their IDE down.
Hence, for me, it's always better to ask people to use (0 == i), as you never know which programmer is doing what.
It's better to be "safe than sorry"
if(DialogResult.OK == MessageBox.Show("Message")) ...
I would always recommend writing the comparison this way. If the result of MessageBox.Show("Message") can possibly be null, then you risk a NPE/NRE if the comparison is the other way around.
Mathematical and logical operations aren't commutative in a world that includes NULLs.

c++ styleguide: why to have non-lvalues on the left side?

In one C++ coding style guide,
I found one particular recommendation (page 41, recommendation number 53):
Always have non-lvalues on the left side (0 == i instead of i == 0).
And I don't understand: what is this good for? Are you sticking to this practice?
I'm not, and I don't know why this is a good practice. The only advantage I can think of is that it will avoid mistaking an unintentional assignment for a comparison (if (foo = 0){} versus if (foo == 0){}).
Have you got any other ideas why should I use it?
Yes, you guessed it right. It's the good old Yoda condition!!!
As you say, the reason some people use it is to occasionally avoid typing = when they mean ==.
Since it only catches some cases, where you're comparing an lvalue with a constant or rvalue, and every compiler I know of will warn you if you make that mistake, there's very little point in doing it.
At least to native English speakers, it makes the code read as if it's written backwards; so a "Yoda condition" some call it do. Like many rules in corporate style guides, it dates back to a time when dealing with unforgiving compilers was a higher priority than writing readable code.

Conventions, Style, and Usage for Clojure Constants?

What are the best practices for defining constants in Clojure in terms of style, conventions, efficiency, etc.
For example, is this right?
(def *PI* 3.14)
Questions:
Should constants be capitalized in Clojure?
Stylistically, should they have the asterisk (*) character on one or both sides?
Any computational efficiency considerations I should be aware of?
I don't think there are any hard and fast rules. I usually don't give them any special treatment at all. In a functional language, there is less of a distinction between a constant and any other value, because things are more often pure.
The asterisks on both sides are called "earmuffs" in Clojure. They are usually used to indicate a "special" var, or a var that will be dynamically rebound using binding later. Vars like *out* and *in*, which are occasionally rebound to different streams by users and such, are examples.
Personally, I would just name it pi. I don't think I've ever seen people give constants special names in Clojure.
EDIT: Mister Carper just pointed out that he himself capitalizes constants in his code because it's a convention in other languages. I guess this goes to show that there are at least some people who do that.
I did a quick glance through the coding standards but didn't find anything about it. This leads me to conclude that it's really up to you whether or not you capitalize them. I don't think anyone will slap you for it in the long run.
On the computational efficiency front, you should know there is no such thing as a global constant in Clojure. What you have above is a var, and every time you reference it, it does a lookup. Even if you don't put earmuffs on it, vars can always be rebound, so the value could always change, and so they are always looked up in a table. For performance-critical loops this is most decidedly non-optimal.
There are some options, like putting a let block around your critical loops to bind the value of any "constant" vars once, so that they are not looked up on each reference. Or creating a no-arg macro so that the constant value is compiled into the code. Or you could create a Java class with a static member.
See this post, and the following discussion about constants for more info:
http://groups.google.com/group/clojure/msg/78abddaee41c1227
The earmuffs are a way of denoting that a given symbol will have its own thread-local binding at some point. As such, it does not make sense to apply the earmuffs to your Pi constant.
*clojure-version* is an example of a constant in Clojure, and it's entirely in lower-case.
Don't use a special notation for constants; everything is assumed to be a constant unless specified otherwise.
See http://dev.clojure.org/display/community/Library+Coding+Standards
Clojure has a variety of literals such as:
3.14159
:point
{:x 0
:y 1}
[1 2 3 4]
#{:a :b :c}
The literals are constant. As far as I know, there is no way to define new literals. If you want to use a new constant, you can effectively generate a literal in the code at compile-time:
(defmacro *PI* [] 3.14159265358979323)
(prn (*PI*))
In Common Lisp, there's a convention of naming constants with plus signs (+my-constant+), and in Scheme, by prefixing with a dollar sign ($my-constant); see this page. Any such convention conflicts with the official Clojure coding standards, linked in other answers, but maybe it would be reasonable to want to distinguish regular vars from those defined with the :const attribute.
I think there's an advantage to giving non-function variables of any kind some sort of distinguishing feature. Suppose that aside from variables defined to hold functions, you typically only use local names defined by function parameters, let, etc. If you nevertheless occasionally define a non-function variable using def, then when its name appears in a function definition in the same file, it looks to the eye like a local variable. If the function is complex, you may spend several seconds looking for the name definition within the function. Adding a distinguishing feature like earmuffs or plus signs or all uppercase, as appropriate to the variable's use, makes it obvious that the variable's definition is somewhere else.
In addition, there are good reasons to give special constants like pi a special name, so no one has to wonder whether pi means, say, "print-index", or the i-th pizza, or "preserved interface". Of course I think those variables should have more informative names, but lots of people use cryptic, short variable names, and I end up reading their code. I shouldn't have to wonder whether pi means pi, so something like PI might make sense. No one would think that's a run-of-the-mill variable in Clojure.
According to the "Practical Clojure" book, it should be named *pi*

Is it feasible to ascribe pronunciations to distinct source code concepts?

I frequently tutor fellow students in programming, most often in C++ or Java.
It is uniquely aggravating to try to verbally convey the essential syntax of a C++ expression. The speaker must give either an idiomatic translation into English, or a full specification of the code in verbal longhand, using explicit yet slow terms such as "opening parenthesis", "bitwise and", et cetera. Neither of these solutions is optimal.
In C++, there is a finite set of keywords—63—and operators—54, discounting named operators and treating compound assignment operators and prefix versus postfix auto-increment and decrement as distinct. There are just a few types of literal, a similar number of grouping symbols, and the semicolon. Unless I'm utterly mistaken, that's about it.
Would it not then be feasible to ascribe a concise, unique pronunciation to each of these distinct concepts (including one for whitespace, where it is required) and go from there? Programming languages are far more regular than natural languages, so the pronunciation could be standardised.
Instead of creating new "words" to describe them, for things such as "include" you could simply prefix it with "keyword" when saying it aloud. You could use words/phrases commonly known to say other parts as well. As with any new programmer, you have to literally describe everything anyway, so I don't think that requires special attention. I think creating new words is the harder method...
So, for example:
#include <iostream>
int main()
{
if (1 < 2)
return 1;
else
return 0;
}
Could be read out as:
(keyword) include iostream new-line
(keyword) int main no params start
block if number 1 (operator) less than
number 2 new-line (keyword) return
number 1 new-line (keyword) else
new-line (keyword) return number 0 end
block
Treat words in () as optional descriptive words, most likely to be used in more complex code. You could use the word 'literal' if you want them to actually write the descriptive word. For example
(keyword) if literal number (operator)
less than literal keyword
becomes
if (number < keyword)
Other words could be given defined meanings as well, such as 'split-line' when you want them to continue on the next line, without closing any currently open parenthesis, etc.
I personally find this method quite simple to use and easy to teach. YMMV, as always.
Of course, this doesn't solve the internationalisation issue, but at worst, would result in 'new words' being used in the non-English languages, which is no worse than the proposed solution you offered.
As a blind developer, programming since I was 13, I found this question really interesting. First of all, as mentioned by other people, learning a new language to be able to understand code is not a practical solution, as it would probably take longer to learn the spoken utterances than it would to learn the actual programming language.
Reading the question/answers, two further points occurred to me:
Firstly, you'd be surprised how important "thinking time" is. I have previously programmed in C/C++/Java and now use C# as my primary language, and consider myself very competent. But when I did a couple of projects in Python, I found the reduced punctuation robbed me of my "thinking time" - subconsciously, I was using the punctuation to digest what I'd just heard - fascinating... However, the situation is a bit different when it comes to identifiers, as these aren't well known by the listener - I personally find it hard to listen to code with acronym variables (RGXRatio, RGVRatio) as I don't have time to figure out what they mean. On the flip side, Hungarian notation and initial underscores make code hard to listen to, as the length of the variables (in terms of time taken to speak) is much longer than the more important operations being performed on those variables.
Another thing to consider is that the length of the audio stream is an end result, but not the root cause. The reason the audio is so long is because audio is a one-dimensional medium, whereas reading text is a 2D medium with the ability to jump around and skip past irrelevant/familiar text. It wouldn't work for a face-to-face lecture, but what if there were keyboard commands for controlling the speech? In text documents my screen reader lets me jump to the next line, but what if this were adapted to the semantics of a programming language? Some research, such as that by T V Raman at Google, includes using different voices for syntax highlighting, and audio cues to mark metadata like capitals.
I know the original question specifically related to a lecture given to a class, but if, like myself, you have to listen to entire files of source code, I also find the structure of the code makes a huge difference. I personally read code like a story - left to right, top to bottom. So it's very hard to trace through unfamiliar code when it's written bottom-up.
So would it not then be feasible to simply ascribe a concise, unique pronunciation to each of these distinct concepts (including one for whitespace, where it is required) and go from there? Programming languages are far more regular than natural languages, so the pronunciation could be standardised
Perhaps, but you've lost sight of your goal. The premise was that the person listening did not already know the language. If he does, we can simply say "include iostream" when we mean #include <iostream>, or "vector of int" when we mean std::vector<int>.
Your premise was that the person listening is not familiar enough with the language to understand what you read out loud unless you read out exactly what it says.
Now, inventing a whole new language just to describe the primitives that occur in your source code doesn't solve the problem. Instead, you still have to read out every syntactic token (with simpler, more "standardized" pronunciations, yes, but they still have to be read out loud), and the person listening still won't understand you, because if they don't know C++ well enough to understand "include iostream", they won't understand your standardized pronunciation either. And if you're going to teach them your pronunciation, why bother, when you could've just taught them to understand C++ syntax directly instead?
There's also the root problem that C++ code tends to consist of a lot of syntactic tokens. Take a line as simple as this:
std::vector<int> v;
I count 9 tokens. Not one of them can be omitted. If the person listening does not understand the code and syntax well enough to understand a high-level description such as "declare a vector of int, named v", then you'll have to read out all 9 tokens in some form. Even if you come up with simpler names than "namespace resolution operator" and "less than sign", you still have to list 9 token names. Which is a lot of work.
In short, no, I don't think it'd work. First, it's still too cumbersome, and second, it's presuming prior knowledge on the part of the person listening, when the motivation for this was that the person listening was a student without the prior knowledge that made it possible to understand a high-level description of the code.

Why can't variable names start with numbers?

I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"
I couldn't come up with an answer except that some numbers can have text in them (123456L, 123456U) and that wouldn't be possible if the compilers were thinking everything with some amount of alpha characters was a variable name.
Was that the right answer? Are there any more reasons?
string 2BeOrNot2Be = "that is the question"; // Why won't this compile?
Because then a string of digits would be a valid identifier as well as a valid number.
int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";
Well think about this:
int 2d = 42;
double a = 2d;
What is a? 2.0? or 42?
Hint: if you don't get it, a d after a number means the number before it is a double literal (in languages such as Java and C#).
It's a convention now, but it started out as a technical requirement.
In the old days, parsers of languages such as FORTRAN or BASIC did not require the use of spaces. So, basically, the following are identical:
10 V1=100
20 PRINT V1
and
10V1=100
20PRINTV1
Now suppose that numeral prefixes were allowed. How would you interpret this?
101V=100
as
10 1V = 100
or as
101 V = 100
or as
1 01V = 100
So, this was made illegal.
Because backtracking is avoided in lexical analysis while compiling. A variable like:
Apple;
the compiler will know it's an identifier right away when it meets the letter 'A'.
However a variable like:
123apple;
the compiler won't be able to decide whether it's a number or an identifier until it hits 'a', and it would need backtracking as a result.
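A toy sketch of that first-character dispatch (illustrative only; real lexers also apply maximal-munch rules):
#include <cctype>
#include <string>

enum class TokenKind { Number, Identifier };

// One glance at the first character commits the lexer to a scanning routine.
TokenKind classify(const std::string& token)
{
    unsigned char first = static_cast<unsigned char>(token[0]);
    return std::isdigit(first) ? TokenKind::Number : TokenKind::Identifier;
}

// classify("Apple")    -> Identifier, decided at 'A'
// classify("123apple") -> Number, decided at '1'; if digit-led identifiers were
// legal, the lexer could not commit here and would have to backtrack.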
Compilers/parsers/lexical analyzers were a long, long time ago for me, but I think I remember there being difficulty in unambiguously determining whether a numeric character in the compilation unit represented a literal or an identifier.
Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.
This goes way back - before special notations to denote storage or numeric base.
I agree it would be handy to allow identifiers to begin with a digit. One or two people have mentioned that you can get around this restriction by prepending an underscore to your identifier, but that's really ugly.
I think part of the problem comes from number literals such as 0xdeadbeef, which make it hard to come up with easy-to-remember rules for identifiers that can start with a digit. One way to do it might be to allow anything matching [A-Za-z0-9_]+ that is NOT a keyword or number literal. The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P.
When I was first learning C, I remember feeling the rules for variable names were arbitrary and restrictive. Worst of all, they were hard to remember, so I gave up trying to learn them. I just did what felt right, and it worked pretty well. Now that I've learned a lot more, it doesn't seem so bad, and I finally got around to learning it right.
It's likely a decision that came for a few reasons, when you're parsing the token you only have to look at the first character to determine if it's an identifier or literal and then send it to the correct function for processing. So that's a performance optimization.
The other option would be to check if it's not a literal and leave the domain of identifiers to be the universe minus the literals. But to do this you would have to examine every character of every token to know how to classify it.
There are also stylistic implications: identifiers are supposed to be mnemonics, so words are much easier to remember than numbers. When a lot of the original languages were being written, setting the styles for the next few decades, they weren't thinking about substituting "2" for "to".
Variable names cannot start with a digit, because it can cause some problems like below:
int a = 2;
int 2 = 5;
int c = 2 * a;
What is the value of c? Is it 4, or is it 10?
another example:
float 5 = 25;
float b = 5.5;
Is the first 5 a number, or an object (the . operator)?
There is a similar problem with the second 5.
Maybe there are some other reasons. So, we shouldn't use any digit at the beginning of a variable name.
The restriction is arbitrary. Various Lisps permit symbol names to begin with numerals.
COBOL allows variables to begin with a digit.
Use of a digit to begin a variable name makes error checking during compilation or interpretation a lot more complicated.
Allowing use of variable names that begin like a number would probably cause huge problems for the language designers. During source code parsing, whenever a compiler/interpreter encountered a token beginning with a digit where a variable name was expected, it would have to search through a huge, complicated set of rules to determine whether the token was really a variable, or an error. The added complexity in the language parser may not justify this feature.
As far back as I can remember (about 40 years), I don't think that I have ever used a language that allowed a digit to begin variable names. I'm sure that this was done at least once. Maybe someone here has actually seen this somewhere.
As several people have noticed, there is a lot of historical baggage about valid formats for variable names. And language designers are always influenced by what they know when they create new languages.
That said, pretty much every time a language doesn't allow variable names to begin with numbers, it's because those are the rules of the language design. Often that is because such a simple rule makes the parsing and lexing of the language vastly easier. Not all language designers know this is the real reason, though. Modern lexing tools help, because if you tried to define it as permissible, they would give you parsing conflicts.
OTOH, if your language has a uniquely identifiable character to herald variable names, it is possible to set it up for them to begin with a number. Similar rule variations can also be used to allow spaces in variable names. But the resulting language is likely not to resemble any popular conventional language very much, if at all.
For an example of a fairly simple HTML templating language that does permit variables to begin with numbers and have embedded spaces, look at Qompose.
Because if you allowed keywords and identifiers to begin with numeric characters, the lexer (part of the compiler) couldn't readily differentiate between the start of a numeric literal and a keyword without getting a whole lot more complicated (and slower).
C++ can't have it because the language designers made it a rule. If you were to create your own language, you could certainly allow it, but you would probably run into the same problems they did and decide not to allow it. Examples of variable names that would cause problems:
0x, 2d, 5555
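To see why those particular names collide with literal syntax, a small sketch (the values are made up):
int main()
{
    auto a = 0x2d;   // "0x..." already means a hexadecimal integer literal (here, 45)
    auto b = 5555;   // "5555" already means a decimal integer literal
    // int 2d = 42;  // error: after the '2' the lexer is committed to a numeric literal
    return a + b;
}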
One of the key problems about relaxing syntactic conventions is that it introduces cognitive dissonance into the coding process. How you think about your code could be deeply influenced by the lack of clarity this would introduce.
Wasn't it Dijkstra who said that the "most important aspect of any tool is its effect on its user"?
The compiler has 7 phases, as follows:
Lexical analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Symbol Table
Backtracking is avoided in the lexical analysis phase while compiling the piece of code. For a variable like Apple, the compiler will know it's an identifier right away when it meets the letter 'A' in the lexical analysis phase. However, for a variable like 123apple, the compiler won't be able to decide whether it's a number or an identifier until it hits 'a', and it would need backtracking in the lexical analysis phase to identify it as a variable. That is not supported in the compiler.
Probably because it makes it easier for the human to tell whether it's a number or an identifier, and because of tradition. Having identifiers that could begin with a digit wouldn't complicate the lexical scans all that much.
Not all languages have forbidden identifiers beginning with a digit. In Forth, they could be numbers, and small integers were normally defined as Forth words (essentially identifiers), since it was faster to read "2" as a routine to push a 2 onto the stack than to recognize "2" as a number whose value was 2. (In processing input from the programmer or the disk block, the Forth system would split up the input according to spaces. It would try to look the token up in the dictionary to see if it was a defined word, and if not would attempt to translate it into a number, and if not would flag an error.)
Suppose you did allow symbol names to begin with numbers. Now suppose you want to name a variable 12345foobar. How would you differentiate this from 12345? It's actually not terribly difficult to do with a regular expression. The problem is one of performance. I can't really explain why this is in great detail, but it essentially boils down to the fact that differentiating 12345foobar from 12345 requires backtracking. This makes the regular expression non-deterministic.
There's a much better explanation of this here.
It is easier for a compiler to identify a variable when its name starts with a letter rather than a number.
I think the simple answer is that it can, the restriction is language based. In C++ and many others it can't because the language doesn't support it. It's not built into the rules to allow that.
The question is akin to asking why the King can't move four spaces at a time in Chess. It's because in Chess that is an illegal move. Can it in another game? Sure. It just depends on the rules being played by.
Originally it was simply because names are easier to remember (you can give them more meaning) as words rather than numbers, although numbers can be included within the name to enhance its meaning or to allow the use of the same variable name while designating it as having a separate but closely related meaning or context. For example, loop1, loop2, etc. would always let you know that you were in a loop, and/or that loop2 was a loop within loop1.
Which would you prefer (has more meaning) as a variable: address or 1121298? Which is easier to remember?
However, if the language uses something to denote that it is not just text or numbers (such as the $ in $address), it really shouldn't make a difference, as that would tell the compiler that what follows is to be treated as a variable (in this case).
In any case it comes down to what the language designers want to use as the rules for their language.
The variable might also be treated as a value at compile time by the compiler, so the value could end up referring to itself again and again, recursively.
There would be nothing wrong with it when it comes to declaring a variable, but there is some ambiguity when you try to use that variable somewhere else, like this:
let 1 = "Hello world!"  // declare a "variable" named 1
print(1)                // print the variable 1, i.e. "Hello world!"?
print(1)                // or print the integer literal 1?
print is a generic method that accepts all types of variable, so in that situation the compiler does not know which 1 the programmer refers to: the 1 with an integer value, or the 1 that stores a string value.
Maybe it would be better for the compiler in this situation to allow defining something like that, but to raise an error when this ambiguous usage appears, with a correction hint for how to fix the error and clear up the ambiguity.