C++ keywords in scoped enumerators

C++ keywords in scoped enumerators - c++

Can I use reserved identifiers in a scoped enumerator; and importantly, why not?
enum struct token { void, int, return, not, if };

Why not is VERY difficult to answer but isn't really all that important. We could spend years discussing why some features are in a language and why some aren't. It would be a waste of energy. This is the language definition you have to work with and all languages that I'm familiar with follow the same paradigm. (.NET allows reserved keywords used as identifier names with but only when prefixed with a special symbol(s).) Also, no language is perfect and certainly will not fit every single criteria that any individual programmer might be looking for.
But consider if keywords were allowed in enums and in other situations: The compiler would certainly be much, much more complicated and therefore slower. Also the resulting code would be more likely to confuse the reader and make it less maintainable. C++ gives plenty of rope to hang yourself with already. Why ask for more?
But, if you really want a collection of values that reflect the language's reserved symbols then you might consider using something like std::map with string keys and string values. That might come close to giving you what you want w/o any internal voodoo.

Related

Why allow function declarations inside function bodies

Why does the syntax allow function declarations inside function bodies?
It does create a lot of questions here on SO where function declarations are mistaken for variable initializations and the like.
a object();
Not to mention most vexing parse.
Is there any use-case that is not easily achieved by the more common scope hiding means like namespaces and members?
Is it for historical reasons?
Addendum: If for historical reasons, inherited from C to limit scope, what is the problem banning them?

Whilst many C++ applications are written solely in C++ code, there are also a lot of code that is some mixture of C and C++ code. This mixture is definitely something of C++'s important part of its usefulness (including the "easy" interfacing to existing API's, anything from OpenGL or cURL to custom hardware drivers that are written in C, can pretty much be used directly with very little effort, where trying to interface your custom hardware C driver into a Basic interpreter is pretty diffcult)
If we start breaking that compatibility by removing things "for no particular value", then C++ is no longer as useful. Aside from giving better error messages in a condition that is confusing, which of course is useful in itself, it's hard to see how it's useful to REMOVE this - and that's of course assuming NONE of the C++ is using this in itself - and I wouldn't be surprised if it DOES happen at times even in modern code (for whatever good or bad reasons).
In general, C++ tries very hard to not break backwards compatibility - and this, in my mind, is a good thing. That's why the keyword static is used for a bunch of different things, rather than adding a new keyword, and auto means something different now than it used to in C, but it's not a "new" keyword that could break existing code that happened to use whatever other word chosen (and that is a small break, but nobody really used it for the past 20 years anyway).

Well, the ability to declare functions within function bodies is inherited from C so, by definition, there is a reason involving historical and backward-compatibility reasons. When there is likely to be real-world code which uses a feature, the argument to remove that feature from the language is weakened.
People - particularly those who only use the latest version of the language, and are not required to maintain legacy code - do tend to under-estimate how strong an argument backward compatibility is in C++. The original C++ standard was specifically required to maintain backward compatibility with C. As a rough rule, standards discourage removing old features if doing so is likely to break existing code. It can be done, however, if the only possible usage causes a danger that cannot be prevented (which is reason for removal of gets(), for example).
When maintaining legacy code there are often significant costs with updating a code base to replace all instances of an old construct with some modern replacement. A coding change that may be insignificant for a hobbyist programmer may be extremely costly when maintaining large-scale code bases in regulatory environments, where it is necessary to provide formal evidence and audit trail that the change of code does not affect ability to meet its original requirement.
There are certain programming styles where it is useful to be able to limit the scope of any declarations. Not everyone uses such programming styles, but the reason such features are in the language is to allow the programmer the choice of programming technique. And, whether advocates of removing such features like it or not, there is a certain amount of code which uses such constructs usefully. That significantly weakens the case for removing the feature from the language.
These sorts of arguments will tend to come up for languages that are used in large-scale development, to develop systems in regulatory environments, etc etc. C and C++ (and a number of other languages) are used in such settings, so will tend to accumulate some set of features for "historical" or "backward compatibility" reasons. It is possible to make a case for removing such features by providing evidence that the feature is not in real-world use. But, since the argument is about justifying a negative claim, that is difficult (all it needs is someone to provide ONE example of continuing real-world beneficial usage and suddenly a counter-example exists which supports the case for keeping the feature).

Are "n" or "ch" prefixes common prefixes when naming int or char variables in C++?

I'm currently going through learncpp.com's C++ tutorials and I'm seeing that their variable naming trend has them naming int variables with an "n" prefix (i.e. int nValue) and "ch" prefixes for char variables (i.e. char chOperation). Is this something that is commonplace in the industry that I should form as a habit now?

Is this something that is commonplace in the industry?
This practice was common in some parts of Microsoft twenty or thirty years ago, due to a misunderstanding of a somewhat more useful convention used by other parts of the company (that of tagging variables indicate their purpose which, in a weakly typed language, can help avoid various kinds of category error). Neither convention serves any useful purpose in a strongly typed language like C++: the type system can catch such errors automatically and more reliably.
It became widely used by others, long after Microsoft (presumably) realised that it was pointless and advised against its use, presumably in the belief that emulating Microsoft's habits might also emulate their success. It's still occasionally seen today, by people who develop habits and never question their usefulness, and by companies who prioritise style guides above software.
Personally, I find it useful as a warning that the code is likely to contain worse horrors.
I should form as a habit now?
It only serves to make the code harder to read, and misleading if you forget to update the tags when you change a variable's type. You should develop a habit of writing clear, readable code, not of sprinkling it with mysterious runes.
Disclaimer: the brief comments about Microsoft are intended to give historical context and are not intended to be an authorative account of Microsoft's policy decisions; specifically the phrase "[Microsoft] realised [it] was pointless" is intended to mean "[some people at Microsoft] realised [the topic under discussion, using redundant type tags in modern C++ in most contexts] was pointless" not (as a commentor appears to have read) "[the entirety of Microsoft] realised [all use of variable tagging] was pointless". All opinions are my own, and may be based on imperfect knowledge.

Yes, they are common (esp. in Windows related projects)
But different projects may use different coding styles. So if you're working with an existing project, then the best is to stick to the style it already follows.
The naming style you mentioned is known as Hungarian style, which is typically used in Windows related projects. In the Hungarian style, variables are formatted in camel case (e.g., CamelCase) and prefixed by their scope and types:
[scope prefix]_[variable type][actual variable name in camel-cased style]
For example:
m_nMemberInteger
is an integer (according to it prefix n), in addition, it's a member variable (according to its prefix m_) to some structure / class, and you can find the complete list of scope and type prefixes used in the Hungarian style in the above link.
However, in linux-based projects, you will usually find people using different coding styles (e.g.,
Google c++ coding style), which uses only lower-cases and underscore _ to name their variables.

This looks similar to Hungarian Notation. Such things are sometimes used, especially in certain fields of programming. Personally I think it makes code look messy. In C++ you should think more about what the object means rather than what its underlying type may happen to be. And modern editors easily let you look up the type of variables, so it is kind of obsolete. I can understand why it was used when editors weren't so helpful..

As mentioned in the other comments, this is known as "Hungarian Notation" and is used to make the type of a variable obvious. While it's perhaps arguable whether it's worth the trouble to label the type, another common convention (especially in C++) is to use prefixes to indicate information about a variable's usage. This is especially useful for references and member variables. For instance, one might have a function that looks like
void MyClass::myMethod(const int& iInput, int& oOutput, int &ioInputAndOutput)
{
oOutput = ioInputAndOutput + mMemberData + iInput;
ioInputAndOutput *= 2;
}
As also mentioned above, the important thing is consistency, which will prevent more bugs than any particular convention. On collaborative projects, it's usually worth it to conform to the existing convention.

Hungarian Notation in Fortran

Is it considered good or bad practice? A friend of mine told me that, generally speaking, it was not considered a good practice in most languages now a days, but that he thought he heard that this was not the case with fortran. Is this true? And if it is true, why?

In 30+ years of programming in Fortran I've never come across anyone using Hungarian notation. Your friend might be confusing Fortran's long-standing (and now deprecated) ability to implicitly type variables depending on the initial letter of their names. But that long predates widespread awareness of what is now called Hungarian notation.
On the more general issue of whether Hungarian notation is, or would be, a good idea for adoption when writing Fortran, I agree with David, and (I think) the wider software development community which regards it as a not-very-useful practice. Fortran certainly does not require it, variable (and other) names follow rules very similar to those in many programming languages.

Systems Hungarian
Systems Hungarian notation essentially adds type information into variable names so that you know the types of the values you are using, and are less likely to use a value in an incorrect way. This is of dubious benefit in modern strongly typed languages, as type-safety significantly reduces the chance of using/accessing a variable erroneously.
However, for less strongly typed languages, including this sort of information can be beneficial, as it keeps the programmer constantly aware of the data they are dealing with.
The biggest criticism of HN (besides it being of limited benefit in a strongly typed language) is that the type prefixes used can result in extremely obscure and confusing variable names, so while you may gain a measure of pseudo type-safety, you can lose clarity in the code (or at least create code that is only readable to be an expert in your conventions) which can harm maintainability.
If you need to produce code to someone else's naming convention then you have little choice, but if you are in control, you can define a sensible, clear, simple naming convention that may suit your needs better, giving a good balance between making variable names information-rich and introducing confusing clutter. For example, one practice is to name boolean variables in the form of IsOpen rather than Open, to avoid confusion between words that can be used as both verbs and nouns. It also makes it easy to see when you are mixing booleans into integer or floating point expressions. This approach also works intuitively, so requires no special knowledge for any programmer to be able to read and understand the code.
Apps Hungarian
In response to the first comment, there is another form of Hungarian Notation (Apps Hungarian). See Wikipedia for a more in-depth description of it, but essentially it associates information relating to the usage or purpose of a variable with its name, rather than its type.
In strongly typed languages this is a much more useful approach, and is well worth considering - or at least (IMHO) in concept. I find the prefixes chosen often tend to be rather complicated and unfriendly (e.g. rw instead of row to my mind just obfuscates the prefix without any practical gain). I also think many examples are rather pointless (e.g. str to indicate that a variable is a string, in many languages is redundant because strings are often only represented in one form, and if the variable is named sensibly ("UserName" rather than "Data") it is often a pretty obvious that it will be a string).
A Modern Alternative
In my opinion/experience, what usually matters is clarifying a few key differences between variables (e.g. we need to treat members, pointers, volatiles and constants quite differently from each other - mixing up a member and a parameter or indexing an array with the wrong index variable can be catastrophic, and modern compilers do little to protect us from these mistakes). The difference between a list and a string is usually obvious if sensible descriptive variable naming is used, and type-safe languages will tell us if we've mixed these types up, so we don't need prefixes for these cases. This led to my own extremely simple prefixing approach which is explained in my answer to this stack overflow question.
Hopefully this post may give you something to think about when deciding if prefixes will be beneficial for you. Ultimately, any prefixing scheme you apply needs to be something that you believe (or better, can prove) is beneficial to you and your team. Don't just follow someone else's scheme - think about how and why a prefix might be useful, and evaluate it objectively before you adopt it or discard it.

It really depends more on the development environment and the team standards than on the language. If you happen to be using Fortran, but in a good IDE with code analysis and navigation, then you probably don't need Hungarian Notation.
I think the real question to ask yourself is, "What would Hungarian Notation give me?"
That is, what is its value? Even when using an old language, you can still apply good coding practices and techniques. Keep routines small, keep variable scope small, etc. Now, I'm not expert in Fortran, so I don't know what the limitations are. But I imagine it would still hold true.
Hungarian Notation is particularly useful when you're using a limited editor (no information about the variable on mouse hover, for example) and when your variable has a fairly long lifespan. By which I mean the use of the variable continues well beyond its definition.
Since the inception of Fortran so long ago, we as an industry have learned a lot about organizing code and working effectively on development teams. (Even if the team is only you... Remember that any code you wrote more than a few months ago may as well have been written by somebody else.) You may be able to apply these lessons to your Fortran code.
Use informative variable names. Storage isn't expensive, characters don't take long to send over modems anymore... Reasonably long variable names are both acceptable and encouraged, provided that they convey information about what that variable is and means. Additionally, keep the use of the variable close to its definition. So the additional context of the surrounding code reveals more information about the variable than just its name.
In short, if you have a global variable called price used throughout the application, then calling it dblPrice to indicate that it's an double adds useful information to the variable name. But there are more meaningful ways to add that useful information. price is a poor name for a large-scope variable in the first place, and the scope may be able to be narrowed if possible.

Symbolic vs word operators eg ( '||' vs 'or' and '!' vs 'not')

I think or is keyword in c++.
It might be that I've been doing too much python code recently but I find or is more readable than || and xor much more readable than ^.
Is it a good idea to use the word alternatives to the symbolic operators?
Why don't I see them used more?

The unsatisfying answer is that you should use symbolic operators because everyone else does.
An arguably more sensible reason is that they stand out more from the rest of the code.

Is it a good idea to use the word alternatives to the symbolic operators?
Completely depends on the target audience for your code – both people and tools. People can be unused to them, and some tools don't recognize them. (Sometimes those tools use <ciso646> to define them as macros.)
I've started to use "and" and "or" more, especially when switching between C++ and Python, and it has been more readable. The bit of extra consistency between languages matters more than I first thought it would, but more importantly, && and || are control structures and not operators (i.e. short-circuiting), thus making them words differentiates from operators.
(Yes, technically they're operators in C++, but they're more similar to if, else, return, while, and so forth than +, -, *, and other operators. The comma and conditional operators are similarly control structures, and it probably isn't a coincidence they are often found confusing, or at least less readable than separate statements and if/else, respectively.)
However, I very rarely use them in new code written for SO, for example, because I've not yet encountered a question where bringing up this side issue was more important than being readable to SO's C++ audience.

Every C++ programmer knows about && and ||.
Not every C++ programmer is aware that and and or are legal alternatives.
For that reason alone, you're better off sticking with what's commonly used.
It is pretty easy to get used to, so I'd say it's not a big deal, and definitely not worth potentially confusing the reader of your code over.

These keywords are alternative tokens which were added to the standard (Standard C) in 1995. See details here
https://en.wikipedia.org/wiki/C_alternative_tokens
Why the keywords were added:
The alternative tokens allow programmers to use C language bitwise and logical operators which could otherwise be hard to type on some international and non-QWERTY keyboards.
How they were added:
They are implemented as a group of macro constants in the C standard library in the iso646.h header.
The iso646.h header defines 11 macros including or.
What about C++:
The above-mentioned identifiers are operator keywords in the ISO C++ programming language and do not require the inclusion of a header file. For consistency, the C++98 standard provides the header <ciso646>. However the latter file has no effect, being empty.
So, there is a historic reason for having the keywords in C/C++ languages, and it is not related to what is better to use. As mentioned above, you should stick to a coding convention.

My first question is or a bit-wise or | or a Boolean shortcut or ||?
I bet there is half a dozen people on my team that would have to go look it up.
So I think it is better to stick with the standard convention,
Because that is what people are used to. The whole point of programming is to not be ambiguous.

These keywords are only there for terminals that can't handle the special characters |, & etc. Whether they constitute a more readable code or not is arguable.
If you know what || means then or is not more readable than ||. And if you know the very fundamentals of C++, i.e. the syntax, then in my humble opinion one is not more readable than the other.
Also, C++ programmers in most cases use the special-character alternatives of the keywords. So it's usually a good idea not to be the exception in a project, unless you're starting a project and you're setting the rules.

|| is how you say "boolean or" in C++. If you actually write or you are going to confuse the heck out of readers of your code, even if you can get the compiler to accept it.
I really do have sympathy for your argument, deft. Honestly. C++ code is really ugly (and IMHO hard to follow) due to its reliance on line-noise-like symbology. But that's the C++ philosophy. If you want nice Englishy reable code, C++ is just not your language. Try Ada.
I'm serious here. In a lot of ways Ada is a better language. This is one of them. But if you are going to stick with C++ instead, you need to ebrace it. Trying to write Ada (or Pascal) in C++ is no better than trying to write C++ or Fortran in Ada.

Technical reasons for names containing underscores?

Are there any technical reasons for the use of the underscore in names like (for example) scoped_lock in the Boost library? Why not call it `ScopedLock?
Please note I am not asking about stylistic reasons.

From the Boost Library Requirements and Guidelines,
Given the intent to propose portions of boost for the next revision of the C++ standard library, boost decided to follow the standard library's conventions.

There is no technical reason. If you ignore the stylistic reason, you could write scopedlock, istreamiterator and the-like too.

Readability if you can call that technical... spaces are usually forbidden and underscore is the nearest match. Camel case is horrible to read (an often is reserved for classes as a convention)..

Underscores improve the interface with human neural hardware by creating more space between separate words.
I used to prefer camelcase when I was little, and had a small monitor and small hands. I've mostly come around, though.

Subjectively I find underscores a bit of overkill in code. There is enough abuse of non-alphanumeric symbols in code as is, I think introducing them into identifiers is a bit over the top. Just off the top of my head consider this excerpt from a boost template error:
Derived=boost::transform_iterator<std::binder1st<std::multiplies<size_t>>,boost::counting_iterator<size_t>>,
Base=boost::counting_iterator<size_t>,
Value=boost::detail::transform_iterator_base<std::binder1st<std::multiplies<size_t>>,boost::counting_iterator<size_t>,boost::use_default,boost::use_default>::cv_value_type,
Traversal=boost::use_default,
Reference=boost::detail::transform_iterator_base<std::binder1st<std::multiplies<size_t>>,boost::counting_iterator<size_t>,boost::use_default,boost::use_default>::reference,
Difference=boost::use_default
versus the following that has been converted to Pascal case (I prefer this method):
Derived=boost::TransformIterator<std::Binder1st<std::Multiplies<SizeT>>,boost::CountingIterator<SizeT>>,
Base=boost::CountingIterator<SizeT>,
Value=boost::detail::TransformIteratorBase<std::Binder1st<std::Multiplies<SizeT>>,boost::CountingIterator<SizeT>,boost::UseDefault,boost::UseDefault>::CVValueType,
Traversal=boost::UseDefault,
Reference=boost::detail::TransformIteratorBase<std::Binder1st<std::Multiplies<SizeT>>,boost::CountingIterator<SizeT>,boost::UseDefault,boost::UseDefault>::Reference,
Difference=boost::UseDefault
I can see the advantage of underscores when taken in isolation but with all our other symbols I think we should focus on making programs that read closer to english and not underscore-ese.

There's no technical reason, but there's a reason. You've got to agree with me that it's much easier to read scoped_lock then scopedlock, but scopedLock would make it too. Yet, with underscore is easier to read, IMHO.
But a well-written code is a legible code. It's part of knowing to program well.

There's no technical reason.
Variable names in C++ must only
Start with a letter or underscore
Contain only number, letters (capitalized or not) and underscores
Using this_way or ThisWay is just a matter of style.

The only technical reason is for readability because using CamelCase may cause the wrong interpretation, especially when referring to abbreviations in all caps. A GPS Socket would come out as GPSSocket. There are some better examples, but my mental block precludes me from writing them down. :-(
If you want to get technical, there is no reason since the underscore is a viable character for identifiers.

Although technically speaking there is no difference there could be issues caused by environment. For instance, if you include windows.h you will not want to name any function TextOut even if that's what the function does. The reason is that this name will get replaced by the preprocessor due to the fact that TextOut is a macro in the win32 API. For this reason a project manager may wish to impose non-camel case as a standard.
So there can be technical reasons but there's no reason imposed by the language itself. It's not like Java (does it still do this?) where you are forced by the compiler to use camel case.

There is no technical reason per se. But I do have a reason other than my glib "because they look kewl."
My reason is because I find it useful to distinguish member variables from non-member variables in a convenient way. In particular when I am transferring data from a local variable to a member variable, such as within a constructor. Cheap example:
class Socket
{
public:
Socket(const sockaddr_in& group)
: group_(group)
{
}
private:
sockaddr_in group_;
};
If you ask my opinion, most variable naming schemes are terrible because there are too many rules and too many ways they break down. The classic example of a horrible naming scheme is Hungarian, but even from that I did take something useful: the m_ prefix for member variables came in handy at times. Not too often but often enough for me to borrow the idea if not the method.

There is no technical reason. It is purely stylistic. To be specific, C++ views all symbols that begin with a letter, underscore, or dollar sign the same. The only difference is how they are declared. If you want, you can name your "thing" class as Thing, THING, thing, tHiNg, or even T_h_I_n_G_$ if you want... it won't make a difference to the compiler. However, it does make a difference to other human beings that will look at and use your code. And if you take this too far (such as the last couple of examples I listed), you might even find your life in danger at some point (an angry programmer can be a terrifying thing).

There is no technical reason for or against except that which is imposed by the language, which in this case, does not exist.

This reason skirts the edges of being stylistic, but since no one else has mentioned this so far, I'll simply add that in a case sensitive language like C++, underscores are more memorable than capitalization.
For example, sometimes you might see scopedLock instead of ScopedLock. If you never use caps, that's just one less thing to keep track of.

Well, not the compilers, but prefast rulesets sometimes try to enforce naming conventions. To be frank, so many conventions are really confusing; escpecially when one needs to support old code as well as write new code in multiple languages.

One technical reason I can think of (especially for member function names) is to allow duck-typing. For example, the following boost classes could be used (to some extent) where one expects an STL container:
boost::ptr_container and family
boost::multi_index containers
boost::array
boost::dynamic_bitset (in lieu of boost::bitset)

IMHO, it is pretty reasonable to adopt the style of the Standard Library for the language you use. If it is Java, it is scopedLock, if it is C++ it is scoped_lock. If it is Lisp, it is scoped-lock.
Not that it really matters, anyway.

When C was invented, it was used on Unix, and Unix was operated from terminals that resembled typewriters. Some terminals had both upper and lower case letters, but some terminals had only upper case. If you wanted to use a Unix system but all of the nice terminals were already occupied by your mean greedy selfish colleagues, you got stuck with an older terminal. This is the reason why, if you type your login name in all upper case characters, Unix assumes you don't have lower case. Each lower case letter gets displayed as the corresponding upper case letter, and each upper case letter gets displayed as an asterisk followed by itself.
Now imagine camel casing instead of underscores.
By the way C was based more or less loosely on PL/I. PL/I was punched into cards which originally didn't support lower case, and eventually could be hacked to support lower case but not in a puncher-friendly fashion. Furthermore it was usually printed by printers that didn't support lower case, though a few did. So lower case was out, camel case was out, and programmers were used to underscores. (Except for Cobol programmers, who were used to minus signs in the middle of identifiers meaning this-is-an-identifier not this minus is minus an minus identifier.)
Pascal was invented later, in an environment where lower case letters were more common but still not universal. Camel case became possible because Pascal was case insensitive. Camel case became popular because Pascal didn't allow underscores in identifiers.
So if you like camel case combined with case sensitivity, you're half-Pasced.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js