Which code indentation style is the de facto standard? - indentation

The shop I work in uses 2-space indentation for all code (decided long, long ago), both front-end and back-end development.
It is my opinion that 4-space indentation is the de facto standard in the development world, but I have no facts, couldn't find any, and am not sure where to look.
What is the de facto standard for code indentation?

I think it's commonly understood that using spaces over tabs is a sort of de facto standard in most industries for cross-platform reasons. Having to deal with tab characters on top of spaces and newlines ends up introducing overhead for dealing with lots of IDEs and so forth. It's also kind of common for shops that just use one exclusive IDE for all development to stick to tabs, which allows users to alter the appearance of the tab to whatever they prefer and the code stays consistent (2-4-30 spaces doesn't matter: each indent level is one character). Lea Verou had some thoughts on why tabs are superior. Ironically, there seems to be some consensus that tabs are bad for markup languages.
That being said, within the narrower choice between 2 and 4 spaces, the answer seems to be this: if you use a graphical IDE or editor (IntelliJ/Eclipse/TextMate/Sublime Text), the standard is often 4 spaces. If you use a command-line editor (vim/emacs/nano), the standard is often 2 spaces. There are compelling reasons for both, and a few factors contribute to the split. Horizontal space is often at a premium in console environments, especially ones that are prone to splitting views with screen/tmux, and horizontal space is generally found to be the larger concern.
A lot of people are hating on this question because it's potentially off-topic, but a de facto standard is actually quantifiable: look through a variety of code bases on GitHub (or whatever environment is relevant to your industry) and see what is standard in your environment. That being said, there are lots of thoughts on this topic currently, and a lot already in the Stack Exchange community.
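One rough way to measure what a code base actually uses is to tally how far each line's indentation grows relative to the line before it. The sketch below is only an illustration: the function names are made up here, tab-indented lines are simply skipped, and continuation lines or aligned arguments can skew the result.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Count leading spaces on a line. Returns -1 for blank lines and for lines
// whose indentation contains a tab (those are ignored in this sketch).
int leading_spaces(const std::string& line) {
    std::size_t i = 0;
    while (i < line.size() && line[i] == ' ') ++i;
    if (i == line.size() || line[i] == '\t') return -1;
    return static_cast<int>(i);
}

// Guess the indent width of a source text: tally every increase in
// indentation relative to the previous counted line and return the most
// frequent increase (0 if no line is ever indented deeper than the last).
int guess_indent_width(const std::string& source) {
    std::istringstream in(source);
    std::map<int, int> tally;  // increase in spaces -> number of occurrences
    std::string line;
    int previous = 0;
    while (std::getline(in, line)) {
        int current = leading_spaces(line);
        if (current < 0) continue;
        if (current > previous) ++tally[current - previous];
        previous = current;
    }
    int best = 0;
    int best_count = 0;
    for (const auto& entry : tally) {
        if (entry.second > best_count) {
            best = entry.first;
            best_count = entry.second;
        }
    }
    return best;
}
```

Running this over every file in a sample of repositories and counting the results per width would give a crude, but at least quantifiable, picture of the prevailing style.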

Related

clojure terminating parenthesis syntax

Is there any reason why the expression
(foo5 (foo4 (foo3 (foo2 (foo1 arg)))))
cannot be replaced with
(foo5 (foo4 (foo3 (foo2 (foo1 arg)-)
or the like, and then expanded back?
I know the lack of reader macros means that you cannot change the syntax, but could this expansion be hard-coded into the Java implementation?
I do this when I hand write code.
Yes, you could do this, even without reader macros (in fact, you can change Clojure's syntax with a bit of hacking).
But the question is, what would it gain you? Would it always expand to top level? Then cutting and pasting code would fail if you moved it to or from top level. And, of course, all the various tools that operate on Clojure syntax would need to understand it.
Ultimately if you really dislike all the close parens why not use
(-> arg foo1 foo2 foo3 foo4 foo5)
instead?
Yes, this could be done, but I'm not sure it is the right solution and there are a number of negatives which will likely outweigh the benefits.
Suggestions like this are often the result of poor coding tools and a 'traditional' conceptual model for writing code. Selecting the right tools and looking at your code from a slightly different perspective will usually eliminate the cause which led to this type of suggestion.
Most of the non-functional, non-Lisp-style languages are based around a token and line model of code. You tend to think of the code in terms of lines of tokens and you tend to edit the code on this basis. There is typically less nesting of expressions, and lines are usually terminated with some marker, such as a semicolon. Likewise, tools such as your editor have features which have evolved to support token- and line-based editing. They are good at it.
The Lisp-style languages are less focused on lines of tokens. The emphasis here is on list forms. Lines of tokens are replaced with nested lists of symbols: the line is less relevant and you typically have a lot more nesting of forms. This change means your standard line-oriented tools, like your editor, are less suitable. The typical mental model of the code as lines of tokens is also less useful.
With languages like Clojure, you're better off thinking in terms of list forms and not lines of code. Once you make this transition, you start looking for tools which also model the code along these lines. For example, you either look for editors specifically designed to work with lists of data rather than lines of data, or you look for editors which have extensions that allow you to work with lists.
Once your editor understands that lists are the fundamental grouping unit, not lines, things like parentheses become largely irrelevant from a code writing/editing perspective. You don't worry about closing parentheses, counting nesting levels, etc. This all gets managed by the editor automatically. You don't move by lines, you move by lists; you don't kill/delete a line, you kill a list; you don't cut and copy a block of lines, you cut and copy a list of lists, and so on.
The good news is that in many respects, the structure of these list-based code representations is actually easier to manipulate than that of most line-based languages. This is primarily because there is less ambiguity or complexity. There are fewer exceptions to the rules and the rules are inherently simple. As a consequence, many editors designed for programmers will have support for this style of coding as well as advanced features which are difficult to implement in less structured code.
My suspicion is that your suggestion to have an additional bit of syntactic sugar to avoid having to type multiple closing parentheses is actually a symptom of not having the right tools to write your code. Once you do, you will almost never need to enter a closing parenthesis or count opening parens to ensure you get the nesting right. This will be handled by the editor. Your biggest challenge will be in shifting your mental model to think in terms of lists and lists of lists. The parens will become largely invisible and you will jump around in your code according to list units rather than line units. The change is not easy and it can take some time to retrain your brain and fingers, but once you do, you will likely be surprised at how quickly you begin to edit and manipulate your code.
If you're an Emacs user, I highly recommend extensions such as paredit and lispy. If you're using some other editor, look for paredit-type extensions. However, as these are extensions, you must also spend some time training yourself to use whatever key bindings the extension uses. There is no point having an extension with great list-based code navigation if you still just move around with the arrow keys (unless it is Emacs and you have re-bound those arrow keys to use the paredit navigation bindings).

What is the difference between indentation and pretty printing used in c++?

Indentation and pretty printing are both used to improve the clarity and readability of a program, and both rely on whitespace. How can I distinguish between them?
Pretty printing is a method used to make your code easily readable and understandable. Wikipedia explains pretty printing as follows:
Prettyprint (or pretty-print) is the application of any of various
stylistic formatting conventions to text files, such as source code,
markup, and similar kinds of content. These formatting conventions can
adjust positioning and spacing (indent style), add color and contrast
(syntax highlighting), adjust size, and make similar modifications
intended to make the content easier for people to view, read, and
understand. Prettyprinters for programming language source code are
sometimes called code beautifiers or syntax highlighters.
Now let's see what indentation is:
In the written form of many languages, an indentation is an empty
space at the beginning of a line to signal the start of a new
paragraph. Many computer languages have adopted this technique to
designate "paragraphs" or other logical blocks in the program.
In computer programming languages, indentation is used to format
program source code to improve readability. Indentation is generally
only of use to programmers; compilers and interpreters rarely care how
much whitespace is present in between programming statements.
From these, one can understand that indentation is one way of implementing pretty printing.

isFollowingCamelCaseConventionInCPlusPlus more_important_than_readability?

I'm moving back from Python to C++ for my next project.
I know why I shouldn't and I know why I should. Never mind that debate.
C++ conventionForVariablesIsCamelCaseAndI'mHavingTroubleAcceptingIt as, at_least_for_my_eyes_it's_less_readable_than_the_lower_case_underscored_convention.
Did any of you encounter an article or information claiming programmers should adopt lower_case_underscored convention and abandon the camelCase, even in C++ projects? Or perhaps research that shows that one is indeed scientifically more readable than the other?
If coding in a team, consistency is probably more important than personal preference; team members should not have to context switch between reading Joe's code and reading Jane's code. Equally if coding academic assignments, course style like team style should be adhered to (for right or wrong), simply because the person awarding the marks will be used to reading that, and you need to extract the best possible mark for your work!
I would suggest otherwise that one convention has little advantage over another. CamelCase does provide a certain efficiency of symbol length.
If it's a private project, use whatever naming convention you feel comfortable with and helps you be productive. Just bear in mind that it is generally a good idea to stay in keeping with the overall "style" of the language and its usual practice, since samples and examples will usually use that style, making integration easier and less "jarring".
If it's a public project it's probably better to use the conventions since that's easier for other people to work with.
If it's corporate, do whatever your corporate guidelines mandate. If there aren't any, then I'd do the same as for a public project.
One thing I'd personally say about CamelCase, is not to get completely hung up on it, and apply common sense for the sake of readability. For example, I've often seen abbreviations in camel case names written as part upper / part lower, which I think really hurts readability. In cases like this I'd always go for the more readable option. So, for example I'd always write:
private string targetURL;
rather than
private string targetUrl;
But, this is just personal preference.
There is no agreed-upon convention for C++ naming among C++ programmers; however, lower case with underscores is used in both the C++ standard library and in Boost.
As for the coding standards document you linked...
A link to a random company's coding standards that use "Common practice in the C++ development community" as a justification for their standards, yet provide no citation for that statement smells like a false appeal to authority in order to justify the preferences of whoever wrote the document.
CamelCase vs underscores: Scientific showdown describes a single scientific study which found:
Considering all four hypotheses together, it becomes evident that the camel case style leads to better all around performance once a subject is trained on this style. Training is required to quickly recognize such an identifier.
But then the page disagrees that their conclusions are valid. :)
It also has two polls of visitors to the site, both of which are 50/50 in favor of each style.

Standard/common mark up language for user provided content?

I suppose this might be similar to this question, but I'm wondering if there's a standard/popular markup language for wiki and similar style user-provided content. With the proliferation of different markup syntaxes out there, it seems like there would be a de facto one to implement. There appears to have been at least one group that wanted to create a standard (and an RFC for it), but they appear to have fizzled out around mid-2005.
So does anybody know what the standard/most popular one is? If there isn't one, what's the easiest for users but has a good flexibility for advanced users?
Markdown is popular in some circles. In addition to the original version in Perl, there's a C version called Discount and several other implementations including one in Lua.
There's a list of lightweight markup languages here.
No standard that I know of.
I'd say the most popular (well, in the projects I tend to follow at least) seems to tend towards Markdown and Textile these days.
It's often a matter of preference. My preference goes to Markdown but there wouldn't be that much to argue about if someone preferred another markup.

Best practices for localized texts in C++ cross-platform applications?

In the current C++ standard (C++03), there are too few specifications about text localization and that makes the C++ developer's life harder than usual when working with localized texts (certainly the C++0x standard will help here later).
Assuming the following scenario (which is from real PC-Mac game development cases):
responsive (real time) application: the application has to minimize non-responsive times to "not noticeable", so speed of execution is important.
localized texts: displayed texts are localized in more than two languages, potentially more - don't expect a fixed number of languages, should be easily extensible.
language defined at runtime: the texts should not be compiled in the application (nor having one application per language), you get the chosen language information at application launch - which implies some kind of text loading.
cross-platform: the application is coded with cross-platform in mind (Windows - Linux/Ubuntu - Mac/OSX) so the localized text system has to be cross-platform too.
stand-alone application: the application provides all that is necessary to run it; it won't use any environment library or require the user to install anything other than the OS (like most games for example).
What are the best practices to manage localized texts in C++ in this kind of application?
I looked into this last year and the only thing I'm sure of is that you should use std::wstring or std::basic_string<ABigEnoughType> to manipulate the texts in the application. I stopped my research because I was working more on the "text display" problem (in the case of real-time 3D), but I guess there are some best practices to manage localized texts in raw C++ beyond just that and "use Unicode".
So, all best-practices, suggestions and information (cross-platform makes it hard I think) are welcome!
At a small Video Game Company, Black Lantern Studios, I was the Lead developer for a game called Lionel Trains DS. We localized into English, Spanish, French, and German. We knew all the languages up front, so including them at compile time was the only option. (They are burned to a ROM, you see)
I can give you information on some of the things we did. Our strings were loaded into an array at startup based on the language selection of the player. Each individual language went into a separate file with all the strings in the same order. String 1 was always the title of the game, string 2 always the first menu option, and so on. We keyed the arrays off of an enum, as integer indexing is very fast, and in games, speed is everything. (The solution linked in one of the other answers uses string lookups, which I would tend to avoid.) When displaying the strings, we used a printf()-type function to replace markers with values, as in "Train 3 is departing city 1."
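As a rough illustration of the enum-keyed table described above, here is a minimal sketch. It is not the code we shipped: the enum values, the class name `StringTable`, and the one-string-per-line text format are all assumptions made for the example (we used custom binary files), and it stores UTF-8 in std::string for brevity.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only needed for the usage shown in the comment below
#include <string>
#include <vector>

// Hypothetical string IDs. Their order must match the line order of every
// language file.
enum StringId {
    STR_TITLE,
    STR_MENU_START,
    STR_MENU_QUIT,
    STR_COUNT  // number of entries, not a real string
};

class StringTable {
public:
    StringTable() : strings_(STR_COUNT) {}

    // Reads one string per line. Returns false, leaving the current table
    // untouched, if the input has fewer than STR_COUNT lines.
    bool load(std::istream& in) {
        const std::size_t wanted = STR_COUNT;
        std::vector<std::string> loaded;
        std::string line;
        while (loaded.size() < wanted && std::getline(in, line)) {
            loaded.push_back(line);
        }
        if (loaded.size() != wanted) return false;
        strings_.swap(loaded);
        return true;
    }

    // Lookup is a plain array index, no hashing or string comparison.
    const std::string& get(StringId id) const { return strings_[id]; }

private:
    std::vector<std::string> strings_;
};

// Usage, assuming the German file holds these three lines:
//   std::istringstream german("Lionel Trains\nStart\nBeenden\n");
//   StringTable table;
//   table.load(german);
//   table.get(STR_MENU_QUIT);  // "Beenden"
```

In a real project the enum would be generated from the same source as the language files, so the two cannot drift apart.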
Now for some of the pitfalls.
1) Between languages, phrase order is completely different. "Train 3 is departing city 1." translated to German and back ends up being "From City 1, Train 3 is departing". If you are using something like printf() and your string is "Train %d is departing city %d.", the German will end up saying "From City 3, Train 1 is departing.", which is completely wrong. We solved this by forcing the translation to retain the same word order, but we ended up with some pretty broken German. Were I to do it again, I would write a function that takes the string and a zero-based array of the values to put in it. Then I would use markers like %0 and %1, basically embedding the array index into the string. Update: @Jonathan Leffler pointed out that a POSIX-compliant printf() supports %2$s-type markers, where the 2$ portion instructs printf() to fill that marker with the second additional parameter. That would be quite handy, so long as it is fast enough. A custom solution may still be faster, so you'll want to test both.
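A minimal sketch of the %0/%1 idea might look like the following. The function name `format_positional` is made up for this example, it only handles single-digit indexes, and the values are assumed to be converted to strings by the caller beforehand.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces %0..%9 in `pattern` with args[0]..args[9]. "%%" yields a literal
// '%'. A marker whose index has no matching argument is copied unchanged.
std::string format_positional(const std::string& pattern,
                              const std::vector<std::string>& args) {
    std::string out;
    out.reserve(pattern.size());
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '%' && i + 1 < pattern.size()) {
            const char next = pattern[i + 1];
            if (next == '%') {  // escaped percent sign
                out += '%';
                ++i;
                continue;
            }
            if (next >= '0' && next <= '9') {
                const std::size_t index = static_cast<std::size_t>(next - '0');
                if (index < args.size()) {
                    out += args[index];
                    ++i;  // skip the digit
                    continue;
                }
            }
        }
        out += c;
    }
    return out;
}
```

With the values {"3", "1"}, the English pattern "Train %0 is departing city %1." and a reordered pattern such as "From city %1, train %0 is departing." both come out with the train and city numbers in the right places, because each translation decides where every value goes.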
2) Languages vary greatly in length. What was 30 characters in English sometimes came out to as much as 110 characters in German. This meant it often would not fit the screens we were putting it on. This is probably less of a concern for PC/Mac games, but if you are doing any work where the text must fit in a defined box, you will want to consider this. To solve this issue, we stripped as many adjectives from our text as possible for other languages. This shortened the sentences but preserved the meaning, at the cost of losing a bit of the flavor. I later designed an application that would contain the font and the box size and allow the translators to make their own modifications to get the text to fit into the box. Not sure if they ever implemented it. You might also consider having scrolling areas of text, if you have this problem.
3) As far as cross platform goes, we wrote pretty much pure C++ for our Localization system. We wrote custom encoded binary files to load, and a custom program to convert from a CSV of language text into a .h with the enum and file to language map, and a .lang for each language. The most platform specific thing we used was the fonts and the printf() function, but you will have something suitable for wherever you are developing, or could write your own if needed.
I strongly disagree with the accepted answer. First, using static array lookups to speed up the text lookups is counterproductive premature optimization: calculating the layout for said text and rendering it takes 2-4 orders of magnitude more time than a hash lookup. If anyone wanted to implement their own language library, it should never be based on static arrays, because doing so trades real benefits (translators don't need access to the code) for imaginary benefits (a speed increase of ~0.01%).
Next, writing your own language library to use in your own game is even worse than premature optimization.
There are some extremely good reasons to never write your own localization library:
Planning the time to use an existing localization library is much easier than planning the time to write a localization library. Localization libraries exist, they work, and many people have used them.
Localization is tricky, so you will get things wrong. Every language adds a new quirk, which means whenever you add a new language to your own homegrown localization library you will need to change code again to account for the quirks. Did you know that some languages have more than 2 plural forms, depending on the number of items in question? More than 2 genders (more than 10, even)? Also, number and date formats vary a lot between languages.
When your application becomes successful you will want to add support for more languages, including languages nobody on your team speaks fluently. Hiring someone to write a translation will be considerably cheaper if they already know the tools they are working with.
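To give a feel for the plural-form quirk mentioned above, here is a small sketch of the kind of rule a localization library has to carry for each language. The function names are invented for this example; the Polish rule follows the plural expression commonly used in Gettext catalogs for that language.

```cpp
// English-like languages: two forms, chosen by "is it exactly one?".
// Returns 0 for the singular form, 1 for the plural form.
unsigned plural_index_english(unsigned long n) {
    return n != 1 ? 1u : 0u;
}

// Polish: three forms, e.g. 1 plik, 2 pliki, 5 plików, 22 pliki.
// Returns 0 for n == 1, 1 for numbers ending in 2-4 (except 12-14),
// and 2 for everything else.
unsigned plural_index_polish(unsigned long n) {
    if (n == 1) return 0;
    const unsigned long last = n % 10;
    const unsigned long last_two = n % 100;
    if (last >= 2 && last <= 4 && (last_two < 10 || last_two >= 20)) return 1;
    return 2;
}
```

A homegrown library that assumed "singular or plural" would need a code change for every language with a rule like the second one; an existing library already ships these rules as data.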
A very well-known and complete localization library is GNU Gettext, which uses the GPL, and should therefore be avoided for commercial work. You can instead use the Boost library Boost.Locale, which works with Gettext files and is free to use and modify for commercial and non-commercial projects of any kind.
GNU Gettext does it all.
There won't be any additional features in the C++0x standard, as far as I can tell. I suspect the Committee considers this a matter for third-party libraries.