Levels of Indentation - C++

How many levels of indentation do you consider reasonable?
I feel that having a C++ function with 4/5+ levels of indentation is normally a bad thing. It implies that you have to mentally keep track of 4/5+ things the whole time.
Is my opinion justified?
(yes, I can avoid having multiple levels of indentation by not indenting at all:)

I agree with you. If a function has more than 4 or 5 nested if/switch/loop/try statements, parts of it should be extracted into their own functions.
This will make the code more readable because the extracted functions' names are usually more descriptive than the code itself.
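For instance, here's a minimal sketch (the Order/Item names are invented purely for illustration) of pulling an inner nested block out into a named helper:
#include <vector>
struct Item  { bool inStock() const { return true; } };
struct Order {
    std::vector<Item> items_;
    bool isValid() const { return true; }
    const std::vector<Item>& items() const { return items_; }
};
void ship(const Item&) {}
// Extracted helper: its name says what the inner nesting was doing.
void shipAvailableItems(const Order& order) {
    for (const Item& item : order.items()) {
        if (item.inStock())
            ship(item);
    }
}
// The caller drops from four levels of indentation to two.
void processOrders(const std::vector<Order>& orders) {
    for (const Order& order : orders) {
        if (order.isValid())
            shipAvailableItems(order);
    }
}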

It depends entirely on the problem your code is trying to solve. Sometimes you have no choice but to have quite deep levels of indentation, though it certainly is a code smell.
I think you're right that 4 or 5 levels or so is reasonable; any more and you should probably be looking to refactor the method.
It's also worth noting that people have been trying to quantify code quality and design metrics for many years. One of the more common metrics is cyclomatic complexity.
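Roughly speaking, cyclomatic complexity counts the independent paths through a routine: one plus the number of decision points. A hedged sketch (the function is hypothetical) of how the count works:
// Decision points: if (+1), && (+1), else-if (+1), for (+1); plus the base path => complexity 5.
int classify(int a, int b, const int* data, int n) {
    int score = 0;
    if (a > 0 && b > 0)
        score = 1;
    else if (a < 0)
        score = -1;
    for (int i = 0; i < n; ++i)
        score += data[i];
    return score;
}
Deeply nested code tends to accumulate exactly these decision points, which is why the two smells usually go together.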

Actually, it's not the number of indentation levels that most contributes to unreadable code, it's the length of the module/function/method that you're looking at.
Of course, long sections typically have more levels of indentation because blocks of code are used inline rather than broken out, so there is a relation between the two. Personally, I think there's a smell if a method has more than a couple of screenfuls of code and more than 6 levels of indentation.

I think it can depend on the situation, but generally you'll want less nesting.
Nested loops can kill performance, and nested if statements can often be reduced to fewer levels.
The problem is that often you run into issues trying to optimize too early in your development. I think that you should always take a little time to consider the possibilities before doing 3+ levels of nesting.
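One common way to cut down nested ifs (a sketch; the Session/Document types are invented) is to turn the nesting into early returns:
struct Document { bool isDirty() const { return true; } bool save() { return true; } };
struct Session  { bool isAuthenticated() const { return true; } };
// Nested version: three levels deep.
bool saveNested(Session* s, Document* doc) {
    if (s != nullptr) {
        if (s->isAuthenticated()) {
            if (doc->isDirty())
                return doc->save();
        }
    }
    return false;
}
// Guard-clause version: one level, same behavior.
bool saveFlat(Session* s, Document* doc) {
    if (s == nullptr) return false;
    if (!s->isAuthenticated()) return false;
    if (!doc->isDirty()) return false;
    return doc->save();
}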

My only pet peeve regarding indentation is when two keywords are used and one depends on the other. For instance:
switch (c) {
    case 'x':
        foo = bar;
That is bad, you can't have a case outside of a switch, so:
switch (c) {
case 'x':
    foo = bar;
.. would be much better. Indentation tends to get crazy in switches with if / else if / else. I also highly recommend keeping an 80-column (at most 110-column) limit while indenting. Nobody likes scrolling 30 columns to the right to see what you're doing, and 30 columns back to see how you handle the results :) Never mind that poor soul who has to edit your code in dumb 80x25 console mode on a server.

Although what I'm going to say comes from a plain C standpoint, it might apply to C++ as well. I favor the Linux kernel coding style, that is: 8 characters-wide tab indentation (natural tab) and as many levels of indentation as there fit on a 80x25 terminal (without exceeding the 80 characters width, that is).

Too much indentation is probably a sign that you should refactor: create a couple more small methods with good names. That way you break up the logic into smaller pieces that are easier to gobble up.

This relates to the question of how long a function should be. If you keep your function bodies to ten lines or fewer, then it's really hard to get too many levels of indentation.
I think 4 or 5 levels is too many. You can almost certainly factor out some of those inner loops or conditionals into their own functions.
I feel queasy with three levels of indentation myself. Even two levels causes me to reconsider what I'm doing.

Many levels of indentation is fine as long as you can see all the open and close brackets at the same time in your editor.
This tends to limit you to maybe 4 levels max in practice.

One thing to note however is that some nesting levels are necessary and should not be abstracted out.
If you have some sort of brute force iteration over a 4D array, then you've got four levels of nesting automatically, and if there's an IF statement, then you have five levels without having a particularly convoluted section of code.
So the level of nesting isn't necessarily the problem, it's the complexity attached to each level of nesting. If you are mixing WHILEs, FOREACHs, IFs and SWITCHes then maybe you should boil them down.
The basic point is that levels of nesting aren't always an indicator of complexity.
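As a minimal sketch of that point (the array and handler are made up), the nesting here is deep but each level is trivial:
const int N = 8;
int grid[N][N][N][N] = {};
void handleCell(int) { /* hypothetical per-cell work */ }
void bruteForceScan() {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                for (int l = 0; l < N; ++l)
                    if (grid[i][j][k][l] != 0)      // fifth level, still obvious
                        handleCell(grid[i][j][k][l]);
}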

IIRC, Linus Torvalds would agree with you. I believe he insists on an 8-space indent and an 80-character line length for the Linux kernel code. When you get to 4 or 5 levels of indent under that scheme, your code looks horribly scrunched. You're forced to refactor the code just to maintain readability.

I used to think Linus was a bit of a hard-ass on this when he said that if you need more than 3 levels of indentation, your code is broken, but following that principle has helped me a lot in reasoning about my control flows. That said, I had to interpret it differently. The intuitive interpretation is that you should create more and more itsy bitsy functions. I don't recommend taking that idea to the extreme, as that can make your control flows and side effects just as difficult to reason about when all the side effects are obscured by function calls leading you all over the place.
What helped me enormously was favoring deferred processing and simpler loops. Take this as an example:
// Remove vertices from mesh.
for each vertex in vertices.to_remove:
{
    for each edge in vertex.edges:
    {
        for each face in edge.faces:
        {
            face.remove_edge(edge);
            if (face.size() < 3)
                face.remove();
        }
        edge.remove();
    }
    vertex.remove();
}
That monstrous thing with 4 levels of indentation above is awkward (and more so when it's not in pseudocode form where it might have to update texture maps and so on requiring even more levels of indentation). We can instead do this:
for each vertex in vertices.to_remove:
{
    for each edge in vertex.edges:
        edges_to_remove.insert(edge);
}
for each edge in edges_to_remove:
{
    for each face in edge.faces:
        faces_to_rebuild.insert(face, edge);
}
for each face,edge in faces_to_rebuild:
{
    face.remove_edge(edge);
    if (face.size() < 3)
        face.remove();
}
for each edge in edges_to_remove:
    edge.remove();
for each vertex in vertices.to_remove:
    vertex.remove();
While that involves more passes over the data and some additional state, it makes every loop much simpler. And when we have these simpler loops, they become easier to parallelize, and the memory access patterns can start taking on a cache-friendly nature with decent data structures in place.
I've found that favoring this type of deferred processing not only reduces those indentations but also simplifies my ability to comprehend what the code is doing, since the control flows are dead simple and each one causes only one type of uniform side effect. It makes it so much easier to see what each loop is doing and decide at a glance whether it is doing the right thing, since each one is dedicated to causing one type of side effect, not many different types of side effects. And surprisingly, in spite of the extra work, it often improves cache coherence and opens up new doors for multithreading and SIMD, and I end up with an even more efficient result than I started with.

Related

Is it really better to have an unnecessary function call instead of using else?

So I had a discussion with a colleague today. He strongly suggested that I change some code from
if (condition) {
    function->setValue(true);
}
else {
    function->setValue(false);
}
to
function->setValue(false);
if (condition) {
    function->setValue(true);
}
in order to avoid the 'else'. I disagreed, because - while it might improve readability to some degree - in the case of the if-condition being true, we have 1 absolutely unnecessary function call.
What do you guys think?
Meh.
Doing this just to avoid the else is silly (at the very least there should be a deeper rationale). There's typically no extra branching cost to it, especially after the optimizer goes through it.
Code compactness can sometimes be a desirable aesthetic, especially if more time is spent skimming and searching through code than reading it line-by-line. There can be legit reasons to favor terser code sometimes, but there are always pros and cons. Even then, code compactness should not be about cramming logic into fewer lines of code so much as about writing straightforward logic.
Correctness here might be easier to achieve with one or the other. The point was made in a comment that you might not know the side effects associated with calling setValue(false), though I would suggest that's kind of moot. Functions should have minimal side effects, they should all be documented at the interface/usage level if they aren't totally obvious, and if we don't know exactly what they are, we should be spending more time looking up their documentation prior to calling them (and their side effects should not be changing once firm dependencies are established to them).
Given that, it may sometimes be easier to achieve correctness and maintain it with a solution that starts out initializing states to some default value, and using a form of code that opts in to overwrite it in specific branches of code. From that standpoint, what your colleague suggested may be valid as a way to avoid tripping over that code in the future. Then again, for a simple if/else pair of branches, it's hardly a big deal.
Don't worry about the cost of the extra most-likely-constant-time function call either way in this kind of knee-deep micro-level implementation case, especially with no super tight performance-critical loop around this code (and even then, still prefer to worry about that at least a little bit in hindsight after profiling).
I think there are far better things to think about than this kind of coding style, like testing procedure. Reliable code tends to need less revisiting, and has the freedom to be written in a wider variety of ways without causing disputes. Testing is what establishes reliability. The biggest disputes about coding style tend to follow teams where there's more toe-stepping and more debugging of the same bodies of code over and over and over from disparate people due to lack of reliability, modularity, excessive coupling, etc. It's a symptom of a problem but not necessarily the root cause.

Generating a Provable List of Sets of Scenarios

I'm asking this with full knowledge that this idea is probably well covered in a subject unfamiliar to me. Suppose you're writing a small piece of code that takes an input of an arbitrary number of variables. Those variables can have several states, namely:
Correct Data
Incorrect Data (outside range, improper formatting, whatever)
Unknown (Null)
So if we have 3 input variables, and 3 states per those variables, we end up with 27 possible scenarios. Suppose I have to do some logic based on the state of certain variables, or the combination of states (AND, NAND, OR, etc). Can I easily structure a program in such a way that I provably cover all scenarios without an absolute mess of if/else style logic? The first thing that came to mind was statemachines, but after looking at them for a bit I'm not entirely convinced it's the same thing.
There will be if-style logic, but you can use Karnaugh maps to make it much cleaner and be sure that you've covered every possibility. What you do is make a grid showing every possible combination of states. Then, mark each state in a different way depending on how you want to react to it. The goal of this is to group states. Then, you can easily see if your groups of states are logically "close together," and if so, you can simplify your control logic. A quick search for Karnaugh maps will bring up explanations that will be much easier to follow thanks to pictures, but the idea is to use the grid to see which variables are irrelevant to a group of states, and optimize them out of the logic.
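A hedged C++ sketch of that idea (the grouping rule here is invented purely for illustration): once the cells of the map are grouped, the control logic collapses to two questions, and a triple loop can exercise every one of the 27 combinations to prove coverage.
#include <array>
enum class State  { Correct, Incorrect, Unknown };
enum class Action { Accept, Reject, Defer };
// Grouped logic: "anything incorrect?" then "anything unknown?".
Action decide(State a, State b, State c) {
    const std::array<State, 3> in = { a, b, c };
    for (State s : in)
        if (s == State::Incorrect) return Action::Reject;   // group 1
    for (State s : in)
        if (s == State::Unknown) return Action::Defer;       // group 2
    return Action::Accept;                                    // group 3
}
// Provable coverage: enumerate all 3 x 3 x 3 = 27 combinations.
void exerciseAllScenarios() {
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b)
            for (int c = 0; c < 3; ++c)
                decide(static_cast<State>(a), static_cast<State>(b), static_cast<State>(c));
}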

Is optimizing a class for a unit test good practice, or is it premature?

I've seen (and searched for) a lot of questions on StackOverflow about premature optimization - word on the street is, it is the root of all evil. :P I confess that I'm often guilty of this; I don't really optimize for speed at the cost of code legibility, but I will rewrite my code in logical manners using datatypes and methods that seem more appropriate for the task (e.g. in Actionscript 3, using a typed Vector instead of an untyped Array for iteration) and if I can make my code more elegant, I will do so. This generally helps me understand my code, and I generally know why I'm making these changes.
At any rate, I was thinking today - in OOP, we promote encapsulation, attempting to hide the implementation and promote the interface, so that classes are loosely coupled. The idea is to make something that works without having to know what's going on internally - the black box idea.
As such, here's my question - is it wise to attempt to do deep optimization of code at the class level, since OOP promotes modularity? Or does this fall into the category of premature optimization? I'm thinking that, if you use a language that easily supports unit testing, you can test, benchmark, and optimize the class because it in itself is a module that takes input and generates output. But, as a single guy writing code, I don't know if it's wiser to wait until a project is fully finished to begin optimization.
For reference: I've never worked in a team before, so something that's obvious to developers who have this experience might be foreign to me.
Hope this question is appropriate for StackOverflow - I didn't find another one that directly answered my query.
Thanks!
Edit: Thinking about the question, I realize that "profiling" may have been the correct term instead of "unit test"; unit-testing checks that the module works as it should, while profiling checks performance. Additionally, a part of the question I should have asked before - does profiling individual modules after you've created them not reduce time profiling after the application is complete?
My question stems from the game development I'm trying to do - I have to create modules, such as a graphics engine, that should perform optimally (whether they will is a different story :D ). In an application where performance was less important, I probably wouldn't worry about this.
I don't really optimize for speed at the cost of code legibility, but I will rewrite my code in logical manners using datatypes and methods that seem more appropriate for the task [...] and if I can make my code more elegant, I will do so. This generally helps me understand my code
This isn't really optimization, rather refactoring for cleaner code and better design*. As such, it is a Good Thing, and it should indeed be practiced continuously, in small increments. Uncle Bob Martin (in his book Clean Code) popularized the Boy Scout Rule, adapted to software development: Leave the code cleaner than you found it.
So to answer your title question rephrased, yes, refactoring code to make it unit testable is a good practice. One "extreme" of this is Test Driven Development, where one writes the test first, then adds the code which makes the test pass. This way the code is created unit testable from the very beginning.
*Not to be nitpicky, just it is useful to clarify common terminology and make sure that we use the same terms in the same meaning.
True, optimization I believe should be left as a final task (although it's good to be cognizant of where you might need to go back and optimize while writing your first draft). That's not to say that you shouldn't refactor things iteratively in order to maintain order and cleanliness in the code. It is to say that if something currently serves the purpose and isn't botching a requirement of the application, then the requirements should be addressed first, as ultimately they are what you're responsible for delivering (unless the requirements include specifics on maximum request times or something along those lines). I agree with Korin's methodology as well: build for function first; if time permits, optimize to your heart's content (or the theoretical limit, whichever comes first).
The reason that premature optimization is a bad thing is this: it can take a lot of time and you don't know in advance where the best use of your time is likely to be.
For example, you could spend a lot of time optimizing a class, only to find that the bottleneck in your application is network latency or some similar factor that is far more expensive in terms of execution time. Because at the outset you don't have a complete picture, premature optimization leads to a less than optimal use of your time. In this case, you'd probably have preferred to fix the latency issue rather than optimize class code.
I strongly believe that you should never reduce your code readability and good design because of performance optimizations.
If you are writing code where performance is critical it may be OK to lower the style and clarity of your code, but this does not hold true for the average enterprise application. Hardware evolves quickly and gets cheaper everyday. In the end you are writing code that is going to be read by other developers, so you'd better do a good job at it!
It's always beautiful to read code that has been carefully crafted, where every path has a test that helps you understand how it should be used. I don't really care if it is 50 ms slower than the spaghetti alternative which does lots of crazy stuff.
Yes you should skip optimizing for the unit test. Optimization when required usually makes the code more complex. Aim for simplicity. If you optimize for the unit test you may actually de-optimize for production.
If performance is really bad in the unit test, you may need to look at your design. Test in the application to see if performance is equally bad before optimizing.
EDIT: De-optimization is likely to occur when the data being handled varies in size. This is most likely to occur with classes that work with sets of data. Response may be linear but initially slow, compared to geometric but initially fast. If the unit test uses a small set of data, then the geometric solution may be chosen based on the unit test. When production hits the class with a large set of data, performance tanks.
Sorting algorithms are a classic case for this kind of behavior and resulting de-optimizations. Many other algorithms have similar characteristics.
EDIT2: My most successful optimization was the sort routine for a report where data was stored on disk in a memory mapped file. The sort times were reasonable with moderate data sizes which did not require disk access. With larger data sets it could take days to process the data. Initial timings of the report showed: data selection 3 minutes, data sorting 3 days, and reporting 3 minutes. Investigation of the sort showed that it was a fully unoptimized bubble sort (n-1 full passes for a data set of size n), roughly O(n^2). Changing the sorting algorithm reduced the sort time for this report to 3 minutes. I would not have expected a unit test to cover this case, and the original code was as simple (fast) as you could get for small sets. The replacement was much more complex and slower for very small sets, but handled large sets faster with a more linear curve, O(n log n). (Note: no optimization was attempted until we had metrics.)
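The original report code isn't shown here, but a minimal sketch of the kind of algorithm swap described, with stand-in functions:
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>
// Unoptimized bubble sort: n-1 full passes, roughly O(n^2).
// Perfectly adequate for tiny sets, days for huge ones.
void bubbleSort(std::vector<int>& v) {
    for (std::size_t pass = 1; pass < v.size(); ++pass)
        for (std::size_t i = 0; i + 1 < v.size(); ++i)
            if (v[i] > v[i + 1])
                std::swap(v[i], v[i + 1]);
}
// Replacement: O(n log n), a little more overhead on tiny sets,
// but it scales to the large data sets that actually hurt.
void fastSort(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
}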
In practice, I aim for a ten-fold improvement of a routine which takes at least 50% of the module run-time. Achieving this level of optimization for a routine using 55% of the run-time will save 50% of the total run-time.

Do very long methods always need refactoring?

I face a situation where we have many very long methods, 1000 lines or more.
To give you some more detail, we have a list of incoming high level commands, and each one results in a longer (sometimes huge) list of lower level commands. There's a factory creating an instance of a class for each incoming command. Each class has a process method, where all the lower level commands are built up and added in sequence. As I said, these sequences of commands and their parameters quite often cause the process methods to reach thousands of lines.
There are a lot of repetitions. Many command patterns are shared between different commands, but the code is repeated over and over. That leads me to think refactoring would be a very good idea.
On the other hand, the specs we have come in exactly the same form as the current code: a very long list of commands for each incoming one. When I've tried some refactoring, I've started to feel uncomfortable with the specs. I miss the obvious analogy between the specs and the code, and lose time digging into newly created common classes.
Then here the question: in general, do you think such very long methods would always need refactoring, or in a similar case it would be acceptable?
(unfortunately refactoring the specs is not an option)
edit:
I have removed every reference to "generate" because it was actually confusing. It's not auto-generated code.
class InCmd001 {
    OutMsg process ( InMsg& inMsg ) {
        OutMsg outMsg = OutMsg::Create();
        OutCmd001 outCmd001 = OutCmd001::Create();
        outCmd001.SetA( param.getA() );
        outCmd001.SetB( inMsg.getB() );
        outMsg.addCmd( outCmd001 );
        OutCmd016 outCmd016 = OutCmd016::Create();
        outCmd016.SetF( param.getF() );
        outMsg.addCmd( outCmd016 );
        OutCmd007 outCmd007 = OutCmd007::Create();
        outCmd007.SetR( inMsg.getR() );
        outMsg.addCmd( outCmd007 );
        // ......
        return outMsg;
    }
}
Here is an example of one incoming command class (manually written in pseudo C++).
Code never needs refactoring. The code either works, or it doesn't. And if it works, the code doesn't need anything.
The need for refactoring comes from you, the programmer. The person reading, writing, maintaining and extending the code.
If you have trouble understanding the code, it needs to be refactored. If you would be more productive by cleaning up and refactoring the code, it needs to be refactored.
In general, I'd say it's a good idea for your own sake to refactor 1000+ line functions. But you're not doing it because the code needs it. You're doing it because that makes it easier for you to understand the code, test its correctness, and add new functionality.
On the other hand, if the code is automatically generated by another tool, you'll never need to read it or edit it. So what'd be the point in refactoring it?
I understand exactly where you're coming from, and can see exactly why you've structured your code the way it is, but it needs to change.
The uncertainty you feel when you attempt to refactor can be ameliorated by writing unit tests. If you have tests specific to each spec, then the code for each spec can be refactored until you're blue in the face, and you can have confidence in it.
A second option: is it possible to automatically generate your code from a data structure?
If you have a core suite of classes that do the donkey work and the edge cases, you can auto-generate the repetitive 1000-line methods as often as you wish.
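As a hedged sketch of that second option (the message and command types here are simplified stand-ins, not the real ones), the repetitive body of each process method can become a table of small build steps, and that table is something a tool could produce from the spec:
#include <functional>
#include <string>
#include <vector>
struct InMsg  { int b = 0; int r = 0; };           // stand-in input type
struct OutMsg { std::vector<std::string> cmds; };  // stand-in output type
using BuildStep = std::function<void(OutMsg&, const InMsg&)>;
// One entry per lower-level command the spec lists for this incoming command.
const std::vector<BuildStep> inCmd001Steps = {
    [](OutMsg& out, const InMsg& in) { out.cmds.push_back("OutCmd001 B=" + std::to_string(in.b)); },
    [](OutMsg& out, const InMsg& in) { out.cmds.push_back("OutCmd007 R=" + std::to_string(in.r)); },
};
// The hand-written thousand-line method shrinks to a generic loop.
OutMsg process(const InMsg& inMsg) {
    OutMsg outMsg;
    for (const BuildStep& step : inCmd001Steps)
        step(outMsg, inMsg);
    return outMsg;
}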
However, there are exceptions to every rule.
If the methods are a literal interpretation of the spec (very little additional logic), and the specs change infrequently, and the "common" portions (i.e. bits that happen to be the same right now) of the specs change at different times, and you're not going to be asked to get a 10x performance gain out of the code anytime soon, then (and only then) . . . you may be better off with what you have.
. . . but on the whole, refactor.
Yes, always. 1000 lines is at least 10x longer than any function should ever be, and I'm tempted to say 100x, except that when dealing with input parsing and validation it can become natural to write functions with 20 or so lines.
Edit: Just re-read your question and I'm not clear on one point - are you talking about machine generated code that no-one has to touch? In which case I would leave things as they are.
Refactoring is not the same as writing from scratch. While you should never write code like this in the first place, before you refactor it you need to consider the costs of refactoring in terms of time spent, the associated risks in terms of breaking code that already works, and the net benefits in terms of future time saved. Refactor only if the net benefits outweigh the associated costs and risks.
Sometimes wrapping and rewriting can be a safer and more cost effective solution, even if it appears expensive at first glance.
Long methods need refactoring if they are maintained (and thus need to be understood) by humans.
As a rule of thumb, code for humans first. I don't agree with the common idea that functions need to be short. I think what you need to aim at is when a human reads your code they grok it quickly.
To this effect it's a good idea to simplify things as much as possible--but not more than that. It's a good idea to delegate roughly one task for each function. There is no rule as for what "roughly one task" means: you'll have to use your own judgement for that. But do recognize that a function split into too many other functions itself reduces readability. Think about the human being who reads your function for the first time: they would have to follow one function call after another, constantly context-switching and maintaining a stack in their mind. This is a task for machines, not for humans.
Find the balance.
Here, you see how important naming things is. You will see it is not that easy to choose names for variables and functions, it takes time, but on the other hand it can save a lot of confusion on the human reader's side. Again, find the balance between saving your time and the time of the friendly humans who will follow you.
As for repetition, it's a bad idea. It's something that needs to be fixed, just like a memory leak. It's a ticking bomb.
As others have said before me, changing code can be expensive. You need to do the thinking as for whether it will pay off to spend all this time and effort, facing the risks of change, for a better code. You will possibly lose lots of time and make yourself one headache after another now, in order to possibly save lots of time and headache later.
Take a look at the related question How many lines of code is too many?. There are quite a few tidbits of wisdom throughout the answers there.
To repost a quote (although I'll attempt to comment on it a little more here)... A while back, I read this passage from Ovid's journal:
I recently wrote some code for Class::Sniff which would detect "long methods" and report them as a code smell. I even wrote a blog post about how I did this (quelle surprise, eh?). That's when Ben Tilly asked an embarrassingly obvious question: how do I know that long methods are a code smell?
I threw out the usual justifications, but he wouldn't let up. He wanted information and he cited the excellent book Code Complete as a counter-argument. I got down my copy of this book and started reading "How Long Should A Routine Be" (page 175, second edition). The author, Steve McConnell, argues that routines should not be longer than 200 lines. Holy crud! That's waaaaaay too long. If a routine is longer than about 20 or 30 lines, I reckon it's time to break it up.
Regrettably, McConnell has the cheek to cite six separate studies, all of which found that longer routines were not only not correlated with a greater defect rate, but were also often cheaper to develop and easier to comprehend. As a result, the latest version of Class::Sniff on github now documents that longer routines may not be a code smell after all. Ben was right. I was wrong.
(The rest of the post, on TDD, is worth reading as well.)
Coming from the "shorter methods are better" camp, this gave me a lot to think about.
Previously my large methods were generally limited to "I need inlining here, and the compiler is being uncooperative", or "for one reason or another the giant switch block really does run faster than the dispatch table", or "this stuff is only called exactly in sequence and I really really don't want function call overhead here". All relatively rare cases.
In your situation, though, I'd have a large bias toward not touching things: refactoring carries some inherent risk, and it may currently outweigh the reward. (Disclaimer: I'm slightly paranoid; I'm usually the guy who ends up fixing the crashes.)
Consider spending your efforts on tests, asserts, or documentation that can strengthen the existing code and tilt the risk/reward scale before any attempt to refactor: invariant checks, bound function analysis, and pre/postcondition tests; any other useful concepts from DBC; maybe even a parallel implementation in another language (maybe something message oriented like Erlang would give you a better perspective, given your code sample) or even some sort of formal logical representation of the spec you're trying to follow if you have some time to burn.
Any of these kinds of efforts generally have a few results, even if you don't get to refactor the code: you learn something, you increase your (and your organization's) understanding of and ability to use the code and specifications, you might find a few holes that really do need to be filled now, and you become more confident in your ability to make a change with less chance of disastrous consequences.
As you gain a better understanding of the problem domain, you may find that there are ways to refactor that you hadn't thought of previously.
This isn't to say "thou shalt have a full-coverage test suite, and DBC asserts, and a formal logical spec". It's just that you are in a typically imperfect situation, and diversifying a bit -- looking for novel ways to approach the problems you find (maintainability? fuzzy spec? ease of learning the system?) -- may give you a small bit of forward progress and some increased confidence, after which you can take larger steps.
So think less from the "too many lines is a problem" perspective and more from the "this might be a code smell, what problems is it going to cause for us, and is there anything easy and/or rewarding we can do about it?"
Leaving it cooking on the backburner for a bit -- coming back and revisiting it as time and coincidence allows (e.g. "I'm working near the code today, maybe I'll wander over and see if I can't document the assumptions a bit better...") may produce good results. Then again, getting royally ticked off and deciding something must be done about the situation is also effective.
Have I managed to be wishy-washy enough here? My point, I think, is that the code smells, the patterns/antipatterns, the best practices, etc -- they're there to serve you. Experiment to get used to them, and then take what makes sense for your current situation, and leave the rest.
I think you first need to "refactor" the specs. If there are repetitions in the spec, it too will become easier to read if it makes use of some "basic building blocks".
Edit: As long as you cannot refactor the specs, I wouldn't change the code.
Coding style guides are all made for easier code maintenance, but in your special case the ease of maintenance is achieved by following the spec.
Some people here asked if the code is generated. In my opinion it does not matter: If the code follows the spec "line by line" it makes no difference if the code is generated or hand-written.
1000 lines of code is nothing. We have functions that are 6 to 12 thousand lines long. Of course those functions are so big that things literally get lost in there, and no tool can help us even look at high level abstractions of them. The code is now, unfortunately, incomprehensible.
My opinion of functions that are that big is that they were not written by brilliant programmers but by incompetent hacks who shouldn't be left anywhere near a computer - but should be fired and left flipping burgers at McDonald's. Such code wreaks havoc by leaving behind features that cannot be added to or improved upon (too bad for the customer). The code is so brittle that it cannot be modified by anyone - even the original authors.
And yes, those methods should be refactored, or thrown away.
Do you ever have to read or maintain the generated code?
If yes, then I'd think some refactoring might be in order.
If no, then the higher-level language is really the language you're working with -- the C++ is just an intermediate representation on the way to the compiler -- and refactoring might not be necessary.
Looks to me that you've implemented a separate language within your application - have you considered going that way?
It has been my understanding that it's recommended that any method over 100 lines of code be refactored.
I think some rules may be a little different in this era, when code is most commonly viewed in an IDE. If the code does not contain exploitable repetition, such that there are 1,000 lines which are going to be referenced once each, and which share a significant number of variables in a clear fashion, dividing the code into 100-line routines each of which is called once may not be that much of an improvement over having a well-formatted 1,000-line module which includes #region tags or the equivalent to allow outline-style viewing.
My philosophy is that certain layouts of code generally imply certain things. To my mind, when a piece of code is placed into its own routine, that suggests that the code will be usable in more than one context (exception: callback handlers and the like in languages which don't support anonymous methods). If code segment #1 leaves an object in an obscure state which is only usable by code segment #2, and code segment #2 is only usable on a data object which is left in the state created by #1, then absent some compelling reason to put the segments in different routines, they should appear in the same routine. If a program puts objects through a chain of obscure states extending for many hundreds of lines of code, it might be good to rework the design of the code to subdivide the operation into smaller pieces which have more "natural" pre- and post- conditions, but absent some compelling reason to do so, I would not favor splitting up the code without changing the design.
For further reading, I highly recommend the long, insightful, entertaining, and sometimes bitter discussion of this topic over on the Portland Pattern Repository.
I've seen cases where it isn't warranted (for example, creating an Excel spreadsheet in .Net often requires a lot of lines of code just for the formatting of the sheet), but most of the time the best thing would indeed be to refactor it.
I personally try to make a function small enough so it all appears on my screen (without affecting the readability of course).
1000 lines? They definitely need to be refactored. Also note that, for example, the default maximum number of executable statements is 30 in Checkstyle, a well-known coding standard checker.
If you refactor, when you refactor, add some comments to explain what the heck it's doing.
If it had comments, it would be much less likely a candidate for refactoring, because it would already be easier to read and follow for someone starting from scratch.
Then here the question: in general, do you think such very long methods would always need refactoring,
If you ask in general, we will say yes.
or in a similar case it would be acceptable? (unfortunately refactoring the specs is not an option)
Sometimes it is acceptable, but that is very unusual. I will give you a couple of examples:
There are some 8-bit microcontrollers called Microchip PIC that have only a fixed 8-level stack, so you can't nest more than 8 calls and care must be taken to avoid stack overflow; in this special case, having many small (nested) functions is not the best way to go.
Another example is when optimizing code at a very low level, where you have to take into account the cost of jumps and context saving. Use it with care.
EDIT:
Even with generated code, you could need to refactor the way it's generated, for example to save memory, save energy, produce human-readable output, improve its elegance, who knows, etc.
There has been very good general advice; here is a practical recommendation for your sample:
common patterns can be isolated in plain feeder methods:
void AddSimpleTransform(OutMsg & msg, InMsg const & inMsg,
                        int rotateBy, int foldBy, int gonkBy = 0)
{
    // create & add up to three messages
}
You might even improve that by making this a member of OutMsg, and using a fluent interface, such that you can write
OutMsg msg;
msg.AddSimpleTransform(inMsg, 12, 17)
   .Staple("print")
   .AddArtificialRust(0.02);
which can be an additional improvement under some circumstances.
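A hedged sketch of what that fluent interface might look like (the command bodies are stubbed out; only the chaining mechanism matters here):
#include <string>
struct InMsg {};  // stand-in for the real incoming message type
class OutMsg {
public:
    // Each builder method returns *this so calls can be chained.
    OutMsg& AddSimpleTransform(const InMsg&, int /*rotateBy*/, int /*foldBy*/, int /*gonkBy*/ = 0) {
        // create & add up to three lower-level commands here
        return *this;
    }
    OutMsg& Staple(const std::string& /*mode*/)  { return *this; }
    OutMsg& AddArtificialRust(double /*amount*/) { return *this; }
};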

More vs Less Functions

I had a little argument, and was wondering what people out there think:
In C++ (or in general), do you prefer code broken up into many shorter functions, with main() consisting of just a list of functions, in a logical order, or do you prefer functions only when necessary (i.e., when they will be reused very many times)? Or perhaps something in between?
Small functions, please
It is the conventional wisdom that smaller functions are better, and I think it's true. In fact, there is a company with an analysis tool that rates individual functions by how many decisions they make compared to the number of unit tests that they have.
The theory is that you may or may not be able to reduce complexity in an entire application, but you have complete control over how much complexity is in any given function.
A measurement called cyclomatic complexity is thought to correlate positively with bad code: specifically, the more paths there are through a method, the higher its cyclomatic complexity number (CCN), the more poorly it is written, the harder it is to understand (and hence change, or even get right to start with), and the more unit tests it will need.
Ok, found the tool. It is called, ahem, the Change Risk Analysis and Predictions index.
Lately, the principle of encoding information only once has grown new acronyms, specifically DRY (Don't Repeat Yourself) and DIE (Duplication is Evil) ...
I believe we can in part thank the RoR community for promoting this philosophy...
Split the functions, but never split functionality.
Functionality may be classified into layers, and then each layer may be split into different functions. For example, when we are processing a sine series, the main loop for summing and subtracting should be in the primary function; this may be considered layer 1. The functionality for finding a power may be classified into layer 2 and implemented as a sub-function. Similarly, finding a factorial also belongs to layer 2 and would be another sub-function. Always consider functionality; never count the number of lines. The number of lines may vary from 3 to 300; it doesn't matter. This will add more readability and maintainability to our code. This is my idea about splitting.
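A minimal sketch of that layering (a truncated Taylor series, purely illustrative):
// Layer 2: small helper functions, one task each.
double power(double x, int n) {
    double result = 1.0;
    for (int i = 0; i < n; ++i)
        result *= x;
    return result;
}
double factorial(int n) {
    double result = 1.0;
    for (int i = 2; i <= n; ++i)
        result *= i;
    return result;
}
// Layer 1: the primary function holds only the summing/subtracting loop.
double sine(double x, int terms = 10) {
    double sum = 0.0;
    for (int n = 0; n < terms; ++n) {
        const double term = power(x, 2 * n + 1) / factorial(2 * n + 1);
        sum += (n % 2 == 0) ? term : -term;
    }
    return sum;
}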
I think the only answer is something in between. If you break code into functions at every possible opportunity, it becomes unreadable. Likewise, if you never break it up, it also becomes unreadable.
I like to group functions into semantic differences. It is one logical unit for some calculation. Keep the units small, but big enough to actually do something useful.
My favorite granularity rule of thumb for a function is "no more than 24 lines of < 80 characters each" -- and that's not just because 80 x 24 terminals were all the rage "back when I started"; I think it's a reasonable guideline for functions you can "grasp as one eyeful", at least in C or languages not much richer than C. "A function does only one thing", AKA "a function has one function" (playing on the meaning of "function" as "role" or "purpose"!-) is a secondary rule I use in languages where "too much functionality" can easily be packed in 24 lines. But the "lexical eyeful" guideline -- 24 x 80 -- is still my main one.
Small functions are good and smaller ones are better.
About five to eight lines of code is my upper limit on function size. Beyond that, it's too complicated. You should be able to:
Assume that a callee does what its name would indicate,
Read a function's definition in a matter of seconds, and
Convince yourself quickly that the first assumption implies that the present function is correct.
The other thing is that you should use your functions BEFORE you write their code. When you see how you intend to use the function, then you'll see what pre- and post-conditions said functions must respect.
Anything that isn't obviously correct at first glance should be proven correct in the running commentary. If that's difficult, factor sub-functions out.
Whatever helps with code reuse and readability works best, I believe.
Making lots of one-line functions just for the sake of it doesn't help with readability, so functions should be grouped into classes that make sense, and then split up so that you can quickly understand what is going on in each one.
If you have to jump all over to understand what is going on then your design is flawed.
I prefer functions (or methods) which fit within one screenful of code, so I can see at a glance anything I need to reference to understand how that function works. I generally have about 50 lines of space in my editor windows, which are also generally at 80 columns so I can fit two side by side on a monitor and cross reference between two pieces of code.
So, I generally consider 50 lines to be about the maximum. The only time I would consider allowing more is when you have one big long initialization function or something that is completely linear (no variables, conditionals, or loops), since that's not something where you need all that much context and some APIs require a whole bunch of initialization to get up and running, and splitting it into smaller functions wouldn't really help much.
On the whole, though, nice, small, easy-to-understand functions that do one thing and are well named are vastly preferable to big sprawling monstrosities that are hundreds of lines long, with dozens of variables to keep track of and indentation going 10 levels deep.
Another simple reason: A function should be made when a block of code is being reused more than once or twice. For very small bits of code (say one or two statements), macros often alleviate the problem.
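For example (a hedged one-liner; in C++ an inline function is usually the safer equivalent):
// Macro for a tiny snippet that shows up everywhere.
#define CLAMP_TO_BYTE(x) ((x) < 0 ? 0 : ((x) > 255 ? 255 : (x)))
// The inline-function alternative: type-checked and easier to debug.
inline int clampToByte(int x) { return x < 0 ? 0 : (x > 255 ? 255 : x); }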