High Cohesion and Concurrency - Are they conflicting interests?

High Cohesion and Concurrency - Are they conflicting interests? - concurrency

I was reading Robert Martin's Clean Code and in that he mentions about the code being highly cohesive:
Classes should have a small number of
instance variables. Each of the
methods of a class should manipulate
one or more of those variables. In
general the more variable a method
manipulates the more cohesive that
method is to its class. A class in
which each variable is used by each
method is maximally cohesivemethod is maximally cohesive
But when we are trying to write concurrent code, we strive to limit the scope of variables to a single method to avoid race conditions. But this results in code which is least cohesive.
When designing an application/class, what should you prefer - Cohesion or Concurrency?

I like quite a few of Martin's concepts, but your code needs to execute correctly and if it doesn't, all the beautiful metrics in the world aren't going to make you look better.
Add to that that threading issues are among the worst there are to debug, and you should not compromise your design for concurrency to conform with your idea of what someone wrote in a book about cohesion. Again, I'm not knocking Martin... I'm sure he'd tell you the same thing. After all, he recognizes that almost everything is on a continuum in most of his writing.
I'm not sure you're putting the emphasis in quite the right place though (could just be the way I'm reading your question). Martin's not saying you should make as many variables live at the class level as possible. He's saying that of the class level variables, how many are you using? If you promote variables you don't need to, you might not be getting higher cohesiveness... you might be getting tighter coupling.

Related

Efficiency of program

I want to know whether there is an effect on program efficiency by adopting object oriented approach to a problem as compared to the structured programming approach in any programming language but specially in c++.

Maybe. Maybe not.
You can write efficient object-oriented code. You can write inefficient structured code.
It depends on the application, how well the code is written, and how heavily the code is optimized. In general, you should write code so that it has a good, clean, modular architecture and is well designed, then if you have problems with performance optimize the hot spots that are causing performance issues.
Use object oriented programming where it makes sense to use it and use structured programming where it makes sense to use it. You don't have to choose between one and the other: you can use both.

I remember back in the early 1990's when C++ was young there were studies done about this. If I remember correctly, the guys who took (well written) C++ programs and recoded them in C got around a 15% increase in speed. The guys who took C programs and recoded them in C++, and modified the imperative style of C to an OO style (but same algorithms) for C++ got the same or better performance. The apparent contradiction was explained by the observation that the C programs, in being translated to an object oriented style, became better organized. Things that you did in C because it was too much code and trouble to do better could more easily be done properly in C++.
Thinking back about this I wonder about the conclusion some. Writing a program a second time will always result in a better program, so it didn't have to be imperative to OO style that made the difference. Todays computer architectures are designed with hardware support for common operations done by OO programs, and compilers have gotten better at using the instructions, so I think that it is likely that whatever overhead a virtual function call had in 1992 it is far smaller today.

There doesn't have to be, if you are very careful to avoid it. If you just take the most straightforward approach, using dynamic allocation, virtual functions, and (especially) passing objects by value, then yes there will be inefficiency.

It doesn't have to be. Algorithm is all matters. I agree encapsulation will slow you down little bit, but compilers are there to optimize.

You would say no if this is the question in computer science paper.
However in the real development environment this tends to be true if the OOP paradigm is used correctly. The reason is that in real development process, we generally need to maintain our code base and that the time when OOP paradigm could help us. One strong point of OOP over structured programming like C is that in OOP it is easier to make the code maintainable. When the code is more maintainable, it means less bug and less time to fix bug and less time needed for implementing new features. The bottom line is then we will have more time to focus on the efficiency of the application.

The problem is not technical, it is psychological. It is in what it encourages you to do by making it easy.
To make a mundane analogy, it is like a credit card. It is much more efficient than writing checks or using cash. If that is so, why do people get in so much trouble with credit cards? Because they are so easy to use that they abuse them. It takes great discipline not to over-use a good thing.
The way OO gets abused is by
Creating too many "layers of abstraction"
Creating too much redundant data structure
Encouraging the use of notification-style code, attempting to maintain consistency within redundant data structures.
It is better to minimize data structure, and if it must be redundant, be able to tolerate temporary inconsistency.
ADDED:
As an illustration of the kind of thing that OO encourages, here's what I see sometimes in performance tuning: Somebody sets SomeProperty = true;. That sounds innocent enough, right? Well that can ripple to objects that contain that object, often through polymorphism that's hard to trace. That can mean that some list or dictionary somewhere needs to have things added to it or removed from it. That can mean that some tree or list control needs controls added or removed or shuffled. That can mean windows are being created or destroyed. It can also mean some things need to be changed in a database, which might not be local so there's some I/O or mutex locking to be done.
It can really get crazy. But who cares? It's abstract.

There could be: the OO approach tends to be closer to a decoupled approach where different modules don't go poking around inside each other. They are restricted to public interfaces, and there is always a potential cost in that. For example, calling a getter instead of just directly examining a variable; or calling a virtual function by default because the type of an object isn't sufficiently obvious for a direct call.
That said, there are several factors that diminish this as a useful observation.
A well written structured program should have the same modularity (i.e. hiding implementations), and therefore incur the same costs of indirection. The cost of calling a function pointer in C is probably going to be very similar to the cost of calling a virtual function in C++.
Modern JITs, and even the use of inline methods in C++, can remove the indirection cost.
The costs themselves are probably relatively small (typically just a few extra simple operations per instruction call). This will be insignificant in a program where the real work is done in tight loops.
Finally, a more modular style frees the programmer to tackle more complicated, but hopefully less complex algorithms without the peril of low level bugs.

Secret to achieve good OO Design [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I am a c++ programmer, and I am looking forward to learning and mastering OO design.I have done a lot of search and as we all know there is loads of material, books, tutorials etc on how to achieve a good OO design. Of course, I do understand a good design is something can only come up with loads of experience, individual talent, brilliance or in fact even by mere luck(exaggeration!).
But sure it all starts off with a solid beginning & building some strong basics.Can someone help me out by pointing out the right material on how to start off this quest of learning designing right from the stage of identifying objects, classes etc to the stage of using design patterns.
Having said that I am a programmer but I have not had a experience in designing.Can you please help me take someone help me out in this transition from a programmer to a designer?
Any tips,suggestions,advice will be helpful.
[Edit]Thanks for the links and answers, I need to get myself in to that :) As i mentioned before I am a C++ programmer and I do understand the OO basic concepts as such, like inheritance, abstraction, polymorphism, and having written code in C++ do understand a few of the design patterns as well.what i dont understand is the basic thought process with which one should approach a requirement. The nitty grittys of how to appraoch and decide on what classes should be made, and how to define or conclude on relationships they should have amongst themselves.Knowing the concepts(t some extent) but not knowing how to apply them is the problem i seem to have :( Any suggestions about that?

(very) Simple, but not simplist design (simple enough design if you prefer) : K.I.S.S.
Prefer flat hierarchies, avoid deep hierarchies.
Separation of concerns is essential.
Consider other "paradigms" than OO when it don't seem simple or elegant enough.
More generally : D.R.Y. and Y.A.G.N.I help you achieve 1.

There is no secret. It's sweat, not magic.
It's not about doing one thing right. It's balancing many things that must not go wrong. Sometimes, they work in sync, sometimes, they work against each other. Design is only one group of these aspects. The best design doesn't help if the project fails (e.g. because it never ships).
The first rule I'd put forward is:
1. There are no absolutes
Follows directly from the "many things to balance. D.R.Y., Y.A.G.N.I. etc. are guidelines, strictly following them cannot guarantee good design, if followed by the letter they may make your project fail.
Example: D.R.Y. One of the most fundamental principles, yet studies show that complexity of small code snippets increases by a factor of 3 or more when they get isolated, due to pre/post condition checking, error handling, and generalization to multiple related cases. So the principle needs to be weakened to "D.R.Y. (at least, not to much)" - when to and when not is the hard part.
The second rule is not a very common one:
2. An interface must be simpler than the implementation
Sounds to trivial to be catchy. Yet, there's much to say about it:
The premise of OO was to manage program sizes that could not be managed with structured programming anymore. The primary mechanism is to encapsulate complexity: we can hide complexity behind a simpler interface, and then forget about that complexity.
Interface complexity involves the documentation, error handling specifications, performance guarantees (or their absence), etc. This means that e.g. reducing the interface declaration by introducing special cases isn't a reduction in complexity - just a shuffle.
3-N Here's where I put most of the other mentions, that have been explained already very well.
Separation of Concerns, K.I.S.S, SOLID principle, D.R.Y., roughly in that order.
How to build software according to these guidelines?
Above guidelines help evaluating a piece of code. Unfortunately, there's no recipe how to get there. "Experienced" means that you have a good feel for how to structure your software, and some decisions just feel bad. Maybe all the principles are just rationnalizaitons after the fact.
The general path is to break down a system into responsibilities, until the individual pieces are managable.
There are formal processes for that, but these just work around the fact that what makes a good, isolated component is a subjective decision. But in the end, that's what we get paid for.
If you have a rough idea of the whole system, it isn't wrong to start with one of these pieces as a seed, and grow them into a "core". Top-down and bottom-up aren't antipodes.
Practice, practice, practice. Build a small program, make it run, change requirements, get it to run again. The "changing requirements" part you don't need to train a lot, we have customers for that.
Post-Project reviews - try to get used to them even for your personal projects. After it's sealed, done, evaluate what was good, what was bad. Consider the source code was thrown away - i.e. don't see that sessison as "what should be fixed?"
Conway's Law says that "A system reflects the structure of the organizaiton that built it." That applies to most complex software I've seen, and formal studies seem to confirm that. We can derive a few bits of information from that:
If structure is important, so are the people you work with.
Or Maybe structure isn't that important. There is not one right structure (just many wrong ones to avoid)

I'm going to quote Marcus Baker talking about how to achieve good OO design in a forum post here: http://www.sitepoint.com/forums/showpost.php?p=4671598&postcount=24
1) Take one thread of a use case.
2) Implement it any old how.
3) Take another thread.
4) Implement it any old how.
5) Look for commonality.
6) Factor the code so that commonality is collected into functions. Aim for clarity of code. No globals, pass everything.
7) Any block of code that is unclear, group into a function as well.
8) Implement another thread any old how, but use your existing functions if they are instant drop-ins.
9) Once working, factor again to remove duplication. By now you may find you are passing similar lumps of stuff around from function to function. To remove duplication, move these into objects.
10) Implement another thread once your code is perfect.
11) Refactor to avoid duplication until bored.
Now the OO bit...
12) By now some candidate higher roles should be emerging. Group those functions into roles by class.
13) Refactor again with the aim of clarity. No class bigger than a couple of pages of code, no method longer than 5 lines. No inheritance unless the variation is just a few lines of code.
From this point on you can cycle for a bit...
14) Implement a thread of use case any old how.
15) Refactor as above. Refactoring includes renaming objects and classes as their meanings evolve.
16) Repeat until bored.
Now the patterns stuff!
Once you have a couple of dozen classes and quite a bit of functionality up and running, you may notice some classes have very similar code, but no obvious split (no, don't use inheritance). At this point consult the patterns books for options on removing the duplication. Hint: you probably want "Strategy".
The following repeats...
17) Implement another thread of a use case any old how.
18) Refactor methods down to five lines or less, classes down to 2 pages or less (preferably a lot less), look for patterns to remove higher level duplication if it really makes the code cleaner.
19) Repeat until your top level constructors either have lot's of parameters, or you find yourself using "new" a lot to create objects inside other objects (this is bad).
Now we need to clean up the dependencies. Any class that does work should not use "new" to create things inside itself. Pass those sub objects from out side. Classes which do no mechanical work are allowed to use the "new" operator. They just assemble stuff - we'll call them factories. A factory is a role in itself. A class should have just one role, thus factories should be separate classes.
20) Factor out the factories.
Now we repeat again...
21) Implement another thread of a use case any old how.
22) Refactor methods down to five lines or less, classes down to 2 pages or less (preferably a lot less), look for patterns to remove higher level duplication if it really makes the code cleaner, make sure you use separate classes for factories.
23) Repeat until your top level classes have an excessive number of parameters (say 8+).
You've probably finished by now. If not, look up the dependency injection pattern...
24) Create (only) your top level classes with a dependency injector.
Then you can repeat again...
25) Implement another thread of a use case any old how.
26) Refactor methods down to five lines or less, classes down to 2 pages or less (preferably a lot less), look for patterns to remove higher level duplication if it really makes the code cleaner, make sure you use separate classes for factories, pass the top level dependencies (including the factories) via DI.
27) Repeat.
At any stage in this heuristic you will probably want to take a look at test driven development. At the very least it will stop regressions while you refactor.
Obviously, this is a pretty simple process, and the information contained therein shouldn't be applied to every situation, but I feel like Marcus gets it right, especially with regards to the process one should use to design OO code. After a while, you'll start doing it naturally, it'll just become second nature. But while learning to do so, this is a great set of steps to follow.

As you said, there is nothing like experience. You can read every existing book on the planet about this, you'll still not as good as if you practice.
Understanding the theory is good, but in my humble opinion, there is nothing like experience. I think the best way to learn and understand things completely is to apply them in some project(s).
There you'll face difficulties, you'll learn to solve them, sometimes perhaps with a bad solution : but still you'll learn. And if at any time something bothers you and you can't find how to solve it nicely, we'll be here on SO to help you ! :)

I can advice you the book "Head First Design Patterns" (search Amazon). It is a good starting point, before seriously diving into the gang of fours' bible, and it shows design principles and the most used patterns.

In a nutshell : Code, criticize, look for a well-known solution, implement it, and back to first step till you're (more or less) satisfied.
As often, the answer to this kind of question is : it depends. And in that case, it depends how you learn things. I'll tell you what work for me, for I face the very problem you describe, but it won't work with everybody and I would not say it's "the perfect answer".
I begin coding something, not too simple, not too complex. Then, I look at the code and I think : all right, what is wrong ? For that, you can use the first three "SOLID principles" :
Single responsibility (are all your classes serving a unique purpose ?)
Open/Close principle (if you want to add a service, your classes can be extended with inheritance, but there is no need to alter the basic functions or your current classes).
Liskov Substitution (all right, this one I can't explain simply, and I'd advise reading about it).
Don't try to master those and understand everything about them. Just use them as guideline to criticize your code. Think chiefly about "what if I want to do this now ?". "What if I work for a client, and he wants to add this or that ?". If your code is perfectly adaptable to any situation (which is almost impossible), you might have reached a very good design.
If it's not, consider a solution. What would you do ? Try to come with an idea. Then, read about design patterns and find one that could answer your problem. See if it matches your idea - often, it's the idea you had, but better expressed and developped. Now, try to implement it. It's going to take time, you'll often fail, it's frustrating, and that's normal.
Design is about experience, but experience is acquired by criticizing your own code. That's how you'll understand design, not as a cool thing to know, but as the basis for a solid code. It's not enough to know "all right, a good code has that and that". It's much better to have experienced why, to have failed and see what whas wrong. The trouble with design pattern is that they are very abstract. My method is a way (probably not the only one nor the best) to make them less abstract to you.

No-solo-work. Good designs are seldom created by a single person only. Talk to your colleagues. Discuss your design with others. And learn.
Don't be too smart. A complex hierarchy with 10 levels of inheritance is seldom a good design. Make sure that you can clearly explain how your design works. If you can't explain the basic principles in 5 minutes, your design is probably too complex.
Learn tricks from the masters: Alexandrescu, Meyers, Sutter, GoF.
Prefer extensibility over perfection. Source code written today will be insufficient in 3 years time. If you write your code to be perfect now, but inextensible, you will have a problem later. If you write your code to be extensible (but not perfect), you will still be able to adapt it later.

The core concepts in my mind are:
Encapsulation - Keep as much of you object hidden from both the prying eyes and sticky fingers of the outside world.
Abstraction - Hide as much of the inner workings of you object from the simple minds of the code that needs to use your objects
All the other concepts such as inheritance, polymorphism and design patters are about incorporating the two concepts above and still have objects that can solve real world problems.

How should I design a mechanism in C++ to manage relatively generic entities within a simulation?

I would like to start my question by stating that this is a C++ design question, more then anything, limiting the scope of the discussion to what is accomplishable in that language.
Let us pretend that I am working on a vehicle simulator that is intended to model modern highway systems. As part of this simulation, entities will be interacting with each other to avoid accidents, stop at stop lights and perhaps eventually even model traffic enforcement with radar guns and subsequent exciting high speed chases.
Being a spatial simulation written in C++, it seems like it would be ideal to start with some kind of Vehicle hierarchy, with cars and trucks deriving from some common base class. However, a common problem I have run in to is that such a hierarchy is usually very rigidly defined, and introducing unexpected changes - modeling a boat for instance - tends to introduce unexpected complexity that tends to grow over time into something quite unwieldy.
This simple aproach seems to suffer from a combinatoric explosion of classes. Imagine if I created a MoveOnWater interface and a MoveOnGround interface, and used them to define Car and Boat. Then lets say I add RadarEquipment. Now I have to do something like add the classes RadarBoat and RadarCar. Adding more capabilities using this approach and the whole thing rapidly becomes quite unreasonable.
One approach I have been investigating to address this inflexibility issue is to do away with the inheritance hierarchy all together. Instead of trying to come up with a type safe way to define everything that could ever be in this simulation, I defined one class - I will call it 'Entity' - and the capabilities that make up an entity - can it drive, can it fly, can it use radar - are all created as interfaces and added to a kind of capability list that the Entity class contains. At runtime, the proper capabilities are created and attached to the entity and functions that want to use these interfaced must first query the entity object and check for there existence. This approach seems to be the most obvious alternative, and is working well for the time being. I, however, worry about the maintenance issues that this approach will have. Effectively any arbitrary thing can be added, and there is no single location in which all possible capabilities are defined. Its not a problem currently, when the total number of things is quite small, but I worry that it might be a problem when someone else starts trying to use and modify the code.
As one potential alternative, I pondered using the template system to achieve type safe while keeping the same kind of flexibility. I imagine I could create entities that inherited whatever combination of interfaces I wanted. Using these objects would entail creating a template class or function that used any combination of the interfaces. One example might be the simple move on road using just the MoveOnRoad interface, whereas more complex logic, like a "high speed freeway chase", could use methods from both MoveOnRoad and Radar interfaces.
Of course making this approach usable mandates the use of boost concept check just to make debugging feasible. Also, this approach has the unfortunate side effect of making "optional" interfaces all but impossible. It is not simple to write a function that can have logic to do one thing if the entity has a RadarEquipment interface, and do something else if it doesn't. In this regard, type safety is somewhat of a curse. I think some trickery with boost any may be able to pull it off, but I haven't figured out how to make that work and it seems like way to much complexity for what I am trying to achieve.
Thus, we are left with the dynamic "list of capabilities" and achieving the goal of having decision logic that drives behavior based on what the entity is capable of becomes trivial.
Now, with that background in mind, I am open to any design gurus telling me where I err'd in my reasoning. I am eager to learn of a design pattern or idiom that is commonly used to address this issue, and the sort of tradeoffs I will have to make.
I also want to mention that I have been contemplating perhaps an even more random design. Even though I my gut tells me that this should be designed as a high performance C++ simulation, a part of me wants to do away with the Entity class and object-orientated foo all together and uses a relational model to define all of these entity states. My initial thought is to treat entities as an in memory database and use procedural query logic to read and write the various state information, with the necessary behavior logic that drives these queries written in C++. I am somewhat concerned about performance, although it would not surprise me if that was a non-issue. I am perhaps more concerned about what maintenance issues and additional complexity this would introduce, as opposed to the relatively simple list-of-capabilities approach.

Encapsulate what varies and Prefer object composition to inheritance, are the two OOAD principles at work here.
Check out the Bridge Design pattern. I visualize Vehicle abstraction as one thing that varies, and the other aspect that varies is the "Medium". Boat/Bus/Car are all Vehicle abstractions, while Water/Road/Rail are all Mediums.
I believe that in such a mechanism, there may be no need to maintain any capability. For example, if a Bus cannot move on Water, such a behavior can be modelled by a NOP behavior in the Vehicle Abstraction.
Use the Bridge pattern when
you want to avoid a permanent binding
between an abstraction and its
implementation. This might be the
case, for example, when the
implementation must be selected or
switched at run-time.
both the abstractions and their
implementations should be extensible
by subclassing. In this case, the
Bridge pattern lets you combine the
different abstractions and
implementations and extend them
independently.
changes in the implementation of an
abstraction should have no impact on
clients; that is, their code should
not have to be recompiled.

Now, with that background in mind, I am open to any design gurus telling me where I err'd in my reasoning.
You may be erring in using C++ to define a system for which you as yet have no need/no requirements:
This approach seems to be the most
obvious alternative, and is working
well for the time being. I, however,
worry about the maintenance issues
that this approach will have.
Effectively any arbitrary thing can be
added, and there is no single location
in which all possible capabilities are
defined. Its not a problem currently,
when the total number of things is
quite small, but I worry that it might
be a problem when someone else starts
trying to use and modify the code.
Maybe you should be considering principles like YAGNI as opposed to BDUF.
Some of my personal favourites are from Systemantics:
"15. A complex system that works is invariably found to have evolved from a simple system that works"
"16. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system."
You're also worring about performance, when you have no defined performance requirements, and no problems with performance:
I am somewhat concerned about
performance, although it would not
surprise me if that was a non-issue.
Also, I hope you know about double-dispatch, which might be useful for implementing anything-to-anything interactions (it's described in some detail in More Effective C++ by Scott Meyers).

Do very long methods always need refactoring?

I face a situation where we have many very long methods, 1000 lines or more.
To give you some more detail, we have a list of incoming high level commands, and each generates results in a longer (sometime huge) list of lower level commands. There's a factory creating an instance of a class for each incoming command. Each class has a process method, where all the lower level commands are generated added in sequence. As I said, these sequences of commands and their parameters cause quite often the process methods to reach thousands of lines.
There are a lot of repetitions. Many command patterns are shared between different commands, but the code is repeated over and over. That leads me to think refactoring would be a very good idea.
On the contrary, the specs we have come exactly in the same form as the current code. Very long list of commands for each incoming one. When I've tried some refactoring, I've started to feel uncomfortable with the specs. I miss the obvious analogy between the specs and code, and lose time digging into newly created common classes.
Then here the question: in general, do you think such very long methods would always need refactoring, or in a similar case it would be acceptable?
(unfortunately refactoring the specs is not an option)
edit:
I have removed every reference to "generate" cause it was actually confusing. It's not auto generated code.
class InCmd001 {
OutMsg process ( InMsg& inMsg ) {
OutMsg outMsg = OutMsg::Create();
OutCmd001 outCmd001 = OutCmd001::Create();
outCmd001.SetA( param.getA() );
outCmd001.SetB( inMsg.getB() );
outMsg.addCmd( outCmd001 );
OutCmd016 outCmd016 = OutCmd016::Create();
outCmd016.SetF( param.getF() );
outMsg.addCmd( outCmd016 );
OutCmd007 outCmd007 = OutCmd007::Create();
outCmd007.SetR( inMsg.getR() );
outMsg.addCmd( outCmd007 );
// ......
return outMsg;
}
}
here the example of one incoming command class (manually written in pseudo c++)

Code never needs refactoring. The code either works, or it doesn't. And if it works, the code doesn't need anything.
The need for refactoring comes from you, the programmer. The person reading, writing, maintaining and extending the code.
If you have trouble understanding the code, it needs to be refactored. If you would be more productive by cleaning up and refactoring the code, it needs to be refactored.
In general, I'd say it's a good idea for your own sake to refactor 1000+ line functions. But you're not doing it because the code needs it. You're doing it because that makes it easier for you to understand the code, test its correctness, and add new functionality.
On the other hand, if the code is automatically generated by another tool, you'll never need to read it or edit it. So what'd be the point in refactoring it?

I understand exactly where you're coming from, and can see exactly why you've structured your code the way it is, but it needs to change.
The uncertainty you feel when you attempt to refactor can be ameliorated by writing unit tests. If you've tests specific to each spec, then the code for each spec can be refactored until you're blue in the face, and you can have confidence in it.
A second option, is it possible to automatically generate your code from a data structure?
If you've a core suite of classes that do the donkey work and edge cases, you can auto-generate the repetitive 1000 line methods as often as you wish.
However, there are exceptions to every rule.
If the methods are a literal interpretation of the spec (very little additional logic), and the specs change infrequently, and the "common" portions (i.e. bits that happen to be the same right now) of the specs change at different times, and you're not going to be asked to get a 10x performance gain out of the code anytime soon, then (and only then) . . . you may be better off with what you have.
. . . but on the whole, refactor.

Yes, always. 1000 lines is at least 10x longer than any function should ever be, and I'm tempted to say 100x, except that when dealing with input parsing and validation it can become natural to write functions with 20 or so lines.
Edit: Just re-read your question and I'm not clear on one point - are you talking about machine generated code that no-one has to touch? In which case I would leave things as they are.

Refectoring is not the same as writing from scratch. While you should never write code like this, before you refactor it, you need to consider the costs of refactoring in terms of time spent, the associated risks in terms of breaking code that already works, and the net benefits in terms of future time saved. Refactor only if the net benefits outweigh the associated costs and risks.
Sometimes wrapping and rewriting can be a safer and more cost effective solution, even if it appears expensive at first glance.

Long methods need refactoring if they are maintained (and thus need to be understood) by humans.

As a rule of thumb, code for humans first. I don't agree with the common idea that functions need to be short. I think what you need to aim at is when a human reads your code they grok it quickly.
To this effect it's a good idea to simplify things as much as possible--but not more than that. It's a good idea to delegate roughly one task for each function. There is no rule as for what "roughly one task" means: you'll have to use your own judgement for that. But do recognize that a function split into too many other functions itself reduces readability. Think about the human being who reads your function for the first time: they would have to follow one function call after another, constantly context-switching and maintaining a stack in their mind. This is a task for machines, not for humans.
Find the balance.
Here, you see how important naming things is. You will see it is not that easy to choose names for variables and functions, it takes time, but on the other hand it can save a lot of confusion on the human reader's side. Again, find the balance between saving your time and the time of the friendly humans who will follow you.
As for repetition, it's a bad idea. It's something that needs to be fixed, just like a memory leak. It's a ticking bomb.
As others have said before me, changing code can be expensive. You need to do the thinking as for whether it will pay off to spend all this time and effort, facing the risks of change, for a better code. You will possibly lose lots of time and make yourself one headache after another now, in order to possibly save lots of time and headache later.

Take a look at the related question How many lines of code is too many?. There are quite a few tidbits of wisdom throughout the answers there.
To repost a quote (although I'll attempt to comment on it a little more here)... A while back, I read this passage from Ovid's journal:
I recently wrote some code for
Class::Sniff which would detect "long
methods" and report them as a code
smell. I even wrote a blog post about
how I did this (quelle surprise, eh?).
That's when Ben Tilly asked an
embarrassingly obvious question: how
do I know that long methods are a code
smell?
I threw out the usual justifications,
but he wouldn't let up. He wanted
information and he cited the excellent
book Code Complete as a
counter-argument. I got down my copy
of this book and started reading "How
Long Should A Routine Be" (page 175,
second edition). The author, Steve
McConnell, argues that routines should
not be longer than 200 lines. Holy
crud! That's waaaaaay to long. If a
routine is longer than about 20 or 30
lines, I reckon it's time to break it
up.
Regrettably, McConnell has the cheek
to cite six separate studies, all of
which found that longer routines were
not only not correlated with a greater
defect rate, but were also often
cheaper to develop and easier to
comprehend. As a result, the latest
version of Class::Sniff on github now
documents that longer routines may not
be a code smell after all. Ben was
right. I was wrong.
(The rest of the post, on TDD, is worth reading as well.)
Coming from the "shorter methods are better" camp, this gave me a lot to think about.
Previously my large methods were generally limited to "I need inlining here, and the compiler is being uncooperative", or "for one reason or another the giant switch block really does run faster than the dispatch table", or "this stuff is only called exactly in sequence and I really really don't want function call overhead here". All relatively rare cases.
In your situation, though, I'd have a large bias toward not touching things: refactoring carries some inherent risk, and it may currently outweigh the reward. (Disclaimer: I'm slightly paranoid; I'm usually the guy who ends up fixing the crashes.)
Consider spending your efforts on tests, asserts, or documentation that can strengthen the existing code and tilt the risk/reward scale before any attempt to refactor: invariant checks, bound function analysis, and pre/postcondition tests; any other useful concepts from DBC; maybe even a parallel implementation in another language (maybe something message oriented like Erlang would give you a better perspective, given your code sample) or even some sort of formal logical representation of the spec you're trying to follow if you have some time to burn.
Any of these kinds of efforts generally have a few results, even if you don't get to refactor the code: you learn something, you increase your (and your organization's) understanding of and ability to use the code and specifications, you might find a few holes that really do need to be filled now, and you become more confident in your ability to make a change with less chance of disastrous consequences.
As you gain a better understanding of the problem domain, you may find that there are different ways to refactor you hadn't thought of previously.
This isn't to say "thou shalt have a full-coverage test suite, and DBC asserts, and a formal logical spec". It's just that you are in a typically imperfect situation, and diversifying a bit -- looking for novel ways to approach the problems you find (maintainability? fuzzy spec? ease of learning the system?) -- may give you a small bit of forward progress and some increased confidence, after which you can take larger steps.
So think less from the "too many lines is a problem" perspective and more from the "this might be a code smell, what problems is it going to cause for us, and is there anything easy and/or rewarding we can do about it?"
Leaving it cooking on the backburner for a bit -- coming back and revisiting it as time and coincidence allows (e.g. "I'm working near the code today, maybe I'll wander over and see if I can't document the assumptions a bit better...") may produce good results. Then again, getting royally ticked off and deciding something must be done about the situation is also effective.
Have I managed to be wishy-washy enough here? My point, I think, is that the code smells, the patterns/antipatterns, the best practices, etc -- they're there to serve you. Experiment to get used to them, and then take what makes sense for your current situation, and leave the rest.

I think you first need to "refactor" the specs. If there are repetitions in the spec it also will become easier to read, if it makes use of some "basic building blocks".
Edit: As long as you cannot refactor the specs, I wouldn't change the code.
Coding style guides are all made for easier code maintenance, but in your special case the ease of maintenance is achieved by following the spec.
Some people here asked if the code is generated. In my opinion it does not matter: If the code follows the spec "line by line" it makes no difference if the code is generated or hand-written.

1000 thousand lines of code is nothing. We have functions that are 6 to 12 thousand lines long. Of course those functions are so big, that literally things get lost in there, and no tool can help us even look at high level abstractions of them. the code is now unfortunately incomprehensible.
My opinion of functions that are that big, is that they were not written by brilliant programmers but by incompetent hacks who shouldn't be left anywhere near a computer - but should be fired and left flipping burgers at McDonald's. Such code wreaks havok by leaving behind features that cannot be added to or improved upon. (too bad for the customer). The code is so brittle that it cannot be modified by anyone - even the original authors.
And yes, those methods should be refactored, or thrown away.

Do you ever have to read or maintain the generated code?
If yes, then I'd think some refactoring might be in order.
If no, then the higher-level language is really the language you're working with -- the C++ is just an intermediate representation on the way to the compiler -- and refactoring might not be necessary.

Looks to me that you've implemented a separate language within your application - have you considered going that way?

It has been my understanding that it's recommended that any method over 100 lines of code be refactored.

I think some rules may be a little different in his era when code is most commonly viewed in an IDE. If the code does not contain exploitable repetition, such that there are 1,000 lines which are going to be referenced once each, and which share a significant number of variables in a clear fashion, dividing the code into 100-line routines each of which is called once may not be that much of an improvement over having a well-formatted 1,000-line module which includes #region tags or the equivalent to allow outline-style viewing.
My philosophy is that certain layouts of code generally imply certain things. To my mind, when a piece of code is placed into its own routine, that suggests that the code will be usable in more than one context (exception: callback handlers and the like in languages which don't support anonymous methods). If code segment #1 leaves an object in an obscure state which is only usable by code segment #2, and code segment #2 is only usable on a data object which is left in the state created by #1, then absent some compelling reason to put the segments in different routines, they should appear in the same routine. If a program puts objects through a chain of obscure states extending for many hundreds of lines of code, it might be good to rework the design of the code to subdivide the operation into smaller pieces which have more "natural" pre- and post- conditions, but absent some compelling reason to do so, I would not favor splitting up the code without changing the design.

For further reading, I highly recommend the long, insightful, entertaining, and sometimes bitter discussion of this topic over on the Portland Pattern Repository.

I've seen cases where it is not the case (for example, creating an Excel spreadsheet in .Net often requires a lot of line of code for the formating of the sheet), but most of the time, the best thing would be to indeed refactor it.
I personally try to make a function small enough so it all appears on my screen (without affecting the readability of course).

1000 lines? Definitely they need to be refactored. Also not that, for example, default maximum number of executable statements is 30 in Checkstyle, well-known coding standard checker.

If you refactor, when you refactor, add some comments to explain what the heck it's doing.
If it had comments, it would be much less likely a candidate for refactoring, because it would already be easier to read and follow for someone starting from scratch.

Then here the question: in general, do
you think such very long methods would
always need refactoring,
if you ask in general, we will say Yes .
or in a
similar case it would be acceptable?
(unfortunately refactoring the specs
is not an option)
Sometimes are acceptable, but is very unusual, I will give you a pair of examples:
There are some 8 bit microcontrollers called Microchip PIC, that have only a fixed 8 level stack, so you can't nest more than 8 calls, then care must be taken to avoid "stack overflow", so in this special case having many small function (nested) is not the best way to go.
Other example is when doing optimization of code (at very low level) so you have to take account the jump and context saving cost. Use it with care.
EDIT:
Even in generated code, you could need to refactorize the way its generated, for example for memory saving, energy saving, generate human readable, beauty, who knows, etc..

There has been very good general advise, here a practical recommendation for your sample:
common patterns can be isolated in plain feeder methods:
void AddSimpleTransform(OutMsg & msg, InMsg const & inMsg,
int rotateBy, int foldBy, int gonkBy = 0)
{
// create & add up to three messages
}
You might even improve that by making this a member of OutMsg, and using a fluent interface, such that you can write
OutMsg msg;
msg.AddSimpleTransform(inMsg, 12, 17)
.Staple("print")
.AddArtificialRust(0.02);
which can be an additional improvement under circumstances.

In game programming are global variables bad?

I know my gut reaction to global variables is "badd!" but in the two game development courses I've taken at my college globals were used extensively, and now in the DirectX 9 game programming tutorial I am using (www.directxtutorial.com) I'm being told globals are okay in game programming ...? The site also recommends using only structs if you can when doing game programming to help keep things simple.
I'm really confused on this issue, and all the research I've been trying to do is very confusing. I realize there are issues when using global variables (threading issues, they make code harder to maintain, the state of them is hard to track etc) but also there is a cost associated with not using globals, I'd have to pass a loooot of information around very often which can be confusing and I imagine time-costing, although I guess pointers would speed the process up (this is my first time writing a game in C++.) Anyway, I realize there is probably no "right" or "wrong" answer here since both ways work, but I want my code to be as proper as I can so any input would be good, thank you very much!

The trouble with games and globals is that games (nowadays) are threaded at engine level. Game developers using an engine use the engine's abstractions rather than directly programming concurrency (IIRC). In many of the highlevel languages such as C++, threads sharing state is complex. When many concurrent processes share a common resource they have to make sure they don't tread on eachother's toes.
To get around this, you use concurrency control such as mutex and various locks. This in effect makes asynchronous critical sections of code access shared state in a synchronous manner for writing. The topic of concurrency control is too much to explain fully here.
Suffice to say, if threads run with global variables, it makes debugging very hard, as concurrency bugs are a nightmare (think, "which thread wrote that? Who holds that lock?").
There are exceptions in games programming API such as OpenGL and DX. If your shared data/globals are pointers to DX or OpenGL graphics contexts then typically this maps down to GPU operations which don't suffer so much from the same trouble.
Just be careful. Keeping objects representing 'player' or 'zombie' or whatever, and sharing them between threads can be tricky. Spawn 'player' threads and 'zombie group' threads instead and have a robust concurrency abstraction between them based on message passing rather than accessing those object's state across the thread/critical section boundary.
Saying all that, I do agree with the "Say no to globals" point made below.
For more on the complexities of threads and shared state see:
1 POSIX Threads API - I know it is POSIX, but provides a good idea that translates to other API
2 Wikipedia's excellent range of articles on concurrency control mechanisms
3 The Dining Philosopher's problem (and many others)
4 ThreadMentor tutorials and articles on threading
5 Another Intel article, but more of a marketing thing.
6 An ACM article on building multi-threaded game engines

Have worked on AAA game titles, I can tell you that globals should be eradicated immediately before they spread like a cancer. I've seen them corrupt an I/O subsystem so completely that it had to be wholly thrown out to be rewritten.
Say no to globals. Always.

In this respect, there's no difference between games and other programs. While arguably OK in small examples given in elementary courses, global variables are strongly discouraged in real programs.
So if you want to write correct, readable and maintainable code, stay away from global variables as much as possible.

All answers until now deal with the globals/threads issue, but I will add my 2 cents to the struct/class (understood as all public attributes/private attributes + methods) discussion. Preferring structs over classes on the grounds of that being simpler is in the same line of thought of preferring assembler over C++ on the grounds of that being a simpler language.
The fact that you have to think on how your entities are going to be used and provide methods for it makes the concrete entity a little more complex, but greatly simplifies the rest of the code and maintainability. The whole point of encapsulation is that it simplifies the program by providing clear ways in that your data can be modified while maintaining your objects invariants. You control the entry points and what can happen there. Having all attributes public imply that any part of the code can have a small innocent error (forgot to check condition X) and break your invariants completely (health below 0, but no 'death' processing being triggered)
The other common discussion is performance: If I just need to update a datum, then having to call a method will impact my performance. Not really. If methods are simple and you provide them in the header as inlines (inside the class body or outside with the inline keyword), the compiler will be able to copy those instructions to each use place. You get the guarantee that the compiler will not leave out any check by mistake, and no impact in performance.

Having read a bit more what you posted though:
I'm being told globals are okay in
game programming ...? The site also
recommends using only structs if you
can when doing game programming to
help keep things simple.
Games code is no different from other code really. Gratuitous use of globals is bad regardless. And as for 'only use structs', that is just a load of crap. Approach game development on the same principles as any other software - you may find places where you need to bend this but they should be the exception, typically when dealing with low-level hardware issues.

I would say that the advice about globals and 'keeping things simple' is probably a mixture of being easier to learn and old fashioned thinking about performance. When I was being taught about game programming I remember being told that C++ wasn't advised for games as it would be too slow but I've worked on multiple games using every facet of C++ which proves that isn't true.
I would add to everyone's answers here that globals are to be avoided where possible, I wouldn't be afraid to use whatever you need from C++ to make your code understandable and easy to use. If you come up against a performance problem then profile that specific issue and I'll bet that most of the time you won't need to remove use of a C++ feature but just think about your problem better. There may still be some platforms around that require pure C but I don't really have experience of them, even the Gameboy Advance seemed to deal with C++ quite nicely.

The metaissue here is that of state. How much state does any given function depend on, and how much does it change? Then consider how much state is implicit versus explicit, and cross that with the inspect vs. change.
If you have a function/method/whatever that is called DoStuff(), you have no idea from the outside what it depends on, what it needs, and what's going to happen to the shared state. If this is a class member, you also have no idea how that object's state is going to mutate. This is bad.
Contrast to something like cosf(2), this function is understood not to change any global state, and any state that it requires (lookup tables for example) are hidden from view and have no effect on your program-at-large. This is a function that computes a value based on what you give it and it returns that value. It changes no state.
Class member functions then have the opportunity to step up some problems. There's a huge difference between
myObject.hitpoints -= 4;
myObject.UpdateHealth();
and
myObject.TakeDamage(4);
In the first example, an external operation is changing some state that one of its member functions implicitly depends upon. The fact is that these two lines can be separated by many other lines of code begins to make it non-obvious what's going to happen in the UpdateHealth call, even if outside of the subtraction it is the same as the TakeDamage call. Encapsulating the state changes (in the second example) implies that the specifics of the state changes aren't important to the outside world, and hopefully they're not. In the first example, the state changes are explicitly important to the outside world, and this is really no different than setting some globals and calling a function that uses those globals. E.g. hopefully you'd never see
extern float value_to_sqrt;
value_to_sqrt = 2.0f;
sqrt(); // reads the global value_to_sqrt
extern float sqrt_value; // has the results of the sqrt.
And yet, how many people do exactly this sort of thing in other contexts? (Considering especially that class instance state is "global" in regards to that particular instance.)
So- prefer giving explicit instruction to your function calls, and prefer that they return the results directly rather than having to explicitly set state before calling a function and then checking other state after it returns.
The more state dependencies a bit of code has, the harder it'll be to make it multithread safe, but that has already been covered above. The point I want to make is that the problem isn't so much globals but more the visibility of the collection of state that is required for a bit of code to operate (and subsequently how much other code also depends on that state).

Most games aren't multi-threaded, although newer titles are going down that route, and so they've managed to get away with it so far.
Saying that globals are okay in games is like not bothering to fix the brakes on your car because you only drive at 10mph!
It's bad practice which ever way you look at it.
You only have to look at the number of bugs in games to see examples of this.

If in class A you need to access data D, instead of setting D global, you'd better put into A a reference to D.

Globals are NOT intrinsically bad. In C for instance they are part of the language's normal use... and since C++ builds on C they still have a place.
On an aesthetic level, it's better to avoid them where you can sensibly make them part of a class, but if all you do is wrap a bunch of globals into a singleton, you made things worse because at least with globals it's obvious what the point is.
Be careful, but for some things it makes less sense to force OO concepts on what is actually a global value.

Two specific issues that I've encountered in the past:
First: If you're attempting to separate e.g. render phase (const access to most game state) from logic phase (non-const access to most game state) for whatever reason (decoupling render rate from logic rate, synchronizing game state across a network, recording and playback of gameplay at a fixed point in the frame, etc), globals make it very hard to enforce that.
Problems tend to creep in and become hard to debug and eradicate. This also has implications for threaded renderers separate from game logic, or the like (the other answers cover this topic thoroughly).
Second: The presence of many globals tends to bloat the literal pool, which the compiler typically places after each function.
If you get to your state through either a single "struct GlobalTable" which holds globals or a collection of methods on an object or the like, your literal pool tends to be a lot smaller, decreasing the size of the .text section in your executable.
This is mostly a concern for instruction set architectures that can't embed load targets directly into instructions (see e.g. fixed-width ARM or Thumb version 1 instruction encoding on ARM processors). Even on modern processors I'd wager you'll get slightly smaller codegen.
It also hurts doubly when your instruction and data caches are separate (again, as on some ARM processors); you'll tend to get two cachelines loaded where you might only need one with a unified cache. Since the literal pool may count as data, and won't always start on a cacheline boundary, the same cacheline might have to be loaded both into the i-cache and d-cache.
This may count as a "microoptimization", but it's a transform that's relatively easily applied to an entire codebase (e.g. extern struct GlobalTable { /* ... */ } the; and then replace all g_foo with the.foo)... And we've seen code size decrease between 5 and 10 percent in some cases, with a corresponding boost to performance due to keeping the caches cleaner.

I'm going to take a risk and say it depends to me on the scope/scale of your project. Because if you are trying to program an epic game with a boatload of code, then globals could, and easily in the most painful way in hindsight, cost you way more time than they save.
But if you are trying to code, say, something as simple as Super Mario or Metroid or Contra or even simpler, like Pac-Man, then these games were originally coded in 6502 assembly and used globals for almost everything. Just about all the game data and state was stored in data segments, and that didn't stop the devs from shipping a very competent and popular product in spite of working with absolutely inferior tools and engineering standards which would probably horrify people today.
So if you are just writing this kind of small and simple game which has a very limited scope and isn't designed to grow and expand far beyond its original design, isn't designed to be maintained for years and years, with a few thousand lines of simple C++ code, then I don't see the big deal of using a global here or a singleton there. Someone obsessed with trying to engineer Super Mario with the soundest engineering techniques with SOLID and a DI framework could end up taking far, far longer to ship than even the devs who wrote it in 6502 asm.
And I'm getting old and there's something to it there when I look at these old simple games and how they were coded, and it almost seems like the devs were doing something right in spite of the hard-coded magic numbers and globals all over the place while I spend my career fumbling around and trying to figure out the best way to engineer things. That said this is probably a very unpopular opinion, and not one I would have liked either a decade or two ago, but there's something to it. I don't look at the 6502 asm of Metroid and think, "these devs underengineered their product and their lives would have been so much easier if they did this or that." Seems like they did things just about right.
But again this is for small-scale stuff, maybe in the indie category of games by today's standards, and in the smaller of the indie games among them, and far from doing anything ground-breaking in terms of how much data it can process or using cutting-edge hardware techniques. If in doubt, I'd definitely suggest to err on the side of avoiding globals. It's also a little bit trickier in C++ as opposed to say, C, since you can have objects with constructor and destructors, and initialization and destruction order isn't well-defined and easily predictable for global objects. There I'd say to lean even more on the side of avoiding globals since they can trip you up in whole new ways when you aren't explicitly initializing and destroying them yourself in a predictable order. And naturally if you want to multithread a lot beyond a critical loop here and there, then your ability to reason about thread-safety of any particular code will be severely diminished if you cannot minimize the scope/visibility of your game state to the minimum of places.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js