Does anyone know any sites/books/articles covering best practices or theory around design patterns in high-performance applications? It seems a lot of the patterns use indirection/abstraction/encapsulation in ways that may hurt performance in computationally intensive code. Head First Design Patterns and even GoF mention the possibility of performance hits with many of the patterns, but offer no concrete advice on how to deal with them.
I’m surprised we aren’t asking what performance problems you are having!
In my experience, performance problems are usually tied to specific conditions and situations. Design patterns, on the other hand, are solutions to more general and abstract problems. It would seem a bit awkward to approach both in the same text: against which of the possibly many "non-patterned" solutions should the author compare the performance of a design pattern? When the performance problem is general, there often already is a pattern to solve it: the Flyweight is a good example.
The penalties imposed by the use of a design pattern are of a finite, very small set: introduction of virtual calls, added latency due to delegation, extra memory consumption due to the proliferation of objects and so on. If, after profiling, you notice that these are the cause of your woes, there are known ways to minimize them.
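For example, one known way to remove virtual-call overhead on a hot path is to keep the pattern's shape but resolve the dispatch at compile time. The C++ sketch below is purely illustrative (the names Filter, Scale and process are invented); it shows the same Strategy-like structure once with a virtual call per element and once with a template parameter the compiler can inline:

// Illustrative only: same structure, dynamic vs. static dispatch.
#include <vector>

struct Filter {                               // classic form: one virtual call per element
    virtual ~Filter() = default;
    virtual double apply(double x) const = 0;
};

struct Scale : Filter {
    double factor;
    explicit Scale(double f) : factor(f) {}
    double apply(double x) const override { return x * factor; }
};

double processVirtual(const std::vector<double>& data, const Filter& f) {
    double sum = 0.0;
    for (double x : data) sum += f.apply(x);  // virtual dispatch in the hot loop
    return sum;
}

template <typename F>                         // same structure, resolved at compile time
double processStatic(const std::vector<double>& data, const F& f) {
    double sum = 0.0;
    for (double x : data) sum += f.apply(x);  // candidate for inlining
    return sum;
}

Whether the static version is actually faster in your program is something only a profiler can tell you; the point is that the pattern's shape survives the change.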
Knowing the patterns might be useful to solve performance issues, too. First, someone already mentioned that patterns break down a problem in smaller bits: this might ease pinpointing the source of the issue and isolating ugly but performant code. They also create a framework of reasoning and expectations for developers. If you must introduce a deviation for performance reasons, it will be obvious: “Except here, where we forego X and do Y to improve performance, this is a Chain of Responsibility.” They are rules to be broken when needed.
(Alas, there is one very good pattern for getting good performance: measure, pinpoint, fix.)
Design patterns exist to help you come to grips with how to design software or improve its flexibility. How you implement the pattern determines what kind of performance penalty (or benefit) you will see from its use.
Some patterns do exist because that overall way of structuring things generally does lead to faster programs. But unlike algorithms there is no good way to really formally analyze a pattern to decide on how slow or fast it is.
My advice would be to use a pattern if it helps you figure out how to design a particular piece of code, or if you need to refactor to make code more flexible or clear. If you then have performance issues, use standard profiling techniques to find them.
If you're refactoring when you encounter performance issues, maybe the cost isn't worth the refactor, or maybe there's a way to mitigate it. If you're designing new code, maybe there's a way to restructure things to fix the performance issue if it truly lies in the indirection the pattern needs in order to work.
The most concrete advice is: profile it in your application and see how much of an impact it really makes.
Any other advice is going to be considerably more general and may not necessarily apply well to how you have implemented a given pattern in your application with your compiler on your platform.
Design patterns really focus on how you structure the code and define the class abstractions and their interactions. Computational performance will mostly be affected by the way you write the actual implementation (the bodies of the methods).
For C++ I definitely suggest reading Scott Meyers' Effective C++ and More Effective C++, which reveal many idioms for writing high-performance code.
You can read Herb Sutter's entries under "Effective Concurrency" for things involving multi-threading and concurrency patterns and how they affect performance.
http://herbsutter.com/
Design patterns are mostly ways of breaking your program into smaller pieces, which are easier to reuse, compose, design, and test. Several design patterns will result in code that performs worse than a simpler design, but they have a significant advantage when you consider the 80/20 rule.
The 80/20 rule says that 80 percent of your program's execution time will be spent executing 20 percent of its code. When your program is nice and modular, it's easy to throw it in a profiler and see exactly which component should be tuned/optimized, or where it makes sense to go with a less flexible design in order to improve performance. Having the design well separated from the start makes it easier to find those performance hot spots.
One term that may help you get better hits is 'pattern language'. It's a collection of patterns that go together for some purpose. If you have a more specific goal than just high performance, someone may have plotted out a path through patterns for your domain, for example a pattern language for parallel software. Here's another nice collection of parallel programming patterns from UIUC, a hotbed of patterns work.
The ACE/TAO guys have a lot of papers about high-performance network patterns using C++.
Remember the old saying: "You can have it good, fast, and cheap; pick two."
Design patterns address the good. A good foundation is needed so the code can be accurate and maintainable.
If performance is an issue, then benchmark and then optimize the sections that give you problems. Many times performance is just a question of picking a proper algorithm, but it may mean you need to break out into some horrifically optimized code for the 10% that takes up 90% of the time. Just make sure you comment the S^^T out of it.
GoF design patterns are about using proven patterns to solve common problems with elegant, maintainable code. They don't target performance.
If you want patterns for performance, you may need to look at system architecture patterns, algorithms, data structures, etc.
What does your app do?
If your application is in C++, and is written sensibly, the chances are your code will run blindingly fast on modern hardware, until it has to wait for I/O. The exception would be something like real time image analysis that is very processor intensive.
If performance is an issue, do you really mean I/O performance? (disk, DB, network etc.)
There are 'patterns' that allow your application to perform even while frequently waiting for I/O (asynchronous callbacks etc.)
If you are dealing with an uneven load, whereby the peak load may be much higher than average load, a commonly employed architecture pattern is to de-couple system components with message queues.
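To make the decoupling idea concrete, here is a minimal in-process sketch (class and method names are invented; a real deployment would usually use an external message broker rather than an in-process queue): producers push work and return immediately, while the consumer drains the queue at whatever rate it can sustain.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

class MessageQueue {
public:
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            queue_.push(std::move(msg));
        }
        cv_.notify_one();                     // wake the consumer
    }

    std::string pop() {                       // blocks until a message arrives
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        std::string msg = std::move(queue_.front());
        queue_.pop();
        return msg;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
};

A burst of requests is simply absorbed by the queue instead of overloading the downstream component.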
So I had a discussion with a colleague today. He strongly suggested that I change the code from
if (condition) {
    function->setValue(true);
} else {
    function->setValue(false);
}

to

function->setValue(false);
if (condition) {
    function->setValue(true);
}
in order to avoid the 'else'. I disagreed because, while it might improve readability to some degree, in the case where the condition is true we make one completely unnecessary function call.
What do you guys think?
Meh.
Doing this just to avoid the else is silly (at least there should be a deeper rationale). There's typically no extra branching cost to it, especially after the optimizer goes through it.
Code compactness can sometimes be a desirable aesthetic, especially if a lot of time is spent skimming and searching through code rather than reading it line by line. There can be legitimate reasons to favor terser code sometimes, but there are always pros and cons. Even then, code compactness should be less about cramming logic into fewer lines and more about keeping the logic straightforward.
Correctness here might be easier to achieve with one or the other. The point was made in a comment that you might not know the side effects associated with calling setValue(false), though I would suggest that's kind of moot. Functions should have minimal side effects, they should all be documented at the interface/usage level if they aren't totally obvious, and if we don't know exactly what they are, we should be spending more time looking up their documentation prior to calling them (and their side effects should not be changing once firm dependencies are established to them).
Given that, it may sometimes be easier to achieve correctness and maintain it with a solution that starts out initializing states to some default value, and using a form of code that opts in to overwrite it in specific branches of code. From that standpoint, what your colleague suggested may be valid as a way to avoid tripping over that code in the future. Then again, for a simple if/else pair of branches, it's hardly a big deal.
Don't worry about the cost of the extra most-likely-constant-time function call either way in this kind of knee-deep micro-level implementation case, especially with no super tight performance-critical loop around this code (and even then, still prefer to worry about that at least a little bit in hindsight after profiling).
I think there are far better things to think about than this kind of coding style, like testing procedure. Reliable code tends to need less revisiting, and has the freedom to be written in a wider variety of ways without causing disputes. Testing is what establishes reliability. The biggest disputes about coding style tend to follow teams where there's more toe-stepping and more debugging of the same bodies of code over and over and over from disparate people due to lack of reliability, modularity, excessive coupling, etc. It's a symptom of a problem but not necessarily the root cause.
I've seen (and searched for) a lot of questions on StackOverflow about premature optimization - word on the street is, it is the root of all evil. :P I confess that I'm often guilty of this; I don't really optimize for speed at the cost of code legibility, but I will rewrite my code in logical manners using datatypes and methods that seem more appropriate for the task (e.g. in Actionscript 3, using a typed Vector instead of an untyped Array for iteration) and if I can make my code more elegant, I will do so. This generally helps me understand my code, and I generally know why I'm making these changes.
At any rate, I was thinking today - in OOP, we promote encapsulation, attempting to hide the implementation and promote the interface, so that classes are loosely coupled. The idea is to make something that works without having to know what's going on internally - the black box idea.
As such, here's my question - is it wise to attempt to do deep optimization of code at the class level, since OOP promotes modularity? Or does this fall into the category of premature optimization? I'm thinking that, if you use a language that easily supports unit testing, you can test, benchmark, and optimize the class because it in itself is a module that takes input and generates output. But, as a single guy writing code, I don't know if it's wiser to wait until a project is fully finished to begin optimization.
For reference: I've never worked in a team before, so something that's obvious to developers who have this experience might be foreign to me.
Hope this question is appropriate for StackOverflow - I didn't find another one that directly answered my query.
Thanks!
Edit: Thinking about the question, I realize that "profiling" may have been the correct term instead of "unit test"; unit-testing checks that the module works as it should, while profiling checks performance. Additionally, a part of the question I should have asked before - does profiling individual modules after you've created them not reduce time profiling after the application is complete?
My question stems from the game development I'm trying to do - I have to create modules, such as a graphics engine, that should perform optimally (whether they will is a different story :D ). In an application where performance was less important, I probably wouldn't worry about this.
I don't really optimize for speed at the cost of code legibility, but I will rewrite my code in logical manners using datatypes and methods that seem more appropriate for the task [...] and if I can make my code more elegant, I will do so. This generally helps me understand my code
This isn't really optimization, rather refactoring for cleaner code and better design*. As such, it is a Good Thing, and it should indeed be practiced continuously, in small increments. Uncle Bob Martin (in his book Clean Code) popularized the Boy Scout Rule, adapted to software development: Leave the code cleaner than you found it.
So to answer your title question rephrased, yes, refactoring code to make it unit testable is a good practice. One "extreme" of this is Test Driven Development, where one writes the test first, then adds the code which makes the test pass. This way the code is created unit testable from the very beginning.
*Not to be nitpicky; it's just useful to clarify common terminology and make sure we use the same terms with the same meaning.
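As a toy illustration of the test-first rhythm mentioned above (the function name and cases are invented for the example), the test below would be written first, fail, and then drive the smallest implementation that makes it pass:

#include <cassert>

// Implementation written only after the test below existed and failed.
bool isLeapYear(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// The "test first" part: these assertions define the behavior up front.
void testLeapYears() {
    assert(isLeapYear(2000));    // divisible by 400
    assert(!isLeapYear(1900));   // divisible by 100 but not by 400
    assert(isLeapYear(2024));
    assert(!isLeapYear(2023));
}

int main() { testLeapYears(); }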
True, I believe optimization should be left as a final task (although it's good to be cognizant, while writing your first draft, of where you might need to go back and optimize). That's not to say you shouldn't refactor iteratively in order to maintain order and cleanliness in the code. It is to say that if something currently serves the purpose and isn't botching a requirement of the application, then the requirements should be addressed first, as ultimately they are what you're responsible for delivering (unless the requirements include specifics on maximum request times or something along those lines). I agree with Korin's methodology as well: build for function first, and if time permits, optimize to your heart's content (or to the theoretical limit, whichever comes first).
The reason that premature optimization is a bad thing is this: it can take a lot of time and you don't know in advance where the best use of your time is likely to be.
For example, you could spend a lot of time optimizing a class, only to find that the bottleneck in your application is network latency or similar factor that is far more expensive in terms of execution time. Because at the outset you don't have a complete picture, premature optimization leads to a less than optimal use of your time. In this case, you'd probably have preferred to fix the latency issue than optimize class code.
I strongly believe that you should never reduce your code readability and good design because of performance optimizations.
If you are writing code where performance is critical it may be OK to lower the style and clarity of your code, but this does not hold true for the average enterprise application. Hardware evolves quickly and gets cheaper every day. In the end you are writing code that is going to be read by other developers, so you'd better do a good job at it!
It's always beautiful to read code that has been carefully crafted, where every path has a test that helps you understand how it should be used. I don't really care if it is 50 ms slower than the spaghetti alternative which does lots of crazy stuff.
Yes, you should skip optimizing for the unit test. Optimization, when required, usually makes the code more complex. Aim for simplicity. If you optimize for the unit test you may actually de-optimize for production.
If performance is really bad in the unit test, you may need to look at your design. Test in the application to see if performance is equally bad before optimizing.
EDIT: De-optimization is likely to occur when the data being handled varies in size. This is most likely to happen with classes that work with sets of data. One solution's response time may grow linearly but start out slow, while another's grows geometrically but starts out fast. If the unit test uses a small set of data, then the geometric solution may be chosen based on the unit test. When production hits the class with a large set of data, performance tanks.
Sorting algorithms are a classic case for this kind of behavior and resulting de-optimizations. Many other algorithms have similar characteristics.
EDIT2: My most successful optimization was the sort routine for a report where data was stored on disk in a memory-mapped file. The sort times were reasonable with moderate data sizes which did not require disk access. With larger data sets it could take days to process the data. Initial timings of the report showed: data selection 3 minutes, data sorting 3 days, and reporting 3 minutes. Investigation of the sort showed that it was a fully unoptimized bubble sort (n-1 full passes for a data set of size n), roughly O(n²). Changing the sorting algorithm reduced the sort time for this report to 3 minutes. I would not have expected a unit test to cover this case, and the original code was as simple (and fast) as you could get for small sets. The replacement was much more complex and slower for very small sets, but handled large sets faster with a more linear curve, O(n log n). (Note: no optimization was attempted until we had metrics.)
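A rough, self-contained sketch of that crossover effect (the sizes and the choice of insertion sort are arbitrary): the quadratic sort looks fine on the tiny input a unit test might use, while the O(n log n) std::sort wins on a production-sized input.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Simple quadratic sort: fast on tiny inputs, terrible on large ones.
void insertionSort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i)
        for (std::size_t j = i; j > 0 && v[j - 1] > v[j]; --j)
            std::swap(v[j - 1], v[j]);
}

int main() {
    std::mt19937 rng(42);
    for (std::size_t n : {std::size_t(32), std::size_t(20000)}) {
        std::vector<int> a(n);
        for (int& x : a) x = static_cast<int>(rng());
        std::vector<int> b = a;

        auto t0 = std::chrono::steady_clock::now();
        insertionSort(a);
        auto t1 = std::chrono::steady_clock::now();
        std::sort(b.begin(), b.end());
        auto t2 = std::chrono::steady_clock::now();

        using us = std::chrono::microseconds;
        std::printf("n=%zu  insertionSort=%lld us  std::sort=%lld us\n", n,
                    (long long)std::chrono::duration_cast<us>(t1 - t0).count(),
                    (long long)std::chrono::duration_cast<us>(t2 - t1).count());
    }
    return 0;
}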
In practice, I aim for a ten-fold improvement of a routine which takes at least 50% of the module run-time. Achieving this level of optimization for a routine using 55% of the run-time will save 50% of the total run-time.
I want to know whether there is an effect on program efficiency from adopting an object-oriented approach to a problem, as compared to the structured programming approach, in any programming language but especially in C++.
Maybe. Maybe not.
You can write efficient object-oriented code. You can write inefficient structured code.
It depends on the application, how well the code is written, and how heavily the code is optimized. In general, you should write code so that it has a good, clean, modular architecture and is well designed, then if you have problems with performance optimize the hot spots that are causing performance issues.
Use object oriented programming where it makes sense to use it and use structured programming where it makes sense to use it. You don't have to choose between one and the other: you can use both.
I remember back in the early 1990's when C++ was young there were studies done about this. If I remember correctly, the guys who took (well written) C++ programs and recoded them in C got around a 15% increase in speed. The guys who took C programs and recoded them in C++, and modified the imperative style of C to an OO style (but same algorithms) for C++ got the same or better performance. The apparent contradiction was explained by the observation that the C programs, in being translated to an object oriented style, became better organized. Things that you did in C because it was too much code and trouble to do better could more easily be done properly in C++.
Thinking back on this, I wonder about the conclusion somewhat. Writing a program a second time will always result in a better program, so it didn't have to be the imperative-to-OO change in style that made the difference. Today's computer architectures are designed with hardware support for common operations done by OO programs, and compilers have gotten better at using those instructions, so I think that whatever overhead a virtual function call had in 1992 is far smaller today.
There doesn't have to be, if you are very careful to avoid it. If you just take the most straightforward approach, using dynamic allocation, virtual functions, and (especially) passing objects by value, then yes there will be inefficiency.
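As a small illustration of the pass-by-value cost mentioned above (the names are arbitrary): the first version copies the whole vector on every call, the second does not.

#include <numeric>
#include <vector>

// Copies the entire vector on every call.
double averageByValue(std::vector<double> samples) {
    if (samples.empty()) return 0.0;
    return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Same computation, no copy.
double averageByRef(const std::vector<double>& samples) {
    if (samples.empty()) return 0.0;
    return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

None of this is inherent to OO; it is simply an easy inefficiency to introduce when objects are handled carelessly.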
It doesn't have to be. The algorithm is all that matters. I agree encapsulation will slow you down a little bit, but compilers are there to optimize.
You would say no if this were a question on a computer science paper.
However, in a real development environment this tends to be true if the OOP paradigm is used correctly. The reason is that in a real development process we generally need to maintain our code base, and that is when the OOP paradigm can help us. One strong point of OOP over structured programming like C is that in OOP it is easier to make the code maintainable. When the code is more maintainable, that means fewer bugs, less time fixing bugs, and less time needed to implement new features. The bottom line is that we then have more time to focus on the efficiency of the application.
The problem is not technical, it is psychological. It is in what it encourages you to do by making it easy.
To make a mundane analogy, it is like a credit card. It is much more efficient than writing checks or using cash. If that is so, why do people get in so much trouble with credit cards? Because they are so easy to use that they abuse them. It takes great discipline not to over-use a good thing.
The way OO gets abused is by
Creating too many "layers of abstraction"
Creating too much redundant data structure
Encouraging the use of notification-style code, attempting to maintain consistency within redundant data structures.
It is better to minimize data structure, and if it must be redundant, be able to tolerate temporary inconsistency.
ADDED:
As an illustration of the kind of thing that OO encourages, here's what I see sometimes in performance tuning: Somebody sets SomeProperty = true;. That sounds innocent enough, right? Well that can ripple to objects that contain that object, often through polymorphism that's hard to trace. That can mean that some list or dictionary somewhere needs to have things added to it or removed from it. That can mean that some tree or list control needs controls added or removed or shuffled. That can mean windows are being created or destroyed. It can also mean some things need to be changed in a database, which might not be local so there's some I/O or mutex locking to be done.
It can really get crazy. But who cares? It's abstract.
There could be: the OO approach tends to be closer to a decoupled approach where different modules don't go poking around inside each other. They are restricted to public interfaces, and there is always a potential cost in that. For example, calling a getter instead of just directly examining a variable; or calling a virtual function by default because the type of an object isn't sufficiently obvious for a direct call.
That said, there are several factors that diminish this as a useful observation.
A well written structured program should have the same modularity (i.e. hiding implementations), and therefore incur the same costs of indirection. The cost of calling a function pointer in C is probably going to be very similar to the cost of calling a virtual function in C++.
Modern JITs, and even the use of inline methods in C++, can remove the indirection cost.
The costs themselves are probably relatively small (typically just a few extra simple operations per function call). This will be insignificant in a program where the real work is done in tight loops.
Finally, a more modular style frees the programmer to tackle more complicated, but hopefully less complex algorithms without the peril of low level bugs.
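As a hedged illustration of the getter/inlining point above: with the definition visible and optimizations enabled, most compilers will typically generate the same code for both accessors below, so the encapsulation costs nothing here (the classes are invented for the example).

// Encapsulated counter with a trivial accessor.
class Counter {
public:
    int value() const { return value_; }   // defined in-class, so implicitly inline
    void increment() { ++value_; }
private:
    int value_ = 0;
};

// "Structured" equivalent with a public field.
struct RawCounter { int value; };

int readViaGetter(const Counter& c)   { return c.value(); }
int readDirectly(const RawCounter& c) { return c.value; }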
Similar to Does TDD mean not thinking about class design?, I am having trouble thinking about where the traditional 'design' stage fits into TDD.
According to the Bowling Game Kata (the 'conversation' version, whose link escapes me at the moment) TDD appears to ignore design decisions made early on (discard the frame object, roll object, etc). I can see in that example it being a good idea to follow the tests and ignore your initial design thoughts, but in bigger projects or ones where you want to leave an opening for expansion / customisation, wouldn't it be better to put things in that you don't have a test for or don't have a need for immediately in order to avoid time-consuming rewrites later?
In short - how much design is too much when doing TDD, and how much should I be following that design as I write tests and the code to pass them (ignoring my design to only worry about passing tests)?
Or am I worrying about nothing, and code written simply to follow tests is not (in practice) difficult to rewrite or refactor if you're painted into a corner?
Alternatively, am I missing the point and that I should be expecting to rewrite portions of the code when I come to test a new section of functionality?
I would base your tests on your initial design. In many ways TDD is a discovery process. You can expect either to confirm your early design choices or to find that there are better choices you can make. Do as much upfront design as you are comfortable with. Some like to fly by the seat of their pants, doing high-level design and using TDD to flesh the design out, while others like to have everything on paper first.
Part of TDD is refactoring.
There is something to be said about 'Designing Big Complex Systems' that should not be associated with TDD - especially when TDD is interpreted as 'Test Driven Design' and not 'Test Driven Development'.
In the context of 'Development', using TDD will ensure you are writing testable code, which gives all the benefits cited for TDD (detecting bugs early, a high code-to-test coverage ratio, easier future refactoring, etc.).
But in 'Designing' large complex systems, TDD does not particularly address the following concerns that are inherent in the architecture of the system
(Engineering for) Performance
Security
Scalability
Availability
(and all other 'ilities')
(i.e. all of the concerns above do not magically 'emerge' through the "write a failing test case first, followed by the working implementation, Refactor - lather, rinse, repeat..." recipe).
For these, you will need to approach the problem by white-boarding the high-level and then low-level details of a system with respect to the constraints imposed by the requirements and the problem space.
Some of the above considerations compete with each other and require careful trade-offs that just don't 'emerge' through writing lots of unit tests.
Once key components and their responsibilities are defined and understood, TDD can be used in the implementation of these components. The process of refactoring and continually reviewing/improving your code will ensure the low-level design details of these components are well-crafted.
I am yet to come across a significantly complex piece of software (e.g. compiler, database, operating system) that was done in a Test Driven Design style. The following blog article talks about this point extremely well (Compilers, TDD, Mastery)
Also, check the following videos on architecture, which add a lot of common sense to the thought process.
Start with a rough design idea, pick a first test and start coding, going green test after test, letting the design emerge, similar or not to the initial design. How much initial design depends on the problem complexity.
One must stay attentive, listening to and sniffing the code to detect refactoring opportunities and code smells.
Strictly following TDD and the SOLID principles will keep the code clean, testable, and flexible, so that it can be easily refactored, leveraging the unit tests as scaffolding to prevent regression.
I've found three ways of doing design with TDD:
Allow the design to emerge naturally as duplication and complexity are removed
Create a perfect design up-front, using mocks combined with the single responsibility principle
Be pragmatic about it.
Pragmatism seems to be the best choice most times, so here's what I do. If I know that a particular pattern will suit my problem very well (for instance, MVC) I'll go straight for the mocks and assume it works. Otherwise, if the design is less clear, I'll allow it to emerge.
The cross-over point at which I feel the need to refactor an emergent design is the point at which it stops being easy to change. If a piece of code isn't perfectly designed, but another dev coming across it could easily refactor it themselves, it's good enough. If the code is becoming so complex that it stops being obvious to another dev, it's time to refactor it.
I like Real Options, and refactoring something to perfection feels to me like committing to the design without any real need to do so. I refactor to "good enough" instead; that way if my design proves itself to be wrong I've not wasted the time. Assume that your design will be wrong if you've never used it before in a similar context.
This also lets me get my code out much more quickly than if it were perfect. Having said that, it was my attempts to make the code perfect that taught me where the line was!
Can anyone help me on design patterns commonly used for RTOS?
In VXworks, which pattern is more preferable?
Can we ignore the second sentence in your question? It is meaningless, and perhaps points to a misunderstanding of design patterns. The first part is interesting however. That said, I would generalise it to cover real-time systems rather than RTOS.
Many of the most familiar patterns are mechanistic, but in real-time systems higher-level architectural patterns are also important.
Bruce Powell Douglass is probably the foremost author on the subject of patterns for real-time systems. If you want a flavour of what he has to say on the subject, then read this article on Embedded.com (it is part three of a series of three; be sure to read the first two as well, since they also touch on the subject: (1) (2)). You could also do worse than to visit Embedded.com and enter "design patterns" into the search box; there are a number of articles on specific patterns and general articles on the subject.
While I think you are being far too specific in requesting patterns for "RTOS (VxWorks)", patterns I have used specifically with VxWorks are the Facade and Adapter patterns, partly to provide an OO API and also to provide a level of RTOS-agnostic abstraction. The resulting classes were then implemented for Segger embOS (to allow us to run a smaller, lower-cost, royalty-free RTOS), and for both Windows and Linux to allow test, debug, and simulation of the code in a richer environment with more powerful tools.
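A hypothetical sketch of what such a Facade/Adapter layer might look like (the class and function names are invented, as is the USE_VXWORKS macro; the VxWorks mutex-semaphore calls semMCreate/semTake/semGive/semDelete are the usual API, but check them against your VxWorks headers):

// RTOS-agnostic interface seen by application code.
class IMutex {
public:
    virtual ~IMutex() = default;
    virtual void lock() = 0;
    virtual void unlock() = 0;
};

#ifdef USE_VXWORKS
#include <semLib.h>
// Adapter over the VxWorks mutex semaphore.
class VxWorksMutex : public IMutex {
public:
    VxWorksMutex() : sem_(semMCreate(SEM_Q_PRIORITY | SEM_INVERSION_SAFE)) {}
    ~VxWorksMutex() override { semDelete(sem_); }
    void lock() override   { semTake(sem_, WAIT_FOREVER); }
    void unlock() override { semGive(sem_); }
private:
    SEM_ID sem_;
};
#else
#include <mutex>
// Adapter used for host-side test/debug/simulation builds.
class StdMutex : public IMutex {
public:
    void lock() override   { mtx_.lock(); }
    void unlock() override { mtx_.unlock(); }
private:
    std::mutex mtx_;
};
#endif

The application code only ever sees IMutex, so the same modules build and run unchanged against the desktop implementation for testing.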
A non-exhaustive list of many patterns is provided on Wikipedia, many of which will be applicable to real-time systems. The listed concurrency patterns are most obviously relevant.
As Mike DeSimone commented, way too generic. However, here are a couple of things to keep in mind for an RTOS (not just VxWorks).
Avoid doing too much in the ISR. If possible, hand some of the processing off to a waiting task (see the sketch after this list).
Keep multithreading optimal. Too much and you have context switching overhead. Too little and your problem solution may be complicated.
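A rough sketch of that deferred-interrupt idea, using the standard VxWorks calls as I remember them (semBCreate, semGive, semTake, taskSpawn); verify the exact signatures against your VxWorks version and BSP before relying on this:

#include <vxWorks.h>
#include <semLib.h>
#include <taskLib.h>

static SEM_ID workReady;

/* ISR: do the bare minimum, then signal the worker task. */
void myIsr(void)
{
    /* acknowledge/latch the hardware event here (device-specific) */
    semGive(workReady);                      /* ISR-safe, never blocks */
}

/* Worker task: runs at task level, so it may block, allocate, log, etc. */
void workerTask(void)
{
    for (;;)
    {
        semTake(workReady, WAIT_FOREVER);    /* sleep until the ISR signals */
        /* ... do the heavy processing the ISR deferred ... */
    }
}

void initDeferredWork(void)
{
    workReady = semBCreate(SEM_Q_FIFO, SEM_EMPTY);
    taskSpawn("tWorker", 90, 0, 4096, (FUNCPTR)workerTask,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    /* attach myIsr to the interrupt vector with intConnect() (not shown) */
}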
Another important aspect is keeping the RTOS predictable and understandable for the user. Typically you see fixed-priority schedulers that do not try to be fair or adaptive, but rather do exactly as told and if you mess up with priorities and starve some task, so be it. Time to complete kernel operations tend to be short and predictable, often documented with their worst-case execution times.