Related
I face a situation where we have many very long methods, 1000 lines or more.
To give you some more detail, we have a list of incoming high level commands, and each generates results in a longer (sometime huge) list of lower level commands. There's a factory creating an instance of a class for each incoming command. Each class has a process method, where all the lower level commands are generated added in sequence. As I said, these sequences of commands and their parameters cause quite often the process methods to reach thousands of lines.
There are a lot of repetitions. Many command patterns are shared between different commands, but the code is repeated over and over. That leads me to think refactoring would be a very good idea.
On the contrary, the specs we have come exactly in the same form as the current code. Very long list of commands for each incoming one. When I've tried some refactoring, I've started to feel uncomfortable with the specs. I miss the obvious analogy between the specs and code, and lose time digging into newly created common classes.
Then here the question: in general, do you think such very long methods would always need refactoring, or in a similar case it would be acceptable?
(unfortunately refactoring the specs is not an option)
edit:
I have removed every reference to "generate" cause it was actually confusing. It's not auto generated code.
class InCmd001 {
OutMsg process ( InMsg& inMsg ) {
OutMsg outMsg = OutMsg::Create();
OutCmd001 outCmd001 = OutCmd001::Create();
outCmd001.SetA( param.getA() );
outCmd001.SetB( inMsg.getB() );
outMsg.addCmd( outCmd001 );
OutCmd016 outCmd016 = OutCmd016::Create();
outCmd016.SetF( param.getF() );
outMsg.addCmd( outCmd016 );
OutCmd007 outCmd007 = OutCmd007::Create();
outCmd007.SetR( inMsg.getR() );
outMsg.addCmd( outCmd007 );
// ......
return outMsg;
}
}
here the example of one incoming command class (manually written in pseudo c++)
Code never needs refactoring. The code either works, or it doesn't. And if it works, the code doesn't need anything.
The need for refactoring comes from you, the programmer. The person reading, writing, maintaining and extending the code.
If you have trouble understanding the code, it needs to be refactored. If you would be more productive by cleaning up and refactoring the code, it needs to be refactored.
In general, I'd say it's a good idea for your own sake to refactor 1000+ line functions. But you're not doing it because the code needs it. You're doing it because that makes it easier for you to understand the code, test its correctness, and add new functionality.
On the other hand, if the code is automatically generated by another tool, you'll never need to read it or edit it. So what'd be the point in refactoring it?
I understand exactly where you're coming from, and can see exactly why you've structured your code the way it is, but it needs to change.
The uncertainty you feel when you attempt to refactor can be ameliorated by writing unit tests. If you've tests specific to each spec, then the code for each spec can be refactored until you're blue in the face, and you can have confidence in it.
A second option, is it possible to automatically generate your code from a data structure?
If you've a core suite of classes that do the donkey work and edge cases, you can auto-generate the repetitive 1000 line methods as often as you wish.
However, there are exceptions to every rule.
If the methods are a literal interpretation of the spec (very little additional logic), and the specs change infrequently, and the "common" portions (i.e. bits that happen to be the same right now) of the specs change at different times, and you're not going to be asked to get a 10x performance gain out of the code anytime soon, then (and only then) . . . you may be better off with what you have.
. . . but on the whole, refactor.
Yes, always. 1000 lines is at least 10x longer than any function should ever be, and I'm tempted to say 100x, except that when dealing with input parsing and validation it can become natural to write functions with 20 or so lines.
Edit: Just re-read your question and I'm not clear on one point - are you talking about machine generated code that no-one has to touch? In which case I would leave things as they are.
Refectoring is not the same as writing from scratch. While you should never write code like this, before you refactor it, you need to consider the costs of refactoring in terms of time spent, the associated risks in terms of breaking code that already works, and the net benefits in terms of future time saved. Refactor only if the net benefits outweigh the associated costs and risks.
Sometimes wrapping and rewriting can be a safer and more cost effective solution, even if it appears expensive at first glance.
Long methods need refactoring if they are maintained (and thus need to be understood) by humans.
As a rule of thumb, code for humans first. I don't agree with the common idea that functions need to be short. I think what you need to aim at is when a human reads your code they grok it quickly.
To this effect it's a good idea to simplify things as much as possible--but not more than that. It's a good idea to delegate roughly one task for each function. There is no rule as for what "roughly one task" means: you'll have to use your own judgement for that. But do recognize that a function split into too many other functions itself reduces readability. Think about the human being who reads your function for the first time: they would have to follow one function call after another, constantly context-switching and maintaining a stack in their mind. This is a task for machines, not for humans.
Find the balance.
Here, you see how important naming things is. You will see it is not that easy to choose names for variables and functions, it takes time, but on the other hand it can save a lot of confusion on the human reader's side. Again, find the balance between saving your time and the time of the friendly humans who will follow you.
As for repetition, it's a bad idea. It's something that needs to be fixed, just like a memory leak. It's a ticking bomb.
As others have said before me, changing code can be expensive. You need to do the thinking as for whether it will pay off to spend all this time and effort, facing the risks of change, for a better code. You will possibly lose lots of time and make yourself one headache after another now, in order to possibly save lots of time and headache later.
Take a look at the related question How many lines of code is too many?. There are quite a few tidbits of wisdom throughout the answers there.
To repost a quote (although I'll attempt to comment on it a little more here)... A while back, I read this passage from Ovid's journal:
I recently wrote some code for
Class::Sniff which would detect "long
methods" and report them as a code
smell. I even wrote a blog post about
how I did this (quelle surprise, eh?).
That's when Ben Tilly asked an
embarrassingly obvious question: how
do I know that long methods are a code
smell?
I threw out the usual justifications,
but he wouldn't let up. He wanted
information and he cited the excellent
book Code Complete as a
counter-argument. I got down my copy
of this book and started reading "How
Long Should A Routine Be" (page 175,
second edition). The author, Steve
McConnell, argues that routines should
not be longer than 200 lines. Holy
crud! That's waaaaaay to long. If a
routine is longer than about 20 or 30
lines, I reckon it's time to break it
up.
Regrettably, McConnell has the cheek
to cite six separate studies, all of
which found that longer routines were
not only not correlated with a greater
defect rate, but were also often
cheaper to develop and easier to
comprehend. As a result, the latest
version of Class::Sniff on github now
documents that longer routines may not
be a code smell after all. Ben was
right. I was wrong.
(The rest of the post, on TDD, is worth reading as well.)
Coming from the "shorter methods are better" camp, this gave me a lot to think about.
Previously my large methods were generally limited to "I need inlining here, and the compiler is being uncooperative", or "for one reason or another the giant switch block really does run faster than the dispatch table", or "this stuff is only called exactly in sequence and I really really don't want function call overhead here". All relatively rare cases.
In your situation, though, I'd have a large bias toward not touching things: refactoring carries some inherent risk, and it may currently outweigh the reward. (Disclaimer: I'm slightly paranoid; I'm usually the guy who ends up fixing the crashes.)
Consider spending your efforts on tests, asserts, or documentation that can strengthen the existing code and tilt the risk/reward scale before any attempt to refactor: invariant checks, bound function analysis, and pre/postcondition tests; any other useful concepts from DBC; maybe even a parallel implementation in another language (maybe something message oriented like Erlang would give you a better perspective, given your code sample) or even some sort of formal logical representation of the spec you're trying to follow if you have some time to burn.
Any of these kinds of efforts generally have a few results, even if you don't get to refactor the code: you learn something, you increase your (and your organization's) understanding of and ability to use the code and specifications, you might find a few holes that really do need to be filled now, and you become more confident in your ability to make a change with less chance of disastrous consequences.
As you gain a better understanding of the problem domain, you may find that there are different ways to refactor you hadn't thought of previously.
This isn't to say "thou shalt have a full-coverage test suite, and DBC asserts, and a formal logical spec". It's just that you are in a typically imperfect situation, and diversifying a bit -- looking for novel ways to approach the problems you find (maintainability? fuzzy spec? ease of learning the system?) -- may give you a small bit of forward progress and some increased confidence, after which you can take larger steps.
So think less from the "too many lines is a problem" perspective and more from the "this might be a code smell, what problems is it going to cause for us, and is there anything easy and/or rewarding we can do about it?"
Leaving it cooking on the backburner for a bit -- coming back and revisiting it as time and coincidence allows (e.g. "I'm working near the code today, maybe I'll wander over and see if I can't document the assumptions a bit better...") may produce good results. Then again, getting royally ticked off and deciding something must be done about the situation is also effective.
Have I managed to be wishy-washy enough here? My point, I think, is that the code smells, the patterns/antipatterns, the best practices, etc -- they're there to serve you. Experiment to get used to them, and then take what makes sense for your current situation, and leave the rest.
I think you first need to "refactor" the specs. If there are repetitions in the spec it also will become easier to read, if it makes use of some "basic building blocks".
Edit: As long as you cannot refactor the specs, I wouldn't change the code.
Coding style guides are all made for easier code maintenance, but in your special case the ease of maintenance is achieved by following the spec.
Some people here asked if the code is generated. In my opinion it does not matter: If the code follows the spec "line by line" it makes no difference if the code is generated or hand-written.
1000 thousand lines of code is nothing. We have functions that are 6 to 12 thousand lines long. Of course those functions are so big, that literally things get lost in there, and no tool can help us even look at high level abstractions of them. the code is now unfortunately incomprehensible.
My opinion of functions that are that big, is that they were not written by brilliant programmers but by incompetent hacks who shouldn't be left anywhere near a computer - but should be fired and left flipping burgers at McDonald's. Such code wreaks havok by leaving behind features that cannot be added to or improved upon. (too bad for the customer). The code is so brittle that it cannot be modified by anyone - even the original authors.
And yes, those methods should be refactored, or thrown away.
Do you ever have to read or maintain the generated code?
If yes, then I'd think some refactoring might be in order.
If no, then the higher-level language is really the language you're working with -- the C++ is just an intermediate representation on the way to the compiler -- and refactoring might not be necessary.
Looks to me that you've implemented a separate language within your application - have you considered going that way?
It has been my understanding that it's recommended that any method over 100 lines of code be refactored.
I think some rules may be a little different in his era when code is most commonly viewed in an IDE. If the code does not contain exploitable repetition, such that there are 1,000 lines which are going to be referenced once each, and which share a significant number of variables in a clear fashion, dividing the code into 100-line routines each of which is called once may not be that much of an improvement over having a well-formatted 1,000-line module which includes #region tags or the equivalent to allow outline-style viewing.
My philosophy is that certain layouts of code generally imply certain things. To my mind, when a piece of code is placed into its own routine, that suggests that the code will be usable in more than one context (exception: callback handlers and the like in languages which don't support anonymous methods). If code segment #1 leaves an object in an obscure state which is only usable by code segment #2, and code segment #2 is only usable on a data object which is left in the state created by #1, then absent some compelling reason to put the segments in different routines, they should appear in the same routine. If a program puts objects through a chain of obscure states extending for many hundreds of lines of code, it might be good to rework the design of the code to subdivide the operation into smaller pieces which have more "natural" pre- and post- conditions, but absent some compelling reason to do so, I would not favor splitting up the code without changing the design.
For further reading, I highly recommend the long, insightful, entertaining, and sometimes bitter discussion of this topic over on the Portland Pattern Repository.
I've seen cases where it is not the case (for example, creating an Excel spreadsheet in .Net often requires a lot of line of code for the formating of the sheet), but most of the time, the best thing would be to indeed refactor it.
I personally try to make a function small enough so it all appears on my screen (without affecting the readability of course).
1000 lines? Definitely they need to be refactored. Also not that, for example, default maximum number of executable statements is 30 in Checkstyle, well-known coding standard checker.
If you refactor, when you refactor, add some comments to explain what the heck it's doing.
If it had comments, it would be much less likely a candidate for refactoring, because it would already be easier to read and follow for someone starting from scratch.
Then here the question: in general, do
you think such very long methods would
always need refactoring,
if you ask in general, we will say Yes .
or in a
similar case it would be acceptable?
(unfortunately refactoring the specs
is not an option)
Sometimes are acceptable, but is very unusual, I will give you a pair of examples:
There are some 8 bit microcontrollers called Microchip PIC, that have only a fixed 8 level stack, so you can't nest more than 8 calls, then care must be taken to avoid "stack overflow", so in this special case having many small function (nested) is not the best way to go.
Other example is when doing optimization of code (at very low level) so you have to take account the jump and context saving cost. Use it with care.
EDIT:
Even in generated code, you could need to refactorize the way its generated, for example for memory saving, energy saving, generate human readable, beauty, who knows, etc..
There has been very good general advise, here a practical recommendation for your sample:
common patterns can be isolated in plain feeder methods:
void AddSimpleTransform(OutMsg & msg, InMsg const & inMsg,
int rotateBy, int foldBy, int gonkBy = 0)
{
// create & add up to three messages
}
You might even improve that by making this a member of OutMsg, and using a fluent interface, such that you can write
OutMsg msg;
msg.AddSimpleTransform(inMsg, 12, 17)
.Staple("print")
.AddArtificialRust(0.02);
which can be an additional improvement under circumstances.
Code coverage is propably the most controversial code metric. Some say, you have to reach 80% code coverage, other say, it's superficial and does not say anything about your testing quality. (See Jon Limjap's good answer on "What is a reasonable code coverage % for unit tests (and why)?".)
People tend to measure everything. They need comparisons, benchmarks etc.
Project teams need a pointer, how good their testing is.
So what are alternatives to code coverage? What would be a good metric that says more than "I touched this line of code"?
Are there real alternatives?
If you are looking for some useful metrics that tell you about the quality (or lack there of) of your code, you should look into the following metrics:
Cyclomatic Complexity
This is a measure of how complex a method is.
Usually 10 and lower is good, 11-25 is poor, higher is terrible.
Nesting Depth
This is a measure of how many nested scopes are in a method.
Usually 4 and lower is good, 5-8 is poor, higher is terrible.
Relational Cohesion
This is a measure of how well related the types in a package or assembly are.
Relational cohesion is somewhat of a relative metric, but useful none the less.
Acceptable levels depends on the formula. Given the following:
R: number of relationships in package/assembly
N: number of types in package/assembly
H: Cohesion of relationship between types
Formula: H = (R+1)/N
Given the above formula, acceptable range is 1.5 - 4.0
Lack of Cohesion of Methods (LCOM)
This is a measure of how cohesive a class is.
Cohesion of a class is a measure of how many fields each method references.
Good indication of whether your class meets the Principal of Single Responsibility.
Formula: LCOM = 1 - (sum(MF)/M*F)
M: number of methods in class
F: number of instance fields in class
MF: number of methods in class accessing a particular instance field
sum(MF): the sum of MF over all instance fields
A class that is totally cohesive will have an LCOM of 0.
A class that is completely non-cohesive will have an LCOM of 1.
The closer to 0 you approach, the more cohesive, and maintainable, your class.
These are just some of the key metrics that NDepend, a .NET metrics and dependency mapping utility, can provide for you. I recently did a lot of work with code metrics, and these 4 metrics are the core key metrics that we have found to be most useful. NDepend offers several other useful metrics, however, including Efferent & Afferent coupling and Abstractness & Instability, which combined provide a good measure of how maintainable your code will be (and whether or not your in what NDepend calls the Zone of Pain or the Zone of Uselessness.)
Even if you are not working with the .NET platform, I recommend taking a look at the NDepend metrics page. There is a lot of useful information there that you might be able to use to calculate these metrics on whatever platform you develop on.
Crap4j is one fairly good metrics that I'm aware of...
Its a Java implementation of the Change Risk Analysis and Predictions software metric which combines cyclomatic complexity and code coverage from automated tests.
Bug metrics are also important:
Number of bugs coming in
Number of bugs resolved
To detect for instance if bugs are not resolved as fast as new come in.
What about watching the trend of code coverage during your project?
As it is the case with many other metrics a single number does not say very much.
For example it is hard to tell wether there is a problem if "we have a Checkstyle rules compliance of 78.765432%". If yesterday's compliance was 100%, we are definitely in trouble. If it was 50% yesterday, we are probably doing a good job.
I alway get nervous when code coverage has gotten lower and lower over time. There are cases when this is okay, so you cannot turn off your head when looking at charts and numbers.
BTW, sonar (http://sonar.codehaus.org/) is a great tool for watching trends.
Using code coverage on it's own is mostly pointless, it gives you only insight if you are looking for unnecessary code.
Using it together with unit-tests and aiming for 100% coverage will tell you that all the 'tested' parts (assumed it was all successfully too) work as specified in the unit-test.
Writing unit-tests from a technical design/functional design, having 100% coverage and 100% successful tests will tell you that the program is working like described in the documentation.
Now the only thing you need is good documentation, especially the functional design, a programmer should not write that unless (s)he is an expert of that specific field.
Scenario coverage.
I don't think you really want to have 100% code coverage. Testing say, simple getters and setters looks like a waste of time.
The code always runs in some context, so you may list as many scenarios as you can (depending on the problem complexity sometimes even all of them) and test them.
Example:
// parses a line from .ini configuration file
// e.g. in the form of name=value1,value2
List parseConfig(string setting)
{
(name, values) = split_string_to_name_and_values(setting, '=')
values_list = split_values(values, ',')
return values_list
}
Now, you have many scenarios to test. Some of them:
Passing correct value
List item
Passing null
Passing empty string
Passing ill-formated parameter
Passing string with with leading or ending comma e.g. name=value1, or name=,value2
Running just first test may give you (depending on the code) 100% code coverage. But you haven't considered all the posibilities, so that metric by itself doesn't tell you much.
Code Coverage is just an indicator and helps pointing out lines which are not executed at all in your tests, which is quite interesting. If you reach 80% code coverage or so, it starts making sense to look at the remaining 20% of lines to identify if you are missing some use case. If you see "aha, this line gets executed if I pass an empty vector" then you can actually write a test which passes an empty vector.
As an alternative I can think of, if you have a specs document with Use Cases and Functional Requirements, you should map the unit tests to them and see how many UC are covered by FR (of course it should be 100%) and how many FR are covered by UT (again, it should be 100%).
If you don't have specs, who cares? Anything that happens will be ok :)
How about (lines of code)/(number of test cases)? Not extremely meaningful (since it depends on LOC), but at least it's easy to calculate.
Another one could be (number of test cases)/(number of methods).
I wrote a blog post about why High Test Coverage Ratio is a Good Thing Anyway.
I agree that: when a portion of code is executed by tests, it doesn’t mean that the validity of the results produced by this portion of code is verified by tests.
But still, if you are heavily using contracts to check states validity during tests execution, high test coverage will mean a lot of verification anyway.
The value in code coverage is it gives you some idea of what has been exercised by tests.
The phrase "code coverage" is often used to mean statement coverage, e.g., "how much of my code (in lines) has been executed", but in fact there are over a hundred varieties of "coverage". These other versions of coverage try to provide a more sophisticated view what it means to exercise code.
For example, condition coverage measures how many of the separate elements of conditional expressions have been exercised. This is different than statement coverage. MC/DC
"modified condition/decision coverage" determines whether the elements of all conditional expressions have all been demonstrated to control the outcome of the conditional, and is required by the FAA for aircraft software. Path coverage meaures how many of the possible execution paths through your code have been exercised. This is a better measure than statement coverage, in that paths essentially represent different cases in the code. Which of these measures is best to use depends on how concerned you are about the effectiveness of your tests.
Wikipedia discusses many variations of test coverage reasonably well.
http://en.wikipedia.org/wiki/Code_coverage
As a rule of thumb, defect injection rates proportionally trail code yield and they both typically follow a Rayleigh distribution curve.
At some point your defect detection rate will peak and then start to diminish.
This apex represents 40% of discovered defects.
Moving forward with simple regression analysis you can estimate how many defects remain in your product at any point following the peak.
This is one component of Lawrence Putnam's model.
This hasn't been mentioned, but the amount of change in a given file of code or method (by looking at version control history) is interesting particularly when you're building up a test suite for poorly tested code. Focus your testing on the parts of the code you change a lot. Leave the ones you don't for later.
Watch out for a reversal of cause and effect. You might avoid changing untested code and you might tend to change tested code more.
SQLite is an extremely well-tested library, and you can extract all kinds of metrics from it.
As of version 3.6.14 (all statistics in the report are against that release of SQLite), the SQLite library consists of approximately 63.2 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 715 times as much test code and test scripts - 45261.5 KSLOC.
In the end, what always strikes me as the most significant is none of those possible metrics seem to be as important as the simple statement, "it meets all the requirements." (So don't lose sight of that goal in the process of achieving it.)
If you want something to judge a team's progress, then you have to lay down individual requirements. This gives you something to point to and say "this one's done, this one isn't". It's not linear (solving each requirement will require varying work), and the only way you can linearize it is if the problem has already been solved elsewhere (and thus you can quantize work per requirement).
I like revenue, sales numbers, profit. They are pretty good metrics of a code base.
Probably not only measuring the code covered (touched) by the unit tests but how good the assertions are.
One metric easy to implement is to measure the size of the Assert.AreEqual
You can create your own Assert implementation calling Assert.AreEqual and measuring the size of the object passed as second parameter.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I like my code being in order, i.e. properly formatted, readable, designed, tested, checked for bugs, etc. In fact I am fanatic about it. (Maybe even more than fanatic...) But in my experience actions helping code quality are hardly implemented. (By code quality I mean the quality of the code you produce day to day. The whole topic of software quality with development processes and such is much broader and not the scope of this question.)
Code quality does not seem popular. Some examples from my experience include
Probably every Java developer knows JUnit, almost all languages implement xUnit frameworks, but in all companies I know, only very few proper unit tests existed (if at all). I know that it's not always possible to write unit tests due to technical limitations or pressing deadlines, but in the cases I saw, unit testing would have been an option. If a developer wanted to write some tests for his/her new code, he/she could do so. My conclusion is that developers do not want to write tests.
Static code analysis is often played around in small projects, but not really used to enforce coding conventions or find possible errors in enterprise projects. Usually even compiler warnings like potential null pointer access are ignored.
Conference speakers and magazines would talk a lot about EJB3.1, OSGI, Cloud and other new technologies, but hardly about new testing technologies or tools, new static code analysis approaches (e.g. SAT solving), development processes helping to maintain higher quality, how some nasty beast of legacy code was brought under test, ... (I did not attend many conferences and it probably looks different for conferences on agile topics, as unit testing and CI and such has a higher value there.)
So why is code quality so unpopular/considered boring?
EDIT:
Thank your for your answers. Most of them concern unit testing (and has been discussed in a related question). But there are lots of other things that can be used to keep code quality high (see related question). Even if you are not able to use unit tests, you could use a daily build, add some static code analysis to your IDE or development process, try pair programming or enforce reviews of critical code.
One obvious answer for the Stack Overflow part is that it isn't a forum. It is a database of questions and answers, which means that duplicate questions are attempted avoided.
How many different questions about code quality can you think of? That is why there aren't 50,000 questions about "code quality".
Apart from that, anyone claiming that conference speakers don't want to talk about unit testing or code quality clearly needs to go to more conferences.
I've also seen more than enough articles about continuous integration.
There are the common excuses for not
writing tests, but they are only
excuses. If one wants to write some
tests for his/her new code, then it is
possible
Oh really? Even if your boss says "I won't pay you for wasting time on unit tests"?
Even if you're working on some embedded platform with no unit testing frameworks?
Even if you're working under a tight deadline, trying to hit some short-term goal, even at the cost of long-term code quality?
No. It is not "always possible" to write unit tests. There are many many common obstacles to it. That's not to say we shouldn't try to write more and better tests. Just that sometimes, we don't get the opportunity.
Personally, I get tired of "code quality" discussions because they tend to
be too concerned with hypothetical examples, and are far too often the brainchild of some individual, who really hasn't considered how aplicable it is to other people's projects, or codebases of different sizes than the one he's working on,
tend to get too emotional, and imbue our code with too many human traits (think of the term "code smell", for a good example),
be dominated by people who write horrible bloated, overcomplicated and verbose code with far too many layers of abstraction, or who'll judge whether code is reusable by "it looks like I can just take this chunk of code and use it in a future project", rather than the much more meaningful "I have actually been able to take this chunk of code and reuse it in different projects".
I'm certainly interested in writing high quality code. I just tend to be turned off by the people who usually talk about code quality.
Code review is not an exact science. Metrics used are somehow debatable. Somewhere on that page : "You can't control what you can't measure"
Suppose that you have one huge function of 5000 lines with 35 parameters. You can unit test it how much you want, it might do exactly what it is supposed to do. Whatever the inputs are. So based on unit testing, this function is "perfect". Besides correctness, there are tons of others quality attributes you might want to measure. Performance, scalability, maintainability, usability and such. Did you ever wondered why software maintenance is such a nightmare?
Real software projects quality control goes far beyond simply checking if the code is correct. If you check the V-Model of software development, you'll notice that coding is only a small part of the whole equation.
Software quality control can go to as far as 60% of the whole cost of your project. This is huge. Instead, people prefer to cut to 0% and go home thinking they made the right choice. I think the real reason why so little time is dedicated to software quality is because software quality isn't well understood.
What is there to measure?
How do we measure it?
Who will measure it?
What will I gain/lose from measuring it?
Lots of coder sweatshops do not realise the relation between "less bugs now" and "more profit later". Instead, all they see is "time wasted now" and "less profit now". Even when shown pretty graphics demonstrating the opposite.
Moreover, software quality control and software engineering as a whole is a relatively new discipline. A lot of the programming space so far has been taken by cyber cowboys. How many times have you heard that "anyone" can program? Anyone can write code that's for sure, but it's not everyone who can be a programmer.
EDIT *
I've come across this paper (PDF) which is from the guy who said "You can't control what you can't measure". Basically he's saying that controlling everything is not as desirable as he first thought it would be. It is not an exact cooking recipe that you can blindly apply to all projects like the software engineering schools want to make you think. He just adds another parameter to control which is "Do I want to control this project? Will it be needed?"
Laziness / Considered boring
Management feeling it's unnecessary -
Ignorant "Just do it right" attitude.
"This small project doesn't need code
quality management" turns into "Now
it would be too costly to implement
code quality management on this large
project"
I disagree that it's dull though. A solid unit testing design makes creating tests a breeze and running them even more fun.
Calculating vector flow control - PASSED
Assigning flux capacitor variance level - PASSED
Rerouting superconductors for faster dialing sequence - PASSED
Running Firefly hull checks - PASSED
Unit tests complete. 4/4 PASSED.
Like anything it can get boring if you do too much of it but spending 10 or 20 minutes writing some random tests for some complex functions after several hours of coding isn't going to suck the creative life from you.
Why is code quality so unpopular?
Because our profession is unprofessional.
However, there are people who do care about code quality. You can find such-minded people for example from the Software Craftsmanship movement's discussion group. But unfortunately the majority of people in software business do not understand the value of code quality, or do not even know what makes up good code.
I guess the answer is the same as to the question 'Why is code quality not popular?'
I believe the top reasons are:
Laziness of the developers. Why invest time in preparing unit tests, review the solution, if it's already implemented?
Improper management. Why ask the developers to cope with code quality, if there are thousands of new feature requests and the programmers could simply implement something instead of taking care of quality of something already implemented.
Short answer: It's one of those intangibles only appreciated by other, mainly experienced, developers and engineers unless something goes wrong. At which point managers and customers are in an uproar and demand why formal processes weren't in place.
Longer answer: This short-sighted approach isn't limited to software development. The American automotive industry (or what's left of it) is probably the best example of this.
It's also harder to justify formal engineering processes when projects start their life as one-off or throw-away. Of course, long after the project is done, it takes a life of its own (and becomes prominent) as different business units start depending on it for their own business process.
At which point a new solution needs to be engineered; but without practice in using these tools and good-practices, these tools are less than useless. They become a time-consuming hindrance. I see this situation all too often in companies where IT teams are support to the business, where development is often reactionary rather than proactive.
Edit: Of course, these bad habits and many others are the real reason consulting firms like Thought Works can continue to thrive as well as they do.
One big factor that I didn't see mentioned yet is that any process improvement (unit testing, continuos integration, code reviews, whatever) needs to have an advocate within the organization who is committed to the technology, has the appropriate clout within the organization, and is willing to do the work to convince others of the value.
For example, I've seen exactly one engineering organization where code review was taken truly seriously. That company had a VP of Software who was a true believer, and he'd sit in on code reviews to make sure they were getting done properly. They incidentally had the best productivity and quality of any team I've worked with.
Another example is when I implemented a unit-testing solution at another company. At first, nobody used it, despite management insistence. But several of us made a real effort to talk up unit testing, and to provide as much help as possible for anyone who wanted to start unit testing. Eventually, a couple of the most well-respected developers signed on, once they started to see the advantages of unit testing. After that, our testing coverage improved dramatically.
I just thought of another factor - some tools take a significant amount of time to get started with, and that startup time can be hard to come by. Static analysis tools can be terrible this way - you run the tool, and it reports 2,000 "problems", most of which are innocuous. Once you get the tool configured properly, the false-positive problem get substantially reduced, but someone has to take that time, and be committed to maintaining the tool configuration over time.
Probably every Java developer knows JUnit...
While I believe most or many developers have heard of JUnit/nUnit/other testing frameworks, fewer know how to write a test using such a framework. And from those, very few have a good understanding of how to make testing a part of the solution.
I've known about unit testing and unit test frameworks for at least 7 years. I tried using it in a small project 5-6 years ago, but it is only in the last few years that I've learned how to do it right. (ie. found a way that works for me and my team...)
For me some of those things were:
Finding a workflow that accomodates unit testing.
Integrating unit testing in my IDE, and having shortcuts to run/debug tests.
Learning how to test what. (Like how to test logging in or accessing files. How to abstract yourself from the database. How to do mocking and use a mocking framework. Learn techniques and patterns that increase testability.)
Having some tests is better than having no tests at all.
More tests can be written later when a bug is discovered. Write the test that proves the bug, then fix the bug.
You'll have to practice to get good at it.
So until finding the right way; yeah, it's dull, non rewarding, hard to do, time consuming, etc.
EDIT:
In this blogpost I go in depth on some of the reasons given here against unit testing.
Code Quality is unpopular? Let me dispute that fact.
Conferences such as Agile 2009 have a plethora of presentations on Continuous Integration, and testing techniques and tools. Technical conference such as Devoxx and Jazoon also have their fair share of those subjects.
There is even a whole conference dedicated to Continuous Integration & Testing (CITCON, which takes place 3 times a year on 3 continents).
In fact, my personal feeling is that those talks are so common, that they are on the verge of being totally boring to me.
And in my experience as a consultant, consulting on code quality techniques & tools is actually quite easy to sell (though not very highly paid).
That said, though I think that Code Quality is a popular subject to discuss, I would rather agree with the fact that developers do not (in general) do good, or enough, tests. I do have a reasonably simple explanation to that fact.
Essentially, it boils down to the fact that those techniques are still reasonably new (TDD is 15 years old, CI less than 10) and they have to compete with 1) managers, 2) developers whose ways "have worked well enough so far" (whatever that means).
In the words of Geoffrey Moore, modern Code Quality techniques are still early in the adoption curve. It will take time until the entire industry adopts them.
The good news, however, is that I now meet developers fresh from university that have been taught TDD and are truly interested in it. That is a recent development. Once enough of those have arrived on the market, the industry will have no choice but to change.
It's pretty simple when you consider the engineering adage "Good, Fast, Cheap: pick two". In my experience 98% of the time, it's Fast and Cheap, and by necessity the other must suffer.
It's the basic psychology of pain. When you'ew running to meet a deadline code quality takes the last seat. We hate it because it's dull and boring.
It reminds me of this Monty Python skit:
"Exciting? No it's not. It's dull. Dull. Dull. My God it's dull, it's so desperately dull and tedious and stuffy and boring and des-per-ate-ly DULL. "
I'd say for many reasons.
First of all, if the application/project is small or carries no really important data at a large scale the time needed to write the tests is better used to write the actual application.
There is a threshold where the quality requirements are of such a level that unit testing is required.
There is also the problem of many methods not being easily testable. They may rely on data in a database or similar, which creates the headache of setting up mockup data to be fed to the methods. Even if you set up mockup data - can you be certain the database would behave the same way?
Unit testing is also weak at finding problems that haven't been considered. That is, unit testing is bad at simulating the unexpected. If you haven't considered what could happen in a power outage, if the network link sends bad data that is still CRC correct. Writing tests for this is futile.
I am all in favour of code inspections as they let programmers share experience and code style from other programmers.
"There are the common excuses for not writing tests, but they are only excuses."
Are they? Get eight programmers in a room together, ask them a question about how best to maintain code quality, and you're going to get nine different answers, depending on their age, education and preferences. 1970s era Computer Scientists would've laughed at the notion of unit testing; I'm not sure they would've been wrong to.
Management needs to be sold on the value of spending more time now to save time down the road. Since they can't actually measure "bugs not fixed", they're often more concerned about meeting their immediate deadlines & ship date than the longterm quality off the project.
Code quality is subjective. Subjective topics are always tedious.
Since the goal is simply to make something that works, code quality always comes in second. It adds time and cost. (I'm not saying that it should not be considered a good thing though.)
99% of the time, there are no third party consquences for poor code quality (unless you're making spaceshuttle or train switching software).
Does it work? = Concrete.
Is it pretty? = In the eye of the beholder.
Read Fred Brooks' The Mythical Man Month. There is no silver bullet.
Unit Testing takes extra work. If a programmer sees that his product "works" (eg, no unit testing), why do any at all? Especially when it is not nearly as interesting as implementing the next feature in the program, etc. Most people just tend to be lazy when it comes down to it, which isn't quite a good thing...
Code quality is context specific and hard to generalize no matter how much effort people try to make it so.
It's similar to the difference between theory and application.
I also have not seen unit tests written on a regular basis. The reason for that was given as the code being too extensively changed at the beginning of the project so everyone dropped writing unit tests until everything got stabilized. After that everyone was happy and not in need of unit tests. So we have a few tests stay there as a history but they are not used and are probably not compatible with the current code.
I personally see writing unit tests for big projects as not feasible, although I admit I have not tried it nor talked to people who did. There are so many rules in business logic that if you just change something somewhere a little bit you have no way of knowing which tests to update beyond those that will crash. Who knows, the old tests may now not cover all possibilities and it takes time to recollect what was written five years ago.
The other reason being the lack of time. When you have a task assigned where it says "Completion time: O,5 man/days", you only have time to implement it and shallow test it, not to think of all possible cases and relations to other project parts and write all the necessary tests. It may really take 0,5 days to implement something and a couple of weeks to write the tests. Unless you were specifically given an order to create the tests, nobody will understand that tremendous loss of time, which will result in yelling/bad reviews. And no, for our complex enterprise application I cannot think of a good test coverage for a task in five minutes. It will take time and probably a very deep knowledge of most application modules.
So, the reasons as I see them is time loss which yields no useful features and the nightmare to maintain/update old tests to reflect new business rules. Even if one wanted to, only experienced colleagues could write those tests - at least one year deep involvement in the project, but two-three is really needed. So new colleagues do not manage proper tests. And there is no point in creating bad tests.
It's 'dull' to catch some random 'feature' with extreme importance for more than a day in mysterious code jungle wrote by someone else x years ago without any clue what's going wrong, why it's going wrong and with absolutely no ideas what could fix it when it was supposed to end in a few hours. And when it's done, no one is satisfied cause of huge delay.
Been there - seen that.
A lot of the concepts that are emphasized in modern writing on code quality overlook the primary metric for code quality: code has to be functional first and foremost. Everything else is just a means to that end.
Some people don't feel like they have time to learn the latest fad in software engineering, and that they can write high-quality code already. I'm not in a place to judge them, but in my opinion it's very difficult for your code to be used over long periods of time if people can't read, understand and change it.
Lack of 'code quality' doesn't cost the user, the salesman, the architect nor the developer of the code; it slows down the next iteration, but I can think of several successful products which seem to be made out of hair and mud.
I find unit testing to make me more productive, but I've seen lots of badly formatted, unreadable poorly designed code which passed all its tests ( generally long-in-the-tooth code which had been patched many times ). By passing tests you get a road-worthy Skoda, not the craftsmanship of a Bristol. But if you have 'low code quality' and pass your tests and consistently fulfill the user's requirements, then that's a valid business model.
My conclusion is that developers do not want to write tests.
I'm not sure. Partly, the whole education process in software isn't test driven, and probably should be - instead of asking for an exercise to be handed in, give the unit tests to the students. It's normal in maths questions to run a check, why not in software engineering?
The other thing is that unit testing requires units. Some developers find modularisation and encapsulation difficult to do well. A good technical lead will create a modular architecture which localizes the scope of a unit, so making it easy to test in isolation; many systems don't have good architects who facilitate testability, or aren't refactored regularly enough to reduce inter-unit coupling.
It's also hard to test distributed or GUI driven applications, due to inherent coupling. I've only been in one team that did that well, and that had as large a test department as a development department.
Static code analysis is often played around in small projects, but not really used to enforce coding conventions or find possible errors in enterprise projects.
Every set of coding conventions I've seen which hasn't been automated has been logically inconsistent, sometimes to the point of being unusable - even ones claimed to have been used 'successfully' in several projects. Non-automatic coding standards seem to be political rather than technical documents.
Usually even compiler warnings like potential null pointer access are ignored.
I've never worked in a shop where compiler warnings were tolerated.
One attitude that I have met rather often (but never from programmers that were already quality-addicts) is that writing unit tests just forces you to write more code without getting any extra functionality for the effort. And they think that that time would be better spent adding functionality to the product instead of just creating "meta code".
That attitude usually wears off as unit tests catch more and more bugs that you realize would be serious and hard to locate in a production environment.
A lot of it arises when programmers forget, or are naive, and act like their code won't be viewed by somebody else at a later date (or themselves months/years down the line).
Also, commenting isn't near as "cool" as actually writing a slick piece of code.
Another thing that several people have touched on is that most development engineers are terrible testers. They don't have the expertise or mind-set to effectively test their own code. This means that unit testing doesn't seem very valuable to them - since all of their code always passes unit tests, why bother writing them?
Education and mentoring can help with that, as can test-driven development. If you write the tests first, you're at least thinking primarily about testing, rather than trying to get the tests done, so you can commit the code...
The likelyhood of you being replaced by a cheaper fresh out of college student or outsource worker is directly proportional to the readability of your code.
People don't have a common sense of what "good" means for code. A lot of people will drop to the level of "I ran it" or even "I wrote it."
We need to have some kind of a shared sense of what good code is, and whether it matters. For the first part of that,I have written up some thoughts:
http://agileinaflash.blogspot.com/2010/02/seven-code-virtues.html
As for whether it matters, that's been covered plenty of times. It matters quite a lot if your code is to live very long. If it really won't ever sell or won't be deployed, then it clearly doesn't. If it's not worth doing, it's not worth doing well.
But if you don't practice writing virtuous code, then you can't do it when it matters. I think people have practiced doing poor work, and don't know anything else.
I think code quality is over-rated. the more I do it the less it means to me. Code quality frameworks prefer over-complicated code. You never see errors like "this code is too abstract, no one will understand it.", but for example PMD says that I have too many methods in my class. So I should cut the class into abstract class/classes (the best way since PMD doesn't care what I do) or cut the classes based on functionality (worst way since it might still have too many methods - been there).
Static Analysis is really cool, however it's just warnings. For example FindBugs has problem with casting and you should use instaceof to make warning go away. I don't do that just to make FindBugs happy.
I think too complicated code is not when method has 500 lines of code, but when method is using 500 other methods and many abstractions just for fun. I think code quality masters should really work on finding when code is too complicated and don't care so much about little things (you can refactor them with the right tools really quickly.).
I don't like idea of code coverage since it's really useless and makes unit-test boring. I always test code with complicated functionality, but only that code. I worked in a place with 100% code coverage and it was a real nightmare to change anything. Because when you change anything you had to worry about broken (poorly written) unit-tests and you never know what to do with them, many times we just comment them out and add todo to fix them later.
I think unit-testing has its place and for example I did a lot of unit-testing in my webpage parser, because all the time I found diffrent bugs or not supported tags. Testing Database programs is really hard if you want to also test database logic, DbUnit is really painful to work with.
I don't know. Have you seen Sonar? Sure it is Maven specific, but point it at your build and boom, lots of metrics. That's the kind of project that will facilitate these code quality metrics going mainstream.
I think that real problem with code quality or testing is that you have to put a lot of work into it and YOU get nothing back. less bugs == less work? no, there's always something to do. less bugs == more money? no, you have to change job to get more money. unit-testing is heroic, you only do it to feel better about yourself.
I work at place where management is encouraging unit-testing, however I am the only person that writes tests(i want to get better at it, its the only reason I do it). I understand that for others writing tests is just more work and you get nothing in return. surfing the web sounds cooler than writing tests.
someone might break your tests and say he doesn't know how to fix or comment it out(if you use maven).
Frameworks are not there for real web-app integration testing(unit test might pass, but it might not work on a web page), so even if you write test you still have to test it manually.
You could use framework like HtmlUnit, but its really painful to use. Selenium breaks with every change on a webpage. SQL testing is almost impossible(You can do it with DbUnit, but first you have to provide test data for it. test data for 5 joins is a lot of work and there is no easy way to generate it). I dont know about your web-framework, but the one we are using really likes static methods, so you really have to work to test the code.
Our management has recently been talking to some people selling C++ static analysis tools. Of course the sales people say they will find tons of bugs, but I'm skeptical.
How do such tools work in the real world? Do they find real bugs? Do they help more junior programmers learn?
Are they worth the trouble?
Static code analysis is almost always worth it. The issue with an existing code base is that it will probably report far too many errors to make it useful out of the box.
I once worked on a project that had 100,000+ warnings from the compiler... no point in running Lint tools on that code base.
Using Lint tools "right" means buying into a better process (which is a good thing). One of the best jobs I had was working at a research lab where we were not allowed to check in code with warnings.
So, yes the tools are worth it... in the long term. In the short term turn your compiler warnings up to the max and see what it reports. If the code is "clean" then the time to look at lint tools is now. If the code has many warnings... prioritize and fix them. Once the code has none (or at least very few) warnings then look at Lint tools.
So, Lint tools are not going to help a poor code base, but once you have a good codebase it can help you keep it good.
Edit:
In the case of the 100,000+ warning product, it was broken down into about 60 Visual Studio projects. As each project had all of the warnings removed it was changed so that the warnings were errors, that prevented new warnings from being added to projects that had been cleaned up (or rather it let my co-worker righteously yell at any developer that checked in code without compiling it first :-)
In my experience with a couple of employers, Coverity Prevent for C/C++ was decidedly worth it, finding some bugs even in good developers’ code, and a lot of bugs in the worst developers’ code. Others have already covered technical aspects, so I’ll focus on the political difficulties.
First, the developers whose code need static analysis the most, are the least likely to use it voluntarily. So I’m afraid you’ll need strong management backing, in practice as well as in theory; otherwise it might end up as just a checklist item, to produce impressive metrics without actually getting bugs fixed. Any static analysis tool is going to produce false positives; you’re probably going to need to dedicate somebody to minimizing the annoyance from them, e.g., by triaging defects, prioritizing the checkers, and tweaking the settings. (A commercial tool should be extremely good at never showing a false positive more than once; that alone may be worth the price.) Even the genuine defects are likely to generate annoyance; my advice on this is not to worry about, e.g., check-in comments grumbling that obviously destructive bugs are “minor.”
My biggest piece of advice is a corollary to my first law, above: Take the cheap shots first, and look at the painfully obvious bugs from your worst developers. Some of these might even have been found by compiler warnings, but a lot of bugs can slip through those cracks, e.g., when they’re suppressed by command-line options. Really blatant bugs can be politically useful, e.g., with a Top Ten List of the funniest defects, which can concentrate minds wonderfully, if used carefully.
As a couple people remarked, if you run a static analysis tool full bore on most applications, you will get a lot of warnings, some of them may be false positives or may not lead to an exploitable defect. It is that experience that leads to a perception that these types of tools are noisy and perhaps a waste of time. However, there are warnings that will highlight a real and potentially dangerous defects that can lead to security, reliability, or correctness issues and for many teams, those issues are important to fix and may be nearly impossible to discover via testing.
That said, static analysis tools can be profoundly helpful, but applying them to an existing codebase requires a little strategy. Here are a couple of tips that might help you..
1) Don't turn everything on at once, decide on an initial set of defects, turn those analyses on and fix them across your code base.
2) When you are addressing a class of defects, help your entire development team to understand what the defect is, why it's important and how to code to defend against that defect.
3) Work to clear the codebase completely of that class of defects.
4) Once this class of issues have been fixed, introduce a mechanism to stay in that zero issue state. Luckily, it is much easier make sure you are not re-introducing an error if you are at a baseline has no errors.
It does help. I'd suggest taking a trial version and running it through a part of your codebase which you think is neglected. These tools generate a lot of false positives. Once you've waded through these, you're likely to find a buffer overrun or two that can save a lot of grief in near future. Also, try at least two/three varieties (and also some of the OpenSource stuff).
I've used them - PC-Lint, for example, and they did find some things. Typically they are configurable and you can tell them 'stop bothering me about xyz', if you determine that xyz really isn't an issue.
I don't know that they help junior programmers learn a lot, but they can be used as a mechanism to help tighten up the code.
I've found that a second set of (skeptical, probing for bugs) eyes and unit testing is typically where I've seen more bug catching take place.
Those tools do help. lint has been a great tool for C developers.
But one objection that I have is that they're batch processes that run after you've written a fair amount of code and potentially generate a lot of messages.
I think a better approach is to build such a thing into your IDE and have it point out the problem while you're writing it so you can correct it right away. Don't let those problems get into the code base in the first place.
That's the difference between the FindBugs static analysis tool for Java and IntelliJ's Inspector. I greatly prefer the latter.
You are probably going to have to deal with a good amount of false positives, particularly if your code base is large.
Most static analysis tools work using "intra-procedural analysis", which means that they consider each procedure in isolation, as opposed to "whole-program analysis" which considers the entire program.
They typically use "intra-procedural" analysis because "whole-program analysis" has to consider many paths through a program that won't actually ever happen in practice, and thus can often generate false positive results.
Intra-procedural analysis eliminates those problems by just focusing on a single procedure. In order to work, however, they usually need to introduce an "annotation language" that you use to describe meta-data for procedure arguments, return types, and object fields. For C++ those things are usually implemented via macros that you decorate things with. The annotations then describe things like "this field is never null", "this string buffer is guarded by this integer value", "this field can only be accessed by the thread labeled 'background'", etc.
The analysis tool will then take the annotations you supply and verify that the code you wrote actually conforms to the annotations. For example, if you could potentially pass a null off to something that is marked as not null, it will flag an error.
In the absence of annotations, the tool needs to assume the worst, and so will report a lot of errors that aren't really errors.
Since it appears you are not using such a tool already, you should assume you are going to have to spend a considerably amount of time annotating your code to get rid of all the false positives that will initially be reported. I would run the tool initially, and count the number of errors. That should give you an estimate of how much time you will need to adopt it in your code base.
Wether or not the tool is worth it depends on your organization. What are the kinds of bugs you are bit by the most? Are they buffer overrun bugs? Are they null-dereference or memory-leak bugs? Are they threading issues? Are they "oops we didn't consider that scenario", or "we didn't test a Chineese version of our product running on a Lithuanian version of Windows 98?".
Once you figure out what the issues are, then you should know if it's worth the effort.
The tool will probably help with buffer overflow, null dereference, and memory leak bugs. There's a chance that it may help with threading bugs if it has support for "thread coloring", "effects", or "permissions" analysis. However, those types of analysis are pretty cutting-edge, and have HUGE notational burdens, so they do come with some expense. The tool probably won't help with any other type of bugs.
So, it really depends on what kind of software you write, and what kind of bugs you run into most frequently.
I think static code analysis is well worth, if you are using the right tool. Recently, we tried the Coverity Tool ( bit expensive). Its awesome, it brought out many critical defects,which were not detected by lint or purify.
Also we found that, we could have avoided 35% of the customer Field defects, if we had used coverity earlier.
Now, Coverity is rolled out in my company and when ever we get a customer TR in old software version, we are running coverity against it to bring out the possible canditates for the fault before we start the analysis in a susbsytem.
Paying for most static analysis tools is probably unnecessary when there's some very good-quality free ones (unless you need some very special or specific feature provided by a commercial version). For example, see this answer I gave on another question about cppcheck.
I guess it depends quite a bit on your programming style. If you are mostly writing C code (with the occasional C++ feature) then these tools will likely be able to help (e.g. memory management, buffer overruns, ...). But if you are using more sophisticated C++ features, then the tools might get confused when trying to parse your source code (or just won't find many issues because C++ facilities are usually safer to use).
As with everything the answer depends ... if you are the sole developer working on a knitting-pattern-pretty-printer for you grandma you'll probably do not want to buy any static analysis tools. If you are having a medium sized project for software that will go into something important and maybe on top of that you have a tight schedule, you might want to invest a little bit now that saves you much more later on.
I recently wrote a general rant on this: http://www.redlizards.com/blog/?p=29
I should write part 2 as soon as time permits, but in general do some rough calculations whether it is worth it for you:
how much time spent on debugging?
how many resources bound?
what percentage could have been found by static analysis?
costs for tool setup?
purchase price?
peace of mind? :-)
My personal take is also:
get static analysis in early
early in the project
early in the development cycle
early as in really early (before nightly build and subsequent testing)
provide the developer with the ability to use static analysis himself
nobody likes to be told by test engineers or some anonymous tool
what they did wrong yesterday
less debugging makes a developer happy :-)
provides a good way of learning about (subtle) pitfalls without embarrassment
This rather amazing result was accomplished using Elsa and Oink.
http://www.cs.berkeley.edu/~daw/papers/fmtstr-plas07.pdf
"Large-Scale Analysis of Format String Vulnerabilities in Debian Linux"
by Karl Chen, David Wagner,
UC Berkeley,
{quarl, daw}#cs.berkeley.edu
Abstract:
Format-string bugs are a relatively common security vulnerability, and can lead to arbitrary code execution. In collaboration with others, we designed and implemented a system to eliminate format string vulnerabilities from an entire Linux distribution, using typequalifier inference, a static analysis technique that can find taint violations. We successfully analyze 66% of C/C++ source packages in the Debian 3.1 Linux distribution. Our system finds 1,533 format string taint warnings. We estimate that 85% of these are true positives, i.e., real bugs; ignoring duplicates from libraries, about 75% are real bugs. We suggest that the technology exists to render format string vulnerabilities extinct in the near future.
Categories and Subject Descriptors D.4.6 [Operating Systems]: Security and Protection—Invasive Software;
General Terms: Security, Languages;
Keywords: Format string vulnerability, Large-scale analysis, Typequalifier inference
Static analysis that finds real bugs is worth it regardless of whether it's C++ or not. Some tend to be quite noisy, but if they can catch subtle bugs like signed/unsigned comparisons causing optimizations that break your code or out of bounds array accesses, they are definitely worth the effort.
At a former employer we had Insure++.
It helped to pinpoint random behaviour (use of uninitialized stuff) which Valgrind could not find. But most important: it helpd to remove mistakes which were not known as errors yet.
Insure++ is good, but pricey, that's why we bought one user license only.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
If you were to mandate a minimum percentage code-coverage for unit tests, perhaps even as a requirement for committing to a repository, what would it be?
Please explain how you arrived at your answer (since if all you did was pick a number, then I could have done that all by myself ;)
This prose by Alberto Savoia answers precisely that question (in a nicely entertaining manner at that!):
http://www.artima.com/forums/flat.jsp?forum=106&thread=204677
Testivus On Test Coverage
Early one morning, a programmer asked
the great master:
“I am ready to write some unit tests. What code coverage should I aim
for?”
The great master replied:
“Don’t worry about coverage, just write some good tests.”
The programmer smiled, bowed, and
left.
...
Later that day, a second programmer
asked the same question.
The great master pointed at a pot of
boiling water and said:
“How many grains of rice should I put in that pot?”
The programmer, looking puzzled,
replied:
“How can I possibly tell you? It depends on how many people you need to
feed, how hungry they are, what other
food you are serving, how much rice
you have available, and so on.”
“Exactly,” said the great master.
The second programmer smiled, bowed,
and left.
...
Toward the end of the day, a third
programmer came and asked the same
question about code coverage.
“Eighty percent and no less!” Replied the master in a stern voice,
pounding his fist on the table.
The third programmer smiled, bowed,
and left.
...
After this last reply, a young
apprentice approached the great
master:
“Great master, today I overheard you answer the same question about
code coverage with three different
answers. Why?”
The great master stood up from his
chair:
“Come get some fresh tea with me and let’s talk about it.”
After they filled their cups with
smoking hot green tea, the great
master began to answer:
“The first programmer is new and just getting started with testing.
Right now he has a lot of code and no
tests. He has a long way to go;
focusing on code coverage at this time
would be depressing and quite useless.
He’s better off just getting used to
writing and running some tests. He can
worry about coverage later.”
“The second programmer, on the other hand, is quite experience both
at programming and testing. When I
replied by asking her how many grains
of rice I should put in a pot, I
helped her realize that the amount of
testing necessary depends on a number
of factors, and she knows those
factors better than I do – it’s her
code after all. There is no single,
simple, answer, and she’s smart enough
to handle the truth and work with
that.”
“I see,” said the young apprentice,
“but if there is no single simple
answer, then why did you answer the
third programmer ‘Eighty percent and
no less’?”
The great master laughed so hard and
loud that his belly, evidence that he
drank more than just green tea,
flopped up and down.
“The third programmer wants only simple answers – even when there are
no simple answers … and then does not
follow them anyway.”
The young apprentice and the grizzled
great master finished drinking their
tea in contemplative silence.
Code Coverage is a misleading metric if 100% coverage is your goal (instead of 100% testing of all features).
You could get a 100% by hitting all the lines once. However you could still miss out testing a particular sequence (logical path) in which those lines are hit.
You could not get a 100% but still have tested all your 80%/freq used code-paths. Having tests that test every 'throw ExceptionTypeX' or similar defensive programming guard you've put in is a 'nice to have' not a 'must have'
So trust yourself or your developers to be thorough and cover every path through their code. Be pragmatic and don't chase the magical 100% coverage. If you TDD your code you should get a 90%+ coverage as a bonus. Use code-coverage to highlight chunks of code you have missed (shouldn't happen if you TDD though.. since you write code only to make a test pass. No code can exist without its partner test. )
Jon Limjap makes a good point - there is not a single number that is going to make sense as a standard for every project. There are projects that just don't need such a standard. Where the accepted answer falls short, in my opinion, is in describing how one might make that decision for a given project.
I will take a shot at doing so. I am not an expert in test engineering and would be happy to see a more informed answer.
When to set code coverage requirements
First, why would you want to impose such a standard in the first place? In general, when you want to introduce empirical confidence in your process. What do I mean by "empirical confidence"? Well, the real goal correctness. For most software, we can't possibly know this across all inputs, so we settle for saying that code is well-tested. This is more knowable, but is still a subjective standard: It will always be open to debate whether or not you have met it. Those debates are useful and should occur, but they also expose uncertainty.
Code coverage is an objective measurement: Once you see your coverage report, there is no ambiguity about whether standards have been met are useful. Does it prove correctness? Not at all, but it has a clear relationship to how well-tested the code is, which in turn is our best way to increase confidence in its correctness. Code coverage is a measurable approximation of immeasurable qualities we care about.
Some specific cases where having an empirical standard could add value:
To satisfy stakeholders. For many projects, there are various actors who have an interest in software quality who may not be involved in the day-to-day development of the software (managers, technical leads, etc.) Saying "we're going to write all the tests we really need" is not convincing: They either need to trust entirely, or verify with ongoing close oversight (assuming they even have the technical understanding to do so.) Providing measurable standards and explaining how they reasonably approximate actual goals is better.
To normalize team behavior. Stakeholders aside, if you are working on a team where multiple people are writing code and tests, there is room for ambiguity for what qualifies as "well-tested." Do all of your colleagues have the same idea of what level of testing is good enough? Probably not. How do you reconcile this? Find a metric you can all agree on and accept it as a reasonable approximation. This is especially (but not exclusively) useful in large teams, where leads may not have direct oversight over junior developers, for instance. Networks of trust matter as well, but without objective measurements, it is easy for group behavior to become inconsistent, even if everyone is acting in good faith.
To keep yourself honest. Even if you're the only developer and only stakeholder for your project, you might have certain qualities in mind for the software. Instead of making ongoing subjective assessments about how well-tested the software is (which takes work), you can use code coverage as a reasonable approximation, and let machines measure it for you.
Which metrics to use
Code coverage is not a single metric; there are several different ways of measuring coverage. Which one you might set a standard upon depends on what you're using that standard to satisfy.
I'll use two common metrics as examples of when you might use them to set standards:
Statement coverage: What percentage of statements have been executed during testing? Useful to get a sense of the physical coverage of your code: How much of the code that I have written have I actually tested?
This kind of coverage supports a weaker correctness argument, but is also easier to achieve. If you're just using code coverage to ensure that things get tested (and not as an indicator of test quality beyond that) then statement coverage is probably sufficient.
Branch coverage: When there is branching logic (e.g. an if), have both branches been evaluated? This gives a better sense of the logical coverage of your code: How many of the possible paths my code may take have I tested?
This kind of coverage is a much better indicator that a program has been tested across a comprehensive set of inputs. If you're using code coverage as your best empirical approximation for confidence in correctness, you should set standards based on branch coverage or similar.
There are many other metrics (line coverage is similar to statement coverage, but yields different numeric results for multi-line statements, for instance; conditional coverage and path coverage is similar to branch coverage, but reflect a more detailed view of the possible permutations of program execution you might encounter.)
What percentage to require
Finally, back to the original question: If you set code coverage standards, what should that number be?
Hopefully it's clear at this point that we're talking about an approximation to begin with, so any number we pick is going to be inherently approximate.
Some numbers that one might choose:
100%. You might choose this because you want to be sure everything is tested. This doesn't give you any insight into test quality, but does tell you that some test of some quality has touched every statement (or branch, etc.) Again, this comes back to degree of confidence: If your coverage is below 100%, you know some subset of your code is untested.
Some might argue that this is silly, and you should only test the parts of your code that are really important. I would argue that you should also only maintain the parts of your code that are really important. Code coverage can be improved by removing untested code, too.
99% (or 95%, other numbers in the high nineties.) Appropriate in cases where you want to convey a level of confidence similar to 100%, but leave yourself some margin to not worry about the occasional hard-to-test corner of code.
80%. I've seen this number in use a few times, and don't entirely know where it originates. I think it might be a weird misappropriation of the 80-20 rule; generally, the intent here is to show that most of your code is tested. (Yes, 51% would also be "most", but 80% is more reflective of what most people mean by most.) This is appropriate for middle-ground cases where "well-tested" is not a high priority (you don't want to waste effort on low-value tests), but is enough of a priority that you'd still like to have some standard in place.
I haven't seen numbers below 80% in practice, and have a hard time imagining a case where one would set them. The role of these standards is to increase confidence in correctness, and numbers below 80% aren't particularly confidence-inspiring. (Yes, this is subjective, but again, the idea is to make the subjective choice once when you set the standard, and then use an objective measurement going forward.)
Other notes
The above assumes that correctness is the goal. Code coverage is just information; it may be relevant to other goals. For instance, if you're concerned about maintainability, you probably care about loose coupling, which can be demonstrated by testability, which in turn can be measured (in certain fashions) by code coverage. So your code coverage standard provides an empirical basis for approximating the quality of "maintainability" as well.
Code coverage is great, but functionality coverage is even better. I don't believe in covering every single line I write. But I do believe in writing 100% test coverage of all the functionality I want to provide (even for the extra cool features I came with myself and which were not discussed during the meetings).
I don't care if I would have code which is not covered in tests, but I would care if I would refactor my code and end up having a different behaviour. Therefore, 100% functionality coverage is my only target.
My favorite code coverage is 100% with an asterisk. The asterisk comes because I prefer to use tools that allow me to mark certain lines as lines that "don't count". If I have covered 100% of the lines which "count", I am done.
The underlying process is:
I write my tests to exercise all the functionality and edge cases I can think of (usually working from the documentation).
I run the code coverage tools
I examine any lines or paths not covered and any that I consider not important or unreachable (due to defensive programming) I mark as not counting
I write new tests to cover the missing lines and improve the documentation if those edge cases are not mentioned.
This way if I and my collaborators add new code or change the tests in the future, there is a bright line to tell us if we missed something important - the coverage dropped below 100%. However, it also provides the flexibility to deal with different testing priorities.
I'd have another anectode on test coverage I'd like to share.
We have a huge project wherein, over twitter, I noted that, with 700 unit tests, we only have 20% code coverage.
Scott Hanselman replied with words of wisdom:
Is it the RIGHT 20%? Is it the 20%
that represents the code your users
hit the most? You might add 50 more
tests and only add 2%.
Again, it goes back to my Testivus on Code Coverage Answer. How much rice should you put in the pot? It depends.
Many shops don't value tests, so if you are above zero at least there is some appreciation of worth - so arguably non-zero isn't bad as many are still zero.
In the .Net world people often quote 80% as reasonble. But they say this at solution level. I prefer to measure at project level: 30% might be fine for UI project if you've got Selenium, etc or manual tests, 20% for the data layer project might be fine, but 95%+ might be quite achievable for the business rules layer, if not wholly necessary. So the overall coverage may be, say, 60%, but the critical business logic may be much higher.
I've also heard this: aspire to 100% and you'll hit 80%; but aspire to 80% and you'll hit 40%.
Bottom line: Apply the 80:20 rule, and let your app's bug count guide you.
For a well designed system, where unit tests have driven the development from the start i would say 85% is a quite low number. Small classes designed to be testable should not be hard to cover better than that.
It's easy to dismiss this question with something like:
Covered lines do not equal tested logic and one should not read too much into the percentage.
True, but there are some important points to be made about code coverage. In my experience this metric is actually quite useful, when used correctly. Having said that, I have not seen all systems and i'm sure there are tons of them where it's hard to see code coverage analysis adding any real value. Code can look so different and the scope of the available test framework can vary.
Also, my reasoning mainly concerns quite short test feedback loops. For the product that I'm developing the shortest feedback loop is quite flexible, covering everything from class tests to inter process signalling. Testing a deliverable sub-product typically takes 5 minutes and for such a short feedback loop it is indeed possible to use the test results (and specifically the code coverage metric that we are looking at here) to reject or accept commits in the repository.
When using the code coverage metric you should not just have a fixed (arbitrary) percentage which must be fulfilled. Doing this does not give you the real benefits of code coverage analysis in my opinion. Instead, define the following metrics:
Low Water Mark (LWM), the lowest number of uncovered lines ever seen in the system under test
High Water Mark (HWM), the highest code coverage percentage ever seen for the system under test
New code can only be added if we don't go above the LWM and we don't go below the HWM. In other words, code coverage is not allowed to decrease, and new code should be covered. Notice how i say should and not must (explained below).
But doesn't this mean that it will be impossible to clean away old well-tested rubbish that you have no use for anymore? Yes, and that's why you have to be pragmatic about these things. There are situations when the rules have to be broken, but for your typical day-to-day integration my experience it that these metrics are quite useful. They give the following two implications.
Testable code is promoted.
When adding new code you really have to make an effort to make the code testable, because you will have to try and cover all of it with your test cases. Testable code is usually a good thing.
Test coverage for legacy code is increasing over time.
When adding new code and not being able to cover it with a test case, one can try to cover some legacy code instead to get around the LWM rule. This sometimes necessary cheating at least gives the positive side effect that the coverage of legacy code will increase over time, making the seemingly strict enforcement of these rules quite pragmatic in practice.
And again, if the feedback loop is too long it might be completely unpractical to setup something like this in the integration process.
I would also like to mention two more general benefits of the code coverage metric.
Code coverage analysis is part of the dynamic code analysis (as opposed to the static one, i.e. Lint). Problems found during the dynamic code analysis (by tools such as the purify family, http://www-03.ibm.com/software/products/en/rational-purify-family) are things like uninitialized memory reads (UMR), memory leaks, etc. These problems can only be found if the code is covered by an executed test case. The code that is the hardest to cover in a test case is usually the abnormal cases in the system, but if you want the system to fail gracefully (i.e. error trace instead of crash) you might want to put some effort into covering the abnormal cases in the dynamic code analysis as well. With just a little bit of bad luck, a UMR can lead to a segfault or worse.
People take pride in keeping 100% for new code, and people discuss testing problems with a similar passion as other implementation problems. How can this function be written in a more testable manner? How would you go about trying to cover this abnormal case, etc.
And a negative, for completeness.
In a large project with many involved developers, everyone is not going to be a test-genius for sure. Some people tend to use the code coverage metric as proof that the code is tested and this is very far from the truth, as mentioned in many of the other answers to this question. It is ONE metric that can give you some nice benefits if used properly, but if it is misused it can in fact lead to bad testing. Aside from the very valuable side effects mentioned above a covered line only shows that the system under test can reach that line for some input data and that it can execute without hanging or crashing.
If this were a perfect world, 100% of code would be covered by unit tests. However, since this is NOT a perfect world, it's a matter of what you have time for. As a result, I recommend focusing less on a specific percentage, and focusing more on the critical areas. If your code is well-written (or at least a reasonable facsimile thereof) there should be several key points where APIs are exposed to other code.
Focus your testing efforts on these APIs. Make sure that the APIs are 1) well documented and 2) have test cases written that match the documentation. If the expected results don't match up with the docs, then you have a bug in either your code, documentation, or test cases. All of which are good to vet out.
Good luck!
Code coverage is just another metric. In and of itself, it can be very misleading (see www.thoughtworks.com/insights/blog/are-test-coverage-metrics-overrated). Your goal should therefore not be to achieve 100% code coverage but rather to ensure that you test all relevant scenarios of your application.
I prefer to do BDD, which uses a combination of automated acceptance tests, possibly other integration tests, and unit tests. The question for me is what the target coverage of the automated test suite as a whole should be.
That aside, the answer depends on your methodology, language and testing and coverage tools. When doing TDD in Ruby or Python it's not hard to maintain 100% coverage, and it's well worth doing so. It's much easier to manage 100% coverage than 90-something percent coverage. That is, it's much easier to fill coverage gaps as they appear (and when doing TDD well coverage gaps are rare and usually worth your time) than it is to manage a list of coverage gaps that you haven't gotten around to and miss coverage regressions due to your constant background of uncovered code.
The answer also depends on the history of your project. I've only found the above to be practical in projects managed that way from the start. I've greatly improved the coverage of large legacy projects, and it's been worth doing so, but I've never found it practical to go back and fill every coverage gap, because old untested code is not well understood enough to do so correctly and quickly.
85% would be a good starting place for checkin criteria.
I'd probably chose a variety of higher bars for shipping criteria - depending on the criticality of the subsystems/components being tested.
Code coverage is great but only as long as the benefits that you get from it outweigh the cost/effort of achieving it.
We have been working to a standard of 80% for some time, however we have just made the decison to abandon this and instead be more focused on our testing. Concentrating on the complex business logic etc,
This decision was taken due to the increasing amount of time we spent chasing code coverage and maintaining existing unit tests. We felt we had got to the point where the benefit we were getting from our code coverage was deemed to be less than the effort that we had to put in to achieve it.
I use cobertura, and whatever the percentage, I would recommend keeping the values in the cobertura-check task up-to-date. At the minimum, keep raising totallinerate and totalbranchrate to just below your current coverage, but never lower those values. Also tie in the Ant build failure property to this task. If the build fails because of lack of coverage, you know someone's added code but hasn't tested it. Example:
<cobertura-check linerate="0"
branchrate="0"
totallinerate="70"
totalbranchrate="90"
failureproperty="build.failed" />
When I think my code isn't unit tested enough, and I'm not sure what to test next, I use coverage to help me decide what to test next.
If I increase coverage in a unit test - I know this unit test worth something.
This goes for code that is not covered, 50% covered or 97% covered.
Short answer: 60-80%
Long answer:
I think it totally depends on the nature of your project. I typically start a project by unit testing every practical piece. By the first "release" of the project you should have a pretty good base percentage based on the type of programming you are doing. At that point you can start "enforcing" a minimum code coverage.
If you've been doing unit testing for a decent amount of time, I see no reason for it not to be approaching 95%+. However, at a minimum, I've always worked with 80%, even when new to testing.
This number should only include code written in the project (excludes frameworks, plugins, etc.) and maybe even exclude certain classes composed entirely of code written of calls to outside code. This sort of call should be mocked/stubbed.
Generally speaking, from the several engineering excellence best practices papers that I have read, 80% for new code in unit tests is the point that yields the best return. Going above that CC% yields a lower amount of defects for the amount of effort exerted. This is a best practice that is used by many major corporations.
Unfortunately, most of these results are internal to companies, so there are no public literatures that I can point you to.
My answer to this conundrum is to have 100% line coverage of the code you can test and 0% line coverage of the code you can't test.
My current practice in Python is to divide my .py modules into two folders: app1/ and app2/ and when running unit tests calculate the coverage of those two folders and visually check (I must automate this someday) that app1 has 100% coverage and app2 has 0% coverage.
When/if I find that these numbers differ from standard I investigage and alter the design of the code so that coverage conforms to the standard.
This does mean that I can recommend achieving 100% line coverage of library code.
I also occasionally review app2/ to see if I could possible test any code there, and If I can I move it into app1/
Now I'm not too worried about the aggregate coverage because that can vary wildly depending on the size of the project, but generally I've seen 70% to over 90%.
With python, I should be able to devise a smoke test which could automatically run my app while measuring coverage and hopefully gain an aggreagate of 100% when combining the smoke test with unittest figures.
Check out Crap4j. It's a slightly more sophisticated approach than straight code coverage. It combines code coverage measurements with complexity measurements, and then shows you what complex code isn't currently tested.
Viewing coverage from another perspective: Well-written code with a clear flow of control is the easiest to cover, the easiest to read, and usually the least buggy code. By writing code with clearness and coverability in mind, and by writing the unit tests in parallel with the code, you get the best results IMHO.
In my opinion, the answer is "It depends on how much time you have". I try to achieve 100% but I don't make a fuss if I don't get it with the time I have.
When I write unit tests, I wear a different hat compared to the hat I wear when developing production code. I think about what the tested code claims to do and what are the situations that can possible break it.
I usually follow the following criteria or rules:
That the Unit Test should be a form of documentation on what's the expected behavior of my codes, ie. the expected output given a certain input and the exceptions it may throw that clients may want to catch (What the users of my code should know?)
That the Unit Test should help me discover the what if conditions that I may not yet have thought of. (How to make my code stable and robust?)
If these two rules doesn't produce 100% coverage then so be it. But once, I have the time, I analyze the uncovered blocks and lines and determine if there are still test cases without unit tests or if the code needs to be refactored to eliminate the unecessary codes.
It depends greatly on your application. For example, some applications consist mostly of GUI code that cannot be unit tested.
I don't think there can be such a B/W rule.
Code should be reviewed, with particular attention to the critical details.
However, if it hasn't been tested, it has a bug!
Depending on the criticality of the code, anywhere from 75%-85% is a good rule of thumb.
Shipping code should definitely be tested more thoroughly than in house utilities, etc.
This has to be dependent on what phase of your application development lifecycle you are in.
If you've been at development for a while and have a lot of implemented code already and are just now realizing that you need to think about code coverage then you have to check your current coverage (if it exists) and then use that baseline to set milestones each sprint (or an average rise over a period of sprints), which means taking on code debt while continuing to deliver end user value (at least in my experience the end user doesn't care one bit if you've increased test coverage if they don't see new features).
Depending on your domain it's not unreasonable to shoot for 95%, but I'd have to say on average your going to be looking at an average case of 85% to 90%.
I think the best symptom of correct code coverage is that amount of concrete problems unit tests help to fix is reasonably corresponds to size of unit tests code you created.
I think that what may matter most is knowing what the coverage trend is over time and understanding the reasons for changes in the trend. Whether you view the changes in the trend as good or bad will depend upon your analysis of the reason.
We were targeting >80% till few days back, But after we used a lot of Generated code, We do not care for %age, but rather make reviewer take a call on the coverage required.
From the Testivus posting I think the answer context should be the second programmer.
Having said this from a practical point of view we need parameter / goals to strive for.
I consider that this can be "tested" in an Agile process by analyzing the code we have the architecture, functionality (user stories), and then come up with a number. Based on my experience in the Telecom area I would say that 60% is a good value to check.