Another one about measuring developer performance [closed] - unit-testing

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I know the question about measuring developer performance has been asked to death, but please bear with me. I know the age old debate about how you cannot measure performance of developers, but the reality is, at our company there is a "need" to do that one way or another.
I work for a relatively small company (small in terms of developers), and management felt the need to measure developer performance based on "functionality that passes test (QA) at first iteration".
We somehow managed to convince them that this was a bad idea for various reasons, and came up instead on measuring developers by putting code in test where all unit tests passes. Since in our team there is no "requirement" per se to develop unit tests before, we felt it was an opportunity to formalise the need to develop unit tests - i.e. put some incentive on developers to write unit tests.
My problem is this: since arguably we will not be releasing code to QA that do not pass all unit tests, how can one reasonably measure developer performance based on unit tests? Based on unit tests, what makes a good developer stand out?
Functionality that fail although unit test passes?
Not writing unit test for a given functionality at all, or not adequate unit tests written?
Quality of unit test written?
Number of Unit tests written?
Any suggestions would be much appreciated. Or am I completely off the mark in this kind of performance measurement?

Perhaps I am completely off the mark in this kind of performance measurement?
The question is not "what do we measure?"
The question is "What is broken?"
Followed by "how do we measure the breakage?"
Followed by "how do we measure the improvement?"
Until you have something you're trying to fix, here's what happens.
You pick something to measure.
People respond by doing what "looks" best according to that metric.
You realize you're measuring the wrong thing.
Specifically.
"functionalities that pass test (QA) at first iteration" Which means what? Save the code until it HAS to work. Later looks better. So, delay until you pass QA on the first iteration.
"Functionality that fail although unit test passes?" This appears to be "incomplete unit tests". So you overtest everything. Take plenty of time to write all possible tests. Slow down delivery so you're not penalized by this measurement.
"Not writing unit test for a given functionality at all, or not adequate unit tests written?" Not sure how you measure this, but it sounds the same as the previous one.
.
"Quality of unit test written?" Subjective measurement. Always a good plan. Define how you're going to measure quality, and you'll get stuff that maximizes that specific measurement. Want more comments? Count those. What more whitespace? Count that.
"Number of Unit tests written?" Nothing motivates me to write redundant tests like counting the number of tests. I can easily copy and paste nearly identical code if it makes me look good according to this metric.
You get what you measure. No matter what metric you put in place, you will find that the specific thing measured will subvert most other quality concerns. Whatever you measure, but absolutely sure you want people to maximize that measurement while reducing others.
Edit
I'm not saying "Don't Measure". I'm saying "you get what you measure". Pick a metric that you want maximized at the expense of others. It's not hard to pick a metric. Just know the consequence of telling management what to measure.

I would argue that unit tests are a quality tool and not a productivity tool. If you want to both encourage unit testing and to give management a productivity metric, make unit testing mandatory to get code into production, and report on productivity based on code/features that makes it into production over a given time frame (weekly, bi-weekly, whatever). If we take as a given that people will game any system, then design the game to meet your goals.

I think Joel had it spot-on when he said that this sort of measurement will be gamed by your developers. It will not achieve what it set out to and you will likely end up with quality suffering (from the perception of everyone using the system) whilst your measurements of quality all suggest things have never been better!
edit. You say that management are demanding this. You are a small company; your management cannot afford everyone to up sticks and leave. Tell them that this is rubbish and you'll play no part in it.
If the whole idea is so that they can rank people to make them redundant (it sounds like it might be at this time), just ask them how many people have to go and then choose those developers you believe to be the worst, using your intelligence and judgement and not some dumb rule-of-thumb

For some reason the defect black market comes to mind... although this is somewhat in reverse.
Any system based on metrics when it comes to developers simply isn't going to work, because it isn't something you can measure using conventional methods. Whatever you try to put in place with regards to anything like this will be gamed (because solving problems is what we do all day, and this is just another problem to be solved) and it will be detrimental to your code (for example I wrote a simple spelling corrector the other day with about 5 unit tests which were sufficient to check it worked, but if I was measured on unit tests I could have spent another day writing another 100 which would all pass but would add no value).
You need to work out why management want this system in place. If it's to give rewards then you should have a look at Joel Spolsky's article about incentive pay which is not far off the mark from what I've seen (think about bonus day and see how many people are really happy -- none as they just got what they thought they deserved -- and how many people are really pissed off -- anyone who got less than they thought they deserved).

To quote Steve Yegge:
shouldn't there be a rule that companies aren't allowed to do things that have been formally ridiculed in a Dilbert comic?

There was just some study I read in the newspaper here at home in Norway. In a nutshell it said that office types of jobs generally had no benefit from performance pay. The reason being that measuring performance in most office types of jobs was almost impossible.
However simpler jobs like e.g. strawberry picking benefited from performance pay because it is really easy to measure performance. Nobody is going to feel bad because a high performer get a higher pay because everybody can clearly see that he or she has picked more berries.
In an office it is not always clear that the other person did a better job. And so a lot of people will be demotivated. They tested with performance pay on teachers and found that it gave negative results. People who got higher pay often didn't see why they did better than others and the ones who got lower usually couldn't see why they got lower.
What they did find though was that non-monetary rewards usually helped. Getting encouraging words from the boss for well done jobb etc.
Read iCon on how Steve Jobs managed to get people to perform. Basically he made people believe that they were part of something big and were going to change the world. That is what makes people put in an effort and perform. I don't think developers will put in a lot of effort for just money. It has to be something they really believe in and/or think is fun or enjoyable.

If you are going to tie people's pay to their unit test performance, the results are not going to be good.
People are going to try to game the system.
What I think you are after is:
You want people to deploy code that works and has a minimum number of bugs
You want the people that do that consistently to be rewarded
Your system will accomplish neither.
By tying people's pay to whether or not their tests fail, you are creating a disincentive to writing tests. Why would someone write code that, at beast, yields no benefit, and at worst limits their salary? The overall incentive will be to keep the size of the test bed minimal, so that the likely hood of failure is minimized.
This means that you will get more bugs, except they will be bugs you just don't know about.
It also means that you will be rewarding people that introduce bugs, rather than those that prevent them.
Basically you'll get the opposite of your objectives.

These are my initial thoughts on your four specific questions:
Tricky this one. At first glance it looks OK, but if the code passes unit test then, unless the developers are cheating (see below) or the test itself is wrong, it's difficult to see how you'd demonstrate this.
This seems like the best approach. All functions should have a unit test and inspection of the code should be able to reveal which ones are present and which are absent. However, one drawback could be that the developers write an empty test (i.e. one that just returns "passed" without actually testing anything). You might have to invest in lengthy code reviews to spot this one.
How are you going to assess quality? Who is going to assess quality? This assumes that your QA team has access to highly skilled independent developers - which may be true, but seems unlikely.
Counting the number of anything (lines of code, unit tests written) is a non starter. Developers will simply write large number of useless tests.
I agree with oxbow_lakes, and in fact the other answers that have appeared since I started writing this - most forms of measurement will be gamed or worse resented by developers.

I believe time is the only, albeit subjective, way to measure a developers performance.
Given enough time in any one company, good developers will stand out. Projectleaders will know who their best assets are. Bad developers will be exposed given enough time. Unfortunatly, therein lies the ultimate problem, enough time.

Basic psychology - People work to incentives. If my chances of getting a bonus / keeping my job / whatever are based on the number of tests I write, I'll write tons of meaningless tests - probably at the expense of actually doing my real job, which is getting a product out the door.
Any other basic metric you can come up with will suffer the same problem and be equally meaningless.
If you insist on "rating" devs, you could use something a bit more lateral. Scores on one of the MS certification tests perhaps (which has the side effect of getting people trained up). At least that's objective and independently verified by a neutral third party so you can't "game" it. Of course that score also bears no resemblance to the person's effectiveness in your team but it's better than an arbitrary internal measurement.
You might also consider running code through some sort of complexity measurement tool (simpler==better) and scoring people on their results. Again, it has the effect of helping people to become better coders, which is what you really want to achieve.

Poor Ash...
Kudos for using managerial inorance to push something completely unrelated, but now you have to come up with a feasible measure.
I cannot come up with any performance measurement that is not ridiculous or easily gamed. Unit tests cannot change it. Since Kopecks and Black Market were linked within minutes, I'd rather give you ammunition for not requiring individual performance measurements:
First, Software is an optimization between conflicting goals. Evaluating one or a few of them - like how many tests come up during QA - will lead to severe tradeoffs in other areas that hurt the final product.
Second, teamwork means more than just the product of a few individuals glued together. The synergistic effects cannot be tracked back to the effort or skill of a single individual - and when developing software in a team, they have huge impact.
Third, the total cost of software unfolds only after time. Maintenance, scalability, compatibility with new platforms, interaction with future products all carry a significant long term cost. Measuring short term cost (year-over-year, or release to production) does not cover the long term cost at all, and once the long term cost is known it is pointless to track it back to the originator.
Why not have each developer "vote" on their collegues: who helped us achieve our goals most in the last year? Why not trust you (as - apparently - their manager or lead) in judging their performance?

There should be a combination of a few factors to the unit tests that should be fairly easy for someone outside the development group to have a scorecard in terms of measuring the following:
1) How well do the unit tests cover the code and any common input data that may be entered for UI elements? This may seem like a basic thing but it is a good starting point and is something that can be quantified easily with tools like nCover I think.
2) Are there boundary conditions often tested,e.g. nulls for parameters or letters instead of numbers and other basic validation tests? This is also something that can be quantified easily by looking at parameters for various methods as well as having coding standards to prevent bypassing things here, e.g. all the object's methods besides the constructor take 0 parameters and thus have no boundary tests.
3) Granularity of a unit test. Does the test check for one specific case and not try to do lots of different cases in one test? Are test classes containing thousands of lines of code?
4) Grade the code and tests in terms of readability and maintainability. Would someone new have to spend days figuring out what is going on or is the code somewhat self-documenting? Examples would include method names and class names being meaningful and documentation being there?
That last 3 things are what I suspect a manager, team lead or someone else that is outside the group of developers could rank and handle. There may be some games to this to exploit things but the question is what end results do you want to have? I'm thinking well-documented, high quality, easily understood code = good code.

Look up Deming and Total Quality Management for his thoughts on why performance appraisals should not be done at all for any job.
How about this instead, assume all employees are acceptable employees unless proved different.
If someone does something unacceptable or does not perform to level you need, write them up as a performance problem. Determine how many writeups they get before you boot them out of the company.
If someone does something well, write them up for doing something good. If you want to offer a bonus, give it at the time the good performance happens. Even better make sure you announce when people get an attaboy. People will work towards getting them. Sure you will have the policial types who will try to game the system and get written up onthe basis of others achievements but you get that anyway in any system. By announcing who got them at the time of the good performance, you have removed the secrecy that allows the office politics players to function best. If everyone knows Joe did something great and you reward Mary instead, people will start to speak up about it. At the very least Joe and Mary might both get an attaboy.
Each year, give everyone the same percentage pay raise as you have only retained the workers who have acceptable performance and you have rewarded the outstanding employees through the year anytime they did something good.
If you are stuck with measuring, then measure how many times you wrote someone up for poor performance and how many times you wrote someone up for good performance. Then you have to be careful to be reasonably objective about it and write up even the people who aren't your ffriends when they do good and the people who are your friends when they do bad. But face it the manager is going to be subjective in the process no matter how you insist on objective criteria becasue there is no object criteria in the real world.

Definetely, and following the accepted answer unit tests are not a good way to measure development performance. They in fact can be a investment with little to no return.
… Automated tests, per se, do not increase our code quality but do need code output
– From Measuring a Developer's impact
Reporting on productivity based on code/features that makes it into production over a given time frame and making unit tests mandatory is actually a good system. The problem is that you get little feedback from it, and there might be too many excuses to meet a goal. Also, features / refactors / enhancements, can be of very different sizes and nature, so it wouldn't be fair to compare in most ocassions as relevant for the organisation.
Using a version control system, as git, we can atomize the minimum unit of valuable work into commits / PRs. Visualization (as in the quote linked above) is a better and more noble objective for management to perceive, rather than having a flat ladder or metric to compare their developers into.
Don't try to measure raw output. Try to understand developer work, go visualize it.

Related

Our code sucks and I'm powerless to fix it. Help! [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Our code sucks. Actually, let me clarify that. Our old code sucks. It's difficult to debug and is full of abstractions that few people understand or even remember. Just yesterday I spent an hour debugging in an area that I've worked for over a year and found myself thinking, "Wow, this is really painful." It's not anyone's fault - I'm sure it all made perfect sense initially. The worst part is usually It Just Works...provided you don't ask it to do anything outside of its comfort zone.
Our new code is pretty good. I think we're doing a lot of good things there. It's clear, consistent, and (hopefully) maintainable. We've got a Hudson server running for continuous integration and we have the beginnings of a unit test suite in place. The problem is our management is laser-focused on writing New Code. There's no time to give Old Code (or even old New Code) the TLC it so desperately needs. At any given moment our scrum backlog (for six developers) has about 140 items and around a dozen defects. And those numbers aren't changing much. We're adding things as fast as we can burn them down.
So what can I do to avoid the headaches of marathon debugging sessions mired in the depths of Old Code? Every sprint is filled to the brim with new development and showstopper defects. Specifically...
What can I do to help maintenance and refactoring tasks get high enough priority to be worked?
Are there any C++-specific strategies you employ to help prevent New Code from rotting so quickly?
Your management may be focused on getting working features into the product, and keeping them working. In this case, you will need to make a business case for refactoring the old stuff, in that by X investment of time and effort you can reduce necessary maintenance time by Y over period Z. Or your management may be fundamentally clueless (this happens, but less often than most developers seem to think), in which case you'll never get permission.
You need to see the business point of view. It doesn't matter to the end user whether the code is ugly or elegant, only what the software does. The cost of bad code is potential unreliability and additional difficulty in changing it; the emotional distress it causes to the programmer is rarely considered.
If you can't get permission to go in and refactor, you can always try it on your own, a little bit at a time. Whenever you fix a bug, do a little rewriting to make things clearer. This may turn out to be faster than the minimum possible fix, particularly in verifying that the code now works. Even if it isn't, it's usually possible to take a little more time on a bug fix without getting into trouble. Just don't get carried away.
If you can leave the code just a little better each time you go in, you'll feel a lot better about it.
Stand Up Meetings
I might go to my mechanic, and we have a little stand-up meeting in the morning:
I tell him I want my wheels aligned,
my tires rotated, and my oil changed.
I mention that "Oh by the way my
brakes felt a little soft on the way
in. Could [he] take a look at them?
How soon can I get my car back because
I need to get back to work?"
He pops his head under my car, pops
back up and says my brakes are leaking
oil and starting to fail. He will need
a part that will arrive at 10:30am.
His man won't finish before lunch, but
I should get my car back by 1:30pm or
so. He's booked solid so he won't be
able to do any of the other stuff
today, and I will have to book another
appointment.
I ask if he can do the other stuff and
I come back for the brake. He tells me
he really can't let me drive out of
there without fixing the brakes
because they might cause an accident,
but if I want to go to another
mechanic, he can call for a tow.
Since the car will be done so shortly
after lunch, I ask if his man can take
a late lunch so I can get my car back
an hour earlier.
He tells me his men come in at 8am and
often work into the evening.
They earn every break they
get, and his man deserves to take his
lunch with everyone else.
None of that is what I wanted to hear. I wanted to hear that I would drive out of there in a half hour with my wheels, tires and oil done.
My mechanic was just straight up and honest with me. Are you straight up and honest with your management? Or do you avoid telling them things they don't want to hear?
Unit Testing
I wouldn't touch a line of code I didn't understand, and I wouldn't check in a new line of code I didn't test thoroughly. (At least, not intentionally.)
Your question seems to imply that somehow a large corpus of poorly documented code made it past review without any unit tests. Maybe you participated in that, and maybe you didn't. Everyone involved needs to accept responsibility for that--including management. Regardless, what's done is done. You cannot go back and change it.
However, right now, in the present time, it is everybody's responsibility to stop the behavior that led to the problem in the first place. You say you spent a year working in code that you find difficult to understand and that has no unit tests. During that year, as you worked hard to improve your understanding, how many unit tests did you write to document and to verify that understanding?
As you struggled through the code slowly gaining understanding, how many comments did you add so you wouldn't have to struggle next time?
Scrum Backlog
Personally, I think the term "Scrum backlog" is a misnomer. A list of things to do is just a list--a shopping list if you will. I had a list when I went to the mechanic. My stand up meeting with the mechanic was really more of a sprint planning meeting.
A sprint planning meeting is a negotiation. If your management is time boxing without that negotiation, they aren't managing anything. They are simply trying to cram 10 lbs of shit into a 5 lb sack, and it's your responsibility to tell them so.
When you show up to a sprint planning meeting, you are expected to commit to a body of work, and it's your responsibility to prepare for that. Preparation means having some idea of what you will have to do to complete each item on the list--including the time it takes to understand obscure code and the time it takes to write unit tests.
If someone invites you to a planning meeting where you won't have time to prepare, decline the meeting and suggest when to reschedule so you will have time.
If you have an existing body of code with no unit tests and a feature might conceivably affect the operation of that code, you need to write unit tests for as much of the old code as might be affected. When you commit to writing the feature, you are committing to doing that work. If that leaves you too little time to commit to some other feature, just say so. Don't commit to the other feature.
When you commit to fix a defect, you commit to testing your work. Obviously, that means writing a unit test for the defect. But if it involves old code with no unit tests, it also means writing unit tests for things that aren't broken yet, but might break due to your change. How else will you test the fix?
If your defect list remains a constant size, your team regresses as much as it fixes. Politely explain to whomever needs to understand that unit tests prevent the regressions that currently keep your defect list from shrinking.
If you fail to write those unit tests because you commit to too many features, whose responsibility is that?
Refactoring
When you refactor code, you have to test all of it, and that means writing unit tests for all of it. If you have a large body of code with no unit tests, you will have to write all of those unit tests before you refactor.
I suggest you hold off on refactoring until those unit tests are in place. In the meantime, if you insist on including unit tests in your estimates for the work you commit to, eventually all those unit tests will be there. And then you can refactor.
The one exception to that is refactoring for testability. You may find that some of the code was not designed for test and that you have to refactor for things like dependency injection before you can create your unit tests. When you commit to writing the feature that requires the unit test, you commit to making the code testable. Include that in your estimate when you commit to the feature.
Commitment + Responsibility = Power
You say you are powerless. When you accept responsibility and commit to doing what needs doing, I think you will find you have all the power you need.
P.S. If anyone complains about anybody "wasting time" writing multiple unit tests when fixing a single defect, show them this video on the 80:20 rule and pound "defect clusters" into their brains.
It is hard to tell much from the information you give. Some questions I would have is a logical reason to be writing new code is to replace the old code. If that is what you are doing, abandon the old code.
Is it also old code that has showstopper defects? If so where are they coming from? Old code does not have "showstopper" defects, it just grinds closer and closer to a halt usually. It is old code after all - it should have the same old defects and the same old limitations, not stuff that has to be looked at right away. Showstopper defects are new code defects. It sounds like there is active development on in the old code.
If you are writing all this new code on top of old code that sucks, with no plans to fix it once and for all, sorry, there is only so much you can do when you are too busy burying yourself to dig yourself out.
If the latter is the case. you should recognize where you are headed, and try to detach a little. It is going to all collapse eventually, if you plan on being around save your strength for a worthwhile battle.
In the meantime try to pick up some design patterns. There are several that can at least help shield you new code from the old stuff, but still, ultimately it is just hard to write good code against bad code.
And your sprints sound maybe confused. Is there not an overall direction? That should determine how much backlog you have, although things can change month to month, is there not a clear sense of moving towards some final goal?
And new code rotting? The way you prevent that is you have a meaningful design, a meaningful direction, and a quality team that is committed to both the quality of their work and the vision of the design. If you have that, discipline is what maintains quality. If you don't have that sorry, you basically were writing code with no purpose already. It was basically rotten on the vine.
Not being critical, just trying to be honest. Take a deep breath. Slow down. You seem like you need it. Look at what you have written here. It tells nothing. You talk of refactor, scrums, showstoppers, defects, old code, new code. What does any of that mean? It is all jumbled up.
What about "new initiatives versus legacy systems"? "Need to refactor early sprint cycle code in terms of latest understanding etc." Are showstoppers in fact "Early components of the current enterprise initiatives have been released but are experiencing problems and no time is budgeted because of new development".
These would be meaningful concepts. You've given us nothing. I understand it is intense. My sprints are crazy too, we add a lot of back;pg items because we could not get many requirements up front (a lot of my new requirements result from having to also contend with external regulatory bodies, the normal business process is not always available).
But at the same time I am ground down by the sheer magnitude of what has to be done and the time to do it. Everything that is added to my backlog needs to be there. It is crazy, but at the same time I have a very clear idea of where I have been, where I need to go, and why the road is getter harder.
Step back, clear your thoughts, figure out the same - where you have been and where you are going. Because if you know that, it sure is not obvious. If you cannot communicate anything your peers can understand, how far are you going to get with a business manager?
Old code always sucks. There's probably some rare exceptions written by people with names like Kernighan or Thompson but, for the typical "code written in an office" stuff, over time it's gonna stink. Developers get more experienced. Newer practices, such as continuous integration, change the game. Stuff get's forgotten. New maintainers fail to grasp designs and wish for re-writes. So best accept this as normal.
Some random things that might help...
Talk about it with your team. Share your experiences and your concerns, while avoiding "man your old code sucks" (for obvious reasons) and see what the consensus is. You're probably not alone.
Forget about your managers. Don't expose them to this level of detail - they don't need to think about new vs. old code and probably won't understand if they do. This is a problem for your team to tackle and, if necessary, to make your PO aware of
Be open to the possibility that you may be able to throw stuff out. Some of that old code probably relates to features that are no longer being used or failed to be adopted by users in the first place. To make this work for you, you really need to go a level higher and think in terms of where the code really delivers user or business value vs. where it's just a ball of mud that no one is brave enough to take a decision on. Who dares, wins.
Relax your view of architectural consistency. There's always a way to tap into a working system with new code somewhere, and that may allow you to slowly migrate to a newer, smarter approach, while preserving the old long enough not to break existing things.
Overall, winning in this kind of situation is less about coding skills and much more about smart choices and handling the human aspects.
Hope that helps.
I recommend keeping track of how many bugs and code changes involve your "old code" and present this to either your manager or to your fellow developers at your next team meeting. With this in hand it should be simple enough to convince them that more needs to be done to refactor your "old code" and bring it up to par with your "new code".
It would also be prudent to document the parts of your "old code" that are most difficult to understand. These would also be the parts of your "old code" that you should be refactoring first once you get the approval.
Something to try: group your class into - say - worst 10%, best 10%, and the rest. Deliver the lists to your management, saying, "I predict the majority of bugs over the next quarter will be found in the first set." Based on length, cyclomatic complexity, test coverage - whatever tools are handy and comfortable to you. Then sit back and watch - and be right. Now you've got some credibility, some leverage when you say, "I'd like to invest some resources in making our bad code better, to reduce bugs and maintenance costs - and I know where to invest that energy, see?"
You could create diagrams and sketches of how the new code works and how the classes and functions are related to one another. You could use FreeMind or maybe Dia. And I definitely agree with Documenting and commenting your code.
I once had a problem with this too. I wrote a font class for J2ME for my own language. It was awful for these reasons that maybe you might also see in your code.
No Comments or documentation
Less object oriented
bad variable / function names
...
But after a few months I was forced to write the whole thing again. Now I've learned to use meaningful variable names that are sometimes VERY long. write comments more than writing codes. And using diagrams for the project's classes and their relationships.
I don't know If it was a real answer but it definitely worked for me. and for old codes you might actually have to reread the whole thing and add comments when you remember the functionalities.
Hope it helped.
Talk to your Product Owner! Explain that time invested in refactoring the old code will bring him benefit of higher team velocity on new features once this obstacle is removed.
Other than the approaches mentioned above which are good, you can also try these:
For keeping future code clean
Try pair programming, at least for parts that make sense. It's an effective way of getting reviewed, refactored code a practice.
Try to get refactoring onto the definition of "done". Then it will be part of the estimation process and allotted accordingly. So the definition of done might include: coded, unit tested, functionally tested, performance tested, code reviewed, refactored, and integrated (or something like this).
For Cleaning up the old code:
Unit tests are great for helping you refactor and figure out how things work.
I agree with the comments that a business case needs to be made for large-scale refactoring. But, small-scale refactoring could be easily included in the estimate and will provide immediate return. i.e.: I spend 2 hours rewriting a piece but I would have spent that time looking for bugs anyway.
You may also want to consider getting the product owner and scrummaster to capture a separate velocity for the old code vs the new code, and use that accordingly.
If there's a desired new feature and you can delineate a non-overwhelming hunk of code that is in the way, then you might be able to get management's blessing to replace the old code with new code that has the desired new feature. When I did this, I had to write a somewhat ugly shim layer to meet the old interfaces of the part of the software I wasn't going to touch. And a test harness that could exercise the existing code and exercise the new code to make sure the new code, as seen through the shim layer, could fool the rest of the application into thinking nothing had changed. By reworking the portion we reworked, we were able to show huge performance benefits, compatibility with desired new hardware, reduction in each of our field site's needs for expertise in administering space for the application - and the new code was much more maintainable. That last point mattered not a whit to the users, but the other advantages from the rework were attractive enough to "sell" the users on the merits of a somewhat painful database conversion.
Another more modest success story: we had a decent trouble tracking system that had literally years of history. There was a subsystem of our application that was famed for the speed with which it would burn out maintenance programmers. Clearly (well, clearly in my mind) it was in need of a major re-write, but management wasn't enthused about that. We were able to dig through the history in the trouble tracking data to show the staffing level that had gone into maintaining this module, and for all that effort, the trouble tickets per month against that module continued to arrive at a constant rate. When faced with actual data like that, even the reluctant managers who had long been tight-fisted about staffing re-work of that subsystem could see the merit of assigning staff to rework that module.
The approach as before was to leave the input and output of that module alone. The good news was that throwing virtual memory at the new code with its fancy new data structures did give a noticeable performance improvement to the module. The bad news is that we were nearly done with the re-implementation before we really understood what was wrong in the original implementation such that it did work most of the time, but managed to fail on some of the transactions on some days. The first cut faithfully reproduced those bugs, but the bugs were easier to understand in the reworked code so we now had a shot at really fixing the real problem. In retrospect, maybe we'd have been smarter to have captured data that produced the problems and have taken better care to make sure the reworked version didn't reproduce that problem. But, the truth is, nobody understood the problem until we were quite far along on the re-write. So, the re-write gave improved performance to the users and improved understanding to the current programmers, such that the real problem could really be resolved at last.
A fail example: There was yet another incredibly ugly module that persistently was a sore spot. Alas, I wasn't clever enough to be able to understand the defacto interfaces to this particular wretched hive of scum and villainy, at least not in the time frame of the nominal release schedule. I'd like to believe that given more time we could have figured out a suitable plan for re-working that piece of the system too, and maybe once we understood it, we could even identify user-desired improvements that we could fit into the re-write. But I can't promise that you'll find a prize in every box. If the box is entirely obscure to you, slicing away a chunk of it and replacing that piece with clean code is hard to do. The guy who had charge of that module is probably the one who was best positioned to figure out a plan of attack, but he saw the frequent crashes and calls from the field for assistance as "job security". I don't think management ever really recognized that he needed to be eased aside for someone with a hunger for change, but that's what probably was needed.
Drew

Why do code quality discussions evoke strong reactions? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I like my code being in order, i.e. properly formatted, readable, designed, tested, checked for bugs, etc. In fact I am fanatic about it. (Maybe even more than fanatic...) But in my experience actions helping code quality are hardly implemented. (By code quality I mean the quality of the code you produce day to day. The whole topic of software quality with development processes and such is much broader and not the scope of this question.)
Code quality does not seem popular. Some examples from my experience include
Probably every Java developer knows JUnit, almost all languages implement xUnit frameworks, but in all companies I know, only very few proper unit tests existed (if at all). I know that it's not always possible to write unit tests due to technical limitations or pressing deadlines, but in the cases I saw, unit testing would have been an option. If a developer wanted to write some tests for his/her new code, he/she could do so. My conclusion is that developers do not want to write tests.
Static code analysis is often played around in small projects, but not really used to enforce coding conventions or find possible errors in enterprise projects. Usually even compiler warnings like potential null pointer access are ignored.
Conference speakers and magazines would talk a lot about EJB3.1, OSGI, Cloud and other new technologies, but hardly about new testing technologies or tools, new static code analysis approaches (e.g. SAT solving), development processes helping to maintain higher quality, how some nasty beast of legacy code was brought under test, ... (I did not attend many conferences and it probably looks different for conferences on agile topics, as unit testing and CI and such has a higher value there.)
So why is code quality so unpopular/considered boring?
EDIT:
Thank your for your answers. Most of them concern unit testing (and has been discussed in a related question). But there are lots of other things that can be used to keep code quality high (see related question). Even if you are not able to use unit tests, you could use a daily build, add some static code analysis to your IDE or development process, try pair programming or enforce reviews of critical code.
One obvious answer for the Stack Overflow part is that it isn't a forum. It is a database of questions and answers, which means that duplicate questions are attempted avoided.
How many different questions about code quality can you think of? That is why there aren't 50,000 questions about "code quality".
Apart from that, anyone claiming that conference speakers don't want to talk about unit testing or code quality clearly needs to go to more conferences.
I've also seen more than enough articles about continuous integration.
There are the common excuses for not
writing tests, but they are only
excuses. If one wants to write some
tests for his/her new code, then it is
possible
Oh really? Even if your boss says "I won't pay you for wasting time on unit tests"?
Even if you're working on some embedded platform with no unit testing frameworks?
Even if you're working under a tight deadline, trying to hit some short-term goal, even at the cost of long-term code quality?
No. It is not "always possible" to write unit tests. There are many many common obstacles to it. That's not to say we shouldn't try to write more and better tests. Just that sometimes, we don't get the opportunity.
Personally, I get tired of "code quality" discussions because they tend to
be too concerned with hypothetical examples, and are far too often the brainchild of some individual, who really hasn't considered how aplicable it is to other people's projects, or codebases of different sizes than the one he's working on,
tend to get too emotional, and imbue our code with too many human traits (think of the term "code smell", for a good example),
be dominated by people who write horrible bloated, overcomplicated and verbose code with far too many layers of abstraction, or who'll judge whether code is reusable by "it looks like I can just take this chunk of code and use it in a future project", rather than the much more meaningful "I have actually been able to take this chunk of code and reuse it in different projects".
I'm certainly interested in writing high quality code. I just tend to be turned off by the people who usually talk about code quality.
Code review is not an exact science. Metrics used are somehow debatable. Somewhere on that page : "You can't control what you can't measure"
Suppose that you have one huge function of 5000 lines with 35 parameters. You can unit test it how much you want, it might do exactly what it is supposed to do. Whatever the inputs are. So based on unit testing, this function is "perfect". Besides correctness, there are tons of others quality attributes you might want to measure. Performance, scalability, maintainability, usability and such. Did you ever wondered why software maintenance is such a nightmare?
Real software projects quality control goes far beyond simply checking if the code is correct. If you check the V-Model of software development, you'll notice that coding is only a small part of the whole equation.
Software quality control can go to as far as 60% of the whole cost of your project. This is huge. Instead, people prefer to cut to 0% and go home thinking they made the right choice. I think the real reason why so little time is dedicated to software quality is because software quality isn't well understood.
What is there to measure?
How do we measure it?
Who will measure it?
What will I gain/lose from measuring it?
Lots of coder sweatshops do not realise the relation between "less bugs now" and "more profit later". Instead, all they see is "time wasted now" and "less profit now". Even when shown pretty graphics demonstrating the opposite.
Moreover, software quality control and software engineering as a whole is a relatively new discipline. A lot of the programming space so far has been taken by cyber cowboys. How many times have you heard that "anyone" can program? Anyone can write code that's for sure, but it's not everyone who can be a programmer.
EDIT *
I've come across this paper (PDF) which is from the guy who said "You can't control what you can't measure". Basically he's saying that controlling everything is not as desirable as he first thought it would be. It is not an exact cooking recipe that you can blindly apply to all projects like the software engineering schools want to make you think. He just adds another parameter to control which is "Do I want to control this project? Will it be needed?"
Laziness / Considered boring
Management feeling it's unnecessary -
Ignorant "Just do it right" attitude.
"This small project doesn't need code
quality management" turns into "Now
it would be too costly to implement
code quality management on this large
project"
I disagree that it's dull though. A solid unit testing design makes creating tests a breeze and running them even more fun.
Calculating vector flow control - PASSED
Assigning flux capacitor variance level - PASSED
Rerouting superconductors for faster dialing sequence - PASSED
Running Firefly hull checks - PASSED
Unit tests complete. 4/4 PASSED.
Like anything it can get boring if you do too much of it but spending 10 or 20 minutes writing some random tests for some complex functions after several hours of coding isn't going to suck the creative life from you.
Why is code quality so unpopular?
Because our profession is unprofessional.
However, there are people who do care about code quality. You can find such-minded people for example from the Software Craftsmanship movement's discussion group. But unfortunately the majority of people in software business do not understand the value of code quality, or do not even know what makes up good code.
I guess the answer is the same as to the question 'Why is code quality not popular?'
I believe the top reasons are:
Laziness of the developers. Why invest time in preparing unit tests, review the solution, if it's already implemented?
Improper management. Why ask the developers to cope with code quality, if there are thousands of new feature requests and the programmers could simply implement something instead of taking care of quality of something already implemented.
Short answer: It's one of those intangibles only appreciated by other, mainly experienced, developers and engineers unless something goes wrong. At which point managers and customers are in an uproar and demand why formal processes weren't in place.
Longer answer: This short-sighted approach isn't limited to software development. The American automotive industry (or what's left of it) is probably the best example of this.
It's also harder to justify formal engineering processes when projects start their life as one-off or throw-away. Of course, long after the project is done, it takes a life of its own (and becomes prominent) as different business units start depending on it for their own business process.
At which point a new solution needs to be engineered; but without practice in using these tools and good-practices, these tools are less than useless. They become a time-consuming hindrance. I see this situation all too often in companies where IT teams are support to the business, where development is often reactionary rather than proactive.
Edit: Of course, these bad habits and many others are the real reason consulting firms like Thought Works can continue to thrive as well as they do.
One big factor that I didn't see mentioned yet is that any process improvement (unit testing, continuos integration, code reviews, whatever) needs to have an advocate within the organization who is committed to the technology, has the appropriate clout within the organization, and is willing to do the work to convince others of the value.
For example, I've seen exactly one engineering organization where code review was taken truly seriously. That company had a VP of Software who was a true believer, and he'd sit in on code reviews to make sure they were getting done properly. They incidentally had the best productivity and quality of any team I've worked with.
Another example is when I implemented a unit-testing solution at another company. At first, nobody used it, despite management insistence. But several of us made a real effort to talk up unit testing, and to provide as much help as possible for anyone who wanted to start unit testing. Eventually, a couple of the most well-respected developers signed on, once they started to see the advantages of unit testing. After that, our testing coverage improved dramatically.
I just thought of another factor - some tools take a significant amount of time to get started with, and that startup time can be hard to come by. Static analysis tools can be terrible this way - you run the tool, and it reports 2,000 "problems", most of which are innocuous. Once you get the tool configured properly, the false-positive problem get substantially reduced, but someone has to take that time, and be committed to maintaining the tool configuration over time.
Probably every Java developer knows JUnit...
While I believe most or many developers have heard of JUnit/nUnit/other testing frameworks, fewer know how to write a test using such a framework. And from those, very few have a good understanding of how to make testing a part of the solution.
I've known about unit testing and unit test frameworks for at least 7 years. I tried using it in a small project 5-6 years ago, but it is only in the last few years that I've learned how to do it right. (ie. found a way that works for me and my team...)
For me some of those things were:
Finding a workflow that accomodates unit testing.
Integrating unit testing in my IDE, and having shortcuts to run/debug tests.
Learning how to test what. (Like how to test logging in or accessing files. How to abstract yourself from the database. How to do mocking and use a mocking framework. Learn techniques and patterns that increase testability.)
Having some tests is better than having no tests at all.
More tests can be written later when a bug is discovered. Write the test that proves the bug, then fix the bug.
You'll have to practice to get good at it.
So until finding the right way; yeah, it's dull, non rewarding, hard to do, time consuming, etc.
EDIT:
In this blogpost I go in depth on some of the reasons given here against unit testing.
Code Quality is unpopular? Let me dispute that fact.
Conferences such as Agile 2009 have a plethora of presentations on Continuous Integration, and testing techniques and tools. Technical conference such as Devoxx and Jazoon also have their fair share of those subjects.
There is even a whole conference dedicated to Continuous Integration & Testing (CITCON, which takes place 3 times a year on 3 continents).
In fact, my personal feeling is that those talks are so common, that they are on the verge of being totally boring to me.
And in my experience as a consultant, consulting on code quality techniques & tools is actually quite easy to sell (though not very highly paid).
That said, though I think that Code Quality is a popular subject to discuss, I would rather agree with the fact that developers do not (in general) do good, or enough, tests. I do have a reasonably simple explanation to that fact.
Essentially, it boils down to the fact that those techniques are still reasonably new (TDD is 15 years old, CI less than 10) and they have to compete with 1) managers, 2) developers whose ways "have worked well enough so far" (whatever that means).
In the words of Geoffrey Moore, modern Code Quality techniques are still early in the adoption curve. It will take time until the entire industry adopts them.
The good news, however, is that I now meet developers fresh from university that have been taught TDD and are truly interested in it. That is a recent development. Once enough of those have arrived on the market, the industry will have no choice but to change.
It's pretty simple when you consider the engineering adage "Good, Fast, Cheap: pick two". In my experience 98% of the time, it's Fast and Cheap, and by necessity the other must suffer.
It's the basic psychology of pain. When you'ew running to meet a deadline code quality takes the last seat. We hate it because it's dull and boring.
It reminds me of this Monty Python skit:
"Exciting? No it's not. It's dull. Dull. Dull. My God it's dull, it's so desperately dull and tedious and stuffy and boring and des-per-ate-ly DULL. "
I'd say for many reasons.
First of all, if the application/project is small or carries no really important data at a large scale the time needed to write the tests is better used to write the actual application.
There is a threshold where the quality requirements are of such a level that unit testing is required.
There is also the problem of many methods not being easily testable. They may rely on data in a database or similar, which creates the headache of setting up mockup data to be fed to the methods. Even if you set up mockup data - can you be certain the database would behave the same way?
Unit testing is also weak at finding problems that haven't been considered. That is, unit testing is bad at simulating the unexpected. If you haven't considered what could happen in a power outage, if the network link sends bad data that is still CRC correct. Writing tests for this is futile.
I am all in favour of code inspections as they let programmers share experience and code style from other programmers.
"There are the common excuses for not writing tests, but they are only excuses."
Are they? Get eight programmers in a room together, ask them a question about how best to maintain code quality, and you're going to get nine different answers, depending on their age, education and preferences. 1970s era Computer Scientists would've laughed at the notion of unit testing; I'm not sure they would've been wrong to.
Management needs to be sold on the value of spending more time now to save time down the road. Since they can't actually measure "bugs not fixed", they're often more concerned about meeting their immediate deadlines & ship date than the longterm quality off the project.
Code quality is subjective. Subjective topics are always tedious.
Since the goal is simply to make something that works, code quality always comes in second. It adds time and cost. (I'm not saying that it should not be considered a good thing though.)
99% of the time, there are no third party consquences for poor code quality (unless you're making spaceshuttle or train switching software).
Does it work? = Concrete.
Is it pretty? = In the eye of the beholder.
Read Fred Brooks' The Mythical Man Month. There is no silver bullet.
Unit Testing takes extra work. If a programmer sees that his product "works" (eg, no unit testing), why do any at all? Especially when it is not nearly as interesting as implementing the next feature in the program, etc. Most people just tend to be lazy when it comes down to it, which isn't quite a good thing...
Code quality is context specific and hard to generalize no matter how much effort people try to make it so.
It's similar to the difference between theory and application.
I also have not seen unit tests written on a regular basis. The reason for that was given as the code being too extensively changed at the beginning of the project so everyone dropped writing unit tests until everything got stabilized. After that everyone was happy and not in need of unit tests. So we have a few tests stay there as a history but they are not used and are probably not compatible with the current code.
I personally see writing unit tests for big projects as not feasible, although I admit I have not tried it nor talked to people who did. There are so many rules in business logic that if you just change something somewhere a little bit you have no way of knowing which tests to update beyond those that will crash. Who knows, the old tests may now not cover all possibilities and it takes time to recollect what was written five years ago.
The other reason being the lack of time. When you have a task assigned where it says "Completion time: O,5 man/days", you only have time to implement it and shallow test it, not to think of all possible cases and relations to other project parts and write all the necessary tests. It may really take 0,5 days to implement something and a couple of weeks to write the tests. Unless you were specifically given an order to create the tests, nobody will understand that tremendous loss of time, which will result in yelling/bad reviews. And no, for our complex enterprise application I cannot think of a good test coverage for a task in five minutes. It will take time and probably a very deep knowledge of most application modules.
So, the reasons as I see them is time loss which yields no useful features and the nightmare to maintain/update old tests to reflect new business rules. Even if one wanted to, only experienced colleagues could write those tests - at least one year deep involvement in the project, but two-three is really needed. So new colleagues do not manage proper tests. And there is no point in creating bad tests.
It's 'dull' to catch some random 'feature' with extreme importance for more than a day in mysterious code jungle wrote by someone else x years ago without any clue what's going wrong, why it's going wrong and with absolutely no ideas what could fix it when it was supposed to end in a few hours. And when it's done, no one is satisfied cause of huge delay.
Been there - seen that.
A lot of the concepts that are emphasized in modern writing on code quality overlook the primary metric for code quality: code has to be functional first and foremost. Everything else is just a means to that end.
Some people don't feel like they have time to learn the latest fad in software engineering, and that they can write high-quality code already. I'm not in a place to judge them, but in my opinion it's very difficult for your code to be used over long periods of time if people can't read, understand and change it.
Lack of 'code quality' doesn't cost the user, the salesman, the architect nor the developer of the code; it slows down the next iteration, but I can think of several successful products which seem to be made out of hair and mud.
I find unit testing to make me more productive, but I've seen lots of badly formatted, unreadable poorly designed code which passed all its tests ( generally long-in-the-tooth code which had been patched many times ). By passing tests you get a road-worthy Skoda, not the craftsmanship of a Bristol. But if you have 'low code quality' and pass your tests and consistently fulfill the user's requirements, then that's a valid business model.
My conclusion is that developers do not want to write tests.
I'm not sure. Partly, the whole education process in software isn't test driven, and probably should be - instead of asking for an exercise to be handed in, give the unit tests to the students. It's normal in maths questions to run a check, why not in software engineering?
The other thing is that unit testing requires units. Some developers find modularisation and encapsulation difficult to do well. A good technical lead will create a modular architecture which localizes the scope of a unit, so making it easy to test in isolation; many systems don't have good architects who facilitate testability, or aren't refactored regularly enough to reduce inter-unit coupling.
It's also hard to test distributed or GUI driven applications, due to inherent coupling. I've only been in one team that did that well, and that had as large a test department as a development department.
Static code analysis is often played around in small projects, but not really used to enforce coding conventions or find possible errors in enterprise projects.
Every set of coding conventions I've seen which hasn't been automated has been logically inconsistent, sometimes to the point of being unusable - even ones claimed to have been used 'successfully' in several projects. Non-automatic coding standards seem to be political rather than technical documents.
Usually even compiler warnings like potential null pointer access are ignored.
I've never worked in a shop where compiler warnings were tolerated.
One attitude that I have met rather often (but never from programmers that were already quality-addicts) is that writing unit tests just forces you to write more code without getting any extra functionality for the effort. And they think that that time would be better spent adding functionality to the product instead of just creating "meta code".
That attitude usually wears off as unit tests catch more and more bugs that you realize would be serious and hard to locate in a production environment.
A lot of it arises when programmers forget, or are naive, and act like their code won't be viewed by somebody else at a later date (or themselves months/years down the line).
Also, commenting isn't near as "cool" as actually writing a slick piece of code.
Another thing that several people have touched on is that most development engineers are terrible testers. They don't have the expertise or mind-set to effectively test their own code. This means that unit testing doesn't seem very valuable to them - since all of their code always passes unit tests, why bother writing them?
Education and mentoring can help with that, as can test-driven development. If you write the tests first, you're at least thinking primarily about testing, rather than trying to get the tests done, so you can commit the code...
The likelyhood of you being replaced by a cheaper fresh out of college student or outsource worker is directly proportional to the readability of your code.
People don't have a common sense of what "good" means for code. A lot of people will drop to the level of "I ran it" or even "I wrote it."
We need to have some kind of a shared sense of what good code is, and whether it matters. For the first part of that,I have written up some thoughts:
http://agileinaflash.blogspot.com/2010/02/seven-code-virtues.html
As for whether it matters, that's been covered plenty of times. It matters quite a lot if your code is to live very long. If it really won't ever sell or won't be deployed, then it clearly doesn't. If it's not worth doing, it's not worth doing well.
But if you don't practice writing virtuous code, then you can't do it when it matters. I think people have practiced doing poor work, and don't know anything else.
I think code quality is over-rated. the more I do it the less it means to me. Code quality frameworks prefer over-complicated code. You never see errors like "this code is too abstract, no one will understand it.", but for example PMD says that I have too many methods in my class. So I should cut the class into abstract class/classes (the best way since PMD doesn't care what I do) or cut the classes based on functionality (worst way since it might still have too many methods - been there).
Static Analysis is really cool, however it's just warnings. For example FindBugs has problem with casting and you should use instaceof to make warning go away. I don't do that just to make FindBugs happy.
I think too complicated code is not when method has 500 lines of code, but when method is using 500 other methods and many abstractions just for fun. I think code quality masters should really work on finding when code is too complicated and don't care so much about little things (you can refactor them with the right tools really quickly.).
I don't like idea of code coverage since it's really useless and makes unit-test boring. I always test code with complicated functionality, but only that code. I worked in a place with 100% code coverage and it was a real nightmare to change anything. Because when you change anything you had to worry about broken (poorly written) unit-tests and you never know what to do with them, many times we just comment them out and add todo to fix them later.
I think unit-testing has its place and for example I did a lot of unit-testing in my webpage parser, because all the time I found diffrent bugs or not supported tags. Testing Database programs is really hard if you want to also test database logic, DbUnit is really painful to work with.
I don't know. Have you seen Sonar? Sure it is Maven specific, but point it at your build and boom, lots of metrics. That's the kind of project that will facilitate these code quality metrics going mainstream.
I think that real problem with code quality or testing is that you have to put a lot of work into it and YOU get nothing back. less bugs == less work? no, there's always something to do. less bugs == more money? no, you have to change job to get more money. unit-testing is heroic, you only do it to feel better about yourself.
I work at place where management is encouraging unit-testing, however I am the only person that writes tests(i want to get better at it, its the only reason I do it). I understand that for others writing tests is just more work and you get nothing in return. surfing the web sounds cooler than writing tests.
someone might break your tests and say he doesn't know how to fix or comment it out(if you use maven).
Frameworks are not there for real web-app integration testing(unit test might pass, but it might not work on a web page), so even if you write test you still have to test it manually.
You could use framework like HtmlUnit, but its really painful to use. Selenium breaks with every change on a webpage. SQL testing is almost impossible(You can do it with DbUnit, but first you have to provide test data for it. test data for 5 joins is a lot of work and there is no easy way to generate it). I dont know about your web-framework, but the one we are using really likes static methods, so you really have to work to test the code.

How does Test Driven Design help a one man software project?

I've spent a lot of time building out tests for my latest project, and I'm really not sure what the ROI was on the time spent.
I'm a one man operation, and I'm building web applications. I don't necessarily have to "prove" that my software works to anyone (except my users), and I'm worried that I spent a good deal of time needlessly rebugging test code in the past months.
My question is, while I like the idea of TDD for small to large software teams, how does it help a one man team build high quality code quickly?
Thanks
=> ran across this today, from the blog of joel spolsky, one of the founders of stackoverflow:
http://www.joelonsoftware.com/items/2009/09/23.html
"Zawinski didn’t do many unit tests. They “sound great in principle. Given a leisurely development pace, that’s certainly the way to go. But when you’re looking at, ‘We’ve got to go from zero to done in six weeks,’ well, I can’t do that unless I cut something out. And what I’m going to cut out is the stuff that’s not absolutely critical. And unit tests are not critical. If there’s no unit test the customer isn’t going to complain about that.”"
as i'm getting older i think i'm realizing more and more that it's just all about speed and functionality. i'd love to build unit tests. but since we only have so much time at our disposal, i'd rather build it faster, and rely on beta testing and good automated error reporting to weed out any problems as they crop up. if the project eventually gets big enough that this bites me in the a**, it will be generating enough revenue that i can justify a rebuild.
I think that a situation like yours it helps greatly when you have to change/refactor/optimize something on which a lot of code depends... By using unit-testing you can quickly ensure that everything that worked before the change, still works afterwards :) In other words, it gives you confidence.
TDD doesn't really have anything to do with team size. It has to do with creating the smallest amount of software needed with the right interface that works correctly.
The TDD process requires you to write only just enough code to satisfy a test, so you don't end up creating code you don't need.
Using TDD to design a class makes you think as a client of the class, so you end up creating a better interface more often than if you developed it without TDD.
TDD, by its nature will acheive 100% code coverage, proving your code works. A side-effect of this is that you and others can now more safely change your class because it has a full suite of automated tests.
I should add that its iterative nature creates a positive feedback loop as well, so as you iterate you gain more and more confidence in your code.
TDD is not only about testing, it is also about designing your classes / API.
I mean: by writing a test first, you are forced to think on how you want to use your class. So, you first think about the interface of your class, how you want to use your class, and hence, your object model becomes more usable and readable.
rebugging is always needless - just don't delete the bugs in the first place...
For a real answer, you can't do better than 'it depends'. If:
you don't tend to have the kind of
problems that automated unit testing can
find (as opposed to performance, visual or
aesthetic ones)
you have some other way of designing the code (e.g. UML)
you don't tend to have cause to change things while keeping them working
It could well be the case that TDD doesn't really work out for you.
Or maybe you are doing it wrong, and if you did it differently it would work better.
Or, just maybe, it is actually working but you don't realise it. One thing about working solo is that self-assessment is difficult.
In general, people's self-views hold
only a tenuous to modest relationship
with their actual behavior and
performance. The correlation between
self-ratings of skill and actual
performance in many domains is
moderate to meager—indeed, at times,
other people's predictions of a
person's outcomes prove more accurate
than that person's self-predictions.
In addition, people overrate
themselves. On average, people say
that they are "above average" in skill
(a conclusion that defies statistical
possibility), overestimate the
likelihood that they will engage in
desirable behaviors and achieve
favorable outcomes, furnish overly
optimistic estimates of when they will
complete future projects, and reach
judgments with too much confidence.
Several psychological processes
conspire to produce flawed
self-assessments.
In theory team size should not matter. TDD is supposed to pay off because:
You test the code so you get the bugs out
When you maintain or refactor the code you know you didn't break it because you easily can test it
You produce better code because you focus on edge cases as you write the code
You produce better code because you can refactor the code with confidence
And generally I do find that the approach is valuable. I must admit to being in two minds about the ongoing maintenance of some tests.
Even before the term TDD became popular, I've always written a little main function to test whatever piece I was working on. I'd throw it away right afterwards. I have trouble understanding the mentality of programmers who could write code that never been executed and plug it in. When you find a "Bug" after doing that a few times it can take days or even weeks to track down.
Unit testing is a slightly better way to go because your tests hang around and you can see the intent.
Unit testing will find the bug much faster than testing from a UI after integration--it'll just save you time.
Now saving all your unit tests and creating a suite can be of less value if you are single, especially if you like to refactor a lot (You can easily spend more time refactoring tests than code), but the tests are still worth while to create.
Plus, test-driven development is kinda fun to a degree.
I've found that people concentrate too much on the TEST in TDD. There are many who don't believe that this is what the originators had in mind.
I've found that BDD is quite useful, no matter what the problem size. It concentrates on the gathering of how the system is supposed to behave.
Some people take it all the way to the creation of automated unit tests. I use them as both specifications and test cases. Because they're in English, it is easy for the business to understand them, as well as the QA department.
In general, it is a formalized way of recording specs, so that code can be written. Isn't that what the ultimate goal is?
Here are a few links
What's in a Story
Introducing BDD
Designing Klingon Warships Using Behaviour Driven Development

Disadvantages of Test Driven Development? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
What do I lose by adopting test driven design?
List only negatives; do not list benefits written in a negative form.
If you want to do "real" TDD (read: test first with the red, green, refactor steps) then you also have to start using mocks/stubs, when you want to test integration points.
When you start using mocks, after a while, you will want to start using Dependency Injection (DI) and a Inversion of Control (IoC) container. To do that you need to use interfaces for everything (which have a lot of pitfalls themselves).
At the end of the day, you have to write a lot more code, than if you just do it the "plain old way". Instead of just a customer class, you also need to write an interface, a mock class, some IoC configuration and a few tests.
And remember that the test code should also be maintained and cared for. Tests should be as readable as everything else and it takes time to write good code.
Many developers don't quite understand how to do all these "the right way". But because everybody tells them that TDD is the only true way to develop software, they just try the best they can.
It is much harder than one might think. Often projects done with TDD end up with a lot of code that nobody really understands. The unit tests often test the wrong thing, the wrong way. And nobody agrees how a good test should look like, not even the so called gurus.
All those tests make it a lot harder to "change" (opposite to refactoring) the behavior of your system and simple changes just becomes too hard and time consuming.
If you read the TDD literature, there are always some very good examples, but often in real life applications, you must have a user interface and a database. This is where TDD gets really hard, and most sources don't offer good answers. And if they do, it always involves more abstractions: mock objects, programming to an interface, MVC/MVP patterns etc., which again require a lot of knowledge, and... you have to write even more code.
So be careful... if you don't have an enthusiastic team and at least one experienced developer who knows how to write good tests and also knows a few things about good architecture, you really have to think twice before going down the TDD road.
Several downsides (and I'm not claiming there are no benefits - especially when writing the foundation of a project - it'd save a lot of time at the end):
Big time investment. For the simple case you lose about 20% of the actual implementation, but for complicated cases you lose much more.
Additional Complexity. For complex cases your test cases are harder to calculate, I'd suggest in cases like that to try and use automatic reference code that will run in parallel in the debug version / test run, instead of the unit test of simplest cases.
Design Impacts. Sometimes the design is not clear at the start and evolves as you go along - this will force you to redo your test which will generate a big time lose. I would suggest postponing unit tests in this case until you have some grasp of the design in mind.
Continuous Tweaking. For data structures and black box algorithms unit tests would be perfect, but for algorithms that tend to be changed, tweaked or fine tuned, this can cause a big time investment that one might claim is not justified. So use it when you think it actually fits the system and don't force the design to fit to TDD.
When you get to the point where you have a large number of tests, changing the system might require re-writing some or all of your tests, depending on which ones got invalidated by the changes. This could turn a relatively quick modification into a very time-consuming one.
Also, you might start making design decisions based more on TDD than on actually good design prinicipals. Whereas you may have had a very simple, easy solution that is impossible to test the way TDD demands, you now have a much more complex system that is actually more prone to mistakes.
I think the biggest problem for me is the HUGE loss of time it takes "getting in to it". I am still very much at the beginning of my journey with TDD (See my blog for updates my testing adventures if you are interested) and I have literally spent hours getting started.
It takes a long time to get your brain into "testing mode" and writing "testable code" is a skill in itself.
TBH, I respectfully disagree with Jason Cohen's comments on making private methods public, that's not what it is about. I have made no more public methods in my new way of working than before. It does, however involve architectural changes and allowing for you to "hot plug" modules of code to make everything else easier to test. You should not be making the internals of your code more accessible to do this. Otherwise we are back to square one with everything being public, where is the encapsulation in that?
So, (IMO) in a nutshell:
The amount of time taken to think (i.e. actually grok'ing testing).
The new knowledge required of knowing how to write testable code.
Understanding the architectural changes required to make code testable.
Increasing your skill of "TDD-Coder" while trying to improve all the other skills required for our glorious programming craft :)
Organising your code base to include test code without screwing your production code.
PS: If you would like links to positives, I have asked and answered several questions on it, check out my profile.
In the few years that I've been practicing Test Driven Development, I'd have to say the biggest downsides are:
Selling it to management
TDD is best done in pairs. For one, it's tough to resist the urge to just write the implementation when you KNOW how to write an if/else statement. But a pair will keep you on task because you keep him on task. Sadly, many companies/managers don't think that this is a good use of resources. Why pay for two people to write one feature, when I have two features that need to be done at the same time?
Selling it to other developers
Some people just don't have the patience for writing unit tests. Some are very proud of their work. Or, some just like seeing convoluted methods/functions bleed off the end of the screen. TDD isn't for everyone, but I really wish it were. It would make maintaining stuff so much easier for those poor souls who inherit code.
Maintaining the test code along with your production code
Ideally, your tests will only break when you make a bad code decision. That is, you thought the system worked one way, and it turns out it didn't. By breaking a test, or a (small) set of tests, this is actually good news. You know exactly how your new code will affect the system. However, if your tests are poorly written, tightly coupled or, worse yet, generated (cough VS Test), then maintaining your tests can become a choir quickly. And, after enough tests start to cause more work that the perceived value they are creating, then the tests will be the first thing to be deleted when schedules become compressed (eg. it gets to crunch time)
Writing tests so that you cover everything (100% code coverage)
Ideally, again, if you adhere to the methodology, your code will be 100% tested by default. Typically, thought, I end up with code coverage upwards of 90%. This usually happens when I have some template style architecture, and the base is tested, and I try to cut corners and not test the template customizations. Also, I have found that when I encounter a new barrier I hadn't previously encountered, I have a learning curve in testing it. I will admit to writing some lines of code the old skool way, but I really like to have that 100%. (I guess I was an over achiever in school, er skool).
However, with that I'd say that the benefits of TDD far outweigh the negatives for the simple idea that if you can achieve a good set of tests that cover your application but aren't so fragile that one change breaks them all, you will be able to keep adding new features on day 300 of your project as you did on day 1. This doesn't happen with all those who try TDD thinking it's a magic bullet to all their bug-ridden code, and so they think it can't work, period.
Personally I have found that with TDD, I write simpler code, I spend less time debating if a particular code solution will work or not, and that I have no fear to change any line of code that doesn't meet the criteria set forth by the team.
TDD is a tough discipline to master, and I've been at it for a few years, and I still learn new testing techniques all the time. It is a huge time investment up front, but, over the long term, your sustainability will be much greater than if you had no automated unit tests. Now, if only my bosses could figure this out.
On your first TDD project there are two big losses, time and personal freedom
You lose time because:
Creating a comprehensive, refactored, maintainable suite of unit and acceptance tests adds major time to the first iteration of the project. This may be time saved in the long run but equally it can be time you don't have to spare.
You need to choose and become expert in a core set of tools. A unit testing tool needs to be supplemented by some kind of mocking framework and both need to become part of your automated build system. You also want to pick and generate appropriate metrics.
You lose personal freedom because:
TDD is a very disciplined way of writing code that tends to rub raw against those at the top and bottom of the skills scale. Always writing production code in a certain way and subjecting your work to continual peer review may freak out your worst and best developers and even lead to loss of headcount.
Most Agile methods that embed TDD require that you talk to the client continually about what you propose to accomplish (in this story/day/whatever) and what the trade offs are. Once again this isn't everyone's cup of tea, both on the developers side of the fence and the clients.
Hope this helps
TDD requires you to plan out how your classes will operate before you write code to pass those tests. This is both a plus and a minus.
I find it hard to write tests in a "vacuum" --before any code has been written. In my experience I tend to trip over my tests whenever I inevitably think of something while writing my classes that I forgot while writing my initial tests. Then it's time to not only refactor my classes, but ALSO my tests. Repeat this three or four times and it can get frustrating.
I prefer to write a draft of my classes first then write (and maintain) a battery of unit tests. After I have a draft, TDD works fine for me. For example, if a bug is reported, I will write a test to exploit that bug and then fix the code so the test passes.
Prototyping can be very difficult with TDD - when you're not sure what road you're going to take to a solution, writing the tests up-front can be difficult (other than very broad ones). This can be a pain.
Honestly I don't think that for "core development" for the vast majority of projects there's any real downside, though; it's talked down a lot more than it should be, usually by people who believe their code is good enough that they don't need tests (it never is) and people who just plain can't be bothered to write them.
Well, and this stretching, you need to debug your tests. Also, there is a certain cost in time for writing the tests, though most people agree that it's an up-front investment that pays off over the lifetime of the application in both time saved debugging and in stability.
The biggest problem I've personally had with it, though, is getting up the discipline to actually write the tests. In a team, especially an established team, it can be hard to convince them that the time spent is worthwhile.
The downside to TDD is that it is usually tightly associated with 'Agile' methodology, which places no importance on documentation of a system, rather the understanding behind why a test 'should' return one specific value rather than any other resides only in the developer's head.
As soon as the developer leaves or forgets the reason that the test returns one specific value and not some other, you're screwed. TDD is fine IF it is adequately documented and surrounded by human-readable (ie. pointy-haired manager) documentation that can be referred to in 5 years when the world changes and your app needs to as well.
When I speak of documentation, this isn't a blurb in code, this is official writing that exists external to the application, such as use cases and background information that can be referred to by managers, lawyers and the poor sap who has to update your code in 2011.
I've encountered several situations where TDD makes me crazy. To name some:
Test case maintainability:
If you're in a big enterprise, many chances are that you don't have to write the test cases yourself or at least most of them are written by someone else when you enter the company. An application's features changes from time to time and if you don't have a system in place, such as HP Quality Center, to track them, you'll turn crazy in no time.
This also means that it'll take new team members a fair amount of time to grab what's going on with the test cases. In turn, this can be translated into more money needed.
Test automation complexity:
If you automate some or all of the test cases into machine-runnable test scripts, you will have to make sure these test scripts are in sync with their corresponding manual test cases and in line with the application changes.
Also, you'll spend time to debug the codes that help you catch bugs. In my opinion, most of these bugs come from the testing team's failure to reflect the application changes in the automation test script. Changes in business logic, GUI and other internal stuff can make your scripts stop running or running unreliably. Sometimes the changes are very subtle and difficult to detect. Once all of my scripts report failure because they based their calculation on information from table 1 while table 1 was now table 2 (because someone swapped the name of the table objects in the application code).
If your tests are not very thorough you might fall into a false sense of "everything works" just because you tests pass. Theoretically if your tests pass, the code is working; but if we could write code perfectly the first time we wouldn't need tests. The moral here is to make sure to do a sanity check on your own before calling something complete, don't just rely on the tests.
On that note, if your sanity check finds something that is not tested, make sure to go back and write a test for it.
The biggest problem are the people who don't know how to write proper unit tests. They write tests that depend on each other (and they work great running with Ant, but then all of sudden fail when I run them from Eclipse, just because they run in different order). They write tests that don't test anything in particular - they just debug the code, check the result, and change it into test, calling it "test1". They widen the scope of classes and methods, just because it will be easier to write unit tests for them. The code of unit tests is terrible, with all the classical programming problems (heavy coupling, methods that are 500 lines long, hard-coded values, code duplication) and is a hell to maintain. For some strange reason people treat unit tests as something inferior to the "real" code, and they don't care about their quality at all. :-(
You lose the ability to say you are "done" before testing all your code.
You lose the capability to write hundreds or thousands of lines of code before running it.
You lose the opportunity to learn through debugging.
You lose the flexibility to ship code that you aren't sure of.
You lose the freedom to tightly couple your modules.
You lose option to skip writing low level design documentation.
You lose the stability that comes with code that everyone is afraid to change.
You lose a lot of time spent writing tests. Of course, this might be saved by the end of the project by catching bugs faster.
Refocusing on difficult, unforeseen requirements is the constant bane of the programmer. Test-driven development forces you to focus on the already-known, mundane requirements, and limits your development to what has already been imagined.
Think about it, you are likely to end up designing to specific test cases, so you won't get creative and start thinking "it would be cool if the user could do X, Y, and Z". Therefore, when that user starts getting all excited about potential cool requirements X, Y, and Z, your design may be too rigidly focused on already specified test cases, and it will be difficult to adjust.
This, of course, is a double edged sword. If you spend all your time designing for every conceivable, imaginable, X, Y, and Z that a user could ever want, you will inevitably never complete anything. If you do complete something, it will be impossible for anyone (including yourself) to have any idea what you're doing in your code/design.
You will lose large classes with multiple responsibilities.
You will also likely lose large methods with multiple responsibilities.
You may lose some ability to refactor, but you will also lose some of the need to refactor.
Jason Cohen said something like:
TDD requires a certain organization for your code. This might be architecturally wrong; for example, since private methods cannot be called outside a class, you have to make methods non-private to make them testable.
I say this indicates a missed abstraction -- if the private code really needs to be tested, it should probably be in a separate class.
Dave Mann
The biggest downside is that if you really want to do TDD properly you will have to fail a lot before you succeed. Given how many software companies work (dollar per KLOC) you will eventually get fired. Even if your code is faster, cleaner, easier to maintain, and has less bugs.
If you are working in a company that pays you by the KLOCs (or requirements implemented -- even if not tested) stay away from TDD (or code reviews, or pair programming, or Continuous Integration, etc. etc. etc.).
I second the answer about initial development time. You also lose the ability to confortably work without the safety of tests. I've also been described as a TDD nutbar, so you could lose a few friends ;)
It's percieved as slower. Long term that's not true in terms of the grief it will save you down the road, but you'll end up writing more code so arguably you're spending time on "testing not coding". It's a flawed argument, but you did ask!
It can be hard and time consuming writing tests for "random" data like XML-feeds and databases (not that hard). I've spent some time lately working with weather data feeds. It's quite confusing writing tests for that, at least as i don't have too much experience with TDD.
You have to write applications in a different way: one which makes them testable. You'd be surprised how difficult this is at first.
Some people find the concept of thinking about what they're going to write before they write it too hard. Concepts such as mocking can be difficult for some too. TDD in legacy apps can be very difficult if they weren't designed for testing. TDD around frameworks that are not TDD friendly can also be a struggle.
TDD is a skill so junior devs may struggle at first (mainly because they haven't been taught to work this way).
Overall though the cons become solved as people become skilled and you end up abstracting away the 'smelly' code and have a more stable system.
unit test are more code to write, thus a higher upfront cost of development
it is more code to maintain
additional learning required
Good answers all. I would add a few ways to avoid the dark side of TDD:
I've written apps to do their own randomized self-test. The problem with writing specific tests is even if you write lots of them they only cover the cases you think of. Random-test generators find problems you didn't think of.
The whole concept of lots of unit tests implies that you have components that can get into invalid states, like complex data structures. If you stay away from complex data structures there's a lot less to test.
To the extent your application allows it, be shy of design that relies on the proper ordering of notifications, events and side-effects. Those can easily get dropped or scrambled so they need a lot of testing.
Let me add that if you apply BDD principles to a TDD project, you can alleviate a few of the major drawbacks listed here (confusion, misunderstandings, etc.). If you're not familiar with BDD, you should read Dan North's introduction. He came up the concept in answer to some of the issues that arose from applying TDD at the workplace. Dan's intro to BDD can be found here.
I only make this suggestion because BDD addresses some of these negatives and acts as a gap-stop. You'll want to consider this when collecting your feedback.
It takes some time to get into it and some time to start doing it in a project but... I always regret not doing a Test Driven approach when I find silly bugs that an automated test could have found very fast. In addition, TDD improves code quality.
You have to make sure your tests are always up to date, the moment you start ignoring red lights is the moment the tests become meaningless.
You also have to make sure the tests are comprehensive, or the moment a big bug appears, the stuffy management type you finally convinced to let you spend time writing more code will complain.
The person who taught my team agile development didn't believe in planning, you only wrote as much for the tiniest requirement.
His motto was refactor, refactor, refactor. I came to understand that refactor meant 'not planning ahead'.
Development time increases : Every method needs testing, and if you have a large application with dependencies you need to prepare and clean your data for tests.
TDD requires a certain organization for your code. This might be inefficient or difficult to read. Or even architecturally wrong; for example, since private methods cannot be called outside a class, you have to make methods non-private to make them testable, which is just wrong.
When code changes, you have to change the tests as well. With refactoring this can be a
lot of extra work.

What is a reasonable code coverage % for unit tests (and why)? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
If you were to mandate a minimum percentage code-coverage for unit tests, perhaps even as a requirement for committing to a repository, what would it be?
Please explain how you arrived at your answer (since if all you did was pick a number, then I could have done that all by myself ;)
This prose by Alberto Savoia answers precisely that question (in a nicely entertaining manner at that!):
http://www.artima.com/forums/flat.jsp?forum=106&thread=204677
Testivus On Test Coverage
Early one morning, a programmer asked
the great master:
“I am ready to write some unit tests. What code coverage should I aim
for?”
The great master replied:
“Don’t worry about coverage, just write some good tests.”
The programmer smiled, bowed, and
left.
...
Later that day, a second programmer
asked the same question.
The great master pointed at a pot of
boiling water and said:
“How many grains of rice should I put in that pot?”
The programmer, looking puzzled,
replied:
“How can I possibly tell you? It depends on how many people you need to
feed, how hungry they are, what other
food you are serving, how much rice
you have available, and so on.”
“Exactly,” said the great master.
The second programmer smiled, bowed,
and left.
...
Toward the end of the day, a third
programmer came and asked the same
question about code coverage.
“Eighty percent and no less!” Replied the master in a stern voice,
pounding his fist on the table.
The third programmer smiled, bowed,
and left.
...
After this last reply, a young
apprentice approached the great
master:
“Great master, today I overheard you answer the same question about
code coverage with three different
answers. Why?”
The great master stood up from his
chair:
“Come get some fresh tea with me and let’s talk about it.”
After they filled their cups with
smoking hot green tea, the great
master began to answer:
“The first programmer is new and just getting started with testing.
Right now he has a lot of code and no
tests. He has a long way to go;
focusing on code coverage at this time
would be depressing and quite useless.
He’s better off just getting used to
writing and running some tests. He can
worry about coverage later.”
“The second programmer, on the other hand, is quite experience both
at programming and testing. When I
replied by asking her how many grains
of rice I should put in a pot, I
helped her realize that the amount of
testing necessary depends on a number
of factors, and she knows those
factors better than I do – it’s her
code after all. There is no single,
simple, answer, and she’s smart enough
to handle the truth and work with
that.”
“I see,” said the young apprentice,
“but if there is no single simple
answer, then why did you answer the
third programmer ‘Eighty percent and
no less’?”
The great master laughed so hard and
loud that his belly, evidence that he
drank more than just green tea,
flopped up and down.
“The third programmer wants only simple answers – even when there are
no simple answers … and then does not
follow them anyway.”
The young apprentice and the grizzled
great master finished drinking their
tea in contemplative silence.
Code Coverage is a misleading metric if 100% coverage is your goal (instead of 100% testing of all features).
You could get a 100% by hitting all the lines once. However you could still miss out testing a particular sequence (logical path) in which those lines are hit.
You could not get a 100% but still have tested all your 80%/freq used code-paths. Having tests that test every 'throw ExceptionTypeX' or similar defensive programming guard you've put in is a 'nice to have' not a 'must have'
So trust yourself or your developers to be thorough and cover every path through their code. Be pragmatic and don't chase the magical 100% coverage. If you TDD your code you should get a 90%+ coverage as a bonus. Use code-coverage to highlight chunks of code you have missed (shouldn't happen if you TDD though.. since you write code only to make a test pass. No code can exist without its partner test. )
Jon Limjap makes a good point - there is not a single number that is going to make sense as a standard for every project. There are projects that just don't need such a standard. Where the accepted answer falls short, in my opinion, is in describing how one might make that decision for a given project.
I will take a shot at doing so. I am not an expert in test engineering and would be happy to see a more informed answer.
When to set code coverage requirements
First, why would you want to impose such a standard in the first place? In general, when you want to introduce empirical confidence in your process. What do I mean by "empirical confidence"? Well, the real goal correctness. For most software, we can't possibly know this across all inputs, so we settle for saying that code is well-tested. This is more knowable, but is still a subjective standard: It will always be open to debate whether or not you have met it. Those debates are useful and should occur, but they also expose uncertainty.
Code coverage is an objective measurement: Once you see your coverage report, there is no ambiguity about whether standards have been met are useful. Does it prove correctness? Not at all, but it has a clear relationship to how well-tested the code is, which in turn is our best way to increase confidence in its correctness. Code coverage is a measurable approximation of immeasurable qualities we care about.
Some specific cases where having an empirical standard could add value:
To satisfy stakeholders. For many projects, there are various actors who have an interest in software quality who may not be involved in the day-to-day development of the software (managers, technical leads, etc.) Saying "we're going to write all the tests we really need" is not convincing: They either need to trust entirely, or verify with ongoing close oversight (assuming they even have the technical understanding to do so.) Providing measurable standards and explaining how they reasonably approximate actual goals is better.
To normalize team behavior. Stakeholders aside, if you are working on a team where multiple people are writing code and tests, there is room for ambiguity for what qualifies as "well-tested." Do all of your colleagues have the same idea of what level of testing is good enough? Probably not. How do you reconcile this? Find a metric you can all agree on and accept it as a reasonable approximation. This is especially (but not exclusively) useful in large teams, where leads may not have direct oversight over junior developers, for instance. Networks of trust matter as well, but without objective measurements, it is easy for group behavior to become inconsistent, even if everyone is acting in good faith.
To keep yourself honest. Even if you're the only developer and only stakeholder for your project, you might have certain qualities in mind for the software. Instead of making ongoing subjective assessments about how well-tested the software is (which takes work), you can use code coverage as a reasonable approximation, and let machines measure it for you.
Which metrics to use
Code coverage is not a single metric; there are several different ways of measuring coverage. Which one you might set a standard upon depends on what you're using that standard to satisfy.
I'll use two common metrics as examples of when you might use them to set standards:
Statement coverage: What percentage of statements have been executed during testing? Useful to get a sense of the physical coverage of your code: How much of the code that I have written have I actually tested?
This kind of coverage supports a weaker correctness argument, but is also easier to achieve. If you're just using code coverage to ensure that things get tested (and not as an indicator of test quality beyond that) then statement coverage is probably sufficient.
Branch coverage: When there is branching logic (e.g. an if), have both branches been evaluated? This gives a better sense of the logical coverage of your code: How many of the possible paths my code may take have I tested?
This kind of coverage is a much better indicator that a program has been tested across a comprehensive set of inputs. If you're using code coverage as your best empirical approximation for confidence in correctness, you should set standards based on branch coverage or similar.
There are many other metrics (line coverage is similar to statement coverage, but yields different numeric results for multi-line statements, for instance; conditional coverage and path coverage is similar to branch coverage, but reflect a more detailed view of the possible permutations of program execution you might encounter.)
What percentage to require
Finally, back to the original question: If you set code coverage standards, what should that number be?
Hopefully it's clear at this point that we're talking about an approximation to begin with, so any number we pick is going to be inherently approximate.
Some numbers that one might choose:
100%. You might choose this because you want to be sure everything is tested. This doesn't give you any insight into test quality, but does tell you that some test of some quality has touched every statement (or branch, etc.) Again, this comes back to degree of confidence: If your coverage is below 100%, you know some subset of your code is untested.
Some might argue that this is silly, and you should only test the parts of your code that are really important. I would argue that you should also only maintain the parts of your code that are really important. Code coverage can be improved by removing untested code, too.
99% (or 95%, other numbers in the high nineties.) Appropriate in cases where you want to convey a level of confidence similar to 100%, but leave yourself some margin to not worry about the occasional hard-to-test corner of code.
80%. I've seen this number in use a few times, and don't entirely know where it originates. I think it might be a weird misappropriation of the 80-20 rule; generally, the intent here is to show that most of your code is tested. (Yes, 51% would also be "most", but 80% is more reflective of what most people mean by most.) This is appropriate for middle-ground cases where "well-tested" is not a high priority (you don't want to waste effort on low-value tests), but is enough of a priority that you'd still like to have some standard in place.
I haven't seen numbers below 80% in practice, and have a hard time imagining a case where one would set them. The role of these standards is to increase confidence in correctness, and numbers below 80% aren't particularly confidence-inspiring. (Yes, this is subjective, but again, the idea is to make the subjective choice once when you set the standard, and then use an objective measurement going forward.)
Other notes
The above assumes that correctness is the goal. Code coverage is just information; it may be relevant to other goals. For instance, if you're concerned about maintainability, you probably care about loose coupling, which can be demonstrated by testability, which in turn can be measured (in certain fashions) by code coverage. So your code coverage standard provides an empirical basis for approximating the quality of "maintainability" as well.
Code coverage is great, but functionality coverage is even better. I don't believe in covering every single line I write. But I do believe in writing 100% test coverage of all the functionality I want to provide (even for the extra cool features I came with myself and which were not discussed during the meetings).
I don't care if I would have code which is not covered in tests, but I would care if I would refactor my code and end up having a different behaviour. Therefore, 100% functionality coverage is my only target.
My favorite code coverage is 100% with an asterisk. The asterisk comes because I prefer to use tools that allow me to mark certain lines as lines that "don't count". If I have covered 100% of the lines which "count", I am done.
The underlying process is:
I write my tests to exercise all the functionality and edge cases I can think of (usually working from the documentation).
I run the code coverage tools
I examine any lines or paths not covered and any that I consider not important or unreachable (due to defensive programming) I mark as not counting
I write new tests to cover the missing lines and improve the documentation if those edge cases are not mentioned.
This way if I and my collaborators add new code or change the tests in the future, there is a bright line to tell us if we missed something important - the coverage dropped below 100%. However, it also provides the flexibility to deal with different testing priorities.
I'd have another anectode on test coverage I'd like to share.
We have a huge project wherein, over twitter, I noted that, with 700 unit tests, we only have 20% code coverage.
Scott Hanselman replied with words of wisdom:
Is it the RIGHT 20%? Is it the 20%
that represents the code your users
hit the most? You might add 50 more
tests and only add 2%.
Again, it goes back to my Testivus on Code Coverage Answer. How much rice should you put in the pot? It depends.
Many shops don't value tests, so if you are above zero at least there is some appreciation of worth - so arguably non-zero isn't bad as many are still zero.
In the .Net world people often quote 80% as reasonble. But they say this at solution level. I prefer to measure at project level: 30% might be fine for UI project if you've got Selenium, etc or manual tests, 20% for the data layer project might be fine, but 95%+ might be quite achievable for the business rules layer, if not wholly necessary. So the overall coverage may be, say, 60%, but the critical business logic may be much higher.
I've also heard this: aspire to 100% and you'll hit 80%; but aspire to 80% and you'll hit 40%.
Bottom line: Apply the 80:20 rule, and let your app's bug count guide you.
For a well designed system, where unit tests have driven the development from the start i would say 85% is a quite low number. Small classes designed to be testable should not be hard to cover better than that.
It's easy to dismiss this question with something like:
Covered lines do not equal tested logic and one should not read too much into the percentage.
True, but there are some important points to be made about code coverage. In my experience this metric is actually quite useful, when used correctly. Having said that, I have not seen all systems and i'm sure there are tons of them where it's hard to see code coverage analysis adding any real value. Code can look so different and the scope of the available test framework can vary.
Also, my reasoning mainly concerns quite short test feedback loops. For the product that I'm developing the shortest feedback loop is quite flexible, covering everything from class tests to inter process signalling. Testing a deliverable sub-product typically takes 5 minutes and for such a short feedback loop it is indeed possible to use the test results (and specifically the code coverage metric that we are looking at here) to reject or accept commits in the repository.
When using the code coverage metric you should not just have a fixed (arbitrary) percentage which must be fulfilled. Doing this does not give you the real benefits of code coverage analysis in my opinion. Instead, define the following metrics:
Low Water Mark (LWM), the lowest number of uncovered lines ever seen in the system under test
High Water Mark (HWM), the highest code coverage percentage ever seen for the system under test
New code can only be added if we don't go above the LWM and we don't go below the HWM. In other words, code coverage is not allowed to decrease, and new code should be covered. Notice how i say should and not must (explained below).
But doesn't this mean that it will be impossible to clean away old well-tested rubbish that you have no use for anymore? Yes, and that's why you have to be pragmatic about these things. There are situations when the rules have to be broken, but for your typical day-to-day integration my experience it that these metrics are quite useful. They give the following two implications.
Testable code is promoted.
When adding new code you really have to make an effort to make the code testable, because you will have to try and cover all of it with your test cases. Testable code is usually a good thing.
Test coverage for legacy code is increasing over time.
When adding new code and not being able to cover it with a test case, one can try to cover some legacy code instead to get around the LWM rule. This sometimes necessary cheating at least gives the positive side effect that the coverage of legacy code will increase over time, making the seemingly strict enforcement of these rules quite pragmatic in practice.
And again, if the feedback loop is too long it might be completely unpractical to setup something like this in the integration process.
I would also like to mention two more general benefits of the code coverage metric.
Code coverage analysis is part of the dynamic code analysis (as opposed to the static one, i.e. Lint). Problems found during the dynamic code analysis (by tools such as the purify family, http://www-03.ibm.com/software/products/en/rational-purify-family) are things like uninitialized memory reads (UMR), memory leaks, etc. These problems can only be found if the code is covered by an executed test case. The code that is the hardest to cover in a test case is usually the abnormal cases in the system, but if you want the system to fail gracefully (i.e. error trace instead of crash) you might want to put some effort into covering the abnormal cases in the dynamic code analysis as well. With just a little bit of bad luck, a UMR can lead to a segfault or worse.
People take pride in keeping 100% for new code, and people discuss testing problems with a similar passion as other implementation problems. How can this function be written in a more testable manner? How would you go about trying to cover this abnormal case, etc.
And a negative, for completeness.
In a large project with many involved developers, everyone is not going to be a test-genius for sure. Some people tend to use the code coverage metric as proof that the code is tested and this is very far from the truth, as mentioned in many of the other answers to this question. It is ONE metric that can give you some nice benefits if used properly, but if it is misused it can in fact lead to bad testing. Aside from the very valuable side effects mentioned above a covered line only shows that the system under test can reach that line for some input data and that it can execute without hanging or crashing.
If this were a perfect world, 100% of code would be covered by unit tests. However, since this is NOT a perfect world, it's a matter of what you have time for. As a result, I recommend focusing less on a specific percentage, and focusing more on the critical areas. If your code is well-written (or at least a reasonable facsimile thereof) there should be several key points where APIs are exposed to other code.
Focus your testing efforts on these APIs. Make sure that the APIs are 1) well documented and 2) have test cases written that match the documentation. If the expected results don't match up with the docs, then you have a bug in either your code, documentation, or test cases. All of which are good to vet out.
Good luck!
Code coverage is just another metric. In and of itself, it can be very misleading (see www.thoughtworks.com/insights/blog/are-test-coverage-metrics-overrated). Your goal should therefore not be to achieve 100% code coverage but rather to ensure that you test all relevant scenarios of your application.
I prefer to do BDD, which uses a combination of automated acceptance tests, possibly other integration tests, and unit tests. The question for me is what the target coverage of the automated test suite as a whole should be.
That aside, the answer depends on your methodology, language and testing and coverage tools. When doing TDD in Ruby or Python it's not hard to maintain 100% coverage, and it's well worth doing so. It's much easier to manage 100% coverage than 90-something percent coverage. That is, it's much easier to fill coverage gaps as they appear (and when doing TDD well coverage gaps are rare and usually worth your time) than it is to manage a list of coverage gaps that you haven't gotten around to and miss coverage regressions due to your constant background of uncovered code.
The answer also depends on the history of your project. I've only found the above to be practical in projects managed that way from the start. I've greatly improved the coverage of large legacy projects, and it's been worth doing so, but I've never found it practical to go back and fill every coverage gap, because old untested code is not well understood enough to do so correctly and quickly.
85% would be a good starting place for checkin criteria.
I'd probably chose a variety of higher bars for shipping criteria - depending on the criticality of the subsystems/components being tested.
Code coverage is great but only as long as the benefits that you get from it outweigh the cost/effort of achieving it.
We have been working to a standard of 80% for some time, however we have just made the decison to abandon this and instead be more focused on our testing. Concentrating on the complex business logic etc,
This decision was taken due to the increasing amount of time we spent chasing code coverage and maintaining existing unit tests. We felt we had got to the point where the benefit we were getting from our code coverage was deemed to be less than the effort that we had to put in to achieve it.
I use cobertura, and whatever the percentage, I would recommend keeping the values in the cobertura-check task up-to-date. At the minimum, keep raising totallinerate and totalbranchrate to just below your current coverage, but never lower those values. Also tie in the Ant build failure property to this task. If the build fails because of lack of coverage, you know someone's added code but hasn't tested it. Example:
<cobertura-check linerate="0"
branchrate="0"
totallinerate="70"
totalbranchrate="90"
failureproperty="build.failed" />
When I think my code isn't unit tested enough, and I'm not sure what to test next, I use coverage to help me decide what to test next.
If I increase coverage in a unit test - I know this unit test worth something.
This goes for code that is not covered, 50% covered or 97% covered.
Short answer: 60-80%
Long answer:
I think it totally depends on the nature of your project. I typically start a project by unit testing every practical piece. By the first "release" of the project you should have a pretty good base percentage based on the type of programming you are doing. At that point you can start "enforcing" a minimum code coverage.
If you've been doing unit testing for a decent amount of time, I see no reason for it not to be approaching 95%+. However, at a minimum, I've always worked with 80%, even when new to testing.
This number should only include code written in the project (excludes frameworks, plugins, etc.) and maybe even exclude certain classes composed entirely of code written of calls to outside code. This sort of call should be mocked/stubbed.
Generally speaking, from the several engineering excellence best practices papers that I have read, 80% for new code in unit tests is the point that yields the best return. Going above that CC% yields a lower amount of defects for the amount of effort exerted. This is a best practice that is used by many major corporations.
Unfortunately, most of these results are internal to companies, so there are no public literatures that I can point you to.
My answer to this conundrum is to have 100% line coverage of the code you can test and 0% line coverage of the code you can't test.
My current practice in Python is to divide my .py modules into two folders: app1/ and app2/ and when running unit tests calculate the coverage of those two folders and visually check (I must automate this someday) that app1 has 100% coverage and app2 has 0% coverage.
When/if I find that these numbers differ from standard I investigage and alter the design of the code so that coverage conforms to the standard.
This does mean that I can recommend achieving 100% line coverage of library code.
I also occasionally review app2/ to see if I could possible test any code there, and If I can I move it into app1/
Now I'm not too worried about the aggregate coverage because that can vary wildly depending on the size of the project, but generally I've seen 70% to over 90%.
With python, I should be able to devise a smoke test which could automatically run my app while measuring coverage and hopefully gain an aggreagate of 100% when combining the smoke test with unittest figures.
Check out Crap4j. It's a slightly more sophisticated approach than straight code coverage. It combines code coverage measurements with complexity measurements, and then shows you what complex code isn't currently tested.
Viewing coverage from another perspective: Well-written code with a clear flow of control is the easiest to cover, the easiest to read, and usually the least buggy code. By writing code with clearness and coverability in mind, and by writing the unit tests in parallel with the code, you get the best results IMHO.
In my opinion, the answer is "It depends on how much time you have". I try to achieve 100% but I don't make a fuss if I don't get it with the time I have.
When I write unit tests, I wear a different hat compared to the hat I wear when developing production code. I think about what the tested code claims to do and what are the situations that can possible break it.
I usually follow the following criteria or rules:
That the Unit Test should be a form of documentation on what's the expected behavior of my codes, ie. the expected output given a certain input and the exceptions it may throw that clients may want to catch (What the users of my code should know?)
That the Unit Test should help me discover the what if conditions that I may not yet have thought of. (How to make my code stable and robust?)
If these two rules doesn't produce 100% coverage then so be it. But once, I have the time, I analyze the uncovered blocks and lines and determine if there are still test cases without unit tests or if the code needs to be refactored to eliminate the unecessary codes.
It depends greatly on your application. For example, some applications consist mostly of GUI code that cannot be unit tested.
I don't think there can be such a B/W rule.
Code should be reviewed, with particular attention to the critical details.
However, if it hasn't been tested, it has a bug!
Depending on the criticality of the code, anywhere from 75%-85% is a good rule of thumb.
Shipping code should definitely be tested more thoroughly than in house utilities, etc.
This has to be dependent on what phase of your application development lifecycle you are in.
If you've been at development for a while and have a lot of implemented code already and are just now realizing that you need to think about code coverage then you have to check your current coverage (if it exists) and then use that baseline to set milestones each sprint (or an average rise over a period of sprints), which means taking on code debt while continuing to deliver end user value (at least in my experience the end user doesn't care one bit if you've increased test coverage if they don't see new features).
Depending on your domain it's not unreasonable to shoot for 95%, but I'd have to say on average your going to be looking at an average case of 85% to 90%.
I think the best symptom of correct code coverage is that amount of concrete problems unit tests help to fix is reasonably corresponds to size of unit tests code you created.
I think that what may matter most is knowing what the coverage trend is over time and understanding the reasons for changes in the trend. Whether you view the changes in the trend as good or bad will depend upon your analysis of the reason.
We were targeting >80% till few days back, But after we used a lot of Generated code, We do not care for %age, but rather make reviewer take a call on the coverage required.
From the Testivus posting I think the answer context should be the second programmer.
Having said this from a practical point of view we need parameter / goals to strive for.
I consider that this can be "tested" in an Agile process by analyzing the code we have the architecture, functionality (user stories), and then come up with a number. Based on my experience in the Telecom area I would say that 60% is a good value to check.