Recently I had an interesting discussion with a colleague about unit tests. We were discussing when maintaining unit tests becomes less productive: namely, when your contracts change.
Perhaps someone can enlighten me on how to approach this problem. Let me elaborate:
So let's say there is a class which does some nifty calculations. The contract says that it should calculate a number, or return -1 when it fails for some reason.
I have contract tests that verify this, and in all my other tests I stub this nifty calculator thingy.
Now I change the contract: whenever it cannot calculate, it will throw a CannotCalculateException.
My contract tests will fail, and I will fix them accordingly. But all my mocked/stubbed objects will still follow the old contract rules, so the tests that use them will succeed when they should not!
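A minimal sketch of the situation in Java (all names here are hypothetical):

    // Hypothetical names, mirroring the scenario above.
    class CannotCalculateException extends Exception {}

    interface NiftyCalculator {
        // Old contract: returns -1 on failure.
        // New contract: throws CannotCalculateException on failure.
        int calculate(int x, int y) throws CannotCalculateException;
    }

    // A stub written against the OLD contract. It still compiles, and
    // every test that uses it still passes, even though it no longer
    // matches the behavior the real implementation now has.
    class StubCalculator implements NiftyCalculator {
        @Override
        public int calculate(int x, int y) {
            return -1; // stale: should now throw CannotCalculateException
        }
    }

Every consumer tested against a stale stub like this keeps exercising the old -1 handling path, while its new exception handling path is never tested.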
The question that arises is: given this reliance on unit testing, how much faith can be placed in such changes? The unit tests succeed, yet bugs will surface when the application is tested. Every test that uses this calculator will need fixing, which costs time, and the calculator may be stubbed/mocked in a lot of places...
How do you think about this case? I never thought it through thoroughly. In my opinion, these changes to unit tests are acceptable: if I did not use unit tests, such bugs would also surface during the test phase (found by testers). Yet I am not confident enough to say which approach costs more time (or less).
Any thoughts?
The first issue you raise is the so-called "fragile test" problem. You make a change to your application, and hundreds of tests break because of that change. When this happens, you have a design problem: your tests have been designed to be fragile. They have not been sufficiently decoupled from the production code. The solution is (as it is in all software problems like this) to find an abstraction that decouples the tests from the production code in such a way that the volatility of the production code is hidden from the tests.
Some simple things that cause this kind of fragility are:
Testing for strings that are displayed. Such strings are volatile because their grammar or spelling may change at the whim of an analyst.
Testing for discrete values (e.g. 3) that should be encoded behind an abstraction (e.g. FULL_TIME).
Calling the same API from many tests. You should wrap the API call in a test function, so that when the API changes you can make the change in one place (see the sketch below).
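A sketch of that wrapping idea (the API and names are made up for illustration):

    import java.util.Locale;

    // Hypothetical production API whose signature may change over time.
    interface ReportService {
        Report generate(String kind, Locale locale);
    }

    class Report {}

    // Tests call this helper instead of the API directly, so a signature
    // change is absorbed in one place rather than in hundreds of tests.
    final class ReportTestHelper {
        private ReportTestHelper() {}

        static Report generateDefaultReport(ReportService service) {
            return service.generate("monthly", Locale.US);
        }
    }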
Test design is an important issue that is often neglected by TDD beginners. This often results in fragile tests, which then leads the novices to reject TDD as "unproductive".
The second issue you raised was false positives. You have used so many mocks that none of your tests actually test the integrated system. While testing independent units is a good thing, it is also important to test partial and whole integrations of the system. TDD is not just about unit tests.
Tests should be arranged as follows:
Unit tests provide close to 100% code coverage. They test independent units. They are written by programmers using the programming language of the system.
Component tests cover ~50% of the system. They are written by business analysts and QA. They are written in a language like FitNesse, Selenium, Cucumber, etc. They test whole components, not individual units. They test primarily happy path cases and some highly visible unhappy path cases.
Integration tests cover ~20% of the system. They test small assemblies of components, as opposed to the whole system. Also written in FitNesse/Selenium/Cucumber etc. Written by architects.
System tests cover ~10% of the system. They test the whole system integrated together. Again they are written in FitNesse/Selenium/Cucumber etc. Written by architects.
Exploratory manual tests. (See James Bach) These tests are manual but not scripted. They employ human ingenuity and creativity.
It's better to have to fix unit tests that fail due to intentional code changes than to have no tests to catch the bugs that are eventually introduced by those changes.
When your codebase has a good unit test coverage, you may run into many unit test failures that are not due to bugs in the code but intentional changes on the contracts or code refactoring.
However, that unit test coverage will also give you the confidence to refactor the code and implement any contract changes. Some tests will fail and will need to be fixed, but other tests will fail because of genuine bugs that you introduced with those changes.
Unit tests surely cannot catch all bugs, even in the ideal case of 100% code/functionality coverage. I don't think that should be expected.
If the tested contract changes, I (the developer) should use my brain to update all code (including test code!) accordingly. If I fail to update some mocks which therefore still produce the old behaviour, that is my fault, not the unit tests'.
It is similar to the case where I fix a bug and write a unit test for it, but fail to think through (and test) all similar cases, some of which later turn out to be buggy as well.
So yes, unit tests need maintenance just as well as the production code itself. Without maintenance, they decay and rot.
I have similar experiences with unit tests - when you change the contract of one class, you often need to change loads of other tests as well (many of which will actually still pass, which makes it even more difficult). That is why I always use higher-level tests as well:
Acceptance tests - test a couple of classes or more. These tests are usually aligned to user stories that need to be implemented - so you test that the user story "works". They don't need to connect to a DB or other external systems, but they may.
Integration tests - mainly to check external system connectivity, etc.
Full end-to-end tests - test the whole system
Please note that even if you have 100% unit test coverage, you are not even guaranteed that your application starts! That is why you need higher level tests. There are so many different layers of tests because the lower you test something, the cheaper it usually is (in terms of development, maintaining test infrastructure as well as execution time).
As a side note - because of the problem you mentioned, using unit tests teaches you to keep your components as decoupled as possible and their contracts as small as possible - which is definitely good practice!
One of the rules for unit test code (and all other code used for testing) is to treat it the same way as production code - no more, no less - just the same.
My understanding of this is that (besides keeping it relevant, refactored, working etc. like production code) it should be looked at the same way from the investment/cost perspective as well.
Your testing strategy should probably include something to address the problem you described in the initial post - something along the lines of specifying which test code (including stubs/mocks) should be reviewed (executed, inspected, modified, fixed, etc.) when a designer changes a function/method in production code. The cost of any production code change must therefore include the cost of doing this; if it doesn't, the test code will become a "third-class citizen", and the designers' confidence in the unit test suite, as well as its relevance, will decrease. Obviously, the ROI lies in how early bugs are discovered and fixed.
One principle that I rely on here is removing duplication. I generally don't have many different fakes or mocks implementing a given contract (I use more fakes than mocks partly for this reason). When I change the contract, it is natural to inspect every implementation of that contract, production code or test. It bugs me when I find myself making this kind of change - perhaps my abstractions should have been better thought out - but if the test code is too onerous to change for the scale of the contract change, then I have to ask myself whether the tests are also due some refactoring.
I look at it this way: when your contract changes, you should treat it like a new contract. Therefore, you should create a whole new set of unit tests for this "new" contract. The fact that you have an existing set of test cases is beside the point.
I second uncle Bob's opinion that the problem is in the design. I would additionally go back one step and check the design of your contracts.
In short
instead of saying "return -1 for x==0" or "throw CannotCalculateException for x==y", underspecify niftyCalculatorThingy(x,y) with the precondition x!=y && x!=0 in appropriate situations (see below). Thus your stubs may behave arbitrarily in these cases, your unit tests must reflect that, and you have maximal modularity, i.e. the liberty to change the behavior of your system under test arbitrarily for all underspecified cases - without the need to change contracts or tests.
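A sketch of what such an underspecified contract could look like (hypothetical signature):

    interface NiftyCalculator {
        /**
         * PRE:  x != 0 && x != y   (the caller's responsibility)
         * POST: returns the calculated value
         *
         * Behavior when the precondition is violated is deliberately left
         * unspecified: the implementation is free to return -1 today and
         * throw CannotCalculateException tomorrow, and neither the
         * contract nor the contract tests need to change.
         */
        int niftyCalculatorThingy(int x, int y);
    }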
Underspecification where appropriate
You can differentiate your statement "-1 when it fails for some reason" according to the following criteria: Is the scenario
an exceptional behavior that the implementation can check?
within the method's domain/responsibility?
an exception that the caller (or someone earlier in the call stack) can recover from/handle in some other way?
If and only if 1) to 3) hold, specify the scenario in the contract (e.g. that EmptyStackException is thrown when calling pop() on an empty stack).
Without 1), the implementation cannot guarantee a specific behavior in the exceptional case. For instance, Object.equals() does not specify any behavior when the condition of reflexivity, symmetry, transitivity & consistency is not met.
Without 2), the Single Responsibility Principle is not met, modularity is broken, and users/readers of the code get confused. For instance, Graph transform(Graph original) should not specify that MissingResourceException might be thrown because deep down, some cloning via serialization is done.
Without 3), the caller cannot make use of the specified behavior (certain return value/exception). For instance, if the JVM throws an UnknownError.
Pros and Cons
If you do specify cases where 1), 2) or 3) does not hold, you run into some difficulties:
a main purpose of a (design by) contract is modularity. This is best achieved if you really separate the responsibilities: when the precondition (the responsibility of the caller) is not met, leaving the implementation's behavior unspecified yields maximal modularity - as your example shows.
you lose the liberty to change the behavior in the future, even toward a more general version of the method that throws exceptions in fewer cases
exceptional behaviors can become quite complex, so the contracts covering them become complex, error-prone and hard to understand. For instance: is every situation covered? Which behavior is correct when multiple exceptional preconditions hold?
The downside of underspecification is that testing robustness - the implementation's ability to react appropriately to abnormal conditions - becomes harder.
As a compromise, I like to use the following contract schema where possible:
    <(Semi-)formal PRE- and POST-conditions, including exceptional behavior where 1) to 3) hold>
    If PRE is not met, the current implementation throws the RTE A, B or C.
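Applied to the pop() example from earlier, the schema might read roughly like this (a sketch):

    interface Stack<T> {
        /**
         * PRE:  the stack is not empty
         * POST: returns and removes the top element; size() decreases by one
         *
         * If PRE is not met, the current implementation throws
         * java.util.EmptyStackException - a detail callers should not
         * rely on.
         */
        T pop();

        int size();
    }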
I'm fairly green to unit testing and TDD, so please bear with me as I ask what some may consider newbie questions, or if this has been debated before. If this turns out to be considered a "bad question" (too subjective and open for debate), I will happily close it. However, I've searched for a couple of days and am not getting a definitive answer, and I need a better understanding of this, so I know no better way to get more info than to post here.
I've started reading an older book on unit testing (because a colleague had it on hand), and its opening chapter talks about why to unit test. One of the points it makes is that in the long run, your code is much more reliable and cleaner, and less prone to bugs. It also points out that effective unit testing will make tracking and fixing bugs much easier. So it seems to focus quite a bit on the overall prevention/reduction of bugs in your code.
On the other hand, I also found an article about writing great unit tests, and it states that the goal of unit testing is to make your design more robust, and conversely, finding bugs is the goal of manual testing, not unit testing.
So being the newbie to TDD that I am, I'm a little confused as to the state of mind with which I should go into TDD and building my unit tests. I'll admit that part of the reason I'm taking this on now with my recently started project is because I'm tired of my changes breaking previously existing code. And admittedly, the linked article above does at least point this out as an advantage of TDD. But my hope in going back and adding unit tests to my existing code (and then continuing TDD from this point forward) is to help prevent these bugs in the first place.
Are this book and this article really saying the same thing in different tones, or is there some subjectivity on this subject, and what I'm seeing is just two people having somewhat different views on how to approach TDD?
Thanks in advance.
Unit tests and automated tests generally are for both better design and verified code.
A unit test should test some execution path in some very small unit. This unit is usually a public method or an internal method exposed on your object. The method itself can still use many other protected or private methods from the same object instance. You can have a single method and several unit tests for this method to test different execution paths. (By execution path I mean something controlled by if, switch, etc.) Writing unit tests this way will validate that your code really does what you expect. This can be especially important in some corner cases where you expect an exception to be thrown in rare scenarios. You can also test how the method behaves if you pass different parameters - for example null instead of an object instance, a negative value for an integer used for indexing, etc. That is especially useful for a public API.
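A minimal JUnit 4 sketch of one test per execution path (the unit under test is made up):

    import org.junit.Test;
    import static org.junit.Assert.*;

    public class PercentageTest {

        // The unit under test: two execution paths, controlled by an if.
        static int percentage(int part, int whole) {
            if (whole == 0) {
                throw new IllegalArgumentException("whole must not be zero");
            }
            return 100 * part / whole;
        }

        @Test
        public void happyPath() {
            assertEquals(25, percentage(1, 4));
        }

        @Test(expected = IllegalArgumentException.class)
        public void rareCornerCaseIsRejected() {
            percentage(1, 0); // the exceptional path
        }
    }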
Now suppose that your tested method also uses instances of other classes. How to deal with it? Should you still test your single method and believe that class works? What if the class is not implemented yet? What if the class has some complex logic inside? Should you test these execution paths as well on your current method? There are two approaches to deal with this:
For some cases you will simply let the real class instance be tested together with your method. This is, for example, very common in the case of logging (it is not bad to have logs available for tests as well).
For other scenarios you would like to remove these dependencies from your method - but how? The solution is dependency injection and implementing against an abstraction instead of an implementation. What does that mean? It means that your method/class will not create instances of these dependencies; instead it will get them either through method parameters, the class constructor or class properties. It also means that you will not expect a concrete implementation but either an abstract base class or an interface. This will allow you to pass a fake, dummy or mock implementation to your tested object. These special types of implementations simply don't do any processing; they take some data and return an expected result. This will allow you to test your method without its dependencies, and it leads to a much better and more extensible design (see the sketch below).
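A sketch of constructor injection against an abstraction (all names hypothetical):

    // The dependency is an abstraction, not a concrete class.
    interface PriceSource {
        double currentPrice(String symbol);
    }

    class PortfolioValuer {
        private final PriceSource prices;

        PortfolioValuer(PriceSource prices) { // constructor injection
            this.prices = prices;
        }

        double value(String symbol, int quantity) {
            return prices.currentPrice(symbol) * quantity;
        }
    }

    // A fake for tests: no real processing, just canned data.
    class FixedPriceSource implements PriceSource {
        public double currentPrice(String symbol) {
            return 10.0; // no network, no database
        }
    }

A test can then call new PortfolioValuer(new FixedPriceSource()).value("X", 3) and assert the result is 30.0 without touching any real system.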
What is the disadvantage? Once you start using fakes/mocks you are testing a single method/class, but you don't have a test which grabs all the real implementations and puts them together to check whether the whole system really works. You can have thousands of unit tests validating that each of your methods works, but that doesn't mean the methods will work together. This is a scenario for more complex tests - integration or end-to-end tests.
Unit tests should usually be very easy to write - if they are not, it means your design is probably too complicated and you should think about refactoring. They should also be very fast to execute, so that you can run them very often. Other kinds of tests can be more complex and very slow, and they should run mostly on the build server.
How does this fit into the SW development process? The worst part of the development process is stabilization and bug fixing, because this part is very hard to estimate. To estimate how much time a bug fix takes, you must know what causes the bug, but that investigation cannot be estimated. You can have a bug which takes one hour to fix but two weeks of debugging your application to find. With good code coverage you will most probably find such a bug early during development.
Automated testing doesn't prove that the SW contains no bugs. It only says that you did your best to find and solve them during development, and because of that your stabilization phase should be much less painful and much shorter. It also doesn't prove that your SW does what it should - that is more about the application logic itself, which must be checked by separate tests going through each use case / user story - acceptance tests (which can also be automated).
How does this fit with TDD? TDD takes it to the extreme, because in TDD you write your tests first to drive your quality, code coverage and design.
It's a false choice. "Find/minimize bugs" OR improve design.
TDD in particular (as opposed to "just" unit testing) is all about giving you better design.
And when your design is better, what are the consequences?
Your code is easier to read
Your code is easier to understand
Your code is easier to test
Your code is easier to reuse
Your code is easier to debug
Your code has fewer bugs in the first place
With well-designed code, you spend less time finding and fixing bugs, and more time adding features and polish. So TDD gives you a savings on bugs and bug-hunting, by giving you better design. These things are not separate; they are dependent and interrelated.
There can be many different reasons why you might want to test your code. Personally, I test for a number of reasons:
I usually design an API using a combination of the normal design patterns (top-down) and test-driven development (TDD; bottom-up) to ensure that I have a sound API, both from a best-practices point of view and from an actual-usage point of view. The focus of the tests is on the major use-cases for the API, but also on the completeness of the API and its behavior - so they are primarily "black box" tests. The development sequence is often:
main API based on design patterns and "gut feeling"
TDD tests for the major use-cases according to the high-level specification for the API - primarily to make sure the API is "natural" and easy to use
fleshed out API and behavior
all the needed test cases to ensure the completeness and correct behavior
Whenever I fix an error in my code, I try to write a test to make sure it stays fixed. Somehow, the error got into my original design and passed my original testing of the code, so it is probably not all that trivial. I have noticed that many of these tests are "white box" tests.
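For instance, such a "stays fixed" test might pin down a past rounding bug like this (a hypothetical sketch):

    import org.junit.Test;
    import static org.junit.Assert.*;

    public class RoundingRegressionTest {

        // The once-buggy unit (hypothetical): it used to truncate with
        // (long)(euros * 100) instead of rounding.
        static long toCents(double euros) {
            return Math.round(euros * 100);
        }

        // Regression test: fails again if anyone reintroduces truncation.
        @Test
        public void halfCentIsRoundedUpNotTruncated() {
            assertEquals(11, toCents(0.105));
        }
    }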
In order to be able to make any sort of major re-factoring of the code, you need an extensive set of API tests to make sure the behavior of the code stays the same after the re-factoring. For any non-trivial API, I want the test suite to be in place and working for a long time before the re-factoring, to be sure that all the major use-cases are covered in a good way. As often as not, you are forced to throw away most of your "white box" tests, as they - by their very definition - make too many assumptions about the internals. I usually try to "translate" as many of these tests as possible, since the same non-trivial problems tend to survive re-factoring of the code.
In order to transfer any code between developers, I usually also want a good test suite with focus on the API and the major use-cases. So basically the tests from the initial TDD...
I think the answer to your question is: both.
You will improve design because there is one particular thing about TDD that is great: while you write tests you put yourself in the position of the client code that will be using the system under test - and this alone makes you think about certain design choices.
For example: UI. When you start writing the tests, you will see that those God-Forms are impossible to test, so you separate the logic behind the screens into a presenter/controller, and you get MVP/MVC/whatever.
Having the concept of unit testing a class and mocking its dependencies leads you to the Single Responsibility Principle. There is a similar point to be made for each of the SOLID principles.
As for bugs: if you unit test every method of every class you write (except properties, very simple methods and such), you will catch most bugs from the start. Add integration tests, and you cover almost all of them.
I'll take my stab at this using a remix of a previous answer I wrote. In short, I don't see this as a dichotomy between driving good design and minimizing bugs. I see it more as one (good design) leading to the other (minimizing bugs).
I tend towards saying TDD is a design process that happens to involve unit testing. It's a design process because within each Red-Green-Refactor iteration, you write the test first for code that doesn't exist. You're designing as you're going.
The first beauty of TDD is that the design of your code is guaranteed to be testable. Testable code tends to have loose coupling and high cohesion. Loose coupling and high cohesion are important because they make the code easy to change when requirements change. The second beauty of TDD is that after you're done implementing your system, you happen to have a huge regression suite to catch any bugs and changes in assumptions. Thus, TDD makes your code easy to change because of the design it creates and it makes your code safe to change because of the test harness it creates.
Trying to retrospectively add unit tests can be quite painful and expensive. If the code doesn't support unit testing, you may be better off looking at integration tests to test your code.
Don't mix Unit Testing with TDD.
Unit testing is simply the practice of testing your code to ensure quality and maintainability.
TDD is a full-blown development methodology in which you first write your tests (based on requirements), and only then write the code needed (and just the code needed) to make those tests pass. This means that you only write code to repair a broken test.
Once that is done, you write another test, and the code needed to make it pass. Along the way, you may be forced to refactor the code so that a new test can run without breaking another. This way, the "design" arises from the tests.
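A sketch of a single red-green step (hypothetical example, JUnit 4):

    import org.junit.Test;
    import static org.junit.Assert.*;

    // Step 1: the test is written first and fails - Greeter doesn't exist yet.
    public class GreeterTest {
        @Test
        public void greetsByName() {
            assertEquals("Hello, Ada!", new Greeter().greet("Ada"));
        }
    }

    // Step 2: just enough code to make the test pass - and no more.
    class Greeter {
        String greet(String name) {
            return "Hello, " + name + "!";
        }
    }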
The purpose of this methodology is of course to reduce bugs and improve design, but its main goal is to improve productivity, because you write exactly the code you need. And you don't write documentation: the tests are the documentation. If a requirement changes, you change the tests and then the code. If new requirements appear, you just add new tests.
I'm trying to incorporate some design-by-contract techniques into my coding style. Postconditions look a lot to me like embedded unit tests and I'm wondering if my thinking here is on the right track or way off-base.
Wikipedia defines a postcondition as "a condition or predicate that must always be true just after the execution of some section of code or after an operation in a formal specification. Postconditions are sometimes tested using assertions within the code itself".
Is that not very similar to what you do in a unit test that verifies state directly (doesn't use mocks)?
If that's the case:
1) By using post-conditions, aren't I now sort of embedding testing code in my production code, and isn't that frowned upon?
2) Should using postconditions change the structure of my unit tests? My first thought is that the assertion logic is moved from the tests to the postconditions. That is, tests will use the same inputs and I'm still testing everything I was testing before, but now instead of making assertions in the unit tests I'm making a simple binary assertion about the postconditions passing or not.
3) My second thought is that postcondition code might have control flow and is therefore not ideal for test code, which is supposed to be simple and avoid control flow. But, if I test the postconditions, can I then rely on them in my unit tests?
4) It seems difficult to test postconditions because if I understand them correctly they basically pass or fail and you would have to repeat the logic of the postcondition itself to check that it did the right thing. So, how do you test a postcondition? Do you check them by not utilizing them in your unit testing and ensuring your unit tests and postconditions pass or fail together?
5) My unit tests sometimes verify that a method has caused changes to state in collaborators. In standard practice, do postconditions cover collaborator state or just the state of the class they are defined on?
You are on the right track.
It is true that post-conditions serve a similar purpose to unit tests. The key difference is that the post-condition always runs, while the unit test only runs against a known set of data. This means that the post-condition is less likely to overlook the corner case you didn't think of, but is more expensive at run time.
Here are answers to your specific questions.
There is a run-time penalty to post-conditions. However (depending on your environment), it may be possible to drop assertions for speed. (In C you can use an #ifdef; in Java, assert statements are skipped unless you run with -ea, or look up AOP; in Python, asserts are skipped when you run with the -O flag; etc.) Should you get a performance problem from your assertions, it is solvable. However, my preference would be to leave them on until you have a reason not to.
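In Java, for instance, a post-condition written as an assert statement is skipped unless the JVM runs with -ea, which gives exactly this on/off switch (a sketch):

    class MathChecks {
        // Post-condition as an assert: checked only when run with `java -ea`.
        static double checkedSqrt(double x) {
            double result = Math.sqrt(x);
            // POST: for non-negative input, result squared is (about) x again
            assert x < 0 || Math.abs(result * result - x) <= 1e-9 * Math.max(1.0, x)
                    : "postcondition violated: result^2 differs from x";
            return result;
        }
    }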
Some of your logic will naturally move from the unit test to the post-condition. However it is worthwhile to make sure that you have unit tests that run through all of the cases of interest for your post-condition. This is particularly true if you are dropping assertions in production for speed.
Post-conditions are not unit tests. Write them in whatever way that makes sense for what they do. (In general they should be somewhat simple.)
In general you test post-conditions as described in #2: by passing in a set of inputs of interest where the post-condition might possibly be violated, and checking that it isn't. If you want to test the logic of the post-condition itself, you can set up code that can violate the post-condition, but which only runs during tests. For instance, have a global variable that tests can set which, when set, replaces the data to be returned with whatever you want. Now you can cause the post-condition to receive any input you want.
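A sketch of such a test-only hook (names hypothetical):

    class Finder {
        // Set only from tests, to feed the post-condition bad data on purpose.
        static Integer testOverride = null;

        static int firstPositive(int[] data) {
            int result = (testOverride != null) ? testOverride : scan(data);
            assert result > 0 : "postcondition violated: result must be positive";
            return result;
        }

        private static int scan(int[] data) {
            for (int v : data) {
                if (v > 0) return v;
            }
            throw new IllegalArgumentException("no positive element");
        }
    }

A test can set Finder.testOverride = -1 and (running with assertions enabled) check that an AssertionError is thrown, which verifies the post-condition logic itself.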
I'm not going to give you a hard and fast rule. They are your contracts. They should say what makes sense for what the function is doing. That said, what you are describing can lead to tight coupling between those objects. Tight coupling is something you should only do with good reason.
Contracts aren't a form of unit-testing. Rather they're a way of specifying (in an executable format) what conditions should hold before and after a particular function or method is called, and may also specify invariants of objects.
You still need tests when you have contracts, since just because you've specified what the functions are supposed to do doesn't mean they'll actually do it. But you'll find that your contracts help you debug: having code that checks at run-time that what's happening is what was expected means that any logic or programming error will cause a failure near the code that contains the error.
You may find that with contracts you're happy to have fewer smaller tests and more larger-scale tests since the contracts will let you narrow down the source of an error even if the test is broad. Also, there's less need for unit tests to play the role of a specification of how the logic is supposed to work, further limiting the value of the smaller tests.
Contracts are like assertions in that you may choose to or choose not to have them enabled in production code. My opinion is that contracts tend to be more expensive than assertions and so you'll tend to have them disabled in production.
As with any methodology or coding style - there is no single correct answer. However, one thing I found to be true so far is that there is never a 'one size fits all' solution.
So, if you implement these assertions in the logic of every single postcondition in your design, I'd consider it wrong.
My own opinion is that such assertions should be used only if failure to meet a postcondition leaves the entire system in a dangerously inconsistent state. If something like that happens, I'd definitely like the system to do something such as: send an email/SMS to the admin, halt production execution, run diagnostics, or whatever should be done for that particular system. Note that this would be an actual feature whose purpose is increased security; it is not unit test code.
On the other hand, if you're coding assertions after every single method call, then as you noticed, the only thing you are doing is hardcoding test cases into production code. That doesn't serve any real purpose, other than making your codebase a big mess.
In Osherove's great book "The Art of Unit Testing", one of the test anti-patterns is over-specification, which is basically testing the internal state of the object instead of some expected output. In my experience, using isolation frameworks can cause the same unwanted side effects as testing internal behavior, because one tends to implement only the behavior necessary to make the stub interact with the object under test. Now if your implementation changes later on (but the contract remains the same), your tests will suddenly break because you are expecting some data from the stub which was not implemented.
So what do you think is the best approach to counter this?
1) Implement your stubs/mocks fully; this has the negative side effect of potentially making your tests less readable, and of specifying more than is necessary to make your test pass.
2) Favor manual, fully implemented fakes.
3) Implement your stubs/fakes so that they make your test just pass, and then deal with the brittleness that this might introduce.
I do not think you should favor manually written fakes - unless you prefer writing test plumbing to writing code.
Instead you have another option: if you test the functionality and not the implementation, avoid testing private methods (which can be refactored away), and in general write less fragile tests, you'll see that using a mocking/isolation framework neither requires you to over-specify the system nor causes your tests to become more fragile.
In a nutshell: fragile tests can be written with or without fakes/mocks, and vice versa.
I tend to use mocks instead of stubbed/fake objects. I find them a lot less trouble, and they are far better at keeping test code under control, because it isn't cluttered with all sorts of half-baked implementations. They also help clarify what is being tested.
Another advantage is that I only have to address the points where the class under test needs something specific from the mock, so I don't have to code what isn't important. As for verification, again I only have to verify the calls from the class under test to the mock that I care about and consider important aspects of the test.
I think the problem is always the same, although it comes in different flavours: if you have tests that somehow cover the internals of a class, then changing those internals will break the tests that cover them.
IMHO there are two ways to deal with that:
Your tests only cover the public contract of a class - a test strategy which is widely adopted for exactly that reason: you don't have to change your tests as long as the public contract remains constant. Unfortunately, this is not what you get when doing test-driven development.
If your tests come from a TDD process, then they will regularly cover non-public code. This means that they will break when you change that code. The only way to keep things in sync here is to 'fix' the tests together with the code, which means more maintenance during development. There's no recipe for dealing with that easily (other than throwing away the tests, of course...).
My personal 'way out' is to think in terms of 'code elements' rather than just code. A code element consists of three parts: documentation, test, code. So if you change one part of the element, you have to adjust the other two as well - otherwise you leave a broken code element behind.
Most of the discussion on this site is very positive about unit testing. I'm a fan of unit testing myself. However, I've found extensive unit testing brings its own challenges. For example, unit tests are often closely coupled to the code they test, which can make API changes increasingly costly as the volume of tests grows.
Have you found real-world situations where unit tests have been detrimental to code quality or time to delivery? How have you dealt with these situations? Are there any 'best practices' which can be applied to the design and implementation of unit tests?
There is a somewhat related question here: Why didn't unit testing work out for your project?
With extensive unit testing you will start to find that refactoring operations become more expensive, for exactly the reasons you state.
IMHO this is a good thing. Expensive and big changes to an API should have a bigger cost relative to small and cheap changes. Refactoring is not a free operation, and it's important to understand its impact on both yourself and the consumers of your API. Unit tests are a great ruler for measuring how expensive an API change will be to consume.
Part of this problem, though, is relieved by tooling. Most IDEs directly or indirectly (via plugins) support refactoring operations on a code base. Using these operations to change your unit tests will relieve a bit of the pain.
Are there any 'best practices' which can be applied to the design and implementation of unit tests?
Make sure your unit tests haven't become integration tests. For example if you have unit tests for a class Foo, then ideally the tests can only break if
there was a change in Foo
or there was a change in the interfaces used by Foo
or there was a change in the domain model (typically you'll have some classes like "Customer" which are central to the problem space, have no room for abstraction and are therefore not hidden behind an interface)
If your tests are failing because of any other changes, then they have become integration tests and you'll get in trouble as the system grows bigger. Unit tests should have no such scalability issues because they test an isolated unit of code.
One of the projects I worked on was heavily unit-tested; we had over 1000 unit tests for 20 or so classes. There was slightly more test code than production code. The unit tests caught innumerable errors introduced during refactoring operations; they definitely made it easy and safe to make changes, extend functionality etc. The released code had a very low bug rate.
To encourage ourselves to write the unit tests, we specifically chose to keep them 'quick and dirty' - we would bash out a test as we produced the project code, and the tests were boring and 'not real code', so as soon as we wrote one that exercised the functionality of the production code, we were done, and moved on. The only criteria for the test code was that it fully exercised the API of the production code.
What we learnt the hard way is that this approach does not scale. As the code evolved, we saw a need to change the communication pattern between our objects, and suddenly I had 600 failing unit tests! Fixing this took me several days. This level of test breakage happened two or three times with further major architecture refactorings. In each case I don't believe we could reasonably have foreseen the code evolution that was required beforehand.
The moral of the story for me was this: unit test code needs to be just as clean as production code. You simply can't get away with cutting and pasting in unit tests. You need to apply sensible refactoring, and decouple your tests from the production code where possible by using proxy objects.
Of course all of this adds some complexity and cost to your unit tests (and can introduce bugs to your tests!), so it's a fine balance. But I do believe that the concept of 'unit tests', taken in isolation, is not the clear and unambiguous win it's often made out to be. My experience is that unit tests, like everything else in programming, require care, and are not a methodology that can be applied blindly. It's therefore surprising to me that I've not seen more discussion of this topic on forums like this one and in the literature.
Mostly in cases where the system was developed without unit testing in mind, so that testing was an afterthought and not a design tool. When you develop with automated tests, the chances of breaking your API diminish.
An excess of false positives can slow development down, so it's important to test for what you actually want to remain invariant. This usually means writing unit tests in advance for requirements, then following up with more detailed unit tests to detect unexpected shifts in output.
I think you're looking at fixing a symptom, rather than recognizing the whole of the problem. The root problem is that a true API is a published interface*, and it should be subject to the same bounds that you would place on any programming contract: no changes! You can add to an API, and call it API v2, but you can't go back and change API v1.0, otherwise you have indeed broken backward compatibility, which is almost always a bad thing for an API to do.
(* I don't mean to call out any specific interfacing technology or language, interface can mean anything from the class declarations on up.)
I would suggest that a Test Driven Development approach would help prevent many of these kinds of problems in the first place. With TDD you would be "feeling" the awkwardness of the interfaces while you were writing the tests, and you would be compelled to fix those interfaces earlier in the process rather than waiting until after you've written a thousand tests.
One of the primary benefits of Test Driven Development is that it gives you instant feedback on the programmatic use of your class/interface. The act of writing a test is a test of your design, while the act of running the test is the test of your behavior. If it's difficult or awkward to write a test for a particular method, then that method is likely to be used incorrectly, meaning it's a weak design and it should be refactored quickly.
Yes, there are situations where unit testing can be detrimental to code quality and delivery time. If you create too many unit tests, your code will become mangled with interfaces and your code quality as a whole will suffer. Abstraction is great, but you can have too much of it.
If you're writing unit tests for a prototype or a system that has a high chance of undergoing major changes, your unit tests will affect time to delivery. In these cases it's often better to write acceptance tests, which test closer to end-to-end.
If you're sure your code won't be reused, won't need to be maintained, and your project is simple and very short-term, then you shouldn't need unit tests.
Unit tests are useful for facilitating changes and maintenance. They do add a little to time to delivery, but that cost is repaid in the medium/long term. If there is no medium/long term, they may be unnecessary, manual tests being enough.
But all of this is very unlikely, so unit tests are still the way to go :)
Also, it might sometimes be a necessary business decision to invest less time in testing in order to make a faster, urgent delivery (a debt which will need to be paid back with interest later).
Slow unit tests can often be detrimental to development. This usually happens when unit tests become integration tests that need to hit web services or the database. If your suite of unit tests takes over an hour to run, you'll often find yourself and your team essentially paralyzed for that hour, waiting to see whether the unit tests pass (since you don't want to keep building upon a broken foundation).
With that being said, I think the benefits far outweigh the drawbacks in all but the most contrived cases.
I got the impression that some problems are just too hard to unit test. And even if you do it, such tests often provide little value.
What code should not be unit tested, apart from getters and setters?
(might be similar to this question)
My general approach is "if this code is not worth testing, why is it worth having in the first place?" If I'm using a language which forces me to have a lot of uselessly repetitive boilerplate, then maybe I don't need to test those parts if the language's compiler can just check them; but I normally use languages where the code I write is actually meaningful ;-).
Can you give an example of a problem that's too hard to unit-test? I've heard this used as an excuse to avoid testing error-recovery and diagnostic code that's only triggered by rare and very unlikely circumstances, but every time this has come up I've argued that, on the contrary, that code is the code most in need of unit tests, because it's not going to get exercised in the integration tests and normal use (e.g. at the QA stage).
Dependency injection lets you use a fake or mock object to stand in for whatever "should never cause this error but we're covering for it anyway" - network, database, power control interface, etc. - and your fake or mock easily can, and definitely should, cause fake errors of all kinds so you can thoroughly check that error-recovery and diagnostic code.
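For example, a fake that always fails can drive the recovery path in a plain unit test (a sketch, all names hypothetical):

    import java.io.IOException;

    interface Network {
        byte[] fetch(String url) throws IOException;
    }

    // The fake: deliberately causes the error the real network "never" has.
    class FailingNetwork implements Network {
        public byte[] fetch(String url) throws IOException {
            throw new IOException("simulated outage");
        }
    }

    class Client {
        private final Network network;

        Client(Network network) { // dependency injection
            this.network = network;
        }

        byte[] fetchWithFallback(String url, byte[] fallback) {
            try {
                return network.fetch(url);
            } catch (IOException e) {
                return fallback; // the recovery code we want exercised
            }
        }
    }

A unit test injects FailingNetwork and asserts that fetchWithFallback returns the fallback, so the rare error path is covered even though integration tests and normal use never hit it.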
Maybe it depends on what kind of apps you write -- for the last few years I've been mostly in cluster-management software, where everything that can go wrong will, lots of things that can't possibly go wrong will anyway, and uptime and fast recovery are crucial. In that field nobody would ever dare argue against a belt-and-suspenders approach (if they did the reliability engineers would be after them with cudgels;-).
But I've recently switched to Business Intelligence and I've noticed the approach translates well, too: if the numbers my code is producing (maybe to show as a nice graph to business decision makers, etc) are worth producing at all, they'd better be accurate, which means (among other things) that the code producing them needs to be tested just as thoroughly and carefully as that which monitors a network or a power supply system!-)
You shouldn't write unit tests for other people's code (such as a framework you are using). You should only write tests for your code. Mock out dependencies on other people's code so that you only need to write tests for yours.
This is a question of cost and benefit: the closer you try to get to 100%, the more expensive it will be.
There is also the UI layer: if this is in a technology that is difficult to test, you could program this layer to be as thin as possible, and then only test it manually.
Depending upon your situation you could drop testing pass-through layers and generated code.
Note that it is not just a question of code coverage but of how you test; it may be better to have many tests on a limited part of the code and lower overall code coverage.
On my current project I'm doing automated testing, of features and of system functionality, but no unit testing at all: Should one test internal implementation, or only test public behaviour?
Some people talk as if the only alternative to unit testing is ad-hoc manual testing; but many of the benefits (e.g. regression testing) come from testing being automated, not necessarily from its being at the unit level.
You don't need to test the language constructs, but outside of that, there's really not anything that "shouldn't" be unit tested.
If there are cases where you've already got the design, a good reason for it to exist, and it's not a mission-critical part of the application - such as a minor user interface feature - then a case can be made that it's not necessarily worth fighting to produce a unit test. But that's not necessarily the same as saying it "shouldn't" be unit tested.
Automated unit tests can't be run on graphics code, since the computer cannot decide if what's being drawn to the screen is actually correct or not!
Although you can write a manual unit test in that case, of course.