Using API tests as unit tests in a microservice application (heresy) - unit-testing

Question contains heretical content; user discretion advised.
My company builds a website out of many very small microservices, and I have been given the task of writing unit tests.
I figured it would take a lot of mocking, because we have a lot of inter-service communication, and those APIs are very likely to change, so mocks break easily.
So instead of going through all the pain of those mocks, I decided to write API tests that check for expected error messages, status codes, and correct calculations, and that validate the response object against our Swagger documentation.
I even did TDD with these API tests, and I must say it worked really well. But it doesn't have much in common with unit testing.
What do you guys think about this?
Positive:
Even though I test an API, my "units" are pretty small.
Tests don't need much maintenance.
Tests are written very quickly because no mocking is required.
I only need a few tests to catch all the errors; good error handling points to the location of an error pretty well.
The units are very intertwined anyway; the services are extremely dependent on each other.
Errors from dependent services are always caught and logged, so I just run the tests of the underlying service.
I can fix "integration errors" locally with my unit tests.
Negative:
If an error is not logged well, it takes some time to find out where it originated.
The tests are not very fast: about 20 seconds per service, and probably more as they grow.
Some tests depend on the test that runs before them.
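For illustration, here is a minimal sketch of the kind of API test described above, written as a JUnit 5 test using Java's built-in HTTP client. The /orders endpoint, the port, and the error message are hypothetical placeholders rather than details of any actual service, and the test assumes the service is already running locally.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

class OrderApiTest {

    private final HttpClient client = HttpClient.newHttpClient();

    @Test
    void unknownOrderReturnsNotFoundWithAnErrorMessage() throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/orders/does-not-exist")) // placeholder endpoint
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // Check status code and the expected error message, as described above.
        assertEquals(404, response.statusCode());
        assertTrue(response.body().contains("order not found")); // placeholder message
    }
}

A real version would also validate the response body against the Swagger/OpenAPI schema, which typically needs an extra validation library on top of this.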

Related

Unit testing handling of degraded network stack, file corruption, and other imperfections

I'm primarily a C++ coder, and thus far, have managed without really writing tests for all of my code. I've decided this is a Bad Idea(tm), after adding new features that subtly broke old features, or, depending on how you wish to look at it, introduced some new "features" of their own.
But unit testing seems to be an extremely brittle mechanism. You can test for something in "perfect" conditions, but you don't get to see how your code performs when stuff breaks. Take, for instance, a crawler; let's say it crawls a few specific sites for data X. Do you simply save sample pages, test against those, and hope that the sites never change? This would work fine as regression tests, but what sort of tests would you write to constantly check those sites live and let you know when the application isn't doing its job because the site changed something that now causes your application to crash? Wouldn't you want your test suite to monitor the intent of the code?
The above example is a bit contrived, and something I haven't run into (in case you haven't guessed). Let me pick something I have, though. How do you test an application will do its job in the face of a degraded network stack? That is, say you have a moderate amount of packet loss, for one reason or another, and you have a function DoSomethingOverTheNetwork() which is supposed to degrade gracefully when the stack isn't performing as it's supposed to; but does it? The developer tests it personally by purposely setting up a gateway that drops packets to simulate a bad network when he first writes it. A few months later, someone checks in some code that modifies something subtly, so the degradation isn't detected in time, or the application doesn't even recognize the degradation, and this is never caught, because you can't run real-world tests like this using unit tests, can you?
Further, how about file corruption? Let's say you're storing a list of servers in a file, and the checksum looks okay, but the data isn't really. You want the code to handle that, you write some code that you think does that. How do you test that it does exactly that for the life of the application? Can you?
Hence, brittleness. Unit tests seem to test the code only in perfect conditions (and this is promoted, with mock objects and such), not what they'll face in the wild. Don't get me wrong, I think unit tests are great, but a test suite composed only of them seems to be a smart way to introduce subtle bugs in your code while feeling overconfident about its reliability.
How do I address the above situations? If unit tests aren't the answer, what is?
Edit: I see a lot of answers that say "just mock it". Well, you can't "just mock it", here's why:
Taking my example of the degrading network stack, let's assume your function has a well defined NetworkInterface, which we'll mock. The application sends out packets over both TCP and UDP. Now, let's say, hey, let's simulate 10% loss on the interface using a mock object, and see what happens. Your TCP connections increase their retry attempts, as well as increasing their back-off, all good practice. You decide to change X% of your UDP packets to actually make a TCP connection: the interface is lossy, and we want to be able to guarantee delivery of some packets, while the others shouldn't lose too much. Works great. Meanwhile, in the real world... when you increase the number of TCP connections (or data over TCP) on a connection that's lossy enough, you'll end up increasing your UDP packet loss, as your TCP connections will end up re-sending their data more and more and/or reducing their window, causing your 10% packet loss to actually be more like 90% UDP packet loss now. Whoopsie.
No biggie, let's break that up into UDPInterface, and TCPInterface. Wait a minute.. those are interdependent, testing 10% UDP loss and 10% TCP loss is no different than the above.
So, the issue is that now you're not simply unit testing your code, you're baking assumptions about the way the operating system's TCP stack works into your tests. And that's a Bad Idea(tm). A much worse idea than just avoiding this entire fiasco.
At some point, you're going to have to create a Mock OS, which behaves exactly like your real OS, except, is testable. That doesn't seem like a nice way forward.
This is stuff we've experienced, I'm sure others can add their experiences too.
I hope someone will tell me I'm very wrong, and point out why!
Thanks!
You start by talking about unit tests, then talk about entire applications; it seems you are a little confused about what unit testing is. Unit testing by definition is about testing at the most fine grained level, when each "unit" of the software is being tested. In common use, a "unit" is an individual function, not an entire application. Contemporary programming style has short functions, each of which does one well defined thing, which is therefore easy to unit test.
what sort of tests would you write to constantly check those sites live?
UnitTests target small sections of code you write. UnitTests do not confirm that things are ok in the world. You should instead define application behavior for those imperfect scenarios. Then you can UnitTest your application in those imperfect scenarios.
for instance a crawler
A crawler is a large body of code you might write. It has some different parts, one part might fetch a webpage. Another part might analyze html. Even these parts may be too large to write a unit test against.
How do you test an application will do its job in the face of a degraded network stack?
The developer tests it personally by purposely setting up a gateway that drops packets to simulate a bad network when he first writes it.
If a test uses the network, it's not a UnitTest.
A UnitTest (which must target your code) cannot call the network. You didn't write the network. The UnitTest should involve a mock network with simulated (but consistent each time) packet loss.
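As a rough sketch of that idea (the NetworkInterface abstraction and the class names below are invented for illustration, not part of any real library), a fake implementation with a fixed random seed gives you packet loss that is identical on every test run:

import java.util.Random;

interface NetworkInterface {
    // Returns true if the packet was delivered, false if it was dropped.
    boolean send(byte[] packet);
}

// Deterministic "lossy" network: drops a fixed fraction of packets, and the
// fixed seed means every test run sees exactly the same drops.
class LossyNetwork implements NetworkInterface {
    private final Random random = new Random(42); // fixed seed => repeatable
    private final double lossRate;

    LossyNetwork(double lossRate) {
        this.lossRate = lossRate;
    }

    @Override
    public boolean send(byte[] packet) {
        return random.nextDouble() >= lossRate;
    }
}

A unit test can then hand new LossyNetwork(0.10) to the code under test and assert that the retry or degradation behavior you have defined actually happens.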
Unit tests seem to test the code only in perfect conditions
UnitTests test your code in defined conditions. If you're only capable of defining perfect conditions, your statement is true. If you're capable of defining imperfect conditions, your statement is false.
Work through any decent book on unit testing - you'll find that it's normal practice to write tests that do indeed cover edge cases where the input is not ideal or is plain wrong.
The most common approach in languages with exception handling is a "should throw" specification, where a certain test is expected to cause a specific exception type to be thrown. If it doesn't throw an exception, the test fails.
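In JUnit 5, for example, a "should throw" expectation looks roughly like this; the ConfigParser class is a made-up example included only to keep the snippet self-contained:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertThrows;

// Hypothetical class under test: rejects null input.
class ConfigParser {
    static String parseConfig(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("config must not be null");
        }
        return raw.trim();
    }
}

class ConfigParserTest {
    @Test
    void nullInputThrowsIllegalArgumentException() {
        // The test fails unless exactly this exception type is thrown.
        assertThrows(IllegalArgumentException.class, () -> ConfigParser.parseConfig(null));
    }
}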
Update
In your update you describe complex timing-sensitive interactions. Unit testing simply doesn't help at all there. No need to introduce networking: just think of trying to write a simple thread-safe queue class, perhaps on a platform with some new concurrency primitives. Test it on an 8-core system... does it work? You simply can't know that for sure by testing it. There are just too many different ways that the timing can cause operations to overlap between the cores. Depending on luck, it could take weeks of continuous execution before some really unlikely coincidence occurs. The only way to get such things right is through careful analysis (static checking tools can help). It's likely that most concurrent software has some rarely occurring bugs in it, including all operating systems.
Returning to the cases that can actually be tested, I've found integration tests to be often just as useful as unit tests. This can be as elaborate as automating the installation of your product, adding configurations to it (such as your users might create) and then "poking" it from the outside, e.g. automating your UI. This finds a whole other class of issue separate from unit testing.
It sounds as if you answered your own question.
Mocks/stubs are the key to testing difficult-to-test areas. For all of your examples, you could take the manual approach of, say, creating a website with dodgy data or causing a network failure by hand. However, that would be very difficult and tedious, not something anyone would recommend. In fact, doing so would mean you are not actually unit testing.
Instead you'd use mocks/stubs to pretend such scenarios have happened, allowing you to test them. The benefit of using mocks is that, unlike the manual approach, you can guarantee that each time you run your tests the same procedure will be carried out. The tests in turn will be much faster and more stable because of this.
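As a sketch of that with Mockito (the PageFetcher interface and Crawler class are invented here so the example is self-contained), you can stub a dependency to simulate a network failure deterministically and assert that your code handles it as specified:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.io.IOException;
import org.junit.jupiter.api.Test;

// Hypothetical collaborator and class under test, defined here so the
// example is self-contained.
interface PageFetcher {
    String fetch(String url) throws IOException;
}

class Crawler {
    private final PageFetcher fetcher;
    Crawler(PageFetcher fetcher) { this.fetcher = fetcher; }

    // Returns the page body, or an empty string if fetching fails.
    String crawl(String url) {
        try {
            return fetcher.fetch(url);
        } catch (IOException e) {
            return "";
        }
    }
}

class CrawlerTest {
    @Test
    void returnsEmptyStringWhenTheNetworkFails() throws Exception {
        PageFetcher failingFetcher = mock(PageFetcher.class);
        // Simulate the failure scenario deterministically, every run.
        when(failingFetcher.fetch("http://example.com"))
                .thenThrow(new IOException("connection reset"));

        Crawler crawler = new Crawler(failingFetcher);

        assertEquals("", crawler.crawl("http://example.com"));
    }
}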
Edit - With regard to the updated question.
Just as a disclaimer, my networking experience is very limited, so I can't comment on the technical side of your issues. However, I can comment on the fact that you sound as if you are testing too much. In other words, your tests cover too wide a scope. I don't know what your code base is like, but given the functions/objects within it, you should still be able to provide fake input that will allow you to test that your objects/functions do the right thing in isolation.
So let's imagine your isolated areas work fine given the requirements. Just because your unit tests pass does not mean you've tested your application. You'll still need to manually test such scenarios as you describe. In this case it sounds as if stress testing - limiting network resources and so on - is required. If your application works as expected, great. If not, you've got missing tests. Unit testing (more in line with TDD/BDD) is about ensuring small, isolated areas of your application work. You still need integration/manual/regression etc. testing afterwards. Therefore you should use mocks/stubs to test that your small, isolated areas function. Unit testing is more akin to a design process than anything else, in my opinion.
Integration Testing vs Unit Testing
I should preface this answer by saying I am biased towards integration tests over unit tests as the primary type of test used in TDD. At work we also have some unit tests mixed in, but only as necessary. The primary reason why we start with an integration test is that we care more about what the application is doing than about what a particular function does. We also get integration coverage, which has been, in my experience, a huge gap in automated testing.
To Mock or Not, Why Not Do Both
Our integration tests can run either fully wired (to unmanaged resources) or with mocks. We have found that this helps cover the gap between the real world and mocks. It also gives us the option to decide NOT to have a mocked version because the ROI for implementing the mock isn't worth it. You may ask why use mocks at all:
the test suite runs faster
guaranteed same response every time (no timeouts, unforeseen degraded network, etc)
fine-grained control over behavior
Sometimes you shouldn't write a test
Testing, any kind of testing, has trade-offs. You look at the cost to implement the test, the mock, the variant tests, etc., and weigh that against the benefits, and sometimes it doesn't make sense to write the test, the mock, or the variant. This decision is also made within the context of the kind of software you're building, which really is one of the major factors in deciding how deep and broad your test suite needs to be. To put it another way, I'll write a few tests for the social bacon meetup feature, but I'm not going to write a formal verification test for the bacon-friend algorithm.
Do you simply save sample pages, test against those, and hope that the sites never change?
Testing is not a panacea
Yes, you save samples (as fixtures). You don't hope the page doesn't change, but you can't know how and when it will change. If you have ideas or parameters of how it may change then you can create variants to make sure your code will handle those variants. When and if it does change, and it breaks, you add new samples, fix the problems and move on.
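A small sketch of the fixture approach, assuming a hypothetical PageParser and a sample page saved under src/test/resources; the file path and the markup pattern are placeholders:

import static org.junit.jupiter.api.Assertions.assertTrue;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.jupiter.api.Test;

// Hypothetical extraction code, kept trivial so the example stays self-contained.
class PageParser {
    static List<String> extractProductNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern.compile("<h2 class=\"product\">([^<]+)</h2>").matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}

class PageParserTest {
    @Test
    void extractsProductNamesFromSavedSamplePage() throws Exception {
        // A page saved earlier as a fixture; it never changes between runs.
        String html = Files.readString(Path.of("src/test/resources/sample-page.html"));

        List<String> names = PageParser.extractProductNames(html);

        assertTrue(names.contains("Bacon"), "expected the sample page to list Bacon");
    }
}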
what sort of tests would you write to constantly check those sites live and let you know when the application isn't doing its job because the site changed something, that now causes your application to crash?
Testing != Monitoring
Tests are tests and part of development (and QA), not for production. MONITORING is what you use in production to make sure your application is working properly. You can write monitors which should alert you when something is broken. That's a whole other topic.
How do you test an application will do its job in the face of a degraded network stack?
Bacon
If it were me I would have a wired and a mocked mode for the test (assuming the mock was good enough to be useful). If the mock is difficult to get right, or if it's not worth it, then I would just have the wired test. However, I have found that there is almost always a way to split the variables in play into different tests. Each of those tests is then targeted at testing that vector of change, while minimizing all the other variability in play. The trick is to write the important variants, not every possible variant.
Further, how about file corruption?
How Much Testing
You mention the checksum being correct but the file actually being corrupt. The question here is what class of software I'm writing. Do I need to be super paranoid about the possibility of a statistically small false positive or not? If I do, then we work out how deep and broad to test.
I think you can't and shouldn't write a unit test for every possible error you might face (what if a meteorite hits the db server?) - you should make an effort to test errors of reasonable probability and/or rely on other services.
For example, if your application requires the correct arrival of network packets, you should use the TCP transport layer: it guarantees the correctness of the received packets transparently, so you only have to concentrate on, e.g., what happens if the network connection is dropped.
Checksums are designed to detect or correct a reasonable number of errors - if you expect 10 errors per file, you would use a different checksum than if you expect 100 errors. If the chosen checksum indicates that the file is correct, then you have no reason to think it's broken (the probability that it is broken is negligible).
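For example, with Java's built-in CRC32 (a checksum intended to catch a modest number of accidental errors, not deliberate tampering), the verification looks roughly like this; the file name and the stored checksum value are placeholders:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

public class ChecksumCheck {
    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Path.of("servers.list")); // placeholder file

        CRC32 crc = new CRC32();
        crc.update(data);
        long actual = crc.getValue();

        long expected = 0x1C291CA3L; // checksum stored when the file was written (example value)

        if (actual == expected) {
            System.out.println("File looks intact (within the limits of the checksum).");
        } else {
            System.out.println("File is corrupt: checksum mismatch.");
        }
    }
}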
Because you don't have infinite resources (e.g. time), you have to make compromises when you write your tests, and choosing those compromises is a tough question.
Although not a complete answer to the massive dilemma you face, you can reduce the number of tests by using a technique called Equivalence Partitioning.
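A brief sketch of equivalence partitioning using JUnit 5 parameterized tests: rather than testing every possible age, you pick one representative per partition plus the boundaries (the AgeValidator below is hypothetical):

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

// Hypothetical validator: ages 0-17 are minors, 18-120 adults, anything else invalid.
class AgeValidator {
    static String classify(int age) {
        if (age < 0 || age > 120) return "invalid";
        return age < 18 ? "minor" : "adult";
    }
}

class AgeValidatorTest {
    // One representative value per equivalence class, plus the boundaries.
    @ParameterizedTest
    @CsvSource({
            "-1, invalid",
            "0, minor",
            "17, minor",
            "18, adult",
            "120, adult",
            "121, invalid"
    })
    void classifiesRepresentativeValues(int age, String expected) {
        assertEquals(expected, AgeValidator.classify(age));
    }
}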
In my organization, we perform many levels of testing - coverage, regression, positive, negative, scenario-based, and UI - in both automated and manual tests, all starting from a 'clean environment', but even that isn't perfect.
As for one of the cases you mention, where a programmer comes in and changes some sensitive detection code and no one notices: we would have a snapshot of data that is 'behaviourally dodgy', which consistently fails a specific test of the detection routine - and we would run all tests regularly (and not just at the last minute).
Sometimes I'll create two (or more) test suites. One suite uses mocks/stubs and only tests the code I'm writing. The other tests test the database, web sites, network devices, other servers, and whatever else is outside of my control.
Those other tests are really tests of my assumptions about the systems my code interacts with. So if they fail, I know my requirements have changed. I can then update my internal tests to reflect whatever new behavior my code needs to have.
The internal tests include tests that simulate various failures of the external systems. Whenever I observe a new kind of failure, either through my other tests or as a result of a bug report, I have a new internal test to write.
Writing tests that model all the bizarre things that happen in the real world can be challenging, but the result is that you really think about all those cases, and produce robust code.
The proper use of Unit Testing starts from the ground up. That is, you write your unit tests BEFORE you write your production code. The unit tests are then forced to consider error conditions, pre-conditions, post-conditions, etc. Once you write your production code (and the unit tests are able to compile and run successfully), if someone makes a change to the code that changes any of its conditions (even subtly), the unit test will fail and you will learn about it very quickly (either via compiler error or via a failed unit test).
EDIT: Regarding the updated question
What you are trying to test is not really well suited to unit testing. Networking and database connections test better in a simulated integration test. There are far too many things that can break during the initialization of a remote connection to create a useful unit test for it (I'm sure there are some unit-tests-fix-all people who will disagree with me there, but in my experience, trying to unit test network traffic and/or remote database functionality is worse than shoving a square peg through a round hole).
You are talking about library or application testing, which is not the same as unit testing. You can use unit testing libraries such as CppUnit/NUnit/JUnit for library and regression testing purposes, but as others have said, unit testing is about testing your lowest level functions, which are supposed to be very well defined and easily separated from the rest of the code. Sure, you could pass all low-level unit tests, and still have a network failure in the full system.
Library testing can be very difficult, because sometimes only a human can evaluate the output for correctness. Consider a vector graphics or font rendering library; there's no single perfect output, and you may get a completely different result based on the video card in your machine.
Or testing a PDF parser or a C++ compiler is dauntingly difficult, due to the enormous number of possible inputs. This is when owning 10 years of customer samples and defect history is way more valuable than the source code itself. Almost anyone can sit down and code it, but initially you won't have a way of validating your program for correctness.
The beauty of mock objects is that you can have more than one. Assume that you are programming against a well-defined interface for a network stack. Then you can have a mock object WellBehavingNetworkStack to test the normal case and another mock object OddlyBehavingNetworkStack that simulates some of the network failures that you expect.
Using unit tests I usually also test argument validation (like ensuring that my code throws NullPointerExceptions), and this is easy in Java, but difficult in C++, since in the latter language you can hit undefined behavior quite easily, and then all bets are off. Therefore you cannot be strictly sure that your unit tests work, even if they seem to. But still you can test for odd situations that do not invoke undefined behavior, which should be quite a lot in well-written code.
What you are talking about is making applications more robust. That is, you want them to handle failures elegantly. However, testing every possible real-world failure scenario would be difficult if not impossible. The key to making applications robust is to assume that failure is normal and should be expected at some point in the future. How an application handles failure really depends on the situation. There are a number of different ways to detect and handle failure (maybe a good question to ask the group). Trying to rely on unit testing alone will only get you part of the way. Anticipating failure (even in some simple operations) will get you even closer to a more robust application. Amazon built their entire system to anticipate all types of failures (hardware, software, memory and file corruption). Take a look at their Dynamo for an example of real-world error handling.

Is there such a thing as too much unit testing?

I tried looking through all the pages about unit tests and could not find this question. If this is a duplicate, please let me know and I will delete it.
I was recently tasked to help implement unit testing at my company. I realized that I could unit test all the Oracle PL/SQL code, Java code, HTML, JavaScript, XML, XSLT, and more.
Is there such a thing as too much unit testing? Should I write unit tests for everything above or is that overkill?
This depends on the project and its tolerance for failure. There is no single answer. If you can risk a bug, then don't test everything.
When you have tons of tests, it is also likely you will have bugs in your tests, adding to your headaches.
Test what needs testing and leave what does not, which often means leaving out the fairly simple stuff.
Is there such a thing as too much unit testing?
Sure. The problem is finding the right balance between enough unit testing to cover the important areas of functionality, and focusing effort on creating new value for your customers in the terms of system functionality.
Unit testing code vs. leaving code uncovered by tests both have a cost.
The costs of excluding code from unit testing may include (but aren't limited to):
Increased development time due to fixing issues you can't automatically test
Fixing problems discovered during QA testing
Fixing problems discovered when the code reaches your customers
Loss of revenue due to customer dissatisfaction with defects that made it through testing
The costs of writing a unit test include (but aren't limited to):
Writing the original unit test
Maintaining the unit test as your system evolves
Refining the unit test to cover more conditions as you discover them in testing or production
Refactoring unit tests as the underlying code under test is refactored
Lost revenue when it takes longer for your application to enter the market
The opportunity cost of implementing features that could drive sales
You have to make your best judgement about what these costs are likely to be, and what your tolerance is for absorbing such costs.
In general, unit testing costs are mostly absorbed during the development phase of a system - and somewhat during its maintenance. If you spend too much time writing unit tests you may miss a valuable window of opportunity to get your product to market. This could cost you sales or even long-term revenue if you operate in a competitive industry.
The cost of defects is absorbed during the entire lifetime of your system in production - up until the point the defect is corrected. And potentially even beyond that, if the defect is significant enough that it affects your company's reputation or market position.
Kent Beck of JUnit and JUnitMax fame answered a similar question of mine.
The question has slightly different semantics but the answer is definitely relevant
The purpose of unit tests is generally to make it possible to refactor or change with greater assurance that you did not break anything. If a change is scary because you do not know whether you will break anything, you probably need to add a test. If a change is tedious because it will break a lot of tests, you probably have too many tests (or tests that are too fragile).
The most obvious case is the UI. What makes a UI look good is something that is hard to test, and using a master example tends to be fragile. So the layer of the UI involving the look of something tends not to be tested.
The other times it might not be worth it is if the test is very hard to write and the safety it gives is minimal.
For HTML I tended to check that the data I wanted was there (using XPath queries), but did not test the entire HTML. Similarly for XSLT and XML. In JavaScript, when I could I tested libraries but left the main page alone (except that I moved most code into libraries). If the JavaScript is particularly complicated I would test more. For databases I would look into testing stored procedures and possibly views; the rest is more declarative.
However, in your case first start with the stuff that worries you the most or is about to change, especially if it is not too difficult to test. Check the book Working Effectively with Legacy Code for more help.
Yes, there is such a thing as too much unit testing. One example would be unit testing in a whitebox manner, such that you're effectively testing the specific implementation; such testing slows down progress and refactoring by requiring otherwise-correct code to come with new unit tests (because the tests were dependent upon specific implementation details).
I suggest that in some situations you might want automated testing, but no 'unit' testing at all (Should one test internal implementation, or only test public behaviour?), and that any time spent writing unit tests would be better spent writing system tests.
While more tests is usually better (I have yet to be on a project that actually had too many tests), there's a point at which the ROI bottoms out, and you should move on. I'm assuming you have finite time to work on this project, by the way. ;)
Adding unit tests has some amount of diminishing returns -- after a certain point (Code Complete has some theories), you're better off spending your finite amount of time on something else. That may be more testing/quality activities like refactoring and code review, usability testing with real human users, etc., or it could be spent on other things like new features, or user experience polish.
As EJD said, you can't verify the absence of errors.
This means there are always more tests you could write. Any of these could be useful.
What you need to understand is that unit-testing (and other types of automated testing you use for development purposes) can help with development, but should never be viewed as a replacement for formal QA.
Some tests are much more valuable than others.
There are parts of your code that change a lot more frequently, are more prone to break, etc. These are the most economical tests.
You need to balance out the amount of testing you agree to take on as a developer. You can easily overburden yourself with unmaintainable tests. IMO, unmaintainable tests are worse than no tests because they:
Turn others off from trying to maintain a test suite or write new tests.
Detract from adding new, meaningful functionality. If automated testing is not a net-positive result, you should ditch it, as you would any other engineering practice that isn't paying off.
What should I test?
Test the "Happy Path" - this ensures that you get interactions right, and that things are wired together properly. But you don't adequately test a bridge by driving down it on a sunny day with no traffic.
Pragmatic Unit Testing recommends you use Right-BICEP to figure out what to test. "Right" for the happy path, then Boundary conditions, check any Inverse relationships, use another method (if it exists) to Cross-check results, force Error conditions, and finally take into account any Performance considerations that should be verified. I'd say if you are thinking about tests to write in this way, you're most likely to figure out how to get to an adequate level of testing. You'll be able to figure out which ones are more useful and when. See the book for much more info.
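For instance, the "Inverse relationships" and "Cross-check" parts of Right-BICEP might look like this for a square-root routine, using Math.sqrt as the function under test purely for illustration:

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class SquareRootTest {
    @Test
    void squaringTheResultGivesBackTheInput() {
        double input = 1764.0;

        double root = Math.sqrt(input);

        // Inverse relationship: squaring the result should recover the input
        // (within floating-point tolerance).
        assertEquals(input, root * root, 1e-9);
    }

    @Test
    void crossCheckAgainstAnotherWayOfComputingTheSameThing() {
        double input = 1764.0;

        // Cross-check: pow(x, 0.5) computes the same value by a different route.
        assertEquals(Math.pow(input, 0.5), Math.sqrt(input), 1e-9);
    }
}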
Test at the right level
As others have mentioned, unit tests are not the only way to write automated tests. Other types of frameworks may be built off of unit tests, but provide mechanisms to do package level, system or integration tests. The best bang for the buck may be at a higher level, and just using unit testing to verify a single component's happy path.
Don't be discouraged
I'm painting a more grim picture here than I expect most developers will find in reality. The bottom line is that you make a commitment to learn how to write tests and write them well. But don't let fear of the unknown scare you into not writing any tests. Unlike production code, tests can be ditched and rewritten without many adverse effects.
Unit test any code that you think might change.
You should only really write unit tests for any code which you have written yourself. There is no need to test the functionality inherently provided to you.
For example, if you've been given a library with an add function, you should not be testing that add(1,2) returns 3. Now, if you've WRITTEN that code, then yes, you should be testing it.
Of course, whoever wrote the library may not have tested it and it may not work... in which case you should write it yourself or get a separate one with the same functionality.
Well, you certainly shouldn't unit test everything, but at least the complicated tasks or those that will most likely contain errors/cases you haven't thought of.
The point of unit testing is being able to run a quick set of tests to verify that your code is correct. This lets you verify that your code matches your specification and also lets you make changes and ensure that they don't break anything.
Use your judgement. You don't want to spend all of your time writing unit tests or you won't have any time to write actual code to test.
When you've unit tested your unit tests, thinking you have thereby provided 200% coverage.
There is a development approach called test-driven development which essentially says that there is no such thing as too much (non-redundant) unit testing. That approach, however, is not a testing approach, but rather a design approach which relies on working code and a more or less complete unit test suite with tests which drive every single decision made about the codebase.
In a non-TDD situation, automated tests should exercise every line of code you write (in particular Branch coverage is good), but even then there are exceptions - you shouldn't be testing vendor-supplied platform or framework code unless you know for certain that there are bugs which will affect you in that platform. You shouldn't be testing thin wrappers (or, equally, if you need to test it, the wrapper is not thin). You should be testing all core business logic, and it is certainly helpful to have some set of tests that exercise your database at some elemental level, although those tests will never work in the common situation where unit tests are run every time you compile.
Database testing specifically is intrinsically slow and, depending on how much logic is held in your database, quite difficult to get right. Typically, things like dbs, HTML/XML documents & templating, and other document-ish aspects of a program are verified rather than tested. The difference is usually that testing tries to exercise execution paths, whereas verification tries to verify inputs and outputs directly.
To learn more about this I would suggest reading up on "Code Coverage". There is a lot of material available if you're curious about this.

Are unit tests and acceptance tests enough?

If I have unit tests for each class and/or member function and acceptance tests for every user story do I have enough tests to ensure the project functions as expected?
For instance if I have unit tests and acceptance tests for a feature do I still need integration tests or should the unit and acceptance tests cover the same ground? Is there overlap between test types?
I'm talking about automated tests here. I know manual testing is still needed for things like ease of use, etc.
If I have unit tests for each class and/or member function and acceptance tests for every user story do I have enough tests to ensure the project functions as expected?
No. Tests can only verify what you have thought of. Not what you haven't thought of.
I'd recommend reading chapters 20 - 22 in the 2nd edition of Code Complete. It covers software quality very well.
Here's a quick breakdown of some of the key points (all credit goes to McConnell, 2004)
Chapter 20 - The Software-Quality Landscape:
No single defect-detection technique is completely effective by itself
The earlier you find a defect, the less intertwined it will become with the rest of your code and the less damage it will cause
Chapter 21 - Collaborative Construction:
Collaborative development practices tend to find a higher percentage of defects than testing and to find them more efficiently
Collaborative development practices tend to find different kinds of errors than testing does, implying that you need to use both reviews and testing to ensure the quality of your software
Pair programming typically costs about the same as inspections and produces similar quality code
Chapter 22 - Developer Testing:
Automated testing is useful in general and is essential for regression testing
The best way to improve your testing process is to make it regular, measure it, and use what you learn to improve it
Writing test cases before the code takes the same amount of time and effort as writing the test cases after the code, but it shortens defect-detection-debug-correction-cycles (Test Driven Development)
As far as how you are formulating your unit tests, you should consider basis testing, data-flow analysis, boundary analysis etc. All of these are explained in great detail in the book (which also includes many other references for further reading).
Maybe this isn't exactly what you were asking, but I would say automated testing is definitely not enough of a strategy. You should also consider such things as pair programming, formal reviews (or informal reviews, depending on the size of the project) and test scaffolding along with your automated testing (unit tests, regression testing etc.).
The idea of multiple testing cycles is to catch problems as early as possible when things change.
Unit tests should be done by the developers to ensure the units work in isolation.
Acceptance tests should be done by the client to ensure the system meets the requirements.
However, something has changed between those two points that should also be tested. That's the integration of units into a product before being given to the client.
That's something that should first be tested by the product creator, not the client. The minute you involve the client, things slow down, so the more fixes you can do before they get their grubby little hands on it, the better.
In a big shop (like ours), there are unit tests, integration tests, globalization tests, master-build tests and so on at each point where the deliverable product changes. Only once all high severity bugs are fixed (and a plan for fixing low priority bugs is in place) do we unleash the product to our beta clients.
We do not want to give them a dodgy product simply because fixing a bug at that stage is a lot more expensive (especially in terms of administrivia) than anything we do in-house.
It's really impossible to know whether or not you have enough tests based simply on whether you have a test for every method and feature. Typically I will combine testing with coverage analysis to ensure that all of my code paths are exercised in my unit tests. Even this is not really enough, but it can be a guide to where you may have introduced code that isn't exercised by your tests. This should be an indication that more tests need to be written or, if you're doing TDD, you need to slow down and be more disciplined. :-)
Tests should cover both good and bad paths, especially in unit tests. Your acceptance tests may be more or less concerned with the bad path behavior but should at least address common errors that may be made. Depending on how complete your stories are, the acceptance tests may or may not be adequate. Often there is a many-to-one relationship between acceptance tests and stories. If you only have one automated acceptance test for every story, you probably don't have enough unless you have different stories for alternate paths.
Multiple layers of testing can be very useful. Unit tests to make sure the pieces behave; integration to show that clusters of cooperating units cooperate as expected, and "acceptance" tests to show that the program functions as expected. Each can catch problems during development. Overlap per se isn't a bad thing, though too much of it becomes waste.
That said, the sad truth is that you can never ensure that the product behaves "as expected", because expectation is a fickle, human thing that gets translated very poorly onto paper. Good test coverage won't prevent a customer from saying "that's not quite what I had in mind...". Frequent feedback loops help there. Consider frequent demos as a "sanity test" to add to your manual mix.
Probably not, unless your software is really, really simple and has only one component.
Unit tests are very specific, and you should cover everything thoroughly with them. Go for high code-coverage here. However, they only cover one piece of functionality at a time and not how things work together. Acceptance tests should cover only what the customer really cares about at a high level, and while it will catch some bugs in how things work together, it won't catch everything as the person writing such tests will not know about the system in depth.
Most importantly, these tests may not be written by a tester. Unit tests should be written by developers and run frequently (up to every couple minutes, depending on coding style) by the devs (and by the build system too, ideally). Acceptance tests are often written by the customer or someone on behalf of the customer, thinking about what matters to the customer. However, you also need tests written by a tester, thinking like a tester (and not like a dev or customer).
You should also consider the following sorts of tests, which are generally written by testers:
Functional tests, which will cover pieces of functionality. This may include API testing and component-level testing. You will generally want good code-coverage here as well.
Integration tests, which put two or more components together to make sure that they work together. You don't want one component to put out the position in the array where the object is (0-based) when the other component expects the count of the object ("nth object", which is 1-based), for example. Here, the focus is not on code coverage but on coverage of the interfaces (general interfaces, not code interfaces) between components.
System-level testing, where you put everything together and make sure it works end-to-end.
Testing for non-functional features, like performance, reliability, scalability, security, and user-friendliness (there are others; not all will relate to every project).
Integration tests are for when your code integrates with other systems such as 3rd party applications, or other in house systems such as the environment, database etc. Use integration tests to ensure that the behavior of the code is still as expected.
In short no.
To begin with, your story cards should have acceptance criteria. That is, acceptance criteria specified by the product owner in conjunction with the analyst, describing the required behavior; if they are met, the story card will be accepted.
The acceptance criteria should drive the automated unit test (done via TDD) and the automated regression/ functional tests which should be run daily. Remember we want to move defects to the left, that is, the sooner we find ‘em the cheaper and faster they are to fix. Furthermore, continuous testing enables us to refactor with confidence. This is required to maintain a sustainable pace for development.
In addition, you need automated performance tests. Running a profiler daily or overnight would provide insight into the consumption of CPU and memory and whether any memory leaks exist. Furthermore, a tool like LoadRunner will enable you to place a load on the system that reflects actual usage. You will be able to measure response times and CPU and memory consumption on the production-like machine running LoadRunner.
The automated performance test should reflect actual usage of the app. You measure the number of business transactions (i.e., for a web application, a click on a page and the response to the user, or round trips to the server) and determine the mix of such transactions along with the rate at which they arrive per second. That information will enable you to properly design the automated LoadRunner test required to performance-test the application. As is often the case, some of the performance issues will trace back to the implementation of the application, while others will be determined by the configuration of the server environment.
Remember, your application will be performance tested. The question is, will the first performance test happen before or after you release the software? Believe me, the worst place to have a performance problem is in production. Performance issues can be the hardest to fix and can cause a deployment to all users to fail, thus cancelling the project.
Finally, there is User Acceptance Testing (UAT). These are tests designed by the product owner/business partner to test the overall system prior to release. In general, because of all the other testing, it is not uncommon for the application to return zero defects during UAT.
It depends on how complex your system is. If your acceptance tests (which satisfy the customers requirements) exercise your system from front to back, then no you don't.
However, if your product relies on other tiers (like backend middleware/database) then you do need a test that proves that your product can happily link up end-to-end.
As other people have commented, tests don't necessarily prove the project functions as expected, just how you expect it to work.
Frequent feedback loops to the customer and/or tests that are written/parsable in a way the customer understands (say for example in a BDD style ) can really help.
If I have unit tests for each class and/or member function and acceptance tests for every user story do I have enough tests to ensure the project functions as expected?
This is enough to show your software is functionally correct, at least to the extent that your test coverage is sufficient. Now, depending on what you're developing, there certainly are non-functional requirements that matter; think about reliability, performance and scalability.
Technically, a full suite of acceptance tests should cover everything. That being said, they're not "enough" for most definitions of enough. By having unit tests and integration tests, you can catch bugs/issues earlier and in a more localized manner, making them much easier to analyze and fix.
Consider that a full suite of manually executed tests, with the directions written on paper, would be enough to validate that everything works as expected. However, if you can automate the tests, you'd be much better off because it makes doing the testing that much easier. The paper version is "complete", but not "enough". In the same way, each layer of tests adds more to the value of "enough".
It's also worth noting that the different sets of tests tend to test the product/code from a different "viewpoint". Much the same way QA may pick up bugs that dev never thought to test for, one set of tests may find things the other set wouldn't.
Acceptance testing can even be done manually by the client if the system in hand is small.
Unit tests and small integration tests (consisting of unit-like tests) are there for you to build a sustainable system.
Don't try to write tests for each part of the system. That is brittle (easy to break) and overwhelming.
Decide on the critical parts of the system that take too much time to test manually, and write acceptance tests only for those parts to make things easy for everyone.

What are the pros and cons of automated Unit Tests vs automated Integration tests?

Recently we have been adding automated tests to our existing java applications.
What we have
The majority of these tests are integration tests, which may cover a stack of calls like:
HTTP post into a servlet
The servlet validates the request and calls the business layer
The business layer does a bunch of stuff via hibernate etc and updates some database tables
The servlet generates some XML, runs this through XSLT to produce response HTML.
We then verify that the servlet responded with the correct XML and that the correct rows exist in the database (our development Oracle instance). These rows are then deleted.
We also have a few smaller unit tests which check single method calls.
These tests are all run as part of our nightly (or adhoc) builds.
The Question
This seems good because we are checking the boundaries of our system: servlet request/response on one end and the database on the other. If these work, then we are free to refactor or mess with anything in between and have some confidence that the servlet under test continues to work.
What problems are we likely to run into with this approach?
I can't see how adding a bunch more unit tests on individual classes would help. Wouldn't that make it harder to refactor as it's much more likely we will need to throw away and re-write tests?
Unit tests localize failures more tightly. Integration-level tests correspond more closely to user requirements and so are a better predictor of delivery success. Neither of them is much good unless built and maintained, but both of them are very valuable if properly used.
The thing with unit tests is that no integration-level test can exercise all the code as much as a good set of unit tests can. Yes, that can mean that you have to refactor the tests somewhat, but in general your tests shouldn't depend on the internals so much. So, let's say for example that you have a single function to compute a power of two. You describe it (as a formal methods guy, I'd claim you specify it):
long pow2(int p); // returns 2^p for 0 <= p <= 30
Your test and your spec look essentially the same (this is sort of pseudo-xUnit for illustration):
assertEqual(1073741824, pow2(30));
assertEqual(1, pow2(0));
assertException(domainError, pow2(-1));
assertException(domainError, pow2(31));
Now your implementation can be a for loop with a multiply, and you can come along later and change that to a shift.
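To make that concrete, here is roughly what the two interchangeable implementations of the pow2 specification might look like in Java; IllegalArgumentException stands in for the domainError of the pseudo-code above, and either body should satisfy the same tests:

// Two implementations of the pow2 spec above (2^p for 0 <= p <= 30).
// The unit tests pin down the behaviour, so either body may be swapped in.
public class Pow2 {

    // Version 1: a loop with a multiply.
    static long pow2Loop(int p) {
        if (p < 0 || p > 30) {
            throw new IllegalArgumentException("p must be in [0, 30]");
        }
        long result = 1;
        for (int i = 0; i < p; i++) {
            result *= 2;
        }
        return result;
    }

    // Version 2: the later refactoring to a shift.
    static long pow2Shift(int p) {
        if (p < 0 || p > 30) {
            throw new IllegalArgumentException("p must be in [0, 30]");
        }
        return 1L << p;
    }
}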
If you change the implementation so that, say, it's returning 16 bits (remember that sizeof(long) is only guaranteed to be no less than sizeof(short)), then these tests will fail quickly. An integration-level test should probably fail, but not certainly, and it's just as likely as not to fail somewhere far downstream of the computation of pow2(28).
The point is that they really test for different situations. If you could build sufficiently detailed and extensive integration tests, you might be able to get the same level of coverage and degree of fine-grained testing, but it's probably hard to do at best, and the exponential state-space explosion will defeat you. By partitioning the state space using unit tests, the number of tests you need grows much less than exponentially.
You are asking pros and cons of two different things (what are the pros and cons of riding a horse vs riding a motorcycle?)
Of course both are "automated tests" (~riding) but that doesn't mean that they are alternative (you don't ride a horse for hundreds of miles, and you don't ride a motorcycle in closed-to-vehicle muddy places)
Unit Tests test the smallest unit of the code, usually a method. Each unit test is closely tied to the method it is testing, and if it's well written it's tied (almost) only with that.
They are great to guide the design of new code and the refactoring of existing code. They are great to spot problems long before the system is ready for integration tests. Note that I wrote guide and all the Test Driven Development is about this word.
It does not make any sense to have manual Unit Tests.
What about refactoring, which seems to be your main concern? If you are refactoring just the implementation (content) of a method, but not its existence or "external behavior", the Unit Test is still valid and incredibly useful (you cannot imagine how much useful until you try).
If you are refactoring more aggressively, changing methods existence or behavior, then yes, you need to write a new Unit Test for each new method, and possibly throw away the old one. But writing the Unit Test, especially if you write it before the code itself, will help to clarify the design (i.e. what the method should do, and what it shouldn't) without being confused by the implementation details (i.e. how the method should do the thing that it needs to do).
Automated Integration Tests test the biggest unit of the code, usually the entire application.
They are great to test use cases which you don't want to test by hand. But you can also have manual Integration Tests, and they are as effective (only less convenient).
Starting a new project today, it does not make any sense not to have Unit Tests, but I'd say that for an existing project like yours it does not make too much sense to write them for everything you already have that is working.
In your case, I'd rather use a "middle ground" approach writing:
smaller Integration Tests which only test the sections you are going to refactor. If you are refactoring the whole thing, then you can use your current Integration Tests, but if you are refactoring only -say- the XML generation, it does not make any sense to require the presence of the database, so I'd write a simple and small XML Integration Test.
a bunch of Unit Tests for the new code you are going to write. As I already wrote above, Unit Tests will be ready as soon as you "mess with anything in between", making sure that your "mess" is going somewhere.
In fact your Integration Test will only make sure that your "mess" is not working (because at the beginning it will not work, right?) but it will not give you any clue on
why it is not working
if your debugging of the "mess" is really fixing something
if your debugging of the "mess" is breaking something else
Integration Tests will only give the confirmation at the end if the whole change was successful (and the answer will be "no" for a long time). The Integration Tests will not give you any help during the refactoring itself, which will make it harder and possibly frustrating. You need Unit Tests for that.
I agree with Charlie about integration-level tests corresponding more to user actions and the correctness of the system as a whole. I do think there is a lot more value to unit tests than just localizing failures more tightly, though. Unit tests provide two main values over integration tests:
1) Writing unit tests is as much an act of design as of testing. If you practice Test Driven Development/Behavior Driven Development, the act of writing the unit tests helps you design exactly what your code should do. It helps you write higher quality code (since being loosely coupled helps with testing) and it helps you write just enough code to make your tests pass (since your tests are in effect your specification).
2) The second value of unit tests is that if they are properly written they are very, very fast. If I make a change to a class in your project, can I run all the corresponding tests to see if I broke anything? How do I know which tests to run? And how long will they take? I can guarantee it will be longer than well-written unit tests. You should be able to run all of your unit tests in a couple of minutes at the most.
Just a few examples from personal experience:
Unit Tests:
(+) Keeps testing close to the relevant code
(+) Relatively easy to test all code paths
(+) Easy to see if someone inadvertently changes the behavior of a method
(-) Much harder to write for UI components than for non-GUI
Integration Tests:
(+) It's nice to have nuts and bolts in a project, but integration testing makes sure they fit each other
(-) Harder to localize source of errors
(-) Harder to test all (or even all critical) code paths
Ideally both are necessary.
Examples:
Unit test: Make sure that input index >= 0 and < length of array. What happens when outside bounds? Should method throw exception or return null?
Integration test: What does the user see when a negative inventory value is input?
The second affects both the UI and the back end. Both sides could work perfectly, and you could still get the wrong answer, because the error condition between the two isn't well-defined.
The best part about Unit testing we've found is that it makes devs go from code->test->think to think->test->code. If a dev has to write the test first, [s]he tends to think more about what could go wrong up front.
To answer your last question, since unit tests live so close to the code and force the dev to think more up front, in practice we've found that we don't tend to refactor the code as much, so less code gets moved around - so tossing and writing new tests constantly doesn't appear to be an issue.
The question has a philosophical part for sure, but also points to pragmatic considerations.
Test-driven design used as a means to become a better developer has its merits, but it is not required for that. Many a good programmer exists who never wrote a unit test. The best reason for unit tests is the power they give you when refactoring, especially when many people are changing the source at the same time. Spotting bugs on check-in is also a huge time-saver for a project (consider moving to a CI model and building on check-in instead of nightly). So if you write a unit test, either before or after you write the code it tests, you are sure at that moment about the new code you've written. It is what can happen to that code later that the unit test guards against - and that can be significant. Unit tests can stop bugs before they get to QA, thereby speeding up your projects.
Integration tests stress the interfaces between elements in your stack, if done correctly. In my experience, integration is the most unpredictable part of a project. Getting individual pieces to work tends not to be that hard, but putting everything together can be very difficult because of the types of bugs that can emerge at this step. In many cases, projects are late because of what happens in integration. Some of the errors encountered in this step are found in interfaces that have been broken by a change made on one side that was not communicated to the other side. Another source of integration errors is configuration discovered in dev but forgotten by the time the app goes to QA. Integration tests can help reduce both types dramatically.
The importance of each test type can be debated, but what will be of most importance to you is the application of either type to your particular situation. Is the app in question being developed by a small group of people or by many different groups? Do you have one repository for everything, or many repos, each for a particular component of the app? If you have the latter, then you will have challenges with compatibility between different versions of different components.
Each test type is designed to expose the problems of different levels of integration in the development phase, to save time. Unit tests drive the integration of the output of many developers operating on one repository. Integration tests (poorly named) drive the integration of components in the stack - components often written by separate teams. The class of problems exposed by integration tests is typically more time-consuming to fix.
So pragmatically, it really boils down to where you most need speed in your own org/process.
The thing that distinguishes unit tests from integration tests is the number of parts required for the test to run.
Unit tests (theoretically) require very few (or no) other parts to run.
Integration tests (theoretically) require lots (or all) other parts to run.
Integration tests test behaviour AND the infrastructure. Unit tests generally only test behaviour.
So, unit tests are good for testing some stuff, integration tests for other stuff.
So, why unit test?
For instance, it is very hard to test boundary conditions when integration testing. Example: a back end function expects a positive integer or 0, the front end does not allow entry of a negative integer, how do you ensure that the back end function behaves correctly when you pass a negative integer to it? Maybe the correct behaviour is to throw an exception. This is very hard to do with an integration test.
So, for this, you need a unit test (of the function).
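A minimal sketch of such a test (JUnit 4); InventoryService and reserve() are hypothetical stand-ins for the back-end function described above:

```java
import org.junit.Test;

// Minimal sketch (JUnit 4). InventoryService is a hypothetical stand-in for the
// "back end function that expects a positive integer or 0".
class InventoryService {
    void reserve(int quantity) {
        if (quantity < 0) {
            throw new IllegalArgumentException("quantity must be >= 0");
        }
        // ... reserve stock ...
    }
}

public class InventoryServiceTest {

    @Test(expected = IllegalArgumentException.class)
    public void rejectsNegativeQuantity() {
        // The UI never sends a negative value, so only a unit test reaches this branch.
        new InventoryService().reserve(-1);
    }

    @Test
    public void acceptsZeroAsBoundaryValue() {
        new InventoryService().reserve(0); // must not throw
    }
}
```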
Also, unit tests help eliminate problems found during integration tests. In your example above, there are a lot of points of failure for a single HTTP call:
the call from the HTTP client
the servlet validation
the call from the servlet to the business layer
the business layer validation
the database read (hibernate)
the data transformation by the business layer
the database write (hibernate)
the data transformation -> XML
the XSLT transformation -> HTML
the transmission of the HTML -> client
For your integration tests to work, you need ALL of these processes to work correctly. For a unit test of the servlet validation, you need only one: the servlet validation itself, which can be tested independently of everything else. A problem in one layer becomes easier to track down.
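As a hedged sketch of what that single-layer test could look like, assuming the servlet's validation logic is extracted into a plain class (all names here are made up):

```java
import org.junit.Test;
import static org.junit.Assert.*;

// Sketch only: assumes the servlet's parameter validation has been pulled out
// into a plain class, so it can be unit tested without a container, the
// business layer, Hibernate, or the XSLT step.
class RequestValidator {
    boolean isValidCustomerId(String raw) {
        if (raw == null || raw.isEmpty()) return false;
        try {
            return Long.parseLong(raw) > 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}

public class RequestValidatorTest {
    private final RequestValidator validator = new RequestValidator();

    @Test
    public void rejectsMissingAndMalformedIds() {
        assertFalse(validator.isValidCustomerId(null));
        assertFalse(validator.isValidCustomerId(""));
        assertFalse(validator.isValidCustomerId("abc"));
        assertFalse(validator.isValidCustomerId("-5"));
    }

    @Test
    public void acceptsWellFormedId() {
        assertTrue(validator.isValidCustomerId("42"));
    }
}
```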
You need both Unit tests AND integration tests.
Unit tests execute methods in a class to verify proper input/output without testing the class in the larger context of your application. You might use mocks to simulate dependent classes -- you're doing black-box testing of the class as a stand-alone entity. Unit tests should be runnable from a developer workstation without any external service or software requirements.
Integration tests will include other components of your application and third party software (your Oracle dev database, for example, or Selenium tests for a webapp). These tests might still be very fast and run as part of a continuous build, but because they inject additional dependencies they also risk injecting new bugs that cause problems for your code but are not caused by your code. Preferably, integration tests are also where you inject real/recorded data and assert that the application stack as a whole is behaving as expected given those inputs.
The question comes down to what kind of bugs you're looking to find and how quickly you hope to find them. Unit tests help to reduce the number of "simple" mistakes while integration tests help you ferret out architectural and integration issues, hopefully simulating the effects of Murphy's Law on your application as a whole.
Joel Spolsky has written a very interesting article about unit testing (it was a dialogue between Joel and another developer).
The main idea was that unit tests are a very good thing, but only if you use them in "limited" quantity. Joel doesn't recommend trying to reach a state where 100% of your code is covered by test cases.
The problem with unit tests is that when you want to change the architecture of your application, you'll have to change all the corresponding unit tests. And that will take a lot of time (maybe even more time than the refactoring itself). And after all that work, only a few tests will fail.
So, write tests only for code that really can cause trouble.
How I use unit tests: I don't like TDD, so I first write code and then test it (using the console or browser) just to be sure that the code does the necessary work. Only after that do I add "tricky" tests - 50% of them fail on the first run.
It works and it doesn't take much time.
We have 4 different types of tests in our project:
Unit tests with mocking where necessary
DB tests that act similar to unit tests but touch db & clean up afterwards
Our logic is exposed through REST, so we have tests that do HTTP
Webapp tests using WatiN that actually drive an IE instance and go over the major functionality
I like unit tests. They run really fast (100-1000x faster than #4 tests). They are type safe, so refactoring is quite easy (with good IDE).
The main problem is how much work is required to do them properly. You have to mock everything: DB access, network access, other components. You have to decorate unmockable classes, getting a zillion mostly useless classes. You have to use DI so that your components are not tightly coupled and therefore untestable (note that using DI is not actually a downside :)
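To illustrate the DI point, here is a rough sketch (JUnit 4 plus Mockito, class names invented) of how constructor injection lets a unit test hand in a mock instead of a real dependency:

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.*;

// Sketch of the DI point above. ExchangeRateProvider and PriceService are
// hypothetical; the dependency arrives through the constructor, so the test
// can hand in a mock instead of a real network/DB-backed implementation.
interface ExchangeRateProvider {
    double rateFor(String currency);
}

class PriceService {
    private final ExchangeRateProvider rates;

    PriceService(ExchangeRateProvider rates) {
        this.rates = rates;
    }

    double priceInCurrency(double basePrice, String currency) {
        return basePrice * rates.rateFor(currency);
    }
}

public class PriceServiceTest {
    @Test
    public void convertsUsingInjectedRate() {
        ExchangeRateProvider rates = mock(ExchangeRateProvider.class);
        when(rates.rateFor("EUR")).thenReturn(0.5);

        PriceService service = new PriceService(rates);

        assertEquals(5.0, service.priceInCurrency(10.0, "EUR"), 0.0001);
        verify(rates).rateFor("EUR"); // we assert the collaboration, not the network
    }
}
```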
I like tests #2. They do use the database and will report database errors, constraint violations and invalid columns. I think we get valuable testing using this.
#3 and especially #4 are more problematic. They require some subset of the production environment on the build server. You have to build, deploy and have the app running. You have to have a clean DB every time. But in the end, it pays off. The WatiN tests require constant work, but you also get constant testing. We run tests on every commit and it is very easy to see when we break something.
So, back to your question. Unit tests are fast (which is very important; build time should be less than, say, 10 minutes) and they are easy to refactor - much easier than rewriting the whole WatiN suite if your design changes. If you use a nice editor with a good find-usages command (e.g. IDEA or VS.NET + ReSharper), you can always find where your code is being tested.
With REST/HTTP tests, you get good validation that your system actually works. But the tests are slow to run, so it is hard to have complete validation at this level. I assume your methods accept multiple parameters or possibly XML input. To check each node in the XML or each parameter would take tens or hundreds of calls. You can do that with unit tests, but you cannot do that with REST calls, when each can take a big fraction of a second.
Our unit tests check special boundary conditions far more often than #3 tests. They (#3) check that main functionality is working and that's it. This seems to work pretty well for us.
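As a rough illustration of why such boundary sweeps are cheap at the unit level, here is a sketch that checks many inputs in one fast test; parse() is a made-up stand-in for one piece of parameter handling:

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Sketch: sweeping many inputs is cheap in a unit test, where each check is a
// plain method call rather than an HTTP round trip. parse() is a hypothetical
// stand-in for one field of the XML/parameter handling.
public class QuantityParserTest {

    private int parse(String raw) {
        // trivial stand-in for the real parsing/validation logic
        int value = Integer.parseInt(raw.trim());
        return Math.max(0, value);
    }

    @Test
    public void handlesManyInputsInOneFastTest() {
        String[] inputs = { "0", " 1 ", "7", "-3", "999999" };
        int[] expected  = {  0,    1,    7,   0,    999999 };

        for (int i = 0; i < inputs.length; i++) {
            assertEquals("input: " + inputs[i], expected[i], parse(inputs[i]));
        }
    }
}
```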
As many have mentioned, integration tests will tell you whether your system works, and unit tests will tell you where it doesn't. Strictly from a testing perspective, these two kinds of tests complement each other.
"I can't see how adding a bunch more unit tests on individual classes would help. Wouldn't that make it harder to refactor as it's much more likely we will need to throw away and re-write tests?"
No. It will make refactoring easier and better, and make it clearer to see what refactorings are appropriate and relevant. This is why we say that TDD is about design, not about testing. It's quite common for me to write a test for one method and in figuring out how to express what that method's result should be to come up with a very simple implementation in terms of some other method of the class under test. That implementation frequently finds its way into the class under test. Simpler, more solid implementations, cleaner boundaries, smaller methods: TDD - unit tests, specifically - lead you in this direction, and integration tests do not. They're both important, both useful, but they serve different purposes.
Yes, you may find yourself modifying and deleting unit tests on occasion to accommodate refactorings; that's fine, but it's not hard. And having those unit tests - and going through the experience of writing them - gives you better insight into your code, and better design.
Although the setup you described sounds good, unit testing also offers something important. Unit testing offers fine levels of granularity. With loose coupling and dependency injection, you can pretty much test every important case. You can be sure that the units are robust; you can scrutinise individual methods with scores of inputs or interesting things that don't necessarily occur during your integration tests.
E.g. if you want to deterministically see how a class will handle some sort of failure that would require a tricky setup (e.g. network exception when retrieving something from a server) you can easily write your own test double network connection class, inject it and tell it to throw an exception whenever you feel like it. You can then make sure that the class under test gracefully handles the exception and carries on in a valid state.
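Here is a hedged sketch of that idea; Connection, FlakyConnection and CatalogClient are all invented names, and the test double simply throws on demand:

```java
import org.junit.Test;
import static org.junit.Assert.*;
import java.io.IOException;

// Sketch of the "test double network connection" idea. Connection,
// FlakyConnection and CatalogClient are hypothetical; the double throws on
// demand so the failure path can be exercised deterministically.
interface Connection {
    String fetch(String resource) throws IOException;
}

class FlakyConnection implements Connection {
    @Override
    public String fetch(String resource) throws IOException {
        throw new IOException("simulated network failure");
    }
}

class CatalogClient {
    private final Connection connection;
    private String lastError;

    CatalogClient(Connection connection) {
        this.connection = connection;
    }

    // Returns a fallback instead of propagating the exception.
    String loadCatalog() {
        try {
            return connection.fetch("/catalog");
        } catch (IOException e) {
            lastError = e.getMessage();
            return "";
        }
    }

    String lastError() {
        return lastError;
    }
}

public class CatalogClientTest {
    @Test
    public void handlesNetworkFailureGracefully() {
        CatalogClient client = new CatalogClient(new FlakyConnection());

        assertEquals("", client.loadCatalog()); // degraded but valid result
        assertNotNull(client.lastError());      // and the failure was recorded
    }
}
```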
You might be interested in this question and the related answers too. There you can find my addition to the answers that were already given here.

What not to test when it comes to Unit Testing?

In which parts of a project is writing unit tests nearly or completely impossible? Data access? FTP?
If there is an answer to this question then 100% coverage is a myth, isn't it?
Here is something I found (via Haacked) that Michael Feathers says, which can serve as an answer:
He says,
A test is not a unit test if:
It talks to the database
It communicates across the network
It touches the file system
It can't run at the same time as any of your other unit tests
You have to do special things to your environment (such as editing config files) to run it.
Again, in the same article he adds:
Generally, unit tests are supposed to be small, they test a method or the interaction of a couple of methods. When you pull the database, sockets, or file system access into your unit tests, they are not really about those methods any more; they are about the integration of your code with that other software.
That 100% coverage is a myth (which it is) does not mean that 80% coverage is useless. The goal, of course, is 100%, and between unit tests and then integration tests you can approach it. What is impossible in unit testing is predicting all the totally strange things your customers will do to the product. Once you begin to discover these mind-boggling perversions of your code, make sure to roll tests for them back into the test suite.
Achieving 100% code coverage is almost always wasteful. There are many resources on this.
Nothing is impossible to unit test but there are always diminishing returns. It may not be worth it to unit test things that are painful to unit test.
The goal is not 100% code coverage nor is it 80% code coverage. A unit test being easy to write doesn't mean you should write it, and a unit tests being hard to write doesn't mean you should avoid the effort.
The goal of any test is to detect user-visible problems in the most affordable manner.
Is the total cost of authoring, maintaining, and diagnosing problems flagged by the test (including false positives) worth the problems that specific test catches?
If the problem the test catches is 'expensive' then you can afford to put effort into figuring out how to test it, and maintaining that test. If the problem the test catches is trivial then writing (and maintaining!) the test (even in the presence of code changes) better be trivial.
The core goal of a unit test is to protect devs from implementation errors. That alone should indicate that too much effort will be a waste. After a certain point there are better strategies for getting a correct implementation. Also, after a certain point the user-visible problems are due to correctly implementing the wrong thing, which can only be caught by user-level or integration testing.
What would you not test? Anything that could not possibly break.
When it comes to code coverage, you want to aim for 100% of the code you actually write - that is, you need not test third-party library code or operating system code, since that code will have been delivered to you tested. Unless it's not, in which case you might want to test it. Or if there are known bugs, you might want to test for the presence of those bugs, so that you get a notification when they are fixed.
Unit testing of a GUI is also difficult, albeit not impossible, I guess.
Data access is possible because you can set up a test database.
Generally the 'untestable' stuff is FTP, email and so forth. However, they are generally framework classes which you can rely on and therefore do not need to test if you hide them behind an abstraction.
Also, 100% code coverage is not enough on its own.
#GarryShutler
I actually unit test email by using a fake SMTP server (Wiser). It makes sure your application code is correct:
http://maas-frensch.com/peter/2007/08/29/unittesting-e-mail-sending-using-spring/
Something like that could probably be done for other servers. Otherwise you should be able to mock the API...
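For illustration, a rough sketch of the Wiser approach (JUnit 4 plus JavaMail); the port, addresses and exact Wiser method names are assumptions and should be checked against the SubEtha SMTP documentation for your version:

```java
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.subethamail.wiser.Wiser;
import org.subethamail.wiser.WiserMessage;

import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;
import java.util.Properties;

import static org.junit.Assert.assertEquals;

// Rough sketch of the fake-SMTP-server approach with Wiser (SubEtha SMTP).
// The JavaMail code below stands in for whatever your application's mail code
// does, pointed at localhost on the fake server's port.
public class WelcomeMailTest {

    private Wiser wiser;

    @Before
    public void startFakeSmtpServer() {
        wiser = new Wiser();
        wiser.setPort(2500); // any free port
        wiser.start();
    }

    @After
    public void stopFakeSmtpServer() {
        wiser.stop();
    }

    @Test
    public void sendsWelcomeMail() throws Exception {
        // In a real test this would be your application code, configured to
        // talk to localhost:2500.
        Properties props = new Properties();
        props.put("mail.smtp.host", "localhost");
        props.put("mail.smtp.port", "2500");
        Session session = Session.getInstance(props);

        MimeMessage message = new MimeMessage(session);
        message.setFrom(new InternetAddress("noreply@example.com"));
        message.addRecipient(Message.RecipientType.TO, new InternetAddress("user@example.com"));
        message.setSubject("Welcome");
        message.setText("Hello!");
        Transport.send(message);

        // Assert against what the fake server actually received.
        assertEquals(1, wiser.getMessages().size());
        WiserMessage received = wiser.getMessages().get(0);
        assertEquals("user@example.com", received.getEnvelopeReceiver());
        assertEquals("Welcome", received.getMimeMessage().getSubject());
    }
}
```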
BTW: 100% coverage is only the beginning... it just means that all the code has actually been executed once... it says nothing about edge cases etc.
Most tests that need huge and expensive setups (in terms of resources or computation time) are integration tests. Unit tests should (in theory) only test small units of the code: individual functions.
For example, if you are testing email functionality, it makes sense to create a mock mailer. The purpose of that mock is to make sure your code calls the mailer correctly. Checking whether your application actually sends mail is an integration test.
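A minimal sketch of such a mock mailer, here as a hand-rolled recording fake (all names invented):

```java
import org.junit.Test;
import static org.junit.Assert.*;
import java.util.ArrayList;
import java.util.List;

// Sketch of the mock-mailer idea. Mailer and RegistrationService are
// hypothetical; the test only checks that the code *calls* the mailer
// correctly -- whether mail actually goes out is an integration test.
interface Mailer {
    void send(String to, String subject, String body);
}

class RecordingMailer implements Mailer {
    final List<String> recipients = new ArrayList<>();

    @Override
    public void send(String to, String subject, String body) {
        recipients.add(to); // record the call instead of sending anything
    }
}

class RegistrationService {
    private final Mailer mailer;

    RegistrationService(Mailer mailer) {
        this.mailer = mailer;
    }

    void register(String email) {
        // ... persist the user, etc. ...
        mailer.send(email, "Welcome", "Thanks for registering.");
    }
}

public class RegistrationServiceTest {
    @Test
    public void sendsWelcomeMailOnRegistration() {
        RecordingMailer mailer = new RecordingMailer();

        new RegistrationService(mailer).register("user@example.com");

        assertEquals(1, mailer.recipients.size());
        assertEquals("user@example.com", mailer.recipients.get(0));
    }
}
```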
It is very useful to make a distinction between unit-tests and integration tests. Unit-tests should run very fast. It should be easily possible to run all your unit-tests before you check in your code.
However, if your test-suite consists of many integration tests (that set up and tear down databases and the like), your test-run can easily exceed half an hour. In that case it is very likely that a developer will not run all the unit-tests before she checks in.
So to answer your question: do not unit-test things that are better implemented as an integration test (and also don't test getters/setters - it is a waste of time ;-) ).
In unit testing, you should not test anything that does not belong to your unit; testing units in their context is a different matter. That's the simple answer.
The basic rule I use is that you should unit test anything that touches the boundaries of your unit (usually a class, or whatever else your unit might be), and mock the rest. There is no need to test the results that some database query returns; it suffices to test that your unit spits out the correct query.
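For example, a sketch of testing the query your unit produces rather than what the database returns (OrderQueryBuilder is hypothetical, and real code would of course use bound parameters rather than string concatenation):

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Sketch of "test that your unit spits out the correct query".
// The database itself never enters the test.
class OrderQueryBuilder {
    String findOpenOrdersSince(String isoDate) {
        // Simplified for the sketch; production code should use bound parameters.
        return "SELECT id, total FROM orders WHERE status = 'OPEN' AND created_at >= '" + isoDate + "'";
    }
}

public class OrderQueryBuilderTest {
    @Test
    public void buildsExpectedQueryForOpenOrders() {
        String sql = new OrderQueryBuilder().findOpenOrdersSince("2009-01-01");

        assertEquals(
            "SELECT id, total FROM orders WHERE status = 'OPEN' AND created_at >= '2009-01-01'",
            sql);
    }
}
```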
This does not mean that you should omit stuff just because it is hard to test; even exception handling and concurrency issues can be tested pretty well using the right tools.
"What not to test when it comes to Unit Testing?"
* Beans with just getters and setters. Reasoning: Usually a waste of time that could be better spent testing something else.
Anything that is not completely deterministic is a no-no for unit testing. You want your unit tests to ALWAYS pass or fail with the same initial conditions - if weirdness like threading, random data generation, time/dates, or external services can affect this, then you shouldn't be covering it in your unit tests. Time/dates are a particularly nasty case. You can usually architect your code so that the date it works with is injected (by code and tests) rather than read from the current system clock.
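One common way to do that in Java 8+ is to inject a java.time.Clock; here is a hedged sketch with an invented SubscriptionChecker:

```java
import org.junit.Test;
import static org.junit.Assert.*;
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Sketch of injecting the date rather than reading the system clock.
// SubscriptionChecker is hypothetical; Clock.fixed() makes the test
// deterministic regardless of when it runs.
class SubscriptionChecker {
    private final Clock clock;

    SubscriptionChecker(Clock clock) {
        this.clock = clock;
    }

    boolean isExpired(LocalDate expiryDate) {
        return LocalDate.now(clock).isAfter(expiryDate);
    }
}

public class SubscriptionCheckerTest {
    // "Today" is pinned to 2020-06-15 for every run of this test.
    private final Clock fixed =
        Clock.fixed(Instant.parse("2020-06-15T00:00:00Z"), ZoneOffset.UTC);

    @Test
    public void expiredWhenExpiryDateIsInThePast() {
        assertTrue(new SubscriptionChecker(fixed).isExpired(LocalDate.of(2020, 6, 1)));
    }

    @Test
    public void notExpiredOnTheExpiryDateItself() {
        assertFalse(new SubscriptionChecker(fixed).isExpired(LocalDate.of(2020, 6, 15)));
    }
}
```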
That said though, unit tests shouldn't be the only level of testing in your application. Achieving 100% unit test coverage is often a waste of time, and quickly meets diminishing returns.
Far better is to have a set of higher level functional tests, and even integration tests to ensure that the system works correctly "once it's all joined up" - which the unit tests by definition do not test.
Anything that needs a very large and complicated setup. Of course you can test FTP (the client), but then you need to set up an FTP server. For a unit test you need a reproducible test setup. If you cannot provide it, you cannot test it.
You can test them, but they won't be unit tests. A unit test is something that doesn't cross boundaries, such as going over the wire, hitting the database, running/interacting with a third party, touching an untested/legacy codebase, etc.
Anything beyond this is integration testing.
The obvious answer to the question in the title is: you shouldn't unit test the internals of your API, you shouldn't rely on someone else's behavior, and you shouldn't test anything that you are not responsible for.
The rest should be just enough to enable you to write your code within it, not more, not less.
Sure 100% coverage is a good goal when working on a large project, but for most projects fixing one or two bugs before deployment isn't necessarily worth the time to create exhaustive unit tests.
Exhaustively testing things like forms submission, database access, FTP access, etc at a very detailed level is often just a waste of time; unless the software being written needs a very high level of reliability (99.999% stuff) unit testing too much can be overkill and a real time sink.
I disagree with quamrana's response regarding not testing third-party code. This is an ideal use of a unit test. What if bugs are introduced in a new release of a library? Ideally, when a new version of a third-party library is released, you run the unit tests that represent its expected behaviour to verify that it still works as expected.
Configuration is another item that is very difficult to test well in unit tests. Integration tests and other testing should be done against configuration. This reduces redundancy of testing and frees up a lot of time. Trying to unit test configuration is often frivolous.
FTP, SMTP, I/O in general should be tested using an interface. The interface should be implemented by an adapter (for the real code) and a mock for the unit test.
No unit test should exercise the real external resource (FTP server etc)
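A sketch of that interface/adapter split (all names invented); the thin adapter is covered by integration tests, while unit tests substitute the in-memory double:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the interface/adapter split described above. FileTransfer is the
// interface the rest of the code depends on; FtpFileTransfer would wrap the
// real FTP library and is exercised only by integration tests, while unit
// tests substitute the in-memory implementation.
interface FileTransfer {
    void upload(String remotePath, byte[] content);
}

// Real adapter -- thin, no logic worth unit testing, covered by integration tests.
class FtpFileTransfer implements FileTransfer {
    @Override
    public void upload(String remotePath, byte[] content) {
        // delegate to the actual FTP client library here
        throw new UnsupportedOperationException("requires a real FTP server");
    }
}

// Test double used by unit tests: no server, no network, fully deterministic.
class InMemoryFileTransfer implements FileTransfer {
    final Map<String, byte[]> uploaded = new HashMap<>();

    @Override
    public void upload(String remotePath, byte[] content) {
        uploaded.put(remotePath, content);
    }
}
```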
If the code to set up the state required for a unit test becomes significantly more complex than the code to be tested, I tend to draw the line and find another way to test the functionality. At that point you have to ask: how do you know the unit test itself is right?
FTP, email and so forth you can test with server emulation. It is difficult, but possible.
Some error handling is not testable. In every codebase there is error handling for conditions that can never occur. For example, in Java you often must catch an exception because it is declared on an interface, even though the concrete instance you use will never throw it. The same goes for the default case of a switch when a case block exists for every possible value.
Of course, some of this unneeded error handling could be removed, but if a coding error is introduced in the future, having removed it would be bad.
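A small Java illustration of the two cases described above, assuming nothing beyond the standard library:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustration of "error handling that can never occur".
public class UnreachableHandlingExample {

    static int readFirstByte(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        try {
            return in.read();
        } catch (IOException e) {
            // The InputStream contract forces this catch, but ByteArrayInputStream
            // never throws IOException from read() -- no unit test can drive this branch.
            throw new IllegalStateException(e);
        }
    }

    enum Status { OPEN, CLOSED }

    static String describe(Status status) {
        switch (status) {
            case OPEN:   return "open";
            case CLOSED: return "closed";
            default:
                // Every enum constant has a case above; this default exists only as a
                // guard against future additions and is currently unreachable.
                throw new IllegalArgumentException("unknown status: " + status);
        }
    }
}
```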
The main reason to unit test code in the first place is to validate the design of your code. It's possible to gain 100% code coverage, but not without using mock objects or some form of isolation or dependency injection.
Remember, unit tests aren't for users, they are for developers and build systems to use to validate a system prior to release. To that end, the unit tests should run very fast and have as little configuration and dependency friction as possible. Try to do as much as you can in memory, and avoid using network connections from the tests.