Semi-automated testing of external libraries and error-prone interactions - unit-testing

Recently I have been trying to use unit tests in my code, and I like the idea in principle. However, the parts of my code that I am most eager to test are those error-prone areas which unit tests alone don't handle very well; for example:
Network code
Filesystem interactions
Database interactions
Communication with hardware (e.g. specialized devices that talk over RS-232)
Calls to quirky third-party libraries
I understand that mock objects are typically used in these situations, but I'm looking for a way to feel confident that the mock objects are correctly simulating the situations I want to test.
For example, suppose I want to write a mock that simulates what happens when the database server is restarted. To do this, I would want to first verify that the database library I'm using will actually throw a particular exception if the DB server is restarted. Right now, I write code like:
def checkDatabaseDropout():
    connectToDatabase()
    # Manual step: a human has to shut down the DB server at this point.
    raw_input("Shut down the database and press Enter")
    try:
        testQuery()
        assert False, "Database should have thrown an exception"
    except DatabaseError:
        pass
Running this requires a fair amount of manual intervention, but at least it gives me a verifiable set of assumptions I can work with in my code, and it lets me check those assumptions when I upgrade the library, switch to a different underlying database, etc.
My question is: are there better ways of handling this? Are there frameworks that support this kind of semi-automated testing? Or do people generally use other techniques at this end of the testing spectrum?

I try to not foresee these kinds of things.
Even though I'm doing close to 100% TDD, at the end of the day, I'm still building a complete system, so I also test that the entire application runs as expected. Such System Tests can capture and reproduce the kind of scenarios you talk about.
Once I know how to reproduce a given scenario, I can always write a unit test that reproduces it.
So in other words, I currently tend to work with two configurations:
Fully automated unit tests
Manual system tests.
These can interact and feed each other, iteratively making each easier and better to work with.


is it ok to write test cases for save/update/persist methods - whether it be mock or by calling real methods [duplicate]

I know what the advantages are and I use fake data when I am working with more complex systems.
What if I am developing something simple, I can easily set up my environment with a real database, the data being accessed is so small that access time is not a factor, and I am only running a few tests?
Is it still important to create fake data or can I forget the extra coding and skip right to the real thing?
When I said real database I do not mean a production database, I mean a test database, but using a real live DBMS and the same schema as the real database.
The reasons to use fake data instead of a real DB are:
Speed. If your tests are slow you aren't going to run them. Mocking the DB can make your tests run much faster than they otherwise might.
Control. Your tests need to be the sole source of your test data. When you use fake data, your tests choose which fakes you will be using. So there is no chance that your tests are spoiled because someone left the DB in an unfamiliar state.
Order Independence. We want our tests to be runnable in any order at all. The input of one test should not depend on the output of another. When your tests control the test data, the tests can be independent of each other.
Environment Independence. Your tests should be runnable in any environment. You should be able to run them while on the train, or in a plane, or at home, or at work. They should not depend on external services. When you use fake data, you don't need an external DB.
Now, if you are building a small little application, and by using a real DB (like MySQL) you can achieve the above goals, then by all means use the DB. I do. But make no mistake, as your application grows you will eventually be faced with the need to mock out the DB. That's OK, do it when you need to. YAGNI. Just make sure you DO do it WHEN you need to. If you let it go, you'll pay.
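To make the "control" and "environment independence" points concrete, here is a minimal Python sketch of a fake, in-memory repository standing in for the real DB; all of the names (InMemoryCustomerRepository, names_in_region) are made up for illustration, not taken from any real library:

class InMemoryCustomerRepository:
    """Fake repository: the test, not the database, owns the data."""
    def __init__(self, customers):
        self._customers = list(customers)

    def customers_in_region(self, region_id):
        return [c for c in self._customers if c["region_id"] == region_id]

def names_in_region(repo, region_id):
    # Code under test: works against any object exposing customers_in_region().
    return sorted(c["name"] for c in repo.customers_in_region(region_id))

def test_names_in_region():
    repo = InMemoryCustomerRepository([
        {"name": "Alice", "region_id": 1},
        {"name": "Bob", "region_id": 2},
    ])
    # Fast, order-independent, and runnable with no database available.
    assert names_in_region(repo, 1) == ["Alice"]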
It sort of depends what you want to test. Often you want to test the actual logic in your code not the data in the database, so setting up a complete database just to run your tests is a waste of time.
Also consider the amount of work that goes into maintaining your tests and test database. Testing your code with a database often means you are testing your application as a whole instead of the different parts in isolation. This often results in a lot of work keeping both the database and tests in sync.
And the last problem is that the test should run in isolation so each test should either run on its own version of the database or leave it in exactly the same state as it was before the test ran. This includes the state after a failed test.
Having said that, if you really want to test on your database you can. There are tools that help setting up and tearing down a database, like dbunit.
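If you do go against a real database, the setup/teardown concern can live in the test fixture itself. A rough sketch using Python's unittest and an in-memory SQLite database (DbUnit is a Java tool, so this is only the analogous idea, and the schema here is invented):

import sqlite3
import unittest

class CustomerQueryTest(unittest.TestCase):
    def setUp(self):
        # Fresh database per test: each test starts from a known state.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
        self.conn.execute("INSERT INTO customer (name) VALUES ('Alice')")

    def tearDown(self):
        # Tear down so no state leaks into the next test.
        self.conn.close()

    def test_customer_count(self):
        (count,) = self.conn.execute("SELECT COUNT(*) FROM customer").fetchone()
        self.assertEqual(count, 1)

if __name__ == "__main__":
    unittest.main()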
I've seen people try to create unit tests like this, but almost always it turns out to be much more work than it is actually worth. Most abandoned it halfway through the project, and many abandoned TDD completely, assuming that experience transfers to unit testing in general.
So I would recommend keeping tests simple and isolated, and encapsulating your code well enough that it becomes possible to test it in isolation.
As long as the real DB does not get in your way and you can go faster that way, I would be pragmatic and go for it.
In unit-test, the "test" is more important than the "unit".
I think it depends on whether your queries are fixed inside the repository (the better option, IMO), or whether the repository exposes composable queries; for example - if you have a repository method:
IQueryable<Customer> GetCustomers() {...}
Then your UI could request:
var foo = GetCustomers().Where(x => SomeUnmappedFunction(x));

bool SomeUnmappedFunction(Customer customer) {
    return customer.RegionId == 12345 && customer.Name.StartsWith("foo");
}
This will pass for an object-based fake repo, but will fail for actual db implementations. Of course, you can nullify this by having the repository handle all queries internally (no external composition); for example:
Customer[] GetCustomers(int? regionId, string nameStartsWith, ...) {...}
Because this can't be composed, you can check the DB and the UI independently. With composable queries, you are forced to use integration tests throughout if you want it to be useful.
It rather depends on whether the DB is automatically set up by the test, also whether the database is isolated from other developers.
At the moment it may not be a problem (e.g. only one developer). However (for manual database setup) setting up the database is an extra impediment for running tests, and this is a very bad thing.
If you're just writing a simple one-off application that you absolutely know will not grow, I think a lot of "best practices" just go right out the window.
You don't need to use DI/IOC or have unit tests or mock out your db access if all you're writing is a simple "Contact Us" form. However, where to draw the line between a "simple" app and a "complex" one is difficult.
In other words, use your best judgment, as there is no hard-and-fast answer to this.
It is ok to do that for the scenario, as long as you don't see them as "unit" tests. Those would be integration tests. You also want to consider whether you will be manually testing through the UI again and again, as you might just automate your smoke tests instead. Given that, you might even consider not doing the integration tests at all, and just working at the functional/UI test level (as they will already be covering the integration).
As others have pointed out, it is hard to draw the line between complex and non-complex, and you usually only know when it is too late :(. If you are already used to doing them, I am sure you won't get much overhead. If that is not the case, you could learn from it :)
Assuming that you want to automate this, the most important thing is that you can programmatically generate your initial condition. It sounds like that's the case, and even better you're testing real world data.
However, there are a few drawbacks:
Your real database might not cover certain conditions in your code. With fake data, you can make those conditions happen.
And as you point out, you have a simple application; when it becomes less simple, you'll want to have tests that you can categorize as unit tests and system tests. The unit tests should target a simple piece of functionality, which will be much easier to do with fake data.
One advantage of fake repositories is that your regression / unit testing is consistent since you can expect the same results for the same queries. This makes it easier to build certain unit tests.
There are several disadvantages if your code (if not read-query only) modifies data:
- If you have an error in your code (which is probably why you're testing), you could end up breaking the production database, or at least leaving it in a state you did not intend.
- If the production database changes over time, especially while your code is executing, you may lose track of the test data you added and have a hard time cleaning it out of the database later.
- Production queries from other systems accessing the database may treat your test data as real data, and this can corrupt the results of important business processes somewhere down the road. For example, even if you marked your data with a certain flag or prefix, can you be sure that everyone accessing the database will adhere to that scheme?
Also, some databases are regulated by privacy laws, so depending on your contract and who owns the main DB, you may or may not be legally allowed to access real data.
If you need to run on a production database, I would recommend running on a copy which you can easily create during off-peak hours.
If it's a really simple application and you can't see it growing, I see no problem running your tests against a real DB. If, however, you think this application will grow, it's important that you account for that in your tests.
Keep everything as simple as you can, and if you require more flexible testing later on, make it so. Plan ahead though, because you don't want to have a huge application in 3 years that relies on old and hacky (for a large application) tests.
The downsides to running tests against your database are the lack of speed and the complexity of setting up your database state before running tests.
If you have control over this there is no problem in running the tests directly against the database; it's actually a good approach because it simulates your final product better than running against fake data. The key is to have a pragmatic approach and see best practice as guidelines and not rules.

Learning About Unit Testing Using When and Should and TDD

The tests at my new job are nothing like the tests I have encountered before.
When they're writing their unit tests (presumably before the code), they create a class starting with "When". The name describes the scenario under which the tests will run (the fixture). They'll create subclasses for each branch through the code. All of the tests within the class start with "should" and they test different aspects of the code after running. So, they will have a method for verifying that each mock (DOC) is called correctly and for checking the return value, if applicable. I am a little confused by this method because it means the exact same execution code is being run for each test and this seems wasteful. I was wondering if there is a technique similar to this that they may have adapted. A link explaining the style and how it is supposed to be implemented would be great. It sounds similar to some approaches to BDD I've seen.
I also noticed that they've moved the repeated calls to "execute" the SUT into the setup methods. This causes issues when they are expecting exceptions, because they can't use built-in tools for performing the check (Python unittest's assertRaises). This also means storing the return value as a backing field of the test class. They also have to store many of the mocks as backing fields. Across class hierarchies it becomes difficult to tell the configuration of each mock.
They also test code a little differently. It really comes down to what they consider an integration test. They mock out anything that steals the context away from the function being tested. This can mean private methods within the same class. I have always limited mocking to resources that can affect the results of the test, such as databases, the file system or dates. I can see some value in this approach. However, the way it is being used now, I can see it leading to fragile tests (tests that break with every code change). I get concerned because without an integration test, in this case, you could be using a 3rd party API incorrectly but your unit tests would still pass. I'd like to learn more about this approach as well.
So, any resources about where to learn more about some of these approaches would be nice. I'd hate to pass up a great learning opportunity just because I don't understand the way they are doing things. I would also like to stop focusing on the negatives of these approaches and see where the benefits come in.
If I understood your explanation in the first paragraph correctly, that's quite similar to what I often do. (Depending on whether the testing framework makes it easy or not. Also many mocking frameworks don't support it, but spy frameworks like Mockito do better.)
For example see the stack example here which has a common setup (adding things to the stack) and then a bunch of independent tests which each check one thing. Here's still another example, this time one where none of the tests (@Test) modify the common fixture (@Before), but each of them focuses on checking just one independent thing that should happen. If the tests are very well focused, then it should be possible to change the production code to make any single test fail while all other tests pass (I wrote about that recently in Unit Test Focus Isolation).
The main idea is to have each test check a single feature/behavior, so that when tests fail it's easier to find out why it failed. See this TDD tutorial for more examples and to learn that style.
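As a hedged illustration of that style in Python, assuming a toy Stack class: one shared arrangement in setUp, then several narrowly focused "should" tests, each checking a single behavior:

import unittest

class Stack:
    # Hypothetical class under test.
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()
    def size(self):
        return len(self._items)

class WhenAnItemIsPushedOntoAnEmptyStack(unittest.TestCase):
    def setUp(self):
        # Common fixture: the scenario named by the class.
        self.stack = Stack()
        self.stack.push("first")

    def test_should_contain_one_item(self):
        self.assertEqual(self.stack.size(), 1)

    def test_should_return_that_item_when_popped(self):
        self.assertEqual(self.stack.pop(), "first")

if __name__ == "__main__":
    unittest.main()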
I'm not worried about the same code paths executed multiple times, when it takes a millisecond to run one test (if it takes more than a couple of seconds to run all unit tests, the tests are probably too big). From your explanation I'm more worried that the tests might be too tightly coupled to the implementation, instead of the feature, if it's systematic that there is one test for each mock. The name of the test would be a good indicator of how well structured or how fragile the tests are - does it describe a feature or how that feature is implemented.
About mocking, a good book to read is Growing Object-Oriented Software Guided by Tests. One should not mock 3rd party APIs (APIs which you don't own and can't modify), for the reason you already mentioned, but one should create an abstraction over it which better fits the needs of the system using it and works the way you want it. That abstraction needs to be integration tested with the 3rd party API, but in all tests using the abstraction you can mock it.
First, the pattern that you are using is based on Cucumber - here's a link. The style is from the BDD (Behavior-driven development) approach. It has two advantages over traditional TDD:
Language - one of the tenets of BDD is that the language you use influences the thoughts you have. By forcing you to speak in the language of the end user, you will end up writing different tests than when you write them from the perspective of a programmer.
Tests lock code - BDD locks the code at the appropriate level. One common problem in testing is that you write a large number of tests, which makes your codebase more brittle: when you change the code you must also change a large number of tests. BDD forces you to lock the behavior of your code, rather than its implementation. This way, when a test breaks, it is more likely to be meaningful.
It is worth noting that you do not have to use the Cucumber style of testing to achieve these effects, and using it does add an extra layer of overhead. But very few programmers have been successful in keeping the BDD mindset while using traditional xUnit tools (TDD).
It also sounds like you have some scenarios where you would like to say "When I do X, then verify Y". Because the current BDD xUnit frameworks only allow you to verify primitives (strings, ints, doubles, booleans....), this usually results in a large number of individual tests (one for each Assert). It is possible to do more complicated verifications using a Golden Master paradigm test tool, such as ApprovalTests. Here's a video example of this.
Finally, here's a link to Dan North's blog - he started it all.

Unit testing programs that mostly interact with external resources

I would like to start doing more unit testing in my applications, but it seems to me that most of the stuff I do is just not suitable for unit testing. I know how unit tests are supposed to work in textbook examples, but in real-world applications they do not seem to be of much use.
Some applications I write have very simple logic and complex interactions with things that are outside my control. For instance I would like to write a daemon which reacts to signals sent by some applications, and changes some user settings in the OS. I can see three difficulties:
first I have to be able to talk with the applications and be notified of their events;
then I need to interact with OS whenever I receive a signal, in order to change the appropriate user settings;
finally all of this should work as a daemon.
All these things are potentially delicate: I will have to browse possibly complex APIs and I may introduce bugs, say by misinterpreting some parameters. What can unit testing do for me? I can mock both the external application and the OS, and check that given a signal from the application, I will call the appropriate API method on the OS. This is... well, the trivial part of the application.
Actually most of the things I do involve interaction with databases, the filesystem or other applications, and these are the most delicate parts.
For another example look at my build tool PHPmake. I would like to refactor it, as it is not very well-written, but I fear to do this as I have no tests. So I would like to add some. The point is that the things which may be broken by refactoring may not be caught by unit tests:
One of the things it does is decide which targets need to be built and which ones are already up to date, and this depends on the last modification time of the files. That time is actually changed by external processes, when some build command is fired.
I want to be sure that the output of external processes is displayed correctly. Sometimes the build commands require some input, and that should also be managed correctly. But I do not know a priori which processes will be run - it may be anything.
Some logic is involved in pattern matching, and this may seem to be a testable part. But the functions which do the pattern matching use (in addition to their own logic) the PHP function glob, which works with the filesystem. If I just mock a tree in place of the actual filesystem, glob will not work.
I could go on with more examples, but the point is the following. Unless I have some delicate algorithms, most of what I do involves interaction with external resources, and this is not suitable for unit testing. More than this, often this interaction is actually the non-trivial part. Still many people see unit testing as a basic tool. What am I missing? How can I learn to be a better tester?
I think you open a number of issues in your question.
Firstly, when your application integrates with external environments such as the OS, other threads, etc., you have to separate (1) the logic that is tied to the external environment and (2) your business code, that is, the stuff your application does. This is no different from how you would separate GUI and SERVER in an application (or web application).
Secondly, you ask if you should test simple logic. I'd say, it depends. Often simple fetch/store functionality is nice to have tests for. It's like the foundation of your application, hence it's important. For other very simple business logic built on top of that foundation, you may easily find yourself feeling that you are wasting your time, and mostly you are :-)
Thirdly, refactoring an existing program and testing it in its existing state may be a problem. If your PHP program produces a set of files on the basis of some input, well, maybe that's where your entry point for tests is. Sure, the tests may be high-level, but it's an easy way to ensure that after the refactoring your program produces the same output. Hence, aim for higher-level tests in that situation, at the start of your refactoring efforts.
I'd like to recommend some literature, but I can only come up with one title. "Working Effectively with Legacy Code" by Michael Feathers. It's a good start. Another would be "xUnit Test Patterns: Refactoring Test Code" by Gerard Meszaros (although that book is much more sloppy and FULL of copy paste text).
As regards your issue about existing code bases that aren't currently covered by tests in which you would like to start refactoring, I would suggest reading:
Working Effectively with Legacy Code by Michael Feathers.
That book gives you techniques on how to deal with the issues you might be facing with PHPMake. It provides ways to introduce seams for testing, where there previously weren't any.
Additionally, with code that touches, say, the file system, you can abstract the file system calls behind a thin wrapper, using the Adapter pattern. The unit tests would run against a fake implementation of the abstract interface that the wrapping class implements.
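A minimal Python sketch of that Adapter idea; the RealFileSystem/FakeFileSystem names and the count_lines function are invented purely for illustration:

class RealFileSystem:
    """Thin wrapper: nothing but pass-through calls to the real filesystem."""
    def read_text(self, path):
        with open(path) as f:
            return f.read()

class FakeFileSystem:
    """Test double implementing the same interface, backed by a dict."""
    def __init__(self, files):
        self._files = dict(files)
    def read_text(self, path):
        return self._files[path]

def count_lines(fs, path):
    # Code under test depends only on the abstract interface.
    return len(fs.read_text(path).splitlines())

def test_count_lines():
    fs = FakeFileSystem({"servers.txt": "alpha\nbeta\n"})
    assert count_lines(fs, "servers.txt") == 2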
At some point you get to a low enough level where a unit of code can't be isolated for unit testing as these depend on library or API calls (such as in the production implementation of the wrapper). Once this happens integration tests are really the only automated developer tests you can write.
I recommend this google tech-talk on unit testing.
The video boils down to
write your code so that it knows as little as possible about how it will be used. The fewer assumptions your code makes, the easier it is to test. Avoid complex logic in constructors, the use of singletons, static class members, and so on.
isolate your code from the external world (comms, databases, real time), and make sure that your code only talks to your isolation layer. Otherwise, writing tests will be a nightmare in terms of 'fake environment' setup.
unit tests should test stories; that is what we really understand and care for; given a class with a method foo(), testFoo() is uninformative. They actually recommend test names like itShouldCloseConnectionEvenWhenExceptionThrown() (see the sketch after the note below). Ideally, your stories should cover enough functionality that you can rebuild the spec from the stories.
NOTE: the video and this post use Java as an example; however, the main points stand for any language.
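To illustrate the naming point above in Python (the ConnectionHolder class is hypothetical): the test name states the story, and the test fails if the story stops being true.

import unittest
from unittest import mock

class ConnectionHolder:
    # Hypothetical code under test: must close the connection even on failure.
    def __init__(self, connection):
        self._connection = connection
    def run(self, work):
        try:
            return work()
        finally:
            self._connection.close()

class ConnectionHolderTest(unittest.TestCase):
    def test_it_should_close_connection_even_when_exception_thrown(self):
        connection = mock.Mock()
        holder = ConnectionHolder(connection)

        def failing_work():
            raise RuntimeError("boom")

        with self.assertRaises(RuntimeError):
            holder.run(failing_work)
        connection.close.assert_called_once()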
"Unit tests" tests one unit of your code. No external tools should be involved. This seems to be complicated for your first app (without knowing to much about it ;)) but the phpMake is unit-testable - I'm sure ... because ant, gradle and maven are unit-testable too ;)!
But of course you can test your first application automated too. There are several different layers one could test an application.
So the task for you is to find an automated way to test your app - be it integration testing or whatever.
E.g. you could write shell scripts which assert some output! With that you make sure your application behaves correctly ...
Tests of interactions with external resources are integration tests, not unit tests.
Tests of your code to see how it would behave if particular external interactions had occurred can be unit tests. These should be done by writing your code to use dependency injection, and then, in the unit test, injecting mock objects as dependencies.
For example, consider a piece of code that adds the results of a call to one service to the results of a call to another service:
public int AddResults(IService1 svc1, IService2 svc2, int parameter)
{
    return svc1.Call(parameter) + svc2.Call(parameter);
}
You can test this by passing in mock objects for the two services:
private class Service1Returns1 : IService1
{
    public int Call(int parameter) { return 1; }
}

private class Service2Returns1 : IService2
{
    public int Call(int parameter) { return 1; }
}

public void Test1And1()
{
    Assert.AreEqual(2, AddResults(new Service1Returns1(), new Service2Returns1(), 0));
}
First of all, if unit testing doesn't seem like it would be much use in your applications, why do you even want to start doing more of it? What is motivating you to care about it? It is definitely a waste of time if a) you do everything perfect the first time and nothing ever changes or b) you decide it's a waste of time and do it poorly.
If you do think that you really want to do unit testing, the answer to your questions is the same in every case: encapsulation. In your daemon example, you could create an ApplicationEventObservationProxy with a very narrow interface that just implements pass-through methods. The purpose of this class is to do nothing but completely encapsulate the rest of your code from the third-party event observing library (nothing means nothing -- no logic here). Do the same thing for OS settings. Then you can completely unit test the class that does actions based on events. I'd recommend having a separate class for the daemon-ness that just wraps your main class -- it will make the testing easier.
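A rough Python sketch of that shape, with made-up names (OsSettingsProxy, SignalHandler) standing in for whatever your real signal and settings APIs are:

class OsSettingsProxy:
    """Pass-through wrapper around the OS settings API; no logic, so it stays out of unit tests."""
    def set_user_setting(self, key, value):
        # The real implementation would call the platform API here.
        raise NotImplementedError

class SignalHandler:
    """The testable core: decides what to do when an application signal arrives."""
    def __init__(self, settings):
        self._settings = settings
    def on_signal(self, signal_name):
        if signal_name == "presentation_started":
            self._settings.set_user_setting("screensaver", "off")

class RecordingSettingsProxy:
    """Fake proxy used in tests; records calls instead of touching the OS."""
    def __init__(self):
        self.calls = []
    def set_user_setting(self, key, value):
        self.calls.append((key, value))

def test_signal_disables_screensaver():
    settings = RecordingSettingsProxy()
    SignalHandler(settings).on_signal("presentation_started")
    assert settings.calls == [("screensaver", "off")]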
There are a couple of benefits to this approach outside of unit testing. One is that if you encapsulate the code that interacts directly with the OS, it's easier to switch it out. This kind of code is particularly prone to breakage outside of your control (i.e., MS patchsets). You will also probably want to support more than one OS, and if the OS specific logic is not tangled with the rest of your logic, it will be easier. The other benefit is that you'll be forced to realize that there is more business logic in your app than you think. :)
Finally, don't forget that unit testing is a foundation for a good product, but not the only ingredient. Having a set of tests that explore and verify the OS API calls you'll be using is a good strategy for the "hard" parts of this problem. You should also have end to end tests that ensure the events in your applications cause the OS setting changes.
As other answers have suggested, Working Effectively with Legacy Code by Michael Feathers is a good read. If you have to deal with legacy code, and you want to make sure that the system's interactions work as expected, try writing integration tests first. Then it is more appropriate to write unit tests to test the behaviour of methods that are valuable from the requirements point of view. Your unit tests serve a whole different purpose than the integration tests. Unit tests are more likely to improve the design of your system than to test how everything hangs together.

Unit testing handling of degraded network stack, file corruption, and other imperfections

I'm primarily a C++ coder, and thus far, have managed without really writing tests for all of my code. I've decided this is a Bad Idea(tm), after adding new features that subtly broke old features, or, depending on how you wish to look at it, introduced some new "features" of their own.
But, unit testing seems to be an extremely brittle mechanism. You can test for something in "perfect" conditions, but you don't get to see how your code performs when stuff breaks. For instance, take a crawler; let's say it crawls a few specific sites for data X. Do you simply save sample pages, test against those, and hope that the sites never change? This would work fine as regression tests, but what sort of tests would you write to constantly check those sites live and let you know when the application isn't doing its job because the site changed something that now causes your application to crash? Wouldn't you want your test suite to monitor the intent of the code?
The above example is a bit contrived, and something I haven't run into (in case you haven't guessed). Let me pick something I have, though. How do you test that an application will do its job in the face of a degraded network stack? That is, say you have a moderate amount of packet loss, for one reason or another, and you have a function DoSomethingOverTheNetwork() which is supposed to degrade gracefully when the stack isn't performing as it's supposed to; but does it? The developer tests it personally by purposely setting up a gateway that drops packets to simulate a bad network when he first writes it. A few months later, someone checks in some code that modifies something subtly, so the degradation isn't detected in time, or the application doesn't even recognize the degradation; this is never caught, because you can't run real-world tests like this using unit tests, can you?
Further, how about file corruption? Let's say you're storing a list of servers in a file, and the checksum looks okay, but the data isn't really. You want the code to handle that, you write some code that you think does that. How do you test that it does exactly that for the life of the application? Can you?
Hence, brittleness. Unit tests seem to test the code only in perfect conditions (and this is promoted, with mock objects and such), not what it will face in the wild. Don't get me wrong, I think unit tests are great, but a test suite composed only of them seems to be a smart way to introduce subtle bugs in your code while feeling overconfident about its reliability.
How do I address the above situations? If unit tests aren't the answer, what is?
Edit: I see a lot of answers that say "just mock it". Well, you can't "just mock it", here's why:
Taking my example of the degrading network stack, let's assume your function has a well-defined NetworkInterface, which we'll mock. The application sends out packets over both TCP and UDP. Now, let's say, hey, let's simulate 10% loss on the interface using a mock object, and see what happens. Your TCP connections increase their retry attempts, as well as increasing their back-off, all good practice. You decide to change X% of your UDP packets to actually make a TCP connection; lossy interface, we want to be able to guarantee delivery of some packets, and the others shouldn't lose too much. Works great. Meanwhile, in the real world.. when you increase the number of TCP connections (or, data over TCP), on a connection that's lossy enough, you'll end up increasing your UDP packet loss, as your TCP connections will end up re-sending their data more and more and/or reducing their window, causing your 10% packet loss to actually be more like 90% UDP packet loss now. Whoopsie.
No biggie, let's break that up into UDPInterface, and TCPInterface. Wait a minute.. those are interdependent, testing 10% UDP loss and 10% TCP loss is no different than the above.
So, the issue is now you're not simply unit testing your code, you're introducing your assumptions into the way the operating system's TCP stack works. And, that's a Bad Idea(tm). A much worse idea than just avoiding this entire fiasco.
At some point, you're going to have to create a Mock OS, which behaves exactly like your real OS, except, is testable. That doesn't seem like a nice way forward.
This is stuff we've experienced, I'm sure others can add their experiences too.
I hope someone will tell me I'm very wrong, and point out why!
Thanks!
You start by talking about unit tests, then talk about entire applications; it seems you are a little confused about what unit testing is. Unit testing by definition is about testing at the most fine grained level, when each "unit" of the software is being tested. In common use, a "unit" is an individual function, not an entire application. Contemporary programming style has short functions, each of which does one well defined thing, which is therefore easy to unit test.
what sort of tests would you write to constantly check those sites live?
UnitTests target small sections of code you write. UnitTests do not confirm that things are ok in the world. You should instead define application behavior for those imperfect scenarios. Then you can UnitTest your application in those imperfect scenarios.
for instance a crawler
A crawler is a large body of code you might write. It has some different parts, one part might fetch a webpage. Another part might analyze html. Even these parts may be too large to write a unit test against.
How do you test an application will do its job in the face of a degraded network stack?
The developer tests it personally by purposely setting up a gateway that drops packets to simulate a bad network when he first writes it.
If a test uses the network, it's not a UnitTest.
A UnitTest (which must target your code) cannot call the network. You didn't write the network. The UnitTest should involve a mock network with simulated (but consistent each time) packet loss.
Unit tests seem to test the code only in perfect conditions
UnitTests test your code in defined conditions. If you're only capable of defining perfect conditions, your statement is true. If you're capable of defining imperfect conditions, your statement is false.
Work through any decent book on unit testing - you'll find that it's normal practise to write tests that do indeed cover edge cases where the input is not ideal or is plain wrong.
The most common approach in languages with exception handling is a "should throw" specification, where a certain test is expected to cause a specific exception type to be thrown. If it doesn't throw an exception, the test fails.
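For example, in Python's unittest a "should throw" check looks roughly like this (parse_config is a hypothetical function under test):

import unittest

def parse_config(text):
    # Hypothetical code under test: must reject malformed input.
    if "=" not in text:
        raise ValueError("malformed config line: %r" % text)
    key, value = text.split("=", 1)
    return {key.strip(): value.strip()}

class ParseConfigTest(unittest.TestCase):
    def test_malformed_line_raises(self):
        # The test fails unless the expected exception is raised.
        with self.assertRaises(ValueError):
            parse_config("no equals sign here")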
Update
In your update you describe complex timing-sensitive interactions. Unit testing simply doesn't help at all there. No need to introduce networking: just think of trying to write a simple thread-safe queue class, perhaps on a platform with some new concurrency primitives. Test it on an 8-core system... does it work? You simply can't know that for sure by testing it. There are just too many different ways that the timing can cause operations to overlap between the cores. Depending on luck, it could take weeks of continuous execution before some really unlikely coincidence occurs. The only way to get such things right is through careful analysis (static checking tools can help). It's likely that most concurrent software has some rarely occurring bugs in it, including all operating systems.
Returning to the cases that can actually be tested, I've found integration tests to be often just as useful as unit tests. This can be as elaborate as automating the installation of your product, adding configurations to it (such as your users might create) and then "poking" it from the outside, e.g. automating your UI. This finds a whole other class of issue separate from unit testing.
It sounds as if you answered your own question.
Mocks/stubs are the key to testing difficult-to-test areas. For all of your examples, things like creating a website with dodgy data or causing a network failure could be done manually. However, it would be very difficult and tedious to do so, and not something anyone would recommend. In fact, doing so would mean you are not actually unit testing.
Instead you'd use mock/stubs to pretend such scenarios have happened allowing you to test them. The benefit of using mocks is that unlike the manual approach you can guarantee that each time you run your tests the same procedure will be carried out. The tests in turn will be much faster and stable because of this.
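As a rough Python illustration, a mock can be made to raise the failure you want to exercise, deterministically, on every run (fetch_status and the socket error here are just an invented scenario):

import socket
import unittest
from unittest import mock

def fetch_status(client):
    # Hypothetical code under test: should degrade to "unknown" when the network fails.
    try:
        return client.get("/status")
    except socket.error:
        return "unknown"

class FetchStatusTest(unittest.TestCase):
    def test_network_failure_degrades_gracefully(self):
        client = mock.Mock()
        # Simulate the failure deterministically, instead of unplugging a cable.
        client.get.side_effect = socket.error("connection reset")
        self.assertEqual(fetch_status(client), "unknown")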
Edit - With regards the updated question.
Just as a disclaimer, my networking experience is very limited, so I can't comment on the technical side of your issues. However, I can comment on the fact that you sound as if you are testing too much. In other words, your tests cover too wide a scope. I don't know what your code base is like, but given the functions/objects within it, you should still be able to provide fake input that will allow you to test that your objects/functions do the right thing in isolation.
So let's imagine your isolated areas work fine given the requirements. Just because your unit tests pass does not mean you've tested your application. You'll still need to manually test the scenarios you describe. In this case it sounds as if stress testing - limiting network resources and so on - is required. If your application works as expected, great. If not, you've got missing tests. Unit testing (more in line with TDD/BDD) is about ensuring small, isolated areas of your application work. You still need integration/manual/regression etc. testing afterwards. Therefore you should use mocks/stubs to test that your small, isolated areas function. Unit testing is more akin to a design process, if anything, in my opinion.
Integration Testing vs Unit Testing
I should preface this answer by saying I am biased towards integration tests vs unit tests as the primary type of test used in tdd. At work we also have some unit tests mixed in, but only as necessary. The primary reason why we start with an integration test is because we care more about what the application is doing rather than what a particular function does. We also get integration coverage which has been, in my experience, a huge gap for automated testing.
To Mock or Not, Why Not Do Both
Our integration tests can run either fully wired (to unmanaged resources) or with mocks. We have found that this helps to cover the gap between the real world and mocks. This also gives us the option to decide NOT to have a mocked version when the ROI for implementing the mock isn't worth it. You may ask why use mocks at all:
the test suite runs faster
guaranteed same response every time (no timeouts, unforeseen degraded network, etc)
fine-grained control over behavior
Sometimes you shouldn't write a test
Testing - any kind of testing - has trade-offs. You look at the cost to implement the test, the mock, the variant tests, etc., weigh that against the benefits, and sometimes it doesn't make sense to write the test, the mock, or the variant. This decision is also made within the context of the kind of software you're building, which really is one of the major factors in deciding how deep and broad your test suite needs to be. To put it another way, I'll write a few tests for the social bacon meetup feature, but I'm not going to write the formal verification test for the bacon-friend algorithm.
Do you simply save sample pages, test against those, and hope that the sites never change?
Testing is not a panacea
Yes, you save samples (as fixtures). You don't hope the page doesn't change, but you can't know how and when it will change. If you have ideas or parameters of how it may change then you can create variants to make sure your code will handle those variants. When and if it does change, and it breaks, you add new samples, fix the problems and move on.
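A small Python sketch of the fixture idea; extract_titles and the sample markup are invented, and in practice the sample would be a saved page checked into the repository:

def extract_titles(html):
    # Hypothetical parsing code under test (deliberately naive).
    return [part.split("</h1>")[0] for part in html.split("<h1>")[1:]]

# In practice this string would be loaded from a saved fixture file,
# e.g. fixtures/sample_page.html, snapshotted from the live site.
SAMPLE_PAGE = "<html><body><h1>Expected headline</h1></body></html>"

def test_extract_titles_against_saved_sample():
    assert extract_titles(SAMPLE_PAGE) == ["Expected headline"]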
what sort of tests would you write to constantly check those sites live and let you know when the application isn't doing its job because the site changed something, that now causes your application to crash?
Testing != Monitoring
Tests are tests and part of development (and QA), not for production. MONITORING is what you use in production to make sure your application is working properly. You can write monitors which should alert you when something is broken. That's a whole other topic.
How do you test an application will do its job in the face of a degraded network stack?
Bacon
If it were me, I would have a wired and a mocked mode for the test (assuming the mock was good enough to be useful). If the mock is difficult to get right, or if it's not worth it, then I would just have the wired test. However, I have found that there is almost always a way to split the variables in play into different tests. Then each of those tests is targeted at testing that vector of change, while minimizing all the other variability in play. The trick is to write the important variants, not every possible variant.
Further, how about file corruption?
How Much Testing
You mention the checksum being correct, but the file actually being corrupt. The question here is what class of software I'm writing. Do I need to be super paranoid about the possibility of a statistically small false positive or not? If I do, then we work to find out how deep and broad to test.
I think you can't and shouldn't write a unit test for all possible errors you might face (what if a meteorite hits the DB server?) - you should make an effort to test errors with reasonable probability and/or rely on other services.
For example, if your application requires the correct arrival of network packets, you should use the TCP transport layer: it guarantees the correctness of the received packets transparently, so you only have to concentrate on, e.g., what happens if the network connection is dropped.
Checksums are designed to detect or correct a reasonable amount of errors - if you expect 10 errors per file, you would use a different checksum than if you expect 100 errors. If the chosen checksum indicates that the file is correct, then you have no reason to think it's broken (the probability that it is broken is negligible).
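A minimal Python sketch of that point: detect corruption with a digest, and test the handling path by corrupting the data deliberately (load_server_list and CorruptFileError are made-up names):

import hashlib

class CorruptFileError(Exception):
    pass

def load_server_list(payload, expected_sha256):
    # Hypothetical loader: refuse the data if its digest does not match.
    if hashlib.sha256(payload).hexdigest() != expected_sha256:
        raise CorruptFileError("checksum mismatch")
    return payload.decode().splitlines()

def test_corrupted_payload_is_rejected():
    good = b"server1\nserver2\n"
    digest = hashlib.sha256(good).hexdigest()
    corrupted = b"server1\nserverX\n"  # simulate on-disk corruption
    try:
        load_server_list(corrupted, digest)
        assert False, "corruption should have been detected"
    except CorruptFileError:
        pass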
Because you don't have infinite resources (e.g. time), you have to make compromises when you write your tests; and choosing these compromises is a tough question.
Although not a complete answer to the massive dilemma you face, you can reduce the number of tests by using a technique called Equivalence Partitioning.
In my organization, we perform many levels of coverage, regression, positive, negative, scenario based, UI in automated and manual tests, all starting from a 'clean environment', but even that isn't perfect.
As for one of the cases you mention, where a programmer comes in and changes some sensitive detection code and no one notices, we would have had a snapshot of data that is 'behaviourally dodgy', which fails consistently with a specific test to test the detection routine - and we would run all tests regularly (and not just at the last minute).
Sometimes I'll create two (or more) test suites. One suite uses mocks/stubs and only tests the code I'm writing. The other tests test the database, web sites, network devices, other servers, and whatever else is outside of my control.
Those other tests are really tests of my assumptions about the systems my code interacts with. So if they fail, I know my requirements have changed. I can then update my internal tests to reflect whatever new behavior my code needs to have.
The internal tests include tests that simulate various failures of the external systems. Whenever I observe a new kind of failure, either through my other tests or as a result of a bug report, I have a new internal test to write.
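In rough Python terms, the two kinds of tests might look like this; the DNS scenario and all names are invented for illustration:

import socket
import unittest
from unittest import mock

def lookup_server(resolver, hostname):
    # Hypothetical code under test: fall back to a default on DNS failure.
    try:
        return resolver(hostname)
    except socket.gaierror:
        return "127.0.0.1"

class ExternalAssumptionTest(unittest.TestCase):
    """Runs against the real system; a failure means our assumptions have changed."""
    @unittest.skip("enable manually: needs network access")
    def test_unknown_host_raises_gaierror(self):
        with self.assertRaises(socket.gaierror):
            socket.gethostbyname("no-such-host.invalid")

class InternalTest(unittest.TestCase):
    """Runs everywhere; the failure observed above is simulated with a mock."""
    def test_dns_failure_falls_back_to_localhost(self):
        resolver = mock.Mock(side_effect=socket.gaierror("name not known"))
        self.assertEqual(lookup_server(resolver, "db.example.com"), "127.0.0.1")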
Writing tests that model all the bizarre things that happen in the real world can be challenging, but the result is that you really think about all those cases, and produce robust code.
The proper use of Unit Testing starts from the ground up. That is, you write your unit tests BEFORE you write your production code. The unit tests are then forced to consider error conditions, pre-conditions, post-conditions, etc. Once you write your production code (and the unit tests are able to compile and run successfully), if someone makes a change to the code that changes any of its conditions (even subtly), the unit test will fail and you will learn about it very quickly (either via compiler error or via a failed unit test).
EDIT: Regarding the updated question
What you are trying to test is not really suited well for unit testing. Networking and database connections test better in a simulated integration test. There are far too many things that can break during the initialization of a remote connection to create a useful unit test for it (I'm sure there are some unit-tests-fix-all people that will disagree with me there, but in my experience, trying to unit test network traffic and/or remote database functionality is worse than shoving a square peg through a round hole).
You are talking about library or application testing, which is not the same as unit testing. You can use unit testing libraries such as CppUnit/NUnit/JUnit for library and regression testing purposes, but as others have said, unit testing is about testing your lowest level functions, which are supposed to be very well defined and easily separated from the rest of the code. Sure, you could pass all low-level unit tests, and still have a network failure in the full system.
Library testing can be very difficult, because sometimes only a human can evaluate the output for correctness. Consider a vector graphics or font rendering library; there's no single perfect output, and you may get a completely different result based on the video card in your machine.
Or testing a PDF parser or a C++ compiler is dauntingly difficult, due to the enormous number of possible inputs. This is when owning 10 years of customer samples and defect history is way more valuable than the source code itself. Almost anyone can sit down and code it, but initially you won't have a way of validating your program for correctness.
The beauty of mock objects is that you can have more than one. Assume that you are programming against a well-defined interface for a network stack. Then you can have a mock object WellBehavingNetworkStack to test the normal case and another mock object OddlyBehavingNetworkStack that simulates some of the network failures that you expect.
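A small Python sketch of the same idea, with invented names; one fake models the normal case, the other models an expected failure mode:

class WellBehavingNetworkStack:
    """Mock for the normal case: every send succeeds."""
    def send(self, packet):
        return True

class OddlyBehavingNetworkStack:
    """Mock for a failure mode we expect: the first few sends are dropped."""
    def __init__(self, failures_before_success):
        self._remaining_failures = failures_before_success
    def send(self, packet):
        if self._remaining_failures > 0:
            self._remaining_failures -= 1
            return False
        return True

def send_with_retry(stack, packet, attempts=3):
    # Code under test: retry a fixed number of times before giving up.
    return any(stack.send(packet) for _ in range(attempts))

def test_normal_case_sends_first_time():
    assert send_with_retry(WellBehavingNetworkStack(), b"ping")

def test_retries_cover_occasional_drops():
    assert send_with_retry(OddlyBehavingNetworkStack(failures_before_success=2), b"ping")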
Using unit tests I usually also test argument validation (like ensuring that my code throws NullPointerExceptions), and this is easy in Java, but difficult in C++, since in the latter language you can hit undefined behavior quite easily, and then all bets are off. Therefore you cannot be strictly sure that your unit tests work, even if they seem to. But still you can test for odd situations that do not invoke undefined behavior, which should be quite a lot in well-written code.
What you are talking about is making applications more robust. That is, you want them to handle failures elegantly. However, testing every possible real-world failure scenario would be difficult if not impossible. The key to making applications robust is to assume that failure is normal and should be expected at some point in the future. How an application handles failure really depends on the situation. There are a number of different ways to detect and handle failure (maybe a good question to ask the group). Trying to rely on unit testing alone will only get you part of the way. Anticipating failure (even in some simple operations) will get you even closer to a more robust application. Amazon built their entire system to anticipate all types of failures (hardware, software, memory and file corruption). Take a look at their Dynamo for an example of real-world error handling.

Unit Testing Third Party ORM

I've read a few threads on SO about usefulness of unit-testing various applications. The opinions can range from "test everything all the time" to "unit tests are useless", and everything in between ("test where it makes sense"). I tend to lean towards the middle one.
That leads me to my question. I am trying to decide if it would be beneficial or practical to have some basic unit-tests testing 3rd party ORM as suggested in this SO post:
link text
some baseline tests may be useful as insurance against future breaking changes, depending on how you are using the tool. For example, instead of mocking up the entire n-tier chain (I'm not a fan of mocking when it is not necessary), just use the ORM tool to create, read, update, and delete a typical object/record, and verify the operation using direct SQL statements on the (test) database. That way if the 3rd-party vendor later updates something that breaks the basic functionality you'll know about it, and new developers to your project can easily see how to use the ORM tool from the unit test examples.
My main reservation about following this advice is that it would require way too much setup, would be a headache to maintain, and overall would not be practical in our environment. Here's a summary of some points to consider:
The ORM we're using requires static datasource object(s) to be created and registered with its Data Access Layer and associated with an authenticated user. This would require a lot of test setup, and would probably be problematic on the build server, where no user is logged on.
ORM vendor has a pretty good track record of releasing new updates and not breaking basic functionality. Furthermore whenever it's time to update ORM to the latest version, I would imagine that application wouldn't go straight to production, but would be thoroughly regression tested anyway.
Maintaining a test DB for unit testing is kind of problematic in this environment. The test DB gets wiped out after each major release and replaced with a DB backup from staging with sensitive data obfuscated. I would imagine that in order to have a test DB for ORM unit testing, we would need to run some scripts/code that would put the database in a "test" state. Again, too much setup and maintenance.
And finally, ORM documentation/help for new developers. I can see how something like that could be useful, but the ORM vendor provides pretty good documentation/help with demo apps, so writing unit tests on top of that doesn't seem to be worth all the effort.
So, is it worth going through all this trouble just to make sure that the ORM does what it is supposed to do (which is CRUD)? Shouldn't that be the responsibility of the vendor anyway?
You said it yourself. Test where it makes sense. If it will make you "feel" better to test the 3rd party ORM, then do it. But, ultimately, you're putting your trust in their tool. What are you going to do if the ORM suddenly stops working? Have you written enough code against it that you can't easily rip it out? You'd wait for them to fix it, probably.
Basically, you have to treat 3rd party tools as the proverbial black boxes and let them do what you bought them to do. That was the reason you paid the money you did, right? To keep from having to write that yourself.
In this particular case, I wouldn't bother. I think you are correct in assuming a bad ROI here.
And yes, I consider it the responsibility of the vendor. I expect (and assume) their stuff works. That's how I treat my vendors.
It is the responsibility of the vendor to make sure the ORM does what it's supposed to do, but it's your responsibility to ensure that your application does what it's supposed to do, and if it fails for whatever reason, your clients will be unhappy, even if it's "just" because the ORM failed.
Your tests could ensure that the ORM works the way you expect it to given the way you're calling it. It's possible that the ORM will change in a way that isn't "broken" but that doesn't play nicely with your application.
That being said, if you're confident in the ORM, and feel that setting up and maintaining any kind of automated tests of the ORM is not worth the effort, it's probably not, especially if you've got other levels of testing that are likely to reveal the problems if they arise.
I personally think that real unit tests should only test the application itself, and everything that needs to be separately deployed and configured should be mocked up.
What you are saying is to write some integration/functional tests that test the whole system end-to-end. These will never be lightweight, but they are probably still useful in some cases (e.g. if your system doesn't change too much and is critical for your company at the same time). I have seen such tests automated as well, using virtual servers (either VMware or the Microsoft equivalent) and an example database which was restored from a file before every test run. You can also just set up the ORM once, and accept that the tests will fail mainly because the configuration will break. Obviously you can test more, but be aware that the cost is higher.
Testing that 3rd party ORM library does its job is not unit testing at all. However, that's not the point of your question.
As has been said numerous times in books like "Working Effectively with Legacy Code" by Michael Feathers, "Domain-Driven Design" by Eric Evans, or "Clean Code" by Robert Martin, your 3rd party ORM library is a technical detail which should be abstracted away from your codebase, precisely because you have no control over 3rd party libraries by definition. If they change, you accommodate.
So your solution is to make a wrapper around this ORM library, ideally publishing a domain-related interface to the rest of your application, though a generic interface will probably do, too. This wrapper needs to be tested using full-stack automated tests, which inevitably have to set up your application along with the database and all the configuration and preparation required for it. These tests are not unit-level and are expected to be really slow.
You can read about the different levels of tests and how they should be set up in chapter 6 of the book "Continuous Integration" by Paul M. Duvall.
When writing true unit-level tests for your application level, you mock the wrapper above the ORM library, which you are able to do because you control the code of the wrapper.
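In outline it might look like this (Python used only for illustration; CustomerStore and its methods are invented here, not part of any particular ORM):

from unittest import mock

class CustomerStore:
    """Thin wrapper owned by the application; the only place the ORM is touched."""
    def __init__(self, orm_session):
        self._session = orm_session
    def find_by_region(self, region_id):
        # The real implementation would issue an ORM query here;
        # this wrapper is covered by the slow, full-stack tests against a real DB.
        raise NotImplementedError

def region_report(store, region_id):
    # Application code under unit test: depends on the wrapper, never on the ORM directly.
    customers = store.find_by_region(region_id)
    return "%d customers in region %d" % (len(customers), region_id)

def test_region_report_counts_customers():
    store = mock.Mock(spec=CustomerStore)
    store.find_by_region.return_value = ["a", "b", "c"]
    assert region_report(store, 7) == "3 customers in region 7"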
This is a standard practice. The obvious benefit is that when you decide to update the ORM library, or (which is highly possible) when your client/boss decides to switch to another ORM or to a database this ORM is not compatible with, you will get instant feedback about regressions from the tests of your wrapper, and all you'll need to do is accommodate the changes inside your wrapper.
"Too much maintenance burden" is a fallacy created by lack of automation, by the way.