unit testing data storage

Suppose I have an interface with methods 'storeData(key, data)' and 'getData(key)'. How should I test a concrete implementation? Should I check if the data was correctly set in the storage medium (eg an sql database) or should I just check whether or not it gives the correct data back by using getData?
If I look up the data in the database it feels like I'm also testing the internals of the method but only checking whether it gives the same data back feels incomplete.

You seem to be caught up in the hype of unit testing; what you are describing is actually an integration test. Setting a value and getting the same value back for the same key is a unit test you'd write against a mock implementation of the storage engine. Testing the real storage, say your database, as you should, is no longer a unit test, but it is a fundamental part of testing, and it sounds like integration testing to me. Don't use unit testing as your hammer; choose the right tool for the right job. Divide your testing into more layers.

What you want to do in a unit test is make sure that the method does the job that it is supposed to do. If the method uses dependencies to accomplish its work, you would mock those dependencies out and make sure that your method calls the methods on the objects it depends on with the appropriate arguments. This way you test your code in isolation.
One of the benefits to this is that it will drive the design of your code in a better direction. In order to use mocking, for example, you naturally gravitate towards more decoupled code using dependency injection. This gives you the ability to easily substitute your mock objects for the actual objects that your class depends on. You also end up implementing interfaces, which are more naturally mocked. Both of these things are good design patterns and will improve your code.
To test your particular example, for instance, you might have your class depend on a factory to create connections to the database and a builder to construct parameterized SQL commands that are executed via the connection. You'd pass mocked versions of these objects to your class and verify that the expected methods were invoked: setting up the connection, building the correct command, executing it, and tearing down the connection. Or perhaps you inject an already open connection and simply build the command and invoke it. The point is that your class is built against an interface or set of interfaces, and you use mocking to supply objects that implement those interfaces, record invocations, and supply correct return values for the methods that you expect to use from the interface(s).
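The "inject an already open connection" variant can be sketched in Python with the standard library's unittest.mock. This is only an illustration under assumptions: the class name SqlDataStore, the table name storage, and the column names item_key and item_data are hypothetical and do not come from the question.

```python
from unittest.mock import Mock


class SqlDataStore:
    """Hypothetical implementation of the storeData/getData interface."""

    def __init__(self, connection):
        # The connection is injected, so a test can pass a mock instead.
        self._connection = connection

    def store_data(self, key, data):
        self._connection.execute(
            "INSERT INTO storage (item_key, item_data) VALUES (?, ?)", (key, data)
        )

    def get_data(self, key):
        row = self._connection.execute(
            "SELECT item_data FROM storage WHERE item_key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def test_store_data_issues_parameterized_insert():
    connection = Mock()
    SqlDataStore(connection).store_data("somekey", "somevalue")
    # Behavior verification: the right command went to the collaborator.
    connection.execute.assert_called_once_with(
        "INSERT INTO storage (item_key, item_data) VALUES (?, ?)",
        ("somekey", "somevalue"),
    )


def test_get_data_returns_value_supplied_by_connection():
    connection = Mock()
    # The mock supplies the return value the class expects from the interface.
    connection.execute.return_value.fetchone.return_value = ("somevalue",)
    assert SqlDataStore(connection).get_data("somekey") == "somevalue"
    connection.execute.assert_called_once_with(
        "SELECT item_data FROM storage WHERE item_key = ?", ("somekey",)
    )
```

Note that these tests pin the exact SQL text, which is the implementation coupling the question worries about; they say nothing about whether a real database accepts the statements.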

In cases like this I will usually create SetUp and TearDown methods that run before and after my unit tests. These methods set up any test data I need in the db and delete it when I'm done. Pseudo code example:
Const KEY1 = "somekey"
Const VALUE1 = "somevalue"
Const KEY2 = "somekey2"
Const VALUE2 = "somevalue2"
Sub SetUpUnitTests()
{
    Insert Into SQLTable (key, data) Values (KEY1, VALUE1)
}
// this test does not depend on the storeData method
Sub GetDataTest()
{
    Assert.IsEqual(getData(KEY1), VALUE1)
}
// this test does not depend on the getData method
Sub SetDataTest()
{
    storeData(KEY2, VALUE2)
    Assert.IsNotNull(Direct call to SQL [Select data From SQLTable Where key = KEY2])
}
Sub TearDownUnitTests()
{
    Delete From SQLTable Where key In (KEY1, KEY2)
}
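For readers who want something runnable, the pseudo code above could look roughly like this in Python against an in-memory SQLite database. The table layout (storage, item_key, item_data) and the function names are assumptions made for this sketch, and a real suite would normally target the same database engine used in production.

```python
import sqlite3

KEY1, VALUE1 = "somekey", "somevalue"
KEY2, VALUE2 = "somekey2", "somevalue2"


def store_data(conn, key, data):
    conn.execute(
        "INSERT INTO storage (item_key, item_data) VALUES (?, ?)", (key, data)
    )


def get_data(conn, key):
    row = conn.execute(
        "SELECT item_data FROM storage WHERE item_key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def set_up():
    # Fresh in-memory database seeded with known data, bypassing store_data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE storage (item_key TEXT PRIMARY KEY, item_data TEXT)")
    conn.execute(
        "INSERT INTO storage (item_key, item_data) VALUES (?, ?)", (KEY1, VALUE1)
    )
    return conn


def tear_down(conn):
    conn.execute("DELETE FROM storage WHERE item_key IN (?, ?)", (KEY1, KEY2))
    conn.close()


def test_get_data():
    # Does not depend on store_data: the row was inserted with raw SQL.
    conn = set_up()
    try:
        assert get_data(conn, KEY1) == VALUE1
    finally:
        tear_down(conn)


def test_store_data():
    # Does not depend on get_data: the row is read back with raw SQL.
    conn = set_up()
    try:
        store_data(conn, KEY2, VALUE2)
        row = conn.execute(
            "SELECT item_data FROM storage WHERE item_key = ?", (KEY2,)
        ).fetchone()
        assert row == (VALUE2,)
    finally:
        tear_down(conn)
```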

Testing both in concert is a common technique (at least, in my experience), and I wouldn't shy away from it. I've used this same pattern for serializing/deserializing and parsing and printing.
If you don't want to hit the database, you could use a database mock. Some people share your unease about mocks: a mock-based test is partly implementation-specific. As in all things, it's a trade-off: consider the benefits of mocking (faster, not db dependent) against its downsides (won't detect actual db problems, ties the test to implementation details).

I think it depends on what happens to the data later. If you're only ever going to access the data using storeData and getData, why not test the methods in concert? I suppose there's a chance that a bug will arise and it'll be slightly harder to figure out whether it's in storeData or getData, but I'd consider that an acceptable risk if it makes your test easier to implement and conceals the internals, as you say.
If the data will be read from, or inserted into, the database using some other mechanism, then I'd check the database using SQL as you suggest.
@brendan makes a good point, though: whichever method you decide on, you'll be inserting data in the database. It's a good idea to clear out the data before and after the tests to ensure that you can achieve consistent results.

Related

Understanding stubs, fakes and mocks.

I have just started to read Professional Test Driven Development with C#: Developing Real World Applications with TDD
I have a hard time understanding stubs, fakes and mocks. From what I understand so far, they are fake objects used for the purpose of unit testing your projects, and a mock is a stub with conditional logic in it.
Another thing I think I have picked up is that mocks are somehow related to dependency injection, a concept which I only managed to understand yesterday.
What I do not get is why I would actually use them. I cannot seem to find any concrete examples online that explain them properly.
Can anyone please explain these concepts to me?

As I've read in the past, here's what I believe each term stands for
Stub
Here you are stubbing the result of a method to a known value, just to let the code run without issues. For example, let's say you had the following:
public int CalculateDiskSize(string networkShareName)
{
// This method does things on a network drive.
}
You don't care what the return value of this method is, it's not relevant. Plus it could cause an exception when executed if the network drive is not available. So you stub the result in order to avoid potential execution issues with the method.
So you end up doing something like:
sut.WhenCalled(() => sut.CalculateDiskSize()).Returns(10);
Fake
With a fake you are returning fake data, or creating a fake instance of an object. A classic example is a repository class. Take this method:
public int CalculateTotalSalary(IList<Employee> employees) { }
Normally the above method would be passed a collection of employees that were read from a database. However in your unit tests you don't want to access a database. So you create a fake employees list:
IList<Employee> fakeEmployees = new List<Employee>();
You can then add items to fakeEmployees and assert the expected results, in this case the total salary.
Mocks
When using mock objects you intend to verify some behaviour, or data, on those mock objects. Example:
You want to verify that a specific method was executed during a test run. Here's a generic example using the Moq mocking framework:
public void Test()
{
// Arrange.
var mock = new Mock<ISomething>();
mock.Setup(m => m.MethodToCheckIfCalled()).Verifiable(); // Setup replaces the older Expect method
var sut = new ThingToTest();
// Act.
sut.DoSomething(mock.Object);
// Assert
mock.Verify(m => m.MethodToCheckIfCalled());
}
Hopefully the above helps clarify things a bit.
EDIT:
Roy Osherove is a well-known advocate of Test Driven Development, and he has some excellent information on the topic. You may find it very useful:
http://artofunittesting.com/

They are all variations of the Test Double. Here is a very good reference that explains the differences between them: http://xunitpatterns.com/Test%20Double.html
Also, from Martin Fowler's post: http://martinfowler.com/articles/mocksArentStubs.html
Meszaros uses the term Test Double as the generic term for any kind of pretend object used in place of a real object for testing purposes. The name comes from the notion of a Stunt Double in movies. (One of his aims was to avoid using any name that was already widely used.) Meszaros then defined four particular kinds of double:
Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.
Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test. Stubs may also record information about calls, such as an email gateway stub that remembers the messages it 'sent', or maybe only how many messages it 'sent'.
Mocks are what we are talking about here: objects pre-programmed with expectations which form a specification of the calls they are expected to receive.
Of these kinds of doubles, only mocks insist upon behavior verification. The other doubles can, and usually do, use state verification. Mocks actually do behave like other doubles during the exercise phase, as they need to make the SUT believe it's talking with its real collaborators.
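The four kinds of double can be illustrated side by side with a small Python sketch. Everything here (the register function, InMemoryUserRepository, FixedClock, RecordingMailer) is hypothetical and invented for illustration; only unittest.mock comes from the standard library.

```python
from unittest.mock import Mock


class InMemoryUserRepository:
    """Fake: a working implementation that takes a shortcut (a dict, no database)."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def find(self, user_id):
        return self._users.get(user_id)


class FixedClock:
    """Stub: always returns the same canned answer."""

    def now(self):
        return "2000-01-01T00:00:00"


class RecordingMailer:
    """Stub that also records calls, like the email gateway in the quote."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


def register(user_id, name, repo, clock, mailer, logger):
    """Hypothetical system under test; the logger is accepted but never used."""
    repo.save(user_id, name)
    mailer.send(name, "Welcome, registered at " + clock.now())


def test_register_with_state_verification():
    repo, mailer = InMemoryUserRepository(), RecordingMailer()
    dummy_logger = object()  # Dummy: only fills the parameter list
    register(1, "ann", repo, FixedClock(), mailer, dummy_logger)
    # State verification: inspect what the doubles hold afterwards.
    assert repo.find(1) == "ann"
    assert mailer.sent == [("ann", "Welcome, registered at 2000-01-01T00:00:00")]


def test_register_with_behavior_verification():
    mailer = Mock()  # Mock: the test verifies the call it received
    register(1, "ann", InMemoryUserRepository(), FixedClock(), mailer, object())
    mailer.send.assert_called_once_with(
        "ann", "Welcome, registered at 2000-01-01T00:00:00"
    )
```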

The PHPUnit manual helped me a lot as an introduction:
"Sometimes it is just plain hard to test the system under test (SUT) because it depends on other components that cannot be used in the test environment. This could be because they aren't available, they will not return the results needed for the test or because executing them would have undesirable side effects. In other cases, our test strategy requires us to have more control or visibility of the internal behavior of the SUT." More: https://phpunit.de/manual/current/en/test-doubles.html
I also find better introductions when I search for "test doubles", the collective name for mocks, fakes, stubs and the rest.

How would you create unit tests for a data intensive application which could run an endless amount of db queries?

I am working on a reporting application (in PHP). This app has a huge amount of different filters, granulations, etc. in the UI and based on those filters etc, the backend constructs a massive query to pull hundreds of rows of data from the db.
How is it possible to write unit tests for something like this?
Lets say I create a test db with some known data. Would I create a bunch of tests where I compare the returned data set (for whatever filter settings) against hardcoded SQL queries in the tests?
Would this mean that for any schema change, I have to go back and change every single SQL query in the tests?

A unit test does not exercise real external dependencies or real data; you mock everything the unit works with. You wouldn't test it in the way you are describing, nor do you need to. You aren't testing what data the database gives you, only that the data you feed the method comes out, after processing, the way you expect.
For example, if you have a method that returns data retrieved from a database, the database has nothing to do with your test. You are testing just that method and the logic within it: which methods it calls and what you expect those calls to do (such as return a generic representation of a value you can assert on). Everything outside that method is mocked (i.e. replaced by a generic representation).
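A minimal Python sketch of that idea, assuming a hypothetical report function total_by_region and a hypothetical db object with a fetch_sales method (neither appears in the question):

```python
from unittest.mock import Mock


def total_by_region(db, region):
    """Hypothetical report method: sums the 'amount' of the rows the db layer returns."""
    rows = db.fetch_sales(region=region)
    return sum(row["amount"] for row in rows)


def test_total_by_region_sums_returned_rows():
    db = Mock()
    # The mock stands in for the database and returns canned rows.
    db.fetch_sales.return_value = [{"amount": 10}, {"amount": 32}]
    assert total_by_region(db, "north") == 42
    # The filter chosen by the caller must reach the data layer unchanged.
    db.fetch_sales.assert_called_once_with(region="north")
```

A test like this checks the method's own logic only. Whether the generated SQL returns the right rows from a real schema is a separate question that a test against a known test database would have to answer.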
In a simple example, if you created one method that sets something and another method that gets it, you would write a test that says: when I use the setter, the getter returns the same value. Both methods are then tested.
This is the reason why you hear about TDD (test driven development), which may feel counter intuitive at first, but it forces a developer to put together the pieces required to write testable code, which ultimately leads to code that's better. Yes, you can write code that functions perfectly, but it's not necessarily testable (or nearly impossible to), and that's an indicator that it's entirely too coupled, meaning it's not that reusable. For example, instead of creating a method that returns the number of apples, you could create a method that injects the object type so no matter what type of fruit you are using in that part of the project, it could return you a count (oranges, apples, pears, or not even fruit at all). That makes that method reusable, and also means you won't be writing methods for each type of fruit either (so you write less code).
Anyway, provide an example of your code, and your test, to see what the issue is.

How to test the function behavior in unit test?

If a function just calls another function or performs actions, how do I test it? Currently, I require every function to return a value so that I can assert on it. However, I think this approach messes up the API, because in the production code I don't need those functions to return a value. Are there any good solutions?
I think mock objects might be a possible solution. When should I use assert and when should I use mock objects? Is there any general guideline?
Thank you

Let's use BufferedStream.Flush() as an example method that doesn't return anything; how would we test this method if we had written it ourselves?
There is always some observable effect, otherwise the method would not exist. So the answer can be to test for the effect:
[Test]
public void FlushWritesToUnderlyingStream()
{
var memory = new byte[10];
var memoryStream = new MemoryStream(memory);
var buffered = new BufferedStream(memoryStream);
buffered.WriteByte(0xFF); // Write() takes (byte[], offset, count); WriteByte() writes a single byte
Assert.AreEqual(0x00, memory[0]); // not yet flushed, memory unchanged
buffered.Flush();
Assert.AreEqual(0xFF, memory[0]); // now it has changed
}
The trick is to structure your code so that these effects aren't too hard to observe in a test:
Explicitly pass collaborator objects, just like how the memoryStream is passed to the BufferedStream in the constructor. This is called dependency injection.
Program against an interface, just like how BufferedStream is programmed against the Stream interface. This enables you to pass simpler, test-friendly implementations (like MemoryStream in this case) or use a mocking framework (like Moq or Rhino Mocks), which is all great for unit testing.
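The same "test the observable effect" idea can be sketched in Python with the standard io module, where io.BytesIO plays the role of the test-friendly underlying stream. The sketch assumes the default buffer size is larger than the single byte written, so nothing reaches the underlying stream before flush().

```python
import io


def test_flush_writes_to_underlying_stream():
    underlying = io.BytesIO()                 # simple in-memory collaborator
    buffered = io.BufferedWriter(underlying)  # collaborator passed in explicitly

    buffered.write(b"\xff")
    assert underlying.getvalue() == b""       # not yet flushed, nothing written

    buffered.flush()
    assert underlying.getvalue() == b"\xff"   # now the effect is observable
```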

Sorry for not answering directly, but are you sure you have the right balance in your testing?
I wonder whether you are testing too much.
Do you really need to test a function that merely delegates to another?
Returns only for the tests
I agree with you when you write you don't want to add return values that are useful only for the tests, not for production. This clutters your API, making it less clear, which is a huge cost in the end.
Also, your return value could look correct to the test, but nothing guarantees that the returned value reflects what the implementation actually did, so the test probably proves nothing anyway.
Costs
Note that testing has an initial cost, the cost of writing the test.
If the implementation is very simple, the risk of failure is very low, but the time spent testing still accumulates (over hundreds or thousands of cases, it ends up being significant).
But more than that, each time you refactor your production code, you will probably have to refactor your tests also. So the maintenance cost of your tests will be high.
Testing the implementation
Testing what a method does (what other methods it calls, etc.) is criticized, just like testing a private method. Several points are made:
This is fragile and costly: any refactoring will break the tests, which increases the maintenance cost.
Testing a private method does not bring much safety to your production code, because your production code is not making that call. It's like verifying something you won't actually need.
When code merely delegates to another method, the implementation is so simple that the risk of mistakes is very low, and the code almost never changes, so what works once (when you write it) is unlikely to break.

Yes, mock is generally the way to go, if you want to test that a certain function is called and that certain parameters are passed in.
Here's how to do it in Typemock (C#):
Isolate.Verify.WasCalledWithAnyArguments(() => myInstance.WeatherService("", "", null, 0));
Isolate.Verify.WasCalledWithExactArguments(() => myInstance.StockQuote("", "", null, 0));
In general, you should use Assert as much as possible, until you can't. For example, when you have to test whether you call an external web service API properly, you can't (or don't want to) communicate with the web service directly. In that case you use a mock to verify that a certain web service method is called with the correct parameters.

"I want to know when should I use assert and when should I use mock objects? Is there any general guide line?"
There's an absolute, fixed and important rule.
Your tests must contain assert. The presence of assert is what you use to see if the test passed or failed. A test is a method that calls the "component under test" (a function, an object, whatever) in a specific fixture, and makes specific assertions about the component's behavior.
A test asserts something about the component being tested. Every test must have an assert, or it isn't a test. If it doesn't have assert, it's not clear what you're doing.
A mock is a replacement for a component to simplify the test configuration. It is a "mock" or "imitation" or "false" component that replaces a real component. You use mocks to replace something and simplify your testing.
Let's say you're going to test function a. And function a calls function b.
The tests for function a must have an assert (or it's not a test).
The tests for a may need a mock for function b. To isolate the two functions, you test a with a mock for function b.
The tests for function b must have an assert (or it's not a test).
The tests for b may not need anything mocked. Or, perhaps b makes an OS API call. This may need to be mocked. Or perhaps b writes to a file. This may need to be mocked.
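A small Python sketch of testing function a with a mock for function b. The functions and the read parameter are hypothetical; here b is assumed to read a file, which is why the test replaces it.

```python
from unittest.mock import Mock


def b(path):
    """Hypothetical function b: touches the file system."""
    with open(path) as handle:
        return handle.read()


def a(path, read=b):
    """Function under test: upper-cases whatever the reader returns."""
    return read(path).upper()


def test_a_uppercases_what_b_returns():
    mock_b = Mock(return_value="hello")  # replaces b, so no file is needed
    # The assert is what makes this a test of a.
    assert a("ignored.txt", read=mock_b) == "HELLO"
    # Optional interaction check: a passed the path through to b.
    mock_b.assert_called_once_with("ignored.txt")
```

Passing b as a default argument is just one way to make the replacement possible; unittest.mock.patch would be an alternative when the signature cannot change.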

State/Interaction testing and confusion on mixing (or abusing) them

I think I understand the definition of state-based and interaction-based testing (read the Fowler article, etc.). I found that I started out state-based but have been doing more interaction-based testing, and I'm getting a bit confused about how to test certain things.
I have a controller in MVC and an action calls a service to deny a package:
public ActionResult Deny(int id)
{
service.DenyPackage(id);
return RedirectToAction("List");
}
This seems clear to me. Provide a mock service, verify it was called correctly, done.
Now, I have an action for a view that lets the user associate a certificate with a package:
public ActionResult Upload(int id)
{
var package = packageRepository.GetPackage(id);
var certificates = certificateRepository.GetAllCertificates();
var view = new PackageUploadViewModel(package, certificates);
return View(view);
}
This one I'm a bit stumped on. I'm doing Spec style tests (possibly incorrectly), so to test this method I have a class and then two tests: verify the package repository was called, and verify the certificate repository was called. I actually want a third test to verify that the constructor was called, but I have no idea how to do that! I get the impression this is completely wrong.
So for state based testing I would pass in the id and then test the ActionResult's view. Okay, that makes sense. But wouldn't I have a test on the PackageUploadViewModel constructor? So if I have a test on the constructor, then part of me would just want to verify that I call the constructor and that the action return matches what the constructor returns.
Now, another option I can think of is I have a PackageUploadViewModelBuilder (or something equally dumbly named) that has dependency on the two repositories and then I just pass the id to a CreateViewModel method or something. I could then mock this object, verify everything, and be happy. But ... well ... it seems extravagant. I'm making something simple ... not simple. Plus, controller.action(id) returning builder.create(id) seems like adding a layer for no reason (the controller is responsible for building view models.. right?)
I dunno... I'm thinking more state based testing is necessary, but I'm afraid if I start testing return values then if Method A can get called in 8 different contexts I'm going to have a test explosion with a lot of repetition. I had been using interaction based testing to pass some of those contexts to Method B so that all I have to do is verify Method A called Method B and I have Method B tested so Method A can just trust that those contexts are handled. So interaction based testing is building this hierarchy of tests but state based testing is going to flatten it out some.
I have no idea if that made any sense.
Wow, this is long ...

I think Roy Osherove recently tweeted that as a rule of thumb, your tests should be 95 percent state-based and 5 percent interaction-based. I agree.
What matters most is that your API does what you want it to, and that is what you need to test. If you test the mechanics of how it achieves what it needs to do, you are very likely to end up with Overspecified Tests, which will bite you when it comes to maintainability.
In most cases, you can design your API so that state-based testing is the natural choice, because that is just so much easier.
To examine your Upload example: does it matter that GetPackage and GetAllCertificates were called? Is that really the expected outcome of the Upload method?
I would guess not. My guess is that the purpose of the Upload method, its very reason for existing, is to populate and serve the correct View.
So state-based testing would examine the returned ViewResult and its ViewModel and verify that it has all the correct values.
Sure, as the code stands right now, you will need to provide Test Doubles for packageRepository and certificateRepository, because otherwise exceptions will be thrown, but it doesn't look like it is important in itself that the repository methods are being called.
If you use Stubs instead of Mocks for your repositories, your tests are no longer tied to internal implementation details. If you later on decide to change the implementation of the Upload method to use cached instances of packages (or whatever), the Stub will not be called, but that's okay because it's not important anyway - what is important is that the returned View contains the expected data.
This is far preferable to having the test break even though all the returned data is as it should be.
Interestingly, your Deny example looks like a prime example where interaction-based testing is still warranted, because it is only by examining Indirect Outputs that you can verify that the method performed the correct action (the DenyPackage method returns void).
All this, and more, is explained very well in the excellent book xUnit Test Patterns.

The question to ask is "if this code worked, how could I tell?" That might mean testing some interactions or some state, it depends on what's important.
In your first test, the Deny changes the world outside the target class. It requires a collaboration from a service, so testing an interaction makes sense. In your second test, you're making queries on the neighbours (not changing anything outside the target class), so stubbing them makes more sense.
That's why we have a heuristic of "Stub Queries, Mock Actions" in http://www.mockobjects.com/book
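The "Stub Queries, Mock Actions" heuristic applied to the two controller actions might look like this in Python. The PackageController class below is a hypothetical translation of the question's C# controller, with plain return values standing in for ActionResult and the view model.

```python
from unittest.mock import Mock


class PackageController:
    """Hypothetical Python counterpart of the controller in the question."""

    def __init__(self, service, package_repository, certificate_repository):
        self._service = service
        self._packages = package_repository
        self._certificates = certificate_repository

    def deny(self, package_id):
        self._service.deny_package(package_id)  # action: changes the outside world
        return "redirect:List"

    def upload(self, package_id):
        package = self._packages.get_package(package_id)           # query
        certificates = self._certificates.get_all_certificates()   # query
        return {"package": package, "certificates": certificates}


def test_deny_tells_service_to_deny_package():
    # Mock the action: the only observable outcome is the call itself.
    service = Mock()
    controller = PackageController(service, Mock(), Mock())
    assert controller.deny(7) == "redirect:List"
    service.deny_package.assert_called_once_with(7)


def test_upload_returns_view_model_with_expected_data():
    # Stub the queries: supply canned data and check only the returned state.
    packages, certificates = Mock(), Mock()
    packages.get_package.return_value = "package-7"
    certificates.get_all_certificates.return_value = ["cert-a", "cert-b"]
    controller = PackageController(Mock(), packages, certificates)
    assert controller.upload(7) == {
        "package": "package-7",
        "certificates": ["cert-a", "cert-b"],
    }
```

The upload test makes no assertion about which repository methods were called, so it would keep passing if the implementation later switched to cached packages.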

How to test function call order

Considering such code:
class ToBeTested {
public:
void doForEach() {
for (vector<Contained>::iterator it = m_contained.begin(); it != m_contained.end(); it++) {
doOnce(*it);
doTwice(*it);
doTwice(*it);
}
}
void doOnce(Contained & c) {
// do something
}
void doTwice(Contained & c) {
// do something
}
// other methods
private:
vector<Contained> m_contained;
};
I want to test that if I fill vector with 3 values my functions will be called in proper order and quantity. For example my test can look something like this:
tobeTested.AddContained(one);
tobeTested.AddContained(two);
tobeTested.AddContained(three);
BEGIN_PROC_TEST()
SHOULD_BE_CALLED(doOnce, 1)
SHOULD_BE_CALLED(doTwice, 2)
SHOULD_BE_CALLED(doOnce, 1)
SHOULD_BE_CALLED(doTwice, 2)
SHOULD_BE_CALLED(doOnce, 1)
SHOULD_BE_CALLED(doTwice, 2)
tobeTested.doForEach()
END_PROC_TEST()
How would you recommend testing this? Is there any way to do this with the CppUnit or GoogleTest frameworks? Or does some other unit test framework allow such tests?
I understand that this is probably impossible without calling some debug functions from these functions, but can it at least be done automatically in some test framework? I don't want to scan trace logs and check their correctness by hand.
UPD: I'm trying to check not only the state of an objects, but also the execution order to avoid performance issues on the earliest possible stage (and in general I want to know that my code is executed exactly as I expected).

You should be able to use any good mocking framework to verify that calls to a collaborating object are done in a specific order.
However, you don't generally test that one method makes some calls to other methods on the same class... why would you?
Generally, when you're testing a class, you only care about testing its publicly visible state. If you test
anything else, your tests will prevent you from refactoring later.
I could provide more help, but I don't think your example is consistent (Where is the implementation for the AddContained method?).

If you're interested in performance, I recommend that you write a test that measures performance.
Check the current time, run the method you're concerned about, then check the time again. Assert that the total time taken is less than some value.
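A rough Python sketch of such a timing test follows. The process function and the one-second budget are assumptions chosen for illustration; a real budget would come from the actual requirement, and wall-clock assertions can be flaky on a loaded machine.

```python
import time


def process(items):
    """Hypothetical method whose performance we care about."""
    return sorted(items)


def test_process_finishes_within_budget():
    items = list(range(10_000, 0, -1))

    start = time.perf_counter()
    result = process(items)
    elapsed = time.perf_counter() - start

    assert result[0] == 1   # still check that the work was done correctly
    assert elapsed < 1.0    # assumed, deliberately generous time budget in seconds
```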
The problem with checking that methods are called in a certain order is that your code is going to change, and you don't want to have to update your tests when that happens. You should focus on testing the actual requirement instead of the implementation detail that meets that requirement.
That said, if you really want to test that your methods are called in a certain order, you'll need to do the following:
Move them to another class, call it Collaborator
Add an instance of this other class to the ToBeTested class
Use a mocking framework to set the instance variable on ToBeTested to be a mock of the Collaborator class
Call the method under test
Use your mocking framework to assert that the methods were called on your mock in the correct order.
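The steps above can be sketched in Python with unittest.mock (a C++ version would depend on the mocking framework chosen). The collaborator and the snake_case method names are hypothetical translations of the question's class; mock_calls records the calls on the mock in the order they happened.

```python
from unittest.mock import Mock, call


class ToBeTested:
    """Hypothetical Python counterpart; doOnce/doTwice moved to a collaborator."""

    def __init__(self, collaborator):
        self._collaborator = collaborator
        self._contained = []

    def add_contained(self, item):
        self._contained.append(item)

    def do_for_each(self):
        for item in self._contained:
            self._collaborator.do_once(item)
            self._collaborator.do_twice(item)
            self._collaborator.do_twice(item)


def test_do_for_each_calls_collaborator_in_order():
    collaborator = Mock()
    subject = ToBeTested(collaborator)
    for item in ("one", "two"):
        subject.add_contained(item)

    subject.do_for_each()

    # Exact sequence and count of calls, in order.
    assert collaborator.mock_calls == [
        call.do_once("one"), call.do_twice("one"), call.do_twice("one"),
        call.do_once("two"), call.do_twice("two"), call.do_twice("two"),
    ]
```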
I'm not a native cpp speaker so I can't comment on which mocking framework you should use, but I see some other commenters have added their suggestions on this front.

You could check out mockpp.

Instead of trying to figure out how many functions were called, and in what order, find a set of inputs that can only produce an expected output if you call things in the right order.
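As a small Python illustration of that idea, consider a hypothetical function that must strip whitespace before truncating. The chosen input gives a different result if the two steps run in the wrong order, so the output alone proves the order.

```python
def strip_then_truncate(text, limit=5):
    """Hypothetical function: the order of the two steps matters."""
    stripped = text.strip()
    return stripped[:limit]


def test_order_is_visible_in_the_output():
    # Stripping first gives "hello"; truncating first would give "hel".
    assert strip_then_truncate("  hello world") == "hello"
```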

Some mocking frameworks allow you to set up ordered expectations, which lets you say exactly which function calls you expect in a certain order. For example, RhinoMocks for C# allows this.
I am not a C++ coder so I'm not aware of what's available for C++, but that's one type of tool that might allow what you're trying to do.

http://msdn.microsoft.com/en-au/magazine/cc301356.aspx
This is a good article about Context Bound Objects. It contains some fairly advanced material, but if you are willing to put in the effort and really want to understand this kind of thing, it will be very helpful.
At the end you will be able to write something like:
[CallTracingAttribute()]
public class TraceMe : ContextBoundObject
{...}

You could use ACE (or similar) debug frameworks, and in your test, configure the debug object to stream to a file. Then you just need to check the file.
