Testing Side Effects in BDDs - unit-testing

I have an API that has the following logic:
Consume from Kafka.
Process the record.
Update the database, if the processing was successful.
If the processing fails, then push it to a Kafka topic.
If pushing to the Kafka topic fails, then commit.
If the record was processed successfully, then commit.
If the commit fails, then log and move ahead with consuming the next event.
I am writing BDDs for this API. Currently, I feel like I am testing too many scenarios:
ProcessingFailed -> Database is unchanged -> Event should be pushed to Kafka -> Should be committed.
Kafka push failed -> Should be committed.
Commit failed -> (what to do? Should I check if the log is printed correctly?)
Happy path -> Database updated -> Kafka topic does not contain the event -> Commit was successful.
My question is, what's the proper way to test for such side effects?
Now suppose my process step is made of three steps:
Fetch from the database.
Make an HTTP call.
Suppose I am simulating a 'processing failed' case by bringing my database down. Do I also need to test that the HTTP call was not made?

A good general rule for BDD tests is that each test should only have one reason to fail. For Cucumber this translates to only one Then step in each scenario.
With this as guidance I would recommend writing one scenario per step of the process.
# Consume from Kafka
Given a certain thing has happened
# Process the record
When some action is performed successfully
# Update database if processed successfully
Then some result exists in the database
Your next scenario then starts where the first one left off:
Given a certain thing happened
When the action is performed unsuccessfully
# Push failed message to Kafka queue
Then a failed message is sent
The third scenario picks up where the second one leaves off:
Given a certain thing happened
And the action was performed unsuccessfully
When a failure message is sent
Then a thing should not exist in the database
Each scenario builds on the steps verified in the previous scenarios, while being careful to ensure that scenarios do not share data or depend on the success of previously executed scenarios.
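If the step definitions end up in code, the same discipline can be sketched with plain pytest (a rough illustration rather than Cucumber itself). The FakeDb, FakeFailureTopic and handle_record names below are invented stand-ins for the real consumer, and each test keeps exactly one assertion, its single "Then".

    class FakeDb:
        def __init__(self, fail=False):
            self.rows, self.fail = [], fail
        def save(self, record):
            if self.fail:
                raise RuntimeError("db down")
            self.rows.append(record)

    class FakeFailureTopic:
        def __init__(self):
            self.messages = []
        def push(self, record):
            self.messages.append(record)

    def handle_record(record, db, failure_topic):
        # hypothetical pipeline: save on success, push to the failure topic otherwise
        try:
            db.save(record)
        except RuntimeError:
            failure_topic.push(record)

    # Scenario 1: action performed successfully -> result exists in the database
    def test_successful_processing_updates_database():
        db, topic = FakeDb(), FakeFailureTopic()
        handle_record({"id": 1}, db, topic)
        assert db.rows == [{"id": 1}]                 # the only "Then"

    # Scenario 2: action performed unsuccessfully -> a failed message is sent
    def test_failed_processing_sends_failure_message():
        db, topic = FakeDb(fail=True), FakeFailureTopic()
        handle_record({"id": 1}, db, topic)
        assert topic.messages == [{"id": 1}]          # the only "Then"

    # Scenario 3: a failure message was sent -> the thing does not exist in the database
    def test_failed_processing_leaves_database_unchanged():
        db, topic = FakeDb(fail=True), FakeFailureTopic()
        handle_record({"id": 1}, db, topic)
        assert db.rows == []                          # the only "Then"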

Currently, I feel like I am testing too many scenarios
My question is, what's the proper way to test for such side effects?
Well, it sounds to me like you are describing a state machine, where the transitions are driven by representations of different effects in the protocol.
Given that, I would normally expect to see tests for each target state.
Depending on your evaluation of the risks, it might make sense to run your automated checks at a number of different grains -- lots of decoupled tests exploring the different corner cases of the state machine itself, some checks to make sure that the orchestration of the different effects is correct, a few tests to make sure the whole mess works when you wire it all together.
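As a rough sketch of what "a test per target state" can look like, here is a tiny transition table for the flow in the question; the state and event names are invented, not taken from the original API.

    from enum import Enum, auto

    class State(Enum):
        CONSUMED = auto()
        PROCESSED = auto()
        PUSHED_TO_FAILURE_TOPIC = auto()
        COMMITTED = auto()

    def next_state(state, event):
        # transitions derived from the rules in the question
        transitions = {
            (State.CONSUMED, "processing_ok"): State.PROCESSED,
            (State.CONSUMED, "processing_failed"): State.PUSHED_TO_FAILURE_TOPIC,
            (State.PROCESSED, "commit_ok"): State.COMMITTED,
            (State.PUSHED_TO_FAILURE_TOPIC, "commit_ok"): State.COMMITTED,
        }
        return transitions.get((state, event), state)   # unknown events: stay put

    # one decoupled test per target state
    def test_failed_processing_reaches_failure_topic_state():
        assert next_state(State.CONSUMED, "processing_failed") == State.PUSHED_TO_FAILURE_TOPIC

    def test_successful_processing_ends_up_committed():
        state = next_state(State.CONSUMED, "processing_ok")
        assert next_state(state, "commit_ok") == State.COMMITTED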
Now do I also need to test that the HTTP call was not made?
There are probably two important questions to ask yourself here:
What are the risks of not having an automated test?
Why is just adding tests not effortless?
If the test subject is "so simple that there are obviously no deficiencies", then the investment odds tell us that investing time and money into extra testing is not a favorable play.
On the other hand, if you are looking for an excuse not to test the thing, then you might want to turn a critical eye toward your design. That's especially true if you are adding/changing code in a module that "already works". A big payoff for test investment comes from having many easy accurate tests for the code we are changing on a regular basis, so reluctance to add a new test for code that you are changing is a Big Red Flag[tm] that something Is Not According To Plan.

Related

How is unit testing testing anything?

I don't understand how I'm testing anything with unit testing.
Suppose I am testing that my repository class can retrieve values from the database correctly. The proper way to do this would be to actually call the real database and retrieve and check those values.
But the idea behind unit testing is that it should be done in isolation, and connecting to a running database is not isolation. So what is usually done is to mock or stub the database.
But why would testing on a fake database with hardcoded data and hardcoded return values even test anything? It seems tautological and a waste of time.
Or am I not understanding how to unit test properly?
Does one even unit test database calls?
I don't understand how I'm testing anything with unit testing.
Short answer: you are testing the logic, and leaving out the side effects.
You aren't testing everything; but you are testing something.
Furthermore, if you keep in mind that you aren't really testing the code with side effects, then you are motivated to arrange your code so that the pieces that actually depend on the side effect are small. The big pieces don't actually care where the data comes from, so those are easy to test.
So "something" can be "most things".
There is an impedance problem -- if your test doubles impersonate the production originals inadequately, then some of your test results will be inaccurate.
my philosophy is to test as little as possible to reach a given level of confidence
Kent Beck, 2008
One way of imagining "as little as possible" is to think in terms of cost -- we're aiming for a given confidence level, so we want to achieve as much of that confidence as we can using cheap unit tests, and then make up the difference with more expensive techniques.
Cory Benfield's talk Building Protocol Libraries the Right Way describes an example of the kind of separation we're talking about here. The logic of how to parse an HTTP message is separable from the problem of reading the bytes. If you make the complicated part easy to test, and the hard to test part too simple to fail, your chances of succeeding are quite good.
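A minimal sketch of that separation, with a made-up status-line format: the parsing is a pure function that is trivial to check, while the byte-reading shell is kept too simple to fail.

    def parse_status_line(line: bytes) -> tuple[str, int]:
        """Pure logic: no sockets, no side effects."""
        version, code, _reason = line.decode("ascii").split(" ", 2)
        return version, int(code)

    def read_status_line(sock) -> tuple[str, int]:
        """Thin side-effecting shell: just moves bytes, then delegates."""
        buf = b""
        while not buf.endswith(b"\r\n"):
            buf += sock.recv(1)
        return parse_status_line(buf.rstrip(b"\r\n"))

    def test_parse_status_line():
        assert parse_status_line(b"HTTP/1.1 200 OK") == ("HTTP/1.1", 200)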
I think your concern is valid. For me, TDD is more of an evolutionary design practice than unit testing practice, but I'll save that for another discussion.
In your example, what we are really testing is that the logic contained within your individual classes is sound. By stubbing the data coming from the database, you have a controlled scenario in which you can ensure your code works for that particular case. This makes it much easier to ensure full test coverage for all data scenarios. You're correct that this really doesn't test the whole system end to end, but the point is to reduce the overall test maintenance costs and enable faster feedback.
My approach is to mock most collaborators at the unit test level, then write acceptance tests at the integration test level, which validate your system using real data. Because the unit tests with their mocked data allow you to test various data scenarios, you only need to test a few of those scenarios using integration tests to feel confident that your code will perform as you expect.
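For illustration, a unit test in that style might look like the sketch below; PremiumCalculator and its rate repository are hypothetical names, and the repository is the only collaborator that gets mocked.

    from unittest.mock import Mock

    class PremiumCalculator:
        def __init__(self, rate_repository):
            self.rates = rate_repository
        def premium(self, customer_id):
            base = self.rates.base_rate(customer_id)   # the slow, external bit
            return round(base * 1.2, 2)                # the logic we actually test

    def test_premium_applies_20_percent_loading():
        repo = Mock()
        repo.base_rate.return_value = 100.0            # controlled scenario
        assert PremiumCalculator(repo).premium("c-1") == 120.0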
You can test your code against an actual database in isolation. Just create a new database instance for every test, or execute tests synchronously one after another and clean the database before each test.
But using an actual database will make your tests slow, which will slow down your work, because you want quick feedback on what you are doing.
Do not test every class; test the main feature logic, which can use many different classes, and mock/stub only the dependencies that make tests slow.
Find your application boundaries and test the logic between them without mocking.
For example, in a trivial web API application the boundaries can be:
- controller action -> request(input)
- controller action -> response(output)
- database -> side effect of received request.
Assume we live in a perfect world where setting up a new database and web server takes milliseconds. Then you would test the whole pipeline of your application:
1. Configure database for test
2. Send request to the web api server
3. Assert that response contains expected data
4. Assert that database state changed as expected
But in today's world your boundaries will be the controller action and an abstracted database access point, which makes your test look like this:
1. Configure a mocked database access point (repository)
2. Call the controller action with given parameters
3. Assert that the action returns the expected result
4. Possibly assert that the mocked repository received the expected update arguments.
If your application has no logic and just reads/updates data from the database, test with an actual database or, if your database framework allows it, use an in-memory database.
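A minimal sketch of those four numbered steps, assuming a hypothetical rename_user controller action and a mocked repository standing in for the database access point:

    from unittest.mock import Mock

    def rename_user(user_id, new_name, repository):
        # hypothetical controller action: validate, then delegate to the repository
        if not new_name:
            return {"status": 400}
        repository.update_name(user_id, new_name)
        return {"status": 200}

    def test_rename_user_updates_repository():
        repo = Mock()                                   # 1. configure mocked repository
        result = rename_user(42, "Alice", repo)         # 2. call the controller action
        assert result == {"status": 200}                # 3. assert the expected result
        repo.update_name.assert_called_once_with(42, "Alice")  # 4. assert update arguments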

Do atomic tests make sense in dynamically created environments?

We're building a product that allows users to create custom databases and store data within those DBs (WebApp).
Our issue for testing the frontend (CoffeeScript) is that every test should be atomic, but that would require setting up a DB to see if an item within that DB can be created and persists, or to see how changes in a DB affect items.
Essentially, the issue is that the setup code needed to get to the item tests basically sets up a new DB and therefore equals the code that tests setting up a new DB.
There are two approaches and we're torn on which to use:
1) Create and tear down a new DB with each group of tests
(+) Sorta Atomic (still fails if setting up a DB fails)
(-) Takes a lot of time to execute
(-) Tons of surrounding code
(-) No way to explore the created environment
(-) Messy on errors, everything fails
2) Do the setup step by step as separate tests depending on each other, with a cleanup routine at the beginning of a test
(+) The created environment can be accessed via the UI (not automatically torn down)
(+) Step by step testing, less overall/repetitive code
(-) Tests are dependent on each other (messy)
(-) Somewhat overall messy
We're therefore wondering: does the golden rule that tests should be atomic make sense in such a dynamic environment?
Basically, what you are talking about is integration tests. These are different from unit tests. Examples of integration tests would be automated UI tests or Coded UI tests. In most of the projects I've worked on we've had both types of tests, and I strongly encourage you to have both types in your project too.
The philosophy behind both these tests is slightly different.
Unit Tests are meant to test isolated bits of functionality.
They are meant to be very fast.
A developer should be able to run them all on their machine in a reasonable amount of time.
There are various consequences of this philosophy.
Because a unit test is testing an isolated bit of functionality, you should use mocks and stubs to isolate the rest of the environment and only focus on tiny bits of functionality.
The isolation helps your "design thinking" while writing these tests. In fact this is the reason why the unit tests are required to be fast, because a developer is actively and constantly changing the code and unit tests as part of the design and redesign process. There should be very low overhead to set up, change and run the unit tests. I should be able to ignore everything other than the problem I am trying to solve and quickly iterate and reiterate my designs and tests. This is the idea behind TDD and its claim to help write good testable code. If you are spending a long time trying to set up an overly complex unit test then you have to start reconsidering your design.
The fast nature means that you could run these as part of your Continuous Integration build.
The disadvantage is that because you are testing each piece of functionality in isolation, you don't know if they will all work together as a whole. Each time you write a mock, you are implicitly baking in an assumption about how the rest of the system works, and that the rest of the system is currently working as it is meant to (i.e. nothing else is broken as part of your deployment, or the running or patching of the OS, etc.).
Integration Tests are meant to test the functionality from end to end. You try NOT to mock out or isolate any part of the system.
There are again various consequences of this philosophy. Note that there is no requirement for integration tests to be fast.
Integration tests, by their very nature need to run after your full deployment (as opposed to unit tests which can be run as soon as your code compiles).
Because they take longer, you don't run them as part of your CI environment, but you still need to run them regularly. We usually run them as part of our nightly builds, or you can run them twice daily, etc.
Because the integration tests take a black box approach to the whole system, they don't really help you with your "design thinking" about how to actually build the system. But they do help your thinking about the specifications of the system as a whole, i.e. what the system should do, not how it should do it.
Note that in both cases the rule of tests being atomic still applies. Each test is different from other tests. This way, when a test fails, you can be sure about all the conditions that are causing it to fail and concentrate on only fixing that. It's just that an integration test touches as many parts of your system as possible.
To give you an example from our current project: let's say we need to write a bit of functionality that requires us to add a new table to the DB and bring it through all the layers to show it in the UI.
We start by creating our business logic classes, domain classes, write the appropriate web service, build view models, modify the database etc. While doing each of these we write unit tests to test the code we are currently writing. So when building the business logic classes, we mock out everything else to ensure that the logic in the class is valid (for example, clients over 60 years old get a 50% discount on their car insurance etc.)
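A unit-level sketch of that discount rule, with the database hidden behind a stubbed client repository (the function and repository names are illustrative):

    from unittest.mock import Mock

    def car_insurance_price(client_id, base_price, clients):
        client = clients.get(client_id)                # mocked in the unit test
        return base_price * 0.5 if client["age"] > 60 else base_price

    def test_clients_over_60_get_half_price():
        clients = Mock()
        clients.get.return_value = {"age": 70}         # controlled data scenario
        assert car_insurance_price("c-1", 200.0, clients) == 100.0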
Once we do that, we now need to update our deployment scripts/packages etc. to be able to deploy it, i.e. update the database creation SQL scripts and the database alteration SQL scripts, etc. (In your case this will be a complex process.)
Now we write integration tests. In this case we might test from SQL Server to the web service. There is a SQL integration test base class which contains the set-up and tear-down methods for each test. In the set-up we create a brand new database using our SQL deployment scripts. Each test also specifies a test data SQL script. So, for example, this test data script might insert a new record into the client table whose age is 70 years. We run this script as part of the "Arrange" of our test. Then we make a web service call to search for clients older than 60. This is the "Act" part of the test, and from the result we check to make sure that we only get back the user we've inserted into the DB. At the end of the test, the database is deleted. We've caught bugs here when the columns in the SQL database aren't nullable, or when the datetime columns overflow because the default minimum datetime in .NET is different from SQL Server's minimum datetime.
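Scaled way down, that kind of base class might look like the sketch below; SQLite, the schema and the direct query are stand-ins for the real deployment scripts and web service call.

    import sqlite3
    import unittest

    class SqlIntegrationTestBase(unittest.TestCase):
        def setUp(self):
            # "deploy" a brand new database for every test
            self.db = sqlite3.connect(":memory:")
            self.db.execute("CREATE TABLE client (name TEXT NOT NULL, age INTEGER NOT NULL)")
        def tearDown(self):
            self.db.close()                            # blow the database away

    class SearchOlderClientsTest(SqlIntegrationTestBase):
        def test_returns_only_clients_older_than_60(self):
            # Arrange: run the test data script
            self.db.execute("INSERT INTO client VALUES ('Pat', 70), ('Sam', 30)")
            # Act: in the real suite this would be a web service call
            rows = self.db.execute("SELECT name FROM client WHERE age > 60").fetchall()
            # Assert: only the 70-year-old comes back
            self.assertEqual(rows, [("Pat",)])

    if __name__ == "__main__":
        unittest.main()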
Some functionality requires us to interact with an Oracle database. For example, if a new record is added to Oracle, then a trigger/DB procedure kicks off and transfers that record to SQL Server, and then we need to bring it up through the layers. In this case we have an OracleSQL integration test base class. As you might have guessed, this follows a similar pattern, but it creates both Oracle and SQL DBs, inserts test data into Oracle, and blows them both away at the end of the test.
The developers usually pick the web service layer for writing their integration tests. The testers, on the other hand, use UI automation tools to make sure that the data is actually showing up on screen. For example, they will record a test that goes to the web page, puts "60" into the age box, clicks the search button, etc. That test might leverage the same test data SQL script that the developer wrote (or the testing team might come to the developer and ask for help crafting SQL scripts to insert whatever highly convoluted data they can think of). But the point is, once the test data insertion script is created, it leverages the same underlying system to blow away the whole DB, create a new one, insert test data, and run the specified test.
So, to answer your question, you will need two types of tests: unit tests and integration tests. You might have to put some initial work into creating base classes or helper methods to create/delete databases, automating your deployment to install/uninstall other components of your system, etc. You will have to do this for your final deployment anyway. Integration tests will also be closely related to and dependent on your deployment strategy. This is an advantage and not a disadvantage, in my opinion. While it might be painful at first to set it all up, one of the things your integration tests are implicitly testing is your deployment mechanism. If there are any issues with deploying/installing any of the components required by your system, you want to know about it as quickly as possible, not the day before you are supposed to be deploying to production.
A good suite of tests is invaluable. It also needs to be isolated, rigorous and comprehensive. The tests shouldn't fail when they don't need to but more importantly, they should fail when they need to. And when they do fail, you want them to provide as much information as possible and point you at the exact location of failure. This makes fixing the issue a much easier task. Any time you put into building this test suite will more than pay for itself in no time.
You're not doing atomic tests if you're talking to a database.
You need to mock the database interface and talk to the mock instead. That will be fast, and you'll be able to use the mock to introduce errors that would be difficult using the real database.
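For example, a mock makes it easy to inject exactly the failures that are awkward to produce with a real database; save_order and StorageError below are hypothetical names used only for illustration.

    import pytest
    from unittest.mock import Mock

    class StorageError(Exception):
        pass

    def save_order(order, database):
        # hypothetical code under test: translate low-level failures
        try:
            database.insert(order)
        except ConnectionError as exc:
            raise StorageError("order could not be saved") from exc

    def test_connection_failure_is_reported_as_storage_error():
        db = Mock()
        db.insert.side_effect = ConnectionError("db unreachable")   # easy with a mock
        with pytest.raises(StorageError):
            save_order({"id": 1}, db)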

Django PostgreSQL asynchronous commits

PostgreSQL supports asynchronous commits - that is, the database engine can be configured to report success even if the database has not completed the write ahead log sync.
http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT
This provides a useful compromise: such queries run faster, and in the event of a database crash the database still remains in a consistent state; however, some allegedly committed transactions would appear as if they had been aborted cleanly.
Obviously for some transactions, it's critical that commits remain final - which is why the flag can be configured per transaction.
How can I take advantage of this functionality in django?
First I second Frank's note. That's the way to do it.
However, if you do this, you probably want a function which sets this on each API call that may commit. That seems error-prone to me, so I probably wouldn't mess with it and would instead try hard to batch the work into the same transaction to the extent that makes sense. I would further suggest having a method in your models for showing the setting (SHOW synchronous_commit) so that you can properly unit test.
Again, because this is a session setting, it strikes me as a bit dangerous to play around with in this way, but it can be done if you take the necessary precautions.
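For what it's worth, a sketch of the per-transaction approach from Django might look like this (it assumes PostgreSQL, a modern Django with transaction.atomic, and a hypothetical app_logentry table); SET LOCAL only lasts for the current transaction, and SHOW gives tests a way to verify the setting, as suggested above.

    from django.db import connection, transaction

    def bulk_insert_log_entries(messages):
        # OK to lose these on a crash, so trade durability for speed
        with transaction.atomic():
            with connection.cursor() as cursor:
                cursor.execute("SET LOCAL synchronous_commit TO OFF")
                cursor.executemany(
                    "INSERT INTO app_logentry (message) VALUES (%s)",  # hypothetical table
                    [(m,) for m in messages],
                )

    def current_synchronous_commit():
        # Handy in a test: SHOW synchronous_commit, as suggested above
        with connection.cursor() as cursor:
            cursor.execute("SHOW synchronous_commit")
            return cursor.fetchone()[0]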

How do you model a business workflow in ColdFusion?

Since there's no complete BPM framework/solution in ColdFusion as of yet, how would you model a workflow in a ColdFusion app so that it is easily extensible and maintainable?
A business workflow is more than a flowchart that maps nicely into a programming language. For example:
How do you model a task X that is followed by multiple tasks Y0, Y1, Y2 that happen in parallel, where Y0 is a human process (it needs to wait for input), Y1 is a web service call that might go wrong and might need automatic retries, and Y2 is an automated process, followed by a task Z that should only be carried out when all the Y's are completed?
My thoughts...
Seems like I need to do a whole lot of storing / managing / keeping track of states, and frequent checking with cfschedule.
cfthread ain't going to help much since some tasks can take days (e.g. waiting for the user's confirmation).
I can already imagine the flow is going to be spread around in multiple UDFs, the DB, and CFCs.
Is there any open-source workflow engine in another language that maybe we can port over to CF?
Thank you for your brain power. :)
Study the Java Process Definition Language specification; JBoss has an execution engine for it. Using this Java-based engine may be your easiest solution, and it solves many of the problems you've outlined.
If you intend to write your own, you will probably end up modelling states and transitions, vertices and edges in a directed graph. And these, as Ciaran Archer wrote, are the components of a state machine. The best persistence approach IMO is capturing versions of whatever data is being sent through the workflow via serialization, capturing the current state, and a history of transitions between states and changes to that data. The mechanism probably needs a way to keep track of who or what has responsibility for taking the next action against that workflow.
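A bare-bones sketch of that model (in Python rather than CFML, with invented state names): states and transitions form a small directed graph, the data is captured as a serialized snapshot, and every transition is recorded along with who owns the next action.

    import json
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    # edges of the directed graph: task X, then wait for the Y work, then Z
    ALLOWED = {("X", "WAITING_ON_Y"), ("WAITING_ON_Y", "Z"), ("Z", "DONE")}

    @dataclass
    class WorkflowInstance:
        state: str = "X"
        payload: str = "{}"               # serialized version of the workflow data
        owner: str = "system"             # who or what must take the next action
        history: list = field(default_factory=list)

        def transition(self, new_state, actor, data):
            if (self.state, new_state) not in ALLOWED:
                raise ValueError(f"illegal transition {self.state} -> {new_state}")
            self.history.append(
                (self.state, new_state, actor, datetime.now(timezone.utc).isoformat())
            )
            self.state, self.owner = new_state, actor
            self.payload = json.dumps(data)   # capture this version of the data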
Based on your question, one thing to consider is whether or not you really need to represent parallel tasks in your solution. Instead, it might be possible to enqueue a set of messages and then specify a wait state for all of those to complete. Representing actual parallelism implies you are moving data simultaneously through several different processes, in which case, when they join again, you need an algorithm to resolve deltas, which is very much a non-trivial task.
In the context of ColdFusion and what you're trying to accomplish, a scheduled task may be necessary if the system you're writing needs to poll other systems. Consider WDDX as a serialization format. JSON, while seductively simple, has (as I recall) some edge cases around numbers and dates that can cause you grief.
Finally see my answer to this question for some additional thoughts.
Off the top of my head I'm thinking about the State design pattern with state persisted to a database. Check out the Gumball Machine example in Head First Design Patterns.
Generally this will work if you have something (like a client / order / etc.) going through a number of changes of state.
Different things will happen to your object depending on what state you are in, and that might mean sitting in a database table waiting for a flag to be updated by a user manually.
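For a flavour of that, here is a tiny State pattern sketch (in Python, with made-up names rather than the book's Gumball Machine): each state object decides what happens next, and the "waiting for a flag" case is simply a state that returns itself until the flag changes in the database row.

    class AwaitingApproval:
        def next(self, order_row):
            # might sit here for days until a user flips the flag in the DB
            return Approved() if order_row["approved_flag"] else self

    class Approved:
        def next(self, order_row):
            return self          # terminal state in this tiny example

    class Order:
        def __init__(self, row):
            self.row = row
            self.state = AwaitingApproval()
        def advance(self):
            # different things happen depending on the current state object
            self.state = self.state.next(self.row)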
In terms of other languages I know Grails has a workflow module available. I don't know if you would be better off porting to CF or jumping ship to Grails (right tool for the job and all that).
It's just a thought, hope it helps.

Hudson - different build targets for different triggers

I would like to have different build targets for periodic builds and for those that are triggered by polling SCM.
More specifically: the idea is that nightly builds should call 'mvn verify', which includes integration tests, while a normal build calls 'mvn test', which just executes unit tests.
Any ideas how this can be achieved using Hudson?
Cheers
Chris
You could create two jobs: one scheduled and the other polled.
In the scheduled job you can specify a different Maven goal from the polled one.
The answer by Raghuram is straightforward and correct. But you can also have three jobs. The first two do the triggering and pass the Maven goal as a parameter into the third job. Sounds like a lot of clutter, and to a certain point it is. But it will help if you have a lot of configuration to do (especially if the configuration needs to be changed regularly), because it keeps the configuration correct for both jobs. Configuration includes not only the build steps but also the harvesting of all reports, post-build cleanup, notifications, triggering of downstream jobs, and so on. Another advantage is that you don't need to synchronize the two jobs so that they don't run in parallel (if that causes problems).
Don't get me wrong, my first impulse would be to go for two jobs, which has its own advantages. The history for the nightly build will contain the whole day (actually, everything since the last nightly build) and not only the time since the last build (which could be a triggered one). Integration tests usually need a more extensive setup or access to scarce resources. With two jobs you don't block these resources when you run the test goal. In addition, I expect that more test results need to be harvested to be displayed and tracked over time by Hudson. You also might want to run more metrics against your code whose results should be displayed by Hudson. The disadvantage is that you of course need to keep the build steps basically the same all the time.
But in the end it is a case-by-case decision whether you go with 2 or 3 jobs.