Related
I know what the advantages are and I use fake data when I am working with more complex systems.
What if I am developing something simple and I can easily set up my environment in a real database and the data being accessed is so small that the access time is not a factor, and I am only running a few tests.
Is it still important to create fake data or can I forget the extra coding and skip right to the real thing?
When I said real database I do not mean a production database, I mean a test database, but using a real live DBMS and the same schema as the real database.
The reasons to use fake data instead of a real DB are:
Speed. If your tests are slow you aren't going to run them. Mocking the DB can make your tests run much faster than they otherwise might.
Control. Your tests need to be the sole source of your test data. When you use fake data, your tests choose which fakes you will be using. So there is no chance that your tests are spoiled because someone left the DB in an unfamiliar state.
Order Independence. We want our tests to be runnable in any order at all. The input of one test should not depend on the output of another. When your tests control the test data, the tests can be independent of each other.
Environment Independence. Your tests should be runnable in any environment. You should be able to run them while on the train, or in a plane, or at home, or at work. They should not depend on external services. When you use fake data, you don't need an external DB.
Now, if you are building a small little application, and by using a real DB (like MySQL) you can achieve the above goals, then by all means use the DB. I do. But make no mistake, as your application grows you will eventually be faced with the need to mock out the DB. That's OK, do it when you need to. YAGNI. Just make sure you DO do it WHEN you need to. If you let it go, you'll pay.
It sort of depends what you want to test. Often you want to test the actual logic in your code not the data in the database, so setting up a complete database just to run your tests is a waste of time.
Also consider the amount of work that goes into maintaining your tests and testdatabase. Testing your code with a database often means your are testing your application as a whole instead of the different parts in isolation. This often result in a lot of work keeping both the database and tests in sync.
And the last problem is that the test should run in isolation so each test should either run on its own version of the database or leave it in exactly the same state as it was before the test ran. This includes the state after a failed test.
Having said that, if you really want to test on your database you can. There are tools that help setting up and tearing down a database, like dbunit.
I've seen people trying to create unit test like this, but almost always it turns out to be much more work then it is actually worth. Most abandoned it halfway during the project, most abandoning ttd completely during the project, thinking the experience transfer to unit testing in general.
So I would recommend keeping tests simple and isolated and encapsulate your code good enough it becomes possible to test your code in isolation.
As far as the Real DB does not get in your way, and you can go faster that way, I would be pragmatic and go for it.
In unit-test, the "test" is more important than the "unit".
I think it depends on whether your queries are fixed inside the repository (the better option, IMO), or whether the repository exposes composable queries; for example - if you have a repository method:
IQueryable<Customer> GetCustomers() {...}
Then your UI could request:
var foo = GetCustomers().Where(x=>SomeUnmappedFunction(x));
bool SomeUnmappedFunction(Customer customer) {
return customer.RegionId == 12345 && customer.Name.StartsWith("foo");
}
This will pass for an object-based fake repo, but will fail for actual db implementations. Of course, you can nullify this by having the repository handle all queries internally (no external composition); for example:
Customer[] GetCustomers(int? regionId, string nameStartsWith, ...) {...}
Because this can't be composed, you can check the DB and the UI independently. With composable queries, you are forced to use integration tests throughout if you want it to be useful.
It rather depends on whether the DB is automatically set up by the test, also whether the database is isolated from other developers.
At the moment it may not be a problem (e.g. only one developer). However (for manual database setup) setting up the database is an extra impediment for running tests, and this is a very bad thing.
If you're just writing a simple one-off application that you absolutely know will not grow, I think a lot of "best practices" just go right out the window.
You don't need to use DI/IOC or have unit tests or mock out your db access if all you're writing is a simple "Contact Us" form. However, where to draw the line between a "simple" app and a "complex" one is difficult.
In other words, use your best judgment as there is no hard-and-set answer to this.
It is ok to do that for the scenario, as long as you don't see them as "unit" tests. Those would be integration tests. You also want to consider if you will be manually testing through the UI again and again, as you might just automated your smoke tests instead. Given that, you might even consider not doing the integration tests at all, and just work at the functional/ui tests level (as they will already be covering the integration).
As others as pointed out, it is hard to draw the line on complex/non complex, and you would usually now when it is too late :(. If you are already used to doing them, I am sure you won't get much overhead. If that were not the case, you could learn from it :)
Assuming that you want to automate this, the most important thing is that you can programmatically generate your initial condition. It sounds like that's the case, and even better you're testing real world data.
However, there are a few drawbacks:
Your real database might not cover certain conditions in your code. If you have fake data, you cause that behavior to happen.
And as you point out, you have a simple application; when it becomes less simple, you'll want to have tests that you can categorize as unit tests and system tests. The unit tests should target a simple piece of functionality, which will be much easier to do with fake data.
One advantage of fake repositories is that your regression / unit testing is consistent since you can expect the same results for the same queries. This makes it easier to build certain unit tests.
There are several disadvantages if your code (if not read-query only) modifies data:
- If you have an error in your code (which is probably why you're testing), you could end up breaking the production database. Even if you didn't break it.
- if the production database changes over time and especially while your code is executing, you may lose track of the test materials that you added and have a hard time later cleaning it out of the database.
- Production queries from other systems accessing the database may treat your test data as real data and this can corrupt results of important business processes somewhere down the road. For example, even if you marked your data with a certain flag or prefix, can you assure that anyone accessing the database will adhere to this schema?
Also, some databases are regulated by privacy laws, so depending on your contract and who owns the main DB, you may or may not be legally allowed to access real data.
If you need to run on a production database, I would recommend running on a copy which you can easily create during of-peak hours.
It's a really simple application, and you can't see it growing, I see no problem running your tests on a real DB. If, however, you think this application will grow, it's important that you account for that in your tests.
Keep everything as simple as you can, and if you require more flexible testing later on, make it so. Plan ahead though, because you don't want to have a huge application in 3 years that relies on old and hacky (for a large application) tests.
The downsides to running tests against your database is lack of speed and the complexity for setting up your database state before running tests.
If you have control over this there is no problem in running the tests directly against the database; it's actually a good approach because it simulates your final product better than running against fake data. The key is to have a pragmatic approach and see best practice as guidelines and not rules.
This article advises to use SQLite for tests even if you use another RDBMS (I use PostgreSQL) in development and production. I tried SQLite for one test case and it ran faster indeed (~18.8 times faster, 0.5s vs 9.4s!).
In which cases could using SQLite result in different test results than if I used PostgreSQL?
Only if I would test a piece of code that contains a raw SQL query?
Any time the query generator might produce queries that behave differently on different platforms despite the efforts of the query generator's platform abstractions. Differences in regular expressions, collations and sorting, different levels of strictness about aggregates and grouping, use of full-text search or other extension features, use of anything but the most utterly simple functions and operators, etc.
Also, as you noted, any time you run raw SQL.
It's moderately reasonable to run tests on SQLite during iterative development, but you really need to run them on the same DB you're going to deploy on before you push to production. Otherwise you'll get bitten by some query where different engines have different capabilities to prove transitive equality through joins and GROUP BY or are differently permissive of queries, so a query will work on one then fail on the other.
You should also test against PostgreSQL on a reasonable data set before pushing changes live in order to find obvious performance regressions that'll be an issue in production. It makes little sense to do this on SQLite, where often totally different queries will be fast or slow.
I'm surprised you're seeing the kind of speed difference you report. I'd want to look into why the tests run so much slower on PostgreSQL and what you can do about it, since in production it's clearly not going to have the same kind of performance difference. I wrote a bit about this in optimise PostgreSQL for fast testing.
The performance characteristics will be very different in most cases. Often faster. It's typically good for testing because the SQLite engine does not need to take into account multiple client access. SQLite only allowed one thread to access it at once. This greatly reduces a lot of the overhead and complexity compared to other RDBMSs.
As far as raw queries go, there are going to be lot of features that SQLite does not support compared to Postgres or another RDBMS. Stay away from raw queries as much as possible to keep your code portable. The exception will be when you need to optimize specific queries for production. In those cases you can keep a setting in settings.py to check if you are on production and run the generic filters instead of a raw query. There are many types of generic raw queries that will not need this sort of checking though.
Also, the fact that a SQLite DB is just a file, it makes it very simple to tear down and start over for testing.
We have a C++ application that utilizes some basic APIs to send raw queries to a MS SQL Server. Scattered through the various translation units in our program, we have simple 1-2 line queries as C++ strings, and every now and then you'll see more complex queries that can be over 20 lines.
I can't help but think that the larger queries, specifically the 20+ line ones, should not be embedded in C++ code as constant strings. I want to propose pulling these out into separate text files that are loaded on-demand by the C++ application, however I'm not sure if this is the best approach.
What design choices are typical for situations like this? I definitely feel there needs to be improvement, I just don't know if moving the SQL queries out into data files (text files) is the best idea.
You could make a DAL (Data Access Layer).
It would be the API that the rest of the program talks to. Then you can mess around and try anything and everything (Stored procedures, caching, etc.) without disturbing the main program.
Move them into their own files, or even into their own stored procedures. Queries embedded in the application cannot be changed without a recompile, and depending on your release procedures, that could severely impair your ability to respond to emergencies or deploy hot fixes. You could alter your app to cache the file contents, if you go down that road, and even periodically check the files for updates.
the best "design choice" - for many different reasons - is to use MSSQL stored procedures whenever/wherever possible.
I've seen code that segregates SQL queries into a common module, but I don't think there's much benefit to a common "queries module" (or a standalone text file) over having the SQL queries spelled out as string literals in the module that's calling them.
Stored procedures, on the other hand, increase modularity, enhance security, and can vastly improve performance.
IMHO...
I would leave the SQL embedded in the C++ functions that use it: it will be easier to read and understand what the code does.
If you have SQL queries scattered around your code I'd say that there is some problem with the overall structure of the classes you are using: you should have some (or even just one) 'low level' classes that handle the interaction with the database, and the rest of the code uses these classes.
I personally don't like using stored procedure: if you have to support a different database server the porting will be a pain, I never saw that much of a performance improvement and to understand what the code does you have to jump back and forth between the stored procedures and the C++.
It really depends, here are some notes:
1) If all your sql code resides in the application, then your application is pretty much self contained in terms of logic. This is good as you have done in the current application. In terms of speed, this can be a little slower as SQL will need to be parsed when when you run these queries(also depends if you used Prepared statements,etc which can speed it up).
2) The second approach is to put all SQL logic as stored procedures on the server. This is a very much preferred approach for even small SQL queries whether one line or not. You just build a DAL layer. In terms of performance this is very good, however the logic lives in two different systems, your C++ app and the SQL server. You will quite likely need to build a small utility application that can translate the stored procedures input and output to template code (be it C++ or any other) to make your life easier.
3) A mixed approach with the above two. I would not recommend this route.
You need to think about how these queries are likely to change over time, and compare it to how the related C++ code is likely to change. If the queries are relatively independent of the code, and have a higher likelihood of change, then I would either load them at runtime from separate files, or use stored procedures instead. That approach allows for changing the queries without recompiling the C++ code. On the other hand, if the queries are highly coupled to the C++ code, making a change in one likely to accompany a change in the other, I would keep the queries in the code. This approach makes a change more localized and less error prone.
I'm thinking of the case where the program doesn't really compute anything, it just DOES a lot. Unit testing makes sense to me when you're writing functions which calculate something and you need to check the result, but what if you aren't calculating anything? For example, a program I maintain at work relies on having the user fill out a form, then opening an external program, and automating the external program to do something based on the user input. The process is fairly involved. There's like 3000 lines of code (spread out across multiple functions*), but I can't think of a single thing which it makes sense to unit test.
That's just an example though. Should you even try to unit test "procedural" programs?
*EDIT
Based on your description these are the places I would look to unit test:
Does the form validation work of user input work correctly
Given valid input from the form is the external program called correctly
Feed in user input to the external program and see if you get the right output
From the sounds of your description the real problem is that the code you're working with is not modular. One of the benefits I find with unit testing is that it code that is difficult to test is either not modular enough or has an awkward interface. Try to break the code down into smaller pieces and you'll find places where it makes sense to write unit tests.
I'm not an expert on this but have been confused for a while for the same reason. Somehow the applications I'm doing just don't fit to the examples given for UNIT testing (very asynchronous and random depending on heavy user interaction)
I realized recently (and please let me know if I'm wrong) that it doesn't make sense to make a sort of global test but rather a myriad of small tests for each component. The easiest is to build the test in the same time or even before creating the actual procedures.
Do you have 3000 lines of code in a single procedure/method? If so, then you probably need to refactor your code into smaller, more understandable pieces to make it maintainable. When you do this, you'll have those parts that you can and should unit test. If not, then you already have those pieces -- the individual procedures/methods that are called by your main program.
Even without unit tests, though, you should still write tests for the code to make sure that you are providing the correct inputs to the external program and testing that you handle the outputs from the program correctly under both normal and exceptional conditions. Techniques used in unit testing -- like mocking -- can be used in these integration tests to ensure that your program is operating correctly without involving the external resource.
An interesting "cut point" for your application is you say "the user fills out a form." If you want to test, you should refactor your code to construct an explicit representation of that form as a data structure. Then you can start collecting forms and testing that the system responds appropriately to each form.
It may be that the actions taken by your system are not observable until something hits the file system. Here are a couple of ideas:
Set up something like a git repository for the initial state of the file system, run a form, and look at the output of git diff. It's likely this is going to feel more like regression testing than unit testing.
Create a new module whose only purpose is to make your program's actions observable. This can be as simple as writing relevant text to a log file or as complex as you like. If necessary, you can use conditional compilation or linking to ensure this module does something only when the system is under test. This is closer to traditional unit testing as you can now write tests that say upon receiving form A, the system should take sequence of actions B. Obviously you have to decide what actions should be observed to form a reasonable test.
I suspect you'll find yourself migrating toward something that looks more like regression testing than unit testing per se. That's not necessarily bad. Don't overlook code coverage!
(A final parenthetical remark: in the bad old days of interactive console applications, Don Libes created a tool called Expect, which was enormously helpful in allowing you to script a program that interacted like a user. In my opinion we desperately need something similar for interacting with web pages. I think I'll post a question about this :-)
You don't necessarily have to implement automated tests that test individual methods or components. You could implement an automated unit test that simulates a user interacting with your application, and test that your application responds in the correct way.
I assume you are manually testing your application currently, if so then think about how you could automate that and work from there. Over time you should be able to break your tests into progressively smaller chunks that test smaller sections of code. Any sort of automated testing is usually a lot better than nothing.
Most programs (regardless of the language paradigm) can be broken into atomic units which take input and provide output. As the other responders have mentioned, look into refactoring the program and breaking it down into smaller pieces. When testing, focus less on the end-to-end functionality and more on the individual steps in which data is processed.
Also, a unit doesn't necessarily need to be an individual function (though this is often the case). A unit is a segment of functionality which can be tested using inputs and measuring outputs. I've seen this when using JUnit to test Java APIs. Individual methods might not necessarily provide the granularity I need for testing, though a series of method calls will. Therefore, the functionality I regard as a "unit" is a little greater than a single method.
You should at least refactor out the stuff that looks like it might be a problem and unit test that. But as a rule, a function shouldn't be that long. You might find something that is unit test worthy once you start refactoring
Good object mentor article on TDD
As a few have answered before, there are a few ways you can test what you have outlined.
First the form input, can be tested in a few ways.
What happens if invalid data is inputted, valid data, etc.
Then each of the function can be tested to see if the functions when supplied with various forms of correct and incorrect data react in the proper manner.
Next you can mock the application that are being called so that you can make sure that your application send and process data to the external programs correctly. Don't for get to make sure your program deals with unexpected data from the external program as well.
Usually, the way I figure out how I want to write tests for a program I have been assigned to maintain, is to see what I am do manually to test the program. Then try and figure how to automate as much of it as possible. Also, don't restrict your testing tools just to the programming language you are writing the code in.
I think a wave of testing paranoia is spreading :) Its good to examine things to see if tests would make sense, sometimes the answer is going to be no.
The only thing that I would test is making sure that bogus form input is handled correctly.. I really don't see where else an automated test would help. I think you'd want the test to be non invasive (i.e. no record is actually saved during testing), so that might rule out the other few possibilities.
If you can't test something how do you know that it works? A key to software design is that the code should be testable. That may make the actual writing of the software more difficult, but it pays off in easier maintenance later.
I have a method that is called on an object to perform some business logic and add it to the database.
The object is a Transaction, and part of the business logic requires searching the databses for related accounts and history items on the account.
There are then a series of comparisons and operations that need to bring back information from the account and apply it to the transaction before the transaction is then passed on to other people and written to the database.
The only way I can think of for testing this currently is within the test to create an account and the relevant history information, then to construct a transaction for each different scenario and capture the information written to the DB for the transaction and information being passed on, however this feels like its testing way too much in one test. Each scenario would be performed in a separate unit test, with the test construction refactored out into separate methods, but the actual piece of code targetted by the test is over 500 lines long.
I guess this question is more about refactoring than unit testing, but in this case they go hand in hand.
If anyone has any advice (good or bad) then I'd be glad to hear it.
EDIT:
Pseudo code:
Find account for transaction
Do validation on transaction codes and values
Update transaction with info from account
Get related history from account Handle different transaction codes and values (6 different combinations, each with different logic)
Update the transaction again with new account info (resulting from business logic)
Send transaction to clients
I would appreciate it if you had some pseudocode on this question, but just following it over I would:
Create interfaces for the data access objects that directly access the database - this way you can pass in an object that only pretends (e.g., mocks) that it accesses the database. This object would then return results consistent with the results your database would return, without actually doing any DB call. Your object could also simulate scenarios such as rolling back data to its original state.
Extract each "scenario" into a single method each - that is the essence of a unit. If your method is 500 lines long then there must be contiguous blocks in there that can be extracted. Write a unit test for each, if appropriate.
If your unit test is testing too much, that probably means your method is doing too much - You can extract methods by identifying the different things you are testing and then putting them in their own methods. Rinse and repeat until you only need one test for each method.
Transactions "passed on to other people" sounds like a code smell - a transaction in and by itself should only be one contiguous unit. If you need different users to finish your transaction, you're doing too much; keep track of your data's state on the DB instead, in terms of flags or such, not in terms of a DB transaction.
Separating out units from existing legacy code can be extremely tricky and time consuming. Check out Working Effectively With Legacy Code for a variety of tried and tested techniques to make things more manageable.
This depends on what you would like to test. Would you like to test the database transaction? Would you like to test the business transaction or something else? Try to use mockups for things you would not like to test. With mockups you can concentrate on certain test objectives.
Yes you -can- rewrite you current code so that it can be unit tested according
to all guidelines and best-practices.
However, that can be expensive, and You should estimate the cost and compare that
against the earnings...
The earnings is that you might discover a problem with the code and also, if done right, the reduction of the complexity as the result of the refactoring.
Both factors might save some time - in the future.
The cost is the time and effort you have to spend both refactor your code,
writing the test cases and also the extra time you might have to spend in the future
to maintain the test-cases and the mocking code - and that can be significant costs.
You are comparing a known cost against a future risk and I am sure a lot of smart guys knows how to do that, but it's obvious that You can actually spend an infinite time refactoring and mocking without ever reducing the risk of failure to zero (or even at all if the code and problem is complex and you are messing things up when refactoring), so you need to find a balance here.
In this case, as the code is old, it might be ok to be "sloppy" or "pragmatic" and do black box testing - or top-down testing and just test the interface (or abstraction) without bothering to mock the database. And yes, you can argue that this is not a unit test but instead a system test or a function test practice.
...But, it might give the best value for your money - or your employers/customers money - or more time with your significant other (or at least more time to watch discovery channel.)
If you have old code, allow black box testing, allow dependencies between the tests and compile a sequence of test that sets up the test data and manipulates it, and its at least tested automatically while not tested 100%.