What is the correct way to write unit tests for individual Metaflow steps? How do you test full DAGs using fixtures in place of real datasets? And how can I ensure that these tests' artifacts don't pollute the artifact store?
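One general pattern (sketched below; not an official Metaflow testing API) is to keep @step bodies thin and move the real logic into plain functions that ordinary pytest tests can exercise with fixture data in place of real datasets; since no flow run is started, nothing is written to the artifact store. All names in the sketch are invented.

```python
# A minimal sketch: step bodies stay thin and delegate to plain functions,
# so the logic can be unit tested with pytest fixtures and no Metaflow run
# (and hence no artifact-store write) is needed for the tests.
from metaflow import FlowSpec, step


def transform_rows(rows):
    """Pure logic extracted from the step; trivial to test in isolation."""
    return [r * 2 for r in rows]


class ExampleFlow(FlowSpec):  # illustrative flow, not from the question

    @step
    def start(self):
        self.rows = [1, 2, 3]  # in a real run this might load a dataset
        self.next(self.transform)

    @step
    def transform(self):
        self.result = transform_rows(self.rows)
        self.next(self.end)

    @step
    def end(self):
        pass


# test_example_flow.py: a plain pytest test; no run is started, so nothing
# is persisted anywhere.
def test_transform_rows():
    assert transform_rows([1, 2, 3]) == [2, 4, 6]
```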
Related
When testing my Apache Spark application, I want to run some integration tests. For that reason I create a local Spark application (with Hive support enabled), in which the tests are executed.
How can I make sure that the Derby metastore is cleared after each test, so that the next test has a clean environment again?
What I don't want to do is restart the Spark application after each test.
Are there any best practices to achieve what I want?
I think that introducing application-level logic for integration testing somewhat breaks the concept of integration testing.
From my point of view, the correct approach is to restart the application for each test.
That said, I believe another option is to start and stop the SparkContext for each test. That should clean up any relevant state.
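Below is a rough pytest/PySpark sketch of the start/stop-per-test option. The configuration keys and temporary paths are assumptions; whether the embedded Derby metastore fully honours per-session settings can depend on the Spark version, and if it does not, running each test in its own process is the closest equivalent of restarting the application.

```python
# Sketch only: a fresh SparkSession per test, with the Hive warehouse and the
# embedded Derby metastore pointed at a per-test temporary directory so state
# does not leak between tests. Config keys and paths are assumptions and may
# need adjusting for your Spark version.
import pytest
from pyspark.sql import SparkSession


@pytest.fixture
def spark(tmp_path):
    session = (
        SparkSession.builder
        .master("local[2]")
        .config("spark.sql.warehouse.dir", str(tmp_path / "warehouse"))
        .config("javax.jdo.option.ConnectionURL",
                f"jdbc:derby:;databaseName={tmp_path / 'metastore_db'};create=true")
        .enableHiveSupport()
        .getOrCreate()
    )
    yield session
    session.stop()  # stop the context so the next test starts clean


def test_can_write_and_read_a_table(spark):
    spark.range(5).write.saveAsTable("numbers")
    assert spark.table("numbers").count() == 5
```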
UPDATE - answer to comments
Maybe it's possible to do a cleanup by deleting tables/files?
I would ask a more general question: what do you want to test with your test?
Software development defines unit testing and integration testing, and nothing in between. If you want to do something that is neither an integration test nor a unit test, then you're doing something wrong. Specifically, with your test you are trying to test something that is already tested.
For the difference and general idea of unit and integration tests you can read here.
I suggest you rethink your testing and, depending on what you want to verify, write either an integration test or a unit test. For example:
To test application logic - unit test
To test that your application works in its environment - integration test. But here you shouldn't test WHAT is stored in Hive, only THAT the storage happened, because WHAT is stored should be covered by a unit test.
So. The conclusion:
I believe you need integration tests to achieve your goals, and the best way to do that is to restart your application for each integration test. Because:
In real life your application will be started and stopped
In addition to your Spark stuff, you need to make sure that all the objects in your code are correctly deleted or reused. Singletons, persistent objects, configurations - they can all interfere with your tests.
Finally, the code that will perform the integration tests - where is the guarantee that it will not break production logic at some point?
I'd like to use the 'ava' tool for both unit and integration testing, but I can't figure out the best way to separate those tests. Unit tests should run before the code is deployed to the test environment, and integration tests need to run after the code has been deployed to the test server.
My challenge is that 'ava' reads its configuration from the 'ava' section of package.json. I'm not sure how to tell it to use different sets of test sources depending on which stage of deployment it is in.
You can also use an ava.config.js file. For now, you could use environment variables to switch the config. Keep an eye on https://github.com/avajs/ava/issues/1857, though: it will add a CLI flag so you can select a different config file.
The testing framework I'm writing includes templates for users to write their own tests. These tests have two modes: one for setting up their related files, and one for verifying those files. When users write their test, they must run it in setup mode once to generate those files, but I want to make sure they don't check in tests that are still in setup mode.
I can assert a test failure in setup mode, but how can I trigger the unit tests at checkin and prevent the checkin if any of the tests fail?
Is there a better way to prevent users from checking in files in a specific configuration?
You can write a trigger to run the unit tests at checkin and block the checkin if the tests fail. I would run the tests against test scripts that are already in ClearCase; otherwise someone could create their own local version of the test that is easy to pass (return true).
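For illustration only, the script such a trigger runs could look roughly like the sketch below; a preop checkin trigger blocks the operation when its script exits with a nonzero status. The test command and path are assumptions.

```python
#!/usr/bin/env python
# Illustrative trigger script: run the test suite and exit nonzero on failure,
# which causes a preop checkin trigger to block the checkin.
# The test command ("python -m pytest tests/") is an assumption.
import subprocess
import sys


def main():
    result = subprocess.run(["python", "-m", "pytest", "tests/"])
    if result.returncode != 0:
        sys.stderr.write("Unit tests failed; blocking checkin.\n")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```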
Another method would be to allow the checkin and only fire the trigger upon delivery. That way the user can checkpoint their code, but when they deliver it for others to use, the unit tests would need to pass before the delivery is allowed to complete.
I have successfully begun to write SSDT unit tests for my recent stored procedure changes. One thing I've noticed is that I wind up creating similar test data for many of the tests. This suggests to me that I should create a set of test data during deployment, as a post-deploy step. This data would then be available to all subsequent tests, and there would be less need for lengthy creation of pre-test scripts. Data which is unique to a given unit test would remain in pre-test scripts.
The problem is that the post-deploy script would run not only during deployment for unit tests, but also during deployment to a real environment. Is there a way to make the post-deploy step (or parts of it) run only during the deployment for an SSDT unit test?
I have seen that the test settings in the app.config include the database project configuration to deploy. But I don't see how to cause different configurations to use different SQLCMD variables.
I also see that we can set different SQLCMD variables in the publish profiles, but I don't see how the unit test settings in app.config can reference different publish profiles.
You could use an IF statement checking @@SERVERNAME and only run your unit-test code on the unit test server(s), with the same kind of check for the other environments.
Alternatively, you could make use of the build number in your TFS build definition. If the build number contains, for example, the substring 'test', you execute the test code; otherwise you don't. Then you make sure to set an appropriate build number in all your builds.
I have been watching various videos and reading various blogs where they go about unit testing a repository.
The most common pattern is to create a Fake repository that implements the same interface as the real one. Then the fake one uses an internal Dictionary or something.
So in effect you are unit testing the logic of the fake repository, which will never go into production.
Now, you may use dependency injection to inject a mock DbContext by using some IDBContext interface. However, then you are just testing each repository method, which in effect just forwards to the DbContext (which is mocked).
So unless each repository method has lots of logic before calling the DbContext, it seems a bit pointless?
Wouldn't it be better to have the repository tests be integration tests and actually have them hit the database?
The new EF 4.1 makes this easy, as it can create the database on the fly based on a connection string in your test project; you can then delete it after the tests run using the DbContext.Database methods.
Your objections are partially correct; how correct they are depends on how the repository is defined.
First, faking or mocking the repository is not for testing the repository itself but for testing the layers that use the repository.
If the repository exposes IQueryable and the upper layer can build LINQ-to-Entities queries, then mocking the repository means testing non-existent logic. You need an integration test that runs the query against a real test database. You can either redeploy the database for each test, which will make it very slow, or run each test in a transaction and roll it back when the test completes.
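The thread is about EF, but the transaction-and-rollback idea itself is general; below is a small pytest/sqlite3 sketch of it, with invented table and test names, purely to illustrate the pattern.

```python
# Sketch of "run each test in a transaction and roll it back afterwards",
# shown with sqlite3/pytest instead of EF purely to illustrate the pattern.
import sqlite3
import pytest


@pytest.fixture(scope="session")
def db():
    # Shared test database, deployed once per test session.
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    yield conn
    conn.close()


@pytest.fixture
def tx(db):
    # Every test runs inside an explicit transaction that is rolled back,
    # so the shared database stays clean without being redeployed.
    db.execute("BEGIN")
    yield db
    db.rollback()


def test_insert_is_rolled_back(tx):
    tx.execute("INSERT INTO users (name) VALUES ('alice')")
    assert tx.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1
```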
If the repository doesn't expose IQueryable, you can still think of it as a black box and mock it. The query logic will be inside the repository and will be tested separately with integration tests.
I would refer you to a set of other answers about the repository itself and testing.
The best approach I have seen is from Sharp Architecture, where they use a SQLite database, created in the TestFixtureSetup based on the NHibernate mapping info.
The repository tests then use this in-memory database.
Technically this is still an integration test, as a database is involved, but practically it ticks all the boxes for a unit test since:
1) The database is transient - no connection string configs to worry about, nor do you need a complete db sitting on a server somewhere for the unit test to use.
2) The setup is fast, and the tests equally so, since everything runs in memory.
3) As it uses the NHibernate mapping info to generate the schema, you don't have to worry about keeping the unit test setup synchronised with code changes.
http://wiki.sharparchitecture.net/default.aspx?AspxAutoDetectCookieSupport=1
It may be possible to use the same approach with EF.