Testing Spark: how to create a clean environment for each test - unit-testing

When testing my Apache Spark application, I want to run some integration tests. For that reason I create a local Spark application (with Hive support enabled) in which the tests are executed.
How can I ensure that the Derby metastore is cleared after each test, so that the next test has a clean environment again?
What I don't want to do is restart the Spark application after each test.
Are there any best practices to achieve what I want?

I think that introducing application-level logic just for integration testing somewhat breaks the concept of integration testing.
From my point of view, the correct approach is to restart the application for each test.
That said, I believe another option is to start/stop the SparkContext for each test. That should clean up any relevant state.
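A minimal sketch of the start/stop-per-test option, assuming JUnit 4 with spark-sql and spark-hive on the test classpath. Note that the embedded Derby metastore (the metastore_db directory) and the spark-warehouse directory may still persist on disk between runs, so you may also need to delete them in the teardown:

```java
// Sketch only: each test builds its own local SparkSession with Hive support
// and stops it afterwards so the next test starts against a fresh session.
import org.apache.spark.sql.SparkSession;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class SparkHiveIntegrationTest {

    private SparkSession spark;

    @Before
    public void setUp() {
        spark = SparkSession.builder()
                .master("local[2]")
                .appName("integration-test")
                .enableHiveSupport()
                .getOrCreate();
    }

    @After
    public void tearDown() {
        if (spark != null) {
            spark.stop();
        }
        // The embedded Derby metastore (metastore_db/) and spark-warehouse/
        // live in the working directory; delete them here if tests must not
        // see each other's tables.
    }

    @Test
    public void canCreateAndDropHiveTable() {
        spark.sql("CREATE TABLE IF NOT EXISTS t (id INT)");
        spark.sql("DROP TABLE t");
    }
}
```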
UPDATE - answer to comments
Maybe it's possible to do a cleanup by deleting tables/files?
I would ask a more general question: what do you want to test with your test?
Software development defines unit testing and integration testing, and nothing in between. If you want to do something that is neither a unit test nor an integration test, then you're doing something wrong. Specifically, with your test you are trying to test something that is already tested.
For the difference between, and the general idea of, unit and integration tests you can read here.
I suggest you rethink your testing and, depending on what you want to test, write either an integration test or a unit test. For example:
To test application logic - unit test
To test that your application works in its environment - integration test. But here you shouldn't test WHAT is stored in Hive, only that the storage actually happened, because WHAT is stored should already be covered by a unit test.
So, the conclusion:
I believe you need integration tests to achieve your goals, and the best way to do that is to restart your application for each integration test, because:
In real life your application will be started and stopped.
In addition to your Spark stuff, you need to make sure that all the objects in your code are correctly deleted/reused. Singletons, persistent objects, configurations... all of it may interfere with your tests.
Finally, for any code that exists only to perform integration-test cleanup, where is the guarantee that it will not break production logic at some point?

Related

Do SpringBoot's tests with MockMVC fit more in surefire (mvn test) or failsafe (mvn verify)?

I am assessing the perks and cons of using each approach.
To begin with, I am not sure whether a MockMvc test can be considered a true integration test, since it mocks internal dependencies.
Even if I used an actual instance with real requests for my tests, I'm still mocking my external dependencies, and I am not quite sure the aim of a true integration/verify test is to test the environment as if it were real.
Besides, putting these controller tests in verify makes my pipeline longer and slower, since a failure only interrupts the build after an unnecessary package phase and the like.
What do you think is a proper scheme for making the best use of these tools in a build process?
One of the ideas I have is to use two profiles:
- Profile test would execute all IT tests with mocked external dependencies during the test phase
- Profile integration would execute all IT tests with the real production config during verify
But the tests themselves would be the same.
From my personal experience: we've been in the same dilemma and ended up using both types of test:
- unit tests managed by the surefire plugin
- integration tests managed by the failsafe plugin.
Both were run during the build (but at different phases, of course).
Now, regarding the controller tests:
I believe unit tests should be blazing fast: tens or hundreds of them should run within a second or so. They also should not have external dependencies and should run all in memory (no sockets, networking, databases, etc.).
These tests should be run by the programmer at any time during development, maybe 5 times a minute, just to make sure a small refactoring doesn't break something, for example.
On the other hand, controller tests run the whole Spring thing, which by definition is not that fast. As for external dependencies, depending on the configuration of MockMvc you can even end up running some kind of internal server to serve the requests, so it's far (IMO) from being a unit test.
That's why we've decided to run those with the failsafe plugin and treat them as integration tests.
Of course, Spring configurations, if used properly, can be cached by Spring between the tests, but that only helps the integration tests run faster; it doesn't turn this kind of test into a unit test.
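To make the split concrete, here is a hedged sketch of what handing a controller test to failsafe can look like: by default failsafe picks up classes whose names end in IT, so the naming alone keeps it out of surefire. It assumes Spring Boot's spring-boot-starter-test; the /status endpoint is a placeholder, not something from the question.

```java
// Sketch only: the *IT suffix makes the failsafe plugin (not surefire) pick this class up.
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.AutoConfigureMockMvc;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.web.servlet.MockMvc;

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

@SpringBootTest           // boots the full application context, hence "not that fast"
@AutoConfigureMockMvc     // serves requests in-process instead of over a real socket
class StatusControllerIT {

    @Autowired
    private MockMvc mockMvc;

    @Test
    void statusEndpointReturnsOk() throws Exception {
        // Placeholder endpoint: swap in a real mapping from your own controllers.
        mockMvc.perform(get("/status"))
               .andExpect(status().isOk());
    }
}
```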

Jenkins and Sonarqube - where to run unit tests

I'm just starting to mess about with continuous integration. Therefore I wanted to set up Jenkins as well as SonarQube. While reading manuals/docs and tutorials I got a little bit confused.
For both systems, there are descriptions of how to set up unit test runners. So where should unit tests ideally be run? In Jenkins, in SonarQube, or in both systems? Where does it belong in theory/best practice?
We have configured Jenkins to launch the unit tests, and the results are “forwarded” to Sonar to be interpreted as a post-build action.
The best practice would be to run the unit tests in Jenkins. This ensures the unit test cases are executed before we build/deploy.
SonarQube is normally used to ensure the quality of the code: it points out bad code based on the configured guidelines/rules. It also reports on unit test coverage, lines of code, etc.
Usually it's done in Jenkins as you want to actually test your code before building the module.

Java EE test strategy

Java EE is a new world for me; my experience is in embedded systems. But I started a new job and I would like to know whether there is a test process to follow for web applications based on Java EE. Which test strategy is usually adopted in this field?
Basic Unit test
Functional test
Integration test
System test, stress test, load test,....
....
and what is the scope of each test phase for web development? As server code and client code are both involved, I don't know which is the best approach in this field. Also, several machines are involved: DB, business tier, presentation tier, load balancers, authentication with CAS, Active Directory,...
What is the best test environment for each phase? When to use the production CAS authentication, ...
Links, books, simple explanations or other kinds of pointers are much appreciated.
The best test framework for unit tests is JUnit, in my opinion.
http://www.junit.org/
- For mocking objects, which you will need a lot (for example to mock the database, services and other objects in a J2EE environment so you can test in isolation), use http://www.jmock.org/, http://code.google.com/p/mockito/ or http://www.easymock.org/
- For acceptance and functional testing there is Selenium (http://seleniumhq.org/); this framework enables you to automate your tests.
I advise you to read these books about testing in general and testing in a J2EE environment in particular.
http://www.manning.com/rainsberger/
http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530
http://manning.com/massol/
http://manning.com/koskela/
First, whatever you plan to do for testing, take care of your build process (a good starting point is Maven as the build tool).
JUnit (or TestNG) is good for almost everything (due to its simplicity).
Unit test:
For mocks, I would prefer Mockito to JMock or EasyMock.
Acceptance test:
Regarding UI testing, Selenium is fine for web applications (have a look at the PageObject pattern if you plan to do a lot of UI testing).
For other interface testing (such as web services), soapUI is a nice starting point.
Integration testing:
You will face the middleware problem, mainly solved in Java by a container. Now it becomes fun :) If you run in "real" JEE, then it depends on whether it's prior to JEE 6 or not, as from JEE 6 you have an embedded container (which really eases testing). Otherwise, go for a dependency injection framework (Spring, Guice, ...).
Other hints for integration or acceptance testing:
you may need to mock some interfaces (have a look at MOCO to mock external services based on HTTP).
also think about an embedded servlet container (Jetty) to ease the web testing.
configuration and provisioning can be a problem too, e.g. for the DB you can automate this with Flyway or Liquibase.
for DB testing you have two approaches: resetting data after each test (see DBUnit) or in-transaction testing (see Spring Test for an example; a sketch follows below).
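As a hedged illustration of the in-transaction option: with Spring's test support, annotating the test with @Transactional makes Spring roll the transaction back after every test, so each test starts from the data created by the schema script. The sketch assumes spring-test, spring-jdbc, HSQLDB and JUnit 4 on the classpath; the schema.sql script and the users table are hypothetical.

```java
// Sketch only: schema.sql is a hypothetical classpath script that creates the users table.
import javax.sql.DataSource;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.annotation.Transactional;

import static org.junit.Assert.assertEquals;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = InTransactionDaoTest.Config.class)
@Transactional // each test runs in a transaction that Spring rolls back afterwards
public class InTransactionDaoTest {

    @Configuration
    static class Config {
        @Bean
        DataSource dataSource() {
            // Embedded HSQLDB keeps the sketch self-contained; a real test could
            // point at the same database type used in production instead.
            return new EmbeddedDatabaseBuilder()
                    .setType(EmbeddedDatabaseType.HSQL)
                    .addScript("classpath:schema.sql") // hypothetical DDL script
                    .build();
        }

        @Bean
        PlatformTransactionManager transactionManager(DataSource dataSource) {
            return new DataSourceTransactionManager(dataSource);
        }

        @Bean
        JdbcTemplate jdbcTemplate(DataSource dataSource) {
            return new JdbcTemplate(dataSource);
        }
    }

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Test
    public void insertIsRolledBackAfterTheTest() {
        jdbcTemplate.update("INSERT INTO users (id, name) VALUES (?, ?)", 1, "alice");
        int count = jdbcTemplate.queryForObject("SELECT COUNT(*) FROM users", Integer.class);
        assertEquals(1, count);
        // no explicit cleanup: the rollback restores the state created by schema.sql
    }
}
```

The same idea should also work with Hibernate/JPA; only the DataSource and transaction manager beans change.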

DAO Unit testing

I have been looking at EasyMock and tutorials/examples around using it for unit testing DAO classes, for an "outside container" test. However, I think most of them talk about testing the service layer instead, mocking the DAO class. I am a bit confused: is that really how you unit test the DAO layer?
Some would say that tests interacting with the DB & EJBs are actually integration tests and not unit tests, but then how would you know whether your SQL is correct (assuming no ORM) and whether your DAO inserts/queries the right data against your real database (read: a local database similar to the one in production)?
I read that DBUnit is a solution for such a situation. But my question is about using a framework like DBUnit "outside container". What if the DAO depends on some EJBs, how do we handle the transactions, what happens if there are triggers that update other tables on your inserts?
What is the best way to Unit Test only the DAOs with such dependencies?
Personally, I unit test DAOs by hitting some sort of test database, preferably the same type of database (not the SAME database, obviously) that your app uses in production.
I think if you do that, the test is more of an integration test, because it has a dependency on a running database. This approach has the benefit in that it is as close as possible to your running production environment. It has the downsides that you need test configuration, you need a running test database (either local to your machine or somewhere in your environment) and the tests can take longer to run. You also need to be sure to rollback the test data after tests execute.
Once DAOs are tested, definitely mock them to unit test your services.
Typically with DAOs the idea is to have a minimal wrapper around data-access code, so there's nothing there to test except for the mapping to the database, and unit tests with mocks are useless. If there is actually logic in the DAO worth testing with mocks, then an argument could be made that you're misusing the DAO pattern and that the logic should be in a service.
For testing the mapping to the database DBUnit is useful because it allows you to specify a starting dataset before the test so your test starts from a known state, and it allows you to specify what the ending state of the data should be, so you don't have to write a lot of unit test code asserting what is there is what is expected.
Ideally if you have a tool like Hibernate that abstracts the database away you can get by with using an in-memory database like H2 or HSQLDB, so your tests run faster and there's no database to create. If you do have to use a real database make sure your tests have it to themselves so they can create and delete data without impacting or being impacted by other processes. In practice having a database to yourself, both locally and in CI environments, is unlikely and using the in-memory database is much more practical.
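A hedged sketch of the in-memory variant, using plain JDBC against H2 (assuming only the com.h2database:h2 test dependency; the users table is illustrative, and in a real project the SQL would sit behind the DAO or be generated by Hibernate). An unnamed in-memory H2 database is private to its connection and disappears when the connection closes, so every test starts from an empty schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class UserDaoH2Test {

    private Connection connection;

    @Before
    public void setUp() throws Exception {
        // An unnamed in-memory H2 database: private to this connection and
        // dropped automatically when the connection is closed.
        connection = DriverManager.getConnection("jdbc:h2:mem:", "sa", "");
        try (Statement ddl = connection.createStatement()) {
            ddl.execute("CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100))");
        }
    }

    @After
    public void tearDown() throws Exception {
        connection.close(); // discards the in-memory database
    }

    @Test
    public void insertAndCount() throws Exception {
        try (PreparedStatement insert = connection.prepareStatement(
                "INSERT INTO users (id, name) VALUES (?, ?)")) {
            insert.setInt(1, 1);
            insert.setString(2, "alice");
            insert.executeUpdate();
        }
        try (Statement query = connection.createStatement();
             ResultSet rs = query.executeQuery("SELECT COUNT(*) FROM users")) {
            rs.next();
            assertEquals(1, rs.getInt(1));
        }
    }
}
```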
Complementing Koya's answer: you can use HSQLDB for DAO testing. I imagine you use Spring and Hibernate in your project. You would need separate configuration files pointing to HSQLDB, and you would need to insert data prior to executing the tests. There are some limitations to what you can do with HSQLDB, but it is OK for general use such as queries and joins. This solution can also be used in a continuous integration environment such as Jenkins.
Integration tests could use HSQLDB too, so this part is not mocked.
I am using HSQLDB for DAO and service API testing. The performance is good and it supports transactions too. I am not using EJB; I use Hibernate.
One issue I am aware of is that running the tests on a different database may mask some issues specific to the database you actually use. But I think such issues should be caught in the smoke & acceptance tests.
regards,
Koya
[Edited Mar 2022] I ultimately settled on writing the unit/integration tests so that they can run outside the container, with a live database, using a standalone transaction manager from Bitronix for transactional support, an instance of which I set up & tear down with every run. This lets me roll back the transactions as well.

How do you unit test web apps hosted remotely?

I'm familiar with TDD and use it in both my workplace and my home-brewed web applications. However, every time I have used TDD in a web application, I have had the luxury of having full access to the web server. That means that I can update the server then run my unit tests directly from the server. My question is, if you are using a third party web host, how do you run your unit tests on them?
You could argue that if your app is designed well and your build process is sound and automated, that running unit tests on your production server isn't necessary, but personally I like the peace of mind in knowing that everything is still "green" after a major update.
For everyone who has responded with "just test before you deploy" and "don't you have a staging server?", I understand where you're coming from. I do have a staging server and a CI process set up. My unit tests do run and I make sure they all pass before an update to production.
I realize that in a perfect world I wouldn't be concerned with this. But I've seen it happen before. If a file is left out of the update or a SQL script isn't run, the effects are immediately apparent when running your unit tests but can go unnoticed for quite some time without them.
What I'm asking here is if there is any way, if only to satisfy my own compulsive desires, to run a unit test on a server that I cannot install applications on or remote into (e.g. one which I will only have FTP access to in order to update files)?
I think I probably would have to argue that running unit tests on your production server isn't really part of TDD, because by the time you deploy to your production environment you are, technically speaking, past "development".
I'm quite a stickler for TDD, and when I'm preaching the benefits to clients I often find myself saying "you can't half adopt TDD, it's all or nothing"
What you probably should have is some form of automated testing that you perform "after" deployment but these are not part of TDD.
Maybe you should look at your process again.
You could write functional tests in something like WATIR, WATIN or Selenium that test what is returned in the response page after posting certain form data or requesting specific URLs.
For clarification: what sort of access do you have to your web server? FTP or WebDAV only? From your question, I'm guessing ssh access isn't available - you're dropping files in a directory to deploy. Is that correct?
If so, the answer for unit testing is likely 'do it before you deploy'. You can set up functional testing driven by an automated tool like Selenium to test your app remotely via the web interface, but that's not really unit testing in the sense that you're restricted to testing the system as a whole.
Have you considered setting up a staging server, perhaps as a VMWare instance, that mirrors or at least mimics your deployment environment?
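A hedged sketch of what such a remote check might look like: a plain HTTP smoke test (the URL and assertions are placeholders; assumes Java 11+ for java.net.http and JUnit 4). It doesn't replace the pre-deployment unit tests, it just confirms the deployed site is responding after an update:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.junit.Test;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class DeployedSiteSmokeTest {

    // Placeholder: point this at a page that only renders correctly after a full deployment.
    private static final String HOME_PAGE = "https://example.com/";

    @Test
    public void homePageRespondsAfterDeployment() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(HOME_PAGE)).GET().build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // Deliberately coarse checks: a status code plus a marker string that is
        // only present when the latest update is actually live.
        assertEquals(200, response.statusCode());
        assertTrue(response.body().contains("<title>"));
    }
}
```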
What's preventing you from running unit tests on the server? If you can upload your production code and let it run there, why can't you upload this other code and run it as well?
I've written test tools for sites using Python and httplib/urllib2; generally it would have been overkill, but it was suitable in these cases. Not sure it's going to be of general use though.