Restoring test database for unit testing in Neo4j - unit-testing

I would like to unit test CRUD operations against a pre-populated Neo4j database.
I am thinking that a way to do this might be to:
- Create an empty database (let's call it testDB)
- Create a database backup (let's call it testingBackup)
- On running tests:
  - Delete any data from testDB
  - Populate testDB from testingBackup
  - Run unit test queries on the now-populated testDB
I am aware of the backup/restore functions, the load/dump functions, the export to CSV / load from CSV, etc. However, I'm not sure which of these is most appropriate and can be automated most easily. I'm on Ubuntu and using Python.
I would need to be able to quickly and easily alter the backup data as the application evolves.
What is the best approach for this please?

I have done something similar, with some caveats. I have done tests like these using Java and Testcontainers, though not with Neo4j: I have used Postgres, SQL Server and MongoDB for my tests. The same technique should carry over to Neo4j. I will post the link to my GitHub examples for MongoDB/Spring Boot/Java below; take a look.
The idea is to spin up a testcontainer from the test (i.e. a Docker container managed by the test), populate it with data, point the application at it for its database, and then assert at the end.
In your case there is no testingBackup, only a CSV file with data.
- Your test spins up a testcontainer with Neo4j (this is your testDB).
- Load the CSV into this container.
- Get the IP, port, user and password of the testcontainer (this part depends on the database image available for Testcontainers; some images let you set your own port, user ID and password, some won't).
- Pass these details to your application and start it (I am not sure how this part works for a Python app; here you are on your own. See the link below to a blog I found with a Python/Testcontainers example. I used a Spring Boot app; you can see my code on GitHub.)
- Once done, execute queries against your containerized Neo4j and assert (a minimal Python sketch follows this list).
- When the test ends, the container is disposed of along with its data.
- Any change to the CSV file can create new scenarios for your test.
- Create another CSV file/test as needed.
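Here is the minimal Python sketch referenced above, assuming the testcontainers-python Neo4j module and the official neo4j driver; the image tag, default credentials, the get_connection_url() helper and the CSV-loading query are assumptions to check against the docs, and LOAD CSV needs the file to be visible inside the container (e.g. via a volume mount):

```python
from neo4j import GraphDatabase
from testcontainers.neo4j import Neo4jContainer  # assumed module path

def test_crud_against_throwaway_neo4j():
    # spin up a disposable Neo4j container for this test only
    with Neo4jContainer("neo4j:5") as container:
        uri = container.get_connection_url()  # e.g. bolt://localhost:<mapped port>
        driver = GraphDatabase.driver(uri, auth=("neo4j", "password"))  # default creds are an assumption
        try:
            with driver.session() as session:
                # populate the empty database from a CSV fixture
                # (the file must be reachable from inside the container)
                session.run(
                    "LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row "
                    "CREATE (:Person {name: row.name})"
                )
                # run the query under test and assert on the result
                count = session.run("MATCH (p:Person) RETURN count(p) AS n").single()["n"]
                assert count > 0
        finally:
            driver.close()
    # the container and all its data are thrown away when the with-block exits
```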
Here are the links:
https://www.testcontainers.org/
Testcontainers Neo4j module: https://www.testcontainers.org/modules/databases/neo4j/
A blog detailing Testcontainers and Python:
https://medium.com/swlh/testcontainers-in-python-testing-docker-dependent-python-apps-bd34935f55b5
My GitHub link to MongoDB/Spring Boot and SQL Server/Spring Boot examples (one of these days I will add a Neo4j sample as well):
https://github.com/snarasim123/testcontainers

Related

How can I run integration tests without modifying the database?

I am making some integration tests for an app, testing routes that modify the database. So far, I have added code to my tests to delete all the changes they make to the DB, because I don't want to change it, but that adds a lot of work and doesn't sound right. I then thought about copying the database, testing against the copy, and deleting it in my testing script, but that takes too long. Is there a better method for doing this?
I see two possible ways to solve your problem:
An in-memory database, e.g. H2
A database in a Docker container
Both approaches solve your problem: you can just shut down the DB/container and start a new one, and the database will be clean, so you don't have to care about cleanup. However, there are some peculiarities:
An in-memory database is easier to implement and use, but it may have problems with dialects; e.g. some Oracle SQL commands are not available in H2, and ultimately you are running your tests against a different DB.
A Docker container with a DB is harder to plug into your build and tests, but it doesn't have the embedded-DB dialect problems, and the DB in Docker is the same as your real one.
You can start a database transaction at the beginning of a test and then roll it back at the end. See the following post for details:
https://lostechies.com/jimmybogard/2012/10/18/isolating-database-data-in-integration-tests/
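The linked post uses .NET; a minimal sketch of the same pattern translated to Python with pytest and SQLAlchemy (the connection URL and table are hypothetical) could look like this:

```python
import pytest
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost/appdb")  # hypothetical URL

@pytest.fixture
def db_conn():
    conn = engine.connect()
    tx = conn.begin()        # open the transaction before the test runs
    try:
        yield conn           # the test does all its writes on this connection
    finally:
        tx.rollback()        # undo everything the test changed
        conn.close()

def test_insert_is_rolled_back(db_conn):
    db_conn.execute(text("INSERT INTO prices (symbol, value) VALUES ('ABC', 1.0)"))
    n = db_conn.execute(text("SELECT count(*) FROM prices WHERE symbol = 'ABC'")).scalar()
    assert n == 1            # visible inside the transaction; gone once it is rolled back
```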

How do I set up all integration tests to use a database that is created on startup

I want to have all my integration tests use the same database. I want to create that database by "Publishing" my database project in source control.
After the selected set of tests has run, I want to delete that database.
What would be the best approach for this?
I've seen Hazelcast used successfully for this; it's an open-source in-memory data grid solution, so you can pretty easily set it up as a sandbox DB. It will live as long as your tests need it. Its most visible advantages are elasticity and scalability.
I want to create that database by "Publishing" my database project in source control.
Putting it in Git should be straightforward as well: take a database dump and put it under version control; that way it is just a flat text file. It's good to keep both a data dump and a schema-only dump. Using diff, it then becomes fairly easy to see what changed in the schema from revision to revision.
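With PostgreSQL, for instance, the two dumps could be scripted like this (a sketch; the database name and file paths are made up, and other RDBMSs have their own equivalent dump tools):

```python
import subprocess

DB = "myapp"  # hypothetical database name

# schema-only dump: easy to diff between revisions
with open("db/schema.sql", "w") as f:
    subprocess.run(["pg_dump", "--schema-only", DB], stdout=f, check=True)

# data-only dump: the reference data your tests expect
with open("db/data.sql", "w") as f:
    subprocess.run(["pg_dump", "--data-only", DB], stdout=f, check=True)

# commit both flat files so `git diff` shows schema/data changes over time
subprocess.run(["git", "add", "db/schema.sql", "db/data.sql"], check=True)
subprocess.run(["git", "commit", "-m", "Update database dump"], check=True)
```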

Good way to deploy a django app with an asynchronous script running outside of the app

I am building a small financial web app with django. The app requires that the database has a complete history of prices, regardless of whether someone is currently using the app. These prices are freely available online.
The way I am currently handling this is by simultaneously running a separate Python script (outside of Django) that downloads the price data and records it in the Django database using the sqlite3 module.
My plan for deployment is to run the app on an AWS EC2 instance, change the permissions of the folder where the db file resides, and separately run the download script.
Is this a good way to deploy this sort of app? What are the downsides?
Is there a better way to handle the asynchronous downloads and the deployment? (PythonAnywhere?)
You can write the daemon code and follow this approach to push data to the DB as soon as you get it from the internet. Since your daemon would run independently of Django, you'd also need to take care of data-synchronisation issues. One possible solution is to use a DateTimeField in your Django model with auto_now_add=True, which records when each row was entered in the DB. Hope this helps you or someone else looking for a similar answer.
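For example, a price model along these lines (the model and field names are hypothetical) records the insertion time automatically:

```python
from django.db import models

class Price(models.Model):
    symbol = models.CharField(max_length=10)
    value = models.DecimalField(max_digits=12, decimal_places=4)
    # set exactly once, when the daemon inserts the row
    recorded_at = models.DateTimeField(auto_now_add=True)
```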

Migrations Plugin for CakePHP

I have few questions about this plugin.
1- What does it do? Is it for exchanging databases between teams, changing their schema, creating tables based on models, or something else?
2- If it is not meant to create tables based on models, where can I find a script that does this?
3- Can it work under Windows?
Thanks
The Migrations plugin allows versioning of your DB changes, much like what is available in other PHP frameworks and in Rails.
You essentially start with your original schema and create the initial migration. Each time you make a change you generate a 'diff' that gets stored in the filesystem.
When you run a migration, the database is updated with the changes you made. Think deployment to a staging or production server where you want the structure to be the same as your code is moved from environment to environment.
We are starting to look at this plugin so we can automate our deployments, as the DB changes are done manually right now.

How to ensure database changes can be easily moved over DVCS using django

Overview
I'm building a website in Django. I need to allow people to begin adding flatpages and to set some settings in the admin. These changes should be definitive, since that information comes from the client. However, I'm also developing the backend, and as such am creating and migrating tables. I push these changes to the hub.
Tools
django
git
south
postgres
Problem
How can I ensure that I get the database changes from the live site down to my laptop, and also push my database changes up to the live site, so that a minimum of coordination is needed? I am familiar with git hooks, so that option is in play.
Addendum:
I guess I know which tables can be modified via the admin. There should not be much overlap really. As I consider further, the danger really is me pushing data that would overwrite something they have done.
Thanks.
For getting your schema changes up to the server, just use South carefully. If you modify any table they might have data in, make sure you write both a schema migration and, as necessary, a data migration to preserve the sense of their data.
For getting their updated data back down to you (which doesn't seem critical, but it might be nice to work with up-to-date test data as you develop), I generally just use Django fixtures and the dumpdata and loaddata commands. It's easy enough to dump a fixture and commit it to your repo, then run loaddata on your end.
You could try using git hooks to automate some of this, but if you want automation I do recommend trying something like Fabric instead. Much of this stuff doesn't need to be run every single time you push/pull (in particular, I usually wouldn't want to dump a new data fixture that frequently).
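A sketch of that fixture round trip as a Fabric 1.x task (the host, project path and app label are assumptions):

```python
from fabric.api import env, run, get, local

env.hosts = ["deploy@example.com"]  # hypothetical live server

def pull_live_data():
    """Dump the admin-edited app on the server and load it locally."""
    run("cd /srv/mysite && python manage.py dumpdata flatpages --indent 2 > /tmp/flatpages.json")
    get("/tmp/flatpages.json", "fixtures/flatpages.json")   # copy the fixture down
    local("python manage.py loaddata fixtures/flatpages.json")
```

You would then run it on demand with `fab pull_live_data` rather than on every push/pull.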
You should probably take a look at South:
http://south.aeracode.org/
It seems to me that you could create a git hook that triggers South if you are using some sort of continuous integration system.
Otherwise, every time you do a push you will have to manually execute the migration steps yourself. Don't forget to put up the "site is under maintenance" message. ;)
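A post-merge hook sketch that runs the South migrations after every pull (any executable in .git/hooks works, including a Python script; the project path is hypothetical):

```python
#!/usr/bin/env python
# .git/hooks/post-merge -- remember to chmod +x
import subprocess
import sys

# apply any new South migrations as soon as the code arrives
sys.exit(subprocess.call(["python", "manage.py", "migrate"], cwd="/srv/mysite"))
```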
I recommend that you use mk-table-sync to pull changes from the live server down to your laptop.
mk-table-sync takes a lot of parameters, so you can automate the process with Fabric: you would basically create a Fabric task that executes mk-table-sync on each table you want to pull from the server (see the sketch below).
This means that you cannot make database changes to those tables yourself, because they will be overwritten by the pull.
The only changes you would make to the live database are through South: push the code to the server and then run migrate to update the database schema.
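A sketch of such a Fabric task; the table list, hostnames and the exact mk-table-sync DSN syntax are assumptions to verify against the tool's documentation:

```python
from fabric.api import local

TABLES = ["django_flatpage", "django_site"]  # admin-managed tables to pull

def pull_admin_tables():
    """Sync each admin-edited table from the live server down to the local DB."""
    for table in TABLES:
        local(
            "mk-table-sync --execute "
            "h=live.example.com,D=mysite,t={0} "        # source: live database
            "h=localhost,D=mysite,t={0}".format(table)  # destination: local copy
        )
```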