How to use SQLite :memory: database in webpy for unittesting - unit-testing

I want to use a SQLite in-memory (":memory:") DB for the tests in my webapp. I'm using nosetests for the tests and webpy as the framework.
I want to populate the DB in the setup() function, and then run all my tests. My problem is that webpy closes all the open DB connections after each request, and the SQLite :memory: DB only lasts until you close the connection, so only the first test is actually run correctly and all the others fail.
My choices are either to run the tests on a disk-backed DB, or to recreate the entire DB in memory at the beginning of each individual test.
Do you know how I can prevent webpy from closing DB connections after each request?
Can you think of any other way to get an in-memory SQLite DB that lasts for more than one request using webpy?

Maybe you could run the tests on a disk-backed DB, but place it on a RAM disk. On Windows, you can install a driver to set up a RAM disk (some instructions here). On Linux, I believe you want to set up tmpfs.
A RAM disk behaves exactly like a hard disk but operates entirely from memory, so you avoid some of the overhead of reading and writing files on a physical disk.
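For example, on Linux you could point web.py at a SQLite file placed under an existing tmpfs mount; a minimal sketch (the path and table are illustrative):

    # Sketch: keep the test database in a file on a RAM-backed filesystem instead of :memory:.
    # On Linux, /dev/shm is usually a tmpfs mount; the path and table here are illustrative.
    import web

    db = web.database(dbn='sqlite', db='/dev/shm/testsuite.db')
    db.query("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")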

Untested:
    class NoCloseDB(web.db.SqliteDB):
        def _unload_context(self):
            pass  # this keeps the _ctx.db attribute alive

    web.db.register_database('sqlite', NoCloseDB)  # overrides the previous registration
Notice that this can only work if you run web.py in a way that uses only one operating system process. If a request is dispatched across multiple processes, each one will still get its own database.
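If the override behaves as hoped, a nose module-level setup() can then populate the shared :memory: database once and later requests can still see it. A rough, untested sketch (the application module, table, and route are made up):

    # Rough usage sketch with nose, assuming NoCloseDB is registered before the
    # application module creates its web.database(...) object.
    from myapp import app, db   # hypothetical web.py application and its DB object

    def setup():
        # nose runs this once per module; the schema and row are illustrative.
        db.query("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        db.insert('users', name='alice')

    def test_first_request():
        assert 'alice' in app.request('/users').data

    def test_second_request():
        # Without the override, the :memory: DB would already be gone by now.
        assert 'alice' in app.request('/users').data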

Related

How can I run integration tests without modifying the database?

I am making some integration tests for an app, testing routes that modify the database. So far, I have added code to my tests to delete all the changes they make to the DB, because I don't want to alter it, but that adds a lot of work and doesn't sound right. I then thought about copying the database, running the tests, and then deleting the copy in my testing script, but that takes too long. Is there a better way to do this?
I see two possible ways to solve your problem:
An in-memory database, e.g. H2.
A database in a Docker container.
Both approaches solve your problem: you can simply shut down the DB/container and start a fresh one, the database will be clean, and you don't have to care about cleanup. However, there are some trade-offs:
An in-memory database is easier to set up and use, but it can have dialect problems, e.g. some Oracle SQL commands are not available in H2, and ultimately you are running your tests against a different DB than your real one.
A Docker container with a DB is harder to plug into your build and tests, but it avoids the embedded-DB dialect problems, and the DB in Docker is the same as your real one (see the sketch below).
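As a concrete illustration of the container approach in Python, here is a rough sketch using the testcontainers package (package, image tag, and calls are taken from its documentation and may need adapting to your stack):

    # Sketch: spin up a throwaway Postgres container for a test, then let it vanish.
    # Requires Docker plus the `testcontainers` and `sqlalchemy` packages.
    import sqlalchemy
    from testcontainers.postgres import PostgresContainer

    def test_routes_against_a_fresh_db():
        with PostgresContainer("postgres:13") as postgres:
            engine = sqlalchemy.create_engine(postgres.get_connection_url())
            with engine.connect() as conn:
                # Point your application at this URL; the query below is only a placeholder.
                assert conn.execute(sqlalchemy.text("SELECT 1")).scalar() == 1
        # The container and every change made inside it are gone once the block exits.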
You can start a database transaction at the beginning of a test and then roll it back. See the following post for details:
https://lostechies.com/jimmybogard/2012/10/18/isolating-database-data-in-integration-tests/
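The pattern looks roughly like this in plain Python, using the standard sqlite3 module for brevity (the database file and table are made up):

    # Minimal sketch of the rollback-per-test pattern with Python's sqlite3 module.
    import sqlite3
    import unittest

    class RollbackTestCase(unittest.TestCase):
        def setUp(self):
            # sqlite3 opens an implicit transaction before the first write,
            # so nothing is committed unless we commit explicitly.
            self.conn = sqlite3.connect('app.db')

        def tearDown(self):
            self.conn.rollback()   # discard everything the test wrote
            self.conn.close()

        def test_insert_is_discarded_afterwards(self):
            self.conn.execute("INSERT INTO contacts (name) VALUES ('Alice')")
            count, = self.conn.execute(
                "SELECT COUNT(*) FROM contacts WHERE name = 'Alice'").fetchone()
            self.assertEqual(count, 1)   # visible only inside the open transaction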

Django bulk_upsert saves queries to memory causing memory errors

For our Django web server we have quite limited resources, which means we have to be careful with the amount of memory we use. One part of our web server is a cron job (using Celery and RabbitMQ) that parses a ~130 MB CSV file into our Postgres database. The CSV file is saved to disk and then read row by row using Python's csv module. Because the CSV file is basically a feed, we use bulk_upsert from the custom Postgres manager in django-postgres-extra to upsert our data and overwrite existing entries. Recently we started experiencing memory errors, and we eventually found out they were caused by Django.
Running mem_top() showed us that Django was storing massive upsert queries (INSERT ... ON CONFLICT DO), including their metadata, in memory. Each bulk_upsert of 15,000 rows would add 40 MB to the memory used by Python, leading to a total of 1 GB used by the time the job finished, since we upsert 750,000 rows in total. Apparently Django does not release the query from memory after it has finished. Running the cron job without the upsert call leads to a maximum memory usage of 80 MB, of which 60 MB is the default for Celery.
We tried running gc.collect() and django.db.reset_queries(), but the queries are still stored in memory. Our DEBUG setting is False and CONN_MAX_AGE is not set. Currently we're out of clues about where to look to fix this issue, and we can't run our cron jobs right now. Do you know of any last resorts we could try to resolve this issue?
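For reference, the job is roughly shaped like this (the model, field names, and the bulk_upsert call are illustrative; the signature is an assumption based on django-postgres-extra's docs):

    # Rough shape of the cron job described above; every name here is illustrative.
    import csv

    from myapp.models import FeedItem   # hypothetical model using django-postgres-extra's manager

    CHUNK_SIZE = 15000

    def import_feed(path):
        with open(path, newline='') as f:
            reader = csv.DictReader(f)
            chunk = []
            for row in reader:
                chunk.append({'external_id': row['id'], 'title': row['title']})
                if len(chunk) >= CHUNK_SIZE:
                    FeedItem.objects.bulk_upsert(conflict_target=['external_id'], rows=chunk)
                    chunk = []
            if chunk:
                FeedItem.objects.bulk_upsert(conflict_target=['external_id'], rows=chunk)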
Some more meta info regarding our server:
django==2.1.3
django-elasticsearch-dsl==0.5.1
elasticsearch-dsl==6.1.0
psycopg2-binary==2.7.5
gunicorn==19.9.0
celery==4.3.0
django-celery-beat==1.5.0
django-postgres-extra==1.22
Thank you very much in advance!
Today I found the solution to our issue, so I thought it would be great to share. It turned out that the problem was a combination of Django and Sentry (which we only use on our production server). Django would log the query, and Sentry would then catch this log and keep it in memory for some reason. As each raw SQL query was about 40 MB, this ate a lot of memory. For now, we have turned Sentry off on our cron job server and are looking into a way to clear the logs kept by Sentry.
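One direction to look into is dropping SQL breadcrumbs before Sentry keeps them; a rough, untested sketch assuming the sentry-sdk client (the 'query' breadcrumb category is also an assumption):

    # Sketch: filter out SQL breadcrumbs so huge upsert statements aren't retained in memory.
    import sentry_sdk
    from sentry_sdk.integrations.django import DjangoIntegration

    def drop_sql_breadcrumbs(crumb, hint):
        if crumb.get('category') == 'query':   # assumed category name for SQL breadcrumbs
            return None                        # returning None discards the breadcrumb
        return crumb

    sentry_sdk.init(
        dsn='https://examplePublicKey@o0.ingest.sentry.io/0',   # placeholder DSN
        integrations=[DjangoIntegration()],
        before_breadcrumb=drop_sql_breadcrumbs,
    )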

Connection of a program to a database when using c++

I'm writing a simple program to manage contacts. Now, I want to ask how I can handle DB storage for the program.
Since the program is installed locally and will run on many different Windows systems, how will database storage and connectivity be handled on machines where no MS SQL Server is installed? How are portability and deployment tackled?
If you need to use SQL Server specifically, then it has to be installed on the machine.
On the other hand, if all you need is to store some data and not necessarily a database, that could be done in an arbitrary way.
You could also use something like SQLite, which is an SQL database stored in a single file that doesn't require a running server (meaning you can access it from your program directly through the driver).

How to have Django test case and Selenium server use same database?

I have a Django (v1.4, using Postgresql) project which I've written a bunch of working unittests for. These use FactoryBoy to generate most of their data.
I'm now starting to write some integration tests using LiveServerTestCase with Selenium. I've just realised that my tests and the live test server use different databases, which means that data created by factories in my tests isn't available to Selenium.
I'm not sure of the best way to progress. I think I could use fixtures to supply data that would work, although this is a pain having got this far using factories instead.
Is there a way I can continue to use factories to generate data that will work for my Selenium tests? Really I'd like my tests and LiveServerTestCase to use the same database.
I found out why this happened to me, and some possible workarounds, including Ilya Baryshev's answer below.
If your test descends from Django's TestCase, and if your database supports transactions, then each test runs in its own transaction, and nobody outside (no other thread, external process, or other test) can see the objects created in the database by your test.
LiveServerTestCase uses threads, so it would suffer from this problem. So the designers made it inherit from TransactionTestCase instead of TestCase, which disables these transactions, so that changes are globally visible.
What happened to me was that I added some mixins to my test class, and one of them pulled in TestCase. This doesn't cause an error, but it silently replaces the base class of LiveServerTestCase with TestCase, which enables transactions again, causing the problem that you describe.
Ilya's SQLite memory database workaround works because Django contains code that detects when a SQLite :memory: database is in use and shares the same connection between threads, so you see your test's objects in the LiveServerThread because they're inside the same transaction. However, this comes with some caveats:
It’s important to prevent simultaneous database queries via this shared connection by the two threads, as that may sometimes randomly cause the tests to fail. So you need to ensure that the two threads don’t access the database at the same time. In particular, this means that in some cases (for example, just after clicking a link or submitting a form), you might need to check that a response is received by Selenium and that the next page is loaded before proceeding with further test execution. Do this, for example, by making Selenium wait until the HTML tag is found in the response (requires Selenium > 2.13)...
https://docs.djangoproject.com/en/1.4/topics/testing/#live-test-server
In my case, once we identified that autocommit was being turned off when the test started, and tracked down why (because we had pulled in TestCase code that we shouldn't have), we were able to fix the inheritance hierarchy to avoid pulling in TestCase, and then the same database was visible from both the live server thread and the test.
This also works with Postgres databases, so it would provide a solution for velotron.
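A stripped-down sketch of the pitfall (class names invented for illustration):

    # The mixin silently pulls TestCase into the MRO, re-enabling per-test transactions.
    from django.test import LiveServerTestCase, TestCase

    class HelperMixin(TestCase):          # oops: the mixin derives from TestCase
        pass

    class BrokenSeleniumTests(HelperMixin, LiveServerTestCase):
        # Objects created here stay inside a transaction the LiveServerThread can't see.
        pass

    class WorkingSeleniumTests(LiveServerTestCase):
        # Keep mixins free of TestCase so the TransactionTestCase behaviour is preserved.
        pass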
Have you tried using sqlite as your database backend for tests?
When using an in-memory SQLite database to run the tests, the same database connection will be shared by two threads in parallel: the thread in which the live server is run and the thread in which the test case is run.
(from the Django docs)
If you're not using anything beyond regular ORM, you might benefit from test speedups as well.
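A common way to arrange that is to switch the default database to SQLite only when tests run; a settings.py sketch (the sys.argv check is a simple heuristic):

    # settings.py sketch: use an in-memory SQLite database during tests so the
    # test thread and the LiveServerThread share a single connection.
    import sys

    if 'test' in sys.argv:
        DATABASES = {
            'default': {
                'ENGINE': 'django.db.backends.sqlite3',
                'NAME': ':memory:',
            }
        }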

Multiple copies of application accessing SQLite file?

Our application uses an SQLite database file to hold some data in it. The app opens the database in the file on startup, reads and writes to it, and closes it on exit.
Unfortunately, we can't forbid someone from running two copies of our app at once. If that happens, presumably there will be two copies of the app trying to read from and/or write to the file at the same time. I imagine this would not end well for the database file.
What can we do to avoid causing data loss for the user? Should we simply avoid opening the database if a second copy of the app is launched concurrently? Or is there something cleverer we can do?
Thanks.
Any sane database engine, including SQLite, will not corrupt your database if two processes access it at the same time. Most will queue the requests if there's no way to run them in parallel.
Now what your app does with the data, that is your app's problem, but don't worry about the database itself.
Some info about sqlite concurrency: http://www.sqlite.org/lockingv3.html
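On the application side you can also make a second running copy wait briefly rather than fail when the file is momentarily locked; a small sketch with Python's sqlite3 module (file name, timeout, and table are illustrative):

    # Sketch: tolerate another copy of the app briefly holding the write lock.
    import sqlite3

    conn = sqlite3.connect('contacts.db', timeout=10.0)   # wait up to 10 s on a locked database
    conn.execute('PRAGMA journal_mode=WAL')                # WAL lets readers run alongside a writer
    conn.execute("UPDATE contacts SET phone = '555-0100' WHERE name = 'Alice'")
    conn.commit()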