Our application uses an SQLite database file to hold some data. The app opens the database file on startup, reads and writes to it, and closes it on exit.
Unfortunately, we can't prevent someone from running two copies of our app at once. If that happens, both copies will presumably try to read from and/or write to the database file at the same time, which I imagine would not end well for it.
What can we do to avoid causing data loss for the user? Should we simply avoid opening the database if a second copy of the app is launched concurrently? Or is there something cleverer we can do?
Thanks.
Any sane database engine, including SQLite, will not corrupt your database if two processes access it at the same time. Most will queue requests when there is no way to run them in parallel; SQLite does this with file locks, and a connection that cannot acquire a lock within its busy timeout simply gets a "database is locked" error rather than a corrupted file.
What your app does with the data is your app's problem, but don't worry about the database file itself.
Some info about SQLite concurrency: http://www.sqlite.org/lockingv3.html
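For example, with Python's built-in sqlite3 module you can raise the busy timeout so a writer waits for the other copy's lock instead of failing right away; a minimal sketch (the file and table names are made up):

    import sqlite3

    # Wait up to 10 seconds for a competing lock instead of failing immediately.
    conn = sqlite3.connect('app.db', timeout=10)

    # Optional: WAL mode lets readers keep working while another process writes.
    conn.execute('PRAGMA journal_mode=WAL')

    with conn:  # commits on success, rolls back on error
        conn.execute('UPDATE settings SET value = ? WHERE key = ?', ('1', 'last_run'))
    conn.close()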
For our Django web server we have quite limited resources, which means we have to be careful with the amount of memory we use. One part of our web server is a cron job (using Celery and RabbitMQ) that parses a ~130MB CSV file into our Postgres database. The CSV file is saved to disk and then read row by row with Python's csv module. Because the CSV file is basically a feed, we use bulk_upsert from the custom Postgres manager in django-postgres-extra to upsert our data and overwrite existing entries. Recently we started experiencing memory errors, and we eventually found out they were caused by Django.
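Roughly, the task looks like this (the model, field names, and batch size are illustrative placeholders):

    import csv

    from celery import shared_task

    from myapp.models import FeedEntry  # its manager is psqlextra's PostgresManager

    @shared_task
    def import_feed(path):
        batch = []
        with open(path, newline='') as f:
            for row in csv.DictReader(f):  # read row by row to keep memory low
                batch.append({'external_id': row['id'], 'price': row['price']})
                if len(batch) >= 15000:
                    FeedEntry.objects.bulk_upsert(conflict_target=['external_id'], rows=batch)
                    batch = []
        if batch:
            FeedEntry.objects.bulk_upsert(conflict_target=['external_id'], rows=batch)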
Running mem_top() showed us that Django was storing the massive upsert queries (INSERT ... ON CONFLICT DO), including their metadata, in memory. Each bulk_upsert of 15,000 rows would add 40MB of memory to the Python process, leading to a total of 1GB used by the time the job finished, as we upsert 750,000 rows in total. Apparently Django does not release the query from memory after it's finished. Running the cron job without the upsert call leads to a maximum memory usage of 80MB, of which 60MB is the Celery baseline.
We tried running gc.collect() and django.db.reset_queries(), but the queries are still stored in memory. Our DEBUG setting is False and CONN_MAX_AGE is not set. Currently we're out of clues for where to look to fix this issue, and we can't run our cron jobs now. Do you know of any last resorts to try to resolve this issue?
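Roughly, the cleanup we tried after each batch (it made no difference):

    import gc

    from django import db

    db.reset_queries()  # the stored queries were not released
    gc.collect()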
Some more meta info regarding our server:
django==2.1.3
django-elasticsearch-dsl==0.5.1
elasticsearch-dsl==6.1.0
psycopg2-binary==2.7.5
gunicorn==19.9.0
celery==4.3.0
django-celery-beat==1.5.0
django-postgres-extra==1.22
Thank you very much in advance!
Today I found the solution to our issue, so I thought it would be good to share. It turned out that the problem was a combination of Django and Sentry (which we only use on our production server). Django would log the query, and Sentry would then catch this log and keep it in memory for some reason. As each raw SQL query was about 40MB, this ate a lot of memory. For now, we have turned Sentry off on our cron job server and are looking into a way to clear the logs kept by Sentry.
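If you are on the sentry_sdk package (rather than the older raven client), one way to keep those SQL log records out of Sentry's in-memory breadcrumbs and events is to ignore the logger entirely; an untested sketch:

    from sentry_sdk.integrations.logging import ignore_logger

    # Drop everything logged on Django's SQL logger so the huge raw upsert
    # statements are never attached to Sentry breadcrumbs or events.
    ignore_logger('django.db.backends')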
I'm writing a simple program to manage contacts. How can I handle database storage for the program?
Since the program is installed locally and runs on many different Windows machines, how will database storage and connectivity be handled on machines where no MS SQL Server is installed? How are portability and shipment handled?
If you need to use SQL Server specifically, then it has to be installed on the machine.
On the other hand, if all you need is to store some data and not necessarily use a full database server, you can do that however you like, for example in a plain file.
You could also use something like SQLite, which is an SQL database stored in a single file that doesn't require a running server (meaning you can access it from your program directly through the driver).
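For illustration, here is what that looks like from Python's built-in sqlite3 module (the file and table names are placeholders); most languages have an equivalent SQLite driver:

    import sqlite3

    # The whole database lives in this single file shipped alongside the app.
    conn = sqlite3.connect('contacts.db')
    conn.execute("""
        CREATE TABLE IF NOT EXISTS contacts (
            id    INTEGER PRIMARY KEY,
            name  TEXT NOT NULL,
            phone TEXT
        )
    """)
    with conn:  # commits the insert
        conn.execute("INSERT INTO contacts (name, phone) VALUES (?, ?)", ("Alice", "555-0100"))
    conn.close()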
I'm trying to figure out the safest way of storing chat history for my application on the client's computer. By "safe" I mean that my application is actually allowed to read from and write to the SQLite database. Clients will range across Windows, OS X and Linux, so I need a way, on each platform, of determining where I'm allowed to create an SQLite database for storing the message history.
A problem I've run into in the past, for example, is people using terminal clients such as Citrix, where the user is not allowed to write to almost any directory and the drive is often a shared network drive.
Some ideas:
Include an empty database.db with prebuilt tables in my installer and store it next to my executable. However, I'm almost certain that not all clients will be allowed to read/write there, for example Windows users who do not have admin rights.
Use QStandardPaths::writableLocation and create the database on first run (see the sketch after this list).
Locate the user's home directory and create the database on first run.
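A sketch of the QStandardPaths idea, shown with PyQt5 for brevity (the organization/application names are placeholders; the C++ call is the same):

    import os

    from PyQt5.QtCore import QCoreApplication, QStandardPaths

    QCoreApplication.setOrganizationName("MyCompany")
    QCoreApplication.setApplicationName("MyChatApp")

    # Per-user and writable without admin rights, e.g. AppData\Roaming on Windows,
    # ~/.local/share on Linux, ~/Library/Application Support on OS X.
    data_dir = QStandardPaths.writableLocation(QStandardPaths.AppDataLocation)
    os.makedirs(data_dir, exist_ok=True)
    db_path = os.path.join(data_dir, "history.db")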
Any ideas if there is a really good solution to this problem?
I am creating a few nodes in Neo4j using Spring Data, and then I am also accessing them via findByPropertyValue(prop, val).
Everything works correctly when I am reading/writing to the embedded DB using spring data.
Now, as per Michael Hunger's book Good Relationships, I opened Neoclipse with a read-only connection to the Neo4j database my running Java application is using.
But it still says that the Neo4j kernel is actively being used by another program.
Question 1: What am I doing wrong here?
Also, I have created a few nodes and persisted them. Whenever I restart the embedded Neo4j DB, I can view all my nodes when I do findAll().
Question 2: When I try to visualize my nodes in Neoclipse (assuming the DB is accessible), I only see one single node, which is empty and has no properties, whereas I have a name property defined.
I started my Java app, persisted a few nodes, traversed them and got the output in the Java console. Then I shut down the application, started Neoclipse, connected to my DB, and found that no nodes are present (the problem of Question 2).
After trying again, I went back to my Java app and ran it, and surprisingly got a Lucene file corruption error (unrecognized file format). I had made no code changes and had not deleted anything, but still got this error.
Question 3: I'm not sure what I am doing wrong, but since I found this discussion of my bug (Lucene/concurrent DB access), I would like to know whether this is a bug or a programming error on my part. (Does it have something to do with Eclipse Juno?)
Any reply would be highly appreciated.
Make sure you are properly committing the transactions.
Neo4j does not immediately flush data to disk, so you might not see the nodes in Neoclipse right away. I always restart the application that is using Neo4j in embedded mode so that data is flushed to disk, and then open Neoclipse.
Posting your code would help us to check for any issues.
I want to use an SQLite in-memory (":memory:") DB for the tests in my webapp. I'm using nosetests for the tests and web.py as the framework.
I want to populate the DB in the setup() function and then run all my tests. My problem is that web.py closes all open DB connections after each request, and an SQLite :memory: DB only lasts until you close the connection, so only the first test actually runs correctly and all the others fail.
My choices are either to run the tests on a disk-backed DB, or to recreate the entire in-memory DB at the beginning of each individual test.
Do you know how I can prevent web.py from closing DB connections after each request?
Can you think of any other way to get an in-memory SQLite DB that lasts for more than one request using web.py?
Maybe you could run the tests on a DB stored on the disk, but using a RAM disk. In Windows, you can install a driver to set up a RAM disk (some instructions here). In Linux, I believe you want to set up tmpfs.
A RAM disk will act exactly like a hard disk but operates completely from memory, so you avoid some of the overhead of loading files from and writing them back to a physical disk.
Untested:
    import web

    class NoCloseDB(web.db.SqliteDB):
        def _unload_context(self):
            pass  # this keeps the _ctx.db attribute alive

    web.db.register_database('sqlite', NoCloseDB)  # overrides the previous registration
Notice that this can only work if you run web.py in a way that uses only one operating system process. If a request is dispatched across multiple processes, each one will still get its own database.
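For example, in a nosetests setup you might then create the in-memory database once and keep using it across requests (an illustrative sketch; the schema is made up):

    import web

    db = None

    def setup():
        global db
        # Because of the NoCloseDB registration above, this connection should
        # survive across requests instead of being dropped (and the data with it).
        db = web.database(dbn='sqlite', db=':memory:')
        db.query("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        db.insert('users', name='alice')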