I have a problem with SQLite: every call seems to take ~300 ms to execute. After some testing I noticed that the delay is caused by transactions. Eight ordinary inserts with implicit transactions take about 2 seconds; however, if I start a transaction before the inserts and commit it afterwards, I can do almost a million inserts in the same time. The affected calls include DROP TABLE, CREATE TABLE and INSERT, and I assume others too (probably everything that implicitly begins a transaction).
Some more info:
Downloaded the source amalgamation from the SQLite website (3200100, i.e. version 3.20.1)
Compiled it using Visual Studio into a static library (not using any special compiler flags, although I have been playing around with them without luck)
I am using sqlite3_open16 followed by sqlite3_prepare16_v3 and then sqlite3_step to start execution and/or receive the first result
No multithreading, no access from multiple processes, database file is exclusively opened by this program
If I create the file on my SSD (960 EVO) instead, the "transaction delay" drops from 300 ms to 10 ms. That is still an absurdly high value, but I feel like the speed of my disk shouldn't influence whatever is slowing the transactions down?
The function that blocks is sqlite3_step. (It also annoys me that I have to call a function with that name just to execute a DROP TABLE, for example, but that's beside the point.)
Edit: During the transaction, the CPU usage is 100%.
On a side note, is it possible to "help" SQLite organize its data if you know that every single row of your table will be exactly, say, 64 bytes?
I hope you can help me with this, or possibly recommend an alternative (relational, C++ API, file-based, highly performant).
Thank you very much!
SQLite goes to great lengths to ensure it never suffers data corruption, so with an implicit transaction you are limited by the speed of your hard disk.
Within an explicit transaction, the data is first written to other locations and only committed to disk once, which is much faster.
From the SQLite speed documentation:
With synchronization turned on, SQLite executes an fsync() system call (or the equivalent) at key points to make certain that critical data has actually been written to the disk drive surface.
Within a transaction, the data is written elsewhere first, and the fsync cost is only paid once, when all the data is committed together. That is the price of this part of the configuration. On the positive side, I have never suffered SQLite data loss through corruption.
"I feel like the speed of my disk shouldn't influence whatever is slowing the transactions down?"
This is an important trade-off. If you want improved data integrity, then the speed of your disk is relevant.
How long does committing a transaction take?
From the SQLite FAQ, item 19 (why transactions are slow):
SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second.
You can:
Use transactions to batch more work together. The cost is per transaction, so it can be amortized over many statements (see the sketch after this list).
Use temporary tables. Temporary tables do not pay this synchronization cost and run at full speed.
NOT RECOMMENDED. Use PRAGMA synchronous=OFF to disable synchronous writes.
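To make the first point concrete, here is a minimal C++ sketch using the SQLite C API (error handling mostly omitted; the table name and row count are made up for illustration). All the inserts share one explicit transaction, so the fsync cost is paid once at COMMIT:

```cpp
#include <sqlite3.h>
#include <cstdio>

int main() {
    sqlite3 *db = nullptr;
    if (sqlite3_open("test.db", &db) != SQLITE_OK) return 1;

    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS t(x INTEGER);",
                 nullptr, nullptr, nullptr);

    // One explicit transaction around all inserts: the fsync is paid
    // once at COMMIT instead of once per statement.
    sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr);

    sqlite3_stmt *stmt = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO t(x) VALUES (?);", -1, &stmt, nullptr);
    for (int i = 0; i < 100000; ++i) {
        sqlite3_bind_int(stmt, 1, i);
        if (sqlite3_step(stmt) != SQLITE_DONE)
            std::fprintf(stderr, "insert failed: %s\n", sqlite3_errmsg(db));
        sqlite3_reset(stmt);  // reuse the prepared statement
    }
    sqlite3_finalize(stmt);

    sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);
    sqlite3_close(db);
    return 0;
}
```

Moving the BEGIN/COMMIT pair inside the loop reproduces the per-statement fsync cost described above.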
Related
I'm trying to use RocksDB to store billions of records, so the resulting databases are fairly large: hundreds of gigabytes, several terabytes in some cases. The data is initially imported from a snapshot of a different service and is updated from Kafka afterwards, but that's beside the point.
There are two parts of the problem:
Part 1) Initial data import takes hours with autocompactions disabled (days if I enable them). Afterwards I reopen the database with autocompactions enabled, but they aren't triggered automatically on open, so I have to trigger one manually with CompactRange(Range{nil, nil}) in Go.
Manual compaction takes almost as long, with only one CPU core busy. During compaction the overall size of the DB temporarily grows 2x-3x, but it ends up at around 0.5x of the original size.
Question 1: Is there a way to avoid the 2x-3x data-size growth during compaction? It becomes a problem once the data reaches terabytes. I use the default level compaction, which according to the docs "optimizes disk footprint vs. logical database size (space amplification) by minimizing the files involved in each compaction step".
Question 2: Is it possible to engage more CPU cores for manual compaction? It looks like only one is used at the moment (even though MaxBackgroundCompactions = 32). It would speed up the process a lot, as there are no writes during the initial manual compaction; I just need to prepare the DB without waiting for days.
Would it work to have several goroutines working on different sets of keys instead of just one working on all keys? If yes, what's the best way to divide the keys into these sets?
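I don't know of a RocksDB switch that parallelizes a single full-range manual compaction, but the idea from the question can at least be sketched. Below is a hedged C++ sketch (the question's project is in Go, but the underlying API is the same): split the key space into disjoint ranges and run one CompactRange per range on its own thread. The boundary keys are placeholders; real ones would come from your key distribution, and the outermost ranges would use nullptr for open ends.

```cpp
#include <rocksdb/db.h>

#include <string>
#include <thread>
#include <vector>

// Compact disjoint key ranges concurrently. `bounds` holds the split points,
// e.g. {"", "m", "zzzz"} yields the ranges ["", "m") and ["m", "zzzz").
void ParallelCompact(rocksdb::DB* db, const std::vector<std::string>& bounds) {
    std::vector<std::thread> workers;
    for (size_t i = 0; i + 1 < bounds.size(); ++i) {
        workers.emplace_back([db, &bounds, i] {
            rocksdb::Slice begin(bounds[i]);
            rocksdb::Slice end(bounds[i + 1]);
            rocksdb::CompactRangeOptions opts;
            // Let several manual compactions run side by side.
            opts.exclusive_manual_compaction = false;
            db->CompactRange(opts, &begin, &end);
        });
    }
    for (auto& w : workers) w.join();
}
```

Whether this actually engages more cores depends on how the ranges map onto SST files, so treat it as an experiment rather than a guarantee.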
Part 2) Even after this manual compaction, RocksDB seems to perform autocompaction later, once I start adding/updating data, and when it's done the DB gets even smaller: around 0.4x compared to the size before the manual compaction.
Question 3: What's the difference between manual compaction and autocompaction, and why does autocompaction seem to be more effective in terms of resulting data size?
My project is in Go, but I'm more or less familiar with RocksDB C++ code and I couldn't find any answers to these questions in the docs or in the source code.
My thinking is this: if we preload clients' data (account number, net balance) in advance, then whenever a transaction is processed, the transaction record is written to RAM in a FIFO data structure and the client's data in RAM is updated as well. After a certain period, the records are written to the on-disk database, to prevent data loss due to the volatility of RAM.
By doing so, the I/O time should be saved, and hence less time is spent looking up a client's data (faster transactions).
I have heard about in-memory databases, but I do not know whether my idea is the same as that. Also, is there a better idea than what I am thinking of?
In my opinion, there are several aspects to think about and research in order to take a step forward. Preloading and working on data in memory is usually faster than being bound to disk or database page-access schemata. However, you instantly lose durability. Therefore, three approaches are valid in different situations:
disk-synchronous (good old database way, after each transaction data is guaranteed to be in permanent storage)
in-memory (good as long as the system is up and running, faster by orders of magnitude, risk of losing transaction data on errors)
delayed (basically in-memory, but from time to time data is flushed to disk)
It is worth noting that the delayed approach is directly supported on Linux through memory-mapped files, which are, on the one hand, often as fast as ordinary memory (unless you read and access too many pages) and, on the other hand, synced to disk automatically (but not instantly).
As you tagged C++, this is possibly the simplest way of getting your idea running.
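A minimal Linux sketch of the delayed variant (POSIX calls only; the record layout and file name are invented for illustration):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical fixed-size record for a client's data.
struct Record {
    long account_no;
    double net_balance;
};

int main() {
    const size_t kSize = 4096;
    int fd = open("accounts.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;
    if (ftruncate(fd, kSize) != 0) return 1;  // ensure the file is large enough

    // Map the file: writes land in ordinary memory, and the kernel flushes
    // dirty pages to disk eventually (automatically, but not instantly).
    void* p = mmap(nullptr, kSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    auto* rec = static_cast<Record*>(p);
    rec->account_no = 42;
    rec->net_balance = 1000.0;

    msync(p, kSize, MS_ASYNC);  // request an asynchronous flush to disk
    munmap(p, kSize);
    close(fd);
    return 0;
}
```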
Note, however, that when you assume failures (hardware, reboot, etc.), you won't have transactions at all, because it is non-trivial to tell concretely when the data has actually been written.
As a side note: sometimes this problem is solved by writing (reliably) to a log file first (sequential access, therefore faster than writing directly to the data files). Search for the word compaction in the context of databases: it is the operation that merges a log with the regular on-disk data structures, and it happens from time to time (when the log gets too large).
To the last aspect of the question: Yes, in-memory databases work in main memory. Still, depending on the guarantees (ACID?) they give, some operations still involve hard disk or NVRAM.
At present, we are using Redis as an in-memory, fast cache. It is working well. The problem is, once Redis is restarted, we need to re-populate it by fetching data from our persistent store. This overloads our persistent store beyond its capacity, and hence the recovery takes a long time.
We looked at Redis persistence options. The best option (without compromising performance) is to use AOF with 'appendfsync everysec', but with this option we can lose the last second of data, and that is not acceptable. Using AOF with 'appendfsync always' has a considerable performance penalty.
So we are evaluating single-node Aerospike. Does it guarantee no data loss in case of power failures? I.e., once Aerospike acknowledges a write to the client, the data should never be lost, even if I pull the power cable of the server machine. As I mentioned above, I believe Redis can give this guarantee with the 'appendfsync always' option, but we are not considering it because of its considerable performance penalty.
If Aerospike can do it, I would want to understand in detail how persistence works in Aerospike. Please share some resources explaining it.
We are not looking for a distributed system, as strong consistency is a must for us. The data should not be lost in node failures or split-brain scenarios.
If not Aerospike, can you point me to another tool that can help achieve this?
This is not a database problem; it's a hardware and risk problem.
All databases (that have persistence) work the same way: some write the data directly to the physical disk, while others tell the operating system to write it. The only way to ensure that every write is safe is to wait until the disk confirms the data has been written.
There is no way around this and, as you've seen, it greatly decreases throughput. This is why databases use a memory buffer and write batches of data from the buffer to disk at short intervals. However, this means there is a small risk that a machine issue (power, disk failure, etc.) occurring after the data is written to the buffer, but before it is written to the disk, will cause data loss.
On a single server, you can buy protection through multiple power supplies, battery backup, and other safeguards, but this gets tricky and expensive very quickly. This is why distributed architectures are so common today, for both availability and redundancy. Distributed systems do not mean you lose consistency; rather, they can help to ensure it by protecting your data.
The easiest way to solve your problem is to use a database that allows replication, so that every write goes to at least two different machines. This way, one machine losing power won't affect the write going to the other machine, and your data is still safe.
You will still need to protect against a power outage at a higher level that can affect all the servers (like your entire data center losing power), but you can solve this by distributing across more failure boundaries. It all depends on how much risk is acceptable to you.
Between tweaking the disk-write intervals in your database and using a proper distributed architecture, you can get the consistency and performance requirements you need.
I work for Aerospike. You can choose to have your namespace stored in memory, on disk, or in memory with disk persistence. In all of these scenarios we perform favourably in comparison to Redis in real-world benchmarks.
Considering storage on disk: when a write happens, it hits a buffer before being flushed to disk, and the ack does not go back to the client until that buffer has been successfully written to. It is plausible that, if you yank the power cable before the buffer flushes, a write in a single-node cluster might have been acked to the client and subsequently lost.
The answer is to have more than one node in the cluster and a replication factor >= 2. The write then goes to the buffer on both the master and the replica, and it has to succeed on both before being acked to the client as successful. If the power is pulled from one node, a copy still exists on the other node and no data is lost.
So, yes, it is possible to make Aerospike as resilient as is reasonably possible, at low cost and with minimal latencies. The best thing to do is to download the community edition and see what you think. I suspect you will like it.
I believe Aerospike would serve your purpose. You can configure it for hybrid storage at the namespace (i.e. DB) level in aerospike.conf, which lives at /etc/aerospike/aerospike.conf.
For details, please refer to the official documentation here: http://www.aerospike.com/docs/operations/configure/namespace/storage/
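For illustration only, a hybrid namespace stanza might look roughly like this (directive names follow the linked documentation; the namespace name, device path, and sizes are placeholders to adapt to your setup):

```
namespace mycache {
    replication-factor 2
    memory-size 4G

    storage-engine device {
        device /dev/sdb        # raw SSD device (placeholder)
        data-in-memory true    # keep a full copy of the data in RAM
        write-block-size 128K
    }
}
```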
I believe you're going to be at the mercy of the latency of whatever the storage medium is, or of the network fabric in the case of a cluster, regardless of which DBMS technology you use, if you must have a guarantee that the data won't be lost. (N.B. Ben Bates' solution won't work if there is a possibility that the whole physical plant loses power, i.e. both nodes lose power. But I would think an inexpensive UPS would substantially, if not completely, mitigate that concern.) And those latencies will cause a dramatic insert/update/delete performance drop compared to a standalone in-memory database instance.
Another option to consider is NVDIMM storage, either for the in-memory database itself or for the write-ahead transaction log used for recovery. It has the absolute lowest latency (comparable to conventional DRAM). And if your in-memory database fits in the available NVDIMM memory, you get the fastest possible recovery (no need to replay a transaction log) and performance comparable to the original in-memory database, because you're back to a single write versus the 2+ writes needed for a write-ahead log and/or replication to another node in a cluster. But there are two requirements for this to be an option:
1. The entire database must fit in the NVDIMM memory
2. The database system has to be able to support recovery of the database directly after system restart, without a transaction log.
More in this white paper: http://www.odbms.org/wp-content/uploads/2014/06/IMDS-NVDIMM-paper.pdf
I have an app that spins up multiple processes to read large amounts of data from several PostgreSQL tables to do number crunching, and then stores the results in separate tables.
When I tested this with just a single process, it was blazing fast and used almost 100% CPU, but when I tried using 8 processes on an 8-core machine, every process registered about 1% CPU and the whole task seemed to take even longer.
When I checked pg_stat_activity, I saw several connections listed as "<IDLE> in transaction". Following some advice here, I looked at pg_locks, and I'm seeing hundreds of AccessShareLock locks on the dozens of read-only tables. Based on the docs, I believe this is the default, but I think it is causing the processes to step on each other's toes, negating any benefit of multi-processing.
Is there a more efficient isolation level to use, or better way to tune PostgreSQL to allow faster read-only access to several processes, so each doesn't need to lock the table? Specifically, I'm using Django as my ORM.
I'm not sure what throttles your multiple cores, but it has nothing to do with the isolation level; that holds even if you have concurrent write operations. Per the documentation:
"The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so *reading never blocks writing and writing never blocks reading*. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level."
Emphasis mine.
Of course, reading also never blocks reading.
Maybe you need to reconfigure resource allocation on your server? The default configuration is regularly too conservative. On the other hand, some parameters should not be set too high in a multi-user environment; work_mem comes to mind. Check the list for Performance Optimization in the Postgres Wiki.
And finally:
Django as my ORM.
ORMs often try to stay platform-independent and fail to get the full potential out of a particular RDBMS. They are primitive crutches and don't play well with performance optimization.
I would like to implement caching in an SQLite database. My primary objective is to write data to RAM and, when the cache is full, flush all of it to the on-disk database. Is that possible at all? If so, could I have some sample code?
Thanks
SQLite already does its own caching, which is likely to be more efficient than anything you can implement yourself; you can read about the interface to it here. You may also be interested in other optimisations; there is a FAQ here.
You might want to check out the SQLite fine-tuning commands (pragmas).
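For example, a few tuning pragmas issued through the C API (the values are illustrative, not recommendations; each one trades something off, so check the pragma documentation):

```cpp
#include <sqlite3.h>

// Apply a few illustrative tuning pragmas to an open connection.
void ApplyPragmas(sqlite3* db) {
    // ~64 MiB page cache (a negative value means KiB).
    sqlite3_exec(db, "PRAGMA cache_size = -65536;", nullptr, nullptr, nullptr);
    // Write-ahead logging: readers are not blocked by a writer.
    sqlite3_exec(db, "PRAGMA journal_mode = WAL;", nullptr, nullptr, nullptr);
    // Fewer fsyncs than FULL; still reasonably safe in WAL mode.
    sqlite3_exec(db, "PRAGMA synchronous = NORMAL;", nullptr, nullptr, nullptr);
}
```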
Since SQLite is transactional, it relies on fsync to ensure that a particular set of statements has completed when a transaction is committed. The speed and implementation of fsync vary from platform to platform.
So, by batching several statements within a transaction, you can get a significant increase in speed, since several blocks of data will be written before fsync is called.
An older sqlite article here illustrates this difference between doing several INSERTs inside and outside transactions.
However, if you are writing an application that needs concurrent access to data, note that when SQLite starts a write transaction, all reads (SELECT statements) are blocked. You may want to explore using your in-memory cache to retrieve data while a write transaction is taking place.
With that said, it's also possible that sqlite's caching scheme will handle that for you.
Why do you want to do this? Are you running into performance issues? Or do you want to prevent other connections from seeing data until you commit it to disk?
Regarding syncing to disk, there is a tradeoff between database integrity and speed. Which you want depends on your situation.
Use transactions. Advantages: high reliability, and simple. Disadvantages: once you start a transaction, no one else can write to the database until you COMMIT or ROLLBACK. This is usually the best solution: if you have a lot of work to do at once, BEGIN a transaction, write everything you need, then COMMIT. All your changes are cached in RAM until you COMMIT, at which point the database explicitly syncs to disk.
Use PRAGMA journal_mode=MEMORY and/or PRAGMA synchronous=OFF. Advantages: high speed, and simple. Disadvantages: the database is no longer safe against power loss and program crashes; you can lose your entire database with these options. However, they avoid explicitly syncing to disk as often.
Write your changes to an in-memory database and manually sync when you want. Advantages: high speed, and reliable. Disadvantages: complicated, and another program can write to the database file without you knowing about it. By writing to an in-memory database you never need to sync to disk until you want to, but other programs can modify the database file in the meantime, and if you're not careful you can overwrite their changes. This option is probably too complicated to be worth it; still, a rough sketch follows below.
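A rough sketch of the third option, assuming the SQLite C API (the file name, table name, and schema are invented): keep the hot table in a :memory: database, ATTACH the file database, and copy rows across in one transaction when you decide to flush.

```cpp
#include <sqlite3.h>

int main() {
    sqlite3* db = nullptr;
    sqlite3_open(":memory:", &db);  // everything below stays in RAM by default

    sqlite3_exec(db, "ATTACH DATABASE 'store.db' AS disk;",
                 nullptr, nullptr, nullptr);
    sqlite3_exec(db, "CREATE TABLE cache(k INTEGER PRIMARY KEY, v TEXT);",
                 nullptr, nullptr, nullptr);
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS disk.cache(k INTEGER PRIMARY KEY, v TEXT);",
                 nullptr, nullptr, nullptr);

    // Fast writes into the in-memory table; no disk I/O yet.
    sqlite3_exec(db, "INSERT INTO cache VALUES (1, 'hello');",
                 nullptr, nullptr, nullptr);

    // Manual "flush": copy the cached rows to the file database in one
    // transaction, paying the sync cost once.
    sqlite3_exec(db,
                 "BEGIN;"
                 "INSERT OR REPLACE INTO disk.cache SELECT * FROM cache;"
                 "COMMIT;",
                 nullptr, nullptr, nullptr);

    sqlite3_close(db);
    return 0;
}
```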