Is it possible to store column-oriented tables in memory in MemSQL? By default, row-oriented tables are kept in memory and column-oriented tables on disk.
MemSQL columnstore tables are always disk-backed. However, columnstore data is cached in memory, so if all your data fits in memory you will get in-memory performance. (The disk is involved only because writes must persist to disk for durability and, after a restart, data must be loaded from disk before it can be read, just as for any durable in-memory store.)
In the rowstore, we use data structures and algorithms (e.g. lock-free skiplists) that take advantage of the data being in memory to improve performance on point reads and writes, especially under high concurrency. Columnstore query execution, by contrast, relies on fast scans over blocks of data and on batch writes, which work well whether the data resides in memory or on disk.
Related
Suppose I were to write my own database in C++, using a binary tree or a hash map as the underlying data structure. How would I handle updates to this data structure?
1) Should I first create the binary tree and then somehow persist it to disk? And every time data has to be updated, do I need to open this file and update it? Wouldn't that be a costly operation?
2) Is there a way to directly work on the binary tree without loading it into memory and then persisting again?
3) How do SQLite and MySQL deal with it?
4) My main question is: how do databases persist huge amounts of data and concurrently update it without opening and closing the file each time?
Databases treat the disk or file as one big block device and manage blocks in M-way balanced trees. They insert, update, and delete records in these blocks and flush dirty blocks back to disk. They maintain allocation tables of free blocks so that the database does not need to be rewritten on each access. Because RAM is fast but expensive, pages are kept in a RAM cache. Separate indexes (either separate files or just blocks) provide quick access based on keys. Blocks are often sized to the native allocation unit of the underlying filesystem (e.g. the cluster size). Undo/redo logs are maintained for atomicity, and so on.
There is much more to be said, and this question actually belongs on Computer Science Stack Exchange. For more information, read Horowitz & Sahni, "Fundamentals of Data Structures", p. 496.
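To make the block-and-cache idea a bit more concrete, here is a minimal sketch in Python (chosen for brevity; the same calls exist in C/C++ as pread, pwrite and fsync). The Pager class, its method names and the 4096-byte page size are all made up for illustration and are not taken from any real engine. It leaves out the M-way tree, the free-block allocation table and the undo/redo log, and it uses os.pread/os.pwrite, which are POSIX-only.

```python
import os

PAGE_SIZE = 4096  # assumed block size; real engines often match the filesystem


class Pager:
    """Toy page cache over one file that is opened once and kept open."""

    def __init__(self, path):
        # Create the file if needed, then keep the descriptor open.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        self.cache = {}     # page number -> bytearray held in RAM
        self.dirty = set()  # page numbers modified since the last flush

    def read_page(self, page_no):
        """Return the cached page, reading it from disk on first access."""
        if page_no not in self.cache:
            data = os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)
            # Pages past the end of the file read short; pad with zeros.
            self.cache[page_no] = bytearray(data.ljust(PAGE_SIZE, b"\0"))
        return self.cache[page_no]

    def write(self, page_no, offset, payload):
        """Modify a page in RAM only and remember that it is dirty."""
        if offset < 0 or offset + len(payload) > PAGE_SIZE:
            raise ValueError("write does not fit inside one page")
        page = self.read_page(page_no)
        page[offset:offset + len(payload)] = payload
        self.dirty.add(page_no)

    def flush(self):
        """Write only the dirty pages back, then ask the OS to persist them."""
        for page_no in sorted(self.dirty):
            os.pwrite(self.fd, bytes(self.cache[page_no]), page_no * PAGE_SIZE)
        os.fsync(self.fd)
        self.dirty.clear()

    def close(self):
        self.flush()
        os.close(self.fd)
```

The point of the sketch is that an update touches one page in RAM, and only the pages that changed are written back on flush(), so the file is never rewritten as a whole.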
As to your questions:
1) You open it once and keep it open while your database manager is running. You allocate storage as needed and maintain an M-way tree as described above.
2) Yes. You read blocks that you keep in a cache.
3) and 4) See above.
Typically, you would not do file I/O to access the data. Use mmap to map the data into the virtual address space of the process and let the OS block cache take care of the reads and writes.
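As a rough illustration of the mmap approach, here is a small sketch using Python's mmap module (the C/C++ equivalent would call mmap(2) and msync(2) directly). The file layout, one 64-bit counter per slot, and the helper names open_mapped and add are assumptions made up for this example.

```python
import mmap
import os
import struct

RECORD = struct.Struct("<q")  # one signed 64-bit counter per slot (assumed layout)


def open_mapped(path, slots):
    """Map a fixed-size file of `slots` counters into memory."""
    size = slots * RECORD.size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)  # a mapping cannot extend past end of file
        return mmap.mmap(fd, size)  # shared, read-write mapping by default
    finally:
        os.close(fd)  # the mapping keeps its own handle to the file


def add(mapping, slot, delta):
    """Update one counter in place; the OS writes the page back later."""
    offset = slot * RECORD.size
    (value,) = RECORD.unpack_from(mapping, offset)
    RECORD.pack_into(mapping, offset, value + delta)
    return value + delta
```

Updates are plain memory writes. Calling mapping.flush() forces the changed pages out to the file; without it, the timing of the write-back is left to the operating system.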
My thinking is this: we preload the clients' data (account number, net balance) in advance. Whenever a transaction is processed, the transaction record is written to a FIFO data structure in RAM and the client's data in RAM is updated as well. After a certain period, the records are written to the database on disk, to prevent data loss from RAM due to its volatility.
Doing so should save I/O time and hence reduce the time spent seeking clients' data, giving faster transactions.
I have heard about in-memory databases, but I do not know whether my idea is the same thing. Also, is there a better idea than what I am thinking of?
In my opinion, there are several aspects to think about and research in order to move forward. Preloading data and working on it in memory is usually faster than being bound to disk or database page access patterns. However, you immediately lose durability. Three approaches are therefore valid, each in different situations:
disk-synchronous (good old database way, after each transaction data is guaranteed to be in permanent storage)
in-memory (good as long as the system is up and running, faster by orders of magnitude, risk of losing transaction data on errors)
delayed (basically in-memory, but from time to time data is flushed to disk)
It is worth noting that the delayed approach is directly supported on Linux through memory-mapped files, which on the one hand are often as fast as ordinary memory (unless too many pages are read and accessed) and on the other hand are synced to disk automatically (but not instantly).
As you tagged C++, this is possibly the simplest way of getting your idea running.
Note, however, that if you have to assume failures (hardware, reboot, etc.), you won't have transactions at all, because it is non-trivial to tell exactly when the data has actually been written.
As a side note: sometimes this problem is solved by writing (reliably) to a log file, which is accessed sequentially and is therefore faster than writing directly to the data files. Search for the word "compaction" in the context of databases: this is the operation that merges the log into the regular on-disk data structures, and it happens from time to time (when the log gets too large).
To the last aspect of the question: yes, in-memory databases work in main memory. Still, depending on the guarantees they give (ACID?), some operations involve the hard disk or NVRAM.
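To illustrate the log-plus-compaction idea, here is a toy sketch in Python (shown for brevity; the same pattern carries over to C++). Everything in it is an assumption made for the example: the LoggedStore class, the JSON record format, and the file names snapshot.json and wal.log. It also assumes string keys and a single writer.

```python
import json
import os


class LoggedStore:
    """Toy key-value store: an append-only log plus an occasionally rewritten snapshot."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "wal.log")
        self.data = {}
        # Recovery: load the last snapshot, then replay the log on top of it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    try:
                        key, value = json.loads(line)
                    except ValueError:
                        break  # torn last record from a crash; ignore it
                    self.data[key] = value
        self.log = open(self.log_path, "a")

    def put(self, key, value):
        # Append sequentially and fsync before applying the change in memory.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def compact(self):
        # Write a fresh snapshot atomically, then start a new, empty log.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        self.log.close()
        self.log = open(self.log_path, "w")

    def close(self):
        self.log.close()
```

Because each put overwrites a key, replaying the log twice gives the same result, so a crash between writing the snapshot and truncating the log does no harm in this sketch.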
At present, we are using Redis as a fast in-memory cache. It is working well. The problem is that once Redis is restarted, we need to repopulate it by fetching data from our persistent store. This overloads our persistent store beyond its capacity, and hence the recovery takes a long time.
We looked at the Redis persistence options. The best option (without compromising performance) is to use AOF with 'appendfsync everysec'. But with this option, we can lose the last second of data. That is not acceptable. Using AOF with 'appendfsync always' has a considerable performance penalty.
So we are evaluating single-node Aerospike. Does it guarantee no data loss in case of power failures? That is, once Aerospike reports success to the client in response to a write operation, the data should never be lost, even if I pull the power cable of the server machine. As I mentioned above, I believe Redis can give this guarantee with the 'appendfsync always' option, but we are not considering it because of the considerable performance penalty.
If Aerospike can do it, I would want to understand in detail how persistence works in Aerospike. Please share some resources explaining the same.
We are not looking for a distributed system as strong consistency is a must for us. The data should not be lost in node failures or split brain scenarios.
If not Aerospike, can you point me to another tool that can help achieve this?
This is not a database problem, it's a hardware and risk problem.
All databases that have persistence work the same way: some write the data directly to the physical disk, while others tell the operating system to write it. The only way to ensure that every write is safe is to wait until the disk confirms the data is written.
There is no way around this and, as you've seen, it greatly decreases throughput. This is why databases use a memory buffer and write batches of data from the buffer to disk in short intervals. However, this means that there's a small risk that a machine issue (power, disk failure, etc) happening after the data is written to the buffer but before it's written to the disk will cause data loss.
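The trade-off can be sketched in a few lines of Python. This is only an illustration; the BatchedLog class and its batch_size parameter are invented for the example and do not describe how Redis, Aerospike or any other particular database implements its buffer.

```python
import os


class BatchedLog:
    """Buffers records in memory and pays for one fsync per batch."""

    def __init__(self, path, batch_size):
        self.file = open(path, "ab")
        self.batch_size = batch_size
        self.pending = []     # accepted, but NOT yet durable
        self.fsync_calls = 0  # counts the expensive waits on the disk

    def append(self, record):
        # Returning from here does not mean the record would survive a power cut.
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.sync()

    def sync(self):
        # Only after this returns are the buffered records safely on disk.
        if not self.pending:
            return
        self.file.write(b"".join(r + b"\n" for r in self.pending))
        self.file.flush()
        os.fsync(self.file.fileno())
        self.fsync_calls += 1
        self.pending.clear()

    def close(self):
        self.sync()
        self.file.close()
```

With batch_size=1 every write waits for the disk, which is roughly the 'appendfsync always' behaviour from the question. A larger batch means fewer fsync calls, but whatever is still in pending is lost if the machine dies.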
On a single server, you can buy protection through multiple power supplies, battery backup, and other safeguards, but this gets tricky and expensive very quickly. This is why distributed architectures are so common today for both availability and redundancy. Distributed systems do not mean you lose consistency, rather they can help to ensure it by protecting your data.
The easiest way to solve your problem is to use a database that allows for replication so that every write goes to at least 2 different machines. This way, one machine losing power won't affect the write going to the other machine and your data is still safe.
You will still need to protect against a power outage at a higher level that can affect all the servers (like your entire data center losing power) but you can solve this by distributing across more boundaries. It all depends on what amount of risk is acceptable to you.
Between tweaking the disk-write intervals in your database and using a proper distributed architecture, you can get the consistency and performance requirements you need.
I work for Aerospike. You can choose to have your namespace stored in memory, on disk or in memory with disk persistence. In all of these scenarios we perform favourably in comparison to Redis in real world benchmarks.
Consider storage on disk: when a write happens, it hits a buffer before being flushed to disk. The ack does not go back to the client until that buffer has been successfully written to. It is therefore possible that, if you yank the power cable before the buffer flushes, a write on a single-node cluster will have been acked to the client and subsequently lost.
The answer is to have more than one node in the cluster and a replication-factor >= 2. The write then goes to the buffer on the master node and on the replica, and has to succeed on both before being acked to the client as successful. If the power is pulled from one node, a copy would still exist on the other node and no data would be lost.
So, yes, it is possible to make Aerospike as resilient as it is reasonably possible to be at low cost with minimal latencies. The best thing to do is to download the community edition and see what you think. I suspect you will like it.
I believe Aerospike would serve your purpose. You can configure it for hybrid storage at the namespace (i.e. DB) level in aerospike.conf, which is located at /etc/aerospike/aerospike.conf.
For details, please refer to the official documentation here: http://www.aerospike.com/docs/operations/configure/namespace/storage/
I believe you're going to be at the mercy of the latency of whatever the storage medium is, or the latency of the network fabric in the case of a cluster, regardless of what DBMS technology you use, if you must have a guarantee that the data won't be lost. (N.B. Ben Bates' solution won't work if there is a possibility that the whole physical plant loses power, i.e. both nodes lose power. But I would think an inexpensive UPS would substantially, if not completely, mitigate that concern.) And those latencies are going to cause a dramatic drop in insert/update/delete performance compared to a standalone in-memory database instance.
Another option to consider is to use NVDIMM storage either for the in-memory database itself or for the write-ahead transaction log used for recovery. It will have the absolute lowest latency (comparable to conventional DRAM). And if your in-memory database fits in the available NVDIMM memory, you'll have the fastest recovery possible (no need to replay a transaction log) and performance comparable to the original IMDB, because you're back to a single write instead of the 2+ writes needed for a write-ahead log and/or replication to another node in a cluster. However, your in-memory database system has to support direct recovery of an in-memory database (not just recovery from a transaction log). So, again, there are two requirements for this to be an option:
1. The entire database must fit in the NVDIMM memory
2. The database system has to be able to support recovery of the database directly after system restart, without a transaction log.
More in this white paper http://www.odbms.org/wp-content/uploads/2014/06/IMDS-NVDIMM-paper.pdf
I want to know how many I/O operations (one or many?) it takes for SAS to load a dataset from hard disk into a hash table.
My understanding is:
There are several components involved when data is loaded from hard disk into a hash table. Their relationship is shown below:
Data on hard disk ---> memory buffer ---> PDV ---> hash table (in memory)
I/O operations are counted when data is loaded from the hard disk into the memory buffer.
If the memory buffer is small (smaller than the data on the hard disk), then the buffer has to be filled from the hard disk more than once (possibly many times) before all the data has been transferred to the PDV.
But if the memory buffer is big (bigger than the data on the hard disk), then all the data can be loaded from the hard disk in one go.
Is my understanding correct?
I went over the Redland documentation and there are some questions I could not resolve.
Looking at it from the C++ side, suppose you generate numerous RDF triples over time for several different graphs, and that having all the graphs in memory is not a primary concern:
Is it possible to use redland to perform single/bulk insertions (write into persistent storage) without keeping the graph in memory, and how would you tune such insertions?
If we forget about the querying, what would be a good persistent way of storage: files or databases?
What do you think?
Is it possible to use redland to perform single/bulk insertions (write into persistent storage) without keeping the graph in memory, and how would you tune such insertions?
Yes. Create a librdf_storage object where you want your data stored and pass it to librdf_new_model(). Then use any of the API functions such as librdf_parser_parse_into_model() to store data in that model and it gets persisted in the storage.
The graph is only kept in memory if the librdf storage module is written that way.
If we forget about the querying, what would be a good persistent way of storage: files or databases?
The file storage is not really for serious business. It keeps the graph in memory and persists to disk by serializing to/from RDF/XML.
Use a database-backed storage such as MySQL or BDB hashes.