Implementing a lock-free data structure on disk - C++

I have an interesting challenge for those with a strong background in lock-free and disk-based data structures.
I'm looking for a way to build, in C++, a data structure to hold a varying number of objects.
The limitations are as follows:
The data structure must reside on disk.
There is one thread writing to the data structure and many others reading from it.
Every read is atomic. (Let's assume I can atomically read a block of 32/64 KB, and that all objects are smaller than that.)
A write must not block a read; for that, assume that I can also write a 32/64 KB block atomically.
Locks cannot be used at all.
Any suggestions?
I was thinking of using something like a B-tree: when a node must be split, write the new data to fresh nodes at the end of the file, then update the pointers to those nodes, which would reside in, for example, some other file (the original blocks would be marked as free and added to a freestore).
However, I run into a problem if my mapping file is larger than 32/64 KB. Say I want it to hold just 1 million object pointers: at 4 bytes per pointer that is 4 million bytes, roughly 4 MB (and at 1 billion objects, far more), which means the mapping file cannot be written atomically.
So if someone has a better suggestion as to how to implement the above - or even just a direction - it would be greatly appreciated.
As far as I know, all open-source/commercial B-tree implementations use locks of some sort, which I cannot use.
Thanks,
Max.

You won't get very far by just assuming reads/writes are atomic -- mainly because they're not, and you'll end up emulating atomicity in a way that kills performance.
It sounds like you want to research MVCC, the standard mechanism to use when designing a lock-free database. The basic concept is that every read gets a "snapshot" of the database -- usually implemented in a lock-free way by leaving old pages alone and performing any modifications on new pages only. Once the old pages are no longer in use by any reader, they are finally marked for re-use.
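To make that concrete, here is a minimal in-memory model of the snapshot scheme, using the C++11 atomic shared_ptr free functions (names are illustrative; a real engine would version file pages rather than heap objects):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// A snapshot is an immutable page table. The single writer copies only
// what it touches, builds a new table, and publishes it atomically.
// Readers holding an old table keep seeing a consistent database.
struct Page { char bytes[4096]; };
using PageTable = std::vector<std::shared_ptr<const Page>>;

std::shared_ptr<const PageTable> g_root =
    std::make_shared<PageTable>(1024);     // 1024 empty page slots

std::shared_ptr<const PageTable> beginRead() {
    return std::atomic_load(&g_root);      // lock-free snapshot
}

void writePage(std::size_t idx, const Page& contents) {
    auto table = std::make_shared<PageTable>(*std::atomic_load(&g_root));
    (*table)[idx] = std::make_shared<const Page>(contents);   // CoW page
    std::atomic_store(&g_root,
                      std::shared_ptr<const PageTable>(std::move(table)));
    // Old pages become reclaimable once the last reader drops its table.
}
```

The same shape works on disk when the root/page table itself fits in one atomically writable block, which is exactly the constraint the question runs into.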
While MVCC is significantly more involved than a CPU/RAM lock-free structure, once you have it, many of the same optimistic lock-free patterns apply.

LMDB will do all of this with no problem. It is an MVCC B+tree and readers are completely lockless.

Related

Thread-safe read-only alternative to vtkUnstructuredGrid->GetPoint()

I have started working on multithreading and point cloud processing. The problem is that I have to add multithreading to an existing implementation, and there are so many read operations on the grid that using a mutex does not give me enough speed-up.
In the end I modified the code so that I have one vtkSmartPointer<vtkUnstructuredGrid> which holds my point cloud. The only operation the threads have to perform is accessing points via the GetPoint method. However, it is not thread-safe even for read-only operations, due to smart pointers.
Because of that, I had to copy my main point cloud for each thread, which causes memory issues if I have too many threads and big clouds.
I tried to cut the point clouds into chunks, but it gets too complicated when I have many threads: I cannot guarantee an optimal number of points for each thread to process. I also do a neighbour search for each point, so cutting the point cloud into chunks gets even more complicated, because each chunk needs to overlap its neighbours for the search to work properly.
Since vtkUnstructuredGrid is memory-optimized, I could not replace it with STL containers. I would be happy if you could recommend data structures for point cloud processing that are thread-safe to read, or any other solution I could use.
Thanks in advance
I am not familiar with VTK or how it works.
In general, there are various techniques to improve performance in a multi-threaded environment. The question is vague, so I can only provide an equally general answer.
Easy: if there are many reads and few writes, use std::shared_mutex, as it allows multiple simultaneous readers (a minimal sketch appears at the end of this answer).
Moderate: if the threads work with distinct data most of the time - they access the same data array but at distinct locations - you can implement a handler that ensures the threads concurrently work over distinct, non-intersecting pieces of data; if a thread asks to work on a piece that is currently being processed, tell it to work on something else or wait.
Hard: there are methods that allow efficient concurrency via std::atomic by exploiting the memory model. I am not too familiar with them, and they are definitely not simple, but you can find tutorials on them on the internet. As far as I know, parts of this area are still in research and development, and best practices have not yet settled.
P.S. If there are many reads/writes over the same data... is the existing implementation even aware that the data is shared across several threads? Does it still behave correctly? You might end up needing to rewrite the whole thing.
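Here is the sketch promised for the "easy" option above. std::shared_mutex is C++17; the container and names are illustrative:

```cpp
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Many concurrent readers; a writer takes exclusive ownership only
// while actually mutating.
class SharedTable {
    mutable std::shared_mutex mtx_;
    std::unordered_map<int, std::string> data_;
public:
    std::string get(int key) const {
        std::shared_lock lock(mtx_);   // readers share the lock
        auto it = data_.find(key);
        return it == data_.end() ? std::string{} : it->second;
    }
    void put(int key, std::string value) {
        std::unique_lock lock(mtx_);   // writer excludes everyone
        data_[key] = std::move(value);
    }
};
```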
I just thought I'd post the solution, because it was actually my own stupidity. I realized that in one part of my code I was using the double* vtkDataSet::GetPoint(vtkIdType ptId) overload of GetPoint(), which is not thread-safe.
For multithreaded code, the void vtkDataSet::GetPoint(vtkIdType id, double x[3]) overload should be used instead.
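For illustration, a minimal sketch of a worker that reads points through that overload with a caller-provided buffer (the function name and range split are made up; the GetPoint signatures are the ones quoted above):

```cpp
#include <vtkSmartPointer.h>
#include <vtkUnstructuredGrid.h>

// Hypothetical worker: each thread owns its output buffer, so no
// per-call scratch storage inside the grid is shared between threads.
void processRange(vtkUnstructuredGrid* grid, vtkIdType begin, vtkIdType end)
{
    double p[3];                  // per-thread buffer
    for (vtkIdType id = begin; id < end; ++id) {
        grid->GetPoint(id, p);    // the overload the answer recommends
        // ... use p[0], p[1], p[2] ...
    }
}
```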

Predictively computing potentially needed values on large shared data structure with infrequent updates

I have a system I need to design with low latency in mind; processing power and memory are generous. I have a large (several GB) data structure that is updated once every few seconds. Many read-only operations will run against it between updates, in parallel, accessing it heavily. As soon as an update occurs, all computations in progress should be cleanly cancelled, as their results are invalidated by the update.
The issue I'm running into is that writes are infrequent enough, and readers access so often, that locking around each individual reader access would be a huge performance hit. I'm fine with readers reading invalid data, but then I need to deal with broken invariants (assertions) or segfaults from stale pointers, etc. At the same time, I can't have readers block writers, so reader-writer locks acquired at each reader's thread start are unacceptable.
The only solution I can think of has a number of issues, which is to allocate a mapping with mmap, put the readers in separate processes, and mprotect the memory to kill the workers when it's time to update. I'd prefer a cross-platform solution (ideally pure C++), however, and ideally without forking every few seconds. This would also require some surgery to get all the data structures located in shm.
Something like a revocable lock would do exactly what I need, but I don't know of any libraries that provide such functionality.
If this were a database, I'd use multi-version concurrency control. Readers obtain a logical snapshot while the underlying physical data structures are mostly lock-free (or locked very briefly and at fine granularity).
You say your machine is generously equipped with memory. Can you just create a complete copy of the data structure, modify the copy, and swap it in atomically?
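A minimal sketch of that copy-and-swap approach, assuming a single writer and using the atomic shared_ptr free functions (type and function names are illustrative):

```cpp
#include <memory>

struct BigData { /* several GB of state */ };

std::shared_ptr<const BigData> g_current = std::make_shared<BigData>();

// Readers never block: they pin whatever version is current.
std::shared_ptr<const BigData> readerSnapshot() {
    return std::atomic_load(&g_current);
}

// Single writer: copy, apply this round of updates, publish atomically.
void writerUpdate() {
    auto fresh = std::make_shared<BigData>(*std::atomic_load(&g_current));
    // ... mutate *fresh ...
    std::atomic_store(&g_current, std::shared_ptr<const BigData>(fresh));
    // The old version stays alive until the last in-flight reader
    // releases its snapshot.
}
```

Pairing the pointer with a generation counter would also let in-flight computations notice the swap and cancel themselves cleanly.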
Or, can you use immutable data structures, so that readers continue to use the old version while the writer creates new objects?
Or, you implement MVCC in a fine-grained way. Let's say you want to version a hash-set. Instead of keeping one value per key, you keep one value per key per version. Readers read from the latest version that is <= the version that existed when they started to read. Writers create a new version number for each write "transaction". Only when all writes are complete do readers start picking up changes from the new version. This is how MVCC databases do it.
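A minimal sketch of that per-key versioning scheme (types and names are illustrative, and synchronizing access to the structure itself is deliberately left out):

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <utility>

using Version = std::uint64_t;

template <typename K, typename V>
class VersionedMap {
    std::map<K, std::map<Version, V>> data_;  // history per key
    Version current_ = 0;                     // last committed version
public:
    Version snapshotVersion() const { return current_; }

    // Reader: newest value whose version is <= the reader's snapshot.
    const V* get(const K& key, Version snapshot) const {
        auto it = data_.find(key);
        if (it == data_.end()) return nullptr;
        auto ub = it->second.upper_bound(snapshot);
        if (ub == it->second.begin()) return nullptr;
        return &std::prev(ub)->second;
    }

    // Writer: stage changes under current_+1; nothing becomes visible
    // to readers until commit() bumps the version.
    void put(const K& key, V value) { data_[key][current_ + 1] = std::move(value); }
    void commit() { ++current_; }
};
```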
Besides these approaches, I also liked your mmap idea. I don't think you need a separate process if your OS supports copy-on-write memory mappings: you can map the same memory area multiple times and provide a stable snapshot to readers.

(preferably boost) lock-free array/vector/map/etc?

Considering my lack of C++ knowledge, please try to read my intent and not my poor technical question.
This is the backbone of my program https://github.com/zaphoyd/websocketpp/blob/experimental/examples/broadcast_server/broadcast_server.cpp
I'm building a websocket server with websocket++ (and oh, is websocket++ sweet - I highly recommend it). I can easily manipulate per-user data thread-safely because it doesn't really need to be touched by different threads; however, I do want to be able to write to an array (I'm going to use the catch-all term "array" from weaker languages like VB, PHP, and JS) in one function thread (with multiple iterations that could be running simultaneously) and also read from it in one or more threads.
Take Stack Overflow as an example: if I wanted all of the ids (the PRIMARY column of all articles) sorted in a particular way - in this case by net votes - and held in memory, I'm thinking I would have a function, called in its own boost::thread, fired whenever a vote comes in, to reorder the array.
How can I do this without locking and blocking? I'm 100% fine with users reading from an old array while another is being built, but I absolutely do not want their reads, or the writer, ever to fail or be blocked.
Does a lock-free array exist? If not, is there some way to build the new array in a temporary array and then write it to the actual array when the building is finished, without locking and blocking?
Have you looked at Boost.Lockfree?
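Worth noting: Boost.Lockfree provides queues and stacks (queue, stack, spsc_queue) rather than a lock-free array or map, so it fits hand-off between threads. A minimal sketch (capacity and names are illustrative):

```cpp
#include <boost/lockfree/queue.hpp>

boost::lockfree::queue<int> votes(1024);  // fixed capacity, set up front

void onVote(int articleId) {
    votes.push(articleId);            // returns false if the queue is full
}

void reorderWorker() {
    int articleId;
    while (votes.pop(articleId)) {
        // ... rebuild/adjust the sorted id array for this vote ...
    }
}
```

For the "build a new array, then publish it" idea in the question, an atomic swap of a shared_ptr to an immutable array achieves exactly that: readers keep the old array alive while the new one is constructed.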
Uh, uh, uh. Complicated.
Look at RCU, for example -- and that only covers multiple readers alongside ONE writer.
My guess is that multiple simultaneous writers are not going to work. You should rather look for a representation more efficient than an array, one that allows for faster updates. How about a balanced tree? log(n) updates should never block anything noticeably.
Regarding boost -- I'm happy that it finally has proper support for thread synchronization.
Of course, you could also keep a copy and batch the updates. Then a background process merges the updates and copies the result for the readers.

C++: how to optimize I/O?

I am working on a mathematical problem that has the advantage of being able to "pre-compute" about half of the problem, save this information to file, and then reuse it many times to compute various "instances" of my problem. The difficulty is that loading all of this information back in order to solve the actual problem is a major bottleneck.
More specifically:
I can pre-compute a huge amount of information - tons of probabilities (long double), a ton of std::map<int,int>, and much more - and save it all to disk (several GB).
The second half of my program accepts an input argument D. For each D, I need to perform a great many computations that involve a combination of the pre-computed data (from file) and some other data specific to D (so that the problem is different for each D).
Sometimes I will need to pick out certain pieces of the pre-computed information from the files. Other times, I will need to load every piece of data from a (large) file.
Are there any strategies for making the IO faster?
I already have the program parallelized (MPI, via boost::mpi) for other reasons, but regardless, accessing files on the disk is making my compute time unbearable.
Any strategies or optimizations?
Currently I am doing everything with cstdio, i.e. no iostream. Will that make a big difference?
Certainly the fastest (but most fragile) solution would be to mmap the data to a fixed address. Slap it all in one big struct, and instantiate the std::map with an allocator which allocates from a block attached to the end of the struct. It's not simple, but it will be fast: one call to mmap, and the data is in your (virtual) memory. And because you force the address in mmap, you can even store raw pointers.
As mentioned above, in addition to requiring a fair amount of work, it's fragile: recompile your application, and the targeted address might not be available, or the layout might differ. But since this is really just an optimization, that might not be an issue; any time a compatibility problem arises, just drop the old file and start over. It will make the first run after a breaking change extremely slow, but if you don't break compatibility too often...
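A minimal POSIX sketch of that approach (names are illustrative). Note that MAP_FIXED silently replaces anything already mapped at the target address, which is part of the fragility warned about above:

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map the whole precomputed blob at a fixed address so that pointers
// stored inside the file remain valid across runs.
void* loadBlob(const char* path, void* fixedAddr, std::size_t& size) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    size = static_cast<std::size_t>(st.st_size);
    void* p = mmap(fixedAddr, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);                         // the mapping outlives the fd
    return p == MAP_FAILED ? nullptr : p;
}
```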
The stuff that isn't in a map is easy: put everything in one contiguous chunk of memory whose layout you know (a big array, or a struct/class with no pointers), and use write() to write it out; later, use read() to read it back in a single operation. If the size might vary, first read a single int holding the size, allocate the memory, and then pull the rest in with a single read().
The map part is a bit harder, since you can't do it in one operation; you need a convention for serializing it. To make the I/O as fast as possible, your best bet is to convert the map into an in-memory form that is all in one place and can be converted back to a map quickly. If, for example, your keys are ints and your values are of constant size, you could fill one array with the keys and another with the values, then write() the two arrays (plus their size). Reading everything back again takes only two or three calls to read().
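A sketch of that two-array convention, shown with cstdio's fwrite/fread since that is what the asker already uses (the POSIX write()/read() calls mentioned above work the same way):

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

// Dump the element count, then all keys, then all values -- three bulk
// writes instead of one call per element.
void saveMap(const std::map<int, int>& m, std::FILE* f) {
    std::vector<int> keys, values;
    keys.reserve(m.size());
    values.reserve(m.size());
    for (const auto& kv : m) {
        keys.push_back(kv.first);
        values.push_back(kv.second);
    }
    std::uint64_t n = m.size();
    std::fwrite(&n, sizeof n, 1, f);
    std::fwrite(keys.data(), sizeof(int), keys.size(), f);
    std::fwrite(values.data(), sizeof(int), values.size(), f);
}

std::map<int, int> loadMap(std::FILE* f) {
    std::uint64_t n = 0;
    std::fread(&n, sizeof n, 1, f);
    std::vector<int> keys(n), values(n);
    if (n) {
        std::fread(keys.data(), sizeof(int), n, f);
        std::fread(values.data(), sizeof(int), n, f);
    }
    std::map<int, int> m;
    for (std::uint64_t i = 0; i < n; ++i)
        m.emplace_hint(m.end(), keys[i], values[i]);  // keys arrive sorted
    return m;
}
```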
Note that nothing is ever translated to ASCII, and the number of system calls is minimal. The file will not be human-readable, but it will be compact and fast to read in. Three things make I/O slow: 1) system calls, if you use many small reads/writes; 2) translation to/from ASCII (printf, scanf); 3) disk speed. It's hard to do much about 3) (other than an SSD). You can do the read in a background thread, but you might still need to block waiting for the data.
Some guidelines:
multiple calls to read() are more expensive than a single call
binary files are faster than text files
a single file is faster than multiple files, for large values of "multiple"
use memory-mapped files if you can
use a 64-bit OS to let the OS manage memory for you
Ideally, I'd try to put all the long doubles into a memory-mapped file, and all the maps into binary files.
Divide and conquer: if 64-bit is not an option, try to break your data into large chunks such that chunks are never used together, and each chunk is needed in its entirety when it is needed at all. This way you can load chunks when they are needed and discard them when they are not.
These suggestions to load all the data into RAM are good when two conditions are met:
The sum of all I/O time during a run is much greater than the cost of loading all the data into RAM
A relatively large portion of the data is accessed during a run
(both are usually met when an application runs for a long time, processing different data)
However, for other cases, other options should be considered.
For example, it is essential to understand whether the access pattern is truly random. If it is not, look into reordering the data so that items accessed together are close to each other. This ensures that OS caching performs at its best, and also reduces HDD seek times (not a factor for SSDs, of course).
If accesses are truly random, and the application does not run long enough to amortize the one-time data loading cost, I would look at the architecture, e.g. extract a data manager into a separate module that keeps the data preloaded.
On Windows that might be a system service; other OSes offer other options.
Cache, cache, cache. If it's only several GB, it should be feasible to cache most, if not all, of your data in something like memcached. This is an especially good solution if you're using MPI across multiple machines rather than just multiple processors on the same machine.
If it's all running on the same machine, consider a shared memory cache if you have the memory available.
Also, make sure your file writes are being done on a separate thread. No need to block an entire process waiting for a file to write.
As was said, cache as much as you can in memory.
If you find that the amount you need to cache is larger than your memory allows, try swapping the cached data between memory and disk, the way it is done when virtual memory pages need to be swapped out. It is essentially the same problem.
One common method for deciding which page to evict is the Least Recently Used (LRU) algorithm.
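For illustration, a minimal LRU cache sketch (a list keeps keys in recency order, a hash map gives O(1) lookup; not tuned, not thread-safe):

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class LruCache {
    std::size_t capacity_;
    std::list<std::pair<K, V>> items_;   // front = most recently used
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    V* get(const K& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        items_.splice(items_.begin(), items_, it->second);  // mark as recent
        return &it->second->second;
    }

    void put(const K& key, V value) {
        if (V* v = get(key)) { *v = std::move(value); return; }
        items_.emplace_front(key, std::move(value));
        index_[key] = items_.begin();
        if (items_.size() > capacity_) {   // evict the least recently used
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }
};
```

A disk-backed variant would write the evicted entry back to its file on eviction instead of discarding it.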
It really depends on how much memory is available and what the access pattern is.
The simplest solution is to use memory-mapped files. This generally requires that the file be laid out as if the objects were in memory, so you will need to use only POD data with no pointers (though you can use relative indexes).
You need to study your access pattern to see whether you can group together the values that are often used together. This helps the OS cache those values better (i.e., keep them in memory for you rather than always going to disk to read them).
Another option is to split the file into several chunks, preferably in a logical way. It might be necessary to create an index file that maps a range of values to the file containing them.
Then you only need to access the set of files required.
Finally, for complex data structures (where memory-mapped files fail) or for sparse reading (when you only ever extract a small piece of information from a given file), it might be worth reading about LRU caches.
The idea would be to use serialization and compression: you write several files, among them an index, and compress all of them (zip). At launch time, you load the index and keep it in memory.
Whenever you need a value, you first try your cache; if it's not there, you access the file that contains it, decompress it in memory, and dump its contents into your cache. Note: if the cache is too small, you have to be picky about what you keep... or reduce the size of the files.
The frequently accessed values will stay in cache, avoiding unnecessary round-trips, and because the files are zipped there will be less I/O.
Structure your data so that caching can be effective. For instance, when you are reading "certain pieces", if those are all contiguous, the disk won't have to seek around to gather them.
Reading and writing in batches, instead of record by record, will also help if you are sharing disk access with another process.
More specifically: I can pre-compute a huge amount of information - tons of probabilities (long double), a ton of std::map, and much more - and save all this stuff to disk (several Gb).
As far as I understand, the std::maps are also pre-calculated and there are no insert/remove operations, only searches. How about replacing the maps with something like std::unordered_map (the standardized hash_map) or sparsehash? In theory that can give a performance gain.
More specifically: I can pre-compute a huge amount of information - tons of probabilities (long double), a ton of std::map, and much more - and save all this stuff to disk (several Gb).
Don't reinvent the wheel. I'd suggest using a key-value data store, such as Berkeley DB: http://docs.oracle.com/cd/E17076_02/html/gsg/C/concepts.html
This will let you save and share the files, caching the parts you actually use a lot and keeping the other parts on disk.

list/map of key-value pairs backed up by file on disk

I need to make a list of key-value pairs (similar to std::map<std::string, std::string>) that is stored on disk and can be accessed by multiple threads at once. Keys can be added or removed, values can be changed, and keys are unique. Presumably the whole thing might not fit into memory at once, so updates to the map must be saved to disk.
The problem is that I'm not sure how to approach this. I understand how to deal with the multithreading issues, but I'm not sure which data structure is suitable for storing the data on disk. Pretty much anything I can think of can dramatically change structure and cause a massive rewrite of the disk storage if I approach the problem head-on. On the other hand, relational databases and the Windows registry deal with this problem, so there must be a way.
Is there a data structure that is "made" for such scenario?
Or do I simply use any traditional data structure (trees or skip lists, for example) and write some kind of "memory manager" (a disk-backed "heap") that allocates chunks of disk space, loads them into memory on request, and unloads them back to disk when necessary? I can imagine how to write such a disk-backed heap, but that solution isn't very elegant, especially once multithreading enters the picture.
Ideas?
The data structure that is "made" for your scenario is the B-tree, or one of its variants, such as the B+ tree.
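To make that concrete, here is a hypothetical on-disk B+-tree node layout sized to one 4 KB page (the layout is illustrative, not from any particular library); the essential trick is that children are referenced by page number rather than raw pointer, so nodes can be read and written as plain fixed-size blocks:

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPageSize = 4096;
constexpr int kOrder = 100;   // chosen so the node fits in one page

struct alignas(kPageSize) BTreeNode {
    std::uint32_t isLeaf;
    std::uint32_t count;                      // keys currently in use
    std::uint64_t keys[kOrder];
    union {
        std::uint64_t children[kOrder + 1];   // interior: child page numbers
        std::uint64_t values[kOrder + 1];     // leaf: value offsets/ids
    };
    std::uint64_t nextLeaf;                   // leaf chain for range scans
};
static_assert(sizeof(BTreeNode) == kPageSize, "node must fill one page");
```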
Long and short of it: once you write things to disk you are no longer dealing with "data structures" - you are dealing with "serialization" and "databases."
The C++ STL and its data structures do not really address these issues, but, fortunately, they have already been addressed thousands of times by thousands of programmers. Chances are 99.9% that someone has already written something that will work well for you.
Based on your description, SQLite sounds like it would be a decent, balanced choice for your application.
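For illustration, a minimal sketch of using SQLite's C API as a disk-backed key-value map. The kv table schema is an assumption for the example, created once with CREATE TABLE IF NOT EXISTS kv(key TEXT PRIMARY KEY, value TEXT):

```cpp
#include <sqlite3.h>
#include <string>

// Insert or update one key-value pair. SQLite handles the on-disk
// layout, page caching, and crash safety; readers and the single
// writer are coordinated by the library.
bool putValue(sqlite3* db, const std::string& key, const std::string& value) {
    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db,
            "INSERT OR REPLACE INTO kv(key, value) VALUES(?, ?)",
            -1, &stmt, nullptr) != SQLITE_OK)
        return false;
    sqlite3_bind_text(stmt, 1, key.c_str(), -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 2, value.c_str(), -1, SQLITE_TRANSIENT);
    const bool ok = (sqlite3_step(stmt) == SQLITE_DONE);
    sqlite3_finalize(stmt);
    return ok;
}
```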
If you only need to do lookups (plus insertions and deletions) by key, and not more complex field-based queries, BDB may be the better choice for your application.