high speed interprocess associative array - c++

Is there a library usable from C++ for sharing fairly simple data (integers, floating-point numbers, strings) between cooperating processes?
Must be:
high-speed (SQL-based methods are too slow due to parsing)
able to get, set, update, and delete both fixed and variable data types (e.g. int and string)
ACID (atomic, consistent, isolated, durable)
usable under linux
usable by processes without a shared parent.
highly compatible license: e.g. LGPL, MIT, BSD
For bonus points:
ability to work across the network.
ability to handle aggregation/composition into more complicated structures

Take a look at boost::interprocess. For local use, you probably can't beat a map or hash table in shared memory. Allowing networking makes things more difficult, in that case something like memcached or CouchDB might be more appropriate.
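For local use, a rough sketch of that idea with Boost.Interprocess might look like the following (the segment, map, and mutex names are placeholders, and the int-to-double layout is only illustrative; shared strings would need the library's own allocator-aware string type):

    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <boost/interprocess/containers/map.hpp>
    #include <boost/interprocess/allocators/allocator.hpp>
    #include <boost/interprocess/sync/named_mutex.hpp>
    #include <boost/interprocess/sync/scoped_lock.hpp>
    #include <functional>
    #include <utility>

    namespace bip = boost::interprocess;

    // Allocator and map types that live entirely inside the shared segment.
    typedef bip::allocator<std::pair<const int, double>,
                           bip::managed_shared_memory::segment_manager> ShmAlloc;
    typedef bip::map<int, double, std::less<int>, ShmAlloc> ShmMap;

    int main() {
        // Create (or open) a 64 KiB segment; any cooperating process can open it by name.
        bip::managed_shared_memory seg(bip::open_or_create, "demo_segment", 64 * 1024);

        // Find or construct the map object inside the segment.
        ShmMap *table = seg.find_or_construct<ShmMap>("demo_table")
                           (std::less<int>(), ShmAlloc(seg.get_segment_manager()));

        // Serialize access across processes with a named mutex.
        bip::named_mutex mtx(bip::open_or_create, "demo_mutex");
        {
            bip::scoped_lock<bip::named_mutex> lock(mtx);
            (*table)[42] = 3.14;   // set / update
            table->erase(7);       // delete (no-op if the key is absent)
        }
        return 0;
    }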


Can an OCaml program use more than one processor core?

Isn't it important to be able to do that in order to attain maximal speed?
Edit:
Clojure, for example, has pmap, which uses more than one core.
Dr. Harrop wrote (Jan. 9, 2011):
The new features being added to the language, such as first-class
modules in OCaml 3.12, are nowhere near as valuable as multicore
capability would have been.
Yes it can; for this, you should use a multi-processing model, where your program spawns multiple processes that do computation independently and then merge their results.
The simplest way to do this is to use the Unix.fork system call to fork your program into two processes. This is described, for example, in the online book Unix system programming in OCaml. If the computation you want to split across cores has a simple structure (iteration, mapping over a pool of inputs), Parmap is a library that will let you benefit from parallelism rather easily, just by changing some function calls in your application (if it is well structured already). If you want to do more sophisticated things (direct access to shared memory structures, message boxes...), the Ocaml-net project supports a lot of convenient features through the Netmulticore library.
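For illustration, here is a rough sketch of that fork-and-merge pattern written against the POSIX API in C++ (the OCaml equivalents are Unix.fork, Unix.pipe and Marshal; the work split shown is made up):

    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int fd[2];
        if (pipe(fd) != 0) return 1;           // channel for the child's result

        pid_t pid = fork();
        if (pid == 0) {                        // child: compute one half
            long partial = 0;
            for (long i = 0; i < 1000000; ++i) partial += i;
            write(fd[1], &partial, sizeof partial);
            _exit(0);
        }

        long mine = 0;                         // parent: compute the other half
        for (long i = 1000000; i < 2000000; ++i) mine += i;

        long theirs = 0;
        read(fd[0], &theirs, sizeof theirs);   // merge the child's result
        waitpid(pid, 0, 0);
        std::printf("total = %ld\n", mine + theirs);
        return 0;
    }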
If you want to do distributed programming (programs that run on a cluster of several machines), the OcamlMPI library provides support for the well-known distributed message passing framework MPI. There is also the more experimental and high-level JoCaml extension, that uses an interesting, more researchy approach to concurrent communication.
Note that if you have no specific performance constraints, or if your application is inherently sequential, it makes no sense to bother parallelizing the computation (at the cost of the extra book-keeping overhead of synchronization); in the latter case, Amdahl's law tells you the sequential part will dominate anyway.
If your parallel code generates a large amount of data then there is no easy way to get it back efficiently in OCaml. The traditional workaround is to fork processes and marshal results back to the parent process but your parent process then deserializes all of the data on a single core, reallocating everything on its own heap. That is very inefficient on a multicore and means that OCaml cannot express efficient implementations of most parallel algorithms including pmap.

MPI how to send and receive unknown datatypes

We have developed an algorithm library in C++ which allows the user to implement his own datatypes for sharing data between individual algorithms (also implemented by the user).
This works fine, but we want to provide parallelization at library level. The individual algorithms should be executed in parallel on different nodes of distributed memory machines.
We decided to use MPI for parallelization, as it can be used for distributed and shared memory machines without code changes.
Unfortunately, we are now struggling with the problem of how to distribute the user-implemented datatypes between the nodes. We have the following problems:
We do not know how big the data might be, it might even change from run to run.
We do not know what data is inside the data structure.
The amount of data can be very big, up to 1 GB (this should be no problem for MPI).
The user should not see any difference when implementing the datatypes or algorithms for parallel execution (for the algorithms there is actually no problem).
Is there a way to use MPI to share this data between the nodes, or are there other approaches that might be better suited to this kind of problem?
We would like to have a solution which works at least on shared memory machines however we would love to have a solution which works without code changes on shared and distributed memory machines.
Yes, you can do this with MPI, but no, MPI can't do it for you by itself.
Whether you're sending this data to another node or writing it to disk, at some point you need to explicitly describe the data structure's layout in memory so that it can be serialized. If you pass MPI (or any other communications library) a pointer, it doesn't know what lies on the other side of that pointer, and so it has no way of traversing the data structure to copy its contents.
You can marshal the arguments into plain old data (manually, or with things like MPI_Pack), or you can create an MPI datatype which describes the layout of the data in memory for that particular instance, and that will copy the data over. In addition, you'll need to redirect any pointers within the data structure. Boost serialization may be able to help you with all of this.
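As a hedged sketch of the "marshal into plain old data" route, something along these lines packs a variable-sized user record into a byte buffer and unpacks it on the receiver (UserRecord and its fields are invented for illustration; the MPI calls themselves are standard):

    #include <mpi.h>
    #include <vector>

    struct UserRecord {
        int id;
        std::vector<double> values;   // size unknown until runtime
    };

    void send_record(const UserRecord &r, int dest, MPI_Comm comm) {
        int n = static_cast<int>(r.values.size());
        int bytes = 0, tmp = 0, pos = 0;
        MPI_Pack_size(2, MPI_INT, comm, &tmp);    bytes += tmp;   // id + length
        MPI_Pack_size(n, MPI_DOUBLE, comm, &tmp); bytes += tmp;   // payload

        std::vector<char> buf(bytes);
        MPI_Pack(&r.id, 1, MPI_INT, buf.data(), bytes, &pos, comm);
        MPI_Pack(&n, 1, MPI_INT, buf.data(), bytes, &pos, comm);
        MPI_Pack(r.values.data(), n, MPI_DOUBLE, buf.data(), bytes, &pos, comm);
        MPI_Send(buf.data(), pos, MPI_PACKED, dest, 0 /*tag*/, comm);
    }

    UserRecord recv_record(int src, MPI_Comm comm) {
        MPI_Status st;
        MPI_Probe(src, 0, comm, &st);             // learn the packed size first
        int bytes = 0;
        MPI_Get_count(&st, MPI_PACKED, &bytes);

        std::vector<char> buf(bytes);
        MPI_Recv(buf.data(), bytes, MPI_PACKED, src, 0, comm, MPI_STATUS_IGNORE);

        UserRecord r;
        int n = 0, pos = 0;
        MPI_Unpack(buf.data(), bytes, &pos, &r.id, 1, MPI_INT, comm);
        MPI_Unpack(buf.data(), bytes, &pos, &n, 1, MPI_INT, comm);
        r.values.resize(n);
        MPI_Unpack(buf.data(), bytes, &pos, r.values.data(), n, MPI_DOUBLE, comm);
        return r;
    }

Note that this is exactly the per-type marshalling work MPI cannot do for you automatically; your library would need the user to supply (or generate) such pack/unpack hooks for each custom datatype.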

Main Memory Database with C++ Interface

I am looking for a main-memory database with a C++ interface. I am looking for a database with a programmatic query interface and preferably one that works with native C++ types. SQLite, for example, takes queries as strings and needs to perform parsing ... which is time-consuming. The operations I am looking for are:
Creation of tables of arbitrary dimensions (number of attributes) capable of storing integer types.
Support for insertion, deletion, selection, projection and (not a priority) joins.
The parsing time of SQLite isn't really that much (you can amortize it over many queries) unless you're substituting the values into the SQL query by hand. Substituting by hand is hard work, awkward, slow and probably unsafe too. Instead, you should be using bound parameters so that you can do things more directly (see http://www.sqlite.org/c3ref/bind_blob.html for the relevant API).
Note that if you switch to a different database, you will have the same issue; you only get high speed out of any SQL system by using bound parameters. (And consider not sweating over performance too much; the bits where it hits storage are the bottleneck…)
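For illustration, a minimal sketch of the bound-parameter pattern with the SQLite C API (table and column names are made up); the statement is parsed once and then re-executed with new values:

    #include <sqlite3.h>

    int main() {
        sqlite3 *db = 0;
        sqlite3_stmt *stmt = 0;
        sqlite3_open(":memory:", &db);
        sqlite3_exec(db, "CREATE TABLE t(a INTEGER, b INTEGER);", 0, 0, 0);

        // Parse the statement once...
        sqlite3_prepare_v2(db, "INSERT INTO t(a, b) VALUES(?1, ?2);", -1, &stmt, 0);

        // ...then execute it many times with different bound values,
        // amortizing the parse over the whole batch.
        sqlite3_exec(db, "BEGIN;", 0, 0, 0);
        for (int i = 0; i < 10000; ++i) {
            sqlite3_bind_int(stmt, 1, i);
            sqlite3_bind_int(stmt, 2, i * i);
            sqlite3_step(stmt);
            sqlite3_reset(stmt);
        }
        sqlite3_exec(db, "COMMIT;", 0, 0, 0);

        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }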
Try Boost.MultiIndex.
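A rough sketch of how Boost.MultiIndex can play the role of an in-memory table with a primary key and a secondary index (the Row struct and the choice of indices are only illustrative):

    #include <boost/multi_index_container.hpp>
    #include <boost/multi_index/ordered_index.hpp>
    #include <boost/multi_index/hashed_index.hpp>
    #include <boost/multi_index/member.hpp>

    struct Row { int id; int value; };

    namespace bmi = boost::multi_index;

    typedef bmi::multi_index_container<
        Row,
        bmi::indexed_by<
            bmi::ordered_unique<bmi::member<Row, int, &Row::id>>,      // "primary key"
            bmi::hashed_non_unique<bmi::member<Row, int, &Row::value>> // secondary index
        >> Table;

    int main() {
        Table t;
        t.insert(Row{1, 10});                     // insertion
        t.insert(Row{2, 10});
        t.get<0>().erase(1);                      // deletion by id
        auto range = t.get<1>().equal_range(10);  // selection by value
        (void)range;
        return 0;
    }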
BerkeleyDB (now owned by Oracle) can store data entirely in memory (though it was originally designed for disk storage). TimesTen (now also owned by Oracle) was designed from the beginning for in-memory storage. Both of them support both SQL and an API for direct access from C, C++, etc.

performance penalty of message passing as opposed to shared data

There is a lot of buzz these days about not using locks and using message-passing approaches like Erlang's, or about using immutable data structures as in functional programming vs. C++/Java.
But what I am concerned with is the following:
AFAIK, Erlang does not guarantee Message delivery. Messages might be lost. Won't the algorithm and code bloat and be complicated again if you have to worry about loss of messages? Whatever distributed algorithm you use must not depend on guaranteed delivery of messages.
What if the Message is a complicated object? Isn't there a huge performance penalty in copying and sending the messages vs. say keeping it in a shared location (like a DB that both processes can access)?
Can you really do away with shared state entirely? I don't think so. For example, in a DB you have to access and modify the same record. You cannot use message passing there. You need locking, or you assume optimistic concurrency control and then do rollbacks on errors. How does Mnesia work?
Also, it is not the case that you always need to worry about concurrency. Any project will also have a large piece of code that doesn't have to do anything with concurrency or transactions at all (but they do have performance and speed as a concern). A lot of these algorithms depend on shared states (that's why pass-by-reference or pointers are so useful).
Given this fact, writing programs in Erlang etc. is a pain because you are prevented from doing any of these things. Maybe it makes programs robust, but for things like solving a linear programming problem or computing the convex hull, performance is more important, and forcing immutability etc. on the algorithm when it has nothing to do with concurrency/transactions is a poor decision. Isn't it?
That's real life: you need to account for this possibility regardless of the language / platform. In a distributed world (the real world), things fail: live with it.
Of course there is a cost: nothing is free in our universe. But shouldn't you use another medium (e.g. file, db) instead of shuttling "big objects" in communication pipes? You can always use "message" to refer to "big objects" stored somewhere.
Of course not: the idea behind functional programming / Erlang OTP is to "isolate" as much as possible the areas where "shared state" is manipulated. Furthermore, having clearly marked places where shared state is mutated helps testability & traceability.
I believe you are missing the point: there is no such thing as a silver bullet. If your application cannot be successfully built using Erlang then don't do it. You can always build some other part of the overall system in another fashion, i.e. use a different language / platform. Erlang is no different from any other language in this respect: use the right tool for the right job.
Remember: Erlang was designed to help solve concurrent, asynchronous and distributed problems. It isn't optimized for working efficiently on a shared block of memory for example... unless you count interfacing with nif functions working on shared blocks part of the game :-)
Real-world systems are always hybrids anyway: I don't believe the modern paradigms try, in practice, to get rid of mutable data and shared state.
The objective, however, is not to need concurrent access to this shared state. Programs can be divided into the concurrent and the sequential, and use message-passing and the new paradigms for the concurrent parts.
Not all code will get the same investment: There is concern that threads are fundamentally "considered harmful". Something like Apache may need traditional concurrent threads, and a key piece of technology like that may be carefully refined over a period of years so it can blast away with fully concurrent shared state. Operating system kernels are another example where "solve the problem no matter how expensive it is" may make sense.
There is no benefit to fast-but-broken: But for new code, or code that doesn't get so much attention, it may be the case that it simply isn't thread-safe, and it will not handle true concurrency, and so the relative "efficiency" is irrelevant. One way works, and one way doesn't.
Don't forget testability: Also, what value can you place on testing? Thread-based shared-memory concurrency is simply not testable. Message-passing concurrency is. So now you have the situation where you can test one paradigm but not the other. So, what is the value in knowing that the code has been tested? And what is the danger in not even knowing whether the other code will work in every situation?
A few comments on the misunderstanding you have of Erlang:
Erlang guarantees that messages will not be lost, and that they will arrive in the order sent. A basic error situation is that machine A cannot speak to machine B. When that happens, process monitors and links will trigger, and system node-down messages will be sent to the processes that registered for them. Nothing will be silently dropped. Processes will "crash" and supervisors (if any) will try to restart them.
Objects cannot be mutated, so they are always copied. One way to secure immutability is by copying values to the other Erlang process's heap. Another way is to allocate objects in a shared heap, message references to them, and simply not have any operations that mutate them. Erlang does the first for performance! Real-time behaviour suffers if you need to stop all processes to garbage-collect a shared heap. Ask Java.
There is shared state in Erlang. Erlang is not proud of it, but it is pragmatic about it. One example is the local process registry which is a global map that maps a name to a process so that system processes can be restarted and claim their old name. Erlang just tries to avoid shared state if it possibly can. ETS tables that are public are another example.
Yes, sometimes Erlang is too slow. This happens in all languages. Sometimes Java is too slow. Sometimes C++ is too slow. Just because a tight loop in a game had to drop down to assembly to kick off some serious SIMD-based vector mathematics, you can't deduce that everything should be written in assembly because it is the only language that is fast when it matters. What matters is being able to write systems that have good performance, and Erlang manages quite well. See benchmarks on yaws or rabbitmq.
Your facts are not facts about Erlang. Even if you think Erlang programming is a pain, you will find other people creating some awesome software thanks to it. You should attempt writing an IRC server in Erlang, or something else very concurrent. Even if you're never going to use Erlang again, you will have learned to think about concurrency in another way. But of course you will use it again, because Erlang is awesomely easy.
Those that do not understand Erlang are doomed to re-implement it badly.
Okay, the original was about Lisp, but... it's true!
There are some implicit assumptions in your questions - you assume that all the data can fit on one machine and that the application is intrinsically localised to one place.
What happens if the application is so large it cannot fit on one machine? What happens if the application outgrows one machine?
You don't want to have one way of programming an application if it fits on one machine and a completely different way of programming it as soon as it outgrows one machine.
What happens if you want to make a fault-tolerant application? To make something fault-tolerant you need at least two physically separated machines and no sharing.
When you talk about sharing and databases you omit to mention that things like MySQL Cluster achieve fault-tolerance precisely by maintaining synchronised copies of the data on physically separated machines - there is a lot of message passing and copying that you don't see on the surface - Erlang just exposes this.
The way you program should not suddenly change to accommodate fault-tolerance and scalability.
Erlang was designed primarily for building fault-tolerant applications.
Shared data on a multi-core has its own set of problems - when you access shared data you need to acquire a lock - if you use a global lock (the easiest approach) you can end up stopping all the cores while you access the shared data. Shared data access on a multicore can also be problematic due to caching: if the cores have local data caches then accessing "far away" data (in some other processor's cache) can be very expensive.
Many problems are intrinsically distributed and the data is never available in one place at the same time - these kinds of problems fit well with the Erlang way of thinking.
In a distributed setting "guaranteeing message delivery" is impossible - the destination machine might have crashed. Erlang therefore cannot guarantee message delivery - it takes a different approach: the system will tell you if it failed to deliver a message (but only if you have used the link mechanism) - then you can write your own custom error recovery.
For pure number crunching Erlang is not appropriate - but in a hybrid system Erlang is good at managing how computations get distributed to available processors, so we see a lot of systems where Erlang manages the distribution and fault-tolerance aspects of the problem, but the problem itself is solved in a different language.
For e.g. in a DB, you have to access and modify the same record
But that is handled by the DB. As a user of the database, you simply execute your query, and the database ensures it is executed in isolation.
As for performance, one of the most important things about eliminating shared state is that it enables new optimizations. Shared state is not particularly efficient. You get cores fighting over the same cache lines, and data has to be written through to memory where it could otherwise stay in a register or in CPU cache.
Many compiler optimizations rely on absence of side effects and shared state as well.
You could say that a stricter language guaranteeing these things requires more optimizations to be performant than something like C, but it also makes these optimizations much much easier for the compiler to implement.
Many concerns similar to concurrency issues arise in single-threaded code. Modern CPUs are pipelined, execute instructions out of order, and can run 3-4 of them per cycle. So even in a single-threaded program, it is vital that the compiler and CPU are able to determine which instructions can be interleaved and executed in parallel.
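As a toy illustration of the cache-line point above (not from the original answer): two threads incrementing counters that share a cache line force that line to bounce between cores, while padding each counter onto its own line avoids the contention:

    #include <atomic>
    #include <thread>

    struct PaddedCounter {
        alignas(64) std::atomic<long> n;   // one counter per 64-byte cache line
    };

    PaddedCounter counters[2];             // padded: no false sharing
    // With a plain std::atomic<long> counters[2] instead, the two counters
    // would share a cache line and the loop below slows down noticeably.

    void work(int i) {
        for (long k = 0; k < 50000000; ++k)
            counters[i].n.fetch_add(1, std::memory_order_relaxed);
    }

    int main() {
        std::thread a(work, 0), b(work, 1);
        a.join();
        b.join();
        return 0;
    }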
For correctness, shared is the way to go, and keep the data as normalized as possible. For immediacy, send messages to inform of changes, but always back them up with polling. Messages get dropped, duplicated, re-ordered, delayed - don't rely on them.
If speed is what you're worried about, first do it single-thread and tune the daylights out of it. Then if you've got multiple cores and know how to split up the work, use parallelism.
Erlang provides supervisors and gen_server callbacks for synchronous calls, so you will know about it if a message isn't delivered: either the gen_server call returns a timeout, or your whole node will be brought down and up if the supervisor is triggered.
Usually, if the processes are on the same node, message-passing languages optimise away the data copying, so it's almost like shared memory, except if the object is changed and used by both afterwards, which cannot be done using shared memory either anyway.
There is some state which is kept by processes by passing it around to themselves in their recursive tail-calls, and some state can of course be passed through messages. I don't use Mnesia much, but it is a transactional database, so once you have passed the operation to Mnesia (and it has returned) you are pretty much guaranteed it will go through.
Which is why it is easy to tie such applications into Erlang with the use of ports or drivers. The easiest are ports; a port is much like a Unix pipe, though I think the performance isn't that great... and as said, message passing usually ends up just being pointer passing anyway, as the VM/compiler optimises the memory copy out.

Best way for interprocess communication in C++

I have two processes; one will query the other for data. There will be a huge number of queries in a limited time (10,000 per second) and a lot of data (>100 MB) will be transferred per second. The data consists of simple numeric types (double, int).
My question is: in which way should I connect these processes?
Shared memory, message queues, LPC (Local Procedure Call), or something else?
I would also like to ask which library you suggest. By the way, please do not suggest MPI.
Edit: under Windows XP 32-bit.
One word: Boost.Interprocess. If it really needs to be fast, shared memory is the way to go. You have nearly zero overhead, as the operating system does the usual mapping between virtual and physical addresses and no copy is required for the data. You just have to look out for concurrency issues.
For actually sending commands like shutdown and query, I would use message queues. I previously used localhost network programming to do that, and used manual shared memory allocation, before I knew about Boost. Damn, if I had to rewrite the app, I would immediately pick Boost. Boost.Interprocess makes this much easier for you. Check it out.
I would use shared memory to store the data, and message queues to send the queries.
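A minimal sketch of that split using a Boost.Interprocess message queue for the query side (the queue name and the Query struct are placeholders; the bulk data would live in a shared-memory segment as above):

    #include <boost/interprocess/ipc/message_queue.hpp>

    namespace bip = boost::interprocess;

    struct Query { int kind; int key; };   // fixed-size POD, safe to copy byte-wise

    int main() {
        // The server would normally create the queue; open_or_create keeps the sketch short.
        bip::message_queue mq(bip::open_or_create, "query_queue",
                              1024 /*max messages*/, sizeof(Query) /*max message size*/);

        Query q = {1, 42};
        mq.send(&q, sizeof q, 0 /*priority*/);           // client: post a query

        Query in = {0, 0};
        bip::message_queue::size_type received = 0;
        unsigned int priority = 0;
        mq.receive(&in, sizeof in, received, priority);  // server: pick it up
        return 0;
    }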
I'll second Marc's suggestion -- I'd not bother with boost unless you have a portability concern or want to do cool stuff like map standard container types over shared memory (in which case I'd definitely use boost).
Otherwise, message queues and shared memory are pretty simple to deal with.
If your data consists of multiple types and/or you need things like mutex, use Boost.
Else use a shared section of memory using #pragma data_seg or a memory mapped file.
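For the memory-mapped-file route on Windows, a hedged sketch looks roughly like this (the mapping name and SharedBlock layout are placeholders; both processes open the same named mapping backed by the page file and see the same bytes):

    #include <windows.h>

    struct SharedBlock { double values[1024]; };

    int main() {
        HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                      0, sizeof(SharedBlock), "Local\\MySharedBlock");
        if (!h) return 1;

        SharedBlock *block = static_cast<SharedBlock*>(
            MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, sizeof(SharedBlock)));
        if (!block) { CloseHandle(h); return 1; }

        block->values[0] = 3.14;     // visible to the other process immediately

        UnmapViewOfFile(block);
        CloseHandle(h);
        return 0;
    }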
If you do use shared memory, you will have to decide whether or not to spin. I'd expect that if you use a semaphore for synchronization and store the data in shared memory, you will not get much performance benefit compared to using message queues (at a significant cost in clarity), but if you spin on an atomic variable for synchronization, then you have to suffer the consequences of that.