Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I'm looking for a framework / approach to do message passing distributed computation in C++.
I've currently got an iterative, single-threaded algorithm that incrementally updates some data model. The updates are literally additive, and I'd like to distribute (or at least parallelize) this computation over as many machines and cores as possible. The data model can be viewed as a big array of (independent) floating-point values.
Since the updates are all additive (i.e. commutative and associative), it's OK to merge in updates from other nodes in arbitrary order or even to batch merge updates. When it comes to applying updates, the map/reduce paradigm would work fine.
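Concretely, because each update is just "+= delta at some index", any interleaving or batching of updates from different nodes yields the same final model. A minimal sketch of that invariant (the types and names here are made up for illustration):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sparse additive update: a delta to add at one index of the model.
struct Update {
    std::size_t index;
    double delta;
};

// Applying a batch is just "+=" at each index.  Because addition is commutative
// and associative, batches from different nodes can be merged in any order
// (or pre-summed into bigger batches) without changing the final model.
void apply(std::vector<double>& model, const std::vector<Update>& batch) {
    for (const Update& u : batch)
        model[u.index] += u.delta;
}
```

Applying batch A then B yields the same model as B then A, which is exactly what makes arbitrary-order merging (and map/reduce-style batching) safe.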
On the other hand, the updates are computed with respect to the current model state. Each step "corrects" some flaw, so it's important that the model used for computing the update is as fresh as possible (the more out of date the model, the less useful the update). Worst case, the updates are fully dependent, and parallelism doesn't do any good.
I've never implemented anything flexibly distributable, but this looks like a prime candidate. So, I'm looking for some framework or approach to distribute the updates (which consist mostly of floating point numbers and a few indexes into the array to pinpoint where to add the update). But, I'm unsure as to how:
I can broadcast updates to all connected processes. But that means massive network traffic, so I'd realistically need to batch updates; and then updates will be less current. This doesn't look scalable anyhow.
I can use some kind of ring topology. Basically, a machine sends the next machine the sum of its own updates and those of its predecessors. But then I'd need to figure out how not to duplicate updates; after all, the ring is circular, and eventually a machine's own updates will arrive back as part of the sum from its predecessors.
or some kind of tree structure...
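One way to make the ring work without double-counting is to tag every batch with the node that originated it and drop the batch once it arrives back at its origin after a full circle. A toy serial simulation of that rule (the payload is simplified to a single number; this is a sketch of the bookkeeping, not a real protocol):

```cpp
#include <vector>

// Hypothetical ring: node i forwards a batch to node (i+1) % n.  Each batch
// carries its origin; every node applies and forwards what it receives, but
// the batch is dropped once it returns to its origin, so no update is
// applied twice anywhere.
struct Batch {
    int origin;
    double delta;   // stand-in for the real (index, value) payload
};

std::vector<double> circulate(int n) {
    std::vector<double> model(n, 0.0);        // one model replica per node
    // Each node injects one batch containing its own update (value = origin+1).
    for (int origin = 0; origin < n; ++origin) {
        Batch b{origin, double(origin + 1)};
        model[origin] += b.delta;             // apply locally first
        for (int hop = (origin + 1) % n; hop != origin; hop = (hop + 1) % n)
            model[hop] += b.delta;            // apply at each node, then forward
        // the batch is now back at its origin: drop it instead of re-applying
    }
    return model;
}
```

After a full circulation, every replica has seen every update exactly once.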
To recap, to get decent convergence performance, low latency is critical; the longer between update computation and update application, the less useful the update is. Updates need to be distributed to all nodes as quickly as possible; but because of the commutative and associative nature of the updates, it doesn't matter whether these updates are individually broadcast (probably inefficient) or arrive as part of a merged batch.
Does anybody know of any existing frameworks or approaches to speed up development? Or even just general pointers? I've never done anything quite like this...
You probably want MPI (Message Passing Interface). It's essentially the industry standard for distributed computing. There are many implementations, but I would recommend OpenMPI because it's both free and highly regarded. It provides a C API to pass messages between nodes, and also provides higher-level operations like broadcast, all-to-all, reduce, and scatter/gather. It works over TCP, as well as faster, lower-latency interconnects like InfiniBand or Myrinet, and supports various topologies.
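For the additive-update pattern in the question, the collective that fits is MPI_Allreduce with the MPI_SUM operation: each rank contributes its local update vector, and every rank receives the elementwise sum. The sketch below simulates that semantics serially (so it needs no MPI installation), just to show what the call computes:

```cpp
#include <cstddef>
#include <vector>

// What MPI_Allreduce(sendbuf, recvbuf, n, MPI_DOUBLE, MPI_SUM, comm) computes:
// every rank ends up with the elementwise sum of all ranks' send buffers.
// Simulated here serially over a vector of per-rank update arrays.
std::vector<double> allreduce_sum(const std::vector<std::vector<double>>& per_rank) {
    std::size_t n = per_rank.front().size();
    std::vector<double> result(n, 0.0);
    for (const auto& contribution : per_rank)
        for (std::size_t i = 0; i < n; ++i)
            result[i] += contribution[i];
    return result;   // in real MPI, every rank receives this same vector
}
```

In a real run, each rank would pass its local batch of deltas and merge the returned sum into its model replica.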
There is also a Boost wrapper around MPI (Boost.MPI) that will provide you with a more C++ friendly interface.
Are you looking for something like Boost.MPI?
I have been exploring parallel I/O in Chapel. The Chapel documentation mentions a parallel I/O flag, and that channels can work in parallel. But I don't see anything else.
I don't have a particular question in mind, but just want to know more about it.
Could the Chapel team, or experienced Chapel practitioners, discuss an example of proper use of Chapel's parallel I/O paradigm?
Parallel I/O can mean different things to different people, which makes it challenging to have a single, simple answer to this (though it might suggest that the Chapel project should add a parallel I/O landing page to its documentation which would point to other resources?). For example, "parallel I/O" could mean:
using multiple tasks (on a single node or across multiple nodes) to write to a single file
using multiple tasks to write to multiple files
using a parallel file system of some sort
Another important factor is the desired file format: text, binary, or a specific file format like HDF5, NetCDF, etc.
Generally speaking, the explicit way of doing parallel I/O in Chapel is to create a number of tasks using Chapel's language features for expressing parallelism (e.g., coforall, cobegin, or begin), and then to give each task its own channel to read from / write to. If all of the channels refer to a single file, the tasks would likely need to coordinate between themselves to make sure they were writing to / reading from disjoint segments of the file. If each channel refers to its own file, such coordination wouldn't be necessary.
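The disjoint-segment coordination described above can be sketched outside Chapel, too. Here is a C++/POSIX version of the same pattern, with threads standing in for Chapel tasks and pwrite standing in for a per-task channel positioned at a computed offset (the offset arithmetic is the part that carries over):

```cpp
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <thread>
#include <unistd.h>
#include <vector>

// Each worker owns a disjoint byte range [id * chunk, (id+1) * chunk) of the
// file, so no locking is needed: pwrite writes at an explicit offset without
// touching a shared file position.
void write_chunks(const char* path, int workers, std::size_t chunk) {
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    std::vector<std::thread> pool;
    for (int id = 0; id < workers; ++id)
        pool.emplace_back([=] {
            std::vector<char> buf(chunk, char('A' + id));   // this worker's data
            pwrite(fd, buf.data(), buf.size(), off_t(id) * off_t(chunk));
        });
    for (auto& t : pool) t.join();
    close(fd);
}
```

In the Chapel version, each task would instead open its own channel on the shared file and advance it to the same computed offset before writing.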
The other major way to get parallel I/O in Chapel is implicit, by invoking a library routine where the parallelism is created and managed within the routine itself—either using techniques like the above for routines written in Chapel, or by calling out to an external parallel function (e.g., a parallel I/O routine from a C library).
Finally, you could create multiple tasks that call serial (or parallel) I/O library routines simultaneously.
For an example of the first, explicit approach, see this sample program that I recently put together in response to a similar question. It declares a 2D array whose rows are block-distributed and then uses a task per locale (compute node) to write that locale's sub-array out to a single/shared binary-format file. It then does a similar thing to read the data back into a second array and verifies that the two arrays match. In both cases, each task advances its channel to the appropriate file offset corresponding to the values it wants to write/read.
Examples of the library-based approach to parallel I/O include the hdf5WriteDistributedArray() routine which logically does something very similar to the previous example, yet using the HDF5 file format. Or, the readAllHDF5Files() routine is an example of a library routine that reads from multiple files in parallel.
I think it's safe to say that Chapel should support many more library routines to help with parallel I/O than it has today. The main challenge is knowing which patterns and formats from the space outlined above will be most important to users. We're always open to requests and input in this regard.
I want to implement a fast database alternative that only needs to handle binary data.
To be specific, I want something close to a database whose contents survive even a forced termination (via Task Manager) during execution, while also being accessible directly from memory in C++. Like a vector of structs that is mirrored to the hard disk. It should be able to handle hundreds of thousands of read accesses and at least 1000 write accesses per second. In case of a forced termination, at most the last command may be lost. It does not need to support multithreading, and the database file will only be accessed by a single instance of the program. It only needs to run on Windows. These are the solutions I've thought of so far:
SQL Databases
Advantages
Easy to implement, since lots of libraries are available
Disadvantages
The server runs in a different process, therefore possibly slow inter-process communication
Necessity of parsing SQL queries
Built for multithreaded environments, so lots of unnecessary synchronization
Rows can't be directly accessed using pointers but need to be copied at least twice per change
Unnecessary delays on UPDATE queries, since the whole table needs to be searched and the WHERE clause checked
These were just a few from the top of my head, there might be a lot more
Memory Mapped Files
Advantages
Direct memory mapping, so direct pointer access possible
Very fast compared to databases
Disadvantages
Forceful termination could lead to a whole page not being written
Lots of code (I don't actually mind that)
No forced synchronization possible
Increasing file size might take a lot of time
C++ vector*
Advantages
Direct pointer access possible; however, the caller needs to manually notify of changes
Very fast compared to databases
Total programming freedom
Disadvantages
Possibly slow because of many calls to WriteFile
Lots of code (I don't actually mind that)
C++ vector with complete write every few seconds
Advantages
Direct pointer access possible
Very fast compared to databases
Total programming freedom
Disadvantages
Lots of unchanged data being rewritten to the file; alternatively, lots of RAM wasted on preventing unnecessary writes
Inaccessibility during writes, or lots of RAM wasted on a copy
Could lose multiple seconds worth of data
Multiple threads and therefore synchronization needed
*Basically, a wrapper class that either exposes only per-row read/write functionality of a vector, or allows direct writes to memory but relies on the caller to notify it of changes. All reads are served from a copy in memory; all writes go both to the in-memory copy and to the file itself, on a per-command basis.
Also, is it possible to write to different parts of a file without flushing, and then flush all changes at once with a guarantee that the file will be written either completely or not at all, even in case of a forced termination during the write? All I can think of is the following workflow:
Duplicate target file on startup, then for every set of data:
Write all changes to duplicate -> Flush by replacing original with duplicate
However, I feel like this would be a horrible waste of hard disk space for big files.
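For what it's worth, the duplicate-and-swap workflow described above is the standard atomic-replace pattern: because a same-filesystem rename is atomic, a crash leaves either the old file or the new one, never a torn mix. A portable sketch (on Windows, ReplaceFile or MoveFileEx gives the same guarantee; the temp-file naming here is made up):

```cpp
#include <cstdio>
#include <filesystem>
#include <string>

// Write the complete new contents to a temporary file, flush it, then rename
// it over the original.  rename() within one filesystem is atomic, so a
// forced termination leaves either the old file or the new one, never a mix.
void atomic_replace(const std::string& path, const std::string& contents) {
    std::string tmp = path + ".tmp";           // hypothetical temp-file naming
    std::FILE* f = std::fopen(tmp.c_str(), "wb");
    std::fwrite(contents.data(), 1, contents.size(), f);
    std::fflush(f);                            // push data to the OS...
    std::fclose(f);                            // (FlushFileBuffers/fsync for real durability)
    std::filesystem::rename(tmp, path);        // atomic swap
}
```

As noted, this rewrites the whole file each time; the usual way to avoid that for big files is an append-only journal of commands that is compacted periodically.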
Thanks in advance for any input!
I want to create an application that stores data in memory, but I don't want the data to be lost even if my app crashes.
What concept should I use?
Should I use shared memory, or is there some other concept that suits my requirement better?
You are asking for persistence (or even orthogonal persistence) and/or for application checkpointing.
This is not possible (at least through portable C++ code) in the general case for some arbitrary existing C++ code, e.g. because of ASLR, because of pointers on (or to) the local call stack, because of multi-threading, because of external resources (sockets, opened files, ...), and because the current continuation cannot be accessed, restored, and handled in standard C++.
However, you might design your application with persistence in mind. This is a strong architectural requirement. You could for instance have every class contain some dumping method and its load factory function. Beware of shared pointers, and take into account that you could have cyclic references. Study garbage collection algorithms (e.g. in the Gc HandBook) which are similar to those needed for persistence (a copying GC is quite similar to a checkpointing algorithm).
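The per-class dump method plus load factory suggested above might look like the following minimal sketch (the class, the text format, and the method names are all made up for illustration):

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical persistable class: dump() writes a flat text form, and the
// static load() factory reconstructs an instance from it.
class Counter {
public:
    explicit Counter(int value) : value_(value) {}

    void dump(std::ostream& out) const { out << "Counter " << value_ << '\n'; }

    static Counter load(std::istream& in) {
        std::string tag;
        int value = 0;
        in >> tag >> value;          // real code would validate the tag
        return Counter(value);
    }

    int value() const { return value_; }

private:
    int value_;
};
```

For an object graph with shared or cyclic references you would also dump stable object IDs and resolve them in a second pass, which is where the GC-style traversal algorithms mentioned above come in.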
Look also into serialization libraries (like libs11n). You might also consider persisting into a textual format (e.g. JSON), perhaps inside some SQLite database (or some real database like PostgreSQL or MongoDB...). I am doing this (in C) in my monimelt software.
You might also consider checkpointing libraries like BLCR.
The important thing is to think about persistence & checkpointing very early at design time. Thinking of your application as some specialized bytecode interpreter or VM might help (notably if you want to persist continuations, or some form of "call stack").
You could fork your process (assuming you are on Linux or Posix) before persistence. Hence, persistence time does not matter that much (e.g. if you persist every hour or every ten minutes).
Some language implementations are able to persist their entire state (notably their heap), e.g. SBCL (a good Common Lisp implementation) with its save-lisp-and-die, or Poly/ML -an ML dialect- with its SaveState, or Squeak (a Smalltalk implementation).
See also this answer & that one. J.Pitrat's blog has a related entry: CAIA as a sleeping beauty.
Persistency of data with code (e.g. vtables of objects, function pointers) might be technically difficult. dladdr(3) -with dlsym- might help (and, if you are able to code machine-specific things, consider the old getcontext(3), but I don't recommend that). Avoid name mangling (for dlsym) by declaring extern "C" all code related to persistence. If you want to persist some data and be able to restart from it with a slightly modified program (e.g. a small bugfix) things are much more complex.
More pragmatically, you could have a class representing your entire persistable state, and implement methods to persist (and reload it). You would then persist only at certain steps of your algorithm (e.g. if you have a main loop or an event loop, at start of that loop). You probably don't want to persist too often (e.g. because of the time and disk space required to persist), e.g. perhaps every ten minutes. You might perhaps consider some transaction log if it fits in the overall picture of your application.
Use memory-mapped files (mmap, https://en.wikipedia.org/wiki/Mmap) and allocate all your structures inside the mapped memory region. For a file-backed shared mapping, the system will write the mapped pages back to disk even if your app crashes.
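A minimal POSIX sketch of this suggestion, assuming a file-backed MAP_SHARED mapping (on Windows the equivalents are CreateFileMapping and MapViewOfFile):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map a file and place a struct directly in the mapped region.  With
// MAP_SHARED, dirty pages belong to the file: the kernel writes them back
// even if the process is killed, so the data survives a crash (msync can
// force the write-back earlier).
struct State { int counter; };

State* map_state(const char* path) {
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0) return nullptr;
    (void)ftruncate(fd, sizeof(State));        // make sure the file is big enough
    void* p = mmap(nullptr, sizeof(State), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                                 // the mapping keeps the file alive
    return p == MAP_FAILED ? nullptr : static_cast<State*>(p);
}
```

Note the caveat from the earlier question, though: a crash mid-update can leave a partially written page, so this on its own does not give transactional guarantees.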
In a few months I will start to write my bachelor-thesis. Although we only discussed the topic of my thesis very roughly, the main problem will be something like this:
A program written in C++ (more or less an HTTP server, but I guess it doesn't matter here) has to be executed to fulfill its task. There are several instances of this program running at the same time, and a load balancer takes care of equal distribution of HTTP requests between all instances. Every time the program's code is changed to enhance it, or to get rid of bugs, all instances have to be restarted. This can take up to 40 minutes for one instance. As there are more than ten instances running, the restart process can take up to one work day. This is way too slow.
The presumed bottleneck is the access to the database during startup to load all necessary data (I guess it will be a MySQL database). The team leader's idea for decreasing the startup time is to serialize the content of the database to a file, and read from this file instead of reading from the database. That would be my task. Of course, the problem is to check whether there is new data in the database that is not in the file. I guess write processes are still applied to the database, not to the serialized file. My first idea is to use Apache Thrift for serialization and deserialization, as I have already worked with it and it is fast, as far as I know (maybe I'll write some small Python program to take care of this). However, I have some basic questions regarding this problem:
Is it a good solution to read from a file instead of reading from the database? Is there any chance this will save time?
Would Thrift work well in this scenario, or is there some faster way to serialize/deserialize?
As I am only reading, not writing, I don't have to take care of consistency, right?
Can you recommend some books or online literature worth reading on this topic?
If I'm missing information, just ask. Thanks in advance. I just want to be well informed and prepared before I start with the thesis, which is why I ask.
Kind regards
Michael
Cache is king
As a general recommendation: Cache is king, but don't use files.
Cache? What cache?
The cache I'm talking about is of course an external cache. There are plenty of systems available, a lot of them are able to form a cache cluster with cached items spread across multiple machine's RAM. If you are doing it cleverly, the cost of serializing/deserializing into memory will make your algorithms shine, compared to the cost of grinding the database. And on top of that, you get nice features like TTL for cached data, a cache that persists even if your business logic crashes, and much more.
What about consistency?
As I am only reading, not writing, I don't have to take care of consistency, right?
Wrong. The issue is not, who writes to the database. It is about whether or not someone writes to the database, how often this happens, and how up-to-date your data need to be.
Even if you cache your data into a file as planned in your question, you have to be aware that this produces a redundant data duplicate, disconnected from the original data source. So the real question you have to answer (I can't do this for you) is what the optimum update frequency should be. Do you need immediate updates in near-real-time? Is a certain time lag acceptable?
This is exactly the purpose of the TTL (time to live) value that you can put onto your cached data. If you need more frequent updates, set a short TTL. If you are ok with updates in a slower frequency, set the TTL accordingly or have a scheduled task/thread/process running that does the update.
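To illustrate the TTL mechanic only (a real external cache such as Redis or Memcached enforces expiry server-side; this toy in-process version just shows the idea):

```cpp
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Toy TTL cache: each entry remembers its expiry time; get() treats expired
// entries as misses, which is the moment you would re-fetch from the database.
class TtlCache {
public:
    void put(const std::string& key, const std::string& value,
             std::chrono::seconds ttl) {
        entries_[key] = {value, Clock::now() + ttl};
    }

    std::optional<std::string> get(const std::string& key) const {
        auto it = entries_.find(key);
        if (it == entries_.end() || Clock::now() >= it->second.expires)
            return std::nullopt;               // miss or expired
        return it->second.value;
    }

private:
    struct Entry { std::string value; Clock::time_point expires; };
    std::unordered_map<std::string, Entry> entries_;
};
```

With Redis you would get the same behavior by setting a key with EXPIRE (or SET with the EX option) and treating a nil reply as a cache miss.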
Ok, understood. Now what?
Check out Redis, or the "oldtimer" Memcached. You didn't say much about your platform, but there are Linux and Windows versions available for both (and especially on Windows you will have a lot more fun with Redis).
PS: Oh yes, Thrift serialization can be used for the serialization part.
I am quite excited by the possibility of using languages which have parallelism/concurrency built in, such as Stackless Python and Erlang, and have a firm belief that we'll all have to move in that direction before too long, or will want to, because it will be a good/easy way to achieve scalability and performance.
However, I am so used to thinking about solutions in a linear/serial/OOP/functional way that I am struggling to cast any of my domain problems in a way that merits using concurrency. I suspect I just need to unlearn a lot, but I thought I would ask the following:
Have you implemented anything reasonably large in stackless or erlang or other?
Why was it a good choice? Was it a good choice? Would you do it again?
What characteristics of your problem meant that concurrent/parallel was right?
Did you re-cast an existing problem to take advantage of concurrency/parallelism? and
if so, how?
Anyone any experience they are willing to share?
In the past, when desktop machines had a single CPU, parallelization only applied to "special" parallel hardware. But these days desktops usually have from 2 to 8 cores, so now parallel hardware is the standard. That's a big difference, and therefore it is not just about which problems suggest parallelism, but also about how to apply parallelism to a wider set of problems than before.
In order to take advantage of parallelism, you usually need to recast your problem in some ways. Parallelism changes the playground in many ways:
You get the data coherence and locking problems. So you need to try to organize your problem so that you have semi-independent data structures which can be handled by different threads, processes and computation nodes.
Parallelism can also introduce nondeterminism into your computation, if the relative order in which the parallel components do their jobs affects the results. You may need to protect against that, and define a parallel version of your algorithm which is robust against different scheduling orders.
When you transcend intra-motherboard parallelism and get into networked / cluster / grid computing, you also get the issues of network bandwidth, network going down, and the proper management of failing computational nodes. You may need to modify your problem so that it becomes easier to handle the situations where part of the computation gets lost when a network node goes down.
Before we had operating systems, people building applications would sit down and discuss things like:
how will we store data on disks
what file system structure will we use
what hardware will our application work with
etc, etc
Operating systems emerged from collections of 'developer libraries'.
The beauty of an operating system is that your UNWRITTEN software has certain characteristics; it can:
talk to permanent storage
talk to the network
run in a command line
be used in batch
talk to a GUI
etc, etc
Once you have shifted to an operating system - you don't go back to the status quo ante...
Erlang/OTP (i.e., not just Erlang the language) is an application system: it runs on two or more computers.
The beauty of an APPLICATION SYSTEM is that your UNWRITTEN software has certain characteristics; it can:
fail over between two machines
work in a cluster
etc, etc...
Guess what: once you have shifted to an application system, you don't go back either...
You don't have to use Erlang/OTP; Google has a good application system in its App Engine, so don't get hung up on the language syntax.
There may well be good business reasons to build on the Erlang/OTP stack not the Google App Engine - the biz dev guys in your firm will make that call for you.
The problems will stay almost the same in the future, but the underlying hardware for realizing them is changing. To use it, the way of communication between objects (components, processes, services, whatever you call them) will change. Messages will be sent asynchronously, without waiting for a direct response. Instead, after a job is done, the process will call the sender back with the answer. It's like people working together.
I'm currently designing a lightweighted event-driven architecture based on Erlang/OTP. It's called Tideland EAS. I'm describing the ideas and principles here: http://code.google.com/p/tideland-eas/wiki/IdeasAndPrinciples. It's not ready, but maybe you'll understand what I mean.
mue
Erlang makes you think of the problem in parallel. You won't forget it for one second. After a while you adapt. Not a big problem. Except the solution becomes parallel in every little corner. All other languages you have to tweak to be concurrent. And that doesn't feel natural. Then you end up hating your solution. Not fun.
The biggest advantage Erlang has is that it has no global garbage collection. It will never take a break. That is kind of important when you have 10,000 page views a second.