I'm writing a specialized distributed storage system using akka clustering and would like to send large payloads (>1MB byte arrays) between actors. I found that I had to edit akka.remote.netty.tcp.maximum-frame-size to enable this.
My question is: are there any other performance implications that I need to take into account for this? For example, do I need to further tune netty buffer sizes? Is there a way to minimize the number of copies created?
One thing to keep in mind is that large messages can cause head-of-line blocking: system messages such as heartbeats get stuck behind the large message while it is being sent. In general it is a good idea to split large inter-node messages into smaller ones to avoid this.
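A minimal sketch of the splitting idea, written in plain Python rather than Akka. The `Chunk` type, the `transfer_id` field and the 64 KiB chunk size are assumptions made for illustration, not part of any Akka API:

```python
from dataclasses import dataclass
from typing import Iterator, List

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen to stay well below the frame limit


@dataclass(frozen=True)
class Chunk:
    transfer_id: str  # identifies which payload this chunk belongs to
    index: int        # position of this chunk within the payload
    total: int        # number of chunks the payload was split into
    data: bytes


def split_payload(transfer_id: str, payload: bytes,
                  chunk_size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Yield the payload as a sequence of numbered chunks."""
    # Ceiling division; an empty payload still produces one (empty) chunk.
    total = max(1, -(-len(payload) // chunk_size))
    for index in range(total):
        start = index * chunk_size
        yield Chunk(transfer_id, index, total, payload[start:start + chunk_size])


def reassemble(chunks: List[Chunk]) -> bytes:
    """Rebuild the payload; chunks may arrive in any order."""
    if not chunks:
        raise ValueError("no chunks received")
    total = chunks[0].total
    by_index = {c.index: c.data for c in chunks}
    if set(by_index) != set(range(total)):
        raise ValueError("missing chunks")
    return b"".join(by_index[i] for i in range(total))
```

Each chunk would travel as its own message, so a heartbeat can be interleaved between two chunks instead of waiting for the whole payload.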
We are currently working on a new remoting subsystem for Akka which contains a separate large messages channel between nodes and where we also have a separation of regular messages and system messages to avoid those interfering with each other.
I'm designing a middle layer for an application that will receive up to ~5000 requests every few seconds and needs to retrieve information from a database. I've been looking at using the Play Framework (I use Scala for my REST API design), as it is said to be fully async and built on Akka. However, the main bottleneck of any solution seems to be reads/writes to the database. Many databases cannot support simultaneous reads/writes at such a scale. How is such high concurrency achieved for an app like this? I would guess Facebook, Twitter and other big companies have achieved this for their applications, as millions of people may be using them concurrently.
As Tim's comment was saying caching may or may not be able to help in your case. If not I would also recommend looking into horizontally scalable databases, for example cockroachdb if you want a transactional SQL db. Otherwise there are many no-sql choices a la mongodb etc. And if you really want to stick to traditional SQL systems you'll have to vertically scale your servers (buy the most expensive hardware) and work with read-replicas.
A huge component is your data model and query access pattern. If each query increments a shared counter that has to be synchronized, there will be a ton of contention. At the other end of the spectrum, if each query touches completely separate data, there will be a lot less contention.
I think there are a couple of dimensions I would consider:
Data Schema and Access Patterns (discussed above)
Language Choice
This is important because if you are in a web server context and use prefork, by default each process may have its own connection to the database. In an environment like Python or Ruby you may need hundreds of processes to handle your load. Contrast this with Akka or another async-networking-based runtime (Node, Python gevent/asyncio, Go, etc.) where a single instance with a small thread pool can handle a large number of requests. Each has its tradeoffs.
Distributed Systems
Depending on your data schema and access patterns, 5000 requests per second to an RDBMS is completely achievable. It would probably require relatively beefy hardware, but I've personally done it a number of times. Getting to larger scales requires more computers in order to distribute the load. If your workload is read heavy and you can tolerate potentially stale reads, a read replica is one option. With another machine in the mix, reads are distributed over 2 machines but writes are still directed at a single machine (the leader). Caching is another option.
At much higher workloads some sort of partitioning needs to occur in order to overcome the constraints of a single machine. https://github.com/vitessio/vitess
Many of the big contenders have solutions to horizontally scaling their databases. This has many drawbacks as well and will require careful planning.
The one thing I'd recommend is that, if 5000 requests per second is projected for the near future, you start with the minimal amount of hardware necessary (a single instance), because query patterns and operations get far more complicated with a distributed database.
I am new to Akka and am trying to see if it solves the problems I am facing. I have data to extract from databases, transform with algorithms, and send between actors. This involves a lot of computing.
Can Akka handle all this (communication and computing)? Or do I have to call upon another tool to manage the computation part?
Thank you all.
Well, all I can offer here is my experience. As a matter of fact I am currently working on something similar (i.e an ETL with text files). We're essentially taking a lot of text files and loading their lines up into a PostgreSQL database. This is our setup :
Intel Xeon 8 cores + SSD
Files and app on the same machine
Remote database
We're able to fetch, parse and load 26 million file lines and create specific database indices in about 12 minutes, which is about 1.3GB worth of files and 3GB in the database. On a much weaker single-core and HDD setup we can do it in about 40 minutes.
The good thing about Akka is that it will allow you to save up resources and scale more since several actors can share one thread.
Akka can easily handle many millions of message sends per second; there is an oldie but goodie on this topic in this letitcrash.com post. As long as you factor out blocking operations into separate dispatchers (thread pools), the actor model eases parallel computations a lot, which of course gives you nice wall-clock time in such data crunching apps.
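To illustrate the "blocking work on its own thread pool" idea outside of Akka, here is a rough Python analogue using asyncio. It is not Akka code: `blocking_load` is a hypothetical stand-in for a blocking database call, and the pool size of 4 is an arbitrary assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def blocking_load(batch: Sequence[str]) -> int:
    """Hypothetical stand-in for a blocking call, e.g. a synchronous DB insert."""
    time.sleep(0.05)  # simulates waiting on I/O
    return len(batch)


async def run_pipeline(batches: List[Sequence[str]], pool_size: int = 4) -> int:
    """Run every blocking load on a dedicated pool and return the total rows loaded."""
    loop = asyncio.get_running_loop()
    # Dedicated pool: blocking calls never occupy the event loop thread,
    # which stays free for the non-blocking part of the application.
    with ThreadPoolExecutor(max_workers=pool_size) as blocking_pool:
        futures = [loop.run_in_executor(blocking_pool, blocking_load, batch)
                   for batch in batches]
        return sum(await asyncio.gather(*futures))
```

Calling `asyncio.run(run_pipeline(batches))` runs the loads concurrently on the dedicated pool. In Akka the equivalent step would be assigning the blocking actors to their own dispatcher in the configuration.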
Problem:
Process a backlog of messages where each message has three headers "service", "client", and "stream". I want to process the backlog of messages with maximum concurrency, but I have some requirements:
Only 10 messages with the same service can be processing at once.
Only 4 messages with the same service AND client can be processing at once.
All messages with the same service AND client AND stream must be kept in order.
Additional Information:
I've been playing around with "maxConcurrentConsumers" along with the "JMSXGroupID" in a ServiceMix (Camel + ActiveMQ) context, and I seem to be able to get 2 out of 3 of my requirements satisfied.
For example, if I do some content-based routing to split the backlog up into separate "service" queues (one queue for each service), then I can set the JMSXGroupID to (service + client + stream), and set maxConcurrentConsumers=10 on routes consuming from each queue. This solves the first and last requirements, but I may have too many messages for the same client processing at the same time.
Please note that if a solution requires a separate queue and route for every single combination of service+client, that would become unmanageable because there could be 10s of thousands of combinations.
Any feedback is greatly appreciated! If my question is unclear, please feel free to suggest how I can improve it.
To my knowledge, this would be very hard to achieve if you have 10k+ combos.
You can get around one queue per service/client combo by using consumers and selectors. That would, however, be almost equally hard to deal with (you simply don't create 10k+ selector consumers unharmed and without significant performance considerations), if you cannot predict in some way a limited set of service/client active at once.
Can you elaborate on the second requirement? Do you need it to ensure some sense of fairness among your clients? Please elaborate and I'll update if I can think of anything else.
Update:
Instead of consuming by just listening for messages, you could possibly browse the queue, looping through the messages and picking one that "has free slots". Given that you run in a single instance, you can probably work out whether a limit has been reached with some shared variable that keeps track of what is in flight.
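The "browse and pick one with free slots" idea can be sketched in plain Python, with an in-memory list standing in for the queue. This is only an illustration of the bookkeeping, not Camel or ActiveMQ code; `SlotTracker`, `take_next` and `done` are hypothetical names, and ordering is enforced by allowing a single in-flight message per service+client+stream.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

MAX_PER_SERVICE = 10        # limit from the first requirement
MAX_PER_SERVICE_CLIENT = 4  # limit from the second requirement


@dataclass(frozen=True)
class Message:
    service: str
    client: str
    stream: str
    body: str


class SlotTracker:
    """Shared bookkeeping of in-flight messages (single instance assumed)."""

    def __init__(self) -> None:
        self.per_service = Counter()
        self.per_service_client = Counter()
        self.busy_streams = set()

    def has_free_slot(self, m: Message) -> bool:
        return (self.per_service[m.service] < MAX_PER_SERVICE
                and self.per_service_client[(m.service, m.client)] < MAX_PER_SERVICE_CLIENT
                and (m.service, m.client, m.stream) not in self.busy_streams)

    def take_next(self, backlog: List[Message]) -> Optional[Message]:
        """Browse the backlog in order and claim the first eligible message."""
        for i, m in enumerate(backlog):
            if self.has_free_slot(m):
                del backlog[i]
                self.per_service[m.service] += 1
                self.per_service_client[(m.service, m.client)] += 1
                self.busy_streams.add((m.service, m.client, m.stream))
                return m
        return None  # everything left is blocked by a limit or a busy stream

    def done(self, m: Message) -> None:
        """Release the slots once processing of m has finished."""
        self.per_service[m.service] -= 1
        self.per_service_client[(m.service, m.client)] -= 1
        self.busy_streams.discard((m.service, m.client, m.stream))
```

Because the backlog is scanned front to back and a stream is skipped while one of its messages is in flight, messages of the same stream are always handed out in their original order. A worker would call `take_next` when it is idle and `done` when it finishes.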
I have to implement a SOA solution with web services. I have to transfer large objects (ex: Invoices of 25~30mb of XML data) and I wonder what's the best approach...
Should I:
A. Transfer parts of these objects separately (ex: header first, then items one by one, regardless of the fact that there could be 1000 of them) in several WS calls, and then organize them on the server side, dealing with retries and errors.
Or ...
B. Should I transfer the entire payload in one single call and try to optimize it (and not to "burn" Http connections)?
I'm using .Net's WCF to expose services layer. I accept recommended readings and considerations.
The idea would be to maximize the load and minimize the number of calls. This isn't always simple since - in a one shot call - firewalls or the web service itself could limit the payload size and your message might not make it, or - in case of multiple calls - as you mentioned yourself, you have to deal with errors and retries (basically doing WS-ReliableMessaging).
So perhaps, instead of concentrating on the message of a usual call, you might try changing how you perform the call itself, and maybe have a look at MTOM (Message Transmission Optimization Mechanism) with WCF, or maybe use streaming.
I'm trying to write a tool that requires knowledge of the state of other machines in a cluster (local LAN). This is for a network failover/high availability system similar to VRRP and corosync/openais, but I wish to include more information (such as near real-time speed/performance characteristics) so devices can make more intelligent choices. This means using a protocol more complicated than a predetermined weight-based mechanism: by allowing all clustered machines to see the state of each other, they can communally agree on which is the most suitable to be the master device.
From my searches, I haven't found any (C, C++ or JavaME) libraries that offer a distributed state mechanism. Ideally, I'm looking for something that broadcasts/multicasts each individual machine's state periodically so participating machines can build up a global state table and all can see who the master should be. State in this case is arbitrary key/value pairs.
I'd rather not re-invent any wheels so am curious to know if anyone here can point me in the right direction?
If I were you I'd investigate memcached (memcached.org) or one of the nosql variants.
It sounds like Apache ZooKeeper might be a good match. It's a distributed, hierarchical key-value store. To quote their Overview page:
ZooKeeper was designed to store coordination data: status information, configuration, location information, etc.
Here's an example of a simple Leader Election recipe, although it would require adaptation to determine a leader by some weighted criterion.
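Independent of how the state is shared, choosing a leader by a weighted criterion can be as simple as having every machine apply the same deterministic function to the same state table. A minimal Python sketch, where the keys `link_speed_mbps` and `load` and the weights are made-up assumptions:

```python
from typing import Dict, Mapping


def score(state: Mapping[str, float]) -> float:
    """Hypothetical weighting: prefer fast links, penalize load."""
    return 2.0 * state.get("link_speed_mbps", 0.0) - 50.0 * state.get("load", 0.0)


def elect_master(states: Dict[str, Mapping[str, float]]) -> str:
    """Return the node id with the highest score.

    Node ids are visited in sorted order and max() keeps the first maximum,
    so ties go to the smallest id and every machine reaches the same answer.
    """
    if not states:
        raise ValueError("state table is empty")
    return max(sorted(states), key=lambda node: score(states[node]))
```

This only yields agreement when all machines see the same table; getting them to that point is the part a coordination service such as ZooKeeper is meant to handle.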
I'm not sure if there is any application for your purpose or not.
But I know that you can write a simple program with MPI library and broadcast any information that you want.
All clients can send their state to the root node, and the root node then broadcasts the message.
The functions that you need for this are:
MPI_Bcast
MPI_Send
MPI_Recv
There are lots of tutorials on C++/MPI on the net, just google it!