I have two nodes/machines/JVMs connected with Akka.
Each JVM hosts a different actor.
I am going to send a large number of messages from one actor to the other.
Should I use tell() or create topic and subscribe/publish?
As I understand it, tell is one-to-one communication, while subscribe/publish is for one-to-many and many-to-many. However, subscribe/publish also works for one-to-one: each actor may subscribe to its own topic and publish to the other's, and thus the two can communicate.
I do not know which one is better in this case.
What is the difference between subscribe/publish and tell in one-to-one communication?
I am particularly interested in:
- performance (case 1: both on single node, case 2: on different nodes)
- design (with tell() I need to pass an ActorRef first, while subscribe/publish allows the actors to stay "anonymous")
I should also note that in the future I may need to stream big files between actors. By "big" I mean up to 4 GB, but mostly thousands of smaller files in the KB/MB range.
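The per-actor-topic pattern described above can be sketched with a toy broker (plain Python rather than Akka's DistributedPubSub; all names here are invented for illustration):

```python
from collections import defaultdict

class Broker:
    """Toy pub/sub broker: topic name -> list of subscriber callbacks."""
    def __init__(self):
        self.topics = defaultdict(list)

    def subscribe(self, topic, handler):
        self.topics[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.topics[topic]:
            handler(message)

received = []
broker = Broker()
# Each actor subscribes to its own topic; a peer publishes to that topic
# without ever holding a direct reference -- the "anonymous" style from
# the question, as opposed to tell(), which needs an ActorRef up front.
broker.subscribe("actor-b/inbox", received.append)
broker.publish("actor-b/inbox", "hello from actor A")
```

The trade-off is visible even in this sketch: the publisher is decoupled from the receiver, but every message goes through an extra indirection (topic lookup and fan-out), which is the overhead you pay relative to a direct tell.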
If you are familiar with Trello, would storing an entire Trello board as an actor (with akka persistence) be a good use case?
A Trello board consists of:
- lists
- tasks in a list
- each task can have comments and other properties
What are the general best practices or considerations when deciding if Akka Persistence is a good use case for a given problem set?
Any context where event sourcing is a good fit is a good fit for Akka Persistence.
Event sourcing, in turn, is broadly applicable: note that nearly any DB you're using is itself doing event sourcing, just with exceptionally frequent snapshotting, truncation of the event log, and purging of old snapshots.
Event sourcing works really well when you want to explicitly model how entities in your domain change over time: you're effectively defining an algebra of changes. The richer (i.e. the further from just create/update) this model of change is, the more it's a fit for event sourcing. This modeling of change in turn facilitates letting other components of a system only update their state when needed.
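As a minimal sketch of that "algebra of changes" (plain Python, not Akka Persistence; the event and field names are made up for illustration), an event-sourced task list might look like:

```python
from dataclasses import dataclass

# Events richer than bare create/update: each one names a domain change.
@dataclass
class TaskAdded:
    task_id: str
    title: str

@dataclass
class TaskCompleted:
    task_id: str

class TaskList:
    """Rebuilds its state purely by replaying the persisted event log."""
    def __init__(self, events=()):
        self.tasks = {}          # task_id -> {"title": ..., "done": bool}
        for event in events:     # recovery: replay history in order
            self.apply(event)

    def apply(self, event):
        if isinstance(event, TaskAdded):
            self.tasks[event.task_id] = {"title": event.title, "done": False}
        elif isinstance(event, TaskCompleted):
            self.tasks[event.task_id]["done"] = True

log = [TaskAdded("t1", "write spec"), TaskCompleted("t1"), TaskAdded("t2", "review")]
recovered = TaskList(log)   # current state derived entirely from events
```

Because the state is a pure function of the log, other components can consume the same events and update their own views only when a change they care about occurs.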
Akka Persistence, especially when used with cluster sharding, lets you handle commands/requests without having to read from a DB on every command/request (basically, you'll read from the DB when bringing back an already persisted actor, but subsequent commands/requests (until such time as the actor passivates or dies) don't require such reads). The model of parent and child actors in Akka also tends to lead to a natural encoding of many-to-one relationships.
In the example of a Trello board, I would probably have:
- each board be a persistent actor, which is parent to
- lists, which are persistent actors and are each parents to
- list items, which are also persistent actors
Depending on how much was under a list item, they might in turn have child persistent actors (for comments, etc.).
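That parent/child shape might be sketched like this (plain Python standing in for Akka actors; the names are invented for illustration):

```python
class PersistentActor:
    """Toy stand-in for an Akka persistent actor with named children."""
    def __init__(self, name):
        self.name = name
        self.children = {}

    def child(self, name):
        # Akka-style: a child is created on demand under this parent
        if name not in self.children:
            self.children[name] = PersistentActor(name)
        return self.children[name]

board = PersistentActor("board-42")
todo = board.child("list-todo")     # lists are children of the board
item = todo.child("item-1")         # list items are children of a list
path = f"/{board.name}/{todo.name}/{item.name}"
```

The point of the hierarchy is that each many-to-one relationship in the domain (board has lists, list has items) falls naturally out of the parent/child structure, and each level can persist its own events independently.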
It's probably worth reading up on domain-driven design. While DDD doesn't require the actor model (nor vice versa), and neither of them requires event sourcing (nor vice versa), I and many others have found that they reinforce each other.
It mostly depends on how much write throughput the app needs.
Akka Persistence is an approach to achieving very high write throughput while ensuring durability: if the actor dies and the data in memory is lost, that is fine, because the write log has been persisted to disk.
If durability is necessary but very high write throughput is not (imagine the app updating the Trello board once per second), then it is totally fine to simply write the data to external storage.
"would storing an entire Trello board as an actor (with akka persistence) be a good use case"
I would say the size of the actor should match the size of an Aggregate Root. Making an entire board an Aggregate Root seems like a very bad choice: it means that all actions on that board are serialized and none can happen concurrently. Why should changing the description of card #1 conflict with moving card #2 to a different category? Why should creating a new board category conflict with assigning card #3 to someone?
I mean, you could make an entire system a single actor and you wouldn't ever have to care about race conditions, but you'd also kill your scalability...
In a typical client-server model, what does it mean to subscribe or unsubscribe to a feed? Is there a generic codebase, boilerplate model, or set of standard procedures, class designs, and functionality involved? This is all C++ based. There's no other info, other than that the client is attempting to connect to the server to retrieve data based on some sort of signature. I know it's somewhat vague, but I guess this is really a question of what things to keep in mind and what a typical subscribe or unsubscribe method might entail. Maybe something along the lines of extending a client-server model like http://www.linuxhowtos.org/C_C++/socket.htm.
This is primarily an information architecture question. "Subscribing to feeds" implies that the server offers a lot of information, which may not be uniformly relevant to all clients. Feeds are a mechanism by which clients can select relevant information.
Concretely, you first need to identify the atoms of information that you have. What are the smallest chunks of data? What properties do they have? Can new atoms replace older atoms, and if so, what identifies their relation? Are there other atom relations besides replacement?
Next, there's the mapping of those atoms to particular feeds. What are the possible combinations of atoms needed by a client? How can these combinations be bundled into two or more feeds? Is it possible to map each atom uniquely to a single feed, or must atoms be shared between feeds? If so, is that rare enough that you can ignore it and just send duplicates?
When a client connects, how do you figure out which atoms it needs? Is it just live streaming (atoms are sent only when they're generated on the server), do you have a set of current atoms (sent when a client connects), or do you need some history as well? Is there client-side caching?
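A rough sketch of the atom/feed mapping and the connect-time snapshot (plain Python; the feed names and fields are made up for illustration):

```python
# Each atom is the smallest chunk of data; `replaces` records the relation
# between a new atom and the older atom it supersedes.
atoms = [
    {"id": "a1", "feed": "prices/EUR", "payload": 1.08, "replaces": None},
    {"id": "a2", "feed": "prices/USD", "payload": 1.00, "replaces": None},
    {"id": "a3", "feed": "prices/EUR", "payload": 1.09, "replaces": "a1"},
]

def current_snapshot(atoms, subscriptions):
    """What a newly connected client receives: the latest atom per feed,
    restricted to the feeds it has subscribed to."""
    latest = {}
    for atom in atoms:          # later atoms replace earlier ones per feed
        latest[atom["feed"]] = atom
    return {f: a for f, a in latest.items() if f in subscriptions}

snap = current_snapshot(atoms, {"prices/EUR"})
```

Here the "set of current atoms" question from above corresponds to the `latest` map; a history requirement would mean keeping the superseded atoms around as well.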
It's clear that you can't have a single off-the-shelf solution when the business side is so diverse.
I am creating actors that represent physical devices and their state. As devices come online, I create them "on demand" by sending an Identify message to the actor's path and then, if it does not exist yet, I create one. Potentially, there could be several million of these devices.
My concern is that the Identify look-up will take a performance hit as the number of actors increases. Is this a valid concern?
I was considering using a router strategy to segment the actors, but then I found that searching on the path with a wildcard for the router yielded ActorIdentities from each router. I assume that a ConsistentHashingRouter would suit this scenario, but before I go down that rabbit hole I just want to make sure I am not optimizing prematurely.
The only entity which can create an actor is its parent (there is no other way), which means that the parent actor does not need to use Identify at all; it can just check context.child(name).isDefined. That is very efficient, although you might want to shard your devices across multiple parents if you really have a massive number of them.
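One way to picture sharding devices across multiple parents (a rough Python sketch, not Akka's ConsistentHashingRouter; the shard count and names are arbitrary) is to hash the device id to pick a parent, then let that parent create the child on demand:

```python
import hashlib

NUM_PARENTS = 8  # arbitrary shard count for illustration

def parent_for(device_id: str) -> int:
    # Stable hash so the same device always maps to the same parent
    digest = hashlib.md5(device_id.encode()).hexdigest()
    return int(digest, 16) % NUM_PARENTS

class DeviceParent:
    """Toy stand-in for a parent actor holding device children."""
    def __init__(self):
        self.children = {}

    def get_or_create(self, device_id):
        # Equivalent of: if context.child(name) is undefined, create it
        if device_id not in self.children:
            self.children[device_id] = {"id": device_id, "state": "online"}
        return self.children[device_id]

parents = [DeviceParent() for _ in range(NUM_PARENTS)]
dev = parents[parent_for("device-123")].get_or_create("device-123")
```

Each lookup is then a hash plus a local dictionary check rather than an actor-system-wide Identify, which is why the child-check approach stays cheap even at millions of devices.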
Problem:
Process a backlog of messages where each message has three headers "service", "client", and "stream". I want to process the backlog of messages with maximum concurrency, but I have some requirements:
- Only 10 messages with the same service can be processing at once.
- Only 4 messages with the same service AND client can be processing at once.
- All messages with the same service AND client AND stream must be kept in order.
Additional Information:
I've been playing around with "maxConcurrentConsumers" along with the "JMSXGroupID" in a ServiceMix (Camel + ActiveMQ) context, and I seem to be able to get 2 out of 3 of my requirements satisfied.
For example, if I do some content-based routing to split the backlog up into separate "service" queues (one queue for each service), then I can set the JMSXGroupID to (service + client + stream), and set maxConcurrentConsumers=10 on routes consuming from each queue. This solves the first and last requirements, but I may have too many messages for the same client processing at the same time.
Please note that if a solution requires a separate queue and route for every single combination of service+client, that would become unmanageable because there could be 10s of thousands of combinations.
Any feedback is greatly appreciated! If my question is unclear, please feel free to suggest how I can improve it.
To my knowledge, this would be very hard to achieve if you have 10k+ combos.
You can get around one queue per service/client combo by using consumers with selectors. That would, however, be almost equally hard to deal with (you simply can't create 10k+ selector consumers without significant performance consequences), unless you can somehow predict a limited set of service/client combinations being active at once.
Can you elaborate on the second requirement? Do you need it to ensure some sense of fairness among your clients? Please elaborate and I'll update if I can think of anything else.
Update:
Instead of consuming by just listening for messages, you could possibly do a browse on the queue, looping through the messages and picking one that "has free slots". You can probably figure out whether a limit has been reached via some shared variable that keeps track, given that you run a single instance.
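The "browse and pick a message with free slots" idea could be sketched like this (plain Python, single-instance, with in-memory counters standing in for the shared state; the limits come from the question's requirements):

```python
from collections import Counter

SERVICE_LIMIT = 10   # max in-flight messages per service
CLIENT_LIMIT = 4     # max in-flight messages per service+client

in_flight_service = Counter()
in_flight_client = Counter()
busy_streams = set()  # a stream is strictly ordered: one message at a time

def pick_next(backlog):
    """Browse the backlog in order and return the first message whose
    service, client, and stream all have free slots."""
    for msg in backlog:
        svc, cli, stream = msg["service"], msg["client"], msg["stream"]
        if (in_flight_service[svc] < SERVICE_LIMIT
                and in_flight_client[(svc, cli)] < CLIENT_LIMIT
                and (svc, cli, stream) not in busy_streams):
            in_flight_service[svc] += 1
            in_flight_client[(svc, cli)] += 1
            busy_streams.add((svc, cli, stream))
            backlog.remove(msg)
            return msg
    return None  # everything currently blocked on a limit

def done(msg):
    """Release the slots when processing completes."""
    svc, cli, stream = msg["service"], msg["client"], msg["stream"]
    in_flight_service[svc] -= 1
    in_flight_client[(svc, cli)] -= 1
    busy_streams.discard((svc, cli, stream))
```

Because streams are scanned in backlog order and only one message per stream is ever in flight, the ordering requirement holds; the two counters enforce the service and service+client caps without needing a queue per combination.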
Since there's no complete BPM framework/solution in ColdFusion as of yet, how would you model a workflow into a ColdFusion app that can be easily extensible and maintainable?
A business workflow is more than a flowchart that maps nicely into a programming language. For example:
How do you model a task X that is followed by multiple tasks Y0, Y1, Y2 happening in parallel, where Y0 is a human process (which needs to wait for inputs), Y1 is a web service call that might go wrong and might need automatic retry, and Y2 is an automated process, followed by a task Z that should only be carried out once all the Y's are completed?
My thoughts...
- Seems like I need to do a whole lot of storing / managing / keeping track of states, and frequent checking with cfschedule.
- cfthread isn't going to help much, since some tasks can take days (e.g. waiting for a user's confirmation).
- I can already imagine the flow being spread across multiple UDFs, the DB, and CFCs.

Is there any open-source workflow engine in another language that we could perhaps port over to CF?
Thank you for your brain power. :)
Study the Java Process Definition Language (jPDL) specification; JBoss has an execution engine for it. Using this Java-based engine may be your easiest solution, and it solves many of the problems you've outlined.
If you intend to write your own, you will probably end up modelling states and transitions: vertices and edges in a directed graph. These, as Ciaran Archer wrote, are the components of a state machine. The best persistence approach, IMO, is capturing versions of whatever data is sent through the workflow via serialization, along with the current state, a history of transitions between states, and the changes to that data. The mechanism probably also needs a way to keep track of who or what is responsible for taking the next action against the workflow.
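A minimal sketch of that states-and-transitions model (Python here for brevity rather than CFML; the state and action names are illustrative):

```python
# Directed graph of states (vertices) and allowed transitions (edges).
TRANSITIONS = {
    ("draft", "submit"): "awaiting_approval",
    ("awaiting_approval", "approve"): "approved",
    ("awaiting_approval", "reject"): "draft",
}

class Workflow:
    def __init__(self, state="draft"):
        self.state = state
        self.history = []   # transition log: who did what, from which state

    def fire(self, action, actor):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"{action!r} not allowed from {self.state!r}")
        self.history.append({"from": self.state, "action": action, "by": actor})
        self.state = TRANSITIONS[key]

wf = Workflow()
wf.fire("submit", actor="alice")
wf.fire("approve", actor="bob")
```

In a ColdFusion app, the `TRANSITIONS` table and the `history` log would live in the database, so a workflow instance can sit idle for days (waiting on a human) and be rehydrated when the next action arrives.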
Based on your question, one thing to consider is whether or not you really need to represent parallel tasks in your solution. Instead, it might be possible to enqueue a set of messages and then specify a wait state for all of them to complete. Representing actual parallelism implies you are moving data simultaneously through several different processes, in which case, when they join again, you need an algorithm to resolve deltas, which is very much a non-trivial task.
In the context of ColdFusion and what you're trying to accomplish, a scheduled task may be necessary if the system you're writing needs to poll other systems. Consider WDDX as a serialization format; JSON, while seductively simple, has (as I recall) some edge cases around numbers and dates that can cause you grief.
Finally see my answer to this question for some additional thoughts.
Off the top of my head, I'm thinking about the State design pattern with state persisted to a database. Check out the Gumball Machine example in Head First Design Patterns.
Generally this will work if you have something (like a client / order / etc.) going through a number of changes of state.
Different things will happen to your object depending on what state you are in, and that might mean sitting in a database table waiting for a flag to be updated by a user manually.
In terms of other languages I know Grails has a workflow module available. I don't know if you would be better off porting to CF or jumping ship to Grails (right tool for the job and all that).
It's just a thought, hope it helps.