I'm interested in using Akka for an IoT device scenario, but I'm worried about over-complicating an individual actor. In most industries, a device is not as simple as the 'temperature sensor' you see in most tutorials. A device represents something more complex that can take on the following characteristics:
Many sensors can be represented (temperatures, electrical/fluid flows, power output, on/off values, ...)
Each of the values above can be queried for its current value and, more likely, for historical values (trends, histograms, ...)
Alerting rules can be set up for any one of the sensor values
Each device has a fairly complex configuration that must be managed (what sensors, what unit of measure)
Many different message types can be sent (sensor reading requests, alerts, configuration updates, ...)
So my general question is: does anyone have good advice on what level of complexity an actor should take on?
Thanks
Steve
Below are a few bullet points one might want to keep in mind when determining what level of complexity an actor should take on:
Akka actors are lightweight and loosely-coupled by design, thus scale well in a distributed environment. On the other hand, each actor can be tasked to handle fairly complex business logic using Akka's functionality-rich API. This results in great flexibility in determining how much workload an actor should bear.
In general, the quantity of IoT devices and the operational complexity of each device are the two key factors in the design of the device actor. If the total device quantity is large, one should consider having some group-device actors, each of which handles a set of devices using, for instance, a private key-value collection. On the other hand, if each IoT device involves fairly complex computation or state-mutation logic, it might be better to make each actor represent an individual device. It's worth noting that the two strategies aren't mutually exclusive.
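The group-device approach can be sketched in plain Scala without any Akka dependency; the message names and the `Option`-returning `receive` below are illustrative stand-ins for real actor messages and replies:

```scala
// Hypothetical messages; names are illustrative, not from any library.
sealed trait DeviceMsg
final case class RecordReading(deviceId: String, sensor: String, value: Double) extends DeviceMsg
final case class ReadCurrent(deviceId: String, sensor: String) extends DeviceMsg

// A plain-Scala stand-in for a group-device actor: one handler owns the
// state for many devices in a private key-value collection.
final class DeviceGroup {
  private val readings = scala.collection.mutable.Map.empty[(String, String), Double]

  def receive(msg: DeviceMsg): Option[Double] = msg match {
    case RecordReading(id, sensor, v) =>
      readings((id, sensor)) = v // mutate private state; an actor does this safely
      None
    case ReadCurrent(id, sensor) =>
      readings.get((id, sensor)) // answer a simple current-value query
  }
}
```

In an actual actor the same `Map` would live inside the actor and `receive` would reply via `sender() ! ...` instead of returning a value.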
For historical data, I would recommend having the actors periodically feed their data to a database (e.g. Cassandra, PostgreSQL) for OLAP queries, leaving the actors themselves to answer only simple queries.
Akka actors have a well-defined lifecycle with hooks like preStart(), postRestart(), postStop() for programmatic logic control. Supervisor strategies can be created to manage actors in accordance with specific business rules (send alerts, restart actors, etc).
On customizing attributes (e.g. unit of measure) specific to the type of devices, one could model a device type along with its associated sensor attributes, say, as a case class and make it a parameter of the device actor.
The capability to handle different message types via non-blocking message passing is one of the biggest strengths of Akka actors. The receive partial function in an actor effectively handles various message types via pattern matching. When representing a device with complex state-mutation logic, its operational state can be safely hotswapped via context.become.
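As a rough sketch of this idea, with hypothetical message types and `context.become` simulated without Akka by holding the current behaviour in a mutable `PartialFunction`:

```scala
sealed trait Msg
case object Activate extends Msg
case object Deactivate extends Msg
final case class Query(sensor: String) extends Msg

// Simulating an actor's receive/become without Akka: the current
// behaviour is a PartialFunction that can be swapped ("hotswapped").
// Unhandled messages would throw a MatchError, as in a plain PF apply.
final class DeviceBehaviour {
  private var behaviour: PartialFunction[Msg, String] = inactive

  private def inactive: PartialFunction[Msg, String] = {
    case Activate => behaviour = active; "activated" // context.become(active) in Akka
    case Query(_) => "device inactive"
  }

  private def active: PartialFunction[Msg, String] = {
    case Deactivate => behaviour = inactive; "deactivated"
    case Query(s)   => s"reading for $s"
  }

  def receive(msg: Msg): String = behaviour(msg)
}
```

In a real actor, swapping the behaviour this way is safe because the actor processes one message at a time.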
This blog post about simulating IoT devices as individual actors might be of interest.
I'm a newbie, currently interested in data security & integrity.
I'm quite new to blockchain and distributed-systems theory, and I have some unresolved doubts/questions about fault-tolerant consensus.
May I ask for your kind advice on my thoughts regarding the blockchain's true objective?
It would be a great help for me in stepping forward to better understand the concept of consensus.
Here's the summary of what I understand (please correct me if I'm wrong)
In a synchronous network model, it is assumed that one can guarantee the message being delivered within a certain amount of time.
In an asynchronous network model, there is no certain guarantee on the message delivery.
From a design perspective, it is easier and more efficient to design a system based on the synchronous model.
Even the quorum size requirement can be reduced: the synchronous model needs only f+1 votes, while the asynchronous model needs 2f+1 votes for consensus.
(This is because the synchronous model can eliminate the possibility of message dropout (up to f), while the async model needs to consider both message dropout and possibly malicious messages.)
But in a distributed system based on multiple nodes, it is normally impossible to guarantee message delivery, since there is no central manager who can monitor whether each node receives a message or not.
That is why most of the blockchain-oriented distributed ledgers (non-currency) are based on the asynchronous consensus schemes such as PBFT (Castro, Liskov 99).
In a private blockchain scenario, I believe that the main & final purpose of the consensus is to let all nodes hold a chain of blocks, where each block has a certain amount of agreements (i.e. more than f signatures).
So, based on the facts above, I got curious whether the fault-tolerance model only applies to the standard "client-server" environment.
What happens if we give up the client-server model, and let the client supersede peers' broadcast communications? (For a scenario where the client has enough computation power but is just short on storage, so it wants to manage multiple (but few, e.g. 3) replicas to store data via transactions)
To be more specific with a simple example (an authenticated environment with PKI):
Each replica only performs "block creation (+signing)" and "block finalization (verifying signatures)" logic, and there may exist some malicious replicas (up to f) which try to spread different outputs that did not originate from the client's transaction.
Let the client (a "data owner/manager" would be a more suitable term now...) visit all replicas and handle each replica as below:
Enforce that all replicas work as a synchronized state machine; assure that all 3 replicas are synced up (check the latest block of each, and bring the lagging ones up to date)
Set a callback (with a timeout) for each replica to guarantee message delivery
Send a transaction to all replicas, and receive the block (with a signature) generated from the corresponding transaction (assume that the block contains only one transaction)
If more than f signatures (from replicas) for the same block message are collected, deliver the collected signatures to all replicas and order them to finalize the block.
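The signature-collection step above might look something like this sketch (the names are illustrative, and a real implementation would of course also verify each signature cryptographically):

```scala
// An illustrative signed vote for a block, identified by its hash.
final case class SignedBlock(blockHash: String, replicaId: String)

// Count distinct signers per block hash and finalize only when some block
// has more than f matching signatures (i.e. at least f + 1).
def quorumReached(sigs: Seq[SignedBlock], f: Int): Option[String] =
  sigs.groupBy(_.blockHash)
    .collectFirst { case (hash, s) if s.map(_.replicaId).distinct.size > f => hash }
```

With 3 replicas and f = 1, two honest replicas signing the same block hash are enough to finalize it, while a single malicious replica broadcasting a different block never reaches quorum.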
If we do as above, instead of the replicas reaching consensus on their own (i.e. no view changes), can we still say that the system satisfies the BFT model?
It seems that Byzantine replicas cannot breach safety, since the client (and honest replicas) will only order/accept block finalization when the block comes with more than f signatures.
The concern is liveness: can we say that liveness still holds, since the system can continue (not as a client-server model anymore, but the "system" goes on) because the client will ignore faulty replica responses and order the honest replicas to finalize?
This curiosity is stuck in my head, and it's blocking me from clearly understanding why private blockchain systems need an asynchronous consensus process among the peers themselves
(since they already trust the client's transactions, and participants are designated by PKI & signatures).
Could anyone kindly tell me whether my dominant-client example can still satisfy BFT, or whether something would go wrong?
Thank you so much!
Example:
Suppose I have an actor that is managing communication with some external service, so it has a Client object within it for making requests to the external service. In order to avoid this actor becoming monolithic, I might want to create child actors for handling different pieces of interaction: maintaining a heartbeat to the service, making and coordinating complicated requests, etc. These child actors will need a reference to a Client of the service.
Given that these client objects are unlikely to be serialisable and may be stateful (e.g. contain connection state), is it an anti-pattern to pass them to child actors via Props? The Akka documentation seems to strongly encourage keeping Props serialisable, but this seems to be extremely limiting in this case.
The Akka documentation seems to strongly encourage maintaining serialisable props
I am not aware of this suggestion; do you mind sharing the link in the question?
From my experience, it is pretty common to pass these Client references from parent to child actors. Sometimes I may choose to pass the exact method (a function) instead of the Client reference for the ease of unit testing. As long as you're not spawning an actor across the network boundary, I don't see any reason why this is a bad thing.
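For example, a child can take just the function it needs rather than the whole client, which keeps unit testing trivial. `ServiceClient` and `HeartbeatChild` here are hypothetical names, and the "actor" is a plain class for illustration; in Akka you would pass the function through Props in the same way:

```scala
// Hypothetical client; in a real app this might hold connection state.
trait ServiceClient {
  def send(payload: String): String
}

// The child takes just the function it needs, not the whole client,
// so a unit test can supply a stub function instead of a real client.
final class HeartbeatChild(sendFn: String => String) {
  def beat(): String = sendFn("ping")
}
```

With a real actor, this would be e.g. `Props(new HeartbeatChild(client.send))`, which is fine as long as the child is never spawned across a network boundary.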
Regarding the Client object you describe: for network-level things (e.g., connection state) I would leverage the akka-http client API. If you were to keep application-level things, I would prefer having a separate actor dedicated to that use. It sounds a bit like an anti-pattern to me to keep application state in a non-actor when you have Akka actors, which are designed to host state.
I'm struggling with the proper design of communication between 2 separate Akka microservices/bounded contexts in an Akka cluster.
Let's say we have 2 microservices in the cluster per node.
Both microservices are based on Akka.
It's clear to me that communication within a particular bounded context will be handled by sending messages from actor to actor, or from an actor on node1 to an actor on node2 (if necessary).
Q: but is it OK to use similar communication between the separate Akka applications? e.g. boundedContext1.actor --message--> boundedContext2.actor
Or should it be done via much clearer boundaries: raise an event in bc1, publish it to a broker, and read the event in bc2?
// Edit: currently we've implemented a service registry, and we're publishing events to the service registry via Akka Streams.
I think there is no universal answer here.
If your BCs are simple enough, you can hold them in one application, or even in one project/library with very weak boundaries, i.e. just placing them into separate namespaces and providing an API for other BCs.
But if your BCs become more complex, more independent, and require their own deployment cycles, then it is definitely better to construct stronger boundaries: separate microservices that communicate through a message broker.
So my answer is that you should just "feel" the right way according to your particular needs. If you don't "feel" it, then follow the KISS principle and start with the easier way, i.e. using the built-in Akka communication system. If your BCs become more complex in the future, you will have to refactor them, but that decision will be justified, and it will not be an unnecessary overhead.
I am currently working on a Play! project that has the following architecture:
Controllers -> Services (actors) -> Models (Regular case classes)
For each request that comes in, we will issue a call to the service layers like so:
Service ? DoSomething(request, context)
We have a set number of these service actors behind an Akka router; they are created during app initialization and are expandable on demand.
And in the service we mostly do modest data manipulation or database calls:
def receive = {
  case DoSomething(x, y) =>
    ...
    Model.doSomething(...)
    sender ! result
}
I am having second thoughts on whether we should be using actors for our services or just use Futures only.
We do not have any internal state that needs to be modified in the service actors, whatever message comes in goes to a function and spits out the result. Isn't this the big strength of the actor model?
We are doing a lot of tasks which don't seem to benefit from the actor model:
We aren't doing heavy computation, and remoting doesn't make sense because most of the work is database-bound; round-tripping to a remote actor to make a DB call is unnecessary
We do use ReactiveMongo, so every DB call is non-blocking, and we can make a lot of these calls
It seems to me that removing Akka and just using Futures would make our lives a lot easier, and we wouldn't really lose anything.
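For comparison, a stateless service like the one described can be a plain function returning a Future; `doSomething` here is a hypothetical stand-in for a non-blocking database call:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// A stateless "service" as a plain function returning a Future,
// standing in for a non-blocking database call (e.g. ReactiveMongo).
def doSomething(request: String): Future[String] =
  Future(s"result for $request")
```

The caller composes such Futures with `map`/`flatMap` instead of `ask`, and there is no actor, router, or lifecycle to manage for this kind of request/response work.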
There certainly is no shortage of opinion on the topic of what should and shouldn't be an actor. Like these two posts:
http://noelwelsh.com/programming/2013/03/04/why-i-dont-like-akka-actors/
http://www.chrisstucchio.com/blog/2013/actors_vs_futures.html
I don't think you're going to find an absolute answer to this question other than that it's situational and really up to your preferences and your problem. What I can do for you is offer my opinion, based on us implementing Akka for about 2 years now.
For me, I like to think of Akka really as a platform. We come for the Actor Model but we stay for all of the other goodness that the platform provides, like Clustering/Remoting, FSM, Routing, Circuit Breaker, Throttling and the like. We are trying to build an SOA-like architecture with our actors acting as services. We are deploying these services across a cluster, so we are taking advantage of things like Location Transparency and Routing to provide the ability for a service consumer (which itself could be another service) to find and use a service no matter where it is deployed, and in a highly available manner. Akka makes this whole process pretty simple based on the platform tools it offers.
Within our system, we have the concept of what I call Foundation Services. These are really simple services (like basic lookup/management services for a particular entity). These services generally don't call any other services and, in some cases, just perform DB lookups. These services are pooled (router) and don't usually have any state. They are pretty similar to what you are describing some of your services to be like. We then start to build more and more complex services on top of these foundation services. Most of these services are short-lived (to avoid ask), sometimes FSM-based; they collect data from the foundation services and then crunch it and do something as a result. Even though these foundation services are themselves pretty simple, and some would say don't require an actor, I like the flexibility in that when I compose them into a higher-level service, I can look them up and they can be anywhere (location transparency) in my cluster, with any number of instances available (routing) for use.
So for us, it was really a design decision to baseline around an actor as a sort of micro-like service that is available within our cluster for consumption by any other service, no matter how simple that service is. I like communicating with these services, wherever they are, through a coarse-grained interface in an async manner. A lot of those principles are aspects of building out a good SOA. If that's your goal, then I think Akka can be very helpful in achieving it. If you are not looking to do something like that, then maybe you are right in questioning your decision to use Akka for your services. Like I said earlier, it's really up to you to figure out what you are trying to do from an architecture perspective and then design your services layer to meet those goals.
I think you're on the right track.
We do not have any internal state that needs to be modified in the service actors, whatever message comes in goes to a function and spits out the result. Isn't this the big strength of the actor model?
I found Chris Stucchio's blog (referred to by @cmbaxter above) instantly delightful. My case was so simple that architectural considerations were not really a factor. Just Spray routing and lots of database access, like you have. No state. Thus Futures. So much simpler code.
An actor should be created when you need something really long-lived with mutable state. In other cases there aren't any benefits from actors, especially from untyped ones. With actors you have to:
- do pattern matching every time
- control the actor's lifecycle
- remember which things should not be passed between threads
Why do all of this when you can have a simple Future?
There are tasks where actors fit very well, but not everywhere.
I was wondering the same thing, and what we decided to do was to use Akka for our data access. It works very well, it's very testable (and tested), and very portable.
We created repositories, long-lived actors, that we bootstrapped in our app. (FYI, we are using Slick for our DB access, but also have a similar design for our MongoDB needs.)
val subscriptionRepo = context.actorOf(Props(new SubscriptionRepository(appConfig.db)), "repository-subscription")
Now we are able to send a "Request" message for data, e.g.:
case class SubscriptionsRequested(atDate: ZeroMillisDateTime)
to which the actor will respond with
case class SubscriptionsFound(users: Seq[UserSubscription])
or Failure(exception)
In our situation (Spray apps but also CLI), we wrapped those calls in a short-lived actor that takes the context, completes on reception, and closes itself. (You could handle domain-specific logic in those actors, or have them extend another actor that manages their lifecycle and exceptions, so you would only have to specify a partial function for your needs and leave the abstract actor to deal with timeouts, common exceptions, etc.)
We also have situations where we needed more work to be done in the initiating actor, and it is very convenient to fire x messages to your repositories and have your actor store those messages as they arrive, do something once they are all there, fire back a completion to the sender (for instance), and close itself.
Thanks to this design, we have a very reactive repository living outside our app, completely tested with Akka TestKit and H2, completely DB-agnostic, and it's dead easy to access data from our DBs (and we never use ask, only tell: tell the repo, tell the sender, complete; or x tells to repos, pattern match on expected results until completion, tell the sender).
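The "fire x messages, store replies as they arrive, complete when all are there" pattern can be sketched without Akka like this (the reply types and the `Aggregator` are illustrative; in a real system the aggregator would be a short-lived actor that tells the sender and stops itself on completion):

```scala
sealed trait RepoReply
final case class SubscriptionsFound(users: Seq[String]) extends RepoReply
final case class AccountsFound(ids: Seq[Int]) extends RepoReply

// A plain-Scala stand-in for the short-lived aggregating actor:
// it stores replies as they arrive and "completes" (returns Some)
// once all expected repositories have answered.
final class Aggregator(expected: Int) {
  private var replies = Vector.empty[RepoReply]

  def receive(reply: RepoReply): Option[Vector[RepoReply]] = {
    replies = replies :+ reply
    // Completion: in an actor this is where you'd tell the sender and stop.
    if (replies.size == expected) Some(replies) else None
  }
}
```

A timeout would normally bound the wait, so a missing repository reply fails the request instead of leaking an aggregator.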
When should I use Actors vs. Remote Actors in Akka?
I understand that both can scale a machine up, but only remote actors can scale out, so is there any practical production use of the normal Actor?
If a remote actor only has a minor initial setup overhead and does not have any other major overhead to that of a normal Actor, then I would think that using a Remote Actor would be the standard, since it can scale up and out with ease. Even if there is never a need to scale production code out, it would be nice to have the option (if it doesn't come with baggage).
Any insight on when to use an Actor vs. Remote Actor would be much appreciated.
Remote Actors cannot scale up, they are only remote references to a local actor on another machine.
For Akka 2.0 we will introduce clustered actors, which will allow you to write an Akka application and scale it up only using config.
Regular actors can be used to send messages within a local project.
As for remote actors, you can use them to send messages to dependent projects that are connected to the project sending the message.
Please refer here for remote Akka actors:
http://doc.akka.io/docs/akka/snapshot/scala/remoting.html
The question asks, "If a remote actor only has a minor initial setup overhead and does not have any other major overhead then I would think that using a Remote Actor would be the standard". Yet the Fallacies of Distributed Computing make the point that it is a design error to assume that remoting with any technology has no overhead. You have the overhead of serialising messages to bytes and transmitting them across the network interface. You also have all the complexity of different processes being up, down, stalled or unreachable, and of the network having hiccups, leading to lost, duplicated or reordered messages.
This great article has real-world examples of weird network errors which make remoting hard to make bulletproof. The Akka project lead Roland Kuhn, in his free video course about Akka, says that in his experience, for every 1T of network messages being sent he sees a corruption. Notes on Distributed Systems for Young Bloods says "distributed systems tend to need actual, not simulated, distribution to flush out their bugs", so even good unit tests won't make for a perfect system. There is lots of advice that remoting is not "free" but hard work to get right.
If you need to use remoting for availability, or to move to huge scale, then note that Akka's at-least-once delivery comes with possible duplication. So you must ensure that duplicated messages don't create bad results.
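A minimal sketch of receiver-side deduplication under at-least-once delivery, assuming each message carries a unique id (`Envelope` and `DedupReceiver` are hypothetical names, not an Akka API):

```scala
final case class Envelope(id: Long, payload: String)

// Receiver-side deduplication for at-least-once delivery:
// track ids already processed and ignore redeliveries, so the
// effect of each message is applied exactly once.
final class DedupReceiver {
  private var seen = Set.empty[Long]
  private var processed = Vector.empty[String]

  def receive(e: Envelope): Boolean =
    if (seen.contains(e.id)) false // duplicate: drop it
    else {
      seen += e.id
      processed = processed :+ e.payload // apply the effect once
      true
    }

  def log: Vector[String] = processed
}
```

In a long-running system the `seen` set must also be bounded (e.g. per-sender sequence numbers), otherwise it grows without limit.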
The moment you start to use remoting you have a distributed system, which creates challenges that are discussed in Distributed Systems for Fun and Profit. Unless you are doing very simple things, like stateless calculators that are idempotent to duplicated messages, things get tricky. One of the assignments in the Akka video course linked above is to make a replicated key-value store which can deal with lost messages by writing the logic yourself. It's far from being an easy assignment. State distributed across different processes gets very hard; actors encapsulate state, therefore distributing actors can get very hard, depending on the consistency and availability requirements of the system you are building.
This all implies that if you can avoid remoting and still achieve what you need to achieve, then you would be wise to avoid it. If you do need remoting, then Akka makes it easy due to its location transparency. So whilst it's a great toolbox to take with you on the job, you should double-check whether the job needs all the tools or only the simplest ones in the box.