What is controlling the delay of Event Transfer in Akka Projections from Event Journal to target DB

I have an Akka project with Akka Persistence that works perfectly. Lately I integrated it with Akka Projections, and it works, but I see a phenomenon I'd like to ask about here.
When I send commands to Akka they are processed and saved to Apache Cassandra (I can see the persisted events), but the events are picked up by Akka Projections with a delay of 5 to 10 minutes.
I could not find anything in the documentation about whether Akka Projections uses a push or a pull model. So is Akka Projections polling the Cassandra journal, or are the events somehow pushed to it?
If it is a pull model, is there a configuration parameter with which I can control the interval at which Akka Projections polls the Cassandra journal?
Any ideas?

Akka Projections is built on top of Akka Persistence Query and generally relies on the events-by-tag query. The relevant options which come into play here are covered in the docs for Akka Persistence Cassandra's events-by-tag query. The events-by-tag query feeding the Projection works on a pull model (though some metadata can be updated on a push model, e.g. if akka.persistence.cassandra.events-by-tag.pubsub-notification is set to on in the writer).
The 5-10 minute delay doesn't immediately scream out anything, unless this is the first time starting the projection, as the default akka.persistence.cassandra.events-by-tag.first-time-bucket and akka.persistence.cassandra.bucket-size values will cause the query to iterate through every hour since November 2015. bucket-size can't be changed once data has been written, but first-time-bucket can be set to a time shortly before the earliest date the application was started.
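For example, an application.conf along these lines could address both points (a minimal sketch; the date is a hypothetical first-deployment time, and the debug flag is discussed at the end of this answer):

akka.persistence.cassandra {
  events-by-tag {
    # notify running tag queries when a tagged event is written,
    # reducing the latency of the events-by-tag query
    pubsub-notification = on
    # start the tag scan shortly before the app's first deployment
    # (hypothetical date) instead of the default November 2015 bucket
    first-time-bucket = "20210101T00:00"
  }
  # surface retries and timing of the query in the logs
  verbose-debug-logging = true
}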
A Projection should be saving offsets as it makes progress, so once a projection catches up, it should stay caught up (and not be doing that scan from 2015). If you're starting Projections on demand, I'd consider revisiting that decision (e.g. having a long-lived Projection for a given tag which publishes the events to Kafka).
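As a sketch of such a long-lived projection (assuming the akka-projection-eventsourced and akka-projection-cassandra modules; MyEvent and publishToKafka are hypothetical placeholders):

import akka.Done
import akka.actor.typed.ActorSystem
import akka.persistence.cassandra.query.scaladsl.CassandraReadJournal
import akka.persistence.query.Offset
import akka.projection.{ProjectionBehavior, ProjectionId}
import akka.projection.cassandra.scaladsl.CassandraProjection
import akka.projection.eventsourced.EventEnvelope
import akka.projection.eventsourced.scaladsl.EventSourcedProvider
import akka.projection.scaladsl.{Handler, SourceProvider}
import scala.concurrent.Future

object KafkaPublishingProjection {
  trait MyEvent // hypothetical event type
  def publishToKafka(event: MyEvent): Future[Done] = ??? // hypothetical sink

  def start(system: ActorSystem[_], tag: String): Unit = {
    // pull-based source: the events-by-tag query of Akka Persistence Cassandra
    val sourceProvider: SourceProvider[Offset, EventEnvelope[MyEvent]] =
      EventSourcedProvider.eventsByTag[MyEvent](system, CassandraReadJournal.Identifier, tag)

    // at-least-once projection that stores its offset back in Cassandra
    val projection = CassandraProjection.atLeastOnce(
      ProjectionId("kafka-publisher", tag),
      sourceProvider,
      () => new Handler[EventEnvelope[MyEvent]] {
        override def process(envelope: EventEnvelope[MyEvent]): Future[Done] =
          publishToKafka(envelope.event)
      })

    // run it as a long-lived actor so the saved offsets keep it caught up
    system.systemActorOf(ProjectionBehavior(projection), s"projection-$tag")
  }
}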
Beyond that, I'd look at how well your Cassandra is performing. This delay might be caused by an excessive number of retries of the query: setting akka.persistence.cassandra.verbose-debug-logging = true in the config might shed some light.

Related

Akka: Persistence failure when replaying events

We are working on an event-sourced application with akka-persistence, using an Oracle database as the event store. The application has been running in production for some time now. Lately we have been seeing the following error for some of the persistent actors.
Persistence failure when replaying events for persistenceId [some-persistence-id]. Last known sequence number [0]
Can someone who faced a similar issue in their application share their experience of why this happens?
Also, going through the Akka documentation at https://doc.akka.io/docs/akka/current/persistence.html, onRecoveryFailure is responsible for handling such failures. Is there a way we can override this method to ignore the persisted events when we see failures while replaying them? In our scenario replaying the events is not very critical, and we can serve users even while ignoring them.
That log is typically a manifestation of something else. Since the failure is from sequence number zero, that points to an actual query to the DB failing (e.g. a timeout). There should be other logs around the time of that one which will provide further information.
Akka Persistence has a fairly strong assumption that the persisted state is important (otherwise why would you be persisting?). Off the top of my head, I would consider separating the parts of the actor which are affected by persistence from the parts which aren't: the non-persistent actor can spawn a persistent child and interact with it (it can use tricks with stashing, for instance, to present the illusion that it and its child are a single actor).
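A minimal sketch of that separation with Akka Typed (the Command/event/state types are hypothetical placeholders):

import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}

object Frontend {
  sealed trait Command
  final case class Record(data: String) extends Command

  // the persistence-affected part, isolated in a child actor
  private def persistentChild(id: String): Behavior[Command] =
    EventSourcedBehavior[Command, String, List[String]](
      PersistenceId.ofUniqueId(id),
      emptyState = Nil,
      commandHandler = (_, cmd) => cmd match {
        case Record(data) => Effect.persist(data)
      },
      eventHandler = (state, event) => event :: state)

  // the non-persistent actor spawns the child and forwards to it;
  // stashing could hide the child so callers see a single actor
  def apply(id: String): Behavior[Command] =
    Behaviors.setup { ctx =>
      val child = ctx.spawn(persistentChild(id), "persistent-child")
      Behaviors.receiveMessage { msg =>
        child ! msg
        Behaviors.same
      }
    }
}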

How to perform non-idempotent actions (send email) in an Actor model framework (e.g., akka.net)?

I am looking into using an actor model framework (akka.net with akka.net persistence, but I am looking for a general-case answer) to build a 'widget order processing workflow'.
Pretty standard:
1. Customer orders widget
2. Payment is processed
3. Email confirmation sent to customer
4. Send picklist message to warehouse
5. Warehouse sends a 'widget has been shipped' message back
6. Send a 'your item has shipped' email to customer
Now let's say that between steps 4 and 5 a server deployment/restart happens. This would cause actor rehydration (let's assume there is no snapshot yet). That means we would process the payment again and resend the order-placed email. However, it turns out our customers don't like this 'feature'!
How do I prevent non-idempotent actions from re-occurring when using an actor model framework?
I have thought about keeping a separate 'payment processed for order' DB table, but this feels like I am fighting the framework/paradigm, and I wonder if there is a 'proper' way of doing this kind of thing!
OK, so it turns out it is pretty simple.
With akka.net persistence, messages are replayed after a restart, and the correct state can be recreated by (re)processing them.
There is, however, an IsRecovering property which can be checked to see whether this is the first or a subsequent processing of a message. Presumably other actor model frameworks have something similar.
So you do something like:
private void ProcessPayment(Order message)
{
    // IsRecovering is true while persisted messages are being replayed,
    // so the side effect is skipped during recovery
    if (!IsRecovering)
    {
        // perform the non-idempotent payment process here
    }
}
To make a robust workflow processor, you have to store ALL the data of a workflow process in permanent storage.
You can employ a database, a messaging system like Kafka, or ready-made workflow management software.
Since you already use Akka, Akka Persistence can also be an option.
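For instance, with JVM Akka Persistence Typed (Scala shown for illustration; PlaceOrder, OrderPlaced and sendEmail are hypothetical), side effects can be attached with thenRun, which runs only after a fresh persist and is skipped during replay:

import akka.actor.typed.Behavior
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}

object Ordering {
  sealed trait Command
  final case class PlaceOrder(orderId: String) extends Command
  final case class OrderPlaced(orderId: String)

  def apply(id: String, sendEmail: String => Unit): Behavior[Command] =
    EventSourcedBehavior[Command, OrderPlaced, Set[String]](
      PersistenceId.ofUniqueId(id),
      emptyState = Set.empty,
      commandHandler = (_, cmd) => cmd match {
        case PlaceOrder(orderId) =>
          Effect.persist(OrderPlaced(orderId))
            // not executed while replaying the journal after a restart
            .thenRun(_ => sendEmail(orderId))
      },
      eventHandler = (state, evt) => state + evt.orderId)
}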
UPDATE
Building a system which continues to work correctly in the presence of system failures and restarts is a considerable task, far more complex than developing an actor framework. That is, you cannot just take any actor framework and add fault tolerance to it.

How to externalize akka sharded actor state to redis or ignite?

I am very new to Akka clustering and am working on a proof of concept. In my case I have an actor which is running on a cluster, and the actor has its state as a Map[String,Any]. For each request the actor receives, based on the incoming message it creates a new entity actor and the data map. The problem is that the map is held in memory right now. Is it possible to store the sharded actor state somewhere like Redis or Ignite?
You should probably start by having a look at akka-persistence (the persistence module included in Akka). The snapshotting part is meant to persist the state directly, but you have to start with the command/event-sourcing part; the snapshotting part is an optional enhancement.
Then you can combine this with automatic passivation of your sharded actors after a certain inactivity timeout.
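A minimal sketch of that combination with Akka Typed (the entity and its messages are hypothetical; the passivate-idle setting assumes a reasonably recent Akka version):

import akka.actor.typed.{ActorSystem, Behavior}
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior, RetentionCriteria}

object StateHolder {
  sealed trait Command
  final case class Put(key: String, value: String) extends Command
  final case class Stored(key: String, value: String)

  val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("StateHolder")

  // event-sourced entity: the Map state lives in the journal/snapshot store,
  // with a snapshot every 100 events to keep recovery fast
  private def entity(pid: PersistenceId): Behavior[Command] =
    EventSourcedBehavior[Command, Stored, Map[String, String]](
      pid,
      emptyState = Map.empty,
      commandHandler = (_, cmd) => cmd match {
        case Put(k, v) => Effect.persist(Stored(k, v))
      },
      eventHandler = (state, evt) => state + (evt.key -> evt.value))
      .withRetention(RetentionCriteria.snapshotEvery(numberOfEvents = 100, keepNSnapshots = 2))

  def init(system: ActorSystem[_]): Unit = {
    // with akka.cluster.sharding.passivate-idle-entity-after = 2m in the config,
    // idle entities are stopped to free memory and recovered on the next message
    ClusterSharding(system).init(Entity(TypeKey) { entityContext =>
      entity(PersistenceId(TypeKey.name, entityContext.entityId))
    })
  }
}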
With the above, you'll have a solution that persists the state of your actors in an external storage system to free up memory, restoring your actors' state whenever they come back to life.
The last step would be to see which storage backends are available for akka-persistence and match them against your requirements; you can of course implement your own.

Logstash & Elasticsearch mass log processing

I would like to know the best configuration for processing mass log files. I've enabled the new AWS feature, ELB logs, and would like to ship them all to Elasticsearch using Logstash.
I have almost 400 million requests per day; which architecture should I choose?
Best regards.
For Logstash/Elasticsearch it "all depends" on your events, how complex your Logstash configurations are, and how complex the resulting events are.
The best approach is a proof of concept: start with one index with one shard on one machine and fire events at it until you find the limit.
Follow the same procedure for processing events with Logstash.
Then you'll have a reference for the hardware necessary to process your volume of events.

Eventsourced actors Replay

I have been working on integrating Eventsourced into our application for guaranteed actor message delivery.
I was looking into the message replay section (replay-parameters). Our application will receive a lot of messages, and we don't want the replay to start from scratch on application restart; this would dramatically increase our application's start-up time.
So, as suggested in the wiki, we were planning to start the replay from an upper sequence number, which means our application has to contain some logic to define this upper sequence number.
I was wondering if there is an easy way to query the Eventsourced framework for the highest sequence number among the successfully acknowledged messages. In that case we would not need to write any logic in our app and could start the replay from this highest sequence number.
What you're probably looking for is standalone usage of reliable channels (and their activation via channel ! Deliver). A reliable channel, when activated, automatically starts with the redelivery of all messages that haven't been ACKed by the destination.
ReplayParams are for replaying messages to a processor (a persistent/stateful actor). If you want to reduce the recovery time of processor state, consider using snapshots.