AWS SQS Stream Not Shutting Down - amazon-web-services

I am trying to shut down the stream and stop listening when AWS SQS returns an empty receive, using the code below, but the stream does not shut down:
val queueSourceSettings: SqsSourceSettings = SqsSourceSettings(
  getQueue(queueType).waitTimeSeconds,
  getQueue(queueType).maxBufferSize,
  getQueue(queueType).maxBatchSize,
  messageAttributeNames = Seq(
    MessageAttributeName(TRANSACTION_ID.name)
  ),
  closeOnEmptyReceive = true
)
SqsSource(endpoint.queue.url, endpoint.client, endpoint.queueSourceSettings)
  .via(flow)
  .recoverWithRetries(-1, {
    case e =>
      logger.error("Stream Failure: ", e)
      streamFailure(e)
      SqsSource(endpoint.queue.url, endpoint.client, endpoint.queueSourceSettings)
        .via(flow)
  })
  .runWith(SqsAckSink(endpoint.queue.url)(ec, endpoint.client))

Related

Akka premature termination

I'm experiencing early termination of the ActorSystem after the first record is read: the remaining messages end up in dead letters and the task is not executed for all the records.
I have 10 records in my file, and 2 of them have a matching filename that should be pushed to S3.
What could be going wrong here? Please suggest.
Sample file record:
xxxxx,ABC,2019-05-10 00:11:00
yyyyyy,XYZ,2019-05-10 00:41:00
import akka.actor.{Actor, ActorSystem, Props}
import scala.io.Source
import scala.sys.process._
class HelloActor extends Actor {
  def receive: Receive = {
    case line: String => {
      // print("Ahshan"+line)
      val row = line.split ( "," )
      val stdout = new StringBuilder
      val stderr = new StringBuilder
      val status = Seq ( "/bin/sh", "-c", "ls /Users/ahshan.md/Downloads/".concat ( row ( 2 ).substring ( 0, 10 ) ).concat ( "/*" ).concat ( row ( 0 ) ).concat ( "*" ) ) ! ProcessLogger ( stdout append _, stderr append _ )
      // println(status)
      // println("stdout: " + stdout)
      // println("stderr: " + stderr)
      if (status == 0) {
        println ( "/bin/sh", "-c", "aws s3 cp ".concat ( stdout.mkString ).concat ( " " ).concat ( "s3://ahshan/".concat ( row ( 1 ) ).concat ( "/" ).concat ( row ( 0 ) ).concat ( ".email" ) ) )
      }
      else {
        // println ( "File Not Found: " + row ( 0 ), row ( 1 ), row ( 2 ).substring ( 0, 10 ) )
        println ( "stderr: " + stderr )
      }
    }
    // note: never reached, because every String is already matched by the case above
    case "finished" => println ( "Hello Ahshan" )
    case _ => println ( "Exiting" )
  }
}
object AkkaHelloWorld extends App {
  // an actor needs an ActorSystem
  val system = ActorSystem ( "HelloSystem" )
  // create and start the actor
  val helloActor = system.actorOf ( Props [HelloActor], name = "helloActor" )
  try {
    val filename = "/Users/ahshan.md/Downloads/test.txt"
    for (line <- Source.fromFile ( filename ).getLines) {
      helloActor ! line
    }
  }
  finally {
    system.terminate ()
  }
}
The ActorSystem and its actors run concurrently with your main thread. Sending a message to an actor is asynchronous: the sending thread continues immediately and does not wait for the actor to process the message.
This means that in your code the main thread fires off each line to the actor as fast as it can and then terminates the actor system, before the actor has had a chance to process the messages. In this example it could make sense to move the termination logic into the actor and let the actor terminate the system once it has completed its work.
That will only work as long as your application consists of a single actor, though. As soon as you add another actor you will have to revisit how you terminate the system.
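A minimal sketch of that idea, assuming Akka classic actors: the main thread sends a final marker message after the last line, and the actor terminates the system when it receives the marker. Because messages from one sender to one actor are delivered in order, the marker is processed after all lines. The names EndOfInput, LineActor and TerminationSketch are made up for this illustration and do not come from the question.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import scala.concurrent.Await
import scala.concurrent.duration._

object TerminationSketch {
  // Hypothetical marker message telling the actor that no more lines will follow.
  case object EndOfInput

  class LineActor extends Actor {
    def receive: Receive = {
      case line: String => println(s"processing: $line")
      // The marker arrives after every line, so it is safe to shut down here.
      case EndOfInput   => context.system.terminate()
    }
  }

  def run(lines: Seq[String]): Unit = {
    val system = ActorSystem("HelloSystem")
    val actor  = system.actorOf(Props(new LineActor), "lineActor")
    lines.foreach(actor ! _)
    actor ! EndOfInput
    // Block the main thread until the actor has shut the system down.
    Await.result(system.whenTerminated, 10.seconds)
  }
}
```

The main thread no longer calls terminate itself; it only waits on whenTerminated, so the decision about when to stop lies with the actor.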
As an additional note, using the process API in Scala like that blocks until the process has completed. Calling blocking code inside an actor can have bad consequences; read this section of the docs for more details: https://doc.akka.io/docs/akka/current/typed/dispatchers.html#blocking-needs-careful-management
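One way to keep the blocking call away from the threads that run actors is to execute it on a separate thread pool. The sketch below uses only the Scala standard library as a stand-in for a dedicated Akka dispatcher; the pool size of 4 and the names BlockingSketch and runCommand are assumptions made for this illustration.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import scala.sys.process._

object BlockingSketch {
  // Dedicated pool for blocking work; the size 4 is an arbitrary assumption.
  private val pool = Executors.newFixedThreadPool(4)
  private val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // Runs the command on the dedicated pool and returns exit status and stdout.
  def runCommand(cmd: Seq[String]): Future[(Int, String)] = Future {
    val out = new StringBuilder
    val status = cmd ! ProcessLogger(line => out.append(line).append('\n'), _ => ())
    (status, out.toString)
  }(blockingEc)

  def shutdown(): Unit = pool.shutdown()
}
```

Inside an actor, the resulting Future could be sent back to the actor as a message (for example with akka.pattern.pipe) instead of being awaited.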

Can this code be translated to stateful Akka Streams?

I'm trying to listen to SQS using Akka Streams, and I get messages from its queue using the code snippet below.
This snippet gets the messages one by one and then acks each of them:
implicit val system = ActorSystem()
implicit val mat = ActorMaterializer()
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(ioThreadPoolSize))
val awsSqsClient: AmazonSQSAsync = AmazonSQSAsyncClientBuilder
  .standard()
  .withCredentials(new ClasspathPropertiesFileCredentialsProvider())
  .withEndpointConfiguration(new EndpointConfiguration(sqsEndpoint, configuration.regionName))
  .build()
val future = SqsSource(sqsEndpoint)(awsSqsClient)
  .takeWhile(_ => true)
  .mapAsync(parallelism = 2)(m => {
    val msgBody = SqsMessage.deserializeJson(m.getBody)
    msgBody match {
      case Right(body) => val id = getId(body) //do some stuff with the message may save state according the id
    }
    Future(m, Ack())
  })
  .to(SqsAckSink(sqsEndpoint)(awsSqsClient))
  .run()
My question is:
Can I receive several messages and save them, for example in a stateful map, for later use?
For example, after receiving 5 messages (all of them saved per state),
if a specific condition holds I will ack them all, and if not they will return to the queue (which happens anyway because of the visibility timeout).
Thanks.
It could be that you're looking for the grouped (or groupedWithin) combinator. These allow you to batch messages and process them in groups. groupedWithin additionally releases a batch after a certain time if it hasn't yet reached your chosen size. See the Akka Streams operator documentation for details.
In a subsequent check flow you can perform any logic you need: emit the sequence if you want the messages to be acked, and do not emit it otherwise.
Example:
val yourCheck: Flow[Seq[MessageActionPair], Seq[MessageActionPair], NotUsed] = ???
val future = SqsSource(sqsEndpoint)(awsSqsClient)
  .takeWhile(_ => true)
  .mapAsync(parallelism = 2){ ... }
  .grouped(5)
  .via(yourCheck)
  .mapConcat(identity)
  .to(SqsAckSink(sqsEndpoint)(awsSqsClient))
  .run()
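To make the batching step concrete, here is a small self-contained sketch of a check flow that forwards a batch only when a condition holds. It assumes Akka 2.6 or later (where an implicit ActorSystem is enough to run a stream) and uses plain Int elements as stand-ins for the SQS message/action pairs. The condition shouldAck and the name BatchCheckSketch are hypothetical.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

object BatchCheckSketch {
  // Hypothetical condition: a batch is forwarded only if its sum is even.
  def shouldAck(batch: immutable.Seq[Int]): Boolean = batch.sum % 2 == 0

  // Batches that fail the check are simply not emitted downstream.
  val check: Flow[immutable.Seq[Int], immutable.Seq[Int], NotUsed] =
    Flow[immutable.Seq[Int]].filter(shouldAck)

  def run(): immutable.Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("batch-sketch")
    try {
      val result = Source(1 to 12)
        .grouped(5)           // batches: 1..5, 6..10, 11..12
        .via(check)           // keeps only 6..10 (sum 40)
        .mapConcat(identity)  // flattens surviving batches back into elements
        .runWith(Sink.seq)
      Await.result(result, 5.seconds)
    } finally system.terminate()
  }
}
```

In the SQS case, the elements that never reach the ack sink stay unacknowledged and become visible in the queue again after the visibility timeout.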

akka stream custom graph stage

I have an Akka stream from a web socket (as in "akka stream consume web socket") and would like to build a reusable graph stage with an inlet (the stream), a FlowShape that adds an additional field to the JSON specifying the origin, i.e.
{
  ...,
  "origin":"blockchain.info"
}
and an outlet to Kafka.
I face the following 3 problems:
1. I am unable to wrap my head around creating a custom Inlet from the web socket flow.
2. I am unable to integrate Kafka directly into the stream (see the code below).
3. I am not sure whether the transformer that adds the additional field needs to deserialize the JSON in order to add the origin.
The sample project (flow only) looks like:
import system.dispatcher
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
val incoming: Sink[Message, Future[Done]] =
  Flow[Message].mapAsync(4) {
    case message: TextMessage.Strict =>
      println(message.text)
      Future.successful(Done)
    case message: TextMessage.Streamed =>
      message.textStream.runForeach(println)
    case message: BinaryMessage =>
      message.dataStream.runWith(Sink.ignore)
  }.toMat(Sink.last)(Keep.right)
val producerSettings = ProducerSettings(system, new ByteArraySerializer, new StringSerializer)
  .withBootstrapServers("localhost:9092")
val outgoing = Source.single(TextMessage("{\"op\":\"unconfirmed_sub\"}")).concatMat(Source.maybe)(Keep.right)
val webSocketFlow = Http().webSocketClientFlow(WebSocketRequest("wss://ws.blockchain.info/inv"))
val ((completionPromise, upgradeResponse), closed) =
  outgoing
    .viaMat(webSocketFlow)(Keep.both)
    .toMat(incoming)(Keep.both)
    // TODO not working integrating kafka here
    // .map(_.toString)
    // .map { elem =>
    // println(s"PlainSinkProducer produce: ${elem}")
    // new ProducerRecord[Array[Byte], String]("topic1", elem)
    // }
    // .runWith(Producer.plainSink(producerSettings))
    .run()
val connected = upgradeResponse.flatMap { upgrade =>
  if (upgrade.response.status == StatusCodes.SwitchingProtocols) {
    Future.successful(Done)
  } else {
    throw new RuntimeException(s"Connection failed: ${upgrade.response.status}")
    system.terminate
  }
}
// kafka that works / writes dummy data
val done1 = Source(1 to 100)
  .map(_.toString)
  .map { elem =>
    println(s"PlainSinkProducer produce: ${elem}")
    new ProducerRecord[Array[Byte], String]("topic1", elem)
  }
  .runWith(Producer.plainSink(producerSettings))
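One issue is the incoming stage, which is modelled as a Sink, whereas it should be modelled as a Flow so that it can subsequently feed messages into Kafka.
Because incoming text messages can be Streamed, their chunks have to be collected before the text can be passed on. The example below folds each streamed message into a single String with runFold. Note that this buffers the whole message in memory; for potentially big messages, a combinator such as flatMapMerge over the text stream would avoid that, at the cost of emitting chunks rather than whole messages: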
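val incoming: Flow[Message, String, NotUsed] = Flow[Message].mapAsync(4) {
  case msg: BinaryMessage =>
    // drain binary messages so they do not stall the connection, then drop them
    msg.dataStream.runWith(Sink.ignore)
    Future.successful(None)
  case TextMessage.Strict(text) =>
    // strict messages already hold the complete text
    Future.successful(Some(text))
  case TextMessage.Streamed(src) =>
    // concatenate the chunks of a streamed message into one String
    src.runFold("")(_ + _).map { msg => Some(msg) }
}.collect {
  case Some(msg) => msg
}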
At this point you have something that produces strings and can be connected to Kafka:
val addOrigin: Flow[String, String, NotUsed] = ???
val ((completionPromise, upgradeResponse), closed) =
  outgoing
    .viaMat(webSocketFlow)(Keep.both)
    .via(incoming)
    .via(addOrigin)
    .map { elem =>
      println(s"PlainSinkProducer produce: ${elem}")
      new ProducerRecord[Array[Byte], String]("topic1", elem)
    }
    .toMat(Producer.plainSink(producerSettings))(Keep.both)
    .run()
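The addOrigin flow is left open above. One possible sketch, assuming spray-json is on the classpath (any JSON library would do) and that every incoming text is a JSON object, parses the text, adds the field and prints it again. The helper name withOrigin and the object name AddOriginSketch are made up for this illustration.

```scala
import akka.NotUsed
import akka.stream.scaladsl.Flow
import spray.json._

object AddOriginSketch {
  // Parses the text as a JSON object and returns it with an extra "origin" field.
  // Throws if the text is not valid JSON or not a JSON object.
  def withOrigin(text: String, origin: String): String = {
    val fields = text.parseJson.asJsObject.fields
    JsObject(fields + ("origin" -> JsString(origin))).compactPrint
  }

  val addOrigin: Flow[String, String, NotUsed] =
    Flow[String].map(withOrigin(_, "blockchain.info"))
}
```

So yes, with this approach the JSON is deserialized once per message. Appending the field by string manipulation would avoid the parse but is fragile if the payload format changes.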

Akka (.net) cluster with remote nodes: Disassociated exception

Using Akka (.NET) I am trying to implement a simple cluster use case:
Cluster - for node up/down events.
Remote - for sending messages to a specific node.
There are two actors: a Master Node, which listens to cluster events, and a Slave Node, which connects to the cluster.
Address address = new Address("akka.tcp", "ClusterSystem", "master", 8080);
cluster.Join(address);
When a ClusterEvent.MemberUp message is received, the Master Node creates an actor link:
ClusterEvent.MemberUp up = message as ClusterEvent.MemberUp;
ActorSelection nodeActor = system.ActorSelection(up.Member.Address + "/user/slave_0");
Sending a message to this actor causes an error:
Association with remote system akka.tcp://ClusterSystem@slave:8090 has failed; address is now gated for 5000 ms. Reason is: [Disassociated]
master config:
akka {
  actor {
    provider = ""Akka.Cluster.ClusterActorRefProvider, Akka.Cluster""
  }
  remote {
    helios.tcp {
      port = 8080
      hostname = master
      bind-hostname = master
      bind-port = 8080
      send-buffer-size = 512000b
      receive-buffer-size = 512000b
      maximum-frame-size = 1024000b
      tcp-keepalive = on
    }
  }
  cluster{
    failure-detector {
      heartbeat - interval = 10 s
    }
    auto-down-unreachable-after = 10s
    gossip-interval = 5s
  }
  stdout-loglevel = DEBUG
  loglevel = DEBUG
  debug {{
    receive = on
    autoreceive = on
    lifecycle = on
    event-stream = on
    unhandled = on
  }}
}
slave config:
akka {
  actor {
    provider = ""Akka.Cluster.ClusterActorRefProvider, Akka.Cluster""
  }
  remote {
    helios.tcp {
      port = 8090
      hostname = slave
      bind-hostname = slave
      bind-port = 8090
      send-buffer-size = 512000b
      receive-buffer-size = 512000b
      maximum-frame-size = 1024000b
      tcp-keepalive = on
    }
  }
  cluster{
    failure-detector {
      heartbeat - interval = 10 s
    }
    auto-down-unreachable-after = 10s
    gossip-interval = 5s
  }
  stdout-loglevel = DEBUG
  loglevel = DEBUG
  debug {{
    receive = on
    autoreceive = on
    lifecycle = on
    event-stream = on
    unhandled = on
  }}
}
Here's your problem:
cluster{
  failure-detector {
    heartbeat - interval = 10 s
  }
  auto-down-unreachable-after = 10s
  gossip-interval = 5s
}
heartbeat-interval and auto-down-unreachable-after are set to the same duration. Therefore your nodes will almost always disassociate automatically after 10s, because you're betting on a race condition that the failure detector might lose.
auto-down-unreachable-after is a dangerous setting; do not use it. You'll end up with a split brain or worse.
If you do keep it, make sure your failure detector interval is always lower than your auto-down interval.
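As a rough illustration of that advice, the cluster section could look like the fragment below. The concrete values are assumptions chosen for illustration, not taken from the question: a short heartbeat interval, a tolerated pause that is longer than the interval, and auto-downing switched off.

```hocon
akka {
  cluster {
    failure-detector {
      # heartbeats are sent much more often than a node is declared unreachable
      heartbeat-interval = 1s
      acceptable-heartbeat-pause = 3s
    }
    # auto-downing disabled, as recommended above
    auto-down-unreachable-after = off
  }
}
```

With auto-downing off, unreachable nodes stay in the cluster until they are downed explicitly or by a downing strategy of your choice.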

Akka: context become throws NullPointerException?

I have a pretty simple actor defined as:
object CoreActor extends Actor with ActorLogging {
  // val SYSTEM_NAME = "CoreActors"
  val system = Akka.system()
  // val system = ActorSystem.create("push", ConfigFactory.load.getConfig("push"))
  val pushUri = Play.current.configuration.getString("pushservice.uri").getOrElse("akka.tcp://CentralappPush@127.0.0.1:5000")
  val protocol = Play.current.configuration.getString("pushservice.protocol").getOrElse("akka.tcp")
  val pushSystem = Play.current.configuration.getString("pushservice.system").getOrElse("CentralappPush")
  val ip = Play.current.configuration.getString("pushservice.ip").getOrElse("127.0.0.1")
  val port = Play.current.configuration.getInt("pushservice.port").getOrElse(5000)
  val rootPath = Play.current.configuration.getString("pushservice.path.root").getOrElse("user")
  val actorPath = Play.current.configuration.getString("pushservice.path.actor").getOrElse("PushMaster")
  val selectionPath = RootActorPath(new Address(protocol, pushSystem, ip, port)) / rootPath / actorPath
  val pushActor = context.actorSelection(selectionPath)
  def receive = {
    case pprs: List[PlaceProvider] => {
      log.info("I received something")
      pushActor ! pprs.map(_.clone)
      context become afterSend
    }
  }
  def afterSend: Receive = {
    case pprs: List[PlaceProvider] => {
      pprs.foreach(_.update) // update in the db
      context.stop(self)
    }
    case _ => {
      log.info("Did not understand message")
      context.stop(self)
    }
  }
}
The actors are created with unique names from within a controller in the Play! framework. What I'm seeing is that when an actor is created for the first time for a place update, it does its job and goes into the shutting-down context as expected. A second call to the same action within the Play framework causes this:
[ERROR] [02/18/2015 12:32:46.181] [application-akka.actor.default-dispatcher-2] [akka://application/user/OlD1vFKVLn1424259166159] null
java.lang.NullPointerException
at actors.CoreActor$$anonfun$receive$1.applyOrElse(CoreActor.scala:29)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at actors.CoreActor$.aroundReceive(CoreActor.scala:11)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Which is quite baffling. Why does that happen? Why does it happen only during the second time an actor of the same type is started?
Your actor is an object, which enforces that there is only a single instance of the actor. When an actor is stopped, Akka goes through some processing to clean up its resources, which includes clearing the instance's context. Since your next request tries to create a new actor from that same, already cleaned-up instance, the actor fails as soon as it touches context.
Try changing it to a class:
class CoreActor extends Actor with ActorLogging {
}
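A reduced sketch of the class-based version, assuming Akka classic actors. The push and database logic from the question is left out, String messages stand in for the List[PlaceProvider] payloads, and the reply "done" exists only so that the sketch can observe completion. CoreActorSketch and roundTrip are hypothetical names.

```scala
import akka.actor.{Actor, ActorLogging, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Declared as a class, so every actorOf call gets a fresh instance with its own context.
class CoreActor extends Actor with ActorLogging {
  def receive: Receive = {
    case _: String =>
      log.info("I received something")
      context.become(afterSend)
  }

  def afterSend: Receive = {
    case _ =>
      sender() ! "done" // reply only so the sketch can observe completion
      context.stop(self)
  }
}

object CoreActorSketch {
  // Hypothetical helper: starts one actor under a unique name and drives it
  // through both states, returning the reply from the second state.
  def roundTrip(system: ActorSystem, name: String): Any = {
    implicit val timeout: Timeout = Timeout(3.seconds)
    val ref = system.actorOf(Props(new CoreActor), name)
    ref ! "first"
    Await.result(ref ? "second", 3.seconds)
  }
}
```

Calling roundTrip twice with different names creates two independent actors, which is the situation that failed with the object-based definition.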