Spray, Akka and Apache Kafka Producer

I am creating a simple REST API with Spray/Akka that receives a JSON message and passes it to an Apache Kafka producer. The Apache Kafka producer is a non-blocking API for sending messages to the Kafka message broker, and it is thread-safe (a single instance should be shared by all threads).
My basic architecture is the following (pseudo code) in the routing trait:
val myKafkaProducerActor = system.actorOf(Props[KafkaProducerActor])

val route = {
  path("message") {
    post { // a JSON payload belongs in a POST body rather than a GET
      entity(as[String]) { message =>
        myKafkaProducerActor ! message   // fire-and-forget hand-off
        complete(StatusCodes.Accepted)   // the route must still complete the request
      }
    }
  }
}
That is, I always use one single actor (myKafkaProducerActor) to forward the message, since that actor performs only very minimal checks (whether the payload is a JSON document at all) and hands it over immediately to the non-blocking producer API.
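For reference, a minimal sketch of what such a forwarding actor could look like. This is my assumption of the setup, not code from the question: the topic name, the shared KafkaProducer[String, String] passed in from outside, and the JSON check via spray-json are all illustrative.

import akka.actor.Actor
import akka.event.Logging
import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}
import spray.json._

class KafkaProducerActor(producer: KafkaProducer[String, String], topic: String) extends Actor {
  // LoggingAdapter is safe to use from the producer's callback thread.
  val log = Logging(context.system, this)

  def receive = {
    case message: String =>
      // Cheap validity check: is the payload parseable JSON at all?
      val isJson = try { message.parseJson; true } catch { case _: Exception => false }
      if (isJson) {
        // Non-blocking send; the callback fires on the producer's I/O thread.
        producer.send(new ProducerRecord[String, String](topic, message), new Callback {
          def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
            if (exception != null) log.error(exception, "Kafka send failed")
        })
      }
  }
}

Because the KafkaProducer instance is thread-safe, the same instance could be shared by every routee if you later decide to put a pool of these actors behind a router.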
My concern is now:
Does it make sense at all to forward the message to a separate actor? (The Kafka producer is non-blocking; I only separated it out because of the validity checks, which are currently cheap.)
How does Akka's default message-delivery guarantee (at-most-once) affect Spray? Is the risk only theoretical, since the message is forwarded within the same JVM? Or is it better to use no follow-up actors at all and accept a small performance penalty in exchange for greater reliability?
Thanks.

Related

AWS SQS Selective Polling Pattern

I have a system where I publish updates to a shared topic meant for specific consumers.
I noticed messages getting stuck in the queue: because SQS consumers cannot listen selectively, messages are being hijacked by consumers they were not meant for.
Example:
Given: Message{destination: A, payload: 1234}
Given: ConsumerA, & ConsumerB
I expect the message to be processed by ConsumerA. However, it gets hijacked by ConsumerB continuously: ConsumerB receives the message, refuses to process it since the destination field doesn't match, and lets the visibility timeout expire, so the message is put back on the queue. But due to the nature of SQS, ConsumerB has an equal chance of picking the message up again.
My question is, what patterns are used to solve this type of issue?
I'm considering creating a queue per consumer, but that has drawbacks specific to the system I'm working on.
If I could listen only for messages with matching attributes, the problem would be solved, but that's seemingly not the case.
Is there any other way?
Sharing a single Amazon SQS queue is not an appropriate architecture for your use-case.
If you want your consumers to be able to 'request' a message from a particular subset, you should either use separate SQS queues or use a database. You could even store objects in Amazon S3 as a form of NoSQL database.
Having consumers grab messages and then 'send them back' to the queue is not compatible with the design of the Amazon SQS service.
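A minimal sketch of the queue-per-consumer pattern suggested above, assuming the AWS SDK for Java v1; the queue URLs and the routing map are hypothetical:

import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.SendMessageRequest

object DestinationRouter {
  private val sqs = AmazonSQSClientBuilder.defaultClient()

  // One queue per consumer: the publisher routes on the destination field,
  // so no consumer ever sees (and re-queues) a message meant for another.
  private val queueUrlByDestination = Map(
    "A" -> "https://sqs.us-east-1.amazonaws.com/123456789012/consumer-a",
    "B" -> "https://sqs.us-east-1.amazonaws.com/123456789012/consumer-b"
  )

  def publish(destination: String, payload: String): Unit =
    queueUrlByDestination.get(destination).foreach { url =>
      sqs.sendMessage(new SendMessageRequest(url, payload))
    }
}

This moves the selection to publish time, which is the direction SQS is designed for, instead of filtering at consume time.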

Multiple different consumers of same Kinesis stream

I have a Kinesis producer which writes a single type of message to a stream. I want to process this stream in multiple, completely different consumer applications. So, a pub/sub with a single publisher for a given topic/stream. I also want to make use of checkpointing to ensure that each consumer processes every message written to the stream.
Initially, I was using the same App Name for all consumers and producers. However, I started getting the following error once I started more than one consumer:
com.amazonaws.services.kinesis.model.InvalidArgumentException: StartingSequenceNumber 49564236296344566565977952725717230439257668853369405442 used in GetShardIterator on shard shardId-000000000000 in stream PackageCreated under account ************ is invalid because it did not come from this stream. (Service: AmazonKinesis; Status Code: 400; Error Code: InvalidArgumentException; Request ID: ..)
This seems to be because the consumers' checkpoints are clashing, as they are all using the same App Name.
From reading the documentation, it seems the only way to do pub/sub with checkpointing is by having a stream per consumer application, which requires each producer to know about all possible consumers. This is more tightly coupled than I want; it's really just a queue.
It seems like Kafka supports what I want: arbitrary consumption of a given topic/partition, since consumers are completely in control of their own checkpointing. Is my only option to move to Kafka, or some other alternative, if I want pub/sub with checkpointing?
My RecordProcessor code, which is identical in each consumer:
override def processRecords(processRecordsInput: ProcessRecordsInput): Unit = {
  log.trace("Received record(s) from Kinesis")
  for {
    // getRecords returns a java.util.List, so the for-comprehension needs
    // .asScala (import scala.collection.JavaConverters._)
    record <- processRecordsInput.getRecords.asScala
    json   <- jawn.parseByteBuffer(record.getData).toOption
    msg    <- decode[T](json.toString).toOption
  } yield subscriber ! msg
  // Checkpoint unconditionally: every received record is marked as processed.
  processRecordsInput.getCheckpointer.checkpoint()
}
The code parses the message and sends it off to the subscriber. For now, I'm simply marking all messages as successfully received. I can see messages being sent on the AWS Kinesis dashboard, but no reads happen, presumably because each application has its own AppName and doesn't see any other messages.
The pattern you want, one publisher to and multiple consumers from one Kinesis stream, is supported. You don't need a separate stream per consumer.
How do you do that? Give a different application name to every consumer. That way, the checkpointing info of one consumer won't collide with that of another.
Check the first response to this: https://forums.aws.amazon.com/message.jspa?messageID=554375
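A sketch of how that could look with the KCL 1.x API the question is using; the application names and the worker-id choice are my own illustrations:

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessorFactory
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.{KinesisClientLibConfiguration, Worker}

def startConsumer(applicationName: String, factory: IRecordProcessorFactory): Worker = {
  // Each application name gets its own DynamoDB lease/checkpoint table,
  // so the consumers no longer clash over checkpoints.
  val config = new KinesisClientLibConfiguration(
    applicationName,                        // distinct per consumer application
    "PackageCreated",                       // the stream name from the question
    new DefaultAWSCredentialsProviderChain(),
    java.util.UUID.randomUUID().toString)   // worker id
  val worker = new Worker.Builder()
    .recordProcessorFactory(factory)
    .config(config)
    .build()
  new Thread(worker).start()
  worker
}

// factoryA / factoryB: your own IRecordProcessorFactory instances
startConsumer("consumer-app-a", factoryA)
startConsumer("consumer-app-b", factoryB)

Both workers then receive every record written to the stream, each tracking its own position.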

Does a new instance of an actor get created when there are too many messages?

I recently learned about Akka, but there are some ideas I can't grasp.
My question is: if there are too many messages in the queue, will a new actor be created?
In many frameworks, when an http-request message comes in and the framework finds that the current "worker" is busy, it creates another "worker" to process the new message on another thread.
But it seems Akka doesn't work this way; there is only one actor instance.
So I think a "busy actor" will block the queue, which will hurt throughput and performance. Am I correct?
Each actor stores its messages in a mailbox.
http://doc.akka.io/docs/akka/current/scala/mailboxes.html
The default mailbox is unbounded and non-blocking. If your actor cannot process messages quickly enough, its mailbox balloons in size and consumes increasing amounts of RAM. You can configure Akka to use a bounded, blocking mailbox, which will block the sender when over capacity.
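A minimal sketch of such a bounded mailbox, configured in code for brevity; the mailbox name, capacity and timeout are arbitrary examples:

import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

object BoundedMailboxExample extends App {
  // Senders block for up to 10s once the mailbox holds 1000 messages.
  val config = ConfigFactory.parseString(
    """
    bounded-mailbox {
      mailbox-type = "akka.dispatch.BoundedMailbox"
      mailbox-capacity = 1000
      mailbox-push-timeout-time = 10s
    }
    """)

  class Worker extends Actor {
    def receive = {
      case msg => () // process the message here
    }
  }

  val system = ActorSystem("example", config.withFallback(ConfigFactory.load()))
  val worker = system.actorOf(Props[Worker].withMailbox("bounded-mailbox"))
}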
If you would like to dynamically manage a pool of actors, look into Routing strategies.
http://doc.akka.io/docs/akka/2.4.1/scala/routing.html
You can create a Router Actor that receives messages and passes them to routee actors. The Router also manages the routee pool and can dynamically generate routees as needed.
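For instance, here is a hedged sketch of a resizable round-robin pool (pool sizes and names are arbitrary):

import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.{DefaultResizer, RoundRobinPool}

object PoolExample extends App {
  class Worker extends Actor {
    def receive = {
      case msg => () // process the message here
    }
  }

  val system = ActorSystem("example")
  // A router that distributes messages round-robin over a pool of routees,
  // resizing the pool between 2 and 10 workers depending on pressure.
  val resizer = DefaultResizer(lowerBound = 2, upperBound = 10)
  val router = system.actorOf(RoundRobinPool(5, Some(resizer)).props(Props[Worker]), "workerPool")

  router ! "some work" // the router picks an available routee
}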
Also, if you use Futures and asynchronous callbacks, your actors will not block on HTTP requests.
TL;DR:
If you send messages faster than your actor can process them, eventually your application will exhaust its memory (with the default unbounded mailbox) or start dropping messages (with a bounded one).
Longer answer:
As I understand it, every Akka actor has a queue (its mailbox) associated with it, which holds all the messages it receives.
If you send messages to this actor faster than it can process them, the queue eventually overwhelms the available RAM, since queued messages are kept in memory.
Akka does not spawn another actor on the fly, because the messages in the mailbox are processed in order; that ordering would be broken if more than one consumer existed.
I would suggest you take a look at Akka Streams. This is a higher-level API built on top of actors that guards against this kind of problem by providing backpressure throughout your system: if the actor you're sending messages to is slower than whoever is producing the messages, the consumer asks the producer to slow down instead of letting your actor's queue overflow.
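A small sketch of that backpressure in action (names and delays are arbitrary): the source can emit as fast as it likes, but elements are only pulled when the slow sink signals demand, so nothing piles up unboundedly.

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object BackpressureSketch extends App {
  implicit val system = ActorSystem("streams")
  implicit val materializer = ActorMaterializer()

  Source(1 to 1000)
    .map { i => println(s"produced $i"); i }
    .runWith(Sink.foreach { i =>
      Thread.sleep(100) // simulate a slow consumer
      println(s"consumed $i")
    })
}

Apart from a small internal buffer, "produced" and "consumed" stay in lock step, because demand flows upstream from the sink.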

Synchronous ActiveMQ web service

I have a RESTful web service that sends a message through ActiveMQ and synchronously receives the response by creating a temporary listener in the same request.
The problem is that the listener waits for the response of the synchronous process but never dies. I need the listener to receive the response and stop immediately once the web service request has been answered.
This is a serious problem, because a listener is created for each web service request and stays active, producing overhead.
The code in the link is not production grade; it is simply an example of how to make a "hello world" request/reply.
Here is some pseudo code for consuming a response blockingly and closing the consumer afterwards:
MessageConsumer responseConsumer = session.createConsumer(tempDest);
Message response = responseConsumer.receive(waitTimeout); // blocks until a reply arrives or the timeout expires
// TODO handle the response message
responseConsumer.close(); // dispose of the consumer so it does not linger
Temporary destinations in JMS are pretty slow anyway. You can instead use JMSCorrelationID and make the replies go to a "regular" queue handled by a single consumer for all replies. That way you need some thread-handling code to hand the message over to the web service thread, but it will be non-blocking and very fast.
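A related, simpler variant uses a message selector on the correlation ID instead of a dispatcher thread. A sketch, written in Scala against the plain javax.jms API, with queue names and payload assumed:

import javax.jms.{Connection, Message, Session, TextMessage}

def requestReply(connection: Connection, waitTimeoutMs: Long): Option[Message] = {
  val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
  val producer = session.createProducer(session.createQueue("requestQueue"))
  val replyQueue = session.createQueue("replyQueue") // one shared, durable reply queue

  val request: TextMessage = session.createTextMessage("payload")
  val correlationId = java.util.UUID.randomUUID().toString
  request.setJMSCorrelationID(correlationId)
  request.setJMSReplyTo(replyQueue)
  producer.send(request)

  // Receive only the reply whose correlation ID matches this request.
  val consumer = session.createConsumer(replyQueue, s"JMSCorrelationID = '$correlationId'")
  val reply = Option(consumer.receive(waitTimeoutMs)) // None on timeout
  consumer.close()
  session.close()
  reply
}

The single-consumer dispatch described above scales better under load, but the selector variant needs no extra thread-handling code.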

Glassfish - JMS Request/Response - message doesn't go on queue

I'm trying to implement a web service in GlassFish 3.1.2, using the included OpenMQ JMS queue, that performs a synchronous JMS request/response using a temporary queue for the response. It sends a message that is picked up off the main queue by a remote client job (running outside the container), and receives back a response on the temporary queue.
In a basic Java POC, this works. But once I put the server-side code into the container, it doesn't work.
I turned off the job so that the messages would just go to the queue and not be picked up, and I follow the queue with QBrowser.
If I simply send the message from the producer, it gets onto the queue and could be read by the job.
But once I add in the code to receive() the response, the message is no longer readable on the queue. QBrowser says that there is 1 message on the queue, but it is marked UnAck and the queue appears empty (i.e. the message is not readable).
connectionFactory and requestQueue are injected as @Resource from GlassFish. The main queue is defined in GlassFish.
Web Service innards:
connection = connectionFactory.createConnection();
connection.start();
session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
MessageProducer producer = session.createProducer(requestQueue);
producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);
MyObject myObj = new MyObject();
Message message = session.createObjectMessage(myObj);
TemporaryQueue responseQueue = session.createTemporaryQueue();
MessageConsumer consumer = session.createConsumer(responseQueue);
message.setJMSReplyTo(responseQueue);
producer.send(message);
// If I comment out the next line, the message appears on the queue.
// If I leave it in, it behaves as described above.
Message response = consumer.receive();
I've tried various approaches, including separate connections and sessions and an asynchronous consumer, and attempted a transacted session for the producer, but only got stack traces when trying to commit.
What am I missing to make this get to the queue properly?
Thanks in advance!
Edit: Domain.xml references for ConnectionFactory and Queue:
<connector-connection-pool description="Connection factory for job processing"
    name="jms/MyJobs" resource-adapter-name="jmsra"
    connection-definition-name="javax.jms.ConnectionFactory"
    transaction-support=""></connector-connection-pool>
<connector-resource pool-name="jms/MyJobs" jndi-name="jms/MyJobs"></connector-resource>
<admin-object-resource res-adapter="jmsra" res-type="javax.jms.Queue"
    description="Queue to request a job process" jndi-name="jms/MyJobRequest">
  <property name="Name" value="MyJobRequest"></property>
</admin-object-resource>
[...]
<resource-ref ref="jms/MyJobs"></resource-ref>
<resource-ref ref="jms/MyJobRequest"></resource-ref>
Turned out to be a transactional issue: inside the container-managed transaction, the request message is not actually delivered to the queue until the transaction commits, so the synchronous receive() blocks waiting for a reply to a message that has not really been sent yet.
Got around it by adding a new method with an explicit transaction boundary:
@Transactional(propagation = Propagation.REQUIRED, rollbackFor = Throwable.class)
private void sendMessage(MessageProducer producer, Message message) throws Exception {
    producer.send(message);
}