(Unit) Testing Akka Streams Kafka

I'm currently evaluating Apache Kafka for use as middleware in a microservice environment, not only as a message queue but also for aggregating other data sources. Kafka seems to be the perfect fit. The services are mostly based on the Play Framework, so Akka Streams Kafka seems to be the natural choice for interacting with Kafka.
I prototyped a small app with a consumer and a publisher communicating via JSON, and that was pretty straightforward. But when it comes to unit testing, I'm a little helpless. Is it possible to run the tests in a lightweight fashion, without a running Kafka cluster or an embedded server (check here)? I also found this project, which looked promising, but I was not able to test my consumer with it. Is that not the right tool? I'm a little confused.

Not sure if your question is still relevant, but have you had a look at the Alpakka Kafka testkit?
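If the testkit doesn't fit your case, another lightweight pattern is to keep the Kafka source/sink at the very edge of the stream and unit-test the processing flow with the plain Akka Streams TestKit, feeding it ConsumerRecords by hand. A minimal sketch, assuming Akka 2.6+ (the implicit ActorSystem provides the materializer); the flow and payload are placeholders, not from the original code:

    import akka.actor.ActorSystem
    import akka.stream.scaladsl.{Flow, Keep}
    import akka.stream.testkit.scaladsl.{TestSink, TestSource}
    import org.apache.kafka.clients.consumer.ConsumerRecord

    object StreamLogicTest extends App {
      implicit val system: ActorSystem = ActorSystem("test")

      // The logic under test, factored out so it has no broker dependency.
      val extractValue = Flow[ConsumerRecord[String, String]].map(_.value)

      val (pub, sub) = TestSource.probe[ConsumerRecord[String, String]]
        .via(extractValue)
        .toMat(TestSink.probe[String])(Keep.both)
        .run()

      sub.request(1)
      pub.sendNext(new ConsumerRecord("events", 0, 0L, "k", """{"msg":"hi"}"""))
      sub.expectNext("""{"msg":"hi"}""")

      system.terminate()
    }

The point is that the stream logic gets exercised with real backpressure semantics, while Kafka itself only appears in the record type.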

Related

Akka - understanding the actor model

I have been learning Akka for a few days and I have some simple questions to help me understand it properly. How should the application architecture be designed for a REST service that uses actors? Should an actor be:
A simple component (for example a service layer, DAO, or controller)?
A business-logic element? For example, should business logic be separated into tasks, each of which is an actor?
A microservice, i.e. a high-level layer, where every microservice in the application works as a separate actor?
I can't figure out how to use actors the right way: if I create a REST service with layers (controllers, services, DAO, and a database), how should I split it into actors in an Akka application?
There is a blog post (likely this) that reflects my take on Akka actors pretty well. I don't really use them.
It depends on who you talk to: some people are really into actors, whereas others see them as an underlying essential that isn't all that useful at the application level.
I use actors for handling state. That's all. Otherwise it's Futures or Akka Streams. I hope you like the blog post. If you still have questions after reading it, please shoot. I have 5+ years of Akka behind me and am happy to help.
I wouldn't recommend building a REST service out of raw Akka actors. Actors are better used for encapsulating state and behavior. For example, loosely coupled, lightweight actors can be used to simulate individual IoT devices (e.g. thermostats), each of which maintains its own internal state (e.g. the cool setting) and adjusts/reports its settings via non-blocking message passing.
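For illustration, here is a minimal sketch of that thermostat idea with classic Akka actors; the message protocol and names are made up for the example:

    import akka.actor.{Actor, ActorSystem, Props}

    // Hypothetical message protocol for one device.
    final case class SetTemperature(value: Double)
    case object GetTemperature

    class Thermostat extends Actor {
      private var setting = 20.0 // mutable state, safely confined to this actor

      def receive: Receive = {
        case SetTemperature(v) => setting = v        // adjust via message passing
        case GetTemperature    => sender() ! setting // report via message passing
      }
    }

    object Demo extends App {
      val system = ActorSystem("iot")
      val thermostat = system.actorOf(Props[Thermostat](), "livingroom")
      thermostat ! SetTemperature(22.5) // non-blocking fire-and-forget
    }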
For the REST API/service, you might want to consider using Play, which is built on top of Akka and supports non-blocking I/O, JSON as a first-class citizen, WebSockets, etc. Here's a basic example of creating a REST service using Play.
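For reference, this is roughly what a minimal Play controller looks like (the route and controller names are illustrative, not taken from the linked example):

    import javax.inject._
    import play.api.libs.json.Json
    import play.api.mvc._

    @Singleton
    class PingController @Inject()(cc: ControllerComponents) extends AbstractController(cc) {
      // GET /ping -> {"status":"ok"}; the route is wired up in conf/routes
      def ping: Action[AnyContent] = Action {
        Ok(Json.obj("status" -> "ok"))
      }
    }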
On microservices, as noted in the link above:
Building a REST API in Play does not automatically make it a "microservice" because it does not cover larger scale concerns about microservices such as ensuring resiliency, consistency, or monitoring.
To make your REST API a microservice, consider the Lagom framework, which is built on top of Play/Akka and brings the reactive qualities along with it.

Kafka: Are there any examples of how to use Mockito for unit testing Kafka?

I have a producer application that needs unit testing. I don't want to spin up ZooKeeper and Kafka servers for this purpose. Is there a simpler way to test it using Mockito?
If you don't want to start Kafka and ZooKeeper, you can use the mock clients that ship with Kafka to fake sending and receiving messages from a Kafka cluster:
MockProducer: http://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/MockProducer.html
MockConsumer: http://kafka.apache.org/10/javadoc/org/apache/kafka/clients/consumer/MockConsumer.html
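Here's a minimal sketch of the MockProducer approach, written in Scala to match the rest of this page; the topic and payload are placeholders:

    import org.apache.kafka.clients.producer.{MockProducer, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object ProducerTest extends App {
      // autoComplete = true acknowledges every send immediately.
      val producer =
        new MockProducer[String, String](true, new StringSerializer, new StringSerializer)

      // Hand the mock to your code wherever it expects a Producer[K, V],
      // then assert on what was "sent" -- no broker, no ZooKeeper.
      producer.send(new ProducerRecord("events", "key-1", """{"event":"created"}"""))

      assert(producer.history().size == 1)
      assert(producer.history().get(0).value == """{"event":"created"}""")
    }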
For this kind of testing I've used EmbeddedKafka from the spring-kafka-test library (even though I wasn't using Spring in my app, that proved to be the easiest way of setting up unit tests). Here's an example: https://www.codenotfound.com/spring-kafka-embedded-unit-test-example.html
It actually spins up Kafka and ZooKeeper in the same process for you, so you're not really mocking anything out, and you don't need Mockito for this. I used plain JUnit.

Role of Kafka consumer: separate service or Django component?

I'm designing a web log analytics system.
I found an architecture with Django (back-end & front-end) + Kafka + Spark.
I also found a similar system at this link: http://thevivekpandey.github.io/posts/2017-09-19-high-velocity-data-ingestion.html, which follows the same kind of architecture.
But I'm confused about the role of the Kafka consumer. It should be a service, independent of Django, right?
So if I want to plot real-time data on a front-end chart, how do I attach it to Django?
It would be ridiculous to place both the Kafka consumer and producer in Django: a request from the SDK comes to Django, is passed to a Kafka topic (producer), and then returns to Django (consumer) for processing. Why don't we go directly? That looks simpler and better.
Please help me understand the role of the Kafka consumer: where should it belong, and how do I connect it to my front-end?
The article describes the problem they had without Kafka:
We saw that in times of peak load, data ingestion was not working properly: it was taking too long to connect to MongoDB and requests were timing out. This was leading to data loss.
So the main point of introducing Kafka and a Kafka consumer is to avoid overloading the DB layer and to handle spikes gracefully with a messaging layer in between. To be honest, any message queue could be used here, not only Kafka.
The Kafka consumer can be part of the web layer. That wouldn't be optimal, though, because you want separation of concerns (which makes the system more reliable in case of failures) and the ability to scale things independently.
It's better to implement the Kafka consumer as a separate service if the concerns mentioned above (scalability and reliability) really matter and if it's operationally feasible for you (you now need to deploy, monitor, etc. one more service). In the end it's a classic monolith vs. microservices dilemma.
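As a sketch of what that separate service amounts to, here is the skeleton of a standalone consumer loop using the plain Kafka client (Scala 2.13 here for consistency with the rest of this page; the topic, group id, and DB write are placeholders):

    import java.time.Duration
    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import scala.jdk.CollectionConverters._

    object LogWriterService extends App {
      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")
      props.put("group.id", "log-writer")
      props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

      val consumer = new KafkaConsumer[String, String](props)
      consumer.subscribe(Collections.singletonList("weblogs"))

      while (true) {
        val records = consumer.poll(Duration.ofMillis(500)).asScala
        // Batch-write to the DB here. A slow database only delays this loop;
        // it never blocks the web tier, which keeps producing to the topic.
        records.foreach(r => println(s"${r.offset}: ${r.value}"))
      }
    }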

How to configure OSB to consume messages from Amazon SQS

I'm a newbie to AWS and trying to work with SQS for the first time. I have an Oracle Service Bus (OSB) in a non-cloud environment and would like to configure OSB to consume messages from Amazon SQS. The documentation mentions using the REST API and polling repeatedly for messages. I also read about the 'client library for JMS', which lets OSB treat SQS as a JMS provider. What is the best approach to achieve this? I appreciate your input.
The easiest (not necessarily the purest) way would be to create a Java EE app that imports the SQS libraries, pulls messages from AWS, and puts them on a local queue for OSB to process. The example code snippets are in Java, so it should be relatively straightforward.
The purest way would be to set SQS up as a remote JMS provider. However, how to do that is not so clear - you may end up writing most of the code that went into option #1 above, but as a JMS client library instead of an MDB.
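A rough sketch of option #1 using the AWS SDK for Java (v1), called from Scala here for consistency with the rest of this page; the queue name and the local-queue hand-off are placeholders:

    import com.amazonaws.services.sqs.AmazonSQSClientBuilder
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest
    import scala.jdk.CollectionConverters._

    object SqsBridge extends App {
      val sqs = AmazonSQSClientBuilder.defaultClient()
      val queueUrl = sqs.getQueueUrl("osb-inbound").getQueueUrl

      // Stub: replace with a JMS send to the local queue that OSB reads.
      def forwardToLocalQueue(body: String): Unit =
        println(s"would enqueue locally: $body")

      while (true) {
        val request = new ReceiveMessageRequest(queueUrl).withWaitTimeSeconds(20) // long poll
        sqs.receiveMessage(request).getMessages.asScala.foreach { m =>
          forwardToLocalQueue(m.getBody)
          sqs.deleteMessage(queueUrl, m.getReceiptHandle) // delete only after hand-off
        }
      }
    }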

How to distribute tasks to worker nodes in a Clojure app?

On the Python/Django stack we used Celery along with RabbitMQ, and everything was easily done.
However, when we tried doing the same thing in Clojure land, what we could find was Langohr.
In our current naive implementation we have a worker system with three core parts:
Publisher module
Subscriber module
Task module
We can start the system on any node in either publisher or subscriber mode.
They are connected to RabbitMQ server.
They share one worker_queue.
What we do is create tasks in the Task module; when we want to run a task on a subscriber, we send an expression calling the method, in EDN format, to the Subscriber, which decodes it and runs the actual task using eval.
Now, is using eval safe? We are not running expressions generated by users or any third-party system. Initially we were planning to use JSON for the payload, but EDN gave us a lot more flexibility, and so far it works like a charm.
Also, is there a better way to do this?
Depending on your needs (and your team), I highly suggest the Storm project. You get distributed, fault-tolerant, real-time computation, and it is really easy to use.
Another nice thing about Storm is that it supports a plethora of options as the data source for topologies, for example Apache Kafka, RabbitMQ, Kestrel, or MongoDB. If none of those fits, you can write your own spout.
It also has a web interface to see what is happening in your topology.
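To give a feel for it, here is a minimal topology wiring against Storm's Java API (1.x+ package names), written in Scala for consistency with the rest of this page; the spout and bolt are toy placeholders, and in practice you would plug in a Kafka or RabbitMQ spout:

    import org.apache.storm.{Config, LocalCluster}
    import org.apache.storm.testing.TestWordSpout
    import org.apache.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer, TopologyBuilder}
    import org.apache.storm.topology.base.BaseBasicBolt
    import org.apache.storm.tuple.{Fields, Tuple, Values}

    // A toy bolt that upper-cases each incoming word.
    class UpperBolt extends BaseBasicBolt {
      override def execute(input: Tuple, collector: BasicOutputCollector): Unit =
        collector.emit(new Values(input.getString(0).toUpperCase))

      override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
        declarer.declare(new Fields("word"))
    }

    object TopologyDemo extends App {
      val builder = new TopologyBuilder
      builder.setSpout("words", new TestWordSpout)             // swap in a Kafka/RabbitMQ spout
      builder.setBolt("upper", new UpperBolt).shuffleGrouping("words")

      new LocalCluster().submitTopology("demo", new Config, builder.createTopology())
    }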