How to unit test Kafka Streams DSL when using Schema Registry - unit-testing

Let's say I want to write a unit test for the example shown here:
https://github.com/confluentinc/kafka-streams-examples/blob/5.1.2-post/src/main/java/io/confluent/examples/streams/WikipediaFeedAvroLambdaExample.java
I tried the following approaches, neither of which worked out for me:
1) Use TopologyTestDriver.
This class is pretty useful as long as the Schema Registry is not involved. I tried making use of MockSchemaRegistryClient, but it didn't work out. And even if it had, it would require me to create my own serializers, which kind of defeats the purpose of the Schema Registry. (A rough sketch of this approach appears right after the question.)
2) Use EmbeddedSingleNodeKafkaCluster defined in the same project.
https://github.com/confluentinc/kafka-streams-examples/blob/5.1.2-post/src/test/java/io/confluent/examples/streams/kafka/EmbeddedSingleNodeKafkaCluster.java
This class is really handy and seems to include an embedded Kafka cluster and Schema Registry, but it does not appear to be published in any artifact. Consequently I tried copying the class, but ran into further import issues.
In particular, I was unable to download this artifact: io.confluent:kafka-schema-registry-client:5.0.0:tests
Has anyone been able to make progress with either of the above options, or found a completely different solution?
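For reference, here is a rough sketch of how the TopologyTestDriver route can look, assuming Kafka 2.4+ (for the TestInputTopic/TestOutputTopic API) and Confluent Platform 5.3+, where a "mock://" schema.registry.url makes the Confluent Avro serdes use a shared in-memory mock registry. The topic names, the WikiFeed Avro class (generated in the example project) and the topology builder are stand-ins for the example's actual code, so treat this as a starting point rather than a drop-in test:

```java
import io.confluent.examples.streams.avro.WikiFeed;
import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;
import org.junit.jupiter.api.Test;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

class WikipediaFeedTopologyTest {

    @Test
    void countsNewEditsPerUser() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wikipedia-feed-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the test driver

        // A "mock://" URL makes the Confluent Avro serdes share one in-memory mock registry.
        Map<String, String> serdeConfig = Collections.singletonMap(
                AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "mock://test-registry");

        SpecificAvroSerde<WikiFeed> feedSerde = new SpecificAvroSerde<>();
        feedSerde.configure(serdeConfig, false); // false = value serde

        try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(serdeConfig), props)) {
            TestInputTopic<String, WikiFeed> feed = driver.createInputTopic(
                    "WikipediaFeed", Serdes.String().serializer(), feedSerde.serializer());
            TestOutputTopic<String, Long> stats = driver.createOutputTopic(
                    "WikipediaStats", Serdes.String().deserializer(), Serdes.Long().deserializer());

            feed.pipeInput("any-key", new WikiFeed("alice", true, "some content"));

            // e.g. assertEquals(Long.valueOf(1L), stats.readKeyValuesToMap().get("alice"));
        }
    }

    // Stand-in for the example's topology: count new feed events per user.
    private Topology buildTopology(Map<String, String> serdeConfig) {
        SpecificAvroSerde<WikiFeed> feedSerde = new SpecificAvroSerde<>();
        feedSerde.configure(serdeConfig, false);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("WikipediaFeed", Consumed.with(Serdes.String(), feedSerde))
               .filter((key, value) -> value.getIsNew())
               .map((key, value) -> KeyValue.pair(value.getUser(), value.getContent()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .count()
               .toStream()
               .to("WikipediaStats", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }
}
```

The key point is that every Avro serde, both inside the topology and in the test itself, must be configured with the same mock:// URL so they all talk to the same in-memory registry.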

For this I ended up writing a small test library based on Testcontainers: https://github.com/vspiliop/embedded-kafka-cluster. It starts a fully configurable Docker-based Kafka cluster (broker, ZooKeeper and Confluent Schema Registry) as part of your tests. Check out the example unit and Cucumber tests.
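If you prefer to wire this up yourself instead of pulling in that library, roughly the same setup can be sketched with plain Testcontainers. This is only an outline, not that library's API: the image tags, the network alias and the internal PLAINTEXT://kafka:9092 listener wiring are assumptions that may need adjusting for your versions (KafkaContainer runs its own embedded ZooKeeper by default):

```java
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.containers.Network;
import org.testcontainers.containers.wait.strategy.Wait;
import org.testcontainers.utility.DockerImageName;

class KafkaWithSchemaRegistry {

    static final Network NETWORK = Network.newNetwork();

    // Broker reachable from the schema registry container via the "kafka" network alias.
    static final KafkaContainer KAFKA =
            new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"))
                    .withNetwork(NETWORK)
                    .withNetworkAliases("kafka");

    static final GenericContainer<?> SCHEMA_REGISTRY =
            new GenericContainer<>(DockerImageName.parse("confluentinc/cp-schema-registry:7.4.0"))
                    .withNetwork(NETWORK)
                    .withExposedPorts(8081)
                    .withEnv("SCHEMA_REGISTRY_HOST_NAME", "schema-registry")
                    .withEnv("SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS", "PLAINTEXT://kafka:9092")
                    .waitingFor(Wait.forHttp("/subjects").forPort(8081))
                    .dependsOn(KAFKA);

    static void start() {
        KAFKA.start();
        SCHEMA_REGISTRY.start();
    }

    // Values to feed into the producer/consumer/streams configuration of the test.
    static String bootstrapServers() {
        return KAFKA.getBootstrapServers();
    }

    static String schemaRegistryUrl() {
        return "http://" + SCHEMA_REGISTRY.getHost() + ":" + SCHEMA_REGISTRY.getMappedPort(8081);
    }
}
```

The trade-off versus TopologyTestDriver is the usual one: this gives you a real broker and a real registry at the cost of Docker and noticeably slower tests.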

Related

Can I deploy a multi-class Java jar in AWS Lambda, or is a single class file always recommended in Lambda?

I have an existing Spring Boot app, not a web service but a Kafka client app. The issue is that we are structured with the typical Processor -> Service -> DAO layers. The jar is above 50 MB, so it isn't a great candidate for AWS Lambda anyway. I have some doubts: can I deploy the full jar, or should I use Step Functions? All the tutorials show a single-class function. Has anyone tried this out (a multi-class jar)? Also, Lambda has now introduced Docker container images, which adds more confusion: can I deploy a Docker image, or is that the same thing under the hood?
My pick is ECS/EKS with Fargate; basically I am planning to get rid of the Docker image as well. But it looks like there is no way to host my existing app on Lambda other than refactoring it into Step Functions. Is that correct?
You can deploy the full fat jar with the usual multi-class hierarchy, but it is not recommended because of the cold-start issue, unless you use provisioned concurrency.
Here are my tips for you:
Keep the multi-class hierarchy, which doesn't have much impact on the jar size anyway; this keeps your code testable. Try to remove Spring if possible and either wire your dependencies by hand or use a lightweight dependency injection framework for that purpose (see the sketch below).
Review all your dependencies and remove jars that are not needed. Our own code is usually very small; it is the dependency jars that make the deployable huge.
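To make the first tip concrete, here is a minimal sketch of a hand-wired multi-class Lambda. All class names are hypothetical; the point is simply that the handler stays thin and the usual Service -> DAO layering survives without a Spring context to initialize at cold start:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class OrderHandler implements RequestHandler<OrderRequest, OrderResponse> {

    // Wire the Service -> DAO graph by hand once per container; this only runs during cold start.
    private final OrderService service = new OrderService(new OrderDao());

    @Override
    public OrderResponse handleRequest(OrderRequest request, Context context) {
        return service.process(request);
    }
}

class OrderService {
    private final OrderDao dao;
    OrderService(OrderDao dao) { this.dao = dao; }
    OrderResponse process(OrderRequest request) { return new OrderResponse(dao.save(request)); }
}

class OrderDao {
    String save(OrderRequest request) { /* persist somewhere */ return "order-id"; }
}

// Plain DTOs; public fields so Lambda's default JSON (de)serialization works without extra config.
class OrderRequest { public String payload; }
class OrderResponse {
    public String orderId;
    OrderResponse(String orderId) { this.orderId = orderId; }
}
```

Each of these classes can be unit tested on its own, exactly as in the original layered app; only the thin handler is Lambda-specific.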

How to test Elasticsearch index creation?

I would like to write a JUnit test using some kind of embedded Elasticsearch engine in order to test my services, which should create indexes with mappings on start-up. What is the best way to do this?
It would probably also be enough to use ESTestCase. Unfortunately, I cannot find simple usage examples. Could anyone provide one?
There has been no embedded Elasticsearch since 5.x. I would use Testcontainers for this: https://github.com/dadoonet/testcontainers-java-module-elasticsearch
PS: This code will soon move to the Testcontainers repo.
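As a starting point, a test against the Testcontainers Elasticsearch module might look roughly like this (image tag, index name and mapping are placeholders, and the call to your own service is stubbed here with a direct index-creation request):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;
import org.junit.jupiter.api.Test;
import org.testcontainers.elasticsearch.ElasticsearchContainer;

import static org.junit.jupiter.api.Assertions.assertEquals;

class IndexCreationTest {

    @Test
    void createsIndexWithMappingOnStartup() throws Exception {
        try (ElasticsearchContainer es =
                     new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:7.17.0")) {
            es.start();

            try (RestClient client = RestClient.builder(HttpHost.create(es.getHttpHostAddress())).build()) {
                // In a real test you would run your service's start-up logic against this client/URL.
                Request create = new Request("PUT", "/my-index");
                create.setJsonEntity("{\"mappings\":{\"properties\":{\"name\":{\"type\":\"keyword\"}}}}");
                client.performRequest(create);

                // Assert that the index now exists (HEAD returns 200 for an existing index).
                Response head = client.performRequest(new Request("HEAD", "/my-index"));
                assertEquals(200, head.getStatusLine().getStatusCode());
            }
        }
    }
}
```

A GET on /my-index/_mapping could be added to verify the mapping content, not just the index existence.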

How to write automated tests when using cloud APIs?

I'm adding to an open source project that uses some Azure cloud functionality (the same general problem applies to any cloud API). I want to write tests for my code, but the results of the tests rely on something having happened in the cloud service I'm using, and for that to happen I need to supply credentials to the cloud service. In a private project I could simply add my cloud credentials to the testing environment, but for public/open source projects I can't do this. I can test locally easily enough, but this project uses CI (as do many OSS projects), so that doesn't really solve it.
One approach seems to be to use mocks or something similar, but that doesn't actually test that things are happening as they should, and strikes me as a mostly pointless way to reach 100% coverage.
Are there any 'virtual test cloud' environments that can be spun up to create an identical interface to the cloud service in question, but only for testing? How do these deal with side effects (the code in question creates a DNS entry, and ideally would test for the actual existence of a DNS entry using the system's resolver rather than another cloud call)?
How do people do this kind of testing?
I start with a spike solution to learn how to pass the required credentials. With this knowledge, I can TDD an acceptance test to call a simple API and get a "success" result.
I exclude the credentials from my repository. Instead, I include a template file with instructions.
From there, I drop down to unit tests to TDD sending requests and receiving responses. I don't test actual communication with any service. Instead:
Test the contents of requests.
Create responses and test how they're handled. This makes it really easy to test all sorts of error conditions (see the sketch at the end of this answer).
Once I've TDD'd credentials, requests, and responses, I use what I call a spike test to confirm that everything is in fact working. Basically, this uses non-automated confirmation in anything I can quickly hack together.
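To illustrate the request/response part with entirely hypothetical names: the production code talks to the cloud through a small interface you own, and the test substitutes a fake that records outgoing requests and returns prepared responses, so no credentials or network access are needed (the sketch uses Java records, so Java 16+):

```java
import java.util.ArrayList;
import java.util.List;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

// The seam: production code depends on this interface, never on the vendor SDK directly.
interface DnsApi {
    DnsResponse createRecord(DnsRequest request);
}

record DnsRequest(String name, String type, String value) {}
record DnsResponse(int statusCode, String message) {}

// Production code under test.
class DnsService {
    private final DnsApi api;
    DnsService(DnsApi api) { this.api = api; }

    boolean registerHost(String host, String ip) {
        DnsResponse response = api.createRecord(new DnsRequest(host, "A", ip));
        return response.statusCode() == 201;
    }
}

// Test double: records requests (to assert on their contents) and returns canned responses.
class FakeDnsApi implements DnsApi {
    final List<DnsRequest> requests = new ArrayList<>();
    DnsResponse cannedResponse = new DnsResponse(201, "created");

    @Override
    public DnsResponse createRecord(DnsRequest request) {
        requests.add(request);
        return cannedResponse;
    }
}

class DnsServiceTest {
    @Test
    void sendsAnARecordRequest() {
        FakeDnsApi fake = new FakeDnsApi();
        new DnsService(fake).registerHost("app.example.com", "10.0.0.1");

        Assertions.assertEquals("A", fake.requests.get(0).type()); // assert on request contents
    }

    @Test
    void reportsFailureOnErrorResponse() {
        FakeDnsApi fake = new FakeDnsApi();
        fake.cannedResponse = new DnsResponse(500, "boom"); // simulate an error condition

        Assertions.assertFalse(new DnsService(fake).registerHost("app.example.com", "10.0.0.1"));
    }
}
```

Only the thin adapter that implements DnsApi against the real SDK needs the spike test with real credentials.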

How to specify custom database-connection parameters for testing purposes in Play Framework v2?

I want to run my tests against a distinct PostgreSQL database, as opposed to the in-memory database option or the default database configured for the local application setup (via the db.default.url configuration variable). I tried using the %test.db and related configuration variables (as seen here), but that didn't seem to work; I think those instructions are intended for Play Framework v1.
FYI, the test database will have its schema pre-defined and will not need to be created and destroyed with each test run. (That said, I don't mind if it is re-created and destroyed with each test run, but I don't want to use "evolutions" to do so; I have a single SQL schema file I'm using at this point.)
Use alternative configuration files during local development to override DB credentials (and other settings), i.e. as described in the other answer (Update 1).
Tip: using different kinds of databases in development and production quickly leads to errors and bugs, so it's better to install the same DB locally for development and testing.
We were able to implement Play 1.x style configs on top of Play 2.x - though I bet the creators of Play will cringe when they hear this.
The code is not quite shareable, but basically, you just have to override the "configuration" method in your GlobalSettings: http://www.playframework.org/documentation/api/2.0.3/scala/index.html#play.api.GlobalSettings
You can check for some system or config setting like "environment.tag=%test" and then rewrite every config entry of the form "%test.foo=bar" into "foo=bar" (a rough sketch of that remapping follows).
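A sketch of that remapping, written against the Typesafe Config API that backs Play 2 configuration. All names are illustrative; you would return the result from your GlobalSettings configuration/onLoadConfig override, and depending on HOCON quoting of "%"-prefixed keys the prefix check may need adjusting:

```java
import com.typesafe.config.Config;
import com.typesafe.config.ConfigValue;

import java.util.Map;

final class EnvironmentConfig {

    // tag is something like "%test", read e.g. from an "environment.tag" setting or a system property.
    static Config applyTag(Config config, String tag) {
        String prefix = tag + "."; // "%test."
        Config result = config;
        for (Map.Entry<String, ConfigValue> entry : config.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                // "%test.db.default.url" overrides "db.default.url"
                result = result.withValue(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }
}
```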

Automated Testing in Apache Hive

I am about to embark on a project using Apache Hadoop/Hive which will involve a collection of Hive query scripts to produce data feeds for various downstream applications. These scripts seem like ideal candidates for some unit testing - they represent the fulfillment of an API contract between my data store and client applications, and as such, it's trivial to write what the expected results should be for a given set of starting data. My issue is how to run these tests.
If I were working with SQL queries, I could use something like SQLite or Derby to quickly bring up test databases, load test data, and run a collection of query tests against them. Unfortunately, I am unaware of any such tools for Hive. At the moment, my best thought is to have the test framework bring up a local Hadoop instance and run Hive against that, but I've never done that before and I'm not sure it will work, or that it's the right path.
Also, I'm not interested in a pedantic discussion about if what I am doing is unit testing or integration testing - I just need to be able to prove my code works.
Hive has a special standalone mode, designed specifically for testing purposes, in which it can run without Hadoop. I think it is exactly what you need.
Here is a link to the documentation:
http://wiki.apache.org/hadoop/Hive/HiveServer
I'm working as part of a team to support a big data and analytics platform, and we also have this kind of issue.
We've been searching for a while and we found two pretty promising tools: https://github.com/klarna/HiveRunner https://github.com/bobfreitas/HadoopMiniCluster
HiveRunner is a framework built on top of JUnit for testing Hive queries. It starts a standalone HiveServer with an in-memory HSQL database as the metastore. With it you can stub tables and views, mock samples, etc.
There are some limitations regarding Hive versions, though, but I definitely recommend it (a small sketch of a HiveRunner test follows below).
Hope it helps you =)
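For what it's worth, a small HiveRunner test might look roughly like this (JUnit 4 style; annotation and class names as in recent HiveRunner versions, and the database, table and query are placeholders):

```java
import com.klarna.hiverunner.HiveShell;
import com.klarna.hiverunner.StandaloneHiveRunner;
import com.klarna.hiverunner.annotations.HiveSQL;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;

import java.util.List;

@RunWith(StandaloneHiveRunner.class)
public class FeedQueryTest {

    @HiveSQL(files = {}) // could also point at your real .hql scripts on the classpath
    public HiveShell shell;

    @Test
    public void sumsAmountsPerUser() {
        // Set up source data entirely in-memory; no real Hadoop cluster involved.
        shell.execute("CREATE DATABASE source_db");
        shell.execute("CREATE TABLE source_db.events (user_id STRING, amount INT)");
        shell.execute("INSERT INTO source_db.events VALUES ('alice', 10), ('alice', 5)");

        List<String> result = shell.executeQuery(
                "SELECT user_id, SUM(amount) FROM source_db.events GROUP BY user_id");

        Assert.assertEquals(1, result.size());
        Assert.assertEquals("alice\t15", result.get(0)); // result columns are tab-separated by default
    }
}
```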
You may also want to consider the following blog post which describes automating unit testing using a custom utility class and ant: http://dev.bizo.com/2011/04/hive-unit-testing.html
I know this is an old thread, but just in case someone comes across it: I have followed up on the whole minicluster & Hive testing story and found that things have changed with MR2 and YARN, but in a good way. I have put together an article and a GitHub repo to give some help with it:
http://www.lopakalogic.com/articles/hadoop-articles/hive-testing/
Hope it helps!