Flink: Access operator state after execution is complete - state

Assume I have a custom RichFunction with some raw state. How can I get that state (from every parallel instance of the operator) back to the main/driver code when the Flink job ends?
class MyRichMap extends RichMapFunction[SomeType, Unit] {
  protected var engine: Engine = _
  override def open(parameters: Configuration): Unit = {
    // assume engine is initialized here
    ....
  }
  override def map(value: SomeType): Unit = {
    engine.process(value)
  }
}
val env = StreamExecutionEnvironment.getExecutionEnvironment
...
someSource.map(new MyRichMap())
env.execute()
// How to get engine or some field of it here? (e.g., engine.someCounter)
What's the best way to approach this?

If you want to test MyRichMap() itself, start with unit tests; see https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html
If you want to test a complete workflow, a simple approach that works inside a single JVM (e.g. running locally from the command line or from an IDE such as Eclipse) is to create a sink that captures results in a (thread-safe) singleton, and then check its contents after the job finishes. This requires that your sources are bounded, so that the workflow terminates.
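A minimal sketch of the thread-safe singleton idea, written in plain Java so that it runs without Flink on the classpath. The names `CollectedResults` and `CollectedResultsDemo` are hypothetical. In a real job, the sink's per-record callback would call `CollectedResults.add(...)`, and this only works when the job runs in the same JVM as the driver code. Parallel sink instances are simulated here with threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical JVM-wide holder; a sink would call add() once per record.
final class CollectedResults {
    private static final ConcurrentLinkedQueue<Long> VALUES = new ConcurrentLinkedQueue<>();

    private CollectedResults() {}

    static void add(long value) { VALUES.add(value); }

    static List<Long> snapshot() { return new ArrayList<>(VALUES); }

    static void clear() { VALUES.clear(); }
}

final class CollectedResultsDemo {
    // Simulates `parallelism` sink instances, each emitting `perInstance` records,
    // and returns how many records the singleton captured.
    static int runSimulatedJob(int parallelism, int perInstance) {
        CollectedResults.clear();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            final long instanceId = i;
            Thread worker = new Thread(() -> {
                for (int n = 0; n < perInstance; n++) {
                    CollectedResults.add(instanceId);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // stands in for env.execute() returning on bounded input
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return CollectedResults.snapshot().size();
    }

    public static void main(String[] args) {
        System.out.println(runSimulatedJob(4, 100));
    }
}
```

Because the holder is static, it must be cleared between test runs, as `runSimulatedJob` does before starting the workers.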


trouble debugging async Task in unit test because it's not entered

I am trying to test a method of type async Task that is called inside a command handler. Inside that method I have some ifs, and I want to check which branch is taken.
Because each branch calls a different method, I can see which branch was taken with a check like:
await myRepository.Received(1).Method1(3, null);
Imagine the key method is like this:
public async Task MyKeyMethod(int x) {
    if (x == 21)
        Method1("bla");
    if (x == 22)
        Method2("blue");
    if (x == 23)
        Method3("ba");
}
So I want to test that the call MyKeyMethod(22) actually goes into the branch that calls Method2("blue").
And I know that I can do this by something like:
await MyKeyMethod.Received(1).Method2(22); // Received(1) means that method was invoked once.
Question 1: what should 22 be? The parameter supplied to Method2 or the one supplied to MyKeyMethod?
Question 2: Why does the debugger not even enter any of the async Task methods inside the command handler?
Is there any concrete example that you have?
I am able to enter step by step the command by doing something like:
var cmd = new MyCommand(myObject); // myObject is an object that I mocked earlier (gave it some dummy values for each field)
var commandResponse = await handler.Handle(cmd);
Assert.That(commandResponse.IsSuccessful, Is.True);
...just NOT at the next deeper level, i.e. the async Task methods called inside those commands. At the moment I can only simulate what those async Task methods return, which is not what I want in this instance.
Question 3: Could this be because those async Task methods belong to a repository that is mocked using
myRepository = Substitute.For<IMyRepository>();
Question 4: How do I actually step into (rather than mock) the Task methods of a repository that is mocked?
I am still getting the hang of the broader subject of unit testing with NUnit, but apparently my hunch was right. Because the repository was mocked, I could not step into and debug any of its methods. So I used a real (not fake) instance of the repository, whose constructor takes fake instances of the other repositories and services it depends on, and then I could step into that Task.
So, concretely, instead of:
myRepository = Substitute.For<IMyRepository>();
I went and created a real instance, such as:
var myRepository = new MyRepository(mockService1, mockRepo2);
where mockService1 was mocked using Substitute like previously pointed out.
By doing so I could then debug a method like
myRepository.MyMethod(x), whose body the debugger previously could not step into.
If you have a better way of phrasing my conclusions, by all means, or more complete explanation, please go ahead. Thank you
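The conclusion above (use a real object under test and fake only its dependencies) can be sketched as follows. This is an illustration in Java with a hand-written recording fake instead of NSubstitute. All names (`Notifier`, `RecordingNotifier`, `KeyRepository`) are hypothetical and not taken from the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dependency of the repository; this is the part that gets faked.
interface Notifier {
    void method1(String arg);
    void method2(String arg);
    void method3(String arg);
}

// Hand-written fake that records every call, playing the role of a substitute.
final class RecordingNotifier implements Notifier {
    final List<String> calls = new ArrayList<>();

    @Override public void method1(String arg) { calls.add("method1:" + arg); }
    @Override public void method2(String arg) { calls.add("method2:" + arg); }
    @Override public void method3(String arg) { calls.add("method3:" + arg); }
}

// Real class under test: its branching logic actually executes and can be debugged.
final class KeyRepository {
    private final Notifier notifier;

    KeyRepository(Notifier notifier) { this.notifier = notifier; }

    void myKeyMethod(int x) {
        if (x == 21) notifier.method1("bla");
        if (x == 22) notifier.method2("blue");
        if (x == 23) notifier.method3("ba");
    }
}

final class KeyRepositoryDemo {
    public static void main(String[] args) {
        RecordingNotifier fake = new RecordingNotifier();
        new KeyRepository(fake).myKeyMethod(22);
        System.out.println(fake.calls);
    }
}
```

In this sketch the recorded argument is the one passed to `method2` ("blue"), not the `22` passed to `myKeyMethod`, and the body of `myKeyMethod` is real code that a debugger can step through.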

Wrapper around TASKs in C#

I am using tasks in WinForms (.NET 4.0) to perform lengthy operations such as WCF calls. The application is already in production and makes heavy use of Tasks (almost all the methods that use Tasks return void).
For unit testing we have used AutoResetEvents (in the production code) to find out when a given task has completed, and then perform the asserts.
This makes me think that almost all of the AutoResetEvents are wasted effort. They only serve the needs of unit testing, nothing else.
Can we create a wrapper around Tasks so that they run in the background in production code but run synchronously under unit test?
Something similar to the approach for BackgroundWorker described at the link below.
http://si-w.co.uk/blog/2009/09/11/unit-testing-code-that-uses-a-backgroundworker/
Why can't you simply use a task continuation in your wrapper, like this:
var task = ...
task.ContinueWith(t => { /* check task results here */ });
Also, unit tests can be marked as async, if they have a return type Task, so you can use an await there, and after that do your asserts:
[Test]
public async Task SynchronizeTestWithRecurringOperationViaAwait()
{
    var sut = new SystemUnderTest();
    // Execute code to set up timer with 1 sec delay and interval.
    var firstNotification = sut.StartRecurring();
    // Wait that operation has finished two times.
    var secondNotification = await firstNotification.GetNext();
    await secondNotification.GetNext();
    // Assert outcome.
    Assert.AreEqual("Init Poll Poll", sut.Message);
}
Another approach (from the same article) is to use a custom task scheduler, which will be synchronous in case of unit testing:
[Test]
public void TestCodeSynchronously()
{
    var dts = new DeterministicTaskScheduler();
    var sut = new SystemUnderTest(dts);
    // Execute code to schedule first operation and return immediately.
    sut.StartAsynchronousOperation();
    // Execute all operations on the current thread.
    dts.RunTasksUntilIdle();
    // Assert outcome of the two operations.
    Assert.AreEqual("Init Work1 Work2", sut.Message);
}
MSDN Magazine also contains a nice article about best practices for async unit testing. Also, async void should be used only for event handlers; all other async methods should have an async Task signature.
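The scheduler-injection idea can be sketched in Java terms as well. This is an analogue, not the DeterministicTaskScheduler from the article. It assumes the code under test accepts a `java.util.concurrent.Executor`, so production code can pass a thread pool while a test passes `Runnable::run`, which executes each task immediately on the calling thread. The class `MessageBuilder` is hypothetical.

```java
import java.util.concurrent.Executor;

// Hypothetical system under test: work goes to whichever Executor is injected.
final class MessageBuilder {
    private final Executor executor;
    private final StringBuilder message = new StringBuilder("Init");

    MessageBuilder(Executor executor) { this.executor = executor; }

    void startAsynchronousOperation() {
        executor.execute(() -> append(" Work1"));
        executor.execute(() -> append(" Work2"));
    }

    private synchronized void append(String part) { message.append(part); }

    synchronized String message() { return message.toString(); }
}

final class SameThreadExecutorDemo {
    static String runSynchronously() {
        // Runs every submitted task inline, so no wait handles are needed in the test.
        Executor sameThread = Runnable::run;
        MessageBuilder sut = new MessageBuilder(sameThread);
        sut.startAsynchronousOperation();
        return sut.message();
    }

    public static void main(String[] args) {
        System.out.println(runSynchronously());
    }
}
```

Unlike a queueing scheduler with an explicit "run until idle" step, the inline executor runs tasks at submission time, which is simpler but gives the test no control over when the work happens.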

How do I manage unit test resources in Kotlin, such as starting/stopping a database connection or an embedded elasticsearch server?

In my Kotlin JUnit tests, I want to start/stop embedded servers and use them within my tests.
I tried using the JUnit @Before annotation on a method in my test class and it works, but it isn't the right behaviour since it runs before every test case instead of just once.
Therefore I want to use the @BeforeClass annotation on a method, but adding it results in an error saying it must be on a static method. Kotlin doesn't appear to have static methods. The same applies to static variables, because I need to keep a reference to the embedded server around for use in the test cases.
So how do I create this embedded database just once for all of my test cases?
class MyTest {
    @Before fun setup() {
        // works in that it opens the database connection, but is wrong
        // since this is per test case instead of being shared for all
    }
    @BeforeClass fun setupClass() {
        // what I want to do instead, but results in error because
        // this isn't a static method, and static keyword doesn't exist
    }
    var referenceToServer: ServerType // wrong because is not static either
    ...
}
Note: this question is intentionally written and answered by the author (Self-Answered Questions), so that the answers to commonly asked Kotlin topics are present in SO.
Your unit test class usually needs a few things to manage a shared resource for a group of test methods. In Kotlin you put @BeforeClass and @AfterClass not in the test class itself, but in its companion object, together with the @JvmStatic annotation.
The structure of a test class would look like:
class MyTestClass {
    companion object {
        init {
            // things that may need to be setup before companion class member variables are instantiated
        }
        // variables you initialize for the class just once:
        val someClassVar = initializer()
        // variables you initialize for the class later in the @BeforeClass method:
        lateinit var someClassLateVar: SomeResource
        @BeforeClass @JvmStatic fun setup() {
            // things to execute once and keep around for the class
        }
        @AfterClass @JvmStatic fun teardown() {
            // clean up after this class, leave nothing dirty behind
        }
    }
    // variables you initialize per instance of the test class:
    val someInstanceVar = initializer()
    // variables you initialize per test case later in your @Before methods:
    lateinit var someInstanceLateVar: MyType
    @Before fun prepareTest() {
        // things to do before each test
    }
    @After fun cleanupTest() {
        // things to do after each test
    }
    @Test fun testSomething() {
        // an actual test case
    }
    @Test fun testSomethingElse() {
        // another test case
    }
    // ...more test cases
}
Given the above, you should read about:
companion objects - similar to the Class object in Java, but a singleton per class that is not static
@JvmStatic - an annotation that turns a companion object method into a static method on the outer class for Java interop
lateinit - allows a var property to be initialized later when you have a well defined lifecycle
Delegates.notNull() - can be used instead of lateinit for a property that should be set at least once before being read.
Here are fuller examples of test classes for Kotlin that manage embedded resources.
The first is copied and modified from Solr-Undertow tests, and before the test cases are run, configures and starts a Solr-Undertow server. After the tests run, it cleans up any temporary files created by the tests. It also ensures environment variables and system properties are correct before the tests are run. Between test cases it unloads any temporary loaded Solr cores. The test:
class TestServerWithPlugin {
    companion object {
        val workingDir = Paths.get("test-data/solr-standalone").toAbsolutePath()
        val coreWithPluginDir = workingDir.resolve("plugin-test/collection1")
        lateinit var server: Server
        @BeforeClass @JvmStatic fun setup() {
            assertTrue(coreWithPluginDir.exists(), "test core w/plugin does not exist $coreWithPluginDir")
            // make sure no system properties are set that could interfere with test
            resetEnvProxy()
            cleanSysProps()
            routeJbossLoggingToSlf4j()
            cleanFiles()
            val config = mapOf(...)
            val configLoader = ServerConfigFromOverridesAndReference(workingDir, config) verifiedBy { loader ->
                ...
            }
            assertNotNull(System.getProperty("solr.solr.home"))
            server = Server(configLoader)
            val (serverStarted, message) = server.run()
            if (!serverStarted) {
                fail("Server not started: '$message'")
            }
        }
        @AfterClass @JvmStatic fun teardown() {
            server.shutdown()
            cleanFiles()
            resetEnvProxy()
            cleanSysProps()
        }
        private fun cleanSysProps() { ... }
        private fun cleanFiles() {
            // don't leave any test files behind
            coreWithPluginDir.resolve("data").deleteRecursively()
            Files.deleteIfExists(coreWithPluginDir.resolve("core.properties"))
            Files.deleteIfExists(coreWithPluginDir.resolve("core.properties.unloaded"))
        }
    }
    val adminClient: SolrClient = HttpSolrClient("http://localhost:8983/solr/")
    @Before fun prepareTest() {
        // anything before each test?
    }
    @After fun cleanupTest() {
        // make sure test cores do not bleed over between test cases
        unloadCoreIfExists("tempCollection1")
        unloadCoreIfExists("tempCollection2")
        unloadCoreIfExists("tempCollection3")
    }
    private fun unloadCoreIfExists(name: String) { ... }
    @Test
    fun testServerLoadsPlugin() {
        println("Loading core 'withplugin' from dir ${coreWithPluginDir.toString()}")
        val response = CoreAdminRequest.createCore("tempCollection1", coreWithPluginDir.toString(), adminClient)
        assertEquals(0, response.status)
    }
    // ... other test cases
}
And another starting AWS DynamoDB local as an embedded database (copied and modified slightly from Running AWS DynamoDB-local embedded). This test must hack the java.library.path before anything else happens or local DynamoDB (using sqlite with binary libraries) won't run. Then it starts a server to share for all test classes, and cleans up temporary data between tests. The test:
class TestAccountManager {
    companion object {
        init {
            // we need to control the "java.library.path" or sqlite cannot find its libraries
            val dynLibPath = File("./src/test/dynlib/").absoluteFile
            System.setProperty("java.library.path", dynLibPath.toString());
            // TEST HACK: if we kill this value in the System classloader, it will be
            // recreated on next access allowing java.library.path to be reset
            val fieldSysPath = ClassLoader::class.java.getDeclaredField("sys_paths")
            fieldSysPath.setAccessible(true)
            fieldSysPath.set(null, null)
            // ensure logging always goes through Slf4j
            System.setProperty("org.eclipse.jetty.util.log.class", "org.eclipse.jetty.util.log.Slf4jLog")
        }
        private val localDbPort = 19444
        private lateinit var localDb: DynamoDBProxyServer
        private lateinit var dbClient: AmazonDynamoDBClient
        private lateinit var dynamo: DynamoDB
        @BeforeClass @JvmStatic fun setup() {
            // do not use ServerRunner, it is evil and doesn't set the port correctly, also
            // it resets logging to be off.
            localDb = DynamoDBProxyServer(localDbPort, LocalDynamoDBServerHandler(
                LocalDynamoDBRequestHandler(0, true, null, true, true), null)
            )
            localDb.start()
            // fake credentials are required even though ignored
            val auth = BasicAWSCredentials("fakeKey", "fakeSecret")
            dbClient = AmazonDynamoDBClient(auth) initializedWith {
                signerRegionOverride = "us-east-1"
                setEndpoint("http://localhost:$localDbPort")
            }
            dynamo = DynamoDB(dbClient)
            // create the tables once
            AccountManagerSchema.createTables(dbClient)
            // for debugging reference
            dynamo.listTables().forEach { table ->
                println(table.tableName)
            }
        }
        @AfterClass @JvmStatic fun teardown() {
            dbClient.shutdown()
            localDb.stop()
        }
    }
    val jsonMapper = jacksonObjectMapper()
    val dynamoMapper: DynamoDBMapper = DynamoDBMapper(dbClient)
    @Before fun prepareTest() {
        // insert commonly used test data
        setupStaticBillingData(dbClient)
    }
    @After fun cleanupTest() {
        // delete anything that shouldn't survive any test case
        deleteAllInTable<Account>()
        deleteAllInTable<Organization>()
        deleteAllInTable<Billing>()
    }
    private inline fun <reified T: Any> deleteAllInTable() { ... }
    @Test fun testAccountJsonRoundTrip() {
        val acct = Account("123", ...)
        dynamoMapper.save(acct)
        val item = dynamo.getTable("Accounts").getItem("id", "123")
        val acctReadJson = jsonMapper.readValue<Account>(item.toJSON())
        assertEquals(acct, acctReadJson)
    }
    // ...more test cases
}
NOTE: some parts of the examples are abbreviated with ...
Managing resources with before/after callbacks in tests obviously has its pros:
Tests are "atomic". A test executes as a whole, together with all of its callbacks. One won't forget to fire up a dependency service before the tests and shut it down after they are done. If done properly, the callbacks will work in any environment.
Tests are self-contained. There is no external data or setup phases, everything is contained within a few test classes.
It has some cons too. An important one is that it pollutes the code and makes it violate the single responsibility principle. Tests now not only test something, but also perform heavyweight initialization and resource management. That can be OK in some cases (like configuring an ObjectMapper), but modifying java.library.path or spawning other processes (or in-process embedded databases) is not so innocent.
Why not treat those services as dependencies of your tests that are eligible for "injection", as described by 12factor.net?
This way you start and initialize dependency services somewhere outside of the test code.
Nowadays virtualization and containers are almost everywhere, and most developers' machines are able to run Docker. Most such services have a dockerized version: Elasticsearch, DynamoDB, PostgreSQL and so on. Docker is a perfect solution for external services that your tests need. The services can be started in several ways:
It can be a script that is run manually by a developer every time she wants to execute tests.
It can be a task run by the build tool (e.g. Gradle has an awesome dependsOn and finalizedBy DSL for defining dependencies). Such a task can, of course, execute the same script that a developer runs manually, using shell-outs / process execs.
It can be a task run by the IDE before test execution. Again, it can use the same script.
Most CI / CD providers have a notion of a "service": an external dependency (process) that runs in parallel to your build and can be accessed via its usual SDK / connector / API: Gitlab, Travis, Bitbucket, AppVeyor, Semaphore, …
This approach:
Frees your test code from initialization logic. Your tests will only test and do nothing more.
Decouples code and data. Adding a new test case can now be done by adding new data to the dependency services with their native toolsets. E.g. for SQL databases you'll use SQL, for Amazon DynamoDB you'll use the CLI to create tables and put items.
Is closer to production code, where you obviously do not start those services when your "main" application starts.
Of course, it has its flaws (basically, the mirror image of the pros I started with):
Tests are no longer "atomic". The dependency service must be started somehow prior to test execution. The way it is started may differ between environments: developer's machine or CI, IDE or build tool CLI.
Tests are not self-contained. Your seed data may now even be packed inside an image, so changing it may require rebuilding a different project.
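One way the "injection" of an externally started service could look from the test side is sketched below. The test only needs to learn where the service lives, for example from an environment variable set by the script, build task or CI service definition. The variable name `TEST_DYNAMODB_ENDPOINT` and the default URL are assumptions made for this illustration, not conventions of any tool.

```java
import java.util.Map;

// Resolves the endpoint of a dependency service that was started outside the test code.
final class ServiceEndpoints {
    // Hypothetical variable name and fallback, chosen only for this sketch.
    static final String ENV_NAME = "TEST_DYNAMODB_ENDPOINT";
    static final String DEFAULT_ENDPOINT = "http://localhost:8000";

    private ServiceEndpoints() {}

    // Takes the environment as a parameter so the lookup itself is easy to test.
    static String resolve(Map<String, String> env) {
        String value = env.get(ENV_NAME);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_ENDPOINT;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // In a test, the resolved value would be handed to the client under test.
        System.out.println(resolve(System.getenv()));
    }
}
```

With this shape the test class contains no start/stop logic at all. A developer machine can rely on the default, while CI overrides the variable to point at its service container.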

How to unit test Service Fabric Actor with State

I've started writing unit tests for new actor with state. The state is initialised in the OnActivateAsync method which is called by Service Fabric when the Actor is activated.
When unit testing, I'm creating the Actor myself and as the method is protected I don't have access from my unit test to call this method myself.
I'm wondering what the usual approach is for this kind of testing. I could mock the Actor and mock the state, but have the mock call the original implementation for the code I want to test. I'm wondering if there is another approach I've not come across.
Another approach would be to move the state initialisation somewhere else, such as a public method or the constructor, but the template for an Actor has the code there, so that may be a best practice.
Use the latest version of the ServiceFabric.Mocks NuGet package. It contains a special extension to invoke the protected OnActivateAsync method, as well as a whole tool set for Service Fabric unit testing.
var svc = MockActorServiceFactory.CreateActorServiceForActor<MyActor>();
var actor = svc.Activate(new ActorId(Guid.NewGuid()));
actor.InvokeOnActivateAsync().Wait();
I like to use the InternalsVisibleTo attribute and an internal method on the actor, which calls the OnActivateAsync method.
In the target Actor project, AssemblyInfo.cs add a line like this:
[assembly: InternalsVisibleTo("MyActor.Test")]
Where "MyActor.Test" is the name of the test project you want to grant access to your internal members.
In the target Actor class add a method something like this:
internal Task InvokeOnActivateAsync()
{
    return OnActivateAsync();
}
This way you can invoke the OnActivateAsync method from your test project something like this:
var actor = CreateNewActor(id);
await actor.InvokeOnActivateAsync();
I appreciate this is not ideal, but you can use reflection to call the OnActivateAsync() method.
For example,
var method = typeof(ActorBase).GetMethod("OnActivateAsync", BindingFlags.Instance | BindingFlags.NonPublic);
await (Task)method.Invoke(actor, null);
This way you'll be testing the actual method you want to test and also won't be exposing methods you don't really want to expose.
You may find it useful to group the creation of the actor and the manual call to OnActivateAsync() in a single method so that it's used across your test suite and it mimics the original Service Fabric behaviour.
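The suggestion to bundle creation and activation into one test helper can be sketched as below. This is a Java analogue using `java.lang.reflect`, not Service Fabric code. `ActorBase`, `CounterActor` and `ActorTestSupport` are hypothetical stand-ins for a framework base class with a protected lifecycle hook, an actor under test, and the shared helper.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a framework base class with a protected lifecycle hook.
abstract class ActorBase {
    protected void onActivate() {}
}

// Hypothetical actor whose state is only valid after activation.
final class CounterActor extends ActorBase {
    private int count = -1; // stays -1 until onActivate() has run

    @Override protected void onActivate() { count = 0; }

    int increment() { return ++count; }
}

// Test helper that mimics what the runtime does: create the actor, then activate it.
final class ActorTestSupport {
    private ActorTestSupport() {}

    static CounterActor createActivated() {
        CounterActor actor = new CounterActor();
        try {
            // Reflection stands in for the case where the test cannot call the hook directly.
            Method hook = ActorBase.class.getDeclaredMethod("onActivate");
            hook.setAccessible(true);
            hook.invoke(actor); // dispatches to CounterActor's override
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not activate actor", e);
        }
        return actor;
    }

    public static void main(String[] args) {
        System.out.println(createActivated().increment());
    }
}
```

Keeping the reflective call in one helper means the tests themselves never mention the hook by name, so a rename only has to be fixed in one place.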

Scala - write unit tests for objects/singletons that extends a trait/class with DB connection

Unit test related question
I encountered a problem with testing Scala objects that extend another trait/class that has a DB connection (or any other "external" call).
Using a singleton with a DB connection anywhere in my project makes unit testing impossible, because I cannot override / mock the DB connection.
This forces me to change my design purely for testing purposes, even in situations where it clearly needs to be an object.
Any suggestions?
Code snippet of the non-testable code:
object How2TestThis extends SomeDBconnection {
  val somethingUsingDB = {
    getStuff.map(/* some logic */)
  }
  val moreThings = {
    // more things
  }
}
trait SomeDBconnection {
  import DBstuff._
  val db = connection(someDB)
  val getStuff = db.getThings
}
One of the options is to use the cake pattern to require some DB connection and mix in a specific implementation as desired. For example:
import java.sql.Connection
// Defines general DB connection interface for your application
trait DbConnection {
  def getConnection: Connection
}
// Concrete implementation for production/dev environment for example
trait ProductionDbConnectionImpl extends DbConnection {
  def getConnection: Connection = ???
}
// Common code that uses that DB connection and needs to be tested.
trait DbConsumer {
  this: DbConnection =>
  def runDb(sql: String): Unit = {
    getConnection.prepareStatement(sql).execute()
  }
}
...
// Somewhere in production code when you set everything up in init or main you
// pick concrete db provider
val prodDbConsumer = new DbConsumer with ProductionDbConnectionImpl
prodDbConsumer.runDb("select * from sometable")
...
// Somewhere in test code you mock or stub DB connection ...
val testDbConsumer = new DbConsumer with DbConnection { def getConnection = ??? }
testDbConsumer.runDb("select * from sometable")
If you have to use a singleton/Scala object you can have a lazy val or some init(): Unit method that sets connection up.
Another approach would be to use some sort of injector. For example look at Lift code:
package net.liftweb.http
/**
* A base trait for a Factory. A Factory is both an Injector and
* a collection of FactorMaker instances. The FactoryMaker instances auto-register
* with the Injector. This provides both concrete Maker/Vender functionality as
* well as Injector functionality.
*/
trait Factory extends SimpleInjector
Then somewhere in your code you use this vendor like this:
val identifier = new FactoryMaker[MongoIdentifier](DefaultMongoIdentifier) {}
And then in places where you actually have to get access to DB:
identifier.vend
You can supply alternative provider in tests by surrounding your code with:
identifier.doWith(mongoId) { <your test code> }
which can be conveniently used with specs2 Around context for example:
implicit val dbContext = new Around {
  def around[T: AsResult](t: => T): Result = {
    val mongoId = new MongoIdentifier {
      def jndiName: String = dbName
    }
    identifier.doWith(mongoId) {
      AsResult(t)
    }
  }
}
It's pretty cool because it's implemented in Scala without any special bytecode or JVM hacks.
If you think the first 2 options are too complicated and you have a small app, you can use a properties file or command-line args to tell whether you are running in test or production mode. Again the idea comes from Lift :). You can easily implement it yourself, but here is how you can do it with Lift Props:
// your generic DB code:
val jdbcUrl: String = Props.get("jdbc.url", "jdbc:postgresql:database")
You can have 2 props files:
production.default.props
jdbc.url=jdbc:postgresql:database
test.default.props
jdbc.url=jdbc:h2
Lift will automatically detect run mode Props.mode and pick the right props file to read. You can set run mode with JVM cmd args.
So in this case you can either connect to in-memory DB or just read run mode and set your connection in code accordingly (mock, stub, uninitialized, etc).
Use regular IOC pattern - pass dependencies via constructor arguments to the class. Don't use an object. This gets inconvenient quickly unless you use special dependency injection frameworks.
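A minimal sketch of plain constructor injection, shown in Java for illustration. `ThingSource` and `ThingService` are hypothetical names. The point is only that a class receiving its dependency as a constructor argument can be given a stub in tests, which a singleton with a hard-wired connection cannot.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical abstraction over the database call.
interface ThingSource {
    List<String> getThings();
}

// A class (not a singleton) that receives its dependency through the constructor.
final class ThingService {
    private final ThingSource source;

    ThingService(ThingSource source) { this.source = source; }

    // The logic under test; it never knows whether the source is a real DB or a stub.
    List<String> upperCasedThings() {
        return source.getThings().stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}

final class ThingServiceDemo {
    public static void main(String[] args) {
        // In a test, the "database" is a lambda returning canned data.
        ThingSource stub = () -> Arrays.asList("a", "b");
        System.out.println(new ThingService(stub).upperCasedThings());
    }
}
```

Production code would construct `ThingService` once, at the wiring point near `main`, with an implementation backed by the real connection.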
Some suggestions:
Use object for something that can't have an alternative implementation and if this only implementation will work in all environments. Use object for constants and pure FP non side effecting code. Use singletons for wiring things up at the last moment - like a class with main, not somewhere deep in the code where many components depend on it unless it has no side effects or it uses something like stackable/injectable vendor providers (see Lift).
Conclusion:
You can't mock an object or override its implementation. You need to design your code to be testable and some of the options for it are listed above. It's a good practice to make your code flexible with easily composable parts not only for the purposes of testing but also for reusability and maintainability.