Why is Dagger considered better than Guice for AWS Lambda? [amazon-web-services]

I know that Dagger creates the injection code at compile time by generating code, and hence its performance is better than Guice, which does everything at runtime. But specifically for Lambda, I see it mentioned in multiple places that Dagger is preferred. Is that because of the cold start problem?
Because of cold starts, Lambda has to bootstrap again whenever it receives a request after a long idle period. So with Dagger, bootstrapping would be much faster than with Guice because the wiring code has already been generated? (Assuming that with Guice all objects are also created eagerly during bootstrap, rather than lazily.)

As you already know, any dependency injection framework, at some point, needs to build some sort of dependency graph of the objects that are required by your application. Building this graph is often the most computationally expensive part of the DI framework.
Guice figures out this graph by using reflection at runtime. Dagger generates code that represents the dependency graph at compile time. I don't know exactly how the two compare step for step, but I do know that using reflection incurs a non-trivial performance hit.
However, the biggest difference is that Dagger does all the heavy lifting at compile time (which means you do the work once, no matter how many times you run it), whereas Guice must do the equivalent work every time the application starts up.
Now, to answer your question: Dagger is preferred if your application frequently starts and stops. With something like a mobile app, a slower startup time mostly just degrades the UX. With Lambda, not only does it slow down the cold start, but since you are billed for the time your code is running, you are literally paying to rebuild the same dependency graph over and over.
TL;DR: Dagger is preferred on Lambda (for both cold start time and cost) because it moves the most expensive part of the DI framework to compile time instead of performing it at runtime.
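To make that concrete, here is a minimal sketch of what the Dagger side might look like (GreetingService and GreetingModule are hypothetical, and a real handler would implement RequestHandler from aws-lambda-java-core; the point is only to show where the compile-time wiring pays off):

// Dagger generates DaggerAppComponent from the @Component interface at compile time.
import dagger.Component;
import dagger.Module;
import dagger.Provides;
import javax.inject.Singleton;

class GreetingService {
    String greet(String name) {
        return "Hello, " + name;
    }
}

@Module
class GreetingModule {
    @Provides
    @Singleton
    GreetingService provideGreetingService() {
        return new GreetingService();
    }
}

@Singleton
@Component(modules = GreetingModule.class)
interface AppComponent {
    GreetingService greetingService();
}

class Handler {
    // Built once per Lambda container: a cold start only runs the generated
    // factory code (no classpath scanning or reflection), and warm
    // invocations reuse the same graph.
    private static final AppComponent COMPONENT = DaggerAppComponent.create();

    public String handleRequest(String input) {
        return COMPONENT.greetingService().greet(input);
    }
}

DaggerAppComponent is the class Dagger generates from the @Component interface, so the cold start pays only for running that generated factory code rather than for the reflection a runtime container would do.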

Related

How to simulate the passage of time to debug a program?

I'm writing a C++ program that keeps track of dates whenever it's run, keeping a record. I want to test and debug this record-keeping functionality, but the records themselves can span years; letting time pass naturally to expose bugs would take... a while.
Is there an established way of simulating the passage of time, so I can debug this program more easily? I'm using C's ctime library to acquire dates.
If you want your system to be testable, you'll want to be able to replace your entire time engine.
There are a number of ways to replace a time engine.
The easiest would be to create a new time engine singleton that replicates the time engine API you are using (in your case, ctime).
Then sweep out every use of ctime based APIs and redirect them to your new time engine.
This new time engine can be configured to use ctime, or something else.
To be more reliable, you'll even want to change the binary layout and type of any data structures that interact with your old API, and audit every case where you convert things to/from void pointers or reinterpret_cast them, etc.
Another approach is dependency injection, where instead of using a singleton you pass the time-engine as an argument to your program. Every part of your program that needs access to time now either stores a pointer to the time engine, or takes it as a function argument.
Now that you have control over time, you can arrange for it to pass faster, jump around, etc. Depending on your system, you may want to be more or less realistic in how time passes, and different choices can expose different bugs, both real and spurious.
Even after you do all of this you will not be certain that your program doesn't have time bugs. You may interact with OS time in ways you don't expect. But you can at least solve some of these problems.
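The question is about C++ and ctime, but as an illustration of the dependency-injection approach above, here is a minimal sketch in Java, where java.time.Clock already plays the role of the replaceable time engine (RecordKeeper is a hypothetical stand-in for the record-keeping code; a C++ version would be your own small interface wrapping ctime):

import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// RecordKeeper asks an injected Clock for "now" instead of calling the
// system clock directly, so tests can hand it a fixed or shifted clock.
class RecordKeeper {
    private final Clock clock;

    RecordKeeper(Clock clock) {
        this.clock = clock;
    }

    Instant recordRun() {
        return Instant.now(clock);   // uses whatever clock was injected
    }
}

public class ClockInjectionDemo {
    public static void main(String[] args) {
        // Production: the real system clock.
        RecordKeeper real = new RecordKeeper(Clock.systemUTC());
        System.out.println("real run at " + real.recordRun());

        // Test: a fixed clock, then the same clock shifted two years ahead,
        // simulating the passage of time without waiting for it.
        Clock fixed = Clock.fixed(Instant.parse("2020-01-01T00:00:00Z"), ZoneOffset.UTC);
        RecordKeeper thenKeeper = new RecordKeeper(fixed);
        RecordKeeper laterKeeper = new RecordKeeper(Clock.offset(fixed, Duration.ofDays(730)));
        System.out.println("simulated first run: " + thenKeeper.recordRun());
        System.out.println("simulated run two years later: " + laterKeeper.recordRun());
    }
}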

Strange Unit Test Run Time Differences

I have several unit test methods that all do exactly the same thing:
void Test()
{
for (int i = 0; i < 100000; i++);
}
One of them always runs for a noticeably different duration.
If I remove the first one, TestMethod3 is always the one that differs.
If I add more test methods, TestMethod6 is always the one that differs.
There is always one method whose duration differs from the others.
What is the reason behind this strange difference?
I am currently studying algorithms and trying to measure run times with test methods. This difference made me wonder whether test method run times are reliable at all.
That has something to do with the test runner in Visual Studio. The tests are usually run simultaneously, but the one you see with the greater time is usually the one that was started first. I've noticed that in Visual Studio for years now. If you run one of them on its own, you'll notice that its run time is longer than when it runs as part of a "run all".
I've always assumed that it had to do with the timer being started early while the tests were still loading.
You can't test performance in a simple unit test. Part of the reason is that there are many different implementations and configurations of testing frameworks, with different impacts on the performance of a test.
The most notable is whether tests run in parallel (multi-threaded) or consecutively. Obviously the first option completely invalidates any benchmarking; the second, though, still doesn't guarantee a valid benchmark.
This is because of other factors that are independent of the actual unit testing framework. These include:
Initial delays due to class loading and memory allocation
Just-in-time compilation of your byte code into machine code. This is difficult to control and can happen seemingly unpredictably.
Branch prediction, which may greatly influence your runtime behaviour, depending on the nature of the processed data and control flow
Garbage collection
Doing even remotely valid benchmarks in Java is an art form in itself. In order to get close, you should at least ensure that
you are running your code of interest in a single thread, with no other active threads
don't trigger garbage collection (i.e. make sure there is enough memory for the test to run without GC, and set the GC options of your JVM appropriately)
have a warm-up phase where you run your code for a sufficient number of iterations before you start benchmarking it.
This IBM article on 'Robust Java Benchmarking' is a helpful introduction to the pitfalls of Java benchmarking.
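To make the single-thread/warm-up advice concrete, here is a minimal hand-rolled sketch (workload() is a hypothetical stand-in for the code under test; for anything serious you would normally reach for a harness such as JMH, which handles warm-up and dead-code elimination for you):

public class ManualBenchmark {

    // Hypothetical code under test.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;

        // Warm-up: give the JIT time to compile workload() before measuring.
        for (int i = 0; i < 1_000; i++) {
            sink += workload();
        }

        // Measured phase, single-threaded.
        int iterations = 1_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += workload();
        }
        long elapsed = System.nanoTime() - start;

        System.out.println("avg ns/iteration: " + (elapsed / iterations));
        // Printing the accumulated result keeps the JIT from eliminating the work.
        System.out.println("sink: " + sink);
    }
}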

How do I optimize the spawning of actors in Unreal Engine 4?

Currently, I'm using a blueprint script to generate and delete around 60 actors in a radius of a flying pawn.
This creates framerate spikes, and I've read that this is quite a heavy process.
So obviously I would like to increase the performance of this process by just placing the same logic in C++.
But I would like to know a couple of things first to make sure what I'm doing is right.
1) Is the SpawnActor function in C++, by itself, faster than the Blueprint node?
2) Which properties of the Blueprint increase the processing time of spawning?
I know that for example enabling physics will increase the process time, but are there any more properties that I need to take into consideration?
I thank everyone taking their time reading this, and any kind of help is much appreciated :)
You can't really say that C++'s SpawnActor is faster, since the Blueprint SpawnActor node ultimately calls the same C++ SpawnActor. Of course, writing the logic directly in C++ saves the overhead of routing the Blueprint node to the C++ function, but that overhead is small. So I don't think the spike comes from calling SpawnActor via Blueprint, i.e. switching to C++ won't fix it.
SpawnActor itself does have a cost, so calling it 60 times in a row will certainly cause a spike. I'm not sure exactly how much overhead is in SpawnActor, but at the very least, allocating memory for the new Actors costs some time. Also, if your Actor template has a lot of components, it takes more time. So a common technique is to use an Actor pool: pre-spawn some Actors and reuse them.
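The pooling idea itself is not UE4-specific. Here is a minimal, language-agnostic sketch of it, written in Java purely for illustration; in UE4 the equivalent would be to pre-spawn actors, hide/deactivate them, and re-activate one on acquire instead of constructing a new object:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Supplier;

class ObjectPool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> reset;

    ObjectPool(Supplier<T> factory, Consumer<T> reset, int preSpawn) {
        this.factory = factory;
        this.reset = reset;
        // Pay the construction cost up front instead of during gameplay.
        for (int i = 0; i < preSpawn; i++) {
            free.push(factory.get());
        }
    }

    T acquire() {
        T obj = free.isEmpty() ? factory.get() : free.pop();
        reset.accept(obj);   // re-initialize before handing it out
        return obj;
    }

    void release(T obj) {
        free.push(obj);      // return to the pool instead of destroying
    }
}

public class ObjectPoolSketch {
    public static void main(String[] args) {
        // Hypothetical usage: pre-create 60 objects and reuse them instead
        // of constructing and destroying them each time.
        ObjectPool<StringBuilder> pool =
                new ObjectPool<>(StringBuilder::new, sb -> sb.setLength(0), 60);
        StringBuilder sb = pool.acquire();
        sb.append("reused instance");
        System.out.println(sb);
        pool.release(sb);
    }
}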
C++ will always be faster than Blueprints, because the latter are built on top of templates, so compilation and bug fixing won't be as easy.
The number of characters will always affect performance because of CPU time, but that's expected. Keep in mind that access to characters goes through UE4 iterators and dedicated containers, which are already optimised and thread-safe.

Profiling a multiprocess system

I have a system that I need to profile.
It is made up of tens of processes, mostly C++, some with several threads, that communicate with the network and with one another through various system calls.
I know there are performance bottlenecks sometimes, but no one has put in the time/effort to check where they are: they may be in userspace code, inefficient use of syscalls, or something else.
What would be the best way to approach profiling a system like this?
I have thought of the following strategy:
Manually logging the round-trip times of various code sequences (for example, processing an incoming packet or a CLI command) and seeing which process takes the most time. After that, profiling that process, fixing the problem, and repeating.
This method seems hacky and based on guesswork. I don't like it.
How would you suggest to approach this problem?
Are there tools that would help me out (multi-process profiler?)?
What I'm looking for is more of a strategy than just specific tools.
Should I profile every process separately and look for problems? If so, how do I approach this?
Do I try to isolate the problematic processes and go from there? If so, how do I isolate them?
Are there other options?
I don't think there is a single answer to this sort of question. Every type of issue has its own problems and solutions.
Generally, the first step is to figure out WHERE in the big system the time is spent. Is it CPU-bound or I/O-bound?
If the problem is CPU-bound, a system-wide profiling tool can be useful to determine where in the system the time is spent. The next question, of course, is whether that time is actually necessary: no automated tool can tell the difference between a badly written piece of code that does a million completely useless processing steps and one that does a matrix multiplication with a million elements very efficiently; it takes the same amount of CPU time to do both, but only one is actually achieving something. However, knowing which program takes most of the time in a multi-program system can be a good starting point for figuring out whether that code is well written or can be improved.
If the system is I/O-bound, such as network or disk I/O, then there are tools for analysing disk and network traffic that can help. But again, expecting the tool to tell you what packet response or disk access time you should expect is a different matter: whether you contact Google to search for "kerflerp" or a local webserver a metre away will have a dramatic impact on what counts as a reasonable response time.
There are lots of other issues: running two pieces of code in parallel that each use LOTS of memory can cause both to run slower than if they ran in sequence, because the high memory usage causes swapping, or because the OS isn't able to use spare memory for caching file I/O, for example.
On the other hand, two or more simple processes that use very little memory will benefit quite a lot from running in parallel on a multiprocessor system.
Adding logging to your applications so that you can see WHERE the time is being spent is another method that works reasonably well, particularly if you KNOW which use-case is the slow one.
If you have a use-case where you know "this should take no more than X seconds", running regular pre- or post-commit tests to check that the code behaves as expected, and that no one has added code that slows it down, would also be useful.
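For the logging suggestion above, a minimal sketch of what such instrumentation might look like (shown in Java for illustration; in the mostly-C++ system from the question, std::chrono would play the same role, and the section names here are hypothetical):

import java.util.function.Supplier;

// A tiny helper that logs the wall-clock time of a named code section,
// giving cheap, always-on timing data for known use-cases.
public class TimedSection {

    static <T> T timed(String name, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.err.println("[timing] " + name + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        // Example: wrap a (hypothetical) packet-processing step.
        int processed = timed("process incoming packet", () -> {
            // ... real work would go here ...
            return 1;
        });
        System.out.println("processed " + processed + " packet(s)");
    }
}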

Unit testing concurrent software - what do you do?

As software gets more and more concurrent, how do you handle testing the core behaviour of the type with your unit tests (not the parallel behaviour, just the core behaviour)?
In the good old days, you had a type, you called it, and you checked what it returned and/or what other things it called.
Nowadays, you call a method and the actual work gets scheduled to run on the next available thread; you don't know when it'll actually start and call the other things - and what's more, those other things could be concurrent too.
How do you deal with this? Do you abstract/inject the concurrent scheduler (e.g. abstract the Task Parallel Library and provide a fake/mock in the unit tests)?
What resources have you come across that helped you?
Edit
I've edited the question to emphasise testing the normal behaviour of the type (ignoring whatever parallel mechanism is used to take advantage of multi-core, e.g. the TPL)
Disclaimer: I work for Corensic, a small startup in Seattle. We've got a tool called Jinx that is designed to detect concurrency errors in your code. It's free for now while we're in Beta, so you might want to check it out. ( http://www.corensic.com/ )
In a nutshell, Jinx is a very thin hypervisor that, when activated, slips in between the processor and operating system. Jinx then intelligently takes slices of execution and runs simulations of various thread timings to look for bugs. When we find a particular thread timing that will cause a bug to happen, we make that timing "reality" on your machine (e.g., if you're using Visual Studio, the debugger will stop at that point). We then point out the area in your code where the bug was caused. There are no false positives with Jinx. When it detects a bug, it's definitely a bug.
Jinx works on Linux and Windows, and in both native and managed code. It is language and application platform agnostic and can work with all your existing tools.
If you check it out, please send us feedback on what works and doesn't work. We've been running Jinx on some big open source projects and already are seeing situations where Jinx can find bugs 50-100 times faster than simply stress testing code.
I recommend picking up a copy of Growing Object-Oriented Software, Guided by Tests by Freeman and Pryce. The last couple of chapters are very enlightening and deal with this specific topic. It also introduces some terminology which helps in pinning down the notation for discussion.
To summarize:
Their core idea is to split the functionality and concurrent/synchronization aspects.
First test-drive the functional part in a single synchronous thread like a normal object.
Once you have the functional part pinned down, you can move on to the concurrency aspect. To do that, you have to come up with "observable invariants with respect to concurrency" for your object, e.g. the count should equal the number of times the method was called. Once you have identified the invariants, you can write stress tests that run multiple threads to try to break them. The stress tests assert your invariants.
Finally as an added defence, run tools or static analysis to find bugs.
For passive objects, i.e. code that'd be called from clients on different threads: your test needs to mimic clients by starting its own threads. You would then need to choose between a notification-listening or sampling/polling approach to synchronize your tests with the SUT.
You could either block until you receive an expected notification, or
poll certain observable side-effects with a reasonable timeout.
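As a small sketch of the "observable invariant" stress test described above (Counter is a hypothetical class under test; its invariant is simply that the final count equals the total number of increment() calls made across all threads):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class Counter {
    private final AtomicLong value = new AtomicLong();
    void increment() { value.incrementAndGet(); }
    long get() { return value.get(); }
}

public class CounterStressTest {
    public static void main(String[] args) throws InterruptedException {
        final int threads = 8;
        final int incrementsPerThread = 100_000;
        final Counter counter = new Counter();
        final CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await();   // make all threads begin together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
        }

        start.countDown();           // release the threads
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);

        long expected = (long) threads * incrementsPerThread;
        if (counter.get() != expected) {
            throw new AssertionError("invariant broken: " + counter.get() + " != " + expected);
        }
        System.out.println("invariant held: " + counter.get());
    }
}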
The field of unit testing for race conditions and deadlocks is relatively new and lacks good tools.
I know of two such tools both in early alpha/beta stages:
Microsoft's Chess
Typemock Racer
Another option is to try to write a "stress test" that causes deadlocks/race conditions to surface: create multiple instances/threads and run them side by side. The downside of this approach is that if the test fails, it can be very hard to reproduce. I suggest using logs both in the test and in the production code so that you'll be able to understand what happened.
A technique I've found useful is to run tests within a tool that detects race conditions like Intel Parallel Inspector. The test runs much slower than normal, because dependencies on timing have to be checked, but a single run can find bugs that otherwise would require millions of repeated ordinary runs.
I've found this very useful when converting existing systems for fine-grained parallelism via multi-core.
Unit tests really should not test concurrency/asynchronous behaviour; you should use mocks there and verify that the mocks receive the expected input.
For integration tests I just explicitly call the background task, then check the expectations after that.
In Cucumber it looks like this:
When I press "Register"
And the email sending script is run
Then I should have an email
Given that your TPL will have its own separate unit tests, you don't need to verify it yourself.
Given that, I write two tests for each module:
1) A single-threaded unit test that uses some environment variable or #define to turn off the TPL, so that I can test my module for functional correctness.
2) A stress test that runs the module in its threaded deployable mode. This test attempts to find concurrency issues and should use lots of random data.
The second test often includes many modules and so is probably more of an integration/system test.
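One way to get the effect of test 1) without an environment variable or #define is to inject the scheduler itself, so the unit test can substitute a same-thread executor. Here is a minimal sketch with a hypothetical Processor class (in Java; the idea roughly corresponds to injecting a TaskScheduler when using the TPL):

import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Processor's work is normally scheduled onto a thread pool. The executor is
// injected, so a unit test can pass a same-thread executor and exercise the
// functional behaviour synchronously.
class Processor {
    private final Executor executor;
    private final AtomicInteger processed = new AtomicInteger();

    Processor(Executor executor) {
        this.executor = executor;
    }

    void submit(String item) {
        executor.execute(() -> {
            // ... real processing of 'item' would go here ...
            processed.incrementAndGet();
        });
    }

    int processedCount() {
        return processed.get();
    }
}

public class ProcessorTestSketch {
    public static void main(String[] args) {
        // Unit test configuration: run tasks on the calling thread, so the
        // assertion can be made immediately with no timing involved.
        Processor sut = new Processor(Runnable::run);
        sut.submit("a");
        sut.submit("b");
        if (sut.processedCount() != 2) {
            throw new AssertionError("expected 2, got " + sut.processedCount());
        }
        System.out.println("synchronous unit test passed");

        // Production configuration would inject a real pool instead.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Processor production = new Processor(pool);
        production.submit("c");
        pool.shutdown();
    }
}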