I have noticed that the following site provides performance graphs for various kinds of computations (arrays, FFTs, parallel constructs, etc.):
Chapel Performance Graphs for chapcs
For example, there is a graph for "2D Array Assignment".
I am wondering whether this is mainly for internal use (by the Chapel developers) or whether the test codes are also public, so users can try running them on local machines. I also think they might be very useful for learning good idioms to get higher performance on some tasks...
Thanks in advance!
The source code for all Chapel performance tests is public (and, generally speaking, almost all Chapel development and code is public). You can find the tests in the github repository: https://github.com/chapel-lang/chapel/tree/master/test
Matching the graph name to the test isn't always easy. Typically what I'd do for something like this is clone the repo and do git grep "2D Array Assignment" -- test. That will tell you that the .graph file associated with this test is test/performance/sungeun/assign.1024.graph, and normally (though not always) the test name has a similar basename. In this case the test is test/performance/sungeun/assign.chpl.
You can run performance tests using start_test by doing something like start_test --performance test/performance/sungeun/assign.chpl; results will be in test/perfdat/$HOSTNAME, with the graphs in an html subdirectory.
While the graphs and testing infrastructure are public, they are geared towards developers, and many aspects of the testing system aren't very intuitive or polished for "end users". https://github.com/chapel-lang/chapel/tree/master/doc/rst/developer/bestPractices/TestSystem.rst is a rather sprawling document with more information on the testing infrastructure.
I have this very generic question about unit tests: should we write environment-dependent unit tests? The unit tests may depend on the file system, a specific connection, or the presence of a file.
I am not sure if there is an answer to this, but I found that it takes time to decouple unit tests from the environment. Does that mean it is OK to create environment-dependent unit tests? Or does it always point to a design problem when we need to write environment-dependent unit tests?
TL;DR
Whether environment-dependent tests are a design flaw or not depends strongly on your application, programming language, and specific problem. Keep in mind that tests are never "complete" or "perfect", and try not to chase perfection at the expense of more and more time. Using modern frameworks, making your tests re-usable, and mocking your environment can take a lot of the pain out of testing.
The longer answer
IMHO, there's no perfect answer to this question. Unit tests should be as generic and re-usable as possible, without trying to cover environmental edge cases at any price.
Example: Most environments should have an AMD64 processor nowadays (phones aside), so in most cases there is no need to create unit tests for 32-bit x86 or ARM environments, ...
Your question is technology-agnostic, so it's hard to tell whether the need for environment-dependent unit tests is a design flaw. If you're using Java (see below), I'd rather think it is; using C++, it might not be...
Some rules of thumb I can offer from my experience (and which are therefore a little opinion-based):
Follow the Pareto principle
Don't try to achieve 100% of what is possible in theory; concentrate on the roughly 80% that are actually likely to be encountered. Let's say you expect your application to run on Windows clients. Then your tests should probably cover the environmental characteristics of Windows 10 and 11; older Windows versions, Linux, Mac OS, ... should not be needed.
Or, if your application mainly has a Spanish audience, you should not care so much about how it deals with Chinese characters or Canadian data protection rules.
Exclude environmental specialties as far as is sane and possible when coding your application
If you can manage to make your application code mostly independent from your environment, the same will "automatically" apply to your tests. (Also, see the point about not re-inventing the wheel below)
As you didn't ask for a specific programming language, I'll refer to Java here, but similar concepts exist in most modern languages.
E.g., Java abstracts away much of your environment, so you can use File.separator if you don't know whether your application will be running on Windows (\ as separator) or Linux (/).
Not using OS-specific APIs (like JNI) avoids environment-related problems as well. Unless you're dealing closely with hardware access, there shouldn't be any need for them.
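For illustration, a minimal sketch (the file names are made up) of platform-neutral path handling, which keeps both the code and its tests environment-independent:

import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PortablePaths {
    public static void main(String[] args) {
        // Paths.get joins the segments with the platform's separator, so the
        // same code works on Windows (\) and Linux (/) without branching.
        Path config = Paths.get("data", "config", "app.properties");
        System.out.println(config);
        // If you really need the raw separator character:
        System.out.println(File.separator);
    }
}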
Adapt your tests
Like the "real" application, usually your tests are never "done". A new problem with your software arises? Fix it and add a specific test case for this kind of bug (trying to be quite generic). If it was a problem related to the 20% of cases or environments you did not consider before, be happy - now you've got a little hold of them as well.
Package your test data with your application
Referring to Java again, you could place your test files in the resources folder and load them dynamically while testing - no need to bother the real file system of the testing machine.
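A minimal sketch of what that could look like, assuming a standard Maven/Gradle layout where files under src/test/resources end up on the test classpath (the fixture name is made up):

import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FixtureLoading {
    // Reads src/test/resources/fixtures/input.txt from the classpath, so the
    // test never touches the real file system of the testing machine.
    String loadFixture() throws Exception {
        try (InputStream in = getClass().getResourceAsStream("/fixtures/input.txt")) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}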
Also, if you're using a database, you can spin up an in-memory instance (like H2) during testing. This will not exactly mirror your production database system, but it makes testing very easy without the danger of breaking something. Database abstraction is very sophisticated nowadays anyway, so apart from some edge cases you should not see any difference in behaviour if you're using a good database abstraction layer (which in turn makes you less environment-dependent).
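A sketch of the in-memory idea, assuming the H2 driver is on the test classpath (the table and data are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InMemoryDatabase {
    void exerciseCodeAgainstH2() throws Exception {
        // DB_CLOSE_DELAY=-1 keeps the in-memory database alive for the whole
        // JVM, not just until the first connection closes.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1");
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(64))");
            st.execute("INSERT INTO users VALUES (1, 'alice')");
            // ... hand conn to the code under test ...
        }
    }
}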
Make your tests re-usable
Using a tool like Maven or Gradle, you can make your tests run on a much larger variety of environments, provided they are packaged as pointed out above. The need for a specific test environment will decrease.
Mocking is your friend
As with the "fake" database mentioned before, you can mock a lot of things. Frameworks like Mockito allow white-box and black-box testing with mocked objects. "Helpers" like WireMock even let you mock the answers you get from an external service. And there are a lot more possibilities out there!
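To make that concrete, a small Mockito sketch (the WeatherService interface is a hypothetical external dependency, not something from the question):

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import org.junit.jupiter.api.Test;

class MockingSketch {
    // Hypothetical dependency that would normally hit the network.
    interface WeatherService {
        double currentTemperature(String city);
    }

    @Test
    void worksWithoutTheRealService() {
        WeatherService service = mock(WeatherService.class);
        when(service.currentTemperature("Madrid")).thenReturn(21.5);

        // The code under test would receive this mock by injection; it
        // answers deterministically, with no network or environment involved.
        assertEquals(21.5, service.currentTemperature("Madrid"));
    }
}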
Don't re-invent the wheel
If you use one of today's advanced application frameworks (like Spring, Symfony, Boost, ...), they'll do a lot of the work for you if you make proper use of them. You'll have to read the docs and dig into them, but then you will realize how much easier they can make your life. This does not apply only to frameworks, but also to libraries, services, components, ... And the good thing is, all of that also comes into play when it's about testing! Most of your environmental dependencies may well vanish if you build on a good software foundation.
I am writing a fairly complicated machine learning program for my thesis in computer vision. It's working fairly well, but I need to keep trying out new things and adding new functionality. This is problematic because I sometimes introduce bugs when I am extending the code or trying to simplify an algorithm.
Clearly the correct thing to do is to add unit tests, but it is not clear how to do this. Many components of my program produce a somewhat subjective answer, and I cannot automate sanity checks.
For example, I had some code that approximated a curve with a lower-resolution curve, so that I could do computationally intensive work on the lower-resolution curve. I accidentally introduced a bug into this code, and only found it through a painstaking search when the results of my entire program got slightly worse.
But when I tried to write a unit test for it, it was unclear what I should do. If I make a simple curve that has a clearly correct lower-resolution version, then I'm not really testing everything that could go wrong. If I make a simple curve and then perturb the points slightly, my code starts producing different answers, even though this particular piece of code really seems to work fine now.
You may not appreciate the irony, but basically what you have there is legacy code: a chunk of software without any unit tests. Naturally you don't know where to begin. So you may find it helpful to read up on handling legacy code.
The definitive thought on this is Michael Feathers' book, Working Effectively with Legacy Code. There used to be a helpful summary of it on the ObjectMentor site, but alas the website has gone the way of the company. However, WELC has left a legacy in reviews and other articles. Check them out (or just buy the book), although the key lessons are the ones which S.Lott and tvanfosson cover in their replies.
2019 update: I have fixed the link to the WELC summary with a version from the Wayback Machine web archive (thanks #milia).
Also - and despite knowing that answers which comprise mainly links to other sites are low quality answers :) - here is a link to a new (2019 new) Google tutorial on Testing and Debugging ML code. I hope this will be of illumination to future Seekers who stumble across this answer.
"then I'm not really testing out everything that could go wrong."
Correct.
The job of unit tests is not to test everything that could go wrong.
The job of unit tests is to test that what you have does the right thing, given specific inputs and specific expected results. The important part here is that specific, visible, external requirements are satisfied by specific test cases. Not that every possible thing that could go wrong is somehow prevented.
Nothing can test everything that could go wrong. You can write a proof, but you'll be hard-pressed to write tests for everything.
Choose your test cases wisely.
Further, the job of unit tests is to test that each small part of the overall application does the right thing -- in isolation.
Your "code that approximated a curve with a lower-resolution curve" for example, probably has several small parts that can be tested as separate units. In isolation. The integrated whole could also be tested to be sure that it works.
Your "computationally intensive work on the lower-resolution curve" for example, probably has several small parts that can be tested as separate units. In isolation.
The point of unit testing is to create small, correct units that are later assembled.
Without seeing your code, it's hard to tell, but I suspect that you are attempting to write tests at too high a level. You might want to think about breaking your methods down into smaller components that are deterministic and testing these. Then test the methods that use these methods by providing mock implementations that return predictable values from the underlying methods (which are probably located on a different object). Then you can write tests that cover the domain of the various methods, ensuring that you have coverage of the full range of possible outcomes. For the small methods you do so by providing values that represent the domain of inputs. For the methods that depend on these, by providing mock implementations that return the range of outcomes from the dependencies.
Your unit tests need to employ some kind of fuzz factor, either by accepting approximations or by using some kind of probabilistic checks.
For example, if you have a function that returns a floating-point result, it is almost impossible to write a test that asserts an exact value correctly across all platforms. Your checks need to allow for the approximation.
TEST_ALMOST_EQ(result, 4.0);
Above, TEST_ALMOST_EQ might verify that result is between 3.9 and 4.1 (for example).
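In JUnit (version 5 style here), the standard assertion already supports this via a tolerance argument; the computation below is a stand-in:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class ApproxEqualityTest {
    // Stand-in for the real floating-point computation under test.
    double computeResult() { return 3.97; }

    @Test
    void resultIsApproximatelyFour() {
        // The third argument is the tolerance: any value in [3.9, 4.1] passes.
        assertEquals(4.0, computeResult(), 0.1);
    }
}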
Alternatively, if your machine learning algorithms are probabilistic, your tests will need to accommodate that by taking the average of multiple runs and expecting it to be within some range.
x = 0;
for (i = 0; i < 100; i++) {
    x += result_probabilistic_test();
}
avg = x / 100;
TEST_RANGE(avg, 10.0, 15.0);  // passes if 10.0 <= avg <= 15.0
Of course, these tests are non-deterministic, so you will need to tune them to get non-flaky tests with high probability (e.g., increase the number of trials or widen the acceptable range).
You can also use mocks for this (e.g., a mock random number generator for your probabilistic algorithms). They usually help with deterministically testing specific code paths, but they take a lot of effort to maintain. Ideally, you would use a combination of fuzzy testing and mocks.
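For example, in Java you could hand the algorithm a mocked java.util.Random (a sketch; whether this fits depends on your code taking the generator as an injected dependency):

import java.util.Random;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class FixedRandomness {
    Random fixedRng() {
        // Mockito can mock the concrete Random class; the algorithm under
        // test then sees a fixed sequence instead of real randomness.
        Random rng = mock(Random.class);
        when(rng.nextDouble()).thenReturn(0.1, 0.9, 0.5); // consecutive calls
        return rng;
    }
}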
HTH.
Generally, for statistical measures you would build an epsilon into your answer, i.e., the mean square difference of your points should be < 0.01 or some such. Another option is to run several times; if it fails "too often", then you have an issue.
Get an appropriate test dataset (maybe a subset of what you usually use)
Calculate some metric on this dataset (e.g. the accuracy)
Note down the value obtained (cross-validated)
This should give an indication of what to set the threshold for
Of course, it may be that performance on the dataset increases a little when you make changes to your code, but if it ever decreases by a large amount, that is an indication something is going wrong.
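A minimal sketch of such a threshold test (the threshold value and the evaluation helper are placeholders you would fill in from your own cross-validated numbers):

import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class AccuracyRegressionTest {
    // Placeholder: the cross-validated accuracy you noted down earlier,
    // minus a small margin for run-to-run noise.
    static final double THRESHOLD = 0.87;

    // Stand-in for running the real model on the fixed test dataset.
    double accuracyOnFixedDataset() { return 0.91; }

    @Test
    void accuracyDoesNotRegress() {
        double accuracy = accuracyOnFixedDataset();
        assertTrue(accuracy >= THRESHOLD,
                "accuracy " + accuracy + " fell below threshold " + THRESHOLD);
    }
}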
I am about to embark on writing a system that needs to re-balance its load distribution amongst the remaining nodes once one or more of the nodes involved fail. Does anyone have any good references on what to avoid and what works?
In particular, I'm curious how one should start in order to build such a system so as to be able to unit-test it.
This question smells like my distributed systems class. So I feel I should point out the textbook we used.
It covers many aspects of distributed systems at an abstract level, so a lot of its content would apply to what you're going to do.
It does a pretty good job of pointing out pitfalls and common mistakes, as well as giving possible solutions.
The first edition is available for free download from the authors.
The book doesn't really cover unit-testing of distributed systems, though. I could see an entire book written on just that.
This sounds like a task that involves a considerable degree of out-of-process communication and other environment-dependent code.
To make your code testable, it is important to abstract such code away from your main logic so that you can unit test the core engine without having to depend on any of these environment-specific things.
The recommended approach is to hide such components behind an interface that you can then replace with so-called Test Doubles in unit tests.
The book xUnit Test Patterns covers many of these things, and much more, very well.
Recently, I worked on a project where TDD (Test-Driven Development) was used. The project was a web application developed in Java and, although unit-testing web applications may not be trivial, it was possible using mocking (we used the Mockito framework).
Now I will start a project where I will use C++ to work with image processing (mostly image segmentation), and I'm not sure whether using TDD is a good idea. The problem is that it is very hard to tell whether the result of a segmentation is right or not, and the same problem applies to many other image processing algorithms.
So, what I would like to know is whether someone here has successfully used TDD with image processing algorithms (not necessarily segmentation algorithms).
At a minimum you can use the tests for regression testing. For example, suppose you have 5 test images for a particular segmentation algorithm. You run the 5 images through the code and manually verify the results. The results, when correct, are stored on disk somewhere, and future executions of these tests compare the generated results to the stored results.
That way, if you ever make a breaking change, you'll catch it, but more importantly you only have to go through a (correct) manual test cycle once.
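The comparison itself can be as simple as a byte-for-byte check against the stored file (a sketch; the class and method names are illustrative):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class GoldenFileCheck {
    // Compares freshly generated output against a stored result that was
    // manually verified once.
    static boolean matchesGolden(byte[] generated, Path goldenFile) throws Exception {
        byte[] expected = Files.readAllBytes(goldenFile);
        return Arrays.equals(expected, generated);
    }
}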
Whenever I do any computer-vision-related development, TDD is almost standard practice. You have images and something you want to measure. Step one is to hand-label a (large) subset of the images. This gives you test data. The process (for full correctness) is then to divide your test set in two, a "development set" and a "verification set". You do repeated development cycles until your algorithm is accurate enough when applied to the development set. Then you verify the result on the verification set (so that you're not overtraining on some weird aspect of your development set).
This is test driven development at its purest.
Note that you're testing two different things when developing heavily algorithm-dependent software like this:
1. The regular bugs you'll get in your software. These can be tested using "normal" TDD techniques.
2. The performance of your algorithm, for which you need a system like the one outlined above.
A program can be bug-free according to (1) but not quite according to (2). For example, a very simple image segmentation algorithm says: "the left half of the image is one segment, the right half is another segment." This program can be made bug-free according to (1) quite easily. It is another matter entirely whether it satisfies your performance needs. Don't confuse the two aspects, and don't let one interfere with the other.
More specifically, I'd advise you to develop the algorithm first, buggy warts and all, and then use TDD with the algorithm (not the code!) and perhaps other requirements of the software as the specification for a separate TDD process. Doing unit tests for small temporary helper functions deep within some reasonably complex algorithm under heavy development is a waste of time and effort.
TDD in image processing only makes sense for deterministic problems like:
image arithmetic
histogram generation
and so on.
However TDD is not suitable for feature extraction algorithms like:
edge detection
segmentation
corner detection
... since no algorithm can solve these kinds of problems perfectly for all images.
I think the best you can do is test the simple, mathematically well-defined building blocks your algorithm consists of, like linear filters, morphological operations, FFT, wavelet transforms, etc. These are often tricky enough to implement efficiently and correctly for all border cases, so verifying them does make sense.
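For instance, a tiny building block like a 3-tap box filter has hand-computable expected outputs, which is exactly what makes it unit-testable. A sketch, with clamped borders as one possible border policy:

import static org.junit.jupiter.api.Assertions.assertArrayEquals;
import org.junit.jupiter.api.Test;

class BoxFilterTest {
    // 3-tap box filter with clamped borders: small enough that expected
    // values, including the border cases, can be computed by hand.
    static double[] boxFilter3(double[] in) {
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            double a = in[Math.max(i - 1, 0)];
            double b = in[i];
            double c = in[Math.min(i + 1, in.length - 1)];
            out[i] = (a + b + c) / 3.0;
        }
        return out;
    }

    @Test
    void constantInputIsUnchanged() {
        double[] in = {2, 2, 2, 2};
        assertArrayEquals(in, boxFilter3(in), 1e-12);
    }
}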
For an actual algorithm like image segmentation, TDD doesn't make much sense IMHO. I don't even think unit-tests make sense here. Sure, you can write tests, but those will always be extremely fragile. A typical image processing algorithm needs a few parameters that have to be adjusted for the desired results (a process that can't be automated, and can't be done before the algorithm is working). The results of a segmentation algorithm aren't well defined either, but your unit test can only test for some well-defined property. An algorithm can have that property without doing what you want, or the other way round, so your test result isn't very informative. Also, to test the results of a segmentation algorithm you need to write a lot of pretty hard code, while verifying the results visually is pretty easy and you have to do it anyway.
I think in a way it's similar to unit-testing user interfaces: Testing the actual well-defined functionality (e.g. when the user clicks this button, some item is added to this list and this label shows that text...) is relatively easy and can save a lot of work and debugging. But no test in the world will tell you if your UI is usable, understandable or pretty, because these things just aren't well defined.
We had some discussion on this very same "problem", with many remarks similar to those in the comments below the answers here.
We came to the conclusion that TDD in computer vision / image processing (concerning the global goal of segmentation, detection, or something like that) could look like this:
1. Get an image/sequence that should be processed and create a test for that image: the desired output and a metric telling how far your result may differ from that "ground truth".
2. Get another image/sequence for a different setting (different lighting, different objects, or the like) where your algorithm fails, and write a test for that.
3. Improve your algorithm so that it solves all previous tests.
4. Go back to 2.
No idea whether this is applicable; creating the tests will be much more complex than in traditional TDD, since it might be hard to define the allowed differences between your ground truth and your algorithm's output.
Probably it's better to just use some QualityDrivenDevelopment, where your changes simply shouldn't make things "worse" than before (again, you have to find a metric for that).
Obviously you can still use traditional unit testing for the deterministic parts of those algorithms, but that's not the real problem of "TDD in signal processing".
The image processing tests that you describe in your question take place at a much higher level than most of the tests that you will write using TDD.
In a true Test Driven Development process you will first write a failing test before adding any new functionality to your software, then write the code that causes the test to pass, rinse and repeat.
This process yields a large library of Unit Tests, sometimes with more LOC of tests than functional code!
Because your analytic algorithms have structured behavior, they would be an excellent match for a TDD approach.
But I think the question you are really asking is "how do I go about executing a suite of Integration Tests against fuzzy image processing software?" You might think I am splitting hairs, but this distinction between Unit Tests and Integration Tests really gets to the heart of what Test Driven Development means. The benefits of the TDD process come from the rich supporting fabric of Unit Tests more than anything else.
In your case I would compare the Integration Test suite to automated performance metrics against a web application. We want to accumulate a historical record of execution times, but we probably don't want to explicitly fail the build for a single poorly performing execution (which might have been affected by network congestion, disk I/O, whatever). You might set some loose tolerances around performance of your test suite and have the Continuous Integration server kick out daily reports that give you a high level overview of the performance of your algorithm.
I'd say TDD is much easier in such an application than in a web one. You have a completely deterministic algorithm you have to test. You don't have to worry about fuzzy stuff like user input and HTML rendering.
Your algorithm consists of a number of steps. Each of these steps can be tested. If you give them fixed, known input, they should yield fixed, known output. So write a test for that. You can't test that the algorithm "is correct" in general, but you can give it data for which you've already precomputed the correct result, so you can verify that it yields the correct output in that case.
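For example, a histogram step is fully deterministic, so its test can compare against a hand-computed result (a sketch):

import static org.junit.jupiter.api.Assertions.assertArrayEquals;
import org.junit.jupiter.api.Test;

class HistogramStepTest {
    // Counts how often each pixel value 0..bins-1 occurs.
    static int[] histogram(int[] pixels, int bins) {
        int[] h = new int[bins];
        for (int p : pixels) h[p]++;
        return h;
    }

    @Test
    void knownInputGivesKnownHistogram() {
        // Expected counts computed by hand for pixel values 0..3.
        int[] pixels = {0, 1, 1, 3, 3, 3};
        assertArrayEquals(new int[]{1, 2, 0, 3}, histogram(pixels, 4));
    }
}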
I am not really into your problem, so I don't know its hot spots. However, the final result of your algorithm is hopefully deterministic, so you can perform functional testing on it. Of course, you will have to determine a "known good" result. I know of TDD performed on graphics libraries (VTK, to be precise). The comparison is done on the final result image, pixel by pixel. Without going into much detail: if you have a known-good result, you can compute an md5 of the test result and compare it against the md5 of the known-good one.
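In Java, that md5 comparison could look roughly like this (the stored hash is a placeholder; you'd record the hash of your own verified result):

import java.security.MessageDigest;

public class Md5Comparison {
    // Placeholder: the md5 of your manually verified "known good" render.
    static final String KNOWN_GOOD_MD5 = "d41d8cd98f00b204e9800998ecf8427e";

    static boolean matchesKnownGood(byte[] imageBytes) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(imageBytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return KNOWN_GOOD_MD5.equals(hex.toString());
    }
}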
For unit testing, I am pretty sure you can test individual routines. This will force you to adopt a very fine-grained development style.
Might want to take a look at this paper
If your goal is to optimize an algorithm rather than verify correctness, you need a metric. A good metric measures the performance criteria underlying your algorithm. For a segmentation algorithm, this could be the sum of the standard deviations of the pixel data within each segment. Using the metric, you can apply threshold levels of acceptance, or rank versions of the algorithm.
You can use a statistical approach where you have many examples and correct outcomes; the test runs all of them and evaluates the algorithm on each. It then produces a single number: the combined success rate over all of them.
This way you are less sensitive to specific failures and your test is more robust.
You can then use a threshold on the success rate to see if the test failed or not.
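A sketch of such a test (the fixture and the per-example evaluation are stand-ins for your own data and algorithm):

import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class SuccessRateTest {
    // Stand-in fixture: in practice, many examples with known correct outcomes.
    int[][] examples = {{1}, {2}, {3}};

    // Stand-in: run the algorithm on one example and check the outcome.
    boolean algorithmSucceedsOn(int[] example) { return true; }

    @Test
    void combinedSuccessRateStaysAboveThreshold() {
        int passed = 0;
        for (int[] e : examples) {
            if (algorithmSucceedsOn(e)) passed++;
        }
        double rate = (double) passed / examples.length;
        // A single hard example cannot fail the build; a broad regression will.
        assertTrue(rate >= 0.9, "success rate dropped to " + rate);
    }
}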
I'm currently working on a large BPM project at work which uses the Global 360 BPM tool-set called Process 360. Just to give some background: this product works like a lot of other BPM solutions in that you design multiple "process maps" which define the flow of a particular business process you're trying to model, and each process map consists of multiple task nodes connected together which perform particular functions (calling web services, etc.).
Currently we're experiencing some pretty serious issues during the QA phases of our releases because the tool-set doesn't provide any way to automate testing of the process-map routes. So when a large and complex process is developed and handed over to our test team, a large number of issues often crop up. While obviously you'd expect some issues to come out of QA, I can't help feeling that a lot of the bugs could have been spotted during development if we had some sort of automated testing framework which we could use to build up a set of unit tests proving the various routes in the process map(s).
At the moment, the only real development testing that occurs is more akin to functional testing performed by the developers, documented as a set of manual steps per test case. The problem with this approach is that it's very time-consuming for the developers to run manually and, because of this, also relatively prone to error. Also, because we're usually on a pretty tight schedule, the tests are often not executed often enough to spot issues early.
As I mentioned earlier, the current tool-set provides no way to perform this sort of automated testing. Which actually got me thinking: why? Being very new to the whole BPM scene, my assumption was that this was just a feature lacking in the product, but I also wonder whether "unit testing" traditionally just isn't done in the BPM world. Perhaps it just isn't well suited to this sort of work?
I'd be interested to know if anyone else has ever encountered these sorts of issues, and also what - if anything - can be done to improve things.
I have seen something about that, though not Global 360 related: using bpelunit for testing processes.
I develop a workflow tool, and there is increasing demand for opening up the test tools used for testing the engine to end users.
I've done "unit" testing with K2.net 2003, another commercial BPM. I'd really call this integration testing, because it requires a test server and it's relatively slow. However, it is automated.
There's a good discussion of this in the book Professional K2 blackpearl (it applies to K2.net 2003 as well).
In order to apply it to your platform, the tool set has to have an API that permits starting process instances, obtaining work items, completing work items, etc. You write tests using any supported language (I used C#) and a testing framework (I used NUnit). If the API supports synchronous calls, this is easier to do. For each test:
Start the process under test
Progress the work item to a decision point
Set process instance data appropriately
Complete the work item
Assert that the work item is now at the expected activity
Delete or complete the process instance
Base test classes or helper methods can make this easier. You could even write a DSL for testing maps.
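To sketch the shape of such a test (everything here - Engine, ProcessInstance, WorkItem, the process and activity names - is a hypothetical stand-in, not any real vendor API; K2's actual client API differs in the details):

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class ApprovalRouteTest {
    // Hypothetical stand-ins for whatever API your BPM tool-set exposes.
    interface Engine {
        ProcessInstance start(String processName);
        WorkItem openWorkItem(ProcessInstance p, String activity);
        void complete(WorkItem item);
        String currentActivity(ProcessInstance p);
        void delete(ProcessInstance p);
    }
    interface ProcessInstance { void setField(String name, Object value); }
    interface WorkItem {}

    Engine engine; // wired up to a test server in setup code, not shown

    @Test
    void rejectionRoutesBackToRequester() {
        ProcessInstance p = engine.start("ApprovalProcess");
        WorkItem item = engine.openWorkItem(p, "ManagerReview");
        p.setField("approved", false);                  // drive the decision point
        engine.complete(item);
        assertEquals("ReviseRequest", engine.currentActivity(p));
        engine.delete(p);                               // clean up the test instance
    }
}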
Essentially you want full "test coverage" of the process/map - test every decision point and ensure that the correct branch is taken.
There are two aspects of BPM that are related yet not identical.
There's BPM that tool and technology vendors advocate for which is all about features.
There's also BPM that Enterprise Architects advocate for which is all about establishing Centers of Excellence.
The former is where a company buys some software.
The latter is where a company makes systemic and inherent changes to the behavior of their IT workers.
The former is supposed to be in the service of the latter but that isn't necessarily so. Acquiring the former is necessary but not sufficient for achieving the latter.
I don't know how well Global 360 supports what is known as Test-Driven Development, but JBoss jBPM does provide some tool support for easily writing JUnit tests.
However, the tool can't and won't force developers to write them or to embrace TDD principles.