Question
How would you go about adding automated testing to a game?
I believe you can unit test a lot of the game engine's functionality (networking, object creation, memory management, etc.), but is it possible to automate testing of the actual game itself?
I'm not talking about gameplay elements (like Protoss would beat Zerg in map X), but I'm talking about the interaction between the game and the engine.
Introduction
In game development, the engine is just a platform for the game. You could think of the game engine as an OS and the game as software that the OS runs. The game could be a collection of scripts or an actual subroutine inside the game engine.
Possible Answers
My idea is this:
You would need an engine that is deterministic. This means that given one set of inputs, the output would be exactly the same. This includes seeding the random number generator with the same value.
Then, create a bare-bones level which contains a couple of objects the avatar/user can interact with. Start small and then add objects to the level as more interactions are developed.
Create a script which follows a path (testing pathfinding) and interacts with the different objects (storing the result or expected behavior). This script would be your automated test. At regular intervals (say, weekly), run the script along with your engine's unit tests.
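To make this concrete, here is a minimal Python sketch of such a scripted test. The engine API used here (send_input, step, world_state_hash) is entirely hypothetical; substitute whatever hooks your engine exposes.

SEED = 12345
SCRIPT = [
    (10, "move_to", (4, 7)),    # at tick 10, path toward tile (4, 7)
    (55, "interact", "lever"),  # at tick 55, use the lever object
]

def run_scripted_test(engine_factory):
    engine = engine_factory(seed=SEED)  # same seed => same RNG sequence
    for tick in range(100):
        for when, action, arg in SCRIPT:
            if when == tick:
                engine.send_input(action, arg)
        engine.step()  # advance one fixed timestep
    return engine.world_state_hash()  # cheap way to compare two runs

def test_replay_is_deterministic(engine_factory):
    assert run_scripted_test(engine_factory) == run_scripted_test(engine_factory)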
This post at Games From Within might be relevant/interesting.
Riot Games has an article on using automated testing for League of Legends (LoL), a multiplayer online battle arena (MOBA) game.
According to the developers, there are many changes to both the game code and game balance every day. They built a Python test framework that is basically a simpler game client which sends commands to a Continuous Integration server running an instance of LoL's game server. The server then sends the effects of each command back to the test framework, allowing the response to be tested.
The framework provides an event queue that records the events, data, and effect from a particular point in time. The article calls this a "snapshot".
The article describes an example of a unit test for a spell:
Setup
1. Give a character the ability.
2. Spawn an enemy character in the midlane (a location on the map).
3. Spawn a creep in the midlane. (In the context of LoL, creeps are weak, non-controllable characters that are part of each team's army. They are basically cannon fodder and a source of experience and gold for the enemy team, but if left unchecked, they can overwhelm the opposing team.)
4. Teleport the character to the midlane.
Execute
1. Take a snapshot of all the variables (e.g. the current life from the player, enemy and normal characters).
2. Cast the spell.
3. Activate the spell's effects (for example, there are some spells that will proc on hit) on an enemy character.
4. Reset the spell's cooldown so it can be cast again immediately.
5. Cast the spell.
6. Activate the spell's effects on a creep (in the context of LoL, most spells have different calculations when used on creeps).
7. Take another snapshot.
Verify
Starting from the first snapshot, replay the events and assert that the expected results (from a game designer's point of view) are correct. Examples of things that can be verified: the damage falls within the spell's damage range (LoL uses random numbers to give attacks variance), damage is resisted differently by a player character than by a creep, and the spell is cast within its effective range.
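To make the structure concrete, here is a rough Python sketch of such a test. Every call (spawn_champion, snapshot, cast, and so on) is invented for illustration; it is not Riot's actual API.

def test_spell_damage(game):
    # Setup
    champ = game.spawn_champion("TestChampion", abilities=["fireball"])
    enemy = game.spawn_champion("EnemyChampion", location="midlane")
    creep = game.spawn_creep(location="midlane")
    game.teleport(champ, "midlane")

    # Execute
    before = game.snapshot()
    game.cast(champ, "fireball", target=enemy)
    game.reset_cooldowns(champ)
    game.cast(champ, "fireball", target=creep)
    after = game.snapshot()

    # Verify: replay the recorded events and check designer expectations.
    for event in after.events_since(before):
        if event.kind == "damage":
            assert event.distance <= event.spell_range  # cast within range
            assert event.min_damage <= event.amount <= event.max_damage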
The article also shows that a video of the test can be captured by viewing the test server from a normal game client.
Values in the gameplay side of development are so variable that testing for absolute values would be far-fetched.
But we can test deterministic behavior. For example, a unit test might have Guybrush Threepwood move toward a door (pathfinding), open the door (use command), fail because he doesn't have the key in his inventory (feedback), pick up the door key (pathfinding + inventory management), and then finally open the door.
All of these steps are deterministic. With this unit test, I can refactor the memory manager, and if that somehow breaks the inventory-management routine, the unit test will fail.
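Here is a minimal sketch of that scenario as an automated test, assuming a hypothetical scripting API (world, actor, use, pick_up are all invented names):

def test_locked_door_sequence(world):
    guybrush = world.actor("guybrush")
    door, key = world.object("door"), world.object("door_key")

    guybrush.move_to(door.position)        # pathfinding
    assert guybrush.use(door) == "locked"  # feedback: no key yet

    guybrush.move_to(key.position)         # pathfinding again
    guybrush.pick_up(key)                  # inventory management
    assert key in guybrush.inventory

    guybrush.move_to(door.position)
    assert guybrush.use(door) == "opened"  # now it opens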
This is just one idea for unit testing in games. I would love to know other ideas, hence, the motivation for this post.
I did something similar to your idea once and it was very successful, though I suspect it is really more of a system test than a unit test. As you suggest, your random number generator must be seeded with the same value and must produce an identical sequence each time.
The game ran at 50 Hz, so timing was not an issue. I had a system that would record mouse clicks and locations, and I used this to manually generate a 'script' which could be replayed to produce the same results. By removing the timing delays and turning off graphics generation, an hour of gameplay could be replayed in a few seconds.
The biggest problem was that changes to the game design would invalidate the script.
If your bare-bones room contained logic that was independent of the general gameplay, this could work very well. The engine could start up without any UI and run the script as soon as initialisation completes. Testing for crashes along the way would be simple, but richer checks, such as verifying that the characters end up in the correct positions, would be harder. If recording the scripts is simple enough, as it was in my system, they can be updated very easily, and special scripts to exercise specialised behaviour can be set up very quickly. My system had the added advantage that it could be used during game testing, with the exact sequence of events recorded to make bug fixing easier.
An article from Power of Two Games was mentioned in another answer already, but I suggest reading everything (or nearly everything) there, as the articles are all really well written and apply directly to game development. The article on asserts is particularly good. You can also visit their previous website, Games From Within, which has a lot written about Test-Driven Development, which is unit testing taken to the extreme.
The Power of Two guys are the ones who implemented UnitTest++, a pretty well-regarded unit testing framework. Personally, I prefer WinUnit.
If you are testing the rendering engine, I guess you could render specific test scenes, do a screen capture, and compare it to reference renderings. That way you can detect if changes in the engine break anything visually. You can write similar tests for the sound engine, or even animation (by comparing a series of frames).
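As a sketch of that idea in Python, using Pillow for the image comparison; render_scene is a hypothetical engine hook that renders a fixed test scene to an image:

from PIL import Image, ImageChops

def test_scene_matches_reference():
    actual = render_scene("test_scene_01")  # hypothetical engine hook
    reference = Image.open("references/test_scene_01.png").convert("RGB")
    diff = ImageChops.difference(actual.convert("RGB"), reference)
    # Tolerate tiny per-pixel differences (driver/rounding noise), fail on more.
    assert all(band_max < 8 for _, band_max in diff.getextrema())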
If you want to test game logic or scene progress you can do this by testing various conditions on the scripting variables (assuming you are using scripting to implement most of the scene and story aspects).
If you're using XNA (the idea could be extrapolated to other frameworks of course), you could use an in-game unit test framework that lets you access the game's state in the unit test. One such framework is Scurvy.Test :-)
http://flea.sourceforge.net/gameTestServer.pdf
This is an interesting discussion on implementing a full-blown functional tester in a game.
The term "unit testing" implies that a "unit" is being tested. This is one thing. If you're doing higher-level testing (e.g. several systems at once), usually this is called functional testing. It is possible to unit test much of a game, however you can't really test for fun.
Determinism isn't necessary, as long as your tests can be fuzzy, e.g. "did the character get hurt?" as opposed to "did the character lose 14.7 hitpoints?".
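A tiny runnable illustration of the difference, with a stub Character standing in for real game code:

import random
from dataclasses import dataclass

@dataclass
class Character:
    hitpoints: float

def attack(target):
    # Damage has random variance, so exact values are untestable.
    target.hitpoints -= random.uniform(10.0, 20.0)

def test_attack_hurts_target():
    target = Character(hitpoints=100.0)
    attack(target)
    assert target.hitpoints < 100.0  # fuzzy: "did the character get hurt?"
    # not: assert target.hitpoints == 85.3  (brittle, fails randomly)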
I have written a paper on that topic -
http://download.springer.com/static/pdf/722/art%253A10.7603%252Fs40601-013-0010-4.pdf?auth66=1407852969_87bc2e71ad5228b36738f0237084ebe5&ext=.pdf
This doesn't really answer your question, but I was listening to a podcast about Pex from Microsoft, which does something similar to the solution you're proposing, and I remember thinking it would be really interesting to see whether it could test games. I don't know if it would help you specifically, but perhaps you could look at some of the ideas it uses and apply them to your unit testing.
Related
I'm currently working on a C++/SDL/OpenGL game. I've already made a few small games, but only local ones (no netcode). So I know how to make the engine, but I'm unsure about the netcode.
Can I first create the full engine for split-screen play and add the netcode later on, or will this make everything complicated? Do I already have to take netcode into consideration while programming the basic game engine, or is it okay to just put it on top of the game after it runs fine on one machine?
It's a 2D shooter type game, if that matters. And no, I don't want to change my choice of programming language/window manager/API, because I've already implemented the bare bones of the game. I'm just curious how this issue is best approached.
In theory, all you need is a good enough design. Write enough abstract classes and BAM! you can swap out one user interface (i.e. local-only) for another (networked). I wouldn't believe the theory, though.
It's possible to do what you want, but it involves taking into consideration all of the new issues you must address when dealing with networked gameplay: syncing views for multiple users, what to do when one user drops their network link (and how to detect that, of course), network latency in receiving user input, handling lag on one side and not the other. Networked programming is completely different, and some aspects (largely the ones dealing with synchronization) may impact your core engine itself. Even "just showing two views" gets a lot tougher, because you now have data on two completely different machines, and the data isn't necessarily the same.
My suggestion would be to do the opposite of what you're hoping for. Get the networking code working first with minimal graphics. In fact, console messages will be far more important than pretty graphics. You already have experience with making the graphics of other games - work the most questionable technology first. Get a good feel of all the things the networked code will ask of you, then focus on the graphics afterwards.
Normally for a network-oriented game there are five concepts to keep in mind:
events
dispatcher
synchronization
rendering
simulation
Events. A game engine is event-driven software: based on the state of each generic object in the game (a unit, the GUI, etc.), an action occurs, meaning you call a function or do nothing.
Dispatcher. The dispatcher takes each event change and routes it to the subsystems that need to react to it.
Synchronization. When an event changes state, all clients on the network must be notified of that change through their dispatchers. This way, all players see the changes made by other players and render and simulate the same things at the same time.
Rendering. The renderer reads the parameters and relevant state of each object and draws it on screen. For example, if each unit has a life_points property, you can draw a normal unit if life_points > 50, a damaged unit if 0 < life_points < 50, and a destroyed unit if life_points = 0. The renderer doesn't change objects; it just draws what it reads from them.
Simulation. The simulation reads every object and performs tasks based on its state and properties. For example, if a unit has zero life points, you mark its state as DEAD (say) and update the GUI, or if a unit gets close to a unit from an enemy team, you change its state from static to moving toward that unit. This is also where you run the physics: changing positions, rotations, and so on. Since all objects are synchronized over the network, everybody will be watching the same thing.
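A bare-bones Python sketch of the event/dispatcher part (all names are illustrative, not from any particular engine):

from collections import defaultdict

class Dispatcher:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def dispatch(self, event_type, **data):
        for handler in self.handlers[event_type]:
            handler(**data)

dispatcher = Dispatcher()
# The network layer subscribes so that every local change is also sent to
# the peers, keeping all clients simulating and rendering the same thing.
dispatcher.subscribe("unit_damaged", lambda unit_id, amount:
                     print(f"send to peers: unit {unit_id} lost {amount} hp"))
dispatcher.dispatch("unit_damaged", unit_id=7, amount=25)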
Best regards.
Add in netcode as soon as you can. If you don't do this you may have to overhaul a lot of the engine later in the dev cycle, so better to do it early.
It also depends on how complex the game is, but the same principles still stand. Best not to tack it on at the last second
Hope this helps!
I want to write an algorithm (a bunch of machine learning algorithms) in C/C++ or maybe in Java, possibly in Python. The language doesn't really matter to me - I'm familiar with all of the above.
What matters to me is the testing. I want to train my models using training data, so I have test inputs, I know what the outputs should be, and I compare them to the model's outputs. What kind of test is this? Is it a unit test? How should I approach the problem? I can see that I can write some code to check what I need checked, but I want to keep testing separate from the main code. Testing is a well-developed field, and I've seen this done before, but I don't know the name and type of this particular kind of testing, so I can't read up on it without creating a mess. I'd be grateful if you could tell me what this testing method is called.
Your best bet is to watch the psychology-of-testing videos from the testing god, Misko Hevery: http://misko.hevery.com/
Links to Misko's videos:
http://misko.hevery.com/presentations/
And read this Google testing guide http://misko.hevery.com/code-reviewers-guide/
Edited:
Anyone can write tests; they are really simple, and there is no magic to writing a test. You can simply do something like:
var sut = new MyObject();
var res = sut.IsValid();
if(res != true)
{
throw new ApplicationException("message");
}
That is the theory. Of course, these days we have tools that simplify tests, so we can write something like this:
new MyObject().IsValid().Should().BeTrue();
But what you should really focus on is writing testable code; that's the magic key.
Just follow the psychology-of-testing videos from Misko to get started.
This sounds a lot like Test-Driven Development (TDD), where you create unit-tests ahead of the production code. There are many detailed answers around this site on both topics. I've linked to a couple of pertinent questions to get you started.
If your inputs/outputs are at the external interfaces of your full program, that's black box system testing. If you are going inside your program to zoom in on a particular function, e.g., a search function, providing inputs directly into the function and observing the behavior, that's unit testing. This could be done at function level and/or module level.
If you're writing a machine learning project, the testing and training process isn't really Test-Driven Development. Have you ever heard of co-evolution? You have a set of puzzles for your learning system that are themselves evolving; their fitness is determined by how well they confound your solutions.
For example, I want to evolve a sorting network. My learning system is the programs that produce networks. My co-evolution system generates inputs that are difficult to sort. The sorting networks are rewarded for producing correct sorts, and the co-evolving inputs are rewarded for how many failures they trigger in the sorting networks.
I've done this with genetic programming projects and it worked quite well.
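A stripped-down, runnable sketch of the idea: candidate "sorting networks" are lists of compare-and-swap index pairs, adversarial test cases are input lists, and each population is scored against the other. All parameters here are arbitrary.

import random

N = 6  # fixed input size for every network (an assumption for illustration)

def apply_network(network, values):
    vals = list(values)
    for i, j in network:  # each pair is a compare-and-swap, with i < j
        if vals[i] > vals[j]:
            vals[i], vals[j] = vals[j], vals[i]
    return vals

def random_network(length=12):
    return [tuple(sorted(random.sample(range(N), 2))) for _ in range(length)]

def mutate(network):
    net = list(network)
    net[random.randrange(len(net))] = tuple(sorted(random.sample(range(N), 2)))
    return net

def network_fitness(network, cases):
    # Networks are rewarded for each adversarial case they sort correctly.
    return sum(apply_network(network, c) == sorted(c) for c in cases)

def case_fitness(case, networks):
    # Cases are rewarded for each network they break.
    return sum(apply_network(net, case) != sorted(case) for net in networks)

networks = [random_network() for _ in range(20)]
cases = [[random.randint(0, 9) for _ in range(N)] for _ in range(20)]

for generation in range(100):
    networks.sort(key=lambda n: network_fitness(n, cases), reverse=True)
    cases.sort(key=lambda c: case_fitness(c, networks), reverse=True)
    # Keep the best half of each population and refill with fresh variants.
    networks = networks[:10] + [mutate(n) for n in networks[:10]]
    cases = cases[:10] + [[random.randint(0, 9) for _ in range(N)] for _ in range(10)]

best = max(networks, key=lambda n: network_fitness(n, cases))
print("best network solves", network_fitness(best, cases), "of", len(cases), "cases")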
Probably back-testing, which means you have some historical inputs and run your algorithm over them to evaluate its performance. The term you used yourself, training data, is more general; you could search for that to find some useful links.
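A minimal illustration of the idea, with a toy model and history standing in for the real thing:

def backtest(predict, history):
    # Replay historical inputs through the model, score against known outputs.
    correct = sum(predict(x) == y for x, y in history)
    return correct / len(history)

history = [(1, "a"), (2, "b"), (3, "a")]  # (input, known output) pairs
accuracy = backtest(lambda x: "a" if x % 2 else "b", history)
assert accuracy >= 0.9, f"model regressed: accuracy={accuracy:.2f}"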
It's unit testing. The controllers are tested, and the code is checked in and out without really disturbing your development code. This process is also part of Test-Driven Development (TDD), where every development cycle is tested before moving into the next software iteration or phase.
Although this is a very old post, my 2 cents :)
Once you've decided which algorithmic method to use (your "evaluation protocol", so to say) and tested your algorithm on unit-level edge cases, you might be interested in ways to run your algorithm on several datasets and assert that the results are above a certain threshold (individually, on average, etc.).
This tutorial explains how to do it within the pytest framework, the most popular testing framework in Python. It is based on an example (comparing polynomial-fitting algorithms on several datasets).
(I'm the author, feel free to provide feedback on the github page!)
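As a rough sketch of the idea (not the tutorial's actual code), pytest's parametrize can run the same quality assertion over several datasets:

import numpy as np
import pytest

DATASETS = {
    "linear": (np.arange(10.0), 2.0 * np.arange(10.0) + 1.0),
    "noisy": (np.arange(10.0),
              2.0 * np.arange(10.0) + np.random.default_rng(0).normal(0, 0.1, 10)),
}

@pytest.mark.parametrize("name", DATASETS)
def test_fit_quality(name):
    x, y = DATASETS[name]
    coeffs = np.polyfit(x, y, deg=1)  # the "algorithm" under test
    residual = np.abs(np.polyval(coeffs, x) - y).max()
    assert residual < 0.5, f"{name}: residual {residual:.3f} too large"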
What is the purpose of each of these? I mean, under which conditions should I do each of them?
As an example: if you have a backend server and several front-end web apps, which would you do first, unit testing of the backend server or UI testing of the web UI?
Given that the server and the front-end apps already exist, it's not an iterative design you can build tests alongside (TDD)...
Unit testing aims to test small portions of your code (individual classes / methods) in isolation from the rest of the world.
UI testing may be a different name for system / functional / acceptance testing, where you test the whole system together to ensure it does what it is supposed to do under real life circumstances. (Unless by UI testing you mean usability / look & feel etc. testing, which is typically constrained to details on the UI.)
You need both of these in most of projects, but at different times: unit testing during development (ideally from the very beginning, TDD style), and UI testing somewhat later, once you actually have some complete end-to-end functionality to test.
If you already have the system running, but no tests, practically you have legacy code. Strive to get the best test coverage achievable with the least effort first, which means high level functional tests. Adding unit tests is needed too, but it takes much more effort and starts to pay back later.
Recommended reading: Working Effectively with Legacy Code.
Unit tests should always be written. Unit tests are there to prove that each UNIT (read: object) of your technical solution delivers the expected results. To put it very (maybe too) simply, user testing is there to verify that your system fulfills the needs and demands of the user.
The test pyramid [1] is an important concept here, well described by Martin Fowler.
In short, tests that run end-to-end through the UI are brittle and expensive to write. You may want to consider test-recording tools [2] to speed up recording and re-recording. Disclaimer: I'm the developer of one such tool.
[1] https://martinfowler.com/articles/practical-test-pyramid.html
[2] https://anwendo.com
In addition to the accepted answer: today I came up with the question of why not just programmatically trigger layout functions and then unit test your logic around that as well.
The answer I got from a senior dev was: programmatically triggering layout functions will not be a faithful copy of the real user experience. In the real world, the system triggers many callbacks, for example when the user backgrounds or foregrounds the app. Obviously you can trigger such events manually and test again, but would you be sure you got all events in all sequences right?
The real user experience is one where the user makes actual network calls, taps on screens, loads multiple screens on top of each other, and at times receives system callbacks, including callbacks you forgot to mock or didn't mock properly. In unit tests you're mainly testing in isolation. In a UI test, you set up the app, may have to log in, and so on; the stack you build is much more complex than in a unit test. Hence it's better not to mix unit testing with UI testing.
I have an app which draws a diagram. The diagram follows a certain schema,
for e.g shape X goes within shape Y, shapes {X, Y} belong to a group P ...
The diagram can get large and complicated (think of a circuit diagram).
What is a good approach for writing unit tests for this app?
Find out where the complexity in your code is, then:
1. Separate it out from the untestable visual presentation.
2. Test it.
If you don't have any non-visual complexity, you are not writing a program, you are producing a work of art.
Unless you are using a horribly buggy compiler or something, I'd avoid any tests that boil down to 'test source code does what it says it does'. Any test that's functionally equivalent to:
assertEquals(hash(stripComments(loadSourceCode())), 0x87364f3234);
can be deleted without loss.
It's hard to write defined unit tests for something visual like this unless you really understand the exact sequence of API calls that are going to be built.
To test something "visual" like this, you have three parts.
A "spike" to get the proper look, scaling, colors and all that. In some cases, this is almost the entire application.
A "manual" test of that creates some final images to be sure they look correct to someone's eye. There's no easy way to test this except by actually looking at the actual output. This is hard to automate.
Mock the graphics components to be sure your application calls the graphics components properly.
When you make changes, you have to run both tests: Are the API calls all correct? and Does that sequence of API calls produce the image that looks right?
You can -- if you want to really burst a brain cell -- try to create a PNG file from your graphics and test to see if the PNG file "looks" right. It's hardly worth the effort.
As you go forward, your requirements may change. In this case, you may have to rewrite the spike first and get things to look right. Then, you can pull out the sequence of API calls to create automated unit tests from the spike.
One can argue that creating the spike violates TDD. However, the spike is designed to create a testable graphics module. You can't easily write the test cases first because the test procedure is "show it to a person". It can't be automated.
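As a sketch of the third part, mocking the graphics component with Python's unittest.mock; draw_health_bar is an invented stand-in for application code:

from unittest.mock import Mock

def draw_health_bar(gfx, x, y, fraction):
    # Application code under test: one background rect, one filled rect.
    gfx.fill_rect(x, y, 100, 10, color="grey")
    gfx.fill_rect(x, y, int(100 * fraction), 10, color="red")

def test_health_bar_draw_calls():
    gfx = Mock()
    draw_health_bar(gfx, 5, 5, 0.5)
    gfx.fill_rect.assert_any_call(5, 5, 100, 10, color="grey")
    gfx.fill_rect.assert_any_call(5, 5, 50, 10, color="red")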
You might consider first converting the initial input data into some intermediate format that you can test. Then you forward that intermediate format to the actual drawing function, which you test manually.
For example, when you have a program that inputs percentages and outputs a pie chart, you might have an intermediate format that exactly describes the dimensions and position of each sector.
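A small runnable sketch of such an intermediate format for the pie-chart example (the function and test names are illustrative):

def pie_sectors(percentages):
    # Convert percentages into (start_angle, end_angle) pairs in degrees.
    if abs(sum(percentages) - 100.0) > 1e-9:
        raise ValueError("percentages must sum to 100")
    sectors, start = [], 0.0
    for p in percentages:
        end = start + 360.0 * p / 100.0
        sectors.append((start, end))
        start = end
    return sectors

def test_pie_sectors():
    assert pie_sectors([50.0, 25.0, 25.0]) == [(0.0, 180.0), (180.0, 270.0), (270.0, 360.0)]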
You've described a data model. The application presumably does something, rather than just sitting there with some data in memory. Write tests which exercise the behaviour of the application and verify the outcome is what is expected.
OK, maybe I'm missing something, but I really don't see the point of Selenium. What is the point of opening the browser with code, clicking buttons with code, and checking for text with code? I've read the website, and I see how in theory it would be good to automatically test your web applications, but in the end doesn't it take much more time to write all this code than to just click around and visually verify that things work?
I don't get it...
It allows you to write functional tests in your "unit" testing framework (the issue is the naming of the latter).
When you are testing your application through the browser, you are usually testing the system fully integrated. Since you already have to test your changes before committing them (smoke tests), you don't want to repeat those tests manually over and over.
Something really nice is that you can automate your smoke tests, and QA can augment them. It's pretty effective, as it reduces duplicated effort and brings the whole team closer together.
P.S. As with any practice you're using for the first time, there is a learning curve, so it usually takes longer the first few times. I also suggest you look at the Page Object pattern; it helps keep the tests clean.
Update 1: Notice that the tests will also run JavaScript on the pages, which helps with testing highly dynamic pages. Also note that you can run them with different browsers, so you can check for cross-browser issues (at least on the functional side; you still need to check the visuals).
Also note that as the number of pages covered by tests builds up, you can quickly create tests with complete cycles of interactions. Using the Page Object pattern, they look like:
LastPage aPage = somePage
.SomeAction()
.AnotherActionWithParams("somevalue")
//... other actions
.AnotherOneThatKeepsYouOnthePage();
// add some asserts using methods that give you info
// on LastPage (or that check the info is there).
// you can of course break the statements to add additional
// asserts on the multi-steps story.
It is important to understand that you go about this gradually. If it is an already-built system, you add tests for the features/changes you are working on, adding more and more coverage along the way. Testing manually instead usually hides what you failed to test: if you made a change that affects every single page but only checked a subset (as time doesn't allow more), automation tells you which pages you actually tested, and QA can work from there (hopefully by adding even more tests).
This is a common thing that is said about unit testing in general. "I need to write twice as much code for testing?" The same principles apply here. The payoff is the ability to change your code and know that you aren't breaking anything.
Because you can repeat the SAME test over and over again.
If your application is even 50+ pages, and you need to do frequent builds and test it against X major browsers, it makes a lot of sense.
Imagine you have 50 pages, all with 10 links each, and some with multi-stage forms that require you to go through the forms, putting in about 100 different sets of information to verify that they work properly with all credit card numbers, all addresses in all countries, etc.
That's virtually impossible to test manually. It becomes so prone to human error that you can't guarantee the testing was done right, never mind what the testing proved about the thing being tested.
Moreover, if you follow a modern development model, with many developers all working on the same site in a disconnected, distributed fashion (some working on the site from their laptop while on a plane, for instance), then the human testers won't even be able to access it, much less have the patience to re-test every time a single developer tries something new.
On any decent size of website, tests HAVE to be automated.
The point is the same as for any kind of automated testing: writing the code may take more time than "just clicking around and visually verifying things work", maybe 10 or even 50 times more.
But any nontrivial application will have to be tested far more than 50 times eventually, and manual tests are an annoying chore that will likely be omitted or done shoddily under pressure, which results in bugs remaining undiscovered until just before (or after) important deadlines, which results in stressful all-night coding sessions or even outright monetary loss due to contract penalties.
Selenium (along with similar tools, like Watir) lets you run tests against the user interface of your Web app in ways that computers are good at: thousands of times overnight, or within seconds after every source checkin. (Note that there are plenty of other UI testing pieces that humans are much better at, such as noticing that some odd thing not directly related to the test is amiss.)
There are other ways to involve the whole stack of your app by looking at the generated HTML rather than launching a browser to render it, such as Webrat and Mechanize. Most of these don't have a way to interact with JavaScript-heavy UIs; Selenium has you somewhat covered here.
Selenium will record and re-run all of the manual clicking and typing you do to test your web application. Over and over.
Over time, observing myself has shown that I tend to run fewer tests, start skipping some, or forget about them.
Selenium will instead take each test, run it, and let you know if it doesn't return what you expect.
There is an upfront cost in time to record all these tests. I would recommend treating it like unit tests: if you don't have it already, start using it on the most complex, touchy, or most frequently updated parts of your code.
And if you save those tests as JUnit classes you can rerun them at your leisure, as part of your automated build, or in a poor man's load test using JMeter.
At a past job we used to unit test our web app. If the web app changes its look, those tests don't need to be rewritten; record-and-replay style tests would all need to be redone.
Why do you need Selenium? Because testers are human beings. They go home every day, can't always work weekends, take sickies, take public holidays, go on vacation every now and then, get bored doing repetitive tasks, and can't always be relied upon to be around when you need them.
I'm not saying you should get rid of testers, but an automated UI testing tool complements system testers.
The point is the ability to automate what was before a manual and time consuming test. Yes, it takes time to write the tests, but once written, they can be run as often as the team wishes. Each time they are run, they are verifying that behavior of the web application is consistent. Selenium is not a perfect product, but it is very good at automating realistic user interaction with a browser.
If you do not like the Selenium approach, you can try HtmlUnit; I find it more useful and easier to integrate into existing unit tests.
For applications with rich web interfaces (like many GWT projects), Selenium/Windmill/WebDriver/etc. is the way to create acceptance tests. In the case of GWT/GXT, the final user interface code is JavaScript, so creating acceptance tests as normal JUnit test cases is basically out of the question. With Selenium you can create test scenarios matching real user actions and expected results.
Based on my experience, Selenium can reveal bugs in the application logic and the user interface (provided your test cases are well written). Dealing with AJAX front ends requires some extra effort, but it is still feasible.
I use it to test multi-page forms, as this takes the burden out of typing the same thing over and over again. And having the ability to check whether certain elements are present is great. Again, using the form as an example, your final Selenium test could check whether something like "Thanks Mr. Rogers for ordering..." appears at the end of the ordering process.
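For illustration, here is a minimal sketch using Selenium's Python bindings; the URL and element names are hypothetical:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    driver.get("https://example.com/order")  # placeholder URL
    driver.find_element(By.NAME, "first_name").send_keys("Fred")
    driver.find_element(By.NAME, "last_name").send_keys("Rogers")
    driver.find_element(By.ID, "next").click()    # go to the next form page
    driver.find_element(By.ID, "submit").click()  # submit the order
    assert "Thanks Mr. Rogers" in driver.page_source  # final confirmation text
finally:
    driver.quit()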