Handing of C code in C++ (Vulkan) - c++

I am trying to write a rendering engine in C++ based on Vulkan. Vulkan is written in C, as a result it has some interesting conventions.
A recurring pattern I see in tutorials/code snippets from Vulkan apps is that most code is in 1 very big class. (right now my vulkan class is already about 2000 lines too). But to make a proper rendering engine, I will need to compartmentalize my code till some degree.
One of the aforementioned interesting bits is that it has something called a Logical Device, which is an abstract reference to the graphics card.
It is used everywhere, to create and allocate things in the following way:
Create structs with creation info
Create variable that the code will output into
Call the actual vkCreateSomething or vkAllocateSomething function, pass in the logical device,
the creation info and the reference to the variable to output to and check if it was a success.
on its own there is nothing wrong with this style I'd say. It's just that it's not really handy at all in OOP because it relies on the logical device being available everywhere.
How would I deal with this problem? Service locators and singletons are considered to be horrible solutions by many (which I can understand), so that seems like something I'd rather avoid.
Are there design patterns that deal with this?

The logical device is an actual dependency.
It has state, and its state needs to be available to work with the hardware.
You can use it as an argument to your operations, a value stored in pretty much every class, a global, or a monadic-esque "final" argument where every operation just returns something still needing the device to run on. You can replace a (pointer/reference to) it with a function returning a (pointer/reference to) it.
Consider if pure OOP is what you want to do; vulkan and rendering is more about operations than things being operated on. I would want to mix some functional programming patterns in, which makes the monad-like choice more reasonable.
Compose operations on buffers/data. These return operations, which also take buffers and data. The composition operation specifies which arguments are new inputs, and which are consumed by the next step. Doing this you can (at compile time) set up a type-safe graph of work to do, all without running anything.
The resulting composed operation would then have a setup (where you bind the logical device and anything you can do "early" before you need to have the expensive buffers ready), and an execute phase (where you feed it the expensive buffers and it generates output).
Or as another approach, find a compiler with coroutine support from c++2a and write it async yet procedurally.

Vulkan is a OOP API. It is not class-based, because it is C99 not C++. That can easily be fixed by using the official Vulkan-Hpp. You can consume it as vulkan.hpp which is part of the semi-official LunarG Vulkan SDK.
The usage would not be that different from vulkan.h though: you would probably have a member pointer/reference to a Device instance, or would have a VkDevice handle member in each object that needs it. Some higher level object would handle the lifetime of the Logical Device (e.g. your RenderingEngine class or such). The difference would be almost only esthetical: you would use device->command(...) instead of vkCommand(device, ...). vulkan.hpp does not seem to use proper RAII through constructors/destructors which is a shame.
Alternatively the user of your engine can manage the device. Though unlike OpenGL there is not much use for this. The user can make its own VkInstance and VkDevice if it also wishes to use Vulkan for something.
A recurring pattern I see in tutorials/code snippets from Vulkan apps is that most code is in 1 very big class.
That's not really specific to Vulkan. If you think about it, pretty much all C++ applications are one big class doing everything (only differences being how much the programmer bothers to delegate from it to some other class instances).

Related

Manually manage and update Intel TBB flow graph?

I have successfully prototyped an application using Intel's awesome TBB flow graph library. It seems to work quite well, but now I need to refactor the code into a production-ready version.
Previously, I have worked with some larger and more "over-developed" frameworks for this particular domain (the work is in image processing and previous applications have used ITK/VTK). For this application, however, I am trying to take a lower-level and more focused approach.
Currently, I am just assembling my entire graph in main() which is obviously not sustainable. I'd like to allow the pipeline to run iteratively so that I can grab output data from each stage and display it for debug/analysis purposes.
My idea so far is to abstract each logical "stage" of the application into a class that accepts a &tbb::flow::graph as a constructor argument and internally stores a reference to the graph node it controls. I can have the wrapper class allocate an additional tbb::flow::broadcast_node at the output and an async node after that to fire off events.
Is this a sensible design concept? Generally speaking, how have others integrated the TBB flow graph concepts into the structure of their application? The examples and documentation are quite scant for this particular portion of the TBB library.
As with any design, I don't think there is a clearly right or wrong way to do it. I don't know your code good enough, but breaking the code up by logical stages is probably a good idea.
When it comes to frameworks like TBB, a design decision that you have to make is whether you should hide all framework aspects behind an interface. The advantage is that you could later swap out the implementation with another implementation (e.g., replace TBB by OpenMP). On the other hand, introducing an additional layer might not be needed in all cases. Especially, if it is unlikely that you will ever replaced TBB.
The design design that you describe in your question is how to structure the framework dependent part. It depends very much on the concrete algorithm that you are implementing. For instance, if it consists of applying separate transformations on one images, creating one class per transformation step could be a good approach.
In addition, it might make sense to wrap everything in a function or class. If the operation that you are implementing takes one image as an input and produces one image as an output, this is something that can be hidden behind an interface that hides the implementation details (in this case TBB).

Strategies for managing the OpenGL state machine

I'm currently getting to grips with OpenGL. I started out with GLUT but decided to "graduate" to the SFML libraries. SFML actually provides even less GL utilities than GLUT, but is portable and provides some other functionalities. So it's really just me, GL and GLU. Yes, I'm a sucker for punishment.
I wanted to ask about strategies that people have for managing things such as matrix changes, colour changes, material changes etc.
Currently I am rendering from a single thread following a "Naked Objects" design philosophy. ie. Every graphical object has a Render() function which does the work of drawing itself. These objects may themselves be aggregates of further objects, or aggregates of graphical primitives. When a particular Render() is called it has no information about what transformations/ material changes have been called before it (a good thing, surely).
As things have developed I have settled on certain strategies such as making every function promise to push then pop the matrices if they perform any transformations. With other settings, I explicitly set anything that needs setting before calling glBegin() and take nothing for granted. Problems creep in when one render function makes some changes to less common state variables, and I am starting to consider using some RAII to enforce the reversal of all state changes made in a scope. Using OpenGL sometimes reminds me alot of assembly programming.
To keep this all manageable, and to help with debugging, I find that I am practically developing my own openGL wrapper, so I figured it would be good to hear about strategies that others have used, or thoughts and considerations on the subject. Or maybe it's just time to switch to something like a scene graph library?
Update : 13/5/11
Having now looked into rendering with vertex/normal/colour arrays and VBO's I have decided to consolidate all actual openGL communication into a separate module. The rendering process will consist of getting a load of GL independent spatial/material data out of my objects and then conveying all this information to openGL in an interpretable format. This means all raw array handling and state manipulation will be consolidated into one area. It's adds an extra indirection, and a little computation overhead to the rendering process, but it means that I can use a single VBO / array for all my data and then pass it all at once, once per frame to openGL.
So it's really just me, GL and GLU
I see nothing bad in that. I'd even get rid of GLU, if possible.
With other settings, I explicitly set
anything that needs setting before
calling glBegin() and take nothing for
granted.
Also this is a good strategy, but of course you should keep expensive state switches to a minimum. Instead of immediate mode (glBegin / glEnd) you should migrate to using vertex arrays and if available vertex buffer objects.
Problems creep in when one render
function makes some changes to less
common state variables, and I am
starting to consider using some RAII
to enforce the reversal of all state
changes made in a scope.
Older versions of OpenGL provide you the attribute stack with accessing functions glPushAttrib / glPopAttrib and glPushClientAttrib / glPopClientAttrib for the client states.
But yes, the huge state space of older OpenGL versions was one of the major reasons to slimming down OpenGL-3; what's been covered by a lot of fixed function pipeline states now is configured and accessed through shaders, where each shader encapsulates what would have been dozens of OpenGL state variable values.
Using OpenGL sometimes reminds me alot
of assembly programming.
This not a suprise at all, as the very first incarnation of OpenGL was designed assuming some abstract machine (the implementation) on which OpenGL calls are kind of the Operation Codes of that machine.
First, try not to use glBegin/glEnd calls for new development. They are deprecated in OpenGL 3 and simply don't work in OpenGL ES (iOS, WebOS, Android). Instead, use vertex arrays and VBOs to consolidate your drawing.
Second, instead of writing your own wrapper, take a look at some recent open source ones to see how they do things. For example, check out the Visualization Library (http://www.visualizationlibrary.com/jetcms/). It's a fairly thin wrapper around OpenGL, so it's worth a look.

In game programming are global variables bad?

I know my gut reaction to global variables is "badd!" but in the two game development courses I've taken at my college globals were used extensively, and now in the DirectX 9 game programming tutorial I am using (www.directxtutorial.com) I'm being told globals are okay in game programming ...? The site also recommends using only structs if you can when doing game programming to help keep things simple.
I'm really confused on this issue, and all the research I've been trying to do is very confusing. I realize there are issues when using global variables (threading issues, they make code harder to maintain, the state of them is hard to track etc) but also there is a cost associated with not using globals, I'd have to pass a loooot of information around very often which can be confusing and I imagine time-costing, although I guess pointers would speed the process up (this is my first time writing a game in C++.) Anyway, I realize there is probably no "right" or "wrong" answer here since both ways work, but I want my code to be as proper as I can so any input would be good, thank you very much!
The trouble with games and globals is that games (nowadays) are threaded at engine level. Game developers using an engine use the engine's abstractions rather than directly programming concurrency (IIRC). In many of the highlevel languages such as C++, threads sharing state is complex. When many concurrent processes share a common resource they have to make sure they don't tread on eachother's toes.
To get around this, you use concurrency control such as mutex and various locks. This in effect makes asynchronous critical sections of code access shared state in a synchronous manner for writing. The topic of concurrency control is too much to explain fully here.
Suffice to say, if threads run with global variables, it makes debugging very hard, as concurrency bugs are a nightmare (think, "which thread wrote that? Who holds that lock?").
There are exceptions in games programming API such as OpenGL and DX. If your shared data/globals are pointers to DX or OpenGL graphics contexts then typically this maps down to GPU operations which don't suffer so much from the same trouble.
Just be careful. Keeping objects representing 'player' or 'zombie' or whatever, and sharing them between threads can be tricky. Spawn 'player' threads and 'zombie group' threads instead and have a robust concurrency abstraction between them based on message passing rather than accessing those object's state across the thread/critical section boundary.
Saying all that, I do agree with the "Say no to globals" point made below.
For more on the complexities of threads and shared state see:
1 POSIX Threads API - I know it is POSIX, but provides a good idea that translates to other API
2 Wikipedia's excellent range of articles on concurrency control mechanisms
3 The Dining Philosopher's problem (and many others)
4 ThreadMentor tutorials and articles on threading
5 Another Intel article, but more of a marketing thing.
6 An ACM article on building multi-threaded game engines
Have worked on AAA game titles, I can tell you that globals should be eradicated immediately before they spread like a cancer. I've seen them corrupt an I/O subsystem so completely that it had to be wholly thrown out to be rewritten.
Say no to globals. Always.
In this respect, there's no difference between games and other programs. While arguably OK in small examples given in elementary courses, global variables are strongly discouraged in real programs.
So if you want to write correct, readable and maintainable code, stay away from global variables as much as possible.
All answers until now deal with the globals/threads issue, but I will add my 2 cents to the struct/class (understood as all public attributes/private attributes + methods) discussion. Preferring structs over classes on the grounds of that being simpler is in the same line of thought of preferring assembler over C++ on the grounds of that being a simpler language.
The fact that you have to think on how your entities are going to be used and provide methods for it makes the concrete entity a little more complex, but greatly simplifies the rest of the code and maintainability. The whole point of encapsulation is that it simplifies the program by providing clear ways in that your data can be modified while maintaining your objects invariants. You control the entry points and what can happen there. Having all attributes public imply that any part of the code can have a small innocent error (forgot to check condition X) and break your invariants completely (health below 0, but no 'death' processing being triggered)
The other common discussion is performance: If I just need to update a datum, then having to call a method will impact my performance. Not really. If methods are simple and you provide them in the header as inlines (inside the class body or outside with the inline keyword), the compiler will be able to copy those instructions to each use place. You get the guarantee that the compiler will not leave out any check by mistake, and no impact in performance.
Having read a bit more what you posted though:
I'm being told globals are okay in
game programming ...? The site also
recommends using only structs if you
can when doing game programming to
help keep things simple.
Games code is no different from other code really. Gratuitous use of globals is bad regardless. And as for 'only use structs', that is just a load of crap. Approach game development on the same principles as any other software - you may find places where you need to bend this but they should be the exception, typically when dealing with low-level hardware issues.
I would say that the advice about globals and 'keeping things simple' is probably a mixture of being easier to learn and old fashioned thinking about performance. When I was being taught about game programming I remember being told that C++ wasn't advised for games as it would be too slow but I've worked on multiple games using every facet of C++ which proves that isn't true.
I would add to everyone's answers here that globals are to be avoided where possible, I wouldn't be afraid to use whatever you need from C++ to make your code understandable and easy to use. If you come up against a performance problem then profile that specific issue and I'll bet that most of the time you won't need to remove use of a C++ feature but just think about your problem better. There may still be some platforms around that require pure C but I don't really have experience of them, even the Gameboy Advance seemed to deal with C++ quite nicely.
The metaissue here is that of state. How much state does any given function depend on, and how much does it change? Then consider how much state is implicit versus explicit, and cross that with the inspect vs. change.
If you have a function/method/whatever that is called DoStuff(), you have no idea from the outside what it depends on, what it needs, and what's going to happen to the shared state. If this is a class member, you also have no idea how that object's state is going to mutate. This is bad.
Contrast to something like cosf(2), this function is understood not to change any global state, and any state that it requires (lookup tables for example) are hidden from view and have no effect on your program-at-large. This is a function that computes a value based on what you give it and it returns that value. It changes no state.
Class member functions then have the opportunity to step up some problems. There's a huge difference between
myObject.hitpoints -= 4;
myObject.UpdateHealth();
and
myObject.TakeDamage(4);
In the first example, an external operation is changing some state that one of its member functions implicitly depends upon. The fact is that these two lines can be separated by many other lines of code begins to make it non-obvious what's going to happen in the UpdateHealth call, even if outside of the subtraction it is the same as the TakeDamage call. Encapsulating the state changes (in the second example) implies that the specifics of the state changes aren't important to the outside world, and hopefully they're not. In the first example, the state changes are explicitly important to the outside world, and this is really no different than setting some globals and calling a function that uses those globals. E.g. hopefully you'd never see
extern float value_to_sqrt;
value_to_sqrt = 2.0f;
sqrt(); // reads the global value_to_sqrt
extern float sqrt_value; // has the results of the sqrt.
And yet, how many people do exactly this sort of thing in other contexts? (Considering especially that class instance state is "global" in regards to that particular instance.)
So- prefer giving explicit instruction to your function calls, and prefer that they return the results directly rather than having to explicitly set state before calling a function and then checking other state after it returns.
The more state dependencies a bit of code has, the harder it'll be to make it multithread safe, but that has already been covered above. The point I want to make is that the problem isn't so much globals but more the visibility of the collection of state that is required for a bit of code to operate (and subsequently how much other code also depends on that state).
Most games aren't multi-threaded, although newer titles are going down that route, and so they've managed to get away with it so far.
Saying that globals are okay in games is like not bothering to fix the brakes on your car because you only drive at 10mph!
It's bad practice which ever way you look at it.
You only have to look at the number of bugs in games to see examples of this.
If in class A you need to access data D, instead of setting D global, you'd better put into A a reference to D.
Globals are NOT intrinsically bad. In C for instance they are part of the language's normal use... and since C++ builds on C they still have a place.
On an aesthetic level, it's better to avoid them where you can sensibly make them part of a class, but if all you do is wrap a bunch of globals into a singleton, you made things worse because at least with globals it's obvious what the point is.
Be careful, but for some things it makes less sense to force OO concepts on what is actually a global value.
Two specific issues that I've encountered in the past:
First: If you're attempting to separate e.g. render phase (const access to most game state) from logic phase (non-const access to most game state) for whatever reason (decoupling render rate from logic rate, synchronizing game state across a network, recording and playback of gameplay at a fixed point in the frame, etc), globals make it very hard to enforce that.
Problems tend to creep in and become hard to debug and eradicate. This also has implications for threaded renderers separate from game logic, or the like (the other answers cover this topic thoroughly).
Second: The presence of many globals tends to bloat the literal pool, which the compiler typically places after each function.
If you get to your state through either a single "struct GlobalTable" which holds globals or a collection of methods on an object or the like, your literal pool tends to be a lot smaller, decreasing the size of the .text section in your executable.
This is mostly a concern for instruction set architectures that can't embed load targets directly into instructions (see e.g. fixed-width ARM or Thumb version 1 instruction encoding on ARM processors). Even on modern processors I'd wager you'll get slightly smaller codegen.
It also hurts doubly when your instruction and data caches are separate (again, as on some ARM processors); you'll tend to get two cachelines loaded where you might only need one with a unified cache. Since the literal pool may count as data, and won't always start on a cacheline boundary, the same cacheline might have to be loaded both into the i-cache and d-cache.
This may count as a "microoptimization", but it's a transform that's relatively easily applied to an entire codebase (e.g. extern struct GlobalTable { /* ... */ } the; and then replace all g_foo with the.foo)... And we've seen code size decrease between 5 and 10 percent in some cases, with a corresponding boost to performance due to keeping the caches cleaner.
I'm going to take a risk and say it depends to me on the scope/scale of your project. Because if you are trying to program an epic game with a boatload of code, then globals could, and easily in the most painful way in hindsight, cost you way more time than they save.
But if you are trying to code, say, something as simple as Super Mario or Metroid or Contra or even simpler, like Pac-Man, then these games were originally coded in 6502 assembly and used globals for almost everything. Just about all the game data and state was stored in data segments, and that didn't stop the devs from shipping a very competent and popular product in spite of working with absolutely inferior tools and engineering standards which would probably horrify people today.
So if you are just writing this kind of small and simple game which has a very limited scope and isn't designed to grow and expand far beyond its original design, isn't designed to be maintained for years and years, with a few thousand lines of simple C++ code, then I don't see the big deal of using a global here or a singleton there. Someone obsessed with trying to engineer Super Mario with the soundest engineering techniques with SOLID and a DI framework could end up taking far, far longer to ship than even the devs who wrote it in 6502 asm.
And I'm getting old and there's something to it there when I look at these old simple games and how they were coded, and it almost seems like the devs were doing something right in spite of the hard-coded magic numbers and globals all over the place while I spend my career fumbling around and trying to figure out the best way to engineer things. That said this is probably a very unpopular opinion, and not one I would have liked either a decade or two ago, but there's something to it. I don't look at the 6502 asm of Metroid and think, "these devs underengineered their product and their lives would have been so much easier if they did this or that." Seems like they did things just about right.
But again this is for small-scale stuff, maybe in the indie category of games by today's standards, and in the smaller of the indie games among them, and far from doing anything ground-breaking in terms of how much data it can process or using cutting-edge hardware techniques. If in doubt, I'd definitely suggest to err on the side of avoiding globals. It's also a little bit trickier in C++ as opposed to say, C, since you can have objects with constructor and destructors, and initialization and destruction order isn't well-defined and easily predictable for global objects. There I'd say to lean even more on the side of avoiding globals since they can trip you up in whole new ways when you aren't explicitly initializing and destroying them yourself in a predictable order. And naturally if you want to multithread a lot beyond a critical loop here and there, then your ability to reason about thread-safety of any particular code will be severely diminished if you cannot minimize the scope/visibility of your game state to the minimum of places.

How can I decrease complexity in library without increasing complexity elsewhere?

I am tasked to maintain and update a library which allows a computer to send commands at a hardware device and then receive its response. Currently the code is setup in such a way that every single possible command the device can receive is sent via its own function. Code repetition is everywhere; a DRY advocate's worst nightmare.
Obviously there is much opportunity for improvement. The problem is each command has a different payload. Currently the data that is to be the payload is passed to each command function in the form of arguments. It's difficult to consolidate functionality without pushing the complexity to a level that calls the library.
When a response is received from the device its data is put into an object of a class solely responsible for holding this data, they do nothing else. There are hundreds of classes which do this. These objects are then used to access the returned data by the app layer.
My objectives:
Throughly reduce code repetition
Maintain similiar level of complexity at application layer
Make it easier to add new commands
My idea:
Have one function to send a command and one to receive (the receiving function is automatically called when a response from the device is detected). Have a struct holding all command/response data which will be passed to sending function and returned by receiving function. Since each command has a corresponding enum value, have a switch statement which sets up any command specific data for sending.
Is my idea the best way to do it? Is there a design pattern I could use here? I've looked and looked but nothing seems to fit my needs.
Thanks in advance! (Please let me know if clarification is necessary)
This reminds me of the REST vs. SOA debate, albeit on a smaller physical scale.
If I understand you correctly, right now you have calls like
device->DoThing();
device->DoOtherThing();
and then sometimes I get a callback like
callback->DoneThing(ThingResult&);
callback->DoneOtherTHing(OtherThingResult&)
I suggest that the user is the key component here. Do the current library users like the interface at the level it is designed? Is the interface consistent, even if it is large?
You seem to want to propose
device->Do(ThingAndOtherThingParameters&)
callback->Done(ThingAndOtherThingResult&)
so to have a single entry point with more complex data.
The downside from a library user perspective may that now I have to use a manual switch() or other type statement to tell what really happened. While the dispatching to the appropriate result callback used to be done for me, now you have made it a burden upon the library user.
Unless this bought me as a user some level of flexibility, that I as as user wanted I would consider this a step backwards.
For your part as an implementor, one suggestion would be to go to the generic form internally, and then offer both interfaces externally. Perhaps the old specific interface could even be auto-generated somehow.
Good Luck.
Well, your question implies that there is a balance between the library's complexity and the client's. When those are the only two choices, one almost always goes with making the client's life easier. However, those are rarely really the only two choices.
Now in the text you talk about a command processing architecture where each command has a different set of data associated with it. In the olden days, this would typically be implemented with a big honking case statement in a loop, where each case called a different routine with different parameters and perhaps some setup code. Grisly. McCabe complexity analysers hate this.
These days what you can do with an OO language is use dynamic dispatch. Create a base abstract "command" class with a standard "handle()" method, and have each different command inherit from it to add their own members (to represent the different "arguments" to the different commands). Then you create a big honking array of these at startup, usually indexed by the command ID. For languages like C++ or Ada it has to be an array of pointers to "command" objects, for the dynamic dispatch to work. Then you can just call the appropriate command object for the command ID you read from the client. The big honking case statement is now handled implicitly by the dynamic dispatch.
Where you can get the big savings in this scenario is in subclassing. Do you have several commands that use the exact same parameters? Make a subclass for them, and then derive all of those commands from that subclass. Do you have several commands that have to perform the same operation on one of the parameters? Make a subclass for them with that one method implemented for that operation, and then derive all those commands from that subclass.
Your first objective should be to produce a library that decouples higher software layers from the hardware. Users of your library shouldn't care that you have a hardware device that can execute a number of functions with a different payload. They should only care what the device does in a higher level. In this sense, it is in my opinion a good thing that every command is mapped to each one function.
My plan will be:
Identify the objects the higher data layers need to get the job done. Model the objects in C++ classes from their perspective, not from the perspective of the hardware
Define the interface of the library using the above objects
Start the implementation of the library. Perhaps an intermediate layer that maps software objects to hardware objects is necessary
There are many things you can do to reduce code repetition. You can use polymorphism. Define a class with the base functionality and extend it. You can also use utility classes, that implement functions needed for many commands.

Lua, game state and game loop

Call main.lua script at each game loop iteration - is it good or bad design? How does it affect on the performance (relatively)?
Maintain game state from a. C++ host-program or b. from Lua scripts or c. from both and synchronise them?
(Previous question on the topic: Lua and C++: separation of duties)
(I vote for every answer. The best answer will be accepted.)
My basic rule for lua is - or any script language in a game -
Anything that happens on every frame: c++
asynchronous events - user input - lua
synchronous game engine events - lua
Basically, any code thats called at >33-100Hz (depending on frame rate) is C++
I try to invoke the script engine <10Hz.
Based on any kind of actual metric? not really. but it does put a dividing line in the design, with c++ and lua tasks clearly delineated - without the up front delineation the per frame lua tasks will grow until they are bogging processing per frame - and then theres no clear guideline on what to prune.
The best thing about lua is that it has a lightweight VM, and after the chunks get precompiled running them in the VM is actually quite fast, but still not as fast as a C++ code would be, and I don't think calling lua every rendered frame would be a good idea.
I'd put the game state in C++, and add functions in lua that can reach, and modify the state. An event based approach is almost better, where event registering should be done in lua (preferably only at the start of the game or at specific game events, but no more than a few times per minute), but the actual events should be fired by C++ code. User inputs are events too, and they don't usually happen every frame (except for maybe MouseMove but which should be used carefully because of this). The way you handle user input events (whether you handle everything (like which key was pressed, etc) in lua, or whether there are for example separate events for each keys on the keyboard (in an extreme case) depends on the game you're trying to make (a turn based game might have only one event handler for all events, an RTS should have more events, and an FPS should be dealt with care (mainly because moving the mouse will happen every frame)). Generally the more separate kinds of events you have, the less you have to code in lua (which will increase performance), but the more difficult it gets if a "real event" you need to handle is actually triggered by more separate "programming level events" (which might actually decrease performance, because the lua code needs to be more complex).
Alternatively if performance is really important you can actually improve the lua VM by adding new opcodes to it (I've seen some of the companies to do this, but mainly to make decompilation of the compiled lua chunks more harder), which is actually not a hard thing to do. If you have something that the lua code needs to do a lot of times (like event registering, event running, or changing the state of the game) you might want to implement them in the lua VM, so instead of multiple getglobal and setglobal opcodes they would only take one or two (for example you could make a SETSTATE opcode with a 0-255 and a 0-65535 parameter, where the first parameter descibes which state to modify, and the second desribes the new value of the state. Of course this only works if you have a maximum of 255 events, with a maximum of 2^16 values, but it might be enough in some cases. And the fact that this only takes one opcode means that the code will run faster). This would also make decompilation more harder if you intend to obscure your lua code (although not much to someone who knows the inner workings of lua). Running a few opcodes per frame (around 30-40 tops) won't hit your performance that badly. But 30-40 opcodes in the lua VM won't get you far if you need to do really complex things (a simple if-then-else can take up to 10-20 or more opcodes depending on the expression).
I don't like C++. But I do like games.
My approach might be a bit atypical: I do everything I can in Lua, and only the absolute minimum in C++. The game loop, the entities, etc are all done in Lua. I even have a QuadTree implementation done in Lua. C++ handles graphical and filesystem stuff, as well as interfacing with external libraries.
This is not a machine-based decision, but a programmer-based one; I output code much faster in Lua than In C++. So I spend my programmer cycles on new features rather than on saving computer cycles. My target machines (any laptop from the last 3 years) are able to cope with this amount of Lua very easily.
Lua is surprisingly low-footprint (take a look to luaJIT if you don't know it).
This said, if I ever find a bottleneck (I haven't yet) I'll profile the game in order to find the slow part, and I'll translate that part to C++ ... only if I can't find a way around it using Lua.
I am using Lua for the first time in a game I've been working on. The C++ side of my application actually holds pointers to instances of each game state. Some of the game states are implemented in C++ and some are implemented in Lua (such as the "game play" state).
The update and main application loop live on the C++ side of things. I have exposed functions that allow the Lua VM to add new game states to the application at runtime.
I have not yet had any problems with slowness, even running on hardware with limited resources (Atom processor with integrated video). Lua functions are called every frame. The most expensive (in terms of time) operation in my application is rendering.
The ability to create new states completely in Lua was one of the best decisions I made on the project, since it allows me to freely add portions of the game without recompiling the whole thing.
Edit: I'm using Luabind, which I have read performs slower in general than other binding frameworks and of course the Lua C API.
You probably don't want to execute the entire Lua script on every frame iteration, because any sufficiently complex game will have multiple game objects with their own behaviors. In other words, the advantages of Lua are lost unless you have multiple tiny scripts that handle a specific part of the behavior of a larger game. You can use the lua_call function to call any appropriate lua routine in your script, not just the entire file.
There's no ideal answer here, but the vast majority of your game state is traditionally stored in the game engine (i.e. C++). You reveal to Lua just enough for Lua to do the decision making that you've assigned to Lua.
You need to consider which language is appropriate for which behaviors. Lua is useful for high level controls and decisions, and C++ is useful for performance oriented code. Lua is particularly useful for the parts of your game that you need to tweak without recompiling. All magic constants and variables could go into Lua, for example. Don't try to shoehorn Lua where it does not belong, i.e. graphics or audio rendering.
IMHO Lua scripts are for specific behaviours, it's definitely going to hurt performance if you are calling a Lua script 60 times per second.
Lua scripts are often to separate stuff like Behaviour, and specific events from your Game Engine logic (GUI, Items, Dialogs, game engine events, etc...). A good usage of Lua for example would be when triggering an explosion (particle FX), if the Game Character walks somewhere, hard-coding the output of that event in your engine would be a very ugly choice. Though, making the engine trigger the correct script would be a better choice, decoupling that specific behavior off your engine.
I would recommend, to try to keep your Game State in one part, instead of upscaling the level of complexity of keeping states synchronized in two places (Lua and Engine), add threading to that, and you will end up having a very ugly mess. Keep it simple. (In my Designs I mostly keep Game State in C++)
Good luck with your Game!
I'd like to throw in my two cents since I strongly believe that there's some incorrect advice being given here. For context, I am using Lua in a large game that involves both intensive 3D rendering as well as intensive game logic simulation. I've become more familiar than I'd have liked to with Lua and performance...
Note that I'm going to talk specifically about LuaJIT, because you're going to want to use LuaJIT. It's plug-and-play, really, so if you can embed Lua you can embed LuaJIT. You'll want it, if not for the extra speed, then for the automagic foreign function interface module (require 'ffi') that will allow you to call your native code directly from Lua without ever having to touch the Lua C API (95%+ of cases).
It's perfectly fine to call Lua at 60hz (I call it at 90hz in VR..). The catch is that you are going to have to be careful to do it correctly. As someone else mentioned, it's critical that you load the script only once. You can then use the C API to get access to functions you defined in that script, or to run the script itself as a function. I recommend the former: for a relatively simple game, you can get by with defining functions like onUpdate (dt), onRender (), onKeyPressed (key), onMouseMoved (dx, dy), etc. You can call these at the appropriate time from your main loop in C++. Alternatively, you can actually have your entire main loop be in Lua, and instead invoke your C++ code for performance-critical routines (I do that). This is especially easy to do with the LuaJIT FFI.
This is the really hard question. It will depend on your needs. Can you quite easily hammer down the game state? Great, put it C++-side and access from LuaJIT FFI. Not sure what all will be in the game state / like to be able to prototype quickly? Nothing wrong with keeping it in Lua. That is, until you start talking about a complex game with 1000s of objects, each containing non-trivial state. In this case, hybrid is the way to go, but figuring out exactly how to split state between C++ and Lua, and how to martial said state between the two (especially in perf-critical routines) is something of an art. Let me know if you come up with a bulletproof technique :) As with everything else, the general rule of thumb is: data that goes through performance-critical pathways needs to be on the native side. For example, if your entities have positions and velocities that you update each frame, and you have thousands of said entities, you need to do this in C++. However, you can get away with layering an 'inventory' on top of these entities using Lua (an inventory doesn't need a per-frame update).
Now, a couple more notes I'd like to throw out both as general FYIs and in response to some of the other answers.
Event-based approaches are, in general, critical to the performance of any game, but that goes doubly for systems written in Lua. I said it's perfectly fine to call Lua # 60hz. But it's not perfectly fine to be running tight loops over lots of game objects each frame in said Lua. You might get away with wastefully calling update() on everything in the universe in C++ (though you shouldn't), but doing so in Lua will start eating up those precious milliseconds far too quickly. Instead, as others have mentioned, you need to be thinking of Lua logic as 'reactive' -- usually, this means handling an event. For example, don't check that one entity is in range of another each frame in Lua (I mean, this is fine for one entity, but in general when you're scaling up your game you need to not think like this). Instead, tell your C++ engine to notify you when the two entities get within a certain distance of one another. In this way, Lua becomes the high-level controller of game logic, dispatching high-level commands and responding to high-level events, not carrying out the low-level math grind of trivial game logic.
Be wary of the advice that 'mixed code' is slow. The Lua C API is lightweight and fast. Wrapping functions for use with Lua is, at worst, quite easy (and if you take a while to understand how Lua interfaces with C w.r.t. the virtual stack, etc, you will notice that it has been designed specifically to minimize call overhead), and at best, trivial (no wrapping required) and 100% as performant as native calls (thanks, LuaJIT FFI!) In most cases I find that a careful mixture of Lua and C (via the LJ FFI) is superior to pure Lua. Even vector operations, which I believe the LuaJIT manuals mention somewhere as an example where code should be kept to Lua, I have found to be faster when I execute via Lua->C calls. Ultimately, don't trust anyone or anything but your own (careful) performance measurements when it comes to pieces of code that are performance-sensitive.
Most small-ish games could get away just fine with game state, the core loop, input handling, etc. done entirely in Lua and running under LuaJIT. If you're building a small-ish game, consider the "move to C++ as needed" approach (lazy compilation!) Write in Lua, when you identify a clear bottleneck (and have measured it to make sure it's the culprit), move that bit to C++ and be on your way. If you're writing a big game with 1000s of complex objects with complex logic, advanced rendering, etc. then you'll benefit from more up-front designing.
Best of luck :)
About the performance of 1: if main.lua does not change, load it once with lua_loadfile or loadfile, save a reference to the returned function, and then call it when needed.
Most of the performance will be lost through the binding between Lua and C++. A function call will actually need to be wrapped, and re-wrapped, and like that a couple of time usually. Pure Lua or pure C++ code is usually faster than mixed code (for small operations).
Having said that, I personally didn't see any strong performance hit running a Lua script every frame.
Usually scripting is good at high level. Lua has been used in famous games for the Bots (Quake 3) and for the User Interface (World of Warcraft). Used at high level Lua micro-threads come handy: The coroutines can save a lot (compared to real threads). For example to run some Lua code only once in a while.