Pointer vs Variable speed in C++


At a job interview I was asked the question "In C++, how do you access a variable faster: through the normal variable identifier or through a pointer?". I must say I did not have a good technical answer to the question, so I took a wild guess.
I said that the access time will probably be the same, as a normal variable/identifier is a pointer to the memory address where the value is stored, just like a pointer. In other words, in terms of speed they both have the same performance, and pointers are only different because we can specify the memory address we want them to point to.
The interviewer did not seem very convinced/satisfied with my answer (although he did not say anything, just carried on asking something else), so I thought I'd come and ask SO'ers whether my answer was accurate, and if not, why (from a theory and technical POV).

When you access a "variable", you look up the address, and then fetch the value.
Remember - a pointer IS a variable. So actually, you:
a) look up the address (of the pointer variable),
b) fetch the value (the address stored at that variable)
... and then ...
c) fetch the value at the address pointed to.
So yes, accessing via "pointer" (rather than directly) DOES involve (a bit of) extra work and (slightly) longer time.
Exactly the same thing occurs whether it's a pointer variable (C or C++) or a reference variable (C++ only).
But the difference is enormously small.
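The three steps can be spelled out in C++ as a rough sketch. The globals `target`, `target_ptr`, `target_ref` and the helper functions are hypothetical, and an optimizing compiler will usually collapse all of this into very few instructions, so it only mirrors the unoptimized picture:

```cpp
// Hypothetical globals used only to illustrate direct vs. indirect access.
int target = 42;            // the value we ultimately want to read
int* target_ptr = &target;  // a pointer is itself a variable whose value is an address
int& target_ref = target;   // when not optimized away, a reference is typically stored as an address too

// Direct access: the compiler knows where `target` lives and loads it.
int read_direct() { return target; }

// Indirect access: first load the address held in `target_ptr` (step b),
// then load the int stored at that address (step c).
int read_through_pointer() {
    int* address = target_ptr;
    return *address;
}

// Reading through a reference follows the same pattern as the pointer.
int read_through_reference() { return target_ref; }
```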

A variable does not have to live in main memory. Depending on the circumstances, the compiler can store it in a register for all or part of its life, and accessing a register is much faster than accessing RAM.

Let's ignore optimization for a moment, and just think about what the abstract machine has to do to reference a local variable vs. a variable through a (local) pointer. If we have local variables declared as:
int i;
int *p;
when we reference the value of i, the unoptimized code has to go get the value that is (say) at 12 past the current stack pointer and load it into a register so we can work with it. Whereas when we reference *p, the same unoptimized code has to go get the value of p from 16 past the current stack pointer, load it into a register, and then go get the value that the register points to and load it into another register so we can work with it as before. The first part of the work is the same, but the pointer access conceptually involves an additional step that needs to be done before we can work with the value.
That was, I think, the point of the interview question - to see if you understood the fundamental difference between the two types of access. You were thinking that the local variable access involved a kind of lookup, and it does - but the pointer access involves that very same type of lookup to get to the value of the pointer before we can start to go after the thing it is pointing to. In simple, unoptimized terms, the pointer access is going to be slower because of that extra step.
Now with optimization, it may happen that the two times are very close or identical. It is true that if other recent code has already used the value of p to reference another value, you may already find p in a register, so that the lookup of *p via p takes the same time as the lookup of i via the stack pointer. By the same token, though, if you have recently used the value of i, you may already find it in a register. And while the same might be true of the value of *p, the optimizer can only reuse its value from the register if it is sure that p hasn't changed in the meantime (and that nothing has written to *p through some other pointer). It has no such problem reusing the value of i. In short, while accessing both values may take the same time under optimization, accessing the local variable will almost never be slower (except in really pathological cases), and may very well be faster. That makes it the correct answer to the interviewer's question.
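A minimal sketch of the reuse problem described above (function names are hypothetical): after a store through `q`, the compiler must re-read `*p`, because `p` and `q` may point to the same int, whereas a local variable whose address is never taken cannot be changed by that store.

```cpp
// Reads *p, stores through q, then reads *p again.
// Because p and q may alias, the second read of *p cannot reuse the first.
int read_store_read(int* p, int* q) {
    int first = *p;
    *q = 100;         // may overwrite *p if p == q
    int second = *p;  // must be reloaded from memory
    return first + second;
}

// A local whose address is never taken cannot be reached through q,
// so its value can stay in a register across the store.
int local_store_read(int* q) {
    int i = 1;
    int first = i;
    *q = 100;         // cannot change i
    int second = i;
    return first + second;
}
```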
In the presence of memory hierarchies, the difference in time may get even more pronounced. Local variables are going to be located near each other on the stack, which means that you are very likely to find the address you need already in main memory and in the cache the first time you access it (unless it is the very first local variable you access in this routine). There is no such guarantee with the address the pointer points to. Unless it was recently accessed, you may need to wait for a cache miss, or even a page fault, to access the pointed-to address, which could make it slower by orders of magnitude vs. the local variable. No, that won't happen all the time - but it's a potential factor that could make a difference in some cases, and that too is something that could be brought up by a candidate in response to such a question.
Now what about the question other commenters have raised: how much does it matter? It's true, for a single access, the difference is going to be tiny in absolute terms, like a grain of sand. But you put enough grains of sand together and you get a beach. And though (to continue the metaphor) if you are looking for someone who can run quickly down a beach road, you don't want someone who will obsess about sweeping every grain of sand off the road before he or she can start running, you do want someone who will be aware when he or she is running through knee-deep dunes unnecessarily. Profilers won't always rescue you here - in these metaphorical terms, they are much better at recognizing a single big rock that you need to run around than noticing lots of little grains of sand that are bogging you down. So I would want people on my team who understand these issues at a fundamental level, even if they rarely go out of their way to use that knowledge. Don't stop writing clear code in the quest for microoptimization, but be aware of the kinds of things that can cost performance, especially when designing your data structures, and have a sense of whether you are getting good value for the price you are paying. That's why I think this was a reasonable interview question, to explore the candidate's understanding of these issues.

What paulsm4 and LaC said + a little asm:
int y = 0;
mov dword ptr [y],0
y = x;
mov eax,dword ptr [x] ; Fetch x to register
mov dword ptr [y],eax ; Store it to y
y = *px;
mov eax,dword ptr [px] ; Fetch address of x
mov ecx,dword ptr [eax] ; Fetch x
mov dword ptr [y],ecx ; Store it to y
Not that it matters much, but this is probably also harder to optimize (e.g. you can't keep the value in a CPU register, since the pointer just points to some place in memory). So optimized code for y = x; could look like this:
mov dword ptr [y],ebx ; assuming the local variable x was kept in ebx
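To reproduce listings like these, one option is to compile a couple of tiny functions and read the generated assembly, for example with `g++ -O0 -S` and again with `-O2`, or with MSVC's `/FA` listing option. The function names are hypothetical, and the exact instructions will vary by compiler, target and optimization level.

```cpp
// y = x; with x a plain variable (here a parameter).
int copy_direct(int x) {
    int y = 0;
    y = x;
    return y;
}

// y = *px; the address must be loaded first, then the value it points to.
int copy_through_pointer(const int* px) {
    int y = 0;
    y = *px;
    return y;
}
```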

I think the interviewer was looking for you to mention the word register. As in, if you declare a variable as a register variable, the compiler may try to keep it in a CPU register (the keyword was only ever a hint; it was deprecated in C++11 and removed in C++17).
A bit of chat around bus access and negotiation for other types of variables and pointers alike would have helped to frame it.

paulsm4 and LaC have already explained it nicely, along with other members. I want to emphasize the effect of paging when the pointer points to something on the heap that has been paged out.
Local variables are available either on the stack or in a register, whereas a pointer may point to an address that is not in the cache, and if that page has been paged out, the resulting page fault will certainly slow things down.

A variable holds a value of a certain type, and accessing the variable means getting this value, from memory or from a register. When getting the value from memory we need to get its address from somewhere; most of the time it has to be loaded into a register (sometimes it can be part of the load instruction itself, but this is quite rare).
A pointer keeps an address of a value; this value has to be in memory, the pointer itself can be in memory or in a register.
I would expect that on average access via a pointer will be slower than accessing the value through a variable.

Your analysis ignores the common scenario in which the pointer itself is a memory variable which must also be accessed.
There are many factors that affect the performance of software, but if you make certain simplifying assumptions about the variables involved (notably that they are not cached in any way), then each level of pointer indirection requires an additional memory access.
int a = 1234; // luggage combination
int *b = &a;
int **c = &b;
...
int e1 = a; // one memory access
int e2 = *b; // two memory accesses
int e3 = **c; // three memory accesses
So the short answer to "which is faster" is: ignoring compiler and processor optimizations which might be occurring, it is faster to access the variable directly.
In a best-case scenario, where this code is executed repeatedly in a tight loop, the pointer value would likely be cached into a CPU register or at worst into the processor's L1 cache. In such a case, it is likely that a first-level pointer indirection is as fast or faster than accessing the variable directly since "directly" probably means via the "stack pointer" register (plus some offset). In both cases you are using a CPU register as a pointer to the value.
There are other scenarios that could affect this analysis, such as for global or static data where the variable's address is hard-coded into the instruction stream. In such a scenario, the answer may depend on the specifics of the processor involved.

I think the key part of the question is "access a variable". To me, if a variable is in scope, why would you create a pointer to it (or a reference) to access it? Using a pointer or a reference would only make sense if the variable was itself a data structure of some sort or if you were accessing it in some non-standard way (like interpreting an int as a float).
Using a pointer or a reference would be faster only in very specific circumstances. Under general circumstances, it seems to me that you would be trying to second guess the compiler as far as optimization is concerned and my experience tells me that unless you know what you're doing, that's a bad idea.
It would even depend on the keywords used. A const keyword might very well mean that the variable is totally optimized out at compile time. That is faster than a pointer. The register keyword does not guarantee that the variable is stored in a register. So how do you know whether it's faster or not? I think the answer is that it depends, because there is no one-size-fits-all answer.

I think a better answer might be: it depends on where the pointer is 'pointing to'. Note that a variable might already be in the cache, whereas a pointer might incur a fetch penalty. It's similar to the linked list vs. vector performance tradeoff. A vector is cache-friendly because all of its memory is contiguous. A linked list, however, since it is built from pointers, might incur cache penalties because its nodes are potentially scattered all over memory.
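A sketch of the two traversal patterns (the `Node` type and function names are hypothetical): in the list, the address of each element is only known after loading the previous node, while the vector's addresses follow a simple stride that hardware prefetchers typically handle well.

```cpp
#include <vector>

// A hypothetical singly linked list node.
struct Node {
    int value;
    Node* next;
};

long long sum_list(const Node* head) {
    long long total = 0;
    // n = n->next: the next address is only known after this node is loaded.
    for (const Node* n = head; n != nullptr; n = n->next) {
        total += n->value;
    }
    return total;
}

// Elements are contiguous: the next address is the current one plus sizeof(int).
long long sum_vector(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) total += x;
    return total;
}
```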

Related

Is passing a char or short by reference or pointer slower than passing by value?

I've always heard it said that passing by reference instead of value (copying) is more efficient for types larger than int or long long because it avoids copying. When an argument is passed by value the value is pushed onto the stack where the operations are done in the function, but the way I see it is that when you pass a reference or pointer you're passing the address of the variable. If that's the case, is it true that the CPU has to fetch that value from that address, which isn't local like stack variables, resulting in fetching data that might not be in cache? Does that mean that passing something like char or short is slower if done by reference or pointer? I've heard it said that for these types it doesn't make a difference, but if it doesn't, could you explain where my reasoning is wrong?

First of all, in nearly all cases 'faster' and 'slower' cannot be determined by some reasoning. It just requires measuring time for different situations.
Having said that, let's cut to the chase. Here are a few important points:
no matter what you do, the compiler is free to change anything as long as it does not change the observable behaviour of the program. So it might be unimportant from this perspective, since the compiler can interchange (in both directions) pass by const T& and pass by T as long as it knows it makes no semantic difference
even if the compiler does not change anything, you are unlikely to observe any effect, since nearly all modern processors handle a single level of indirection quite well (it results in a single assembler instruction). Of course not every instruction is equally fast, but this gives you an idea of how little this changes
(I think) there is a slightly bigger possibility of a cache miss when passing by reference, which theoretically can incur bigger performance penalties, but I think memory prefetching on current processors is sophisticated enough to deal with this problem in most cases
even if your processor has visible speed differences for these operations, they will still be less important than algorithmic complexity, making this really a theoretical question, not a practical one

Yes, it is, but it doesn't cause a big performance issue. Fetching a value from memory through a reference is slower than simply copying it. And for types like int and long long, you should not use references.

According to an older post on StackOverflow, green-lantern said:
Overhead with passing by reference:
each access needs a dereference, i.e., there is one more memory read
Overhead with passing by value: the value needs to be copied on the stack or into registers
For small objects, such as an integer, passing by value will be faster. For bigger objects (for example a large structure), the copying would create too much overhead so passing by reference will be faster.
Source: Pass by value faster than pass by reference
Let's make a quick glance at the sizes for the different types:
char - 1 byte
char pointer - 8 bytes (on a typical 64-bit platform)
short - 2 bytes (typically)
short pointer - 8 bytes (on a typical 64-bit platform)
As a rule of thumb, passing by reference or pointer is typically faster than passing by value, if the amount of data passed by value is larger than the size of a pointer.
In this case, passing by value is faster than passing by reference for the types char and short.
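A small compile-time sketch of that rule of thumb (the helper name `fits_in_pointer` is hypothetical). Only `sizeof(char) == 1` is guaranteed by the standard; the other sizes depend on the platform.

```cpp
// Guaranteed by the C++ standard on every platform.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

// The rule of thumb as a check: passing by value is expected to be no more
// expensive than passing a pointer when the value is no larger than a pointer.
template <typename T>
constexpr bool fits_in_pointer() {
    return sizeof(T) <= sizeof(T*);
}
```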

Here's an analogy:
Pass by value: Here is the thing.
Pass by reference: You go to this address and you'll find the thing.
Which is faster*? Pass by value.
Does it matter? Depends on your use case, try measuring it.
*Totally depends on CPU architecture, cache size, and other things.
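Taking the "try measuring it" advice literally, here is a minimal timing harness sketch. The helper names are hypothetical, and `std::chrono::steady_clock` is used because it is monotonic. With optimizations on, the compiler will likely inline both helpers so the two loops end up identical, which would itself answer "does it matter?" for this case.

```cpp
#include <chrono>
#include <vector>

// Hypothetical helpers doing identical work with different parameter passing.
int add_by_value(int acc, short x) { return acc + x; }
int add_by_reference(int acc, const short& x) { return acc + x; }

// Times a callable in milliseconds using a monotonic clock.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Same loop, differing only in how each element is passed to the helper.
int sum_by_value(const std::vector<short>& data) {
    int acc = 0;
    for (const short& x : data) acc = add_by_value(acc, x);
    return acc;
}

int sum_by_reference(const std::vector<short>& data) {
    int acc = 0;
    for (const short& x : data) acc = add_by_reference(acc, x);
    return acc;
}
```

To use it, time each variant several times, e.g. `time_ms([&] { result = sum_by_value(data); })`, and keep `result` observable (print it) so the work is not discarded as dead code.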

C++ how are variables accessed in memory?

When I create a new variable in a C++ program, eg a char:
char c = 'a';
how does C++ then have access to this variable in memory? I would imagine that it would need to store the memory location of the variable, but then that would require a pointer variable, and this pointer would again need to be accessed.

See the docs:
When a variable is declared, the memory needed to store its value is assigned a specific location in memory (its memory address). Generally, C++ programs do not actively decide the exact memory addresses where its variables are stored. Fortunately, that task is left to the environment where the program is run - generally, an operating system that decides the particular memory locations on runtime. However, it may be useful for a program to be able to obtain the address of a variable during runtime in order to access data cells that are at a certain position relative to it.
You can also refer to this article on Variables and Memory:
The Stack
The stack is where local variables and function parameters reside. It is called a stack because it follows the last-in, first-out principle. As data is added or pushed to the stack, it grows, and when data is removed or popped it shrinks. In reality, memory addresses are not physically moved around every time data is pushed or popped from the stack, instead the stack pointer, which as the name implies points to the memory address at the top of the stack, moves up and down. Everything below this address is considered to be on the stack and usable, whereas everything above it is off the stack, and invalid. This is all accomplished automatically by the operating system, and as a result it is sometimes also called automatic memory. On the extremely rare occasions that one needs to be able to explicitly invoke this type of memory, the C++ key word auto can be used. [This use of auto applies only to C++98/03; since C++11, auto instead requests type deduction.]
Normally, one declares variables on the stack like this:
void func () {
int i; float x[100];
...
}
Variables that are declared on the stack are only valid within the scope of their declaration. That means when the function func() listed above returns, i and x will no longer be accessible or valid.
There is another limitation to variables that are placed on the stack: the operating system only allocates a certain amount of space to the stack. As each part of a program that is being executed comes into scope, the operating system allocates the appropriate amount of memory that is required to hold all the local variables on the stack. If this is greater than the amount of memory that the OS has allowed for the total size of the stack, then the program will crash. While the maximum size of the stack can sometimes be changed by compile time parameters, it is usually fairly small, and nowhere near the total amount of RAM available on a machine.

Assuming this is a local variable, it is allocated on the stack, i.e. in RAM. The compiler keeps track of the variable's offset on the stack. In the basic scenario, when any computation is performed with the variable, it is moved into one of the processor's registers and the CPU performs the computation. Afterwards the result is written back to RAM. Some processors can keep whole stack frames in registers (register windows), and modern processors have multiple levels of cache between the registers and RAM, so it can get quite complex.
Please note that the name "c" no longer appears in the binary (unless you have debugging symbols). The binary only works with memory locations. E.g. it would look like this (simple addition):
a = b + c
take the value at memory offset 1 and put it in register 1
take the value at memory offset 2 and put it in register 2
sum registers 1 and 2 and store the result in register 3
copy register 3 to memory location 3
The binary doesn't know "a", "b" or "c". The compiler just decided "b is in memory 1, c is in memory 2, a is in memory 3". And the CPU just blindly executes the instructions the compiler has generated.

C++ itself (or, the compiler) would have access to this variable in terms of the program structure, represented as a data structure. Perhaps you're asking how other parts in the program would have access to it at run time.
The answer is that it varies. It can be stored either in a register, on the stack, on the heap, or in the data/bss sections (global/static variables), depending on its context and the platform it was compiled for: If you needed to pass it around by reference (or pointer) to other functions, then it would likely be stored on the stack. If you only need it in the context of your function, it would probably be handled in a register. If it's a member variable of an object on the heap, then it's on the heap, and you reference it by an offset into the object. If it's a global/static variable, then its address is determined once the program is fully loaded into memory.
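A small sketch of the cases listed above. All names are hypothetical, and whether `local` actually lands in a register or in the stack frame is the compiler's decision.

```cpp
#include <memory>

int global_counter = 0;             // static storage: data/bss section

struct Widget { int member = 0; };  // hypothetical type for illustration

int storage_demo() {
    static int calls = 0;           // also static storage: lives for the whole program
    int local = 1;                  // automatic storage: typically a register or the stack frame
    auto widget = std::make_unique<Widget>();  // the Widget object itself lives on the heap
    widget->member = 2;             // accessed as (object address + member offset)
    ++calls;
    ++global_counter;
    return local + widget->member + calls + global_counter;
}
```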
C++ eventually compiles down to machine language, and often runs within the context of an operating system, so you might want to brush up a bit on Assembly basics, or even some OS principles, to better understand what's going on under the hood.

Let's say our program starts with a stack address of 4000000.
When you call a function, depending on how much stack it uses, it will "allocate" it like this.
Let's say we have 2 ints (8 bytes):
void function()
{
int a = 0;
int b = 0;
}
then what's going to happen in assembly is:
MOV EBP,ESP //Here we store the original value of the stack address (4000000) in EBP, and we restore it at the end of the function back to 4000000
SUB ESP, 8 //here we "allocate" 8 bytes in the stack, which basically just decreases the ESP addr by 8
so our ESP address was changed from
4000000
to
3999992
that's how the program knows that the first int occupies 3999992 to 3999995 and the second int occupies 3999996 to 3999999 (which variable gets which slot is up to the compiler).
Even though this has little to do with the C++ language itself, it's really important to know, because once you know how the stack is "allocated", you realize how cheap it is to do things like
char my_array[20000];
since all it does is sub esp, 20000, which is a single assembly instruction (some compilers also insert stack probes for frames this large).
But if you actually use all those bytes, e.g. memset(my_array, 0, 20000), that's a different story.

how does C++ then have access to this variable in memory?
It doesn't!
Your computer does, and it is instructed on how to do that by loading the location of the variable in memory into a register. This is all handled by assembly language. I shan't go into the details here of how such languages work (you can look it up!) but this is rather the purpose of a C++ compiler: to turn an abstract, high-level set of "instructions" into actual technical instructions that a computer can understand and execute. You could sort of say that assembly programs contain a lot of pointers, though most of them are literals rather than "variables".

Passing scalar types by value or reference: does it matter?

Granted, micro-optimization is stupid and probably the cause of many mistakes in practice. Be that as it may, I have seen many people do the following:
void function( const double& x ) {}
instead of:
void function( double x ) {}
because it was supposedly "more efficient". Say that function is called ridiculously often in a program, millions of times; does this sort of "optimisation" matter at all?

Long story short, no, and particularly not on most modern platforms, where scalar and even floating point types are passed via registers. The general rule of thumb I've seen bandied about is 128 bytes as the dividing line between when you should just pass by value and when you should pass by reference.
Given that the data is already stored in a register, you're actually slowing things down by requiring the processor to go out to cache/memory to get the data. That could be a huge hit depending on whether the cache line the data is in is invalid.
At the end of the day it really depends on what the platform ABI and calling convention is. Most modern compilers will even use registers to pass data structures if they will fit (e.g. a struct of two shorts etc.) when optimization is turned up.

Passing by reference in this case is certainly not more efficient by itself. Note that qualifying that reference with const does not mean that the referenced object cannot change. Moreover, it does not mean that the function itself cannot change it (if the referenced object is not itself const, the function can legally use const_cast to get rid of that const). Taking that into account, it is clear that passing by reference forces the compiler to take possible aliasing into account, which in the general case will lead to the generation of [significantly] less efficient code in the pass-by-reference case.
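A minimal sketch of that aliasing problem, using hypothetical `scale_by_ref` and `scale_by_value` helpers: because `factor` is only a `const` reference, it may refer to an element of the very vector being modified, so the compiler must re-read it on every iteration, and the result can differ from the by-value version.

```cpp
#include <vector>

// `factor` may alias an element of `v`, so it must be reloaded each iteration.
void scale_by_ref(std::vector<double>& v, const double& factor) {
    for (double& x : v) x *= factor;
}

// `factor` is a private copy, so it can stay in a register for the whole loop.
void scale_by_value(std::vector<double>& v, double factor) {
    for (double& x : v) x *= factor;
}
```

With `a = {2.0, 3.0}`, `scale_by_ref(a, a[0])` doubles `a[0]` to 4.0 and then multiplies `a[1]` by the new value, giving `{4.0, 12.0}`, while `scale_by_value` gives `{4.0, 6.0}`.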
In order to take possible aliasing out of the picture, one would have to begin the latter version with
void function( const double& x ) {
double non_aliased_x = x;
// ... and use `non_aliased_x` from now on
...
}
but that would defeat the proposed reasoning for passing by reference in the first place.
Another way to deal with aliasing would be to use some sort of C99-style restrict qualifier (standard C++ has no restrict keyword, but compilers such as GCC and Clang offer __restrict__ as an extension)
void function( const double& restrict x ) {
but again, even in this case the cons of passing by reference will probably outweigh the pros, as explained in other answers.

In the latter example you save copying 4 bytes onto the stack during the function call. A double takes 8 bytes, while a pointer takes only 4 bytes in a 32-bit environment (in a 64-bit environment it takes 64 bits = 8 bytes, so you don't save anything); a reference is nothing more than a pointer with a bit of compiler support.

Unless the function is inlined, and depending on the calling convention (the following assumes stack-based parameter passing, which in modern calling conventions is only used when the function has too many arguments*), there are two differences in how the argument is passed and used:
double: The (probably) 8 byte large value is written onto the stack and read by the function as is.
double & or double *: The value lies somewhere in the memory (might be "near" the current stack pointer, e.g. if it's a local variable, but might also be somewhere far away). A (probably) 4 or 8 byte large pointer address (32 bit or 64 bit system respectively) is stored on the stack and the function needs to dereference the address to read the value. This also requires the value to be in addressable memory, which registers aren't.
This means the stack space required to pass the argument might be a little smaller when using references. This not only reduces the memory requirement but also improves the cache efficiency of the topmost bytes of the stack. On the other hand, when using references, the dereference adds a little extra work.
To summarize, use references for large types (let's say when sizeof(T) > 32, or maybe even larger). When stack size and hotness play a very important role, maybe already when sizeof(T) > sizeof(T*).
*) See the comments on this and SOReader's answer for what's happening if this is not the case.
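One way to encode that rule of thumb is a type alias that picks the parameter type from `sizeof(T)`. This is a sketch: `in_param_t`, `Matrix4` and `trace` are hypothetical names, and the 32-byte threshold is the answer's suggestion, not a universal constant.

```cpp
#include <type_traits>

// Types larger than 32 bytes are taken by const reference, everything else by value.
template <typename T>
using in_param_t = std::conditional_t<(sizeof(T) > 32), const T&, T>;

// A hypothetical 128-byte type, large enough to be passed by reference under this rule.
struct Matrix4 {
    double m[16];
};

double trace(in_param_t<Matrix4> mat) {
    return mat.m[0] + mat.m[5] + mat.m[10] + mat.m[15];
}
```

Note that `T` cannot be deduced through `in_param_t<T>` (it is a non-deduced context), so the alias suits non-template functions or explicitly specified template arguments.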

How much does pointer indirection affect efficiency?

Is dereferencing a pointer notably slower than just accessing that value directly? I suppose my question is: how fast is the dereference operator?

Going through a pointer indirection can be much slower because of how a modern CPU works. But it has nothing much to do with runtime memory.
Instead, speed is affected by prediction and cache.
Prediction is easy when the pointer has not been changed or when it is changed in predictable ways (for example, increment or decrement by four in a loop). This allows the CPU to essentially run ahead of the actual code execution, figure out what the pointer value is going to be, and load that address into cache. Prediction becomes impossible when the pointer value is built by a complex expression like a hash function.
Cache comes into play because the pointer might point into memory that isn't in cache and it will have to be fetched. This is minimized if prediction works but if prediction is impossible then in the worst case you can have a double impact: the pointer is not in cache and the pointer target is not in cache either. In that worst-case the CPU would stall twice.
If the pointer is used for a function pointer, the CPU's branch predictor comes into play. In C++ virtual tables, the function values are all constant and the predictor has it easy. The CPU will have the code ready to run and in the pipeline when execution goes through the indirect jump. But, if it is an unpredictable function pointer the performance impact can be heavy because the pipeline will need to be flushed which wastes 20-40 CPU cycles with each jump.
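A sketch of indirect calls through a function-pointer table (all names are hypothetical). The code is the same in both cases; what changes the cost is how predictable the sequence of call targets is. A constant `selector` is easy for the branch predictor, while one derived from, say, a hash is not.

```cpp
#include <cstddef>
#include <vector>

unsigned add_one(unsigned x) { return x + 1; }
unsigned times_two(unsigned x) { return x * 2; }

using Op = unsigned (*)(unsigned);

// Each iteration performs an indirect call whose target depends on selector[i].
unsigned apply_ops(const std::vector<std::size_t>& selector, unsigned start) {
    static const Op ops[] = {add_one, times_two};
    unsigned acc = start;
    for (std::size_t s : selector) {
        acc = ops[s % 2](acc);
    }
    return acc;
}
```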

Depends on stuff like:
whether the "directly accessed" value is in a register already, or on the stack (that's also a pointer indirection)
whether the target address is in cache already
the cache architecture, bus architecture etc.
i.e., too many variables to usefully speculate about without narrowing it down.
If you really want to know, benchmark it on your specific hardware.

It requires one more memory access:
read the address stored in the pointer variable
read the value at the address just read
This may cost more than two simple operations, because it may also take extra time if the address is not already loaded in the cache.

Assuming you're dealing with a real pointer (not a smart pointer of some sort), the dereference operation doesn't consume (data) memory at all. It does (potentially) involve an extra memory reference though: one to load the pointer itself, the other to access the data pointed to by the pointer.
If you're using a pointer in a tight loop, however, it'll normally be loaded into a register for the duration. In this case, the cost is mostly in terms of extra register pressure (i.e., if you use a register to store that pointer, you can't use it to store something else at the same time). If you have an algorithm that would otherwise exactly fill the registers, but with enregistering a pointer would overflow to memory it can make a difference. At one time, that was a pretty big loss, but with most modern CPUs (with more registers and on-board cache) that's rarely a big issue. The obvious exception would be an embedded CPU with fewer registers and no cache (and without on-chip memory).
The bottom line is that it's usually pretty negligible, often below the threshold where you can even measure it dependably.

It does. It costs an extra fetch.
Accessing a variable by value, the variable is directly read from its memory location.
Accessing the same through pointer adds an overhead of fetching the address of the variable from the pointer and then reading the value from that memory location.
Of course, this assumes that the variable is not placed in a register, which it would be in some scenarios like tight loops. I believe the question seeks the overhead assuming no such scenarios.

C++ STL: Array vs Vector: Raw element accessing performance

I'm building an interpreter and as I'm aiming for raw speed this time, every clock cycle matters for me in this (raw) case.
Do you have any experience or information about which of the two is faster: vector or array?
All that matters is the speed at which I can access an element (opcode receiving); I don't care about inserting, allocation, sorting, etc.
I'm going to go out on a limb now and say:
Arrays are at least a bit faster than vectors in terms of accessing an element i.
It seems really logical for me. With vectors you have all those security and controlling overhead which doesn't exist for arrays.
(Why) Am I wrong?
No, I can't ignore the performance difference - even if it is so small - I have already optimized and minimized every other part of the VM which executes the opcodes :)

Element access time in a typical implementation of a std::vector is the same as element access time in an ordinary array available through a pointer object (i.e. a run-time pointer value)
std::vector<int> v;
int *pa;
...
v[i];
pa[i];
// Both have the same access time
However, the access time to an element of an array available as an array object is better than both of the above accesses (equivalent to access through a compile-time pointer value)
int a[100];
...
a[i];
// Faster than both of the above
For example, a typical read access to an int array available through a run-time pointer value will look as follows in the compiled code on x86 platform
// pa[i]
mov ecx, pa // read pointer value from memory
mov eax, i
mov <result>, dword ptr [ecx + eax * 4]
Access to vector element will look pretty much the same.
A typical access to a local int array available as an array object will look as follows
// a[i]
mov eax, i
mov <result>, dword ptr [esp + <offset constant> + eax * 4]
A typical access to a global int array available as an array object will look as follows
// a[i]
mov eax, i
mov <result>, dword ptr [<absolute address constant> + eax * 4]
The difference in performance arises from that extra mov instruction in the first variant, which has to make an extra memory access.
However, the difference is negligible. And it is easily optimized to the point of being exactly the same in multiple-access context (by loading the target address in a register).
So the statement about "arrays being a bit faster" is correct in narrow case when the array is accessible directly through the array object, not through a pointer object. But the practical value of that difference is virtually nothing.
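To check this on your own toolchain, one option is to compile a few accessors like these with optimizations (e.g. `g++ -O2 -S`) and compare the generated code. The names are hypothetical. On x86-64 with position-independent code, the global-array case typically uses RIP-relative addressing rather than an absolute address constant, so the listings will differ from the 32-bit ones above.

```cpp
#include <vector>

int global_table[100];  // hypothetical global array object

// Base address is fixed at link/load time.
int read_global_array(int i) { return global_table[i]; }

// Base address must first be taken from the pointer.
int read_via_pointer(const int* pa, int i) { return pa[i]; }

// Base address is loaded from inside the vector object, then indexed.
int read_vector(const std::vector<int>& v, int i) { return v[i]; }
```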

You may be barking up the wrong tree. Cache misses can be much more important than the number of instructions that get executed.

No. Under the hood, both std::vector and C++0x std::array find the pointer to element n by adding n to the pointer to the first element.
vector::at may be slower than array::at because the former must compare against a variable while the latter compares against a constant. Those are the functions that provide bounds checking, not operator[].
If you mean C-style arrays instead of C++0x std::array, then there is no at member, but the point remains.
EDIT: If you have an opcode table, a global array (such as with extern or static linkage) may be faster. Elements of a global array would be addressable individually as global variables when a constant is put inside the brackets, and opcodes are often constants.
Anyway, this is all premature optimization. If you don't use any of vector's resizing features, it looks enough like an array that you should be able to easily convert between the two.
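A sketch of the bounds-checking difference described above, assuming a 4-element std::array purely for illustration; the function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// vector::at compares i against v.size(), a value read at run time.
int checked_vector_read(const std::vector<int>& v, std::size_t i) { return v.at(i); }

// array::at compares i against N, a compile-time constant (here 4).
int checked_array_read(const std::array<int, 4>& a, std::size_t i) { return a.at(i); }

// The standard does not require operator[] to check bounds (some debug builds
// add a check anyway); an out-of-range index is undefined behavior.
int unchecked_vector_read(const std::vector<int>& v, std::size_t i) { return v[i]; }
```

Both `at` functions throw `std::out_of_range` for an invalid index.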

You're comparing apples to oranges. Arrays have a constant-size and are automatically allocated, while vectors have a dynamic size and are dynamically allocated. Which you use depends on what you need.
Generally, arrays are "faster" to allocate (in quotes because comparison is meaningless) because dynamic allocation is slower. However, accessing an element should be the same. (Granted an array is probably more likely to be in cache, though that doesn't matter after the first access.)
Also, I don't know what "security" you're talking about; vectors have plenty of ways to get undefined behavior, just like arrays. They do have at(), but you don't need to use it if you know the index is valid.
Lastly, profile and look at the generated assembly. Nobody's guess is gonna solve anything.

For decent results, use std::vector as the backing storage and take a pointer to its first element before your main loop or whatever:
std::vector<uint8_t> mem_buf;
// stuff
uint8_t *mem = mem_buf.data(); // same as &mem_buf[0], but also valid for an empty vector
for(;;) {
switch(mem[pc]) {
// stuff
}
}
This avoids any issues with over-helpful implementations that perform bounds checking in operator[], and makes single-stepping easier when stepping into expressions such as mem_buf[pc] later in the code.
If each instruction does enough work, and the code is varied enough, this should be quicker than using a global array by some negligible amount. (If the difference is noticeable, the opcodes need to be made more complicated.)
Compared to using a global array, on x86 the instructions for this sort of dispatch should be more concise (no 32-bit displacement fields anywhere), and for more RISC-like targets there should be fewer instructions generated (no TOC lookups or awkward 32-bit constants), as the commonly-used values are all in the stack frame.
I'm not really convinced that optimizing an interpreter's dispatch loop in this way will produce a good return on time invested -- the instructions should really be made to do more, if it's an issue -- but I suppose it shouldn't take long to try out a few different approaches and measure the difference. As always in the event of unexpected behaviour the generated assembly language (and, on x86, the machine code, as instruction length can be a factor) should be consulted to check for obvious inefficiencies.
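A minimal, self-contained version of that dispatch loop might look like the following. The opcodes and the `run` function are hypothetical, and the sketch assumes every program ends with OP_HALT, since nothing checks pc against the buffer size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes, for illustration only.
enum Opcode : std::uint8_t { OP_INC = 0, OP_DOUBLE = 1, OP_HALT = 2 };

// The pointer to the first element is taken once, before the dispatch loop.
std::uint32_t run(const std::vector<std::uint8_t>& mem_buf) {
    const std::uint8_t* mem = mem_buf.data();
    std::size_t pc = 0;
    std::uint32_t acc = 0;
    for (;;) {
        switch (mem[pc++]) {
            case OP_INC:    ++acc;    break;
            case OP_DOUBLE: acc *= 2; break;
            case OP_HALT:   return acc;
            default:        return acc;  // unknown opcode: stop (an arbitrary choice)
        }
    }
}
```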
