What are opaque identifiers with regards to handles?

What are opaque identifiers with regards to handles? - c++

I have read this article and I encountered the following
A resource handle can be an opaque identifier, in which case it is
often an integer number (often an array index in an array or "table"
that is used to manage that type of resource), or it can be a pointer
that allows access to further information.
So a handle is either an opaque identifier or a pointer that allows access to further information. But from what I understand, these specific pointers are opaque pointers, so what exactly is the difference between these pointers ,which are opaque pointer, and opaque identifiers?

One of the literal meanings of "opaque" is "not transparent".
In computer science, an opaque identifier or a handle is one that doesn't expose its inner details. This means we can only access information from it by using some defined interface, and can't otherwise access information about its value (if any) or internal structure.
As an example, a FILE in the C standard library (and available in C++ through <cstdio>) is an opaque type. We don't know if it is a data structure, an integer, or anything else. All we know is that a set of functions, like fopen() return a pointer to one (i.e. a FILE *) and other functions (fclose(), fprintf(), ....) accept a FILE * as an argument. If we have a FILE *, we can't reliably do anything with it (e.g. actually write to a file) unless we use those functions.
The advantage of that is it allows different implementations to use different ways of representing a file. As long as our code uses the supplied functions, we don't have to worry about the internal workings of a FILE, or of I/O functions. Compiler vendors (or implementers of the standard library) worry about getting the internal details right. We simply use the opaque type FILE, and pointers to it, and stick to using standard functions, and our code works with all implementations (compilers, standard library versions, host systems)
An opaque identifier can be of any type. It can be an integer, a pointer, even a pointer to a pointer, or a data structure. Integers and pointers are common choices, but not the only ones. The key is only using a defined set of operations (i.e. a specific interface) to interact with those identifiers, without getting our hands dirty by playing with internal details.

A handle is said to be "opaque" when client code doesn't know how to see what it references. It's simply some identifier that can be used to identify something. Often it will be a pointer to an incomplete type that's only defined within a library and who's definition isn't visible to client code. Or it could just be an integer that references some element in some data structure. The important thing is that the client doesn't know or care what the handle is. The client only cares that it uniquely identifies some resource.
Consider the following interface:
widget_handle create_widget();
void do_a_thing(widget_handle);
void destroy_widget(widget_handle);
Here, it doesn't actually matter to the calling code what a widget_handle is, how the library actually stores widgets, or how the library actually uses a widget_handle to find a particular widget. It could be a pointer to a widget, or it could be an index into some global array of widgets. The caller doesn't care. All that matters is that it somehow identifies a widget.

One possible difference is that an integer handle can have "special" values, while pointer handle cannot.
For example, the file descriptors 0,1,2 are stdin, stdout, stderr. This would be harder to pull off if you have a pointer for a handle.

You really you shouldn't care. They could be everything.
Suppose you buy a ticket from person A for an event. You must give this ticket to person B to access the event.
The nature of the ticket is irrelevant to you, it could be:
a paper ticket,
an alphanumerical code,
a barcode,
a QR-Code,
a photo,
a badge,
whatever.
You don't care. Only A and B use the ticket for its nature, you are just carrying it around. Only B knows how to verify the validity and only A know how to issue a correct ticket.
An opaque pointer could be a memory location directly, while an integer could be an offset of a base memory address in a table but how is that relevant to you, client of the opaque handle?

In classic Mac OS memory management, handles were doubly indirected pointers. The handle pointed to a "master pointer" which was the address of the actual data. This allowed moving the actual storage of the object in memory. When a block was moved, its master pointer would be updated by the memory manager.
In order to use the data the handle ultimately referenced, the handle had to be locked, which would prevent it being moved. (There was little concurrency in the system so unless one was calling the operating system or libraries which might, one could also rely on memory not getting moved. Doing so was somewhat perilous however as code often evolved to call something that could move memory inside a place where that was not expected.)
In this design, the handle is a pointer but it is not an opaque type. A generic handle is a void ** in C, but often one had a typed handle. If you look here you'll find lots of handle types that are more concrete. E.g. StringHandle.

Related

Why is a function not an object?

I read in the standards n4296 (Draft) § 1.8 page 7:
An object is a region of storage. [ Note: A function is not an object,
regardless of whether or not it occupies storage in the way that
objects do. —end note ]
I spent some days on the net looking for a good reason for such exclusion, with no luck. Maybe because I do not fully understand objects. So:
Why is a function not an object? How does it differ?
And does this have any relation with the functors (function objects)?

A lot of the difference comes down to pointers and addressing. In C++¹ pointers to functions and pointers to objects are strictly separate kinds of things.
C++ requires that you can convert a pointer to any object type into a pointer to void, then convert it back to the original type, and the result will be equal to the pointer you started with². In other words, regardless of exactly how they do it, the implementation has to ensure that a conversion from pointer-to-object-type to pointer-to-void is lossless, so no matter what the original was, whatever information it contained can be recreated so you can get back the same pointer as you started with by conversion from T* to void * and back to T*.
That's not true with a pointer to a function though--if you take a pointer to a function, convert it to void *, and then convert it back to a pointer to a function, you may lose some information in the process. You might not get back the original pointer, and dereferencing what you do get back gives you undefined behavior (in short, don't do that).
For what it's worth, you can, however, convert a pointer to one function to a pointer to a different type of function, then convert that result back to the original type, and you're guaranteed that the result is the same as you started with.
Although it's not particularly relevant to the discussion at hand, there are a few other differences that may be worth noting. For example, you can copy most objects--but you can't copy any functions.
As far as relationship to function objects goes: well, there really isn't much of one beyond one point: a function object supports syntax that looks like a function call--but it's still an object, not a function. So, a pointer to a function object is still a pointer to an object. If, for example, you convert one to void *, then convert it back to the original type, you're still guaranteed that you get back the original pointer value (which wouldn't be true with a pointer to a function).
As to why pointers to functions are (at least potentially) different from pointers to objects: part of it comes down to existing systems. For example, on MS-DOS (among others) there were four entirely separate memory models: small, medium, compact, and large. Small model used 16 bit addressing for either functions or data. Medium used 16 bit addresses for data, and 20-bit addresses for code. Compact reversed that (16 bit addresses for code, 20-bit addresses for data). Large used 20-bit addresses for both code and data. So, in either compact or medium model, converting between pointers to code and pointers to functions really could and did lead to problems.
More recently, a fair number of DSPs have used entirely separate memory buses for code and for data and (like with MS-DOS memory models) they were often different widths, converting between the two could and did lose information.
These particular rules came to C++ from C, so the same is true in C, for whatever that's worth.
Although it's not directly required, with the way things work, pretty much the same works out to be true for a conversion from the original type to a pointer to char and back, for whatever that's worth.

Why a function is not an object? How does it differ?
To understand this, let's move from bottom to top in terms of abstractions involved. So, you have your address space through which you can define the state of the memory and we have to remember that fundamentally it's all about this state you operate on.
Okay, let's move a bit higher in terms of abstractions. I am not taking about any abstractions imposed by a programming language yet (like object, array, etc.) but simply as a layman I want to keep a record of a portion of the memory, lets call it Ab1 and another one called Ab2.
Both have a state fundamentally but I intend to manipulate/make use of the state differently.
Differently...Why and How?
Why ?
Because of my requirements (to perform addition of 2 numbers and store the result back, for example). I will be using use Ab1 as a long usage state and Ab2 as relatively shorter usage state. So, I will create a state for Ab1(with the 2 numbers to add) and then use this state to populate some of state of Ab2(copy them temporarily) and perform further manipulation of Ab2(add them) and save a portion of resultant Ab2 to Ab1(the added result). Post that Ab2 becomes useless and we reset its state.
How?
I am going to need some management of both the portions to keep track of what words to pick from Ab1 and copy to Ab2 and so on. At this point I realize that I can make it work to perform some simple operations but something serious shall require a laid out specification for managing this memory.
So, I look for such management specification and it turns out there exists a variety of these specifications (with some having built-in memory model, others provide flexibility to manage the memory yourself) with a better design. In-fact because they(without even dictating how to manage the memory directly) have successfully defined the encapsulation for this long lived storage and rules for how and when this can be created and destroyed.
The same goes for Ab2 but the way they present it makes me feel like this is much different from Ab1. And indeed, it turns out to be. They use a stack for state manipulation of Ab2 and reserve memory from heap for Ab1. Ab2 dies after a while.(after finished executing).
Also, the way you define what to do with Ab2 is done through yet another storage portion called Ab2_Code and specification for Ab1 involves similarly Ab1_Code
I would say, this is fantastic! I get so much convenience that allows me to solve so many problems.
Now, I am still looking from a layman's perspective so I don't feel surprised really having gone through the thought process of it all but if you question things top-down, things can get a bit difficult to put into perspective.(I suspect that's what happened in your case)
BTW, I forgot to mention that Ab1 is called an object officially and Ab2 a function stack while Ab1_Code is the class definition and Ab2_Code is the function definition code.
And it is because of these differences imposed by the PL, you find that they are so different.(your question)
Note: Don't take my representation of Ab1/Object as a long storage abstraction as a rule or a concrete thing - it was from layman perspective. The programming language provides much more flexibility in terms of managing lifecycle of an object. So, object may be deployed like Ab1 but it can be much more.
And does this have any relation with the functors (function objects)?
Note that the first part answer is valid for many programming languages in general(including C++), this part has to do specifically with C++ (whose spec you quoted). So you have pointer to a function, you can have a pointer to an object too. Its just another programming construct that C++ defines. Notice that this is about having a pointer to the Ab1, Ab2 to manipulate them rather than having another distinct abstraction to act upon.
You can read about its definition, usage here:
C++ Functors - and their uses

Let me answer the question in simpler language (terms).
What does a function contain?
It basically contains instructions to do something. While executing the instructions, the function can temporarily store and / or use some data - and might return some data.
Although the instructions are stored somewhere - those instructions themselves are not considered as objects.
Then, what are the objects?
Generally, objects are entities which contain data - which get manipulated / changed / updated by functions (the instructions).
Why the difference?
Because computers are designed in such way that the instructions do not depend on the data.
To understand this, let's think about a calculator. We do different mathematical operations using a calculator. Say, if we want to add some numbers, we provide the numbers to the calculator. No matter what the numbers are, the calculator will add them in the same way following the same instructions (if the result exceeds the calculator's capacity to store, it will show an error - but that is because of calculator's limitation to store the result (the data), not because of its instructions for addition).
Computers are designed in the similar manner. That is why when you use a library function (for example qsort()) on some data which are compatible with the function, you get the same result as you expect - and the functionality of the function doesn't change if the data changes - because the instructions of the function remains unchanged.
Relation between function and functors
Functions are set of instructions; and while they are being executed, some temporary data can be required to store. In other words, some objects might be temporarily created while executing the function. These temporary objects are functors.

Decide if it is a process or thread in C++

Give a void * variable as input (can only point to a process or thread), I'd like to first determine its type and then convert it to that type.
How should I do that in C++? I know it's a dumb question, but I've never done C/C++ before and can't think C/C++ way yet.
EDIT: I need to achieve this on both Linux and Windows.

You can't. Pointers carry two pieces of information: the location in memory to where they point and the type of the pointed object. With a void * this last information is omitted, and there's no way to reconstruct what type it pointed to. So, you need to carry along this pointer another value that specifies what it actually points to (you can use e.g. an enum).
The only facility somehow related to this task in C++ is RTTI, but it works only on pointers to polymorphic classes (RTTI usually exploits the vtable of the object to store additional information about the dynamic type of the pointer, but the vtable can be accessed and correctly interpreted only if it is known that the object belongs to a particular polymorphic class hierarchy).
I'm looking for a uniform way to pass pid or tid in but will treat the ids differently. Sorry, I might not properly state my problem.
Well, this is a completely different thing... if you need to pass around your PID/TID inside a void * you could simply create a struct or something like that with a member for the ID and one to store if such ID is a PID or a TID.

There are a bunch of solutions.
For example, keep track of all the Process and Thread objects created. Store these each in a set<void*>, and check for the presence of that void* in the ProcessSet or ThreadSet. This solution just requires that you know where the objects are allocated.
Other approaches require some ability to deference.
Most obviously, if you have defined the types Process and Thread, give them a common base class and pass that around instead of a void*. This is basic OOP. You can then use RTTI to find the derived type. But most likely in this situation, a refactor/ redesign would obviate the need for this in the first place.
If you cannot add a base type, you could add a wrapper, and pass that around. This works even if all you ever see is a void*. This is similar to the set<> solution in that you require to know the type when it is allocated.
struct ProcessOrThread
{
bool isProcess_;
void* handle_;
};
All this really boils down to: If you know the type to start with, avoid throwing that information away in the first place.

What system are you talking about? On Linux, I would say your question does not make any sense, because processes don't have addresses (a pid_t as returned by fork or getpid is an integer).
You could use libraries which wrap processes and threads as objects, like Qt does (and it works on Linux, Windows, MaCOSX...). (and they you could e.g. use dynamic_cast or Qt meta object system, if you are sure the pointer points to either an instance of QThread or an instance of QProcess).

The only thing you can do is attach a type information to the process/thread structures.

Handle object in c++

I've been told that a handle is a sort of "void" pointer. But what exactly does "void pointer" mean and what is its purpose. Also, what does "somehandle = GetStdHandle(STD_INPUT_HANDLE); do?

A handle in the general sense is an opaque value that uniquely identifies an object. In this context, "opaque" means that the entity distributing the handle (e.g. the window manager) knows how handles map to objects but the entities which use the handle (e.g. your code) do not.
This is done so that they cannot get at the real object unless the provider is involved, which allows the provider to be sure that noone is messing with the objects it owns behind its back.
Since it's very practical, handles have traditionally been integer types or void* because using primitives is much easier in C than anything else. In particular, a lot of functions in the Win32 API accept or return handles (which are #defined with various names: HANDLE, HKEY, many others). All of these types map to void*.
Update:
To answer the second question (although it might be better asked and answered on its own):
GetStdHandle(STD_INPUT_HANDLE) returns a handle to the standard input device. You can use this handle to read from your process's standard input.

A HANDLE isn't necessarily a pointer or a double pointer, it may be an index in an OS table as well as anything else. It's defined for convenience as a void * because often is used actually as a pointer, and because in C void * is a type on which you can't perform almost any operation.
The key point is that you must think at it as some opaque token that represents a resource managed by the OS; passing it to the appropriate functions you tell them to operate on such object. Because it's "opaque", you shouldn't change it or try to dereference it: just use it with functions that can work with it.

A HANDLE is a pointer to a pointer, it's pretty much as simple as that.
So to get the pointer to the data, you'd have to dereference it first.
GetStdHandle(STD_INPUT_HANDLE) will return the handle to the stdin stream - standard input. That's either the console or a file/stream if you invoke from the command prompt with a '<' character.

A Windows HANDLE is effectively an index into an array of void pointers, plus a few other things. A void pointer (void*) is the pointer that points to an unknown type and should be avoided at all costs in C++- however the Windows API is C-compatible and uses it to avoid having to expose Windows internal types.
GetStdHandle(STD_INPUT_HANDLE) means, get the HANDLE associated to the Standard output stream.

How can I maintain a weak reference on a COM object in C++?

In my application, I'm hooking various functions for creating COM objects (such as CoCreateInstanceEx) to get notified whenever some object is created. I'm keeping track of all created objects in a std::list and I'm iterating over that list to do various things (like checking which OLE objects have been activated).
The issue with this is that right now, whenever adding an IUnknown pointer to my list, I call IUnknown::AddRef on it to make sure that it doesn't get destroyed while I'm tracking it. That's not what I really want though; the lifetime of the object should be as long (or short) as it is without my tracing code, so I'd rather like to maintain a weak reference on the objects. Whenever the last reference to some tracked COM object is removed (and thus the object gets destroyed), I'd like to get notified so that I can update my bookkeeping (e.g. by setting the pointer in my list to NULL).*
What's the best way to do this? Right now, I'm patching the (first) VTable of all created objects so that the calls to IUnknown::Release via the first vtable get notified. However, this won't work for COM interfaces which inherit from multiple interfaces (and thus have multiple vtables), but I'm not sure whether this is really a problem: given the Rules for Implementing QueryInterface, there should always be just one IUnknown returned by IUnknown::QueryInterface, right? So I could do that and then patch that vtable.
Furthermore, this approach is also a bit hairy since it involves creating thunks which generate some code. I only implemented this for 32bit so far. Not a big issue, but still.
I'm really wondering whether there isn't a more elegant way to have a weak reference to a COM object. Does anybody know?
*: The next thing I'll have to solve is making this work correctly in case I have active iterators (I'm using custom iterator objects) traversing the list of COM objects. I may need to keep track of the active iterators and once the last one finished, remove all null pointers from the list. Or something like that.

This isn't an answer as much as a set of issues why this is a really tricky thing to do - I'm putting it in as an answer since there's too much information here than fits in a comment :)
My understanding is that the concept of weak reference just doesn't exist in COM, period. You've got reference counting via IUnknown, and that's the sum total of how COM deals with object lifetime management. Anything beyond that is, strictly speaking, not COM.
(.Net does support the concept, but it's got an actual GC-based memory manager to provide appropriate support, and can treat WeakRef objects differently than regular references in memory. But that's not the case with the very simple world that COM assumes, which is a world of plain memory and pointers, and little more.)
COM specifies that reference counting is per-interface; any COM object is free to do ref counting per object as a convenience, but the upshot is that if you're wrapping an object, you have to assume the most restrictive case. So you cannot assume that any given IUnknown will be used for all addrefs/releases on that object: you'd really need to track each interface separately.
The canonical IUnknown - the one you get back by QI'ing for IUnknown - could be any interface at all - even a dedicated IUnknown that is used only for the purpose of acting as an identity! - so long as the same binary pointer value is returned each time. All other interfaces could be implemented any way; typically the same value is returned each time, but a COM object could legitimately return a new IFoo each time someone QI's for IFoo. Or even keep around a cache of IFoos and return one at random.
...and then you've got aggregation to deal with - basically, COM doesn't have a strong concept of object at all, it's all about interfaces. Objects, in COM, are just a collection of interfaces that happen to share the same canonical IUnknown: they might be implemented as a single C/C++ object behind the scenes, or as a family of related C/C++ objects presenting a facade of a 'single COM object'.
Having said all of that, given that:
I'm tracing the state of various components (including all COM objects) of this software for the sake of debugging
Here's an alternate approach that might produce some useful data to debug with.
The idea here is that many implementations of COM objects will return the ref count as the return value to Release() - so if they return 0, then that's a clue that the interface may have been released.
This is not guaranteed, however: as MSDN states:
The method returns the new reference count. This value is intended to be used only for test purposes.
(emphasis added.)
But that's apparently what you're doing here.
So one thing you could do, assuming you own the calling code, is to replace calls with Release() with an inline called MyRelease() or similar that will call release, and if it notices that the return value is 0, then notes that the interface pointer is now possibly freed - removes it from a table, logs it to a file, etc.
One major caveat: keep in mind that COM does not have a concept of weak ref, even if you try to hack something together. Using a COM interface pointer that has not been AddRef()'d is illegal as far as COM is concerned; so if you save away interface pointer values in any sort of list, the only thing you should so with those is treat them as opaque numbers for debugging purposes (eg. log them to a file so you can correlate creates with destroys, or keep track of how many you have outstanding), but do not attempt to use them as actual interface pointers.
Again, keep in mind that nothing requires a COM object to follow the convention of returning the refcount; so be aware that you could see something that looks like a bug but is actually just an implementation of Release just happens to always returns 0 (or rand(), if you're especially unlucky!)

First, you're right that QueryInterface for IUnknown should always return the same pointer; IUnknown is treated as the object's identity IIRC, so needs to be stable.
As for weak pointers, off the top of my head, maybe you could give CoMarshalInterThreadInterfaceInStream a whirl? It is meant to allow you to serialize a reference to a COM object into a stream, then create a new reference to the object on some other thread using the stream. However, if you serialise into a stream and retain the stream as a sort of weak pointer, then unmarshal later on to recover the pointer, you could check whether unmarshalling fails; If so, the object is gone.

With WinRT IWeakReference was added to enable weak refs to COM objects. Objects created with WRL's RuntimeClass support IWeakReference by default (can be disabled with an option).
you can use IWeakReference in your designs but it means you will need to use at least some WinRT concepts, IInspectable based interface.

Getting list of all existing vtables

In my application I have quite some void-pointers (this is because of historical reasons, application was originally written in pure C). In one of my modules I know that the void-pointers points to instances of classes that could inherit from a known base class, but I cannot be 100% sure of it. Therefore, doing a dynamic_cast on the void-pointer might give problems. Possibly, the void-pointer even points to a plain-struct (so no vptr in the struct).
I would like to investigate the first 4 bytes of the memory the void-pointer is pointing to, to see if this is the address of the valid vtable. I know this is platform, maybe even compiler-version-specific, but it could help me in moving the application forward, and getting rid of all the void-pointers over a limited time period (let's say 3 years).
Is there a way to get a list of all vtables in the application, or a way to check whether a pointer points to a valid vtable, and whether that instance pointing to the vtable inherits from a known base class?

I would like to investigate the first
4 bytes of the memory the void-pointer
is pointing to, to see if this is the
address of the valid vtable.
You can do that, but you have no guarantees whatsoever it will work. Y don't even know if the void* will point to the vtable. Last time I looked into this (5+ years ago) I believe some compiler stored the vtable pointer before the address pointed to by the instance*.
I know this is platform, maybe even
compiler-version-specific,
It may also be compiler-options speciffic, depending on what optimizations you use and so on.
but it could help me in moving the
application forward, and getting rid
of all the void-pointers over a
limited time period (let's say 3
years).
Is this the only option you can see for moving the application forward? Have you considered others?
Is there a way to get a list of all
vtables in the application,
No :(
or a way to check whether a pointer
points to a valid vtable,
No standard way. What you can do is open some class pointers in your favorite debugger (or cast the memory to bytes and log it to a file) and compare it and see if it makes sense. Even so, you have no guarantees that any of your data (or other pointers in the application) will not look similar enough (when cast as bytes) to confuse whatever code you like.
and whether that instance pointing to
the vtable inherits from a known base
class?
No again.
Here are some questions (you may have considered them already). Answers to these may give you more options, or may give us other ideas to propose:
how large is the code base? Is it feasible to introduce global changes, or is functionality to spread-around for that?
do you treat all pointers uniformly (that is: are there common points in your source code where you could plug in and add your own metadata?)
what can you change in your sourcecode? (If you have access to your memory allocation subroutines or could plug in your own for example you may be able to plug in your own metadata).
If different data types are cast to void* in various parts of your code, how do you decide later what is in those pointers? Can you use the code that discriminates the void* to decide if they are classes or not?
Does your code-base allow for refactoring methodologies? (refactoring in small iterations, by plugging in alternate implementations for parts of your code, then removing the initial implementation and testing everything)
Edit (proposed solution):
Do the following steps:
define a metadata (base) class
replace your memory allocation routines with custom ones which just refer to the standard / old routines (and make sure your code still works with the custom routines).
on each allocation, allocate the requested size + sizeof(Metadata*) (and make sure your code still works).
replace the first sizeof(Metadata*) bytes of your allocation with a standard byte sequence that you can easily test for (I'm partial to 0xDEADBEEF :D). Then, return [allocated address] + sizeof(Metadata*) to the application. On deallocation, take the recieved pointer, decrement it by `sizeof(Metadata*), then call the system / previous routine to perform the deallocation. Now, you have an extra buffer allocated in your code, specifically for metadata on each allocation.
In the cases you're interested in having metadata for, create/obtain a metadata class pointer, then set it in the 0xDEADBEEF zone. When you need to check metadata, reinterpret_cast<Metadata*>([your void* here]), decrement it, then check if the pointer value is 0xDEADBEEF (no metadata) or something else.
Note that this code should only be there for refactoring - for production code it is slow, error prone and generally other bad things that you do not want your production code to be. I would make all this code dependent on some REFACTORING_SUPPORT_ENABLED macro that would never allow your Metadata class to see the light of a production release (except for testing builds maybe).

I would say it is not possible without related reference (header declaration).

If you want to replace those void pointers to correct interface type, here is what I think to automate it:
Go through your codebase to get a list of all classes that has virtual functions, you could do this fast by writing script, like Perl
Write an function which take a void* pointer as input, and iterate over those classes try to dynamic_cast it, and log information if succeeded, such as interface type, code line
Call this function anywhere you used void* pointer, maybe you could wrap it with a macro so you could get file, line information easy
Run a full automation (if you have) and analyse the output.

The easier way would be to overload operator new for your particular base class. That way, if you know your void* pointers are to heap objects, then you can also with 100% certainty determine whether they're pointing to your object.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js