How to create a VB6 collection object with ATL - c++

Or rather, a VB6-compatible collection object.
We provide hooks into our .NET products through a set of APIs.
We need to continue to support customers that call our APIs from VB6, so we need to continue supporting VB6 collection objects (simple with VBA.Collection in .NET).
The problem is supporting some sites that use VBScript to call our APIs. VBScript has no concept of a collection object, so to create a collection object to pass to our API we built a VB6 ActiveX DLL that provides a "CreateCollection" method. This method simply creates and passes back a new collection object. Problem solved.
After many years of pruning, porting and re-building, this DLL is the only VB6 code we have. Because of it we still need to install Visual Studio 6 on our dev and build machines.
I'm not happy with our reliance on this DLL for several reasons (my personal dislike of VB6 is not one of them). Top of the list is that Microsoft no longer supports Visual Studio 6.
My question is: how do I get ATL to create a collection object that implements the same interface as the VB6 collection object?
I've a good handle on C++, but only a loose grasp of ATL - I can create simple objects and implement simple methods, but this is beyond me.

Collections are more or less based on convention. They implement IDispatch and expose some standard methods and properties:
Add() - optional
Remove() - optional
Count - read-only
_NewEnum - hidden, read-only, returns a pointer to an enumerator object that implements IEnumVARIANT
The _NewEnum property is what makes Visual Basic's For Each work.
In the IDL you use a dual interface and:
[propget, id(DISPID_NEWENUM), restricted] HRESULT _NewEnum([out, retval] IUnknown** pVal);
Here are some MSDN entries: Design Considerations for ActiveX Objects
And here is some ATL specific convenience: ATL Collections and Enumerators

Let's target this VBScript snippet:
Dim vElem
For Each vElem In MyObject
    ' use vElem here
Next
and in particular the implementation of MyObject. As a minimum you have to implement a method or propget with DISPID_NEWENUM on the default dispinterface (it has to be a dual interface or dispinterface for DISPIDs to apply). You can name it whatever you want; the name doesn't matter. Most collections use NewEnum and flag it in IDL as hidden. VB6 uses an underscore prefix to mark hidden methods, so you might see _NewEnum recommended, but that is a bit of a cargo cult that ATL follows.
You don't need any Count, Item, Add, Remove, Clear or any other method at all (on the default interface). You can supply these as a convenience (particularly an Item accessor and probably Count), but you don't have to in order to make the sample code above work.
Next, the retval has to be a separate object (a so-called enumerator) which implements the IEnumVARIANT interface, using a (private) pointer to MyObject. In IDL you can declare the retval as IUnknown; nothing wrong with that. What is most interesting is that you only have to implement the Next method on IEnumVARIANT. You can return E_NOTIMPL from the rest if you like, or optionally implement them, though they are never called by For Each. What makes the implementation even easier is that the celt parameter of Next (the number of items requested) is always 1, so For Each always requests items one by one.
What you can use in ATL is CComEnumOnSTL and the like to create a "proxy" enumerator on an STL container, or the array based enumerator ATL provides (and exclude STL).
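The contract of Next can be sketched without any Windows headers. What follows is a portable stand-in, not ATL code: FakeEnum, Item, kOk and kFalse are hypothetical names that only mirror the shape of IEnumVARIANT::Next, VARIANT, S_OK and S_FALSE. It assumes the usual rule that Next reports success only when it delivered as many items as were asked for.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for S_OK / S_FALSE; the real values come from <winerror.h>.
constexpr long kOk = 0;
constexpr long kFalse = 1;

using Item = std::string;  // stands in for VARIANT

// Hypothetical enumerator that mirrors the shape of IEnumVARIANT::Next.
class FakeEnum {
public:
    // Keeps a private pointer to the collection, which must outlive the enumerator.
    explicit FakeEnum(const std::vector<Item>* items) : items_(items) {}

    // Copies up to celt items into out and reports how many were copied.
    // Returns kOk if exactly celt items were delivered, kFalse otherwise.
    long Next(unsigned long celt, Item* out, unsigned long* fetched) {
        unsigned long n = 0;
        while (n < celt && pos_ < items_->size()) {
            out[n++] = (*items_)[pos_++];
        }
        if (fetched) *fetched = n;
        return n == celt ? kOk : kFalse;
    }

private:
    const std::vector<Item>* items_;
    std::size_t pos_ = 0;
};

// What a For Each loop effectively does: ask for one item at a time.
inline std::vector<Item> DrainLikeForEach(FakeEnum& e) {
    std::vector<Item> seen;
    Item one;
    unsigned long fetched = 0;
    while (e.Next(1, &one, &fetched) == kOk) {
        seen.push_back(one);
    }
    return seen;
}
```

In a real ATL project the enumerator would also need reference counting and the remaining IEnumVARIANT methods (even if they only return E_NOTIMPL); those are left out here to keep the Next logic visible.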

For a good example of how to implement COM collections that would be used naturally in script programming languages, check out my website
It offers a comprehensive example of how to do that...


Why does Microsoft have IHTMLDocument, IHTMLDocument2, ... , IHTMLDocument8?

What is the meaning of the number in the end of the interface name? I see that IHTMLDocument3-7 have no members (see example for #5), and 8 has gesture related members. Is the number derived from Windows version?
This is a general feature of public COM interfaces.
If you want backward compatibility, you never want to change a published interface, because that would mean all the code people wrote for, say, IE 6 stops working with IE 7, and all of their customers get mad at them, and they get mad at you.
So, if IE 5 adds new features that need to be exposed, instead of changing IHTMLDocument, you create a new interface, and make IE 5 support both (by inheritance, QueryInterface, or some more explicit mechanism). And when IE 7.0.2 or IE 8 or Win XP or whatever adds even more new features, you create another one. And so on.
While MS could have come up with descriptive suffixes instead of just sequential numbers, that would probably be more confusing than helpful. So, IHTMLDocument2, IHTMLDocument3, etc. are the names. They don't mean anything, except the order they were added.
What is the meaning of the number in the end of the interface name?
That is the standard convention for versioning COM interfaces. IXXX2 extends IXXX with new functions. IXXX3 extends IXXX2 with new functions, and so on. This allows clients to use older functions without breaking when new versions are released, and use newer functions when desired, even check if those functions are available before trying to call them.
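A minimal sketch of that convention, written as portable C++ rather than real COM: IUnknownLike, IDocument, IDocument2 and the Iid enum are hypothetical stand-ins (real COM uses 128-bit IIDs and reference counting, both omitted here). It only illustrates how a client can check for the newer interface before using it.

```cpp
#include <string>

// Hypothetical interface ids; real COM identifies interfaces by GUID.
enum class Iid { Document, Document2 };

struct IUnknownLike {
    // Returns the requested interface, or nullptr if it is not supported.
    virtual void* QueryInterface(Iid iid) = 0;
    virtual ~IUnknownLike() = default;
};

// Version 1: published once and never changed afterwards.
struct IDocument : IUnknownLike {
    virtual std::string Title() = 0;
};

// Version 2: extends version 1 with a new member.
struct IDocument2 : IDocument {
    virtual std::string Charset() = 0;
};

// An older component that only knows version 1.
class OldDoc : public IDocument {
public:
    void* QueryInterface(Iid iid) override {
        if (iid == Iid::Document) return static_cast<IDocument*>(this);
        return nullptr;
    }
    std::string Title() override { return "old"; }
};

// A newer component that supports both versions.
class NewDoc : public IDocument2 {
public:
    void* QueryInterface(Iid iid) override {
        if (iid == Iid::Document2) return static_cast<IDocument2*>(this);
        if (iid == Iid::Document) return static_cast<IDocument*>(this);
        return nullptr;
    }
    std::string Title() override { return "new"; }
    std::string Charset() override { return "utf-8"; }
};

// Client code: use the newer member when available, fall back otherwise.
inline std::string Describe(IUnknownLike& obj) {
    if (auto* d2 = static_cast<IDocument2*>(obj.QueryInterface(Iid::Document2))) {
        return d2->Title() + "/" + d2->Charset();
    }
    auto* d1 = static_cast<IDocument*>(obj.QueryInterface(Iid::Document));
    return d1 ? d1->Title() : "";
}
```

Code written against IDocument keeps working with both components, which is the point of never changing a published interface.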
I see that IHTMLDocument3-7 have no members
Where did you get that idea from? Look at their actual definitions. They expose many new members from one interface to the next.
No - it just signifies a different version of the interface. It has nothing to do with the Windows version (and, for that matter, little or nothing to do with the MSHTML version).
As of the documentation, we can see:
The IHTMLDocument3 interface inherits from the IDispatch interface but does not have additional members.
It can be a little confusing for newcomers to the interface world.

COleSafeArray vs CComSafeArray

I am in a situation where I have a COM object that I need to use in some Windows-only C++ code. The COM object has functions that accept SAFEARRAYs as arguments to pass arrays of bytes. After looking at the SAFEARRAY API I decided it wasn't what I wanted and that I should find an object-oriented wrapper. I tried looking for open source ones and didn't find any. I found that Microsoft has created two classes that seem to encapsulate SAFEARRAY. It looks like CComSafeArray is exactly what I need, and that COleSafeArray might be useful but could exist only for legacy compatibility.
Is COleSafeArray just around for historical compatibility or is there something I am missing?
When should COleSafeArray be used instead of CComSafeArray?
Are there any open source implementations that might be worth looking into?
What are the Pros and Cons of each?
The difference is obvious from the class names.
MFC's COleSafeArray is designed to support OLE Automation and is actually a wrapper for the OLE VARIANT struct (which can contain a SAFEARRAY). It generally treats array elements as if they were of VARIANT type, so you need to select and extract the appropriate type manually.
ATL's CComSafeArray is designed to support SAFEARRAY for generic COM and is actually a wrapper for the SAFEARRAY struct. It's a template class parameterized on the array element type.
In general you should use CComSafeArray; it's easier and simpler, and elements are accessed almost the same way as in regular arrays/vectors.
COleSafeArray may sometimes be preferable if you work with OLE Automation interfaces that make heavy use of VARIANT parameters, e.g. automating MS Office, using Visual Basic components etc. By comparison, with CComSafeArray you would need to wrap/unwrap it manually to/from a VARIANT object in this case.

COM Object version number and how it's handled via the Win registry

A COM object is registered with the system via a bunch of information - stuff like the GUID, ProgId etc.
TypeLibs also have a 'version' field which a) can be written to the registry and b) is inside the IDL.
My question now is: is the version field somehow checked by the system while using/initiating the COM object?
E.g. if I create a new version of my COM object without changing the GUID but with incrementing the version number, does this lead to any kind of inconsistency?
The reason for not changing the GUID but incrementing the version number would be bug fixes in the typelib without touching the interface.
This should not be an issue, given your deployment scenario. COM is an umbrella term for a series of technologies. At almost the lowest level*, CoCreateInstance doesn't care about higher-level concepts such as typelibs. It does care about IClassFactory, and in particular its CreateInstance method. Neither the CoGetClassObject function to get IClassFactory, nor the IClassFactory::CreateInstance method use the typelib version number, or even assume that a typelib exists.
*At the very lowest level, you can even have COM objects without a class factory or CLSID, but those can't be created by CoCreateInstance.

Custom COM Implementation?

I'm looking to implement a custom implementation of COM in C++ on a UNIX type platform to allow me to dynamically load and link object oriented code. I'm thinking this would be based on a similar set of functionality that POSIX provides to load and call dll's ie dlopen, dlsym and dlclose.
I understand that the general idea of COM is that you link to a few functions ie QueryInterface, AddRef and Release in a common dll (Kernel32.dll) which then allows you to access interfaces which are just a table of function pointers encapsulated with a pointer to the object for which the function pointers should be called with. These functions are exposed through IUnknown which you must inherit off of.
So how does this all work? Is there a better way to dynamically link and load to object oriented code? How does inheritance from a dll work - does every call to the base class have to be to an exposed member function i.e private/protected/public is simply ignored?
I'm quite well versed in C++ and template meta-programming and already have a fully reflective C++ system i.e member properties, member functions and global/static functions that uses boost.
A couple of things to keep in mind:
The power of COM comes largely from the IDL and the midl compiler. It allows a very succinct definition of the objects and interfaces, with all the C/C++ boilerplate generated for you.
COM registration. On Windows the class IDs (CLSID) are recorded in the registry, where they are associated with the executable. You must provide similar functionality in the UNIX environment.
The whole IUnknown implementation is fairly trivial, except for QueryInterface, which has to work even when implemented in C (i.e. without RTTI).
A whole other aspect of COM is IDispatch - i.e. late bound method invocation and discovery (read only reflection).
Have a look at XPCOM as it is a multi-platform COM like environment. This is really one of those things you are better off leveraging other technologies. It can suck up a lot of the time better spent elsewhere.
I'm looking to implement a custom implementation of COM in C++ on a UNIX type platform to allow me to dynamically load and link object oriented code. I'm thinking this would be based on a similar set of functionality that POSIX provides to load and call dll's ie dlopen, dlsym and dlclose.
At its simplest level, COM is implemented with interfaces. In C++, if you are comfortable with the idea of pure virtual functions, or abstract base classes, then you already know how to define an interface in C++:
struct IMyInterface {
    virtual void Method1() = 0;
    virtual void Method2() = 0;
};
The COM runtime provides a lot of extra services that apply to the Windows environment but aren't really needed when implementing "mini" COM in a single application as a means to dynamically link to a more OO interface than traditionally allowed by dlopen, dlsym, etc.
COM objects are implemented in .dll, .so or .dylib files depending on your platform. These files need to export at least one function that is standardized: DllGetClassObject
In your own environment you can prototype it however you want, but to interop with the COM runtime on Windows the name and parameters obviously need to conform to the COM standard.
The basic idea is, this is passed a pointer to a GUID - 16 bytes that uniquely are assigned to a particular object, and it creates (based on the GUID) and returns the IClassFactory* of a factory object.
The factory object is then used, by the COM runtime, to create instances of the object when the IClassFactory::CreateInstance method is called.
So, so far you have
A dynamic library exporting at least one symbol, named "DllGetClassObject" (or some variant thereof).
A DllGetClassObject function that checks the passed-in GUID to see if and which object is being requested, and then performs a "new CSomeObjectClassFactory".
A CSomeObjectClassFactory implementation that implements (derives from) IClassFactory, and implements the CreateInstance method to "new" instances of CSomeSupportedObject.
CSomeSupportedObject, which implements a custom or COM-defined interface that derives from IUnknown. This is important because IClassFactory::CreateInstance is passed an IID (again, a 16-byte unique id, defining an interface this time) that it will need to QueryInterface on the object for.
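The four pieces listed above can be sketched in portable C++. This is a sketch under stated assumptions, not the Windows API: Guid, Result, IUnknownLike, IClassFactoryLike, IGreeter, the RefCounted helper, CreateAndQuery and all the id values are made up for illustration, and the signature of DllGetClassObject here only resembles the real one.

```cpp
#include <array>
#include <cstdint>

using Guid = std::array<std::uint8_t, 16>;  // 16-byte id, as in COM
using Result = long;                        // stand-in for HRESULT
constexpr Result kOk = 0;
constexpr Result kNoInterface = -1;         // plays the role of E_NOINTERFACE
constexpr Result kClassNotAvailable = -2;   // plays the role of CLASS_E_CLASSNOTAVAILABLE

// Made-up ids, valid for this sketch only.
constexpr Guid IID_ClassFactoryLike = {1};
constexpr Guid IID_Greeter = {2};
constexpr Guid CLSID_Greeter = {3};

struct IUnknownLike {
    virtual Result QueryInterface(const Guid& iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IUnknownLike() = default;  // objects die only through Release()
};

struct IClassFactoryLike : IUnknownLike {
    virtual Result CreateInstance(const Guid& iid, void** out) = 0;
};

struct IGreeter : IUnknownLike {
    virtual int Answer() = 0;
};

// Reference counting shared by the two classes below.
template <class Interface>
class RefCounted : public Interface {
public:
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        const unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }
private:
    unsigned long refs_ = 0;
};

class CSomeSupportedObject final : public RefCounted<IGreeter> {
public:
    Result QueryInterface(const Guid& iid, void** out) override {
        *out = nullptr;
        if (iid != IID_Greeter) return kNoInterface;
        *out = static_cast<IGreeter*>(this);
        AddRef();
        return kOk;
    }
    int Answer() override { return 42; }
};

// Creates an object, asks it for iid, and lets it die if the query fails.
template <class T>
Result CreateAndQuery(const Guid& iid, void** out) {
    T* obj = new T;
    obj->AddRef();
    const Result r = obj->QueryInterface(iid, out);
    obj->Release();
    return r;
}

class CSomeObjectClassFactory final : public RefCounted<IClassFactoryLike> {
public:
    Result QueryInterface(const Guid& iid, void** out) override {
        *out = nullptr;
        if (iid != IID_ClassFactoryLike) return kNoInterface;
        *out = static_cast<IClassFactoryLike*>(this);
        AddRef();
        return kOk;
    }
    Result CreateInstance(const Guid& iid, void** out) override {
        return CreateAndQuery<CSomeSupportedObject>(iid, out);
    }
};

// The single entry point a library would export (with extern "C" in practice).
Result DllGetClassObject(const Guid& clsid, const Guid& iid, void** out) {
    *out = nullptr;
    if (clsid != CLSID_Greeter) return kClassNotAvailable;
    return CreateAndQuery<CSomeObjectClassFactory>(iid, out);
}
```

A host would call DllGetClassObject with a class id, ask the returned factory for an IGreeter via CreateInstance, and call Release on both when done.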
I understand that the general idea of COM is that you link to a few functions ie QueryInterface, AddRef and Release in a common dll (Kernel32.dll) which then allows you to access interfaces which are just a table of function pointers encapsulated with a pointer to the object for which the function pointers should be called with. These functions are exposed through IUnknown which you must inherit off of.
Actually, COM is implemented by OLE32.dll, which exposes a C API called CoCreateInstance. The app passes CoCreateInstance a GUID, which it looks up in the Windows registry, which has a DB of GUID -> "path to dll" mappings. OLE/COM then loads (dlopen) the DLL and calls its DllGetClassObject (dlsym) function, passing in the GUID again. Presuming that succeeds, OLE/COM then calls CreateInstance and returns the resulting interface to the app.
So how does this all work? Is there a better way to dynamically link and load to object oriented code? How does inheritance from a dll work - does every call to the base class have to be to an exposed member function i.e private/protected/public is simply ignored?
Implicit inheritance of C++ code from a dll/so/dylib works by exporting every method in the class as a "decorated" symbol. The method name is decorated with the class and the type of every parameter. This is the same way the symbols are exported from static libraries (.a or .lib files iirc). Whether the library is static or dynamic, "private", "protected" etc. are always enforced by the compiler when it parses the header files, never by the linker.
I'm quite well versed in C++ and template meta-programming and already have a fully reflective C++ system i.e member properties, member functions and global/static functions that uses boost.
C++ classes can typically only be exported from DLLs with static linkage, i.e. DLLs that are loaded at load time, not via dlopen at runtime. COM allows C++ interfaces to be dynamically loaded by ensuring that all datatypes used in COM are either POD types or pure virtual interfaces. If you break this rule by defining an interface that tries to pass a boost or any other type of object, you will quickly get into a situation where the compiler/linker needs more than just the header file to figure out what's going on, and your carefully prepared "com" DLL will have to be statically or implicitly linked in order to function.
The other rule of COM is: never pass ownership of an object across a dynamic library boundary, i.e. never return an interface or data from a DLL and require the app to delete it. Interfaces all need to implement IUnknown, or at least a Release() method, that allows the object to perform a delete this. Any returned data types likewise must have a well-known de-allocator - if you have an interface with a method called "CreateBlob", there should probably be a buddy method called "DeleteBlob".
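The CreateBlob/DeleteBlob pairing could look roughly like this. The Blob layout, the LiveBlobs counter and the BlobPtr/MakeBlob helpers are assumptions made for the sketch; only the two function names come from the text above. The idea is that allocation and deallocation both happen inside the library, and the caller merely holds a handle.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Plain data that may cross the library boundary.
struct Blob {
    char* data;
    std::size_t size;
};

// --- library side: allocation and deallocation live in the same module ---
static int g_liveBlobs = 0;  // bookkeeping for this sketch only

inline int LiveBlobs() { return g_liveBlobs; }

Blob* CreateBlob(const char* text) {
    Blob* b = new Blob;
    b->size = std::strlen(text);
    b->data = new char[b->size + 1];
    std::memcpy(b->data, text, b->size + 1);  // copies the terminating '\0' too
    ++g_liveBlobs;
    return b;
}

void DeleteBlob(Blob* b) {
    if (!b) return;
    delete[] b->data;
    delete b;
    --g_liveBlobs;
}

// --- application side: never calls delete itself, always hands the blob back ---
using BlobPtr = std::unique_ptr<Blob, void (*)(Blob*)>;

inline BlobPtr MakeBlob(const char* text) {
    return BlobPtr(CreateBlob(text), &DeleteBlob);
}
```

Wrapping the handle in a unique_ptr with DeleteBlob as its deleter is one way to make sure the application cannot forget the buddy call; it is a client-side convenience, not something the library has to know about.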
To really understand how COM works, I suggest reading "Essential COM" by Don Box.
Look at the CORBA documentation, at System.ComponentModel in the sscli, the XPCOM parts of the Mozilla codebase. Miguel de Icaza implemented something like OLE in GNOME called Bonobo which might be useful as well.
Depending on what you're doing with C++ though, you might want to look at plugin frameworks for C++ like Yehia. I believe Boost also has something similar.
Edit: pugg seems better maintained than Yehia at the moment. I have not tried it though.
The basic design of COM is pretty simple.
All COM objects expose their functionality through one or more interfaces
All interfaces are derived from the IUnknown interface, thus all interfaces have QueryInterface, AddRef & Release methods as the first 3 methods of their virtual function table in a known order
All objects implement IUnknown
Any interface that an object supports can be queried from any other interface.
Interfaces are identified by Globally Unique Identifiers. These are called IIDs, GUIDs or CLSIDs, but they are all really the same thing.
Where COM gets complex is in how it deals with allowing interfaces to be called from outside the process where the object resides. COM marshalling is a nasty, hairy, beast. Made even more so by the fact that COM supports both single threaded and multi-threaded programming models.
The Windows implementation of COM allows objects to be registered (the original use of the Windows registry was for COM). At a minimum the COM registry contains the mapping between the unique GUID for a COM object and the library (DLL) that contains its code.
For this to work, DLLs that implement COM objects must have a ClassFactory - an entry point in the DLL with a standard name that can be called to create one of the COM objects the DLL implements. (In practice, Windows COM gets an IClassFactory object from this entry point, and uses that to create other COM objects.)
So that's the 10 cent tour, but to really understand this, you need to read Essential COM by Don Box.
You may be interested in the (not-yet)Boost.Extension library.

is DISPID_VALUE reliable for invokes on IDispatchs from scripts?

Continuing from this question, I am confused whether DISPID_VALUE on IDispatch::Invoke() for script functions and properties (JavaScript in my case) can be considered standard and reliable for invoking the actual function that is represented by the IDispatch.
If yes, is that mentioned anywhere in MSDN?
Please note that the question is about whether that behaviour can be expected, not what some interfaces I can't know in advance might look like.
A simple use case would be:
// usage in JavaScript
myObject.attachEvent("TestEvent", function() { alert("rhubarb"); });
// handler in ActiveX, MyObject::attachEvent(), C++
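// pHandler (the IDispatch* received by attachEvent) and the LCID are assumed names
pHandler->Invoke(DISPID_VALUE, IID_NULL, LOCALE_USER_DEFAULT,
                 DISPATCH_METHOD, par, res, ex, err);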
edit: tried to clarify the question.
It should be reliable for invokes on objects from scripts if the script defines it consistently. This should be the case for JScript/JavaScript in MSHTML, but unfortunately documentation on the subject is really sparse, and I don't have any solid proof in hand.
In my own experience, a JavaScript function passed to attachEvent() should always be consistent - an object received that is a 'function' can only have one callable method that matches itself. Hence the default method is the only one you can find, with DISPID 0. JavaScript functions don't ordinarily have member functions, although I'm sure there is a way for this to be possible. If it did have member functions, you would see them the same way as member functions on objects. Member functions in JScript will always be consistent with regard to IDispatchEx, according to the rules of expando functions, as any functions added to an object count as expandos.
See: IDispatchEx interface (MSDN)
The default method or property that DISPID_VALUE invokes should be consistent for a given interface. That method/property has to be specified as DISPID_VALUE in the definition of the interface in the IDL for the type library. The only way it could change is if the owner of the interface released a new version of the interface that changed which method/property was the default but that would violate a fundamental rule of COM interfaces.
As meklarian said, DISPID_VALUE (0) seems to work pretty consistently for JS functions (thus it works great with a custom attachEvent). I've been using them this way for about a year, and it's always worked. I've also found that with an ActiveX control embedded with an <object> tag, to get it to work consistently I need to implement IConnectionPointContainer and IConnectionPoint for the main (object tag) IDispatch-implementing CComObject, but for any others that I expose to JavaScript as return values from methods or properties (through Invoke) I have to implement attachEvent and detachEvent myself.
When using Connection Points, the IDispatch objects in question will expect events to be fired to the same DISPID as they are attached to on your IDispatch object.
see for an example of implementing the ConnectionPoints.
You can add DISPIDs to a DISPINTERFACE, but you cannot change them once it has been published. If you need to, you can use IDispatch::GetIDsOfNames to map names to DISPIDs.
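The relationship between names, fixed ids and the default member can be illustrated with a small portable model. This is not the IDispatch API: DispatchLike, Publish, GetIdOfName, MakeScriptFunction and the member name "handler" are hypothetical, and arguments are reduced to a single string. Only the two id values match the real constants (DISPID_VALUE is 0 and DISPID_UNKNOWN is -1).

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

using DispId = long;
constexpr DispId kDispIdValue = 0;     // same value as DISPID_VALUE
constexpr DispId kDispIdUnknown = -1;  // same value as DISPID_UNKNOWN

// Hypothetical stand-in for an object that is called through IDispatch.
class DispatchLike {
public:
    using Method = std::function<std::string(const std::string&)>;

    // Registers a member under a fixed id; ids must not change once published.
    void Publish(const std::string& name, DispId id, Method m) {
        ids_[name] = id;
        methods_[id] = std::move(m);
    }

    // Analogue of IDispatch::GetIDsOfNames for a single name.
    DispId GetIdOfName(const std::string& name) const {
        auto it = ids_.find(name);
        return it == ids_.end() ? kDispIdUnknown : it->second;
    }

    // Analogue of IDispatch::Invoke; returns false for an unknown id.
    bool Invoke(DispId id, const std::string& arg, std::string* result) const {
        auto it = methods_.find(id);
        if (it == methods_.end()) return false;
        *result = it->second(arg);
        return true;
    }

private:
    std::map<std::string, DispId> ids_;
    std::map<DispId, Method> methods_;
};

// Models a script function: its only callable member is the default one (id 0).
inline DispatchLike MakeScriptFunction() {
    DispatchLike f;
    f.Publish("handler", kDispIdValue,
              [](const std::string& a) { return "called:" + a; });
    return f;
}
```

A caller that received such an object from attachEvent would invoke id 0 directly, without any name lookup, which is the pattern the answers above describe.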
Pick up a copy of Inside OLE (2nd ed) and Inside OLE 2 (2nd ed) for a few bucks used on Amazon. It's a good reference for these obscure OLE incantations.