My COM-implementing classes take parameters as BSTR (and VARIANT*), and these get passed around internally. Generally we convert them to std::string or std::wstring, but sometimes they are just passed along and end up being sent into another COM call.
In these cases is it better to pass raw COM types around, or wrap them in helper-classes like _bstr_t and _variant_t?
If you do nothing special with a BSTR but just pass it to another method, then you don't have to wrap it. You can see the BSTR as an opaque pointer.
_bstr_t (or CComBSTR, another wrapper provided with Visual Studio) is useful when you need to allocate a BSTR and don't want to manage the memory yourself (and want to be sure you're not leaking it), but neither is mandatory.
PS: unless I need to output a BSTR to a program that doesn't understand Unicode, I would never use an intermediary std::string to pass BSTRs around, as I would risk losing information. std::wstring is better.
I am trying to make a DLL that exposes a certain API, and I want a safe way to copy strings across the DLL boundary. The DLL side is straightforward: every function that returns a string value takes two arguments, char* and size_t&. If the size is big enough, I memcpy the contents of the string from within the DLL to the given pointer, set the size to the actual length and return a success code. If it is not, I set the size to what it should be and return an error code.
Now the more complicated part: how to write a nice template function which, given a pointer to some function from the DLL, does all the correct manipulations to fill an instance of std::string. This is what I came up with:
template<typename I>
CErrorCode GetStringValue(const I& Instance, CErrorCode(I::*pMethod)(char*, size_t&) const, std::string& sValue)
{
    std::string sTemporaryValue;
    size_t nValueLength = sTemporaryValue.capacity();
    sTemporaryValue.resize(nValueLength);
    do
    {
        auto eErrorCode = (Instance.*pMethod)(const_cast<char*>(sTemporaryValue.c_str()), nValueLength);
        if (eErrorCode == CErrorCode::BufferTooSmall)
        {
            sTemporaryValue.resize(nValueLength);
        }
        else
        {
            if (eErrorCode == CErrorCode::NoError)
            {
                sTemporaryValue.resize(nValueLength);
                sValue = std::move(sTemporaryValue);
            }
            return eErrorCode;
        }
    }
    while (true);
}
So I did the initial resize because I can't do that after the first call (since then it would erase the content, since the string is initially empty). But resize fills the string with zero characters, which kind of upsets me (I mean, I know I am going to be doing the filling myself anyway). And then I need the resize even after a successful run, since if the length was actually smaller I need to resize down. Any suggestions on how this can be done in a nicer way?
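One way to tidy up the loop, sketched under some assumptions: start with an empty buffer so the first call only asks for the size, and write through &sBuffer[0] (contiguous and writable since C++11) instead of const_cast on c_str(). CFakeInstance and its GetName method are hypothetical stand-ins for the DLL object, added only so the sketch compiles on its own; the CErrorCode values are the two named in the question. The zero-fill done by resize() remains; as far as I know, only C++23's std::string::resize_and_overwrite avoids it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-ins for the DLL's types, only so the sketch compiles.
enum class CErrorCode { NoError, BufferTooSmall };

struct CFakeInstance {
    std::string sStored;

    // Mimics the contract from the question: copy if the buffer is large
    // enough, otherwise report the required size.
    CErrorCode GetName(char* pBuffer, std::size_t& nSize) const {
        if (nSize < sStored.size()) {
            nSize = sStored.size();
            return CErrorCode::BufferTooSmall;
        }
        std::memcpy(pBuffer, sStored.data(), sStored.size());
        nSize = sStored.size();
        return CErrorCode::NoError;
    }
};

template <typename I>
CErrorCode GetStringValue(const I& Instance,
                          CErrorCode (I::*pMethod)(char*, std::size_t&) const,
                          std::string& sValue)
{
    assert(pMethod != nullptr);
    std::string sBuffer;
    std::size_t nLength = 0;  // the first call effectively just asks for the size
    for (;;) {
        // &sBuffer[0] is valid even for an empty string (it points at the terminator).
        const CErrorCode eErrorCode = (Instance.*pMethod)(&sBuffer[0], nLength);
        if (eErrorCode == CErrorCode::BufferTooSmall) {
            sBuffer.resize(nLength);  // grow to the reported size and retry
            continue;
        }
        if (eErrorCode == CErrorCode::NoError) {
            sBuffer.resize(nLength);  // shrink if the value turned out shorter
            sValue = std::move(sBuffer);
        }
        return eErrorCode;
    }
}
```

The loop also copes with a value that grows between the two calls, because a second BufferTooSmall simply triggers another resize.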
Since you are using DLLs, my understanding is that you are on Windows, so I'm going to suggest a Windows-specific solution.
Well, the problem of safely passing strings across module boundaries has already been solved on Windows by the COM memory allocator and BSTR.
So, instead of passing a string pointer and a size, checking whether the caller's buffer is large enough, returning the required size if it is not, and so on, it seems much simpler to me to just return BSTRs from the DLL.
BSTRs are allocated using a COM allocator, and so you can allocate them and free them across different module boundaries.
There are also nice C++ wrapper classes around a BSTR, e.g. _bstr_t or ATL's CComBSTR. You can use those at the caller's site.
Once you have a BSTR instance, possibly properly wrapped in a convenient RAII wrapper at the caller's site, you can easily convert it to a std::wstring.
Or, if you want to use the Unicode UTF-8 encoding and std::string, you can convert from the BSTR's native UTF-16 encoding to UTF-8, using WideCharToMultiByte() (this MSDN Magazine article can come in handy).
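To make the UTF-16 to UTF-8 step concrete, here is a portable sketch of what such a conversion does. On Windows you would normally call WideCharToMultiByte() with CP_UTF8 instead; the function name Utf16ToUtf8 and the choice to replace unpaired surrogates with U+FFFD are assumptions of this sketch, not something the API or the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a UTF-16 sequence as UTF-8.
inline std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size() * 3);  // upper bound: 3 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool isLow  = cp >= 0xDC00 && cp <= 0xDFFF;
        if (isHigh && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (isHigh || isLow) {
            cp = 0xFFFD;  // unpaired surrogate: substitute the replacement character
        }
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Because the input carries an explicit length, embedded nulls pass through unchanged, which matters for BSTRs.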
Bonus reading: Eric’s Complete Guide To BSTR Semantics
P.S.
If you don't want to use the BSTR type, you can still use a common memory allocator provided by COM: e.g. you can allocate your (UTF-8) string memory using CoTaskMemAlloc() in the DLL, and free it using CoTaskMemFree() at the caller's site.
I want to call a 3rd party function from my code. The function's prototype is:
HRESULT foo(/*[in]*/VARIANT, /*[in]*/VARIANT*, /*[out]*/VARIANT*)
Currently I am using CComVariant wrappers around my VARIANT variables, and I want to pass them to this function. I am a little confused about how I should do it. For the input arguments, should I just pass them directly, or detach them into a plain VARIANT and pass that? As far as I can see, both versions work, but which one is the cleaner and recommended method? My current code is something like this:
CComVariant param0(CComBSTR( "Hello world"));
CComVariant param1(VARIANT_FALSE);
CComVariant param2;
HRESULT hr = foo(param0, &param1, &param2);
Yes, this code is okay and can hardly be done better given how ATL::CComVariant is designed unless you spend some time writing helper functions. A couple of thoughts on that.
Passing CComVariant as in VARIANT is okay - the base part is just member-wise copied into the function parameter and used by the called function from there. No deep copying happens and that's okay, the callee can use the copy just as well, if it wants to deep copy the parameter or AddRef() it - it still can do so.
Passing the address of a CComVariant as an in VARIANT* is okay (CComVariant* is implicitly upcast to VARIANT*, and the function accesses the subobject), but it's not the best code in the world. Here's why. There are a lot of similar classes which own resources; the most important are ATL::CComBSTR, _bstr_t, ATL::CComPtr and _com_ptr_t. All of them have operator&() overloaded, but those in the ATL namespace just return the pointer to the stored pointer, while _bstr_t and _com_ptr_t release the owned object before returning the pointer. Which is why this:
ATL::CComPtr<IUnknown> ptr( initialize() );
&ptr;
and this:
_com_ptr_t<IUnknown> ptr( initialize() );
&ptr;
have different effects. Neither ATL::CComVariant nor _variant_t has operator&() overloaded, which complicates things a bit further. The cleaner way would be to use an "accessor extractor" method such as _variant_t::GetVARIANT(), but ATL::CComVariant has no such method. So using & is the only option you have here.
Passing CComVariant address obtained using & as out VARIANT* is okay, but again not the best code in the world. It's fine when you know for sure the variant is empty but this code:
CComVariant var( obtainIUnknown() );
foo( whatever, whatever, &var );
will blindly overwrite the pointer and cause the previously referenced object to be leaked. A better way would be to use something like _variant_t::GetAddress() which clears the variant and then returns a pointer to raw VARIANT but again ATL::CComVariant has no such method.
So the bottom line is that your code is okay; just be careful when modifying it. If you happen to write a lot of code like this, you may be better off writing helper functions which do those "extraction" manipulations.
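The out-parameter leak and the kind of helper suggested above can be illustrated without any Windows headers. Everything in this sketch is a hypothetical stand-in: FakeObject plays a reference-counted COM object, RawVariant plays VARIANT, OwningVariant plays CComVariant, and FillOut plays a callee like foo() that overwrites its out-parameter blindly. With the real types, the helper would presumably call CComVariant::Clear() before taking the address.

```cpp
#include <cassert>

// Hypothetical stand-ins so the sketch compiles anywhere; they are NOT the
// real COM/ATL types.
struct FakeObject { int refs = 0; };               // plays a COM object
struct RawVariant { FakeObject* obj = nullptr; };  // plays VARIANT

// Plays CComVariant: owns one reference to the object it holds.
class OwningVariant : public RawVariant {
public:
    OwningVariant() = default;
    explicit OwningVariant(FakeObject* o) { obj = o; if (obj) ++obj->refs; }
    OwningVariant(const OwningVariant&) = delete;
    OwningVariant& operator=(const OwningVariant&) = delete;
    ~OwningVariant() { Clear(); }
    void Clear() { if (obj) { --obj->refs; obj = nullptr; } }
};

// The helper: release whatever the wrapper holds, then expose the raw
// address for use as an out-parameter.
inline RawVariant* ClearAndGetAddress(OwningVariant& v) {
    v.Clear();
    return &v;  // OwningVariant* converts implicitly to RawVariant*
}

// Plays the callee: overwrites the out-parameter without looking at it.
inline void FillOut(RawVariant* out, FakeObject* result) {
    assert(out != nullptr && result != nullptr);
    ++result->refs;
    out->obj = result;
}
```

Calling FillOut(&v, ...) on a non-empty OwningVariant leaves the old object's count stuck at 1, which is the leak described above; going through ClearAndGetAddress(v) drops it to 0 first.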
BSTR is this weird Windows data type with a few specific uses, such as COM functions. According to MSDN, it contains a WCHAR string and some other stuff, like a length descriptor. Windows is also nice enough to give us the _bstr_t class, which encapsulates BSTR; it takes care of the allocation and deallocation and gives you some extra functionality. It has four constructors, including one that takes in a char* and one that takes in a wchar_t*. MSDN's description of the former: "Constructs a _bstr_t object by calling SysAllocString to create a new BSTR object and then encapsulates it. This constructor first performs a multibyte to Unicode conversion."
It also has operators that can extract a pointer to the string as any of char*, const char*, and wchar_t*, and I'm pretty sure those are null-terminated, which is cool.
I've spent a while reading up on how to convert between multibyte and Unicode, and I've seen a lot of talk about how to use mbstowcs and wcstombs, and how MultiByteToWideChar and WideCharToMultiByte are better because encodings may differ, and blah blah blah. It all kind of seems like a headache, so I'm wondering whether I can just construct a _bstr_t and use the operators to access the strings, which would be... a lot fewer lines of code:
const char* multi = "asdf";
_bstr_t bs = _bstr_t(multi);
wchar_t* wide = (wchar_t*)bs; // assume read-only
I guess my intuitive answer to this is that we don't know what Windows is doing behind the scenes, so if I have a problem using mbstowcs/wcstombs (I guess I really mean mbstowcs_s/wcstombs_s) rather than MultiByteToWideChar/WideCharToMultiByte, I shouldn't risk it because it's possible that Windows uses those. (It's almost certainly not using the latter, since I'm not specifying a "code page" here, whatever that is.) Honestly I'm not sure yet whether I consider the mbstowcs_s and wcstombs_s functions OK for my purposes, because I don't really have a grasp on all of the different encodings and stuff, but that's a whole different question and it seems to be addressed all over the Internet.
Sooooo, is there anything wrong with doing this, aside from that potential concern?
Using _bstr_t::_bstr_t(const char*) is not exactly a good idea in production code:
Constructs a _bstr_t object by calling SysAllocString to create a new BSTR object and encapsulate it. This constructor first performs a multibyte to Unicode conversion. If s2 is too large, you [sic] may generate a stack overflow error. In such a situation, convert your char* to a wchar_t with MultiByteToWideChar and then call the wchar_t * constructor.
Besides that _bstr_t::operator wchar_t*() const throw() seems barely useful. It's just for struct member extraction, so you're constrained to a const:
These operators can be used to extract raw pointers to the encapsulated Unicode or multibyte BSTR object. The operators return the pointer to the actual internal buffer, so the resulting string cannot be modified.
So _bstr_t is just a helper object for encapsulating BSTRs, and a mediocre one at that. Conversion using MultiByteToWideChar and WideCharToMultiByte is a much better choice, for multiple reasons:
It's much less prone to crash.
You don't get a const buffer in return, because you provide your own.
The names of those functions are self-descriptive. Conversion through a constructor and casting operator of an unrelated type is not.
I have a "const char* str" with a very long string.
I need to pass it from a cpp client to a .Net COM method which expects BSTR type.
Currently I use:
CComBSTR bstr = str;
This has the following issues:
Sometimes this line fails with out of memory message
When I pass the bstr to the COM class it takes a lot of memory (much more than the string size) so it can fail with out of memory
Questions:
Am I converting to CComBSTR wisely? e.g. is there a way to use the heap or something
Is it better to use BSTR instead?
Any other suggestion is also welcomed...
If a method expects a BSTR, passing a BSTR is the only correct way.
To convert char* to a BSTR you use MultiByteToWideChar() Win32 API function for conversion and SysAllocStringLen() for memory allocation. You can't get around that - you need SysAllocStringLen() for memory allocation because otherwise the COM server will fail if it calls SysStringLen().
When you use CComBSTR and assign a char* to it the same sequence is run - ATL is available as headers and you can enjoy reading it to see how it works. So in fact CComBSTR does exactly the minimal set of necessary actions.
When you pass a BSTR to a COM server CComBSTR::operator BSTR() const is called that simply returns a pointer to the wrapped BSTR - the BSTR body is not copied. Anything that happens next is up to the COM server or the interop being used - they decide for themselves whether they want to copy the BSTR body or just read it directly.
Your only bet for resolving the memory outages is to change the COM interface so that it accepts some reader and requests the data in chunks through that reader.
Is it an in-process COM server, and do you have the code for it, or is it third-party? I ask because, if it is in-process and you can change it, you can pass the actual char* pointer to the COM server and not pay the price of allocate + copy + free. You will need to add a new method/property that will be available only to C++ clients.
Instead of passing BSTR you can wrap your char* in a Stream interface, the .NET server should get a Stream instead of a string.
On the C++ side, implement a COM class that supports the IStream COM interface: a read-only stream which wraps the char*. You can pass this object to the .NET server as a UCOMIStream interface.
On the .NET side, use the UCOMIStream methods to read the string; be careful not to read the entire stream in one pass.
A CComBSTR is a wrapper around a BSTR, which in turn is a counted Unicode string with special termination.
You would thus expect it to be about twice the size of the corresponding char* form (for character sets that mainly use single-byte characters).
Using a CComBSTR is a good idea in general, since the destructor will free the memory associated with the BSTR for you.
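The "about twice the size" estimate can be made a bit more precise. A BSTR is documented as a four-byte length prefix, followed by the UTF-16 characters, followed by a two-byte null terminator. The sketch below just does that arithmetic; the function name BstrPayloadBytes is made up for illustration, and allocator overhead or any copies made by the interop layer are not counted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes occupied by a BSTR holding `charCount` UTF-16
// code units, i.e. 4-byte length prefix + 2 bytes per unit + 2-byte terminator.
inline std::size_t BstrPayloadBytes(std::size_t charCount) {
    assert(charCount <= (SIZE_MAX - 6) / 2);  // guard against overflow
    return 4 + 2 * charCount + 2;
}
```

So a char* string of one million single-byte characters needs 2,000,006 bytes as a BSTR, before counting the original buffer, which still exists alongside it.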
I have a BSTR object whose contents I would like to copy to a wchar_t array. The tricky thing is that the length of the BSTR could be anywhere from a few kilobytes to a few hundred kilobytes. Is there an efficient way of copying the data across? I know I could just declare a wchar_t array and always allocate the maximum possible size it would ever need to hold. However, this would mean allocating hundreds of kilobytes for something that might only require a few kilobytes. Any suggestions?
First, you might not actually have to do anything at all, if all you need to do is read the contents. A BSTR type is a pointer to a null-terminated wchar_t array already. In fact, if you check the headers, you will find that BSTR is essentially defined as:
typedef wchar_t* BSTR;
So, the compiler can't distinguish between them, even though they have different semantics.
There are two important caveats.
BSTRs are supposed to be immutable. You should never change the contents of a BSTR after it has been initialized. If you "change it", you have to create a new one assign the new pointer and release the old one (if you own it).
[UPDATE: this is not true; sorry! You can modify BSTRs in place; I very rarely have had the need.]
BSTRs are allowed to contain embedded null characters, whereas traditional C/C++ strings are not.
If you have a fair amount of control over the source of the BSTR, and can guarantee that the BSTR does not have embedded NULLs, you can read from the BSTR as if it were a wchar_t string and use conventional string functions (wcscpy, etc.) to access it. If not, your life gets harder. You will have to always manipulate your data either as more BSTRs or as a dynamically allocated array of wchar_t. Most string-related functions will not work correctly.
Let's assume you control your data, or don't worry about NULLs. Let's assume also that you really need to make a copy and can't just read the existing BSTR directly. In that case, you can do something like this:
UINT length = SysStringLen(myBstr);        // Ask COM for the length of the BSTR, in characters
wchar_t *myString = new wchar_t[length+1]; // Note: SysStringLen doesn't
                                           // include the space needed for the NULL
wcscpy(myString, myBstr);                  // Or your favorite safer string function
// ...
delete[] myString;                         // Done; memory from new[] must be freed with delete[]
If you are using class wrappers for your BSTR, the wrapper should have a way to call SysStringLen() for you. For example:
CComBSTR: use .Length()
_bstr_t: use .length()
UPDATE: This is a good article on the subject by someone far more knowledgeable than me:
"Eric [Lippert]'s Complete Guide To BSTR Semantics"
UPDATE: Replaced strcpy() with wcscpy() in the example.
BSTR objects contain a length prefix, so finding out the length is cheap. Find out the length, allocate a new array big enough to hold the result, process into that, and remember to free it when you're done.
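A minimal sketch of that length-based copy, written portably so it does not depend on Windows headers. Letting std::wstring own the copy avoids the manual free, and copying by explicit length keeps embedded nulls. CopyCounted is a hypothetical name; on Windows the call would presumably look like CopyCounted(myBstr, SysStringLen(myBstr)).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: copies exactly `length` characters from a counted
// wide string, so embedded nulls survive. A null pointer is treated as an
// empty string, mirroring how a null BSTR is conventionally read.
inline std::wstring CopyCounted(const wchar_t* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    return data ? std::wstring(data, length) : std::wstring();
}
```

By contrast, anything based on wcslen or wcscpy stops at the first null, so for the seven-character buffer L"abc\0def" it would see only "abc".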
There is never any need for conversion. A BSTR pointer points to the first character of the string and it is null-terminated. The length is stored before the first character in memory. BSTRs are always Unicode (UTF-16/UCS-2). There was at one stage something called an 'ANSI BSTR' - there are some references in legacy APIs - but you can ignore these in current development.
This means you can safely pass a BSTR to any function expecting a wchar_t*.
In Visual Studio 2008 you may get a compiler error, because BSTR is defined as a pointer to unsigned short, while wchar_t is a native type. You can either cast or turn off native wchar_t with /Zc:wchar_t-.
One thing to keep in mind is that BSTR strings can, and often do, contain embedded nulls. A null does not mean the end of the string.
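Use ATL and CStringT; then you can just use the assignment operator. Or you can use the USES_CONVERSION macros. These allocate their temporary buffers on the stack (via _alloca), so nothing leaks, but for strings of several hundred kilobytes, or inside a loop, they can overflow the stack.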