Problems reading std::string from std::stringstream in C++/CLI code - c++

I have a piece of code to read the build date and month like so. Here __DATE__ is as defined in Predefined Macros
const char* BUILD_DATE = __DATE__;
std::stringstream ss(BUILD_DATE);
std::string month;
size_t year;
ss >> month;
ss >> year;
ss >> year;
char buffer[1024];
sprintf(buffer, "Read year=[%d] month=[%s] date=[%s]", year,month.c_str(),BUILD_DATE);
When it works correcty the buffer is usually
Read year=[2013] month=[Mar] date=[Mar 9 2013]
But on some runs it is
Read year=[0] month=[M] date=[Mar 9 2013]
or
Read year=[2013] month=[Mar ] date=[Mar 9 2013]
Basically the year is either 0 or the month has an extra space.
The project is an x64/CLR build using Microsoft Visual Studio 2010 SP1 on Windows 7 box.
I am stumped as to why this happens occasionally. Am I using stringstream incorrectly ?

I was initially tempted to just delete the question but I thought I would share my findings in case some other poor soul encounters the same issue. The problem was very mysterious, never occurred on multiples runs of my application and only occurred while testing and never while debugging a test.
This innocent looking function
const char* BUILD_DATE = __DATE__;
std::stringstream ss(BUILD_DATE);
std::string month;
size_t year;
ss >> month;
ss >> year;
ss >> year;
was implemented in a C++/CLI dll. Before I can dwell into the details let me explain how the stringstream reads month and year here. To figure out how many characters constitute the month variable ss >> month needs to break the ss string buffer by space. The way it does it by using the current locale and in particular a facet of it called ctype. The ctype facet has function called ctype::is which can tell whether a character is space or not. In a well behaved C++ application everything just works according to standard. Now lets assume for some reason the ctype facet gets corrupted.
Viola, operator >> cannot determine what is a space and what isn't and cannot parse correctly. This was precisely what was happening in my case and below are the details.
The rest of the answer only applies to the std c++ library as provided by Visual Studio 2010 and how it operates under C++/CLI.
Consider some code like so
struct Foo
{
Foo()
{
x = 42;
}
~Foo()
{
x = 45;
}
int x;
};
Foo myglobal;
void SimpleFunction()
{
int myx = myglobal.x;
}
int main()
{
SimpleFunction();
return 0;
}
Here myglobal is what you call a object with static storage duration guaranteed to be initialized before main is entered and in SimpleFunction you will always see myx
as 42. The lifetime of myglobal is what we typically call per-process as in it is valid for the lifetime of the problem. The Foo::~Foo destructor will only run after main has returned.
Enter C++/CLI and AppDomain
AppDomain as per msdn provides you with isolated environment where applications execute. For C++/CLI it introduces the notion of objects of what I would call appdomain storage duration
__declspec(appdomain) Foo myglobal;
So if you changed the definition of myglobal like above you can potentially have myglobal.x be a different value in different appdomains kind of like thread local storage. So regular C++ objects of static duration are initialized/cleaned during init/exit of your program. I am using init/exit/cleaned very loosely here, but you get the idea. Object of appdomain storage are initialized/cleaned during load/unload of the AppDomain.
A typical managed program only uses the default AppDomain so per-process / per-appdomain storage is pretty much the same.
In C++ the static initialization order fiasco is a very common mistake where objects of static storage duration during their initialization refer to other objects of static storage duration that may have not been initialized.
Now consider what happens when a per-process variable refers to per-domain variable. Basically after the AppDomain is unloaded the per-process variable will refer to junk memory. For those of you who are wondering what it has to do with the original problem, please bear with me just a tad more.
Visual studio use_facet implementation
std::use_facet is used to get a facet of interest from the locale. It is used by operator << to get the ctype facet. It is defined as
template <class Facet> const Facet& use_facet ( const locale& loc );
Notice it returns a reference to Facet. The way it is implemented by VC is
const _Facet& __CRTDECL use_facet(const locale& _Loc)
{ // get facet reference from locale
_BEGIN_LOCK(_LOCK_LOCALE) // the thread lock, make get atomic
const locale::facet *_Psave =
_Facetptr<_Facet>::_Psave; // static pointer to lazy facet
size_t _Id = _Facet::id;
const locale::facet *_Pf = _Loc._Getfacet(_Id);
if (_Pf != 0)
; // got facet from locale
else if (_Psave != 0)
_Pf = _Psave; // lazy facet already allocated
else if (_Facet::_Getcat(&_Psave, &_Loc) == (size_t)(-1))
#if _HAS_EXCEPTIONS
_THROW_NCEE(bad_cast, _EMPTY_ARGUMENT); // lazy disallowed
#else /* _HAS_EXCEPTIONS */
abort(); // lazy disallowed
#endif /* _HAS_EXCEPTIONS */
else
{ // queue up lazy facet for destruction
_Pf = _Psave;
_Facetptr<_Facet>::_Psave = _Psave;
locale::facet *_Pfmod = (_Facet *)_Psave;
_Pfmod->_Incref();
_Pfmod->_Register();
}
return ((const _Facet&)(*_Pf)); // should be dynamic_cast
_END_LOCK()
}
What is happening here is we ask the locale for the facet of interest and store it in
template<class _Facet>
struct _Facetptr
{ // store pointer to lazy facet for use_facet
__PURE_APPDOMAIN_GLOBAL static const locale::facet *_Psave;
};
the local cache _Psave so that subsequent calls to get the same facet are faster. The caller of use_facet is not responsible for the returned facet life management so how are these facets cleaned up. The secret is the last part of code with comment queue up lazy facet for destruction. The _Pfmod->_Register() eventually calls this
__PURE_APPDOMAIN_GLOBAL static _Fac_node *_Fac_head = 0;
static void __CLRCALL_OR_CDECL _Fac_tidy()
{ // destroy lazy facets
_BEGIN_LOCK(_LOCK_LOCALE) // prevent double delete
for (; std::_Fac_head != 0; )
{ // destroy a lazy facet node
std::_Fac_node *nodeptr = std::_Fac_head;
std::_Fac_head = nodeptr->_Next;
_DELETE_CRT(nodeptr);
}
_END_LOCK()
}
struct _Fac_tidy_reg_t { ~_Fac_tidy_reg_t() { ::_Fac_tidy(); } };
_AGLOBAL const _Fac_tidy_reg_t _Fac_tidy_reg;
void __CLRCALL_OR_CDECL locale::facet::_Facet_Register(locale::facet *_This)
{ // queue up lazy facet for destruction
_Fac_head = _NEW_CRT _Fac_node(_Fac_head, _This);
}
Pretty smart right. Adds all the new'd facets to a linked list and clean them all using a static object destructor. Except there is a slight problem. _Fac_tidy_reg is marked as _AGLOBAL meaning all created facets are destroyed on a per-appdomain level.
The locale::facet *_Psave on the other hand is declared __PURE_APPDOMAIN_GLOBAL which seems to eventually expand to meaning per-process. So after the appdomain is cleaned up the per-process _Psave could potentially point to deleted faceted memory. This was precisely my problem. The way VS2010 Unit testing occurs is a process called QTAgent runs all your tests. Theses tests seem to be done in different appdomains on different runs by the same QTAgent process. Most likely to isolate side effects of previous test runs to effect subsequent tests. This is all well and good for fully managed code where pretty much all static storage is either thread/appdomain level but for C++/CLI which use per-process/per-appdomain incorrectly it could be a problem. The reason why I could never debug the test and ever find the problem is because the UT infrastructure always seem to spawn a new QTAgent process for debugging which means a new appdomain and a new process that has none of these problems.

I suggest trying this to see the actual date string:
cout << "Raw date: " << ss.str() << "\n";
Or stepping with a debugger and looking at the ss variable after it is created.

Related

C++: How to pass user input through the system without using global variables?

I am having the problem, that my application can has a lot of user input which determines how the application will be run. The application is an in memory database system and the user could for example invoke the program with commands like '--pagesize 16384' (sets the memory page size to use), '--alignment 4096' (sets the memory alignment to use) or '--measure' (sets a flag to measure certain routines).
Currently I save all the user input in global variables which are defined as extern in a header file:
//#file common.hh
extern size_t PAGE_SIZE_GLOBAL;
extern size_t ALIGNMENT_GLOBAL;
extern size_t MEMCHUNK_SIZE_GLOBAL;
extern size_t RUNS_GLOBAL;
extern size_t VECTORIZE_SIZE_GLOBAL;
extern bool MEASURE_GLOBAL;
extern bool PRINT_GLOBAL;
extern const char* PATH_GLOBAL;
and in main source file:
#include "modes.hh"
size_t PAGE_SIZE_GLOBAL;
size_t ALIGNMENT_GLOBAL;
size_t MEMCHUNK_SIZE_GLOBAL;
size_t RUNS_GLOBAL;
size_t VECTORIZE_SIZE_GLOBAL;
bool MEASURE_GLOBAL;
bool PRINT_GLOBAL;
const char* PATH_GLOBAL;
int main(const int argc, const char* argv[]){
...
//Initialize the globals with user input
PAGE_SIZE_GLOBAL = lArgs.pageSize();
ALIGNMENT_GLOBAL = lArgs.alignment();
MEMCHUNK_SIZE_GLOBAL = lArgs.chunkSize();
RUNS_GLOBAL = lArgs.runs();
VECTORIZE_SIZE_GLOBAL = lArgs.vectorized();
MEASURE_GLOBAL = lArgs.measure();
PRINT_GLOBAL = lArgs.print();
std::string tmp = lArgs.path() + storageModel + "/";
PATH_GLOBAL = tmp.c_str();
...
}
I then include the header file common.hh in each file, where a global variable is needed (which can be very deep down in the system).
I already read a dozen times to prevent global variables so this is obviously bad style. In the book 'Code Complete 2' from Steve McConnell the chapter about global variables also stated to prevent global variables and use access routines instead. In the section 'How to Use Access Routines' he writes
"Hide data in a class. Declare that data by using the static keyword
(...) to ensure only a single instance of the data exists. Write
routines that let you look at the data and change it."
First of all, the global data won't change (maybe this is changed later but at least not in the near future). But I don't get how these access routines are any better? I will also have a class I need to include at every file where the data is needed. The only difference is the global data are static members accessed through getter functions.
(Edited) I also thought about using a global data Singleton class. But an object with ALL the global data sounds overkill since only a few global variables of the object are needed at its different destinations.
My Question: Should I just stick to the global variables? Are there better solutions, what am I missing? What are the best practices?
Edit:
If I would identify a few classes where the user input is needed the most, I could change the global data to member variables. What would be the best practice to pass the user input to these classes? Passing the data as parameters through the whole system down to the lowest layers sounds wrong. Is there are design pattern (thinking about something like a factory) which would be suited here?
How to pass user input through the system without using global
variables.
It is easy. Surprise, I created a class.
For a while, I called this class a travel case, because I considered it analogous to the needs of a suitcase during a trip. The TC_t is a non-standard container which held useful things for what is going on at your destination, and there is only one created, with references passed to any other objects that could use the information. Not global, in the strictest sense.
This TC_t is created in main() thread, while studying the command line options.
I recently wrote yet-another-game-of-life. User inputs included a) destination of output (i.e. a tty num), b) initial fill-pattern choices, c) 'overrides' for game board dimensions, d) test modes, including max speed, and vector vs. array options for cell behaviours.
The GOLUtil_t (Game Of Life Utility) (previously TC_t) includes methods that are useful in more than one effort.
For your question, the two typical globals I avoided are the a) gameBoard, and b) ansi terminal access.
std::cout << "accessing '" << aTermPFN << "' with std::ofstream "
<< std::endl;
std::ofstream* ansiTerm = new std::ofstream(aTermPFN);
if (!ansiTerm->is_open())
{
dtbAssert(nullptr != ansiTerm)(aTermPFN);
std::cerr << "Can not access '" << aTermPFN << "'" << std::endl;
assert(0); // abort
}
// create game-board - with a vector of cell*
CellVec_t gameBoard;
gameBoard.reserve (aMaxRow * aMaxCol);
GOLUtil_t gBrd(aMaxRow, aMaxCol, gameBoard, *ansiTerm);
This last line invoked the ctor of GOLUtil_t.
The instance "gBrd" is then passed (by reference) to the ctor of the game, and from there, to any aggregate objects it contained.
std::string retVal;
{
// initialize display, initialize pattern
GameOfLife_t GOL(gBrd, timeOfDay, fillPatternChoiceLetter, useArray);
std::string retValS = GOL.exec2(testMode);
retVal = gBrd.clearGameBoard(retValS); // delete all cells
}
// force GameOfLife_t dtor before close ansiTerm
ansiTerm->close();
Summary - No globals.
Every instance of any class that needed this info (where to output? what are dimensions?) has access to the GOLUtil_t for their entire lifetime. And GOLUtil_t has methods to lighten the coding load.
Note: because single output terminal, I used a single thread (main)
Your first refactor effort might be to:
a) remove the global classes,
b) and instead instantiate these in main() (for lifetime control)
c) and then pass-by-reference these formerly global instances to those non-global objects that make use of them. I recommend in the ctor(s).
d) remember to clean up (delete if new'd)
my environment: Ubuntu 15.10, 64 bit, g++ V5

visual c++ 2013 memory leak wostringstream, imbue & locale facets

Related to my previous question, I assumed that memory leak occurs in std::string, but taking a deeper look, I got some strange results. Let's begin:
Consider we have a global
static volatile std::wostringstream *Log = nullptr;
and in a WriteToLog() function we have following code:
std::wostringstream* new_log = new std::wostringstream(std::ios::in | std::ios::out);
new_log->imbue(CConsumer::GetUtf8Locale());
std::wostringstream* old_log = (std::wostringstream*)Log;
while((std::wostringstream *)::InterlockedCompareExchangePointer((PVOID volatile *)&Log, new_log, (PVOID)old_log) != new_log)
{
::SleepEx(10, FALSE);
}
std::string logtext(Hooker::Utf16ToUtf8(old_log->str()));
which utilizes proprietary:
static std::locale FORCEINLINE GetUtf8Locale()
{
static std::unique_ptr<std::codecvt_utf8_utf16<wchar_t>> code_cvt(new std::codecvt_utf8_utf16<wchar_t>(std::codecvt_mode::generate_header | std::codecvt_mode::little_endian));
return std::locale(std::locale(), code_cvt.get());
}
Since log events occurs occasionally, it generates enormous memory leak (from initial 5MB/500 handles it jumps to 200MB/300,000 handles in a matter of minutes).
Previously, I assumed it's a leak in relation to std::string, but, using Visual Studio Profiler, it shows that all leaks are caused by GetUtf8Locale().
Can anybody help me with this issue?
This answer is wrong and I overlooked the "Compare" part of InterlockedCompareExchangePointer. However, I think it's a step in the right direction.
So we start with an object who's Log member points at 0x87654321. Then two threads call WriteToLog
Thread1 Thread2
...new_log=new ...
(now new_log=0x15331564)
...new_log=new ...
(now new_log=0x25874963)
...old_log=Log;
(now old_log=0x87654321)
...old_log=Log;
(now old_log=0x87654321)
InterlockedCompareExchangePointer
(now new_log=0x87654321)
(now Log=0x15331564)
InterlockedCompareExchangePointer
(now new_log=0x15331564)
(now Log=0x25874963)
...stuff... ...stuff...
delete old_log
(now 0x87654321 is deleted)
delete old_log
(now 0x87654321 is deleted TWICE!)
And the Log member points at 0x25874963, so... log 0x15331564 is leaked!
Your use of InterlockedCompareExchangePointer is incorrect. I think this is right, depending on the code you haven't shown.
std::wostringstream* new_log = new std::wostringstream(std::ios::in | std::ios::out);
new_log->imbue(CConsumer::GetUtf8Locale());
std::wostringstream* old_log = ::InterlockedExchangePointer((PVOID volatile *)&Log.get(), new_log);
std::string logtext(Hooker::Utf16ToUtf8(old_log->str()));
delete old_log;

Mutex assert in boost regex constructor

I'm using boost 1.47 for Arm, with the Code Sourcery C++ compiler (4.5.1), crosscompiling from Windows 7 targeting Ubuntu.
When we compile the debug version (i.e. asserts are enabled), there is an assert triggered:
pthread_mutex_lock.c:62: __pthread_mutex_lock: Assertion 'mutex->__data.__owner == 0' failed.
Compiling in release mode, the assert is not triggered and the program works fine (as far as we can tell).
This is happening under a Ubuntu 10.x Arm board.
So, it appears that the pthread_mutex_lock thinks the mutex was set by a different thread than the current one. At this point in my program, we're still single threaded, verified by printing out pthread_self in main and just before the regex constructor is called. That is, it should not have failed the assertion.
Below is the snippet of code that triggers the problem.
// Set connection server address and port from a URL
bool MyHttpsXmlClient::set_server_url(const std::string& server_url)
{
#ifdef BOOST_HAS_THREADS
cout <<"Boost has threads" << endl;
#else
cout <<"WARNING: boost does not support threads" << endl;
#endif
#ifdef PTHREAD_MUTEX_INITIALIZER
cout << "pthread mutex initializer" << endl;
#endif
{
pthread_t id = pthread_self();
printf("regex: Current threadid: %d\n",id);
}
const boost::regex e("^((http|https)://)?([^:]*)(:([0-9]*))?"); // 2: service, 3: host, 5: port // <-- dies in here
I've confirmed that BOOST_HAS_THREADS is set, as is PTHREAD_MUTEX_INITIALIZER.
I tried following the debugger though boost but it's templated code and it was rather difficult to follow the assembly, but we basically die in do_assign
(roughtly line 380 in basic_regex.hpp)
basic_regex& assign(const charT* p1,
const charT* p2,
flag_type f = regex_constants::normal)
{
return do_assign(p1, p2, f);
}
the templated code is:
// out of line members;
// these are the only members that mutate the basic_regex object,
// and are designed to provide the strong exception guarentee
// (in the event of a throw, the state of the object remains unchanged).
//
template <class charT, class traits>
basic_regex<charT, traits>& basic_regex<charT, traits>::do_assign(const charT* p1,
const charT* p2,
flag_type f)
{
shared_ptr<re_detail::basic_regex_implementation<charT, traits> > temp;
if(!m_pimpl.get())
{
temp = shared_ptr<re_detail::basic_regex_implementation<charT, traits> >(new re_detail::basic_regex_implementation<charT, traits>());
}
else
{
temp = shared_ptr<re_detail::basic_regex_implementation<charT, traits> >(new re_detail::basic_regex_implementation<charT, traits>(m_pimpl->m_ptraits));
}
temp->assign(p1, p2, f);
temp.swap(m_pimpl);
return *this;
}
I'm not sure what component is actually using the mutex--does anyone know?
In the debugger, I could retrieve the address for the variable mutex and then inspect (mutex->__data.__owner). I got the offsets from the compiler header file bits/pthreadtypes.h, which shows:
/* Data structures for mutex handling. The structure of the attribute
type is not exposed on purpose. */
typedef union
{
struct __pthread_mutex_s
{
int __lock;
unsigned int __count;
int __owner;
/* KIND must stay at this position in the structure to maintain
binary compatibility. */
int __kind;
unsigned int __nusers;
__extension__ union
{
int __spins;
__pthread_slist_t __list;
};
} __data;
char __size[__SIZEOF_PTHREAD_MUTEX_T];
long int __align;
I used these offsets to inspect the data in memory. The values did not make sense:
For instance, the __data.__lock field (an int) is 0xb086b580. The __count (an unsigned int) is 0x6078af00, and __owner (an int) is 0x6078af00.
This leads me to think that somehow initialization of this mutex was not performed. Either that or it was completely corrupted, but I'm leaning to missed initialization because when I linked with debug boost libraries, there was no assert.
Now, I'm assuming that whatever mutex that is being queried, is some global/static that is used to make regex threadsafe, and that somehow it was not initialized.
Has anyone encountered anything similar? Is there some extra step needed for Ubuntu to ensure mutex initialization?
Is my implementation assumption correct?
If it is correct, can someone point me to where this mutex is declared, and where it's initialization is occurring
any suggestions on further debugging steps? I'm thinking I might have to somehow download the source and rebuild with tracing in there (hoping StackOverflow can help me before I get to this point)
One of the first things to check when a really REALLY peculiar runtime crash appears in a well-known, well-tested library like boost is whether there's a header/library configuration mismatch. IMHO, putting _DEBUG or NDEBUG in headers, especially within the structures in a way that affects their binary layout, is an anti-pattern. Ideally we should be able to use the same .lib whether we define _DEBUG, DEBUG, Debug, Debug, NDEBUG, or whatever (so that we can select the .lib based on whether we want to have debug symbols or not, not whether it matches header defines). Unfortunately this isn't always the case.
I used these offsets to inspect the data in memory. The values did not make sense:
For instance, the __data.__lock field (an int) is 0xb086b580. The __count (an unsigned > int) is 0x6078af00, and __owner (an int) is 0x6078af00.
This sounds like different parts of your code have different views on how large various structures should be. Some things to check:
Is there any #define which enlarges a data structure, but is not consistently set throughout your code base? (On Windows, _SECURE_SCL is infamous for this kind of bugs)
Do you do any structure packing? If you set #pragma pack anywhere in a header and forget to unset it at the end of the header, any data structures included after that will have a different layout than elsewhere in your program.

Is something wrong with wrapper classes for C string functions?

I have some C++ code written in C-style.
For some reasons I can't use C++ string and IO libraries, so for strings handling I should use only functions like sprintf, itoa, etc.
I want to replace C-style which requires temporary buffers
char buf[12];
itoa(x, buf, 16);
set_some_text(buf);
by the following code
class i2a
{
public:
explicit i2a(int value) { ::sprintf(buf, "%d", value); }
operator const char* () const { return buf; }
private:
char buf[12];
};
// usage:
set_some_text(i2a(x));
(Such classes can be written for char<->wchar_t convertions, etc.)
I see some cases when such classes will be dangerous:
For example, one can write
const char* someMeaningfulName = i2a(x);
// the right code should be i2a someMeaningfulName(x); or i2a someMeaningfulName = i2a(x);
set_some_text(someMeaningfulName);
In more complex case, a function which accepts text will not copy it, but will save pointer to it somewhere. For example it may be
class Foo { .... const char* p; };
Foo f(const char* text) { ... foo.p = text; return foo; }
it can be really unobvious, unlike const char* variable.
Is there a way to make such classes more secure?
Upd: why not std::string, boost::lexical_cast, boost::format, etc :
The code should work when compiled with -fno-except (C++ exceptions disabled - no throw, no stack unwinding). Also it should keep working on low memory conditions.
std::string, streams uses heap-allocated memory and at least throws bad_alloc.
When we have no free heap memory, usually we still have some kilobytes of stack (for example to write to user that we are out of memory and then make proper cleanup).
ATL and MFC String Conversion Macros are also written in this way. Calling the constructor directly like i2a(x) will create a temporary object that will live until the function to which it is passed is complete. So here: do_some(i2a(x)), the temporary object will be there until do_some() is complete.
Refer the Example section (example 2) of this msdn document
Here,
const char* someMeaningfulName = i2a(x);
set_some_text(someMeaningfulName);
This will not work as, the temporary object will be freed on the first statement itself. someMeaningfulName will be garbage. If you feel that as lack of security, all I can say is:
That's how Microsoft does it!
You will always get somewhere into troubles if you use const char* for your local variables.
You will have the same with const char* someMeaningfulName = std::string("foo").c_str();
If you can you should declare your local variable like this :
i2a someMeaningfulName(x);
set_some_text(someMeaningfulName);
You can also consider adding a copy constructor to i2a to avoid sharing the buffer between two instances.

Is it possible to *safely* return a TCHAR* from a function?

I've created a function that will convert all the event notification codes to strings. Pretty simple stuff really.
I've got a bunch of consts like
const _bstr_t DIRECTSHOW_MSG_EC_ACTIVATE("A video window is being activated or deactivated.");
const _bstr_t DIRECTSHOW_MSG_EC_BUFFERING_DATA("The graph is buffering data, or has stopped buffering data.");
const _bstr_t DIRECTSHOW_MSG_EC_BUILT("Send by the Video Control when a graph has been built. Not forwarded to applications.");
.... etc....
and my function
TCHAR* GetDirectShowMessageDisplayText( int messageNumber )
{
switch( messageNumber )
{
case EC_ACTIVATE: return DIRECTSHOW_MSG_EC_ACTIVATE;
case EC_BUFFERING_DATA: return DIRECTSHOW_MSG_EC_BUFFERING_DATA;
case EC_BUILT: return DIRECTSHOW_MSG_EC_BUILT;
... etc ...
No big deal. Took me 5 minutes to throw together.
... but I simply don't trust that I've got all the possible values, so I want to have a default to return something like "Unexpected notification code (7410)" if no matches are found.
Unfortunately, I can't think of anyway to return a valid pointer, without forcing the caller to delete the string's memory ... which is not only nasty, but also conflicts with the simplicity of the other return values.
So I can't think of any way to do this without changing the return value to a parameter where the user passes in a buffer and a string length. Which would make my function look like
BOOL GetDirectShowMessageDisplayText( int messageNumber, TCHAR* outBuffer, int bufferLength )
{
... etc ...
I really don't want to do that. There must be a better way.
Is there?
I'm coming back to C++ after a 10 year hiatus, so if it's something obvious, don't discount that I've overlooked it for a reason.
C++? std::string. It's not going to destroy the performance on any modern computer.
However if you have some need to over-optimize this, you have three options:
Go with the buffer your example has.
Have the users delete the string afterwards. Many APIs like this provide their own delete function for deleting each kind of dynamically allocated return data.
Return a pointer to a static buffer which you fill in with the return string on each call. This does have some drawbacks, though, in that it's not thread safe, and it can be confusing because the returned pointer's value will change the next time someone calls the function. If non-thread-safety is acceptable and you document the limitations, it should be all right though.
If you are returning a point to a string constant, the caller will not have to delete the string - they'll only have to if you are new-ing the memory used by the string every time. If you're just returning a pointer to a string entry in a table of error messages, I would change the return type to TCHAR const * const and you should be OK.
Of course this will not prevent users of your code to attempt to delete the memory referenced by the pointer but there is only so much you can do to prevent abuse.
Just declare use a static string as a default result:
TCHAR* GetDirectShowMessageDisplayText( int messageNumber )
{
switch( messageNumber )
{
// ...
default:
static TCHAR[] default_value = "This is a default result...";
return default_value;
}
}
You may also declare "default_value" outside of the function.
UPDATE:
If you want to insert a message number in that string then it won't be thread-safe (if you are using multiple threads). However, the solution for that problem is to use thread-specific string. Here is an example using Boost.Thread:
#include <cstdio>
#include <boost/thread/tss.hpp>
#define TCHAR char // This is just because I don't have TCHAR...
static void errorMessageCleanup (TCHAR *msg)
{
delete []msg;
}
static boost::thread_specific_ptr<TCHAR> errorMsg (errorMessageCleanup);
static TCHAR *
formatErrorMessage (int number)
{
static const size_t MSG_MAX_SIZE = 256;
if (errorMsg.get () == NULL)
errorMsg.reset (new TCHAR [MSG_MAX_SIZE]);
snprintf (errorMsg.get (), MSG_MAX_SIZE, "Unexpected notification code (%d)", number);
return errorMsg.get ();
}
int
main ()
{
printf ("Message: %s\n", formatErrorMessage (1));
}
The only limitation of this solution is that returned string cannot be passed by the client to the other thread.
Perhaps have a static string buffer you return a pointer to:
std::ostringstream ss;
ss << "Unexpected notification code (" << messageNumber << ")";
static string temp = ss.str(); // static string always has a buffer
return temp.c_str(); // return pointer to buffer
This is not thread safe, and if you persistently hold the returned pointer and call it twice with different messageNumbers, they all point to the same buffer in temp - so both pointers now point to the same message. The solution? Return a std::string from the function - that's modern C++ style, try to avoid C style pointers and buffers. (It looks like you might want to invent a tstring which would be std::string in ANSI and std::wstring in unicode, although I'd recommend just going unicode-only... do you really have any reason to support non-unicode builds?)
You return some sort of self-releasing smart pointer or your own custom string class. You should follow the interface as it's defined in std::string for easiest use.
class bstr_string {
_bstr_t contents;
public:
bool operator==(const bstr_string& eq);
...
~bstr_string() {
// free _bstr_t
}
};
In C++, you never deal with raw pointers unless you have an important reason, you always use self-managing classes. Usually, Microsoft use raw pointers because they want their interfaces to be C-compatible, but if you don't care, then don't use raw pointers.
The simple solution does seem to be to just return a std::string. It does imply one dynamic memory allocation, but you'd probably get that in any case (as either the user or your function would have to make the allocation explicitly)
An alternative might be to allow the user to pass in an output iterator which you write the string into. Then the user is given complete control over how and when to allocate and store the string.
On the first go-round I missed that this was a C++ question rather than a plain C question. Having C++ to hand opens up another possibility: a self-managing pointer class that can be told whether or not to delete.
class MsgText : public boost::noncopyable
{
const char* msg;
bool shouldDelete;
public:
MsgText(const char *msg, bool shouldDelete = false)
: msg(msg), shouldDelete(shouldDelete)
{}
~MsgText()
{
if (shouldDelete)
free(msg);
}
operator const char*() const
{
return msg;
}
};
const MsgText GetDirectShowMessageDisplayText(int messageNumber)
{
switch(messageNumber)
{
case EC_ACTIVATE:
return MsgText("A video window is being activated or deactivated.");
// etc
default: {
char *msg = asprintf("Undocumented message (%u)", messageNumber);
return MsgText(msg, true);
}
}
}
(I don't remember if Windows CRT has asprintf, but it's easy enough to rewrite the above on top of std::string if it doesn't.)
Note the use of boost::noncopyable, though - if you copy this kind of object you risk double frees. Unfortunately, that may cause problems with returning it from your message-pretty-printer function. I'm not sure what the right way to deal with that is, I'm not actually much of a C++ guru.
You already use _bstr_t, so if you can just return those directly:
_bstr_t GetDirectShowMessageDisplayText(int messageNumber);
If you need to build a different message at runtime you can pack it into a _bstr_t too. Now the ownership is clear and the use is still simple thanks to RAII.
The overhead is negligible (_bstr_t uses ref-counting) and the calling code can still use _bstr_ts conversion to wchar_t* and char* if needed.
There's no good answer here, but this kludge might suffice.
const char *GetDirectShowMessageDisplayText(int messageNumber)
{
switch(messageNumber)
{
// ...
default: {
static char defaultMessage[] = "Unexpected notification code #4294967296";
char *pos = defaultMessage + sizeof "Unexpected notification code #" - 1;
snprintf(pos, sizeof "4294967296" - 1, "%u", messageNumber);
return defaultMessage;
}
}
}
If you do this, callers must be aware that the string they get back from GetDirectShowMessageText might be clobbered by a subsequent call to the function. And it's not thread safe, obviously. But those might be acceptable limitations for your application.