C++ std::string and NULL const char* - c++

I am working in C++ with two large pieces of code, one done in "C style" and one in "C++ style".
The C-type code has functions that return const char* and the C++ code has in numerous places things like
const char* somecstylefunction();
...
std::string imacppstring = somecstylefunction();
where it is constructing the string from a const char* returned by the C style code.
This worked until the C style code changed and started returning NULL pointers sometimes. This of course causes seg faults.
There is a lot of code around and so I would like to most parsimonious way fix to this problem. The expected behavior is that imacppstring would be the empty string in this case. Is there a nice, slick solution to this?
Update
The const char* returned by these functions are always pointers to static strings. They were used mostly to pass informative messages (destined for logging most likely) about any unexpected behavior in the function. It was decided that having these return NULL on "nothing to report" was nice, because then you could use the return value as a conditional, i.e.
if (somecstylefunction()) do_something;
whereas before the functions returned the static string "";
Whether this was a good idea, I'm not going to touch this code and it's not up to me anyway.
What I wanted to avoid was tracking down every string initialization to add a wrapper function.

Probably the best thing to do is to fix the C library functions to their pre-breaking change behavior. but maybe you don't have control over that library.
The second thing to consider is to change all the instances where you're depending on the C lib functions returning an empty string to use a wrapper function that'll 'fix up' the NULL pointers:
const char* nullToEmpty( char const* s)
{
return (s ? s : "");
}
So now
std::string imacppstring = somecstylefunction();
might look like:
std::string imacppstring( nullToEmpty( somecstylefunction());
If that's unacceptable (it might be a lot of busy work, but it should be a one-time mechanical change), you could implement a 'parallel' library that has the same names as the C lib you're currently using, with those functions simply calling the original C lib functions and fixing the NULL pointers as appropriate. You'd need to play some tricky games with headers, the linker, and/or C++ namespaces to get this to work, and this has a huge potential for causing confusion down the road, so I'd think hard before going down that road.
But something like the following might get you started:
// .h file for a C++ wrapper for the C Lib
namespace clib_fixer {
const char* somecstylefunction();
}
// .cpp file for a C++ wrapper for the C Lib
namespace clib_fixer {
const char* somecstylefunction() {
const char* p = ::somecstylefunction();
return (p ? p : "");
}
}
Now you just have to add that header to the .cpp files that are currently calling calling the C lib functions (and probably remove the header for the C lib) and add a
using namespace clib_fixer;
to the .cpp file using those functions.
That might not be too bad. Maybe.

Well, without changing every place where a C++ std::string is initialized directly from a C function call (to add the null-pointer check), the only solution would be to prohibit your C functions from returning null pointers.
In GCC compiler, you can use a compiler extension "Conditionals with Omitted Operands" to create a wrapper macro for your C function
#define somecstylefunction() (somecstylefunction() ? : "")
but in general case I would advise against that.

I suppose you could just add a wrapper function which tests for NULL, and returns an empty std::string. But more importantly, why are your C functions now returning NULL? What does a NULL pointer indicate? If it indicates a serious error, you might want your wrapper function to throw an exception.
Or to be safe, you could just check for NULL, handle the NULL case, and only then construct an std::string.
const char* s = somecstylefunction();
if (!s) explode();
std::string str(s);

For a portable solution:
(a) define your own string type. The biggest part is a search and replace over the entire project - that can be simple if it's always std::string, or big one-time pain. (I'd make the sole requriement that it's Liskov-substitutable for a std::string, but also constructs an empty string from an null char *.
The easiest implementation is inheriting publicly from std::string. Even though that's frowned upon (for understandable reasons), it would be ok in this case, and also help with 3rd party libraries expecting a std::string, as well as debug tools. Alternatively, aggegate and forward - yuck.
(b) #define std::string to be your own string type. Risky, not recommended. I wouldn't do it unless I knew the codebases involved very well and saves you tons of work (and I'd add some disclaimers to protect the remains of my reputation ;))
(c) I've worked around a few such cases by re-#define'ing the offensive type to some utility class only for the purpose of the include (so the #define is much more limited in scope). However, I have no idea how to do that for a char *.
(d) Write an import wrapper. If the C library headers have a rather regular layout, and/or you know someone who has some experience parsing C++ code, you might be able to generate a "wrapper header".
(e) ask the library owner to make the "Null string" value configurable at least at compile time. (An acceptable request since switching to 0 can break compatibility as well in other scenarios) You might even offer to submit the change yourself if that's less work for you!

You could wrap all your calls to C-stlye functions in something like this...
std::string makeCppString(const char* cStr)
{
return cStr ? std::string(cStr) : std::string("");
}
Then wherever you have:
std::string imacppstring = somecstylefunction();
replace it with:
std::string imacppstring = makeCppString( somecystylefunction() );
Of course, this assumes that constructing an empty string is acceptable behavior when your function returns NULL.

I don't generally advocate subclassing standard containers, but in this case it might work.
class mystring : public std::string
{
// ... appropriate constructors are an exercise left to the reader
mystring & operator=(const char * right)
{
if (right == NULL)
{
clear();
}
else
{
std::string::operator=(right); // I think this works, didn't check it...
}
return *this;
}
};

Something like this should fix your problem.
const char *cString;
std::string imacppstring;
cString = somecstylefunction();
if (cString == NULL) {
imacppstring = "";
} else {
imacppstring = cString;
}
If you want, you could stick the error checking logic in its own function. You'd have to put this code block in fewer places, then.

Related

prevent string literals from being converted to bool versus std::string

Similar questions have been asked before, such as String literal matches bool overload instead of std::string.
But what I want to know is what should C++ developers do to prevent this from happening? As someone who writes C++ libraries for others to consume, what should I do to ensure this doesn't happen? Here is the example I ran into today, where a library had 2 initialize() methods:
void initialize(bool someflag) { /* ... */ }
void initialize(const std::string & name) { /* ... */ }
Now the problematic code was in the application that wanted to utilize this functionality and which called it in a manner similar to this:
initialize("robert");
At first glance you'd think that this would call initialize(string) but it actually calls the first initialize(bool) with a boolean flag set to true!
Yes, I know it can be fixed with this:
initialize( std::string("robert") );
But this puts the onus on the caller.
Edit for #zdan: I didn't consider the "solutions" in the other linked question to be great solutions since 1) I was hoping not to have to add a const char * version of every method that takes a bool or string, and 2) the template solution increases the maintainability of the code significantly for affected methods, renders them almost unreadable.
what should I do to ensure this doesn't happen?
One possibility is to create an overload that accepts a char const* and make it a pass through to the overload that accepts a std::string.
void initialize(char const* name) { initialize(std::string(name)); }

How can I ensure that correct function is called in case there are multiple candidates

In C++ is perfectly legitimate to do:
bool x = "hi";
Because "hi" is translated by compiler to a char array and returns a pointer to that array, which is a number and number can be implicitly converted to bool (0 is false, anything else is true).
So I have these ctor:
Exception(QString Text, bool __IsRecoverable = true);
Exception(QString Text, QString _Source, bool __IsRecoverable = true);
Sadly I figured out that calling
Exception *e = new Exception("error happened", "main.cpp #test");
It creates a new instance of "Exception" class which is created using Exception(QString Text, bool __IsRecoverable = true); constructor, which is wrong to a point.
Is there a simple way to ensure that correct function is called, other than restructuring the constructors entirely, changing position of arguments, etc?
Firstly, I'm not sure why you're dynamically allocating an exception class. I'm not sure that's ever a good idea.
You can explicitly construct a QString:
Exception e("error happened", QString("main.cpp #test"));
Or you can pass the third argument:
Exception e("error happened", "main.cpp #test", true);
Or you can add an additional constructor that takes const char* and will be preferred over the conversion to bool:
Exception(QString Text, const char* Source, bool IsRecoverable = true);
You can easily make this forward to the QString version. Also note that names beginning with an underscore and a capital letter or with two underscores are reserved.
My suggestion would be to not use default arguments. They contribute to overload resolution problems like this, and anyway it is not very readable to just see true as an argument. Whoever's reading the code then has to stop and go look up what the true means. Even if it's yourself you may forget it in a few months time when you come back to the code, especially if you do this sort of thing a lot.
For example:
struct Exception: public whatever
{
Exception(char const *text);
Exception(char const *text, char const *source);
};
struct RecoverableException: public Exception
{
RecoverableException(char const *text);
RecoverableException(char const *text, char const *source);
};
It's a little bit more typing in this source file but the payoff is that your code which actually uses the exceptions is simpler and clearer.
To implement these constructors you could have them all call a particular function in the .cpp file with relevant arguments selecting which behaviour you want.
I have a preference for using char const * rather than QString as I am paranoid about two things:
unwanted conversions
memory allocation failure
If constructing a QString throws then things go downhill fast. But you may choose to not worry about this possibility because if the system ran out of memory and your exception handling doesn't prepare for that possibility then it's going to terminate either way.

Several specific methods or one generic method?

this is my first question after long time checking on this marvelous webpage.
Probably my question is a little silly but I want to know others opinion about this. What is better, to create several specific methods or, on the other hand, only one generic method? Here is an example...
unsigned char *Method1(CommandTypeEnum command, ParamsCommand1Struct *params)
{
if(params == NULL) return NULL;
// Construct a string (command) with those specific params (params->element1, ...)
return buffer; // buffer is a member of the class
}
unsigned char *Method2(CommandTypeEnum command, ParamsCommand2Struct *params)
{
...
}
unsigned char *Method3(CommandTypeEnum command, ParamsCommand3Struct *params)
{
...
}
unsigned char *Method4(CommandTypeEnum command, ParamsCommand4Struct *params)
{
...
}
or
unsigned char *Method(CommandTypeEnum command, void *params)
{
switch(command)
{
case CMD_1:
{
if(params == NULL) return NULL;
ParamsCommand1Struct *value = (ParamsCommand1Struct *) params;
// Construct a string (command) with those specific params (params->element1, ...)
return buffer;
}
break;
// ...
default:
break;
}
}
The main thing I do not really like of the latter option is this,
ParamsCommand1Struct *value = (ParamsCommand1Struct *) params;
because "params" could not be a pointer to "ParamsCommand1Struct" but a pointer to "ParamsCommand2Struct" or someone else.
I really appreciate your opinions!
General Answer
In Writing Solid Code, Steve Macguire's advice is to prefer distinct functions (methods) for specific situations. The reason is that you can assert conditions that are relevant to the specific case, and you can more easily debug because you have more context.
An interesting example is the standard C run-time's functions for dynamic memory allocation. Most of it is redundant, as realloc can actually do (almost) everything you need. If you have realloc, you don't need malloc or free. But when you have such a general function, used for several different types of operations, it's hard to add useful assertions and it's harder to write unit tests, and it's harder to see what's happening when debugging. Macquire takes it a step farther and suggests that, not only should realloc just do _re_allocation, but it should probably be two distinct functions: one for growing a block and one for shrinking a block.
While I generally agree with his logic, sometimes there are practical advantages to having one general purpose method (often when operations is highly data-driven). So I usually decide on a case by case basis, with a bias toward creating very specific methods rather than overly general purpose ones.
Specific Answer
In your case, I think you need to find a way to factor out the common code from the specifics. The switch is often a signal that you should be using a small class hierarchy with virtual functions.
If you like the single method approach, then it probably should be just a dispatcher to the more specific methods. In other words, each of those cases in the switch statement simply call the appropriate Method1, Method2, etc. If you want the user to see only the general purpose method, then you can make the specific implementations private methods.
Generally, it's better to offer separate functions, because they by their prototype names and arguments communicate directly and visibly to the user that which is available; this also leads to more straightforward documentation.
The one time I use a multi-purpose function is for something like a query() function, where a number of minor query functions, rather than leading to a proliferation of functions, are bundled into one, with a generic input and output void pointer.
In general, think about what you're trying to communicate to the API user by the API prototypes themselves; a clear sense of what the API can do. He doesn't need excessive minutae; he does need to know the core functions which are the entire point of having the API in the first place.
First off, you need to decide which language you are using. Tagging the question with both C and C++ here makes no sense. I am assuming C++.
If you can create a generic function then of course that is preferable (why would you prefer multiple, redundant functions?) The question is; can you? However, you seem to be unaware of templates. We need to see what you have omitted here to tell if you if templates are suitable however:
// Construct a string (command) with those specific params (params->element1, ...)
In the general case, assuming templates are appropriate, all of that turns into:
template <typename T>
unsigned char *Method(CommandTypeEnum command, T *params) {
// more here
}
On a side note, how is buffer declared? Are you returning a pointer to dynamically allocated memory? Prefer RAII type objects and avoid dynamically allocating memory like that if so.
If you are using C++ then I would avoid using void* as you don't really need to. There is nothing wrong with having multiple methods. Note that you don't actually have to rename the function in your first set of examples - you can just overload a function using different parameters so that there is a separate function signature for each type. Ultimately, this kind of question is very subjective and there are a number of ways of doing things. Looking at your functions of the first type, you would perhaps be well served by looking into the use of templated functions
You could create a struct. That's what I use to handle console commands.
typedef int (* pFunPrintf)(const char*,...);
typedef void (CommandClass::*pKeyFunc)(char *,pFunPrintf);
struct KeyCommand
{
const char * cmd;
unsigned char cmdLen;
pKeyFunc pfun;
const char * Note;
long ID;
};
#define CMD_FORMAT(a) a,(sizeof(a)-1)
static KeyCommand Commands[]=
{
{CMD_FORMAT("one"), &CommandClass::CommandOne, "String Parameter",0},
{CMD_FORMAT("two"), &CommandClass::CommandTwo, "String Parameter",1},
{CMD_FORMAT("three"), &CommandClass::CommandThree, "String Parameter",2},
{CMD_FORMAT("four"), &CommandClass::CommandFour, "String Parameter",3},
};
#define AllCommands sizeof(Commands)/sizeof(KeyCommand)
And the Parser function
void CommandClass::ParseCmd( char* Argcommand )
{
unsigned int x;
for ( x=0;x<AllCommands;x++)
{
if(!memcmp(Commands[x].cmd,Argcommand,Commands[x].cmdLen ))
{
(this->*Commands[x].pfun)(&Argcommand[Commands[x].cmdLen],&::printf);
break;
}
}
if(x==AllCommands)
{
// Unknown command
}
}
I use a thread safe printf pPrintf, so ignore it.
I don't really know what you want to do, but in C++ you probably should derive multiple classes from a Formatter Base class like this:
class Formatter
{
virtual void Format(unsigned char* buffer, Command command) const = 0;
};
class YourClass
{
public:
void Method(Command command, const Formatter& formatter)
{
formatter.Format(buffer, command);
}
private:
unsigned char* buffer_;
};
int main()
{
//
Params1Formatter formatter(/*...*/);
YourClass yourObject;
yourObject.Method(CommandA, formatter);
// ...
}
This removes the resposibility to handle all that params stuff from your class and makes it closed for changes. If there will be new commands or parameters during further development you don't have to modifiy (and eventually break) existing code but add new classes that implement the new stuff.
While not full answer this should guide you in correct direction: ONE FUNCTION ONE RESPONSIBILITY. Prefer the code where it is responsible for one thing only and does it well. The code whith huge switch statement (which is not bad by itself) where you need cast void * to some other type is a smell.
By the way I hope you do realise that according to standard you can only cast from void * to <type> * only when the original cast was exactly from <type> * to void *.

Is something wrong with wrapper classes for C string functions?

I have some C++ code written in C-style.
For some reasons I can't use C++ string and IO libraries, so for strings handling I should use only functions like sprintf, itoa, etc.
I want to replace C-style which requires temporary buffers
char buf[12];
itoa(x, buf, 16);
set_some_text(buf);
by the following code
class i2a
{
public:
explicit i2a(int value) { ::sprintf(buf, "%d", value); }
operator const char* () const { return buf; }
private:
char buf[12];
};
// usage:
set_some_text(i2a(x));
(Such classes can be written for char<->wchar_t convertions, etc.)
I see some cases when such classes will be dangerous:
For example, one can write
const char* someMeaningfulName = i2a(x);
// the right code should be i2a someMeaningfulName(x); or i2a someMeaningfulName = i2a(x);
set_some_text(someMeaningfulName);
In more complex case, a function which accepts text will not copy it, but will save pointer to it somewhere. For example it may be
class Foo { .... const char* p; };
Foo f(const char* text) { ... foo.p = text; return foo; }
it can be really unobvious, unlike const char* variable.
Is there a way to make such classes more secure?
Upd: why not std::string, boost::lexical_cast, boost::format, etc :
The code should work when compiled with -fno-except (C++ exceptions disabled - no throw, no stack unwinding). Also it should keep working on low memory conditions.
std::string, streams uses heap-allocated memory and at least throws bad_alloc.
When we have no free heap memory, usually we still have some kilobytes of stack (for example to write to user that we are out of memory and then make proper cleanup).
ATL and MFC String Conversion Macros are also written in this way. Calling the constructor directly like i2a(x) will create a temporary object that will live until the function to which it is passed is complete. So here: do_some(i2a(x)), the temporary object will be there until do_some() is complete.
Refer the Example section (example 2) of this msdn document
Here,
const char* someMeaningfulName = i2a(x);
set_some_text(someMeaningfulName);
This will not work as, the temporary object will be freed on the first statement itself. someMeaningfulName will be garbage. If you feel that as lack of security, all I can say is:
That's how Microsoft does it!
You will always get somewhere into troubles if you use const char* for your local variables.
You will have the same with const char* someMeaningfulName = std::string("foo").c_str();
If you can you should declare your local variable like this :
i2a someMeaningfulName(x);
set_some_text(someMeaningfulName);
You can also consider adding a copy constructor to i2a to avoid sharing the buffer between two instances.

C++ Struct initialisation problem

This c++ code is working fine , however memory validator says that I am using a deleted pointer in:
grf->filePath = fname; Do you have any idea why ? Thank you.
Dirloader.h
// Other code
class CDirLoader
{
public:
struct TKnownGRF
{
std::string filePath;
DWORD encodingType;
DWORD userDataLen;
char *userData;
};
// Other Code
CDirLoader();
virtual ~CDirLoader();
Dirloader.cpp
// Other code
void CDirLoader::AddGroupFile(const std::string& _fname)
{
// Other code including std::string fname = _fname;
TKnownGRF *grf = new TKnownGRF;
grf->filePath = fname;
delete grf; // Just for testing purposes
P.S.: This is only an code extract. Of course if I define a struct TKnownGRF inside .cpp and use it as an actual object, gfr.filepath = something, instead of pointer grf->filepath=something, than it is ok, but I do need to have it inside *.h in CDirLoader class, due to many other vector allocations.
Since the function returns void
void CDirLoader::AddGroupFile(const std::string& _fname)
the question is what are you going to do with grf?
Are you going to delete it? If so, then, why do a new? you can just declare a TKnownGRF variable on the stack! In that case, _fname is not contributing to the logic of this method.
I guess that the class CDirLoader has a member variable of type TKnownGRF, say grf_, and that need to be used in the AddsGroupFile() method, e.g.:
grf_.filepath = _fname;
Does this happen to be using an older version of STL, say, VC6, and running multithreaded? Older versions of STL's string class used a reference counted copy on write implementation, which didn't really work in a multithreaded environment. See this KB article on VC 6.
Or, it's also possible that you are looking at the wrong problem. If you call std::string::c_str() and cache the result at all, the cached result would probably be invalidated when you modified the original string. There are a few cases where you can get away with that, but it's very much implementation specific.