null pointer in C++ - c++

Everyone, I have a question about C++: which of these do you actually prefer to use?
int* var = 0;
if(!var)...//1)
or
if(var == 0)..//2)
What are the pros and cons? Thanks in advance.

I prefer if (!var), because then you cannot accidentally assign to var, and you do not have to pick between 0 and NULL.

I've always been taught to use if (!var), and it seems that all the idiomatic C(++) I've ever read follows this. This has a few nice semantics:
There's no accidental assignment
You're testing the existence of something (ie. if it's NULL). Hence, you can read the statement as "If var does not exist" or "If var is not set"
Maps closely to what you'd idiomatically write if var was a boolean (C had no boolean type before C99, but people still mimic their use)
Similar to the previous point, it maps closely to predicate function calls, ie. if (!isEmpty(nodeptr)) { .. }

Mostly personal preference. Some will advise that the first is preferred because it becomes impossible to accidentally assign instead of compare. But by that same token, if you make a habit of putting the rvalue on the left of the comparison, the compiler will catch you when you blow it:
if( 0 == var )
...which is certainly true. I simply find if( !var ) to be a little more expressive, and less typing.
They will both evaluate the same, so there's no runtime difference. The most important thing is that you pick one and stick with it. Don't mix one with the other.

Either one is good, though I personally prefer 2 - or something like it.
Makes more sense to me to read:
if ( ptr != NULL )
than
if ( ptr )
The second I may confuse for just being a boolean to look at, but the first I'd be able to tell immediately that it's a pointer.
Either way, I think it's important to pick one and stick with that for consistency, though - rather than having it done in different ways throughout your product.

A few years have passed...
The C++11 standard introduced nullptr for null pointer checks, and it should be used because it ensures that the comparison is actually done on a pointer.
So the most robust check would be to do:
if(myPtr != nullptr)
The problem with !var is that it is also valid if var is a boolean or a numerical value (int, byte, etc.), in which case it tests whether the value equals 0.
While if you use nullptr, you are sure that you are checking a pointer not any other type:
For example:
int valA = 0;
int *ptrA = &valA;
if(!valA) // No compile error
if(!*ptrA) // No compile error
if(!ptrA) // No compile error
if(valA != nullptr) // Compile error, valA is not a pointer
if(*ptrA != nullptr) // Compile error, *ptrA is not a pointer
if(ptrA != nullptr) // No compile error
So it's pretty easy to make a mistake when manipulating a pointer to an int, as in your example; that's why nullptr should be used.
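A related difference shows up in overload resolution. The sketch below uses two hypothetical overloads named describe (they are not part of the answer above) to show that a literal 0 is treated as an int first, while nullptr can only select the pointer overload.

```cpp
#include <cassert>
#include <string>

// Hypothetical overloads, used only to reveal which one an argument selects.
std::string describe(int)  { return "int"; }
std::string describe(int*) { return "pointer"; }

// A check that only compiles when the argument really is a pointer.
bool is_set(const int* p) { return p != nullptr; }

void demo_nullptr_overloads() {
    int value = 0;
    assert(describe(0) == "int");            // literal 0 is an int first
    assert(describe(nullptr) == "pointer");  // nullptr never converts to int
    assert(is_set(&value));
    assert(!is_set(nullptr));
}
```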

I would prefer option 3:
if(not var)
{
}
The new (well since 1998) operator keywords for C++ can make code easier to read sometimes.
Strangely enough, they seem to be very little known. There seem to be more people who know what trigraphs are (thank you IOCCC!) than people who know what operator keywords are. Operator keywords have a similar reason to exist: they allow programming in C++ even if the keyboard does not provide all ASCII characters. But in contrast to trigraphs, operator keywords make code more readable instead of obfuscating it.
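As a small illustration, here is how not, and, and or read in pointer checks. The function names are made up for this sketch; the keywords themselves are standard alternative spellings of !, && and ||.

```cpp
#include <cassert>

// True only when p points at a value greater than zero.
bool is_positive(const int* p) {
    return not (p == nullptr) and *p > 0;
}

// True when there is no value at all, or the value is zero.
bool is_missing_or_zero(const int* p) {
    return not p or *p == 0;
}
```

Both functions rely on short-circuit evaluation, so *p is never evaluated for a null pointer.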

DON'T use NULL, use 0.
Strictly speaking, pointer is not bool, so use '==' and '!=' operators.
NEVER use 'if ( ptr == 0 )', use 'if ( 0 == ptr )' instead.
DON'T use C-pointers, use smart_ptr instead. :-)
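There is no standard type called smart_ptr; assuming std::unique_ptr is what is meant, a minimal sketch of an explicit null comparison on a smart pointer could look like this. The helper name value_or is hypothetical.

```cpp
#include <cassert>
#include <memory>

// Returns the pointed-to value, or `fallback` when the smart pointer is empty.
int value_or(const std::unique_ptr<int>& p, int fallback) {
    if (nullptr == p) {   // explicit comparison, constant on the left
        return fallback;
    }
    return *p;
}
```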

According to the High Integrity C++ coding Standard Manual: for comparison between constants and variables you should put the constant on the left side to prevent an assignment instead of an equality comparison, for example:
Avoid:
if(var == 10) {...}
Prefer
if(10 == var) {...}
Particularly in your case I prefer if(0 == var) because it is clear that you are comparing the pointer to be null (0). Because the ! operator can be overloaded then it could have other meanings depending on what your pointer is pointing to: if( !(*var) ) {...}.

Related

What to return if a function cannot return anything? [duplicate]

I'm reading the documentation of std::experimental::optional and I have a good idea about what it does, but I don't understand when I should use it or how I should use it. The site doesn't contain any examples as of yet which leaves it harder for me to grasp the true concept of this object. When is std::optional a good choice to use, and how does it compensate for what was not found in the previous Standard (C++11).
The simplest example I can think of:
std::optional<int> try_parse_int(std::string s)
{
//try to parse an int from the given string,
//and return "nothing" if you fail
}
The same thing might be accomplished with a reference argument instead (as in the following signature), but using std::optional makes the signature and usage nicer.
bool try_parse_int(std::string s, int& i);
Another way that this could be done is especially bad:
int* try_parse_int(std::string s); //return nullptr if fail
This requires dynamic memory allocation, worrying about ownership, etc. - always prefer one of the other two signatures above.
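One possible body for the first signature, as a sketch only: it assumes C++17, uses std::from_chars, takes the string by const reference instead of by value, and treats trailing garbage (such as "12x") as a failure. None of these choices come from the answer itself.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Returns the parsed int, or std::nullopt if s is not entirely a valid int.
std::optional<int> try_parse_int(const std::string& s) {
    int value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    const auto result = std::from_chars(first, last, value);
    if (result.ec != std::errc() || result.ptr != last) {
        return std::nullopt;  // not a number, out of range, or trailing characters
    }
    return value;
}
```

A caller can then write if (auto n = try_parse_int(text)) and use *n inside the branch.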
Another example:
class Contact
{
std::optional<std::string> home_phone;
std::optional<std::string> work_phone;
std::optional<std::string> mobile_phone;
};
This is extremely preferable to instead having something like a std::unique_ptr<std::string> for each phone number! std::optional gives you data locality, which is great for performance.
Another example:
template<typename Key, typename Value>
class Lookup
{
std::optional<Value> get(Key key);
};
If the lookup doesn't have a certain key in it, then we can simply return "no value."
I can use it like this:
Lookup<std::string, std::string> location_lookup;
std::string location = location_lookup.get("waldo").value_or("unknown");
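To make the Lookup example concrete, here is a minimal sketch backed by std::map. The set member and the private entries_ map are assumptions added for illustration; only get comes from the example above.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

template <typename Key, typename Value>
class Lookup {
public:
    // Hypothetical helper so the sketch has something to look up.
    void set(const Key& key, const Value& value) { entries_[key] = value; }

    // Returns the stored value, or std::nullopt when the key is absent.
    std::optional<Value> get(const Key& key) const {
        const auto it = entries_.find(key);
        if (it == entries_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::map<Key, Value> entries_;
};
```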
Another example:
std::vector<std::pair<std::string, double>> search(
std::string query,
std::optional<int> max_count,
std::optional<double> min_match_score);
This makes a lot more sense than, say, having four function overloads that take every possible combination of max_count (or not) and min_match_score (or not)!
It also eliminates the accursed "Pass -1 for max_count if you don't want a limit" or "Pass std::numeric_limits<double>::min() for min_match_score if you don't want a minimum score"!
Another example:
std::optional<int> find_in_string(std::string s, std::string query);
If the query string isn't in s, I want "no int" -- not whatever special value someone decided to use for this purpose (-1?).
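A possible implementation, sketched with std::string::find: it returns the zero-based position of the first match, takes its arguments by const reference, and narrows the position to int to match the signature above. Those details are assumptions, not part of the answer.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Position of the first occurrence of query in s, or std::nullopt if absent.
std::optional<int> find_in_string(const std::string& s, const std::string& query) {
    const std::string::size_type pos = s.find(query);
    if (pos == std::string::npos) {
        return std::nullopt;
    }
    return static_cast<int>(pos);
}
```

Note that a match at position 0 is a present value and is still distinguishable from "no match", which is exactly what a sentinel such as -1 tries to emulate.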
For additional examples, you could look at the boost::optional documentation. boost::optional and std::optional will basically be identical in terms of behavior and usage.
An example is quoted from New adopted paper: N3672, std::optional:
optional<int> str2int(string); // converts string to int if possible
int get_int_from_user()
{
string s;
for (;;) {
cin >> s;
optional<int> o = str2int(s); // 'o' may or may not contain an int
if (o) { // does optional contain a value?
return *o; // use the value
}
}
}
but I don't understand when I should use it or how I should use it.
Consider when you are writing an API and you want to express that "not having a return" value is not an error. For example, you need to read data from a socket, and when a data block is complete, you parse it and return it:
class YourBlock { /* block header, format, whatever else */ };
std::optional<YourBlock> cache_and_get_block(
some_socket_object& socket);
If the appended data completed a parsable block, you can process it; otherwise, keep reading and appending data:
void your_client_code(some_socket_object& socket)
{
char raw_data[1024]; // max 1024 bytes of raw data (for example)
while(socket.read(raw_data, 1024))
{
if(auto block = cache_and_get_block(raw_data))
{
// process *block here
// then return or break
}
// else [ no error; just keep reading and appending ]
}
}
Edit: regarding the rest of your questions:
When is std::optional a good choice to use
When you compute a value and need to return it, it makes for better semantics to return by value than to take a reference to an output value (that may not be generated).
When you want to ensure that client code has to check the output value (whoever writes the client code may not check for errors: if you attempt to use an uninitialized pointer, you get a core dump; if you call value() on an empty std::optional, you get a catchable std::bad_optional_access exception).
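The checked access mentioned in the last point can be sketched as follows. The function name describe is made up for the example. Keep in mind that only value() performs the check; dereferencing an empty optional with operator* is undefined behaviour.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Formats the contained value, or reports that the optional is empty.
std::string describe(const std::optional<int>& maybe) {
    try {
        return "value " + std::to_string(maybe.value());  // throws when empty
    } catch (const std::bad_optional_access&) {
        return "empty";
    }
}
```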
[...] and how does it compensate for what was not found in the previous Standard (C++11).
Previous to C++11, you had to use a different interface for "functions that may not return a value" - either return by pointer and check for NULL, or accept an output parameter and return an error/result code for "not available".
Both impose extra effort and attention from the client implementer to get it right and both are a source of confusion (the first pushing the client implementer to think of an operation as an allocation and requiring client code to implement pointer-handling logic and the second allowing client code to get away with using invalid/uninitialized values).
std::optional nicely takes care of the problems arising with previous solutions.
I often use optionals to represent optional data pulled from configuration files, that is to say where that data (such as with an expected, yet not necessary, element within an XML document) is optionally provided, so that I can explicitly and clearly show if the data was actually present in the XML document. Especially when the data can have a "not set" state, versus an "empty" and a "set" state (fuzzy logic). With an optional, set and not set is clear, also empty would be clear with the value of 0 or null.
This can show how the value of "not set" is not equivalent to "empty". In concept, a pointer to an int (int * p) can show this, where a null (p == 0) is not set, a value of 0 (*p == 0) is set and empty, and any other value (*p != 0) is set to a value.
For a practical example, I have a piece of geometry pulled from an XML document that had a value called render flags, where the geometry can either override the render flags (set), disable the render flags (set to 0), or simply not affect the render flags (not set), an optional would be a clear way to represent this.
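The three states of that render flags example can be sketched like this. The function name, the parameter names, and the 32-bit flag type are assumptions made for illustration; the answer does not show its actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// geometry_flags == std::nullopt : the geometry does not affect the flags ("not set")
// geometry_flags == 0            : the geometry disables the flags ("set and empty")
// any other value                : the geometry overrides the flags ("set")
std::uint32_t effective_render_flags(std::uint32_t inherited,
                                     std::optional<std::uint32_t> geometry_flags) {
    return geometry_flags.value_or(inherited);
}
```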
Clearly a pointer to an int, in this example, can accomplish the goal (or better, a shared pointer, as it can offer a cleaner implementation); however, I would argue it's about code clarity in this case. Is a null always a "not set"? With a pointer it is not clear, as null literally means not allocated or created; it could mean "not set", but not necessarily. It is worth pointing out that a pointer must be released, and in good practice set to 0; however, as with a shared pointer, an optional doesn't require explicit cleanup, so there is no concern of mixing up the cleanup with the optional not having been set.
I believe it's about code clarity. Clarity reduces the cost of code maintenance, and development. A clear understanding of code intention is incredibly valuable.
Use of a pointer to represent this would require overloading the concept of the pointer. To represent "null" as "not set", typically you might see one or more comments through code to explain this intention. That's not a bad solution instead of an optional, however, I always opt for implicit implementation rather than explicit comments, as comments are not enforceable (such as by compilation). Examples of these implicit items for development (those articles in development that are provided purely to enforce intention) include the various C++ style casts, "const" (especially on member functions), and the "bool" type, to name a few. Arguably you don't really need these code features, so long as everyone obeys intentions or comments.

Testing against NULL or nullptr

See this code:
if (m_hStatusBarPageBreakIcon != NULL)
VERIFY(DestroyIcon(m_hStatusBarPageBreakIcon));
Sometimes when I have opted to use nullptr the compiler complains. But as a general rule is it OK to use nullptr? This variable is of type HICON.
Assuming the compiler supports nullptr, the only case where checking a handle against NULL works but checking against nullptr fails to compile is when the handle is not defined as a pointer.
As Windows headers define NULL as 0, the check against NULL can work for integer handles, for example. Of course, checking an integer against nullptr is an error.
For such cases, if some handle is an integer, or some API is documented to accept NULL, I would still replace NULL, not with nullptr, but with a literal 0.
I will look on that from a different perspective. Basically I want to have typesafe code.
NULL is even better than nullptr, because 0 is of type int and nullptr is anything.
And static code analysis tools will emit a message if you write uint i = 3U, then some more code, and then if (i == 0). The same goes for NULL and nullptr.
So, what I use is:
template <typename T>
const T null(void)
{
return static_cast<T>(0);
}
Example code
typedef unsigned char uchar;
. . .
uchar someVariable = null<uchar>();
SomeComplexClass *someComplexClass = null<SomeComplexClass *>();
. . .
if (null<SomeComplexClass *>() == someComplexClass)
{
. . .
}
The normal programmer will sit there and say: what is that nonsense?
But consider that we develop some ASIL D code. Then you refactor your code, and you accidentally modify "SomeComplexClass" to "someOtherComplexClass":
if (null<SomeComplexClass *>() == someOtherComplexClass)
With this typesafe 0, the static code analysis tools will emit an info.
Think of that the next time you drive your car with ASIL D rated steering wheel, brake or accelerator pedal software.
BTW: also auto should be avoided wherever possible in such code.

C++ Null Test After Dereference

I have the below c++ code
*dtcStatus = DataDTC[idxDTC].status;
if (dtcStatus != NULL) {
return E_OK;
} else {
ALOGE("Dem_GetStatusOfDTC return NULL");
return E_NOT_OK;
}
My static analysis tool reports the warning "Null Test After Dereference" for the above code.
This is where you indirect (dereference) through the pointer:
*dtcStatus = DataDTC[idxDTC].status;
This is where you test for null:
if (dtcStatus != NULL) {
If you look closely, you'll notice that the indirection comes before the check. If the program were to enter the else branch, it must already have indirected through a null pointer, and thus the behaviour of the program is undefined. This is why the tool informs you of this bug.
Any way to over come that
You can overcome "testing for null after dereference" by not doing that. In other words, test for null before the indirection. Simply swap those two lines.
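A self-contained sketch of the corrected order follows. The enum, the DtcEntry struct, the sample table, and the bounds check are stand-ins invented for illustration, since the question does not show those definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum StatusCode { E_OK, E_NOT_OK };              // stand-ins for the question's return codes
struct DtcEntry { std::uint8_t status; };        // assumed layout
static DtcEntry DataDTC[] = { {0x01}, {0x08} };  // assumed sample data
constexpr std::size_t kDtcCount = sizeof(DataDTC) / sizeof(DataDTC[0]);

StatusCode get_status_of_dtc(std::size_t idxDTC, std::uint8_t* dtcStatus) {
    // Test first: after this check the pointer is known to be non-null.
    if (dtcStatus == nullptr || idxDTC >= kDtcCount) {
        return E_NOT_OK;
    }
    *dtcStatus = DataDTC[idxDTC].status;  // the indirection happens only after the check
    return E_OK;
}
```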
This line
*dtcStatus = DataDTC[idxDTC].status;
is dereferencing dtcStatus (via the * operator). You are only allowed to do that when you are absolutely certain that dtcStatus is a valid pointer, ie points to some instance, ie is not NULL.
Now the static analyzer tries to tell you that either
you are absolutely certain that the pointer is valid. In that case you do not have to check whether it is NULL on the next line. Or...
you are not certain. In this case you should check for NULL before you dereference the pointer.
The tool is actually telling you what's wrong
Null Test After Dereference
Means you are trying to test a variable with NULL after dereferencing it.
You should rather not dereference your variable for the NULL check:
if (DataDTC[idxDTC].status != NULL) {}
Or if you still want to use your variable, you could test &dtcStatus.
Remove the star at the beginning here:
*dtcStatus = DataDTC[idxDTC].status;

What does CString::GetBuffer() with no size parameter do?

Perhaps I'm going insane, but I have tried every search combination I can think of, and I can't find a definition for CString::GetBuffer() with no parameters. Every reference I look up describes CString::GetBuffer( int ), where the int parameter passed in is the max buffer length. The definition in the header is for CSimpleStringT::GetBuffer(). That gave me the following link, which at least acknowledges the existence of the parameterless version, but offers no description of its behavior.
https://msdn.microsoft.com/en-us/library/sddk80xf.aspx#csimplestringt__getbuffer
I'm looking at existing C++ (Visual Studio) code that I don't want to change if I don't have to, but I need to know the expected behavior of CString::GetBuffer(). I'd appreciate it if someone could explain it or point me to some documentation on it.
Although the msdn documentation doesn't really say what GetBuffer without a parameter does, the MFC source code reveals the answer:
return( m_pszData );
So it just returns a pointer to the underlying character buffer. (It also checks to see if the internal data is shared and forks/copies it first).
The code is in atlsimpstr.h
Complete function:
PXSTR GetBuffer()
{
CStringData* pData = GetData();
if( pData->IsShared() )
{
Fork( pData->nDataLength );
}
return( m_pszData );
}
tl;dr
Call CString::GetString().
This is asking the wrong question for the wrong reasons. Just to get it out of the way, here is the answer from the documentation:
Return Value
An PXSTR pointer to the object's (null-terminated) character buffer.
This is true for both overloads, with and without an explicit length argument. When calling the overload taking a length argument, the internal buffer may get resized to accommodate for increased storage requirements, prior to returning a pointer to that buffer.
From this comment, it becomes apparent that the question is asking for the wrong thing altogether. To learn why, you need to understand the purpose of the GetBuffer() family of class members: to temporarily disable enforcement of CString's class invariants [1] for modification, until they are re-established by calling one of the ReleaseBuffer() members. The primary use case for this is interfacing with C code (like the Windows API).
The important information is:
GetBuffer() should only be called, if you plan to directly modify the contents of the stored character sequence.
Every call to GetBuffer() must be matched with a call to ReleaseBuffer() before using any other CString class member [2]. Note in particular that operator PCXSTR() and the destructor are class members.
As long as you follow that protocol, the controlled character sequence will always be null-terminated.
Given your actual use case (Log.Print("%s\n", myCstring.GetBuffer())), none of the previous really applies. Since you do not plan to actually modify the string contents, you should access the immutable CString interface (e.g. GetString() or operator PCXSTR()) instead. This requires const-correct function signatures (TCHAR const* vs. TCHAR*). Failing that, use a const_cast if you can ensure, that the callee will not mutate the buffer.
There are several benefits to this:
It is semantically correct. If all you want is a view into the character string, you do not need a pointer to a mutable buffer.
There are no superfluous copies of the contents. CString implements copy-on-write semantics. Requesting a mutable buffer necessitates copying the contents for shared instances, even if you are going to throw that copy away immediately after evaluating the current expression.
The immutable interface cannot fail. No exceptions are thrown when calling operator PCXSTR() or GetString().
[1] The relevant invariants are: (1) the controlled sequence of characters is always null-terminated; (2) GetLength() returns the count of characters in the controlled sequence, excluding the null terminator.
[2] It is only strictly required to call one of the ReleaseBuffer() implementations if the contents were changed. This is often not immediately obvious from looking at the source code, so always calling ReleaseBuffer() is the safe option.
Documentation is inconclusive. Looking at ATL sources available here (https://github.com/dblock/msiext/blob/d8898d0c84965622868b1763958b68e19fd49ba8/externals/WinDDK/7600.16385.1/inc/atl71/atlsimpstr.h - I do not claim to know if they are official or not) it looks like GetBuffer() without arguments returns the current buffer, cloning it before if it is shared.
On the other hand, GetBuffer(int) with size is going to check (through the call to PrepareWrite and possibly PrepareWrite2) if the current buffer size is greater than requested, and if it is not, it will allocate the new buffer - thus matching MSDN description.
On a side note, PrepareWrite seems to become quite creative in how it checks for two conditions:
PXSTR PrepareWrite( __in int nLength )
{
CStringData* pOldData = GetData();
int nShared = 1-pOldData->nRefs; // nShared < 0 means true, >= 0 means false
int nTooShort = pOldData->nAllocLength-nLength; // nTooShort < 0 means true, >= 0 means false
if( (nShared|nTooShort) < 0 ) // If either sign bit is set (i.e. either is less than zero), we need to copy data
{
PrepareWrite2( nLength );
}
return( m_pszData );
}
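The sign-bit trick can be tried in isolation. The sketch below copies only the arithmetic from PrepareWrite into a free function and places a plain boolean version next to it; both function names are made up, and it assumes the usual two's complement int representation.

```cpp
#include <cassert>

// Same test as in PrepareWrite: true when the buffer is shared or too short.
bool needs_copy(int nRefs, int nAllocLength, int nLength) {
    const int nShared = 1 - nRefs;                 // negative when nRefs > 1
    const int nTooShort = nAllocLength - nLength;  // negative when nLength > nAllocLength
    return (nShared | nTooShort) < 0;              // OR keeps a sign bit set in either operand
}

// The conventional spelling of the same condition, for comparison.
bool needs_copy_plain(int nRefs, int nAllocLength, int nLength) {
    return nRefs > 1 || nLength > nAllocLength;
}
```

The bitwise form folds two comparisons into a single branch, at some cost in readability.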
Windows API functions often require a character buffer of a certain length as input. In that case, use the GetBuffer(int) version. The following code snippet illustrates this, the difference between GetBuffer() and GetString(), and the importance of calling ReleaseBuffer() after calling GetBuffer():
CStringW FullName;
if(::GetModuleFileNameW(nullptr,FullName.GetBuffer(MAX_PATH), MAX_PATH) <= 0)
return 0; //GetBuffer() returns PXSTR
FullName.ReleaseBuffer(); //Don't forget!
FullName = L"Path and Name: " + FullName;
std::wcout << FullName.GetString() << L"\n"; //GetString() returns PCXSTR

What's the reasoning behind putting constants in 'if' statements first?

I was looking at some example C++ code for a hardware interface I'm working with and noticed a lot of statements along the following lines:
if ( NULL == pMsg ) return rv;
I'm sure I've heard people say that putting the constant first is a good idea, but why is that? Is it just so that if you have a large statement you can quickly see what you're comparing against or is there more to it?
So that you don't mix comparison (==) with assignment (=).
As you know, you can't assign to a constant. If you try, the compiler will give you an error.
Basically, it's one of defensive programming techniques. To protect yourself from yourself.
To stop you from writing:
if ( pMsg = NULL ) return rv;
by mistake. A good compiler will warn you about this however, so most people don't use the "constant first" way, as they find it difficult to read.
It stops the single = assignment bug.
Eg,
if ( NULL = pMsg ) return rv;
won't compile, whereas
if ( pMsg = NULL) return rv;
will compile and give you headaches
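To see the headache in action, here is a small sketch with hypothetical function names. It uses nullptr rather than NULL, and the extra parentheses in the buggy version are there only to silence the warning many compilers would otherwise give.

```cpp
#include <cassert>

// Intended to return true for a null pointer, but uses '=' instead of '=='.
bool is_null_buggy(const int* pMsg) {
    if ((pMsg = nullptr)) {  // assigns null, then tests the result: never true
        return true;
    }
    return false;
}

// Constant-first comparison: writing `nullptr = pMsg` here would not compile.
bool is_null(const int* pMsg) {
    return nullptr == pMsg;
}
```

The buggy version reports "not null" for every input, including an actual null pointer.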
To clarify what I wrote in some of the comments, here is a reason not to do this in C++ code.
Someone writes, say, a string class and decides to add a cast operator to const char*:
class BadString
{
public:
BadString(const char* s) : mStr(s) { }
operator const char*() const { return mStr.c_str(); }
bool operator==(const BadString& s) { return mStr == s.mStr; }
// Other stuff...
private:
std::string mStr;
};
Now someone blindly applies the constant == variable "defensive" programming pattern:
BadString s("foo");
if ("foo" == s) // Oops. This compares pointers and is never true.
{
// ...
}
This is, IMO, a more insidious problem than accidental assignment because you can't tell from the call site that anything is obviously wrong.
Of course, the real lessons are:
1. Don't write your own string classes.
2. Avoid implicit cast operators, especially when doing (1).
But sometimes you're dealing with third-party APIs you can't control. For example, the _bstr_t string class common in Windows COM programming suffers from this flaw.
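When you do control the class, one way to avoid the trap is to drop the implicit conversion operator and provide a symmetric non-member operator==. This is only a sketch; the class name SaferString and the c_str accessor are hypothetical.

```cpp
#include <cassert>
#include <string>

class SaferString {
public:
    SaferString(const char* s) : mStr(s) {}

    // Explicit accessor instead of an implicit operator const char*().
    const char* c_str() const { return mStr.c_str(); }

    // Non-member comparison: both operands may convert from const char*,
    // so "foo" == s and s == "foo" both compare contents.
    friend bool operator==(const SaferString& a, const SaferString& b) {
        return a.mStr == b.mStr;
    }

private:
    std::string mStr;
};
```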
When the constant is first, the compiler will warn you if you accidentally write = rather than == since it's illegal to assign a value to a constant.
Compilers outputting warnings is good, but some of us in the real world can't afford to treat warnings as errors. Reversing the order of variable and constant means this simple slip always shows up as an error and prevents compilation. You get used to this pattern very quickly, and the bug it protects against is a subtle one, which is often difficult to find once introduced.
They said, "to prevent mixing of assignment and comparison".
In reality I think it is nonsense: if you are so disciplined that you don't forget to put constant at the left side, you definitely won't mix up '=' with '==', would you? ;)
I forget the article, but the quote went something like:
"Evidently it's easier remembering to put the constant first, than it is remembering to use ==" ;))