How do you deal with NUL? - c++

From time to time, I run into communication issues with other programmers when we talk about NULL. Now NULL could be
a NULL pointer
the NUL character
an empty data element in some sort of database.
NUL seems to be the most confusing. It is the ASCII character 0x00.
I tend to use '\0' in my code to represent it. Some developers in my group
prefer to simply use 0 and let the compiler implicitly convert it to a char.
What do you prefer to use for NUL, and why?

I use '\0' for the nul-character and NULL for pointers because it is clearest in both cases.
BTW, both 0 and '\0' are ints in C and either one will be converted to char when stored in a char variable.

I like the pre-defined NULL macro, as it preserves the semantic meaning, rather than some other use of the number 0.

There are many English words which are spelled or spoken alike, yet which have different meanings. As with those words, use the context in which the discussion is taking place to guide you toward the intended meaning.

For dealing with strings, I always represent the null character as '\0'.
For pointers, I try to use implicit conversion to boolean (if (!myPtr) or if (myPtr)) to test for pointer nullity.
If I need a default value for a pointer, it's NULL (e.g. struct list_head head = { 0.0, NULL };).
END_OF_STRING is silly, since it's extra indirection that simply confuses new readers (anyone who doesn't immediately recognize '\0' should step away from the keyboard).
One other thing—I think the difference between a null value and an empty value is extremely important when talking about data modeling. This is especially true when discussing C-style strings or nullable database fields. There's a huge difference between someone telling you "I have no name" and "My name is ."
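The null-versus-empty distinction can be sketched with C-style strings. This is a minimal example under stated assumptions: the helper `describe_name` is hypothetical, and it uses C++11's `nullptr` where the answers here would write NULL or 0.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: tells "no name" (a null pointer) apart from
// "an empty name" (a valid pointer whose first character is '\0').
std::string describe_name(const char* name) {
    if (name == nullptr) {
        return "I have no name";  // null: the value is absent
    }
    if (name[0] == '\0') {
        return "My name is .";    // empty: present, but zero characters long
    }
    return std::string("My name is ") + name + ".";
}
```

For example, `assert(describe_name("") == "My name is .")` holds, while passing `nullptr` yields "I have no name"; dereferencing without the first check would be undefined behavior.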

@BKB:
I see the point in his advice, but "NULL" makes it clearer that the context is pointers. It's like using "0.0" for floating-point values, or '\0' when dealing with characters. (Likewise, I prefer seeing 0 if a char is being used in an arithmetic context.)
Bjarne further states in this FAQ that NULL is #defined as 0 anyway, so standard code shouldn't have a problem with it. I agree that the all-caps notation is ugly, but we'll have to wait until C++0x (where nullptr will be available as a keyword).

If I remember correctly most C compilers define NULL like this:
#define NULL ((void*)0)
This is to ensure that NULL is interpreted as being a pointer type (in C). However, this can cause issues in the much more type-strict world of C++. E.g.:
// Example taken from wikibooks.org
std::string * str = NULL; // Can't automatically cast void * to std::string *
void (C::*pmf) () = &C::func;
if (pmf == NULL) {} // Can't automatically cast from void * to pointer to member function.
Therefore, in the current C++ standard, null pointers should be initialized with the literal 0. Because of this, and because people are so used to the NULL macro, C++ implementations don't use ((void*)0); when compiling C++ they define NULL as an integer null pointer constant such as 0 (or 0L). E.g.:
#ifdef __cplusplus
#define NULL (0)
#else
#define NULL ((void*)0)
#endif
The C++0x standard now defines a nullptr keyword to represent null pointers. Visual C++ 2005's C++/CLI compiler also uses this keyword when setting managed pointers to null. In current compilers you can create a template to emulate this new keyword.
There is a much more detailed article on wikibooks.org discussing this issue.
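As a sketch of that template emulation, assuming a pre-C++11 compiler: the idiom popularized by Scott Meyers uses conversion-operator templates so one object converts to a null pointer of any pointer or pointer-to-member type, but never to an integer. The names `null_ptr_t` and `null_ptr` are hypothetical, chosen so they don't collide with the real keyword.

```cpp
#include <cassert>

// Hypothetical pre-C++11 stand-in for nullptr.
class null_ptr_t {
public:
    // Converts to a null pointer of any object or function pointer type...
    template <class T>
    operator T*() const { return 0; }

    // ...and to a null pointer-to-member of any class, including
    // pointer-to-member-function types such as void (C::*)().
    template <class C, class T>
    operator T C::*() const { return 0; }

private:
    void operator&() const;  // declared, never defined: taking its address is meaningless
};

const null_ptr_t null_ptr = null_ptr_t();
```

With this, for some class C, both `double* p = null_ptr;` and `void (C::*pmf)() = null_ptr;` compile, which is exactly where ((void*)0) fails in C++, while `int n = null_ptr;` is rejected because no conversion to an integer exists.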

A one-L NUL, it ends a string.
A two-L NULL points to no thing.
And I will bet a golden bull
That there is no three-L NULLL.
(The name of the original author is, alas, lost to the sands of time.)

NULL for databases, NIL for code.

While, on the whole, I would advise using named constants, this is one exception. To me, defining:
#define NULL 0
#define END_OF_STRING '\0'
makes as much sense as defining:
#define SEVEN 7
which is none. And yes, I am aware that NULL is already defined by the compiler, but I never use it. For pointers, 0; for chars, '\0'. Longer does not always mean more expressive.

I quite like
#define ASCII_NUL ('\0')
I only very occasionally mistype '\0' as '0'. But when I have done it, I've found the error very hard to spot by code inspection, with hilarious consequences. So I don't like '\0' much, and prefer ASCII_NUL or 0 (of course the latter has the wrong type in C++). Obviously I use '\0' where demanded by consistency with existing code, or style guides.
The Google C++ style guide, which contains a few things I like and a few I don't, but seems mostly sound, prefers NULL to 0 for pointers. It points out that NULL might not be defined simply as 0 (or 0L), especially in implementations where sizeof(void*) might not be sizeof(int) (or sizeof(long int)).
0 and NULL are both specified to be of integral type, and when converted to a pointer type they both must yield a null pointer value. But they aren't necessarily of the same integral type. So you might conceivably get some useful warnings or errors in some situations by using NULL.
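Where the type of a null pointer constant really bites is overloading. A minimal sketch with hypothetical `pick` overloads: a literal 0 selects the integer overload, while C++11's `nullptr` selects the pointer one. `pick(NULL)` is deliberately left out, because its outcome (int overload, or an ambiguity error if NULL is 0L) depends on how the implementation defines NULL, which is the point made above.

```cpp
#include <cassert>
#include <string>

// Two hypothetical overloads: one for integers, one for pointers.
std::string pick(int)         { return "int"; }
std::string pick(const char*) { return "pointer"; }
```

So `assert(pick(0) == "int")` holds even though the caller may have meant "no string"; only an explicit pointer (or nullptr) reaches the pointer overload.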

For communication I use NULL. If I'm working with a developer who cannot grasp the concept of NULL for different data-types then I'd be concerned.
For implementation it's case-specific. Numbers are 0 (with an f suffix for floating point, e.g. 0.0f), pointers are NULL, and character strings are terminated with 0.

Systems that don't use binary 0 for NULL are getting harder to find. They also tend to have various portability issues. Why? Because on these systems neither memset nor calloc can clear out a struct that contains pointers correctly.
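To illustrate the difference, here is a minimal sketch with a hypothetical `Node` struct: value-initialization sets each pointer member to a genuine null pointer value whatever its bit pattern, whereas memset only writes all-bits-zero, which happens to be a null pointer on mainstream systems but isn't guaranteed by the language.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical struct mixing integers and pointers.
struct Node {
    int value;
    Node* next;
    const char* name;
};

// Portable: value-initialization yields 0 for value and true null
// pointers for next and name, on any architecture.
Node make_node_portable() {
    Node n = Node();
    return n;
}

// Not portable: all-bits-zero is a null pointer only where the null
// pointer is represented as binary 0.
Node make_node_memset() {
    Node n;
    std::memset(&n, 0, sizeof n);
    return n;
}
```

calloc has the same limitation as memset, since it also just hands back zeroed bytes.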

const char END_OF_STRING = '\0';
So when you say:
str[i] = END_OF_STRING;
or
if (*ptr == END_OF_STRING)
there is absolutely no question what you mean.
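A small sketch of the named constant in use, assuming the definition above; `my_strlen` and `truncate_at` are hypothetical helpers, not standard functions.

```cpp
#include <cassert>
#include <cstddef>

const char END_OF_STRING = '\0';

// Hypothetical hand-rolled strlen: counts characters up to the terminator.
std::size_t my_strlen(const char* ptr) {
    std::size_t n = 0;
    while (*ptr != END_OF_STRING) {
        ++ptr;
        ++n;
    }
    return n;
}

// Hypothetical helper: cuts a writable buffer short at index i.
void truncate_at(char* str, std::size_t i) {
    str[i] = END_OF_STRING;
}
```

For example, after `char buf[] = "hello"; truncate_at(buf, 2);`, the call `my_strlen(buf)` returns 2.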

We use NULL for pointers and NULLCHAR for characters, using
#define NULLCHAR '\0'

Sort of related: Slashdot recently had a story on the comp.lang.c FAQ section on null pointers, which I found quite interesting.


Null-terminate string: Use '\0' or just 0?

If I need to null-terminate a string, should I use '\0', or is a plain 0 also enough?
Is there any difference between using
char a[5];
a[0] = 0;
and
char a[5];
a[0] = '\0';
Or is '\0' just preferred to make it clear that I'm null-terminating here, while for the compiler it is the same?
'\0' is a character literal written with an octal escape sequence whose value is 0. So there is no difference in value between them.
Side note: if you are dealing with strings, then you should use std::string.
Use '\0' or just 0?
There is no difference in value.
In C, there is no difference in type: both are int.
In C++ they have different types: char and int. So the edge goes to '\0' as there is no type conversion involved.
Different style guides promote one over the other: '\0' for clarity, 0 for lack of clutter.
Right answer: use the style based on your group's coding standards/guidelines. If your group does not have such a guideline, make one. Better to use one than to have divergent styles.
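The C++ type difference can be checked at compile time. A minimal sketch (the helper `terminate_both` is hypothetical; the static_asserts need C++11): both literals have value 0, but only 0 goes through an int-to-char conversion. In C, by contrast, '\0' itself has type int.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype('\0'), char>::value, "'\\0' is a char in C++");
static_assert(std::is_same<decltype(0), int>::value, "0 is an int");
static_assert('\0' == 0, "but the two have the same value");

// The question's two assignments, side by side.
void terminate_both(char (&a)[5], char (&b)[5]) {
    a[0] = 0;     // int 0, implicitly converted to char
    b[0] = '\0';  // already a char: no conversion involved
}
```

After the call, both arrays hold an empty string; the generated code is the same either way.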
'\0' has exactly the same value as 0, apart from the type. '\0' is just its representation as a char literal. The type char can be initialized from plain int literals, though.
So it's actually impossible to tell what's better, just keep in mind to use it consistently in your code.
Both will generate the same machine code, as 0 will be converted to the character value 0, and '\0' is just another way of writing the character value 0. The latter is clearly a character, so it will show that you didn't accidentally mean to write '0' instead, but other than that, it's exactly the same thing in the end.
It's the same thing. Look at the ASCII table.
From my point of view it's better to use '\0', because it explicitly says that this is the end of the string.
That helps when you read your code (like using NULL for a pointer instead of 0).

Strange definition of FALSE and TRUE, why? [duplicate]

This question already has answers here:
Why #define TRUE (1==1) in a C boolean macro instead of simply as 1?
In some code I am working on I have come across strange re-definitions of truth and falsehood. I have seen such things before to make checks more strict/certain, but this one is a little bizarre in my mind and I wonder if anyone can tell me what could be a good reason for such definitions, see below with my comments next to them:
#define FALSE (1 != 1) // why not just define it as "false" or "0"?
#define TRUE (!FALSE) // why not just define it as "true" or "1"?
There are many other strange oddities in this code base. For example, there are re-definitions for all the standard types, like:
#define myUInt32 unsigned int // why not just use uint32_t from stdint?
All these little "quirks" make me feel like I am missing something obvious, but I really can't see the point :(
Note: Strictly speaking this is C++ code, but it could have been ported from a C project.
The intent appears to be portability.
#define FALSE (1 != 1) // why not just define it as "false" or "0"?
#define TRUE (!FALSE) // why not just define it as "true" or "1"?
These have boolean type in languages that support it (C++), while providing still-useful numeric values for those that don't (C — even C99 and C11, apparently, despite their acquisition of explicit boolean datatypes).
Having booleans where possible is good for function overloading.
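To make the overloading point concrete, here is a minimal sketch using the question's macros and two hypothetical `kind` overloads (static_assert needs C++11): in C++, TRUE and FALSE are bool expressions, so they select the bool overload, whereas a plain 1 selects the int one.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

#define FALSE (1 != 1)
#define TRUE (!FALSE)

// In C++ the macro expands to an expression of type bool.
static_assert(std::is_same<decltype(TRUE), bool>::value, "TRUE is bool in C++");

// Hypothetical overloads, to show which one each argument selects.
std::string kind(bool) { return "bool"; }
std::string kind(int)  { return "int"; }
```

Compiled as C, the same FALSE and TRUE would simply be the ints 0 and 1.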
#define myUInt32 unsigned int // why not just use uint32_t from stdint?
That's fine if stdint is available. You may take such things for granted, but it's a big wide world out there! This code recognises that.
Disclaimer: Personally, I would stick to the standards and simply state that compilers released later than 1990 are a pre-requisite. But we don't know what the underlying requirements are for the project in question.
TRWTF is that the author of the code in question did not explain this in comments alongside.
#define FALSE (1 != 1) // why not just define it as "false" or "0"?
I think it is because the type of the expression (1!=1) depends on the language's support for boolean values: if it is C++, the type is bool, else it is int.
On the other hand 0 is always int, in both languages, and false is not recognized in C.
Strictly speaking this is C++ code, but it could have been ported from a C project.
This is about portability, as mentioned previously, but it actually goes far beyond that. It's a clever exploitation of the language definitions so that the same macros comply with both languages.
These absurd looking macros are not as absurd as they appear at first glance, but they are in fact ingenious, as they guarantee for both C and C++ that TRUE and FALSE have the correct values (and type) of true and false (even if the compiler is a C compiler which doesn't have those keywords, so you can't trivially write something like #define TRUE true).
In C++ code, such a construct would be useless, since the language defines true and false as keywords.
However, in order to have C and C++ seamlessly interoperate in the same code base, you need "something" that works for both (unless you want to use a different code style).
The way these macros are defined reflects the fact that the C++ standard is explicitly vague about what values true and false actually have. The C++ standard states:
Values of type bool are either true or false.
[...]
A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true.
[...]
A prvalue of type bool can be converted to a prvalue of type int, with false becoming zero and true becoming one.
Note how it says that two particular values exist and what their names are, and what corresponding integer values they convert to and from, but it does not say what these values are. You might be inclined to think that the values are obviously 0 and 1 (and incidentally you might have guessed right), but that's not the case. The actual values are deliberately not defined.
This is analogous to how pointers (and in particular the null pointer) are defined. Remember that an integer literal of zero converts to the null pointer, and that a null pointer converts to false, compares equal... blah blah... etc etc.
A lot is being said about which converts to what and whatnot, but it doesn't say anywhere that a null pointer has to have a binary representation of zero (and in fact, there exist some exotic architectures where this isn't the case!).
Many of them have historical reasons: old code migrated from C, code from non-standard C++ compilers, cross-compiler code (portability), backward compatibility, established code styles, or just bad habits.
Some compilers did not have <cstdint> for integer types like uint32_t, or did not have <cstdbool>. A good programmer had to define everything and use the preprocessor heavily to make his program well defined across different compilers.
Today, we can use <cstdint>, true/false, <cstdbool>, ... and everyone is happy!
The nice thing about this definition is that it avoids an implicit conversion between TRUE or FALSE and an integer: in C++ they already have type bool. This is useful to make sure the compiler can't choose the wrong function overload.
C didn't have a native boolean type, so you couldn't strictly use "false" or "true" without defining them elsewhere - and how would you then define that?
The same argument applies to myUInt32: C originally didn't have uint32_t and the other types in stdint, so this provides a means of ensuring you are getting the correct size integer. If you ported to a different architecture, you just need to change the definition of myUInt32 to be whatever equates to an unsigned integer 32 bits wide, be that a long, or a short.
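One way to automate that per-platform choice, sketched under assumptions: the compiler lacks <stdint.h>/<cstdint> but provides <climits>, and a typedef is used instead of the question's #define. The preprocessor picks whichever standard unsigned type has exactly 32 value bits, and a static_assert (C++11) double-checks the result.

```cpp
#include <cassert>
#include <climits>

// Hypothetical fallback for a compiler without <cstdint>.
#if UINT_MAX == 0xFFFFFFFFu
typedef unsigned int myUInt32;
#elif ULONG_MAX == 0xFFFFFFFFul
typedef unsigned long myUInt32;
#else
#error "no 32-bit unsigned type found; edit myUInt32 for this platform"
#endif

// Compile-time check that the choice really is 32 bits wide.
static_assert(sizeof(myUInt32) * CHAR_BIT == 32, "myUInt32 must be 32 bits wide");
```

The #error branch turns a silent wrong-size type into a build failure on an unfamiliar architecture; where <cstdint> exists, std::uint32_t is the simpler choice.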
There is a nice explanation here which states the difference between false and FALSE. I think it might help in understanding it a bit further, though most of the answers have already explained it.

String literals that contain '\0' - why aren't they the same?

So I did the following test:
char* a = "test";
char* b = "test";
char* c = "test\0";
And now the questions:
1) Is it guaranteed that a==b? I know I'm comparing addresses. This is not meant to compare the strings, but to check whether identical string literals are stored in a single memory location.
2) Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?
3) Is an extra \0 appended at the end of c, even though it already contains one?
I didn't want to ask 3 different questions for this because they seem somehow related, sorry 'bout that.
Note: The tag is correct, I'm interested in C++. (although please specify if the behavior is different for C)
Is it guaranteed that a==b?
No. But it is allowed by §2.14.5/12:
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.
And as you can see from that last sentence using char* instead of char const* is a recipe for trouble (and your compiler should be rejecting it; make sure you have warnings enabled and high conformance levels selected).
Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?
No, they're not required to refer to the same array of characters. One has five elements, the other six. An implementation could store the two in overlapping storage, but that's not required.
Is an extra \0 appended at the end of c, even though it already contains one?
Yes.
1 - absolutely not. a might == b though if the compiler chooses to share the same static string.
2 - because they are NOT referring to the same string
3 - yes.
The behavior is no different between C and C++ here except that C++ compilers should reject the assignment to non-const char*.
1) Is it guaranteed that a==b?
It is not. Note that you are comparing addresses and they could be pointing to different locations. Most smart compilers would fold this duplicate literal constant, so the pointers may compare equal, but again it's not guaranteed by the standard.
2) Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?
You are trying to compare pointers, they point to different memory locations. Even if you were comparing the content of such pointers, they are still unequal (see next question).
3) Is an extra \0 appended at the end of c, even though it already contains one?
Yes, there is.
First note that this should be const char* as that's what string literals decay to.
Both create arrays initialized with 't' 'e' 's' 't' followed by a '\0' (length = 5). Comparing for equality will only tell you if they both start at the same address, not if they have the same contents (though if the pointers are equal, the contents necessarily are too).
a isn't equal to c because the same rules apply: a = 't' 'e' 's' 't' '\0' and c = 't' 'e' 's' 't' '\0' '\0'.
Yes, the compiler always does it, and you shouldn't explicitly do it if you're making a string like this. If, however, you created an array and manually populated it, you need to ensure you add the \0.
Note that for my #3, const char s[] = "Hello World" would also automatically get the \0 at the end; I was referring to manually filling the array, not having the compiler work it out.
The problem here is you're mixing the concepts of pointer and textual equivalence.
When you say a == b or a == c you are asking if the pointers involved point to the same physical address. The test has nothing to do with the textual contents of the pointers.
To get textual equivalence you should use strcmp.
If you are doing pointer comparisons, then a != b, b != c, and c != a, unless the compiler is smart enough to notice that your first two strings are the same.
If you compare them pairwise with strcmp, then all your strings will come back as matches.
I am not sure if the compiler will add an additional null termination to c, but I would guess that it would.
As has been said a few times in other answers, you are comparing pointers. However, I would add that strcmp(b,c) should return 0 (equal), because it stops checking at the first \0.
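The answers above can be checked directly. A minimal sketch (the helper `same_text` is hypothetical; static_assert needs C++11): sizeof confirms that the compiler appends a terminator even after an explicit one, while strcmp treats both literals as the same text because it stops at the first '\0'. Whether a == b remains unspecified, so it is not tested.

```cpp
#include <cassert>
#include <cstring>

// Array sizes include the terminator the compiler always appends.
static_assert(sizeof("test") == 5, "t e s t \\0");
static_assert(sizeof("test\0") == 6, "t e s t \\0 \\0: the explicit one plus the appended one");

// Textual comparison: equal up to (and including) the first '\0'.
bool same_text(const char* x, const char* y) {
    return std::strcmp(x, y) == 0;
}
```

So `same_text("test", "test\0")` is true even though the two arrays differ in size.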

Why isn't ("Maya" == "Maya") true in C++?

Any idea why I get "Maya is not Maya" as a result of this code?
if ("Maya" == "Maya")
printf("Maya is Maya \n");
else
printf("Maya is not Maya \n");
Because you are actually comparing two pointers - use e.g. one of the following instead:
if (std::string("Maya") == "Maya") { /* ... */ }
if (std::strcmp("Maya", "Maya") == 0) { /* ... */ }
This is because C++03, §2.13.4 says:
An ordinary string literal has type “array of n const char”
... and in your case a conversion to pointer applies.
See also this question on why you can't provide an overload for == for this case.
You are not comparing strings, you are comparing pointer address equality.
To be more explicit -
"foo baz bar" implicitly defines an anonymous const char[m]. It is implementation-defined whether identical anonymous const char[m] arrays share the same location in memory (a concept referred to as interning).
The function you want, in C, is strcmp(const char*, const char*), which returns 0 on equality.
Or, in C++, what you might do is
#include <string>
std::string s1 = "foo";
std::string s2 = "bar";
and then compare s1 vs. s2 with the == operator, which is defined in an intuitive fashion for strings.
The output of your program is implementation-defined.
A string literal has the type const char[N] (that is, it's an array). Whether or not each string literal in your program is represented by a unique array is implementation-defined. (§2.13.4/2)
When you do the comparison, the arrays decay into pointers (to the first element), and you do a pointer comparison. If the compiler decides to store both string literals as the same array, the pointers compare true; if they each have their own storage, they compare false.
To compare string's, use std::strcmp(), like this:
if (std::strcmp("Maya", "Maya") == 0) // same
Typically you'd use the standard string class, std::string. It defines operator==. You'd need to make one of your literals a std::string to use that operator:
if (std::string("Maya") == "Maya") // same
What you are doing is comparing the address of one string with the address of another. Depending on the compiler and its settings, sometimes the identical literal strings will have the same address, and sometimes they won't (as apparently you found).
Any idea why i get "Maya is not Maya" as a result
Because in C, and thus in C++, string literals are of type const char[], which is implicitly converted to const char*, a pointer to the first character, when you try to compare them. And pointer comparison is address comparison.
Whether the two string literals compare equal or not depends on whether your compiler (using your current settings) pools string literals. It is allowed to do that, but it doesn't need to.
To compare the strings in C, use strcmp() from the <string.h> header. (It's std::strcmp() from <cstring> in C++.)
To do so in C++, the easiest is to turn one of them into a std::string (from the <string> header), which comes with all comparison operators, including ==:
#include <iostream>
#include <string>
// ...
if (std::string("Maya") == "Maya")
std::cout << "Maya is Maya\n";
else
std::cout << "Maya is not Maya\n";
C and C++ do this comparison via pointer comparison; looks like your compiler is creating separate resource instances for the strings "Maya" and "Maya" (probably due to having an optimization turned off).
My compiler says they are the same ;-)
even worse, my compiler is certainly broken. This very basic equation:
printf("23 - 523 = %d\n","23"-"523");
produces:
23 - 523 = 1
Indeed, "because your compiler, in this instance, isn't using string pooling," is the technically correct, yet not particularly helpful answer :)
This is one of the many reasons the std::string class in the Standard Template Library now exists to replace this earlier kind of string when you want to do anything useful with strings in C++, and is a problem pretty much everyone who's ever learned C or C++ stumbles over fairly early on in their studies.
Let me explain.
Basically, back in the days of C, all strings worked like this. A string is just a bunch of characters in memory. A string you embed in your C source code gets translated into a bunch of bytes representing that string in the running machine code when your program executes.
The crucial part here is that a good old-fashioned C-style "string" is an array of characters in memory. That block of memory is often referred to by means of a pointer -- the address of the start of the block of memory. Generally, when you're referring to a "string" in C, you're referring to that block of memory, or a pointer to it. C doesn't have a string type per se; strings are just a bunch of chars in a row.
When you write this in your code:
"wibble"
Then the compiler provides a block of memory that contains the bytes representing the characters 'w', 'i', 'b', 'b', 'l', 'e', and '\0' in that order (the compiler adds a zero byte at the end, a "null terminator". In C a standard string is a null-terminated string: a block of characters starting at a given memory address and continuing until the next zero byte.)
And when you start comparing expressions like that, what happens is this:
if ("Maya" == "Maya")
At the point of this comparison, the compiler -- in your case, specifically; see my explanation of string pooling at the end -- has created two separate blocks of memory, to hold two different sets of characters that are both set to 'M', 'a', 'y', 'a', '\0'.
When the compiler sees a string in quotes like this, "under the hood" it builds an array of characters, and the string itself, "Maya", acts as the name of that array. Its type is const char[5], but in an expression like this the array decays into a pointer to its first character, so what actually gets compared is a pointer to const char.
When you compare these two expressions using "==", what you're actually comparing is the pointers, the memory addresses of the beginning of these two different blocks of memory. Which is why the comparison is false, in your particular case, with your particular compiler.
If you want to compare two good old-fashioned C strings, you should use the strcmp() function. This will examine the contents of the memory pointed to by both "strings" (which, as I've explained, are just pointers to a block of memory) and go through the bytes, comparing them one-by-one, and tell you whether they're really the same.
Now, as I've said, this is the kind of slightly surprising result that's been biting C beginners on the arse since the days of yore. And that's one of the reasons the language evolved over time. Now, in C++, there is a std::string class, that will hold strings, and will work as you expect. The "==" operator for std::string will actually compare the contents of two std::strings.
By default, though, C++ is designed to be backwards-compatible with C, i.e. a C program will generally compile and work under a C++ compiler the same way it does in a C compiler, and that means that old-fashioned strings, "things like this in your code", will still end up as pointers to bits of memory that will give non-obvious results to the beginner when you start comparing them.
Oh, and that "string pooling" I mentioned at the beginning? That's where some more complexity might creep in. A smart compiler, to be efficient with its memory, may well spot that in your case, the strings are the same and can't be changed, and therefore only allocate one block of memory, with both of your names, "Maya", pointing at it. At which point, comparing the "strings" -- the pointers -- will tell you that they are, in fact, equal. But more by luck than design!
This "string pooling" behaviour will change from compiler to compiler, and often will differ between debug and release modes of the same compiler, as the release mode often includes optimisations like this, which will make the output code more compact (it only has to have one block of memory with "Maya" in, not two, so it's saved five -- remember that null terminator! -- bytes in the object code.) And that's the kind of behaviour that can drive a person insane if they don't know what's going on :)
If nothing else, this answer might give you a lot of search terms for the thousands of articles that are out there on the web already, trying to explain this. It's a bit painful, and everyone goes through it. If you can get your head around pointers, you'll be a much better C or C++ programmer in the long run, whether you choose to use std::string instead or not!
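The explanation above can be summarized in a few checkable lines. A minimal sketch (the helper names `equal_c_strings` and `equal_std_strings` are hypothetical; static_assert needs C++11): a string literal is an lvalue array of const char, so decltype yields a reference to that array type, its size includes the null terminator, and only content comparisons give a portable answer.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// "Maya" is an lvalue of type const char[5]: four letters plus '\0'.
static_assert(std::is_same<decltype("Maya"), const char (&)[5]>::value,
              "a string literal is an array of const char");
static_assert(sizeof("Maya") == 5, "the null terminator takes a byte");

// Compare contents, never addresses.
bool equal_c_strings(const char* x, const char* y) {
    return std::strcmp(x, y) == 0;  // byte-by-byte up to the first '\0'
}

bool equal_std_strings(const std::string& x, const std::string& y) {
    return x == y;                  // std::string's == compares contents
}
```

Unlike `"Maya" == "Maya"`, both helpers return true regardless of whether the compiler pools string literals.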

Why/When to use (!!p) instead of (p != NULL)

In the following code, what is the benefit of using (!!p) instead of (p != NULL)?
AClass *p = getInstanceOfAClass();
if( !!p )
// do something
else
// do something without having valid pointer
It is pretty much the same, although I consider !!p to be bad style; it usually indicates a coder trying to be clever.
That's a matter of style, in fact they are equivalent. See this very similar question for discussion.
IMO comparing against null pointer is clearer.
I think GMan’s original comment should be the accepted answer:
I wonder what's wrong with just if (p)
The point is: nothing is wrong with it, and this should be the preferred way. First off, !!p is “too clever”; it’s also completely unnecessary and thus bad (notice: we’re talking about pointers in an if statement here, so Anacrolix’ comment, while generally valid, doesn’t apply here!).
The same goes for p != NULL. While this is possible, it’s just not needed. It’s more code, it’s completely redundant code and hence it makes the code worse. The truest thing Jeff Atwood ever said was that “the best code is no code at all.” Avoid redundant syntax. Stick to the minimum (that still conveys the complete meaning; if (p) is complete).
Finally, if (p) is arguably the most idiomatic way to write this in C++. C++ bends over backwards to get this same behaviour for other types in the language (e.g. data streams), at the cost of some very weird quirks. The next version of the standard even introduces a new syntax to achieve this behaviour in user-defined types.
For pointers, we get the same for free. So use it.
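The new syntax referred to is, in C++11, the explicit conversion operator. A minimal sketch with a hypothetical `Handle` class: `explicit operator bool` lets the object be tested with `if (h)` or `!h` just like a pointer, while still rejecting accidental conversions such as `bool b = h;` or `int n = h;`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper around a raw pointer.
class Handle {
public:
    explicit Handle(int* p = nullptr) : p_(p) {}

    // Usable where a condition is expected (if, while, !, &&, ||),
    // but not as an implicit conversion anywhere else.
    explicit operator bool() const { return p_ != nullptr; }

private:
    int* p_;
};

static_assert(!std::is_convertible<Handle, bool>::value,
              "no implicit conversion to bool");
```

This is what lets streams and smart pointers support `if (x)` without the quirks of the older "safe bool" workarounds.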
/EDIT: About clarity: sharptooth writes that
IMO comparing against null pointer is clearer.
I claim that this is objectively wrong: if (p) is clearer. There is no possible way that this statement could mean anything else, neither in this context nor in any other, in C++.
As far as I can see, it's just a shorter way to convert it into a boolean value. It applies the ! twice, though, whereas p != NULL does one comparison. So I guess the benefit is just shorter code, albeit more cryptic if you don't know what !!p is supposed to mean.
They are the same, but I recommend using
NULL != p
It is more readable.
There is no difference in the given example.
However the assumption that this applies to all cases is incorrect. a = not not b is not the same as a = b, as far as integer types are concerned.
In C, 0 is false. Anything but 0 is true. But not 0 is 1, and nothing else. In C++, true converts to 1 as an integer, not only for backward compatibility with C, but because 1 is not 0, and 1 is the most common value used to denote true in C bool types, including the official C bool type, and BOOL used in Win32.
While for the example code given, !!p is unnecessary because the result is cast to a bool for evaluation of the if condition, that doesn't rule out the use of !! for purposes of casting booleans to expected integer values. Personally in this example, to maximize the probability that type changes and semantics are clear, I would use NULL != p or p != NULL to make it absolutely clear what is meant.
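The integer case is where !! actually changes something. A minimal sketch (the helper `normalize_flag` is hypothetical): `a = b` keeps any nonzero value, while `a = !!b` collapses it to exactly 1. In C++, ! yields bool, which converts to 0 or 1 when returned as int; in C, ! yields an int that is already 0 or 1.

```cpp
#include <cassert>

// Hypothetical helper: maps 0 to 0 and every nonzero value to 1.
int normalize_flag(int x) {
    return !!x;
}
```

For example, with `int b = 4;`, `int a = b;` gives 4 but `int a = normalize_flag(b);` gives 1; in an if condition, as in the question, the two are indistinguishable.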
This technique is known as the double-bang idiom, and this guy provides some good justifications.
Do !!NOT use double negation. A simple argument is that C++ is a limited English subset, and since English just does not have double negation, English speakers will have a lot of difficulty parsing what is going on.