Compare strings ignore NUL

Compare strings ignore NUL - c++

I'm trying to compare two std::strings, "abc\0" and "abc".
Is there a way to ignore a character, in this case '\0' (NUL), when comparing?
Right now I'm doing a pop_back() on the string with the trailing NUL to remove it, but there must be a better way to handle this.

If the null terminator is always at the end, you could use std::strcmp.
Otherwise you can write the loop yourself to iterate both strings and compare the characters, with special rules when encountering the terminator.
A less efficient version can be implemented purely using standard algorithms: Create a copy of each string with the terminators removed (std::copy_if), and then use comparison operator of std::string.

Unlike C-style strings, NUL is a valid character for std::string and will be treated no difference as other characters.
Therefore, if one of your std::strings suffixes a NUL character and, that is not your want, you may should better figure it out why a NUL was included in the first place.
If you can't locate the source and fix it, there are two general workarounds.
(1) Trimming the string before the comparison
you can use a trim()-like funcion from third-party libraries, e.g. boost; or write your own implementation.
This function simply removes NUL character(s).
(2) Comparing strings in C-Style
use std::string::c_str() to obtain a c-style string and then use function like strcmp to do comparison。
Note, if there is a NUL character in the middle of the string, like ab\0c, this workaround would provide you a incorrect result.

If you are comparing two strings, one of length 4 "abc\0" and one of length 3 "abc", of course they are not going to compare equal. The fact that one of the characters is a NUL is not relevant here.
If you're trying to see if one string is the same as the other except that it has a NUL embedded in it at the end - that can be written thus:
std::string s1{"abc\0", 4};
std::string s2{"abc" , 3};
return s2.length() == s1.length() - 1 && // sizes differ by one char
s1.back() == '\0' && // last char is a NUL
s1.compare(0, s1.length() - 1, s2) == 0; // all the other ones are the same

Related

What does '\0' mean?

I can't understand what the '\0' in the two different place mean in the following code:
string x = "hhhdef\n";
cout << x << endl;
x[3]='\0';
cout << x << endl;
cout<<"hhh\0defef\n"<<endl;
Result:
hhhdef
hhhef
hhh
Can anyone give me some pointers?

C++ std::strings are "counted" strings - i.e., their length is stored as an integer, and they can contain any character. When you replace the third character with a \0 nothing special happens - it's printed as if it was any other character (in particular, your console simply ignores it).
In the last line, instead, you are printing a C string, whose end is determined by the first \0 that is found. In such a case, cout goes on printing characters until it finds a \0, which, in your case, is after the third h.

C++ has two string types:
The built-in C-style null-terminated strings which are really just byte arrays and the C++ standard library std::string class which is not null terminated.
Printing a null-terminated string prints everything up until the first null character. Printing a std::string prints the whole string, regardless of null characters in its middle.

\0 is the NULL character, you can find it in your ASCII table, it has the value 0.
It is used to determinate the end of C-style strings.
However, C++ class std::string stores its size as an integer, and thus does not rely on it.

You're representing strings in two different ways here, which is why the behaviour differs.
The second one is easier to explain; it's a C-style raw char array. In a C-style string, '\0' denotes the null terminator; it's used to mark the end of the string. So any functions that process/display strings will stop as soon as they hit it (which is why your last string is truncated).
The first example is creating a fully-formed C++ std::string object. These don't assign any special meaning to '\0' (they don't have null terminators).

The \0 is treated as NULL Character. It is used to mark the end of the string in C.
In C, string is a pointer pointing to array of characters with \0 at the end. So following will be valid representation of strings in C.
char *c =”Hello”; // it is actually Hello\0
char c[] = {‘Y’,’o’,’\0′};
The applications of ‘\0’ lies in determining the end of string .For eg : finding the length of string.

The \0 is basically a null terminator which is used in C to terminate the end of string character , in simple words its value is null in characters basically gives the compiler indication that this is the end of the String Character
Let me give you example -
As we write printf("Hello World"); /* Hello World\0
here we can clearly see \0 is acting as null ,tough printinting the String in comments would give the same output .

String going crazy if I don't give it a little extra room. Can anyone explain what is happening here?

First, I'd like to say that I'm new to C / C++, I'm originally a PHP developer so I am bred to abuse variables any way I like 'em.
C is a strict country, compilers don't like me here very much, I am used to breaking the rules to get things done.
Anyway, this is my simple piece of code:
char IP[15] = "192.168.2.1";
char separator[2] = "||";
puts( separator );
Output:
||192.168.2.1
But if I change the definition of separator to:
char separator[3] = "||";
I get the desired output:
||
So why did I need to give the man extra space, so he doesn't sleep with the man before him?

That's because you get a not null-terminated string when separator length is forced to 2.
Always remember to allocate an extra character for the null terminator. For a string of length N you need N+1 characters.
Once you violate this requirement any code that expects null-terminated strings (puts() function included) will run into undefined behavior.
Your best bet is to not force any specific length:
char separator[] = "||";
will allocate an array of exactly the right size.

Strings in C are NUL-terminated. This means that a string of two characters requires three bytes (two for the characters and the third for the zero byte that denotes the end of the string).
In your example it is possible to omit the size of the array and the compiler will allocate the correct amount of storage:
char IP[] = "192.168.2.1";
char separator[] = "||";
Lastly, if you are coding in C++ rather than C, you're better off using std::string.

If you're using C++ anyway, I'd recommend using the std::string class instead of C strings - much easier and less error-prone IMHO, especially for people with a scripting language background.

There is a hidden nul character '\0' at the end of each string. You have to leave space for that.
If you do
char seperator[] = "||";
you will get a string of size 3, not size 2.

Because in C strings are nul terminated (their end is marked with a 0 byte). If you declare separator to be an array of two characters, and give them both non-zero values, then there is no terminator! Therefore when you puts the array pretty much anything could be tacked on the end (whatever happens to sit in memory past the end of the array - in this case, it appears that it's the IP array).
Edit: this following is incorrect. See comments below.
When you make the array length 3, the extra byte happens to have 0 in it, which terminates the string. However, you probably can't rely on that behavior - if the value is uninitialized it could really contain anything.

In C strings are ended with a special '\0' character, so your separator literal "||" is actually one character longer. puts function just prints every character until it encounters '\0' - in your case one after the IP string.

In C, strings include a (invisible) null byte at the end. You need to account for that null byte.
char ip[15] = "1.2.3.4";
in the code above, ip has enough space for 15 characters. 14 "regular characters" and the null byte. It's too short: should be char ip[16] = "1.2.3.4";
ip[0] == '1';
ip[1] == '.';
/* ... */
ip[6] == '4';
ip[7] == '\0';

Since no one pointed it out so far: If you declare your variable like this, the strings will be automagically null-terminated, and you don't have to mess around with the array sizes:
const char* IP = "192.168.2.1";
const char* seperator = "||";
Note however, that I assume you don't intend to change these strings.
But as already mentioned, the safe way in C++ would be using the std::string class.

A C "String" always ends in NULL, but you just do not give it to the string if you write
char separator[2] = "||". And puts expects this \0 at the ned in the first case it writes till it finds a \0 and here you can see where it is found at the end of the IP address. Interesting enoiugh you can even see how the local variables are layed out on the stack.

The line: char seperator[2] = "||"; should get you undefined behaviour since the length of that character array (which includes the null at the end) will be 3.
Also, what compiler have you compiled the above code with? I compiled with g++ and it flagged the above line as an error.

String in C\C++ are null terminated, i.e. have a hidden zero at the end.
So your separator string would be:
{'|', '|', '\0'} = "||"

STL basic_string length with null characters

Why is it that you can insert a '\0' char in a std::basic_string and the .length() method is unaffected but if you call char_traits<char>::length(str.c_str()) you get the length of the string up until the first '\0' character?
e.g.
string str("abcdefgh");
cout << str.length(); // 8
str[4] = '\0';
cout << str.length(); // 8
cout << char_traits<char>::length(str.c_str()); // 4

Great question!
The reason is that a C-style string is defined as a sequence of bytes that ends with a null byte. When you use .c_str() to get a C-style string out of a C++ std::string, then you're getting back the sequence the C++ string stores with a null byte after it. When you pass this into strlen, it will scan across the bytes until it hits a null byte, then report how many characters it found before that. If the string contains a null byte, then strlen will report a value that's smaller than the whole length of the string, since it will stop before hitting the real end of the string.
An important detail is that strlen and char_traits<char>::length are NOT the same function. However, the C++ ISO spec for char_traits<charT>::length (§21.1.1) says that char_traits<charT>::length(s) returns the smallest i such that char_traits<charT>::eq(s[i], charT()) is true. For char_traits<char>, the eq function just returns if the two characters are equal by doing a == comparison, and constructing a character by writing char() produces a null byte, and so this is equal to saying "where is the first null byte in the string?" It's essentially how strlen works, though the two are technically different functions.
A C++ std::string, however, it a more general notion of "an arbitrary sequence of characters." The particulars of its implementation are hidden from the outside world, though it's probably represented either by a start and stop pointer or by a pointer and a length. Because this representation does not depend on what characters are being stored, asking the std::string for its length tells you how many characters are there, regardless of what those characters actually are.
Hope this helps!

Defining a string with no null terminating char(\0) at the end

What are various ways in C/C++ to define a string with no null terminating char(\0) at the end?
EDIT: I am interested in character arrays only and not in STL string.

Typically as another poster wrote:
char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};
or if your current C charset is ASCII, which is usually true (not much EBCDIC around today)
char s[6] = {115, 116, 114, 105, 110, 107};
There is also a largely ignored way that works only in C (not C++)
char s[6] = "string";
If the array size is too small to hold the final 0 (but large enough to hold all the other characters of the constant string), the final zero won't be copied, but it's still valid C (but invalid C++).
Obviously you can also do it at run time:
char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';
or (same remark on ASCII charset as above)
char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;
Or using memcopy (or memmove, or bcopy but in this case there is no benefit to do that).
memcpy(c, "string", 6);
or strncpy
strncpy(c, "string", 6);
What should be understood is that there is no such thing as a string in C (in C++ there is strings objects, but that's completely another story). So called strings are just char arrays. And even the name char is misleading, it is no char but just a kind of numerical type. We could probably have called it byte instead, but in the old times there was strange hardware around using 9 bits registers or such and byte implies 8 bits.
As char will very often be used to store a character code, C designers thought of a simpler way than store a number in a char. You could put a letter between simple quotes and the compiler would understand it must store this character code in the char.
What I mean is (for example) that you don't have to do
char c = '\0';
To store a code 0 in a char, just do:
char c = 0;
As we very often have to work with a bunch of chars of variable length, C designers also choosed a convention for "strings". Just put a code 0 where the text should end. By the way there is a name for this kind of string representation "zero terminated string" and if you see the two letters sz at the beginning of a variable name it usually means that it's content is a zero terminated string.
"C sz strings" is not a type at all, just an array of chars as normal as, say, an array of int, but string manipulation functions (strcmp, strcpy, strcat, printf, and many many others) understand and use the 0 ending convention. That also means that if you have a char array that is not zero terminated, you shouldn't call any of these functions as it will likely do something wrong (or you must be extra carefull and use functions with a n letter in their name like strncpy).
The biggest problem with this convention is that there is many cases where it's inefficient. One typical exemple: you want to put something at the end of a 0 terminated string. If you had kept the size you could just jump at the end of string, with sz convention, you have to check it char by char. Other kind of problems occur when dealing with encoded unicode or such. But at the time C was created this convention was very simple and did perfectly the job.
Nowadays, the letters between double quotes like "string" are not plain char arrays as in the past, but const char *. That means that what the pointer points to is a constant that should not be modified (if you want to modify it you must first copy it), and that is a good thing because it helps to detect many programming errors at compile time.

The terminating null is there to terminate the string. Without it, you need some other method to determine it's length.
You can use a predefined length:
char s[6] = {'s','t','r','i','n','g'};
You can emulate pascal-style strings:
unsigned char s[7] = {6, 's','t','r','i','n','g'};
You can use std::string (in C++). (since you're not interested in std::string).
Preferably you would use some pre-existing technology that handles unicode, or at least understands string encoding (i.e., wchar.h).
And a comment: If you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string". This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.
typedef struct {
char[10] characters;
} ThisIsNotACString;

C++ std::strings are not NUL terminated.
P.S : NULL is a macro1. NUL is \0. Don't mix them up.
1: C.2.2.3 Macro NULL
The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>,
<ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International
Standard (18.1).

In C++ you can use the string class and not deal with the null char at all.

Just for the sake of completeness and nail this down completely.
vector<char>

Use std::string.
There are dozens of other ways to store strings, but using a library is often better than making your own. I'm sure we could all come up with plenty of wacky ways of doing strings without null terminators :).

In C there generally won't be an easier solution. You could possibly do what pascal did and put the length of the string in the first character, but this is a bit of a pain and will limit your string length to the size of the integer that can fit in the space of the first char.
In C++ I'd definitely use the std::string class that can be accessed by
#include <string>
Being a commonly used library this will almost certainly be more reliable than rolling your own string class.

The reason for the NULL termination is so that the handler of the string can determine it's length. If you don't use a NULL termination, you need to pass the strings length, either through a separate parameter/variable, or as part of the string. Otherwise, you could use another delimeter, so long as it isn't used within the string itself.
To be honest, I don't quite understand your question, or if it actually is a question.

Even the string class will store it with a null. If for some reason you absolutely do not want a null character at the end of your string in memory, you'd have to manually create a block of characters, and fill it out yourself.
I can't personally think of any realistic scenario for why you'd want to do this, since the null character is what signals the end of the string. If you're storing the length of the string too, then I guess you've saved one byte at the cost of whatever the size of your variable is (likely 4 bytes), and gained faster access to the length of said string.

How to use string and string pointers in C++

I am very confused about when to use string (char) and when to use string pointers (char pointers) in C++. Here are two questions I'm having.
which one of the following two is correct?
string subString;
subString = anotherString.sub(9);
string *subString;
subString = &anotherString.sub(9);
which one of the following two is correct?
char doubleQuote = aString[9];
if (doubleQuote == "\"") {...}
char *doubleQuote = &aString[9];
if (doubleQuote == "\"") {...}

None of them are correct.
The member function sub does not exist for string, unless you are using another string class that is not std::string.
The second one of the first question subString = &anotherString.sub(9); is not safe, as you're storing the address of a temporary. It is also wrong as anotherString is a pointer to a string object. To call the sub member function, you need to write anotherString->sub(9). And again, member function sub does not exist.
The first one of the second question is more correct than the second one; all you need to do is replace "\"" with '\"'.
The second one of the second question is wrong, as:
doubleQuote does not refer to the 10th character, but the string from the 10th character onwards
doubleQuote == "\"" may be type-wise correct, but it doesn't compare equality of the two strings; it checks if they are pointing to the same thing. If you want to check the equality of the two strings, use strcmp.

In C++, you can (and should) always use std::string (while remembering that string literals actually are zero-terminated character arrays). Use char* only when you need to interface with C code.
C-style strings need error-prone manual memory management, need to explicitly copy strings (copying pointers doesn't copy the string), and you need to pay attention to details like allocating enough memory to have the terminating '\0' fit in, while std::string takes care of all this automagically.

For the first question, the first sample, assuming sub will return a substring of the provided string.
For the second, none:
char doubleQuote = aString[9];
if( doubleQuote == '\"') { ... }

Erm, are you using string from STL?
(i.e. you have something like
#include <string>
#using namespace std;
in the beginning of your source file ;) )
then it would be like
string mystring("whatever:\"\""");
char anElem = mystring[9];
if (anElem=="\"") { do_something();}
or you can write
mystring.at(9)
instead of square brackets.
May be these examples can help.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Compare strings ignore NUL - c++

I'm trying to compare two std::strings, "abc\0" and "abc". Is there a way to ignore a character, in this case '\0' (NUL), when comparing? Right now I'm doing a pop_back() on the string with the trailing NUL to remove it, but there must be a better way to handle this.

Related

What does '\0' mean?

String going crazy if I don't give it a little extra room. Can anyone explain what is happening here?

STL basic_string length with null characters

Defining a string with no null terminating char(\0) at the end

How to use string and string pointers in C++

Categories

Resources