single character c-style string full of junk - c++

It's a shame I can't figure out such basic thing about c++, but c-style strings are acting as I wouldn't expect. For example, I create it like this:
char* cstr = new char[1];
It's initialized to: Íýýýýý««««««««îţ . Like normal, I can set just first char because others are not really existing (or I thought that they aren't). While working whit c-style strings all this junk is ingored and everything works fine.
Now I mixed std::string whit those c-stlye one and what I get is a mess. Whit this code:
std::string str = "aaa";
str += cstr;
I end up whit: aaaÍýýýýý««««««««îţ , but now those characters actually exist as string.size() returns length including this junk.
I can't find why is this happening, but it must be connected whit string creating, because something like char* cstr = "aaa" results in aaa without any additional junk, but trying to change string initialized this way results in memory access violation. Could someone explain me this behavior please? Thanks!
PS: My JavaScript Failed to load so if someone could format this post properly, I'd be glad!
Answer: Oh god! How could I forget on that... thanks to all for, well, immediate answer. Best one was from minitech so I'll mark this as answer as soon as my java script loads up :/

All C-style strings are null-terminated. So, a string initialized using new char[1] leaves you space for no characters. You can't set the first character to anything but \0, otherwise normal string operations will keep reading into memory until they find a zero. So use new char[2] instead.

When working with C-style strings you need to have a null terminator:
char* cstr = new char[2];
cstr[0] = 'X';
cstr[1] = '\0';
Having said all that, it is really bad code to do the above. Just use std::string unless you have a very good reason not too. It takes care of the memory allocations and deallocations for you.

C-style strings require a NUL ('\0') terminator; they don't have a length associated with them like C++ strings do. So your single-character string must be new char[2]; it will not be initialized; and you will need to make sure it's terminated with \0.

When you use new char[1], you request space for an array of characters. There is no request that said characters are initialized. Thus, the "junk" that you see is uninitialized memory. Before treating the array as a C-style string, you should do this:
cstr[0] = '\0';

c-style strings are NULL delimited. So, to ignore any junk in memory you need to place NULL byte('\0') in the string body. Otherwise, system library function will look at all bytes starting with your string start until they meet NULL byte in the memory (which will be at some random position).
This also mean that to have c-style string of one character you actually need to allocate 2 bytes: one for a meaningful character and second for '\0'.

Related

Please explain me the need for appending '\0' while converting string to char

While using proc/mysql for c++ I have taken string as user input and converted into char via strcpy(c,s.c_str()); function, where c is the binding variable through which I'll add value in the database table and s is the string (user input), it is working fine but my teacher is asking me append '\0' at the end - I can't understand the reason why I need to?
Your teacher is deluded.
c_str() in itself appends a zero [or rather, std::string reserves space for an extra character when creating the string, and makes sure this is zero at least at the point of c_str() returning - in C++11, it is guaranteed that there is an extra character space filled with zero at the end of the string, always].
You DO need a zero at the end of a string to mark the end of the string in a C-style string, such as those used by strcpy.
[As others have pointed out, you should also check that the string fits before copying, and I would suggest reject if it won't fit, as truncating it will lead to other problems - as well as checking that there isn't any sql-injection attacks and a multitude of other things required for "good pracice in an SQL environment"]
While the teacher is deluded on the appending '\0' to the string, your code exhibits another very bad bug.
You should never use strcpy in such a fashion. You should always use some routine which controls the nubmer of characters copied, like strncpy(), or other alternatives, and provide it with the size of receiving variable. Otherwise you are just asking for troubles.
Just guessing, it's a protection against buffer overflow. If c is only N bytes long and s.c_str() returns a pointer to a N+k length string, you'd write k bytes after c, which is bad.
Now let's say (if you didn't SEGFAULT already) you pass this c NUL-terminated string to a C function, you have no guarantee that the \0 you wrote after c is still there. This C function will then read an undefined amount of bytes after c, which is badder worse.
Anyway, use ::strncpy():
char c[64];
::strncpy(c, s.c_str(), sizeof(c));
c[sizeof(c)-1] = '\0';

how to make a not null-terminated c string?

i am wondering :char *cs = .....;what will happen to strlen() and printf("%s",cs) if cs point to memory block which is huge but with no '\0' in it?
i write these lines:
char s2[3] = {'a','a','a'};
printf("str is %s,length is %d",s2,strlen(s2));
i get the result :"aaa","3",but i think this result is because that a '\0'(or a 0 byte) happens to reside in the location s2+3.
how to make a not null-terminated c string? strlen and other c string function relies heavily on the '\0' byte,what if there is no '\0',i just want know this rule deeper and better.
ps: my curiosity is aroused by studying the follw post on SO.
How to convert a const char * to std::string
and these word in that post :
"This is actually trickier than it looks, because you can't call strlen unless the string is actually nul terminated."
If it's not null-terminated, then it's not a C string, and you can't use functions like strlen - they will march off the end of the array, causing undefined behaviour. You'll need to keep track of the length some other way.
You can still print a non-terminated character array with printf, as long as you give the length:
printf("str is %.3s",s2);
printf("str is %.*s",s2_length,s2);
or, if you have access to the array itself, not a pointer:
printf("str is %.*s", (int)(sizeof s2), s2);
You've also tagged the question C++: in that language, you usually want to avoid all this error-prone malarkey and use std::string instead.
A "C string" is, by definition, null-terminated. The name comes from the C convention of having null-terminated strings. If you want something else, it's not a C string.
So if you have a string that is not null-terminated, you cannot use the C string manipulation routines on it. You can't use strlen, strcpy or strcat. Basically, any function that takes a char* but no separate length is not usable.
Then what can you do? If you have a string that is not null-terminated, you will have the length separately. (If you don't, you're screwed. You need some way to find the length, either by a terminator or by storing it separately.) What you can do is allocate a buffer of the appropriate size, copy the string over, and append a null. Or you can write your own set of string manipulation functions that work with pointer and length. In C++ you can use std::string's constructor that takes a char* and a length; that one doesn't need the terminator.
Your supposition is correct: your strlen is returning the correct value out of sheer luck, because there happens to be a zero on the stack right after your improperly terminated string. It probably helps that the string is 3 bytes, and the compiler is likely aligning stuff on the stack to 4-byte boundaries.
You cannot depend on this. C strings need NUL characters (zeroes) at the end to work correctly. C string handling is messy, and error-prone; there are libraries and APIs that help make it less so… but it's still easy to screw up. :)
In this particular case, your string could be initialized as one of these:
A: char s2[4] = { 'a','a','a', 0 }; // good if string MUST be 3 chars long
B: char *s2 = "aaa"; // if you don't need to modify the string after creation
C: char s2[]="aaa"; // if you DO need to modify the string afterwards
Also note that declarations B and C are 'safer' in the sense that if someone comes along later and changes the string declaration in a way that alters the length, B and C are still correct automatically, whereas A depends on the programmer remembering to change the array size and keeping the explicit null terminator at the end.
What happens is that strlen keeps going, reading memory values until it eventually gets to a null. it then assumes that is the terminator and returns the length that could be massively large. If you're using strlen in an environment that expects C-strings to be used, you could then copy this huge buffer of data into another one that is just not big enough - causing buffer overrun problems, or at best, you could copy a large amount of garbage data into your buffer.
Copying a non-null terminated C string into a std:string will do this. If you then decide that you know this string is only 3 characters long and discard the rest, you will still have a massively long std:string that contains the first 3 good characters and then a load of wastage. That's inefficient.
The moral is, if you're using the CRT functions to operator on C strings, they must be null-terminated. Its no different to any other API, you must follow the rules that API sets down for correct usage.
Of course, there is no reason you cannot use the CRT functions if you always use the specific-length versions (eg strncpy) but you will have to limit yourself to just those, always, and manually keep track of the correct lengths.
Convention states that a char array with a terminating \0 is a null terminated string. This means that all str*() functions expect to find a null-terminator at the end of the char-array. But that's it, it's convention only.
By convention also strings should contain printable characters.
If you create an array like you did char arr[3] = {'a', 'a', 'a'}; you have created a char array. Since it is not terminated by a \0 it is not called a string in C, although its contents can be printed to stdout.
The C standard does not define the term string until the section 7 - Library functions. The definition in C11 7.1.1p1 reads:
A string is a contiguous sequence of characters terminated by and including the first null character.
(emphasis mine)
If the definition of string is a sequence of characters terminated by a null character, a sequence of non-null characters not terminated by a null is not a string, period.
What you have done is undefined behavior.
You are trying to write to a memory location that is not yours.
Change it to
char s2[] = {'a','a','a','\0'};

Why are strings in C++ usually terminated with '\0'?

In many code samples, people usually use '\0' after creating a new char array like this:
string s = "JustAString";
char* array = new char[s.size() + 1];
strncpy(array, s.c_str(), s.size());
array[s.size()] = '\0';
Why should we use '\0' here?
The title of your question references C strings. C++ std::string objects are handled differently than standard C strings. \0 is important when using C strings, and when I use the term string in this answer, I'm referring to standard C strings.
\0 acts as a string terminator in C. It is known as the null character, or NUL, and standard C strings are null-terminated. This terminator signals code that processes strings - standard libraries but also your own code - where the end of a string is. A good example is strlen which returns the length of a string: strlen works using the assumption that it operates on strings that are terminated using \0.
When you declare a constant string with:
const char *str = "JustAString";
then the \0 is appended automatically for you. In other cases, where you'll be managing a non-constant string as with your array example, you'll sometimes need to deal with it yourself. The docs for strncpy, which is used in your example, are a good illustration: strncpy copies over the null terminator character except in the case where the specified length is reached before the entire string is copied. Hence you'll often see strncpy combined with the possibly redundant assignment of a null terminator. strlcpy and strcpy_s were designed to address the potential problems that arise from neglecting to handle this case.
In your particular example, array[s.size()] = '\0'; is one such redundancy: since array is of size s.size() + 1, and strncpy is copying s.size() characters, the function will append the \0.
The documentation for standard C string utilities will indicate when you'll need to be careful to include such a null terminator. But read the documentation carefully: as with strncpy the details are easily overlooked, leading to potential buffer overflows.
Why are strings in C++ usually terminated with '\0'?
Note that C++ Strings and C strings are not the same.
In C++ string refers to std::string which is a template class and provides a lot of intuitive functions to handle the string.
Note that C++ std::string are not \0 terminated, but the class provides functions to fetch the underlying string data as \0 terminated c-style string.
In C a string is collection of characters. This collection usually ends with a \0.
Unless a special character like \0 is used there would be no way of knowing when a string ends.
It is also aptly known as the string null terminator.
Ofcourse, there could be other ways of bookkeeping to track the length of the string, but using a special character has two straight advantages:
It is more intuitive and
There are no additional overheads
Note that \0 is needed because most of Standard C library functions operate on strings assuming they are \0 terminated.
For example:
While using printf() if you have an string which is not \0terminated then printf() keeps writing characters to stdout until a \0 is encountered, in short it might even print garbage.
Why should we use '\0' here?
There are two scenarios when you do not need to \0 terminate a string:
In any usage if you are explicitly bookkeeping length of the string and
If you are using some standard library api will implicitly add a \0 to strings.
In your case you already have the second scenario working for you.
array[s.size()] = '\0';
The above code statement is redundant in your example.
For your example using strncpy() makes it useless. strncpy() copies s.size() characters to your array, Note that it appends a null termination if there is any space left after copying the strings. Since arrayis of size s.size() + 1 a \0 is automagically added.
'\0' is the null termination character. If your character array didn't have it and you tried to do a strcpy you would have a buffer overflow. Many functions rely on it to know when they need to stop reading or writing memory.
strncpy(array, s.c_str(), s.size());
array[s.size()] = '\0';
Why should we use '\0' here?
You shouldn't, that second line is waste of space. strncpy already adds a null termination if you know how to use it. The code can be rewritten as:
strncpy(array, s.c_str(), s.size()+1);
strncpy is sort of a weird function, it assumes that the first parameter is an array of the size of the third parameter. So it only copies null termination if there is any space left after copying the strings.
You could also have used memcpy() in this case, it will be slightly more efficient, though perhaps makes the code less intuitive to read.
In C, we represent string with an array of char (or w_char), and use special character to signal the end of the string. As opposed to Pascal, which stores the length of the string in the index 0 of the array (thus the string has a hard limit on the number of characters), there is theoretically no limit on the number of characters that a string (represented as array of characters) can have in C.
The special character is expected to be NUL in all the functions from the default library in C, and also other libraries. If you want to use the library functions that relies on the exact length of the string, you must terminate the string with NUL. You can totally define your own terminating character, but you must understand that library functions involving string (as array of characters) may not work as you expect and it will cause all sorts of errors.
In the snippet of code given, there is a need to explicitly set the terminating character to NUL, since you don't know if there are trash data in the array allocated. It is also a good practice, since in large code, you may not see the initialization of the array of characters.

strncpy char string issue when adding length

I'm having a problem with comparing 2 char strings that are both the same:
char string[50];
strncpy(string, "StringToCompare", 49);
if( !strcmp("StringToCompare", string) )
//do stuff
else
//the code runs into here even tho both strings are the same...this is what the problem is.
If I use:
strcpy(string, "StringToCompare");
instead of:
strncpy(string, "StringToCompare", 49);
it solves the problem, but I would rather insert the length of the string rather than it getting it itself.
What's going wrong here? How do I solve this problem?
You forgot to put a terminating NUL character to string, so maybe strcmp run over the end. Use this line of code:
string[49] = '\0';
to solve your problem.
You need to set the null terminator manually when using strncpy:
strncpy(string, "StringToCompare", 48);
string[49] = 0;
Lots of apparent guesses in the other answers, but a quick suggestion.
First of all, the code as written should work (and in fact, does work in Visual Studio 2010). The key is in the details of 'strncpy' -- it will not implicity add a null terminating character unless the source length is less than the destination length (which it is in this case). strcpy on the other hand does include the null terminator in all cases, suggesting that your compiler isn't properly handling the strncpy function.
So, if this isn't working on your compiler, you should likely initialize your temporary buffer like this:
char string[50] = {0}; // initializes all the characters to 0
// below should be 50, as that is the number of
// characters available in the string (not 49).
strncpy(string, "StringToCompare", 50);
However, I suspect this is likely just an example, and in the real world your source string is 49 (again, you should pass 50 to strncpy in this case) characters or longer, in which case the NULL terminator is NOT being copied into your temporary string.
I would echo the suggestions in the comments to use std::string if available. It takes care of all of this for you, so you can focus on your implementation rather than these trite details.
The byte count parameter in strncpy tells the function how many bytes to copy, not the length of the character buffer.
So in your case you are asking to copy 49 bytes from your constant string into the buffer, which I don't think is your intent!
However, it doesn't explain why you are getting the anomalous result. What compiler are you using? When I run this code under VS2005 I get the correct behavior.
Note that strncpy() has been deprecated in favor of strncpy_s, which does want the buffer length passed to it:
strncpy_s (string,sizeof(string),"StringToCompare",49)
strcopy and strncpy: in this situation they behave identically!!
So you didn't tell us the truth or the whole picture (eg: the string is at least 49 characters long)

What is the internal structure of an object of the (EDIT: MFC) CString class?

I need to strncpy() (effectively) from a (Edit: MFC) CString object to a C string variable. It's well known that strncpy() sometimes fails (depending on the source length **EDIT and the length specified in the call) to terminate the dest C string correctly. To avoid that evil, I'm thinking to store a NUL char inside the CString source object and then to strcpy() or memmove() that guy.
Is this a reasonable way to go about it? If so, what must I manipulate inside the CString object? If not, then what's an alternative that will guarantee a properly-terminated destination C string?
strncpy() only "fails" to null-terminate the destination string when the source string is longer than the length limit you specify. You can ensure that the destination is null-terminated by setting its last character to null yourself. For example:
#define DEST_STR_LEN 10
char dest_str[DEST_STR_LEN + 1]; // +1 for the null
strncpy(dest_str, src_str, DEST_STR_LEN);
dest_str[DEST_STR_LEN] = '\0';
If src_str is more than DEST_STR_LEN characters long, dest_str will be a properly-terminated string of DEST_STR_LEN characters. If src_str is shorter than that, strncpy() will put a null terminator somewhere within dest_str, so the null at the very end is irrelevant and harmless.
CSimpleStringT::GetString gives a pointer to a null-terminated string. Use this as the soure for strncpy. As this is C++, you should only use C-style strings when interfacing with legacy APIs. Use std::string instead.
One of the alternative ways would be to zero string first and then cast or memcpy from CString.
I hope they don't changed from when I used them: that was many years ago :)
They used an interesting 'trick' to handle the refcount and the very fast and efficient automatic conversion to char*: i.e the pointer is to LPCSTR, but some back byte is reserved to keep the implementation state.
So the struct can be used with the older windows API (LPCSTR without overhead). I found at the time the idea interesting!
Of course the key ìs the availability of allocators: they simply offsets the pointer when mallocing/freeing.
I remember there was a buffer request to (for instance) modify the data available: GetBuffer(0), followed by ReleaseBuffer().
HTH
If you are not compiling with _UNICODE enabled, then you can get a const char * from a CString very easily. Just cast it to an LPCTSTR:
CString myString("stuff");
const char *byteString = (LPCTSTR)myString;
This is guaranteed to be NULL-terminated.
If you have built with _UNICODE, then CString is a UTF-16 encoded string. You can't really do anything directly with that.
If you do need to copy the data from the CString, this very easy, even using C-style code. Just make sure that you allocate sufficient memory and are copying the right length:
CString myString("stuff");
char *outString = (char*)malloc(myString.Length() + 1);
strncpy(outString, (LPCTSTR)myString, myString.Length());
CString ends with NULL so as long as your text is correct (no NULL characters inside) then copying should be safe. You can write:
char szStr[256];
strncpy(szStr, (LPCSTR) String, 3);
szStr[3]='\0'; /// b-cos no null-character is implicitly appended to the end of destination
if you store null somehere inside CString object you will probably cause yourself more problems, CString stores its lenght internally.
Another alternative solution would rather involve support from CPU or compiler, as it's much better approach - simply make sure that when copying memory in "safe" mode, at any time after every atomic operation there is zero added on the end, so when whole loop fails, the destination string will still be terminated, without need to zero it fully before making copy.
There could be also support for fast zero - just mark start and stop of zeroed region and it's instantly cleared in RAM, this would make things a lot easier.