I have to write a function that fills a char* buffer for an assigned length with the content of a string. If the string is too long, I just have to cut it. The buffer is not allocated by me but by the user of my function. I tried something like this:
int writebuff(char* buffer, int length){
string text="123456789012345";
memcpy(buffer, text.c_str(),length);
//buffer[length]='\0';
return 1;
}
int main(){
char* buffer = new char[10];
writebuff(buffer,10);
cout << "After: "<<buffer<<endl;
}
my question is about the terminator: should it be there or not? This function is used in a much wider code and sometimes it seems I get problems with strange characters when the string needs to be cut.
Any hints on the correct procedure to follow?
A C-style string must be terminated with a zero character '\0'.
In addition you have another problem with your code - it may try to copy from beyond the end of your source string. This is classic undefined behavior. It may look like it works, until the one time that the string is allocated at the end of a heap memory block and the copy goes off into a protected area of memory and fails spectacularly. You should copy only until the minimum of the length of the buffer or the length of the string.
P.S. For completeness here's a good version of your function. Thanks to Naveen for pointing out the off-by-one error in your terminating null. I've taken the liberty of using your return value to indicate the length of the returned string, or the number of characters required if the length passed in was <= 0.
int writebuff(char* buffer, int length)
{
string text="123456789012345";
if (length <= 0)
return text.size();
if (text.size() < length)
{
memcpy(buffer, text.c_str(), text.size()+1);
return text.size();
}
memcpy(buffer, text.c_str(), length-1);
buffer[length-1]='\0';
return length-1;
}
If you want to treat the buffer as a string you should NULL terminate it. For this you need to copy length-1 characters using memcpy and set the length-1 character as \0.
it seems you are using C++ - given that, the simplest approach is (assuming that NUL termination is required by the interface spec)
int writebuff(char* buffer, int length)
{
string text = "123456789012345";
std::fill_n(buffer, length, 0); // reset the entire buffer
// use the built-in copy method from std::string, it will decide what's best.
text.copy(buffer, length);
// only over-write the last character if source is greater than length
if (length < text.size())
buffer[length-1] = 0;
return 1; // eh?
}
char * Buffers must be null terminated unless you are explicitly passing out the length with it everywhere and saying so that the buffer is not null terminated.
Whether or not you should terminate the string with a \0 depends on the specification of your writebuff function. If what you have in buffer should be a valid C-style string after calling your function, you should terminate it with a \0.
Note, though, that c_str() will terminate with a \0 for you, so you could use text.size() + 1 as the size of the source string. Also note that if length is larger than the size of the string, you will copy further than what text provides with your current code (you can use min(length - 2, text.size() + 1/*trailing \0*/) to prevent that, and set buffer[length - 1] = 0 to cap it off).
The buffer allocated in main is leaked, btw
my question is about the terminator: should it be there or not?
Yes. It should be there. Otherwise how would you later know where the string ends? And how would cout would know? It would keep printing garbage till it encounters a garbage whose value happens to be \0. Your program might even crash.
As a sidenote, your program is leaking memory. It doesn't free the memory it allocates. But since you're exiting from the main(), it doesn't matter much; after all once the program ends, all the memory would go back to the OS, whether you deallocate it or not. But its good practice in general, if you don't forget deallocating memory (or any other resource ) yourself.
I agree with Necrolis that strncpy is the way to go, but it will not get the null terminator if the string is too long. You had the right idea in putting an explicit terminator, but as written your code puts it one past the end. (This is in C, since you seemed to be doing more C than C++?)
int writebuff(char* buffer, int length){
char* text="123456789012345";
strncpy(buffer, text, length);
buffer[length-1]='\0';
return 1;
}
It should most defiantly be there*, this prevents strings that are too long for the buffer from filling it completely and causing an overflow later on when its accessed. though imo, strncpy should be used instead of memcpy, but you'll still have to null terminate it. (also your example leaks memory).
*if you're ever in doubt, go the safest route!
First, I don't know whether writerbuff should terminate the string or not. That is a design question, to be answered by the person who decided that writebuff should exist at all.
Second, taking your specific example as a whole, there are two problems. One is that you pass an unterminated string to operator<<(ostream, char*). Second is the commented-out line writes beyond the end of the indicated buffer. Both of these invoke undefined behavior.
(Third is a design flaw -- can you know that length is always less than the length of text?)
Try this:
int writebuff(char* buffer, int length){
string text="123456789012345";
memcpy(buffer, text.c_str(),length);
buffer[length-1]='\0';
return 1;
}
int main(){
char* buffer = new char[10];
writebuff(buffer,10);
cout << "After: "<<buffer<<endl;
}
In main(), you should delete the buffer you allocated with new., or allocate it statically (char buf[10]). Yes, it's only 10 bytes, and yes, it's a memory "pool," not a leak, since it's a one-time allocations, and yes, you need that memory around for the entire running time of the program. But it's still a good habit to be into.
In C/C++ the general contract with character buffers is that they be null-terminiated, so I would include it unless I had been explicitly told not to do it. And if I did, I would comment it, and maybe even use a typedef or name on the char * parameter indicating that the result is a string that is not null terminated.
Related
According to this StackOverflow comment strncpy should never be used with a non-fixed length array.
strncpy should never be used unless you're working with fixed-width, not-necessarily-terminated string fields in structures/binary files. – R.. Jan 11 '12 at 16:22
I understand that it is redundant if you are dynamically allocating memory for the string but is there a reason why it would be bad to use strncpy over strcpy
strncpy will copy data up to the limit you specify--but if it reaches that limit before the end of the string, it'll leave the destination unterminated.
In other words, there are two possibilities with strncpy. One is that you get behavior precisely like strcpy would have produced anyway (except slower, since it fills the remainder of the destination buffer with NULs, which you virtually never actually want or care about). The other is that it produces a result you generally can't put to any real use.
If you want to copy a string up to a maximum length into a fixed-length buffer, you can (for example) use sprintf to do the job:
char buffer[256];
sprintf(buffer, "%255s", source);
Unlike strncpy, this always zero-terminates the result, so the result is always usable as a string.
If you don't want to use sprintf (or similar), I'd advise just writing a function that actually does what you want, something on this general order:
void copy_string(char const *dest, char const *source, size_t max_len) {
size_t i;
for (i=0; i<max_len-1 && source[i]; i++)
dest[i] = source[i];
dest[i] = '\0';
}
Since you've tagged this as C++ (in addition to C): my advice would be to generally avoid this whole mess in C++ by just using std::string.
If you really have to work with NUL-terminated sequences in C++, you might consider another possibility:
template <size_t N>
void copy_string(char const (&dest)[N], char const *source) {
size_t i;
for (i=0; i<N-1 && source[i]; i++)
dest[i] = source[i];
dest[i] = '\0';
}
This only works when the destination is an actual array (not a pointer), but for that case, it gets the compiler to deduce the size of the array, instead of requiring the user to pass it explicitly. This will generally make the code a tiny bit faster (less overhead in the function call) and much harder to screw up and pass the wrong size.
The argument against using strncpy is that it does not guarentee that your string will be null terminated.
The less error prone way to copy a string in C when using non-fixed length arrays is to use snprintf which does guarentee null termination of your string.
A good Blog Post Commenting on *n* functions.
These functions let you specify the size of the buffer but – and this is really important – they do not guarantee null-termination. If you ask these functions to write more characters than will fill the buffer then they will stop – thus avoiding the buffer overrun – but they will not null-terminate the buffer.
Which means that the use of strncpy and other such functions when not dealing with fixed arrays introduces unnessisary risk of non-null terminated strings which can be time-bombs in your code.
char * strncpy ( char * destination, const char * source, size_t num );
Limitations of strncpy():
It doesn't put a null-terminator on the destination string if it is completely filled. And, no null-character is implicitly appended at the end of destination if source is longer than num.
If num is greater than the length of source string, the destination string is padded with null characters up to num length.
Like strcpy, it is not a memory-safe operation. Because it does not check for sufficient space in destination before it copies source, it is a potential cause of buffer overruns.
Refer: Why should you use strncpy instead of strcpy?
We have 2 versions for copy string from one to another
1> strcpy
2> strncpy
These two versions is used for fixed and non-fixed length array. The strcpy don't check the upper bound for destination string when copy string, strncpy will check it. When the destination string is reached to this upper bound, the function strncpy will return error code, in the meantime the function strcpy will cause some effect in memory of the current process and terminate the process immediately. So that the strncpy is more secure than strcpy
I am having an issue with a function that concatenates a number to the end of a char array. I honestly can't see the issue:
void x(int num, char* originalArray) {
char* concat_str = new char[1];
sprintf(concat_str, "%d", num);
for (int i = 0; i < 1; i++)
originalArray[i + 10] = concat_str[i];
delete [] concat_str;
}
The error message is:
HEAP CORRUPTION DETECTED: after Normal block (#147) at 0x01204CA0. CRT detected that the application wrote to the memory after the end of heap buffer.
Any ideas? I'm a beginning programmer, but I've done this same kind of thing many times and never had this issue. Thanks.
concat_str needs to be large enough to hold the number of digits in num plus a null terminator. Since it size is one you only have enough room for the null terminator. Trying to add anything else is undefined behavior as it accesses memory you do not own and causing the heap corruption.
You're allocating one byte for concat_str, and sprintf'ing something that requires more space than that.
First of all, note that your API is a C api. If you want a dynamic array in C++, use std::vector<char>.
But, assuming you need to stick to the C API, there's no way to guarantee that your originalArray is large enough to hold the result. Furthermore, the temporary buffer is unnecessary.
You should modify the API and the implementation as follows:
Take the size of the destination, to guarantee that it doesn't write past its end.
Write the string in place in the destination buffer. Putting it into a temporary buffer and then copying to the destination is a waste of time.
Use snprintf, not sprintf. The latter is not safe: you can't guarantee it won't write past the end of your buffer.
You should assert the precondition that the offset is at least smaller than the destination size.
If the precondition holds, the destination will be always properly zero-terminated, although the text representation of num might not fit in it.
You can return the length of the destination that is necessary for the value to fit.
Thus:
size_t fun(int num, char * dest, size_t dest_size) {
const size_t offset = 10;
assert(dest_size > offset);
return offset + snprintf(dest + offset, dest_size - offset, "%d", num);
}
I have create these simple function.
char* tmp = new char[len];
strncpy(tmp,str+start,len);
int ret = atoi(tmp);
delete []tmp;
return ret;
I have a problem with memory managment.
When I read ret variable, the value is null. If I remove the instruction "delete []tmp;" the value is correct but the memory fast increase (because I don't release the memory).
Any ideas?
Thanks
From man strncpy: The strncpy() function is similar than strcpy, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
Check your str variable length and verify this null terminating condicion on strncpy
There are a few problems with atoi, one of them is that it doesn't have any kind of validation that the string you pass is really a number. Instead you might want to use strtol instead.
Also note that strncpy might not terminate the string in some cases. And that you might want to allocate one extra character (len + 1) for the terminator.
strncpy fills up the target array with '\0's once the end of the source is reached.
atoi expects a null terminated c-string, means an array of characters that ends with a '\0'.
Therefore you have to create an array with a size of len + 1, the strncpy function will the automatically null-terminate your target array.
What you could do is have the buffer be allocated from outside the function, and pass the tmp argument as an array. A better approach though, is to allocate an object. These get destructed as long as you don't use the new keyword. std::string would be perfect in this scenario.
Make sure tmp is terminated with a '\0'
Hard to tell what's wrong since str and len are not shown.
This function could be a lot simpler:
int ret = atoi(str + start);
return ret;
BTW, ret is an int and NULL is usually referred to pointers.
As I understood the correct programming style tells that if you want to get string (char []) from another function is best to create char * by caller and pass it to string formating function together with created string length. In my case string formating function is "getss".
void getss(char *ss, int& l)
{
sprintf (ss,"aaaaaaaaaa%d",1);
l=11;
}
int _tmain(int argc, _TCHAR* argv[])
{
char *f = new char [1];
int l =0;
getss(f,l);
cout<<f;
char d[50] ;
cin>> d;
return 0;
}
"getss" formats string and returns it to ss*. I thought that getss is not allowed to got outside string length that was created by caller. By my understanding callers tells length by variable "l" and "getcc" returns back length in case buffer is not filled comleatly but it is not allowed go outside array range defined by caller.
But reality told me that really it is not so important what size of buffer was created by caller. It is ok, if you create size of 1, and getss fills with 11 characters long. In output I will get all characters that "getss" has filled.
So what is reason to pass length variable - you will always get string that is zero terminated and you will find the end according that.
What is the reason to create buffer with specified length if getss can expand it?
How it is done in real world - to get string from another function?
Actually, the caller is the one that has allocated the buffer and knows the maximum size of the string that can fit inside. It passes that size to the function, and the function has to use it to avoid overflowing the passed buffer.
In your example, it means calling snprintf() rather than sprintf():
void getss(char *ss, int& l)
{
l = snprintf(ss, l, "aaaaaaaaaa%d", 1);
}
In C++, of course, you only have to return an instance of std::string, so that's mostly a C paradigm. Since C does not support references, the function usually returns the length of the string:
int getss(char *buffer, size_t bufsize)
{
return snprintf(buffer, bufsize, "aaaaaaaaaa%d", 1);
}
You were only lucky. Sprintf() can't expand the (statically allocated) storage, and unless you pass in a char array of at least length + 1 elements, expect your program to crash.
In this case you are simply lucky that there is no "important" other data after the "char*" in memory.
The C runtime does not always detect these kinds of violations reliably.
Nonetheless, your are messing up the memory here and your program is prone to crash any time.
Apart from that, using raw "char*" pointers is really a thing you should not do any more in "modern" C++ code.
Use STL classes (std::string, std::wstring) instead. That way you do not have to bother about memory issues like this.
In real world in C++ is better to use std::string objects and std::stringstream
char *f = new char [1];
sprintf (ss,"aaaaaaaaaa%d",1);
Hello, buffer overflow! Use snprintf instead of sprintf in C and use C++ features in C++.
By my understanding callers tells length by variable "l" and "getcc" returns back length in case buffer is not filled comleatly but it is not allowed go outside array range defined by caller.
This is spot on!
But reality told me that really it is not so important what size of buffer was created by caller. It is ok, if you create size of 1, and getss fills with 11 characters long. In output I will get all characters that "getss" has filled.
This is absolutely wrong: you invoked undefined behavior, and did not get a crash. A memory checker such as valgrind would report this behavior as an error.
So what is reason to pass length variable.
The length is there to avoid this kind of undefined behavior. I understand that this is rather frustrating when you do not know the length of the string being returned, but this is the only safe way of doing it that does not create questions of string ownership.
One alternative is to allocate the return value dynamically. This lets you return strings of arbitrary length, but the caller is now responsible for freeing the returned value. This is not very intuitive to the reader, because malloc and free happen in different places.
The answer in C++ is quite different, and it is a lot better: you use std::string, a class from the standard library that represents strings of arbitrary length. Objects of this class manage the memory allocated for the string, eliminating the need of calling free manually.
For cpp consider smart pointers in your case propably a shared_ptr, this will take care of freeing the memory, currently your program is leaking memory since, you never free the memory you allocate with new. Space allocate by new must be dealocated with delete or it will be allocated till your programm exits, this is bad, imagine your browser not freeing the memory it uses for tabs when you close them.
In the special case of strings I would recommend what OP's said, go with a String. With Cpp11 this will be moved (not copied) and you don't need to use new and have no worries with delete.
std::string myFunc() {
std::string str
//work with str
return str
}
In C++ you don't have to build a string. Just output the parts separately
std::cout << "aaaaaaaaaa" << 1;
Or, if you want to save it as a string
std::string f = "aaaaaaaaaa" + std::to_string(1);
(Event though calling to_string is a bit silly for a constant value).
std::strlen doesn't handle c strings that are not \0 terminated. Is there a safe version of it?
PS I know that in c++ std::string should be used instead of c strings, but in this case my string is stored in a shared memory.
EDIT
Ok, I need to add some explanation.
My application is getting a string from a shared memory (which is of some length), therefore it could be represented as an array of characters. If there is a bug in the library writing this string, then the string would not be zero terminated, and the strlen could fail.
You've added that the string is in shared memory. That's guaranteed readable, and of fixed size. You can therefore use size_t MaxPossibleSize = startOfSharedMemory + sizeOfSharedMemory - input; strnlen(input, MaxPossibleSize) (mind the extra n in strnlen).
This will return MaxPossibleSize if there's no \0 in the shared memory following input, or the string length if there is. (The maximal possible string length is of course MaxPossibleSize-1, in case the last byte of shared memory is the first \0)
C strings that are not null-terminated are not C strings, they are simply arrays of characters, and there is no way of finding their length.
If you define a c-string as
char* cowSays = "moo";
then you autmagically get the '\0' at the end and strlen would return 3. If you define it like:
char iDoThis[1024] = {0};
you get an empty buffer (and array of characters, all of which are null characters). You can then fill it with what you like as long as you don't over-run the buffer length. At the start strlen would return 0, and once you have written something you would also get the correct number from strlen.
You could also do this:
char uhoh[100];
int len = strlen(uhoh);
but that would be bad, because you have no idea what is in that array. It could hit a null character you might not. The point is that the null character is the defined standard manner to declare that the string is finished.
Not having a null character means by definition that the string is not finished. Changing that will break the paradigm of how the string works. What you want to do is make up your own rules. C++ will let you do that, but you will have to write a lot of code yourself.
EDIT
From your newly added info, what you want to do is loop over the array and check for the null character by hand. You should also do some validation if you are expecting ASCII characters only (especially if you are expecting alpha-numeric characters). This assumes that you know the maximum size.
If you do not need to validate the content of the string then you could use one of the strnlen family of functions:
http://msdn.microsoft.com/en-us/library/z50ty2zh%28v=vs.80%29.aspx
http://linux.about.com/library/cmd/blcmdl3_strnlen.htm
size_t safe_strlen(const char *str, size_t max_len)
{
const char * end = (const char *)memchr(str, '\0', max_len);
if (end == NULL)
return max_len;
else
return end - str;
}
Yes, since C11:
size_t strnlen_s( const char *str, size_t strsz );
Located in <string.h>
Get a better library, or verify the one you have - if you can't trust you library to do what it says it will, then how the h%^&l do you expect your program to?
Thats said, Assuming you know the length of the buiffer the string resides, what about
buffer[-1+sizeof(buffer)]=0 ;
x = strlen(buffer) ;
make buffer bigger than needed and you can then test the lib.
assert(x<-1+sizeof(buffer));
C11 includes "safe" functions such as strnlen_s. strnlen_s takes an extra maximum length argument (a size_t). This argument is returned if a null character isn't found after checking that many characters. It also returns the second argument if a null pointer is provided.
size_t strnlen_s(const char *, size_t);
While part of C11, it is recommended that you check that your compiler supports these bounds-checking "safe" functions via its definition of __STDC_LIB_EXT1__. Furthermore, a user must also set another macro, __STDC_WANT_LIB_EXT1__, to 1, before including string.h, if they intend to use such functions. See here for some Stack Overflow commentary on the origins of these functions, and here for C++ documentation.
GCC and Clang also support the POSIX function strnlen, and provide it within string.h. Microsoft too provide strnlen which can also be found within string.h.
You will need to encode your string. For example:
struct string
{
size_t len;
char *data;
} __attribute__(packed);
You can then accept any array of characters if you know the first sizeof(size_t) bytes of the shared memory location is the size of the char array. It gets tricky when you want to chain arrays this way.
It's better to trust your other end to terminate it's strings or roll your own strlen that does not go outside the bounderies of the shared memory segment (providing you know at least the size of that segment).
If you need to get the size of shared memory, try to use
// get memory size
struct shmid_ds shm_info;
size_t shm_size;
int shm_rc;
if((shm_rc = shmctl(shmid, IPC_STAT, &shm_info)) < 0)
exit(101);
shm_size = shm_info.shm_segsz;
Instead of using strlen you can use shm_size - 1 if you are sure that it is null terminated. Otherwise you can null terminate it by data[shm_size - 1] = '\0'; then use strlen(data);
a simple solution:
buff[BUFF_SIZE -1] = '\0'
ofc this will not tell you if the string originally was exactly BUFF_SIZE-1 long or it was just not terminated... so you need xtra logic for that.
How about this portable nugget:
int safeStrlen(char *buf, int max)
{
int i;
for(i=0;buf[i] && i<max; i++){};
return i;
}
As Neil Butterworth already said in his answer above: C-Strings which are not terminated by a \0 character, are no C-Strings!
The only chance you do have is to write an immutable Adaptor or something which creates a valid copy of the C-String with a \0 terminating character. Of course, if the input is wrong and there is an C-String defined like:
char cstring[3] = {'1','2','3'};
will indeed result in unexpected behavior, because there can be something like 123#4x\0 in the memory now. So the result of of strlen() for example is now 6 and not 3 as expected.
The following approach shows how to create a safe C-String in any case:
char *createSafeCString(char cStringToCheck[]) {
//Cast size_t to integer
int size = static_cast<int>(strlen(cStringToCheck)) ;
//Initialize new array out of the stack of the method
char *pszCString = new char[size + 1];
//Copy data from one char array to the new
strncpy(pszCString, cStringToCheck, size);
//set last character to the \0 termination character
pszCString[size] = '\0';
return pszCString;
}
This ensures that if you manipulate the C-String to not write on the memory of something else.
But this is not what you wanted. I know, but there is no other way to achieve the length of a char array without termination. This isn't even an approach. It just ensures that even if the User (or Dev) is inserting ***** to work fine.