better approach to copy portion of char array than strncpy - c++

I used std::strncpy in c++98 to copy portion of a char array to another char array. It seems that it requires to manually add the ending character '\0', in order to properly terminate the string.
As below, if not explicitly appending '\0' to num1, the char array may have other characters in the later portion.
char buffer[] = "tag1=123456789!!!tag2=111222333!!!10=240";
char num1[10];
std::strncpy(num1, buffer+5, 9);
num1[9] = '\0';
Is there better approach than this? I'd like to have a one-step operation to reach this goal.

Yes, working with "strings" in C was rather verbose, wasn't it!
Fortunately, C++ is not so limited:
const char* in = "tag1=123456789!!!tag2=111222333!!!10=240";
std::string num1{in+5, in+15};
If you can't use a std::string, or don't want to, then simply wrap the logic you have described into a function, and call that function.
As below, if not explicitly appending '\0' to num1, the char array may have other characters in the later portion.
Not quite correct. There is no "later portion". The "later portion" you thought you observed was other parts of memory that you had no right to view. By failing to null-terminate your would-be C-string, your program has undefined behaviour and the computer could have done anything, like travelling back in time and murdering my great-great-grandmother. Thanks a lot, pal!
It's worth noting, then, that because it's C library functions doing that out-of-bounds memory access, if you hadn't used those library functions in that way then you didn't need to null-terminate num1. Only if you want to treat it as a C-style string later is that required. If you just consider it to be an array of 10 bytes, then everything is still fine.

Related

Confusion about the necessity of the null-character?

I am reading about why exactly there is a need for null-characters, and then I found this answer which made somewhat sense to me. It states that it is needed because that char arrays (for the C strings) are often allocated much larger than the actual strings and you thereby need a a way to symbolize the end.
But why aren't these array not just constructed with a size deduction based on the initializer (without the null-character that actually is implicitly added when assigning directly to string literals). Like, if the arrays holding the strings are constructed using size deduction, there would not be a need for the null-character because the array was not any bigger than the string, so of course, it would end at the end of that array.
I am reading about why exactly there is a need for null-characters, and then I found this answer which made somewhat sense to me. It states that it is needed because that char arrays (for the C strings) are often allocated much larger than the actual strings and you thereby need a a way to symbolize the end.
The answer is misleading. That's not really the reason for why null termination is needed. The accepted answer with more upvotes is better.
there would not be a need for the null-character because the array was not any bigger than the string, so of course, it would end at the end of that array.
Let us remind ourselves, that we cannot use arrays as function arguments. Even if we could, we wouldn't want to, because it would be slow to copy an entire array into the argument.
Therefore, there is a need to refer to an array indirectly. Indirection is commonly achieved using pointers (or references). Now, we could have a "pointer to character array of size 42", but that is not very useful because then the argument can only point to strings of one particular size.
Instead, the common approach is to use a pointer to the first element of the array. This is so common pattern that the language has a rule that allows the name of the array to implicitly decay into the pointer to first element.
But can you tell how big an array is, based on a pointer to an element of that array? You cannot. You need extra information. The accepted answer of the linked question explains the options that are available for representing the size, and that the designer of C chose the option that uses a terminating character (which was already the convention used by the BCPL language which C is based on).
TL;DR Size information is needed because there is a need to refer to the string indirectly, and that indirection hides the knowledge about the size of the array. Null termination is one way to encode the size information within the content of the string, and it is the way that was chosen by the designer of the C language.
Historically, string arrays are provided with termination symbol(s). Reason is simple: instead of sending two values (head of the array and array length) you just need to pass just one value, head of the array. This simplifies calling signature but places some requirements for caller.
In C/C++ itself, null character is a termination symbol so all runtime functions do work with intention that very first null char they can meet is a line end. Same time, in terms of applied logic, terminal symbol(s) may be different: for example, in HTTP headers there is a CR-LF-CR-LF sequence that marks a end-of-the-header and single CR-LF sequence is just a start-of-next-line.
But why aren't these array not just constructed with a size deduction
based on the initializer (without the null-character that actually is
implicitly added when assigning directly to string literals).
I suppose you mean why you can't write:
char t[] = "abracadabra";
and the compiler would deduce a size of 11?
Because you have 12 characters and not 11. If the array would have size 11, then something would be lost: the byte used to contains the NUL would not have been referenced and compiler wouldn't make a difference in between:
char t[] = "abracadabra"; // an array deduced from a C-string literal
and
char t[11] = { 'a', 'b', 'r', 'a', 'c', 'a', 'b', 'r', 'a' }; // a "real" array not a C-string!
The first would have to release 12 bytes at the end of scope and the second 11.
Historically arrays are just some kind a syntactic sugar on top of pointers arithmetic.
... because that char arrays ... are often allocated much larger than the actual strings
That answer is awful.
C strings can be dynamically allocated, meaning you don't know, before runtime, how long they should be. Instead of pre-allocating a massive array and filling most of it with zeroes, you can just malloc(required_size+1) and stick a single nul character at the end.
Conversely, string literals which are known at compile time, are definitely not "allocated much larger than the actual strings". there wouldn't be any point, since you know exactly how much space is needed in advance.
But why aren't these array not just constructed with a size deduction based on the initializer
size_t expected;
if (read(fd, &expected, sizeof(expected)) == sizeof(expected)) {
char *buf = malloc(expected + 1);
if (buf && read(fd, buf, expected) == expected) {
buf[expected] = 0;
/* now do something with buf */
}
}
there you go, a dynamically-sized string. What would your "size deduction" be? What is the "initializer"?
I could have written a less-ugly example using std::string, since the question is tagged C++, but it's actually C strings you're specifically asking about, and it doesn't make any real difference.
Strings are often manipulated by creating a char array to hold intermediate results and modifying its contents:
char buffer[128];
strcpy(buffer, "Hello, ");
strcat(buffer, "world");
std::cout << buffer << '\n';
After the call to strcpy the buffer has 7 characters that we care about; after the call to strcat it has 12. So the number of characters in the buffer can change, and we need to have a way of indicating how many characters there are that matter. One convention is to put a character count in the first location in the array, and the actual characters after that. Another convention is to put a marker at the end of the characters that matter. There are tradeoffs here, but the decision in C, which was carried through into C++ was to go with an end marker.

Please explain me the need for appending '\0' while converting string to char

While using proc/mysql for c++ I have taken string as user input and converted into char via strcpy(c,s.c_str()); function, where c is the binding variable through which I'll add value in the database table and s is the string (user input), it is working fine but my teacher is asking me append '\0' at the end - I can't understand the reason why I need to?
Your teacher is deluded.
c_str() in itself appends a zero [or rather, std::string reserves space for an extra character when creating the string, and makes sure this is zero at least at the point of c_str() returning - in C++11, it is guaranteed that there is an extra character space filled with zero at the end of the string, always].
You DO need a zero at the end of a string to mark the end of the string in a C-style string, such as those used by strcpy.
[As others have pointed out, you should also check that the string fits before copying, and I would suggest reject if it won't fit, as truncating it will lead to other problems - as well as checking that there isn't any sql-injection attacks and a multitude of other things required for "good pracice in an SQL environment"]
While the teacher is deluded on the appending '\0' to the string, your code exhibits another very bad bug.
You should never use strcpy in such a fashion. You should always use some routine which controls the nubmer of characters copied, like strncpy(), or other alternatives, and provide it with the size of receiving variable. Otherwise you are just asking for troubles.
Just guessing, it's a protection against buffer overflow. If c is only N bytes long and s.c_str() returns a pointer to a N+k length string, you'd write k bytes after c, which is bad.
Now let's say (if you didn't SEGFAULT already) you pass this c NUL-terminated string to a C function, you have no guarantee that the \0 you wrote after c is still there. This C function will then read an undefined amount of bytes after c, which is badder worse.
Anyway, use ::strncpy():
char c[64];
::strncpy(c, s.c_str(), sizeof(c));
c[sizeof(c)-1] = '\0';

how to make a not null-terminated c string?

i am wondering :char *cs = .....;what will happen to strlen() and printf("%s",cs) if cs point to memory block which is huge but with no '\0' in it?
i write these lines:
char s2[3] = {'a','a','a'};
printf("str is %s,length is %d",s2,strlen(s2));
i get the result :"aaa","3",but i think this result is because that a '\0'(or a 0 byte) happens to reside in the location s2+3.
how to make a not null-terminated c string? strlen and other c string function relies heavily on the '\0' byte,what if there is no '\0',i just want know this rule deeper and better.
ps: my curiosity is aroused by studying the follw post on SO.
How to convert a const char * to std::string
and these word in that post :
"This is actually trickier than it looks, because you can't call strlen unless the string is actually nul terminated."
If it's not null-terminated, then it's not a C string, and you can't use functions like strlen - they will march off the end of the array, causing undefined behaviour. You'll need to keep track of the length some other way.
You can still print a non-terminated character array with printf, as long as you give the length:
printf("str is %.3s",s2);
printf("str is %.*s",s2_length,s2);
or, if you have access to the array itself, not a pointer:
printf("str is %.*s", (int)(sizeof s2), s2);
You've also tagged the question C++: in that language, you usually want to avoid all this error-prone malarkey and use std::string instead.
A "C string" is, by definition, null-terminated. The name comes from the C convention of having null-terminated strings. If you want something else, it's not a C string.
So if you have a string that is not null-terminated, you cannot use the C string manipulation routines on it. You can't use strlen, strcpy or strcat. Basically, any function that takes a char* but no separate length is not usable.
Then what can you do? If you have a string that is not null-terminated, you will have the length separately. (If you don't, you're screwed. You need some way to find the length, either by a terminator or by storing it separately.) What you can do is allocate a buffer of the appropriate size, copy the string over, and append a null. Or you can write your own set of string manipulation functions that work with pointer and length. In C++ you can use std::string's constructor that takes a char* and a length; that one doesn't need the terminator.
Your supposition is correct: your strlen is returning the correct value out of sheer luck, because there happens to be a zero on the stack right after your improperly terminated string. It probably helps that the string is 3 bytes, and the compiler is likely aligning stuff on the stack to 4-byte boundaries.
You cannot depend on this. C strings need NUL characters (zeroes) at the end to work correctly. C string handling is messy, and error-prone; there are libraries and APIs that help make it less so… but it's still easy to screw up. :)
In this particular case, your string could be initialized as one of these:
A: char s2[4] = { 'a','a','a', 0 }; // good if string MUST be 3 chars long
B: char *s2 = "aaa"; // if you don't need to modify the string after creation
C: char s2[]="aaa"; // if you DO need to modify the string afterwards
Also note that declarations B and C are 'safer' in the sense that if someone comes along later and changes the string declaration in a way that alters the length, B and C are still correct automatically, whereas A depends on the programmer remembering to change the array size and keeping the explicit null terminator at the end.
What happens is that strlen keeps going, reading memory values until it eventually gets to a null. it then assumes that is the terminator and returns the length that could be massively large. If you're using strlen in an environment that expects C-strings to be used, you could then copy this huge buffer of data into another one that is just not big enough - causing buffer overrun problems, or at best, you could copy a large amount of garbage data into your buffer.
Copying a non-null terminated C string into a std:string will do this. If you then decide that you know this string is only 3 characters long and discard the rest, you will still have a massively long std:string that contains the first 3 good characters and then a load of wastage. That's inefficient.
The moral is, if you're using the CRT functions to operator on C strings, they must be null-terminated. Its no different to any other API, you must follow the rules that API sets down for correct usage.
Of course, there is no reason you cannot use the CRT functions if you always use the specific-length versions (eg strncpy) but you will have to limit yourself to just those, always, and manually keep track of the correct lengths.
Convention states that a char array with a terminating \0 is a null terminated string. This means that all str*() functions expect to find a null-terminator at the end of the char-array. But that's it, it's convention only.
By convention also strings should contain printable characters.
If you create an array like you did char arr[3] = {'a', 'a', 'a'}; you have created a char array. Since it is not terminated by a \0 it is not called a string in C, although its contents can be printed to stdout.
The C standard does not define the term string until the section 7 - Library functions. The definition in C11 7.1.1p1 reads:
A string is a contiguous sequence of characters terminated by and including the first null character.
(emphasis mine)
If the definition of string is a sequence of characters terminated by a null character, a sequence of non-null characters not terminated by a null is not a string, period.
What you have done is undefined behavior.
You are trying to write to a memory location that is not yours.
Change it to
char s2[] = {'a','a','a','\0'};

What is the internal structure of an object of the (EDIT: MFC) CString class?

I need to strncpy() (effectively) from a (Edit: MFC) CString object to a C string variable. It's well known that strncpy() sometimes fails (depending on the source length **EDIT and the length specified in the call) to terminate the dest C string correctly. To avoid that evil, I'm thinking to store a NUL char inside the CString source object and then to strcpy() or memmove() that guy.
Is this a reasonable way to go about it? If so, what must I manipulate inside the CString object? If not, then what's an alternative that will guarantee a properly-terminated destination C string?
strncpy() only "fails" to null-terminate the destination string when the source string is longer than the length limit you specify. You can ensure that the destination is null-terminated by setting its last character to null yourself. For example:
#define DEST_STR_LEN 10
char dest_str[DEST_STR_LEN + 1]; // +1 for the null
strncpy(dest_str, src_str, DEST_STR_LEN);
dest_str[DEST_STR_LEN] = '\0';
If src_str is more than DEST_STR_LEN characters long, dest_str will be a properly-terminated string of DEST_STR_LEN characters. If src_str is shorter than that, strncpy() will put a null terminator somewhere within dest_str, so the null at the very end is irrelevant and harmless.
CSimpleStringT::GetString gives a pointer to a null-terminated string. Use this as the soure for strncpy. As this is C++, you should only use C-style strings when interfacing with legacy APIs. Use std::string instead.
One of the alternative ways would be to zero string first and then cast or memcpy from CString.
I hope they don't changed from when I used them: that was many years ago :)
They used an interesting 'trick' to handle the refcount and the very fast and efficient automatic conversion to char*: i.e the pointer is to LPCSTR, but some back byte is reserved to keep the implementation state.
So the struct can be used with the older windows API (LPCSTR without overhead). I found at the time the idea interesting!
Of course the key ìs the availability of allocators: they simply offsets the pointer when mallocing/freeing.
I remember there was a buffer request to (for instance) modify the data available: GetBuffer(0), followed by ReleaseBuffer().
HTH
If you are not compiling with _UNICODE enabled, then you can get a const char * from a CString very easily. Just cast it to an LPCTSTR:
CString myString("stuff");
const char *byteString = (LPCTSTR)myString;
This is guaranteed to be NULL-terminated.
If you have built with _UNICODE, then CString is a UTF-16 encoded string. You can't really do anything directly with that.
If you do need to copy the data from the CString, this very easy, even using C-style code. Just make sure that you allocate sufficient memory and are copying the right length:
CString myString("stuff");
char *outString = (char*)malloc(myString.Length() + 1);
strncpy(outString, (LPCTSTR)myString, myString.Length());
CString ends with NULL so as long as your text is correct (no NULL characters inside) then copying should be safe. You can write:
char szStr[256];
strncpy(szStr, (LPCSTR) String, 3);
szStr[3]='\0'; /// b-cos no null-character is implicitly appended to the end of destination
if you store null somehere inside CString object you will probably cause yourself more problems, CString stores its lenght internally.
Another alternative solution would rather involve support from CPU or compiler, as it's much better approach - simply make sure that when copying memory in "safe" mode, at any time after every atomic operation there is zero added on the end, so when whole loop fails, the destination string will still be terminated, without need to zero it fully before making copy.
There could be also support for fast zero - just mark start and stop of zeroed region and it's instantly cleared in RAM, this would make things a lot easier.

C++: Is it safe to move a jpeg around in memory using a std::string?

I have an external_jpeg_func() that takes jpeg data in a char array to do stuff with it. I am unable to modify this function. In order to provide it the char array, I do something like the following:
//what the funcs take as inputs
std::string my_get_jpeg();
void external_jpeg_func(const char* buf, unsigned int size);
int main ()
{
std::string myString = my_get_jpeg();
external_jpeg_func(myString.data(), myString.length() );
}
My question is: Is it safe to use a string to transport the char array around? Does jpeg (or perhaps any binary file format) be at risk of running into characters like '\0' and cause data loss?
My recommendation would be to use std::vector<char>, instead of std::string, in this case; the danger with std::string is that it provides a c_str() function and most developers assume that the contents of a std::string are NUL-terminated, even though std::string provides a size() function that can return a different value than what you would get by stopping at NUL. That said, as long as you are careful to always use the constructor that takes a size parameter, and you are careful not to pass the .c_str() to anything, then there is no problem with using a string here.
While there is no technical advantage to using a std::vector<char> over a std::string, I feel that it does a better job of communicating to other developers that the content is to be interpreted as an arbitrary byte sequence rather than NUL-terminated textual content. Therefore, I would choose the former for this added readability. That said, I have worked with plenty of code that uses std::string for storing arbitrary bytes. In fact, the C++ proto compiler generates such code (though, I should add, that I don't think this was a good choice for the readability reasons that I mentioned).
std::string does not treat null characters specially, unless you don't give it an explicit string length. So your code will work fine.
Although, in C++03, strings are technically not required to be stored in contiguous memory. Just about every std::string implementation you will find will in fact store them that way, but it is not technically required. C++11 rectifies this.
So, I would suggest you use a std::vector<char> in this case. std::string doesn't buy you anything over a std::vector<char>, and it's more explicit that this is an array of characters and not a possibly printable string.
I think it is better to use char array char[] or std::vector<char>. This is standard way to keep images. Of course, binary file may contain 0 characters.