Wrong behavior of the char arrays after strncpy - c++

I am not a C developer, so I may be getting something wrong (I am making changes to old code).
There is a function called strncpySafe (as far as I can see, it is just a wrapper around strncpy):
void strncpySafe(char *strDest, const char *strSource, int count)
{
strncpy(strDest, strSource, count-1);
strDest[count-1] = '\0';
}
The step in question copies from the source buffer into the destination buffer, starting at an offset:
void Foo(const char *message) {
char line[1024];
...
strncpySafe(line, &message[message_offset], count);
In the last step the code modifies the copy, line[] (message[] should stay the same):
line[N] = 0;
At this last step I can see in the VSCode debugger that line[N] changes and, at the same time, message[N] changes too.
I am using Ubuntu /g++-8, -march=x86-64, -std=c++11.
Do the two pointers refer to the same memory? Is strncpy being used incorrectly?
Thank you.
ps: the same code is being used inside a game client for windows and linux and I can say that on windows it is not being reproduced (windows was built on an older c-compiler, haven't checked with the same c++11 build yet).
EDIT: to make it clear, line and message are both modified at the same time, when I step over line[N] = 0;
Removed incorrect naming of the message_offset_2. It is a count.
Let me provide an example of the execution:
strncpySafe(line, &message[5], 10); // It copies 10 elements from 5th
line[5] = 0; // this also sets message[5] to 0
There are no errors with the boundaries (offsets and counters seem alright).
I agree that this code is outdated, that the logic might be unclear (why it was done that way), and that I could use std::string. I was simply curious why this happens.

Since the array message is also changed, as you wrote in your question, it is evident that you are using the function incorrectly and, as a result, you have undefined behavior.
For example, the third parameter, which you named message_offset_2, specifies the number of characters that should be copied from a string to the destination character array. So it should not be named message_offset_2.
Another cause of undefined behavior can be the use of overlapping arrays.
So either the third argument is specified incorrectly or the character arrays overlap.
But in any case the function is declared and defined badly.
If it is a wrapper around the standard C function strncpy then it should be declared at least like
char * strncpySafe( char * restrict s1, const char * restrict s2, size_t n );
Or if it is declared as a C++ function then
char * strncpySafe( char * s1, const char * s2, size_t n );
If the function is designed to copy n characters then the body of the function should look like
if ( n )
{
strncpy( s1, s2, n );
s1[n] = '\0';
}
return s1;
So the destination array shall have at least n + 1 elements.
And (the C Standard, 7.23.2.4 The strncpy function )
If copying takes place between objects that overlap, the behavior is
undefined.
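As a rough illustration of the points above, here is a minimal sketch of a bounded copy that always null-terminates and stays well defined when the ranges overlap. The name copyBounded and the convention that the second argument is the full size of the destination are assumptions made for this example; they are not taken from the original code base.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies at most dest_size - 1 characters from src into
// dest and always null-terminates. memmove is used instead of strncpy so that
// overlapping source and destination ranges are still well defined.
char *copyBounded(char *dest, std::size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return dest;                       // no room even for the terminator

    std::size_t len = 0;
    while (len < dest_size - 1 && src[len] != '\0')
        ++len;                             // bounded length of the source

    std::memmove(dest, src, len);
    dest[len] = '\0';
    return dest;
}
```

When line and message are two distinct arrays, writing line[5] = 0 after such a copy cannot change message. If a debugger shows both changing, the two names most likely refer to overlapping storage or the view of one of them is stale.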

Related

Why does a `char *` allocated through malloc prints out gibberish after the function allocating it returns?

I'm trying to write a function to parse and extract the components of a URL. Moreover, I need the components (e.g. hostname) to have the type char * since I intend to pass them to C APIs.
My current approach is to save the components in the parse_url function to the heap by calling malloc. But for some reason, the following code is printing gibberish. I'm confused by this behavior because I thought memory allocated on the heap will persist even after the function allocating it returns.
I'm new to C/C++, so please let me know what I did wrong and how to achieve what I wanted. Thank you.
#include <iostream>
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
using namespace std;
void cast_to_cstyle(string source, char *target)
{
target = (char *)malloc(source.size() + 1);
memcpy(target, source.c_str(), source.size() + 1);
}
void parse_url(string url, char *protocol_cstyle, char *hostname_cstyle, char *port_cstyle, char *path_cstyle)
{
size_t found = url.find_first_of(":");
string protocol = url.substr(0, found);
string url_new = url.substr(found + 3); // `url_new` is the url excluding the "http//" part
size_t found1 = url_new.find_first_of(":");
string hostname = url_new.substr(0, found1);
size_t found2 = url_new.find_first_of("/");
string port = url_new.substr(found1 + 1, found2 - found1 - 1);
string path = url_new.substr(found2);
cast_to_cstyle(protocol, protocol_cstyle);
cast_to_cstyle(hostname, hostname_cstyle);
cast_to_cstyle(port, port_cstyle);
cast_to_cstyle(path, path_cstyle);
}
int main() {
char *protocol;
char *hostname;
char *port;
char *path;
parse_url("http://www.example.com:80/file.txt", protocol, hostname, port, path);
printf("%s, %s, %s, %s\n", (protocol), (hostname), (port), (path));
return 0;
}
The problem is that arguments are passed by value, so the newly created string never leaves the function (albeit exists until program termination as free is never called on it). You can pass by reference¹ like:
void cast_to_cstyle(string source, char *&target)
or better, pass the source string by (constant) reference too (string is expensive to copy):
void cast_to_cstyle(const string &source, char *&target)
(neither function body nor the call site need to be changed).
But you may not need even that.
If the API doesn’t actually modify the string despite using non-const pointer (pretty common in C AFAIK), you can use const_cast, like const_cast<char *>(source.c_str()).
Even if it may modify the string, &source[0] is suitable (at least since C++11). It may not seem right but it is:
a pointer to s[0] can be passed to functions that expect a pointer to the first element of a null-terminated (since C++11) CharT[] array.
— https://en.cppreference.com/w/cpp/string/basic_string
(and since C++17 data() is the way to go).
However, unlike that obtained from malloc any such pointer becomes invalid when the string is resized or destroyed (be careful “the string” means “that particular copy of the string” if you have several).
¹ Strictly speaking, pass a reference; references aren’t restricted to function arguments in C++.
The problem was as @WeatherVane and @JerryJeremiah mentioned. Inside cast_to_cstyle(), target is a local copy of the caller's pointer, so the address returned by malloc was assigned only to that copy, which is discarded when the function returns. The protocol, hostname, port and path variables declared in main were therefore never assigned, hence the gibberish. I've fixed this by making the function return a char *:
char *cast_to_cstyle_str(string source)
{
char *target = (char *)malloc(source.size() + 1);
memcpy(target, source.c_str(), source.size() + 1);
return target;
}
Note: I forgot to free up malloc in my question.
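The difference between passing the pointer by value and by reference can be sketched as follows. The function names dup_by_value and dup_by_reference are hypothetical and exist only for this comparison; the first one deliberately reproduces the bug (and leaks the allocation).

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// The pointer itself is copied into `target`, so the caller's variable
// never sees the address returned by malloc (and the block is leaked).
void dup_by_value(const std::string &source, char *target)
{
    target = static_cast<char *>(std::malloc(source.size() + 1));
    if (target)
        std::memcpy(target, source.c_str(), source.size() + 1);
}

// `target` is a reference to the caller's pointer, so the assignment is
// visible after the call. The caller must release the block with free().
void dup_by_reference(const std::string &source, char *&target)
{
    target = static_cast<char *>(std::malloc(source.size() + 1));
    if (target)
        std::memcpy(target, source.c_str(), source.size() + 1);
}
```

Returning the pointer, as in cast_to_cstyle_str above, achieves the same effect as the reference version and is usually easier to read at the call site.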

String encryption function works with char[], but not a plain string

I'm using a C++ version of the XTEA encryption code from Wikipedia. I wrote a function to encrypt a string:
const char* charPtrDecrypt(const char* encString, int len, bool encrypt)
{
/********************************************************
* This currently uses a hard-coded key, but I'll implement
* a dynamic key based on string length or something.
*********************************************************/
unsigned int key[4] = { 0xB5D1, 0x22BA, 0xC2BC, 0x9A4E };
int n_blocks=len/BLOCK_SIZE;
if (len%BLOCK_SIZE != 0)
++n_blocks;
for (int i = 0; i < n_blocks; i++)
{
if (encrypt)
xtea::Encrypt(32, (uint32_t*)(encString + (i*BLOCK_SIZE)), key);
else
xtea::Decrypt(32, (uint32_t*)(encString + (i*BLOCK_SIZE)), key);
}
return encString;
}
It works when I supply a const char encString[] = "Hello, World!", but when I supply a string literal directly, e.g. const char* a = charPtrDecrypt("Hello, World!", 14, true), it crashes.
There's an old saying (I know it's old, because I first posted it to Usenet around 1992 or so) that: "If you lie to the compiler, it will get its revenge." That's what's happening here.
Here:
const char* charPtrDecrypt(const char* encString, int len, bool encrypt)
...you promise that you will not modify the characters that encString points at. That's what the const says/means/does.
Here, however:
xtea::Encrypt(32, (uint32_t*)(encString + (i*BLOCK_SIZE)), key);
...you cast away that constness (cast to uint32_t *, with no const qualifier), and pass the pointer to a function that modifies the buffer it points at.
Then the compiler gets its revenge: it allows you to pass a pointer to data you can't modify, because you promise not to modify it--but then when you turn around and try to modify it anyway, your program crashes and burns because you try to modify read-only data.
This can be avoided in any number of ways. One would be to get away from the relatively low-level constructs you're using now, and pass/return std::strings instead of pointers to [const] char.
The code has still more problems than just that though. For one thing, it treats the input as a block of uint32_t items, and rounds its view of the length up to the next multiple of the size of a uint32_t (typically 4). Unfortunately, it doesn't actually change the size of the buffer, so even when the buffer is writable, it doesn't really work correctly--it still reads and writes beyond the end of the buffer.
Here again, std::string will be helpful: it lets us resize the string up to the correct size instead of just reading/writing past the end of the fixed-size buffer.
Along with that, there's a fact the compiler won't care about, but you (and any reader of this code) will (or at least should): the name of the function is misleading, and it has parameters whose meaning isn't at all apparent--particularly the Boolean that governs whether to encrypt or decrypt. I'd advise using an enumeration instead, and renaming the function to something that can encompass either encryption or decryption.
Finally, I'd move the if statement that determines whether to encrypt or decrypt outside the loop, since we aren't going to change from one to the other as we process one input string.
Taking all those into account, we could end up with code something like this:
enum direction { ENCRYPT, DECRYPT };
std::string xtea_process(std::string enc_string, direction d) {
    unsigned int key[4] = { 0xB5D1, 0x22BA, 0xC2BC, 0x9A4E };
    size_t len = enc_string.size();
    len += (BLOCK_SIZE - len % BLOCK_SIZE) % BLOCK_SIZE; // round up to next multiple of BLOCK_SIZE
    enc_string.resize(len); // enlarge the string to that size, if necessary
    if (d == DECRYPT) {
        for (size_t i = 0; i < len; i += BLOCK_SIZE)
            xtea::Decrypt(32, reinterpret_cast<uint32_t *>(&enc_string[i]), key);
    }
    else {
        for (size_t i = 0; i < len; i += BLOCK_SIZE)
            xtea::Encrypt(32, reinterpret_cast<uint32_t *>(&enc_string[i]), key);
    }
    return enc_string;
}
This does still leave (at least) one point that I haven't bothered to deal with: some machines may have stricter alignment requirements for a uint32_t than for char, and it's theoretically possible that the buffer used in a string won't meet those stricter alignment requirements. You could run into a situation where you need to copy the data out of the string, into a buffer that's properly aligned for uint32_t access, do the encryption/decryption, then copy the result back.
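The copy-out, process, copy-back approach described above could look roughly like this. It is only a sketch: toy_block is a hypothetical stand-in that XORs the two words with constants so the example is self-contained (it is not XTEA), and the 8-byte block size is an assumption matching two uint32_t words.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for a block cipher step: it only XORs both words
// with a constant so the example is self-contained. It is NOT XTEA.
void toy_block(std::uint32_t v[2])
{
    v[0] ^= 0xA5A5A5A5u;
    v[1] ^= 0x5A5A5A5Au;
}

// Pads `data` with '\0' up to a multiple of the block size, then processes
// each block through a properly aligned temporary instead of casting the
// string's char buffer to uint32_t*.
std::string process_blocks(std::string data)
{
    const std::size_t block = sizeof(std::uint32_t) * 2;   // 8 bytes
    const std::size_t padded = (data.size() + block - 1) / block * block;
    data.resize(padded);

    for (std::size_t i = 0; i < padded; i += block) {
        std::uint32_t v[2];
        std::memcpy(v, &data[i], block);    // copy out to aligned storage
        toy_block(v);
        std::memcpy(&data[i], v, block);    // copy the result back
    }
    return data;
}
```

Because the stand-in is its own inverse, running process_blocks twice returns the original text followed by the padding bytes. With a real cipher the same loop would call the encrypt or decrypt routine in place of toy_block.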
You pass a const char* to the function but cast it to a non-const uint32_t*. I guess that xtea::Encrypt modifies the string buffer in place.
In the first version, const char encString[] = "Hello, World!", the variable --while being const-- most likely lives on the stack, which is modifiable. So it's "not nice" to remove the const, but it works.
In the second version your string most likely lives in a read-only data segment. So casting away const lets you call the Encrypt function, but the program crashes as soon as the function really tries to modify the string.

How to avoid providing length along with char*?

There is a function which sends data to the server:
int send(
_In_ SOCKET s,
_In_ const char *buf,
_In_ int len,
_In_ int flags
);
Providing length seems to me a little bit weird. I need to write a function, sending a line to the server and wrapping this one such that we don't have to provide length explicitly. I'm a Java-developer and in Java we could just invoke String::length() method, but now we're not in Java. How can I do that, unless providing length as a template parameter? For instance:
void sendLine(SOCKET s, const char *buf)
{
}
Is it possible to implement such a function?
Use std::string:
void sendLine(SOCKET s, const std::string& buf) {
send (s, buf.c_str(), buf.size()+1, 0); //+1 will also transmit terminating \0.
}
On a side note: your wrapper function ignores the return value and doesn't take any flags.
You can retrieve the length of a C string with the strlen(const char*) function.
Make sure all the strings are null-terminated, and keep in mind that strlen does not count the terminator (if you want to send it as well, the length grows by 1).
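A strlen-based wrapper might be sketched like this. To keep the example compilable without a socket library, fakeSend and g_wire are hypothetical stand-ins for the real send call; in real code the wrapper would forward to send with the socket and flags.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for the real socket call so the sketch compiles anywhere:
// it appends the bytes to a string and returns the count, like send() would.
std::string g_wire;   // hypothetical capture buffer
int fakeSend(const char *buf, int len)
{
    g_wire.append(buf, static_cast<std::size_t>(len));
    return len;
}

// Computes the length with strlen, so the caller passes only the pointer.
// `buf` must be null-terminated; the terminator itself is not transmitted.
int sendLine(const char *buf)
{
    return fakeSend(buf, static_cast<int>(std::strlen(buf)));
}
```

Returning the result of the underlying call lets the caller notice partial or failed sends.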
Edit: My answer originally only mentioned std::string. I've now also added std::vector<char> to account for situations where send is not used for strictly textual data.
First of all, you absolutely need a C++ book. You are looking for either the std::string class or for std::vector<char>, both of which are fundamental elements of the language.
Your question is a bit like asking, in Java, how to avoid char[] because you never heard of java.lang.String, or how to avoid arrays in general because you never heard of java.util.ArrayList.
For the first part of this answer, let's assume you are dealing with just text output here, i.e. with output where a char is really meant to be a text character. That's the std::string use case.
Providing length seems to me a little bit weird.
That's the way strings work in C. A C string is really a pointer to a memory location where characters are stored. Normally, C strings are null-terminated. This means that the last character stored for the string is '\0'. It means "the string stops here, and if you move further, you enter illegal territory".
Here is a C-style example:
#include <string.h>
#include <stdio.h>
void f(char const* s)
{
size_t l = strlen(s); // l = 3
printf("%s", s); // prints "foo"
}
int main()
{
char* test = new char[4]; // avoid new[] in real programs
test[0] = 'f';
test[1] = 'o';
test[2] = 'o';
test[3] = '\0';
f(test);
delete[] test;
}
strlen just counts all characters at the specified position in memory until it finds '\0'. printf just writes all characters at the specified position in memory until it finds '\0'.
So far, so good. Now what happens if someone forgets about the null terminator?
char* test = new char[3]; // don't do this at home, please
test[0] = 'f';
test[1] = 'o';
test[2] = 'o';
f(test); // uh-oh, there is no null terminator...
The result will be undefined behaviour. strlen will keep looking for '\0'. So will printf. The functions will try to read memory they are not supposed to. The program is allowed to do anything, including crashing. The evil thing is that most likely, nothing will happen for a while because a '\0' just happens to be stored there in memory, until one day you are not so lucky anymore.
That's why C functions are sometimes made safer by requiring you to explicitly specify the number of characters. Your send is such a function. It works fine even without null-terminated strings.
So much for C strings. And now please don't use them in your C++ code. Use std::string. It is designed to be compatible with C functions by providing the c_str() member function, which returns a null-terminated char const * pointing to the contents of the string, and it of course has a size() member function to tell you the number of characters without the null-terminated character (e.g. for a std::string representing the word "foo", size() would be 3, not 4, and 3 is also what a C function like yours would probably expect, but you have to look at the documentation of the function to find out whether it needs the number of visible characters or number of elements in memory).
In fact, with std::string you can just forget about the whole null-termination business. Everything is nicely automated. std::string is exactly as easy and safe to use as java.lang.String.
Your sendLine should thus become:
void sendLine(SOCKET s, std::string const& line)
{
send(s, line.c_str(), static_cast<int>(line.size()), 0); // 0 = no flags
}
(Passing a std::string by const& is the normal way of passing big objects in C++. It's just for performance, but it's such a widely-used convention that your code would look strange if you just passed std::string.)
How can I do that, unless providing length as a template parameter?
This is a misunderstanding of how templates work. With a template, the length would have to be known at compile time. That's certainly not what you intended.
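To make the compile-time point concrete, here is a small sketch in which the length really is a template parameter. The name literalLength is hypothetical. It works only because the argument is an actual array whose size is part of its type.

```cpp
#include <cstddef>

// N is deduced from the array type at compile time, which only works when
// the argument is a real array (e.g. a string literal), not a char pointer.
template <std::size_t N>
std::size_t literalLength(const char (&)[N])
{
    return N - 1;   // N counts the terminating '\0'
}
```

A line read from a file or from the user at run time arrives as a pointer or a std::string, so there is no N to deduce. For a partly filled array such as char buf[100], the function would report 99 regardless of the text stored in it.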
Now, for the second part of the answer, perhaps you aren't really dealing with text here. It's unlikely, as the name "sendLine" in your example sounds very much like text, but perhaps you are dealing with raw data, and a char in your output does not represent a text character but just a value to be interpreted as something completely different, such as the contents of an image file.
In that case, std::string is a poor choice. Your output could contain '\0' characters that do not have the meaning of "data ends here", but which are part of the normal contents. In other words, you don't really have strings anymore, you have a range of char elements in which '\0' has no special meaning.
For this situation, C++ offers the std::vector template, which you can use as std::vector<char>. It is also designed to be usable with C functions by providing a member function that returns a char pointer. Here's an example:
void sendLine(SOCKET s, std::vector<char> const& data)
{
send(s, &data[0], static_cast<int>(data.size()), 0); // 0 = no flags
}
(The unusual &data[0] syntax means "pointer to the first element of the encapsulated data". C++11 has nicer-to-read ways of doing this, but &data[0] also works in older versions of C++.)
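The difference between "length up to the first '\0'" and "number of elements" can be seen with a short sketch. The helper name lengthUpToFirstNul is hypothetical; it reports what a null-terminator-based length calculation would see for a given buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Returns the number of bytes before the first '\0' in `data`, or the full
// size if there is none. This is what a strlen-style scan would report.
std::size_t lengthUpToFirstNul(const std::vector<char> &data)
{
    if (data.empty())
        return 0;
    const void *p = std::memchr(data.data(), '\0', data.size());
    if (p == nullptr)
        return data.size();
    return static_cast<std::size_t>(static_cast<const char *>(p) - data.data());
}
```

For raw data containing the bytes 'a', 'b', '\0', 'c', 'd', the scan stops after two bytes while size() correctly reports five, which is why size() is the value to pass to send.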
Things to keep in mind:
std::string is like String in Java.
std::vector is like ArrayList in Java.
std::string is for a range of char with the meaning of text, std::vector<char> is for a range of char with the meaning of raw data.
std::string and std::vector are designed to work together with C APIs.
Do not use new[] in C++.
Understand the null termination of C strings.

Store value in Pointers as an Array - C++

I am trying to make a function like strcpy in C++. I cannot use built-in string.h functions because of restriction by our instructor. I have made the following function:
int strlen (char* string)
{
int len = 0;
while (string [len] != (char)0) len ++;
return len;
}
char* strcpy (char* *string1, char* string2)
{
for (int i = 0; i<strlen (string2); i++) *string1[i] = string2[i];
return *string1;
}
main()
{
char* i = "Farid";
strcpy (&i, "ABC ");
cout<<i;
}
But I am unable to set *string1 [i] value. When I try to do so an error appears on screen 'Program has encountered a problem and need to close'.
What should I do to resolve this problem?
Your strcpy function is wrong. When you write *string1[i] you are actually modifying the first character of the i-th element of an imaginary array of strings. That memory location does not exist and your program segfaults.
Do this instead:
char* strcpy (char* string1, const char* string2)
{
    int i = 0;
    for (; string2[i] != '\0'; i++) string1[i] = string2[i];
    string1[i] = '\0'; // terminate the copy
    return string1;
}
If you pass a char* the characters are already modifiable. Note that it is the responsibility of the caller to allocate the memory to hold the copy. And the declaration:
char* i = "Farid";
is not a valid allocation, because the i pointer will likely point to read-only memory. Do instead:
char i[100] = "Farid";
Now i holds 100 chars of local memory, plenty of room for your copy:
strcpy(i, "ABC ");
If you wanted this function to allocate memory, then you should create another one, say strdup():
char* strdup (char* string)
{
size_t len = strlen(string);
char *n = (char*)malloc(len + 1); // +1 for the null terminator
if (!n)
return 0;
strcpy(n, string);
return n;
}
Now, with this function the caller has the responsibility to free the memory:
char *i = strdup("ABC ");
//use i
free(i);
This happens because of an error in the declaration of strcpy: "char* *string1".
I don't think you meant string1 to be a pointer to a pointer to char.
Removing one of the * should work.
The code has several issues:
You can't assign a string literal to char* because the string literal has type char const[N] (for a suitable value of N) which converts to char const* but not to char*. In C++03 it was possible to convert to char* for backward compatibility but this rule is now gone. That is, your i needs to be declared char const*. As implemented above, your code tries to write read-only memory which will have undesirable effects.
The declaration of std::strcpy() takes a char* and a char const*: for the first pointer you need to provide sufficient space to hold a string of the second argument. Since this is error-prone it is a bad idea to use strcpy() in the first place! Instead, you want to replicate std::strncpy(), which takes as its third argument the maximum number of characters to copy, typically the size of the first buffer (note that std::strncpy() does not add a terminator when the source is at least that long, so you definitely also want to guarantee zero termination).
It is a bad idea to use strlen() in the loop condition as the function needs to be evaluated for each iteration of the loop, effectively changing the complexity of strcpy() from linear (O(N)) to quadratic (O(N^2)). Quadratic complexity is very bad. Copying a string of 1000 characters takes on the order of 1000000 operations. If you want to try out the effect, copy a string with 1000000 characters using a linear and a quadratic algorithm.
Your strcpy() doesn't add a null-terminator.
In C++ (and in C since C99) the implicit int rule doesn't apply. That is, you really need to write int in front of main().
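Putting these points together, a hand-written copy that runs in a single pass, respects the destination size and always terminates the result might look like the sketch below. The name copyString and the extra size parameter are assumptions for illustration. No <cstring> functions are used, in keeping with the restriction in the question.

```cpp
#include <cstddef>

// Hypothetical replacement for the hand-written strcpy: copies at most
// dst_size - 1 characters in a single pass and always null-terminates.
// No <cstring> functions are used.
char *copyString(char *dst, std::size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return dst;                 // nothing can be written safely

    std::size_t i = 0;
    for (; i + 1 < dst_size && src[i] != '\0'; ++i)
        dst[i] = src[i];
    dst[i] = '\0';
    return dst;
}
```

With char i[100] = "Farid"; a call such as copyString(i, sizeof i, "ABC ") leaves "ABC " in the array. A destination that is too small receives a truncated but still terminated string.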
OK, a couple of things:
You are missing the return type for the main function declaration. Not really allowed under the standard. Some compilers will still allow it, but others will fail on the compile.
The way you have your for loop structured in strcpy, you are calling your strlen function each time through the loop, and it is having to re-count the characters in the source string. Not a big deal with a string like "ABC " but as strings get longer.... Better to save the value of the result into a variable and use that in the for loop.
Because of the way that you are declaring i in main, you are pointing to read-only storage, and will be causing an access violation.
Look at the other answers here for how to rebuild your code.
Pointer use in C and C++ is a perennial issue. I'd like to suggest the following tutorial from Paul DiLorenzo, "Learning C++ Pointers for REAL dummies.".
(This is not to imply that you are a "dummy," it's just a reference to the "<insert subject here> for Dummies" line of books. I would not be surprised if the insertion of "REAL" is to forestall lawsuits over trademarked titles.)
It is an excellent tutorial.
Hope it helps.

std::string.c_str() has different value than std::string?

I have been working with C++ strings and trying to load char * strings into std::string by using C functions such as strcpy(). Since strcpy() takes char * as a parameter, I have to cast it which goes something like this:
std::string destination;
unsigned char *source;
strcpy((char*)destination.c_str(), (char*)source);
The code works fine and when I run the program in a debugger, the value of *source is stored in destination, but for some odd reason it won't print out with the statement
std::cout << destination;
I noticed that if I use
std::cout << destination.c_str();
The value prints out correctly and all is well. Why does this happen? Is there a better method of copying an unsigned char* or char* into a std::string (stringstreams?) This seems to only happen when I specify the string as foo.c_str() in a copying operation.
Edit: To answer the question "why would you do this?", I am using strcpy() as a plain example. There are other times that it's more complex than assignment. For example, having to copy only X amount of string A into string B using strncpy() or passing a std::string to a function from a C library that takes a char * as a parameter for a buffer.
Here's what you want
std::string destination = source;
What you're doing is wrong on so many levels... you're writing over the inner representation of a std::string... I mean... not cool man... it's much more complex than that, arrays being resized, read-only memory... the works.
This is not a good idea at all for two reasons:
destination.c_str() returns a pointer to const, and casting away its const and writing through it is undefined behavior.
You haven't set the size of the string, meaning that it won't even necessarily have a large enough buffer to hold the string, which is likely to cause an access violation.
std::string has a constructor which allows it to be constructed from a char* so simply write:
std::string destination = source;
Well what you are doing is undefined behavior. Your c_str() returns a const char * and is not meant to be assigned to. Why not use the defined constructor or assignment operator.
std::string defines an implicit conversion from const char* to std::string... so use that.
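One detail worth spelling out: in the question source is an unsigned char*, and std::string has no constructor for that type, so a cast to const char* is needed before the constructor or assignment can be used. A minimal sketch, with the hypothetical helper name fromUnsigned:

```cpp
#include <cstddef>
#include <string>

// Builds a std::string from a null-terminated unsigned char buffer.
// std::string has no constructor taking unsigned char*, hence the cast.
std::string fromUnsigned(const unsigned char *source)
{
    return std::string(reinterpret_cast<const char *>(source));
}

// Copies only the first `count` bytes (no terminator required).
std::string fromUnsigned(const unsigned char *source, std::size_t count)
{
    return std::string(reinterpret_cast<const char *>(source), count);
}
```

Both overloads let std::string allocate and size its own buffer, so nothing is ever written through c_str().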
You decided to cast away an error as c_str() returns a const char*, i.e., it does not allow for writing to its underlying buffer. You did everything you could to get around that and it didn't work (you shouldn't be surprised at this).
c_str() returns a const char* for good reason. You have no idea if this pointer points to the string's underlying buffer. You have no idea if this pointer points to a memory block large enough to hold your new string. The library is using its interface to tell you exactly how the return value of c_str() should be used and you're ignoring that completely.
Do not do what you are doing!!!
I repeat!
DO NOT DO WHAT YOU ARE DOING!!!
That it seems to sort of work when you do some weird things is a consequence of how the string class was implemented. You are almost certainly writing in memory you shouldn't be and a bunch of other bogus stuff.
When you need to interact with a C function that writes to a buffer there's two basic methods:
std::string read_from_sock(int sock) {
char buffer[1024] = "";
int recv = read(sock, buffer, 1024);
if (recv > 0) {
return std::string(buffer, buffer + recv);
}
return std::string();
}
Or you might try the peek method:
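std::string read_from_sock(int sock) {
    // Ask how much data is pending without consuming it. Caveat: a zero-length
    // peek reports the pending size only with MSG_TRUNC on Linux datagram sockets.
    ssize_t pending = recv(sock, 0, 0, MSG_PEEK | MSG_TRUNC);
    if (pending > 0) {
        std::vector<char> buf(pending);
        ssize_t got = recv(sock, &buf[0], buf.size(), 0);
        if (got > 0)
            return std::string(buf.begin(), buf.begin() + got);
    }
    return std::string();
}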
Of course, these are not very robust versions...but they illustrate the point.
First you should note that the value returned by c_str is a const char* and must not be modified. Actually it even does not have to point to the internal buffer of string.
In response to your edit:
having to copy only X amount of string A into string B using strncpy()
If string A is a char array, and string B is std::string, and strlen(A) >= X, then you can do this:
B.assign(A, A + X);
passing a std::string to a function from a C library that takes a char * as a parameter for a buffer
If the parameter is actually const char *, you can use c_str() for that. But if it is just plain char *, and you are using a C++11 compliant compiler, then you can do the following:
c_function(&B[0]);
However, you need to ensure that there is room in the string for the data (same as if you were using a plain C string), which you can do with a call to the resize() function. If the function writes an unspecified number of characters to the string as a null-terminated C string, then you will probably want to truncate the string afterward, like this:
B.resize(B.find('\0'));
The reason you can safely do this in a C++11 compiler and not a C++03 compiler is that in C++03, strings were not guaranteed by the standard to be contiguous, but in C++11, they are. If you want the guarantee in C++03, then you can use std::vector<char> instead.
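The resize, call, truncate sequence can be sketched end to end as follows. The function c_fill is a hypothetical stand-in for a C library routine that writes a null-terminated string into a caller-supplied buffer, and the buffer size of 64 is an arbitrary choice for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical C-style function that writes a null-terminated string into a
// caller-supplied buffer of `size` bytes.
void c_fill(char *buffer, std::size_t size)
{
    std::snprintf(buffer, size, "%s", "hello from C");
}

std::string callWithStringBuffer()
{
    std::string b;
    b.resize(64);                     // make room before handing out &b[0]
    c_fill(&b[0], b.size());
    std::size_t end = b.find('\0');   // drop the unused tail
    if (end != std::string::npos)
        b.resize(end);
    return b;
}
```

Checking the result of find against npos avoids an exception from resize in the case where the C function filled the whole buffer without leaving a terminator inside the string's size.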