Back to basics - idiomatic way to copy string to static array - c++

OK, strncpy is not designed to work with null-terminated strings (if dest is too short it won't be null-terminated, and if dest is longer it will be padded with zeros).
So, here is a trivial code:
const char *src = ....; // NULL terminated string of unknown length
char dest[30];
How do I copy src to dest? strcpy is not safe, and strncpy is a bad choice too. So am I left with strlen followed by memcpy?
I suppose the solution will differ a bit depending on whether or not I care that the copy may be truncated (dest is smaller than the length of src).
Some limitations:
Legacy code, so I don't want to, and cannot, change it to std::string.
I don't have strlcpy - gcc doesn't supply it.
Code may be used in parts of the application where performance is critical (e.g. I don't want to waste CPU time padding dest with zeros as strncpy does). However, I'm not talking about premature optimization, but rather about the idiomatic way to perform string copying the C way.
Edit
Oops, I meant strncpy and not snprintf. My mistake.

With strncpy:
strncpy(dest, src, sizeof(dest) - 1);
dest[sizeof(dest) - 1] = '\0';
This pads with zeros, but does much less formatting work than snprintf. If you really must have the computer do as little as possible, describe it yourself:
char* last = dest + sizeof(dest) - 1;
char* curr = dest; /* assuming we must not alter 'dest' */
while (curr != last && *src) { *curr++ = *src++; }
*curr = '\0'; /* curr never passes 'last', so this write stays inside dest
and terminates the string right after the last copied character */

I think you're talking about strncpy(), which might not terminate the string and will fill the remainder of the buffer with zeros.
snprintf() always terminates the destination string (as long as the buffer has a size of at least 1) and doesn't pad the remainder of the buffer with zeros.
In summary, snprintf() is what you want, unless you're very concerned about performance. Since snprintf() needs to interpret the format string (even if all it ends up doing is copying a string), you might be better off with something like strlcpy() for bounded string copy operations.
(and if you want strlcpy() but don't have it, you can get the rather simple source here. For completeness, strlcat() is here)
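If strlcpy() is not available, the "strlen followed by memcpy" idea from the question can be sketched roughly as below. The name copy_bounded is hypothetical (it is not a standard or library function), and the return convention is only modelled on strlcpy(); treat it as a minimal sketch rather than a drop-in replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of any standard library.
// Copies at most dest_size - 1 characters of src into dest and always
// null-terminates when dest_size > 0. Returns strlen(src), so a return
// value >= dest_size tells the caller that the copy was truncated.
std::size_t copy_bounded(char* dest, std::size_t dest_size, const char* src)
{
    assert(dest != nullptr && src != nullptr);
    const std::size_t src_len = std::strlen(src);
    if (dest_size == 0) {
        return src_len; // no room for anything, not even the terminator
    }
    const std::size_t n = (src_len < dest_size - 1) ? src_len : dest_size - 1;
    std::memcpy(dest, src, n);
    dest[n] = '\0'; // exactly one terminator, no zero padding
    return src_len;
}
```

With char dest[30], a call would look like copy_bounded(dest, sizeof dest, src); comparing the result against sizeof dest shows whether truncation happened. The cost is one extra pass over src for strlen.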

If you don't care about truncation, you can use strncat():
dest[0] = 0;
strncat(dest, src, sizeof dest - 1);

I'd just roll my own:
size_t i = 0; // declared outside the loop so it is still in scope afterwards
for (; i < sizeof(dest) - 1 && src[i] != '\0'; i++)
{
    dest[i] = src[i];
}
dest[i] = '\0';
This ensures that dest is null-terminated, but never adds more nulls than necessary. If you're really performance-sensitive, you can declare this as a macro or an inline function in a common header.

Use snprintf. It always null-terminates and does not do any null padding. Don't know where you got the misconceptions about it...
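For reference, a bounded copy through snprintf might look like the sketch below. The wrapper name copy_with_snprintf is made up for illustration; the only library call is std::snprintf, whose return value (the length that would have been written) is used here to detect truncation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical wrapper. Returns true when src fit into dest completely,
// false when it was truncated (dest is still null-terminated either way).
bool copy_with_snprintf(char* dest, std::size_t dest_size, const char* src)
{
    assert(dest != nullptr && src != nullptr && dest_size > 0);
    // Passing src through "%s" keeps it from being parsed as a format string.
    const int needed = std::snprintf(dest, dest_size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < dest_size;
}
```

Never pass src itself as the format argument: a '%' in the data would then be interpreted as a conversion specifier.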

std::copy(src, src+strlen(src)+1, dest)

I'm not sure I understand your question entirely, but if you're concerned about zero-padding it can often be done pretty efficiently if you initialize your array like this.
char dest[30] = { 0 };
If you initialize it like that you don't have to care about extra logic to add '\0' to the end of the string and it might even turn out faster.
But if you're going to optimize remember to measure the performance before and after the optimization. Otherwise always code with readability and maintainability in mind.

Related

How to create a function that removes all of a selected character in a C-string?

I want to make a function that removes all the characters of ch in a c-string.
But I keep getting an access violation error.
Unhandled exception at 0x000f17ba in testassignments.exe: 0xC0000005: Access violation writing location 0x000f787e.
void removeAll(char* &s, const char ch)
{
int len=strlen(s);
int i,j;
for(i = 0; i < len; i++)
{
if(s[i] == ch)
{
for(j = i; j < len; j++)
{
s[j] = s[j + 1];
}
len--;
i--;
}
}
return;
}
I expected the c-string to not contain the character "ch", but instead, I get an access violation error.
In the debug I got the error on the line:
s[j] = s[j + 1];
I tried to modify the function but I keep getting this error.
Edit--
Sample inputs:
s="abmas$sachus#settes";
ch='e' Output->abmas$sachus#settes, becomes abmas$sachus#stts
ch='t' Output-> abmas$sachus#stts, becomes abmas$sachus#ss.
Instead of producing those outputs, I get the access violation error.
Edit 2:
If its any help, I am using Microsoft Visual C++ 2010 Express.
Apart from the inefficiency of your function shifting the entire remainder of the string whenever encountering a single character to remove, there's actually not much wrong with it.
In the comments, people have assumed that you are reading off the end of the string with s[j+1], but that is untrue. They are forgetting that s[len] is completely valid because that is the string's null-terminator character.
So I'm using my crystal ball now, and I believe that the error is because you're actually running this on a string literal.
// This is NOT okay!
char* str = "abmas$sachus#settes";
removeAll(str, 'e');
This code above is (sort of) not legal. The string literal "abmas$sachus#settes" should not be stored as a non-const char*. But for backward compatibility with C where this is allowed (provided you don't attempt to modify the string) this is generally issued as a compiler warning instead of an error.
However, you are really not allowed to modify the string. And your program is crashing the moment you try.
If you were to use the correct approach with a char array (which you can modify), then you have a different problem:
// This will result in a compiler error
char str[] = "abmas$sachus#settes";
removeAll(str, 'e');
Results in
error: invalid initialization of non-const reference of type ‘char*&’ from an rvalue of type ‘char*’
So why is that? Well, your function takes a char*& type that forces the caller to use pointers. It's making a contract that states "I can modify your pointer if I want to", even if it never does.
There are two ways you can fix that error:
The TERRIBLE PLEASE DON'T DO THIS way:
// This compiles and works but it's not cool!
char str[] = "abmas$sachus#settes";
char *pstr = str;
removeAll(pstr, 'e');
The reason I say this is bad is because it sets a dangerous precedent. If the function actually did modify the pointer in a future "optimization", then you might break some code without realizing it.
Imagine that you want to output the string with characters removed later, but the first character was removed and your function decided to modify the pointer to start at the second character instead. Now if you output str, you'll get a different result from using pstr.
And this example is only assuming that you're storing the string in an array. Imagine if you actually allocated a pointer like this:
char *str = new char[strlen("abmas$sachus#settes") + 1];
strcpy(str, "abmas$sachus#settes");
removeAll(str, 'e');
Then if removeAll changes the pointer, you're going to have a BAD time when you later clean up this memory with:
delete[] str; //<-- BOOM!!!
The I ACKNOWLEDGE MY FUNCTION DEFINITION IS BROKEN way:
Real simply, your function definition should take a pointer, not a pointer reference:
void removeAll(char* s, const char ch)
This means you can call it on any modifiable block of memory, including an array. And you can be comforted by the fact that the caller's pointer will never be modified.
Now, the following will work:
// This is now 100% legit!
char str[] = "abmas$sachus#settes";
removeAll(str, 'e');
Now that my free crystal-ball reading is complete, and your problem has gone away, let's address the elephant in the room:
Your code is needlessly inefficient!
You do not need to do the first pass over the string (with strlen) to calculate its length
The inner loop effectively gives your algorithm a worst-case time complexity of O(N^2).
The little tricks modifying len and, worse than that, the loop variable i make your code more complex to read.
What if you could avoid all of these undesirable things!? Well, you can!
Think about what you're doing when removing characters. Essentially, the moment you have removed one character, then you need to start shuffling future characters to the left. But you do not need to shuffle one at a time. If, after some more characters you encounter a second character to remove, then you simply shunt future characters further to the left.
What I'm trying to say is that each character only needs to move once at most.
There is already an answer demonstrating this using pointers, but it comes with no explanation and you are also a beginner, so let's use indices because you understand those.
The first thing to do is get rid of strlen. Remember, your string is null-terminated. All strlen does is search through characters until it finds the null byte (otherwise known as 0 or '\0')...
[Note that real implementations of strlen are super smart (i.e. much more efficient than searching single characters at a time)... but of course, no call to strlen is faster]
All you need is your loop to look for the NULL terminator, like this:
for(i = 0; s[i] != '\0'; i++)
Okay, and now to ditch the inner loop, you just need to know where to stick each new character. How about just keeping a variable new_size in which you are going to count up how long the final string is.
void removeAll(char* s, char ch)
{
int new_size = 0;
for(int i = 0; s[i] != '\0'; i++)
{
if(s[i] != ch)
{
s[new_size] = s[i];
new_size++;
}
}
// You must also null-terminate the string
s[new_size] = '\0';
}
If you look at this for a while, you may notice that it might do pointless "copies". That is, if i == new_size there is no point in copying characters. So, you can add that test if you want. I will say that it's likely to make little performance difference, and potentially reduce performance because of additional branching.
But I'll leave that as an exercise. And if you want to dream about really fast code and just how crazy it gets, then go and look at the source code for strlen in glibc. Prepare to have your mind blown.
You can make the logic simpler and more efficient by writing the function like this:
void removeAll(char * s, const char charToRemove)
{
const char * readPtr = s;
char * writePtr = s;
while (*readPtr) {
if (*readPtr != charToRemove) {
*writePtr++ = *readPtr;
}
readPtr++;
}
*writePtr = '\0';
}
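The same read/write-position idea is what std::remove from <algorithm> implements, so another option is to let the standard library do the shuffling. This is a sketch of that alternative, not the approach from the answers above; note that it goes back to calling strlen to find the end of the string, which costs one extra pass. The name remove_all is chosen here only to avoid clashing with the versions above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Sketch: removes every occurrence of ch from the modifiable C-string s.
void remove_all(char* s, char ch)
{
    assert(s != nullptr);
    char* end = s + std::strlen(s);
    // std::remove moves the kept characters to the front and returns
    // a pointer just past the last kept character.
    char* new_end = std::remove(s, end, ch);
    *new_end = '\0'; // re-terminate at the new, shorter length
}
```

As before, this must be called on a writable array such as char str[] = "abmas$sachus#settes";, never on a string literal.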

Reading contents of file into dynamically allocated char* array- can I read into std::string instead?

I have found myself writing code which looks like this
// Treat the following as pseudocode - just an example
iofile.seekg(0, std::ios::end); // iofile is a file opened for read/write
uint64_t f_len = iofile.tellg();
if(f_len >= some_min_length)
{
// Focus on the following code here
char *buf = new char[7];
char buf2[]{"MYFILET"}; // just some random string
// if we see this it's a good indication
// the rest of the file will be in the
// expected format (unlikely to see this
// sequence in a "random file", but don't
// worry too much about this)
iofile.read(buf, 7);
if(memcmp(buf, buf2, 7) == 0) // I am confident this works
{
// carry on processing file ...
// ...
// ...
}
}
else
cout << "invalid file format" << endl;
This code is probably an okay sketch of what we might want to do when opening a file, which has some specified format (which I've dictated). We do some initial check to make sure the string "MYFILET" is at the start of the file - because I've decided all my files for the job I'm doing are going to start with this sequence of characters.
I think this code would be better if we didn't have to play around with "c-style" character arrays, but used strings everywhere instead. This would be advantageous because we could do things like if(buf == buf2) if buf and buf2 were std::strings.
A possible alternative could be,
// Focus on the following code here
std::string buf;
std::string buf2("MYFILET"); // very nice
buf.resize(7); // okay, but not great
iofile.read(buf.data(), 7); // pretty awful - error prone if wrong length argument given
// also we have to resize buf to 7 in the previous step
// lots of potential for mistakes here,
// and the length was used twice which is never good
if(buf == buf2) then do something
What are the problems with this?
We had to use the length variable 7 (or constant in this case) twice. Which is somewhere between "not ideal" and "potentially error prone".
We had to access the contents of buf using .data() which I shall assume here is implemented to return a raw pointer of some sort. I don't personally mind this too much, but others may prefer a more memory-safe solution, perhaps hinting we should use an iterator of some sort? I think in Visual Studio (for Windows users which I am not) then this may return an iterator anyway, which will give [?] warnings/errors [?] - not sure on this.
We had to have an additional resize statement for buf. It would be better if the size of buf could be automatically set somehow.
Before C++17, std::string::data() returns a const char*, and writing through it is undefined behavior. However, you are free to use std::vector::data() in this way.
If you want to use std::string, and dislike setting the size yourself, you may consider whether you can use std::getline(). This is the free function, not std::istream::getline(). The std::string version will read up to a specified delimiter, so if you have a text format you can tell it to read until '\0' or some other character which will never occur, and it will automatically resize the given string to hold the contents.
If your file is binary in nature, rather than text, I think most people would find std::vector<char> to be a more natural fit than std::string anyway.
We had to use the length variable 7 (or constant in this case) twice.
Which is somewhere between "not ideal" and "potentially error prone".
The second time you can use buf.size()
iofile.read(buf.data(), buf.size());
We had to access the contents of buf using .data() which I shall
assume here is implemented to return a raw pointer of some sort.
As pointed out by John Zwinck, .data() returns a pointer to const.
I suppose you could define buf as std::vector<char>; for a vector (if I'm not wrong) .data() returns a pointer to char (in this case), not to const char.
size() and resize() work in the same way.
We had to have an additional resize statement for buf. It would be
better if the size of buf could be automatically set somehow.
I don't think read() permits this.
p.s.: sorry for my bad English.
We can validate a signature without double buffering (rdbuf and a string) and allocating from the heap...
// terminating null not included
constexpr char sig[] = { 'M', 'Y', 'F', 'I', 'L', 'E', 'T' };
auto ok = all_of(begin(sig), end(sig), [&fs](char c) { return fs.get() == (int)c; });
if (ok) {}
template<class Src>
std::string read_string( Src& src, std::size_t count){
std::string buf;
buf.resize(count);
src.read(&buf.front(), count); // in C++17 make it buf.data()
return buf;
}
Now auto read = read_string( iofile, 7 ); is clean at point of use.
buf2 is a bad plan. I'd do:
if(read=="MYFILET")
directly, or use a const char myfile_magic[] = "MYFILET";.
I liked many of the ideas from the examples above, however I wasn't completely satisfied that there was an answer which would produce undefined-behaviour-free code for C++11 and C++17. I currently write most of my code in C++11 - because I don't anticipate using it on a machine in the future which doesn't have a C++11 compiler.
If one doesn't, then I add a new compiler or change machines.
However it does seem to me to be a bad idea to write code which I know may not work under C++17... That's just my personal opinion. I don't anticipate using this code again, but I don't want to create a potential problem for myself in the future.
Therefore I have come up with the following code. I hope other users will give feedback to help improve this. (For example there is no error checking yet.)
std::string
fstream_read_string(std::fstream& src, std::size_t n)
{
char *const buffer = new char[n + 1];
src.read(buffer, n);
buffer[n] = '\0';
std::string ret(buffer);
delete [] buffer;
return ret;
}
This seems like a basic, probably fool-proof method... It's a shame there seems to be no way to get std::string to use the same memory as allocated by the call to new.
Note we had to add an extra trailing null character in the C-style string, which is sliced off in the C++-style std::string.

What are the potential security vulnerabilities? C++

My boss told me to look at the following code and tell him what the potential security vulnerabilities were. I'm not very good at this kind of thing, since I don't think in the way of trying to hack code. All I see is that nothing is declared private, but other than that I just don't know.
#define NAME_SIZE (unsigned char) 255
// user input should contain the user’s name (first name space
// middle initial space last name and a null
// character), and was entered directly by the user.
// Returns the first character in the user input, or -1 if the method failed.
char poor_method(char* user_input, char* first, char *middle, char* last)
{
char*buffer;
char length;
// find first name
buffer = strtok(user_input, " ");
if(buffer==0)
{
return -1;
}
length = strlen(buffer);
if(length <= NAME_SIZE)
{
strcpy(first, buffer);
}
// find middle name
buffer = strtok(NULL, " ");
if(buffer==0)
{
return-1;
}
if(middle)
*middle = buffer[0];
// find last name
buffer = strtok(NULL, "\0");
length = strlen(buffer);
if(length <= NAME_SIZE)
{
strcpy(last, buffer);
}
// Check to make sure that all of the user input was used
buffer = strtok(NULL, "\0");
if(buffer != NULL)
{
return-1;
}
return first[0];
}
What security vulnerabilities are there?
Get good at writing secure code
You most likely don't want systems that you are responsible for finding their way onto bugtraq or cve. If you don't understand it, be honest with your boss. Tell him you don't understand and you want to work on it. Pick up Writing Secure Code. Read it, learn it, love it. Asking this question on SO and giving your boss the answer definitely doesn't help you in the long run.
Then look at the sample code again :)
What I saw (by no means a complete list):
There's no guarantee you're going to get a char pointer which points to a null-terminated string (unless you're allowed to make that assumption, which is not really a safe one to make).
strtok and strcpy are the C way of doing things and come with the fun stuff of programming C code. If you must use them, so be it (just make sure you can guarantee your inputs to these functions will indeed be valid). Otherwise, try switching your code to use std::string and the "C++ way" (as Cat Plus Plus put it)
I'm assuming this is a typo:
charpoor_method(
You're missing a space between char and poor_method(
You're not checking if first or last are indeed valid pointers (unfortunately, the best you can do is to check them against NULL).
There's no guarantee that the buffers first or last can indeed hold whatever you're copying to them.
Another typo:
returnfirst[0];
missing space between return and first[0]
Learning to write secure code is something that's very important to do. Follow Brecht's advice and get good at it.
Ok strtok assumes user_input is NULL terminated, this might not be true.
charlength = strlen(buffer);
if(length <= NAME_SIZE)
{
strcpy(first, buffer);
}
charlength here is undeclared, and so is length; they should be declared as unsigned int.
strlen won't count the '\0' as part of the length, so the later strcpy will copy the '\0' to whatever comes after first if the length of buffer is 255 + 1 ('\0').
Also, the size of the buffer behind char *first is unknown. It should be NAME_SIZE, but then the comparison should be
length <= NAME_SIZE - 1
or allocate char *first to NAME_SIZE + 1
I'd probably rewrite the whole thing; it is quite ugly.
Rather than using strcpy(), use strncpy() with a specific length parameter, as that function, like strtok(), assumes a NULL-terminated buffer for the source, and that may not be the case, giving you a buffer overflow for the data copied into the buffer pointed to by either first or last. Additionally, you have no idea how long the buffers are that have been allocated for first and last ... Don't assume that the user of your function has properly allocated enough memory to copy into unless they've passed you a parameter telling you there are enough memory slots in the buffers. Otherwise again, you could (and most likely will) end-up with buffer overflows.
Also you may want to use the restrict keyword if you're using C99 in order to prevent the caller of your function from aliasing the same memory location for buffer, first, and last.
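To make the "C++ way" mentioned above concrete, here is a rough sketch of how the same parsing could be done without caller-supplied buffers, strtok, or strcpy. Everything in it (ParsedName, parse_name, the rule that the middle token must be a single character and the last name a single word) is an assumption made for illustration, not a statement of what the original function was required to accept.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: the strings own their storage, so there is
// no fixed-size buffer to overflow.
struct ParsedName {
    std::string first;
    char middle_initial;
    std::string last;
};

// Hypothetical replacement for poor_method. Returns false unless the
// input is exactly "first M last" (three whitespace-separated tokens,
// the middle one a single character).
bool parse_name(const std::string& input, ParsedName& out)
{
    std::istringstream in(input);
    std::string first, middle, last, extra;
    if (!(in >> first >> middle >> last)) {
        return false; // fewer than three tokens
    }
    if (in >> extra) {
        return false; // unexpected trailing input
    }
    if (middle.size() != 1) {
        return false; // assumed rule: middle initial is one character
    }
    out.first = first;
    out.middle_initial = middle[0];
    out.last = last;
    return true;
}
```

The caller no longer needs to guess buffer sizes, and failure is reported through a bool instead of a -1 that is indistinguishable from a valid char value.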

Strange char**/calloc behavior

When I debug the following code, strncpy works fine, but as soon as the loop exits I see that parent_var_names is pointing to NULL/0xfdfdfddf. I am puzzled!
parent_var_names = (const char**)calloc(net->nodes[x]->num_parents, sizeof(const char*));
for(int i(1); i < net->nodes[x]->num_parents; ++i)
{
parent_var_names[i] = (const char*)malloc(strlen(rhs_arr[net->nodes[x]->markov_parents[i]]));
strncpy((char*)parent_var_names[i], (char*)rhs_arr[net->nodes[x]->markov_parents[i]], strlen(rhs_arr[net->nodes[x]->markov_parents[i]]));
}
Placing guard bytes (i.e. 0xFDFDFDFD) around an allocated region of memory is a feature of the (Microsoft) debug heap. Seeing that you encounter this value either means you are overwriting memory somewhere, or you are looking at the value of parent_var_names[0] without actually writing anything in there (i.e. take a close look at the value you initialize your loop variable i with).
Furthermore, your code could be simplified to:
#include <stdlib.h> /* calloc */
#include <string.h> /* strdup (POSIX, not ISO C) */
/* ... */
int ii = 0;
int parent_count = net->nodes[x]->num_parents;
/* the cast is required in C++ and harmless in C */
char** parent_var_names = (char**)calloc(parent_count, sizeof(char*));
for(; ii < parent_count; ++ii)
{
    parent_var_names[ii] = strdup(rhs_arr[net->nodes[x]->markov_parents[ii]]);
}
Also make sure your markov_parents are definitely zero-terminated. Btw., in response to your comment that you want "C and C++ compatibility": Your code is not valid C code, so ...
int i(1) in your for loop init should probably be int i(0).
Otherwise you're never setting parent_var_names[0] to anything other than the 0 that calloc() initializes it to.
Just for completeness (since this is mentioned a couple times in the comments), you're not taking into account the '\0' terminator for the strings you're copying. Since you're not copying the terminator in your strncpy() calls, you're not overflowing the buffers allocated, but your results aren't properly-terminated strings. It's possible that this is your intent, but that would be unusual. If it is intended, throw a comment in there...
If your parent_var_names aren't NULL terminated it's probably because you are using strlen when you allocate space for the strings. This fails to create space for the string and the NULL terminator. Try strlen()+1 instead.
You should probably be using std::string anyway...
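As a rough illustration of the std::string suggestion: if the names can live in a std::vector<std::string>, the allocation sizes and the terminators take care of themselves. The function name copy_names and the parameter types below are guesses, since the question does not show how rhs_arr and markov_parents are declared.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper. `names` stands in for rhs_arr and `indices` for
// markov_parents; both types are assumptions, not taken from the question.
std::vector<std::string> copy_names(const char* const* names,
                                    const int* indices,
                                    std::size_t count)
{
    std::vector<std::string> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) { // starts at 0, not 1
        // std::string copies the characters and manages the terminator.
        result.push_back(names[indices[i]]);
    }
    return result;
}
```

No malloc, strlen()+1, or matching free is needed, and result[i].c_str() still yields a const char* for any C API that wants one.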

Is there a better way of copying a string to a character array with things already in it

messageBuffer[0] = 1;
messageBuffer[1] = 0;
for (int i = 2; i < (userName.size() + 2); i++)
{
messageBuffer[i] = userName[(i - 2)];
}
userName is a string. I was just wondering if there is already a function that does this which I haven't found yet. I have tried looking on cplusplus but I don't see anything suitable.
Thanks for all the help guys =) I really appreciate it. This site is awesome!
Before C++11, C++ strings were not guaranteed by the standard to be contiguous, which means that all the suggestions so far using strncpy are either unsafe (if they copy from something like &userName[0]) or potentially inefficient (if they copy from userName.c_str(), which may imply an unnecessary copy).
The correct C++ solution is to use std::copy. Fast and safe.
std::copy(userName.begin(), userName.end(), messageBuffer+2);
As a general rule, if you find yourself messing around with C string functions in a C++ program, you are doing it wrong.
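std::copy does not check the destination size by itself, so the "safe" part still depends on the caller. A small sketch of how the two header bytes and the copy could be wrapped with a bounds check; write_message and its parameters are hypothetical names, and the header values 1 and 0 are simply taken from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: writes the two header bytes followed by userName
// into buffer. Returns false (and writes nothing) if it would not fit.
// No '\0' terminator is appended, matching the loop in the question.
bool write_message(char* buffer, std::size_t buffer_size, const std::string& userName)
{
    const std::size_t header = 2;
    if (buffer_size < header || userName.size() > buffer_size - header) {
        return false;
    }
    buffer[0] = 1;
    buffer[1] = 0;
    std::copy(userName.begin(), userName.end(), buffer + header);
    return true;
}
```

If the receiver expects a terminated string, the size check and the copy would each need one more byte.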
strncpy
EDIT: See #jaif answer for "C++" way.
strcpy(messageBuffer + 2, userName.c_str());
standard disclaimers about making sure you have enough memory apply
messageBuffer should be 3 characters bigger than the string (one for \0)
a bit of reference about the function here
Use strcpy and check the buffer size by hand.
strncpy is a little safer, but dangerous in another way. If the buffer is too small, strncpy does not terminate the string with \0, which will cause an error somewhere else in the program.
If you want to use strncpy, then be sure to verify that the output is \0-terminated. Usually when people use strncpy, they forget to do this which is why I recommend strcpy. C and C++ programmers can usually spot the missing buffer size check when using strcpy.
Yes, you can use strcpy, memcpy, memmove or std::copy to do this. Just pass the address of messageBuffer[2] as the destination. strcpy( &messageBuffer[2], userName.c_str() );
If possible, use a std::vector with std::copy
std::vector<char> messageBuffer;
messageBuffer.reserve(userName.size() + 2); // optional
messageBuffer.push_back(1);
messageBuffer.push_back(0);
std::copy(userName.begin(), userName.end(), std::back_inserter(messageBuffer));
theC_API(&messageBuffer[0]);
Maybe not the fastest, but no chance of miscalculations.
Use strncpy
Use strncpy() for ANSI strings (char*), wcsncpy() for Unicode (wchar_t*) strings.
You have two options, the unsafe and the safe way.
Unsafe:
// bad idea, string lengths are not checked and if src is longer than
// memory available for dest, you will stomp over random memory
strcpy(dest, src);
Safer:
// Much safer, you can specify how many characters to copy (lesser of src length and
// dest length - 1 and add a null terminator '\0' to dest if the string was truncated).
strncpy(dest, src, num_characters);
What about using strcpy (or strncpy to prevent buffer overflows)?
Make sure the length of messageBuffer allows copying userName and just strcpy(messageBuffer + 2, userName.c_str());
A better way, using a vector:
std::vector<char> messageBuffer;
messageBuffer.resize(userName.size() + 2);
messageBuffer[0] = 1;
messageBuffer[1] = 0;
std::copy(userName.begin(), userName.end(), messageBuffer.begin() + 2);
theC_API(&messageBuffer[0]);