Random behavior of an array of char arrays - c++

I have an array of character arrays built by splitting a string on the pipe ('|') character (example below). The function I use to create this array sometimes works, and sometimes creates the array and then aborts with one of two different errors.
I am not sure what I am doing wrong. In particular, I don't understand why the array is created successfully every time, yet the program breaks after creation about half the time, regardless of the input.
Example array:
"here is | an example | input" = {"here is", "an example", "input"}
Errors:
Error in './msh': malloc(): memory corruption (fast): 0x000...
Error in './msh': free(): invalid pointer: 0x0000....
Code:
char** allArgs = new char*[100];
void createArgArrays(const char* line) {
    char* token;
    token = strtok((char*)line, "|");
    int i = 0;
    while(token != NULL) {
        allArgs[i] = token;
        i++;
        token = strtok(NULL, "|");
    }
}
Where I call the code:
string input;
getline(cin, input);
createArgArrays(input.c_str());
Any insight/help is greatly appreciated.

c_str() returns a const char *. strtok() modifies the string it refers to.
Per http://www.cplusplus.com/reference/string/string/c_str/:
c++98
A program shall not alter any of the characters in this sequence.
Don't cast away const to force things to "work".
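If strtok() has to stay, one way to avoid the cast is to hand it a private, writable copy of the line. The following is a minimal sketch, not the asker's code: the helper name splitOnPipe and the caller-owned buffer parameter are assumptions made for illustration.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on '|' without modifying the caller's string.
// `buffer` receives a writable copy of the line; the returned pointers point into
// `buffer`, so they are valid only while `buffer` is alive and unchanged.
std::vector<char*> splitOnPipe(const std::string& line, std::vector<char>& buffer) {
    buffer.assign(line.begin(), line.end());
    buffer.push_back('\0');  // strtok needs a null-terminated string

    std::vector<char*> tokens;
    for (char* token = std::strtok(buffer.data(), "|");
         token != nullptr;
         token = std::strtok(nullptr, "|")) {
        tokens.push_back(token);  // no fixed cap, unlike a 100-element array
    }
    return tokens;
}
```

Because strtok() writes its terminators into the copy, the std::string returned by getline is never touched. Note that the tokens keep any spaces that surrounded the pipes.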

A few things:
The C++ way is sometimes different than the C way.
Andrew Henle's point about casting should be carved into stone tablets.
If you really want to use a C function, try walking the string using strchr().
Also, try something like std::vector<std::string> (see std::vector::push_back) to store your string chunks - it'll be a bit cleaner and avoids an arbitrary cap on the size of allArgs.
Another thing you could look at is boost::split(), which probably does exactly what you want anyway.
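The std::vector<std::string> suggestion could look roughly like this. It is only a sketch: the function name splitToStrings is made up here, and it uses std::getline with a delimiter rather than strchr() or boost::split().

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the '|'-separated pieces of `line` as owned strings.
std::vector<std::string> splitToStrings(const std::string& line) {
    std::vector<std::string> parts;
    std::istringstream stream(line);
    std::string part;
    // std::getline with a third argument reads up to the next '|' (or end of input).
    while (std::getline(stream, part, '|')) {
        parts.push_back(part);
    }
    return parts;
}
```

Each element owns its own characters, so nothing depends on the lifetime of the input line, and the pieces still include whatever spaces surrounded the pipes.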

C++ getting length of char array using a second function

I'm trying to get the length of a character array in a second function. I've looked at a few questions on here, but they don't answer my particular question (although I'm sure something does, I just can't find it). My code is below, but I get the error "invalid conversion from 'char' to 'const char*'". I don't know how to convert my array to what is needed.
#include <cstring>
#include <iostream>
int ValidInput(char, char);
int main() {
char user_input; // user input character
char character_array[26];
int valid_guess;
valid_guess = ValidGuess(user_input, character_array);
// another function to do stuff with valid_guess output
return 0;
}
int ValidGuess (char user_guess, char previous_guesses) {
for (int index = 0; index < strlen(previous_guesses); index++) {
if (user_guess == previous_guesses[index]) {
return 0; // invalid guess
}
}
return 1; // valid guess, reaches this if for loop is complete
}
Based on what I've done so far, I feel like I'm going to have a problem with previous_guesses[index] as well.
char user_input;
defines a single character
char character_array[26];
defines an array of 26 characters.
valid_guess = ValidGuess(user_input, character_array);
calls the function
int ValidGuess (char user_guess, char previous_guesses)
where char user_guess accepts a single character, lining up correctly with the user_input argument, and char previous_guesses accepts a single character, not the 26 characters of character_array. previous_guesses needs a different type to accommodate character_array. This is the cause of the reported error.
Where this gets tricky is character_array will decay to a pointer, so
int ValidGuess (char user_guess, char previous_guesses)
could be changed to
int ValidGuess (char user_guess, char * previous_guesses)
or
int ValidGuess (char user_guess, char previous_guesses[])
both ultimately mean the same thing.
Now for where things get REALLY tricky. When an array decays to a pointer it loses how big it is. The asker has gotten around this problem, kudos, with strlen which computes the length, but this needs a bit of extra help. strlen zips through an array, counting until it finds a null terminator, and there are no signs of character_array being null terminated. This is bad. Without knowing where to stop, strlen will probably keep going [1]. A quick solution to this is to go back up to the definition of character_array and change it to
char character_array[26] = {};
to force all of the slots in the array to 0, which just happens to be the null character.
That gets the program back on its feet, but it could be better. Every call to strlen may recount the characters in the string (compilers are smart and can compute the length once and store it if they can prove the contents won't change). Even in the best case, that is one full scan of character_array looking for the null terminator, when what you really want to do is scan for user_input. Basically the program looks at every item in the array twice.
Instead, look for both the null terminator and user_input in the same loop.
int index = 0;
while (previous_guesses[index] != '\0' ) {
    if (user_guess == previous_guesses[index]) {
        return 0; // prefer returning false here. The intent is clearer
    }
    index++;
}
You can also wow your friends by using pointers and eliminating the need for the index variable.
while (*previous_guesses != '\0' ) {
    if (user_guess == *previous_guesses) {
        return false;
    }
    previous_guesses++;
}
The compiler knows and uses this trick too, so use the one that's easier for you to understand.
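Putting the pieces above together, the whole function might look like the sketch below. The bool return type and the const qualifier on the pointer are choices made here, not something the question specifies.

```cpp
// Returns true if user_guess does not appear in previous_guesses.
// Assumes previous_guesses points to a null-terminated array.
bool ValidGuess(char user_guess, const char* previous_guesses) {
    while (*previous_guesses != '\0') {
        if (user_guess == *previous_guesses) {
            return false;  // already guessed
        }
        ++previous_guesses;
    }
    return true;  // reached the terminator without finding a match
}
```

This relies on the array being zero-initialized as shown earlier. Keep in mind that a 26-element array holding 26 guesses has no room left for the terminator, so the array would need one extra slot if every letter can be guessed.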
For 26 entries it probably doesn't matter, but if you really want to get fancy, or have a lot more than 26 possibilities, use a std::set or a std::unordered_set. They allow only one of an item and have much faster look-up than scanning a list one by one, so long as the list is large enough to get over the added complexity of a set and take advantage of its smarter logic. ValidGuess is replaced with something like
if (used.find(user_input) != used.end())
Side note: Don't forget to make the user read a value into user_input before the program uses it. I've also left out how to store the previous inputs because the question does as well.
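For the set-based variant, one possible way to store and check guesses in a single step is sketched here. The helper name RecordGuess is hypothetical, and the set is assumed to be owned by the caller.

```cpp
#include <unordered_set>

// Hypothetical helper: records user_input in `used` and reports whether it was new.
bool RecordGuess(std::unordered_set<char>& used, char user_input) {
    // insert() returns a pair; its .second member is true only when the
    // element was not already present.
    return used.insert(user_input).second;
}
```

A set has no terminator to maintain and no fixed capacity, so the undefined behaviour discussed above cannot occur with it.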
[1] I say probably because the Standard doesn't say what to do. This is called Undefined Behaviour. C++ is littered with the stuff. Undefined Behaviour can do anything -- work, not work, visibly not work, look like it works until it doesn't, melt your computer, anything -- but what it usually does is the easiest and fastest thing. In this case that's just keep going until the program crashes or finds a null.

How to create a function that removes all of a selected character in a C-string?

I want to make a function that removes all the characters of ch in a c-string.
But I keep getting an access violation error.
Unhandled exception at 0x000f17ba in testassignments.exe: 0xC0000005: Access violation writing location 0x000f787e.
void removeAll(char* &s, const char ch)
{
int len=strlen(s);
int i,j;
for(i = 0; i < len; i++)
{
if(s[i] == ch)
{
for(j = i; j < len; j++)
{
s[j] = s[j + 1];
}
len--;
i--;
}
}
return;
}
I expected the c-string to not contain the character "ch", but instead, I get an access violation error.
In the debug I got the error on the line:
s[j] = s[j + 1];
I tried to modify the function but I keep getting this error.
Edit--
Sample inputs:
s="abmas$sachus#settes";
ch='e' Output->abmas$sachus#settes, becomes abmas$sachus#stts
ch='t' Output-> abmas$sachus#stts, becomes abmas$sachus#ss.
Instead of producing those outputs, I get the access violation error.
Edit 2:
If it's any help, I am using Microsoft Visual C++ 2010 Express.
Apart from the inefficiency of your function shifting the entire remainder of the string whenever encountering a single character to remove, there's actually not much wrong with it.
In the comments, people have assumed that you are reading off the end of the string with s[j+1], but that is untrue. They are forgetting that s[len] is completely valid because that is the string's null-terminator character.
So I'm using my crystal ball now, and I believe that the error is because you're actually running this on a string literal.
// This is NOT okay!
char* str = "abmas$sachus#settes";
removeAll(str, 'e');
This code above is (sort of) not legal. The string literal "abmas$sachus#settes" should not be stored as a non-const char*. But for backward compatibility with C where this is allowed (provided you don't attempt to modify the string) this is generally issued as a compiler warning instead of an error.
However, you are really not allowed to modify the string. And your program is crashing the moment you try.
If you were to use the correct approach with a char array (which you can modify), then you have a different problem:
// This will result in a compiler error
char str[] = "abmas$sachus#settes";
removeAll(str, 'e');
Results in
error: invalid initialization of non-const reference of type ‘char*&’ from an rvalue of type ‘char*’
So why is that? Well, your function takes a char*& type that forces the caller to use pointers. It's making a contract that states "I can modify your pointer if I want to", even if it never does.
There are two ways you can fix that error:
The TERRIBLE PLEASE DON'T DO THIS way:
// This compiles and works but it's not cool!
char str[] = "abmas$sachus#settes";
char *pstr = str;
removeAll(pstr, 'e');
The reason I say this is bad is because it sets a dangerous precedent. If the function actually did modify the pointer in a future "optimization", then you might break some code without realizing it.
Imagine that you want to output the string with characters removed later, but the first character was removed and your function decided to modify the pointer to start at the second character instead. Now if you output str, you'll get a different result from using pstr.
And this example is only assuming that you're storing the string in an array. Imagine if you actually allocated a pointer like this:
char *str = new char[strlen("abmas$sachus#settes") + 1];
strcpy(str, "abmas$sachus#settes");
removeAll(str, 'e');
Then if removeAll changes the pointer, you're going to have a BAD time when you later clean up this memory with:
delete[] str; //<-- BOOM!!!
The I ACKNOWLEDGE MY FUNCTION DEFINITION IS BROKEN way:
Real simply, your function definition should take a pointer, not a pointer reference:
void removeAll(char* s, const char ch)
This means you can call it on any modifiable block of memory, including an array. And you can be comforted by the fact that the caller's pointer will never be modified.
Now, the following will work:
// This is now 100% legit!
char str[] = "abmas$sachus#settes";
removeAll(str, 'e');
Now that my free crystal-ball reading is complete, and your problem has gone away, let's address the elephant in the room:
Your code is needlessly inefficient!
You do not need to do the first pass over the string (with strlen) to calculate its length
The inner loop effectively gives your algorithm a worst-case time complexity of O(N^2).
The little tricks modifying len and, worse than that, the loop variable i make your code more complex to read.
What if you could avoid all of these undesirable things!? Well, you can!
Think about what you're doing when removing characters. Essentially, the moment you have removed one character, then you need to start shuffling future characters to the left. But you do not need to shuffle one at a time. If, after some more characters you encounter a second character to remove, then you simply shunt future characters further to the left.
What I'm trying to say is that each character only needs to move once at most.
There is already an answer demonstrating this using pointers, but it comes with no explanation and you are also a beginner, so let's use indices because you understand those.
The first thing to do is get rid of strlen. Remember, your string is null-terminated. All strlen does is search through characters until it finds the null byte (otherwise known as 0 or '\0')...
[Note that real implementations of strlen are super smart (i.e. much more efficient than searching single characters at a time)... but of course, no call to strlen is faster]
All you need is your loop to look for the null terminator, like this:
for(i = 0; s[i] != '\0'; i++)
Okay, and now to ditch the inner loop, you just need to know where to stick each new character. How about just keeping a variable new_size in which you are going to count up how long the final string is.
void removeAll(char* s, char ch)
{
    int new_size = 0;
    for(int i = 0; s[i] != '\0'; i++)
    {
        if(s[i] != ch)
        {
            s[new_size] = s[i];
            new_size++;
        }
    }
    // You must also null-terminate the string
    s[new_size] = '\0';
}
If you look at this for a while, you may notice that it might do pointless "copies". That is, if i == new_size there is no point in copying characters. So, you can add that test if you want. I will say that it's likely to make little performance difference, and potentially reduce performance because of additional branching.
But I'll leave that as an exercise. And if you want to dream about really fast code and just how crazy it gets, then go and look at the source code for strlen in glibc. Prepare to have your mind blown.
You can make the logic simpler and more efficient by writing the function like this:
void removeAll(char * s, const char charToRemove)
{
    const char * readPtr = s;
    char * writePtr = s;
    while (*readPtr) {
        if (*readPtr != charToRemove) {
            *writePtr++ = *readPtr;
        }
        readPtr++;
    }
    *writePtr = '\0';
}
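If a std::string is an option instead of a C-string, the same single-pass compaction is available from the standard library as the erase-remove idiom. This is an alternative technique, not what either answer above wrote, and the function name removeAllFrom is made up for this sketch.

```cpp
#include <algorithm>
#include <string>

// Hypothetical std::string counterpart of removeAll.
void removeAllFrom(std::string& s, char ch) {
    // std::remove shifts the kept characters to the front and returns the new
    // logical end; erase() then drops the leftover tail.
    s.erase(std::remove(s.begin(), s.end(), ch), s.end());
}
```

Like the read/write pointer version, each kept character is moved at most once, and there is no terminator to restore by hand.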

Splitting a std::string into two const char*s resulting in the second const char* overwriting the first

I am taking a line of input which is separated by a space and trying to read the data into two integer variables.
for instance: "0 1" should give child1 == 0, child2 == 1.
The code I'm using is as follows:
int separator = input.find(' ');
const char* child1_str = input.substr(0, separator).c_str(); // Everything is as expected here.
const char* child2_str = input.substr(
separator+1, //Start with the next char after the separator
input.length()-(separator+1) // And work to the end of the input string.
).c_str(); // But now child1_str is showing the same location in memory as child2_str!
int child1 = atoi(child1_str);
int child2 = atoi(child2_str); // and thus are both of these getting assigned the integer '1'.
// do work
What's happening is perplexing me to no end. I'm monitoring the sequence with the Eclipse debugger (gdb). When the function starts, child1_str and child2_str are shown to have different memory locations (as they should). After splitting the string at separator and getting the first value, child1_str holds '0' as expected.
However, the next line, which assigns a value to child2_str not only assigns the correct value to child2_str, but also overwrites child1_str. I don't even mean the character value is overwritten, I mean that the debugger shows child1_str and child2_str to share the same location in memory.
What the what?
1) Yes, I'll be happy to listen to other suggestions to convert a string to an int -- this was how I learned to do it a long time ago, and I've never had a problem with it, so never needed to change, however:
2) Even if there's a better way to perform the conversion, I would still like to know what's going on here! This is my ultimate question. So even if you come up with a better algorithm, the selected answer will be the one that helps me understand why my algorithm fails.
3) Yes, I know that std::string is C++ and const char* is standard C. atoi requires a c string. I'm tagging this as C++ because the input will absolutely be coming as a std::string from the framework I am using.
First, the superior solutions.
In C++11 you can use the newfangled std::stoi function:
int child1 = std::stoi(input.substr(0, separator));
Failing that, you can use boost::lexical_cast:
int child1 = boost::lexical_cast<int>(input.substr(0, separator));
Now, an explanation.
input.substr(0, separator) creates a temporary std::string object that dies at the semicolon. Calling c_str() on that temporary object gives you a pointer that is only valid as long as the temporary lives. This means that, on the next line, the pointer is already invalid. Dereferencing that pointer has undefined behaviour. Then weird things happen, as is often the case with undefined behaviour.
The value returned by c_str() is invalid after the string is destructed. So when you run this line:
const char* child1_str = input.substr(0, separator).c_str();
The substr function returns a temporary string. After the line is run, this temporary string is destructed and the child1_str pointer becomes invalid. Accessing that pointer results in undefined behavior.
What you should do is assign the result of substr to a local std::string variable. Then you can call c_str() on that variable, and the result will be valid until the variable is destructed (at the end of the block).
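A minimal sketch of that fix, keeping atoi as in the question. The wrapper function parseChildren and its std::pair return type are assumptions added for illustration, and the input is assumed to contain a space.

```cpp
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical wrapper around the question's code; assumes `input` looks like "0 1".
std::pair<int, int> parseChildren(const std::string& input) {
    std::string::size_type separator = input.find(' ');

    // Named strings live until the end of the function, so the pointers
    // returned by c_str() below stay valid while atoi reads them.
    std::string child1_str = input.substr(0, separator);
    std::string child2_str = input.substr(separator + 1);

    int child1 = std::atoi(child1_str.c_str());
    int child2 = std::atoi(child2_str.c_str());
    return std::make_pair(child1, child2);
}
```

The only real change from the question is that the substr results are stored in variables instead of being discarded at the end of the statement.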
Others have already pointed out the problem with your current code. Here's how I'd do the conversion:
std::istringstream buffer(input);
buffer >> child1 >> child2;
Much simpler and more straightforward, not to mention considerably more flexible (e.g., it'll continue to work even if the input has a tab or two spaces between the numbers).
input.substr returns a temporary std::string. Since you are not saving it anywhere, it gets destroyed. Anything that happens afterwards depends solely on your luck.
I recommend using an istringstream.
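The istringstream approach can also report whether both numbers were actually read. A small sketch, with a hypothetical function name and out-parameters chosen for illustration:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts two whitespace-separated integers from `input`.
// Returns false if either extraction fails.
bool parseChildrenStream(const std::string& input, int& child1, int& child2) {
    std::istringstream buffer(input);
    // operator>> skips leading whitespace, so tabs or repeated spaces are fine.
    return static_cast<bool>(buffer >> child1 >> child2);
}
```

No temporary strings or raw pointers are involved, so the lifetime problem from the question cannot arise.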

add Null terminator on istream error

I am trying to read a text file that has collections of strings into an array of objects, and am having problems with the input. I get an error that goes to istream here
*_Str = _Elem(); // add terminating null character
I don't really know much about how to use strings in C++, so any help would be appreciated.
my code:
char bird_name[MAX_LINE_LENGTH];
char* description =new char [MAX_LINE_LENGTH];
char* sound=new char [MAX_LINE_LENGTH];
int num_states= 0;
char* states[10];
bool valid = true;
char* state_name = new char [MAX_LINE_LENGTH];
for (int j =0; j<10; j++)
states[j]=new char [MAX_LINE_LENGTH];
char *input_filename = argv[1];
ifstream input(input_filename);
if (!input.is_open())
{
cerr << "Invalid filename: " << input_filename << endl;
system("pause");
return 1;
}
input.getline(bird_name, MAX_LINE_LENGTH);
char* state_num = new char [MAX_LINE_LENGTH];
while (strcmp(bird_name, "END") != 0)
{
input.getline(description, MAX_LINE_LENGTH);
consume_newline(input);
input.getline(sound, MAX_LINE_LENGTH);
consume_newline(input);
input.getline(state_num, MAX_LINE_LENGTH);
num_states = int(state_num);
consume_newline(input);
for (int k = 0; k<num_states; k++)
input.getline(states[k], MAX_LINE_LENGTH);
consume_newline(input);
consume_newline(input);
birds[num_birds++] = new Bird(bird_name, description, sound, num_states, states);
//birds[num_birds]->display();
input.getline(bird_name, MAX_LINE_LENGTH);
}
The offending code you mention, …
*_Str = _Elem(); // add terminating null character
is presumably from some standard library source code file.
Note that in your own code you should not use identifiers starting with underscore followed by uppercase, since they are reserved for the implementation (such as the code above).
The comment indicates that things go awry when the standard lib code has read a complete line of input into the buffer, and is trying to add a terminating null-byte.
That in turn indicates that the buffer is too small, or that the buffer pointer handed to the standard library code, is not even valid.
I am unable to find that in the code that you’re showing. And I suspect that the code you’re showing is not the code where the problem manifests. Please note that for the future: if at all possible, post complete code that you have tested one millisecond ago…
Anyway, it’s not necessary to know exactly where and what goes wrong (in detail) in order to fix things. You can just employ an “Alexandrian solution”. That expression refers to Alexander the Great who, when he could not find any rope end to start untying a really Bad Knot™, just sliced it in two with his sword.
So consider your declaration …
char* description =new char [MAX_LINE_LENGTH];
Now the first obvious thing that is wrong with that, glaring us in the face, is the use of an ALL UPPERCASE identifier. Reserve that for macros. Then it becomes …
char* description =new char [max_line_length];
Second, using a raw pointer, and raw new, is generally just Bad™. So get rid of that. Then it looks like …
char description[max_line_length];
Third, using a raw array like that is often a good solution, but it turns out that this one is being used for a variable length string. And for that usage, it is just Bad™. Instead use an object of some string class, such as the standard library’s std::string:
std::string description;
You need to include the <string> header for that, i.e. #include <string>.
Fourth, this variable is only used inside the loop, so move the declaration inside the loop!
Fifth, with std::string, you need to change the getline call, currently …
input.getline(description, MAX_LINE_LENGTH);
to use the freestanding getline function from the <string> header, namely …
std::getline( input, description );
Sixth, there is no error checking on input operations. You need to add error checking and error handling. Assuming that input is a std::istream, then you can check input.fail(); it’s true if some input operation has failed.
Sevent… Oh there should logically be a seventh point here, since seven is a much more pleasing number than six. However, I have nothing to say that will fit into this seventh point.
Cheers & hth.,

Having issues with execvp()

So here is the bit of my code that's giving me problems:
void childProcessHandler(string command){
int argCounter = 0;
for(int i=0; i!=command.size(); i++)
argCounter+=( command.at(i) == ' ');
char * temp, *token;
char *childArgs[argCounter];
argCounter = 1;
temp = new char [command.size()+1];
strcpy (temp, command.c_str());
token = strtok (temp," ");
childArgs[0] = token;
while (token!=NULL)
{
token = strtok(NULL," ");
childArgs[argCounter] = token;
argCounter++;
}
//delete[] temp; //Should remove token as well?
execvp(childArgs[0], childArgs);
cout<<"PROBLEM!"<<endl;
exit(-1);
}
In the main() method my code gets to a point where it calls fork() (the parent process then waits for the child to exit). Then the child process (process ID == 0, yes?) calls the method childProcessHandler with the user input (command to run + args) as its argument. Then I tokenize the user input and call execvp on it.
Everything compiles and executes. The line after execvp is never reached because execvp only returns when there is an error yes?
The project is to simulate a unix terminal however when I give it the command "date" nothing gets printed like it should... The child exits and the parent process resumes just fine however nothing is sent back up to the terminal window...
What am I doing wrong?
(Also we were "recommended" to use strtok to tokenize it, but if anyone has anything simpler I'm open to opinions.)
THANKS!
EDIT
The above code works, for example, if I type in "date " instead of "date". I think there might be something fishy with the "tokenizer" not putting a null character at the end of the childArgs[] array. I'll play around with that and thanks for the quick responses!
(Ninja edit, also commented out delete[] temp for the time being)
You're mixing std::string and char/char*. Fine, but you have to be careful, they have different behaviours.
In particular this line:
temp = new char [command.size()+1];
Is creating an actual array to hold a string in.
token = strtok (temp," ");
This is making token (which is just a pointer) point to a place inside temp. strtok() modifies the input string to make a temporary string within a string (sounds crazy, I know).
You need to copy the string strtok() gives you into a permanent home. Either use std::string to save you time and code, or do it the char* way and allocate the new string yourself. E.g. instead of:
childArgs[0] = token;
you need:
childArgs[0] = new char[strlen(token)+1];
strcpy(childArgs[0], token);
The same applies to tokens stored in the array during the loop over the command arguments.
Your childargs vector of pointers point into the bytes allocated in the block of memory "temp". When you free temp, you are removing the memory pointed to by the childargs pointers, possibly corrupting some of the values within your vector.
Remove the call to delete[] to stop freeing the memory pointed to by the childargs pointers. You will not be leaking memory. Once you call one of the exec functions, your entire process image is replaced anyway. The only things that survive a call to exec (for the most part) are your file descriptors.
As a test, try something a bit more simple: After your call to fork() in the child, just call exec with the path to "date". Make that work before fiddling with the parameter list vector.
As another test, remove your call to exec, and print out your entire vector of pointers to make sure that your tokenizing is working the way you think it should. Remember that your final entry must be NULL so that you know where the end of the vector is.