Why does memcpy alter the string array? - c++

I have code that goes like this. Could you tell me why it is not behaving as I expect?
/*
* test.cpp
*
* Created on: Dec 6, 2012
* Author: sandeep
*/
#include<iostream>
#include<string.h>
using namespace std;
int main()
{
int i=0;
string s="hello A B:bye A B";
char *input;
input=new char(s.size());
for(i=0;i<=s.size();i++)
input[i]=s[i];
char *tokenized1[2],*tokenized2[3];
tokenized1[0]=strtok(input,":");
tokenized1[1]=strtok(NULL,":");
i=0;
char *lstring;
while(i<2)
{
lstring=new char(strlen(tokenized1[i]));
memcpy(lstring,tokenized1[i],strlen(tokenized1[i])+1);
cout<<tokenized1[0]<<" "<<tokenized1[1]<<endl;
tokenized2[0]=strtok(lstring," ");
tokenized2[1]=strtok(NULL," ");
tokenized2[2]=strtok(NULL," ");
char c=tokenized2[0][0];
cout<<c<<endl;
cout<<tokenized2[0]<<" "<<tokenized2[1]<<" "<<tokenized2[2]<<endl;
i++;
}
}
and the output is this.
hello A B by
h
hello A B
hello A B by
b
by
There are some junk values at the end of the 1st, 4th and 6th lines of output.
Why did tokenized1[1] get altered when I did a memcpy of tokenized1[0], and how can I solve this?

There's a couple of bugs in the following new call. You need to use square brackets; also, the argument is off by one.
lstring=new char[strlen(tokenized1[i]) + 1];
Without the square brackets, you are allocating space for one character (initialized to the value in the parentheses). As a result, the memcpy() writes past the allocated memory.
edit: I just noticed the other new, which will also need to be fixed:
input=new char[s.size() + 1];
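Finally, the last iteration of this loop reads s[s.size()], one past the last character. Since C++11 that is well-defined and yields '\0', but on a non-const string before C++11 it is undefined behaviour: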
for(i=0;i<=s.size();i++)
input[i]=s[i];
There could well be other bugs, not to mention memory leaks...
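As a minimal sketch of the corrected allocation, the copy from std::string into a char buffer could look like the following. The helper name copyToBuffer is made up for illustration and is not part of the original program.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: returns a heap-allocated, null-terminated copy of s.
// The caller owns the buffer and must release it with delete[].
char* copyToBuffer(const std::string& s)
{
    // Square brackets allocate an array; "+ 1" leaves room for the '\0'.
    // new char(n) would instead allocate ONE char initialized to n.
    char* buf = new char[s.size() + 1];
    // c_str() is null-terminated, so copying size() + 1 bytes
    // also copies the terminator.
    std::memcpy(buf, s.c_str(), s.size() + 1);
    return buf;
}
```

The returned buffer can be handed to strtok, since strtok needs a writable, null-terminated array. Every buffer obtained this way has to be freed with delete[] (not delete) once it is no longer needed.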

You don't appear to be zero terminating 'input'

In addition to what NPE said, there's a couple of other small things:
char *input;
input=new char(s.size());
This might have something to do with it: new char(s.size()) allocates a single character, initialized to the value s.size(). You then write that one character, and overwrite other memory that is used for who-knows-what. Try this instead:
char *input = new char[s.size() + 1];
Another issue is your loop, immediately below that:
for(i=0;i<=s.size();i++)
input[i]=s[i];
Using std::string::operator[] with an offset equal to s.size() is only guaranteed to work since C++11 (where it yields '\0'); at least on my system it fails, and I'm pretty sure it may fail on yours as well. Be safe rather than sorry, and recode your loop like this:
for(i = 0; i < s.size(); i++)
    input[i] = s[i];
input[i] = 0;
I hope this helps.
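If the raw buffers are not a requirement, a sketch like the one below avoids strtok and manual allocation entirely by staying with std::string. The helper split is hypothetical, and note one behavioural difference: unlike strtok, it keeps empty tokens between adjacent delimiters.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text at every occurrence of delim.
// Empty pieces (e.g. between two adjacent delimiters) are kept.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type end = text.find(delim, start);
        if (end == std::string::npos) {
            // No more delimiters: the rest of the text is the last piece.
            parts.push_back(text.substr(start));
            break;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1; // continue after the delimiter
    }
    return parts;
}
```

With the string from the question, split(s, ':') would give the two halves, and calling split on each half with ' ' would give the individual words.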

Related

Is it valid to append a string to a character array like char p [] = "TEST" using strcat

#include<iostream>
#include<string.h>
using namespace std;
int main()
{
char p [] = "TEST";
strcat (p, "VAL");
cout << p;
return 0;
}
If I understand correctly, a statement like char p [] = "TEST"; allocates space on the stack. When I call strcat() on such a string, how is the storage for p[] adjusted to accommodate the extra characters?
The last cout prints "TESTVAL". Is it valid to call strcat like this? If yes, how does this work? I might have a problem with my understanding, and I feel like I have lost touch, so this could easily be a dumb question. Please shed some light.
The storage is not adjusted, the call is not valid, and the behaviour of the code is undefined.
When you write
char buffer[] = "some literal";
it is expanded to
char buffer[sizeof("some literal")] = "some literal";
which is exactly large enough to store "some literal" (including its null terminator) and nothing more.
When you concatenate another string onto the end of the buffer, you write beyond the boundaries of the array, which is undefined behaviour.
Another point: in C++ we usually use std::string to handle strings, which does all the memory adjustment for us automatically.
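To illustrate the two safe alternatives, here is a small sketch. Both function names are invented for this example; the first relies on std::string growing its own storage, the second sizes the array explicitly so that strcat has room.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: std::string owns its storage and grows it on demand.
std::string appendWithString(const std::string& base)
{
    std::string result = base;
    result += "VAL";
    return result;
}

// Hypothetical helper: a C array works too, if it is sized for the result.
std::string appendWithCArray()
{
    // 4 characters of "TEST" + 3 of "VAL" + 1 null terminator = 8 bytes.
    char p[8] = "TEST";
    std::strcat(p, "VAL");
    return p; // copied into a std::string before p goes out of scope
}
```

The array version only stays valid as long as the size is kept in sync with everything that gets appended, which is the bookkeeping std::string does for you.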
p reserves space for 5 characters (4 + 1 for the null terminator). You are then appending 3 more characters, which needs room for 8 (7 + 1 for the null). You don't have enough room for that and will be overwriting the stack. Depending on your compiler and build settings, you may not see any difference, because the compiler may leave padding between stack variables. On an optimised release build, you will probably get a crash.
If you change your code to look like this, you may see that sentinel1 or sentinel2 is no longer 0 (which one gets trashed depends on the compiler).
#include<iostream>
#include<string.h>
using namespace std;
int main()
{
    int sentinel1 = 0;
    char p [] = "TEST";
    int sentinel2 = 0;
    strcat (p, "VAL");
    cout << p << sentinel1 << sentinel2;
    return 0;
}

Segfaults on appending char* arrays

I'm making a lexical analyzer and this is a function out of the whole thing. This function takes as argument a char, c, and appends this char to the end of an already defined char* array (yytext). It then increments the length of the text (yylen).
I keep getting segfaults on the shown line when it enters this function. What am I doing wrong here? Thanks.
BTW: can't use the strncpy/strcat, etc. (although if you want you can show me that implementation too)
This is my code:
extern char *yytext;
extern int *yylen;
void consume(char c){
int s = *yylen + 1; //gets yylen (length of yytext) and adds 1
//now seg faults here
char* newArray = new char[s];
for (int i = 0;i < s - 1;i++){
newArray[i] = yytext[i]; //copy all chars from existing yytext into newArray
}
newArray[s-1] = c; //append c to the end of newArray
for (int i = 0;i < s;i++){ //copy all chars + c back to yytext
yytext[i] = newArray[i];
}
yylen++;
}
You have
extern int *yylen;
but try to use it like so:
int s = (int)yylen + 1;
If the variable is an int *, use it like an int * and dereference to get the int. If it is supposed to be an int, then declare it as such.
That can't work:
int s = (int)yylen + 1; //gets yylen (length of yytext) and adds 1
char newArray[s];
Use malloc or a big enough buffer:
char * newarray=(char*)(malloc(s));
Every C-style string should be null-terminated. From your description it seems you need to append the character c. So, you need 2 extra locations (one for the appended character and the other for the null terminator).
Next, yylen is of type int *. You need to dereference it to get the length (assuming it points to a valid memory location). So, try:
int s = *yylen + 2;
I don't see the need of temporary array but there might be a reason why you are doing it. Now,
yytext[i] = newArray[i]; //seg faults here
You have to check whether yytext points to a valid, writable memory location and, if so, whether it is long enough to hold the appended character plus the null terminator.
But I would recommend using std::string rather than working with character arrays. With it, solving the problem is a one-liner.
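A sketch of that one-liner, assuming the lexer is free to keep its text in a std::string instead of the char*/int* pair from the question. The two-argument signature is an assumption made for this example, not the asker's interface.

```cpp
#include <string>

// Hypothetical replacement for the question's consume():
// text plays the role of yytext, and text.size() replaces yylen.
void consume(std::string& text, char c)
{
    text.push_back(c); // grows the storage if needed, no manual copying
}
```

If a C-style string is needed later, text.c_str() provides a null-terminated view of the same characters.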

Junk after C++ string when returned

I've just finished C++ The Complete Reference and I'm creating a few test classes to learn the language better. The first class I've made mimics the Java StringBuilder class and the method that returns the string is as follows:
char *copy = new char[index];
register int i;
for(i = 0; i <= index; i++) {
*(copy + i) = *(stringArray + i);
} //f
return copy;
stringArray is the array that holds the string being built; index is the number of characters that have been entered.
When the string is returned there is some junk after it: if the string created is abcd, the result is abcd with 10 random characters after it. Where is this junk coming from? If you need to see more of the code, please ask.
You need to null-terminate the string. That null character tells the computer where the string ends.
char * copy = new char[ length + 1];
for(int i = 0; i < length; ++i) copy[i] = stringArray[i];
copy[length] = 0; //null terminate it
Just a few things. Declare the int variable in the tightest scope possible. It is good practice: the enclosing scope isn't cluttered with names it doesn't need, and the variable is easier to keep track of when debugging. And drop the 'register' keyword; let the compiler determine what needs to be optimized. The keyword is only a hint (it was deprecated in C++11 and removed in C++17), so unless your code is really tight on performance, ignore stuff like that for now.
Does index contain the length of the string you're copying from including the terminating null character? If it doesn't then that's your problem right there.
If stringArray isn't null-terminated, which can be fine under some circumstances, you need to ensure that you append the null terminator to the string you return; otherwise you don't have a valid C string and, as you already noticed, you get a "bunch of junk characters" after it. Printing such a string reads past the end of the buffer, so it's not quite as harmless as it seems.
You'll have to amend your code as follows:
char *copy = new char[index + 1];
And after the copy loop, you need to add the following line of code to add the null terminator:
copy[index] = '\0';
In general I would recommend copying the string out of stringArray with strncpy() instead of hand-rolling the loop; in most cases strncpy is optimized by the library vendor for maximum performance. You'll still have to ensure that the resulting string is null-terminated, though.
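Put together, the fix could be sketched as below. The free function toCString and its parameters are hypothetical stand-ins for the class method in the question; memcpy is used here as one library-provided way to copy a known number of characters.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical free-function version of the question's method.
// source holds length characters and need not be null-terminated.
// The caller owns the result and must release it with delete[].
char* toCString(const char* source, std::size_t length)
{
    char* copy = new char[length + 1]; // one extra byte for the terminator
    std::memcpy(copy, source, length); // copies exactly length characters
    copy[length] = '\0';               // without this, readers run off the end
    return copy;
}
```

Note that the loop in the question also ran with i <= index, which wrote one element past a buffer of size index; allocating length + 1 bytes and copying exactly length characters avoids both problems.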

strcat error "Unhandled exception.."

My goal with my constructor is to:
open a file
read into everything that exists between a particular string ("%%%%%")
put together each read row to a variable (history)
add the final variable to a double pointer of type char (_stories)
close the file.
However, the program crashes when I'm using strcat. But I can't understand why, I have tried for many hours without result. :/
Here is the constructor code:
Texthandler::Texthandler(string fileName, int number)
: _fileName(fileName), _number(number)
{
char* history = new char[50];
_stories = new char*[_number + 1]; // rows
for (int j = 0; j < _number + 1; j++)
{
_stories[j] = new char [50];
}
_readBuf = new char[10000];
ifstream file;
int controlIndex = 0, whileIndex = 0, charCounter = 0;
_storieIndex = 0;
file.open("Historier.txt"); // filename
while (file.getline(_readBuf, 10000))
{
// The "%%%%%" shouldnt be added to my variables
if (strcmp(_readBuf, "%%%%%") == 0)
{
controlIndex++;
if (controlIndex < 2)
{
continue;
}
}
if (controlIndex == 1)
{
// Concatenate every line (_readBuf) to a complete history
strcat(history, _readBuf);
whileIndex++;
}
if (controlIndex == 2)
{
strcpy(_stories[_storieIndex], history);
_storieIndex++;
controlIndex = 1;
whileIndex = 0;
// Reset history variable
history = new char[50];
}
}
file.close();
}
I have also tried with stringstream without results..
Edit: Forgot to post the error message:
"Unhandled exception at 0x6b6dd2e9 (msvcr100d.dll) in Step3_1.exe: 0xC00000005: Access violation writing location 0c20202d20."
Then a file named "strcat.asm" opens..
Best regards
Robert
You've had a buffer overflow somewhere, as evidenced by the fact that one of your pointers is 0c20202d20 (a few spaces and a - sign).
It's probably because:
char* history = new char[50];
is not big enough for what you're trying to put in there (or it's otherwise not set up correctly as a C string, terminated with a \0 character).
I'm not entirely certain why you think multiple buffers of up to 10K each can be concatenated into a 50-byte string :-)
strcat operates on null terminated char arrays. In the line
strcat(history, _readBuf);
history is uninitialised so isn't guaranteed to have a null terminator. Your program may read beyond the memory allocated looking for a '\0' byte and will try to copy _readBuf at this point. Writing beyond the memory allocated for history invokes undefined behaviour and a crash is very possible.
Even if you added a null terminator, the history buffer is much shorter than _readBuf. This makes memory over-writes very likely - you need to make history at least as big as _readBuf.
Alternatively, since this is C++, why don't you use std::string instead of C-style char arrays?
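As a rough sketch of what the std::string route might look like, the parsing loop can be written as a free function. The name readStories and the choice to take a std::istream& are assumptions for this example; the marker handling follows the question's logic, where each "%%%%%" line closes the current story and opens the next.

```cpp
#include <istream>
#include <sstream> // std::istringstream, handy for trying the function out
#include <string>
#include <vector>

// Hypothetical free-function version of the constructor's parsing loop.
// Lines between consecutive "%%%%%" markers are joined into one story.
std::vector<std::string> readStories(std::istream& in)
{
    std::vector<std::string> stories;
    std::string line;
    std::string current;
    bool inside = false; // becomes true once the first marker has been seen
    while (std::getline(in, line)) {
        if (line == "%%%%%") {
            if (inside) {
                stories.push_back(current); // marker closes the story
                current.clear();
            }
            inside = true;
            continue; // the marker itself is never stored
        }
        if (inside) {
            current += line; // std::string grows as needed
        }
    }
    return stories;
}
```

Because it accepts any std::istream, the function works with a std::ifstream opened on the real file as well as with a std::istringstream holding test data. As in the original loop, text after the last marker is discarded.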

Please suggest what is wrong with this string reversal function?

This code is compiling clean. But when I run this, it gives exception "Access violation writing location" at line 9.
void reverse(char *word)
{
int len = strlen(word);
len = len-1;
char * temp= word;
int i =0;
while (len >=0)
{
word[i] = temp[len]; //line9
++i;--len;
}
word[i] = '\0';
}
Have you stepped through this code in a debugger?
If not, what happens when i (increasing from 0) passes len (decreasing towards 0)?
Note that your two pointers word and temp have the same value - they are pointing to the same string.
Be careful: not all strings in a C++ program are writable. Even if your code is good it can still crash when someone calls it with a string literal.
When len gets to 0, you access the location before the start of the string (temp[0-1]).
Try this:
void reverse(char *word)
{
    size_t len = strlen(word);
    size_t i;
    for (i = 0; i < len / 2; i++)
    {
        char temp = word[i];
        word[i] = word[len - i - 1];
        word[len - i - 1] = temp;
    }
}
The function looks like it would not crash, but it won't work correctly and it will read from word[-1], which is not likely to cause a crash, but it is a problem. Your crashing problem is probably that you passed in a string literal that the compiler had put into a read-only data segment.
Something like this would crash on many operating systems.
char * word = "test";
reverse(word); // this will crash if "test" isn't in writable memory
There are also several problems with your algorithm. You have len = len-1 and later temp[len-1], which means that the last character will never be read, and when len==0 you will be reading from the character before the start of the word. Also, temp and word are both pointers, so they both point to the same memory; I think you meant to make a copy of word rather than just a copy of the pointer to word. You can make a copy of word with strdup. If you do that, and fix your off-by-one problem with len, then your function should work.
But that still won't fix the write crash, which is caused by code that you have not shown us.
Oh, and if you do use strdup be sure to call free to free temp before you leave the function.
Well, for one, when len == 0 len-1 will be a negative number. And that's pretty illegal. Second, it's quite possible that your pointer is pointing at an unreserved area of memory.
If you called that function as follows:
reverse("this is a test");
then at least one compiler will pass in a read-only string, due to backwards compatibility with C, where you can pass string literals as non-const char*.
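A minimal sketch of a reversal that avoids both issues discussed above, assuming the caller supplies a writable buffer. The name reverseInPlace is invented for this example; std::reverse does the swapping.

```cpp
#include <algorithm>
#include <cstring>

// Hypothetical in-place reversal of a writable, null-terminated buffer.
// Passing a string literal here is still an error on the caller's side.
void reverseInPlace(char* word)
{
    // Reverses the range [word, word + length); the terminator stays put.
    std::reverse(word, word + std::strlen(word));
}
```

To call it safely, initialize an array from the literal, for example char text[] = "this is a test";, which copies the characters into writable storage, and pass text instead of the literal itself.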