Strange behavior with function strncpy? - c++

In my project,I have met these strange problem with strncpy. I have checked the reference. But the function strncpy behavior make me confused.
In this function, when it runs to strncpy(subs,target,term_len);
While I don't know why there is two blanks after the string?!!! It is a big project, I cannot paste all the code here. Following is just a piece. All my code is here.
char* subs = new char[len];
while(top<=bottom){
char* term = m_strTermTable[bottom].strterm;
int term_len = strlen(term);
memset(subs,'\0',len);
strncpy(subs,target,term_len);
int subs_len = strlen(subs);
int re = strcmp(subs,term);
if (re == 0)
{
return term_len;
}
bottom--;
}
delete[] subs;

strncpy does not add a terminating null byte if the source string is longer than the maximum number of characters (i.e. in your case, that would be if strlen(target) > term_len holds). If that happens, subs may or may not be null terminated correctly.
Try changing your strncpy call to
strncpy(subs, target, term_len-1);
so that even if strncpy doesn't add a terminating null byte, subs will still be null-terminated correctly due to the previous memset call.
Now, that being said - you could avoid using a separate subs buffer altogether (which leaks anyway in case the control flow gets to the return statement) by just using strncmp as in
while(top<=bottom) {
char* term = m_strTermTable[bottom].strterm;
int term_len = strlen(term);
if (strncmp(term, target, term_len) == 0) {
return term_len;
}
bottom--;
}

Related

How to check the contents of a LPTSTR string?

I'm trying to understand why a segmentation fault (SIGSEGV) occurs during the execution of this piece of code. This error occurs when testing the condition specified in the while instruction, but it does not occur at the first iteration, but at the second iteration.
LPTSTR arrayStr[STR_COUNT];
LPTSTR inputStr;
LPTSTR str;
// calls a function from external library
// in order to set the inputStr string
set_input_str(param1, (char*)&inputStr, param3);
str = inputStr;
while( *str != '\0' )
{
if( debug )
printf("String[%d]: %s\n", i, (char*)str);
arrayStr[i] = str;
str = str + strlen((char*)str) + 1;
i++;
}
After reading this answer, I have done some research on the internet and found this article, so I tried to modify the above code, using this piece of code read in this article (see below). However, this change did not solve the problem.
for (LPTSTR pszz = pszzStart; *pszz; pszz += lstrlen(pszz) + 1) {
... do something with pszz ...
}
As assumed in this answer, it seems that the code expects double null terminated arrays of string. Therefore, I wonder how I could check the contents of the inputStr string, in order to check if it actually contains only one null terminator char.
NOTE: the number of characters in the string printed from printf instruction is twice the value returned by the lstrlen(str) function call at the first iteration.
OK, now that you've included the rest of the code it is clear that it is indeed meant to parse a set of consecutive strings. The problem is that you're mixing narrow and wide string types. All you need to do to fix it is change the variable definitions (and remove the casts):
char *arrayStr[STR_COUNT];
char *inputStr;
char *str;
// calls a function from external library
// in order to set the inputStr string
set_input_str(param1, &inputStr, param3);
str = inputStr;
while( *str != '\0' )
{
if( debug )
printf("String[%d]: %s\n", i, str);
arrayStr[i] = str;
str = str + strlen(str) + 1;
i++;
}
Specifically, the issue was occurring on this line:
while( *str != '\0' )
since you hadn't cast str to char * the comparison was looking for a wide nul rather than a narrow nul.
str = str + strlen(str) + 1;
You go out of bounds, change to
str = str + 1;
or simply:
str++;
Of course you are inconsistently using TSTR and strlen, the latter assuming TCHAR = char
In any case, strlen returns the length of the string, which is the number of characters it contains not including the nul character.
Your arithmetic is out by one but you know you have to add one to the length of the string when you allocate the buffer.
Here however you are starting at position 0 and adding the length which means you are at position len which is the length of the string. Now the string runs from offset 0 to offset len - 1 and offset len holds the null character. Offset len + 1 is out of bounds.
Sometimes you might get away with reading it, if there is extra padding, but it is undefined behaviour and here you got a segfault.
This looks to me like code that expects double null terminated arrays of strings. I suspect that you are passing a single null terminated string.
So you are using something like this:
const char* inputStr = "blah";
but the code expects two null terminators. Such as:
const char* inputStr = "blah\0";
or perhaps an input value with multiple strings:
const char* inputStr = "foo\0bar\0";
Note that these final two strings are indeed double null terminated. Although only one null terminator is written explicitly at the end of the string, the compiler adds another one implicitly.
Your question edit throws a new spanner in the works? The cast in
strlen((char*)str)
is massively dubious. If you need to cast then the cast must be wrong. One wonders what LPTSTR expands to for you. Presumably it expands to wchar_t* since you added that cast to make the code compile. And if so, then the cast does no good. You are lying to the compiler (str is not char*) and lying to the compiler never ends well.
The reason for the segmentation fault is already given by Alter's answer. However, I'd like to add that the usual style of parsing a C-style string is more elegant and less verbose
while (char ch = *str++)
{
// other instructions
// ...
}
The scope of ch is only within in the body of the loop.
Aside: Either tag the question as C or C++ but not both, they're different languages.

Junk after C++ string when returned

I've just finished C++ The Complete Reference and I'm creating a few test classes to learn the language better. The first class I've made mimics the Java StringBuilder class and the method that returns the string is as follows:
char *copy = new char[index];
register int i;
for(i = 0; i <= index; i++) {
*(copy + i) = *(stringArray + i);
} //f
return copy;
stringArray is the array that holds the string that is being built, index represents the amount of characters that have been entered.
When the string returns there is some junk after it, such as if the string created is abcd the result is abcd with 10 random characters after it. Where is this junk coming from? If you need to see more of the code please ask.
You need to null terminate the string. That null character tells the computer when when string ends.
char * copy = new char[ length + 1];
for(int i = 0; i < length; ++i) copy[i] = stringArray[i];
copy[length] = 0; //null terminate it
Just a few things. Declare the int variable in the tighest scope possible for good practice. It is good practice so that unneeded scope wont' be populate, also easier on debugging and kepping track. And drop the 'register' keyword, let the compiler determine what needs to be optimized. Although the register keyword just hints, unless your code is really tight on performance, ignore stuff like that for now.
Does index contain the length of the string you're copying from including the terminating null character? If it doesn't then that's your problem right there.
If stringArrary isn't null-terminated - which can be fine under some circumstances - you need to ensure that you append the null terminator to the string you return, otherwise you don't have a valid C string and as you already noticed, you get a "bunch of junk characters" after it. That's actually a buffer overflow, so it's not quite as harmless as it seems.
You'll have to amend your code as follows:
char *copy = new char[index + 1];
And after the copy loop, you need to add the following line of code to add the null terminator:
copy[index] = '\0';
In general I would recommend to copy the string out of stringArray using strncpy() instead of hand rolling the loop - in most cases strncpy is optimized by the library vendor for maximum performance. You'll still have to ensure that the resulting string is null terminated, though.

strcat error "Unhandled exception.."

My goal with my constructor is to:
open a file
read into everything that exists between a particular string ("%%%%%")
put together each read row to a variable (history)
add the final variable to a double pointer of type char (_stories)
close the file.
However, the program crashes when I'm using strcat. But I can't understand why, I have tried for many hours without result. :/
Here is the constructor code:
Texthandler::Texthandler(string fileName, int number)
: _fileName(fileName), _number(number)
{
char* history = new char[50];
_stories = new char*[_number + 1]; // rows
for (int j = 0; j < _number + 1; j++)
{
_stories[j] = new char [50];
}
_readBuf = new char[10000];
ifstream file;
int controlIndex = 0, whileIndex = 0, charCounter = 0;
_storieIndex = 0;
file.open("Historier.txt"); // filename
while (file.getline(_readBuf, 10000))
{
// The "%%%%%" shouldnt be added to my variables
if (strcmp(_readBuf, "%%%%%") == 0)
{
controlIndex++;
if (controlIndex < 2)
{
continue;
}
}
if (controlIndex == 1)
{
// Concatenate every line (_readBuf) to a complete history
strcat(history, _readBuf);
whileIndex++;
}
if (controlIndex == 2)
{
strcpy(_stories[_storieIndex], history);
_storieIndex++;
controlIndex = 1;
whileIndex = 0;
// Reset history variable
history = new char[50];
}
}
file.close();
}
I have also tried with stringstream without results..
Edit: Forgot to post the error message:
"Unhandled exception at 0x6b6dd2e9 (msvcr100d.dll) in Step3_1.exe: 0xC00000005: Access violation writing location 0c20202d20."
Then a file named "strcat.asm" opens..
Best regards
Robert
You've had a buffer overflow somewhere on the stack, as evidenced by the fact one of your pointers is 0c20202d20 (a few spaces and a - sign).
It's probably because:
char* history = new char[50];
is not big enough for what you're trying to put in there (or it's otherwise not set up correctly as a C string, terminated with a \0 character).
I'm not entirely certain why you think multiple buffers of up to 10K each can be concatenated into a 50-byte string :-)
strcat operates on null terminated char arrays. In the line
strcat(history, _readBuf);
history is uninitialised so isn't guaranteed to have a null terminator. Your program may read beyond the memory allocated looking for a '\0' byte and will try to copy _readBuf at this point. Writing beyond the memory allocated for history invokes undefined behaviour and a crash is very possible.
Even if you added a null terminator, the history buffer is much shorter than _readBuf. This makes memory over-writes very likely - you need to make history at least as big as _readBuf.
Alternatively, since this is C++, why don't you use std::string instead of C-style char arrays?

Please suggest what is wrong with this string reversal function?

This code is compiling clean. But when I run this, it gives exception "Access violation writing location" at line 9.
void reverse(char *word)
{
int len = strlen(word);
len = len-1;
char * temp= word;
int i =0;
while (len >=0)
{
word[i] = temp[len]; //line9
++i;--len;
}
word[i] = '\0';
}
Have you stepped through this code in a debugger?
If not, what happens when i (increasing from 0) passes len (decreasing towards 0)?
Note that your two pointers word and temp have the same value - they are pointing to the same string.
Be careful: not all strings in a C++ program are writable. Even if your code is good it can still crash when someone calls it with a string literal.
When len gets to 0, you access the location before the start of the string (temp[0-1]).
Try this:
void reverse(char *word)
{
size_t len = strlen(word);
size_t i;
for (i = 0; i < len / 2; i++)
{
char temp = word[i];
word[i] = word[len - i - 1];
word[len - i - 1] = temp;
}
}
The function looks like it would not crash, but it won't work correctly and it will read from word[-1], which is not likely to cause a crash, but it is a problem. Your crashing problem is probably that you passed in a string literal that the compiler had put into a read-only data segment.
Something like this would crash on many operating systems.
char * word = "test";
reverse(word); // this will crash if "test" isn't in writable memory
There are also several problems with your algorithm. You have len = len-1 and later temp[len-1] which means that the last character will never be read, and when len==0, you will be reading from the first character before the word. Also, temp and word are both pointers, so they both point to the same memory, I think you meant to make a copy of word rather than just a copy of the pointer to word. You can make a copy of word with strdup. If you do that, and fix your off-by-one problem with len, then your function should work,
But that still won't fix the write crash, which is caused by code that you have not shown us.
Oh, and if you do use strdup be sure to call free to free temp before you leave the function.
Well, for one, when len == 0 len-1 will be a negative number. And that's pretty illegal. Second, it's quite possible that your pointer is pointing at an unreserved area of memory.
If you called that function as followed:
reverse("this is a test");
then with at least one compiler will pass in a read only string due to backwards compatibility with C where you can
pass string literals as non-const char*.

Issue with char[] in VS2008 - why does strcat append to the end of an empty array?

I am passing an empty char array that I need to recursively fill using strcat(). However, in the VS debugger, the array is not empty, it's full of some weird junk characters that I don't recognise. strcat() then appends to the end of these junk characters rather than at the front of the array.
I have also tried encoded[0] = '\0' to clear the junk before passing the array, but then strcat() doesn't append anything on the recursive call.
This is the code that supplies the array and calls the recursive function:
char encoded[512];
text_to_binary("Some text", encoded);
This is the recursive function:
void text_to_binary(const char* str, char* encoded)
{
char bintemp[9];
bintemp[0] = '\0';
while(*str != '\0')
{
ascii_to_binary(*str, bintemp);
strcat(encoded, bintemp);
str++;
text_to_binary(str, encoded);
}
}
What is going on?
ps. I can't use std::string - I am stuck with the char*.
Edit: This is the junk character in the array:
ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ...
You are not initialising the array. Change:
char encoded[512];
to
char encoded[512] = "";
strcat appends to the end of the string, the end is marked by a \0, it then appends a \0 to the new end position.
You should clear the destination encoded with either encoded[0]=0; or memset first.
char encoded[512];.. encoded is not initialized and will contain junk (or 0xCCCCCCCC in debug builds).
Your problem was due to encode initialization I think. A few comment on your program:
it's better to avoid recursive
function when you can do it with a
loop.
Second you should add the size of
encoded to avoid possible overflow
error (in the case the size of string
is bigger than encoded).
void text_to_binary(const char* str, char* encoded)
{
char bintemp[9];
bintemp[0] = '\0';
encode[0] = '\0';
for(const char *i = str; i!='\0'; i++)
{
ascii_to_binary(*i, bintemp);
strcat(encoded, bintemp);
}
}
PS: i didn't tried the source code, so if there is an error add a comment and I will correct it.
Good contination on your project.
The solution to your immediate problem has been posted already, but your text_to_binary is still inefficient. You are essentially calling strcat in a loop with always the same string to concatenate to, and strcat needs to iterate through the string to find its end. This makes your algorithm quadratic. What you should do is to keep track of the end of encoded on your own and put the content of bintemp directly there. A better way to write the loop would be
while(*str != '\0')
{
ascii_to_binary(*str, bintemp);
strcpy(encoded, bintemp);
encoded += strlen(bintemp);
str++;
}
You don't need the recursion because you are already looping over str (I believe this to be correct, as your original code will fill encoded pretty weirdly). Also, in the modified version, encoded is always pointing to the end of the original encoded string, so you can just use strcpy instead of strcat.
You didn't attached source of ascii_to_binary, let's assume that it will fill buffer with hex dump of the char (if this is the case it's easier to use sprintf(encoded+(i2),"%2x",*(str+i));
What's the point of recursively calling text_to_binary? I think this might be a problem.