Null Pointer issue using string::iterator in Visual Studio 2005 - c++

I am working with some legacy code. The legacy code works in production mode in the following scenario. I'm trying to build a command line version of the legacy code for testing purposes. I suspect there is an environmental setting issue at work here, but I'm relatively new to C++ and Visual Studio (long time eclipse/java guy).
This code is attempting to read in a string from a stream. It reads in a short, which in my debug scenario has a value of 11. Then, it is supposed to read in 11 chars. But this code craps out on the first char. Specifically, in the read method below, ptr is null, and so the fread call is throwing an exception. Why is ptr NULL?
Point of clarification, ptr becomes null between the operator>>(string) and operator>>(char) calls.
Mystream& Mystream::operator>>( string& str )
{
string::iterator it;
short length;
*this >> length;
if( length >= 0 )
{
str.resize( length );
for ( it = str.begin(); it != str.end(); ++it )
{
*this >> *it;
}
}
return *this;
}
The method for reading the short is here and looking at the file buffer etc. this looks like it is working properly.
Mystream& Mystream::operator>>(short& n )
{
read( ( char* )&n, sizeof( n ) );
SwapBytes( *this, ( char* )&n, sizeof( n ) );
return *this;
}
Now, the method for reading in a char is here:
Mystream& Mystream::operator>>(char& n )
{
read( ( char* )&n, sizeof( n ) );
return *this;
}
and the read method is:
Mystream& Mystream::read( char* ptr, int n )
{
fread( (void*)ptr, (size_t)1, (size_t)n, fp );
return *this;
}
One thing I don't understand, in the string input method, the *it is a char right? So why does the operator>>(char &n) method get dispatched on that line? In the debugger, it looks like the *it is a 0, (although a colleague tells me he doesn't trust the 2005 debugger on such things) and thus, it looks like the &n is treated as a null pointer and so the read method is throwing an exception.
Any insights you can provide would be most helpful!
Thanks
John
ps. For the curious, Swap Bytes looks like this:
inline void SwapBytes( Mystream& bfs, char * ptr, int nbyte, int nelem = 1)
{
// do we need to swap bytes?
if( bfs.byteOrder() != SYSBYTEORDER )
DoSwapBytesReally( bfs, ptr, nbyte, nelem );
}
And DoSwapBytesReally looks like:
void DoSwapBytesReally( Mystream& bfs, char * ptr, int nbyte, int nelem )
{
// if the byte order of the file
// does not match the system byte order
// then the bytes should be swapped
int i, n;
char temp;
#ifndef _DOSPOINTERS_
char *ptr1, *ptr2;
#else _DOSPOINTERS_
char huge *ptr1, huge *ptr2;
#endif _DOSPOINTERS_
int nbyte2;
nbyte2 = nbyte/2;
for ( n = 0; n < nelem; n++ )
{
ptr1 = ptr;
ptr2 = ptr1 + nbyte - 1;
for ( i = 0; i < nbyte2; i++ )
{
temp = *ptr1;
*ptr1++ = *ptr2;
*ptr2-- = temp;
}
ptr += nbyte;
}
}

I'd throw out this mess and start over. Extrapolating from the code, if what you had actually worked, it would be roughly equivalent to something like this:
MyStream::operator>>(string &s) {
short size;
fread((void *)&size, sizeof(size), 1, fP);
size = ntohs(size); // oops: after reading edited question, this is really wrong.
s.resize(size);
fread((void *)&s[0], 1, size, fp);
return *this;
}
In this case, delegating most of the work to other functions doesn't seem to have gained much -- this does the work more directly, but still isn't significantly longer or more complex than the original (if anything, I'd say rather the opposite).

I found a gray beard in the company who could explain what's going on to me. (I had already spoken to 2 old timers so I figured I had covered the old timer avenue of attack.) The code above is not ANSI compliant STL code. In Visual Studio 2005, Microsoft first introduced STL and there were issues. In particular older code that used to work would now fail in 2005 (I think 64bit mode may play a role in this as well.) Because of this, code will not work in debug mode (but it will work in release mode). One partial article is located here.
http://msdn.microsoft.com/en-us/library/aa985982%28v=vs.80%29.aspx
The particular issue I saw has to do with the line: it = str.begin() in the first method in the question. str is an empty string. So str.begin() is technically not defined. Visual Studio treats this situation differently between debug and release modes. (Can't do this in debug, you can do it in release.)
Bottom line, the gray beard suggested rewrite was exactly Jerry's. Ironically, the gray beard had fixed this problem in several files, but neglected to check it into the mainline. Uh oh. That scares the &#$!! out of me.

Related

Does this Microsoft CFileDialog example lead to a potential memory violation

I've been experiencing a number of random crashes using the MFC CFileDialog class so I had a look at their example code from this page which reads as follows;
#define MAX_CFileDialog_FILE_COUNT 99
#define FILE_LIST_BUFFER_SIZE ((MAX_CFileDialog_FILE_COUNT * (MAX_PATH + 1)) + 1)
CString fileName;
wchar_t* p = fileName.GetBuffer( FILE_LIST_BUFFER_SIZE );
CFileDialog dlgFile(TRUE);
OPENFILENAME& ofn = dlgFile.GetOFN( );
ofn.Flags |= OFN_ALLOWMULTISELECT;
ofn.lpstrFile = p;
ofn.nMaxFile = FILE_LIST_BUFFER_SIZE;
dlgFile.DoModal();
fileName.ReleaseBuffer();
wchar_t* pBufEnd = p + FILE_LIST_BUFFER_SIZE - 2;
wchar_t* start = p;
while( ( p < pBufEnd ) && ( *p ) )
p++;
if( p > start )
{
_tprintf(_T("Path to folder where files were selected: %s\r\n\r\n"), start );
p++;
int fileCount = 1;
while( ( p < pBufEnd ) && ( *p ) )
{
start = p;
while( ( p < pBufEnd ) && ( *p ) )
p++;
if( p > start )
_tprintf(_T("%2d. %s\r\n"), fileCount, start );
p++;
fileCount++;
}
}
By my reading of it, the statement fileName.ReleaseBuffer(); makes the memory pointed to in the buffer variable pinvalid, such that the remaining code is liable to experience memory violations. At the same time, I'd also assume that Microsoft would have checked such examples prior to publishing them. Am I missing something obvious here? Is there any reason for the use of a CString here over a simple new followed by a delete after the buffer is no longer required?
Sample code isn't formal documentation. This sample is wrong. The documentation is right:
The address returned by GetBuffer may not be valid after the call to ReleaseBuffer because additional CSimpleStringT operations can cause the CSimpleStringT buffer to be reallocated.
The sample uses CString (over raw pointers and manual memory management) for automatic memory management and exception safety. The latter is a lot harder to get right with manual memory management (although this sample doesn't get exception safety right, either).
If you want to fix the sample code to adhere to the contract, the following changes need to be made:*
Replace wchar_t* pBufEnd = p + FILE_LIST_BUFFER_SIZE - 2; with const wchar_t* pBufEnd = fileName.GetString() + FILE_LIST_BUFFER_SIZE - 2;.
Replace wchar_t* start = p; with const wchar_t* start = fileName.GetString();
Replace all remaining occurrences of p in the code after the dialog invocation with a new variable, initialized as const wchar_t* current = fileName.GetString();).
This is a common error. Whenever a developer thinks they need a char* of sorts, they overlook that they need a const char* instead, which pretty much every string type supplies by means of a member function.
Note that there are other bugs in the sample code, that have not been explicitly addressed in this answer (like the mismatch of character types as explained in another answer).
* A C++ implementation that retrieves the list of selected files can be found in this answer.
You might be noticing a difference between specification and implementation. The code above works because the CString implementation allows it, even though the CString specification bans it.
And to highlight the quality of the example: it mixes TCHAR and wchar_t. In tprintf("%s", start) the string start has to be a TCHAR* but the example uses wchar_t* start

Function isalnum(): unexpected results

For an assignment, I am using std::isalnum to determine if the input is a letter or a number. The point of the assignment is to create a "dictionary." It works well on small paragraphs, but does horrible on pages of text. Here is the code snippet I am using.
custom::String string;
std::cin >> string;
custom::String original = string;
size_t size = string.Size();
char j;
size_t i = 0;
size_t beg = 0;
while( i < size)
{
j = string[i];
if(!!std::isalnum(static_cast<unsignedchar>(j)))
{
--size;
}
if( std::isalnum( j ) )
{
string[i-beg] = tolower(j);
}
++i;
}//end while
string.SetSize(size - beg, '\0');
The code presented as I write this, does not make sense as a whole.
However, the calls to isalnum, as shown, would only work for plain ASCII, because
the C character classification functions require non-negative argument, or else EOF as argument, and
in order to work for international characters,
the encoding must be single-byte per character, and
setlocale should have been called prior to using the functions.
Regarding the first of these three points, you can wrap std::isalnum like this:
using Byte = unsigned char;
auto is_alphanumeric( char const ch )
-> bool
{ return !!std::isalnum( static_cast<Byte>( ch ) ); }
where the !! is just to silence a sillywarning from Visual C++ (warning about "performance", of all things).
Disclaimer: code untouched by compiler's hands.
Addendum: if you don't have a C++11 compiler, but only C++03,
typedef unsigned char Byte;
bool is_alphanumeric( char const ch )
{
return !!std::isalnum( static_cast<Byte>( ch ) );
}
As Bjarne remarked, C++11 feels like a whole new language! ;-)
I was able to create a solution to the problem. I noticed that isalnum did take care of some non alpha-numerics, but not all the time. Since the code above is part of a function, I called it multiple times with refined results given each time. I then came up with a do while loop that stores the string's size, calls the function, stores the new size, and compares them. If they are not the same it means that there is a chance that it needs to be called again. If they are the same, then the string has been fully cleaned. I am guessing that the reason isalnum was not working well was because I was reading in several chapters of a book into the string. Here is my code:
custom::string abc;
std::cin >> abc;
size_t first = 0;
size_t second = 0;
//clean the word
do{
first = abc.Size();
Cleanup(abc);
second = abc.Size();
}while(first != second);

invalid operands to binary *

typedef struct pixel_type
{
unsigned char r;
unsigned char g;
unsigned char b;
} pixel;
buffer = (int *) malloc (sizeof(pixel) * stdin );
I keep getting an error that says "invalid operands to binary *(have unsigned int' and 'struct _IO_FILE *)." The struct is defined outside of a function so it is universal. The buffer is defined within the main. I can provide more code if needed. What is my problem?
EDIT: Alright so apparently I was a little confusing. What I'm trying to do is pass a file in, and then malloc enough space for that file. I was thinking of using a FILE function to pass the file in, and then using that, but was hoping to just use "stdin" instead. Is this not allowed? And this is in C. Just tagged C++ hoping someone else might see a similar problem.
Sorry for the silly question. Not new to C as a whole, but new to malloc. Second year student :P
I think you want to read the number of pixels from stdin:
int n;
scanf("%d", &n);
and then allocate memory for that many pixels:
unsigned char * buffer = (unsigned char *) malloc (sizeof(pixel) * n );
The right way to allocate the memory would be something like
size_t elements = 0;
... // get the number of elements as a separate operation
pixel *buffer = malloc( sizeof *buffer * elements ); // note no cast,
// operand of sizeof
if ( buffer )
{
// load your buffer here
}
In C, casting the result of malloc is considered bad practice1. It's unnecessary, since values of void * can be assigned to any pointer type, and under C89 compilers it can suppress a diagnostic if you forget to include stdlib.h or otherwise don't have a declaration for malloc in scope.
Also, since the expression *buffer has type pixel, the expression sizeof *buffer is equivalent to sizeof (pixel). This can save you some maintenance time if the type of buffer ever changes.
How you get the number of elements for your array really depends on your application. The easiest way would be to stick that value at the head of your data file:
size_t elements = 0;
FILE *data = fopen( "pixels.dat", "r" );
if ( !data )
{
// You will want to add real error handling here.
exit( 0 );
}
if ( fscanf( data, "%zu", &elements ) != 1 )
{
// You will want to add real error handling here
exit( 0 );
}
pixel *buffer = malloc( sizeof *buffer * elements );
if ( buffer )
{
for ( size_t i = 0; i < elements; i++ )
{
if ( fscanf( data, "%hhu %hhu %hhu", // %hhu for unsigned char
&buffer[i].r, &buffer[i].g, &buffer[i].b ) != 3 )
{
// more real error handling here
exit( 0 );
}
}
}
Naturally, this assumes that your data file is structured as rows of 3 integer values, like
10 20 30
40 50 60
etc.
1. As opposed to C++, where it's required, but if you're writing C++ you should be using the new operator anyway. Yes, you will see thousands of examples that include the cast. You will also see thousands of examples that use void main(). Most C references are simply crap.

C++ function call results in garbage parameters

ASI have a C++ function that looks like this:
static const unsigned int unknown = (unsigned)-1;
static inline char *
duplicateStringValue( const char *value,
unsigned int length = unknown )
{
if ( length == unknown )
length = (unsigned int)strlen(value);
char *newString = static_cast<char *>( malloc( length + 1 ) );
ASSERT( newString != 0, "Failed to allocate string value buffer" );
memcpy( newString, value, length );
newString[length] = 0;
return newString;
}
(This happens to be in the jsoncpp library, but I'm pretty sure that's orthogonal to the problem)
The issue is that according to GDB, the function is arriving on the stack with parameters ("", 31135568). The program tries and fails to allocate 31 megabytes, hits the assert and dies.
By examining the frame above dulpicateStringValue() with GDB, I can see that it is being invoked with the first parameter pointing to a small string on the heap, and the second parameter left out. In other words, as far as I can tell the function call is incorrectly getting garbage values for parameters.
I'm truly stumped by this. The only idea I have is that dulpicateStringValue() is called many times before this successfully, but at this point the stack is ~25 frames deep, much deeper than usual (as far as I can tell). Perhaps the stack and the heap are colliding and scribbling all over each other?
If anyone has some insights or has encountered something similar, I'd love to hear about it.
Edit: In response to questions, the function is being called as
value_.string_ = duplicateStringValue( other.value_.string_ );
where other.value_ is a union described by GDB as
value_ = {int_ = 34536679944, uint_ = 34536679944,
real_ = 1.7063387081744787e-313, bool_ = 8,
string_ = 0x80a8bea08 "boolean", map_ = 0x80a8bea08}
The code for the union:
union ValueHolder
{
LargestInt int_;
LargestUInt uint_;
double real_;
bool bool_;
char *string_;
ObjectValues *map_;
} value_;
Edit 2: #MarkRansom asked how the parameters are getting on the stack. In fact, they're not, GDB is reading them straight out of the register:
(gdb) f 1
#1 0x000000080663377e in duplicateStringValue (value=0x80aac9a10 "", length=31135568) at json_value.cpp:60
60 ASSERT( newString != 0, "Failed to allocate string value buffer" );
(gdb) p &length
Address requested for identifier "length" which is in register $rsi
(gdb) p &value
Address requested for identifier "value" which is in register $r13

C code - need to clarify the effectiveness

Hi I have written a code based upon a requirement.
(field1_6)(field2_30)(field3_16)(field4_16)(field5_1)(field6_6)(field7_2)(field8_1).....
this is one bucket(8 fields) of data. we will receive 20 buckets at a time means totally 160 fields.
i need to take the values of field3,field7 & fields8 based upon predefined condition.
if teh input argument is N then take the three fields from 1st bucket and if it is Y i need
to take the three fields from any other bucket other than 1st one.
if argumnet is Y then i need to scan all the 20 buckets one after other and check
the first field of the bucket is not equal to 0 and if it is true then fetch the three fields of that bucket and exit.
i have written the code and its also working fine ..but not so confident that it is effctive.
i am afraid of a crash some time.please suggest below is the code.
int CMI9_auxc_parse_balance_info(char *i_balance_info,char *i_use_balance_ind,char *o_balance,char *o_balance_change,char *o_balance_sign
)
{
char *pch = NULL;
char *balance_id[MAX_BUCKETS] = {NULL};
char balance_info[BALANCE_INFO_FIELD_MAX_LENTH] = {0};
char *str[160] = {NULL};
int i=0,j=0,b_id=0,b_ind=0,bc_ind=0,bs_ind=0,rc;
int total_bukets ;
memset(balance_info,' ',BALANCE_INFO_FIELD_MAX_LENTH);
memcpy(balance_info,i_balance_info,BALANCE_INFO_FIELD_MAX_LENTH);
//balance_info[BALANCE_INFO_FIELD_MAX_LENTH]='\0';
pch = strtok (balance_info,"*");
while (pch != NULL && i < 160)
{
str[i]=(char*)malloc(strlen(pch) + 1);
strcpy(str[i],pch);
pch = strtok (NULL, "*");
i++;
}
total_bukets = i/8 ;
for (j=0;str[b_id]!=NULL,j<total_bukets;j++)
{
balance_id[j]=str[b_id];
b_id=b_id+8;
}
if (!memcmp(i_use_balance_ind,"Y",1))
{
if (atoi(balance_id[0])==1)
{
memcpy(o_balance,str[2],16);
memcpy(o_balance_change,str[3],16);
memcpy(o_balance_sign,str[7],1);
for(i=0;i<160;i++)
free(str[i]);
return 1;
}
else
{
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
}
else if (!memcmp(i_use_balance_ind,"N",1))
{
for (j=1;balance_id[j]!=NULL,j<MAX_BUCKETS;j++)
{
b_ind=(j*8)+2;
bc_ind=(j*8)+3;
bs_ind=(j*8)+7;
if (atoi(balance_id[j])!=1 && atoi( str[bc_ind] )!=0)
{
memcpy(o_balance,str[b_ind],16);
memcpy(o_balance_change,str[bc_ind],16);
memcpy(o_balance_sign,str[bs_ind],1);
for(i=0;i<160;i++)
free(str[i]);
return 1;
}
}
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
My feeling is that this code is very brittle. It may well work when given good input (I don't propose to desk check the thing for you) but if given some incorrect inputs it will either crash and burn or give misleading results.
Have you tested for unexpected inputs? For example:
Suppose i_balance_info is null?
Suppose i_balance_info is ""?
Suppose there are fewer than 8 items in the input string, what will this line of code do?
memcpy(o_balance_sign,str[7],1);
Suppose that that the item in str[3] is less than 16 chars long, what will this line of code do?
memcpy(o_balance_change,str[3],16);
My approach to writing such code would be to protect against all such eventualities. At the very least I would add ASSERT() statements, I would usually write explicit input validation and return errors when it's bad. The problem here is that the interface does not seem to allow for any possibility that there might be bad input.
I had a hard time reading your code but FWIW I've added some comments, HTH:
// do shorter functions, long functions are harder to follow and make errors harder to spot
// document all your variables, at the very least your function parameters
// also what the function is suppose to do and what it expects as input
int CMI9_auxc_parse_balance_info
(
char *i_balance_info,
char *i_use_balance_ind,
char *o_balance,
char *o_balance_change,
char *o_balance_sign
)
{
char *balance_id[MAX_BUCKETS] = {NULL};
char balance_info[BALANCE_INFO_FIELD_MAX_LENTH] = {0};
char *str[160] = {NULL};
int i=0,j=0,b_id=0,b_ind=0,bc_ind=0,bs_ind=0,rc;
int total_bukets=0; // good practice to initialize all variables
//
// check for null pointers in your arguments, and do sanity checks for any
// calculations
// also move variable declarations to just before they are needed
//
memset(balance_info,' ',BALANCE_INFO_FIELD_MAX_LENTH);
memcpy(balance_info,i_balance_info,BALANCE_INFO_FIELD_MAX_LENTH);
//balance_info[BALANCE_INFO_FIELD_MAX_LENTH]='\0'; // should be BALANCE_INFO_FIELD_MAX_LENTH-1
char *pch = strtok (balance_info,"*"); // this will potentially crash since no ending \0
while (pch != NULL && i < 160)
{
str[i]=(char*)malloc(strlen(pch) + 1);
strcpy(str[i],pch);
pch = strtok (NULL, "*");
i++;
}
total_bukets = i/8 ;
// you have declared char*str[160] check if enough b_id < 160
// asserts are helpful if nothing else assert( b_id < 160 );
for (j=0;str[b_id]!=NULL,j<total_bukets;j++)
{
balance_id[j]=str[b_id];
b_id=b_id+8;
}
// don't use memcmp, if ('y'==i_use_balance_ind[0]) is better
if (!memcmp(i_use_balance_ind,"Y",1))
{
// atoi needs balance_id str to end with \0 has it?
if (atoi(balance_id[0])==1)
{
// length assumptions and memcpy when its only one byte
memcpy(o_balance,str[2],16);
memcpy(o_balance_change,str[3],16);
memcpy(o_balance_sign,str[7],1);
for(i=0;i<160;i++)
free(str[i]);
return 1;
}
else
{
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
}
// if ('N'==i_use_balance_ind[0])
else if (!memcmp(i_use_balance_ind,"N",1))
{
// here I get a headache, this looks just at first glance risky.
for (j=1;balance_id[j]!=NULL,j<MAX_BUCKETS;j++)
{
b_ind=(j*8)+2;
bc_ind=(j*8)+3;
bs_ind=(j*8)+7;
if (atoi(balance_id[j])!=1 && atoi( str[bc_ind] )!=0)
{
// length assumptions and memcpy when its only one byte
// here u assume strlen(str[b_ind])>15 including \0
memcpy(o_balance,str[b_ind],16);
// here u assume strlen(str[bc_ind])>15 including \0
memcpy(o_balance_change,str[bc_ind],16);
// here, besides length assumption you could use a simple assignment
// since its one byte
memcpy(o_balance_sign,str[bs_ind],1);
// a common practice is to set pointers that are freed to NULL.
// maybe not necessary here since u return
for(i=0;i<160;i++)
free(str[i]);
return 1;
}
}
// suggestion do one function that frees your pointers to avoid dupl
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
for(i=0;i<160;i++)
free(str[i]);
return 0;
}
A helpful technique when you want to access offsets in an array is to create a struct that maps the memory layout. Then you cast your pointer to a pointer of the struct and use the struct members to extract information instead of your various memcpy's
I would also suggest you reconsider your parameters to the function in general, if you place every of them in a struct you have better control and makes the function more readable e.g.
int foo( input* inbalance, output* outbalance )
(or whatever it is you are trying to do)