Does this Microsoft CFileDialog example lead to a potential memory violation - c++

I've been experiencing a number of random crashes with the MFC CFileDialog class, so I had a look at the example code on this page, which reads as follows:
#define MAX_CFileDialog_FILE_COUNT 99
#define FILE_LIST_BUFFER_SIZE ((MAX_CFileDialog_FILE_COUNT * (MAX_PATH + 1)) + 1)
CString fileName;
wchar_t* p = fileName.GetBuffer( FILE_LIST_BUFFER_SIZE );
CFileDialog dlgFile(TRUE);
OPENFILENAME& ofn = dlgFile.GetOFN( );
ofn.Flags |= OFN_ALLOWMULTISELECT;
ofn.lpstrFile = p;
ofn.nMaxFile = FILE_LIST_BUFFER_SIZE;
dlgFile.DoModal();
fileName.ReleaseBuffer();
wchar_t* pBufEnd = p + FILE_LIST_BUFFER_SIZE - 2;
wchar_t* start = p;
while( ( p < pBufEnd ) && ( *p ) )
p++;
if( p > start )
{
_tprintf(_T("Path to folder where files were selected: %s\r\n\r\n"), start );
p++;
int fileCount = 1;
while( ( p < pBufEnd ) && ( *p ) )
{
start = p;
while( ( p < pBufEnd ) && ( *p ) )
p++;
if( p > start )
_tprintf(_T("%2d. %s\r\n"), fileCount, start );
p++;
fileCount++;
}
}
By my reading of it, the statement fileName.ReleaseBuffer(); makes the memory pointed to by the buffer variable p invalid, so the remaining code is liable to cause memory access violations. At the same time, I'd assume that Microsoft checks such examples before publishing them. Am I missing something obvious here? Is there any reason to use a CString here rather than a simple new followed by a delete once the buffer is no longer required?

Sample code isn't formal documentation. This sample is wrong. The documentation is right:
The address returned by GetBuffer may not be valid after the call to ReleaseBuffer because additional CSimpleStringT operations can cause the CSimpleStringT buffer to be reallocated.
The sample uses CString (over raw pointers and manual memory management) for automatic memory management and exception safety. The latter is a lot harder to get right with manual memory management (although this sample doesn't get exception safety right, either).
If you want to fix the sample code to adhere to the contract, the following changes need to be made:*
Replace wchar_t* pBufEnd = p + FILE_LIST_BUFFER_SIZE - 2; with const wchar_t* pBufEnd = fileName.GetString() + FILE_LIST_BUFFER_SIZE - 2;.
Replace wchar_t* start = p; with const wchar_t* start = fileName.GetString();
Replace all remaining occurrences of p in the code after the dialog invocation with a new variable, initialized as const wchar_t* current = fileName.GetString();.
This is a common error. Whenever a developer thinks they need a char* of sorts, they overlook that they need a const char* instead, which pretty much every string type supplies by means of a member function.
Note that there are other bugs in the sample code that have not been explicitly addressed in this answer (like the mismatch of character types explained in another answer).
* A C++ implementation that retrieves the list of selected files can be found in this answer.
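To make the "read through a const pointer" advice concrete, here is a portable sketch of just the parsing step, so it compiles without MFC. SplitMultiSelect is a hypothetical helper, and it assumes the Explorer-style layout used with OFN_ALLOWMULTISELECT (the folder, then each file name, each NUL-terminated, with an extra NUL at the end; a single selection is just the full path). In the sample, one option would be to call it with p and FILE_LIST_BUFFER_SIZE before fileName.ReleaseBuffer(), while the GetBuffer pointer is still guaranteed valid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits an Explorer-style multi-select buffer ("folder\0file1\0file2\0\0")
// into its components, reading only through const pointers and never past
// bufLen characters. parts[0] is the folder; for a single selection the
// buffer holds just the full path, so parts has one element.
std::vector<std::wstring> SplitMultiSelect(const wchar_t* buf, std::size_t bufLen)
{
    std::vector<std::wstring> parts;
    const wchar_t* const end = buf + bufLen;
    const wchar_t* current = buf;
    while (current < end && *current != L'\0')   // an empty component ends the list
    {
        const wchar_t* start = current;
        while (current < end && *current != L'\0')
            ++current;                            // find the end of this component
        parts.emplace_back(start, current);       // copy [start, current)
        if (current == end)
            break;                                // buffer not terminated: stop safely
        ++current;                                // skip the separating NUL
    }
    return parts;
}
```

Because each component is copied into a std::wstring, nothing refers back into the CString buffer once parsing is done.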

You might be noticing a difference between specification and implementation. The code above works because the CString implementation allows it, even though the CString specification bans it.
And to highlight the quality of the example: it mixes TCHAR and wchar_t. In _tprintf(_T("%s"), start) the argument start has to be a TCHAR*, but the example declares it as wchar_t* start.


Optimize this function? [closed]

I have written this code (it uses the V8 library). I went through it a couple of times, and this feels like the only way I can write something like this. The aim of the function is to replace JavaScript's .split() function, because calling it with a limit drops the rest of the string from the returned array. E.g.:
var str = "Test split string with limit";
var out = str.split(' ', 2);
The array out will contain: [Test, split]. I want it to contain: [Test, split, string with limit].
I know there are pure JS ways to do this, but I find them hacky and possibly slower(?) than a single C++ bind call.
Here's my function:
/**
* Explodes a string but limits the tokens
* @param input
* @param delim
* @param limit
* @return
*/
void ASEngine::Engine::ASstrtok(const v8::FunctionCallbackInfo<v8::Value>& args)
{
Assert(3, args);
Isolate* isolate = args.GetIsolate();
/* Get args */
String::Utf8Value a1(args[0]);
String::Utf8Value a2(args[1]);
Local<Uint32> a3 = args[2]->ToUint32();
std::string input = std::string(*a1);
std::string delim = std::string(*a2);
unsigned int limit = a3->Int32Value();
unsigned int inputLen = input.length();
// Declare a temporary array to shove into the return later
std::vector<char*> tmp;
tmp.reserve(limit);
unsigned int delimlen = delim.length();
char* cp = (char*) malloc(inputLen);
char* cursor = cp + inputLen; // Cursor
char* cpp = (char*) cp; // Keep the start of the string
// Copy the haystack into a modifyable char ptr
memset(cp + inputLen, 0x00, 1);
memcpy(cp, input.c_str(), inputLen);
unsigned int arrayIndex = 0;
for(unsigned int i=0;i<limit;i++)
{
if((cursor = strstr(cp, delim.c_str())) == NULL)
{
cursor = (char*) cpp + inputLen;
break;
}
for(int j=0;j<delimlen;j++)
*(cursor+j) = 0x00;
tmp.push_back(cp);
cp = cursor + delimlen;
arrayIndex++;
}
if(*(cp) != '\0')
{
arrayIndex++;
tmp.push_back(cp);
}
Handle<Array> rtn = Array::New(args.GetIsolate(), arrayIndex);
/* Loop through the temporary array and assign
the variables to the V8 array */
for(unsigned int i=0;i<arrayIndex;i++)
{
rtn->Set(i, String::NewFromUtf8(
isolate, tmp[i], String::kNormalString, strlen(tmp[i])
));
}
/* Clean up memory */
delete cpp;
cp = NULL;
cpp = NULL;
cursor = NULL;
isolate = NULL;
/* Set the return */
args.GetReturnValue().Set(rtn);
}
If you are wondering: The variable cpp is there so I can delete the character pointer after I am done (As calling v8's String::NewFromUtf8() function copies the string) and I modify the cp pointer during the process of the function.
Before optimising, I would fix the code so that it is correct.
char* cp = (char*) malloc(inputLen);
...
/* Clean up memory */
delete cpp;
Whilst in some implementations, new and malloc do exactly the same thing, other implementations do not. So, if you allocate with malloc, use free to free the memory, not delete.
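Another option is to let a standard container own the scratch buffer, so there is no allocation/deallocation pair to mismatch at all. A minimal sketch (MakeMutableCopy is a hypothetical name): it reserves one extra element for the terminator, which the question's malloc(inputLen) does not, even though the code later writes to cp[inputLen].

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Returns a modifiable, NUL-terminated copy of input. The vector owns the
// memory, so it is released automatically (no free/delete to get wrong),
// and it has room for the terminator: size() == input.size() + 1.
std::vector<char> MakeMutableCopy(const std::string& input)
{
    std::vector<char> buf(input.begin(), input.end());
    buf.push_back('\0');
    return buf;
}
```

buf.data() can then be handed to strstr and friends exactly like cp, and tokens can be cut by writing '\0' into the vector.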
If you want to be clever about it, I expect:
tmp.reserve(limit+1);
will ensure that you have space for the remainder of the string without further allocation in the vector.
Since cursor isn't used after the loop, setting it inside the if that breaks the loop makes no sense.
if((cursor = strstr(cp, delim.c_str())) == NULL)
{
cursor = (char*) cpp + inputLen;
break;
}
You are using casts to (char *) in places that don't need it, for example:
char* cpp = (char*) cp; // Keep the start of the string
(cp is a char * already).
This:
memset(cp + inputLen, 0x00, 1);
is the same as:
cp[inputLen] = 0;
but, unless the compiler inlines the memset call, much faster.
Likewise:
*(cursor+j) = 0x00;
can be written:
cursor[j] = 0;
However, assuming delimlen is at least 1, you could get away with converting:
for(int j=0;j<delimlen;j++)
*(cursor+j) = 0x00;
to:
*cursor = 0;
since your new cp value skips past the rest of the delimiter anyway.
These serve absolutely no purpose:
cp = NULL;
cpp = NULL;
cursor = NULL;
isolate = NULL;
Unfortunately, I expect most of the time in your function isn't spent in any of the code I've commented on, but in passing arguments back and forth between the calling JS code and the native C++ code. I'd be surprised if you gained anything over writing the same code in JS. (None of the above makes much difference to speed; it's about correctness and "a small number of potentially wasted cycles if the compiler is rather daft".)
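For comparison, the tokenising logic itself needs no manual memory management when written against std::string. This is a sketch of the same behaviour as ASstrtok (at most limit splits, then the non-empty remainder as one last element), leaving out the V8 argument and return-value glue; SplitWithRemainder is a hypothetical name, and treating an empty delimiter as "no split" is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits input on delim at most `limit` times; whatever is left after the
// last split becomes one final element (if non-empty), like ASstrtok.
std::vector<std::string> SplitWithRemainder(const std::string& input,
                                            const std::string& delim,
                                            unsigned int limit)
{
    std::vector<std::string> out;
    if (delim.empty())                      // assumption: nothing to split on
    {
        if (!input.empty())
            out.push_back(input);
        return out;
    }
    std::string::size_type pos = 0;         // start of the current token
    for (unsigned int i = 0; i < limit; ++i)
    {
        std::string::size_type hit = input.find(delim, pos);
        if (hit == std::string::npos)
            break;                          // no more delimiters
        out.push_back(input.substr(pos, hit - pos));
        pos = hit + delim.size();           // skip the whole delimiter
    }
    if (pos < input.size())
        out.push_back(input.substr(pos));   // the remainder, unsplit
    return out;
}
```

With limit 2, "Test split string with limit" yields Test, split and string with limit, which is the result the question asks for.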

Function isalnum(): unexpected results

For an assignment, I am using std::isalnum to determine if the input is a letter or a number. The point of the assignment is to create a "dictionary." It works well on small paragraphs, but does horribly on pages of text. Here is the code snippet I am using.
custom::String string;
std::cin >> string;
custom::String original = string;
size_t size = string.Size();
char j;
size_t i = 0;
size_t beg = 0;
while( i < size)
{
j = string[i];
if(!!std::isalnum(static_cast<unsigned char>(j)))
{
--size;
}
if( std::isalnum( j ) )
{
string[i-beg] = tolower(j);
}
++i;
}//end while
string.SetSize(size - beg, '\0');
The code presented, as I write this, does not make sense as a whole.
However, the calls to isalnum as shown would only work for plain ASCII, because (1) the C character classification functions require a non-negative argument, or else EOF, and, in order to work for international characters, (2) the encoding must be single-byte per character and (3) setlocale should have been called before using the functions.
Regarding the first of these three points, you can wrap std::isalnum like this:
using Byte = unsigned char;
auto is_alphanumeric( char const ch )
-> bool
{ return !!std::isalnum( static_cast<Byte>( ch ) ); }
where the !! is just to silence a silly warning from Visual C++ (a warning about "performance", of all things).
Disclaimer: code untouched by compiler's hands.
Addendum: if you don't have a C++11 compiler, but only C++03,
typedef unsigned char Byte;
bool is_alphanumeric( char const ch )
{
return !!std::isalnum( static_cast<Byte>( ch ) );
}
As Bjarne remarked, C++11 feels like a whole new language! ;-)
I was able to create a solution to the problem. I noticed that isalnum did take care of some non-alphanumeric characters, but not all the time. Since the code above is part of a function, I called it multiple times, with refined results each time. I then came up with a do-while loop that stores the string's size, calls the function, stores the new size, and compares the two. If they are not the same, there is a chance that it needs to be called again. If they are the same, the string has been fully cleaned. I am guessing that the reason isalnum was not working well was that I was reading several chapters of a book into the string. Here is my code:
custom::string abc;
std::cin >> abc;
size_t first = 0;
size_t second = 0;
//clean the word
do{
first = abc.Size();
Cleanup(abc);
second = abc.Size();
}while(first != second);
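Once each test goes through unsigned char, as in the is_alphanumeric wrapper in the answer, a single pass that builds a new string can drop every non-alphanumeric character, so there is nothing left for a second call to clean up. A sketch under two assumptions: std::string stands in for custom::String, and the program runs in the default "C" locale (plain ASCII). clean_word is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Same idea as the answer's wrapper: convert to unsigned char before calling
// the <cctype> functions, so negative char values are never passed in.
bool is_alphanumeric(char ch)
{
    return std::isalnum(static_cast<unsigned char>(ch)) != 0;
}

// Keeps only the alphanumeric characters of `in`, lowercased, in one pass.
std::string clean_word(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char ch : in)
    {
        if (is_alphanumeric(ch))
            out.push_back(static_cast<char>(
                std::tolower(static_cast<unsigned char>(ch))));
    }
    return out;
}
```

Building the result separately avoids the index bookkeeping (i, beg, size) that the in-place version has to get exactly right.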

C++ function call results in garbage parameters

I have a C++ function that looks like this:
static const unsigned int unknown = (unsigned)-1;
static inline char *
duplicateStringValue( const char *value,
unsigned int length = unknown )
{
if ( length == unknown )
length = (unsigned int)strlen(value);
char *newString = static_cast<char *>( malloc( length + 1 ) );
ASSERT( newString != 0, "Failed to allocate string value buffer" );
memcpy( newString, value, length );
newString[length] = 0;
return newString;
}
(This happens to be in the jsoncpp library, but I'm pretty sure that's orthogonal to the problem)
The issue is that according to GDB, the function is arriving on the stack with parameters ("", 31135568). The program tries and fails to allocate 31 megabytes, hits the assert and dies.
By examining the frame above duplicateStringValue() with GDB, I can see that it is being invoked with the first parameter pointing to a small string on the heap and the second parameter left out. In other words, as far as I can tell, the function call is getting garbage values for its parameters.
I'm truly stumped by this. The only idea I have is that duplicateStringValue() is called many times successfully before this, but at this point the stack is ~25 frames deep, much deeper than usual (as far as I can tell). Perhaps the stack and the heap are colliding and scribbling all over each other?
If anyone has some insights or has encountered something similar, I'd love to hear about it.
Edit: In response to questions, the function is being called as
value_.string_ = duplicateStringValue( other.value_.string_ );
where other.value_ is a union described by GDB as
value_ = {int_ = 34536679944, uint_ = 34536679944,
real_ = 1.7063387081744787e-313, bool_ = 8,
string_ = 0x80a8bea08 "boolean", map_ = 0x80a8bea08}
The code for the union:
union ValueHolder
{
LargestInt int_;
LargestUInt uint_;
double real_;
bool bool_;
char *string_;
ObjectValues *map_;
} value_;
Edit 2: @MarkRansom asked how the parameters are getting onto the stack. In fact, they're not; GDB is reading them straight out of registers:
(gdb) f 1
#1 0x000000080663377e in duplicateStringValue (value=0x80aac9a10 "", length=31135568) at json_value.cpp:60
60 ASSERT( newString != 0, "Failed to allocate string value buffer" );
(gdb) p &length
Address requested for identifier "length" which is in register $rsi
(gdb) p &value
Address requested for identifier "value" which is in register $r13

Null Pointer issue using string::iterator in Visual Studio 2005

I am working with some legacy code. The legacy code works in production mode in the following scenario. I'm trying to build a command-line version of the legacy code for testing purposes. I suspect there is an environment setting issue at work here, but I'm relatively new to C++ and Visual Studio (long-time Eclipse/Java guy).
This code is attempting to read in a string from a stream. It reads in a short, which in my debug scenario has a value of 11. Then, it is supposed to read in 11 chars. But this code craps out on the first char. Specifically, in the read method below, ptr is null, and so the fread call is throwing an exception. Why is ptr NULL?
Point of clarification, ptr becomes null between the operator>>(string) and operator>>(char) calls.
Mystream& Mystream::operator>>( string& str )
{
string::iterator it;
short length;
*this >> length;
if( length >= 0 )
{
str.resize( length );
for ( it = str.begin(); it != str.end(); ++it )
{
*this >> *it;
}
}
return *this;
}
The method for reading the short is here and looking at the file buffer etc. this looks like it is working properly.
Mystream& Mystream::operator>>(short& n )
{
read( ( char* )&n, sizeof( n ) );
SwapBytes( *this, ( char* )&n, sizeof( n ) );
return *this;
}
Now, the method for reading in a char is here:
Mystream& Mystream::operator>>(char& n )
{
read( ( char* )&n, sizeof( n ) );
return *this;
}
and the read method is:
Mystream& Mystream::read( char* ptr, int n )
{
fread( (void*)ptr, (size_t)1, (size_t)n, fp );
return *this;
}
One thing I don't understand: in the string input method, *it is a char, right? So why does the operator>>(char &n) method get dispatched on that line? In the debugger, it looks like *it is 0 (although a colleague tells me he doesn't trust the 2005 debugger on such things), and thus it looks like &n is treated as a null pointer, so the read method throws an exception.
Any insights you can provide would be most helpful!
Thanks
John
P.S. For the curious, SwapBytes looks like this:
inline void SwapBytes( Mystream& bfs, char * ptr, int nbyte, int nelem = 1)
{
// do we need to swap bytes?
if( bfs.byteOrder() != SYSBYTEORDER )
DoSwapBytesReally( bfs, ptr, nbyte, nelem );
}
And DoSwapBytesReally looks like:
void DoSwapBytesReally( Mystream& bfs, char * ptr, int nbyte, int nelem )
{
// if the byte order of the file
// does not match the system byte order
// then the bytes should be swapped
int i, n;
char temp;
#ifndef _DOSPOINTERS_
char *ptr1, *ptr2;
#else // _DOSPOINTERS_
char huge *ptr1, huge *ptr2;
#endif // _DOSPOINTERS_
int nbyte2;
nbyte2 = nbyte/2;
for ( n = 0; n < nelem; n++ )
{
ptr1 = ptr;
ptr2 = ptr1 + nbyte - 1;
for ( i = 0; i < nbyte2; i++ )
{
temp = *ptr1;
*ptr1++ = *ptr2;
*ptr2-- = temp;
}
ptr += nbyte;
}
}
I'd throw out this mess and start over. Extrapolating from the code, if what you had actually worked, it would be roughly equivalent to something like this:
Mystream& Mystream::operator>>(string &s) {
short size;
fread((void *)&size, sizeof(size), 1, fp);
SwapBytes(*this, (char *)&size, sizeof(size)); // swap only if the file's byte order differs from the system's
if (size < 0)
return *this;
s.resize(size);
if (size > 0)
fread((void *)&s[0], 1, size, fp);
return *this;
}
In this case, delegating most of the work to other functions doesn't seem to have gained much -- this does the work more directly, but still isn't significantly longer or more complex than the original (if anything, I'd say rather the opposite).
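A version that can be built and tested outside the legacy code base, using only <cstdio>, could look like the following sketch. ReadLengthPrefixedString is hypothetical; its swapBytes flag stands in for the question's byteOrder() != SYSBYTEORDER check, and it assumes the 2-byte length prefix the question's format uses. Because the string is sized with assign() and filled through &out[0] only when the length is positive, no iterator is ever taken into an empty string.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>
#include <utility>

static_assert(sizeof(short) == 2, "the file format stores the length as 2 bytes");

// Reads a 2-byte length followed by that many bytes, the layout the question's
// Mystream::operator>>(string&) expects. Returns false on a short read or a
// negative length.
bool ReadLengthPrefixedString(std::FILE* fp, bool swapBytes, std::string& out)
{
    unsigned char raw[sizeof(short)];
    if (std::fread(raw, 1, sizeof raw, fp) != sizeof raw)
        return false;                            // no complete length prefix
    if (swapBytes)
        std::swap(raw[0], raw[1]);               // file order differs from host
    short length;
    std::memcpy(&length, raw, sizeof length);    // reinterpret as a host short
    if (length < 0)
        return false;
    out.assign(static_cast<std::string::size_type>(length), '\0');
    if (length > 0 && std::fread(&out[0], 1, out.size(), fp) != out.size())
        return false;                            // file ended early
    return true;
}
```

Returning a bool instead of a stream reference is a design choice for the sketch; it makes a truncated file visible to the caller rather than silently leaving a partly filled string.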
I found a gray beard in the company who could explain to me what's going on. (I had already spoken to two old-timers, so I figured I had covered the old-timer avenue of attack.) The code above is not ANSI-compliant STL usage. Visual Studio 2005 introduced stricter iterator checking in Microsoft's STL implementation, and there were issues: in particular, older code that used to work would now fail in 2005 (I think 64-bit mode may play a role in this as well). Because of this, the code will not work in debug mode (but it will work in release mode). One partial article is located here.
http://msdn.microsoft.com/en-us/library/aa985982%28v=vs.80%29.aspx
The particular issue I saw has to do with the line: it = str.begin() in the first method in the question. str is an empty string. So str.begin() is technically not defined. Visual Studio treats this situation differently between debug and release modes. (Can't do this in debug, you can do it in release.)
Bottom line, the gray beard suggested rewrite was exactly Jerry's. Ironically, the gray beard had fixed this problem in several files, but neglected to check it into the mainline. Uh oh. That scares the &#$!! out of me.

Multi-Dimensional Array ( C++ )

I'm trying to store a pointer in an array.
My pointer-to-pointer class object is:
classType **ClassObject;
So I know I can allocate it by using the new operator like this:
ClassObject = new *classType[ 100 ] = {};
I'm reading a text file with punctuation, and here is what I have so far:
// included libraries
// main function
// defined variables
classType **ClassObject; // global object
const int NELEMENTS = 100; // global index
wrdCount = 1; // start this at 1 for the 1st word
while ( !inFile.eof() )
{
getline( inFile, str, '\n' ); // read all data into a string variable
str = removePunct(str); // User Defined Function to remove all punctuation.
for ( unsigned x = 0; x < str.length(); x++ )
{
if ( str[x] == ' ' )
{
wrdCount++; // Incrementing at each space
ClassObject[x] = new *classType[x];
// What i want to do here is allocate space for each word read from the file.
}
}
}
// this function just replaces all punctuation with a space
string removePunct(string &str)
{
for ( unsigned x = 0; x < str.length(); x++ )
if ( ispunct( str[x] ) )
str[x] = ' ';
return str;
}
// That's about it.
I guess my questions are:
Have I allocated space for each word in the file?
How would I store a pointer in the ClassObject array within my while/for loop?
If you are using C++, use the Boost Multidimensional Array Library.
Hmm, I'm not sure what you want to do (especially new *classType[x] -- does this even compile?)
If you want a new classType for every word, then you can just go
ClassObject[x] = new classType; //create a _single_ classType
ClassObject[x]->doSomething();
provided that ClassObject is initialized (as you said).
You say you want a 2D array - if you want to do that, then the syntax is:
ClassObject[x] = new classType[y]; //create an array of classType of size y
ClassObject[0][0].doSomething(); //note that [] dereferences automatically
However, I'm also not sure of what you mean by new *classType[ 100 ] = {}; - what are the curly braces doing there? It seems like it should be
classType** classObject = new classType*[100];
I highly suggest you use something else, though, as this is really nasty (and you have to take care of deletion... ugh)
Use vector<>s or, as the above poster suggested, the Boost libraries.
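Following the vector<> suggestion, a sketch of the word-reading loop might look like this. Word is a hypothetical stand-in for classType; if the objects really have to live on the heap (for example because classType is polymorphic), std::vector<std::unique_ptr<classType>> is the analogous choice. The loop also tests the result of std::getline instead of calling eof() first, so the body never runs on the result of a failed read.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's classType: one object per word.
struct Word
{
    std::string text;
};

// Reads every line, treats punctuation like whitespace, and stores one Word
// per word. The vector grows as needed, so there is no fixed NELEMENTS limit
// and nothing to delete by hand.
std::vector<Word> ReadWords(std::istream& in)
{
    std::vector<Word> words;
    std::string line;
    while (std::getline(in, line))          // stops at end of input
    {
        for (char& ch : line)
            if (std::ispunct(static_cast<unsigned char>(ch)))
                ch = ' ';                   // same job as removePunct
        std::istringstream tokens(line);
        std::string w;
        while (tokens >> w)                 // skips runs of spaces
            words.push_back(Word{w});
    }
    return words;
}
```

The word count is simply words.size(), so a separate wrdCount variable is not needed.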
Your code is perfectly fine except for one line:
ClassObject[x] = new *classType[x];
The star * needs to go away, and what you're probably trying to say is that you want ClassObject to be indexed to word count rather than x.
Replace that line with:
ClassObject[wrdCount] = new classType[x];
Hope that helps,
Billy3