How to prevent copying a wild pointer string


My program crashes intermittently when it tries to copy a character array that is not terminated by a null character ('\0').
class CMenuButton {
TCHAR m_szNode[32];
CMenuButton() {
memset(m_szNode, '\0', sizeof(m_szNode));
}
};
int main() {
....
CString szTemp = ((CMenuButton*)pButton)->m_szNode; // sometimes it crashes here
...
return 0;
}
I suspect that some code wrote to the array without terminating it with '\0', so that it ended up like this:
Stack
m_szNode $%#^&!&!&!*#*#&!(*#(!*##&#&*&##!^&*&#(*!#*((*&*SDFKJSHDF*(&(*&(()(**
Can you tell me what is happening and what I should do to prevent copying from a wild pointer? Help will be very much appreciated!
I guess I'm unable to check if the character array is NULL before copying...

I suspect that your real problem could be that pButton is a bad pointer, so check that out first.
The only way to be 100% sure that a pointer is correct, and points to a correctly sized/allocated object is to never use pointers you didn't create, and never accept/return pointers. You would use cookies, instead, and look up your pointer in some sort of cookie -> pointer lookup (such as a hash table). Basically, don't trust user input.
If you are more concerned with finding bugs, and less about 100% safety against things like buffer overrun attacks, etc. then you can take a less aggressive approach. In your function signatures, where you currently take pointers to arrays, add a size parameter. E.g.:
void someFunction(char* someString);
Becomes
void someFunction(char* someString, size_t size_of_buffer);
Also, force the termination of arrays/strings in your functions. If you hit the end, and it isn't null-terminated, truncate it.
Make it so you can provide the size of the buffer when you call these, rather than calling strlen (or equivalent) on all your arrays before you call them.
This is similar to the approach taken by the "safe string functions" that were created by Microsoft (some of which were proposed for standardization). Not sure if this is the perfect link, but you can google for additional links:
http://msdn.microsoft.com/en-us/library/ff565508(VS.85).aspx
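A minimal sketch of the two ideas above (pass the buffer size, and force termination), written with plain char so that it stays portable. The names copy_bounded and read_bounded are hypothetical helpers, not functions from any library.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies at most dst_size - 1 characters from the null-terminated src
// and always terminates dst. Returns true if src fit without truncation.
bool copy_bounded(char* dst, std::size_t dst_size, const char* src) {
    if (dst == nullptr || dst_size == 0) return false;
    if (src == nullptr) { dst[0] = '\0'; return false; }
    std::size_t i = 0;
    for (; i + 1 < dst_size && src[i] != '\0'; ++i) dst[i] = src[i];
    dst[i] = '\0';
    return src[i] == '\0';
}

// Reads a fixed-size array that may lack a terminator, never looking
// past buf_size characters.
std::string read_bounded(const char* buf, std::size_t buf_size) {
    if (buf == nullptr || buf_size == 0) return std::string();
    const void* end = std::memchr(buf, '\0', buf_size);
    std::size_t len = end
        ? static_cast<std::size_t>(static_cast<const char*>(end) - buf)
        : buf_size;
    return std::string(buf, len);
}
```

read_bounded only protects against a missing terminator inside a valid array. It cannot help if the pointer itself is bad, which is why checking pButton comes first.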

There are two possibilities:
pButton doesn't point to a CMenuButton like you think it does, and the cast is causing undefined behavior.
The code that sets m_szNode is incorrect, overflowing the given size of 32 characters.
Since you haven't shown us either piece of code, it's difficult to see what's wrong. Your initialization of m_szNode looks OK.
Is there any reason that you didn't choose a CString for m_szNode?

My approach would be to make m_szNode a private member in CMenuButton, and explicitly NULL-terminate it in the mutator method.
class CMenuButton {
private:
TCHAR m_szNode[32];
public:
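void set_szNode( const TCHAR* x ) {
// copy at most 31 characters (_tcsncpy from <tchar.h>), then force termination in the last slot
_tcsncpy( m_szNode, x, 31 );
m_szNode[ 31 ] = 0;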
}
};
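A portable sketch of such a mutator, for experimenting outside MFC. It assumes plain char in place of TCHAR, and the names MenuButton, setNode and node are illustrative only.

```cpp
#include <cstring>
#include <string>

class MenuButton {
public:
    MenuButton() { std::memset(m_node, 0, sizeof(m_node)); }

    // The only way to write the buffer: copies at most 31 characters
    // and always terminates, so readers never see an unterminated array.
    void setNode(const char* text) {
        if (text == nullptr) text = "";
        std::strncpy(m_node, text, sizeof(m_node) - 1);
        m_node[sizeof(m_node) - 1] = '\0';
    }

    std::string node() const { return std::string(m_node); }

private:
    char m_node[32];
};
```

Because the array is private, no other code can overflow it or leave it unterminated. Over-long input is silently truncated, which may or may not be the behaviour you want.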

Related

What does CString::GetBuffer() with no size parameter do?

Perhaps I'm going insane, but I have tried every search combination I can think of, and I can't find a definition for CString::GetBuffer() with no parameters. Every reference I look up describes CString::GetBuffer( int ), where the int parameter passed in is the max buffer length. The definition in the header is for CSimpleStringT::GetBuffer(). That gave me the following link, which at least acknowledges the existence of the parameterless version, but offers no description of its behavior.
https://msdn.microsoft.com/en-us/library/sddk80xf.aspx#csimplestringt__getbuffer
I'm looking at existing C++ (Visual Studio) code that I don't want to change if I don't have to, but I need to know the expected behavior of CString::GetBuffer(). I'd appreciate it if someone could explain it or point me to some documentation on it.

Although the MSDN documentation doesn't really say what GetBuffer without a parameter does, the MFC source code reveals the answer:
return( m_pszData );
So it just returns a pointer to the underlying character buffer. (It also checks to see if the internal data is shared and forks/copies it first).
The code is in atlsimpstr.h
Complete function:
PXSTR GetBuffer()
{
CStringData* pData = GetData();
if( pData->IsShared() )
{
Fork( pData->nDataLength );
}
return( m_pszData );
}

tl;dr
Call CString::GetString().
This is asking the wrong question for the wrong reasons. Just to get it out of the way, here is the answer from the documentation:
Return Value
An PXSTR pointer to the object's (null-terminated) character buffer.
This is true for both overloads, with and without an explicit length argument. When calling the overload taking a length argument, the internal buffer may get resized to accommodate for increased storage requirements, prior to returning a pointer to that buffer.
From this comment, it becomes apparent that the question is asking for the wrong thing altogether. To learn why, you need to understand the purpose of the GetBuffer() family of class members: to temporarily disable enforcement of CString's class invariants [1] for modification, until they are re-established by calling one of the ReleaseBuffer() members. The primary use case for this is to interface with C code (like the Windows API).
The important information is:
GetBuffer() should only be called if you plan to directly modify the contents of the stored character sequence.
Every call to GetBuffer() must be matched with a call to ReleaseBuffer() before using any other CString class member [2]. Note in particular that operator PCXSTR() and the destructor are class members.
As long as you follow that protocol, the controlled character sequence will always be null-terminated.
Given your actual use case (Log.Print("%s\n", myCstring.GetBuffer())), none of the previous really applies. Since you do not plan to actually modify the string contents, you should access the immutable CString interface (e.g. GetString() or operator PCXSTR()) instead. This requires const-correct function signatures (TCHAR const* vs. TCHAR*). Failing that, use a const_cast if you can ensure that the callee will not mutate the buffer.
There are several benefits to this:
It is semantically correct. If all you want is a view into the character string, you do not need a pointer to a mutable buffer.
There are no superfluous copies of the contents. CString implements copy-on-write semantics. Requesting a mutable buffer necessitates copying the contents for shared instances, even if you are going to throw that copy away immediately after evaluating the current expression.
The immutable interface cannot fail. No exceptions are thrown when calling operator PCXSTR() or GetString().
[1] The relevant invariants are: (a) the controlled sequence of characters is always null-terminated; (b) GetLength() returns the count of characters in the controlled sequence, excluding the null terminator.
[2] It is only strictly required to call one of the ReleaseBuffer() implementations if the contents were changed. This is often not immediately obvious from looking at the source code, so always calling ReleaseBuffer() is the safe option.

Documentation is inconclusive. Looking at ATL sources available here (https://github.com/dblock/msiext/blob/d8898d0c84965622868b1763958b68e19fd49ba8/externals/WinDDK/7600.16385.1/inc/atl71/atlsimpstr.h - I do not claim to know if they are official or not) it looks like GetBuffer() without arguments returns the current buffer, cloning it before if it is shared.
On the other hand, GetBuffer(int) with size is going to check (through the call to PrepareWrite and possibly PrepareWrite2) if the current buffer size is greater than requested, and if it is not, it will allocate the new buffer - thus matching MSDN description.
On a side note, PrepareWrite seems to become quite creative in how it checks for two conditions:
PXSTR PrepareWrite( __in int nLength )
{
CStringData* pOldData = GetData();
int nShared = 1-pOldData->nRefs; // nShared < 0 means true, >= 0 means false
int nTooShort = pOldData->nAllocLength-nLength; // nTooShort < 0 means true, >= 0 means false
if( (nShared|nTooShort) < 0 ) // If either sign bit is set (i.e. either is less than zero), we need to copy data
{
PrepareWrite2( nLength );
}
return( m_pszData );
}
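To see why the combined test in PrepareWrite is equivalent to checking the two conditions separately, here is a small standalone sketch. The function names and plain int parameters are assumptions made for illustration and are not part of ATL.

```cpp
// Sign-bit form, mirroring PrepareWrite: each difference is negative
// exactly when its condition holds, and OR-ing two ints yields a
// negative value exactly when at least one of them is negative.
bool needsCopy(int refs, int allocLength, int requestedLength) {
    int shared = 1 - refs;                          // < 0 when refs > 1
    int tooShort = allocLength - requestedLength;   // < 0 when too small
    return (shared | tooShort) < 0;
}

// The same decision written as two ordinary comparisons.
bool needsCopyPlain(int refs, int allocLength, int requestedLength) {
    return refs > 1 || allocLength < requestedLength;
}
```

The sign-bit form trades readability for a single branch. Both forms agree as long as the subtractions do not overflow.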

Windows API functions often require a character buffer of a certain length as input. In that case, use the GetBuffer(int) version. The following code snippet illustrates this, the difference between GetBuffer() and GetString(), and the importance of calling ReleaseBuffer() after calling GetBuffer():
CStringW FullName;
if(::GetModuleFileNameW(nullptr,FullName.GetBuffer(MAX_PATH), MAX_PATH) <= 0)
return 0; //GetBuffer() returns PXSTR
FullName.ReleaseBuffer(); //Don't forget!
FullName = L"Path and Name: " + FullName;
std::wcout << FullName.GetString() << L"\n"; //GetString() returns PCXSTR

Manipulating std::string

The code below does not give any fault, error, or warning (although I think there might be some illegal memory access happening). Strangely, the size of the string comes out differently depending on which of two methods (strlen and std::string::size()) is used:
strlen(l_str.c_str()) gives the size as 1500, whereas
l_str.size() gives the size as 0.
#include <string.h>
#include <string>
#include <stdio.h>
#include<iostream>
using namespace std;
void strRet(void* data)
{
char ar[1500];
memset(ar,0,1500);
for(int i=0;i<1500;i++)
ar[i]='a';
memset(data,0,1500); // This might not be correct but it works fine
memcpy(data,ar,1500);
}
int main()
{
std::string l_str;
cout<<endl<<"size before: "<<l_str.length();
int var=10;
strRet((void *)l_str.c_str());
printf("Str after call: %s\n",l_str.c_str());
cout<<endl<<"size after(using strlen): "<<strlen(l_str.c_str());
cout<<endl<<"Size after(using size function): "<<l_str.size();
printf("var value after call: %d\n",var);
return 0;
}
Please suggest, if I'm doing something which I'm not supposed to do!
Also, I wanted to know which memory bytes are being set to 0 when I do memset(data,0,1500);. Suppose my string variable's starting address is 100: does memset then set the memory range [100,1599] to 0, or is it setting some other memory range?

memset(data,0,1500); // This might not be correct but it works fine
It isn't correct, and it doesn't "work fine". This is Undefined Behaviour, and you're making the common mistake of assuming that if it compiles, and your computer doesn't instantly catch fire, everything is fine.
It really isn't.
I've done something which I wasn't supposed to do!
Yes, you have. You took a pointer to a std::string, a non-trivial object with its own state and behaviour, asked it for the address of some memory it controls, and cast that to void*.
There's no reason to do that, you should very rarely ever see void* in C++ code, and seeing C-style casts to any type is pretty worrying.
Don't take void* pointers into objects with state and behaviour like std::string until you understand what you're doing and why this is wrong. Then, when that day comes, you still won't do it because you'll know better.
We can look at the first problem in some fine detail, if it helps:
(void *)l_str.c_str()
what does c_str() return? A pointer to some memory owned by l_str
where is this memory? No idea, that's l_str's business. If this standard library implementation uses the small string optimization, it may be inside the l_str object. If not, it may be dynamically allocated.
how much memory is allocated at this location? No idea, that's l_str's business. All we can say for sure is that there is at least one legally-addressable char (l_str.c_str()[0] == '\0') and that it's legal to use the address l_str.c_str()+1 (but only as a one-past-the-end pointer, so you can't dereference it)
So, the statement
strRet((void *)l_str.c_str());
passes strRet a pointer to a location containing one or more addressable chars, of which the first is zero. That's everything we can say about it.
Now let's look again at the problematic line
memset(data,0,1500); // This might not be correct but it works fine
why would we expect there to be 1500 chars at this location? If you'd documented strRet as requiring a buffer of at least 1500 allocated chars, would it look reasonable to actually pass l_str.c_str() when you know l_str has just been default constructed as an empty string? It's not like you asked l_str to allocate that storage for you.
You could start to make this work by giving l_str a chance to allocate the memory you intend to write, by calling
l_str.reserve(1500);
before calling strRet. This still won't notify l_str that you filled it with 'a's though, because you did that by changing the raw memory behind its back.
If you want this to work correctly, you could replace the entirety of strRet with
std::string l_str(1500, 'a');
or, if you want to change an existing string correctly, with
#include <algorithm> // std::fill_n
#include <iterator>  // std::back_inserter
#include <string>
void strRet(std::string& out) {
// this just speeds it up, since we know the size in advance
out.reserve(1500);
// this is in case the string wasn't already empty
out.clear();
// and this actually does the work
std::fill_n(std::back_inserter(out), 1500, 'a');
}
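If the filling function has to keep a raw pointer interface (for example because it is a C API), one possible sketch is to size the string first and pass both the pointer and the length. fillRaw and makeFilled are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a C-style function that writes exactly `size` chars.
void fillRaw(char* data, std::size_t size) {
    std::memset(data, 'a', size);
}

std::string makeFilled(std::size_t count) {
    // Construct with `count` characters so the string owns that many
    // writable chars and already knows its own length.
    std::string result(count, '\0');
    if (count > 0) {
        fillRaw(&result[0], result.size());
    }
    return result;
}
```

Unlike reserve(), constructing (or resizing) the string to the final length makes size() and strlen(c_str()) agree afterwards, because the string itself recorded the length before the raw write happened.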

Cannot safely delete LPTSTR allocation

Consider:
CCustomDateTime::CCustomDateTime()
{
LPTSTR result = new TCHAR[1024];
time_t _currentTime_t = time(0);
tm now;
localtime_s(&now, &_currentTime_t);
_tasctime_s(result, _tcslen(result), &now);
_currentTime = result;
delete[] result; // Error occurs here
}
CCustomDateTime::~CCustomDateTime()
{
}
__int64 CCustomDateTime::CurrentTimeAsInt64()
{
return _currentTime_t;
}
LPTSTR CCustomDateTime::CurrentTimeAsString()
{
return _currentTime;
}
I am unable to figure out the safest place to call delete[] on result.
If delete[] is ignored everything is fine, but otherwise an error occurs:
HEAP CORRUPTION DETECTED at line delete[]

_tcslen(result) is not doing what you think it is.
Change
_tasctime_s(result, _tcslen(result), &now);
to
_tasctime_s(result, 1024, &now);

There are a few problems with your code that I can see:
You don't check any of the function calls for errors. Don't ignore the return value. Use it to check for errors.
The second argument to _tasctime_s is the number of elements in the buffer provided. In other words, 1024. But you pass _tcslen(result) which is the length of the null-terminated string. Not only is that the wrong value, but result is at that point not initialised, so your code has undefined behaviour.
You assign a value to _currentTime, and then immediately delete that memory. So, _currentTime is a stale pointer. Any attempt to read from that memory is yet more undefined behaviour.
I don't want to tell you what your code should be, because you have only given us a tiny window into what you are trying to achieve. Dynamically allocating a fixed length array seems pointless. You may as well use automatically allocated storage. Of course, if you do want to return the memory to the caller, then dynamic allocation makes sense, but in that case then surely the caller would be responsible for calling delete[]. Since this code is clearly C++ I have to wonder why you are using raw memory allocation. Why not use standard library classes like std::string?
Looking at your update to the question, you could deallocate the memory in the destructor of your class. Personally though, I would recommend learning about the standard library classes that will greatly simplify your code.
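As a rough illustration of the "automatic storage plus std::string" suggestion, here is a portable sketch. It is not the asker's class: the name CustomDateTime, the use of std::gmtime and std::strftime instead of localtime_s and _tasctime_s, and the output format are all assumptions chosen to keep the example self-contained.

```cpp
#include <ctime>
#include <string>

class CustomDateTime {
public:
    explicit CustomDateTime(std::time_t when) : m_time(when) {
        char buffer[64];  // automatic storage, nothing to delete[]
        // std::gmtime returns a pointer to shared static storage, so
        // this sketch is not thread-safe.
        const std::tm* parts = std::gmtime(&m_time);
        // strftime receives the real buffer size, not a string length.
        if (parts != nullptr &&
            std::strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", parts) > 0) {
            m_text = buffer;  // std::string makes its own copy
        }
    }

    long long asInt64() const { return static_cast<long long>(m_time); }
    const std::string& asString() const { return m_text; }

private:
    std::time_t m_time;
    std::string m_text;
};
```

Because the string member owns its copy of the text, there is no raw allocation to release and no stale pointer left behind when the constructor returns.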

_tcslen maps to strlen or wcslen depending on whether you are using ANSI or Unicode, respectively.
Both these functions return the length of a string, not the size of the buffer. In other words, they take a pointer to the first character of a string and continuously increment the pointer in search of a null terminator.
Calling these functions on an uninitialized buffer is undefined behavior because there's a very good chance that the pointer will get incremented out of the array bounds and elsewhere into the process' memory.

What unexpected behaviour can returning a pointer to a char array member cause?

Okay, so. I've been working on a class project (we haven't covered std::string and std::vector yet though obviously I know about them) to construct a time clock of sorts. The main portion of the program expects time and date values as formatted c-strings (e.g. "12:45:45", "12/12/12" etc.), and I probably could have kept things simple by storing them the same way in my basic class. But, I didn't.
Instead I did this:
class UsageEntry {
public:
....
typedef time_t TimeType;
typedef int IDType;
...
// none of these getters are thread safe
// furthermore, the char* the getters return should be used immediately
// and then discarded: its contents will be modified on the next call
// to any of these functions.
const char* getUserID();
const char* getDate();
const char* getTimeIn();
const char* getTimeOut();
private:
IDType m_id;
TimeType m_timeIn;
TimeType m_timeOut;
char m_buf[LEN_MAX];
};
And one of the getters (they all do basically the same thing):
const char* UsageEntry::getDate()
{
strftime(m_buf, LEN_OF_DATE, "%D", localtime(&m_timeIn));
return m_buf;
}
And here is a function that uses this pointer:
// ==== TDataSet::writeOut ====================================================
// writes an entry to the output file
void TDataSet::writeOut(int index, FILE* outFile)
{
// because of the m_buf kludge, this cannot be a single
// call to fprintf
fprintf(outFile, "%s,", m_data[index].getUserID());
fprintf(outFile, "%s,", m_data[index].getDate());
fprintf(outFile, "%s,", m_data[index].getTimeIn());
fprintf(outFile, "%s\n", m_data[index].getTimeOut());
fflush(outFile);
} // end of TDataSet::writeOut
How much trouble will this cause? Or to look at it from another angle, what other sorts of interesting and !!FUN!! behaviour can this cause? And, finally, what can be done to fix it (besides the obvious solution of using strings/vectors instead)?
Somewhat related: How do the C++ library functions that do similar things handle this? e.g. localtime() returns a pointer to a struct tm object, which somehow survives the end of that function call at least long enough to be used by strftime.

There is not enough information to determine if it will cause trouble because you do not show how you use it. As long as you document the caveats and keep them in mind when using your class, there won't be issues.
There are some common gotchas to watch out for, but hopefully these are common sense:
Deleting the UsageEntry will invalidate the pointers returned by your getters, since those buffers will be deleted too. (This is especially easy to run into if using locally declared UsageEntrys, as in MadScienceDream's example.) If this is a risk, callers should create their own copy of the string. Document this.
It does not look like m_timeIn is const, and therefore it may change. Calling the getter will modify the internal buffer and these changes will be visible to anything that has that pointer. If this is a risk, callers should create their own copy of the string. Document this.
Your getters are neither reentrant nor thread-safe. Document this.
It would be safer to have the caller supply a destination buffer and length as parameters. The function can return a pointer to that buffer for convenience. Taking the buffer and its length from the caller is how e.g. read works; fgets additionally returns the buffer pointer.
A strong API can avoid issues. Failing that, good documentation and common sense can also reduce the chance of issues. Behavior is only unexpected if nobody expects it, this is why documentation about the behavior is important: It generally eliminates unexpected behavior.
Think of it like the "CAUTION: HOT SURFACE" warning on top of a toaster oven. You could design the toaster oven with insulation on top so that an accident can't happen. Failing that, the least you can do is put a warning label on it and there probably won't be an accident. If there's neither insulation nor a warning, eventually somebody will burn themselves.
Now that you've edited your question to show some documentation in the header, many of the initial risks have been reduced. This was a good change to make.
Here is an example of how your usage would change if user-supplied buffers were used (and a pointer to that buffer returned):
// ==== TDataSet::writeOut ====================================================
// writes an entry to the output file
void TDataSet::writeOut(int index, FILE* outFile)
{
char userId[LEN_MAX], date[LEN_MAX], timeIn[LEN_MAX], timeOut[LEN_MAX];
fprintf(outFile, "%s,%s,%s,%s\n",
m_data[index].getUserID(userId, sizeof(userId)),
m_data[index].getDate(date, sizeof(date)),
m_data[index].getTimeIn(timeIn, sizeof(timeIn)),
m_data[index].getTimeOut(timeOut, sizeof(timeOut))
);
fflush(outFile);
} // end of TDataSet::writeOut

How much trouble will this cause? Or to look at it from another angle,
what other sorts of interesting and !!FUN!! behaviour can this cause?
And, finally, what can be done to fix it (besides the obvious solution
of using strings/vectors instead)?
Well there is nothing very FUN here, it just means that the results of your getter cannot outlive the corresponding instance of UsageEntry or you have a dangling pointer.
How do the C++ library functions that do similar things handle this?
e.g. localtime() returns a pointer to a struct tm object, which
somehow survives the end of that function call at least long enough to
be used by strftime.
The documentation of localtime says:
Return value
pointer to a static internal std::tm object on success, or NULL otherwise. The structure may be shared between
std::gmtime, std::localtime, and std::ctime, and may be overwritten on
each invocation.

The main problem here, as the main problem with most pointer based code, is the issue of ownership. The problem is the following:
const char* val;
{
UsageEntry ue;
val = ue.getDate();
}//ue goes out of scope
std::cout << val << std::endl;//SEGFAULT (maybe, really nasal demons)
Because val is actually owned by ue, you shoot yourself in the foot if they exist in different scopes. You COULD document this, but it is oh-so-much simpler to pass the buffer in as an argument (just like the strftime function does).
(Thanks to odedsh below for pointing this one out)
Another issue is that subsequent calls will blow away the info gained. The example odedsh used was
fprintf(outFile, "%s\n%s",ue.getUserID(), ue.getDate());
but the problem is more pervasive:
const char* id = ue.getUserID();
const char* date = ue.getDate();//Changes id!
This violates the "Principle of Least Astonishment" because... well, it's weird.
This design also breaks the rule of thumb that each class should do exactly one thing. In this case, UsageEntry both provides accessors to get the formatted time as a string AND manages that string's buffer.

Possible Memory Leak: new char[strlen()]

This is a fairly basic question and I am pretty sure I know the answer, but seeing as the consequence for being wrong is a segfault I figure I should ask. I have been using strlen() and the new char[] operator in the following way for quite some time now and just noticed something that threw up a red flag:
void genericCopy(char *somestring, char *someOtherString) {
someOtherString = new char[strlen(somestring)];
strcpy(someOtherString,somestring);
}
My question is, seeing as a string should be null terminated, should I be doing this as such:
void genericCopy(char *somestring, char *someOtherString) {
someOtherString = new char[strlen(somestring)+1];
strcpy(someOtherString,somestring);
someOtherString[strlen(someOtherString)] = '\0';
}
So far I have never had a problem with the first method, but that doesn't mean I'm doing it right. The length returned by strlen() is the number of characters in the string without the null terminator, so new isn't reserving space for '\0'... At least I don't think it is.

First of all, you should know that this function of yours is pointless to write; just use strdup (if available on your system). Note that strdup allocates with malloc, so its result must be released with free(), not delete[].
But yes, you need an additional byte to store the \0, so always do something like new char[strlen(somestring)+1];. However, there is no need to manually add the \0; strcpy already does this.
You should use something like Valgrind to discover this and similar bugs in your code.
There is however an additional problem in your code; your code will always leak someOtherString; it will not be returned to where you called it from. You either need to change your method to something like:
char *genericCopy(char *somestring) {
char *copy = new char[strlen(somestring)+1];
strcpy(copy,somestring);
return copy;
}
and then get the copy as follows:
copy = genericCopy(something);
Or you need to change your method to something like:
void genericCopy(char *somestring, char **copy) {
*copy = new char[strlen(somestring)+1];
strcpy(*copy,somestring);
}
and call it as:
genericCopy(something, &copy);
If you'll be using C++ you could also just change the method prototype to:
void genericCopy(char* somestring, char*& someOtherString)
and call it as:
genericCopy(something, copy);
Then someOtherString will be passed as a reference, and the new value you allocate to it will propagate outside of your method.

Yes, your suspicion is correct. You should be allocating an additional character and making sure the copied string is null-terminated. (strcpy() itself will do this, but when someone advises you to switch to strncpy(), as they no doubt will (it's safer!), you'll need to be extra careful, because it is NOT guaranteed to copy the '\0'.)
If you're already using C++, though, you may be well-advised to switch to using std::string. It's often an easier, less error-prone method of manipulating character arrays.
However, here's the further problem that you need to address. You are assigning your new character array to a COPY of someOtherString. You need to make some changes:
void genericCopy(char *somestring, char **someOtherString) {
*someOtherString = new char[strlen(somestring)+1];
strcpy(*someOtherString,somestring);
(*someOtherString)[strlen(somestring)] = '\0';
}
This way you will get back the new character buffer outside your function call.
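If neither strdup nor std::string fits, one further option is to return the copy through a smart pointer so the caller cannot forget delete[]. This is a sketch of that swapped-in technique (std::unique_ptr<char[]>), not what the answers above propose, and copyString is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Returns an owning copy of source, or an empty pointer if source is null.
std::unique_ptr<char[]> copyString(const char* source) {
    if (source == nullptr) return nullptr;
    const std::size_t length = std::strlen(source);
    // +1 reserves room for the terminating '\0'.
    std::unique_ptr<char[]> copy(new char[length + 1]);
    // Copying length + 1 bytes brings the terminator along.
    std::memcpy(copy.get(), source, length + 1);
    return copy;
}
```

The array is released with delete[] automatically when the returned std::unique_ptr goes out of scope, which addresses the leak as well as the missing byte.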
