Deep copy of TCHAR array is truncated - c++

I've created a class to test some functionality I need to use. Essentially the class takes a deep copy of the passed-in string and makes it available via a getter. I am using Visual Studio 2012. Unicode is enabled in the project settings.
The problem is that the memcpy operation yields a truncated string. The output looks like this:
THISISATEST: InstanceDataConstructor: Testing testing 123
Testing te_READY
where the first line is the check of the passed-in TCHAR* string and the second line is the output from populating the allocated memory with the memcpy operation. The expected output is "Testing testing 123".
Can anyone explain what is wrong here?
N.B. Got the #ifndef UNICODE typedefs from here: how-to-convert-tchar-array-to-stdstring
#ifndef INSTANCE_DATA_H//if not defined already
#define INSTANCE_DATA_H//then define it
#include <string>
//TCHAR is just a typedef, that depending on your compilation configuration, either defaults to char or wchar.
//Standard Template Library supports both ASCII (with std::string) and wide character sets (with std::wstring).
//All you need to do is to typedef String as either std::string or std::wstring depending on your compilation configuration.
//To maintain flexibility you can use the following code:
#ifndef UNICODE
typedef std::string String;
#else
typedef std::wstring String;
#endif
//Now you may use String in your code and let the compiler handle the nasty parts. String will now have constructors that lets you convert TCHAR to std::string or std::wstring.
class InstanceData
{
public:
InstanceData(TCHAR* strIn) : strMessage(strIn)//constructor
{
//Check to passed in string
String outMsg(L"THISISATEST: InstanceDataConstructor: ");//L for wide character string literal
outMsg += strMessage;//concatenate message
const wchar_t* finalMsg = outMsg.c_str();//prepare for outputting
OutputDebugStringW(finalMsg);//print the message
//Prepare TCHAR dynamic array. Deep copy.
charArrayPtr = new TCHAR[strMessage.size() +1];
charArrayPtr[strMessage.size()] = 0;//null terminate
std::memcpy(charArrayPtr, strMessage.data(), strMessage.size());//copy characters from array pointed to by the passed in TCHAR*.
OutputDebugStringW(charArrayPtr);//print the copied message to check
}
~InstanceData()//destructor
{
delete[] charArrayPtr;
}
//Getter
TCHAR* getMessage() const
{
return charArrayPtr;
}
private:
TCHAR* charArrayPtr;
String strMessage;//is used to conveniently ascertain the length of the passed in underlying TCHAR array.
};
#endif//header guard

A solution without all of the dynamically allocated memory.
#include <tchar.h>
#include <vector>
//...
class InstanceData
{
public:
InstanceData(TCHAR* strIn) : strMessage(strIn)
{
charArrayPtr.insert(charArrayPtr.begin(), strMessage.begin(), strMessage.end());
charArrayPtr.push_back(0);//null terminate
}
TCHAR* getMessage()
{ return &charArrayPtr[0]; }
private:
String strMessage;
std::vector<TCHAR> charArrayPtr;
};
This does what your class does, the major difference being that it contains no hand-rolled dynamic allocation code. The class is also safely copyable, unlike the code with the dynamic allocation (which lacked a user-defined copy constructor and assignment operator).
The std::vector class has superseded new[]/delete[] in almost all circumstances, because vector stores its data in contiguous memory, no different from the memory that new[] returns.
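The copy-safety point can be checked with a small portable sketch. It assumes wchar_t in place of TCHAR so that it builds without Windows headers; MessageHolder and copyIsIndependent are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical portable stand-in for InstanceData: wchar_t replaces TCHAR.
class MessageHolder
{
public:
    explicit MessageHolder(const wchar_t* text)
    {
        assert(text != nullptr);            // precondition: a valid C string
        const std::wstring s(text);
        buffer_.assign(s.begin(), s.end()); // copy the characters
        buffer_.push_back(L'\0');           // null terminate
    }

    // Pointer to the owned, null-terminated copy.
    const wchar_t* message() const { return buffer_.data(); }

private:
    std::vector<wchar_t> buffer_; // owns the storage; copying it is a deep copy
};

// Copies the holder and reports whether the copy has its own storage
// holding the same characters.
inline bool copyIsIndependent(const MessageHolder& original)
{
    MessageHolder copy = original; // compiler-generated copy constructor
    return copy.message() != original.message()
        && std::wstring(copy.message()) == original.message();
}
```

Because the vector member copies itself, the compiler-generated copy constructor and assignment operator are already correct, and no destructor is needed.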

Please pay attention to the following lines in your code:
// Prepare TCHAR dynamic array. Deep copy.
charArrayPtr = new TCHAR[strMessage.size() + 1];
charArrayPtr[strMessage.size()] = 0; // null terminate
// Copy characters from array pointed to by the passed in TCHAR*.
std::memcpy(charArrayPtr, strMessage.data(), strMessage.size());
The third argument to memcpy() is the count of bytes to copy.
If the string is a simple ASCII string stored in std::string, then the count of bytes is the same as the count of ASCII characters.
But if the string is a wchar_t Unicode UTF-16 string, then each wchar_t occupies 2 bytes in Visual C++ (with GCC things are different, but this is Windows Win32/C++ code compiled with VC++, so let's just focus on VC++). Your 19-character string therefore needs 38 bytes, but only 19 were copied, which is why the output breaks off after "Testing te".
So you have to scale the size count for memcpy() by the size of a wchar_t, e.g.:
memcpy(charArrayPtr, strMessage.data(), strMessage.size() * sizeof(TCHAR));
If you compile in Unicode (UTF-16) mode, then TCHAR expands to wchar_t, sizeof(wchar_t) is 2, and the content of your original string is properly deep-copied.
As an alternative, for Unicode UTF-16 strings in VC++ you may also use wmemcpy(), which treats wchar_t as its "unit of copy". In this case, you don't have to scale the size by sizeof(wchar_t).
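The two variants can be put side by side in a minimal, portable sketch. It assumes std::wstring input and a std::vector<wchar_t> destination instead of the raw TCHAR array; the function names are made up for illustration.

```cpp
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Deep copy using memcpy: the count is in BYTES, so scale by sizeof(wchar_t).
inline std::vector<wchar_t> copyWithMemcpy(const std::wstring& src)
{
    std::vector<wchar_t> dst(src.size() + 1, L'\0'); // +1 keeps the terminator
    if (!src.empty())
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(wchar_t));
    return dst;
}

// Deep copy using wmemcpy: the count is in wchar_t units, no scaling needed.
inline std::vector<wchar_t> copyWithWmemcpy(const std::wstring& src)
{
    std::vector<wchar_t> dst(src.size() + 1, L'\0');
    if (!src.empty())
        std::wmemcpy(dst.data(), src.data(), src.size());
    return dst;
}
```

Writing sizeof(wchar_t) rather than a literal 2 keeps the code correct on compilers where wchar_t is 4 bytes wide.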
As a side note, in your constructor you have:
InstanceData(TCHAR* strIn) : strMessage(strIn)//constructor
Since strIn is an input string parameter, consider passing it as a pointer to const, i.e.:
InstanceData(const TCHAR* strIn)

Related

Assign variable wstring to Char array in C++

I have char array as follows:
TCHAR name[256] = L"abc";
Also I have another wstring vector as follows,
std::vector<std::wstring> nameList;
nameList.push_back(L"cde");
nameList.push_back(L"fgh");
I want to assign the first element of the nameList vector to the name array.
Can anyone help me with that?
You can use std::copy; name is an array with a bound, but when used as a function argument it decays to a pointer to its first element, which satisfies the requirements for an output iterator.
So you can:
wchar_t name[256] = L"abc";
std::vector<std::wstring> nameList;
nameList.push_back(L"cde");
nameList.push_back(L"fgh");
std::copy(nameList.front().begin(), nameList.front().end(), name);
Note that this will not add a trailing \0 terminator to the buffer. If you want to replace/overwrite name, you might as well just use std::wstring and save yourself some hassle.
Given your question and the assumption that you must use an array instead of a wstring, your best bet may be to use either std::copy or even an old fashioned memcpy. However these are dangerous for the following two reasons:
If TCHAR is not actually a wchar_t, there are likely to be memory errors.
If nameList contains a string that is longer than 255 TCHAR characters you will have a buffer overflow.
That said, you can do this safely with the following:
if (nameList[0].size() >= 256) {
throw std::length_error("string too long");
}
std::copy(nameList[0].begin(), nameList[0].end(), name);
name[nameList[0].size()] = TCHAR(0);
You could also add a static_assert to force a compiler error if TCHAR is not a wchar_t, but it probably isn't necessary as the copy would perform any implicit conversion on a character by character basis.
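The bounds check, copy and termination above can be wrapped in one helper that deduces the array size, so the 256 does not have to be repeated. This is a sketch that assumes wchar_t rather than TCHAR; copyToArray is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Copies src into a fixed-size wchar_t array and null-terminates it.
// Throws std::length_error if src plus the terminator does not fit.
template <std::size_t N>
void copyToArray(const std::wstring& src, wchar_t (&dst)[N])
{
    if (src.size() >= N)
        throw std::length_error("string too long");
    std::copy(src.begin(), src.end(), dst);
    dst[src.size()] = L'\0';
}
```

With the declarations from the question, the call would be copyToArray(nameList.front(), name).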

What is the difference and the relationship of char and CString [duplicate]

Can someone explain to me the difference and the relationship between char * and CString? Thanks.
There are a few important differences.
char * is a pointer to char. Generally you can't tell whether it points to a single char or to the beginning of a string, or what the length is. All of those things are dictated by program logic and conventions; for example, standard C functions take const char * as input. You need to manage the memory allocated for strings manually.
CString is a typedef. Depending on your program's compilation options, it resolves to either the CStringA or the CStringW class. There are differences and similarities.
The difference is that CStringA operates on non-Unicode data (similar to char*), and CStringW is a Unicode string (similar to wchar_t*).
Both classes, however, are equivalent in the aspect of string manipulation and storage management. They are closer to the standard C++ std::string and std::wstring classes.
Apart from that, both CStringA and CStringW provide the capability to convert strings to and from Unicode form.
A CString will be an array of char, and a char* will be a pointer into that array of char with which you can iterate over the characters of the string.
Actually from MSDN:
CString is based on the TCHAR data type. If the symbol _UNICODE is defined for your program, TCHAR is defined as type wchar_t, a 16-bit character type; otherwise, it is defined as char, the normal 8-bit character type. Under Unicode, then, CString objects are composed of 16-bit characters. Without Unicode, they are composed of 8-bit char type.
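The compile-time switch described in that quote can be imitated in a few portable lines. All names here (DEMO_UNICODE, DemoChar, DEMO_TEXT, DemoString, demoLength) are hypothetical stand-ins for _UNICODE, TCHAR, _T() and CString; none of them come from the Windows headers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names: DEMO_UNICODE, DemoChar and DEMO_TEXT stand in for
// _UNICODE, TCHAR and _T()/TEXT(); they are not part of any real header.
#ifdef DEMO_UNICODE
typedef wchar_t DemoChar;
#define DEMO_TEXT(s) L##s
#else
typedef char DemoChar;
#define DEMO_TEXT(s) s
#endif

// A string type that follows the character type, much as CString follows TCHAR.
typedef std::basic_string<DemoChar> DemoString;

// Length in characters, whichever character type was selected.
inline std::size_t demoLength(const DemoChar* text)
{
    return DemoString(text).size();
}
```

Building the same source with and without the macro defined switches every DemoString between wide and narrow characters, which is the relationship between CString, CStringW and CStringA described in the answers.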
CString is a class packed with functionality.
char * is just a regular C++ data type.
CString is used mostly in MFC applications.
CString is a sequence of TCHARs rather than chars. The main difference is that if UNICODE is defined, CString will be a sequence of wchar_t. Depending on that macro, CString is typedef-ed either to CStringA or to CStringW. Another major difference is that CString is a class while char* is simply a pointer to a character.
Depending on the type of TCHAR, CString can be either CStringA or CStringW.
That said, CString is a wrapper over an array of chars, that enables you to easily treat that array of chars as a string, and operate on it in manners relevant to the string type.
For the relationship between them, here is something that illustrates it easily. You can convert between char * and CString like this:
CString str = "abc"; // const char[4] (or char *) to CString
and
const char * p = str.GetString(); // CString to const char * (ANSI builds, where TCHAR is char)
A CString is a class and provides lots of functionality that a char * doesn't. A char * is just a pointer to a char or to an array of chars.
A CString contains a buffer that is roughly the same as a char *: LPTSTR GetBuffer( int nMinBufLength );
As for the difference between LPTSTR and char *: LPTSTR is a typedef for TCHAR *, so it is char * in ANSI builds and wchar_t * in Unicode builds.
CString is a wrapper class around a char* to provide some useful additional functions and to hide the memory allocation/deallocation from the user.
There is not much difference in performance terms so if you are using MFC classes, you might as well use a CString.

C string to wide C string assignment

I'm a little confused about C strings and wide C strings. For the sake of this question, assume that I am using Microsoft Visual Studio 2010 Professional. Please let me know if any of my information is incorrect.
I have a struct with a const wchar_t* member which is used to store a name.
struct A
{
const wchar_t* name;
};
When I assign object 'a' a name as so:
int main()
{
A a;
const wchar_t* w_name = L"Tom";
a.name = w_name;
return 0;
}
That is just copying the memory address that w_name points to into a.name. Now w_name and a.name are both wide character pointers which point to the same address in memory.
If I am correct, then I am wondering what to do about a situation like this. I am reading in a C string from an XML attribute using tinyxml2.
tinyxml2::XMLElement* pElement;
// ...
const char* name = pElement->Attribute("name");
After I have my C string, I am converting it to a wide character string as follows:
size_t newsize = strlen(name) + 1;
wchar_t * wcName = new wchar_t[newsize];
size_t convertedChars = 0;
mbstowcs_s(&convertedChars, wcName, newsize, name, _TRUNCATE);
a.name = wcName;
delete[] wcName;
If I am correct so far, then the line:
a.name = wcName;
is just copying the memory address of the first character of array wcName into a.name. However, I am deleting wcName directly after assigning this pointer which would make it point to garbage.
How can I convert my C string into a wide character C string and then assign it to a.name?
The easiest approach is probably to task your name member with the management of the memory. This, in turn, is easily done by declaring it as
std::wstring name;
A std::wstring does not distinguish between constness of its contents and constness of the object: you can't make the individual characters const, and making the entire object const would prevent it from being assigned to.
You can do this with a std::wstring without relying on the additional temporary conversion buffer allocation and destruction. This is not tremendously important unless you're overly concerned about heap fragmentation or are on a limited system (e.g. Windows Phone). It just takes a little setup on the front side. Let the standard library manage the memory for you (with a little nudge).
class A
{
...
std::wstring a;
};
// Convert the string (I'm assuming it is UTF8) to wide char
int wlen = MultiByteToWideChar(CP_UTF8, 0, name, -1, NULL, 0); // last argument is a character count, so pass 0
if (wlen > 0)
{
// reserve space. std::wstring gives us the terminator slot
// for free, so don't include that. MB2WC above returns the
// length *including* the terminator.
a.resize(wlen-1);
MultiByteToWideChar(CP_UTF8, 0, name, -1, &a[0], wlen);
}
else
{ // no conversion available/possible.
a.clear();
}
On a complete side-note, you can build TinyXML to use the standard library and std::string rather than char *, which doesn't really help you much here, but may save you a ton of future strlen() calls later on.
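For readers without the Win32 API, a rough portable counterpart can be sketched with the standard std::mbstowcs. Unlike the CP_UTF8 call above, it converts according to the current C locale, so non-ASCII UTF-8 input would first need a suitable std::setlocale call. toWide and NamedItem are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Converts a multibyte C string to std::wstring using the current C locale.
// Returns an empty string if src is null or contains an invalid sequence.
inline std::wstring toWide(const char* src)
{
    if (src == nullptr)
        return std::wstring();

    // Each multibyte character yields at most one wide character,
    // so strlen(src) + 1 elements always suffice.
    std::vector<wchar_t> buffer(std::strlen(src) + 1, L'\0');
    const std::size_t written = std::mbstowcs(buffer.data(), src, buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return std::wstring();

    return std::wstring(buffer.data(), written);
}

// The struct owns its name, so there is no pointer left to dangle.
struct NamedItem
{
    std::wstring name;
};
```

Assigning the result with item.name = toWide(name) copies the characters into the member, so the temporary buffer can be released immediately without leaving a dangling pointer.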
As you correctly mentioned, a.name is just a pointer and does not own any string storage. You must manage the storage manually using new or a static/scoped array.
To avoid this bookkeeping, just use one of the available string classes: CStringW from ATL (easy to use but MS-specific) or std::wstring from the STL (standard C++, but not as easy to convert from char*):
#include <atlstr.h>
// Conversion ANSI -> Wide is automatic
const CStringW name(pElement->Attribute("name"));
Unfortunately, using std::wstring with char* is not as easy.
See the conversion function here: How to convert std::string to LPCWSTR in C++ (Unicode)

CString : What does (TCHAR*)(this + 1) mean?

In the CString header file (be it Microsoft's or Open Foundation Classes - http://www.koders.com/cpp/fid035C2F57DD64DBF54840B7C00EA7105DFDAA0EBD.aspx#L77 ), there is the following code snippet
struct CStringData
{
long nRefs;
int nDataLength;
int nAllocLength;
TCHAR* data() { return (TCHAR*)(&this[1]); };
...
};
What does the (TCHAR*)(&this[1]) indicate?
The CStringData struct is used in the CString class (http://www.koders.com/cpp/fid100CC41B9D5E1056ED98FA36228968320362C4C1.aspx).
Any help is appreciated.
CString has lots of internal tricks that make it look like a normal string when passed to, e.g., printf-style varargs (...) functions, despite actually being a class, without having to cast it to LPCTSTR in the argument list. Trying to understand a single trick or function in the CString implementation in isolation is therefore hard going. (The data function is an internal function that returns the 'real' buffer associated with the string.)
There's a book, MFC Internals that goes into it, and IIRC the Blaszczak book might touch it.
EDIT: As for what the expression actually translates to in terms of raw C++:-
TCHAR* data() { return (TCHAR*)(&this[1]); };
This says: "pretend you're actually the first entry in an array of items allocated together". The second item isn't actually a CStringData, though; it's a normal NUL-terminated buffer of either Unicode or normal characters, i.e., an LPTSTR.
Another way of expressing the same thing is:
TCHAR* data() { return (TCHAR*)(this + 1); };
When you add 1 to a pointer to T, you actually add 1 * sizeof(T) in terms of a raw memory address. So if a CStringData is located at 0x00000010 and, say, sizeof(CStringData) is 12, data() returns a pointer to a NUL-terminated character buffer starting at 0x0000001C.
But just understanding this one thing out of context isn't necessarily a good idea.
Why do you need to know?
It returns the memory area that is immediately after the CStringData structure as an array of TCHAR characters.
You can understand why they are doing this if you look at the CString.cpp file:
static const struct {
CStringData data;
TCHAR ch;
} str_empty = {{-1, 0, 0}, 0};
CStringData* pData = (CStringData*)mem_alloc(sizeof(CStringData) + size*sizeof(TCHAR));
They use this trick so that a CString looks like a normal data buffer: when you ask for the data, it skips the CStringData structure and points directly at the real character buffer, like a char*.
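The header-plus-trailing-buffer layout can be reproduced in a small portable sketch. StringHeader and makeString are hypothetical names loosely modelled on CStringData; this illustrates the technique and is not the actual MFC implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>

// Hypothetical header, loosely modelled on CStringData.
struct StringHeader
{
    long refs;
    int  dataLength;
    int  allocLength;

    // The characters live immediately after this header in the same block.
    wchar_t* data() { return reinterpret_cast<wchar_t*>(this + 1); }
};

// Allocates header and character buffer in ONE block and copies text into it.
// Returns nullptr if the allocation fails; release the block with std::free.
inline StringHeader* makeString(const wchar_t* text)
{
    assert(text != nullptr); // precondition: a valid C string
    const std::size_t length = std::wcslen(text);
    void* block = std::malloc(sizeof(StringHeader) + (length + 1) * sizeof(wchar_t));
    if (block == nullptr)
        return nullptr;

    StringHeader* header = static_cast<StringHeader*>(block);
    header->refs = 1;
    header->dataLength = static_cast<int>(length);
    header->allocLength = static_cast<int>(length);
    std::wmemcpy(header->data(), text, length + 1); // includes the terminator
    return header;
}
```

A single allocation holds both the bookkeeping and the characters, and the pointer returned by data() can be handed to any function that expects a plain null-terminated string.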

CStringT to char[]

I'm trying to make changes to some legacy code. I need to fill a char[] ext with a file extension gotten using filename.Right(3). Problem is that I don't know how to convert from a CStringT to a char[].
There has to be a really easy solution that I'm just not realizing...
TIA.
If you have access to ATL, which I imagine you do if you're using CString, then you can look into the ATL conversion classes like CT2CA.
CString fileExt = _T ("txt");
CT2CA fileExtA (fileExt);
If a conversion needs to be performed (as when compiling for Unicode), then CT2CA allocates some internal memory and performs the conversion, destroying the memory in its destructor. If compiling for ANSI, no conversion needs to be performed, so it just hangs on to a pointer to the original string. It also provides an implicit conversion to const char * so you can use it like any C-style string.
This makes conversions really easy, with the caveat that if you need to hang on to the string after the CT2CA goes out of scope, then you need to copy the string into a buffer under your control (not just store a pointer to it). Otherwise, the CT2CA cleans up the converted buffer and you have a dangling reference.
Well, you can always do this, even in Unicode:
char str[4];
strcpy( str, CStringA( cString.Right( 3 ) ).GetString() );
If you know you AREN'T using Unicode, then you could just do:
char str[4];
strcpy( str, cString.Right( 3 ).GetString() );
All the first code block does is convert the last 3 characters into a non-Unicode string (CStringA; CStringW is always Unicode, and CStringT depends on whether the UNICODE define is set) and then get that string as a plain char string.
First use CStringA to make sure you're getting char and not wchar_t. Then cast it to (const char *) to get a pointer to the string, and use strcpy or something similar to copy it to your destination.
If you're completely sure that you'll always be copying 3 characters, you could just do it the simple way:
ext[0] = filename[filename.GetLength()-3];
ext[1] = filename[filename.GetLength()-2];
ext[2] = filename[filename.GetLength()-1];
ext[3] = 0;
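A bounds-aware version of the same idea is sketched below. It uses std::string as a stand-in for CStringA so that it builds without ATL/MFC; copyExtension is a hypothetical helper, and names shorter than the buffer are copied whole instead of being indexed out of range.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies the last (N - 1) characters of filename into ext and null-terminates.
// std::string stands in for CStringA here; shorter names are copied whole.
template <std::size_t N>
void copyExtension(const std::string& filename, char (&ext)[N])
{
    static_assert(N > 0, "destination must hold at least the terminator");
    const std::size_t count = filename.size() < N - 1 ? filename.size() : N - 1;
    std::memcpy(ext, filename.data() + (filename.size() - count), count);
    ext[count] = '\0';
}
```

With a CStringA, the same pattern would take the length from GetLength() and the characters from GetString().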
I believe this is what you are looking for:
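CStringA theString( "This is a test" ); // CStringA holds char even in Unicode builds
char* mychar = new char[theString.GetLength()+1]; // caller must delete[] mychar
strcpy(mychar, theString); // _tcscpy would expect wchar_t* in Unicode builds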
If I remember my old school MS C++.
You do not specify where the CStringT type comes from. It could be anything, including your own string handling class. Assuming it is CStringT from the MFC/ATL library available in Visual C++, you have a few options.
You haven't said whether you compile with or without Unicode, so this example uses TCHAR rather than char:
CStringT<TCHAR, StrTraitMFC<TCHAR, ChTraitsCRT<TCHAR> > > file(TEXT("test.txt"));
TCHAR* file1 = new TCHAR[file.GetLength() + 1];
_tcscpy(file1, file);
If you use CStringT specialised for ANSI strings, then:
std::string file1(CStringA(file).GetString()); // GetString() keeps this from being parsed as a function declaration
char const* pfile = file1.c_str(); // to copy to char[] buffer