Passing PChar / char* between C++ and Delphi DLL

I have a C++ program that calls a Delphi DLL to initialize a buffer of chars.
I am testing the interface to make sure the data are passed correctly:
In C++ program:
char * pMsg = (char*)malloc(3); //allocate buffer
Init(pMsg); //Delphi DLL function
In Delphi DLL:
procedure Init(pMsg:PChar);
var
pHardcodedMsg:PChar;
begin
pHardcodedMsg:= '123';
CopyMemory(pMsg, pHardcodedMsg, Length(pHardcodedMsg));
end;
But when I call printf((const char*)pMsg) in C++,
it shows me "123" followed by some garbage characters.
Why is this so?
How can I successfully place an array of char into the buffer and have the string printed out correctly?

Delphi strings are not NULL-terminated the way C strings are (a Pascal string stores its length at the beginning instead, IIRC), and CopyMemory copies only the bytes you ask for, so you need to put a 0 at the end yourself: C/C++ uses it to determine where the string data ends.
The usual way to write it is '\0', the character with value 0.
Don't forget that the buffer then needs 4 characters, not 3.
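A minimal C++ sketch of both sides of that contract. InitStandIn is a hypothetical stand-in for the Delphi Init (it is not the DLL itself): it copies strlen + 1 bytes so the terminating 0 comes along, and the caller allocates 4 bytes for "123".

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the Delphi Init: copy the text *including*
// its terminating '\0' (strlen + 1 bytes) into the caller's buffer.
void InitStandIn(char *pMsg) {
    const char *hardcoded = "123";
    std::memcpy(pMsg, hardcoded, std::strlen(hardcoded) + 1);
}

// Caller side: room for 3 characters plus the terminator.
// The caller must free() the returned buffer.
char *MakeMessage() {
    char *pMsg = static_cast<char *>(std::malloc(4));
    if (pMsg != nullptr) {
        InitStandIn(pMsg);
    }
    return pMsg;
}
```

With the real DLL the same rule applies on the Delphi side: copy Length + 1 bytes, or write the #0 explicitly after the copied characters.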

Your Init function doesn't work because:
1) pHardcodedMsg is a pointer for which you didn't allocate memory
2) CopyMemory doesn't add a 0 to the end of pMsg
3) the procedure header of Init misses a semicolon at the end of the line
If you are using a Unicode version of Delphi (2009 or later, where PChar is PWideChar), you will also have to consider string length and character-set conversion.

Related

Char pointer giving me some really strange characters

When I run the example code, wordLength is 7 (hence the output 7). But my char array gets some really weird characters at the end of it.
wordLength = word.length();
cout << wordLength;
char * wordchar = new char[wordLength]; //new char[7]; ??
for (int i = 0; i < word.length(); i++) //0-6 = 7
{
wordchar[i] = 'a';
}
cout << wordchar;
The output: 7 aaaaaaa²²²²¦¦¦¦¦ÂD╩2¦♀
The desired output is aaaaaaa. What is the garbage after it? And how did it end up there?
You should add \0 at the end of wordchar.
char * wordchar = new char[wordLength +1];
//add chars as you have done
wordchar[wordLength] = '\0';
The reason is that C-strings are null terminated.
C strings are terminated with a '\0' character that marks their end (in contrast, C++ std::string just stores the length separately).
When copying the characters to wordchar you didn't terminate the string, so when operator<< outputs wordchar, it keeps going until it finds the first \0 character that happens to lie after the memory pointed to by wordchar, and in the process it prints all the garbage values in between (reading past the end of the array is undefined behavior).
To fix the problem, you should:
make the allocated string 1 char longer;
add the \0 character at the end.
Still, in C++ you'll normally just want to use std::string.
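A sketch of the fix applied to the loop from the question, using std::unique_ptr<char[]> so the buffer is freed automatically, next to the std::string equivalent. The function names MakeAs and MakeAsString are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Fill a heap buffer with n copies of 'a' and null-terminate it.
// Allocates n + 1 chars: n for the data, 1 for the '\0'.
std::unique_ptr<char[]> MakeAs(std::size_t n) {
    std::unique_ptr<char[]> buf(new char[n + 1]);
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = 'a';
    }
    buf[n] = '\0'; // without this, printing reads past the end
    return buf;
}

// The std::string version: the length is stored, no terminator to manage.
std::string MakeAsString(std::size_t n) {
    return std::string(n, 'a');
}
```

Printing MakeAs(7).get() or MakeAsString(7) with cout then shows just aaaaaaa.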
Use:
char * wordchar = new char[wordLength+1]; // 1 extra for null character
before the for loop, and
wordchar[wordLength] = '\0';
after the for loop. C strings are null-terminated.
Without this it keeps on printing until it finds the first null character, printing all the garbage values along the way.
You omitted the trailing zero; that's the cause.
In C and C++ the whole ecosystem determines string length by assuming a trailing zero ('\0', or simply 0 numerically). This is different from, for example, Pascal strings, where the memory representation starts with a number that tells how many of the following characters make up the string.
So if you want to store a certain string content, you have to allocate one additional byte for the trailing zero. If you manipulate memory content, you always have to keep the trailing zero in mind and preserve it. Otherwise strcpy, strstr and other string functions run off the end and keep reading (or, for functions that write, overwriting) the memory that follows. Without the trailing zero, strlen also gives a wrong result, since it counts until it encounters the first zero.
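Preserving the trailing zero by hand can be sketched as a small helper (CopyPrefix is a hypothetical name, not a standard function). It copies part of a string with memcpy, which knows nothing about terminators, and then writes the '\0' itself, never past the buffer's capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy at most `count` characters of `src` into `dst`, which holds
// `capacity` bytes, and always leave `dst` null-terminated.
// Returns the number of characters copied (excluding the '\0').
std::size_t CopyPrefix(char *dst, std::size_t capacity,
                       const char *src, std::size_t count) {
    assert(capacity > 0); // need room for at least the terminator
    std::size_t srcLen = std::strlen(src);
    std::size_t n = count < srcLen ? count : srcLen;
    if (n > capacity - 1) {
        n = capacity - 1; // reserve the last byte for '\0'
    }
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

The design choice is that truncation is preferred over overflow: if the caller asks for more than fits, the result is shortened but still a valid C string.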
You are not the only one making this mistake; it often plays an important role in security vulnerabilities and their exploits. An exploit takes advantage of the side effect that functions run off the end and manipulate things other than what was originally intended. This is a very important and dangerous part of C.
In C++ (as you tagged your question) you are better off using std::string and the standard library's methods instead of C-style manipulation.

Get C++ wchar_t into Flash, via Lua

I am currently working on an application in C++, that ties into Lua, that ties into Flash (in that order). My goal at the moment is getting wchar_ts from C++ into Flash, via Lua. I would love any insights as to how I can accomplish this!
If any other information is required, please ask and I'll do my best to provide it.
What I have tried
It's my understanding that Lua is not a fan of Unicode, but it should still be able to receive the string of bytes from my C++ application. I imagine there must be a way to then pass those bytes over to Flash to then render out my intended Unicode. So what I've done so far:
C++:
//an example wchar_t*
const wchar_t *text = L"Test!";
//this function pushes a char* to my Lua code
lua.PushString((char*)text); //directly casting text to a char*... D:
Lua:
theString = FunctionThatGetsWCharFromCpp();
flash.ShowString(theString);
Flash:
function ShowString(theString:String)
{
myTextField.text = theString;
}
Now the outcome here is that myTextField only shows "T". This made sense to me: casting the wchar_t* to a char* reinterprets the wide characters as raw bytes, so an ASCII character like "T" is followed by a zero byte, since it only uses one of the two bytes of a (16-bit) wchar_t. A quick look at the documentation yields:
lua_pushstring
The string cannot contain embedded zeros; it is assumed to end at the first zero.
So I ran a little test:
C++:
//prefixing with a Japanese character
//which will use both bytes of the wchar_t
const wchar_t *text = L"たTest!";
The Flash textbox now reads: "_0T", 3 characters. Makes total sense, the 2 bytes of the Japanese character + T, then termination.
I understand what is going on, but I am still completely unsure of how to tackle this problem. And I'm really unsure of what to search for. Is there a specific Lua function I can use to pass a wad of bytes over to Lua from C++ (I've read somewhere that lua_pushlstring is often used for this, but that also terminates at first zero)? Is there a Flash datatype that will accept these bytes, then I'll need to do some sort of conversion to get them into a readable, multibyte string? or is this just really not possible?
Note:
I'm not too familiar with Unicode and code pages and whatnot, so I'm not too sure if there'll also be a step where I'll need to specify the correct encoding in Flash so that I can get the correct output - but I'm happy to cross that bridge when I get there, but if anyone has any insight here too, that would be great!
I don't know if this will work, but I'd recommend trying to use UTF-8. A string encoded in UTF-8 doesn't have any embedded zeros in it, so Lua should be able to handle it, and Flash ought to also be able to handle it, depending on how exactly the languages interface.
Here's one way to convert a wide-character string to UTF-8 using setlocale(3) and wcstombs(3):
#include <clocale>  // std::setlocale
#include <cstdlib>  // std::wcstombs
#include <cwchar>   // std::wcslen
// Error checking omitted for expository purposes
// Call this once at program startup. If you'd rather not change the locale,
// you can instead write your own conversion routine (but beware of UTF-16
// surrogate pairs if you do)
std::setlocale(LC_ALL, "en_US.UTF-8");
// Do this for each string you want to convert
const wchar_t *wideString = L"たTest!";
size_t len = std::wcslen(wideString);
size_t maxUtf8len = 4 * len + 1; // Each wchar_t encodes to a max of 4 bytes
char *utf8String = new char[maxUtf8len];
std::wcstombs(utf8String, wideString, maxUtf8len);
...
// Do stuff with utf8String
...
delete [] utf8String;
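The answer mentions writing your own conversion routine instead of changing the locale. A minimal sketch of one, assuming the wide text is UTF-16 (as wchar_t is on Windows) and is passed in as a std::u16string; AppendUtf8 and Utf16ToUtf8 are hypothetical names. It shows how surrogate pairs are combined into one code point, and that the UTF-8 result contains no zero bytes for text without U+0000.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
void AppendUtf8(std::string &out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8, combining surrogate pairs into one code point.
// Unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const std::u16string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t unit = in[i];
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            std::uint32_t low = in[++i];
            AppendUtf8(out, 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
        } else if (unit >= 0xD800 && unit <= 0xDFFF) {
            AppendUtf8(out, 0xFFFD); // lone surrogate
        } else {
            AppendUtf8(out, unit);
        }
    }
    return out;
}
```

The result's c_str() could then be handed to the lua.PushString call from the question; "たTest!" becomes the 8 bytes E3 81 9F followed by "Test!".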
If you're on Windows, you can instead use the WideCharToMultiByte function with the CP_UTF8 code page to do the conversion, since I don't believe that the Visual Studio C runtime supports UTF-8 locales:
#include <windows.h>  // WideCharToMultiByte, CP_UTF8
#include <cwchar>     // wcslen
// Error checking omitted for expository purposes
const wchar_t *wideString = L"たTest!";
size_t len = wcslen(wideString);
size_t maxUtf8len = 4 * len + 1; // Each wchar_t encodes to a max of 4 bytes
char *utf8String = new char[maxUtf8len];
// len + 1 so that the terminating L'\0' is converted too
WideCharToMultiByte(CP_UTF8, 0, wideString, (int)(len + 1), utf8String, (int)maxUtf8len, NULL, NULL);
...
// Do stuff with utf8String
...
delete [] utf8String;

Passing length 0 string ('\0') to STL functions that expect char*

A colleague (seriously, I don't use char* :) ) made a bug that reduces to this:
testVar.append('\0'); //testVar is std::string
This basically fixes it:
testVar.append("\0");
My question is: why isn't the first one legal?
Can't it be considered a zero-length, null-terminated string?
I tried going into the VS10 standard library implementation to see for myself, but I regretted it. :)
'\0' creates a char literal, which is not the same as a string / char *. Some languages treat a single character as a length-1 string, but C++ defines a single character to be a primitive datatype, while a string is an array of characters.
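To actually append one NUL character, use an overload that takes a char. A sketch (the helper names are made up) that also shows the "fix" append("\0") appends nothing, because append(const char*) stops at the first '\0'. As for why the first line compiled at all, a likely explanation is that under pre-C++14 rules the constant '\0' counted as a null pointer constant, so older compilers such as VS10 could pick append(const char*) with a null pointer, which is undefined behavior.

```cpp
#include <cassert>
#include <string>

// Appends one '\0' character; the string grows by exactly one.
// s.append(1, '\0') and s += '\0' do the same thing.
void AppendNul(std::string &s) {
    s.push_back('\0');
}

// "\0" is a C string whose first character is already the terminator,
// so append(const char*) sees an empty string and appends nothing.
void AppendEmptyCString(std::string &s) {
    s.append("\0");
}
```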

What does '\0' mean?

I can't understand what the '\0' in the two different places means in the following code:
string x = "hhhdef\n";
cout << x << endl;
x[3]='\0';
cout << x << endl;
cout<<"hhh\0defef\n"<<endl;
Result:
hhhdef
hhhef
hhh
Can anyone give me some pointers?
C++ std::strings are "counted" strings, i.e. their length is stored as an integer, and they can contain any character. When you replace the fourth character (index 3) with a \0, nothing special happens: it's printed as if it were any other character (in particular, your console simply ignores it).
In the last line, instead, you are printing a C string, whose end is determined by the first \0 that is found. In such a case, cout goes on printing characters until it finds a \0, which, in your case, is after the third h.
C++ has two string types:
The built-in C-style null-terminated strings, which are really just byte arrays, and the C++ standard library std::string class, which stores its length and does not rely on a null terminator.
Printing a null-terminated string prints everything up until the first null character. Printing a std::string prints the whole string, regardless of null characters in its middle.
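A small check of both behaviors, using the question's string (the function names are made up): the std::string keeps all 7 characters after x[3] = '\0', while anything that treats the same bytes as a C string, such as strlen on c_str(), stops at index 3. Constructing a std::string from a literal with an embedded \0 also stops there unless the length is passed explicitly.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// The question's string after x[3] = '\0': still 7 characters,
// because std::string stores its length separately.
std::string WithEmbeddedNul() {
    std::string x = "hhhdef\n";
    x[3] = '\0';
    return x;
}

// From a C string: the constructor stops at the first '\0' -> "hhh".
std::string FromLiteral() {
    return "hhh\0defef\n";
}

// With an explicit length, all 10 characters are kept.
std::string FromLiteralWithLength() {
    return std::string("hhh\0defef\n", 10);
}
```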
\0 is the NUL character; you can find it in the ASCII table, where it has the value 0.
It is used to determine the end of C-style strings.
However, the C++ class std::string stores its size as an integer, and thus does not rely on it.
You're representing strings in two different ways here, which is why the behaviour differs.
The second one is easier to explain; it's a C-style raw char array. In a C-style string, '\0' denotes the null terminator; it's used to mark the end of the string. So any functions that process/display strings will stop as soon as they hit it (which is why your last string is truncated).
The first example is creating a fully-formed C++ std::string object. These don't assign any special meaning to '\0' (their length is stored separately, so a '\0' inside them is just another character).
\0 is the null character. It is used to mark the end of a string in C.
In C, a string is an array of characters with \0 at the end, usually accessed through a pointer to its first character. So the following are valid representations of strings in C:
const char *c = "Hello"; // it is actually Hello\0
char c2[] = {'Y', 'o', '\0'};
\0 is what lets code determine where a string ends, for example when finding the length of a string.
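To make "finding the length" concrete, here is a hand-written equivalent of strlen (MyStrlen is a hypothetical name; in real code use std::strlen) that walks the array until it meets the '\0':

```cpp
#include <cassert>
#include <cstddef>

// Count characters up to (not including) the terminating '\0',
// the same idea strlen uses. The argument must be null-terminated,
// otherwise the loop runs past the end of the array.
std::size_t MyStrlen(const char *s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```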
\0 is the null terminator used in C to mark the end of a string. In simple words, it is the character with value zero, and it tells the code that processes the string (strlen, printf and so on) where the string ends.
Let me give you an example:
printf("Hello World"); /* stored as Hello World\0 */
Here \0 acts as the terminator, though printing the string as shown in the comment, with an explicit \0, would give the same output.

String going crazy if I don't give it a little extra room. Can anyone explain what is happening here?

First, I'd like to say that I'm new to C/C++; I'm originally a PHP developer, so I am bred to abuse variables any way I like 'em.
C is a strict country, compilers don't like me here very much, I am used to breaking the rules to get things done.
Anyway, this is my simple piece of code:
char IP[15] = "192.168.2.1";
char separator[2] = "||";
puts( separator );
Output:
||192.168.2.1
But if I change the definition of separator to:
char separator[3] = "||";
I get the desired output:
||
So why did I need to give the man extra space, so he doesn't sleep with the man before him?
That's because you get a string that is not null-terminated when separator's length is forced to 2.
Always remember to allocate an extra character for the null terminator. For a string of length N you need N+1 characters.
Once you violate this requirement any code that expects null-terminated strings (puts() function included) will run into undefined behavior.
Your best bet is to not force any specific length:
char separator[] = "||";
will allocate an array of exactly the right size.
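A quick compile-time check of what the unsized declaration produces, written as C++ (static_assert needs C++11). Note that in C++ the original char separator[2] = "||"; is rejected outright, while C accepts it and silently drops the terminator. PrintSeparator is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdio>

// Omit the size and the compiler counts the characters plus the '\0'.
const char separator[] = "||";      // 3 bytes: '|', '|', '\0'
const char ip[] = "192.168.2.1";    // 12 bytes: 11 characters + '\0'

static_assert(sizeof separator == 3, "two characters plus the terminator");
static_assert(sizeof ip == 12, "eleven characters plus the terminator");

// puts stops at separator[2], so this prints just "||" and a newline.
void PrintSeparator() {
    std::puts(separator);
}
```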
Strings in C are NUL-terminated. This means that a string of two characters requires three bytes (two for the characters and the third for the zero byte that denotes the end of the string).
In your example it is possible to omit the size of the array and the compiler will allocate the correct amount of storage:
char IP[] = "192.168.2.1";
char separator[] = "||";
Lastly, if you are coding in C++ rather than C, you're better off using std::string.
If you're using C++ anyway, I'd recommend using the std::string class instead of C strings - much easier and less error-prone IMHO, especially for people with a scripting language background.
There is a hidden nul character '\0' at the end of each string. You have to leave space for that.
If you do
char separator[] = "||";
you will get an array of size 3, not size 2.
Because in C strings are nul terminated (their end is marked with a 0 byte). If you declare separator to be an array of two characters, and give them both non-zero values, then there is no terminator! Therefore when you puts the array pretty much anything could be tacked on the end (whatever happens to sit in memory past the end of the array - in this case, it appears that it's the IP array).
Edit: the following is incorrect. See comments below.
When you make the array length 3, the extra byte happens to have 0 in it, which terminates the string. However, you probably can't rely on that behavior - if the value is uninitialized it could really contain anything.
In C, strings end with a special '\0' character, so your separator literal "||" is actually one character longer than it looks. The puts function prints every character until it encounters '\0', which in your case is the one after the IP string.
In C, strings include a (invisible) null byte at the end. You need to account for that null byte.
char ip[15] = "1.2.3.4";
In the code above, ip has enough space for 15 characters: 14 "regular" characters and the null byte. That's plenty for "1.2.3.4", but too short for the longest dotted address, "255.255.255.255" (15 characters); for that it should be char ip[16].
ip[0] == '1';
ip[1] == '.';
/* ... */
ip[6] == '4';
ip[7] == '\0';
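A sketch of sizing the buffer for any dotted IPv4 address, assuming the values come in as four numbers; kIpv4BufferSize and FormatIpv4 are hypothetical names. std::snprintf always null-terminates (given a non-zero size) and returns the number of characters it wanted to write, excluding the '\0'.

```cpp
#include <cassert>
#include <cstdio>

// The longest dotted IPv4 address, "255.255.255.255", has 15 characters,
// so a buffer for any such address needs 15 + 1 = 16 bytes.
constexpr int kIpv4BufferSize = 16;

// Write a dotted-quad address into `out`; sizeof out is 16 here
// because `out` is a reference to the whole array.
int FormatIpv4(char (&out)[kIpv4BufferSize],
               unsigned a, unsigned b, unsigned c, unsigned d) {
    return std::snprintf(out, sizeof out, "%u.%u.%u.%u", a, b, c, d);
}
```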
Since no one has pointed it out so far: if you declare your variables like this, the strings are automagically null-terminated and you don't have to mess around with array sizes:
const char* IP = "192.168.2.1";
const char* separator = "||";
Note, however, that this assumes you don't intend to change these strings (modifying a string literal is undefined behavior).
But as already mentioned, the safe way in C++ would be using the std::string class.
A C "string" always ends with a null character ('\0'), but you leave no room for it if you write
char separator[2] = "||". puts expects this \0 at the end, so in the first case it writes until it finds a \0, and here you can see that one is found at the end of the IP address. Interestingly enough, you can even see how the local variables are laid out on the stack.
The line char separator[2] = "||"; should get you undefined behaviour, since the length of the string literal "||" (which includes the null at the end) is 3.
Also, what compiler have you compiled the above code with? I compiled with g++ and it flagged the above line as an error.
Strings in C/C++ are null-terminated, i.e. they have a hidden zero at the end.
So your separator string would be:
{'|', '|', '\0'} = "||"