Setting program to be autorun through registry - c++

I have the following code:
http://privatepaste.com/8364a2a7b8/12345
But it only writes "c" (presumably the conversion to LPBYTE leaves only one byte).
What's the proper way to handle GetModuleFileName and registry edit?

strlen((char*)szPath2)+1
This is most likely where your problem is. I bet your program is compiled in UNICODE mode. strlen only counts char-based (narrow) strings correctly; on a UTF-16 string it stops at the first zero byte, which is why only one character gets written. (The fact that you're having to cast from TCHAR to char is a big hint that something isn't right.)
To keep consistent with the usage of TCHAR and such, you should probably use _tcslen instead.
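The linked paste is no longer available, but a minimal sketch of the usual fix looks like this (the function and value names are illustrative; the important part is that RegSetValueEx takes a size in bytes, including the terminating NUL, so the character count must be multiplied by sizeof(TCHAR)):

#include <windows.h>
#include <tchar.h>

bool RegisterAutorun(LPCTSTR valueName)
{
    TCHAR szPath[MAX_PATH];
    DWORD len = GetModuleFileName(NULL, szPath, MAX_PATH);
    if (len == 0 || len == MAX_PATH)      // failed or truncated
        return false;

    HKEY hKey;
    if (RegOpenKeyEx(HKEY_CURRENT_USER,
                     _T("Software\\Microsoft\\Windows\\CurrentVersion\\Run"),
                     0, KEY_SET_VALUE, &hKey) != ERROR_SUCCESS)
        return false;

    // Size in BYTES, including the terminating NUL, not in characters.
    DWORD cbData = (DWORD)((_tcslen(szPath) + 1) * sizeof(TCHAR));
    LONG rc = RegSetValueEx(hKey, valueName, 0, REG_SZ,
                            (const BYTE*)szPath, cbData);
    RegCloseKey(hKey);
    return rc == ERROR_SUCCESS;
}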

Related

In C++ when to use WCHAR and when to use CHAR

I have a question:
Some libraries use WCHAR as the text parameter and others use CHAR (as UTF-8): I need to know when to use WCHAR or CHAR when I write my own library.
Use char and treat it as UTF-8. There are a great many reasons for this; this website summarises it much better than I can:
http://utf8everywhere.org/
It recommends converting from wchar_t to char (UTF-16 to UTF-8) as soon as you receive it from any library, and converting back when you need to pass strings to it. So to answer your question, always use char except at the point that an API requires you to pass or receive wchar_t.
WCHAR (or wchar_t on Visual C++ compiler) is used for Unicode UTF-16 strings.
This is the "native" string encoding used by Win32 APIs.
CHAR (or char) can be used for several other string formats: ANSI, MBCS, UTF-8.
Since UTF-16 is the native encoding of Win32 APIs, you may want to use WCHAR (and better a proper string class based on it, like std::wstring) at the Win32 API boundary, inside your app.
And you can use UTF-8 (so, CHAR/char and std::string) to exchange your Unicode text outside your application boundary. For example: UTF-8 is widely used on the Internet, and when you exchange UTF-8 text between different platforms you don't have the problem of endianness (instead with UTF-16 you have to consider both the UTF-16BE big-endian and the UTF-16LE little-endian cases).
You can convert between UTF-16 and UTF-8 using the WideCharToMultiByte() and MultiByteToWideChar() Win32 APIs. These are pure-C APIs, and these can be conveniently wrapped in C++ code, using string classes instead of raw character pointers, and exceptions instead of raw error codes. You can find an example of that here.
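A rough sketch of such a wrapper for the UTF-16 to UTF-8 direction (the function name is made up and error handling is minimal):

#include <windows.h>
#include <string>
#include <stdexcept>

std::string Utf16ToUtf8(const std::wstring& utf16)
{
    if (utf16.empty())
        return std::string();

    // First call: ask how many bytes the UTF-8 result will need.
    int utf8Len = WideCharToMultiByte(CP_UTF8, 0,
                                      utf16.data(), (int)utf16.size(),
                                      NULL, 0, NULL, NULL);
    if (utf8Len == 0)
        throw std::runtime_error("WideCharToMultiByte failed");

    // Second call: do the actual conversion into the pre-sized string.
    std::string utf8(utf8Len, '\0');
    WideCharToMultiByte(CP_UTF8, 0,
                        utf16.data(), (int)utf16.size(),
                        &utf8[0], utf8Len, NULL, NULL);
    return utf8;
}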
The right question is not which type to use, but what should be your contract with your library users. Both char and wchar_t can mean more than one thing.
The right answer to me, is use char and consider everything utf-8 encoded, as utf8everywhere.org suggests. This will also make it easier to write cross-platform libraries.
Make sure you make correct use of strings, though. Some APIs, like fopen(), accept a char* string but treat it differently (not as UTF-8) when compiled on Windows. If Unicode is important to you (and it probably is, when you are dealing with strings), be sure to handle your strings correctly. A good example can be seen in boost::locale. I also recommend using boost::nowide on Windows to get strings handled correctly inside your library, as sketched below.
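For what it's worth, a small sketch of that boost::nowide style (assuming Boost.Nowide is available): file names and console I/O stay char/UTF-8 in your code, and the library does the UTF-16 conversion at the Win32 boundary.

#include <boost/nowide/fstream.hpp>
#include <boost/nowide/iostream.hpp>
#include <string>

int main()
{
    // The path is plain char, assumed UTF-8; on Windows the library
    // converts it to UTF-16 before calling the wide file APIs.
    boost::nowide::ifstream in("data-\xC3\xA9.txt");   // "data-é.txt" in UTF-8
    std::string line;
    if (std::getline(in, line))
        boost::nowide::cout << line << std::endl;
    return 0;
}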
On Windows we stick to WCHARs and std::wstring, mainly because if you don't, you end up converting constantly when calling Windows functions.
I have a feeling that trying to use UTF-8 internally simply because of http://utf8everywhere.org/ is going to bite us later on down the line.
When developing a Windows application, it is generally recommended to use TCHARs. The good thing about TCHARs is that they can be either regular chars or wchar_ts, depending on whether the Unicode setting is enabled. Once you switch to TCHARs, make sure all the string functions you use are the _t-prefixed versions (e.g. _tcslen for the length of a string). That way your code will work in both Unicode and ASCII builds, as in the sketch below.
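A tiny illustrative sketch of that TCHAR style: with UNICODE/_UNICODE defined it compiles as wide-character code, without them as ANSI code.

#include <windows.h>
#include <tchar.h>

void ShowModulePath()
{
    TCHAR szPath[MAX_PATH];
    if (GetModuleFileName(NULL, szPath, MAX_PATH) > 0 &&
        _tcslen(szPath) > 0)                    // _t function, not strlen/wcslen
    {
        OutputDebugString(_T("Module path: ")); // _T() adapts the literal
        OutputDebugString(szPath);
    }
}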

Strings: TCHAR, LPWCS, LPCTSTR, CString. What's what here? Quick and simple

TCHAR szExeFileName[MAX_PATH];
GetModuleFileName(NULL, szExeFileName, MAX_PATH);
CString tmp;
lstrcpy(szExeFileName, tmp);
CString out;
out.Format("\nInstall32 at %s\n", tmp);
TRACE(tmp);
Error (At the Format):
error C2664: 'void ATL::CStringT<BaseType,StringTraits>::Format(const wchar_t
*,...)' : cannot convert parameter 1 from 'const char [15]' to 'const wchar_t
I'd just like to get the current path that this program was launched from and copy it into a CString so I can use it elsewhere. Right now I'm just trying to see the path by TRACE'ing it out. But strings, chars, char arrays: I can never get them all straight. Could someone give me a pointer?
The accepted answer addresses the problem. But the question also asked for a better understanding of the differences among all the character types on Windows.
Encodings
A char on Windows (and virtually all other systems) is a single byte. A byte is typically interpreted as either an unsigned value [0..255] or a signed value [-128..127]. (Older C++ standards guarantee a signed range of only [-127..127], but most implementations give [-128..127]. I believe C++11 guarantees the larger range.)
ASCII is a character mapping for values in the range [0..127] to particular characters, so you can store an ASCII character in either a signed byte or an unsigned byte, and thus it will always fit in a char.
But ASCII doesn't have all the characters necessary for most languages, so the character sets were often extended by using the rest of the values available in a byte to represent the additional characters needed for certain languages (or families of languages). So, while [0..127] almost always mean the same thing, values like 150 can only be interpreted in the context of a particular encoding. For single-byte alphabets, these encodings are called code pages.
Code pages helped, but they didn't solve all the problems. You always had to know which code page a particular document used in order to interpret it correctly. Furthermore, you typically couldn't write a single document that used different languages.
Also, some languages have more than 256 characters, so there was no way to map one char to one character. This led to the development of multi-byte character encodings, where [0..127] is still ASCII, but some of the other values are "escapes" that mean you have to look at some number of following chars to figure out what character you really had. (It's best to think of multi-byte as variable-byte, as some characters require only one byte while others require two or more.) Multi-byte works, but it's a pain to code for.
Meanwhile, memory was becoming more plentiful, so a bunch of organizations got together and created Unicode, with the goal of making a universal mapping of values to characters (for appropriately vague definitions of "characters"). Initially, it was believed that all characters (or at least all the ones anyone would ever use) would fit into 16-bit values, which was nice because you wouldn't have to deal with multi-byte encodings--you'd just use two bytes per character instead of one. About this time, Microsoft decided to adopt Unicode as the internal representation for text in Windows.
WCHAR
So Windows has a type called WCHAR, a two-byte value that represents a "Unicode" "character". I'm using quotation marks here because Unicode evolved past the original two-byte encoding, so what Windows calls "Unicode" isn't really Unicode today--it's actually a particular encoding of Unicode called UTF-16. And a "character" is not as simple a concept in Unicode as it was in ASCII, because, in some languages, characters combine or otherwise influence adjacent characters in interesting ways.
Newer versions of Windows used these 16-bit WCHAR values for text internally, but there was a lot of code out there still written for single-byte code pages, and even some for multi-byte encodings. Those programs still used chars rather than WCHARs. And many of these programs had to work with people using older versions of Windows that still used chars internally as well as newer ones that use WCHAR. So a technique using C macros and typedefs was devised so that you could mostly write your code one way and--at compile time--choose to have it use either char or WCHAR.
TCHAR
To accomplish this flexibility, you use a TCHAR for a "text character". In some header file (often <tchar.h>), TCHAR would be typedef'ed to either char or WCHAR, depending on the compile time environment. Windows headers adopted conventions like this:
LPTSTR is a (long) pointer to a string of TCHARs.
LPWSTR is a (long) pointer to a string of WCHARs.
LPSTR is a (long) pointer to a string of chars.
(The L for "long" is a leftover from 16-bit days, when we had long, far, and near pointers. Those are all obsolete today, but the L prefix tends to remain.)
Most of the Windows API functions that take and return strings were actually replaced with two versions: the A version (for "ANSI" characters) and the W version (for wide characters). (Again, historical legacy shows in these. The code pages scheme was often called ANSI code pages, though I've never been clear if they were actually ruled by ANSI standards.)
So when you call a Windows API like this:
SetWindowText(hwnd, lptszTitle);
what you're really doing is invoking a preprocessor macro that expands to either SetWindowTextA or SetWindowTextW. It should be consistent with however TCHAR is defined. That is, if you want strings of chars, you'll get the A version, and if you want strings of WCHARs, you get the W version.
But it's a little more complicated because of string literals. If you write this:
SetWindowText(hwnd, "Hello World"); // works only in "ANSI" mode
then that will only compile if you're targeting the char version, because "Hello World" is a string of chars, so it's only compatible with the SetWindowTextA version. If you wanted the WCHAR version, you'd have to write:
SetWindowText(hwnd, L"Hello World"); // only works in "Unicode" mode
The L here means you want wide characters. (The L actually stands for long, but it's a different sense of long than the long pointers above.) When the compiler sees the L prefix on the string, it knows that string should be encoded as a series of wchar_ts rather than chars.
(Compilers targeting Windows use a two-byte value for wchar_t, which happens to be identical to what Windows defines as a WCHAR. Compilers targeting other systems often use a four-byte value for wchar_t, which is what it really takes to hold a single Unicode code point.)
So if you want code that can compile either way, you need another macro to wrap the string literals. There are two to choose from: _T() and TEXT(). They work exactly the same way. The first comes from the compiler's library and the second from the OS's libraries. So you write your code like this:
SetWindowText(hwnd, TEXT("Hello World")); // compiles in either mode
If you're targeting chars, the macro is a no-op that just returns the regular string literal. If you're targeting WCHARs, the macro prepends the L.
So how do you tell the compiler that you want to target WCHAR? You define UNICODE and _UNICODE. The former is for the Windows APIs and the latter is for the compiler libraries. Make sure you never define one without the other.
My guess is you are compiling in Unicode mode.
Try enclosing your format string in the _T macro, which is designed to provide an always-correct method of providing constant string parameters, regardless of whether you're compiling in Unicode or ANSI mode:
out.Format(_T("\nInstall32 at %s\n"), tmp);
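Putting it together, a corrected version of the snippet from the question might look something like this (note that the original lstrcpy call also copied in the wrong direction; constructing the CString from the buffer avoids that entirely, and the string is passed to TRACE as an argument rather than as the format string):

TCHAR szExeFileName[MAX_PATH];
GetModuleFileName(NULL, szExeFileName, MAX_PATH);

CString tmp(szExeFileName);    // copy the path into the CString
CString out;
out.Format(_T("\nInstall32 at %s\n"), (LPCTSTR)tmp);
TRACE(_T("%s"), (LPCTSTR)out);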

using wsprintf with malloc

Is there any mistake in writing code such as:
char* sp=(char*)malloc(128);
int x=22;
wsprintf(sp,"%d",x);
cout<<sp;
I am asking especially about security mistakes.
There are a number of "potential" issues here; none of them actually violates anything, but you may find things not behaving as you expect.
First: wsprintf, as a Win32 API (http://msdn.microsoft.com/en-us/library/windows/desktop/ms647550(v=vs.85).aspx ) is prototyped as:
int __cdecl wsprintf(
_Out_ LPTSTR lpOut,
_In_ LPCTSTR lpFmt,
_In_ ...
);
where LPTSTR is defined as char* or wchar_t* depending on whether the UNICODE symbol is defined (check your project settings and/or build commands).
Now, if you are on an ANSI build (no UNICODE), all the types are coherent, but there is no check that wsprintf does not write more than the 128 chars you allocated. If you just write a decimal integer there will be no problem, but if you (or somebody else after you) later modify the "message" and no checks are made, some surprises may arise (like wsprintf(sp,"This is the number I've been told I was supposed to be expected to be: %d",x); will this still fit in the 128 chars?!?)
If you are on a UNICODE build, you allocate 128 chars and write a double-byte string into them. The number 22 will be written as \x32\x00\x32\x00\x00\x00 (3200 is the little-endian encoding of 0x0032, the wchar_t corresponding to code point 50, which is '2').
If you give that sequence to cout (which is char based, not wchar_t based), it will see the first \x00 as a string terminator and will output just '2'.
To be coherent, you should either:
use all char-based types and functions (malloc and cout are fine, but wsprintfA instead of wsprintf)
use all wchar_t-based types and functions (malloc(128*sizeof(wchar_t)), wchar_t*, wsprintfW, and wcout instead of cout)
use all TCHAR-based types (malloc(128*sizeof(TCHAR)), TCHAR* and wsprintf, and define a tcout as cout or wcout depending on UNICODE).
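For example, a coherent all-wchar_t version of the snippet (a sketch using the explicit W variant) would be:

#include <windows.h>
#include <iostream>
#include <cstdlib>

int main()
{
    // Allocate 128 wide characters, not 128 bytes.
    wchar_t* sp = (wchar_t*)malloc(128 * sizeof(wchar_t));
    int x = 22;
    wsprintfW(sp, L"%d", x);   // explicitly the wide variant
    std::wcout << sp;          // wide stream to match the wide string
    free(sp);
    return 0;
}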
There is no security mistake because it is never the case that an int converted to a C string will exceed the size you've allocated.
However this style of programming has the potential for security issues. And history has shown that this kind of code has caused real security issues time and time again. So maybe you should be learning a better style of coding?
This MSDN link lists some concerns over the use of wsprintf. They don't appear to apply to your example but they do give some alternatives that you might want to explore.
OK, given that you have stated you are using the Win32 API, read this from its documentation:
Note: Do not use. Consider using one of the following functions instead: StringCbPrintf, StringCbPrintfEx, StringCchPrintf, or StringCchPrintfEx. See Security Considerations.
Therefore, do not use it. I would, however, ignore what they tell you to use instead and either:
Write in C, and use a function from the C standard library (sprintf, snprintf, etc.). In that case you cannot use cout.
Write in C++. Use std::string; there is even the new std::to_string, as well as boost::format and ostringstream, to help you build formatted strings. You can still use C standard library functions where they really suit your purpose better, but leave the allocation to the library, as sketched below.
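A small sketch of those C++ alternatives; the library owns the allocation, so there is no buffer size to get wrong:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    int x = 22;

    std::string s1 = std::to_string(x);   // C++11 and later

    std::ostringstream oss;               // works on older compilers too
    oss << "The value is " << x;
    std::string s2 = oss.str();

    std::cout << s1 << '\n' << s2 << '\n';
    return 0;
}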

Unicode causes closing messagebox to terminate program

I'm developing a Win32 API Wrapper. To make it Unicode-compliant, I do the following:
#ifndef UNICODE
#define gchar char
#define gstrcpy strcpy
#define gstrncpy strncpy
#define gstrlen strlen
#define gstrcat strcat
#define gstrncat strncat
#define gstrcmp strcmp
#define gstrtok strtok
#else
#define gchar wchar_t
#define gstrcpy lstrcpy
#define gstrncpy lstrncpy
#define gstrlen lstrlen
#define gstrcat lstrcat
#define gstrncat lstrncat
#define gstrcmp lstrcmp
#define gstrtok lstrtok
#endif
I also provide
#define uni(s) TEXT(s)
My test consisted of a window that creates a message box via
msg (uni("Left-click"));
whenever the user left-clicks the window. The problem is that, no matter how many messages are created, after 4 or 5 of them are closed when I #define UNICODE, the next message box shown (whether it's a new one or one that was underneath the last one closed) causes the program to return 0xC0000005. Not defining UNICODE makes this work perfectly. My msg function is as follows:
dword msg (cstr = uni(""), cstr = uni(""), hwin = null, dword = 0);
...
dword msg (cstr lpText, cstr lpCaption, hwin hWnd, dword uType)
{
return MessageBox (hWnd, lpText, lpCaption, uType);
}
where dword is DWORD, cstr is pchar, which is gchar *, which can be char * or wchar_t *, hwin is HWND, and null is 0.
It probably isn't the message box itself doing this, but I haven't done any other text-related stuff in the test, so I'll see if it crashes some other way too.
Does anyone know why this would happen? The difference between multi-byte characters and Unicode shouldn't cause the program to repeatedly crash. I can upload the headers and the test too, if needed.
Edit:
I just found out that creating one message and then closing the actual window results in the same crash. Here's the link for the source (SOURCE CODE). Please keep in mind:
a) I only took one first-year programming course, ever (C++).
b) My wrapper's purpose is to make writing win32 apps as easy as possible.
c) I like to make things of my own (string class etc).
Also forgot this (duh), I'm using Code::Blocks (MinGW).
Edit:
I didn't realize before, but the program is trying to access memory at 0x00000000. This is what's causing the problem, but I have no idea why it would be trying to do this. I believe the instruction trying to access it is located somewhere in winnt.dll, but having never learned how to debug, I'm still trying to figure out how to find the information I need.
Edit:
Now, without changing it, but running it on a different computer, it's referencing 0x7a797877 instead of 0.
Edit: Changing the window procedure to include WM_LBUTTONDOWN and call msg() inside, rather than calling the added procedure makes the program work perfectly. Something with the way addmsg() and the window procedure are coded causes the _lpWindowName and _lpClassName to have corrupted data after a while, but non-pointer members are still preserved.
EDIT:
After all of this mayhem I finally found out I was missing a single character in all of my source code. When I defined msgparams as Window, UINT, WPARAM, LPARAM and likewise with msgfillparams (except with names) I forgot to pass a reference. I was passing the Window by value! I'd still like to thank everyone who posted, as I did get my butt kicked debugger-wise and ended up learning a lot more about Unicode as well.
You should do your homework before asking questions on SO. My impression is that you have almost no idea how Unicode works on Windows, and it would take many pages to explain.
Porting an application from ANSI to Unicode is a big deal on Windows. It may seem reasonable to pay someone with experience to do this.
Basically, everything that worked with char will have to work with wchar_t.
The API has other functions for all of this, but you should start by using Windows' existing support rather than writing your own macros; the first step is to use _T, not W, so you'll be able to start changing code and still compile in both Unicode and ANSI.
Why are you even bothering with ANSI in the first place? All the TCHAR support dates back to a time when Win95 was commonplace, so developers had to write code that could compile as ANSI (for Win95) or UNICODE (for NT-based Windows). Now that Win95 is long obsolete, there's no need to bother with TCHAR: just go all-UNICODE, using L"Unicode strings" instead of TEXT() and wcs-versions of the CRT rather than the _t-versions.
Having said that, here's some common sources of errors with ANSI/UNICODE code that could explain some of what you are seeing:
One possibility is that there's a bug somewhere that's corrupting the stack - uninitialized variable, stack overrun, and the like. In unicode, any chars or strings on the stack may take up a different amount of space compared to the ANSI version, so variables will end up in different places relative to one another. Chances are you are 'getting lucky' in the ANSI build, and whatever is being corrupted isn't important data; but on the UNICODE build, something important on the stack is getting nuked. (For example, if you overflow a buffer on the stack, you could end up overwriting the return address that's also on the stack, likely causing a crash at the next function return.)
--
Watch out for cases where you are mixing up character counts versus byte counts: with ANSI, you can use 'sizeof()' almost interchangeably with a character count (depending on whether you're counting the terminating NUL space or not); but with UNICODE, you can't: and if you get them mixed up, you can get a buffer overrun very easily.
For example:
// Incorrectly passing byte count instead of character count
WCHAR szWindowName[32];
GetWindowTextW( hwnd, szWindowName, sizeof(szWindowName) );
this can cause a buffer overrun (leading to crash - if you're lucky - or silently corrupted data and incorrect results later on if you're not lucky) since it's passing 64 - the size in bytes - to GetWindowText, instead of 32, the size in characters.
On Windows, use the ARRAYSIZE(...) macro instead of sizeof() to get the number of elements in an array rather than the byte size of the array.
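The corrected call (same hwnd and buffer as in the example above) would then be:

// Correct: pass the buffer length in characters, not bytes.
WCHAR szWindowName[32];
GetWindowTextW(hwnd, szWindowName, ARRAYSIZE(szWindowName)); // passes 32, not 64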
--
Another thing to watch for is any strings where you have used casts to "force" them into CHAR or WCHAR to avoid compiler errors: eg.
// Incorrectly calling ANSI function with UNICODE strings...
MessageBoxA(hwnd, (LPCSTR)L"Unicode Title", (LPCSTR)"Unicode content", MB_OK);
This type of usage typically results in just the first character of the string showing.
// Incorrectly calling UNICODE function with ANSI strings...
MessageBoxW(hwnd, (LPCWSTR)"ANSI Title", (LPCWSTR)"ANSI content", MB_OK);
This is trickier: you may get a string of garbage, or you could get an error of some kind.
These cases are easy to spot because there are casts. Generally speaking, casts should be viewed as red flags and avoided at all costs. Don't use them to silence a compiler error; instead, fix the issue that the compiler is warning about.
Also watch out for cases where you can get these mixed up but where the compiler won't warn you - eg with printf, scanf and friends: the compiler doesn't check the argument lists:
// Incorrectly calling ANSI function with UNICODE string - compiler won't warn you here...
LPCWSTR pString = L"I'm unicode!";
printf("The result is: %s\n", pString);

What is the simplest way to convert char[] to/from tchar[] in C/C++(ms)?

This seems like a pretty softball question, but I always have a hard time looking up this function because there seem to be so many variations on how char and TCHAR are referenced.
The simplest way is to use the conversion macros:
CW2A
CA2W
etc...
MSDN
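A small sketch of how those helpers are typically used (they live in <atlconv.h>; the temporary object owns the converted buffer, so it must stay in scope while you use the pointer):

#include <atlbase.h>
#include <atlconv.h>

void Example(const wchar_t* wide, const char* narrow)
{
    CW2A narrowCopy(wide);            // wide -> narrow (CP_ACP by default; a code page can be passed)
    const char* p = narrowCopy;       // valid only while narrowCopy is alive
    (void)p;

    CA2W wideCopy(narrow);            // narrow -> wide
    const wchar_t* w = wideCopy;
    (void)w;
}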
TCHAR is a Microsoft-specific typedef for either char or wchar_t (a wide character).
Conversion to char depends on which of these it actually is. If TCHAR is actually a char, then you can do a simple cast, but if it is truly a wchar_t, you'll need a routine to convert between character sets. See the function MultiByteToWideChar()
MultiByteToWideChar but also see "A few of the gotchas of MultiByteToWideChar".
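A sketch of the narrow-to-wide direction with MultiByteToWideChar (assuming the input is ANSI/CP_ACP text; pass CP_UTF8 instead for UTF-8 input):

#include <windows.h>
#include <string>

std::wstring AnsiToWide(const std::string& in)
{
    if (in.empty())
        return std::wstring();

    // First call sizes the output, second call converts.
    int len = MultiByteToWideChar(CP_ACP, 0, in.data(), (int)in.size(), NULL, 0);
    std::wstring out(len, L'\0');
    MultiByteToWideChar(CP_ACP, 0, in.data(), (int)in.size(), &out[0], len);
    return out;
}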
There are a few answers in this post as well, especially if you're looking for a cross-platform solution:
UTF8 to/from wide char conversion in STL
Although in this particular situation I think the TCHAR is a wide character, I'll only need to do the conversion if it isn't, which I'll have to check somehow.
if (sizeof(TCHAR) != sizeof(wchar_t))
{ .... }
The cool thing about that is that both sides of the comparison are compile-time constants, which means the compiler can evaluate (and remove) the if(), and if they are equal, remove everything inside the braces.
Here is some C++ code that duplicates _TCHAR * argv[] into char * argn[]:
http://www.wincli.com/?p=72
If you are adapting old code to Windows, simply use the define mentioned in the code as needed.
You can put a condition in your code:
#ifdef _UNICODE
    // treat TCHAR as a wide char
#else
    // treat TCHAR as a char
#endif
I realize this is an old thread, but it didn't get me the "right" answer, so I'm adding it now.
The way this appears to be done now is to use the TEXT macro. The example for FindFirstFile on MSDN points this out.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364418%28v=vs.85%29.aspx