Create string from UTF-8 byte array? - c++

Consider the emoji 😙. It's U+1F619 (decimal 128537). I believe it's UTF-8 byte array is 240, 159, 152, 151.
Given the UTF-8 byte array, how can I display it? Do I create a std::string from the byte array? Are there 3rd party libraries which help?
Given a different emoji, how can I get its UTF-8 byte array?
Target platform: Windows. Compiler: Visual C++ 2019. Just pasting 😙 into the Windows CMD prompt does not work. I tried chcp 65001 and Lucida as the font, but no luck.
I can do this on macOS or Linux if necessary, but I prefer Windows.
To clarify ... given a list of 400 bytes, how can I display the corresponding code points assuming UTF-8?

C++ has a simple solution to that.
#include <iostream>
#include <string>
int main(void) {
std::string s = u8"😙"; /* use std::u8string in c++20*/
std::cout << s << std::endl;
return 0;
}
This will allow you to store and print any UTF-8 string.
Note that Windows command prompt is weird with this kind of stuff. It's better you use an alternative such as MSYS2.

Here is sample code for experimenting with unicode, to convert unicode character/string and print it in console, it works just fine for a lot of unicode characters assuming you set correct locale, console code page, and perform adequate string conversion (if needed ex. char32_t, char16_t and char8_t need conversion).
except for the character you want to display its not that easy, running a test takes huge amount of time, this can be improved my modifying code bellow or by knowing details needed such as code page (likely not supported by windows), so feel free to experiment as long as it doesn't become boring ;)
Hint, it would be the best to add code to write to file, let it run and check after some hour the results in file. For this to work you'll need to put BOM mark into file, but not before file is opened as UTF encoded, you do this by wofstream::imbue() to specific locale, and for BOM it depends on endianess, it's UTF-X LE encoding scheme on Windows, where X is either 8, 16, or 32, write to file must be done with wcout wchar_t to be sucessful.
See code commenets for more info, and try to comment out/uncomment parts of code to see different and quicker results.
BTW. the point in this code is to try out all possible locales/code pages supported by sytem, until you see your smiley in the console or ulitmately fail
#include <climits>
#include <locale>
#include <iostream>
#include <sstream>
#include <Windows.h>
#include <string_view>
#include <cassert>
#include <cwchar>
#include <limits>
#include <vector>
#include <string>
#pragma warning (push, 4)
#if !defined UNICODE && !defined _UNICODE
#error "Compile as unicode"
#endif
#define LINE __LINE__
// NOTE: change desired default code page here (unused)
#define CODE_PAGE CP_UTF8
// Error handling helper method
void StringCastError()
{
std::wstring error = L"Unknown error";
switch (GetLastError())
{
case ERROR_INSUFFICIENT_BUFFER:
error = L"A supplied buffer size was not large enough, or it was incorrectly set to NULL";
break;
case ERROR_INVALID_FLAGS:
error = L"The values supplied for flags were not valid";
break;
case ERROR_INVALID_PARAMETER:
error = L"Any of the parameter values was invalid.";
break;
case ERROR_NO_UNICODE_TRANSLATION:
error = L"Invalid Unicode was found in a string.";
break;
default:
break;
};
std::wcerr << error << std::endl;
}
// Convert multybyte to wide string
static std::wstring StringCast(const std::string& param, int code_page)
{
if (param.empty())
{
std::wcerr << L"ERROR: param string is empty" << std::endl;
return std::wstring();
}
DWORD flags = MB_ERR_INVALID_CHARS;
//flags |= MB_USEGLYPHCHARS;
//flags |= MB_PRECOMPOSED;
switch (code_page)
{
case 50220:
case 50221:
case 50222:
case 50225:
case 50227:
case 50229:
case 65000:
case 42:
flags = 0;
break;
case 54936:
case CP_UTF8:
flags = MB_ERR_INVALID_CHARS; // or 0
break;
default:
if ((code_page >= 57002) && (code_page <= 57011))
flags = 0;
break;
}
const int source_char_size = static_cast<int>(param.size());
int chars = MultiByteToWideChar(code_page, flags, param.c_str(), source_char_size, nullptr, 0);
if (chars == 0)
{
StringCastError();
return std::wstring();
}
std::wstring return_string(static_cast<const unsigned int>(chars), 0);
chars = MultiByteToWideChar(code_page, flags, param.c_str(), source_char_size, &return_string[0], chars);
if (chars == 0)
{
StringCastError();
return std::wstring();
}
return return_string;
}
// Convert wide to multybyte string
std::string StringCast(const std::wstring& param, int code_page)
{
if (param.empty())
{
std::wcerr << L"ERROR: param string is empty" << std::endl;
return std::string();
}
DWORD flags = WC_ERR_INVALID_CHARS;
//flags |= WC_COMPOSITECHECK;
flags |= WC_NO_BEST_FIT_CHARS;
switch (code_page)
{
case 50220:
case 50221:
case 50222:
case 50225:
case 50227:
case 50229:
case 65000:
case 42:
flags = 0;
break;
case 54936:
case CP_UTF8:
flags = WC_ERR_INVALID_CHARS; // or 0
break;
default:
if ((code_page >= 57002) && (code_page <= 57011))
flags = 0;
break;
}
const int source_wchar_size = static_cast<int>(param.size());
int chars = WideCharToMultiByte(code_page, flags, param.c_str(), source_wchar_size, nullptr, 0, nullptr, nullptr);
if (chars == 0)
{
StringCastError();
return std::string();
}
std::string return_string(static_cast<const unsigned int>(chars), 0);
chars = WideCharToMultiByte(code_page, flags, param.c_str(), source_wchar_size, &return_string[0], chars, nullptr, nullptr);
if (chars == 0)
{
StringCastError();
return std::string();
}
return return_string;
}
// Console code page helper to adjust console
bool SetConsole(UINT code_page)
{
if (IsValidCodePage(code_page) == 0)
{
std::wcerr << L"Code page is not valid: " << LINE << std::endl;
}
else if (SetConsoleCP(code_page) == 0)
{
std::wcerr << L"Failed to set console input code page line: " << LINE << std::endl;
}
else if (SetConsoleOutputCP(code_page) == 0)
{
std::wcerr << L"Failed to set console output code page: " << LINE << std::endl;
}
else
{
return true;
}
return false;
}
std::vector<std::string> locales;
// System locale enumerator to get all locales installed on system
BOOL LocaleEnumprocex(LPWSTR locale_name, [[maybe_unused]] DWORD locale_info, LPARAM code_page)
{
locales.push_back(StringCast(locale_name, static_cast<int>(code_page)));
return TRUE; // continue drilling
}
// System code page enumerator to try out every possible supported/installed code page on system
BOOL CALLBACK EnumCodePagesProc(LPTSTR page_str)
{
wchar_t* end;
UINT code_page = std::wcstol(page_str, &end, 10);
char char_buff[MB_LEN_MAX]{};
char32_t target_char = U'😙';
std::mbstate_t state{};
std::stringstream string_buff{};
std::wstring wstr = L"";
// convert UTF-32 to multibyte
std::size_t ret = std::c32rtomb(char_buff, target_char, &state);
if (ret == -1)
{
std::wcout << L"Conversion from char32_t failed: " << LINE << std::endl;
return FALSE;
}
else
{
string_buff << std::string_view{ char_buff, ret };
string_buff << '\0';
if (string_buff.fail())
{
string_buff.clear();
std::wcout << L"string_buff failed or bad line: " << LINE << std::endl;
return FALSE;
}
// NOTE: CP_UTF8 gives good results, ex. CP_SYMBOL or code_page variable does not
// To make stuff work, provide good code page
wstr = StringCast(string_buff.str(), CP_UTF8 /* code_page */ /* CP_SYMBOL */);
}
// Try out every possible locale, this will take insane amount of time!
// make sure to comment this range for out if you know the locale.
for (auto loc : locales)
{
// locale used (comment out for testing)
std::locale::global(std::locale(loc));
if (SetConsole(code_page))
{
// HACK: put breakpoint here, and you'll see the string
// is correctly encoded inside wstr (ex. mouse over wstr)
// However it's not printed because console code page is likely wrong.
assert(std::wcout.good() && string_buff.good());
std::wcout << wstr << std::endl;
// NOTE: commented out to avoid spamming the console, basically
// hard to find correct code page if not impossible for CMD
if (std::wcout.bad())
{
std::wcout.clear();
//std::wcout << L"std::wcout Read/write error on i/o operation line: " << LINE << std::endl;
}
else if (std::wcout.fail())
{
std::wcout.clear();
//std::wcout << L"std::wcout Logical error on i/o operation line: " << LINE << std::endl;
}
}
}
return TRUE; // continue drilling
}
int main()
{
// NOTE: can be also LOCALE_ALL, anything else than CP_UTF8 doesn't make sense here
EnumSystemLocalesEx(LocaleEnumprocex, LOCALE_WINDOWS, static_cast<LPARAM>(CP_UTF8), 0);
// NOTE: can also be CP_INSTALLED
EnumSystemCodePagesW(EnumCodePagesProc, CP_SUPPORTED);
// NOTE: following is just a test code to demonstrate these algorithms indeed work,
// comment out 2 function above to test!
std::mbstate_t state{};
std::stringstream string_buff{};
char char_buff[MB_LEN_MAX]{};
// Test case for working char:
std::locale::global(std::locale("ru_RU.utf8"));
string_buff.clear();
string_buff.str(std::string());
// Russian (KOI8-R); Cyrillic (KOI8-R)
if (SetConsole(20866))
{
char32_t char32_str[] = U"Познер обнародовал";
for (char32_t c32 : char32_str)
{
std::size_t ret2 = std::c32rtomb(char_buff, c32, &state);
if (ret2 == -1)
{
std::wcout << L"Conversion from char32_t failed line: " << LINE << std::endl;
}
else
{
string_buff << std::string_view{ char_buff, ret2 };
}
}
string_buff << '\0';
if (string_buff.fail())
{
string_buff.clear();
std::wcout << L"string_buff failed or bad line: " << LINE << std::endl;
}
std::wstring wstr = StringCast(string_buff.str(), CP_UTF8);
std::wcout << wstr << std::endl;
if (std::wcout.fail())
{
std::wcout.clear();
std::wcout << L"std::wcout failed or bad line: " << LINE << std::endl;
}
}
}
#pragma warning (pop)

Related

Correct way to print wide strings to terminals with different encodings

Below I will try to print string XЯ𐤈 (latin "ex", cyrillic "ya", and Phoenician "teth") to terminals with various encodings, namely utf8, cp1251 and C (POSIX). I expect to see XЯ𐤈 in utf8 terminal, XЯ? in cp1251 terminal, and X?? in C (POSIX) terminal. Question marks are because C++ output library replaces characters which it cannot represent with ?. This is correct and expected behavior.
(1) My first naive attempt was to just print wide character string to wcout:
wchar_t str[] = L"\U00000058\U0000042f\U00010908";
std::wcout << str << std::endl;
// utf8 terminal output: X??
// cp1251: X??
// C: X??
In all terminals, it printed correctly only the first, ascii7, character. Other characters were replaced with '?' marks. It turned out that this happened because during program startup, LC_ALL is set to C.
(2) Second attempt was to manually call std::setlocale() with utf8 encoding:
wchar_t str[] = L"\U00000058\U0000042f\U00010908";
std::setlocale(LC_ALL, "en_US.UTF-8");
std::wcout << str << std::endl;
// utf8: XЯ𐤈
// cp1251: XЯ𐤈
// C: XЯð¤
Obviously, this worked correctly in utf8 terminal, but resulted in garbage in other two terminals.
(3) Third attempt was to parse $LANG environment variable for actual encoding used by terminal (and hope that all pieces of the terminal use the same encoding):
const char* lang = std::getenv("LANG");
if (!lang) {
std::cerr << "Couldn't get LANG" << std::endl;
exit(1);
}
wchar_t str[] = L"\U00000058\U0000042f\U00010908";
std::setlocale(LC_ALL, lang);
std::wcout << str << std::endl;
// utf8: XЯ𐤈
// cp1251: XЯ?
// C: X??
Now the output in all three terminals was as I expected. However, mixing std::cout and std::wcout is a bad idea, and std::cout is definitely used by some third-party libraries used in my program. This makes std::wcout unusable.
(4) So, fourth attempt (or, actually, idea) was to detect terminal encoding from $LANG, use codevct() to convert wchar_t[] string into terminal encoding and print it with ordinary std::cout.write(). Unfortunately, I couldn't find a way to explicitly set target encoding for codevct().
(5) Fifth, and so far, the best, attempt was to use iconv() manually:
// get $LANG env var
const char* lang = std::getenv("LANG");
if (!lang) {
std::cerr << "Couldn't get $LANG" << std::endl;
exit(1);
}
// find out encoding from $LANG, e.g. "utf8", "cp1251", etc
std::string enc(lang);
size_t pos = enc.rfind('.');
if (pos != std::string::npos) {
enc = enc.substr(pos + 1);
}
if (enc == "C" || enc == "POSIX") {
enc = "iso8859-1";
}
// convert wchar_t[] string into terminal encoding
wchar_t str[] = L"\U00000058\U0000042f\U00010908";
iconv_t handler = iconv_open(enc.c_str(), "UTF32LE");
if (handler == (iconv_t)-1) {
std::cerr << "Couldn't create iconv handler: " << strerror(errno) << std::endl;
exit(1);
}
char buf[1024];
char* inbuf = (char*)str;
size_t inbytes = sizeof(str);
char* outbuf = buf;
size_t outbytes = sizeof(buf);
while (true) {
size_t res = iconv(handler, &inbuf, &inbytes, &outbuf, &outbytes);
if (res != (size_t)-1) {
break;
}
if (errno == EILSEQ) {
// replace non-convertable code point with question mark and retry iconv()
inbuf[0] = '\x3f';
inbuf[1] = '\x00';
inbuf[2] = '\x00';
inbuf[3] = '\x00';
} else {
std::cerr << "iconv() failed: %s" << strerror(errno) << std::endl;
exit(1);
}
}
iconv_close(handler);
// write converted string to std::cout
std::cout.write(buf, sizeof(buf) - outbytes);
std::cout << std::endl;
// utf8: XЯ𐤈
// cp1251: XЯ?
// C: X??
This worked correctly in all three terminals. And now I am also not afraid that std::cout is used in other parts of the program. However, I find this solution not C++-way.
So, the question is: what is the correct way to print wide strings in C++? I would be fine with platform-specific solution (Linux + glibc + GCC).

std::wcout will not print after certain amounts of characters

I've been beating my head against a wall with this one for a while. I'm only trying to make a simple application to read out the contents of a file. Here's some of the code:
errno_t error;
if ((error = fopen_s(&f, file, "r")) == 0) {
while (true) {
std::wcout << std::endl << "NEW RUN" << std::endl;
wchar_t content[4096];
if (fgetswc(content, 4096, f) == 4096) {
std::wcout << content;
std::wcout.flush();
}
else {
std::wcout << content;
std::wcout.flush();
break;
}
}
fclose(f);
std::wcout << "PLEASE PRINT THIS NOW";
system("pause");
return 0;
}
And the custom fgetswc function:
int fgetswc(wchar_t buffer[], int count, FILE * f) {
for (int i = 0; i < count; i = i + 1) {
wchar_t c = fgetwc(f);
if (c != WEOF) {
buffer[i] = c;
} else {
return i;
}
}
return count;
}
It reads the first 4096 bytes out of the file, but then subsequent std::wcout calls will not print out to the console I have. It reads the rest of the file and ends successfully, as I can see using breakpoints and the debugger. content gets filled up every iteration. I also attempted putting in debug statements, but even those don't get printed. Am I just doing something wrong? As far as I can tell there's no special characters in my file, it's just a log file.
std::wcout << content;
This is effectively calling std::wostream::operator<<(const wchar_t *). It doesn't know that content is not a ␀-terminated string. In fact, it can't possibly know that it has valid length 4096 in the first case and some amount less in the second case (you don't save the return value of fgetswc).

Winsock remote device discovery

I am new to winsock and I wish to use bluetooth for my project.
I wrote a simple code taking help from online resources to find remote devices
It should print the name of the remote devices but instead it prints some hex value I think...I dont know what that is
The code is
#include "stdafx.h"
#include<iostream>
#include<winsock2.h>
#include<ws2bth.h>
#include<bluetoothapis.h>
#include<stdlib.h>
using namespace std;
#define SUCCESS 0
#pragma comment(lib, "ws2_32.lib")
#pragma comment(lib, "irprops.lib")
int main()
{
WSADATA data;
int result;
result = WSAStartup(MAKEWORD(2, 2), &data);
if (result != SUCCESS)
{
cout << "error occured while initialising winsock...";
exit(result);
}
cout << "winsock initialisation successful\n";
WSAQUERYSET queryset;
memset(&queryset, 0, sizeof(WSAQUERYSET));
queryset.dwSize = sizeof(WSAQUERYSET);
queryset.dwNameSpace = NS_BTH;
HANDLE hLookup;
result = WSALookupServiceBegin(&queryset, LUP_CONTAINERS, &hLookup);
if (result != SUCCESS)
{
cout << "error in initialising look up service\n";
exit(result);
}
cout << "initialising lookup service successful\n";
BYTE buffer[4096];
memset(buffer, 0, sizeof(buffer));
DWORD bufferLength = sizeof(buffer);
WSAQUERYSET *pResults = (WSAQUERYSET*)&buffer;
while (result == SUCCESS)
{
result = WSALookupServiceNext(hLookup, LUP_RETURN_NAME | LUP_CONTAINERS | LUP_RETURN_ADDR | LUP_FLUSHCACHE | LUP_RETURN_TYPE | LUP_RETURN_BLOB | LUP_RES_SERVICE, &bufferLength, pResults);
if (result == SUCCESS)
{
//DEVICE FOUND
LPTSTR s = pResults->lpszServiceInstanceName;
cout << s << endl;
Sleep(1000);
}
}
WSALookupServiceEnd(hLookup);
return 0;
}
I require help in solving this issue
Thanks in advance for any help
You have a (potential) mismatch of character encodings. The line
LPTSTR s = pResults->lpszServiceInstanceName;
expands to
LPWSTR s = pResults->lpszServiceInstanceName;
if you have your project's character encoding set to Unicode (default setting). To output a Unicode string, you have to use std::wcout instead of std::cout:
LPCWSTR s = pResults->lpszServiceInstanceName;
wcout << s << endl;
To reduce the odds of inadvertently using an unexpected character encoding, code should explicitly specify the character encoding it uses. The code in the question should use WSAQUERYSETW, and call WSALookupServiceBeginW and WSALookupServiceNextW instead.
Explanation of the observed behavior:
std::cout interprets a const char* as a C-style string, and displays the characters until it finds a NUL character (see operator<<(std::basic_ostream)).
A const wchar_t*, on the other hand, is not interpreted to mean anything special. std::cout treats it like any other pointer, and prints its value using the hexadecimal numeral system by default (see std::basic_ostream::operator<<).

wstring::find() doesn't work with non-latin symbols?

I have an wide-character string (std::wstring) in my code, and I need to search wide character in it.
I use find() function for it:
wcin >> str;
wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");
L'ф' is a Cyrillic letter.
But find() in same call always returns npos. In a case with Latin letters find() works correctly.
It is a problem of this function?
Or I incorrectly do something?
UPD
I use MinGW and save source in UTF-8.
I also set locale with setlocale(LC_ALL, "");.
Code same wcout << L'ф'; works coorectly.
But same
wchar_t w;
wcin >> w;
wcout << w;
works incorrectly.
It is strange. Earlier I had no problems with the encoding, using setlocale ().
The encoding of your source file and the execution environment's encoding may be wildly different. C++ makes no guarantees about any of this. You can check this by outputting the hexadecimal value of your string literal:
std::wcout << std::hex << L"ф";
Before C++11, you could use non-ASCII characters in source code by using their hex values:
"\x05" "five"
C++11 adds the ability to specify their Unicode value, which in your case would be
L"\u03A6"
If you're going full C++11 (and your environment ensures these are encoded in UTF-*), you can use any of char, char16_t, or char32_t, and do:
const char* phi_utf8 = "\u03A6";
const char16_t* phi_utf16 = u"\u03A6";
const char32_t* phi_utf16 = U"\u03A6";
You must set the encoding of the console.
This works:
#include <iostream>
#include <string>
#include <io.h>
#include <fcntl.h>
#include <stdio.h>
using namespace std;
int main()
{
_setmode(_fileno(stdout), _O_U16TEXT);
_setmode(_fileno(stdin), _O_U16TEXT);
wstring str;
wcin >> str;
wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");
system("pause");
return 0;
}
std::wstring::find() works fine. But you have to read the input string correctly.
The following code runs fine on Windows console (the input Unicode string is read using ReadConsoleW() Win32 API):
#include <exception>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <windows.h>
using namespace std;
class Win32Error : public runtime_error
{
public:
Win32Error(const char* message, DWORD error)
: runtime_error(message)
, m_error(error)
{}
DWORD Error() const
{
return m_error;
}
private:
DWORD m_error;
};
void ThrowLastWin32(const char* message)
{
const DWORD error = GetLastError();
throw Win32Error(message, error);
}
void Test()
{
const HANDLE hStdIn = GetStdHandle(STD_INPUT_HANDLE);
if (hStdIn == INVALID_HANDLE_VALUE)
ThrowLastWin32("GetStdHandle failed.");
static const int kBufferLen = 200;
wchar_t buffer[kBufferLen];
DWORD numRead = 0;
if (! ReadConsoleW(hStdIn, buffer, kBufferLen, &numRead, nullptr))
ThrowLastWin32("ReadConsoleW failed.");
const wstring str(buffer, numRead - 2);
static const wchar_t kEf = 0x0444;
wcout << ((str.find(kEf) != wstring::npos) ? L"EXIST" : L"NONE");
}
int main()
{
static const int kExitOk = 0;
static const int kExitError = 1;
try
{
Test();
return kExitOk;
}
catch(const Win32Error& e)
{
cerr << "\n*** ERROR: " << e.what() << '\n';
cerr << " (GetLastError returned " << e.Error() << ")\n";
return kExitError;
}
catch(const exception& e)
{
cerr << "\n*** ERROR: " << e.what() << '\n';
return kExitError;
}
}
Output:
C:\TEMP>test.exe
abc
NONE
C:\TEMP>test.exe
abcфabc
EXIST
That's probably an encoding issue. wcin works with an encoding different from your compiler's/source code's. Try entering the ф in the console/wcin -- it will work. Try printing the ф via wcout -- it will show a different character or no character at all.
There is no platform independent way to circumvent this, but if you are on windows, you can manually change the console encoding, either with the chchp commandline command or programmatically with SetConsoleCP() (input) and SetConsoleOutputCP() (output).
You could also change your source file's/compiler's encoding. How this is done depends on your editor/compiler. If you are using MSVC, this answer might help you: https://stackoverflow.com/a/1660901/2128694

RegQueryValueEx - What code add to this function to show ERROR_SUCCESS

What code add to this function to work good? (ERROR_SUCCESS)
I have code, that check value in registry.
In function RegQueryValueEx is bug.
When oldValue is few letters longer than newValue, function shows ERROR_MORE_DATA, but I want want ERROR_SUCCESS
What code add to this function to do this?
void function(string newValue, string key, string name)
{
// string key - key in registry, ie Myapp\\Options
// string name - name in registry
// string newValue - data in REG_SZ
string oldValue;
DWORD keytype = REG_SZ;
HKEY keyHandle;
DWORD size = sizeof(string);
if(RegOpenKeyEx(HKEY_LOCAL_MACHINE, key.c_str(),0L,KEY_ALL_ACCESS,&keyHandle) == ERROR_SUCCESS)
{
LONG isgood = RegQueryValueEx(keyHandle, name.c_str(), 0, &keytype, (LPBYTE)&oldValue, &size);
if(isgood == ERROR_MORE_DATA)
{
cout << "Error more data\n";
}
if(isgood == ERROR_SUCCESS)
{
cout << "Old data is " << oldValue.c_str() << endl;
cout << "New data is " << newValue.c_str() << endl;
if(strcmp(newValue.c_str(), oldValue.c_str()) != 0) // compare 2 strings, if
{
cout << "String 1 and string 2 are different";
}
else
{
cout << "String 1 and string 2 are the same";
}
}
if(isgood == ERROR_FILE_NOT_FOUND)
{
cout << "Name in registry not found!";
}
}
}
ERROR_MORE_DATA means that you need to pass in a larger string buffer. The typical pattern you'll need to use is to call once to get the size, then allocate a properly-sized buffer, then call again. Or, alternatively, you can guess at a size, pass in that-sized buffer, and increase size if you get ERROR_MORE_DATA back.
BTW, you are also computing size incorrectly. And you're not closing the registry key. And you're not prepared to support being compiled under unicode or non-unicode modes.
Here's some revised code which addresses these issues.
#include <string>
#include <vector>
#include <iostream>
#include <windows.h>
using namespace std;
namespace std
{
#ifdef _UNICODE
#define tcout wcout
#define tcin wcin
typedef wstring tstring;
#else
#define tcout cout
#define tcin cin
typedef string tstring;
#endif
};
void function(tstring newValue, tstring key, tstring name)
{
// string key - key in registry, ie Myapp\\Options
// string name - name in registry
// string newValue - data in REG_SZ
HKEY keyHandle;
if(RegOpenKeyEx(HKEY_LOCAL_MACHINE, key.c_str(),0L,KEY_ALL_ACCESS,&keyHandle) == ERROR_SUCCESS)
{
DWORD size = 500; // initial size
vector<TCHAR> buf(size);
tstring oldValue;
DWORD keytype = REG_SZ;
LONG isgood = RegQueryValueEx(keyHandle, name.c_str(), 0, &keytype, (LPBYTE) &buf[0], &size);
if(isgood == ERROR_SUCCESS)
{
oldValue.assign (&buf[0], size);
}
else if(isgood == ERROR_MORE_DATA)
{
buf.reserve (size); // expand to however large we need
isgood = RegQueryValueEx(keyHandle, name.c_str(), 0, &keytype, (LPBYTE)&buf[0], &size);
if(isgood == ERROR_SUCCESS)
oldValue.assign (&buf[0], size);
}
RegCloseKey (keyHandle); // remember to close this!
if(isgood == ERROR_SUCCESS)
{
tcout << _T("Old data is ") << oldValue << endl;
tcout << _T("New data is ") << newValue << endl;
if(newValue.compare(oldValue) != 0) // compare 2 strings, if
{
tcout << _T("String 1 and string 2 are different");
}
else
{
tcout << _T("String 1 and string 2 are the same");
}
}
if(isgood == ERROR_FILE_NOT_FOUND)
{
tcout << _T("Name in registry not found!");
}
}
}
int _tmain(int argc, _TCHAR* argv[])
{
tstring val;
function (val, _T("SOFTWARE\\Microsoft\\Windows\\CurrentVersion"), _T("CommonFilesDir"));
return 0;
}
ERROR_MORE_DATA means the buffer you supplied to hold the data is not big enough.
Your problems are multiple:
When you say sizeof(string) you're getting the size of the string data type, not the length of the string. You should call string::size() to get the length of the string.
You can't just cast a string to an LPBYTE. That is going to FAIL miserably. The Registry APIs are not designed to work with strings, they are designed to work with char* and WCHAR* types. You need to declare a local character array (e.g. char *foo = new char[256]) and then pass that. If you get ERROR_MORE_DATA, declare a bigger one.