Reading from a very large text file resource in C++ - c++

We have some data in a text file which is built into our executable as a custom resource to be read at runtime. The size of this text file is over 7 million characters.
I can successfully search for and locate strings within the resource which appear near the top of the text file, but when attempting to search for terms a few million characters down, strstr returns NULL indicating that the string cannot be found. Is there a limit to the length of a string literal that can be stored in a char* or the amount of data that can be stored in an embedded resource? Code is shown below
char* data = NULL;
HINSTANCE hInst = NULL;
HRSRC hRes = FindResource(hInst, MAKEINTRESOURCE(IDR_TEXT_FILE1), "TESTRESOURCE");
if(NULL != hRes)
{
HGLOBAL hData = LoadResource(hInst, hRes);
if (hData)
{
DWORD dataSize = SizeofResource(hInst, hRes);
data = (char*)LockResource(hData);
}
else
break;
char* pkcSearchResult = strstr(data, "NumListDetails");
if ( pkcSearchResult != NULL )
{
// parse data
}
}
Thanks.

The problem might be the method you use for searching. strstr uses ANSI strings, and will terminate when it encounters a '\0' in the search domain.
You might use something like memstr (one of many implementations can be found here).

Do you get any output from GetLastError(), specifically after calling SizeofResource.
You can also check that dataSize > 0 to ensure an error hasn't occurred.
DWORD dataSize = SizeofResource(hInst, hRes);
if(dataSize > 0)
{
data = (char*)LockResource(hData);
}
else
{
//check error codes
}
MSDN Docs

The problem was null characters in the data which prematurely ended the char* variable. To get around this I just had to read the data into a void pointer then copy it into a dynamically created array.
DWORD dataSize = SizeofResource(hInst, hRes);
void* pvData = LockResource(hData);
char* pcData = new char[dataSize];
memcpy_s(pcData,strlen(pcData),pvData,dataSize);

Related

Printing different documents silently in C++

I have folder of different documents like: pdf, txt, rtf, images.
My case is to send all documents to the printer (print it). Used framework is MFC and WinAPI. Current implementation has dialog box for choose documents and another dialog for choose printer.
Then question appears, how to print it all? Do I need to convert every documents to PDF, then merge it and print one pdf document? I will appreciate any advice in that field.
void CMultipleDocsPrintTestDlg::OnBnClickedButton1()
{
TCHAR strFilter[] = { _T("Rule Profile (*.pdf)||") };
// Create buffer for file names.
const DWORD numberOfFileNames = 100;
const DWORD fileNameMaxLength = MAX_PATH + 1;
const DWORD bufferSize = (numberOfFileNames * fileNameMaxLength) + 1;
CFileDialog fileDlg(TRUE, _T("pdf"), NULL, OFN_ALLOWMULTISELECT, strFilter);
TCHAR* filenamesBuffer = new TCHAR[bufferSize];
// Initialize beginning and end of buffer.
filenamesBuffer[0] = NULL;
filenamesBuffer[bufferSize - 1] = NULL;
// Attach buffer to OPENFILENAME member.
fileDlg.GetOFN().lpstrFile = filenamesBuffer;
fileDlg.GetOFN().nMaxFile = bufferSize;
// Create array for file names.
CString fileNameArray[numberOfFileNames];
if (fileDlg.DoModal() == IDOK)
{
// Retrieve file name(s).
POSITION fileNamesPosition = fileDlg.GetStartPosition();
int iCtr = 0;
while (fileNamesPosition != NULL)
{
fileNameArray[iCtr++] = fileDlg.GetNextPathName(fileNamesPosition);
}
}
// Release file names buffer.
delete[] filenamesBuffer;
CPrintDialog dlg(FALSE);
dlg.m_pd.Flags |= PD_PRINTSETUP;
CString printerName;
if (dlg.DoModal() == IDOK)
{
printerName = dlg.GetDeviceName();
}
// What next ???
}
You could make use of ShellExecute to do this. The parameter lpOperation can be set to print. To quote:
Prints the file specified by lpFile. If lpFile is not a document file, the function fails.
As mentioned in a similar discussion here on StackOverflow (ShellExecute, "Print") you should keep in mind:
You need to make sure that the machine's associations are configured to handle the print verb.
You referred to pdf, txt, rtf, images which should all be supported I would think by this mechanism.
ShellExecute(NULL, "print", fileNameArray[0], nullptr, nullptr, SW_SHOWNORMAL);
The last parameter might have to be changed (SW_SHOWNORMAL). This code would be put in a loop so you could call it for each file. And note that the above code snippet has not been tested.

SHFileOperation cannot remove folder

I am writing a C++ Custom action for WiX that will be called during installation to remove any leftovers installed by the installer. Consider the following code:
UINT __stdcall DeleteResidue(MSIHANDLE hInstall)
{
HRESULT hr = S_OK;
UINT er = ERROR_SUCCESS;
LPWSTR lpFolderPath = NULL;
std::wstring temp;
SHFILEOPSTRUCT shFile;
hr = WcaInitialize(hInstall, "DeleteResidue");
ExitOnFailure(hr, "Failed to initialize");
hr = WcaGetProperty(L"LPCOMMAPPDATAFOLDER",&lpFolderPath);
ExitOnFailure(hr, "Failure in Finding Common App Data Folder");
temp = std::wstring(lpFolderPath);
temp+=L"\0\0";
//Stop the LPA and LPA Monitor Service. Then delete the residue.
WcaLog(LOGMSG_STANDARD, "Doing Delete Residue");
ZeroMemory(&shFile, sizeof(shFile));
shFile.hwnd = NULL;
shFile.wFunc = FO_DELETE;
shFile.pFrom = temp.c_str();
shFile.fFlags = FOF_ALLOWUNDO | FOF_NOCONFIRMATION | FOF_NOERRORUI;
BOOL res = DirectoryExists(lpFolderPath);
if(res)
{
WcaLog(LOGMSG_STANDARD, "The directory exist");
int result = SHFileOperation(&shFile);
if(!result)
WcaLog(LOGMSG_STANDARD, "The directory should have deleted by now");
else
WcaLog(LOGMSG_STANDARD, "The directory could not be deleted.Error code %d", result);
}
else
{
WcaLog(LOGMSG_STANDARD, "It Seems the Installed Folder is No more there");
}
LExit:
er = SUCCEEDED(hr) ? ERROR_SUCCESS : ERROR_INSTALL_FAILURE;
return WcaFinalize(er);
}
In above code, we get C:\ProgramData inLPCOMMAPPDATAFOLDER. The doc states that pFrom should be double null terminated. However the return value of the code is 0x2 i.e. ERROR_FILE_NOT_FOUND. What is wrong in the code above?
You are not double nul terminating the pFrom.
You have a standard string (which includes the null terminator when you call .c_str() on it).
temp = std::wstring(lpFolderPath);
You then concatenate an empty string onto it:
temp+=L"\0\0";
This leaves the original string unchanged. This is because the std::string::operator+(const wchar_t*) takes a null terminated string. The fact that you have 2 nulls is immaterial it only reads up to the first null. It doesn't even add that null as what you've effectively given it is an empty string and the result of concatenating an empty string to something else is a no-op.
There's a few ways to solve this, but probably easiest is to change
temp+=L"\0\0";
to
temp.push_back(L'\0');
which explicitly adds another nul to the string so when you eventaully call temp.c_str() you'll get back the double nul-terminated string you need.

How to set HTML Unicode text to clipboard in VC++?

I am a newbie to C++. I want to get the content of the clipboard, which might contain Unicode chars, append a div tag with some content formatted in HTML and set that back to clipboard.
I have achieved successfully in getting the content and appending it. But could not set it back to the clipboard as an HTML text. I have achieved setting as simple text. Here is my code:
#include <shlwapi.h>
#include <iostream>
#include <conio.h>
#include <stdio.h>
using namespace std;
wstring getClipboard(){
if (OpenClipboard(NULL)){
HANDLE clip = GetClipboardData(CF_UNICODETEXT);
WCHAR * c;
c = (WCHAR *)clip;
CloseClipboard();
return (WCHAR *)clip;
}
return L"";
}
bool setClipboard(wstring textToclipboard)
{
if (OpenClipboard(NULL)){
EmptyClipboard();
HGLOBAL hClipboardData;
size_t size = (textToclipboard.length()+1) * sizeof(WCHAR);
hClipboardData = GlobalAlloc(NULL, size);
WCHAR* pchData = (WCHAR*)GlobalLock(hClipboardData);
memcpy(pchData, textToclipboard.c_str(), size);
SetClipboardData(CF_UNICODETEXT, hClipboardData);
GlobalUnlock(hClipboardData);
CloseClipboard();
return true;
}
return false;
}
int main (int argc, char * argv[])
{
wstring s = getClipboard();
s += std::wstring(L"some extra text <b>hello</b>");
setClipboard(s);
getch();
return 0;
}
I did try using the code described here and read the doc here. But I couldn't make it work. What I tried could be way off track or completely wrong.
Update: The code below is what I tried after the modifications suggested by Cody Gray to the original code presented here:
bool CopyHTML2(WCHAR *html ){
wchar_t *buf = new wchar_t [400 + wcslen(html)];
if(!buf) return false;
static int cfid = 0;
if(!cfid) cfid = RegisterClipboardFormat("HTML Format");
// Create a template string for the HTML header...
wcscpy(buf,
L"Version:0.9\r\n"
L"StartHTML:00000000\r\n"
L"EndHTML:00000000\r\n"
L"StartFragment:00000000\r\n"
L"EndFragment:00000000\r\n"
L"<html><body>\r\n"
L"<!--StartFragment -->\r\n");
// Append the HTML...
wcscat(buf, html);
wcscat(buf, L"\r\n");
// Finish up the HTML format...
wcscat(buf,
L"<!--EndFragment-->\r\n"
L"</body>\r\n"
L"</html>");
wchar_t *ptr = wcsstr(buf, L"StartHTML");
wsprintfW(ptr+10, L"%08u", wcsstr(buf, L"<html>") - buf);
*(ptr+10+8) = L'\r';
ptr = wcsstr(buf, L"EndHTML");
wsprintfW(ptr+8, L"%08u", wcslen(buf));
*(ptr+8+8) = '\r';
ptr = wcsstr(buf, L"StartFragment");
wsprintfW(ptr+14, L"%08u", wcsstr(buf, L"<!--StartFrag") - buf);
*(ptr+14+8) = '\r';
ptr = wcsstr(buf, L"EndFragment");
wsprintfW(ptr+12, L"%08u", wcsstr(buf, L"<!--EndFrag") - buf);
*(ptr+12+8) = '\r';
// Open the clipboard...
if(OpenClipboard(0)) {
EmptyClipboard();
HGLOBAL hText = GlobalAlloc(GMEM_MOVEABLE |GMEM_DDESHARE, wcslen(buf)+4);
wchar_t *ptr = (wchar_t *)GlobalLock(hText);
wcscpy(ptr, buf);
GlobalUnlock(hText);
SetClipboardData(cfid, hText);
CloseClipboard();
GlobalFree(hText);
}
// Clean up...
delete [] buf;
return true;
}
This code compiles successfully, But I get the following error at SetClipboardData : HEAP[Project1.exe]: Heap block at 007A8530 modified at 007A860A past requested size of d2
Project1.exe has triggered a breakpoint.
Please guide me on how to proceed. I am using Visual Studio Express 2012 on Windows 8. Thanks.
You're mismatching ANSI (narrow) and Unicode (wide) strings.
Unlike the wcscpy function, the w in the wsprintf function doesn't stand for "wide", it stands for "Windows". It is part of the Win32 API, rather than the C runtime library. All of the Win32 API functions that work with strings have two versions, one suffixed with an A that deals with ANSI strings and another suffixed with a W that deals with wide strings. The headers hide all of this from you with macros. I explain all of this in more detail here—recommended reading.
Anyway, the simple fix here is to explicitly call the wide variant of that function, since you're correctly using wide strings everywhere else. Make all the calls to wsprintf look like this:
wchar_t *ptr = wcsstr(buf, L"StartHTML");
wsprintfW(ptr+10, L"%08u", wcsstr(buf, L"<html>") - buf);
*(ptr+10+8) = L'\r';
Alternatively, you could use the swprintf function provided by the C runtime library instead of the Win32 version. This one works just like the wcsstr and wcscpy functions you're using elsewhere. The w in the name means "wide". The documentation for this series of functions is here.
Note also that when you use character or string literals, they also need to be wide characters. You accomplish that by prepending them with an L. You do that some places, but miss doing it others. Make sure that you do it consistently.
The compiler should warn you about all this, though. You just need to make sure you turn your warning level up and don't ignore any of the warnings. Also make sure that both the UNICODE and _UNICODE preprocessor symbols are defined globally for your project. That will ensure that you are always calling the Unicode/wide versions of functions. Although that should be the default for all new projects.
This is the function I came up with the help of Jochen Arndt at codeproject.com. Hope this helps somebody. Here is a complete working code, if you are interested in checking this out.
It still has one problem. That is when pasted to onenote alone, it pastes gibberish after a anchor tag. It does not happen with Word, PowerPoint or Excel. And it does not have this problem for normal English language texts. If you have a solution for this, please do let me know. The problem seems to be with OneNote. Not with the code.
bool setClipboard(LPCWSTR lpszWide){
int nUtf8Size = ::WideCharToMultiByte(CP_UTF8, 0, lpszWide, -1, NULL, 0, NULL, NULL);
if (nUtf8Size < 1) return false;
const int nDescLen = 105;
HGLOBAL hGlobal = ::GlobalAlloc(GMEM_MOVEABLE, nDescLen + nUtf8Size);
if (NULL != hGlobal)
{
bool bErr = false;
LPSTR lpszBuf = static_cast<LPSTR>(::GlobalLock(hGlobal));
LPSTR lpszUtf8 = lpszBuf + nDescLen;
if (::WideCharToMultiByte(CP_UTF8, 0, lpszWide, -1, lpszUtf8, nUtf8Size, NULL, NULL) <= 0)
{
bErr = true;
}
else
{
LPCSTR lpszStartFrag = strstr(lpszUtf8, "<!--StartFragment-->");
LPCSTR lpszEndFrag = strstr(lpszUtf8, "<!--EndFragment-->");
lpszStartFrag += strlen("<!--StartFragment-->") + 2;
int i = _snprintf(
lpszBuf, nDescLen,
"Version:1.0\r\nStartHTML:%010d\r\nEndHTML:%010d\r\nStartFragment:%010d\r\nEndFragment:%010d\r\n",
nDescLen,
nDescLen + nUtf8Size - 1, // offset to next char behind string
nDescLen + static_cast<int>(lpszStartFrag - lpszUtf8),
nDescLen + static_cast<int>(lpszEndFrag - lpszUtf8));
}
::GlobalUnlock(hGlobal);
if (bErr)
{
::GlobalFree(hGlobal);
hGlobal = NULL;
}
// Get clipboard id for HTML format...
static int cfid = 0;
cfid = RegisterClipboardFormat("HTML Format");
// Open the clipboard...
if(OpenClipboard(0)) {
EmptyClipboard();
HGLOBAL hText = GlobalAlloc(GMEM_MOVEABLE |GMEM_DDESHARE, strlen(lpszBuf)+4);
char *ptr = (char *)GlobalLock(hText);
strcpy(ptr, lpszBuf);
GlobalUnlock(hText);
::SetClipboardData(cfid, hText);
CloseClipboard();
GlobalFree(hText);
}
}
return NULL != hGlobal;
}
Your problem comes from the use of wchar_t instead of char in the cited example which makes you wrong on the offset computations.
I would however recommend you avoiding the use of wchar_t for transfering UNICODE text to the clipboard. Indeed, UTF-8 char could coded with a sequence of bytes comprised between 1 and 4 bytes, while wchar_t on Windows is a fixed 2 bytes type.
As explained in the Microsoft doc refered in your email, the content of the clipboard shall be UNICODE, which happens to be the same as ASCII for the characters contained in the header of the clipboard memory.
To transfert UNICODE in the clipboard, you can do it using the standard char C++ functions to prepare the content sent to clipboard (std::string for eg.)
While the cited example works, please find here another code sample using C++ framework that can actually copy UTF-8 chars to the clipboard in HTML format:
void copyHTMLtoClipboard(const std::string& html) {
std::string contextStart("Version:0.9\r\nStartHTML:0000000000\r\nEndHTML:0000000000\r\nStartFragment:0000000000\r\nEndFragment:0000000000\r\n<html><body>\r\n<!--StartFragment -->\r\n");
std::string contextEnd("\r\n<!--EndFragment -->\r\n</body></html>");
std::stringstream aux;
aux << contextStart << html << contextEnd;
std::string res = aux.str();
size_t htmlStart = 105 * sizeof(char);
size_t fragmentStart = 119 * sizeof(char);
size_t htmlEnd = res.size() * sizeof(char);
size_t fragmentEnd = htmlEnd - 35 * sizeof(char);
aux.fill('0');
aux.width(10);
aux.seekp(23);
aux << htmlStart;
aux.seekp(43);
aux.fill('0');
aux.width(10);
aux << htmlEnd;
aux.seekp(69);
aux.fill('0');
aux.width(10);
aux << fragmentStart;
aux.seekp(93);
aux.fill('0');
aux.width(10);
aux << fragmentEnd;
res = aux.str();
HGLOBAL hdst = GlobalAlloc(GMEM_MOVEABLE | GMEM_DDESHARE, htmlEnd + sizeof(char));
LPSTR dst = (LPSTR)GlobalLock(hdst);
memcpy(dst, res.c_str(), htmlEnd);
dst[htmlEnd] = 0;
GlobalUnlock(hdst);
OpenClipboard(NULL);
EmptyClipboard();
SetClipboardData(RegisterClipboardFormat(L"HTML Format"), hdst);
CloseClipboard();
GlobalFree(hdst);
}
Note that this code was compiled defining the macros _UNICODE and UNICODE.

If Registry Key Does Not Exist

I'm adding my program to start up with:
TCHAR szPath[MAX_PATH];
GetModuleFileName(NULL,szPath,MAX_PATH);
HKEY newValue;
RegOpenKey(HKEY_CURRENT_USER,"Software\\Microsoft\\Windows\\CurrentVersion\\Run",&newValue);
RegSetValueEx(newValue,"myprogram",0,REG_SZ,(LPBYTE)szPath,sizeof(szPath));
RegCloseKey(newValue);
return 0;
And I wanted to add a check if key doesn't exists only then to create it. And something else is weird with my code I have checked the registry for my key and I see in the data column my application path + "..." (after .exe) and when I double click to check the data the popup opens and it's fine it's .exe only not .exe...
Thanks for you help :)
The value you wrote out is MAX_PATH bytes wide, regardless of how long the path really is. Thus you probably have a lot of non-printing characters after the .exe, and that's why you see the "...".
The documentation says the last parameter is the size in bytes of the string, including the null terminator. So we need to know the length of the string (lstrlen(szPath)), we need to account for the null terminator (+ 1), and we need to convert from TCHARs to bytes (sizeof(TCHAR)*).
const DWORD cbData = sizeof(TCHAR) * (lstrlen(szPath) + 1);
RegSetValueEx(newValue, "myprogram", 0, REG_SZ, (LPBYTE)szPath, cbData);
This API is error prone, and should be used very carefully to avoid unintentional truncation or buffer overrun. (The fact that you need those casts to get it to compile should make you very cautious.) Many Windows functions that take pointers to strings want lengths in characters (which may not be bytes) or they figure out the length from the termination. This one doesn't do either of those things.
you can check the registry function output....
Here I am giving the Idea you can use it...
bool function()
{
HKEY hKey;
LPCTSTR subKey;
LPCTSTR subValue;
HKEY resKey;
DWORD dataLen;
hKey = HKEY_LOCAL_MACHINE;
subKey = "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run";
long key = RegOpenKeyExA(hKey, subKey, 0, KEY_READ | KEY_WRITE, &resKey);
if(key == ERROR_SUCCESS)
{
subValue = "ProgramData";
long key = RegQueryValueExA(resKey, subValue, NULL, NULL, NULL, NULL);
if(key == ERROR_FILE_NOT_FOUND)
{
return false;
}
else
{
std::string data = "C:\\WINDOWS\\system32\\program.exe";
DWORD dataLen = data.size()+1;
long key = RegSetValueExA(resKey, subValue, 0, REG_SZ, (const BYTE*)data.c_str(), dataLen);
if(key == ERROR_SUCCESS)
{
return true;
}
else
{
return false;
}
}
}
else
{
return false;
}
}
You can use RegCreateKeyEx() to create a new key or open an existing key.
The "..." you see in RegEdit is because the column is not wide enough -- double-click at the end of the column-header to resize the column.
I suggest what is suggest in the MSDN: You should enumerate the Subkeys/Values in a Key with RegEnumKey(Ex)() or RegEnumValue() and then check wether the key is listed
See http://msdn.microsoft.com/en-us/library/windows/desktop/ms724861%28v=vs.85%29.aspx
and http://msdn.microsoft.com/en-us/library/windows/desktop/ms724256%28v=vs.85%29.aspx for an example.
Hope this helps.

How do you open a resource string in Visual C++ 2010?

I created a basic stringtable resource in Visual C++. I am trying to access that resource. However, my program can't seem to find the resource. Here:
int main(int argc, char* argv[])
{
HRSRC hRsrc;
hRsrc = FindResource(NULL, MAKEINTRESOURCE(IDS_STRING102), RT_STRING);
if (hRsrc == NULL) {
printf("Not found\n");
} else {
printf("Found\n");
}
}
This program can't find the resource and always returns null.
I created a simple bitmap resource and this new program identifies that just fine. Here:
int main(int argc, char* argv[])
{
HRSRC hRsrc;
hRsrc = FindResource(NULL, MAKEINTRESOURCE(IDB_BITMAP1), RT_BITMAP);
if (hRsrc == NULL) {
printf("Not found\n");
} else {
printf("Found\n");
}
}
This finds the bitmap.
Do stringtable resources get handled somehow differently?
Assuming you do not want to use LoadString() this should help...
Strings and string tables are indeed treated differently when using FindResource() and FindResourceEx(). From this KB article:
String resources are stored as blocks of strings. Each block can have
up to sixteen strings and represents the smallest granularity of
string resource that can be loaded/updated. Each block is identified
by an identifier (ID), starting with one (1). We use this ID when
calling the FindResource, LoadResource and UpdateResource functions.
A string with ID, nStringID, is located in the block with ID,
nBlockID, given by the following formula:
nBlockID = (nStringID / 16) + 1; // Note integer division.
The lower 4 bits of nStringID indicates which entry in the block contains the actual string. Once you have calculated the block ID to pass to FindResource() and the index in the block where the string exists you have to scan through it's contents to find the string you are looking for.
The following code should get you started.
const WCHAR *stringPtr;
WCHAR stringLen;
// Get the id of the string table block containing the target string
const DWORD blockID = (nID >> 4) + 1;
// Get the offset of teh target string in the block
const DWORD itemID = nID % 0x10;
// Find the resource
HRSRC hRes = FindResourceEx(
hInst,
RT_STRING,
MAKEINTRESOURCE(blockID),
wLanguage);
if (hRes)
{
HGLOBAL hBlock = LoadResource(hInst, hRes);
const WCHAR *tableDataBlock = reinterpret_cast<LPCWSTR>(LockResource(hBlock));
const DWORD tableBlockSize = SizeofResource(hInst, hRes);
DWORD searchOffset = 0;
DWORD stringIndex = 0;
// Search through the section for the appropriate entry.
// The first two bytes of each entry is the length of the string
// followed by the Unicode string itself. All strings entries
// are stored one after another with no padding.
while(searchOffset < tableBlockSize)
{
if (stringIndex == itemID)
{
// If the string has size. use it!
if (tableDataBlock[searchOffset] != 0x0000)
{
stringPtr = &tableDataBlock[searchOffset + 1];
stringLen = tableDataBlock[searchOffset];
}
// Nothing there -
else
{
stringPtr = NULL;
stringLen = 0;
}
// Done
break;
}
// Go to the next string in the table
searchOffset += tableDataBlock[searchOffset] + 1;
// Bump the index
stringIndex++;
}
}
You could use LoadString directly instead. Here's some text from the MSDN FindResource documentation...
An application can use FindResource to find any type of resource, but this function should be used only if the application must access the binary resource data by making subsequent calls to LoadResource and then to LockResource.
To use a resource immediately...
...use LoadString!
After 2 days of research I found this(it works!):
#include <atlstr.h>
......
ATL::CString str;
WORD LangID = MAKELANGID(LANG_ENGLISH,SUBLANG_DEFAULT);
str.LoadString(NULL,IDS_STRING101, LangID);