We have created a class to customize the Open File dialog in C++.
There is a character array, 'm_fileNameBuf', that holds the names of the selected files. Since the buffer size is set to 5000, it can hold at most 5000 characters.
Later, a large number of files was selected, the total length of the file names exceeded the buffer, and this caused a problem. So we increased the size to 100K. But then there was a case where an even larger number of files was selected, and the problem came back.
So the question is: how can we avoid this problem? Instead of hard-coding the array size, is there any way to size the buffer according to the total length of the selected file names?
class DLLEXPORT CustomOpenFileDialog
{
......
......
private:
OPENFILENAME m_OpenFileName;
static const long m_fileNameBufSize = 5000;
TCHAR m_fileNameBuf[m_fileNameBufSize];
......
};
CustomOpenFileDialog::CustomOpenFileDialog()
{
.....
.....
m_OpenFileName.lpstrFile = m_fileNameBuf;
m_OpenFileName.nMaxFile = m_fileNameBufSize;
}
void CustomOpenFileDialog::SetFileName(TCHAR* name)
{
_tcsncpy_s(m_fileNameBuf, m_fileNameBufSize, name, m_fileNameBufSize);
.....
}
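One possible direction, as a minimal sketch rather than a definitive fix: keep the names in a std::vector whose size is chosen at run time, and point lpstrFile and nMaxFile at it instead of at a fixed array. The names FileNameBuffer and SplitMultiSelect are hypothetical, wchar_t stands in for TCHAR, and the sketch makes no Windows calls so it compiles anywhere. It assumes the Explorer-style multi-select layout, in which the directory and the file names are separated by single nulls and the list ends with a double null.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical replacement for the fixed TCHAR array: the capacity is a
// run-time value, so it can be grown and the dialog shown again.
class FileNameBuffer
{
public:
    explicit FileNameBuffer(std::size_t capacity) : m_buf(capacity, L'\0')
    {
        assert(capacity >= 2); // room for the terminating double null
    }

    // In the dialog class these would feed lpstrFile and nMaxFile.
    wchar_t* data() { return m_buf.data(); }
    std::size_t size() const { return m_buf.size(); }

    // Grow the buffer; the old contents are cleared.
    void Grow(std::size_t newCapacity)
    {
        if (newCapacity > m_buf.size())
            m_buf.assign(newCapacity, L'\0');
    }

    // Copy an initial file name, truncating it if it does not fit.
    void SetFileName(const std::wstring& name)
    {
        const std::size_t n = std::min(name.size(), m_buf.size() - 1);
        std::fill(m_buf.begin(), m_buf.end(), L'\0');
        std::copy(name.begin(), name.begin() + n, m_buf.begin());
    }

private:
    std::vector<wchar_t> m_buf;
};

// Split "dir\0file1\0file2\0\0" into its parts, never reading past maxLen.
std::vector<std::wstring> SplitMultiSelect(const wchar_t* p, std::size_t maxLen)
{
    std::vector<std::wstring> parts;
    std::size_t i = 0;
    while (i < maxLen && p[i] != L'\0')
    {
        const std::size_t start = i;
        while (i < maxLen && p[i] != L'\0')
            ++i;
        parts.emplace_back(p + start, i - start);
        ++i; // skip the single null separator
    }
    return parts;
}
```

On Windows, a buffer that is too small is reported through CommDlgExtendedError() returning FNERR_BUFFERTOOSMALL, so a caller could call Grow and show the dialog again; that retry loop is not shown here.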
I am processing Common Crawl WARC files. These are 5 GB uncompressed. Inside there are text, XML and WARC headers.
This is the code I am particularly having trouble with:
wstring sub = buffer->substr(windowStart, windowSize);
This gives me the error "expression must have a pointer to class type". I take it that this is because the variable is a pointer to a heap memory block of that size, so I cannot run any string operations on it. But shouldn't the -> operator get the contents that it points to, so that I can run something like substr?
I am using a simple buffer like this because I understand that mapping the file to memory (MapViewOfFile, etc.) is more for random access. Is it actually slower if all I need is a sequential read?
I would like to read the file sequentially. To improve speed, I want to read the file into RAM in chunks and process each chunk before getting the next one from disk, say 1 MB per chunk.
I am not processing all the XML; some will be skipped. I am grabbing the text and some of the WARC headers and skipping the rest.
The idea is to use a sliding window through the file chunk in RAM. The window starts where it last left off in the chunk and grows in size in a loop. Once it reaches a sufficient size, a regex checks whether it contains any matching tags, headers or text. If so, the code either skips just that tag, skips ahead a fixed number of characters (500 in some cases, if it comes across a particular type of WARC header), or writes the tag out (if it is one I want to keep), and so on.
When the window matches, windowStart is set equal to windowEnd and the window starts expanding again to find the next pattern. When the buffer ends, the code keeps track of any partial tag and refills the buffer from disk.
The main problem I am running into is how to implement the sliding window. The buffer is a pointer to a location in heap memory, and I can't use the . or -> operators on it for some reason, so I can't use substr, regex, etc. I could make a copy, but do I really need to do that?
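On the last point, a copy may not be needed. A minimal sketch, assuming C++17 and that the chunk is treated as raw bytes (char) rather than TCHAR: std::string_view offers substr and find over memory you already own without copying it, and std::regex_search accepts a pointer range directly. The function name FindTags and the tag pattern are illustrative only, not taken from the code in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// Scan one chunk for <...> tags without copying the chunk.
// Returns the tags found; 'consumed' receives how many bytes were fully
// processed, so the caller can move the remainder (a possible partial tag)
// to the front of the next chunk.
std::vector<std::string> FindTags(const char* buffer, std::size_t length,
                                  std::size_t& consumed)
{
    assert(buffer != nullptr || length == 0);
    static const std::regex tagPattern("<[^<>]+>");
    std::vector<std::string> tags;
    std::string_view chunk(buffer, length); // a view, not a copy

    std::size_t windowStart = 0;
    std::cmatch m;
    while (windowStart < chunk.size() &&
           std::regex_search(chunk.data() + windowStart,
                             chunk.data() + chunk.size(), m, tagPattern))
    {
        tags.emplace_back(m[0].first, m[0].second);
        // The next window starts where this match ended.
        windowStart = static_cast<std::size_t>(m[0].second - chunk.data());
    }

    // A '<' left over after the last match may be a tag cut off by the
    // end of the chunk; stop before it so it is seen again next time.
    const std::size_t open = chunk.find('<', windowStart);
    consumed = (open == std::string_view::npos) ? chunk.size() : open;
    return tags;
}
```

The caller would copy the bytes from consumed to the end of the chunk to the start of the buffer before the next read, which is one way to handle a tag that straddles two chunks.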
Here's my code so far:
BOOL pageActive = FALSE;
BOOL xml = FALSE;
#define MAXBUFFERSIZE 1024
#define MAXTAGSIZE 64
DWORD windowStart = 0; DWORD windowEnd = 15; DWORD windowSize = 15; // buffer window containing tag candidate
wstring windowCopy;
DWORD bufferSize = MAXBUFFERSIZE;
_int64 fileRemaining;
HANDLE hFile;
DWORD dwBytesRead = 0;
OVERLAPPED ol = { 0 };
LARGE_INTEGER dwPosition;
TCHAR* buffer;
hFile = CreateFile(
inputFilePath, // file to open
GENERIC_READ, // open for reading
FILE_SHARE_READ | FILE_SHARE_WRITE, // share for reading and writing
NULL, // default security
OPEN_EXISTING, // existing file only
FILE_ATTRIBUTE_NORMAL, // normal file | FILE_FLAG_OVERLAPPED
NULL); // no attr. template
if (hFile == INVALID_HANDLE_VALUE)
{
DisplayErrorBox((LPWSTR)L"CreateFile");
return 0;
}
LARGE_INTEGER size;
GetFileSizeEx(hFile, &size);
_int64 fileSize = (__int64)size.QuadPart;
double gigabytes = fileSize * 9.3132e-10;
sendToReportWindow(L"file size: %lld bytes (%.1f gigabytes)\n", fileSize, gigabytes);
if(fileSize > MAXBUFFERSIZE)
{
TCHAR* buffer = new TCHAR[MAXBUFFERSIZE]; buffer[0] = 0;
//sendToReportWindow(L"buffer is MAXBUFFERSIZE\n");
}
else
{
TCHAR* buffer = new TCHAR[fileSize]; buffer[0] = 0;
//sendToReportWindow(L"buffer is fileSize + 1\n");
}
fileRemaining = fileSize;
sendToReportWindow(L"file remaining: %lld bytes\n", fileRemaining);
//TCHAR readBuffer[MAXBUFFERSIZE] = { 0 };
while (fileRemaining) // outer loop. while file remaining, read file chunk to buffer
{
if (bufferSize > fileRemaining) // as fileremaining gets smaller as file is processed, it eventually is smaller than the buffer
bufferSize = fileRemaining;
if (FALSE == ReadFile(hFile, buffer, bufferSize -1, &dwBytesRead, NULL))
//if (FALSE == ReadFile(hFile, readBuffer, bufferSize -1, &dwBytesRead, NULL))
{
sendToReportWindow(L"file read failed\n");
CloseHandle(hFile);
return 0;
}
fileRemaining -= bufferSize; //fileRemaining is size of the file left after this buffer is processed
sendToReportWindow(L"outer loop\n");
// declare and clear span char array[maxTagSize] // size of array is maximum tag size (64). This is for unused windows. Raw text is not considered a tag
while (windowEnd < bufferSize) //inner loop. while unused data remains in buffer
{
windowSize = windowEnd - windowStart;
// windowsize += span.size
// The window start position remains fixed as the window size is slowly increased. Once it is large enough, some conditional below begin to look at it.If any triggers, they eat that window. Setting the new start position at the previous end position.
// If the buffer ends mid - tag, the contents of the window are copy to the span array variable
// Page state. Tags in header
// If !pageActive
// if windowSize > 7 (warc / 1.0)
// Convert chunk to string for regex ? (prepend span array from previous loop)
// If Regex chunk WARC - Type : response pageActive = true; wstart = wend, clear span
// Elseif regex chunk other warc - type clear span; skip ahead 550 for start, 565 for end
// Continue
// // page is active
//
// if windowSize > 6
// If regex chunk WARC / \d pageActive = false; xml = false; wstart = wend, clear span; Continue
// If !xml
// If windowSize > 15 (warc date)
// Convert chunk to string for regex ? (prepend span array from previous loop)
// If regex chunk warc date output warc date; wstart = wend, clear span
// elseIf regex chunk warc uri output warc uri; wstart = wend, clear span; skip ahead 300
// ElseIf end of window has \n"<" Xml = true // any window size where xml is not started
// continue // whatever triggers in this !xml block, always continue
// // page and xml are active
// // only send to output bare text when a [^\n]< or newline is reached
// test where just outputs all the tags or text it finds
// pull out any <.+> sequences or any >.+< sequences
// multibyte conversion, build string of window
//LPCCH readBuffer = { "ab" }; // = buffer[2];
// std::string str2 = str.substr (3,5);
//wstring sub = (wstring)readBuffer.substr(0,5); // substring of buffer
wstring sub = buffer->substr(windowStart, windowSize);
TCHAR converted[64] = { 0 };
MultiByteToWideChar(CP_ACP, MB_COMPOSITE, (LPCCH)&sub, -1, converted, MAXBUFFERSIZE);
//MultiByteToWideChar(CP_ACP, MB_COMPOSITE, (LPCCH)buffer, MAXBUFFERSIZE, converted, 1); // convert between the utf encoding of the file to the utf encoding of windows?
sendToReportWindow(L"windowStart:%d windowEnd:%d char:%s\n", windowStart, windowEnd, converted);
//sendToReportWindow((LPWSTR)buffer[windowStart]);
windowStart = windowEnd;
// //Tags in body. Any chunk size
// Convert chunk to string for regex ? (prepend span array from previous loop)
// if regex chunk tag pattern output pattern, wstart = wend, clear span
// nested tags? no
// windowEnd++; // tests above did not bite. so increment end of window, increasing window size
} // inner loop: while windowEnd <buffersize
// end of buffer: load any unused window into span
//If windowEnd != windowStart // window start did not get set to end by regex above
//Span = buffer(start - end)
//file progress indicator
//fileSize / fileRemaining x 0.01 // calculate percentage of file remaining with each buffer load
//print progress
//windowStart = 0; windowEnd = 1; windowSize = 1 // look at smaller pieces after first iteration (not in w header)
} // outer loop. while fileRemaining
delete[] buffer; // memory from new[] must be released with delete[]
Which give me the error, "expression must have a pointer to class type".
buffer is a TCHAR*, a pointer to a plain character type rather than to a class, so it has no member functions such as substr.
Change it to:
wstring str(buffer);
wstring sub = str.substr(windowStart, windowSize);
Other code that needs to change:
Replace
MultiByteToWideChar(CP_ACP, MB_COMPOSITE, (LPCCH)&sub, -1, converted, MAXBUFFERSIZE);
sendToReportWindow(L"windowStart:%d windowEnd:%d char:%s\n", windowStart, windowEnd, converted);
with
sendToReportWindow(L"windowStart:%d windowEnd:%d char:%s\n", windowStart, windowEnd, sub.c_str()); // use the string::c_str method
In both branches of the if/else, remove the TCHAR* so that the outer buffer is assigned instead of a new local variable being declared:
buffer = new TCHAR[MAXBUFFERSIZE]; buffer[0] = 0; // TCHAR* removed
buffer = new TCHAR[fileSize]; buffer[0] = 0; // TCHAR* removed
I am not processing all the xml, some will be skipped. grabbing the text and some of the warc headers, skipping the rest.
You can use string::find to grab the WARC header. (Make sure the WARC header is unique.)
See, for example, "Check if a string contains a string in C++".
BTW, whether you use Unicode or multi-byte characters, you need to stick to a single encoding throughout.
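The string::find suggestion could look roughly like this. It is a sketch under assumptions: the chunk is held in a std::string of bytes (one encoding throughout), headers have the form "Name: value" on one line, and the helper name HeaderValue is made up for illustration. Because find matches anywhere in the text, it only works if the header name cannot also occur in the page body.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the value of the first "Name: value" header line in 'text',
// or an empty string if the header is not present.
std::string HeaderValue(const std::string& text, const std::string& name)
{
    assert(!name.empty());
    const std::string key = name + ":";
    std::size_t pos = text.find(key);
    if (pos == std::string::npos)
        return std::string();

    pos += key.size();
    while (pos < text.size() && text[pos] == ' ')
        ++pos; // skip spaces after the colon

    // The value runs up to the end of the line (or the end of the text).
    std::size_t end = text.find_first_of("\r\n", pos);
    if (end == std::string::npos)
        end = text.size();
    return text.substr(pos, end - pos);
}
```

For example, HeaderValue(record, "WARC-Type") could be compared with "response" to decide whether the page is one to keep.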
I'm trying to split a text file (which contains a list of 200 strings) and store alternating strings (the odd-numbered and even-numbered entries in the list) in the two columns of a 2D array.
The text file is ordered in this way (without the numbers):
Alabama
Brighton
Arkansas
Bermuda
Averton
Burmingham
I would like to store it in a two-dimensional array called strLine[101][2], so that the first string in the list is in location [0][0], the second string is in location [0][1], and so on until the file has been read and the list is organized like this (without the numbers):
Alabama | Brighton
Arkansas | Bermuda
Averton | Burmingham
My code outputs the original, unpaired list at the moment. I would like to know how to declare the 2D array (with correct syntax) and how to use an i, j for loop with getline() so that it fills each element of the 2D array.
Any help would be greatly appreciated.
My code:
bool LoadListBox()
{
// Declarations
ifstream fInput; // file handle
string strLine[201]; // array of string to hold file data
int index = 0; // index of StrLine array
TCHAR szOutput[50]; // output to listbox, 50 char TCHAR
// File Open Process
fInput.open("data.txt"); // opens the file for read only
if (fInput.is_open())
{
getline( // read a line from the file
fInput, // handle of file to read
strLine[index]); // storage destination and index iterator
while (fInput.good()) // while loop for open file
{
getline( // read line from data file
fInput, // file handle to read
strLine[index++]); // storage destination
}
fInput.close(); // close the file
index = 0; // resets back to start of string
while (strLine[index] != "") // while loop for string not void
{
size_t pReturnValue; // return code for mbstowcs_s
mbstowcs_s( // converts string to TCHAR
&pReturnValue, // return value
szOutput, // destination of the TCHAR
50, // size of the destination TCHAR
strLine[index].c_str(), // source of string as char
50); // max # of chars to copy
SendMessage( // message to a control
hWnd_ListBox, // handle to listbox
LB_ADDSTRING, // append string to listbox
NULL, // window parameter not used
LPARAM(szOutput)); // TCHAR to add
index++; // next element of string array
}
return true; // file loaded okay
}
return false; // file did not load okay
}
Step 1
Change string strLine[201]; to string place[100][2]; (200 strings make 100 pairs). Also consider making a
struct place
{
std::string state;
std::string city;
};
because it is a bit more explicit what exactly is being stored. More expressive code is easier to read, generally prevents mistakes (harder to accidentally use strLine[x][2] or something like that), and requires less commenting. Code that comments itself should be a personal goal. The compiler doesn't care, of course, but few people are compilers.
Step 2
Use two separate index variables. Name the first something like num_entries because what it's really doing is counting the number of items in the array.
Step 3
Read two lines into the inner array and test the result of the reads. If they read successfully, increment the index.
while (getline(fInput, place[num_entries][0]) && getline(fInput, place[num_entries][1]))
{
num_entries++;
}
Step 4 (optional clean-up)
Step 2 turns while (strLine[index] != "") into while (index < num_entries)
Replace all of the 50s with a constant. That way you can't change the value and miss a few 50s AND it's easier to infer meaning from a good, descriptive identifier than a raw number.
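Putting the steps together might look like the sketch below. It departs from the steps in two labeled ways: it uses a std::vector of the struct instead of a fixed string place[100][2] array, and it reads from any std::istream so the same function works for a file or a test string. The names Place and ReadPlaces are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Place
{
    std::string state;
    std::string city;
};

// Read lines two at a time. Both reads must succeed for an entry to be
// stored, so a trailing unpaired line is dropped.
std::vector<Place> ReadPlaces(std::istream& in, std::size_t maxEntries)
{
    assert(maxEntries > 0);
    std::vector<Place> places;
    Place p;
    while (places.size() < maxEntries &&
           std::getline(in, p.state) && std::getline(in, p.city))
    {
        places.push_back(p);
    }
    return places;
}
```

With a file, the call would be ReadPlaces(fInput, 100) after fInput.open("data.txt"); the list box loop then runs from 0 to places.size() and no longer depends on finding an empty string.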
I'm using Borland C++ Builder 2009 and I display the right and left pointing arrows like so:
Button2->Hint = L"Ctrl+\u2190" ;
Button3->Hint = L"Ctrl+\u2192" ;
This works fine on Windows 7, where the application uses the font 'Segoe UI'.
On XP I get a square instead of the arrows; there the application uses the font 'Tahoma'.
In other words, these Unicode characters are not present in Tahoma on XP.
Is there an easy and fast way to check whether a requested Unicode character is supported by the font currently in use?
If so, I could, for instance, replace the arrow with '>' or '<'. Not perfect, but good enough. I don't want to start changing fonts at this stage.
Your help is appreciated.
You can use GetFontUnicodeRanges() to see which characters are supported by the font currently selected into the DC. Note that this API requires you to call it once to find out how big the buffer needs to be, and a second time to actually get the data.
DWORD dwSize = GetFontUnicodeRanges(hDC, nullptr);
BYTE* bBuffer = new BYTE[dwSize];
GLYPHSET* pGlyphSet = reinterpret_cast<GLYPHSET*>(bBuffer);
GetFontUnicodeRanges(hDC, pGlyphSet);
// use data in pGlyphSet, then free the buffer
delete[] bBuffer;
The GLYPHSET structure has a member array called ranges which lets you determine the range of characters supported by the font.
Just for reference and the Google Gods:
bool UnicodeCharSupported(HDC Handle, wchar_t Char) // GetFontUnicodeRanges takes a device context (HDC), e.g. Canvas->Handle
{
if (Handle)
{
DWORD dwSize = GetFontUnicodeRanges(Handle, NULL);
if (dwSize)
{
bool Supported = false ;
BYTE* bBuffer = new BYTE[dwSize];
GLYPHSET* pGlyphSet = reinterpret_cast<GLYPHSET*>(bBuffer);
if (GetFontUnicodeRanges(Handle, pGlyphSet))
{
for (DWORD x = 0 ; x < pGlyphSet->cRanges && !Supported ; x++)
{
Supported = (Char >= pGlyphSet->ranges[x].wcLow &&
Char < (pGlyphSet->ranges[x].wcLow + pGlyphSet->ranges[x].cGlyphs)) ;
}
}
delete[] bBuffer;
return Supported ;
}
}
return false ;
}
Example, relating to my Question:
if (!UnicodeCharSupported(Canvas->Handle, 0x2190))
{ /* Character not supported in current Font, use different character */ }
I am using a CEdit with the Multiline property set. My objective is to retrieve each individual line and place it in my CStringArray.
When retrieving a line using GetLine, I have to know the length of that line.
How do I get this?
I tried the function LineLength(), but that seems to return the size of the entire line rather than of the specified text.
I have pasted the code that I have implemented so far:
CEdit m_strMnemonicCode;
CStringArray strMnemonicArray;
LPTSTR temp = new TCHAR[50];
int nLineCount = m_strMnemonicCode.GetLineCount();
for(int ni = 0 ; ni < nLineCount ; ni++)
{
int len = m_strMnemonicCode.LineLength(m_strMnemonicCode.LineIndex(ni));
//m_strMnemonicCode.GetLine(ni, strText.GetBuffer(len), len);
m_strMnemonicCode.GetLine( ni , temp );
strMnemonicArray.Add(strText);
}
But you need to know the length of the whole line, don't you?
I would not define the buffer as an array of TCHARs, but as a CString, and then call GetBuffer() on it.
See the example in the documentation for CEdit::GetLineCount.
It seems to do more or less what you need.
Edit
I've just written the following test, and it works perfectly for me:
int lc = m_Edit.GetLineCount();
CString strLine;
CStringArray arr;
for (int i = 0; i < lc ; i++)
{
int len = m_Edit.LineLength(m_Edit.LineIndex(i));
m_Edit.GetLine(i, strLine.GetBuffer(len), len);
strLine.ReleaseBuffer(len);
arr.Add(strLine);
}
Maybe you are forgetting to add the buffer length to ReleaseBuffer()?
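A different route from the GetLine loop above, offered only as a sketch: read the control's whole text in one call (for example with GetWindowText) and split it yourself. This assumes the control stores hard line breaks as "\r\n"; unlike GetLineCount, it does not count soft breaks introduced by word wrap. The function name SplitLines is hypothetical, and std::wstring stands in for CString so the sketch compiles without MFC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split text on "\r\n" hard line breaks. An empty line yields an empty
// string, and empty input yields a single empty line.
std::vector<std::wstring> SplitLines(const std::wstring& text)
{
    std::vector<std::wstring> lines;
    const std::wstring sep = L"\r\n";
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t end = text.find(sep, start);
        if (end == std::wstring::npos)
        {
            lines.push_back(text.substr(start)); // last (or only) line
            break;
        }
        lines.push_back(text.substr(start, end - start));
        start = end + sep.size();
    }
    assert(!lines.empty()); // there is always at least one line
    return lines;
}
```

Each element could then be added to the CStringArray; no per-line buffer length has to be computed.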