Unzip by using the "CompressedFolder" COM object - c++

I unpack a zip archive using the Win API. This API is based on COM interfaces; the COM model is accessible through the "CompressedFolder" COM object.
I have encountered the following problem: unpacking even a small file (3.5 MB) takes a long time. I tracked the slowness down to IStream::Read(). I read the file in many iterations with a small buffer (1 KB); if I instead use a buffer nearly equal to the file size, it works much faster.
How can I make unpacking fast even when the buffer is much smaller than the file? Is that possible? I think it matters because the files may be big, say 1 GB.
Here is a fragment of the code that reads a file:
...
CComPtr<IEnumSTATSTG> pEnum = NULL;
pStorage->EnumElements(0, NULL, 0, &pEnum);
STATSTG statStg;
while (S_OK == pEnum->Next(1, &statStg, NULL)) {
    if (statStg.type == STGTY_STREAM) {
        CComPtr<IStream> pStream = NULL;
        pStorage->OpenStream(statStg.pwcsName, NULL, STGM_READ, NULL, &pStream);
        ...
        while (hr == S_OK) {
            // reading
            hr = pStream->Read(btBuffer, 1024, &ulBytesRead); // this is the slow call
        }
    }
}
A side question I have:
Is there a method to detect a packed file's size through IStream without reading the file?

It is not possible to achieve fast reads with small buffers; the more I/O operations you perform, the more time it takes.
Try to limit the number of I/O operations by using a relatively big buffer size. Of course, you must cap it according to the amount of memory you are willing to allocate to your program.
As an aside, you may see a delay because the program has to load libraries. WinZip does not pay this cost when the associated DLLs are already loaded.
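As a rough, hedged sketch of what that looks like with the question's variables (the 1 MB size is an arbitrary choice, not a recommendation; tune it to your memory budget):
const ULONG kBufferSize = 1024 * 1024; // 1 MB instead of 1 KB; hypothetical value
std::vector<BYTE> buffer(kBufferSize);
ULONG ulBytesRead = 0;
HRESULT hr = S_OK;
do {
    hr = pStream->Read(buffer.data(), kBufferSize, &ulBytesRead);
    if (FAILED(hr))
        break; // handle the error
    // ... consume ulBytesRead bytes from buffer ...
} while (hr == S_OK && ulBytesRead == kBufferSize);
Depending on the implementation, Read reports end-of-stream either via S_FALSE or by returning fewer bytes than requested; the loop condition handles both.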

Related

Efficiently Capture Frames in DirectX 11

I'm trying to capture every frame of a game I am playing. There are plenty of good screen-capture programs out there, one of which is built right into Windows 10.
However, I need a custom approach for various reasons. I'm currently using the DirectX Tool Kit's SaveWICTextureToFile() method to save every frame produced by Present().
https://github.com/microsoft/DirectXTK
For every frame that is captured, I'd like to tag the end of the file name with its number: ScreenShot_0 through ScreenShot_n.
The method SaveWICTextureToFile() saves the screenshot for you like so:
DirectX::SaveWICTextureToFile(context, backbufferTex, GUID_ContainerFormatJpeg, L"C:/Users/User Name/Desktop/Images/ScreenShot.JPG");
This doesn't capture frames sequentially; it simply overwrites the same file for each frame. The performance, however, is very smooth, with no lag whatsoever during gameplay.
To try and write a file for each frame I did the following:
#include <sstream>
int Frame_Number = 0;
//For each call to Present(), do the following:
//Get the device
ID3D11Device* device;
HRESULT gd = pSwapChain->GetDevice(__uuidof(ID3D11Device), (void**)&device);
assert(gd == S_OK);
//Get the immediate context
ID3D11DeviceContext* context;
device->GetImmediateContext(&context);
//Get the back buffer
ID3D11Texture2D* backbufferTex;
HRESULT gb = pSwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (LPVOID*)&backbufferTex);
assert(gb == S_OK);
//Set up the file name for this frame
std::wstringstream Image_Directory;
Image_Directory << L"C:/Users/User Name/Desktop/Images/ScreenShot_" << Frame_Number << L".JPG";
//Capture the frame
REFGUID GUID_ContainerFormatJpeg{ 0x19e4a5aa, 0x5662, 0x4fc5, 0xa0, 0xc0, 0x17, 0x58, 0x2, 0x8e, 0x10, 0x57 };
HRESULT hr = DirectX::SaveWICTextureToFile(context, backbufferTex, GUID_ContainerFormatJpeg, Image_Directory.str().c_str());
assert(hr == S_OK);
//Release the per-frame COM references so they don't leak on every Present()
backbufferTex->Release();
context->Release();
device->Release();
Frame_Number = Frame_Number + 1;
This works; however, the performance is choppy. Compared to the previous method, I no longer get smooth gameplay. Could somebody recommend a more efficient way to do this?
If this is being done in a debug build, I suspect the stringstream's debug logic is causing the problem. I suggest using character arrays and swprintf instead. Even better, keep track of where the directory part ends, so you only need to write out the file-name part (you could go even further and format only the number and extension).
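As a minimal sketch of that suggestion, reusing the hard-coded directory from the question (the prefix/suffix split is illustrative, not part of the original code):
// Build the constant prefix once; per frame, format only the varying tail.
static wchar_t path[MAX_PATH] = L"C:/Users/User Name/Desktop/Images/ScreenShot_";
static const size_t prefixLen = wcslen(path);
swprintf(path + prefixLen, MAX_PATH - prefixLen, L"%d.JPG", Frame_Number);
HRESULT hr = DirectX::SaveWICTextureToFile(context, backbufferTex, GUID_ContainerFormatJpeg, path);
This avoids constructing a heap-allocating wstringstream on every frame; only the number and extension are formatted per call.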
It simply writes over the same file for each frame. The performance however is very smooth.
If everything else is equal, there are several possibilities for the slowdown:
The file is being written only in memory and never committed to disk, since it keeps being overwritten.
Opening many files per second is never a good idea, especially on Windows, which is particularly slow at this compared to e.g. Linux.
Writing many files into the same folder is another bad idea, since many filesystems do not handle that case well.
Find out which one of those three is the culprit, and iterate from there.
And let us know! It is always interesting to hear how fast IO is nowadays for different use cases :-)

Faster methods of reading a large amount of text/text files?

I'm currently in the process of making a program that reads a large number of text files, searches them for regular expressions, and then saves the line text and line number, as well as the file name and folder path, writing that data to a .csv file. The method I'm using is as follows:
std::string line;
std::ifstream stream1(filePath);
if (stream1)
{
    while (getline(stream1, line))
    {
        // Code here that compares the regular search expression to the line.
        // If it matches, save the data to a tuple for later writing to the .csv file.
    }
}
I'm wondering if there is a faster method to do this. I wrote the same type of program in Matlab (which I'm more experienced in) using the same line-by-line logic described above. I got the run time down to roughly 5.5 minutes for 300 MB of data (which I'm not even sure is fast; probably not), but in Visual Studio it takes as much as 2 hours for the same data.
I had heard how fast C++ can be at reading and writing data, so I'm a little confused by these results. Is there a faster method? I tried looking around online, but all I found was memory mapping, which seemed to be Linux/Unix-only?
You can use memory-mapped files.
Since you’re on Windows, the appropriate API is probably the CAtlFileMapping<char> template class. Here's an example:
#include <atlfile.h>

// Error-checking macro
#define CHECK( hr ) { const HRESULT __hr = ( hr ); if( FAILED( __hr ) ) return __hr; }

HRESULT testMapping( const wchar_t* path )
{
    // Open the file
    CAtlFile file;
    CHECK( file.Create( path, GENERIC_READ, FILE_SHARE_READ, OPEN_EXISTING ) );
    // Map the file
    CAtlFileMapping<char> mapping;
    CHECK( mapping.MapFile( file ) );
    // Query the file size
    ULONGLONG ullSize;
    CHECK( file.GetSize( ullSize ) );
    const char* const ptrBegin = mapping;
    const size_t length = (size_t)ullSize;
    // Process the mapped data, e.g. call memchr() to find your newlines
    return S_OK;
}
Don’t forget that the address space of a 32-bit process is limited; compiling a 64-bit program makes a lot of sense for this application.
Also, if your files are very small, you have a huge count of them, and they are stored on a fast SSD, a better approach is to process multiple files in parallel. But that is somewhat harder to implement.
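To make the "process the mapped data" step concrete, here is a hedged sketch of splitting the mapping into lines with memchr(); ptrBegin and length are the variables from the snippet above, and the matching itself is left as a placeholder:
const char* lineStart = ptrBegin;
const char* const end = ptrBegin + length;
size_t lineNumber = 1;
while (lineStart < end) {
    const char* nl = (const char*)memchr(lineStart, '\n', end - lineStart);
    const char* lineEnd = nl ? nl : end;
    // ... run the regular expression against [lineStart, lineEnd) and record lineNumber ...
    lineStart = lineEnd + 1;
    ++lineNumber;
}
Because the file is mapped, no per-line string copies are needed; each line is just a pointer range into the mapping.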

WinInet InternetReadFile returns 0x8007007a (The data area passed to a system call is too small)

I have an issue with WinInet's InternetReadFile (C++).
In some rare cases the function fails and GetLastError returns the mentioned error 0x8007007a (which according to ErrorLookup corresponds to "The data area passed to a system call is too small").
I have a few questions regarding this:
Why does this happen only in some rare cases but works fine in others (I'm talking, of course, about always downloading the same ~15 MB zip file)?
Is this really related to the buffer size passed to the API call? I am using a constant buffer size of 1024 bytes. Should I use a bigger buffer, and if so, how can I know the "right" buffer size?
What can I do to recover at run time if I do get this error?
Adding a code snippet (note that this will not work as is because some init code is necessary):
#define HTTP_RESPONSE_BUFFER_SIZE 1024

std::vector<char> responseBuffer;
DWORD dwResponseBytesRead = 0;
do
{
    const size_t oldBufferSize = responseBuffer.size();
    responseBuffer.resize(oldBufferSize + HTTP_RESPONSE_BUFFER_SIZE);
    // Now we read again into the last place we stopped
    // writing in the previous iteration.
    dwResponseBytesRead = 0;
    BOOL bInternetReadFile = ::InternetReadFile(hOpenRequest, // hFile, retrieved from a previous call to ::HttpOpenRequest
        (LPVOID)&responseBuffer[oldBufferSize],               // lpBuffer.
        HTTP_RESPONSE_BUFFER_SIZE,                            // dwNumberOfBytesToRead.
        &dwResponseBytesRead);                                // lpdwNumberOfBytesRead.
    if(!bInternetReadFile)
    {
        // Do cleanup and exit.
        DWORD dwErr = ::GetLastError();         // This, in some cases, will return: 0x7a
        HRESULT hr = HRESULT_FROM_WIN32(dwErr); // This, in some cases, will return: 0x8007007a
        return;
    }
    // Adjust the buffer according to the actual number of bytes read.
    responseBuffer.resize(oldBufferSize + dwResponseBytesRead);
}
while(dwResponseBytesRead != 0);
It is a documented error for InternetReadFile:
WinINet attempts to write the HTML to the lpBuffer buffer a line at a time. If the application's buffer is too small to fit at least one line of generated HTML, the error code ERROR_INSUFFICIENT_BUFFER is returned as an indication to the application that it needs a larger buffer.
So you are supposed to handle this error by increasing the buffer size. Just double the size, repeatedly if necessary.
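A hedged sketch of that recovery path, built on the question's snippet (kMaxChunk is a hypothetical safety cap, not something the API requires):
// Retry the read with a doubled buffer whenever the call fails
// with ERROR_INSUFFICIENT_BUFFER.
const DWORD kMaxChunk = 1024 * 1024; // hypothetical upper bound
DWORD chunkSize = HTTP_RESPONSE_BUFFER_SIZE;
BOOL bRead = FALSE;
for (;;)
{
    responseBuffer.resize(oldBufferSize + chunkSize);
    bRead = ::InternetReadFile(hOpenRequest, &responseBuffer[oldBufferSize],
                               chunkSize, &dwResponseBytesRead);
    if (bRead || ::GetLastError() != ERROR_INSUFFICIENT_BUFFER || chunkSize >= kMaxChunk)
        break;
    chunkSize *= 2; // double the buffer and try again
}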
There are some discrepancies in the question. It isn't clear that you are reading an HTML file, for one; 15 MB seems excessive. Another is that this error should reproduce consistently. But most troubling is the error code value: it is wrapped in an HRESULT, the kind of error code that a COM component would return. You should be getting a plain Windows error code back from GetLastError(), just 0x7a and not 0x8007007a.
Do make sure that your error checking is correct. Only ever call GetLastError() when InternetReadFile() has returned FALSE. If that checks out (always post a snippet, please), then consider that this error is actually generated upstream, perhaps by a firewall or flaky anti-malware.

ERROR_INSUFFICIENT_BUFFER returned from GetAdaptersAddresses

Using the following code, more or less copy-pasted from the MSDN example for GetAdaptersAddresses, I get the return value 122, which means ERROR_INSUFFICIENT_BUFFER (according to the system error code list).
ULONG outBufLen = 150000; // Tried for different (large) values here...
PIP_ADAPTER_ADDRESSES pAddresses = (IP_ADAPTER_ADDRESSES *) malloc(outBufLen);
DWORD dwRetVal = GetAdaptersAddresses(AF_INET, 0, NULL, pAddresses, &outBufLen);
// ....
free(pAddresses);
The documentation of GetAdaptersAddresses does not list ERROR_INSUFFICIENT_BUFFER as one of the expected return values. (It lists ERROR_BUFFER_OVERFLOW, which should adjust outBufLen to the needed value, but that remains unchanged).
Using GetAdaptersInfo instead leads to the same symptoms.
This error does not occur on my development machine, but it does on one virtual and one real clean Windows 7 x86 SP1 installation (with the VC++ redistributables added).
As a C++ newbie, am I doing something wrong? What could cause this error and how do I fix it? =)
First of all, you can, as others suggested, make two calls: one to find out the required buffer size, and then the query itself. Especially since you are seeing the error, your first step should be to ask the API what size it expects.
Second, you need to know that this API is not quite safe in 32-bit processes that consume large amounts of memory, where buffers may span into the higher 2 GB of the address space. The API might start acting in a weird way, either due to its own bug or a bug in an underlying layer. See the details on MS Connect: GetAdaptersAddresses API incorrectly returns no adapters for a process with high memory consumption.
The fact that the error code is not "one of the expected return values" suggests that the error comes from an underlying layer and this API just passes it up on internal failure. As a clue, disabling a network adapter on the system might make the error go away.
Visual Studio deployed a library named "IPHLPAPI.dll" together with my project, which caused the problem. Deleting this file solved it.
Why this was the case is subject to further research =)
First, a buffer is a block of memory.
So "insufficient" could mean that you haven't given it enough memory somehow. Or it could be a block of memory you don't have access to; maybe the address doesn't even exist.
Look at this:
ERROR_INSUFFICIENT_BUFFER
122 (0x7A)
The data area passed to a system call is too small.
This really sounds like the buffer hasn't got enough allocated memory, or similar.
Maybe outBufLen has to be a specific length, such as the exact size of the memory block, because sometimes the check is not against the 'name' but compares each of the variables' sizes. This idea came from the High Level Shader Language.
So I would try to look a bit more at:
ULONG outBufLen = 150000; // Tried for different (large) values here...
PIP_ADAPTER_ADDRESSES pAddresses = (IP_ADAPTER_ADDRESSES *) malloc(outBufLen);
Good luck!
To know the exact buffer size required, you can just pass NULL for pAddresses, and size will be set to the required value. You may want to rewrite your code slightly to make that work:
DWORD rv, size = 0;
PIP_ADAPTER_ADDRESSES adapter_addresses;

// First call: no buffer; the API reports the required size.
rv = GetAdaptersAddresses(AF_INET, 0, NULL, NULL, &size);
if (rv != ERROR_BUFFER_OVERFLOW)
    return false; // ERROR

adapter_addresses = (PIP_ADAPTER_ADDRESSES)malloc(size);

// Second call: the real query with a buffer of the reported size.
rv = GetAdaptersAddresses(AF_INET, 0, NULL, adapter_addresses, &size);
if (rv != ERROR_SUCCESS) {
    free(adapter_addresses);
    return false; // ERROR
}

appending to a memory-mapped file

I'm constantly appending to a file of stock quotes (ints, longs, doubles, etc.). I have this file mapped into memory with mmap.
What's the most efficient way to make newly appended data available as part of the memory mapping?
I understand that I can open the file again (new file descriptor) and then mmap it to get the new data, but that seems inefficient. Another approach that has been suggested to me is to pre-allocate the file in 1 MB chunks, write to a specific position until reaching the end, then ftruncate the file to +1 MB.
Are there other approaches?
Does Boost help with this?
Boost.IOStreams has fixed-size-only memory-mapped files, so it won't help with your specific problem. Linux has an interface, mremap, which works as follows:
void *new_mapping = mremap(mapping, size, size + GROWTH, MREMAP_MAYMOVE);
if (new_mapping == MAP_FAILED) {
    // handle the error
}
mapping = new_mapping;
This is non-portable, however (and poorly documented). Mac OS X seems not to have mremap.
In any case, you don't need to reopen the file; just munmap it and mmap it again:
void *append(int fd, char const *data, size_t nbytes, void *map, size_t &len)
{
    // TODO: check for errors here!
    ssize_t written = write(fd, data, nbytes);
    munmap(map, len);
    len += written;
    return mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
}
A pre-allocation scheme may be very useful here. Be sure to keep track of the file's actual length and truncate it once more before closing.
I know the answer has already been accepted, but maybe it will help someone else if I provide mine. Allocate a large file ahead of time, say 10 GiB in size. Create three of these files ahead of time; I call them volumes. Keep track of your last known location somewhere, like in a header or another file, and keep appending from that point. If you reach the maximum size of the file and run out of room, switch to the next volume. If there are no more volumes, create another one. Note that you would probably do this a few volumes ahead, to make sure your appends don't block waiting for a new volume to be created.

That's how we implement it where I work, for storing continuous incoming video/audio in a DVR system for surveillance. We don't waste space storing file names for video clips, which is why we don't use a real file system; instead we go flat-file and just track offsets, frame information (fps, frame type, width/height, etc.), time recorded, and camera channel.

For the kind of work you are doing, storage space is cheap, whereas your time is invaluable. So grab as much as you want ahead of time. You're basically implementing your own file system, optimized for your needs. The needs that general-purpose file systems serve aren't the same needs we have in other fields.
Looking at the man page for mremap, it should be possible.
My five cents, but they are more C-specific:
Create a normal file, but mmap a huge size. E.g. the file is, say, 100 KB, but you mmap 1 GB or more. Then you can safely access everything up to the file size; access beyond the file size will result in an error.
If you are on a 32-bit OS, just don't make the mapping too big, because it will eat your address space.
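A minimal sketch of that idea on POSIX (the file name and sizes are illustrative; error handling is reduced to early returns):
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main()
{
    const size_t kReserve = 1024u * 1024u * 1024u; // map 1 GB up front
    int fd = open("quotes.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;

    char* base = (char*)mmap(NULL, kReserve, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return 1;

    size_t used = 0;              // logical file length
    const char record[] = "tick"; // stand-in for one quote record

    // Grow the file before touching pages past the current end of file;
    // writing beyond EOF through a MAP_SHARED mapping raises SIGBUS.
    if (ftruncate(fd, used + sizeof record) != 0) return 1;
    std::memcpy(base + used, record, sizeof record);
    used += sizeof record;

    munmap(base, kReserve);
    ftruncate(fd, used); // trim to the real length before closing
    close(fd);
    return 0;
}
In practice you would grow the file in larger chunks (e.g. the 1 MB increments suggested in the question) rather than per record, to keep the number of ftruncate calls low.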
If you're using boost/iostreams/device/mapped_file.hpp on Windows:
boost::filesystem::resize_file throws an exception if a read-mapping object is open, due to the lack of sharing privileges.
Instead, use the Windows API to resize the file on disk; the reading mapped_files can then stay open.
bool resize_file_wapi(string path, __int64 new_file_size) //boost::uintmax_t size
{
    HANDLE handle = CreateFile(path.c_str(), GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, 0,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
    if (handle == INVALID_HANDLE_VALUE)
        return false;
    LARGE_INTEGER sz;
    sz.QuadPart = new_file_size;
    // Move the file pointer to the requested size, then set end-of-file there.
    const bool ok = ::SetFilePointerEx(handle, sz, 0, FILE_BEGIN) && ::SetEndOfFile(handle);
    ::CloseHandle(handle); // close unconditionally so the handle is never leaked
    return ok;
}