SHCreateStreamOnFileEx on files larger than 2**32 bytes - c++

I'm getting an IStream for a file using SHCreateStreamOnFileEx, but its Read() method appears to misbehave on extremely large files when the new position of the seek pointer is 2 ** 32 bytes or further into the file.
ISequentialStream::Read's documentation says:
This method adjusts the seek pointer by the actual number of bytes read.
This is the same behaviour as read(2) and fread(3) on all platforms I'm aware of.
But with these streams, this isn't the actual behaviour I see in some cases:
Seek(2 ** 32 - 2, SEEK_SET, &pos), Read(buf, 1, &bytesRead), Seek(0, SEEK_CUR, &pos) → bytesRead == 1 and pos == 2 ** 32 - 1, as expected.
Seek(2 ** 32 - 1, SEEK_SET, &pos), Read(buf, 1, &bytesRead), Seek(0, SEEK_CUR, &pos) → bytesRead == 1, but pos == (2 ** 32 - 1) + 4096, which is incorrect. This means that any subsequent reads (without another Seek to fix the cursor position) read the wrong data, and my application doesn't work!
Am I “holding it wrong”? Is there some flag I need to set to make this class behave properly? Or is this a bug in Shlwapi.dll?
The code below reproduces this problem for me. (Set OFFSET = WORKS to see the successful case.)
#include "stdafx.h"
static const int64_t TWO_THIRTY_TWO = 4294967296LL;
static const int64_t WORKS = TWO_THIRTY_TWO - 2LL;
static const int64_t FAILS = TWO_THIRTY_TWO - 1LL;
static const int64_t OFFSET = FAILS;
static void checkPosition(CComPtr< IStream > fileStream, ULONGLONG expectedPosition)
{
LARGE_INTEGER move;
ULARGE_INTEGER newPosition;
move.QuadPart = 0;
HRESULT hr = fileStream->Seek(move, SEEK_CUR, &newPosition);
ASSERT(SUCCEEDED(hr));
ULONGLONG error = newPosition.QuadPart - expectedPosition;
ASSERT(error == 0);
}
int main()
{
const wchar_t *path = /* path to a file larger than 2**32 bytes */ L"C:\\users\\wjt\\Desktop\\eos-eos3.1-amd64-amd64.170216-122002.base.img";
CComPtr< IStream > fileStream;
HRESULT hr;
hr = SHCreateStreamOnFileEx(path, STGM_READ, FILE_ATTRIBUTE_NORMAL, FALSE, NULL, &fileStream);
ASSERT(SUCCEEDED(hr));
LARGE_INTEGER move;
ULARGE_INTEGER newPosition;
// Advance
move.QuadPart = OFFSET;
hr = fileStream->Seek(move, SEEK_SET, &newPosition);
ASSERT(SUCCEEDED(hr));
ASSERT(newPosition.QuadPart == OFFSET);
// Check position
checkPosition(fileStream, OFFSET);
// Read
char buf[1];
ULONG bytesRead = 0;
hr = fileStream->Read(buf, 1, &bytesRead);
ASSERT(SUCCEEDED(hr));
ASSERT(bytesRead == 1);
// Check position: this assertion fails if the Read() call moves the cursor
// across the 2**32 byte boundary
checkPosition(fileStream, OFFSET + 1);
return 0;
}

This really is a Windows bug. I tested it on several Windows versions, including the latest SHCore.DLL, version 10.0.14393.0 x64. A simple way to reproduce it:
void BugDemo(PCWSTR path)
{
// FILE_FLAG_DELETE_ON_CLOSE !
HANDLE hFile = CreateFile(path, FILE_GENERIC_WRITE, FILE_SHARE_READ|FILE_SHARE_DELETE, 0,
CREATE_NEW, FILE_ATTRIBUTE_TEMPORARY|FILE_FLAG_DELETE_ON_CLOSE, 0);
if (hFile != INVALID_HANDLE_VALUE)
{
ULONG dwBytesRet;
// make the file sparse so that it does not really take up disk space
if (DeviceIoControl(hFile, FSCTL_SET_SPARSE, NULL, 0, NULL, 0, &dwBytesRet, NULL))
{
static FILE_END_OF_FILE_INFO eof = { 0, 2 };// 8GB
if (SetFileInformationByHandle(hFile, FileEndOfFileInfo, &eof, sizeof(eof)))
{
IStream* pstm;
if (!SHCreateStreamOnFileEx(path, STGM_READ|STGM_SHARE_DENY_NONE, 0,FALSE, NULL, &pstm))
{
LARGE_INTEGER pos = { 0xffffffff };
ULARGE_INTEGER newpos;
if (!pstm->Seek(pos, STREAM_SEEK_SET, &newpos) && !pstm->Read(&newpos, 1, &dwBytesRet))
{
pos.QuadPart = 0;
if (!pstm->Seek(pos, STREAM_SEEK_CUR, &newpos))
{
DbgPrint("newpos={%I64x}\n", newpos.QuadPart);//newpos={100000fff}
}
}
pstm->Release();
}
}
}
// close and delete
CloseHandle(hFile);
}
}
void BugDemo()
{
WCHAR path[MAX_PATH];
if (ULONG len = GetTempPath(RTL_NUMBER_OF(path), path))
{
if (len + 16 < MAX_PATH)
{
FILETIME ft;
GetSystemTimeAsFileTime(&ft);
swprintf(path + len, L"%08x%08x", ~ft.dwLowDateTime, ft.dwHighDateTime);
BugDemo(path);
}
}
}
I traced virtual long CFileStream::Seek(LARGE_INTEGER, ULONG, ULARGE_INTEGER*) under a debugger and can confirm that this function is not designed to work with files larger than 4GB.
To be more exact about where the 0x100000FFF offset comes from: CFileStream uses an internal read buffer of 0x1000 bytes. When you ask to read 1 byte at offset 0xFFFFFFFF, it actually reads 0x1000 bytes into the buffer, and the file offset becomes 0x100000FFF. When you then call Seek(0, STREAM_SEEK_CUR, &newpos), CFileStream calls SetFilePointer(hFile, 1 - 0x1000, 0/*lpDistanceToMoveHigh*/, FILE_CURRENT)
(1 is the internal position in the buffer, because we read 1 byte; 0x1000 is the buffer size). Without the 32-bit limit this would give (0x100000FFF + (1 - 0x1000)) == 0x100000000, but
read the documentation for SetFilePointer:
If lpDistanceToMoveHigh is NULL and the new file position does not fit
in a 32-bit value, the function fails and returns
INVALID_SET_FILE_POINTER.
As a result, SetFilePointer fails (returns INVALID_SET_FILE_POINTER), but CFileStream does not even check for this. It then calls SetFilePointerEx(hFile, 0, &newpos, FILE_CURRENT) and returns newpos to you, which is still 0x100000FFF.
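The arithmetic in this explanation can be checked with a small portable model. This is only a sketch of the behaviour described in the answer, not the real Shlwapi code: moveWithoutHighPart and reportedPositionAfterOneByteRead are hypothetical names, the 0x1000 buffer size is taken from the answer, and the file is assumed to be large enough for a full buffer fill.

```cpp
#include <cassert>
#include <cstdint>

// Buffer size reported in the answer (0x1000 == 4096 bytes).
constexpr std::uint32_t kBufferSize = 0x1000;

// Model of the documented rule for SetFilePointer with a NULL high part:
// the move fails when the new position does not fit in a 32-bit value.
// Returns true on success and leaves filePointer untouched on failure.
bool moveWithoutHighPart(std::uint64_t& filePointer, std::int32_t distance) {
    const std::int64_t target = static_cast<std::int64_t>(filePointer) + distance;
    if (target < 0 || target > 0xFFFFFFFFLL) {
        return false;  // corresponds to INVALID_SET_FILE_POINTER
    }
    filePointer = static_cast<std::uint64_t>(target);
    return true;
}

// Position reported by Seek(0, STREAM_SEEK_CUR) after seeking to seekTarget
// and reading one byte, if the failed rewind is ignored as the answer says.
std::uint64_t reportedPositionAfterOneByteRead(std::uint64_t seekTarget) {
    // Filling the internal buffer advances the OS file pointer by a full buffer.
    std::uint64_t filePointer = seekTarget + kBufferSize;
    // Only 1 byte was consumed, so the stream tries to rewind by 1 - 0x1000.
    const std::int32_t rewind = 1 - static_cast<std::int32_t>(kBufferSize);
    (void)moveWithoutHighPart(filePointer, rewind);  // result deliberately ignored
    return filePointer;
}
```

Calling reportedPositionAfterOneByteRead(0xFFFFFFFE) gives 0xFFFFFFFF, the working case from the question, while 0xFFFFFFFF gives 0x100000FFF, the failing one.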


Windows FileWrite fails for volume sectors bigger than 0x1FFF [duplicate]

This question already has answers here:
access denied error from WriteFile to physical disk, win7
(1 answer)
CreateFile: direct write operation to raw disk "Access is denied" - Vista, Win7
(4 answers)
Closed 3 years ago.
Why can't I write a sector at a position greater than 0x1FFF?
I am trying to write a sector on an SD card. The following code works fine for sector numbers lower than 0x2000 but fails for any sector greater than 0x1FFF, returning error code 5. I don't know why.
I don't think this is a duplicate question, because I can write sectors on the disk; I just can't write sectors greater than 0x1FFF. I am using WinHex and Disk Editor to verify that those sectors exist.
#include <windows.h>
#include <stdio.h>
int main()
{
LPCWSTR device_name = L"\\\\.\\PHYSICALDRIVE2";
int sector = 0x2000;
//Open the volume
HANDLE hDisk = CreateFile(device_name, (GENERIC_READ | GENERIC_WRITE), 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hDisk != INVALID_HANDLE_VALUE)
{
DWORD ol = 0;
//Lock the volume
if (DeviceIoControl(hDisk, FSCTL_LOCK_VOLUME, NULL, 0, NULL, 0, &ol, NULL))
{
ol = 0;
//Dismount the volume
if (DeviceIoControl(hDisk, FSCTL_DISMOUNT_VOLUME, NULL, 0, NULL, 0, &ol, NULL))
{
unsigned char buff[512];
//Set position at desire sector
int position = sector * 512;
DWORD readBytes = 0;
long moveToHigh = 0;
SetFilePointer(hDisk, position, &moveToHigh, FILE_BEGIN);
//Read the sector
if (ReadFile(hDisk, buff, 512, &readBytes, NULL) && readBytes == 512)
{
//Set the write position
DWORD writenBytes = 0;
moveToHigh = 0;
SetFilePointer(hDisk, position, &moveToHigh, FILE_BEGIN);
if (WriteFile(hDisk, buff, 512, &writenBytes, NULL) && writenBytes == 512)
{
printf("OK for Sector %d \r\n", sector);
}
else
{
DWORD dwError = GetLastError();
printf("Error Code: %d \r\n", dwError);
}
}
}
}
CloseHandle(hDisk);
}
}

Bluetooth LE: setting characteristic to byte array sends wrong values

I am using Bluetoothleapis.h to communicate with a custom Bluetooth Low Energy device.
The device is setup the following way:
Custom GATT service
Characteristic#1 Read/Write (expects 3 bytes)
Characteristic#2 Read/Notify (returns 1 byte)
I am able get proper values from characteristic#2. However, when I try to send data to characteristic#1, the device receives weird data.
The characteristic is responsible for 3 parameters of a real-life object (imagine a light with intensity, color, etc). (0,0,0) should respond to the "light" being off, but if I send the (0,0,0), I can see that the device receives something else (I cannot tell what exactly, but it is not off). The state does not seem to change no matter what values I send.
I have tried alternating between write and write-no-response, both produce the same result.
GetCharacteristicValue interestingly returns a charValueDataSize of 8, even though the characteristic is known to accept only 3 bytes. Coincidentally, the size for the 1-byte read-only characteristic is 9, for some reason.
I have tried limiting the size of the WriteValue to only 3 bytes, but in this case I get an invalid argument error. Answers elsewhere on StackOverflow have indicated that I need to use the one I get from GetCharacteristicValue, and transfer my data into there.
Given the fact that the real object's state does not change no matter which values are sent, I suspect that the problem is somewhere with the way I set up the byte array to transfer the data.
Furthermore, calling GetCharacteristicValue even after setting it returns an empty array.
I am not sure what values are actually being sent, and I lack the hardware to track them via Wireshark.
DWORD WriteValueToCharacteristic(__in const HANDLE deviceHandle,
__in const CharacteristicData* pCharData,
__in const UCHAR* writeBuffer,
__in const USHORT bufferSize,
__in const BOOL noResponse )
{
HRESULT hr;
PBTH_LE_GATT_CHARACTERISTIC pCharacteristic = pCharData->pCharacteristic;
USHORT charValueDataSize;
hr = BluetoothGATTGetCharacteristicValue
(
deviceHandle,
pCharacteristic,
0,
NULL,
&charValueDataSize,
BLUETOOTH_GATT_FLAG_NONE
);
if (hr != HRESULT_FROM_WIN32(ERROR_MORE_DATA))
{
Log(L"BluetoothGATTGetCharacteristicValue returned error %d", hr);
FormatBluetoothError(hr);
return -1;
}
PBTH_LE_GATT_CHARACTERISTIC_VALUE pWriteValue = (PBTH_LE_GATT_CHARACTERISTIC_VALUE)HeapAlloc
(
GetProcessHeap(), HEAP_ZERO_MEMORY, charValueDataSize + sizeof(BTH_LE_GATT_CHARACTERISTIC_VALUE)
);
if (pWriteValue == NULL)
{
Log(L"Out of memory.");
return -1;
}
hr = BluetoothGATTGetCharacteristicValue
(
deviceHandle,
pCharacteristic,
charValueDataSize,
pWriteValue,
NULL,
BLUETOOTH_GATT_FLAG_FORCE_READ_FROM_DEVICE
);
memcpy(pWriteValue->Data, writeBuffer, bufferSize);
ULONG flags = noResponse == TRUE ? BLUETOOTH_GATT_FLAG_WRITE_WITHOUT_RESPONSE : 0;
hr = BluetoothGATTSetCharacteristicValue
(
deviceHandle,
pCharacteristic,
pWriteValue,
NULL,
flags
);
if (hr != S_OK)
{
Log(L"BluetoothGATTSetCharacteristicValue returned error %d", hr);
FormatBluetoothError(hr);
return -1;
}
HeapFree(GetProcessHeap(), 0, pWriteValue);
return ERROR_SUCCESS;
}
SetCharacteristicValue returns S_OK, producing no errors.
Both reading and writing to the characteristic work fine when using a BLE app on Android.
Update 1
@Shubham pointed out it might be an endianness issue, so I tried replacing memcpy with the following:
int j = 0;
int i = charValueDataSize - 1;
while (j < bufferSize)
{
pWriteValue->Data[i] = writeBuffer[j];
--i;
++j;
}
However, nothing changed.
Update 2
I have incorporated the changes as per emil's suggestion, and it worked! Posting the full code in case somebody else experiences the same issue.
Incidentally, even though the characteristic is marked as Writable: true, Writable-no-response: false, I need to set the flags to no-response in order for the values to get sent.
DWORD WriteValueToCharacteristic(__in const HANDLE deviceHandle, __in const CharacteristicData* pCharData, __in const UCHAR* writeBuffer, __in const USHORT bufferSize, __in const BOOL noResponse)
{
HRESULT hr;
PBTH_LE_GATT_CHARACTERISTIC pCharacteristic = pCharData->pCharacteristic;
USHORT charValueDataSize = 512;
PBTH_LE_GATT_CHARACTERISTIC_VALUE pWriteValue = (PBTH_LE_GATT_CHARACTERISTIC_VALUE)HeapAlloc
(
GetProcessHeap(), HEAP_ZERO_MEMORY, charValueDataSize + sizeof(BTH_LE_GATT_CHARACTERISTIC_VALUE)
);
if (pWriteValue == NULL)
{
Log(L"Out of memory.");
return -1;
}
hr = BluetoothGATTGetCharacteristicValue
(
deviceHandle,
pCharacteristic,
(ULONG)charValueDataSize,
pWriteValue,
NULL,
BLUETOOTH_GATT_FLAG_FORCE_READ_FROM_DEVICE
);
if (bufferSize > pWriteValue->DataSize)
{
if(pWriteValue->DataSize == 0)
{
pWriteValue->DataSize = bufferSize;
}
}
// after the first write, DataSize stays as 3
//pWriteValue->DataSize here is 3, as expected
//buffer size is also 3
memcpy(pWriteValue->Data, writeBuffer, bufferSize);
ULONG flags = noResponse == TRUE ? BLUETOOTH_GATT_FLAG_WRITE_WITHOUT_RESPONSE : 0;
hr = BluetoothGATTSetCharacteristicValue
(
deviceHandle,
pCharacteristic,
pWriteValue,
NULL,
flags
);
if (hr != S_OK)
{
Log(L"BluetoothGATTSetCharacteristicValue returned error %d", hr);
FormatBluetoothError(hr);
HeapFree(GetProcessHeap(), 0, pWriteValue);
return -1;
}
HeapFree(GetProcessHeap(), 0, pWriteValue);
return ERROR_SUCCESS;
}
My suggestion is that you first set charValueDataSize to 512 + sizeof(BTH_LE_GATT_CHARACTERISTIC_VALUE) (maximum possible), and skip the initial read that would get the size. Then check pWriteValue->DataSize to get the actual size after a successful read. Also make sure you free your memory even in case of error.
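The sizing advice can be illustrated without the Bluetooth API. The sketch below is a portable model, not the Windows code: GattValueModel is a hypothetical stand-in that I assume mirrors the layout of BTH_LE_GATT_CHARACTERISTIC_VALUE (a size field followed by the payload bytes), and buildWriteValue and storedDataSize are invented helpers. It allocates header and payload as one block and records the number of bytes actually being sent in the size field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a variable-length value: a size field followed by
// the payload. Data[1] only marks where the payload starts.
struct GattValueModel {
    std::uint32_t DataSize;
    std::uint8_t Data[1];
};

// Builds one contiguous block holding the header and exactly payloadSize
// payload bytes, with DataSize set to the number of bytes to send.
std::vector<std::uint8_t> buildWriteValue(const std::uint8_t* payload,
                                          std::uint32_t payloadSize) {
    assert(payload != nullptr || payloadSize == 0);
    std::vector<std::uint8_t> storage(offsetof(GattValueModel, Data) + payloadSize, 0);
    std::memcpy(storage.data() + offsetof(GattValueModel, DataSize),
                &payloadSize, sizeof(payloadSize));
    if (payloadSize != 0) {
        std::memcpy(storage.data() + offsetof(GattValueModel, Data), payload, payloadSize);
    }
    return storage;
}

// Reads the size field back out of a block produced by buildWriteValue.
std::uint32_t storedDataSize(const std::vector<std::uint8_t>& storage) {
    std::uint32_t size = 0;
    std::memcpy(&size, storage.data() + offsetof(GattValueModel, DataSize), sizeof(size));
    return size;
}
```

For the 3-byte characteristic in the question, buildWriteValue(bytes, 3) yields a block whose size field is 3 and whose payload starts at offsetof(GattValueModel, Data).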

How to accelerate C++ writing speed to the speed tested by CrystalDiskMark?

I am receiving about 3.6GB of data per second in memory, and I need to write it to my SSD continuously. I used CrystalDiskMark to test the write speed of my SSD; it is almost 6GB per second, so I thought this should not be that hard.
My SSD test result: https://plus.google.com/u/0/photos/photo/106876803948041178149/6649598887699308850?authkey=CNbb5KjF8-jxJQ
My computer runs Windows 10, and I am using Visual Studio 2017 Community.
I found this question and tried the highest-voted answer. Unfortunately, the write speed was only about 1s/GB for its option_2, far slower than what CrystalDiskMark measured. Then I tried memory mapping; this time writing was faster, about 630ms/GB, but still much slower. Then I tried multi-threaded memory mapping: with 4 threads the speed was about 350ms/GB, and when I increased the number of threads further, the write speed did not improve any more.
Code for memory mapping:
#include <fstream>
#include <chrono>
#include <vector>
#include <cstdint>
#include <numeric>
#include <random>
#include <algorithm>
#include <iostream>
#include <cassert>
#include <thread>
#include <windows.h>
#include <sstream>
// Generate random data
std::vector<int> GenerateData(std::size_t bytes) {
assert(bytes % sizeof(int) == 0);
std::vector<int> data(bytes / sizeof(int));
std::iota(data.begin(), data.end(), 0);
std::shuffle(data.begin(), data.end(), std::mt19937{ std::random_device{}() });
return data;
}
// Memory mapping
int map_write(int* data, int size, int id){
char name[100]; // stack buffer; the malloc'ed buffer used before was never freed
sprintf_s(name, sizeof(name), "D:\\data_%d.bin",id);
HANDLE hFile = CreateFile(name, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);//
if (hFile == INVALID_HANDLE_VALUE){
return -1;
}
Sleep(0);
DWORD dwFileSize = size;
char rname[100]; // stack buffer; the malloc'ed buffer used before was never freed
sprintf_s(rname, sizeof(rname), "data_%d.bin", id);
HANDLE hFileMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, dwFileSize, rname);// create the file mapping object
if (hFileMap == NULL) {
CloseHandle(hFile);
return -2;
}
PVOID pvFile = MapViewOfFile(hFileMap, FILE_MAP_WRITE, 0, 0, 0);//Acquire the address of file on disk
if (pvFile == NULL) {
CloseHandle(hFileMap);
CloseHandle(hFile);
return -3;
}
PSTR pchAnsi = (PSTR)pvFile;
memcpy(pchAnsi, data, dwFileSize);// memory copy
UnmapViewOfFile(pvFile);
CloseHandle(hFileMap);
CloseHandle(hFile);
return 0;
}
// Multi-thread memory mapping
void Mem2SSD_write(int* data, int size){
int part = size / sizeof(int) / 4;
int index[4];
index[0] = 0;
index[1] = part;
index[2] = part * 2;
index[3] = part * 3;
std::thread ta(map_write, data + index[0], size / 4, 10);
std::thread tb(map_write, data + index[1], size / 4, 11);
std::thread tc(map_write, data + index[2], size / 4, 12);
std::thread td(map_write, data + index[3], size / 4, 13);
ta.join();
tb.join();
tc.join();
td.join();
}
//Test:
int main() {
const std::size_t kB = 1024;
const std::size_t MB = 1024 * kB;
const std::size_t GB = 1024 * MB;
for (int i = 0; i < 10; ++i) {
std::vector<int> data = GenerateData(1 * GB);
auto startTime = std::chrono::high_resolution_clock::now();
Mem2SSD_write(&data[0], 1 * GB);
auto endTime = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
std::cout << "1G writing cost: " << duration << " ms" << std::endl;
}
system("pause");
return 0;
}
So I'd like to ask: is there a faster way in C++ to write huge files? Or why can't I write as fast as CrystalDiskMark measures? How does CrystalDiskMark write?
Any help would be greatly appreciated. Thank you!
First of all, this is not a C++ question but an OS-related question. To get maximum performance you need to use OS-specific low-level API calls, which do not exist in the general C++ libraries. It is clear from your code that you use the Windows API, so at minimum look for a Windows-specific solution.
From the CreateFileW function documentation:
When FILE_FLAG_NO_BUFFERING is combined with FILE_FLAG_OVERLAPPED,
the flags give maximum asynchronous performance, because the I/O does
not rely on the synchronous operations of the memory manager.
So we need to use the combination of these 2 flags in the call to CreateFileW, or FILE_NO_INTERMEDIATE_BUFFERING in the call to NtCreateFile.
Extending the file size and the valid data length also takes some time, so if the final file size is known at the beginning, it is better to set it up front via NtSetInformationFile with FileEndOfFileInformation
or via SetFileInformationByHandle with FileEndOfFileInfo, and then set the valid data length with SetFileValidData or via NtSetInformationFile with FileValidDataLengthInformation. Setting the valid data length requires the SE_MANAGE_VOLUME_NAME privilege to be enabled when the file is initially opened (but not when SetFileValidData is called).
Also check file compression: if the file is compressed (it will be compressed by default if created in a compressed folder), writing is very slow. So disable file compression via FSCTL_SET_COMPRESSION.
When we use asynchronous I/O (the fastest way) we do not need to create several dedicated threads. Instead we need to decide how many I/O requests to run concurrently. CrystalDiskMark actually runs CdmResource\diskspd\diskspd64.exe for the test, and this corresponds to its -o<count> parameter (run diskspd64.exe /? > h.txt to see the parameter list).
Using non-buffered I/O makes the task harder, because there are 3 additional requirements:
Any ByteOffset passed to WriteFile must be a multiple of the sector size.
The Length passed to WriteFile must be an integral of the sector size
Buffers must be aligned in accordance with the alignment requirement of the underlying device. To obtain this information, call NtQueryInformationFile with FileAlignmentInformation or GetFileInformationByHandleEx with FileAlignmentInfo
in most situations, page-aligned memory will also be sector-aligned, because the case where the sector size is larger than the page size is rare.
So buffers allocated with the VirtualAlloc function, with sizes that are a multiple of the page size (4,096 bytes), are almost always fine. In the concrete test below I rely on this assumption to keep the code smaller.
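The three requirements can be expressed as a small portable check. This is a sketch with invented helper names (isPowerOfTwo, alignUp, isValidUnbufferedRequest); it assumes the sector size and the device's buffer alignment are both powers of two, which is the usual case but is still an assumption. alignUp uses the same mask trick as the (size + sws) & ~sws line in the test code.

```cpp
#include <cassert>
#include <cstdint>

// True when value is a nonzero power of two.
constexpr bool isPowerOfTwo(std::uint64_t value) {
    return value != 0 && (value & (value - 1)) == 0;
}

// Rounds size up to the next multiple of alignment (a power of two).
inline std::uint64_t alignUp(std::uint64_t size, std::uint64_t alignment) {
    assert(isPowerOfTwo(alignment));
    return (size + alignment - 1) & ~(alignment - 1);
}

// Checks one unbuffered request against the three rules: the offset and the
// length are multiples of the sector size, and the buffer address satisfies
// the device's alignment requirement.
inline bool isValidUnbufferedRequest(const void* buffer,
                                     std::uint64_t byteOffset,
                                     std::uint64_t length,
                                     std::uint64_t sectorSize,
                                     std::uint64_t bufferAlignment) {
    assert(isPowerOfTwo(sectorSize) && isPowerOfTwo(bufferAlignment));
    const std::uint64_t address = reinterpret_cast<std::uintptr_t>(buffer);
    return (byteOffset & (sectorSize - 1)) == 0 &&
           (length & (sectorSize - 1)) == 0 &&
           (address & (bufferAlignment - 1)) == 0;
}
```

A debug build could call isValidUnbufferedRequest before each write so that a misaligned request is caught before the file system rejects it.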
struct WriteTest
{
enum { opCompression, opWrite };
struct REQUEST : IO_STATUS_BLOCK
{
WriteTest* pTest;
ULONG opcode;
ULONG offset;
};
LONGLONG _TotalSize, _BytesLeft;
HANDLE _hFile;
ULONG64 _StartTime;
void* _pData;
REQUEST* _pRequests;
ULONG _BlockSize;
ULONG _ConcurrentRequestCount;
ULONG _dwThreadId;
LONG _dwRefCount;
WriteTest(ULONG BlockSize, ULONG ConcurrentRequestCount)
{
if (BlockSize & (BlockSize - 1))
{
__debugbreak();
}
_BlockSize = BlockSize, _ConcurrentRequestCount = ConcurrentRequestCount;
_dwRefCount = 1, _hFile = 0, _pRequests = 0, _pData = 0;
_dwThreadId = GetCurrentThreadId();
}
~WriteTest()
{
if (_pData)
{
VirtualFree(_pData, 0, MEM_RELEASE);
}
if (_pRequests)
{
delete [] _pRequests;
}
if (_hFile)
{
NtClose(_hFile);
}
PostThreadMessageW(_dwThreadId, WM_QUIT, 0, 0);
}
void Release()
{
if (!InterlockedDecrement(&_dwRefCount))
{
delete this;
}
}
void AddRef()
{
InterlockedIncrementNoFence(&_dwRefCount);
}
void StartWrite()
{
IO_STATUS_BLOCK iosb;
FILE_VALID_DATA_LENGTH_INFORMATION fvdl;
fvdl.ValidDataLength.QuadPart = _TotalSize;
NTSTATUS status;
if (0 > (status = NtSetInformationFile(_hFile, &iosb, &_TotalSize, sizeof(_TotalSize), FileEndOfFileInformation)) ||
0 > (status = NtSetInformationFile(_hFile, &iosb, &fvdl, sizeof(fvdl), FileValidDataLengthInformation)))
{
DbgPrint("FileValidDataLength=%x\n", status);
}
ULONG offset = 0;
ULONG dwNumberOfBytesTransfered = _BlockSize;
_BytesLeft = _TotalSize + dwNumberOfBytesTransfered;
ULONG ConcurrentRequestCount = _ConcurrentRequestCount;
REQUEST* irp = _pRequests;
_StartTime = GetTickCount64();
do
{
irp->opcode = opWrite;
irp->pTest = this;
irp->offset = offset;
offset += dwNumberOfBytesTransfered;
DoWrite(irp++);
} while (--ConcurrentRequestCount);
}
void FillBuffer(PULONGLONG pu, LONGLONG ByteOffset)
{
ULONG n = _BlockSize / sizeof(ULONGLONG);
do
{
*pu++ = ByteOffset, ByteOffset += sizeof(ULONGLONG);
} while (--n);
}
void DoWrite(REQUEST* irp)
{
LONG BlockSize = _BlockSize;
LONGLONG BytesLeft = InterlockedExchangeAddNoFence64(&_BytesLeft, -BlockSize) - BlockSize;
if (0 < BytesLeft)
{
LARGE_INTEGER ByteOffset;
ByteOffset.QuadPart = _TotalSize - BytesLeft;
PVOID Buffer = RtlOffsetToPointer(_pData, irp->offset);
FillBuffer((PULONGLONG)Buffer, ByteOffset.QuadPart);
AddRef();
NTSTATUS status = NtWriteFile(_hFile, 0, 0, irp, irp, Buffer, BlockSize, &ByteOffset, 0);
if (0 > status)
{
OnComplete(status, 0, irp);
}
}
else if (!BytesLeft)
{
// write end
ULONG64 time = GetTickCount64() - _StartTime;
WCHAR sz[64];
StrFormatByteSizeW((_TotalSize * 1000) / time, sz, RTL_NUMBER_OF(sz));
DbgPrint("end:%S\n", sz);
}
}
static VOID NTAPI _OnComplete(
_In_ NTSTATUS status,
_In_ ULONG_PTR dwNumberOfBytesTransfered,
_Inout_ PVOID Ctx
)
{
reinterpret_cast<REQUEST*>(Ctx)->pTest->OnComplete(status, dwNumberOfBytesTransfered, reinterpret_cast<REQUEST*>(Ctx));
}
VOID OnComplete(NTSTATUS status, ULONG_PTR dwNumberOfBytesTransfered, REQUEST* irp)
{
if (0 > status)
{
DbgPrint("OnComplete[%x]: %x\n", irp->opcode, status);
}
else
switch (irp->opcode)
{
default:
__debugbreak();
case opCompression:
StartWrite();
break;
case opWrite:
if (dwNumberOfBytesTransfered == _BlockSize)
{
DoWrite(irp);
}
else
{
DbgPrint(":%I64x != %x\n", dwNumberOfBytesTransfered, _BlockSize);
}
}
Release();
}
NTSTATUS Create(POBJECT_ATTRIBUTES poa, ULONGLONG size)
{
if (!(_pRequests = new REQUEST[_ConcurrentRequestCount]) ||
!(_pData = VirtualAlloc(0, _BlockSize * _ConcurrentRequestCount, MEM_COMMIT, PAGE_READWRITE)))
{
return STATUS_INSUFFICIENT_RESOURCES;
}
ULONGLONG sws = _BlockSize - 1;
LARGE_INTEGER as;
_TotalSize = as.QuadPart = (size + sws) & ~sws;
HANDLE hFile;
IO_STATUS_BLOCK iosb;
NTSTATUS status = NtCreateFile(&hFile,
DELETE|FILE_GENERIC_READ|FILE_GENERIC_WRITE&~FILE_APPEND_DATA,
poa, &iosb, &as, 0, 0, FILE_OVERWRITE_IF,
FILE_NON_DIRECTORY_FILE|FILE_NO_INTERMEDIATE_BUFFERING, 0, 0);
if (0 > status)
{
return status;
}
_hFile = hFile;
if (0 > (status = RtlSetIoCompletionCallback(hFile, _OnComplete, 0)))
{
return status;
}
static USHORT cmp = COMPRESSION_FORMAT_NONE;
REQUEST* irp = _pRequests;
irp->pTest = this;
irp->opcode = opCompression;
AddRef();
status = NtFsControlFile(hFile, 0, 0, irp, irp, FSCTL_SET_COMPRESSION, &cmp, sizeof(cmp), 0, 0);
if (0 > status)
{
OnComplete(status, 0, irp);
}
return status;
}
};
void WriteSpeed(POBJECT_ATTRIBUTES poa, ULONGLONG size, ULONG BlockSize, ULONG ConcurrentRequestCount)
{
BOOLEAN b;
NTSTATUS status = RtlAdjustPrivilege(SE_MANAGE_VOLUME_PRIVILEGE, TRUE, FALSE, &b);
if (0 <= status)
{
status = STATUS_INSUFFICIENT_RESOURCES;
if (WriteTest * pTest = new WriteTest(BlockSize, ConcurrentRequestCount))
{
status = pTest->Create(poa, size);
pTest->Release();
if (0 <= status)
{
MessageBoxW(0, 0, L"Test...", MB_OK|MB_ICONINFORMATION);
}
}
}
}
These are the suggestions that come to my mind:
stop all running processes that are using the disk, in particular
disable Windows Defender realtime protection (or other anti virus/malware)
disable pagefile
use Windows Resource Monitor to find processes reading or writing to your disk
make sure you write continuous sectors on disk
don't take into account file opening and closing times
do not use multithreading (your disk is using DMA so the CPU won't matter)
write data that is in RAM (obviously)
be sure to disable all debugging features when building (build a release)
if using an M.2 PCIe disk (seems to be your case), make sure other PCIe devices aren't stealing PCIe lanes from your disk (the CPU has a limited number, and so does the motherboard)
don't run the test from your IDE
disable Windows file indexing
Finally, you can find good hints on how to code fast writes in C/C++ in this question's thread: Writing a binary file in C++ very fast
One area that might give you an improvement is to keep your threads running constantly, each reading from a queue.
At the moment, every time you write you spawn 4 threads (which is slow), and they are destroyed at the end of the function. You'll see a speedup of at least the CPU time of your function if you spawn the threads once at the start and have them all read from separate queues in an infinite loop.
They'll simply check after a SMALL delay whether there's anything in their queue; if there is, they'll write it all. Your only issue then is making sure the order of the data is maintained.
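A long-lived worker of this kind could look roughly like the sketch below. QueuedWriter is a hypothetical class, not code from this thread, and it differs from the description in one respect: instead of polling after a small delay, the worker sleeps on a condition variable until a chunk arrives. The sink callback stands in for whatever actually writes to disk. One worker with one queue keeps chunks in the order they were pushed.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// One worker thread that lives as long as the object and drains a queue.
class QueuedWriter {
public:
    using Chunk = std::vector<char>;
    using Sink = std::function<void(const Chunk&)>;

    explicit QueuedWriter(Sink sink)
        : sink_(std::move(sink)), worker_([this] { run(); }) {}

    ~QueuedWriter() { finish(); }

    // Hands a chunk to the worker; chunks are written in push order.
    void push(Chunk chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!done_);  // pushing after finish() is a usage error
            queue_.push_back(std::move(chunk));
        }
        cv_.notify_one();
    }

    // Lets the worker drain what is queued, then joins it. Safe to call twice.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        for (;;) {
            Chunk chunk;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // finished and nothing left to write
                chunk = std::move(queue_.front());
                queue_.pop_front();
            }
            sink_(chunk);  // the write happens outside the lock
        }
    }

    Sink sink_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Chunk> queue_;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

With several such writers, one per output file, ordering only has to be maintained within each queue.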

FILE_FLAG_NO_BUFFERING with overlapped I/O - bytes read zero

I observe a weird behavior while using the flag FILE_FLAG_NO_BUFFERING with overlapped I/O.
I invoke a series of ReadFile() function calls and query their statuses later using GetOverlappedResult().
The weird behavior is that even though the file handles were valid and the ReadFile() calls returned no unexpected error (only ERROR_IO_PENDING, which is expected), the 'bytes read' value returned from GetOverlappedResult() is zero for some of the files, and each time I run the code it is a different set of files.
If I remove the FILE_FLAG_NO_BUFFERING, things start working properly and no bytes read value is zero.
Here is how I have implemented overlapped I/O code with FILE_FLAG_NO_BUFFERING.
long overlappedIO(std::vector<std::string> &filePathNameVectorRef)
{
long totalBytesRead = 0;
DWORD bytesRead = 0;
DWORD bytesToRead = 0;
std::map<HANDLE, OVERLAPPED> handleMap;
HANDLE handle = INVALID_HANDLE_VALUE;
DWORD accessMode = GENERIC_READ;
DWORD shareMode = 0;
DWORD createDisposition = OPEN_EXISTING;
DWORD flags = FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING;
DWORD fileSize;
LARGE_INTEGER li;
char * buffer;
BOOL success = false;
for(unsigned int i=0; i<filePathNameVectorRef.size(); i++)
{
const char* filePathName = filePathNameVectorRef[i].c_str();
handle = CreateFile(filePathName, accessMode, shareMode, NULL, createDisposition, flags, NULL);
if(handle == INVALID_HANDLE_VALUE){
fprintf(stdout, "\n Error occured: %d", GetLastError());
fprintf(stdout," getting handle: %s",filePathName);
continue;
}
GetFileSizeEx(handle, &li);
fileSize = (DWORD)li.QuadPart;
bytesToRead = (fileSize/g_bytesPerPhysicalSector)*g_bytesPerPhysicalSector;
buffer = static_cast<char *>(VirtualAlloc(0, bytesToRead, MEM_COMMIT, PAGE_READWRITE));
OVERLAPPED overlapped;
ZeroMemory(&overlapped, sizeof(overlapped));
OVERLAPPED * lpOverlapped = &overlapped;
success = ReadFile(handle, buffer, bytesToRead, &bytesRead, lpOverlapped);
if(!success && GetLastError() != ERROR_IO_PENDING){
fprintf(stdout, "\n Error occured: %d", GetLastError());
fprintf(stdout, "\n reading file %s",filePathName);
CloseHandle(handle);
continue;
}
else
handleMap[handle] = overlapped;
}
// Status check and bytes Read value
for(std::map<HANDLE, OVERLAPPED>::iterator iter = handleMap.begin(); iter != handleMap.end(); iter++)
{
HANDLE handle = iter->first;
OVERLAPPED * overlappedPtr = &(iter->second);
success = GetOverlappedResult(handle, overlappedPtr, &bytesRead, TRUE);
if(success)
{
/* bytesRead value in some cases is unexpectedly zero */
/* no file is of size zero or lesser than 512 bytes(physical volume sector size) */
totalBytesRead += bytesRead;
CloseHandle(handle);
}
}
return totalBytesRead;
}
With FILE_FLAG_NO_BUFFERING absent, totalBytesRead value is 57 MB. With the flag present, totalBytesRead value is much lower than 57 MB and keeps changing each time I run the code ranging from 2 MB to 15 MB.
Your calculation of bytesToRead will produce 0 as a result when the file size is less than g_bytesPerPhysicalSector. So for small files you are requesting 0 bytes.
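To make the rounding problem concrete, here is a minimal sketch comparing the question's calculation with one that rounds up. Both function names are made up for illustration, and rounding up is only one possible way to avoid a zero-length request; the answer itself only identifies the cause. If the request is rounded up, the buffer has to be allocated with the rounded-up size as well.

```cpp
#include <cassert>
#include <cstdint>

// The calculation from the question: rounds DOWN to a whole number of
// sectors, so any file smaller than one sector yields 0 bytes to read.
std::uint64_t roundDownToSector(std::uint64_t fileSize, std::uint64_t sectorSize) {
    assert(sectorSize != 0);
    return (fileSize / sectorSize) * sectorSize;
}

// Alternative: rounds UP, so the request always covers the whole file
// while still being a whole number of sectors.
std::uint64_t roundUpToSector(std::uint64_t fileSize, std::uint64_t sectorSize) {
    assert(sectorSize != 0);
    return ((fileSize + sectorSize - 1) / sectorSize) * sectorSize;
}
```

For a 600-byte file and a 4096-byte physical sector, roundDownToSector gives 0 while roundUpToSector gives 4096.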

Converting HLOCAL to LPTSTR

How should an HLOCAL data type be converted to an LPTSTR data type? I'm trying to get a code snippet from Microsoft working, and this is the only error, which I'm not sure how to resolve:
// Create a HDEVINFO with all present devices.
hDevInfo = SetupDiGetClassDevs(NULL,
0, // Enumerator
0,
DIGCF_PRESENT | DIGCF_ALLCLASSES );
if (hDevInfo == INVALID_HANDLE_VALUE)
{
// Insert error handling here.
return NULL;
}
// Enumerate through all devices in Set.
DeviceInfoData.cbSize = sizeof(SP_DEVINFO_DATA);
for (i=0;SetupDiEnumDeviceInfo(hDevInfo, i, &DeviceInfoData);i++)
{
DWORD DataT;
LPTSTR buffer = NULL;
DWORD buffersize = 0;
//
// Call function with null to begin with,
// then use the returned buffer size (doubled)
// to Alloc the buffer. Keep calling until
// success or an unknown failure.
//
// Double the returned buffersize to correct
// for underlying legacy CM functions that
// return an incorrect buffersize value on
// DBCS/MBCS systems.
//
while (!SetupDiGetDeviceRegistryProperty(
hDevInfo,
&DeviceInfoData,
SPDRP_DEVICEDESC,
&DataT,
(PBYTE)buffer,
buffersize,
&buffersize))
{
if (GetLastError() == ERROR_INSUFFICIENT_BUFFER)
{
// Change the buffer size.
if (buffer) LocalFree(buffer);
// Double the size to avoid problems on
// W2k MBCS systems per KB 888609.
buffer = LocalAlloc(LPTR,buffersize * 2); // <- Error Occurs Here
}
else
{
// Insert error handling here.
break;
}
}
printf("Result:[%s]\n",buffer);
if (buffer) LocalFree(buffer);
}
if ( GetLastError()!=NO_ERROR && GetLastError()!=ERROR_NO_MORE_ITEMS )
{
// Insert error handling here.
return NULL;
}
// Cleanup
SetupDiDestroyDeviceInfoList(hDevInfo);
Thoughts are welcome. Thanks.
LocalLock() returns the pointer. But this is 18 year old silliness, just use
// Change the buffer size.
delete[] buffer;
// Double the size to avoid problems on
// W2k MBCS systems per KB 888609.
buffer = new TCHAR[buffersize * 2];
Ignoring the ~7 year old silliness of still using TCHAR for a moment. Your printf() statement needs work, depending on whether or not you are compiling with Unicode. %ls if you do. I'm guessing that's your real problem, use wprintf().
buffer = (LPTSTR)LocalAlloc(LPTR, buffersize * 2);
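The call-with-null, grow, and retry loop from the snippet can also be written so that no cast or manual free is needed. The sketch below is portable and does not call SetupAPI: Query is a hypothetical stand-in for a function such as SetupDiGetDeviceRegistryProperty that either fills the buffer or reports the required size, and readProperty is an invented helper. It keeps the doubling from the original comments and lets std::vector own the memory.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical query: writes a NUL-terminated value into buf when it fits and
// returns true; otherwise stores the needed size in *required and returns false.
using Query = std::function<bool(char* buf, std::size_t size, std::size_t* required)>;

// Calls the query with no buffer first, then retries with a buffer of twice
// the reported size until it succeeds or fails for a reason other than size.
std::string readProperty(const Query& query) {
    assert(static_cast<bool>(query));
    std::vector<char> buffer;
    std::size_t required = 0;
    while (!query(buffer.empty() ? nullptr : buffer.data(), buffer.size(), &required)) {
        if (required <= buffer.size()) {
            return std::string();  // the failure was not about the buffer size
        }
        buffer.assign(required * 2, '\0');  // the vector frees the old storage itself
    }
    return buffer.empty() ? std::string() : std::string(buffer.data());
}
```

Because the vector releases its storage automatically, the question of matching LocalAlloc with LocalFree or new[] with delete[] does not come up in this variant.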