How can I determine why a call to IXMLDOMDocument::load() fails? - c++

I am trying to debug what appears to be an XML parsing issue in my code. I have isolated it down to the following code snippet:
HRESULT
CXmlDocument::Load(IStream* Stream)
{
CComVariant xmlSource(static_cast<IUnknown*>(Stream));
VARIANT_BOOL isSuccessful;
* HRESULT hr = m_pXmlDoc->load(xmlSource, &isSuccessful);
return (hr == S_FALSE) ? E_FAIL : hr;
}
Note: m_pXmlDoc is of the type CComPtr<IXMLDOMDocument>.
It appears that the call to IXMLDOMDocument::load() (marked with the *) is failing; in other words, it is returning S_FALSE.
I am not able to step into load() to determine why it is failing, as it is a COM call.
The MSDN page for this method doesn't seem to be giving a lot of insight.
I have a few hunches:
The XML is not well-formed
The XML file is too large (approximately 120MB)
It is a memory-related issue (the process size gets to > 2GB at the time of failure)
NB: A registry key has been set to allow the process size to be this large, as the largest valid process size for WinXP, AFAIK, is 2GB.
Any ideas as to why this call could be failing?

The following code will fetch the specific parser error from the DOM and its location in the source XML.
CComPtr<IXMLDOMParseError> pError;
CComBSTR sReason, sSource;
long nLine = 0, nColumn = 0;
m_pXmlDoc->get_parseError(&pError);
if(pError)
{
    pError->get_reason(&sReason);
    pError->get_srcText(&sSource);
    pError->get_line(&nLine);
    pError->get_linepos(&nColumn);
}
sReason will be filled with the error message. sSource will contain the erroneous source line in the XML. nLine and nColumn should get set to the line number and column of the error, although in practice these two aren't always set reliably (if I recall correctly, this is especially true of validation errors, rather than parser/well-formedness ones).
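Once those four values are in hand, it can help to fold them into a single log line. Below is a minimal, portable sketch of that idea. ParseErrorInfo and FormatParseError are hypothetical names (not part of MSXML), and the code assumes the line and column are 1-based and are 0 when the parser did not report them.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical holder for the values read from IXMLDOMParseError.
struct ParseErrorInfo {
    std::wstring reason;   // from get_reason
    std::wstring srcText;  // from get_srcText
    long line = 0;         // from get_line (assumed 1-based, 0 if not reported)
    long column = 0;       // from get_linepos (assumed 1-based, 0 if not reported)
};

// Builds a readable message. A caret is placed under the offending column
// only when a usable position was reported.
std::wstring FormatParseError(const ParseErrorInfo& e)
{
    std::wostringstream out;
    out << L"XML load failed: "
        << (e.reason.empty() ? L"(no reason reported)" : e.reason.c_str());
    if (e.line > 0) {
        out << L" (line " << e.line;
        if (e.column > 0) {
            out << L", column " << e.column;
        }
        out << L")";
    }
    if (!e.srcText.empty()) {
        out << L"\n" << e.srcText;
        if (e.column > 0 &&
            static_cast<std::size_t>(e.column) <= e.srcText.size() + 1) {
            out << L"\n"
                << std::wstring(static_cast<std::size_t>(e.column - 1), L' ')
                << L"^";
        }
    }
    return out.str();
}
```

In the snippet above you would fill the struct from sReason, sSource, nLine and nColumn. Because the position is optional here, validation errors that leave it unset still produce a sensible message.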

If the XML file is open in another task or process, the load() method
can't load the file, but it doesn't report that the loading has failed.
I consider this behaviour a bug.
So you also have to check the documentElement property: if it is null, load() has failed, too.

WPD API Detect if Device is a Phone?

EDIT: Full source code was requested. Below is a barebones implementation in order to replicate the bug. Content enumeration is removed; however, the crash occurs on the first object call anyway. In this case, that is the WPD_DEVICE_OBJECT_ID object.
LINK TO CPP (Bug begins at line 103)
LINK TO QMAKE.PRO (I'm using Qt)
In my project I use the WPD API to read the contents of a mobile device. I followed the API to a tee and have successfully implemented content enumeration.
However, if a USB drive is connected, the WPD API will also sometimes detect that as a device. My program will go ahead and begin content enumeration anyway. I don't want that. I only want to enumerate mobile devices.
The problem is that during content enumeration, when my program attempts to retrieve a property of an object on the USB drive, it crashes. Here are the crash details:
Problem Event Name: BEX
Application Name: UniversalMC.exe
Application Version: 0.0.0.0
Application Timestamp: 5906a8a3
Fault Module Name: MSVCR100.dll
Fault Module Version: 10.0.40219.325
Fault Module Timestamp: 4df2be1e
Exception Offset: 0008af3e
Exception Code: c0000417
Exception Data: 00000000
OS Version: 6.1.7601.2.1.0.768.3
Locale ID: 1033
Additional Information 1: 185e
Additional Information 2: 185ef2beb7eb77a8e39d1dada57d0d11
Additional Information 3: a852
Additional Information 4: a85222a7fc0721be22726bd2ca6bc946
The crash occurs on this call:
hr = pObjectProperties->GetStringValue(WPD_OBJECT_ORIGINAL_FILE_NAME, &objectName);
The call fails (FAILED(hr) is true) and then my program crashes.
After some research I've found that exception code c0000417 may mean a buffer overflow occurred. Correct me if I'm wrong, but is this a vulnerability in the WPD API? If so, how could I detect ahead of time that this device is not a mobile device?
Thanks for your time!
I ended up paying someone to help me pinpoint the issue.
The problem was that the root object (WPD_DEVICE_OBJECT_ID) would not return an object name no matter what (Not true for all devices).
The solution was to simply begin content enumeration FROM the root object and only check the names of its children. In my original implementation, I assumed every object has a name, but apparently that is not the case. The root object is the exception.
Here is a snippet:
CComPtr<IEnumPortableDeviceObjectIDs> pEnumObjectIDs;
// Print the object identifier being used as the parent during enumeration.
//qDebug("%ws\n",pszObjectID);
// Get an IEnumPortableDeviceObjectIDs interface by calling EnumObjects with the
// specified parent object identifier.
hr = pContent->EnumObjects(0,                    // Flags are unused
                           WPD_DEVICE_OBJECT_ID, // Start from the root ("DEVICE") object
                           NULL,                 // Filter is unused
                           &pEnumObjectIDs);
// Enumerate content starting from the "DEVICE" object.
if (SUCCEEDED(hr))
{
    // Loop calling Next() while S_OK is being returned.
    while(hr == S_OK)
    {
        DWORD cFetched = 0;
        PWSTR szObjectIDArray[NUM_OBJECTS_TO_REQUEST] = {0};
        hr = pEnumObjectIDs->Next(NUM_OBJECTS_TO_REQUEST, // Number of objects to request on each Next() call
                                  szObjectIDArray,        // Array of PWSTRs which will be populated on each Next() call
                                  &cFetched);             // Number of objects written to the PWSTR array
        if (SUCCEEDED(hr))
        {
            // Traverse the results of the Next() operation and recursively enumerate.
            // Remember to free all returned object identifiers using CoTaskMemFree().
            for (DWORD dwIndex = 0; dwIndex < cFetched; dwIndex++)
            {
                //RECURSIVE CONTENT ENUMERATION CONTINUES HERE
                //OBJECT NAME CHECKING CONTINUES IN THE RECURSIVE FUNCTION
                // Free allocated PWSTRs after the recursive enumeration call has completed.
                CoTaskMemFree(szObjectIDArray[dwIndex]);
                szObjectIDArray[dwIndex] = NULL;
            }
        }
    }
}
The solution is exactly what the sample project shows to do, however, I made the mistake of checking the name of the root object. So don't do that.
Get the object name if there is no "original file name"
hr = pObjectProperties->GetStringValue(WPD_OBJECT_ORIGINAL_FILE_NAME, &objectName);
if(FAILED(hr)) {
    hr = pObjectProperties->GetStringValue(WPD_OBJECT_NAME, &objectName);
}
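Both answers come down to the same rule: try the name properties in order, and never use the result unless one of the lookups actually succeeded. As far as I know, c0000417 is STATUS_INVALID_CRUNTIME_PARAMETER, which would be consistent with an unset string reaching a CRT function after the failed call. Here is a minimal, portable sketch of the ordered fallback. NameGetter and ResolveObjectName are hypothetical names, and each getter stands in for one GetStringValue call that returns true only when SUCCEEDED(hr).

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one property lookup. It returns true and fills
// 'value' on success, mirroring a GetStringValue call that SUCCEEDED.
using NameGetter = std::function<bool(std::wstring& value)>;

// Tries each getter in order and stops at the first one that yields a
// non-empty name. Returns false (and clears 'name') if none of them do,
// so the caller can skip the object instead of using an unset string.
bool ResolveObjectName(const std::vector<NameGetter>& getters, std::wstring& name)
{
    for (const NameGetter& get : getters) {
        std::wstring candidate;
        if (get && get(candidate) && !candidate.empty()) {
            name = candidate;
            return true;
        }
    }
    name.clear();
    return false;
}
```

In the real code the first getter would wrap the WPD_OBJECT_ORIGINAL_FILE_NAME lookup and the second the WPD_OBJECT_NAME lookup. When the function returns false (as it would for a root object with no name), simply move on to the children.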

Error reading character of string - Access violation error C++

I am working on a Kinect v2 project in C++, and I cannot use the depth frame buffer (BYTE*) outside the function that retrieves it.
It works for the first few minutes, I think by luck.
Then I get errors like:
Error reading characters of string
and an access violation, along with "no symbols loaded for kinect20.dll", at some point.
Here is the method I am calling the values.
BYTE* bodyIndex = new BYTE[512*424]; // initialization
HRESULT frameGet(){
    // Initialization code runs here; if it succeeds:
    hr = pDepthFrame->AccessUnderlyingBuffer(&m_nDepthBufferSize, &bodyIndex); // Kinect DLL method
    prints(depth[300]); // Prints the value every time
    return hr;
}
HRESULT getDepthFrame(){
    // If frameGet() succeeded:
    prints(bodyIndex[300]); // Throws "Error reading characters of string"
    return hr;
}
Can anyone please explain how I can access the bodyIndex data every time?
I didn't get any response when I posted the full code, so I mainly need to understand the logic of how C++ works here.
If my assumption is right, the depth data gets cleaned up after some time by the Kinect DLL, and that is what I am seeing.
I tried copying with memcpy, but the error is still there.
Thanks in advance.
According to https://msdn.microsoft.com/en-us/library/microsoft.kinect.kinect.idepthframe.accessunderlyingbuffer.aspx
you don't need to allocate the memory.
Gets a pointer to the depth frame data.
public:
HRESULT AccessUnderlyingBuffer(
UINT *capacity,
UINT16 **buffer
)
buffer Type: UINT16 [out] When this method returns, contains the pointer to the depth frame data.
If I understand the spec correctly, you always have to call AccessUnderlyingBuffer() before accessing the data.
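If the data is needed after the call returns, one option is to copy it into storage you own while the pointer is still valid. The sketch below is only an illustration under assumptions: CopyDepthFrame is a hypothetical helper, the buffer is treated as UINT16 samples (per the signature above), and capacity is assumed to be a count of samples rather than bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies 'capacity' depth samples out of the SDK-owned buffer into a vector
// owned by the caller. Assumption: 'capacity' counts UINT16 samples.
// Returns false and leaves 'out' untouched if there is nothing to copy.
bool CopyDepthFrame(const std::uint16_t* buffer, unsigned int capacity,
                    std::vector<std::uint16_t>& out)
{
    if (buffer == nullptr || capacity == 0) {
        return false;
    }
    out.assign(buffer, buffer + capacity);
    return true;
}
```

You would call this right after a successful AccessUnderlyingBuffer(), passing the returned pointer and capacity, and then read from the vector in getDepthFrame(). Note that no buffer is allocated up front for the SDK call: the pointer it hands back replaces whatever you passed in.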

WinInet InternetReadFile returns 0x8007007a (The data area passed to a system call is too small)

I have an issue with WinInet's InternetReadFile (C++).
In some rare cases the function fails and GetLastError returns the mentioned error 0x8007007a (which according to ErrorLookup corresponds to "The data area passed to a system call is too small").
I have a few questions regarding this:
Why does this happen in some rare cases but work fine in others (I'm talking of course about always downloading the same ~15MB zip file)?
Is this really related to the buffer size passed to the API call? I am using a constant buffer size of 1024 bytes for this call. Should I use a bigger buffer size? If so, how can I know what the "right" buffer size is?
What can I do to recover at run time if I do get this error?
Adding a code snippet (note that this will not work as is because some init code is necessary):
#define HTTP_RESPONSE_BUFFER_SIZE 1024
std::vector<char> responseBuffer;
DWORD dwResponseBytesRead = 0;
do
{
    const size_t oldBufferSize = responseBuffer.size();
    responseBuffer.resize(oldBufferSize + HTTP_RESPONSE_BUFFER_SIZE);
    // Now we read again to the last place we stopped
    // writing in the previous iteration.
    dwResponseBytesRead = 0;
    BOOL bInternetReadFile = ::InternetReadFile(hOpenRequest,                            // hFile. Retrieved from a previous call to ::HttpOpenRequest
                                                (LPVOID)&responseBuffer[oldBufferSize],  // lpBuffer.
                                                HTTP_RESPONSE_BUFFER_SIZE,               // dwNumberOfBytesToRead.
                                                &dwResponseBytesRead);                   // lpdwNumberOfBytesRead.
    if(!bInternetReadFile)
    {
        // Do clean up and exit.
        DWORD dwErr = ::GetLastError();          // This, in some cases, will return: 0x7a
        HRESULT hr = HRESULT_FROM_WIN32(dwErr);  // This, in some cases, will return: 0x8007007a
        return;
    }
    // Adjust the buffer according to the actual number of bytes read.
    responseBuffer.resize(oldBufferSize + dwResponseBytesRead);
}
while(dwResponseBytesRead != 0);
It is a documented error for InternetReadFile:
WinINet attempts to write the HTML to the lpBuffer buffer a line at a time. If the application's buffer is too small to fit at least one line of generated HTML, the error code ERROR_INSUFFICIENT_BUFFER is returned as an indication to the application that it needs a larger buffer.
So you are supposed to handle this error by increasing the buffer size. Just double the size, repeatedly if necessary.
There are some discrepancies in the question. For one, it isn't clear that you are reading an HTML file; 15MB seems excessive. Another is that this error should reproduce reliably. But most troubling is the error code value: it is wrapped in an HRESULT, the kind of error code that a COM component would return. You should be getting a Windows error code back from GetLastError(), just 0x7a and not 0x8007007a.
Do make sure that your error checking is correct. Only ever call GetLastError() when InternetReadFile() returned FALSE. If that checks out (always post a snippet please), then do consider that this error is actually generated upstream, perhaps by the firewall or flaky anti-malware.

How do I get a string description of a Win32 crash while in Top level filter (I am looking for the address of the instruction at the top of the stack)

If I use a class/method like the one described here how can I get the description/address of the call at the top of the stack?
Basically I want some value I can use in a call to our bug tracking system. I want to "uniquely" identify based on the address of the instruction that caused the exception.
(It is usually something of the form of mydll.dll!1234ABDC())
EDIT:
Some background information:
I am creating a minidump to email to a defect tracking system (FogBugz). In order to reduce duplicates I am trying to come up with a reasonable "signature" for the crash. I know there is an XML API for FogBugz, but it requires a user logon and we are not sure yet that we can afford to have people sniffing our traffic and getting user information. Emailing is also simpler to implement for now. Later on we will use the XML API to submit minidumps.
You need to put the code to do this in your exception filter; by the time you get to the exception handler, much of the context information for the exception has been lost.
__try
{
    // whatever
}
__except (MyExceptionFilter(GetExceptionInformation()))
{
}
Your filter will look something like this
LONG WINAPI MyExceptionFilter (EXCEPTION_POINTERS * pExcept)
{
    EXCEPTION_RECORD * pER = pExcept->ExceptionRecord;
    DWORD dwExceptionCode = pER->ExceptionCode;
    TCHAR szOut[MAX_PATH*4]; // exception output goes here.
    szOut[0] = 0;
    MEMORY_BASIC_INFORMATION mbi;
    SIZE_T cb = VirtualQuery (pER->ExceptionAddress, &mbi, sizeof(mbi));
    if (cb == sizeof(mbi))
    {
        TCHAR szModule[MAX_PATH];
        if (GetModuleFileName ((HMODULE)mbi.AllocationBase, szModule, MAX_PATH))
        {
            // Module path plus the offset of the faulting instruction within it.
            wsprintf(szOut, TEXT("Exception at '%s' + 0x%X"), szModule,
                     (ULONG_PTR)pER->ExceptionAddress - (ULONG_PTR)mbi.AllocationBase);
        }
    }
    return EXCEPTION_EXECUTE_HANDLER;
}
Of course, you will need to adjust your output a bit for 64 bit architectures, since the ExceptionAddress and AllocationBase will be 64 bit quantities in that case.
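For the "signature" itself, the module file name plus the offset into the module is usually stable across runs, whereas the raw address is not if the module loads at a different base. Here is a small, portable sketch of building such a string in a way that works for both 32 and 64 bit addresses. MakeCrashSignature is a hypothetical helper; the module path, base and exception address would come from the filter above.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Builds "module.dll+0xOFFSET" from the full module path, the module base
// address and the faulting address. Assumes exceptionAddress >= moduleBase.
std::string MakeCrashSignature(const std::string& modulePath,
                               std::uintptr_t moduleBase,
                               std::uintptr_t exceptionAddress)
{
    // Keep only the file name so the install directory does not change
    // the signature.
    const std::string::size_type slash = modulePath.find_last_of("\\/");
    const std::string name =
        (slash == std::string::npos) ? modulePath : modulePath.substr(slash + 1);

    // Format the offset through unsigned long long so the same format
    // string is valid on 32 and 64 bit builds.
    const unsigned long long offset =
        static_cast<unsigned long long>(exceptionAddress - moduleBase);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "+0x%llX", offset);
    return name + buf;
}
```

Keep in mind that the offset changes whenever the module is rebuilt, so including the product version in the bug report alongside the signature is probably a good idea.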
The EXCEPTION_POINTERS struct which is sent to TopLevelFilter() contains an EXCEPTION_RECORD struct which contains the ExceptionAddress. With this address you can figure out which DLL the offending opcode is in by enumerating the modules with CreateToolhelp32Snapshot. You can also use the functions in dbghelp.dll to find the symbol which corresponds to the address (the function it is in).
GetExceptionInformation will return the EXCEPTION_POINTERS struct which contains information about the exception. The ExceptionRecord member contains an ExceptionAddress member, which is the address of the exception.
You'll need to map this address to a module-relative location in your code for it to be useful. You can use GetModuleHandleEx with the GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS flag to get the HMODULE (which is also the base address of the module). GetModuleFileName can then be used to get the actual name of the module that the exception occurred in, and GetModuleInformation gives you its base address and size.
This may not be that helpful to you if the fault is actually inside of a system DLL. A more sophisticated scheme would be to generate a stack trace (using StackWalk64 in dbghelp) and ignore the topmost frames that are not in your code.
You can avoid the pain of printing a string for the exception (what happens if you can save the minidump but can't format a string without crashing?) by saving the minidump instead and using cdb.exe or windbg.exe to extract the exception information.

Large Xml files are being truncated by MSXML4 / FreeThreadedDOMDocument40 (COM string Interop issue)

I'm using the following code to load a large Xml document (~5 MB):
int _tmain(int argc, _TCHAR* argv[])
{
    ::CoInitialize(NULL);
    HRESULT hr;
    CComPtr< IXMLDOMDocument > spXmlDocument;
    hr = spXmlDocument.CoCreateInstance(__uuidof(FreeThreadedDOMDocument60));
    if(FAILED(hr)) return FALSE;
    spXmlDocument->put_preserveWhiteSpace(VARIANT_TRUE);
    spXmlDocument->put_async(VARIANT_FALSE);
    spXmlDocument->put_validateOnParse(VARIANT_FALSE);
    VARIANT_BOOL bLoadSucceeded = VARIANT_FALSE;
    hr = spXmlDocument->load( CComVariant( L"C:\\XMLFile1.xml" ), &bLoadSucceeded );
    if(FAILED(hr) || bLoadSucceeded==VARIANT_FALSE) return FALSE;
    CComVariant bstrDoc;
    hr = spXmlDocument->get_nodeValue(&bstrDoc);
    CComPtr< IXMLDOMNode > spNode;
    hr = spXmlDocument->selectSingleNode(CComBSTR(L"//SpecialNode"), &spNode );
}
I'm finding that the contents of bstrDoc are truncated (there are no exceptions / failed HRESULTs).
Anyone know why? You can try this yourself just by creating a large Xml file of just <xml></xml> elements (~5 MB should do it)
UPDATE: Updating to use MSXML 6 made no difference, also setting Async to false and using get_nodeValue / get_text made no difference (sample updated)
I noticed that if I did selectSingleNode for a node placed at the end of the document it worked fine - it appears that the document loads successfully, and the issue is instead with getting the text for a single node. I'm perplexed however as I'm yet to find anyone else on the internet having this issue.
UPDATE 2: The problem appears to be related to COM interop itself - I've created a simple C# class that does the same thing and exposed it as a COM object. I can see that although the Xml is fine in my C# app, by the time I look at it in my debugger in the C++ app it looks exactly as it did when using MSXML.
It appears I was a victim of my own foolishness - the Xml / strings were in fact not being truncated, the viewer in Visual Studio was simply lying to me.
Outputting the strings to a file showed that the strings were all as they should be.
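For anyone who wants to run the same check, here is a minimal, portable sketch of dumping a wide string to a file so its real length can be compared with what the debugger shows. DumpStringToFile is a hypothetical helper; it writes the raw wchar_t bytes without any encoding conversion, so the file size should equal the character count times sizeof(wchar_t).

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Writes the raw characters of 'text' to 'path' (no encoding conversion).
// Returns the number of characters written, or 0 if the write failed.
std::size_t DumpStringToFile(const std::wstring& text, const std::string& path)
{
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) {
        return 0;
    }
    out.write(reinterpret_cast<const char*>(text.data()),
              static_cast<std::streamsize>(text.size() * sizeof(wchar_t)));
    out.flush();
    return out ? text.size() : 0;
}
```

With a BSTR you could construct the std::wstring from the pointer and SysStringLen() so that the full length is used rather than stopping at the first embedded null.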