Problem parsing an XML file with MSXML4 in C++

Here is my parsing code:
MSXML2::IXMLDOMNodePtr pNode = m_pXmlDoc->selectSingleNode(kNameOfChild.c_str());
MSXML2::IXMLDOMNodeListPtr pIDOMNodeList = NULL;
MSXML2::IXMLDOMNodePtr pIDOMNode = NULL;
long numOfChildNodes= 0;
BSTR bstrItemText;
HRESULT hr;
MSXML2::IXMLDOMElementPtr pChildNode = m_pXmlDoc->getElementsByTagName(kNameOfChild.c_str());
hr = m_pXmlDoc->get_childNodes(&pIDOMNodeList);
hr = pIDOMNodeList->get_length(&numOfChildNodes);
And my xml file:
<?xml version="1.0"?>
<GovTalkMessage>
<EnvelopeVersion>1.0</EnvelopeVersion>
<Header>
<MessageDetails>
<Class>MOSWTSC2</Class>
<Qualifier>acknowledgement</Qualifier>
<Function>submit</Function>
<TransactionID>20021202ABC</TransactionID>
<CorrelationID>B07B9ED3176193DDC4EC39063848A927</CorrelationID>
<ResponseEndPoint PollInterval="10">
https://secure.gateway.gov.uk/poll
</ResponseEndPoint>
<GatewayTimestamp>2001-01-31T10:20:18.345</GatewayTimestamp>
</MessageDetails>
<SenderDetails/>
</Header>
<GovTalkDetails>
<Keys/>
</GovTalkDetails>
<Body/>
</GovTalkMessage>
kNameOfChild is "Qualifier"
pNode is always NULL
pChildNode is always NULL
hr returns S_OK
numOfChildNodes is always 0
So, what am I doing wrong?
Thanks

Try /GovTalkMessage/Header/MessageDetails/Qualifier for the XPath query.

You need to provide a full XPath expression for the selectSingleNode call. A bare "Qualifier" is evaluated relative to the document node, whose only child element is GovTalkMessage, so nothing matches. I haven't used XPath too many times, but I think either of these strings should work for querying: "GovTalkMessage/Header/MessageDetails[1]/Qualifier" (XPath positions start at 1, not 0) or "//Qualifier".
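To make the context-node point concrete, here is a minimal, portable sketch. It does not use MSXML: `Node`, `selectFirst`, and `buildSampleDocument` are hypothetical names for a toy tree that only follows child steps, which is enough to show why a bare element name fails from the document node while the full path succeeds.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Toy element tree; a stand-in for the DOM, not the MSXML API.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;

    // Appends a child element and returns it so calls can be chained.
    Node* add(const std::string& childName) {
        children.push_back(std::make_unique<Node>());
        children.back()->name = childName;
        return children.back().get();
    }
};

// Follows a slash-separated list of child names starting at `context`,
// taking the first matching child at each step. Returns nullptr when a
// step has no match.
const Node* selectFirst(const Node& context, const std::string& path) {
    const Node* current = &context;
    std::istringstream steps(path);
    std::string step;
    while (std::getline(steps, step, '/')) {
        if (step.empty()) continue;  // tolerate a leading '/'
        const Node* next = nullptr;
        for (const auto& child : current->children) {
            if (child->name == step) { next = child.get(); break; }
        }
        if (next == nullptr) return nullptr;
        current = next;
    }
    return current;
}

// Builds the part of the question's document that matters here.
std::unique_ptr<Node> buildSampleDocument() {
    auto doc = std::make_unique<Node>();  // unnamed document node
    Node* details = doc->add("GovTalkMessage")->add("Header")->add("MessageDetails");
    details->add("Class");
    details->add("Qualifier");
    return doc;
}
```

With this toy model, `selectFirst(*doc, "Qualifier")` returns nullptr, while `selectFirst(*doc, "/GovTalkMessage/Header/MessageDetails/Qualifier")` returns the element.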

Related

Parsing wchar_t xml using Microsoft WebServices

I'm trying to parse a wide character string using WebServices.
HRESULT hr = NOERROR;
WS_ERROR* error = NULL;
WS_XML_READER* xmlReader = NULL;
// Create an error object for storing rich error information
hr = WsCreateError(
NULL,
0,
&error);
if (FAILED(hr))
{
goto Exit;
}
// Create an XML reader
hr = WsCreateReader(
NULL,
0,
&xmlReader,
error);
if (FAILED(hr))
{
goto Exit;
}
WCHAR* xml =
L"<?xml version='1.0' encoding='UTF-8' standalone='yes'?>"
"<Orders xmlns='http://example.com'>"
"<!-- Order #1 -->"
"<PurchaseOrder id='1'>"
"<Quantity>42</Quantity>"
"<ProductName>Toaster</ProductName>"
"</PurchaseOrder>"
"<!-- Order #2 -->"
"<PurchaseOrder id='2'>"
"<Quantity>5</Quantity>"
"<ProductName><![CDATA[Block&Tackle]></ProductName>"
"</PurchaseOrder>"
"</Orders>";
BYTE* bytes = (BYTE*)xml;
ULONG byteCount = (ULONG)wcslen(xml) * sizeof(WCHAR);
// Setup the source input
WS_XML_READER_BUFFER_INPUT bufferInput;
ZeroMemory(&bufferInput, sizeof(bufferInput));
bufferInput.input.inputType = WS_XML_READER_INPUT_TYPE_BUFFER;
bufferInput.encodedData = bytes;
bufferInput.encodedDataSize = byteCount;
// Setup the source encoding
WS_XML_READER_TEXT_ENCODING textEncoding;
ZeroMemory(&textEncoding, sizeof(textEncoding));
textEncoding.encoding.encodingType = WS_XML_READER_ENCODING_TYPE_TEXT;
textEncoding.charSet = WS_CHARSET_AUTO;
// Setup the reader
hr = WsSetInput(xmlReader, &textEncoding.encoding, &bufferInput.input, NULL, 0, error);
if (FAILED(hr))
{
goto Exit;
}
I've also tried to change textEncoding.charSet = WS_CHARSET_AUTO; to textEncoding.charSet = WS_CHARSET_UTF16LE;
The above code keeps failing in the call to WsReadNode, and the error message is "The data input was not in the expected format or did not have the expected value".
I can't convert the WCHAR array to a CHAR array, as the XML may contain non-ASCII characters.
The example XML declares an encoding in its XML declaration:
L"<?xml version='1.0' encoding='UTF-8' standalone='yes'?>"
Changing it to
L"<?xml version='1.0' encoding='UTF-16LE' standalone='yes'?>"
should fix the error.
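A small portable sketch of why the declaration and the buffer disagree. It assumes that WCHAR on Windows is a 16-bit UTF-16 code unit, with `char16_t` standing in for it here; `toUtf16LeBytes` and `looksLikeUtf16Le` are hypothetical helpers, not part of the WebServices API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Serializes UTF-16 code units as little-endian bytes, independent of the
// host byte order. This mirrors what casting a WCHAR* to BYTE* yields on
// a little-endian Windows machine.
std::vector<std::uint8_t> toUtf16LeBytes(const std::u16string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));  // low byte first
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));    // then high byte
    }
    return bytes;
}

// True when the buffer starts with '<' followed by a zero byte, which is
// how an XML document without a BOM begins in UTF-16LE. A UTF-8 document
// would have '?' or another non-zero byte in the second position.
bool looksLikeUtf16Le(const std::vector<std::uint8_t>& bytes) {
    return bytes.size() >= 2 && bytes[0] == '<' && bytes[1] == 0;
}
```

Every ASCII character in the buffer is followed by a zero byte, so the bytes cannot be valid input for a reader that trusts an encoding='UTF-8' declaration.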
https://learn.microsoft.com/es-es/windows/win32/api/webservices/ns-webservices-ws_xml_reader_buffer_input
https://learn.microsoft.com/es-es/windows/win32/api/webservices/ns-webservices-ws_xml_reader_text_encoding
https://learn.microsoft.com/es-es/windows/win32/api/webservices/ns-webservices-_ws_xml_reader_encoding
https://learn.microsoft.com/ru-ru/windows/win32/api/webservices/ne-webservices-ws_xml_reader_encoding_type
https://learn.microsoft.com/es-es/windows/win32/api/webservices/ne-webservices-ws_charset
For example, according to the last link you should use WS_CHARSET_UTF8, because your XML declares its encoding as UTF-8.
As for ASCII, what about replacing the non-ASCII values with ASCII ones?
If you have problems with the WCHAR to CHAR conversion, this site has a lot of information about it.

Change div innerHTML through IHTMLDocument2 and C++

I'm trying to change the content of a div using IHTMLDocument2 interface this way:
IHTMLElementCollection* collection = NULL;
IDispatch* mydiv;
doc2->get_all(&collection);
long count;
collection->get_length(&count); //just to check I get something
CComVariant varstr = L"mydivname";
CComVariant varint = 0;
collection->item(varstr, varint, &mydiv); //this works I get the div
IHTMLElement* htmldiv;
mydiv->QueryInterface(IID_IHTMLElement, (void**)&htmldiv);
CComBSTR html;
htmldiv->get_innerHTML(&html); //works too, I get the current content
HRESULT hr=htmldiv->put_innerText(L"hello"); //this does not work but returns S_OK
collection->Release();
So the content of my div is just cleared instead of being replaced with "hello". I don't understand why. Could it be a security issue?
Thanks
According to the MSDN documentation, the string passed to put_innerText is of type BSTR.
So, I would suggest trying some code like this:
CComBSTR text(OLESTR("hello"));
hr = htmldiv->put_innerText(text);
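The reason a plain wide literal is not a valid BSTR is that a BSTR carries a 4-byte length prefix in front of the character data, and the callee reads it. The following is a portable toy model of that layout, not the real SysAllocString allocator; `ToyBstr`, `makeToyBstr`, and `toyByteLength` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Toy stand-in for a BSTR allocation:
// [4-byte length in bytes][UTF-16 characters][2-byte terminator].
struct ToyBstr {
    std::vector<unsigned char> storage;

    // Like a real BSTR, the pointer handed around points at the first
    // character, just past the length prefix.
    const unsigned char* chars() const {
        return storage.data() + sizeof(std::uint32_t);
    }
};

ToyBstr makeToyBstr(const std::u16string& text) {
    const std::uint32_t byteLen =
        static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
    ToyBstr b;
    // Zero-filled, so the terminator after the characters is already in place.
    b.storage.resize(sizeof(byteLen) + byteLen + sizeof(char16_t), 0);
    std::memcpy(b.storage.data(), &byteLen, sizeof(byteLen));
    std::memcpy(b.storage.data() + sizeof(byteLen), text.data(), byteLen);
    return b;
}

// Reads the length the way a BSTR consumer does: from the 4 bytes that
// precede the character pointer.
std::uint32_t toyByteLength(const unsigned char* chars) {
    std::uint32_t n = 0;
    std::memcpy(&n, chars - sizeof(n), sizeof(n));
    return n;
}
```

A literal such as L"hello" has no such prefix, so a callee that reads the length gets whatever bytes happen to sit before the string, which would be consistent with the div being cleared.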

XSL Transformation in C++ catch message with terminate flag

I've built a project based on this document to retrieve information from an XML file using an XSL file.
I am trying to throw an error in XSL file:
<xsl:if test="not(PIN/Length/text() = '4')">
<xsl:message terminate="yes">PIN length in input suppose to be 4</xsl:message>
</xsl:if>
But it does not seem to work: no error is reported, and the transformation appears to complete successfully.
Can I somehow catch this message in C++?
void ManageXML::XML2Generic(string sOrgFilePath, string sOrgXSLFilePath, string sCpfPath)
{
wstring sTempFilePath = s2ws(sOrgFilePath);
LPCWSTR sFilePath = sTempFilePath.c_str();
wstring sTempXSLFilePath = s2ws(sOrgXSLFilePath);
LPCWSTR sXSLFilePath = sTempXSLFilePath.c_str();
HRESULT hr = S_OK;
IXMLDOMDocument *pXMLDom = nullptr;
IXMLDOMDocument *pXSLDoc = nullptr;
CHK_HR(CreateAndInitParserDOM(&pXMLDom));
CHK_HR(LoadXMLFile(pXMLDom, sFilePath, sOrgFilePath)); //cast to LPCWSTR
CHK_HR(CreateAndInitParserDOM(&pXSLDoc));
CHK_HR(LoadXMLFile(pXSLDoc, sXSLFilePath, sOrgXSLFilePath)); //cast to LPCWSTR
// Transform dom to a string:
CHK_HR(TransformDOM2Data(pXMLDom, pXSLDoc, sGenericResult));
CleanUp:
SAFE_RELEASE(pXSLDoc);
SAFE_RELEASE(pXMLDom);
this->CreateGenericFile(sCpfPath);
CoUninitialize();
}
One bad solution that comes to my mind is to make XSL like this:
<xsl:if test="not(PIN/Length/text() = '4')">
<xsl:text>MSXML_ERROR: PIN length in input suppose to be 4</xsl:text>
</xsl:if>
And
CleanUp:
SAFE_RELEASE(pXSLDoc);
SAFE_RELEASE(pXMLDom);
if (sGenericResult.find("MSXML_ERROR") != string::npos)
throw runtime_error("blah blah blah");
this->CreateGenericFile(sCpfPath);
CoUninitialize();
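A slightly tidier variant of that marker check, as a portable sketch: it pulls the text after the marker out of the transform output and uses it as the exception message. `kErrorMarker` and `throwIfTransformFailed` are hypothetical names, and the marker string is assumed to match whatever the stylesheet emits.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical marker; it must match the text written by the stylesheet.
const std::string kErrorMarker = "MSXML_ERROR:";

// Throws std::runtime_error carrying the text that follows the marker
// (up to the end of that line). Does nothing when the marker is absent.
void throwIfTransformFailed(const std::string& transformOutput) {
    const std::size_t pos = transformOutput.find(kErrorMarker);
    if (pos == std::string::npos) {
        return;
    }
    const std::size_t start = pos + kErrorMarker.size();
    const std::size_t end = transformOutput.find('\n', start);
    std::string message = transformOutput.substr(
        start, end == std::string::npos ? std::string::npos : end - start);

    // Drop the spaces between the marker and the message text.
    const std::size_t firstNonSpace = message.find_first_not_of(' ');
    message = (firstNonSpace == std::string::npos) ? "" : message.substr(firstNonSpace);

    throw std::runtime_error(message);
}
```

It would probably be safest to call this only after all cleanup, including CoUninitialize, has run, so that the exception does not skip any of it.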

SAPI identifying more than 2 properties

I found this on Google while searching for information on how SAPI identifies phrases. This example covers the case where the rule has only one property. So what if there are 2 or more properties in that rule? How would one go about writing the code for this? I am still confused about SAPI and trying to understand it. Any help is welcome, thanks!
The alternate method is to add a property to your list tag/items [you appear to
be familiar with properties], iterate through the property tree to find the
property, and then retrieve the associated item from the property. Note that if
this is the only property in your recognized rule, it is fairly easy to
retrieve the property [no need to navigate the property tree].
For example, you could change your rule to be the following:
<RULE ID="VID_Vcs">
<L PROPNAME="NAME_LIST">
<P VAL="1">Mike </P>
<P VAL="2">Davor </P>
<P VAL="3">Kurt </P>
<P VAL="4">Steve </P>
</L>
</RULE>
Use the following code to retrieve the list item value/phrase text
SPPHRASE* pPhrase = NULL;
hr = cpRecoResult->GetPhrase(&pPhrase);
// Check hr
// Let's assume that you only have one property in your rule, so there is only one property in the property tree.
// ASSERT: NULL == pPhrase->pProperties->pNextSibling && NULL == pPhrase->pProperties->pFirstChild
// ASSERT: NULL != pPhrase->pProperties->pszName && 0 == wcscmp(pPhrase->pProperties->pszName, L"NAME_LIST")
// retrieve the list item index [e.g. 1-4], see VAL XML tags in aforementioned grammar
long lRecognizedListItemIndex = pPhrase->pProperties->vValue.lVal;
// retrieve the phrase text
hr = cpRecoResult->GetText(pPhrase->pProperties->ulFirstElement, pPhrase->pProperties->ulCountOfElements, FALSE, &pwszListItem, NULL);
// Check hr
// pwszListItem now contains the recognized list item
//compared to the phrase tag of the dictionary (XML)
if(SUCCEEDED (hResult)) {
if ((pPhrase->pProperties != nullptr) && (pPhrase->pProperties->pFirstChild != nullptr)){
const SPPHRASEPROPERTY* pRule = pPhrase->pProperties->pFirstChild ;
if (pRule->SREngineConfidence > confidenceThreshold) {
if ( wcscmp ( L"word one", pRule->pszValue) == 0 ) {
//do stuff here
}
else if ( wcscmp ( L"word two", pRule->pszValue) == 0 ) {
//do stuff here
}
else if ( wcscmp ( L"word three", pRule->pszValue) == 0 ) {
//do stuff here
}
else if ( wcscmp ( L"word four", pRule->pszValue ) == 0) {
//do stuff
}
}
}
}
Alright, so sorry for the wait. I whipped up a simple program that may help you figure out what you're trying to do.
So here's my grammar file:
<GRAMMAR LANGID="409">
<DEFINE>
<ID NAME="LIKE_VAL" VAL="1"/>
<ID NAME="SUBJECT_VAL" VAL="2"/>
<ID NAME="COMBINED_VAL" VAL="3"/>
<ID NAME="EXIT_VAL" VAL="4"/>
</DEFINE>
<RULE NAME="LIKE_VAL" TOPLEVEL="ACTIVE">
<L>
<P>I <O>really</O> like</P>
<P>I <O>really</O> do not like</P>
</L>
</RULE>
<RULE NAME="SUBJECT_VAL" TOPLEVEL="ACTIVE">
<P>ponies.</P>
</RULE>
<RULE NAME="COMBINED_VAL" TOPLEVEL="ACTIVE">
<RULEREF NAME="LIKE_VAL"/>
<RULEREF NAME="SUBJECT_VAL"/>
</RULE>
<RULE NAME="EXIT_VAL" TOPLEVEL="ACTIVE">
<L>
<P>Exit</P>
<P>Quit</P>
<P>Terminate</P>
<P>Deluminate</P>
</L>
</RULE>
</GRAMMAR>
And here's a full program that uses it:
#include "sphelper.h"
#include <Windows.h>
#include <string>
int main(int argc, char* argv[])
{
CComPtr<ISpRecognizer> cpReco;
CComPtr<ISpRecoContext> cpRecoCtx;
CComPtr<ISpRecoGrammar> cpRecoGram;
ULONGLONG ullEvents = SPFEI(SPEI_RECOGNITION)|
SPFEI(SPEI_FALSE_RECOGNITION);
ISpObjectToken* pInputToken;
ISpRecoResult* cpRecoRslt;
HRESULT hr = S_OK;
hr = ::CoInitialize(NULL);
hr = cpReco.CoCreateInstance(CLSID_SpInprocRecognizer);
hr = cpReco->CreateRecoContext(&cpRecoCtx);
hr = cpRecoCtx->CreateGrammar(0, &cpRecoGram);
hr = cpRecoCtx->SetNotifyWin32Event();
hr = cpRecoCtx->SetInterest(ullEvents, ullEvents);
hr = SpGetDefaultTokenFromCategoryId(SPCAT_AUDIOIN, &pInputToken);
hr = cpReco->SetInput(pInputToken, FALSE);
hr = cpRecoGram->LoadCmdFromFile(L"Your_Test_File.cfg",SPLO_STATIC);
hr = cpReco->SetRecoState(SPRST_ACTIVE);
hr = cpRecoCtx->SetContextState(SPCS_ENABLED);
hr = cpRecoGram->SetGrammarState(SPGS_ENABLED);
hr = cpRecoGram->SetRuleState(NULL, NULL, SPRS_ACTIVE);
std::wstring strExit = L"Exit";
std::wstring strExitRuleName = L"EXIT_VAL";
CSpEvent spEvent;
bool isListening = true;
do{
hr = cpRecoCtx->WaitForNotifyEvent(INFINITE);
if(spEvent.GetFrom(cpRecoCtx) == S_OK)
{
switch(spEvent.eEventId){
case SPEI_RECOGNITION:{
WCHAR* strReco = 0;
cpRecoRslt = spEvent.RecoResult();
cpRecoRslt->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE, TRUE, &strReco, NULL);
printf("%ls\n",strReco);
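SPPHRASE *phrase = NULL;
cpRecoRslt->GetPhrase(&phrase);
if(phrase){
std::wstring ruleName = phrase->Rule.pszName;
// Stop listening once the recognized rule is the exit rule
if(ruleName.compare(strExitRuleName)==0){
isListening = false;
}
// GetPhrase allocates the SPPHRASE with CoTaskMemAlloc
::CoTaskMemFree(phrase);
}
// GetText allocates the string with CoTaskMemAlloc as well
::CoTaskMemFree(strReco);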
break;
}
case SPEI_FALSE_RECOGNITION:{
printf("False Recognition\n");
break;
}
}
}
}while(isListening);
cpRecoGram.Release();
cpRecoCtx.Release();
cpReco.Release();
::CoUninitialize();
printf("Press any key to continue...");
getchar();
return 0;
}
You'll have to change the path that the load grammar call reads from. From what I understand, you're attempting to create a grammar in a context-free grammar file AND build one programmatically as well. Typically you start with a grammar file and modify it when you need to.
If, however, you REALLY REALLY need to add new grammars programmatically, such as when someone's typing in new grammar to be recognized, THEN you'd change SPLO_STATIC to SPLO_DYNAMIC and start implementing the code you see in the latter half of the MSDN post you saw.
I completely left out any error checking. If you need to access other properties of the rule you're looking at, use the cpRecoRslt->GetPhrase(&phrase) area. Besides the rule's name, you can also get its ID.
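For the original question about rules with two or more properties, the property tree can be walked through its sibling and child links. Below is a portable sketch of that walk. `ToyProperty` is a hypothetical stand-in that copies only the relevant fields of SPPHRASEPROPERTY (pszName, pNextSibling, pFirstChild, plus a plain `value` in place of vValue.lVal), and `findProperty` is an illustrative helper, not a SAPI function.

```cpp
#include <cwchar>

// Toy stand-in for the SPPHRASEPROPERTY fields used here (not the real struct).
struct ToyProperty {
    const wchar_t* pszName;
    long value;                        // plays the role of vValue.lVal
    const ToyProperty* pNextSibling;
    const ToyProperty* pFirstChild;
};

// Depth-first search of the property tree for the first property with the
// given name. Returns nullptr when no property matches.
const ToyProperty* findProperty(const ToyProperty* node, const wchar_t* name) {
    for (; node != nullptr; node = node->pNextSibling) {
        if (node->pszName != nullptr && std::wcscmp(node->pszName, name) == 0) {
            return node;
        }
        if (const ToyProperty* hit = findProperty(node->pFirstChild, name)) {
            return hit;
        }
    }
    return nullptr;
}
```

With the real structure, the same loop would start from pPhrase->pProperties and be called once per PROPNAME in the rule.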

Xerces XPath causes seg fault when path doesn't exist

I can successfully use Xerces XPath feature to query for information from an XML with the following XML and C++ code.
XML
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<ApplicationSettings>
hello universe
</ApplicationSettings>
</root>
C++
int main()
{
XMLPlatformUtils::Initialize();
// create the DOM parser
XercesDOMParser *parser = new XercesDOMParser;
parser->setValidationScheme(XercesDOMParser::Val_Never);
parser->parse("fake_cmf.xml");
// get the DOM representation
DOMDocument *doc = parser->getDocument();
// get the root element
DOMElement* root = doc->getDocumentElement();
// evaluate the xpath
DOMXPathResult* result=doc->evaluate(
XMLString::transcode("/root/ApplicationSettings"), // <-- HERE IS THE XPATH
root,
NULL,
DOMXPathResult::ORDERED_NODE_SNAPSHOT_TYPE, //DOMXPathResult::ANY_UNORDERED_NODE_TYPE, //DOMXPathResult::STRING_TYPE,
NULL);
// look into the XPath evaluation result
result->snapshotItem(0);
std::cout<<TranscodeToStr(result->getNodeValue()->getFirstChild()->getNodeValue(),"ascii").str()<<std::endl;
XMLPlatformUtils::Terminate();
return 0;
}
The problem is that sometimes my XML will only have certain fields. If I remove the ApplicationSettings entry from the XML, the program seg faults. How can I properly handle these optional fields? I know that trying to recover from seg faults is risky business.
The seg fault is occurring in this line
std::cout<<TranscodeToStr(result->getNodeValue()->getFirstChild()->getNodeValue(),"ascii").str()<<std::endl;
specifically in the getFirstChild() call, because the result of getNodeValue() is NULL.
This is my quick and dirty solution. It's not really ideal but it works. I would prefer a more sophisticated evaluation and response.
if (result->getNodeValue() == NULL)
{
cout << "There is no result for the provided XPath " << endl;
}
else
{
cout<<TranscodeToStr(result->getNodeValue()->getFirstChild()->getNodeValue(),"ascii").str()<<endl;
}
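One way to make the check above a bit more general is to guard every link in the chain and fall back to a default value. This is a portable sketch using a toy node type rather than Xerces itself; `ToyNode` and `firstChildTextOr` are hypothetical names. It also covers the case where the element exists but is empty, so that getFirstChild() would return NULL.

```cpp
#include <string>

// Toy stand-in for a DOM node (not the Xerces DOMNode class).
struct ToyNode {
    std::string value;
    const ToyNode* firstChild;
};

// Returns the first child's text, or `fallback` when either the node
// itself or its first child is missing.
std::string firstChildTextOr(const ToyNode* node, const std::string& fallback) {
    if (node == nullptr || node->firstChild == nullptr) {
        return fallback;
    }
    return node->firstChild->value;
}
```

In the Xerces code, the same two checks would apply to result->getNodeValue() and to its getFirstChild() before the final getNodeValue() call.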