Requesting complete, compilable libxml2 sax example - c++

I'm having a heck of a time figuring out how to use the SAX parser in libxml2. Can someone post an example that parses this XML (yes, without the <?xml ...?> declaration, if the libxml2 SAX parser can handle that):
<hello foo="bar">world</hello>
The parser should print out the data enclosed in element hello and also grab the value of attribute foo.
I'm working on this example, but I'm hoping someone else beats me to the punch, since I'm not making much progress. Google hasn't turned up any complete, working examples for the libxml2 SAX parser.

Adapted from http://julp.developpez.com/c/libxml2/?page=sax
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/parserInternals.h>
void start_element_callback(void *user_data, const xmlChar *name, const xmlChar **attrs) {
printf("Beginning of element : %s \n", name);
// attrs is a NULL-terminated array of name/value pairs (NULL when there are no attributes)
while (NULL != attrs && NULL != attrs[0]) {
printf("attribute: %s=%s\n",attrs[0],attrs[1]);
attrs = &attrs[2];
}
}
int main() {
const char* xml_path = "hello_world.xml";
FILE *xml_fh = fopen(xml_path,"w+");
if (xml_fh == NULL) {
perror(xml_path);
return EXIT_FAILURE;
}
fputs("<hello foo=\"bar\" baz=\"baa\">world</hello>",xml_fh);
fclose(xml_fh);
// Initialize all fields to zero
xmlSAXHandler sh = { 0 };
// register callback
sh.startElement = start_element_callback;
xmlParserCtxtPtr ctxt;
// create the context
if ((ctxt = xmlCreateFileParserCtxt(xml_path)) == NULL) {
fprintf(stderr, "Error while creating the parser context\n");
return EXIT_FAILURE;
}
// register sax handler with the context, remembering the handler the
// context allocated for itself so it can be restored before freeing
xmlSAXHandlerPtr default_sax = ctxt->sax;
ctxt->sax = &sh;
// parse the doc
xmlParseDocument(ctxt);
int well_formed = ctxt->wellFormed;
// put the original handler back: xmlFreeParserCtxt() frees ctxt->sax,
// which must not be our stack-allocated sh
ctxt->sax = default_sax;
xmlFreeParserCtxt(ctxt);
// well-formed document?
if (well_formed) {
printf("XML Document is well formed\n");
} else {
fprintf(stderr, "XML Document isn't well formed\n");
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
This produces output:
Beginning of element : hello
attribute: foo=bar
attribute: baz=baa
XML Document is well formed
Compiled with the following command on Ubuntu 10.04.1:
g++ -I/usr/include/libxml2 libxml2_hello_world.cpp /usr/lib/libxml2.a -lz \
-o libxml2_hello_world

Can I suggest rapidxml?

Related

WinAPI FileExists Function Implementation

I am coding a simple replacement for the std::filesystem::exists() function using the Windows API. Surprisingly, it turned out to be pretty hard. I want to keep my code simple, so I am using as few functions as possible; my function of choice is GetFileAttributesW(). I test the code by walking the drive with fs::recursive_directory_iterator. My function thinks that none of the files in “C:\Windows\servicing\LCU\*” exist (ERROR_PATH_NOT_FOUND). This directory stores Windows Update caches and is known for extremely long file names. I couldn't find anything else about this directory. Examples of the file names and my code are included below. Hope this helps!
Edited:
The solution to this problem is to prepend the absolute file path with the “\\?\” prefix. It makes Windows handle long file paths correctly!
C:\Windows\servicing\LCU\Package_for_RollupFix~31bf3856ad364e35~amd64~~19041.2006.1.7\amd64_microsoft-windows-a..g-whatsnew.appxmain_31bf3856ad364e35_10.0.19041.1741_none_ee5d4a8d060d7653\f\new360videossquare44x44logo.targetsize-16_altform-unplated_contrast-black.png
C:\Windows\servicing\LCU\Package_for_RollupFix~31bf3856ad364e35~amd64~~19041.2006.1.7\amd64_microsoft-windows-a..g-whatsnew.appxmain_31bf3856ad364e35_10.0.19041.1741_none_ee5d4a8d060d7653\f\new360videossquare44x44logo.targetsize-16_altform-unplated_contrast-white.png
C:\Windows\servicing\LCU\Package_for_RollupFix~31bf3856ad364e35~amd64~~19041.2006.1.7\amd64_microsoft-windows-a..g-whatsnew.appxmain_31bf3856ad364e35_10.0.19041.1741_none_ee5d4a8d060d7653\f\new360videossquare44x44logo.targetsize-20_altform-unplated_contrast-black.png
C:\Windows\servicing\LCU\Package_for_RollupFix~31bf3856ad364e35~amd64~~19041.2006.1.7\amd64_microsoft-windows-a..g-whatsnew.appxmain_31bf3856ad364e35_10.0.19041.1741_none_ee5d4a8d060d7653\f\new360videossquare44x44logo.targetsize-20_altform-unplated_contrast-white.png
#include <windows.h>
#include <filesystem>
#include <iostream>
#include <string>
using namespace std;
namespace fs = std::filesystem;
int FileExists(wstring file_path) {
/* TODO:
1. Doesn't work with "C:\\Windows\\servicing\\LCU\\*".
2. Improve error system.
*/
DWORD attributes = GetFileAttributesW(file_path.c_str());
// Valid attributes => File exists
if (attributes != INVALID_FILE_ATTRIBUTES) {
return true;
}
DWORD error_code = GetLastError();
wcout << error_code << ' ' << file_path << '\n';
// Path related error => File doesn't exist
if (error_code == ERROR_PATH_NOT_FOUND || error_code == ERROR_INVALID_NAME ||
error_code == ERROR_FILE_NOT_FOUND || error_code == ERROR_BAD_NETPATH)
{
return false;
}
// Other errors are logged before if statement
// File is busy with IO operations, etc.
return error_code;
}
int main() {
for (fs::path path : fs::recursive_directory_iterator("C:\\", fs::directory_options::skip_permission_denied)) {
FileExists(path);
}
return 0;
}
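The solution that worked for me is to prepend the absolute file path with the “\\?\” prefix. This makes the Unicode (W) file APIs skip path normalization and lifts the MAX_PATH (260-character) limit, so very long paths like the ones above are handled correctly.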
Check out MSDN Article "Maximum File Path Limitation" for more info.

UTF-8 and TinyXML

For some reason I cannot read data from an XML file properly.
For example, instead of "Schrüder" I get something like "SchrÃ¼der".
My code:
tinyxml2::XMLDocument doc;
bool open(string path) {
if(doc.LoadFile(path.c_str()) == XML_SUCCESS)
return true;
return false;
}
int main() {
if(open("C:\\Users\\Admin\\Desktop\\Test.xml"))
cout << "Success" << endl;
XMLNode * node = doc.RootElement();
string test = node->FirstChild()->GetText();
cout << test << endl;
return 0;
}
Part of XML:
<?xml version="1.0" encoding="UTF-8"?>
<myXML>
<my:TXT_UTF8Test>Schrüder</my:TXT_UTF8Test>
</myXML>
Notice that if I convert it to ANSI and change the encoding type to "ISO-8859-15" it works fine.
I read that something like "LoadFile( filename, TIXML_ENCODING_UTF8 )" should help. However, that's not the case (error: Invalid arguments; it just expects a const char*). I have the latest version of TinyXML2 (I guess?). I downloaded it just a couple of minutes ago from https://github.com/leethomason/tinyxml2.
Any ideas?
Edit: When I write the string to a .xml or .txt file it works fine, so there might be a problem with the Eclipse IDE console. However, when I try to send the string via e-mail, I get the same problem. Here's the sendMail function:
bool sendMail(std::string params) {
if( (INT_PTR) ShellExecuteA(NULL, "open", "H:\\MailSend\\MailSend_anhang.exe", params.c_str(), NULL, SW_HIDE) <= 32 )
return false;
return true;
}
I call it in the main method like this:
sendMail("-f:d.nitschmann#example.com -t:person2#example.com -s:Subject -b:Body " + test);
I think the problem is with your terminal; can you try running your test code in a different terminal, one with known-good UTF-8 support?
Output with terminal in UTF-8 mode:
$ ./a.out
Success
Schrüder
Output with terminal in ISO-8859-15 mode:
$ ./a.out
Success
SchrÃŒder
Also, please try to follow http://sscce.org/. For posterity's sake, here is your code with everything needed to compile (17676169.cpp):
#include <tinyxml2.h>
#include <string>
#include <iostream>
using namespace std;
using namespace tinyxml2;
tinyxml2::XMLDocument doc;
bool open(string path) {
if(doc.LoadFile(path.c_str()) == XML_SUCCESS)
return true;
return false;
}
int main() {
if(open("Test.xml"))
cout << "Success" << endl;
XMLNode * node = doc.RootElement();
string test = node->FirstChildElement()->GetText();
cout << test << endl;
return 0;
}
compiled with:
g++ -o 17676169 17676169.cpp -ltinyxml2
and here is Test.xml, uuencoded to ensure exactly the same data is used:
begin 660 Test.xml
M/#]X;6P#=F5R<VEO;CTB,2XP(B!E;F-O9&EN9STB551&+3#B/SX*/&UY6$U,
M/#H#("`#/&UY.E185%]55$8X5&5S=#Y38VARP[QD97(\+VUY.E185%]55$8X
/5&5S=#X*/"]M>5A-3#X*
`
end
Edit 1:
If you want to confirm this theory - run this in eclipse:
#include <iostream>
#include <iterator>
#include <string>
#include <fstream>
int main()
{
std::ifstream ifs("Test.xml");
std::string xml_data((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
std::cout << xml_data;
}
Output with terminal in UTF-8 mode:
$ ./17676169.cat
<?xml version="1.0" encoding="UTF-8"?>
<myXML>
<my:TXT_UTF8Test>Schrüder</my:TXT_UTF8Test>
</myXML>
Output with terminal in ISO-8859-15 mode:
$ ./17676169.cat
<?xml version="1.0" encoding="UTF-8"?>
<myXML>
<my:TXT_UTF8Test>SchrÃŒder</my:TXT_UTF8Test>
</myXML>

TinyXml to parse conf file

I'm trying to figure out how to use the TinyXML library.
I have to parse this conf file:
<?xml version="1.0" encoding="UTF-8"?>
<Client>
<port num="20035"/>
<server_addr ip="127.0.0.1"/>
<AV_list>
<AV>
<AVNAME>BitDefender</AVNAME>
<AVPATH> C:\Program Files\Common Files\BitDefender\BitDefender Threat Scanner\av64bit_26308\bdc.exe </AVPATH>
<AVMASK>0x80000000</AVMASK>
<AVCOMMANDLINE> %avpath% \log=%avlog% %scanpath% </AVCOMMANDLINE>
<AVREGEX>(%scanpath%.*?)+(([a-zA-Z0-9]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z_])</AVREGEX>
<AVLOG>C:\log\bd_log.txt</AVLOG>
</AV>
</AV_list>
</Client>
And the C++ code:
#include "stdafx.h"
#include "iostream"
#include "tinyxml.h"
int main(int argc, char* argv[])
{
TiXmlDocument doc( "D:\\client_conf.xml" );
bool loadOkay = doc.LoadFile();
if ( loadOkay )
printf("Yes \n");
else
printf("No \n");
TiXmlHandle hDoc(&doc);
TiXmlElement* pElem;
TiXmlText* pText;
TiXmlHandle hRoot(0);
pElem = hDoc.FirstChildElement().Element();
if (!pElem)
printf("error element");
hRoot = TiXmlHandle(pElem);
pElem = hRoot.FirstChild("server_addr").Element();
const char* info = pElem->Attribute("ip");
printf( "%s \n", info);
pElem = hRoot.FirstChild("port").Element();
info = pElem->Attribute("num");
printf( "%s \n", info);
system("pause");
return 0;
}
Now I can get the first two parameters, but I can't figure out how to reach the "AV_list" block. Any help would be appreciated.
Have a look at the TinyXml Documentation. Your friend is the TiXmlNode Class Reference. You may use most of the Node functions also on TiXmlElements.
You already use the FirstChild() function to get the first child of an element; use the NextSibling() function to iterate over all elements. You can also use the NextSiblingElement() function to get the element directly.
Another, more sophisticated solution would be to use XPath to retrieve elements from the XML file. TinyXPath builds on top of TinyXML. It needs some knowledge of XPath, but it might be worth it. (XPath standard)

Using lex generated source code in another file

I would like to use the code generated by lex in another program I have, but all the examples I have seen embed the main function inside the lex file, not the other way around.
Is it possible to use (include) the C file generated by lex in other code, so that I can have something like this (not necessarily the same)?
#include<something>
int main(){
Lexer l = Lexer("some string or input file");
while (l.has_next()){
Token * token = l.get_next_token();
//somecode
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
This is what I would start with:
Note: this is an example of using the C interface.
To use the C++ interface, add %option c++ (see below).
test.lex
IdentPart1 [A-Za-z_]
Identifier {IdentPart1}[A-Za-z_0-9]*
WHITESPACE [ \t\r\n]
%option noyywrap
%%
{Identifier} {return 257;}
{WHITESPACE} {/* Ignore */}
. {return 258;}
%%
// This is the bit you want.
// It is best just to put this at the bottom of the lex file
// By default functions are extern, so you can create a header file that declares
// them extern and then include that header file in your code (see Lexer.h)
void* setUpBuffer(char const* text)
{
YY_BUFFER_STATE buffer = yy_scan_string(text);
yy_switch_to_buffer(buffer);
return buffer;
}
void tearDownBuffer(void* buffer)
{
yy_delete_buffer((YY_BUFFER_STATE)buffer);
}
Lexer.h
#ifndef LOKI_A_LEXER_H
#define LOKI_A_LEXER_H
#include <string>
extern int yylex();
extern char* yytext;
extern int yyleng;
// Here is the interface to the lexer you set up above
extern void* setUpBuffer(char const* text);
extern void tearDownBuffer(void* buffer);
class Lexer
{
std::string token;
std::string text;
void* buffer;
public:
Lexer(std::string const& t)
: text(t)
{
// Use the interface to set up the buffer
buffer = setUpBuffer(text.c_str());
}
~Lexer()
{
// Tear down your interface
tearDownBuffer(buffer);
}
// Don't use RAW pointers
// This is only a quick and dirty example.
bool nextToken()
{
int val = yylex();
if (val != 0)
{
token = std::string(yytext, yyleng);
}
return val;
}
std::string const& theToken() const {return token;}
};
#endif
main.cpp
#include "Lexer.h"
#include <iostream>
int main()
{
Lexer l("some string or input file");
// Did not like your hasToken() interface.
// Just call nextToken() until it fails.
while (l.nextToken())
{
std::cout << l.theToken() << "\n";
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
Build
> flex test.lex
> g++ main.cpp lex.yy.c
> ./a.out
some
string
or
input
file
>
Alternatively you can use the C++ interface to flex (it's experimental):
test.lex
%option c++
IdentPart1 [A-Za-z_]
Identifier {IdentPart1}[A-Za-z_0-9]*
WHITESPACE [ \t\r\n]
%%
{Identifier} {return 257;}
{WHITESPACE} {/* Ignore */}
. {return 258;}
%%
// Note this needs to be here
// If you define no yywrap() in the options it gets added to the header file
// which leads to multiple definitions if you are not careful.
int yyFlexLexer::yywrap() { return 1;}
main.cpp
#include "MyLexer.h"
#include <iostream>
#include <sstream>
int main()
{
std::istringstream data("some string or input file");
yyFlexLexer l(&data, &std::cout);
while (l.yylex())
{
std::cout << std::string(l.YYText(), l.YYLeng()) << "\n";
}
//where token is just a simple object to hold the token type and lexeme
return 0;
}
build
> flex --header-file=MyLexer.h test.lex
> g++ main.cpp lex.yy.cc
> ./a.out
some
string
or
input
file
>
Sure. I'm not sure about the generated class; we use the C generated
parsers, and call them from C++. Or you can insert any sort of wrapper
code you want in the lex file, and call anything there from outside of
the generated file.
The keywords are %option reentrant or %option c++.
As an example here's the ncr2a scanner:
/** ncr2a_lex.l: Replace all NCRs by corresponding printable ASCII characters. */
%%
&#(1([01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]); { /* accept 32..126 */
/** `+2` skips '&#', `atoi()` ignores ';' at the end */
fputc(atoi(yytext + 2), yyout); /* non-recursive version */
}
The scanner code can be left unchanged.
Here is the program that uses it:
/** ncr2a.c */
#include "ncr2a_lex.h"
typedef struct {
int i,j; /** put here whatever you need to keep extra state */
} State;
int main () {
yyscan_t scanner;
State my_custom_data = {0,0};
yylex_init(&scanner);
yyset_extra(&my_custom_data, scanner);
yylex(scanner);
yylex_destroy(scanner);
return 0;
}
To build ncr2a executable:
flex -R -oncr2a_lex.c --header-file=ncr2a_lex.h ncr2a_lex.l
cc -c -o ncr2a_lex.o ncr2a_lex.c
cc -o ncr2a ncr2a_lex.o ncr2a.c -lfl
Example
$ echo 'three colons &#58;&#58;&#58;' | ./ncr2a
three colons :::
This example uses stdin/stdout as input/output and it calls yylex() once.
To read from a file:
yyset_in(fopen("input.txt", "r"), scanner);
Loki Astari's answer shows how to read from a string (buffer = yy_scan_string(text, scanner); yy_switch_to_buffer(buffer, scanner)).
To call yylex() once per token, add a return statement to the rules in the .l file that produce complete tokens.

Python embedded in CPP: how to get data back to CPP

While working on a C++ project, I was looking for a third party library for something that is not my core business. I found a really good library, doing exactly what's needed, but it is written in Python. I decided to experiment with embedding Python code in C++, using the Boost.Python library.
The C++ code looks something like this:
#include <string>
#include <iostream>
#include <boost/python.hpp>
using namespace boost::python;
int main(int, char **)
{
Py_Initialize();
try
{
object module((handle<>(borrowed(PyImport_AddModule("__main__")))));
object name_space = module.attr("__dict__");
object ignored = exec("from myModule import MyFunc\n"
"MyFunc(\"some_arg\")\n",
name_space);
std::string res = extract<std::string>(name_space["result"]);
}
catch (error_already_set)
{
PyErr_Print();
}
Py_Finalize();
return 0;
}
A (very) simplified version of the Python code looks like this:
import thirdparty
def MyFunc(some_arg):
result = thirdparty.go()
print result
Now the problem is this:
'MyFunc' executes fine; I can see the printed 'result'.
What I cannot do is read 'result' back from the C++ code. The extract command never finds 'result' in any namespace.
I tried defining 'result' as a global, and I even tried returning a tuple, but I cannot get it to work.
First of all, change your function to return the value. Printing it will complicate things, since you want to get the value back. Suppose your MyModule.py looks like this:
import thirdparty
def MyFunc(some_arg):
result = thirdparty.go()
return result
Now, to do what you want, you have to go beyond basic embedding, as the documentation says. Here is the full code to run your function:
#include <Python.h>
int
main(int argc, char *argv[])
{
PyObject *pName, *pModule, *pFunc;
PyObject *pArgs, *pArg, *pResult;
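Py_Initialize();
/* Import by module name, without the ".py" extension; MyModule.py
must be on sys.path (for example via PYTHONPATH) */
pName = PyString_FromString("MyModule");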
/* Error checking of pName left out as exercise */
pModule = PyImport_Import(pName);
Py_DECREF(pName);
if (pModule != NULL) {
pFunc = PyObject_GetAttrString(pModule, "MyFunc");
/* pFunc is a new reference */
if (pFunc) {
/* one-element argument tuple for MyFunc(some_arg) */
pArgs = PyTuple_New(1);
pArg = PyString_FromString("some parameter");
/* pArg reference stolen here: */
PyTuple_SetItem(pArgs, 0, pArg);
pResult = PyObject_CallObject(pFunc, pArgs);
Py_DECREF(pArgs);
if (pResult != NULL) {
printf("Result of call: %s\n", PyString_AsString(pResult));
Py_DECREF(pResult);
}
else {
Py_DECREF(pFunc);
Py_DECREF(pModule);
PyErr_Print();
fprintf(stderr,"Call failed\n");
return 1;
}
}
else {
if (PyErr_Occurred())
PyErr_Print();
fprintf(stderr, "Cannot find function\n");
}
Py_XDECREF(pFunc);
Py_DECREF(pModule);
}
else {
PyErr_Print();
fprintf(stderr, "Failed to load module\n");
return 1;
}
Py_Finalize();
return 0;
}
Based on ΤΖΩΤΖΙΟΥ's, Josh's and Nosklo's answers, I finally got it to work using Boost.Python:
Python:
import thirdparty
def MyFunc(some_arg):
result = thirdparty.go()
return result
C++:
#include <string>
#include <iostream>
#include <boost/python.hpp>
using namespace boost::python;
int main(int, char **)
{
Py_Initialize();
try
{
object module = import("__main__");
object name_space = module.attr("__dict__");
exec_file("MyModule.py", name_space, name_space);
object MyFunc = name_space["MyFunc"];
object result = MyFunc("some_args");
// result is a dictionary
std::string val = extract<std::string>(result["val"]);
}
catch (error_already_set)
{
PyErr_Print();
}
Py_Finalize();
return 0;
}
Some important points:
- I changed 'exec' to 'exec_file' out of convenience; it also works with plain 'exec'.
- The main reason it failed is that I did not pass a "local" name_space to 'exec' or 'exec_file'; this is now fixed by passing name_space twice.
- If the Python function returns unicode strings, they are not convertible to 'std::string', so I had to suffix all Python strings with '.encode('ASCII', 'ignore')'.
I think what you need is either PyObject_CallObject(<py function>, <args>), which returns the return value of the function you call as a PyObject, or PyRun_String(<expression>, Py_eval_input, <globals>, <locals>) which evaluates a single expression and returns its result.
You should be able to return the result from MyFunc, which would then end up in the variable you are currently calling "ignored". This eliminates the need to access it in any other way.