What library to use to *write* XML file in a C++ program? - c++

What library to use to write XML file in a C++ program?
I've found two classes posted in CodeProject
http://www.codeproject.com/KB/stl/simple_xmlwriter.aspx
http://www.codeproject.com/KB/XML/XML_writer.aspx
but want to check if there is more standard option than these. I'm only concerned with writing, and not parsing XML.

I tried different libraries and finally decided for TinyXml. It's compact, fast, free (zlib license) and very easy to use.

Question: Are you ever going to update an XML file? Because while that sounds like it's just more writing, with XML it still requires a parser.
While xerces is large and bloated, it is fully standards compliant and it is DOM based. Should you ever have to cross platform or change language, there will always be a DOM based library for whatever language/platform you might move to so knowing how DOM based parsing/writing works is a benefit. If you are going to use XML, you may as well use it correctly.
Avoiding XML altogether is of course the best option. But short of that, I'd go with xerces.

You can use Xerces-C++, a library written by Apache foundation. This library permits read, write and manipulate XML files.
Link: http://xerces.apache.org/xerces-c/

For my purposes, PugiXML worked out really nicely
http://pugixml.org/
The reason why I thought it was so nice was because it was simply 3 files, a configuration header, a header, and the actual source.
But as you stated, you aren't interested in parsing, so why even bother using a special class to write out XML? While maybe your classes are too complex for this, I found the easy thing to do is use the std::ostream and just write out standard compliant XML this way. For example, say I have a class that represents a Company which is a collection of Employee objects, just make a method in each the Company and Employee classes that looks something like the following psuedocode
Company::writeXML(std::ostream& out){
out << "<company>" << std::endl;
BOOST_FOREACH(Employee e, employees){
e.writeXML(out);
}
out << "</company>" << std::endl;
}
and to make it even easier, your Employee's writeXML function can be declared virtual so that you can have a specific output for say a CEO, President, Janitor or whatever the subclasses should be.

I have been using the open-source libxml2 library for years, works great for me.

I ran into the same problem and wound up rolling my own solution. It's implemented as a single header file that you can drop into your project: xml_writer.h
And it comes with a set of unit tests which also serve as documentation.

Roll your own
I've been in a similar situation. I had a program that needed to generate JSON. We did it two ways. First we tried jsoncpp, but in the end I just generated the JSON directly via a std::ofstream.
Afterward we ran the generated JSON through a validator to catch any syntax errors. There were a few but they were really easy to find and correct.
If I were to do it again I would definitely roll my own again. Somewhat unexpectedly, there was less code when using std::ofstream. Plus we didn't have to use/learn a new API. It was easier to write and is easier to maintain.

Related

library to abstract text parsing in C++

I'm about to parse a lot of files, with a precise hierarchy for the XML ones and a precise syntax for the others. I would like to abstract these files at "token level" to simplify my code and logic. I also need UTF-8 support.
Is there a library, or perhaps a library only formed by several headers, that can do this in C++?
EDIT:
supposing that my file is something like that
COLOR=red Language=en
COLOR=blue Language=se
COLOR=green Language=fr
with token level i mean that i can access this values after parsing in this way:
Object.getValue(color, 1)
and this should return red.
Well, there is boost::spirit, that I think may be what you are looking for. You can use it to create rules for parsing input. I personally have never used it but heard good comments for it. Hope it helps.

How to extract ALL typedefs and structs and unions from c++ source

I have inherited a Visual Studio project that contains hundreds of files.
I would like to extract all the typedefs, structs and unions from each .h/.cpp file and put the results in a file).
Each typdef/struct/union should be on one line in the results file. This would make sorting much easier.
typdef int myType;
struct myFirstStruct { char a; int b;...};
union Part_Number_Serial_Number_Part_2_Response_Message_Type {struct{Message_Response_Head_Type Head; Part_Num_Serial_Num_Part_2_Report_Array Part_2_Report; Message_Tail_Type Tail;} Data; BYTE byData[140];}myUnion;
struct { bool c; int d;...}mySecondStruct;
My problem is, I do not know what to look for (grammar of typedef/structs/unions) using a regular expression.
I cannot believe that nobody has done this before (I googled and have not found anything on this).
Does anyone know the regular expressions for these? (Note some are commented out using // others /* */)
Or a tool to accomplish this.
Edit:
I am toying with the idea of autogenerating source code and/or dialogs for modifying messages that use the underlying typedef/struct/union. I was going to use the output to generate an XML file that could be used for this reason.
The source for these are in C/C++ and used in almost all my projects. These projects are usually NOT in C/C++. By using the XML version I would only need to update/add the typedef/struct/union only in one place and all the projects would be able to autogen the source and/or dialogs.
I can't imagine a purpose for this except for some sort of documentation effort. If that is what you're looking for I would suggest doxygen.
To answer your question, I seriously doubt any amount of regular expressions will be sufficient. What you need to do is actually parse the code. I have heard of a library out there for building compilers and C++ tools that would provide the parsing aspect but I'm sorry to say I have forgotten the name. I know it's out there though so I'd start searching for that.
You will NOT be able to accomplish this with a regular expression. The only way to actually do this will be to get hold of a lexer and parser for the C++ grammar and write the code yourself to dump the interesting bits to a file or database upon encountering one of the structures you're interested in. And unfortunately, C++ parsing is rather hard.
parsing c++ is ... difficult. Instead of killing yourself trying to parse it there are a few options.
gcc-xml
cscope
ctags
global
Each of these will parse c++ code and grab the info your after. If you want to dump it to a file in the format you requested you'd be a lot better off parsing their data files than parsing raw c++.
I recommend you skip all of this and just use doxygen. It won't be in your preferred format but you'll be better off getting used to doxygen's layout.

How do you handle command line options and config files?

What packages do you use to handle command line options, settings and config files?
I'm looking for something that reads user-defined options from the command line and/or from config files.
The options (settings) should be dividable into different groups, so that I can pass different (subsets of) options to different objects in my code.
I know of boost::program_options, but I can't quite get used to the API. Are there light-weight alternatives?
(BTW, do you ever use a global options object in your code that can be read from anywhere? Or would you consider that evil?)
At Google, we use gflags. It doesn't do configuration files, but for flags, it's a lot less painful than using getopt.
#include <gflags/gflags.h>
DEFINE_string(server, "foo", "What server to connect to");
int main(int argc, char* argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
if (!server.empty()) {
Connect(server);
}
}
You put the DEFINE_foo at the top of the file that needs to know the value of the flag. If other files also need to know the value, you use DECLARE_foo in them. There's also pretty good support for testing, so unit tests can set different flags independently.
For command lines and C++, I've been a fan of TCLAP: Templatized Command Line Argument Parser.
http://sourceforge.net/projects/tclap/
Well, you're not going to like my answer. I use boost::program_options. The interface takes some getting used to, but once you have it down, it's amazing. Just make sure to do boatloads of unit testing, because if you get the syntax wrong you will get runtime errors.
And, yes, I store them in a singleton object (read-only). I don't think it's evil in that case. It's one of the few cases I can think of where a singleton is acceptable.
If Boost is overkill for you, GNU Gengetopt is probably, too, but IMHO, it's a fun tool to mess around with.
And, I try to stay away from global options objects, I prefer to have each class read its own config. Besides the whole "Globals are evil" philosophy, it tends to end up becoming an ever-growing mess to have all of your configuration in one place, and also it's harder to tell what configuration variables are being used where. If you keep the configuration closer to where it's being used, it's more obvious what each one is for, and easier to keep clean.
(As to what I use, personally, for everything recently it's been a proprietary command line parsing library that somebody else at my company wrote, but that doesn't help you much, unfortunately)
I've been using TCLAP for a year or two now, but randomly I stumbled across ezOptionParser. ezOptionParser doesn't suffer from "it shouldn't have to be this complex"-syndrome the same way that other option parsers do.
I'm pretty impressed so far and I'll likely be using it going forward, specifically because it supports config files. TCLAP is a more sophisticated library, but the simplicity and extra features from ezOptionParser is very compelling.
Other perks from its website include (as of 0.2.0):
Pretty printing of parsed inputs for debugging.
Auto usage message creation in three layouts (aligned, interleaved or staggered).
Single header file implementation.
Dependent only on STL.
Arbitrary short and long option names (dash '-' or plus '+' prefixes not required).
Arbitrary argument list delimiters.
Multiple flag instances allowed.
Validation of required options, number of expected arguments per flag, datatype ranges, user defined ranges, membership in lists and case for string lists.
Validation criteria definable by strings or constants.
Multiple file import with comments.
Exports to file, either set options or all options including defaults when available.
Option parse index for order dependent contexts.
GNU getopt is pretty nice. If you want a C++ feel, consider getoptpp which is a wrapper around the native getopt.
As far as configuration file is concerned, you should try to make it as stupid as possible so that parsing is easy. If you are bit considerate, you might want to use yaac&lex but that would be really a big bucks for small apps.
I also would like to suggest that you should support both config files and command line options in your application. Config files are better for those options which are to be changed less frequently. Command-line options are good when you want to pass the immediate changing arguments (typically when you are creating a app, which would be called by some other program.)
If you are working with Visual Studio 2005 on x86 and x64 Windows there is some good Command Line Parsing utilities in the SimpleLibPlus library. I have used it and found it very useful.
Not sure about command line argument parsing. I have not needed very rich capabilities in that area and have generally rolled my own to save adding more dependencies to my software. Depending upon what your needs are you may or may not want to try this route. The C++ programs I have written are generally not invoked from the command line.
On the other hand, for a config file you really can't beat an XML based format. It's readable, extensible, structured, etc... :) Plus there are lots of XML parsers out there. Despite the fact it is a C library, I tend to use libxml2 from xmlsoft.org.
Try Apache Ant. Its primary usage is Java projects, but there isn't anything Java about it, and its usable for almost anything.
Usage is fairly simple and you've got a lot of community support too. It's really good at doing things the way you're asking.
As for global options in code, I think they're quite necessary and useful. Don't misuse them, though.

Reading/Understanding third-party code

When you get a third-party library (c, c++), open-source (LGPL say), that does not have good documentation, what is the best way to go about understanding it to be able to integrate into your application?
The library usually has some example programs and I end up walking through the code using gdb. Any other suggestions/best-practicies?
For an example, I just picked one from sourceforge.net, but it's just a broad engineering/programming question:
http://sourceforge.net/projects/aftp/
I frequently use a couple of tools to help me with this:
GNU Global. It generates cross-referencing databases and can produce hyperlinked HTML from source code. Clicking function calls will take you to their definitions, and you can see lists of all references to a function. Only works for C and perhaps C++.
Doxygen. It generates documentation from Javadoc-style comments. If you tell it to generate documentation for undocumented methods, it will give you nice summaries. It can also produce hyperlinked source code listings (and can link into the listings provided by htags).
These two tools, along with just reading code in Emacs and doing some searches with recursive grep, are how I do most of my source reverse-engineering.
One of the better ways to understand it is to attempt to document it yourself. By going and trying to document it yourself, it forces you to really dive in and test and test and test and make sure you know what each statement is doing at what times. Then you can really start to understand what the previous developer may have been thinking (or not thinking for that matter).
Great question. I think that this should be addressed thoroughly, so I'm going to try to make my answer as thorough as possible.
One thing that I do when approaching large projects that I've either inherited or contributing to is automatically generate their sources, UML diagrams, and anything that can ease the various amounts of A.D.D. encountered when learning a new project:)
I believe someone here already mentioned Doxygen, that's a great tool! You should look into it and write a small bash script that will automatically generate sources for the application you're developing in some tree structure you've setup.
One thing that I've haven't seen people mention is BOUML! It's fantastic and free! It automatically generates reverse UML diagrams from existing sources and it supports a variety of languages. I use this as a way to really capture the big picture of what's going on in terms of architecture and design before I start reading code.
If you've got the money to spare, look into Understand for %language-here%. It's absolutely great and has helped me in many ways when inheriting legacy code.
EDIT:
Try out ack (betterthangrep.com), it is a pretty convenient script for searching source trees:)
Familiarize yourself with the information available in the headers. The functions you call will be declared there. Then try to identify the valid arguments and pre-/post-conditions of the functions, as those are your primary guidance (even if they are not documented!). The example programs are your next bet.
If you have code completion/intellisense I like opening up the library and going '.' or 'namespace::' and seeing what comes up. I always find it helpful, you can navigate through the objects/namespaces and see what functionality they have. This is of course assuming its an OOP library with relatively good naming of functions/objects.
There really isn't a silver bullet other than just rolling up your sleeves and digging into the code.
This is where we earn our money.
Three things;
(1) try to run the test or example apps available, set low debug levels, and walk through logs.
(2) use source navigator tool / cscope ( available both on windows and linux) and browse the code to understand the flow.
(3) also in parallel use gdb to walk into code while running test/example apps.

How to generate empty definitions given a header file

I have a 3rd-party library which for various reasons I don't wish to link against yet. I don't want to butcher my code though to remove all reference to its API, so I'd like to generate a dummy implementation of it.
Is there any tool I can use which spits out empty definitions of classes given their header files? It's fine to return nulls, false and 0 by default. I don't want to do anything on-the-fly or anything clever - the mock object libraries I've looked at appear quite heavy-weight? Ideally I want something to use like
$ generate-definition my_header.h > dummy_implemtation.cpp
I'm using Linux, GCC4.1
This is a harder problem than you might like, as parsing C++ can quickly become a difficult task. Your best bet would be to pick an existing parser with a nice interface.
A quick search found this thread which has many recommendations for parsers to do something similar.
At the very worst you might be able to use SWIG --> Python, and then use reflection on that to print a dummy implementation.
Sorry this is only a half-answer, but I don't think there is an existing tool to do this (other than a mocking framework, which is probably the same amount of work as using a parser).
Create one test application which reads the header file and creates the source file. Test application should parse the header file to know the function names.