library for doing diffs - c++

I've been tasked with creating a tool that can diff and merge the configuration files for my company's product. The configurations are stored as either XML or URL-encoded strings. I'm looking for a library, preferably open source with a license compatible with commercial software, that can do these diffs. Our app is written in C++, so C++ libraries would be best, but I'm willing to look at libraries that are C#-specific since I can write a wrapper that exposes it to C++ via COM. Three-way diffs would be ideal, but two-way is acceptable. If it has an understanding of XML, that would also be a plus (since XML nodes can be reordered without changing the document, etc). Any library suggestions? Should I even consider writing my own diff tools in the hopes of giving it semantic knowledge of our formats?
Thanks to this similar question, I've already discovered this Google library, which seems really great, but I'm still looking for other options. It also seems to be able to output the diffs in HTML format (using the <ins> and <del> tags, which I didn't know existed before I discovered it), which could be really handy, but it seems to produce a unified diff only. I'm going to need to display the results in a web browser, and will probably have to build an interface for doing the merges in the browser as well. I don't expect a library to be able to help with these tasks, but it must produce output in a format that is amenable to my building this on top of it. I'm currently envisioning something along the lines of TortoiseMerge (side-by-side diffs, not unified), except browser-based. Any tips/tricks/design ideas on how to present this would be appreciated too.

Subversion comes with libsvn_diff and libsvn_delta, licensed under the Apache Software License.

Here is a C++ library that can diff what the author calls semistructured data. It deals nicely with HTML and XML. Since your data is XML it would make a lot of sense to use this instead of plain text diff. This is especially the case when the files are machine generated.
I am currently trying to use this library to build a tool that diffs Visual Studio project files. These are basically XML files, and using a plain diff tool like WinMerge is too painful because Visual Studio pretty much mucks up the whole file by reordering it heavily. The idea is to do some kind of structured diff to address the problem.

For diffing the XML, I would propose that you normalize it first: sort all the elements in alphabetical order, then generate a stream of tokens/XML that represents the original document but is independent of the original formatting. After running the diff, parse the result to get a tree containing what was added or removed.
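The normalization step described above could be sketched roughly as follows. This is only an illustration: the `Element` struct, `tokens`, and `normalize` are hypothetical names, a real tool would populate the tree from an XML parser, and sorting siblings is only safe for formats where element order carries no meaning.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory element; a real tool would fill this from an XML parser.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;  // std::map keeps attributes sorted by name
    std::string text;
    std::vector<Element> children;
};

// Emit one token per entry so that an ordinary line-based diff can consume the result.
void appendTokens(const Element& e, std::vector<std::string>& out) {
    assert(!e.name.empty());  // element names are never empty in well-formed XML
    out.push_back("<" + e.name + ">");
    for (const auto& kv : e.attributes)
        out.push_back("@" + kv.first + "=" + kv.second);
    if (!e.text.empty())
        out.push_back("#" + e.text);
    for (const Element& child : e.children)
        appendTokens(child, out);
    out.push_back("</" + e.name + ">");
}

std::vector<std::string> tokens(const Element& e) {
    std::vector<std::string> out;
    appendTokens(e, out);
    return out;
}

// Sort children recursively. Siblings are ordered by their full token stream,
// so two elements with the same name still end up in a deterministic order.
// Recomputing the tokens in every comparison is slow but keeps the sketch short.
void normalize(Element& e) {
    for (Element& child : e.children)
        normalize(child);
    std::stable_sort(e.children.begin(), e.children.end(),
                     [](const Element& a, const Element& b) { return tokens(a) < tokens(b); });
}
```

After calling `normalize` on both documents, the two `tokens` vectors can be written out one token per line and handed to any text diff, so pure reordering no longer shows up as a change.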

Related

How to write to, edit, and retrieve specific cells from an Excel doc with C++?

Basically, I want to be able to pass data between Excel cells and my C++ program. I don't have any experience with Excel/C++ interactions and I haven't been able to find a coherent explanation or documentation on any website. If someone could link me to some references or provide some themselves, it would be much appreciated. Thanks.
If this is for a Windows system, you could always use one of the available managed Excel libraries, such as OfficeWriter or Aspose.
There also might be similar libraries specifically for C++; I know we (OfficeWriter) used to make one.
Edit: Looks like there are a few out there, like LibXL and BasicExcel.
If the application will run on an end user machine with Excel installed, you can easily use the Excel interop and keep Excel hidden.
In addition to LibXL and BasicExcel mentioned by smoore, there is:
ExcelFormat Library is an improved version of the BasicExcel library and will allow you to read and write simple values. It is free.
xlslib will also read and write simple values, though I have not tried it. It is also free.
Number Duck is a commercial library that I have written. It supports reading and writing values, formulas and pictures. The website has examples of how to use the features.

Handling really large multi language projects

I am working on a really large multi-language project (1000+ classes + configs + scripts), with files distributed over network drives. I am having trouble fighting through the code, since the available tools are not helping. The main problem is finding things. For the C++ part: VS with VAX can only find files and symbols which are in the solution. A lot of them are not. Same problem with ReSharper. Right now I am stuck with doing unindexed string and file searches, which is highly inefficient on a network drive. I heard that SourceInsight would be an option, since it allows you to just specify the folders that are part of the project and then indexes them, but my company won't spend money on it.
So my question is: what tools are available to fight through an incredibly large amount of code? If possible, they should be low cost or even free/open source.
Check out -
ctags
cscope
idutils
snavigator
In every one of these tools, you would have to invest(*) some time in reading the documentation, and then building your index. Consider switching to an editor that will work with these tools.
(*): I do mean invest, because it will reap dividends once you do.
Hope this helps.
If you need to maintain a large amount of code, you really should have a source code management system; a lot of them will help you find text by indexing all the files.
Most of them will work with various languages.
Otherwise, you can install an indexer like Apache Lucene and index all your files...
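To make the indexing idea concrete, here is a minimal sketch of an inverted index in C++. It is not Lucene's API (Lucene is a Java library with far more features); `Index`, `tokenize`, `addFile`, and `lookup` are hypothetical names, and file contents are passed in as strings so the example stays self-contained.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each identifier-like token to the set of files that contain it.
using Index = std::map<std::string, std::set<std::string>>;

// Split text into identifier-like tokens (letters, digits, underscore).
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> result;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '_') {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            result.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        result.push_back(current);
    return result;
}

// Add one file's contents to the index. A real tool would read the file
// from disk once and store the index locally, away from the network drive.
void addFile(Index& index, const std::string& path, const std::string& contents) {
    assert(!path.empty());
    for (const std::string& token : tokenize(contents))
        index[token].insert(path);
}

// Look up a token; returns an empty set when nothing matches.
std::set<std::string> lookup(const Index& index, const std::string& token) {
    auto it = index.find(token);
    return it == index.end() ? std::set<std::string>{} : it->second;
}
```

The point of the design is that the slow pass over the network drive happens once, when the index is built; each later search is a single map lookup regardless of language or file type.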
You should take a look at LXR. This is used by many Linux kernel source listings.
Try ndexer: http://code.google.com/p/ndexer/
It promises to handle extremely large codebases.
The Perl program ack is also worth a look -- think of it as multi-file grep on steroids. The new version (in what I would call late beta) even lets you specify regexes for the files to process as well as regexes to search for -- a feature I've used extensively since it came out (I've got a subproject with 30k lines in 300+ classes, where this feature has been very helpful). You can even chain the new ack with itself so you can subselect the files to process.
"VS with VAX can only find files and symbols which are in the solution. A lot of them are not."
You can add all the files that are not in your solution and set them to not build in the settings. Your VS build will not be affected by this, but now VS knows about those files and you can search them along with your VS native files.

Libraries for .odt formatting

Are there any C/C++ libraries available that can be used in creating, loading and saving files in .odt-format?
Alternatively, where can I find tutorial-like information on implementing .odt(/ODF) specifications?
This site, OASIS OpenDocument Essentials, seems to cover the problem, including how-to examples and shortcuts. It's quite nicely done and easy to follow.
Flexibility can be perceived as complexity. If you don't need the
flexibility, create a template ODT and just fill in the content as
needed. As mentioned, there exist XML parsers to actually handle IO.
ODT isn't a plaintext file, so some complexity/difficulty is expected.
– Ioan
From the link:
The Virtues of Cheating
As you begin to work with OpenDocument files, you may want to write a
program that constructs a document with some feature that isn’t
explained in this book—this is, after all, an “essentials” book. Just
start OpenOffice.org or KOffice, create a document that has the
feature you want, unpack the file, and look for the XML that
implements it. To get a better understanding of how things works,
change the XML, repack the document, and reload it. Once you know how
a feature works, don’t hesitate to copy and paste the XML from the
OpenDocument file into your program. In other words, cheat. It worked
for me when I was writing this book, and it can work for you too!
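The template approach suggested in the comment above could look something like this sketch. It assumes you have already unpacked the ODT (a zip package) and read its content.xml into a string; the `{{key}}` placeholder syntax and the function names are made up for illustration, and unpacking/repacking would need a zip library that is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Escape the characters that are special in XML text content.
std::string escapeXml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// Replace every {{key}} in the template with the escaped value for that key.
// Unknown keys are left untouched so that they are easy to spot in the output.
std::string fillTemplate(const std::string& xml,
                         const std::map<std::string, std::string>& values) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t open = xml.find("{{", pos);
        if (open == std::string::npos) break;
        std::size_t close = xml.find("}}", open + 2);
        if (close == std::string::npos) break;
        out.append(xml, pos, open - pos);  // copy the text before the placeholder
        std::string key = xml.substr(open + 2, close - open - 2);
        auto it = values.find(key);
        if (it != values.end())
            out += escapeXml(it->second);
        else
            out.append(xml, open, close + 2 - open);  // keep the unknown placeholder
        pos = close + 2;
    }
    assert(pos <= xml.size());
    out.append(xml, pos, std::string::npos);  // copy whatever follows the last placeholder
    return out;
}
```

One caveat worth checking in your own template: a word processor may split a placeholder across several XML spans when it saves the file, in which case a plain string search like this will not find it.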

c++ code structure into html files

I work on Unix.
I have my complete source code on Unix, organized into building blocks and modules:
headers, source files, makefiles, etc.
I can copy all the files with the same directory structure to Windows.
I need a tool that will convert all the source to HTML, with links to functions, variables, classes, and headers. There should be some tool that does this easily.
That way it would be easy to debug the code quickly.
Is anybody aware of such a tool?
The term you're probably looking for is "documentation generator". You're specifically interested in ones that output HTML files.
Doxygen is popular, but if you want a master comparison list of documentation generators Wikipedia has a summary:
http://en.wikipedia.org/wiki/Comparison_of_documentation_generators
Looking at the output generated by the different programs (on projects that use them) will probably inform your choice of which meets your needs.
You can use Doxygen to generate your documentation. In its basic form it will generate what you need, but to add comments that appear in the final HTML you will need to use specially styled comments.
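As an illustration of those special comments, here is a small sketch. The function `parsePort` and the constant `kDefaultPort` are hypothetical; only the comment style matters. Doxygen picks up blocks that start with `/**` or lines that start with `///`, and commands such as `@brief`, `@param`, `@return`, and `@throws` structure the generated page.

```cpp
#include <stdexcept>
#include <string>

/**
 * @brief Parses a TCP port number from its textual form.
 *
 * Hypothetical example function; only the comment style matters here.
 *
 * @param text Decimal digits, for example "8080".
 * @return The port as an integer in the range 1 to 65535.
 * @throws std::invalid_argument if the text is not a valid port.
 */
int parsePort(const std::string& text) {
    if (text.empty() || text.size() > 5)
        throw std::invalid_argument("not a port: " + text);
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9')
            throw std::invalid_argument("not a port: " + text);
        value = value * 10 + (c - '0');
    }
    if (value < 1 || value > 65535)
        throw std::invalid_argument("port out of range: " + text);
    return value;
}

/// Default port used when the configuration does not name one.
const int kDefaultPort = 8080;
```

For an existing codebase with no such comments, the Doxyfile options `EXTRACT_ALL` and `SOURCE_BROWSER` are worth looking at, since they make Doxygen document uncommented entities and cross-link the source listing.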

Small library for generating HTML files in C++

Is there a library that will allow easier generation of a simple website using C++ code? This 'website' will then be compiled into a CHM help file (which is the final goal here). Ideally, it will allow pages, and links between pages, to be generated easily. I can do this all by hand, but that is going to be very tedious and error-prone.
I know about bigger libraries such as Wt, but I am more interested in smaller ones with little or no dependencies and no need for installation.
You can try the CTPP template engine. It is written in C++, is small, and is quite fast.
Do you need this project to be written in C++? Because if you just need to prepare documentation in CHM, I would go with Sphinx. Sphinx is a set of tools written in Python that generates manuals in several formats (CHM, HTML, LaTeX, PDF) from text files (formatted using the reStructuredText markup language). Those text files can be created by hand or by some application and then combined into one manual using Sphinx. At my work we are currently using this solution to write documentation, because text files are much easier to maintain (merging, tracking changes, etc.) than, for example, HTML or DOC files. Sphinx is used to generate the Python language documentation (CHM), so it is capable of handling really large projects.
I've used the FLATE library every day for ten years and it works flawlessly. It's a piece of cake to use; I can't recommend it enough.
It will definitely do the trick, though probably at a much lower level than you have in mind. It is a C-language source library that you can link with a C++ caller. It's also available as a Perl module, but I haven't used that.
FLATE library
Flate is a template library used to deal with html code in CGI applications. The library includes C and Perl support. All html code is put in an external file (the template) and printed using the library functions: variables, zones (parts to be displayed or not) and tables (parts to be displayed 0 to n times). Using this method you don't need to modify/recompile your application when modifying html code, printing order doesn't matter in your CGI code, and your CGI code is much cleaner.
HTH and good luck!
Is this CHM lib, along with the related links, what you're looking for?