I am going to develop a generic C++/Qt GUI tool for data I/O from the user.
The data will be directed to/from the core application through a file.
However, the same task could be performed by a spreadsheet. My only doubt is whether spreadsheets can save/load only the data that have changed since the last save/load operation, even if only to a temporary file.
I would like to know if this is a common feature among spreadsheets (especially the open source ones).
By spreadsheet, do you mean software like Excel or OpenOffice Calc? That is a big difference from a customized Qt application. I am sure you can do this with Excel or OpenOffice Calc. To decide which way to go, your other requirements matter more. Who should use the application, and for what purpose? Do you know the necessary programming languages/frameworks? Which functions should it implement?
Without a LOT more details you will not get a good answer here.
It seems that with spreadsheets (e.g. Excel, LibreOffice Calc, ...) it is not possible to save/load portions of the project, not even in a temporary delta file.
For these tasks, a database is the tool to use.
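If the custom C++ route is taken instead, the application can track changes itself. Here is a minimal sketch, assuming a hypothetical `DeltaTable` class and a simple "row,col,value" line format (neither comes from any spreadsheet or Qt API, and values containing commas are not escaped):

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical in-memory table that remembers which cells changed
// since the last save, so that only those cells are written out.
class DeltaTable {
public:
    using Cell = std::pair<int, int>;  // (row, column)

    // Store a value and mark the cell as changed.
    void set(int row, int col, const std::string& value) {
        cells_[Cell(row, col)] = value;
        dirty_.emplace(row, col);
    }

    // Serialize only the changed cells as "row,col,value" lines
    // (assumed format, no escaping), then mark everything as clean.
    std::string save_delta() {
        std::ostringstream out;
        for (const Cell& c : dirty_) {
            out << c.first << ',' << c.second << ',' << cells_.at(c) << '\n';
        }
        dirty_.clear();
        return out.str();
    }

    // Number of cells changed since the last save_delta() call.
    std::size_t dirty_count() const { return dirty_.size(); }

private:
    std::map<Cell, std::string> cells_;
    std::set<Cell> dirty_;
};
```

The string returned by `save_delta()` could be appended to a temporary file; the core application would then replay those lines on top of its last full snapshot.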
Related
If I have a vector of int in memory, is there a way to pass this to Excel? Currently the only way I know is to write the data to file as a csv, then pass the saved file to excel to execute.
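For reference, the CSV route described above can be sketched as follows. The function names `to_csv_row` and `write_csv` are made up for illustration, and the vector is written as a single comma-separated row:

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Render a vector of ints as one CSV row, e.g. {1, 2, 3} -> "1,2,3\n".
std::string to_csv_row(const std::vector<int>& values) {
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) out << ',';
        out << values[i];
    }
    out << '\n';
    return out.str();
}

// Write the row to a file that Excel can open afterwards.
// Returns false if the file could not be written.
bool write_csv(const std::string& path, const std::vector<int>& values) {
    std::ofstream file(path);
    file << to_csv_row(values);
    file.flush();
    return static_cast<bool>(file);
}
```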
Actually the only direct connection would be using a DDE link.
Now, I don't usually recommend it, as it is a rather aged technology, but it fits the description quite well, so I thought I'd mention it.
I've seen it used for real-time trading systems in international dealing rooms. However, keep in mind that Excel is not a tool to expect true real-time performance from. It is more about near-real-time visualization plus custom business logic in Excel.
From Wikipedia:
A common use of DDE is for custom-developed applications to control off-the-shelf software. For example, a custom in-house application might use DDE to open a Microsoft Excel spreadsheet and fill it with data, by opening a DDE conversation with Excel and sending it DDE commands. Today, however, one could also use the Excel object model with OLE Automation (part of COM). The technique is, however, still in use, particularly for distribution of financial data.[2]
Also: on Joel Spolsky's old forum:
The data comes in way too fast and furious for SOAP. Anyway, the data is push, not pull, so SOAP doesn't really cut it. Most of the vendors on Windows give you a DDE server. Since it uses Windows messages for IPC at the lowest level, it's screamingly fast, which it needs to be to keep up with today's ticker.
Have you checked COM interop with Excel? These classes seem to have first-class wrappers for .NET, but you should be able to access them from C++ with a bit of work.
Note that you may still pay for marshalling to Excel as std::vector<> is not compatible with the COM binary API.
The requirement is to build a calculation engine which is performant and supports Excel-like formulas. These formulas need to be applied to huge data sets (millions of rows of data).
I was wondering whether something could be built on top of the OpenOffice Calc service and made available as a calculation engine.
Does anyone have any experience in doing this? Are there any other alternatives? I know it is possible using the Excel service, but we are an open source shop. M$ is ruled out.
Any pointers would be very helpful.
Edited based on High Performance Mark's inputs.
Numeric calculations are needed. Scientific functions (e.g., sin(x), tanh(x)) are not in scope.
Calculations are not performed by end users. The formulas are stored in the DB and applied to the datasets. The formulas (like a tax calculation) are configured, so if a formula changes, recalculation will be triggered via the application.
Spreadsheet-like formulas are well understood by a wider audience and should be easier to read and maintain. Is there any wrapper around R (or an equivalent) that will convert a spreadsheet formula into R syntax?
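To make "a stored formula applied to each row" concrete, here is a minimal sketch of an arithmetic-only evaluator. It is not Excel-compatible and is not taken from OpenOffice or any existing engine: the class name `FormulaEvaluator`, the `Row` type, and the column names in the usage note are all hypothetical. It supports only `+ - * /`, parentheses, unary minus, numbers that start with a digit, and named columns, with no functions or cell ranges.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical evaluator for formulas such as "amount * rate + fee".
// Identifiers are looked up as column names in the row being evaluated.
class FormulaEvaluator {
public:
    using Row = std::map<std::string, double>;

    explicit FormulaEvaluator(const std::string& text) : text_(text) {}

    // Parse and evaluate the formula against one row. The text is re-parsed
    // on every call; a real engine would compile it once and reuse the result.
    double evaluate(const Row& row) {
        row_ = &row;
        pos_ = 0;
        const double result = expression();
        skip_spaces();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected input");
        return result;
    }

private:
    // expression := term (('+' | '-') term)*
    double expression() {
        double value = term();
        for (;;) {
            skip_spaces();
            if (match('+')) value += term();
            else if (match('-')) value -= term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    double term() {
        double value = factor();
        for (;;) {
            skip_spaces();
            if (match('*')) value *= factor();
            else if (match('/')) value /= factor();
            else return value;
        }
    }

    // factor := '(' expression ')' | '-' factor | number | column
    double factor() {
        skip_spaces();
        if (match('(')) {
            const double value = expression();
            skip_spaces();
            if (!match(')')) throw std::runtime_error("missing ')'");
            return value;
        }
        if (match('-')) return -factor();
        if (pos_ < text_.size() && is_digit(text_[pos_])) return number();
        if (pos_ < text_.size() && is_alpha(text_[pos_])) return column();
        throw std::runtime_error("unexpected token");
    }

    double number() {
        std::size_t used = 0;
        const double value = std::stod(text_.substr(pos_), &used);
        pos_ += used;
        return value;
    }

    double column() {
        const std::size_t start = pos_;
        while (pos_ < text_.size() &&
               (is_alpha(text_[pos_]) || is_digit(text_[pos_]))) ++pos_;
        const std::string name = text_.substr(start, pos_ - start);
        const auto it = row_->find(name);
        if (it == row_->end()) throw std::runtime_error("unknown column: " + name);
        return it->second;
    }

    bool match(char c) {
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    void skip_spaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }
    static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_'; }

    std::string text_;
    const Row* row_ = nullptr;
    std::size_t pos_ = 0;
};
```

A formula string read from the DB would be wrapped once, e.g. `FormulaEvaluator tax("amount * rate + fee");`, and `tax.evaluate(row)` would then be called for every row of the dataset. For millions of rows, parsing once into a tree or bytecode would likely matter far more than this sketch suggests.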
Well, a little Googling finds several open-source spreadsheets written in Java, one of which may be suitable for your purposes. One question you might want to answer (maybe by editing your post) is which calculations you want to perform: the full set of functionality that Excel provides (or something close), or would the facilities that SQL provides satisfy your requirements? If so, you might want to use a database for this.
Another question you might clarify is this: are you trying to create an application which, like Excel, is usable by end users for specifying calculations but which, unlike Excel, is based on open-source software and can cope with millions of rows? I don't know about its performance on such large data sets (someone else on SO can probably tell us), but R is very popular (and rightly so) for what you are probably trying to do. My view is that R sits between the average programming language (say Python) and the average spreadsheet (say Excel) in terms of ease of use by non-programmers.
Your choice of solution may (and certainly ought to) depend on who will be using it.
What solutions are there? I only know of solutions for replacing bookmarks in Word (.doc) files with Apache POI.
Is it also possible to change images, layouts, and text styles in .doc and .ppt documents?
I am thinking of replacing areas in Word and PowerPoint documents for bulk processing.
Platform: MS-Office 2003
What are your platform limitations?
Obviously Apache POI will get you at least part of the way there.
Microsoft's own COM APIs are fairly powerful and are documented here. I would recommend using them if a) you are not running in a server (many users, multithreaded) environment; b) you can have a proper version of PowerPoint installed on the production machine; and c) you can code against a COM object model.
It's a bit pricey, but Aspose.Slides is a very powerful library for manipulating PowerPoint files.
If using other Office suites is an option, here's a list of possible solutions:
Apache POI-HSLF
PowerPoint 2007 APIs
OpenOffice.org UNO
Using POI you can't edit the .pptx file format, but you don't depend on the apps installed on the system. The other two options, on the contrary, make use of other apps, but they are definitely better for dealing with presentations. OpenOffice has better compatibility with older formats, by the way. Also, if you use UNO, you'll have a great choice of languages: UNO exists for Java, C++, Python and other languages.
My experience is not directly with PowerPoint, but I've actually rolled my own WordML (XML) generator. It a) removed all dependencies on Word, b) was very fast, and c) let me build up documents from scratch.
But it was a lot of work to create, and I was only creating a write-only implementation.
I'm not as familiar with PowerPoint, so this is conjecture, but you may be able to roll your own by reading XML (PowerPoint 2003??) and/or cracking the Office Open XML file (zipped XML), then using XPath to manipulate the data, and then saving everything back to disk.
This won't work on older OLE Compound Document based PowerPoint files, though.
I've done something like that before: programmatically accessed and manipulated PowerPoint presentations. Back when I did it, it was all in C++ using COM, but similar principles apply to C#/VB .NET apps, since they do COM interop very easily.
What you're looking for is called the Office Document Model. Basically, Office applications expose their documents programmatically, as trees of objects that define their contents. These objects are accessible via an API, and you can manipulate them, add new ones, and do whatever other processing you want. It's exceedingly powerful; you can use it to manipulate pretty much all aspects of a document. But you'll need an installation of Office and Visual Studio to be able to use it.
Some links:
Intro: http://msdn.microsoft.com/en-us/library/d58327k6.aspx
Hope this helps!
Apparently new users can only include one link per posting. How lame! :)
Here's the other link I meant to include:
Example of manipulating PowerPoint documents programmatically: http://msdn.microsoft.com/en-us/library/cc668192.aspx
An official SQLite web page says that I should think of SQLite as a replacement for the fopen() function.
What do you think about it? Is it always a good solution to replace an application's internal data storage with SQLite? What are the pluses and minuses of such a solution?
Do you have some experience in it?
EDIT:
How about your experience? Is it easy to use? Was it painful or rather joyful? Do you like it?
It depends. There are some contra-indications:
for configuration files, use of plain text or XML is much easier to debug or to alter than using a relational database, even one as lightweight as SQLite.
tree structures are easier to describe using (for example) XML than by using relational tables
the SQLite API is quite badly documented - there are not enough examples, and the hyperlinking is poor. OTOH, the information is all there if you care to dig for it.
use of app-specific binary formats directly will be faster than storing same format as a BLOB in a database
database corruption can mean the loss of all your data rather than that in a single bad file
OTOH, if your internal data fits in well with the relational model and if there is a lot of it, I'd recommend SQLite - I use it myself for one of my projects.
Regarding experience - I use it, it works well and is easy to integrate with existing code. If the documentation were easier to navigate I'd give it 5 stars - as it is I'd give it four.
As always, it depends; there are no "one size fits all" solutions.
If you need to store data in a stand-alone file and you can take advantage of the relational capabilities of an SQL database, then SQLite is great.
If your data is not a good fit for a relational model (hierarchical data, for example), or you want your data to be human-readable (config files), or you need to interoperate with another system, then SQLite won't be very helpful and XML might be better.
If, on the other hand, you need to access the data from multiple programs or computers at the same time, then again SQLite is not an optimal choice and you need a "real" database server (MS SQL, Oracle, MySQL, PostgreSQL ...).
The atomicity of SQLite is a plus: you know that if a write is interrupted half-way (say, by a crash), it won't corrupt your data file. I normally accomplish something similar with XML config files by backing up the file on a successful load; any future failed load (indicating corruption) automatically restores the last backup. Of course, it's not as granular, nor is it atomic, but it is sufficient for my needs.
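The backup-on-successful-load idea could look roughly like this. It is only a sketch under assumptions: the ".bak" suffix, the function names, and the caller-supplied `is_valid` check (for example "parses as XML") are all hypothetical, and, as noted above, none of this is atomic.

```cpp
#include <fstream>
#include <functional>
#include <iterator>
#include <string>

// Read a whole file into `out`; returns false if it cannot be opened.
bool read_file(const std::string& path, std::string& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}

// Overwrite `path` with `content`; returns false on I/O failure.
bool write_file(const std::string& path, const std::string& content) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << content;
    out.flush();
    return static_cast<bool>(out);
}

// Load a config file. On a successful load, refresh "<path>.bak".
// On a failed load, fall back to the backup and restore it over the
// damaged file. Returns false if neither copy is usable.
bool load_with_backup(const std::string& path,
                      const std::function<bool(const std::string&)>& is_valid,
                      std::string& content) {
    const std::string backup = path + ".bak";
    if (read_file(path, content) && is_valid(content)) {
        write_file(backup, content);  // remember the last known-good copy
        return true;
    }
    if (read_file(backup, content) && is_valid(content)) {
        write_file(path, content);    // restore over the corrupted file
        return true;
    }
    content.clear();
    return false;
}
```

A crash while the backup itself is being written can still leave a damaged backup, which is exactly the gap that SQLite's atomic commits close.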
I find SQLite a pleasure to work with, but I would not consider it a one-size-fits-all replacement for fopen().
As an example, I just wrote a piece of software that's downloading images from a web server and caching them locally. Storing them as individual files, I can watch them in Windows Explorer, which certainly has benefits. But I need to keep an index that maps between a URL and the image file in order to use the cache.
Storing them in a SQLite database, they all sit in one neat little file, and I can access them by URL (select imgdata from cache where url='http://foo.bar.jpg') with little effort.
I'm looking for a library that will allow me to programmatically modify Excel files to add data to certain cells. My current idea is to use named ranges to determine where to insert the new data (essentially a range of 1x1), then update the named ranges to point at the data. The existing application this is going to integrate with is written entirely in C++, so I'm ideally looking for a C++ solution (hence why this thread is of limited usefulness). If all else fails, I'll go with a .NET solution if there is some way of linking it against our C++ app.
An ideal solution would be open source, but none of the ones I've seen so far (MyXls and XLSSTREAM) seem up to the challenge. I like the looks of Aspose.Cells, but it's for .NET or Java, not C++ (and costs money). I need to support all Excel formats from 97 through the present, including the XLSX and XLSB formats. Ideally, it would also support formats such as OpenOffice, and (for output) PDF and HTML.
Some use-cases I need to support:
reading and modifying any cell in the spreadsheet, including formulas
creating, reading, modifying named ranges (the ranges themselves, not just the cells)
copying formatting from a cell to a bunch of others (including conditional formatting) -- we'll use one cell as a template for all the others we fill in with data.
Any help you can give me finding an appropriate library would be great. I'd also like to hear some testimonials about the various suggestions (including the ones in my post) so I can make more informed decisions -- what's easy to use, bug-free, cheap, etc?
The safest suggestion is to just use OLE. It uses COM, which does not require .NET at all.
http://en.wikipedia.org/wiki/OLE_Automation <--about halfway down is a C++ example.
You may have to wrap a few functionalities into functions for usability, but it's really not ugly to work with.
EDIT: Just be aware that you need a copy of Excel for it to work. Also, there are some first-party .h files specific to Excel that you can find. (It's all explained in the Wikipedia article.)
I don't know if this is an option for you, but the new Office 2007 formats are zipped XML, which makes it very doable to make your own modifications. See here for the specifications.
SpreadsheetGear for .NET will handle your requirements and has an API which is very similar to Excel.
When you insert cells, your defined names (and any other formulas / charts / etc...) will automatically be fixed up to reference the new range (just as they would in Excel). So you would not need to update your defined names (although there is complete support for creating and updating defined names if that is what you want to do).
SpreadsheetGear is a .NET component, but you can build your own wrapper which is callable from C++.
You can see what our customers say and download the free, fully functional evaluation here.
Have you already tried using the Excel COM interfaces? Obviously Excel needs to be installed on the machine, and it's a pain to deal with...
I would argue that a .NET solution with COM interop for linking into your C++ application is the best solution. In more than ten years of working with them, I've never seen a COM automation of Excel that didn't leak memory somewhere.
If you need to automate Excel, I recommend Visual Studio Tools for Office. If you don't need to automate, only modify files and those files can be in Office 2007 format, you're better off finding a library that manipulates the files directly instead of opening Excel to do it.
I ended up using Aspose.Cells as I mentioned in my original post, since it seemed like the easiest path. I'm very happy with the way it turned out, and their support is very good. I had to create a wrapper around it in C# that exported a COM interface to my C++ application.