Automatically convert an xlsx file into multiple (MS-DOS) CSV files (one per sheet) in Windows - c++

Currently I'm just saving the file as MS-DOS CSV with Excel. I'm looking for the fastest way (in terms of writing the code) to do this automatically.
I strongly prefer C++, but any simple executable program I can call from a C++ app would do.

An .xlsx file is a ZIP archive of XML parts. Unzip it with, e.g., WinZip and have a look at the resulting files. This may help.

Related

Opening an existing .doc file using ofstream in C++

Assuming I have a file with a .doc extension on the Windows platform, how can I open the file and output its contents on the screen using the ofstream object in C++? I am aware that the object can be used to open files in text and binary modes. But I would like to know if a .doc (or even .pdf) file can be opened and its contents read.
I've never actually done this before, but after reading up on it, I think I might have a suggestion. The .docx format is actually just XML that is zipped up. After unzipping, the file is located at word/document.xml. Doing this in a program is where it gets fun.
Two options: If you're using C++ CLR (.NET) then Microsoft has an SDK for you. It should make it pretty easy to open Office documents.
Otherwise if you're just using regular C++, you might have to do some extra work.
Open the file and unzip it using a library like zlib
Find the document.xml file inside
Parse the XML document. You'll probably want to use some kind of XML parsing library for this. You'll have to look up the specs for the XML to figure out how to get the text you want.
The C++ standard library has the ifstream class, which can be used to read simple text files and binary files too.
It is up to you to interpret the bytes in the file. To interpret a binary file properly you need to know the format of the file.
If you think of MS Word files then I would start from here: http://en.wikipedia.org/wiki/Office_Open_XML to understand MS Word 2007 format.
You might find the Boost Iostreams library ( http://www.boost.org/doc/libs/1_52_0/libs/iostreams/doc/home.html ) somewhat useful if you want to write a filter yourself.
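As a minimal sketch of "interpreting the bytes yourself", the snippet below opens a file with ifstream in binary mode and looks only at its leading signature bytes. It assumes the well-known signatures: a ZIP archive (which is what .docx and .xlsx files are) starts with 50 4B 03 04, and the older OLE compound format used by legacy .doc and .xls files starts with D0 CF 11 E0 A1 B1 1A E1. The names Container and sniff_container are hypothetical, chosen for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical classification of a file by its first bytes.
enum class Container { Zip, OleCompound, Unknown };

// Read up to 8 leading bytes in binary mode and compare them with
// the ZIP and OLE compound-file signatures.
Container sniff_container(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::array<unsigned char, 8> head{};
    in.read(reinterpret_cast<char*>(head.data()),
            static_cast<std::streamsize>(head.size()));
    // gcount() is 0 if the file could not be opened or is empty.
    const std::size_t got = static_cast<std::size_t>(in.gcount());

    static const unsigned char zip[] = {0x50, 0x4B, 0x03, 0x04};
    static const unsigned char ole[] = {0xD0, 0xCF, 0x11, 0xE0,
                                        0xA1, 0xB1, 0x1A, 0xE1};

    if (got >= 4 && std::equal(zip, zip + 4, head.begin()))
        return Container::Zip;
    if (got >= 8 && std::equal(ole, ole + 8, head.begin()))
        return Container::OleCompound;
    return Container::Unknown;
}
```

Calling sniff_container on a .docx should report Container::Zip, which tells you an unzip step is needed before any XML parsing; it says nothing yet about the document's text.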

Difference in file size of an Excel file when downloading directly as opposed to open and saving it

Maybe the title of my question is really awful, but I couldn't figure out a better way to frame it. The problem is that I have a Silverlight web app that does some processing and generates an Excel file as output. The Excel generation code uses the OpenXML format to create various XML parts and packages, and I compress the generated file using System.Packaging.CompressionOptions. Now, when the browser (IE 9) shows the download options box, if I click Open to open the file in Excel and then do a Save As, the file is saved with a further reduced size. If I hit Save directly in the download box, it is saved with whatever size the file was created with.
Any ideas why these 2 ways of saving the same file result in different sizes?
Cheers
Depending on how you used the OpenXML library, there might be some inefficiencies or errors. Resaving the file in Excel will fix any duplicate formatting, update the metadata (possibly reducing it) and fix any validation errors. I encourage getting the Open XML SDK 2.0 Productivity Tool provided with the OpenXML SDK to check for any validation errors and to better understand where more inefficiencies might lie. It is possible to automatically resave the file using Excel by using Interop (using C# anyways).

Programmatically creating Excel file in C++

I have seen programs exporting to Excel in two different ways.
Opening Excel and entering data cell by cell (while it is running it looks like a macro at work)
Creating an Excel file on disk and writing the data to the file (like the Export feature in MS Access)
Number 1 is terribly slow and to me it is just plain awful.
Number 2 is what I need to do. I'm guessing I need some sort of SDK so that I can create Excel files in C++.
Do I need different SDKs for .xls and .xlsx?
Where do I obtain these? (I've tried Googling it, but the SDKs I've found look like they do other things than providing an interface to create Excel files.)
When it comes to the runtime, is MS Office a requirement on the PC that needs to create Excel files or do you get a redistributable DLL that you can deploy with your executable?
You can easily do that by means of the XML Excel format. Check the wikipedia about that:
http://en.wikipedia.org/wiki/Microsoft_Excel#XML_Spreadsheet
This format was introduced in Excel 2002, and it is an easy way to generate a file that Excel opens without having to write the binary XLS format.
You can also try working with XLS/XLSX files over ODBC or ADO drivers, just like databases, with limited usage. You can use some templates if you need formatting, or create the files from scratch. Of course, you are limited to playing with the field values that way. For styling etc. you will need to use an Excel API like Microsoft's.
I'm doing this via Wt library's WTemplate
In short, I created the Excel document I wanted in OpenOffice and saved it in the Excel 2003 (.xml) format.
I then loaded that in google-chrome to make it look pretty and copied it to the clipboard.
Now I'm painstakingly breaking it out into templates so that Wt can render a new file each time.

Programmatically making Excel spreadsheets (97 - 2003 format)

I was wondering how difficult it would be to make an application like this. Basically, I have some old HTML files that use tables. I want to put these tables into Excel for easier reading and manipulation. I only have text; I have no numbers or formulas or anything.
Are there any tutorials on how to do this sort of thing?
The application would produce .xls
Thanks
You have three options:
Output a CSV file. While not an XLS file, Excel is more than capable of opening such a file, and it's extremely easy to create. You need nothing more than standard C++ to implement this solution. This is by far the easiest and quickest way to output to Excel (or any spreadsheet program, for that matter).
Use OLE automation. Microsoft even has a Knowledge Base article that provides an example of how to invoke Excel from your native C++ application and fill in some values. If you absolutely need to output XLS files, this is the easiest way to go. Note that users must have Excel installed on their computers for this to work.
Create your own XLS writer. Don't even bother with this option unless you really want to generate XLS files without requiring Excel to be installed on end-user computers. Options 1 and 2 are more than good enough for just about any application.
You don't need to reverse-engineer the XLS format; Microsoft documents the Excel file format here. Due to the evolution of Excel over the years, it's not exactly a clean specification.
If you don't mind installing a copy of Excel along with your program, using OLE Automation would be much easier.
The simplest thing to do is simply create a CSV file. If you have column headers, put them in the first row. CSV files can be opened natively in Excel as if they were Excel spreadsheets.
There is a trick here: save .html tables with the .xls extension and Excel can read them (ie Excel can read the output of the DataGrid control).
But if you want to create 'real' Excel files, you can either use Excel Interop (which can be messy, requires Excel and the PIAs to be installed on the machine, and needs careful memory management since it's COM) or opt for a third-party library like FlexCel, which avoids many of the Interop problems but does not give you 'complete' Excel functionality (add-ins, custom VBA macros, etc.). For most uses, however, a third-party library should do the trick.
Looks like there's another alternative called ExcelFormat. I didn't try it, though.

Convert .odt .doc .ods files to .txt files

I want to convert all the .odt .doc .xls .pdf files to .txt files.
I want to convert these files to text files using a shell script or a perl script
There's a program for .odt files and the like:
odt2txt - available in the repos.
$ unoconv --format=txt document1.odt
Should produce document1.txt.
OpenOffice has a built-in document converter capable of handling a bunch of formats. Take a look at unoconv: http://dag.wieers.com/home-made/unoconv/
That being said, I have had some trouble getting that to work in the past. If you're having trouble, take a look at similar programs for AbiWord (another open source word processor).
For Word documents, you can try antiword, at least on Linux. It's a command line utility that takes a Word document as an argument and writes the text from that document (as best as it can figure) to standard output. Maybe you can specify an output file too. I can't remember the details of how it works; I haven't used it in a while. Not sure if it can handle OO documents.
It's certainly possible to do this, though there is something strange and impenetrable about the OO project and its documentation that makes things like this hard to research and follow. However, OO has the capability to convert all of those types, not just the OO native ones, and it can do it via two different forms of automatic control.
These are the two general approaches.
You can start OO and tell it to execute a macro which does this job for you for a given file. You then just have to write the macro and a script to loop over your files. The syntax is something like
$ oowriter -headless filename macro://dir/Standard.Module1.sMySub
The other thing OO has is a network API. This is based on something called UNO.
$ oowriter -accept=accept-string
Notifies the OpenOffice.org software that upon the creation of
"UNO Acceptor Threads", a "UNO Accept String" will be used.
You will need some sort of client library. I think they have one for Python at least. Using this technology a Python program or some other scripting language with an OO client library could drive the program and convert all the files. Since OO reads MSO, it should be able to do all of them.
Open the file in LibreOffice. Click on "File", then "Save As", and scroll down to find the text option. Click that and it will be saved as a text file.
FYI, I had an *.ODT file that was 339.2 KB in size. When I save-as text the size of the file shrunk to ONLY 5.0 KB. Another reason for saving your files as text files.
For the Microsoft formats, look into the wvWare tools.
Open the .ods file normally in LibreOffice
Highlight text to be converted
Open a terminal
Run vi
Press "i" to get insert mode
Press ctrl-shift-v
Done!
Need some formatting?
Save the file as
Get out of vi
Run:
$ cat | column > filename2
This worked on openSUSE running KDE
Substitute "kwrite" for "vi", if you want