Spreadsheets: insert into columns as if a stack - C++

First: I know this would be much easier if it were a .CSV, but that is not possible (I could have written the code in the time it took to write this post).
I want to insert numbers given by the user, along with a timestamp, into a spreadsheet. There is a graph in the spreadsheet that automatically generates from columns A and B, hence it cannot be a .CSV. Column A holds double-precision floats in the range 0 through 500, and column B holds date and time information. Inserted rows must go at the top, pushing all existing data down by one row each time.
I've been doing this manually and I think it's time to stop. I don't really care what language it is done in, but I would prefer C/C++ using at most the Boost libraries. All libraries MUST be open source. The OS is Linux, and input should come from the terminal or at least be given to the program as a parameter, so that the user's input can be piped into the program.
I found this, but I'm not sure it is the best method, as I'm not necessarily locked into Python:
Insert row into Excel spreadsheet using openpyxl in Python
Thanks for any and all help.

Have you tried this? A C library that reads Excel (.xls) files: http://libxls.sourceforge.net.
Hope this meets your needs.
An alternative: http://www.libxl.com, more powerful but not open source.
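Since you're not locked into Python, the openpyxl approach from the question you linked is probably still the least painful route for .xlsx. A minimal sketch of it (the push_row helper, the file name, and the column layout are mine, not from your post; also note that, depending on the openpyxl version, charts embedded in the workbook may not survive a load/save round trip, so test against your file first):

# Sketch: push a (value, timestamp) pair onto the top of an .xlsx sheet.
# Assumes openpyxl >= 2.4 (for insert_rows).
import sys
from datetime import datetime
import openpyxl

def push_row(path, value):
    wb = openpyxl.load_workbook(path)
    ws = wb.active
    ws.insert_rows(1)                                # shift every existing row down by one
    ws.cell(row=1, column=1, value=float(value))     # column A: the reading
    ws.cell(row=1, column=2, value=datetime.now())   # column B: the timestamp
    wb.save(path)

if __name__ == "__main__":
    # usage: echo 123.4 | python push_row.py data.xlsx
    push_row(sys.argv[1], sys.stdin.read().strip())

This also satisfies the piping requirement, since the value arrives on stdin and the target file is a parameter.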

Related

DBF Table Join without using Arcpy?

I have created a rather large CSV file (63,000 rows and around 40 columns) and I want to join it with an ESRI shapefile.
I have used ArcPy, but the whole process takes 30(!) minutes. If I make the join with the original (small) CSV file, join it with the shapefile, and then do my calculations with ArcPy, continuously adding new fields and calculating the values, it takes 20 minutes. I am looking for a faster solution and found that there are other Python modules, such as PySHP or DBFPy, but I have not found any way of joining tables with them, which I had hoped would be faster.
My goal is to get away from ArcPy as much as I can and preferably use only Python, so ideally no PostgreSQL and the like either.
Does anybody have a solution for that? Thanks a lot!
Not exactly a programmatic solution to my problem, but a practical one:
My shapefile is always static; only the attributes of the features change. So I copy my original shapefile (only the basic files with the endings .shp, .shx, .prj) to my output folder and rename it to the name I want (see the sketch below).
Then I create my CSV file with all the calculations, convert it to DBF, and save it to the output folder too, under the name of my new shapefile. ArcGIS will now load the shapefile along with my own DBF file, and I don't even need to do any table join at all!
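The copy-and-rename step looks roughly like this (a sketch: the paths and base names are placeholders, and the CSV-to-DBF conversion is assumed to happen elsewhere, e.g. with DBFPy):

import shutil

SRC = "original/parcels"     # hypothetical base name of the static shapefile
DST = "output/parcels_new"   # hypothetical output name

# Copy only the geometry, index, and projection parts; the .dbf is the piece
# that gets replaced.
for ext in (".shp", ".shx", ".prj"):
    shutil.copyfile(SRC + ext, DST + ext)

# The .dbf produced from the CSV calculations is then saved as
# "output/parcels_new.dbf" alongside these, completing the shapefile.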
Now my program runs through in only 50 seconds!
I am still interested in more solutions to the table-join problem; I may well encounter it again in a case where the shapefile is NOT static. I did not really understand Nan's solution, as I am still at "advanced beginner" level in Python :)
Cheers

Excel international date formatting

I am having problems formatting Excel datetimes so that they work internationally. Our program is written in C++ and uses COM to export data from our database to Excel, and this includes datetime fields.
If we don't supply a formatting mask, some installations of Excel display these dates as serial numbers (days since 1900-01-01, followed by the time as a fraction of a 24-hour day). This is unreadable to a human, so we have found that we MUST supply a date formatting mask to be sure it displays readably.
The problem, as I see it, is that Excel uses locale-specific formatting masks. For example, the UK datetime format mask might be "YYYY-MM-DD HH:MM".
But if that format mask is sent to an Excel installed in Sweden, it fails, since the Swedish version of Excel uses "ÅÅÅÅ-MM-DD tt:mm".
It's highly impractical to have 150 different national datetime formatting masks in our application to support different countries.
Is there a way to write formatting masks so that they include locale, such that we would be allowed to use ONE single mask?
Unless you are using the date functionality in Excel, the easiest way to handle this is to decide on a format and then create a string yourself in that format and set the cell accordingly.
This comic: http://xkcd.com/1179/ might help you choose a standard to go with. Otherwise, clients that open your file in different countries will have differently formatted data. Just pick a standard and force your data to that standard.
Edited to add: there are libraries that can make this really easy for you as well... http://www.libxl.com/read-write-excel-date-time.html
Edited to add, part 2: basically, what I'm trying to get at is to avoid asking for the mask and just format the data yourself (if that makes sense).
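A minimal sketch of that "format it yourself" idea, assuming you settle on ISO 8601: render the datetime to a string on your side and write the string into the cell, so Excel's locale-specific masks never come into play (the helper name is mine):

from datetime import datetime

def cell_text(dt):
    # ISO 8601, per the comic above; the cell receives plain text,
    # so every locale sees the same thing.
    return dt.strftime("%Y-%m-%d %H:%M")

print(cell_text(datetime(2013, 3, 1, 14, 30)))  # -> 2013-03-01 14:30

The trade-off is that the cell then holds text rather than a real date, so Excel's date arithmetic and sorting won't treat it as one.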
I recommend doing the following: create an Excel file with date formatting applied to a specific cell, and ship that file with your program.
When the program runs, it opens this Excel file and retrieves the local date formatting from the specified cell.
When you have multiple formats to support, just use different cells for them.
It is not a pretty approach, but it will work, AFAIK.
Alternatively, you could create an .xla(m) file that uses VBA to feed back the local formatting characters through a function like:
Public Function localChar(charIn As Range) As String
    localChar = charIn.NumberFormatLocal
End Function
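You would then call it from a cell, e.g. =localChar(A1), which should return A1's number-format string as it appears in that installation's local Excel UI.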
Also not a very clean method, but it might do the trick for you.

Comparing two documents

I have two very large lists. They both were originally in Excel; the larger one is a list of about 160,000 emails along with other information such as name and address, and the smaller one is a list of just 18,000 emails.
My question is: what would be the easiest way to get rid of the (up to) 18,000 rows in the first document that contain the email addresses from the second?
I was thinking regex, or maybe there is another application I can use? I have tried searching online, but it seems there isn't much specific to this. I also tried Notepad++, but it freezes when I try to compare these large files.
-Thank You in Advance!!
Good question. One way I would tackle this is with a C++ program [you could extrapolate the idea to the language of your choice; you never mentioned which languages you are proficient in] that reads each item of the smaller file into a vector of strings. First, of course, use Excel to save the files as CSV instead of XLS or XLSX, which comma-separates the values so you can work with them more easily. For the larger list, "Save As" a copy containing just the email addresses, deleting the other columns for now.
Then, you could open the larger list and use a nested loop to check if you should output to an output file. Something like:
// Assumes LargeListVector, SmallListVector and OutputVector are
// std::vector<std::string> that have already been filled in.
for (std::size_t y = 0; y < LargeListVector.size(); ++y) {
    bool foundMatch = false;
    for (std::size_t x = 0; x < SmallListVector.size(); ++x) {
        if (SmallListVector[x] == LargeListVector[y]) {
            foundMatch = true;
            break;  // one match is enough
        }
    }
    if (!foundMatch)
        OutputVector.push_back(LargeListVector[y]);  // keep the unmatched rows
}
That is just the core loop rather than a full program, but do you get the idea?
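If you are not tied to C++, a hash set turns this into a single pass instead of nested loops. A Python sketch of that set-based approach (the file names are placeholders; it assumes the email address is the first column of each file):

import csv

# Load the 18,000 addresses to remove into a set for O(1) lookups.
with open("small.csv", newline="") as f:
    to_remove = {row[0].strip().lower() for row in csv.reader(f) if row}

# Stream the big list once, keeping only rows whose email is not in the set.
with open("large.csv", newline="") as fin, \
     open("filtered.csv", "w", newline="") as fout:
    writer = csv.writer(fout)
    for row in csv.reader(fin):
        if row and row[0].strip().lower() not in to_remove:
            writer.writerow(row)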
So I read a forum post at: Here. It suggested this formula:
=MATCH(B1,$A$1:$A$3,0)>0
Column B held the large list with the 160,000 entries, and column A was my list of the 18,000 I needed to delete.
I pasted this formula into a separate column to match everything; it printed out either an error or TRUE. If the value was in both columns, it printed TRUE.
Then, because I suck with Excel, I threw the text into Notepad++ and searched for all lines containing TRUE (match case, because in my case some of the data had the word "true" in it without caps). I marked those lines, then under Search > Bookmark I removed all lines with bookmarks. Pasted the result back into Excel, and voila.
I would like to thank you guys for helping and pointing me in the right direction :)

Library for data storage and analysis

So, I have this program that collects a bunch of interesting data. I want to have a library that I can use to sort this data into columns and rows (or similar), save it to a file, and then use some other program (like OpenOffice Spreadsheet, or MATLAB since I own it, or maybe some other spreadsheet/database grapher that I don't know of) to analyse and graph the data however I want. I prefer this library to be open source, but it's not really a requirement.
OK, so my mistake: you wanted a writer. Writing a CSV is simple, and apparently reading one into MATLAB is simple too.
http://www.mathworks.com.au/help/techdoc/ref/csvread.html
A CSV has a simple structure: rows are separated by newlines, and columns within a row are separated by commas.
0,10,15,12
4,7,0,3
So all you really need to do is take your data, split it into rows, then write each row out as a line with the columns separated by commas.
This shouldn't be too difficult, but here is a sketch in any case.
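A minimal CSV-writing sketch (the rows and the file name are made up for illustration; MATLAB's csvread, per the link above, can then load the result):

rows = [
    [0, 10, 15, 12],
    [4, 7, 0, 3],
]

# One line per row, columns joined with commas.
with open("data.csv", "w") as f:
    for row in rows:
        f.write(",".join(str(col) for col in row) + "\n")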

use uno (openoffice api) to open spreadsheet *without* recalculation

I'm using pyuno to read an Excel spreadsheet (running on Linux). Many cells have formulas referring to add-ins that are, obviously, not available. However, the cell values are what I want.
But when I load and read the sheet, it seems those formulas are being evaluated and thus the values are being overwritten with errors.
I've tried several things, none of which have worked:
set the flags AutomaticCalculation=False and MacroExecutionMode=NEVER_EXECUTE in the call to desktop.loadComponentFromURL (see the sketch below)
call document.enableAutomaticCalculation(False) on the loaded document
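For reference, here is roughly what that looks like (a pyuno sketch; it assumes soffice is already listening on port 2002, the path is a placeholder, and the property names are simply the ones mentioned above, passed as-is):

import uno
from com.sun.star.beans import PropertyValue

def make_prop(name, value):
    p = PropertyValue()
    p.Name = name
    p.Value = value
    return p

# Standard pyuno bootstrap: connect to a running soffice instance.
local_ctx = uno.getComponentContext()
resolver = local_ctx.ServiceManager.createInstanceWithContext(
    "com.sun.star.bridge.UnoUrlResolver", local_ctx)
ctx = resolver.resolve(
    "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
desktop = ctx.ServiceManager.createInstanceWithContext(
    "com.sun.star.frame.Desktop", ctx)

props = (
    make_prop("AutomaticCalculation", False),
    make_prop("MacroExecutionMode", 0),  # com.sun.star.document.MacroExecMode.NEVER_EXECUTE
)
doc = desktop.loadComponentFromURL(
    "file:///path/to/sheet.xls", "_blank", 0, props)
doc.enableAutomaticCalculation(False)  # the second thing I tried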
Any suggestions?
If formulas don't matter, you might circumvent the problem by processing a copy of your spreadsheet in which only the values (not the formulas) are present.
To achieve this quickly: select the whole sheet content, copy, then Paste Special and deselect everything except "value". Save to a new file (make sure you don't overwrite the original file, or every formula will be lost!). Your script should then be able to process this file.
This is an ugly solution, as there must be a way to do it programmatically.
Calc does not yet support using the cached results after loading a document, with one exception: LibreOffice Calc now uses the cached results for .xls documents. The results are also stored in .ods files, but they are ignored while loading, and the formula result is re-evaluated by compiling and interpreting the saved formula.
There are plans to add this for .ods and .xlsx too, but there are many .ods producers out there writing incorrect results into the file. So for now, the only solution is to keep a second version of the document that saves only the results (or to implement this inside Calc).