I'm using pyuno to read an excel spreadsheet (running on linux.) Many cells have formulas referring to addins that are, obviously, not available. However the cell values are what I want.
But when I load and read the sheet, it seems those formulas are being evaluated and thus the values are being overwritten with errors.
I've tried several things, none of which have worked:
set flags AutomaticCalculation=False, MacroExecutionMode=NEVER_EXECUTE in the call to desktop.loadComponentFromURL
call document.enableAutomaticCalculation(False) on the loaded document
Any suggestions?
If formluas aren't a matter, you might circumvent the problem by processing a copy of your spreadsheet in which only the values (not the formulas) are present.
To achieve this quickly, select the whole sheet content, copy, special paste; then remove everything except "value". Save to a new file (make sure you don't overwrite the original file or every formula will be lost!). Your script should then be able to process this file.
This is an ugly solution, as there must be a way to do it programmaticaly.
Calc does not yet support using the cached results after loading the document. Libreoffice Calc does now use cached results for xls documents. The results are also stored in ods but are ignored while loading the document and the formula result is evaluated by compiling and interpreting the saved formula.
There are some plans to add this for ods and xlsx too but there are many ods producers out there writting incorrect results in the file. So till now the only solution is to have a second version of the document only saving the results (or implementing it inside calc).
Related
I have created a rather large CSV file (63000 rows and around 40 columns) and I want to join it with an ESRI Shapefile.
I have used ArcPy but the whole process takes 30! minutes. If I make the join with the original (small) CSV file, join it with the Shapefile and then make my calculations with ArcPy and continously add new fields and calculate the stuff it takes 20 minutes. I am looking for a faster solution and found there are other Python modules such as PySHP or DBFPy but I have not found any way for joining tables, hoping that could go faster.
My goal is already to get away from ArcPy as much as I can and preferable only use Python, so preferably no PostgreSQL and alikes either.
Does anybody have a solution for that? Thanks a lot!
Not exactly a programmatical solution for my problem but a practical one:
My shapefile is always static, only the attributes of the features will change. So I copy my original shapefile (only the basic files with endings .shp, .shx, .prj) to my output folder and rename it to the name I want.
Then I create my CSV-File with all calculations and convert it to DBF and save it with the name of my new shapefile to the output folder too. ArcGIS will now load the shapefile along with my own DBF file and I don't even need to do any tablejoin at all!
Now my program runs through in only 50 seconds!
I am still interested in more solutions for the table join problem, maybe I will encounter that problem again in the future where the shapefile is NOT always static. I did not really understand Nan's solution, I am still at "advanced beginner" level in Python :)
Cheers
I have built something in SAS to pull down Yahoo! finance .csv data. The code I have built now works fine and I have built some robust error handling into the code. The problem I have had with the data though is that the .csv feed is unsupported and not clean.
The data is comma delimited, but some of the data also has commas in it. Some of the fields are in quotes and some are not. Also the length of the fields varies wildly as as well. A field like Market Capitlisation for example could run form a few million to hundreds of billions.
As a result, if you pass multiple stock metrics for multiple stocks through to the Yahoo! API at the same time, you will get rows of .csv data where each field is in a different place, is a different length and is inconsistently delimited.
I have tried multiple infile options that could handle some of these errors in isolation, but not all of them together. My only solution that works is to download single stock metrics by multiple stocks at the same time.
This gives me what I want, but it takes over an hour to run the data for the NASDAQ and the NYSE. Have I overlooked another method for handling this type of problem?
Thanks
This is the outline of a way to do what you are looking for. The whole of the code to do this would be too long to post here and out of scope of what this site looks to do.
Create a SAS program that takes a stock ticker from the SYSPARM automatic macro, and downloads the data to a data set named the same as the ticker into a permanent library.
The SYSPARM macro is set by the value you set on the commandline to call SAS
sas.exe myprog.sas -sysparm XYZ
This would set &SYSPARM to resolve XYZ
Write a SAS program that merges all the ticker data sets together for further processing.
Create a program in a language like Perl or Python, (or shell script, etc.) that loops over a range of tickers and calls your SAS program, passing the ticker through SYSPARM.
Use a threading, forking, etc. package from that language to have multiple of these running at the same time. You can probably go to some multiple of the CPU cores on your machine as this processing will not be CPU intensive. Test values to you find one that works.
From that same language call your SAS program to merge the datasets.
I am having problems formatting Excel datetimes, so that it works internationally. Our program is written in C++ and uses COM to export data from our database to Excel, and this includes datetime fields.
If we don't supply a formatting mask, some installations of Excel displays these dates as Serial numbers (days since 1900.01.01 followed by time as a 24-hour fraction). This is unreadable to a human, so we ha found out that we MUST supply a date formatting mask to be sure that it displays readable.
The problem - as I see it - is that Excel uses international formatting masks. For example; the UK datetime format mask might be "YYYY-MM-DD HH:MM".
But if the format mask is sent to an Excel that is installed in Sweden, it fails since the Swedish version of the Excel uses "ÅÅÅÅ-MM-DD tt:mm".
It's highly impractical to have 150 different national datetime formatting masks in our application to support different countries.
Is there a way to write formatting masks so that they include locale, such that we would be allowed to use ONE single mask?
Unless you are using the date functionality in Excel, the easiest way to handle this is to decide on a format and then create a string yourself in that format and set the cell accordingly.
This comic: http://xkcd.com/1179/ might help you choose a standard to go with. Otherwise, clients that open your file in different countries will have differently formatted data. Just pick a standard and force your data to that standard.
Edited to add: There are libraries that can make this really easy for your as well... http://www.libxl.com/read-write-excel-date-time.html
Edited to add part2: Basically what I'm trying to get at is to avoid asking for the asmk and just format the data yourself (if that makes sense).
I recommend doing the following: Create an excel with date formatting on a specific cell and save this for your program to use.
Now when the program runs it will open this use this excel file to retrieve the local date formatting from the excel and the specified cell.
When you have multiple formats to save just use different cells for them.
It is not a nice way but will work afaik.
Alteratively you could consider creating an xla(m) file that will use vba and a command to feed back the local formatting characters through a function like:
Public Function localChar(charIn As Range) As String
localChar = charIn.NumberFormatLocal
End Function
Also not a very clean method, but it might do the trick for you.
First: I know this would be much easier if it was a .CSV but that is not possible (I'd 'a written the code in the time I wrote this post).
I want to insert numbers given by the user along with a time-stamp into a spreadsheet. There will be a graph in the spreadsheet that automatically generates based on columns a and b, hence the need to not be a .CSV. Column A holds Double-Floats of range 0 through 500 and Column B holds Date and Time information. Inserted rows must be at the top, thus pushing all existing data down by one row, each time.
I've been writing this manually and I think its time to stop doing that. I don't really care what language it is done in, but I would prefer C/C++ using at most the boost libraries. All libraries MUST be open-source. OS is Linux and input should from terminal or at least be given to the program as a parameter, such that the user's input could be piped into the program.
I found this, but I'm not sure if it is the best method as I'm not necessarily locked into python.
Insert row into Excel spreadsheet using openpyxl in Python
Thanks for any and all help.
Have you tried this? A C library that read Excel (xls) files: http://libxls.sourceforge.net.
Hope this meet your need.
An alternative: http://www.libxl.com, more powerful but not open source.
im using openpyxl to edit an excel file that contains some formulas in certain cells. Now when i populate the cells from a text file, im expecting the formula to work and give me my desired output. But what i observe is that the formulas get removed and the cells are left blank.
I had the same problem when saving the file with openpyxl: formulas removed.
But I pointed out that some intermediate formulas were still there.
After some tests, it appears that, in my case, all formulas which are displaying blank result (nothing) are cleaned when the save occured, unlike the formulas with an output in the cell, which are preserved.
ex :
=IF((SUM(P3:P5))=0;"";(SUM(Q3:Q5))/(SUM(P3:P5))) => can be removed when saving because of the blank result
ex :
=IF((SUM(P3:P5))=0;"?";(SUM(Q3:Q5))/(SUM(P3:P5))) => preserved when saving
for my example I'm using openpyxl-2.0.3 on Windows. Open and save function calls are :
self._book = load_workbook("myfile.xlsx", data_only=False)
self._book.save("myfile.xlsx")
openpyxl does currently not support reading of formulas. Ie. If you read your file and write it back, all formulas are removed. There is an active feature request in bitbucket tough.