Openpyxl: Formulas getting removed when saving file

I'm using openpyxl to edit an Excel file that contains formulas in certain cells. When I populate the cells from a text file, I expect the formulas to work and give me the desired output. Instead, the formulas get removed and the cells are left blank.

I had the same problem when saving the file with openpyxl: formulas were removed.
But I noticed that some intermediate formulas were still there.
After some tests, it appears that, in my case, all formulas that display a blank result (an empty string) are removed when the file is saved, whereas formulas that produce output in the cell are preserved.
Example:
=IF((SUM(P3:P5))=0;"";(SUM(Q3:Q5))/(SUM(P3:P5))) => can be removed when saving because of the blank result
Example:
=IF((SUM(P3:P5))=0;"?";(SUM(Q3:Q5))/(SUM(P3:P5))) => preserved when saving
For my example, I'm using openpyxl 2.0.3 on Windows. The open and save calls are:
self._book = load_workbook("myfile.xlsx", data_only=False)
self._book.save("myfile.xlsx")
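To see exactly which formulas a save dropped, one option is to compare the worksheet XML before and after saving. An .xlsx file is a ZIP archive, and each formula is stored in an <f> element inside its cell's <c> element. The sketch below uses only the Python standard library, so it does not depend on the openpyxl version; the sheet part name xl/worksheets/sheet1.xml and the helper names list_formulas and missing_after_save are assumptions for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet XML inside .xlsx files
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_formulas(xlsx_file, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: formula_text} for every <f> element in one worksheet.

    xlsx_file may be a path or a binary file-like object. sheet_part is an
    assumption: the first sheet is usually sheet1.xml, but check the archive.
    """
    with zipfile.ZipFile(xlsx_file) as zf:
        root = ET.fromstring(zf.read(sheet_part))
    formulas = {}
    for cell in root.iter("{%s}c" % NS["m"]):
        f = cell.find("m:f", NS)
        if f is not None:
            # Shared-formula followers can have an empty <f> with only attributes
            formulas[cell.get("r")] = f.text or ""
    return formulas


def missing_after_save(before_file, after_file):
    """Cell references that had a formula before saving but not after."""
    before = list_formulas(before_file)
    after = list_formulas(after_file)
    return sorted(ref for ref in before if ref not in after)
```

For example, keep a backup copy of the workbook, save with openpyxl, then call missing_after_save on the backup and the saved file (both file names are up to you) to list the cells whose formulas disappeared.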

openpyxl does not currently support reading formulas. That is, if you read your file and write it back, all formulas are removed. There is an active feature request on Bitbucket, though.

Related

Excel formula error with multiple OR statements inside IF

I am trying to create an automated formula that reads the client initials from the cell and outputs a name for who is responsible for that client in another cell.
=IF(OR(A1="JL",A1="JP"), "John", "N/A",IF(OR(A1="RP",A1="RL",A1="RP"), "Doug", "N/A"))
But I get an error when I try to use this formula (I'm using Excel 2007).
The error I get is:
You've entered too many arguments for this function.
Is there a way to do this that gets around the error?
I have tried adjusting the comma locations and reducing the amount of brackets with no luck.
Or am I using the formula style wrong?
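Excel's IF accepts at most three arguments (condition, value if true, value if false), and the formula above passes four, which is what triggers the error. The usual fix is to put the second IF in the value-if-false slot of the first: =IF(OR(A1="JL",A1="JP"),"John",IF(OR(A1="RP",A1="RL"),"Doug","N/A")) (the repeated A1="RP" test is redundant and can be dropped). The Python sketch below mirrors that nesting to make the branch order explicit; the function name responsible_for is hypothetical.

```python
def responsible_for(initials):
    """Mirror of =IF(OR(A1="JL",A1="JP"),"John",IF(OR(A1="RP",A1="RL"),"Doug","N/A")).

    Each IF has exactly three parts; the inner IF is the outer IF's "else".
    """
    initials = initials.upper()  # Excel's = comparison of text ignores case
    if initials in ("JL", "JP"):      # OR(A1="JL", A1="JP")
        return "John"
    elif initials in ("RP", "RL"):    # OR(A1="RP", A1="RL")
        return "Doug"
    else:                             # final value-if-false
        return "N/A"
```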

How to convert a decimal into its time equivalent as part of a function?

I'm running into an issue when trying to compare data across two sheets to find discrepancies - specifically when it comes to comparing start and end times.
Right now, the "IF" statement in my screenshot is executing perfectly, except when a time is involved - it's reading those cells as decimals instead (but only sometimes).
I've tried formatting these cells (on the raw data AND on this "Discrepancies" report sheet) so that they are displayed as a "HH:MM am/pm" time, but the sheet is still comparing the decimal values.
Is there anything I can add to this function to account for a compared value being a time instead of text, so that the times are still compared correctly for discrepancies? I cannot add or change anything in the raw data sheets; the only thing I can edit is the formula seen in the screenshot I provided.
See the highlighted cells in my screenshot - this is the issue I keep running into. As you can see, there are SOME cells (the non-highlighted ones) that are executing as intended, but I'm unsure why this isn't the case for the whole spreadsheet when I've formatted everything the same way using the exact same formula across the whole sheet.
For example, the value in cell N2 is "8:00 AM" on both sheets, so the formula should just display "8:00 AM" in that cell (and NOT be highlighted), since there is no discrepancy between the two sheets it's comparing. Instead, it shows both times as decimals with the slightest difference between them, suggesting a difference where there technically isn't (or shouldn't be) one.
Please help!
Screenshot of original spreadsheet for reference
---EDIT (added the below):
Here is a view-only version of a SAMPLE SHEET that displays the issue I'm having:
https://docs.google.com/spreadsheets/d/1BdSQGsCajB3kOnYxzM3sl-0o3iTvR3ABdHpnzYRXjpA/edit?usp=sharing
On the sample sheet, the only cells that are performing as intended are C2, E2, G2, I2, K2, K6, or any cells that contain text like "Closed". Any of the other cells that have a time in both raw data tabs appears to be pulling the serial numbers for those times instead of correctly formatting it into "HH:mm AM/PM".
A quick tour of how the SAMPLE SHEET is set up:
User enters raw data into the "MicrositeRawData" and "SalesforceRawData" tabs.
Data is pulled from the "SalesforceRawData" tab into the "CleanedUpSalesforceData" tab using a QUERY that matches the UNIQUE IDs from the "MicrositeRawData" sheet, so that it essentially creates a tab that's in the same order and accounts for any extraneous data between the tabs. (Keep in mind this is a sample sheet; the original sheet I'm using includes a lot more data, which causes a mismatch of rows between the sheets and makes the QUERY necessary.)
The "DISCREPANCIES" tab then compares the data between the "MicrositeRawData" and "CleanedUpSalesforceData" tabs. If the data is the same, it simply copies the data from the "MicrositeRawData" cell. But if the data is NOT the same, it lists the values from both sheets and is conditionally formatted to highlight those cells in yellow.
If there is data on the "MicrositeRawData" tab that is NOT included on the "SalesforceRawData" tab, the "DISCREPANCIES" tab will notate that and highlight the "A" cell in pink instead of yellow (as demonstrated in "A5").

Try this in B2:
=IF(MicrositeRawData!B2=CleanedUpSalesforceData!B2, MicrositeRawData!B2,
"MICROSITE: "&TEXT(MicrositeRawData!B2, "h:mm AM/PM")&CHAR(10)&
"SALESFORCE: "&TEXT(CleanedUpSalesforceData!B2, "h:mm AM/PM"))
Update: delete all formulas from the range B2:O10 and use this in B2:
=ARRAYFORMULA(IF(TO_TEXT(MicrositeRawData!B2:O10)=
TO_TEXT(CleanedUpSalesforceData!B2:O10), MicrositeRawData!B2:O10,
"MICROSITE: "&TEXT(IF(MicrositeRawData!B2:O10="",
"", MicrositeRawData!B2:O10), "h:mm AM/PM")&CHAR(10)&
"SALESFORCE: "&TEXT(IF(CleanedUpSalesforceData!B2:O10="",
"", CleanedUpSalesforceData!B2:O10), "h:mm AM/PM")))
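Why comparing text helps: spreadsheet times are stored as fractions of a day (8:00 AM is 8/24, about 0.3333), so two cells that both display 8:00 AM can hold values that likely differ only in the last few decimal places and therefore compare as unequal. Comparing a text representation (as the TO_TEXT version does) or rounding to whole minutes sidesteps that. A Python sketch of the rounding idea; the function names serial_to_clock and same_time are hypothetical.

```python
def serial_to_clock(serial):
    """Format a spreadsheet time (fraction of a day) as h:mm AM/PM.

    Rounds to the nearest whole minute, so tiny floating-point
    differences between two "identical" times disappear.
    """
    minutes = round(serial * 24 * 60) % (24 * 60)  # drop any whole-day (date) part
    hours, mins = divmod(minutes, 60)
    suffix = "AM" if hours < 12 else "PM"
    return "%d:%02d %s" % (hours % 12 or 12, mins, suffix)


def same_time(a, b):
    # Compare what the user sees, not the raw floating-point serials
    return serial_to_clock(a) == serial_to_clock(b)
```

For example, 8 / 24.0 and 0.33333333 are different numbers, yet both format as 8:00 AM, so same_time treats them as equal.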

Applying conditional formatting with CFSpreadsheet

Continuing from a previous thread, I've gotten pretty close to what I want and have learned a lot. I'm on CF10 with MS SQL Server 2008. I'm generating a report with cfspreadsheet that outputs "Yes" in the Excel spreadsheet if a user has an app enabled and "No" if not.
The problem is that I need to make it a little easier on the eye, so I wanted to see whether it's possible to apply conditional formatting so that each of the three app columns is green when the value is Y and red when it is N.
Any suggestions or examples would be great, thanks!

Like I mentioned in your other thread, if you return bit values (not strings), it is simple to apply a custom cell format. But you must use the spreadsheet functions, rather than cfspreadsheet (which does not support custom formatting).
Here is an expanded example to demonstrate how you could incorporate conditional color formatting:
<cfscript>
// build sample query
// note: values must be numeric and NOT text/varchar
qData = queryNew("");
queryAddColumn(qData, "AppNameAlpha", "bit", listToArray("0,0,1,0,1"));
queryAddColumn(qData, "AppNameBeta", "bit", listToArray("1,1,0,0,1"));
// create sample sheet
sheet = spreadsheetNew();
spreadsheetAddRows(sheet, qData);
// apply colorized yes/no format
spreadsheetFormatColumns(sheet, {dataformat='[Green]"Y";;[Red]"N"'}, "1-2");
spreadsheetWrite(sheet, "c:/path/to/sheet.xls", true);
</cfscript>
The "dataformat" uses the first three sections of Excel's custom number format, <positive>;<negative>;<zero>. Translated:
[Green]"Y"; // <positive values>: display "Y" in green
; // <negative values>: empty section, so negative values would display as blank (bit values are never negative)
[Red]"N" // <zero values>: display "N" in red
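To make the section rule concrete, here is a small Python sketch of how a value selects one of the three sections. It is a simplified model for illustration, not Excel's or ColdFusion's formatter: it assumes exactly three sections, no ";" inside quoted text, and sections of the form [Color]"text"; the function name pick_section is hypothetical.

```python
import re


def pick_section(fmt, value):
    """Return (color, text) chosen by a <positive>;<negative>;<zero> format."""
    positive, negative, zero = fmt.split(";")  # assumes exactly three sections
    section = positive if value > 0 else negative if value < 0 else zero
    # Optional [Color] tag followed by an optional "quoted literal"
    match = re.match(r'(?:\[(\w+)\])?(?:"(.*)")?$', section)
    if match is None:
        return None, section  # not a [Color]"text" section; returned unparsed
    color, text = match.group(1), match.group(2)
    # An empty section shows nothing for values of that sign
    return color, text or ""
```

With the format from the answer, a bit value of 1 selects the first section and 0 selects the third: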

The function you are looking for is SpreadsheetFormatCell()

Comparing two documents

I have two very large lists. Both were originally in Excel. The larger one is a list of about 160,000 email addresses along with other information such as name and address, and the smaller one is a list of just 18,000 email addresses.
My question is: what would be the easiest way to get rid of all 18,000 rows from the first document that contain the email addresses from the second?
I was thinking regex, or maybe there is another application I can use? I have searched online, but there doesn't seem to be much on this specific problem. I also tried Notepad++, but it freezes when I try to compare these large files.
-Thank You in Advance!!

Good question. One way I would tackle this is with a C++ program (you could apply the same idea in the language of your choice; you didn't mention which languages you're proficient in) that reads each item of the smaller file into a vector of strings. First, of course, use Excel to save the files as CSV instead of XLS or XLSX; this comma-separates the values so you can work with them more easily. For the larger list, "Save As" a copy containing just the email addresses, deleting the other columns for now.
Then you could read the larger list and use a nested loop to decide whether each address should go to the output. Something like:
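bool foundMatch = false;
for (std::size_t y = 0; y < LargeListVector.size(); y++) {
    for (std::size_t x = 0; x < SmallListVector.size(); x++) {
        if (SmallListVector[x] == LargeListVector[y]) {
            foundMatch = true;
            break;  // no need to keep scanning once a match is found
        }
    }
    // std::vector has no append(); push_back adds to the end
    if (!foundMatch) OutputVector.push_back(LargeListVector[y]);
    foundMatch = false;
}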
That might be partially pseudo-code, but do you get the idea?
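The nested loop compares every address in the large list with every address in the small one (about 160,000 × 18,000 ≈ 2.9 billion comparisons). Putting the small list in a hash set makes each lookup roughly constant time instead. A Python sketch of that swap, assuming both files were saved as CSV without header rows, the email sits in the same column of each (email_col, a hypothetical parameter), and addresses should match case-insensitively; the function names are also hypothetical.

```python
import csv


def _email_key(row, email_col):
    # Normalize so " A@x.com" and "a@x.com" match; None for rows missing the column
    return row[email_col].strip().lower() if len(row) > email_col else None


def filter_out_emails(large_rows, small_rows, email_col=0):
    """Yield rows of large_rows whose email does not appear in small_rows."""
    # Build the set once: average O(1) membership tests replace the inner loop
    blocked = {_email_key(r, email_col) for r in small_rows} - {None}
    for row in large_rows:
        if _email_key(row, email_col) not in blocked:
            yield row


def filter_csv(large_path, small_path, out_path, email_col=0):
    with open(small_path, newline="") as small:
        small_rows = list(csv.reader(small))
    with open(large_path, newline="") as large, \
            open(out_path, "w", newline="") as out:
        csv.writer(out).writerows(
            filter_out_emails(csv.reader(large), small_rows, email_col))
```

Because the whole row is written out, the name and address columns of the large list do not need to be deleted first.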

So I read a forum post that suggested this formula:
=MATCH(B1,$A$1:$A$3,0)>0
Column B held the large list of 160,000 entries, and column A held my list of 18,000 things I needed to delete.
I pasted this formula into a separate column to match everything. It printed either an error (#N/A) or TRUE: if the value was in both columns, it printed TRUE.
Then, because I suck with Excel, I pasted the text into Notepad++ and searched for all lines containing TRUE (with match case on, because some of my data contained the word "true" in lowercase). I marked those lines, then used Search > Bookmark > Remove Bookmarked Lines. I pasted that back into Excel and voila.
I would like to thank you guys for helping and pointing me in the right direction :)

Use UNO (OpenOffice API) to open a spreadsheet without recalculation

I'm using pyuno to read an Excel spreadsheet (running on Linux). Many cells have formulas referring to add-ins that are, obviously, not available. However, the cell values are what I want.
But when I load and read the sheet, those formulas seem to be evaluated, so the values are overwritten with errors.
I've tried several things, none of which have worked:
set flags AutomaticCalculation=False, MacroExecutionMode=NEVER_EXECUTE in the call to desktop.loadComponentFromURL
call document.enableAutomaticCalculation(False) on the loaded document
Any suggestions?

If the formulas don't matter to you, you might circumvent the problem by processing a copy of your spreadsheet in which only the values (not the formulas) are present.
To do this quickly, select the whole sheet content, copy, then use Paste Special and keep only the values. Save to a new file (make sure you don't overwrite the original file, or every formula will be lost!). Your script should then be able to process this file.
This is an ugly solution; there must be a way to do it programmatically.

Calc does not yet support using the cached results after loading the document. LibreOffice Calc now uses cached results for .xls documents. The results are also stored in .ods files, but they are ignored when the document is loaded: each formula result is recomputed by compiling and interpreting the saved formula.
There are plans to add this for .ods and .xlsx too, but many ODS producers write incorrect results into the file. So for now, the only solution is to keep a second version of the document that saves only the results (or to implement it inside Calc).
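A different route from pyuno, if the file is .xlsx (an assumption; the question only says "excel spreadsheet"): the results Excel last computed are cached in the file itself, in each cell's <v> element, so they can be read without any calculation engine. The standard-library sketch below does that; the sheet part name and the helper name cached_values are hypothetical, and inline strings are skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def cached_values(xlsx_file, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: cached value} as last saved, without recalculating.

    Numbers come back as float, shared strings (t="s") as str, and other
    types (booleans, errors, formula strings) as the raw <v> text.
    """
    with zipfile.ZipFile(xlsx_file) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            # Each <si> holds either one <t> or rich-text runs <r><t>...</t></r>
            shared = ["".join(t.text or "" for t in
                              si.findall(M + "t") + si.findall(M + "r/" + M + "t"))
                      for si in sst.iter(M + "si")]
        sheet = ET.fromstring(zf.read(sheet_part))

    values = {}
    for cell in sheet.iter(M + "c"):
        v = cell.find(M + "v")
        if v is None or v.text is None:
            continue  # empty cell or inline string; not handled in this sketch
        kind = cell.get("t", "n")  # "n" (number) is the default cell type
        if kind == "s":
            values[cell.get("r")] = shared[int(v.text)]
        elif kind == "n":
            values[cell.get("r")] = float(v.text)
        else:
            values[cell.get("r")] = v.text
    return values
```

Formulas referring to unavailable add-ins keep whatever result was cached at the last save in Excel, since nothing is re-evaluated.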