Excel operating slow with office 13 and not supporting xlrd (not able to read hyperlink's content) - python-2.7

I am facing two main problems:
First, I had an Excel file in xls format and converted it to xlsm. Since then, xlrd is not able to read the hyperlinks. I have no option other than xlrd. Is there a solution?
Second, the xls file is too slow with Office 13. I converted it to xlsm, but it is still too slow. Can anything be done about this with xlrd? My Python script is used by a lot of people to work with Excel, so I can't force each end user to configure xlsm (that will be the last resort if no other option is available).

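For the first problem, the issue is that xlrd's formatting flag (formatting_info) is not available for xlsm files, so the hyperlink details cannot be read. The only workaround is to create a new column that holds the target sheet name and read that column, instead of getting the name from the hyperlink. The following macro fills the column: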
Sub Button1_Click()
    Dim hl As Hyperlink
    Dim target As Range
    Dim rowNum As Long
    On Error Resume Next
    For Each hl In Sheets("TPD Sheet").Hyperlinks
        ' Resolve the hyperlink's SubAddress to the Range it points at
        Set target = Nothing
        Set target = Application.Evaluate(hl.SubAddress)
        If Not target Is Nothing Then
            ' Write the linked sheet's name into column D of the hyperlink's row
            rowNum = hl.Range.Row
            Sheets("MySheet").Cells(rowNum, "D").Value = target.Parent.Name
        End If
    Next hl
End Sub
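On the Python side, the script can then read the sheet name from that helper column rather than from the hyperlink. This is only a minimal sketch, assuming the macro has filled column D of "MySheet"; the function names are hypothetical, and it relies only on the basic xlrd calls open_workbook, sheet_by_name, nrows, ncols and cell_value:

```python
def linked_sheet_names(sheet, col=3):
    """Map row index -> sheet name stored in the helper column (D is index 3)."""
    names = {}
    if sheet.ncols <= col:
        return names  # the helper column does not exist in this sheet
    for row in range(sheet.nrows):
        value = sheet.cell_value(row, col)
        if value:  # skip rows where the macro wrote nothing
            names[row] = value
    return names


def load_linked_sheet_names(path, sheet_name="MySheet"):
    """Open the workbook with xlrd and read the helper column (assumed layout)."""
    import xlrd  # imported here so linked_sheet_names works without xlrd

    book = xlrd.open_workbook(path)
    return linked_sheet_names(book.sheet_by_name(sheet_name))
```

Row indexes are zero-based here, so Excel row 2 appears as key 1. Note that xlrd releases from 2.0 onwards read only .xls files, so this assumes an older xlrd that still opens xlsm.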
For the second problem, select "Disable hardware graphics acceleration" in Excel's options and it will be a little faster. I am still trying to make it faster.

Related

PDI - Multiple file input based on date in filename

I'm working on a project using Kettle (PDI).
I have to read multiple .csv or .xls files and insert them into a DB.
The file names follow the pattern AAMMDDBBBB, where AA is the city code and BBBB is the shop code. MMDD is the date in MM-DD form. For example: LA0326F5CA.csv.
The regexps I use in the input file steps look like LA.*\.csv or DT.*\.xls, which return all the files to insert into the DB.
Can you tell me how to select only yesterday's files (based on the MMDD in the file name)?
Because your selection needs some "complex" logic, you cannot filter on a regexp alone. I suggest you first read all the filenames, then filter them by their "age", then read the files whose names were selected.
In detail:
Use the Get File Names step with the same regexp you currently use (LA.*\.csv or DT.*\.xls). You can be more restrictive at this stage with a regexp like LA\d\d\d\d....\.csv, which ensures that MM and DD are digits and that BBBB is exactly 4 characters.
Filter based on the date. You can do this with a Java Filter, but it is much easier to use a JavaScript step to compute the "age" of your file and then a Filter rows step to keep only yesterday's files.
To compute the age of the file, extract MM and DD. For example (other methods are available):
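var parts = filename.match(/^..(\d\d)(\d\d)/);
var age = null;
if (parts) {
    var now = new Date();
    // The filename carries no year, so the current year is assumed.
    // JavaScript months are zero-based, hence MM - 1.
    var fileDate = new Date(now.getFullYear(), parts[1] - 1, parts[2]);
    // The difference is in milliseconds; convert it to whole days.
    age = Math.floor((now - fileDate) / 1000 / 60 / 60 / 24);
}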
If you are not familiar with JavaScript regexps: match tests the filename against the regexp and keeps the values captured by the parentheses in an array. If the test succeeds (which you must check explicitly to avoid a run-time failure), use the captured values to build the corresponding date and subtract it from today's date to get the age. This age is in milliseconds, so it is then converted to days.
Use the Text File Input and Excel Input steps with the option Accept filenames from previous step. Note that CSV Input does not have this option, but the more powerful Text File Input does.
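The selection logic in the steps above can also be sketched outside Kettle. The following Python is only an illustration of the same idea, not a PDI step: the pattern FILENAME_RE and both function names are assumptions made for this example, based on the AAMMDDBBBB naming described in the question.

```python
import re
from datetime import date, timedelta

# Assumed pattern for AAMMDDBBBB.ext: 2-letter city, MMDD, 4-character shop code.
FILENAME_RE = re.compile(r"^[A-Z]{2}(\d{2})(\d{2}).{4}\.(csv|xls)$")


def is_from_yesterday(filename, today=None):
    """Return True if the MMDD part of the filename is yesterday's date."""
    match = FILENAME_RE.match(filename)
    if not match:
        return False
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    month, day = int(match.group(1)), int(match.group(2))
    return (month, day) == (yesterday.month, yesterday.day)


def files_from_yesterday(filenames, today=None):
    """Keep only the filenames dated yesterday."""
    return [name for name in filenames if is_from_yesterday(name, today)]
```

Comparing month and day against yesterday's date, rather than computing an age in days, sidesteps the missing year in the filename; the optional today argument exists only to make the function easy to test.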
Well, I replaced the Java Filter with a Modified Java Script Value step and it works fine now.
Another question: how can I improve the performance of my current transformation (I now have 2 transformations for 2 cities)? The Insert / Update step makes my transformation too slow: it needs almost 1 hour 30 minutes to process 500k rows of data with a lot of fields (300 MB). And this is not all of my data; if it runs faster and my company decides to use it, I will be processing 10 TB of data per year, which means a lot of transformations and rows. I need suggestions about this.

Excel international date formatting

I am having problems formatting Excel datetimes, so that it works internationally. Our program is written in C++ and uses COM to export data from our database to Excel, and this includes datetime fields.
If we don't supply a formatting mask, some installations of Excel display these dates as serial numbers (days since 1900.01.01 followed by the time as a fraction of 24 hours). This is unreadable to a human, so we have found that we MUST supply a date formatting mask to be sure the value is displayed readably.
The problem - as I see it - is that Excel uses locale-specific formatting masks. For example, the UK datetime format mask might be "YYYY-MM-DD HH:MM".
But if that format mask is sent to an Excel installed in Sweden, it fails, since the Swedish version of Excel uses "ÅÅÅÅ-MM-DD tt:mm".
It's highly impractical to have 150 different national datetime formatting masks in our application to support different countries.
Is there a way to write formatting masks so that they include locale, such that we would be allowed to use ONE single mask?
Unless you are using the date functionality in Excel, the easiest way to handle this is to decide on a format and then create a string yourself in that format and set the cell accordingly.
This comic: http://xkcd.com/1179/ might help you choose a standard to go with. Otherwise, clients that open your file in different countries will have differently formatted data. Just pick a standard and force your data to that standard.
Edited to add: There are also libraries that can make this really easy for you: http://www.libxl.com/read-write-excel-date-time.html
Edited to add, part 2: Basically, what I'm getting at is to avoid asking for the mask and just format the data yourself (if that makes sense).
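To illustrate the "format it yourself" approach, here is a minimal Python sketch (the question's program is C++, so this only shows the idea). The function names are hypothetical. It assumes Excel's default 1900 date system, in which serial numbers from 61 onwards count days from 1899-12-30 because Excel also counts a non-existent 29 February 1900.

```python
from datetime import datetime, timedelta

# Excel's 1900 date system: serial 1 is 1900-01-01, but Excel also counts a
# non-existent 1900-02-29, so serials from 61 onwards are offset from 1899-12-30.
EXCEL_EPOCH = datetime(1899, 12, 30)


def serial_to_datetime(serial):
    """Convert an Excel serial number (1900 system, serial >= 61) to datetime."""
    if serial < 61:
        raise ValueError("serials before 1900-03-01 are not handled in this sketch")
    return EXCEL_EPOCH + timedelta(days=serial)


def to_fixed_text(value):
    """Format a datetime as locale-independent 'YYYY-MM-DD HH:MM' text."""
    return value.strftime("%Y-%m-%d %H:%M")
```

Writing the result of to_fixed_text into the cell as text gives the same display in every locale, at the cost that Excel no longer treats the cell as a date.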
I recommend the following: create an Excel file with date formatting applied to a specific cell, and save it for your program to use.
When the program runs, it opens this Excel file and retrieves the local date formatting from the specified cell.
If you need several formats, just use a different cell for each.
It is not a nice way, but it will work as far as I know.
Alternatively, you could create an xla(m) file that uses VBA to feed back the local formatting characters through a function like:
Public Function localChar(charIn As Range) As String
localChar = charIn.NumberFormatLocal
End Function
Also not a very clean method, but it might do the trick for you.

Openpyxl: Formulas getting removed when saving file

I'm using openpyxl to edit an Excel file that contains formulas in certain cells. When I populate the cells from a text file, I expect the formulas to work and give me the desired output. What I observe instead is that the formulas are removed and the cells are left blank.
I had the same problem when saving a file with openpyxl: formulas were removed.
But I noticed that some intermediate formulas were still there.
After some tests, it appears that, in my case, all formulas that display a blank result (nothing) are removed when the file is saved, whereas formulas that produce an output in the cell are preserved.
ex :
=IF((SUM(P3:P5))=0;"";(SUM(Q3:Q5))/(SUM(P3:P5))) => can be removed when saving because of the blank result
ex :
=IF((SUM(P3:P5))=0;"?";(SUM(Q3:Q5))/(SUM(P3:P5))) => preserved when saving
For my example, I'm using openpyxl 2.0.3 on Windows. The open and save calls are:
self._book = load_workbook("myfile.xlsx", data_only=False)
self._book.save("myfile.xlsx")
openpyxl does not currently support reading formulas, i.e. if you read your file and write it back, all formulas are removed. There is an active feature request on Bitbucket, though.

use uno (openoffice api) to open spreadsheet *without* recalculation

I'm using pyuno to read an excel spreadsheet (running on linux.) Many cells have formulas referring to addins that are, obviously, not available. However the cell values are what I want.
But when I load and read the sheet, it seems those formulas are being evaluated and thus the values are being overwritten with errors.
I've tried several things, none of which have worked:
set flags AutomaticCalculation=False, MacroExecutionMode=NEVER_EXECUTE in the call to desktop.loadComponentFromURL
call document.enableAutomaticCalculation(False) on the loaded document
Any suggestions?
If the formulas don't matter, you can work around the problem by processing a copy of your spreadsheet that contains only the values (not the formulas).
To do this quickly, select the whole sheet content, copy it, and use Paste Special with everything unchecked except the values. Save to a new file (make sure you don't overwrite the original file, or every formula will be lost!). Your script should then be able to process this file.
This is an ugly solution; there must be a way to do it programmatically.
Calc does not yet support using the cached results after loading the document. LibreOffice Calc does now use cached results for xls documents. The results are also stored in ods, but they are ignored while loading the document, and the formula result is evaluated by compiling and interpreting the saved formula.
There are some plans to add this for ods and xlsx too, but there are many ods producers out there writing incorrect results to the file. So for now the only solution is to keep a second version of the document that saves only the results (or to implement it inside Calc).
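Because the cached results live in the file itself, they can in principle be read without any calculation engine. The sketch below assumes the spreadsheet is (or can be saved as) xlsx, where each formula cell stores its formula in an f element and its last computed result in a v element. The function name and the default sheet path are assumptions for illustration; this is not something pyuno provides.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts inside an .xlsx package.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def cached_values(xlsx_file, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell reference: cached value text} without evaluating formulas.

    sheet_part is an assumed path; real workbooks list their parts in
    xl/workbook.xml and its relationships file.
    """
    with zipfile.ZipFile(xlsx_file) as package:
        root = ET.fromstring(package.read(sheet_part))
    values = {}
    for cell in root.iterfind(".//m:c", NS):
        cached = cell.find("m:v", NS)
        if cached is not None:
            # For t="s" cells this text is an index into sharedStrings.xml.
            values[cell.get("r")] = cached.text
    return values
```

The values come back as raw text exactly as stored, so numeric conversion and shared-string lookup are left to the caller.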

How to iterate over all the page breaks in an Excel 2003 worksheet via COM

I've been trying to retrieve the locations of all the page breaks on a given Excel 2003 worksheet over COM. Here's an example of the kind of thing I'm trying to do:
Excel::HPageBreaksPtr pHPageBreaks = pSheet->GetHPageBreaks();
long count = pHPageBreaks->Count;
for (long i = 0; i < count; ++i)
{
    // Excel collections are 1-based, hence i + 1
    Excel::HPageBreakPtr pHPageBreak = pHPageBreaks->GetItem(i + 1);
    Excel::RangePtr pLocation = pHPageBreak->GetLocation();
    printf("Page break at row %ld\n", pLocation->Row);
    pLocation.Release();
    pHPageBreak.Release();
}
pHPageBreaks.Release();
I expect this to print the row number of each horizontal page break in pSheet. The problem I'm having is that although count correctly indicates the number of page breaks in the worksheet, I can only ever seem to retrieve the first one. On the second run through the loop, calling pHPageBreaks->GetItem(i+1) throws an exception, with error number 0x8002000b, "invalid index".
Attempting to use pHPageBreaks->Get_NewEnum() to get an enumerator to iterate over the collection also fails with the same error, immediately on the call to Get_NewEnum().
I've looked around for a solution, and the closest thing I've found so far is http://support.microsoft.com/kb/210663/en-us. I have tried activating various cells beyond the page breaks, including the cells just beyond the range to be printed, as well as the lower-right cell (IV65536), but it didn't help.
If somebody can tell me how to get Excel to return the locations of all of the page breaks in a sheet, that would be awesome!
Thank you.
@Joel: Yes, I have tried displaying the user interface, and then setting ScreenUpdating to true - it produced the same results. Also, I have since tried combinations of setting pSheet->PrintArea to the entire worksheet and/or calling pSheet->ResetAllPageBreaks() before my call to get the HPageBreaks collection, which didn't help either.
@Joel: I've used pSheet->UsedRange to determine the row to scroll past, and Excel does scroll past all the horizontal breaks, but I'm still having the same issue when I try to access the second one. Unfortunately, switching to Excel 2007 did not help either.
Experimenting with Excel 2007 from Visual Basic, I discovered that the page break isn't known unless it has been displayed on the screen at least once.
The best workaround I could find was to page down, from the top of the sheet to the last row containing data. Then you can enumerate all the page breaks.
Here's the VBA code... let me know if you have any problem converting this to COM:
Range("A1").Select
numRows = Range("A1").End(xlDown).Row
While ActiveWindow.ScrollRow < numRows
    ActiveWindow.LargeScroll Down:=1
Wend
For Each x In ActiveSheet.HPageBreaks
    Debug.Print x.Location.Row
Next
This code made one simplifying assumption:
I used the .End(xlDown) method to figure out how far the data goes... this assumes that you have continuous data from A1 down to the bottom of the sheet. If you don't, you need to use some other method to figure out how far to keep scrolling.
Did you set ScreenUpdating to True, as mentioned in the KB article?
You may want to actually toggle it to True to force a screen repaint. It sounds like the calculation of page breaks is a side-effect of actually rendering the page, rather than something Excel does on demand, so you have to trigger a page rendering on the screen.