I am programmatically reading and writing an Excel worksheet, using the source code from https://www.codeproject.com/articles/13852/basicexcel-a-class-to-read-and-write-to-microsoft, which in turn is based on the documentation of the Excel file format from http://sc.openoffice.org/excelfileformat.pdf.
The Excel file format supported by the source code is Binary Interchange File Format version 8 (BIFF8).
One of the records in an Excel file is an Extended Format (XF) record. The first two bytes of the XF record is an index to a FONT record. That is all the documentation has to say about it.
My question: Is that a zero-based index or a 1-based index?
The following is a use case that confused me and lead me to ponder this question.
Use case: bold cell
I created a simple Excel worksheet: one cell containing text that is bold.
I programmatically read that Excel worksheet and dump all the data to a human readable format, using new dump() methods I added to the source code. I find that:
My cell is associated with a LABELSST record: <LabelSST rowIdx=0 colIdx=0 xfRecIdx=62 sstRecIdx=0 />
That LABELSST record refers to an XF record with an index of: 62
If that is a zero-based index, the XF record at index 62 is: <XF fontRecIdx=20 formatRecIdx=0 protect=0x1 align=0x20 rot=0 text=0 usedAttribs=0x8 borderLines=0 color1=0x2000000 color2=0x20c0 />
That XF record refers to a FONT record with an index of: 20
If that is a zero-based index, the FONT record at index 20 is: <Font height=220 options=0 colorIdx=9 weight=400 escType=0 uline=0 family=2 charSet=0 name="Calibri" />
That FONT record has a weight of 400.
That font weight of 400 is not what I expected. If my cell content is bold, then the font weight should be 700, as per the documentation.
However, if the XF record refers to a FONT record with a 1-based index, then FONT record at 1-based index 20 is: <Font height=220 options=1 colorIdx=8 weight=700 escType=0 uline=0 family=2 charSet=0 name="Calibri" />
And that FONT record does indeed have a weight of 700 to indicate bold.
This is confusing. I do not know if the index to FONT record in the XF record is supposed to be zero-based or 1-based.
I think I found the answer to my own question elsewhere in the http://sc.openoffice.org/excelfileformat.pdf documentation.
The font with index 4 is omitted in all BIFF versions. This means the first four fonts have zero-based indexes, and the fifth font and all following fonts are referenced with one-based indexes.
If that is true, then the data I observed makes more sense.
Related
In my application I need to set the rows to show the text area columns in more than one line (say 3 lines). However I don't want the text area column contents in one single line, despite the row height is increased. It should be wrapped to the number of lines.
If I turnoff fixed rowheight attribute. Then each row has a different height. Thats not what I'm expecting
I tried below inline css, but it is not changing
#static_id .a-GV-cell {<br/> height: 80px;<br/>}<br/>.wrap-cell {<br/> max-height: 64px;<br/> white-space: normal;<br/> overflow: hidden;<br/>
However this still only shows the text area column contents in one single line, despite the row height is increased.
thanks in advance
I'm trying to read the discharge data of 346 US rivers stored online in textfiles. The files are more or less in this format:
Measurement_number Date Gage_height Discharge_value
1 2017-01-01 10 1000
2 2017-01-20 15 2000
# etc.
I only want to read the gage height and discharge value columns.
The problem is that in most files additional columns with metadata are added in front of the 'Gage height' column, so i can not just simply read the 3rd and 4th column because their index varies.
I'm trying to find a way to say 'read the columns with the name 'Gage_height' and 'Discharge_value'', but I haven't succeeded yet.
I hope anyone can help. I'm currently trying to load the text files with numpy.genfromtxt so it would be great to find a solution with that package but other solutions are also more than welcome.
This is my code so far
data_url=urllib2.urlopen(#the url of this specific site)
data=np.genfromtxt(data_url,skip_header=1,comments='#',usecols=2,3])
You can use the names=True option to genfromtxt, and then use the column names to select which columns you want to read with usecols.
For example, to read 'Gage_height' and 'Discharge_value' from your data file:
data = np.genfromtxt(filename, names=True, usecols=['Gage_height', 'Discharge_value'])
Note that you don't need to set skip_header=1 if you use names=True.
You can then access the columns using their names:
gage_height = data['Gage_height'] # == array([ 10., 15.])
discharge_value = data['Discharge_value'] # == array([ 1000., 2000.])
See the docs here for more information.
table = document.add_table(rows=1, cols=1)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
I have to change font size of text 'Qty' in table with one row and one column, how can I make it?
You need to get the paragraph in the cell. From the documentation of python-docx:
3.5.2 _Cell objects:
class docx.table._Cell (tc, parent)
paragraphs
List of paragraphs in the cell. A table cell is required to
contain at least one block-level element and end with a paragraph. By
default, a new cell contains a single paragraph. Read-only
Reference: python-docx Documentation - Read the Docs
The code:
To change font size of text 'Qty'
paragraph =hdr_cells[0].paragraphs[0]
run = paragraph.runs
font = run[0].font
font.size= Pt(30) # font size = 30
To change font size of the whole table:
for row in table.rows:
for cell in row.cells:
paragraphs = cell.paragraphs
for paragraph in paragraphs:
for run in paragraph.runs:
font = run.font
font.size= Pt(30)
Reference of how to access paragraphs in a table: Extracting data from tables
Up there the solution really helped. I use it for a while. But I found a tiny problem:time. As your table grows bigger the time you cost to build the table getting longer. So I improve it. Cut two rounds. Here you are:
The code changes the whole table
for row in table.rows:
for cell in row.cells:
paragraphs = cell.paragraphs
paragraph = paragraphs[0]
run_obj = paragraph.runs
run = run_obj[0]
font = run.font
font.size = Pt(30)
When you cut two rounds you save the time
Building on user7609283 answer, here is a short version to set a cell to bold (the cell must contain text before applying format as it looks for the first paragraph):
row_cells = table.add_row().cells
row_cells[0].text = "Customer"
row_cells[0].paragraphs[0].runs[0].font.bold = True
This font change applies to all cells of a table and is streamlined. All cells must contain text before being formatted or an index out of range error is thrown:
for row in table.rows:
for cell in row.cells:
cp = cell.paragraphs[0].runs
cp[0].font.size = Pt(14)
This next font change applies to a single cell as wanted, in this case the top left cell:
tc = table.cell(0, 0).paragraphs[0].runs
tc[0].font.size = Pt(14)
I have this scenario with source as fix width flat file, and I need to read to target only the Header and Footer not the details records.
I need to trim the first column (PA22109 ) and get only PA and next 2 columns to rows as two different dates.
For Footer get only the PT(PT000000000700000030620E00000055612I00000010277I) and the rest into a column of the target.
How can I achieve this logic, inputs are appreciated.
source file :
PA22109 00153252015110905408179 2015110820151108PO ---header
DE0E9D TESTGROUPEXCH TESTINSEXCH TESTLOCEXCH ID014 LNAME014 FNAME014 14 MAIN ST ANYWHERE NJ011110000 195001012Z 01000000014 LNAME014 PATFIRST014 14 MAIN ST ANYWHERE NJ011110000 1955010110106000220 TESTGROUPEXCH 8179 TESTBENEXCH TESTCNTE53 0000000000 0000002643005 011234567890 011234567890 1234 TEST PHARMACY TEST PHARMACY LANE PHARMACYTOWN NJ09876 5555555555 11Y5 019876543210 019876543210 NJPRESCLAST PRESCFIRST 5555555551 DRLAST DRFIRST 110110000009770990300406048410 2015092720150927154401000000000000120150929 0000100000000000000000000000000
PT000000000700000030620E00000055612I00000010277I --Footer
As this a fixed file you can perform following to meet your requirement.
In your Informatica mapping, Read row in a single column.
In Expression, Mark each record for filter out if It does not start with PA OR PT (Assumption your Detail records do not start with PA or PT). Filter detail record out using Filter transformation.
Now you have only Header and Footer Records.
Now you can apply respective condition in expression for PA and PT Records.
I'm using GetTextExtentPoint32W to get width of a text in a cell in MS Excel 2010. The cell width is fetched using ActiveCell.Width. These two widths are then compared to determine whether the text fits in the cell or extends out of the cell.
Visually, even though the text fits perfectly in the cell, the text width returned by the method is more than the cell width. Also, when I increase the font size the difference between actual text width and that returned by the method increases.
Following is a part of the source code used to achieve the result. Please help me solve this error.
hDC = ctypes.windll.user32.GetDC(self.windowHandle)
tempBMP = ctypes.windll.gdi32.CreateCompatibleBitmap(hDC, 1, 1)
hBMP = ctypes.windll.gdi32.SelectObject(hDC, tempBMP)
iFontSize = self.excelCellObject.Font.Size
deviceCaps = ctypes.windll.gdi32.GetDeviceCaps(hDC, 90)
iFontSize = int(iFontSize)
iFontSize = ctypes.c_int(iFontSize)
iFontSize = ctypes.windll.kernel32.MulDiv(iFontSize, deviceCaps, 72)
iFontSize = iFontSize * -1
iFontWeight = 700 if self.excelCellObject.Font.Bold else 400
sFontName = self.excelCellObject.Font.Name
sFontItalic = self.excelCellObject.Font.Italic
sFontUnderline = True if self.excelCellObject.Font.Underline else False
sFontStrikeThrough = self.excelCellObject.Font.Strikethrough
#Create a font object with the correct size, weight and style
hFont = ctypes.windll.gdi32.CreateFontW(iFontSize,
0, 0, 0,
iFontWeight,
sFontItalic,
sFontUnderline,
sFontStrikeThrough,
False, False, False,
False, False,
sFontName)
#Load the font into the device context, storing the original font object
hOldFont = ctypes.windll.gdi32.SelectObject(hDC, hFont)
sText = self.excelCellObject.Text
log.io("\nText \t"+sText+"\n")
textLength = len(sText)
class structText(ctypes.Structure):
_fields_ = [("width", ctypes.c_int),
("height",ctypes.c_int)]
StructText = structText()
getTextExtentPoint = ctypes.windll.gdi32.GetTextExtentPoint32W
getTextExtentPoint.argtypes = [ctypes.c_void_p,
ctypes.c_char_p,
ctypes.c_int,
ctypes.POINTER(structText)]
getTextExtentPoint.restype = ctypes.c_int
#Get the text dimensions
a = ctypes.windll.gdi32.GetTextExtentPoint32W(hDC,
sText,
textLength,
ctypes.byref(StructText))
#Delete the font object we created
a = ctypes.windll.gdi32.DeleteObject(hFont)
a = ctypes.windll.gdi32.DeleteObject(tempBMP)
#Release the device context
a = ctypes.windll.user32.ReleaseDC(self.windowHandle, hDC)
textWidth = StructText.width
cellWidth = self.excelCellObject.Width
Thanks.
I do not use Python or Excel 2010 so cannot comment on your current approach. However, I have struggled with a similar problem. I hope the following points will be helpful.
Background
If you hover over the right boundary of an Excel column and hold the left mouse button you will get a display of the format: “Width: n.nn (mm pixels)”.
The help for the ColumnWidth property says:
One unit of column width is equal to the width of one character in the
Normal style. For proportional fonts, the width of the character 0
(zero) is used.
Use the Width property to return the width of a column in points.
As far as I can tell, “Normal style” means the standard font name and size at the time the workbook was created. Changing the standard font name and size for an existing workbook does not appear to have any effect. Changing the font name and size for a worksheet has no effect.
Two example displays for a standard width column are:
For Arial 10 Width: 8.43 (64 pixels)
For Tahoma 10.5 Width: 8.38 (72 pixels)
I have created a string of zeros and attempted to measure how many are visible depending on the width of the column. I found the number of zeroes that I could see matched the value displayed reasonably well for such as subjective measure.
With VBA, the ColumnWidth property of a column or cell sets or returns the width in characters.
With VBA, The read only Width property of a column or cell returns .75 * the width in pixels.
The significance of the above information is that the width values obtained from Excel are not necessarily correct for the font being used.
My problem and the solution I discovered
The problem I had was that I was merging cells and filling them with text. Although Excel will adjust the height of a row so the text within an unmerged cell is visible, it will not do so for a merged cell. I tried many techniques including Microsoft’s .Net, text rendering routines without success. Nothing I tried would reliably emulate Excel’s system for determining the width of text.
The technique I eventually used successfully involved picking a cell to the right and below all used cells for experimental use. I adjusted the width of the experimental cell’s column to match the combined width of the merged cell and copied the formatted value from the cell whose height I wanted. Because the experimental cell was unmerged, Excel would adjust the height as appropriate. I then made sure the source row was at least this height.
The key feature of this technique was that I was not trying to emulate Excel’s system for determining the width of text; I was using Excel’s system.
You would need to try different column widths for the experimental cell. I would start with the current width of the source column. If the row height matches that for a single row, the width of the source column is already sufficient. If the row height is more than that for a single row, I would increase the column width until the row height matched that for a single row and then adjust the source column’s width to match. If you start with the widest value (in characters or Python’s text width) you will probably get the source column width correct at the first attempt which would avoid the need for later adjustments.