Xlrd list index out of range - list

I'm just starting to explore Xlrd, and to be honest am pretty new to programming altogether, and have been working through some of their simple examples, and can't get this simple code to work:
import xlrd
book=open_workbook('C:\\Users\\M\\Documents\\trial.xlsx')
sheet=book.sheet_by_index(1)
cell=sheet.cell(0,0)
print cell
I get an error: list index out of range (referring to the 2nd to last bit of code)
I cut and pasted most of the code from the pdf...any help?

You say:
I get an error: list index out of range (referring to the 2nd to last
bit of code)
I doubt it. How many sheets are there in the file? I suspect that there is only one sheet. Indexing in Python starts from 0, not 1. Please edit your question to show the full traceback and the full error message. I suspect that it will show that the IndexError occurs in the 3rd-last line:
sheet=book.sheet_by_index(1)

I would play around with it in the console.
Execute each statement one at a time and then view the result of each. The sheet indexes count from 0, so if you only have one worksheet then you're asking for the second one, and that will give you a list index out of range error.
Another thing that you might be missing is that not all cells exist if they don't have data in them. Some do, but some don't. Basically, the cells that exist from xlrd's standpoint are the ones in the matrix nrows x ncols.
Another thing is that if you actually want the values out of the cells, use the cell_value method. That will return you either a string or a float.
Side note, you could write your path like so: 'C:/Users/M/Documents/trial.xlsx'. Python will handle the / vs \ on the backend perfectly and you won't have to screw around with escape characters.

Related

How to automatically feed a cell value from a range of values, based on its matching condition with other cell value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (which I also have to increase from time to time), and I have about 8 significance values that I have decided to relate to these 28 types of work, predefined.
I want that, as soon as I enter a type of work in cell 1 - I want the adjacent cell 2 to get automatically populated with a significance value (from a range of 8 values) that is pre-definitely set by me.
Every time I input a new or old occurrence of a type of work, the adjacent cell should automatically get matched with its relevant significance value & automatically get populated in real-time.
I know how to do it using IF, IFS, and IF_OR conditions, but I feel that based on the ever-expanding types of work & significance values, the above formulas will be very big, complicated, and repetitive in the future. I feel there's a more efficient way to achieve it. Also, I don't want it to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TUIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try-
=XLOOKUP(D2,A2:A,B2:B)
Or FILTER() function like-
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables in order to include in the second argument all the potential values and their significances
This is the formula, that worked for me (for anybody's reference):
I created another reference sheet, stating the types of work & their significance. From that sheet, I'm using either vlookup, filter, xlookup.Using gforms for inputting my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))

How can I resolve INDEX MATCH errors caused by discrepancies in the spelling of names across multiple data sources?

I've set up a Google Sheets workbook that synthesizes data from a few different sources via manual input, IMPORTHTML and IMPORTRANGE. Once the data is populated, I'm using INDEX MATCH to filter and compare the information and to RANK each data set.
Since I have multiple data inputs, I'm running into a persistent issue of names not being written exactly the same between sources, even though they're the same person. First names are the primary culprit (i.e. Mary Lou vs Marylou vs Mary-Lou vs Mary Louise) but some last names with special symbols (umlauts, accents, tildes) are also causing errors. When Sheets can't recognize a match, the INDEX MATCH and RANK functions both break down.
I'm wondering how to better unify the data automatically so my Sheet understands that each occurrence is actually the same person (or "value").
Since you can't edit the results of an IMPORTHTML directly, I've set up "helper columns" and used functions like TRIM and SPLIT to try and fix instances as I go, but it seems like there must be a simpler path.
It feels like IFS could work but I can't figure how to integrate it. Also thinking this may require a script, which I'm just beginning to study.
Here's a simplified example of what I'm trying to achieve and the corresponding errors: Sample Spreadsheet
The first tab is attempting to pull and RANK data from tabs 2 and 3. Sample formulas from the Summary tab, row 3 (Amelia Rose):
Cell B3: =INDEX('Q1 Sales'!B:B, MATCH(A3,'Q1 Sales'!A:A,0))
Cell C3: =RANK(B3,$B$2:B,1)
Cell D3: =INDEX('Q2 Sales'!B:B, MATCH(A3,'Q2 Sales'!A:A,0))
Cell E3: =RANK(D3,$D$2:D,1)
I'd be grateful for any insight on how to best index 'Q2Sales'!B3 as the correct value for 'Summary'!D3. Thanks in advance - the thoughtful answers on Stack Overflow have gotten me this far!
to counter every possible scenario do it like this:
=ARRAYFORMULA(IFERROR(VLOOKUP(LOWER(REGEXREPLACE(A2:A, "-|\s", )),
{REGEXEXTRACT(LOWER(REGEXREPLACE('Q2 Sales'!A2:A, "-|\s", )),
TEXTJOIN("|", 1, LOWER(REGEXREPLACE(A2:A, "-|\s", )))), 'Q2 Sales'!B2:B}, 2, 0)))

Apply a function to a range of cells in a spreadsheet

The answers in topics with similar titles haven't given me much of a resolution to my particular problem, but possibly I am not asking the right question. It might help knowing I'm an absolute noob when it comes to spreadsheets, so finding my way around is next to nil.
Currently I can set a basic function in the first cell A1 =ROW()
Simple right? Well now here comes the complication. If I click on the bottom right of the cell and start dragging I can then apply that very same function to a whole range of cells. Let's say I apply it from A1:A10. Every cell within this group now has the same function.
Hooray! We did it, right? I applied a function to a range of cells each with their own output. But wait, if I then go back to the original cell and change its formula none of the other cells change with it. GRRRRR!!!!
There are a couple of fixes I've come up with but don't necessarily know how to implement. The first is to have every cell link back to the original cell and reference its function. This would be useful if I wanted to randomly scatter dependent cells about the document. The other would be much more useful in an orderly group where you know the exact dimensions by specifying in the original cell the size of the array you want to apply the function to.
With that said, let me hear your thoughts.
The closest I've come to an answer is to use FORMULA() which returns the formula used by a cell as text. Unfortunately all answers on evaluating the text resort to scripting. How strange! I thought something like this would be common. Might as well get to scripting.
Hold on, I may have spoke too soon. An array can be made with =MUNIT(), but it's only square. Drats!
Ok... I'm hoping the zebra stripes will eventually become its own answer unless someone else beats me to it. So a simple array can be made with ={1,2;3,4} where commas separate values by column and semicolons for values by row except to generate it you have to press Control+Shift+Enter (because reasons?). I'm thinking now that I'll need to have functions that can generate lists of values based on a single function for each row, and pray that it'll work. So, back to looking. (Wow this is taking forever)
The way I was hypothesizing can't even generate a 1x1, e.g., ={ROW()} returns Err:512 which is a formula overflow.
Alright, in summary so far I've narrowed down the two options,
1) link every cell to the original formula
2) populate an array with a single formula
each with their own incomplete answer,
a) use FORMULA() to return the formula of a cell as text
b) create a hypothetical array like so ={LIST_OF_VALUES()}
These both require a strange form of the nonexistent EVALUATE() function to 'function' correctly. Isn't that fun?
Google Sheets handles case b by allowing ={ROW()}Control+Shift+Enter to generate =ArrayFormula({ROW()}). Working with the general case of any sized array being filled with a single function doesn't exist in the world of spreadsheets it seems. That's very saddening because I can't think of a much better tool for what I want to do. Copy paste it is until I need to use macros.
Depending on your specific use case, creating a user-defined function may help:
use the Basic IDE to create your function;
apply it to any cells on any sheet;
modifying the Basic code will affect all cells where the function is used.
I've elaborated the steps in an answer on superuser.
Sure, you could write some complex code to update functions, but wouldn't the easy way be just to drag it to the same range of cells the same way you did before? It should properly overwrite the existing code in there, and if it doesn't, you can just as easily delete the outdated code and drag the new code in.
Probably the best approach is to simply drag the amended formula over the range of cells (as advised by OldBunny2800). This is less error prone and easier to maintain than a custom macro.
Another option would be to use an array function. Then you only have to edit the function once, and the same edit will be automatically applied to the whole range of cells in that array function.

Comparing two documents

I have two very large lists. They both were originally in excel, but the larger one is a list of emails (about 160,000) of them with other information like their name and address etc. And the smaller one is a list of just 18,000 emails.
My question is what would be the easiest way to get rid of all 18,000 rows from the first document that contain the email addresses from the second?
I was thinking regex or maybe there is another application I can use? I have tried searching online but it seems like there isn't much specific to this. I also tried notepad++ but it freezes when I try to compare these large files.
-Thank You in Advance!!
Good question. One way I would tackle this is making a C++ program [you could extrapolate the idea to the language of your choice; You never mentioned which languages you were proficient in] that read each item of the smaller file into a vector of strings. First, of course, use Excel to save the files as CSV instead of XLS or XLSX, which will comma-separate the values so you can work with them easier. For the larger list, "Save As" a copy of just email addresses, deleting the other rows for now.
Then, you could open the larger list and use a nested loop to check if you should output to an output file. Something like:
bool foundMatch=false;
for(int y=0;y<LargeListVector.size();y++) {
for(int x=0;x<SmallListVector.size();x++) {
if(SmallListVector[x]==LargeListVector[y]) foundMatch=true;
}
if(!foundMatch) OutputVector.append(LargeListVector[y]);
foundMatch=false;
}
That might be partially pseudo-code, but do you get the idea?
So I read a forum post at : Here
=MATCH(B1,$A$1:$A$3,0)>0
Column B would be the large list, with the 160,000 inputs and column A was my list of things I needed to delete of 18,000.
I used this to match everything, and in a separate column pasted this formula. It would print out either an error or TRUE. If the data was in both columns it printed out true.
Then because I suck with excel, I threw this text into Notepad++ and searched for all lines that contained TRUE (match case, because in my case some of the data had the word true in it without caps.) I marked those lines, then under search, bookmarks, I removed all lines with bookmarks. Pasted that back into excel and voila.
I would like to thank you guys for helping and pointing me in the right direction :)

How to iterate over all the page breaks in an Excel 2003 worksheet via COM

I've been trying to retrieve the locations of all the page breaks on a given Excel 2003 worksheet over COM. Here's an example of the kind of thing I'm trying to do:
Excel::HPageBreaksPtr pHPageBreaks = pSheet->GetHPageBreaks();
long count = pHPageBreaks->Count;
for (long i=0; i < count; ++i)
{
Excel::HPageBreakPtr pHPageBreak = pHPageBreaks->GetItem(i+1);
Excel::RangePtr pLocation = pHPageBreak->GetLocation();
printf("Page break at row %d\n", pLocation->Row);
pLocation.Release();
pHPageBreak.Release();
}
pHPageBreaks.Release();
I expect this to print out the row numbers of each of the horizontal page breaks in pSheet. The problem I'm having is that although count correctly indicates the number of page breaks in the worksheet, I can only ever seem to retrieve the first one. On the second run through the loop, calling pHPageBreaks->GetItem(i) throws an exception, with error number 0x8002000b, "invalid index".
Attempting to use pHPageBreaks->Get_NewEnum() to get an enumerator to iterate over the collection also fails with the same error, immediately on the call to Get_NewEnum().
I've looked around for a solution, and the closest thing I've found so far is http://support.microsoft.com/kb/210663/en-us. I have tried activating various cells beyond the page breaks, including the cells just beyond the range to be printed, as well as the lower-right cell (IV65536), but it didn't help.
If somebody can tell me how to get Excel to return the locations of all of the page breaks in a sheet, that would be awesome!
Thank you.
#Joel: Yes, I have tried displaying the user interface, and then setting ScreenUpdating to true - it produced the same results. Also, I have since tried combinations of setting pSheet->PrintArea to the entire worksheet and/or calling pSheet->ResetAllPageBreaks() before my call to get the HPageBreaks collection, which didn't help either.
#Joel: I've used pSheet->UsedRange to determine the row to scroll past, and Excel does scroll past all the horizontal breaks, but I'm still having the same issue when I try to access the second one. Unfortunately, switching to Excel 2007 did not help either.
Experimenting with Excel 2007 from Visual Basic, I discovered that the page break isn't known unless it has been displayed on the screen at least once.
The best workaround I could find was to page down, from the top of the sheet to the last row containing data. Then you can enumerate all the page breaks.
Here's the VBA code... let me know if you have any problem converting this to COM:
Range("A1").Select
numRows = Range("A1").End(xlDown).Row
While ActiveWindow.ScrollRow < numRows
ActiveWindow.LargeScroll Down:=1
Wend
For Each x In ActiveSheet.HPageBreaks
Debug.Print x.Location.Row
Next
This code made one simplifying assumption:
I used the .End(xlDown) method to figure out how far the data goes... this assumes that you have continuous data from A1 down to the bottom of the sheet. If you don't, you need to use some other method to figure out how far to keep scrolling.
Did you set ScreenUpdating to True, as mentioned in the KB article?
You may want to actually toggle it to True to force a screen repaint. It sounds like the calculation of page breaks is a side-effect of actually rendering the page, rather than something Excel does on demand, so you have to trigger a page rendering on the screen.