Python function misbehaves when called multiple times - python-2.7

I wrote a function in Python that will create a simple four column table in HTML. When I call it from file, it returns the table correctly.
Issues arise, however, if it is called multiple times in a single script. The first one appears as it ought to. The second time it is called, all of the rows beneath the title row have six columns (two blank) instead of four. The third time, there are ten columns (six blank).
I only started coding recently, so I don't know very much about what's going on behind the scenes here.
When a function is called twice or more times in succession, is a new instance of the function called? Are the variables all 'reset' so to speak?
This is the code of the called function:
def fourColumnTable(title1, list1, title2, list2, title3, list3, title4, list4):
error = 0
#Check that the lists are all of the same length
if(len(list1) != len(list2) or len(list1) != len(list3) or len(list1) != len(list4)):
error = 1
table = "ERROR: The lists must all be the same length"
if(error == 0):
tableList = []
#Append <table> tag
tableList.append('<table class="table table-bordered">')
#Format list elements and titles
#Put each title inside <th> tags
titleList = []
titleList.append(title1)
titleList.append(title2)
titleList.append(title3)
titleList.append(title4)
for i in range(len(titleList)):
titleList[i] = "<th>" + str(titleList[i]) + "</th>"
#Put each string element inside <td> tags
for i in range(len(list1)):
list1[i] = "<td>" + str(list1[i]) + "</td>"
for i in range(len(list2)):
list2[i] = "<td>" + str(list2[i]) + "</td>"
for i in range(len(list3)):
list3[i] = "<td>" + str(list3[i]) + "</td>"
for i in range(len(list4)):
list4[i] = "<td>" + str(list4[i]) + "</td>"
#Put all list elements in the tableList
tableList.append('<thead>')
for i in range(len(titleList)):
tableList.append(titleList[i])
tableList.append('</thead>')
tableList.append('<tbody>')
for i in range(len(list1)):
tableList.append('<tr>')
tableList.append(list1[i])
tableList.append(list2[i])
tableList.append(list3[i])
tableList.append(list4[i])
tableList.append('</tr>')
tableList.append('</tbody>')
#Close the <table> tag
tableList.append('</table>')
#Assign tableList to one variable
table = ''.join(tableList)
return table

I suspect that some (but not all) of your listN arguments being reused between calls. This leads to your bug, because your code modifies the provided lists each time it is called. It adds <td> and </td> around each list item. If you repeatedly do this on the same list, the items will end up with multple nested tags, e.g. <td><td><td>...</td></td></td>.
This then gets rendered as extra empty columns, as your browser fills in the missing closing tags between the repeated <td> opening tags and ignores the extra closing tags at the end.
A first quick fix would be to create a new list with the modified items, rather than modifying the provided list (here using a list comprehension):
list1 = ["<td>" + item + "</td>" for item in list1]
A further improvement would be to use a library to create your table, rather than creating it by string manipulation yourself. There are a variety of XML templating libraries you could use, but I don't have enough experience with any of them to make a strong suggestion. This page might be a good place to start browsing.

Related

Google Sheets: How can I extract partial text from a string based on a column of different options?

Goal: I have a bunch of keywords I'd like to categorise automatically based on topic parameters I set. Categories that match must be in the same column so the keyword data can be filtered.
e.g. If I have "Puppies" as a first topic, it shouldn't appear as a secondary or third topic otherwise the data cannot be filtered as needed.
Example Data: https://docs.google.com/spreadsheets/d/1TWYepApOtWDlwoTP8zkaflD7AoxD_LZ4PxssSpFlrWQ/edit?usp=sharing
Video: https://drive.google.com/file/d/11T5hhyestKRY4GpuwC7RF6tx-xQudNok/view?usp=sharing
Parameters Tab: I will add words in columns D-F that change based on the keyword data set and there will often be hundreds, if not thousands, of options for larger data sets.
Categories Tab: I'd like to have a formula or script that goes down the columns D-F in Parameters and fills in a corresponding value (in Categories! columns D-F respectively) based on partial match with column B or C (makes no difference to me if there's a delimiter like a space or not. Final data sheet should only have one of these columns though).
Things I've Tried:
I've tried a bunch of things. Nested IF formula with regexmatch works but seems clunky.
e.g. this formula in Categories! column D
=IF(REGEXMATCH($B2,LOWER(Parameters!$D$3)),Parameters!$D$3,IF(REGEXMATCH($B2,LOWER(Parameters!$D$4)),Parameters!$D$4,""))
I nested more statements changing out to the next cell in Parameters!D column (as in , manually adding $D$5, $D$6 etc) but this seems inefficient for a list thousands of words long. e.g. third topic will get very long once all dog breed types are added.
Any tips?
Functionality I haven't worked out:
if a string in Categories B or C contains more than one topic in the parameters I set out, is there a way I can have the first 2 to show instead of just the first one?
e.g. Cell A14 in Categories, how can I get a formula/automation to add both "Akita" & "German Shepherd" into the third topic? Concatenation with a CHAR(10) to add to new line is ideal format here. There will be other keywords that won't have both in there in which case these values will just show up individually.
Since this data set has a bunch of mixed breeds and all breeds are added as a third topic, it would be great to differentiate interest in mixes vs pure breeds without confusion.
Any ideas will be greatly appreciated! Also, I'm open to variations in layout and functionality of the spreadsheet in case you have a more creative solution. I just care about efficiently automating a tedious task!!
Try using custom function:
To create custom function:
1.Create or open a spreadsheet in Google Sheets.
2.Select the menu item Tools > Script editor.
3.Delete any code in the script editor and copy and paste the code below into the script editor.
4.At the top, click Save save.
To use custom function:
1.Click the cell where you want to use the function.
2.Type an equals sign (=) followed by the function name and any input value — for example, =DOUBLE(A1) — and press Enter.
3.The cell will momentarily display Loading..., then return the result.
Code:
function matchTopic(p, str) {
var params = p.flat(); //Convert 2d array into 1d
var buildRegex = params.map(i => '(' + i + ')').join('|'); //convert array into series of capturing groups. Example (Dog)|(Puppies)
var regex = new RegExp(buildRegex,"gi");
var results = str.match(regex);
if(results){
// The for loops below will convert the first character of each word to Uppercase
for(var i = 0 ; i < results.length ; i++){
var words = results[i].split(" ");
for (let j = 0; j < words.length; j++) {
words[j] = words[j][0].toUpperCase() + words[j].substr(1);
}
results[i] = words.join(" ");
}
return results.join(","); //return with comma separator
}else{
return ""; //return blank if result is null
}
}
Example Usage:
Parameters:
First Topic:
Second Topic:
Third Topic:
Reference:
Custom Functions
I've added a new sheet ("Erik Help") with separate formulas (highlighted in green currently) for each of your keyword columns. They are each essentially the same except for specific column references, so I'll include only the "First Topic" formula here:
=ArrayFormula({"First Topic";IF(A2:A="",,IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) & IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))))})
This formula first creates the header (which can be changed within the formula itself as you like).
The opening IF condition leaves any row in the results column blank if the corresponding cell in Column A of that row is also blank.
JOIN is used to form a concatenated string of all keywords separated by the pipe symbol, which REGEXEXTRACT interprets as OR.
IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) will attempt to extract any of the keywords from each concatenated string in Columns B and C. If none is found, IFERROR will return null.
Then a second-round attempt is made:
& IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>"")))))
Only this time, REGEXREPLACE is used to replace the results of the first round with null, thus eliminating them from being found in round two. This will cause any second listing from the JOIN clause to be found, if one exists. Otherwise, IFERROR again returns null for round two.
CHAR(10) is the new-line character.
I've written each of the three formulas to return up to two results for each keyword column. If that is not your intention for "First Topic" and "Second Topic" (i.e., if you only wanted a maximum of one result for each of those columns), just select and delete the entire round-two portion of the formula shown above from the formula in each of those columns.

python performance issue while searching in a huge list

I need to speed up (dramatically) the search in a "huge" single dimension list of unsigned values. The list has 389.114 elements, and I need to perform a check before I add an item to make sure it doesn't already exist
I do this check 15 millions times...
Of course, it takes too much time
The fastest way I found was :
if this_item in my_list:
i = my_list.index(this_item)
else:
my_list.append(this_item)
i = len(my_list)
...
I am building a dataset from time series logs
One column of these (huge) logs is a text message, which is very redondant
To dramatically speed up the process, I transform this text into an unsigned with Adler32(), and get a unique numeric value, which is great
Then I store the messages in a PostgreSQL database, with this value as index
For each line of my log files (15 millions all together), I need to update my database of unique messages (389.114 unique messages)
It means that for each line, I need to check if the message ID belongs to my in memory list
I tried "... in list", same with dictionaries, numpy arrays, transforming the list in a string and using string.search(), sql query in the database with good index...
Nothing better than "if item in list" when the list is loaded into memory (very fast)
if this_item in my_list:
i = my_list.index(this_item)
else:
my_list.append(this_item)
i = len(my_list)
For 15 millions iterations with some stuff and NO search in the list:
- It takes 8 minutes to generate 2 tables of 15 millions lines (features and targets)
- When I activate the code above to check if a message ID already exists, it takes 1 hour 35 mn ...
How could I optimize this?
Thank you for your help
If your code is, roughly, this:
my_list = []
for this_item in collection:
if this_item in my_list:
i = my_list.index(this_item)
else:
my_list.append(this_item)
i = len(my_list)
...
Then it will run in O(n^2) time since the in operator for lists is O(n).
You can achieve linear time if you use a dictionary (which is implemented with a hash table) instead:
my_list = []
table = {}
for this_item in collection:
i = table.get(this_item)
if i is None:
i = len(my_list)
my_list.append(this_item)
table[this_item] = i
...
Of course, if you don't care about processing the items in the original order, you can just do:
for i, this_item in enumerate(set(collection)):
...

Python3: Checking if a key word within a dictionary matches any part of a string

I'm having trouble converting my working code from lists to dictionaries. The basics of the code checks a file name for any keywords within the list.
But I'm having a tough time understanding dictionaries to convert it. I am trying to pull the name of each key and compare it to the file name like I did with lists and tuples. Here is a mock version of what i was doing.
fname = "../crazyfdsfd/fds/ss/rabbit.txt"
hollow = "SFV"
blank = "2008"
empty = "bender"
# things is list
things = ["sheep", "goat", "rabbit"]
# other is tuple
other = ("sheep", "goat", "rabbit")
#stuff is dictionary
stuff = {"sheep": 2, "goat": 5, "rabbit": 6}
try:
print(type(things), "things")
for i in things:
if i in fname:
hollow = str(i)
print(hollow)
if hollow == things[2]:
print("PERFECT")
except:
print("c-c-c-combo breaker")
print("\n \n")
try:
print(type(other), "other")
for i in other:
if i in fname:
blank = str(i)
print(blank)
if blank == other[2]:
print("Yes. You. Can.")
except:
print("THANKS OBAMA")
print("\n \n")
try:
print(type(stuff), "stuff")
for i in stuff: # problem loop
if i in fname:
empty = str(i)
print(empty)
if empty == stuff[2]: # problem line
print("Shut up and take my money!")
except:
print("CURSE YOU ZOIDBERG!")
I am able to get a full run though the first two examples, but I cannot get the dictionary to run without its exception. The loop is not converting empty into stuff[2]'s value. Leaving money regrettably in fry's pocket. Let me know if my example isn't clear enough for what I am asking. The dictionary is just short cutting counting lists and adding files to other variables.
A dictionary is an unordered collection that maps keys to values. If you define stuff to be:
stuff = {"sheep": 2, "goat": 5, "rabbit": 6}
You can refer to its elements with:
stuff['sheep'], stuff['goat'], stuff['rabbit']
stuff[2] will result in a KeyError, because the key 2 is not found in your dictionary. You can't compare a string with the last or 3rd value of a dictionary, because the dictionary is not stored in an ordered sequence (the internal ordering is based on hashing). Use a list or tuple for an ordered sequence - if you need to compare to the last item.
If you want to traverse a dictionary, you can use this as a template:
for k, v in stuff.items():
if k == 'rabbit':
# do something - k will be 'rabbit' and v will be 6
If you want to check to check the keys in a dictionary to see if they match part of a string:
for k in stuff.keys():
if k in fname:
print('found', k)
Some other notes:
The KeyError would be much easier to notice... if you took out your try/except blocks. Hiding python errors from end-users can be useful. Hiding that information from YOU is a bad idea - especially when you're debugging an initial pass at code.
You can compare to the last item in a list or tuple with:
if hollow == things[-1]:
if that is what you're trying to do.
In your last loop: empty == str(i) needs to be empty = str(i).

Excel Sorting a Dynamic List or use VBA then sort

I am using sheet 2 to pull data out of sheet 1.
A9 has this formula in it:
=(INDEX(sheet1!$G$9:$G$7000,MATCH(0,INDEX(COUNTIF($A$8:A8,sheet1!$G$9:$G$7000),0,0),0))
(it looks through column G and takes out duplicates and blanks)
B9 has this formula:
=IF(MAX(IF($A9=sheet1!G:G,sheet1!E:E))=MIN(IF($A9=sheet1!G:G,sheet1!E:E)),"Only 1 Entry",MAX(IF($A9=sheet1!G:G,sheet1!E:E))-MIN(IF($A9=sheet1!G:G,sheet1!E:E)))
(this one looks in column A on sheet2 then looks up dates, Min and Max on Sheet1 to determine how old a certain item is)
C9 has this formula:
=SUMIF(sheet1!$G$9:$G$7000,A9,sheet1!$B$9:$B$7000)
(this on looks as column A in sheet 2 and references sheet1 to add up hours)
The problem is that if I sort Column C on sheet2 nothing changes. I think because as it tries to filter it the dynamic formula is reordering it back to what it is on sheet 1. Basically no matter how you try and filter it, the list stays the same, as its based on sheet1. I even tried to sort the columns on sheet 1 to see if sheet 2 would change but since data in column C of sheet 2 dont actually exist on sheet 1 that doesnt work either.
How can I filter Column C or even B and others with this dynamic formulas that are in place?
I have searched online to find a solution but cant find anything that works. If I can not use this dynamic list, I thought maybe I could create the list in column A sheet 2 with VBA and make the list static.
I have searched too for a VBA to remove duplicated and blanks but for some reason am coming up with a blank on it. I have found some that did part but not both.
Sub MakeUnique()
Dim vaData As Variant
Dim colUnique As Collection
Dim aOutput() As Variant
Dim i As Long
'Put the data in an array
vaData = Sheet1.Range("A5:A7000").Value
'Create a new collection
Set colUnique = New Collection
'Loop through the data
For i = LBound(vaData, 1) To UBound(vaData, 1)
'Collections can't have duplicate keys, so try to
'add each item to the collection ignoring errors.
'Only unique items will be added
On Error Resume Next
colUnique.Add vaData(i, 1), CStr(vaData(i, 1))
On Error GoTo 0
Next i
'size an array to write out to the sheet
ReDim aOutput(1 To colUnique.Count, 1 To 1)
'Loop through the collection and fill the output array
For i = 1 To colUnique.Count
aOutput(i, 1) = colUnique.Item(i)
Next i
'Write the unique values to column B
Sheet2.Range("A9").Resize(UBound(aOutput, 1), UBound(aOutput, 2)).Value = aOutput
End Sub
This VBA creates a list of no duplicates but leaves blanks...
So, how can I have columns B and C on sheet 2 be sortable with column A being derived from data on sheet 1 with no duplicates and no blanks? Is there a way to sort and use the dynamic formula or should it be done with VBA?
This version of your posted code will not include blanks in the unique list:
Sub MakeUnique()
Dim vaData As Variant
Dim colUnique As Collection
Dim aOutput() As Variant
Dim i As Long
'Put the data in an array
vaData = Sheet1.Range("A5:A7000").Value
'Create a new collection
Set colUnique = New Collection
'Loop through the data
For i = LBound(vaData, 1) To UBound(vaData, 1)
'Collections can't have duplicate keys, so try to
'add each item to the collection ignoring errors.
'Only unique items will be added
If vaData(i, 1) <> "" Then
On Error Resume Next
colUnique.Add vaData(i, 1), CStr(vaData(i, 1))
On Error GoTo 0
End If
Next i
'size an array to write out to the sheet
ReDim aOutput(1 To colUnique.Count, 1 To 1)
'Loop through the collection and fill the output array
For i = 1 To colUnique.Count
aOutput(i, 1) = colUnique.Item(i)
Next i
'Write the unique values to column B
Sheet2.Range("A9").Resize(UBound(aOutput, 1), UBound(aOutput, 2)).Value = aOutput
End Sub

How can I count the number of attributes of an element in Selenium Python?

Having the following HTML code:
<span class="warning" id ="warning">WARNING:</span>
For an object accessible by XPAth:
.//*[#id='unlink']/table/tbody/tr[1]/td/span
How can one count its attributes (class, id) by means of Selenium WebDriver + Python 2.7, without actually knowing their names?
I'm expecting something like count = 2.
Got it! This should work for div, span, img, p and many other basic elements.
element = driver.find_element_by_xpath(xpath) #Locate the element.
outerHTML = element.get_attribute("outerHTML") #Get its HTML
innerHTML = element.get_attribute("innerHTML") #See where its inner content starts
if len(innerHTML) > 0: # Let's make this work for input as well
innerHTML = innerHTML.strip() # Strip whitespace around inner content
toTrim = outerHTML.index(innerHTML) # Get the index of the first part, before the inner content
# In case of moste elements, this is what we care about
rightString = outerHTML[:toTrim]
else:
# We seem to have something like <input class="bla" name="blabla"> which is good
rightString = outerHTML
# Ie: <span class="something" id="somethingelse">
strippedString = rightString.strip() # Remove whitespace, if any
rightTrimmedString = strippedString.rstrip('<>') #
leftTrimmedString = rightTrimmedString.lstrip('</>') # Remove the <, >, /, chars.
rawAttributeArray = leftTrimmedString.split(' ') # Create an array of:
# [span, id = "something", class="somethingelse"]
curatedAttributeArray = [] # This is where we put the good values
iterations = len(rawAttributeArray)
for x in range(iterations):
if "=" in rawAttributeArray[x]: #We want the attribute="..." pairs
curatedAttributeArray.append(rawAttributeArray[x]) # and add them to a list
numberOfAttributes = len(curatedAttributeArray) #Let's see what we got
print numberOfAttributes # There we go
I hope this helps.
Thanks,
R.
P.S. This could be further enhanced, like stripping whitespace together with <, > or /.
It's not going to be easy.
Every element has a series of implicit attributes as well as the ones explicitly defined (for example selected, disabled, etc). As a result the only way I can think to do it would be to get a reference to the parent and then use a JavaScript executor to get the innerHTML:
document.getElementById('{ID of element}').innerHTML
You would then have to parse what is returned by innerHTML to extract out individual elements and then once you have isolated the element that you are interested in you would again have to parse that element to extract out a list of attributes.