Check conditions from two lists

I have 2 lists, excludeName and excludeZip, and a list of dictionaries, search_results. I need to exclude some of the dictionaries from the list they are being copied into (search_copy) based on several conditions. The index of an excluded name is the same as the index of the corresponding excluded zip code. Currently I am completely baffled, though I have tried many different ways to iterate over them and exclude. Another problem I've been having is that businesses get added many times.
excludeName = ['Burger King', "McDonald's", 'KFC', 'Subway', 'Chic-fil-a', 'Wawa', 'Popeyes Chicken and Biscuits', 'Taco Bell', "Wendy's", "Arby's"]
excludeZip = ['12345', '54321', '45123', '39436', '67834', '89675', '01926', '28645', '27942', '27932']

while i < len(search_results):
    for business in search_results:
        for name in excludeName:
            occurrences = [h for h, g in enumerate(excludeName) if g == name]
            for index in occurrences:
                if (business['name'] != excludeName[index]) and (business['location']['zip_code'] != excludeZip[index]):
                    search_copy.append(business)
    i += 1
Here's an example dictionary:
{
    'location': {
        'zip_code': '12345'
    },
    'name': 'Burger King'
}

This works by first copying your list of business entries and then removing any that match an excludeName/excludeZip pair.
search_copy = search_results[:]
for j in search_results:
    for i in range(len(excludeName)):
        if (j['name'] == excludeName[i]) and (j['location']['zip_code'] == excludeZip[i]):
            search_copy.remove(j)
In theory, for best performance you want to iterate over the bigger list first (in the outer loop), which I would assume is the list of businesses.
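If the exclusion lists get long, you could also build a set of (name, zip) pairs once and filter in a single pass; a minimal sketch, assuming the same variable names as above:

# Build the excluded (name, zip) pairs once; set membership tests are O(1).
excluded = set(zip(excludeName, excludeZip))

search_copy = [
    business for business in search_results
    if (business['name'], business['location']['zip_code']) not in excluded
]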

How to return all elements of a list before and including regex match?

I want to split a list into various sub-lists. The list contains two types of elements: "colors" and "color-IDs". The number of color elements between the color-IDs varies:
colors = ['red', 'blue' ,'green', 'DocJ20031212doc1223', 'pink', 'yellow', 'DocNY20021212doc1212']
I want each sublist to contain all colors before the color-ID and the color-ID. I have tried to append the elements to a new list based on a regex, trying different indexes and if/if not combinations. After extensive research, this is the best I came up with:
import re

colors_sorted = []
for i in colors:
    if re.search("Doc[a-zA-Z 0-9]{16}", i) or len(colors_sorted) == 0:
        colors_sorted.append([i])
    else:
        colors_sorted[-1].append(i)
print(colors_sorted)
However, this generates a new list that starts with the color-ID, while I want the color-ID to be the last element of each sub-list.
My output is:
[['red', 'blue', 'green'], ['DocJ20031212doc1223', 'pink', 'yellow'], ['DocNY20021212doc1212']]
We can rearrange your approach a little with a helper variable to store the sub-list:
colors_sorted = []
group = []
for i in colors:
    group.append(i)
    if i.startswith("Doc"):
        colors_sorted.append(group)
        group = []
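With the colors list from the question this produces [['red', 'blue', 'green', 'DocJ20031212doc1223'], ['pink', 'yellow', 'DocNY20021212doc1212']]. The same structure also works with the regex from the question, in case a plain color could ever start with "Doc"; a sketch assuming the colors list shown above:

import re

colors_sorted = []
group = []
for item in colors:
    group.append(item)
    if re.search(r"Doc[a-zA-Z 0-9]{16}", item):  # the ID pattern from the question
        colors_sorted.append(group)
        group = []

print(colors_sorted)
# [['red', 'blue', 'green', 'DocJ20031212doc1223'], ['pink', 'yellow', 'DocNY20021212doc1212']]

One thing to watch for: if the list does not end with a color-ID, the leftover items stay in group, so you may want to append that final group after the loop.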

Identify duplicate string patterns in elements in a list and create n new lists for each unique group of duplicates

I have lists like this one:
[review_v001,
review_v002,
review_v003,
layerpack_review_v004,
layerpack_review_v001,
x_v001,
x_v002,
x_v003]
And I need to regroup them into new lists based on the characters before the version suffix, i.e. [:-5], to look like this:
[review_v001,
review_v002,
review_v003]
[layerpack_review_v004,
layerpack_review_v001]
[x_v001,
x_v002,
x_v003]
So to rephrase, I need to iterate through a given list, identify which elements have the same prefix from the beginning of the string up to the version number (such as _v001), and then reorganise these elements into new lists where the grouping is based on this shared prefix.
This is one of my attempts, which manages to identify and almost group the duplicates, except it doesn't name them correctly when it regroups them.
fullstringlst = ['review_v001', 'review_v002', 'review_v003',
                 'layerpack_review_v004', 'layerpack_review_v001',
                 'x_v001', 'x_v002', 'x_v003']

prefixList = []
for s in fullstringlst:
    p = s[:-5]
    prefixList.append(p)

sublists = []
for item in set(prefixList):
    sublists.append([p] * prefixList.count(item))

print(sublists)
You can try something like this:
fullstringlst = ['review_v001', 'review_v002', 'review_v003', 'layerpack_review_v004', 'layerpack_review_v001', 'x_v001', 'x_v002', 'x_v003']
for s1 in fullstringlst:
    similar_strs = []
    for s2 in fullstringlst:
        if s1[:-5] == s2[:-5]:
            similar_strs.append(s2)
    print(similar_strs)
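Note that this prints each group once per element it contains. If you want each group exactly once, a dict keyed on the prefix does it in a single pass; a minimal sketch using the same [:-5] slice:

from collections import defaultdict

groups = defaultdict(list)
for s in fullstringlst:
    groups[s[:-5]].append(s)  # key on everything before the version suffix

sublists = list(groups.values())
print(sublists)
# [['review_v001', 'review_v002', 'review_v003'],
#  ['layerpack_review_v004', 'layerpack_review_v001'],
#  ['x_v001', 'x_v002', 'x_v003']]

Dicts preserve insertion order in Python 3.7+, so the groups come out in the order their prefixes first appear.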

Python: referring to each duplicate item in a list by unique index

I am trying to extract particular lines from a txt output file. The lines I am interested in are a few lines above and a few lines below the key string that I am using to search through the results. The key string is the same for each record.
fi = open('Inputfile.txt')
fo = open('Outputfile.txt', 'a')

lines = fi.readlines()

filtered_list = []
for item in lines:
    if item.startswith("key string"):
        filtered_list.append(lines[lines.index(item)-2])
        filtered_list.append(lines[lines.index(item)+6])
        filtered_list.append(lines[lines.index(item)+10])
        filtered_list.append(lines[lines.index(item)+11])

fo.writelines(filtered_list)
fi.close()
fo.close()
The output file contains the right lines for the first record, but they are repeated for every record available. How can I update the indexing so it reads every individual record? I've tried to find a solution, but as a novice programmer I was struggling to use the enumerate() function or the collections package.
First of all, it would probably help if you said what exactly goes wrong with your code (a stack trace, "it doesn't work at all", etc.). Anyway, here are some thoughts. You can try to divide your problem into subproblems to make it easier to work with. In this case, let's separate finding the relevant lines from collecting them.
First, let's find the indexes of all the relevant lines.
key = "key string"
relevant = []
for i, item in enumerate(lines):
if item.startswith(key):
relevant.append(item)
enumerate is actually quite simple. It takes a list, and returns a sequence of (index, item) pairs. So, enumerate(['a', 'b', 'c']) returns [(0, 'a'), (1, 'b'), (2, 'c')].
What I had written above can be achieved with a list comprehension:
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
So, we have the indexes of the relevant lines. Now, let's collect them. You are interested in the lines 2 lines before each one and 6, 10, and 11 lines after it. If your first line contains the key, then you have a problem – you don't really want lines[-1] – that's the last item! Also, you need to handle the situation in which an offset would take you past the end of the list: otherwise Python will raise an IndexError.
out = []
for r in relevant:
    for offset in -2, 6, 10, 11:
        index = r + offset
        if 0 <= index < len(lines):
            out.append(lines[index])
You could also catch the IndexError, but that won't save us much typing, as we have to handle negative indexes anyway.
The whole program would look like this:
key = "key string"
with open('Inputfile.txt') as fi:
lines = fi.readlines()
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
out = []
for r in relevant:
for offset in -2, 6, 10, 11:
index = r + offset
if 0 < index < len(lines):
out.append(lines[index])
with open('Outputfile.txt', 'a') as fi:
fi.writelines(out)
To get rid of duplicates you can convert the list to a set; for example:
x = ['a', 'b', 'a']
y = set(x)
print(y)
will result in:
{'a', 'b'}
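Keep in mind that a set does not preserve order. If the original order matters, dict.fromkeys keeps the first occurrence of each item; a small sketch:

x = ['a', 'b', 'a']
y = list(dict.fromkeys(x))  # order-preserving de-duplication (Python 3.7+)
print(y)
# ['a', 'b']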

Multiple lists of the same length to csv

I have a couple of List<string>s, with a format like this:
List 1    List 2    List 3
1         A         One
2         B         Two
3         C         Three
4         D         Four
5         E         Five
So in code form, it's like:
List<string> list1 = new List<string> { "1", "2", "3", "4", "5" };
List<string> list2 = new List<string> { "A", "B", "C", "D", "E" };
List<string> list3 = new List<string> { "One", "Two", "Three", "Four", "Five" };
My questions are:
How do I transform those three lists into CSV format?
list1,list2,list3
1,A,one
2,b,two
3,c,three
4,d,four
5,e,five
Should I append a , to the end of each element, or make the delimiter its own element within the multidimensional list?
If performance is your main concern, I would use an existing csv library for your language, as it's probably been pretty well optimized.
If that's too much overhead and you just want a simple function, I use the same concept in some of my code: build each row by pairing up the i-th element of every list, use the language's join/implode function to turn each row into a comma-separated string, then join the rows with \n.
I'm used to doing this in a dynamic language, but you can see the concept in the following pseudocode example:
header = {"List1", "List2", "List3"}
list1 = {"1","2","3","4","5"};
list2 = {"A","B","C","D","E"};
list3 = {"One","Two","Three","Four","Five"};
values = {header, list1, list2, list3};
for index in values
values[index] = values[index].join(",");
values = values.join("\n");

Way of fast iterating through list

I have a question about iterating through lists.
Let's say I have a list of maps in this format:
def listOfMaps = [ ["date":"2013/05/23", "id":"1"],
                   ["date":"2013.05.23", "id":"2"],
                   ["date":"2013-05-23", "id":"3"],
                   ["date":"23/05/2013", "id":"4"] ]
Now I have a list of two patterns (in reality I have a lot more :D)
def patterns = [
    /\d{4}\/\d{2}\/\d{2}/,  //'yyyy/MM/dd'
    /\d{4}\-\d{2}\-\d{2}/   //'yyyy-MM-dd'
]
I want to println only the dates in "yyyy/MM/dd" or "yyyy-MM-dd" format, so I have to go through both lists:
for (int i = 0; i < patterns.size(); i++) {
    def findDates = listOfMaps.findAll { it.get("word") ==~ patterns[i] ?
        dateList << it : "Nothing found" }
}
But I have a problem with this approach. What if listOfMaps is huge? This code goes through the whole list of patterns, and for each pattern it goes through the whole list of maps again, which for huge lists might take a long while :). I tried a forEach inside the findAll closure, but it does not work.
So my question is: is there any way to go through the list of patterns inside the findAll closure? For instance, something like this in pseudocode:
def findDates = listOfMaps.findAll{it.get("word") ==~ for(){patterns[i]} ? : }
so that it goes through listOfMaps only once and iterates through patterns (which is always far smaller than listOfMaps) inside.
I have an idea to create a function that returns a list instance, but I'm struggling to implement it :).
Thanks in advance for any response.
You could do:
def listOfMaps = [ [date:"2013/05/23", id:"1"],
                   [date:"2013.05.23", id:"2"],
                   [date:"2013-05-23", id:"3"],
                   [date:"23/05/2013", id:"4"] ]

def patterns = [
    /\d{4}\/\d{2}\/\d{2}/,  //'yyyy/MM/dd'
    /\d{4}\-\d{2}\-\d{2}/   //'yyyy-MM-dd'
]

def foundRecords = listOfMaps.findAll { m ->
    patterns.find { p ->
        m.date ==~ p
    }
}
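This makes a single pass over listOfMaps. For each map, the inner find stops at the first pattern that matches and returns it (a truthy value), so the map is kept; maps whose date matches none of the patterns produce null and are dropped. With the data above, foundRecords ends up containing the maps with id 1 and id 3.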